Terality is a distributed data processing engine for Data Scientists to execute all their Pandas code 100 times faster, even on terabytes of data, by only changing one line of code. Terality is hosted, so there is no infrastructure to manage, and memory is virtually unlimited. Data Scientists can be up and running within minutes, even on an existing Pandas codebase.
When I met with Adrien last year, now my co-founder, he was a Data Scientist. Adrien was very frustrated by one particular thing: executing Pandas code at scale.
He had two main issues:
As a Data Scientist and a developer, Adrien tried to find solutions. Among them:
Coming from there, we talked to 50+ Data Scientists, working in many industries such as software, financial services, logistics or gaming. Adrien and I quickly realized and validated that those issues were commonly shared among them. Starting from a few GB, Data Scientists struggled with Pandas. What’s more, most of the Data Scientists we interviewed were not software engineers, and struggled even more with existing open source solutions or Spark. Actually, Pandas is loved by Data Scientists and they want to stick with the Pandas syntax.
We decided to build the ultimate data processing engine, capable of executing all Pandas functions 100x faster, even terabytes of data, without having to handle any infrastructure, and by only changing one line of code. In the short term, our ambition is to solve all the pain points Data Scientists have with executing Pandas at scale and its alternatives.
From the very beginning, we wanted to strongly focus on the Data Scientists' experience. Our mission is to improve Data Scientists' lives.
To achieve that, we need to build the best experience possible for them including:
We've pushed hard within the last months to release a first version of Terality which already provides value to Data Scientists. Here are the first benefits we observed with our first beta testers:
Today, we are starting to accept teams and companies to our private beta. If you are interested in joining early and influencing the direction of Terality, sign up here and follow us on Twitter and Linkedin.
Terality can now provision infrastructure in less than 500 milliseconds, allowing you to get started right away with much less latency.
In our first benchmarks, we are experiencing, on average, a 10x faster experience with Terality than Pandas on 1GB+ files.
We have released a side engine that will run all the functions we haven't optimized and parallelized yet. This means that you can execute (almost) any pandas function with Terality, even the ones we haven't hand-optimized. You don't have to do anything to use this side engine: our scheduler will automatically use it when needed.