At Terality, we aim to build the fastest, easiest-to-use serverless data processing engine for data teams. We want to offer the same speed and scalability as Spark, but with the same syntax as pandas and in a fully serverless way.
Pandas is the library most widely used by data science teams. We wanted to replicate its API so that you don't have to learn another syntax or change your existing code (unlike PySpark or Dask, for example).
The pandas API is massive, and reimplementing and optimizing every pandas function will take more time. Because we want everyone to be able to use Terality before that work is complete, we designed a solution that already covers the vast majority of the pandas API.
To that end, we have released a side engine that runs all the functions we haven't optimized and parallelized yet. Conceptually, this side engine behaves as if you had a large server with hundreds of gigabytes of memory.
When you call a pandas function from your notebook or IDE, it runs on our side, whatever the function:
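Here is a minimal sketch of what that looks like, assuming Terality is imported as a drop-in, pandas-like module (shown here as `te`; check the documentation for the exact import path). The dataset path and column names are hypothetical.

```python
# Minimal sketch: the same calls you would write with pandas,
# assuming a drop-in module import (exact path may differ).
import terality as te  # instead of: import pandas as pd

# Each call below is shipped to Terality's servers and executed there,
# whether it lands on the parallelized engine or the side engine.
df = te.read_csv("s3://my-bucket/transactions.csv")  # hypothetical dataset

top_customers = (
    df.groupby("customer_id")["amount"]
      .sum()
      .sort_values(ascending=False)
      .head(10)
)
print(top_customers)
```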
For you as a user, the switch between the two engines is entirely transparent. We keep Terality's promise: it's fully serverless, scalable, and compatible with the pandas API.
To help us prioritize which functions to implement next in the main parallelized engine, our systems automatically notify our team whenever a call falls back to the non-parallelized engine.
You can start using Terality today by visiting our Website and Documentation.
As of September 2021, we are in beta. Please reach out to us via the live chat on our website or by writing to support@terality.com. We welcome all feedback, remarks, and questions.
As of today, you can use Terality in your favorite online data science notebook environment, Google Colab. Many Google Colab users have run into memory errors and speed issues with pandas, which doesn't scale well when processing large datasets above 5 or 10 GB.
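Getting started in Colab takes only a cell or two. The sketch below assumes the client is published as a `terality` package on PyPI and that your account and API key have already been configured as described in the documentation; the dataset URL is hypothetical.

```python
# In a Google Colab cell: install the client, then use it like pandas.
# Package name and prior account/API-key configuration are assumptions;
# see the documentation for the exact setup steps.
!pip install --quiet terality

import terality as te

# The heavy lifting happens on Terality's servers, not in the Colab runtime,
# so a dataset larger than Colab's memory is not a problem.
df = te.read_csv("https://example.com/large_dataset.csv")  # hypothetical URL
df.describe()
```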