As of today, you can use Terality in your favorite online data science notebook environment: Google Colab. Many Google Colab users run into memory errors and slow execution with pandas, which does not scale well to datasets larger than 5 to 10 GB.
To check how Terality compares to the best solutions on the market, we picked the most widely known independent benchmark for pandas alternatives: the h2o benchmark. It times a series of database-like operations, such as joins (merges) and groupby aggregations, on datasets of 0.5, 5 and 50 GB. The final section gives more detail on the experiments and explains how to reproduce the results for Terality.
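To make "timed database-like operations" concrete, here is a toy sketch in plain pandas of the kind of task the benchmark measures: a groupby aggregation and a join, each timed with a wall-clock timer. It is only an illustration under stated assumptions: the column names `id1` and `v1` loosely mirror the benchmark's groupby data, while the helper functions, row counts and lookup table are hypothetical. This is not the h2o benchmark's own scripts or data, and it does not use Terality.

```python
import random
import time

import pandas as pd


def make_groupby_data(n_rows: int, n_groups: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: a small synthetic table with a key column (id1) and a value column (v1)."""
    rng = random.Random(seed)
    return pd.DataFrame({
        "id1": [f"id{rng.randint(1, n_groups):03d}" for _ in range(n_rows)],
        "v1": [rng.randint(1, 5) for _ in range(n_rows)],
    })


def timed(label, fn):
    """Run fn once, print its wall-clock duration, and return its result."""
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.4f}s")
    return result


df = make_groupby_data(n_rows=100_000, n_groups=100)

# Groupby aggregation: sum v1 for each distinct id1 value.
sums = timed("groupby: sum v1 by id1", lambda: df.groupby("id1")["v1"].sum())

# Join: attach a label to every row from a small lookup table keyed on id1.
keys = sorted(df["id1"].unique())
lookup = pd.DataFrame({"id1": keys, "label": range(len(keys))})
joined = timed("join: inner merge on id1", lambda: df.merge(lookup, on="id1", how="inner"))
```

The real benchmark repeats questions like these at 0.5, 5 and 50 GB, which is where single-machine pandas starts to hit the memory and speed limits described above.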
After weeks of preparation, we're proud to announce the Terality hosted demo notebook: the fastest way to take Terality for a test drive, completely free of charge. We wanted you to be able to discover what Terality is all about in a single click, and there is no better way to experience our lightning-fast, serverless pandas data processing than running a pre-written tutorial on our infrastructure.
In this article, we review the options available to you as a data scientist to make pandas work at scale, whether that means handling larger datasets or running faster. We then explain why we built Terality, which combines the best features of these options in a single solution. With our fully serverless engine, data scientists can finally run pandas at scale by changing just one line of their code.
We have released a side engine that runs every pandas function we have not yet optimized and parallelized. This keeps Terality's promise: it is fully serverless, scalable, and compatible with the pandas API.
Terality is a distributed data processing engine that lets data scientists execute all their pandas code 100 times faster, even on terabytes of data, by changing only one line of code. Terality is hosted, so there is no infrastructure to manage, and memory is virtually unlimited. Data scientists can be up and running within minutes, even on an existing pandas codebase.