Terality currently performs network calls to our engine when running any pandas function. This is true even on small datasets (about one gigabyte or less), where the actual computation time is low compared to the overhead of delegating calls to the engine. In order to make Terality feel snappy even on small datasets, we investigated ways to reduce the engine overhead.
In the end, we were able to cut execution time by around 30% for functions running on small datasets.
This is the result of a series of internal optimizations in our parallelized engine, in particular a more efficient communication channel between the engine and our internal storage system.
Expect even more improvements in the future!
As of today, you can use Terality in your favorite online data science notebook environment: Google Colab. Many Google Colab users have experienced the pain of memory errors and slow performance with pandas. Indeed, pandas doesn't scale well when processing large datasets above 5 or 10 GB.