🚀 Release of the side engine
We have released a side engine that runs all the functions we haven't optimized and parallelized yet. This means you can now execute (almost) any pandas function with Terality. You don't have to do anything to use this side engine: our scheduler automatically uses it when needed.
Conceptually, this side engine behaves as if you had a single large server with hundreds of gigabytes of memory. Read more about it here.
If the merge key has m NA values on the left side and n NA values on the right side, the resulting dataframe has at least n * m rows. For instance, merging a dataframe with 10k NA values with a dataframe with 20k NA values results in 200,000,000 rows in the output. In such a case, pandas will typically crash without giving a hint about what happened. Since this is almost never what you intended, Terality instead displays a clear error message highlighting the issue.
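The row blow-up described above can be reproduced at a small scale in plain pandas, which matches NA keys against each other during a merge. The snippet below is a minimal sketch on toy data. The column names and the `na_pair_count` helper are hypothetical and only serve the illustration; they are not part of pandas or Terality.

```python
import numpy as np
import pandas as pd

# Toy data: 3 NA keys on the left (m = 3), 2 NA keys on the right (n = 2).
left = pd.DataFrame({"key": [1.0, np.nan, np.nan, np.nan], "a": [0, 1, 2, 3]})
right = pd.DataFrame({"key": [1.0, np.nan, np.nan], "b": [10, 20, 30]})


def na_pair_count(left_df, right_df, on):
    """Hypothetical helper: rows an inner merge produces from NA-to-NA matches."""
    m = int(left_df[on].isna().sum())
    n = int(right_df[on].isna().sum())
    return m * n


predicted = na_pair_count(left, right, "key")

# pandas matches NA keys with each other, so every left NA row
# is paired with every right NA row.
merged = left.merge(right, on="key")
na_rows = int(merged["key"].isna().sum())

# Dropping NA keys on both sides before merging avoids the blow-up.
safe = left.dropna(subset=["key"]).merge(right.dropna(subset=["key"]), on="key")

print(predicted, na_rows, len(merged), len(safe))
```

Checking the NA counts before a merge, as the helper does, is one cheap way to spot the problem early. Whether to drop or fill the NA keys depends on what the merge is meant to express.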
💎 Improvements & Fixes
As of today, you can use Terality in your favorite online data science notebook environment: Google Colab. Many Google Colab users run into memory errors and speed issues with pandas. Indeed, pandas doesn’t scale well when processing datasets larger than 5 to 10 GB.