Changelog Terality #5: Improved performance on small datasets

January 31, 2022
Terality team

Highlight: Improved performance on small datasets

Terality currently performs network calls to our engine when running any pandas function. This is true even on small datasets (about one gigabyte or less), where the actual computation time is low compared to the overhead of delegating calls to the engine. In order to make Terality feel snappy even on small datasets, we investigated ways to reduce the engine overhead.

In the end, we cut execution time on small datasets by around 30%.

This is the result of a series of internal optimizations in our parallelized engine, most notably a more efficient way of communicating between the engine and our internal storage system.

Expect even more improvements in the future!


New features

  • Added support for several `groupby` methods: `filter`, `head`, `tail`, `nth`.
  • Added a progress bar on local file imports with an upload time estimation.
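
As a rough illustration of the four `groupby` methods listed above, here is a minimal sketch written against plain pandas, on the assumption that Terality mirrors the pandas behavior for these calls. The DataFrame and its column names (`team`, `score`) are made up for the example.

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({
    "team": ["a", "a", "a", "b", "b"],
    "score": [1, 2, 3, 10, 20],
})
grouped = df.groupby("team")

# filter: keep only the rows of groups that satisfy a group-level condition.
big_teams = grouped.filter(lambda g: len(g) >= 3)

# head / tail: the first or last n rows of each group.
first_rows = grouped.head(1)
last_rows = grouped.tail(1)

# nth: the row at a given position within each group (0-based).
second_rows = grouped.nth(1)

print(big_teams, first_rows, last_rows, second_rows, sep="\n\n")
```

The shape of the `nth` result (its index in particular) differs between pandas versions, so it is safer to rely on the selected values than on the index.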

Improvements & fixes

  • `from_pandas` now correctly supports non-str index names.
  • Added support for the `"xlsxwriter"` engine in `DataFrame.to_excel`.
  • Added support for the `columns` argument in `read_parquet`.
  • Fixed a bug in `Series.is_unique` and `Index.is_unique` when they contain NaN values.
  • Added support for the `indicator` argument in `merge`.
  • `groupby.agg` now supports user-defined functions.
  • `read_csv` now supports the `delim_whitespace` parameter.
  • `read_parquet` now supports boolean columns that contain NA values.
  • Clearer errors when calling a pandas method with incorrect types.
  • You can now view your past invoices and update your payment information directly on
  • Fixed two potential sources of Internal Server Errors in rare situations.
  • Reduced verbose caching logs on trivial methods.
  • Fixed an issue when exporting to a file in the current working directory.
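
A few of the items above are easier to picture with code. The sketch below uses plain pandas and assumes Terality follows the same semantics for `merge(indicator=...)`, `is_unique` and `groupby.agg`. All data and the helper `value_range` are hypothetical.

```python
import numpy as np
import pandas as pd

# merge with indicator=True adds a "_merge" column recording whether each
# row came from the left frame, the right frame, or both.
left = pd.DataFrame({"key": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "y": [20, 30, 40]})
merged = left.merge(right, on="key", how="outer", indicator=True)
origin = dict(zip(merged["key"].tolist(), merged["_merge"].astype(str).tolist()))

# is_unique with NaN values: pandas counts repeated NaN as duplicates.
one_nan_unique = pd.Series([1.0, np.nan]).is_unique
two_nans_unique = pd.Series([1.0, np.nan, np.nan]).is_unique

# groupby.agg with a user-defined function (hypothetical helper).
def value_range(s):
    """Spread between the largest and smallest value of a group."""
    return s.max() - s.min()

scores = pd.DataFrame({
    "team": ["a", "a", "a", "b", "b"],
    "score": [1, 2, 3, 10, 20],
})
spread = scores.groupby("team")["score"].agg(value_range)

print(origin, one_nan_unique, two_nans_unique, spread.to_dict(), sep="\n")
```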
