Changelog Terality #1: Release of the side engine and many other things

October 30, 2021
Terality team

🚀  Release of the side engine

We have released a side engine that will run all the functions we haven't optimized and parallelized yet. This means that you can execute (almost) any pandas function with Terality, even the ones we haven't hand-optimized. You don't have to do anything to use this side engine: our scheduler will automatically use it when needed.

Conceptually, this side engine behaves as if you had a large server with hundreds of gigabytes of memory. Read more about it here.

⭐  New

  • Terality now notifies you when you try to merge two dataframes on columns containing many NA values.

If you have m NA values on the left side and n NA values on the right side, the resulting dataframe has at least m * n rows, because every NA key on the left is matched with every NA key on the right. For instance, merging a dataframe with 10k NA values with a dataframe with 20k NA values would result in 200,000,000 rows in the output. In such a case, pandas will typically crash without giving a hint about what happened. Since this is almost never what you intended, Terality will instead display a clear error message highlighting the issue for you.
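To make the m * n blow-up concrete, here is a minimal sketch using plain pandas on tiny dataframes. The column names (`key`, `left_value`, `right_value`) and the sizes are made up for illustration, and the snippet only demonstrates the pandas behavior described above, not Terality's own check.

```python
import numpy as np
import pandas as pd

# Hypothetical sizes: m NA keys on the left, n NA keys on the right.
m, n = 3, 4

# Each side has one regular key (1.0) followed by its NA keys.
left = pd.DataFrame({"key": [1.0] + [np.nan] * m, "left_value": range(m + 1)})
right = pd.DataFrame({"key": [1.0] + [np.nan] * n, "right_value": range(n + 1)})

# pandas matches NA keys against each other when merging,
# so every NA row on the left pairs with every NA row on the right.
merged = left.merge(right, on="key")

# Rows of the output whose key is NA: m * n of them.
na_rows = int(merged["key"].isna().sum())
print(f"{len(merged)} rows in total, {na_rows} of them from NA keys")
```

Counting the NA values in the key columns before merging (for example with `left["key"].isna().sum()`) gives a cheap estimate of how large the output would get.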

  • Index subclasses, such as DatetimeIndex or Float64Index, are now supported.
  • Datetime accessors (myindex.dt) are now available on DatetimeIndex.
  • Terality now accepts objects with the ndarray type as arguments, and supports functions that return an ndarray (e.g. Series.unique).
  • Added the python -m terality alias for the Terality command-line interface. This offers an alternative to users who have not configured their PATH environment variable to work with pip.
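The index and ndarray items above correspond to pandas patterns like the following. This is a sketch written against plain pandas with made-up values. It assumes the same calls behave identically through Terality, which this changelog implies but the snippet does not verify. Note that plain pandas reads datetime fields such as `year` directly off a DatetimeIndex.

```python
import numpy as np
import pandas as pd

# Series.unique returns a NumPy ndarray, in order of first appearance.
s = pd.Series([3, 1, 3, 2, 1])
uniques = s.unique()
is_array = isinstance(uniques, np.ndarray)

# An ndarray can also be passed in as an argument, here to build a Series.
from_array = pd.Series(np.array([1.5, 2.5, 3.5]))

# DatetimeIndex is an Index subclass that exposes datetime fields.
idx = pd.DatetimeIndex(["2021-10-30", "2021-10-31"])
years = [int(y) for y in idx.year]

print(uniques, is_array, len(from_array), years)
```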

💎  Improvements & Fixes

  • Multiple performance improvements, especially when many Terality functions are called in a short time.
  • We fixed performance issues when handling series of strings containing missing or empty values.
  • Improvements to several error messages, which should now be clearer and more actionable.
  • read_parquet now supports multi-indexes and named indexes.
  • read_parquet now supports reading parquet files with row groups exceeding a few GiB (previously, this would result in internal server errors).
  • Many improvements to our internal testing and monitoring systems. We are now able to identify and solve issues faster than ever.

