Changelog Terality #2: Huge Performance Improvements

October 30, 2021
Terality team
🚀

Terality is now much more responsive

Terality is perfect for interactively exploring big datasets in a Jupyter notebook or a Python script. Still, when you began a session, we used to need a few seconds to provision infrastructure for you. This delay could make Terality feel a bit unresponsive when you started working with a dataset.

Terality can now provision infrastructure in less than 500 milliseconds, allowing you to get started right away with much less latency. Of course, you still don’t have to concern yourself with said infrastructure: everything is done transparently behind the scenes.

With this change, combined with other optimizations in various operations, Terality can run many operations with less latency. Simple functions (such as DataFrame.head) reliably return a result faster, and complex operations (such as merges or joins) also benefit from these improvements.


⭐

New


  • Implementation of RangeIndex and DatetimeIndex with freq. Having finished implementing all data types, we have now moved on to implementing all pandas data structures in Terality.
  • Implementation of three new functions:
    - df.isin
    - df.replace
    - df.explode

  • Terality is now compatible with all pandas.tseries.offsets objects
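
As a quick reminder of what these additions do, here is a minimal sketch written against plain pandas. It assumes Terality mirrors the pandas behavior shown here; the sample data and variable names are made up for illustration.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Illustrative data only; not taken from the changelog.
cities = pd.DataFrame({"city": ["Paris", "Lyon", "Nice"], "visits": [3, 0, 5]})

# A freshly built DataFrame gets a RangeIndex by default.
has_range_index = isinstance(cities.index, pd.RangeIndex)

# df.isin: element-wise membership test, returns a boolean DataFrame.
# With a dict, only the listed columns are checked; others are False.
mask = cities.isin({"city": ["Paris", "Nice"]})

# df.replace: substitute one value with another.
renamed = cities.replace("Lyon", "Marseille")

# df.explode: turn each element of a list-like cell into its own row.
tagged = pd.DataFrame({"city": ["Paris", "Lyon"], "tags": [["a", "b"], ["c"]]})
exploded = tagged.explode("tags")

# DatetimeIndex with a freq, and a pandas.tseries.offsets object.
days = pd.date_range("2021-10-01", periods=3, freq="D")
month_end = pd.Timestamp("2021-10-30") + MonthEnd(1)
```

With Terality, the same calls are meant to work on the Terality equivalents of these structures, so existing pandas code using them should need few changes.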


💎

Improvements & Fixes

  • After fine-tuning the parameters used during data preparation on the Terality server, complex pandas functions now take a lot less time. A merge job on 30 GB now takes 1 minute, down from 3 minutes.
  • Terality now has better error messages. We want to communicate very clearly whether a problem is due to calling a function with incorrect parameter values, to something not being supported by Terality (yet), or to a network issue, so that you know what action you can take on your side to solve it.
  • Iterating over your pandas data structures is generally not good practice, and even less so with a distributed engine like Terality. We still allow it, using a caching mechanism so that you at least don’t make one request per row. However, a bug was causing our iteration to still make one request per row, nullifying the benefits of caching. It is now fixed! We still recommend using a built-in function instead of iterating over your structures, but iteration shouldn’t be too slow if you wish to use it.
  • Fixed an issue that occurred with old urllib versions.
  • Saved a couple of hundred milliseconds on all operations using the non-parallelized engine.
  • Fixed some issues that could result in an error 500 (without an error message) in the client.
  • Improved performance on data with mixed types.
  • memory_usage now correctly works on all data types.
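
To illustrate the iteration fix mentioned above, here is a rough sketch of why fetching rows in chunks matters. It is not Terality's actual implementation: `iter_rows_chunked`, `FakeServer`, and the chunk size are hypothetical names and values chosen for the example.

```python
from typing import Callable, Iterator, List, Tuple

Row = Tuple[int, int]


def iter_rows_chunked(
    fetch_chunk: Callable[[int, int], List[Row]],
    n_rows: int,
    chunk_size: int = 1000,
) -> Iterator[Row]:
    """Yield rows one by one while requesting them chunk_size rows at a time."""
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        # One (hypothetical) round trip brings back a whole chunk of rows.
        for row in fetch_chunk(start, stop):
            yield row


class FakeServer:
    """Stand-in for a remote engine; it only counts round trips."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.requests = 0

    def fetch_chunk(self, start: int, stop: int) -> List[Row]:
        self.requests += 1
        return self.rows[start:stop]


server = FakeServer((i, i * i) for i in range(2500))
rows = list(iter_rows_chunked(server.fetch_chunk, n_rows=2500, chunk_size=1000))

# 2500 rows are served in 3 requests instead of 2500.
print(server.requests)
```

Even with chunking, each iteration step still runs in Python on the client, which is why a built-in, vectorized function remains the better choice when one exists.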

