Changelog Terality #4: caching the result of computations

January 5, 2022
Terality team

Highlight: automatically skip computations whose results are already known

Terality now caches the results of computations. If you run the same function twice on the same inputs, the second call returns almost instantly.

This saves a lot of time when you re-run a compute-heavy cell in a Jupyter notebook. The second execution returns the cached results and skips the computation altogether.

Import functions such as `read_csv` are cached too. Once you have loaded a dataset into Terality, loading the same dataset again is instant, even if you close your Jupyter notebook and come back to it the next day.

This feature works for datasets of any size (we don’t like limitations on dataset size at Terality) and is enabled by default, with no change needed to your existing code. Results are currently cached for about three days, but this may change in the future.

If you want to benchmark Terality with reproducible execution times, you can disable the cache. The full documentation is available at https://docs.terality.com/getting-terality/user-guide/caching.
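The toy disk-backed cache below illustrates persistence across sessions, expiry, and the opt-out. Everything in it is hypothetical and is not Terality's configuration API: the `DiskResultCache` class, the JSON file layout, the `enabled` switch, and the fixed three-day TTL are all illustrative assumptions. See the caching documentation linked above for the supported way to disable the cache. Passing `now` explicitly lets the example simulate the passage of time.

```python
import json
import os
import tempfile
import time
from typing import Any, Callable, Optional

# Hypothetical expiry window, chosen to mirror the "about three days" above.
THREE_DAYS = 3 * 24 * 60 * 60  # seconds


class DiskResultCache:
    """Toy result cache: entries live in JSON files so they survive a
    notebook restart, expire after `ttl_seconds`, and can be turned off."""

    def __init__(self, directory: str, ttl_seconds: float = THREE_DAYS, enabled: bool = True) -> None:
        self.directory = directory
        self.ttl_seconds = ttl_seconds
        self.enabled = enabled
        os.makedirs(directory, exist_ok=True)

    def _path(self, key: str) -> str:
        return os.path.join(self.directory, key + ".json")

    def get_or_compute(self, key: str, compute: Callable[[], Any], now: Optional[float] = None) -> Any:
        now = time.time() if now is None else now
        path = self._path(key)
        if self.enabled and os.path.exists(path):
            with open(path) as f:
                entry = json.load(f)
            if now - entry["stored_at"] < self.ttl_seconds:
                return entry["value"]  # fresh entry: skip the computation
        value = compute()  # miss, expired entry, or cache disabled
        if self.enabled:
            with open(path, "w") as f:
                json.dump({"stored_at": now, "value": value}, f)
        return value


calls = []


def slow_load() -> list:
    calls.append(1)  # record that real work happened
    return [1, 2, 3]


with tempfile.TemporaryDirectory() as cache_dir:
    DiskResultCache(cache_dir).get_or_compute("dataset-v1", slow_load, now=0)
    # A new object over the same directory stands in for a restarted notebook.
    restarted = DiskResultCache(cache_dir)
    restarted.get_or_compute("dataset-v1", slow_load, now=3600)  # hit
    restarted.get_or_compute("dataset-v1", slow_load, now=THREE_DAYS + 1)  # expired
    # Disabled cache: always recomputes, as you would want for benchmarking.
    DiskResultCache(cache_dir, enabled=False).get_or_compute("dataset-v1", slow_load, now=0)

print(len(calls))  # 3
```

Only the first load, the expired lookup, and the disabled lookup do real work. The lookup one hour after the first load is served from disk.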

New

  • We published a public status page! You can now check for service interruptions at www.teralitystatus.com. To be notified of outages and scheduled maintenance windows, click the “Get Updates” button. The user dashboard at app.terality.com will also display a banner during maintenance and incidents. As usual, feel free to contact us at support@terality.com if you run into any issues.
  • Added log messages that report what happens inside the Terality engine during a computation. For instance, a message is displayed when a computation is not supported by the optimized (parallelized) engine and runs on our slower fallback engine instead.
  • Added support for several groupby methods: `agg`, `apply`, `transform`, `first`, `last`, and `size`.
  • Added support for the `expand` argument in `Series.str.split`, `Index.str.split`, `Series.str.rsplit`, `Index.str.rsplit`, `Series.str.partition`, `Index.str.partition`, `Series.str.rpartition`, and `Index.str.rpartition`.
  • Implemented `Index.shape` and `Index.dtype`.
  • Added support for tuples of `bool` in `DataFrame.sort_values`.
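Terality exposes a pandas-compatible API, so the newly supported groupby, string-splitting, and `Index` calls can be illustrated with plain pandas. This sketch runs against pandas itself, with made-up data. We assume the same calls behave identically on a Terality DataFrame, but check the documentation for any differences.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame(
    {
        "team": ["a", "a", "b", "b", "b"],
        "score": [1, 3, 2, 4, 6],
        "name": ["x-1", "y-2", "z-3", "w-4", "v-5"],
    }
)

grouped = df.groupby("team")["score"]
summary = grouped.agg(["min", "max"])  # one row per group, one column per function
first_scores = grouped.first()  # first value in each group
last_scores = grouped.last()  # last value in each group
sizes = df.groupby("team").size()  # number of rows per group
centered = grouped.transform(lambda s: s - s.mean())  # same length as df
spread = grouped.apply(lambda s: s.max() - s.min())  # arbitrary per-group function

# expand=True returns a DataFrame with one column per piece instead of lists.
parts = df["name"].str.split("-", expand=True)
# partition splits at the first separator into (before, separator, after) columns.
pieces = df["name"].str.partition("-")

idx = pd.Index([10, 20, 30])
print(idx.shape, idx.dtype)
```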

Improvements & Fixes

  • Computations that fail with an error no longer count toward your data usage quota.
  • Added support for importing from and exporting to S3 buckets in all AWS regions created before 2019. Previously, data transfers to and from S3 buckets in some AWS regions could fail. We’ll add support for the few remaining regions soon.
  • Fixed potential “Internal Server Errors” when calling some compute-heavy pandas functions on large datasets.
  • Fixed several potential “Internal Server Errors” that occurred in rare situations.
  • Fixed a potential AWS authentication error when reading from public S3 buckets under a specific AWS IAM policy. This could happen when running Terality in AWS SageMaker notebooks created with the default AWS IAM user policy.
  • Major performance improvements to `DataFrame.apply` and `Series.apply`.
  • `read_json` now supports reading multiple files from a folder.
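In plain pandas, reading a folder of JSON files takes a small manual loop. The sketch below shows that manual equivalent, which is what the new `read_json` behavior lets you skip. The helper `read_json_folder` and the folder layout are hypothetical. This changelog does not show the exact Terality call for reading a folder, so refer to the documentation for the supported path syntax.

```python
import glob
import json
import os
import tempfile

import pandas as pd


def read_json_folder(folder: str, **read_kwargs) -> pd.DataFrame:
    """Hypothetical helper: read every *.json file in `folder` and stack the rows."""
    paths = sorted(glob.glob(os.path.join(folder, "*.json")))  # deterministic order
    frames = [pd.read_json(path, **read_kwargs) for path in paths]
    return pd.concat(frames, ignore_index=True)


# Build a throwaway folder with two record-oriented JSON files.
with tempfile.TemporaryDirectory() as folder:
    files = {
        "part-0.json": [{"id": 1, "city": "Paris"}],
        "part-1.json": [{"id": 2, "city": "Lyon"}, {"id": 3, "city": "Nice"}],
    }
    for name, rows in files.items():
        with open(os.path.join(folder, name), "w") as f:
            json.dump(rows, f)
    combined = read_json_folder(folder, orient="records")

print(combined)
```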
