Hi,
I’m Alessandro, the author of Crafting Test-Driven Software with Python and Modern Python Standard Library Cookbook, and a long-time open source Python developer working on frameworks and libraries for data-intensive applications.
This is the first issue of the Python Data Engineering newsletter. I started it to scratch my own itch: keeping up with updates to all the components that form the foundations of data engineering projects in Python. Unlike the usual newsletters that focus on data science and analytics, it concentrates on the fundamental building blocks needed to create the platforms on which data science runs.
I hope it will be helpful to other people too, and lightweight enough not to add to its readers' information overload.
Key Highlight
Narwhals moves fast, with an astonishing 26 releases in the last month.
This month the project further improved its PyArrow support: what was a limited proof of concept a month ago is now shaping into a much more feature-complete dataframe interface. In practice, this makes PyArrow tables queryable through the Acero engine built into PyArrow, using Polars dataframe syntax.
Polars also reached version 1.0.0 this month, celebrating its first major release.
News
Great Tables released version 0.10.0, with a few new features and support for the .show() method, which displays tables in rich format in a web browser when invoked from the console.
Shiny released version 1.0.0, a major milestone that brings 20 new features and marks the project as production ready. An interesting new feature is shiny.ui.Chat, which adds a chat UI component for response generation.
Substrait released version 0.21.0, updating the supported version of the Substrait format to 0.52.
Polars made 5 releases this month, reaching version 1.2.1. The main event is the first major release, 1.0.0, which brings many breaking changes but also 118 new features, most notably various improvements to its SQL support. Polars now also exposes the Arrow C Data interface natively, making it interoperable with any other library that supports Arrow through that interface.
Datashader 0.16.3 fixed some compatibility issues with newer versions of cuDF and GeoPandas.
PyScript 2024.7.1 removed the old pyweb API in favor of the pyscript.web API.
Dask made two releases this month, the most recent being 2024.7.1. Most notably, Dask dropped support for pandas versions older than 2.0.
Deltalake 0.18.2 introduced some major features, adding HDFS support and support for PyArrow expressions in filters.