ParadeDB, the developer of a PostgreSQL distribution for data analysis, has introduced pg_lakehouse, an extension that adds features allowing PostgreSQL to be used in place of specialized analytics databases such as DuckDB.
One key feature of pg_lakehouse is the ability to pull external data into PostgreSQL as tables. That data is fed into Apache DataFusion, an analytics query engine with performance comparable to DuckDB. Similar extensions already existed, but pg_lakehouse uses Apache OpenDAL, a data access layer, to support a variety of file formats. If a query cannot be handled by DataFusion, it falls back to the regular PostgreSQL engine.
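The fallback behavior described above can be pictured with a small toy sketch. This is not how pg_lakehouse is implemented: the function names, the `UnsupportedQuery` exception, and the "only SELECT statements are supported" rule are all hypothetical, chosen purely to illustrate the try-the-fast-engine-first pattern.

```python
from typing import Callable


class UnsupportedQuery(Exception):
    """Raised by the toy analytics engine for queries it cannot run (hypothetical)."""


def analytics_engine(query: str) -> str:
    # Hypothetical stand-in for an analytics engine such as DataFusion.
    # Assumption for this toy only: it accepts SELECT statements and nothing else.
    if not query.lstrip().upper().startswith("SELECT"):
        raise UnsupportedQuery(query)
    return "analytics"


def postgres_engine(query: str) -> str:
    # Hypothetical stand-in for the regular PostgreSQL executor, which accepts everything.
    return "postgres"


def route(
    query: str,
    primary: Callable[[str], str] = analytics_engine,
    fallback: Callable[[str], str] = postgres_engine,
) -> str:
    """Try the primary engine first; fall back if it rejects the query."""
    try:
        return primary(query)
    except UnsupportedQuery:
        return fallback(query)


if __name__ == "__main__":
    for q in ("SELECT count(*) FROM trips", "VACUUM trips"):
        print(q, "->", route(q))
```

The point of the pattern is that callers issue ordinary queries and never need to know which engine answered them.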
Currently, pg_lakehouse supports only read queries. ParadeDB plans to add support for Apache Iceberg tables and additional file formats in the future.
pg_lakehouse is licensed under the AGPL, unlike PostgreSQL itself, whose PostgreSQL License is more permissive and places few conditions on modifying and redistributing the code. The AGPL, by contrast, requires that any modified code be distributed under the same license.
TLDR: ParadeDB introduces pg_lakehouse, an extension that lets PostgreSQL run analytical queries over external data in various file formats, with Apache Iceberg support and more formats planned.