ParadeDB, the maker of a PostgreSQL distribution, has introduced pg_bm25, an extension for building a search engine inside PostgreSQL, with the aim of replacing Elasticsearch. Unlike search approaches that only check whether keywords match, pg_bm25 ranks results with BM25, a scoring function that rewards documents in which the query keywords occur more often, gives extra weight to keywords that are rare across the collection, and favors shorter documents. Elasticsearch also uses BM25 to rank retrieved documents.
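The BM25 scoring described above can be sketched in a few lines of Python. This is a minimal illustration of one common BM25 variant, not pg_bm25's or Tantivy's actual implementation; the function name `bm25_scores`, the toy documents, and the parameter values `k1=1.2` and `b=0.75` (commonly used defaults) are assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms.

    Hypothetical helper for illustration: `documents` is a list of
    token lists, and the result is one score per document.
    """
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            # Rare terms (small df) get a larger inverse document frequency.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Longer-than-average documents are penalized through `b`.
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        ["postgres", "search"],
        ["postgres", "search", "engine", "with", "many", "extra", "words"],
        ["postgres", "rust"],
    ]
    for doc, score in zip(docs, bm25_scores(["search"], docs)):
        print(f"{score:.3f}  {' '.join(doc)}")
```

With the toy data, the two documents containing "search" each match once, but the shorter one scores higher, and a term that appears in only one document contributes more than a term that appears in all of them.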
The project is built on Tantivy, a search library written in Rust that offers functionality similar to Apache Lucene, the engine used by Elasticsearch. It also uses the pgrx framework for developing PostgreSQL extensions in Rust, and introduces a new operator, ‘@@@’, modeled on PostgreSQL’s own ‘@@’ text search match operator.
There are currently two ways to install pg_bm25: compiling it yourself or using the convenient ParadeDB package.
TLDR: ParadeDB has introduced pg_bm25, a PostgreSQL extension for building a search engine intended to replace Elasticsearch. The extension ranks results with BM25, which scores documents by keyword frequency and gives extra weight to less common keywords and shorter documents. It is built on the Tantivy library, which is inspired by Apache Lucene, and uses the pgrx framework for writing PostgreSQL extensions in Rust. It can be installed by compiling it manually or by using the ParadeDB package.