Quickwit is an open-source search engine developed by a Japanese startup that has caught the attention of Binance, a major cryptocurrency exchange. Binance has migrated its log storage system from Elasticsearch to Quickwit, only six months after the Binance team first met with Quickwit.
Binance's applications were logging up to 21 million lines per second, equivalent to 1.6 PB of data per day. The team initially ran 600 Vector instances that pulled logs from Kafka and pushed them to Elasticsearch across 20 clusters. Managing that many Elasticsearch clusters proved difficult: they required large amounts of storage, and data ingestion limitations meant the data could not be replicated.
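To put these figures on a common scale, the short sketch below converts the daily volume into a per-second rate and an implied average line size. It assumes decimal units (1 PB = 10^15 bytes) and, for the line-size estimate, that the peak rate of 21 million lines per second held all day; neither assumption is stated in the source, so the results are rough.

```python
# Back-of-the-envelope conversion of the figures quoted above.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 GB = 10**9 bytes).
# Assumption: the peak line rate is sustained all day, which makes the
# average line size a rough estimate rather than a measured value.

SECONDS_PER_DAY = 24 * 60 * 60

LINES_PER_SECOND = 21_000_000   # "up to 21 million lines per second"
BYTES_PER_DAY = 1.6e15          # "1.6 PB of data per day"


def bytes_per_second(bytes_per_day: float) -> float:
    """Average ingest rate in bytes per second."""
    return bytes_per_day / SECONDS_PER_DAY


def average_line_bytes(bytes_per_day: float, lines_per_second: float) -> float:
    """Implied average size of one log line in bytes."""
    return bytes_per_second(bytes_per_day) / lines_per_second


throughput_gb_s = bytes_per_second(BYTES_PER_DAY) / 1e9
line_bytes = average_line_bytes(BYTES_PER_DAY, LINES_PER_SECOND)

print(f"Average ingest rate: {throughput_gb_s:.1f} GB/s")
print(f"Implied average line size: {line_bytes:.0f} bytes")
```

Under these assumptions the load works out to roughly 18.5 GB/s, with log lines a little under 900 bytes each.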
Quickwit appealed to the team because it can pull logs directly from Kafka, supports VRL (Vector Remap Language) for transforming log data, compresses data, and keeps data in object storage, so the team no longer has to manage storage clusters itself.
After selecting Quickwit, the team began testing by having it ingest several gigabytes of data per second. The system became unstable because the cluster protocol could not sustain the indexer workload when data was received across many pods. The team worked around this by dividing the indexers into 10 sub-clusters by topic, and found that they could handle the full 1.6 PB of data per day with 700 pods totaling 2,800 vCPUs for ingestion.
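The sizing above can be broken down per pod and per vCPU with some simple arithmetic. This is a sketch only: it assumes decimal units and an even spread of load across pods and sub-clusters, whereas the article says the split was by topic, so real sub-clusters are likely uneven.

```python
# Per-pod and per-vCPU breakdown of the indexing fleet described above.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 MB = 10**6 bytes).
# Assumption: load is spread evenly; the real split is by Kafka topic.

SECONDS_PER_DAY = 24 * 60 * 60

BYTES_PER_DAY = 1.6e15   # 1.6 PB per day
PODS = 700
VCPUS = 2_800
SUB_CLUSTERS = 10

total_bytes_s = BYTES_PER_DAY / SECONDS_PER_DAY

vcpus_per_pod = VCPUS / PODS
pods_per_cluster = PODS // SUB_CLUSTERS
mb_s_per_pod = total_bytes_s / PODS / 1e6
mb_s_per_vcpu = total_bytes_s / VCPUS / 1e6
gb_s_per_cluster = total_bytes_s / SUB_CLUSTERS / 1e9

print(f"vCPUs per pod:          {vcpus_per_pod:.0f}")
print(f"Pods per sub-cluster:   {pods_per_cluster}")
print(f"Ingest per pod:         {mb_s_per_pod:.1f} MB/s")
print(f"Ingest per vCPU:        {mb_s_per_vcpu:.1f} MB/s")
print(f"Ingest per sub-cluster: {gb_s_per_cluster:.2f} GB/s")
```

On these assumptions each pod has 4 vCPUs and handles about 26 MB/s, or about 6.6 MB/s per vCPU.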
Binance runs 10 PostgreSQL clusters, one for each indexer cluster, then consolidates these databases into a single cluster that the searcher clusters use to search the data.
Overall, Binance uses up to 5 times less CPU than it did with Elasticsearch and 20 times less storage. The team also plans to extend the log retention period in the future.
Quickwit plans to further improve data compression and multi-cluster management for deployments of this kind.
TLDR: Binance migrated its log storage from Elasticsearch to Quickwit, attracted by its direct Kafka log extraction, VRL support, and use of object storage. The move cut CPU usage by up to 5 times and storage usage by 20 times. Quickwit plans to improve data compression and multi-cluster management in the future.