ClickHouse published a blog post on strategies for improving database performance on machines with slow disks. Using less conventional techniques, in particular lazy materialization, which loads only the data a query actually needs, ClickHouse can answer queries quickly.
The example machine is an m6i.8xlarge instance on AWS with SSD storage limited to 3,000 IOPS and 125 MiB/s of throughput. Using a dataset of 150 million reviews as an example, queries on small columns such as the usefulness ratings are fast because there is little data to read. Queries that read large columns such as the review text, however, slow down significantly (to roughly 3 minutes), because most of the time is spent reading data from disk.
ClickHouse's primary approach is PREWHERE: it first loads only the columns needed to check the query conditions. These columns are typically small, so they can be scanned quickly. Rows that fail the conditions are then skipped, and their remaining columns are never loaded into memory. As a result, ClickHouse loads the full data only for rows that meet the conditions.
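The two-phase read behind PREWHERE can be illustrated with a small Python sketch. This is not ClickHouse code. The column names (`helpful_votes`, `review_body`), the `read_value` helper, and the row-by-row granularity are all assumptions made for illustration, and a real engine is likely to work on blocks of rows rather than single rows. The sketch only shows that the large column is touched for matching rows alone.

```python
# Hypothetical in-memory "column store": every column is stored separately.
columns = {
    "helpful_votes": [3, 120, 0, 45, 7],  # small column used in the filter
    "review_body": ["ok", "great, long review...", "bad", "detailed write-up...", "meh"],
}

# Counts how many values were read from each column (stands in for disk I/O).
reads = {"helpful_votes": 0, "review_body": 0}


def read_value(column, row):
    """Read one value from a column and record the access."""
    reads[column] += 1
    return columns[column][row]


def prewhere_scan(filter_col, predicate, payload_col):
    """Evaluate the filter on the small column first, then load the payload."""
    row_count = len(columns[filter_col])
    # Phase 1: read only the filter column and remember which rows match.
    matching = [r for r in range(row_count) if predicate(read_value(filter_col, r))]
    # Phase 2: read the large column only for rows that passed the filter.
    return [(r, read_value(payload_col, r)) for r in matching]


result = prewhere_scan("helpful_votes", lambda v: v >= 40, "review_body")
print(result)
print(reads)
```

In this toy run the filter column is read for all five rows, while the large text column is read only for the two rows that pass the condition.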
Lazy materialization works similarly: ClickHouse first loads the columns needed to identify the highest-rated reviews and loads the remaining columns only once it is certain that their data is needed. Together, these optimizations reduced the runtime of the sample query ClickHouse provided from 220 seconds to 181 ms, with the SQL query itself unchanged.
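The idea of deferring the large column until the final rows are known can be sketched in Python as a top-N query. This is again a simplified illustration, not ClickHouse's implementation. The data, the `read_body` helper, and the use of `heapq.nlargest` for ranking are assumptions chosen for this example.

```python
import heapq

# Hypothetical data: a small numeric column and a large text column.
votes = [3, 120, 0, 45, 7, 98]
bodies = [f"review text {i}" for i in range(len(votes))]

# Records which rows had their large column loaded.
body_reads = []


def read_body(row):
    """Load the large column for one row and record the access."""
    body_reads.append(row)
    return bodies[row]


def top_n_lazy(n):
    """Rank rows on the small column, then materialize text for the winners only."""
    # Step 1: find the row numbers of the n highest vote counts.
    top_rows = heapq.nlargest(n, range(len(votes)), key=votes.__getitem__)
    # Step 2: only now load the large column, and only for those rows.
    return [(votes[r], read_body(r)) for r in top_rows]


top = top_n_lazy(2)
print(top)
print(body_reads)
```

Only two of the six review texts are ever loaded here, which is the effect the post attributes to lazy materialization: the expensive column is read after it is clear which rows end up in the result.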
TL;DR: ClickHouse describes how techniques such as PREWHERE and lazy materialization significantly speed up queries on machines with slow disks.