Compressed Text with Random Access
Prof. Peter Boncz, VU Amsterdam
Wednesday, June 29, 2022 @ 2:00 pm
Room BC 420
Hosted by: Prof. Anastasia Ailamaki
Abstract
Text strings are very common in real-world data sets, where they often occupy a large fraction of the space and are relatively slow for data systems to process. Fast Static Symbol Table (FSST) is a new lightweight compression scheme for such strings. FSST offers compression and decompression speeds similar to or better than the best speed-optimised compression methods, such as LZ4, while giving a better compression ratio. Moreover, its use of a static symbol table allows random access to individual compressed strings, enabling lazy decompression as well as processing directly on compressed data. These features make FSST a valuable piece in the standard compression toolbox.
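To make the random-access idea in the abstract concrete, here is a minimal Python sketch of compression with a static symbol table. It is an illustration only, not the FSST implementation: the symbol table is hand-picked rather than learned from the data, the matching is a simple greedy longest-match, and the names `SYMBOLS`, `ESCAPE`, `compress` and `decompress` are hypothetical. The only property it tries to show is that every string is encoded against the same fixed table, so any single string can be decoded without touching its neighbours.

```python
# Toy static-symbol-table codec (illustrative assumption, not the FSST code).
# Each symbol is a short byte string that is replaced by a one-byte code.
ESCAPE = 255  # reserved code meaning "the next byte is a literal"

# Hand-picked table for this example; a real scheme would learn it from data.
SYMBOLS = [b"http://", b"www.", b".com", b".org", b"e"]


def compress(text: bytes, symbols=SYMBOLS) -> bytes:
    """Greedily replace the longest matching symbol with its one-byte code."""
    lookup = {sym: code for code, sym in enumerate(symbols)}
    max_len = max((len(s) for s in symbols), default=0)
    out = bytearray()
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            code = lookup.get(text[pos:pos + length])
            if code is not None:
                out.append(code)
                pos += length
                break
        else:
            # No symbol matches here: emit the escape code plus the raw byte.
            out.append(ESCAPE)
            out.append(text[pos])
            pos += 1
    return bytes(out)


def decompress(data: bytes, symbols=SYMBOLS) -> bytes:
    """Expand codes back into symbols; needs only the table, no other strings."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESCAPE:
            out.append(data[i + 1])
            i += 2
        else:
            out += symbols[data[i]]
            i += 1
    return bytes(out)


# Each string is compressed independently against the same static table.
urls = [b"http://www.example.com", b"http://duckdb.org", b"http://www.cwi.nl"]
compressed = [compress(u) for u in urls]

# Random access: decode only the second string.
second = decompress(compressed[1])

# Equality search on compressed data: compress the constant once, then
# compare bytes. This works here because the toy encoder is deterministic.
needle = compress(b"http://duckdb.org")
matches = [i for i, c in enumerate(compressed) if c == needle]
```

In this sketch `second` equals the original second URL, and `matches` holds the positions whose compressed bytes equal the compressed search constant. A block-oriented compressor would instead have to decompress an entire block before returning any one string, which is the contrast the abstract draws.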
Bio
Peter Boncz is a professor at VU Amsterdam and a researcher at CWI, where he oversees three research groups. He received the 2009 VLDB 10 Years Best Paper Award for his work on database architecture for modern computer hardware (also the theme of the DaMoN workshop he co-founded in 2005). In 2013 he received a Humboldt Research Award and became a fellow at TU Munich. He also works on graph data management, co-founding and chairing the Linked Data Benchmark Council (LDBC), a benchmarking organization for graph database systems. His PhD thesis project, advised by Martin Kersten, yielded MonetDB, a pioneering column store which eventually won the 2016 ACM SIGMOD Systems Award. In 2008 he co-founded Vectorwise, based on the concept of vectorized query execution that has since been broadly adopted in industry. He received the Dutch ICT Regie Award 2006 for his role in the CWI spin-off Data Distilleries. He advises industry, including Databricks, which now has a presence in Amsterdam, and the latest CWI spin-off, built around the highly popular embedded analytics system DuckDB.