Datasets

This page lists a few large log datasets you can use to try out CLP and to evaluate its compression ratio against other tools. Each dataset is gzipped to reduce its download size. We will upload more datasets over time.

For evaluation results comparing CLP and other tools, see our paper.

| Dataset           | Format | Uncompressed size | Download size |
|-------------------|--------|-------------------|---------------|
| hadoop-14TB-part1 | Text   | 428.94 GB         | 20.33 GB      |
| openstack-24hr    | Text   | 33.00 GB          | 2.06 GB       |
| hive-24hr         | Text   | 2.07 GB           | 122.54 MB     |
| mongodb           | JSON   | 64.80 GB          | 1.48 GB       |
| cockroachdb       | JSON   | 9.79 GB           | 528.97 MB     |
| elasticsearch     | JSON   | 7.98 GB           | 165.91 MB     |
| spark-event-logs  | JSON   | 1.98 GB           | 211.88 MB     |
| postgresql        | JSON   | 392.84 MB         | 14.59 MB      |

hadoop-14TB-part1 is only the first part of that dataset; we will upload the other parts soon.
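
As a rough baseline when comparing tools, the sizes listed on this page already imply the compression ratio that gzip achieves on each dataset (uncompressed size divided by download size). The following is a minimal sketch of that arithmetic, not part of CLP itself. The names `DATASETS`, `to_mb`, and `gzip_ratio` are hypothetical, and the snippet assumes binary units (1 GB = 1024 MB), which this page does not state; the assumption only affects rows whose two sizes use different units.

```python
# Sizes copied from this page: (uncompressed, download), each as (value, unit).
DATASETS = {
    "hadoop-14TB-part1": ((428.94, "GB"), (20.33, "GB")),
    "openstack-24hr": ((33.00, "GB"), (2.06, "GB")),
    "hive-24hr": ((2.07, "GB"), (122.54, "MB")),
    "mongodb": ((64.80, "GB"), (1.48, "GB")),
    "cockroachdb": ((9.79, "GB"), (528.97, "MB")),
    "elasticsearch": ((7.98, "GB"), (165.91, "MB")),
    "spark-event-logs": ((1.98, "GB"), (211.88, "MB")),
    "postgresql": ((392.84, "MB"), (14.59, "MB")),
}

# Assumption: binary units. The page does not say which convention it uses.
UNIT_IN_MB = {"MB": 1.0, "GB": 1024.0}


def to_mb(size):
    """Convert a (value, unit) pair to megabytes."""
    value, unit = size
    return value * UNIT_IN_MB[unit]


def gzip_ratio(name):
    """Return uncompressed size divided by gzipped download size."""
    uncompressed, download = DATASETS[name]
    return to_mb(uncompressed) / to_mb(download)


# Print one ratio per dataset, in the order the page lists them.
for name in DATASETS:
    print(f"{name:20s} {gzip_ratio(name):6.2f}x")
```

These ratios describe gzip only; to evaluate CLP, compress the extracted logs with CLP and divide the uncompressed size by the size of the resulting archive in the same way.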