CLP for JSON logs

You can compress, decompress, and search JSON logs using the clp-s binary described below.

Compression

Usage:

./clp-s c [<options>] <archives-dir> <input-path> [<input-path> ...]
  • archives-dir is the directory that archives should be written to.

  • input-path is any newline-delimited JSON (ndjson) log file or a directory containing such files.

  • options allow you to specify things like which field to treat as the log event’s timestamp (--timestamp-key <field-path>), or whether to fully parse array entries and encode them into dedicated columns (--structurize-arrays).

    • For a complete list, run ./clp-s c --help

Examples

Compress /mnt/logs/log1.json and output archives to /mnt/data/archives1:

./clp-s c /mnt/data/archives1 /mnt/logs/log1.json

Treat the field {"d": {"@timestamp": "..."}} as each log event’s timestamp:

./clp-s c --timestamp-key 'd.@timestamp' /mnt/data/archives1 /mnt/logs/log1.json

Tip

Specifying --timestamp-key creates a range index on the timestamp column, which can improve both compression ratio and search performance.
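
To make the field path concrete, the following sketch writes a two-line ndjson file whose events nest the timestamp under d.@timestamp, matching the example above. The file contents, field values, and the sample_log variable are made up for illustration, and the sketch does not check whether clp-s parses this particular timestamp format. The final command is only printed, not run.

```shell
# Hypothetical sample input: one JSON object per line (ndjson).
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
{"d": {"@timestamp": "2024-01-01T00:00:00Z"}, "level": "INFO", "msg": "service started"}
{"d": {"@timestamp": "2024-01-01T00:00:05Z"}, "level": "WARN", "msg": "disk usage high"}
EOF

# Each line is one log event; count them.
event_count="$(wc -l < "$sample_log" | tr -d ' ')"
echo "events: $event_count"

# The command that would compress this file (printed here rather than executed).
echo "./clp-s c --timestamp-key 'd.@timestamp' /mnt/data/archives1 $sample_log"
```

The dot in 'd.@timestamp' separates the outer key d from the inner key @timestamp, which is why the path is quoted as a single argument.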

Set the target encoded size to 1 GiB and the compression level to 6 (3 by default):

./clp-s c \
    --target-encoded-size 1073741824 \
    --compression-level 6 \
    /mnt/data/archives1 \
    /mnt/logs/log1.json
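
The value 1073741824 in the command above is 1 GiB written out as a plain number (1024 × 1024 × 1024). A small sketch of deriving it with shell arithmetic rather than typing it by hand follows; the variable name is illustrative, and the command is printed, not run.

```shell
# 1 GiB = 1024 * 1024 * 1024.
target_encoded_size=$((1024 * 1024 * 1024))
echo "$target_encoded_size"

# Print the compression command from the example with the computed value.
echo "./clp-s c --target-encoded-size $target_encoded_size --compression-level 6 /mnt/data/archives1 /mnt/logs/log1.json"
```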

Decompression

Usage:

./clp-s x [<options>] <archives-dir> <output-dir>
  • archives-dir is a directory containing archives.

  • output-dir is the directory that decompressed logs should be written to.

  • options allow you to specify things like which archive (from within archives-dir) to decompress (--archive-id <archive-id>).

    • For a complete list, run ./clp-s x --help

Examples

Decompress all logs from /mnt/data/archives1 into /mnt/data/archives1-decomp:

./clp-s x /mnt/data/archives1 /mnt/data/archives1-decomp

Current limitations

  • clp-s currently only supports valid JSON logs; it does not handle JSON logs with trailing commas or other JSON syntax errors.

  • Time zone information is not preserved.

  • The order of log events is not preserved.

  • The input directory structure is not preserved; during decompression, the contents of all input files are written to the same file.

  • In addition, a few limitations related to querying arrays are described in the search syntax reference.
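
Because clp-s only accepts valid JSON, it can help to check input files before compressing them. The sketch below is one rough way to do that, assuming python3 is on the PATH. It is not part of clp-s, the function name find_invalid_json_lines is hypothetical, and Python's parser is slightly more lenient than strict JSON (it accepts NaN and Infinity, for instance).

```shell
# Hypothetical helper: print the line numbers of lines that are not valid JSON.
# Assumes python3 is available; blank lines are skipped.
find_invalid_json_lines() {
    line_no=0
    while IFS= read -r line || [ -n "$line" ]; do
        line_no=$((line_no + 1))
        [ -z "$line" ] && continue
        if ! printf '%s\n' "$line" | python3 -m json.tool > /dev/null 2>&1; then
            echo "$line_no"
        fi
    done < "$1"
}

# Demo on a temporary file: line 2 has a trailing comma.
demo_log="$(mktemp)"
printf '%s\n' '{"level": "INFO", "msg": "ok"}' '{"level": "WARN", "msg": "bad",}' > "$demo_log"
invalid_lines="$(find_invalid_json_lines "$demo_log")"
echo "invalid lines: $invalid_lines"
```

Starting one Python process per line is slow on large files, so this is better suited to spot checks than to full log directories.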