# CLP for JSON logs
You can compress, decompress, and search JSON logs using the `clp-s` binary, as described below.
## Compression
Usage:

```bash
./clp-s c [<options>] <archives-dir> <input-path> [<input-path> ...]
```

* `archives-dir` is the directory that archives should be written to.
* `input-path` is any new-line-delimited JSON (ndjson) log file or directory containing such files.
* `options` allow you to specify things like which field should be considered as the log event's timestamp
  (`--timestamp-key <field-path>`), or whether to fully parse array entries and encode them into dedicated columns
  (`--structurize-arrays`). For a complete list, run `./clp-s c --help`.
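To illustrate the expected input format, here is a minimal sketch that writes two ndjson log events (one complete JSON object per line) to a file and compresses them. The file path, field names, and values are only examples, not part of `clp-s` itself:

```bash
# Write two hypothetical ndjson log events: one complete JSON object per line.
cat > /tmp/sample.json << 'EOF'
{"ts": 1649923037, "level": "INFO", "message": "job started", "id": 22149}
{"ts": 1649923038, "level": "ERROR", "message": "job failed", "id": 22149}
EOF

# Compress them into a new archives directory.
mkdir -p /tmp/archives
./clp-s c /tmp/archives /tmp/sample.json
```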
### Examples
Compress `/mnt/logs/log1.json` and output archives to `/mnt/data/archives1`:

```bash
./clp-s c /mnt/data/archives1 /mnt/logs/log1.json
```
Treat the field `{"d": {"@timestamp": "..."}}` as each log event's timestamp:

```bash
./clp-s c --timestamp-key 'd.@timestamp' /mnt/data/archives1 /mnt/logs/log1.json
```

> **Tip:** Specifying the timestamp key creates a range index for the timestamp column, which can improve the compression ratio and search performance.
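For example, a log event shaped like the one above could be compressed as follows; the input file and the fields other than `d.@timestamp` are hypothetical, used only to show the dot-separated field-path notation:

```bash
# A hypothetical event whose timestamp lives at the nested path d.@timestamp.
cat > /tmp/nested.json << 'EOF'
{"d": {"@timestamp": "2022-04-14T07:57:17"}, "level": "INFO", "message": "job started"}
EOF

# Point --timestamp-key at the nested field using dot-separated path notation.
./clp-s c --timestamp-key 'd.@timestamp' /mnt/data/archives1 /tmp/nested.json
```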
Set the target encoded size to 1 GiB and the compression level to 6 (the default is 3):

```bash
./clp-s c \
    --target-encoded-size 1073741824 \
    --compression-level 6 \
    /mnt/data/archives1 \
    /mnt/logs/log1.json
```
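The value 1073741824 is simply 1 GiB expressed in bytes (1024 × 1024 × 1024). If you prefer not to hard-code it, you can let the shell compute it:

```bash
# 1 GiB = 1024 * 1024 * 1024 = 1073741824 bytes.
TARGET_SIZE=$((1024 * 1024 * 1024))

./clp-s c \
    --target-encoded-size "$TARGET_SIZE" \
    --compression-level 6 \
    /mnt/data/archives1 \
    /mnt/logs/log1.json
```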
## Decompression
Usage:

```bash
./clp-s x [<options>] <archives-dir> <output-dir>
```

* `archives-dir` is a directory containing archives.
* `output-dir` is the directory that decompressed logs should be written to.
* `options` allow you to specify things like a specific archive (from within `archives-dir`) to decompress
  (`--archive-id <archive-id>`). For a complete list, run `./clp-s x --help`.
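As a sketch of the `--archive-id` option, the following decompresses a single archive. It assumes that you can read archive IDs from the contents of `archives-dir`; the ID shown below is a placeholder, not a real archive:

```bash
# Look inside the archives directory to find archive IDs
# (assumption: archives are stored under archives-dir and identifiable by ID).
ls /mnt/data/archives1

# Decompress only the chosen archive (replace the placeholder ID with a real one).
./clp-s x --archive-id 44a4d378-9d5d-4e51-8d56-cbde0e570678 \
    /mnt/data/archives1 /mnt/data/archives1-decomp
```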
### Examples
Decompress all logs from `/mnt/data/archives1` into `/mnt/data/archives1-decomp`:

```bash
./clp-s x /mnt/data/archives1 /mnt/data/archives1-decomp
```
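To sanity-check the result, you can inspect the output directory with ordinary shell tools; this assumes only that the decompressed logs are plain text files:

```bash
# List the decompressed output and count the recovered log events.
ls -lh /mnt/data/archives1-decomp
wc -l /mnt/data/archives1-decomp/*
```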
## Search
Usage:

```bash
./clp-s s [<options>] <archives-dir> <kql-query>
```

* `archives-dir` is a directory containing archives.
* `kql-query` is a KQL query.
* `options` allow you to specify things like a specific archive (from within `archives-dir`) to search
  (`--archive-id <archive-id>`). For a complete list, run `./clp-s s --help`.
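Analogously to decompression, `--archive-id` restricts a search to a single archive; the ID below is a placeholder you would replace with a real one from `archives-dir`:

```bash
# Search only one archive for ERROR-level events (the archive ID is hypothetical).
./clp-s s --archive-id 44a4d378-9d5d-4e51-8d56-cbde0e570678 \
    /mnt/data/archives1 'level: ERROR'
```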
### Examples
Find all log events within a time range:

```bash
./clp-s s /mnt/data/archives1 'ts >= 1649923037 AND ts <= 1649923038'
```

or

```bash
./clp-s s /mnt/data/archives1 \
    'ts >= date("2022-04-14T07:57:17") AND ts <= date("2022-04-14T07:57:18")'
```
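The two forms describe the same range: 1649923037 is the Unix epoch timestamp for 2022-04-14T07:57:17, assuming the timestamps are interpreted as UTC. With GNU `date` you can convert between the two representations:

```bash
# Convert a human-readable UTC timestamp to epoch seconds (prints 1649923037).
date -u -d '2022-04-14T07:57:17' +%s

# And back again (prints 2022-04-14T07:57:17).
date -u -d @1649923037 +%Y-%m-%dT%H:%M:%S
```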
Find log events with a given key-value pair:

```bash
./clp-s s /mnt/data/archives1 'id: 22149'
```
Find ERROR log events containing a substring:

```bash
./clp-s s /mnt/data/archives1 'level: ERROR AND message: "job*"'
```
Find FATAL or ERROR log events and ignore case distinctions between values in the query and the compressed data:

```bash
./clp-s s --ignore-case /mnt/data/archives1 'level: FATAL OR level: ERROR'
```
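Search results compose with standard shell tools; the sketch below assumes that matching log events are written to stdout, one per line:

```bash
# Count how many FATAL or ERROR events match, regardless of case.
./clp-s s --ignore-case /mnt/data/archives1 'level: FATAL OR level: ERROR' | wc -l

# Keep the first ten matches for a quick look.
./clp-s s --ignore-case /mnt/data/archives1 'level: FATAL OR level: ERROR' | head -n 10
```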
## Current limitations
* `clp-s` currently only supports valid JSON logs; it does not handle JSON logs with trailing commas or other JSON syntax errors.
* Time zone information is not preserved.
* The order of log events is not preserved.
* The input directory structure is not preserved; during decompression, all logs are written to the same file.
* In addition, there are a few limitations, related to querying arrays, described in the search syntax reference.