# CLP for JSON logs
You can compress, decompress, and search JSON logs using the `clp-s` binary described below.
## Compression

Usage:

```shell
./clp-s c [<options>] <archives-dir> <input-path> [<input-path> ...]
```
- `archives-dir` is the directory that archives should be written to.
- `input-path` is a filesystem path or URL to either:
  - a newline-delimited JSON (NDJSON) log file (may be Zstd-compressed);
  - a KV-IR file (may be Zstd-compressed).
- `options` allow you to specify how data gets compressed into an archive. For example:
  - `--single-file-archive` specifies that single-file archives should be produced (i.e., each archive is a single file in `archives-dir`).
  - `--timestamp-key <field-path>` specifies which field should be treated as each log event's timestamp.
  - `--target-encoded-size <size>` specifies the threshold (in bytes) at which archives are split, where `size` is the total size of the dictionaries and encoded messages in an archive.
    - This option acts as a soft limit on memory usage for compression, decompression, and search.
    - This option significantly affects the compression ratio.
  - `--structurize-arrays` specifies that arrays should be fully parsed and array entries should be encoded into dedicated columns.
  - `--auth <s3|none>` specifies the authentication method that should be used for network requests when the input path is a URL.
    - When S3 authentication is enabled, `clp-s` issues a GET request following the AWS Signature Version 4 specification. The request uses the environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and, optionally, `AWS_SESSION_TOKEN` if it is set.
    - For more information on usage with S3, see our dedicated guide.

For a complete list of options, run `./clp-s c --help`.
### Examples
Compress `/mnt/logs/log1.json` and output archives to `/mnt/data/archives1`:

```shell
./clp-s c /mnt/data/archives1 /mnt/logs/log1.json
```
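If you don't have a log file on hand, you can create a small NDJSON input to experiment with. The file path and field values below are purely illustrative (they echo the search examples later on this page):

```shell
# Create a throwaway NDJSON log file: one complete JSON object per line.
# Path and field values are hypothetical, chosen to match this page's search examples.
printf '%s\n' \
  '{"ts": 1649923037, "level": "INFO", "message": "job started", "id": 22149}' \
  '{"ts": 1649923038, "level": "ERROR", "message": "job failed", "id": 22149}' \
  > /tmp/log1.json
wc -l < /tmp/log1.json  # two log events, one per line
```

You could then compress it with, e.g., `./clp-s c /mnt/data/archives1 /tmp/log1.json`.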
Treat the field `{"d": {"@timestamp": "..."}}` as each log event's timestamp:

```shell
./clp-s c --timestamp-key 'd.@timestamp' /mnt/data/archives1 /mnt/logs/log1.json
```
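The dot in `d.@timestamp` addresses a nested field. For clarity, here is a hypothetical log event matching that path (the file location and other fields are illustrative):

```shell
# A hypothetical log event whose timestamp lives at the nested path d.@timestamp.
printf '%s\n' '{"d": {"@timestamp": "2022-04-14T07:57:17Z"}, "message": "hello"}' \
  > /tmp/nested.json
# python3 is used here only as a portable JSON reader to confirm the nested key.
python3 -c 'import json; print(json.load(open("/tmp/nested.json"))["d"]["@timestamp"])'
# prints 2022-04-14T07:57:17Z
```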
> **Tip:** Specifying `--timestamp-key` will create a range index for the timestamp column, which can improve both compression ratio and search performance.
Compress a KV-IR file stored on S3 into a single-file archive:

```shell
AWS_ACCESS_KEY_ID='...' AWS_SECRET_ACCESS_KEY='...' \
  ./clp-s c --single-file-archive --auth s3 /mnt/data/archives \
  https://my-bucket.s3.us-east-2.amazonaws.com/kv-ir-log.clp
```
Set the target encoded size to 1 GiB and the compression level to 6 (3 by default):

```shell
./clp-s c \
    --target-encoded-size 1073741824 \
    --compression-level 6 \
    /mnt/data/archives1 \
    /mnt/logs/log1.json
```
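The value `1073741824` is simply 1 GiB expressed in bytes (1024³). Rather than typing such constants by hand, you can let the shell compute them:

```shell
# 1 GiB in bytes, computed with shell arithmetic.
target_size=$((1024 * 1024 * 1024))
echo "$target_size"  # prints 1073741824
```

The result can then be passed as `--target-encoded-size "$target_size"`.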
## Decompression

Usage:

```shell
./clp-s x [<options>] <archives-path> <output-dir>
```

- `archives-path` is a directory containing archives, a path to an archive, or a URL pointing to a single-file archive.
- `output-dir` is the directory that decompressed logs should be written to.
- `options` allow you to specify things like a specific archive (from within `archives-path`, if it is a directory) to decompress (`--archive-id <archive-id>`).
  - For a complete list, run `./clp-s x --help`.
### Examples

Decompress all logs from `/mnt/data/archives1` into `/mnt/data/archives1-decomp`:

```shell
./clp-s x /mnt/data/archives1 /mnt/data/archives1-decomp
```
## Search

Usage:

```shell
./clp-s s [<options>] <archives-path> <kql-query>
```

- `archives-path` is a directory containing archives, a path to an archive, or a URL pointing to a single-file archive.
- `kql-query` is a KQL query.
- `options` allow you to specify things like a specific archive (from within `archives-path`, if it is a directory) to search (`--archive-id <archive-id>`).
  - For a complete list, run `./clp-s s --help`.
### Examples

Find all log events within a time range:

```shell
./clp-s s /mnt/data/archives1 'ts >= 1649923037 AND ts <= 1649923038'
```

or:

```shell
./clp-s s /mnt/data/archives1 \
    'ts >= date("2022-04-14T07:57:17") AND ts <= date("2022-04-14T07:57:18")'
```
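The two queries above are equivalent: `1649923037` is the Unix timestamp for `2022-04-14T07:57:17` UTC. If you need to convert between the two forms, the standard `date` utility can help (GNU `date` flags shown; BSD/macOS `date` uses different options):

```shell
# Epoch seconds -> ISO-8601 UTC string (GNU date).
date -u -d @1649923037 +%Y-%m-%dT%H:%M:%S  # prints 2022-04-14T07:57:17
# ISO-8601 UTC string -> epoch seconds.
date -u -d '2022-04-14T07:57:18' +%s       # prints 1649923038
```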
Find log events with a given key-value pair:

```shell
./clp-s s /mnt/data/archives1 'id: 22149'
```

Find ERROR log events containing a substring:

```shell
./clp-s s /mnt/data/archives1 'level: ERROR AND message: "job*"'
```

Find FATAL or ERROR log events, ignoring case distinctions between values in the query and the compressed data:

```shell
./clp-s s --ignore-case /mnt/data/archives1 'level: FATAL OR level: ERROR'
```
## Current limitations

- `clp-s` currently only supports valid JSON logs; it does not handle JSON logs with trailing commas or other JSON syntax errors.
- Time zone information is not preserved.
- The order of log events is not preserved.
- The input directory structure is not preserved; during decompression, all log events are written to a single file.
- In addition, there are a few limitations, related to querying arrays, described in the search syntax reference.
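Because `clp-s` rejects inputs with JSON syntax errors, it can be worth validating a file before compressing it. Below is a minimal sketch using Python's standard `json` module; the sample file and its contents are hypothetical (any line-by-line JSON validator would do):

```shell
# Create a deliberately broken two-line sample, then report lines that fail to parse.
printf '%s\n' '{"level": "INFO", "msg": "ok"}' 'not valid json' > sample.ndjson

python3 - sample.ndjson <<'EOF'
import json
import sys

bad = 0
with open(sys.argv[1]) as f:
    for lineno, line in enumerate(f, 1):
        try:
            json.loads(line)
        except json.JSONDecodeError:
            print(f"line {lineno} is not valid JSON")
            bad += 1
print(f"{bad} invalid line(s) found")
EOF
```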
