Using CLP with object storage#

To compress logs from S3, follow the steps in the section below. For all other operations, you should be able to use CLP as described in the clp-json quick-start guide.

Compressing logs from S3#

To compress logs from S3, use the sbin/compress-from-s3.sh script. The script supports two modes of operation:

s3-object compression mode#

The s3-object mode allows you to specify individual S3 objects to compress by using their full URLs. To use this mode, call the sbin/compress-from-s3.sh script as follows, and replace the fields in angle brackets (<>) with the appropriate values:

sbin/compress-from-s3.sh \
  --timestamp-key <timestamp-key> \
  --dataset <dataset-name> \
  s3-object \
  <object-url> [<object-url> ...]
  • <object-url> is a URL identifying the S3 object to compress. It can be written in either of two formats:

    • https://<bucket-name>.s3.<region-code>.amazonaws.com/<object-key>

    • https://s3.<region-code>.amazonaws.com/<bucket-name>/<object-key>

  • The fields in <object-url> are as follows:

    • <bucket-name> is the name of the S3 bucket containing your logs.

    • <region-code> is the AWS region code for the S3 bucket containing your logs.

    • <object-key> is the object key of the log file object you wish to compress.

      Warning

      There must be no duplicate object keys across all <object-url> arguments.

  • For a description of other fields, see the clp-json quick-start guide.

Instead of specifying input object URLs explicitly in the command, you may specify them in a text file and then pass the file into the command using the --inputs-from flag, like so:

sbin/compress-from-s3.sh \
  --timestamp-key <timestamp-key> \
  --dataset <dataset-name> \
  s3-object \
  --inputs-from <input-file>
  • <input-file> is a path to a text file containing one S3 object URL per line. The URLs must follow the same format as described above for <object-url>.

Note

The s3-object mode requires the input object keys to share a non-empty common prefix. If the input object keys do not share a common prefix, they will be rejected and no compression job will be created. This limitation will be addressed in a future release.

s3-key-prefix compression mode#

The s3-key-prefix mode allows you to compress all objects under a given S3 key prefix. To use this mode, call the sbin/compress-from-s3.sh script as follows, and replace the fields in angle brackets (<>) with the appropriate values:

sbin/compress-from-s3.sh \
  --timestamp-key <timestamp-key> \
  --dataset <dataset-name> \
  s3-key-prefix \
  <key-prefix-url>
  • <key-prefix-url> is a URL identifying the S3 key prefix to compress. It can be written in either of two formats:

    • https://<bucket-name>.s3.<region-code>.amazonaws.com/<key-prefix>

    • https://s3.<region-code>.amazonaws.com/<bucket-name>/<key-prefix>

  • The fields in <key-prefix-url> are as follows:

    • <bucket-name> is the name of the S3 bucket containing your logs.

    • <region-code> is the AWS region code for the S3 bucket containing your logs.

    • <key-prefix> is the prefix of all logs you wish to compress and must begin with the <all-logs-prefix> value from the compression IAM policy.

Note

s3-key-prefix mode only accepts a single <key-prefix-url> argument. This limitation will be addressed in a future release.