Using log-ingestor#
log-ingestor is a CLP component that facilitates continuous log ingestion from a given log source.
Note
Currently, log-ingestor can only be used by clp-json deployments
that are configured for S3 object storage. To set up this configuration, check out the
object storage guide.
Support for ingestion from local filesystems, as well as support for clp-text deployments, is
planned for a future release.
Starting log-ingestor#
clp-json will spin up log-ingestor on startup as long as the logs_input field in the CLP
package’s config file (clp-package/etc/clp-config.yaml) is
configured for object storage.
You can specify a custom configuration for log-ingestor by modifying the log_ingestor field in
the same file.
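As a rough illustration, a configuration that enables log-ingestor might take the shape below. Only the logs_input and log_ingestor field names come from this guide; everything nested under them is an assumption here, and the object storage guide documents the real schema.

```yaml
# clp-package/etc/clp-config.yaml (illustrative sketch only)
logs_input:
  type: "s3"    # object storage as the log source; see the object storage guide
  # ...bucket, region, and credential settings per the object storage guide...

log_ingestor:
  # Optional overrides for the log-ingestor service (field names assumed here)
  host: "localhost"
  port: 8080
```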
Ingestion jobs#
log-ingestor facilitates continuous log ingestion with ingestion jobs. An ingestion job
continuously monitors a configured log source, buffers incoming log data, and groups it into
compression jobs. This buffering and batching strategy improves compression efficiency and reduces
overall storage overhead.
Note
Support for one-time ingestion jobs (similar to the current CLP compression CLI workflows) is planned for a future release.
Interacting with log-ingestor#
log-ingestor exposes RESTful APIs that allow you to submit and manage ingestion jobs and to
check log-ingestor's health.
You can explore all available endpoints and their schemas at the
Swagger UI log-ingestor page.
Note
Currently, requests to log-ingestor must be sent directly to the log-ingestor service. Requests
will be routed through CLP’s API server in a future release.
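For example, the sketch below probes log-ingestor's health using Python's requests library. The base URL and the /health path are placeholders for illustration; the Swagger UI page above is the authoritative reference for the actual endpoints and schemas.

```python
import requests

# Deployment-specific placeholder: substitute the host and port that your
# log-ingestor service listens on.
BASE_URL = "http://localhost:8080"

# Hypothetical health endpoint; the Swagger UI lists the actual path and schema.
response = requests.get(f"{BASE_URL}/health", timeout=10)
response.raise_for_status()
print(response.json())
```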
Fault tolerance#
Warning
The current version of log-ingestor does not provide fault tolerance.
If log-ingestor crashes or is restarted, all in-progress ingestion jobs and their associated state
will be lost and must be restored manually. Robust fault tolerance for the ingestion pipeline is
planned for a future release.
Continuous ingestion from S3#
log-ingestor supports continuous ingestion jobs for ingesting logs from S3-compatible object
storage. Currently, two types of ingestion jobs are available:
S3 scanner: Periodically scans an S3 bucket and prefix for new log files to ingest.
SQS listener: Listens to an SQS queue for notifications about newly created log files in S3.
S3 scanner#
An S3 scanner ingestion job periodically scans a specified S3 bucket and key prefix for new log files to ingest. The scan interval and other parameters can be configured when creating the job.
For configuration details and the request body, see the API reference for creating S3 scanner ingestion jobs.
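As a sketch of what submitting such a job might look like, the Python snippet below posts a job specification to log-ingestor. The endpoint path and every field in the request body are assumptions for illustration; the API reference linked above defines the real schema.

```python
import requests

BASE_URL = "http://localhost:8080"  # placeholder; use your deployment's address

# Hypothetical request body: the field names below are illustrative only.
job_spec = {
    "type": "s3-scanner",
    "bucket": "my-log-bucket",    # bucket to scan (assumed field)
    "key_prefix": "app/prod/",    # key prefix to scan under (assumed field)
    "scan_interval_seconds": 60,  # how often to scan (assumed field)
}

response = requests.post(f"{BASE_URL}/ingestion-jobs", json=job_spec, timeout=10)
response.raise_for_status()
print("Created ingestion job:", response.json())
```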
Important
To ensure correct and efficient ingestion, the scanner relies on the following assumptions:
Lexicographical order: Every new object added to the S3 bucket has a key that is lexicographically greater than the key of the previously added object. For example, objects with keys log1 and log2 will be ingested sequentially. If a new object with key log0 is added after log2, it will be ignored because its key is not lexicographically greater than the last ingested key.
Immutability: Objects under the specified prefix are immutable. Once an object is created, it is not modified or overwritten.
SQS listener#
An SQS listener ingestion job listens to a specified AWS SQS queue and ingests S3 objects referenced by incoming notifications. For details on configuring S3 event notifications for SQS, see the AWS documentation.
For configuration details and the request body, see the API reference for creating SQS listener ingestion jobs.
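As with the scanner, creating a listener job might look like the Python sketch below. The endpoint path and all request-body fields are assumptions for illustration; the API reference linked above defines the real schema.

```python
import requests

BASE_URL = "http://localhost:8080"  # placeholder; use your deployment's address

# Hypothetical request body: field names are illustrative only.
job_spec = {
    "type": "sqs-listener",
    # Placeholder queue URL for the dedicated SQS queue (assumed field).
    "queue_url": "https://sqs.us-east-1.amazonaws.com/123456789012/clp-log-events",
    "bucket": "my-log-bucket",  # bucket the notifications refer to (assumed field)
    "key_prefix": "app/prod/",  # prefix to accept objects under (assumed field)
}

response = requests.post(f"{BASE_URL}/ingestion-jobs", json=job_spec, timeout=10)
response.raise_for_status()
print("Created ingestion job:", response.json())
```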
Important
To ensure correct and efficient ingestion, the listener relies on the following assumptions:
Dedicated queue: The given SQS queue must be dedicated to this ingestion job. No other consumers should read from or delete messages in the queue. The ingestion job must have permission to delete messages after they are successfully processed.
Immutability: Objects under the specified prefix are immutable. Once an object is created, it is not modified or overwritten.
Note
SQS listener ingestion jobs carry the following limitations:
An SQS listener ingestion job can only ingest objects from a single S3 bucket and prefix. Support for multiple buckets or prefixes is planned for a future release.
SQS listener ingestion jobs do not support custom S3 endpoint configurations. Support for custom endpoints is planned for a future release.