Multi-host deployment
A multi-host deployment allows you to run CLP across a distributed set of hosts.
Warning
The instructions below provide a temporary solution for multi-host deployment and may change as we actively work to improve ease of deployment. The present solution uses manual Docker Compose orchestration; however, Kubernetes Helm support will be available in a future release, which will simplify multi-host deployments significantly.
Requirements
Docker and Docker Compose
If you’re not running as root, ensure Docker can be run without superuser privileges.
One or more hosts networked together
When not using S3 storage, a shared filesystem accessible by all worker hosts (e.g., NFS, SeaweedFS)
See below for how to set up a simple SeaweedFS cluster.
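The Docker requirements above can be checked on each host before going further. The following is a minimal sketch; `require_cmd` and `check_docker` are hypothetical helper names and are not part of the CLP package.

```shell
#!/bin/sh
# Hypothetical preflight helpers; not part of the CLP package.

# Succeeds if the named command is on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Checks the Docker-related requirements for the current user.
check_docker() {
    require_cmd docker || { echo "docker not found" >&2; return 1; }
    # The Compose plugin provides the "docker compose" subcommand.
    docker compose version >/dev/null 2>&1 \
        || { echo "docker compose not available" >&2; return 1; }
    # "docker ps" fails if the current user can't talk to the daemon.
    docker ps >/dev/null 2>&1 \
        || { echo "cannot reach the Docker daemon as $(id -un)" >&2; return 1; }
    echo "docker OK"
}
```

Running `check_docker` as the user who will start CLP shows whether Docker is usable without superuser privileges on that host.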
Cluster overview
The CLP package is composed of several components (services in orchestrator terminology) including infrastructure services, schedulers, workers, and supporting services. For a detailed overview of all services and their dependencies, see the deployment orchestration design doc.
In a multi-host cluster:
infrastructure services and schedulers should be run once per cluster (they’re singleton services).
workers can be run on multiple hosts to increase parallelism.
Configuring CLP
To configure CLP for multi-host deployment, you’ll need to:
update CLP’s generated configuration to support a multi-host deployment.
distribute and configure the CLP package on all hosts in your cluster.
CLP environment setup
Extract the CLP package on one host (the “setup host”).
Configure credentials:
Copy etc/credentials.template.yaml to etc/credentials.yaml.
Edit etc/credentials.yaml to set usernames and passwords.
Edit CLP’s configuration file:
Open etc/clp-config.yaml.
For each service, set the host and port fields to the actual hostname/IP and port where you plan to run the specific service.
When using local filesystem storage (i.e., not S3), set logs_input.storage.directory, archive_output.storage.directory, and stream_output.storage.directory to directories on the shared filesystem.
Set up the CLP package’s environment:
sbin/start-clp.sh --setup-only
This will:
Validate your configuration
Create any necessary directories
Generate an .env file with all necessary environment variables
Create var/log/.clp-config.yaml (the container-specific configuration file)
Create var/www/webui/server/dist/settings.json (the webui server’s configuration file)
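Before moving on, it can help to confirm that the setup step produced the files listed above. This is a minimal sketch; `check_generated_files` is a hypothetical helper, and it only checks that the files exist, not that their contents are valid.

```shell
# Hypothetical helper: confirm the files that --setup-only should generate
# exist under the package root (defaults to the current directory).
check_generated_files() (
    root="${1:-.}"
    status=0
    for f in .env \
             var/log/.clp-config.yaml \
             var/www/webui/server/dist/settings.json; do
        if [ -f "$root/$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    # Non-zero exit status if any file is missing.
    exit "$status"
)
```

Run `check_generated_files` from the root of the CLP package directory on the setup host.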
Updating CLP’s generated configuration
The last step in the previous section (sbin/start-clp.sh --setup-only) will generate any necessary
configuration files, but they’re unsuitable for use across multiple hosts (they’re designed for use
on a single host).
Note
As mentioned at the beginning of this guide, this setup will be made simpler in a future release.
To update the generated configuration files for use across multiple hosts:
Edit var/log/.clp-config.yaml:
Update all host fields to use the actual hostname or IP address where each service will run (matching what you configured in etc/clp-config.yaml).
Similarly, update any port fields.
For example, if your database runs on 192.168.1.10:3306, ensure database.host is set to 192.168.1.10 and database.port is 3306.
Edit var/www/webui/server/dist/settings.json:
Update SqlDbHost to the actual hostname or IP address of your database service.
Update SqlDbPort if you changed the database port.
Update MongoDbHost to the actual hostname or IP address of your results cache service.
Update MongoDbPort if you changed the results cache port.
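After editing both files, listing every host and port setting in one place makes leftover single-host values easier to spot. The sketch below is a review aid only: `list_endpoints` is a hypothetical helper, and it assumes the YAML file uses plain `host:` and `port:` keys and that each of the four JSON keys sits on its own line.

```shell
# Hypothetical review helper: print every host/port setting with its line
# number so that values can be compared against etc/clp-config.yaml.
list_endpoints() (
    root="${1:-.}"
    echo "== var/log/.clp-config.yaml"
    grep -nE '^[[:space:]]*(host|port):' "$root/var/log/.clp-config.yaml"
    echo "== var/www/webui/server/dist/settings.json"
    grep -nE '"(SqlDbHost|SqlDbPort|MongoDbHost|MongoDbPort)"' \
        "$root/var/www/webui/server/dist/settings.json"
)
```

Run `list_endpoints` from the root of the CLP package directory and check each printed value by eye; the helper does not modify either file.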
Distributing the set-up package
With the package set up, we can now distribute it to all hosts in the cluster:
Copy the set-up package to all hosts where you want to run CLP services.
Ensure the package is copied to the same location on every host; otherwise, you’ll need to modify the paths in .env on each host as appropriate.
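One way to copy the package to the same path on every host is rsync over SSH. The sketch below is an illustration, not a CLP tool: `distribute_package`, the `DRY_RUN` variable, and the example host names and path are all hypothetical. It assumes rsync and SSH access are available on every host and that the parent directory of the package path already exists on each destination.

```shell
# Hypothetical distribution helper. By default it only prints the rsync
# commands; set DRY_RUN=0 to actually run them.
#   $1:  absolute path of the set-up package on this host
#   $2+: destination hosts (placeholders for your own host names)
distribute_package() (
    src="$1"
    shift
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "rsync -a $src/ $host:$src/"
        else
            # Trailing slashes copy the directory's contents to the same
            # path on the remote host.
            rsync -a "$src/" "$host:$src/"
        fi
    done
)
```

For example, `distribute_package /opt/clp-package host-a host-b` prints the two commands that would be run, which can be reviewed before setting `DRY_RUN=0`.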
Configure worker concurrency (optional):
On each worker host, edit the .env file to adjust worker concurrency settings as needed:
CLP_COMPRESSION_WORKER_CONCURRENCY
CLP_QUERY_WORKER_CONCURRENCY
CLP_REDUCER_CONCURRENCY
Recommended settings:
If workers are started on separate hosts, set each concurrency value to match the CPU count on that host.
If compression and query/reducer workers are started on the same host, set each concurrency value to half the CPU count (e.g., for a 16-core host, set all three to 8).
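The recommendation above reduces to a small calculation. The following sketch shows it; `recommended_concurrency` and its `dedicated`/`shared` mode names are hypothetical, and the lower bound of 1 is an added assumption for very small hosts.

```shell
# Hypothetical helper: suggest a concurrency value from a CPU count.
#   $1: CPU count (e.g., the output of `nproc`)
#   $2: "dedicated" (default) if workers of one kind run on the host,
#       "shared" if compression and query/reducer workers share the host.
recommended_concurrency() (
    cpus="$1"
    mode="${2:-dedicated}"
    if [ "$mode" = "shared" ]; then
        value=$((cpus / 2))
    else
        value="$cpus"
    fi
    # Assumption: never suggest fewer than one worker process.
    if [ "$value" -lt 1 ]; then
        value=1
    fi
    echo "$value"
)
```

For example, `recommended_concurrency 16 shared` prints 8, matching the 16-core example above. On a worker host, `recommended_concurrency "$(nproc)" shared` gives the value to use for all three variables in .env.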
Starting CLP
You can start CLP across multiple hosts by starting each service on the relevant host. The commands below indicate how to do so, with comments indicating the startup order and dependencies between services.
Note
For clp-json + Presto deployments (package.storage_engine: clp-s with
package.query_engine: presto), you can omit starting the query-scheduler, query-worker, and
reducer services.
Tip
If you want to use your own MariaDB/MySQL or MongoDB servers instead of the Docker Compose managed
databases, see the external database setup guide. When using external
databases, skip starting the database and results-cache services below.
All commands below assume you are running them from the root of the CLP package directory.
################################################################################
# Infrastructure services
################################################################################
# Start database (skip if using external database)
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up database \
--no-deps --wait
# Initialize database
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up db-table-creator \
--no-deps
# Start queue
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up queue \
--no-deps --wait
# Start redis
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up redis \
--no-deps --wait
# Start results cache (skip if using external MongoDB)
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up results-cache \
--no-deps --wait
# Initialize results cache
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up results-cache-indices-creator \
--no-deps
################################################################################
# Controller services (schedulers, UI, and supporting services)
################################################################################
# Start compression scheduler
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up compression-scheduler \
--no-deps --wait
# Start query scheduler
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up query-scheduler \
--no-deps --wait
# Start API server
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up api-server \
--no-deps --wait
# Start webui
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up webui \
--no-deps --wait
# Start garbage collector (optional, only if retention is configured)
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up garbage-collector \
--no-deps --wait
# Start MCP server (optional, only if configured)
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up mcp-server \
--no-deps --wait
################################################################################
# Worker services (can be started on multiple hosts)
################################################################################
# Start compression worker
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up compression-worker \
--no-deps --wait
# Start query worker
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up query-worker \
--no-deps --wait
# Start reducer
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up reducer \
--no-deps --wait
Note
To increase parallelism, start worker services (compression-worker, query-worker, reducer) on
multiple hosts.
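The commands above all follow the same pattern, so a small wrapper can cut down on repetition when starting services by hand. This is a sketch under stated assumptions: `clp_up`, its `wait`/`nowait` argument, and the `DRY_RUN` variable are hypothetical and not part of the CLP package.

```shell
# Hypothetical wrapper around the "docker compose ... up" pattern used above.
# Run it from the root of the CLP package directory.
#   $1: service name (e.g., database, compression-worker)
#   $2: "wait" (default) for long-running services, or "nowait" for the
#       one-shot initializers (db-table-creator, results-cache-indices-creator)
clp_up() (
    service="$1"
    mode="${2:-wait}"
    project="clp-package-$(cat var/log/instance-id)"
    set -- docker compose --project-name "$project" up "$service" --no-deps
    if [ "$mode" = "wait" ]; then
        set -- "$@" --wait
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "$*"
    else
        "$@"
    fi
)
```

For example, `clp_up database` followed by `clp_up db-table-creator nowait` runs the same two commands as the first two steps above. Setting `DRY_RUN=1` prints each command without running it.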
Using CLP
To learn how to compress and search your logs, check out the quick-start guide that corresponds to the flavor of CLP you’re running:
Using clp-json
How to compress and search JSON logs.
Using clp-text
How to compress and search unstructured text logs.
Stopping CLP
To stop CLP, on every host where it’s running, run:
sbin/stop-clp.sh
This will stop all CLP services managed by Docker Compose on the current host.
Monitoring services
To check the status of services on a host:
docker compose --project-name clp-package-<instance-id> ps
To view logs for a specific service:
docker compose --project-name clp-package-<instance-id> logs -f <service-name>
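The `<instance-id>` placeholder can be filled in from var/log/instance-id, the same file the startup commands read. The sketch below wraps that lookup; `clp_project_name` is a hypothetical helper name.

```shell
# Hypothetical helper: print the Compose project name for the package at
# the given root (defaults to the current directory).
clp_project_name() (
    root="${1:-.}"
    echo "clp-package-$(cat "$root/var/log/instance-id")"
)
```

From the root of the CLP package directory, `docker compose --project-name "$(clp_project_name)" ps` then shows the status of the services on that host.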
Setting up SeaweedFS
The instructions below are for running a simple SeaweedFS cluster on a set of hosts. For other use cases, see the SeaweedFS docs.
Install SeaweedFS.
Start the master and a filer on one of the hosts:
weed master -port 9333
weed filer -port 8888 -master "localhost:9333"
Start one or more volume servers on one or more hosts.
Create a directory where you want SeaweedFS to store data.
Start the volume server:
weed volume -mserver "<master-host>:9333" -dir <storage-dir> -max 0
<master-host> is the hostname/IP of the master host.
<storage-dir> is the directory where you want SeaweedFS to store data.
Start a FUSE mount on every host that you want to run a CLP worker:
weed mount -filer "<master-host>:8888" -dir <mount-path>
<master-host> is the hostname/IP of the master host.
<mount-path> is the path where you want the mount to be.
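Once the mounts are up, a quick way to confirm that every worker host sees the same filesystem is to write a marker file on one host and read it back on the others. This is a minimal sketch; `write_marker`, `check_marker`, and the marker file name `.clp-shared-fs-check` are hypothetical.

```shell
# Hypothetical shared-filesystem check.
# On one host: write a unique token into the mount and print it.
write_marker() (
    mount_path="$1"
    token="$(uname -n)-$$-$(date +%s)"
    echo "$token" > "$mount_path/.clp-shared-fs-check" && echo "$token"
)

# On every other host: succeed only if the marker holds the same token.
check_marker() (
    mount_path="$1"
    expected="$2"
    actual="$(cat "$mount_path/.clp-shared-fs-check" 2>/dev/null)"
    [ "$actual" = "$expected" ]
)
```

Run `write_marker <mount-path>` on one host, then `check_marker <mount-path> <token>` on each remaining host with the printed token. A failure suggests that host is not mounted on the same SeaweedFS cluster or path.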