Deployment orchestration#
The CLP package is composed of several components that are currently designed to be deployed in a set of containers that are orchestrated using a framework like Docker Compose. This document explains the architecture of that orchestration and any associated nuances.
Architecture#
Figure 1 shows the components (services in orchestrator terminology) in the CLP
package as well as their dependencies. The CLP package consists of several long-running services
(e.g., database) and some one-time initialization jobs (e.g., db-table-creator). Some of the
long-running services depend on the successful completion of the one-time jobs (e.g., webui
depends on results-cache-indices-creator), while others depend on the health of other long-running
services (e.g., compression-scheduler depends on queue).
Table 1 below lists the services and their functions, while Table 2 lists the one-time initialization jobs and their functions.
%%{
init: {
"theme": "base",
"themeVariables": {
"primaryColor": "#0066cc",
"primaryTextColor": "#fff",
"primaryBorderColor": "transparent",
"lineColor": "#007fff",
"secondaryColor": "#007fff",
"tertiaryColor": "#fff"
}
}
}%%
graph LR
%% Services
database["database (MySQL)"]
queue["queue (RabbitMQ)"]
redis["redis (Redis)"]
results_cache["results-cache (MongoDB)"]
compression_scheduler["compression-scheduler"]
query_scheduler["query-scheduler"]
compression_worker["compression-worker"]
query_worker["query-worker"]
reducer["reducer"]
api_server["api-server"]
garbage_collector["garbage-collector"]
webui["webui"]
mcp_server["mcp-server"]
%% One-time jobs
db_table_creator["db-table-creator"]
results_cache_indices_creator["results-cache-indices-creator"]
%% Dependencies
database -->|healthy| db_table_creator
results_cache -->|healthy| results_cache_indices_creator
db_table_creator -->|completed_successfully| compression_scheduler
queue -->|healthy| compression_scheduler
redis -->|healthy| compression_scheduler
db_table_creator -->|completed_successfully| query_scheduler
queue -->|healthy| query_scheduler
redis -->|healthy| query_scheduler
query_scheduler -->|healthy| reducer
results_cache_indices_creator -->|completed_successfully| reducer
db_table_creator -->|completed_successfully| api_server
results_cache_indices_creator -->|completed_successfully| api_server
db_table_creator -->|completed_successfully| webui
results_cache_indices_creator -->|completed_successfully| webui
db_table_creator -->|completed_successfully| mcp_server
results_cache_indices_creator -->|completed_successfully| mcp_server
db_table_creator -->|completed_successfully| garbage_collector
results_cache_indices_creator -->|completed_successfully| garbage_collector
subgraph Databases
database
queue
redis
results_cache
end
subgraph Initialization jobs
db_table_creator
results_cache_indices_creator
end
subgraph Schedulers
compression_scheduler
query_scheduler
end
subgraph Workers
compression_worker
query_worker
reducer
end
subgraph Management & UI
api_server
garbage_collector
webui
end
subgraph AI
mcp_server
end
Table 1: Services and their functions

| Service | Description |
|---|---|
| database | Database for archive metadata, compression jobs, and query jobs |
| queue | Task queue for schedulers |
| redis | Task result storage for workers |
| compression_scheduler | Scheduler for compression jobs |
| query_scheduler | Scheduler for search/aggregation jobs |
| results_cache | Storage for the workers to return search results to the UI |
| compression_worker | Worker processes for compression jobs |
| query_worker | Worker processes for search/aggregation jobs |
| reducer | Reducers for performing the final stages of aggregation jobs |
| api_server | API server for submitting queries |
| webui | Web server for the UI |
| mcp_server | MCP server for AI agent to access CLP functionalities |
| garbage_collector | Process to manage data retention |
Table 2: One-time initialization jobs and their functions

| Job | Description |
|---|---|
| db-table-creator | Creates and initializes database tables |
| results-cache-indices-creator | Creates a single-node replica set and sets up indices |
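The dependency edges in Figure 1 imply a valid startup order. As a minimal sketch (an illustration only, not how Docker Compose itself resolves dependencies), the edges can be encoded as a mapping from each service or job to its prerequisites and sorted topologically with Python's standard graphlib module. The condition labels (healthy, completed_successfully) are dropped here and only the ordering is modeled. compression-worker and query-worker are omitted because the figure shows no edges for them.

```python
from graphlib import TopologicalSorter

# One-time initialization jobs that most long-running services wait on.
INIT_JOBS = {"db-table-creator", "results-cache-indices-creator"}

# Each key maps to the set of services/jobs it depends on (edges copied from Figure 1).
DEPENDENCIES = {
    "db-table-creator": {"database"},
    "results-cache-indices-creator": {"results-cache"},
    "compression-scheduler": {"db-table-creator", "queue", "redis"},
    "query-scheduler": {"db-table-creator", "queue", "redis"},
    "reducer": {"query-scheduler", "results-cache-indices-creator"},
    "api-server": INIT_JOBS,
    "webui": INIT_JOBS,
    "mcp-server": INIT_JOBS,
    "garbage-collector": INIT_JOBS,
}


def startup_order(dependencies):
    """Return one order in which every prerequisite precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for name in startup_order(DEPENDENCIES):
        print(name)
```

Several orders are valid for this graph, so the exact output may differ between runs; the only guarantee is that prerequisites come first.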
Code structure#
The orchestration code is split up into:
- BaseController that defines:
  - common logic for preparing the environment variables, configuration files, and directories necessary for each service.
  - abstract methods that orchestrator-specific derived classes must implement in order to orchestrate a deployment.
- <Orchestrator>Controller that implements (and/or overrides) any of the methods in BaseController (<Orchestrator> is a placeholder for the specific orchestrator for which the class is being implemented).
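The split described above can be sketched with Python's abc module. This is a minimal illustration of the pattern, not the package's actual classes: the method names, the EXAMPLE_INSTANCE_ID variable, and DryRunController (a stand-in for an <Orchestrator>Controller) are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseController(ABC):
    """Orchestrator-agnostic logic. Method names here are hypothetical."""

    def __init__(self, instance_id: str) -> None:
        self.instance_id = instance_id

    def prepare_environment(self) -> Dict[str, str]:
        """Common preparation shared by every orchestrator (placeholder values)."""
        return {"EXAMPLE_INSTANCE_ID": self.instance_id}

    @abstractmethod
    def start(self) -> List[str]:
        """Start the deployment; each orchestrator must implement this."""

    @abstractmethod
    def stop(self) -> List[str]:
        """Stop the deployment; each orchestrator must implement this."""


class DryRunController(BaseController):
    """Hypothetical orchestrator that only records what it would do."""

    def start(self) -> List[str]:
        env = self.prepare_environment()
        return [f"start instance {env['EXAMPLE_INSTANCE_ID']}"]

    def stop(self) -> List[str]:
        return [f"stop instance {self.instance_id}"]
```

Because start and stop are abstract, instantiating BaseController directly raises a TypeError, which forces each orchestrator-specific class to supply its own implementation.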
Docker Compose orchestration#
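This section explains how we use Docker Compose to orchestrate the CLP package and is broken into the following subsections: setting up the environment, starting and stopping the project, deployment types, implementation details, and troubleshooting.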
Setting up the environment#
Several services require configuration values to be passed in through the CLP package’s config file, environment variables, and/or command line arguments. Since the services are running in containers, some of these configuration values need to be modified for the orchestration environment. Specifically:
- Paths on the host must be converted to appropriate paths in the container.
- Component hostnames must be converted to service names, and component ports must be converted to the component’s default ports.
This ensures that in the Docker Compose configuration, services can communicate over fixed, predictable hostnames and ports rather than relying on configurable variables.
To achieve this, before starting the deployment, DockerComposeController.start generates:
- a CLP configuration file (<clp-package>/var/log/.clp-config.yaml on the host) specific to the Docker Compose project environment.
- an environment variable file (<clp-package>/.env) for any other configuration values.
- any necessary directories (e.g., data output directories).
The Docker Compose project then passes those environment variables to the relevant services, either as environment variables or command line arguments, as necessary.
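The conversion described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions: the config layout (host, port, data_dir keys), the /var/data container root, and both function names are hypothetical. The ports are the standard defaults of MySQL, RabbitMQ, Redis, and MongoDB, assumed here to be the components' default ports.

```python
from pathlib import PurePosixPath
from typing import Dict

# Assumed defaults: the standard ports of the underlying software.
DEFAULT_PORTS = {"database": 3306, "queue": 5672, "redis": 6379, "results-cache": 27017}


def to_container_config(host_config: Dict[str, dict], container_root: str = "/var/data") -> Dict[str, dict]:
    """Rewrite hostnames, ports, and paths so they are valid inside the Compose project."""
    container_config = {}
    for service, settings in host_config.items():
        converted = dict(settings)
        if "host" in converted:
            # Inside the project, a service is reachable by its service name and default port.
            converted["host"] = service
            converted["port"] = DEFAULT_PORTS[service]
        if "data_dir" in converted:
            # Host paths are replaced by a fixed location inside the container.
            converted["data_dir"] = str(PurePosixPath(container_root) / service)
        container_config[service] = converted
    return container_config


def render_env_file(values: Dict[str, str]) -> str:
    """Render KEY=VALUE lines in the format of a .env file."""
    return "".join(f"{key}={value}\n" for key, value in sorted(values.items()))
```

For example, a database configured on the host as 10.0.0.5:13306 would be addressed as database:3306 in the generated configuration, regardless of what the user configured.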
Starting and stopping the project#
To start and stop the project, DockerComposeController simply invokes docker compose up or
docker compose down as appropriate. However, to allow multiple CLP packages to be run on the same
host, we explicitly specify a project name for the project, where the name is based on the package’s
instance ID.
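A minimal sketch of how such a command line could be assembled is shown below. The clp-package-<instance-id> project name format is taken from the Troubleshooting section; the compose_command function and the use of --detach are assumptions made for illustration.

```python
from typing import List


def compose_command(instance_id: str, action: str) -> List[str]:
    """Build a docker compose invocation scoped to one CLP package instance."""
    if action not in ("up", "down"):
        raise ValueError(f"unsupported action: {action}")
    command = ["docker", "compose", "--project-name", f"clp-package-{instance_id}", action]
    if action == "up":
        # Assumption: run the services in the background.
        command.append("--detach")
    return command
```

The resulting list can be passed to subprocess.run(command, check=True). Because each package has its own instance ID, two packages on the same host get distinct project names and do not interfere with each other's containers.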
Deployment Types#
CLP supports two deployment types determined by the package.query_engine configuration setting.
- BASE: For deployments using Presto as the query engine. This deployment only uses docker-compose.base.yaml.
- FULL: For deployments using one of CLP’s native query engines. This uses both docker-compose.base.yaml and docker-compose.yaml.
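The selection logic can be sketched as below. This is an illustration under assumptions: the config value "presto" and the function names are hypothetical, and the document does not state how the two files are combined for FULL deployments. Passing several --file flags is one mechanism that docker compose supports for this.

```python
from enum import Enum
from typing import List


class DeploymentType(Enum):
    BASE = "base"
    FULL = "full"


# File names as given in the list above.
COMPOSE_FILES = {
    DeploymentType.BASE: ["docker-compose.base.yaml"],
    DeploymentType.FULL: ["docker-compose.base.yaml", "docker-compose.yaml"],
}


def deployment_type_for(query_engine: str) -> DeploymentType:
    """Map package.query_engine to a deployment type."""
    # Assumption: "presto" is the value that selects Presto; anything else is a native engine.
    return DeploymentType.BASE if query_engine == "presto" else DeploymentType.FULL


def compose_file_args(query_engine: str) -> List[str]:
    """Return the --file arguments for the deployment type implied by the query engine."""
    args: List[str] = []
    for compose_file in COMPOSE_FILES[deployment_type_for(query_engine)]:
        args += ["--file", compose_file]
    return args
```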
Implementation details#
One notable implementation detail is how we handle mounts that are only necessary under certain
configurations. For instance, the input logs mount is only necessary when logs_input.type is
fs. If logs_input.type is s3, we shouldn’t mount an arbitrary directory from the user’s
host filesystem into the container. However, Docker Compose doesn’t provide a mechanism for
conditional mounts. Instead, we use Compose’s variable interpolation to mount an empty tmpfs
into the container whenever the real mount isn’t needed. This strategy is used wherever we need a
conditional mount.
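The controller side of this strategy can be sketched as below. This is a minimal illustration under stated assumptions: the variable name EXAMPLE_LOGS_INPUT_DIR_HOST and the function are hypothetical, and the Compose file is assumed to fall back to an empty tmpfs-backed mount through the ${VAR:-default} interpolation syntax when the variable is unset.

```python
from typing import Dict, Optional

# Hypothetical variable name; the name the package actually uses is not given in this document.
LOGS_INPUT_ENV_VAR = "EXAMPLE_LOGS_INPUT_DIR_HOST"


def logs_input_mount_env(logs_input_type: str, host_dir: Optional[str] = None) -> Dict[str, str]:
    """Return the environment entries that decide what the input logs mount resolves to."""
    if logs_input_type == "fs":
        if not host_dir:
            raise ValueError("a host directory is required when logs_input.type is fs")
        # The Compose file would interpolate this value as the mount source.
        return {LOGS_INPUT_ENV_VAR: host_dir}
    # For s3 (or any non-fs input), leave the variable unset so that the
    # Compose file's default (assumed: an empty tmpfs) is mounted instead.
    return {}
```

The point of the design is that the Compose file stays static: the same mount entry is always present, and only the interpolated value changes what ends up inside the container.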
Troubleshooting#
If you encounter issues with the Docker Compose deployment, first determine the instance ID for your
deployment by checking the content of <clp-package>/var/log/instance-id. Then run one of the
commands below as necessary.
Check service status:
docker compose --project-name clp-package-<instance-id> ps
View service logs:
docker compose --project-name clp-package-<instance-id> logs <service-name>
Validate configuration:
docker compose config
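When these checks need to be scripted, the first two commands can be assembled from the instance ID file. The helper names below are hypothetical; only the file location and the command shapes come from this section.

```python
from pathlib import Path
from typing import Dict, List


def read_instance_id(package_dir: str) -> str:
    """Read the instance ID from <clp-package>/var/log/instance-id."""
    return (Path(package_dir) / "var" / "log" / "instance-id").read_text().strip()


def troubleshooting_commands(instance_id: str, service_name: str) -> Dict[str, List[str]]:
    """Build the status and log commands for one deployment."""
    base = ["docker", "compose", "--project-name", f"clp-package-{instance_id}"]
    return {
        "status": base + ["ps"],
        "logs": base + ["logs", service_name],
    }
```

Each returned list can be passed to subprocess.run to execute the corresponding command.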