Deployment orchestration#

The CLP package is composed of several components that are currently designed to be deployed in a set of containers that are orchestrated using a framework like Docker Compose. This document explains the architecture of that orchestration and any associated nuances.

Architecture#

Figure 1 shows the components (services in orchestrator terminology) in the CLP package as well as their dependencies. The CLP package consists of several long-running services (e.g., database) and some one-time initialization jobs (e.g., db-table-creator). Some of the long-running services depend on the successful completion of the one-time jobs (e.g., webui depends on results-cache-indices-creator), while others depend on the health of other long-running services (e.g., compression-scheduler depends on queue).

Table 1 below lists the services their functions, while Table 2 lists the one-time initialization jobs and their functions.

        %%{
    init: {
        "theme": "base",
        "themeVariables": {
            "primaryColor": "#0066cc",
            "primaryTextColor": "#fff",
            "primaryBorderColor": "transparent",
            "lineColor": "#007fff",
            "secondaryColor": "#007fff",
            "tertiaryColor": "#fff"
        }
    }
}%%
graph LR
  %% Services
  database["database (MySQL)"]
  queue["queue (RabbitMQ)"]
  redis["redis (Redis)"]
  results_cache["results-cache (MongoDB)"]
  compression_scheduler["compression-scheduler"]
  query_scheduler["query-scheduler"]
  compression_worker["compression-worker"]
  query_worker["query-worker"]
  reducer["reducer"]
  webui["webui"]
  garbage_collector["garbage-collector"]

  %% One-time jobs
  db_table_creator["db-table-creator"]
  results_cache_indices_creator["results-cache-indices-creator"]

  %% Dependencies
  database -->|healthy| db_table_creator
  results_cache -->|healthy| results_cache_indices_creator
  db_table_creator -->|completed_successfully| compression_scheduler
  queue -->|healthy| compression_scheduler
  redis -->|healthy| compression_scheduler
  db_table_creator -->|completed_successfully| query_scheduler
  queue -->|healthy| query_scheduler
  redis -->|healthy| query_scheduler
  query_scheduler -->|healthy| reducer
  results_cache_indices_creator -->|completed_successfully| reducer
  db_table_creator -->|completed_successfully| webui
  results_cache_indices_creator -->|completed_successfully| webui
  db_table_creator -->|completed_successfully| garbage_collector
  results_cache_indices_creator -->|completed_successfully| garbage_collector

  subgraph Databases
    database
    queue
    redis
    results_cache
  end

  subgraph Initialization jobs
    db_table_creator
    results_cache_indices_creator
  end

  subgraph Schedulers
    compression_scheduler
    query_scheduler
  end

  subgraph Workers
    compression_worker
    query_worker
    reducer
  end

  subgraph UI & Management
    webui
    garbage_collector
  end
    

Service

Description

database

Database for archive metadata, compression jobs, and query jobs

queue

Task queue for schedulers

redis

Task result storage for workers

compression_scheduler

Scheduler for compression jobs

query_scheduler

Scheduler for search/aggregation jobs

results_cache

Storage for the workers to return search results to the UI

compression_worker

Worker processes for compression jobs

query_worker

Worker processes for search/aggregation jobs

reducer

Reducers for performing the final stages of aggregation jobs

webui

Web server for the UI

garbage_collector

Process to manage data retention

Job

Description

db-table-creator

Creates and initializes database tables

results-cache-indices-creator

Creates a single-node replica set and sets up indices

Code structure#

The orchestration code is split up into:

  • BaseController that defines:

    • common logic for preparing the environment variables, configuration files, and directories necessary for each service.

    • abstract methods that orchestrator-specific derived classes must implement in order to orchestrate a deployment.

  • <Orchestrator>Controller that implements (and/or overrides) any of the methods in BaseController (<Orchestrator> is a placeholder for the specific orchestrator for which the class is being implemented).

Docker Compose orchestration#

This section explains how we use Docker Compose to orchestrate the CLP package and is broken into the following subsections:

Setting up the environment#

Several services require configuration values to be passed in through the CLP package’s config file, environment variables, and/or command line arguments. Since the services are running in containers, some of these configuration values need to be modified for the orchestration environment. Specifically:

  1. Paths on the host must be converted to appropriate paths in the container.

  2. Component hostnames must be converted to service names.

  3. Component ports must be converted to the component’s default ports.

    • This is necessary so that in the Docker Compose project file, we can network services together using the default port rather than a variable for the configured port.

To achieve this, before starting the deployment, DockerComposeController.start generates:

  • a CLP configuration file (<clp-package>/var/log/.clp-config.yml on the host) specific to the Docker Compose project environment.

  • an environment variable file (<clp-package>/.env) for any other configuration values.

  • any necessary directories (e.g., data output directories).

The Docker Compose project then passes those environment variables to the relevant services, either as environment variables or command line arguments, as necessary.

Starting and stopping the project#

To start and stop the project, DockerComposeController simply invokes docker compose up or docker compose down as appropriate. However, to allow multiple CLP packages to be run on the same host, we explicitly specify a project name for the project, where the name is based on the package’s instance ID.

Deployment Types#

CLP supports two deployment types determined by the package.query_engine configuration setting.

  1. BASE: For deployments using Presto as the query engine. This deployment only uses docker-compose.base.yaml.

  2. FULL: For deployments using one of CLP’s native query engines. This uses both docker-compose.base.yaml and docker-compose.yaml.

Implementation details#

One notable implementation detail is in how we handle mounts that are only necessary under certain configurations. For instance, the input logs mount is only necessary when the logs_input.type is fs. If logs_input.type is s3, we shouldn’t mount some random directory from the user’s host filesystem into the container. However, Docker doesn’t provide a mechanism to perform conditional mounts. Instead, we use Docker’s variable interpolation to conditionally mount an empty tmpfs mount into the container. This strategy is used wherever we need a conditional mount.

Troubleshooting#

If you encounter issues with the Docker Compose deployment:

  1. Check service status:

    docker compose --project-name clp-package-<instance-id> ps
    
  2. View service logs:

    docker compose --project-name clp-package-<instance-id> logs <service-name>
    
  3. Validate configuration:

    docker compose config