External database setup
This guide explains how to set up external databases for CLP instead of using the Docker Compose managed databases. If the host(s) on which you’re running CLP are ephemeral, you should use external databases for metadata storage, and object storage for CLP’s archives and streams; this will ensure data is persisted even if a host is replaced.
Warning
The CLP Docker Compose project includes MariaDB/MongoDB databases by default. This guide is only for users who want to customize their deployment by using their own database servers or cloud-managed databases (e.g., AWS RDS, Azure Database).
CLP requires two databases:
MariaDB/MySQL - for storing metadata about archives, files, and jobs.
MongoDB - for caching query results.
MariaDB/MySQL setup
CLP is compatible with any MariaDB or MySQL database. The instructions below use Ubuntu as an example, but you can use any compatible database installation or cloud-managed service.
Installing MariaDB on Ubuntu
Install MariaDB server:
sudo apt update
sudo apt install mariadb-server
Connect to MariaDB as root:
sudo mysql
Create the CLP database:
CREATE DATABASE `clp-db`;
Create a user for CLP (replace <password> with a secure password):
CREATE USER 'clp-user'@'%' IDENTIFIED BY '<password>';
Note
The '%' allows connections from any host. For better security, replace '%' with the specific hostname or IP address from which CLP will connect (e.g., 'clp-user'@'192.168.1.10'), and use the same host in the GRANT statement.
Grant privileges to the user:
GRANT ALL PRIVILEGES ON `clp-db`.* TO 'clp-user'@'%';
FLUSH PRIVILEGES;
Exit the MariaDB shell:
EXIT;
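You may prefer to script the SQL above rather than type it interactively. The sketch below prints the same statements for a given password and client host, so you can pipe them into sudo mysql. The function name render_clp_db_sql is hypothetical. The sketch adds IF NOT EXISTS so it can be re-run safely. It does not attempt SQL escaping; instead it rejects any password or host that contains a single quote or a backslash.

```shell
# Hypothetical helper: print the CLP database setup statements.
# Usage: render_clp_db_sql <password> [client-host]   (host defaults to '%')
render_clp_db_sql() {
  local password="$1" client_host="${2:-%}"

  # Refuse characters that would need escaping inside a SQL string literal.
  case "$password$client_host" in
    *\'* | *\\*)
      echo "error: password and host must not contain ' or \\" >&2
      return 1
      ;;
  esac

  # Unquoted heredoc: variables expand, escaped backticks stay literal.
  cat <<SQL
CREATE DATABASE IF NOT EXISTS \`clp-db\`;
CREATE USER IF NOT EXISTS 'clp-user'@'${client_host}' IDENTIFIED BY '${password}';
GRANT ALL PRIVILEGES ON \`clp-db\`.* TO 'clp-user'@'${client_host}';
FLUSH PRIVILEGES;
SQL
}
```

In bash, you can keep the password out of your shell history by reading it with read -rs CLP_DB_PASSWORD. Then run render_clp_db_sql "$CLP_DB_PASSWORD" | sudo mysql.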
Configuring MariaDB for remote connections
If CLP components will connect from a different host, you need to configure MariaDB to accept remote connections:
Edit the MariaDB configuration file:
sudo nano /etc/mysql/mariadb.conf.d/50-server.cnf
Find the bind-address line and change it to allow connections from all interfaces:
bind-address = 0.0.0.0
Restart MariaDB:
sudo systemctl restart mariadb
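The configuration change above can also be made without an editor. The following is a minimal sketch that assumes GNU sed (the default on Ubuntu) and a config file that already contains a bind-address line, possibly commented out. The function name set_mariadb_bind_address is hypothetical. If no bind-address line exists, the function reports an error instead of appending one. The setting has to live in a server section such as [mysqld], and appending to the end of the file could place it in the wrong section.

```shell
# Hypothetical helper: set bind-address in a MariaDB config file in place.
# Usage: set_mariadb_bind_address <config-file> [address]  (default 0.0.0.0)
set_mariadb_bind_address() {
  local conf="$1" addr="${2:-0.0.0.0}"
  # Match an existing bind-address line, even if it is commented out.
  if grep -Eq '^[#[:space:]]*bind-address[[:space:]]*=' "$conf"; then
    # GNU sed: -E for extended regexes, -i to edit the file in place.
    sed -Ei "s|^[#[:space:]]*bind-address[[:space:]]*=.*|bind-address = ${addr}|" "$conf"
  else
    echo "error: no bind-address line found in $conf" >&2
    return 1
  fi
}
```

Run it as root against /etc/mysql/mariadb.conf.d/50-server.cnf, then restart MariaDB.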
Verifying the MariaDB connection
You can verify the MariaDB connection by running:
mysql -h <mariadb-hostname-or-ip> -u clp-user -p clp-db
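A non-interactive variant of the command above can run one query as clp-user. If it prints the user's grants on clp-db, the host, the credentials, and the privileges are all in order. This is a sketch: the wrapper name verify_clp_db is hypothetical, and mysql still prompts for the password because of -p.

```shell
# Hypothetical wrapper: run one query as clp-user against clp-db.
# -p makes mysql prompt for the password instead of taking it as an argument.
verify_clp_db() {
  local host="$1" port="${2:-3306}"
  mysql --host="$host" --port="$port" --user=clp-user -p \
    --database=clp-db --execute='SHOW GRANTS FOR CURRENT_USER();'
}
```

For example: verify_clp_db <mariadb-hostname-or-ip>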
Using AWS RDS for MariaDB/MySQL
When using AWS RDS:
Create a MariaDB or MySQL RDS instance in the AWS Console.
Note the endpoint hostname and port (the default is 3306).
Create the database and user using a MySQL client:
mysql -h <rds-endpoint> -u admin -p
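Then follow the steps from Installing MariaDB on Ubuntu that create the database, create the user, grant privileges, and exit. Skip the installation and sudo mysql steps, since you are already connected as the RDS admin user.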
Ensure the RDS security group allows inbound connections on port 3306 from your CLP hosts.
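If the mysql client cannot connect to the RDS endpoint, it helps to rule out networking first. The sketch below checks only whether a TCP connection to a host and port succeeds. It uses bash's /dev/tcp redirection and the coreutils timeout command, so it needs no database client. The function name check_tcp_port is hypothetical. A successful check says nothing about credentials.

```shell
# Hypothetical helper: succeed only if a TCP connection to host:port can be
# opened within the timeout (default 5 s). Requires bash for /dev/tcp.
check_tcp_port() {
  local host="$1" port="$2" secs="${3:-5}"
  # Inside bash -c, $0 is the host and $1 is the port.
  if timeout "$secs" bash -c ': < "/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "reachable: $host:$port"
  else
    echo "not reachable: $host:$port" >&2
    return 1
  fi
}
```

For example: check_tcp_port <rds-endpoint> 3306. The same check works for MongoDB on port 27017.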
MongoDB setup
CLP is compatible with any MongoDB database. For installation instructions, see the MongoDB installation documentation.
Creating the CLP database in MongoDB
MongoDB automatically creates databases and collections when they are first accessed, so you don't need to create the database manually. CLP creates the necessary database (clp-query-results by default) and its collections when it first connects.
Configuring MongoDB for remote connections
If CLP components will connect from a different host:
Edit the MongoDB configuration file:
sudo nano /etc/mongod.conf
Find the net.bindIp setting and change it to allow connections from all interfaces:
net:
  port: 27017
  bindIp: 0.0.0.0
Restart MongoDB:
sudo systemctl restart mongod
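As with MariaDB, you can script the bindIp change. This is a minimal sketch that assumes GNU sed and a mongod.conf that already has an indented bindIp: line under net:, as in the default packaged config. The function name set_mongod_bind_ip is hypothetical. It keeps the existing indentation so the YAML stays valid. bindIp also accepts a comma-separated list (for example, 127.0.0.1,10.0.0.5) if you would rather listen only on specific interfaces than on 0.0.0.0.

```shell
# Hypothetical helper: set net.bindIp in a mongod YAML config in place.
# Usage: set_mongod_bind_ip <config-file> [address-list]  (default 0.0.0.0)
set_mongod_bind_ip() {
  local conf="$1" addr="${2:-0.0.0.0}"
  if grep -Eq '^[[:space:]]+bindIp:' "$conf"; then
    # Reuse the captured indentation (\1) so the YAML nesting is preserved.
    sed -Ei "s|^([[:space:]]+)bindIp:.*|\1bindIp: ${addr}|" "$conf"
  else
    echo "error: no indented bindIp line found in $conf" >&2
    return 1
  fi
}
```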
Warning
For production deployments, it’s highly recommended to enable authentication and SSL/TLS for MongoDB. See the MongoDB security documentation for details.
Verifying the MongoDB connection
You can verify the MongoDB connection by running:
mongosh "mongodb://<mongodb-hostname-or-ip>:27017/clp-query-results"
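For scripts or health checks, mongosh can run a single command instead of opening an interactive shell. The wrapper below is a sketch, and its name, verify_clp_results_cache, is hypothetical. It runs MongoDB's ping command against the clp-query-results database, and --quiet suppresses the startup banner, so a healthy server prints just 1.

```shell
# Hypothetical wrapper: ping the results cache non-interactively.
# Prints 1 if the server answers the ping command.
verify_clp_results_cache() {
  local host="$1" port="${2:-27017}"
  mongosh --quiet --eval 'db.runCommand({ ping: 1 }).ok' \
    "mongodb://${host}:${port}/clp-query-results"
}
```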
Using AWS DocumentDB or MongoDB Atlas
When using AWS DocumentDB or MongoDB Atlas:
Create a cluster in the AWS Console or MongoDB Atlas.
Note the connection string/endpoint provided.
Ensure the security group or IP access list allows connections from your CLP hosts.
Use the provided connection string when configuring CLP (see below).
Configuring CLP to use external databases
After setting up your external databases, configure CLP to use them by editing etc/clp-config.yaml:
database:
  host: "<mariadb-hostname-or-ip>"
  port: 3306
  name: "clp-db"
  # Credentials will be set in etc/credentials.yaml
results_cache:
  host: "<mongodb-hostname-or-ip>"
  port: 27017
  name: "clp-query-results"
Set the credentials in etc/credentials.yaml:
database:
  username: "clp-user"
  password: "<your-mariadb-password>"
Note
When using external databases in a multi-host deployment, you do not need to start the database and results-cache Docker Compose services. Skip those services when following the multi-host deployment guide. However, you still need to run the database initialization jobs (db-table-creator and results-cache-indices-creator).
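One possible way to run only the two initialization jobs is with docker compose run. This is a sketch under stated assumptions, not the documented procedure. The compose file path and project name are placeholders; take the real values, and any environment the jobs need, from the multi-host deployment guide. The function name run_clp_init_jobs is hypothetical. --no-deps stops Compose from also starting any services the jobs depend on (such as the bundled databases you replaced). --rm removes each job container after it exits.

```shell
# Hypothetical sketch: run only CLP's two initialization jobs, in order.
# compose_file and project are placeholders for the values used in the
# multi-host deployment guide.
run_clp_init_jobs() {
  local compose_file="$1" project="$2"
  docker compose --project-name "$project" --file "$compose_file" \
    run --rm --no-deps db-table-creator &&
    docker compose --project-name "$project" --file "$compose_file" \
      run --rm --no-deps results-cache-indices-creator
}
```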