Installation Guide
This section describes the requirements for BEE and how to install it.
Requirements:
- Python version 3.8 (or greater)
- Charliecloud version 0.34 (or greater)
Charliecloud is installed on Los Alamos National Laboratory (LANL) clusters and can be invoked via
module load charliecloud
before running beeflow. If you are on a system that does not have the module, Charliecloud can be installed in user space and requires no privileges to install. To ensure Charliecloud is available in subsequent runs, add
module load charliecloud
(or, if you installed it yourself,
export PATH=<path_to_ch-run>:$PATH
) to your .bashrc (or other appropriate shell initialization file). BEE uses Charliecloud to run its own dependencies, including the graph database Neo4j, from containers. Charliecloud is also the default container runtime for containerized applications in BEE.
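As a rough pre-flight check of the requirements above, the following sketch compares the installed Python and Charliecloud versions against the stated minimums. The helper name version_ge is made up for this example, it relies on sort -V (GNU coreutils), and it assumes that ch-run --version prints a plain version string, possibly on stderr.

```shell
# version_ge A B: succeed when dotted version A is at least B.
# Uses `sort -V` (version sort), which GNU coreutils provides.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Python: the guide asks for 3.8 or greater.
if command -v python3 >/dev/null 2>&1; then
    py_version=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if version_ge "$py_version" 3.8; then
        echo "python3 $py_version: ok"
    else
        echo "python3 $py_version: too old, need 3.8 or greater"
    fi
else
    echo "python3 not found on PATH"
fi

# Charliecloud: the guide asks for 0.34 or greater.
# Assumption: `ch-run --version` prints only the version; the streams are
# merged in case the version is written to stderr.
if command -v ch-run >/dev/null 2>&1; then
    ch_version=$(ch-run --version 2>&1 | head -n 1)
    if version_ge "$ch_version" 0.34; then
        echo "ch-run $ch_version: ok"
    else
        echo "ch-run $ch_version: too old, need 0.34 or greater"
    fi
else
    echo "ch-run not found on PATH (try: module load charliecloud)"
fi
```

If either tool is reported as missing or too old, fix that before continuing with the installation.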
- Containers:
Two Charliecloud dependency containers are currently required for BEE: one for the Neo4j graph database and another for Redis. The paths to these containers will need to be set in the BEE configuration later, using the neo4j_image and redis_image options, respectively. BEE only supports Neo4j 5.x. We are currently using the latest version of Redis supplied on Docker Hub (as of 2023). For LANL systems, default locations are used and you will not be asked for them.
For other users, these containers can be pulled from Docker Hub (after completing the steps under Installation: below) using
beeflow core pull-deps
which downloads the containers and reports the paths to set in the config later.
Installation:
BEE is a PyPI package that can be installed using pip. On an HPC cluster, you may want to set up a conda (for example, miniconda) environment or other Python environment in which to install beeflow with the following command:
pip install hpc-beeflow
If you do not already have a Python environment, you can use the following example to create one (note: beeflow-env can be any environment name you choose):
mkdir beeflow-env
python3 -m venv beeflow-env
source beeflow-env/bin/activate
pip install hpc-beeflow
You will need to activate the environment with the command source beeflow-env/bin/activate before using beeflow, and type deactivate when done.
An alternative is to use a Poetry environment, but we suggest this only for contributors. For more information, see the Developer’s Guide in this documentation.
Creating Configuration File:
You will need to set up the BEE configuration file, which is located at:
Linux:
~/.config/beeflow/bee.conf
macOS:
~/Library/Application Support/beeflow/bee.conf
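If you script around the configuration file, a small helper such as the one below can pick the path for the current platform. bee_conf_path is a hypothetical name, not a beeflow command, and the sketch simply treats anything that is not macOS as Linux.

```shell
# bee_conf_path [OS]: print the bee.conf location for the given OS name
# (as reported by `uname -s`); defaults to the current system.
# Assumption: only the two locations listed in this guide are considered.
bee_conf_path() {
    os_name=${1:-$(uname -s)}
    case "$os_name" in
        Darwin) printf '%s\n' "$HOME/Library/Application Support/beeflow/bee.conf" ;;
        *)      printf '%s\n' "$HOME/.config/beeflow/bee.conf" ;;
    esac
}

# Example: report whether the configuration file exists yet.
conf_file=$(bee_conf_path)
if [ -f "$conf_file" ]; then
    echo "found $conf_file"
else
    echo "no configuration at $conf_file yet"
fi
```

Quote the result whenever you use it, since the macOS path contains a space.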
Running beeflow config new creates the configuration file with default values. Add the --interactive flag to be prompted for the most important settings.
After generating the bee.conf configuration file, you can edit the settings with a text editor.
Caution: The default for container_archive is in the home directory. Some systems have small quotas for home directories and containers can be large files.
It is important to ensure that the paths to the two required Charliecloud containers are correct. There is a setting for the Neo4j container (neo4j_image) and one for the Redis container (redis_image). See Requirements: above for pulling these containers.
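One way to double-check the two image settings is to read them back from bee.conf and test that the files exist. The sketch below assumes the file uses plain key = value lines; conf_value and check_images are hypothetical helper names, not part of beeflow.

```shell
# conf_value FILE KEY: print the value of the first "KEY = value" line.
# Assumption: bee.conf stores settings as simple key = value lines.
conf_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# check_images FILE: report whether the paths set for neo4j_image and
# redis_image exist; returns non-zero if either is unset or missing.
check_images() {
    rc=0
    for key in neo4j_image redis_image; do
        image_path=$(conf_value "$1" "$key")
        if [ -n "$image_path" ] && [ -e "$image_path" ]; then
            echo "ok: $key = $image_path"
        else
            echo "missing: $key = ${image_path:-<unset>}"
            rc=1
        fi
    done
    return "$rc"
}

# Usage (Linux location from this guide):
#   check_images "$HOME/.config/beeflow/bee.conf"
```

On systems where default image locations are used, these settings may be handled differently, so treat a "missing" report as a prompt to look at the file rather than as a definite error.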
Depending on the system, you may also need to enter system-specific information, such as account information.
Options can be left blank if they are unnecessary.
beeflow config has other options, including a configuration validator. For more information or help, run beeflow config info or beeflow config --help.
Starting up the BEE components:
To start the components (scheduler, slurmrestd (SLURM only), workflow manager, and task manager), simply run:
beeflow core start
To check the status of the BEE components, run:
beeflow core status
beeflow components:
redis ... RUNNING
scheduler ... RUNNING
celery ... RUNNING
slurmrestd ... RUNNING
wf_manager ... RUNNING
task_manager ... RUNNING
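In a script, you may want to confirm that every component reports RUNNING before submitting workflows. The following sketch parses status text in the format shown above; all_running is a hypothetical helper, and the real output format may differ between BEE versions.

```shell
# all_running: read status text on stdin and succeed only if at least one
# "name ... STATE" line is present and every such line ends in RUNNING.
# Assumption: the output looks like the listing in this guide.
all_running() {
    awk '
        / \.\.\. / { seen = 1; if ($NF != "RUNNING") bad = 1 }
        END { exit ((seen && !bad) ? 0 : 1) }
    '
}

# Example: only runs when beeflow is actually installed.
if command -v beeflow >/dev/null 2>&1; then
    if beeflow core status | all_running; then
        echo "all BEE components are running"
    else
        echo "some BEE components are not running"
    fi
fi
```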
Some HPC systems have multiple front ends. Run your workflows and components on the same front end.
Stopping the BEE components:
If at some point you would like to stop the beeflow components, first verify that all workflows are complete (archived). It is also fine to stop the components while workflows are pending, because you can restart beeflow later and start the pending workflows with the “beeflow start” command.
beeflow list
Name ID Status
clamr d631d3 Archived
blast a93267 Pending
Now stop the components.
beeflow core stop
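To spot unfinished workflows before stopping, the listing can be filtered for any status other than Archived. This sketch assumes the three-column layout shown above, with a single header line and no spaces in workflow names; unfinished_workflows is a hypothetical helper.

```shell
# unfinished_workflows: read `beeflow list` style text on stdin and print
# every workflow row whose last column is not "Archived".
# Assumption: the first line is the "Name ID Status" header.
unfinished_workflows() {
    awk 'NR > 1 && NF >= 3 && $NF != "Archived"'
}

# Example: only runs when beeflow is actually installed.
if command -v beeflow >/dev/null 2>&1; then
    remaining=$(beeflow list | unfinished_workflows)
    if [ -n "$remaining" ]; then
        echo "workflows not yet archived:"
        echo "$remaining"
    else
        echo "all workflows are archived"
    fi
fi
```

With the sample listing above, the filter would print only the blast row, since clamr is already archived.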