Getting Started - Example Workflows
If you have beeflow installed and the components running, you are ready to try out a BEE workflow.
It is not necessary to clone the BEE repository to run BEE. However, in order to have access to the files needed for the examples you will need a copy of the repository. If you have not already cloned it, run the following:
git clone https://github.com/lanl/BEE.git
I will refer to the location of your local copy as $BEE_PATH
.
Simple multi-step example (without container)
cat-grep-tar workflow
The cat-grep-tar workflow is a rather simple workflow that demonstrates BEE’s
ability to work with multiple tasks, including those that can run at the same
time. The first step, or task, does a simple cat
of an input file. The
stdout of this step is then passed as a file to two steps that grep
for
different words within the text. These can both be run in parallel. The final
step takes the output files from the grep step and stores these into a tarball.
These input values are stored in a YAML file. As an example we have a default
input.yml
file that can be used. It contains:
input_file: lorem.txt
word0: Vivamus
word1: pulvinar
tarball_fname: out.tgz
This uses a sample from the classic Lorem Ipsum text and does a grep for two
random words in the file. Finally an output tarball is generated with the name
out.tgz
.
Before running, make sure to take a look at the CWL files that form the
workflow. There is a main workflow.cwl
file that specifies the workflow,
inputs and outputs, and all of the workflow steps or tasks. Note
that each of these steps specifies their step-specific inputs and outputs, as
well as a run
option, that in this case points to the CWL file that
further specifies how to execute the command. When creating a
workflow, you will need to create a similar workflow structure, explicitly list
dependencies between steps and also describe how to run the steps on the
system. For more information on writing a workflow please refer to the
Common Workflow Language User Guide, but note that BEE doesn’t currently
support all features.
To run our simple example, you’ll want to first create a workdir for the
workflow and copy over the input file lorem.txt
. I’ll refer to this path in
code samples as $WORKDIR_PATH
. Note that this $WORKDIR_PATH
should be
located outside of the BEE repo. For example, it could be in your home
directory $HOME
. The workdir is where all of your input files should be
stored before starting a workflow, as this will be the current working directory
of all steps that are run. Output from each step will also be stored here.
Once this workdir has been created and beeflow has been started, you are now ready to package and submit the workflow. This can be done with the following sequence of commands:
cd $WORKDIR_PATH
cp $BEE_PATH/examples/cat-grep-tar/lorem.txt .
beeflow package $BEE_PATH/examples/cat-grep-tar . # Tars up the workflow
beeflow submit $NAME ./cat-grep-tar.tgz workflow.cwl input.yml $WORKDIR_PATH # Now submit the workflow
This first command packages the workflow into a tarball, which makes it easy to pass everything over to the Workflow Manager and finally submits the workflow, specifying a name, the tarball path, the location of the CWL file, the yaml file and finally the workflow path containing lorem.txt. If you copy and paste make sure to change $NAME to a name of your choice, $BEE_PATH to the path of the BEE repo, and $WORKDIR_PATH to the proper path that was created earlier. The submit command should have produced a short ID of 6-7 characters.
Alternatively, you can skip packaging the workflow and submit using the path of the directory for the example by:
cd $WORKDIR_PATH
cp -r $BEE_PATH/examples/cat-grep-tar . #Copy example directory
cp cat-grep-tar/lorem.txt .
beeflow submit $NAME cat-grep-tar cat-grep-tar/workflow.cwl cat-grep-tar/input.yml $WORKDIR_PATH # Submits the workflow
This will automatically do the packaging and create an archive in the background to be submitted.
Now the workflow should start up. While the workflow is running you can check
the status by running a beeflow query $ID
. On completion, each step
should be in a COMPLETED
state. If you forgot to copy the lorem.txt file
to $WORKDIR_PATH the cat step will be in the FAILED
state and the error will
be in the cat.err file.
After all steps have finished, you should see a number of files that have been created in your $WORKDIR_PATH:
cat.txt
cat.err
lorem.txt
occur0.txt
occur1.txt
out.tgz
The cat.txt
file is just a duplicate of lorem.txt
and cat.err
is
the stderr output from the cat step. The occur0.txt
and occur1.txt
files were produced respectively by the grep0
and grep1
steps.
out.tgz
was produced by the final tar step. For this example, the cat step
and the tar steps are not really necessary, since the file already exists in
the input directory and on completion you don’t necessarily need to have both
of the occur*.txt
files in a tarball. However, this is a useful sample of
the features a real-world workflow might need to use. For instance, the first
step might be producing some sort of output from a calculation, instead of just
copying the input to the output. The last step may also do some more processing
to produce some sort of final file. If necessary, there can be many more
processing steps than this simple example shows.
CLAMR workflow examples (containerized application)
CLAMR is an open source LANL mini-app that simulates shallow water equations. CLAMR performs hydrodynamic cell-based adaptive mesh refinement (AMR).
The CLAMR workflow examples we introduce here are simple two step workflows that run a CLAMR simulation in step one, producing graphic images from periodic time steps. Then, FFMPEG is run in step two to make a movie visualizing the progression of the simulation. We use these workflows for some of our integration tests and they are practical examples to help you start using BEE. The differences in the CLAMR workflows are the way the containers are used.
CLAMR build workflow the container will be built
CLAMR copy workflow, the container will be copied from a specified path to the container_archive directory (specified in bee.conf)
CLAMR use workflow uses the container specified
CLAMR build workflow
The workflow is in <path to BEE>/examples/clamr-ffmpeg-build. You may want to explore the cwl files to understand the workflow specification for the example. The specification for the build of clamr in this example is for X86 hardware. Below is the clamr step with the DockerRequirement in hints that specifies to build a container from a dockerfile using Charliecloud (the container runtime specified in the configuration file).
CWL for clamr step in examples/clamr-ffmpeg-build/clamr_wf.cwl
Next we’ll submit the CLAMR workflow from a directory of your choosing, referred to as $WORKDIR_PATH, on the same front-end where you started the components. If you have not started the beeflow components, refer to Installation Guide.
In this example, instead of packaging up the workflow cwl files directory, we’ve just listed the full path. This should auto-detect the directory and package it for you.
cd $WORKDIR_PATH
cp -r $BEE_PATH/examples/clamr-ffmpeg-build .
beeflow submit clamr-example clamr-ffmpeg-build clamr-ffmpeg-build/clamr_wf.cwl clamr-ffmpeg-build/clamr_job.yml $WORKDIR_PATH
Output:
Detected directory instead of packaged workflow. Packaging Directory...
Package clamr-ffmpeg-build.tgz created successfully
Workflow submitted! Your workflow id is b94ff7.
Started workflow!
If this is the first time you’ve run the workflow it will build the container
and create a Charliecloud image tarball. This process will be done before
running the workflow tasks as jobs and may take a few minutes. The first task
will be in the ready state, until the container is built. This is the
pre-processing building phase and will only be performed once. In this example
both steps use the container that is built in the pre-processing stage. Once
the build has been completed the Charliecloud image will be in the container
archive location specified in the builder section of the bee configuration
file. You can list contents of the configuration file using beeflow config show
.
The status of the workflow will progress to completion and can be queried as shown:
Check the status:
beeflow query fce80d
Output:
Running
clamr--READY
ffmpeg--WAITING
As the clamr task goes from READY to RUNNING, let’s check the status again:
beeflow query fce80d
Output:
Running
clamr--RUNNING
ffmpeg--WAITING
When the workflow has completed:
beeflow query fce80d
Output:
Archived
clamr--COMPLETED
ffmpeg--COMPLETED
The archived workflow with associated job outputs will be in the
bee_workdir. See the default section of your configuration file (to list
configuration file contents run beeflow config show
). This workflow also produces
output from CLAMR and ffmpeg in the directory where you submitted the workflow :
graphics_output - a directory containing the graphics png files.
total_execution_time.log - log generated by CLAMR
CLAMR_movie.mp4 - The final movie
clamr_stdout.out - standard output from clamr step
This example uses Charliecloud. The image will still be in the Charliecloud
cache. You can list what is in the cache using ch-image list
. If there are
no other builds, the result should be:
ch-image list
clamr-ffmpeg
debian:stable-slim
There are other commands for resetting (clearing out all images) and deleting
an image. Type ch-image --help
more information consult the Charliecloud
documentation.
CLAMR copy workflow
Add LANL example here copying /usr/projects/BEE/clamr/clamr-toss …
CLAMR use workflow
Add LANL example here using /usr/projects/BEE/clamr/clamr-toss …