
Provenance is important for trusting the outputs of simulations or data analysis. To reproduce the output data, you need concise and complete records of the transformations applied to the input data. This includes recording not only the input data and jobscript, but also which versions of software were used and where the job was run. Different software versions and different hardware may give different output if the algorithm is sensitive to numerical precision.

Jobscripts and/or workflow engines are preferable to interactive sessions, because they leave a record of the commands that were run.

Jobscripts

You should follow Best Practices on Jobscript Reproducibility. In particular, do not use Shell Initialisation Scripts to alter the job environment, and start the job with a pristine environment that is not inherited from your login shell.
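One way to keep the login shell's environment out of a batch job is the sbatch option `--export=NONE`, which stops the submitting shell's variables from being propagated to the job. The sketch below assumes that option suits your site (the Best Practices page referred to above may prescribe something different) and uses a hypothetical helper, `dump_environment`, to record the environment the job actually starts with.

```shell
#!/bin/bash
#SBATCH --export=NONE   # assumption: do not propagate the submitting shell's environment

# Hypothetical helper: print the job's environment, sorted, between two markers
# so it is easy to find in the output file and to diff against another run.
dump_environment() {
    echo "=== environment start ==="
    env | sort
    echo "=== environment end ==="
}

dump_environment
```

Leave the dump out if your environment may hold credentials, since the job output is often shared along with the results.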

SLURM keeps some information on the job, such as resources used, so you do not need to separately record this in your output (unless required for convenience).


In bash, print the currently loaded modules just before executing the program. The 2>&1 redirects the module output from stderr to stdout, so that it ends up in the job's output file.

module list 2>&1

Print the directory in which the job is running.

pwd
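Since the hardware can also affect the output, it may help to record the node and kernel alongside the directory. The following is a minimal sketch using only standard `uname` and `date`; the helper name `record_host_info` is hypothetical and the choice of fields is an assumption, not a site requirement.

```shell
# Hypothetical helper: report the node, kernel and start time of the job.
record_host_info() {
    echo "host:    $(uname -n)"                      # node the script runs on
    echo "kernel:  $(uname -sr)"                     # operating system and release
    echo "started: $(date -u +%Y-%m-%dT%H:%M:%SZ)"   # UTC timestamp
}

record_host_info
```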

The scontrol command shows helpful information for a running job, including the nodelist, working directory, input filename and output filenames.  You can add this before the executable starts.  Not all of this information is retained for querying with sacct.

scontrol show job $SLURM_JOBID

If your jobscript is short, you can copy it to stdout so you only have one file to keep.  Put this near the top of the jobscript.

cat "$0"
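The commands introduced so far can be collected into a single preamble. This is only a sketch: the function name `print_provenance` and the section markers are hypothetical, and the guards are an assumption made so that the script still runs where `module` or `scontrol` is unavailable.

```shell
# Hypothetical helper: run the provenance commands from this section in one go.
# Module- and Slurm-specific commands are skipped when they are unavailable,
# so the function also works when the script is tried outside a job.
print_provenance() {
    echo "=== jobscript ==="
    if [ -f "$0" ]; then
        cat "$0"
    else
        echo "(jobscript not readable)"
    fi
    echo "=== loaded modules ==="
    if type module >/dev/null 2>&1; then
        module list 2>&1
    else
        echo "(module command not available)"
    fi
    echo "=== working directory ==="
    pwd
    echo "=== job details ==="
    if command -v scontrol >/dev/null 2>&1 && [ -n "${SLURM_JOBID:-}" ]; then
        scontrol show job "$SLURM_JOBID"
    else
        echo "(not running inside a Slurm job)"
    fi
}

print_provenance
```

Call the function after loading modules and just before the executable, so the record reflects the state the program actually ran in.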

It can be helpful to know whether your jobs have a consistent runtime. Adding the command below to the end of your jobscript reports the start time and elapsed time of the job, within a small margin of error.

sacct -j $SLURM_JOBID -o jobid%20,Start,elapsed

The output looks like the following. Note that if there are multiple srun commands in the jobscript, a line will be added for each.

               JobID               Start    Elapsed
-------------------- ------------------- ----------
             3109174 2019-07-02T08:34:36   03:32:09
      3109174.extern 2019-07-02T08:34:36   03:32:09
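To compare runtimes between jobs it is often easier to work in seconds. The following bash sketch converts an Elapsed value into seconds; the helper name `elapsed_to_seconds` is hypothetical, and it assumes the value has the form HH:MM:SS with an optional leading day count (DD-HH:MM:SS).

```shell
#!/bin/bash
# Hypothetical helper: convert an sacct Elapsed value, [DD-]HH:MM:SS, to seconds.
elapsed_to_seconds() {
    local elapsed=$1 days=0 h m s
    # Split off an optional leading "DD-" day count.
    if [[ $elapsed == *-* ]]; then
        days=${elapsed%%-*}
        elapsed=${elapsed#*-}
    fi
    IFS=: read -r h m s <<< "$elapsed"
    # 10# forces base 10 so that fields such as "08" and "09" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

elapsed_to_seconds 03:32:09    # Elapsed value from the sample output above; prints 12729
```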


Workflow Engines

If you are using a workflow engine, it may have a superior built-in method for capturing the data and what is done to it. Consult the documentation for your particular workflow engine.


