Provenance is important for trusting the outputs of simulations or data analysis. Reproducing the output data requires a concise and complete record of how the input data was transformed. This includes not only the input data and jobscript, but also the software versions used and where the job was run: different software versions and different hardware may give different output if the algorithm is sensitive to numerical precision.
Jobscripts and/or workflow engines are preferable over interactive sessions.
You should follow the Best Practices on Jobscript Reproducibility. In particular, do not use Shell Initialisation Scripts to alter the job environment, and start the job with a pristine environment that is not inherited from your login shell.
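As a minimal sketch, a pristine environment can be requested at submission time with SLURM's `--export=NONE` option (the shebang shown is an illustration, not a site-specific recommendation):

```shell
#!/bin/bash
# Do not inherit environment variables from the submitting shell;
# the job starts with a clean environment set up by SLURM.
#SBATCH --export=NONE
```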
SLURM keeps some information about the job, such as the resources used, so you do not need to record this separately in your output (unless you want it there for convenience).
Print the currently loaded modules in bash, just before executing the program. The `2>&1` redirects the module command's output from stderr to stdout, so that it ends up in the job's output file.
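For example (assuming an environment-modules or Lmod setup, where `module` writes to stderr):

```shell
# List the currently loaded modules; redirect stderr to stdout so the
# list appears in the job's normal output file.
module list 2>&1
```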
Output where you are running the job.
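For example, a few echo lines can record the node, working directory, and start time (the exact fields are a suggestion):

```shell
# Record where and when the job is running.
echo "Host: $(hostname)"
echo "Directory: $(pwd)"
echo "Date: $(date)"
```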
The scontrol command shows helpful information for a running job, including the nodelist, working directory, input filename and output filenames. You can add this before the executable starts. Not all of this information is retained for querying with sacct.
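A sketch of this, using the `$SLURM_JOB_ID` variable that SLURM sets inside the job:

```shell
# Print the scheduler's view of this job, including the node list,
# working directory, and input/output file names.
scontrol show job "$SLURM_JOB_ID"
```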
If your jobscript is short, you can copy it to stdout so you only have one file to keep. Put this near the top of the jobscript.
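A minimal way to do this is to have the script print itself (`$0` is the path of the running jobscript):

```shell
# Copy this jobscript into the job output so a single file records
# both what was run and what it produced.
cat "$0"
```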
It can be helpful to know if your jobs have a consistent runtime. Adding the below command at the end of your jobscript will give this, within a small margin of error.
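One way to do this is to query SLURM's accounting database for the job's elapsed time (the format fields shown are a suggestion):

```shell
# Report the elapsed time of this job and its steps. Run this at the
# very end of the jobscript; time spent after this command is not
# counted, hence the small margin of error.
sacct -j "$SLURM_JOB_ID" --format=JobID,JobName,Elapsed
```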
This will output as below. Note that each srun command in the jobscript creates a job step, and an additional line is printed for each step.
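Illustratively (the job ID, names, and times here are placeholders):

```
       JobID    JobName    Elapsed
------------ ---------- ----------
123456            myjob   00:12:34
123456.batch      batch   00:12:34
123456.0         myprog   00:12:30
```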
If you are using a workflow engine, it may have a superior in-built method for capturing the data and what is done to it. Consult the documentation for your particular workflow engine.