Problem

A job on the Pawsey supercomputers fails with "slurmstepd: error: Exceeded job memory limit at some point."

Solution

This error means that the job has used up all of the memory available to it on a core or node. It can occur shortly after the job starts or much later in its execution, depending on when the application's demand for memory peaks. There are two ways to solve this problem:

  1. Reduce the memory requirement of the application. This may mean reducing the problem size or, if you are developing your own code, checking it for memory leaks.
  2. Increase the memory available to each task by reducing the number of tasks per node, so that each task gets a larger share of the node's memory (for the same total number of tasks, this means using more nodes). On Zeus, also explicitly request more memory for each task.
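As a rough planning aid for option 2, the following minimal sketch works out how many tasks fit on a node when each task needs a given amount of memory, and produces matching Slurm batch directives. It assumes one CPU per task and that you know the memory actually usable by jobs on each node (check the system documentation; the numbers in the example are illustrative only and are not the specification of any Pawsey system). The function names are hypothetical; `--nodes`, `--ntasks`, `--ntasks-per-node` and `--mem-per-cpu` are standard `sbatch` options. The `--mem-per-cpu` line corresponds to the explicit per-task memory request mentioned for Zeus; whether it is needed elsewhere depends on how each system is configured.

```python
import math


def tasks_per_node(mem_per_task_mb: int, node_mem_mb: int, cores_per_node: int) -> int:
    """Largest number of tasks that fit on one node without exceeding its memory.

    Assumes node_mem_mb is the memory actually usable by jobs on a node
    and that each task uses one core.
    """
    if mem_per_task_mb <= 0:
        raise ValueError("memory per task must be positive")
    fit_by_memory = node_mem_mb // mem_per_task_mb
    if fit_by_memory == 0:
        raise ValueError("a single task needs more memory than one node provides")
    # Never place more tasks on a node than it has cores.
    return min(cores_per_node, fit_by_memory)


def sbatch_directives(total_tasks: int, mem_per_task_mb: int,
                      node_mem_mb: int, cores_per_node: int) -> list[str]:
    """Hypothetical helper: #SBATCH lines that spread tasks so each gets enough memory."""
    per_node = tasks_per_node(mem_per_task_mb, node_mem_mb, cores_per_node)
    # Fewer tasks per node means more nodes for the same total task count.
    nodes = math.ceil(total_tasks / per_node)
    return [
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks={total_tasks}",
        # When --ntasks is also given, Slurm treats this as a per-node maximum.
        f"#SBATCH --ntasks-per-node={per_node}",
        # Explicit per-task memory request (assumes one CPU per task).
        f"#SBATCH --mem-per-cpu={mem_per_task_mb}M",
    ]


if __name__ == "__main__":
    # Illustrative numbers only: 48 tasks needing 6000 MB each, on nodes
    # with 64000 MB of usable memory and 24 cores.
    for line in sbatch_directives(48, 6000, 64000, 24):
        print(line)
```

With these illustrative numbers only 10 of the 24 cores on each node are used, so the 48 tasks spread over 5 nodes instead of 2. That is the trade-off behind option 2: each task gets more memory at the cost of more nodes.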
