Perl modules

scripts/common/scr_hostlist.pm

Manipulates lists of hostnames. The elements in a set of hostnames are expected to have a common alpha prefix (machine name) followed by a number (node number). A hostlist can be specified in one of two forms:

compressed

“atlas[3,5-7,9-11]”

Perl string scalar

uncompressed

(“atlas3”,“atals5”,“atlas6”,“atlas7”,“atlas9”,“atlas10”,“atlas11”)

Perl list of string scalars

All functions in this module are global; no instance must be created.

Given a compressed hostlist, construct the corresponding uncompressed hostlist (preserves order and duplicates).

my @hostlist = scr_hostlist::expand($hostlist);

Given an uncompressed hostlist, construct a compressed hostlist (preserves duplicate hostnames, but sorts list by node number).

my $hostlist = scr_hostlist::compress(@hostlist);

Given references to two uncompressed hostlists, subtract second list from first and return remainder as an uncompressed hostlist.

my @hostlist = scr_hostlist::diff(\@hostlist1, \@hostlist2);

Given references to two uncompressed hostlists, return the intersection of the two as an uncompressed hostlist.

my @hostlist = scr_hostlist::intersect(\@hostlist1, \@hostlist2);

scripts/common/scr_param.pm.in

Reads and returns SCR configuration parameters, returning the first set value found by searching in the following order:

  1. Environment variable,

  2. User configuration file,

  3. System configuration file,

  4. Default (build-time constant).

When an instance is created, it attempts to read the user configuration file from SCR_CONF_FILE if that variable is set. Otherwise, it attempts to read the user configuration file from <prefix>/.scrconf, where <prefix> is the prefix directory specified in SCR_PREFIX or the current working directory if SCR_PREFIX is not set.

Some parameters cannot be set by a user, and for these parameters any settings in environment variables or the user configuration file are ignored.

The majority of parameters return scalar values, but some return an associated hash.

Allocate a new scr_param object.

my $param = new scr_param();

Given the name of an SCR parameter, return its scalar value.

my $val = $param->get($name);

Given the name of an SCR parameter, return a reference to its hash.

my $hashref = $param->get_hash($name);

To disable a user from setting a parameter, add it to the no_user hash within the module implementation. Parameters listed in this hash will not be affected by environment variables or user configuration file settings.

Compile-time constants should be listed in the compile hash.

Utilities

scripts/common/scr_glob_hosts.in

Uses scr_hostlist.pm to manipulate hostlists. Only accepts compressed hostlists for input.

Given a compressed hostlist, return number of hosts.

scr_glob_hosts --count "atlas[3,5-7,9-11]"

The example above returns “7”, as there are seven hosts specified in the list.

Given a compressed hostlist, return the nth host.

scr_glob_hosts --nth 3 "atlas[3,5-7,9-11]"

The example above returns “atlas6”, which is the third host.

Given two compressed hostlists, subtract one from the other and return remainder.

scr_glob_hosts --minus "atlas[3,5-7,9-11]":"atlas[5,7,20]"

The above example returns “atlas[3,6,9-11]”, which has removed “atlas5” and “atlas7” from the first list.

Given two compressed hostlists, return intersection of the two.

scr_glob_hosts --intersection "atlas[3,5-7,9-11]":"atlas[5,7,20]"

The above example returns “atlas[5,7]”, which is the list of common hosts between the two lists.

src/scr_flush_file.c

Utility to access info in SCR flush file. One must specify the prefix directory from which to read the flush file.

Read the flush file and return the latest checkpoint id.

scr_flush_file --latest /prefix/dir

The above command prints the checkpoint id of the most recent checkpoint in the flush file. It exits with a return code of 0 if it found a checkpoint id, and it exits with a return code of 1 otherwise.

Determine whether a specified checkpoint id needs to be flushed.

scr_flush_file --needflush 6 /prefix/dir

The command above checks whether the SCR_FLUSH_KEY_LOCATION_PFS key is set for the specified checkpoint id. If so, the command exits with 0, otherwise is exits with 1.

List the location(s) containing a copy of the dataset.

scr_flush_file --location 6 /prefix/dir

This command lists PFS for the parallel file system, and it lists CACHE for datasets stored in cache.

List the subdirectory where a dataset should be flushed to.

scr_flush_file --subdir 6 /prefix/dir

scripts/common/scr_list_dir.in

Returns full path to control or cache directory. Uses scr_param.pm. This command should be executed in an environment where SCR_CONF_FILE is set to the same value as the running job.

  1. Uses scr_param.pm to read SCR_CNTL_BASE to get base control directory.

  2. Uses scr_param.pm to read CACHE hash from config file to get info on cache directories.

  3. Invokes “scr_env –user” to get the username if not specified on command line.

  4. Invokes “scr_env –jobid” to get the jobid if not specified on command line.

  5. Combines base, user, and jobid to build and output full path to control or cache directory.

scripts/TLCC/scr_list_down_nodes.in

Runs a series of tests over all specified nodes and builds list of nodes which fail one or more tests. Uses scr_hostlist.pm to manipulate hostlists. Uses scr_param.pm to read various parameters.

  1. Invokes “scr_env –nodes” to get the current nodeset, if not specified on command line.

  2. Invokes “scr_env –down” to ask resource manager whether any nodes are known to be down.

  3. Invokes ping to each node thought to be up.

  4. Uses scr_param.pm to read SCR_EXCLUDE_NODES, user may explicitly exclude nodes this way.

  5. Adds any nodes explicitly listed on command line as being down.

  6. Invokes scr_list_dir to get list of base directories for control directory.

  7. Uses scr_param.pm to read CNTLDIR hash from config file to get expected capacity corresponding to each base directory.

  8. Invokes scr_list_dir to get list of base directories for cache directory.

  9. Uses scr_param.pm to read CACHEDIR hash from config file to get expected capacity corresponding to each base directory.

  10. Invokes pdsh to run scr_check_node on each node that hasn’t yet failed a test.

  11. Optionally print list of down nodes to stdout.

  12. Optionally log each down node with reason via scr_log_event if logging is enabled.

  13. Exit with appropriate code to indicate whether any nodes are down.

scripts/common/scr_check_node.in

Runs on compute node to execute a series of checks to verify that node is still functioning.

  1. Reads list of control directories and sizes from –cntl option.

  2. Reads list of cache directories and sizes from –cache option.

  3. Invokes ls -lt to check basic access for each directory.

  4. If size is specified, invoke df to verify that each directory has sufficient space.

  5. Invokes touch and rm -rf to create and delete a test file in each directory.

scripts/common/scr_prefix.in

Prints SCR prefix directory.

  1. Reads $SCR_PREFIX if set.

  2. Invokes pwd to get current working directory otherwise.