Group descriptors

Overview

A group descriptor is a data structure that describes a group of processes. Each group is given a name, which is used as a key to refer to the group. For each group name, a process belongs to at most one group, which is a subset of all processes in the job.

There are two pre-defined groups: WORLD which contains all processes in MPI_COMM_WORLD and NODE which contains all processes on the same node. SCR determines which processes are on the same node by splitting processes into groups that have the same value for scr_my_hostname, which is set by calling scr_env_hostname().

Additional groups may be defined via entries in the system or user configuration files. It is necessary to define additional groups when failure modes or storage devices span multiple compute nodes. For example if network switch failures are common, then one could define a group to specify which nodes share a network switch to enable SCR to protect against such failures.

The group descriptor is a C struct. During the run, the SCR library maintains an array of group descriptor structures in a global variable named scr_groupdescs. It records the number of descriptors in this list in a variable named scr_ngroupdescs. It builds this list during SCR_Init() by calling scr_groupdescs_create() which constructs the list from a third variable called scr_groupdescs_hash. This hash variable is initialized from entries in the configuration files while processing SCR parameters. The group structures are freed in SCR_Finalize() by calling scr_groupdescs_free().

Group descriptor struct

Here is the definition for the C struct.

typedef struct {
  int      enabled;      /* flag indicating whether this descriptor is active */
  int      index;        /* each descriptor is indexed starting from 0 */
  char*    name;         /* name of group */
  MPI_Comm comm;         /* communicator of processes in same group */
  int      rank;         /* local rank of process in communicator */
  int      ranks;        /* number of ranks in communicator */
} scr_groupdesc;

The enabled field is set to 0 (false) or 1 (true) to indicate whether this particular group descriptor may be used. Even though a group descriptor may be defined, it may be disabled. The index field records the index within the scr_groupdescs array. The name field is a copy of the group name. The comm field is a handle to the MPI communicator that defines the group the process is a member of. The rank and ranks fields cache the rank of the process in this communicator and the number of processes in this communicator, respectively.

Example group descriptor configuration file entries

Here are some examples of configuration file entries to define new groups.

GROUPS=zin1  POWER=psu1  SWITCH=0
GROUPS=zin2  POWER=psu1  SWITCH=1
GROUPS=zin3  POWER=psu2  SWITCH=0
GROUPS=zin4  POWER=psu2  SWITCH=1

Group descriptor entries are identified by a leading GROUPS key. Each line corresponds to a single compute node, where the hostname is the value of the GROUPS key. There must be one line for every compute node in the allocation. It is recommended to specify groups in the system configuration file.

The remaining values on the line specify a set of group name / value pairs. The group name is the string to be referenced by store and checkpoint descriptors. The value can be an arbitrary character string. The only requirement is that for a given group name, nodes that form a group must specify identical values.

In the above example, there are four compute nodes: zin1, zin2, zin3, and zin4. There are two groups defined: POWER and SWITCH. Nodes zin1 and zin2 belong to the same POWER group, as do nodes zin3 and zin4. For the SWITCH group, nodes zin1 and zin3 belong to the same group, as do nodes zin2 and zin4.

Common functions

This section describes some of the most common group descriptor functions. These functions are defined in scr_groupdesc.h and implemented in scr_groupdesc.c.

Creating and freeing the group descriptors array

To initialize the scr_groupdescs and scr_ngroupdescs variables from the scr_groupdescs_hash variable:

scr_groupdescs_create();

Free group descriptors array.

scr_groupdescs_free();

Lookup group descriptor by name

To lookup a group descriptor by name.

scr_groupdesc* group = scr_groupdescs_from_name(name);

This returns NULL if the specified group name is not defined. There is also a function to return the index of a group within scr_groupdescs.

int index = scr_groupdescs_index_from_name(name);

This returns an index value in the range \([0, \texttt{scr\_ngroupdescs})\) if the specified group name is defined and it returns -1 otherwise.