Scavenge
At the end of an allocation, certain SCR commands inspect the cache to verify that the most recent checkpoint has been copied to the parallel file system. If not, these commands execute other SCR commands to scavenge this checkpoint before the allocation ends. In this section, we detail key concepts referenced as part of the scavenge operations. Detailed program flow for these operations is provided in Section Program Flow>Scavenge.
Rank filemap file
The scr_copy command is a serial (non-MPI) program that executes on
a compute node and copies all files belonging to a specified dataset id
from the cache to a specified dataset directory on the parallel file
system. It is implemented in scr_copy.c, and its program flow is
described in Section <scr_copy>. The scr_copy command copies all
application files and SCR redundancy data files. In addition, it writes
a special filemap file for each rank to the dataset directory. The name
of this filemap file has the format <rank>.scrfilemap. An example hash
for such a filemap file is shown below:
DSET
  6
    RANK
      2
RANK
  2
    DSET
      6
        DSETDESC
          COMPLETE
            1
          SIZE
            2097182
          FILES
            4
          ID
            6
          NAME
            scr.dataset.6
          CREATED
            1312850690668536
          USER
            user1
          JOBNAME
            simulation123
          JOBID
            112573
          CKPT
            6
        FILES
          2
        FILE
          3_of_4_in_0.xor
            META
              RANKS
                4
              COMPLETE
                1
              SIZE
                175693
              TYPE
                XOR
              FILE
                3_of_4_in_0.xor
              CRC
                0x2ef519a1
          rank_2.ckpt
            META
              COMPLETE
                1
              SIZE
                524296
              NAME
                rank_2.ckpt
              PATH
                /p/lscratchb/user1/simulation123
              ORIG
                rank_2.ckpt
              RANKS
                4
              TYPE
                FULL
              FILE
                rank_2.ckpt
              CRC
                0x738bb68f
It lists the files owned by a rank for a particular dataset. In this
case, it shows that rank 2 wrote two files (FILES=2) as part of
dataset id 6. Those files are named rank_2.ckpt and 3_of_4_in_0.xor.
This format is similar to the filemap hash format described in Section Filemap. The main differences are that files are listed using relative paths instead of absolute paths and there are no redundancy descriptors. The paths are relative so that the dataset directory on the parallel file system may be moved or renamed. Redundancy descriptors are cache-specific, so these entries are excluded.
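As an illustration of how such a hash can be read, the following Python sketch looks up the expected file count and the file names for one rank and dataset. It assumes the hash nests as key, then value, then children, and it models that nesting with plain dictionaries; the helper names (single_value, files_for) are hypothetical and are not part of SCR.

```python
# Nested-dict stand-in for part of the example rank filemap hash above.
# Assumption: each key maps to its value, and the value maps to its children.
filemap = {
    "DSET": {"6": {"RANK": {"2": {}}}},
    "RANK": {
        "2": {
            "DSET": {
                "6": {
                    "FILES": {"2": {}},
                    "FILE": {
                        "3_of_4_in_0.xor": {"META": {"TYPE": {"XOR": {}}}},
                        "rank_2.ckpt": {"META": {"TYPE": {"FULL": {}}}},
                    },
                }
            }
        }
    },
}


def single_value(node, key):
    """Return the sole child stored under `key`, or None if the key is absent."""
    return next(iter(node.get(key, {})), None)


def files_for(filemap, rank, dset):
    """Return (expected file count, sorted file names) for one rank and dataset."""
    entry = filemap["RANK"][str(rank)]["DSET"][str(dset)]
    expected = int(single_value(entry, "FILES"))
    return expected, sorted(entry.get("FILE", {}))


expected, names = files_for(filemap, 2, 6)
```

Because the file names carry no directory component, the same lookup keeps working after the dataset directory is moved or renamed.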
Scanning files
After scr_copy copies files from the cache on each compute node to
the parallel file system, the scr_index command runs to check
whether all files were recovered, rebuild missing files if possible, and
add an entry for the dataset to the SCR index file
(Section Index_file). When invoking the scr_index command, the full
path to the prefix directory and the name of the dataset directory are
specified on the command line. The scr_index command is implemented in
scr_index.c, and its program flow is described in Section <scr_index>.
The scr_index command first acquires a listing of all items
contained in the dataset directory by calling scr_read_dir, which is
implemented in scr_index.c. This function uses POSIX calls to list
all files and subdirectories contained in the dataset directory. The
hash returned by this function distinguishes directories from files
using the following format:
DIR
  <dir1>
  <dir2>
  ...
FILE
  <file1>
  <file2>
  ...
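The listing step can be sketched in Python as follows. This is not the scr_read_dir implementation, which is written in C; it only mirrors the DIR/FILE layout shown above using the standard os.scandir call, and the function name read_dir is hypothetical.

```python
import os


def read_dir(path):
    """List `path`, separating subdirectories from files in a DIR/FILE layout."""
    listing = {"DIR": {}, "FILE": {}}
    with os.scandir(path) as entries:
        for entry in entries:
            # Symlinks are not followed when deciding whether an item is a directory.
            kind = "DIR" if entry.is_dir(follow_symlinks=False) else "FILE"
            listing[kind][entry.name] = {}
    return listing
```

A caller could then pick out the rank filemap files with a check such as name.endswith(".scrfilemap") over the names under FILE.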
The scr_index command then iterates over the list of file names and
reads each file that ends with the “.scrfilemap” extension. These
files are the filemap files written by scr_copy as described above.
The scr_index command records the number of expected files for each
rank into a single hash called the scan hash.
For each file listed in the rank filemap file, the scr_index command
verifies the metadata from the rank filemap file against the original
file (excluding CRC32 checks). If the file passes these checks, the
command adds a corresponding entry for the file to the scan hash. This
entry is formatted such that it can be used as an entry in the summary
file hash (Section Summary file). If the file is an XOR file, it sets a
NOFETCH flag under the FILE key, which instructs the SCR library to
exclude this file during a fetch operation.
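The scan step for one rank might look like the Python sketch below. It is only an illustration: the text does not list which metadata fields are compared, so the sketch assumes the check is "file exists and its size matches", and the function name and the flat per-file metadata dictionaries are hypothetical.

```python
import os


def scan_rank_files(dataset_dir, rank, expected_files, files, scan):
    """Add the verified files of `rank` to `scan`.

    `files` maps each file name to a flat metadata dict with SIZE, TYPE,
    and optionally CRC (a simplification of the META hash).
    """
    ranks = scan.setdefault("RANK", {})
    rank_entry = ranks.setdefault(rank, {"FILES": expected_files, "FILE": {}})
    for name, meta in files.items():
        path = os.path.join(dataset_dir, name)
        # Assumed check: existence and size only; the CRC32 is not recomputed.
        if not os.path.isfile(path) or os.path.getsize(path) != meta["SIZE"]:
            continue
        entry = {"SIZE": meta["SIZE"]}
        if "CRC" in meta:
            entry["CRC"] = meta["CRC"]
        if meta.get("TYPE") == "XOR":
            # XOR redundancy files are flagged so that a fetch skips them.
            entry["NOFETCH"] = True
        rank_entry["FILE"][name] = entry
    return scan
```

Files that fail the check are simply left out, so a later pass can detect them by comparing the number of recorded files with the expected count.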
Furthermore, for each XOR file, the scr_index command extracts
info about the XOR set from the file name and adds an entry under an
XOR key in the scan hash. It records the XOR set id (under XOR), the
number of members in the set (under MEMBERS), and the group rank of
the current file in this set (under MEMBER), as well as the global
rank id (under RANK) and the name of the XOR file (under FILE). After
all of this, the scan hash might look like the following example:
DLIST
  <dataset_id>
    DSET
      COMPLETE
        1
      SIZE
        2097182
      FILES
        4
      ID
        6
      NAME
        scr.dataset.6
      CREATED
        1312850690668536
      USER
        user1
      JOBNAME
        simulation123
      JOBID
        112573
      CKPT
        6
    RANK2FILE
      RANKS
        <num_ranks>
      RANK
        <rank1>
          FILES
            <num_expected_files_for_rank1>
          FILE
            <filename>
              SIZE
                <filesize>
              CRC
                <crc>
            <xor_filename>
              NOFETCH
              SIZE
                <filesize>
              CRC
                <crc>
            ...
        <rank2>
          FILES
            <num_expected_files_for_rank2>
          FILE
            <filename>
              SIZE
                <filesize>
              CRC
                <crc>
            <xor_filename>
              NOFETCH
              SIZE
                <filesize>
              CRC
                <crc>
            ...
        ...
    XOR
      <set1>
        MEMBERS
          <num_members_in_set1>
        MEMBER
          <member1>
            FILE
              <xor_filename_of_member1_in_set1>
            RANK
              <rank_id_of_member1_in_set1>
          <member2>
            FILE
              <xor_filename_of_member2_in_set1>
            RANK
              <rank_id_of_member2_in_set1>
          ...
      <set2>
        MEMBERS
          <num_members_in_set2>
        MEMBER
          <member1>
            FILE
              <xor_filename_of_member1_in_set2>
            RANK
              <rank_id_of_member1_in_set2>
          <member2>
            FILE
              <xor_filename_of_member2_in_set2>
            RANK
              <rank_id_of_member2_in_set2>
          ...
      ...
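The extraction of XOR set information from a file name can be sketched in Python. The pattern below assumes that a name such as 3_of_4_in_0.xor means member 3 of 4 in set 0; that reading is inferred from the example file name and is not stated in the text, and the function name add_xor_entry is hypothetical.

```python
import re

# Assumed name layout: <member>_of_<members>_in_<set_id>.xor
XOR_NAME = re.compile(r"^(\d+)_of_(\d+)_in_(\d+)\.xor$")


def add_xor_entry(scan, filename, rank):
    """Record an XOR file under the XOR key of `scan`; return False if not an XOR name."""
    match = XOR_NAME.match(filename)
    if match is None:
        return False
    member, members, set_id = (int(group) for group in match.groups())
    xor_sets = scan.setdefault("XOR", {})
    xor_set = xor_sets.setdefault(set_id, {"MEMBERS": members, "MEMBER": {}})
    xor_set["MEMBER"][member] = {"FILE": filename, "RANK": rank}
    return True
```

Calling add_xor_entry once per XOR file found across all rank filemap files accumulates one entry per set, with one MEMBER entry per file that was actually present.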
Inspecting files
After merging data from all filemap files in the dataset directory, the
scr_index command inspects the scan hash to identify any missing
files. For each dataset, it determines the number of ranks associated
with the dataset, and it checks that it has an entry in the scan hash
for each rank. It then checks whether each rank has an entry for each
of its expected number of files. If any file is determined to be
missing, the command adds an INVALID flag to the scan hash, and it
lists all ranks that are missing files under the MISSING key. This
operation may thus add entries like the following to the scan hash:
DLIST
  <dataset_id>
    INVALID
    MISSING
      <rank1>
      <rank2>
      ...
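The inspection pass can be sketched in Python as below. The sketch works on the entry for a single dataset, assumes ranks are numbered 0 through RANKS-1, and stores counts as plain integers; the function name inspect_dataset is hypothetical.

```python
def inspect_dataset(dset):
    """Flag `dset` as INVALID and list ranks with missing files under MISSING."""
    ranks = dset.get("RANK", {})
    missing = []
    for rank in range(dset["RANKS"]):
        entry = ranks.get(rank)
        # A rank is missing files if it has no entry at all, or if it has
        # fewer recorded files than its expected count.
        if entry is None or len(entry.get("FILE", {})) < entry["FILES"]:
            missing.append(rank)
    if missing:
        dset["INVALID"] = True
        dset["MISSING"] = missing
    return missing
```

A dataset with no missing ranks is left untouched, so the presence of INVALID is enough to tell later steps that a rebuild is needed.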
Rebuilding files
If any ranks are missing files, then the scr_index command attempts
to rebuild files. Currently, only the XOR redundancy scheme can be
used to rebuild files. The command iterates over each of the XOR
sets listed in the scan hash, and it checks that each set has an entry
for each of its members. If it finds an XOR set that is missing a
member, or if it finds that a set contains a rank which is known to be
missing files, the command constructs a string that can be used to fork
and exec a process to rebuild the files for that set. It records
these strings under the BUILD key in the scan hash. If it finds that
one or more files cannot be recovered, it sets an UNRECOVERABLE flag
in the scan hash. If the scr_index command determines that it is
possible to rebuild all missing files, it forks and execs a process for
each string listed under the BUILD key. Thus, this operation may add
entries like the following to the scan hash:
DLIST
  <dataset_id>
    UNRECOVERABLE
    BUILD
      <cmd_to_rebuild_files_for_set1>
      <cmd_to_rebuild_files_for_set2>
      ...
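The planning part of this step can be sketched in Python. The sketch relies on the general property that a single XOR parity can reconstruct at most one lost member per set; it treats a set with more than one lost member as unrecoverable. The command strings are left as placeholders in the style of the listing above, since the actual rebuild command line is not given here, and the function name plan_rebuilds is hypothetical. The sketch does not fork or exec anything.

```python
def plan_rebuilds(dset):
    """Fill in BUILD and UNRECOVERABLE for one dataset entry; return the BUILD list."""
    missing_ranks = set(dset.get("MISSING", []))
    builds = []
    for set_id, xor_set in dset.get("XOR", {}).items():
        members = xor_set["MEMBER"]
        # Members with no entry at all in this set.
        absent = xor_set["MEMBERS"] - len(members)
        # Members that are present but whose rank is known to be missing files.
        damaged = sum(1 for m in members.values() if m["RANK"] in missing_ranks)
        lost = absent + damaged
        if lost == 0:
            continue
        if lost > 1:
            # One XOR parity cannot reconstruct more than one lost member.
            dset["UNRECOVERABLE"] = True
            continue
        # Placeholder only; the real command string is not specified here.
        builds.append(f"<cmd_to_rebuild_files_for_set{set_id}>")
    if builds:
        dset["BUILD"] = builds
    return builds
```

A caller would run the recorded commands only when UNRECOVERABLE is not set, matching the rule that rebuilds are launched only if all missing files can be recovered.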
Scan hash
After all of these steps, the scan hash is of the following form:
DLIST
  <dataset_id>
    UNRECOVERABLE
    BUILD
      <cmd_to_rebuild_files_for_set1>
      <cmd_to_rebuild_files_for_set2>
      ...
    INVALID
    MISSING
      <rank1>
      <rank2>
      ...
    RANKS
      <num_ranks>
    RANK
      <rank>
        FILES
          <num_files_to_expect>
        FILE
          <file_name>
            SIZE
              <size_in_bytes>
            CRC
              <crc32_string_in_0x_form>
          <xor_file_name>
            NOFETCH
            SIZE
              <size_in_bytes>
            CRC
              <crc32_string_in_0x_form>
          ...
      ...
    XOR
      <xor_setid>
        MEMBERS
          <num_members_in_set>
        MEMBER
          <member_id>
            FILE
              <xor_filename>
            RANK
              <rank>
          ...
      ...
After the rebuild attempt, the scr_index command writes a summary
file in the dataset directory. To produce the hash for the summary file,
the command deletes extraneous entries from the scan hash
(UNRECOVERABLE, BUILD, INVALID, MISSING, XOR) and adds the summary
file format version number.
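The conversion from scan hash to summary hash can be sketched in Python as follows. The list of removed keys comes from the text above; the key used to store the version number ("VERSION") and the function name to_summary are assumptions made for illustration.

```python
import copy

# Keys named in the text as extraneous to the summary file.
EXTRANEOUS = ("UNRECOVERABLE", "BUILD", "INVALID", "MISSING", "XOR")


def to_summary(dset, version):
    """Return a summary hash built from a copy of the scan entry `dset`."""
    summary = copy.deepcopy(dset)
    for key in EXTRANEOUS:
        summary.pop(key, None)
    # Hypothetical key name for the summary file format version number.
    summary["VERSION"] = version
    return summary
```

Working on a deep copy leaves the scan hash intact, which is convenient if the caller still needs the rebuild bookkeeping after the summary is written.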