.. _drain:

Scavenge
========

At the end of an allocation, certain SCR commands inspect the cache to verify that the most recent checkpoint has been copied to the parallel file system.
If not, these commands execute other SCR commands to scavenge this checkpoint before the allocation ends.
In this section, we detail key concepts referenced as part of the scavenge operations.
Detailed program flow for these operations is provided in Section :ref:`Program Flow>Scavenge `.

Rank filemap file
-----------------

The ``scr_copy`` command is a serial (non-MPI) program that executes on a compute node and copies all files belonging to a specified dataset id from the cache to a specified dataset directory on the parallel file system.
It is implemented in ``scr_copy.c``, and its program flow is described in Section :ref:``.
The ``scr_copy`` command copies all application files and SCR redundancy data files.
In addition, it writes a special filemap file for each rank to the dataset directory.
The name of this filemap file ends with the ``.scrfilemap`` extension.
An example hash for such a filemap file is shown below:

::

  DSET
    6
      RANK
        2
  RANK
    2
      DSET
        6
          DSETDESC
            COMPLETE
              1
            SIZE
              2097182
            FILES
              4
            ID
              6
            NAME
              scr.dataset.6
            CREATED
              1312850690668536
            USER
              user1
            JOBNAME
              simulation123
            JOBID
              112573
            CKPT
              6
          FILES
            2
          FILE
            3_of_4_in_0.xor
              META
                RANKS
                  4
                COMPLETE
                  1
                SIZE
                  175693
                TYPE
                  XOR
                FILE
                  3_of_4_in_0.xor
                CRC
                  0x2ef519a1
            rank_2.ckpt
              META
                COMPLETE
                  1
                SIZE
                  524296
                NAME
                  rank_2.ckpt
                PATH
                  /p/lscratchb/user1/simulation123
                ORIG
                  rank_2.ckpt
                RANKS
                  4
                TYPE
                  FULL
                FILE
                  rank_2.ckpt
                CRC
                  0x738bb68f

This hash lists the files owned by a rank for a particular dataset.
In this case, it shows that rank ``2`` wrote two files (``FILES=2``) as part of dataset id ``6``.
Those files are named ``rank_2.ckpt`` and ``3_of_4_in_0.xor``.
This format is similar to the filemap hash format described in Section :ref:`Filemap `.
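
To make the nesting in the example concrete, the following C sketch stores part of it in a minimal tree in which every value (such as the ``2`` under ``FILES``) is simply a child key, and then reads back the file count for rank ``2``.
The ``node`` type and the ``node_*`` functions are hypothetical illustrations, not SCR's actual hash implementation; the sketch only assumes the key/value nesting visible in the example.

.. code-block:: c

   #include <assert.h>
   #include <stdlib.h>
   #include <string.h>

   /* Hypothetical nested hash: a node has a key and a list of children.
    * A value such as FILES=2 is stored as a child node whose key is "2". */
   typedef struct node {
       char *key;
       struct node *child; /* first child */
       struct node *next;  /* next sibling */
   } node;

   static node *node_new(const char *key)
   {
       node *n = calloc(1, sizeof(*n));
       assert(n != NULL);
       size_t len = strlen(key) + 1;
       n->key = malloc(len);
       assert(n->key != NULL);
       memcpy(n->key, key, len);
       return n;
   }

   /* Return the child of parent with the given key, or NULL if absent. */
   static node *node_find(const node *parent, const char *key)
   {
       for (node *c = parent->child; c != NULL; c = c->next) {
           if (strcmp(c->key, key) == 0) {
               return c;
           }
       }
       return NULL;
   }

   /* Walk a NULL-terminated list of keys from root, creating missing nodes. */
   static node *node_set_path(node *root, const char *const *keys)
   {
       node *n = root;
       for (; *keys != NULL; keys++) {
           node *c = node_find(n, *keys);
           if (c == NULL) {
               c = node_new(*keys);
               c->next = n->child;
               n->child = c;
           }
           n = c;
       }
       return n;
   }

   /* Walk a NULL-terminated list of keys without creating anything. */
   static node *node_get_path(node *root, const char *const *keys)
   {
       node *n = root;
       for (; n != NULL && *keys != NULL; keys++) {
           n = node_find(n, *keys);
       }
       return n;
   }

   /* For a single-valued key such as FILES, the value is its first child. */
   static const char *node_value(const node *n)
   {
       return (n != NULL && n->child != NULL) ? n->child->key : NULL;
   }

   static void node_free(node *n)
   {
       while (n != NULL) {
           node *next = n->next;
           node_free(n->child);
           free(n->key);
           free(n);
           n = next;
       }
   }

   /* Part of the example above: rank 2 wrote two files for dataset 6. */
   static node *example_rank_filemap(void)
   {
       static const char *const files[] = {"RANK", "2", "DSET", "6", "FILES", "2", NULL};
       static const char *const ckpt[]  = {"RANK", "2", "DSET", "6", "FILE", "rank_2.ckpt", NULL};
       static const char *const xorf[]  = {"RANK", "2", "DSET", "6", "FILE", "3_of_4_in_0.xor", NULL};
       node *root = node_new("");
       node_set_path(root, files);
       node_set_path(root, ckpt);
       node_set_path(root, xorf);
       return root;
   }

Storing values as child keys lets one structure hold both single values (``FILES``) and sets of names (``FILE``), which is why both appear as extra indentation levels in the example hash.
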

Compared with the filemap hash format, the rank filemap file has two main differences: files are listed using relative paths instead of absolute paths, and there are no redundancy descriptors.
The paths are relative so that the dataset directory on the parallel file system may be moved or renamed.
Redundancy descriptors are cache-specific, so these entries are excluded.

Scanning files
--------------

After ``scr_copy`` copies files from the cache on each compute node to the parallel file system, the ``scr_index`` command runs to check whether all files were recovered, rebuild missing files if possible, and add an entry for the dataset to the SCR index file (Section :ref:`Index_file `).
When invoking the ``scr_index`` command, the full path to the prefix directory and the name of the dataset directory are specified on the command line.
The ``scr_index`` command is implemented in ``scr_index.c``, and its program flow is described in Section :ref:``.

The ``scr_index`` command first acquires a listing of all items contained in the dataset directory by calling ``scr_read_dir``, which is implemented in ``scr_index.c``.
This function uses POSIX calls to list all files and subdirectories contained in the dataset directory.
The hash returned by this function distinguishes directories from files using the following format:

::

  DIR
    ...
  FILE
    ...

The ``scr_index`` command then iterates over the list of file names and reads each file whose name ends with the ``.scrfilemap`` extension.
These files are the rank filemap files written by ``scr_copy`` as described above.
The ``scr_index`` command records the number of expected files for each rank into a single hash called the *scan hash*.
For each file listed in a rank filemap file, the ``scr_index`` command verifies the metadata recorded in the rank filemap file against the file itself (excluding CRC32 checks).
If the file passes these checks, the command adds a corresponding entry for the file to the scan hash.
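
The listing step and the ``.scrfilemap`` filter can be sketched with the POSIX ``opendir``, ``readdir``, and ``stat`` calls.
This is a hypothetical stand-in for ``scr_read_dir``, not its actual code: rather than building the ``DIR``/``FILE`` hash, the illustrative function ``count_rank_filemaps`` only counts subdirectories and files and returns how many file names end in ``.scrfilemap``.

.. code-block:: c

   #define _POSIX_C_SOURCE 200809L
   #include <assert.h>
   #include <dirent.h>
   #include <stdio.h>
   #include <string.h>
   #include <sys/stat.h>

   /* Return 1 if name ends with suffix, 0 otherwise. */
   static int ends_with(const char *name, const char *suffix)
   {
       size_t n = strlen(name);
       size_t s = strlen(suffix);
       return n >= s && strcmp(name + n - s, suffix) == 0;
   }

   /* Hypothetical sketch of the listing step: classify each entry of dir as
    * a subdirectory (DIR key) or a file (FILE key) and return the number of
    * rank filemap files, or -1 if dir cannot be opened. */
   static int count_rank_filemaps(const char *dir, int *num_dirs, int *num_files)
   {
       assert(dir != NULL && num_dirs != NULL && num_files != NULL);
       DIR *d = opendir(dir);
       if (d == NULL) {
           return -1;
       }
       int filemaps = 0;
       *num_dirs = 0;
       *num_files = 0;
       struct dirent *e;
       while ((e = readdir(d)) != NULL) {
           if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0) {
               continue;
           }
           char path[4096];
           int len = snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
           if (len < 0 || (size_t)len >= sizeof(path)) {
               continue; /* path too long for this sketch */
           }
           struct stat st;
           if (stat(path, &st) != 0) {
               continue; /* entry vanished or cannot be inspected */
           }
           if (S_ISDIR(st.st_mode)) {
               (*num_dirs)++;
           } else {
               (*num_files)++;
               if (ends_with(e->d_name, ".scrfilemap")) {
                   filemaps++;
               }
           }
       }
       closedir(d);
       return filemaps;
   }

The sketch classifies entries with ``stat`` rather than the ``d_type`` field of ``struct dirent``, because ``d_type`` is not required by POSIX and some file systems report it as unknown.
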

The scan hash entry for each file is formatted so that it can also be used as an entry in the summary file hash (Section :ref:`Summary file `).
If the file is an ``XOR`` file, the command sets a ``NOFETCH`` flag under the ``FILE`` key, which instructs the SCR library to exclude this file during a fetch operation.
Furthermore, for each ``XOR`` file, the ``scr_index`` command extracts information about the ``XOR`` set from the file name and adds an entry under an ``XOR`` key in the scan hash.
It records the ``XOR`` set id (under ``XOR``), the number of members in the set (under ``MEMBERS``), the group rank of the current file in this set (under ``MEMBER``), the global rank id (under ``RANK``), and the name of the ``XOR`` file (under ``FILE``).
After all of this, the scan hash might look like the following example:

::

  DLIST
    DSET
      COMPLETE
        1
      SIZE
        2097182
      FILES
        4
      ID
        6
      NAME
        scr.dataset.6
      CREATED
        1312850690668536
      USER
        user1
      JOBNAME
        simulation123
      JOBID
        112573
      CKPT
        6
    RANK2FILE
      RANKS
      RANK
        FILES
        FILE
          SIZE
          CRC
          NOFETCH
          SIZE
          CRC
          ...
        FILES
        FILE
          SIZE
          CRC
          NOFETCH
          SIZE
          CRC
          ...
        ...
    XOR
      MEMBERS
      MEMBER
        FILE
        RANK
        FILE
        RANK
        ...
      MEMBERS
      MEMBER
        FILE
        RANK
        FILE
        RANK
        ...
      ...

Inspecting files
----------------

After merging data from all rank filemap files in the dataset directory, the ``scr_index`` command inspects the scan hash to identify any missing files.
For each dataset, it determines the number of ranks associated with the dataset, and it checks that the scan hash has an entry for each rank.
It then checks whether each rank has an entry for each of its expected files.
If any file is determined to be missing, the command adds an ``INVALID`` flag to the scan hash, and it lists all ranks that are missing files under the ``MISSING`` key.
This operation may thus add entries like the following to the scan hash:

::

  DLIST
    INVALID
    MISSING
      ...

Rebuilding files
----------------

If any ranks are missing files, then the ``scr_index`` command attempts to rebuild files.
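
The completeness check from Inspecting files, whose result decides whether a rebuild is attempted, can be sketched as a scan over ranks.
The ``rank_scan`` record and ``find_missing_ranks`` are hypothetical; the sketch assumes ranks are numbered ``0`` to ``RANKS - 1`` and that the scan hash has already been reduced to, for each rank, whether any rank filemap file listed it, its expected ``FILES`` count, and how many of its files passed the metadata checks.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Hypothetical per-rank summary distilled from the scan hash. */
   typedef struct {
       int present;        /* 1 if some rank filemap file listed this rank */
       int expected_files; /* value recorded under FILES for this rank */
       int found_files;    /* files that passed the metadata checks */
   } rank_scan;

   /* Store in missing[] each rank in [0, num_ranks) that has no entry or
    * fewer verified files than expected, and return how many were stored.
    * missing must have room for num_ranks entries. A nonzero result
    * corresponds to setting INVALID and listing the ranks under MISSING. */
   static size_t find_missing_ranks(const rank_scan *ranks, int num_ranks, int *missing)
   {
       assert(num_ranks >= 0);
       size_t count = 0;
       for (int r = 0; r < num_ranks; r++) {
           const rank_scan *s = &ranks[r];
           if (!s->present || s->found_files < s->expected_files) {
               missing[count++] = r;
           }
       }
       return count;
   }

For example, with four ranks where rank ``1`` has only one of its two expected files and rank ``2`` has no entry at all, ``find_missing_ranks`` reports ranks ``1`` and ``2``.
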

Currently, only the ``XOR`` redundancy scheme can be used to rebuild files.
The command iterates over each of the ``XOR`` sets listed in the scan hash, and it checks that each set has an entry for each of its members.
If it finds an ``XOR`` set that is missing a member, or if it finds that a set contains a rank which is known to be missing files, the command constructs a string that can be used to fork and exec a process to rebuild the files for that set.
It records these strings under the ``BUILD`` key in the scan hash.
If it finds that one or more files cannot be recovered, it sets an ``UNRECOVERABLE`` flag in the scan hash.
If the ``scr_index`` command determines that it is possible to rebuild all missing files, it forks and execs a process for each string listed under the ``BUILD`` key.
This operation may thus add entries like the following to the scan hash:

::

  DLIST
    UNRECOVERABLE
    BUILD
      ...

Scan hash
---------

After all of these steps, the scan hash has the following form:

::

  DLIST
    UNRECOVERABLE
    BUILD
      ...
    INVALID
    MISSING
      ...
    RANKS
    RANK
      FILES
      FILE
        SIZE
        CRC
        NOFETCH
        SIZE
        CRC
        ...
      ...
    XOR
      MEMBERS
      MEMBER
        FILE
        RANK
        ...
      ...

After the rebuild attempt, the ``scr_index`` command writes a summary file in the dataset directory.
To produce the hash for the summary file, the command deletes extraneous entries from the scan hash (``UNRECOVERABLE``, ``BUILD``, ``INVALID``, ``MISSING``, ``XOR``) and adds the summary file format version number.
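
Returning to the per-set decision described under Rebuilding files, it can be sketched as follows.
Two parts are assumptions rather than documented behavior: the file-name pattern ``<member>_of_<members>_in_<set>.xor`` is inferred from the single example ``3_of_4_in_0.xor``, and a set is treated as rebuildable only when at most one member is lost, as is usual for a single XOR parity block.
The ``xor_entry`` type and the functions are hypothetical names.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdio.h>

   /* Hypothetical XOR set entry, mirroring the XOR part of the scan hash. */
   typedef struct {
       int set;     /* XOR set id (XOR) */
       int member;  /* position of the file within its set (MEMBER) */
       int members; /* number of members in the set (MEMBERS) */
       int rank;    /* global rank that owns the file (RANK) */
   } xor_entry;

   typedef enum { XOR_SET_OK, XOR_SET_REBUILD, XOR_SET_UNRECOVERABLE } xor_set_status;

   /* Parse a name of the assumed form <member>_of_<members>_in_<set>.xor.
    * Returns 1 on success and 0 if the name does not match exactly. */
   static int parse_xor_name(const char *name, int *member, int *members, int *set)
   {
       int end = -1; /* set by %n only if ".xor" matched */
       if (sscanf(name, "%d_of_%d_in_%d.xor%n", member, members, set, &end) != 3) {
           return 0;
       }
       return end >= 0 && name[end] == '\0';
   }

   static int rank_is_missing(int rank, const int *missing, size_t num_missing)
   {
       for (size_t i = 0; i < num_missing; i++) {
           if (missing[i] == rank) {
               return 1;
           }
       }
       return 0;
   }

   /* Classify one XOR set. A member counts as lost if no XOR file was found
    * for it or if its rank is listed under MISSING. Assumes at most one lost
    * member per set can be rebuilt from the parity data. */
   static xor_set_status check_xor_set(const xor_entry *entries, size_t n, int set,
                                       const int *missing, size_t num_missing)
   {
       int members = 0;
       int present = 0;
       int lost = 0;
       for (size_t i = 0; i < n; i++) {
           if (entries[i].set != set) {
               continue;
           }
           members = entries[i].members;
           present++;
           if (rank_is_missing(entries[i].rank, missing, num_missing)) {
               lost++;
           }
       }
       assert(members > 0); /* the set must appear in entries */
       lost += members - present;
       if (lost == 0) {
           return XOR_SET_OK;
       }
       return lost == 1 ? XOR_SET_REBUILD : XOR_SET_UNRECOVERABLE;
   }

Under these assumptions, each set classified as ``XOR_SET_REBUILD`` would contribute a command string under ``BUILD``, and any ``XOR_SET_UNRECOVERABLE`` set would cause the ``UNRECOVERABLE`` flag to be set.
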