.. _rank2file_file:

Rank2file map
-------------

The rank2file map records which files each rank wrote as part of a
particular dataset. The map contains an entry for every rank and file,
so for large jobs it may be larger than any single MPI process can load
into memory. The information is therefore spread across multiple files
that are organized as a tree. These files are stored in the dataset
directory on the parallel file system. Internally, the data of the
rank2file map is organized as a hash.

There is always a root file named ``rank2file.scr``. Here are the
contents of an example root rank2file map.

::

     LEVEL
       1
     RANKS
       4
     RANK
       0
         OFFSET
           0
         FILE
           .scr/rank2file.0.0.scr

Note that there is no ``VERSION`` field. The version is implied by the
summary file for the dataset. The ``LEVEL`` field gives the level at
which the current rank2file map is located in the tree. The leaves of
the tree are at level 0. The ``RANKS`` field specifies the number of
ranks for which the current file (and its associated subtree) contains
information.
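
As an illustration, the indented listing above can be turned into a
nested dictionary with a few lines of Python. This is only a sketch: it
assumes the input is the printed text form shown in this section (two
spaces of indentation per level), not the format SCR uses on disk, and
the names ``parse_dump`` and ``scalar`` are hypothetical helpers rather
than part of SCR.

.. code-block:: python

    def parse_dump(text):
        """Parse an indented key/value listing into nested dicts."""
        root = {}
        # Stack of (indent, dict) pairs; the sentinel indent -1 holds the root.
        stack = [(-1, root)]
        for line in text.splitlines():
            if not line.strip():
                continue
            indent = len(line) - len(line.lstrip(" "))
            # Pop back to the nearest ancestor with a smaller indent.
            while stack[-1][0] >= indent:
                stack.pop()
            child = {}
            stack[-1][1][line.strip()] = child
            stack.append((indent, child))
        return root

    def scalar(node, key):
        """Return the single value stored under key, e.g. LEVEL -> '1'."""
        (value,) = node[key].keys()
        return value

    ROOT_DUMP = """\
    LEVEL
      1
    RANKS
      4
    RANK
      0
        OFFSET
          0
        FILE
          .scr/rank2file.0.0.scr
    """

    root = parse_dump(ROOT_DUMP)
    level = int(scalar(root, "LEVEL"))   # position of this file in the tree
    ranks = int(scalar(root, "RANKS"))   # ranks covered by this subtree

Every line becomes a key in its parent's dictionary, so a field such as
``LEVEL`` is represented as a dictionary with a single key holding its
value.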

For levels that are above level 0, the ``RANK`` hash contains
information about other rank2file map files to be read. Each entry in
this hash is identified by a rank id, and then for each rank, a ``FILE``
and ``OFFSET`` are given. The rank id specifies which rank is
responsible for reading content at the next level. The ``FILE`` field
specifies the file name that is to be read, and the ``OFFSET`` field
gives the starting byte offset within that file.

A process reading a file at the current level scatters the hash info to
the designated “reader” ranks, and those processes read data for the
next level. In this way, the task of reading the rank2file map is
distributed among multiple processes in the job. The SCR library limits
the amount of data that any process reads in any single step (currently
1 MB).
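
The level-by-level read can be modeled without MPI as a simple work
list. The sketch below is not SCR's implementation: the ``FILES``
dictionary is a hypothetical in-memory stand-in for the files on disk,
rank 0 is assumed to read the root file, and both levels use the same
simplified layout.

.. code-block:: python

    # Hypothetical stand-in for the map files, keyed by (file name, offset).
    FILES = {
        ("rank2file.scr", 0): {
            "LEVEL": 1,
            "RANKS": 4,
            "RANK": {0: {"FILE": ".scr/rank2file.0.0.scr", "OFFSET": 0}},
        },
        (".scr/rank2file.0.0.scr", 0): {
            "LEVEL": 0,
            "RANKS": 4,
            "RANK": {r: {"FILE": {f"rank_{r}.ckpt": {}}} for r in range(4)},
        },
    }

    def read_tree(root_name="rank2file.scr"):
        """Walk the tree one level per step.

        Returns ({owner rank: [file names]}, [reader ranks per step]).
        """
        # Each work item is (reader rank, file name, byte offset).
        # Assumption: rank 0 reads the root file.
        work = [(0, root_name, 0)]
        owned = {}
        steps = []
        while work:
            steps.append(sorted(reader for reader, _, _ in work))
            next_work = []
            for _reader, name, offset in work:
                node = FILES[(name, offset)]
                if node["LEVEL"] > 0:
                    # Interior level: pass each entry to its reader rank.
                    for rank, entry in node["RANK"].items():
                        next_work.append((rank, entry["FILE"], entry["OFFSET"]))
                else:
                    # Leaf level: deliver each file list to its owner rank.
                    for rank, entry in node["RANK"].items():
                        owned[rank] = sorted(entry["FILE"])
            work = next_work
        return owned, steps

    owned, steps = read_tree()

In a real job each step would be a scatter among MPI processes, with
all readers at one level working in parallel. The loop here only shows
how the work fans out from the root toward the leaves.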

Files at levels below the root have names of the form
``rank2file.<level>.<rank>.scr``, where ``level`` is the level number
within the tree and ``rank`` is the rank of the process that wrote the
file.
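
The naming pattern can be expressed as a pair of small helpers. These
functions are hypothetical and shown only to make the pattern concrete.
They handle the base file name alone and ignore any directory prefix.

.. code-block:: python

    import re

    # Matches rank2file.<level>.<rank>.scr, but not the root rank2file.scr.
    _NAME_RE = re.compile(r"^rank2file\.(\d+)\.(\d+)\.scr$")

    def rank2file_name(level, rank):
        """Build the file name for a map file below the root."""
        return f"rank2file.{level}.{rank}.scr"

    def parse_rank2file_name(name):
        """Return (level, rank) for a non-root map file name."""
        match = _NAME_RE.match(name)
        if match is None:
            raise ValueError(f"not a non-root rank2file name: {name!r}")
        return int(match.group(1)), int(match.group(2))

For example, ``rank2file_name(0, 0)`` gives ``rank2file.0.0.scr``, and
passing the root name ``rank2file.scr`` to ``parse_rank2file_name``
raises ``ValueError``.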

Finally, level 0 contains the data that maps a rank to a list of file
names. Here are the contents of an example rank2file map file at level
0.

::

     RANK2FILE
       LEVEL
         0
       RANKS
         4
       RANK
         0
           FILE
             rank_0.ckpt
               SIZE
                 524294
               CRC
                 0x6697d4ef
         1
           FILE
             rank_1.ckpt
               SIZE
                 524295
               CRC
                 0x28eeb9e
         2
           FILE
             rank_2.ckpt
               SIZE
                 524296
               CRC
                 0xb6a62246
         3
           FILE
             rank_3.ckpt
               SIZE
                 524297
               CRC
                 0x213c897a

Again, the number of ranks that this file contains information for is
recorded under the ``RANKS`` field.

There are entries for specific ranks under the ``RANK`` hash, which is
indexed by rank id within ``scr_comm_world``. For a given rank, each
file that rank wrote as part of the dataset is indexed by file name
under the ``FILE`` hash. The file name specifies the relative path to
the file starting from the dataset directory. For each file, SCR records
the size of the file in bytes under ``SIZE``, and SCR may also record
the CRC32 checksum value over the contents of the file under the ``CRC``
field.
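
A consumer of this map could use ``SIZE`` and ``CRC`` to sanity check a
file before reading it. The following sketch uses a hypothetical helper
named ``check_file`` and assumes that the recorded checksum is the
standard CRC-32 computed by zlib, which is not confirmed here.

.. code-block:: python

    import os
    import zlib

    def check_file(dataset_dir, rel_path, size, crc=None, chunk=1 << 20):
        """Compare a file against its recorded SIZE and optional CRC.

        rel_path is relative to dataset_dir; crc is a hex string such as
        "0x6697d4ef", or None when no checksum was recorded.
        """
        path = os.path.join(dataset_dir, rel_path)
        if os.path.getsize(path) != size:
            return False
        if crc is None:
            # CRC is optional, so a size match is all that can be checked.
            return True
        value = 0
        with open(path, "rb") as handle:
            # Read in chunks so large checkpoint files need not fit in memory.
            while True:
                block = handle.read(chunk)
                if not block:
                    break
                value = zlib.crc32(block, value)
        return value == int(crc, 16)

Checking the size first avoids reading the whole file when the length
already shows a mismatch.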

On restart, the reader rank that reads this hash scatters the
information to the owner rank, so that by the end of processing the
tree, all processes know which files to read.