Hash

Overview

A frequently used data structure is the scr_hash object. This data structure contains an unordered list of elements, where each element contains a key (a string) and a value (another hash). Each element in a hash has a unique key. Using the key, one can get, set, and unset elements in a hash. There are functions to iterate through the elements of a hash. There are also functions to pack and unpack a hash into a memory buffer, which enables one to transfer a hash through the network or store the hash to a file.

Throughout the documentation and comments in the source code, a hash is often displayed as a tree structure. The key belonging to a hash element is shown as a parent node, and the elements in the hash belonging to that element are displayed as children of that node. For example, consider the following tree:

+- RANK
   +- 0
   |  +- FILES
   |  |  +- 2
   |  +- FILE
   |     +- foo_0.txt
   |     |  +- SIZE
   |     |  |  +- 1024
   |     |  +- COMPLETE
   |     |     +- 1
   |     +- bar_0.txt
   |        +- SIZE
   |        |  +- 2048
   |        +- COMPLETE
   |           +- 1
   +- 1
      +- FILES
      |  +- 1
      +- FILE
         +- foo_1.txt
            +- SIZE
            |  +- 3072
            +- COMPLETE
               +- 1

The above example represents a hash that contains a single element with key RANK. The hash associated with the RANK element contains two elements with keys 0 and 1. The hash associated with the 0 element contains two elements with keys FILES and FILE. The FILES element, in turn, contains a hash with a single element with a key 2, which finally contains a hash having no elements.

Often when displaying these trees, the guidelines are not shown and only the indentation is used, like so:

RANK
  0
    FILES
      2
    FILE
      foo_0.txt
        SIZE
          1024
        COMPLETE
          1
      bar_0.txt
          SIZE
            2048
          COMPLETE
            1
  1
    FILES
      1
    FILE
      foo_1.txt
        SIZE
          3072
        COMPLETE
          1

Common functions

This section lists the most common functions used when dealing with hashes. For a full listing, refer to comments in scr_hash.h. The implementation can be found in scr_hash.c.

Hash basics

First, before using a hash, one must allocate a hash object.

scr_hash* hash = scr_hash_new();

And one must free the hash when done with it.

scr_hash_delete(&hash);

Given a hash object, you may insert an element, specifying a key and another hash as a value.

scr_hash_set(hash, key, value_hash);

If an element already exists for the specified key, this function deletes the value currently associated with the key and assigns the specified hash as the new value. Thus it is not necessary to unset a key before setting it – setting a key simply overwrites the existing value.

You may also perform a lookup by specifying a key and the hash object to be searched.

scr_hash* value_hash = scr_hash_get(hash, key);

If the hash has a key by that name, it returns a pointer to the hash associated with the key. If the hash does not have an element with the specified key, it returns NULL.

You can unset a key.

scr_hash_unset(hash, key);

If a hash value is associated with the specified key, it is freed, and then the element is deleted from the hash. It is OK to unset a key even if it does not exist in the hash.

To clear a hash (unsets all elements).

scr_hash_unset_all(hash);

To determine the number of keys in a hash.

int num_elements = scr_hash_size(hash);

To simplify coding, most hash functions accept NULL as a valid input hash parameter. It is interpreted as an empty hash. For example,

scr_hash_delete(NULL);

does nothing

scr_hash_set(NULL, key, value_hash);

does nothing and returns NULL

scr_hash_get(NULL, key);

returns NULL

scr_hash_unset(NULL, key);

does nothing

scr_hash_unset_all(NULL);

does nothing

scr_hash_size(NULL);

returns 0

Accessing and iterating over hash elements

At times, one needs to work with individual hash elements. To get a pointer to the element associated with a key (instead of a pointer to the hash belonging to that element).

scr_hash_elem* elem = scr_hash_elem_get(hash, key);

To get the key associated with an element.

char* key = scr_hash_elem_key(elem);

To get the hash associated with an element.

scr_hash* hash = scr_hash_elem_hash(elem);

It’s possible to iterate through the elements of a hash. First, you need to get a pointer to the first element.

scr_hash_elem* elem = scr_hash_elem_first(hash);

This function returns NULL if the hash has no elements. Then, to advance from one element to the next.

scr_hash_elem* next_elem = scr_hash_elem_next(elem);

This function returns NULL when the current element is the last element. Below is some example code that iterates through the elements of hash and prints the key for each element:

scr_hash_elem* elem;
for (elem = scr_hash_elem_first(hash);
     elem != NULL;
     elem = scr_hash_elem_next(elem))
{
  char* key = scr_hash_elem_key(elem);
  printf("%s\n", key);
}

Key/value convenience functions

Often, it’s useful to store a hash using two keys which act like a key/value pair. For example, a hash may contain an element with key RANK, whose hash contains a set of elements with keys corresponding to rank ids, where each rank id 0, 1, 2, etc. has a hash, like so:

RANK
  0
    <hash for rank 0>
  1
    <hash for rank 1>
  2
    <hash for rank 2>

This case comes up so frequently that there are special key/value (_kv) functions to make this operation easier. For example, to access the hash for rank 0 in the above example, one may call

scr_hash* rank_0_hash = scr_hash_get_kv(hash, "RANK", "0");

This searches for the RANK element in the specified hash. If found, it then searches for the 0 element in the hash of the RANK element. If found, it returns the hash associated with the 0 element. If hash is NULL, or if hash has no RANK element, or if the RANK hash has no 0 element, this function returns NULL.

The following function behaves similarly to scr_hash_get_kv – it returns the hash for rank 0 if it exists. It differs in that it creates and inserts hashes and elements as needed such that an empty hash is created for rank 0 if it does not already exist.

scr_hash* rank_0_hash = scr_hash_set_kv(hash, "RANK", "0");

This function creates a RANK element if it does not exist in the specified hash, and it creates a 0 element in the RANK hash if it does not exist. It returns the hash associated with the 0 element, which will be an empty hash if the 0 element was created by the call. This feature lets one string together multiple calls without requiring lots of conditional code to check whether certain elements already exist. For example, the following code is valid whether or not hash has a RANK element.

scr_hash* rank_hash = scr_hash_set_kv(hash,      "RANK", "0");
scr_hash* ckpt_hash = scr_hash_set_kv(rank_hash, "CKPT", "10");
scr_hash* file_hash = scr_hash_set_kv(ckpt_hash, "FILE", "3");

Often, as in the case above, the value key is an integer. In order to avoid requiring the caller to convert integers to strings, there are functions to handle the value argument as an int type, e.g, the above segment could be written as

scr_hash* rank_hash = scr_hash_set_kv_int(hash,      "RANK",  0);
scr_hash* ckpt_hash = scr_hash_set_kv_int(rank_hash, "CKPT", 10);
scr_hash* file_hash = scr_hash_set_kv_int(ckpt_hash, "FILE",  3);

It’s also possible to unset key/value pairs.

scr_hash_unset_kv(hash, "RANK", "0");

This call removes the 0 element from the RANK hash if one exists. If this action causes the RANK hash to be empty, it also removes the RANK element from the specified input hash.

In some cases, one wants to associate a single value with a given key. When attempting to change the value in such cases, it is necessary to first unset a key before setting the new value. Simply setting a new value will insert another element under the key. For instance, consider that one starts with the following hash

TIMESTEP
  20

If the goal is to modify this hash such that it changes to

TIMESTEP
  21

then one should do the following

scr_hash_unset(hash, "TIMESTEP");
scr_hash_set_kv_int(hash, "TIMESTEP", 21);

Simply executing the set operation without first executing the unset operation results in the following

TIMESTEP
  20
  21

Because it is common to have fields in a hash that should only hold one value, there are several utility functions to set and get such fields defined in scr_hash_util.h and implemented in scr_hash_util.c. For instance, here are a few functions to set single-value fields:

int scr_hash_util_set_bytecount(scr_hash* hash, const char* key, unsigned long count);
int scr_hash_util_set_crc32(scr_hash* hash, const char* key, uLong crc);
int scr_hash_util_set_int64(scr_hash* hash, const char* key, int64_t value);

These utility routines unset any existing value before setting the new value. They also convert the input value into an appropriate string representation. Similarly, there are corresponding get routines, such as:

int scr_hash_util_get_bytecount(const scr_hash* hash, const char* key, unsigned long* count);
int scr_hash_util_get_crc32(const scr_hash* hash, const char* key, uLong* crc);
int scr_hash_util_get_int64(const scr_hash* hash, const char* key, int64_T* value);

If a value is set for the specified key, and if the value can be interpreted as the appropriate type for the output parameter, the get routine returns SCR_SUCCESS and copies the value to the output parameter. Otherwise, the routine does not return SCR_SUCCESS and does not modify the output parameter.

For example, to set and get the timestep value from the example above, one could do the following:

scr_hash_util_set_int64(hash, "TIMESTEP", 21);

int64_t current_timestep = -1;
if (scr_hash_util_get_int64(hash, "TIMESTEP", &current_timestep) == SCR_SUCCESS) {
  /* TIMESTEP was set, and it's value is now in current_timestep */
} else {
  /* TIMESTEP was not set, and current_timestep is still -1 */
}

The difference between these utility functions and the key/value (_kv) functions is that the key/value functions are used to set and get a hash that is referenced by a key/value pair whereas the utility functions set and get a scalar value that has no associated hash.

Specifying multiple keys with format functions

One can set many keys in a single call using a printf-like statement. This call converts variables like floats, doubles, and longs into strings. It enables one to set multiple levels of keys in a single call, and it enables one to specify the hash value to associate with the last element.

scr_hash_setf(hash, value_hash, "format", variables ...);

For example, if one had a hash like the following

RANK
  0
    CKPT
      10
        <current_hash>

One could overwrite the hash associated with the 10 element in a single call like so.

scr_hash_setf(hash, new_hash, "%s %d %s %d", "RANK", 0, "CKPT", 10);

Different keys are separated by single spaces in the format string. Only a subset of the printf format strings are supported.

There is also a corresponding getf version.

scr_hash* hash = scr_hash_getf(hash, "%s %d %s %d", "RANK", 0, "CKPT", 10);

Sorting hash keys

Generally, the keys in a hash are not ordered. However, one may order the keys with the following sort routines.

scr_hash_sort(hash, direction);
scr_hash_sort_int(hash, direction);

The first routine sorts keys by string, and the second sorts keys as integer values. The direction variable may be either SCR_HASH_SORT_ASCENDING or SCR_HASH_SORT_DESCENDING. The keys remain in sorted order until new keys are added. The order is not kept between packing and unpacking hashes.

Listing hash keys

One may get a sorted list of all keys in a hash.

int num_keys;
int* keys;
scr_hash_list_int(hash, &num_keys, &keys);
...
if (keys != NULL)
  free(keys);

This routine returns the number of keys in the hash, and if there is one or more keys, it allocates memory and returns the sorted list of keys. The caller is responsible for freeing this memory. Currently, one may only get a list of keys that can be represented as integers. There is no such list routine for arbitrary key strings.

Packing and unpacking hashes

A hash can be serialized into a memory buffer for network transfer or storage in a file. To determine the size of a buffer needed to pack a hash.

int num_bytes = scr_hash_pack_size(hash);

To pack a hash into a buffer.

scr_hash_pack(buf, hash);

To unpack a hash from a buffer into a given hash object.

scr_hash* hash = scr_hash_new();
scr_hash_unpack(buf, hash);

One must pass an empty hash to the unpack function.

Hash files

Hashes may be serialized to a file and restored from a file. To write a hash to a file.

scr_hash_file_write(filename, hash);

This call creates the file if it does not exist, and it overwrites any existing file.

To read a hash from a file (merges hash from file into given hash object).

scr_hash_file_read(filename, hash);

Many hash files are written and read by more than one process. In this case, locks can be used to ensure that only one process has access to the file at a time. A process blocks while waiting on the lock. The following call blocks the calling process until it obtains a lock on the file. Then it opens, reads, closes, and unlocks the file. This results in an atomic read among processes using the file lock.

scr_hash_read_with_lock(filename, hash)

To update a locked file, it is often necessary to execute a read-modify-write operation. For this there are two functions. One function locks, opens, and reads a file.

scr_hash_lock_open_read(filename, &fd, hash)

The opened file descriptor is returned, and the contents of the file are read (merged) in to the specified hash object. The second function writes, closes, and unlocks the file.

scr_hash_write_close_unlock(filename, &fd, hash)

One must pass the filename, the opened file descriptor, and the hash to be written to the file.

Sending and receiving hashes

There are several functions to exchange hashes between MPI processes. While most hash functions are implemented in scr_hash.c, the functions dependent on MPI are implemented in scr_hash_mpi.c. This is done so that serial programs can use hashes without having to link to MPI.

To send a hash to another MPI process.

scr_hash_send(hash, rank, comm)

This call executes a blocking send to transfer a copy of the specified hash to the specified destination rank in the given MPI communicator. Similarly, to receive a copy of a hash.

scr_hash_recv(hash, rank, comm)

This call blocks until it receives a hash from the specified rank, and then it unpacks the received hash into hash and returns.

There is also a function to simultaneously send and receive hashes, which is useful to avoid worrying about ordering issues in cases where a process must both send and receive a hash.

scr_hash_sendrecv(hash_send, rank_send, hash_recv, rank_recv, comm)

The caller provides the hash to be sent and the rank it should be sent to, along with a hash to unpack the received into and the rank it should receive from, as well as, the communicator to be used.

A process may broadcast a hash to all ranks in a communicator.

scr_hash_bcast(hash, root, comm)

As with MPI, all processes must specify the same root and communicator. The root process specifies the hash to be broadcast, and each non-root process provides a hash into which the broadcasted hash is unpacked.

Finally, there is a call used to issue a (sparse) global exchange of hashes, which is similar to an MPI_Alltoallv call.

scr_hash_exchange(hash_send, hash_recv, comm)

This is a collective call which enables any process in comm to send a hash to any other process in comm (including itself). Furthermore, the destination processes do not need to know from which processes they will receive data in advance. As input, a process should provide an empty hash for hash_recv, and it must structure hash_send in the following manner.

 <rank_X>
   <hash_to_send_to_rank_X>
<rank_Y>
  <hash_to_send_to_rank_Y>

Upon return from the function, hash_recv will be filled in according to the following format.

<rank_A>
  <hash_received_from_rank_A>
<rank_B>
  <hash_received_from_rank_B>

For example, if hash_send was the following on rank 0 before the call:

hash_send on rank 0:
1
  FILES
    1
  FILE
    foo.txt
2
  FILES
    1
  FILE
    bar.txt

Then after returning from the call, hash_recv would contain the following on ranks 1 and 2:

hash_recv on rank 1:
0
  FILES
    1
  FILE
    foo.txt
<... data from other ranks ...>

hash_recv on rank 2:
0
  FILES
    1
  FILE
    bar.txt
<... data from other ranks ...>

The algorithm used to implement this function assumes the communication is sparse, meaning that each process only sends to or receives from a small number of other processes. It may also be used for gather or scatter operations.

Debugging

Newer versions of TotalView enable one to dive on hash variables and inspect them in a variable window using a tree view. For example, when diving on a hash object corresponding to the example hash in the overview section, one would see an expanded tree in the variable view window like so:

+- RANK
   +- 0
   |  +- FILES = 2
   |  +- FILE
   |     +- foo_0.txt
   |     |  +- SIZE = 1024
   |     |  +- COMPLETE = 1
   |     +- bar_0.txt
   |        +- SIZE = 2048
   |        +- COMPLETE = 1
   +- 1
      +- FILES = 1
      +- FILE
         +- foo_1.txt
            +- SIZE = 3072
            +- COMPLETE = 1

When a hash of an element contains a single element whose own hash is empty, this display condenses the line to display that entry as a key = value pair.

If TotalView is not available, one may resort to printing a hash to stdout using the following function. The number of spaces to indent each level is specified in the second parameter.

scr_hash_print(hash, indent);

To view the contents of a hash file, there is a utility called scr_print_hash_file which reads a file and prints the contents to the screen.

scr_print_hash_file  myhashfile.scr

Binary format

This section documents the binary format used when serializing a hash.

Packed hash

A hash can be serialized into a memory buffer for network transfer or storage in a file. When serialized, all integers are stored in network byte order (big-endian format). Such a “packed” hash consists of the following format:

Format of a PACKED HASH:

Field Name

Datatype

Description

Count

uint32_t

Number of elements in hash

A count of 0 means the hash is empty.

Elements

PACKED

Sequence of packed elements of length Count.

ELEMENT

Format of a PACKED ELEMENT:

Field Name

Datatype

Description

Key

NULL-terminated ASCII string

Key associated with element

Hash

PACKED

Hash associated with element

HASH

File format

A hash can be serialized and stored as a binary file. This section documents the file format for an scr_hash object. All integers are stored in network byte order (big-endian format). A hash file consists of the following sequence of bytes:

Field Name

Datatype

Description

Magic Number

uint32_t

Unique integer to help distinguish an SCR file from other types of files

0x951fc3f5 (host byte order)

File Type

uint16_t

Integer field describing what type of SCR file this file is

1 \(\rightarrow\) file is an scr_hash file

File Version

uint16_t

Integer field that together with File Type defines the file format

1 \(\rightarrow\) scr_hash file is stored in version 1 format

File Size

uint64_t

Size of this file in bytes, from first byte of the header to the last byte in the file.

Flags

uint32_t

Bit flags for file.

Data

PACKED

Packed hash data (see Section 1.4.1).

HASH

CRC32*

uint32_t

CRC32 of file, accounts for first byte of header to last byte of Data.

*Only exists if SCR_FILE_FLAGS_CRC32 bit is set in Flags.