Datasets

Overview

The scr_dataset data structure associates attributes with each dataset written by the application. It tracks information such as the dataset id, the creation time, and the total number of bytes.

The scr_dataset API and its implementation are built on scr_hash. Essentially, scr_dataset objects are specialized scr_hash objects that have certain well-defined keys (fields) and associated functions to access those fields.

Example dataset hash

Internally, dataset objects are implemented as scr_hash objects. Here is an example hash for a dataset object.

ID
  23
USER
  user1
JOBNAME
  simulation123
NAME
  dataset.23
SIZE
  524294000
FILES
  1024
CREATED
  1312850690668536
CKPT
  6
COMPLETE
  1

The ID field records the dataset id, as assigned by the scr_dataset_id variable at the time the dataset is created. The USER field records the username associated with the job within which the dataset was created, and the value of $SCR_JOB_NAME, if set, is recorded in the JOBNAME field. The NAME field records the name of the dataset. This is currently defined to be “dataset.<id>” where <id> is the dataset id. The total number of bytes in the dataset is recorded in the SIZE field, and the total number of files is recorded in FILES. The CREATED field records the time at which the dataset was created, in microseconds since the Unix epoch. If the dataset is a checkpoint, the checkpoint id is recorded in the CKPT field. The COMPLETE field records whether the dataset is valid. It is set to 1 if the dataset is thought to be valid, and 0 otherwise.
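To make the NAME and CREATED conventions concrete, here is a minimal sketch of how such values could be produced. The helper names example_dataset_name and example_now_usecs are hypothetical and are not part of SCR; the sketch assumes a POSIX system that provides gettimeofday.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical helper (not an SCR function): format the dataset name
 * as "dataset.<id>". Returns 0 on success, -1 if buf is too small. */
static int example_dataset_name(int id, char* buf, size_t len)
{
  assert(buf != NULL);
  int n = snprintf(buf, len, "dataset.%d", id);
  return (n >= 0 && (size_t) n < len) ? 0 : -1;
}

/* Hypothetical helper (not an SCR function): current time in
 * microseconds since the Unix epoch, the unit used by CREATED. */
static int64_t example_now_usecs(void)
{
  struct timeval tv;
  gettimeofday(&tv, NULL);
  return (int64_t) tv.tv_sec * 1000000 + (int64_t) tv.tv_usec;
}
```

Values produced this way have the types expected by scr_dataset_set_name (const char*) and scr_dataset_set_created (int64_t).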

These are the most common fields used in dataset objects. Not all fields are required, and additional fields may be used that are not shown here.

Common functions

This section describes some of the most common dataset functions. For a detailed list of all functions, see scr_dataset.h. The implementation can be found in scr_dataset.c.

Allocating and freeing dataset objects

Create a new dataset object.

scr_dataset* dataset = scr_dataset_new();

Free a dataset object.

scr_dataset_delete(&dataset);
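Note that scr_dataset_delete takes the address of the pointer rather than the pointer itself. A common reason for this signature is to let the function reset the caller's pointer after freeing the object. Whether scr_dataset_delete does so should be confirmed in scr_dataset.c. The sketch below illustrates the general idiom with a hypothetical stand-in type and hypothetical function names; it is not SCR code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in type; the real scr_dataset is declared in
 * scr_dataset.h. */
typedef struct {
  int id;
} example_obj;

/* Allocate a zero-initialized object, or return NULL on failure. */
static example_obj* example_obj_new(void)
{
  return (example_obj*) calloc(1, sizeof(example_obj));
}

/* Takes the address of the caller's pointer so that the pointer can be
 * cleared once the memory has been released. */
static void example_obj_delete(example_obj** ptr)
{
  assert(ptr != NULL);
  free(*ptr);   /* free(NULL) is a no-op, so a repeated delete is harmless */
  *ptr = NULL;  /* the caller no longer holds a dangling pointer */
}
```

With this idiom, a second call to the delete function on the same variable does nothing, and later NULL checks on the variable remain meaningful.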

Setting, getting, and checking field values

There are functions to set each field individually.

int scr_dataset_set_id(scr_dataset* dataset, int id);
int scr_dataset_set_user(scr_dataset* dataset, const char* user);
int scr_dataset_set_jobname(scr_dataset* dataset, const char* name);
int scr_dataset_set_name(scr_dataset* dataset, const char* name);
int scr_dataset_set_size(scr_dataset* dataset, unsigned long size);
int scr_dataset_set_files(scr_dataset* dataset, int files);
int scr_dataset_set_created(scr_dataset* dataset, int64_t created);
int scr_dataset_set_jobid(scr_dataset* dataset, const char* jobid);
int scr_dataset_set_cluster(scr_dataset* dataset, const char* name);
int scr_dataset_set_ckpt(scr_dataset* dataset, int id);
int scr_dataset_set_complete(scr_dataset* dataset, int complete);

If a field is already set when one of these functions is called, the new value overwrites the existing value.

There are corresponding functions to get each value.

int scr_dataset_get_id(const scr_dataset* dataset, int* id);
int scr_dataset_get_user(const scr_dataset* dataset, char** name);
int scr_dataset_get_jobname(const scr_dataset* dataset, char** name);
int scr_dataset_get_name(const scr_dataset* dataset, char** name);
int scr_dataset_get_size(const scr_dataset* dataset, unsigned long* size);
int scr_dataset_get_files(const scr_dataset* dataset, int* files);
int scr_dataset_get_created(const scr_dataset* dataset, int64_t* created);
int scr_dataset_get_jobid(const scr_dataset* dataset, char** jobid);
int scr_dataset_get_cluster(const scr_dataset* dataset, char** name);
int scr_dataset_get_ckpt(const scr_dataset* dataset, int* id);
int scr_dataset_get_complete(const scr_dataset* dataset, int* complete);

If the corresponding field is set, the get functions copy the value into the output parameter and return SCR_SUCCESS. If SCR_SUCCESS is not returned, the output parameter is not changed.
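The set and get behavior described above can be illustrated with a small self-contained mock. Everything prefixed with example_ or EXAMPLE_ is hypothetical and stands in for the real types, functions, and return codes. The mock only mirrors the documented contract: a set overwrites any earlier value, and a get leaves the output parameter untouched unless it succeeds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes standing in for SCR_SUCCESS and a failure code. */
#define EXAMPLE_SUCCESS 0
#define EXAMPLE_FAILURE 1

/* Hypothetical stand-in for a dataset with one optional field. */
typedef struct {
  int has_ckpt;  /* nonzero once the field has been set */
  int ckpt;      /* checkpoint id, meaningful only if has_ckpt is nonzero */
} example_dataset;

/* Set the field; a later call overwrites the earlier value. */
static int example_set_ckpt(example_dataset* d, int id)
{
  assert(d != NULL);
  d->ckpt = id;
  d->has_ckpt = 1;
  return EXAMPLE_SUCCESS;
}

/* Copy the field into *id only when it is set; otherwise leave *id alone. */
static int example_get_ckpt(const example_dataset* d, int* id)
{
  assert(d != NULL && id != NULL);
  if (!d->has_ckpt) {
    return EXAMPLE_FAILURE;
  }
  *id = d->ckpt;
  return EXAMPLE_SUCCESS;
}
```

Because the output parameter is left unchanged on failure, a caller can initialize it to a default value before calling a get function and then use the variable whether or not the field was set.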