synthesizer.pipeline.pipeline_io

A module for handling I/O operations in the pipeline.

This module contains classes and functions for reading and writing data in the pipeline. This includes reading and writing HDF5 files, as well as handling MPI communication for parallel I/O operations.

Example usage:

# Write data to an HDF5 file
writer = PipelineIO("output.hdf5")
writer.write_data(data, key)

Classes

class synthesizer.pipeline.pipeline_io.PipelineIO(filepath, comm=None, ngalaxies_local=None, start_time=None, verbose=1, parallel_io=False)[source]

A class for writing data to an HDF5 file.

This class provides methods for writing data to an HDF5 file. It can handle writing data in parallel using MPI if the h5py library has been built with parallel support.
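
For example, a minimal sketch of constructing a writer with an MPI communicator, assuming mpi4py is installed and h5py has been built with parallel support (the galaxy count is a placeholder):

from mpi4py import MPI
from synthesizer.pipeline.pipeline_io import PipelineIO

comm = MPI.COMM_WORLD

# Each rank reports how many galaxies it holds (hypothetical count)
writer = PipelineIO(
    "output.hdf5",
    comm=comm,
    ngalaxies_local=100,
    verbose=1,
    parallel_io=True,  # requires h5py built with parallel support
)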

hdf

The HDF5 file to write to.

Type:

h5py.File

comm

The MPI communicator.

Type:

mpi.Comm

num_galaxies

The total number of galaxies.

Type:

int

rank

The rank of the MPI process.

Type:

int

is_parallel

Whether the writer is running in parallel.

Type:

bool

is_root

Whether the writer is running on the root process.

Type:

bool

is_collective

Whether the writer is running in collective mode.

Type:

bool

verbose

Whether to print verbose output.

Type:

bool

_start_time

The start time of the pipeline.

Type:

float

combine_rank_files()[source]

Combine the rank files into a single file.

Parameters:

output_file (str) – The name of the output file.

combine_rank_files_virtual()[source]

Combine the rank files into a single virtual file.

Note that the virtual file this produces requires the rank files to remain in the same location as when they were created.
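
A hedged sketch of finishing a run in which each rank has written its own file; the barrier and the choice between a physical merge and a virtual file are illustrative:

# Wait until every rank has finished writing its own file
comm.Barrier()

if writer.is_root:
    # Merge the per-rank files into a single output file...
    writer.combine_rank_files()
    # ...or build a virtual file that references the rank files in place
    # writer.combine_rank_files_virtual()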

create_datasets_parallel(data, key)[source]

Create datasets ready to be populated in parallel.

This is only needed for collective I/O operations. The datasets are first created in serial so that they can later be written to in any order from any rank.

Parameters:
  • shapes (dict) – The shapes of the datasets to create.

  • dtypes (dict) – The data types of the datasets to create.
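
A minimal sketch, assuming the (data, key) signature shown above; the data dictionary is a placeholder of per-rank arrays:

import numpy as np

# Hypothetical per-rank data destined for shared datasets
data = {"Masses": np.zeros(100), "Ages": np.zeros(100)}

# Create the datasets in serial so any rank can write to them later
writer.create_datasets_parallel(data, "Galaxies/Stars")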

create_file_with_metadata(instruments, emission_model)[source]

Write metadata to the HDF5 file.

This writes useful metadata to the root group of the HDF5 file and outputs the instruments and emission model to the appropriate groups.

Parameters:
  • instruments (dict) – A dictionary of instrument objects.

  • emission_model (dict) – A dictionary of emission model objects.
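
For example (the instrument and emission model objects below are placeholders standing in for real synthesizer objects):

# Hypothetical dictionaries of instruments and emission models
instruments = {"JWST.NIRCam": nircam}      # placeholder instrument
emission_model = {"total": total_model}    # placeholder emission model

writer.create_file_with_metadata(instruments, emission_model)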

write_data(data, key, indexes=None, root=0)[source]

Write data using the appropriate method based on the environment.

Parameters:
  • data (any) – The data to write.

  • key (str) – The key to write the data to.

  • root (int, optional) – The root rank for gathering and writing.
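
A minimal sketch of a typical call (the array and dataset path are illustrative):

import numpy as np

# Hypothetical per-galaxy stellar masses
stellar_masses = np.random.rand(100)

# Picks the serial, gathered, or collective write path for us
writer.write_data(stellar_masses, "Galaxies/Stars/StellarMass", root=0)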

write_dataset(data, key)[source]

Write a dataset to an HDF5 file.

We handle several cases here:

  • If the data is a unyt object, we write the value and units.

  • If the data is a string, we convert it to an h5py compatible string and write it with dimensionless units.

  • If the data is a numpy array, we write the data and set the units to "dimensionless".

Parameters:
  • data (any) – The data to write.

  • key (str) – The key to write the data to.
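
Illustrative calls covering the cases above (the keys are placeholders, and unyt is assumed to be installed):

import numpy as np
from unyt import Msun, unyt_array

# unyt data: both the values and the units are stored
writer.write_dataset(unyt_array(np.ones(10), Msun), "Stars/Masses")

# string data: converted to an h5py compatible string, dimensionless units
writer.write_dataset("total", "Config/EmissionKey")

# plain numpy data: written with units set to "dimensionless"
writer.write_dataset(np.arange(10), "Stars/Indices")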

write_dataset_parallel(data, key)[source]

Write a dataset to an HDF5 file in parallel.

This function requires that h5py has been built with parallel support.

Parameters:
  • data (any) – The data to write.

  • key (str) – The key to write the data to.
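
A brief sketch of a collective write, assuming the writer was constructed with parallel_io=True against an h5py build with MPI support (the per-rank array and key are placeholders):

import numpy as np

# Hypothetical per-rank slice of the full dataset
local_gas_masses = np.zeros(100)

# Every rank must take part in the collective write of its local slice
writer.write_dataset_parallel(local_gas_masses, "Galaxies/Gas/Masses")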

write_datasets_parallel(data, key, paths)[source]

Write a dictionary to an HDF5 file recursively in parallel.

This function requires that h5py has been built with parallel support.

Parameters:
  • data (dict) – The data to write.

  • key (str) – The key to write the data to.

write_datasets_recursive(data, key)[source]

Write a dictionary to an HDF5 file recursively.

Parameters:
  • data (dict) – The data to write.

  • key (str) – The key to write the data to.
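
For example (the nested dictionary contents are placeholders; sub-dictionaries become HDF5 groups):

import numpy as np

# Hypothetical nested results; the nesting maps onto HDF5 groups
results = {
    "Stars": {"Masses": np.ones(100), "Ages": np.zeros(100)},
    "Gas": {"Masses": np.ones(100)},
}

writer.write_datasets_recursive(results, "Galaxies")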