synthesizer.pipeline.pipeline

A module containing a pipeline helper class.

This module contains the Pipeline class, which is used to run observable generation pipelines on a set of galaxies. To use this functionality the user needs to define the properties of the Pipeline and a function to load the galaxies. The user can then call the various methods to generate the mock data they need, reducing a complex pipeline full of boilerplate code to a handful of definitions and calls to the Pipeline object.

Example usage:

```python
from synthesizer import Pipeline

# emission_model, instrument1, instrument2 and galaxies (a list of Galaxy
# objects) are assumed to have been defined beforehand
pipeline = Pipeline(
    emission_model=emission_model,
    instruments=[instrument1, instrument2],
    nthreads=4,
    comm=None,
    verbose=1,
)

pipeline.add_galaxies(galaxies)
pipeline.get_spectra()
pipeline.get_photometry_luminosities()
pipeline.write("output.hdf5")
```

Classes

class synthesizer.pipeline.pipeline.Pipeline(emission_model, instruments=(), nthreads=1, comm=None, verbose=1)

A class for running observable generation pipelines on a set of galaxies.

To use this class the user must instantiate it with an emission model defining the different emissions that will be included in the pipeline and any instruments that will be used to make observations. Galaxies that have already been loaded are then attached with add_galaxies.

Optionally the user can also specify the number of threads to use if Synthesizer has been installed with OpenMP support, and an MPI communicator if they are running over MPI.

Finally the verbosity level can be set to control the amount of output.

Once the Pipeline object has been instantiated the user can call the various methods to generate the data they need.

For spectra:

get_spectra (passing a cosmology object if redshifted spectra are required)
get_data_cubes_lnu (resolved spectral luminosity density data cubes)
get_data_cubes_fnu (resolved spectral flux density data cubes)

For photometry:

get_photometry_luminosities
get_photometry_fluxes

For emission lines:

get_lines (passing a list of line IDs to generate)

For images (with optional PSF and noise based on the instrument):

get_images_luminosity
get_images_flux

For the SFZH grid:

get_sfzh (passing a Grid object)

The user can also add their own analysis functions to the pipeline, which will be run on each galaxy once all data has been generated. These functions should take a galaxy object as the first argument and can take any number of additional arguments and keyword arguments. Each function should return its result (a scalar, an array, or a dictionary), which is stored under the key passed as the result_key argument. Keys should be unique to each function to avoid overwriting existing results (see add_analysis_func for more details).

Finally the user can write out the data generated by the pipeline using the write method. This will write out the data to an HDF5 file.

emission_model

The emission model to use for the pipeline.

Type: EmissionModel

instruments

A list of Instrument objects to use for the pipeline.

Type: list

n_galaxies

The total number of galaxies to load (across all ranks, not per rank, when using MPI).

Type: int

nthreads

The number of threads to use for shared memory parallelism. Default is 1.

Type: int

comm

The MPI communicator to use for MPI parallelism. Default is None.

Type: MPI.Comm

verbose

The verbosity level:

0: No output beyond hello and goodbye.
1: Outputs with timings, but only on rank 0 (when using MPI).
2: Outputs with timings on all ranks (when using MPI).

Type: int
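
The verbosity levels amount to a simple rank-dependent check. The following is a minimal sketch, assuming a hypothetical helper `should_print` (not part of the documented API) and plain integer ranks in place of an MPI communicator. It covers only the timing output, not the hello and goodbye messages.

```python
def should_print(verbose, rank=0):
    """Return True if a rank should emit timing output (hypothetical helper)."""
    if verbose <= 0:
        # Level 0: nothing beyond the hello and goodbye messages
        return False
    if verbose == 1:
        # Level 1: only rank 0 reports
        return rank == 0
    # Level 2 (or higher, by assumption): every rank reports
    return True


for rank in range(3):
    print(rank, should_print(verbose=1, rank=rank))
```

Treating values above 2 the same as 2 is a choice made for this sketch only.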

galaxies

A list of Galaxy objects that have been loaded.

Type: list

filters

A combined collection of all the filters from the instruments.

Type: FilterCollection

add_analysis_func(func, result_key, *args, **kwargs)

Add an analysis function to the Pipeline.

The provided function will be called on each galaxy in the Pipeline once all data has been generated. The function should take a galaxy object as the first argument and can take any number of additional arguments and keyword arguments.

The results of the analysis function should be returned. This can be a scalar, an array, or a dictionary of arbitrary structure. The result is stored in a dictionary on the Pipeline object under the key given by the result_key argument.

For example:

```python
def my_analysis_func(galaxy, *args, **kwargs):
    return galaxy.some_attribute * 2

pipeline.add_analysis_func(my_analysis_func, "MyAnalysisResult")
```

Or for a specific component of the galaxy:

```python
def my_analysis_func(galaxy, *args, **kwargs):
    return galaxy.stars.mass.sum()

pipeline.add_analysis_func(my_analysis_func, "Stars/Mass")
```

Parameters:

func (callable) – The analysis function to add to the Pipeline. This function should take a galaxy object as the first argument and can take any number of additional arguments and keyword arguments.
result_key (str) – The key to use when storing the results of the analysis function in the output. This can include slashes to denote nesting, e.g. “Gas/Nested/Result”.
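
The slash notation in result_key implies a nested layout in the output. How the Pipeline stores results internally is not described here, so the following is only a sketch of that nesting, using a hypothetical helper `store_result` and a plain dictionary.

```python
def store_result(results, result_key, value):
    """Store value in a nested dict, splitting result_key on slashes."""
    *groups, leaf = result_key.split("/")
    node = results
    for group in groups:
        # Create intermediate levels on demand
        node = node.setdefault(group, {})
    node[leaf] = value
    return results


results = {}
store_result(results, "MyAnalysisResult", 2.0)
store_result(results, "Stars/Mass", 1.0e10)
store_result(results, "Gas/Nested/Result", [1, 2, 3])
print(results)
```

Two functions registered with the same key would overwrite each other in this layout, which is why keys should be unique.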

add_galaxies(galaxies)

Add galaxies to the Pipeline.

This function will add the provided galaxies to the Pipeline. This is useful if you have already loaded the galaxies and want to add them to the Pipeline object.

Parameters:

galaxies (list) – A list of Galaxy objects to add to the Pipeline.

apply_psfs_flux()

Apply any instrument PSFs to the flux images.

apply_psfs_luminosity()

Apply any instrument PSFs to the luminosity images.

combine_files()

Combine individual rank files into a single file.

Only applicable to MPI runs.

This will create a physical file on disk with all the data copied from the individual rank files. The rank files themselves will be deleted once all data has been copied.

This produces a single self-contained file, which is cleaner than a virtual file, but copying the data can be very slow.

combine_files_virtual()

Combine individual rank files into a single virtual file.

Only applicable to MPI runs.

This will create a file where all the data is accessible but not physically copied. This is much faster than making a true copy but requires that each individual rank file remains accessible.

get_data_cubes_fnu()

Compute the spectral flux density data cubes.

get_data_cubes_lnu()

Compute the spectral luminosity density data cubes.

get_images_flux(fov, img_type='smoothed', kernel=None, kernel_threshold=1.0)

Compute the flux images for the galaxies.

This function will compute the flux images for all spectra types that were saved when spectra were generated, in all filters included in the Pipeline instruments.

A PSF and/or noise will be applied if they are available on the instrument.

Parameters:

fov (unyt_quantity) – The field of view of the image with units.
img_type (str) – The type of image to generate. Options are 'smoothed' or 'hist'. Default is 'smoothed'.
kernel (array-like) – The kernel to use for smoothing the image. Default is None. Required for 'smoothed' images from a particle distribution.
kernel_threshold (float) – The threshold of the kernel. Default is 1.0.

get_images_luminosity(fov, img_type='smoothed', kernel=None, kernel_threshold=1.0)

Compute the luminosity images for the galaxies.

This function will compute the luminosity images for all spectra types that were saved when spectra were generated, in all filters included in the Pipeline instruments.

A PSF and/or noise will be applied if they are available on the instrument.

Parameters:

fov (unyt_quantity) – The field of view of the image with units.
img_type (str) – The type of image to generate. Options are 'smoothed' or 'hist'. Default is 'smoothed'.
kernel (array-like) – The kernel to use for smoothing the image. Default is None. Required for 'smoothed' images from a particle distribution.
kernel_threshold (float) – The threshold of the kernel. Default is 1.0.

get_lines(line_ids)

Generate the emission lines for the galaxies.

This function will generate the emission lines for all spectra types that were saved when spectra were generated.

Parameters:

line_ids (list) – The emission line IDs to generate.

get_los_optical_depths(kernel, kernel_threshold=1.0, kappa=0.0795)

Compute the line-of-sight optical depths for all particles.

This will compute the optical depths based on the line-of-sight dust column density for all non-gas components. We project a ray along the z axis (the line of sight); any gas kernels it intersects are evaluated at the intersection, and their contributions to the optical depth are included.

Parameters:

kernel (array-like) – The gas SPH kernel.
kernel_threshold (float) – The threshold of the kernel. Default is 1.0.
kappa (float) – The dust opacity coefficient in units of Msun / pc**2. Default is 0.0795.
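
The ray projection described above can be illustrated with a toy calculation. This is a sketch under several assumptions, not Synthesizer's implementation: the function name `los_optical_depths`, the tuple layouts, the observer sitting at +z, and the kernel being a projected kernel tabulated on evenly spaced values of impact parameter over smoothing length are all choices made here for illustration.

```python
import math


def los_optical_depths(stars, gas, kernel, kernel_threshold=1.0, kappa=0.0795):
    """Toy line-of-sight optical depths for star particles.

    stars: list of (x, y, z) tuples.
    gas: list of (x, y, z, smoothing_length, dust_mass) tuples.
    kernel: projected kernel tabulated on evenly spaced q = b / h values
        in [0, kernel_threshold].
    """
    taus = []
    for sx, sy, sz in stars:
        column_density = 0.0
        for gx, gy, gz, h, dust_mass in gas:
            # Assumption: the observer sits at +z, so only gas in front
            # of the star (larger z) can attenuate it
            if gz <= sz:
                continue
            # Impact parameter of the ray relative to the gas particle
            b = math.hypot(gx - sx, gy - sy)
            q = b / h
            if q >= kernel_threshold:
                continue  # the ray misses this kernel
            # Look up the tabulated kernel at the intersection
            index = int(q / kernel_threshold * (len(kernel) - 1))
            column_density += dust_mass * kernel[index] / h**2
        taus.append(kappa * column_density)
    return taus


# A top-hat disc of unit radius, normalised to integrate to one
top_hat = [1.0 / math.pi] * 100
stars = [(0.0, 0.0, 0.0), (5.0, 5.0, 0.0)]
gas = [(0.0, 0.0, 1.0, 1.0, 2.0), (0.0, 0.0, -1.0, 1.0, 2.0)]
taus = los_optical_depths(stars, gas, top_hat)
print(taus)
```

Only the first star is attenuated here: the second lies outside the kernel of the gas particle in front, and the gas particle at negative z lies behind both stars.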

get_photometry_fluxes()

Compute the photometric fluxes from the generated spectra.

get_photometry_luminosities()

Compute the photometric luminosities from the generated spectra.
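
Photometry is derived from a spectrum by weighting it with a filter transmission curve. The convention used by these methods is not stated in this section; the sketch below assumes the common photon-counting form in frequency space, with hypothetical helper names `trapezoid` and `band_average` and plain lists in place of unit-carrying arrays.

```python
def trapezoid(ys, xs):
    """Integrate ys over xs with the trapezoidal rule."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


def band_average(lnu, transmission, nu):
    """Transmission-weighted mean of a spectrum (photon-counting form)."""
    numerator = trapezoid(
        [l * t / n for l, t, n in zip(lnu, transmission, nu)], nu
    )
    denominator = trapezoid([t / n for t, n in zip(transmission, nu)], nu)
    return numerator / denominator


nu = [1.0, 2.0, 3.0, 4.0]
transmission = [0.0, 1.0, 1.0, 0.0]
flat_spectrum = [3.0, 3.0, 3.0, 3.0]
print(band_average(flat_spectrum, transmission, nu))
```

A flat spectrum returns its own value whatever the filter shape, which is a quick sanity check on the normalisation.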

get_sfzh(grid)

Compute the SFZH grid for each galaxy.

The SFZH is equivalent to the integrated weights of the star particles on the SPS grid.

Parameters:

grid (Grid) – The SPS grid to use for the SFZH calculation.
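
Conceptually, the SFZH accumulates stellar mass on the age and metallicity axes of the SPS grid. The sketch below assumes nearest-grid-point assignment and plain lists in place of a Grid object; `compute_sfzh` and its arguments are hypothetical names, and the library may distribute mass differently (for example with a cloud-in-cell scheme).

```python
def nearest_index(axis, value):
    """Return the index of the grid point closest to value."""
    return min(range(len(axis)), key=lambda k: abs(axis[k] - value))


def compute_sfzh(log10ages, metallicities, masses, age_axis, metal_axis):
    """Accumulate stellar mass on an (age, metallicity) grid."""
    sfzh = [[0.0] * len(metal_axis) for _ in age_axis]
    for age, metal, mass in zip(log10ages, metallicities, masses):
        i = nearest_index(age_axis, age)
        j = nearest_index(metal_axis, metal)
        # Each star particle deposits all of its mass in a single cell
        sfzh[i][j] += mass
    return sfzh


age_axis = [6.0, 7.0, 8.0]  # log10(age / yr) grid points
metal_axis = [0.001, 0.01, 0.02]  # metallicity grid points
sfzh = compute_sfzh(
    log10ages=[6.1, 7.9, 8.0],
    metallicities=[0.0012, 0.019, 0.02],
    masses=[1.0, 2.0, 3.0],
    age_axis=age_axis,
    metal_axis=metal_axis,
)
print(sfzh)
```

Whatever the assignment scheme, the sum over all cells should equal the total stellar mass that was binned.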

get_spectra(cosmo=None)

Generate the spectra for the galaxies based on the EmissionModel.

repartition_galaxies(galaxy_weights=None, random_seed=42)

Repartition the loaded galaxies across the ranks.
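
The algorithm behind the repartitioning is not described in this section. As one plausible scheme, the sketch below balances per-galaxy weights greedily and falls back to a seeded shuffle when no weights are given; `partition_indices` is a hypothetical function that works on galaxy indices and involves no MPI communication.

```python
import heapq
import random


def partition_indices(n_galaxies, n_ranks, galaxy_weights=None, random_seed=42):
    """Assign galaxy indices to ranks (illustrative, not Synthesizer's code)."""
    if galaxy_weights is None:
        # No cost estimate: shuffle reproducibly and deal round-robin
        indices = list(range(n_galaxies))
        random.Random(random_seed).shuffle(indices)
        return [indices[rank::n_ranks] for rank in range(n_ranks)]

    # Greedy balancing: heaviest galaxies first, each to the lightest rank
    order = sorted(
        range(n_galaxies), key=lambda i: galaxy_weights[i], reverse=True
    )
    heap = [(0.0, rank) for rank in range(n_ranks)]
    parts = [[] for _ in range(n_ranks)]
    for i in order:
        load, rank = heapq.heappop(heap)
        parts[rank].append(i)
        heapq.heappush(heap, (load + galaxy_weights[i], rank))
    return parts


weights = [5, 4, 3, 2, 1, 1]
parts = partition_indices(len(weights), 2, galaxy_weights=weights)
loads = [sum(weights[i] for i in part) for part in parts]
print(parts, loads)
```

Weights here stand for any per-galaxy cost estimate, such as particle counts.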

write(outpath, verbose=None)

Write what we have produced to an HDF5 file.

Parameters:

outpath (str) – The path to the HDF5 file to write.
verbose (bool, optional) – If set, override the Pipeline verbose setting.