Library Documentation

symlib is Symphony’s data analysis library. Symphony uses custom file formats to reduce file sizes, to speed up common tasks, and to make some of those tasks easier. These library functions allow you to work with these files. symlib also allows you to download Symphony data sets.

This page is meant to be a full technical reference for this library. First-time users will probably find it easiest to start with the Data Access and Data Analysis Tutorial pages rather than reading through this page from start to finish.

Units

symlib functions generally expect variables to be in the following units:

  • Masses: \(M_\odot\)

  • Distances/radii: physical \({\rm kpc}\)

  • Positions: physical \({\rm kpc}\), centered on the host halo

  • Velocities: physical \({\rm km/s}\), centered on the host halo

Much of the data that symlib reads in is a processed form of data from another code, which may use different conventions. To convert to symlib’s conventions, use the set_units_* family of functions: symlib.set_units_halos(), symlib.set_units_parameters(), symlib.set_units_branches() as needed.
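To make these conventions concrete, the following is a minimal sketch of the arithmetic involved in converting comoving, \(h_{100}\)-scaled quantities to the physical units listed above. It is not symlib’s implementation: the function names are hypothetical, the inputs are assumed to follow the Rockstar conventions described under symlib.read_subhalos() (masses in \(h^{-1}M_\odot\), positions in comoving \(h^{-1}{\rm Mpc}\), radii in comoving \(h^{-1}{\rm kpc}\)), and periodic box boundaries are ignored.

```python
def mass_to_msun(m_h, h100):
    """Convert a mass from h^-1 Msun to Msun."""
    return m_h / h100


def radius_to_physical_kpc(r_ckpc_h, scale, h100):
    """Convert a radius from comoving h^-1 kpc to physical kpc.

    `scale` is the scale factor a(z) of the snapshot.
    """
    return r_ckpc_h * scale / h100


def position_to_physical_kpc(x_cmpc_h, x_host_cmpc_h, scale, h100):
    """Convert a 3-vector from comoving h^-1 Mpc to physical kpc,
    centered on the host position (given in the same input units)."""
    return [(x - x0) * 1e3 * scale / h100
            for x, x0 in zip(x_cmpc_h, x_host_cmpc_h)]
```

In practice the set_units_* functions should be preferred; the sketch only shows which factors of \(a\) and \(h_{100}\) enter each quantity.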

Datatypes

symlib halo data is generally returned as a numpy structured array, which allows fields to be accessed and subselected easily. See the Data Analysis Tutorial page for usage examples.

symlib.SUBHALO_DTYPE

Time-dependent subhalo information (e.g., position) from the Rockstar halo finder. You can get this information for all of a host’s subhalos by calling symlib.read_subhalos().

DATA FIELDS:

  • "id" (numpy.int32) - A unique integer identifying each subhalo. Changes from snapshot to snapshot.

  • "mvir" (numpy.float32) - The mass of the halo, \(M_{\rm vir}\). When isolated, this is an overdensity mass from the Bryan & Norman (1998) definition of the virial overdensity. When deep in a host halo, this is the bound mass. The transition between these two definitions is ill-defined.

  • "rvir" (numpy.float32) - The overdensity radius of the halo, \(R_{\rm vir}\).

  • "vmax" (numpy.float32) - The maximum value of the halo’s circular rotation curve, \(V_{\rm max} = {\rm max}\left\{V_{\rm rot}(r) = \sqrt{G M(<r)/r}\right\}\).

  • "rvmax" (numpy.float32) - The radius where this maximum rotation velocity occurs.

  • "cvir" (numpy.float32) - An estimate of how centrally concentrated the subhalo’s mass is, \(c_{\rm vir}=R_{\rm vir}/R_s\). \(R_s\) is the transition radius between shallow inner density slopes (\(d \ln(\rho)/d \ln(r) > -2\)) and steep outer slopes (\(d \ln(\rho)/d \ln(r) < -2\)). \(c_{\rm vir}\) is estimated by measuring \(V_{\rm max}/V_{\rm rot}(R_{\rm vir})\), assuming an NFW profile, and solving for \(R_s\). Because of this, the value of \(c_{\rm vir}\) is only meaningful for halos where the assumption of NFW profiles is reasonable (non-subhalos). However, the relative ordering of concentrations will be correct regardless.

  • "x" (numpy.float32) - \(x\), the position of the subhalo.

  • "v" (numpy.float32) - \(v\), the velocity of the subhalo.

  • "ok" (bool) - True if the subhalo exists during the specified snapshot and False otherwise.
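The "cvir" estimate described above can be illustrated with a short sketch. For an NFW profile the rotation curve peaks at \(r \approx 2.16258\,R_s\), so \(V_{\rm max}/V_{\rm rot}(R_{\rm vir})\) is a monotonically increasing function of \(c = R_{\rm vir}/R_s\) for \(c > 2.16258\) and can be inverted numerically. The code below is an illustration under those assumptions, not the routine Rockstar or symlib actually uses; the function names are hypothetical.

```python
import math

X_MAX = 2.16258  # r / R_s at which an NFW rotation curve peaks


def nfw_f(x):
    """Dimensionless NFW enclosed-mass function, M(<x R_s) is proportional to f(x)."""
    return math.log(1.0 + x) - x / (1.0 + x)


def vmax_over_vvir(c):
    """V_max / V_rot(R_vir) for an NFW halo with concentration c >= X_MAX."""
    return math.sqrt((c / X_MAX) * nfw_f(X_MAX) / nfw_f(c))


def concentration_from_ratio(ratio, c_lo=X_MAX, c_hi=1000.0, tol=1e-8):
    """Solve vmax_over_vvir(c) = ratio for c by bisection."""
    if not (vmax_over_vvir(c_lo) <= ratio <= vmax_over_vvir(c_hi)):
        raise ValueError("ratio is outside the range covered by [c_lo, c_hi]")
    while c_hi - c_lo > tol:
        c_mid = 0.5 * (c_lo + c_hi)
        if vmax_over_vvir(c_mid) < ratio:
            c_lo = c_mid
        else:
            c_hi = c_mid
    return 0.5 * (c_lo + c_hi)
```

Because the mapping is monotonic, halos with a larger measured velocity ratio always receive a larger concentration, which is why the relative ordering survives even when the NFW assumption does not hold.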

symlib.HISTORY_DTYPE

Time-independent subhalo information about the subhalo’s entire history in the simulation (e.g. when it first fell into the host halo). You can get it for all the host’s subhalos by calling symlib.read_subhalos().

DATA FIELDS

  • "mpeak" (numpy.float32) - \(M_{\rm peak}\), the largest \(M_{\rm vir}\) that the subhalo ever had. This quantity is often useful for reasoning about subhalo disruption or as a component in models of galaxy mass.

  • "vpeak" (numpy.float32) - \(V_{\rm peak}\), the largest \(V_{\rm max}\) that the subhalo ever had. This is useful in the same places that \(M_{\rm peak}\) is.

  • "merger_snap" (numpy.int32) - The snapshot where the subhalo first fell within the virial radius of the host halo.

  • "merger_ratio" (numpy.float32) - The ratio of masses at this infall snapshot.

  • "branch_idx" (numpy.int32) - The index of this halo’s branch in the full merger tree. This allows you to switch back and forth between the two data structures as needed.

  • symlib.HISTORY_DTYPE also contains all the fields in symlib.BRANCH_DTYPE. Note, however, that subhalos where is_disappear is True or is_real is False have already been removed, so there is no need to make cuts on these fields.

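  • "false_selection" (bool) - True if the branch has \(M_{\rm peak} \geq 300\cdot m_p\) overall, but was below \(300\cdot m_p\) when it first became a subhalo. Such subhalos are treated as numerical artifacts and are excluded by symlib.read_subhalos() unless include_false_selections=True.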

symlib.BRANCH_DTYPE

Information about the main branch of a subhalo in the full consistent-trees merger tree. You probably will not need this unless you walk through the full merger tree, which is an advanced action. You can get it by calling symlib.read_branches().

DATA FIELDS

  • "start" (numpy.int32) - The index of the last halo along this main branch. It is labeled “start” because the tree is ordered from later times to earlier times. See the documentation on read_tree() for more details on tree structure.

  • "end" (numpy.int32) - The index after the first halo in the branch. This means that the full main branch can be accessed by using index slicing: branch = tree[start: end].

  • "is_real" (bool) - False if the first tracked halo of this branch is a subhalo and True otherwise. Branches where this is False are virtually always tree-linking errors.

  • "is_disappear" (bool) - True if the last tracked halo of this branch disrupts without merging with any other halos and False otherwise. Branches where this is True are virtually always barely-resolved objects fluctuating in and out of existence near the resolution barrier.

  • "is_main_sub" (bool) - True if any halo in the branch was ever a subhalo of the main host.

  • "preprocess" (numpy.int32) - A non-negative integer if the branch was ever the subhalo of a larger halo prior to becoming a subhalo of the host and -1 otherwise. If the first case is true, this variable is the index of the largest branch that this branch was a subhalo of. There’s some non-trivial bookkeeping required to deal with tree errors caused by major mergers, which will be described in a future paper. For now, suffice it to say that it is a generalized version of Section 2.3.1 of Mansfield & Kravtsov (2020).

  • "first_infall_snap" (numpy.int32) - If "preprocess" is non-negative, the snapshot when this branch first fell into a halo of the branch pointed to by "preprocess".
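The "start"/"end" convention above can be sketched with plain Python lists standing in for the tree and branch arrays. The toy values are made up for illustration and the helper names are hypothetical; the same slicing works on numpy arrays.

```python
def main_branch(tree_var, branch):
    """Return one branch of a tree-ordered variable.

    The slice runs from the latest snapshot (index `start`) to the
    earliest one (index `end - 1`), because the tree is ordered from
    later times to earlier times.
    """
    return tree_var[branch["start"]:branch["end"]]


def peak_value(tree_var, branch):
    """Largest value a variable reaches anywhere along a branch."""
    return max(main_branch(tree_var, branch))


# Toy tree with two branches of three halos each (made-up numbers).
mvir = [5.0, 4.0, 2.0, 1.5, 1.0, 0.5]
branches = [{"start": 0, "end": 3}, {"start": 3, "end": 6}]

for b in branches:
    print(main_branch(mvir, b), peak_value(mvir, b))
```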

Merger Tree Variables

The following variables can be read in from merger trees with the symlib.read_tree() function. These variables are taken directly from the consistent-trees output files and still retain its units and ID conventions.

  • "dfid" - The depth-first ID of the halo.

  • "id" - The ID of the halo.

  • "desc_id" - The ID (id, not dfid) of the descendant. -1 if the halo has no descendants.

  • "upid" - The UpID of a halo. This is -1 if the halo is not within a larger halo’s virial radius, otherwise it is the ID (id, not dfid) of that larger halo.

  • "phantom" - A flag indicating whether consistent-trees was able to track the object during this snapshot. 1 if so, and 0 otherwise. If 0, this halo’s properties were interpolated during this snapshot.

  • "snap" - This halo’s snapshot.

  • "next_co_prog" - The depth-first ID (dfid, not id) of this halo’s co-progenitor, if it exists. If this halo doesn’t have a co-progenitor, this variable is -1. See Intro to Merger Trees for a description of what this is.

  • "mvir" - The mass of the halo, \(M_{\rm vir}\). When isolated, this is an overdensity mass from the Bryan & Norman (1998) definition of the virial overdensity. When deep in a host halo, this is the bound mass. The transition between these two definitions is ill-defined.

  • "rs" - The NFW scale radius of the halo, \(R_s\). Units are comoving \(h^{-1}{\rm kpc}\).

  • "vmax" - The maximum value of the halo’s circular rotation curve, \(V_{\rm max} = {\rm max}\left\{V_{\rm rot}(r) = \sqrt{G M(<r)/r}\right\}\). Units are physical km/s.

  • "m200b" - The overdensity mass, \(M_{\rm 200b}\), corresponding to \(200\times \rho_m\).

  • "m200c" - The overdensity mass, \(M_{\rm 200c}\), corresponding to \(200\times \rho_c\).

  • "m500c" - The overdensity mass, \(M_{\rm 500c}\), corresponding to \(500\times \rho_c\).

  • "xoff" - The distance between the center of mass and the densest part of the halo. Units are comoving \(h^{-1}{\rm kpc}\).

  • "spin_bullock" - Unitless parameter that tracks the specific angular momentum of the halo, \(|\vec{J}|/(\sqrt{2}\,M_{\rm vir}\,V_{\rm vir}\,R_{\rm vir})\).

  • "c_to_a" - The unitless minor-to-major axis ratio of the halo.

  • "b_to_a" - The unitless intermediate-to-major axis ratio of the halo.

  • "t_to_u" - The virial ratio, \(T/|U|\).

  • "r_vmax" - The radius, \(R_{\rm vmax}\), at which \(V_{\rm max}\) occurs.

  • "x" - A 3-vector, \(\vec{x}\), giving the position of the halo in comoving \(h^{-1}{\rm Mpc}\).

  • "v" - A 3-vector, \(\vec{v}\), giving the velocity of the halo in physical km/s.

  • "j" - A 3-vector, \(\vec{J}\), giving the angular momentum of the halo in physical \(h^{-2}M_\odot\cdot{\rm Mpc}\cdot{\rm km/s}\).

  • "a" - A 3-vector, \(\vec{A}\), pointing in the direction of the halo’s major axis with length equal to that major axis. Units are comoving \(h^{-1}{\rm kpc}\).
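As a worked example of one of these definitions, the sketch below evaluates the "spin_bullock" formula from \(\vec{J}\), \(M_{\rm vir}\), and \(R_{\rm vir}\). It assumes the inputs have already been converted to \(M_\odot\), physical kpc, and \(M_\odot\cdot{\rm kpc}\cdot{\rm km/s}\), and it assumes the usual definition \(V_{\rm vir} = \sqrt{G M_{\rm vir}/R_{\rm vir}}\), which this page does not state explicitly. The function name is hypothetical.

```python
import math

# Gravitational constant in kpc (km/s)^2 / Msun.
G = 4.30091e-6


def bullock_spin(j_vec, mvir, rvir):
    """Bullock spin parameter |J| / (sqrt(2) M_vir V_vir R_vir).

    j_vec: angular momentum 3-vector in Msun kpc km/s
    mvir:  virial mass in Msun
    rvir:  virial radius in physical kpc
    """
    j_mag = math.sqrt(sum(c * c for c in j_vec))
    v_vir = math.sqrt(G * mvir / rvir)  # assumed definition of V_vir
    return j_mag / (math.sqrt(2.0) * mvir * v_vir * rvir)
```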

General Functions

symlib.n_hosts(suite_name)

Returns the number of zoom-in simulations, each of which is associated with one “target” host halo, in a simulation suite (the Symphony suites are: LMC, Milky Way, Group, L-Cluster, and Cluster). Can be used with symlib.get_host_directory() to loop over all target host halos in a suite.

Parameters

suite_name (str) – The name of the simulation suite.

Return type

int

symlib.get_host_directory(base_dir, suite_name, halo_name)

Returns the name of a simulation directory given the base directory that all the suites are stored in, the suite, and the halo name. The halo name can either be the literal halo name (e.g., "Halo023") or a number in the range \([0,\,N_{\rm host})\). This can be combined with symlib.n_hosts() to loop over all the hosts in a suite.

Parameters
  • base_dir (str) – Base directory containing all suites.

  • suite_name (str) – Name of the simulation suite.

  • halo_name (str or int) – Name or index of the target host halo.

Return type

str, the name of the host’s simulation directory.
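The loop over hosts mentioned above can be sketched as follows. So that the snippet runs without symlib installed, the two library functions are passed in as arguments; host_directories is a hypothetical helper, not part of symlib.

```python
def host_directories(base_dir, suite_name, n_hosts, get_host_directory):
    """Return the simulation directory of every host in a suite.

    n_hosts and get_host_directory are expected to behave like
    symlib.n_hosts and symlib.get_host_directory: the first returns the
    number of hosts, the second accepts an integer index in [0, N_host).
    """
    return [get_host_directory(base_dir, suite_name, i)
            for i in range(n_hosts(suite_name))]


# With symlib installed, a call would look like this (the base
# directory is a placeholder):
#
#   import symlib
#   dirs = host_directories("path/to/base_dir", "SymphonyMilkyWay",
#                           symlib.n_hosts, symlib.get_host_directory)
```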

symlib.scale_factors(sim_dir)

Returns an array of the scale factors, \(a(z)\), of each snapshot, sorted from earliest to latest.

The scale factor arrays of two simulations in different suites may be different from one another. The scale factor arrays of two simulations in the same suite sometimes also slightly differ, depending on whether simulations needed to be restarted midway through.

Parameters

sim_dir (str) – The directory of the target host halo.

Return type

np.array containing the scale factors of each snapshot in the simulation.
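Scale factors are often more convenient as redshifts, \(z = 1/a - 1\). The sketch below converts a sequence of scale factors and finds the snapshot closest to a requested redshift. The helper names are hypothetical, and plain lists are used in place of the returned np.array.

```python
def redshifts(scale_factors):
    """Convert scale factors a to redshifts z = 1/a - 1."""
    return [1.0 / a - 1.0 for a in scale_factors]


def snapshot_nearest_redshift(scale_factors, z):
    """Index of the snapshot whose scale factor is closest to 1/(1+z)."""
    target = 1.0 / (1.0 + z)
    return min(range(len(scale_factors)),
               key=lambda i: abs(scale_factors[i] - target))
```

Since the scale factor arrays can differ between simulations, the snapshot index for a given redshift should be looked up per simulation rather than assumed to be shared.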

symlib.simulation_parameters(sim_dir)

Returns a dictionary containing parameters of the simulation suite, mapping the string names of variables to their values.

  • "eps" - \(\epsilon\), the effective radius of dark matter particles in comoving \(h^{-1}{\rm kpc}\) (i.e. the “Plummer-equivalent force softening scale”).

  • "mp" - \(m_p\), the mass of dark matter particles in \(h^{-1}M_\odot\).

  • "n_snap" - \(N_{\rm snap}\), the number of snapshots in the simulation.

  • "h100" - \(h_{100} = H_0 / (100\ {\rm km/s/Mpc})\), the scaled Hubble parameter.

It also contains colossus-compatible cosmology parameters. Note that these are not the same between all suites.

  • "flat" - True if the universe is flat and False otherwise.

  • "H0" - \(H_0\), the Hubble constant in units of km/s/Mpc.

  • "Om0" - \(\Omega_{m,0}\), the total matter density relative to the critical density at \(z=0\).

  • "Ob0" - \(\Omega_{b,0}\), the baryon density relative to the critical density at \(z=0\).

  • "sigma8" - \(\sigma_8\), the amplitude of the power spectrum at \(8\ h^{-1}{\rm Mpc}\).

  • "ns" - \(n_s\), the spectral tilt of the power spectrum.

Parameters

sim_dir (str) – The directory of the target host halo. You may also just pass it the name of the simulation suite (e.g. "SymphonyMilkyWay").

Return type

dict
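As an illustration of working with this dictionary, the sketch below derives the particle mass in \(M_\odot\), the mass of the 300-particle cutoff used elsewhere on this page, and the physical softening length at a given scale factor. The helper names are hypothetical and the numbers in the toy dictionary are made up; only the "mp", "eps", and "h100" keys and their units are taken from the list above.

```python
def particle_mass_msun(param):
    """Particle mass in Msun ("mp" is stored in h^-1 Msun)."""
    return param["mp"] / param["h100"]


def resolution_cut_msun(param, n_particles=300):
    """Mass of n_particles dark matter particles, in Msun."""
    return n_particles * particle_mass_msun(param)


def softening_physical_kpc(param, scale):
    """Force softening in physical kpc at scale factor `scale`
    ("eps" is stored in comoving h^-1 kpc)."""
    return param["eps"] * scale / param["h100"]


# Made-up values, for illustration only.
toy_param = {"mp": 2.8e5, "eps": 0.119, "h100": 0.7}
print(resolution_cut_msun(toy_param))
```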

symlib.set_units_parameters(scale, param)

Converts the particle mass (\(m_p\), "mp") and particle size (\(\epsilon\), "eps") to symlib’s default units.

Parameters
  • mp (float) – particle mass in \(M_\odot\)

  • eps (float) – Plummer-equivalent force softening scale in physical \({\rm kpc}\).

symlib.set_units_halos(h, scale, param)

Converts the units of a 2D np.array with type symlib.SUBHALO_DTYPE to symlib’s default units. All masses will be in units of \(M_\odot\), and all positions and radii will be in units of physical \({\rm kpc}\). Positions will be centered on the first halo in the array at the given snapshot. Velocities will be in physical \({\rm km/s}\) and similarly centered on the velocity of the first halo at each snapshot.

This function only needs to be called if comoving=True in symlib.read_subhalos(). This is not true by default.

Parameters

symlib.set_units_histories(hist, scale, param)

Converts the units of an np.array with type symlib.HISTORY_DTYPE to symlib’s default units. All masses will be in units of \(M_\odot\), and all positions and radii will be in units of physical \({\rm kpc}\). Positions will be centered on the first halo in the array at the given snapshot. Velocities will be in physical \({\rm km/s}\) and similarly centered on the velocity of the first halo at each snapshot.

This function only needs to be called if comoving=True in symlib.read_subhalos(). This is not true by default.

Parameters

Halo Functions

symlib.read_subhalos(sim_dir, comoving=False, include_false_selections=False)

Reads the subhalo data for a single host halo. Two arrays are returned.

The first return value is a 2D symlib.SUBHALO_DTYPE array representing the time-dependent behavior of each subhalo (e.g. positions). The array first indexes over subhalos in order of their peak \(M_{\rm vir}\) value and then indexes over snapshots from first to last. The host halo is at the first index. The second return value is a 1D symlib.HISTORY_DTYPE array which represents time-independent information about each subhalo (e.g. merger time). It has the same ordering as the first index of the symlib.SUBHALO_DTYPE array.

Subhalos are determined by the Rockstar halo finder and consistent-trees merger tree code. All objects that have ever been within \(R_{\rm vir,host}\) of the host halo are included, meaning that disrupted, merged, and “splashback” subhalos are included.

If comoving=False, symlib’s default units are used. Positions and velocities are centered on the host halo. Otherwise, the output arrays use Rockstar’s unit conventions by default: all masses, positions, and distances have \(h_{100}\)-scalings: masses have units of \(h^{-1}M_\odot\), positions comoving \(h^{-1}{\rm Mpc}\), and radii comoving \(h^{-1}{\rm kpc}\). In this case positions will be centered on the zero-point of the box.

By default, subhalos which have \(M_{\rm peak}\) above the 300-particle cutoff, but were below the cutoff when they first became a subhalo, are considered numerical artifacts and are _not_ included. They can be reintroduced to the catalog by setting include_false_selections=True.

Parameters
  • sim_dir (str) – The directory of the target host halo.

  • comoving=False (bool) – Controls whether the return values are in default Rockstar/consistent-trees units (True) or default symlib units (False).

  • include_false_selections=False (bool) – Controls whether subhalos which only have \(M_{\rm peak}\) above the catalog cutoff due to a consistent-trees error are included (True) or excluded (False).

Return type

(h, hist): h is a symlib.SUBHALO_DTYPE np.array with shape (\(N_{\rm subhalos}\), \(N_{\rm snaps}\)), hist is a symlib.HISTORY_DTYPE np.array with length \(N_{\rm subhalos}\).

symlib.read_tree(sim_dir, var_names)

Reads the time-dependent properties of every halo in the simulation, not just the subhalos of the target host, in a “depth-first merger tree” format.

The user supplies a list of variable names and a single, 1D array is returned for each variable. Each element of each array is a halo at a specific snapshot, and these arrays are ordered in a way that encodes which halos evolve and merge into which other halos. To decode this structure, you will need to use the results of symlib.read_branches(), which breaks the tree into smaller structures, or “branches.”

The full structure of this merger tree is too large a topic to be covered here. A writeup can be found on the Intro to Merger Trees page.

Parameters
  • sim_dir (str) – The directory of the target host halo.

  • var_names (str list) – The names of variables.

Return type

tuple of np.array, one for each element in var_names.

symlib.read_branches(sim_dir)

Reads information about the time-independent properties of every halo in the simulation, not just the subhalos of the target host. Each element corresponds to a single branch in the tree (i.e. the evolution of a single halo over time) and gives information on the properties and location of the branch.

The full structure of this merger tree is too large a topic to be covered here. A writeup can be found on the Intro to Merger Trees page.

Parameters

sim_dir (str) – The directory of the target host halo.

Return type

symlib.BRANCH_DTYPE np.array

symlib.merger_lookup_table(b, dfid)

Creates a lookup table to aid with finding the branches of merging halos. The details of this table are not important and may be changed at any time to improve performance.

Parameters
Return type

int np.array

symlib.find_merger_branch(lookup_table, co_prog)

Searches for the index of the branch corresponding to a given merging subhalo. The subhalo is identified by a “co-progenitor” ID. See the writeup in Intro to Merger Trees for more discussion on what this means.

In practice, most users will want to use symlib.find_all_merger_branches().

Parameters
  • lookup_table (int np.array) – A look up table, as created by symlib.merger_lookup_table().

  • co_prog (int) – a single “co-progenitor depth-first ID” ("next_co_prog" in calls to read_tree()).

Return type

int

symlib.find_all_merger_branches(b, lookup_table, co_prog, i)

Returns the indices of all the branches that merge with a given halo (i.e. branches that exist in the current snapshot but disrupt in the next snapshot).

Parameters
  • b (symlib.BRANCH_DTYPE np.array) – The branch information for the merger tree.

  • lookup_table (int np.array) – A look up table, as created by symlib.merger_lookup_table().

  • co_prog (int np.array) – A tree-ordered array of co-progenitor IDs ("next_co_prog" in calls to read_tree()).

  • i (int) – The index of the halo in the tree that you are interested in.

Return type

int np.array

Utility Functions

symlib.colossus_parameters(param)

Converts a symlib parameter dictionary to a parameter dictionary that can be passed to a call to colossus.cosmology.cosmology.setCosmology. This will allow you to calculate cosmological quantities (e.g. the mass-concentration relation) using the colossus library.

Parameters

param (dict) – A symlib parameter dictionary returned by symlib.simulation_parameters().

Return type

A colossus parameter dictionary.

symlib.suite_names()

Returns a list of all the valid suite names.

Return type

string list

symlib.plot_circle(ax, x, y, r, **kwargs)

Plots a circle on a given matplotlib.pyplot.Axes. This is a convenience function that helps with example code in the tutorial.

All keyword arguments accepted by matplotlib.pyplot.plot are accepted as keyword arguments by this function.

Parameters
  • ax (matplotlib.pyplot.Axes) – The axis to plot the circle on.

  • x (float) – The \(x\) coordinate of the circle.

  • y (float) – The \(y\) coordinate of the circle.

  • r (float) – The radius of the circle.

File Management

symlib.download_files(user, password, suite, halo_name, base_out_dir, target='halos', logging=True)

Downloads data associated with a set of halos/suites. See Data Access for usage examples.

This download has two stages. First, all the data is downloaded in “packed” tar files. Once this finishes, all the tar files are expanded into data directories and deleted. The first step is handled by symlib.download_packed_files() and the second by symlib.unpack_files(). If you are running a large download job that stops halfway and don’t want to repeat work when you restart it, you can use these two functions to do it.

Parameters
  • user – The username you would like to use to perform the download. See instructions on the Data Access page for obtaining a username/password.

  • password – The password associated with your username. See instructions on the Data Access page for obtaining a username/password.

  • suite_name (str or None) – The suite to download a halo from. This may either be the full name of a symlib suite or None. If None, symlib.download_files() will be applied to every simulation suite with the given value of halo_name.

  • halo_name (str, int, or None) – The halo to download. This can either be an int giving the index of the halo in the suite, a string giving the name of the halo, or None. If None, all the halos in the given suite[s] will be downloaded.

  • base_out_dir – The directory where data is stored.

  • target="halos" – What type of data to download. Possible options are "halos" and "trees".

  • logging=True – True if you would like output printed telling the user what stage in the download they are at and False if you would like to turn off as much printing as possible.
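The two-stage structure described above can be sketched as follows. The stage functions are passed in as arguments so that the snippet runs without symlib or network access; download_in_stages and its skip_download flag are hypothetical and only illustrate the control flow, using the argument order given in the signatures on this page.

```python
def download_in_stages(download_packed, unpack, user, password, suite,
                       halo_name, base_out_dir, target="halos",
                       logging=True, skip_download=False):
    """Run the download and unpack stages separately.

    download_packed and unpack are expected to behave like
    symlib.download_packed_files and symlib.unpack_files. Set
    skip_download=True when every tar file is already complete on disk
    and only the unpacking step needs to be repeated.
    """
    if not skip_download:
        # Stage 1: fetch the "packed" tar files.
        download_packed(user, password, suite, halo_name, base_out_dir,
                        target=target, logging=logging)
    # Stage 2: expand the tar files into data directories.
    unpack(suite, halo_name, base_out_dir, target=target, logging=logging)
```

After an interrupted job, remember that the last tar file written is likely incomplete and should be downloaded again before the unpacking stage is run.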

symlib.download_packed_files(user, password, suite, halo_name, base_out_dir, target='halos', logging=True)

Downloads “packed” tar files containing the requested data for a given set of halos/suites. This function represents half of the symlib.download_files() command and may be useful to users whose download stops halfway through and would like to restart. Note that in such a case, the _last_ downloaded tar file is likely an incomplete download and is probably corrupted. It should be repeated.

Parameters
  • user – The username you would like to use to perform the download. See instructions on the Data Access page for obtaining a username/password.

  • password – The password associated with your username. See instructions on the Data Access page for obtaining a username/password.

  • suite_name (str or None) – The suite to download a halo from. This may either be the full name of a symlib suite or None. If None, symlib.download_packed_files() will be applied to every simulation suite with the given value of halo_name.

  • halo_name (str, int, or None) – The halo to download. This can either be an int giving the index of the halo in the suite, a string giving the name of the halo, or None. If None, all the halos in the given suite[s] will be downloaded.

  • base_out_dir – The directory where data is stored.

  • target="halos" – What type of data to download. Possible options are "halos" and "trees".

  • logging=True – True if you would like output printed telling the user what stage in the download they are at and False if you would like to turn off as much printing as possible.

symlib.unpack_files(suite, halo_name, base_out_dir, target='halos', logging=True)

Opens “packed” tar files containing the requested data for a given set of halos/suites. This function represents the second half of the symlib.download_files() command and may be useful to users whose download stops halfway through and would like to restart. Note that in such a case, the _last_ downloaded tar file is likely an incomplete download and is probably corrupted. It should be repeated.

Parameters
  • suite_name (str or None) – The suite to download a halo from. This may either be the full name of a symlib suite or None. If None, symlib.unpack_files() will be applied to every simulation suite with the given value of halo_name.

  • halo_name (str, int, or None) – The halo to download. This can either be an int giving the index of the halo in the suite, a string giving the name of the halo, or None. If None, all the halos in the given suite[s] will be downloaded.

  • base_out_dir – The directory where data is stored.

  • target="halos" – What type of data to download. Possible options are "halos" and "trees".

  • logging=True – True if you would like output printed telling the user what stage in the download they are at and False if you would like to turn off as much printing as possible.