Data Analysis Tutorial

Symphony data is stored in cutsom data formats and must be read with its analysis library, symlib. This page contains tutorials and example code showing how to use this library. A complete, detailed reference for this library can be found here.

Installation

As disucssed in the Data Access page, symlib can be downloaded with the pip tool.

$ pip install symlib -U

Reading in Subhalo Data

The first step to working with simulation data is loading the simulation parameters. This is a dictionary containing cosmological and numerical information. These can be looked up with symlib.simulation_parameters() by giving the directory of the halo as an argument.

import symlib

# sim_dir is the location of a single halo. We'll talk about
# auto-generating this later, but if you downloaded "Halo933"
# in the suite "SymphonyLMC" to the directory "data/", this
# would be "data/SymphonyLMC/Halo933"
sim_dir = "path/to/ExampleHalo"

params = symlib.simulation_parameters(sim_dir)
print(params)
{'flat': True, 'H0': 70.0, 'Om0': 0.286, 'Ob0': 0.049,
 'sigma8': 0.82, 'ns': 0.95, 'eps': 0.17, 'mp': 281981.0,
 'h100': 0.7}

The first six values are colossus-compatible cosmological parameters, the next two are numerical parameters ("eps" is the radius of each particle in comoving \(h^{-1}{\rm kpc}\), and mp is the mass of each particle in \(h^{-1}M_\odot\)). The last value is \(h_{100} = H_0/(100\ {\rm km/s})\), which is often convenient to have.

Next, we read in the subhalos with the function symlib.read_subhalos()

halos, histories = symlib.read_subhalos(sim_dir)

There are two return return values, halos and histories. In your code, you’ll probably want to abbreviate these as h and hist, or something similar, following the library code.

halos is a 2D array that represents how the host halo and all its subhalos evolve over time. The first index accesses different halos and second different times. It contains information like mass and position. halos[0,235] is the host halo at snapshot 235, the last snapshot in the simulation. halos[3, 100] is the third largest subhalo, including disrupted subhalos, at snapshot 100. Subhalos are ordered according to the largest mass they ever had, \(M_{\rm peak}\). Halos stay at the same index across their lifetimes.

histories contains summary information about a halo’s full history, including \(M_{\rm peak}\) and when that subhalo fell into the host. Its length and ordering are the same as the first index of halos.

halos is a numpy structured array has type symlib.SUBHALO_DTYPE, and histories is a structured array with type symlib.HISTORY_DTYPE. Structured arrays are arrays that have different fields which can be accessed with strings. For example, halos[3,100]["mvir"] and halos["mvir"][3,100] both return the mass, \(M_{\rm vir}\) of the third most massive halo at snapshot 100. The non-string indices obey normal numpy indexing rules, so you can use slicing, boolean indexing, axis removal and whatever other tricks you use with normal numpy arrays.

The full set of fields in halos is described in the symlib.SUBHALO_DTYPE documentation. In this tutorial we will only use:

  • "x" - Position

  • "v" - Velocity

  • "mvir" - Mass

  • "rvir" - Radius

  • "ok" - True if the halo exists at the given snapshot, False otherwise.

Fields in histories will be explained as needed, but can be found in full in the symlib.HISTORY_DTYPE documentation.

Example Subhalo Analysis: Plotting Postions

Our first step with analyzing any simulation data should be to look at it qualitatively. We’ll start by looking at the positions of the major subhalos around our central halo at the last snapshot of the simulation. We will plot the central halo in one color and the subhalos in another. We’ll also need to skip all the subhalos that were destroyed before the end of the simulation.

We’ll also use a utility function, symlib.plot_circle() to make the circles.

import symlib
import matplotlib.pyplot as plt
fig, ax = plt.subplots()

sim_dir = "path/to/ExampleHalo"
h, hist = symlib.read_subhalos(sim_dir)

host = h[0,-1] # First halo, last snapshot.
symlib.plot_circle(ax, host["x"][0], host["x"][1],
                   host["rvir"], c="tab:red")

for i in range(1, len(h)):
    sub = h[i,-1] # i-th halo, last snapshot.
    if not sub["ok"]: continue
    symlib.plot_circle(
        ax, sub["x"][0], sub["x"][1],
        sub["rvir"], c="tab:blue"
    )

With a little bit of additional pyplot work that we’ve ellided here, this gives us the following. The full script used to create this image, including the omitted pyplot code is shown in examples/positions.py.

_images/positions.png

From this, we can see that our host halo is surrounded by a swarm of subhalos. Bigger subhalos are rarer and generally closer to the center of the host. Some subhalos are outside the radius of the host. These “splashback subhalos” had been inside the host in the past but have temporarily orbited outside of it. They are included in the symlink catalogs by default.

Let’s review the concepts that went into creating this image:

  • We read in simulation parameters and halo information with symlib.simulation_parameters() and symlib.read_subhalos().

  • We got the host halo at the last snapshot with halos[0,-1] and the subhalos with halos[i,-1].

  • We got a vector representing the postion of the host by accessing host["x"] and the radius with host["rvir"] and were able to get similar quantities for subhalos.

  • We needed to check sub["ok"] to make sure that the halo still existed at the snapshot we were interested in.

Here, the central halo at index 0 is red and all is subhalos are blue. We used a built-in utility function called plot_circle and needed to skip over some subhalos which disrupted before the final snapshot.

Example exercise

In the histories array, there is a field called merger_snap that gives the snapshot when a subhalo first fell into the host. Try coloring subhalos that fell in from the left side of the halo (\(x_{\rm infall} < 0\)) differently from ones that fell in from the right.

Example Analysis: Mass Growth

Now, we’ll try analysis that’s a bit more quantitative. We’ll look at the growth of subhalos over time. To do this, we’ll need to get the scale factors, \(a(z)\), for each snapshot with symlib.scale_factors(). We’ll also use one of the fields in histories, "merger_snap" which is the snapshot when the subhalo first fell into the host. We’ll use it to plot times before infall as dashed lines and times afterwards as solid lines.

sim_dir = "path/to/ExampleHalo"

scale = symlib.scale_factors(sim_dir)
h, hist = symlib.read_subhalos(sim_dir)

snaps = np.arange(len(h[0])) # Snapshots #s, for making cuts.

fig, ax = plt.subplots()
colors = ["k", "tab:red", "tab:orange", "tab:green",
          "tab:blue", "tab:purple"]
for i in range(6):
    ok = h[i,:]["ok"] # Snapshots where the halo exists
    if i == 0:
        # Plot the host halo
        plt.plot(scale[ok], h[i,ok]["mvir"], c=colors[i])
    else:
        # Plot the full history of the subhalo as a dahsed line
        plt.plot(scale[ok], h[i,ok]["mvir"], "--", c=colors[i])
        # Plot its history inside the host halo as a solid line
        is_sub = (snaps >= hist["merger_snap"][i]) & ok
        plt.plot(scale[is_sub], h[i,is_sub]["mvir"], c=colors[i])

With a little bit of additional pyplot work, this gives us the following. The full script used to create this image, including the omitted pyplot code is shown in examples/mah.py.

_images/mah.png

Here we see that our subhalos spend most of their time in the simulation building up mass prior to falling in. The earlier-infalling halos shown here don’t last for very long: they disrupt in a few snapshots! Others, like the green subhalo survive much longer.

Let’s review the concepts that went into creating this image:

  • We needed to read in scale factors with symlib.scale_factors() to figure out when each snapshot occured.

  • We were able to figure out the snapshot when a subhalo fell into the host with histories’s "merger_snap" field.

  • The indices of structured arrays work just like normal numpy arrays, so we were able to select parts of them with the boolean arrays ok and is_sub.

Example exercise:

You might have noticed that subhalos start losing mass before they actually start falling into the host (look at the green curve in particular). Using logic similar to the above plot, try figuring out how far away subhalos are on average from a host when they reach their peak mass.

Example Analysis: The Subhalo Mass Functions

Lastly, let’s try some more rigorous statistical analysis. We’re going to measure the subhalo mass function of the entire Milky Way suite. We’ll look at \(N(>M_{\rm peak})\), the average number of subhalos per host halo whose maximum mass was larger than \(M_{\rm peak}\). To do this, we’ll need to access the "mpeak" field of the histories array.

More importantly, to get good statistics we’ll need to loop over all the host halos in the Milky Way suite, SymphonyMilkyWay. One way to do this would be to manually store the names of all the halo directories, but instead we’ll use library functions to do it. First, we’ll count the number of halos in the Milky Way-mass suite with symlib.n_hosts(). Then, we can get directory names symlib.get_host_directory(), which takes the base directory, suite name, and the index of the halo you want to read. Together this lets you loop over halo directories.

Constructing a mass function has a bit more code overhead than the earlier examples: the important part is how the loop over files works.

base_dir = "path/to/base/dir"
suite_name = "SymphonyMilkyWay"

# Mass function bins and empty histogram.
log_m_min, log_m_max, n_bin = 8, 12, 200
bins = np.logspace(log_m_min, log_m_max, n_bin+1)
N_vir = np.zeros(n_bin)

n_hosts = symlib.n_hosts(suite_name)
for i_host in range(n_hosts):
    sim_dir = symlib.get_host_directory(base_dir, suite_name, i_host)
    h, hist = symlib.read_subhalos(sim_dir)

    # Only count objects within R_vir
    host_rvir = h[0,-1]["rvir"]
    sub_x = h[:,-1]["x"]
    r = np.sqrt(np.sum(sub_x**2, axis=1))
    ok = h["ok"][:,-1] & (r < host_rvir)

    # Put in bins and add to cumulative histogram
    n_vir, _ = np.histogram(hist["mpeak"][ok][1:], bins=bins)
    N_vir += np.cumsum(n_vir[::-1])[::-1]/n_hosts

plt.plot(bins[:-1], N_vir, "k")

With a little bit of additional pyplot work, this gives us the following. The full script used to create this image, including the omitted pyplot code is shown in examples/mass_func.py.

_images/mass_func.png

Here, we can see the classic form of the subhalo mass function. At smaller subhalo masses, decreasing the subhalo mass by a increses the number of subhalos by \(N \lesssim M^{-1}\), and there’s an exponential cutoff as the subhalos get close to the host mass.

Let’s review the concepts that went into creating this image:

  • We needed to use symlib.n_hosts() to find the number of host halos in our target suite

  • We needed to use symlib.get_host_directory() to find the names of the directories in the host halo.

  • We needed the "mpeak" field of histories

  • We needed to do a little bit of array magic with numpy arrays, although this could also have been done in a less concise way.

Practice:

Try adding a curve for the mass function of surviving “splashback” subhalos to this plot.