Reading Data

While I prefer to use NetCDF files for storing data, there is also support for reading and interfacing with the raw model outputs. I will assume you are doing this from a Python script. The process is similar for GITM and SAMI3, though the details differ slightly.

Both methods assume you are running your script either from the root directory of this repository or after following the instructions in Post-Installation.

GITM

Two (supported) options exist for reading raw GITM data.

Numpy Arrays

We can read any number (and format) of GITM outputs into a dictionary of numpy arrays. This is not especially fast, memory efficient, or user-friendly, but hey, better than nothing! The code used here has been adapted from the GITM readers in the develop branch of the Aetherpy repository.

Given just a directory, the utility_programs.read_routines.GITM.read_bin_to_nparrays function will read all the GITM outputs and return a dictionary with the data. The keys of the dictionary are gitmdtimes, corresponding to the time of each model output; gitmbins, corresponding to the actual data; gitmgrid, which contains the grid information; and optionally gitmvars, which contains the names of the columns of data output.
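To make that layout concrete, the sketch below builds a stand-in dictionary with the same four keys from synthetic numpy arrays. The shapes and units are assumptions, not taken from the reader: gitmbins is treated as (time, lon, lat, alt) for a single variable, matching the indexing in the keogram example, and gitmgrid as a dict of 1-D coordinate arrays in degrees and km. make_fake_gitm_dict and nearest_index are hypothetical helpers, not part of utility_programs.

```python
from datetime import datetime, timedelta

import numpy as np


def make_fake_gitm_dict(ntimes=4, nlon=36, nlat=18, nalt=50):
    """Synthetic dict with the documented keys (shapes/units are assumptions)."""
    start = datetime(2011, 5, 16)
    return {
        # one datetime per model output
        "gitmdtimes": [start + timedelta(minutes=5 * i) for i in range(ntimes)],
        # assumed layout for a single variable: (time, lon, lat, alt)
        "gitmbins": np.random.default_rng(0).random((ntimes, nlon, nlat, nalt)),
        # 1-D coordinate arrays (degrees and km, assumed)
        "gitmgrid": {
            "longitude": np.linspace(0, 350, nlon),
            "latitude": np.linspace(-85, 85, nlat),
            "altitude": np.linspace(100, 590, nalt),
        },
        "gitmvars": ["Rho"],
    }


def nearest_index(coord, value):
    """Index of the grid point closest to `value` in a 1-D coordinate array."""
    return int(np.argmin(np.abs(np.asarray(coord) - value)))


data = make_fake_gitm_dict()
lonidx = nearest_index(data["gitmgrid"]["longitude"], 120)
altidx = nearest_index(data["gitmgrid"]["altitude"], 250)
keogram = data["gitmbins"][:, lonidx, :, altidx]  # shape (time, lat)
print(keogram.shape)
```

The nearest-index lookup is the same argmin-of-absolute-difference trick the real example uses to pick a longitude and altitude.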

For example, to read the Rho outputs from time steps 250-650 and plot a keogram (latitude vs. time) at 250 km altitude and 120 degrees longitude, we can do the following:

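import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

from utility_programs.read_routines.GITM import read_bin_to_nparrays

gitmdata = read_bin_to_nparrays('path/to/GITM/output',
                                gitm_file_pattern='3DALL*.bin',
                                cols=['Rho'],
                                start_idx=250, end_idx=650)

# Indices of the grid points nearest 120 deg longitude and 250 km altitude
lonidx = np.argmin(np.abs(gitmdata['gitmgrid']['longitude'] - 120))
altidx = np.argmin(np.abs(gitmdata['gitmgrid']['altitude'] - 250))

# imshow needs numeric axis limits, so convert the datetimes first
tstart = mdates.date2num(gitmdata['gitmdtimes'][0])
tend = mdates.date2num(gitmdata['gitmdtimes'][-1])

# Transpose to (lat, time); origin='lower' puts the first latitude at the bottom
plt.imshow(gitmdata['gitmbins'][:, lonidx, :, altidx].T, aspect='auto',
           origin='lower',
           extent=[tstart, tend,
                   gitmdata['gitmgrid']['latitude'][0],
                   gitmdata['gitmgrid']['latitude'][-1]])
plt.gca().xaxis_date()  # label the x-axis as dates
plt.ylabel('Latitude')
plt.colorbar(label='Rho')
plt.title('GITM Rho at 250 km and 120 degrees longitude')

plt.show()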

This is a little clunky and highlights why we prefer NetCDF files. The numpy arrays and dictionaries are good for starting out, but do not scale well to large datasets or more complicated analysis.

NetCDF Files

In the backend, this code base uses Xarray for all handling of NetCDF files. Reading from binary into an Xarray Dataset works much like reading into numpy arrays. There are separate functions for reading a single file and multiple files. To accomplish the same as above, we just need to run:

from utility_programs.read_routines import GITM

# For a single file (one time)
gitmdata = GITM.read_bin_to_xarray('path/to/GITM/output/file.bin',
                                    cols='Rho')

# And for multiple files:
gitm_files = GITM.read_multiple_bins_to_xarray('path/to/GITM/output',
                                                cols='Rho',
                                                start_idx=250, end_idx=650)

gitm_files.Rho.sel(lon=120, alt=250, method='nearest').plot(x='time')

Approximately the same amount of work to read the files, but SO much easier to plot & analyze!

There is also a function that automatically reads all of the GITM data in a directory. It prefers reading NetCDF files into xarray, but it can also read binaries into numpy arrays or into xarray. This is the recommended method for reading GITM data.
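The "NetCDF first, binaries as a fallback" preference can be pictured with a small sketch. This is not auto_read's actual implementation: choose_gitm_files is a hypothetical helper, and the '*.nc' and '*.bin' patterns and the 'netcdf'/'bin' labels are assumptions about file naming.

```python
import tempfile
from pathlib import Path


def choose_gitm_files(gitm_dir, file_type=None):
    """Hypothetical sketch: prefer NetCDF files, fall back on GITM binaries.

    Returns (kind, sorted list of paths). Passing `file_type` forces one kind.
    """
    gitm_dir = Path(gitm_dir)
    candidates = {
        "netcdf": sorted(gitm_dir.glob("*.nc")),
        "bin": sorted(gitm_dir.glob("*.bin")),
    }
    if file_type is not None:
        return file_type, candidates[file_type]
    for kind in ("netcdf", "bin"):  # preference order
        if candidates[kind]:
            return kind, candidates[kind]
    raise FileNotFoundError(f"No GITM NetCDF or binary files found in {gitm_dir}")


# Usage: a directory with only binaries, then the same directory after
# a NetCDF file has been added.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("3DALL_t110516_000000.bin", "3DALL_t110516_000500.bin"):
        (Path(tmp) / name).touch()
    kind_bin, files_bin = choose_gitm_files(tmp)
    (Path(tmp) / "GITM_combined.nc").touch()
    kind_nc, files_nc = choose_gitm_files(tmp)

print(kind_bin, len(files_bin))
print(kind_nc, len(files_nc))
```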

utility_programs.read_routines.GITM.auto_read(gitm_dir, single_file=False, start_dtime=None, start_idx=None, end_dtime=None, end_idx=None, cols='all', progress_bar=True, drop_ghost_cells=True, file_type=None, return_xarray=True, force_dict=False, parallel=True, engine='h5netcdf', use_dask=False)

Automatically reads in a directory of GITM files.

Parameters:

gitm_dir (str) – Directory of GITM files.
single_file (bool, optional) – Whether to read in a single file. Defaults to False.
start_dtime (datetime, optional) – Start time of the data you want. Defaults to None.
start_idx (int, optional) – Start index of the data you want. Defaults to None.
end_dtime (datetime, optional) – End time of the data you want. Defaults to None.
end_idx (int, optional) – End index of the data you want. Defaults to None.
cols (list-like or str, optional) – List of columns you want to read in. Defaults to ‘all’.
progress_bar (bool, optional) – Whether to show a progress bar. Defaults to True. Requires tqdm.
drop_ghost_cells (bool, optional) – Whether to drop ghost cells. Defaults to True.
file_type (str, optional) – File type of the data you want to read in. Defaults to None.
return_xarray (bool, optional) – Whether to return an xarray. Defaults to True.
force_dict (bool, optional) – Whether to force a dictionary return. Defaults to False.
parallel (bool, optional) – Whether to read in files in parallel. Defaults to True. This will use Dask, which can get hairy. If you’re having issues, try setting this to False. Needs dask and dask.distributed.
engine (str, optional) – The engine to use for reading in the data. Defaults to ‘h5netcdf’.
use_dask (bool, optional) – Whether to use Dask for reading in the data. Defaults to False.

Returns:

The data read in from the GITM files.

Return type:

xarray.Dataset or dict
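The start and end of the data can be given either as datetimes or as indices. Below is a minimal sketch of how such bounds might be turned into a slice over a sorted list of output times. time_window is a hypothetical helper; treating end_idx as exclusive (like a Python slice) while datetime bounds are inclusive, and rejecting both kinds of bound at once, are assumptions rather than documented behavior of auto_read.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def time_window(times, start_dtime=None, start_idx=None,
                end_dtime=None, end_idx=None):
    """Hypothetical sketch: turn datetime or index bounds into a slice.

    Assumes `times` is sorted. Datetime bounds are inclusive; end_idx is
    exclusive, like a Python slice (an assumption).
    """
    if start_dtime is not None and start_idx is not None:
        raise ValueError("Give start_dtime or start_idx, not both")
    if end_dtime is not None and end_idx is not None:
        raise ValueError("Give end_dtime or end_idx, not both")
    start = bisect_left(times, start_dtime) if start_dtime is not None else start_idx
    stop = bisect_right(times, end_dtime) if end_dtime is not None else end_idx
    return slice(start, stop)


# Twelve outputs, five minutes apart
times = [datetime(2011, 5, 16) + timedelta(minutes=5 * i) for i in range(12)]
by_time = times[time_window(times, start_dtime=datetime(2011, 5, 16, 0, 10),
                            end_dtime=datetime(2011, 5, 16, 0, 30))]
by_index = times[time_window(times, start_idx=2, end_idx=7)]
print(len(by_time), by_time == by_index)
```

With no bounds at all the slice is slice(None, None), so the whole run is kept.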

SAMI3

This is very similar to reading GITM data, but the format of the non-xarray reads is quite different. SAMI3 runs on a magnetic (field-aligned) grid, so the data have to be handled a little differently. For example, we cannot simply make a plot at a single altitude, because altitude is not constant along any one grid dimension. Instead, we can make a plot at a single magnetic longitude, or over a range of altitudes.
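To see why a fixed grid index is not a fixed altitude, here is a sketch using an idealized dipole, where a field line with shell parameter L satisfies r = L cos²(mlat) in Earth radii. The grid sizes, L range, latitude spacing, and the (nlt, nf, nz) ordering of the altitude array are assumptions for illustration, not SAMI3's actual grid; dipole_altitudes is a hypothetical helper.

```python
import numpy as np

RE_KM = 6371.0  # mean Earth radius, km


def dipole_altitudes(l_shells, mlats_deg):
    """Altitude (km) along dipole field lines, from r = L * cos^2(mlat) in Earth radii."""
    mlat = np.deg2rad(np.asarray(mlats_deg))
    r = np.asarray(l_shells)[:, None] * np.cos(mlat)[None, :] ** 2
    return (r - 1.0) * RE_KM  # shape (nf, nz)


# Hypothetical grid: 4 magnetic longitudes, 30 field lines, 61 points per line
nlt, nf, nz = 4, 30, 61
l_shells = np.linspace(1.02, 1.5, nf)
mlats = np.linspace(-30, 30, nz)  # 1-degree spacing along each line
alt = np.broadcast_to(dipole_altitudes(l_shells, mlats), (nlt, nf, nz))

# A fixed nz index is NOT a fixed altitude: at the magnetic equator
# (nz index 30) the altitude changes from field line to field line.
apex_alts = alt[0, :, nz // 2]
print(f"{apex_alts.min():.0f} to {apex_alts.max():.0f} km")

# Selecting an altitude band therefore needs a mask over (nf, nz)
in_band = (alt[0] >= 200) & (alt[0] <= 300)
print(in_band.sum(), "grid points between 200 and 300 km at this longitude")
```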

To read in SAMI3 data to numpy arrays, we need to specify both the path to the data, as well as the start time of the simulation. All other parameters are automatically read from the settings files.

from utility_programs.read_routines import SAMI
from datetime import datetime

sami_data, times = SAMI.read_to_nparray('/path/to/sami/data',
                                        dtime_sim_start=datetime(2011, 5, 16),
                                        cols='edens')

This sami_data is a Python dictionary with keys ['grid', 'data'], where sami_data['data'] is indexed as [varname][nt, nlt, nf, nz].
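As an illustration of that indexing, the sketch below builds a stand-in dictionary with the same two keys and the [varname][nt, nlt, nf, nz] order, then pulls out a few common slices. The sizes are made up and the grid contents are left empty because they are not described here; this is not output from SAMI.read_to_nparray.

```python
import numpy as np

# Hypothetical stand-in for the dict returned by SAMI.read_to_nparray.
# Sizes are made up; only the key layout and index order follow the text.
nt, nlt, nf, nz = 6, 4, 30, 61
rng = np.random.default_rng(42)
sami_data = {
    "grid": {},  # grid contents omitted (not described here)
    "data": {"edens": rng.random((nt, nlt, nf, nz))},
}

edens = sami_data["data"]["edens"]        # shape (nt, nlt, nf, nz)
one_lon = edens[:, 2]                     # all times, one magnetic longitude -> (nt, nf, nz)
one_line = edens[-1, 2, 10, :]            # last time, one field line -> (nz,)
max_per_time = edens.max(axis=(1, 2, 3))  # peak value at each time -> (nt,)
print(one_lon.shape, one_line.shape, max_per_time.shape)
```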

It’s complicated, so there’s also a script to read the data to xarray, which is much easier to use.

from utility_programs.read_routines import SAMI
from datetime import datetime

sami_ds = SAMI.read_raw_to_xarray('/path/to/sami/data',
                                  dtime_sim_start=datetime(2011, 5, 16),
                                  cols='edens')

And this sami_ds is slightly easier to deal with! It is indexed with [nt, nlt, nf, nz], and has the same variables as the numpy array read, but also has the grid information as coordinates.

Of course, we also have an auto_read function:


utility_programs.read_routines.SAMI.auto_read(sami_dir, cols='all', split_by_time=False, split_by_var=False, whole_run=False, return_xarray=True, filetype='SAMI-REGRID', force_nparrays=False, dtime_sim_start=None, parallel=True, start_dtime=None, start_idx=None, end_dtime=None, end_idx=None, hrs_before_storm_start=None, hrs_after_storm_start=None, dtime_storm_start=None, progress_bar=False, use_dask=False, engine='h5netcdf', skip_time_check=False)

Automatically reads in SAMI data and returns it in a format of your choice.

Prefers to read and return xarray Datasets, but can also read and return numpy arrays.
Whole-run files are preferred; otherwise it falls back on files split by time, then by variable.
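That preference order can be sketched as a small selection function. choose_sami_layout and the layout names are hypothetical, and the way the split_by_time/split_by_var/whole_run flags promote a layout is an interpretation of the parameter descriptions, not auto_read's actual code.

```python
DEFAULT_ORDER = ("whole_run", "split_by_time", "split_by_var")


def choose_sami_layout(available, whole_run=False, split_by_time=False,
                       split_by_var=False):
    """Hypothetical sketch of the file-layout preference.

    `available` is the set of layouts found on disk. Layouts whose flag is
    set are tried first; otherwise whole-run files win, then time-split,
    then variable-split files.
    """
    flags = {"whole_run": whole_run, "split_by_time": split_by_time,
             "split_by_var": split_by_var}
    preferred = [name for name in DEFAULT_ORDER if flags[name]]
    for name in preferred + [n for n in DEFAULT_ORDER if n not in preferred]:
        if name in available:
            return name
    # No converted files found; a reader would presumably use the raw outputs
    return None


print(choose_sami_layout({"split_by_time", "split_by_var"}))
print(choose_sami_layout({"whole_run", "split_by_var"}, split_by_var=True))
print(choose_sami_layout(set()))
```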

Parameters:

sami_dir (str or path-like) – Path to the directory containing the SAMI data.
cols (str or list-like, optional) – Variables to return. Defaults to ‘all’.
split_by_time (bool, optional) – If files are output by time (and whether to prefer those files). Defaults to False.
split_by_var (bool, optional) – If files are output by variable (and whether to prefer those files). Defaults to False.
whole_run (bool, optional) – If the whole run is in one file. Defaults to False.
return_xarray (bool, optional) – Return xarray dataset? Defaults to True.
force_nparrays (bool, optional) – Force program to return dicts of numpy arrays. Defaults to False.
dtime_sim_start (datetime, optional) – Datetime of the start of the simulation. Defaults to None. Required if NetCDF files haven’t been made. If NetCDF files have been made, you had the option to add this as an attribute to the file.
parallel (bool, optional) – Force parallel reading of files. NetCDF files are read weirdly, so this might be buggy. Defaults to True.
start_dtime (datetime, optional) – Datetime to start reading data. Defaults to None.
start_idx (int, optional) – Index of the first time to read. Defaults to None.
end_dtime (datetime, optional) – Datetime to stop reading data. Defaults to None.
end_idx (int, optional) – Index of the last time to read. Defaults to None.
hrs_before_storm_start (int, optional) – Hours before the storm start to read data. Defaults to None. (dtime_storm_start must be set)
hrs_after_storm_start (int, optional) – Hours after the storm start to read data. Defaults to None. (dtime_storm_start must be set)
dtime_storm_start (datetime, optional) – Datetime of the storm start. Defaults to None. (hrs_before_storm_start and hrs_after_storm_start must be set)
progress_bar (bool, optional) – Show progress bar? Defaults to False. Requires tqdm.

Returns:

xarray.Dataset: The SAMI data (if return_xarray is True).
dict: Dictionary of numpy arrays of the SAMI data (if force_nparrays is True).

Return type:

xarray.Dataset or dict
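The storm-relative parameters amount to a time window around dtime_storm_start. A minimal sketch of that arithmetic follows; storm_window is a hypothetical helper, requiring all three values mirrors the "must be set" notes in the parameter list, and the onset time used here is arbitrary rather than a documented event.

```python
from datetime import datetime, timedelta


def storm_window(dtime_storm_start, hrs_before_storm_start, hrs_after_storm_start):
    """Hypothetical helper: (start, end) datetimes around a storm onset."""
    if None in (dtime_storm_start, hrs_before_storm_start, hrs_after_storm_start):
        raise ValueError("dtime_storm_start, hrs_before_storm_start and "
                         "hrs_after_storm_start must all be set")
    start = dtime_storm_start - timedelta(hours=hrs_before_storm_start)
    end = dtime_storm_start + timedelta(hours=hrs_after_storm_start)
    return start, end


# Arbitrary onset: 12 hours of pre-storm data and 24 hours after onset
start, end = storm_window(datetime(2011, 5, 17, 6), 12, 24)
print(start, end)  # 2011-05-16 18:00:00 2011-05-18 06:00:00
```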