API REFERENCE

Processing Model Results

To post-process model results, use PostProcessModelResults.py. The help information can be accessed by running:

python PostProcessModelResults.py --help

More functionality is available through RegridSami…

RegridSami

Post-process SAMI data so it can be read into xarray.

  • This is an entry point to the interpolation functions in the utility_programs folder (utility_programs/interpolate_outputs.py).

  • These functions can be called from the command line or from other scripts.

  • Interpolation can produce artifacts, especially if the grid is too coarse. If you see unexpected artifacts in the output, try increasing the grid resolution, or run the interpolations manually with utility_programs/interpolate_outputs.py.

RegridSami.main(sami_data_path, out_path=None, save_weights=True, cols='all', dtime_sim_start=None, lat_step=1, lon_step=4, alt_step=25, minmax_alt=[100, 2600], out_coord_file=None, sami_mintime=0, run_name=None, skip_time_check=False, progress_bar=True, num_workers=16)

Interpolate SAMI3 outputs to a new grid.

Parameters:
  • sami_data_path (str or os.PathLike) – Path to SAMI3 output files.

  • out_path (str or os.PathLike, optional) – Location to save output files. Defaults to sami_data_path.

  • save_weights (bool, optional) – Whether to save the Delaunay triangulation (interpolation weights). Defaults to True.

  • cols (str or list-like, optional) – Columns to read data from. Can be string or list-like. Defaults to ‘all’.

  • dtime_sim_start (datetime, optional) – Datetime of simulation start. Required to read raw SAMI outputs. Defaults to None.

  • lat_step (int, optional) – Latitude step of the output grid (deg). Defaults to 1.

  • lon_step (int, optional) – Longitude step of the output grid (deg). Defaults to 4.

  • alt_step (int, optional) – Altitude step of the output grid (km). Defaults to 25.

  • minmax_alt (list, optional) – Minimum and maximum altitude to output (km). Defaults to [100, 2600].

  • out_coord_file (str or os.PathLike, optional) – File of output coordinates to use instead of the default grid. Defaults to None (no coordinate file). ** MUST HAVE “time, lat, lon, alt” COLUMNS ** Use this to specify a user-defined grid or to interpolate to a set of satellite coordinates.

  • sami_mintime (int, optional) – Minimum time to read in SAMI data. Defaults to 0. Use this to skip the first few hours of SAMI data and save time & memory.

  • run_name (str, optional) – Name of the run, used to name the output file. run_name=’test’ will generate a file called ‘test_SAMI_REGRID.nc’. Defaults to None.

  • skip_time_check (bool, optional) – Skip checking if the time range is valid. SAMI can sometimes output fake timesteps, or be configured to start outputting data after several hours. This will skip checking that the times are valid. Defaults to False.

  • progress_bar (bool, optional) – Show progress bar? Defaults to True.

  • num_workers (int, optional) – Number of workers to use when interpolating. Defaults to 16. (As a rough guide, 16 workers use about 1.3 GB of RAM per 10 SAMI time steps at 80/72/256 resolution.)

Raises:

ValueError – If out_coord_file does not have the required variables.

Return type:

None
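To get a feel for how lat_step, lon_step, alt_step, and minmax_alt set the size of the output, here is a minimal NumPy sketch that builds the corresponding 1D axes and flattens them into a 3xN point array. The helper names are hypothetical, and the endpoint handling (inclusive latitude and altitude ranges, longitudes from 0 up to but not including 360) is an assumption rather than a description of what RegridSami does internally.

```python
import numpy as np


def build_output_axes(lat_step=1, lon_step=4, alt_step=25, minmax_alt=(100, 2600)):
    """Hypothetical helper: 1D axes for a regular lat/lon/alt output grid.

    Assumes latitudes span -90..90 and altitudes span minmax_alt inclusively,
    while longitudes run from 0 up to (but not including) 360.
    """
    lats = np.arange(-90, 90 + lat_step, lat_step)
    lons = np.arange(0, 360, lon_step)
    alts = np.arange(minmax_alt[0], minmax_alt[1] + alt_step, alt_step)
    return lats, lons, alts


def axes_to_points(lats, lons, alts):
    """Hypothetical helper: flatten the 3D grid into a (3, N) array of lat, lon, alt rows."""
    lat_g, lon_g, alt_g = np.meshgrid(lats, lons, alts, indexing="ij")
    return np.vstack([lat_g.ravel(), lon_g.ravel(), alt_g.ravel()])


lats, lons, alts = build_output_axes()
print(len(lats), len(lons), len(alts))          # 181 90 101
print(axes_to_points(lats, lons, alts).shape)   # (3, 1645290)
```

With the defaults that is about 1.6 million points per time step and variable, which is worth keeping in mind alongside num_workers when choosing a finer grid.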

Under the hood, all interpolations are performed with scipy.interpolate.LinearNDInterpolator, which interpolates over a Delaunay triangulation computed by Qhull.
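Since the source grid does not change between time steps, the triangulation only needs to be built once and can then be reused for every time step and variable, which is presumably why save_weights (and save_delauney below) exist. A minimal sketch of that pattern with SciPy, on synthetic points rather than real SAMI3 output: scipy.spatial.Delaunay is computed once, and scipy.interpolate.LinearNDInterpolator accepts that precomputed triangulation in place of raw points. This illustrates the idea and is not the package's own code.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator
from scipy.spatial import Delaunay

rng = np.random.default_rng(42)

# Stand-in for a fixed model grid: random points in the unit cube plus its eight
# corners, so the convex hull (where interpolation is defined) is the whole cube.
corners = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], dtype=float)
src_pts = np.vstack([corners, rng.uniform(0, 1, size=(300, 3))])

# Output points to interpolate to (e.g., a regular grid or a satellite track).
out_pts = rng.uniform(0.1, 0.9, size=(50, 3))

# Expensive step, done once: Qhull triangulation of the source points.
tri = Delaunay(src_pts)


def field(points, t):
    """Synthetic, time-dependent linear field used as fake model data."""
    return (t + 1) * (2 * points[:, 0] - points[:, 1] + 3 * points[:, 2])


# Cheap step, repeated per time step: reuse the same triangulation.
results = []
for t in range(3):
    interp = LinearNDInterpolator(tri, field(src_pts, t))
    results.append(interp(out_pts))
results = np.stack(results)  # shape (ntime, n_out_points)
```

Because linear interpolation reproduces a linear field exactly inside the hull, the synthetic field makes it easy to confirm the reused triangulation gives correct answers at every time step.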

For more information (and to unlock even more functionality), use the utility_programs.interpolate_outputs module:

utility_programs.interpolate_outputs

Code to interpolate any model data to either:

  • a user-defined grid, or

  • a collection of satellite points.

It’s easiest to interface with the interpolate function, or use the main function from the command line…

Output coordinates can be read from a CDF/CSV file or defined as a grid. The default output grid is 5 deg lon x 2 deg lat x 50 km alt.
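The exact layout the coordinate reader accepts (time format, extra columns, CDF variable names) is not spelled out here, so the following is only a standard-library sketch of preparing a CSV with the time, lat, lon, alt columns that out_coord_file requires, plus a check that raises ValueError when one is missing. ISO 8601 timestamps are an assumption, and write_coord_csv / read_coord_csv are hypothetical helpers, not part of utility_programs.

```python
import csv
import io
from datetime import datetime

REQUIRED_COLUMNS = ("time", "lat", "lon", "alt")


def write_coord_csv(rows, fh):
    """Hypothetical helper: write points as CSV with the required column names."""
    writer = csv.DictWriter(fh, fieldnames=list(REQUIRED_COLUMNS))
    writer.writeheader()
    for row in rows:
        # Assumption: times are stored as ISO 8601 strings.
        writer.writerow({**row, "time": row["time"].isoformat()})


def read_coord_csv(fh):
    """Hypothetical helper: read points back, raising ValueError if a column is missing."""
    reader = csv.DictReader(fh)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Coordinate file is missing columns: {missing}")
    return [
        {
            "time": datetime.fromisoformat(r["time"]),
            "lat": float(r["lat"]),
            "lon": float(r["lon"]),
            "alt": float(r["alt"]),
        }
        for r in reader
    ]


buf = io.StringIO(newline="")
write_coord_csv(
    [
        {"time": datetime(2011, 5, 20, 12, 0), "lat": 10.0, "lon": 240.0, "alt": 450.0},
        {"time": datetime(2011, 5, 20, 12, 1), "lat": 13.8, "lon": 241.2, "alt": 452.5},
    ],
    buf,
)
buf.seek(0)
points = read_coord_csv(buf)
```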

utility_programs.interpolate_outputs.do_interpolations(sami_data_path=None, dtime_sim_start=None, skip_time_check=False, out_lat_lon_alt=None, out_path=None, out_runname=None, sat_times=None, cols='all', show_progress=False, gitm_data_path=None, gitm_output_each_var=True, gitm_output_each_time=False, is_grid=False, sami_mintime=0, save_delauney=False, max_alt=None, engine='h5netcdf', return_ds_too=False, num_workers=16)

Interpolate SAMI (GITM functionality not fully tested) to either a standard geographic grid or to user-defined points.

Parameters:
  • sami_data_path (str, optional) – Path to sami data.

  • dtime_sim_start (str or datetime, optional) – Start time of simulation. Required to read SAMI data. Can be str (YYYYMMDD) or a pre-computed datetime object.

  • skip_time_check (bool, optional) – If True, skip the check to make sure the SAMI times are self-consistent (not always true…).

  • out_lat_lon_alt (numpy.array, optional) – Coordinates to interpolate to. Must have dimensions 3xN, where N is the number of points. Will be converted to Cartesian coordinates. Lon and Lat in degrees, Alt in km above the Earth's surface.

  • out_path (str or os.PathLike, optional) – Path to save regridded data to. Defaults to the model data path (sami_data_path or gitm_data_path).

  • out_runname (str, optional) – Descriptive name for output file. Saved as: out_path/{out_runname + “SAMI_REGRID.nc”}.

  • sat_times (list, optional) – List of times to interpolate to. Must be a list of (python or pandas) datetime objects.

  • cols (str or list-like, optional) – Which variables to interpolate. Default is ‘all’. Can be any str from utility_programs.read_routines.SAMI.sami_og_vars.

  • show_progress (bool, optional) – Show progress bars? Default: False. Requires tqdm.

  • gitm_data_path (str, optional) – Path to gitm data.

  • gitm_output_each_var (bool, optional) – If True, output each variable to a separate file. Requires looping through the GITM output files multiple times. If False, gitm_output_each_time must be True.

  • gitm_output_each_time (bool, optional) – If True, output each time to a separate file. Runs faster than gitm_output_each_var when processing all variables, but includes variables the user may not need.

  • is_grid (bool, optional) – Whether to interpolate to a standard geographic grid or to user-defined points.

  • sami_mintime (int, optional) – Minimum time to interpolate SAMI data to. Default is 0.

  • save_delauney (bool, optional) – Option to save/read Delaunay weights from file, since they take a while to compute. The weight file is saved to sami_data_path, with the max_alt included in its name. Setting this to True allows the program to read weights as well as save them.

  • max_alt (int, optional) – Maximum altitude of the input data grid to feed into the Delaunay calculations. Useful to avoid recalculating weights when the requested interpolation differs from one already done.

  • engine (str, optional) – Which engine to use when writing netCDF files. Default is ‘h5netcdf’, but it can cause issues on some systems and Python environments. Set to None to use the default xarray engine.

  • return_ds_too (bool, optional) – Set to True to also return the interpolated dataset. !! Does not support multiple variables. !! ONLY works for SAMI (currently).

  • num_workers (int, optional) – Number of workers to use for parallel processing. Default is 16.

Raises:
  • ValueError – Only one of sami_data_path or gitm_data_path can be specified at a time.

  • ValueError – Must specify sat_times if not using a grid

  • NotImplementedError – Interpolating from NetCDF files not yet supported

  • ValueError – No GITM files found in gitm_data_path. Go run pGITM and rerun this with the .bin files.

  • ValueError – Invalid column requested.

Returns:

Interpolated data. Optional, only returned if return_ds_too=True.

Return type:

xarray.Dataset
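The conversion of out_lat_lon_alt to Cartesian coordinates can be sketched as follows. Working in x, y, z avoids the discontinuity at 0/360 degrees longitude and the convergence of meridians at the poles, both of which distort a triangulation built directly in latitude/longitude. This assumes the rows are ordered lat, lon, alt (as the parameter name suggests) and a spherical Earth of radius 6371 km; the radius and Earth model the package actually uses may differ, and the function name is hypothetical.

```python
import numpy as np

# Assumption: spherical Earth with mean radius 6371 km. do_interpolations may
# use a different radius or Earth model.
EARTH_RADIUS_KM = 6371.0


def lat_lon_alt_to_xyz(lat_lon_alt):
    """Hypothetical helper: (3, N) [lat_deg, lon_deg, alt_km] -> (3, N) [x, y, z] in km."""
    lat_lon_alt = np.asarray(lat_lon_alt, dtype=float)
    lat = np.deg2rad(lat_lon_alt[0])
    lon = np.deg2rad(lat_lon_alt[1])
    r = EARTH_RADIUS_KM + lat_lon_alt[2]
    return np.vstack([
        r * np.cos(lat) * np.cos(lon),
        r * np.cos(lat) * np.sin(lon),
        r * np.sin(lat),
    ])


# Two points: the equator at 0 deg lon and 0 km, and the north pole at 629 km.
pts = np.array([[0.0, 90.0], [0.0, 0.0], [0.0, 629.0]])
xyz = lat_lon_alt_to_xyz(pts)
```

In Cartesian space, longitudes 0 and 360 map to the same point, so the seam disappears from the triangulation.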

utility_programs.interpolate_outputs.interpolate_var(tri1, tri2, outpts1, outpts2, indata, ntime, out_shape, mask)

A refactor of the SAMI interpolation routine so that it can be run in parallel threads.
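The internals of interpolate_var are not described here, but the general pattern behind threading this kind of interpolation can be sketched: compute each output point's enclosing simplex and barycentric weights once from a scipy.spatial.Delaunay triangulation (using Delaunay.find_simplex and Delaunay.transform), after which each time step is only a weighted sum that a concurrent.futures.ThreadPoolExecutor can map over. This is a well-known SciPy recipe, not the package's code; the function names are hypothetical, and whether threads give a real speedup depends on how much of the NumPy work releases the GIL.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from scipy.spatial import Delaunay


def barycentric_weights(tri, out_pts):
    """Precompute, once, the enclosing simplex and linear weights of each output point."""
    ndim = out_pts.shape[1]
    simplex = tri.find_simplex(out_pts)      # -1 for points outside the convex hull
    vertices = tri.simplices[simplex]        # (nout, ndim + 1) indices into the source points
    transform = tri.transform[simplex]       # (nout, ndim + 1, ndim) maps to barycentric coords
    delta = out_pts - transform[:, ndim]     # offset from each simplex's reference vertex
    bary = np.einsum("njk,nk->nj", transform[:, :ndim], delta)
    weights = np.hstack([bary, 1.0 - bary.sum(axis=1, keepdims=True)])
    return vertices, weights, simplex < 0


def apply_weights(values, vertices, weights, outside):
    """Interpolate one time step as a weighted sum over each simplex's vertices."""
    result = np.einsum("nj,nj->n", values[vertices], weights)
    result[outside] = np.nan
    return result


def interpolate_all_times(data, vertices, weights, outside, num_workers=4):
    """Interpolate each time step of data (shape (ntime, npoints)) in a thread pool."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        steps = pool.map(lambda t: apply_weights(data[t], vertices, weights, outside),
                         range(data.shape[0]))
        return np.stack(list(steps))


# Fake data: a fixed scattered grid (unit-cube corners plus random interior points).
rng = np.random.default_rng(0)
corners = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], dtype=float)
src = np.vstack([corners, rng.uniform(0, 1, size=(200, 3))])
targets = np.vstack([rng.uniform(0.1, 0.9, size=(20, 3)), [[2.0, 2.0, 2.0]]])  # last is outside

tri = Delaunay(src)
vertices, weights, outside = barycentric_weights(tri, targets)
data = np.stack([(t + 1) * (src[:, 0] + 2 * src[:, 1] - src[:, 2]) for t in range(4)])
interp = interpolate_all_times(data, vertices, weights, outside)  # shape (4, 21)
```

The threads only read the shared precomputed arrays and each writes its own output, so no locking is needed.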

Reading Data

GITM

utility_programs.read_routines.GITM.auto_read(gitm_dir, single_file=False, start_dtime=None, start_idx=None, end_dtime=None, end_idx=None, cols='all', progress_bar=True, drop_ghost_cells=True, file_type=None, return_xarray=True, force_dict=False, parallel=True, engine='h5netcdf', use_dask=False)

Automatically reads in a directory of GITM files.

Parameters:
  • gitm_dir (str) – Directory of GITM files.

  • single_file (bool, optional) – Whether to read in a single file. Defaults to False.

  • start_dtime (datetime, optional) – Start time of the data you want. Defaults to None.

  • start_idx (int, optional) – Start index of the data you want. Defaults to None.

  • end_dtime (datetime, optional) – End time of the data you want. Defaults to None.

  • end_idx (int, optional) – End index of the data you want. Defaults to None.

  • cols (list-like or str, optional) – List of columns you want to read in. Defaults to ‘all’.

  • progress_bar (bool, optional) – Whether to show a progress bar. Defaults to True. Requires tqdm.

  • drop_ghost_cells (bool, optional) – Whether to drop ghost cells. Defaults to True.

  • file_type (str, optional) – File type of the data you want to read in. Defaults to None.

  • return_xarray (bool, optional) – Whether to return an xarray Dataset. Defaults to True.

  • force_dict (bool, optional) – Whether to force a dictionary return. Defaults to False.

  • parallel (bool, optional) – Whether to read in files in parallel. Defaults to True. This uses Dask, which can be fragile; if you run into issues, try setting this to False. Requires dask and dask.distributed.

  • engine (str, optional) – The engine to use for reading in the data. Defaults to ‘h5netcdf’.

  • use_dask (bool, optional) – Whether to use Dask for reading in the data. Defaults to False.

Returns:

The data read in from the GITM files.

Return type:

xarray.Dataset or dict

utility_programs.read_routines.GITM.find_variable(gitm_dir, varname=None, varhelp=False, nc=True)
Helper function. Finds a variable in a directory of GITM files.

Return the filetype and/or all of the variables available.

Parameters:
  • gitm_dir (str or path-like) – Directory of GITM files.

  • varname (str, optional) – Variable you’re looking for. Not setting this will just print all variables. Defaults to None.

  • varhelp (bool, optional) – If True, print all variables available. Think of it as “just checking”. Defaults to False.

  • nc (bool, optional) – Whether to only look at .nc files. Defaults to True.

Raises:

ValueError – If you don’t specify either varhelp or varname.

Returns:

The filetype holding the variable you’re looking for.

Return type:

str (optional)

utility_programs.read_routines.GITM.gitm_times_from_filelist(file_list, century_prefix='20')

Generate datetimes from a list of GITM files.

Parameters:
  • file_list (list-like) – list of gitm files to parse

  • century_prefix (str, optional) – Century prefix prepended to the two-digit year in GITM filenames. Defaults to ‘20’.

Raises:

ValueError – Incorrect file format.

Returns:

List of datetimes in the same order as the filelist input.

Return type:

list
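A minimal sketch of the filename parsing behind gitm_times_from_filelist, assuming GITM's usual naming pattern such as 3DALL_t110520_013000.bin (file type, then _tYYMMDD_HHMMSS with a two-digit year). The helper times_from_gitm_names is hypothetical; it illustrates where century_prefix comes in and when an incorrect-format error would be raised.

```python
import os
import re
from datetime import datetime

# Assumption: GITM filenames look like "<type>_t<YYMMDD>_<HHMMSS>.<ext>",
# e.g. "3DALL_t110520_013000.bin", with a two-digit year.
_GITM_STAMP = re.compile(r"_t(\d{6})_(\d{6})")


def times_from_gitm_names(file_list, century_prefix="20"):
    """Hypothetical helper: parse datetimes from GITM filenames, preserving order."""
    times = []
    for path in file_list:
        match = _GITM_STAMP.search(os.path.basename(str(path)))
        if match is None:
            raise ValueError(f"Incorrect file format: {path}")
        yymmdd, hhmmss = match.groups()
        # The century prefix turns the two-digit year into a four-digit one.
        times.append(datetime.strptime(century_prefix + yymmdd + hhmmss, "%Y%m%d%H%M%S"))
    return times


files = ["run/data/3DALL_t110520_013000.bin", "run/data/2DANC_t110520_020000.bin"]
file_times = times_from_gitm_names(files)
```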

utility_programs.read_routines.GITM.process_all_to_cdf(gitm_dir, out_dir=None, dtime_storm_start=None, delete_bins=False, replace_cdfs=False, progress_bar=True, drop_ghost_cells=True, drop_before=None, drop_after=None, skip_existing=False, file_types='all', use_ccmc=True, single_file=False, run_name=None, tmp_dir=None)

Process all GITM .bin files in a directory to .cdf files.

Parameters:
  • gitm_dir (str or path-like) – Directory containing GITM .bin files.

  • out_dir (str or path-like, optional) – Directory to output .cdf files. If None, will go into the same directory as the .bin files. Defaults to None.

  • dtime_storm_start (datetime, optional) – Attribute added to the netCDF file. Defaults to None.

  • delete_bins (bool, optional) – Delete GITM bins after making Datasets? Defaults to False.

  • replace_cdfs (bool, optional) – Replace pre-existing netCDF files? Defaults to False.

  • progress_bar (bool, optional) – Whether or not to show progress bar. Requires tqdm. Defaults to True. If outputting to a single file, a progress bar will be added when writing files to disk. This cannot be changed.

  • drop_ghost_cells (bool, optional) – Drop GITM ghost cells? Defaults to True.

  • drop_before (datetime, optional) – Similar to start_dtime. When to start processing files. Will delete files before this time. Defaults to None.

  • drop_after (datetime, optional) – Similar to end_dtime. When to stop processing files. Will delete files after this time. Defaults to None.

  • skip_existing (bool, optional) – Skip existing netCDF files? Defaults to False. This will slow down the program significantly.

  • file_types (str or list-like, optional) – Which file types to process. Defaults to ‘all’. Can be a list of strings or a single string. Example usage is [‘3DALL’, ‘2DALL’] or ‘3DALL’.

  • use_ccmc (bool, optional) – Write files with CCMC naming convention? Defaults to True. Recommended if not using single_file.

  • single_file (bool, optional) – Output to a single file? Defaults to False. If True, will output to a single netCDF file. If False, will output to multiple netCDF files, one for each time.

  • run_name (str, optional) – Name of the run. Only used if single_file. Defaults to None. ‘_GITM.nc’ will be appended to this.

  • tmp_dir (str, optional) – Temporary directory to write files to. Only used if single_file. Defaults to None. Some systems have a local temp directory that’s much faster than the standard output_directory.

utility_programs.read_routines.GITM.read_bin_to_nparrays(gitm_dir, gitm_file_pattern='3DALL*.bin', cols=['all'], dtime_start=None, dtime_end=None, start_idx=0, end_idx=-1, century_prefix='20', return_vars=False, progress_bar=False)

Reads GITM data into a dictionary of numpy arrays (deprecated; use read_to_xarray instead).

Parameters:
  • gitm_dir (str) – path to gitm files

  • gitm_file_pattern (str, optional) – file pattern to match. Defaults to ‘3DALL*.bin’.

  • cols (list, optional) – which columns to read (strs). Defaults to [‘all’].

  • dtime_start (datetime, optional) – datetime to start read. Defaults to None (all data).

  • dtime_end (datetime, optional) – datetime to end read. Defaults to None (all data).

  • start_idx (int, optional) – index to start reading at. Defaults to 0.

  • end_idx (int, optional) – index to end reading at. Defaults to -1.

  • century_prefix (str, optional) – century. Defaults to ‘20’.

  • return_vars (bool, optional) – Also return the list of variables. Defaults to False.

  • progress_bar (bool, optional) – show progress bar. Defaults to False. (Requires tqdm)

Raises:

ValueError – If GITM files don’t exist or are in a weird format.

Returns:

dictionary of numpy arrays with keys:

gitmdtimes:

times of gitm outputs

gitmbins:

gitm data

gitmgrid:

dictionary of grid variables

gitmvars (optional, only returned if return_vars):

list of variables

Return type:

dict

utility_programs.read_routines.GITM.read_bin_to_xarray(filename, drop_ghost_cells=True, cols='all')

Reads a GITM binary file into an xarray Dataset. Works for all GITM files, including 3DALL, 2DALL, 2DANC, etc. (Taken and modified from aetherpy.)

Parameters:
  • filename (str, path) – Path to the file to read.

  • drop_ghost_cells (bool, optional) – Drop GITM ghost cells. See GITM manual for details on ghost cells. Defaults to True.

  • cols (str/list-like, optional) – Set which columns to read. On systems with limited memory, reading all columns can make datasets too large to fit into memory. Defaults to ‘all’ (all columns).

Raises:

IOError – File does not exist

Returns:

Dataset holding the data.

Indexed with glat, glon, alt (converted to deg, deg, km)

Return type:

xarray.Dataset
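Dropping ghost cells and the (deg, deg, km) conversion mentioned above can be pictured with plain NumPy. This sketch assumes two ghost cells on each end of every spatial axis and raw coordinates stored in radians and meters, which is what "converted to deg, deg, km" suggests; it is not the aetherpy-derived reader itself, and both helper names are hypothetical.

```python
import numpy as np

# Assumption: 2 ghost cells on each end of every spatial axis (GITM's usual
# nGhost = 2). Check the value for your GITM build.
N_GHOST = 2


def drop_ghost_cells(arr, n_ghost=N_GHOST):
    """Hypothetical helper: strip n_ghost cells from both ends of the first three axes."""
    core = slice(n_ghost, -n_ghost)
    return arr[core, core, core]


def to_deg_deg_km(lon_rad, lat_rad, alt_m):
    """Hypothetical helper: radians, radians, meters -> degrees, degrees, km."""
    return np.rad2deg(lon_rad), np.rad2deg(lat_rad), np.asarray(alt_m, dtype=float) / 1000.0


raw = np.zeros((76, 40, 54))         # e.g. 72 lons x 36 lats x 50 alts, plus ghost cells
print(drop_ghost_cells(raw).shape)   # (72, 36, 50)
```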

utility_programs.read_routines.GITM.read_multiple_bins_to_xarray(file_list, start_dtime=None, end_dtime=None, start_idx=0, end_idx=-1, drop_ghost_cells=True, cols='all', pbar=False)

Read a list-like of GITM files into an xarray Dataset.

Parameters:
  • file_list (list-like) – files to pull from.

  • start_dtime (datetime, optional) – Time to start reading at. Not necessary, especially if you have pre-filtered the file_list. Defaults to None.

  • end_dtime (datetime, optional) – Time to end reading at. See above. Can be used without start_dtime. Defaults to None.

  • start_idx (int, optional) – Index of file_list to start reading. Defaults to 0.

  • end_idx (int, optional) – Index of file_list to end reading at. Defaults to -1.

  • drop_ghost_cells (bool, optional) – Remove Ghost cells? Defaults to True.

  • cols (str or list-like, optional) – Specific columns to read. Defaults to ‘all’.

  • pbar (bool, optional) – Whether or not to show progress bar. Requires tqdm. Defaults to False.

Raises:

ValueError – If start/end inputs are mixed up.

Returns:

Dataset containing all variables in the file_list at the times specified.

Return type:

xarray.Dataset
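The start/end options above amount to two alternative ways of choosing a subset of file_list. A hypothetical helper sketching that logic, under assumptions not stated in the original: end_idx is inclusive, with -1 meaning "through the last file", and "mixed up" inputs cover both combining datetime and index bounds and giving a start after the end. The times list would come from something like gitm_times_from_filelist.

```python
from datetime import datetime


def select_files(file_list, times, start_dtime=None, end_dtime=None, start_idx=0, end_idx=-1):
    """Hypothetical helper: pick files by datetime window or by inclusive index range."""
    use_times = start_dtime is not None or end_dtime is not None
    use_idx = start_idx != 0 or end_idx != -1
    if use_times and use_idx:
        raise ValueError("Select by datetimes or by indices, not both.")
    if use_times:
        # Either bound may be given on its own; the other defaults to the data's extent.
        lo = min(times) if start_dtime is None else start_dtime
        hi = max(times) if end_dtime is None else end_dtime
        if lo > hi:
            raise ValueError("start_dtime is after end_dtime.")
        return [f for f, t in zip(file_list, times) if lo <= t <= hi]
    stop = len(file_list) if end_idx == -1 else end_idx + 1
    return list(file_list)[start_idx:stop]


files = ["a.bin", "b.bin", "c.bin", "d.bin"]
times = [datetime(2011, 5, 20, h) for h in range(4)]
window = select_files(files, times, start_dtime=datetime(2011, 5, 20, 1),
                      end_dtime=datetime(2011, 5, 20, 2))
```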

SAMI3

Read SAMI data.

utility_programs.read_routines.SAMI.auto_read(sami_dir, cols='all', split_by_time=False, split_by_var=False, whole_run=False, return_xarray=True, filetype='SAMI-REGRID', force_nparrays=False, dtime_sim_start=None, parallel=True, start_dtime=None, start_idx=None, end_dtime=None, end_idx=None, hrs_before_storm_start=None, hrs_after_storm_start=None, dtime_storm_start=None, progress_bar=False, use_dask=False, engine='h5netcdf', skip_time_check=False)

Automatically reads in SAMI data and returns it in a format of your choice.

  • Preference is to read/return xarray datasets, but can read and return numpy arrays.

  • Prefers whole-run files, falling back on files split by time, then by variable.

Parameters:
  • sami_dir (str or path-like) – Path to the directory containing the SAMI data.

  • cols (str or list-like, optional) – Variables to return. Defaults to ‘all’.

  • split_by_time (bool, optional) – If files are output by time (and whether to prefer those files). Defaults to False.

  • split_by_var (bool, optional) – If files are output by variable (and whether to prefer those files). Defaults to False.

  • whole_run (bool, optional) – If the whole run is in one file. Defaults to False.

  • return_xarray (bool, optional) – Return xarray dataset? Defaults to True.

  • force_nparrays (bool, optional) – Force program to return dicts of numpy arrays. Defaults to False.

  • dtime_sim_start (datetime, optional) – Datetime of the start of the simulation. Defaults to None. Required if netCDF files aren’t made. If netCDF files were made, this may have been added as an attribute to the files.

  • parallel (bool, optional) – Force parallel reading of files. Parallel reads of netCDF files can behave unexpectedly and might be buggy. Defaults to True.

  • start_dtime (datetime, optional) – Datetime to start reading data. Defaults to None.

  • start_idx (int, optional) – Index of the first time to read. Defaults to None.

  • end_dtime (datetime, optional) – Datetime to stop reading data. Defaults to None

  • end_idx (int, optional) – Index of the last time to read. Defaults to None.

  • hrs_before_storm_start (int, optional) – Hours before the storm start to read data. Defaults to None. (dtime_storm_start must be set)

  • hrs_after_storm_start (int, optional) – Hours after the storm start to read data. Defaults to None. (dtime_storm_start must be set)

  • dtime_storm_start (datetime, optional) – Datetime of the storm start. Defaults to None. (hrs_before_storm_start and hrs_after_storm_start must be set)

  • progress_bar (bool, optional) – Show progress bar? Defaults to False. Requires tqdm.

Returns:

Dataset of the SAMI data

(If return_xarray is True)

dict: Dictionary of numpy arrays of the SAMI data

(If force_nparrays is True)

Return type:

xarray.Dataset or dict

utility_programs.read_routines.SAMI.get_grid_elems_from_parammod(sami_data_path)
Go into the SAMI data directory and get the grid elements from the parameter_mod.f90 file.

Parameters:

sami_data_path (str) – data path for sami outputs

Returns:

nz: num. grid points along each field line

nf: num. field lines along each magnetic longitude

nlt: num. magnetic longitudes

nt: num. time steps

Return type:

tuple

utility_programs.read_routines.SAMI.get_postprocessed_grid(sami_data_path)
Go into the SAMI data directory and get the grid elements from the parameter_mod.f90 file.

Parameters:

sami_data_path (str) – data path for sami outputs

Returns:

nx: num. grid points along each field line

ny: num. field lines along each magnetic longitude

Return type:

tuple

utility_programs.read_routines.SAMI.get_sami_grid(sami_data_path, nlt, nf, nz)

Read in SAMI grid files.

Parameters:
  • sami_data_path (str) – path to SAMI data

  • nlt (int) – Number of magnetic local times (lons)

  • nf (int) – Number of field lines along each longitude

  • nz (int) – Number of grid cells along each field line

  • geo_grid_files (dict, optional) – Files to use for getting the grid. Defaults to { ‘glat’: ‘glatu.dat’, ‘glon’: ‘glonu.dat’, ‘alt’: ‘zaltu.dat’, ‘mlat’: ‘blatu.dat’, ‘mlon’: ‘blonu.dat’, ‘malt’: ‘baltu.dat’}.

Returns:

SAMI3 grid in a dictionary with keys: ‘glat’, ‘glon’, ‘alt’, ‘mlat’, ‘mlon’, ‘malt’

Return type:

dict
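read_to_nparray returns arrays shaped [nlt, nf, nz], while SAMI3 itself is written in Fortran. Assuming the Fortran arrays are dimensioned (nz, nf, nlt) and a grid file has already been read into a flat array of the right dtype (record markers and precision are not handled here), the reorder is a plain C-order reshape, because a column-major (nz, nf, nlt) array and a row-major (nlt, nf, nz) array have identical memory layouts. to_nlt_nf_nz is a hypothetical helper.

```python
import numpy as np


def to_nlt_nf_nz(flat, nlt, nf, nz):
    """Hypothetical helper: flat column-major (nz, nf, nlt) record -> array of shape [nlt, nf, nz]."""
    flat = np.asarray(flat)
    if flat.size != nlt * nf * nz:
        raise ValueError(f"Expected {nlt * nf * nz} values, got {flat.size}")
    # In both layouts nz varies fastest, then nf, then nlt, so a C-order reshape suffices.
    return flat.reshape((nlt, nf, nz))


nlt, nf, nz = 2, 3, 4
flat = np.arange(nlt * nf * nz)
grid = to_nlt_nf_nz(flat, nlt, nf, nz)
# Same result via an explicit Fortran-order reshape followed by reversing the axes.
same = flat.reshape((nz, nf, nlt), order="F").transpose(2, 1, 0)
print(np.array_equal(grid, same))  # True
```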

utility_programs.read_routines.SAMI.make_times(nt, sami_data_path, dtime_sim_start, dtime_storm_start=None, hrs_before_storm=None, hrs_after_storm=None, need_help=False, skip_time_check=False)

Make a list of datetime objects for each time step from the time.dat file.

Parameters:
  • nt (int) – Number of time steps (from get_grid_elems_from_parammod)

  • sami_data_path (str) – Path to sami data

  • dtime_sim_start (datetime.datetime) – Datetime of the start of the simulation

  • dtime_storm_start (datetime.datetime, optional) – Datetime of the start of the storm

  • hrs_before_storm (int, optional) – Hours before the onset of the storm (or any event, really) to begin processing. Set to -1 to run for the entire simulation. Defaults to None.

  • hrs_after_storm (int, optional) – Hours after the onset of the storm to stop processing. Set to -1 to run for the entire simulation. Defaults to None.

  • need_help (bool, optional) – If True, print the time list (useful when getting acquainted with the run). Defaults to False.

Raises:
  • ValueError – Sometimes SAMI outputs fake time steps.

  • ValueError – You only set one of hrs_before_storm or hrs_after_storm.

Returns:

times (list):

List of datetime objects for each time step

hrs_since_storm_start (list):

List of (float) hours since the storm start

start_idx (int):

Start index for the times list, calculated from hrs_before_storm (ONLY if hrs_before_storm)

end_idx (int):

End index for the times list, calculated from hrs_after_storm (ONLY if hrs_after_storm)

Return type:

tuple
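The index bookkeeping described for make_times can be sketched in plain Python. The format of time.dat is not described here, so this assumes the times have already been parsed into hours since dtime_sim_start; the monotonicity check is only one plausible way to catch "fake" time steps, and make_times_sketch is a hypothetical stand-in, not the real function.

```python
from datetime import datetime, timedelta


def make_times_sketch(sim_hours, dtime_sim_start, dtime_storm_start=None,
                      hrs_before_storm=None, hrs_after_storm=None,
                      skip_time_check=False):
    """Hypothetical re-creation of the bookkeeping described for make_times.

    sim_hours: list of simulation times in hours since dtime_sim_start
    (assumed to be already parsed from time.dat).
    """
    # One plausible sanity check: real time steps should strictly increase.
    if not skip_time_check and any(b <= a for a, b in zip(sim_hours, sim_hours[1:])):
        raise ValueError("Time steps are not strictly increasing (fake time steps?).")
    times = [dtime_sim_start + timedelta(hours=float(h)) for h in sim_hours]
    if dtime_storm_start is None:
        return times, None, None, None
    if (hrs_before_storm is None) != (hrs_after_storm is None):
        raise ValueError("Set both hrs_before_storm and hrs_after_storm, or neither.")
    hrs_since = [(t - dtime_storm_start).total_seconds() / 3600.0 for t in times]
    if hrs_before_storm is None:
        return times, hrs_since, None, None
    # -1 means "use the whole simulation" on that side of the storm onset.
    start_idx, end_idx = 0, len(times) - 1
    if hrs_before_storm != -1:
        start_idx = min((i for i, h in enumerate(hrs_since) if h >= -hrs_before_storm), default=None)
    if hrs_after_storm != -1:
        end_idx = max((i for i, h in enumerate(hrs_since) if h <= hrs_after_storm), default=None)
    if start_idx is None or end_idx is None or start_idx > end_idx:
        raise ValueError("Requested window does not overlap the simulation times.")
    return times, hrs_since, start_idx, end_idx


times, hrs_since, start_idx, end_idx = make_times_sketch(
    list(range(13)),                      # hourly outputs, 0-12 h into the run
    dtime_sim_start=datetime(2011, 5, 20),
    dtime_storm_start=datetime(2011, 5, 20, 6),
    hrs_before_storm=2,
    hrs_after_storm=3,
)
print(start_idx, end_idx)  # 4 9
```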

utility_programs.read_routines.SAMI.process_all_to_cdf(sami_data_path, dtime_sim_start, dtime_storm_start=None, progress_bar=False, start_dtime=None, end_dtime=None, out_dir=None, use_ccmc=True, split_by_time=True, split_by_var=False, whole_run=False, run_name=None, OVERWRITE=False, delete_raw=False, append_files=False, low_mem=False, cols='all', skip_time_check=False)

Process SAMI binary files to netcdf format.

Parameters:
  • sami_data_path (str) – Path to SAMI data.

  • dtime_sim_start (datetime) – Simulation start time.

  • progress_bar (bool, optional) – Show progress bar. Defaults to False. Requires tqdm

  • start_dtime (datetime, optional) – datetime to start reading data. Defaults to None.

  • end_dtime (datetime, optional) – datetime to stop reading data. Defaults to None.

  • out_dir (str, optional) – Directory to save netcdf files. Defaults to sami_data_path.

  • split_by_time (bool, optional) – Split files by time. Defaults to True.

  • split_by_var (bool, optional) – Split files by variable. Defaults to False.

  • whole_run (bool, optional) – Save whole model run (in time range) as one netcdf. Defaults to False.

  • OVERWRITE (bool, optional) – Overwrite existing files. Defaults to False.

  • append_files (bool, optional) – Append to existing files. Defaults to False.

  • low_mem (bool, optional) – Read data in chunks to save memory. Defaults to False.

  • cols (list-like or str, optional) – List of columns to read. Defaults to ‘all’.

Raises:
  • ValueError – If incorrect time args are given.

  • ValueError – If files exist and OVERWRITE is False.

  • ValueError – If cols is not in available columns.

Returns:

None

utility_programs.read_routines.SAMI.read_raw_to_xarray(sami_data_path, dtime_sim_start, cols='all', hrs_before_storm_start=None, hrs_after_storm_start=None, dtime_storm_start=None, start_dtime=None, end_dtime=None, start_idx=None, end_idx=None, progress_bar=False, skip_time_check=False)

utility_programs.read_routines.SAMI.read_sami_dene_tec_MAG_GRID(sami_data_path, dtime_sim_start=None, reshape=True)

Read in TEC (and interpolated dene) data.

Parameters:
  • sami_data_path (str) – path to SAMI data

  • dtime_sim_start (datetime.datetime) – datetime of the start of the simulation

  • reshape (bool, optional) – reshape the data to the correct shape, defaults to True. Otherwise, the data will be returned as a 1D array.

Returns:

SAMI data, times

Return type:

dict, np.array

utility_programs.read_routines.SAMI.read_to_nparray(sami_data_path, dtime_sim_start, dtime_storm_start=None, hrs_before_storm=None, hrs_after_storm=None, pbar=False, cols='all', need_help=False, skip_time_check=False)

Automatically read in SAMI data.

Parameters:
  • sami_data_path (str) – Path to SAMI data

  • dtime_storm_start (datetime.datetime, optional) – Datetime of the start of the storm

  • dtime_sim_start (datetime.datetime) – Datetime of the start of the simulation

  • hrs_before_storm (int, optional) – Hours before the storm start to begin reading data. Defaults to None.

  • hrs_after_storm (int, optional) – Hours after the storm start to stop reading data. Defaults to None.

  • pbar (bool, optional) – Do you want to show a progress bar? It is automatically set if tqdm is successfully imported. Defaults to False.

  • cols (str or list-like, optional) – List of columns to get data for. Defaults to ‘all’.

  • need_help (bool, optional) – Prints time and variable info. Defaults to False.

Raises:
  • KeyError – If given cols is not valid

  • FileNotFoundError – If the filepath is invalid

Returns:

Dictionary of SAMI data with keys: [‘grid’, ‘data’]; data is in np arrays with the shape [nlt, nf, nz]

np.array: Times of the data

Return type:

dict, np.array

Plotting

After converting files to netCDF, you can plot them using the following:

basic_plots_from_netcdf.autoplot(file_list, columns_to_plot, output_dir=None, show_map=False, time_lims=[0, -1], cut_dict={}, lim_dict={}, loop_var='time', process_options=None, plot_arg_dict=None, concat_dim='time')

Plot data from netCDF files.

Parameters:
  • file_list (list of str or str) – List of file paths to netCDF files.

  • columns_to_plot (str or list of str) – Name(s) of the variable(s) to plot.

  • output_dir (str, optional) – Directory to save the plots. If not specified, plots are saved to the same directory as file_list.

  • show_map (bool, optional) – Whether to plot the data on a map. Default is False.

  • time_lims (list of int, optional) – Time limits to plot. Default is [0, -1], which plots all available times.

  • cut_dict (dict, optional) – Dictionary of cuts to apply to the data. Default is an empty dictionary (no cuts). Format as {‘lon’: 240, ‘alt’:450}.

  • lim_dict (dict, optional) – Dictionary of limits to apply to the data. Default is an empty dictionary.

  • loop_var (str, optional) – Name of the variable to loop over. This will make plots for all values of the variable (within the limits specified). Default is ‘time’.

  • process_options (dict, optional) – Dictionary of processing options to apply to the data. Default is None. See run_processing_options() for supported options.

  • plot_arg_dict (dict, optional) – Dictionary of arguments to pass to the plot function. Default is None.

  • concat_dim (str, optional) – Name of the dimension to concatenate the data along when reading netCDF files with Dask. Optional. Only change this if you are having trouble reading in files. Default is ‘time’.

Raises:

ValueError – If altitude is selected when using alt_int, or if lon/lat cuts are used when making maps.

Return type:

None
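Assuming a cut picks the grid value nearest to the requested coordinate (in xarray terms, roughly ds.sel(lon=240, alt=450, method="nearest")), the NumPy-only sketch below shows what a cut_dict of {'lon': 240, 'alt': 450} does to the data's shape: each cut dimension is reduced to a single slice and dropped. apply_cuts is a hypothetical helper, not part of basic_plots_from_netcdf.

```python
import numpy as np


def apply_cuts(data, dims, coords, cut_dict):
    """Hypothetical helper: select the grid value nearest each cut, dropping that dimension.

    data: ndarray whose axes are named by dims; coords: {dim: 1D coordinate array}.
    """
    index, kept_dims = [], []
    for dim in dims:
        if dim in cut_dict:
            distance = np.abs(np.asarray(coords[dim], dtype=float) - cut_dict[dim])
            index.append(int(distance.argmin()))
        else:
            index.append(slice(None))
            kept_dims.append(dim)
    return data[tuple(index)], kept_dims


dims = ("time", "lon", "lat", "alt")
coords = {
    "time": np.arange(3),
    "lon": np.arange(0, 360, 5),
    "lat": np.arange(-90, 91, 2),
    "alt": np.arange(100, 1001, 50),
}
shape = tuple(len(coords[d]) for d in dims)
data = np.arange(np.prod(shape)).reshape(shape)
cut, remaining = apply_cuts(data, dims, coords, {"lon": 240, "alt": 450})
print(cut.shape, remaining)  # (3, 91) ['time', 'lat']
```

After both cuts only time and latitude remain, which is why loop_var='time' with these cuts would yield one latitude profile per time step.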

basic_plots_from_netcdf.run_processing_options(ds, process_options)

Process the given xarray.Dataset according to the specified options.

Parameters:
  • ds (xarray.Dataset) – The dataset to be processed.

  • process_options (str or list) – The processing options to be applied to the input dataset. Currently supported options:

      ◦ ‘alt_int’: integrate over altitude

      ◦ ‘bandpass’: apply bandpass filter

      ◦ ‘transpose’: transpose the dataset

Returns:

The processed dataset.

Return type:

xarray.Dataset
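As an illustration of what an altitude integration ('alt_int') amounts to, here is a trapezoidal integral along the altitude axis; for electron density this gives total electron content. It is a sketch on plain NumPy arrays, not necessarily how run_processing_options implements it; the helper name is hypothetical, and the trapezoid is written out by hand to avoid depending on NumPy's version-specific trapz/trapezoid naming.

```python
import numpy as np


def integrate_over_altitude(values, alt, axis=-1):
    """Hypothetical helper: trapezoidal integral of values along the altitude axis."""
    values = np.moveaxis(np.asarray(values, dtype=float), axis, -1)
    alt = np.asarray(alt, dtype=float)
    # Average neighbouring samples and weight by the altitude spacing.
    return np.sum(0.5 * (values[..., 1:] + values[..., :-1]) * np.diff(alt), axis=-1)


# Example: a uniform electron density of 1e12 m^-3 between 0 and 100 km.
alt_km = np.linspace(0, 100, 11)
dene = np.full(alt_km.shape, 1e12)
tec = integrate_over_altitude(dene, alt_km) * 1e3 / 1e16  # km -> m, then el/m^2 -> TECU
print(tec)  # 10.0
```

Note the unit bookkeeping: with altitude in km, the result must be multiplied by 1000 to get a per-square-meter column, and 1 TECU is 1e16 electrons per square meter.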

More plotting routines can be accessed through the utility_programs.plotting_routines module:

utility_programs.plotting_routines.custom_panels_keos(da, numplots=8, sel_col='localtime', max_per_row=4, suptitle=None, vmin=None, vmax=None, sharex=True, sharey=True, x='time', cmap='rainbow', one_colorbars=True, colorbar_label='')

A script to make a panel of keogram-like plots.

Parameters:
  • da (xarray DataArray) – Data array to be plotted.

  • numplots (int, optional) – Number of plots to make, by default 8.

  • sel_col (str, optional) – Column to select data from, by default ‘localtime’.

  • max_per_row (int, optional) – Maximum number of plots per row, by default 4.

  • suptitle (str, optional) – Title of plot, by default None.

  • vmin (int, optional) – Minimum value of colorbar, by default None.

  • vmax (int, optional) – Maximum value of colorbar, by default None.

  • sharex (bool, optional) – Share x-axis?, by default True.

  • sharey (bool, optional) – Share y-axis?, by default True.

  • x (str, optional) – X-axis data, by default ‘time’.

  • cmap (str, optional) – Which matplotlib colormap to use, by default ‘rainbow’.

  • one_colorbars (bool, optional) – Use one colorbar for all plots?, by default True.

  • colorbar_label (str, optional) – Label of colorbar, by default ‘’.

Returns:

Data from imshow (a figure).

Return type:

Matplotlib imshow object

utility_programs.plotting_routines.draw_field_line_plot(x, y, z, title=None, interpolate=False, x_label='Mlat (Deg)', y_label='Altitude (km)', x_lims=[-65, 65], y_lims=[0, 1200], cbar_label=None, cbar_lims=None, ax=None, cmap='viridis', fname=None, save_or_show='save', fpeak_col=None)

Draw a plot of data along a single field line (longitude).

Parameters:
  • x (numpy array) – X-axis data, or latitude

  • y (numpy array) – Y-axis data, or altitude

  • z (numpy array) – Z-axis (color) data, or data to be plotted

  • title (str, optional) – Title of generated plot, by default None

  • interpolate (bool, optional) – Interpolate the data (to the x-y grid), by default False

  • x_label (str, optional) – Label of x-axis, by default ‘Mlat (Deg)’

  • y_label (str, optional) – Label of y-axis, by default ‘Altitude (km)’

  • x_lims (list, optional) – Limits of x-axis, by default [-65, 65]

  • y_lims (list, optional) – Limits of y-axis, by default [0, 1200]

  • cbar_label (str, optional) – Label of colorbar, by default None

  • cbar_lims (list, optional) – Limits of colorbar, by default None

  • ax (Matplotlib axes, optional) – If plotting on an existing axis, specify it here, by default None

  • cmap (str, optional) – Which matplotlib colormap to use, by default ‘viridis’

  • fname (str, optional) – Filename when saving, by default None

  • save_or_show (str, optional) – Save, show, or return the plot, by default ‘save’

  • fpeak_col (numpy array, optional) – To plot the F-peak location, give the latitude-altitude coordinates here, by default None

Raises:
  • ValueError – If fname exists and OVERWRITE is False

  • ValueError – If save_or_show is not “save”, “show”, or “return”

  • ValueError – If fname is not given when save_or_show is “save”

Returns:

Data from imshow

Return type:

Matplotlib imshow object

utility_programs.plotting_routines.draw_map(data_arr, cbarlims=[None, None], title=None, ylims=None, cbar_label=None, y_label='Latitude (deg)', x_label='Longitude (deg)', save_or_show='save', cmap='viridis', ax=None, fname=None, OVERWRITE=False, **kwargs)

Draw a map of the data array.

Parameters:
  • data_arr (xarray DataArray) – Data array to be plotted

  • cbarlims (list, optional) – Colorbar limits as [vmin, vmax], defaults to [None, None] (automatic limits)

  • title (str, optional) – Title to draw, defaults to None

  • ylims (list, optional) – Limits of y-axis [ymin, ymax], defaults to None

  • cbar_label (str, optional) – Label of colorbar, defaults to None

  • y_label (str, optional) – Label of y-axis, defaults to “Latitude (deg)”

  • x_label (str, optional) – Label of x-axis, defaults to “Longitude (deg)”

  • save_or_show (str, optional) – Save, show, or return the plot (“save”, “show”, or “return”), defaults to “save”

  • cmap (str, optional) – Which matplotlib colormap to use, defaults to ‘viridis’

  • ax (Matplotlib axes, optional) – If plotting on an existing axis, specify it here, defaults to None

  • fname (str, optional) – Filename when saving, defaults to None

  • OVERWRITE (bool, optional) – Overwrite existing files when saving, defaults to False

Raises:
  • ValueError – If fname exists and OVERWRITE is False

  • ValueError – If save_or_show is not “save”, “show”, or “return”

  • ValueError – If fname is not given when save_or_show is “save”

Returns:

Data from imshow

Return type:

Matplotlib imshow object
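The three documented ValueError conditions amount to an argument check made before anything is drawn. A plain-Python sketch of such a check; check_output_args is a hypothetical name that is not part of the module, and the order of the checks is an assumption.

```python
import os

VALID_MODES = ("save", "show", "return")


def check_output_args(save_or_show, fname=None, overwrite=False):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    if save_or_show not in VALID_MODES:
        raise ValueError(f"save_or_show must be one of {VALID_MODES}, got {save_or_show!r}")
    if save_or_show == "save":
        if fname is None:
            raise ValueError("fname must be given when save_or_show is 'save'")
        # Refuse to clobber an existing file unless OVERWRITE is set
        if os.path.exists(fname) and not overwrite:
            raise ValueError(f"{fname} exists and OVERWRITE is False")
```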

utility_programs.plotting_routines.loop_panels(da, ncols, start_time, time_delta='1 hour', sel_criteria=None, suptitle=None, title=None, col_names=None, save=None, lon_labels=False, mask_dials=False, **plotargs)

A convenience wrapper for making panel plots; successive panels are spaced by time_delta, starting at start_time.

Parameters:
  • da (xarray DataArray) – Data array to be plotted.

  • ncols (int) – Number of columns to plot.

  • start_time (str) – Start time of the plot, converted to a pd.Timestamp. Format as ‘YYYY-MM-DD HH:MM:SS’; a date alone such as ‘2011-05-20’ is also accepted (00:00 UT).

  • time_delta (str, optional) – Time between each plot, defaults to ‘1 hour’.

  • sel_criteria (dict, optional) – Criteria to select data (e.g. plotting multiple altitudes), defaults to None.

  • suptitle (str, optional) – Title of plot, defaults to None.

  • title (str, optional) – Title of each subplot, defaults to None.

  • col_names (list of str, optional) – Name(s) of column(s) to plot, defaults to None (use this when plotting a DataSet instead of a DataArray).

  • save (str, optional) – Filename to save plot to, defaults to None.

  • lon_labels (bool, optional) – Show longitude labels, defaults to False.

  • mask_dials (int, optional) – Mask values below this value, defaults to False.

Returns:

Data from imshow (a figure).

Return type:

Matplotlib imshow object
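start_time and time_delta together fix the time shown in each panel. A sketch of how those times could be generated with pandas, assuming one panel per time step; panel_times and n_panels are hypothetical, and filling each panel with something like da.sel(time=t, **sel_criteria) is an assumption about the implementation.

```python
import pandas as pd


def panel_times(start_time, n_panels, time_delta="1 hour"):
    """Hypothetical helper: timestamps for n_panels panels spaced by time_delta."""
    start = pd.Timestamp(start_time)   # '2011-05-20' is read as 2011-05-20 00:00:00
    step = pd.Timedelta(time_delta)    # '1 hour' is read as 0 days 01:00:00
    return [start + i * step for i in range(n_panels)]
```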

utility_programs.plotting_routines.make_a_keo(arr, title=None, cbarlims=[None, None], cbar_name=None, y_label='Latitude (deg)', x_label='Index of Time-step (relative to plot start)', p_extent=None, save_or_show='save', cmap='viridis', fname=None, ax=None, ylims=None, OVERWRITE=False, **kwargs)

Generate a keogram plot from a data array.

Parameters:
  • arr (xarray DataArray) – Data array to be plotted

  • title (str, optional) – Title of plot, defaults to None

  • cbarlims (list, optional) – Colorbar limits, defaults to [None, None] (automatic)

  • cbar_name (str, optional) – Label for colorbar, defaults to None

  • y_label (str, optional) – Y-axis label, defaults to “Latitude (deg)”

  • x_label (str, optional) – Label for x-axis, defaults to “Index of Time-step (relative to plot start)”

  • p_extent (list-like, optional) – Plot extent. Set to 0 for auto.

  • save_or_show (str, optional) – Save, show, or return plots? Defaults to “save”

  • cmap (str, optional) – Colormap to use to plot, defaults to ‘viridis’

  • fname (str, optional) – Filename to save plot to, defaults to None

  • ax (Matplotlib axes, optional) – If plotting on an existing axis, specify it here, defaults to None

  • ylims (int, optional) – Limits of y axis, defaults to None

  • OVERWRITE (bool, optional) – Overwrite existing plot, defaults to False

Raises:
  • ValueError – If fname exists and OVERWRITE is False

  • ValueError – If save_or_show is not “save”, “show”, or “return”

  • ValueError – If fname is not given when save_or_show is “save”

Returns:

Data from imshow

Return type:

Matplotlib imshow object

utility_programs.plotting_routines.map_and_dials(dial_da, total, map_da=None, max_per_row=3, isel_dials=None, sel_dials=None, isel_map=None, sel_map=None, quiver_map_cols=None, suptitle=None, suptitlesize='large', time_start=None, time_delta='1 hour', save=None, mask_dials=0.001, mask_maps=False, dial_cmap='rainbow', map_cmap='rainbow', vmin_dial=None, vmax_dial=None, dial_kwargs={}, map_kwargs={}, vmin_map=None, vmax_map=None, several_datasets=False, times_datasets=None, latlon_labeled=False)

Plot a map and polar dials for several time steps.

Parameters:
  • dial_da (xarray DataArray) – Data array to be plotted on dials

  • total (int) – Number of plots to make

  • map_da (xarray DataArray, optional) – Data array to be plotted on map, defaults to None

  • max_per_row (int, optional) – Maximum number of plots per row, defaults to 3

  • isel_dials (dict, optional) – Indices to select dial data from, defaults to None

  • sel_dials (dict, optional) – Criteria to select dial data from, defaults to None

  • isel_map (dict, optional) – Indices to select map data from, defaults to None

  • sel_map (dict, optional) – Criteria to select map data from, defaults to None

  • quiver_map_cols (str, optional) – Columns to plot on map as quiver, defaults to None (no quivers)

  • suptitle (str, optional) – Title of plot, defaults to None

  • suptitlesize (str, optional) – Size of suptitle, defaults to ‘large’

  • time_start (str, optional) – Start time of plot, defaults to None

  • time_delta (str, optional) – Time between each plot, defaults to ‘1 hour’

  • save (str, optional) – Filename to save plot to, defaults to None (don’t save)

  • mask_dials (float, optional) – Mask values below this value, defaults to 0.001

  • mask_maps (bool, optional) – Mask values below this value, defaults to False

  • dial_cmap (str, optional) – Which matplotlib colormap to use for dials, defaults to ‘rainbow’

  • map_cmap (str, optional) – Which matplotlib colormap to use for maps, defaults to ‘rainbow’

  • vmin_dial (int, optional) – Minimum value of dial colorbar, defaults to None

  • vmax_dial (int, optional) – Maximum value of dial colorbar, defaults to None

  • dial_kwargs (dict, optional) – Keyword arguments for dial plot, defaults to {}

  • map_kwargs (dict, optional) – Keyword arguments for map plot, defaults to {}

  • vmin_map (int, optional) – Minimum value of map colorbar, defaults to None

  • vmax_map (int, optional) – Maximum value of map colorbar, defaults to None

  • several_datasets (bool, optional) – Plot several datasets on the same plot, defaults to False

  • times_datasets (list of str, optional) – Times to plot for each dataset, defaults to None

  • latlon_labeled (bool, optional) – Show latitude and longitude labels, defaults to False

Returns:

Data from imshow (a figure)

Return type:

Matplotlib imshow object
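Two of the parameters have simple numeric meanings: total dials wrapped at max_per_row per row, and mask_dials hiding values below a threshold. A sketch of both, assuming rows are filled left to right and masked values become NaN (so Matplotlib leaves them blank); dial_grid_shape and mask_below are hypothetical helpers, and the real layout may add space for the map.

```python
import math

import numpy as np


def dial_grid_shape(total, max_per_row=3):
    """Hypothetical helper: (rows, cols) needed to fit `total` dials."""
    cols = min(total, max_per_row)
    rows = math.ceil(total / max_per_row)
    return rows, cols


def mask_below(values, threshold=0.001):
    """Return a float copy of `values` with entries below `threshold` set to NaN."""
    out = np.array(values, dtype=float)  # np.array copies, so the input is untouched
    out[out < threshold] = np.nan
    return out
```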

utility_programs.plotting_routines.panel_of_dials(da, hemi_titles, times, time_titles=None, title=None, mask_dials=False, lon_labels=False, **plotargs)

Plot a panel of polar dials.

Parameters:
  • da (xarray DataArray) – Data array to be plotted.

  • hemi_titles (list) – Titles of each hemisphere.

  • times (list) – Times to plot.

  • time_titles (list, optional) – Titles of each time, defaults to None.

  • title (str, optional) – Title of plot, defaults to None.

  • mask_dials (int, optional) – Mask values below this value, defaults to False.

  • lon_labels (bool, optional) – Show longitude labels, defaults to False.

Returns:

Data from imshow (a figure).

Return type:

Matplotlib imshow object

Raises:

ValueError – If hemi_titles and times do not match in shape.

utility_programs.plotting_routines.panel_plot(da, x='time', y='lat', wrap_col='lon', plot_vals=[0, 45, 90, 135, 180, 225, 270, 315], do_map=False, col_wrap=4, suptitle=None, vlims=None, cmap='bwr', out_fname=None, isel_plotvals=False, tight_layout=False, cbar_label=None)

Plot a panel plot of a data array.

Parameters:
  • da (xr.DataArray) – Data array to be plotted

  • x (str, optional) – X-axis data, by default ‘time’

  • y (str, optional) – Y-axis data, by default ‘lat’

  • wrap_col (str, optional) – Column to wrap around, by default ‘lon’

  • plot_vals (list, optional) – Values (of wrap_col) to plot, by default [0, 45, 90, 135, 180, 225, 270, 315]

  • do_map (bool, optional) – Whether to plot on a map, by default False

  • col_wrap (int, optional) – How many columns to make, by default 4

  • suptitle (str, optional) – Title of plot, by default None

  • vlims (int, optional) – Absolute value of colorbar limits, by default None (auto).

  • cmap (str, optional) – Which matplotlib colormap to use, by default ‘bwr’

  • out_fname (str, optional) – Filename when saving, by default None

  • isel_plotvals (bool, optional) – Set to True if plot_vals are indices rather than values, by default False

  • tight_layout (bool, optional) – Whether to use a tight layout, by default False

  • cbar_label (str, optional) – Label of colorbar, by default None

Return type:

None
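The parameters of panel_plot line up closely with xarray's faceted plotting. A sketch of the selection and colour-limit logic, assuming plot_vals go to .sel (or to .isel when isel_plotvals is True) and that vlims gives symmetric limits [-vlims, vlims]; the helper names are hypothetical and this is not necessarily how panel_plot is implemented.

```python
import numpy as np
import xarray as xr


def select_panels(da, wrap_col="lon", plot_vals=(0, 90, 180, 270), isel_plotvals=False):
    """Pick the slices along wrap_col that become individual panels."""
    if isel_plotvals:
        return da.isel({wrap_col: list(plot_vals)})  # plot_vals are integer indices
    return da.sel({wrap_col: list(plot_vals)})       # plot_vals are coordinate values


def symmetric_limits(vlims):
    """vlims is an absolute value, so the colour range is centred on zero."""
    return (None, None) if vlims is None else (-abs(vlims), abs(vlims))


def draw_panels(da, x="time", y="lat", wrap_col="lon", col_wrap=4, vlims=None, cmap="bwr"):
    """Draw one facet per wrap_col value with xarray's built-in plotting."""
    vmin, vmax = symmetric_limits(vlims)
    return da.plot(x=x, y=y, col=wrap_col, col_wrap=col_wrap, vmin=vmin, vmax=vmax, cmap=cmap)
```

For example, draw_panels(select_panels(da), vlims=5) would draw one panel per selected longitude with a colour range of -5 to 5.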

utility_programs.plotting_routines.panel_with_lt(da, x='time', y='lat', wrap_col='lon', lons=[0, 45, 90, 135, 180, 225, 270, 315], col_wrap=4, suptitle=None, figsize=None, vlims=None, cmap='bwr', out_fname=None, tight_layout=False)

Plot a panel plot of a data array, with local time on the x-axis.

Parameters:
  • da (DataArray) – Data array to be plotted

  • x (str, optional) – X-axis data, defaults to ‘time’

  • y (str, optional) – Y-axis data, defaults to ‘lat’

  • wrap_col (str, optional) – Column to wrap around, defaults to ‘lon’

  • lons (list, optional) – Values (of wrap_col) to plot, defaults to [0, 45, 90, 135, 180, 225, 270, 315]

  • col_wrap (int, optional) – How many columns to make, defaults to 4

  • suptitle (str, optional) – Title of plot, defaults to None

  • figsize (list, optional) – Size of figure, defaults to None (automatic)

  • vlims (int, optional) – Absolute value of colorbar limits, defaults to None (automatic)

  • cmap (str, optional) – Which matplotlib colormap to use, defaults to ‘bwr’

  • out_fname (str, optional) – Filename when saving, defaults to None

  • tight_layout (bool, optional) – Whether to use a tight layout, defaults to False

Utilities

A number of useful utilities are available.

These are not very well organized, but they are all available in the utility_programs module:

utility_programs.filters.filter_xarray_DA_diff(da, dim='time', order=2, percent=False, label='lower')

Calculate the difference of a DataArray along a given dimension.

Parameters:
  • da (xarray DataArray) – DataArray to be filtered

  • dim (str, optional) – Dimension over which to calculate the diff, by default ‘time’

  • order (int, optional) – Order of the diff, by default 2

  • percent (bool, optional) – Return the diff as a percent (True) or as the raw diff (False), by default False

  • label (str, optional) – Label values to the ‘lower’ or ‘upper’ bound, by default ‘lower’

Returns:

Filtered data

Return type:

xarray DataArray

utility_programs.filters.make_fits(da, freq=5, lims=[40, 85], order=1, percent=True)

Apply a bandpass filter to all data previously read in. This is a wrapper around SciPy's Butterworth filter design and forward-backward filtering (scipy.signal.butter and scipy.signal.filtfilt).

Parameters:
  • da (xarray DataArray) – DataArray to be filtered

  • freq (int, optional) – Time between outputs (sampling interval), in the same units as lims, by default 5

  • lims (list, optional) – Period limits of the bandpass filter, in the same units as freq, by default [40, 85]

  • order (int, optional) – Order of the filter, by default 1

  • percent (bool, optional) – Return DataArray as percent over background? Set to False to return the absolute perturbation over background. Defaults to True.

Returns:

DataArray with filtered data.

Return type:

xarray DataArray

Notes

The bandpass filter is applied to the data using the scipy.signal.filtfilt function. This function is a forward-backward filter, meaning that the data is filtered in both the forward and reverse directions. This results in zero phase shift in the filtered data.

The bandpass filter is designed using the scipy.signal.butter function. This function designs a Butterworth filter, which is a type of infinite impulse response filter. The filter is designed using the specified order, and the cutoff frequencies are calculated from the specified limits and the sampling frequency.
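The notes above can be condensed into a short sketch on a plain 1-D array. It assumes lims are the shortest and longest periods to keep, in the same units as freq, so their reciprocals give the cutoff frequencies; the percent-over-background option is omitted, and bandpass_periods is a hypothetical name.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def bandpass_periods(data, freq=5, lims=(40, 85), order=1):
    """Hypothetical helper: zero-phase Butterworth bandpass of a 1-D series.

    freq is the time between samples; lims are the shortest and longest
    periods to keep, in the same units as freq (an assumption).
    """
    fs = 1.0 / freq          # sampling frequency, cycles per time unit
    low = 1.0 / max(lims)    # longest period -> lowest cutoff frequency
    high = 1.0 / min(lims)   # shortest period -> highest cutoff frequency
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    # filtfilt runs the filter forward and then backward, so there is no phase shift
    return filtfilt(b, a, np.asarray(data, dtype=float))
```

With 5-minute outputs, a 60-minute oscillation on top of a constant background passes through almost unchanged, while the constant background is removed.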

utility_programs.filters.remove_outliers(array)

Remove outliers from an array by replacing them with the median value.

Parameters:

array (numpy array or xarray DataArray) – Data to be filtered

Returns:

Filtered data

Return type:

numpy array or xarray DataArray
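The docstring does not say how an outlier is identified. The sketch below assumes a median-absolute-deviation (MAD) rule with a hypothetical threshold n_mad; the real function may use a different criterion. replace_outliers_with_median is a hypothetical name and works on NumPy arrays only.

```python
import numpy as np


def replace_outliers_with_median(array, n_mad=5.0):
    """Replace points far from the median with the median (MAD criterion is an assumption)."""
    data = np.asarray(array, dtype=float)
    median = np.nanmedian(data)
    mad = np.nanmedian(np.abs(data - median))
    if mad == 0:
        return data.copy()  # no spread, so this rule cannot flag anything
    outliers = np.abs(data - median) > n_mad * mad
    return np.where(outliers, median, data)
```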

utility_programs.utils.add_lt_to_dataset(ds, localtimes=[2, 6, 10, 14, 18, 22], pbar=False)

Add localtime as a coordinate to an existing dataset/dataarray

Parameters:
  • ds (xarray.Dataset or xarray.DataArray) – xarray object to add the coordinate to. Can be a Dataset (slower) or a DataArray. It MUST have time as a dimension.

  • localtimes (array-like or int, optional) – Local times to return. If an int, this is the number of local times, evenly spaced from 0 to 24. If list-like, these are the actual local times to return. Be careful adding too many: because the values are interpolated, the more local times requested, the longer this takes. Defaults to [2, 6, 10, 14, 18, 22].

  • pbar (bool, optional) – Set this to True to show a progress bar, which helps when the calculation takes a long time. Defaults to False.

Returns:

ds – Same format as the input, but with a new ‘localtime’ coordinate

Return type:

xarray.Dataset or xarray.DataArray

utility_programs.utils.autoread(file_list, columns_to_return=None, concat_dim='time')

Automatically read in a list of files and concatenate them.

Parameters:
  • file_list (str or list of paths) – List of files to read in.

  • columns_to_return (str or list, optional) – Columns (data_vars) to return. Defaults to None.

  • concat_dim (str, optional) – Dimension to concatenate along. There is rarely a reason to change this. Defaults to ‘time’.

Raises:

ValueError – Column not found in files.

Returns:

Dataset holding requested variable(s).

Return type:

xarray.Dataset

Notes

This function is not well supported; xarray.open_mfdataset() is usually the better choice.

utility_programs.utils.get_var_names(dir, models)

Print out a list of variable names.

Parameters:
  • dir (str) – Directory of outputs.

  • models (str or list) – Name(s) of the model(s).

Return type:

None

Notes

This function prints out a list of variable names for a given model or list of models. It does this by searching for netCDF files in the specified directory that match the model name(s), opening the first file found, and printing out the names of the data variables in the file.

Examples

>>> get_var_names('/path/to/outputs', 'model1')
model1
<xarray.Dataset>
Dimensions:  (time: 10, x: 100, y: 100)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-02-01 ... 2000-10-01
  * x        (x) float64 0.0 1.0 2.0 3.0 4.0 ... 95.0 96.0 97.0 98.0 99.0
  * y        (y) float64 0.0 1.0 2.0 3.0 4.0 ... 95.0 96.0 97.0 98.0 99.0
Data variables:
    var1    (time, y, x) float64 ...
    var2    (time, y, x) float64 ...
    var3    (time, y, x) float64 ...
    var4    (time, y, x) float64 ...
    var5    (time, y, x) float64 ...
    var6    (time, y, x) float64 ...
    var7    (time, y, x) float64 ...
    var8    (time, y, x) float64 ...
    var9    (time, y, x) float64 ...
    var10   (time, y, x) float64 ...

utility_programs.utils.hours_from_storm_onset_into_ds(ds, onset_ut)

Calculate the hours elapsed since an event (e.g. storm onset) and add them to the dataset.

Parameters:
  • ds (xarray.Dataset) – Dataset to add hours from storm onset to

  • onset_ut (datetime.datetime) – Datetime object of storm/event

Raises:

ValueError – If multiple days are in the dataset

Returns:

Dataset with hours from storm onset added

Return type:

xarray.Dataset
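The core calculation is the elapsed time divided by one hour. A sketch with NumPy datetime64 values; hours_since is a hypothetical name, and how the function attaches the result to the Dataset (including the coordinate name) is an assumption, so that step is shown only as a comment.

```python
import numpy as np


def hours_since(times, onset_ut):
    """Hours elapsed between onset_ut and each time (negative before onset)."""
    times = np.asarray(times, dtype="datetime64[ns]")
    onset = np.datetime64(onset_ut, "ns")
    return (times - onset) / np.timedelta64(1, "h")


# Hypothetical attachment step (coordinate name is illustrative only):
# ds = ds.assign_coords(hrs_since_onset=("time", hours_since(ds.time.values, onset_ut)))
```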

utility_programs.utils.make_ccmc_name(modelname, ut, data_type=None)

Make a CCMC-formatted filename. Returns a string of the form: modelname_datatype_YYYY-MM-DDThh-mm-ss.nc

Parameters:
  • modelname (str) – Name of model.

  • ut (datetime) – Datetime object of the time of the file.

  • data_type (str, optional) – Type of data in the file (3DALL/2DANC/ALL/etc.), defaults to None.

Returns:

CCMC-formatted filename.

Return type:

str
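The documented pattern modelname_datatype_YYYY-MM-DDThh-mm-ss.nc maps directly onto a strftime format. A sketch under the assumption that the data_type part is simply dropped when it is None (the docstring does not say); make_ccmc_name_sketch is a hypothetical name.

```python
import datetime as dt


def make_ccmc_name_sketch(modelname, ut: dt.datetime, data_type=None):
    """Build modelname_datatype_YYYY-MM-DDThh-mm-ss.nc (data_type omitted if None: an assumption)."""
    parts = [modelname] if data_type is None else [modelname, data_type]
    return "_".join(parts) + ut.strftime("_%Y-%m-%dT%H-%M-%S") + ".nc"
```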

utility_programs.utils.str_to_ut(in_str)

Convert a string to a datetime object.

Parameters:

in_str (str) – String to convert to datetime object.

Returns:

Datetime object.

Return type:

datetime

utility_programs.utils.ut_to_lt(time_array, glon)

Compute local time from date and longitude.

Parameters:
  • time_array (array-like) – Array-like of datetime objects in universal time

  • glon (array-like or float) – Float or array-like of floats containing geographic longitude in degrees. If a single value or an array of a different shape, all longitudes are applied to all times. If the shape is the same as time_array, the values are paired in the solar local time (SLT) calculation.

Returns:

lt – List of local times in hours

Return type:

array of floats

Raises:

TypeError – For badly formatted input
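The pairing rule for glon can be illustrated with the usual approximation that solar local time advances one hour per 15° of longitude. This is a sketch only; the real function may include further terms. ut_to_lt_sketch is a hypothetical name.

```python
import numpy as np


def ut_to_lt_sketch(time_array, glon):
    """Hypothetical helper: solar local time in hours, wrapped to [0, 24).

    Uses LT = UT + glon / 15 (15 degrees of longitude per hour).
    """
    ut_hours = np.array([t.hour + t.minute / 60 + (t.second + t.microsecond / 1e6) / 3600
                         for t in time_array])
    glon = np.asarray(glon, dtype=float)
    if glon.ndim == 0 or glon.shape == ut_hours.shape:
        lt = ut_hours + glon / 15.0  # scalar, or each time paired with its own longitude
    else:
        lt = ut_hours[:, None] + glon.ravel()[None, :] / 15.0  # every longitude at every time
    return np.mod(lt, 24.0)
```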

Routines to perform temporal calculations.

utility_programs.time_conversion.calc_time_shift(utime)

Calculate the time shift needed to orient a polar dial.

Parameters:

utime (datetime) – Datetime object of the time we’re plotting

Returns:

Time shift in degrees

Return type:

float

utility_programs.time_conversion.datetime_to_epoch(dtime)

Convert datetime to epoch seconds.

Parameters:

dtime (datetime) – Datetime object

Returns:

Seconds since 1 Jan 1965

Return type:

float

utility_programs.time_conversion.epoch_to_datetime(epoch_time)

Convert from epoch seconds to datetime.

Parameters:

epoch_time (float) – Seconds since 1 Jan 1965

Returns:

Datetime object corresponding to epoch_time

Return type:

datetime
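These routines count seconds from 1 Jan 1965 rather than from the Unix epoch of 1970. A standard-library sketch of both conversions, assuming naive datetimes in UT; the function names are hypothetical.

```python
import datetime as dt

EPOCH_1965 = dt.datetime(1965, 1, 1)


def datetime_to_epoch_sketch(dtime: dt.datetime) -> float:
    """Seconds elapsed since 1965-01-01 00:00 (naive datetimes assumed to be UT)."""
    return (dtime - EPOCH_1965).total_seconds()


def epoch_to_datetime_sketch(epoch_time: float) -> dt.datetime:
    """Inverse of datetime_to_epoch_sketch."""
    return EPOCH_1965 + dt.timedelta(seconds=epoch_time)
```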

utility_programs.time_conversion.lt_to_ut(lt, glon)

Compute universal time in hours from local time and longitude.

Parameters:
  • lt (float or array-like) – Local time(s) in hours

  • glon (float or array-like) – Geographic longitude(s) in degrees.

Returns:

Universal time in hours

Return type:

float
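Under the 15° per hour rule, converting local time back to universal time is a subtraction followed by wrapping into [0, 24). A sketch under that assumption; lt_to_ut_sketch is a hypothetical name.

```python
import numpy as np


def lt_to_ut_sketch(lt, glon):
    """Invert LT = UT + glon / 15: UT hours = LT - glon / 15, wrapped to [0, 24)."""
    return np.mod(np.asarray(lt, dtype=float) - np.asarray(glon, dtype=float) / 15.0, 24.0)
```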

utility_programs.time_conversion.ut_to_lt(time_array, glon)

Compute local time from date and longitude.

Parameters:
  • time_array (array-like) – Array-like of datetime objects in universal time

  • glon (array-like or float) – Float or array-like of floats containing geographic longitude in degrees. If a single value or an array of a different shape, all longitudes are applied to all times. If the shape is the same as time_array, the values are paired in the solar local time (SLT) calculation.

Returns:

List of local times in hours

Return type:

array of floats