API REFERENCE
Processing Model Results
To post-process model results, use PostProcessModelResults.py. The help information can be accessed by running:
python PostProcessModelResults.py --help
Process GITM & SAMI data to NetCDF format.
Can use just one model output, if preferred.
More functionality is available in the individual model modules.
This program will process every column into netCDF files by time.
SAMI is regridded according to the default values in RegridSami.main()
This can be adjusted with a custom grid, or use the individual model’s
post-processing routines for more fine control.
This program is designed to be run from the command line. It can also be imported and used as a module, though this is not recommended.
- PostProcessModelResults.main(args)
This will rewrite both GITM and SAMI3 output files to NetCDF files. SAMI3 can optionally be interpolated:
Interpolation
SAMI3_ESMF_Regrid
This module contains functions for processing SAMI raw data for use in ESMF. Input and output grids are calculated and written to file as a 3D UGRID mesh and the ESMF weight file is computed and then applied.
All functions use spherical coordinates with lon & lat in degrees (with lon from 0-360), and alt in km from Earth’s surface.
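As a small illustration of the coordinate convention above, longitudes stored in the -180 to 180 range can be wrapped to 0-360 before they are passed to these functions. This is only a sketch: `wrap_lon_360` is a hypothetical helper and is not part of SAMI3_ESMF_Regrid.

```python
def wrap_lon_360(lon_deg):
    """Wrap a longitude in degrees into the range [0, 360)."""
    # Python's modulo returns a non-negative result for a positive divisor,
    # so -90 becomes 270 and 360 becomes 0.
    return lon_deg % 360.0


# Made-up sample longitudes, some of them in the -180..180 convention.
lons = [-180.0, -90.0, 0.0, 45.5, 360.0]
wrapped = [wrap_lon_360(lon) for lon in lons]
print(wrapped)
```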
- SAMI3_ESMF_Regrid.apply_weight_file(sami_data_path, dtime_sim_start, out_dir, cols='all', progress=True, output_filename=None, skip_time_check=False, custom_input_file=None, temp_dir=None)
Apply the ESMF weight file to the SAMI raw data.
- Parameters:
sami_data_path (str) – Path to SAMI3 raw data
dtime_sim_start (str) – Simulation start date (YYYYMMDD)
out_dir (str) – Output directory, Default is same as sami_data_path
cols (str or list) – Columns to interpolate. Default is ‘all’
progress (bool) – Whether or not to show tqdm progress bar
output_filename (str) – Output filename, Default is to make a new file for each variable.
- Returns:
None
Notes
- This has been sped up and takes ~1 hour for a full SAMI run.
- Multithreading this should be straightforward, but it is not a
priority since the speed is good enough (and Xarray can lock files, which will cause issues).
- Memory usage is ~10GB, so it is likely too heavy to run on a
login node.
- SAMI3_ESMF_Regrid.generate_interior_points_custom_grid(lons, lats, alts, cell_radius=0.5, progress=False)
Generate a 3D mesh from given 3D coordinates and write it to a UGRID file for use in ESMF. The mesh has overlap (degeneracy) which should be fine. The lats, lons, and alts arrays should contain the same number of points.
- Parameters:
lats (numpy.ndarray) – 1D array of latitudes
lons (numpy.ndarray) – 1D array of longitudes
alts (numpy.ndarray) – 1D array of altitudes
cell_radius (float) – Distance from point to corner of cell, in degrees (the altitude extent is 10 * cell_radius). Default is 0.5 degrees.
progress (bool) – Whether or not to show tqdm progress bar
- Returns:
- tuple of 1D arrays of the corner points of the cuboids
to be used as the grid corners in ESMF. These are used for the “vertids” variable in the UGRID mesh ESMF input. [lons, lats, alts, connections]
- Return type:
tuple
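To make the cell_radius description more concrete, the sketch below builds the eight corners of one cell around a single point. It assumes the cell extends cell_radius degrees either side of the point in longitude and latitude, and 10 * cell_radius km either side in altitude; the real function may define the cell differently. `cuboid_corners` is a hypothetical name used only for illustration.

```python
from itertools import product


def cuboid_corners(lon, lat, alt, cell_radius=0.5):
    """Return the 8 (lon, lat, alt) corners of a cell centred on one point.

    Assumption: the cell spans +/- cell_radius degrees in lon and lat and
    +/- 10 * cell_radius km in altitude.
    """
    d_alt = 10.0 * cell_radius
    return [
        (lon + sx * cell_radius, lat + sy * cell_radius, alt + sz * d_alt)
        # Every combination of -1/+1 offsets gives one corner of the cuboid.
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]


# Made-up point: 120 deg lon, -30 deg lat, 400 km altitude.
corners = cuboid_corners(lon=120.0, lat=-30.0, alt=400.0, cell_radius=0.5)
for corner in corners:
    print(corner)
```

A larger cell_radius makes each cell overlap more of the SAMI grid, which is consistent with the advice elsewhere in this reference to increase the cell size when the output contains many zeros or NaNs.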
- SAMI3_ESMF_Regrid.generate_interior_points_output_grid(longitudes, latitudes, altitudes, progress=False)
Generate a 3D mesh from arrays of desired output coordinate values. The input points are 1D arrays of various lengths, denoting the desired output grid (in deg and km above Earth’s surface). The output is a 1D array of the corner points of the cuboids to be used as the grid corners in ESMF. These are used for the “vertids” variable in the UGRID mesh ESMF input.
- Parameters:
longitudes (numpy.ndarray or list) – 1D array of longitudes
latitudes (numpy.ndarray or list) – 1D array of latitudes
altitudes (numpy.ndarray or list) – 1D array of altitudes
progress (bool) – Whether or not to show tqdm progress bar
- Returns:
- tuple of 1D arrays of the corner points of the cuboids
to be used as the grid corners in ESMF. These are used for the “vertids” variable in the UGRID mesh ESMF input. [lons, lats, alts, connections]
- Return type:
tuple
- SAMI3_ESMF_Regrid.generate_interior_points_sami_raw(old_shape, progress=False)
Generates mesh points from SAMI raw data for use in ESMF.
- Parameters:
in_cart (numpy.ndarray) – 3xN array of SAMI points (any coord system)
old_shape (list) – (nlt, nf, nz) shape of original sami outputs.
progress (bool) – Whether or not to show tqdm progress bar
- Returns:
- list of 1D indices of the corners of the cuboids
to be used as the grid corners in ESMF. These are used for the “vertids” variable in the UGRID mesh ESMF input.
- Return type:
list
Notes
- This is not super well documented. Contact me with questions.
- The code here could be made more efficient. Rather than saving
a few seconds at execution, I opted to make it more readable.
It should “just work”.
- The mesh generated does follow ESMF conventions; however,
more points are thrown out than are probably necessary. With the size of the SAMI grid this is not an issue. The points that are thrown out are all at high latitudes or low altitudes, so they do not impact the results in any meaningful way.
- SAMI3_ESMF_Regrid.main(sami_data_path, dtime_sim_start, ESMF_DIR='', num_lons=90, num_lats=180, num_alts=100, alt_step=None, min_alt=100, max_alt=2400, use_log_alt=False, custom_input_file=None, custom_grid_size=0.5, cols='all', progress=False, skip_time_check=False, remake_files=True, out_dir=None, temp_dir=None, output_filename=None, use_mpi=None, do_apply_weights=True)
Main function for processing SAMI raw data for use in ESMF.
- Parameters:
sami_data_path (str) – Path to SAMI3 raw data
dtime_sim_start (str) – Simulation start date (YYYYMMDD)
ESMF_DIR (str) – ABSOLUTE path to ESMF installation. Default is ‘’. This only needs to be set if you are using a user-install of ESMF. In most cases, this will not need to be changed. See ESMF install instructions for more information: https://earthsystemmodeling.org/docs/release/latest/ESMF_usrdoc/
num_lons (int) – Number of longitudes in output grid
num_lats (int) – Number of latitudes in output grid
num_alts (int) – Number of altitudes in output grid
alt_step (int) – Altitude step size in output grid. Use this or num_alts (Default is None)
min_alt (int) – Minimum altitude in output grid
max_alt (int) – Maximum altitude in output grid
use_log_alt (bool) – Use log scale for alts? Default is False (use a linear scale). Not compatible with alt_step. Only used if the output is a grid.
custom_input_file (str) – User-defined input file (.csv with comma sep and a header) (i.e. sat track w/ glon, glat, alt columns.)
custom_grid_size (float) – Size of grid cells to use when regridding to a satellite file. Measured in degrees from center (so it is a radius), and altitude is 10*custom_grid_size. Default is 0.5. If you are receiving a lot of 0’s and NaN’s, you can try increasing this value.
cols (str or list) – Columns to interpolate (comma sep)
progress (bool) – Whether or not to show tqdm progress bar when applying weights. Default is False.
remake_files (bool) – Remake ESMF input files? Default is True (remake ESMF input & weight files). When changing the output coordinates, you must remake the input files.
out_dir (str) – Output directory, Default is same as sami_data_path.
temp_dir (str) – Location to store temp files. Default is the sami_data_path. The temp files are the input and output grid files and the weight file ESMF gives back. None are especially large.
output_filename (str) – Output filename, Default is to make a new file for each variable.
use_mpi (int) – Use MPI for multiprocessing ESMF weight calculation? Specify the number of processors here. Default is None, which runs ESMF single-threaded. Notes: ESMF will need access to MPI even when run single-threaded. Using MPI is not necessary and will not drastically speed anything up, unless your SAMI3 grid is absurdly high resolution. Unless you notice the weight generation taking a long time, leave this as the default value. Weight application is single-threaded.
do_apply_weights (bool) – Option for just generating the weight file or also applying it. Default is to apply weights (in addition to generating). This is mostly just a debug option.
- Returns:
None
Notes
- This is not super well documented. Contact me with questions.
It should “just work”, but may not…
- The mesh generated does follow ESMF conventions; however,
more points are thrown out than are probably necessary. With the size of the SAMI grid this is probably not an issue.
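The custom_input_file option of main() expects a comma-separated file with a header, for example a satellite track with glon, glat and alt columns. The sketch below writes such a file with the standard library. The track values are made up for illustration, and `write_track_csv` is a hypothetical helper, not part of this package.

```python
import csv
import os
import tempfile

# Made-up satellite track points: longitude (deg, 0-360), latitude (deg), altitude (km).
track = [
    {"glon": 10.0, "glat": -45.0, "alt": 450.0},
    {"glon": 12.5, "glat": -40.0, "alt": 451.0},
    {"glon": 15.0, "glat": -35.0, "alt": 452.0},
]


def write_track_csv(path, rows):
    """Write rows to a comma-separated file with a glon,glat,alt header."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["glon", "glat", "alt"])
        writer.writeheader()
        writer.writerows(rows)


csv_path = os.path.join(tempfile.mkdtemp(), "sat_track.csv")
write_track_csv(csv_path, track)

# Read the header back to confirm the column names.
with open(csv_path, newline="") as f:
    header = f.readline().strip()
print(header)
```

The resulting path would then be passed as custom_input_file, with custom_grid_size controlling the size of the cell built around each track point.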
- SAMI3_ESMF_Regrid.write_UGRID_mesh(lon, lat, alt, indices, fname)
Write a UGRID mesh file for given coordinates and indices (interconnects between nodes). See the ESMF documentation (UGRID section) for more info.
- Parameters:
lon (numpy.ndarray) – 1D array of longitudes
lat (numpy.ndarray) – 1D array of latitudes
alt (numpy.ndarray) – 1D array of altitudes
indices (numpy.ndarray) – 2D array of indices, shape (N, 8)
fname (str) – Output filename
- Returns:
None
Notes
- The mesh must be created elsewhere; this just writes the file.
- This function resides in a file with other functions to help
generate the mesh interconnects.
- The provided indices must be in the correct order; see the
ESMF documentation for more info.
- This function is used for both the input and output meshes,
and can be used for anything else, provided the arguments are correct.
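Because write_UGRID_mesh only writes what it is given, it can help to sanity-check the arrays first. The sketch below checks that the coordinate arrays have equal length and that every cell lists 8 valid node indices. It assumes 0-based indices and does not verify the node ordering that ESMF requires. `check_mesh_inputs` is a hypothetical helper, not part of this module.

```python
def check_mesh_inputs(lon, lat, alt, indices):
    """Basic shape and range checks before writing a mesh (hypothetical helper)."""
    n_nodes = len(lon)
    if not (len(lat) == n_nodes and len(alt) == n_nodes):
        raise ValueError("lon, lat and alt must have the same length")
    for cell in indices:
        if len(cell) != 8:
            raise ValueError("each cell needs exactly 8 node indices")
        # Assumption: indices are 0-based positions into lon/lat/alt.
        if any(i < 0 or i >= n_nodes for i in cell):
            raise ValueError("node index out of range")
    return True


# One cuboid cell built from 8 nodes (made-up coordinates).
lon = [0, 1, 1, 0, 0, 1, 1, 0]
lat = [0, 0, 1, 1, 0, 0, 1, 1]
alt = [100, 100, 100, 100, 110, 110, 110, 110]
indices = [[0, 1, 2, 3, 4, 5, 6, 7]]

ok = check_mesh_inputs(lon, lat, alt, indices)
print(ok)
```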
Reading Data
GITM
- utility_programs.read_routines.GITM.auto_read(gitm_dir, single_file=False, start_dtime=None, start_idx=None, end_dtime=None, end_idx=None, cols='all', progress_bar=True, drop_ghost_cells=True, file_type=None, return_xarray=True, force_dict=False, parallel=True, engine='h5netcdf', use_dask=False)
Automatically reads in a directory of GITM files.
- Parameters:
gitm_dir (str) – Directory of GITM files.
single_file (bool, optional) – Whether to read in a single file. Defaults to False.
start_dtime (datetime, optional) – Start time of the data you want. Defaults to None.
start_idx (int, optional) – Start index of the data you want. Defaults to None.
end_dtime (datetime, optional) – End time of the data you want. Defaults to None.
end_idx (int, optional) – End index of the data you want. Defaults to None.
cols (list-like or str, optional) – List of columns you want to read in. Defaults to ‘all’.
progress_bar (bool, optional) – Whether to show a progress bar. Defaults to True. Requires tqdm.
drop_ghost_cells (bool, optional) – Whether to drop ghost cells. Defaults to True.
file_type (str, optional) – File type of the data you want to read in. Defaults to None.
return_xarray (bool, optional) – Whether to return an xarray. Defaults to True.
force_dict (bool, optional) – Whether to force a dictionary return. Defaults to False.
parallel (bool, optional) – Whether to read in files in parallel. Defaults to True. This will use Dask, which can get hairy. If you’re having issues, try setting this to False. Needs dask and dask.distributed
engine (str, optional) – The engine to use for reading in the data. Defaults to ‘h5netcdf’.
use_dask (bool, optional) – Whether to use Dask for reading in the data. Defaults to False.
- Returns:
The data read in from the GITM files.
- Return type:
xarray.Dataset or dict
- utility_programs.read_routines.GITM.find_variable(gitm_dir, varname=None, varhelp=False, nc=True)
- Help function. Finds a variable in a directory of GITM files.
Return the filetype and/or all of the variables available.
- Parameters:
gitm_dir (str, path-like) – Directory of GITM files.
varname (str, optional) – Variable you’re looking for. Not setting this will just print all variables. Defaults to None.
varhelp (bool, optional) – If True, will print out all variables available. Think of it as “just checking”. Defaults to False.
nc (bool, optional) – Whether to only look at .nc files. Defaults to True.
- Raises:
ValueError – If you don’t specify either varhelp or varname.
- Returns:
The filetype holding the variable you’re looking for.
- Return type:
str (optional)
- utility_programs.read_routines.GITM.gitm_times_from_filelist(file_list, century_prefix='20')
Generate datetimes from a list of GITM files.
- Parameters:
file_list (list-like) – list of gitm files to parse
century_prefix (str, optional) – Which century? Defaults to ‘20’.
- Raises:
ValueError – Incorrect file format.
- Returns:
List of datetimes in the same order as the filelist input.
- Return type:
list
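The idea behind gitm_times_from_filelist can be sketched with the standard library. This sketch assumes GITM output names follow the pattern `<TYPE>_tYYMMDD_HHMMSS.bin` (two-digit year, hence the century prefix); it is not the module's actual implementation, and `times_from_names` is a hypothetical name.

```python
from datetime import datetime
import os
import re

# Assumed pattern: a "_t" marker, then YYMMDD, an underscore, then HHMMSS.
_TIME_RE = re.compile(r"_t(\d{6})_(\d{6})")


def times_from_names(file_list, century_prefix="20"):
    """Parse one datetime per file name, keeping the input order."""
    times = []
    for name in file_list:
        match = _TIME_RE.search(os.path.basename(name))
        if match is None:
            raise ValueError(f"Unrecognised file name: {name}")
        # Prepend the century so the two-digit year becomes a four-digit year.
        stamp = century_prefix + match.group(1) + match.group(2)
        times.append(datetime.strptime(stamp, "%Y%m%d%H%M%S"))
    return times


# Made-up file names in the assumed format.
files = ["3DALL_t110521_000000.bin", "3DALL_t110521_001500.bin"]
times = times_from_names(files)
print(times)
```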
- utility_programs.read_routines.GITM.process_all_to_cdf(gitm_dir, out_dir=None, dtime_storm_start=None, delete_bins=False, replace_cdfs=False, progress_bar=True, drop_ghost_cells=True, drop_before=None, drop_after=None, skip_existing=False, file_types='all', use_ccmc=True, single_file=False, run_name=None, tmp_dir=None)
Process all GITM .bin files in a directory to .cdf files.
- Parameters:
gitm_dir (str, path-like) – Directory containing GITM .bin files.
out_dir (str, path-like, optional) – Directory to output .cdf files. If None, will go into the same directory as the .bin files. Defaults to None.
dtime_storm_start (datetime, optional) – Attribute added to the netCDF file. Defaults to None.
delete_bins (bool, optional) – Delete GITM bins after making Datasets? Defaults to False.
replace_cdfs (bool, optional) – Replace pre-existing netCDF files? Defaults to False.
progress_bar (bool, optional) – Whether or not to show progress bar. Requires tqdm. Defaults to True. If outputting to a single file, a progress bar will be added when writing files to disk. This cannot be changed.
drop_ghost_cells (bool, optional) – Drop GITM ghost cells? Defaults to True.
drop_before (datetime, optional) – Similar to start_dtime. When to start processing files. Will delete files before this time. Defaults to None.
drop_after (datetime, optional) – Similar to end_dtime. When to stop processing files. Will delete files after this time. Defaults to None.
skip_existing (bool, optional) – Skip existing netCDF files? Defaults to False. This will slow down the program significantly.
file_types (str or list-like, optional) – Which file types to process. Defaults to ‘all’. Can be a list of strings or a single string. Example usage is [‘3DALL’, ‘2DALL’] or ‘3DALL’.
use_ccmc (bool, optional) – Write files with CCMC naming convention? Defaults to True. Recommended if not using single_file.
single_file (bool, optional) – Output to a single file? Defaults to False. If True, will output to a single netCDF file. If False, will output to multiple netCDF files, one for each time.
run_name (str, optional) – Name of the run. Only used if single_file. Defaults to None. ‘_GITM.nc’ will be appended to this.
tmp_dir (str, optional) – Temporary directory to write files to. Only used if single_file. Defaults to None. Some systems have a local temp directory that’s much faster than the standard output_directory.
- utility_programs.read_routines.GITM.read_bin_to_xarray(filename, drop_ghost_cells=True, cols='all')
Reads a GITM binary file into xarray. Works for all GITM files, including 3DALL, 2DALL, 2DANC, etc. (Taken and modified from aetherpy.)
- Parameters:
filename (str, path) – Path to the file to read.
drop_ghost_cells (bool, optional) – Drop GITM ghost cells. See GITM manual for details on ghost cells. Defaults to True.
cols (str/list-like, optional) – Set which columns to read. On systems with limited memory, reading all columns can make datasets too large to fit into memory. Defaults to ‘all’ (all columns).
- Raises:
IOError – File does not exist
- Returns:
- Dataset holding the data.
Indexed with glat, glon, alt (converted to deg, deg, km)
- Return type:
xarray.Dataset
- utility_programs.read_routines.GITM.read_multiple_bins_to_xarray(file_list, start_dtime=None, end_dtime=None, start_idx=0, end_idx=-1, drop_ghost_cells=True, cols='all', pbar=False)
Read a list-like of GITM files into an xarray Dataset.
- Parameters:
file_list (list-like) – files to pull from.
start_dtime (datetime, optional) – Time to start read at. Not necessary (especially if you have pre-filtered the file_list). Defaults to None.
end_dtime (datetime, optional) – Time to end reads at. See above. Can be used exclusively. Defaults to None.
start_idx (int, optional) – Index of file_list to start reading. Defaults to 0.
end_idx (int, optional) – Index of file_list to end reading at. Defaults to -1.
drop_ghost_cells (bool, optional) – Remove Ghost cells? Defaults to True.
cols (str or list-like, optional) – Specific columns to read. Defaults to ‘all’.
pbar (bool, optional) – Whether or not to show progress bar. Requires tqdm. Defaults to False.
- Raises:
ValueError – If start/end inputs are mixed up.
- Returns:
- Dataset containing all variables in the file_list at the
times specified.
- Return type:
xarray.Dataset
SAMI3
Read SAMI data.
Various routines to handle reading raw SAMI3 outputs.
Author: Aaron Bukowski
- utility_programs.read_routines.SAMI.auto_read(sami_dir, cols='all', split_by_time=False, split_by_var=False, whole_run=False, return_xarray=True, filetype='SAMI-REGRID', force_nparrays=False, dtime_sim_start=None, parallel=True, start_dtime=None, start_idx=None, end_dtime=None, end_idx=None, hrs_before_storm_start=None, hrs_after_storm_start=None, dtime_storm_start=None, progress_bar=False, use_dask=False, engine='h5netcdf', skip_time_check=False)
Automatically reads in SAMI data and returns it in a format of your choice.
- Preference is to read/return xarray datasets, but can read
and return numpy arrays.
Prefer whole files, fall back on time, variable, split.
- Parameters:
sami_dir (str, path-like) – Path to the directory containing the SAMI data
cols (str or list-like, optional) – Variables to return. Defaults to ‘all’.
split_by_time (bool, optional) – If files are output by time (and whether to prefer those files). Defaults to False.
split_by_var (bool, optional) – If files are output by variable. And to prefer those files. Defaults to False.
whole_run (bool, optional) – If the whole run is in one file. Defaults to False.
return_xarray (bool, optional) – Return xarray dataset? Defaults to True.
force_nparrays (bool, optional) – Force program to return dicts of numpy arrays. Defaults to False.
dtime_sim_start (datetime, optional) – Datetime of the start of the simulation. Defaults to None. Required if netCDF files aren’t made. If netCDF files were made, you had the option to add this as an attribute to the file.
parallel (bool, optional) – Force parallel reading of files. NetCDF files are read weird. Might be buggy. Defaults to True.
start_dtime (datetime, optional) – Datetime to start reading data. Defaults to None.
start_idx (int, optional) – Index of the first time to read. Defaults to None.
end_dtime (datetime, optional) – Datetime to stop reading data. Defaults to None
end_idx (int, optional) – Index of the last time to read. Defaults to None.
hrs_before_storm_start (int, optional) – Hours before the storm start to read data. Defaults to None. (dtime_storm_start must be set)
hrs_after_storm_start (int, optional) – Hours after the storm start to read data. Defaults to None. (dtime_storm_start must be set)
dtime_storm_start (datetime, optional) – Datetime of the storm start. Defaults to None. (hrs_before_storm_start and hrs_after_storm_start must be set)
progress_bar (bool, optional) – Show progress bar? Defaults to False. Requires tqdm.
- Returns:
- xarray.Dataset: Dataset of the SAMI data
(If return_xarray is True)
- dict: Dictionary of numpy arrays of the SAMI data
(If force_nparrays is True)
- Return type:
xarray.Dataset or dict
- utility_programs.read_routines.SAMI.get_grid_elems_from_parammod(sami_data_path)
- Go into sami data directory and get the grid elements
from the parameter_mod.f90 file.
- Parameters:
sami_data_path (str) – data path for sami outputs
- Returns:
- nz: num. grid points along each field line
- nf: num. field lines along each magnetic longitude
- nlt: num. magnetic longitudes
- nt: num. time steps
- utility_programs.read_routines.SAMI.get_postprocessed_grid(sami_data_path)
- Go into sami data directory and get the grid elements
from the parameter_mod.f90 file.
- Parameters:
sami_data_path (str) – data path for sami outputs
- Returns:
- nx: num. grid points along each field line
- ny: num. field lines along each magnetic longitude
- utility_programs.read_routines.SAMI.get_sami_grid(sami_data_path, nlt, nf, nz)
Read in SAMI grid files.
- Parameters:
sami_data_path (str) – path to SAMI data
nlt (int) – Number of magnetic local times (lons)
nf (int) – Number of field lines along each longitude
nz (int) – Number of grid cells along each field line
geo_grid_files (dict, optional) – Files to use for getting the grid. Defaults to { ‘glat’: ‘glatu.dat’, ‘glon’: ‘glonu.dat’, ‘alt’: ‘zaltu.dat’, ‘mlat’: ‘blatu.dat’, ‘mlon’: ‘blonu.dat’, ‘malt’: ‘baltu.dat’}.
- Returns:
SAMI3 grid in a dictionary with keys: ‘glat’, ‘glon’, ‘alt’, ‘mlat’, ‘mlon’, ‘malt’
- Return type:
dict
- utility_programs.read_routines.SAMI.make_times(nt, sami_data_path, dtime_sim_start, dtime_storm_start=None, hrs_before_storm=None, hrs_after_storm=None, need_help=False, skip_time_check=False)
Make a list of datetime objects for each time step from the time.dat file.
- Parameters:
nt (int) – Number of time steps (from get_grid_elems_from_parammod)
sami_data_path (str) – Path to sami data
dtime_storm_start (datetime.datetime) – Datetime of the start of the storm
hrs_before_storm (int, optional) – Hours from the onset of the storm (or any event, really) to begin processing. Set to -1 to run for the entire simulation. Defaults to None.
hrs_after_storm (int, optional) – Hours from the end of the storm to stop processing. Set to -1 to run for the entire simulation. Defaults to None.
need_help (bool, optional) – If set to True, the time list will be printed (useful when getting acquainted with the run).
- Raises:
ValueError – Sometimes SAMI outputs fake time steps.
ValueError – You only set one of hrs_before_storm or hrs_after_storm.
- Returns:
tuple containing
- times (list):
List of datetime objects for each time step
- hrs_since_storm_start (list):
List of (float) hours since the storm start
- start_idx (int):
Start index for the times list,
calculated from hrs_before_storm (ONLY if hrs_before_storm)
- end_idx (int):
End index for the times list,
calculated from hrs_after_storm (ONLY if hrs_after_storm)
- Return type:
tuple
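The relationship between the returned times, hrs_since_storm_start and the start/end indices can be sketched as follows. The sketch assumes the time file provides hours elapsed since the simulation start and that both window limits are measured from dtime_storm_start; it does not handle the special value -1. `build_times` and `window_indices` are hypothetical names, not the module's API.

```python
from datetime import datetime, timedelta


def build_times(hours_since_start, dtime_sim_start, dtime_storm_start):
    """Turn elapsed hours into datetimes and hours relative to the storm start."""
    times = [dtime_sim_start + timedelta(hours=h) for h in hours_since_start]
    hrs_since_storm = [
        (t - dtime_storm_start).total_seconds() / 3600.0 for t in times
    ]
    return times, hrs_since_storm


def window_indices(hrs_since_storm, hrs_before_storm, hrs_after_storm):
    """First and last index with -hrs_before <= hours since storm <= hrs_after."""
    inside = [
        i for i, h in enumerate(hrs_since_storm)
        if -hrs_before_storm <= h <= hrs_after_storm
    ]
    if not inside:
        raise ValueError("No time steps fall inside the requested window")
    return inside[0], inside[-1]


# Made-up run: outputs every 6 hours for 3 days, storm 36 hours after start.
sim_start = datetime(2011, 5, 20)
storm_start = datetime(2011, 5, 21, 12)
hours = [float(h) for h in range(0, 72, 6)]

times, hrs_since_storm = build_times(hours, sim_start, storm_start)
start_idx, end_idx = window_indices(hrs_since_storm, 12, 24)
print(start_idx, end_idx, times[start_idx], times[end_idx])
```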
- utility_programs.read_routines.SAMI.process_all_to_cdf(sami_data_path, dtime_sim_start, dtime_storm_start=None, progress_bar=False, start_dtime=None, end_dtime=None, out_dir=None, use_ccmc=True, split_by_time=True, split_by_var=False, whole_run=False, run_name=None, OVERWRITE=False, delete_raw=False, append_files=False, low_mem=True, cols='all', skip_time_check=False)
Process SAMI binary files to netcdf format.
- Parameters:
sami_data_path (str) – Path to SAMI data.
dtime_sim_start (datetime) – Simulation start time.
progress_bar (bool, optional) – Show progress bar. Defaults to False. Requires tqdm
start_dtime (datetime, optional) – datetime to start reading data. Defaults to None.
end_dtime (datetime, optional) – datetime to stop reading data. Defaults to None.
out_dir (str, optional) – Directory to save netcdf files. Defaults to sami_data_path.
split_by_time (bool, optional) – Split files by time. Defaults to True.
split_by_var (bool, optional) – Split files by variable. Defaults to False.
whole_run (bool, optional) – Save whole model run (in time range) as one netcdf. Defaults to False.
OVERWRITE (bool, optional) – Overwrite existing files. Defaults to False.
append_files (bool, optional) – Append to existing files.
low_mem (bool, optional) – Read data in chunks to save memory. Defaults to True.
cols (list-like or str, optional) – List of columns to read. Defaults to ‘all’.
- Raises:
ValueError – If incorrect time args are given.
ValueError – If files exist and OVERWRITE is False.
ValueError – If cols is not in available columns.
- Returns:
None
- utility_programs.read_routines.SAMI.read_raw_to_xarray(sami_data_path, dtime_sim_start, cols='all', hrs_before_storm_start=None, hrs_after_storm_start=None, dtime_storm_start=None, start_dtime=None, end_dtime=None, start_idx=None, end_idx=None, progress_bar=False, skip_time_check=False)
- Read in (raw) SAMI data and return an xarray dataset.
! This only works on raw, pre-processed SAMI data ! (not TEC or anything like that)
- Parameters:
sami_data_path (str, path-like) – Directory of SAMI files.
dtime_sim_start (datetime) – Start time of simulation.
cols (str or list-like, optional) – Model outputs to read. Defaults to ‘all’.
hrs_before_storm_start (int, optional) – Hours before storm onset to read data from. Need to set dtime_storm_start. Defaults to None.
hrs_after_storm_start (int, optional) – Hours after storm onset to read data from. Need to set dtime_storm_start. Defaults to None.
dtime_storm_start (datetime, optional) – storm/event start time. Only used if hrs_before/after is set. Defaults to None.
start_dtime (datetime, optional) – datetime to start reading data. Defaults to None.
end_dtime (datetime, optional) – datetime to stop reading data. Defaults to None.
start_idx (int, optional) – Index of time list to start. Defaults to None.
end_idx (int, optional) – Index of time list to stop. Defaults to None.
progress_bar (bool, optional) – Show progress bar. Defaults to False. (Requires tqdm)
- Raises:
ValueError – Invalid inputs
ValueError – Missing Files
- Returns:
Dataset of SAMI data.
- Return type:
xarray.Dataset
- utility_programs.read_routines.SAMI.read_sami_dene_tec_MAG_GRID(sami_data_path, dtime_sim_start=None, reshape=True)
Read in TEC (and interpolated dene) data!
- Parameters:
sami_data_path (str) – path to SAMI data
dtime_sim_start (datetime.datetime) – datetime of the start of the simulation
reshape (bool, optional) – reshape the data to the correct shape, defaults to True. Otherwise, the data will be returned as a 1D array.
- Returns:
SAMI data, times
- Return type:
dict, np.array
- utility_programs.read_routines.SAMI.read_to_nparray(sami_data_path, dtime_sim_start, dtime_storm_start=None, hrs_before_storm=None, hrs_after_storm=None, pbar=False, cols='all', need_help=False, skip_time_check=False)
Automatically read in SAMI data.
- Parameters:
sami_data_path (str) – Path to SAMI data
dtime_storm_start (datetime.datetime) – Datetime of the start of the storm
dtime_sim_start (datetime.datetime-like or str) – Datetime of the start of the simulation
t_start_idx (int, optional) – Time index of the start of the data return. Defaults to None.
t_end_idx (int, optional) – Time index of the end of the data return. Defaults to None.
pbar (bool, optional) – Do you want to show a progress bar? It is automatically set if tqdm is successfully imported. Defaults to False.
cols (str or list-like, optional) – List of columns to get data for. Defaults to ‘all’.
need_help (bool, optional) – Prints time and variable info. Defaults to False.
- Raises:
KeyError – If given cols is not valid
FileNotFoundError – If the filepath is invalid
- Returns:
- Dictionary of SAMI data with keys: [‘grid’, ‘data’]
data is in np arrays with the shape [nlt,nf,nz]
- np.array:
Times of the data
- Return type:
dict
Plotting
After converting files to netCDF, you can plot them using the following:
- basic_plots_from_netcdf.autoplot(file_list, columns_to_plot, output_dir=None, show_map=False, time_lims=[0, -1], cut_dict={}, lim_dict={}, loop_var='time', process_options=None, plot_arg_dict=None, concat_dim='time')
Plot data from netCDF files.
- Parameters:
file_list (list of str or str) – List of file paths to netCDF files.
columns_to_plot (str or list of str) – Name(s) of the variable(s) to plot.
output_dir (str, optional) – Directory to save the plots. If not specified, plots will not be saved to the same directory as file_list.
show_map (bool, optional) – Whether to plot the data on a map. Default is False.
time_lims (list of int, optional) – Time limits to plot. Default is [0, -1], which plots all available times.
cut_dict (dict, optional) – Dictionary of cuts to apply to the data. Default is an empty dictionary (no cuts). Format as {‘lon’: 240, ‘alt’:450}.
lim_dict (dict, optional) – Dictionary of limits to apply to the data. Default is an empty dictionary.
loop_var (str, optional) – Name of the variable to loop over. This will make plots for all values of the variable (within the limits specified). Default is ‘time’.
process_options (dict, optional) – Dictionary of processing options to apply to the data. Default is None. See run_processing_options() for supported options.
plot_arg_dict (dict, optional) – Dictionary of arguments to pass to the plot function. Default is None.
concat_dim (str, optional) – Name of the dimension to concatenate the data along when reading netCDF files with Dask. Optional. Only change this if you are having trouble reading in files. Default is ‘time’.
- Raises:
ValueError – If altitude is selected when using alt_int, or if lon/lat cuts are used when making maps.
- Return type:
None
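The cut_dict format can be pictured as a request for the grid point closest to each given value. The sketch below maps a cut_dict onto made-up coordinate arrays with a nearest-neighbour lookup. How autoplot selects the cut internally is not stated here, so nearest-neighbour selection is an assumption, and `nearest_index` is a hypothetical helper.

```python
def nearest_index(values, target):
    """Index of the entry in values that is closest to target."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))


# Made-up coordinate arrays standing in for a dataset's lon and alt axes.
coords = {
    "lon": [0.0, 60.0, 120.0, 180.0, 240.0, 300.0],
    "alt": [100.0, 250.0, 400.0, 550.0],
}

# Same format as the cut_dict argument described above.
cut_dict = {"lon": 240, "alt": 450}

cut_indices = {
    dim: nearest_index(coords[dim], value) for dim, value in cut_dict.items()
}
print(cut_indices)
```

With these sample axes the requested altitude of 450 km falls between grid points, so the cut lands on the 400 km level.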
- basic_plots_from_netcdf.run_processing_options(ds, process_options)
Process the given xarray.Dataset according to the specified options.
- Parameters:
ds (xarray.Dataset) – The dataset to be processed.
process_options (str or list) – The processing options to be applied to the input dataset. Currently supported options: ‘alt_int’ (integrate over altitude), ‘bandpass’ (apply bandpass filter), ‘transpose’ (transpose the dataset).
- Returns:
The processed dataset.
- Return type:
xarray.Dataset
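As an illustration of what an altitude integration such as ‘alt_int’ computes, the sketch below integrates one vertical profile with the trapezoidal rule. The numerical scheme used by run_processing_options is not documented here, so the trapezoidal rule is an assumption, and `integrate_over_altitude` is a hypothetical helper.

```python
def integrate_over_altitude(values, alts_km):
    """Trapezoidal integral of a profile sampled at the given altitudes."""
    if len(values) != len(alts_km):
        raise ValueError("values and alts_km must have the same length")
    total = 0.0
    for i in range(len(alts_km) - 1):
        dz = alts_km[i + 1] - alts_km[i]
        # Area of the trapezoid between two neighbouring altitude levels.
        total += 0.5 * (values[i] + values[i + 1]) * dz
    return total


# Made-up profile on a uniform 100 km altitude grid.
alts_km = [100.0, 200.0, 300.0, 400.0]
values = [1.0, 3.0, 3.0, 1.0]

column = integrate_over_altitude(values, alts_km)
print(column)
```

The result carries the units of the variable multiplied by km, so a density in m^-3 would need the altitude step converted to metres to give a column density.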
More plotting routines can be accessed through the utility_programs.plotting_routines module:
- utility_programs.plotting_routines.custom_panels_keos(da, numplots=8, sel_col='localtime', max_per_row=4, suptitle=None, vmin=None, vmax=None, sharex=True, sharey=True, x='time', cmap='rainbow', one_colorbars=True, colorbar_label='')
A script to make a panel of keogram-like plots.
- Parameters:
da (xarray DataArray) – Data array to be plotted.
numplots (int, optional) – Number of plots to make, by default 8.
sel_col (str, optional) – Column to select data from, by default ‘localtime’.
max_per_row (int, optional) – Maximum number of plots per row, by default 4.
suptitle (str, optional) – Title of plot, by default None.
vmin (int, optional) – Minimum value of colorbar, by default None.
vmax (int, optional) – Maximum value of colorbar, by default None.
sharex (bool, optional) – Share x-axis?, by default True.
sharey (bool, optional) – Share y-axis?, by default True.
x (str, optional) – X-axis data, by default ‘time’.
cmap (str, optional) – Which matplotlib colormap to use, by default ‘rainbow’.
one_colorbars (bool, optional) – Use one colorbar for all plots?, by default True.
colorbar_label (str, optional) – Label of colorbar, by default ‘’.
- Returns:
Data from imshow (a figure).
- Return type:
Matplotlib imshow object
- utility_programs.plotting_routines.draw_field_line_plot(x, y, z, title=None, interpolate=False, x_label='Mlat (Deg)', y_label='Altitude (km)', x_lims=[-65, 65], y_lims=[0, 1200], cbar_label=None, cbar_lims=None, ax=None, cmap='viridis', fname=None, save_or_show='save', fpeak_col=None)
Draw a plot of data along a single field line (longitude).
- Parameters:
x (numpy array) – X-axis data, or latitude
y (numpy array) – Y-axis data, or altitude
z (numpy array) – Z-axis (color) data, or data to be plotted
title (str, optional) – Title of generated plot, by default None
interpolate (bool, optional) – Interpolate the data (to the x-y grid), by default False
x_label (str, optional) – Label of x-axis, by default ‘Mlat (Deg)’
y_label (str, optional) – Label of y-axis, by default ‘Altitude (km)’
x_lims (list, optional) – Limits of x-axis, by default [-65, 65]
y_lims (list, optional) – Limits of y-axis, by default [0, 1200]
cbar_label (str, optional) – Label of colorbar, by default None
cbar_lims (list, optional) – Limits of colorbar, by default None
ax (Matplotlib axes, optional) – If plotting on an existing axis, specify it here, by default None
cmap (str, optional) – Which matplotlib colormap to use, by default ‘viridis’
fname (str, optional) – Filename when saving, by default None
save_or_show (str, optional) – Save, show, or return the plot (“save”, “show”, or “return”), by default ‘save’
fpeak_col (numpy array, optional) – To plot the F-peak location, give the latitude-altitude coordinates here, by default None
- Raises:
ValueError – If fname exists and OVERWRITE is False
ValueError – If save_or_show is not “save”, “show”, or “return”
ValueError – If fname is not given when save_or_show is “save”
- Returns:
Data from imshow
- Return type:
Matplotlib imshow object
- utility_programs.plotting_routines.draw_map(data_arr, cbarlims=[None, None], title=None, ylims=None, cbar_label=None, y_label='Latitude (deg)', x_label='Longitude (deg)', save_or_show='save', cmap='viridis', ax=None, fname=None, OVERWRITE=False, **kwargs)
Draw a map of the data array.
- Parameters:
data_arr (xarray DataArray) – Data array to be plotted
cbarlims (list, optional) – Colorbar limits as [vmin, vmax], defaults to None (automatic limits)
title (str, optional) – Title to draw, defaults to None
ylims (list, optional) – Limits of y-axis [ymin, ymax], defaults to None
cbar_label (str, optional) – Label of colorbar, defaults to None
y_label (str, optional) – Label of y-axis, defaults to “Latitude (deg)”
x_label (str, optional) – Label of x-axis, defaults to “Longitude (deg)”
save_or_show (str, optional) – Save or show plots? Defaults to “save”
cmap (str, optional) – Which matplotlib colormap to use, defaults to ‘viridis’
ax (Matplotlib axes, optional) – If plotting on an existing axis, specify it here, defaults to None
fname (str, optional) – Filename when saving, defaults to None
OVERWRITE (bool, optional) – Overwrite existing files when saving, defaults to False
- Raises:
ValueError – If fname exists and OVERWRITE is False
ValueError – If save_or_show is not “save”, “show”, or “return”
ValueError – If fname is not given when save_or_show is “save”
- Returns:
Data from imshow
- Return type:
Matplotlib imshow object
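The three ValueError conditions listed above can be read as a small validation step that runs before any plotting. The following is a minimal sketch of that reading, not the library's own code: check_save_args is a hypothetical helper, and applying the overwrite check only in “save” mode is an assumption.

```python
import os

VALID_MODES = ("save", "show", "return")


def check_save_args(save_or_show="save", fname=None, overwrite=False):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    if save_or_show not in VALID_MODES:
        raise ValueError(
            f"save_or_show must be one of {VALID_MODES}, got {save_or_show!r}"
        )
    if save_or_show == "save":
        # A filename is mandatory when the plot is written to disk.
        if fname is None:
            raise ValueError("fname is required when save_or_show is 'save'")
        # ASSUMPTION: the overwrite check only matters when saving.
        if os.path.exists(fname) and not overwrite:
            raise ValueError(f"{fname} exists and OVERWRITE is False")
    return save_or_show


mode = check_save_args("show")  # passes: no filename needed to show a plot
```

Running the checks up front means a long plotting job fails immediately on a bad argument rather than after the figure has been drawn.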
- utility_programs.plotting_routines.loop_panels(da, ncols, start_time, time_delta='1 hour', sel_criteria=None, suptitle=None, title=None, col_names=None, save=None, lon_labels=False, mask_dials=False, **plotargs)
Convenience routine for making panel plots, intended to be easier to interface with than building the panels by hand.
- Parameters:
da (xarray DataArray) – Data array to be plotted.
ncols (int) – Number of columns to plot.
start_time (str) – Start time of plot to be converted to pd.Timestamp. Format as ‘YYYY-MM-DD HH:MM:SS’; ‘2011-05-20’ is acceptable (00:00 UT).
time_delta (str, optional) – Time between each plot, defaults to ‘1 hour’.
sel_criteria (dict, optional) – Criteria to select data (i.e. plotting multiple altitudes), defaults to None.
suptitle (str, optional) – Title of plot, defaults to None.
title (str, optional) – Title of each subplot, defaults to None.
col_names (list of str, optional) – Name(s) of column(s) to plot, defaults to None (use this when plotting a DataSet instead of a DataArray).
save (str, optional) – Filename to save plot to, defaults to None.
lon_labels (bool, optional) – Show longitude labels, defaults to False.
mask_dials (int, optional) – Mask values below this value, defaults to False.
- Returns:
Data from imshow (a figure).
- Return type:
Matplotlib imshow object
- utility_programs.plotting_routines.make_a_keo(arr, title=None, cbarlims=[None, None], cbar_name=None, y_label='Latitude (deg)', x_label='Index of Time-step (relative to plot start)', p_extent=None, save_or_show='save', cmap='viridis', fname=None, ax=None, ylims=None, OVERWRITE=False, **kwargs)
Generate a keogram plot from a data array.
- Parameters:
arr (xarray DataArray) – Data array to be plotted
title (str, optional) – Title of plot, defaults to None
cbarlims (list, optional) – Colorbar limits, Defaults to [None, None] (automatic)
cbar_name (str, optional) – Label for colorbar, defaults to None
y_label (str, optional) – Y-axis label, defaults to “Latitude (deg)”
x_label (str, optional) – Label for x-axis, defaults to “Index of Time-step (relative to plot start)”
p_extent (list-like, optional) – Plot extent. Set to 0 for auto.
save_or_show (str, optional) – Save, show, or return plots? Defaults to “save”
cmap (str, optional) – Colormap to use to plot, defaults to ‘viridis’
fname (str, optional) – Filename to save plot to, defaults to None
ax (Matplotlib axes, optional) – If plotting on an existing axis, specify it here, defaults to None
ylims (int, optional) – Limits of y axis, defaults to None
OVERWRITE (bool, optional) – Overwrite existing plot, defaults to False
- Raises:
ValueError – If fname exists and OVERWRITE is False
ValueError – If save_or_show is not “save”, “show”, or “return”
ValueError – If fname is not given when save_or_show is “save”
- Returns:
Data from imshow
- Return type:
Matplotlib imshow object
- utility_programs.plotting_routines.map_and_dials(dial_da, total, map_da=None, max_per_row=3, isel_dials=None, sel_dials=None, isel_map=None, sel_map=None, quiver_map_cols=None, suptitle=None, suptitlesize='large', time_start=None, time_delta='1 hour', save=None, mask_dials=0.001, mask_maps=False, dial_cmap='rainbow', map_cmap='rainbow', vmin_dial=None, vmax_dial=None, dial_kwargs={}, map_kwargs={}, vmin_map=None, vmax_map=None, several_datasets=False, times_datasets=None, latlon_labeled=False)
This script will make a plot of a map and polar dials for several timesteps.
- Parameters:
dial_da (xarray DataArray) – Data array to be plotted on dials
total (int) – Number of plots to make
map_da (xarray DataArray, optional) – Data array to be plotted on map, defaults to None
max_per_row (int, optional) – Maximum number of plots per row, defaults to 3
isel_dials (dict, optional) – Indices to select dial data from, defaults to None
sel_dials (dict, optional) – Criteria to select dial data from, defaults to None
isel_map (dict, optional) – Indices to select map data from, defaults to None
sel_map (dict, optional) – Criteria to select map data from, defaults to None
quiver_map_cols (str, optional) – Columns to plot on map as quiver, defaults to None (no quivers)
suptitle (str, optional) – Title of plot, defaults to None
suptitlesize (str, optional) – Size of suptitle, defaults to ‘large’
time_start (str, optional) – Start time of plot, defaults to None
time_delta (str, optional) – Time between each plot, defaults to ‘1 hour’
save (str, optional) – Filename to save plot to, defaults to None (don’t save)
mask_dials (float, optional) – Mask values below this value, defaults to 0.001
mask_maps (bool, optional) – Mask values below this value, defaults to False
dial_cmap (str, optional) – Which matplotlib colormap to use for dials, defaults to ‘rainbow’
map_cmap (str, optional) – Which matplotlib colormap to use for maps, defaults to ‘rainbow’
vmin_dial (int, optional) – Minimum value of dial colorbar, defaults to None
vmax_dial (int, optional) – Maximum value of dial colorbar, defaults to None
dial_kwargs (dict, optional) – Keyword arguments for dial plot, defaults to {}
map_kwargs (dict, optional) – Keyword arguments for map plot, defaults to {}
vmin_map (int, optional) – Minimum value of map colorbar, defaults to None
vmax_map (int, optional) – Maximum value of map colorbar, defaults to None
several_datasets (bool, optional) – Plot several datasets on the same plot, defaults to False
times_datasets (list of str, optional) – Times to plot for each dataset, defaults to None
latlon_labeled (bool, optional) – Show latitude and longitude labels, defaults to False
- Returns:
Data from imshow (a figure)
- Return type:
Matplotlib imshow object
- utility_programs.plotting_routines.panel_of_dials(da, hemi_titles, times, time_titles=None, title=None, mask_dials=False, lon_labels=False, **plotargs)
Plot a panel of polar dials.
- Parameters:
da (xarray DataArray) – Data array to be plotted.
hemi_titles (list) – Titles of each hemisphere.
times (list) – Times to plot.
time_titles (list, optional) – Titles of each time, defaults to None.
title (str, optional) – Title of plot, defaults to None.
mask_dials (int, optional) – Mask values below this value, defaults to False.
lon_labels (bool, optional) – Show longitude labels, defaults to False.
- Returns:
Data from imshow (a figure).
- Return type:
Matplotlib imshow object
- Raises:
ValueError – If hemi_titles and times do not match in shape.
- utility_programs.plotting_routines.panel_plot(da, x='time', y='lat', wrap_col='lon', plot_vals=[0, 45, 90, 135, 180, 225, 270, 315], do_map=False, col_wrap=4, suptitle=None, vlims=None, cmap='bwr', out_fname=None, isel_plotvals=False, tight_layout=False, cbar_label=None)
Plot a panel plot of a data array.
- Parameters:
da (xr.DataArray) – Data array to be plotted
x (str, optional) – X-axis data, by default ‘time’
y (str, optional) – Y-axis data, by default ‘lat’
wrap_col (str, optional) – Column to wrap around, by default ‘lon’
plot_vals (list, optional) – Values (of wrap_col) to plot, by default [0, 45, 90, 135, 180, 225, 270, 315]
do_map (bool, optional) – Plot on a map?, by default False
col_wrap (int, optional) – How many columns to make, by default 4
suptitle (str, optional) – Title of plot, by default None
vlims (int, optional) – Absolute value of colorbar limits, by default None (auto).
cmap (str, optional) – Which matplotlib colormap to use, by default ‘bwr’
out_fname (str, optional) – Filename when saving, by default None
isel_plotvals (bool, optional) – Set to true if the plot_vals are indices rather than values, by default False
tight_layout (bool, optional) – Use tight layout?, by default False
cbar_label (str, optional) – Label of colorbar, by default None
- Return type:
None
- utility_programs.plotting_routines.panel_with_lt(da, x='time', y='lat', wrap_col='lon', lons=[0, 45, 90, 135, 180, 225, 270, 315], col_wrap=4, suptitle=None, figsize=None, vlims=None, cmap='bwr', out_fname=None, tight_layout=False)
Plot a panel plot of a data array, with local time on the x-axis.
- Parameters:
da (DataArray) – Data array to be plotted
x (str, optional) – X-axis data, defaults to ‘time’
y (str, optional) – Y-axis data, defaults to ‘lat’
wrap_col (str, optional) – Column to wrap around, defaults to ‘lon’
lons (list, optional) – Values (of wrap_col) to plot, defaults to [0, 45, 90, 135, 180, 225, 270, 315]
col_wrap (int, optional) – How many columns to make, defaults to 4
suptitle (str, optional) – Title of plot, defaults to None
figsize (list, optional) – Size of figure, defaults to None (automatic)
vlims (int, optional) – Absolute value of colorbar limits, defaults to None (automatic)
cmap (str, optional) – Which matplotlib colormap to use, defaults to ‘bwr’
out_fname (str, optional) – Filename when saving, defaults to None
tight_layout (bool, optional) – Use tight layout? Defaults to False
Utilities
A number of useful utilities are available.
These are not very well organized, but they are all available in the utility_programs module:
- utility_programs.filters.filter_xarray_DA_diff(da, dim='time', order=2, percent=False, label='lower')
Calculate the difference of a DataArray along a given dimension.
- Parameters:
da (xarray DataArray) – DataArray to be filtered
dim (str, optional) – Dimension over which to calculate the diff, by default ‘time’
order (int, optional) – Order of the diff, by default 2
percent (bool, optional) – Return the percent over diff (False), or just return the fit (True), by default False
label (str, optional) – Label values to the ‘lower’ or ‘upper’ bound, by default ‘lower’
- Returns:
Filtered data
- Return type:
xarray DataArray
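To illustrate what the order parameter means, here is a minimal sketch of an n-th order difference on a plain Python list. It is not the library's implementation: nth_order_diff is a hypothetical name, the data are made up, and the percent and label options are not modelled.

```python
def nth_order_diff(values, order=2):
    """Apply a first difference `order` times to a 1-D sequence."""
    out = list(values)
    for _ in range(order):
        # Each pass shortens the sequence by one element.
        out = [b - a for a, b in zip(out[:-1], out[1:])]
    return out


samples = [1.0, 4.0, 9.0, 16.0, 25.0]  # made-up data: squares of 1..5
second_diff = nth_order_diff(samples, order=2)
print(second_diff)  # [2.0, 2.0, 2.0]
```

Each pass removes one point, so a second-order difference of N samples returns N - 2 values.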
- utility_programs.filters.make_fits(da, freq=5, lims=[40, 85], order=1, percent=True)
Calculate the bandpass filter for all data previously read in. This is a wrapper for the scipy bandpass filter.
- Parameters:
da (xarray DataArray or numpy array) – DataArray to be filtered
freq (int, optional) – Time between outputs, units must be same as lims, by default 5; for data every 5 minutes.
lims (list, optional) – Period limits of bandpass filter, by default [40, 85]. Units must be same as freq.
order (int, optional) – Order of the filter, by default 1
percent (bool, optional) – Return Array as percent over background? Set to False to return the absolute perturbation over background. Defaults to True.
- Returns:
DataArray with filtered data.
- Return type:
xarray DataArray
Notes
The bandpass filter is applied to the data using the scipy.signal.filtfilt function. This function is a forward-backward filter, meaning that the data is filtered in both the forward and reverse directions. This results in zero phase shift in the filtered data.
The bandpass filter is designed using the scipy.signal.butter function. This function designs a Butterworth filter, which is a type of infinite impulse response filter. The filter is designed using the specified order, and the cutoff frequencies are calculated from the specified limits and the sampling frequency.
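The conversion from period limits to cutoff frequencies can be sketched in plain Python. This is an illustration under assumptions, not the code inside make_fits: normalized_band is a hypothetical helper, and normalizing by the Nyquist frequency is assumed because that is the convention scipy.signal.butter uses when no sampling frequency is passed.

```python
def normalized_band(freq=5, lims=(40, 85)):
    """Convert period limits to cutoffs normalized by the Nyquist frequency.

    freq is the time between samples; lims are the shortest and longest
    periods to keep, in the same units as freq.
    """
    short_period, long_period = sorted(lims)
    nyquist = 0.5 / freq                    # half the sampling frequency (1 / freq)
    low = (1.0 / long_period) / nyquist     # longest period -> lowest frequency
    high = (1.0 / short_period) / nyquist   # shortest period -> highest frequency
    if not 0 < low < high < 1:
        raise ValueError("period limits must be longer than 2 * freq")
    return low, high


# Data every 5 minutes, keeping periods between 40 and 85 minutes.
low, high = normalized_band(freq=5, lims=(40, 85))
print(round(low, 4), round(high, 4))  # 0.1176 0.25
```

The resulting pair could then be passed to scipy.signal.butter(order, [low, high], btype="band"), and the returned coefficients to scipy.signal.filtfilt(b, a, data).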
- utility_programs.filters.remove_outliers(array)
Remove outliers from an array by replacing them with the median value.
- Parameters:
array (numpy array or xarray DataArray) – Data to be filtered
- Returns:
Filtered data
- Return type:
numpy array or xarray DataArray
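The documentation does not say how remove_outliers decides what counts as an outlier. The sketch below shows one common choice, flagging points that lie many median absolute deviations (MAD) from the median. Both the criterion and the threshold are assumptions, and replace_outliers_with_median is a hypothetical name.

```python
from statistics import median


def replace_outliers_with_median(values, threshold=5.0):
    """Replace points far from the median with the median itself.

    ASSUMPTION: a point is an outlier when it lies more than `threshold`
    median absolute deviations (MAD) from the median.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)  # no spread to measure against; leave data as-is
    return [med if abs(v - med) > threshold * mad else v for v in values]


raw = [1.0, 1.2, 0.9, 1.1, 50.0, 1.0]  # made-up data with one obvious spike
cleaned = replace_outliers_with_median(raw)
```

Replacing with the median rather than dropping the point keeps the array shape unchanged, which matters when the data sit on a fixed grid.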
- utility_programs.utils.add_lt_to_dataset(ds, localtimes=[2, 6, 10, 14, 18, 22], pbar=False)
Add localtime as a coordinate to an existing dataset/dataarray.
- Parameters:
ds (xarray.dataset or xarray.dataarray) – xarray object to add the column to. Can be dataset (takes longer) or dataarray. Must have time as a dimension.
localtimes (array-like or int) – Local times to return. If int, this is the number of local times, evenly spaced from 0-24. If list-like, these are the actual local times to return. Be careful about requesting too many: the data are interpolated, so the more local times requested, the longer this takes. A progress bar can optionally be shown (see pbar) if it is taking a long time.
pbar (bool) – Set this to true to show a progress bar.
- Returns:
ds – Same format as the input, but with a new ‘localtime’ coordinate.
- Return type:
xarray.(dataset/dataarray)
- utility_programs.utils.autoread(file_list, columns_to_return=None, concat_dim='time')
Automatically read in a list of files and concatenate them.
- Parameters:
file_list (str or list of paths) – List of files to read in.
columns_to_return (str or list, optional) – Columns (data_vars) to return. Defaults to None.
concat_dim (str, optional) – Concatenate along this dimension. No reason to ever change. Defaults to ‘time’.
- Raises:
ValueError – Column not found in files.
- Returns:
Dataset holding requested variable(s).
- Return type:
xarray.Dataset
Notes
This is not well supported. Best to use xarray.open_mfdataset() instead.
- utility_programs.utils.get_var_names(dir, models)
Print out a list of variable names.
- Parameters:
dir (str) – Directory of outputs.
models (str or list) – Name of model.
- Return type:
None
Notes
This function prints out a list of variable names for a given model or list of models. It does this by searching for netCDF files in the specified directory that match the model name(s), opening the first file found, and printing out the names of the data variables in the file.
Examples
>>> get_var_names('/path/to/outputs', 'model1')
model1
<xarray.Dataset>
Dimensions:  (time: 10, x: 100, y: 100)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-02-01 ... 2000-10-01
  * x        (x) float64 0.0 1.0 2.0 3.0 4.0 ... 96.0 97.0 98.0 99.0 100.0
  * y        (y) float64 0.0 1.0 2.0 3.0 4.0 ... 96.0 97.0 98.0 99.0 100.0
Data variables:
    var1     (time, y, x) float64 ...
    var2     (time, y, x) float64 ...
    var3     (time, y, x) float64 ...
    var4     (time, y, x) float64 ...
    var5     (time, y, x) float64 ...
    var6     (time, y, x) float64 ...
    var7     (time, y, x) float64 ...
    var8     (time, y, x) float64 ...
    var9     (time, y, x) float64 ...
    var10    (time, y, x) float64 ...
- utility_programs.utils.hours_from_storm_onset_into_ds(ds, onset_ut)
Calculate hours from an event and add to dataset.
- Parameters:
ds (xarray.Dataset) – Dataset to add hours from storm onset to
onset_ut (datetime.datetime) – Datetime object of storm/event
- Raises:
ValueError – If multiple days are in the dataset
- Returns:
Dataset with hours from storm onset added
- Return type:
xarray.Dataset
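The underlying arithmetic is a signed time difference expressed in hours. A minimal sketch using only the standard library follows. hours_from_onset is a hypothetical helper, the onset time is made up, and treating times before the onset as negative is an assumption about the sign convention.

```python
from datetime import datetime


def hours_from_onset(times, onset_ut):
    """Return signed hours between each time and the onset (negative = before)."""
    return [(t - onset_ut).total_seconds() / 3600.0 for t in times]


onset = datetime(2011, 5, 21, 12, 0)  # hypothetical storm onset
times = [
    datetime(2011, 5, 21, 9, 0),
    datetime(2011, 5, 21, 12, 30),
    datetime(2011, 5, 21, 18, 0),
]
hours = hours_from_onset(times, onset)
print(hours)  # [-3.0, 0.5, 6.0]
```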
- utility_programs.utils.make_ccmc_name(modelname, ut, data_type=None)
Make a CCMC-formatted filename. Returns a string of the form: modelname_datatype_YYYY-MM-DDThh-mm-ss.nc
- Parameters:
modelname (str) – Name of model.
ut (datetime) – Datetime object of the time of the file.
data_type (str) – Type of data in the file (3DALL/2DANC/ALL/etc.).
- Returns:
CCMC-formatted filename.
- Return type:
str
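The filename pattern described above can be reproduced with strftime. This is a sketch of the stated format only: ccmc_name_sketch is a hypothetical stand-in, the model name and data type are example values, and how the real function handles data_type=None is not documented here, so that case is left out.

```python
from datetime import datetime


def ccmc_name_sketch(modelname, ut, data_type):
    """Build modelname_datatype_YYYY-MM-DDThh-mm-ss.nc from its parts."""
    stamp = ut.strftime("%Y-%m-%dT%H-%M-%S")  # hyphens, not colons, in the time
    return f"{modelname}_{data_type}_{stamp}.nc"


name = ccmc_name_sketch("GITM", datetime(2011, 5, 20, 6, 30, 0), "3DALL")
print(name)  # GITM_3DALL_2011-05-20T06-30-00.nc
```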
- utility_programs.utils.str_to_ut(in_str)
Convert a string to a datetime object.
- Parameters:
in_str (str) – String to convert to datetime object.
- Returns:
Datetime object.
- Return type:
datetime
- utility_programs.utils.ut_to_lt(time_array, glon)
Compute local time from date and longitude.
- Parameters:
time_array (array-like) – Array-like of datetime objects in universal time
glon (array-like or float) – Float or array-like of floats containing geographic longitude in degrees. If single value or array of a different shape, all longitudes are applied to all times. If the shape is the same as time_array, the values are paired in the SLT calculation.
- Returns:
lt – List of local times in hours
- Return type:
array of floats
- Raises:
TypeError – For badly formatted input
Routines to perform temporal calculations.
- utility_programs.time_conversion.calc_time_shift(utime)
Calculate the time shift needed to orient a polar dial.
- Parameters:
utime (datetime) – Datetime object of the time we’re plotting
- Returns:
Time shift in degrees
- Return type:
float
- utility_programs.time_conversion.datetime_to_epoch(dtime)
Convert datetime to epoch seconds.
- Parameters:
dtime (datetime) – Datetime object
- Returns:
Seconds since 1 Jan 1965
- Return type:
float
- utility_programs.time_conversion.epoch_to_datetime(epoch_time)
Convert from epoch seconds to datetime.
- Parameters:
epoch_time (float) – Seconds since 1 Jan 1965
- Returns:
Datetime object corresponding to epoch_time
- Return type:
datetime
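Both epoch conversions reduce to arithmetic against a fixed reference of 1 Jan 1965. A minimal standard-library sketch follows. The function names are hypothetical, and treating inputs as naive datetimes in UT is an assumption.

```python
from datetime import datetime, timedelta

EPOCH_1965 = datetime(1965, 1, 1)  # reference time stated in the docs


def to_epoch_seconds(dtime):
    """Seconds elapsed since 1 Jan 1965 for a naive UT datetime."""
    return (dtime - EPOCH_1965).total_seconds()


def from_epoch_seconds(epoch_time):
    """Inverse of to_epoch_seconds."""
    return EPOCH_1965 + timedelta(seconds=epoch_time)


t = datetime(2011, 5, 20, 12, 0, 0)
seconds = to_epoch_seconds(t)
round_trip = from_epoch_seconds(seconds)  # recovers t
```

Note that this epoch differs from the Unix epoch (1 Jan 1970), so values from time.time() cannot be passed in directly.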
- utility_programs.time_conversion.lt_to_ut(lt, glon)
Compute universal time in hours from local time and longitude.
- Parameters:
lt (float or array-like) – Local time(s) in hours
glon (float or array-like) – Geographic longitude(s) in degrees.
- Returns:
Universal time in hours
- Return type:
float
- utility_programs.time_conversion.ut_to_lt(time_array, glon)
Compute local time from date and longitude.
- Parameters:
time_array (array-like) – Array-like of datetime objects in universal time
glon (array-like or float) – Float or array-like of floats containing geographic longitude in degrees. If single value or array of a different shape, all longitudes are applied to all times. If the shape is the same as time_array, the values are paired in the SLT calculation.
- Returns:
List of local times in hours
- Return type:
array of floats
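The relationship between universal time, longitude, and local time can be sketched for scalar inputs, assuming the usual solar local time convention of 15 degrees of longitude per hour. The function names below are hypothetical, and the library routines additionally accept array inputs and pair values by shape as described above.

```python
from datetime import datetime


def ut_to_lt_sketch(time_ut, glon):
    """Solar local time in hours for one UT datetime and one longitude (deg)."""
    ut_hours = time_ut.hour + time_ut.minute / 60.0 + time_ut.second / 3600.0
    return (ut_hours + glon / 15.0) % 24.0  # 15 degrees of longitude per hour


def lt_to_ut_sketch(lt, glon):
    """Universal time in hours for one local time (hours) and longitude (deg)."""
    return (lt - glon / 15.0) % 24.0


lt = ut_to_lt_sketch(datetime(2011, 5, 20, 12, 0), 90.0)
ut = lt_to_ut_sketch(lt, 90.0)
print(lt, ut)  # 18.0 12.0
```

The modulo keeps results in the range 0-24, so a longitude of 270 degrees at 12 UT gives a local time of 6 hours rather than 30.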