geogals package
Module contents
A collection of functions built for the geostatistical analysis of galaxy data.
Created by: Benjamin Metha, Tree Smith, Jaime Blackwell
Last Updated: May 26, 2025
- geogals.RA_DEC_to_XY(RA, DEC, meta)[source]
Takes a list of RA, DEC coordinates and transforms it into a list of deprojected XY values, where X and Y are the distances from the galaxy’s centre in kpc.
- Parameters:
RA (ndarray like of shape (N,)) – List of RA values
DEC (ndarray like of shape (N,)) – List of DEC values
meta (dict) – Must contain RA, DEC of the galaxy centre, and PA, i, and D to get the galaxy’s absolute units
- Returns:
XY_kpc – Contains X and Y coords of all data points with units of kpc
- Return type:
(N,2) ndarray
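The deprojection described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper name deproject_to_kpc, the small-angle approximation, the convention that PA is measured from north through east, and the choice to put the major axis first are all assumptions. The meta keys (RA, DEC, PA, i, D) are the ones listed in the parameter description.

```python
import numpy as np

def deproject_to_kpc(RA, DEC, meta):
    """Hypothetical helper: sky coordinates (degrees) -> deprojected (N, 2) offsets in kpc."""
    RA = np.atleast_1d(np.asarray(RA, dtype=float))
    DEC = np.atleast_1d(np.asarray(DEC, dtype=float))
    # Small-angle offsets from the galaxy centre, in degrees on the sky.
    d_ra = (RA - meta["RA"]) * np.cos(np.radians(meta["DEC"]))
    d_dec = DEC - meta["DEC"]
    # Rotate into the galaxy frame (assumes PA is measured from north through east).
    pa = np.radians(meta["PA"])
    along_major = d_ra * np.sin(pa) + d_dec * np.cos(pa)
    along_minor = -d_ra * np.cos(pa) + d_dec * np.sin(pa)
    # Stretch the minor axis to undo the foreshortening caused by inclination.
    along_minor = along_minor / np.cos(np.radians(meta["i"]))
    # Convert degrees to kpc at distance D (given in Mpc).
    kpc_per_deg = meta["D"] * 1000.0 * np.pi / 180.0
    return np.column_stack([along_major, along_minor]) * kpc_per_deg
```

The sketch does not handle RA wrap-around at 0/360 degrees and breaks down for edge-on galaxies (i close to 90 degrees).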
- geogals.RA_DEC_to_radius(RA, DEC, meta)[source]
Converts a list of RA/DEC values to distances from a galaxy’s centre, using a supplied metadata dictionary.
- Parameters:
RA (float or np.array) – Right ascension of points
DEC (float or np.array) – Declination of points
meta (dict) – Metadata dictionary for the galaxy
- Returns:
r – Distance from each point to the galaxy’s centre
- Return type:
np array
- geogals.assign_IDs(n_dp, n_folds)[source]
Creates an array of length n_dp in which each element is a random integer from 1 to n_folds, with each integer appearing equally often.
If exactly even groups are not possible, the higher-numbered groups will have one fewer element than the lower-numbered groups.
e.g. assign_IDs(5,3) may return [2,3,2,1,1].
- Parameters:
n_dp (int) – number of data points
n_folds (int) – number of groups to split the data into
- Returns:
group_IDs – gives ID of each group element.
- Return type:
np array
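One simple way to obtain this behaviour is to build a repeating 1..n_folds pattern and shuffle it. The sketch below is an illustration only; the name assign_ids_sketch and the seed argument are assumptions and not part of the package.

```python
import numpy as np

def assign_ids_sketch(n_dp, n_folds, seed=None):
    """Hypothetical helper: balanced random fold labels from 1 to n_folds."""
    rng = np.random.default_rng(seed)
    # Repeating pattern 1, 2, ..., n_folds, 1, 2, ... so that any leftover
    # elements go to the lower-numbered groups.
    ids = np.arange(n_dp) % n_folds + 1
    rng.shuffle(ids)  # in-place shuffle keeps the group sizes fixed
    return ids
```

For n_dp=5 and n_folds=3 this always yields two 1s, two 2s, and one 3, in random order.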
Build the covariance matrix due to correlated error associated with the measurement of emission lines. Assumes PSF of the telescope is a Gaussian.
- Parameters:
dist_matrix ((N,N) np.array) – Distances between all pairs of regions.
e_Z ((N,) np.array) – Uncertainty in metallicity for each observation
meta (dict) –
Metadata used to calculate the correlations. Must contain:
- D: float
Distance from this galaxy to Earth, Mpc.
- PSF: float
Given in arcseconds, this is the mean seeing for each galaxy (mean value of Table 1 of Emsellem+22 at native resolution for each galaxy: https://ui.adsabs.harvard.edu/abs/2022A%26A…659A.191E/abstract)
- Returns:
cov_matrix – Covariance matrix for correlated observation errors.
- Return type:
(N,N) np.array
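A rough sketch of how such a matrix could be built is shown below. It is not the package's code and rests on several assumptions: that PSF is the FWHM of the seeing disc, that dist_matrix is in kpc, and that two measurements blurred by the same Gaussian PSF of width sigma are correlated as exp(-d^2 / (4 sigma^2)). The function name is hypothetical.

```python
import numpy as np

def psf_error_covariance_sketch(dist_matrix, e_Z, meta):
    """Hypothetical helper: covariance of PSF-correlated measurement errors."""
    dist_matrix = np.asarray(dist_matrix, dtype=float)
    e_Z = np.asarray(e_Z, dtype=float)
    # Arcseconds -> radians -> kpc at distance D (Mpc). Assumes PSF is a FWHM.
    arcsec_to_rad = np.pi / (180.0 * 3600.0)
    fwhm_kpc = meta["PSF"] * arcsec_to_rad * meta["D"] * 1000.0
    sigma_kpc = fwhm_kpc / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    # Assumed correlation kernel: overlap of two Gaussians of width sigma.
    corr = np.exp(-dist_matrix**2 / (4.0 * sigma_kpc**2))
    # Scale correlations by the per-point uncertainties.
    return np.outer(e_Z, e_Z) * corr
```

With this form the diagonal is e_Z squared, and the off-diagonal terms fall to zero once the separation exceeds a few PSF widths.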
- geogals.deprojected_distances(RA1, DEC1, RA2=None, DEC2=None, meta={})[source]
Computes the deprojected distances between one set of RAs/DECs and another, for a known galaxy.
- Parameters:
RA1 (float, list, or np array-like) – List of (first) RA values. Must be in degrees.
DEC1 (float, list, or np array-like) – List of (first) DEC values. Must be in degrees.
RA2 (float, list, or np array-like) – (Optional) second list of RA values. Must be in degrees. If no argument is provided, then the first list will be used again.
DEC2 (float, list, or np array-like) – (Optional) second list of DEC values. Must be in degrees. If no argument is provided, then the first list will be used again.
meta (dict) –
Metadata used to calculate the distances. Must contain:
- PA: float
Position angle of the galaxy, degrees.
- i: float
Inclination of the galaxy along this principal axis, degrees.
- D: float
Distance from this galaxy to Earth, Mpc.
- Returns:
dists – Array of distances between all RA, DEC pairs provided. Units: kpc.
- Return type:
np array
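Once both coordinate sets are expressed as deprojected XY positions in kpc, the pairwise distances reduce to a broadcasting operation. The sketch below assumes the inputs are already deprojected; the function name is hypothetical and the deprojection step itself is not shown.

```python
import numpy as np

def pairwise_distances_sketch(XY1, XY2=None):
    """Hypothetical helper: (N, M) Euclidean distances between two sets of XY points."""
    XY1 = np.asarray(XY1, dtype=float)
    # Mirror the documented default: reuse the first set if no second set is given.
    XY2 = XY1 if XY2 is None else np.asarray(XY2, dtype=float)
    # Broadcast (N, 1, 2) against (1, M, 2) to get all coordinate differences.
    diff = XY1[:, None, :] - XY2[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))
```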
- geogals.fast_semivariogram(Z_grid, header=None, meta=None, bin_size=2, d_lim=None)[source]
A fast algorithm for computing the semivariogram of galaxy data.
- Parameters:
Z_grid (2d np.array) – Random field for which we are computing the semivariogram
header (hdu header file) – Must contain wcs. If not supplied, semivariogram will be computed in units of pixels, with no deprojection.
meta (dict) –
Metadata used to calculate the distances. Must be supplied if header is supplied. Must contain:
- PA: float
Position angle of the galaxy, degrees.
- i: float
Inclination of the galaxy along this principal axis, degrees.
- D: float
Distance from this galaxy to Earth, Mpc.
bin_size – Size of bins for semivariogram. Defaults to 2 (pixels) – should be changed if using physical separations
d_lim (float, or None) – Maximum distance up to which to compute the semivariogram. If not supplied, goes up to the maximum possible distance in the data.
- Returns:
svg (numpy array) – Semivariogram of the data at each separation
bc (numpy array) – centres of each semivariogram bin
N – Number of pairs in each bin (to be confirmed)
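For reference, the quantity being estimated is the empirical semivariogram: half the mean squared difference between all pairs of values whose separation falls in a given bin. The brute-force sketch below works in pixel units with no deprojection and is not the fast algorithm used by the package. The function name is hypothetical, and treating N as the pair count per bin is an assumption.

```python
import numpy as np

def naive_semivariogram(Z_grid, bin_size=2, d_lim=None):
    """Hypothetical O(N^2) semivariogram in pixel units, for small grids only."""
    Z_grid = np.asarray(Z_grid, dtype=float)
    rows, cols = np.indices(Z_grid.shape)
    good = np.isfinite(Z_grid)  # ignore NaN pixels
    y, x, z = rows[good], cols[good], Z_grid[good]
    # All unique pairs (i < j) of valid pixels.
    i, j = np.triu_indices(z.size, k=1)
    d = np.hypot(x[i] - x[j], y[i] - y[j])
    sq_diff = (z[i] - z[j]) ** 2
    if d_lim is None:
        d_lim = d.max()
    keep = d <= d_lim
    d, sq_diff = d[keep], sq_diff[keep]
    # Half-open bins [k * bin_size, (k + 1) * bin_size) covering 0..d_lim.
    n_bins = int(np.floor(d_lim / bin_size)) + 1
    edges = bin_size * np.arange(n_bins + 1)
    which = np.digitize(d, edges) - 1
    svg = np.full(n_bins, np.nan)
    N = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = which == b
        N[b] = in_bin.sum()
        if N[b] > 0:
            svg[b] = 0.5 * sq_diff[in_bin].mean()
    bc = 0.5 * (edges[:-1] + edges[1:])  # bin centres
    return svg, bc, N
```

Bins that contain no pairs are returned as NaN so they are not mistaken for zero variance.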
- geogals.fit_exp_cov_model(data_dict, meta, n_samples, n_walkers, backend_f, init_theta, init_unc_theta)[source]
Fit a geostatistical model to the supplied data using emcee, accounting for (1) a radially linear mean trend, and (2) small fluctuations that are exponentially correlated
- Parameters:
data_dict (dict) – Contains RA, DEC, Z, e_Z for each measured value of our random field.
meta (dict) – Metadata for this galaxy. Must contain PA (position angle, degrees); i (inclination, degrees) and D (distance, Mpc) for this galaxy, as well as its central RA and DEC.
n_samples (int) – Hyperparameter for emcee; controls how many samples are drawn for each walker
n_walkers (int) – Hyperparameter for emcee; controls how many walkers are used to sample from the posterior.
backend_f (str) – Filename for where to store emcee results (.hdf5)
init_theta ((4,) tuple) – Initial values assumed for log_variance, correlation scale, central value of random field, and radial gradient of random field.
init_unc_theta ((4,) tuple) – Initial uncertainties assumed for the sample parameters
- Returns:
f_acc – mean acceptance fraction over all chains.
- Return type:
float
- geogals.fit_radial_linear_trend(data_dict, meta, return_covariances=False)[source]
Fits a radial trend to the galaxy data. Designed for computing metallicity gradients; other mean models may be required for other galaxy data (e.g. velocities). Does not account for small-scale variations in the data.
- Parameters:
data_dict (dict) – Contains RA, DEC, Z, e_Z for each measured value of our random field.
meta (dict) –
Metadata used to calculate the distances. Must contain:
- RA: float
Right ascension of the centre of the galaxy.
- DEC: float
Declination of the centre of the galaxy.
- PA: float
Position angle of the galaxy, degrees.
- i: float
Inclination of the galaxy along this principal axis, degrees.
- D: float
Distance from this galaxy to Earth, Mpc.
return_covariances (bool) – If True, covariances of parameters will be returned as well.
- Returns:
params (list) – Central value and radial gradient of random field.
optional – covariance (array) – Covariance matrix for returned parameters
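A radially linear trend Z = Z_c + gradZ * r can be fitted by inverse-variance weighted least squares. The sketch below takes precomputed radii rather than a data_dict and meta, and treats the errors as independent; both choices, and the function name, are assumptions made for illustration rather than a description of the package's internals.

```python
import numpy as np

def fit_radial_trend_sketch(r, Z, e_Z):
    """Hypothetical helper: weighted linear fit of Z against galactocentric radius r."""
    r, Z, e_Z = (np.asarray(a, dtype=float) for a in (r, Z, e_Z))
    X = np.column_stack([np.ones_like(r), r])  # design matrix [1, r]
    W = 1.0 / e_Z**2                           # inverse-variance weights
    XtWX = X.T @ (X * W[:, None])
    XtWZ = X.T @ (W * Z)
    covariance = np.linalg.inv(XtWX)           # parameter covariance matrix
    params = covariance @ XtWZ                 # [central value, radial gradient]
    return params, covariance
```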
- geogals.generate_residual_Z_grid(Z_grid, e_Z_grid, header, meta)[source]
Find and subtract a radial trend in Z_grid.
- geogals.get_subsample(data_dict, n_in_subsample)[source]
Selects n_in_subsample elements from a supplied data_dict
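A minimal sketch of this kind of subsampling is given below. It assumes that every entry of data_dict is an array of the same length (RA, DEC, Z, e_Z) and that sampling is done without replacement; the function name and the seed argument are hypothetical.

```python
import numpy as np

def get_subsample_sketch(data_dict, n_in_subsample, seed=None):
    """Hypothetical helper: pick the same random rows from every array in data_dict."""
    rng = np.random.default_rng(seed)
    n_total = len(data_dict["Z"])
    # Indices drawn without replacement, so no data point is selected twice.
    chosen = rng.choice(n_total, size=n_in_subsample, replace=False)
    return {key: np.asarray(values)[chosen] for key, values in data_dict.items()}
```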
- geogals.globalize_data(loc_Z, loc_e_Z, loc_r, loc_dist_matrix, loc_init_theta, loc_init_unc_theta)[source]
Turn local variables into global ones (not strictly necessary, but better for transparency)
- geogals.krig_exp_model(RA, DEC, Z_df, meta, theta, mode='grid')[source]
Performs universal kriging on a model grid of RA and DEC. Uses the package's distance function, a choice of covariance function, and the best-fitting value of f_d (no restrictions), and assumes Z ~ r + (random effects).
Uses equations presented in ‘Spatio-Temporal Statistics with R’, available for free at https://spacetimewithr.org
- Parameters:
RA (np array) – Array of RA values. Must be in degrees.
DEC ((N,) np array) – Array of DEC values. Must be in degrees.
Z_df (dict) – Data dict containing RA, DEC, Z, e_Z for each data point.
meta (dict) –
Metadata used to calculate the distances. Must contain:
- PA: float
Position angle of the galaxy, degrees.
- i: float
Inclination of the galaxy along this principal axis, degrees.
- D: float
Distance from this galaxy to Earth, Mpc.
theta ((4,) tuple) – Contains model parameters (log_Var, phi, Zc, gradZ)
mode (str) –
Options include:
- ‘grid’: make a grid of all possible combinations of the supplied RA and DEC values; use kriging to estimate Z at each point on the grid.
- ‘list’: just use pairs of RA and DEC values as they are given. RA and DEC must have the same length.
- ‘auto’: get RA and DEC values from the df itself.
- Returns:
Z_pred_matrix ((M,N) np array) – interpolated (kriged) values over the RA, DEC coords given.
var_matrix ((M,N) np array) – variances for these predictions
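The core of the prediction step can be illustrated with a simplified version that treats the mean trend in theta as known, which is simple kriging of the residuals rather than full universal kriging (the latter adds a term for the uncertainty in the fitted mean). The sketch assumes precomputed distance matrices and radii, a natural-log variance parameter, and an exponential covariance of the form var * exp(-d / phi); the function name is hypothetical.

```python
import numpy as np

def krige_known_mean_sketch(d_obs, d_cross, r_obs, r_new, Z, e_Z, theta):
    """Hypothetical helper: kriging predictions with a known linear radial mean.

    d_obs:   (N, N) distances between observed points
    d_cross: (M, N) distances from prediction points to observed points
    """
    log_var, phi, Zc, gradZ = theta
    d_obs, d_cross = np.asarray(d_obs, float), np.asarray(d_cross, float)
    r_obs, r_new = np.asarray(r_obs, float), np.asarray(r_new, float)
    Z, e_Z = np.asarray(Z, float), np.asarray(e_Z, float)
    var = np.exp(log_var)  # assumes log_var is a natural logarithm
    # Covariance among observations: exponential term plus measurement noise.
    C = var * np.exp(-d_obs / phi) + np.diag(e_Z**2)
    # Covariance between prediction points and observations.
    c_star = var * np.exp(-d_cross / phi)
    resid = Z - (Zc + gradZ * r_obs)
    weights = np.linalg.solve(C, c_star.T)  # (N, M) kriging weights
    Z_pred = Zc + gradZ * r_new + weights.T @ resid
    var_pred = var - np.einsum("mn,nm->m", c_star, weights)
    return Z_pred, var_pred
```

With zero measurement error, predicting at an observed location returns the observed value with zero variance, which is a convenient sanity check.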
- geogals.log_gamma_prior_tenth_to_ten(x)[source]
Gamma distribution with 1% probability of being below 0.1 or above 10
- geogals.log_likelihood_exp_model(theta, priors=True)[source]
Function that is optimised by emcee to find the best parameters for our model of the random field, with radially linear mean values for Z, and exponentially-correlated random effects.
The following must be defined as global variables:
- Z: (N,) np.array
Observations at N data points
- r: (N,) np.array
Covariate that is each spaxel’s distance from the galaxy center.
- e_Z: (N,N) np.array
matrix of variance from observational error at all data points.
- dist_matrix: (N,N) np.array
matrix of distance between all observed data points.
- Parameters:
theta (4-tuple) –
Contains:
- log_A, phi: model parameters for spatial_cov
- Z_c, gradZ: model parameters for the large scale gradient
priors (bool) – If True, folds in the provided priors to the likelihood guess. If False, just computes the likelihood.
- Returns:
log_likelihood – The log likelihood of this model with the supplied parameters.
- Return type:
float
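The likelihood of such a model is a multivariate normal density with a radially linear mean and a covariance made of an exponential term plus the observational error matrix. The sketch below passes the data as arguments instead of reading global variables and omits the priors; the exact parametrisation (natural-log amplitude, covariance exp(log_A) * exp(-d / phi)) and the function name are assumptions.

```python
import numpy as np

def log_likelihood_sketch(theta, Z, r, e_Z_matrix, dist_matrix):
    """Hypothetical helper: Gaussian log-likelihood, linear mean + exponential covariance."""
    log_A, phi, Z_c, gradZ = theta
    if phi <= 0:
        return -np.inf  # a correlation scale must be positive
    Z, r = np.asarray(Z, float), np.asarray(r, float)
    # Total covariance: spatially correlated term plus observational error matrix.
    C = np.exp(log_A) * np.exp(-np.asarray(dist_matrix, float) / phi) + np.asarray(e_Z_matrix, float)
    resid = Z - (Z_c + gradZ * r)
    sign, log_det = np.linalg.slogdet(C)
    if sign <= 0:
        return -np.inf  # covariance is not positive definite
    chi2 = resid @ np.linalg.solve(C, resid)
    return -0.5 * (chi2 + log_det + Z.size * np.log(2.0 * np.pi))
```

Using slogdet and solve avoids forming the explicit inverse and the raw determinant, which can overflow or underflow for large N.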
- geogals.log_prior(theta, init_theta, init_unc_theta)[source]
A prior for the parameters of the model – feel free to tweak or add your own!
- geogals.make_RA_DEC_grid(header)[source]
Given an HDU header, create a grid of RA/DEC values for each pixel in that file.
- geogals.make_physical_lag_grid(header, meta)[source]
Given an HDU header, create a grid of RA/DEC values for each pixel in that file.
- Parameters:
header (hdu header file) – Must contain wcs
meta (dict) – Must contain RA, DEC of the galaxy centre, and PA, i, and D to get the galaxy’s absolute units (D should be in units of megaparsecs; PA and i should be in units of degrees).
- geogals.to_data_dict(header, Z, e_Z)[source]
- Parameters:
header (hdu header file) – Must contain wcs
Z (np array) – Grid with values of our random field
e_Z (np array) – Same shape as Z. Gives the uncertainty of Z at each location.
- Returns:
data_dict – Contains RA, DEC, Z and e_Z for every
- Return type:
dict