geogals package

Module contents

A collection of functions built for the geostatistical analysis of galaxy data.

Created by: Benjamin Metha, Tree Smith, Jaime Blackwell

Last Updated: May 26, 2025

geogals.RA_DEC_to_XY(RA, DEC, meta)

Takes a list of RA, DEC coordinates and transforms them into a list of deprojected XY values, where X and Y are the distances from the galaxy’s centre in units of kpc.

Parameters:
  • RA (array-like of shape (N,)) – List of RA values

  • DEC (array-like of shape (N,)) – List of DEC values

  • meta (dict) – Must contain RA, DEC of the galaxy centre, and PA, i, and D to get the galaxy’s absolute units

Returns:

XY_kpc – Contains X and Y coords of all data points with units of kpc

Return type:

(N,2) ndarray
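
As a rough illustration of what a deprojection of this kind involves, the sketch below rotates sky offsets onto the galaxy's major and minor axes, stretches the minor-axis offset by 1/cos(i), and converts angles to kpc using D. It is not the geogals implementation: the helper name `deproject_to_kpc`, the flat-sky (small-angle) approximation, the choice of which axis is X, and the convention that PA is measured from north through east are all assumptions made for this example.

```python
import numpy as np

def deproject_to_kpc(RA, DEC, meta):
    """Hypothetical helper: deprojected (X, Y) offsets in kpc.

    Assumes meta holds 'RA', 'DEC', 'PA' and 'i' in degrees and 'D' in Mpc,
    and uses a flat-sky approximation that is only valid for small offsets.
    """
    RA = np.atleast_1d(np.asarray(RA, dtype=float))
    DEC = np.atleast_1d(np.asarray(DEC, dtype=float))

    # Offsets on the sky in degrees, east and north of the galaxy centre.
    d_east = (RA - meta["RA"]) * np.cos(np.radians(meta["DEC"]))
    d_north = DEC - meta["DEC"]

    # Rotate onto the major/minor axes (PA assumed north through east).
    pa = np.radians(meta["PA"])
    major = d_east * np.sin(pa) + d_north * np.cos(pa)
    minor = -d_east * np.cos(pa) + d_north * np.sin(pa)

    # Undo the foreshortening along the minor axis caused by inclination.
    minor = minor / np.cos(np.radians(meta["i"]))

    # Convert degrees to kpc at distance D (given in Mpc).
    kpc_per_degree = meta["D"] * 1e3 * np.pi / 180.0
    return np.column_stack([major, minor]) * kpc_per_degree
```

With an inclination of 60 degrees, a point offset along the minor axis ends up twice as far from the centre as a point with the same sky offset along the major axis.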

geogals.RA_DEC_to_radius(RA, DEC, meta)

Converts a list of RA/DEC values to distances from a galaxy’s centre, using a supplied metadata dictionary.

Parameters:
  • RA (float or np.array) – Right ascension of points

  • DEC (float or np.array) – Declination of points

  • meta (dict) – Metadata dictionary for the galaxy.

Returns:

r – Distance from each point to the galaxy’s centre

Return type:

np array

geogals.assign_IDs(n_dp, n_folds)

Creates an array of length n_dp in which each element is a random integer from 1 to n_folds, with each integer appearing an equal number of times.

If exactly even groups are not possible, the higher-numbered groups will have one fewer element than the lower-numbered groups.

e.g. assign_IDs(5,3) may return [2,3,2,1,1].

Parameters:
  • n_dp (int) – number of data points

  • n_folds (int) – number of groups to split the data into

Returns:

group_IDs – Group ID assigned to each data point.

Return type:

np array
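
One way to obtain balanced random group labels of the kind described above is to cycle through the labels 1 to n_folds and then shuffle. The sketch below is an illustration only, not the geogals source; the name `balanced_fold_ids` and the `rng` argument are hypothetical.

```python
import numpy as np

def balanced_fold_ids(n_dp, n_folds, rng=None):
    """Hypothetical sketch: random fold labels 1..n_folds, as even as possible."""
    rng = np.random.default_rng(rng)
    # Cycling 1, 2, ..., n_folds gives the lower-numbered groups any extras.
    ids = np.arange(n_dp) % n_folds + 1
    # Shuffle in place so that group membership is random.
    rng.shuffle(ids)
    return ids
```

For example, `balanced_fold_ids(5, 3)` always contains two 1s, two 2s and one 3, in a random order.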

geogals.build_correlated_error_covariance_matrix(dist_matrix, e_Z, meta)

Builds the covariance matrix for the correlated errors associated with the measurement of emission lines. Assumes that the PSF of the telescope is Gaussian.

Parameters:
  • dist_matrix ((N,N) np.array) – Distances between all pairs of regions.

  • e_Z ((N,) np.array) – Uncertainty in metallicity for each observation

  • meta (dict) –

    Metadata used to calculate correlations between observations. Must contain:

    D: float

    Distance from this galaxy to Earth, Mpc.

    PSF: float

    Mean seeing for this galaxy, in arcseconds (mean value of Table 1 of Emsellem+22 for native resolution for each galaxy: https://ui.adsabs.harvard.edu/abs/2022A%26A…659A.191E/abstract)

Returns:

cov_matrix – Covariance matrix for correlated observation errors.

Return type:

(N,N) np.array
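
Because dist_matrix is in physical units while the PSF is given in arcseconds, comparing the two requires a change of units. The sketch below shows the standard small-angle conversion and the usual relation between the FWHM and the standard deviation of a Gaussian. The helper names are hypothetical, and treating the PSF value as a FWHM is an assumption; the formula geogals uses to build the covariance itself is not reproduced here.

```python
import numpy as np

def arcsec_to_kpc(theta_arcsec, dist_mpc):
    """Hypothetical helper: physical size (kpc) of an angle at distance D (Mpc)."""
    theta_rad = np.radians(theta_arcsec / 3600.0)
    # Small-angle approximation: size = distance * angle (in radians).
    return dist_mpc * 1e3 * theta_rad

def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given FWHM."""
    return fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
```

At a distance of 10 Mpc, one arcsecond corresponds to roughly 0.048 kpc.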

geogals.deprojected_distances(RA1, DEC1, RA2=None, DEC2=None, meta={})

Computes the deprojected distances between one set of RAs/DECs and another, for a known galaxy.

Parameters:
  • RA1 (float, list, or np array-like) – List of (first) RA values. Must be in degrees.

  • DEC1 (float, list, or np array-like) – List of (first) DEC values. Must be in degrees.

  • RA2 (float, list, or np array-like) – (Optional) second list of RA values. Must be in degrees. If no argument is provided, then the first list will be used again.

  • DEC2 (float, list, or np array-like) – (Optional) second list of DEC values. Must be in degrees. If no argument is provided, then the first list will be used again.

  • meta (dict) –

    Metadata used to calculate the distances. Must contain:

    PA: float

    Position angle of the galaxy, degrees.

    i: float

    Inclination of the galaxy along this principal axis, degrees.

    D: float

    Distance from this galaxy to Earth, Mpc.

Returns:

dists – Array of distances between all RA, DEC pairs provided. Units: kpc.

Return type:

np array

geogals.fast_semivariogram(Z_grid, header=None, meta=None, bin_size=2, d_lim=None)

A fast algorithm for computing the semivariogram of galaxy data.

Parameters:
  • Z_grid (2d np.array) – Random field for which we are computing the semivariogram

  • header (hdu header file) – Must contain wcs. If not supplied, semivariogram will be computed in units of pixels, with no deprojection.

  • meta (dict) –

    Metadata used to calculate the distances. Must be supplied if header is supplied. Must contain:

    PA: float

    Position angle of the galaxy, degrees.

    i: float

    Inclination of the galaxy along this principal axis, degrees.

    D: float

    Distance from this galaxy to Earth, Mpc.

  • bin_size – Size of the semivariogram bins. Defaults to 2 (pixels); should be changed if using physical separations.

  • d_lim (float, or None) – Maximum distance up to which to compute the semivariogram. If not supplied, goes up to the maximum possible distance in the data.

Returns:

  • svg (numpy array) – Semivariogram of the data at each separation

  • bc (numpy array) – centres of each semivariogram bin

  • N – Number of pairs in each bin (to be confirmed)
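
For reference, the empirical semivariogram at separation h is half the mean squared difference between all pairs of points separated by approximately h. The brute-force sketch below computes this directly from point coordinates; it is not the fast algorithm used by geogals, and the name `naive_semivariogram` and its argument layout are assumptions made for illustration.

```python
import numpy as np

def naive_semivariogram(coords, z, bin_size, d_lim):
    """Hypothetical O(N^2) semivariogram: returns (svg, bin centres, pair counts)."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)

    # All unique pairs of points (i < j).
    i, j = np.triu_indices(len(z), k=1)
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq_diff = 0.5 * (z[i] - z[j]) ** 2

    # Bins of width bin_size: [0, bin_size), [bin_size, 2*bin_size), ...
    edges = np.arange(0, d_lim + bin_size, bin_size)
    n_bins = len(edges) - 1
    idx = np.digitize(d, edges) - 1
    keep = (idx >= 0) & (idx < n_bins)

    counts = np.bincount(idx[keep], minlength=n_bins)
    sums = np.bincount(idx[keep], weights=half_sq_diff[keep], minlength=n_bins)

    # Empty bins yield NaN rather than raising a warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        svg = sums / counts
    centres = 0.5 * (edges[:-1] + edges[1:])
    return svg, centres, counts
```

The cost of this version grows with the square of the number of data points, which is the motivation for a faster grid-based algorithm on full galaxy maps.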

geogals.fit_exp_cov_model(data_dict, meta, n_samples, n_walkers, backend_f, init_theta, init_unc_theta)

Fits a geostatistical model to the supplied data using emcee, accounting for (1) a radially linear mean trend, and (2) small fluctuations that are exponentially correlated.

Parameters:
  • data_dict (dict) – Contains RA, DEC, Z, e_Z for each measured value of our random field.

  • meta (dict) – Metadata for this galaxy. Must contain PA (position angle, degrees); i (inclination, degrees) and D (distance, Mpc) for this galaxy, as well as its central RA and DEC.

  • n_samples (int) – Hyperparameter for emcee; controls how many samples are drawn for each walker

  • n_walkers (int) – Hyperparameter for emcee; controls how many walkers are used to sample from the posterior.

  • backend_f (str) – Filename for where to store emcee results (.hdf5)

  • init_theta ((4,) tuple) – Initial values assumed for log_variance, correlation scale, central value of random field, and radial gradient of random field.

  • init_unc_theta ((4,) tuple) – Initial uncertainties assumed for the sample parameters

Returns:

f_acc – mean acceptance fraction over all chains.

Return type:

float

geogals.fit_radial_linear_trend(data_dict, meta, return_covariances=False)

Fits a radial trend to the galaxy data. Designed for computing metallicity gradients; other mean models may be required for other galaxy data (e.g. velocities). Does not account for small-scale variations in the data.

Parameters:
  • data_dict (dict) – Contains RA, DEC, Z, e_Z for each measured value of our random field.

  • meta (dict) –

    Metadata used to calculate the distances. Must contain:

    RA: float

    Right ascension of the centre of the galaxy.

    DEC: float

    Declination of the centre of the galaxy.

    PA: float

    Position angle of the galaxy, degrees.

    i: float

    Inclination of the galaxy along this principal axis, degrees.

    D: float

    Distance from this galaxy to Earth, Mpc.

  • return_covariances (bool) – If True, covariances of parameters will be returned as well.

Returns:

  • params (list) – Central value and radial gradient of the random field.

  • covariance (array, optional) – Covariance matrix for the returned parameters. Only returned if return_covariances is True.
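
A radially linear trend Z = Z_c + gradZ * r with known per-point uncertainties can be fitted by weighted least squares. The sketch below shows that generic technique using the normal equations; whether geogals uses this exact estimator is not stated here, and the function name `weighted_linear_fit` is hypothetical.

```python
import numpy as np

def weighted_linear_fit(r, Z, e_Z):
    """Hypothetical sketch: weighted least-squares fit of Z = Z_c + gradZ * r.

    Returns (params, cov), where params = [Z_c, gradZ] and cov is the
    parameter covariance implied by the supplied uncertainties e_Z.
    """
    r = np.asarray(r, dtype=float)
    Z = np.asarray(Z, dtype=float)
    e_Z = np.asarray(e_Z, dtype=float)

    # Design matrix: a constant term and a term linear in radius.
    A = np.column_stack([np.ones_like(r), r])
    w = 1.0 / e_Z**2  # inverse-variance weights

    # Normal equations: (A^T W A) params = A^T W Z.
    AtWA = A.T @ (A * w[:, None])
    cov = np.linalg.inv(AtWA)
    params = cov @ (A.T @ (w * Z))
    return params, cov
```

Data that lie exactly on a line are recovered exactly, whatever weights are supplied.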

geogals.generate_residual_Z_grid(Z_grid, e_Z_grid, header, meta)

Finds and subtracts a radial trend in Z_grid.

geogals.get_subsample(data_dict, n_in_subsample)

Selects n_in_subsample elements from a supplied data_dict.

geogals.globalize_data(loc_Z, loc_e_Z, loc_r, loc_dist_matrix, loc_init_theta, loc_init_unc_theta)

Turn local variables into global ones (not strictly necessary, but better for transparency)

geogals.krig_exp_model(RA, DEC, Z_df, meta, theta, mode='grid')

Performs universal kriging on a model grid of RA and DEC. Uses the package's distance function, a choice of covariance function, the best fitting value of f_d (no restrictions), and assumes Z ~ r + (random effects).

Uses equations presented in ‘Spatio-Temporal Statistics with R’, available for free at https://spacetimewithr.org

Parameters:
  • RA (np array) – Array of RA values. Must be in degrees.

  • DEC ((N,) np array) – Array of DEC values. Must be in degrees.

  • Z_df (dict) – Data dict containing RA, DEC, Z, e_Z for each data point.

  • meta (dict) –

    Metadata used to calculate the distances. Must contain:

    PA: float

    Position angle of the galaxy, degrees.

    i: float

    Inclination of the galaxy along this principal axis, degrees.

    D: float

    Distance from this galaxy to Earth, Mpc.

  • theta ((4,) tuple) – Contains model parameters (log_Var, phi, Zc, gradZ)

  • mode (str) –

    Options include:

    ‘grid’: make a grid of all possible combinations of the supplied RA and DEC values; use kriging to estimate Z at each point on the grid.

    ‘list’: just use pairs of RA and DEC values as they are given. RA and DEC must have the same length.

    ‘auto’: get RA and DEC values from the data dict itself.

Returns:

  • Z_pred_matrix ((M,N) np array) – interpolated (kriged) values over the RA, DEC coords given.

  • var_matrix ((M,N) np array) – variances for these predictions
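
The difference between the ‘grid’ and ‘list’ modes can be pictured with plain NumPy. The snippet below only illustrates how the prediction locations differ; it does not call geogals, and the orientation of the resulting grid inside krig_exp_model may differ from what `np.meshgrid` produces here.

```python
import numpy as np

RA = np.array([10.0, 10.1, 10.2])
DEC = np.array([-5.0, -4.9])

# 'grid'-style: every combination of the supplied RA and DEC values.
RA_grid, DEC_grid = np.meshgrid(RA, DEC)
grid_points = np.column_stack([RA_grid.ravel(), DEC_grid.ravel()])

# 'list'-style: RA and DEC are paired element by element,
# so both arrays must have the same length.
RA_list = np.array([10.0, 10.1])
DEC_list = np.array([-5.0, -4.9])
list_points = np.column_stack([RA_list, DEC_list])
```

Here the grid has 3 x 2 = 6 prediction locations, while the list has only the 2 locations that were supplied as pairs.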

geogals.log_gamma_prior_tenth_to_ten(x)

Log of a gamma prior with 1% probability of being below 0.1 or above 10.

geogals.log_likelihood_exp_model(theta, priors=True)

Log-probability function sampled by emcee to find the best parameters for our model of the random field, which has radially linear mean values for Z and exponentially correlated random effects.

The following must be defined as global variables:

Z: (N,) np.array

Observations at N data points

r: (N,) np.array

Covariate that is each spaxel’s distance from the galaxy center.

e_Z: (N,N) np.array

matrix of variance from observational error at all data points.

dist_matrix: (N,N) np.array

matrix of distance between all observed data points.

Parameters:
  • theta (4-tuple) –

    Contains:

    log_A, phi: model parameters for spatial_cov

    Z_c, gradZ: model parameters for the large scale gradient

  • priors (bool) – If True, adds the log of the provided priors to the log-likelihood. If False, computes the log-likelihood only.

Returns:

log_likelihood – The log likelihood of this model with the supplied parameters.

Return type:

float
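
For orientation, the log-likelihood of a multivariate Gaussian with a radially linear mean and an exponential spatial covariance can be written in a few lines. The sketch below takes its inputs as arguments instead of global variables, assumes that log_A is a natural logarithm, and assumes that the covariance takes the form exp(log_A) * exp(-d / phi) plus the observational error covariance. These are assumptions for illustration; the exact parameterisation used by geogals may differ, and no priors are included.

```python
import numpy as np

def gaussian_log_likelihood(theta, Z, r, err_cov, dist_matrix):
    """Hypothetical sketch of a Gaussian log-likelihood for this kind of model."""
    log_A, phi, Z_c, grad_Z = theta

    # Radially linear mean trend.
    mean = Z_c + grad_Z * r

    # Exponential spatial covariance plus observational error covariance.
    cov = np.exp(log_A) * np.exp(-dist_matrix / phi) + err_cov

    resid = Z - mean
    # slogdet and solve avoid forming an explicit inverse of cov.
    _, logdet = np.linalg.slogdet(cov)
    chi2 = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (chi2 + logdet + len(Z) * np.log(2.0 * np.pi))
```

For a single data point with unit variance and zero residual, this reduces to -0.5 * log(2 * pi).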

geogals.log_normal_prior(x, mu, sigma)

geogals.log_prior(theta, init_theta, init_unc_theta)

A prior for the parameters of the model – feel free to tweak or add your own!

geogals.make_RA_DEC_grid(header)

Given an HDU header, creates a grid of RA/DEC values, one for each pixel in that file.

geogals.make_physical_lag_grid(header, meta)

Given an HDU header, creates a grid of RA/DEC values, one for each pixel in that file.

Parameters:
  • header (hdu header file) – Must contain wcs

  • meta (dict) – Must contain RA, DEC of the galaxy centre, and PA, i, and D to get the galaxy’s absolute units (D should be in units of megaparsecs; PA and i should be in units of degrees).

geogals.to_data_dict(header, Z, e_Z)

Parameters:
  • header (hdu header file) – Must contain wcs

  • Z (np array) – Grid with values of our random field

  • e_Z (np array) – Same shape as Z. Gives the uncertainty of Z at each location.

Returns:

data_dict – Contains RA, DEC, Z and e_Z for every data point.

Return type:

dict