binny.utils.normalization module
Normalization utilities for 1D data arrays.
- binny.utils.normalization.as_bins_dict(bins: Mapping[int, Any]) -> dict[int, ndarray[tuple[Any, ...], dtype[float64]]]
Coerce a bins mapping to dict[int, float64 array].
This helper normalizes bin curve mappings provided by users (or returned by builders) into a consistent representation used across diagnostics. It keeps the public API flexible while ensuring diagnostics can assume a stable type.
- Parameters:
bins – Mapping of bin identifiers to bin curves.
- Returns:
A dictionary mapping integer bin indices to float64 arrays.
- binny.utils.normalization.as_float_array(x: Any, *, name: str) -> ndarray[tuple[Any, ...], dtype[float64]]
Coerce an array-like input to a float64 NumPy array.
This helper standardizes user inputs to a consistent dtype for numerical routines. It is used to keep user-facing APIs forgiving while ensuring that downstream computations receive a predictable array type.
- Parameters:
x – Array-like input.
name – Name of the input (used in error messages).
- Returns:
A 1D or nD NumPy array with dtype float64.
- Raises:
ValueError – If the input cannot be converted to a float array.
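The coercion described above can be sketched in a few lines of NumPy. This is an illustration only, not the library's implementation: the function name sketch_as_float_array and the wording of the error message are assumptions made for the example.

```python
import numpy as np


def sketch_as_float_array(x, *, name):
    """Hypothetical stand-in: coerce x to float64, naming the input on failure."""
    try:
        return np.asarray(x, dtype=np.float64)
    except (TypeError, ValueError) as exc:
        # Re-raise with the caller-supplied name so the message points at the bad input.
        raise ValueError(f"{name} could not be converted to a float array") from exc


arr = sketch_as_float_array([1, 2, 3], name="z")

try:
    sketch_as_float_array(["a", "b"], name="nz")
except ValueError as exc:
    message = str(exc)
```

Passing the name through lets a caller that validates several inputs report which one failed without wrapping each call separately.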
- binny.utils.normalization.cdf_from_curve(z: ndarray[tuple[Any, ...], dtype[float64]], nz: ndarray[tuple[Any, ...], dtype[float64]]) -> tuple[ndarray[tuple[Any, ...], dtype[float64]], float]
Builds a trapezoid cumulative mass function from a nonnegative curve.
The result is returned at the grid nodes, starting at zero, and accumulating trapezoid segment masses. This representation is convenient for computing weighted quantiles on a discrete grid while keeping the total mass explicit.
- Parameters:
z – One-dimensional grid of nodes.
nz – Nonnegative curve values evaluated on z.
- Returns:
A tuple (cdf, norm) where cdf is the cumulative trapezoid mass at each node (dtype float64) and norm is the total mass.
- Raises:
ValueError – If z or nz are not 1D, have mismatched shapes, contain non-finite values, have fewer than two points, or if z is not strictly increasing.
ValueError – If any values of nz are negative.
ValueError – If the total mass (trapezoid integral) is non-positive.
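As a rough sketch of the cumulative trapezoid construction described above, the segment masses can be accumulated with a running sum that starts at zero. The function name sketch_cdf_from_curve is hypothetical, and the input validation listed under Raises is omitted for brevity.

```python
import numpy as np


def sketch_cdf_from_curve(z, nz):
    """Hypothetical stand-in: cumulative trapezoid mass at each grid node."""
    z = np.asarray(z, dtype=np.float64)
    nz = np.asarray(nz, dtype=np.float64)
    # Mass of each trapezoid segment between adjacent nodes.
    segment_mass = 0.5 * (nz[1:] + nz[:-1]) * np.diff(z)
    # Zero at the first node, then the running sum of segment masses.
    cdf = np.concatenate(([0.0], np.cumsum(segment_mass)))
    return cdf, float(cdf[-1])


z = np.array([0.0, 1.0, 2.0, 4.0])
nz = np.array([0.0, 2.0, 2.0, 0.0])
cdf, norm = sketch_cdf_from_curve(z, nz)
# Segment masses are 1, 2, 2, so cdf is [0, 1, 3, 5] and norm is 5.
```

Keeping norm separate from cdf means the cumulative array does not have to be rescaled before quantile lookups.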
- binny.utils.normalization.curve_norm_mode(*, required: bool, assume_normalized: bool, normalize_if_needed: bool) -> Literal['none', 'normalize', 'check']
Resolves how to treat curve normalization for a given metric call.
- Parameters:
required – Whether the chosen metric expects normalized curves.
assume_normalized – User intent: treat curves as normalized.
normalize_if_needed – If True, renormalize curves when they do not appear normalized and normalization is required.
- Returns:
One of "none", "normalize", or "check" to pass as curve_norm into binny.utils.normalization.prepare_metric_inputs().
- binny.utils.normalization.integrate_bins(z: ndarray[tuple[Any, ...], dtype[float64]], bins: Mapping[int, ndarray[tuple[Any, ...], dtype[float64]]]) -> dict[int, float]
Computes trapezoid integrals for multiple curves evaluated on a shared grid.
This is useful for quickly checking per-bin masses of a collection of sampled distributions (e.g., tomographic n_i(z) curves) defined on the same strictly increasing axis.
- Parameters:
z – One-dimensional grid shared by all curves.
bins – Mapping from bin index to curve values evaluated on z.
- Returns:
A mapping {bin_idx: integral} of trapezoid areas.
- Raises:
ValueError – If bins is empty.
ValueError – If z or any curve is not 1D, has mismatched length with z, contains non-finite values, has fewer than two points, or if z is not strictly increasing. The error message is annotated with the offending bin index.
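The per-bin integration can be illustrated with a small loop over the mapping. This is a sketch under stated assumptions: sketch_integrate_bins is a hypothetical name, only the empty-mapping and length checks are reproduced, and the error wording is invented for the example.

```python
import numpy as np


def sketch_integrate_bins(z, bins):
    """Hypothetical stand-in: trapezoid integral of every curve on a shared grid."""
    if not bins:
        raise ValueError("bins must not be empty")
    z = np.asarray(z, dtype=np.float64)
    dz = np.diff(z)  # computed once and reused for every bin
    out = {}
    for idx, curve in bins.items():
        y = np.asarray(curve, dtype=np.float64)
        if y.shape != z.shape:
            # Mention the bin index so the caller can find the offending curve.
            raise ValueError(f"bin {idx}: curve length does not match z")
        out[idx] = float(np.sum(0.5 * (y[1:] + y[:-1]) * dz))
    return out


z = np.linspace(0.0, 2.0, 5)
bins = {0: np.ones_like(z), 1: z}
masses = sketch_integrate_bins(z, bins)
# Both curves integrate to 2 on [0, 2]; the trapezoid rule is exact for linear curves.
```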
- binny.utils.normalization.normalize_1d(x: ndarray[tuple[Any, ...], dtype[float64]], y: ndarray[tuple[Any, ...], dtype[float64]], *, method: Literal['trapezoid', 'simpson'] = 'trapezoid') -> ndarray[tuple[Any, ...], dtype[float64]]
Returns y scaled so that its integral over x is 1.
This is commonly used to normalize sampled 1D curves (e.g., probability densities or redshift distributions) defined on a strictly increasing grid, so they can be compared consistently or interpreted as unit-mass functions.
- Parameters:
x – One-dimensional grid of sample locations.
y – Values evaluated on x.
method – Numerical integration rule used to compute the normalization.
- Returns:
The normalized values as a float64 NumPy array.
- Raises:
ValueError – If x or y are not 1D, have mismatched shapes, contain non-finite values, have fewer than two points, or if x is not strictly increasing.
ValueError – If method is not one of "trapezoid" or "simpson".
ValueError – If the computed normalization factor is non-positive.
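A minimal sketch of the trapezoid variant follows; the Simpson variant is not shown. The names sketch_normalize_1d and _trapezoid are hypothetical, and only the non-positive-norm check from the Raises list is reproduced.

```python
import numpy as np


def _trapezoid(x, y):
    """Trapezoid integral of y over x, written out to avoid NumPy version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def sketch_normalize_1d(x, y):
    """Hypothetical stand-in: rescale y so its trapezoid integral over x is 1."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    norm = _trapezoid(x, y)
    if norm <= 0.0:
        raise ValueError("normalization factor must be positive")
    return y / norm


x = np.linspace(0.0, 3.0, 7)
y = np.exp(-x)
y_unit = sketch_normalize_1d(x, y)
unit_mass = _trapezoid(x, y_unit)

# Rescaling the input by a positive constant leaves the normalized curve unchanged.
y_unit_scaled = sketch_normalize_1d(x, 10.0 * y)
```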
- binny.utils.normalization.normalize_edges(bin_indices: Sequence[int], bin_edges: Mapping[int, tuple[float, float]] | Sequence[float] | ndarray) -> dict[int, tuple[float, float]]
Normalizes bin-edge specifications to a mapping of (lo, hi) per bin index.
- Parameters:
bin_indices – Sorted bin indices present in the bins mapping.
bin_edges – Either a mapping {i: (lo, hi)} or an edge array where bin i uses (edges[i], edges[i+1]).
- Returns:
Mapping {i: (lo, hi)} for all i in bin_indices.
- Raises:
ValueError – If required edges are missing or invalid.
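The two accepted input shapes can be illustrated as follows. This is a sketch, not the library's code: sketch_normalize_edges is a hypothetical name, and treating lo < hi as the validity criterion is an assumption, since the documentation only says edges must not be "missing or invalid".

```python
from collections.abc import Mapping


def sketch_normalize_edges(bin_indices, bin_edges):
    """Hypothetical stand-in: return {i: (lo, hi)} for every requested bin index."""
    out = {}
    if isinstance(bin_edges, Mapping):
        for i in bin_indices:
            if i not in bin_edges:
                raise ValueError(f"missing edges for bin {i}")
            lo, hi = bin_edges[i]
            out[i] = (float(lo), float(hi))
    else:
        edges = [float(e) for e in bin_edges]
        for i in bin_indices:
            # Bin i needs both edges[i] and edges[i + 1] to exist.
            if i < 0 or i + 1 >= len(edges):
                raise ValueError(f"edge array too short for bin {i}")
            out[i] = (edges[i], edges[i + 1])
    for i, (lo, hi) in out.items():
        if not lo < hi:  # assumed validity rule for this sketch
            raise ValueError(f"bin {i}: expected lo < hi, got ({lo}, {hi})")
    return out


from_array = sketch_normalize_edges([0, 1], [0.0, 0.5, 1.0])
from_mapping = sketch_normalize_edges([0, 1], {0: (0.0, 0.5), 1: (0.5, 1.0)})
```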
- binny.utils.normalization.normalize_or_check_curves(z_arr: ndarray, p: Mapping[int, ndarray], *, normalize: bool, check_normalized: bool, rtol: float = 0.001, atol: float = 1e-06, warn_if_already_normalized: bool = False) -> dict[int, ndarray]
Returns curves that are normalized and/or validated for unit integral.
This is a convenience helper for collections of sampled curves on a shared, strictly increasing grid. It can enforce that each curve integrates to one (within tolerance), and it can also normalize curves by dividing by their trapezoid integral.
- Parameters:
z_arr – Shared 1D grid of nodes.
p – Mapping from bin id to curve values evaluated on z_arr.
normalize – Whether to divide each curve by its trapezoid integral.
check_normalized – Whether to require each curve to have unit integral within rtol/atol.
rtol – Relative tolerance used for the unit-integral check.
atol – Absolute tolerance used for the unit-integral check.
warn_if_already_normalized – Whether to warn (when normalizing) if a curve already appears normalized within tolerance.
- Returns:
A new mapping from bin id to curve arrays (normalized if requested).
- Raises:
ValueError – If z_arr or any curve is not 1D, has mismatched length with z_arr, contains non-finite values, has fewer than two points, or if z_arr is not strictly increasing.
ValueError – If any curve has a non-positive trapezoid integral.
ValueError – If check_normalized is True and any curve is not within tolerance of unit integral.
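To make the tolerance check concrete, the sketch below reports which curves fail a unit-integral test. It assumes the comparison behaves like np.isclose(mass, 1.0, rtol=rtol, atol=atol), which the documentation does not confirm, and it returns the failures instead of raising ValueError as the documented function does. The name sketch_check_unit_integral is hypothetical.

```python
import numpy as np


def sketch_check_unit_integral(z, curves, *, rtol=1e-3, atol=1e-6):
    """Hypothetical stand-in: map bin id to integral for curves that are not unit-mass."""
    z = np.asarray(z, dtype=np.float64)
    dz = np.diff(z)
    failures = {}
    for idx, curve in curves.items():
        y = np.asarray(curve, dtype=np.float64)
        mass = float(np.sum(0.5 * (y[1:] + y[:-1]) * dz))
        # Assumed rule: |mass - 1| <= atol + rtol * 1.
        if not np.isclose(mass, 1.0, rtol=rtol, atol=atol):
            failures[idx] = mass
    return failures


z = np.linspace(0.0, 1.0, 11)
curves = {
    0: np.ones_like(z),           # integral 1.0, passes
    1: 1.0005 * np.ones_like(z),  # off by 5e-4, inside the default tolerance
    2: 1.01 * np.ones_like(z),    # off by 1e-2, outside the default tolerance
}
failures = sketch_check_unit_integral(z, curves)
```

With the default rtol of 0.001, only the curve that is 1% off is flagged.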
- binny.utils.normalization.normalize_over_z(z: ndarray[tuple[Any, ...], dtype[float64]], nz: ndarray[tuple[Any, ...], dtype[float64]]) -> ndarray[tuple[Any, ...], dtype[float64]]
Normalizes nz so that it integrates to 1 over z.
- binny.utils.normalization.prepare_metric_inputs(z: Any, bins: Mapping[int, Any], *, mode: Literal['curves', 'segments_prob'], curve_norm: Literal['none', 'normalize', 'check'] = 'none', rtol: float = 0.001, atol: float = 1e-06) -> tuple[ndarray, dict[int, ndarray]]
Validates bin curves and prepares curve- or segment-mass inputs for metrics.
- Parameters:
z – Shared 1D grid of nodes.
bins – Mapping from bin index to curve values evaluated on z.
mode – "curves" to return node values; "segments_prob" to return per-segment probability masses (length len(z)-1).
curve_norm – Normalization handling:
  - "none": no normalization/checking beyond basic validation.
  - "normalize": divide each curve by its trapezoid integral.
  - "check": require each curve to have unit integral (within tol).
rtol – Relative tolerance for unit-integral checks.
atol – Absolute tolerance for unit-integral checks.
- Returns:
A tuple (z_arr, out), where out is:
  - curves dict {i: y(z)} if mode="curves"
  - segment probs dict {i: p_k} if mode="segments_prob"
- Raises:
ValueError – If inputs are invalid, or normalization checks fail.
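The difference between node values and per-segment masses can be sketched as below. The example normalizes a curve first and then converts it to trapezoid segment masses, roughly what mode="segments_prob" combined with curve_norm="normalize" is described as producing. The name sketch_segment_probs is hypothetical, and the actual function may differ in details not stated in the documentation.

```python
import numpy as np


def sketch_segment_probs(z, y):
    """Hypothetical stand-in: normalize y, then return the mass of each grid segment."""
    z = np.asarray(z, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    dz = np.diff(z)
    total = np.sum(0.5 * (y[1:] + y[:-1]) * dz)
    y_unit = y / total
    # One trapezoid mass per interval [z[k], z[k + 1]], so the result has len(z) - 1 entries.
    return 0.5 * (y_unit[1:] + y_unit[:-1]) * dz


z = np.array([0.0, 0.5, 1.0, 2.0])
y = np.array([0.0, 1.0, 1.0, 0.0])
p = sketch_segment_probs(z, y)
# Raw segment masses are 0.25, 0.5, 0.5 (total 1.25), so p is [0.2, 0.4, 0.4].
```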
- binny.utils.normalization.require_bins(bins: Mapping[int, Any] | None, *, cached: Mapping[int, Any] | None = None, name: str = 'bins') -> dict[int, ndarray[tuple[Any, ...], dtype[float64]]]
Resolves bins from an explicit argument or cached bins.
This helper supports wrapper-style APIs where diagnostics accept an optional bins argument but may also use bins cached on an instance.
- Parameters:
bins – Optional bins mapping provided by the caller.
cached – Optional cached bins mapping (for wrapper classes).
name – Name used in error messages.
- Returns:
A bins dictionary with integer keys and float64 arrays.
- Raises:
ValueError – If neither bins nor cached is provided.
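The fallback behaviour can be sketched as below. Two details are assumptions made for illustration: that an explicit bins argument takes precedence over cached, and the wording of the error message. The name sketch_require_bins is hypothetical.

```python
import numpy as np


def sketch_require_bins(bins, *, cached=None, name="bins"):
    """Hypothetical stand-in: pick explicit bins, else cached bins, else fail."""
    source = bins if bins is not None else cached  # assumed precedence
    if source is None:
        raise ValueError(f"{name} must be provided or cached")
    # Same target representation as the rest of the module: int keys, float64 arrays.
    return {int(k): np.asarray(v, dtype=np.float64) for k, v in source.items()}


cached_bins = {0: [0.0, 1.0, 0.0]}
resolved = sketch_require_bins(None, cached=cached_bins)

try:
    sketch_require_bins(None, name="tomo_bins")
except ValueError as exc:
    missing_message = str(exc)
```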
- binny.utils.normalization.trapz_weights(z_arr: ndarray) -> ndarray[tuple[Any, ...], dtype[float64]]
Returns trapezoid-rule integration weights for a strictly increasing 1D grid.
The returned weights satisfy np.trapezoid(f, x=z_arr) == np.sum(w * f) for arrays f evaluated at the grid nodes, which is useful for vectorized integrations and repeated inner products on a fixed axis.
- Parameters:
z_arr – 1D grid of nodes.
- Returns:
A float64 array of node weights with the same shape as z_arr. For grids with fewer than two points, the weights are all zeros.
- Raises:
ValueError – If z_arr is not 1D.
ValueError – If z_arr is not strictly increasing.
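One way to build such weights is to give each node half the width of every segment it touches. The sketch below is illustrative (sketch_trapz_weights is a hypothetical name, and validation is omitted); it checks the weighted sum against a segment-by-segment trapezoid sum.

```python
import numpy as np


def sketch_trapz_weights(z):
    """Hypothetical stand-in: node weights w with sum(w * f) equal to the trapezoid integral."""
    z = np.asarray(z, dtype=np.float64)
    w = np.zeros_like(z)
    if z.size < 2:
        return w  # no segments, so every weight is zero
    dz = np.diff(z)
    w[:-1] += 0.5 * dz  # each segment contributes half its width to its left node
    w[1:] += 0.5 * dz   # and half to its right node
    return w


z = np.array([0.0, 1.0, 3.0, 6.0])
f = z**2
w = sketch_trapz_weights(z)  # [0.5, 1.5, 2.5, 1.5]

weighted_sum = float(np.sum(w * f))
segment_sum = float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(z)))
```

Precomputing w pays off when many curves share one grid, because each integral reduces to a dot product.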
- binny.utils.normalization.weighted_quantile_from_cdf(z_arr: ndarray, cdf: ndarray, norm: float, q: float, *, side: Literal['left', 'right'] = 'left') -> float
Returns a weighted quantile from a precomputed cumulative mass array.
This finds the location where the cumulative mass reaches q * norm and linearly interpolates between adjacent grid nodes. It is intended to be used with cumulative masses produced by trapezoid integration on the same node grid.
- Parameters:
z_arr – 1D array of strictly increasing grid nodes.
cdf – 1D array of cumulative masses at the nodes (nondecreasing).
norm – Total mass associated with the CDF.
q – Quantile in the interval [0, 1].
side – Side argument forwarded to np.searchsorted for locating the target.
- Returns:
The weighted quantile value on the z_arr grid.
- Raises:
ValueError – If q is outside [0, 1].
ValueError – If norm is not positive.
ValueError – If z_arr and cdf are not 1D arrays of the same nonzero length.
ValueError – If z_arr is not strictly increasing.
ValueError – If cdf is not nondecreasing.
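The lookup-and-interpolate step can be sketched with np.searchsorted. This is an illustration under assumptions: sketch_quantile_from_cdf is a hypothetical name, the handling of the end points and of flat CDF segments is a guess at reasonable behaviour, and the validation listed under Raises is omitted.

```python
import numpy as np


def sketch_quantile_from_cdf(z, cdf, norm, q, *, side="left"):
    """Hypothetical stand-in: z value where the cumulative mass reaches q * norm."""
    z = np.asarray(z, dtype=np.float64)
    cdf = np.asarray(cdf, dtype=np.float64)
    target = q * norm
    # Index of the first node whose cumulative mass reaches (or exceeds) the target.
    j = int(np.searchsorted(cdf, target, side=side))
    if j <= 0:
        return float(z[0])
    if j >= z.size:
        return float(z[-1])
    c0, c1 = cdf[j - 1], cdf[j]
    if c1 == c0:
        return float(z[j])  # flat segment: assumed to return the right-hand node
    t = (target - c0) / (c1 - c0)
    return float(z[j - 1] + t * (z[j] - z[j - 1]))


z = np.array([0.0, 1.0, 2.0, 4.0])
cdf = np.array([0.0, 1.0, 3.0, 5.0])  # cumulative trapezoid masses at the nodes
norm = 5.0

median = sketch_quantile_from_cdf(z, cdf, norm, 0.5)
lowest = sketch_quantile_from_cdf(z, cdf, norm, 0.0)
highest = sketch_quantile_from_cdf(z, cdf, norm, 1.0)
# The target mass 2.5 lies 75% of the way from cdf=1 (z=1) to cdf=3 (z=2), so median is 1.75.
```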