binny.utils.normalization module

Normalization utilities for 1D data arrays.

binny.utils.normalization.as_bins_dict(bins: Mapping[int, Any]) → dict[int, ndarray[tuple[Any, ...], dtype[float64]]]

Coerces a bins mapping to a dict with integer keys and float64 array values.

This helper normalizes bin curve mappings provided by users (or returned by builders) into a consistent representation used across diagnostics. It keeps the public API flexible while ensuring diagnostics can assume a stable type.

Parameters:

bins – Mapping of bin identifiers to bin curves.

Returns:

A dictionary mapping integer bin indices to float64 arrays.

binny.utils.normalization.as_float_array(x: Any, *, name: str) → ndarray[tuple[Any, ...], dtype[float64]]

Coerces an array-like input to a float64 NumPy array.

This helper standardizes user inputs to a consistent dtype for numerical routines. It is used to keep user-facing APIs forgiving while ensuring that downstream computations receive a predictable array type.

Parameters:

x – Array-like input.
name – Name of the input (used in error messages).

Returns:

A 1D or nD NumPy array with dtype float64.

Raises:

ValueError – If the input cannot be converted to a float array.
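
The coercion described above can be sketched as follows. This is a minimal illustration, not binny's implementation: the name as_float_array_sketch and the error message wording are assumptions, and only standard NumPy calls are used.

```python
import numpy as np


def as_float_array_sketch(x, *, name):
    """Hypothetical stand-in: coerce array-like input to a float64 array."""
    try:
        return np.asarray(x, dtype=np.float64)
    except (TypeError, ValueError) as exc:
        # Re-raise with the input's name so the caller sees which argument failed.
        raise ValueError(f"{name} could not be converted to a float array") from exc


arr = as_float_array_sketch([1, 2, 3], name="z")
print(arr.dtype)  # float64
```

Passing the name through keeps error messages specific when several inputs are validated in the same call.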

binny.utils.normalization.cdf_from_curve(z: ndarray[tuple[Any, ...], dtype[float64]], nz: ndarray[tuple[Any, ...], dtype[float64]]) → tuple[ndarray[tuple[Any, ...], dtype[float64]], float]

Builds a trapezoid cumulative mass function from a nonnegative curve.

The result is returned at the grid nodes, starting at zero, and accumulating trapezoid segment masses. This representation is convenient for computing weighted quantiles on a discrete grid while keeping the total mass explicit.

Parameters:

z – One-dimensional grid of nodes.
nz – Nonnegative curve values evaluated on z.

Returns:

A tuple (cdf, norm) where cdf is the cumulative trapezoid mass at each node (dtype float64) and norm is the total mass.

Raises:

ValueError – If z or nz are not 1D, have mismatched shapes, contain non-finite values, have fewer than two points, or if z is not strictly increasing.
ValueError – If any values of nz are negative.
ValueError – If the total mass (trapezoid integral) is non-positive.
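
A trapezoid cumulative mass of this kind can be sketched as below. The function name trapezoid_cdf_sketch is hypothetical, and the sketch omits the input validation listed under Raises; it only illustrates the accumulation of segment masses starting from zero.

```python
import numpy as np


def trapezoid_cdf_sketch(z, nz):
    """Hypothetical stand-in: cumulative trapezoid mass at each grid node."""
    z = np.asarray(z, dtype=np.float64)
    nz = np.asarray(nz, dtype=np.float64)
    # Mass of each segment [z_k, z_{k+1}] under the trapezoid rule.
    seg = 0.5 * np.diff(z) * (nz[1:] + nz[:-1])
    # Cumulative mass at the nodes, starting at zero at the first node.
    cdf = np.concatenate(([0.0], np.cumsum(seg)))
    return cdf, float(cdf[-1])


z = np.array([0.0, 1.0, 2.0])
nz = np.array([0.0, 2.0, 0.0])
cdf, norm = trapezoid_cdf_sketch(z, nz)
print(cdf.tolist(), norm)  # [0.0, 1.0, 2.0] 2.0
```

Returning the total mass separately means the cumulative array does not need to be rescaled before quantiles are computed.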

binny.utils.normalization.curve_norm_mode(*, required: bool, assume_normalized: bool, normalize_if_needed: bool) → Literal['none', 'normalize', 'check']

Resolves how to treat curve normalization for a given metric call.

Parameters:

required – Whether the chosen metric expects normalized curves.
assume_normalized – User intent: treat curves as normalized.
normalize_if_needed – If True, renormalize curves when they do not appear normalized and normalization is required.

Returns:

One of "none", "normalize", or "check" to pass as curve_norm into binny.utils.normalization.prepare_metric_inputs().

binny.utils.normalization.integrate_bins(z: ndarray[tuple[Any, ...], dtype[float64]], bins: Mapping[int, ndarray[tuple[Any, ...], dtype[float64]]]) → dict[int, float]

Computes trapezoid integrals for multiple curves evaluated on a shared grid.

This is useful for quickly checking per-bin masses of a collection of sampled distributions (e.g., tomographic n_i(z) curves) defined on the same strictly increasing axis.

Parameters:

z – One-dimensional grid shared by all curves.
bins – Mapping from bin index to curve values evaluated on z.

Returns:

A mapping {bin_idx: integral} of trapezoid areas.

Raises:

ValueError – If bins is empty.
ValueError – If z or any curve is not 1D, has mismatched length with z, contains non-finite values, has fewer than two points, or if z is not strictly increasing. The error message is annotated with the offending bin index.
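
As a rough illustration of per-bin trapezoid integration on a shared grid, consider the sketch below. The name integrate_bins_sketch is hypothetical and the validation described under Raises is left out.

```python
import numpy as np


def integrate_bins_sketch(z, bins):
    """Hypothetical stand-in: trapezoid integral of each curve on a shared grid."""
    z = np.asarray(z, dtype=np.float64)
    dz = np.diff(z)  # computed once and reused for every bin
    out = {}
    for idx, curve in bins.items():
        y = np.asarray(curve, dtype=np.float64)
        out[idx] = float(np.sum(0.5 * dz * (y[1:] + y[:-1])))
    return out


z = np.linspace(0.0, 1.0, 5)
bins = {0: np.ones_like(z), 1: 2.0 * z}
masses = integrate_bins_sketch(z, bins)
```

Both example curves integrate to one on [0, 1]; the trapezoid rule is exact for them because they are piecewise linear.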

binny.utils.normalization.normalize_1d(x: ndarray[tuple[Any, ...], dtype[float64]], y: ndarray[tuple[Any, ...], dtype[float64]], *, method: Literal['trapezoid', 'simpson'] = 'trapezoid') → ndarray[tuple[Any, ...], dtype[float64]]

Returns y scaled so that its integral over x is 1.

This is commonly used to normalize sampled 1D curves (e.g., probability densities or redshift distributions) defined on a strictly increasing grid, so they can be compared consistently or interpreted as unit-mass functions.

Parameters:

x – One-dimensional grid of sample locations.
y – Values evaluated on x.
method – Numerical integration rule used to compute the normalization.

Returns:

The normalized values as a float64 NumPy array.

Raises:

ValueError – If x or y are not 1D, have mismatched shapes, contain non-finite values, have fewer than two points, or if x is not strictly increasing.
ValueError – If method is not one of "trapezoid" or "simpson".
ValueError – If the computed normalization factor is non-positive.
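
The trapezoid variant of this normalization can be sketched as follows. The helper names are hypothetical, only the "trapezoid" rule is shown, and the error message text is an assumption.

```python
import numpy as np


def trapezoid_area(x, y):
    """Trapezoid-rule integral of y over x, written out explicitly."""
    return float(np.sum(0.5 * np.diff(x) * (y[1:] + y[:-1])))


def normalize_trapezoid_sketch(x, y):
    """Hypothetical stand-in: scale y so its trapezoid integral over x is 1."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    norm = trapezoid_area(x, y)
    if norm <= 0.0:
        raise ValueError("normalization factor must be positive")
    return y / norm


x = np.linspace(0.0, 2.0, 201)
y = np.exp(-x)
y_unit = normalize_trapezoid_sketch(x, y)
area = trapezoid_area(x, y_unit)  # close to 1.0 up to rounding
```

Dividing by the same quadrature rule that later measures the area makes the result integrate to one under that rule, regardless of how coarse the grid is.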

binny.utils.normalization.normalize_edges(bin_indices: Sequence[int], bin_edges: Mapping[int, tuple[float, float]] | Sequence[float] | ndarray) → dict[int, tuple[float, float]]

Normalizes bin-edge specifications to a mapping of (lo, hi) per bin index.

Parameters:

bin_indices – Sorted bin indices present in the bins mapping.
bin_edges – Either a mapping {i: (lo, hi)} or an edge array where bin i uses (edges[i], edges[i+1]).

Returns:

A mapping {i: (lo, hi)} for all i in bin_indices.

Raises:

ValueError – If required edges are missing or invalid.
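
The two accepted edge formats can be illustrated with the sketch below. The name normalize_edges_sketch is hypothetical, and the only validation shown is a check for missing edges; what else counts as "invalid" is not specified here, so it is left out.

```python
from collections.abc import Mapping


def normalize_edges_sketch(bin_indices, bin_edges):
    """Hypothetical stand-in: return {i: (lo, hi)} for each requested bin."""
    if isinstance(bin_edges, Mapping):
        missing = [i for i in bin_indices if i not in bin_edges]
        if missing:
            raise ValueError(f"missing edges for bins {missing}")
        return {i: (float(bin_edges[i][0]), float(bin_edges[i][1])) for i in bin_indices}
    # Edge-array form: bin i uses (edges[i], edges[i + 1]).
    edges = [float(e) for e in bin_edges]
    out = {}
    for i in bin_indices:
        if i < 0 or i + 1 >= len(edges):
            raise ValueError(f"missing edges for bin {i}")
        out[i] = (edges[i], edges[i + 1])
    return out


from_array = normalize_edges_sketch([0, 1], [0.0, 0.5, 1.0])
from_mapping = normalize_edges_sketch([1], {1: (0.5, 1.0)})
print(from_array)  # {0: (0.0, 0.5), 1: (0.5, 1.0)}
```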

binny.utils.normalization.normalize_or_check_curves(z_arr: ndarray, p: Mapping[int, ndarray], *, normalize: bool, check_normalized: bool, rtol: float = 0.001, atol: float = 1e-06, warn_if_already_normalized: bool = False) → dict[int, ndarray]

Returns curves that are normalized and/or validated for unit integral.

This is a convenience helper for collections of sampled curves on a shared, strictly increasing grid. It can enforce that each curve integrates to one (within tolerance), and it can also normalize curves by dividing by their trapezoid integral.

Parameters:

z_arr – Shared 1D grid of nodes.
p – Mapping from bin id to curve values evaluated on z_arr.
normalize – Whether to divide each curve by its trapezoid integral.
check_normalized – Whether to require each curve to have unit integral within rtol/atol.
rtol – Relative tolerance used for the unit-integral check.
atol – Absolute tolerance used for the unit-integral check.
warn_if_already_normalized – Whether to warn (when normalizing) if a curve already appears normalized within tolerance.

Returns:

A new mapping from bin id to curve arrays (normalized if requested).

Raises:

ValueError – If z_arr or any curve is not 1D, has mismatched length with z_arr, contains non-finite values, has fewer than two points, or if z_arr is not strictly increasing.
ValueError – If any curve has a non-positive trapezoid integral.
ValueError – If check_normalized is True and any curve is not within tolerance of unit integral.
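
The interplay of normalizing and checking can be sketched as below. This is an illustration under assumptions, not binny's code: the function name is hypothetical, the tolerance test is assumed to follow np.isclose semantics, and the warning option is omitted.

```python
import numpy as np


def normalize_or_check_sketch(z, curves, *, normalize, check_normalized,
                              rtol=1e-3, atol=1e-6):
    """Hypothetical stand-in: normalize curves or verify unit integral."""
    z = np.asarray(z, dtype=np.float64)
    dz = np.diff(z)
    out = {}
    for idx, curve in curves.items():
        y = np.asarray(curve, dtype=np.float64)
        integral = float(np.sum(0.5 * dz * (y[1:] + y[:-1])))
        if integral <= 0.0:
            raise ValueError(f"bin {idx}: non-positive trapezoid integral")
        if normalize:
            y = y / integral  # unit integral by construction
        elif check_normalized and not np.isclose(integral, 1.0, rtol=rtol, atol=atol):
            # Assumed tolerance rule: |integral - 1| <= atol + rtol.
            raise ValueError(f"bin {idx}: integral {integral:.6g} is not close to 1")
        out[idx] = y
    return out


z = np.linspace(0.0, 1.0, 11)
raw = {0: 3.0 * np.ones_like(z)}
fixed = normalize_or_check_sketch(z, raw, normalize=True, check_normalized=False)
```

Running the check on raw instead of normalizing it would raise, since that curve integrates to about 3.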

binny.utils.normalization.normalize_over_z(z: ndarray[tuple[Any, ...], dtype[float64]], nz: ndarray[tuple[Any, ...], dtype[float64]]) → ndarray[tuple[Any, ...], dtype[float64]]

Normalizes nz so that it integrates to 1 over z.

binny.utils.normalization.prepare_metric_inputs(z: Any, bins: Mapping[int, Any], *, mode: Literal['curves', 'segments_prob'], curve_norm: Literal['none', 'normalize', 'check'] = 'none', rtol: float = 0.001, atol: float = 1e-06) → tuple[ndarray, dict[int, ndarray]]

Validates bin curves and prepares curve- or segment-mass inputs for metrics.

Parameters:

z – Shared 1D grid of nodes.
bins – Mapping from bin index to curve values evaluated on z.
mode – "curves" to return node values; "segments_prob" to return per-segment probability masses (length len(z)-1).
curve_norm – Normalization handling:
  - "none": no normalization/checking beyond basic validation.
  - "normalize": divide each curve by its trapezoid integral.
  - "check": require each curve to have unit integral (within tol).
rtol – Relative tolerance for unit-integral checks.
atol – Absolute tolerance for unit-integral checks.

Returns:

A tuple (z_arr, out), where out is:

curves dict {i: y(z)} if mode="curves"
segment probs dict {i: p_k} if mode="segments_prob"

Raises:

ValueError – If inputs are invalid, or normalization checks fail.
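
To make the "segments_prob" output shape concrete, the sketch below converts node values into per-segment masses. It assumes that each segment mass is the trapezoid area of that segment divided by the total, so the masses sum to one; that detail and the name segment_probs_sketch are assumptions, not confirmed behavior.

```python
import numpy as np


def segment_probs_sketch(z, y):
    """Hypothetical stand-in: per-segment trapezoid masses that sum to 1."""
    z = np.asarray(z, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    seg = 0.5 * np.diff(z) * (y[1:] + y[:-1])  # length len(z) - 1
    return seg / seg.sum()


z = np.array([0.0, 1.0, 2.0, 4.0])
y = np.ones_like(z)
p = segment_probs_sketch(z, y)
print(p.tolist())  # [0.25, 0.25, 0.5]
```

The last segment is twice as wide as the others, so a flat curve places half of the mass there.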

binny.utils.normalization.require_bins(bins: Mapping[int, Any] | None, *, cached: Mapping[int, Any] | None = None, name: str = 'bins') → dict[int, ndarray[tuple[Any, ...], dtype[float64]]]

Resolves bins from an explicit argument or cached bins.

This helper supports wrapper-style APIs where diagnostics accept an optional bins argument but may also use bins cached on an instance.

Parameters:

bins – Optional bins mapping provided by the caller.
cached – Optional cached bins mapping (for wrapper classes).
name – Name used in error messages.

Returns:

A bins dictionary with integer keys and float64 arrays.

Raises:

ValueError – If neither bins nor cached are provided.
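
The resolution pattern can be sketched as follows. The name require_bins_sketch is hypothetical, and the sketch assumes that an explicit bins argument takes precedence over cached bins, which is not stated above.

```python
import numpy as np


def require_bins_sketch(bins, *, cached=None, name="bins"):
    """Hypothetical stand-in: pick explicit bins, else cached bins, else fail."""
    source = bins if bins is not None else cached  # assumed precedence
    if source is None:
        raise ValueError(f"{name} must be provided or cached")
    # Coerce to the stable representation: int keys, float64 arrays.
    return {int(k): np.asarray(v, dtype=np.float64) for k, v in source.items()}


cached_bins = {0: [0.0, 1.0, 0.0]}
resolved = require_bins_sketch(None, cached=cached_bins)
print(resolved[0].dtype)  # float64
```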

binny.utils.normalization.trapz_weights(z_arr: ndarray) → ndarray[tuple[Any, ...], dtype[float64]]

Returns trapezoid-rule integration weights for a strictly increasing 1D grid.

The returned weights w satisfy np.sum(w * f) == np.trapezoid(f, x=z_arr), up to floating-point rounding, for arrays f evaluated at the grid nodes. This is useful for vectorized integrations and repeated inner products on a fixed axis.

Parameters:

z_arr – 1D grid of nodes.

Returns:

A float64 array of node weights with the same shape as z_arr. For grids with fewer than two points, the weights are all zeros.

Raises:

ValueError – If z_arr is not 1D.
ValueError – If z_arr is not strictly increasing.
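
Node weights with this property can be built by giving each node half the width of every segment it touches. The sketch below is illustrative only: the name trapz_weights_sketch is hypothetical and the validation under Raises is omitted.

```python
import numpy as np


def trapz_weights_sketch(z):
    """Hypothetical stand-in: trapezoid-rule node weights for a 1D grid."""
    z = np.asarray(z, dtype=np.float64)
    w = np.zeros_like(z)
    if z.size < 2:
        return w  # no segments, so all weights are zero
    dz = np.diff(z)
    w[:-1] += 0.5 * dz  # each segment's left node gets half its width
    w[1:] += 0.5 * dz   # and its right node gets the other half
    return w


z = np.array([0.0, 1.0, 3.0, 6.0])
f = z ** 2
w = trapz_weights_sketch(z)
print(w.tolist())            # [0.5, 1.5, 2.5, 1.5]
print(float(np.sum(w * f)))  # 78.0
```

The weighted sum matches the segment-by-segment trapezoid total (0.5 + 10 + 67.5), and the weights can be reused for any f sampled on the same grid.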

binny.utils.normalization.weighted_quantile_from_cdf(z_arr: ndarray, cdf: ndarray, norm: float, q: float, *, side: Literal['left', 'right'] = 'left') → float

Returns a weighted quantile from a precomputed cumulative mass array.

This finds the location where the cumulative mass reaches q * norm and linearly interpolates between adjacent grid nodes. It is intended to be used with cumulative masses produced by trapezoid integration on the same node grid.

Parameters:

z_arr – 1D array of strictly increasing grid nodes.
cdf – 1D array of cumulative masses at the nodes (nondecreasing).
norm – Total mass associated with the CDF.
q – Quantile in the interval [0, 1].
side – Side argument forwarded to np.searchsorted for locating the target.

Returns:

The weighted quantile value on the z_arr grid.

Raises:

ValueError – If q is outside [0, 1].
ValueError – If norm is not positive.
ValueError – If z_arr and cdf are not 1D arrays of the same nonzero length.
ValueError – If z_arr is not strictly increasing.
ValueError – If cdf is not nondecreasing.
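
The lookup and interpolation step can be sketched as below. The name quantile_from_cdf_sketch is hypothetical; the index clamping and the handling of flat CDF segments are assumptions made for the sketch, and input validation is omitted.

```python
import numpy as np


def quantile_from_cdf_sketch(z, cdf, norm, q, *, side="left"):
    """Hypothetical stand-in: invert a cumulative mass array at q * norm."""
    z = np.asarray(z, dtype=np.float64)
    cdf = np.asarray(cdf, dtype=np.float64)
    target = q * norm
    k = int(np.searchsorted(cdf, target, side=side))
    # Clamp so that (k - 1, k) is always a valid pair of adjacent nodes.
    k = min(max(k, 1), len(z) - 1)
    c0, c1 = cdf[k - 1], cdf[k]
    if c1 == c0:
        return float(z[k - 1])  # flat segment: assumed to return the left node
    t = (target - c0) / (c1 - c0)
    return float(z[k - 1] + t * (z[k] - z[k - 1]))


z = np.array([0.0, 1.0, 2.0])
cdf = np.array([0.0, 1.0, 2.0])
median = quantile_from_cdf_sketch(z, cdf, 2.0, 0.5)
print(median)  # 1.0
```

Because the cumulative mass here is linear in z, the quantile at q is simply q times the grid length, which makes the result easy to check by hand.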