binny.utils.validators module

Validation utilities for binning and axis-related functions.

binny.utils.validators.edge_coercion(bin_indices: Sequence[int], bin_edges: Mapping[int, tuple[float, float]] | Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]]) → dict[int, tuple[float, float]]

Returns a mapping from bin index to (lo, hi) edge pairs.

This normalizes bin-edge inputs into a consistent dictionary form. It supports either an explicit per-bin edge mapping or a single strictly increasing edge array interpreted in the standard way, where bin j corresponds to (edges[j], edges[j+1]).

Parameters:

bin_indices – Bin indices that must be present in the returned mapping.
bin_edges – Either a mapping {idx: (lo, hi)} or a 1D strictly increasing edge array [e0, e1, ..., eN].

Returns:

A mapping {idx: (lo, hi)} with float-valued edge pairs.

Raises:

ValueError – If a required bin index is missing from a mapping input.
ValueError – If an edge array is not 1D, has fewer than two entries, contains non-finite values, or is not strictly increasing.
ValueError – If any requested bin index is out of range for an edge array.
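
The normalization can be illustrated with a minimal, dependency-free sketch. It assumes plain Python sequences instead of NumPy arrays, and `coerce_edges` is a hypothetical stand-in rather than binny's implementation. The real function only guarantees that the requested indices are present; this sketch returns exactly those indices.

```python
from __future__ import annotations

import math
from collections.abc import Mapping, Sequence


def coerce_edges(
    bin_indices: Sequence[int],
    bin_edges: Mapping[int, tuple[float, float]] | Sequence[float],
) -> dict[int, tuple[float, float]]:
    """Normalize per-bin edges or an edge array to {idx: (lo, hi)} (sketch)."""
    out: dict[int, tuple[float, float]] = {}

    if isinstance(bin_edges, Mapping):
        # Mapping input: every requested index must be present.
        for idx in bin_indices:
            if idx not in bin_edges:
                raise ValueError(f"missing edges for bin index {idx}")
            lo, hi = bin_edges[idx]
            out[idx] = (float(lo), float(hi))
        return out

    # Edge-array input: 1D, at least two finite, strictly increasing values.
    if any(isinstance(e, Sequence) for e in bin_edges):
        raise ValueError("edge array must be 1D")
    edges = [float(e) for e in bin_edges]
    if len(edges) < 2:
        raise ValueError("edge array needs at least two entries")
    if not all(math.isfinite(e) for e in edges):
        raise ValueError("edge array contains non-finite values")
    if any(hi <= lo for lo, hi in zip(edges, edges[1:])):
        raise ValueError("edge array must be strictly increasing")

    n_bins = len(edges) - 1
    for idx in bin_indices:
        if not 0 <= idx < n_bins:
            raise ValueError(f"bin index {idx} is out of range for {n_bins} bins")
        # Standard convention: bin j spans (edges[j], edges[j + 1]).
        out[idx] = (edges[idx], edges[idx + 1])
    return out
```

For example, `coerce_edges([0, 2], [0.0, 1.0, 2.0, 4.0])` yields `{0: (0.0, 1.0), 2: (2.0, 4.0)}`.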

binny.utils.validators.resolve_binning_method(name: str) → str

Returns the canonical binning method identifier for a user-supplied name.

This provides a small normalization layer for user input by accepting common aliases (case-insensitive) and mapping them to the internal method names used throughout the package. Normalizing method names early makes downstream binning code simpler and ensures consistent behavior across APIs.

Parameters:

name – Binning method name or alias (case-insensitive).

Returns:

Canonical method name: one of "equidistant", "log", "equal_number", "equal_information", "equidistant_chi", or "geometric".

Return type:

str

Raises:

ValueError – If name is not a recognized method name or alias.
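
A minimal sketch of this kind of alias table, assuming simple lower-casing for case-insensitivity. The canonical names come from the documented return values; the entries in `_ALIASES` and the function name `resolve_method` are hypothetical, since the accepted aliases are not listed here.

```python
from __future__ import annotations

# Canonical names, as listed in the documented return values.
_CANONICAL = frozenset({
    "equidistant", "log", "equal_number",
    "equal_information", "equidistant_chi", "geometric",
})

# Hypothetical aliases for illustration only; binny's alias table may differ.
_ALIASES = {
    "linear": "equidistant",
    "logarithmic": "log",
    "equal-number": "equal_number",
}


def resolve_method(name: str) -> str:
    """Return the canonical method name for `name` (case-insensitive sketch)."""
    key = name.lower()
    if key in _CANONICAL:
        return key
    if key in _ALIASES:
        return _ALIASES[key]
    raise ValueError(f"unknown binning method {name!r}")
```

Resolving names once at the API boundary means downstream code can compare against a fixed set of strings.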

binny.utils.validators.validate_axis_and_weights(x: Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]], weights: Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]]) → tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]]

Returns validated 1D axis values and weights as float64 arrays.

This validates a sampling axis and a corresponding weight array for use in binning routines (e.g., equal-number or equal-information edges). It ensures that both inputs are 1D, have the same length, and contain only finite values, and that x has at least two points and is strictly increasing, as those algorithms assume.

Parameters:

x – 1D array-like of axis values.
weights – 1D array-like of weights corresponding to x.

Returns:

Tuple (x_arr, w_arr) as 1D float64 NumPy arrays.

Raises:

ValueError – If x is not 1D, if weights is not 1D, if they have different shapes, if either contains non-finite values, if x has fewer than two points, or if x is not strictly increasing.
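
The checks can be sketched in plain Python as follows. `check_axis_and_weights` is a hypothetical name, and the sketch returns lists of Python floats instead of the float64 NumPy arrays that binny returns. As documented, weights are only required to be finite, not nonnegative.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def check_axis_and_weights(
    x: Sequence[float], weights: Sequence[float]
) -> tuple[list[float], list[float]]:
    """Validate an axis and its weights; return both as lists of floats (sketch)."""
    for label, seq in (("x", x), ("weights", weights)):
        # Nested sequences mean the input is not 1D.
        if any(isinstance(v, Sequence) for v in seq):
            raise ValueError(f"{label} must be 1D")
    x_arr = [float(v) for v in x]
    w_arr = [float(v) for v in weights]
    if len(x_arr) != len(w_arr):
        raise ValueError("x and weights must have the same shape")
    if not all(math.isfinite(v) for v in x_arr + w_arr):
        raise ValueError("x and weights must contain only finite values")
    if len(x_arr) < 2:
        raise ValueError("x must contain at least two points")
    if any(b <= a for a, b in zip(x_arr, x_arr[1:])):
        raise ValueError("x must be strictly increasing")
    return x_arr, w_arr
```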

binny.utils.validators.validate_grid_spec(x_min: float, x_max: float, n: int, *, log: bool = False) → tuple[float, float, int]

Returns validated grid endpoints and point count.

This validates inputs for sampling-grid builders (e.g., linear or log grids). It ensures endpoints are finite and ordered, and enforces positivity for log-spaced grids.

Parameters:

x_min – Lower endpoint of the grid.
x_max – Upper endpoint of the grid. Must be strictly greater than x_min.
n – Number of grid points. Must be an integer >= 2.
log – If True, requires x_min > 0 and x_max > 0.

Returns:

Tuple (x_min_f, x_max_f, n_int) with endpoints as floats and n as int.

Raises:

TypeError – If n is not an integer-like value or endpoints are not real.
ValueError – If endpoints are not finite, not increasing, or (for log) not strictly positive, or if n < 2.
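
A minimal sketch of the grid-spec checks, assuming the `numbers` ABCs are an acceptable stand-in for "integer-like" and "real". `check_grid_spec` is hypothetical, and rejecting `bool` for `n` is an assumption of the sketch.

```python
from __future__ import annotations

import math
import numbers


def check_grid_spec(
    x_min: float, x_max: float, n: int, *, log: bool = False
) -> tuple[float, float, int]:
    """Validate grid endpoints and point count; return (float, float, int)."""
    # Assumption: bool is rejected even though it subclasses int.
    if isinstance(n, bool) or not isinstance(n, numbers.Integral):
        raise TypeError(f"n must be an integer, got {type(n).__name__}")
    for label, value in (("x_min", x_min), ("x_max", x_max)):
        if not isinstance(value, numbers.Real):
            raise TypeError(f"{label} must be a real number")
    lo, hi, n_int = float(x_min), float(x_max), int(n)
    if not (math.isfinite(lo) and math.isfinite(hi)):
        raise ValueError("grid endpoints must be finite")
    if hi <= lo:
        raise ValueError("x_max must be strictly greater than x_min")
    # With hi > lo already enforced, lo > 0 implies hi > 0.
    if log and lo <= 0:
        raise ValueError("log grids require x_min > 0 and x_max > 0")
    if n_int < 2:
        raise ValueError("n must be >= 2")
    return lo, hi, n_int
```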

binny.utils.validators.validate_interval(x_min: float, x_max: float, n_bins: int, *, log: bool = False) → None

Validates an axis interval and binning mode for edge construction.

This checks that the interval endpoints are finite and ordered, and that the interval is compatible with the requested spacing mode. It is useful for bin-edge builders that assume a well-defined interval (and for log/geometric spacing, strictly positive bounds).

Parameters:

x_min – Minimum value of the axis.
x_max – Maximum value of the axis.
n_bins – Number of bins.
log – Whether the bins are logarithmically (geometrically) spaced.

Raises:

TypeError – If n_bins is not an integer (via validate_n_bins).
ValueError – If n_bins is not positive (via validate_n_bins), if x_min or x_max is not finite, if x_max <= x_min, or if log is True and either bound is non-positive.
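
The interval checks can be sketched as below. `check_interval` is a hypothetical name, and its inline `n_bins` test only approximates validate_n_bins (it omits the `max_bins` bound).

```python
from __future__ import annotations

import math
import numbers


def check_interval(
    x_min: float, x_max: float, n_bins: int, *, log: bool = False
) -> None:
    """Validate an interval and bin count before building bin edges (sketch)."""
    # Simplified stand-in for validate_n_bins (no max_bins bound here).
    if isinstance(n_bins, bool) or not isinstance(n_bins, numbers.Integral):
        raise TypeError("n_bins must be an integer")
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    if not (math.isfinite(x_min) and math.isfinite(x_max)):
        raise ValueError("x_min and x_max must be finite")
    if x_max <= x_min:
        raise ValueError("x_max must be greater than x_min")
    if log and (x_min <= 0 or x_max <= 0):
        raise ValueError("log/geometric spacing requires positive bounds")
```

The positivity requirement exists because log or geometric edges depend on log(x_min) and log(x_max), which are undefined for non-positive bounds.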

binny.utils.validators.validate_mixed_segments(segments: Sequence[Mapping[str, Any]], *, total_n_bins: int | None = None) → None

Validates a mixed-binning segment specification.

This checks that a sequence of segment dictionaries is well-formed for mixed binning workflows, where different binning methods are applied over different regions. It verifies required fields, validates each segment bin count, ensures each method name is recognized, and optionally enforces that segment bin counts sum to an expected total.

Parameters:

segments – Sequence of segment specifications. Each segment must be a mapping containing:
- "method": Binning method name or alias.
- "n_bins": Number of bins in the segment.
Optionally, a segment may include:
- "params": Mapping of method-specific parameters.
total_n_bins – Optional expected total number of bins across all segments.

Raises:

ValueError – If segments is empty, if a segment is missing required keys, or if total_n_bins is provided and the sum of segment "n_bins" does not match it.
TypeError – If segments is not a sequence of mappings, if a segment "method" is not a string, if a segment "n_bins" is not an int, or if a provided "params" is not a mapping.
ValueError – If any segment "method" is not recognized (via resolve_binning_method), or if any segment "n_bins" is invalid (via validate_n_bins).
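
A sketch of the segment checks, assuming segments are plain dicts. `check_segments` is hypothetical. It accepts only the canonical method names (case-insensitively) rather than every alias recognized by resolve_binning_method, and it checks "n_bins" for positivity only, without the `max_bins` bound of validate_n_bins.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any

# Canonical names from the resolve_binning_method docs; aliases are omitted.
_METHODS = {"equidistant", "log", "equal_number",
            "equal_information", "equidistant_chi", "geometric"}


def check_segments(
    segments: Sequence[Mapping[str, Any]], *, total_n_bins: int | None = None
) -> None:
    """Validate a mixed-binning segment list (sketch)."""
    if isinstance(segments, (str, bytes)) or not isinstance(segments, Sequence):
        raise TypeError("segments must be a sequence of mappings")
    if len(segments) == 0:
        raise ValueError("segments must not be empty")
    total = 0
    for i, seg in enumerate(segments):
        if not isinstance(seg, Mapping):
            raise TypeError(f"segment {i} is not a mapping")
        missing = [key for key in ("method", "n_bins") if key not in seg]
        if missing:
            raise ValueError(f"segment {i} is missing keys: {missing}")
        method, n_bins = seg["method"], seg["n_bins"]
        if not isinstance(method, str):
            raise TypeError(f"segment {i}: 'method' must be a str")
        if method.lower() not in _METHODS:
            raise ValueError(f"segment {i}: unknown method {method!r}")
        if isinstance(n_bins, bool) or not isinstance(n_bins, int):
            raise TypeError(f"segment {i}: 'n_bins' must be an int")
        if n_bins <= 0:
            raise ValueError(f"segment {i}: 'n_bins' must be positive")
        if "params" in seg and not isinstance(seg["params"], Mapping):
            raise TypeError(f"segment {i}: 'params' must be a mapping")
        total += n_bins
    if total_n_bins is not None and total != total_n_bins:
        raise ValueError(f"segment n_bins sum to {total}, expected {total_n_bins}")
```

For example, `[{"method": "log", "n_bins": 3}, {"method": "equidistant", "n_bins": 2}]` passes with `total_n_bins=5` and fails with `total_n_bins=6`.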

binny.utils.validators.validate_n_bins(n_bins: int, *, allow_one: bool = True, max_bins: int = 1000000) → None

Validates a requested number of bins.

This guards against invalid bin counts and accidental huge allocations by enforcing positivity, optionally disallowing a single bin, and applying an upper bound. It is typically used at API boundaries before constructing bin edges or allocating arrays that scale with n_bins.

Parameters:

n_bins – Number of bins.
allow_one – If False, requires n_bins > 1.
max_bins – Upper bound to guard against accidental huge allocations.

Raises:

TypeError – If n_bins is not an integer.
ValueError – If n_bins <= 0, if allow_one is False and n_bins == 1, or if n_bins > max_bins.
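
A minimal sketch of the bin-count guard. `check_n_bins` is a hypothetical name; treating any `numbers.Integral` (which includes NumPy integer types) as an integer and rejecting `bool` are assumptions of the sketch.

```python
import numbers


def check_n_bins(
    n_bins: int, *, allow_one: bool = True, max_bins: int = 1_000_000
) -> None:
    """Validate a requested number of bins (sketch)."""
    # Assumption: bool is rejected explicitly, since it subclasses int.
    if isinstance(n_bins, bool) or not isinstance(n_bins, numbers.Integral):
        raise TypeError(f"n_bins must be an integer, got {type(n_bins).__name__}")
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    if not allow_one and n_bins == 1:
        raise ValueError("n_bins must be > 1 when allow_one=False")
    if n_bins > max_bins:
        raise ValueError(f"n_bins={n_bins} exceeds max_bins={max_bins}")
```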

binny.utils.validators.validate_probability_vector(p: Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]], *, name: str = 'p', rtol: float = 1e-06, atol: float = 1e-12, allow_empty: bool = False) → ndarray[tuple[Any, ...], dtype[float64]]

Returns a validated 1D probability vector as float64.

Checks:
- 1D (and non-empty unless allow_empty=True)
- finite
- nonnegative
- sums to 1 within tolerance

Parameters:

p – Array-like probability vector.
name – Name used in error messages.
rtol – Relative tolerance for the sum-to-one check.
atol – Absolute tolerance for the sum-to-one check.
allow_empty – If True, allows empty vectors (returns an empty float64 array).

Returns:

1D float64 NumPy array.

Raises:

ValueError – If the input is not a valid probability vector.
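
The four checks can be sketched as follows, assuming a `numpy.isclose`-style tolerance test for the sum, |sum - 1| <= atol + rtol * 1 (the exact test binny uses is not stated here). `check_probability_vector` is hypothetical and returns a list of floats rather than a float64 array.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def check_probability_vector(
    p: Sequence[float], *, name: str = "p", rtol: float = 1e-6,
    atol: float = 1e-12, allow_empty: bool = False,
) -> list[float]:
    """Validate a 1D probability vector; return it as a list of floats (sketch)."""
    if any(isinstance(v, Sequence) for v in p):
        raise ValueError(f"{name} must be 1D")
    arr = [float(v) for v in p]
    if not arr:
        if allow_empty:
            return arr
        raise ValueError(f"{name} must be non-empty")
    if not all(math.isfinite(v) for v in arr):
        raise ValueError(f"{name} must contain only finite values")
    if any(v < 0 for v in arr):
        raise ValueError(f"{name} must be nonnegative")
    total = math.fsum(arr)
    # Assumed isclose-style test against the target value 1.0.
    if abs(total - 1.0) > atol + rtol * 1.0:
        raise ValueError(f"{name} must sum to 1 (got {total!r})")
    return arr
```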

binny.utils.validators.validate_response_matrix(matrix: ndarray[tuple[Any, ...], dtype[float64]], n_bins: int) → None

Validates a bin-to-bin misassignment (response) matrix.

This checks that a response matrix used to model bin misassignment is compatible with a given number of bins and behaves like a column-stochastic mapping. It is commonly used for photo-z or classification confusion matrices where each column represents the distribution of assigned bins for a true bin.

Parameters:

matrix – 2D NumPy array representing the response/misassignment matrix.
n_bins – Expected number of bins; matrix must have shape (n_bins, n_bins).

Raises:

TypeError – If n_bins is not an integer (via validate_n_bins).
ValueError – If n_bins is not positive (via validate_n_bins), if matrix does not have shape (n_bins, n_bins), if matrix contains non-finite values, if it contains entries less than -1e-15, or if the (clipped) column sums are not close to 1 within tolerance.
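
A sketch of the response-matrix checks on nested lists. `check_response_matrix` and its `atol` default are hypothetical, because the column-sum tolerance is not documented here. The convention assumed is `matrix[i][j]` = probability that true bin `j` is assigned to bin `i`, matching the column description above.

```python
from __future__ import annotations

import math
import numbers
from collections.abc import Sequence


def check_response_matrix(
    matrix: Sequence[Sequence[float]], n_bins: int, *, atol: float = 1e-8
) -> None:
    """Check shape, finiteness, near-nonnegativity and unit column sums (sketch)."""
    if isinstance(n_bins, bool) or not isinstance(n_bins, numbers.Integral):
        raise TypeError("n_bins must be an integer")
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    if len(matrix) != n_bins or any(len(row) != n_bins for row in matrix):
        raise ValueError(f"matrix must have shape ({n_bins}, {n_bins})")
    if not all(math.isfinite(v) for row in matrix for v in row):
        raise ValueError("matrix contains non-finite values")
    if any(v < -1e-15 for row in matrix for v in row):
        raise ValueError("matrix contains negative entries")
    for j in range(n_bins):
        # Clip tiny negative round-off to zero before summing column j.
        col_sum = math.fsum(max(matrix[i][j], 0.0) for i in range(n_bins))
        if abs(col_sum - 1.0) > atol:
            raise ValueError(f"column {j} sums to {col_sum!r}, expected 1")
```

Because each column sums to 1, multiplying such a matrix by a vector of true-bin fractions that sums to 1 yields assigned-bin fractions that also sum to 1.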

binny.utils.validators.validate_same_shape(a: float | int | bool | floating | integer | bool | Sequence[object] | Sequence[Sequence[object]] | ndarray[tuple[Any, ...], dtype[generic]], b: float | int | bool | floating | integer | bool | Sequence[object] | Sequence[Sequence[object]] | ndarray[tuple[Any, ...], dtype[generic]], *, name_a: str = 'a', name_b: str = 'b') → None

Validates that two array-likes have the same shape.

binny.utils.validators.validated_float_arrays(x: Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]], y: Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]]) → tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]]

Returns two validated 1D float64 arrays with matched shape and finite values.

This is a convenience wrapper for workflows that take paired 1D arrays and need them validated and converted to float64 consistently. It is commonly used before numerical operations that assume aligned samples (e.g., an axis and an associated function evaluated on that axis).

Parameters:

x – First array-like input.
y – Second array-like input.

Returns:

Tuple (x_arr, y_arr) as 1D float64 NumPy arrays.

Raises:

ValueError – If either input is not 1D, if the shapes differ, if either contains non-finite values, if the first input has fewer than two points, or if the first input is not strictly increasing.
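
As an illustration of the kind of downstream numerical step described above, here is a hypothetical trapezoidal integration that first applies the same checks. `validated_pair` and `integrate_trapezoid` are not part of binny, and the sketch works with plain lists and floats rather than float64 arrays.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def validated_pair(
    x: Sequence[float], y: Sequence[float]
) -> tuple[list[float], list[float]]:
    """Convert x and y to float lists after the documented checks (sketch)."""
    for label, seq in (("x", x), ("y", y)):
        if any(isinstance(v, Sequence) for v in seq):
            raise ValueError(f"{label} must be 1D")
    xs, ys = [float(v) for v in x], [float(v) for v in y]
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same shape")
    if not all(math.isfinite(v) for v in xs + ys):
        raise ValueError("x and y must contain only finite values")
    if len(xs) < 2:
        raise ValueError("x must contain at least two points")
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("x must be strictly increasing")
    return xs, ys


def integrate_trapezoid(x: Sequence[float], y: Sequence[float]) -> float:
    """Trapezoidal integral of y over x; strict increase keeps every width positive."""
    xs, ys = validated_pair(x, y)
    return math.fsum(
        0.5 * (x1 - x0) * (y0 + y1)
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )
```

For instance, integrating y = x over the points [0, 1, 2] gives 0.5 + 1.5 = 2.0.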