binny.utils.validators module
Validation utilities for binning and axis-related functions.
- binny.utils.validators.edge_coercion(bin_indices: Sequence[int], bin_edges: Mapping[int, tuple[float, float]] | Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]]) -> dict[int, tuple[float, float]]
Returns a mapping from bin index to (lo, hi) edge pairs.
This normalizes bin-edge inputs into a consistent dictionary form. It supports either an explicit per-bin edge mapping or a single strictly increasing edge array interpreted in the standard way, where bin j corresponds to (edges[j], edges[j+1]).
- Parameters:
bin_indices – Bin indices that must be present in the returned mapping.
bin_edges – Either a mapping {idx: (lo, hi)} or a 1D strictly increasing edge array [e0, e1, ..., eN].
- Returns:
A mapping {idx: (lo, hi)} with float-valued edge pairs.
- Raises:
ValueError – If a required bin index is missing from a mapping input.
ValueError – If an edge array is not 1D, has fewer than two entries, contains non-finite values, or is not strictly increasing.
ValueError – If any requested bin index is out of range for an edge array.
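The two input forms and the checks listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: the helper name `coerce_edges_sketch` is hypothetical, it works on plain Python sequences rather than NumPy arrays, it omits the 1D check, and the error messages are invented.

```python
import math
from collections.abc import Mapping


def coerce_edges_sketch(bin_indices, bin_edges):
    """Hypothetical stand-in mirroring the documented behaviour."""
    if isinstance(bin_edges, Mapping):
        # Mapping input: every requested index must already be present.
        missing = [i for i in bin_indices if i not in bin_edges]
        if missing:
            raise ValueError(f"missing bin indices: {missing}")
        return {i: (float(bin_edges[i][0]), float(bin_edges[i][1])) for i in bin_indices}

    # Array input: [e0, e1, ..., eN] defines N bins, with bin j -> (e[j], e[j+1]).
    edges = [float(e) for e in bin_edges]
    if len(edges) < 2:
        raise ValueError("need at least two edges")
    if not all(math.isfinite(e) for e in edges):
        raise ValueError("edges must be finite")
    if any(hi <= lo for lo, hi in zip(edges, edges[1:])):
        raise ValueError("edges must be strictly increasing")

    n_bins = len(edges) - 1
    out = {}
    for j in bin_indices:
        if not 0 <= j < n_bins:
            raise ValueError(f"bin index {j} is out of range for {n_bins} bins")
        out[j] = (edges[j], edges[j + 1])
    return out


from_array = coerce_edges_sketch([0, 2], [0.0, 0.5, 1.0, 2.0])
from_mapping = coerce_edges_sketch([1], {1: (0.5, 1)})
```

In this sketch both calls yield the same dictionary form, so downstream code only ever needs to handle `{idx: (lo, hi)}`.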
- binny.utils.validators.resolve_binning_method(name: str) -> str
Returns the canonical binning method identifier for a user-supplied name.
This provides a small normalization layer for user input by accepting common aliases (case-insensitive) and mapping them to the internal method names used throughout the package. Normalizing method names early makes downstream binning code simpler and ensures consistent behavior across APIs.
- Parameters:
name – Binning method name or alias (case-insensitive).
- Returns:
Canonical method name: one of "equidistant", "log", "equal_number", "equal_information", "equidistant_chi", or "geometric".
- Raises:
ValueError – If name is not a recognized method name or alias.
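A normalization layer of this kind can be sketched with a lookup table. The canonical names below come from this reference. The alias table and the function name `resolve_method_sketch` are hypothetical, because the aliases the package actually accepts are not listed here.

```python
CANONICAL = (
    "equidistant",
    "log",
    "equal_number",
    "equal_information",
    "equidistant_chi",
    "geometric",
)

# Hypothetical aliases, invented purely for illustration.
ALIASES = {"linear": "equidistant", "logarithmic": "log"}


def resolve_method_sketch(name):
    """Map a user-supplied name to a canonical identifier (case-insensitive)."""
    key = name.lower()
    if key in CANONICAL:
        return key
    if key in ALIASES:
        return ALIASES[key]
    raise ValueError(f"unknown binning method: {name!r}")


resolved = [resolve_method_sketch(n) for n in ("LOG", "Linear", "equal_number")]
```

Resolving the name once at the API boundary means later code can compare against the canonical strings directly.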
- binny.utils.validators.validate_axis_and_weights(x: Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]], weights: Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]]) -> tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]]
Returns validated 1D axis values and weights as float64 arrays.
This validates a sampling axis and a corresponding weight array for use in binning routines (e.g., equal-number or equal-information edges). It ensures both inputs are 1D, aligned in length, finite, and suitable for algorithms that assume a strictly increasing axis.
- Parameters:
x – 1D array-like of axis values.
weights – 1D array-like of weights corresponding to x.
- Returns:
Tuple (x_arr, w_arr) as 1D float64 NumPy arrays.
- Raises:
ValueError – If x is not 1D, if weights is not 1D, if they have different shapes, if either contains non-finite values, if x has fewer than two points, or if x is not strictly increasing.
- binny.utils.validators.validate_grid_spec(x_min: float, x_max: float, n: int, *, log: bool = False) -> tuple[float, float, int]
Returns validated grid endpoints and point count.
This validates inputs for sampling-grid builders (e.g., linear or log grids). It ensures endpoints are finite and ordered, and enforces positivity for log-spaced grids.
- Parameters:
x_min – Lower endpoint of the grid.
x_max – Upper endpoint of the grid. Must be strictly greater than x_min.
n – Number of grid points. Must be an integer >= 2.
log – If True, requires x_min > 0 and x_max > 0.
- Returns:
Tuple (x_min_f, x_max_f, n_int) with endpoints as floats and n as int.
- Raises:
TypeError – If n is not an integer-like value or endpoints are not real.
ValueError – If endpoints are not finite, not increasing, or (for log) not strictly positive, or if n < 2.
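The order of checks described above can be sketched in plain Python. The function name `check_grid_spec_sketch` is hypothetical, and reading "integer-like" as "anything `operator.index` accepts" is an assumption; the package may define it differently.

```python
import math
import numbers
import operator


def check_grid_spec_sketch(x_min, x_max, n, *, log=False):
    """Hypothetical stand-in returning (x_min, x_max, n) as (float, float, int)."""
    if not isinstance(x_min, numbers.Real) or not isinstance(x_max, numbers.Real):
        raise TypeError("endpoints must be real numbers")
    try:
        # Assumption: integer-like means operator.index accepts it (floats do not).
        n_int = operator.index(n)
    except TypeError:
        raise TypeError("n must be integer-like") from None

    lo, hi = float(x_min), float(x_max)
    if not (math.isfinite(lo) and math.isfinite(hi)):
        raise ValueError("endpoints must be finite")
    if hi <= lo:
        raise ValueError("x_max must be strictly greater than x_min")
    if log and lo <= 0.0:
        # hi > lo has already been checked, so lo > 0 implies hi > 0.
        raise ValueError("log grids require strictly positive endpoints")
    if n_int < 2:
        raise ValueError("n must be at least 2")
    return lo, hi, n_int


spec = check_grid_spec_sketch(1, 10, 5, log=True)
```

Returning the coerced values lets a grid builder use them directly without repeating the conversions.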
- binny.utils.validators.validate_interval(x_min: float, x_max: float, n_bins: int, *, log: bool = False) -> None
Validates an axis interval and binning mode for edge construction.
This checks that the interval endpoints are finite and ordered, and that the interval is compatible with the requested spacing mode. It is useful for bin-edge builders that assume a well-defined interval (and for log/geometric spacing, strictly positive bounds).
- Parameters:
x_min – Minimum value of the axis.
x_max – Maximum value of the axis.
n_bins – Number of bins.
log – Whether the bins are logarithmically (or geometrically) spaced.
- Raises:
TypeError – If n_bins is not an integer (via validate_n_bins).
ValueError – If n_bins is not positive (via validate_n_bins), if x_min or x_max are not finite, if x_max <= x_min, or if log is True and either bound is non-positive.
- binny.utils.validators.validate_mixed_segments(segments: Sequence[Mapping[str, Any]], *, total_n_bins: int | None = None) -> None
Validates a mixed-binning segment specification.
This checks that a sequence of segment dictionaries is well-formed for mixed binning workflows, where different binning methods are applied over different regions. It verifies required fields, validates each segment bin count, ensures each method name is recognized, and optionally enforces that segment bin counts sum to an expected total.
- Parameters:
segments – Sequence of segment specifications. Each segment must be a mapping containing:
  - "method": Binning method name or alias.
  - "n_bins": Number of bins in the segment.
Optionally, a segment may include:
  - "params": Mapping of method-specific parameters.
total_n_bins – Optional expected total number of bins across all segments.
- Raises:
ValueError – If segments is empty, if a segment is missing required keys, or if total_n_bins is provided and the sum of segment "n_bins" does not match it.
TypeError – If segments is not a sequence of mappings, if a segment "method" is not a string, if a segment "n_bins" is not an int, or if a provided "params" is not a mapping.
ValueError – If any segment "method" is not recognized (via resolve_binning_method), or if any segment "n_bins" is invalid (via validate_n_bins).
- binny.utils.validators.validate_n_bins(n_bins: int, *, allow_one: bool = True, max_bins: int = 1000000) -> None
Validates a requested number of bins.
This guards against invalid bin counts and accidental huge allocations by enforcing positivity, optional constraints on allowing a single bin, and an upper bound. It is typically used at API boundaries before constructing bin edges or allocating arrays that scale with n_bins.
- Parameters:
n_bins – Number of bins.
allow_one – If False, requires n_bins > 1.
max_bins – Upper bound to guard against accidental huge allocations.
- Raises:
TypeError – If n_bins is not an integer.
ValueError – If n_bins <= 0, if allow_one is False and n_bins == 1, or if n_bins > max_bins.
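The three conditions above amount to a short guard function. The sketch below uses the hypothetical name `check_n_bins_sketch` and invented error messages. How the package treats `bool` or NumPy integer inputs is not stated here, so this version simply accepts anything that is an `int` instance.

```python
def check_n_bins_sketch(n_bins, *, allow_one=True, max_bins=1_000_000):
    """Hypothetical stand-in for the documented bin-count guard."""
    # Assumption: "integer" means a Python int (bool also passes, as an int subclass).
    if not isinstance(n_bins, int):
        raise TypeError("n_bins must be an integer")
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    if not allow_one and n_bins == 1:
        raise ValueError("n_bins must be greater than 1")
    if n_bins > max_bins:
        raise ValueError(f"n_bins exceeds max_bins={max_bins}")


def rejected(*args, **kwargs):
    """Return the exception type raised by the guard, or None if it passes."""
    try:
        check_n_bins_sketch(*args, **kwargs)
    except (TypeError, ValueError) as exc:
        return type(exc)
    return None


outcomes = [
    rejected(10),
    rejected(1, allow_one=False),
    rejected(2.0),
    rejected(10**7),
]
```

Running the guard before any allocation keeps a mistyped bin count from turning into a very large array.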
- binny.utils.validators.validate_probability_vector(p: Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]], *, name: str = 'p', rtol: float = 1e-06, atol: float = 1e-12, allow_empty: bool = False) -> ndarray[tuple[Any, ...], dtype[float64]]
Returns a validated 1D probability vector as float64.
Checks:
  - 1D (and non-empty unless allow_empty=True)
  - finite
  - nonnegative
  - sums to 1 within tolerance
- Parameters:
p – Array-like probability vector.
name – Name used in error messages.
rtol – Relative tolerance for the sum-to-one check.
atol – Absolute tolerance for the sum-to-one check.
allow_empty – If True, allows empty vectors (returns empty float64 array).
- Returns:
1D float64 NumPy array.
- Raises:
ValueError – If the input is not a valid probability vector.
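The checks above can be sketched with the standard library alone. The name `check_probability_vector_sketch` is hypothetical, the sketch returns a list rather than a float64 array, and combining `rtol` and `atol` through `math.isclose` is an assumption; the package may combine the two tolerances differently.

```python
import math


def check_probability_vector_sketch(p, *, name="p", rtol=1e-6, atol=1e-12, allow_empty=False):
    """Hypothetical stand-in: validate p and return it as a list of floats."""
    vals = [float(v) for v in p]
    if not vals:
        if allow_empty:
            return vals
        raise ValueError(f"{name} must be non-empty")
    if not all(math.isfinite(v) for v in vals):
        raise ValueError(f"{name} must be finite")
    if any(v < 0.0 for v in vals):
        raise ValueError(f"{name} must be nonnegative")
    # Assumption: the tolerance rule here is math.isclose, not necessarily the package's.
    if not math.isclose(math.fsum(vals), 1.0, rel_tol=rtol, abs_tol=atol):
        raise ValueError(f"{name} must sum to 1")
    return vals


ok = check_probability_vector_sketch([0.2, 0.3, 0.5])
empty = check_probability_vector_sketch([], allow_empty=True)
```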
- binny.utils.validators.validate_response_matrix(matrix: ndarray[tuple[Any, ...], dtype[float64]], n_bins: int) -> None
Validates a bin-to-bin misassignment (response) matrix.
This checks that a response matrix used to model bin misassignment is compatible with a given number of bins and behaves like a column-stochastic mapping. It is commonly used for photo-z or classification confusion matrices where each column represents the distribution of assigned bins for a true bin.
- Parameters:
matrix – 2D NumPy array representing the response/misassignment matrix.
n_bins – Expected number of bins; matrix must have shape (n_bins, n_bins).
- Raises:
TypeError – If n_bins is not an integer (via validate_n_bins).
ValueError – If n_bins is not positive (via validate_n_bins), if matrix does not have shape (n_bins, n_bins), if matrix contains non-finite values, if it contains entries less than -1e-15, or if the (clipped) column sums are not close to 1 within tolerance.
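A column-stochastic check along these lines can be sketched on nested lists. The `-1e-15` threshold comes from the description above. The function name `check_response_matrix_sketch`, the `rtol` and `atol` values, and the use of `math.isclose` are assumptions, since the actual tolerance is not stated here.

```python
import math


def check_response_matrix_sketch(matrix, n_bins, *, neg_tol=1e-15, rtol=1e-6, atol=1e-12):
    """Hypothetical stand-in: each column must sum to 1 after clipping at zero."""
    rows = [[float(v) for v in row] for row in matrix]
    if len(rows) != n_bins or any(len(row) != n_bins for row in rows):
        raise ValueError("matrix must have shape (n_bins, n_bins)")
    flat = [v for row in rows for v in row]
    if not all(math.isfinite(v) for v in flat):
        raise ValueError("matrix must be finite")
    if any(v < -neg_tol for v in flat):
        raise ValueError("matrix has negative entries")
    for j in range(n_bins):
        # Clip tiny negative round-off to zero before summing column j.
        col_sum = math.fsum(max(rows[i][j], 0.0) for i in range(n_bins))
        if not math.isclose(col_sum, 1.0, rel_tol=rtol, abs_tol=atol):
            raise ValueError(f"column {j} does not sum to 1")


# Column j holds the distribution of assigned bins for true bin j.
column_stochastic = [[0.9, 0.2], [0.1, 0.8]]
row_stochastic_only = [[0.9, 0.1], [0.5, 0.5]]

check_response_matrix_sketch(column_stochastic, 2)
try:
    check_response_matrix_sketch(row_stochastic_only, 2)
    row_version_rejected = False
except ValueError:
    row_version_rejected = True
```

The second matrix has rows that sum to 1 but columns that do not, which is the usual symptom of passing a transposed response matrix.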
- binny.utils.validators.validate_same_shape(a: float | int | bool | floating | integer | bool | Sequence[object] | Sequence[Sequence[object]] | ndarray[tuple[Any, ...], dtype[generic]], b: float | int | bool | floating | integer | bool | Sequence[object] | Sequence[Sequence[object]] | ndarray[tuple[Any, ...], dtype[generic]], *, name_a: str = 'a', name_b: str = 'b') -> None
Validates that two array-likes have the same shape.
- binny.utils.validators.validated_float_arrays(x: Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]], y: Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]]) -> tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]]
Returns two validated 1D float64 arrays with matched shape and finite values.
This is a convenience wrapper for workflows that take paired 1D arrays and need them validated and converted to float64 consistently. It is commonly used before numerical operations that assume aligned samples (e.g., an axis and an associated function evaluated on that axis).
- Parameters:
x – First array-like input.
y – Second array-like input.
- Returns:
Tuple (x_arr, y_arr) as 1D float64 NumPy arrays.
- Raises:
ValueError – If either input is not 1D, if the shapes differ, if either contains non-finite values, if the first input has fewer than two points, or if the first input is not strictly increasing.