binny.utils.pairwise_metrics module

Pairwise distance/similarity metrics for curves or segment-mass vectors.

binny.utils.pairwise_metrics.apply_unit(mat: dict[int, dict[int, float]], unit: Literal['fraction', 'percent']) → dict[int, dict[int, float]]

Returns a unit-converted copy of a nested-dict metric matrix.

This helper converts matrices expressed as fractions to percentages when requested, while preserving the nested-dict structure.

Parameters:

mat – Nested dictionary mat[i][j] of metric values.
unit – Output unit, either "fraction" or "percent".

Returns:

A nested dictionary in the requested unit.

Raises:

ValueError – If unit is not "fraction" or "percent".
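The conversion described above can be sketched as follows. This is an illustrative stand-in rather than binny's implementation: the name `apply_unit_sketch` is hypothetical, and the factor of 100 for "percent" is an assumption based on the description.

```python
from typing import Literal

Matrix = dict[int, dict[int, float]]


def apply_unit_sketch(mat: Matrix, unit: Literal["fraction", "percent"]) -> Matrix:
    """Hypothetical stand-in for apply_unit; not the library's code."""
    if unit not in ("fraction", "percent"):
        raise ValueError(f"unit must be 'fraction' or 'percent', got {unit!r}")
    # Assumption: fractions become percentages by multiplying by 100.
    scale = 100.0 if unit == "percent" else 1.0
    # Build new inner dicts so the input matrix is left untouched.
    return {i: {j: v * scale for j, v in row.items()} for i, row in mat.items()}


overlap = {0: {0: 1.0, 1: 0.25}, 1: {0: 0.25, 1: 1.0}}
as_percent = apply_unit_sketch(overlap, "percent")
as_fraction = apply_unit_sketch(overlap, "fraction")
```

Returning a copy even for "fraction" keeps the behaviour uniform: callers can mutate the result without touching the original matrix.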

binny.utils.pairwise_metrics.fill_symmetric(bin_indices: list[int], pair_value: Callable[[int, int], float]) → dict[int, dict[int, float]]

Returns a symmetric nested-dict matrix from a pairwise value function.

This helper evaluates a pairwise metric on a set of bin indices and stores the results in a symmetric nested dictionary. It computes the upper triangle (including the diagonal) and mirrors values to fill the lower triangle.

Parameters:

bin_indices – Bin ids to include in the output matrix.
pair_value – Callable returning the metric value for a pair of bin ids.

Returns:

A nested dictionary out[i][j] containing the metric for each pair of bin ids.

Raises:

Exception – Propagates any exception raised by pair_value during evaluation.
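A minimal sketch of the upper-triangle-then-mirror strategy described above. The names `fill_symmetric_sketch` and `toy_metric` are hypothetical and only illustrate the idea; the library's code may differ.

```python
from collections.abc import Callable


def fill_symmetric_sketch(
    bin_indices: list[int],
    pair_value: Callable[[int, int], float],
) -> dict[int, dict[int, float]]:
    """Hypothetical stand-in for fill_symmetric; not the library's code."""
    out: dict[int, dict[int, float]] = {i: {} for i in bin_indices}
    for pos, i in enumerate(bin_indices):
        # Only visit j at or after i in the list: upper triangle plus diagonal.
        for j in bin_indices[pos:]:
            value = float(pair_value(i, j))
            out[i][j] = value
            out[j][i] = value  # mirror into the lower triangle
    return out


calls: list[tuple[int, int]] = []


def toy_metric(i: int, j: int) -> float:
    # Record each evaluation so the number of calls can be inspected.
    calls.append((i, j))
    return abs(i - j) / 10.0


mat = fill_symmetric_sketch([0, 2, 5], toy_metric)
```

For n bin ids this evaluates the metric n(n + 1)/2 times instead of n², which matters when `pair_value` is expensive.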

binny.utils.pairwise_metrics.mass_per_segment(z_arr: ndarray[tuple[Any, ...], dtype[float64]], p_arr: ndarray[tuple[Any, ...], dtype[float64]]) → ndarray[tuple[Any, ...], dtype[float64]]

Returns trapezoid masses per grid segment for a curve sampled at nodes.

This converts node values into per-interval masses using the trapezoid rule, which is useful for building cumulative masses, rebinning, or diagnostics that operate on segment contributions rather than node values.

Parameters:

z_arr – 1D array of grid nodes.
p_arr – 1D array of curve values at the nodes.

Returns:

A float64 array of length len(z_arr) - 1 containing trapezoid masses for each adjacent node interval.

Raises:

ValueError – If inputs are not 1D arrays of the same length.
ValueError – If z_arr is not strictly increasing.
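As a rough illustration of the trapezoid rule applied per segment, the mass on the interval between nodes k and k+1 is 0.5 * (p[k] + p[k+1]) * (z[k+1] - z[k]). The function name `mass_per_segment_sketch` below is hypothetical, and the validation only mirrors the documented error conditions.

```python
import numpy as np


def mass_per_segment_sketch(z_arr, p_arr) -> np.ndarray:
    """Hypothetical stand-in for mass_per_segment; not the library's code."""
    z = np.asarray(z_arr, dtype=np.float64)
    p = np.asarray(p_arr, dtype=np.float64)
    if z.ndim != 1 or p.ndim != 1 or z.shape != p.shape:
        raise ValueError("z_arr and p_arr must be 1D arrays of the same length")
    dz = np.diff(z)
    if np.any(dz <= 0.0):
        raise ValueError("z_arr must be strictly increasing")
    # Trapezoid area on each interval: mean of the end values times the width.
    return 0.5 * (p[:-1] + p[1:]) * dz


z = np.array([0.0, 1.0, 3.0])
p = np.array([0.0, 2.0, 2.0])
masses = mass_per_segment_sketch(z, p)  # array([1., 4.])
```

Summing the segment masses recovers the trapezoid integral of the whole curve, which is why they are a natural starting point for cumulative masses.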

binny.utils.pairwise_metrics.pair_cosine(z_arr: ndarray[tuple[Any, ...], dtype[float64]], p: Mapping[int, ndarray[tuple[Any, ...], dtype[float64]]]) → Callable[[int, int], float]

Computes cosine similarity under a trapezoid inner product.

This similarity treats curves as functions on a shared grid and computes a cosine-like similarity using trapezoid integration to define the inner product and L2 norms. It is useful for comparing curve shapes while reducing sensitivity to overall scale; values near 1 indicate similar shapes, and values near 0 indicate near-orthogonality under the chosen inner product.

This function validates and caches the input curves once, precomputes per-curve norms, and returns a callable suitable for repeated pairwise evaluation.

Parameters:

z_arr – 1D grid of nodes used for trapezoid integration.
p – Mapping from bin id to curve values evaluated on z_arr.

Returns:

A function f(i, j) that returns cosine similarity for bins i and j. If either curve has zero norm under the trapezoid inner product, the similarity is defined to be 0.

Raises:

ValueError – If z_arr is not a valid strictly increasing 1D grid, or any curve is invalid on that grid.
KeyError – If i or j is not present in p when evaluating the callable.
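The precompute-then-evaluate pattern described above might look like the sketch below. It assumes the inner product is the trapezoid integral of the pointwise product of two curves; the names `trapz_inner` and `pair_cosine_sketch` are hypothetical, and input validation is omitted for brevity.

```python
from collections.abc import Callable, Mapping

import numpy as np


def trapz_inner(z: np.ndarray, f: np.ndarray, g: np.ndarray) -> float:
    # Trapezoid integral of f(z) * g(z) over the grid.
    h = f * g
    return float(np.sum(0.5 * (h[:-1] + h[1:]) * np.diff(z)))


def pair_cosine_sketch(z_arr, p: Mapping[int, np.ndarray]) -> Callable[[int, int], float]:
    """Hypothetical stand-in for pair_cosine; not the library's code."""
    z = np.asarray(z_arr, dtype=np.float64)
    curves = {k: np.asarray(v, dtype=np.float64) for k, v in p.items()}
    # Norms are computed once here, not on every pairwise call.
    norms = {k: float(np.sqrt(trapz_inner(z, c, c))) for k, c in curves.items()}

    def cosine(i: int, j: int) -> float:
        denom = norms[i] * norms[j]  # raises KeyError for unknown ids
        if denom == 0.0:
            return 0.0  # documented convention for zero-norm curves
        return trapz_inner(z, curves[i], curves[j]) / denom

    return cosine


z = np.linspace(0.0, 1.0, 5)
curves = {0: np.ones(5), 1: 3.0 * np.ones(5), 2: np.zeros(5)}
cos = pair_cosine_sketch(z, curves)
same_shape = cos(0, 1)  # curves differ only by scale
zero_norm = cos(0, 2)   # one curve is identically zero
```

Curves 0 and 1 differ only by a constant factor, so their similarity is 1 up to rounding, which illustrates the reduced sensitivity to overall scale.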

binny.utils.pairwise_metrics.pair_hellinger(masses: Mapping[int, ndarray[tuple[Any, ...], dtype[float64]]]) → Callable[[int, int], float]

Computes Hellinger distance between probability vectors.

Hellinger distance is a bounded, symmetric distance on discrete probability vectors. It is often used for stable comparisons of distributions represented on a fixed set of bins or segments. The returned callable validates inputs on each evaluation (shape/probability checks) using binny.utils.validators.validate_probability_vector().

Parameters:

masses – Mapping from bin id to 1D probability vectors.

Returns:

A function f(i, j) that returns Hellinger distance for bins i and j.

Raises:

KeyError – If i or j is not present in masses when evaluating the callable.
ValueError – If either vector is not a valid probability vector, or shapes differ.
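For orientation, a sketch of the Hellinger distance under the common convention H(p, q) = sqrt(0.5 * sum((sqrt(p) - sqrt(q))**2)), which is bounded in [0, 1]. Whether binny uses exactly this normalization is an assumption; the name `hellinger_sketch` is hypothetical and the probability-vector validation is left out.

```python
import numpy as np


def hellinger_sketch(p, q) -> float:
    """Hypothetical stand-in for the Hellinger callable; not the library's code."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    if p.shape != q.shape:
        raise ValueError("probability vectors must have the same shape")
    # Assumption: the 1/2 factor that bounds the distance by 1.
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


identical = hellinger_sketch([0.5, 0.5], [0.5, 0.5])  # 0.0
disjoint = hellinger_sketch([1.0, 0.0], [0.0, 1.0])   # 1.0
```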

binny.utils.pairwise_metrics.pair_js(masses: Mapping[int, ndarray[tuple[Any, ...], dtype[float64]]]) → Callable[[int, int], float]

Computes Jensen–Shannon distance between probability vectors.

This distance compares two discrete probability vectors (e.g., per-segment mass probabilities) using Jensen–Shannon divergence and returns its square root. With base-2 logarithms, the resulting distance is bounded in [0, 1] and is symmetric. The returned callable validates inputs on each evaluation (shape/probability checks) using binny.utils.validators.validate_probability_vector().

Parameters:

masses – Mapping from bin id to 1D probability vectors.

Returns:

A function f(i, j) that returns Jensen–Shannon distance for bins i and j.

Raises:

KeyError – If i or j is not present in masses when evaluating the callable.
ValueError – If either vector is not a valid probability vector, or shapes differ.
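The definition above can be sketched directly: form the mixture m = (p + q) / 2, average the two Kullback-Leibler divergences to m in base 2, and take the square root. The names `kl_bits` and `js_distance_sketch` are hypothetical, and validation is omitted.

```python
import numpy as np


def kl_bits(p: np.ndarray, q: np.ndarray) -> float:
    # KL divergence in bits; terms with p == 0 contribute nothing.
    mask = p > 0.0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def js_distance_sketch(p, q) -> float:
    """Hypothetical stand-in for the Jensen-Shannon callable; not the library's code."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    m = 0.5 * (p + q)  # m > 0 wherever p > 0 or q > 0, so the ratios are finite
    divergence = 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)
    # Clip tiny negative rounding errors before the square root.
    return float(np.sqrt(max(divergence, 0.0)))


identical = js_distance_sketch([0.5, 0.5], [0.5, 0.5])  # 0.0
disjoint = js_distance_sketch([1.0, 0.0], [0.0, 1.0])   # 1.0
```

The two extremes show the [0, 1] bound that base-2 logarithms give: identical vectors are at distance 0 and vectors with disjoint support are at distance 1.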

binny.utils.pairwise_metrics.pair_min(z_arr: ndarray[tuple[Any, ...], dtype[float64]], p: Mapping[int, ndarray[tuple[Any, ...], dtype[float64]]]) → Callable[[int, int], float]

Computes overlap as the integral of the pointwise minimum.

This overlap score integrates min(p_i(z), p_j(z)) over a shared grid using the trapezoid rule. It is commonly used to quantify how strongly two nonnegative distributions overlap (e.g., tomographic n_i(z) bins), with larger values indicating more shared support.

This function validates and caches the input curves once, and returns a callable suitable for repeated pairwise evaluation.

Parameters:

z_arr – 1D grid of nodes used for trapezoid integration.
p – Mapping from bin id to curve values evaluated on z_arr.

Returns:

A function f(i, j) that returns the overlap integral for bins i and j.

Raises:

ValueError – If z_arr is not a valid strictly increasing 1D grid, or any curve is invalid on that grid.
KeyError – If i or j is not present in p when evaluating the callable.
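A minimal sketch of the overlap integral, assuming the pointwise minimum is taken at the grid nodes and then integrated with the trapezoid rule. The name `pair_min_sketch` is hypothetical and validation is omitted.

```python
from collections.abc import Callable, Mapping

import numpy as np


def pair_min_sketch(z_arr, p: Mapping[int, np.ndarray]) -> Callable[[int, int], float]:
    """Hypothetical stand-in for pair_min; not the library's code."""
    z = np.asarray(z_arr, dtype=np.float64)
    dz = np.diff(z)
    # Convert the curves once so repeated calls avoid redundant work.
    curves = {k: np.asarray(v, dtype=np.float64) for k, v in p.items()}

    def overlap(i: int, j: int) -> float:
        m = np.minimum(curves[i], curves[j])  # raises KeyError for unknown ids
        return float(np.sum(0.5 * (m[:-1] + m[1:]) * dz))

    return overlap


z = np.linspace(0.0, 2.0, 5)
curves = {0: np.ones(5), 1: 0.5 * np.ones(5), 2: np.zeros(5)}
overlap = pair_min_sketch(z, curves)
partial = overlap(0, 1)  # integral of the smaller curve over [0, 2]
empty = overlap(0, 2)    # no shared mass
```

If both curves integrate to 1, the overlap lies in [0, 1], so it can be read as the shared fraction of the two distributions.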

binny.utils.pairwise_metrics.pair_tv(masses: Mapping[int, ndarray[tuple[Any, ...], dtype[float64]]]) → Callable[[int, int], float]

Computes total variation distance between probability vectors.

Total variation distance is half the L1 distance between two discrete probability vectors. For valid probability vectors, it is bounded in [0, 1] and gives an interpretable notion of distributional difference. The returned callable validates inputs on each evaluation (shape/probability checks) using binny.utils.validators.validate_probability_vector().

Parameters:

masses – Mapping from bin id to 1D probability vectors.

Returns:

A function f(i, j) that returns total variation distance for bins i and j.

Raises:

KeyError – If i or j is not present in masses when evaluating the callable.
ValueError – If either vector is not a valid probability vector, or shapes differ.
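The "half the L1 distance" definition is short enough to sketch in one line. The name `total_variation_sketch` is hypothetical, and the probability-vector validation is omitted.

```python
import numpy as np


def total_variation_sketch(p, q) -> float:
    """Hypothetical stand-in for the total variation callable; not the library's code."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    if p.shape != q.shape:
        raise ValueError("probability vectors must have the same shape")
    # Half the L1 distance between the two vectors.
    return float(0.5 * np.sum(np.abs(p - q)))


tv = total_variation_sketch([0.5, 0.5, 0.0], [0.25, 0.25, 0.5])  # 0.5
```

Here a total mass of 0.5 has to move from the first two segments to the third to turn one vector into the other, which is the sense in which the value is interpretable.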

binny.utils.pairwise_metrics.prepare_metric_inputs(z_arr: ndarray[tuple[Any, ...], dtype[float64]], p: Mapping[int, ndarray[tuple[Any, ...], dtype[float64]]], *, mode: Literal['curves', 'segments_prob'], curve_norm: Literal['none', 'normalize', 'check'] = 'none', rtol: float = 0.001, atol: float = 1e-06) → tuple[ndarray[tuple[Any, ...], dtype[float64]], dict[int, ndarray[tuple[Any, ...], dtype[float64]]]]

Prepares inputs for pairwise metrics (validate once; optionally normalize).

This is a convenience wrapper that standardizes the common boilerplate for pairwise curve metrics:

Validates z_arr and each curve in p using validate_axis_and_weights().
Optionally normalizes curves to unit trapezoid integral or checks they already are.
Optionally converts curves to per-segment probability vectors (segment masses normalized to sum to 1), suitable for discrete probability metrics.

Parameters:

z_arr – 1D strictly increasing grid of nodes.
p – Mapping from id to curve values evaluated on z_arr.
mode – Output mode:
    - "curves": return validated (and possibly normalized) node curves.
    - "segments_prob": return per-segment mass probability vectors.
curve_norm – How to treat curve normalization before any conversion:
    - "none": no normalization checks beyond basic validation.
    - "normalize": divide each curve by its trapezoid integral.
    - "check": require that each curve integrates to 1 within tolerance.
rtol – Relative tolerance for the unit-integral check when curve_norm="check".
atol – Absolute tolerance for the unit-integral check when curve_norm="check".

Returns:

A tuple (z_arr, out), where z_arr is a float64 array and out maps ids to float64 arrays. For mode="curves", the arrays have length len(z_arr). For mode="segments_prob", the arrays have length len(z_arr) - 1 and sum to 1.

Raises:

ValueError – If z_arr or any curve fails validation, if a curve has non-positive trapezoid integral (needed for normalize/check), if a check fails, or if a curve yields non-positive total segment mass in "segments_prob" mode.
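The validate, normalize, and convert steps described above can be sketched as a single pass over the curves. This is a simplified illustration under stated assumptions: the name `prepare_sketch` is hypothetical, the validation is reduced to shape and monotonicity checks instead of validate_axis_and_weights(), and the tolerance check uses np.isclose, which may not match the library's exact rule.

```python
import numpy as np


def prepare_sketch(z_arr, p, *, mode, curve_norm="none", rtol=1e-3, atol=1e-6):
    """Hypothetical stand-in for prepare_metric_inputs; not the library's code."""
    z = np.asarray(z_arr, dtype=np.float64)
    if z.ndim != 1 or np.any(np.diff(z) <= 0.0):
        raise ValueError("z_arr must be a strictly increasing 1D grid")
    dz = np.diff(z)
    out = {}
    for key, values in p.items():
        curve = np.asarray(values, dtype=np.float64)
        if curve.shape != z.shape:
            raise ValueError(f"curve {key} does not match the grid")
        if curve_norm != "none":
            total = float(np.sum(0.5 * (curve[:-1] + curve[1:]) * dz))
            if total <= 0.0:
                raise ValueError(f"curve {key} has non-positive integral")
            if curve_norm == "normalize":
                curve = curve / total
            elif not np.isclose(total, 1.0, rtol=rtol, atol=atol):
                raise ValueError(f"curve {key} does not integrate to 1")
        if mode == "segments_prob":
            seg = 0.5 * (curve[:-1] + curve[1:]) * dz
            seg_total = float(seg.sum())
            if seg_total <= 0.0:
                raise ValueError(f"curve {key} has non-positive segment mass")
            curve = seg / seg_total
        out[key] = curve
    return z, out


z = np.array([0.0, 1.0, 2.0])
raw = {0: np.array([2.0, 2.0, 2.0])}
_, curves = prepare_sketch(z, raw, mode="curves", curve_norm="normalize")
_, probs = prepare_sketch(z, raw, mode="segments_prob")
```

With mode="curves" the output keeps one value per node, while mode="segments_prob" returns one value per interval, so the two outputs feed different families of metrics.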

binny.utils.pairwise_metrics.segment_mass_probs(z_arr: ndarray[tuple[Any, ...], dtype[float64]], p: Mapping[int, ndarray[tuple[Any, ...], dtype[float64]]]) → dict[int, ndarray[tuple[Any, ...], dtype[float64]]]

Returns per-segment mass probability vectors derived from sampled curves.

This converts each curve into trapezoid masses per segment using binny.utils.normalization.mass_per_segment(), then normalizes the segment masses to a probability vector. The output is suitable for discrete probability-vector metrics such as Jensen–Shannon, Hellinger, and total variation distances.

Parameters:

z_arr – 1D grid of nodes used to define the trapezoid segments.
p – Mapping from bin id to curve values evaluated on z_arr.

Returns:

Mapping from bin id to 1D float64 probability vectors over segments.

Raises:

ValueError – If z_arr or any curve is invalid.
ValueError – If any curve yields non-positive total segment mass.
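One consequence of normalizing segment masses is that the resulting vector depends only on the shape of a curve, not on its overall scale. The sketch below illustrates this with a hypothetical `segment_mass_probs_sketch`; it is not the library's code and skips input validation.

```python
import numpy as np


def segment_mass_probs_sketch(z_arr, p):
    """Hypothetical stand-in for segment_mass_probs; not the library's code."""
    z = np.asarray(z_arr, dtype=np.float64)
    dz = np.diff(z)
    out = {}
    for key, values in p.items():
        curve = np.asarray(values, dtype=np.float64)
        seg = 0.5 * (curve[:-1] + curve[1:]) * dz  # trapezoid mass per segment
        total = float(seg.sum())
        if total <= 0.0:
            raise ValueError(f"curve {key} has non-positive total segment mass")
        out[key] = seg / total
    return out


z = np.array([0.0, 1.0, 2.0, 4.0])
base = np.array([0.0, 1.0, 1.0, 0.0])
# Curve 1 is curve 0 scaled by 7; both map to the same probability vector.
probs = segment_mass_probs_sketch(z, {0: base, 1: 7.0 * base})
```

Because the grid is non-uniform here, the last segment carries as much mass as the middle one despite the curve falling to zero: its width is twice as large.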