binny.utils.pairwise_metrics module
Pairwise distance/similarity metrics for curves or segment-mass vectors.
- binny.utils.pairwise_metrics.apply_unit(mat: dict[int, dict[int, float]], unit: Literal['fraction', 'percent']) -> dict[int, dict[int, float]]
Returns a unit-converted copy of a nested-dict metric matrix.
This helper converts matrices expressed as fractions to percentages when requested, while preserving the nested-dict structure.
- Parameters:
mat – Nested dictionary mat[i][j] of metric values.
unit – Output unit, either "fraction" or "percent".
- Returns:
A nested dictionary in the requested unit.
- Raises:
ValueError – If unit is not "fraction" or "percent".
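The conversion described above can be sketched in a few lines. This is a minimal illustration, not binny's implementation: the name apply_unit_sketch is hypothetical, and the sketch assumes the input values are fractions and that "percent" means multiplying by 100.

```python
from typing import Literal

Matrix = dict[int, dict[int, float]]


def apply_unit_sketch(mat: Matrix, unit: Literal["fraction", "percent"]) -> Matrix:
    """Hypothetical stand-in for apply_unit; assumes `mat` holds fractions."""
    if unit not in ("fraction", "percent"):
        raise ValueError(f"unit must be 'fraction' or 'percent', got {unit!r}")
    scale = 100.0 if unit == "percent" else 1.0
    # Build fresh inner dicts so the caller's matrix is never mutated.
    return {i: {j: v * scale for j, v in row.items()} for i, row in mat.items()}


overlap = {0: {0: 1.0, 1: 0.25}, 1: {0: 0.25, 1: 1.0}}
as_percent = apply_unit_sketch(overlap, "percent")
print(as_percent[0][1])  # 25.0
```

Returning a copy in both branches keeps the behavior uniform: callers can modify the result without affecting the original matrix.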
- binny.utils.pairwise_metrics.fill_symmetric(bin_indices: list[int], pair_value: Callable[[int, int], float]) -> dict[int, dict[int, float]]
Returns a symmetric nested-dict matrix from a pairwise value function.
This helper evaluates a pairwise metric on a set of bin indices and stores the results in a symmetric nested dictionary. It computes the upper triangle (including the diagonal) and mirrors values to fill the lower triangle.
- Parameters:
bin_indices – Bin ids to include in the output matrix.
pair_value – Callable returning the metric value for a pair of bin ids.
- Returns:
A nested dictionary out[i][j] containing the metric for each pair of bin ids.
- Raises:
Exception – Propagates any exception raised by pair_value during evaluation.
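The upper-triangle-and-mirror strategy can be sketched as follows. The names fill_symmetric_sketch and abs_gap are hypothetical and only illustrate the idea; the actual function may differ in detail.

```python
from typing import Callable


def fill_symmetric_sketch(
    bin_indices: list[int], pair_value: Callable[[int, int], float]
) -> dict[int, dict[int, float]]:
    """Evaluate pair_value on the upper triangle and mirror the results."""
    out: dict[int, dict[int, float]] = {i: {} for i in bin_indices}
    for a, i in enumerate(bin_indices):
        # Start at position `a` so the diagonal entry (i, i) is included.
        for j in bin_indices[a:]:
            value = float(pair_value(i, j))
            out[i][j] = value
            out[j][i] = value  # mirror into the lower triangle
    return out


calls: list[tuple[int, int]] = []


def abs_gap(i: int, j: int) -> float:
    calls.append((i, j))  # record evaluations to show how few are needed
    return float(abs(i - j))


matrix = fill_symmetric_sketch([0, 1, 2], abs_gap)
print(matrix[2][0], len(calls))  # 2.0 6
```

For n bins this evaluates the metric n(n + 1)/2 times instead of n², which matters when each evaluation involves an integral.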
- binny.utils.pairwise_metrics.mass_per_segment(z_arr: ndarray[tuple[Any, ...], dtype[float64]], p_arr: ndarray[tuple[Any, ...], dtype[float64]]) -> ndarray[tuple[Any, ...], dtype[float64]]
Returns trapezoid masses per grid segment for a curve sampled at nodes.
This converts node values into per-interval masses using the trapezoid rule, which is useful for building cumulative masses, rebinning, or diagnostics that operate on segment contributions rather than node values.
- Parameters:
z_arr – 1D array of grid nodes.
p_arr – 1D array of curve values at the nodes.
- Returns:
A float64 array of length len(z_arr) - 1 containing trapezoid masses for each adjacent node interval.
- Raises:
ValueError – If inputs are not 1D arrays of the same length.
ValueError – If z_arr is not strictly increasing.
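As an illustration of the trapezoid rule applied per interval, here is a minimal pure-Python sketch. Plain lists stand in for the float64 arrays, and mass_per_segment_sketch is a hypothetical name, not the library function.

```python
from typing import Sequence


def mass_per_segment_sketch(z: Sequence[float], p: Sequence[float]) -> list[float]:
    """Trapezoid mass of each interval [z[k], z[k+1]]."""
    if len(z) != len(p):
        raise ValueError("z and p must have the same length")
    if any(b <= a for a, b in zip(z, z[1:])):
        raise ValueError("z must be strictly increasing")
    # Mean of the two endpoint values times the interval width.
    return [0.5 * (p[k] + p[k + 1]) * (z[k + 1] - z[k]) for k in range(len(z) - 1)]


masses = mass_per_segment_sketch([0.0, 1.0, 3.0], [0.0, 2.0, 2.0])
print(masses)  # [1.0, 4.0]
```

Summing the returned masses gives the trapezoid integral of the curve over the full grid.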
- binny.utils.pairwise_metrics.pair_cosine(z_arr: ndarray[tuple[Any, ...], dtype[float64]], p: Mapping[int, ndarray[tuple[Any, ...], dtype[float64]]]) -> Callable[[int, int], float]
Computes cosine similarity under a trapezoid inner product.
This similarity treats curves as functions on a shared grid and computes a cosine-like similarity using trapezoid integration to define the inner product and L2 norms. It is useful for comparing curve shapes while reducing sensitivity to overall scale; values near 1 indicate similar shapes, and values near 0 indicate near-orthogonality under the chosen inner product.
This function validates and caches the input curves once, precomputes per-curve norms, and returns a callable suitable for repeated pairwise evaluation.
- Parameters:
z_arr – 1D grid of nodes used for trapezoid integration.
p – Mapping from bin id to curve values evaluated on z_arr.
- Returns:
A function f(i, j) that returns cosine similarity for bins i and j. If either curve has zero norm under the trapezoid inner product, the similarity is defined to be 0.
- Raises:
ValueError – If z_arr is not a valid strictly increasing 1D grid, or any curve is invalid on that grid.
KeyError – If i or j is not present in p when evaluating the callable.
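The cache-once, evaluate-many pattern can be sketched like this. It is a simplified illustration under stated assumptions: the inner product is taken as the trapezoid integral of the node-wise product, input validation is omitted, and the names trapz and pair_cosine_sketch are hypothetical.

```python
import math
from typing import Callable, Mapping, Sequence


def trapz(z: Sequence[float], y: Sequence[float]) -> float:
    """Trapezoid integral of y over the grid z."""
    return sum(0.5 * (y[k] + y[k + 1]) * (z[k + 1] - z[k]) for k in range(len(z) - 1))


def pair_cosine_sketch(
    z: Sequence[float], curves: Mapping[int, Sequence[float]]
) -> Callable[[int, int], float]:
    # Copy the curves and precompute each norm once, outside the pairwise loop.
    cached = {i: [float(v) for v in c] for i, c in curves.items()}
    norms = {i: math.sqrt(trapz(z, [v * v for v in c])) for i, c in cached.items()}

    def f(i: int, j: int) -> float:
        ni, nj = norms[i], norms[j]  # raises KeyError for unknown bin ids
        if ni == 0.0 or nj == 0.0:
            return 0.0  # zero-norm convention described above
        inner = trapz(z, [a * b for a, b in zip(cached[i], cached[j])])
        return inner / (ni * nj)

    return f


cosine = pair_cosine_sketch(
    [0.0, 1.0, 2.0],
    {0: [1.0, 1.0, 1.0], 1: [2.0, 2.0, 2.0], 2: [0.0, 0.0, 0.0]},
)
print(round(cosine(0, 1), 12))  # 1.0, since curve 1 is a rescaled copy of curve 0
print(cosine(0, 2))             # 0.0, since curve 2 has zero norm
```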
- binny.utils.pairwise_metrics.pair_hellinger(masses: Mapping[int, ndarray[tuple[Any, ...], dtype[float64]]]) -> Callable[[int, int], float]
Computes Hellinger distance between probability vectors.
Hellinger distance is a bounded, symmetric distance on discrete probability vectors. It is often used for stable comparisons of distributions represented on a fixed set of bins or segments. The returned callable validates inputs on each evaluation (shape/probability checks) using binny.utils.validators.validate_probability_vector().
- Parameters:
masses – Mapping from bin id to 1D probability vectors.
- Returns:
A function f(i, j) that returns Hellinger distance for bins i and j.
- Raises:
KeyError – If i or j is not present in masses when evaluating the callable.
ValueError – If either vector is not a valid probability vector, or shapes differ.
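A minimal sketch of a Hellinger callable with per-evaluation validation follows. It assumes the common normalization H(p, q) = sqrt(0.5 * sum((sqrt(p_k) - sqrt(q_k))^2)), which is bounded in [0, 1]; the page does not state which normalization the library uses. The helper _check_probability_vector is a simplified, hypothetical stand-in for validate_probability_vector(), whose actual checks may differ.

```python
import math
from typing import Callable, Mapping, Sequence


def _check_probability_vector(v: Sequence[float], tol: float = 1e-9) -> None:
    # Simplified stand-in: nonnegative entries that sum to 1 within `tol`.
    if any(x < 0.0 for x in v) or abs(sum(v) - 1.0) > tol:
        raise ValueError("not a valid probability vector")


def pair_hellinger_sketch(
    masses: Mapping[int, Sequence[float]],
) -> Callable[[int, int], float]:
    def f(i: int, j: int) -> float:
        p, q = masses[i], masses[j]  # raises KeyError for unknown bin ids
        if len(p) != len(q):
            raise ValueError("shapes differ")
        _check_probability_vector(p)
        _check_probability_vector(q)
        sq = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
        return math.sqrt(0.5 * sq)

    return f


hellinger = pair_hellinger_sketch({0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]})
print(hellinger(0, 1))  # 1.0 for disjoint supports
print(hellinger(2, 2))  # 0.0 for identical vectors
```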
- binny.utils.pairwise_metrics.pair_js(masses: Mapping[int, ndarray[tuple[Any, ...], dtype[float64]]]) -> Callable[[int, int], float]
Computes Jensen–Shannon distance between probability vectors.
This distance compares two discrete probability vectors (e.g., per-segment mass probabilities) using Jensen–Shannon divergence and returns its square root. With base-2 logarithms, the resulting distance is bounded in [0, 1] and is symmetric. The returned callable validates inputs on each evaluation (shape/probability checks) using binny.utils.validators.validate_probability_vector().
- Parameters:
masses – Mapping from bin id to 1D probability vectors.
- Returns:
A function f(i, j) that returns Jensen–Shannon distance for bins i and j.
- Raises:
KeyError – If i or j is not present in masses when evaluating the callable.
ValueError – If either vector is not a valid probability vector, or shapes differ.
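The distance itself can be written compactly. The sketch below shows only the formula with base-2 logarithms and omits validation; js_distance_sketch is a hypothetical name, and the treatment of zero entries (terms with p_k = 0 contribute nothing) is the usual convention rather than something this page specifies.

```python
import math
from typing import Sequence


def js_distance_sketch(p: Sequence[float], q: Sequence[float]) -> float:
    """Square root of the base-2 Jensen-Shannon divergence."""
    m = [0.5 * (a + b) for a, b in zip(p, q)]  # mixture distribution

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with x_k == 0 are skipped (0 * log 0 is taken as 0).
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0.0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Clamp tiny negative rounding errors before taking the square root.
    return math.sqrt(max(0.0, divergence))


print(js_distance_sketch([1.0, 0.0], [0.0, 1.0]))  # 1.0 for disjoint supports
print(js_distance_sketch([0.5, 0.5], [0.5, 0.5]))  # 0.0 for identical vectors
```

Because the mixture m is positive wherever p or q is positive, the logarithm never sees a zero denominator.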
- binny.utils.pairwise_metrics.pair_min(z_arr: ndarray[tuple[Any, ...], dtype[float64]], p: Mapping[int, ndarray[tuple[Any, ...], dtype[float64]]]) -> Callable[[int, int], float]
Computes overlap as the integral of the pointwise minimum.
This overlap score integrates min(p_i(z), p_j(z)) over a shared grid using the trapezoid rule. It is commonly used to quantify how strongly two nonnegative distributions overlap (e.g., tomographic n_i(z) bins), with larger values indicating more shared support.
This function validates and caches the input curves once, and returns a callable suitable for repeated pairwise evaluation.
- Parameters:
z_arr – 1D grid of nodes used for trapezoid integration.
p – Mapping from bin id to curve values evaluated on z_arr.
- Returns:
A function f(i, j) that returns the overlap integral for bins i and j.
- Raises:
ValueError – If z_arr is not a valid strictly increasing 1D grid, or any curve is invalid on that grid.
KeyError – If i or j is not present in p when evaluating the callable.
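A small sketch of the overlap integral is given below. It takes the minimum at each node and integrates the result with the trapezoid rule, which is one natural reading of the description; validation is omitted and the names trapz and pair_min_sketch are hypothetical.

```python
from typing import Callable, Mapping, Sequence


def trapz(z: Sequence[float], y: Sequence[float]) -> float:
    """Trapezoid integral of y over the grid z."""
    return sum(0.5 * (y[k] + y[k + 1]) * (z[k + 1] - z[k]) for k in range(len(z) - 1))


def pair_min_sketch(
    z: Sequence[float], curves: Mapping[int, Sequence[float]]
) -> Callable[[int, int], float]:
    # Copy the curves once so repeated evaluations reuse the same data.
    cached = {i: [float(v) for v in c] for i, c in curves.items()}

    def f(i: int, j: int) -> float:
        pi, pj = cached[i], cached[j]  # raises KeyError for unknown bin ids
        lower = [min(a, b) for a, b in zip(pi, pj)]  # node-wise minimum
        return trapz(z, lower)

    return f


overlap = pair_min_sketch([0.0, 1.0, 2.0], {0: [1.0, 1.0, 0.0], 1: [0.0, 1.0, 1.0]})
print(overlap(0, 1))  # 1.0
print(overlap(0, 0))  # 1.5, the integral of curve 0 itself
```

If the curves are normalized to unit integral, the overlap of a curve with itself is 1, so the score can be read as a fraction of shared mass.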
- binny.utils.pairwise_metrics.pair_tv(masses: Mapping[int, ndarray[tuple[Any, ...], dtype[float64]]]) -> Callable[[int, int], float]
Computes total variation distance between probability vectors.
Total variation distance is half the L1 distance between two discrete probability vectors. For valid probability vectors, it is bounded in [0, 1] and gives an interpretable notion of distributional difference. The returned callable validates inputs on each evaluation (shape/probability checks) using binny.utils.validators.validate_probability_vector().
- Parameters:
masses – Mapping from bin id to 1D probability vectors.
- Returns:
A function f(i, j) that returns total variation distance for bins i and j.
- Raises:
KeyError – If i or j is not present in masses when evaluating the callable.
ValueError – If either vector is not a valid probability vector, or shapes differ.
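The "half the L1 distance" definition translates directly into code. The sketch below shows the formula only, with a shape check but no probability validation; tv_distance_sketch is a hypothetical name.

```python
from typing import Sequence


def tv_distance_sketch(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance: half the sum of absolute differences."""
    if len(p) != len(q):
        raise ValueError("shapes differ")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


print(tv_distance_sketch([0.5, 0.5], [1.0, 0.0]))  # 0.5
print(tv_distance_sketch([1.0, 0.0], [0.0, 1.0]))  # 1.0 for disjoint supports
```

The value can be read as the largest difference in probability that the two distributions assign to any set of segments.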
- binny.utils.pairwise_metrics.prepare_metric_inputs(z_arr: ndarray[tuple[Any, ...], dtype[float64]], p: Mapping[int, ndarray[tuple[Any, ...], dtype[float64]]], *, mode: Literal['curves', 'segments_prob'], curve_norm: Literal['none', 'normalize', 'check'] = 'none', rtol: float = 0.001, atol: float = 1e-06) -> tuple[ndarray[tuple[Any, ...], dtype[float64]], dict[int, ndarray[tuple[Any, ...], dtype[float64]]]]
Prepares inputs for pairwise metrics (validate once; optionally normalize).
This is a convenience wrapper that standardizes the common boilerplate for pairwise curve metrics:
- Validates z_arr and each curve in p using validate_axis_and_weights().
- Optionally normalizes curves to unit trapezoid integral or checks they already are.
- Optionally converts curves to per-segment probability vectors (segment masses normalized to sum to 1), suitable for discrete probability metrics.
- Parameters:
z_arr – 1D strictly increasing grid of nodes.
p – Mapping from id to curve values evaluated on z_arr.
mode – Output mode:
- "curves": return validated (and possibly normalized) node curves.
- "segments_prob": return per-segment mass probability vectors.
curve_norm – How to treat curve normalization before any conversion:
- "none": no normalization checks beyond basic validation.
- "normalize": divide each curve by its trapezoid integral.
- "check": require each curve integrates to 1 within tolerance.
rtol – Relative tolerance for the unit-integral check when curve_norm="check".
atol – Absolute tolerance for the unit-integral check when curve_norm="check".
- Returns:
(z_arr, out), where z_arr is float64 and out maps ids to arrays:
- For mode="curves": arrays have length len(z_arr).
- For mode="segments_prob": arrays have length len(z_arr) - 1 and sum to 1.
- Raises:
ValueError – If z_arr or any curve fails validation, if a curve has non-positive trapezoid integral (needed for normalize/check), if a check fails, or if a curve yields non-positive total segment mass in "segments_prob" mode.
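The three curve_norm options can be illustrated for a single curve as follows. This is a sketch under assumptions, not the library's code: apply_curve_norm_sketch and trapz are hypothetical names, and math.isclose is used as one plausible way to combine rtol and atol (the page does not say how the two tolerances are combined).

```python
import math
from typing import Sequence


def trapz(z: Sequence[float], y: Sequence[float]) -> float:
    """Trapezoid integral of y over the grid z."""
    return sum(0.5 * (y[k] + y[k + 1]) * (z[k + 1] - z[k]) for k in range(len(z) - 1))


def apply_curve_norm_sketch(
    z: Sequence[float],
    curve: Sequence[float],
    curve_norm: str = "none",
    rtol: float = 1e-3,
    atol: float = 1e-6,
) -> list[float]:
    if curve_norm == "none":
        return list(curve)
    if curve_norm not in ("normalize", "check"):
        raise ValueError(f"unknown curve_norm: {curve_norm!r}")
    total = trapz(z, curve)
    if total <= 0.0:
        raise ValueError("curve has non-positive trapezoid integral")
    if curve_norm == "normalize":
        return [v / total for v in curve]
    # "check": assumed tolerance test; the real combination of rtol/atol may differ.
    if not math.isclose(total, 1.0, rel_tol=rtol, abs_tol=atol):
        raise ValueError(f"curve integrates to {total}, expected 1")
    return list(curve)


z = [0.0, 1.0, 2.0]
raw = [0.0, 2.0, 0.0]  # trapezoid integral is 2.0
unit = apply_curve_norm_sketch(z, raw, "normalize")
print(unit, trapz(z, unit))  # [0.0, 1.0, 0.0] 1.0
```

Passing the normalized curve through "check" succeeds, while passing the raw curve raises ValueError because its integral is 2.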
- binny.utils.pairwise_metrics.segment_mass_probs(z_arr: ndarray[tuple[Any, ...], dtype[float64]], p: Mapping[int, ndarray[tuple[Any, ...], dtype[float64]]]) -> dict[int, ndarray[tuple[Any, ...], dtype[float64]]]
Returns per-segment mass probability vectors derived from sampled curves.
This converts each curve into trapezoid masses per segment using binny.utils.normalization.mass_per_segment(), then normalizes the segment masses to a probability vector. The output is suitable for discrete probability-vector metrics such as Jensen–Shannon, Hellinger, and total variation distances.
- Parameters:
z_arr – 1D grid of nodes used to define the trapezoid segments.
p – Mapping from bin id to curve values evaluated on z_arr.
- Returns:
Mapping from bin id to 1D float64 probability vectors over segments.
- Raises:
ValueError – If z_arr or any curve is invalid.
ValueError – If any curve yields non-positive total segment mass.
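The two-step conversion (segment masses, then normalization) can be sketched as below. Plain lists stand in for the float64 arrays, grid and curve validation is omitted, and segment_mass_probs_sketch is a hypothetical name.

```python
from typing import Mapping, Sequence


def segment_mass_probs_sketch(
    z: Sequence[float], curves: Mapping[int, Sequence[float]]
) -> dict[int, list[float]]:
    out: dict[int, list[float]] = {}
    for i, p in curves.items():
        # Step 1: trapezoid mass of each interval [z[k], z[k+1]].
        masses = [0.5 * (p[k] + p[k + 1]) * (z[k + 1] - z[k]) for k in range(len(z) - 1)]
        total = sum(masses)
        if total <= 0.0:
            raise ValueError(f"curve {i} has non-positive total segment mass")
        # Step 2: divide by the total so the vector sums to 1.
        out[i] = [m / total for m in masses]
    return out


probs = segment_mass_probs_sketch([0.0, 1.0, 3.0], {0: [0.0, 2.0, 2.0]})
print(probs[0])  # [0.2, 0.8]
```

The resulting vectors have one entry per grid interval, so all curves on the same grid produce vectors of the same length and can be compared directly.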