binny.nz_tomo.bin_similarity module

Cross-bin comparison metrics for binned redshift distributions.

binny.nz_tomo.bin_similarity.bin_overlap(z: Any, bins: Mapping[int, Any], *, method: str = 'min', unit: Literal['fraction', 'percent'] = 'fraction', normalize: bool = False, rtol: float = 0.001, atol: float = 1e-06, decimal_places: int | None = 2) → dict[int, dict[int, float]]

Computes a pairwise comparison matrix for binned redshift distributions.

This function compares all pairs of bin distributions evaluated on a shared redshift grid and returns a symmetric matrix of values.

Supported methods:

"min": Integral of the pointwise minimum of the two curves. If curves are normalized, values lie in [0, 1] and the diagonal is 1.
"cosine": Cosine similarity under a continuous inner product. For nonnegative curves, values lie in [0, 1], with 1 meaning identical up to overall scaling.
"js": Jensen–Shannon distance computed on segment-mass probability vectors. With normalized curves, values lie in [0, 1], with 0 meaning identical and 1 meaning maximally different under this metric.
"hellinger": Hellinger distance on segment-mass probability vectors (in [0, 1]).
"tv": Total variation distance on segment-mass probability vectors (in [0, 1]).
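The metrics listed above can be sketched in plain Python. This is an illustrative sketch only, not binny's implementation: the helper names (trapz, overlap_min, cosine, segment_masses, total_variation, hellinger, jensen_shannon) are hypothetical, and "segment-mass probability vector" is assumed here to mean the per-segment trapezoid masses rescaled to sum to 1.

```python
import math

def trapz(y, x):
    """Trapezoid-rule integral of samples y over grid x."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k]) for k in range(len(x) - 1))

def overlap_min(z, f, g):
    """'min': integral of the pointwise minimum of the two curves."""
    return trapz([min(a, b) for a, b in zip(f, g)], z)

def cosine(z, f, g):
    """'cosine': <f, g> / (|f| |g|) with <f, g> = integral of f * g dz."""
    fg = trapz([a * b for a, b in zip(f, g)], z)
    ff = trapz([a * a for a in f], z)
    gg = trapz([b * b for b in g], z)
    return fg / math.sqrt(ff * gg)

def segment_masses(z, f):
    """Per-segment trapezoid masses, rescaled to sum to 1 (assumed definition)."""
    m = [0.5 * (f[k] + f[k + 1]) * (z[k + 1] - z[k]) for k in range(len(z) - 1)]
    total = sum(m)
    return [v / total for v in m]

def total_variation(p, q):
    """'tv': half the L1 distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def hellinger(p, q):
    """'hellinger': sqrt(0.5 * sum((sqrt(p) - sqrt(q))^2))."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def jensen_shannon(p, q):
    """'js': Jensen-Shannon distance with base-2 logs, so the range is [0, 1]."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [0.5 * (a + b) for a, b in zip(p, q)]
    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))

# Two unit-integral curves with disjoint support on a shared grid.
z = [0.0, 0.5, 1.0, 1.5, 2.0]
f = [0.0, 2.0, 0.0, 0.0, 0.0]
g = [0.0, 0.0, 0.0, 2.0, 0.0]
p, q = segment_masses(z, f), segment_masses(z, g)
print(overlap_min(z, f, g), cosine(z, f, g))
print(total_variation(p, q), hellinger(p, q), jensen_shannon(p, q))
```

For these disjoint curves the two similarity measures evaluate to 0 and the three distances to 1 (up to floating-point rounding); comparing a curve with itself gives the opposite extremes.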

Parameters:

z – One-dimensional redshift grid shared by all bins.
bins – Mapping from bin index to bin distributions evaluated on z.
method – Pairwise metric to compute.
unit – Output units. If "percent", values are multiplied by 100.
normalize – Whether to normalize curves before comparison.
rtol – Relative tolerance for the normalization check.
atol – Absolute tolerance for the normalization check.
decimal_places – Rounding precision for output values.

Returns:

Nested mapping mat[i][j] giving the pairwise value between bins i and j.

Raises:

ValueError – If method is not supported.

binny.nz_tomo.bin_similarity.leakage_matrix(z: Any, bins: Mapping[int, Any], bin_edges: Mapping[int, tuple[float, float]] | Sequence[float] | ndarray, *, unit: Literal['fraction', 'percent'] = 'fraction', decimal_places: int | None = 2) → dict[int, dict[int, float]]

Computes a leakage/confusion matrix between bins based on nominal edges.

The leakage matrix leak[i][j] gives the fraction of the total mass in bin i that lies within the edges of bin j. The diagonal entries therefore give the completeness of each bin with respect to its nominal edges, while each off-diagonal entry gives the fraction of bin i's mass that leaks into the nominal range of another bin j.
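The definition of leak[i][j] can be illustrated with a small plain-Python sketch. It is not binny's implementation: the helper names (trapz, mass_between, leakage) are hypothetical, edges are taken only in the mapping form, and every edge is assumed to coincide with a grid point so that no partial segments need handling.

```python
def trapz(y, x):
    """Trapezoid-rule integral of samples y over grid x."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k]) for k in range(len(x) - 1))

def mass_between(z, f, lo, hi):
    """Trapezoid mass of f over the grid segments lying inside [lo, hi].

    Assumes lo and hi fall exactly on grid points of z.
    """
    return sum(
        0.5 * (f[k] + f[k + 1]) * (z[k + 1] - z[k])
        for k in range(len(z) - 1)
        if z[k] >= lo and z[k + 1] <= hi
    )

def leakage(z, bins, edges):
    """Nested mapping leak[i][j]: fraction of bin i's mass inside bin j's edges."""
    leak = {}
    for i, f in bins.items():
        total = trapz(f, z)
        if total <= 0:
            raise ValueError(f"bin {i} has non-positive total mass")
        leak[i] = {}
        for j, (lo, hi) in edges.items():
            if hi <= lo:
                raise ValueError(f"invalid edges for bin {j}: hi <= lo")
            leak[i][j] = mass_between(z, f, lo, hi) / total
    return leak

z = [0.0, 0.5, 1.0, 1.5, 2.0]
bins = {0: [0.0, 3.0, 1.0, 0.0, 0.0], 1: [0.0, 0.0, 1.0, 3.0, 0.0]}
edges = {0: (0.0, 1.0), 1: (1.0, 2.0)}
leak = leakage(z, bins, edges)
print(leak)
```

Here leak[0][0] is 0.875 and leak[0][1] is 0.125: 87.5% of bin 0's mass sits inside its own nominal edges and 12.5% falls inside the edges of bin 1. Because the edges tile the full grid, each row sums to 1.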

Parameters:

z – One-dimensional redshift grid shared by all bins.
bins – Mapping from bin index to bin distributions evaluated on z.
bin_edges – Either a mapping from bin index to (low, high) edges, or a sequence/array of edges where bin i has edges (bin_edges[i], bin_edges[i+1]).
unit – Output units. If "percent", values are multiplied by 100.
decimal_places – Rounding precision for output values.

Returns:

Nested mapping leak[i][j] giving the fraction of mass in bin i that lies within the edges of bin j.

Raises:

ValueError – If a bin has non-positive total mass.
ValueError – If bin edges are invalid (hi <= lo).
ValueError – If unit is not supported.

binny.nz_tomo.bin_similarity.overlap_pairs(z: Any, bins: Mapping[int, Any], *, threshold: float = 10.0, unit: Literal['fraction', 'percent'] = 'percent', method: str = 'min', direction: Literal['high', 'low'] = 'high', normalize: bool = False, rtol: float = 0.001, atol: float = 1e-06, decimal_places: int | None = 2) → list[tuple[int, int, float]]

Returns bin-index pairs passing a threshold in a chosen pairwise metric.

This is a convenience wrapper around bin_overlap(). It computes the pairwise matrix and returns the unique off-diagonal pairs (i, j) with i < j that pass the requested threshold.

Parameters:

z – One-dimensional redshift grid shared by all bins.
bins – Mapping from bin index to bin distributions evaluated on z.
threshold – Threshold to apply in the units specified by unit.
unit – Units used for both the overlap calculation and the threshold. Accepted values are "fraction" and "percent".
method – Pairwise metric passed to bin_overlap().
direction – Whether to select values >= threshold ("high") or <= threshold ("low").
normalize – Passed to bin_overlap().
rtol – Relative tolerance for normalization check (if needed).
atol – Absolute tolerance for normalization check (if needed).
decimal_places – Rounding precision for output values.

Returns:

List of (i, j, value) tuples with i < j, sorted by decreasing value for direction="high" and increasing value for direction="low".

Raises:

ValueError – If direction is not "high" or "low".
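The selection and ordering described above can be sketched as a filter over an already-computed nested matrix. This is a hedged illustration rather than binny's code: select_pairs is a hypothetical helper, and the example matrix is made-up data in percent units.

```python
def select_pairs(mat, threshold, direction="high"):
    """Return (i, j, value) for unique off-diagonal pairs i < j passing the threshold."""
    if direction not in ("high", "low"):
        raise ValueError('direction must be "high" or "low"')
    keys = sorted(mat)
    # Visit each unordered pair once, with i < j.
    pairs = [(i, j, mat[i][j]) for a, i in enumerate(keys) for j in keys[a + 1:]]
    if direction == "high":
        kept = [p for p in pairs if p[2] >= threshold]
        return sorted(kept, key=lambda p: p[2], reverse=True)
    kept = [p for p in pairs if p[2] <= threshold]
    return sorted(kept, key=lambda p: p[2])

# Made-up symmetric overlap matrix, in percent.
mat = {
    0: {0: 100.0, 1: 35.0, 2: 2.0},
    1: {0: 35.0, 1: 100.0, 2: 12.0},
    2: {0: 2.0, 1: 12.0, 2: 100.0},
}
high = select_pairs(mat, threshold=10.0, direction="high")
low = select_pairs(mat, threshold=10.0, direction="low")
print(high)
print(low)
```

With a threshold of 10 percent, the "high" selection keeps pairs (0, 1) and (1, 2) in decreasing order of overlap, while the "low" selection keeps only (0, 2).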

binny.nz_tomo.bin_similarity.pearson_matrix(z: Any, bins: Mapping[int, Any], *, normalize: bool = False, rtol: float = 0.001, atol: float = 1e-06, decimal_places: int | None = 2) → dict[int, dict[int, float]]

Computes a trapezoid-weighted Pearson correlation matrix between bin curves.

The Pearson correlation between two curves f(z) and g(z) is defined as

corr(f, g) = cov(f, g) / (std(f) * std(g))

where the covariance and standard deviations are computed using trapezoid integration weights over the redshift grid.
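One way to read the definition above is as a weighted Pearson correlation in which each grid point carries its trapezoid quadrature weight. The sketch below follows that reading; it is an assumption about the weighting scheme, not binny's actual code, and trapezoid_weights and weighted_pearson are hypothetical helper names.

```python
import math

def trapezoid_weights(z):
    """Per-node weights w such that sum(w[k] * f[k]) equals the trapezoid integral of f."""
    w = [0.0] * len(z)
    for k in range(len(z) - 1):
        half = 0.5 * (z[k + 1] - z[k])
        w[k] += half
        w[k + 1] += half
    return w

def weighted_pearson(z, f, g):
    """corr(f, g) = cov(f, g) / (std(f) * std(g)) using trapezoid weights."""
    w = trapezoid_weights(z)
    wsum = sum(w)
    mean_f = sum(wk * a for wk, a in zip(w, f)) / wsum
    mean_g = sum(wk * b for wk, b in zip(w, g)) / wsum
    cov = sum(wk * (a - mean_f) * (b - mean_g) for wk, a, b in zip(w, f, g)) / wsum
    var_f = sum(wk * (a - mean_f) ** 2 for wk, a in zip(w, f)) / wsum
    var_g = sum(wk * (b - mean_g) ** 2 for wk, b in zip(w, g)) / wsum
    return cov / math.sqrt(var_f * var_g)

z = [0.0, 0.5, 1.0, 1.5, 2.0]
f = [0.0, 1.0, 3.0, 1.0, 0.0]
g = [0.0, 0.0, 1.0, 3.0, 1.0]
print(weighted_pearson(z, f, f))
print(weighted_pearson(z, f, g))
```

A curve compared with itself gives a correlation of 1 up to rounding, and the value is symmetric in its two arguments, which is why the resulting matrix is symmetric with a unit diagonal.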

Note: if normalize=True, the comparison is in terms of shape correlations, since all curves are normalized to unit integral before computing the correlation. If normalize=False, the correlation reflects both shape and amplitude similarities.

Parameters:

z – One-dimensional redshift grid shared by all bins.
bins – Mapping from bin index to bin distributions evaluated on z.
normalize – Control normalization behavior. If True, all bins are normalized before computing correlations, raising an error if any already look normalized. If False, bins that do not look normalized are normalized with a warning.
rtol – Relative tolerance for the normalization check.
atol – Absolute tolerance for the normalization check.
decimal_places – Rounding precision for output values.

Returns:

Nested mapping corr[i][j] giving the Pearson correlation between bins i and j.

Raises:

ValueError – If a bin has non-positive integral when normalization is checked or performed.