binny.nz_tomo.between_sample_metrics module

Cross-bin comparison metrics for binned redshift distributions.

binny.nz_tomo.between_sample_metrics.between_bin_overlap(z: Any, bins_a: Mapping[int, Any], bins_b: Mapping[int, Any], *, method: str = 'min', unit: Literal['fraction', 'percent'] = 'fraction', normalize: bool = False, rtol: float = 0.001, atol: float = 1e-06, decimal_places: int | None = 2) → dict[int, dict[int, float]]

Computes a rectangular pairwise comparison matrix between two bin sets.

This function compares all bin distributions from one tomographic sample against all bin distributions from another tomographic sample, assuming both are evaluated on a shared redshift grid. The output is generally rectangular rather than symmetric, since the two samples can contain different bin indices and different numbers of bins.

Supported methods:

  • "min": Integral of the pointwise minimum of the two curves. If curves are normalized, values lie in [0, 1].

  • "cosine": Cosine similarity under a continuous inner product. For nonnegative curves, values lie in [0, 1], with 1 meaning identical up to overall scaling.

  • "js": Jensen–Shannon distance computed on segment-mass probability vectors. With normalized curves, values lie in [0, 1], with 0 meaning identical and larger values meaning more distinct distributions.

  • "hellinger": Hellinger distance on segment-mass probability vectors (in [0, 1]).

  • "tv": Total variation distance on segment-mass probability vectors (in [0, 1]).
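To make the first two methods concrete, here is a minimal standard-library sketch of a "min" overlap and a cosine similarity on a shared grid. It assumes trapezoid-rule integration; the helper names trapz, min_overlap and cosine_similarity are illustrative and not part of binny, whose implementation may differ.

```python
import math

def trapz(y, x):
    """Trapezoid-rule integral of samples y over grid x."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k]) for k in range(len(x) - 1))

def min_overlap(z, f, g):
    """Integral of the pointwise minimum of two curves on grid z."""
    return trapz([min(a, b) for a, b in zip(f, g)], z)

def cosine_similarity(z, f, g):
    """<f, g> / (||f|| * ||g||) using a trapezoid-rule inner product."""
    def inner(u, v):
        return trapz([a * b for a, b in zip(u, v)], z)
    return inner(f, g) / math.sqrt(inner(f, f) * inner(g, g))

# Two unit-integral curves that share part of their support.
z = [0.0, 1.0, 2.0, 3.0, 4.0]
f = [0.0, 0.5, 0.5, 0.0, 0.0]
g = [0.0, 0.0, 0.5, 0.5, 0.0]

overlap_fg = min_overlap(z, f, g)       # partial overlap
overlap_ff = min_overlap(z, f, f)       # a normalized curve against itself
cosine_fg = cosine_similarity(z, f, g)
```

Because the minimum is taken at grid nodes only, a coarse grid can under-resolve the overlap of narrow curves.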

Parameters:
  • z – One-dimensional redshift grid shared by both bin sets.

  • bins_a – Mapping from first-sample bin index to bin distributions evaluated on z.

  • bins_b – Mapping from second-sample bin index to bin distributions evaluated on z.

  • method – Pairwise metric to compute.

  • unit – Output units. If "percent", values are multiplied by 100.

  • normalize – Whether to normalize curves before comparison.

  • rtol – Relative tolerance for the normalization check.

  • atol – Absolute tolerance for the normalization check.

  • decimal_places – Rounding precision for output values.
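As an illustration of how rtol and atol might enter a normalization check, the sketch below tests whether a curve's trapezoid integral is close to 1 and rescales it otherwise. The exact form of the check is an assumption (math.isclose is used here for convenience), and is_normalized and normalize_curve are hypothetical helpers, not binny functions.

```python
import math

def trapz(y, x):
    """Trapezoid-rule integral of samples y over grid x."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k]) for k in range(len(x) - 1))

def is_normalized(z, curve, rtol=1e-3, atol=1e-6):
    """True if the integral of curve over z is within tolerance of 1 (assumed check)."""
    return math.isclose(trapz(curve, z), 1.0, rel_tol=rtol, abs_tol=atol)

def normalize_curve(z, curve):
    """Rescale curve to unit integral; reject non-positive total mass."""
    total = trapz(curve, z)
    if total <= 0:
        raise ValueError("curve has non-positive integral")
    return [v / total for v in curve]

z = [0.0, 1.0, 2.0, 3.0]
raw = [0.0, 1.0, 1.0, 0.0]          # integral is 2, so not normalized
unit = normalize_curve(z, raw)      # rescaled to unit integral
```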

Returns:

Nested mapping mat[i][j] giving the pairwise value between first-sample bin i and second-sample bin j.

Raises:

ValueError – If method is not supported.

binny.nz_tomo.between_sample_metrics.between_interval_mass_matrix(z: Any, bins: Mapping[int, Any], target_edges: Mapping[int, tuple[float, float]] | Sequence[float] | ndarray, *, unit: Literal['fraction', 'percent'] = 'fraction', decimal_places: int | None = 2) → dict[int, dict[int, float]]

Computes a rectangular interval-mass matrix against target bin edges.

The interval-mass matrix mass[i][j] gives the fraction of the total mass in input bin i that lies within target interval j. This is the between-sample analogue of a leakage matrix and is useful, for example, when asking how much of a source bin falls inside a lens-bin interval.
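The definition of mass[i][j] can be sketched in a few lines of standard-library Python. This version assumes piecewise-linear curves, trapezoid integration, and linear interpolation at interval edges that fall between grid nodes; interp, mass_between and interval_mass_matrix are illustrative names rather than binny's API.

```python
from bisect import bisect_right

def trapz(y, x):
    """Trapezoid-rule integral of samples y over grid x."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k]) for k in range(len(x) - 1))

def interp(x, xs, ys):
    """Linear interpolation of (xs, ys) at x, clamped at the grid ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_right(xs, x) - 1
    t = (x - xs[k]) / (xs[k + 1] - xs[k])
    return ys[k] + t * (ys[k + 1] - ys[k])

def mass_between(z, curve, lo, hi):
    """Integral of curve over [lo, hi], clipped to the extent of the grid."""
    lo, hi = max(lo, z[0]), min(hi, z[-1])
    if hi <= lo:
        return 0.0
    xs = [lo] + [x for x in z if lo < x < hi] + [hi]
    return trapz([interp(x, z, curve) for x in xs], xs)

def interval_mass_matrix(z, bins, target_edges):
    """mass[i][j]: fraction of bin i's total mass inside target interval j."""
    out = {}
    for i, curve in bins.items():
        total = trapz(curve, z)
        if total <= 0:
            raise ValueError(f"bin {i} has non-positive total mass")
        out[i] = {j: mass_between(z, curve, lo, hi) / total
                  for j, (lo, hi) in target_edges.items()}
    return out

z = [0.0, 1.0, 2.0, 3.0, 4.0]
source_bins = {0: [0.0, 1.0, 1.0, 0.0, 0.0]}    # made-up source bin
lens_edges = {0: (0.0, 2.0), 1: (2.0, 4.0)}     # made-up lens intervals
mass = interval_mass_matrix(z, source_bins, lens_edges)
```

Here three quarters of the source bin fall inside the first lens interval and one quarter inside the second, so each row sums to 1 when the target intervals tile the grid.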

Parameters:
  • z – One-dimensional redshift grid shared by all input bins.

  • bins – Mapping from input bin index to bin distributions evaluated on z.

  • target_edges – Either a mapping from target bin index to (low, high) edges, or a sequence/array of edges where target bin j has edges (target_edges[j], target_edges[j+1]).

  • unit – Output units. If "percent", values are multiplied by 100.

  • decimal_places – Rounding precision for output values.

Returns:

Nested mapping mass[i][j] giving the fraction of mass in input bin i that lies within target interval j.

Raises:
  • ValueError – If a bin has non-positive total mass.

  • ValueError – If target edges are invalid (hi <= lo).

  • ValueError – If unit is not supported.

binny.nz_tomo.between_sample_metrics.between_overlap_pairs(z: Any, bins_a: Mapping[int, Any], bins_b: Mapping[int, Any], *, threshold: float = 10.0, unit: Literal['fraction', 'percent'] = 'percent', method: str = 'min', direction: Literal['high', 'low'] = 'high', normalize: bool = False, rtol: float = 0.001, atol: float = 1e-06, decimal_places: int | None = 2) → list[tuple[int, int, float]]

Returns between-sample bin pairs passing a threshold in a chosen metric.

This is a convenience wrapper around between_bin_overlap(). It computes the rectangular pairwise matrix between two tomographic samples and returns all bin pairs (i, j) that pass the requested threshold.

Parameters:
  • z – One-dimensional redshift grid shared by both bin sets.

  • bins_a – Mapping from first-sample bin index to bin distributions evaluated on z.

  • bins_b – Mapping from second-sample bin index to bin distributions evaluated on z.

  • threshold – Threshold to apply in the units specified by unit.

  • unit – Units used for both the metric calculation and the threshold. Accepted values are "fraction" and "percent".

  • method – Pairwise metric passed to between_bin_overlap().

  • direction – Whether to select values >= threshold ("high") or <= threshold ("low").

  • normalize – Passed to between_bin_overlap().

  • rtol – Relative tolerance for normalization check (if needed).

  • atol – Absolute tolerance for normalization check (if needed).

  • decimal_places – Rounding precision for output values.

Returns:

List of (i, j, value) tuples, where i is a first-sample bin index and j is a second-sample bin index. Results are sorted by decreasing value for direction="high" and increasing value for direction="low".
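The selection and ordering described above amount to a filter followed by a sort. The sketch below applies that logic to a nested mapping of the kind returned by between_bin_overlap(); the matrix values are made up for illustration and pairs_passing is a hypothetical helper, not a binny function.

```python
def pairs_passing(mat, threshold, direction="high"):
    """Return (i, j, value) tuples from mat[i][j] that pass the threshold."""
    if direction not in ("high", "low"):
        raise ValueError('direction must be "high" or "low"')
    if direction == "high":
        keep = lambda v: v >= threshold
    else:
        keep = lambda v: v <= threshold
    pairs = [(i, j, v) for i, row in mat.items() for j, v in row.items() if keep(v)]
    # Decreasing value for "high", increasing value for "low".
    return sorted(pairs, key=lambda p: p[2], reverse=(direction == "high"))

# Illustrative 2 x 3 matrix in percent (values are invented, not computed).
mat = {
    0: {0: 62.5, 1: 12.0, 2: 0.4},
    1: {0: 8.1, 1: 55.0, 2: 21.3},
}

high_pairs = pairs_passing(mat, 10.0, "high")
low_pairs = pairs_passing(mat, 10.0, "low")
```

The threshold is compared in the same units as the matrix entries, which is why the wrapper ties unit to both the metric calculation and the threshold.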

Raises:

ValueError – If direction is not "high" or "low".

binny.nz_tomo.between_sample_metrics.between_pearson_matrix(z: Any, bins_a: Mapping[int, Any], bins_b: Mapping[int, Any], *, normalize: bool = False, rtol: float = 0.001, atol: float = 1e-06, decimal_places: int | None = 2) → dict[int, dict[int, float]]

Computes a rectangular trapezoid-weighted Pearson matrix between two bin sets.

The Pearson correlation between two curves f(z) and g(z) is defined as

corr(f, g) = cov(f, g) / (std(f) * std(g))

where the covariance and standard deviations are computed using trapezoid integration weights over the redshift grid.
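One way to read this definition is as an ordinary weighted Pearson correlation whose per-node weights come from the trapezoid rule. The sketch below follows that reading; the weight construction and the helper names trapezoid_weights and weighted_pearson are assumptions for illustration, not binny's code.

```python
import math

def trapezoid_weights(z):
    """Per-node trapezoid weights: each segment gives half its width to each end."""
    w = [0.0] * len(z)
    for k in range(len(z) - 1):
        half = 0.5 * (z[k + 1] - z[k])
        w[k] += half
        w[k + 1] += half
    return w

def weighted_pearson(z, f, g):
    """corr(f, g) = cov(f, g) / (std(f) * std(g)) with trapezoid weights."""
    w = trapezoid_weights(z)
    total = sum(w)
    mean_f = sum(wk * fk for wk, fk in zip(w, f)) / total
    mean_g = sum(wk * gk for wk, gk in zip(w, g)) / total
    cov = sum(wk * (fk - mean_f) * (gk - mean_g) for wk, fk, gk in zip(w, f, g)) / total
    var_f = sum(wk * (fk - mean_f) ** 2 for wk, fk in zip(w, f)) / total
    var_g = sum(wk * (gk - mean_g) ** 2 for wk, gk in zip(w, g)) / total
    return cov / math.sqrt(var_f * var_g)

z = [0.0, 1.0, 2.0, 3.0, 4.0]
rising = [0.0, 1.0, 2.0, 3.0, 4.0]
rising_scaled = [1.0, 3.0, 5.0, 7.0, 9.0]   # 2 * rising + 1
falling = [4.0, 3.0, 2.0, 1.0, 0.0]

corr_same = weighted_pearson(z, rising, rising_scaled)   # perfectly correlated
corr_anti = weighted_pearson(z, rising, falling)         # perfectly anti-correlated
```

On a uniform grid the weights differ from equal weighting only at the two end nodes, which each receive half the interior weight.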

Unlike pearson_matrix(), this function compares two different tomographic samples and therefore returns a rectangular matrix corr[i][j], where i is from the first sample and j is from the second sample.

Note: if normalize=True, the comparison is in terms of shape correlations, since all curves are normalized to unit integral before computing the correlation. If normalize=False, the correlation reflects both shape and amplitude similarities.

Parameters:
  • z – One-dimensional redshift grid shared by both bin sets.

  • bins_a – Mapping from first-sample bin index to bin distributions evaluated on z.

  • bins_b – Mapping from second-sample bin index to bin distributions evaluated on z.

  • normalize – Whether to normalize curves before computing correlations.

  • rtol – Relative tolerance for the normalization check.

  • atol – Absolute tolerance for the normalization check.

  • decimal_places – Rounding precision for output values.

Returns:

Nested mapping corr[i][j] giving the Pearson correlation between first-sample bin i and second-sample bin j.

Raises:
  • ValueError – If either bin set contains a bin with non-positive integral when normalization is checked or performed.

  • ValueError – If the two bin sets are not evaluated on the same z grid.

binny.nz_tomo.between_sample_metrics.bin_overlap(z: Any, bins: Mapping[int, Any], *, method: str = 'min', unit: Literal['fraction', 'percent'] = 'fraction', normalize: bool = False, rtol: float = 0.001, atol: float = 1e-06, decimal_places: int | None = 2) → dict[int, dict[int, float]]

Computes a pairwise comparison matrix for binned redshift distributions.

This function compares all pairs of bin distributions evaluated on a shared redshift grid and returns a symmetric matrix of values.

Supported methods:

  • "min": Integral of the pointwise minimum of the two curves. If curves are normalized, values lie in [0, 1] and the diagonal is 1.

  • "cosine": Cosine similarity under a continuous inner product. For nonnegative curves, values lie in [0, 1], with 1 meaning identical up to overall scaling.

  • "js": Jensen–Shannon distance computed on segment-mass probability vectors. With normalized curves, values lie in [0, 1], with 0 meaning identical and 1 meaning maximally different under this metric.

  • "hellinger": Hellinger distance on segment-mass probability vectors (in [0, 1]).

  • "tv": Total variation distance on segment-mass probability vectors (in [0, 1]).
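The three distance methods operate on segment-mass probability vectors rather than on the curves themselves. A minimal sketch of that construction follows, assuming each segment mass is the trapezoid area of one grid interval and that the Jensen–Shannon distance uses base-2 logarithms (which bounds it by 1); all function names here are illustrative and not taken from binny.

```python
import math

def segment_masses(z, curve):
    """Probability vector of trapezoid areas, one entry per grid segment."""
    masses = [0.5 * (curve[k] + curve[k + 1]) * (z[k + 1] - z[k])
              for k in range(len(z) - 1)]
    total = sum(masses)
    if total <= 0:
        raise ValueError("curve has non-positive total mass")
    return [m / total for m in masses]

def total_variation(p, q):
    """Half the L1 distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def hellinger(p, q):
    """Hellinger distance between two probability vectors."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def js_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence."""
    m = [0.5 * (a + b) for a, b in zip(p, q)]
    def kl(u, v):
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)
    # Guard against a tiny negative value caused by rounding.
    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))

z = [0.0, 1.0, 2.0, 3.0, 4.0]
low_z = segment_masses(z, [1.0, 1.0, 0.0, 0.0, 0.0])    # mass on [0, 2]
high_z = segment_masses(z, [0.0, 0.0, 0.0, 1.0, 1.0])   # mass on [2, 4]
mid_a = segment_masses(z, [0.0, 0.5, 0.5, 0.0, 0.0])
mid_b = segment_masses(z, [0.0, 0.0, 0.5, 0.5, 0.0])
```

Disjoint vectors such as low_z and high_z sit at the upper bound of all three distances, while identical vectors give 0.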

Parameters:
  • z – One-dimensional redshift grid shared by all bins.

  • bins – Mapping from bin index to bin distributions evaluated on z.

  • method – Pairwise metric to compute.

  • unit – Output units. If "percent", values are multiplied by 100.

  • normalize – Whether to normalize curves before comparison.

  • rtol – Relative tolerance for the normalization check.

  • atol – Absolute tolerance for the normalization check.

  • decimal_places – Rounding precision for output values.

Returns:

Nested mapping mat[i][j] giving the pairwise value between bins i and j.

Raises:

ValueError – If method is not supported.

binny.nz_tomo.between_sample_metrics.leakage_matrix(z: Any, bins: Mapping[int, Any], bin_edges: Mapping[int, tuple[float, float]] | Sequence[float] | ndarray, *, unit: Literal['fraction', 'percent'] = 'fraction', decimal_places: int | None = 2) → dict[int, dict[int, float]]

Computes a leakage/confusion matrix between bins based on nominal edges.

The leakage matrix leak[i][j] gives the fraction of the total mass in bin i that lies within the edges of bin j. The diagonal entries therefore give the completeness of each bin with respect to its nominal edges, while the off-diagonal entries give the fraction of bin i that leaks into the nominal range of another bin j.

Parameters:
  • z – One-dimensional redshift grid shared by all bins.

  • bins – Mapping from bin index to bin distributions evaluated on z.

  • bin_edges – Either a mapping from bin index to (low, high) edges, or a sequence/array of edges where bin i has edges (bin_edges[i], bin_edges[i+1]).

  • unit – Output units. If "percent", values are multiplied by 100.

  • decimal_places – Rounding precision for output values.
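The two accepted forms of bin_edges can be reduced to a single mapping before any mass is computed. The sketch below shows one way to do that, including the hi <= lo check listed under Raises; edges_to_mapping is a hypothetical helper and not part of binny.

```python
from collections.abc import Mapping

def edges_to_mapping(bin_edges):
    """Normalize edges to {bin index: (low, high)} and validate each interval."""
    if isinstance(bin_edges, Mapping):
        out = {i: (float(lo), float(hi)) for i, (lo, hi) in bin_edges.items()}
    else:
        seq = [float(e) for e in bin_edges]
        # Bin i spans consecutive entries (bin_edges[i], bin_edges[i + 1]).
        out = {i: (seq[i], seq[i + 1]) for i in range(len(seq) - 1)}
    for i, (lo, hi) in out.items():
        if hi <= lo:
            raise ValueError(f"invalid edges for bin {i}: hi <= lo")
    return out

from_sequence = edges_to_mapping([0.0, 0.5, 1.0])
from_mapping = edges_to_mapping({3: (0.2, 0.6), 7: (0.6, 1.4)})
```

A sequence of N + 1 edges yields N contiguous bins indexed from 0, whereas a mapping allows arbitrary indices and non-contiguous or overlapping intervals.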

Returns:

Nested mapping leak[i][j] giving the fraction of mass in bin i that lies within the edges of bin j.

Raises:
  • ValueError – If a bin has non-positive total mass.

  • ValueError – If bin edges are invalid (hi <= lo).

  • ValueError – If unit is not supported.

binny.nz_tomo.between_sample_metrics.overlap_pairs(z: Any, bins: Mapping[int, Any], *, threshold: float = 10.0, unit: Literal['fraction', 'percent'] = 'percent', method: str = 'min', direction: Literal['high', 'low'] = 'high', normalize: bool = False, rtol: float = 0.001, atol: float = 1e-06, decimal_places: int | None = 2) → list[tuple[int, int, float]]

Returns bin-index pairs passing a threshold in a chosen pairwise metric.

This is a convenience wrapper around bin_overlap(). It computes the pairwise matrix and returns unique off-diagonal pairs (i, j) with i < j that pass the requested threshold.

Parameters:
  • z – One-dimensional redshift grid shared by all bins.

  • bins – Mapping from bin index to bin distributions evaluated on z.

  • threshold – Threshold to apply in the units specified by unit.

  • unit – Units used for both the overlap calculation and the threshold. Accepted values are "fraction" and "percent".

  • method – Pairwise metric passed to bin_overlap().

  • direction – Whether to select values >= threshold ("high") or <= threshold ("low").

  • normalize – Passed to bin_overlap().

  • rtol – Relative tolerance for normalization check (if needed).

  • atol – Absolute tolerance for normalization check (if needed).

  • decimal_places – Rounding precision for output values.

Returns:

List of (i, j, value) tuples with i < j, sorted by decreasing value for direction="high" and increasing value for direction="low".

Raises:

ValueError – If direction is not "high" or "low".

binny.nz_tomo.between_sample_metrics.pearson_matrix(z: Any, bins: Mapping[int, Any], *, normalize: bool = False, rtol: float = 0.001, atol: float = 1e-06, decimal_places: int | None = 2) → dict[int, dict[int, float]]

Computes a trapezoid-weighted Pearson correlation matrix between bin curves.

The Pearson correlation between two curves f(z) and g(z) is defined as

corr(f, g) = cov(f, g) / (std(f) * std(g))

where the covariance and standard deviations are computed using trapezoid integration weights over the redshift grid.

Note: if normalize=True, the comparison is in terms of shape correlations, since all curves are normalized to unit integral before computing the correlation. If normalize=False, the correlation reflects both shape and amplitude similarities.

Parameters:
  • z – One-dimensional redshift grid shared by all bins.

  • bins – Mapping from bin index to bin distributions evaluated on z.

  • normalize – Whether to normalize curves before computing correlations. If True, all bins are normalized to unit integral first.

  • rtol – Relative tolerance for the normalization check.

  • atol – Absolute tolerance for the normalization check.

  • decimal_places – Rounding precision for output values.

Returns:

Nested mapping corr[i][j] giving the Pearson correlation between bins i and j.

Raises:

ValueError – If a bin has non-positive integral when normalization is checked or performed.