Skip to content

Instantly share code, notes, and snippets.

@arnavm
Created July 16, 2020 00:09
Show Gist options
  • Save arnavm/a0beb308a6656b3db2b30420297462f3 to your computer and use it in GitHub Desktop.
Save arnavm/a0beb308a6656b3db2b30420297462f3 to your computer and use it in GitHub Desktop.
A memory-efficient Rand index calculator for two segmentations
import numpy as np
from ruptures.metrics import randindex # For comparison
# Rand index for two different segmentations
def arithmeticRandIndex(intervals_A, intervals_B, overlaps):
# All three inputs are lists of contiguous interval lengths
assert np.sum(intervals_A) == np.sum(intervals_B)
total = np.sum(intervals_A)
return 1 - (np.sum(intervals_A ** 2) + np.sum(intervals_B ** 2) - 2 * np.sum(overlaps ** 2)) / (total ** 2)
# Example comparing two segmentations where values are locations of breakpoints:
# From https://ctruong.perso.math.cnrs.fr/ruptures-docs/build/html/metrics/randindex.html
A = [100, 200, 500]
B = [105, 115, 350, 400, 500]
# This calls a subroutine that creates a matrix (geometric) representation of the interval overlaps
# As such, it scales as O(n^2) where n = total size of each segment
print(randindex(A, B))
# 0.6634
# Instead, pass the lengths of segments between breakpoints,
# as well as the lengths of the intervals from the intersection of the two segmentations.
# This arithmetic approach should scale as O(n).
A_intervals = [100, 100, 300]
B_intervals = [105, 10, 235, 50, 100]
overlap_intervals = [100, 5, 10, 85, 150, 50, 100]
print(arithmeticRandIndex(A_intervals, B_intervals, overlap_intervals))
# 0.6634
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment