Created
July 16, 2020 00:09
-
-
Save arnavm/a0beb308a6656b3db2b30420297462f3 to your computer and use it in GitHub Desktop.
A memory-efficient Rand index calculator for two segmentations
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import numpy as np | |
from ruptures.metrics import randindex # For comparison | |
# Rand index for two different segmentations | |
def arithmeticRandIndex(intervals_A, intervals_B, overlaps): | |
# All three inputs are lists of contiguous interval lengths | |
assert np.sum(intervals_A) == np.sum(intervals_B) | |
total = np.sum(intervals_A) | |
return 1 - (np.sum(intervals_A ** 2) + np.sum(intervals_B ** 2) - 2 * np.sum(overlaps ** 2)) / (total ** 2) | |
# Example comparing two segmentations where values are locations of breakpoints: | |
# From https://ctruong.perso.math.cnrs.fr/ruptures-docs/build/html/metrics/randindex.html | |
A = [100, 200, 500] | |
B = [105, 115, 350, 400, 500] | |
# This calls a subroutine that creates a matrix (geometric) representation of the interval overlaps | |
# As such, it scales as O(n^2) where n = total size of each segment | |
print(randindex(A, B)) | |
# 0.6634 | |
# Instead, pass the lengths of segments between breakpoints, | |
# as well as the lengths of the intervals from the intersection of the two segmentations. | |
# This arithmetic approach should scale as O(n). | |
A_intervals = [100, 100, 300] | |
B_intervals = [105, 10, 235, 50, 100] | |
overlap_intervals = [100, 5, 10, 85, 150, 50, 100] | |
print(arithmeticRandIndex(A_intervals, B_intervals, overlap_intervals)) | |
# 0.6634 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment