Created December 8, 2016 23:32

The Setup:

I have MRI data from a group of patients with brain damage (due to stroke, head injury, etc.). For each patient I have a binary MRI image (a 3D array) in which a value of 1 at a voxel (3D pixel) indicates damage there and 0 indicates no damage. Each image is ~870k voxels.

The Goal:

Return a list of patients with damage at a given voxel.

Here's the stupid/inelegant way to do this:

voxel_id = 12345 # Index of the voxel I'm interested in
patient_matches = []
for patient in patient_list:
  patient_img_file = '/path/to/images/{}.MRI'.format(patient) # Path to this patient's image.
  img = load_image(patient_img_file) # Load the image as a NumPy ndarray.
  img_flat = img.ravel() # Flatten to a 1D vector.
  vox_val = img_flat[voxel_id] # Index the flattened vector, not the 3D array.
  if vox_val == 1:
    patient_matches.append(patient)
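The per-query cost above is dominated by loading every image from disk. One alternative (a sketch, not from the original post) is to pay that cost once and build an inverted index mapping each voxel to the patients damaged there; afterwards each lookup is a dictionary access. `load_image` and `patient_list` from the snippet above are assumed to exist, so the example below uses tiny synthetic arrays instead.

```python
from collections import defaultdict

import numpy as np

def build_voxel_index(images):
    """Map each voxel index to the list of patients with damage there.

    `images` is a dict of patient -> flattened 0/1 array.
    """
    index = defaultdict(list)
    for patient, img_flat in images.items():
        # np.flatnonzero returns the indices of the damaged (nonzero) voxels.
        for voxel_id in np.flatnonzero(img_flat):
            index[voxel_id].append(patient)
    return index

# Tiny synthetic example: 3 patients, 6-voxel "images".
images = {
    'p1': np.array([0, 1, 0, 0, 1, 0]),
    'p2': np.array([0, 1, 1, 0, 0, 0]),
    'p3': np.array([0, 0, 0, 0, 1, 0]),
}
index = build_voxel_index(images)
index[1]  # ['p1', 'p2']
index[4]  # ['p1', 'p3']
```

Since only ~5% of voxels are damaged, the index stores far fewer entries than the dense images, at the price of an up-front build pass over all patients.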

The Original Plan

  1. Load and flatten each image, so I have a 1x870k vector for each patient.
  2. Read these vectors into a DB (table='damage'), with 1 row per patient and 1 column per voxel.
  3. Query: SELECT patient_id FROM damage WHERE vox12345 = 1
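One caveat with step 2: one column per voxel won't fit in Postgres, which caps a table at roughly 1600 columns. A long-format schema with one row per (patient, damaged voxel) pair avoids that and only stores the ~5% of voxels that are damaged. A minimal sketch using SQLite (standing in for Postgres), with made-up patient IDs:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE damage (patient_id TEXT, voxel_id INTEGER)')
conn.execute('CREATE INDEX idx_voxel ON damage (voxel_id)')  # makes per-voxel queries fast

# Only damaged voxels get a row (~5% of 870k per patient).
rows = [('p1', 12345), ('p1', 99), ('p2', 12345), ('p3', 7)]
conn.executemany('INSERT INTO damage VALUES (?, ?)', rows)

matches = [r[0] for r in conn.execute(
    'SELECT patient_id FROM damage WHERE voxel_id = ? ORDER BY patient_id',
    (12345,))]
matches  # ['p1', 'p2']
```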
@wiredfool

A quick test on a VM on a slow laptop:

import numpy as np

# dense boolean storage (NumPy bools take one byte each, so not truly bitpacked).
# note row-major ordering, so extracting one feature is just a range, not 100 random accesses.
features = np.random.randint(0, 2, (870000, 100), dtype=bool)

def patients(features, foi):
    return [i for i, v in enumerate(features[foi, :]) if v]

>>> patients(features, 12345)
[0, 7, 11, 12, 13, 15, 17, 19, 21, 22, 26, 28, 31, 33, 35, 36, 38, 42, 44, 46, 47, 48, 49, 50, 53, 54, 55, 58, 60, 61, 62, 63, 70, 73, 75, 78, 79, 82, 83, 84, 85, 86, 87, 89, 90, 92, 95, 96, 98]

I'm seeing 100MB process memory usage and 1000 accesses taking 45ms.
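The ~100MB figure is consistent with the array itself, since NumPy stores one byte per bool; a quick arithmetic check:

```python
import numpy as np

# Same shape as above: 870k voxels x 100 patients, one byte per bool.
features = np.zeros((870000, 100), dtype=bool)
features.nbytes  # 87000000 bytes, i.e. ~83 MiB before interpreter overhead
```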

Rough guess at the Postgres database size: (100 patients * 870k voxels * 5% damage) rows * (2 * 4 bytes/int + row header) * 2 (for the index), or roughly the same.
