@k3erg
Forked from facepainter/FaceGrab.py
Last active March 27, 2018 08:14
Batch extract known face from video/image sequence (CNN GPU with CUDA / HoG)

FaceGrab

Extract a known face from a video or image sequence.

Uses a combination of a (precomputed) deep learning CNN model to quickly batch detect faces in video frames, then HoG face recognition with single or multiple encodings computed from known references.

Using the GPU with CUDA in this way means face detection can be batch processed over up to 128 frames at a time (VRAM-dependent). Combined with other speed optimisations (such as downsampling the process frames, early frame skipping, etc.), this means very high quality, low false positive faces can be extracted from video files or image sequences many times faster than with separate frame splitting/extraction/detection applications, or with methods that are CPU bound and only operate on individual images.
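The batching described above can be sketched in plain Python. This is an illustration rather than the script's code: `batched` and `detect_all` are hypothetical names, and `detect_batch` stands in for a batched detector such as `face_recognition.batch_face_locations`. Frames are buffered until a batch is full, each full batch goes to the detector in a single call, and any leftover frames form a final, smaller batch.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items; the last batch may be smaller."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # keep the trailing partial batch instead of dropping it
        yield batch


def detect_all(frames: Iterable[T], batch_size: int,
               detect_batch: Callable[[List[T]], List[R]]) -> List[R]:
    """Run a (hypothetical) batched detector over every frame, one call per batch."""
    results: List[R] = []
    for batch in batched(frames, batch_size):
        results.extend(detect_batch(batch))
    return results
```

For example, `list(batched(range(5), 2))` gives `[[0, 1], [2, 3], [4]]`, so the detector is called three times for five frames.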

One important caveat in this process is that the input frames/images must all have exactly the same dimensions. As this is primarily geared towards extraction from video this should not be an issue. However, it is worth bearing in mind should you get errors processing image sequences :)
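If you are feeding in an image sequence, it can help to check the frame sizes first. A minimal sketch, assuming you have already read each image's `(height, width)`, for example from `cv2.imread(p).shape[:2]`; `mismatched_frames` is a hypothetical helper, not part of the script.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int]  # (height, width)


def mismatched_frames(shapes: Dict[str, Shape]) -> List[str]:
    """Return the names of frames whose shape differs from the first frame's shape."""
    items = list(shapes.items())  # dicts keep insertion order (Python 3.7+)
    if not items:
        return []
    expected = items[0][1]
    return [name for name, shape in items[1:] if shape != expected]
```

Any names it returns are frames to resize or remove before running the sequence through the batch detector.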

Usage

Script based

Example

python facegrab.py -r ./pics/russell-crowe -i "./movies/Gladiator (2000).avi" -o ./output 

If you like, you can also watch the process in action with -do / --display_output :)

python facegrab.py -r ./pics/russell-crowe -i "./movies/Gladiator (2000).avi" -o ./output --display_output

You can get help by passing -h or --help ... you should always ask for help or rtfm :)

usage: facegrab.py [-h] -r REFERENCE -i INPUT -o OUTPUT [-bs [2-128]]
                   [-sf [0-1000]] [-xs [32-1024]] [-ps [0.5-2.0]]
                   [-s [0.1-1.0]] [-do] [-t [0.1-1.0]] [-j [1-1000]]

FaceGrab

optional arguments:
  -h, --help            show this help message and exit
  -r REFERENCE, --reference REFERENCE
                        Path to a single file e.g. ./images/someone.jpg or a
                        path to a directory of reference images e.g. ./images.
                        (You can also pass an empty directory if you wish to
                        match all faces).
  -i INPUT, --input INPUT
                        Path to a single file e.g. ./video/foo.mp4 Or a
                        path/pattern of an image sequence e.g.
                        ./frames/img_%04d.jpg (read like
                        ./frames/img_0000.jpg, ./frames/img_0001.jpg,
                        ./frames/img_0002.jpg, ...)
  -o OUTPUT, --output OUTPUT
                        Path to output directory
  -bs [2-128], --batch_size [2-128]
                        How many frames to include in each GPU processing
                        batch.
  -sf [0-1000], --skip_frames [0-1000]
                        How many frames to skip e.g. 5 means look at every 6th
  -xs [32-1024], --extract_size [32-1024]
                        Size in pixels of extracted face images (n*n).
  -ps [0.5-2.0], --padding_scale [0.5-2.0]
                        Scale the detected face-rect to add padding around it.
  -s [0.1-1.0], --scale [0.1-1.0]
                        Factor to down-sample input by for detection
                        processing. If you get too few matches try scaling by
                        half e.g. 0.5
  -do, --display_output
                        Show the detection and extraction images (slows
                        processing).
  -t [0.1-1.0], --tolerance [0.1-1.0]
                        How much "distance" between faces to consider it a
                        match. Lower is stricter. 0.6 is typical best
                        performance
  -j [1-1000], --jitter [1-1000]
                        How many times to re-sample images when calculating
                        recognition encodings. Higher is more accurate, but
                        slower. (100 is 100 times slower than 1).
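Two of the options above are easy to misread. The sketch below mirrors the script's arithmetic for --skip_frames (which 1-based frame numbers are actually looked at) and --padding_scale (how much the detected face-rect grows before extraction). `kept_frames` and `padded_rect` are hypothetical helper names, and the padding helper deliberately leaves out the clamping to the frame edges that the script also performs.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (top, right, bottom, left), as used by face_recognition


def kept_frames(total_frames: int, skip_frames: int) -> List[int]:
    """1-based frame numbers that get processed for a given skip_frames value.

    Mirrors the script: the step is skip_frames + 1 (or 1 when skip_frames <= 0)
    and frame n is kept when n % step == 0.
    """
    step = 1 if skip_frames <= 0 else skip_frames + 1
    return [n for n in range(1, total_frames + 1) if n % step == 0]


def padded_rect(rect: Rect, padding_scale: float) -> Rect:
    """Grow a face-rect about its centre by padding_scale (no clamping to the frame)."""
    top, right, bottom, left = rect
    height = bottom - top
    width = right - left
    pad_vertical = int((height * padding_scale - height) * 0.5)
    pad_horizontal = int((width * padding_scale - width) * 0.5)
    return top - pad_vertical, right + pad_horizontal, bottom + pad_vertical, left - pad_horizontal
```

So `kept_frames(12, 5)` returns `[6, 12]` (every 6th frame), and `padded_rect((100, 200, 200, 100), 1.25)` returns `(88, 212, 212, 88)`: a 100px face gains 12px on each side.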

Class based

FG = FaceGrab('./images/nick-cage-reference')
FG.process('./movies/The Wicker Man.mp4', './extracted/nick-cage-wicker-man')

Or use the ProcessSettings/RecognitionSettings to tweak things :) You can set any of their fields, skip the ones you don't care about, or leave the settings out entirely.

RS = RecognitionSettings(jitter=1)
PS = ProcessSettings(batch_size=64, extract_size=512, scale=.5)
personA = FaceGrab("someone", RS, PS)
personA.process('a1.mp4', 'a')
personA.process('a2.mp4', 'a')

Or like...

personB = FaceGrab("someone-else", process=ProcessSettings(scale=.125))
personB.process('b.mp4', 'b')
personC = FaceGrab("another-person", recognition=RecognitionSettings(tolerance=.4))
personC.process('c.mp4', 'c')
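Both settings classes are `typing.NamedTuple`s, so any field you omit keeps its default, and `FaceGrab` then quietly forces a few values into range with `_replace`. A rough sketch of that clamping for batch_size, scale and tolerance; `DemoSettings`, `clamp` and `sanitise` are hypothetical names, and plain `min`/`max` stand in for the script's `numpy.clip`.

```python
from typing import NamedTuple


class DemoSettings(NamedTuple):
    """Hypothetical stand-in combining a few ProcessSettings/RecognitionSettings fields."""
    batch_size: int = 128
    scale: float = .25
    tolerance: float = .6


def clamp(value, low, high):
    """Limit value to the closed range [low, high]."""
    return max(low, min(value, high))


def sanitise(settings: DemoSettings) -> DemoSettings:
    """Return a copy with values forced into the ranges FaceGrab accepts."""
    return settings._replace(batch_size=clamp(settings.batch_size, 2, 128),
                             scale=clamp(settings.scale, 0, 1.0),
                             tolerance=clamp(settings.tolerance, 0.1, 1))
```

In the same way, `ProcessSettings(batch_size=512)` is run with batches of 128 rather than raising an error.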

Also, if you want to ensure you have recognition encodings before you begin...

FG = FaceGrab('./images/nick-cage-reference')
if FG.reference_count:
    FG.process('./movies/The Wicker Man.mp4', './extracted/nick-cage-wicker-man')

Help!

Stuff that might happen that isn't what you wanted or expected...oh cruel world!

OOM :( - Memory issues

Very roughly speaking, VRAM use grows with process.batch_size * (frame width * process.scale) * (frame height * process.scale). As long as you have 2GB+ VRAM and you play with the settings you should be golden :)

The two key things to try are:

  1. Reduce the process.batch_size - note the whole thing will take longer!
  2. Decrease the process.scale e.g. 0.125 (1/8) - you may well get fewer face detections

You could also try re-encoding the video to a lower resolution, but that is cheating and punishable by...nothing.
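Because process.scale is applied to both the width and the height, the number of pixels the detector sees per batch grows with batch_size * scale squared, which is why lowering the scale helps so much. A back-of-the-envelope sketch that counts pixels only (it is not a VRAM estimate, since the CNN's own memory use is not modelled); `pixels_per_batch` is a hypothetical name.

```python
def pixels_per_batch(width: int, height: int, scale: float, batch_size: int) -> int:
    """Approximate number of pixels handed to the detector per batch after down-sampling."""
    scaled_width = round(width * scale)    # each dimension is scaled separately,
    scaled_height = round(height * scale)  # so the area shrinks by scale ** 2
    return scaled_width * scaled_height * batch_size
```

For a 1920x1080 video, `pixels_per_batch(1920, 1080, 0.25, 128)` is 16,588,800; halving the scale to 0.125 cuts that to a quarter (4,147,200), while halving the batch size only halves it.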

If you are getting too many false positives (extracted images of the wrong face/not faces):

  1. Use a more varied, higher-quality, more representative range of reference images (ideally ones that look like the person in the input)
  2. Increase the recognition.jitter so that each encoding/check is done using a higher number of resamples - note this will increase the processing time.
  3. Decrease the recognition.tolerance so that each recognition is stricter e.g. 0.4

If you are getting too few matches (missing lots of good images from input):

  1. Use a more varied, higher-quality, more representative range of reference images (ideally ones that look like the person in the input)
  2. Increase the recognition.tolerance so that each recognition is less strict e.g. 0.8
  3. Decrease the recognition.jitter so that each recognition is done with fewer resamples (less accurate)
  4. Decrease the process.skip_frames so that more of the input is processed (this might result in very similar extracted images)
  5. Increase the process.scale e.g. 0.5 (1/2) - bearing in mind you may need to reduce the batch_size accordingly
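Tolerance is a threshold on the distance between face encodings: face_recognition's `compare_faces` counts a reference as a match when the Euclidean distance between the two encodings is at most the tolerance, so a lower tolerance is stricter. A stdlib-only sketch of that rule, using short made-up vectors instead of real 128-dimensional encodings; `matches_any` is a hypothetical name. (The script itself skips this check and accepts every face when there are no references.)

```python
import math
from typing import Sequence


def matches_any(references: Sequence[Sequence[float]], candidate: Sequence[float],
                tolerance: float = 0.6) -> bool:
    """True if any reference encoding lies within `tolerance` (Euclidean) of candidate."""
    # math.dist needs Python 3.8+
    return any(math.dist(ref, candidate) <= tolerance for ref in references)
```

With references at `(0, 0)` and `(1, 1)`, the candidate `(0.3, 0.4)` is 0.5 away from the first one: a match at the default 0.6, but not at a stricter 0.4.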

Built using

YMMV - pretty sure it would work just as well with CUDA 9 / cuDNN 7 / etc - but personally I could not get dlib to build with CUDA support against v9/9.1 :(

set COMPILER="C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\amd64"
set PATH=%COMPILER%;%PATH%
git clone https://github.com/davisking/dlib.git
cd dlib
mkdir build
cd build
cmake -G "Visual Studio 15 Win64" -T host=x64 -DUSE_AVX_INSTRUCTIONS=1 -DDLIB_USE_CUDA=1 -DCUDA_HOST_COMPILER=%COMPILER% ..
cmake --build .
cd ..
python setup.py install --yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA
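After installing, it is worth confirming that the Python dlib module really was built with CUDA before starting a long run. A small sketch, assuming your dlib build exposes `dlib.DLIB_USE_CUDA` and `dlib.cuda.get_num_devices()` (recent builds do, but treat that as an assumption for yours); `dlib_cuda_status` is a hypothetical name, and it reports False rather than failing if dlib cannot be imported.

```python
def dlib_cuda_status() -> dict:
    """Report whether dlib imports, was built with CUDA, and how many CUDA devices it sees."""
    try:
        import dlib  # imported lazily so the check itself runs even without dlib
    except ImportError:
        return {"dlib": False, "cuda": False, "devices": 0}
    built_with_cuda = bool(getattr(dlib, "DLIB_USE_CUDA", False))
    cuda = getattr(dlib, "cuda", None)  # assumed submodule; absent in non-CUDA builds
    devices = int(cuda.get_num_devices()) if built_with_cuda and cuda is not None else 0
    return {"dlib": True, "cuda": built_with_cuda, "devices": devices}


if __name__ == "__main__":
    print(dlib_cuda_status())
```

If `cuda` comes back False, the batch detection will not be running on the GPU and the rebuild above is the place to look.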
'''
Extract a known face from a video.
Uses a combination of a deep learning CNN model to batch detect faces
in video frames, or a sequence of images, on the GPU with CUDA, and HoG to compare
the detected faces with a computed reference set of face encodings.
'''
from os import path, listdir
from typing import NamedTuple
import cv2
import numpy
import face_recognition
from tqdm import tqdm


class RecognitionSettings(NamedTuple):
    '''
    Face recognition settings
    :param float tolerance: How much "distance" between faces to consider it a match.
    :param int jitter: How many times to re-sample images when calculating encodings.
    '''
    tolerance: float = .6
    jitter: int = 10


class ProcessSettings(NamedTuple):
    '''
    Video process settings
    :param int batch_size: How many frames to include in each GPU processing batch.
    :param int skip_frames: How many frames to skip e.g. 5 means look at every 6th
    :param int extract_size: Size in pixels of extracted face images (n*n).
    :param float scale: Amount to down-sample input by for detection processing.
    :param bool display_output: Show the detection and extraction images in process.
    :param float padding_scale: Factor to scale the detected face-rect by to add padding around it.
    '''
    batch_size: int = 128
    skip_frames: int = 6
    extract_size: int = 256
    scale: float = .25
    display_output: bool = False
    padding_scale: float = 1.25


class FaceGrab(object):
    '''
    It sure grabs faces! (tm)
    :param str reference: Path to a reference image, a directory of reference images, or '*' to match any face
    :param RecognitionSettings recognition: Face recognition settings
    :param ProcessSettings process: Video process settings
    '''

    def __init__(self, reference, recognition=None, process=None):
        if recognition is None:
            recognition = RecognitionSettings()
        if process is None:
            process = ProcessSettings()
        # internally skip_frames is stored as the step between processed frames
        skip_sanity = 1 if process.skip_frames <= 0 else process.skip_frames + 1
        self._ps = process._replace(batch_size=numpy.clip(process.batch_size, 2, 128),
                                    skip_frames=skip_sanity,
                                    scale=numpy.clip(process.scale, 0, 1.0))
        self._rs = recognition._replace(tolerance=numpy.clip(recognition.tolerance, 0.1, 1))
        self._process_frames = []
        self._original_frames = []
        self._reference_encodings = []
        self._total_extracted = 0
        self.__check_reference(reference)
        print('Found {} face references'.format(self.reference_count))

    @property
    def reference_count(self):
        '''Total currently loaded reference encodings for recognition'''
        return len(self._reference_encodings)

    @staticmethod
    def __downsample(image, scale):
        '''Downscale and convert image for faster detection processing'''
        sampled = cv2.resize(image, (0, 0), fx=scale, fy=scale) if scale > 0 else image
        return sampled[:, :, ::-1]  # BGR->RGB

    @staticmethod
    def __extract(image, face_location, scale):
        '''Upscale coordinates and extract face'''
        # use the exact factor: int(1 / scale) is wrong for scales such as 0.3
        factor = 1 / scale if scale > 0 else 1
        top, right, bottom, left = (int(round(coord * factor)) for coord in face_location)
        return image[top:bottom, left:right]

    @staticmethod
    def __format_name(output_path, name):
        '''Builds the output file path for an extracted face'''
        return path.join(output_path, '{}.jpg'.format(name))

    @staticmethod
    def __file_count(directory):
        '''Returns the number of files in a directory'''
        return len([item for item in listdir(directory) if path.isfile(path.join(directory, item))])

    def __check_reference(self, reference):
        '''Checks if the reference is a wild-card/directory/file and looks for encodings'''
        if reference == '*':
            return
        if path.isdir(reference):
            with tqdm(total=self.__file_count(reference), unit='files') as progress:
                for file in listdir(reference):
                    if path.isfile(path.join(reference, file)):
                        progress.update(1)
                        progress.set_description('Checking reference: {}'.format(file))
                        self.__parse_encoding(path.join(reference, file))
            return
        if path.isfile(reference):
            self.__parse_encoding(reference)
            return
        raise ValueError('Invalid reference: {}'.format(reference))

    def __parse_encoding(self, image_path):
        '''Adds the first face encoding in an image to the reference encodings'''
        image = face_recognition.load_image_file(image_path)
        encodings = face_recognition.face_encodings(image, None, self._rs.jitter)
        if encodings:
            self._reference_encodings.append(encodings[0])

    def __recognise(self, face):
        '''Checks a given face against any known reference encodings.
        If no reference encodings are present any face is classed as recognised.'''
        if not self.reference_count:
            return True
        if face.size == 0:
            return False
        # TODO: is [(known_face_location),scaled(unknown_face_location)] faster than None?
        # location = face_recognition.face_locations(face, 0)
        # frames come from OpenCV in BGR order, but the reference encodings were made from RGB
        rgb_face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
        encodings = face_recognition.face_encodings(rgb_face, None, self._rs.jitter)
        if encodings:
            return numpy.any(face_recognition.compare_faces(self._reference_encodings,
                                                            encodings[0],
                                                            self._rs.tolerance))
        return False

    def __reset_frames(self):
        self._process_frames = []
        self._original_frames = []

    def __get_face_locations(self):
        '''Get the batch face locations and frame number'''
        batch = face_recognition.batch_face_locations(self._process_frames, 1, self._ps.batch_size)
        for index, locations in enumerate(batch):
            yield (index, locations)

    def __get_faces(self, image, face_locations):
        '''Get the faces from a set of locations'''
        for location in face_locations:
            yield self.__extract(image, location, self._ps.scale)

    def __save_extract(self, face, file_path):
        '''Saves the face to file_path at the set extract size'''
        image = cv2.resize(face, (self._ps.extract_size, self._ps.extract_size))
        cv2.imwrite(file_path, image)
        self._total_extracted += 1
        if self._ps.display_output:
            cv2.imshow('extracted', image)
            cv2.waitKey(delay=1)

    def __get_frame(self, sequence):
        '''Grabs, decodes and yields each frame to process along with its number.'''
        frame_count = 0
        while sequence.isOpened():
            ret, frame = sequence.read()
            if not ret:
                break
            frame_count += 1
            if self.__skip_frame(frame_count):
                continue
            yield (frame, frame_count)

    def __draw_detection(self, idx, locations):
        '''Draws the face locations on a copy of the (down-sampled) process frame'''
        # RGB->BGR for OpenCV display; copied so drawing never touches the batch frames
        frame = self._process_frames[idx][:, :, ::-1].copy()
        for (top, right, bottom, left) in locations:
            cv2.rectangle(frame, (left, top), (right, bottom), (255, 0, 0), 1)
        cv2.imshow('process', frame)
        cv2.waitKey(delay=1)

    def __do_batch(self, batch_count, output_path):
        '''Handles each batch of detected faces, performing recognition on each'''
        with tqdm(total=len(self._process_frames), unit='frame') as progress:
            extracted = 0
            # each set of face locations in the batch
            for idx, locations in self.__get_face_locations():
                progress.update(1)
                progress.set_description('Batch #{} (recognised {})'.format(batch_count, extracted))
                # display output...
                if self._ps.display_output:
                    self.__draw_detection(idx, locations)
                # locations are in process-frame coordinates, so clamp against that size
                process_height, process_width = self._process_frames[idx].shape[:2]
                max_right = process_width - 1
                max_bottom = process_height - 1
                # NB: recognition on original image
                for face_idx, face in enumerate(self.__get_faces(self._original_frames[idx], locations)):
                    if not self.__recognise(face):
                        continue
                    extracted += 1
                    name = self.__format_name(output_path, self._total_extracted)
                    # Extract a new image with padding around the detected face-rect
                    # by scaling the face-rect by padding_scale.
                    # Make sure the scaled face-rect stays within the frame.
                    top, right, bottom, left = locations[face_idx]
                    width = right - left
                    height = bottom - top
                    padding_horizontal = int((width * self._ps.padding_scale - width) * 0.5)
                    padding_vertical = int((height * self._ps.padding_scale - height) * 0.5)
                    padded_top = top - padding_vertical
                    if padded_top < 0:
                        padded_top = 0
                        padded_bottom = int(height * self._ps.padding_scale)
                    else:
                        padded_bottom = bottom + padding_vertical
                    if padded_bottom > max_bottom:
                        padded_bottom = max_bottom
                        padded_top = max_bottom - int(height * self._ps.padding_scale)
                    padded_left = left - padding_horizontal
                    if padded_left < 0:
                        padded_left = 0
                        padded_right = int(width * self._ps.padding_scale)
                    else:
                        padded_right = right + padding_horizontal
                    if padded_right > max_right:
                        padded_right = max_right
                        padded_left = max_right - int(width * self._ps.padding_scale)
                    if padded_left > -1 and padded_top > -1:
                        padded_face = self.__extract(self._original_frames[idx],
                                                     (padded_top, padded_right, padded_bottom, padded_left),
                                                     self._ps.scale)
                        self.__save_extract(padded_face, name)
                    # image v.unlikely to have target face more than once
                    # however this only holds true if we have a reference
                    if self.reference_count:
                        break

    def __skip_frame(self, number):
        '''We want every nth frame if skipping'''
        return self._ps.skip_frames > 0 and number % self._ps.skip_frames

    def __batch_builder(self, output_path, sequence, total_frames):
        '''Splits the frames into batches and keeps score'''
        with tqdm(total=total_frames, unit='frame') as progress:
            batch_count = 0
            for frame, frame_count in self.__get_frame(sequence):
                progress.update(frame_count - progress.n)
                progress.set_description('Total (extracted {})'.format(self._total_extracted))
                self._process_frames.append(self.__downsample(frame, self._ps.scale))
                self._original_frames.append(frame)
                if len(self._process_frames) == self._ps.batch_size:
                    batch_count += 1
                    self.__do_batch(batch_count, output_path)
                    self.__reset_frames()
            # process any remaining frames as a final, partially filled batch
            if self._process_frames:
                batch_count += 1
                self.__do_batch(batch_count, output_path)
                self.__reset_frames()

    def process(self, input_path, output_path='.'):
        '''
        Extracts known faces from the input source to the output.
        :param str input_path: Path to video or image sequence pattern
        :param str output_path: Path to output directory
        '''
        self._total_extracted = 0
        sequence = cv2.VideoCapture(input_path)
        if not sequence.isOpened():
            raise ValueError('Could not open input: {}'.format(input_path))
        self.sequence_width = int(sequence.get(cv2.CAP_PROP_FRAME_WIDTH))
        self.sequence_height = int(sequence.get(cv2.CAP_PROP_FRAME_HEIGHT))
        total_frames = int(sequence.get(cv2.CAP_PROP_FRAME_COUNT))
        total_work = total_frames // self._ps.skip_frames
        # ceiling division: the last batch may be only partially filled
        total_batches = -(-total_work // self._ps.batch_size)
        print('Processing {} ({} scale)'.format(input_path, self._ps.scale))
        print('References {} ({} jitter {} tolerance)'.format(self.reference_count,
                                                               self._rs.jitter,
                                                               self._rs.tolerance))
        print('Checking {} of {} frames in {} batches of {}'.format(total_work,
                                                                    total_frames,
                                                                    total_batches,
                                                                    self._ps.batch_size))
        try:
            self.__batch_builder(output_path, sequence, total_frames)
        finally:
            sequence.release()


if __name__ == '__main__':
    import argparse

    class Range(object):
        '''Restricted range for float arguments'''
        def __init__(self, start, end):
            self.start = start
            self.end = end

        def __eq__(self, other):
            return self.start <= other <= self.end

        def __repr__(self):
            # shown in argparse's "invalid choice" error message
            return '{}-{}'.format(self.start, self.end)

    AP = argparse.ArgumentParser(description='''FaceGrab''')
    # Required settings
    AP.add_argument('-r', '--reference', type=str, required=True,
                    help=r'''Path to a single file e.g. ./images/someone.jpg
                    or a path to a directory of reference images e.g. ./images.
                    (You can also pass an empty directory if you wish to match all faces).''')
    AP.add_argument('-i', '--input', type=str, required=True,
                    help=r'''Path to a single file e.g. ./video/foo.mp4
                    Or a path/pattern of an image sequence e.g. ./frames/img_%%04d.jpg
                    (read like ./frames/img_0000.jpg, ./frames/img_0001.jpg, ./frames/img_0002.jpg, ...)''')
    AP.add_argument('-o', '--output', type=str, required=True,
                    help='''Path to output directory''')
    # Optional process settings (range() excludes its end value, hence e.g. range(2, 129))
    AP.add_argument('-bs', '--batch_size', type=int, default=128, choices=range(2, 129),
                    metavar="[2-128]",
                    help='''How many frames to include in each GPU processing batch.''')
    AP.add_argument('-sf', '--skip_frames', type=int, default=6, choices=range(0, 1001),
                    metavar="[0-1000]",
                    help='''How many frames to skip e.g. 5 means look at every 6th''')
    AP.add_argument('-xs', '--extract_size', type=int, default=256, choices=range(32, 1025),
                    metavar="[32-1024]",
                    help='''Size in pixels of extracted face images (n*n).''')
    AP.add_argument('-ps', '--padding_scale', type=float, default=1.25, choices=[Range(0.5, 2.0)],
                    metavar="[0.5-2.0]",
                    help='''Scale the detected face-rect to add padding around it.''')
    AP.add_argument('-s', '--scale', type=float, default=0.25, choices=[Range(0.1, 1.0)],
                    metavar="[0.1-1.0]",
                    help='''Factor to down-sample input by for detection processing.
                    If you get too few matches try scaling by half e.g. 0.5''')
    AP.add_argument('-do', '--display_output', action='store_true',
                    help='''Show the detection and extraction images (slows processing).''')
    # Optional recognition settings
    AP.add_argument('-t', '--tolerance', type=float, default=0.6, choices=[Range(0.1, 1.0)],
                    metavar="[0.1-1.0]",
                    help='''How much "distance" between faces to consider it a match.
                    Lower is stricter. 0.6 is typical best performance''')
    AP.add_argument('-j', '--jitter', type=int, default=5, choices=range(1, 1001),
                    metavar="[1-1000]",
                    help='''How many times to re-sample images when
                    calculating recognition encodings. Higher is more accurate, but slower.
                    (100 is 100 times slower than 1).''')
    ARGS = AP.parse_args()
    RS = RecognitionSettings(tolerance=ARGS.tolerance, jitter=ARGS.jitter)
    PS = ProcessSettings(batch_size=ARGS.batch_size,
                         skip_frames=ARGS.skip_frames,
                         extract_size=ARGS.extract_size,
                         padding_scale=ARGS.padding_scale,
                         scale=ARGS.scale,
                         display_output=ARGS.display_output)
    FG = FaceGrab(ARGS.reference, RS, PS)
    FG.process(ARGS.input, ARGS.output)
opencv-python==3.4.0.12
face-recognition==1.2.1
numpy==1.14.0
tqdm==4.19.5
dlib==19.9.99