Skip to content

Instantly share code, notes, and snippets.

View biglioncoding's full-sized avatar

biglioncoding

View GitHub Profile
@biglioncoding
biglioncoding / ocrpdf.sh
Created August 12, 2017 18:53 — forked from wcaleb/ocrpdf.sh
Take a PDF, OCR it, and add OCR Text as background layer to original PDF to make it searchable
#!/bin/sh
# Take a PDF, OCR it, and add OCR Text as background layer to original PDF to make it searchable.
# Hacked together using tips from these websites:
# http://www.jlaundry.com/2012/ocr-a-scanned-pdf-with-tesseract/
# http://askubuntu.com/questions/27097/how-to-print-a-regular-file-to-pdf-from-command-line
# Dependencies: pdftk, tesseract, imagemagick, enscript, ps2pdf
# Would be nice to use hocr2pdf instead so that the text lines up with the PDF image.
# http://www.exactcode.com/site/open_source/exactimage/hocr2pdf/