Created
November 7, 2023 10:22
Basic options for Tesseract OCR
$ tesseract --help-extra
Usage:
  tesseract --help | --help-extra | --help-psm | --help-oem | --version
  tesseract --list-langs [--tessdata-dir PATH]
  tesseract --print-parameters [options...] [configfile...]
  tesseract imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]

OCR options:
  --tessdata-dir PATH   Specify the location of tessdata path.
  --user-words PATH     Specify the location of user words file.
  --user-patterns PATH  Specify the location of user patterns file.
  --dpi VALUE           Specify DPI for input image.
  -l LANG[+LANG]        Specify language(s) used for OCR.
  -c VAR=VALUE          Set value for config variables.
                        Multiple -c arguments are allowed.
  --psm NUM             Specify page segmentation mode.
  --oem NUM             Specify OCR Engine mode.
NOTE: These options must occur before any configfile.

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.

OCR Engine modes: (see https://github.com/tesseract-ocr/tesseract/wiki#linux)
  0    Legacy engine only.
  1    Neural nets LSTM engine only.
  2    Legacy + LSTM engines.
  3    Default, based on what is available.

Single options:
  -h, --help            Show minimal help message.
  --help-extra          Show extra help for advanced users.
  --help-psm            Show page segmentation modes.
  --help-oem            Show OCR Engine modes.
  -v, --version         Show version information.
  --list-langs          List available languages for tesseract engine.
  --print-parameters    Print tesseract parameters.
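
The options listed above can be combined on one command line. The following is a minimal sketch, not a recommended configuration: the file name `scan.png`, the helper name `ocr_cmd`, and the chosen values (`eng`, `--psm 6`, `--oem 1`, `--dpi 300`) are assumptions picked for illustration. `preserve_interword_spaces` is one example of a config variable that can be set with `-c`. The options follow the image and output base and come before any configfile, as the NOTE in the help text requires.

```shell
#!/bin/sh
# Hypothetical helper: print a tesseract command line for one image.
#   $1 = input image path (assumed to contain no spaces)
#   $2 = output base name, or "stdout" to print the recognized text
ocr_cmd() {
    printf 'tesseract %s %s -l eng --psm 6 --oem 1 --dpi 300 -c preserve_interword_spaces=1\n' "$1" "$2"
}

# Build the command and show it so it can be inspected first.
cmd=$(ocr_cmd scan.png stdout)
echo "$cmd"

# Run it only if tesseract is installed and the example image exists.
if command -v tesseract >/dev/null 2>&1 && [ -f scan.png ]; then
    # Unquoted on purpose: the shell splits the string into arguments.
    $cmd
fi
```

Swapping `--psm 6` for `--psm 7` treats the image as a single text line, and `--psm 11` looks for sparse text, per the page segmentation table above. Mode `--oem 1` needs language data that includes the LSTM model; `--oem 3` lets Tesseract choose based on what is available.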