Last active
November 23, 2017 11:24
Extract PDF info (page count and page size) from a folder and all its subfolders into a CSV file
| #!/bin/bash | |
| # Small script that uses pdfinfo to recursively extract info about PDFs into a CSV file. | |
| # pdfinfo must be available (e.g. brew install xpdf on OS X). | |
| # [usage]: ./pdf_folders_to_csv.sh /path/to/folder/of/pdfs [csv_file_name] | |
| # Output file: the second argument, or a default name when it is omitted. | |
| CSV_FILE="${2:-pdf_folders_to_csv.csv}" | |
| while IFS= read -r line | |
| do | |
| # Run pdfinfo once per file and reuse its output; paths are quoted so spaces are safe. | |
| INFO=$(pdfinfo "$line" 2>/dev/null) | |
| NOMBRE_PAGE=$(printf '%s\n' "$INFO" | awk '/^Pages:/ {print $2}') | |
| FORMAT=$(printf '%s\n' "$INFO" | grep '^Page size' | cut -d ':' -f2 | sed 's/^ *//') | |
| # Folder column: drop a leading "./" (or a lone ".") and turn "/" separators into spaces. | |
| DOSSIER=$(dirname "$line" | sed -e 's,^\.$,,' -e 's,^\./,,' -e 's,/, ,g') | |
| FICHIER=$(basename "$line") | |
| echo "${DOSSIER};${FICHIER};${NOMBRE_PAGE};${FORMAT}" >> "${CSV_FILE}" | |
| done < <(find "$1" -type f -iname '*.pdf') |
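To see how one CSV row is assembled without needing any PDFs on disk, here is a minimal sketch that parses a canned block of text shaped like pdfinfo output. The sample values, the example path, and the helper names `parse_pages`, `parse_page_size`, and `folder_column` are assumptions made for illustration; they are not part of the gist. The folder helper strips only a leading `./` rather than every dot, which is one possible reading of what the folder column is meant to hold.

```shell
#!/bin/bash
# Hypothetical sample shaped like pdfinfo output (values are made up).
SAMPLE_INFO='Title:          Example
Pages:          12
Page size:      595 x 842 pts (A4)'

# Page count: second whitespace-separated field of the "Pages:" line.
parse_pages() {
  printf '%s\n' "$1" | awk '/^Pages:/ {print $2}'
}

# Page size: text after the first colon, with leading spaces trimmed.
parse_page_size() {
  printf '%s\n' "$1" | grep '^Page size' | cut -d ':' -f2 | sed 's/^ *//'
}

# Folder column: directory part of the path, leading "./" dropped,
# "/" separators replaced by spaces.
folder_column() {
  dirname "$1" | sed -e 's,^\./,,' -e 's,/, ,g'
}

# Assemble one semicolon-separated row for a hypothetical file path.
echo "$(folder_column './reports/2017/summary.pdf');summary.pdf;$(parse_pages "$SAMPLE_INFO");$(parse_page_size "$SAMPLE_INFO")"
# prints: reports 2017;summary.pdf;12;595 x 842 pts (A4)
```

Anchoring the patterns at the start of the line (`^Pages:`, `^Page size`) keeps a document title that happens to contain the word "Pages" from leaking into the page-count column.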