@Nani-o
Last active November 23, 2017 11:24
Extract PDF info from a folder and all its subfolders into a csv file
#!/bin/bash
# Small script that uses pdfinfo to recursively extract information about PDFs into a CSV file.
# pdfinfo must be available (e.g. brew install xpdf on OS X).
# [usage]: ./pdf_folders_to_csv.sh /path/to/folder/of/pdfs [csv_file_name]
# Use the second argument as the CSV file name, or fall back to a default.
CSV_FILE="${2:-pdf_folders_to_csv.csv}"
while IFS= read -r line
do
# Quote "$line" so paths containing spaces reach each tool as a single argument.
NOMBRE_PAGE=$(pdfinfo "$line" | grep '^Pages:' | awk '{print $2}')
FORMAT=$(pdfinfo "$line" | grep '^Page size:' | cut -d ':' -f2 | sed 's/^ *//')
DOSSIER=$(dirname "$line" | tr -d '.' | sed 's/\// /g')
FICHIER=$(basename "$line")
echo "${DOSSIER};${FICHIER};${NOMBRE_PAGE};${FORMAT}" >> "${CSV_FILE}"
done < <(find "$1" -type f -iname '*.pdf')  # process substitution: no iteration when find matches nothing
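
The loop above runs pdfinfo twice per file, once for the page count and once for the page size. A possible variant, sketched here under the assumption that pdfinfo prints one `Key: value` pair per line, reads both fields in a single pass. The function name `parse_pdfinfo` is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): reads pdfinfo-style
# "Key: value" lines on stdin and prints "pages;page size" on one line.
parse_pdfinfo() {
    awk -F': *' '
        $1 == "Pages"     { pages = $2 }   # page count
        $1 == "Page size" { size = $2 }    # page dimensions and format name
        END               { print pages ";" size }
    '
}
```

Inside the loop it could be called as `pdfinfo "$line" | parse_pdfinfo`, with the result appended after `${DOSSIER};${FICHIER};` in the CSV line. Values that themselves contain `: ` would be cut at that point, which should not matter for these two fields.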