@MalteT
Last active May 11, 2020 14:19
Find paths with the same file hash in HWP
#!/bin/bash
IGNORE_HASHES=f6b0091b7e45d7d7a322157a88549b28
# Unzip all files
for user_dir in *; do
    if [ -d "$user_dir" ]; then
        # Enter the directory; abort if that fails so nothing is extracted in the wrong place
        cd "$user_dir" || exit 1
        # Extract every archive, overwriting existing files; abort on the first failed extraction
        for file in *; do
            if [[ "$file" == *.zip ]]; then
                unzip -o "$file" > /dev/null
            elif [[ "$file" == *.rar ]]; then
                unrar e -o+ "$file" > /dev/null
            elif [[ "$file" == *.tar.gz ]]; then
                tar xf "$file" --overwrite > /dev/null
            elif [[ "$file" == *.7z ]]; then
                7z e -y "$file" > /dev/null
            else
                echo "Unknown file type on $file" >&2
            fi || exit 1
        done
        # Leave the directory
        cd ..
    fi
done
# Remove all garbage files
find . -iname \*.dpl -exec rm '{}' \;
find . -iname \*.dat -exec rm '{}' \;
# -prune keeps find from descending into a directory that rm has just removed
find . -type d -name __MACOSX -prune -exec rm -r '{}' \;
# Prehash all files for performance reasons, skipping this script's own output files
find . -type f ! -path ./all_hashes.md5 ! -path ./dub_hashes.txt -exec md5sum '{}' + | rg -v -F "$IGNORE_HASHES" > all_hashes.md5
# Collect the checksums that occur more than once, least frequent first
awk '{ print $1 }' all_hashes.md5 \
| sort \
| uniq -c \
| sort -n \
| awk '$1 > 1 { print $2 }' > dub_hashes.txt
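# Print the paths sharing each duplicated checksum, one group per separator line
while read -r sum; do
    echo ====================
    # md5sum prints a 32-character hash plus a 2-character separator,
    # so the path starts at column 35 (this keeps paths containing spaces intact)
    rg "^$sum" all_hashes.md5 \
    | cut -c 35-
done < dub_hashes.txt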
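
The final loop rescans all_hashes.md5 once for every duplicated checksum. As a rough sketch of an alternative, the grouping could be done in a single awk pass. `group_duplicates` is a hypothetical helper name that is not part of the script above, and it assumes its input is in md5sum's default `<hash>  <path>` format. Groups are printed in whatever order awk iterates its array, so the ordering is unspecified.

```shell
# Sketch only: group_duplicates is a hypothetical helper, not part of the original script.
# Usage: group_duplicates <file in md5sum output format>
group_duplicates() {
    awk '
        {
            hash = $1
            # Drop the hash and the 2-character separator; the rest is the path,
            # which may itself contain spaces
            path = substr($0, length(hash) + 3)
            count[hash]++
            paths[hash] = paths[hash] path "\n"
        }
        END {
            # Print only the checksums that were seen more than once
            for (hash in count) {
                if (count[hash] > 1) {
                    print "===================="
                    printf "%s", paths[hash]
                }
            }
        }
    ' "$1"
}
```

Calling `group_duplicates all_hashes.md5` after the prehash step would print the same kind of separator-delimited groups as the loop above, without needing dub_hashes.txt.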