Created
September 3, 2022 07:20
An attempt to rewrite the metadata for media downloaded via Google Takeout
import re
import os
import sys
import json
from pathlib import Path
#from tinytag import TinyTag
#import ffmpeg
#from PIL import Image
#from PIL.ExifTags import TAGS
import filedate

PHOTOS = [
    ".JPEG",
    ".JPG",
    ".PNG",
    ".HEIC",
    ".GIF"
]
VIDEOS = [
    ".MOV",
    ".MP4"
]

if __name__ == "__main__":
    takeout_folder = sys.argv[1]
    FILE_META = {}

    # Pass 1: map each media title recorded in a JSON sidecar to its taken time.
    for root, _, files in os.walk(takeout_folder):
        for media in files:
            if media.endswith(".json"):
                with open(Path(root, media)) as f:
                    meta = json.load(f)
                    FILE_META[meta["title"]] = meta["photoTakenTime"]["formatted"]

    # Pass 2: rename sidecars such as "name.jpg(1).json" to "name(1).jpg.json"
    # so that they line up with the duplicate media file "name(1).jpg".
    for root, _, files in os.walk(takeout_folder):
        for media in files:
            if m := re.match(r".+\..+(\(\d\))\.json", media):
                splt = media.split(".")
                os.rename(Path(root, media), Path(root, splt[0] + m.group(1) + "." + splt[1][:-3] + "." + splt[2]))

    # Pass 3: set the creation time of every photo and video, first from the
    # title lookup, then from the "<media>.json" sidecar next to the file.
    for root, _, files in os.walk(takeout_folder):
        for media in files:
            ext = os.path.splitext(media)[1].upper()
            if ext in PHOTOS or ext in VIDEOS:
                try:
                    filedate.File(Path(root, media)).set(created=FILE_META[media])
                except KeyError as e1:
                    try:
                        with open(Path(root, f"{media}.json")) as f:
                            filedate.File(Path(root, media)).set(created=json.load(f)["photoTakenTime"]["formatted"])
                    except FileNotFoundError as e2:
                        print(f"cant process: {media}", e2)

    # Earlier attempts, kept for reference as inert string literals.
    """
    if not media.endswith(".json"):
        ext = os.path.splitext(media)[1].upper()
        media_path, meta_path = Path(root, media), Path(root, f"{media}.json")
        if ext in VIDEOS or ext in PHOTOS:
            with open(meta_path, "r") as f:
                meta = json.load(f)
            filedate.File(media_path).set(created = meta["photoTakenTime"]["formatted"])
    """
    """
    if ext in PHOTOS:
        img = Image.open(path)
        print(img.getexif())
        exit()
    elif ext in VIDEOS:
        #vid = ffmpeg.probe(path)
        " "[0]
    else:
        print(f"Unhandled file: {path}")
    """
ffmpeg-python==0.2.0
filedate==2.0
future==0.18.2
Pillow==9.2.0
python-dateutil==2.8.2
six==1.16.0
tinytag==1.8.1
Little update:
- Some of the JSON metadata file names are trimmed, apparently because of file naming constraints, though the real reason is still a mystery. That said, the majority of the file name does exist within the JSON file name, so it usually becomes a matter of finding the closest one. Alternatively, the JSON should 'in theory' have the proper file name inside the metadata file under the 'title' value, though I have observed that this is sometimes also mangled for no particular reason.
- Some jpg files are named jpeg in their accompanying metadata files and vice versa, which is incredibly infuriating and makes parsing a pain.
- Live photos are uploaded as HEIC and MP4 pairs, where the MP4 is the video and the HEIC is Apple's photo format (I think? I'm not 100% sure). This explains the missing metadata for one half of the pair, since Google exports metadata for only one of them, I believe the HEIC alone.
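
A rough sketch of how the three quirks above could be handled when looking up a sidecar, not something the script currently does. The names `candidate_sidecars`, `find_sidecar`, and `min_prefix` are made up for illustration, and the `<stem>.HEIC.json` name for a live photo's sidecar is an assumption that simply follows the `<media>.json` pattern the script already relies on. For trimmed names it swaps "closest match" for a stricter prefix check, so that `IMG_0001.jpg` never picks up the sidecar of `IMG_0002.jpg`.

```python
from pathlib import Path


def candidate_sidecars(media_name):
    """Return the sidecar names worth trying for one media file, best guess first."""
    path = Path(media_name)
    stem, ext = path.stem, path.suffix.lower()
    names = [f"{media_name}.json"]
    # jpg/jpeg mismatch: the sidecar may carry the other spelling of the extension.
    swapped = {".jpg": ".jpeg", ".jpeg": ".jpg"}.get(ext)
    if swapped:
        names.append(f"{stem}{swapped}.json")
    # Live photo: the MP4 half may only be described by the HEIC sidecar (assumption).
    if ext == ".mp4":
        names.append(f"{stem}.HEIC.json")
    return names


def find_sidecar(media_name, json_names, min_prefix=8):
    """Pick a sidecar for media_name from json_names, or return None."""
    by_lower = {name.lower(): name for name in json_names}
    # 1. Exact candidates, compared case-insensitively.
    for candidate in candidate_sidecars(media_name):
        hit = by_lower.get(candidate.lower())
        if hit is not None:
            return hit
    # 2. Trimmed sidecar name: its stem should be a prefix of the media name.
    target = media_name.lower()
    prefixes = [
        name for name in by_lower
        if name.endswith(".json")
        and len(name) - len(".json") >= min_prefix
        and target.startswith(name[: -len(".json")])
    ]
    if prefixes:
        return by_lower[max(prefixes, key=len)]
    return None
```

Something like `find_sidecar(media, [f for f in files if f.endswith(".json")])` inside the last loop would then replace the hard-coded `f"{media}.json"` lookup. The `min_prefix` threshold is arbitrary and only there to stop very short trimmed names from matching everything.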
I tried a lot of different wayward approaches. Originally I tried to parse each file and use the respective libraries/APIs to manipulate the metadata, but I came across an interesting discovery (at least when it came to the videos).

Finding this out, I decided to abandon that approach and switch to manipulating the file metadata directly through the Windows API. I came across a StackOverflow thread with an answer talking about their `filedate` library for Python, which offers a platform-independent way to manipulate file metadata, specifically creation, modification, and access times. Since I was only interested in creation time, I chose to use it. Here came the real problems.

In conclusion, kindly screw Google Takeout for stripping metadata and making it a complete abominable pain to transfer files over. I have hated every step of this process and I still have hundreds of files that are not working with this script, meaning I'll have to deal with them manually.
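
For the leftover files, a small helper along these lines could at least list what still needs manual attention. This is only a sketch: `media_without_sidecar` and `MEDIA_EXTENSIONS` are hypothetical names, the extension set is copied from the script's `PHOTOS` and `VIDEOS` lists, and "has a sidecar" here means nothing more than a `<media>.json` file sitting in the same folder.

```python
import os
from pathlib import Path

# Same extensions as the PHOTOS and VIDEOS lists in the script.
MEDIA_EXTENSIONS = {".JPEG", ".JPG", ".PNG", ".HEIC", ".GIF", ".MOV", ".MP4"}


def media_without_sidecar(takeout_folder):
    """Return the media files that have no "<name>.json" file next to them."""
    missing = []
    for root, _, files in os.walk(takeout_folder):
        present = {name.lower() for name in files}
        for name in files:
            if Path(name).suffix.upper() not in MEDIA_EXTENSIONS:
                continue
            if f"{name}.json".lower() not in present:
                missing.append(Path(root, name))
    return sorted(missing)
```

Printing the result of `media_without_sidecar(sys.argv[1])` before running the main script gives a rough count of how many files will end up in the "cant process" branch.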