@Frontear
Created September 3, 2022 07:20
An attempt to rewrite the metadata for media downloaded via Google Takeout
import re
import os
import sys
import json
from pathlib import Path

#from tinytag import TinyTag
#import ffmpeg
#from PIL import Image
#from PIL.ExifTags import TAGS
import filedate

PHOTOS = [
    ".JPEG",
    ".JPG",
    ".PNG",
    ".HEIC",
    ".GIF"
]
VIDEOS = [
    ".MOV",
    ".MP4"
]

if __name__ == "__main__":
    takeout_folder = sys.argv[1]
    FILE_META = {}

    # Pass 1: index every sidecar by the media name stored in its "title" field.
    for root, _, files in os.walk(takeout_folder):
        for media in files:
            if media.endswith(".json"):
                with open(Path(root, media), encoding="utf-8") as f:
                    meta = json.load(f)
                try:
                    FILE_META[meta["title"]] = meta["photoTakenTime"]["formatted"]
                except (KeyError, TypeError):
                    # Not a media sidecar (no title or photoTakenTime), skip it.
                    continue

    # Pass 2: rename "name.ext(1).json" to "name(1).ext.json" so the sidecar
    # sits next to the duplicate media file "name(1).ext".
    for root, _, files in os.walk(takeout_folder):
        for media in files:
            if m := re.fullmatch(r"(.+)\.([^.]+)(\(\d+\))\.json", media):
                stem, ext, counter = m.groups()
                target = Path(root, f"{stem}{counter}.{ext}.json")
                if not target.exists():
                    os.rename(Path(root, media), target)

    # Pass 3: set the creation time of every photo and video.
    for root, _, files in os.walk(takeout_folder):
        for media in files:
            ext = os.path.splitext(media)[1].upper()
            if ext in PHOTOS or ext in VIDEOS:
                media_path = Path(root, media)
                try:
                    filedate.File(media_path).set(created=FILE_META[media])
                except KeyError:
                    # No sidecar title matched; fall back to "<media>.json".
                    try:
                        with open(Path(root, f"{media}.json"), encoding="utf-8") as f:
                            taken = json.load(f)["photoTakenTime"]["formatted"]
                        filedate.File(media_path).set(created=taken)
                    except FileNotFoundError as e2:
                        print(f"cant process: {media}", e2)
# Abandoned earlier attempts, kept as inert string literals (never executed).
"""
if not media.endswith(".json"):
    ext = os.path.splitext(media)[1].upper()
    media_path, meta_path = Path(root, media), Path(root, f"{media}.json")
    if ext in VIDEOS or ext in PHOTOS:
        with open(meta_path, "r") as f:
            meta = json.load(f)
            filedate.File(media_path).set(created = meta["photoTakenTime"]["formatted"])
"""
"""
if ext in PHOTOS:
    img = Image.open(path)
    print(img.getexif())
    exit()
elif ext in VIDEOS:
    #vid = ffmpeg.probe(path)
    " "[0]
else:
    print(f"Unhandled file: {path}")
"""
ffmpeg-python==0.2.0
filedate==2.0
future==0.18.2
Pillow==9.2.0
python-dateutil==2.8.2
six==1.16.0
tinytag==1.8.1
Frontear commented Sep 3, 2022

I tried a lot of different, wayward approaches. Originally I tried to parse each file and use the respective libraries/APIs to manipulate the metadata, but I made an interesting discovery (at least when it came to the videos):

  1. Using ffmpeg, I found that the creation time was actually still saved in the video at the main layer (for an MOV that was the QUICKTIME / MOV layer). The MPEG-4 and AAC layers, meanwhile, had a strange date, Jan 1st 1970 (the Unix epoch, which is what a zeroed timestamp shows up as), and a description mentioning something ISO by Google (I assume added by the Takeout process). So the metadata was still around somewhere, but it was clearly being ignored or not recognized by operating systems and cloud photo storage services.
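
A rough way to check those per-layer dates yourself, assuming the `ffprobe` binary that ships with ffmpeg is on your PATH. The helper names `probe` and `creation_times` are made up for illustration, and `creation_time` is simply the tag ffprobe commonly reports; not every file will carry it.

```python
import json
import subprocess


def probe(path):
    """Run ffprobe on a media file and return its JSON report as a dict."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", str(path)],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def creation_times(report):
    """Collect the creation_time tag from the container and from each stream."""
    found = {}
    container = report.get("format", {}).get("tags", {}).get("creation_time")
    if container is not None:
        found["container"] = container
    for stream in report.get("streams", []):
        tag = stream.get("tags", {}).get("creation_time")
        if tag is not None:
            # Label each stream by its index and codec so the layers are easy to compare.
            key = f"stream {stream.get('index')} ({stream.get('codec_name')})"
            found[key] = tag
    return found
```

Calling `creation_times(probe("some_video.mov"))` on an exported file should show whether the container and the individual streams disagree about the date.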

Finding this out, I decided to abandon that approach and instead manipulate file metadata directly through the Windows API. I came across a StackOverflow answer whose author talked about their filedate library for Python, which allows a platform-independent way to manipulate file metadata, specifically creation, modification, and access times. Since I was only interested in creation time, I chose to use it. Here came the real problems.

  1. Takeout seems to name some of the json files differently from the media file itself. I observed this with really long file names (take 65722709309__E9D4A538-5FCF-487B-862A-ECB6CEE0E for example, pls don't ask why it's like this, idfk either), meaning that we can't do a simple json lookup from the filename. This is usually because the json file name has a chunk of the media file name stripped off. I'm not sure why, since I could manually rename said files to a longer name, meaning there shouldn't be a naming constraint (unless this is something for backward compatibility).
  2. Some media files lack a proper json metadata file. Some HEIC/MP4 pairings (I assume from Apple's Live Photo stuff) failed to get their respective json metadata files (meaning I had to manually create and name them), and some PNGs lacked metadata entirely. I observed that these PNGs had JPGs and other files with very similar names (ignoring the extension), so this could be an issue with naming (not entirely sure).
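
One possible workaround for the truncated names, sketched under the assumption that the json name (minus `.json`) is always a leading chunk of the media name. `find_sidecar` and its `min_len` guard are hypothetical; the value 20 is arbitrary and only there to stop short names like `IMG_1` from matching `IMG_12.jpg`.

```python
def find_sidecar(media_name, json_names, min_len=20):
    """Return the json file whose stem is the longest prefix of media_name, or None."""
    best, best_len = None, 0
    for name in json_names:
        if not name.lower().endswith(".json"):
            continue
        stem = name[:-len(".json")]
        # Ignore short stems: a short prefix is too likely to be a coincidence.
        if len(stem) < min_len:
            continue
        if media_name.startswith(stem) and len(stem) > best_len:
            best, best_len = name, len(stem)
    return best
```

This is only meant as a fallback after an exact `<media>.json` lookup has failed, and any match it returns is worth eyeballing before trusting it.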

In conclusion, kindly screw Google Takeout for stripping metadata and making it a complete abominable pain to transfer files over. I have hated every step of this process and I still have hundreds of files that are not working with this script, meaning I'll have to deal with them manually.
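
To get a list of the files that would need manual attention, a dry run along these lines might help. It is a minimal sketch that only checks for an exact `<media>.json` next to each file; `media_without_sidecar` and `MEDIA_EXTS` are names invented here, with the extension set copied from the script.

```python
import os
from pathlib import Path

MEDIA_EXTS = {".JPEG", ".JPG", ".PNG", ".HEIC", ".GIF", ".MOV", ".MP4"}


def media_without_sidecar(folder):
    """List media files that have no '<name>.json' in the same directory."""
    missing = []
    for root, _, files in os.walk(folder):
        present = set(files)
        for name in files:
            is_media = os.path.splitext(name)[1].upper() in MEDIA_EXTS
            if is_media and f"{name}.json" not in present:
                missing.append(Path(root, name))
    return sorted(missing)
```

Nothing is modified on disk, so it is safe to run before and after the script to see how many files are still unmatched.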

Frontear commented Sep 10, 2022

Little update:

  1. Some of the json metadata file names being trimmed is indeed due to file naming constraints, though the real reason is still a mystery. That said, most of the media file's name does exist within the json file's name, so it usually becomes a matter of finding the closest one. Alternatively, the json should 'in theory' have the proper file name inside it under the 'title' value, though I have observed that this is sometimes also mangled for no particular reason.
  2. Some jpg files have accompanying metadata files named with jpeg, and vice versa, which is incredibly infuriating and makes parsing a pain.
  3. Live photos are uploaded as HEIC and MP4 pairs, where the MP4 is the video and the HEIC is Apple's photo format (I think? I'm not 100% sure lol). This explains the missing metadata for the other half of the pair, since Google exports it for only one of them, I believe the HEIC alone.
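
Points 2 and 3 could be handled by trying a few alternative json names before giving up. This is a sketch under two assumptions: the jpg/jpeg mix-up is the only extension mismatch, and a Live Photo's MP4 shares its stem with the HEIC. `sidecar_candidates` is a hypothetical helper, not part of the script.

```python
import os

JPEG_SWAP = {".jpg": ".jpeg", ".jpeg": ".jpg"}


def sidecar_candidates(media_name):
    """Yield json file names to try for media_name, most likely first."""
    stem, ext = os.path.splitext(media_name)
    yield f"{media_name}.json"
    swapped = JPEG_SWAP.get(ext.lower())
    if swapped:
        # jpg/jpeg mix-up: try the other spelling in lower and upper case.
        yield f"{stem}{swapped}.json"
        yield f"{stem}{swapped.upper()}.json"
    if ext.lower() == ".mp4":
        # Live Photo video half: borrow the json exported for the HEIC.
        yield f"{stem}.HEIC.json"
        yield f"{stem}.heic.json"
```

The caller would check each candidate against the files in the same directory and use the first one that exists.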
