#!/usr/bin/env python
# -*- coding: utf8 -*-
# SMSBackupRestore extractor
#
# smsbackuprestore-extractor.py
# 24/11/2014
#
# This script will extract all images and videos retrieved
# from an XML backup of the Android application "SMS Backup & Restore".
# For each contact, it will create a folder inside the output folder
# with all received images and videos.
#
# Make sure the destination folder is empty otherwise it will create duplicates.
#
# Links :
# https://play.google.com/store/apps/details?id=com.riteshsahu.SMSBackupRestore
#
# example: python smsbackuprestore-extractor.py sms-20141122183844.xml medias/

from lxml import etree
import os
import sys

# Both the backup file and the output folder are required
if len(sys.argv) < 3:
    print "usage: %s [sms-backup.xml] [output-folder]" % sys.argv[0]
    sys.exit(-1)

INPUT_FILE = sys.argv[1]
OUTPUT_FOLDER = sys.argv[2]

if not os.path.isfile(INPUT_FILE):
    print "File %s not found" % INPUT_FILE
    sys.exit(-1)

print "[*] Parsing : %s" % INPUT_FILE
tree = etree.parse(INPUT_FILE)
mms_list = tree.xpath(".//mms")
total = 0

for mms in mms_list:
    address = mms.get("address")
    contact = mms.get("contact_name")

    # Name the folder after the contact, falling back to the phone number
    if contact is None or contact == "(Unknown)":
        folder = address
        if address is None:
            folder = "_Unknown"
    else:
        folder = contact

    media_list = mms.xpath(".//part[starts-with(@ct, 'image') or starts-with(@ct, 'video')]")

    # Create the folders
    for media in media_list:
        total = total + 1
        output = OUTPUT_FOLDER + "/" + folder
        if not os.path.exists(output):
            os.makedirs(output)
            print "[+] New folder created : %s" % output.encode("utf-8")

        filename = media.get("cl")
        rawdata = media.get("data").decode("base64")
        outfile = output + "/" + filename

        # Duplicates handling: photo.jpg, photo.1.jpg, photo.2.jpg, ...
        i = 1
        while os.path.isfile(outfile):
            dname = filename.split('.')
            dname.insert(-1, str(i))
            outfile = output + "/" + '.'.join(dname)
            i = i+1

        # Binary mode so the decoded bytes are written unchanged on every platform
        f = open(outfile, 'wb')
        f.write(rawdata)
        f.close()

print "[*] Job done (%d files created)" % total
print "[*] Output folder : %s" % OUTPUT_FOLDER
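The script above is Python 2 only: it relies on print statements and on `str.decode("base64")`, which Python 3 does not have. For anyone who needs the same extraction on a newer interpreter, here is a minimal sketch using only the standard library (`xml.etree.ElementTree` and `base64.b64decode`) rather than lxml. The function names `unique_path` and `extract_media` are hypothetical, and the `"unnamed"` fallback for a missing `cl` attribute is an assumption, not something the original script does. It should run on Python 2.7 as well.

```python
import base64
import os
import xml.etree.ElementTree as ET


def unique_path(folder, filename):
    """Return a path inside folder that is not taken yet.

    Collisions get a counter before the extension: a.jpg, a.1.jpg, a.2.jpg, ...
    """
    base, ext = os.path.splitext(filename)
    candidate = os.path.join(folder, filename)
    i = 1
    while os.path.isfile(candidate):
        candidate = os.path.join(folder, "%s.%d%s" % (base, i, ext))
        i += 1
    return candidate


def extract_media(xml_path, output_folder):
    """Write every image/video part of the backup; return the number of files written."""
    total = 0
    for mms in ET.parse(xml_path).getroot().iter("mms"):
        # Same folder rule as the script: contact name, else address, else "_Unknown"
        contact = mms.get("contact_name")
        if contact is None or contact == "(Unknown)":
            contact = mms.get("address") or "_Unknown"
        folder = os.path.join(output_folder, contact)

        for part in mms.iter("part"):
            content_type = part.get("ct") or ""
            data = part.get("data")
            if data is None or not content_type.startswith(("image", "video")):
                continue
            if not os.path.isdir(folder):
                os.makedirs(folder)
            # Assumption: fall back to a placeholder when "cl" is absent
            name = part.get("cl") or "unnamed"
            with open(unique_path(folder, name), "wb") as f:
                f.write(base64.b64decode(data))
            total += 1
    return total
```

Calling `extract_media("sms-20141122183844.xml", "medias")` mirrors the command-line example in the script header.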
i really want to thank you for this @tetrillard
i will be posting a new gist inspired by your work here for blackberry 10 users.
this gist will allow us QNX fanboys to migrate all of our texts backed up by SMS Backup, so we have a nice tidy database we will be able to convert back, hopefully, in the very near future.
i cannot believe i got this to work lol. or at least i've gotten images and things to appear in the previews properly.
big downside is that it's slow to restore because i don't know how to sort the mms/sms messages on the "date" tag. but i hope the few of us BB10ers left (i hate to toot our horn, but our small population has attributes similar to those using 'true QNX').
this whole experience made me especially hate python 3, which your code was wise enough to avoid (evident from the lack of bracketed prints) since it's a nightmare. it's been a while since i've done anything in it, and i plan to keep it that way.
had i not run into your code, i very likely would have used perl.
the original SMS backup was for blackberry 10 and tagged everything as 'sms'.
it would use local file storage to reference the messages in MMS, the way you've done here (same sort of directory structure too).
now the "new" sms backup by RITESH (i think he bought it from the dutch guy) moves the data into the tags, which is probably better.
all in all i just tested a backup i made by going into SMS Backup and viewing an image in the conversation, and it's there.
it's not 100% perfect because it seems there's html-in-html for the mms header (part seq="-1") but, it does work
https://gist.github.com/i3roly/e5ec063e561af48c30c4c045746b92fc
still a WIP, but i think i've got the sorting figured out thanks to this guy named Zesk on stackoverflow:
https://stackoverflow.com/a/46128043
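for reference, a minimal sketch of one way to do that kind of sort with the standard library. this is not necessarily what the linked answer or the gist does. it assumes "date" is an attribute on each sms/mms element holding an integer timestamp, and that the messages are direct children of the root element; `sort_by_date` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def sort_by_date(root):
    """Reorder the direct children of root in place by their numeric "date" attribute."""
    # Assumption: "date" is an integer string; elements without it sort first
    ordered = sorted(root, key=lambda el: int(el.get("date", "0")))
    # Slice assignment replaces the children while keeping the root element itself
    root[:] = ordered
    return root
```

comparing the dates as integers rather than strings matters if the timestamps do not all have the same number of digits.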
the only issues/questions that remain are:
- whether the header of the mms (tag part seq="-1") is really doing dynamic resizing, and if this is actually honoured by the OS when displaying the messages. i just used what appeared to be prevalent on most headers, though some do show different sizes. hoping the OS handles this as it sees fit.
google are real assholes: they save your messages if you restore them after clearing your data for the messages app, because it logs you in by default, and all those messages start to restore. so it saves everything. now i have to delete it all, and it's taking forever (17 years of messages ha).
so i have to wait a bit to try the latest version. your script above successfully decodes the xml my script produces, so i am expecting it should work. at least that's the theory.
god android is so smooth-brained. it's horrific.
This is great! I just ran the newest version (posted by @bumpaneer) on a 600MB xml file and it worked wonderfully. I was about to start writing something to do the same thing but I wasn't looking forward to it :) Thank you.
Do you mind if I link to this gist from a Stack Exchange question?