@krisk0
Created January 31, 2025 07:40
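Fix a file that is mostly encoded as UTF-8: every line that is not valid UTF-8 is replaced with a marker line, and the result is written to INPUT.conv (the input file name with .conv appended).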
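#!/usr/bin/python3
import sys

g_in = sys.argv[1]
g_out = g_in + '.conv'


def conv(i, o):
    # i yields raw byte lines; each line is decoded on its own, so one
    # badly encoded line does not abort the whole conversion
    for j in i:
        try:
            o.write(j.decode('utf-8'))
        except UnicodeDecodeError:
            o.write('---------------- not utf-8, line skipped ----------------\n')


# Read the input as bytes: in text mode the decoding happens while the for
# statement fetches the next line, outside the try block, so the first bad
# byte would crash the script. newline='' keeps the original line endings.
with open(g_in, mode="rb") as g_in_file:
    with open(g_out, mode="w", encoding="utf-8", newline="") as g_out_file:
        conv(g_in_file, g_out_file)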
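The per-line check can be tried without touching the file system by feeding byte lines from an in-memory stream. This is a minimal sketch that assumes the input is read as bytes; `convert_lines`, `MARKER` and the returned skip count are hypothetical names, not part of the gist.

```python
import io

# Hypothetical constant: the same marker text the gist writes for a bad line
MARKER = '---------------- not utf-8, line skipped ----------------\n'


def convert_lines(src, dst):
    """Hypothetical helper: copy byte lines from src to the text stream dst,
    replacing every line that is not valid UTF-8 with MARKER.
    Returns the number of replaced lines."""
    skipped = 0
    for line in src:  # iterating a binary stream yields bytes lines
        try:
            dst.write(line.decode('utf-8'))
        except UnicodeDecodeError:
            dst.write(MARKER)
            skipped += 1
    return skipped


# Sample input: a valid UTF-8 line, a Latin-1 encoded "café" (b'caf\xe9'),
# and another valid line
sample = 'привет\n'.encode('utf-8') + b'caf\xe9\n' + b'ok\n'
out = io.StringIO()
n = convert_lines(io.BytesIO(sample), out)
print(n)
print(out.getvalue(), end='')
```

Only the Latin-1 line is replaced: the byte 0xE9 starts a three-byte UTF-8 sequence, and the following newline is not a valid continuation byte, so decoding that line fails while the other two lines pass through unchanged.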