@JamesChevalier
Last active April 14, 2026 12:51
Unicode on Mac is insane. Mac OS X (HFS+) stores file names in NFD (decomposed), while most other systems, such as Linux and Windows, typically produce NFC (precomposed). This fixes that.

See the convmv manpage for all options.

Install convmv if you don't have it (Debian/Ubuntu):

sudo apt-get install convmv

Convert all file and folder names in the current directory and its subdirectories from NFD to NFC:

convmv -r -f utf8 -t utf8 --nfc --notest .
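To see what "NFD to NFC" means at the byte level, here is the same visible é spelled both ways. The byte values are standard UTF-8; the snippet only uses bash's $'...' quoting and od, so it illustrates the two forms rather than anything convmv does internally.

```shell
#!/usr/bin/env bash
# The same visible "é" written as raw UTF-8 bytes in both normalization forms.
nfc=$'\xc3\xa9'     # NFC: U+00E9 LATIN SMALL LETTER E WITH ACUTE (one code point)
nfd=$'e\xcc\x81'    # NFD: U+0065 "e" followed by U+0301 COMBINING ACUTE ACCENT

printf '%s' "$nfc" | od -An -tx1    # c3 a9
printf '%s' "$nfd" | od -An -tx1    # 65 cc 81

# A plain byte-wise comparison sees two different strings.
if [[ $nfc == "$nfd" ]]; then echo same; else echo different; fi
```

Any program that compares file names byte for byte treats these as two different names, even though they display identically.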

Convert all file and folder names in the current directory and its subdirectories from NFC to NFD:

convmv -r -f utf8 -t utf8 --nfd --notest .

@hwdbk

hwdbk commented Dec 27, 2020 via email

@fguern

fguern commented Dec 27, 2020

Hello Henk.

At this stage I see three solutions:

///// 1 - Your script

-> This time, it's "not overwritten":
francois@Francoiss-MacBook-Air mac-nfd-conversion % for f in /Users/francois/Documents/01.\ Documents/25.\ Test/*.rtf ; do mv -v -n "$f" "$(dirname "$f")/$(./mac2syn <<< "$(basename "$f")")" ; done
/Users/francois/Documents/01. Documents/25. Test/2008_03_26 - A voir à paris.rtf not overwritten
Is it possible to rename an entire folder and all its subfolders with your script?

///// 2 - James's convmv command
-> I also tried the command above "convmv -r -f utf8 -t utf8 --nfc --notest" and got a "wrong/unknown encoding" error:
francois@Francoiss-MacBook-Air 25. Test % convmv -r -f enc -t enc utf8 --nfc --notest
wrong/unknown "from" encoding!

///// 3 - Rsync local copy with NFC
-> No apparent problem, except that it makes a 10 GB copy
rsync -a --iconv=utf-8-mac,utf-8 /Users/francois/Documents/01.\ Documents/02.\ Administratif /Users/francois/Documents/01.\ Documents/02.\ Administratif\ nfc

What's your expert advice? Is it worth trying to get the script or the convmv command to work?
Thanks

@hwdbk

hwdbk commented Dec 28, 2020 via email

@hwdbk

hwdbk commented Dec 28, 2020 via email

@jsvini

jsvini commented Mar 18, 2021

God bless you! 🙌

@jcarnat

jcarnat commented Apr 14, 2021

Great. Thanks a lot!

@boulderob

boulderob commented Apr 30, 2021

So I just upgraded to a new (used) Intel-based MacBook Pro (aka mbp). I reformatted the internal SSD as APFS, and I'm seeing exactly the symptoms described here when I use VLC to play downloaded French mp4 videos whose matching vtt subtitle files have Unicode (accented) characters in their names! VLC plays the mp4 fine, but it can't find and autoload the vtt file, even though its name is exactly the same except for the extension. If there are no accented French characters in the filenames, it all works fine. But on my old mbp, with an older version of VLC and a non-APFS file system, all files worked great whether or not they had accented French characters.

In fact, if I mount my external NTFS drive, which holds a huge library of previously downloaded French mp4 and vtt files, on the new mbp (using an NTFS driver, of course), the new VLC recognizes and plays these OLD files normally.

BUT on the new mbp, when I download new files with youtube-dl onto the APFS-formatted internal SSD, VLC does not autoload the vtt file that has the same name as its mp4 when the filename contains accented French characters.

If I copy these newly downloaded files to the USB-attached NTFS drive, they then magically work the way I expect in VLC on the new mbp! :) If I then copy them back to the internal APFS SSD, they also work the way I expect :) This seems to indicate that NTFS changes the fileNAME encoding when a file is copied to it, and that copying back to APFS somehow does NOT change that new encoding!???

I have a huge collection and am constantly adding to and maintaining it.

Is there any way to make what your scripts do a setting at the filesystem or even the system level? Or, going forward, will I always have to run a post-download script on every new filename to convert the Unicode flavor so that my Mac / VLC can recognize them?!

Thanks

@hwdbk

hwdbk commented May 1, 2021

Yup, that is exactly the madness of having two allowed but different encodings for, say, the è (e with grave accent).
The simple way to make sure the media file and the vtt/srt file use the same file name encoding is:

for i in *.mp4 ; do
mv -vn "$i" "$(syn2mac <<< "$i")"
done

and do the same for the subtitle files. You'll probably get a lot of "same file" warnings from mv on those files that were already in the target encoding.

@s2k

s2k commented May 4, 2021

Very nice tip!
On my Mac, I used brew install convmv, BTW.

@igorsgm

igorsgm commented May 13, 2021

You saved my day. Thank you!

@simnalamburt

simnalamburt commented Jun 16, 2023

Take a look at https://github.com/cr0sh/jaso for a faster alternative written in Rust.

$ brew install simnalamburt/x/jaso
$ jaso .
DONE; 100 files in 1.111529301 seconds

@elmimmo

elmimmo commented Mar 4, 2025

Just for reference, there is also NFCFN.py.
