Last active
October 25, 2021 22:50
Unicode character testing for Nordic region
"""Unicode character testing done for Unicode source.""" | |
# Even in 2021, Unicode is still not used everywhere, so we must test our entire
# processing chain for uncommon or exceptional characters.
# Within the Nordic region, we must test for the letters commonly used in the
# region's languages, which also appear on our country-specific keyboards.
# Note: Many people with Russian heritage live in the Nordic countries, especially Finland.
# Our technology stack and tools must also handle these additional letters,
# which are not found in the default code page.
NORDIC_SPECIAL_CHARS = [
    "ų",
    "ī",
    "ū",
    "ą",
    "ę",
    "į",
    "ų",
    "ū́",
    "Ż",
    "š",
    "č",
    "ẽ",
    "ä",
    "Ä",
    "ö",
    "Ö",
    "å",
    "Å",
]

for c in NORDIC_SPECIAL_CHARS:
    print(c)
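Printing the characters only exercises the terminal. As a minimal sketch of how the "not found in the default code page" concern could be checked directly, the snippet below tries to encode a few of the letters with several codecs. The names `SAMPLE_CHARS`, `CODECS`, `encodable`, and `coverage` are hypothetical and not part of the original file, and the codec list is only an assumption about which code pages might matter in a given stack.

```python
# Hypothetical helper names; the codec list is an assumption, adjust it to your stack.
SAMPLE_CHARS = ["ä", "Å", "š", "ų", "Ż"]
CODECS = ["latin-1", "cp1252", "iso8859_13", "utf-8"]


def encodable(char, codec):
    """Return True if `char` can be represented in `codec`."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def coverage(chars, codecs):
    """Map each character to the list of codecs that can encode it."""
    return {c: [codec for codec in codecs if encodable(c, codec)] for c in chars}


for char, supported in coverage(SAMPLE_CHARS, CODECS).items():
    print(f"{char!r}: {supported}")
```

Any character whose list lacks a codec used somewhere in the processing chain is a candidate for a targeted test case.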
Here's a Python statement that demonstrates what happens when Nordic letters encoded as UTF-8 are mistakenly decoded as Latin-1:
print('öäå'.encode('utf-8').decode('latin-1'))
Output:
Ã¶Ã¤Ã¥
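The same mis-decoding can often be reversed by running the two steps in the opposite direction. The following is a rough sketch, assuming the text was garbled exactly once by a UTF-8 to Latin-1 mix-up; `make_mojibake` and `repair_mojibake` are hypothetical names used only for illustration.

```python
def make_mojibake(text):
    """Reproduce the fault: encode as UTF-8, then decode as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(garbled):
    """Try to undo a single UTF-8 -> Latin-1 mis-decoding.

    If the text does not look like such mojibake, return it unchanged.
    """
    try:
        return garbled.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return garbled


garbled = make_mojibake("öäå")
print(garbled)
print(repair_mojibake(garbled))
```

This is a heuristic rather than a general fix: text that was garbled more than once, or through a different code page such as cp1252, needs a different pair of codecs.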