UTF-8 byte counter in 49 bytes
function(string) {
  return unescape( // convert each `%xx` escape into a single character
    encodeURI(string) // URL-encode the string (this uses UTF-8)
  ).length; // read out the length (one character per UTF-8 byte)
}
// Note: this fails for input that contains lone surrogates.
// Use http://mths.be/utf8js if you need something more robust.
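To make the two steps in the annotated version concrete, here is a small sketch that prints the intermediate values for one non-ASCII character. The wrapper name `byteSize` is borrowed from the demo page; the sample strings are illustrative only.

```javascript
// Named wrapper around the one-liner so it can be called repeatedly.
function byteSize(s) {
  return unescape(encodeURI(s)).length;
}

// Intermediate steps for the euro sign (U+20AC).
var encoded = encodeURI("\u20AC"); // "%E2%82%AC": one %xx escape per UTF-8 byte
var decoded = unescape(encoded);   // three characters: U+00E2, U+0082, U+00AC

console.log(encoded, decoded.length); // %E2%82%AC 3

// 1-, 2-, 3- and 4-byte characters: "a", U+00E9, U+20AC, U+1F4A9.
console.log(byteSize("a"), byteSize("\u00E9"), byteSize("\u20AC"), byteSize("\uD83D\uDCA9")); // 1 2 3 4
```

ASCII characters pass through `encodeURI` unchanged (or are escaped and then restored by `unescape`), so each still counts as one byte.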
function(s){return unescape(encodeURI(s)).length}
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004

Copyright (C) 2011 Mathias Bynens <http://mathiasbynens.be/>

Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.

DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

0. You just DO WHAT THE FUCK YOU WANT TO.
{
  "name": "byteSize",
  "description": "This function will return the byte size of any UTF-8 string you pass to it.",
  "keywords": [
    "utf-8",
    "utf8",
    "byte",
    "byte-size"
  ]
}
<!DOCTYPE html>
<!-- online demo: http://mothereff.in/byte-counter -->
<meta charset=utf-8>
<title>Get the byte size of any UTF-8 string</title>
<input autofocus>
<p>Byte size: <span></span>
<script>
  var byteSize = function(s){return unescape(encodeURI(s)).length};
  var el = document.getElementsByTagName('span')[0];
  document.getElementsByTagName('input')[0].oninput = function() {
    el.innerHTML = byteSize(this.value);
  };
</script>
@mathiasbynens: save 2 characters from your current method with some bit shifting:
current:
function(s,b,i,c){for(b=i=0;c=s.charCodeAt(i++);b+=1+(c>127)+(c>2047));return b}
improved:
function(s,b,i,c){for(b=i=0;c=s.charCodeAt(i++);b+=c>>11?3:c>>7?2:1);return b}
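A quick way to check that the two loops agree is to run both over a few inputs. The names `countArithmetic` and `countShift` are made up for this sketch, and the sample strings are illustrative.

```javascript
// The two loops from the comment, bound to hypothetical names for comparison.
var countArithmetic = function(s,b,i,c){for(b=i=0;c=s.charCodeAt(i++);b+=1+(c>127)+(c>2047));return b};
var countShift = function(s,b,i,c){for(b=i=0;c=s.charCodeAt(i++);b+=c>>11?3:c>>7?2:1);return b};

// ASCII, a 2-byte character, a 3-byte character, a surrogate pair, an embedded NUL.
var samples = ["abc", "h\u00E9llo", "\u20AC", "\uD83D\uDCA9", "a\u0000b"];

samples.forEach(function (s) {
  // Print the input (as a JSON string) followed by both counts.
  console.log(JSON.stringify(s), countArithmetic(s), countShift(s));
});
```

Both versions return the same numbers for every sample. Two edge cases are worth knowing about, since they follow from how `charCodeAt` works: a surrogate pair is counted as two 3-byte units (6, where UTF-8 uses 4 bytes), and the loop condition stops at the first U+0000 because its code unit is `0`.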
We could just use `encodeURI` instead of `encodeURIComponent`; this saves 9 bytes.
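The swap is safe because `unescape` turns any extra `%xx` escapes produced by `encodeURIComponent` back into single characters. A minimal sketch, with hypothetical function names and an illustrative sample string:

```javascript
// Hypothetical names for the two variants being compared.
function viaEncodeURI(s) { return unescape(encodeURI(s)).length; }
function viaEncodeURIComponent(s) { return unescape(encodeURIComponent(s)).length; }

// Contains reserved characters (/ ? = &) and one 2-byte character (U+00FC).
var sample = "a/b?c=\u00FC&d";

console.log(encodeURI(sample));          // a/b?c=%C3%BC&d
console.log(encodeURIComponent(sample)); // a%2Fb%3Fc%3D%C3%BC%26d
console.log(viaEncodeURI(sample), viaEncodeURIComponent(sample)); // 10 10
```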
Anyway, here’s an online tool you can use to check the length & byte count of a string (useful for @140bytes): http://mothereff.in/byte-counter
API: http://mothereff.in/byte-counter#%s where `%s` is the URL-encoded input string. I’ve added it to my browser’s bookmarks / search engines :)
`encodeURI` and `encodeURIComponent` throw a "URI malformed" error (a `URIError`) on certain strings in Google Chrome.
@atk Yeah, if the input contains lone surrogates.
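The failure can be reproduced with a single unpaired surrogate. The wrapper below is only a sketch: `tryByteSize` is a hypothetical name, and returning `null` is one possible way to signal the problem, not something the gist does.

```javascript
// Hypothetical wrapper: returns null instead of throwing on malformed input.
function tryByteSize(s) {
  try {
    return unescape(encodeURI(s)).length;
  } catch (e) {
    if (e instanceof URIError) return null; // input contains a lone surrogate
    throw e; // anything else is unexpected, so re-throw it
  }
}

console.log(tryByteSize("abc"));          // 3
console.log(tryByteSize("\uD800"));       // null (lone high surrogate)
console.log(tryByteSize("\uDC00"));       // null (lone low surrogate)
console.log(tryByteSize("\uD83D\uDCA9")); // 4 (valid surrogate pair)
```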
//count UTF-8 bytes of a string
function byteLengthOf(s){
    //assuming the string is UTF-16 encoded (JavaScript strings are sequences of UTF-16 code units)
    var n=0;
    for(var i=0,l=s.length; i<l; i++){
        var hi=s.charCodeAt(i);
        if(hi<0x0080){ //[0x0000, 0x007F]
            n+=1;
        }else if(hi<0x0800){ //[0x0080, 0x07FF]
            n+=2;
        }else if(hi<0xD800){ //[0x0800, 0xD7FF]
            n+=3;
        }else if(hi<0xDC00){ //[0xD800, 0xDBFF]: high surrogate
            var lo=s.charCodeAt(++i);
            if(i<l&&lo>=0xDC00&&lo<=0xDFFF){ //followed by [0xDC00, 0xDFFF]
                n+=4;
            }else{
                throw new Error("UCS-2 String malformed");
            }
        }else if(hi<0xE000){ //[0xDC00, 0xDFFF]: low surrogate without a preceding high surrogate
            throw new Error("UCS-2 String malformed");
        }else{ //[0xE000, 0xFFFF]
            n+=3;
        }
    }
    return n;
}
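For comparison, runtimes that provide the standard `TextEncoder` global (current browsers and Node.js) can do the counting natively. This is a sketch of an alternative, not part of the gist, and `byteLengthViaTextEncoder` is a hypothetical name. The behaviour differs from the function above in one respect: `TextEncoder` replaces a lone surrogate with U+FFFD (3 bytes) rather than throwing.

```javascript
// Assumes the runtime exposes the standard TextEncoder global (always UTF-8).
function byteLengthViaTextEncoder(s) {
  return new TextEncoder().encode(s).length;
}

console.log(byteLengthViaTextEncoder("h\u00E9llo"));   // 6
console.log(byteLengthViaTextEncoder("\uD83D\uDCA9")); // 4
console.log(byteLengthViaTextEncoder("\uD800"));       // 3 (lone surrogate replaced by U+FFFD)
```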
Oopsy daisy. Sorry my function did work fine, I guess I messed it up at some point, then got distracted when my baby girl woke up.