Vim has five built-in functions for measuring the length, width, or number of characters in a string:
- `strlen()`
- `strchars()`
- `strcharlen()`
- `strwidth()`
- `strdisplaywidth()`
Sometimes these return the same value. For example, if there is only a `#`, spaces, and alphanumerics on a line, like this:
# An example
All five functions return 12 for that line; e.g., with the cursor anywhere on that line, `:echo strlen(getline('.'))` returns 12.
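The same check can be run on a string literal instead of the current line. This is a minimal sketch, assuming it is sourced as a Vim9 script; the names `text` and `results` are only for illustration:

```vim
vim9script

# The ASCII-only example line: one byte, one code point and one
# display cell per character, so every function agrees.
const text = '# An example'
const results = [
  strcharlen(text),
  strchars(text),
  strwidth(text),
  strdisplaywidth(text),
  strlen(text),
]
# Echoes [12, 12, 12, 12, 12]
echo results
```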
Many things cause the functions to return different values. The differences are worth knowing because different use cases call for different functions. The following script illustrates an instance where all five differ and, importantly, why.
vim9script
const S_MEASURE: func = (lnum: number = line('.')): list<string> => {
  return [$"strcharlen={getline(lnum)->strcharlen()}",
    $"strchars={getline(lnum)->strchars()}",
    $"strwidth={getline(lnum)->strwidth()}",
    $"strdisplaywidth={getline(lnum)->strdisplaywidth()}",
    $"strlen={getline(lnum)->strlen()}"]
}
# Call S_MEASURE() on line 18, reporting to line 15
18->S_MEASURE()
  ->insert(" #", 0)
  ->join()
  ->setline(15)

# strcharlen=63 strchars=64 strwidth=65 strdisplaywidth=69 strlen=70

# Example line 18 (NB: with &tabstop of 8, as set in the modeline):
# A tab-> , CJK char 漢, combining char é, and 4-byte emoji 😊!
# strcharlen=63 - The number of characters, with each combining
# character counted as part of its base character.
# So `:s/./_/g` on the line would produce 63x _,
# because the é is two Unicode code points
# (U+0065,U+0301), but only one character.
# strchars=64 - This is the number of Unicode code points.
# So the combining acute accent, U+0301,
# is counted too and the result is 64. Another
# way to see this is:
# :echo str2list(getline('.'))->len()
# [Note: If `->strchars(1)` were used in the example
# instead of the default (equivalent to
# `->strchars(0)`), it would skip combining
# characters and return the same as
# `strcharlen()`.]
# strwidth=65 - This is the number of display cells the string
# occupies, like strdisplaywidth, but with the
# tab character counted as only one cell. Since
# a tab's width varies (in this example it is
# five cells), it is strdisplaywidth (69) - 4, so 65.
# Equivalently, it is strcharlen (63) + 2, as
# 漢 and 😊 each occupy two cells.
# strdisplaywidth=69 - This is the number of cells the string
# occupies, visually. One way to think
# about this value is to consider each monospaced
# column of the display as one unit, and the
# position of the cursor when on the last
# visible character of the display:
# :echo virtcol('.')
# ...on that last visible character echoes 69.
# However, beware 'conceal' (e.g. in help files,
# where `|` characters are used for hotlinks).
# Concealed characters may be hidden from view,
# but are still counted in the number returned
# by `strdisplaywidth()` even though they don’t
# occupy any visible space on screen.
# strlen=70 - The number of bytes in the string. This
# is never smaller than strchars or strcharlen,
# but a display width can exceed it (a lone tab
# is one byte yet several cells wide). In the
# example line, there are 63 characters, as
# shown with strcharlen, but:
# 漢 (is three bytes: e6 bc a2; one character)
# é’s combining acute (is two bytes: cc 81; none)
# 😊 (is four bytes: f0 9f 98 8a; one character)
# So, strlen is 63 + (3 - 1) + 2 + (4 - 1) = 70.
# (Note: Using `g8` in Normal mode on a
# character shows its UTF-8 bytes.)
# vim:textwidth=73:tabstop=8:
Copying the script above into Vim and sourcing it illustrates this in action. Delete the contents of line 15 of the script first with `D` or `d$` (not `dd`) to see the output generated on line 15 when sourced with `:so`.
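The per-character arithmetic in the script's comments can also be checked in isolation, one effect per string. This is a minimal sketch, assuming 'encoding' is utf-8; `Measure()` is a hypothetical helper introduced here, and it returns the values in the same order as the script's output line:

```vim
vim9script

# Hypothetical helper: returns
# [strcharlen, strchars, strwidth, strdisplaywidth, strlen]
def Measure(s: string): list<number>
  return [strcharlen(s), strchars(s), strwidth(s),
    strdisplaywidth(s), strlen(s)]
enddef

# One double-width CJK character: 1 char, 2 cells, 3 bytes.
# Echoes [1, 1, 2, 2, 3]
echo Measure('漢')

# 'e' followed by a combining acute accent (U+0301):
# 2 code points, 1 character, 1 cell, 3 bytes.
# Echoes [1, 2, 1, 1, 3]
echo Measure("e\u0301")

# A lone tab: strwidth() counts it as 1 cell, while the
# strdisplaywidth() value depends on 'tabstop'.
echo Measure("\t")
```

The tab case is left without an expected value because its display width depends on the local 'tabstop' setting.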
You may also like to change:
- Line 18 itself, or
- The `18` on line 10 of the script, pointing to a different line, to compare the outputs of the functions.
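As one illustration of matching a function to a use case, padding text to a fixed screen column calls for display width rather than a byte or character count. This is a minimal sketch, assuming 'encoding' is utf-8 and strings without tabs; `PadRight()` is a hypothetical helper, not a Vim built-in:

```vim
vim9script

# Hypothetical helper: pad s with spaces up to `width` display cells.
# Uses strwidth(), so a double-width character counts as two cells.
def PadRight(s: string, width: number): string
  return s .. repeat(' ', max([0, width - strwidth(s)]))
enddef

# Both results occupy 4 cells before the bar, so the bars line up
# on screen even though the strings differ in bytes and characters.
echo PadRight('ab', 4) .. '|'
echo PadRight('漢', 4) .. '|'
```

Padding by `strlen()` instead would add one space too few for `漢`, which is three bytes but only two cells wide.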