Neovim has five built-in functions for measuring the length, width, or number of characters in a string: strcharlen(), strchars(), strwidth(), strdisplaywidth(), and strlen().
Sometimes these return the same value. For example, if a line contains only a #, spaces, and alphanumerics, like this:
# An example
All five functions return 12 for that line. For example, with the cursor anywhere on that line, :echo strlen(getline('.')) returns 12.
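A quick way to confirm this is to call all five functions on the same literal string. The following is a minimal sketch that assumes it runs inside Neovim (for example via :luafile), where the vim global exists; the variable names are illustrative only.

```lua
-- Assumes Neovim's Lua runtime: `vim.fn` exposes the built-in functions.
local sample = "# An example"
local names = { "strcharlen", "strchars", "strwidth", "strdisplaywidth", "strlen" }

local results = {}
for _, name in ipairs(names) do
  -- vim.fn can be indexed by function name, so one loop covers all five.
  results[name] = vim.fn[name](sample)
  print(name .. "=" .. results[name])
end
```

Because the sample is plain ASCII with no tabs, each of the five calls should report 12.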
Many things can make these functions return different values, and the differences are worth knowing because different use cases call for different functions. The following script shows a line where all five differ and, importantly, why.
local function s_measure(lnum)
  lnum = lnum or vim.fn.line('.')
  local line = vim.fn.getline(lnum)
  return { "strcharlen=" .. vim.fn.strcharlen(line),
           "strchars=" .. vim.fn.strchars(line),
           "strwidth=" .. vim.fn.strwidth(line),
           "strdisplaywidth=" .. vim.fn.strdisplaywidth(line),
           "strlen=" .. vim.fn.strlen(line), }
end
-- Call s_measure() on line 19, reporting to line 16
local measurements = s_measure(19)
table.insert(measurements, 1, " --")
local output = table.concat(measurements, " ")
vim.fn.setline(16, output)

-- strcharlen=63 strchars=64 strwidth=65 strdisplaywidth=69 strlen=70

-- Example line 19 (NB: with &tabstop of 8, as set in the modeline):
-- A tab-> , CJK char 漢, combining char é, and 4-byte emoji 😊!
-- strcharlen=63 - The number of characters, with combining
-- characters not counted separately, so if you substitute
-- with `:s/./_/g` it would replace with 63x _
-- because the é is two Unicode code points
-- (U+0065,U+0301), but only one character.
-- strchars=64 - This is the number of distinct Unicode code
-- points. So the combining acute accent, U+0301,
-- is counted too and the result is 64. Another
-- way to think about this is to:
-- :echo str2list(getline('.'))->len()
-- [Note: If `strchars(line, 1)` were used in the
-- example instead of the default (equivalent to
-- `strchars(line, 0)`), it would skip combining
-- characters and return the same as
-- `strcharlen()`.]
-- strwidth=65 - This is the number of display cells the string
-- occupies, like strdisplaywidth, but with the
-- tab character only counted as one cell. Since
-- the tab in this example spans five cells, it is
-- strdisplaywidth (69) - 4, so 65. (That is also
-- strcharlen's 63 plus one extra cell each for
-- the double-width 漢 and 😊.)
-- strdisplaywidth=69 - This is the number of cells the string
-- occupies, visually. One way to think
-- about this value is to consider each monospaced
-- column of the display as one unit, and the
-- position of the cursor when on the last
-- visible character of the display:
-- :echo virtcol('.')
-- ...on that last visible character echoes 69.
-- However, beware 'conceal' (e.g. in help files,
-- where `|` characters are used for hotlinks).
-- Concealed characters may be hidden from view,
-- but are NOT excluded from the number
-- returned by `strdisplaywidth()` even though
-- they don't occupy any visible space on screen.
-- strlen=70 - The number of bytes in the string. This
-- is never smaller than `strchars()` or
-- `strcharlen()`, and is usually the largest of
-- the `str*` values (tabs are an exception: one
-- byte each, but up to &tabstop cells wide). In
-- the example line, there are 63 characters, as
-- shown with strcharlen, but:
-- 漢 (is three bytes: e6 bc a2; one character)
-- é's combining acute (is two bytes: cc 81; 0)
-- 😊 (is four bytes: f0 9f 98 8a; one)
-- So, strlen is 63 + (3 - 1) + 2 + (4 - 1) = 70.
-- (Note: Using `g8`, when in Normal mode, on
-- a character shows its UTF-8 bytes.)
-- vim:textwidth=73:tabstop=8:
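The byte arithmetic in the strlen note can also be checked outside Neovim. The following is a minimal sketch in plain Lua and is not part of the script above: codepoints() is a hypothetical helper name, and it assumes its input is valid UTF-8. It counts every byte that is not a continuation byte (0x80 to 0xBF), which should correspond to what strchars() counts, while Lua's # operator gives the byte count that strlen() reports.

```lua
-- Hypothetical helper (not a Neovim API): count UTF-8 code points by
-- counting every byte that is not a continuation byte (0x80-0xBF).
-- Assumes `s` is valid UTF-8.
local function codepoints(s)
  local n = 0
  for i = 1, #s do
    local b = s:byte(i)
    if b < 0x80 or b >= 0xC0 then
      n = n + 1
    end
  end
  return n
end

-- Decimal byte escapes keep the samples independent of file encoding.
local samples = {
  { name = "han",     text = "\230\188\162" },     -- 漢: e6 bc a2
  { name = "e_acute", text = "e\204\129" },        -- e + U+0301: 65 cc 81
  { name = "smile",   text = "\240\159\152\138" }, -- 😊: f0 9f 98 8a
}

for _, s in ipairs(samples) do
  print(s.name, "bytes=" .. #s.text, "codepoints=" .. codepoints(s.text))
end
```

Relative to the 63 characters that strcharlen counts, the three samples contribute 2, 2, and 3 surplus bytes, which accounts for the 70 that strlen reports.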
Copying the script above into Neovim, saving the file as s_measure.lua, and sourcing it shows this in action. First delete the contents of line 16 of the script with D or d$ (not dd, which would remove the line itself and shift the line numbers), then source the file with :so to see the output regenerated on line 16.
You may also like to change either of the following to compare the outputs of the functions:
- Line 19 itself, or
- The 19 on line 11 of the script, so that it points to a different line.