Last active
March 8, 2023 16:58
-
-
Save tbrunz/1b5e28d3e571c021aa0a440b173e1bfb to your computer and use it in GitHub Desktop.
A simple Lua app to demonstrate the principles of lexical closures
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#! /usr/bin/env lua | |
-- | |
-- A simple Lua app to demonstrate the principles of lexical closures. | |
-- | |
------------------------------------------------------------------------------- | |
------------------------------------------------------------------------------- | |
-- | |
-- An iterator that sorts a table by its keys, in alphabetical order. | |
-- | |
function pairsByKeys ( alphaTable, sortFunction ) | |
local alphaArray = { } | |
--[[ | |
The provided table, 'alphaTable', is a Lua table, which means it's an | |
associative array. AA's store their key-value pairs using hash codes | |
to improve look-up efficiency. One trade-off in using this method is | |
that key-value retrieval (using the 'pairs' iterator) will appear in | |
more or less a "random" order, not alphabetical order. | |
For this application, we want a loop iterator that *will* successively | |
return the key-value pairs in alphabetic order by key. To do this, we | |
will recast 'alphaTable' to make a new Lua 'sequence' (i.e., an ordered | |
array), where the values of the array, 'alphaArray', are the keys of | |
the 'alphaTable' table we were given. | |
Why do this? Because while the lua function 'table.sort' sorts arrays, | |
it only sorts by their values; it can't sort keys. Ergo, we need to | |
define an array whose values are the (unsorted) keys of 'alphaTable'; | |
we can then sort this array to get the sorted keys we need. | |
]]-- | |
for key, _ in pairs( alphaTable ) do | |
alphaArray[ #alphaArray + 1 ] = key | |
end | |
--[[ | |
Now we sort the table keys, using this auxiliary array. The | |
'table.sort' function sorts arrays using a provided comparison | |
function. But note that the default sorting function is '<', which | |
works for alphabetical sorting. So if our iterator was called with | |
no second argument, we'll get an alpha sort, as expected. And if | |
it's called with a function as a second argument, the iterator will | |
return key-value pairs using that alternative sorting scheme instead. | |
Note that 'table.sort' will "sort in place", so no need to define a | |
third array. | |
]]-- | |
table.sort( alphaArray, sortFunction ) | |
--[[ | |
Now that the keys are sorted, we can provide a function that will | |
iterate over the *sorted* order of the keys, returning them & their | |
associated values in the original table. The iterator is a closure, | |
a function that "closes over" three variables that are in the lexical | |
scope of the returned function defined here (but are not actually | |
defined in the function itself): | |
1) The table argument that this function was called with, | |
2) The sorted array used to control the order of retrieval, | |
3) An index to keep track of what to return on successive calls. | |
On each call of the iterator we will pre-increment the index to make | |
detecting the end condition easy; therefore we must start the index | |
at zero. (Recall that arrays in Lua are 1-based, not 0-based.) | |
]]-- | |
local index = 0 | |
--[[ | |
Now we return a nameless function (a 'lambda'). All functions in Lua | |
are actually lambdas, which can be assigned to variables, passed as | |
parameters, and returned from functions. | |
]]-- | |
return function ( ) | |
-- When we've exhausted the table/array, a pre-incremented index | |
-- will cause the table/array access to return 'nil', which also | |
-- signals the client using this iterator that its loop is done. | |
index = index + 1 | |
-- The keys are in alphabetic order in 'alphaArray', while the | |
-- values are in a hashed order in 'alphaTable'. Note that Lua | |
-- is register-based, so the following construct is byte-compiled | |
-- into an efficient executable. | |
return alphaArray[index], alphaTable[ alphaArray[index] ] | |
end | |
end | |
------------------------------------------------------------------------------- | |
------------------------------------------------------------------------------- | |
-- | |
-- An iterator that traverses an input file, returning its words, one by one. | |
-- | |
function allWords ( filename ) | |
-- Open a file of the given name for reading, so we can count its words. | |
-- Note that 'io.input' will raise an error for us, should one occur. | |
io.input( filename ) | |
--[[ | |
Prime the while loop in the returned function below by reading the first | |
line from the file, and have it start searching for words at position 1 | |
(since strings/arrays are 1-based). Note that the default argument for | |
'io.read' is "*l", or "read line". The default file that 'io.read' reads | |
from is set with an earlier call to 'io.input'. | |
]]-- | |
local line = io.read() | |
local pos = 1 | |
--[[ | |
What Lua (and other languages, such as Java) call an "iterator" are | |
really "generators". That is, a "generator" function, such as this, | |
creates and returns an "iterator" function. Clients will call the | |
iterator function, typically in a 'for' loop. | |
Iterators are typically closures; in this case, our iterator closes | |
over the variables 'line' & 'pos', but not the file descriptor and | |
its buffer, which are globals. The variables are defined above in | |
*this* function (which is called only once, to obtain the iterator). | |
None of these variables are defined in the iterator function (closure) | |
being returned (although they could be), yet they are used by the | |
iterator each time it gets called. | |
This is possible because the above variables exist in the closure's | |
lexical scope at define-time (even though they will have already gone | |
out of scope at run-time after this function returns). The value of | |
a language having closures is that these definitions are preserved | |
inside the closure at run-time and are encapsulated ('hidden' from | |
scopes where the function is actually called). | |
So the variables used in the closure are not globals, but they are | |
also not local to the closure; hence, they are "non-local variables". | |
All functions in Lua are actually closures. | |
]]-- | |
return function ( ) | |
-- Enable looping as long as a line is read from the file. Once | |
-- the end of file is reached, 'io.read' will return 'nil', which | |
-- will prevent the 'while' statement from executing. | |
while line do | |
-- Locals 'first' & 'last' delimit the position of each word | |
-- found in the current line of the file. By 'word', we mean | |
-- a contiguous string of alphanumeric characters, a pattern | |
-- matched by "%w+". ("%W+" would match strings of non-word | |
-- characters.) The closure variable 'pos' tells Lua where | |
-- in the string to start matching word characters. | |
-- Note that Lua supports multiple return values (very nice)! | |
-- Also note that we treat 'line' (a string) as an object, | |
-- applying the find' method to it by using the ':' operator. | |
local first, last = line:find( "%w+", pos ) | |
-- Was a word found in this 'line' at or after position 'pos'? | |
-- If so, 'first' and 'last' will be returned as integers. | |
-- If not, they will be 'nil' (which equates to boolean false). | |
if first then | |
-- Advance the position pointer to the next character after | |
-- the word, then extract the word string and return it. | |
pos = last + 1 | |
-- The string function 'sub' extracts a sub-string. | |
return line:sub( first, last ) | |
else | |
-- A word was not found in (the remainder of) the line; so | |
-- read another line & reset the position to the beginning | |
-- of this new line. Then loop to try again. If there are | |
-- no more lines in the file, 'line' will be 'nil', and | |
-- this 'while' loop will terminate. By using a 'while' | |
-- loop, we will skip over multiple blank lines correctly. | |
line = io.read() | |
pos = 1 | |
end | |
end | |
-- On reaching this point, there is nothing left in the file; so | |
-- return 'nil' to signal that the loop has completed. Note that | |
-- this line isn't strictly needed, as a 'return' in Lua without | |
-- an argument will return 'nil' anyway. | |
return nil | |
end | |
end | |
------------------------------------------------------------------------------- | |
------------------------------------------------------------------------------- | |
-- | |
-- Test the 'allWords' iterator (generator) with a simple routine to | |
-- count all occurrences of each word in a text file. | |
-- | |
--[[ | |
We start by getting a file to scan -- let's use ourselves! To do that, | |
we need the Lua equivalent of '${0}' in bash or '__file__' in Python. | |
Lua doesn't have a global variable for this, though, so we need to call a | |
function, 'debug.getinfo', and extract a field that holds the filename. | |
Our call to 'debug.getinfo' below returns the name of the source file ('S') | |
of the function at the first level ('1') of the call stack. Note that the | |
filename will begin with an '@' symbol, indicating it is a filename. By | |
passing '1', we are asking for information about the current function. If | |
we were to pass in '0', it would return "=[C]", since it would be returning | |
information about the 'getinfo' function itself (which is defined in an | |
external C library). More information on this function can be found in the | |
Debug Library chapter in the "Programming in Lua" book... | |
]]-- | |
local thisModule = debug.getinfo( 1, 'S' ) | |
-- Don't forget to snip off the '@' sigil from the obtained filename! | |
-- Passing only one argument to 'sub' causes the second argument to | |
-- default to 'the last character in the string', which allows us to | |
-- use 'sub' to easily discard the first 'n' characters of a string. | |
-- In this case, the string is the 'source' field of 'thisModule'. | |
local targetFile = thisModule.source:sub( 2 ) | |
-- Now collect all the words in the target file as a 'bag of words'. | |
-- Creating a 'bag' requires us to count the number of occurrences of | |
-- each key, where the keys are simply the words we find in the file. | |
WordBag = { } | |
-- To do this, we use the Lua iterator we defined above to retrieve the | |
-- words from the target file, one by one, dropping each into our bag of | |
-- words as we go. | |
for word in allWords( targetFile ) do | |
-- We don't care about capitalization, so make the word all lowercase. | |
word = word:lower() | |
-- If the word hasn't been seen before, create a key with a value of 1. | |
-- Otherwise, increment the value (count) for keys we've seen before: | |
if not WordBag[ word ] then | |
WordBag[ word ] = 1 | |
else | |
WordBag[ word ] = WordBag[ word ] + 1 | |
end | |
end | |
-- Now that we've read through the entire file, our bag of words is full. | |
-- Create a report, sent to 'stdout', formatted as "<word> <count>". The | |
-- report format is controlled by the function 'string.format', where the | |
-- format codes are exactly the same as the 'printf' format codes in the | |
-- C language. | |
outputFormat = "%20s %s" | |
-- And here is the report generation code, starting with a header. | |
print() | |
print( string.format( outputFormat, "Word", "Frequency" ) ) | |
print( string.format( outputFormat, | |
" ----------------", "----------------" ) ) | |
--[[ | |
Now we want to extract each word (key) and its count (value) from our bag of | |
words. However, the order of the words in the bag (i.e., the order we would | |
pull them out, if we use Lua's 'pairs' iterator) is essentially random. We | |
need to interate over the bag contents, sorted by its keys in alphabetical | |
order, printing each key & value. To do this, we use the custom iterator we | |
defined at the top of this module to return the bag contents in the desired | |
order (using the default sort function to obtain an alphabetical order). | |
]]-- | |
for word, count in pairsByKeys( WordBag ) do | |
print( string.format( outputFormat, word, count ) ) | |
end | |
print() | |
-- Note that this report can easily be modified to display the words in the | |
-- order of the frequency with which they appear in the target file. | |
-- (This is left as an exercise for the student... ;^) | |
-- Finally, we close the file. (Not strictly needed, as the OS will do this.) | |
-- Note that 'io.input' when called with no arguments, will return the handle | |
-- of the current input file. | |
io.input():close() | |
------------------------------------------------------------------------------- | |
------------------------------------------------------------------------------- |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment