In this article, I will introduce iterators and discuss how they offer an intuitive interface for addressing a diverse set of programming problems. Specifically, I'll explore the potential application of iterators in Rebol, highlighting how they complement Rebol's elegant and expressive approach to problem-solving.
Iterators are mechanisms used to traverse complex data structures without exposing their internal composition. They provide a standard interface for accessing the component parts of those structures sequentially, significantly simplifying the code built around them. Iterators typically share the following characteristics:
- Encapsulation: They encapsulate the domain logic of a given data structure, meaning you do not need to learn that structure to use it.
- Sequential Access: Each call presents the next logical component from that domain.
- State Maintenance: They maintain their internal state between calls, always being ready for the next iteration.
- Completion: This continues until all components have been traversed and the iteration is complete.
Additionally, just as iterators provide incremental access to data, the amount of data needed in memory can be limited to what is required for each iteration. Depending on their implementations, iterators can offer access to data and domains that are as efficient as they are expressive.
The Iterator API and iterators described below utilize my Iterate module requiring both Rebol 3 by Oldes and my modules for that version of Rebol. Assuming both are installed, all these examples require is initiation of the Iterate module: import r3:rgchris:iterate.
The ‘DOM’ iterator walks a tree structure representing XML content. This iterator returns single word events for each stage of tree traversal: OPEN, CLOSE, EMPTY, and TEXT.
The ‘DOM’ iterator uses blocks created by the default XML module:
import r3:xml
tree: load/as "<a><b/><c/><d><e/></d>f</a>" 'xml
This should be equivalent to:
tree: [
document #[] [
[
"a" _ [
["b" _ _]
["c" _ _]
[
"d" _ [
["e" _ _]
]
]
"f"
]
]
]
]
Creating the iterator is relatively intuitive. ITERATE/NEW takes two arguments: the object containing specific iteration instructions for a given domain, and the values to be navigated.
document: iterate/new iterators/dom tree
This iterator is ready to go.
The iterator creates events as it navigates the DOM block. These events can be processed sequentially responding to the event name. Note that following each event, the properties of the iterator are updated to reflect the current position. In the case of the <d> node, the DOM iterator has a close method that closes the current node without walking through its children, thus skipping the <e> sub-node. Once the iterator has concluded walking the tree structure, it returns NONE, thus ending the WHILE loop.
name-of: :first
; shorthand used for clarity
collect [
while [
iterate/next document
][
switch document/event [
open [
keep to tag! name-of document/node
; we're going to skip children of <d>
;
if "d" == name-of document/node [
iterate/close document
; this only closes the 'd' tag, not the iterator
]
]
close [
keep to tag! rejoin [
#"/" name-of document/node
]
]
empty [
keep to tag! rejoin [
name-of document/node #"/"
]
]
text [
keep document/node
; nodes following a text event are strings
]
]
]
]
The result of the COLLECT wrapper is a block of tags and strings, a linear representation of the tree's structure:
[<document> <a> <b/> <c/> <d> </d> "f" </a> </document>]
This short example demonstrates how an iterator might be useful in serializing a nested tree structure.
Another feature of the DOM iterator is that the hierarchial collection of parent nodes are available for each event. This can be examined and tested against.
document: iterate/new iterators/dom tree
while [
iterate/next document
][
if all [
'empty = document/event
"e" == document/node/1
][
path: collect [
keep document/node/1
foreach node document/path [
keep node/1
]
]
break
]
]
path
; ["e" "d" "a" document]
Path evaluation can be useful for extracting content based on a given set of criteria.
The String iterator steps through a block of strings as if they were concatenated returning chunks of a given length. Though perhaps not the most elegant way to work with strings, this iterator is useful for simulating more complicated content streams, for example receiving network responses in chunks, or portions of large files.
In this example, a block of strings of varying lengths are processed and strings of a fixed length (the size specified by the first value in the block) are returned as events.
text: iterate/new iterators/string [
8 "1234567890" "" "12345678" "90"
]
collect [
while [
iterate/next text
][
keep text/chunk
]
]
; ["12345678" "90123456" "7890"]
In this example, the iterator returns NONE when all input is exhausted, though can resume if more text is added to the source block.
The Values iterator walks through a nested structure of Rebol values. This iterator returns OPEN or CLOSE events when it encounters blocks or objects, otherwise it returns a word value corresponding to the datatype of the current value. The Values iterator streamlines the process of traversing complex data structures, trivializing operations such as serialization.
In this example, we will return a block of values to its serialized form. Although this specific example could be handled just as easily by using MOLD, the iterative approach allows for intervention in the process to customize the process.
walker: iterate/new iterators/values [
1
#(object! [one: 1 two: #["two" 2]])
[one]
two #(none)
]
spacer: ""
openers: #[
#(block!) "["
#(map!) "#["
#(object!) "@["
#(paren!) "("
]
closers: #[
#(block!) #"]"
#(map!) #"]"
#(object!) #"]"
#(paren!) #")"
]
rejoin collect [
while [
iterate/next walker
][
if walker/name [
keep spacer
either word? walker/name [
keep mold to set-word! walker/name
][
keep mold walker/name
]
spacer: #" "
]
switch/default walker/event [
open [
keep spacer
keep select openers type-of walker/value
spacer: ""
]
close [
keep select closers type-of walker/value
spacer: #" "
]
none! [
keep spacer
keep #"_"
spacer: #" "
]
][
keep spacer
keep mold walker/value
spacer: #" "
]
]
]
Just for fun, this example uses custom notation (@[...]) for objects to demonstrate the versatility of this approach:
[1 @[one: 1 two: #["two" 2]] [one] two _]
This level of fine-grained control along with the availability of rich metadata on each event lends to a quite expressive style of coding.
Note: As the iterator walks through a map or object value, it steps through names and values together, both of which are available within the iterator object.
Caveat: at this time the iterator doesn't handle circular values.
The Folders iterator is one of my favourites and addresses a common need: navigating deeply through the contents of a filesystem folder. Amongst the applications are preparing a folder for archive, renaming, filtering, or deleting specific files, or performing a deep search. As the iterator steps through the filesystem, each file or subfolder is encountered with its full path, path relative to the provided folder, and just the filename. Folders generate both an OPEN and CLOSE event, the latter signaling completion of handling of that particular folder. Like the DOM example above, folders can be skipped with the ITERATE/CLOSE method.
A list of a folders contents can be obtained through iteration:
folder: iterate/new iterators/folders %my-folder/
collect [
while [
iterate/next folder
][
if find [open file error] folder/event [
keep folder/path
]
]
]
Note: An ERROR event signals a subfolder whose content could not be ascertained.
Or maybe you want to zap all of those .DS_Store files:
folder: iterate/new iterators/folders %project/
while [
iterate/next folder
][
if folder/file == %.DS_Store [
delete folder/full
]
]
The Folders iterator offers a quite elegant tool to take stock of a folder's contents. There's scope for refinement too: e.g. recognition of symbolic links would be of use (not least to identify potential circular structures).
The iterators as described in this document are experimental and relatively primitive. Each of the concepts could use further development to ensure their utility and rigor. My intent is to at least demonstrate the Iterator concept and potential usefulness. Though I'd also like to encourage some measure of scrutiny with a view to integrating the concept at a lower level in the Rebol environment.
*Note: It's more than likely that the PORT! datatype was deviced in part to accommodate
I've labelled this document 'Part I,' I do have further thoughts that I'll try to put together in 'Part II.'
My own thinking on iterators has focused on compatibility of ITERATOR! with existing Rebol iteration methods. This is to say that plain old series (like a BLOCK! or a GROUP!) would function as iterators, as well as objects that respond to things like NEXT and BACK.
To get there, I've been thinking NEXT and BACK would default to giving back a copy of the advanced iterator. But then be able to modify it directly by passing a variable.
In Ren-C the "pass a variable" is more obvious syntactically than passing a WORD! by quoting, because $var evaluates to
varbound, while 'var evaluates tovar(with whatever the binding originally was, usually unbound if scanned source). So if you see next $var you know you're passing a variable-with-binding.While @hiiamboris proposes using GET to read an iterator, GET and SET already have meanings for things like BLOCK! (and GROUP!) that I'd prefer not to interfere with.
For that reason (and others), I feel like interacting with iterators via PICK (and POKE) makes more sense. Then AT could be repurposed as a specialization of "PICK 1", with PUT as a specialization of "POKE 1".
This also means passing a variable to AT could be taken to mean you want to NEXT the iterator at the same time, which is the most common semantic desired:
As an alternate to NOT TAIL?, I think MORE? is probably the most literate logical test for telling you if an iterator can respond to AT without failing (although AT? is maybe more precise...given that an iterator can be at the tail and still have "more" if you're enumerating it backwards...but a bias to forward iteration is fair here, I think).
Anyway, that's the jumping-off point for my own pursuits. I am also considering the question of whether
@itershould evaluate the same asat iteror keep its current purpose. (Given how relatively nice AT and PUT look, I'm becoming less inclined to take away the current behavior of@xxxfor this feature...but I haven't fully ruled it out yet.)