Skip to content

Instantly share code, notes, and snippets.

@clarking
Created May 8, 2023 06:20
Show Gist options
  • Save clarking/ffbc8579ca21b9ec7a3395f61b9aa26f to your computer and use it in GitHub Desktop.
Save clarking/ffbc8579ca21b9ec7a3395f61b9aa26f to your computer and use it in GitHub Desktop.
bytecode-less smalltalk
"(C)1984-1985 Jecel Assumpcao Jr"
"$Id: st84.txt,v 1.1.1.1 1995/06/29 00:14:36 root Exp $"
"This is an implementation of Smalltalk described
in Smalltalk itself. This was designed in late 1984, but
what follows is a translation into English of the
February 26, 1985 version. It is only a paper design
as there was no way to test it then and represents
the first real object oriented design for the author.
A confusing aspect of this design is the *maxLex*
thing that keeps popping up - the author had assumed
that blocks were real closures and compiled temporary
variable references within blocks as (lexical level,
variable index) pairs with the associated complications
in the runtime access. The design is for a sequencial
interpreter, but a dataflow-like parallel implementation
is possible with the same object code format ( which
was one of the reasons why the format was chosen ).
--------------------------------------------------------
The program below is a discription of part of the Smalltalk
system that will be eventually written in machine language
and that interprets compiled methods. It is interesting to
note the resemblence between the static data structures
( templates and blocks ) and the dynamic structures used
by the interpreter. These structures take up more memory
than the bytecodes used in other Smalltalk systems, but
achieve a greater uniformity, which we hope will result
in a higher performance.
Only the methods that help to make clear how the interepreter
works will be shown."
Object subclass: #Template
instanceVariableNames: 'expressions'
classVariableNames: ''
poolDictionaries: ''
category: ''
"This is the basic unit of Smalltalk object code. It
represents a single message as an Array of expressions
containing the selector, the receiver and the arguments
of the message. While the selector is a string, the
receiver and the arguments are themselves Templates,
forming a tree." !
! Template methodsFor: '' !
sender: context1 index: anInteger maxLex: context2
"makes an ( almost ) clone of itself and
saves the specified dynamic environment"
| tmp |
tmp <- MethodContext new.
tmp sender: context1 index: anInteger maxLex: context2.
tmp expressions: expressions shallowCopy.
^ tmp !
saveContext
"the current environment is stored away for later"
| |
^ self sender: CurrentProcess currentContext
index: CurrentProcess index
maxLex: CurrentProcess maxLex !
eval
"creates a new context to determine the
value of this code in the current environment"
| |
CurrentProcess dispatch: self saveContext !!
Template subclass: #ReplyTemplate
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: ''
"This class is for code which starts with the $^
character" !
! ReplyTemplate methodsFor: '' !
sender: context1 index: anInteger maxLex: context2
"remember the home context, rather than
the current one"
| |
^ super sender: context2 sender
index: context2 index
maxLex: context2 !!
Template subclass: #SuperTemplate
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: ''
"This class represents code generated from
messages sent to super" !
! SuperTemplate methodsFor: '' !
sender: context1 index: anInteger maxLex: context2
"the clone is a SuperContext"
| tmp e |
tmp <- SuperContext new.
tmp sender: context1 index: anInteger maxLex: context2.
e <- expressions shallowCopy.
e at: 2 put: context2 receiver.
tmp expressions: e.
^ tmp !!
Template subclass: #SelfTemplate
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: ''
"This class represents code generated from
messages sent to self" !
! SelfTemplate methodsFor: '' !
sender: context1 index: anInteger maxLex: context2
"the receiver is the same as context2 receiver"
| tmp e |
tmp <- MethodContext new.
tmp sender: context1 index: anInteger maxLex: context2.
e <- expressions shallowCopy.
e at: 2 put: context2 receiver.
tmp expressions: e.
^ tmp !!
Template subclass: #HCTemplate
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: ''
"This class represents code generated from
messages sent to the home context" !
! HCTemplate methodsFor: '' !
sender: context1 index: anInteger maxLex: context2
"the receiver is the sending context"
| tmp e |
tmp <- MethodContext new.
tmp sender: context1 index: anInteger maxLex: context2.
e <- expressions shallowCopy.
e at: 2 put: context1.
tmp expressions: e.
^ tmp !!
Template subclass: #LiteralTemplate
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: ''
"This class represents literal objects
in the source code. The expressions
instance variable contains a single
object ( of any kind ) rather than an
Array of Templates" !
! LiteralTemplate methodsFor: '' !
eval
"this is easy: we already know the answer"
| |
CurrentProcess reply: expressions
to: CurrentProcess currentContext
index: CurrentProcess index
maxLex: CurrentProcess maxLex !!
Template subclass: #LiteralReply
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: ''
"This class represents code generated from
a literal object in the source code after
the $^ character" !
! LiteralReply methodsFor: '' !
eval
| |
"the answer goes to the home context sender"
CurrentProcess reply: expressions
to: CurrentProcess maxLex currentContext
index: CurrentProcess maxLex index
maxLex: CurrentProcess maxLex maxLex !!
Template subclass: #Block
instanceVariableNames: 'numTemporaries'
classVariableNames: ''
poolDictionaries: ''
category: ''
"This class represents blocks in the source
code. numTemporaries tells us how much space
to add when cloning. expressions is an Array
of Templates which are evaluated in order and
have all ( but the last ) values thrown away.
Source code methods are also compiled to
blocks." !
! Block methodsFor: '' !
sender: context1 index: anInteger maxLex: context2
"cloning much reserve space for the temporaries -
this could be avoided by merging the temporaries
and expressions Arrays"
| tmp |
tmp <- BlockContext new.
tmp nextLex: context1.
tmp expressions: expressions. "no need to copy
them as they cannot
be changed ( the
returned values are
thrown away"
tmp temporaries: ( Array new: numTemporaries ).
^ tmp !
eval
"evaluating a block results in a BlockContext"
| |
CurrentProcess reply: self saveContext
to: CurrentProcess currentContext
index: CurrentProcess index
maxLex: CurrentProcess maxLex !!
Object subclass: #Context
instanceVariableNames: 'sender index maxLex expressions'
classVariableNames: ''
poolDictionaries: ''
category: ''
"This represents the current state of a computation" !
! Context methodsFor: '' !
sender: context1 index: anInteger maxLex: context2
"initialize"
| |
sender <- context1.
index <- anInteger.
maxLex <- context2. !
expressions: anArray
| |
expressions <- anArray !
at: int put: value
| |
expressions at: int put: value !
lexicalLevel: anInt
"normal Contexts may nest in the same
lexical level"
| |
^ sender lexicalLevel: anInt !!
Context subclass: #BlockContext
instanceVariableNames: 'temporaries nextLex'
classVariableNames: ''
poolDictionaries: ''
category: ''
"This class represents the dynamic environment
of the evalutation of a block" !
! BlockContext methodsFor: '' !
temporaries: anArray
| |
temporaries <- anArray !
nextLex: context
| |
nextLex <- context !
value
"starts the evaluation of the block"
| |
sender <- CurrentProcess currentContext.
index <- CurrentProcess index.
maxLex <- CurrentProcess maxLex.
CurrentProcess dispatch: self !
value: arg
| |
temporaries at: 1 put: arg.
self value !
value: arg1 with: arg2
| |
temporaries at: 2 put: arg2.
self value: arg1 !
evalAt: anInt
"evaluate one of the block's expressions"
| |
( anInt = expressions size )
ifTrue: [ CurentProcess return ] "replies
to whom I should reply"
ifFalse: [ ( expressions at: anInt ) eval ] !
temporary: anInt
| |
^ temporaries at: anInt !
temporary: anInt put: obj
| |
temporaries at: anInt put: obj !
at: anInt put: obj
| |
^ self "just ignore it - blocks throw aways results" !
lexicalLevel: anInt
"a block always starts a new lexical level"
| |
( anInt = 1 ) ifTrue: [ ^ self ]
ifFalse: [ ^ nextLex lexicalLevel: anInt - 1 ] !!
Context subclass: #MethodContext
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: ''
"stores the dynamic environment of evaluating a message.
There are two stages:
1) the receiver and arguments must be evaluated
2) the message must be sent ( the associated
method is found and this becomes the home
context of that method as it is evaluated )" !
! MethodContext methodsFor: '' !
evalAt: anInt
| |
( anInt > expressions size )
ifTrue: [ CurentProcess send ]
ifFalse: [ ( expressions at: anInt ) eval ] !
receiverClass
| |
^ self receiver class !
receiver
| |
^ expressions at: 2 !
selector
| |
^ expressions at: 1 !
argument: anInt
| |
^ expressions at: anInt + 2 !!
MethodContext subclass: #SuperContext
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: ''
"Kludge to implement super" !
! SuperContext methodsFor: '' !
receiverClass
"this dynamic lookup is very inefficient and only
must be done because a messages holder is not
stored anywhere"
| |
^ ( maxLex receiver class superclassWhichHas: maxLex selector )
superclass !!
Object subclass: #Interpreter
instanceVariableNames: 'currentContext index maxLex'
classVariableNames: ''
poolDictionaries: ''
category: ''
"This represent a single thread of execution. The global
variable CurrentProcess is always an object of this
class" !
! Interpreter methodsFor: '' !
currentContext
| |
^ currentContext !
index
| |
^ index !
maxLex
| |
^ maxLex !
step
"evaluate the next thing"
| |
index <- index + 1.
currentContext evalAt: index !
dispatch: newContext
| |
"starts an additional level - like the
colon in FORTH"
currentContext <- newContext.
index <- 0.
self step !
return
"ends a level of evaluation - like the
semicolon in FORTH"
| |
index <- currentContext index.
maxLex <- currentContext maxLex.
currentContext <- currentContext sender !
reply: value to: context1 index: anInt maxLex: context2
"replaces a context subtree with the computed
result"
| |
index <- anInt.
maxLex <- context2.
currentContext <- context1.
currentContext at: index put: value.
self step !
send
"starts the second phase of MethodContext evaluation"
| method value |
method <- self lookFor: currentContext selector
in: currentContext receiverClass.
method isNil ifTrue: [ self sendMessageNotUnderstood ]
ifFalse: [ value <- PrimitiveFailed.
method primitive isNil
ifFalse: [ value <- self doPrimitive:
method primitive ].
( value = PrimitiveFailed )
ifFalse: [
self reply: value
to: currentContext sender
index: currentContext index
maxLex: currentContext maxLex
]
ifTrue: [
maxLex <- currentContext.
method compiled eval
] ] !
lookFor: selector in: class
"not really needed - just to speed up things"
| value |
Cache includes: selector and: class
ifTrue: [ value <- Cache at: selector and: class ]
ifFalse: [ value <- class lookup: selector.
value isNil
ifFalse: [ Cache at: selector and: class
put: value ] ]
^ value !!
"-----------------------------------------------
To see how this might work, it is important to know
what the compiler output looks like. On page 546 of
the Blue Book ( Smalltalk-80: The Language and Its
Implementation ) we find this example:
merge: aRectangle
| minPoint maxPoint |
minPoint <- origin min: aRectangle origin.
maxPoint <- corner max: aRectangle corner.
^ Rectangle origin: minPoint
corner: maxPoint
which gives as the bytecodes:
#( 0 16 209 224 105 1 16 211 226 106 69 17 18 224 124 )
and the literal frame:
#( min: origin max: corner origin:corner: Association(Rectangle,...) )
For our interpreter we would have:
Block(2,#(HCTemplate(#(LiteralTemplate('temporary:put:')
nil
LiteralTemplate(1)
Template(#(LiteralTemplate('min:')
SelfTemplate(#(LiteralTemplate('basicAt:')
nil
LiteralTemplate(1)
))
Template(#(LiteralTemplate('origin')
HCTemplate(#(LiteralTemplate
('argument:')
LiteralTemplate(1)
))
))
))
HCTemplate(#(LiteralTemplate('temporary:put:')
nil
LiteralTemplate(2)
Template(#(LiteralTemplate('max:')
SelfTemplate(#(LiteralTemplate('basicAt:')
nil
LiteralTemplate(2)
))
Template(#(LiteralTemplate('corner')
HCTemplate(#(LiteralTemplate
('argument:')
LiteralTemplate(1)
))
))
))
ReplyTemplate(
#(LiteralTemplate('origin:corner:')
Template(#(LiteralTemplate('value')
LiteralTemplate(Association(Rectangle,...)
))
HCTemplate(#(LiteralTemplate('temporary:')
LiteralTemplate(1)
))
HCTemplate(#(LiteralTemplate('temporary:')
LiteralTemplate(2)
))
))
)
)
Now this includes 38 Template objects, which would consume a
huge amount of memory ( plus the 14 Array objects ). Of these,
14 LiteralTemplate objects are there only to quote the selector
name - if the interpreter were modified to skip selector evaluation
they could be eliminated. This would still make the memory requirements
of this implementation several times that of a bytecoded one. If
the compiler cached objects like:
HCTemplate(#(LiteralTemplate('temporary:')
LiteralTemplate(2)
))
which occur repeatedly, the actual code expansion could be less than
100%. Also, the interpreter could short circuit evaluation when
encountering such common templates.
The reason for all this mess is that a bytecoded implementation of
Smalltalk didn't seem object-oriented enough. This proposal has
message passing at the bottom ( to use a Self term ) and the compiled
code itself is made up of objects. It was hoped that such an
implementation would be easier to render into hardware."
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment