@kenwebb
Last active December 29, 2024 02:19
Experimenting with GPT and LLM using JavaScript
<!doctype html>
<!-- file:///home/ken/nodespace/gpt-tokenizer/index.html -->
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta name="author" content="Ken Webb et al" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<script src="https://unpkg.com/gpt-tokenizer"></script>
<script>
const { encode, decode, EndOfText } = GPTTokenizer_cl100k_base;
</script>
<title>Testing gpt-tokenizer</title>
</head>
<body>
<h3>Testing gpt-tokenizer.</h3>
<p>See the Dev Tools console for results.</p>
<script>
const one = 'Hello, world';
console.log(one, GPTTokenizer_cl100k_base.encode(one)); // OK
const two = "Hello, world, 123, 567";
console.log(two, encode(two)); // OK
const text = `Hello, do you like tea? ${EndOfText} In the sunlit terraces\nof someunknownPlace.`;
console.log(text); // OK
const allowedSpecialTokens = new Set([EndOfText]);
const encoded = encode(text, {allowedSpecialTokens});
console.log(encoded); // OK
const decoded = decode(encoded);
console.log(decoded); // OK
</script>
</body>
</html>
<?xml version="1.0" encoding="UTF-8"?>
<!--Xholon Workbook http://www.primordion.com/Xholon/gwt/ MIT License, Copyright (C) Ken Webb, Sat Dec 28 2024 21:18:00 GMT-0500 (Eastern Standard Time)-->
<XholonWorkbook>
<Notes><![CDATA[
Xholon
------
Title: Experimenting with GPT and LLM using JavaScript
Description:
Url: http://www.primordion.com/Xholon/gwt/
InternalName: 4f689acdc63eb64ac9ec8a1bdffe6478
Keywords:
My Notes
--------
26 Dec 2024
I am reading Sebastian Raschka's well-written book [ref 0] on LLMs. He uses Python for examples.
I would like to be able to use his code, or equivalent code, in Xholon. This will help me to learn the material in his book, and I can experiment with how AI can be used in Xholon apps.
Possible approaches include:
- using equivalent JavaScript or TypeScript libraries
- running the Python code from the book using a local web server
- ???
http://127.0.0.1:8080/wb/editwb.html?app=Experimenting+with+GPT+and+LLM+using+JavaScript&src=lstr
Refs 0 and 4 both reference OpenAI tiktoken
see my rip folder: ~/nodespace/gpt-tokenizer
### References
(0) Sebastian Raschka, Build a Large Language Model (From Scratch), Manning, 2024
2.5 Byte pair encoding
Let’s look at a more sophisticated tokenization scheme based on a concept called byte
pair encoding (BPE). The BPE tokenizer was used to train LLMs such as GPT-2, GPT-3,
and the original model used in ChatGPT.
Since implementing BPE can be relatively complicated, we will use an existing
Python open source library called tiktoken (https://github.com/openai/tiktoken), which
implements the BPE algorithm very efficiently based on source code in Rust.
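The merge idea behind BPE can be sketched in a few lines of JavaScript. This is a toy illustration only (it works on the characters of a single string and merges the first most-frequent adjacent pair a few times), not tiktoken's or gpt-tokenizer's actual byte-level implementation:

```javascript
// Toy illustration of the core BPE idea: repeatedly merge the most
// frequent adjacent symbol pair into a new, larger symbol.

function mostFrequentPair(symbols) {
  const counts = new Map();
  for (let i = 0; i < symbols.length - 1; i++) {
    // join with NUL so the pair key is unambiguous
    const key = symbols[i] + '\u0000' + symbols[i + 1];
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  let best = null, bestCount = 0;
  for (const [key, n] of counts) {
    if (n > bestCount) { best = key; bestCount = n; }
  }
  return best ? best.split('\u0000') : null;
}

function mergePair(symbols, [a, b]) {
  const out = [];
  let i = 0;
  while (i < symbols.length) {
    if (i < symbols.length - 1 && symbols[i] === a && symbols[i + 1] === b) {
      out.push(a + b); // merge the pair into one symbol
      i += 2;
    } else {
      out.push(symbols[i]);
      i += 1;
    }
  }
  return out;
}

// Start from the individual characters of a tiny "corpus".
let symbols = Array.from('low lower lowest');
for (let step = 0; step < 3; step++) {
  const pair = mostFrequentPair(symbols);
  if (!pair) break;
  symbols = mergePair(symbols, pair);
}
console.log(symbols); // after 3 merges, 'low' has become a single symbol
```

After a few merges the common substring "low" becomes a single symbol; a real tokenizer runs many thousands of such merges over bytes and stores the result as a fixed vocabulary.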
(1) search github: llm javascript
(2) search github: gpt javascript
(3) https://www.npmjs.com/search?q=gpt
(4) https://github.com/niieani/gpt-tokenizer
) https://www.npmjs.com/package/gpt-tokenizer
) https://github.com/niieani/gpt-tokenizer#readme
gpt-tokenizer is a Token Byte Pair Encoder/Decoder supporting all OpenAI's models (including GPT-3.5, GPT-4, GPT-4o, and o1).
It's the fastest, smallest and lowest footprint GPT tokenizer available for all JavaScript environments.
It's written in TypeScript.
As of 2023, it is the most feature-complete, open-source GPT tokenizer on NPM.
This package is a port of OpenAI's tiktoken, with some additional, unique features sprinkled on top:
(5) https://gpt-tokenizer.dev/
Welcome to gpt-tokenizer playground!
The most feature-complete GPT token encoder/decoder with support for OpenAI models: o1, GPT-4o and GPT-4, GPT-3.5 and others.
(6) https://github.com/openai/tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
The open source version of tiktoken can be installed from PyPI
Example code using tiktoken can be found in the OpenAI Cookbook.
What is BPE anyway?
Language models don't see text like you and I, instead they see a sequence of numbers (known as tokens).
Byte pair encoding (BPE) is a way of converting text into tokens.
It has a couple desirable properties:
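The "sequence of numbers" point is easy to see even before any BPE merges: at the UTF-8 level a string is already numbers. A minimal, library-free illustration:

```javascript
// A string is already a sequence of numbers at the byte level;
// BPE then merges frequent byte pairs into larger token ids.
const bytes = Array.from(new TextEncoder().encode('Hello'));
console.log(bytes); // byte values, e.g. 'H' is 72
```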
(7) https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
(8) https://github.com/openai/openai-node
Official JavaScript / TypeScript library for the OpenAI API
latest release was 5 days ago
) www.npmjs.com/package/openai
]]></Notes>
<_-.XholonClass>
<!-- domain objects -->
<PhysicalSystem/>
<Block/>
<Brick/>
<!-- quantities -->
<Height superClass="Quantity"/>
</_-.XholonClass>
<xholonClassDetails>
<Block>
<port name="height" connector="Height"/>
</Block>
</xholonClassDetails>
<PhysicalSystem>
<Block>
<Height>0.1 m</Height>
</Block>
<Brick multiplicity="2"/>
</PhysicalSystem>
<Blockbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[
var a = 123;
var b = 456;
var c = a * b;
if (console) {
console.log(c);
}
//# sourceURL=Blockbehavior.js
]]></Blockbehavior>
<Heightbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[
var myHeight, testing;
var beh = {
postConfigure: function() {
// pick a random test value and cache the parent Height node
testing = Math.floor(Math.random() * 10);
myHeight = this.cnode.parent();
},
act: function() {
// each timestep, write this behavior's state to the Height node
myHeight.println(this.toString());
},
toString: function() {
return "testing:" + testing;
}
}
//# sourceURL=Heightbehavior.js
]]></Heightbehavior>
<Brickbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[
$wnd.xh.Brickbehavior = function Brickbehavior() {}
$wnd.xh.Brickbehavior.prototype.postConfigure = function() {
this.brick = this.cnode.parent();
this.iam = " red brick";
};
$wnd.xh.Brickbehavior.prototype.act = function() {
this.brick.println("I am a" + this.iam);
};
//# sourceURL=Brickbehavior.js
]]></Brickbehavior>
<Brickbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[
console.log("I'm another brick behavior");
]]></Brickbehavior>
<SvgClient><Attribute_String roleName="svgUri"><![CDATA[data:image/svg+xml,
<svg width="100" height="50" xmlns="http://www.w3.org/2000/svg">
<g>
<title>Block</title>
<rect id="PhysicalSystem/Block" fill="#98FB98" height="50" width="50" x="25" y="0"/>
<g>
<title>Height</title>
<rect id="PhysicalSystem/Block/Height" fill="#6AB06A" height="50" width="10" x="80" y="0"/>
</g>
</g>
</svg>
]]></Attribute_String><Attribute_String roleName="setup">${MODELNAME_DEFAULT},${SVGURI_DEFAULT}</Attribute_String></SvgClient>
</XholonWorkbook>