Goals: add links that give reasonable, well-grounded explanations of how things work. No hype and, if possible, no vendor content. Practical first-hand accounts of models in prod are eagerly sought.
import tiktoken
import langdetect  # imported in the original snippet; not used in this excerpt

T = tiktoken.get_encoding("o200k_base")

# map each token id to the character length of its decoded string
length_dict = {}
for i in range(T.n_vocab):
    try:
        length_dict[i] = len(T.decode([i]))
    except Exception:
        # a few ids in the range have no vocab entry and raise on decode; skip them
        pass
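The snippet above builds a map from token id to decoded-string length for the o200k_base encoding. As a quick sanity check, a minimal follow-up sketch (assuming the `T` and `length_dict` built above) that lists the ten longest tokens in the vocabulary:

longest = sorted(length_dict.items(), key=lambda kv: kv[1], reverse=True)[:10]
for token_id, length in longest:
    print(token_id, length, repr(T.decode([token_id])))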
The repository for the assignment is public, and GitHub does not allow private forks of public repositories.
The correct way to create a private fork by duplicating the repository is documented here.
For this assignment the commands are:
git clone --bare git@github.com:usi-systems/easytrace.git
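That first command only makes a bare local mirror. A sketch of the remaining steps of the same documented mirror-push procedure, assuming you have already created an empty private repository (easytrace-private under your own account is a placeholder name):

cd easytrace.git
git push --mirror git@github.com:<your-username>/easytrace-private.git
cd ..
rm -rf easytrace.git
git clone git@github.com:<your-username>/easytrace-private.git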
Latency Comparison Numbers (~2012)

| Operation | Latency | Notes |
| --- | --- | --- |
| L1 cache reference | 0.5 ns | |
| Branch mispredict | 5 ns | |
| L2 cache reference | 7 ns | 14x L1 cache |
| Mutex lock/unlock | 25 ns | |
| Main memory reference | 100 ns | 20x L2 cache, 200x L1 cache |
| Compress 1K bytes with Zippy | 3,000 ns | 3 us |
| Send 1K bytes over 1 Gbps network | 10,000 ns | 10 us |
| Read 4K randomly from SSD* | 150,000 ns | 150 us, ~1 GB/sec SSD |
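A common way to build intuition for these magnitudes is to rescale them so the fastest operation takes about a second; a minimal Python sketch using only the values from the table above:

# latencies from the table, in nanoseconds
latencies_ns = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD": 150_000,
}

# rescale so an L1 cache reference takes one "second"
scale = 1.0 / latencies_ns["L1 cache reference"]
for op, ns in latencies_ns.items():
    print(f"{op:<40} {ns * scale:>12,.0f} rescaled seconds")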