Bill Katz DocSavage

Basics

B-trees and CPU Caches

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

Total missed: 2

Total guesses: 6158

First guess: spend

aback

⬜⬜🟩🟩🟩: quack

abate

Permanent WSL DNS Fix (WSL 2.2.1+)

If `ping github.com` fails inside WSL with "Temporary failure in name resolution", you're not alone. This has been a long-standing issue, especially when using VPNs or corporate networks.

This issue is now fixed robustly with DNS tunneling, which preserves dynamic DNS behavior and avoids limitations like WSL’s former hard cap of 3 DNS servers in /etc/resolv.conf.

DNS tunneling is enabled by default in WSL version 2.2.1 and later, meaning that if you're still seeing DNS resolution issues, the first and most effective fix is simply to upgrade WSL. Upgrading WSL updates the WSL platform itself, but does not affect your installed Linux distributions, apps, or files.
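Because DNS tunneling is on by default only from WSL 2.2.1 onward, it can help to check the installed version first. The following is a minimal sketch, not part of the original fix. It assumes the output of `wsl --version` contains a line shaped like `WSL version: 2.2.4.0` (the label may differ on non-English Windows installs). The helper names `parse_wsl_version` and `has_default_dns_tunneling` are hypothetical, and the text is passed in as a string rather than captured from the command.

```python
import re
from typing import Optional, Tuple

# Assumption: DNS tunneling is enabled by default from this version, per the text above.
MIN_VERSION = (2, 2, 1)


def parse_wsl_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the WSL version from text shaped like `wsl --version` output.

    Returns a tuple of ints such as (2, 2, 4, 0), or None if no version line is found.
    """
    match = re.search(r"WSL version:\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def has_default_dns_tunneling(text: str) -> bool:
    """Return True if the reported WSL version is at least 2.2.1."""
    version = parse_wsl_version(text)
    # Compare only major.minor.patch; any fourth component is ignored.
    return version is not None and version[:3] >= MIN_VERSION
```

If the check returns False, upgrading WSL as described here is the first thing to try.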

To upgrade WSL, follow these steps:

The repository for the assignment is public, and GitHub does not allow the creation of private forks of public repositories.

The correct way to create a private fork, by duplicating the repo, is documented here.

For this assignment the commands are:

Create a bare clone of the repository. (This is temporary and will be removed so just do it wherever.)

git clone --bare git@github.com:usi-systems/easytrace.git

2015-01-29 Unofficial Relay FAQ

Compilation of questions and answers about Relay from React.js Conf.

Disclaimer: I work on Relay at Facebook. Relay is a complex system on which we're iterating aggressively. I'll do my best here to provide accurate, useful answers, but the details are subject to change. I may also be wrong. Feedback and additional questions are welcome.

What is Relay?

Relay is a new framework from Facebook that provides data-fetching functionality for React applications. It was announced at React.js Conf (January 2015).

	from __future__ import annotations

	from typing import (
	    Any,
	    Dict,
	    Generic,
	    Iterable,
	    Literal,
	    Protocol,
	    TypedDict,
	    TypeVar,
	    Union,
	    runtime_checkable,
	)
	from pydantic import ValidationError
	from pydantic.generics import GenericModel
	from zarr.storage import init_group, BaseStore
	import zarr

	$ go build -x -v .

	# swigtest/ocio
	ocio/ocio.cpp:24:16: error: no viable conversion from 'OCIO::ConstConfigRcPtr' (aka 'shared_ptr<const OpenColorIO::v1::Config>') to 'Config *' (aka 'void *')
	/usr/include/c++/4.2.1/tr1/boost_shared_ptr.h:678:7: note: candidate function

	Pretty print tables summarizing properties of tensor arrays in numpy, pytorch, jax, etc.

	Now on pip! `pip install arrgh` https://github.com/nmwsharp/arrgh