Huifeng Chen josherich

@josherich
josherich / a.md
Created July 3, 2025 17:10
A Galileo moment for LLM design

OP

A Galileo Moment for LLM Design

Introduction

This document discusses significant advances in Large Language Model (LLM) architecture design, drawing parallels to pivotal moments in the history of science, such as the Leaning Tower of Pisa experiment that catalyzed modern physics. Our findings reveal the true limits of LLM architectures through a controlled synthetic pretraining environment, marking a potential turning point that may divide LLM research into “before” and “after.”

Read more about Architecture Design and the Magic of Canon Layers
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers. Joint work with Alberto Alfarano


@josherich
josherich / a.md
Created July 2, 2025 21:05
Approximating Language Model Training Data from Weights

OP

New Research: Approximating Language Model Training Data from Weights

Introduction

This document outlines the findings of recent research focused on understanding the amount of information available in open-weights models, specifically the DeepSeek R1 weights, which amount to 1.2 TB. The central question of this research is: What can we learn from all those bits?

Methodology

Our approach involves a novel method that reverses the fine-tuning of large language models (LLMs) to recover data. The following images illustrate key concepts of our methodology:

Model Weights

@josherich
josherich / a.md
Created June 23, 2025 18:13
Unpopular opinion: The 3x+1 Conjecture might be False!

OP

Unpopular Opinion: The 3x+1 Conjecture Might Be False

Introduction

In this document, I present an argument that challenges the widely accepted belief that the 3x+1 Conjecture is true. This conjecture, also known as the Collatz Conjecture, posits that starting with any positive integer, repeatedly applying the map that halves an even number and sends an odd number x to 3x+1 will eventually lead to the number 1.
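
As a minimal sketch of the iteration the conjecture describes, the following counts how many applications of the map it takes to reach 1. The names `collatzStep`, `collatzSteps`, and the `maxSteps` cap are illustrative assumptions and do not come from the original post.

```javascript
// Illustrative helper names; not taken from the original post.

// One application of the 3x+1 map: halve an even n, send an odd n to 3n + 1.
// BigInt is used so that large intermediate values do not lose precision.
function collatzStep(n) {
  return n % 2n === 0n ? n / 2n : 3n * n + 1n;
}

// Count the steps until the trajectory first reaches 1.
// maxSteps caps the loop because termination for every start value is
// exactly what the conjecture asserts and has not been proven.
function collatzSteps(start, maxSteps = 100000) {
  let n = BigInt(start);
  if (n < 1n) throw new RangeError('start must be a positive integer');
  let steps = 0;
  while (n !== 1n) {
    if (steps >= maxSteps) return null; // gave up; this proves nothing either way
    n = collatzStep(n);
    steps += 1;
  }
  return steps;
}
```

For example, `collatzSteps(6)` follows 6, 3, 10, 5, 16, 8, 4, 2, 1 and returns 8. A `null` result only means the cap was hit, not that the trajectory diverges.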

Background

In my first paper, co-authored with Y. Sinai in 2002, I demonstrated that the paths generated by the 3x+1 function can be modeled as a geometric Brownian motion in a precise asymptotic sense, with a drift of log(3/4) < 0. This finding suggests that typical trajectories exhibit a decay pattern, supporting the previously established fact that almost every initial seed eventually reaches a value below itself. However, this process cannot be iterated indefinitely, as the paths may diverge into very sparse trajectories.
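
The sign and size of the drift can be motivated by a standard heuristic. The sketch below assumes (this is not stated in the post) that the map is taken in its odd-to-odd "Syracuse" form and that the power of 2 dividing 3x+1 behaves like an independent geometric random variable; it is a back-of-the-envelope calculation, not the argument of the 2002 paper.

```latex
% Heuristic sketch; the geometric/independence model is an assumption, not a theorem.
% Syracuse form on odd x: T(x) = (3x+1)/2^{k(x)}, where 2^{k(x)} is the
% largest power of 2 dividing 3x+1.
\[
  \log T(x) - \log x \approx \log 3 - k(x)\log 2,
  \qquad \Pr[k = j] = 2^{-j}, \quad j = 1, 2, \dots
\]
\[
  \mathbb{E}[k] = \sum_{j \ge 1} j\,2^{-j} = 2
  \quad\Longrightarrow\quad
  \mathbb{E}\bigl[\log T(x) - \log x\bigr] \approx \log 3 - 2\log 2 = \log\tfrac{3}{4} < 0 .
\]
```

Under this model each odd-to-odd step shrinks the value by a factor of about 3/4 on average in the logarithmic scale, which is consistent with the negative drift quoted above.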

Personal Reflecti

OP

Introducing Log-Linear Attention

In the realm of attention mechanisms, we are familiar with traditional Attention and its linear-time variants, such as linear attention and State Space Models. However, what exists in the intermediate space between these two paradigms? This document introduces Log-Linear Attention, a novel approach that offers significant advantages in both computational and memory efficiency.

Features of Log-Linear Attention

Log-Linear Attention is characterized by the following key features:

  • Log-linear time training
@josherich
josherich / a.md
Created May 14, 2025 04:07
Continuous Thought Machines

OP

Continuous Thought Machines: A New Frontier in Neural Architecture

Introduction

We are pleased to announce the release of our new research paper titled "Continuous Thought Machines" (CTMs). This work explores the significant role of timing and synchronization in neuronal computation, aspects that have been largely overlooked in contemporary neural networks. Our hypothesis is that neural timing is essential for the flexibility and adaptability observed in biological intelligence.

Proposed Neural Architecture

We introduce a novel architecture, the Continuous Thought Machine (CTM), designed from the ground up to use neural dynamics as a fundamental representation of intelligence. Because neural dynamics are a core component, CTMs can perform adaptive computation naturally.

@josherich
josherich / a.md
Created May 14, 2025 03:59
Understanding DSPy

OP

Understanding DSPy: Key Insights and Principles

DSPy represents a significant advancement in the field of AI software development. However, its complexity can make it challenging to fully comprehend. This document aims to clarify the foundational principles of DSPy and outline its core tenets.

Introduction

The central thesis of DSPy is that while large language models (LLMs) and their methodologies will continue to evolve, this progress will not be uniform across all dimensions. Therefore, it is essential to identify:

  • The minimal set of fundamental abstractions that enable the development of downstream AI software that is "future-proof" and capable of adapting to advancements.
@josherich
josherich / a.md
Created May 1, 2025 16:11
Vibe Coding Higher Quality Code

OP

https://www.all-hands.dev/blog/vibe-coding-higher-quality-code

Enhancing Code Quality While Utilizing Coding Agents

Introduction

In the realm of software development, the integration of coding agents has become increasingly prevalent. A question often posed is: How can we effectively vibe with coding agents while ensuring high standards of code quality?

@josherich
josherich / a.md
Created April 29, 2025 14:26
Automated Capability Discovery

OP

Introducing Automated Capability Discovery (ACD)

Automated Capability Discovery (ACD) is an innovative tool designed to automatically identify surprising new capabilities and failure modes in foundation models. This is achieved through a process known as "self-exploration," wherein models explore their own abilities.

Leadership: ACD is spearheaded by @cong_ml and @shengranhu.

Automated Capability Discovery

Capability Reporting

@josherich
josherich / australian-voting.md
Created April 29, 2025 06:31
Australian voting system

OP

Australian Voting System MEGA Thread

Introduction

This document serves to clarify various aspects of the Australian voting system, addressing common misconceptions and providing accurate information. In an era rife with misinformation, it is essential to equip ourselves with the correct knowledge so that we may share it with others.


Overview of the Preferential Voting System

@josherich
josherich / fortunes-foundation-a-star-solver.js
Created April 27, 2025 21:56
A Star Solver for Zachtronics Solitaire Collection Fortune's Foundation
import assert from 'node:assert';
class HeapQueue {
  constructor(cmp) {
    this.cmp = (cmp || function(a, b){ return a - b; });
    this.length = 0;
    this.data = [];
  }
  size() {
    return this.length;