Skip to content

Instantly share code, notes, and snippets.

@traviskaufman
Created January 11, 2025 10:51
Show Gist options
  • Save traviskaufman/24cad594f5d08d00425e37b7a307ecfe to your computer and use it in GitHub Desktop.
Save traviskaufman/24cad594f5d08d00425e37b7a307ecfe to your computer and use it in GitHub Desktop.
Mindlang proof of concept (GPT o1-mini 2024-08)

Module: Source Code Parser

A Source Code Parser is responsible for taking an entrypoint file which specifies the start of a program, and parsing it into a source tree, which is a collection of every file which is represented in the source file, along with its dependencies.

The files themselves are called modules. They are formatted as standard Markdown files, following this format: https://daringfireball.net/projects/markdown/syntax and thus have the extension .md. The way each module should be parsed is as follows:

  • The text in the files which are not links constitute the source code. This is kept in plain markdown syntax.
  • The links in the files constitute reference to other modules. These are considered dependencies of the module. Note that a dependency is only valid if the link points to an existent local file on the filesystem.

The output should be an adjacency list object (https://en.wikipedia.org/wiki/Adjacency_list) which constitutes a plain object where each key is a reference to the path of a module, and each value is an object consisting of two properties:

  • source: The source markdown (including links)
  • dependencies: An object where each key is a file path corresponding to a key in overall object object, that are referenced via links in the source. Each value is the name within the source code that the source uses for the link, e.g. User Interface would be {'/resolved/path/to/user_interface.md': 'User Interface'}

Please also include a unit test

// SourceCodeParser.test.ts
import { SourceCodeParser } from "./SourceCodeParser";
import * as fs from "fs";
import * as path from "path";
import * as os from "os";
describe("SourceCodeParser", () => {
const tempDir = path.join(os.tmpdir(), "source-code-parser-test");
beforeAll(() => {
// Create temporary directory
fs.mkdirSync(tempDir, { recursive: true });
// Create sample Markdown modules
/*
Entry Point: main.md
main.md -> [User Interface](./ui.md), [Backend](./backend.md), [External Link](https://example.com/external.md)
ui.md -> [Components](./components.md)
backend.md -> [Database](./database.md)
components.md -> no dependencies
database.md -> no dependencies
*/
const files = {
"main.md": `
# Main Module
This is the main module.
- [User Interface](./ui.md)
- [Backend](./backend.md)
- [External Link](https://example.com/external.md)
`,
"ui.md": `
# User Interface
UI components.
- [Components](./components.md)
`,
"backend.md": `
# Backend
Backend services.
- [Database](./database.md)
`,
"components.md": `
# Components
Reusable UI components.
`,
"database.md": `
# Database
Database schema and migrations.
`,
};
for (const [filename, content] of Object.entries(files)) {
const filePath = path.join(tempDir, filename);
fs.writeFileSync(filePath, content.trim(), "utf-8");
}
});
afterAll(() => {
// Clean up temporary directory
fs.rmSync(tempDir, { recursive: true, force: true });
});
it("should correctly parse the source tree and build the adjacency list", async () => {
const entryPoint = path.join(tempDir, "main.md");
const parser = new SourceCodeParser(entryPoint);
const adjacencyList = await parser.parse();
// Resolve all paths
const mainPath = path.resolve(entryPoint);
const uiPath = path.resolve(path.join(tempDir, "ui.md"));
const backendPath = path.resolve(path.join(tempDir, "backend.md"));
const componentsPath = path.resolve(path.join(tempDir, "components.md"));
const databasePath = path.resolve(path.join(tempDir, "database.md"));
// Define expected adjacency list
const expectedAdjacencyList = {
[mainPath]: {
source: `# Main Module
This is the main module.
- [User Interface](./ui.md)
- [Backend](./backend.md)
- [External Link](https://example.com/external.md)`,
dependencies: {
[uiPath]: "User Interface",
[backendPath]: "Backend",
// External link should be ignored, so not included
},
},
[uiPath]: {
source: `# User Interface
UI components.
- [Components](./components.md)`,
dependencies: {
[componentsPath]: "Components",
},
},
[backendPath]: {
source: `# Backend
Backend services.
- [Database](./database.md)`,
dependencies: {
[databasePath]: "Database",
},
},
[componentsPath]: {
source: `# Components
Reusable UI components.`,
dependencies: {},
},
[databasePath]: {
source: `# Database
Database schema and migrations.`,
dependencies: {},
},
};
expect(adjacencyList).toEqual(expectedAdjacencyList);
});
it("should throw an error for missing modules", async () => {
// Create a module with a missing dependency
const missingDepPath = path.join(tempDir, "missing_dep.md");
fs.writeFileSync(
missingDepPath,
`
# Missing Dependency Module
This module has a missing dependency.
- [Nonexistent](./nonexistent.md)
`.trim(),
"utf-8"
);
const parser = new SourceCodeParser(missingDepPath);
await expect(parser.parse()).rejects.toThrow(/Module not found/);
});
it("should handle circular dependencies gracefully", async () => {
// Create circular dependencies
const aPath = path.join(tempDir, "a.md");
const bPath = path.join(tempDir, "b.md");
fs.writeFileSync(
aPath,
`
# Module A
- [Module B](./b.md)
`.trim(),
"utf-8"
);
fs.writeFileSync(
bPath,
`
# Module B
- [Module A](./a.md)
`.trim(),
"utf-8"
);
const parser = new SourceCodeParser(aPath);
const adjacencyList = await parser.parse();
const resolvedAPath = path.resolve(aPath);
const resolvedBPath = path.resolve(bPath);
const expectedAdjacencyList = {
[resolvedAPath]: {
source: `# Module A
- [Module B](./b.md)`,
dependencies: {
[resolvedBPath]: "Module B",
},
},
[resolvedBPath]: {
source: `# Module B
- [Module A](./a.md)`,
dependencies: {
[resolvedAPath]: "Module A",
},
},
};
expect(adjacencyList).toEqual(expectedAdjacencyList);
});
it("should ignore external links and not include them as dependencies", async () => {
const entryPoint = path.join(tempDir, "main.md");
const parser = new SourceCodeParser(entryPoint);
const adjacencyList = await parser.parse();
// Resolve all paths
const mainPath = path.resolve(entryPoint);
const externalLink = "https://example.com/external.md"; // Should be ignored
// Check that external link is not included in dependencies
expect(adjacencyList[mainPath].dependencies).not.toHaveProperty(
externalLink
);
});
});
// SourceCodeParser.ts
import * as fs from "fs";
import * as path from "path";
/**
* Interface representing the dependencies of a module.
* Key: Absolute path to the dependency module.
* Value: The name used in the source code for the link.
*/
interface Dependencies {
[modulePath: string]: string;
}
/**
* Interface representing the information of a module.
*/
interface ModuleInfo {
source: string;
dependencies: Dependencies;
}
/**
* Interface representing the adjacency list of the source tree.
* Key: Absolute path to the module.
* Value: ModuleInfo object containing source and dependencies.
*/
interface AdjacencyList {
[modulePath: string]: ModuleInfo;
}
/**
* SourceCodeParser is responsible for parsing Markdown modules
* starting from an entry point and building an adjacency list
* representing the source tree.
*/
export class SourceCodeParser {
private entryPoint: string;
private adjacencyList: AdjacencyList;
private processedModules: Set<string>;
constructor(entryPoint: string) {
this.entryPoint = path.resolve(entryPoint);
this.adjacencyList = {};
this.processedModules = new Set();
}
/**
* Parses the entry point and builds the adjacency list.
* @returns AdjacencyList representing the source tree.
*/
public async parse(): Promise<AdjacencyList> {
await this.processModule(this.entryPoint);
return this.adjacencyList;
}
/**
* Processes a single module: reads its content, extracts dependencies,
* and recursively processes dependencies.
* @param modulePath Absolute path to the module.
*/
private async processModule(modulePath: string): Promise<void> {
if (this.processedModules.has(modulePath)) {
return;
}
if (!fs.existsSync(modulePath)) {
throw new Error(`Module not found: ${modulePath}`);
}
const content = fs.readFileSync(modulePath, "utf-8");
this.adjacencyList[modulePath] = {
source: content,
dependencies: {},
};
this.processedModules.add(modulePath);
const links = this.extractLinks(content);
for (const link of links) {
const dependencyPath = path.resolve(path.dirname(modulePath), link.href);
this.adjacencyList[modulePath].dependencies[dependencyPath] = link.text;
await this.processModule(dependencyPath);
}
}
/**
* Extracts Markdown links from the source content.
* @param content Markdown content.
* @returns Array of objects containing link text and href.
*/
private extractLinks(content: string): { text: string; href: string }[] {
const linkRegex = /\[([^\]]+)\]\(([^)]+)\)/g;
const links: { text: string; href: string }[] = [];
let match: RegExpExecArray | null;
while ((match = linkRegex.exec(content)) !== null) {
const [_, text, href] = match;
// Only consider local .md files as dependencies
if (
href.endsWith(".md") &&
!href.startsWith("http://") &&
!href.startsWith("https://")
) {
links.push({ text, href });
}
}
return links;
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment