zhuguangbin

Anti-hype LLM reading list

Goals: Add links to sound, clear explanations of how stuff works. No hype and, where possible, no vendor content. Practical first-hand accounts of models in prod are eagerly sought.

Foundational Concepts

Pre-Transformer Models

Apache Hadoop - add native libraries

If native libraries are not available, the following warning is displayed with every hadoop command (for example, hadoop checknative):

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
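A script can spot this condition by scanning captured command output for the warning. The following is a minimal sketch, not part of the original instructions: `has_native_warning` is a hypothetical helper name, and the pattern assumes the message text quoted above, with any timestamp or log-level prefix allowed to vary.

```python
import re

# Matches the warning quoted above; the prefix (timestamp, level) may differ per setup.
NATIVE_WARNING = re.compile(r"NativeCodeLoader: Unable to load native-hadoop library")


def has_native_warning(output: str) -> bool:
    """Return True if any line of `output` contains the NativeCodeLoader warning."""
    return any(NATIVE_WARNING.search(line) for line in output.splitlines())


sample = (
    "WARN util.NativeCodeLoader: Unable to load native-hadoop library "
    "for your platform... using builtin-java classes where applicable"
)
print(has_native_warning(sample))
print(has_native_warning("some unrelated log line"))
```

In practice the `output` string would come from the captured output of a hadoop command; how that output is captured is left to the caller.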

Clone hadoop source code

So Hive in CDH is horribly, painfully slow. Cloudera ships Hive 1.1, which is actually moderately modern. It is, however, very badly configured out of the box and patched with custom code from Cloudera. With a bit of effort, we managed to improve Hive performance considerably. We really shouldn't have to do this, but Cloudera is actively working against supporting a performant Hive.

First, building Tez was fairly straightforward. I followed the instructions at https://github.com/apache/tez/blob/master/docs/src/site/markdown/install.md; the only change was to use the version string "2.6.0" for the build. I believe that was the default. Don't use the CDH string, it won't work.

The bottom of the installation instructions notes that to use the local Hadoop jars (rather than those packaged with Tez), you must unpack the jars in HDFS rather than using the tarball. In this case, unpack the tez-minimal tarball and upload the contents to /apps/tez-0.7.0 (or whatever you prefer). Don't fo

	# I couldn't get chains to return generators, so I had to do a bit of low-level SSE. Hope this is useful.
	# Probably you'll use another Vector Store instead of OpenSearch, but if you want to mimic what I did here,
	# please use the fork of `OpenSearchVectorSearch` in https://github.com/oneryalcin/langchain


	import json
	import os
	import logging
	from typing import List, Generator
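The low-level SSE framing mentioned in the comments above can be sketched as a plain generator. This is an illustrative sketch rather than the gist author's code: `sse_frames` and the `"token"` payload key are hypothetical names. It relies only on the Server-Sent Events wire format, in which each event is a `data:` line terminated by a blank line.

```python
import json
from typing import Generator, Iterable


def sse_frames(tokens: Iterable[str]) -> Generator[str, None, None]:
    """Wrap each token in a Server-Sent Events frame.

    JSON-encoding the payload keeps it on a single line, so one
    `data:` field per event is enough even if a token contains newlines.
    """
    for token in tokens:
        payload = json.dumps({"token": token})  # "token" key is an assumption
        yield f"data: {payload}\n\n"


# Any iterable of strings works, e.g. tokens streamed from a model.
frames = list(sse_frames(["Hello", "world"]))
for frame in frames:
    print(frame, end="")
```

A web framework's streaming response can consume such a generator directly, provided the response is sent with the `text/event-stream` content type.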

	try {
	    // Trust manager that accepts every certificate (disables TLS validation; unsafe outside testing)
	    TrustManager[] trustAllCerts = new TrustManager[] {
	        new X509TrustManager() {
	            public java.security.cert.X509Certificate[] getAcceptedIssuers() {
	                return null;
	            }
	            public void checkClientTrusted(X509Certificate[] certs, String authType) { }

	            public void checkServerTrusted(X509Certificate[] certs, String authType) { }
	        }

	#!/usr/bin/env python3

	"""Simple HTTP Server With Upload.

	This module builds on BaseHTTPServer by implementing the standard GET
	and HEAD requests in a fairly straightforward manner.

	see: https://gist.github.com/UniIsland/3346170
	"""

	config.vm.provision "shell", inline: <<-SHELL
	apt-get -y -q update
	apt-get -y -q upgrade
	apt-get -y -q install software-properties-common htop
	add-apt-repository ppa:webupd8team/java
	apt-get -y -q update
	echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | sudo /usr/bin/debconf-set-selections
	echo oracle-java7-installer shared/accepted-oracle-license-v1-1 select true | sudo /usr/bin/debconf-set-selections
	apt-get -y -q install oracle-java8-installer
	apt-get -y -q install oracle-java7-installer

	/**
	The MIT License (MIT)

	Copyright (c) 2013 Jean Helou

	Permission is hereby granted, free of charge, to any person obtaining a copy
	of this software and associated documentation files (the "Software"), to deal
	in the Software without restriction, including without limitation the rights
	to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
	copies of the Software, and to permit persons to whom the Software is

	brew unlink thrift
	brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/9d524e4850651cfedd64bc0740f1379b533f607d/Formula/thrift.rb


	package ca.underflow.hbase

	import org.hbase.async._
	import com.stumbleupon.async._

	object Demo extends App {

	// This lets us pass inline functions to the deferred
	// object returned by most asynchbase methods