Goals: Add links that give reasonable, clear explanations of how stuff works. No hype and, if possible, no vendor content. Practical first-hand accounts of models in prod are eagerly sought.

# I couldn't get chains to return generators, so I had to do a bit of low-level SSE. Hope this is useful.
# You'll probably use another vector store instead of OpenSearch, but if you want to mimic what I did here,
# please use the fork of `OpenSearchVectorSearch` in https://github.com/oneryalcin/langchain
import json
import os
import logging
from typing import List, Generator
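The "low-level SSE" mentioned in the comment above comes down to framing each chunk as a Server-Sent Events message. The following is a minimal sketch of that framing only, not the author's code: `sse_event` and `stream_tokens` are hypothetical helper names, and the JSON payload shape and the final `end` event are assumptions made for illustration.

```python
import json
from typing import Generator, Iterable, Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Format one Server-Sent Events frame (hypothetical helper)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Every line of the payload needs its own "data:" field.
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Generator[str, None, None]:
    """Yield one SSE frame per token, then an assumed closing 'end' event."""
    for token in tokens:
        yield sse_event(json.dumps({"token": token}))
    yield sse_event("[DONE]", event="end")
```

A web framework's streaming response can consume a generator like `stream_tokens(...)` directly, provided the response is served with the `text/event-stream` content type.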
brew unlink thrift
brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/9d524e4850651cfedd64bc0740f1379b533f607d/Formula/thrift.rb
try {
    // Trust manager that accepts every certificate. This disables TLS
    // certificate validation, so it is only suitable for testing.
    TrustManager[] trustAllCerts = new TrustManager[] {
        new X509TrustManager() {
            public java.security.cert.X509Certificate[] getAcceptedIssuers() {
                return null;
            }
            public void checkClientTrusted(X509Certificate[] certs, String authType) { }
            public void checkServerTrusted(X509Certificate[] certs, String authType) { }
        }
If native libraries are not available, the following message is displayed with every hadoop command, for example hadoop checknative:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Clone hadoop source code
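As a small illustration of the check described above, the sketch below scans the output of a hadoop command for the `NativeCodeLoader` warning. The helper names `has_native_warning` and `check_hadoop_native` are hypothetical, and it assumes the `hadoop` binary is on the `PATH`; matching is done on the message text quoted above.

```python
import subprocess
from typing import Sequence

# Substring of the warning quoted in the text above.
NATIVE_WARNING = "Unable to load native-hadoop library"


def has_native_warning(output: str) -> bool:
    """Return True if any line of the output contains the warning."""
    return any(NATIVE_WARNING in line for line in output.splitlines())


def check_hadoop_native(cmd: Sequence[str] = ("hadoop", "checknative")) -> bool:
    """Run a hadoop command and return True if no native-library warning appears.

    Assumes the hadoop binary is installed; raises FileNotFoundError otherwise.
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True, check=False)
    # The warning may be logged to either stream, so inspect both.
    return not has_native_warning(result.stdout + result.stderr)
```

The command is run with `check=False` so that a non-zero exit status does not raise; only the presence of the warning text decides the result.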
So Hive in CDH is horribly, painfully slow. Cloudera ships Hive 1.1, which is actually moderately modern. It is, however, very badly configured out of the box and patched with custom code from Cloudera. With a bit of effort, we managed to improve Hive performance considerably. We really shouldn't have to do this, but Cloudera is actively working against supporting a performant Hive.
First, building Tez was fairly straightforward. Following the instructions at https://github.com/apache/tez/blob/master/docs/src/site/markdown/install.md, the only change was to use the version string "2.6.0" for the build. I believe that was the default. Don't use the CDH string; it won't work.
At the bottom of the installation instructions, there's a note that to use the local Hadoop jars (rather than those packaged with Tez) you must unpack the jars in HDFS rather than using the tarball. In this case, unpack the tez-minimal tarball and upload the contents to /apps/tez-0.7.0 (or whatever you prefer). Don't fo
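To make the unpacked-jars setup above concrete, here is a rough sketch that generates a `tez-site.xml` fragment pointing Tez at the uploaded directory. It is an illustration under assumptions, not the author's configuration: `tez_site_xml` is a hypothetical helper, and listing both the top-level directory and its `lib` subdirectory in `tez.lib.uris`, along with setting `tez.use.cluster.hadoop-libs` to `true` for the minimal tarball, follows my reading of the Tez install guide and should be checked against your Tez version.

```python
import xml.etree.ElementTree as ET


def tez_site_xml(tez_dir: str = "/apps/tez-0.7.0") -> str:
    """Build a tez-site.xml fragment for jars unpacked under tez_dir in HDFS."""
    props = {
        # Assumption: directories are listed explicitly, including lib/.
        "tez.lib.uris": f"${{fs.defaultFS}}{tez_dir},${{fs.defaultFS}}{tez_dir}/lib",
        # Assumption: the minimal tarball relies on the cluster's Hadoop jars.
        "tez.use.cluster.hadoop-libs": "true",
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(tez_site_xml())
```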
#!/usr/bin/env python3
"""Simple HTTP Server With Upload.
This module builds on BaseHTTPServer by implementing the standard GET
and HEAD requests in a fairly straightforward manner.
see: https://gist.github.com/UniIsland/3346170
"""
config.vm.provision "shell", inline: <<-SHELL
  apt-get -y -q update
  apt-get -y -q upgrade
  apt-get -y -q install software-properties-common htop
  add-apt-repository ppa:webupd8team/java
  apt-get -y -q update
  echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | sudo /usr/bin/debconf-set-selections
  echo oracle-java7-installer shared/accepted-oracle-license-v1-1 select true | sudo /usr/bin/debconf-set-selections
  apt-get -y -q install oracle-java8-installer
  apt-get -y -q install oracle-java7-installer
/**
The MIT License (MIT)
Copyright (c) 2013 Jean Helou
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
package ca.underflow.hbase
import org.hbase.async._
import com.stumbleupon.async._
object Demo extends App {
  // This lets us pass inline functions to the deferred
  // object returned by most asynchbase methods