Tian Cao tonycao

@younesbelkada
younesbelkada / finetune_llama_v2.py
Last active April 7, 2025 18:27
Fine tune Llama v2 models on Guanaco Dataset
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
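The file preview above stops inside the license header. As a rough sketch of the kind of QLoRA-style setup the gist's title describes (this is not the gist's code; the base checkpoint, dataset name, and hyperparameters below are illustrative assumptions):

# Minimal sketch: LoRA fine-tuning of a 4-bit-quantized Llama 2 checkpoint on a
# Guanaco-style dataset with the Hugging Face transformers/peft/datasets stack.
# All names and hyperparameters are placeholders, not the gist's exact values.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-7b-hf"                      # assumed base checkpoint
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token                    # Llama has no pad token

# Load the base model in 4-bit so it fits on a single consumer GPU.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto")
model = prepare_model_for_kbit_training(model)

# Train small LoRA adapters instead of the full weight matrices.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, peft_config)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama2-guanaco", per_device_train_batch_size=4,
                           gradient_accumulation_steps=4, learning_rate=2e-4,
                           max_steps=500, fp16=True, logging_steps=10),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()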
@rain-1
rain-1 / llama-home.md
Last active April 24, 2025 06:41
How to run Llama 13B with a 6GB graphics card

This worked on 14/May/23. The instructions will probably require updating in the future.

LLaMA is a text-prediction model similar to GPT-2 and to the version of GPT-3 that has not been fine-tuned yet. It is also possible to run fine-tuned versions (like Alpaca or Vicuna) with this, I think; those versions are more focused on answering questions.

Note: I have been told that this does not support multiple GPUs. It can only use a single GPU.

It is now possible to run LLaMA 13B with a 6GB graphics card (e.g. an RTX 2060), thanks to the amazing work on llama.cpp. The latest change is CUDA/cuBLAS support, which lets you pick an arbitrary number of the transformer layers to run on the GPU. This is perfect for low VRAM.

  • Clone llama.cpp from git; I am on commit 08737ef720f0510c7ec2aa84d7f70c691073c35d.
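The preview cuts off after this first step. As a loose illustration only: the core of the trick is building llama.cpp with cuBLAS and passing a layer count via --n-gpu-layers. The binary name, model path, prompt, and layer count below are assumptions for the sketch, not the gist's exact instructions.

# Illustrative only: call a locally built llama.cpp "main" binary from Python and
# offload part of the model to the GPU. Assumes the binary was built with cuBLAS
# support (e.g. `LLAMA_CUBLAS=1 make`) and that a quantized 13B model exists locally.
import subprocess

cmd = [
    "./main",                                   # llama.cpp inference binary
    "-m", "models/13B/ggml-model-q4_0.bin",     # quantized model file (assumed path)
    "-p", "Building a website can be done in 10 simple steps:",
    "--n-gpu-layers", "18",                     # layers kept on the 6GB GPU; tune to fit
    "-n", "128",                                # number of tokens to generate
]
subprocess.run(cmd, check=True)

How many layers fit depends on the quantization and the card; the point of the cuBLAS change is that this number can be anything from zero up to the full model.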
@kvn219
kvn219 / Spatial_Transformer_Example_Part1.ipynb
Last active December 31, 2020 05:04
Spatial Transformer Networks with Tensorflow: Part I
@arasharchor
arasharchor / DeepLearningFaces.md
Created January 15, 2017 14:51 — forked from jdsgomes/DeepLearningFaces.md
Deep Learning for Face Recognition

Deep Learning for Face Recognition (May 2016)

Popular architectures

  • FaceNet (Google)
    • They use a triplet loss with the goal of keeping the L2 intra-class distances low and inter-class distances high (see the sketch after this list)
  • DeepID (Hong Kong University)
    • They use verification and identification signals to train the network. After each convolutional layer there is an identity layer connected to the supervisory signals in order to train each layer closely (on top of normal backprop)
  • DeepFace (Facebook)
    • Convs followed by locally connected, followed by fully connected
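A minimal sketch of the triplet loss idea summarized above (plain NumPy; the margin value and the 128-dimensional random embeddings are illustrative, not FaceNet's training code):

# FaceNet-style triplet loss: pull an anchor and a positive (same identity)
# together while pushing a negative (different identity) at least `margin` away.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean over the batch of max(0, ||a - p||^2 - ||a - n||^2 + margin)."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)   # intra-class squared L2
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)   # inter-class squared L2
    return np.mean(np.maximum(0.0, pos_dist - neg_dist + margin))

# Toy usage with random embeddings standing in for network outputs.
rng = np.random.default_rng(0)
anchor, positive, negative = (rng.normal(size=(4, 128)) for _ in range(3))
print(triplet_loss(anchor, positive, negative))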
@shagunsodhani
shagunsodhani / KeyValueMemNN.md
Last active April 30, 2023 04:13
Summary of paper "Key-Value Memory Networks for Directly Reading Documents"

Key-Value Memory Networks for Directly Reading Documents

Introduction

  • Knowledge Bases (KBs) are effective tools for Question Answering (QA) but are often too restrictive (due to fixed schema) and too sparse (due to limitations of Information Extraction (IE) systems).
  • The paper proposes Key-Value Memory Networks, a neural network architecture based on Memory Networks that can leverage both KBs and raw data for QA (a minimal sketch of a single memory read follows this summary).
  • The paper also introduces MOVIEQA, a new QA dataset that can be answered by a perfect KB, by Wikipedia pages, and by an imperfect KB obtained using IE techniques, thereby allowing a comparison between systems using any of the three sources.
  • Link to the paper.
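A minimal sketch of the key-value read described above, with the paper's feature maps and learned projections collapsed into plain vectors for illustration (one hop only):

# One key-value memory "hop": address slots by comparing the query with the keys,
# then read out the weighted sum of the corresponding values.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kv_memory_read(query, keys, values):
    """query: (d,); keys, values: (num_slots, d) -> read-out vector of shape (d,)."""
    addressing = softmax(keys @ query)   # p_i proportional to exp(q . k_i)
    return addressing @ values           # o = sum_i p_i * v_i

# Toy usage: 5 memory slots with 8-dimensional embeddings.
rng = np.random.default_rng(0)
q = rng.normal(size=8)
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
print(kv_memory_read(q, K, V).shape)     # (8,); used to update the query before the next hop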

Related Work

@jdsgomes
jdsgomes / DeepLearningFaces.md
Last active January 6, 2020 07:01
Deep Learning for Face Recognition

Deep Learning for Face Recognition (May 2016)

Popular architectures

  • FaceNet (Google)
    • They use a triplet loss with the goal of keeping the L2 intra-class distances low and inter-class distances high
  • DeepID (Hong Kong University)
    • They use verification and identification signals to train the network. After each convolutional layer there is an identity layer connected to the supervisory signals in order to train each layer closely (on top of normal backprop)
  • DeepFace (Facebook)
    • Convs followed by locally connected, followed by fully connected
@gnachman
gnachman / iterm.scpt
Last active April 8, 2023 23:42
Replace /Applications/Docker/Docker Quickstart Terminal.app/Contents/Resources/Scripts/iterm.scpt with this.
set itermRunning to (application "iTerm" is running)
set scriptPath to quoted form of POSIX path of ((path to me as text) & "::" & "start.sh")
-- look up the user's login shell via Directory Services
set user_shell to do shell script "dscl /Search -read /Users/$USER UserShell | awk '{print $2}'"
tell application "iTerm"
    activate
    if not (exists window 1) or (itermRunning = false) then
        reopen
    end if
@vasanthk
vasanthk / System Design.md
Last active June 16, 2025 20:31
System Design Cheatsheet

System Design Cheatsheet

Picking the right architecture = Picking the right battles + Managing trade-offs

Basic Steps

  1. Clarify and agree on the scope of the system
  • Use cases (description of sequences of events that, taken together, lead to a system doing something useful)
    • Who is going to use it?
    • How are they going to use it?

KC60 Keyboard end-user tips, tricks, programming notes, etc.

Leimi's note: I removed lots of stuff from the original gist by scottjl (thanks to him, by the way!), as in the end some of it wasn't useful for me. If you are interested, go check the gist revisions.

The KC60 is kinda like a premade GH60 that was first sold on Massdrop during summer 2015.
It runs on TMK firmware, or at least something based on it (not sure that is the real source for the keyboard, but it seems to be), which means it's heavily programmable.
There is a GUI tool (the source of this tool seems to be here) and a command-line tool to ease the process of programming the board.
**Go check this great article on Key

@MattDMo
MattDMo / ipy_repl.py
Last active April 3, 2022 23:16
SublimeREPL ipy_repl.py for running IPython/Jupyter in Sublime Text
# from https://gist.github.com/MattDMo/6cb1dfbe8a124e1ca5af
import os
import json
import socket
import threading
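# SUBLIMEREPL_ACTIVATE_THIS, if set, points at a virtualenv's activate_this.py
# so the REPL can run inside that environment.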
activate_this = os.environ.get("SUBLIMEREPL_ACTIVATE_THIS", None)
# turn off pager