f0ster / qemu.mk
Created December 19, 2025 05:09
qemu makefile
QEMU ?= qemu-system-x86_64
UBUNTU_INSTALL_CDROM ?= ~/Downloads/ubuntu-24.04.3-desktop-amd64.iso
DISK ?= disk/disk.qcow2
RAM ?= 32768
CPUS ?= 16
PORT ?= 2222
DISPLAY_WIDTH ?= 1920
DISPLAY_HEIGHT ?= 1080
# QXL for stable single display (defaults)
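The preview cuts off after the variables; as a minimal hedged sketch (not the gist's actual recipe, with flag choices that are assumptions), a run target could wire them into a QEMU invocation like so:

.PHONY: run
run:  # recipe lines below must start with a literal tab
	$(QEMU) -enable-kvm \
		-m $(RAM) -smp $(CPUS) \
		-drive file=$(DISK),format=qcow2,if=virtio \
		-cdrom $(UBUNTU_INSTALL_CDROM) \
		-device qxl-vga,xres=$(DISPLAY_WIDTH),yres=$(DISPLAY_HEIGHT) \
		-nic user,hostfwd=tcp::$(PORT)-:22

The qxl-vga xres/yres properties assume a reasonably recent QEMU; -nic user forwards host port $(PORT) to the guest's SSH port 22.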
f0ster / .tmux.conf
Created December 8, 2025 15:58
tmux 3.5a conf
set -g terminal-overrides 'xterm*:smcup@:rmcup@'
# ~/.tmux.conf
# for tmux 3.5a
#
# -----------------------------------------------------------------------------
# Global settings
# Set prefix key to Ctrl-a
#unbind-key C-b
f0ster / clock.html
Created August 5, 2025 03:32
LED Binary clock
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Horizontal Binary Clock</title>
    <style>
        body {
            background-color: #1a1a1a;
            /* Center the main wrapper on the page */
f0ster / deepseek-v3-tech-dive-ptx.ipynb
Last active February 18, 2025 17:09
💥 Smashing the Tariffs for Fun and Profit: How DeepSeek v3 Outsmarted the AI Ban 🧠🚀
f0ster / deepseek-v3-tech-dive.ipynb
Last active February 18, 2025 01:57
Smashing the Tariffs for Fun and Profit: How DeepSeek v3 Outsmarted the AI Ban
f0ster / deepseek-v3-tech-dive.md
Created February 17, 2025 20:03
Smashing the Tariffs for Fun and Profit: How DeepSeek v3 Outsmarted the AI Ban

Smashing the Tariffs for Fun and Profit: How DeepSeek v3 Outsmarted the AI Ban

1. CUDA and PTX Optimizations

DeepSeek-V3’s engineers optimized GPU performance at a low level by tailoring kernels and memory access patterns to NVIDIA’s hardware. A key strategy was warp specialization: they partitioned off a subset of GPU warps (groups of 32 threads) specifically for communication tasks, allowing compute to overlap with data transfers (DeepSeek-V3 Technical Report). In practice, only ~20 of the GPU’s Streaming Multiprocessors (SMs) were reserved to handle all cross-node communication, enough to saturate both InfiniBand (IB) and NVLink bandwidth, while the remaining SMs focused purely on computation (DeepSeek-V3 Technical Report).
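To make the pattern concrete, here is a minimal CUDA sketch of warp specialization, an illustrative stand-in rather than DeepSeek's actual PTX-level code: one warp per block acts as a "communication" warp that stages a tile into shared memory (standing in for a cross-node transfer), while the remaining warps compute on the staged data. The warp counts, tile size, and the doubling "compute" step are all assumptions.

// Illustrative warp specialization: 1 producer warp + 3 consumer warps per block.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int WARP = 32;
constexpr int COMM_WARPS = 1;               // assumption: 1 of 4 warps moves data
constexpr int COMPUTE_WARPS = 3;
constexpr int TILE = COMPUTE_WARPS * WARP;  // elements staged per iteration

__global__ void specialized(const float *in, float *out, int n) {
    __shared__ float stage[TILE];
    int warp_id = threadIdx.x / WARP;
    int lane    = threadIdx.x % WARP;

    // Each block strides over the input one tile at a time.
    for (int base = blockIdx.x * TILE; base < n; base += gridDim.x * TILE) {
        if (warp_id < COMM_WARPS) {
            // "Communication" warp: stage the tile (stand-in for a cross-node copy).
            for (int i = lane; i < TILE; i += WARP)
                if (base + i < n) stage[i] = in[base + i];
        }
        __syncthreads();  // hand the staged tile to the compute warps
        if (warp_id >= COMM_WARPS) {
            int i = (warp_id - COMM_WARPS) * WARP + lane;
            if (base + i < n) out[base + i] = 2.0f * stage[i];  // stand-in compute
        }
        __syncthreads();  // don't restage until every consumer is done
    }
}

int main() {
    const int n = 1 << 16;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = float(i);

    // 4 warps per block: 1 communication + 3 compute.
    specialized<<<64, (COMM_WARPS + COMPUTE_WARPS) * WARP>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("out[10] = %f (expect 20)\n", out[10]);
    cudaFree(in); cudaFree(out);
    return 0;
}

A production version would replace the shared-memory copy with asynchronous copies and finer-grained producer/consumer synchronization, which is where the PTX-level tuning comes in.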

f0ster / add_arrays.cu
Created January 7, 2025 18:48
Custom CUDA Kernel Example
#include <iostream>
#include <cuda.h>

// CUDA kernel: element-wise addition of two float arrays, one thread per element
__global__ void add_arrays(float *a, float *b, float *c, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        c[idx] = a[idx] + b[idx];
    }
}
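The preview ends at the kernel; as an assumed completion (not part of the original gist), a host-side main appended to add_arrays.cu could allocate buffers and launch it like this:

// Assumed host driver for add_arrays (not from the original gist).
int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host buffers with known contents so the result is easy to check.
    float *h_a = new float[n], *h_b = new float[n], *h_c = new float[n];
    for (int i = 0; i < n; ++i) { h_a[i] = float(i); h_b[i] = 2.0f * i; }

    // Device buffers and host-to-device copies.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // One thread per element, rounded up to whole blocks.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    add_arrays<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);  // synchronizes on the copy

    std::cout << "c[42] = " << h_c[42] << " (expect 126)" << std::endl;

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    delete[] h_a; delete[] h_b; delete[] h_c;
    return 0;
}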
f0ster / .tmux.conf
Last active May 12, 2024 01:22
modern osx tmux conf
# ~/.tmux.conf
# General settings
set -g default-terminal "screen-256color" # Use 256-color terminal
set -g history-limit 5000 # Increase scrollback buffer size
set -g base-index 0 # Start window indexes at 0
set -g mouse on # Enable mouse control (pane selection, resizing, scrolling)
# Restore original prefix key
set-option -g prefix C-b # Set prefix to Ctrl-b
f0ster / inspect_git_repos.py
Last active May 11, 2024 18:46
Summarize git repositories
# Script that summarizes git repositories by listing each one's name
# along with its status (public, public with changes, or private).
import os
import subprocess
import json
from concurrent.futures import ThreadPoolExecutor, as_completed

def execute_command(command, cwd):
    """Executes a shell command in a specified directory and returns the output."""
    # The preview cuts off here; this body is an assumed completion
    # matching the docstring.
    result = subprocess.run(command, cwd=cwd, shell=True,
                            capture_output=True, text=True)
    return result.stdout.strip()
f0ster / mixtral_demo.py
Created April 28, 2024 15:30
Running mistralai mixtral locally
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_model_and_tokenizer(model_id):
    """
    Load the tokenizer and model for the specified model ID.
    The model uses float16 to reduce memory usage and improve performance.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # The preview cuts off here; the lines below are an assumed completion
    # consistent with the docstring.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    return model, tokenizer