Anchen mzbac
@Anemll
Anemll / test.swift
Last active June 10, 2025 00:42
Test Apple Foundation Model t/s
import FoundationModels
import Playgrounds
import Foundation
let session = LanguageModelSession()
let start = Date()
let response = try await session.respond(to: "What is Apple Neural Engine and how to use it?")
let responseText = response.content // `content` holds the generated string payload of the response.
print(responseText)
let end = Date()
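The preview above stops right after `end` is captured. Judging by the gist title, the remaining step is presumably to turn the elapsed time into a tokens-per-second figure. A rough sketch of that arithmetic follows, written in Python to match most of the other gists listed here. The whitespace-based token count and the helper names `tokens_per_second` and `timed` are assumptions made for illustration, not taken from the gist, and a real tokenizer would give different numbers.

```python
import time


def tokens_per_second(text: str, elapsed_seconds: float) -> float:
    """Approximate throughput, treating whitespace-separated words as tokens.

    This is a crude stand-in for a real tokenizer (an assumption).
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return len(text.split()) / elapsed_seconds


def timed(generate, prompt):
    """Call generate(prompt) and return (text, elapsed seconds)."""
    start = time.perf_counter()
    text = generate(prompt)
    return text, time.perf_counter() - start
```

Here `generate` can be any callable that maps a prompt to a string, for example a wrapper around a model call; `tokens_per_second(text, elapsed)` then gives the approximate rate for the pair returned by `timed`.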
@awni
awni / mlx_lm_open_webui.md
Created April 25, 2025 15:41
Open WebUI with MLX LM

Setup

Install packages:

pip install open-webui mlx-lm

Start Open WebUI server:

@awni
awni / README.md
Last active April 30, 2025 12:30
Test Time Scaling with R1-based Models and MLX LM

Test Time Scaling with MLX LM and R1-based LLMs

Install MLX LM:

pip install mlx-lm

And run:

@willccbb
willccbb / grpo_demo.py
Last active June 10, 2025 09:38
GRPO Llama-1B
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
"""
citation:
@misc{brown2025grpodemo,
title={Granular Format Rewards for Eliciting Mathematical Reasoning Capabilities in Small Language Models},
author={Brown, William},
@awni
awni / mlx_distributed_deepseek.md
Last active June 8, 2025 03:50
Run DeepSeek R1 or V3 with MLX Distributed

Setup

On every machine in the cluster install openmpi and mlx-lm:

conda install conda-forge::openmpi
pip install -U mlx-lm

Next download the pipeline parallel run script. Download it to the same path on every machine:

@Maharshi-Pandya
Maharshi-Pandya / contemplative-llms.txt
Last active June 4, 2025 06:19
"Contemplative reasoning" response style for LLMs like Claude and GPT-4o
You are an assistant that engages in extremely thorough, self-questioning reasoning. Your approach mirrors human stream-of-consciousness thinking, characterized by continuous exploration, self-doubt, and iterative analysis.
## Core Principles
1. EXPLORATION OVER CONCLUSION
- Never rush to conclusions
- Keep exploring until a solution emerges naturally from the evidence
- If uncertain, continue reasoning indefinitely
- Question every assumption and inference
@ivanfioravanti
ivanfioravanti / mlx_whisper_realtime.py
Last active December 18, 2024 17:07
mlx-whisper real time audio
# Required packages:
# pip install SpeechRecognition mlx-whisper pyaudio
# Note: This script requires Apple Silicon Mac for MLX Whisper
import speech_recognition as sr
import numpy as np
import mlx_whisper
r = sr.Recognizer()
mic = sr.Microphone(sample_rate=16000)
@awni
awni / l3min.py
Last active January 25, 2025 21:30
A minimal, fast implementation of Llama 3.1 in MLX.
"""
A minimal, fast example generating text with Llama 3.1 in MLX.
To run, install the requirements:
pip install -U mlx transformers fire
Then generate text with:
python l3min.py "How tall is K2?"
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
import os
import argparse
def get_args():
parser = argparse.ArgumentParser()
parser.add_argument("--base_model_name_or_path", type=str)
@afiodorov
afiodorov / instruction
Last active October 28, 2023 11:51
Run your own LLM & create an api endpoint for predictions
Docker Image : pytorch/pytorch
Image Runtype : jupyter_direc ssh_direc ssh_proxy
Environment : [["JUPYTER_DIR", "/"], ["-p 41654:41654", "1"]]
pip install torch bitsandbytes sentencepiece "protobuf<=3.20.2" git+https://github.com/huggingface/transformers flask python-dotenv Flask-HTTPAuth accelerate
!mv /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116.so /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so