AI Open Source Stack

Layer Cake Architecture View

The following diagram presents the stack as a layer cake, with each layer resting on the one below:

%%{init: {'theme': 'forest'}}%%
flowchart TD
    %% Layer styling with distinct colors for each layer
    classDef frontendLayer fill:#e6f7ff,stroke:#1890ff,stroke-width:4px,color:#0050b3,font-weight:bold
    classDef ragLayer fill:#f6ffed,stroke:#52c41a,stroke-width:4px,color:#135200,font-weight:bold
    classDef backendLayer fill:#fff2e8,stroke:#fa8c16,stroke-width:4px,color:#873800,font-weight:bold
    classDef dataLayer fill:#f9f0ff,stroke:#722ed1,stroke-width:4px,color:#391085,font-weight:bold
    classDef llmLayer fill:#fff1f0,stroke:#f5222d,stroke-width:4px,color:#a8071a,font-weight:bold
    
    %% Define simple cake layers with equal width
    Frontend["Frontend Layer\nNext.js | Vercel | Streamlit"]
    RAG["Embeddings & RAG Libraries\nNomic | Cognita | LLMware | JinaAI"]
    Backend["Backend & Model Access\nLangChain | Metaflow | Hugging Face | FastAPI | Ollama"]
    Data["Data & Retrieval Layer\nPostgreSQL | Milvus | Weaviate | pgvector | FAISS"]
    LLM["Large Language Models\nLlama3.3 | Mistral | Gemma2 | Owen | Phi"]
    
    %% Stack layers vertically like a cake
    Frontend --- RAG
    RAG --- Backend
    Backend --- Data
    Data --- LLM
    
    %% Apply styles
    class Frontend frontendLayer
    class RAG ragLayer
    class Backend backendLayer
    class Data dataLayer
    class LLM llmLayer
A more detailed view breaks each layer out into its component technologies:
%%{init: {'theme': 'forest'}}%%
flowchart TB
    %% Layer styling with distinct colors for each layer
    classDef frontendLayer fill:#e6f7ff,stroke:#1890ff,stroke-width:3px,color:#0050b3,font-weight:bold
    classDef ragLayer fill:#f6ffed,stroke:#52c41a,stroke-width:3px,color:#135200,font-weight:bold
    classDef backendLayer fill:#fff2e8,stroke:#fa8c16,stroke-width:3px,color:#873800,font-weight:bold
    classDef dataLayer fill:#f9f0ff,stroke:#722ed1,stroke-width:3px,color:#391085,font-weight:bold
    classDef llmLayer fill:#fff1f0,stroke:#f5222d,stroke-width:3px,color:#a8071a,font-weight:bold
    
    classDef component fill:#ffffff,stroke:#d9d9d9,stroke-width:1px,color:#262626
    
    %% Create subgraphs for each layer with clear boundaries
    subgraph Layer1[Frontend]
        Frontend[Frontend Applications] --> NextJS["Next.js"]
        Frontend --> Vercel["Vercel"]
        Frontend --> Streamlit["Streamlit"]
    end
    
    subgraph Layer2[Embeddings & RAG Libraries]
        RAG[Embeddings and RAG Libraries] --> Nomic["Nomic"]
        RAG --> Cognita["Cognita"]
        RAG --> LLMware["LLMware"]
        RAG --> JinaAI["JinaAI"]
    end
    
    subgraph Layer3[Backend & Model Access]
        Backend[Backend and Model Access] --> LangChain["LangChain"]
        Backend --> Metaflow["Netflix Metaflow"]
        Backend --> HuggingFace["Hugging Face"]
        Backend --> FastAPI["FastAPI"]
        Backend --> Ollama["Ollama"]
    end
    
    subgraph Layer4[Data & Retrieval]
        DR[Data and Retrieval] --> Postgres["PostgreSQL"]
        DR --> Milvus["Milvus"]
        DR --> Weaviate["Weaviate"]
        DR --> PGVector["pgvector"]
        DR --> FAISS["FAISS"]
    end
    
    subgraph Layer5[Large Language Models]
        LLM[Large Language Models] --> Llama3["Llama3.3"]
        LLM --> Mistral["Mistral"]
        LLM --> Gemma2["Gemma2"]
        LLM --> Qwen["Qwen"]
        LLM --> Phi["Phi"]
    end
    
    %% Connections between layers with thicker, more visible lines
    Layer1 ==> Layer2
    Layer2 ==> Layer3
    Layer3 ==> Layer4
    Layer3 ==> Layer5
    Layer4 ==> Layer5
    
    %% Apply styles to layers
    class Layer1 frontendLayer
    class Layer2 ragLayer
    class Layer3 backendLayer
    class Layer4 dataLayer
    class Layer5 llmLayer
    
    %% Apply styles to components
    class Frontend frontendLayer
    class RAG ragLayer
    class Backend backendLayer
    class DR dataLayer
    class LLM llmLayer
    class NextJS,Vercel,Streamlit,Nomic,Cognita,LLMware,JinaAI,LangChain,Metaflow,HuggingFace,FastAPI,Ollama,Postgres,Milvus,Weaviate,PGVector,FAISS,Llama3,Mistral,Gemma2,Qwen,Phi component

Components Breakdown

1. Frontend

User interfaces and deployment platforms:

  • Next.js: React framework for building web applications
  • Vercel: Platform for frontend deployment and hosting
  • Streamlit: Framework for quickly creating data apps and ML interfaces

2. Embeddings and RAG Libraries

Tools for creating embeddings and implementing Retrieval Augmented Generation:

  • Nomic: Library for creating and managing embeddings
  • Cognita: Tools for building knowledge-intensive applications
  • LLMware: Framework for RAG applications and enterprise data integration
  • JinaAI: Neural search framework for multimodal data

3. Backend and Model Access

Frameworks and tools for interacting with models and orchestrating AI workflows:

  • LangChain: Framework for developing applications with LLMs through composable components
  • Netflix Metaflow: Data science framework for managing workflows and deployments
  • Hugging Face: Hub for accessing and fine-tuning models
  • FastAPI: High-performance API framework for Python
  • Ollama: Tool for running LLMs locally

4. Data and Retrieval

Storage and vector search solutions for managing embeddings and knowledge bases:

  • PostgreSQL: Relational database for structured data storage
  • Milvus: Scalable vector database optimized for similarity search
  • Weaviate: Vector search engine with semantic search capabilities
  • pgvector: PostgreSQL extension for vector similarity search (see the sketch after this list)
  • FAISS: Facebook AI's efficient similarity search library
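Among these options, pgvector is the lightest way to get started, since it lives inside PostgreSQL itself. Below is a minimal sketch using psycopg2; the connection settings, table name, and 768-dimension column are assumptions for illustration only:

import psycopg2

# placeholder connection settings for a local development database
conn = psycopg2.connect(host="localhost", dbname="ai_stack", user="postgres", password="postgres")
cur = conn.cursor()

# enable the extension and create a table with a vector column
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, body text, embedding vector(768))")

# store an embedding (normally produced by your embedding model)
vec = "[" + ",".join(["0.1"] * 768) + "]"  # placeholder vector literal
cur.execute("INSERT INTO docs (body, embedding) VALUES (%s, %s)", ("example text", vec))

# nearest-neighbour search: <-> is pgvector's L2 distance operator
cur.execute("SELECT body FROM docs ORDER BY embedding <-> %s LIMIT 5", (vec,))
print(cur.fetchall())
conn.commit()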

5. Large Language Models

The foundation of the stack, providing the core AI capabilities:

  • Llama3.3: Meta's open source LLM, known for its versatility and strong instruction following
  • Mistral: Efficient and high-performance model with strong reasoning capabilities
  • Gemma2: Google's lightweight yet powerful model designed for efficient deployment
  • Qwen: Alibaba's open model family, with variants specialized for coding and technical tasks
  • Phi: Microsoft's small but capable model optimized for efficiency

Getting Started

Prerequisites

  • Python 3.10+
  • Node.js 18+ (for frontend components)
  • Docker and Docker Compose
  • Git

Basic Setup

  1. Clone this repository:

    git clone https://github.com/yourusername/open-source-ai-stack.git
    cd open-source-ai-stack
  2. Set up the Python environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
  3. Set up the model environment (a quick smoke test follows this list):

    # Install Ollama
    curl -fsSL https://ollama.com/install.sh | sh
    
    # Pull the models (tags are examples; llama3.3 is also available)
    ollama pull llama3
    ollama pull mistral
    ollama pull gemma2
    ollama pull phi
  4. Start the database services:

    docker-compose up -d postgres milvus weaviate
  5. Set up the frontend:

    cd frontend
    npm install
    npm run dev
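
With the setup steps complete, it's worth smoke-testing the local model server before wiring up the rest of the stack. A minimal sketch against Ollama's HTTP API, which listens on localhost:11434 by default; the model tag and prompt are placeholders:

import requests

# Ollama serves a local HTTP API on port 11434 by default
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Say hello in one sentence.", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])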

Example Use Cases

1. Document Question-Answering System

Combine LLMs with RAG capabilities to build a system that can answer questions based on your documents:

from langchain.llms import Ollama
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.document_loaders import DirectoryLoader
from langchain.chains import RetrievalQA

# Load documents (PDF parsing requires the unstructured[pdf] extra)
loader = DirectoryLoader("./documents/", glob="**/*.pdf")
documents = loader.load()

# Create embeddings and vector store
# (this Nomic model needs trust_remote_code=True when loaded via sentence-transformers)
embeddings = HuggingFaceEmbeddings(
    model_name="nomic-ai/nomic-embed-text-v1",
    model_kwargs={"trust_remote_code": True},
)
vector_store = FAISS.from_documents(documents, embeddings)

# Initialize LLM
llm = Ollama(model="llama3")

# Create QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever()
)

# Query the system
query = "What is the main conclusion of the research paper?"
response = qa_chain.run(query)
print(response)
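
To avoid re-embedding the corpus on every run, the index can be persisted with vector_store.save_local("faiss_index") and restored later with FAISS.load_local("faiss_index", embeddings); the folder name here is just an example.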

2. Conversational AI with Memory

Build a chatbot with conversation history using LangChain and FastAPI:

from fastapi import FastAPI
from pydantic import BaseModel
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.llms import Ollama

app = FastAPI()
llm = Ollama(model="mistral")
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

# request body schema so the message arrives as JSON rather than a query parameter
class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
async def chat(request: ChatRequest):
    response = conversation.predict(input=request.message)
    return {"response": response}

3. Visual AI Dashboard with Streamlit

Create an interactive dashboard to demonstrate AI capabilities:

import streamlit as st
from langchain.llms import Ollama
from langchain.callbacks import StreamlitCallbackHandler

st.title("Open Source AI Demo")

llm = Ollama(model="gemma2")
st_callback = StreamlitCallbackHandler(st.container())

user_input = st.text_input("Ask a question:")
if user_input:
    with st.spinner("Generating response..."):
        response = llm.generate([user_input], callbacks=[st_callback])
        st.write(response.generations[0][0].text)
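
Save the script as, for example, app.py and launch it with streamlit run app.py; Streamlit serves the dashboard on localhost:8501 by default.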

Performance Considerations

  • Model Selection: Choose smaller models (Phi, Gemma2) for faster inference or larger models (Llama3.3) for more complex tasks
  • Vector Database: FAISS is faster for smaller datasets, while Milvus and Weaviate scale better for large collections
  • Embedding Caching: Implement caching strategies to avoid redundant embedding generation (a sketch follows this list)
  • Quantization: Use quantized models where possible to reduce memory footprint and increase speed
  • Batching: Implement request batching for high-throughput applications
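
As a concrete illustration of the caching point above, here is a minimal in-memory sketch; the embedding model is an arbitrary choice for the example, and a production setup would back the cache with a persistent store (Redis, SQLite) rather than a dict:

import hashlib

from langchain.embeddings import HuggingFaceEmbeddings

# in-process cache keyed by a hash of the input text (illustrative only)
_cache = {}
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

def embed_cached(text):
    # reuse the stored vector if this exact text was embedded before
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = embeddings.embed_query(text)
    return _cache[key]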

Deployment Options

  1. Local Development:

    • Ollama for local model hosting
    • Docker Compose for services
    • Next.js in development mode
  2. Self-Hosted Production:

    • Kubernetes with custom resource definitions
    • Hugging Face Text Generation Inference (TGI) for self-hosted model serving
    • PostgreSQL with pgvector for data storage
  3. Hybrid Cloud:

    • Vercel for frontend hosting
    • Self-hosted models on dedicated hardware
    • Managed PostgreSQL service

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • All the open source projects and communities that make this stack possible
  • Contributors and maintainers of the individual components
  • The AI research community for advancing open source models

Disclaimer: This stack is provided as a reference architecture. Individual components may have their own licensing terms and requirements. Ensure you comply with all applicable licenses when using these components.
