The following diagram represents the stack as a layer cake architecture, with uniform layers stacked on top of each other:
%%{init: {'theme': 'forest'}}%%
flowchart TD
%% Layer styling with distinct colors for each layer
classDef frontendLayer fill:#e6f7ff,stroke:#1890ff,stroke-width:4px,color:#0050b3,font-weight:bold
classDef ragLayer fill:#f6ffed,stroke:#52c41a,stroke-width:4px,color:#135200,font-weight:bold
classDef backendLayer fill:#fff2e8,stroke:#fa8c16,stroke-width:4px,color:#873800,font-weight:bold
classDef dataLayer fill:#f9f0ff,stroke:#722ed1,stroke-width:4px,color:#391085,font-weight:bold
classDef llmLayer fill:#fff1f0,stroke:#f5222d,stroke-width:4px,color:#a8071a,font-weight:bold
%% Define simple cake layers with equal width
Frontend["Frontend Layer\nNext.js | Vercel | Streamlit"]
RAG["Embeddings & RAG Libraries\nNomic | Cognita | LLMware | JinaAI"]
Backend["Backend & Model Access\nLangChain | Metaflow | Hugging Face | FastAPI | Ollama"]
Data["Data & Retrieval Layer\nPostgreSQL | Milvus | Weaviate | pgvector | FAISS"]
LLM["Large Language Models\nLlama3.3 | Mistral | Gemma2 | Owen | Phi"]
%% Stack layers vertically like a cake
Frontend --- RAG
RAG --- Backend
Backend --- Data
Data --- LLM
%% Apply styles
class Frontend frontendLayer
class RAG ragLayer
class Backend backendLayer
class Data dataLayer
class LLM llmLayer
A second, more detailed diagram breaks each layer down into its individual components:
%%{init: {'theme': 'forest'}}%%
flowchart TB
%% Layer styling with distinct colors for each layer
classDef frontendLayer fill:#e6f7ff,stroke:#1890ff,stroke-width:3px,color:#0050b3,font-weight:bold
classDef ragLayer fill:#f6ffed,stroke:#52c41a,stroke-width:3px,color:#135200,font-weight:bold
classDef backendLayer fill:#fff2e8,stroke:#fa8c16,stroke-width:3px,color:#873800,font-weight:bold
classDef dataLayer fill:#f9f0ff,stroke:#722ed1,stroke-width:3px,color:#391085,font-weight:bold
classDef llmLayer fill:#fff1f0,stroke:#f5222d,stroke-width:3px,color:#a8071a,font-weight:bold
classDef component fill:#ffffff,stroke:#d9d9d9,stroke-width:1px,color:#262626
%% Create subgraphs for each layer with clear boundaries
subgraph Layer1[Frontend]
Frontend[Frontend Applications] --> NextJS["Next.js"]
Frontend --> Vercel["Vercel"]
Frontend --> Streamlit["Streamlit"]
end
subgraph Layer2[Embeddings & RAG Libraries]
RAG[Embeddings and RAG Libraries] --> Nomic["Nomic"]
RAG --> Cognita["Cognita"]
RAG --> LLMware["LLMware"]
RAG --> JinaAI["JinaAI"]
end
subgraph Layer3[Backend & Model Access]
Backend[Backend and Model Access] --> LangChain["LangChain"]
Backend --> Metaflow["Netflix Metaflow"]
Backend --> HuggingFace["Hugging Face"]
Backend --> FastAPI["FastAPI"]
Backend --> Ollama["Ollama"]
end
subgraph Layer4[Data & Retrieval]
DR[Data and Retrieval] --> Postgres["PostgreSQL"]
DR --> Milvus["Milvus"]
DR --> Weaviate["Weaviate"]
DR --> PGVector["pgvector"]
DR --> FAISS["FAISS"]
end
subgraph Layer5[Large Language Models]
LLM[Large Language Models] --> Llama3["Llama3.3"]
LLM --> Mistral["Mistral"]
LLM --> Gemma2["Gemma2"]
LLM --> Qwen["Qwen"]
LLM --> Phi["Phi"]
end
%% Connections between layers with thicker, more visible lines
Layer1 ==> Layer2
Layer2 ==> Layer3
Layer3 ==> Layer4
Layer3 ==> Layer5
Layer4 ==> Layer5
%% Apply styles to layers
class Layer1 frontendLayer
class Layer2 ragLayer
class Layer3 backendLayer
class Layer4 dataLayer
class Layer5 llmLayer
%% Apply styles to components
class Frontend frontendLayer
class RAG ragLayer
class Backend backendLayer
class DR dataLayer
class LLM llmLayer
class NextJS,Vercel,Streamlit,Nomic,Cognita,LLMware,JinaAI,LangChain,Metaflow,HuggingFace,FastAPI,Ollama,Postgres,Milvus,Weaviate,PGVector,FAISS,Llama3,Mistral,Gemma2,Qwen,Phi component
The frontend layer covers user interfaces and deployment platforms:
- Next.js: React framework for building web applications
- Vercel: Platform for frontend deployment and hosting
- Streamlit: Framework for quickly creating data apps and ML interfaces
The embeddings and RAG layer provides tools for creating embeddings and implementing Retrieval-Augmented Generation (a minimal example follows the list):
- Nomic: Library for creating and managing embeddings
- Cognita: Tools for building knowledge-intensive applications
- LLMware: Framework for RAG applications and enterprise data integration
- JinaAI: Neural search framework for multimodal data
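As a minimal, illustrative sketch (assuming Ollama is running locally on its default port and the nomic-embed-text model has been pulled with ollama pull nomic-embed-text), embeddings can be requested over Ollama's REST API:

```python
# Request an embedding from a locally running Ollama instance.
import requests

resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "Open source AI stacks are composable."},
)
embedding = resp.json()["embedding"]  # list of floats
print(len(embedding))  # nomic-embed-text returns 768-dimensional vectors
```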
The backend layer provides frameworks and tools for interacting with models and orchestrating AI workflows (a short Hugging Face example follows the list):
- LangChain: Framework for developing applications with LLMs through composable components
- Netflix Metaflow: Data science framework for managing workflows and deployments
- Hugging Face: Hub for accessing and fine-tuning models
- FastAPI: High-performance API framework for Python
- Ollama: Tool for running LLMs locally
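For the Hugging Face piece in particular, a transformers pipeline is a quick way to pull a model from the Hub and generate text locally; this is only a sketch, and the model name is an illustrative choice rather than a requirement of the stack:

```python
# Pull a small causal LM from the Hugging Face Hub and generate text locally.
from transformers import pipeline

generator = pipeline("text-generation", model="microsoft/phi-2")  # illustrative model choice
result = generator("Open source AI stacks are useful because", max_new_tokens=40)
print(result[0]["generated_text"])
```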
The data and retrieval layer offers storage and vector search solutions for managing embeddings and knowledge bases (a pgvector sketch follows the list):
- PostgreSQL: Relational database for structured data storage
- Milvus: Scalable vector database optimized for similarity search
- Weaviate: Vector search engine with semantic search capabilities
- pgvector: PostgreSQL extension for vector similarity search
- FAISS: Facebook AI's efficient similarity search library
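To make the pgvector option concrete, here is a minimal sketch using psycopg2 and toy 3-dimensional vectors; the connection parameters and table name are assumptions to adapt to your own docker-compose configuration:

```python
# Store and query toy vectors with pgvector (connection details are assumptions).
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="postgres", user="postgres", password="postgres")
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute(
    "CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, content text, embedding vector(3))"
)
cur.execute(
    "INSERT INTO items (content, embedding) VALUES (%s, %s)",
    ("hello pgvector", "[0.1, 0.2, 0.3]"),
)
# Nearest-neighbour search by Euclidean distance (the <-> operator)
cur.execute(
    "SELECT content FROM items ORDER BY embedding <-> %s::vector LIMIT 3",
    ("[0.1, 0.2, 0.25]",),
)
print(cur.fetchall())
conn.commit()
```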
Large language models are the foundation of the stack, providing the core AI capabilities (a quick Ollama call follows the list):
- Llama3.3: Meta's open source LLM, known for its versatility and strong instruction following
- Mistral: Efficient and high-performance model with strong reasoning capabilities
- Gemma2: Google's lightweight yet powerful model designed for efficient deployment
- Qwen: Alibaba's open model family, with strong performance on coding and technical tasks
- Phi: Microsoft's small but capable model optimized for efficiency
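Any of these models can be called directly through Ollama's local REST API once pulled; a minimal sketch, assuming Ollama is listening on its default port:

```python
# Generate text from a locally served model via Ollama's REST API (default port 11434).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Summarize what a RAG pipeline does.", "stream": False},
)
print(resp.json()["response"])
```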
Prerequisites:
- Python 3.10+
- Node.js 18+ (for frontend components)
- Docker and Docker Compose
- Git
To set up the stack:

- Clone this repository:

    git clone https://github.com/yourusername/open-source-ai-stack.git
    cd open-source-ai-stack

- Set up the Python environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt

- Set up the model environment:

    # Install Ollama
    curl -fsSL https://ollama.com/install.sh | sh
    # Pull the models
    ollama pull llama3
    ollama pull mistral
    ollama pull gemma2
    ollama pull phi

- Start the database services:

    docker-compose up -d postgres milvus weaviate

- Set up the frontend:

    cd frontend
    npm install
    npm run dev
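Once the services are up, a quick sanity check (assuming default ports) confirms that Ollama is reachable and shows which models have been pulled:

```python
# List the models currently available to the local Ollama server.
import requests

tags = requests.get("http://localhost:11434/api/tags").json()
print([model["name"] for model in tags.get("models", [])])
```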
Combine LLMs with RAG capabilities to build a system that can answer questions based on your documents:
from langchain.llms import Ollama
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
# Load documents (DirectoryLoader relies on the `unstructured` package for PDFs)
loader = DirectoryLoader("./documents/", glob="**/*.pdf")
documents = loader.load()
# Split documents into chunks so the retrieved context fits the model's window
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)
# Create embeddings and vector store
# (nomic-embed-text-v1 needs trust_remote_code when loaded via sentence-transformers)
embeddings = HuggingFaceEmbeddings(
    model_name="nomic-ai/nomic-embed-text-v1",
    model_kwargs={"trust_remote_code": True},
)
vector_store = FAISS.from_documents(chunks, embeddings)
# Initialize LLM
llm = Ollama(model="llama3")
# Create QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever()
)
# Query the system
query = "What is the main conclusion of the research paper?"
response = qa_chain.run(query)
print(response)
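To avoid re-embedding the documents on every run, the FAISS index can be persisted and reloaded; note that recent LangChain versions may additionally require allow_dangerous_deserialization=True when loading:

```python
# Persist the FAISS index so documents are not re-embedded on every run.
vector_store.save_local("./faiss_index")
# Later, reload it with the same embedding model.
restored_store = FAISS.load_local("./faiss_index", embeddings)
```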
Build a chatbot with conversation history using LangChain and FastAPI:
from fastapi import FastAPI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.llms import Ollama
app = FastAPI()
llm = Ollama(model="mistral")
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)
@app.post("/chat")
async def chat(message: str):
    response = conversation.predict(input=message)
    return {"response": response}
Create an interactive dashboard to demonstrate AI capabilities:
import streamlit as st
from langchain.llms import Ollama
from langchain.callbacks import StreamlitCallbackHandler
st.title("Open Source AI Demo")
llm = Ollama(model="gemma2")
st_callback = StreamlitCallbackHandler(st.container())
user_input = st.text_input("Ask a question:")
if user_input:
    with st.spinner("Generating response..."):
        response = llm.generate([user_input], callbacks=[st_callback])
    st.write(response.generations[0][0].text)
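To try the dashboard, save the snippet as a file (for example app.py, an assumed name) and launch it with streamlit run app.py; Streamlit serves it at http://localhost:8501 by default.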
Some performance considerations when running the stack:
- Model Selection: Choose smaller models (Phi, Gemma2) for faster inference or larger models (Llama3.3) for more complex tasks
- Vector Database: FAISS is faster for smaller datasets, while Milvus and Weaviate scale better for large collections
- Embedding Caching: Implement caching strategies to avoid redundant embedding generation (a minimal sketch follows this list)
- Quantization: Use quantized models where possible to reduce memory footprint and increase speed
- Batching: Implement request batching for high-throughput applications
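One minimal way to implement the embedding-caching suggestion is a small on-disk cache keyed by a hash of the input text; the helper below is purely illustrative and not part of any of the listed libraries:

```python
# Illustrative on-disk embedding cache; cached_embed is a hypothetical helper.
import hashlib
import json
import os

CACHE_DIR = "./embedding_cache"
os.makedirs(CACHE_DIR, exist_ok=True)

def cached_embed(text, embed_fn):
    """Return a cached embedding for `text`, computing it with `embed_fn` on a miss."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    path = os.path.join(CACHE_DIR, f"{key}.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    vector = embed_fn(text)  # e.g. embeddings.embed_query(text) from the RAG example
    with open(path, "w") as f:
        json.dump(vector, f)
    return vector
```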
Typical deployment options:
- Local Development:
  - Ollama for local model hosting
  - Docker Compose for services
  - Next.js in development mode
- Self-Hosted Production:
  - Kubernetes with custom resource definitions
  - HuggingFace Inference Endpoints for model hosting
  - PostgreSQL with pgvector for data storage
- Hybrid Cloud:
  - Vercel for frontend hosting
  - Self-hosted models on dedicated hardware
  - Managed PostgreSQL service
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- All the open source projects and communities that make this stack possible
- Contributors and maintainers of the individual components
- The AI research community for advancing open source models
Disclaimer: This stack is provided as a reference architecture. Individual components may have their own licensing terms and requirements. Ensure you comply with all applicable licenses when using these components.