The following diagram represents the stack as a layer cake architecture, with uniform layers stacked on top of each other:
%%{init: {'theme': 'forest'}}%%
flowchart TD
%% Layer styling with distinct colors for each layer
classDef frontendLayer fill:#e6f7ff,stroke:#1890ff,stroke-width:4px,color:#0050b3,font-weight:bold
classDef ragLayer fill:#f6ffed,stroke:#52c41a,stroke-width:4px,color:#135200,font-weight:bold
classDef backendLayer fill:#fff2e8,stroke:#fa8c16,stroke-width:4px,color:#873800,font-weight:bold
classDef dataLayer fill:#f9f0ff,stroke:#722ed1,stroke-width:4px,color:#391085,font-weight:bold
classDef llmLayer fill:#fff1f0,stroke:#f5222d,stroke-width:4px,color:#a8071a,font-weight:bold
%% Define simple cake layers with equal width
Frontend["Frontend Layer\nNext.js | Vercel | Streamlit"]
RAG["Embeddings & RAG Libraries\nNomic | Cognita | LLMware | JinaAI"]
Backend["Backend & Model Access\nLangChain | Metaflow | Hugging Face | FastAPI | Ollama"]
Data["Data & Retrieval Layer\nPostgreSQL | Milvus | Weaviate | pgvector | FAISS"]
LLM["Large Language Models\nLlama3.3 | Mistral | Gemma2 | Owen | Phi"]
%% Stack layers vertically like a cake
Frontend --- RAG
RAG --- Backend
Backend --- Data
Data --- LLM
%% Apply styles
class Frontend frontendLayer
class RAG ragLayer
class Backend backendLayer
class Data dataLayer
class LLM llmLayer
A second, more detailed diagram breaks each layer down into its individual components:
%%{init: {'theme': 'forest'}}%%
flowchart TB
%% Layer styling with distinct colors for each layer
classDef frontendLayer fill:#e6f7ff,stroke:#1890ff,stroke-width:3px,color:#0050b3,font-weight:bold
classDef ragLayer fill:#f6ffed,stroke:#52c41a,stroke-width:3px,color:#135200,font-weight:bold
classDef backendLayer fill:#fff2e8,stroke:#fa8c16,stroke-width:3px,color:#873800,font-weight:bold
classDef dataLayer fill:#f9f0ff,stroke:#722ed1,stroke-width:3px,color:#391085,font-weight:bold
classDef llmLayer fill:#fff1f0,stroke:#f5222d,stroke-width:3px,color:#a8071a,font-weight:bold
classDef component fill:#ffffff,stroke:#d9d9d9,stroke-width:1px,color:#262626
%% Create subgraphs for each layer with clear boundaries
subgraph Layer1[Frontend]
Frontend[Frontend Applications] --> NextJS["Next.js"]
Frontend --> Vercel["Vercel"]
Frontend --> Streamlit["Streamlit"]
end
subgraph Layer2[Embeddings & RAG Libraries]
RAG[Embeddings and RAG Libraries] --> Nomic["Nomic"]
RAG --> Cognita["Cognita"]
RAG --> LLMware["LLMware"]
RAG --> JinaAI["JinaAI"]
end
subgraph Layer3[Backend & Model Access]
Backend[Backend and Model Access] --> LangChain["LangChain"]
Backend --> Metaflow["Netflix Metaflow"]
Backend --> HuggingFace["Hugging Face"]
Backend --> FastAPI["FastAPI"]
Backend --> Ollama["Ollama"]
end
subgraph Layer4[Data & Retrieval]
DR[Data and Retrieval] --> Postgres["PostgreSQL"]
DR --> Milvus["Milvus"]
DR --> Weaviate["Weaviate"]
DR --> PGVector["pgvector"]
DR --> FAISS["FAISS"]
end
subgraph Layer5[Large Language Models]
LLM[Large Language Models] --> Llama3["Llama3.3"]
LLM --> Mistral["Mistral"]
LLM --> Gemma2["Gemma2"]
LLM --> Qwen["Qwen"]
LLM --> Phi["Phi"]
end
%% Connections between layers with thicker, more visible lines
Layer1 ==> Layer2
Layer2 ==> Layer3
Layer3 ==> Layer4
Layer3 ==> Layer5
Layer4 ==> Layer5
%% Apply styles to layers
class Layer1 frontendLayer
class Layer2 ragLayer
class Layer3 backendLayer
class Layer4 dataLayer
class Layer5 llmLayer
%% Apply styles to components
class Frontend frontendLayer
class RAG ragLayer
class Backend backendLayer
class DR dataLayer
class LLM llmLayer
class NextJS,Vercel,Streamlit,Nomic,Cognita,LLMware,JinaAI,LangChain,Metaflow,HuggingFace,FastAPI,Ollama,Postgres,Milvus,Weaviate,PGVector,FAISS,Llama3,Mistral,Gemma2,Qwen,Phi component
The frontend layer covers user interfaces and deployment platforms:
- Next.js: React framework for building web applications
- Vercel: Platform for frontend deployment and hosting
- Streamlit: Framework for quickly creating data apps and ML interfaces
The embeddings and RAG layer provides tools for creating embeddings and implementing Retrieval-Augmented Generation (a minimal example follows the list):
- Nomic: Library for creating and managing embeddings
- Cognita: Tools for building knowledge-intensive applications
- LLMware: Framework for RAG applications and enterprise data integration
- JinaAI: Neural search framework for multimodal data
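As a minimal, illustrative sketch (assuming Ollama is running locally on its default port and the nomic-embed-text model has been pulled with ollama pull nomic-embed-text), embeddings can be requested over Ollama's REST API:

```python
# Request an embedding from a locally running Ollama instance.
import requests

resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "Open source AI stacks are composable."},
)
embedding = resp.json()["embedding"]  # list of floats
print(len(embedding))  # nomic-embed-text returns 768-dimensional vectors
```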
The backend layer provides frameworks and tools for interacting with models and orchestrating AI workflows (a short Hugging Face example follows the list):
- LangChain: Framework for developing applications with LLMs through composable components
- Netflix Metaflow: Data science framework for managing workflows and deployments
- Hugging Face: Hub for accessing and fine-tuning models
- FastAPI: High-performance API framework for Python
- Ollama: Tool for running LLMs locally
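For the Hugging Face piece in particular, a transformers pipeline is a quick way to pull a model from the Hub and generate text locally; this is only a sketch, and the model name is an illustrative choice rather than a requirement of the stack:

```python
# Pull a small causal LM from the Hugging Face Hub and generate text locally.
from transformers import pipeline

generator = pipeline("text-generation", model="microsoft/phi-2")  # illustrative model choice
result = generator("Open source AI stacks are useful because", max_new_tokens=40)
print(result[0]["generated_text"])
```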
The data and retrieval layer offers storage and vector search solutions for managing embeddings and knowledge bases (a pgvector sketch follows the list):
- PostgreSQL: Relational database for structured data storage
- Milvus: Scalable vector database optimized for similarity search
- Weaviate: Vector search engine with semantic search capabilities
- pgvector: PostgreSQL extension for vector similarity search
- FAISS: Facebook AI's efficient similarity search library
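To make the pgvector option concrete, here is a minimal sketch using psycopg2 and toy 3-dimensional vectors; the connection parameters and table name are assumptions to adapt to your own docker-compose configuration:

```python
# Store and query toy vectors with pgvector (connection details are assumptions).
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="postgres", user="postgres", password="postgres")
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute(
    "CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, content text, embedding vector(3))"
)
cur.execute(
    "INSERT INTO items (content, embedding) VALUES (%s, %s)",
    ("hello pgvector", "[0.1, 0.2, 0.3]"),
)
# Nearest-neighbour search by Euclidean distance (the <-> operator)
cur.execute(
    "SELECT content FROM items ORDER BY embedding <-> %s::vector LIMIT 3",
    ("[0.1, 0.2, 0.25]",),
)
print(cur.fetchall())
conn.commit()
```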
Large language models are the foundation of the stack, providing the core AI capabilities (a quick Ollama call follows the list):
- Llama3.3: Meta's open source LLM, known for its versatility and strong instruction following
- Mistral: Efficient and high-performance model with strong reasoning capabilities
- Gemma2: Google's lightweight yet powerful model designed for efficient deployment
- Qwen: Alibaba's open model family, with strong performance on coding and technical tasks
- Phi: Microsoft's small but capable model optimized for efficiency
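Any of these models can be called directly through Ollama's local REST API once pulled; a minimal sketch, assuming Ollama is listening on its default port:

```python
# Generate text from a locally served model via Ollama's REST API (default port 11434).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Summarize what a RAG pipeline does.", "stream": False},
)
print(resp.json()["response"])
```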
Prerequisites:
- Python 3.10+
- Node.js 18+ (for frontend components)
- Docker and Docker Compose
- Git
To set up the stack:

- Clone this repository:

    git clone https://github.com/yourusername/open-source-ai-stack.git
    cd open-source-ai-stack

- Set up the Python environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt

- Set up the model environment:

    # Install Ollama
    curl -fsSL https://ollama.com/install.sh | sh
    # Pull the models
    ollama pull llama3
    ollama pull mistral
    ollama pull gemma2
    ollama pull phi

- Start the database services:

    docker-compose up -d postgres milvus weaviate

- Set up the frontend:

    cd frontend
    npm install
    npm run dev
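Once the services are up, a quick sanity check (assuming default ports) confirms that Ollama is reachable and shows which models have been pulled:

```python
# List the models currently available to the local Ollama server.
import requests

tags = requests.get("http://localhost:11434/api/tags").json()
print([model["name"] for model in tags.get("models", [])])
```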
Combine LLMs with RAG capabilities to build a system that can answer questions based on your documents:
from langchain.llms import Ollama
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
# Load documents (DirectoryLoader relies on the `unstructured` package for PDFs)
loader = DirectoryLoader("./documents/", glob="**/*.pdf")
documents = loader.load()
# Split documents into chunks so the retrieved context fits the model's window
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)
# Create embeddings and vector store
# (nomic-embed-text-v1 needs trust_remote_code when loaded via sentence-transformers)
embeddings = HuggingFaceEmbeddings(
    model_name="nomic-ai/nomic-embed-text-v1",
    model_kwargs={"trust_remote_code": True},
)
vector_store = FAISS.from_documents(chunks, embeddings)
# Initialize LLM
llm = Ollama(model="llama3")
# Create QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever()
)
# Query the system
query = "What is the main conclusion of the research paper?"
response = qa_chain.run(query)
print(response)
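To avoid re-embedding the documents on every run, the FAISS index can be persisted and reloaded; note that recent LangChain versions may additionally require allow_dangerous_deserialization=True when loading:

```python
# Persist the FAISS index so documents are not re-embedded on every run.
vector_store.save_local("./faiss_index")
# Later, reload it with the same embedding model.
restored_store = FAISS.load_local("./faiss_index", embeddings)
```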
Build a chatbot with conversation history using LangChain and FastAPI:
from fastapi import FastAPI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.llms import Ollama
app = FastAPI()
llm = Ollama(model="mistral")
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)
@app.post("/chat")
async def chat(message: str):
    response = conversation.predict(input=message)
    return {"response": response}
Create an interactive dashboard to demonstrate AI capabilities:
import streamlit as st
from langchain.llms import Ollama
from langchain.callbacks import StreamlitCallbackHandler
st.title("Open Source AI Demo")
llm = Ollama(model="gemma2")
st_callback = StreamlitCallbackHandler(st.container())
user_input = st.text_input("Ask a question:")
if user_input:
    with st.spinner("Generating response..."):
        response = llm.generate([user_input], callbacks=[st_callback])
    st.write(response.generations[0][0].text)
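To try the dashboard, save the snippet as a file (for example app.py, an assumed name) and launch it with streamlit run app.py; Streamlit serves it at http://localhost:8501 by default.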
Some performance considerations when running the stack:
- Model Selection: Choose smaller models (Phi, Gemma2) for faster inference or larger models (Llama3.3) for more complex tasks
- Vector Database: FAISS is faster for smaller datasets, while Milvus and Weaviate scale better for large collections
- Embedding Caching: Implement caching strategies to avoid redundant embedding generation (a minimal sketch follows this list)
- Quantization: Use quantized models where possible to reduce memory footprint and increase speed
- Batching: Implement request batching for high-throughput applications
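One minimal way to implement the embedding-caching suggestion is a small on-disk cache keyed by a hash of the input text; the helper below is purely illustrative and not part of any of the listed libraries:

```python
# Illustrative on-disk embedding cache; cached_embed is a hypothetical helper.
import hashlib
import json
import os

CACHE_DIR = "./embedding_cache"
os.makedirs(CACHE_DIR, exist_ok=True)

def cached_embed(text, embed_fn):
    """Return a cached embedding for `text`, computing it with `embed_fn` on a miss."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    path = os.path.join(CACHE_DIR, f"{key}.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    vector = embed_fn(text)  # e.g. embeddings.embed_query(text) from the RAG example
    with open(path, "w") as f:
        json.dump(vector, f)
    return vector
```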
Typical deployment options:
- Local Development:
  - Ollama for local model hosting
  - Docker Compose for services
  - Next.js in development mode
- Self-Hosted Production:
  - Kubernetes with custom resource definitions
  - HuggingFace Inference Endpoints for model hosting
  - PostgreSQL with pgvector for data storage
- Hybrid Cloud:
  - Vercel for frontend hosting
  - Self-hosted models on dedicated hardware
  - Managed PostgreSQL service
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- All the open source projects and communities that make this stack possible
- Contributors and maintainers of the individual components
- The AI research community for advancing open source models
Disclaimer: This stack is provided as a reference architecture. Individual components may have their own licensing terms and requirements. Ensure you comply with all applicable licenses when using these components.