moarshy · July 24, 2024 11:45
diff --git a/local_rag.ipynb b/local_rag.ipynb
 {
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download Ollama: https://ollama.com/\n",
    "# ollama pull mxbai-embed-large (or any other embedding model)\n",
    "# ollama pull gemma (or any other LLM)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install langchain langchain_ollama langchain_chroma langchain_community pymupdf langchain-text-splitters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We use the following papers: https://arxiv.org/abs/2310.10436 (EconAgent) and https://arxiv.org/abs/2304.03442 (Generative Agents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.document_loaders import PyMuPDFLoader\n",
    "from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
    "from langchain_ollama import OllamaEmbeddings\n",
    "from langchain_chroma import Chroma\n",
    "from langchain_ollama import ChatOllama\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "\n",
    "from IPython.display import display, Markdown"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# PyMuPDFLoader returns one Document per PDF page\n",
    "loader1 = PyMuPDFLoader('./pdfs/EconAgent- Large Language Model-Empowered Agents for Simulating Macroeconomic Activities.pdf')\n",
    "loader2 = PyMuPDFLoader('./pdfs/Generative Agents- Interactive Simulacra of Human Behavior.pdf')\n",
    "\n",
    "pdf1 = loader1.load()\n",
    "pdf2 = loader2.load()\n",
    "\n",
    "# Combine the pages of both papers into a single list\n",
    "pdf = pdf1 + pdf2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "379"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "text_splitter = RecursiveCharacterTextSplitter(\n",
    "    chunk_size=600,\n",
    "    chunk_overlap=100,\n",
    "    length_function=len,\n",
    ")\n",
    "\n",
    "chunks = text_splitter.split_documents(pdf)\n",
    "len(chunks)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating embeddings and storing them in Chroma"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Embed every chunk with the local Ollama model and persist the vectors to ./chroma_db\n",
    "embedding_model = OllamaEmbeddings(model=\"mxbai-embed-large\")\n",
    "db = Chroma.from_documents(chunks, embedding_model, persist_directory=\"./chroma_db\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "planning, and decision-making. In this work, we\n",
      "design EconAgent, a LLM-empowered agent with\n",
      "human-like characteristics for macroeconomic sim-\n",
      "ulations. We first construct a simulation environ-\n",
      "ment that includes labor and consumption market\n",
      "dynamics driven by agents’ decisions on working\n",
      "and consumption, as well as fiscal and monetary\n",
      "policies. Using a perception module that targets\n",
      "agent profiles and mirrors real-world economic sit-\n",
      "uations, we create heterogeneous agents automat-\n",
      "ically exhibiting different decision-making mech-\n",
      "anisms. Additionally, we model the influence of\n",
      "\n",
      "overlooked in decision-making processes. In\n",
      "this work, we introduce EconAgent, a large\n",
      "language model-empowered agent with human-\n",
      "like characteristics for macroeconomic simu-\n",
      "lation. We first construct a simulation envi-\n",
      "ronment that incorporates various market dy-\n",
      "namics driven by agents’ decisions regarding\n",
      "work and consumption. Through the perception\n",
      "module, we create heterogeneous agents with\n",
      "distinct decision-making mechanisms.\n",
      "Fur-\n",
      "thermore, we model the impact of macroeco-\n",
      "nomic trends using a memory module, which\n",
      "allows agents to reflect on past individual ex-\n"
     ]
     }
    ],
    "source": [
     "r_docs = db.similarity_search('what is economics agent', k=5)\n",
     "\n",
     "# Inspect the two highest-ranked chunks\n",
     "print(r_docs[0].page_content)\n",
     "print()\n",
     "print(r_docs[1].page_content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### RAG using the created vector db"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = ChatOllama(\n",
    "    model=\"gemma\",\n",
    "    temperature=0,\n",
    "    # other params...\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "USER_PROMPT = \"\"\"Here is a question: {question}\\n\\nHere are the retrieved documents:\\n\\n{documents}\\n\\nPlease answer the question based on the documents.\\n\\n\"\"\"\n",
    "\n",
    "messages = [\n",
     "    ('system', 'You are an AI assistant answering questions about retrieved documents. Only answer questions for which you can find a clear answer in the documents.'),\n",
    "    ('user', USER_PROMPT),\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
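     "# Build the prompt template once rather than on every call\n",
     "prompt_template = ChatPromptTemplate.from_messages(messages)\n",
     "\n",
     "\n",
     "def prepare_r_docs(query, k=5):\n",
     "    \"\"\"Retrieve the top-k chunks for `query` and join them into one numbered string.\"\"\"\n",
     "    r_docs = db.similarity_search(query, k=k)\n",
     "    return \"\\n\\n\".join(f\"Document {i}: {doc.page_content}\" for i, doc in enumerate(r_docs))\n",
     "\n",
     "\n",
     "def answer(query):\n",
     "    \"\"\"Answer `query` with the LLM, using the retrieved chunks as context.\"\"\"\n",
     "    documents = prepare_r_docs(query)\n",
     "    prompt = prompt_template.invoke({'question': query, 'documents': documents})\n",
     "    response = llm.invoke(prompt.to_messages())\n",
     "    return response.content"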
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Based on the retrieved documents, an **economics agent** is an artificial intelligence-powered agent designed to simulate macroeconomic activities. These agents exhibit human-like characteristics and make decisions on working, consumption, and economic policies. They are equipped with:\n",
      "\n",
      "* **Perception module:** Creates heterogeneous agents with distinct decision-making mechanisms.\n",
      "* **Memory module:** Allows agents to reflect on past individual experiences and market dynamics.\n",
      "\n",
      "The documents suggest that EconAgent is a large language model-empowered agent designed for macroeconomic simulations, addressing limitations in traditional agent modeling with predetermined rules or neural networks.\n"
     ]
    }
   ],
   "source": [
    "res = answer('what are economics agent')\n",
    "Markdown(res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "**Economic simulations** are computational models that replicate the behavior of economic systems over time. These simulations involve modeling the decisions and actions of individual agents (such as consumers and workers) and their interactions within the economy. The goal is to understand the dynamics and behavior of the economy as a whole under different scenarios and policy changes."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "res = answer('what are economic simulations')\n",
    "Markdown(res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "naveen",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
 }
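The splitter cell in the notebook asks for chunks of at most 600 characters with 100 characters of overlap. The sketch below illustrates only that overlap arithmetic with a plain sliding window over a string. It is a simplified stand-in, not LangChain's `RecursiveCharacterTextSplitter`, which first tries to split on separators such as paragraph and line breaks and then merges pieces up to `chunk_size`. The helper name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 600, chunk_overlap: int = 100) -> list[str]:
    """Split `text` into fixed-size windows that share `chunk_overlap` characters.

    Hypothetical helper for illustration only: unlike RecursiveCharacterTextSplitter,
    it ignores separators and cuts purely by character position.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # with the defaults, each window starts 500 chars after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "".join(chr(ord("a") + i % 26) for i in range(1500))
chunks = sliding_window_chunks(text)
print([len(c) for c in chunks])           # [600, 600, 500]
print(chunks[0][-100:] == chunks[1][:100])  # True: consecutive windows share 100 chars
```

Overlap trades some duplicated storage for a lower chance that a sentence is split across two chunks with neither copy complete.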
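`db.similarity_search(query, k=5)` embeds the query with the same model used for the chunks and returns the five nearest stored chunks. The following is a conceptual sketch of that nearest-neighbour step in plain Python, using cosine similarity over tiny hand-made vectors. The vectors, the chunk ids, and the helper names `cosine_similarity` and `top_k` are invented for illustration; Chroma's own index and default distance metric may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated instead of dividing by zero
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 5) -> list[str]:
    """Return the ids of the k stored vectors most similar to query_vec, best first."""
    ranked = sorted(store, key=lambda doc_id: cosine_similarity(query_vec, store[doc_id]), reverse=True)
    return ranked[:k]


# Toy 3-dimensional "embeddings" (hypothetical); real mxbai-embed-large vectors are much larger.
store = {
    "econagent_abstract": [0.9, 0.1, 0.0],
    "econagent_intro": [0.8, 0.2, 0.1],
    "generative_agents_memory": [0.1, 0.9, 0.2],
}
query = [1.0, 0.0, 0.0]
print(top_k(query, store, k=2))  # ['econagent_abstract', 'econagent_intro']
```

Scoring every stored vector, as `top_k` does, is fine for the 379 chunks in this notebook; larger stores usually rely on approximate nearest-neighbour indexes instead.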
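`answer` places the numbered chunks returned by `prepare_r_docs` into `USER_PROMPT` before calling the model. To see exactly what text the LLM receives, the sketch below reproduces the same formatting with plain `str.format`, without LangChain or Ollama. The sample chunks are placeholders and `format_documents` is a hypothetical stand-in for `prepare_r_docs`; in the notebook, `ChatPromptTemplate` additionally wraps the filled text in a system message and a user message.

```python
# Same template text as USER_PROMPT in the notebook.
USER_PROMPT = (
    "Here is a question: {question}\n\n"
    "Here are the retrieved documents:\n\n{documents}\n\n"
    "Please answer the question based on the documents.\n\n"
)


def format_documents(chunks: list[str]) -> str:
    """Number each retrieved chunk from 0 and separate them with blank lines, like prepare_r_docs."""
    return "\n\n".join(f"Document {i}: {text}" for i, text in enumerate(chunks))


# Placeholder chunks standing in for db.similarity_search results.
chunks = ["EconAgent is an LLM-empowered agent.", "Agents decide on work and consumption."]
prompt = USER_PROMPT.format(question="what are economic agents", documents=format_documents(chunks))
print(prompt)
```

Printing the filled template is a cheap way to check which chunks the model actually sees before judging its answer.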