I integrated Toponymy with OpenAI following your suggestion and used AsyncOpenAI, which works well for embeddings and the initial clustering. However, during the topic naming step I’m hitting a BadRequestError when naming clusters with very large keyphrase sets. It seems some prompts exceed the API input limits.
What would you recommend as the best solution? Should I patch make_prompts() to truncate keyphrases per cluster (e.g. top 30–50), or is there an existing parameter or preferred way to limit the prompt size for topic naming?
The OpenAI integration otherwise works fine, with async speeding things up as expected!
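For concreteness, the kind of patch I have in mind is roughly this (my own sketch; `truncate_keyphrases` is a hypothetical helper, not part of Toponymy, and it assumes each cluster's keyphrases are already sorted by relevance):

```python
# Hypothetical workaround (not a Toponymy API): cap keyphrases per cluster
# before they reach the prompt builder, so prompts stay under the model's
# input limit.
def truncate_keyphrases(keyphrases_per_cluster, max_per_cluster=50):
    """Keep only the first max_per_cluster keyphrases for each cluster,
    assuming each cluster's list is sorted most-relevant first."""
    return [phrases[:max_per_cluster] for phrases in keyphrases_per_cluster]
```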
It works pretty nicely now, but I couldn't recreate the interface. It seems the package has been updated and some functions no longer exist. Does anyone have any thoughts on this?
Yes, sorry, things are under reasonably active development and I haven't had time to update this gist. You'll want `enable_topic_tree=True` instead of `enable_table_of_contents=True` for newer versions of datamapplot.
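For example, a minimal sketch (the toy data stands in for your real map coordinates and Toponymy labels; double-check argument names against your installed datamapplot version):

```python
import numpy as np
import datamapplot

# Toy data: in practice these come from your embedding/clustering pipeline.
document_map = np.random.normal(size=(100, 2))          # 2D data map coordinates
labels = np.array(["topic A"] * 50 + ["topic B"] * 50)  # one label layer

# Newer datamapplot: enable_topic_tree replaces enable_table_of_contents.
plot = datamapplot.create_interactive_plot(
    document_map,
    labels,
    enable_topic_tree=True,
)
plot
```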
Note that, as with Azure AI Foundry, you will need to install the relevant package to enable it within Toponymy. So if you want to use OpenAI then you'll need to install `openai` into your environment for Toponymy to see it, and so on. Anthropic, Cohere, and OpenAI are all available, as well as local LLMs (assuming you have a GPU) via `llama_cpp`, and, in the most recent version on GitHub, vLLM. Note also that, at the time the gist was written, the async/batch versions of the service wrappers weren't available, so the gist doesn't use them; you may want to consider using those instead as they will be faster. Just prefix the wrapper name with `Async`, so for example `AsyncOpenAI`, etc.
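A minimal sketch of the swap (the import path and constructor arguments are assumptions based on the synchronous wrapper; check them against your installed version):

```python
# Sketch only: assumes the async wrapper mirrors the synchronous OpenAI
# wrapper and lives in toponymy.llm_wrappers; verify against your version.
from toponymy.llm_wrappers import AsyncOpenAI

llm_wrapper = AsyncOpenAI(api_key="sk-...")  # exact arguments may differ
# Pass llm_wrapper to Toponymy exactly as you would the synchronous wrapper.
```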