@lmcinnes
Last active July 22, 2025 19:53
Interactive Data Map of Wikipedia
@rodighiero

I integrated Toponymy with OpenAI following your suggestion and used AsyncOpenAI, which works well for embeddings and the initial clustering. However, during the topic naming step I’m hitting a BadRequestError when naming clusters with very large keyphrase sets. It seems some prompts exceed the API input limits.

What would you recommend as the best solution? Should I patch make_prompts() to truncate keyphrases per cluster (e.g. top 30–50), or is there an existing parameter or preferred way to limit the prompt size for topic naming?

The OpenAI integration otherwise works fine, with async speeding things up as expected!
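The truncation idea in the question could be sketched roughly as follows. This is a standalone helper, not part of Toponymy: the function name `truncate_keyphrases`, the `max_phrases` and `max_chars` parameters, and their defaults are all hypothetical, and the character budget is only a crude stand-in for the API's real token limit. It also assumes each cluster's keyphrases are already ordered from most to least relevant.

```python
from typing import List, Sequence


def truncate_keyphrases(
    keyphrases: Sequence[str],
    max_phrases: int = 40,    # hypothetical default, in the 30-50 range from the question
    max_chars: int = 2000,    # hypothetical character budget; a rough proxy for tokens
) -> List[str]:
    """Keep the leading keyphrases, capped by count and by total characters.

    Assumes `keyphrases` is sorted from most to least relevant, so dropping
    the tail loses the least informative phrases.
    """
    kept: List[str] = []
    used = 0
    for phrase in keyphrases[:max_phrases]:
        cost = len(phrase) + 2  # +2 approximates a ", " separator in the prompt
        if used + cost > max_chars:
            break
        kept.append(phrase)
        used += cost
    return kept


# Example: a very large cluster is reduced before any prompt is built from it.
big_cluster = [f"keyphrase {i}" for i in range(500)]
trimmed = truncate_keyphrases(big_cluster, max_phrases=30)
print(len(trimmed))
```

Applying something like this to each cluster's keyphrase list before the prompts are built avoids modifying the prompt template itself. Where exactly to hook it in depends on the installed Toponymy version.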

@rodighiero

It works pretty nicely now, but I couldn't recreate the interface. It seems the package has been updated and some functions no longer exist. Does anyone have thoughts on this?

@lmcinnes
Author

Yes, sorry, things are under reasonably active development and I haven't had time to update this gist. You'll want enable_topic_tree=True instead of enable_table_of_contents=True for newer versions of datamapplot.
