Last active
July 22, 2025 19:53
Interactive Data Map of Wikipedia
It works pretty nicely now, but I couldn't recreate the interface. It seems the package has been updated and some functions no longer exist. Does anyone have any thoughts on this?
Yes, sorry, things are under reasonably active development and I haven't had time to update this gist. You'll want enable_topic_tree=True instead of enable_table_of_contents=True for newer versions of datamapplot.
I integrated Toponymy with OpenAI following your suggestion and used AsyncOpenAI, which works well for embeddings and the initial clustering. However, during the topic naming step I’m hitting a BadRequestError when naming clusters with very large keyphrase sets. It seems some prompts exceed the API input limits.
What would you recommend as the best solution? Should I patch make_prompts() to truncate keyphrases per cluster (e.g. top 30–50), or is there an existing parameter or preferred way to limit the prompt size for topic naming?
The OpenAI integration otherwise works fine, with async speeding things up as expected!
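A minimal sketch of the truncation idea, assuming the keyphrases for each cluster are available as a list of strings already ordered from most to least relevant. The function name `truncate_keyphrases` and its parameters are hypothetical and not part of Toponymy's API; where it would hook into prompt construction is left open. The character budget is only a rough stand-in for a token limit.

```python
from typing import List, Sequence


def truncate_keyphrases(
    keyphrases_per_cluster: Sequence[Sequence[str]],
    max_keyphrases: int = 40,   # hypothetical default, within the 30-50 range discussed
    max_chars: int = 2000,      # hypothetical rough proxy for a token budget
) -> List[List[str]]:
    """Keep the leading keyphrases of each cluster, within a count and character budget."""
    truncated = []
    for keyphrases in keyphrases_per_cluster:
        kept = []
        used_chars = 0
        # Assumes the input is ranked, so slicing keeps the strongest keyphrases.
        for phrase in keyphrases[:max_keyphrases]:
            cost = len(phrase) + 2  # +2 for a ", " separator in the prompt
            # Always keep at least one phrase, even if it alone exceeds the budget.
            if kept and used_chars + cost > max_chars:
                break
            kept.append(phrase)
            used_chars += cost
        truncated.append(kept)
    return truncated
```

Applying something like this before the prompts are built would cap the size of every cluster's prompt, at the cost of dropping lower-ranked keyphrases from very large clusters.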