
@lmcinnes
Last active July 21, 2025 21:38
Interactive Data Map of Wikipedia
@rodighiero

I am playing with the code, but I have a problem I do not understand here:


ImportError Traceback (most recent call last)
Cell In[48], line 2
1 from toponymy import Toponymy, ToponymyClusterer, KeyphraseBuilder, ClusterLayerText
----> 2 from toponymy.llm_wrappers import AzureAI
3 from toponymy.embedding_wrappers import AzureAIEmbedder

ImportError: cannot import name 'AzureAI' from 'toponymy.llm_wrappers' (/opt/homebrew/Caskroom/miniconda/base/envs/OAPEN/lib/python3.12/site-packages/toponymy/llm_wrappers.py)

@berkidem

I think you need to have the Azure AI package installed in the active environment to be able to import its wrapper.

@rodighiero

Do you know which package I have to install in the conda environment? I tried azure-code, but I get the same error.

@lmcinnes
Author

azure-ai-inference is the package you'll need if you want to use an Azure AI Foundry model. You may have to pip install it into your conda environment.
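A minimal sketch of guarding that import, assuming the wrapper only loads when its SDK is present (the fallback message is illustrative, not part of toponymy):

```python
# Try the Azure wrapper; it needs the azure-ai-inference SDK installed in the
# same environment. We degrade gracefully instead of crashing with ImportError.
try:
    from toponymy.llm_wrappers import AzureAI  # requires azure-ai-inference
except ImportError:
    AzureAI = None  # fix with: pip install azure-ai-inference
    print("AzureAI wrapper unavailable; install azure-ai-inference first")
```

This makes the dependency explicit rather than surfacing as a bare ImportError mid-notebook.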

@rodighiero

I tried to use it, but it's a very complicated service with some limits for EU cards. Do you think it's feasible to use GPT instead?

@berkidem

berkidem commented Jun 28, 2025

There are LLM and embedding wrappers for most providers here. However, if you are planning to use the OpenAI wrappers, make sure to use the latest version of the package from the repo. I made a PR about the OpenAI wrappers a few days ago and it has been merged, but there hasn't been a release since then, so the version you would get from pip install toponymy won't work.

@lmcinnes
Author

Note that, as with Azure AI Foundry, you will need to install the relevant package to enable it within toponymy. So if you want to use OpenAI you'll need to install openai into your environment for toponymy to see it, and so on. Anthropic, Cohere, and OpenAI are all available, as well as local LLMs (assuming you have a GPU) via llama_cpp, and, in the most recent version on GitHub, vLLM.
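One way to see which wrappers your environment can support is to check for each backing SDK. The mapping below is illustrative (wrapper and module names are assumptions based on this thread, not toponymy's source):

```python
import importlib.util

# Assumed mapping of toponymy wrapper names to the SDK module each one needs;
# a wrapper is only importable when its SDK is installed in the same env.
PROVIDER_SDKS = {
    "OpenAI": "openai",
    "Anthropic": "anthropic",
    "Cohere": "cohere",
    "AzureAI": "azure.ai.inference",
    "LlamaCpp": "llama_cpp",  # local models, typically wants a GPU
}

def available_wrappers() -> list:
    """List wrapper names whose backing SDK is importable right now."""
    found = []
    for wrapper, sdk in PROVIDER_SDKS.items():
        try:
            if importlib.util.find_spec(sdk) is not None:
                found.append(wrapper)
        except ModuleNotFoundError:
            # find_spec raises for dotted names whose parent package is absent
            pass
    return found

print(available_wrappers())
```

Running this before building the Toponymy pipeline tells you which providers you still need to pip install.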

Note also that async/batch versions of the service wrappers are now available, so you may want to consider using those instead as they will be faster. Just prefix the wrapper name with Async to get that to work, so for example AsyncOpenAI etc.

@rodighiero

I integrated Toponymy with OpenAI following your suggestion and used AsyncOpenAI, which works well for embeddings and the initial clustering. However, during the topic naming step I’m hitting a BadRequestError when naming clusters with very large keyphrase sets. It seems some prompts exceed the API input limits.

What would you recommend as the best solution? Should I patch make_prompts() to truncate keyphrases per cluster (e.g. top 30–50), or is there an existing parameter or preferred way to limit the prompt size for topic naming?

The OpenAI integration otherwise works fine, with async speeding things up as expected!
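If patching turns out to be the way to go, the truncation itself is small. A hedged sketch, assuming keyphrases are ordered most-relevant-first (the function name and call site are hypothetical, not toponymy's make_prompts API):

```python
def truncate_keyphrases(cluster_keyphrases, max_per_cluster=40):
    """Keep at most max_per_cluster keyphrases for each cluster so the
    topic-naming prompt stays under the API's input limit."""
    return [phrases[:max_per_cluster] for phrases in cluster_keyphrases]

clusters = [["wiki", "encyclopedia", "article"], ["gpu", "cuda"]]
print(truncate_keyphrases(clusters, max_per_cluster=2))
```

Applied just before prompt construction, this bounds prompt size per cluster without touching the clustering itself.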

@rodighiero

It works pretty nicely now, but I couldn't recreate the interface. It seems the package has been updated and some functions no longer exist. Does anyone have thoughts on this?
