@wolfecameron
Last active April 4, 2025 07:12
import torch
from transformers import AutoTokenizer
# load the Llama-3.1 tokenizer (a gated repo: requires accepting the license and authenticating)
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-3.1-8B')
# raw text
text = "This raw text will be tokenized"
# create tokens using tokenizer
tokens = tokenizer.tokenize(text)
token_ids = tokenizer.convert_tokens_to_ids(tokens)
# token_ids = tokenizer.encode(text) # directly create token ids
# view the results
print("Original Text:", text)
print("Tokens:", tokens)
print("Token IDs:", token_ids)
# create token embedding layer
# Llama-3.1 uses a vocabulary of 128,256 (128,000 BPE tokens + 256 special tokens);
# the embedding dimension below is illustrative (the real 8B model uses 4096)
VOCABULARY_SIZE: int = 128256
EMBEDDING_DIM: int = 768
token_embedding_layer = torch.nn.Embedding(
    num_embeddings=VOCABULARY_SIZE,
    embedding_dim=EMBEDDING_DIM,
)
# get token embeddings (IDs must be passed as a tensor, not a list)
token_emb = token_embedding_layer(torch.tensor(token_ids))
print(f'Token Embeddings Shape: {token_emb.shape}')
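As a sanity check that requires no model download or Hugging Face access, the embedding-lookup step can be reproduced with a toy vocabulary (the sizes and IDs below are arbitrary, chosen only to illustrate the shapes):

```python
import torch

# toy settings (arbitrary; real tokenizers define these for you)
TOY_VOCAB_SIZE = 100
TOY_EMBEDDING_DIM = 8

toy_embedding = torch.nn.Embedding(
    num_embeddings=TOY_VOCAB_SIZE,
    embedding_dim=TOY_EMBEDDING_DIM,
)

# pretend these IDs came from a tokenizer; each must be < TOY_VOCAB_SIZE
toy_ids = torch.tensor([3, 41, 7, 99])

# the embedding layer maps each ID to a learned vector,
# so the output has one row of size TOY_EMBEDDING_DIM per token
toy_emb = toy_embedding(toy_ids)
print(toy_emb.shape)  # torch.Size([4, 8])
```

The same pattern holds for the Llama example above: the output shape is always `(num_tokens, embedding_dim)`.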
@ChanduTadanki

I get the following error:

HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/config.json
The above exception was the direct cause of the following exception:
GatedRepoError

What am I missing?
Note: I reviewed https://huggingface.co/meta-llama/Llama-3.1-8B?library=transformers, but it did not help much; the same error continues.
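The `401 Unauthorized` / `GatedRepoError` here indicates the request reached Hugging Face without valid credentials for a gated repo. Typically this means two things are needed: accepting the Llama license on the model page, and authenticating the local environment with a user access token. A minimal sketch (the token value is, of course, your own):

```shell
# install the Hugging Face Hub CLI
pip install -U "huggingface_hub[cli]"

# log in with a token created at https://huggingface.co/settings/tokens;
# this stores the token locally so transformers can use it
huggingface-cli login
```

Even with a valid token, access still requires that the license request on the meta-llama/Llama-3.1-8B model page has been approved for that account.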
