@kaustumbh7
Last active March 3, 2024 19:37
#!/usr/bin/env python
# coding: utf8
# Training additional entity types using spaCy
from __future__ import unicode_literals, print_function
import pickle
import plac
import random
from pathlib import Path
import spacy
from spacy.util import minibatch, compounding
# New entity labels
# Specify the new entity labels which you want to add here
LABEL = ['I-geo', 'B-geo', 'I-art', 'B-art', 'B-tim', 'B-nat', 'B-eve', 'O', 'I-per', 'I-tim', 'I-nat', 'I-eve', 'B-per', 'I-org', 'B-gpe', 'B-org', 'I-gpe']
"""
geo = Geographical Entity
org = Organization
per = Person
gpe = Geopolitical Entity
tim = Time indicator
art = Artifact
eve = Event
nat = Natural Phenomenon
"""
# Loading training data
with open('Data/ner_corpus_260', 'rb') as fp:
    TRAIN_DATA = pickle.load(fp)
@plac.annotations(
    model=("Model name. Defaults to blank 'en' model.", "option", "m", str),
    new_model_name=("New model name for model meta.", "option", "nm", str),
    output_dir=("Optional output directory", "option", "o", Path),
    n_iter=("Number of training iterations", "option", "n", int))
def main(model=None, new_model_name='new_model', output_dir=None, n_iter=10):
    """Setting up the pipeline and entity recognizer, and training the new entity."""
    if model is not None:
        nlp = spacy.load(model)  # load existing spacy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank('en')  # create blank Language class
        print("Created blank 'en' model")
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner)
    else:
        ner = nlp.get_pipe('ner')
    for i in LABEL:
        ner.add_label(i)  # Add new entity labels to entity recognizer
    if model is None:
        optimizer = nlp.begin_training()
    else:
        optimizer = nlp.entity.create_optimizer()
    # Get names of other pipes to disable them during training to train only NER
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        for itn in range(n_iter):
            random.shuffle(TRAIN_DATA)
            losses = {}
            batches = minibatch(TRAIN_DATA, size=compounding(4., 32., 1.001))
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=0.35,
                           losses=losses)
            print('Losses', losses)
    # Test the trained model
    test_text = 'Gianni Infantino is the president of FIFA.'
    doc = nlp(test_text)
    print("Entities in '%s'" % test_text)
    for ent in doc.ents:
        print(ent.label_, ent.text)
    # Save model
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.meta['name'] = new_model_name  # rename model
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)
        # Test the saved model
        print("Loading from", output_dir)
        nlp2 = spacy.load(output_dir)
        doc2 = nlp2(test_text)
        for ent in doc2.ents:
            print(ent.label_, ent.text)


if __name__ == '__main__':
    plac.call(main)
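
The script unpacks each batch with `zip(*batch)` into texts and annotations, so the pickled `TRAIN_DATA` is presumably a list of `(text, annotations)` pairs in the spaCy v2 offset format, `{"entities": [(start, end, label)]}`. The sketch below shows one way such a list could be built and pickled. It is only an illustration: `make_example` is a hypothetical helper, and the labels `per` and `org` are taken from the legend above for readability, while the gist's own data appears to use the prefixed tags from `LABEL` (for example `B-per`).

```python
import pickle


def make_example(text, spans):
    """Build one (text, annotations) pair from (substring, label) pairs.

    Hypothetical helper: character offsets are looked up with str.find,
    so each substring is matched at its first occurrence only.
    """
    entities = []
    for substring, label in spans:
        start = text.find(substring)
        if start == -1:
            raise ValueError("%r not found in %r" % (substring, text))
        entities.append((start, start + len(substring), label))
    return (text, {"entities": entities})


example = make_example(
    "Gianni Infantino is the president of FIFA.",
    [("Gianni Infantino", "per"), ("FIFA", "org")],
)
train_data = [example]

# Round-trip through pickle, mirroring how the script loads its data.
restored = pickle.loads(pickle.dumps(train_data))
print(restored[0])
```

To write the list to a file the script can read, `pickle.dump(train_data, fp)` with a file opened in `'wb'` mode would be the counterpart of the `pickle.load(fp)` call above.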
@kaustumbh7
Author

Warning: Unnamed vectors

Hey David,
I guess that might be an issue with the spacy version. Please install spacy==2.0.18 and try again.

@Z-e-e

Z-e-e commented Apr 28, 2020

@kaustumbh7 I followed the process you outlined, and when I run this script I get:

TypeError: object of type 'NoneType' has no len()

@kaustumbh7
Author

kaustumbh7 commented May 17, 2020

@Z-e-e Hello, make sure that you have initialized the LABEL list correctly.

@YagzanManjunath

@kaustumbh7 I am trying to understand the training process. I am using the exact same dataset that you specified here, and I am still getting this error: "TypeError: object of type 'NoneType' has no len()".
Can you help me figure out where I am going wrong?

@YagzanManjunath

> @kaustumbh7 I am trying to understand the training process. I am using the exact same dataset that you specified here, and I am still getting this error: "TypeError: object of type 'NoneType' has no len()".
> Can you help me figure out where I am going wrong?

Hi, the issue was that I had empty text values in the training batch. I filtered them out while creating the data in spaCy format and it works like a charm now. Thanks anyway. This is really helpful code :)
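
The fix described above, dropping examples whose text is empty, can be sketched as follows. This is a minimal sketch that assumes `TRAIN_DATA` is a list of `(text, annotations)` pairs, as the script's `zip(*batch)` unpacking suggests; `drop_empty_examples` is a hypothetical helper and not part of the gist or of spaCy.

```python
def drop_empty_examples(train_data):
    """Return only the (text, annotations) pairs whose text is non-empty.

    Texts that are None, empty, or whitespace-only are skipped.
    """
    cleaned = []
    for text, annotations in train_data:
        if text and text.strip():
            cleaned.append((text, annotations))
    return cleaned


# Illustrative data only.
raw = [
    ("FIFA is based in Zurich.", {"entities": [(0, 4, "org")]}),
    ("", {"entities": []}),
    (None, {"entities": []}),
    ("   ", {"entities": []}),
]
print(len(drop_empty_examples(raw)))
```

In the script, this could be applied once, right after `TRAIN_DATA` is loaded from the pickle file.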

@mgrove6

mgrove6 commented Jul 7, 2020

Hi,
While running this code I get the error 'KeyError: "[E022] Could not find a transition with the name 'U-Tag' in the NER model."'
Could you please let me know the solution, or what could be causing this error? I followed your steps for creating the training dataset, and I am using the same dataset as yours.
Thank you.

@sherinasundarrajan123

> Hi,
> While running this code I get the error 'KeyError: "[E022] Could not find a transition with the name 'U-Tag' in the NER model."'
> Could you please let me know the solution, or what could be causing this error? I followed your steps for creating the training dataset, and I am using the same dataset as yours.
> Thank you.

Hi @mgrove6,
You have to include every label used in your training data in the LABEL list (line 16) of your code. The error is basically saying that the training data contains a label the model does not recognize as a valid custom label. To make the label (U-Tag) valid, add it to the LABEL list on line 16 of the code above.

@Sn3hangshu

Hey, this article was very helpful for beginners. I am now able to train on the data. Thanks.

But how do I test without training every time?
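
The script already saves the trained pipeline with `nlp.to_disk(output_dir)` when the `-o` option is given, and then reloads it with `spacy.load(output_dir)`. A minimal sketch of reusing that saved model without retraining is shown below. The function names and the directory `my_ner_model` are hypothetical; the only spaCy calls used are the ones that already appear in the script.

```python
def extract_entities(nlp, text):
    """Run a loaded pipeline on text and return (label, text) pairs."""
    doc = nlp(text)
    return [(ent.label_, ent.text) for ent in doc.ents]


def entities_from_saved_model(model_dir, text):
    """Load a pipeline previously saved with nlp.to_disk and tag text."""
    import spacy  # imported lazily; requires spaCy to be installed

    nlp = spacy.load(model_dir)
    return extract_entities(nlp, text)


# Example call, assuming training was run once with "-o my_ner_model":
# print(entities_from_saved_model("my_ner_model",
#                                 "Gianni Infantino is the president of FIFA."))
```

When many texts need to be tagged, loading the model once and calling `extract_entities` repeatedly avoids reloading it from disk for every text.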

@Sn3hangshu

Sn3hangshu commented Jul 26, 2020

> > Hi,
> > While running this code I get the error 'KeyError: "[E022] Could not find a transition with the name 'U-Tag' in the NER model."'
> > Could you please let me know the solution, or what could be causing this error? I followed your steps for creating the training dataset, and I am using the same dataset as yours.
> > Thank you.
>
> Hi @mgrove6,
> You have to include every label used in your training data in the LABEL list (line 16) of your code. The error is basically saying that the training data contains a label the model does not recognize as a valid custom label. To make the label (U-Tag) valid, add it to the LABEL list on line 16 of the code above.

Hi, I was able to fix this. I added
ner.add_label("Tag")

at line 51.
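
The fix reported above adds the bare label "Tag" rather than "U-Tag", which suggests that the "U-" part is a tagging-scheme prefix in the transition name and not part of the label itself. One way to avoid this class of error is to compare the labels used in the training data with the `LABEL` list before training. The sketch below assumes the `(text, {"entities": [(start, end, label)]})` layout; both helper functions are hypothetical and the sample data is illustrative only.

```python
def labels_in_training_data(train_data):
    """Collect every entity label that appears in the training data."""
    labels = set()
    for _text, annotations in train_data:
        for _start, _end, label in annotations.get("entities", []):
            labels.add(label)
    return labels


def missing_labels(train_data, known_labels):
    """Return labels used in the data but absent from known_labels, sorted."""
    return sorted(labels_in_training_data(train_data) - set(known_labels))


# Illustrative data: "Tag" is used but not declared.
sample = [
    ("Gianni Infantino is the president of FIFA.",
     {"entities": [(0, 16, "per"), (37, 41, "Tag")]}),
]
print(missing_labels(sample, ["per", "org"]))  # ['Tag']
```

In the script, each label returned by `labels_in_training_data(TRAIN_DATA)` could be passed to `ner.add_label` in place of the hand-maintained `LABEL` list.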
