@balachandrapai
Created March 9, 2018 06:52
NLP: Named Entity Recognition
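```python
# Requires the NLTK data packages for the state_union corpus, the Punkt
# tokenizer, the POS tagger and the NE chunker (install them with
# nltk.download(); exact package names vary between NLTK versions).
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer

# Train a Punkt sentence tokenizer on the 2005 State of the Union address,
# then use it to split the 2006 address into sentences.
train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt")

custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
tokenized = custom_sent_tokenizer.tokenize(sample_text)


def process_content():
    try:
        # Skip the first five sentences and process the rest.
        for sentence in tokenized[5:]:
            words = nltk.word_tokenize(sentence)
            tagged = nltk.pos_tag(words)
            # ne_chunk is NLTK's built-in named-entity chunker.
            # binary=True labels each chunk only as "NE": a token span either
            # is a named entity or it is not, with no type such as PERSON or GPE.
            namedEnt = nltk.ne_chunk(tagged, binary=True)
            # Opens a Tk window showing the chunk tree, one per sentence.
            namedEnt.draw()
    except Exception as e:
        print(e)


process_content()
```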
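The script only draws each tree. To collect entities as data instead, one option is to flatten each `ne_chunk` tree into (word, POS tag, IOB tag) triples with `nltk.chunk.tree2conlltags` and group consecutive tokens into spans. The sketch below is plain Python so it runs without NLTK's data files; the helper `entities_from_iob` and the `sample` triples are hypothetical, hand-written to look like tagger output rather than taken from the 2006 address. The `binary` flag shows what `binary=True` throws away: the entity type.

```python
from typing import List, Optional, Tuple

# Hypothetical input: (word, pos_tag, iob_tag) triples in the shape that
# nltk.chunk.tree2conlltags produces from an ne_chunk tree.
Triple = Tuple[str, str, str]


def entities_from_iob(triples: List[Triple], binary: bool = False) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (label, entity_text) spans.

    With binary=True every label is collapsed to "NE", mirroring
    ne_chunk(..., binary=True): entity or not, no further type.
    """
    spans: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_words: List[str] = []

    def flush() -> None:
        # Close the span being built (if any) and reset the buffer.
        nonlocal current_label, current_words
        if current_words:
            spans.append((current_label, " ".join(current_words)))
        current_label, current_words = None, []

    for word, _pos, iob in triples:
        if iob == "O":
            flush()
            continue
        prefix, _, label = iob.partition("-")
        if binary:
            label = "NE"
        # A "B-" tag, or an "I-" tag with a different label, starts a new span.
        if prefix == "B" or label != current_label:
            flush()
            current_label = label
        current_words.append(word)
    flush()
    return spans


# Hand-written example triples (hypothetical, typed labels as with binary=False).
sample = [
    ("President", "NNP", "O"),
    ("George", "NNP", "B-PERSON"),
    ("W.", "NNP", "I-PERSON"),
    ("Bush", "NNP", "I-PERSON"),
    ("visited", "VBD", "O"),
    ("Washington", "NNP", "B-GPE"),
    (".", ".", "O"),
]

print(entities_from_iob(sample))               # typed spans
print(entities_from_iob(sample, binary=True))  # every span labelled "NE"
```

Inside the original loop, `entities_from_iob(nltk.chunk.tree2conlltags(namedEnt))` could stand in for `namedEnt.draw()`, which also avoids opening a window for every sentence.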