#####1. Shallow Parsers for different languages#####
- Description - POS tagging and chunking for Gujarati, Odia, Hindi, Bengali, Marathi and Telugu (an individual project for each language)
- We will implement several supervised algorithms, including CRF, HMM, MaxEnt and SVM, some semi-supervised classification methods, and finally an unsupervised one. If time permits, we will also try to implement a morph analyzer. Students need to annotate data, understand the challenges, and compare the results given by multiple algorithms.
- Mentor: Pruthwik M
- Students for each language - 1-2
- Prerequisites - Data structures, knowledge of POS tagging, basics of machine learning
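HMM is one of the supervised algorithms listed above. As a rough illustration of how an HMM tagger decodes a sentence, here is a minimal Viterbi sketch with add-one smoothing; the romanized Hindi training sentences and tag names are illustrative assumptions, not project data.

```python
import math
from collections import Counter, defaultdict

def train_hmm(tagged_sents):
    """Count tag transitions and word emissions from (word, tag) sentences."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sent in tagged_sents:
        prev = "<s>"
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit

def log_prob(counter, key, size):
    """Add-one smoothed log probability of key under a count distribution."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + size))

def viterbi(words, trans, emit):
    tags = list(emit)
    vocab = {w for c in emit.values() for w in c} | set(words)
    # best[i][t] = (log score of best path ending in tag t at position i, previous tag)
    best = [{t: (log_prob(trans["<s>"], t, len(tags)) +
                 log_prob(emit[t], words[0], len(vocab)), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max((best[i - 1][p][0] + log_prob(trans[p], t, len(tags)), p)
                              for p in tags)
            column[t] = (score + log_prob(emit[t], words[i], len(vocab)), prev)
        best.append(column)
    # Trace back from the highest-scoring final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

train = [[("raam", "NNP"), ("ghar", "NN"), ("gayaa", "VM")],
         [("siitaa", "NNP"), ("ghar", "NN"), ("gayii", "VM")]]
trans, emit = train_hmm(train)
print(viterbi(["siitaa", "ghar", "gayaa"], trans, emit))  # ['NNP', 'NN', 'VM']
```

CRF, MaxEnt and SVM taggers would replace the generative scores with learned feature weights, but the comparison across algorithms can reuse the same train/evaluate loop.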
#####2. Implementing a dependency parser#####
- Description - Students need to create a CPG-based dependency parser by exploring different tools and resources. Advanced techniques that exploit a large monolingual corpus can also be applied.
- Faculty - Prof. Dipti Misra Sharma
- Mentor - Aniruddh Tammewar
- Students - 2-3
- Requirements - NLP, Machine Learning, Python.
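One common way to build a data-driven dependency parser is a transition system. The sketch below is a swapped-in illustration, not the CPG method itself: a static oracle for the arc-standard system that turns gold heads into the SHIFT / LEFT-ARC / RIGHT-ARC sequence a classifier would learn to predict. Labels (e.g. karaka relations) are omitted, and the example sentence is hypothetical.

```python
def oracle(heads):
    """heads[i] is the gold head of token i (tokens 1..n, 0 = root); heads[0] is unused."""
    n = len(heads) - 1
    stack, buffer = [0], list(range(1, n + 1))
    attached = {}  # dependent -> head, filled in as arcs are built
    transitions = []
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            s0, s1 = stack[-1], stack[-2]
            # LEFT-ARC: second-from-top takes the top as its head.
            if s1 != 0 and heads[s1] == s0:
                transitions.append("LEFT-ARC")
                attached[s1] = s0
                stack.pop(-2)
                continue
            # RIGHT-ARC: top takes second-from-top as head, once all its dependents are attached.
            done = all(d in attached for d in range(1, n + 1) if heads[d] == s0)
            if heads[s0] == s1 and done:
                transitions.append("RIGHT-ARC")
                attached[s0] = s1
                stack.pop()
                continue
        if not buffer:
            raise ValueError("no valid transition (non-projective or ill-formed tree)")
        transitions.append("SHIFT")
        stack.append(buffer.pop(0))
    return transitions, attached

# "raam ghar gayaa": both nouns attach to the verb, which attaches to the root.
print(oracle([None, 3, 3, 0]))
```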
#####3. Semantic Role Labeling using Parsing.#####
- Description - Semantic Role Labeling is the task of automatically identifying the arguments of a verb in a sentence and then classifying them with semantic labels, also known as PropBank labels. Presently, the Hindi and Urdu PropBanks are built on top of HDT and UDT respectively, and this project aims at building robust statistical semantic role labelers for both languages. Furthermore, PropBank features can be used to improve parsing, and vice versa.
- Faculty - Prof. Dipti Misra Sharma
- Mentor - Maaz Anwar
- Students - 2-3
- Requirements - NLP, Machine Learning, Python.
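Since the PropBanks sit on top of dependency treebanks, a natural first step is to treat each dependent of a predicate as a candidate argument and extract features for a classifier. The sketch below assumes a simplified token format (id, form, head, deprel) and illustrative karaka labels; the feature set is hypothetical and a classifier would still need to map these features to PropBank labels.

```python
from typing import NamedTuple

class Token(NamedTuple):
    id: int        # 1-based position; assumed to match list order
    form: str
    head: int      # 0 marks the root
    deprel: str    # dependency (karaka) label, illustrative

def argument_candidates(tokens, predicate_id):
    """Yield a feature dict for every direct dependent of the predicate."""
    pred = tokens[predicate_id - 1]
    for tok in tokens:
        if tok.head == predicate_id:
            yield {
                "predicate": pred.form,
                "word": tok.form,
                "deprel": tok.deprel,
                "position": "before" if tok.id < predicate_id else "after",
            }

# Hypothetical parse of "raam kelaa khaataa" ('Ram eats a banana').
tokens = [Token(1, "raam", 3, "k1"), Token(2, "kelaa", 3, "k2"), Token(3, "khaataa", 0, "main")]
for features in argument_candidates(tokens, 3):
    print(features)
```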
#####4. Handling idiomatic expressions using Grammatical Framework (GF) for the English-Hindi language pair#####
- Description :: English and Hindi have different sets of idioms. An idiom in one language might correspond to an idiom in the other, or have a different meaning altogether. The project involves mapping idioms from one language to the other by creating abstract definitions and then mapping these semantics to the available idioms or literal forms.
- Faculty :: Dr. Soma Paul
- Mentor :: Prateek Saxena and Shastri V.
- Students :: 3-4
- Resources to be read by summer school students :: Resources that give an idea of existing as well as the latest 'linguistic' and 'NLP' tools, GF slides
- Skills :: NLP, GF Programming
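GF itself separates an abstract syntax (shared meanings) from per-language concrete syntaxes. The Python stand-in below only mimics that idea with dictionaries, so it is not GF code: each hypothetical abstract meaning is linearized as an idiom where one exists, or as a literal form otherwise, and translation is parse-then-linearize.

```python
# Hypothetical abstract meanings (GF abstract functions in a real grammar), each with
# one concrete linearization per language: an idiom where available, else a literal form.
CONCRETE = {
    "Eng": {
        "Die": "kick the bucket",
        "Deceive": "pull the wool over someone's eyes",
        "RainHeavily": "rain cats and dogs",
    },
    "Hin": {
        "Die": "chal basnaa",                      # idiom: 'to pass away'
        "Deceive": "aankhon mein dhool jhonknaa",  # idiom: 'to throw dust in the eyes'
        "RainHeavily": "bahut tez baarish honaa",  # literal form: 'to rain very heavily'
    },
}

# Every language must cover the same abstract meanings.
assert CONCRETE["Eng"].keys() == CONCRETE["Hin"].keys()

def parse(phrase, lang):
    """Map a surface phrase back to its abstract meaning (analogous to GF parsing)."""
    for meaning, surface in CONCRETE[lang].items():
        if surface == phrase:
            return meaning
    raise KeyError(f"no abstract meaning for {phrase!r} in {lang}")

def linearize(meaning, lang):
    """Render an abstract meaning in a language (analogous to GF linearization)."""
    return CONCRETE[lang][meaning]

def translate(phrase, src, tgt):
    return linearize(parse(phrase, src), tgt)

print(translate("kick the bucket", "Eng", "Hin"))  # chal basnaa
```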
#####5. Preparing linguistic resources using GF on Android#####
- Description :: (Description not finalized; information not available.)
- Resources :: The GF app for Android named "GF offline translator". Students can install it and experiment with it.
- Faculty :: Dr. Soma Paul
- Mentor :: Ayushi Agrawal and Shivani Pathak
#####6. Generating English-Hindi Word aligned corpora using existing NLP resources#####
- Description :: The existing word alignment algorithm produces word-aligned output from the output of two tools, namely Anusaaraka and a phrase table (generated by the SMT tool Phrasal). This algorithm will be provided to the interns, who have to test, evaluate and improve it. The output of two more NLP tools (parsers), a parser using the ERG (English Resource Grammar) and a Hindi parser, also has to be integrated. Interns will have to propose solutions for this and help accomplish this ongoing task by improving the existing algorithm.
- Faculty :: Dr. Soma Paul
- Mentor :: Ayushi Agrawal and Shivani Pathak
- Students :: 3-4
- Resources to be read by summer school students :: Resources that give an idea of existing as well as the latest 'linguistic' and 'NLP' tools.
- Skills :: NLP, Programming
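The existing algorithm is not described here, so the following is only a hypothetical baseline for combining several tools: each tool proposes (English index, Hindi index) links, and a link is kept when enough tools agree. The tool names and links are made up for illustration.

```python
from collections import Counter

def combine_alignments(alignments, min_votes):
    """Keep (english_index, hindi_index) links proposed by at least min_votes tools."""
    votes = Counter(link for links in alignments.values() for link in set(links))
    return sorted(link for link, n in votes.items() if n >= min_votes)

# Hypothetical link sets from different tools for one sentence pair.
alignments = {
    "anusaaraka": [(0, 0), (1, 2), (2, 1)],
    "phrase_table": [(0, 0), (1, 2)],
    "parser": [(0, 0), (2, 1), (2, 3)],
}
print(combine_alignments(alignments, 2))  # [(0, 0), (1, 2), (2, 1)]
```

Raising `min_votes` trades recall for precision; weighting each tool by its measured accuracy would be a natural next step when evaluating the integrated parsers.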
####Dialogue Processing
#####7. Syntactic and semantic processing for NLIDB in Telugu/Hindi#####
- Description: An NLIDB (Natural Language Interface to Databases) is a system that translates a natural language query into an SQL query. Syntactic and semantic processing of the NL query is essential for the NLIDB system to translate it into an SQL query. This project aims at understanding the architecture of an NLIDB system in the CPG framework, and involves developing syntactic and semantic modules for an NLIDB system in Telugu/Hindi.
- Students: 3-4
- Skills: NLP, MySQL, Programming (Python).
- Mentors: Ashish P
- Description: In this project, students need to develop a dialog state tracking algorithm to track the act/state shift in the dialogue.
- Data: Will be taken from the Spoken Dialog Challenge, which consists of human/machine spoken dialogs with real users.
- Language: English, Hindi
- Detailed description: http://research.microsoft.com/pubs/198681/dstc2013.pdf
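A dialog state tracker maintains a belief over slot values as SLU hypotheses arrive turn by turn. Below is a minimal sketch of one simple baseline rule (an assumption here, not the challenge's prescribed method): old belief decays by the SLU mass observed this turn, and the new hypotheses are added. The slot name and values are hypothetical.

```python
def track(turns):
    """turns: list of dicts mapping slot -> {value: SLU confidence} for each turn.

    Update per slot: b_t(v) = (1 - sum of this turn's confidences) * b_{t-1}(v) + s_t(v),
    assuming each turn's confidences for a slot sum to at most 1.
    """
    belief = {}
    for slu in turns:
        for slot, hyps in slu.items():
            keep = 1.0 - sum(hyps.values())
            new = {v: keep * p for v, p in belief.get(slot, {}).items()}
            for v, p in hyps.items():
                new[v] = new.get(v, 0.0) + p
            belief[slot] = new
    return belief

turns = [{"route": {"61c": 0.5, "61a": 0.3}},  # turn 1: ambiguous recognition
         {"route": {"61a": 0.8}}]              # turn 2: user repeats "61a"
belief = track(turns)
print(max(belief["route"], key=belief["route"].get))  # 61a
```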
#####8. Discourse Argument Identification from Dependency Structure and Argument Span Selection.
- Description :: From the given Hindi dependency output, identify the explicit discourse connectives and the span of their arguments. Implicit discourse connections are not handled as a part of this project.
- Faculty :: Prof. Dipti Misra Sharma
- Mentor :: Rohit and Vignesh
- Students :: 2
- Resources to be read by summer school students :: Research papers
- Skills :: NLP, Programming, Linguistic analysis
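As a starting point for argument identification, a heuristic over the dependency tree can be tried before any learning. The sketch assumes a hypothetical attachment convention (the connective attaches to the verb of its own clause, which attaches to the main clause verb): Arg2 is the connective's clause subtree, Arg1 the rest of the governing clause. The lexicon and sentence are illustrative.

```python
CONNECTIVES = {"kyonki", "lekin", "isliye"}  # tiny illustrative lexicon (romanized)

def subtree(heads, root):
    """All token ids dominated by root, including root; heads[0] is unused."""
    nodes, frontier = {root}, [root]
    while frontier:
        h = frontier.pop()
        for d in range(1, len(heads)):
            if heads[d] == h and d not in nodes:
                nodes.add(d)
                frontier.append(d)
    return nodes

def find_arguments(words, heads):
    """Return (connective, arg1_ids, arg2_ids) for each explicit connective."""
    results = []
    for i in range(1, len(words)):
        if words[i] in CONNECTIVES and heads[i] != 0:
            clause = subtree(heads, heads[i])
            arg2 = clause - {i}
            arg1 = subtree(heads, heads[heads[i]]) - clause
            results.append((words[i], sorted(arg1), sorted(arg2)))
    return results

# "raam ghar gayaa kyonki vah thakaa thaa" ('Ram went home because he was tired')
words = ["<root>", "raam", "ghar", "gayaa", "kyonki", "vah", "thakaa", "thaa"]
heads = [None, 3, 3, 0, 6, 6, 3, 6]
print(find_arguments(words, heads))  # [('kyonki', [1, 2, 3], [5, 6, 7])]
```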
#####9. Discourse Sense Identification from Sentential features and creating relation hierarchies based on Senses
- Description :: From the given Hindi dependency output and sentential features, identify the sense of each explicit discourse relation, and create relation hierarchies based on these senses. Implicit discourse connections are not handled as a part of this project.
- Faculty :: Prof. Dipti Misra Sharma
- Mentor :: Rohit and Vignesh
- Students :: 2
- Resources to be read by summer school students :: Research papers
- Skills :: NLP, Programming, Linguistic analysis
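A common baseline for sense identification is to predict each connective's most frequent sense in training data. The sketch below assumes PDTB-style dotted sense labels (Class.Type.Subtype) and shows how the same labels can be nested into a hierarchy and cut to a coarser level; the training pairs are hypothetical.

```python
from collections import Counter, defaultdict

def train_majority(examples):
    """examples: (connective, dotted_sense) pairs -> most frequent sense per connective."""
    counts = defaultdict(Counter)
    for conn, sense in examples:
        counts[conn][sense] += 1
    return {conn: c.most_common(1)[0][0] for conn, c in counts.items()}

def at_level(sense, level):
    """Cut a dotted sense path to its first `level` levels."""
    return ".".join(sense.split(".")[:level])

def build_hierarchy(senses):
    """Nest dotted sense labels into a tree of dicts."""
    tree = {}
    for sense in senses:
        node = tree
        for part in sense.split("."):
            node = node.setdefault(part, {})
    return tree

examples = [("kyonki", "Contingency.Cause.Reason"),
            ("kyonki", "Contingency.Cause.Reason"),
            ("lekin", "Comparison.Contrast"),
            ("isliye", "Contingency.Cause.Result")]
model = train_majority(examples)
print(model["kyonki"], at_level(model["lekin"], 1))
print(build_hierarchy(s for _, s in examples))
```

Sentential features (e.g. verb tense or clause order) would be added on top of this baseline to disambiguate connectives that carry more than one sense.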
#####10. Sentence-level semantic similarity by Karaka cluster classification in semantic vector space
- Description :: Given two sentences of text, s1 and s2, the system needs to determine how similar s1 and s2 are, returning a similarity score and an optional confidence score. The annotations and systems will use a scale from 0 (no relation) to 5 (semantic equivalence) to indicate the similarity between the two sentences.
- Faculty :: Prof. Dipti Misra Sharma
- Mentor :: Darshan A
- Students :: 1
- Skills :: NLP, Programming, Linguistic analysis, Moses, C++, Boost.
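A simple karaka-aware baseline (not the cluster classification itself) compares the words that fill the same karaka role in both sentences and maps the result onto the 0-5 scale. The sketch assumes each sentence is already reduced to a role-to-vector dict; the roles, vectors and coverage penalty are illustrative assumptions, and no confidence score is produced.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity(s1, s2):
    """s1, s2: dicts mapping a karaka role (e.g. k1, k2) to the vector of the word filling it."""
    shared = s1.keys() & s2.keys()
    if not shared:
        return 0.0
    avg = sum(cosine(s1[r], s2[r]) for r in shared) / len(shared)
    # Penalize roles present in only one sentence, then map [0, 1] onto the 0-5 scale.
    coverage = len(shared) / len(s1.keys() | s2.keys())
    return 5.0 * max(0.0, avg) * coverage

s1 = {"k1": [1.0, 0.0], "k2": [0.0, 1.0]}
s2 = {"k1": [1.0, 0.0], "k2": [0.0, 1.0]}
s3 = {"k1": [0.0, 1.0]}
print(similarity(s1, s2), similarity(s1, s3))  # 5.0 0.0
```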
#####11. Integrating SMT into the ILMT system
- Description :: The student needs to understand existing SMT systems such as Moses and modular MT systems such as ILMT, and work towards improving the ILMT system by implementing various Moses feature functions.
- Faculty :: Prof. Dipti Misra Sharma, Dr. Manish Shrivastava
- Mentor :: Saumitra
- Students :: 1
- Skills :: NLP, Programming, Linguistic analysis, Moses, C++
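Moses scores candidate translations with a log-linear model: a weighted sum of feature-function values, with weights usually tuned by MERT. The sketch below shows only that scoring rule; the feature names, weights and probabilities are hypothetical and are not Moses's own configuration names.

```python
import math

def loglinear_score(features, weights):
    """Model score: sum over features of weight_i * h_i(e, f)."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical weights and feature values for two English candidates of "raam ghar gayaa".
weights = {"lm": 0.5, "tm": 0.3, "word_count": -0.2}
candidates = {
    "ram went home": {"lm": math.log(0.02), "tm": math.log(0.4), "word_count": 3},
    "ram home went": {"lm": math.log(0.001), "tm": math.log(0.5), "word_count": 3},
}
best = max(candidates, key=lambda e: loglinear_score(candidates[e], weights))
print(best)  # ram went home
```

A new feature function from the ILMT side would simply contribute one more `h_i` value per candidate, with its weight tuned alongside the others.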
#####12. Multilingual Question Answering on Google
- Description :: This project aims at building a simple web-scale QA system that uses Google search results for answer extraction, with the candidate answers ranked by a designed algorithm. It has four main steps:
   1. Question Classification: a Li-Roth based classifier using SVM assigns the question both a coarse-grained and a fine-grained class.
   2. Answer Retrieval: the complete user question is sent to Google through the Google Web Search API, which returns a maximum of 8 results per page request.
   3. Phrase & Named Entity Extraction: noun phrases are extracted from the search result content using the NLTK chunker, and named entities are then extracted using the Stanford Named Entity Recognizer.
   4. Answer Extraction & Ranking (to be implemented): all the different noun phrases are extracted and their entity types compared with the answer type of the question. The matched noun phrases are ranked by how frequently they occur across the different search results, and the highest-ranked ones are returned to the user.
- Faculty :: Prof. Manish Shrivastava, Manoj Chinnakotla
- Mentor :: Harish Yenala, Avinash Kamineni, Abhishek Kannan, Teja
- Students :: 4-5
- Skills :: Basic idea of NLTK and its usage, Python
- Resources to be read :: Li-Roth question classification paper, the SVM algorithm, knowledge of chunkers and NER, papers on "QA on unstructured web content"
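Step 4 (answer extraction and ranking) can be sketched as follows, assuming steps 2 and 3 already produced, for each search result, a list of (noun phrase, entity type) pairs, and assuming the question's answer type has been mapped to an NER type (e.g. a Li-Roth person class to `PERSON`; this mapping is a hypothetical choice). Each phrase is counted at most once per result, so the score is the number of results it appears in.

```python
from collections import Counter

def rank_answers(results, answer_type, top_k=3):
    """results: one list per search result of (noun_phrase, entity_type) pairs."""
    counts = Counter()
    for phrases in results:
        # Count each matching phrase at most once per search result.
        for phrase in {p for p, etype in phrases if etype == answer_type}:
            counts[phrase] += 1
    return [p for p, _ in counts.most_common(top_k)]

# Hypothetical extracted phrases for "Who wrote Gitanjali?"
results = [
    [("Rabindranath Tagore", "PERSON"), ("Kolkata", "LOCATION")],
    [("Tagore", "PERSON"), ("Rabindranath Tagore", "PERSON")],
    [("Rabindranath Tagore", "PERSON"), ("W. B. Yeats", "PERSON")],
]
print(rank_answers(results, "PERSON")[0])  # Rabindranath Tagore
```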
#####13. Speech recognition using Sphinx.
- Description :: Speech recognition is the conversion of speech to text. In this project, students will implement Hidden Markov Model (HMM) based speech recognition using MFCC features, with the Sphinx toolkit used for the implementation.
- Faculty :: Prof. Anil Kumar Vuppala
- Mentor :: A. Raju
- Students :: 3-4
- Resources to be read by summer school students :: Research papers
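Sphinx handles the acoustic modelling, but the core HMM computation is easy to see in isolation. The sketch below is a toy isolated-word recognizer: each word has a discrete HMM over hypothetical vector-quantized MFCC symbols, and the forward algorithm picks the word whose model gives the highest likelihood. Sphinx itself uses continuous-density models, and real sequences need log-space or scaled probabilities to avoid underflow.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | HMM) via the forward algorithm.

    start[s], trans[r][s], emit[s][symbol] are probabilities; obs is a list of symbols
    (here: hypothetical codebook indices of quantized MFCC frames).
    """
    states = range(len(start))
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in states) * emit[s][o] for s in states]
    return sum(alpha)

def recognize(obs, models):
    """Pick the word whose HMM assigns the observations the highest likelihood."""
    return max(models, key=lambda w: forward_likelihood(obs, *models[w]))

LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
models = {  # two-state left-to-right toy models for 'haan' (yes) and 'naa' (no)
    "haan": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),
    "naa": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),
}
print(recognize([0, 0, 1, 1], models))  # haan
```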
#####14. Speaker recognition using GMM.
- Description :: Speaker recognition is the identification of a speaker from their speech. In this project, students will implement Gaussian Mixture Model (GMM) based speaker identification using MFCC features.
- Faculty :: Prof. Anil Kumar Vuppala
- Mentor :: V. Raju
- Students :: 3-4
- Resources to be read by summer school students :: Research papers
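Once a GMM has been trained per speaker (typically with EM, not shown), identification reduces to scoring the test frames against each model. A minimal sketch with diagonal covariances and made-up 2-D "MFCC" frames and parameters:

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, gmm):
    """gmm: list of (weight, mean, var) components; combined with log-sum-exp."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)
    return top + math.log(sum(math.exp(lp - top) for lp in logs))

def identify_speaker(frames, speakers):
    """Return the speaker whose GMM gives the highest average per-frame log-likelihood."""
    return max(speakers, key=lambda s: sum(gmm_log_likelihood(f, speakers[s])
                                           for f in frames) / len(frames))

speakers = {  # hypothetical trained models
    "A": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "B": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
print(identify_speaker([[0.1, 0.2], [0.9, 1.1]], speakers))  # A
```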
#####15. Prosody modification of speech.
- Description :: Prosody refers to the supra-segmental features of speech, namely energy, duration and pitch. In this project, students will implement prosody modification, i.e. changing pitch values, duration, etc., using the SOLA (Synchronized Overlap-Add) technique.
- Faculty :: Prof. Anil Kumar Vuppala
- Mentor :: Hari Krishna
- Students :: 3-4
- Resources to be read by summer school students :: Research papers
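SOLA changes duration by re-spacing overlapping frames: frames taken every `hop` samples are placed roughly every `alpha * hop` samples, each shifted within a small search window to the offset of maximum normalized cross-correlation with the output so far, then cross-faded in. A pure-Python sketch; the frame, hop and search sizes are illustrative defaults, not values from the project. Pitch modification is commonly obtained by combining such time-scaling with resampling.

```python
import math

def sola(x, alpha, frame=400, hop=100, search=50):
    """Time-stretch x by factor alpha (>1 makes it longer) using SOLA.

    frame: frame length; hop: analysis hop; synthesis hop = alpha * hop;
    search: largest shift (in samples) tried when aligning each new frame.
    """
    syn_hop = int(round(alpha * hop))
    y = list(x[:frame])
    m = 1
    while m * hop + frame <= len(x):
        seg = x[m * hop: m * hop + frame]
        best_k, best_c = 0, -math.inf
        for k in range(-search, search + 1):
            start = m * syn_hop + k
            overlap = len(y) - start
            if start < 0 or overlap <= 0 or overlap > frame:
                continue
            a, b = y[start:], seg[:overlap]
            num = sum(p * q for p, q in zip(a, b))
            den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b)) or 1.0
            if num / den > best_c:
                best_k, best_c = k, num / den
        if best_c == -math.inf:
            break  # no valid overlap for these parameters
        start = m * syn_hop + best_k
        overlap = len(y) - start
        # Linear cross-fade over the overlap, then append the rest of the frame.
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)
            y[start + i] = (1 - w) * y[start + i] + w * seg[i]
        y.extend(seg[overlap:])
        m += 1
    return y

x = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(4000)]  # 0.5 s of a 200 Hz tone at 8 kHz
print(len(x), len(sola(x, 1.5)))
```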
- Description: In this project, students need to develop an ensemble method that combines embeddings produced by GloVe and word2vec with structured knowledge from the semantic networks ConceptNet and/or PPDB and/or WordNet, merging their information into a common representation with a large, multilingual vocabulary.
- Data: Monolingual corpus
- Language: Hindi, English
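One well-known way to inject knowledge from a semantic network into pretrained vectors is retrofitting (Faruqui et al., 2015), shown here as a swapped-in illustration rather than the project's prescribed ensemble. Each vector is repeatedly pulled toward its lexicon neighbours while staying anchored to its original value. The vectors and lexicon are toy assumptions.

```python
def retrofit(vectors, lexicon, iterations=10, alpha=1.0):
    """vectors: word -> list of floats (e.g. GloVe or word2vec).
    lexicon: word -> related words (e.g. WordNet synonyms or PPDB paraphrases).

    Update: q_i = (alpha * original_i + sum_j beta * q_j) / (alpha + sum_j beta),
    with beta = 1 / (number of neighbours of i found in the vocabulary).
    """
    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            nbrs = [n for n in neighbours if n in new]
            if word not in new or not nbrs:
                continue
            beta = 1.0 / len(nbrs)
            dim = len(vectors[word])
            total = [alpha * vectors[word][d] for d in range(dim)]
            for n in nbrs:
                for d in range(dim):
                    total[d] += beta * new[n][d]
            new[word] = [t / (alpha + beta * len(nbrs)) for t in total]
    return new

vectors = {"big": [1.0, 0.0], "large": [0.0, 1.0], "cat": [5.0, 5.0]}
lexicon = {"big": ["large"], "large": ["big"]}
print(retrofit(vectors, lexicon))
```

Words absent from the lexicon (like `cat`) keep their original vectors; extending the vocabulary to words that only occur in the semantic network would require an extra step not shown here.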
- Description: In this project, students need to develop a system that seeks to locate and classify elements in text into predefined named entity categories using Deep Neural Networks
- Description: In this project, we will attempt to extract synonyms from a raw corpus in a supervised fashion with word2vec.
- Prerequisites: Comfort with coding in Python (most essential), and at least a basic understanding of neural networks and clustering algorithms.
- Data/Languages: Monolingual Corpus/Hindi and English
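One possible reading of "supervised" synonym extraction: use labeled word pairs to learn a cosine-similarity cut-off over word2vec vectors, then propose as synonyms the words above that cut-off. The sketch below assumes that reading; the 2-D vectors and labeled pairs are hypothetical stand-ins for trained embeddings and annotations.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_threshold(pairs, vectors):
    """Choose the cosine cut-off that maximizes F1 on labeled (w1, w2, is_synonym) pairs."""
    scored = [(cosine(vectors[a], vectors[b]), label) for a, b, label in pairs]
    best_t, best_f1 = 0.0, -1.0
    for t, _ in scored:
        tp = sum(1 for s, y in scored if s >= t and y)
        fp = sum(1 for s, y in scored if s >= t and not y)
        fn = sum(1 for s, y in scored if s < t and y)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

def synonyms(word, vectors, threshold):
    """Words whose cosine similarity to `word` reaches the learned threshold."""
    return sorted(w for w in vectors
                  if w != word and cosine(vectors[word], vectors[w]) >= threshold)

vectors = {"sundar": [1.0, 0.1], "khoobsurat": [0.9, 0.2],
           "ghar": [0.1, 1.0], "makaan": [0.2, 0.9]}
pairs = [("sundar", "khoobsurat", True), ("ghar", "makaan", True),
         ("sundar", "ghar", False), ("khoobsurat", "makaan", False)]
t, f1 = best_threshold(pairs, vectors)
print(f1, synonyms("sundar", vectors, t))  # 1.0 ['khoobsurat']
```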