IASNLP 2016 Project list

LTRC, IIIT-Hyderabad

####Treebanking

#####1. Shallow Parsers for different languages#####

Description - POS tagging and chunking for Gujarati, Odia, Hindi, Bengali, Marathi, and Telugu (an individual project for each language).
We will implement several supervised algorithms, including CRF, HMM, MaxEnt, and SVM, then some semi-supervised classification methods, and finally an unsupervised one. We will try to implement a morph analyzer if time permits. Students need to annotate data, understand the challenges, and compare the results given by multiple algorithms.
Mentor - Pruthwik M
Students - 1 or 2 for each language
Prerequisites - Data structures, knowledge of POS tagging, basics of Machine Learning
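
As an illustration of one of the supervised methods listed above, here is a minimal sketch of a bigram HMM tagger with Viterbi decoding. The toy corpus, the transliterated words, the tag names, and the add-one smoothing are all assumptions made for the example; a real project would train on the annotated data for each language.

```python
import math
from collections import Counter, defaultdict

def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)
    emissions = defaultdict(Counter)
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo tag
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, transitions, emissions, vocab):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    tags = list(emissions)
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unseen words
    # best[i][tag] = (log score of the best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions["<s>"], tag, n_tags)
                 + log_prob(emissions[tag], words[0], n_words))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = log_prob(emissions[tag], words[i], n_words)
            best[i][tag] = max(
                (best[i - 1][prev][0]
                 + log_prob(transitions[prev], tag, n_tags) + emit, prev)
                for prev in tags
            )
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Hypothetical toy corpus (transliterated Hindi, illustrative tags).
toy_corpus = [
    [("raam", "NNP"), ("ghar", "NN"), ("gayaa", "VM")],
    [("siitaa", "NNP"), ("kitaab", "NN"), ("paDhii", "VM")],
    [("raam", "NNP"), ("kitaab", "NN"), ("laayaa", "VM")],
]
transitions, emissions, vocab = train_hmm(toy_corpus)
print(viterbi(["siitaa", "ghar", "gayaa"], transitions, emissions, vocab))
```

Comparing this baseline against CRF, MaxEnt, and SVM taggers on the same annotated data is the kind of evaluation the project asks for.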

####Parsing

#####2. Implementing a dependency parser.#####

Description - One needs to create a CPG (Computational Paninian Grammar) based dependency parser, exploring different tools and resources. Advanced techniques that exploit a large monolingual corpus can also be applied.
Faculty - Prof. Dipti Misra Sharma
Mentor - Aniruddh Tammewar
Students - 2-3
Requirements - NLP, Machine Learning, Python.

#####3. Semantic Role Labeling using Parsing.#####

Description - Semantic Role Labeling involves automatically identifying the arguments of a verb in a sentence and then classifying them by labeling the arguments with semantic labels, also known as PropBank labels. Presently, the Hindi and Urdu PropBanks are built on top of the Hindi and Urdu Dependency Treebanks (HDT and UDT) respectively, and this project aims at building robust statistical semantic role labellers for both languages. Furthermore, we can use the PropBank features to improve parsing and vice versa.
Faculty - Prof. Dipti Misra Sharma
Mentor - Maaz Anwar
Students - 2-3
Requirements - NLP, Machine Learning, Python.

####Anusaaraka

#####4. Handling idiomatic expressions using Grammatical Framework (GF) software for the English-Hindi language pair.#####

Description :: English and Hindi have different sets of idioms. An idiom in one language might correspond to an idiom in the other language or have a different meaning altogether. The project involves mapping idioms from one language to the other by creating abstract definitions and then mapping these semantics to the available idioms or literal forms.
Faculty :: Dr. Soma Paul
Mentor :: Prateek Saxena and Shastri V.
Students :: 3-4
Resources to be read by summer school students :: Resources that give an idea of existing as well as the latest 'linguistic' and 'NLP' tools, GF slides
Skills :: NLP, GF Programming

#####5. Preparing linguistic resources using GF on Android#####

Description :: (Description not finalized; information not available.)
Resources :: The GF app for Android named "GF offline translator". One can use it and play around with it.
Faculty :: Dr. Soma Paul
Mentor :: Ayushi Agrawal and Shivani Pathak

#####6. Generating English-Hindi Word aligned corpora using existing NLP resources#####

Description :: The existing word alignment algorithm produces word-aligned output from the outputs of two tools: Anusaaraka and a phrase table (generated by the SMT tool Phrasal). This algorithm will be provided to the interns, who have to test, evaluate, and improve it. The outputs of two more NLP tools need to be integrated: a parser using ERG (English Resource Grammar) and a Hindi parser. Interns will have to propose solutions for this idea and work on this ongoing task by improving the existing algorithm.
Faculty :: Dr. Soma Paul
Mentor :: Ayushi Agrawal and Shivani Pathak
Students :: 3-4
Resources to be read by summer school students :: Resources that give an idea of existing as well as the latest 'linguistic' and 'NLP' tools.
Skills :: NLP, Programming

####Dialogue Processing

#####7. Syntactic and semantic processing for NLIDB in Telugu/Hindi#####

Description: An NLIDB (Natural Language Interface to Database) is a system that translates a natural language query into a SQL query. Syntactic and semantic processing of the given NL query are important for the NLIDB system to translate it into a SQL query. This project aims at understanding the architecture of an NLIDB system in the CPG framework and also involves developing syntactic and semantic modules for the NLIDB system in Telugu/Hindi.
Students: 3-4
Skills: NLP, MySQL, Programming (Python).
Mentor: Ashish P
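
To make the final translation step concrete, here is a minimal sketch of turning an already-analysed query into SQL. It assumes the syntactic and semantic modules have produced a simple "frame" (a hypothetical dictionary format invented for this example), and it uses Python's built-in `sqlite3` instead of MySQL so that it runs without a server. The table, columns, and data are made up.

```python
import sqlite3

# Hypothetical schema lexicon: the only tables and columns a frame may use.
SCHEMA = {"students": {"name", "marks", "city"}}
OPERATORS = {"=", "<", ">", "<=", ">="}

def frame_to_sql(frame):
    """Turn a semantic frame into a parameterized SQL query string."""
    table = frame["table"]
    if table not in SCHEMA:
        raise ValueError(f"unknown table: {table}")
    for column in frame["select"]:
        if column not in SCHEMA[table]:
            raise ValueError(f"unknown column: {column}")
    sql = f"SELECT {', '.join(frame['select'])} FROM {table}"
    params = []
    conditions = []
    for column, operator, value in frame.get("where", []):
        if column not in SCHEMA[table] or operator not in OPERATORS:
            raise ValueError(f"bad condition: {column} {operator}")
        conditions.append(f"{column} {operator} ?")  # value passed separately
        params.append(value)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params

# Assumed analysis of a query such as
# "names of students who scored more than 80 marks".
frame = {"select": ["name"], "table": "students", "where": [("marks", ">", 80)]}
sql, params = frame_to_sql(frame)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, marks INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("Ravi", 91, "Hyderabad"), ("Sita", 75, "Pune"), ("Anu", 84, "Delhi")],
)
rows = conn.execute(sql, params).fetchall()
print(sql, params, rows)
```

Values are passed as query parameters, while table and column names are checked against the schema lexicon, so text from the user never ends up inside the SQL string itself.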

#####7a. Dialog state tracking at the dialog level

Description: In this project, students need to develop a dialog state tracking algorithm to track the act/state shift in the dialogue.
Data: Will be taken from the Spoken Dialog Challenge, which consists of human/machine spoken dialogs with real users.
Language: English, Hindi
Detailed description: http://research.microsoft.com/pubs/198681/dstc2013.pdf

####Discourse

#####8. Discourse Argument Identification from Dependency Structure and Argument Span Selection.

Description :: From the given Hindi dependency output, identify the explicit discourse connectives and the span of their arguments. Implicit discourse connections are not handled as a part of this project.
Faculty :: Prof. Dipti Misra Sharma
Mentor :: Rohit and Vignesh
Students :: 2
Resources to be read by summer school students :: Research papers
Skills :: NLP, Programming, Linguistic analysis

#####9. Discourse Sense Identification from Sentential features and creating relation hierarchies based on Senses

Description :: From the given Hindi dependency output, identify the explicit discourse connectives and the span of their arguments. Implicit discourse connections are not handled as a part of this project.
Faculty :: Prof. Dipti Misra Sharma
Mentor :: Rohit and Vignesh
Students :: 2
Resources to be read by summer school students :: Research papers
Skills :: NLP, Programming, Linguistic analysis

#####10. Sentence-level semantic similarity by Karaka cluster classification in semantic vector space

Description :: Given two sentences of text, s1 and s2, the system needs to determine how similar s1 and s2 are, returning a similarity score and an optional confidence score. The annotations and systems will use a scale from 0 (no relation) to 5 (semantic equivalence) to indicate the similarity between the two sentences.
Faculty :: Prof. Dipti Misra Sharma
Mentor :: Darshan A
Students :: 1
Skills :: NLP, Programming, Linguistic analysis, Moses, C++, Boost.
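
As a point of reference for the 0 to 5 scale, here is a minimal baseline sketch: average the word vectors of each sentence, take the cosine similarity, and rescale it. This is a commonly used baseline and not the Karaka-cluster method the project targets. The tiny three-dimensional embeddings are invented for the example, and mapping negative cosines to 0 is an assumption.

```python
import math

def sentence_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_score(s1, s2, embeddings):
    """Score two sentences from 0 (no relation) to 5 (equivalence)."""
    v1 = sentence_vector(s1.lower().split(), embeddings)
    v2 = sentence_vector(s2.lower().split(), embeddings)
    if v1 is None or v2 is None:
        return 0.0
    # Clamp to [0, 1] before rescaling to guard against rounding and
    # negative cosines.
    return 5.0 * min(1.0, max(0.0, cosine(v1, v2)))

# Hypothetical toy embeddings, for illustration only.
toy_embeddings = {
    "dog": [1.0, 0.1, 0.0], "puppy": [0.9, 0.2, 0.0],
    "barks": [0.1, 1.0, 0.0], "yelps": [0.2, 0.9, 0.1],
    "stock": [0.0, 0.0, 1.0], "falls": [0.0, 0.1, 0.9],
}
related = similarity_score("the dog barks", "the puppy yelps", toy_embeddings)
unrelated = similarity_score("the dog barks", "stock falls", toy_embeddings)
print(round(related, 2), round(unrelated, 2))
```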

####Machine Translation

#####11. Integrating SMT in ILMT system

Description :: The student needs to understand an existing SMT system such as Moses and a modular MT system such as ILMT, and work towards improving the ILMT system by implementing various Moses feature functions.
Faculty :: Prof. Dipti Misra Sharma, Dr. Manish Shrivastava
Mentor :: Saumitra
Students :: 1
Skills :: NLP, Programming, Linguistic analysis, Moses, C++

####Question Answering

#####12. MultiLingual Question Answering on Google

Description :: This project aims at building a simple Web-scale QA system that uses Google search results for answer extraction and ranks the results with a purpose-designed algorithm. It has four main steps:

1. Question Classification: a Li-Roth based classifier using SVM assigns the question a coarse-grained as well as a fine-grained class.
2. Answer Retrieval: the complete user question is sent to Google through the Google web search API, which returns a maximum of 8 results per page request.
3. Phrase & Named Entity Extraction: noun phrases are extracted from the search result content using the NLTK chunker, and named entities are then extracted using the Stanford Named Entity Recognizer.
4. Answer Extraction & Ranking: all the different noun phrases are extracted, and the entity type of each is compared with the answer type of the question. The matching noun phrases are ranked by how frequently they occur across the different search results, and the highest-ranked noun phrases are given as output to the user.

Faculty :: Prof. Manish Shrivastava, Manoj Chinnakotla
Mentor :: Harish Yenala, Avinash Kamineni, Abhishek Kannan, Teja
Students :: 4-5
Skills :: Basic idea of NLTK and its usage, Python
Resources to be read :: Li-Roth question classification paper, SVM algorithm, knowledge of chunkers and NER, papers on "QA on unstructured web content"
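
The ranking step of the pipeline described above can be sketched as follows. The sketch assumes the earlier steps have already produced, for each search result, a list of (noun phrase, entity type) pairs; the input format, the `PERSON` and `WORK` labels, and the sample data are hypothetical stand-ins for the chunker and NER output. It also assumes the question class has already been mapped to an entity label.

```python
from collections import Counter

def rank_answers(answer_type, results):
    """Rank noun phrases whose entity type matches the expected answer type.

    `results` holds one list per search result, each containing
    (noun_phrase, entity_type) pairs. A phrase is counted once per
    search result in which it appears.
    """
    counts = Counter()
    for candidates in results:
        matching = {phrase.lower() for phrase, entity_type in candidates
                    if entity_type == answer_type}
        counts.update(matching)
    # Most frequent first; ties are broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical extractor output for a question like "Who wrote Godan?".
search_results = [
    [("Premchand", "PERSON"), ("Godan", "WORK")],
    [("Munshi Premchand", "PERSON"), ("Premchand", "PERSON")],
    [("Premchand", "PERSON"), ("India", "LOCATION")],
]
ranked = rank_answers("PERSON", search_results)
print(ranked)
```

Counting each phrase once per search result, and not once per mention, keeps a single repetitive page from dominating the ranking; that choice is a design assumption of this sketch.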

####Speech processing

#####13. Speech recognition using Sphinx.

Description :: Speech recognition means speech-to-text conversion. This project will help to implement Hidden Markov Model (HMM) based speech recognition using MFCC (Mel-frequency cepstral coefficient) features. The Sphinx tool will be used for its implementation.
Faculty :: Prof. Anil Kumar Vuppala
Mentor :: A. Raju
Students :: 3-4
Resources to be read by summer school students :: Research papers

#####14. Speaker recognition using GMM.

Description :: Speaker recognition means identifying the speaker from speech. This project will help to implement Gaussian Mixture Model (GMM) based speaker identification using MFCC features.
Faculty :: Prof. Anil Kumar Vuppala
Mentor :: V. Raju
Students :: 3-4
Resources to be read by summer school students :: Research papers
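
The identification step can be sketched as follows: each enrolled speaker has a GMM, and a test utterance is assigned to the speaker whose model gives the highest average log-likelihood over its feature frames. This sketch assumes diagonal covariances and model parameters that are already trained (EM training is omitted). The two-dimensional "features" and the speaker models are invented for the example; real MFCC vectors have more dimensions.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of `frames` under `gmm`.

    `gmm` is a list of (weight, mean, variance) components.
    """
    total = 0.0
    for x in frames:
        parts = [math.log(w) + log_gaussian(x, m, v) for w, m, v in gmm]
        peak = max(parts)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(p - peak) for p in parts))
    return total / len(frames)

def identify_speaker(frames, models):
    """Return the name of the model that best explains the frames."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))

# Hypothetical pre-trained speaker models.
models = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])],
    "speaker_b": [(1.0, [10.0, 10.0], [1.0, 1.0])],
}
print(identify_speaker([[0.1, -0.2], [1.9, 2.1]], models))
```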

#####15. Prosody modification of speech.

Description :: Prosody refers to the supra-segmental features of speech, namely energy, duration, and pitch. This project will help to implement prosody modification, i.e., changing pitch values, duration, etc., using the SOLA (synchronized overlap-add) technique.
Faculty :: Prof. Anil Kumar Vuppala
Mentor :: Hari Krishna
Students :: 3-4
Resources to be read by summer school students :: Research papers

####Learning Representations

#####16. Experiments on word embeddings.

Description: In this project, students need to develop an ensemble method that combines embeddings produced by GloVe and word2vec with structured knowledge from the semantic networks ConceptNet, PPDB, and/or WordNet, merging their information into a common representation with a large, multilingual vocabulary.
Data: Monolingual corpus
Language: Hindi, English

#####17. Named-Entity Recognition using Deep Learning

Description: In this project, students need to develop a system that seeks to locate and classify elements in text into predefined named entity categories using Deep Neural Networks.

#####18. Extraction of synonyms from corpus

Description: In this project, we will attempt to extract synonyms from a raw corpus in a supervised fashion with word2vec.
Prerequisites: Being comfortable with coding in Python is most essential, along with at least a basic understanding of neural networks and clustering algorithms.
Data/Languages: Monolingual corpus / Hindi and English
