Knowledge Extraction and Reasoning

We are interested in building systems that can understand and reason with information expressed in natural language. Our current work focuses on grade-school science exams as a test domain, where we target the extraction of knowledge that can readily support new inferences. We have explored text-derived first-order rules with Markov Logic Networks as a reasoning framework [AKBC 2014, EMNLP 2015], and semantic-role-based knowledge for recognizing process instances [EMNLP 2016]. These efforts are conducted in close collaboration with, and with financial support from, the Allen Institute for Artificial Intelligence.
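As a much-simplified illustration of the idea (not the systems from the cited papers), a weighted first-order rule derived from text can license a new inference whenever its body is satisfied by known facts. The rules, predicates, constants, and weights below are all hypothetical:

```python
# Sketch: Markov-Logic-style scoring with one text-derived weighted rule.
# A rule is (body predicates, head predicate, weight); predicates are tuples
# and "?x" is a variable to be grounded with a constant.

RULES = [
    # Text-derived rule: "animals need water" ->
    # IsA(?x, animal) ^ Needs(animal, water) => Needs(?x, water)
    ((("IsA", "?x", "animal"), ("Needs", "animal", "water")),
     ("Needs", "?x", "water"),
     0.8),
]

FACTS = {("IsA", "dog", "animal"), ("Needs", "animal", "water")}


def substitute(pred, binding):
    """Replace variables in a predicate tuple using the given binding."""
    return tuple(binding.get(term, term) for term in pred)


def score(query, facts=FACTS, rules=RULES, constants=("dog", "cat")):
    """Sum weights of rule groundings whose body holds in `facts`
    and whose head matches the query."""
    total = 0.0
    for body, head, weight in rules:
        for const in constants:
            binding = {"?x": const}
            body_holds = all(substitute(p, binding) in facts for p in body)
            if body_holds and substitute(head, binding) == query:
                total += weight
    return total
```

Here `score(("Needs", "dog", "water"))` returns the rule weight 0.8, while the ungrounded query for "cat" scores 0.0; a real MLN would instead perform probabilistic inference over all groundings jointly.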

Event Extraction

Extracting and organizing information about events typically requires pre-specified templates or schemas. This project aims to automatically discover the key entities and their roles by analyzing large volumes of news articles. Our previous work developed a semantic resource called Rel-grams, a relational analogue of lexical n-grams. Rel-grams capture co-occurrence statistics between relations expressed in text, which can be used to automatically generate schemas [EMNLP 2013, AKBC-WEKEX 2012].
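A minimal sketch of the Rel-grams intuition, using toy relation tuples and documents (the actual resource is built from large news corpora, and these tuples are invented for illustration): count how often one extracted relation follows another within a window, then read off conditional co-occurrence statistics that can suggest schema structure.

```python
from collections import Counter

# Toy "documents": ordered lists of extracted relation tuples
# (subject type, relation phrase, object type). All hypothetical.
docs = [
    [("PERSON", "was arrested in", "LOCATION"),
     ("PERSON", "was charged with", "CRIME"),
     ("PERSON", "was convicted of", "CRIME")],
    [("PERSON", "was arrested in", "LOCATION"),
     ("PERSON", "was charged with", "CRIME")],
]


def relgram_counts(documents, window=2):
    """Count ordered co-occurrences of relation tuples within a window."""
    uni_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        for i, r1 in enumerate(doc):
            uni_counts[r1] += 1
            for r2 in doc[i + 1 : i + 1 + window]:
                pair_counts[(r1, r2)] += 1
    return uni_counts, pair_counts


def cond_prob(r1, r2, uni, pairs):
    """Estimate P(r2 follows r1 within the window)."""
    return pairs[(r1, r2)] / uni[r1] if uni[r1] else 0.0
```

In this toy corpus, "was charged with" always follows "was arrested in" (probability 1.0) while "was convicted of" does so half the time, so frequently co-occurring relations like these would be grouped into a candidate event schema.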

Extraction and Reasoning for Digital Pathology

As part of the fantastic Biomedical Informatics team headed by Dr. Joel Saltz, we are working on a digital pathology project, building an expert system that assists with in silico testing of classification schemes for brain and lung cancer.

Biomedical Relation Extraction

This project aims to improve access to scientific advances in the biomedical domain. Its main goal is to extract medical relations from biomedical abstracts. The project is a collaboration with Prof. Ritwik Banerjee, Prof. IV Ramakrishnan, and Prof. Yejin Choi. We use latent variable models that jointly infer how drugs treat diseases, and we use this inferred knowledge to identify other drugs that may treat those diseases. Practical applications include drug repurposing and discovering adverse drug events.
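A much-simplified sketch of the repurposing intuition (not the latent variable model itself): if two drugs occur with similar textual treatment patterns in abstracts, a drug known to treat a disease suggests that its pattern-similar neighbors may treat it too. The drug names, patterns, and counts below are invented for illustration:

```python
from math import sqrt

# Toy co-occurrence profiles: textual pattern -> count, as might be
# extracted from biomedical abstracts mentioning each drug. Hypothetical.
profiles = {
    "drugA": {"inhibits COX-2": 5, "reduces inflammation": 3},
    "drugB": {"inhibits COX-2": 4, "reduces inflammation": 2, "lowers fever": 1},
    "drugC": {"blocks H1 receptor": 6},
}


def cosine(p, q):
    """Cosine similarity between two sparse count profiles."""
    dot = sum(v * q.get(k, 0) for k, v in p.items())
    norm = lambda prof: sqrt(sum(v * v for v in prof.values()))
    denom = norm(p) * norm(q)
    return dot / denom if denom else 0.0


def repurposing_candidates(known_drug, profiles, threshold=0.8):
    """Drugs whose pattern profile is similar to a drug already known
    to treat the disease of interest."""
    return [d for d, prof in profiles.items()
            if d != known_drug and cosine(profiles[known_drug], prof) >= threshold]
```

With these toy counts, drugB (similar anti-inflammatory patterns) is proposed as a candidate while drugC (an unrelated antihistamine profile) is not; the latent variable models in the project learn such shared structure jointly rather than via a fixed similarity threshold.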

User Factor Adaptation

User-factor adaptation is the problem of adapting NLP models to real-valued human attributes, or factors, that capture fine-grained differences between individuals. These factors can include both known attributes (e.g., demographics or personality) and latent factors that can be inferred from an unlabeled collection of a person's tweets. Our approach is similar to feature augmentation, a common technique in domain adaptation, extended so that it can adapt to continuous variables. We find that we can improve performance on popular NLP tasks by putting language back into its human context.
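The continuous extension of feature augmentation can be sketched as follows: alongside the shared copy of a feature vector, append one copy scaled by each real-valued user factor, so a linear learner can fit both population-level weights and factor-specific adjustments. This is an illustrative sketch with made-up factor values, not the exact formulation from our work:

```python
def augment(x, factors):
    """Continuous feature augmentation: return [x ; f1*x ; f2*x ; ...],
    i.e., the shared copy of feature vector x followed by one copy
    scaled by each real-valued user factor."""
    out = list(x)
    for f in factors:
        out.extend(f * xi for xi in x)
    return out


# Hypothetical example: 3 lexical features for one tweet, and 2 user
# factors (say, an age percentile and an inferred latent trait), both
# scaled to [0, 1].
x = [1.0, 0.0, 2.0]
user_factors = [0.5, 0.2]
z = augment(x, user_factors)  # length 3 * (1 + 2) = 9
```

With binary domains and factor values fixed at 0 or 1, this reduces to standard feature augmentation; real-valued factors instead interpolate smoothly between users.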