Education:

Work Experience:

  • Sr. Research Scientist, Research and Development, Thomson Reuters, Eagan, MN, USA, Oct 2017 - present
    • Natural Language Processing
    • Machine Learning
    • Knowledge Graph and Data Integration
    • Question Answering
    • Natural Language Generation

  • Research Scientist, Research and Development, Thomson Reuters, Eagan, MN, USA, Nov 2013 - Sep 2017
    • Question Answering: Developed natural language question answering systems for querying and analyzing large-scale interlinked datasets.
    • Natural Language Generation: Developed bootstrapping algorithms for automatically generating sentence templates.
    • Temporal Information Extraction: Developed a method for extracting temporal information for the TAC 2013 competition.

  • Research Intern, IBM Research, Dublin, Ireland, Jun 2012 - Aug 2012
    • Co-designed an ontology for the smart building domain (building, asset, sensor, etc.).
    • Developed a system that detects abnormal building status by combining the semantics and domain knowledge encoded in the ontology with historical sensor data.

  • Intern-Biostatistics (Ph.D.), Division of Statistics and Bioinformatics, Mayo Clinic, Rochester, MN, USA, Jun 2011 - Aug 2011
    • Developed a Protégé plugin (graphical user interface) for manual annotation of clinical narratives. Integrated cTAKES, OpenNLP, and the NCBO Annotator to enable semi-automatic annotation based on clinical ontologies such as SNOMED and RxNorm. Published the results at the 2012 AMIA Summit on Clinical Research Informatics and the 2011 International Semantic Web Conference.
    • The system is available for download: http://informatics.mayo.edu/CNTRO/index.php/Semantator

  • Research Assistant, Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA, Jun 2009 - Nov 2013
    • Scalable and Domain-independent Entity Coreference (Ph.D. Dissertation), Jun 2009 - Nov 2013
      • Interlinked large-scale, heterogeneous Semantic Web data by developing unsupervised, domain-independent entity resolution algorithms.
      • Developed pruning techniques and applied an information retrieval inverted index to scale entity resolution to large datasets (e.g., millions of data instances).
    • ADEN: Anomaly Detection Engine for Networks (DARPA), Sep 2011 - Aug 2013
      • Developed a tag cloud system that uses indexing techniques to enable intelligent, scalable exploration of large-scale enterprise data.
      • Explored entity resolution algorithms for linking free-text entities to structured data (e.g., DBpedia).
    • Structuring, Reasoning, and Querying in a Very Large Medical Image Database (NSF), Jan 2011 - May 2011
      • Designed and implemented a machine learning framework for cervical cancer patient classification. Rather than using cervigrams (medical images) or clinical test results alone, the framework combined both types of data and achieved higher classification accuracy than either data source on its own.
    • DAE: Document Analysis and Exploitation (DARPA), Sep 2009 - Dec 2010
      • Designed and applied a data model to store image data, experiment results and provenance.
      • Co-designed and co-implemented a platform that lets document analysis researchers execute algorithms and interact with the backend database.
      • Analyzed the feasibility of using cloud computing for data storage and running algorithms.

  • Teaching Assistant, Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA, Jun 2009 - Nov 2013
    • Assisted with six courses, including Databases, Java Programming, User Interface & Techniques, Computer Graphics, and Automata & Formal Languages. Responsibilities included holding office hours, grading homework and exams, debugging students' Java programs, and tutoring students one-on-one.