Dezhao Song




I am a Sr. Research Scientist at Thomson Reuters, working on NLP, Machine Learning, Knowledge Graphs, Question Answering, and Natural Language Generation, and on their applications (primarily) to the legal domain. Before joining Thomson Reuters, I completed research internships at Mayo Clinic and IBM Research, gaining experience in applying state-of-the-art techniques to real-world problems.

I obtained my Ph.D. in Computer Science from Lehigh University, PA, USA, under the supervision of Professor Jeff Heflin. My Ph.D. work focused on Entity Coreference in the Semantic Web and Linked Data: developing domain-independent algorithms for interlinking heterogeneous, large-scale data sources to facilitate data consumption and utilization in the Semantic Web. I also had the opportunity to participate in several DARPA and NSF projects in the areas of Databases, Information Retrieval, and Machine Learning. I am a recipient of the 2015 Semantic Web Science Association Distinguished Dissertation Award and also won first prize in the 2012 Semantic Web Challenge (Billion Triples Track) for my work at Lehigh University.

Research Interests

  • Semantic Web, Linked Data, and Knowledge Graphs
  • Information Integration
  • Natural Language Processing
  • Machine Learning
  • Information Retrieval


  • January 2021: Our paper on leveraging label embedding and domain-specific pre-training for large-scale multi-label document classification has been accepted by Information Systems.
  • January 2021: Thomson Reuters is releasing a manually labeled dataset in the legal domain, containing 50,000 legal case opinions with manually tagged procedural postures. The dataset is a good candidate for document classification (especially with few-shot/zero-shot techniques), language model pre-training, and similar tasks.