
Education

Institute of Software, Chinese Academy of Sciences, Ph.D. in Natural Language Processing

2020 -- Present

Nankai University, Dual Bachelor's Degrees in Software Engineering and Finance, GPA: 92.7/100 (Ranked 2/120)

2016 -- 2020

Research

Knowledge Injection and Generalization

Large Language Models Often Say One Thing and Do Another

Ruoxi Xu, Hongyu Lin, Xianpei Han, Le Sun, Yingfei Sun
The Thirteenth International Conference on Learning Representations (ICLR 2025)
Paper
> Details

This is the first work to quantitatively investigate the consistency between words and deeds in LLMs across multiple domains. We propose a new benchmark, the words-and-deeds consistency test, covering four domains: opinion, (im)moral values, and theory. Our findings across diverse models reveal that: (1) there is a common inconsistency between words and deeds across various LLMs and domains; (2) the underlying causes are likely a lack of strong beliefs in base models and unsynchronized alignment of words and deeds in aligned models; and (3) common knowledge generalization methods, such as explicit reasoning and data augmentation, may not fundamentally align a model's internal words and deeds.


Memorizing is Not Enough: Deep Knowledge Injection Through Reasoning

Ruoxi Xu, Yunjie Ji, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Ben He, Yingfei Sun, Xiangang Li, Le Sun
The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)
Preprint
> Details

Current approaches focus primarily on shallow knowledge injection (e.g., memorization and retrieval). We propose a four-level framework—Memorization, Retrieval, Reasoning, and Association—that formalizes the depth of knowledge injection. Based on this framework, we build DeepKnowledge, a benchmark to evaluate fine-grained knowledge injection for novel, incremental, and updated knowledge. Our experiments provide systematic insights into key factors and matching techniques for each knowledge level.
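A minimal sketch of the four-level injection-depth idea above, assuming a simple question-answer probe format; the names (InjectionLevel, evaluate_level, injection_depth_profile) and the exact-match scoring are illustrative assumptions, not the DeepKnowledge release.

```python
# Illustrative only: the level names follow the framework above, but the probe
# format and exact-match scoring are simplifying assumptions.
from enum import Enum
from typing import Callable, Dict, List, Tuple


class InjectionLevel(Enum):
    MEMORIZATION = 1   # reproduce the injected fact verbatim
    RETRIEVAL = 2      # answer paraphrased queries about the fact
    REASONING = 3      # combine the fact with other knowledge in a reasoning chain
    ASSOCIATION = 4    # connect the fact to related but unseen contexts


def evaluate_level(model: Callable[[str], str],
                   probes: List[Tuple[str, str]]) -> float:
    """Exact-match accuracy of `model` on (question, answer) probes for one level."""
    if not probes:
        return 0.0
    hits = sum(model(q).strip().lower() == a.strip().lower() for q, a in probes)
    return hits / len(probes)


def injection_depth_profile(model: Callable[[str], str],
                            benchmark: Dict[InjectionLevel, List[Tuple[str, str]]]
                            ) -> Dict[InjectionLevel, float]:
    """Per-level accuracy; a model may memorize a fact yet fail to reason with it."""
    return {level: evaluate_level(model, probes) for level, probes in benchmark.items()}
```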

Social Science and LLMs

AI for social science and social science of AI: A survey

Ruoxi Xu, Yingfei Sun, Mengjie Ren, Shiguang Guo, Ruotong Pan, Hongyu Lin, Le Sun, Xianpei Han
Information Processing & Management (IP&M 2024, SCI Q1, ESI Highly Cited Papers)
Preprint / Paper
> Details

Recent advancements in artificial intelligence, particularly the emergence of large language models (LLMs), have sparked a rethinking of the possibilities of artificial general intelligence. The increasingly human-like capabilities of AI are also attracting attention in social science research, leading to various studies exploring the combination of these two fields. In this survey, we systematically categorize previous explorations of the combination of AI and social science into two directions that share common technical approaches but differ in their research objectives. The first direction, AI for social science, uses AI as a powerful tool to enhance various stages of social science research. The second direction, the social science of AI, examines AI agents as social entities with human-like cognitive and linguistic capabilities. Through a thorough review, particularly of the substantial progress driven by recent advancements in large language models, this paper introduces a fresh perspective for reassessing the relationship between AI and social science, provides a cohesive framework that allows researchers to understand the distinctions and connections between AI for social science and the social science of AI, and summarizes state-of-the-art experiment simulation platforms to facilitate research in these two directions. We believe that with the ongoing advancement of AI technology and the increasing integration of intelligent agents into our daily lives, the significance of combining AI and social science will become even more prominent.


Situational Evaluation for Social Intelligence of Large Language Models

Ruoxi Xu, Hongyu Lin, Xianpei Han, Le Sun, Yingfei Sun
Preprint
> Details

The academic intelligence of large language models (LLMs) has made remarkable progress in recent times, but their performance on social intelligence remains unclear. Inspired by established human social intelligence frameworks, particularly Daniel Goleman's social intelligence theory, we developed a standardized social intelligence test based on real-world social scenarios to comprehensively assess the social intelligence of LLMs, termed the Situational Evaluation of Social Intelligence (SESI). We conducted an extensive evaluation of 13 recent popular and state-of-the-art LLM agents on SESI. The results indicate that the social intelligence of LLMs still has significant room for improvement, with superficial friendliness a primary cause of errors. Moreover, there is a relatively low correlation between the social intelligence and academic intelligence exhibited by LLMs, suggesting that social intelligence is distinct from academic intelligence for LLMs. Additionally, while LLMs cannot "understand" what social intelligence is, their social intelligence, like that of humans, is influenced by social factors.

Information Extraction

ECO v1: towards event-centric opinion mining

Ruoxi Xu, Hongyu Lin, Meng Liao, Xianpei Han, Jin Xu, Wei Tan, Yingfei Sun, Le Sun
Findings of the Association for Computational Linguistics (ACL 2022)
Preprint / Paper
> Details

Events are considered the fundamental building blocks of the world. Mining event-centric opinions can benefit decision making, interpersonal communication, and social good. Unfortunately, there is little literature addressing event-centric opinion mining, even though it significantly diverges from the well-studied entity-centric opinion mining in connotation, structure, and expression. In this paper, we propose and formulate the task of event-centric opinion mining based on event-argument structure and expression categorizing theory. We also benchmark this task by constructing a pioneer corpus and designing a two-step benchmark framework. Experiment results show that event-centric opinion mining is feasible and challenging, and that the proposed task, dataset, and baselines are beneficial for future studies.

Long Document Understanding

DLUE: Benchmarking Document Language Understanding

Ruoxi Xu, Hongyu Lin, Xinyan Guan, Xianpei Han, Yingfei Sun, Le Sun
China National Conference on Computational Linguistics (CCL 2024)
Preprint / Paper
> Details

Understanding documents is central to many real-world tasks but remains a challenging topic. Unfortunately, there is no well-established consensus on how to comprehensively evaluate document understanding abilities, which significantly hinders fair comparison and measurement of progress in the field. To benchmark document understanding research, this paper summarizes four representative abilities, i.e., document classification, document structural analysis, document information extraction, and document transcription. Under this evaluation framework, we propose the Document Language Understanding Evaluation (DLUE) benchmark, a new task suite covering a wide range of tasks in various forms, domains, and document genres. We also systematically evaluate six well-established transformer models on DLUE and find that, due to lengthy content, complicated underlying structure, and dispersed knowledge, document understanding is still far from solved, and no neural architecture currently dominates all tasks, raising the need for a universal document understanding architecture.
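As a rough illustration of how such a suite can be organized, the sketch below groups tasks under the four abilities and macro-averages a model's scores per ability; the names (Ability, Task, score_suite) are assumptions for illustration, not the DLUE code.

```python
# Illustrative sketch: tasks grouped by the four abilities summarized above,
# scored per ability by macro-averaging. Not the actual DLUE implementation.
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Ability(Enum):
    CLASSIFICATION = "document classification"
    STRUCTURAL_ANALYSIS = "document structural analysis"
    INFORMATION_EXTRACTION = "document information extraction"
    TRANSCRIPTION = "document transcription"


@dataclass
class Task:
    name: str
    ability: Ability
    evaluate: Callable[[Callable[[str], str]], float]  # model -> score in [0, 1]


def score_suite(model: Callable[[str], str],
                tasks: List[Task]) -> Dict[Ability, float]:
    """Macro-average a model's task scores within each ability."""
    per_ability: Dict[Ability, List[float]] = {a: [] for a in Ability}
    for task in tasks:
        per_ability[task.ability].append(task.evaluate(model))
    return {a: sum(s) / len(s) for a, s in per_ability.items() if s}
```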

Internships

Ke Holdings Inc.

2024.7 -- Present

  • Developed an efficient knowledge injection toolkit: a one-stop solution covering LLM-based knowledge augmentation, continual pre-training with Megatron, and automated multi-level evaluation of knowledge application (memorization, extraction, and reasoning). Compared to direct injection, the precision of knowledge extraction and reasoning improved by 55% and 47.4%, respectively.
  • Explored the generalization boundaries of knowledge injection during the pre-training phase: repetitive learning enables the model to memorize knowledge; diverse, heterogeneous expressions enhance the model's ability to extract knowledge; and explicit reasoning data lets the model apply knowledge along the corresponding reasoning patterns and their combinations, though it struggles to transfer the knowledge to other reasoning paths (see the sketch after this list). The experiments identified key factors for achieving each level of knowledge injection in large language models, establishing a mapping between knowledge injection levels and corresponding methods.
  • Published a paper at ACL 2025.
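A minimal sketch of the data-composition idea in the bullets above: verbatim repetition targets memorization, paraphrase diversity targets extraction, and explicit reasoning traces target reasoning. The function name and example strings are hypothetical, and the Megatron-based training side of the toolkit is not shown.

```python
# Illustrative sketch only; the fact, paraphrases, and reasoning trace below are
# made-up examples, and real continued pre-training would run through Megatron.
from typing import Dict, List


def build_injection_corpus(fact: str,
                           paraphrases: List[str],
                           reasoning_traces: List[str],
                           repeat: int = 5) -> Dict[str, List[str]]:
    """Compose continued pre-training samples grouped by the injection level they target."""
    return {
        # Verbatim repetition: enough for the model to memorize the surface form.
        "memorization": [fact] * repeat,
        # Heterogeneous rewrites of the same fact: supports extraction under new phrasings.
        "extraction": paraphrases,
        # Explicit chains that use the fact: supports reasoning along the seen patterns,
        # though (per the findings above) not necessarily along unseen reasoning paths.
        "reasoning": reasoning_traces,
    }


corpus = build_injection_corpus(
    fact="Company X acquired Startup Y in March 2024.",
    paraphrases=[
        "Startup Y was bought by Company X in early 2024.",
        "In 2024, Company X completed its acquisition of Startup Y.",
    ],
    reasoning_traces=[
        "Q: Who owns Startup Y's patents after March 2024? "
        "A: Company X acquired Startup Y in March 2024, so Company X does.",
    ],
)
```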

Tencent, WeChat Search Application Department

2021.10 -- 2022.06

  • Extended the traditional entity-centric opinion mining task to an event-centric paradigm. Led the task formulation, definition, and formalization, developed the benchmark annotations, and designed a two-stage framework (sketched after this list): event-centric opinion extraction is modeled as a sentence-level sequence labeling task, and opinion target extraction as a machine reading comprehension task.
  • Participated in the construction of an event knowledge graph. After validating the effectiveness of the proposed algorithms, deployed them in the "Search" deep comment and multi-perspective projects to automatically extract opinions from news articles related to core events, event participants, and sub-events, and summarize opinions from multiple perspectives.
  • Organized the shared task “Event-Centric Opinion Mining” at CCL 2022.
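A structural sketch of the two-stage framework mentioned above. Both stage models are passed in as plain callables, since the actual checkpoints and labeling schemes are project-specific and not shown here.

```python
# Structural sketch only: stage 1 is a sentence-level opinion labeler, stage 2
# an MRC-style reader; both are abstracted as callables here.
from typing import Callable, List, Tuple


def extract_event_opinions(
    event: str,
    sentences: List[str],
    opinion_tagger: Callable[[str, List[str]], List[bool]],  # stage 1: sequence labeling
    target_reader: Callable[[str, str], str],                 # stage 2: reading comprehension
) -> List[Tuple[str, str]]:
    """Return (opinion_sentence, opinion_target) pairs for one event in one article."""
    labels = opinion_tagger(event, sentences)
    pairs = []
    for sentence, is_opinion in zip(sentences, labels):
        if is_opinion:
            # Phrase target extraction as a question about the opinionated sentence.
            question = f"Which aspect of the event '{event}' does this opinion address?"
            pairs.append((sentence, target_reader(question, sentence)))
    return pairs
```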