I am a Ph.D. candidate (since September 2020) in the Chinese Information Processing Laboratory at the Institute of Software, Chinese Academy of Sciences, under the supervision of Professor Xianpei Han and Professor Le Sun. I received my Bachelor's degree from Nankai University in June 2020. My research interests include:
I expect to graduate in June 2026 and am actively looking for job opportunities. I'd be happy to connect; feel free to reach out via the contact details below.
Email: ruoxi2021@iscas.ac.cn
Telephone: (+86) 15022171058
Wechat: 
Institute of Software, Chinese Academy of Sciences, Ph.D. in Natural Language Processing
2020 -- Present
Nankai University, Dual Bachelor's Degrees in Software Engineering and Finance, GPA: 92.7/100 (Ranked 2/120)
2016 -- 2020
This is the first work to quantitatively investigate the consistency between words and deeds in LLMs across multiple domains. We propose a new benchmark, the Word and Deed Consistency Test, covering domains including opinion, (im)moral values, and theory. Our findings across diverse models reveal that: (1) a common inconsistency between words and deeds exists across various LLMs and domains; (2) the underlying causes may be a lack of strong beliefs in base models and unsynchronized alignment of words and deeds in aligned models; and (3) common knowledge generalization methods, such as explicit reasoning and data augmentation, may not fundamentally align a model's internal words and deeds.
Current approaches focus primarily on shallow knowledge injection (e.g., memorization and retrieval). We propose a four-level framework—Memorization, Retrieval, Reasoning, and Association—that formalizes the depth of knowledge injection. Based on this framework, we build DeepKnowledge, a benchmark to evaluate fine-grained knowledge injection for novel, incremental, and updated knowledge. Our experiments provide systematic insights into key factors and matching techniques for each knowledge level.
Recent advancements in artificial intelligence, particularly the emergence of large language models (LLMs), have sparked a rethinking of the possibilities of artificial general intelligence. The increasingly human-like capabilities of AI are also attracting attention in social science research, leading to various studies exploring the combination of these two fields. In this survey, we systematically categorize previous explorations of the combination of AI and social science into two directions that share common technical approaches but differ in their research objectives. The first direction is AI for social science, where AI is utilized as a powerful tool to enhance various stages of social science research. The second direction is the social science of AI, which examines AI agents as social entities with human-like cognitive and linguistic capabilities. Through a thorough review, particularly of the substantial progress facilitated by recent advancements in large language models, this paper introduces a fresh perspective for reassessing the relationship between AI and social science, provides a cohesive framework that allows researchers to understand the distinctions and connections between AI for social science and the social science of AI, and summarizes state-of-the-art experiment simulation platforms to facilitate research in both directions. We believe that with the ongoing advancement of AI technology and the increasing integration of intelligent agents into our daily lives, the significance of combining AI and social science will become even more prominent.
The academic intelligence of large language models (LLMs) has made remarkable progress in recent times, but their social intelligence remains unclear. Inspired by established human social intelligence frameworks, particularly Daniel Goleman's social intelligence theory, we have developed a standardized social intelligence test based on real-world social scenarios to comprehensively assess the social intelligence of LLMs, termed the Situational Evaluation of Social Intelligence (SESI). We conducted an extensive evaluation of 13 recent popular and state-of-the-art LLM agents on SESI. The results indicate that the social intelligence of LLMs still has significant room for improvement, with superficial friendliness being a primary source of errors. Moreover, there exists a relatively low correlation between the social intelligence and academic intelligence exhibited by LLMs, suggesting that social intelligence is distinct from academic intelligence for LLMs. Additionally, while we observe that LLMs cannot "understand" what social intelligence is, their social intelligence, similar to that of humans, is influenced by social factors.
Events are considered the fundamental building blocks of the world. Mining event-centric opinions can benefit decision making, interpersonal communication, and social good. Unfortunately, there is little literature addressing event-centric opinion mining, even though it significantly diverges from the well-studied entity-centric opinion mining in connotation, structure, and expression. In this paper, we propose and formulate the task of event-centric opinion mining based on event-argument structure and expression categorization theory. We also benchmark this task by constructing a pioneer corpus and designing a two-step benchmark framework. Experimental results show that event-centric opinion mining is feasible yet challenging, and that the proposed task, dataset, and baselines are beneficial for future studies.
Understanding documents is central to many real-world tasks but remains a challenging topic. Unfortunately, there is no well-established consensus on how to comprehensively evaluate document understanding abilities, which significantly hinders fair comparison and measurement of progress in the field. To benchmark document understanding research, this paper summarizes four representative abilities: document classification, document structural analysis, document information extraction, and document transcription. Under this new evaluation framework, we propose the Document Language Understanding Evaluation (DLUE), a new task suite covering a wide range of tasks across various forms, domains, and document genres. We also systematically evaluate six well-established transformer models on DLUE and find that, due to lengthy content, complicated underlying structure, and dispersed knowledge, document understanding is still far from solved. Moreover, no current neural architecture dominates all tasks, raising the need for a universal document understanding architecture.