Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support

Citrus Team, JD Health International Inc.
* Project Lead
Work done during an internship with the Citrus Team.

Abstract

Large language models (LLMs), particularly those with reasoning capabilities, have rapidly advanced in recent years, demonstrating significant potential across a wide range of applications. However, their deployment in healthcare, especially in disease reasoning tasks, is hindered by the challenge of acquiring expert-level cognitive data. In this paper, we introduce Citrus, a medical language model that bridges the gap between clinical expertise and AI reasoning by emulating the cognitive processes of medical experts. The model is trained on a large corpus of simulated expert disease reasoning data, synthesized using a novel approach that accurately captures the decision-making pathways of clinicians. This approach enables Citrus to better simulate the complex reasoning processes involved in diagnosing and treating medical conditions. To further address the lack of publicly available datasets for medical reasoning tasks, we release the last-stage training data, including a custom-built medical diagnostic dialogue dataset. This open-source contribution aims to support further research and development in the field. Evaluations on authoritative benchmarks such as MedQA, covering medical reasoning and language understanding tasks, show that Citrus outperforms other models of similar size. These results highlight Citrus's potential to significantly enhance medical decision support systems, providing a more accurate and efficient tool for clinical decision-making.

Contributions

1. We propose a training-free reasoning approach that emulates the cognitive processes of medical experts, enabling large language models to enhance their medical capabilities in clinical diagnosis and treatment.

2. In conjunction with the data construction method, we introduce a multi-stage post-training approach to further improve the model’s medical performance.

3. We have made the Citrus model and its training data publicly available as open-source resources to advance research in AI-driven medical decision-making.

4. We have developed and open-sourced a large-scale, updatable clinical practice evaluation dataset built from real-world data, accurately reflecting the patient distribution seen in clinical settings.


Model Access

| Model Name          | Backbone  | Link       |
|---------------------|-----------|------------|
| Citrus1.0-Llama-70B | Llama-70B | Model Link |
| Citrus1.0-Qwen-72B  | Qwen-72B  | Model Link |
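
As a usage sketch (not official loading code), the released checkpoints should load through the standard Hugging Face transformers API. The repository ID below is hypothetical, standing in for the table's "Model Link" entries.

```python
# Loading sketch using the Hugging Face transformers API. The repo ID
# is hypothetical -- substitute the actual ID behind "Model Link".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jdh-algo/Citrus1.0-Llama-70B"  # hypothetical repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored with the weights
    device_map="auto",    # shard the 70B model across available GPUs
)

messages = [{
    "role": "user",
    "content": "A 54-year-old presents with chest pain radiating to the "
               "left arm and diaphoresis. What are the leading differentials?",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```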

Data Access

| Dataset   | Usage      | Description | Link |
|-----------|------------|-------------|------|
| Citrus_S3 | Train Data | A 20k-record subset of the model's training data. | Data Link |
| JMED      | Test Data  | Anonymized doctor-patient dialogues from JD Health Internet Hospital, filtered to retain consultations that follow standardized diagnostic workflows. The initial release contains 1,000 high-quality clinical records spanning all age groups (0-90 years) and multiple specialties. | Data Link |
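
A minimal loading sketch with the Hugging Face datasets library; the hub paths and split names below are assumptions, since the table links rather than names the hosted repositories.

```python
# Loading sketch with the `datasets` library. The hub paths and split
# names are assumptions -- use the IDs behind the "Data Link" entries.
from datasets import load_dataset

citrus_s3 = load_dataset("jdh-algo/Citrus_S3", split="train")  # hypothetical path
jmed = load_dataset("jdh-algo/JMED", split="test")             # hypothetical path

print(citrus_s3[0])  # inspect one of the 20k training records
print(jmed[0])       # inspect one of the 1,000 clinical consultations
```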

Method

1. Main approaches

LLMs can follow cognitive pathways similar to those of medical experts: CPT enables them to acquire medical knowledge and perform pattern recognition as doctors do, while SFT and RL train them to carry out hypothetical-deductive reasoning as a sequence of explicit reasoning steps, sketched below.
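
The following Python sketch illustrates the hypothetical-deductive loop described above. It is an interpretation, not the Citrus inference code; `ask_llm`, the prompts, and the round count are all placeholders.

```python
# Illustrative sketch of the hypothetical-deductive reasoning loop,
# not the actual Citrus inference code. `ask_llm` is a placeholder
# for any chat-completion call (e.g., to a Citrus checkpoint).
def ask_llm(prompt: str) -> str:
    """Stand-in for a chat-completion request to the model."""
    raise NotImplementedError("wire this to your inference backend")


def hypothetical_deductive_diagnosis(case: str, max_rounds: int = 3) -> str:
    # Step 1: pattern recognition -- draft an initial differential,
    # drawing on the medical knowledge acquired during CPT.
    differential = ask_llm(
        f"Patient presentation:\n{case}\n"
        "List the most likely diagnoses with a brief rationale for each."
    )
    # Step 2: hypothesis testing -- deduce which evidence would best
    # discriminate between hypotheses, then revise the differential.
    for _ in range(max_rounds):
        evidence_plan = ask_llm(
            f"Current differential:\n{differential}\n"
            "Which findings, tests, or history items would best confirm "
            "or refute each hypothesis?"
        )
        differential = ask_llm(
            f"Case:\n{case}\nCurrent differential:\n{differential}\n"
            f"Discriminating evidence to weigh:\n{evidence_plan}\n"
            "Revise and re-rank the differential accordingly."
        )
    # Step 3: commit to a leading diagnosis with next steps.
    return ask_llm(
        f"Case:\n{case}\nFinal differential:\n{differential}\n"
        "State the leading diagnosis and the recommended next steps."
    )
```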



2. Overview of training stages and training data pipeline

The training process consists of three stages: CPT, SFT, and RL. For each stage we describe the training objective and dataset scale, together with the data pipeline used to construct that stage's data; a schematic sketch follows.
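
A schematic view of the three-stage recipe, in line with the description above. The stage purposes paraphrase the text; the dataset descriptions (aside from the released Citrus_S3, 20k records) and the `train_stage` entry point are placeholders, not the actual training harness.

```python
# Schematic view of the three-stage post-training recipe. Stage names
# follow the text; dataset descriptions (other than Citrus_S3) and
# train_stage() are placeholders, not the actual training harness.
TRAINING_STAGES = [
    {"stage": "CPT",
     "purpose": "inject medical knowledge and doctor-like pattern recognition",
     "data": "large-scale medical corpus (scale given in the paper)"},
    {"stage": "SFT",
     "purpose": "teach explicit expert reasoning steps",
     "data": "synthesized expert reasoning data, incl. Citrus_S3 (20k records)"},
    {"stage": "RL",
     "purpose": "reinforce sound hypothetical-deductive conclusions",
     "data": "reward/preference signals over model outputs"},
]


def train_stage(model, cfg):
    """Placeholder: dispatch to the CPT/SFT/RL trainer for one stage."""
    print(f"[{cfg['stage']}] {cfg['purpose']} -- data: {cfg['data']}")
    return model


def run_pipeline(base_model):
    """Apply the stages in order, each consuming its own data pipeline."""
    model = base_model
    for cfg in TRAINING_STAGES:
        model = train_stage(model, cfg)
    return model
```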


Results

1. Main Results on Medical Benchmarks



2. Experiments on Citrus1.0-Llama-70B



BibTeX

        
@misc{wang2025citrus,
    title={Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support}, 
    author={Guoxin Wang and Minyu Gao and Shuai Yang and Ya Zhang and Lizhi He and Liang Huang and Hanlin Xiao and Yexuan Zhang and Wanyue Li and Lu Chen and Jintao Fei and Xin Li},
    year={2025},
    eprint={2502.18274},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2502.18274},
}