Diagens sets global benchmark for ‘real-world clinical performance’ in medical foundation models

Hangzhou Diagens Biotechnology (Diagens), a global leader in AI-driven medical imaging and imaging equipment, has officially launched DoctorBench, a medical AI evaluation platform, and unveiled its inaugural global medical foundation model leaderboard in Hong Kong. WiseDiag Technology’s WiseDiag-v2, Google’s Gemini-3.1-Pro-Preview and OpenAI’s GPT-5.4 secured the top three positions.

For the first time, the evaluation framework places ‘real-world clinical performance’ at the centre, constructing a multidimensional benchmarking system that closely mirrors authentic diagnostic and treatment scenarios.

As medical foundation models accelerate their transition from laboratory research to clinical application worldwide, the industry has long lacked a metric that genuinely measures a model’s ‘clinical competence’. Existing evaluations predominantly focus on medical knowledge recall, failing to capture a model’s comprehensive performance in complex clinical contexts. This gap between benchmarking and clinical reality has become a global obstacle hindering the deployment of medical AI.

OpenAI previously launched HealthBench, signalling that leading players are beginning to take this challenge seriously. However, medicine is inherently localised – diagnostic and treatment guidelines, language conventions and patient populations vary significantly across countries and regions, rendering any single evaluation system insufficient for universal applicability.

Driven by a profound understanding of this global challenge, Diagens developed the DoctorBench platform, which is rooted in nearly a decade of deep collaboration by a cross-disciplinary team. Diagens brought together experts in basic medicine, clinical medicine, AI and the healthcare industry, tightly integrating rigorous clinical logic with cutting-edge deep learning algorithms. This enables DoctorBench to both comprehend the boundaries of AI technology and grasp the intricate demands of clinical practice, using that standard to construct its evaluation framework.

The core philosophy of DoctorBench is no longer to test a model’s ‘knowledge base’, but to assess its clinical communication and decision-making ability – its capacity to ‘think like a doctor’. The platform features three leaderboard tracks: the Medical Leaderboard (LLM), the Multimodal Leaderboard (VLM) and the Agent Leaderboard. These evaluate, respectively, textual diagnostic ability, multimodal understanding, and multi-turn decision-making with tool use inside a simulated clinical environment.

“The advancement of medical AI is a long-distance race concerning the health and well-being of all humanity. It demands not only disruptive technological innovation and deep cross-disciplinary, cross-regional collaboration, but also an absolute reverence for and unwavering commitment to life and health,” said Dr Song Ning, Founder of Diagens.

Ning expressed the hope of joining hands with more global research institutions, clinical centres and industry partners, so that truly capable technologies can be recognised, trusted and ultimately used to benefit every patient. • www.intelligentcio.com


Intelligent CIO APAC Issue 71 | Page 29
