ASSESSING THE ADEQUACY OF ARTIFICIAL INTELLIGENCE MODELS IN ANSWERING SPINE SURGERY QUESTIONS FROM THE ORTHOPEDIC RESIDENCY TRAINING AND DEVELOPMENT EXAMINATION

Yılmaz, Bilge Kağan; Yuzuguldu, Uğur

dc.contributor.author	Yılmaz, Bilge Kağan
dc.contributor.author	Yuzuguldu, Uğur
dc.date.accessioned	2025-12-28T16:50:24Z
dc.date.available	2025-12-28T16:50:24Z
dc.date.issued	2025
dc.identifier.issn	13010336
dc.identifier.uri	https://doi.org/10.4274/jtss.galenos.2025.74436
dc.identifier.uri	https://search.trdizin.gov.tr/tr/yayin/detay/1350220
dc.identifier.uri	https://hdl.handle.net/20.500.12933/3020
dc.description.abstract	Objective: Artificial intelligence (AI) has undergone remarkable advancements in recent years, and its integration across various domains has been transformative. In medicine, AI applications are expanding rapidly, offering new opportunities for clinical practice, decision-making, and medical education. The present study assessed the performance and reliability of state-of-the-art AI models in answering spine surgery questions from the Orthopedic Residency Training and Development Examination conducted in Türkiye between 2010 and 2023. Materials and Methods: A total of 286 spine surgery questions were systematically analyzed. The official answer key served as the reference standard, against which the outputs of three advanced AI models were compared: Chat Generative Pre-trained Transformer-5.0 (ChatGPT-5.0), Gemini-Pro, and DeepSeek-V3. Model performance was evaluated in terms of accuracy, error rate, and non-response rate. Models were compared using chi-square and McNemar tests with pairwise post-hoc comparisons, and 95% confidence intervals (CIs) were calculated with the Wilson score method. Subgroup analyses were also conducted by question category and time period. Results: Gemini-Pro achieved the highest accuracy, 85.3% (95% CI: 80.7-88.9; non-response 1.4%), followed by DeepSeek-V3 at 78.0% (95% CI: 72.8-82.4; non-response 3.8%) and ChatGPT-5.0 at 71.7% (95% CI: 66.2-76.6; non-response 10.8%). Gemini-Pro's accuracy was significantly higher than that of ChatGPT-5.0 (p<0.001). Temporal analyses showed that Gemini-Pro and DeepSeek-V3 performed better on questions from earlier years, while Gemini-Pro also maintained superior and stable performance in later periods. In contrast, ChatGPT-5.0 showed persistently lower accuracy across all intervals.
Conclusion: Gemini-Pro demonstrated the most consistent and robust performance across both overall and temporal analyses. These findings underscore the promising role of AI in orthopedic residency education, particularly in examination preparation. Nevertheless, integration of AI into training curricula should be approached with caution, as expert oversight remains indispensable to ensure reliability and clinical applicability. ©Copyright 2025 The Author.
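The abstract reports accuracy with Wilson 95% confidence intervals and compares models with McNemar tests. The short Python sketch below shows both calculations. The per-model correct-answer counts (244, 223 and 205 of 286) do not appear in this record. They are back-calculated from the reported percentages and should be treated as an assumption. With those counts, the Wilson formula reproduces the reported intervals. The McNemar helper uses the continuity-corrected chi-square form on hypothetical discordant-pair counts b and c. The record does not state which McNemar variant the authors used, and it does not give the discordant counts.

```python
import math

# Two-sided 95% standard normal quantile.
Z_95 = 1.959963984540054


def wilson_ci(successes: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion successes/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return centre - half, centre + half


def mcnemar(b: int, c: int, continuity: bool = True) -> tuple[float, float]:
    """McNemar chi-square statistic (df=1) and p-value from discordant counts.

    b: questions model A answered correctly and model B did not (hypothetical).
    c: questions model B answered correctly and model A did not (hypothetical).
    """
    if b + c == 0:
        return 0.0, 1.0
    diff = abs(b - c)
    if continuity:
        # Edwards continuity correction, floored at zero when b == c.
        diff = max(diff - 1, 0)
    stat = diff * diff / (b + c)
    # Survival function of chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    N = 286
    # Assumed counts, back-calculated from the reported accuracy percentages.
    correct = {"Gemini-Pro": 244, "DeepSeek-V3": 223, "ChatGPT-5.0": 205}
    for model, k in correct.items():
        lo, hi = wilson_ci(k, N)
        print(f"{model}: {100 * k / N:.1f}% (95% CI: {100 * lo:.1f}-{100 * hi:.1f})")
```

The Wilson interval is used here instead of the simple Wald interval because it stays within [0, 1] and behaves better near high accuracies such as 85%. The McNemar test fits this design because every model answered the same 286 questions, so the comparisons are paired.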
dc.language.iso	en
dc.publisher	Galenos Publishing House
dc.relation.ispartof	Journal of Turkish Spinal Surgery
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Artificial intelligence
dc.subject	ChatGPT
dc.subject	DeepSeek
dc.subject	Gemini
dc.subject	spine surgery
dc.title	ASSESSING THE ADEQUACY OF ARTIFICIAL INTELLIGENCE MODELS IN ANSWERING SPINE SURGERY QUESTIONS FROM THE ORTHOPEDIC RESIDENCY TRAINING AND DEVELOPMENT EXAMINATION
dc.type	Article
dc.department	Afyonkarahisar Sağlık Bilimleri Üniversitesi
dc.identifier.doi	10.4274/jtss.galenos.2025.74436
dc.identifier.volume	36
dc.identifier.issue	4
dc.identifier.startpage	174
dc.identifier.endpage	180
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.department-temp	Yılmaz, Bilge Kağan, Department of Orthopaedics and Traumatology, Afyonkarahisar Health Sciences University, Afyonkarahisar, Afyonkarahisar, Turkey; Yuzuguldu, Uğur, Clinic of Orthopedics and Traumatology, Balıkesir Atatürk City Hospital, Balıkesir, Balıkesir, Turkey
dc.identifier.scopus	2-s2.0-105023482078
dc.identifier.scopusquality	N/A
dc.identifier.trdizinid	1350220
dc.indekslendigikaynak	Scopus
dc.indekslendigikaynak	TR-Dizin
dc.snmz	KA_Scopus_20251227

This item appears in the following collection(s):

Scopus Indexed Publications Collection [1550]
TR-Dizin Indexed Publications Collection [1290]