Comparative analysis of five AI platforms for mandibular canal segmentation on CBCT images

Highlights

• First head-to-head comparison of five AI platforms for mandibular canal segmentation.
• AI segmentation accuracy varies widely across platforms on CBCT data.
• Only some AI tools achieved sub-millimeter, clinically acceptable performance.
• Anatomically complex regions showed the highest segmentation errors.
• Independent benchmarking is essential before clinical adoption of AI tools.

Objectives

Accurate mandibular canal (MC) identification on cone-beam computed tomography (CBCT) is vital to prevent inferior alveolar nerve injury during oral and maxillofacial procedures. Manual segmentation is time-consuming and operator-dependent, while artificial intelligence (AI) offers automated, reproducible alternatives. This study compared the accuracy of automated MC segmentation across five AI platforms using standardized quantitative and qualitative evaluations.

Methods

A total of 120 anonymized CBCT scans (240 MCs) were analyzed using five fully automated AI-based segmentation platforms: Atomica (Atomica AI, USA), BlueSkyPlan (Blue Sky Bio, USA), Craniocatch (Craniocatch, Türkiye), 3D Slicer (open-source, USA), and Relu Creator (Relu BV, Belgium). Expert-annotated models served as the reference. Accuracy was quantified as the unsigned mean surface deviation from the reference and categorized as optimal (<0.5 mm), acceptable (0.5–2.0 mm), or unacceptable (>2.0 mm). Qualitative evaluation employed a five-point anatomical fidelity scale. Effects of canal segment, side (left vs. right), and scanner were also assessed.
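The accuracy metric and the three categories described above can be illustrated with a minimal sketch. It assumes each canal surface is available as a list of 3D points in millimeters and approximates the unsigned mean surface deviation as the average nearest-neighbor distance from the AI-generated points to the reference points. The abstract does not state how the deviation was actually computed (for example, point-to-surface versus point-to-point, or one-directional versus symmetric), so the function names and the one-directional point-to-point choice here are illustrative assumptions, not the study's implementation.

```python
import math


def unsigned_mean_surface_deviation(test_points, reference_points):
    """Mean nearest-neighbor distance (mm) from test points to reference points.

    Assumption: one-directional, point-to-point distances on sampled surfaces.
    Brute force is used for clarity; it is O(n * m) and only suits small inputs.
    """
    if not test_points or not reference_points:
        raise ValueError("both point sets must be non-empty")
    total = 0.0
    for p in test_points:
        total += min(math.dist(p, q) for q in reference_points)
    return total / len(test_points)


def categorize_deviation(deviation_mm):
    """Map a deviation in mm to the categories given in the Methods."""
    if deviation_mm < 0.5:
        return "optimal"
    if deviation_mm <= 2.0:
        return "acceptable"
    return "unacceptable"


# Toy data (hypothetical, not from the study): a segmentation offset by 0.3 mm.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
segmentation = [(0.0, 0.3, 0.0), (1.0, 0.3, 0.0), (2.0, 0.3, 0.0)]

deviation = unsigned_mean_surface_deviation(segmentation, reference)
print(f"{deviation:.2f} mm -> {categorize_deviation(deviation)}")
```

For real CBCT-derived meshes with many thousands of vertices, a spatial index such as a k-d tree would be a more practical choice than the brute-force search shown here.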

Results

Significant performance differences were observed among platforms (p < 0.001). Relu Creator and 3D Slicer achieved the highest overall accuracy (≈0.5 mm), with no deviations > 2.0 mm in the complete-canal analysis. Craniocatch showed moderate accuracy, while Atomica and BlueSkyPlan exhibited greater variability and more deviations > 2.0 mm. Qualitative scores reflected similar trends. Regionally, middle canal segments showed the best accuracy, with higher deviations near the mandibular and mental foramina. Scanner- and side-related effects were statistically significant but clinically negligible.

Conclusion

AI-based MC segmentation accuracy varies across platforms. Relu Creator and 3D Slicer achieved near-expert performance suitable for clinical use, whereas the other platforms require expert verification. Independent benchmarking and multi-scanner validation are essential for safe implementation.

Clinical significance

This study provides evidence-based guidance on the accuracy of AI tools for automated MC segmentation, supporting safer surgical planning by identifying which AI-generated outputs can be trusted and where expert verification remains essential to prevent nerve injury.

