Resident, SUNY Upstate, Syracuse, New York, United States
Introduction: Accurate models for predicting aneurysm rupture risk remain elusive. Machine learning could outperform traditional methods for this task.
Methods: We performed a retrospective analysis of all cerebral aneurysms (regardless of rupture status) consecutively investigated with digital subtraction angiography (DSA) at a single institution over five years. The data were randomly split into an 80% training set and a 20% hold-out test set. Hyperparameters for XGBoost, random forest (RF), and support vector machine (SVM) models were tuned by grid search with 5-fold cross-validation on the training set. The highest-performing model was re-trained on the entire training set and evaluated on the test set using the area under the receiver operating characteristic curve (AUC) and the F1 score.
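The workflow described in the Methods can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the data are synthetic (generated with `make_classification`), the hyperparameter grids, the stratified split, the fixed random seeds, and the use of AUC as the selection metric are all assumptions, and only RF and SVM are shown. An XGBoost classifier (for example `xgboost.XGBClassifier`) could be added to the `candidates` dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the aneurysm feature table (NOT the study data).
X, y = make_classification(n_samples=331, n_features=8, n_informative=4,
                           random_state=0)

# 80% training / 20% hold-out test split; stratification is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Candidate models with illustrative (placeholder) hyperparameter grids.
candidates = {
    "rf": (RandomForestClassifier(random_state=0),
           {"n_estimators": [100, 300], "max_depth": [3, None]}),
    "svm": (make_pipeline(StandardScaler(),
                          SVC(probability=True, random_state=0)),
            {"svc__C": [0.1, 1.0, 10.0]}),
}

# Grid search with 5-fold cross-validation on the training set only.
# refit=True re-trains the best configuration on the whole training set.
searches = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=5, refit=True)
    search.fit(X_train, y_train)
    searches[name] = search

# Select the model with the best mean cross-validated AUC.
best_name = max(searches, key=lambda n: searches[n].best_score_)
best_model = searches[best_name].best_estimator_

# Evaluate the selected model once on the untouched hold-out test set.
test_auc = roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1])
test_f1 = f1_score(y_test, best_model.predict(X_test))
print(f"best={best_name} test AUC={test_auc:.2f} test F1={test_f1:.2f}")
```

Keeping the test set out of the grid search is what makes the final AUC and F1 an honest estimate; the cross-validated scores are used only to choose among models and settings.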
Results: A total of 331 aneurysms among 241 patients were included in the analysis. The mean age (± SD) of the cohort was 56.11 ± 12.89 years. There were 62 males and 179 females. Of the 241 patients, 157 (65.1%) had ruptured aneurysms and 84 (34.9%) had unruptured aneurysms. On the training set, XGBoost performed best (AUC = 0.87), followed by RF (AUC = 0.86) and SVM (AUC = 0.82). On the hold-out test set, the median AUC and F1 score for XGBoost were 0.79 (95% CI 0.76 – 0.82) and 0.80 (95% CI 0.76 – 0.83), respectively. The five most predictive features in the XGBoost model, with their mean permutation importances [SD], were age (0.062 [0.019]), anterior vs. posterior circulation (0.061 [0.026]), location (0.039 [0.023]), diameter (0.009 [0.015]), and lobe height/width ratio (0.009 [0.011]).
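For readers unfamiliar with permutation importance, the idea behind the mean and SD values reported above can be sketched as follows: shuffle one feature column, measure how much the model's score drops, and repeat. This is a standard-library illustration only. The toy data, the rule-based `toy_model`, the choice of accuracy as the score, and the number of repeats are all hypothetical and are not taken from the study.

```python
import random
import statistics


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_repeats=30, seed=0):
    """Return (mean, SD) of the score drop for each feature column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    results = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other feature untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        results.append((statistics.mean(drops), statistics.stdev(drops)))
    return results


# Hypothetical toy data: feature 0 drives the label, feature 1 is irrelevant.
rows = [(float(a), float(i % 3)) for i, a in enumerate(range(30, 80, 2))]
labels = [int(r[0] >= 55) for r in rows]


def toy_model(row):
    """Hypothetical stand-in for a fitted classifier; uses feature 0 only."""
    return int(row[0] >= 55)


importances = permutation_importance(toy_model, rows, labels)
for j, (mean_drop, sd_drop) in enumerate(importances):
    print(f"feature {j}: {mean_drop:.3f} [{sd_drop:.3f}]")
```

A feature the model ignores gets an importance of zero, while a feature the model relies on shows a positive mean drop; the SD reflects variability across repeated shuffles.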
Conclusion: An XGBoost model accurately predicted aneurysm rupture status on presentation using a combination of demographic and aneurysm features. External and prospective validation studies are required.