Professor of Neurosurgery, NYU Langone Health, New York City, New York, United States
Disclosure(s):
Douglas Kondziolka, MD, MSc, FRCSC, FACS: Brainlab: Research Grant (Ongoing); Chiefy: Ownership Interest (Ongoing); Congress of Neurological Surgeons: Independent Contractor (Ongoing)
Introduction: The development of accurate and generalizable machine learning algorithms requires large datasets. Assembling such datasets is challenging because biomedical data are sensitive and siloed. Our objective was to build the world’s largest longitudinal dataset of real-world tumor imaging, with multimodal annotations describing the clinical care of patients with metastatic brain tumors.
Methods: The NYU Center for Advanced Radiosurgery (CAR) registry was converted into a SQL database. Each time point was augmented with all available imaging from the hospital PACS and with all medication prescriptions from the EHR. MRI studies were co-registered at each time point, resampled to 1 mm isotropic voxels, and pre-processed. The final dataset was de-identified, skull-stripped, and uploaded to Amazon S3. Naïve out-of-domain transfer learning was assessed with vanilla U-Nets using the Brain Tumor Segmentation Challenge (BraTS) 2021 dataset.
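As an illustration of the resampling step, the following is a minimal sketch of how the target grid for 1 mm isotropic voxels could be computed from a volume's shape and voxel spacing. It is not the pipeline used for this dataset; the function name `isotropic_grid` and the example shape and spacing are hypothetical.

```python
def isotropic_grid(shape, spacing_mm, target_mm=1.0):
    """Return (new_shape, zoom_factors) for resampling to isotropic voxels.

    shape      -- voxel counts per axis, e.g. (512, 512, 160)
    spacing_mm -- physical voxel size per axis in millimetres
    target_mm  -- desired isotropic voxel size (1 mm here)
    """
    if len(shape) != len(spacing_mm):
        raise ValueError("shape and spacing_mm must have the same length")
    if target_mm <= 0 or any(s <= 0 for s in spacing_mm):
        raise ValueError("voxel sizes must be positive")

    # A zoom factor above 1 upsamples an axis; below 1 downsamples it.
    zoom = tuple(s / target_mm for s in spacing_mm)

    # The physical extent (count * spacing) is preserved up to rounding.
    new_shape = tuple(max(1, round(n * z)) for n, z in zip(shape, zoom))
    return new_shape, zoom


# Hypothetical volume: 0.5 mm in-plane resolution, 1.2 mm slice thickness.
new_shape, zoom = isotropic_grid((512, 512, 160), (0.5, 0.5, 1.2))
```

The zoom factors could then be passed to an interpolation routine of one's choice, for example `scipy.ndimage.zoom`, to produce the resampled volume.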
Results: We identified 1,293 patients with 3,449 high-resolution radiosurgery MRI studies in the CAR registry. These were augmented by 27,006 diagnostic MRI studies from PACS, matched on patient MRN. After excluding studies for incomplete sequences, failed registration, or duplication, we obtained a final dataset of 2,148 patients, 13,381 MRI studies, and 2,115 expert tumor segmentations derived from Gamma Knife radiosurgery plans. A total of 490,096 prescriptions were written for 19,083 unique medications and dosages. A vanilla U-Net using simple supervised pre-training obtained a mean Dice score of 0.78 on the BraTS 2021 validation set, compared with a baseline of 0.76 when trained only on BraTS.
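For readers unfamiliar with the metric, the Dice score of a predicted mask A against a reference mask B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for flattened binary masks in plain Python. The function name `dice_score`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not the evaluation code behind the reported numbers.

```python
def dice_score(pred, truth):
    """Dice overlap between two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")

    # Voxels labeled as tumor in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total tumor voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)

    # Assumed convention: two empty masks agree perfectly.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example: 2 overlapping voxels, 3 tumor voxels in each mask.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
score = dice_score(pred, truth)  # 2 * 2 / (3 + 3)
```

A mean Dice score such as the one reported would be the average of this per-case value over all validation cases.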
Conclusion: NYUMets is the world’s largest publicly available dataset of annotated tumor imaging, brain metastases, and longitudinal multimodal medical data. Opening this data to the scientific research community has the potential to substantially advance medical computer vision and may unlock insights into brain tumor science and care. The dataset can be accessed at https://nyumets.org/ after registration and creation of an Amazon Web Services account.