
NCH Sleep DataBank

About

To accelerate research on pediatric sleep and its connection to health, Nationwide Children's Hospital (NCH) and Carnegie Mellon University (CMU) introduce the NCH Sleep DataBank. The dataset comprises 3,984 pediatric sleep studies of 3,673 unique patients, conducted at NCH in Columbus, Ohio, USA between 2017 and 2019, along with the patients' longitudinal clinical data. Each published polysomnography (PSG) recording contains the patient's physiological signals as well as the technician's assessment of the sleep stages and descriptions of additional irregularities.

The novelties of this dataset include:

  1. Size: Its large size is suitable for discovering new scientific insights via data mining.
  2. Patient population: It explicitly focuses on pediatric patients.
  3. Clinical setting: The sleep studies were gathered in the real-world clinical setting at NCH as opposed to, for example, a controlled clinical trial.
  4. Rich set of clinical data: The accompanying 5.6 million records of clinical data are extracted from the electronic health record (EHR) and are separated into encounters, medications, measurements (e.g. body mass index), diagnoses, and procedures.

The NCH Sleep DataBank is a valuable resource for advancing automatic sleep scoring and real-time sleep disorder prediction, among many other potential scientific discoveries. Accompanying Python code that helps users work with the dataset is published on GitHub.

The National Sleep Research Resource is grateful to the Nationwide Children's Hospital (NCH) and Carnegie Mellon University (CMU) team for sharing these data.

Data preamble and access restrictions

The NCH Sleep DataBank is only available for non-commercial use.

Citation and acknowledgement

When using this dataset, please cite the following:

Zhang GQ, Cui L, Mueller R, Tao S, Kim M, Rueschman M, Mariani S, Mobley D, Redline S. The National Sleep Research Resource: towards a sleep data commons. J Am Med Inform Assoc. 2018 Oct 1;25(10):1351-1358. doi: 10.1093/jamia/ocy064. PMID: 29860441; PMCID: PMC6188513.

Lee H, Li B, DeForte S, Splaingard ML, Huang Y, Chi Y, Linwood SL. A large collection of real-world pediatric sleep studies. Sci Data. 2022 Jul 19;9(1):421. doi: 10.1038/s41597-022-01545-6. PMID: 35853958; PMCID: PMC9296671.

Please include the following text in the Acknowledgements:

NCH Sleep DataBank was supported by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under Award Number R01EB025018. The National Sleep Research Resource was supported by the U.S. National Institutes of Health, National Heart Lung and Blood Institute (R24 HL114473, 75N92019R002).

Data overview

/datasets
Covariate datasets derived from the original health_data files. The NSRR team recommends using the nchsdb-dataset-harmonized dataset, which contains variables (e.g., nsrr_age, nsrr_bmi) that match other NSRR harmonized datasets.

/health_data
Data (CSV) from the NCH clinical data warehouse. Click here for an overview of the file formats and contents.

/sleep_data
Raw physiological data (EDF) and annotations (TSV) from overnight polysomnography.
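
As a rough illustration of working with the annotation files, the sketch below reads a tab-separated annotation table with pandas and totals the annotated time per label. The column names (onset, duration, description), the label strings, and the inline sample rows are assumptions made for this example, not a confirmed description of the files; check an actual TSV from /sleep_data before relying on them.

```python
import io

import pandas as pd


def load_annotations(source):
    """Read a tab-separated annotation file (path or file-like object)."""
    return pd.read_csv(source, sep="\t")


def seconds_per_label(annotations, label_col="description", duration_col="duration"):
    """Sum annotated durations per label. Column names are assumptions."""
    return annotations.groupby(label_col)[duration_col].sum()


# Hypothetical sample rows standing in for a real annotation file.
example = io.StringIO(
    "onset\tduration\tdescription\n"
    "0.0\t30.0\tSleep stage W\n"
    "30.0\t30.0\tSleep stage N1\n"
    "60.0\t30.0\tSleep stage N1\n"
)

annotations = load_annotations(example)
totals = seconds_per_label(annotations)
print(totals)
```

To use it on a downloaded file, pass the file path to load_annotations in place of the in-memory sample.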

Change log

July 2022

January 2022

October 2021

  • Unpaused approval of data requests. Replaced DIAGNOSIS.csv in health_data folder. Note from the contributor: We replaced the values in columns ('DX_CODE', 'DX_NAME', 'DX_ALT_CODE', 'CLASS_OF_PROBLEM', 'CHRONIC_YN') corresponding to a rare diagnosis with "redacted" to prevent potential privacy issues. A rare diagnosis is defined as a diagnosis that has fewer than 10 unique patients across all NCH records from 01/01/2000 to 12/31/2020.
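
The redaction rule in the note above can be sketched in a few lines of pandas. This is an illustration of the stated rule, not the contributor's actual procedure: the patient identifier column name (PATIENT_ID) is hypothetical, and the sketch counts unique patients within whatever table it is given, whereas the contributor counted across all NCH records in the stated date range.

```python
import pandas as pd

# Columns named in the contributor's note.
REDACTED_COLUMNS = ["DX_CODE", "DX_NAME", "DX_ALT_CODE", "CLASS_OF_PROBLEM", "CHRONIC_YN"]


def redact_rare_diagnoses(df, patient_col="PATIENT_ID", threshold=10):
    """Return a copy of df with rare-diagnosis rows masked as "redacted".

    A diagnosis code is treated as rare when it has fewer than `threshold`
    unique patients in df. `patient_col` is a hypothetical column name.
    """
    # Number of distinct patients observed for each diagnosis code.
    patients_per_dx = df.groupby("DX_CODE")[patient_col].nunique()
    rare_codes = patients_per_dx[patients_per_dx < threshold].index

    is_rare = df["DX_CODE"].isin(rare_codes)
    out = df.copy()
    out.loc[is_rare, REDACTED_COLUMNS] = "redacted"
    return out
```

Because the released DIAGNOSIS.csv is already redacted, a function like this is mainly useful for understanding why some rows contain the literal value "redacted" and for filtering them out before analysis.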

September 2021

  • Pause approval of data requests

February 2021

  • Make NCHSDB dataset available for data requests