Guest Blogger: Diego R. Mazzotti, Ph.D. University of Kansas Medical Center
In this post, we will discuss some of the highlights of a Workshop Report recently published in SLEEP1 by our colleagues at the Sleep Research Network (SRN), a Task Force from the Sleep Research Society (SRS). This report summarizes a discussion panel held at the World Sleep Congress in Vancouver, Canada in 2019 that brought together leaders in sleep, circadian sciences, and biomedical informatics. The multidisciplinary nature of the workshop contributed to a lively discussion about the challenges of, and potential solutions for, harmonizing sleep and circadian data to improve clinical research and support large-scale, multi-centric clinical trials and observational studies using real-world data.
There are many ways to observe the world we live in and, not surprisingly, different groups of scientists very commonly come up with different definitions for the same observable phenomena. Moreover, even when there is agreement on definitions, there might be variation in how some of these concepts are represented, i.e., converted into ‘data’. Data harmonization is the process of curating existing sources of data so that they can be integrated with minimal loss of information. Data harmonization contributes to generating knowledge by combining different data sources in a more generalizable setting.
We generate data at an unprecedented scale. In the context of clinical research, we want to make good use of these data to help our patients and to understand the intriguing relationship between sleep, circadian rhythms and health. However, the most impactful studies are those that promote a positive effect on everyone in our society. While valuable and relevant, small single-center clinical trials often do not represent a patient population in its entirety. This drives the scientific community to pursue collaborative efforts and to design multi-centric studies. This, however, comes with a challenge: more often than not, different research groups and institutions do not have the same protocols for certain study activities, or they use different information technology systems (e.g., electronic health records) to store their data. Thus, it is easy to anticipate that multi-centric studies can become very expensive to design. Sometimes, even the relatively simple task of identifying eligible participants for a clinical trial can be daunting across different sites. Often, checking eligibility criteria that depend on detailed diagnostic confirmation (e.g., moderate-severe obstructive sleep apnea without predominant central events) is time-consuming, and doing it manually may be the only feasible way. When systems that can represent key data elements in our sleep and circadian domain exist, such a task can be automated and accomplished more efficiently and at a lower cost. When different institutions use harmonized systems to represent their clinical data, this task becomes even easier. Thus, one of the goals of data harmonization is to facilitate how data can be represented, so they become easy to find, access, exchange and reuse.
These terms, often referred to as FAIR (findable, accessible, interoperable, and reusable)2, help guide the development of data harmonization efforts, which ultimately contribute to more efficient knowledge generation.
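To make the eligibility example above concrete, here is a minimal Python sketch of how such screening could be automated once sites share a harmonized representation of sleep study summaries. Everything in it is an assumption made for illustration: the record fields, the cutoff of 15 events per hour for "moderate-severe", and the rule that central events are "predominant" when they make up more than half of all events. A real trial protocol would define its own criteria.

```python
from dataclasses import dataclass


@dataclass
class SleepStudySummary:
    """Hypothetical harmonized summary of one diagnostic sleep study."""
    patient_id: str
    ahi: float            # apnea-hypopnea index, events per hour
    central_index: float  # central events per hour


# Illustrative thresholds only (assumptions, not taken from the workshop report).
MODERATE_AHI_THRESHOLD = 15.0  # AHI at or above this counts as moderate-severe
MAX_CENTRAL_FRACTION = 0.5     # above this fraction, central events "predominate"


def is_eligible(study: SleepStudySummary) -> bool:
    """Moderate-severe sleep apnea without predominant central events."""
    if study.ahi < MODERATE_AHI_THRESHOLD:
        return False
    # ahi is at least 15 here, so the division is safe.
    return study.central_index / study.ahi <= MAX_CENTRAL_FRACTION


def screen(studies: list[SleepStudySummary]) -> list[str]:
    """Return the patient IDs that meet the illustrative criteria."""
    return [s.patient_id for s in studies if is_eligible(s)]


studies = [
    SleepStudySummary("A", ahi=22.0, central_index=3.0),
    SleepStudySummary("B", ahi=8.0, central_index=1.0),
    SleepStudySummary("C", ahi=30.0, central_index=20.0),
]
print(screen(studies))
```

The point of the sketch is that the same few lines can run unchanged at every site only if each site stores these elements under the same names and units, which is what harmonization is meant to provide.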
In sleep and circadian sciences, many complex and heterogeneous sources of data exist. During the workshop, experts in questionnaires, actigraphy and polysomnography (PSG), some of the most prevalent methods to collect data in our domain, presented their views on the state-of-the-art data representation methods and discussed the challenges as we move towards integrating disparate data sources. For example, while questionnaires are useful to collect patient-reported outcomes, they sometimes lack contextual information, which may compromise the accuracy of data representation when they are used outside the validated context. Self-reported “good sleep quality” may be perceived differently by young or older adults. Therefore, collecting contextual information, such as whether respondents are working or retired, may become relevant. Actigraphy is another important method, with low subject burden and the ability to be used over multiple nights. This has contributed to its use for estimating several sleep and circadian traits of interest. Perhaps one of the greatest challenges in actigraphy data harmonization is the conventional reliance on proprietary software. However, the increasing availability of consumer wearables and open-source software is driving efforts to make the data generation process more transparent, allowing data from different studies to be integrated in a meaningful way. Future efforts to integrate these data into electronic health records (EHR), performed under FAIR principles, could revolutionize how sleep and circadian traits are assessed as part of regular clinical care. Due to the digital nature of PSG, data representation protocols already exist, such as the European Data Format (EDF; https://www.edfplus.info/). This puts PSG ahead of the curve in terms of data harmonization. The full potential of PSG-derived physiological signals has become apparent thanks to recent advances in signal processing and machine learning methods3.
However, many challenges still exist, such as slightly different technical specifications (e.g., sampling rate), lack of channel naming conventions, and variability in annotation formats and terminology to represent events (e.g., arousals, apneas, arrhythmias). Fortunately, the development of open-source tools, such as the luna software package (http://zzz.bwh.harvard.edu/luna/), developed by the team behind the National Sleep Research Resource (NSRR), has contributed to advances in large scale processing and integration of signal data across heterogeneous studies. This page (https://gitlab-scm.partners.org/zzz-public/nsrr/-/blob/master/common/harm-principles.md) is a great resource for those interested in learning how the NSRR is applying important data harmonization principles to facilitate PSG data integration.
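As a small illustration of the channel naming problem mentioned above, the following Python sketch maps site-specific channel labels onto one canonical name per signal. It is not how luna or the NSRR implement harmonization; the alias table, the canonical names, and the function names are all hypothetical and chosen only to show the idea.

```python
# Hypothetical alias table: each canonical name lists the lowercase labels
# that different sites might use for the same signal.
CHANNEL_ALIASES = {
    "EEG_C3": {"c3", "c3-m2", "c3-a2", "eeg c3-a2"},
    "EEG_C4": {"c4", "c4-m1", "c4-a1", "eeg c4-a1"},
    "ECG": {"ecg", "ekg", "ecg1"},
}


def canonical_label(raw_label):
    """Return the canonical name for a raw channel label, or None if unknown."""
    key = raw_label.strip().lower()
    for canonical, aliases in CHANNEL_ALIASES.items():
        if key in aliases:
            return canonical
    return None


def harmonize_channels(raw_labels):
    """Return a {raw label: canonical name} dict and a list of unmapped labels."""
    mapped, unmapped = {}, []
    for label in raw_labels:
        canonical = canonical_label(label)
        if canonical is None:
            unmapped.append(label)  # keep for manual review instead of guessing
        else:
            mapped[label] = canonical
    return mapped, unmapped


mapped, unmapped = harmonize_channels(["C3-A2", " EKG ", "Chin1"])
print(mapped)
print(unmapped)
```

Labels that do not match any alias are returned separately rather than dropped, so a curator can decide whether to extend the table or exclude the channel.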
To harness the full potential of standardized data harmonization practices, harmonized data also need to be accessible and interoperable. The NSRR is the leading example of this potential, aggregating and harmonizing data from several observational cohorts and clinical trials. Yet, this only represents one part of our sleep and circadian data ecosystem. Other data sources, such as clinical data obtained from the EHR and personal health data generated by consumer wearables, offer unprecedented opportunities in our field. Clinical research networks such as the National Patient-Centered Clinical Research Network (PCORnet; https://pcornet.org/) and the Observational Health Data Sciences and Informatics (OHDSI; https://www.ohdsi.org/) have been aggregating clinical EHR data for many years. However, due to the lack of comprehensive representation of sleep and circadian data within the terminology systems used by these efforts4, these data may not be readily usable. Minimizing the gap between clinical data generation (e.g., in a sleep laboratory) and clinical data representation (e.g., having terminology systems that contain sleep and circadian terms) is essential to incorporate EHR-based clinical sleep data into our data ecosystem. It is critical for the sleep and circadian biology communities to work together with biomedical informaticists to ensure that structured language about sleep and circadian traits is well represented in data-enabled clinical research networks.
The workshop also suggested a roadmap to facilitate harmonization and adoption of standardized practices for both research and clinical data. These include the following action items:
Are you interested in learning how to incorporate sleep and circadian data harmonization principles in your studies? Do you participate in a clinical research network that collects sleep and circadian data and would like to have your data interoperable with other existing resources? Get in touch with the Sleep Research Network via firstname.lastname@example.org to learn more!
You can also hear Dr. Mazzotti speak about data harmonization on the Sleep Research Society Podcast (#5): Sleep and Circadian Informatics Data Harmonization - A workshop report from the Sleep Research Society and Sleep Research Network