Are the signal labels consistent? This is the most common question that we receive from colleagues who are planning signal processing projects. The answer is that it depends on the study. Some studies are very consistent. Others, like the CHAT study, are designed around a minimal montage. We recommend that signal labels and sampling rates be checked before any analyses are performed. We developed a MATLAB tool for checking and summarizing the signal content of a folder containing EDF and XML files.
BlockEdfSummarizeFig is a data checking tool that can assist in preparing for signal analysis projects. The tool is designed to work with NSRR sleep study data. Sleep study files include physiological signals (EDF file) and technician-scored annotations (XML file). EDF and XML checks are included to ensure the files are accessible prior to analysis. The ability to list signal labels and sampling rates provides a way to check file contents before starting a large scale signal processing project. Download the MATLAB app, review the releases (including sample output), or review the source code.
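For readers who want to see what this kind of check involves, here is a minimal sketch in Python (not the MATLAB tool itself). It reads only the EDF header, using the field offsets from the published EDF specification, and counts how often each label and sampling rate pair occurs in a folder. The function names `read_edf_signal_summary` and `summarize_folder` are hypothetical and are not part of BlockEdfSummarizeFig.

```python
from collections import Counter
from pathlib import Path


def read_edf_signal_summary(path):
    """Return [(label, samples_per_second), ...] read from an EDF header."""
    with open(path, "rb") as f:
        fixed = f.read(256)
        if len(fixed) < 256:
            raise ValueError(f"{path}: shorter than the 256-byte fixed EDF header")
        # Bytes 244-251: duration of one data record in seconds.
        record_duration = float(fixed[244:252].decode("ascii").strip())
        # Bytes 252-255: number of signals in the file.
        num_signals = int(fixed[252:256].decode("ascii").strip())
        per_signal = f.read(256 * num_signals)
    if len(per_signal) < 256 * num_signals:
        raise ValueError(f"{path}: truncated signal header")

    def field(offset, width, index):
        # Each field is stored for all signals in turn, so a field's block
        # starts at (per-signal offset * number of signals).
        start = offset * num_signals + width * index
        raw = per_signal[start:start + width]
        return raw.decode("ascii", errors="replace").strip()

    summary = []
    for i in range(num_signals):
        label = field(0, 16, i)                     # 16-byte signal label
        samples_per_record = int(field(216, 8, i))  # samples per data record
        if record_duration > 0:
            rate = samples_per_record / record_duration
        else:
            rate = None  # rate is undefined when the record duration is 0
        summary.append((label, rate))
    return summary


def summarize_folder(folder):
    """Count how many times each (label, rate) pair appears across *.edf files."""
    counts = Counter()
    for edf_path in sorted(Path(folder).glob("*.edf")):
        counts.update(read_edf_signal_summary(edf_path))
    return counts
```

A pair whose count is smaller than the number of files in the folder points to a label or sampling rate that is not consistent across the study.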
I believe you have identified what could be a linchpin for unlocking the power of the data stored on the site. The scored data, including apneic events, provide a valuable resource for conducting research now because they alleviate the high cost of manual scoring. With that said, being able to extract common events systematically and objectively could open up sleep medicine. I believe the development of sleep event extraction could mirror what has been done with ECG R wave detection, where open source software covering a range of approaches can be downloaded. I would argue that an open source sleep feature extraction toolkit would establish the framework for new measures to be explored.
I have some suggestions on how to approach this. Many aspects of feature detection in sleep medicine look for a decrease in a signal, lasting a minimum duration, that corresponds with a value or change in another signal. A method that could be applied in multiple situations could be powerful. I don't believe the specific method is that important. For example, ECG R wave detection approaches use wavelets, state space modeling, point process modeling and dynamical systems approaches. Demonstrating that an approach is robust to artifacts in a large data set (plug for the NSRR dataset) would be more important than the method (as long as it wasn't too slow).
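To make the "decrease with a minimum duration, confirmed by another signal" pattern concrete, here is a minimal Python sketch. It is only an illustration of the general idea, not a validated scoring rule. The function names, the fixed threshold, and the way the second signal is checked (a drop from its value at event onset) are all assumptions chosen for simplicity.

```python
def find_low_runs(samples, sampling_rate, threshold, min_duration):
    """Return (start, stop) sample index pairs for runs below `threshold`
    that last at least `min_duration` seconds. `stop` is exclusive."""
    min_samples = int(round(min_duration * sampling_rate))
    runs = []
    run_start = None
    for i, value in enumerate(samples):
        if value < threshold:
            if run_start is None:
                run_start = i  # a new run begins here
        elif run_start is not None:
            if i - run_start >= min_samples:
                runs.append((run_start, i))
            run_start = None
    # Close a run that is still open at the end of the recording.
    if run_start is not None and len(samples) - run_start >= min_samples:
        runs.append((run_start, len(samples)))
    return runs


def confirm_with_second_signal(runs, other, min_drop, lag_samples=0):
    """Keep runs during which `other` falls by at least `min_drop` from its
    value at run onset. Assumes `other` shares the first signal's sampling
    rate; `lag_samples` extends the search window past the end of the run."""
    confirmed = []
    for start, stop in runs:
        window = other[start:min(len(other), stop + lag_samples)]
        if len(window) > 0 and window[0] - min(window) >= min_drop:
            confirmed.append((start, stop))
    return confirmed
```

For example, `find_low_runs(airflow, 1, 0.5, 10)` followed by `confirm_with_second_signal(runs, spo2, 3)` would keep only the 10 second reductions that coincide with a 3 point fall in the second signal. The detector itself could be swapped for any other method without changing this two-step structure.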
Take a look at the open challenge problems for an example that demonstrates the challenges that arise when extracting features from PSG data.
I had some thoughts after thinking about your post. Since you are a clinical fellow, I would first try to identify existing tools that are used by researchers at your current institution. There are clear research productivity advantages to exploring a well-defined, hypothesis-driven research problem with existing tools. I have found that nothing compares to learning and working with the data you intend to use alongside researchers who are familiar with the tools. I recently met two individuals (one senior and one a recent medical school graduate) interested in EEG analysis at your institution. Message me if you would like me to make an introduction.
Learning the skills required to extend existing open source data analysis packages or to process analysis result files can be helpful when working in a research environment. There is a wide range of open source EEG analysis packages that one could use to build computer science and programming skills. As with my previous recommendation, I recommend identifying an expert and didactic course work targeted at developing analysis and programming skills. My general rule regarding research development/programming is to do as little as possible. Let your research objective drive which support tasks (like programming) are required.
Thank you for your comments and request. An update on NSRR activities and some personal comments follow.
Harmonizing EDF labels and identifying aberrant epochs are areas that we are working on directly through NSRR efforts or through signal processing research projects.
We have been working on data consistency issues that arise when doing large scale analyses. Tools for identifying inconsistencies have been developed and are being tested internally. The data consistency checkers generate Excel output that can be used to create a batch analysis file, which includes the parameters that are not consistent within a study (examples include signal labels and sampling rates). We are happy to make these tools available to the general community as they become more robust. We are also happy to share the code as is with those willing to jump right in with code that is still under development.
I have shifted my personal development of large scale signal processing applications so that they require neither consistent signal labels nor consistent sampling rates. For example, the spectral analysis pipeline automatically converts the signal units to uV prior to analysis. This has allowed us to reduce the data harmonization required during cross study analyses with only an incremental upfront development cost. I have found this approach preferable to creating and maintaining copies of multiple cohort studies.
We have begun identifying EEG studies that are not recommended for EEG analyses as part of the NSRR EEG spectral analysis activities. A list of the studies included in and excluded from spectral analysis will be posted as cohort spectral analysis results are posted. Individual subject spectral analysis output files include a flag identifying artifact epochs. Individual subject spectral analysis output files could be made available to the community on request.
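As an illustration of handling units inside the pipeline rather than harmonizing files up front, here is a minimal Python sketch. It is not the code from the spectral analysis pipeline. The function name `to_microvolts` and the list of recognized unit strings are assumptions; the unit string is taken to be the physical dimension field from the EDF signal header.

```python
# Scale factors from a header unit string (lowercased) to microvolts.
# The set of recognized strings is an assumption; extend it as needed.
_TO_MICROVOLTS = {
    "uv": 1.0,
    "mv": 1e3,
    "v": 1e6,
}


def to_microvolts(samples, physical_dimension):
    """Return `samples` rescaled to microvolts based on the header unit."""
    key = physical_dimension.strip().lower()
    if key not in _TO_MICROVOLTS:
        # Fail loudly instead of guessing, so an unexpected unit is noticed.
        raise ValueError(f"unrecognized unit: {physical_dimension!r}")
    scale = _TO_MICROVOLTS[key]
    return [value * scale for value in samples]
```

With a step like this at the start of the analysis, a study that stores EEG in mV and one that stores it in uV can go through the same code without creating modified copies of either dataset.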
Please feel free to email me for additional details or to request code.
I am sure there are quick hacks. Most of the staging lines could be commented out.
I am personally a fan of adding functionality and keeping the staging information. It would be nice to have one program that can analyze data across wake and sleep. I generally recommend adding an XML file with the stages set to wake. The XML format is straightforward to generate. The summary and plotting functions are structured around sleep stages. I would recommend adding a section for summarizing or plotting wake data.
There are a couple of people using the spectral program to analyze wake data. It isn't clear if their efforts are ready for release yet.
Thank you for posting how to open EDF+ files!!!
The instructions will allow many more researchers to access their data with the MATLAB tools that we have made available.
Thanks again for looking into extending the spectral analysis program to analyze data scored with 20 second epochs (a variable, fixed-width epoch). The epoch width is set with the variable 'epochWidth = 30;' at line 647. The coherence epoch is set with the variable epoch at line 2054. Adding two public properties (epochWidth and epoch) should allow the change to migrate through the program. I try hard to use variables, but I can't promise that no manual editing will be required. There are two ways that you might want to add the epoch length configuration. The simplest way is to add a functional popup menu to SpectralTrainFig.fig with GUIDE (the MATLAB interface editor). Alternatively, there is an entry in the Compumedics XML file that can be changed. SpectralTrainClass.m and the Compumedics XML loader would have to be modified for this approach. Let's switch the discussion to email. Good luck with the edits!
Hi Michael, I believe that I sent an email to your personal account. I am posting here as well, just in case I sent it to the wrong address.
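To show why a single epoch width parameter is enough in principle, here is a minimal Python sketch of epoching with a configurable width. It is not taken from SpectralTrainClass.m. The function name `split_into_epochs` is hypothetical, and dropping the trailing partial epoch is an assumption.

```python
def split_into_epochs(samples, sampling_rate, epoch_width=30):
    """Split `samples` into consecutive epochs of `epoch_width` seconds.

    A trailing partial epoch is dropped.
    """
    samples_per_epoch = int(round(sampling_rate * epoch_width))
    if samples_per_epoch <= 0:
        raise ValueError("epoch_width and sampling_rate must be positive")
    num_epochs = len(samples) // samples_per_epoch
    return [
        samples[i * samples_per_epoch:(i + 1) * samples_per_epoch]
        for i in range(num_epochs)
    ]
```

When every downstream step takes its epoch length from this one value, moving from 30 second to 20 second scoring comes down to changing the `epoch_width` argument.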
From the downloaded file: I noticed that the reserve_1 header entry contains the string ‘EDF+C’. If the EDF has EDF+ components, I wouldn't expect the MATLAB programs I wrote to always work. I believe you mentioned using EDF Browser. I wonder if EDF+ type components were added.
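A quick way to check a file for this is to read the reserved field directly from the header. The sketch below is in Python rather than MATLAB and relies only on the published EDF and EDF+ layout: the reserved field occupies bytes 192 to 235 of the fixed header, and EDF+ files start that field with 'EDF+C' or 'EDF+D'. The function names are hypothetical.

```python
def edf_reserved_field(path):
    """Return the 44-byte reserved field of an EDF header as a stripped string."""
    with open(path, "rb") as f:
        header = f.read(256)
    if len(header) < 256:
        raise ValueError(f"{path}: shorter than the 256-byte fixed EDF header")
    return header[192:236].decode("ascii", errors="replace").strip()


def is_edf_plus(path):
    """True when the reserved field marks the file as EDF+ (continuous or not)."""
    return edf_reserved_field(path).startswith(("EDF+C", "EDF+D"))
```

Running `is_edf_plus` on the original export and on the copy saved by another program would show whether the EDF+ marker was added along the way.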
Given that the code runs with NSRR files, I suspect a data-driven issue. I would encourage you to step through the run in debug mode to narrow down where it fails.
You can email me directly if you would like to discuss further. My experience with working with other researchers is that tweaks are sometimes necessary when working with non-NSRR datasets.
Thank you for moving the dialog to the forum.
I realized that I have changed the name of the tool a couple of times since the initial upload. The tool that includes a checker can be found at:
The source code can be found on GitHub:
If you send me a link to the file, I can take a quick look.
The short answer is yes. More details can be found in the paper titled "Relation of Sleep-disordered Breathing to Cardiovascular Disease Risk Factors." The PDF is available: http://aje.oxfordjournals.org/content/154/1/50.full.pdf+html.
The project you described is likely quite substantial. How to proceed depends on your programming skill level. I am assuming you have a limited programming background. If so, it will take some work to complete the project as described. I would recommend that you find a student who finds the work interesting. If you want to invest your own time, start small and build on successes. Good luck!