
Sleep Heart Health Study

Reliability for SHHS2

Initial Assessment of SHHS2 Scorer Reliability (reviewed by PSG Committee 4/01)

The following summarizes part of our quality assurance and certification procedures for scorer reliability. In early 1995, a formal scoring reliability study was designed (Whitney et al.; Sleep) in which each original scorer (912 [part-time scorer], 914, and 915) scored 20 studies twice over a 6.5-month interval. Half of these records (10 per scorer) were also scored at the second time point by the other scorers. A random sample of these records was rescored over the course of SHHS to document drift over time. Note that these records all came from the first 500 SHHS1 studies and were, overall, of poorer quality (especially EEG and oximetry) than subsequent records.

Of the scorers who participated in the original reliability study, only 915 is scoring for SHHS2. However, scorer 916 joined SHHS1 very early (toward the end of the reliability study's initiation) and has been involved in SHHS since then. Scorer 922 joined SHHS2 in November 2000.

Time points: 1 = Jan-April 1996; 2 = Oct 1996-Jan 1997; 3 = June 2000; 4 = Jan 2001

Between-scorer reliability (assessed at single time points):

Intra-Class Correlation Coefficient by Time Point

Measure             June 2000 (n=10)     January 2001 (n=10)
Scorer IDs          914, 915, 916        915, 916, 922
RDI                 .97                  .99
AI                  .69                  .75
% Stage 1           .71                  .70
% Stage 2           .90                  .92
% Stage 3-4         .93                  .94
% REM               .93                  .88
Total Sleep Time    .99                  .97
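
The report does not state which form of the intra-class correlation was computed or which software was used. As an assumption for illustration only, the Python sketch below computes the one-way random-effects ICC(1,1) for a single measure (for example RDI), with one row per record and one column per scorer; for the within-scorer table, the columns would instead be one scorer's time points. The function name `icc_oneway` is hypothetical, and other ICC forms (such as two-way models) can give somewhat different values on the same data.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC(1,1).

    Assumption: the report does not name the ICC form; this is one common choice.
    ratings: list of rows, one per record, each holding the value of one
    measure (e.g. RDI) from each of k scorers (or k time points).
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 records and the same k >= 2 ratings each")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-record mean square: spread of record means around the grand mean.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-record mean square: disagreement among ratings of the same record.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy example (not study data): three records, two scorers each.
print(icc_oneway([[1, 2], [3, 4], [5, 6]]))
```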

Within-scorer reliability (assessed within each scorer over time):

Intra-Rater Reliability for Each Scorer Across Time

Scorer              912    914    915    916
Subjects, n         9      4      10     10
Time points         2      2      3*     2
RDI                 .96    .99    .97    .99
AI                  .70    .72    .77    .75
% Stage 1           .52    .78    .87    .75
% Stage 2           .72    .78    .85    .94
% Stage 3-4         .87    .95    .86    .98
% REM               .91    .91    .96    .90
Total Sleep Time    .96    .98    .99    .99
* Only 4 subjects were scored at time point 2.

Other quality assurance exercises:

June 2000: Scorers 914, 915, and 916 took part in an exercise designed to evaluate arousal-scoring reliability in a contemporary data set. A total of 1040 epochs were selected from 40 records, and each record segment was scored twice by each scorer, with the second scoring within 2 weeks of the first. (Note: YY means that the scorer identified an arousal in a given epoch at both time points; YN means she identified the arousal the first time but not the second; NY means the second time only; NN means at neither time point.)
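
The joint classification used in Tables 1-3 can be sketched as follows, assuming each scorer's two passes are available as a pair of booleans per epoch (True = arousal scored). The function names and input layout are hypothetical illustrations, not the study's software.

```python
from collections import Counter

# Category order used for rows and columns of the joint-classification tables.
CODES = ("YY", "YN", "NY", "NN")


def joint_code(first: bool, second: bool) -> str:
    """Encode one scorer's two passes over an epoch as YY, YN, NY or NN."""
    return ("Y" if first else "N") + ("Y" if second else "N")


def cross_tabulate(rater_a, rater_b):
    """Build a 4x4 joint-classification table (counts of epochs).

    rater_a, rater_b: equal-length sequences of (first, second) boolean
    pairs, one per epoch. Rows follow rater_a's code, columns rater_b's.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same epochs")
    counts = Counter(
        (joint_code(*a), joint_code(*b)) for a, b in zip(rater_a, rater_b)
    )
    # Counter returns 0 for code pairs that never occurred.
    return [[counts[(r, c)] for c in CODES] for r in CODES]


# Toy example with three epochs (not study data).
a = [(True, True), (True, False), (False, False)]
b = [(True, True), (False, False), (False, False)]
print(cross_tabulate(a, b))
```

Row and column totals of such a table are each rater's own YY/YN/NY/NN counts, which is why the same marginal totals recur across tables that share a rater.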

Table 1: Joint Classification of Arousal Reliability Data between Raters 914 and 915

                  Rater 915
                  YY     YN     NY     NN     Total
Rater 914  YY     106    8      8      12     134
           YN     7      1      3      11     22
           NY     15     3      1      20     39
           NN     21     18     9      797    845
           Total  149    30     21     840    1040

Table 2: Joint Classification of Arousal Reliability Data between Raters 914 and 916

                  Rater 916
                  YY     YN     NY     NN     Total
Rater 914  YY     109    8      5      12     134
           YN     8      2      2      10     22
           NY     13     4      3      19     39
           NN     16     13     6      810    845
           Total  146    27     16     851    1040

Table 3: Joint Classification of Arousal Reliability Data between Raters 915 and 916

                  Rater 916
                  YY     YN     NY     NN     Total
Rater 915  YY     118    10     6      15     149
           YN     7      8      2      13     30
           NY     9      2      2      8      21
           NN     12     7      6      815    840
           Total  146    27     16     851    1040

Table 4: Estimates of Intra- and Inter-Rater Agreement Measured by Cohen's Kappa

Intra-Rater Agreement

Rater 914            Rater 915            Rater 916
0.78 (0.72, 0.82)*   0.82 (0.77, 0.86)    0.85 (0.80, 0.89)

Inter-Rater Agreement

Raters 914/915       Raters 914/916       Raters 915/916
0.70 (0.65, 0.75)    0.73 (0.68, 0.78)    0.76 (0.72, 0.80)

* Values in parentheses are 95% confidence intervals.
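
Assuming the intra-rater kappas were computed from each rater's 2x2 table of first versus second pass, they can be recomputed from the marginal totals of the joint-classification tables: rater 914 from the row totals of Table 1, rater 915 from the column totals of Table 1, and rater 916 from the column totals of Table 2. The minimal Python sketch below (function name hypothetical) prints 0.78, 0.82 and 0.85, matching the point estimates in Table 4. It does not compute the confidence intervals or the inter-rater estimates, whose exact method the report does not describe.

```python
def cohens_kappa_2x2(yy: int, yn: int, ny: int, nn: int) -> float:
    """Cohen's kappa for one rater's two binary passes over the same epochs.

    yy: arousal at both passes; yn: first pass only;
    ny: second pass only; nn: neither pass.
    """
    n = yy + yn + ny + nn
    p_observed = (yy + nn) / n
    first_yes = (yy + yn) / n   # proportion of epochs positive at first pass
    second_yes = (yy + ny) / n  # proportion of epochs positive at second pass
    # Agreement expected by chance if the two passes were independent.
    p_expected = first_yes * second_yes + (1 - first_yes) * (1 - second_yes)
    return (p_observed - p_expected) / (1 - p_expected)


# Each rater's YY/YN/NY/NN totals, read from the margins of Tables 1 and 2.
marginals = {
    "914": (134, 22, 39, 845),
    "915": (149, 30, 21, 840),
    "916": (146, 27, 16, 851),
}
for rater, counts in marginals.items():
    print(rater, round(cohens_kappa_2x2(*counts), 2))
```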

National Sleep Research Resource