Blog Archives

VFS Interjudge Reliability Using a Free and Directed Search

Reports in the literature suggest that clinicians demonstrate poor reliability in rating videofluoroscopic swallow (VFS) variables. Contemporary perception theories suggest that the methods used in VFS reliability studies constrain subjects to make judgments in an abnormal way. The purpose of this study was to determine whether a directed search or a free search approach to rating swallow studies results in better interjudge reliability. Ten speech pathologists served as judges. Five clinical judges were assigned to the directed search group (use checklist) and five to the free search group (unguided observations). Clinical judges interpreted 20 VFS examinations of swallowing. Interjudge reliability of ratings of dysphagia severity, affected stage of swallow, dysphagia symptoms, and attributes identified by clinical judges using a directed search was compared with that using a free search approach. Interjudge reliability for rating the presence of aspiration and penetration was significantly better using a free search (“substantial” to “almost perfect” agreement) compared to a directed search (“moderate” agreement). Reliability of dysphagia severity ratings ranged from “moderate” to “almost perfect” agreement for both methods of search. Reliability for reporting all other symptoms and attributes of dysphagia was variable and was not significantly different between the groups.

from Dysphagia
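
The quoted labels in the abstract above ("moderate", "substantial", "almost perfect") match the descriptive bands that Landis and Koch (1977) proposed for kappa-type agreement statistics. The abstract does not name the statistic or the scale, so reading the labels this way is an assumption. Under that assumption, a minimal sketch of the mapping (the function name is hypothetical) looks like this:

```python
def landis_koch_label(kappa: float) -> str:
    """Return the Landis and Koch (1977) descriptive band for a kappa value.

    Assumption: the abstract's quoted labels follow this scale; the
    abstract itself does not say so.
    """
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Illustrative values only; these are not results from the study.
for value in (0.55, 0.72, 0.90):
    print(value, landis_koch_label(value))
```

On this reading, the directed search group's agreement on aspiration and penetration would sit somewhere in 0.41 to 0.60, and the free search group's somewhere in 0.61 to 1.00.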


Assessing executive functioning: On the validity, reliability, and sensitivity of a click/point random number generation task in healthy adults and patients with cognitive decline

In random number generation (RNG) tasks, used to assess executive functioning, participants are asked to generate a random sequence of digits at a paced rate, either verbally or by writing. Some previous studies used an alternative format in which participants had to randomly press different response keys, assuming that this task version demands the same cognitive processes as those implied in the standard version. The present study examined the validity of this assumption. To this end, the construct validity, reliability, and sensitivity of a conceptually similar task version of the key-press task were examined. Participants had to randomly click on, or point to, the digits 1-9, laid out in order in a 3 × 3 grid on a computer screen. Psychometric properties of this task were examined, based on the performance of 131 healthy participants and 80 patients with cognitive decline. The results suggest that the click/point RNG task version can be used as a reliable and valid substitute for standard task versions that use the same response set and response pacing rate as those used in the present study. This task might be a useful alternative, demanding no separate recording and recoding of responses, and being suitable for use with patients with speech or writing problems.

from the Journal of Clinical and Experimental Neuropsychology

Language Sampling: Does the Length of the Transcript Matter?

Conclusion: Implications for the efficient use of language sample analysis in clinical protocols are discussed. A framework for eliciting reliable short samples is provided.

from Language, Speech and Hearing Services in Schools

Language lateralization in children: A functional transcranial Doppler reliability study

We used bilateral simultaneous functional Transcranial Doppler Ultrasonography (fTCD) measurements in the middle cerebral arteries (MCA) to obtain information on hemispheric specialization for language processing in individual children. Twenty-six healthy right-handed children (49–113 months) participated in an active, expressive language task (talking about pictures) and a more passive, receptive language task (listening to stories). One-month-retest reliability was evaluated in 20 children. Both tasks elicited a mean left hemispheric lateralization, which was more pronounced in the expressive task. Retesting confirmed the initial lateralization in 90% of the cases of the expressive paradigm and in 55% for the receptive task. Lateralization of blood flow accelerations in the MCA did not depend on demographical variables (age and gender), degree of hand dominance, performance quality, or language skills. The expressive language paradigm measured by fTCD is a reliable and non-invasive alternative to current language lateralization methods in children.

from the Journal of Neurolinguistics

The reliability of the Communication Disability Profile: A patient-reported outcome measure for aphasia

The findings of this study provided preliminary psychometric evidence to support the use of the Activities and Participation sections of the CDP as a PRO measure by people with aphasia.

from Aphasiology

Clinical Versus Laboratory Ratings of Voice Using the CAPE-V

Clinical bias may play a role in observed discrepancies between clinical and laboratory ratings of dysphonia. Additionally, auditory anchors available during laboratory procedures may contribute to these discrepancies. These findings highlight the need to standardize procedures for clinical voice assessment.

from the Journal of Voice

Considerations for test selection: How do validity and reliability impact diagnostic decisions?

Nine preschool and school-age language assessment tools found to have acceptable levels of identification accuracy were evaluated to determine their overall levels of psychometric validity for use in diagnosing the presence/absence of language impairment. Eleven specific criteria based on those initially devised by McCauley and Swisher (1984) were applied to each of the selected tests in order to determine each test’s overall level of psychometric validity. Results indicated that each of the selected assessment tools met at least eight of the 11 criteria, and five tests met 10 of the 11. Findings are discussed to assist clinicians in applying psychometric criteria to these selected tests, as well as to those not covered in this review of standardized assessment tools. A decision tree is included within the discussion of this study’s findings to aid clinicians in the selection of standardized assessment tools that are most appropriate for clinical use, based on their psychometric characteristics.

from Child Language Teaching and Therapy

Investigation of the reliability of the SSI-3 for preschool Persian-speaking children who stutter

There is a pressing need in Iran for the translation of widely used speech-language assessment tools into Persian. This study reports the interjudge and intrajudge reliability of a Persian translation of the Stuttering Severity Instrument-3 (SSI-3) (Riley, 1994). There was greater than 80% interjudge and intrajudge agreement on scale scores for Frequency and Duration, 54% interjudge and 62.2% intrajudge agreement for “Physical Concomitants” and greater than 80% interjudge and intrajudge agreement for the Overall score. In conclusion, although percentage agreement for Physical Concomitant Behaviors was low, the Persian translation of SSI-3 shows otherwise acceptable interjudge and intrajudge reliability when performed under ideal conditions.

from Journal of Fluency Disorders

Reliability, Stability, and Sensitivity to Change and Impairment in Acoustic Measures of Timing and Frequency

Assessment of the voice for supporting classifications of central nervous system (CNS) impairment requires a different practical, methodological, and statistical framework compared with assessment of the voice to guide decisions about change in the CNS. In experimental terms, an understanding of the stability and sensitivity to change of an assessment protocol is required to guide decisions about CNS change. Five experiments (N = 70) were conducted using a set of commonly used stimuli (eg, sustained vowel, reading, extemporaneous speech) and easily acquired measures (eg, f0–f4, percent pause). Stability of these measures was examined through their repeated application in healthy adults over brief and intermediate retest intervals (ie, 30 seconds, 2 hours, and 1 week). Those measures found to be stable were then challenged using an experimental model that reliably changes voice acoustic properties (ie, the Lombard effect). Finally, adults with an established CNS-related motor speech disorder (dysarthria) were compared with healthy controls. Of the 61 acoustic variables studied, 36 showed good stability over all three stability experiments (eg, number of pauses, total speech time, speech rate, f0–f4). Of the measures with good stability, a number of frequency measures showed a change in response to increased vocal effort resulting from the Lombard effect challenge. Furthermore, several timing measures significantly separated the control and motor speech impairment groups. Measures with high levels of stability within healthy adults, and those that show sensitivity to change and impairment may prove effective for monitoring changes in CNS functioning.

from the Journal of Voice

Cross-cultural Adaptation and Validation of the Voice Handicap Index Into Italian

The Italian VHI is highly reproducible, and exhibits excellent clinical validity.

from the Journal of Voice

Training specific vocal techniques can be effective in treating nonorganic dysphonias. Evaluation of vocal function in these studies has included auditory-perceptual assessment, aerodynamic measurement, acoustic analysis, self-report, and visual inspection of the larynx. Reliability of judgments made using visual rating tools for nasendoscopic and videostroboscopic visualization of the larynx when diagnosing vocal function and disorder has been the focus of previous research. However, detailed analysis of factors that affect reliability and consistency of perceptual ratings of laryngoscopic footage has not been investigated in voice therapy outcome studies. This study evaluated clinicians’ judgments of the effectiveness of training differentiated vocal tract control of false vocal fold activity (FVFA), true vocal fold mass (TVFM) and larynx height (LH). A within-subject, experimental design was used to assess participants’ mastery in manipulating FVFA, TVFM, and LH assessed via laryngoscopic visualization of the larynx. Three experienced speech pathologists rated the nasendoscopy footage with accompanying acoustic recordings of 12 speakers. Intrajudge consistency, interjudge reliability, and interjudge agreement of perceptual ratings were investigated. Twelve vocally trained unimpaired speakers used differentiated biomechanical manipulation of various laryngeal muscles to produce eight specific vocal qualities each. These manipulated vocal qualities were rated by three experienced voice clinicians who demonstrated higher levels of intrajudge consistency and interjudge agreement when identifying rather than quantifying the degree of a voice quality based on their visual and auditory perceptions of the different vocal features. 
The findings suggest that unimpaired speakers can be trained successfully to manipulate and change individual biomechanical aspects of their vocal functions as demonstrated by the visual- and auditory-perceptual judgments of expert voice clinicians. These judgments are vulnerable to issues of reliability, and the findings also suggest that judges relied on auditory-perceptual judgments when interpreting laryngoscopic footage, particularly when the view of laryngeal features was compromised.

from the Journal of Voice

Cleft Audit Protocol for Speech (CAPS-A): a comprehensive training package for speech analysis

Methods & Procedures: Thirty-six specialist speech and language therapists undertook the training programme over four days. This consisted of two days’ training on the CAPS-A tool followed by a third day spent making independent ratings and transcriptions of ten new cases that had previously been recorded during routine audit data collection. This task was repeated on day 4, a minimum of one month later. Ratings were made using the CAPS-A record form with the CAPS-A definition table. The speech and language therapists’ CAPS-A ratings at occasion 1 and occasion 2 were analysed, and the intra- and inter-rater reliability was calculated.

Outcomes & Results: Trained therapists showed consistency in individual judgements on specific sections of the tool. Intraclass correlation coefficients were calculated for each section with good agreement on eight of 13 sections. There were only fair levels of agreement on anterior oral cleft speech characteristics, non-cleft errors/immaturities and voice. This was explained, at least in part, by their low prevalence which affects the calculation of the intraclass correlation coefficient statistic.

Conclusions & Implications: Speech and language therapists benefited from training on the CAPS-A, focusing on specific aspects of speech using definitions of parameters and scalar points, in order to apply the tool systematically and reliably. Ratings are enhanced by ensuring a high degree of attention to the nature of the data; by standardizing the speech sample, data acquisition, and the listening process; and by using high-quality recording and playback equipment. In addition, a method is proposed for maintaining listening skills following training as part of an individual’s continuing education.

from the International Journal of Language and Communication Disorders
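
The CAPS-A results above attribute the fair agreement on some sections partly to low prevalence. The study used intraclass correlation coefficients. The sketch below swaps in Cohen's kappa on yes/no ratings from two raters, purely because it shows the same prevalence effect with less arithmetic. The counts are invented for illustration and are not data from the study.

```python
def cohen_kappa(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Cohen's kappa for two raters making yes/no judgments.

    Arguments are the four cells of the 2 x 2 agreement table.
    """
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n          # raw proportion of agreement
    a_yes = (both_yes + a_only) / n              # rater A's "yes" rate
    b_yes = (both_yes + b_only) / n              # rater B's "yes" rate
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # chance agreement
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical tables: both have 90 of 100 cases rated identically.
balanced = cohen_kappa(both_yes=45, a_only=5, b_only=5, both_no=45)
rare = cohen_kappa(both_yes=1, a_only=5, b_only=5, both_no=89)

print(round(balanced, 3))  # 0.8
print(round(rare, 3))      # 0.113
```

Both hypothetical tables have 90% raw agreement, yet the chance-corrected value drops sharply when the feature is rarely present. This is the kind of effect the authors point to for the low-prevalence sections.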

The vestibular evoked myogenic potential: A test–retest reliability study

A unilateral muscle contraction controlled by a feedback mechanism resulted in reliable response parameters, comparable right to left and corresponding to literature data obtained in different test conditions.

The use of a blood pressure manometer as feedback mechanism combined with a meticulously controlled positioning of the head and contraction of the SCM muscle provides a reliable alternative in clinical settings, when the background muscle contraction cannot be measured or software related correction algorithms are not accessible.

from Clinical Neurophysiology

Maximum Phonation Time: Variability and Reliability

The objective of the study was to determine maximum phonation time reliability as a function of the number of trials, days, and raters in dysphonic and control subjects. Two groups of adult subjects participated in this reliability study: a group of outpatients with functional or organic dysphonia versus a group of healthy control subjects matched by age and gender. Over a period of maximally 6 weeks, three video recordings were made of five subjects’ maximum phonation time trials. A panel of five experts was responsible for all measurements, including a repeated measurement of the subjects’ first recordings. Patients showed significantly shorter maximum phonation times compared with healthy controls (on average, 6.6 seconds shorter). The averaged intraclass correlation coefficient (ICC) over all raters per trial for the first day was 0.998. The averaged reliability coefficient per rater and per trial for repeated measurements of the first day’s data was 0.997, indicating high intrarater reliability. The mean reliability coefficient per day for one trial was 0.939. When using five trials, the reliability increased to 0.987. The reliability over five trials for a single day was 0.836; for 2 days, 0.911; and for 3 days, 0.935. To conclude, the maximum phonation time has proven to be a highly reliable measure in voice assessment. A single rater is sufficient to provide highly reliable measurements.

from the Journal of Voice
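
The gain in reliability from one trial to five in the maximum phonation time abstract above is consistent with the Spearman-Brown prophecy formula for the mean of k parallel measurements. The abstract does not say how its figures were derived, so treating them this way is an assumption. A minimal sketch (the function name is hypothetical):

```python
def spearman_brown(single_reliability: float, k: float) -> float:
    """Predicted reliability of the mean of k parallel measurements,
    given the reliability of a single measurement."""
    return k * single_reliability / (1 + (k - 1) * single_reliability)


# One trial -> five trials, starting from the reported 0.939.
five_trials = spearman_brown(0.939, 5)
print(round(five_trials, 3))  # 0.987

# One day -> two and three days, starting from the reported 0.836.
two_days = spearman_brown(0.836, 2)
three_days = spearman_brown(0.836, 3)
print(round(two_days, 3))    # 0.911
print(round(three_days, 3))  # 0.939
```

The formula reproduces the reported 0.987 for five trials and 0.911 for two days. For three days it gives about 0.939 rather than the reported 0.935, so the study's day-level figures may come from a different estimate.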

The Effect of Musical Background on Judgments of Dysphonia

The objectives of this study were to determine the effect of musical background on both pitch discrimination abilities and the reliability of judging voice quality in dysphonic speakers, and to determine the relationship between pitch discrimination abilities and the reliability of voice quality judgments. Twenty musicians and 20 nonmusicians performed pitch discrimination tests. They also made judgments of dysphonic vowels and speech samples for breathiness and roughness using 100-mm visual analog scales. Musicians demonstrated significantly smaller pitch discrimination thresholds than nonmusicians. For measures of intrarater agreement, musicians were significantly more consistent than nonmusicians for judgments of breathiness in both vowels and speech produced by dysphonic speakers. Musicians also showed significantly better interrater agreement for judgments of breathiness in vowels. Weak to moderate relationships were found between pitch discrimination abilities and agreement values for voice quality judgments. Results suggest that musical background may affect a listener’s reliability in making judgments of dysphonia, particularly for judgments of breathiness. However, simple pitch discrimination skills of pure tones do not explain these differences. More complex stimuli should be used in future investigations to help determine the nature of underlying differences.

from the Journal of Voice