Monthly Archives: January 2009
Analysis of facial motion patterns during speech using a matrix factorization algorithm
This paper presents an analysis of facial motion during speech to identify linearly independent kinematic regions. The data consists of three-dimensional displacement records of a set of markers located on a subject’s face while producing speech. A QR factorization with column pivoting algorithm selects a subset of markers with independent motion patterns. The subset is used as a basis to fit the motion of the other facial markers, which determines facial regions of influence of each of the linearly independent markers. Those regions constitute kinematic “eigenregions” whose combined motion produces the total motion of the face. Facial animations may be generated by driving the independent markers with collected displacement records.
©2008 Acoustical Society of America
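The marker-selection idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one data column per marker channel, uses synthetic data, and performs the column pivoting with a greedy Gram-Schmidt deflation in NumPy. The names `select_independent_columns`, `markers`, and `weights` are hypothetical.

```python
import numpy as np

def select_independent_columns(A, k):
    """Greedy column-pivoted QR: return indices of up to k independent columns of A."""
    R = np.array(A, dtype=float)              # working copy, deflated in place
    tol = 1e-10 * np.linalg.norm(R, axis=0).max()
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)     # remaining energy in each column
        j = int(np.argmax(norms))             # pivot: largest remaining column
        if norms[j] <= tol:                   # nothing independent is left
            break
        chosen.append(j)
        q = R[:, j] / norms[j]                # unit vector along the pivot column
        R -= np.outer(q, q @ R)               # remove that direction from all columns
    return chosen

# Synthetic stand-in for displacement records: 200 time samples, 8 channels,
# of which only 3 carry independent motion (an assumption for illustration).
rng = np.random.default_rng(0)
basis = rng.standard_normal((200, 3))
markers = np.hstack([basis, basis @ rng.standard_normal((3, 5))])

chosen = select_independent_columns(markers, 3)

# Fit every channel as a linear combination of the selected ones; large
# weights indicate which selected marker "influences" which channel.
weights, *_ = np.linalg.lstsq(markers[:, chosen], markers, rcond=None)
residual = np.linalg.norm(markers[:, chosen] @ weights - markers)
```

In this toy setting the three selected columns reproduce all eight channels up to rounding error, which mirrors the use of the independent markers as a basis for the remaining ones.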
Benefit of high-rate envelope cues in vocoder processing: Effect of number of channels and spectral region
In cochlear implants, or vocoder simulations of cochlear implants, the transmission of envelope cues at high rates (related to voice fundamental frequency, f0) may be limited by the widths of the filters used to form the channels and/or by the cutoff frequency, flp, of the low-pass filters used for envelope extraction. The effect of varying flp in tone and noise vocoders was investigated for channel numbers, N, from 6 to 18. As N increased, the widths of the channels decreased. The value of flp was 45 Hz (envelope or “E” filter), or 180 Hz (pitch or “P” filter). The following combinations of cutoff frequencies were used for channels below and above 1500 Hz, respectively: EE, PE, EP, and PP. Results from a competing-talker task showed that the tone vocoder led to better intelligibility than the noise vocoder. The PP condition led to the best intelligibility and the EE condition to the worst. For N=6, intelligibility was better for condition PE than for condition EP. For N=18, the reverse was true. The results indicate that the channel bandwidths can compromise the transmission of f0-related envelope information, and suggest that vocoder simulations of cochlear-implant processing have limitations.
©2008 Acoustical Society of America
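Why the envelope filter cutoff matters for f0-related cues can be shown with a small sketch. It assumes a one-pole low-pass filter as a stand-in for the study's envelope filters (whose design is not given in the abstract) and an idealized envelope modulated at a hypothetical f0 of 120 Hz. The function names are illustrative only.

```python
import math

def one_pole_lowpass(signal, cutoff_hz, fs):
    """Smooth a signal with a simple one-pole low-pass filter."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    out, y = [], 0.0
    for x in signal:
        y += a * (x - y)          # move the state toward the input
        out.append(y)
    return out

def modulation_depth(env):
    """Depth of the residual modulation, measured after the filter has settled."""
    tail = env[len(env) // 2:]
    return (max(tail) - min(tail)) / (max(tail) + min(tail))

fs = 16000                        # sampling rate in Hz (assumed)
f0 = 120.0                        # assumed voice fundamental frequency in Hz
raw_env = [1.0 + math.cos(2.0 * math.pi * f0 * n / fs) for n in range(fs)]

depth_e = modulation_depth(one_pole_lowpass(raw_env, 45.0, fs))    # "E" cutoff
depth_p = modulation_depth(one_pole_lowpass(raw_env, 180.0, fs))   # "P" cutoff
```

With the 45 Hz cutoff most of the 120 Hz modulation is smoothed away, whereas the 180 Hz cutoff preserves much more of it, which is the sense in which the "P" filter transmits pitch-related envelope cues.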
Hybridizing conversational and clear speech to determine the degree of contribution of acoustic features to intelligibility
Speakers naturally adopt a special “clear” (CLR) speaking style in order to be better understood by listeners who are moderately impaired in their ability to understand speech due to a hearing impairment, the presence of background noise, or both. In contrast, speech intended for nonimpaired listeners in quiet environments is referred to as “conversational” (CNV). Studies have shown that the intelligibility of CLR speech is usually higher than that of CNV speech in adverse circumstances. It is not known which individual acoustic features or combinations of features cause the higher intelligibility of CLR speech. The objective of this study is to determine the contribution of some acoustic features to intelligibility for a single speaker. The proposed method creates “hybrid” (HYB) speech stimuli that selectively combine acoustic features of one sentence spoken in the CNV and CLR styles. The intelligibility of these stimuli is then measured in perceptual tests, using 96 phonetically balanced sentences. Results for one speaker show significant sentence-level intelligibility improvements over CNV speech when replacing certain combinations of short-term spectra, phoneme identities, and phoneme durations of CNV speech with those from CLR speech, but no improvements for combinations involving fundamental frequency, energy, or nonspeech events (pauses).
©2008 Acoustical Society of America
On the ability to discriminate Gaussian-noise tokens or random tone-burst complexes
This study investigated factors that influence a listener’s ability to discriminate Gaussian-noise stimuli in a same-different discrimination paradigm. The first experiment showed that discrimination ability increased with bandwidth for noise durations up to 100 ms. Duration had a nonmonotonic influence on performance, with a decrease in discriminability for stimuli longer than 40 ms. Further experiments investigated the cause for this performance decrease. They showed that discriminability could be improved when using frozen-noise tokens and by instructing listeners to focus on the stimulus endings. A final experiment, using a stimulus consisting of 5 ms Hanning-windowed tone-bursts randomly distributed over time, investigated whether stimulus duration and amount of information differently affect the processing capacity of the auditory system. Results showed that the number of degrees of freedom in the stimulus, not its duration, predominantly influenced the ability to discriminate. Overall, the results suggest that the discrimination performance for acoustic stimuli depends strongly on the amount of information per critical band and the capacity to process this information. This capacity seems to be limited in the temporal dimension, while extending the signal over more auditory filters does have a positive effect on performance.
©2008 Acoustical Society of America
Perception of rhythmic grouping depends on auditory experience
Many aspects of perception are known to be shaped by experience, but others are thought to be innate universal properties of the brain. A specific example comes from rhythm perception, where one of the fundamental perceptual operations is the grouping of successive events into higher-level patterns, an operation critical to the perception of language and music. Grouping has long been thought to be governed by innate perceptual principles established a century ago. The current work demonstrates instead that grouping can be strongly dependent on culture. Native English and Japanese speakers were tested for their perception of grouping of simple rhythmic sequences of tones. Members of the two cultures showed different patterns of perceptual grouping, demonstrating that these basic auditory processes are not universal but are shaped by experience. It is suggested that the observed perceptual differences reflect the rhythms of the two languages, and that native language can exert an influence on general auditory perception at a basic level.
Perceptual development of phoneme contrasts: How sensitivity changes along acoustic dimensions that contrast phoneme categories
Listeners discriminate acoustic differences between phoneme categories at a higher level than similarly sized differences within phoneme categories. The question this paper aims to answer is how this pattern in perceptual sensitivity develops along an acoustic dimension that contrasts two non-native speech sounds: through acquired distinctiveness, through acquired similarity, or through a combination of the two. A pretest–training–post-test experiment was designed to study perceptual development directly, i.e., by including (i) a discrimination task to measure perceptual sensitivity, (ii) a transfer test to ensure language learning instead of stimulus learning, and (iii) a control group to exclude task repetition as an explanation of improvement. It is shown that the typical peak in perceptual sensitivity near a phoneme boundary that native listeners show is not found in relatively inexperienced language learners, despite their ability to classify a continuum in a nativelike way after short laboratory training. Experiment II indicates that a discrimination peak may be achieved by language learners, but only after much more language experience than short-term laboratory training can offer. Furthermore, reasons are given why classification improvement in the laboratory should not be taken as evidence for (i) increased discrimination of the newly learned phonemes and (ii) learning of phoneme representations.
Speech perception of noise with binary gains
For a given mixture of speech and noise, an ideal binary time-frequency mask is constructed by comparing speech energy and noise energy within local time-frequency units. It is observed that listeners achieve nearly perfect speech recognition from gated noise with binary gains prescribed by the ideal binary mask. Only 16 filter channels and a frame rate of 100 Hz are sufficient for high intelligibility. The results show that, despite a dramatic reduction of speech information, a pattern of binary gains provides an adequate basis for speech perception.
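The mask construction described above can be sketched in a few lines. The sketch assumes the speech and noise energies are already available per time-frequency unit, and it assumes a 0 dB local criterion; the abstract only states that the two energies are compared. The function name `ideal_binary_mask` and the toy energy values are hypothetical.

```python
import numpy as np

def ideal_binary_mask(speech_energy, noise_energy, criterion_db=0.0):
    """Return 1.0 where the local SNR exceeds criterion_db, else 0.0."""
    eps = 1e-12                                   # avoids division by zero
    snr_db = 10.0 * np.log10((speech_energy + eps) / (noise_energy + eps))
    return (snr_db > criterion_db).astype(float)

# Toy energies: rows are filter channels, columns are time frames.
speech = np.array([[4.0, 0.1, 2.0],
                   [0.5, 3.0, 0.2]])
noise = np.ones_like(speech)

mask = ideal_binary_mask(speech, noise)
gated_noise = mask * noise        # noise switched on only where speech dominated
```

Applying the mask to the noise alone, rather than to the mixture, corresponds to the "gated noise with binary gains" condition: the only speech information that survives is the on/off pattern itself.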
Re-evaluating split-fovea processing in word recognition: Effects of word length during monocular viewing
Several studies have claimed that, when fixating a word, a precise split in foveal processing causes all letters to the left and right of fixation to project to different, contralateral hemispheres (split-fovea theory – SFT). In support of this claim, Lavidor et al. (2001; hereafter LES&B) reported that lexical decisions were affected by the number of letters to the left of fixation but not the right, and that this indicates a functional division in hemispheric processing at the point of fixation. Jordan, Paterson, and Stachurski (Cortex, 2009; hereafter JP&S) re-evaluated these claims over 3 experiments using LES&B’s original stimuli and procedures and found no support for the findings of LES&B. Following LES&B, JP&S presented stimuli binocularly (i.e., as in normal viewing). However, this procedure has its own complications for SFT (and for assessing the validity of the theory) because the two eyes often do not fixate the same location. Consequently, we report two further experiments which used an eye-tracker to ensure fixation accuracy and monocular viewing to eliminate influences of fixation disparity. Experiment 1 used the same-sized typeface as JP&S, and Experiment 2 used a larger typeface to approximate normal reading size. In line with the findings of JP&S, neither experiment could replicate the findings of LES&B and both experiments showed simply that word recognition was easier when fixations were made towards the beginning of words. Thus, after a total of 5 separate experiments, using binocular and monocular viewing conditions and stimuli presented in a range of sizes, none of these experiments has been able to replicate the findings of LES&B or provide any evidence for a functional division in hemispheric processing at the point of fixation.
from Cortex
A comparison of the linguistic and interactional features of language learning websites and textbooks
Self-study is playing an increasingly important role in the learning and instruction of many subjects, including second and foreign languages. With the rapid development of the internet, language websites for self-study are flourishing. While the language of print-based teaching materials has received some attention, the linguistic and interactional features of websites are largely ignored by educationists, and online learning materials are regarded as simply duplicates of their print-based counterparts. This is far from satisfactory because web-based and print-based materials are very different tools with which participants negotiate their learning activities. This paper examines the linguistic and interactional features of English learning websites in terms of (1) their lexical density/clause length; (2) referential cohesion, particularly the use of personal pronouns; and (3) the presence of involvement strategies and other interactional features. These features are compared with those in textbooks to examine how websites deviate from traditional instructional texts. It is found that both clause length and lexical density are greater on websites than in traditional textbooks. Websites make more use of the personal pronouns ‘I’ and ‘you’, whereas textbooks make more use of the authoritative ‘we’. Websites are also more interactional in terms of their use of involvement strategies, imperative structures and modals. These findings highlight the different contexts of textbooks and websites, particularly the different nature of the two channels and their credibility as information sources. This has practical implications for the design of appropriate online instructional resources.
Criteria for evaluating synchronous learning management systems: arguments from the distance language classroom
As Virtual Learning Environments (VLE) supported by synchronous technologies such as Synchronous Learning Management Systems (SLMSs) are still new to distance language professionals, criteria guiding the evaluation of the appropriate SLMSs for Distance Language Education (DLE) are urgently needed. This article proposes and discusses such criteria. To achieve this, we divide the article into four sections. The first section identifies the need to develop criteria for evaluating an appropriate SLMS for DLE by reviewing what has been achieved in the research into Computer Mediated Communications (CMC) and SLMSs. The second section examines established principles of second language (L2) learning and the nature of distance language education in order to establish a theoretical framework for the formulation of such criteria. In section three, we propose five major criteria for evaluating an effective SLMS for DLE. We further discuss and argue for each criterion from the perspectives of both an e-instructor and an e-learner, drawing on the empirical data from our distance language classroom. The final section acknowledges the limitations of the study and concludes that the proposal of these criteria is timely and that these criteria need to be enriched with the pedagogical and technological developments in DLE.
Preparing language teachers to teach language online: a look at skills, roles, and responsibilities
This paper reviews and critiques an existing skills framework for online language teaching. This critique is followed by an alternative framework for online language teaching skills. This paper also uses a systems view to look at the roles and responsibilities of various stakeholders in an online learning system. Four major recommendations are provided to help language teacher training programs prepare future language teachers for online language teaching.
When is it appropriate to talk? Managing overlapping talk in multi-participant voice-based chat rooms
There has been extensive reporting on the interactional characteristics of multi-participant text-based chat rooms. In these chat rooms there are several students typing at the same time, often on more than one topic. As a result, it is not uncommon to see multiple overlapping utterances. Despite these communicative challenges, research suggests that multi-participant text-based chat rooms are beneficial for language teaching and learning. It is my objective to investigate whether the same can be said for multi-participant voice-based chat rooms. As there is little empirical work on the interaction that results from communicating in voice-based chat rooms, a necessary first step in discussing pedagogical benefits is to investigate its interactional structure. This study will therefore focus on how overlapping talk is dealt with in a medium in which multiple voices are heard in the absence of nonverbal cues. The findings show how pauses act in connection to overlapping talk, both as a source and an interactional resource. These findings will then be used to discuss the pedagogical implications of communicating in multi-participant voice-based chat rooms.
Pubertal changes in emotional information processing: Pupillary, behavioral, and subjective evidence during emotional word identification
This study investigated pupillary and behavioral responses to an emotional word valence identification paradigm among 32 pre-/early pubertal and 34 mid-/late pubertal typically developing children and adolescents. Participants were asked to identify the valence of positive, negative, and neutral words while pupil dilation was assessed using an eyetracker. Mid-/late pubertal children showed greater peak pupillary reactivity to words presented during the emotional word identification task than pre-/early pubertal children, regardless of word valence. Mid-/late pubertal children also showed smaller sustained pupil dilation than pre-/early pubertal children after the word was no longer on screen. These findings were replicated controlling for participants’ age. In addition, mid-/late pubertal children had faster reaction times to all words, and rated themselves as more emotional during their laboratory visit compared to pre-/early pubertal children. Greater recall of emotional words following the task was associated with mid-/late pubertal status, and greater recall of emotional words was also associated with higher peak pupil dilation. These results provide physiological, behavioral, and subjective evidence consistent with a model of puberty-specific changes in neurobehavioral systems underpinning emotional reactivity.
“Cajoling” as a Means of Engagement in the Dysphagia Clinic
Rapport and cooperation are key features of many clinical interactions including those of speech-language pathologists (SLPs) and clients. A desirable by-product of rapport can be described as “engagement” where participants share a mutual focus while working toward a common goal. Through an analysis of clinical discourse, this article maps the trajectory of engagement as manifest in interactions between an SLP and a client with right hemisphere damage and dysphagia. The analysis shows that, in response to some apparently inappropriate comments made by the client, the SLP responded with teasing or what she called “cajoling” behavior. Cajoling accompanied by humor and laughter became the SLP’s way of gaining and maintaining cooperation in this context. Instead of such behavior being viewed as “unprofessional,” careful mapping of this behavior across several interactions served to demonstrate its value in the ultimate joint achievement of goals. Implications for how such constructions of engagement may be manifest through talk in the SLP clinic are discussed.
A Tool for Assessing Engagement in Instructional Contexts
Engaging activities are enriching ones. To determine whether an instructional activity is engaging, it helps to divide an activity into separate domains, and to rate the activity for the degree to which it fosters engagement in each of the domains. A tool made up of six domains of instructional practice is described in this article, based on the literature of teacher-child interactions in classrooms. The tool’s domains are: the challenge of the activity, level of implementation, quality of instructional discussion, quality of instructional feedback, and quality of procedural and substantive involvement. The tool is offered as a way to evaluate and enhance engagement of children in classroom interactions as well as of adults in clinical interactions. Samples of engagement taken from children in classroom activities are first examined according to this construct, and the observational tool is then applied to the analysis of a therapy session involving an adult with aphasia.
