Speech Pathology Management of Chronic Refractory Cough

One of my university assignments required me to research a new field of practice, and present the results as a proposal. I chose to write about Chronic Refractory Cough. Enjoy!


Defining Chronic Refractory Cough (CRC)

Cough is the most common symptom for which people seek ambulatory care in the United States (Hsiao, Cherry, Beatty, & Rechtsteiner, 2010). Physicians usually separate cough that is less than three weeks in duration (‘acute cough’) from that which persists for longer (‘sub-acute cough’ being three to eight weeks, ‘chronic cough’ persisting for longer than eight weeks). Respiratory physicians have developed a series of care guidelines for the management of cough, from which we can broadly categorise chronic cough etiologies (Irwin et al., 2006; Morice et al., 2004; Morice, McGarvey, & Pavord, 2006):

Class | Cause | Recommendation
Environmental | ACE inhibitors |
Upper respiratory | Upper Airway Cough Syndrome; post-nasal drip |
Lower respiratory | Asthma and COPD; eosinophilic bronchitis | Leukotriene antagonists
Aerodigestive | Reflux (GERD) | Proton-pump inhibitors
Rare | Psychogenic; Tourette's syndrome | Psychological counselling

Management involves a careful balance between empirical pharmaceutical treatment and objective measurements or studies. It is often difficult to identify the precipitating cause, as these conditions do not always cause cough. For example, the prevalence of GERD in Australia is 10%, and only a minority of these patients will develop chronic cough (Knox, Harrison, Britt, & Henderson, 2008). However, in some patients the etiology of chronic cough is not identified, and the condition fails to respond to empiric treatment. In this case it is considered ‘refractory’.

Why is CRC refractory?

New insights into the nature of cough are able to explain CRC. Cough is primarily a defensive mechanism of the airway to prevent irritants or foreign bodies from entering the lungs. The traditional understanding of cough describes a mechanical or chemical irritant triggering an afferent receptor in the respiratory tract, which activates a diffuse ‘cough-centre’ in the medulla, which then directs the glottis to close against a build-up of sub-glottal pressure. The glottis relaxes, causing the expulsion of air at speeds approaching the speed of sound, which expels the foreign bodies and shears the irritants off the mucosa (Irwin et al., 1998).

A more complex understanding, informed by recent research, hypothesises irritants triggering a cortical ‘urge-to-cough’ sensation which can vary in intensity according to the nature of the irritant. Through repeated irritation or inflammation, these afferent pathways may be damaged, or the cortex may neuroplastically respond to lower the ‘cough-threshold’ (the minimum amount of a given stimulus required to make a cough unable to be suppressed) (Morice, 2010; Morrison, Rammage, & Emami, 1999). Also, the larynx may develop ‘urge-to-cough’ with stimuli that would not usually cause cough (Vertigan, Theodoros, Gibson, & Winkworth, 2007). Thus even if the initial cause of the hypersensitivity (reflux, infection, etc.) is resolved, the cough may remain due to the induced sensory neuropathy.

Nature and implications of the condition

People with CRC may cough on stimuli that do not usually stimulate cough, such as talking, lying down, eating or performing exercise; as well as displaying hypersensitivity to stimuli that can cause cough such as perfumes, smoke or cold air (Chung, 2014; French, Irwin, Curley, & Krikorian, 1998; Morrison et al., 1999; Vertigan et al., 2007). It is important to note that triggers are often so varied that it is impossible for patients to avoid all of them in their everyday lives.

Investigations have revealed that people with chronic cough report reduced physical, mental and social health, reduced vitality, and sometimes have worries regarding personal safety (Morice, 2013). More than half of the chronic cough patients surveyed in a New York study reported depressive symptoms, which correlated with the presence of cough over the three months of the study (Dicpinigaitis, Tso, & Banauch, 2006).


A recent review suggested that the global prevalence of chronic cough, as measured by patient reports, was roughly 10% (Song et al., 2015). It is important to note that the condition is not localized to the Western Hemisphere. A limited number of small studies suggest that the proportion of those with chronic cough who fail to respond to specialist intervention is 12%-42% (McGarvey, 2008).[1] While it is difficult to work with these figures to arrive at a prevalence figure for Chronic Refractory Cough, it is clear that there are a significant number of people who will experience this condition.

According to a worldwide survey, two-thirds of chronic cough patients are female, and the majority are aged 50-69 years, although in China the majority are aged 30-50 years, given the higher amount of environmental pollution (Morice et al., 2014). Females in general display greater difficulty in managing noxious stimuli, and this has been confirmed by studies of capsaicin challenges (Kastelik et al., 2002; Morice et al., 2014).

Populations suitable for specialist management

Not all chronic cough patients receive adequate investigations through primary care. General Practitioners should at least trial therapy for GERD, asthma or upper-respiratory infections before referring for specialist management. These therapies are commonplace, and present minimal risk to the patient. The GP should also investigate common environmental causes, and discuss possible lifestyle changes with the patient. For example, the patient could implement diet changes to manage GERD or cease smoking (Gibson et al., 2010).

Better research is needed to separate refractory chronic cough patients from those whose cough will resolve. As already indicated, chronic cough patients can have poor response to empiric treatments, and cough-sensitivity testing is unfortunately not a specific measure of CRC (Birring, 2011). Better measurements will present patients with a clearer pathway to CRC resolution.

Speech Pathology Assessment and Intervention

Speech Pathology management for CRC has been used for many years (Blager, Gay, & Wood, 1988), but published research has increased since the mid-2000s. Speech Pathologists (SPs) do not require specific qualifications to practise in this area in Australia; however, it is an emerging area of practice not rigorously covered in tertiary courses, and is only tangentially referred to in Speech Pathology Australia’s Scope of Practice (2015). SPs could gain competency in this area through professional development workshops, contact with expert clinicians, and targeted reading of the literature.

SP assessment would involve a case history from the patient’s referring doctor and the patient themselves detailing the characteristics of the cough, its triggers, severity, impacts on the person’s activities and participation, and previous management. Associated symptoms would be investigated, including the presence of comorbid middle airway dysfunctions, such as paradoxical vocal fold movement (Vertigan, Bone, & Gibson, 2013). Additionally, the SP would ask questions typical to voice patients, including level of hydration, alcohol and caffeine consumption, lozenge use, exposure to fumes, breathing style, and vocal behaviours and demands. Finally the SP should assess the patient’s candidacy for behavioural therapy by investigating their concern and their ability to adopt an internal locus of control.

Over the course of 2-3 sessions, the SP would deliver a four-pronged intervention as per Vertigan & Gibson (2012). The patient would be taught about the nature of cough, and that cough is not always productive (in both senses of the word), along with basic respiratory anatomy and physiology. Then, the patient would learn the control techniques, in much the same manner as learning fluency-shaping techniques in stuttering. They would learn to apply the technique for short periods of time in unchallenging situations before moving up a hierarchy. The specific techniques include controlled breathing through pursed lips, effortful swallows, gum-chewing, and sipping water.

The SP would also give the patient a generic vocal hygiene education to reduce laryngeal irritation; and finally, provide psychosocial counselling to build an internal locus of control for the patient, in order for them to view their cough as a behaviour rather than as an affliction. Resolution of the condition is best measured by patient report against a valid measure of cough-related quality of life (Boulet et al., 2014).

The treatment agent in SP intervention is believed to be the cessation of the ‘vicious cycle’, in which repeated irritation causes more coughing, which in turn creates more irritation (Vertigan & Gibson, 2012). By ‘controlling’ the cough and minimising irritation, patients should be able to neuroplastically restore their cough thresholds.

If the SP intervention fails, the SP should consult with the cough team (a respiratory specialist, otolaryngologist, or possibly a psychologist), who would order specialised investigations or prescribe pharmaceutical management. Gabapentin, an analgesic, and amitriptyline, an antidepressant, are used for resolving neuropathic pain, and recent investigations have strengthened the evidence base for their use in chronic cough (Bastian, Vaidya, & Delsupehe, 2006; Jeyakumar, Brickman, & Haben, 2006; Lee & Woo, 2005; Ryan, Birring, & Gibson, 2012). It is important to note, however, that they will only relieve central neuropathies, not peripheral, and thus may not resolve all cases of CRC. Also, recent evidence suggests that patients’ CRC may return after the cessation of Gabapentin (Gibson & Vertigan, 2015). Results soon to be published show the increased efficacy of a combined speech pathology and Gabapentin protocol compared with either intervention alone (Vertigan, forthcoming).

Evidence for SP intervention

A single-blinded, randomised controlled study of the speech pathology management program outlined above was undertaken, where the control group were given a placebo ‘healthy-lifestyle’ education program (Vertigan, Theodoros, Gibson, & Winkworth, 2006). Treatment dose was four one-on-one thirty-minute sessions over two months. Participants rated cough, respiratory, voice and upper airway symptoms, as well as overall limitation, on two five-point scales of severity and frequency. Participants in the intervention group improved on all measures at a significantly higher rate than those in the placebo group. Dropouts were similar across groups, effect sizes were not reported, and no follow-up was undertaken. Other teams have made similar findings with less robust study designs (Murry et al., 2010; Murry, Tabaee, & Aviv, 2004). There is a clear need for further research to investigate the treatment agent in SP intervention; that is, which of the four prongs is specifically efficacious in treatment.

[1] It should be noted that these figures vary greatly between studies, with a recent paper suggesting CRC occurs in between 0% and 50% of specialist presentations – a range that is not clinically useful (Gibson & Vertigan, 2015).


Evidence Evidence Evidence

Last week I had the immense privilege of attending a two-day workshop given by Dr Katherine Verdolini, covering her rather wordily named Lessac-Madsen Resonant Voice Therapy and Casper-Stone Confidential Flow Therapy (or LMRVT and CSCFT!). The programs are (I believe) proprietary, so I won’t talk about them too much.

However, Dr Verdolini did not limit the talk to the mechanics of voice therapy. She also talked at length about motor learning theory, patient compliance, and evidence-based practice. Her view was that while EBP is generally a good thing, the movement at present suffers from a number of flaws, which I’ll briefly summarise.

  • Systematic reviews of quality RCTs are considered the highest level of evidence, with clinical experience, available resources and patient preferences being consigned to the bottom of the ‘evidence pyramid’.
  • But RCTs will only tell you what will generally work for the ‘average’ patient, and these studies often exclude patients with complications – which are often the patients we see.
  • We develop a false sense of security with EBP that the research was external to the clinical reality it was measuring, when of course it took place within it.

I recall being in a clinic where I was working one-on-one with a ten-year-old boy with severe ASD. He was largely non-verbal, and we tried to convey the structure of his day to him through the use of a visual schedule. My clinic partner and I suggested supplementing the visual schedule with a ‘now/next’ board:

[Image: first/then board]

To keep our client oriented to the task, we would point to the board and say “Now we are doing X, then we will do Y” etc. I remember being surprised when my clinical educator asked if there was evidence regarding the use of such boards, in particular if there was evidence that colouring each space differently would be more effective.

After all, it seemed obvious that it would, unless the client was colour-blind. What was absent from my CE’s comment was an exercise of clinical judgement – not every facet of our practice needs to be interrogated for its efficacy.

I believe what tends to happen is that EBP is used as a mallet to hit people on the head with. Evidence! Evidence! Evidence! Practicing clinicians are constantly told to integrate evidence into their practice, and for them, evidence is not clinical experience or consideration of patient preferences. Instead, they can have a narrow conception of EBP as being “that . . . double blinded, controlled thing.” (1). This conception probably comes from university courses which stress the evidence hierarchy, and actively ask students to challenge themselves if their thinking is ‘evidence-based’, which seems to be code for ‘I have read the systematic review’.

To put it simply, journal articles and other formal evidence are one piece of the puzzle for clinicians. If it were as simple as implementing a checklist from a guideline, SPs wouldn’t be counted as ‘professionals’ – we would be more like factory-workers. We add value through our ability to consider all issues: our resources, the patient’s/family’s preferences, the published evidence, and our experience in order to implement care.

In other news

One area which could probably do with more EBP is hydration in dysphagia management. I have talked about this before:

In the latest issue of IJSLP, I note a paper by a team from Curtin Uni (2), reporting that of all the factors considered by clinicians in bedside dysphagia assessment (such as oro-motor ability, oral hygiene, alertness, etc.), hydration status was considered the least, with fewer than 40% of SPs surveyed reporting they usually or always consider hydration in their assessment.

This is worrying, considering the implications of dehydration, and the possible lack of coordination between dietetics and speech pathology regarding this issue.

However, more positively, I note that a student at JCU is investigating the impact of diet modifications on quality-of-life in dysphagic patients. I eagerly anticipate the results.


  1. Foster, A., Worrall, L., Rose, M., & O’Halloran, R. (2015). ‘That doesn’t translate’: the role of evidence-based practice in disempowering speech pathologists in acute aphasia management. Int J Lang Commun Disord. doi: 10.1111/1460-6984.12155
  2. Vogels, B., Cartwright, J., & Cocks, N. (2015). The bedside assessment practices of speech-language pathologists in adult dysphagia. Int J Speech Lang Pathol, 17(4), 390-400. doi: 10.3109/17549507.2014.979877

Cosmetic Speech Pathology

There’s been a recent flare-up in the debate over the policing of non-conforming voices. The two cases are vocal fry in women, and ‘gay speech’ – a sociolect common to a subset of Western males (who are not always homosexual as the research informs us).

The articles in question:

Reactions to the last article were intense:

Both of these responses saved special criticism for the Speech Pathologist in question (who I don’t think it’s important to identify), specifically what they view as the pathologising of normal variation in speech. The SP’s opinions about vocal fry:

They just have developed a speech pattern that’s a habit, and they don’t know how to break out of it. When we present ourselves, the way we speak is our verbal image. Much as the way people in the professional world typically don’t go to work in sweats and a t-shirt, they are more concerned about how they present themselves, a lot of the clients that come to see me are concerned about how they’re presenting themselves verbally.

and on gay speech:

I don’t try to dissuade them because when people come to see me they’ve typically reached the point where it’s really bothering them.

In some ways the Speech Pathologist is working through patient-centered goals: these patients want the speech therapy to achieve a change, so why not give it to them?

I think this model places the clinician outside of the society in which their clients operate. If a client spoke a non-standard dialect of English (say, African-American Vernacular English), said they weren’t happy with their accent and dialect, and wanted to approach the standard variety, what should the Speech Pathologist do? Should we view ourselves as a therapy machine that exists solely to enact individual patient wishes, or should we advocate for a society that embraces diversity, one we would want to be part of? There is no meaningful functional limitation that comes from using vocal fry, uptalk, gay speech, or AAVE that isn’t propagated by people. Speech Pathologists are people, who are part of society, and pretending that we can leave our prejudices at the door of the clinic room is wishful thinking.

Instead of normalising this difference, so that everyone speaks in the same way to not risk upsetting those who cannot tolerate difference, couldn’t we instead advocate for the acceptance of other ways of talking? Speech Pathology’s record here is not fantastic, as a famous David Sedaris essay reminds us:

“One of these days I’m going to have to hang a sign on that door,” Agent Samson [the Speech Pathologist] used to say. She was probably thinking along the lines of SPEECH THERAPY LAB, though a more appropriate marker would have read FUTURE HOMOSEXUALS OF AMERICA. We knocked ourselves out trying to fit in but were ultimately betrayed by our tongues. At the beginning of the school year, while we were congratulating ourselves on successfully passing for normal, Agent Samson was taking names as our assembled teachers raised their hands, saying, “I’ve got one in my homeroom,” and “There are two in my fourth-period math class.” Were they also able to spot the future drunks and depressives? Did they hope that by eliminating our lisps, they might set us on a different path, or were they trying to prepare us for future stage and choral careers?

I’m sure that commentary like the above, which suggests that the profession is not accepting of language and speech difference, is not a ‘good look’ for the profession, especially given that SPs are in general completely unlike the wider population in makeup (on gender, age, and cultural measures).

Embarrassing Voices

Since we are close to World Voice Day (April 16), I thought I’d share some remarks on an old argument: Is beauty or expressivity more important in a voice? Possibly the best case for expressivity comes from Cathy Berberian:

“I feel that today’s singers should avoid the kind of concentration on just pure sound, like the old school is Renata Tebaldi and today we have Montserrat Caballé, these beautiful voices but they sing like cows, they have the mentality of cows . . . they just want the sound to come out. They don’t think of . . . the meaning behind it . . . They’re worried about the next note, the high note that is coming . . . I think after their voices are gone, they are just old cows.” (Music is the Air I Breathe)

I couldn’t help but think of this upon reading quotes from an interview between star soprano Anna Netrebko and the New York Times:

Ms. Netrebko was reluctant to speculate about her character’s inner life. Asked how strong or vulnerable she planned to play her, she replied: “She’s blind. That’s it.” …In other instances, too, Ms. Netrebko favors a literal approach. Asked what the white and red roses mentioned several times in the opera might represent, she retorted, “I don’t know: roses!”

If you are looking for beautiful, well-executed singing, Netrebko is a good place to start. But what sometimes makes the medium of opera so exciting is when singers deliberately adopt ‘bad’ techniques for expressive reasons. Let’s think about this in terms of my favourite opera ever: Richard Strauss’s Salome.

[Image: Salome, Karajan]

This is an excessive performance of an excessively overripe opera. Strauss set a German translation of Oscar Wilde’s fruity and exotic play of the same name in a directly representative manner. Everything mentioned in the metaphor-laden text is represented in the music, from birds flying around Herod’s head to the reeling, drunken madwoman-like moon (!) to the jewels and rubies, and even the (off-stage) decapitation of Jochanaan (a solo, squeaky double-bass!). The vocal parts are extremely demanding in terms of the sheer pitch and volume range required (given the unusually large orchestra). Salome herself has many conflicting moods, often in rapid succession (and rightly so, given the opera really is about the journey from love to death). I thought it might be interesting to look at four sopranos interpreting some key moments in the opera and see how they (ab)use their instrument.


Salome is in a single act, but divides nicely into distinct scenes separated by progressively longer musical interludes (the last of which is the famous ‘Dance of the Seven Veils’). The first early climax comes after Salome attempts to seduce Jochanaan (John the Baptist) by successively praising, and then cursing, his body, hair and lips before he retreats to his prison. Salome heralds her seduction with an exclamation as the horns hammer out her characteristic motive:

[Image: Jochanaan]

Here are the four sopranos. In order, they are Nina Stemme, Hildegard Behrens, Maria Ewing and Teresa Stratas.

All seem to take the beauty approach, but there are subtle differences. Stratas seems to restart on the second syllable Jo-CHA-naan; Ewing slides first and then seems to glottalise the final syllable; Behrens appears throatier, if more rapturously pretty than the others; and Stemme takes a no-nonsense approach, gliding rather quickly through the notes while soaring above a very loud orchestra. (Note the Ewing and Stemme recordings are live, whereas the others are studio recordings).

Gib mir den Kopf des Jochanaan!

After dancing for Herod, Salome requests the head of Jochanaan as reward. Herod at first refuses, instead offering her everything from his peacocks (producing the memorable line “You are ridiculous! You and your peacocks!”) to the mantle of the temple. Salome is unmoved:

[Image: Gib mir]

Once again the four sopranos in the same order:

Stemme is deliberate, but not ugly. She takes her time on the final arpeggio, as well as clearly aspirating the “k” of “Kopf”. Her Salome is not wild or ferocious, but calculating and intense. Behrens is at first intense and accurate, but loses control at the end of the arpeggio, hooting out the final syllables. The effect is unashamedly ugly – her Salome is fed up with Herod’s excuses and wants what she has been promised. Ewing is perhaps the most successful in conveying ferocity. Her high Bb is shaky, but deliberately so, and the final snarl is truly frightening. Stratas’s snarl is less convincing than Ewing’s. All sopranos seem to have some difficulty pitching the Gb, which really doesn’t agree with the accompaniment.

As can be seen, while Salome is a role demanding singing of the utmost beauty, on occasions the active denial of bel canto technique is necessary. How can singers perform in such a way and preserve their instruments? As Cathy Berberian also said, she sang lots of modern music and made weird and wonderful sounds doing so and her instrument held up until her death; whereas Maria Callas who sang only the classics destroyed hers.

Pan Frying Part Three – Four Recent Papers

The following papers have been referenced a lot in media stories about fry. However, as I show, none of them conclusively prove that fry is new, bad, good, or pathological. The gender difference in fry could be a result of sexual dimorphism (see discussion on Language Log). Given the probably vast speech corpora available, surely it wouldn’t be difficult to improve the state of the literature?

Perceptions of Fry [1]

This study, reported in a linguistics journal, compared perceptual/acoustic findings from 11 male and 12 female speakers of Californian English (students at UC Berkeley). It found American females using creaky voice twice as often as Japanese females or American males.

For the second part of the study, one voice recording from the first part was selected and presented to 175 college students at UC Berkeley and the University of Iowa, who were asked “what kind of impressions” they had of the woman who produced the voice. About four-fifths of listeners reported recognising the feature (interestingly, 90% in Iowa and 60% in California – the disparity is not discussed). The overwhelming impressions were “professional”, “upwardly mobile” and “urban”. No evidence is presented that vocal fry was the phenomenon the listeners associated with these impressions.

Conclusions: Female college students fry more than male college students. One speaker who uses vocal fry is thought of as sophisticatedly urban. It’s a stretch to say that fry is intrinsically urban or professional.

Prevalence in Young Adult Males & Females [2,3]

In this study, the authors worked from the position that fry is both a pathological sign and present in normal speakers – which renders its clinical utility as part of a perceptual profile a bit suspect, no? The goal of the study was to “quantify the prevalence of vocal fry in a population of young, female, SAE [Standard American English], college students” (p.e112 – my emphasis). The protocol involved sentence reading and vowel production.

That’s five modifiers, but we should add two more: firstly, that the students were all at Long Island University; and secondly, that they consented to appear in this study (volunteer bias). This doesn’t affect the validity of a narrow reading of the results, but often a broad reading is reported. Wolk (the lead author) was quoted as saying “Although it’s not exclusively used by young women, they seem to use verbal fry more frequently than young men or older individuals.” – which I suppose is more sexy than saying “Although it’s not exclusively used by young, female, SAE-speaking, Long Island-residing, college students who consented to be in the study…etc.”

The team found a prevalence of about two-thirds (n=34). In the Discussion they note that “knowledge of the extent of vocal fry usage in college students may have very important long-term consequences for vocal health”, citing Colton’s textbook Understanding Voice Disorders as a reference. While Colton is a fine author and clinician, no evidence is provided in this text for this assertion.

In a follow-up study, the team repeated the protocol with male, SAE-speaking, Long Island University-attending, 18-25 yr old students (n=34), but did not recruit further female students, instead choosing to use the old data. No proportion was reported (“vocal fry was rarely used”).

Conclusions: This doesn’t tell us a lot, other than confirming that female college students fry more than male college students. The judges seemed to have difficulty agreeing on fry (which is a fairly noteworthy feature, as my previous post shows). Describing Kappas of 0.48 and 0.49 as “high agreement” seems a stretch (the standard reference calls for at least .7 for a “reliable” instrument [4]).
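
For readers who haven't met kappa before, here is a rough sketch of why a value near 0.48 is underwhelming. It assumes the statistic is Cohen's kappa for two judges (the papers may have used a different variant), and the judgements below are entirely made up for illustration; they are not the study's data. The function name `cohens_kappa` is my own.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical judgements of 34 speakers (NOT the study's data).
judge_1 = ["fry"] * 20 + ["modal"] * 14
judge_2 = ["fry"] * 15 + ["modal"] * 5 + ["fry"] * 4 + ["modal"] * 10

kappa = cohens_kappa(judge_1, judge_2)
print(round(kappa, 2))  # 0.46
```

In this invented example the judges agree on 25 of 34 speakers (about 74%), which sounds respectable, yet once chance agreement is removed the kappa is only about 0.46, in the same region as the values reported.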

You won’t get a job with Fry [5]

This is the PLoS One article which received quite a bit of attention (see Part One of this series). It didn’t begin well, quoting many anecdotal sources as one might quote evidence in an Introduction. 14 speakers (7 male, 7 female) produced the phrase “thank you for considering me for this opportunity” in their “normal tone” and in vocal fry (“mimicking”). These recordings were then presented in random pairs to 800 internet-based listeners who answered questions like “who is more competent?”. The researchers found that the listeners, both male and female, preferred the “normal tones” to fry at a rate greater than chance. The researchers conclude that vocal fry is perceived negatively, and may result in “negative labor market perceptions”. They also note its prevalence is increasing [citation needed].

Christian DiCanio has pointed out many flaws in this study on Language Log:

  • The fry samples were not real fry but imitation
  • The samples did not differ in just fry but also in
    • duration of the sentence
    • duration of individual words
    • pitch
    • perceived vocal effort
  • The “normal tone” examples had some fry as well!!! (you can listen to all the stimuli on the PLOS website)

To these I’ll add:

  • Nobody would base the decision to hire solely on your voice (except perhaps this person).
  • The judges did not work in recruitment.

Conclusions: This paper’s methodological flaws seem fatal to its conclusion. Perhaps we could say people imitating a vocal style they do not use do not sound trustworthy or convincing?


Pan Frying Part Two – Fry Physiology, Acoustics and Linguistics

First some pictures. Here’s a high speed digital video of the vocal folds flapping away (vocal fry not pictured):

You can see the vocal folds coming together and apart. This pulsing causes the disturbances in the air that are perceived as fry under the right conditions, described below.

Fry is pretty noticeable on spectrograms as well. I went to everyone’s favourite fry source: This American Life (episode 554), and took a sample to Praat.


Praat’s poor glottal pulse tracker (the vertical blue lines on the audio waveform) completely fails to work on the vowel nucleus of ‘friend’, but has better luck with ‘party’ and ‘my’. You can see that the glottal pulses in ‘friend’ are irregularly spaced.

Let’s review some descriptions of fry. Everyone’s favourite source says fry:

is produced through a loose glottal closure which will permit air to bubble through slowly with a popping or rattling sound of a very low frequency. During this phonation, the arytenoid cartilages in the larynx are drawn together which causes the vocal folds to compress rather tightly and become relatively slack and compact. This process forms a large and irregularly vibrating mass within the vocal folds that produces the characteristic low popping or rattling sound when air passes through the glottal closure.

I suppose that will do for now. I did a quick database search for “vocal fry” and got 80 results, from which I whittled 18 relevant documents (this isn’t a systematic review, so don’t judge my search strategy…or my referencing!). I didn’t include papers on synthesis or computer detection, because they really fall outside my comfort zone. The earliest papers I could find on fry are in the Journal of Speech and Hearing Research and date from the late 1960s [1-9] but abruptly stop in 1971. I can’t access these because they are in Bundoora in a storeroom and not here on the internet.

The next papers are attempts to relate voice quality to vibratory patterns observed with high-speed digital imaging [10-11]. However, they used one subject who had a “bamboo node nodule” – not your typical vocal-fry user. They found that this subject’s vocal fry was characterised by double or triple open/close phases of the vocal folds, followed by a longer closed phase.

In 2001, Gerratt and Kreiman of the improbably named Bureau of Glottal Affairs published “Toward a taxonomy of nonmodal phonation”. They noted three previous descriptions of vocal fry:

  1. Low-frequency aperiodicity (Dejonckere & Lebacq, 1983 [12])
  2. The alternation of large and small glottal pulses (Herzel, 1993 [13])
  3. High-pitched phonation with intermittent subharmonics (Mazo et al., 1995 [14])

They note an intriguing paper entitled ‘Creak as a sociophonetic marker’ by Henton and Bladon (1988) [15], which I will track down, as it seems to break the recency illusion that characterises vocal fry as a very recent phenomenon, promoted by Britney and Khloe.

Ingo Titze’s Principles of Voice Production (1993) describes fry acoustically. For him, it occurs when the fundamental frequency F0 is less than the ‘crossover frequency’ (about 70Hz), the frequency at which our ears perceive the glottic pulses individually, as opposed to hearing them as a tone: “bursts and gaps” (p.254). However, he notes a number of confounding factors:

  • “there could be multiple excitations within the period” (i.e. the glottis could partially close and reopen a few times before closing for the period, as described by [10-11] above).
  • The formant banding could increase “the open portion of the glottal cycle”, depending on the vowel. So you would be less likely to perceive fry on a [u] vowel than on an [α] vowel.

In the This American Life example above, I manually calculated the average cycle length to be about 30ms, meaning the frequency would be about 33Hz (1 ÷ 0.030s) – definitely in the perceptual vocal fry register, especially with a front vowel like /ε/.
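The arithmetic is just frequency = 1 / period. Here is a minimal sketch of that conversion, checked against the roughly 70Hz crossover frequency quoted from Titze above. The function names and the constant are my own labels for illustration, and the 30ms figure is the hand-measured estimate from this post, not a precise value.

```python
CROSSOVER_HZ = 70.0  # approximate crossover frequency quoted from Titze (1993)


def period_ms_to_hz(period_ms):
    """Convert a glottal cycle length in milliseconds to a frequency in Hz."""
    if period_ms <= 0:
        raise ValueError("period must be positive")
    return 1000.0 / period_ms


def below_crossover(period_ms, crossover_hz=CROSSOVER_HZ):
    """True if pulses this far apart fall below the crossover frequency."""
    return period_ms_to_hz(period_ms) < crossover_hz


f0 = period_ms_to_hz(30.0)
print(f"{f0:.1f} Hz, below crossover: {below_crossover(30.0)}")
# prints "33.3 Hz, below crossover: True"
```

By the same sum, a cycle would need to be shorter than about 14ms (1000 ÷ 70) before the pulses stopped being heard individually.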

To conclude, fry is an acoustic and physiological phenomenon. The larynx adopts a posture that promotes irregular phonation, characterised by a few partial pulses then a period of closure. These pulses are far enough apart to be perceived individually, and not as a tone – the fry.


Fry is well-known to linguists as ‘creaky voice’, and is present in many languages. Danish stød is well-known in my household as the explanation for why the characters in Borgen always seem to be mumbling on the threshold of audibility. I remember being exposed to many Mon-Khmer languages in my undergraduate phonetics classes where ‘creak’ was a lexical feature. There is no evidence that I’m aware of that indicates that Danish speakers are more prone to laryngeal disorders because of their creaky voices.

In English, fry seems to occur mostly at the end of utterances, accompanying the downwards inflection that characteristically informs your conversation partner that “I’m done”.

In the final post, I will examine the recent sociologically-oriented research into fry, and address the four issues raised in the first post.


Pan Frying Part One – A Media Frenzy

‘Vocal fry’ was trending again. This time I picked it up through SPA’s Twitter feed:

Perhaps you’ve also read about it here, or here, or here, or here, or here?

I hear it every day. At a rough estimate, perhaps 20% of my colleagues at uni use it habitually, and many more at the end of sentences. I even hear myself do it. Here’s my housemate (mid 20s male) doing it:

Across the coverage, several things become apparent:

Here’s one of probably thousands of comments on the articles mentioned above that illustrates the level of disgust people have for this ‘vocal tic’:

[Image: fry excerpt]

Wow. So why are people so into fry?

Simply put, it’s just another way to ignore the structural barriers women face to occupational success. People seem unwilling to admit that the barriers are cultural, historical and institutional, and instead seek to blame the women themselves. You didn’t get the job because you used uptalk. Because you used vocal fry. Because you made grammatical errors.

Luckily, there have been a few people to question this dominant narrative. Mark Liberman at Language Log has deconstructed many of the faulty assumptions that underlie the above assertions (here’s a list). A voice training business wrote a stirring defense of women’s voices, and here is Amanda Hess’s response to Bob Garfield as referenced above.

In an upcoming post I will discuss the physiology and acoustics of fry, and review the published research.

(Cover image by user Managementboy (Own work) [CC BY-SA 3.0], via Wikimedia Commons)