There’s been a recent flare-up in the debate over the policing of non-conforming voices. The two cases are vocal fry in women, and ‘gay speech’ – a sociolect common to a subset of Western males (who are not always homosexual as the research informs us).

The articles in question:

Reactions to the last article were intense:

Both of these responses saved special criticism for the Speech Pathologist in question (who I don’t think it’s important to identify), specifically what they view as the pathologising of normal variation in speech. The SP’s opinions about vocal fry:

They just have developed a speech pattern that’s a habit, and they don’t know how to break out of it. When we present ourselves, the way we speak is our verbal image. Much as the way people in the professional world typically don’t go to work in sweats and a t-shirt, they are more concerned about how they present themselves, a lot of the clients that come to see me are concerned about how they’re presenting themselves verbally.

and on gay speech:

I don’t try to dissuade them because when people come to see me they’ve typically reached the point where it’s really bothering them.

In some ways the Speech Pathologist is working through patient-centered goals: these patients want the speech therapy to achieve a change, so why not give it to them?

I think this model places the clinician outside of the society in which their clients operate. If a client spoke a non-standard dialect of English (say, African-American Vernacular English), said they weren’t happy with their accent and dialect, and wanted to approach the standard variety, what should the Speech Pathologist do? Should we view ourselves as a therapy machine that exists solely to enact individual patient wishes, or should we advocate for the kind of society we would want to be part of, one that embraces diversity? There is no meaningful functional limitation that comes from using vocal fry, uptalk, gay speech, or AAVE that isn’t propagated by people. Speech Pathologists are people, who are part of society, and pretending that we can leave our prejudices at the door of the clinic room is wishful thinking.

Instead of normalising this difference, so that everyone speaks in the same way to not risk upsetting those who cannot tolerate difference, couldn’t we instead advocate for the acceptance of other ways of talking? Speech Pathology’s record here is not fantastic, as a famous David Sedaris essay reminds us:

“One of these days I’m going to have to hang a sign on that door,” Agent Samson [the Speech Pathologist] used to say. She was probably thinking along the lines of SPEECH THERAPY LAB, though a more appropriate marker would have read FUTURE HOMOSEXUALS OF AMERICA. We knocked ourselves out trying to fit in but were ultimately betrayed by our tongues. At the beginning of the school year, while we were congratulating ourselves on successfully passing for normal, Agent Samson was taking names as our assembled teachers raised their hands, saying, “I’ve got one in my homeroom,” and “There are two in my fourth-period math class.” Were they also able to spot the future drunks and depressives? Did they hope that by eliminating our lisps, they might set us on a different path, or were they trying to prepare us for future stage and choral careers?

I’m sure that commentary like the above, which suggests that the profession is not accepting of language and speech difference, is not a ‘good look’ for the profession, especially given that SPs are in general quite unlike the general population in makeup (on gender, age, and cultural measures).


Pan Frying Part Two – Fry Physiology, Acoustics and Linguistics

First some pictures. Here’s a high speed digital video of the vocal folds flapping away (vocal fry not pictured):

You can see the vocal folds coming together and apart. This pulsing causes the disturbances in the air that are perceived as fry under the right conditions, described below.

Fry is pretty noticeable on spectrograms as well. I went to everyone’s favourite fry source: This American Life (episode 554), and took a sample to Praat.


Praat’s poor glottal pulse tracker (the vertical blue lines on the audio waveform) completely fails to work on the vowel nucleus of ‘friend’, but has better luck with ‘party’ and ‘my’. You can see that the glottal pulses in ‘friend’ are irregularly spaced.

Let’s review some descriptions of fry. Everyone’s favourite source says fry:

is produced through a loose glottal closure which will permit air to bubble through slowly with a popping or rattling sound of a very low frequency. During this phonation, the arytenoid cartilages in the larynx are drawn together which causes the vocal folds to compress rather tightly and become relatively slack and compact. This process forms a large and irregularly vibrating mass within the vocal folds that produces the characteristic low popping or rattling sound when air passes through the glottal closure.

I suppose that will do for now. I did a quick database search for “vocal fry” and got 80 results, from which I whittled 18 relevant documents (this isn’t a systematic review, so don’t judge my search strategy…or my referencing!). I didn’t include papers on synthesis or computer detection, because they really fall outside my comfort zone. The earliest papers I could find on fry are in the Journal of Speech and Hearing Research and date from the late 1960s [1-9] but abruptly stop in 1971. I can’t access these because they are in Bundoora in a storeroom and not here on the internet.

The next papers are attempts to relate voice quality to vibratory patterns observed with high-speed digital imaging [10-11]. However, they used one subject who had a “bamboo node nodule” – not your typical vocal-fry user. They found that this subject’s vocal fry was characterised by double or triple open/close phases of the vocal folds, followed by a longer closed phase.

In 2001, Gerratt and Kreiman of the improbably named Bureau of Glottal Affairs published “Toward a taxonomy of nonmodal phonation”. They noted three previous descriptions of vocal fry:

  1. Low-frequency aperiodicity (Dejonckere & Lebacq, 1983 [12])
  2. The alternation of large and small glottal pulses (Herzel, 1993 [13])
  3. High-pitched phonation with intermittent subharmonics (Mazo et al., 1995 [14])

They note an intriguing paper entitled ‘Creak as a sociophonetic marker’ by Henton and Bladon 1988 [15], which I will track down, as it seems to break the recency illusion which characterises vocal fry as a very recent phenomenon, promoted by Britney and Khloe.

Ingo Titze’s Principles of Voice Production (1993) describes fry acoustically. For him, it occurs when the fundamental frequency F0 is less than the ‘crossover frequency’ (about 70Hz), the frequency at which our ears perceive the glottic pulses individually, as opposed to hearing them as a tone: “bursts and gaps” (p.254). However, he notes a number of confounding factors:

  • “there could be multiple excitations within the period” (i.e. the glottis could partially close and reopen a few times before closing for the period, as described by [10-11] above).
  • The formant banding could increase “the open portion of the glottal cycle”, depending on the vowel. So you would be less likely to perceive fry on a [u] vowel than on an [ɑ] vowel.

In the This American Life example above, I manually calculated the average cycle length to be about 30ms, meaning the frequency would be about 33Hz (1/0.030s) – definitely in the perceptual vocal fry register, especially with a front vowel like /ɛ/.
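For anyone who wants to check the arithmetic, here is a minimal Python sketch of the conversion from cycle length to frequency, and of the comparison against Titze’s crossover frequency. The function names are my own invention for illustration, and the 70Hz threshold is just the approximate figure quoted above, not a hard perceptual boundary.

```python
# Approximate crossover frequency quoted from Titze above (an assumption,
# not a sharp perceptual boundary).
CROSSOVER_HZ = 70.0


def period_ms_to_hz(period_ms: float) -> float:
    """Convert an average glottal cycle length in milliseconds to Hz."""
    if period_ms <= 0:
        raise ValueError("cycle length must be positive")
    return 1000.0 / period_ms


def below_crossover(f0_hz: float, crossover_hz: float = CROSSOVER_HZ) -> bool:
    """True if the pulses are slow enough to be heard individually."""
    return f0_hz < crossover_hz


# The hand-measured average cycle length from the sample: about 30 ms.
f0 = period_ms_to_hz(30.0)
print(f"{f0:.1f} Hz, below crossover: {below_crossover(f0)}")
```

A 30ms cycle works out to roughly 33Hz, less than half the crossover figure, so the conclusion does not hinge on the exact measurement.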

To conclude, fry is an acoustic and physiological phenomenon. The larynx adopts a posture that promotes irregular phonation, characterised by a few partial pulses then a period of closure. These pulses are far enough apart to be perceived individually, and not as a tone – the fry.


Fry is well-known to linguists as ‘creaky voice’, and is present in many languages. Danish stød is well-known in my household as the explanation for why the characters in Borgen always seem to be mumbling on the threshold of audibility. I remember being exposed to many Mon-Khmer languages in my undergraduate phonetics classes where ‘creak’ was a lexical feature. There is no evidence that I’m aware of that indicates that Danish speakers are more prone to laryngeal disorders because of their creaky voices.

In English, fry seems to occur mostly at the end of utterances, accompanying the downwards inflection that characteristically informs your conversation partner that “I’m done”.

In the final post, I will examine the recent sociologically-oriented research into fry, and address the four issues raised in the first post.


