The following papers have been referenced a lot in media stories about fry. However, as I show, none of them conclusively prove that fry is new, bad, good, or pathological. The gender difference in fry could be a result of sexual dimorphism (see discussion on Language Log). Given the probably vast speech corpora available, surely it wouldn’t be difficult to improve the state of the literature?
Perceptions of Fry 
This study, reported in a linguistics journal, compared perceptual/acoustic findings from 11 male and 12 female speakers of Californian English (students at UC Berkeley). It found American females using creaky voice twice as often as Japanese females or American males.
For the second part of the study, one voice recording from the first part was selected and presented to 175 college students at UC Berkeley and the University of Iowa, who were asked “what kind of impressions” they had of the woman who produced the voice. About four-fifths of listeners reported recognising the feature (interestingly, 90% in Iowa and 60% in California – the disparity is not discussed). The overwhelming impressions were “professional”, “upwardly mobile” and “urban”. No evidence is presented that vocal fry was the phenomenon the listeners associated with these impressions.
Conclusions: Female college students fry more than male college students. One speaker who uses vocal fry is thought of as sophisticatedly urban. It’s a stretch to say that fry is intrinsically urban or professional.
Prevalence in Young Adult Males & Females [2,3]
In this study, the authors worked from the position that fry is both a pathological sign and present in normal speakers – which renders its clinical utility as part of a perceptual profile a bit suspect, no? The goal of the study was to “quantify the prevalence of vocal fry in a population of young, female, SAE [Standard American English], college students” (p.e112 – my emphasis). The protocol involved sentence reading and vowel production.
That’s five modifiers, but we should add two more: firstly, that the students were all at Long Island University; and secondly, that they consented to appear in this study (volunteer bias). This doesn’t affect the validity of a narrow reading of the results, but often a broad reading is reported. Wolk (the lead author) was quoted as saying “Although it’s not exclusively used by young women, they seem to use verbal fry more frequently than young men or older individuals.” – which I suppose is more sexy than saying “Although it’s not exclusively used by young, female, SAE-speaking, Long Island-residing, college students who consented to be in the study…etc.”
The team found a prevalence of about two-thirds (n=34). In the Discussion they note that “knowledge of the extent of vocal fry usage in college students may have very important long-term consequences for vocal health”, citing Colton’s textbook Understanding Voice Disorders as a reference. While Colton is a fine author and clinician, no evidence is provided in this text for this assertion.
In a follow-up study, the team repeated the protocol with male, SAE-speaking, Long Island University-attending, 18-25 yr old students (n=34), but did not recruit further female students, instead choosing to use the old data. No proportion was reported (“vocal fry was rarely used”).
Conclusions: This doesn’t tell us a lot, other than confirming that female college students fry more than male college students. The judges seemed to have difficulty agreeing on fry (which is a fairly noteworthy feature, as my previous post shows). Describing Kappas of 0.48 and 0.49 as “high agreement” seems a stretch (the standard reference calls for at least .7 for a “reliable” instrument [4]).
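To get a feel for what a Kappa of about 0.5 means, here is a minimal sketch. It assumes the statistic is Cohen's kappa for two judges making a binary fry/no-fry call on each sample, which the papers as summarised here do not confirm. The ratings are invented for illustration and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented ratings for eight speech samples: 1 = "fry", 0 = "no fry".
judge_a = [1, 1, 1, 1, 0, 0, 0, 0]
judge_b = [1, 1, 1, 0, 1, 0, 0, 0]

raw_agreement = sum(a == b for a, b in zip(judge_a, judge_b)) / len(judge_a)
kappa = cohens_kappa(judge_a, judge_b)
print(f"raw agreement = {raw_agreement:.2f}, kappa = {kappa:.2f}")
# prints: raw agreement = 0.75, kappa = 0.50
```

In this made-up case the judges disagree on one sample in four, and because half of their agreement would be expected by chance alone, the Kappa comes out at 0.5.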
You won’t get a job with Fry 
This is the PLoS One article that received quite a bit of attention (see Part One of this series). It didn’t begin well, quoting many anecdotal sources as one might quote evidence in an Introduction. Fourteen speakers (7 male, 7 female) produced the phrase “thank you for considering me for this opportunity” in their “normal tone” and in vocal fry (“mimicking”). These recordings were then presented in random pairs to 800 internet-based listeners who answered questions like “who is more competent?”. The researchers found that the listeners, both male and female, preferred the “normal tones” to fry at a rate greater than chance. The researchers conclude that vocal fry is perceived negatively, and may result in “negative labor market perceptions”. They also note its prevalence is increasing.
- The fry samples were not real fry but imitation
- The samples did not differ in just fry but also in:
  - duration of the sentence
  - duration of individual words
  - perceived vocal effort
- The “normal tone” examples had some fry as well!!! (you can listen to all the stimuli on the PLOS website)
To these I’ll add:
- Nobody would base the decision to hire solely on your voice (except perhaps this person).
- The judges did not work in recruitment.
Conclusions: This paper’s methodological flaws seem fatal to its conclusion. Perhaps we could say people imitating a vocal style they do not use do not sound trustworthy or convincing?
- Yuasa, I. P. (2010). Creaky Voice: A new feminine voice quality for young urban-oriented upwardly mobile American women? American Speech, 85(3), 315-337.
- Wolk, L., Abdelli-Beruh, N. B., & Slavin, D. (2012). Habitual use of vocal fry in young adult female speakers. Journal of Voice, 26(3), e111-e116.
- Abdelli-Beruh, N. B., Wolk, L., & Slavin, D. (2014). Prevalence of vocal fry in young adult male American speakers. Journal of Voice, 28(2), 185-190.
- Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159-174.
- Anderson, R. C., Klofstad, C. A., Mayew, W. J., & Venkatachalam, M. (2014). Vocal fry may undermine the success of young women in the labor market. PLoS ONE, 9(5).
First some pictures. Here’s a high speed digital video of the vocal folds flapping away (vocal fry not pictured):
You can see the vocal folds coming together and apart. This pulsing causes the disturbances in the air that are perceived as fry under the right conditions, described below.
Fry is pretty noticeable on spectrograms as well. I went to everyone’s favourite fry source: This American Life (episode 554), and took a sample to Praat.
Praat’s poor glottal pulse tracker (the vertical blue lines on the audio waveform) completely fails to work on the vowel nucleus of ‘friend’, but has better luck with ‘party’ and ‘my’. You can see that the glottal pulses in ‘friend’ are irregularly spaced.
Let’s review some descriptions of fry. Everyone’s favourite source says fry:
is produced through a loose glottal closure which will permit air to bubble through slowly with a popping or rattling sound of a very low frequency. During this phonation, the arytenoid cartilages in the larynx are drawn together which causes the vocal folds to compress rather tightly and become relatively slack and compact. This process forms a large and irregularly vibrating mass within the vocal folds that produces the characteristic low popping or rattling sound when air passes through the glottal closure.
I suppose that will do for now. I did a quick database search for “vocal fry” and got 80 results, from which I whittled 18 relevant documents (this isn’t a systematic review, so don’t judge my search strategy…or my referencing!). I didn’t include papers on synthesis or computer detection, because they really fall outside my comfort zone. The earliest papers I could find on fry are in the Journal of Speech and Hearing Research and date from the late 1960s [1-9] but abruptly stop in 1971. I can’t access these because they are in Bundoora in a storeroom and not here on the internet.
The next papers are attempts to relate voice quality to vibratory patterns observed with high-speed digital imaging [10-11]. However, they used one subject who had a “bamboo node nodule” – not your typical vocal-fry user. They found that this subject’s vocal fry was characterised by double or triple open/close phases of the vocal folds, followed by a longer closed phase.
In 2001, Gerratt and Kreiman of the improbably named Bureau of Glottal Affairs published “Toward a taxonomy of nonmodal phonation”. They noted three previous descriptions of vocal fry:
- Low-frequency aperiodicity (Dejonckere & Lebacq, 1983)
- The alternation of large and small glottal pulses (Herzel, 1993)
- High-pitched phonation with intermittent subharmonics (Mazo et al., 1995)
They note an intriguing paper entitled ‘Creak as a sociophonetic marker’ by Henton and Bladon (1988), which I will track down, as it seems to break the recency illusion which characterises vocal fry as a very recent phenomenon, promoted by Britney and Khloe.
Ingo Titze’s Principles of Voice Production (1993) describes fry acoustically. For him, it occurs when the fundamental frequency F0 is less than the ‘crossover frequency’ (about 70Hz), the frequency at which our ears perceive the glottic pulses individually, as opposed to hearing them as a tone: “bursts and gaps” (p.254). However, he notes a number of confounding factors:
- “there could be multiple excitations within the period” (i.e. the glottis could partially close and reopen a few times before closing for the period, as described by [10-11] above).
- The formant banding could increase “the open portion of the glottal cycle”, depending on the vowel. So you would be less likely to perceive fry on a [u] vowel than on an [α] vowel.
In the This American Life example above, I manually calculated the average cycle length to be about 30ms, meaning the frequency would be about 33Hz – definitely in the perceptual vocal fry register, especially with a front vowel like /ε/.
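As a quick check of that arithmetic, here is a small sketch that turns glottal pulse times into an average cycle length and a frequency, then compares the result with the roughly 70Hz crossover frequency quoted from Titze. The pulse times are invented to give a 30ms average; they are not measurements from the recording, and the function names are my own.

```python
CROSSOVER_HZ = 70.0  # approximate crossover frequency quoted from Titze above


def mean_period_ms(pulse_times_ms):
    """Average interval between successive glottal pulses, in milliseconds."""
    if len(pulse_times_ms) < 2:
        raise ValueError("need at least two pulses to measure a period")
    intervals = [later - earlier
                 for earlier, later in zip(pulse_times_ms, pulse_times_ms[1:])]
    return sum(intervals) / len(intervals)


def f0_from_period_ms(period_ms):
    """Fundamental frequency in Hz for a cycle length given in milliseconds."""
    if period_ms <= 0:
        raise ValueError("period must be positive")
    return 1000.0 / period_ms


def below_crossover(f0_hz, crossover_hz=CROSSOVER_HZ):
    """True when pulses are slow enough to be heard individually rather than as a tone."""
    return f0_hz < crossover_hz


# Hypothetical, irregularly spaced pulse times (ms) averaging 30 ms apart.
pulse_times_ms = [0.0, 28.0, 61.0, 90.0, 120.0]

period = mean_period_ms(pulse_times_ms)
f0 = f0_from_period_ms(period)
print(f"mean period = {period:.1f} ms, F0 = {f0:.1f} Hz, below crossover: {below_crossover(f0)}")
# prints: mean period = 30.0 ms, F0 = 33.3 Hz, below crossover: True
```

A 30ms cycle sits at less than half the crossover frequency, so the conclusion does not depend on the exact crossover value.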
To conclude, fry is an acoustic and physiological phenomenon. The larynx adopts a posture that promotes irregular phonation, characterised by a few partial pulses then a period of closure. These pulses are far enough apart to be perceived individually, and not as a tone – the fry.
Fry is well-known to linguists as ‘creaky voice’, and is present in many languages. Danish stød is well-known in my household as the explanation for why the characters in Borgen always seem to be mumbling on the threshold of audibility. I remember being exposed to many Mon-Khmer languages in my undergraduate phonetics classes where ‘creak’ was a lexical feature. There is no evidence that I’m aware of that indicates that Danish speakers are more prone to laryngeal disorders because of their creaky voices.
In English, fry seems to occur mostly at the end of utterances, accompanying the downwards inflection that characteristically informs your conversation partner that “I’m done”.
In the final post, I will examine the recent sociologically-oriented research into fry, and address the four issues raised in the first post.
- Hollien, H., Moore, P., Wendahl, R. W., & Michel, J. F. (1966). On the nature of vocal fry. Journal of Speech and Hearing Research, 9(2), 245-247.
- Michel, J. F., & Hollien, H. (1968). Perceptual differentiation of vocal fry and harshness. Journal of Speech and Hearing Research, 11(2), 439-443.
- McGlone, R. E. (1967). Air flow during vocal fry phonation. Journal of Speech and Hearing Research, 10(2), 299-304.
- Michel, J. F. (1968). Fundamental frequency investigation of vocal fry and harshness. Journal of Speech and Hearing Research, 11(3), 590-594.
- Hollien, H., & Michel, J. F. (1968). Vocal fry as a phonational register. Journal of Speech and Hearing Research, 11(3), 600-604.
- Hollien, H., & Wendahl, R. W. (1968). Perceptual study of vocal fry. Journal of the Acoustical Society of America, 43(3), 506-509.
- Hollien, H., Damsté, H., & Murry, T. (1969). Vocal fold length during vocal fry phonation. Folia Phoniatrica, 21(4), 257-265.
- McGlone, R. E., & Shipp, T. (1971). Some physiologic correlates of vocal-fry phonation. Journal of Speech and Hearing Research, 14(4), 769-775.
- Murry, T., & Brown, W. S., Jr. (1971). Regulation of vocal intensity during vocal fry phonation. Journal of the Acoustical Society of America, 49(6 pt 2), 1905-1907.
- Miyaji, M., Iwamoto, Y., Oda, M., & Niimi, S. (1999). Relation between voice quality and pathological vibratory patterns using high-speed digital imaging. Journal of Otolaryngology of Japan, 102(3), 354-367.
- Niimi, S., & Miyaji, M. (2000). Vocal fold vibration and voice quality. Folia Phoniatrica Et Logopaedica, 52(1-3), 32-38.
- Dejonckere, P. H., & Lebacq, J. (1983). An analysis of the diplophonia phenomenon. Speech Communication, 2, 47-56.
- Herzel, H. (1993). Bifurcations and chaos in voice signals. Applied Mechanics Reviews, 46, 399-413.
- Mazo, M., Erickson, D., & Harvey, T. (1995). Emotion and expression: Temporal data on voice quality in Russian lament. In O. Fujimura & M. Hirano (Eds.), Vocal fold physiology: Voice quality control. San Diego: Singular Press.
- Henton, C. G., & Bladon, A. (1988). Creak as a sociophonetic marker. In L. Hyman & C. Li (Eds.), Language, speech and mind: Studies in honour of Victoria A. Fromkin (pp. 3-29). London: Routledge.