First some pictures. Here’s a high speed digital video of the vocal folds flapping away (vocal fry not pictured):
You can see the vocal folds coming together and moving apart. This pulsing causes the disturbances in the air that, under the right conditions (described below), are perceived as fry.
Fry is pretty noticeable on spectrograms as well. I went to everyone’s favourite fry source: This American Life (episode 554), and took a sample to Praat.
Praat’s poor glottal pulse tracker (the vertical blue lines on the audio waveform) completely fails to work on the vowel nucleus of ‘friend’, but has better luck with ‘party’ and ‘my’. You can see that the glottal pulses in ‘friend’ are irregularly spaced.
Let’s review some descriptions of fry. Everyone’s favourite source says fry:
is produced through a loose glottal closure which will permit air to bubble through slowly with a popping or rattling sound of a very low frequency. During this phonation, the arytenoid cartilages in the larynx are drawn together which causes the vocal folds to compress rather tightly and become relatively slack and compact. This process forms a large and irregularly vibrating mass within the vocal folds that produces the characteristic low popping or rattling sound when air passes through the glottal closure.
I suppose that will do for now. I did a quick database search for “vocal fry” and got 80 results, which I whittled down to 18 relevant documents (this isn’t a systematic review, so don’t judge my search strategy…or my referencing!). I didn’t include papers on synthesis or computer detection, because they really fall outside my comfort zone. The earliest papers I could find on fry are mostly in the Journal of Speech and Hearing Research and date from the late 1960s [1-9], but they abruptly stop in 1971. I can’t access these because they are in Bundoora in a storeroom and not here on the internet.
The next papers are attempts to relate voice quality to vibratory patterns observed with high-speed digital imaging [10-11]. However, they used one subject who had a “bamboo node nodule” – not your typical vocal-fry user. They found that this subject’s vocal fry was characterised by double or triple open/close phases of the vocal folds, followed by a longer closed phase.
In 2001, Gerratt and Kreiman of the improbably named Bureau of Glottal Affairs published “Toward a taxonomy of nonmodal phonation”. They noted three previous descriptions of vocal fry:
- Low-frequency aperiodicity (Dejonckere & Lebacq, 1983)
- The alternation of large and small glottal pulses (Herzel, 1993)
- High-pitched phonation with intermittent subharmonics (Mazo et al., 1995)
They note an intriguing paper entitled ‘Creak as a sociophonetic marker’ by Henton and Bladon (1988), which I will track down, as it seems to break the recency illusion which characterises vocal fry as a very recent phenomenon, promoted by Britney and Khloe.
Ingo Titze’s Principles of Voice Production (1993) describes fry acoustically. For him, it occurs when the fundamental frequency F0 is less than the ‘crossover frequency’ (about 70Hz), the frequency below which our ears perceive the glottal pulses individually, as “bursts and gaps” (p.254), rather than hearing them as a tone. However, he notes a number of confounding factors:
- “there could be multiple excitations within the period” (i.e. the glottis could partially close and reopen a few times before closing for the period, as described by [10-11] above).
- The formant banding could increase “the open portion of the glottal cycle”, depending on the vowel. So you would be less likely to perceive fry on a [u] vowel than on an [α] vowel.
In the This American Life example above, I manually calculated the average cycle length to be about 30ms, meaning the frequency would be about 33Hz (1 ÷ 0.030s) – definitely in the perceptual vocal fry register, especially with a front vowel like /ε/.
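For anyone who wants to check the arithmetic, here is a minimal Python sketch of the conversion from cycle length to frequency, compared against the crossover figure quoted from Titze. The function names and the idea of treating 70Hz as a hard threshold are my own simplifications for illustration; in reality the boundary is approximate and depends on the confounding factors listed above.

```python
# Sketch: convert an average glottal cycle length into a frequency and
# compare it with the approximate crossover frequency quoted from Titze.
# The 70 Hz value is treated as a fixed cut-off purely for illustration.

CROSSOVER_HZ = 70.0  # approximate; an assumption that it acts as a sharp threshold


def f0_from_period_ms(period_ms: float) -> float:
    """Return the frequency in Hz for a cycle length given in milliseconds."""
    if period_ms <= 0:
        raise ValueError("cycle length must be positive")
    return 1000.0 / period_ms


def below_crossover(f0_hz: float, crossover_hz: float = CROSSOVER_HZ) -> bool:
    """True if pulses at this rate fall below the crossover frequency."""
    return f0_hz < crossover_hz


f0 = f0_from_period_ms(30.0)  # the ~30ms average cycle measured by hand
print(f"{f0:.1f} Hz, below crossover: {below_crossover(f0)}")
# prints: 33.3 Hz, below crossover: True
```

A 30ms cycle works out to roughly 33Hz, well under the 70Hz mark, whereas a 10ms cycle (100Hz) would land on the "tone" side of it.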
To conclude, fry is an acoustic and physiological phenomenon. The larynx adopts a posture that promotes irregular phonation, characterised by a few partial pulses then a period of closure. These pulses are far enough apart to be perceived individually, and not as a tone – the fry.
Fry is well-known to linguists as ‘creaky voice’, and is present in many languages. Danish stød is well-known in my household as the explanation for why the characters in Borgen always seem to be mumbling on the threshold of audibility. I remember being exposed to many Mon-Khmer languages in my undergraduate phonetics classes where ‘creak’ was a lexical feature. I’m not aware of any evidence that Danish speakers are more prone to laryngeal disorders because of their creaky voices.
In English, fry seems to occur mostly at the end of utterances, accompanying the downwards inflection that characteristically informs your conversation partner that “I’m done”.
In the final post, I will examine the recent sociologically-oriented research into fry, and address the four issues raised in the first post.
- Hollien, H., Moore, P., Wendahl, R. W., & Michel, J. F. (1966). On the nature of vocal fry. Journal of Speech and Hearing Research, 9(2), 245-247.
- Michel, J. F., & Hollien, H. (1968). Perceptual differentiation of vocal fry and harshness. Journal of Speech and Hearing Research, 11(2), 439-443.
- McGlone, R. E. (1967). Air flow during vocal fry phonation. Journal of Speech and Hearing Research, 10(2), 299-304.
- Michel, J. F. (1968). Fundamental frequency investigation of vocal fry and harshness. Journal of Speech and Hearing Research, 11(3), 590-594.
- Hollien, H., & Michel, J. F. (1968). Vocal fry as a phonational register. Journal of Speech and Hearing Research, 11(3), 600-604.
- Hollien, H., & Wendahl, R. W. (1968). Perceptual study of vocal fry. Journal of the Acoustical Society of America, 43(3), 506-509.
- Hollien, H., Damsté, H., & Murry, T. (1969). Vocal fold length during vocal fry phonation. Folia Phoniatrica, 21(4), 257-265.
- McGlone, R. E., & Shipp, T. (1971). Some physiologic correlates of vocal-fry phonation. Journal of Speech and Hearing Research, 14(4), 769-775.
- Murry, T., & Brown, W. S., Jr. (1971). Regulation of vocal intensity during vocal fry phonation. Journal of the Acoustical Society of America, 49(6 pt 2), 1905-1907.
- Miyaji, M., Iwamoto, Y., Oda, M., & Niimi, S. (1999). Relation between voice quality and pathological vibratory patterns using high-speed digital imaging. Journal of Otolaryngology of Japan, 102(3), 354-367.
- Niimi, S., & Miyaji, M. (2000). Vocal fold vibration and voice quality. Folia Phoniatrica Et Logopaedica, 52(1-3), 32-38.
- Dejonckere, P. H., & Lebacq, J. (1983). An analysis of the diplophonia phenomenon. Speech Communication, 2, 47-56.
- Herzel, H. (1993). Bifurcations and chaos in voice signals. Applied Mechanics Reviews, 46, 399-413.
- Mazo, M., Erickson, D., & Harvey, T. (1995). Emotion and expression: Temporal data on voice quality in Russian lament. In Fujimura, O., & Hirano, M. (Eds.), Vocal fold physiology: Voice quality control. Singular Press, San Diego.
- Henton, C. G., & Bladon, A. (1988). Creak as a sociophonetic marker. In Hyman, L., & Li, C. (Eds.), Language, speech and mind: Studies in honour of Victoria A. Fromkin (pp. 3-29). Routledge, London.