Polysemy and word frequency: A replication
Issue: Vol 4 No. 2 (2017)
Journal: Journal of Research Design and Statistics in Linguistics and Communication Science
Subject Areas: Linguistics
DOI: 10.1558/jrds.33751
Abstract:
One piece of evidence adduced by George Kingsley Zipf for his eponymous law (Zipf, 1935) and its explanation of the principle of least effort (Zipf, 1949) is the hypothesis that a word's polysemy is proportional to the square root of its frequency (Levelt, 2013). Pawley (2006), following Zipf, also proposes that 'there is a strong general correlation between frequency and the extent of polysemy'. This paper replicates Zipf's approach but with data drawn from different sources to those available to Zipf, namely, for word frequency, the Kilgarriff most frequent word list drawn from the BNC (Kilgarriff, 1995) and, as a measure of polysemy, the WordNet data for the polysemy of the words in Kilgarriff's list. It also takes note of the syntactic category of lexemes. More advanced statistical modelling is used. Zipf's observations are confirmed with some provisos. Their utility is examined. Explanations for this relationship remain to be established.
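The square-root hypothesis in the abstract can be read as a power law, polysemy ≈ c · frequency^0.5, which becomes a straight line with slope 0.5 on log-log axes. The following is a minimal sketch of that check, assuming a plain least-squares fit on logged values; it is not the statistical model used in the paper, and the function name `fit_power_law_exponent` and the toy numbers are made up for illustration (they are not BNC or WordNet data).

```python
import math


def fit_power_law_exponent(frequencies, sense_counts):
    """Least-squares fit of log(sense count) on log(frequency).

    Returns (slope, intercept). Under the square-root hypothesis the
    slope should be close to 0.5.
    """
    xs = [math.log(f) for f in frequencies]
    ys = [math.log(m) for m in sense_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic values constructed to follow senses = 0.2 * sqrt(frequency) exactly.
# These are illustrative assumptions, not observations from any corpus.
toy_frequencies = [100, 400, 900, 1600, 2500]
toy_sense_counts = [0.2 * math.sqrt(f) for f in toy_frequencies]

exponent, log_constant = fit_power_law_exponent(toy_frequencies, toy_sense_counts)
print(f"estimated exponent: {exponent:.3f}")
print(f"estimated constant: {math.exp(log_constant):.3f}")
```

With real sense counts, which are small non-negative integers, a count model such as a generalized linear model would probably be a better fit than least squares on logs; the sketch only illustrates what "proportional to the square root" implies for the exponent.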
Authors: Koenraad Kuiper, Robert Fromont, Daniel Gerhard
References:
Amir, Y. and Sharon, I. (1990). Replication research: A ‘must’ for the scientific advancement of psychology. Journal of Social Behavior and Personality 5 (4): 51–69.
Baayen, R. H., Shaoul, C., Willits, J., and Ramscar, M. (2015). Comprehension without segmentation: A proof of concept with naive discrimination learning. Language, Cognition, and Neuroscience 31 (1): 106–128. https://doi.org/10.1080/23273798.2015.1065336
Baker, M. C. (2003). Lexical Categories: Verbs, Nouns and Adjectives. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511615047
Barque, L. and Chaumartin, F.-R. (2006). Regular polysemy in WordNet. LDV-Forum 21 (1): 1–14.
Chaplot, D. S., Bhattacharyya, P., and Paranjape, A. (2015). Unsupervised word sense disambiguation using Markov random field and dependency parser. Paper presented at the 29th AAAI Conference on Artificial Intelligence (AAAI-15), Austin, Texas.
Crossley, S., Salsbury, T., and McNamara, D. (2010). The development of polysemy and frequency use in English second language speakers. Language Learning: A Journal of Research in Language Studies 60 (3): 573–605.
Everaert, M. and Bolhuis, J. (2017). The biology of language. Neuroscience and Biobehavioral Reviews 81: 99–102. https://doi.org/10.1016/j.neubiorev.2017.08.005
Grimshaw, J. (1990). Argument Structure. Cambridge, MA: MIT Press.
Hanks, P. (2013). Lexical Analysis: Norms and Exploitations. Cambridge, MA: MIT Press. https://doi.org/10.7551/mitpress/9780262018579.001.0001
Hernández-Fernández, A., Casas, B., Ferrer-i-Cancho, R., and Baixeries, J. (2016). Testing the robustness of laws of polysemy and brevity versus frequency. In P. Král and C. Martín-Vide (Eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918. Champaign, IL: Springer. https://doi.org/10.1007/978-3-319-45925-7_2
Katz, J. J. and Fodor, J. A. (1963). The structure of semantic theory. Language 39 (2): 170–210. https://doi.org/10.2307/411200
Kearns, K. (1998). Light verbs in English. Linguistics 34: 53–72. https://doi.org/10.1017/S002222679700683X
Kilgarriff, A. (1995). BNC database and word frequency lists. Retrieved on 24 February 2014 from http://www.kilgarriff.co.uk/BNC_lists/lemma.al
Klepousniotou, E. (2002). The processing of lexical ambiguity: Homonymy and polysemy in the mental lexicon. Brain and Language 81: 205–223. https://doi.org/10.1006/brln.2001.2518
Levelt, W. J. M. (1989). Speaking: From Intention to Articulation. Cambridge, MA: MIT Press.
Levelt, W. J. M. (2013). A History of Psycholinguistics: The Pre-Chomskyan Era. Oxford: Oxford University Press.
Levelt, W. J. M., Roelofs, A., and Meyer, A. S. (1999). A theory of lexical access in speech production. Behavioral & Brain Sciences 22 (1): 1–75. https://doi.org/10.1017/S0140525X99001776
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). Boca Raton, FL: Chapman & Hall. https://doi.org/10.1007/978-1-4899-3242-6
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K. (1992). WordNet: A lexical database for English. Communications of the ACM 38: 39–41. https://doi.org/10.1145/219717.219748
Nation, I. S. P. (2008). Teaching Vocabulary: Strategies and Techniques. Boston, MA: Cengage Learning.
Pawley, A. (2006). Where have all the verbs gone? Remarks on the organisation of language with small, closed verb classes. Paper presented at the 11th Biennial Rice University Linguistics Symposium. Austin, Texas.
R Core Team. (2016). R: A language and environment for statistical computing. Retrieved on 28 August 2015 from https://www.R-project.org/
Simons, D. J. (2014). The value of direct replication. Perspectives on Psychological Science 9 (1): 76–80. https://doi.org/10.1177/1745691613514755
Taylor, J. R. (2003). Polysemy’s paradoxes. Language Sciences 25 (6): 637–655. https://doi.org/10.1016/S0388-0001(03)00031-7
Taylor, J. R. (2012). The Mental Corpus: How Language is Represented in the Mind. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199290802.001.0001
Tengi, R. I. (1998). Design and implementation of the WordNet lexical database and searching software. In: C. Fellbaum (Ed.) WordNet: An Electronic Lexical Database, 105–127. Cambridge, MA: MIT Press.
Wittgenstein, L. (1965). Philosophical Investigations. New York: The Macmillan Company.
Yang, C. (2013). Ontogeny and phylogeny of language. Proceedings of the National Academy of Sciences 110 (16): 6324–6327. https://doi.org/10.1073/pnas.1216803110
Yang, C. (2016). The Price of Linguistic Productivity: How Children Learn to Break the Rules of Language. Cambridge, MA: MIT Press.
Zipf, G. K. (1949). Human Behaviour and the Principle of Least Effort. Cambridge, MA: Addison-Wesley.