Michael Tomasello
Constructing a Language
A Usage-Based Theory of Language Acquisition
Harvard 2003

pg 8 Origins of Language

The common behavior of mankind is the system of reference by means of which we interpret an unknown language. LUDWIG WITTGENSTEIN

HUMAN linguistic communication differs from the communication of other animal species in two main ways. First, and most importantly, human linguistic communication is symbolic.

Linguistic symbols are social conventions by means of which one individual attempts to share attention with another individual by directing the other's attentional or mental state to something in the outside world. Other animal species do not communicate with one another using linguistic symbols, most likely because they do not understand that conspecifics have attentional or mental states that they could attempt to direct or share (Tomasello, 1998b). To oversimplify, animal signals are aimed at the behavior and motivational states of others, whereas human symbols are aimed at the attentional and mental states of others. It is this mental dimension that gives linguistic symbols their unparalleled communicative power, enabling them to be used to refer to and to predicate all kinds of diverse perspectives on objects, events, and situations in the world.

The second main difference is that human linguistic communication is grammatical. Human beings use their linguistic symbols together in patterned ways, and these patterns, known as linguistic constructions, take on meanings of their own—deriving partly from the meanings of the individual symbols but, over time, at least partly from the pattern itself. The process by which this occurs over historical time is called grammaticalization (or syntacticization), and grammatical constructions of course add still another dimension of communicative power to human languages. The process of grammaticalization depends crucially on a variety of domain-general cognitive and social-cognitive processes that operate as people communicate with and learn from one another. It is also a species-unique process—because if other animals do not use symbols, the question of grammar is moot.

Human skills of linguistic communication are also unique in the way they are acquired during ontogeny. The main point is that, unlike other animal species, the human species does not have a single system of communication. Different groups of human beings have conventionalized different systems of communication (there are more than 6,000 of them), and children typically acquire only the system(s) of their natal group(s). Children take several years to acquire the many tens of thousands of linguistic conventions used by those around them, whereas most other animal species do not learn any of their species-typical communicative signals at all.

2.1. Phylogenetic Origins

As adumbrated in Chapter 1, the Generative Grammar hypothesis focuses only on grammar and claims that the human species has evolved during its phylogeny a genetically based universal grammar. The theory is unconcerned with the symbolic dimensions of human linguistic communication.
The usage-based view—or at least the version of it espoused here—is precisely the opposite. In this view, the human use of symbols is primary, with the most likely evolutionary scenario being that the human species evolved skills enabling the use of linguistic symbols phylogenetically. But the emergence of grammar is a cultural-historical affair—probably originating quite recently in human evolution—involving no additional genetic events concerning language per se (except possibly some vocal-auditory information-processing skills that contribute indirectly to grammaticalization processes).

This is not to imply that we know how language originated in human evolution, because we do not. But if we focus on linguistic symbols as primary, we may obtain some hints by looking at the communication of our nearest primate relatives—who communicate not with symbols but with vocal and gestural signals. At the very least, this comparison will help us to identify the unique features of human symbolic communication. For hints about the emergence of grammar in human evolution we need to examine various processes of grammaticalization and syntacticization as they may be inferred from historical examinations of written language and from comparative examinations within language families.

2.1.1. Primate Communication

Discerning the unique features of human symbolic/linguistic communication is sometimes made more difficult by anthropocentric accounts of nonhuman primate communication. The most important instance of this is the well-known case of the alarm calls of vervet monkeys. The basic facts are these (see Cheney and Seyfarth, 1990, for more details). In their natural habitats in east Africa vervet monkeys use three different types of alarm calls to indicate the presence of three different types of predator: leopards, eagles, and snakes. A loud, barking call is given to leopards and other cat species, a short cough-like call is given to two species of eagle, and a "chutter" call is given to a variety of dangerous snake species. Each call elicits a different escape response on the part of vervets who hear the call: to a leopard alarm they run for the trees; to an eagle alarm they look up in the air and sometimes run into the bushes; and to a snake alarm they look down at the ground, sometimes from a bipedal stance. These responses are just as distinct and frequent when researchers play back previously recorded alarm calls over a loudspeaker, indicating that the responses of the vervets are not dependent on seeing the predator but rather on information contained in the call itself.

On the surface, these alarm calls would seem to be very similar to human language. It seems as if the caller is directing the attention of others to something they do not perceive or something they do not know is present; that is, the calls would seem to be symbolic (referential). But several additional facts argue against this interpretation. First, there is basically no sign that vervet monkeys attempt to manipulate the attentional or mental states of conspecifics in any other domain of their lives. Thus, vervets also have different "grunts" that they use in various social situations, but these show no signs of being symbolic or referential in the sense of being intended to direct the attention of others to outside entities; they mainly serve to regulate dyadic social interactions not involving outside entities, such as grooming, playing, fighting, sex, and travel. Second, predator-specific alarm calls turn out to be fairly widespread in the animal kingdom. They are used by a number of species—from ground squirrels to domestic chickens—that must deal with multiple predators requiring different types of escape responses (Owings and Morton, 1998), but no one considers them to be symbolic or referential in a human-like way. An extremely important evolutionary fact in all of this is that no species of ape has such specific alarm calls or any other vocalizations that appear to be referential (Cheney and Wrangham, 1987). Since human beings are most closely related to the great apes, this means that it is not possible that vervet monkey alarm calls could be the direct precursor of human language unless at some point apes used them also—and there is no evidence of this.

Similarly and importantly, the visual-gestural communication of nonhuman primates shows no signs of referentiality or symbolicity either. Most strikingly, nonhuman primates do not point or gesture to outside objects or events for others, they do not hold up objects to show them to others, and they do not even hold out objects to offer them to others (Tomasello and Call, 1997). Once again, primate gestures are used almost exclusively to regulate dyadic social interactions such as grooming, play, fighting, sex, and travel, not triadically to direct the attention of others to outside entities or events. Relatedly, nonhuman primates use their species-typical vocalizations and gestures almost exclusively for imperative motives, to request a behavior of others, not to share attention or information with others in a disinterested manner (Tomasello, 1998b).

Finally, nonhuman primate vocalizations and gestures are not socially learned in the sense of being copied from others. Primate vocalizations are almost certainly not learned at all, as monkeys and apes raised outside their normal social environments vocalize in much the same way as those who grow up in normal social environments (although some aspects of call comprehension and use may be learned). Many nonhuman primate gestures are also not learned, but some are. However, these are not learned by imitation—by observing others using a gesture and then adopting it oneself—but rather by a process of ritualization in which individuals mutually shape one another's behavior over repeated social interactions (Tomasello and Zuberbühler, 2002). Overall, because they are not used referentially, not used simply to share attention with others, and not learned from others via imitation, the communicative signals of nonhuman primates do not seem to be socially shared (or socially constituted) in the same way as human linguistic symbols.

As a result of facts such as these, a number of primatologists and behavioral ecologists have cautioned against using human language as an interpretive framework for nonhuman primate communication (Owings and Morton, 1998; Owren and Rendell, 2001). They concur with the current analysis that nonhuman primates do not use communicative signals to convey meaning or to convey information or to refer to things or to direct the attention of others, but rather use them to affect the behavior or motivational states of others directly. If this interpretation is correct, then the deep evolutionary roots of human language lie in the attempts of primate individuals to influence the behavior, not the mental states, of conspecifics. To find the most direct precursors of human linguistic symbols as tools for directing attention, therefore, we can only look at the history of the human species since it began its own unique evolutionary trajectory.

2.1.2. Symbols and Grammaticalization

Although no one knows for certain, it is very likely that human symbolic skills arose as a more or less direct result of a biological adaptation—most likely occurring very recently with the emergence of modern humans some 200,000 years ago. According to Deacon (1998), this adaptation concerned symbolic skills directly, whereas according to Tomasello (1999) it concerned a new kind of social cognition more generally, in which human beings understood one another for the first time as intentional and mental agents—which then led them to attempt to manipulate one another's intentional and mental states for various cooperative and competitive purposes.

In any case, whenever and however they arose, human linguistic symbols are most clearly distinguished from the communicative signals of other primate species by the ways they are learned and used:

Human linguistic symbols are socially learned, mainly by cultural (imitative) learning in which the learner acquires not just the conventional form of the symbol but also its conventional use in acts of communication (Tomasello, Kruger, and Ratner, 1993).

Because they are learned imitatively from others, linguistic symbols are understood by their users intersubjectively in the sense that users know their interlocutors share the convention (that is, everyone is potentially both a producer and a comprehender and they all know this; see Saussure, 1916, on "bi-directionality of the sign").

Linguistic symbols are not used dyadically to regulate social interactions directly, but rather they are used in utterances referentially (triadically) to direct the attentional and mental states of others to outside entities (see Grice, 1975, on the non-natural meaning of linguistic symbols).

Linguistic symbols are sometimes used declaratively, simply to inform other persons of something, with no expectation of an overt behavioral response (see Dunbar, 1996, on the origins of language for purposes of gossip).

Linguistic symbols are fundamentally perspectival in the sense that a person may refer to one and the same entity as dog, animal, pet, or pest, or to the same event as running, fleeing, moving, or surviving—depending on her communicative goal with respect to the listener's attentional states (Langacker, 1987a).

All these features are in contrast to the unlearned, or at least not imitatively learned, dyadic and imperative communicative signals of nonhuman primates that do not involve mental perspectives at all. In at least one reasonable hypothesis, these uniquely human features all derive—along with a host of other cultural skills involving, for example, teaching and collaborative interactions—from a single social-cognitive adaptation enabling the understanding of the psychological states of others more generally (theory of mind, broadly defined; Tomasello, 1999).

Tomasello (1999) also argued that linguistic symbols provide human beings with a species-unique format for cognitive representation. That is, when a child learns the conventional use of linguistic symbols, what she is learning are the ways her forebears in the culture found it useful to share and manipulate the attention of others in the past. And because the people of a culture, as they move through historical time, evolve many and varied purposes for manipulating one another's attention (and because they need to do this in many different types of discourse situations), today's child is faced with a whole panoply of linguistic symbols and constructions that embody many different attentional construals of any given situation. As just a sampling, languages embody attentional construals based on such things as:

Granularity-specificity (thing, furniture, chair, desk chair).
Perspective (chase-flee, buy-sell, come-go, borrow-lend).
Function (father, lawyer, man, American; coast, shore, beach).

Consequently, as the young child internalizes a linguistic symbol—as she learns the human perspective embodied in that symbol—she cognitively represents, not just the perceptual or motoric aspects of a situation, but also one way, among other ways of which she is also aware, that the current situation may be attentionally construed by "us," the users of the symbol. The way that human beings use linguistic symbols thus creates a clear break with straightforward perceptual or sensory-motor cognitive representations—even those connected with events displaced in space and/or time—and enables human beings to view the world in whatever way is convenient for the communicative purpose at hand.

The evolution of grammar raises a more controversial set of theoretical issues, leading to some very different hypothesized evolutionary scenarios. Generative grammarians believe that the human species evolved a genetically based universal grammar common to all peoples and that the variability in modern languages is basically on the surface only. There are a number of accounts from this perspective, ranging from Chomsky's (1986) single-mutation account to Bickerton's (1984) two-stage account to Pinker and Bloom's (1992) gradualist account. But in all these variants the basic idea is the same: that the fundamental grammatical categories and relations underlying all of the world's languages come from a biological adaptation (or set of adaptations) in the form of a universal grammar.

The alternative is the usage-based view, in which there is no need to posit a specific genetic adaptation for grammar because processes of grammaticalization and syntacticization can actually create grammatical structures out of concrete utterances—and grammaticalization and syntacticization are cultural-historical processes, not biological ones. Thus, it is a historical fact that the specific items and constructions of a given language are not invented all at once, but rather they emerge, evolve, and accumulate modifications over historical time as human beings use them with one another and adapt them to changing communicative circumstances (Croft, 2000).

Most importantly, through various discourse processes (involving various kinds of pragmatic inferencing, analogy making, and so on) loose and redundantly organized discourse structures congeal into more tightly and less redundantly organized constructions (see Traugott and Heine, 1991; Hopper and Traugott, 1993). This happens both on the level of words and on the level of more complex constructions.

On the level of words, simple examples are English phrases such as on the top of and in the side of evolving into on top of and inside of and eventually into atop and inside. Often, however, this congealing process results in some structural changes as the communicative functions of some elements are reanalyzed in the context of specific constructions. Thus, case markers and agreement markers most often originate in free-standing words such as spatial prepositions, pronouns, or even nouns and verbs. A simple English example concerns the future marker gonna, a fusion of going and to. The original use of going was as a verb for movement, often in combination with the preposition to to indicate the destination (I'm going to the store), but sometimes also to indicate an intended action that the going to enabled (Why are you going to London? I'm going to see my bride). This later became I'm gonna VERB, with gonna indicating not just the intention to do something in the future, but futurity only (with no movement or intention necessary; on this change see Bybee, 2002). Givon's (1979) well-known characterization of this process is: today's morphology is yesterday's syntax.

On the level of constructions, instead of sequences of words becoming one word, whole phrases take on a new kind of organization; that is, loose discourse sequences become more tightly organized syntactic constructions. Again Givon's characterization is apt: today's syntax is yesterday's discourse. Some hypothetical examples based on Givon (although in many cases the historical record is not sufficiently detailed for confidence in the specifics):

Loose discourse sequences such as He pulled the door and it opened may become syntacticized into He pulled the door open (a resultative construction).

Loose discourse sequences such as My boyfriend ... He plays piano ... He plays in a band may become My boyfriend plays piano in a band. Or, similarly, My boyfriend ... He rides horses ... He bets on them may become My boyfriend, who rides horses, bets on them (a relative clause construction).

If someone expresses the belief that Mary will wed John, another person may respond with an assent, I believe that, followed by a repetition of the expressed belief, Mary will wed John—which become syntacticized into the single statement I believe that Mary will wed John (a sentential complement construction).

Complex constructions may also derive from discourse sequences of initially separate utterances, as in I want it ... I buy it evolving into I want to buy it (an infinitival complement construction).

The historical processes of grammaticalization and syntacticization derive from a number of psychological and social-communicative processes that have been well studied, most importantly automatization, functional reanalysis, and analogy. Thus, when a person says going and to together enough (and consistently for the same single function), she ends up saying gonna by processes of automaticity very similar to those which occur in a variety of sensory-motor skills (Schneider, 1999). The constraint on such streamlining is of course that the behavior cannot be so streamlined that it no longer serves its communicative function effectively. In situations of high predictability the reduction of phonetic content may be relatively great; in less predictable situations less reduction is possible without serious consequences for communication.

Frequency plays a large role in this process as well, as only relatively frequently used expressions will become highly predictable—which accounts for the well-known principle that the more frequently a word is used in a language the shorter it tends to be (Zipf's Law). Frequency is also crucial because, as is well known, constructions that occur frequently are often irregular. This irregularity can be maintained because items and constructions that are highly frequent can be learned and used on their own, as constructional islands, whereas items and constructions that are less frequent tend to get regularized by pattern-seeking children (or, in the limiting case, they drop out of use as children do not get enough exposures to learn them). An interesting example is the subjunctive in Canadian French, which has dropped out of active use for virtually all low-frequency verbs but has stayed in use for a small number of high-frequency verbs (Poplack, 2001; also note an even narrower pattern in English in which the subjunctive survives for most speakers only in some fixed expressions such as If I were you...).

Grammaticalization also quite often involves processes of functional reanalysis and analogy. An example from English illustrates (adapted from Trask, 1996). Old English had a verb lician that meant something like "be pleasing to." Like similar verbs in many languages (such as the German gefallen, the Spanish gustar), this verb had as its subject the thing that pleased, with the person who was pleased with that item appearing in the dative case (X is pleasing to Fred). The normal word order for utterances with this verb consisted of the person being pleased said before the verb (in dative case) and the thing doing the pleasing said after the verb (as subject, agreeing in number with the verb); this is presumably because in English nominals indicating people most often come before verbs (for pragmatic reasons of topicality) and nominals for inanimate objects most often come after verbs. We thus get:

    Þam kynge licoden peran.
    To the king [dative] were-pleasing pears. (pears = plural subject)

During the Middle English period, however, English lost much of its case-marking morphology, and so this same utterance was normally expressed:

    The king licenden peares.
    The king were-pleasing pears. (no dative marking)

It is clear that pears is still the subject at this point since the verb agrees with it in number, and not with the singular king (the -en ending on the verb indicates plurality, as in modern-day German). Finally, the plural marking on the verb was lost too, and we were left with the modern-day:

    The king liked pears.

The dative king has now been reanalyzed as the subject, and the former subject pears as a direct object. Presumably, a driving force in this particular historical development was the fact that this construction had an atypical configuration of case-marking and word order (and perhaps it became less frequent as well, creating pressure for regularization), and so the reanalysis was in some sense aided by a kind of analogy to other Subject-Verb-Object (SVO) constructions.

All of this is not perfectly understood at this point, but for the process of grammaticalization to result in complex and abstract syntactic constructions the organisms involved must be equipped with some fairly complex cognitive and social-cognitive skills, including the ability to form complex schemas and to categorize these and their internal constituents into abstract categories, as well as the abilities to make sophisticated pragmatic inferences, functional reanalyses, and analogies. It may also be that humans' relatively recent specialized speech adaptations enabled the emergence of fully linguistic communication, if for no other reason than that they enabled the very rapid production of sequences of linguistic symbols so that grammaticalization could take place (Lieberman, 1985). In any case, grammaticalization theory is able, at least in principle, to account both for the similarities among the world's languages—based on species-wide skills of cognition, vocal-auditory information processing, and pragmatic inferencing, along with commonalities among peoples in social and communicative goals—and for fundamental differences in these languages, as different speech communities use and grammaticalize different discourse sequences. Some people may doubt that cultural-historical processes can create abstract structures such as those embodied in the grammatical constructions of modern-day languages. But, although the analogy is clearly not perfect, there are many highly abstract structures in modern mathematics that could only have been created by cultural-historical processes since they are not universal among cultures (for example, those of algebra and calculus). Again, there are many disanalogies between language and mathematics (which is more closely related, both logically and historically, to written language). The only point is that abstract symbolic systems can be created by groups of human beings working together over historical time in the domain of mathematics, and so perhaps they can also be created in similar yet different ways in the domain of language.

2.1.3. Language Universals

Of crucial importance to the question of whether human grammatical competence is best explained by an innate universal grammar or by processes of grammaticalization is the question of language universals. The basic facts are these. Leaving aside for the moment nouns and verbs—which may or may not be universal in all the world's languages—virtually all linguists who are involved in the detailed analysis of individual languages cross-linguistically (known as linguistic typologists) now agree that there are very few if any specific grammatical categories and constructions that are present in all languages. Many languages simply do not have one or more of what are conventionally called relative clauses, auxiliary verbs, passive constructions, grammatical markers for tense, grammatical markers of evidentiality, prepositions, topic markers, subject markers, a copula (to be), case marking of grammatical roles, subjunctive mood, definite and indefinite articles, incorporated nouns, plural markers, conjunctions, adverbs, complementizers, and on and on. The fact is that many languages (or language families) have grammatical categories and constructions that seem to be unique to them, that is, that do not correspond to any of the European categories and constructions as these have been defined over the centuries, beginning with Greek and Roman sources—who, by the way, created these grammatical entities not with the goal of psychological reality in mind, but rather as resources for the analysis of written texts and the teaching of Latin grammar.

For sure, we can force all languages into one abstract mold, which mostly means forcing the grammatical entities of non-European languages into European categories. Just as there was a time when Europeans viewed all languages through the Procrustean lens of Latin grammar, we may now view the native languages of Southeast Asia, the Americas, and Australia through the Procrustean lens of Standard Average European grammar. But why? On one reasonable view, this is just Eurocentrism, plain and simple, and it is not very good science. Foley and van Valin (1984) speculate about what our linguistic categories and theories would look like if we had begun by analyzing the languages of Southeast Asia and the Pacific Ocean and then attempted to assimilate European languages to them. The conclusion is that they would look very different. Croft (2002) also points out the "methodological opportunism" routinely employed by many linguists looking for language universals. In effect, they focus on a subset of the features that characterize, for instance, English subjects, and claim that any category in any language characterized by this subset is a subject—basically ignoring the features that don't match. From a very practical perspective, Dryer (1997) points out that when different investigators, whatever their theoretical persuasions, look long enough and in enough detail at a given language, they mostly come to agreement about the basic grammatical categories and how they work. The problems arise when they then try to decide if any of these categories correspond to such things as "subject," "preposition," and "auxiliary verb," as these have been defined for European languages. We can fight about it, but is it really a useful fight? The fact that our Greco-Roman pigeonholes do not accommodate many non-European languages in a particularly graceful way should not be surprising, since these pigeonholes were not created with those languages in mind.

Of course there are language universals. It is just that they are not universals of form—that is, not particular kinds of linguistic symbols or grammatical categories or syntactic constructions—but rather they are universals of communication and cognition and human physiology. Because all languages are used by human beings with similar social lives, all peoples have the need to solve in their languages certain kinds of communicative tasks, such as referring to specific entities or predicating things about those entities. All human beings also have the same basic tools for accomplishing those tasks—linguistic symbols, markers on those symbols, ordering of symbols, and prosodic patterns (Bates and MacWhinney, 1982)—and certain grammaticalization pathways seem to recur quite often in the service of those tasks. This leads to some language universals, for example, something like nouns and verbs as expressions of reference and predication using linguistic symbols of certain kinds. Such universals are therefore emergent phenomena, based ultimately on universals of human cognition, human communicative needs, and human vocal-auditory processing. But there is very little evidence in the typological literature for the existence of contentful language universals of the type one would normally associate with an innate universal grammar.

2.2. Ontogenetic Origins

The human adaptation for symbolic communication emerges in human ontogeny quite predictably across cultures at around 1 year of age (Tomasello, 1995a, 1999). It emerges in the context of a whole suite of new social-cognitive skills, the most important for language acquisition being the establishment of joint attentional frames, the understanding of communicative intentions, and a particular type of cultural learning known as role reversal imitation. Together this new suite of skills may be referred to as skills of intention-reading, indicating the most fundamental social-cognitive ability underlying them all.

As for grammar, recent findings have demonstrated that prelinguistic infants have some astounding skills of pattern-finding when exposed to various kinds of auditory sequences, which obviously prepare them for acquiring grammatical constructions. But these skills cannot go to work in earnest until children are able to acquire some linguistic symbols in the first place, again depending on key social-cognitive developments at around 1 year of age.