Bootstrapping in the acquisition of word stress in Brazilian Portuguese

This paper deals with the acquisition of word stress in Brazilian Portuguese. In the course of acquisition, children use several strategies to mark stress prominence before the adult algorithm of primary stress is used productively. I argue that the prominences found in children's early utterances do not reflect word stress but prominences of a higher prosodic level. In other words, children use the stress information available in higher prosodic domains as a cue for the acquisition of the algorithm of primary stress.


Introduction
The input children access has a segmental sequence, some metrical prominences, and an intonational contour with which a meaning is associated. Children must deal with all of these kinds of phonological information and organize them appropriately, in a hierarchical way (see, for instance, Nespor & Vogel's (1986) prosodic hierarchy). In principle, we may be able to find out how such organization takes place by investigating children's spontaneous production, since their production data arguably reflect their hypotheses about the language being acquired. Assuming this to be so, this paper focuses on the acquisition of word stress in Brazilian Portuguese and examines production data from two children between 1;2 and 2;5 years of age.
Children's early utterances are usually short (word-sized), have a strong beat, and look like adult utterances. It is generally assumed in the literature that this strong beat is word stress (see Archibald 1995, Demuth 1996a,b, Echols & Newport 1992, Hochberg 1988a,b, Fikkert 1994, 1995). Under this view, children start with the lower levels of the prosodic hierarchy and go up the metrical grid in a bottom-up fashion.
However, the fact that children's utterances look like adult speech does not necessarily mean that children and adults assign the same structures to them. In this particular case, the stress in children's early utterances need not be word stress. Given that stresses are superimposed on a metrical grid, the strong beats in children's early utterances may come from another prosodic level through a bootstrapping process.
Broadly speaking, bootstrapping is a process by which children use one type of linguistic information already acquired to support the acquisition of another type. The notion was first used in proposals according to which children use phonetic and semantic cues in the acquisition of syntax (cf. Pinker 1987, among others). With respect to prosodic bootstrapping, for instance, it has been argued that stress can help bootstrap children into acquiring grammatical categories (cf. Echols & Newport 1992; Fernald & McRoberts 1996; Selkirk 1996, among others). The kind of bootstrapping investigated in this paper occurs within a single component of the grammar: phonology bootstrapping phonology.
Given that the syllable that carries the utterance stress also carries word stress, my hypothesis is that children bootstrap the acquisition of word stress from the prominence of the utterance. In other words, the prominence heard in children's early utterances is not word stress, even when such utterances are one word long.
The paper is organized as follows: section 2 describes the subjects and the methodology used; section 3 discusses the types of intonational contours found in the corpus; section 4 describes some of the phonological processes children use to fill the targeted intonational contour; finally, section 5 presents some concluding remarks.

Subjects and Methods
The corpus analyzed in this study is part of a larger corpus of longitudinal records on the acquisition of Brazilian Portuguese belonging to the Projeto de Aquisição da Linguagem of the Universidade Estadual de Campinas (for a description of the project's corpus, see Lemos 1995). The corpus analyzed here consists of production data from two Brazilian children, R. and T., between 1;2 and 2;5 years of age. The children are from middle-class families and their parents are university graduates.
The children's spontaneous production was audio-recorded weekly, in half-hour naturalistic sessions. The data from both the children and their caretakers were phonetically transcribed on a perceptual basis by one researcher, and the transcriptions were checked by the author of this paper or by another researcher. Only the data whose transcriptions were agreed upon by both researchers were taken into consideration here.
The data were segmented into words only when children's utterances had more than one stress prominence. Utterances were identified according to standard intonational and pause criteria. Given that utterances with more than one prominence generally appear in later periods (from 2;5 on), until then each whole utterance, including unintelligible ones, was analyzed as a single token.
Since the analysis of the data was meant to be qualitative rather than quantitative, all the different intonational contours and relevant phonological processes were registered even if they appeared only once.

Intonational Contours
Since the hypothesis of this paper is that children bootstrap word stress from the prominence of the utterance, let us begin by examining the different intonational contour patterns found in the corpus.

Data
Between 1;3 and 2;0, T. and R. develop a primitive system of intonational contours. Gebara (1984) documented the development of their intonational systems by mapping the different intonational contours found in the corpus onto the meanings (uses) the children associated with them. Her results can be summarized in (1) and (2) (Gebara 1984: 98, 120, 155, 159).

In the course of the development of their intonational systems depicted above, the two children exhibit a considerable degree of variation in their segmentation of adult utterances. Their utterances may have the same segmental sequence as adults', as seen in (3)-(7) below;² however, these adult-like sequences are not always words in the adult language, as exemplified in (8).
These non-adult segmental sequences are used to complete a targeted intonational contour; that is, they are filler-sounds, whose function is to make it possible for a given segmental sequence to match an intonational template by filling initial weak positions (Scarpa 1994). T. and R. use these strategies when targeting the intonational contours 6T and 2R, as in (9)-(12).

At about 1;8, both R.'s and T.'s systems start to change, and the process continues until 2;3. T. starts using only one tone, 6T (see Gebara 1984). Because segmental sequences do not always match this tone, T. resorts to tone filling, following the pattern (L) L+H* (L)%. R., in turn, does not generalize any single tone, but always uses a falling tone (2R, 4R, 5R, 9R), with a possible low pre-nucleus, to "experiment" and add filler-sounds; when she uses other contours, there is no modification of the segmental sequence (see Santos 1995). Taking one variety of 2R, L+H* L L%, to be representative of the tones used for "experimentation", (13) below allows a comparison between T.'s and R.'s systems.
Despite their apparent differences (T.'s contour admits two initial L syllables and just one final L, while R.'s contour allows only one initial L but two final L), (13) shows that both systems are actually similar in displaying a high strong syllable, a low pre-nuclear syllable, and an optional low post-nuclear syllable. This becomes clearer if we assume that the intonational domain of children's early utterances is the tone group (cf. Gebara 1984).³ Superposing the tone R. and T. choose to work with onto the structure of the tone group yields the template in (14) below. Thus, the specific tones each child resorts to are different, but the template is the same.

In case of mismatch between segmental sequences and the template in (14), the most radical way to match the chosen tone is to change the position of stress. This process, however, is not common, because the cut-off from the input is made from the nuclear syllable; moreover, this strategy is only possible with three-syllable words. Hence, we find only a few cases of "stress mistakes", as illustrated in (15) and (16).

In the remaining cases (words with more or fewer syllables), various other processes allow matching. For the pre-nuclear position, children have four possibilities: filler-sounds, as shown in (17) and (18); lengthening of the nuclear syllable, as shown in (19); maintenance of the stressed pre-nuclear syllables, as shown in (20)-(23); and maintenance of the most adjacent syllable, as shown in (24)-(27). On the right side of the intonational nucleus, the child has three options to fill the post-nuclear position: lengthening of the syllable that bears the phrasal stress, maintenance of the most complex (cvc) syllable, or maintenance of the syllable closest to the nucleus, as illustrated in (28)-(30), respectively.

Summarizing, from 1;3 to 2;0 both R. and T. use different contours in their utterances. Segmental sequences cut off from the input, plus filler-sounds, are used to fill the different intonational contours of this period. From 1;8 on, their intonational systems change and they single out one of the contours to experiment with. Both emerging systems are similar in displaying a high strong syllable, a low pre-nuclear syllable, and an optional low post-nuclear syllable. To fill this template, children use different phonological processes, and they even move word stresses that do not match the position of the nucleus of the intonational contour.

³ The tone group is formed by an obligatory nucleus and optional constituents (head, pre-head and tail), as represented in (i) (cf. Ladd 1986, 1996 and Arvaniti, Ladd & Mennen 2000).
(i) (pre-head) (head) nucleus (tail)
The nucleus is associated with the last stressed syllable of the tone group; the head is the first stressed syllable that occurs before the nucleus plus all the non-stressed syllables between them; the pre-head comprises all non-stressed syllables that occur before the head; and the tail refers to the syllables that follow the nucleus.
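The definition of the tone group given in footnote 3 is precise enough to be stated procedurally. The Python sketch below is only an illustration under stated assumptions: syllables are encoded as a hypothetical list of booleans marking stress, any stressed syllables lying between the head and the nucleus are simply counted as part of the head, and the function name `parse_tone_group` is invented here. It shows, for instance, that a weak-strong-weak string yields a pre-head, a nucleus and a tail with an empty head, which is the shape of the children's template.

```python
from typing import Dict, List, Union


def parse_tone_group(stressed: List[bool]) -> Dict[str, Union[int, List[int]]]:
    """Split syllable positions into tone-group constituents.

    `stressed[i]` is True if syllable i is stressed (a hypothetical
    encoding chosen for illustration). Returns 0-based positions.
    """
    stressed_positions = [i for i, s in enumerate(stressed) if s]
    if not stressed_positions:
        raise ValueError("a tone group needs a nucleus (a stressed syllable)")
    nucleus = stressed_positions[-1]                  # last stressed syllable
    tail = list(range(nucleus + 1, len(stressed)))    # everything after it
    if len(stressed_positions) > 1:
        # Head: first stressed syllable before the nucleus plus the
        # syllables up to the nucleus (assumption: intervening stressed
        # syllables are also counted in the head).
        head_start = stressed_positions[0]
        head = list(range(head_start, nucleus))
    else:
        head_start = nucleus                          # no head
        head = []
    pre_head = list(range(0, head_start))             # unstressed, before head
    return {"pre_head": pre_head, "head": head, "nucleus": nucleus, "tail": tail}


# A weak-strong-weak utterance: low pre-nucleus, high nucleus, low tail.
print(parse_tone_group([False, True, False]))
# {'pre_head': [0], 'head': [], 'nucleus': 1, 'tail': [2]}
```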

Discussion
As seen in section 3.1, R. and T. start by manipulating many different intonational contours and later choose one of them as a template to be matched. The question, then, is whether the prominences perceived in their speech during this period reflect word stress or stress at higher prosodic levels.
Of course, this is not easy to tease apart, given that children's early utterances are in general word-sized and that one-syllable utterances usually coincide with the syllable that carries phrasal stress in the target form.⁴ Two things seem clear, though. First, there are no utterances without an intonational contour. Second, although there is no one-to-one correlation between contours and meanings (for example, 4R is used both for questions and for calling, and interrogatives may be expressed by 7R or 4R), early sentences do convey speech acts (cf. Gebara's (1984) results in (1) and (2)). Moreover, crosslinguistic studies have shown that early language achieves stability as far as sentence stress is concerned (see, among others, Bever, Fodor & Weksel 1971, Crystal 1979, Dore 1975, Frota & Vigário 1993, 1994, Locke 1983, Menn 1976, Peters 1977, Scherer 1981, Tonkova-Yampol'skaya 1973, Vigário & Frota 1992).
This picture accords well with the proposal by Dore, Franklin, Miller & Ramer (1975), among others, according to whom early speech is organized not by grammatical categories but by relations between conceptual meanings and the phonetic output. In the same vein, Vigário & Frota (1992) argue that children's early utterances can be analyzed as having only a "prosodic meaning" (declarative, interrogative, etc.), without an associated lexical meaning. In other words, children initially use an inventory of intonational contours to interact with adults in different ways, despite their lack of developed segmental and lexical structures.
If, as these authors claim, children at this stage are still acquiring lexical structure and are in this sense more attuned to speech acts than to lexical meaning, the prominences perceived in their early utterances could in principle come not from the word level but from the utterance level, whose system is more developed and more stable at this stage.
That children are working with the prominence of the intonational domain and not with word stress is indicated by the fact that all of T.'s and R.'s production from 1;8 through 2;3 converges on the same intonational template, represented in (31) below (= (14)). Although the template in (31) can be found with fewer than three syllables in the corpus, as seen in (32)-(34), utterances were usually three syllables long, with the following pattern: a weak, short, and low pre-nuclear syllable, followed by a strong, long, and high syllable, followed by a weak, short, and low syllable, as illustrated in (35)-(40). If children were indeed working with word stress, the emergence of this template in this period would lack an explanation.
Another piece of evidence that children are working at the intonational level comes from their repetitions of the input. In all cases where the children repeat the adult, the repetition is taken from the right side of the input utterance, where the intonational nucleus is, as illustrated in (41) and (42). […] (45), but should not have changed the stress. On the other hand, if children were working with syllable weight, R. should not have moved the stress to the penultimate syllable in (46). The surprising "mistakes" in (45) and (46), however, receive a straightforward account if the stress moves in order to allow the intonational template to be appropriately matched.⁵

This bootstrapping hypothesis, in which the acquisition of word stress is contingent on utterance prominence, raises interesting issues regarding the acquisition of word prosody. When R.'s and T.'s utterances have only two syllables, for instance, the stress can be on the first or on the last syllable, as shown in (47)-(50), and there is no preference for trochaic or iambic feet. These results are in consonance with Hochberg (1988a,b) and Nauclér & Magnusson (1996), who argue against a trochaic bias at the beginning of the acquisition process, as claimed by Allen & Hawkins (1978, 1980), Fikkert (1994, 1995), Archibald (1995), Demuth (1996a,b), Vihman (1996), and, specifically for Portuguese, Rapp (1994). The data in the corpus indicate that no stress pattern predominates and that children do not start with a default stress. The data in (51)-(54) are very telling in this regard. Two-syllable words with penultimate stress can fit a trochaic foot without any problems, yet (51)-(54) show that children may add an extra syllable in these cases. If children were working with words at this stage and, consequently, with the word stress system, there would be no need for such insertion in (51)-(54). If, on the other hand, children are working with the intonational system, the extra syllable in these cases is arguably added in order to match the LH*L template.

These facts do not confirm recent conclusions based on the acquisition of Germanic languages (see Fikkert 1994, Gerken 1994b, Vihman 1996, among others), according to which words in children's early grammar must constitute a binary foot. Similarly, it has been proposed for other languages that the first "words" are one foot long (see Hochberg 1988a,b for Spanish and Rapp 1994 for Brazilian Portuguese, among others). Common to these works is the view that prosodic development proceeds from the lowest level (syllable) to higher levels (foot and then word) in a bottom-up fashion. In this line, Demuth (1995) proposes the four stages in the acquisition of prosody sketched in (55). If prosodic development took place in a bottom-up fashion, children's utterances should be one syllable long in the beginning and later become one foot long. However, as pointed out by Scarpa (1999), crosslinguistic studies have shown that children tend to avoid one-syllable utterances, and that one-syllable utterances and minimal words are usually found at the same time and form a primitive prosodic set for a primitive lexicon. T. and R., for their part, produce three-syllable utterances with penultimate stress in their very early stages, which raises questions for the claim that the acquisition process must proceed from lower to upper levels. At the very least, we can say that such a bottom-up path is not universal.

⁵ Corroboration for the present analysis is also found in Moraes's (1999) acoustic analyses of words and utterances in Brazilian Portuguese. He argues that fundamental frequency is the most important parameter for prominence in utterances, while intensity and duration are more stable parameters for stress. Interestingly, his experiments show that intonational prominence superposes other prosodic categories (such as phrase group structure and words) in Brazilian Portuguese. Thus, the hypothesis that what we hear in children's one-word utterances is intonational prominence and not word stress is attractive not only on phonological but also on phonetic grounds.

Phonological processes
As pointed out in section 3, children use different phonological processes in order to fill the intonational contours of their utterances.This section takes a closer look at some of the processes that take place at the word level.

Segment Insertion
The most common phonological process found in the corpus was segment insertion on the left side of the utterance. The two children resort to this process from the beginning of the period analyzed here. For R., it is documented until 2;6, and for T. it continues until the end of the period recorded, although less and less frequently. These inserted segments could fill […] 1994, 2000). Santos (1995), Lléo (1997) and Scarpa (1999) argue that, during the acquisition process, these segments can in fact be reanalyzed from filler-sounds to place-holders.
As can be seen in (56), (61), (63), (64), (66), (68) and (72), for instance, there are different kinds of filler-sounds. For our purposes, two types of filler-sounds are particularly interesting because they fill weak positions (usually to the left) so that the whole targeted tone group is filled (cf. Scarpa's (2000) typology of filler-sounds based on the position they fill in the intonational contour):
a) Sounds that (entirely or partially) fill the pre-nucleus portion of a tone group, as in (61)-(63). These fillers combine with mature or adult-like fragments to support the rhythmic and intonational integrity of the tone group.
b) Front, central, or back vowels with a [-tense] feature, combined with verbal and nominal forms, as in (68)-(70). These fillers fill weak portions to the left of the nucleus. Their imprecise characteristics and their position can be taken to indicate syntactic information, and consequently they may be analyzed as place-holders.
The interesting point is that there are no cases in which children maintain non-stressed syllables from the input and use filler-sounds to fill the nuclear position. The intonational prominence is always preserved.

Resyllabification into two syllables
Children may also break diphthongs into two syllables in order to match the target template, as illustrated in (73)-(77). This modification happens very early in the acquisition period and occurs only with falling diphthongs. The examples were few and usually occurred when children tried to emphasize their utterances. Another instance of resyllabification into two syllables in order to match the targeted intonational template is illustrated in (78) and (79). Importantly, this kind of data is found when T. and R. already have complex syllable structures, so resyllabification in this case cannot be motivated by the need to parse a not-yet-acquired syllable structure.

One could think that the two processes above aim at preserving binary feet; in other words, instead of one heavy syllable, R. and T. would prefer two light ones. That Portuguese is quantity sensitive is, however, not a consensus in the literature. In fact, it has been convincingly argued that both European (see Pereira 1999 and Mateus & Andrade 2001) and Brazilian (see Lee 1995) Portuguese are not sensitive to syllable weight. Confirming evidence for this position is the fact that R. and T. always respect the word stress of the input, as shown in (80)-(86), and never make mistakes by stressing final cvc syllables (see Santos 2001). […] (80)-(86) should be stressed on the last syllable, since this would constitute a foot. However, this does not happen. During the whole acquisition process, children do not move the stress to a final heavy syllable. The opposite may indeed happen: as illustrated in (87), T. even moves the stress from a final cvc syllable to the penultimate syllable.
Thus, a more plausible analysis is that the two resyllabification processes discussed here are used to fill the targeted intonational contour. Given that one syllable can carry more than one tone, these processes are optional, as can be observed in the data.

Post-tonic Omission
If children insert segments or split diphthongs into two syllables so that they can fill a template that is larger than the target word, they also use the opposite strategy: segments and syllables may be deleted when the target word is larger than the template.
In the corpus, post-tonic omission occurred essentially with strings with two final weak syllables (sww), which is a very marginal stress pattern in Portuguese.⁶ This process was used until the end of the period recorded by both R. and T. There seemed to be no preference for omitting the first or the second post-tonic syllable; in fact, one may even find variation in this regard with the same input, as shown in (88)-(93).

Rapp (1994), working with Brazilian Portuguese data, claimed that the syllable closest to the tonic is the one subject to deletion. This proposal, however, does not account for (88), (90), and (93). Gerken (1994), in turn, based on English data, proposed that in an sww template, the last weak syllable is the one that can be omitted; however, if the last weak syllable has a cv(c) structure, the omitted syllable is the one with the simplest syllabic structure, and if the two final syllables have the same syllabic structure, it is a free choice. Gerken's proposal can account for (88)-(92), but not for (93), since the last syllable of (93) is cvc and should be maintained, contrary to fact.
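Because the two accounts of post-tonic omission make different predictions for the same sww input, it may help to state Gerken's rule, as paraphrased above, as a function. The sketch below is a hypothetical encoding, not Gerken's own formalization: post-tonic syllables are given as CV skeletons such as `"cv"` or `"cvc"`, "cv(c) structure" is read as a closed (cvc-like) final syllable, syllabic complexity is approximated by skeleton length, and identical skeletons are treated as a free choice. Under this reading, a final cvc syllable is kept whenever the other post-tonic syllable is structurally simpler, in line with the remark that the final cvc of (93) should have been maintained.

```python
from typing import Optional


def gerken_omission(first: str, last: str) -> Optional[int]:
    """Predict which post-tonic syllable of an s-w-w string is omitted.

    `first` and `last` are CV skeletons of the two weak syllables
    (a hypothetical encoding, e.g. "v", "cv", "cvc"). Returns 1 or 2
    for the omitted syllable, or None when the choice is free.
    This is one reading of the rule as paraphrased in the text.
    """
    if first == last:
        # Same syllabic structure: free choice.
        return None
    if last.endswith("c"):
        # Closed final syllable: omit the structurally simpler one,
        # approximating complexity by the number of skeletal slots.
        if len(first) == len(last):
            return None
        return 1 if len(first) < len(last) else 2
    # Default: the last weak syllable is the one omitted.
    return 2


# A cv-cv tail is a free choice, so either omission is predicted.
print(gerken_omission("cv", "cv"))   # None
# A closed final syllable survives; the simpler first syllable goes.
print(gerken_omission("cv", "cvc"))  # 1
```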
Based on the different possibilities for syllable omission illustrated in (88)-(93), Santos (1995) suggested that in Brazilian Portuguese the syllable to be omitted depends on whether children are working with metrical constituents or with syllabic structure. The relevant point for the current discussion is that, if this proposal is correct, children are working with feet and syllables at the same time, which indicates, in consonance with the hypothesis explored here, that the acquisition of prosody does not take place in a bottom-up fashion.

Pre-tonic Omission
Like post-tonic omission, pre-tonic omission occurs throughout the period analyzed for both subjects. In cases in which the utterances had more than one […] principle be metrically more salient and be preserved by children.⁹ When these syllables were the most distant and did not have a filled onset, the omission was categorical;¹⁰ when the onsets were filled, the choice between the most distant and the closest syllable was random. Thus, it cannot be said that children were paying attention to syllabic or metrical structure, because only in (103) does the remaining syllable have a different syllabic structure from the omitted one. In this case, the omitted syllable is precisely the most complex one (cvc), which is at odds with the literature on the subject.¹¹

Reduplication
Another process, although not common, is reduplication of the tonic syllable. This process occurs only for T., at around 2;0. Reduplication is usually analyzed as a type of segment or syllable insertion, and work on reduplication, in either acquisition or adult language, argues that it is a phenomenon of trochaic foot filling.¹² In this paper, it is considered separately from insertion because the period when it occurs and the rhythmic patterns it creates differ from those of insertion. Insertion creates wsw, ws, or sw patterns, while reduplication creates wsw and ws patterns, but never sw patterns. In other words, reduplication does not always create a three-syllable utterance with penultimate stress. If the prominence is not on the second syllable, it falls on the last one, where the intonational prominence is usually found in adult utterances. T., for instance, never reduplicates syllables in order to target the Brazilian Portuguese trochaic pattern. His reduplications are always at the right boundary of the utterance, where the intonational prominence is found in adult language. T. is therefore using reduplication to attain an intonational template, not a word stress template.

Conclusion
In this paper, I have argued that what is perceived in children's utterances at the beginning of the acquisition process is not word stress, even when the utterance is one word long. At this stage, children are not yet working at the word level; instead, they use the intonational contour as a support for acquiring word stress. They resort to different phonological processes in order to work with the intonational template they have chosen, always respecting the nucleus position of the tone group. When the 'words' chosen (or segmented from the input or from spontaneous production) are smaller than the tone group, children reduplicate syllables, insert segments and syllables, and turn diphthongs into hiatuses. When the target is larger than the tone group, children delete segments and form diphthongs. Most importantly, when the targeted word has the same number of syllables as the tone group template but not the same prominence pattern, children may move word stress in order to match the tone group template. All these processes can change the segmental sequence, but they always preserve the intonational prominence of the tune.
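The repair strategies summarized above amount to a small decision procedure. The Python sketch below makes one reading of it explicit; it is an illustrative simplification, not a model proposed in the paper. It assumes a hypothetical encoding of the template as one low pre-nuclear slot, a high nucleus, and at most one optional low post-nuclear slot, aligns the target word's stressed syllable with the nucleus, and lists the kinds of repair the text reports for each mismatch; the function and label names are invented here. Stress shift is offered only for three-syllable words, following the observation that this radical strategy is restricted to them.

```python
from typing import List

# Hypothetical encoding of the children's template:
# one low pre-nuclear slot, a high nucleus (H*), and at most one low tail slot.
PRE_SLOTS = 1
MAX_POST_SLOTS = 1

FILL_PRE = "insert a filler-sound or lengthen the nuclear syllable"
TRIM_PRE = "keep one pre-nuclear syllable and omit the rest"
TRIM_POST = "keep one post-nuclear syllable and omit the rest"
SHIFT_STRESS = "move stress onto the second syllable"


def repair_options(n_syllables: int, stress_index: int) -> List[str]:
    """List repairs that would fit a target word to the template.

    `stress_index` is the 0-based position of the word-stressed syllable,
    which is aligned with the intonational nucleus.
    """
    if not 0 <= stress_index < n_syllables:
        raise ValueError("stress_index must point at a syllable of the word")
    pre = stress_index                     # syllables before the stressed one
    post = n_syllables - stress_index - 1  # syllables after the stressed one
    options: List[str] = []
    if pre < PRE_SLOTS:
        options.append(FILL_PRE)
    elif pre > PRE_SLOTS:
        options.append(TRIM_PRE)
    if post > MAX_POST_SLOTS:
        options.append(TRIM_POST)
    # Stress shift: rare, and only for words exactly as long as the full template.
    full_length = PRE_SLOTS + 1 + MAX_POST_SLOTS
    if n_syllables == full_length and stress_index != PRE_SLOTS:
        options.append(SHIFT_STRESS)
    return options


print(repair_options(3, 1))  # []  (already weak-strong-weak)
print(repair_options(2, 0))  # ['insert a filler-sound or lengthen the nuclear syllable']
```

The second call mirrors the two-syllable trochees in (51)-(54): under this encoding the word already fits a foot, yet the template still calls for a pre-nuclear syllable.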

(55) Demuth's (1995) stages in the acquisition of prosody:
Stage I: core syllables (cv)
Stage II: minimal words (a binary foot)
Stage III: prosodic words larger than a binary foot
Stage IV: prosodic words fit the target form