Spanish was traditionally classified as a non-plastic language (Vallduví and Engdahl 1996) on the basis of a presumed inability to mark focus prosodically in-situ. Nonetheless, experimental evidence has suggested that Spanish speakers of different dialects can in fact use intonational and prosodic strategies to mark focus instead of, or in addition to, syntactic strategies such as clefting or p-movement (Feldhausen and Vanrell 2014; Gabriel 2006; Gabriel, Feldhausen & Pešková 2009; Hoot 2012; Muntendam 2009; Vanrell and Fernández-Soriano 2013, among others). This supports a new classification of languages along a continuum based on the degree of use of intonational and morphosyntactic mechanisms (Face and D’Imperio 2005). In this sense, different Romance languages would occupy different positions, as prosody plays a different role in each of them. A further conclusion from previous studies is that dialectal variation still plays an important role (Dufter and Gabriel 2016; Feldhausen and Vanrell 2014). To the best of our knowledge, no study has examined the realization of focus in the Spanish spoken in Asturias, a region located in the north of Spain, nor in Asturian, the Romance language this dialect is in contact with. To shed some light on the role that prosody plays in this dialect as compared to others, the present study reports the results from a contextualized sentence completion task. In the remainder of this section, the theoretical framework and previous experimental studies are reviewed in more detail. Then, the goals and predictions for the current research are introduced.
1.1. Information structure and the different types of focus
Halliday and Hasan (1967: 27) define information structure as “the ordering of the text, independently of its construction in terms of sentences, clauses and the like, into units of information on the basis of the distinction into given and new”. The former is usually referred to as old, given, presupposed, or the topic. The latter is generally referred to as new and is introduced by means of a linguistic device known as focus (Krifka 2008; Prince 1981). Different types of focus have been distinguished in terms of their domains, the most general distinction being broad vs. narrow focus (Gussenhoven 2007; Ladd 1980; Lambrecht 1994; Roberts 1996; Selkirk 1986). The former implies that more than one constituent or even all the information in the sentence is new, while the latter entails that the focus involves only one constituent or a smaller unit. An utterance with broad focus could be the response to a question such as What happened?, as exemplified in the dialogue presented in (1).
|1)||A: What happened?|
|B: F[The singer fell off the stage at the concert]|
For the purpose of the current study, it is relevant to further classify narrow focus into different types. The terminology used in the literature varies and, as Kiss (1998) explains, the terms are not always used consistently. Kiss (1998) distinguishes information from identificational focus: Information focus is non-exhaustive and does not require movement of constituents, that is, it can be marked in-situ; identificational focus, on the other hand, is exhaustive and involves movement of focused constituents to the specifier position of a functional projection. In the present study, however, we will follow a two-way distinction that does not correspond to Kiss’s (1998) distinction between information and identificational focus, but to one adopted by a substantial number of scholars: Informational and contrastive focus. Across different studies, identificational focus seems to have been equated with corrective focus and has also been referred to as contrastive focus. This would be the type of focus used when the speaker’s intention is to direct the hearer’s attention and to make them change their background assumptions based on new information (Zimmermann and Onea 2011). The other type of focus considered, which has been referred to as informational focus (and not information focus), pertains to the introduction of new information in discourse in those cases where there is neither presupposed information nor a limited subset of possible entities (Dufter and Gabriel 2016; Zubizarreta 1998); in these cases, the focused constituent serves as an answer to a wh-question, which may be overt or covert (Erteschik-Shir 2007). Informational focus would then be non-corrective. This distinction between contrastive and informational focus is used by Culicover and Rochemont (1983), Vallduví and Engdahl (1996) and Gussenhoven (2008).
While acknowledging the confusion present in the literature, this paper adopts the terminology that has been used in other intonational studies concerned with the realization of focus in Peninsular Spanish, namely Vanrell and Fernández-Soriano (2013, 2016), as these directly inform the present study. To further clarify what we mean by these terms, the dialogue in (2) presents an example of informational focus whereas the dialogue presented in (3) constitutes an example of contrastive focus. The differences between these two types of focus in terms of their syntactic and prosodic realization will be discussed in the following section.
|2)||A: Who fell off the stage at the concert?|
|B: F[The singer] fell off the stage at the concert.|
|3)||A: The guitarist fell off the stage at the concert.|
|B: F[The singer] fell off the stage at the concert.|
1.2. Focus marking
1.2.1. Syntactic strategies
The linguistic strategies for marking the status of information in discourse vary across languages (Ladd 1996; Vallduví and Engdahl 1996; Zubizarreta 1998). Cross-linguistically, focused constituents tend to receive main prosodic prominence (Büring 2010; Jackendoff 1972; Vallduví and Engdahl 1996), which results from the combination of different acoustic features, namely pitch, duration, and intensity (Cruttenden 1986) or alignment with the edge of an intonational phrase (Büring 2010; Féry 2013). Vallduví and Engdahl (1996) show that languages differ in the strategies used to associate nuclear stress with focused constituents. Based on these differences, the authors classify languages into one of two categories: Plastic and non-plastic languages. The former group, which includes English and Dutch, can shift the position of nuclear stress without changing the syntactic structure (prominence shift). Non-plastic languages, on the other hand, modify the syntactic structure in order to place the focused constituent in a position where nuclear stress is systematically assigned. Zubizarreta (1998), Gutiérrez-Bravo (2002) and Samek-Lodovici (2005) consider Spanish a non-plastic language that makes use of syntactic strategies (e.g., word order modifications) in order to express focus. Zubizarreta (1998) argues that, in Spanish, informational subject focus can only be marked via p(rosodically motivated)-movement, that is, by moving all the defocalized material to a higher position so as to leave the focused constituent in sentence-final position, where nuclear stress is assigned (see the answer presented in example (4a)). The idea that this position is reserved for constituents conveying new information was well-established in previous literature (Bolinger 1954, 1972; Contreras 1980).
The need to use this strategy derives from the assumption that mechanisms such as anaphoric deaccentuation and prominence shift, which are productively used in Germanic languages, cannot be used in Spanish to convey informational focus. Their use, nonetheless, is accepted in contrastive focus, and would result in utterances such as the answer presented in (4b).
- ¿Quién compra novelas de fantasía cada mes?
- Who buys fantasy novels every month?
- ¿Tu hermana compra novelas de fantasía cada mes?
- Your sister buys fantasy novels every month?
For Spanish, experimental studies using question-answer pairs to elicit utterances with different focus structures have provided evidence supporting the claim that speakers do not always resort to word order variation in order to mark a specific constituent as focused: Gabriel (2006) and Gabriel, Feldhausen and Pešková (2009) for Argentina, Hoot (2012) for Mexico and Muntendam (2009) for Bolivia and Ecuador. In fact, Face and D’Imperio (2005) have proposed a new typology for the classification of languages based on the mechanisms available for the realization of contrastive focus, one that is not as rigid as the one proposed by Vallduví and Engdahl (1996) but more of a continuum. Accordingly, languages would be placed in this continuum based on the degree to which they use intonational or syntactic strategies to mark focus. Romance languages such as Spanish or Italian would then be somewhere in the middle of this continuum, as evidence from experimental studies has shown that speakers of Spanish allow for the use of word order or intonational marking of focus alone, while Italian speakers use both word order and intonation (Face and D’Imperio 2005). In this sense, Portuguese speakers have also been shown to use both syntactic (Costa 2000) and prosodic strategies (Frota 2014) in focus marking.
For Peninsular Spanish in particular, Vanrell and Fernández-Soriano (2013) and Feldhausen and Vanrell (2014) report that some of the most common strategies for the expression of informational focus are focus marking in-situ (5), p-movement (6), and clefting (7). These studies have looked at Castilian Spanish and other dialects such as the Spanish spoken in the Canary Islands and in the Basque Country. The strategies that were preferred differed based on the dialect considered, but the expression of focus in-situ was always one of the two most frequent options.
- 3s-pst-take out
- ‘María took the car out without problems’
- Snow White
- ‘Snow White brought the apple with tiredness’
- Snow White
- ‘It was Snow White who brought the apple with tiredness’
The preferred strategies, however, also differed as a result of the focused constituent’s function: In contexts of informational focus, speakers mainly used prosodic marking in-situ and p-movement (Vanrell and Fernández-Soriano 2013), as well as clefting (Feldhausen and Vanrell 2014); with contrastive focus, subjects were prosodically marked in-situ or through clefting, while the most frequent strategy used to mark contrastively focused objects was clefting. As pointed out by Dufter and Gabriel (2016) after reviewing some of the studies presented above, among many others, we should actually expect free variation with regard to the strategies chosen by speakers to convey a specific type of focus, since there is no one-to-one mapping between them. Additionally, these studies point towards the role played by dialectal variation and the need to explore other dialects, as is the goal of the present study.
1.2.2. Prosodic strategies
The Autosegmental-Metrical framework, proposed by Pierrehumbert (1980), and the language-dependent annotation systems (Tones and Break Indices or ToBI) derived from it have been employed to describe the intonational grammars of multiple languages. The labelling system created for Spanish (Sp_ToBI) was first proposed by Beckman et al. (2002) and recently revised by Hualde and Prieto (2015). The main categories used for analysis, pitch accents and boundary tones, are characterized in terms of the nature of the tone as low (L), high (H), or a combination of these two. Pitch accents describe tonal movements associated with the stressed syllable of an accented word and may be monotonal (L*, H*) or bitonal (L+H*, H+L*, L*+H, among others); in the case of bitonal pitch accents, the * is associated with the tone that is most prominent in the stressed syllable. Boundary tones are associated with the end of intonational phrases and may as well be monotonal (L%, H%), bitonal (LH%, HL%), and even tritonal (LHL%); boundary tones may also be found at the end of intermediate phrases (L-, !H-, or H-). Pitch accents in final position are referred to as nuclear pitch accents, while all the preceding ones are pre-nuclear pitch accents; the combination of a nuclear pitch accent and a boundary tone constitutes a nuclear configuration.
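The labelling conventions just described can be made concrete with a small sketch. It is only an illustration, not part of Sp_ToBI itself: the function names `describe_label` and `nuclear_configuration` are hypothetical, and the rule of counting the letters L and H to decide whether a label is monotonal, bitonal, or tritonal is an assumption that holds for the labels quoted in this section.

```python
import re

# Hypothetical helpers; they only cover the label shapes quoted in the text
# (e.g. L*, L+H*, L+<H*, L%, LH%, LHL%, L-, !H-, H-).
TONE_COUNT_NAMES = {1: "monotonal", 2: "bitonal", 3: "tritonal"}


def describe_label(label: str) -> dict:
    """Classify a tonal label by category and by number of tones."""
    if "*" in label:
        kind = "pitch accent"
    elif label.endswith("%"):
        kind = "boundary tone (intonational phrase)"
    elif label.endswith("-"):
        kind = "boundary tone (intermediate phrase)"
    else:
        raise ValueError(f"unrecognised label: {label!r}")
    # Count tone letters only; diacritics such as '!' and '<' are ignored.
    n_tones = len(re.findall(r"[LH]", label))
    return {
        "label": label,
        "kind": kind,
        "tones": TONE_COUNT_NAMES.get(n_tones, "unknown"),
    }


def nuclear_configuration(pitch_accents: list, boundary_tone: str) -> str:
    """Combine the last (nuclear) pitch accent with the final boundary tone."""
    return f"{pitch_accents[-1]} {boundary_tone}"


if __name__ == "__main__":
    for tone in ["L+<H*", "H+L*", "L*", "LHL%", "!H-"]:
        print(describe_label(tone))
    print(nuclear_configuration(["L+<H*", "H+L*"], "L%"))
```

In this sketch, every accent before the last one in the list counts as pre-nuclear, mirroring the definition given above.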
Several studies examining the realization of focus in Spanish have made use of the Sp_ToBI as well as of acoustic analyses to account for the most relevant prosodic strategies associated with the expression of different focus-structures. The most relevant findings for the purpose of the present study are reviewed below.
Pitch categories: In most varieties of Spanish, a rising pitch accent with a delayed peak (L+<H*) is found in pre-nuclear position in broad focus statements (Hualde and Prieto 2015). In nuclear position, it is possible to find a variety of pitch accents, namely L+H*, L* or H+L*. The use of L+H*, that is, a rising pitch accent with its peak aligned within the stressed syllable, has also been reported in pre-nuclear position, not only for contrastive focus (de la Mota 1997; Face 2001, 2002; Face and Prieto 2007; Gabriel, Feldhausen & Pešková 2009; Hualde 2002) but also for informational focus (Vanrell and Fernández-Soriano in press). Nonetheless, whether there is a phonological contrast between L+<H* and L+H* (shown in Figure 1) is still an open question raised by Hualde and Prieto (2015). In fact, Face (2002) found that the presence of L+H* to mark focus was not consistent in Madrid Spanish. One of the alternative strategies found by Face was the use of L*+H with a higher F0 peak when the word was marked with contrastive focus. In nuclear position, on the other hand, it is possible to find L+H* alternating with a low monotonal L* or even with H+L* in certain varieties (Hualde and Prieto 2015), regardless of the type of focus being conveyed. As a result, L+H* can appear in nuclear position with either a broad focus or a narrow focus reading, and no difference in terms of the prosodic realization (i.e., peak height, peak alignment) has been documented (Domínguez 2004).
Another intonational strategy proposed for the expression of focus in Spanish is the use of boundary tones. Nibert (2000), Face (2002, 2003) and Vanrell and Fernández-Soriano (in press) claim that a low intermediate boundary tone (L-) can be found following a focused constituent, both in contexts of informational and contrastive focus. A high boundary tone (H-), on the other hand, tends to be used to mark the end of constituents conveying given information (Hualde 2002, 2005). Nonetheless, such a high tone can also be found marking the end of syntactic constituents (Face 2003), or even following a word marked with contrastive focus (Face 2002).
The phonetic implementation: The acoustic features associated with focused constituents have been analyzed in more detail in contexts of contrastive focus but their role has not been extensively explored in contexts of informational focus. Nevertheless, there may be differences in the phonetic implementation of focal pitch accents in terms of pitch range, peak alignment or duration that could be contributing to mark focus. The Biological Codes, and more specifically the Effort Code, predict that speakers will make use of wider pitch excursions in order to assign more importance to a fragment of their speech (Gussenhoven 2004). For Spanish, however, different features have been shown to play a role. In Vanrell et al. (2013), early peak alignment was found to be consistently used by speakers to mark contrastive focus in pre-nuclear position; duration and pitch scaling, on the other hand, were not exploited as systematically, contradicting findings from previous studies (de la Mota 1997).
1.3. Present study
The aim of the present study is to provide more insight into the intonational and prosodic strategies used in the expression of focus in Spanish. This study is innovative in that it considers a variety that has not been explored from this perspective before, that is, the Spanish spoken in the northern region of Asturias (see Figure 2). The interest in this dialect comes from the fact that it is in contact with another Romance language, Asturian. Arias-Cachero Cabal (2009) explains that although the exact number of Asturian speakers is unknown (estimates point to 20 to 30% of the population), almost everyone in the region is able to understand the language. Due to the linguistic interference between the two languages, most of the people in Asturias speak a hybrid variety called amestáu, which results from the influence of Asturian on various aspects of the Spanish spoken in Asturias (Dyzmann 2000; González-Quevedo 2001). Few studies have provided an exhaustive description of the intonational grammar of Asturian, or even Asturian Spanish, within the AM framework. Alvarellos et al. (2011) present a phonetic analysis following the parameters of the AMPER project but using ToBI notation to account for the phonological value of the contours found in the varieties of Asturian considered. They found that, in neutral declaratives, L+<H* is used in pre-nuclear position, while H+L* L% is the most common nuclear configuration. Troncoso-Ruiz and Elordieta (2017) found the same nuclear configuration in amestáu and in Asturian Spanish.
In order to examine the differences in the prosodic realizations of focus, a contextualized sentence completion task was designed so as to elicit examples of the three most common syntactic strategies reported for Castilian Spanish (Feldhausen and Vanrell 2014; Vanrell and Fernández-Soriano 2013): Focus-marking in-situ, clefting, and p-movement. In this study, the analysis will concentrate on the realization of utterances with prosodic marking of focus in-situ. Considering the theoretical frameworks discussed above as well as the conclusions drawn from previous experimental research, the following section will present the research questions guiding this study as well as the hypotheses for the variety of Spanish under study.
1.3.1. Research questions and hypotheses
The overarching question guiding this study is whether speakers of Asturian Spanish use prosody to mark the informational status of an expression. To provide an answer to this question, the research questions guiding this study are: 1) can the nature of in-situ narrow focus marking be captured by phonological categories (pitch accents and boundary tones) distinct from those used to realize accents in non-focused constituents (i.e., broad focus pre-nuclear accents)? and 2) does the phonetic implementation of focal pitch accents (including prosodic parameters like pitch range, peak alignment, or duration) contribute to the expression of different types of focus?
It is predicted that speakers of Asturian Spanish will use prosody to mark focus in-situ, as shown in previous studies for speakers of other varieties (Gabriel, Feldhausen & Pešková 2009; Vanrell and Fernández-Soriano 2013; in press). Thus, taking into consideration that L+<H* is the default pitch accent in pre-nuclear position in broad focus declaratives in Asturian (Alvarellos et al. 2011), the hypothesis for the first research question (H1) is that a phonological contrast based on alignment will be one of the strategies used to signal new information, as is the case in other dialects of Spanish (Estebas-Vilaplana and Prieto 2008; Gabriel 2006; Hualde and Prieto 2015). As a result, there will be a phonological contrast between rising pitch accents: L+<H* vs. L+H*. Specifically, L+H* will be associated with focused constituents in pre-nuclear (non-final) position; pre-nuclear pitch accents realized on non-focused constituents or in broad focus statements will be associated with the pitch category L+<H*. Additionally, falling boundary tones will be used to mark the end of focused constituents (Face 2003; Nibert 2000; Vanrell and Fernández-Soriano in press).
Regarding the second research question, it is predicted that the phonetic implementation of focal pitch accents will play an additional role (H2). Features such as pitch range, peak alignment, and duration will add to the pitch categories to convey focus based on the premises of the Effort Code (Gussenhoven 2004). As a result, pitch accents on focused constituents will display wider pitch range, earlier peaks, and longer duration than the pitch accents associated with non-focused constituents; these features will be much more prominent in contexts of contrastive focus (de la Mota 1997; Vanrell et al. 2013). It is important to note, then, that this hypothesis is mostly concerned with the phonetic realization of pitch accents, which in some cases may result in phonological distinctions (e.g., differences in alignment result in a phonological distinction between L+<H* and L+H*). Nonetheless, as Face (2002) showed, this is not always the case, hence the need to further describe the phonetic implementation of focal and non-focal pitch accents.
The next section presents the methodology employed in data collection and analysis (Section 2). The results are then reported quantitatively (Section 3) and discussed in relation to the adopted theoretical frameworks (Section 4). Finally, the contribution of the findings is summarized and some final remarks are presented (Section 5).
A discourse completion task similar to the one used in Prieto and Roseano (2010) was employed to elicit utterances with different information structure configurations. A sentence completion technique was used to obtain the target utterances. The design incorporates situations that introduce an information gap in the conversation held by two interlocutors (one of them being identified with the participant). In all the situations, this gap is resolved later on and the participant is asked to provide the missing information to the person who was also unaware of it in the first place. Such a design was chosen, as opposed to question-answer pairs (employed in most intonational studies), in an attempt to find an elicitation method that overcomes one of the drawbacks of that methodology: The tendency shown by native speakers to respond with a single word instead of full sentences (Ortega-Llebaria and Colantoni 2014). By building a situation in which the information is introduced little by little and the question is asked in a more implicit or covert way, the use of a full sentence in the answer need not be as unnatural as when all the given information has already been used in an overt question. The fact that the new information has already been introduced in the paragraph provided to each participant should not be problematic since, by putting themselves in the situation, they will still bear in mind that it is not part of the common ground, and therefore this should not prevent them from focalizing it. Another advantage of this methodology is that participants are not being asked to just read a given response; instead, they are prompted to produce a specific type of structure in a more spontaneous manner. Nonetheless, they are still being forced to answer the question in a certain way, which could be considered problematic.
Still, we considered this to be a more adequate technique since the goal of the current paper is to provide an account of the intonational and prosodic parameters used in the realization of focus in-situ, if any, under the assumption that this is one of the strategies in free variation for this variety of Spanish (Dufter and Gabriel 2016). Below is an example of one of the situations used in the present study:
|8)||Tu jefe te comenta que alguien pasó la noche en la oficina. No puedes ayudarle, porque no sabes quién fue, pero después tu compañero te comenta que fue Andrea así que vuelves a la oficina del jefe y le dices…|
|‘Your boss tells you that someone spent the night in the office. You cannot help him, because you do not know who it was, but later your colleague tells you that it was Andrea, so you go back to your boss’s office and tell him…’|
Situations similar to the one presented above were used to elicit utterances with informational focus in three possible syntactic configurations: (a) an unmarked word order and prosodic marking in-situ, (b) clefting, and (c) p-movement. In order to do so, the beginning of the sentence was presented immediately after the situation in one of the following manners depending on the condition:
|b.||Fue… ‘It was’|
|c.||Pasó la noche en la oficina… ‘Spent the night at the office…’|
Three versions of the experiment were created in order to elicit the three possible configurations for each of the situations eliciting informational focus without presenting the same one three times to the same participant. This would allow for the collection of more comparable data while preventing participants from incorporating the new information introduced in previous situations into the common ground. In addition to one practice item that allowed participants to become familiarized with the task and understand the sentence completion technique, 18 situations were created for the elicitation of informational focus (half of them with subject focus and the other half with object focus), three for the elicitation of broad focus, and eight for the elicitation of contrastive focus (half for subject focus and the other half for object focus). For the latter, only prosodic marking in-situ and clefting were elicited; p-movement was excluded from this condition since previous studies did not report on the availability of this configuration in the realization of contrastive subject focus. The target sentences also contained an indirect object or an adjunct, so that the object would not be in final position when focus was prosodically marked in-situ. Subjects, objects and adjuncts were consistently paroxytones, but this was not always possible in the case of verbs, which were almost always oxytones as a result of using verbs in the past tense throughout the situations. Table 1 presents a schematic representation of how the items were distributed in the experimental design.
|Focused constituent|Informational focus|Contrastive focus|Broad focus|
|Subject focus|9: 3 with prosodic marking in-situ, 3 with clefting, 3 with p-movement|4: 2 with prosodic marking in-situ, 2 with clefting||
|Object focus|9: 3 with prosodic marking in-situ, 3 with clefting, 3 with p-movement|4: 2 with prosodic marking in-situ, 2 with clefting||
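The distribution in Table 1 can be checked with a short sketch. The item counts are read off the table and the surrounding text; the rotation used to spread the three configurations over the three versions is an assumption (a simple Latin-square scheme), since the text does not specify how situations were assigned, and the names `STRATEGIES` and `assign_strategy` are hypothetical.

```python
# Item counts per participant, read off Table 1 and the text.
informational = {"subject": 9, "object": 9}
contrastive = {"subject": 4, "object": 4}
broad = 3
total_items = sum(informational.values()) + sum(contrastive.values()) + broad

STRATEGIES = ["in-situ", "clefting", "p-movement"]


def assign_strategy(situation_index: int, version: int) -> str:
    """Assumed Latin-square rotation: each situation appears with a different
    configuration in each of the three versions of the experiment."""
    return STRATEGIES[(situation_index + version) % len(STRATEGIES)]


# The nine informational subject-focus situations in each version.
versions = {
    version: [assign_strategy(i, version) for i in range(informational["subject"])]
    for version in range(3)
}

if __name__ == "__main__":
    print("items per participant:", total_items)
    for version, plan in versions.items():
        print(version, plan)
```

Under this assumed rotation, each version contains three situations per configuration, as in Table 1, and no participant sees the same situation twice. The total of 29 experimental items per participant agrees with the number of items reported for the analysis.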
The experimental task was performed in a place convenient for the participants where no background noise would compromise the quality of the recordings (i.e., their home or a language lab). Before participants started the experimental task, they were asked to sign the informed consent form and then complete a linguistic background questionnaire, which included questions concerning their native language, their birthplace, their parents’ origin, and their linguistic practices in terms of the use of Spanish, Asturian, or a combination of both with family and friends, as well as in more formal contexts. Finally, they were asked to assess the degree of influence of Asturian on their Spanish when speaking with family and friends or in other contexts (i.e., at work or school) on a scale from 1 to 10 (1 meaning that the influence is minimal and 10 meaning that the influence is considerable).
Upon completion of the background questionnaire, participants were presented with the experimental task using a PowerPoint presentation. They read each situation quietly to themselves and then responded as naturally as possible, completing the sentence presented to them immediately after. They were recorded with a Logitech USB Headset (model A-00009) attached to a MacBook Pro laptop using the software Audacity. The recordings were digitized at a 44,100 Hz sample rate and a 16-bit amplitude resolution.
The following analysis presents the results from twelve speakers of Asturian Spanish, who were presented with one of the three versions of the experiment (four participants per version). Three more speakers were recorded but their data were discarded from the analyses due to the high rates of disfluency in their speech. The mean age of the participants was 30 (range: 23–40). All participants were born in Asturias, although five of them were born to parents who were not raised in Asturias.1 All participants considered Spanish to be their first language. In spite of that, they all acknowledged their use of Spanish “with an Asturian accent” or a combination of both Spanish and Asturian in informal contexts (with family and friends); this influence was, nonetheless, minimized in more formal contexts, such as the workplace or the university; the average degree of influence that they reported in the speech they use with friends and family was 5 (ranging from 1 to 9 and 1 to 8 respectively), as opposed to 3.6 in more formal contexts (ranging from 1 to 6). Table 2 below presents the values provided by each participant for each of the contexts.
In order to perform the analysis, the utterances were extracted from each recording. In total, 348 utterances were elicited (29 items × 12 participants). Out of these utterances, 70 were discarded from the analysis for various reasons: Doubt leading to question intonation or long pauses in between constituents (29), disfluency (10), non-target-like utterances due to the use of pseudo-clefts or non-full sentences, among other reasons (22), background noise and laughter (9). Out of the remaining 278 utterances, the present analysis concentrates on the prosodic realization of 59 utterances with informational focus prosodically marked in-situ, 26 utterances with broad focus, and 39 utterances with contrastive focus prosodically marked in-situ.
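The counts reported above can be verified with a few lines of arithmetic. The dictionary keys are shorthand labels introduced here for illustration; all the numbers come from the text.

```python
# Utterances elicited: 29 items from each of 12 participants.
elicited = 29 * 12

# Discarded utterances, by reason given in the text.
discarded = {
    "doubt or long pauses": 29,
    "disfluency": 10,
    "non-target-like": 22,
    "background noise and laughter": 9,
}
remaining = elicited - sum(discarded.values())

# Subset analysed in the present paper (in-situ marking and broad focus).
analysed = {
    "informational focus in-situ": 59,
    "broad focus": 26,
    "contrastive focus in-situ": 39,
}

if __name__ == "__main__":
    print(elicited, sum(discarded.values()), remaining, sum(analysed.values()))
```

The utterances that remain but are not analysed here correspond to the other syntactic configurations (clefting and p-movement), which fall outside the scope of this paper.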
Each utterance was annotated using Praat (Boersma and Weenink 2015). Following Vanrell et al. (2013), annotations included the orthographic transcription, boundaries marking the beginning and the end of the stressed syllable: oasy (onset of accented syllable) and ofasy (offset of accented syllable) for focalized constituents and odsy (onset of –accented- defocalized syllable) and ofdsy (offset of –accented- defocalized syllable) for non-focused constituents. Additionally, the point at which the highest tone within a pitch accent was realized, as well as the lowest one when it was not aligned with the beginning of the stressed syllable, were manually marked using the Praat functions that allow for the identification of the minimum and the maximum pitch in a specific segment (even in cases where a plateau was found); corrections were then manually performed in cases of pitch track errors. These segmental labels facilitated the manual extraction of pitch range, alignment and duration values. Moreover, the tones associated with the stressed syllable of words bearing prominence as well as the tones associated with the end of intermediate and intonational phrases were annotated following the latest version of the Sp_ToBI (Hualde and Prieto 2015). An example of the coding is presented in Figure 3. These tonal labels were transferred into a spreadsheet, where further coding was carried out.
In the spreadsheet, the nucleus of each syntactic constituent was coded for the following variables: a) Item; b) Participant; c) Focus condition, the relevant ones to this study being: B (Broad focus), IS (Informational subject focus expressed in-situ), IO (Informational object focus expressed in-situ), CS (Contrastive subject focus expressed in-situ), and CO (Contrastive object focus expressed in-situ); d) Syntactic function: Subject (S), Verb (V), Object (O), Adjunct (A), and Indirect Object (I); e) Pitch accent; f) Boundary tone; g) Range, which was calculated by subtracting the minimum from the maximum F0 value in the case of bitonal pitch accents; in order to normalize F0 values, measurements in Hz were converted to semitones using the formula (12*log2(Hz) - 12*log2(origin)), since this scale has been shown to be the most appropriate one in order to obtain normalized values (Nolan 2003); a value of 0 was assigned to monotonal pitch accents; h) Alignment, which applies only to rising bitonal accents and corresponds to the distance in milliseconds from the F0 peak to the end of the stressed syllable; i) Duration of the stressed vowel; duration measurements were normalized by calculating a z-score for each speaker.
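The three continuous measures can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual scripts: the function names are hypothetical, the range is taken as the difference between the semitone values of the F0 maximum and minimum (so the reference value `origin_hz` cancels out), the sign convention for alignment (positive when the peak precedes the syllable offset) is assumed, and the z-score uses the sample standard deviation.

```python
import math
from statistics import mean, stdev


def hz_to_semitones(hz: float, origin_hz: float) -> float:
    """Semitone conversion as given in the text: 12*log2(Hz) - 12*log2(origin)."""
    return 12 * math.log2(hz) - 12 * math.log2(origin_hz)


def pitch_range_st(f0_min_hz: float, f0_max_hz: float, origin_hz: float = 1.0) -> float:
    """Range of a bitonal accent in semitones (maximum minus minimum)."""
    return hz_to_semitones(f0_max_hz, origin_hz) - hz_to_semitones(f0_min_hz, origin_hz)


def peak_alignment_ms(peak_time_s: float, syllable_offset_s: float) -> float:
    """Distance in ms from the F0 peak to the end of the stressed syllable.
    Assumed convention: positive values mean the peak falls before the offset."""
    return (syllable_offset_s - peak_time_s) * 1000


def zscores(durations: list) -> list:
    """Z-score normalization of one speaker's vowel durations."""
    m, s = mean(durations), stdev(durations)
    return [(d - m) / s for d in durations]


if __name__ == "__main__":
    # Invented values for illustration only.
    print(pitch_range_st(180.0, 240.0))
    print(peak_alignment_ms(0.45, 0.50))
    print(zscores([0.08, 0.10, 0.12]))
```

Because a doubling of frequency corresponds to 12 semitones, a rise from 100 Hz to 200 Hz yields a range of 12 regardless of the origin chosen.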
R Studio (R Core Team 2014) was used to run generalized additive regression models, given the non-parametric nature of the data. The package used for that purpose was mgcv (Wood 2011). Two generalized additive regression models with binary dependent variables were created, one in which the dependent variable was the presence or absence of L+H*, and another one in which the use of an intermediate boundary tone or lack thereof was the dependent variable. The goal was to test whether the presence of a pitch category (or lack thereof) was significant in the realization of the specific constituent under study (i.e., the subject or the object) based on the focus condition (fixed effect) in which they had been realized. Furthermore, three generalized additive regression models with linear dependent variables were fit to the data. In each one of them, the dependent variable was one of the prosodic features considered in the present study (i.e., pitch range, peak alignment, and duration). The goal was to determine whether there were differences in their manifestation based on two fixed effects: Focus condition and function (i.e., subject, verb, object, adjunct).
In order to determine whether there is intonational and prosodic marking of focus in-situ in Asturian Spanish, this section will describe the use of pitch accents and boundary tones, on the one hand, and the role played by other prosodic features (i.e., pitch range, peak alignment, and duration) on the other. Utterances with informational subject focus (IS) and informational object focus (IO) will be taken as the point of comparison.
3.1. Pitch categories
3.1.1. Subject focus
Table 3 shows the distribution of pitch accents placed on subjects in five different focus conditions: Informational subject focus (IS), contrastive subject focus (CS), broad focus (B), and non-focused in contexts of contrastive object focus (CO), and informational object focus (IO). L+<H* is the most common pitch accent produced on the subject across conditions. Its use, nonetheless, seems to decrease in contexts of subject focus at the expense of an increase in the use of other pitch accents, such as L+H* and H*. The results from the generalized non-parametric regression model with the presence or the absence of L+H* as the dependent variable did not reveal any significant increase of its presence or lack thereof in any of the focus conditions as compared to its presence in contexts of informational subject focus.
Figure 4 shows the distribution of boundary tones employed after the subject in all the focus conditions described above. A clear difference can be established between utterances with subject focus and utterances with broad focus or object focus, since in the last two contexts, only two possibilities arise: Either the use of a high boundary tone (H-) or the absence of a boundary tone. In contexts of subject focus, on the other hand, the use of a variety of boundary tones is more common, and even more so in contexts of contrastive focus. Nonetheless, the results from the regression model did not reveal any significant differences between conditions regarding the presence, or lack thereof, of boundary tones.
Examples of some of the configurations found in the data to convey informational subject focus are shown in Figures 5 and 6. Figure 5 shows the realization of a rising pitch accent with a late peak (L+<H*) followed by a high (H-) intermediate boundary tone on a focused subject. In the utterance shown in Figure 6, the subject is realized with an earlier peak (L+H*) followed by a falling intermediate boundary tone (!H-).
3.1.2. Object focus
In contexts of object focus marked in-situ, the picture is more complicated as some participants omitted the adjunct or the indirect object, leaving the object in nuclear position. While participants were encouraged to produce full sentences, this was not always the case. As repeating the answer adding the missing element would result in a less natural utterance and the pragmatic information could be disregarded in an attempt to produce the target sentence, participants were not asked to provide a new response. Since the interest of this study is to determine whether intonational marking of focus can take place in-situ even if the focused word is not in final position (the default position for prosodic prominence in Spanish), only the data from utterances with non-final objects will be discussed. In pre-nuclear position, the pitch accents assigned to focused objects vary considerably (see Table 4). The most common one in all the non-contrastive contexts is L+<H*, the pitch accent that is known to be associated with broad focus readings. It is interesting to note, nonetheless, that the percentage of uses of L+H* on the object decreases in utterances with subject focus and is non-existent in utterances with broad focus. In the utterances with contrastive focus displaying non-final objects (two in cases of object focus and four in cases of subject focus) there is variation as well. Nonetheless, the results from the regression model indicate that the presence of L+H* is not significantly different in any of these conditions as compared to the IO condition.
The proportion of intermediate boundary tones produced when objects were in non-final position is presented in Figure 7. While there seems to be a tendency towards an increase in the use of boundary tones after focalized objects, the results from the regression model did not reveal any significant differences. When marking contrastive focus, the use of L- is much more consistent. However, given the reduced number of utterances expressing contrastive focus in pre-nuclear position, these results should be taken with caution.
The examples shown in Figures 8 and 9 below present two different utterances with informational focus as produced by the same participant. In the first one, the object is realized with a L+H* pitch accent; in the second one, the object is realized with a L+<H* pitch accent followed by a H- boundary tone.
In summary, and as the examples presented above suggest, there is variation, not only between participants, but also within participants. With regard to the pitch accents used, L+<H* was the most common one in contexts of informational subject focus, followed by L*+H, which is another category that corresponds to the underlying (LH)* category. The pitch accent that was predicted to appear in this condition, L+H*, was only used by participants 1, 3 and 10, and out of these, only participant 3 used it consistently. Interestingly, participant 3 is the one who reported the lowest degree of influence of Asturian in the way he speaks Spanish, although he acknowledges that he speaks a combination of Asturian and Spanish. The use of L+H*, nonetheless, increased in contexts of contrastive focus. In contexts of informational object focus, L+<H* was also the most common pitch accent while the use of L*+H was almost null. More participants produced L+H* in this context (participants 2, 3, 11, 12, and 13) but none of them used it consistently. These participants, as well as those who did not use L+H* at all, reported different degrees of influence of Asturian in the way they speak. Thus, it is not possible to draw any conclusions on what the influence of Asturian, if any, would be. Due to the deletion of the adjunct, no conclusions can be drawn with regard to the realization of contrastive object focus in pre-nuclear position.
Regarding the individual variation found in terms of the intermediate boundary tones used in contexts of informational focus, it is interesting to note that while most participants used either a high intermediate boundary tone (H-), or no boundary tone at all after the focused constituent, some participants made use of different boundary tones. In contexts of subject focus, participant 3 used !H- consistently after the focused subject while participant 10 used a bitonal intermediate boundary tone (LH-) in one of the utterances in this condition. In cases of object focus, as was the case with pitch accents, more variation was found between participants. In this context, participant 3 used L- consistently, and participant 10 made use of this intermediate boundary tone once. In addition, participants 2 and 13 used !H- and LH- as well in some of their utterances. With regard to contexts of contrastive focus, the results point to an increase in the use of intermediate boundary tones and specifically, an increase in the use of L-, although no significant differences were found.
3.2. Prosodic features
In order to determine whether the phonetic implementation of the focal pitch accent (or pitch accents on post-focal material) contributed to the expression of focus, the use of features such as pitch range, peak alignment, and duration was further explored. In other words, the goal is to determine whether speakers are employing specific prosodic strategies instead of or in addition to the intonational ones described above. The manifestation of these features in each content word in all the different conditions will be compared first to their equivalent in utterances with informational subject focus, and then to their equivalent in utterances with informational object focus; thus, the analysis will examine the prosodic realization of the relevant constituents (i.e., the subject and the object, respectively) as well as that of other constituents, including post-focal material.
3.2.1. Pitch range
Figure 10 shows the pitch range values reported for each content word in each condition.
Subject focus: Taking utterances with informational subject focus (IS) as the point of comparison, no significant differences were found in the realization of subjects, although there was a tendency for them to be produced with a wider pitch range in utterances with contrastively focused objects (CO). Regarding the realization of post-focal material, indirect objects in the IS condition were produced within a significantly narrower pitch range than those in utterances with informational object focus (IO) (β = 2.52, SE = 1.13, t = 2.23, p < 0.05).
Object focus: Taking utterances with informational object focus (IO) as the point of comparison, the results from the regression model reveal that objects were produced within a significantly narrower pitch range in utterances with broad focus (B) (β = –1.16, SE = 0.57, t = –2.04, p < 0.05) and utterances with contrastive subject focus (CS) (β = –1.43, SE = 0.67, t = –2.12, p < 0.05). The realization of the remaining constituents was not significantly affected by the type of focus being conveyed.
3.2.2. Peak alignment
Figure 11 shows the peak alignment values for each content word produced with a rising pitch accent in each condition. Since peak alignment was measured as the distance from the maximum F0 point to the end of the stressed syllable, the value of 0 represents the offset of the stressed syllable.
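As a rough illustration of this measure, the sketch below computes the distance from the F0 peak to the syllable offset. The function name, the input format (times in seconds, as Praat reports them), and the sign convention (syllable offset minus peak time, so that positive values correspond to peaks inside the stressed syllable and negative values to peaks after its offset) are assumptions; the text only states that 0 corresponds to the offset of the stressed syllable.

```python
# Rising bitonal accents named in the text as realizations of (LH)*.
RISING_ACCENTS = {"L+H*", "L+<H*", "L*+H"}


def peak_alignment_ms(accent, peak_time_s, syllable_offset_s):
    """Distance in ms from the F0 peak to the end of the stressed syllable.

    Assumed sign convention: offset minus peak, so
      > 0  peak realized before the syllable offset (earlier peak),
      = 0  peak aligned with the syllable offset,
      < 0  peak realized after the syllable offset (later peak).
    Returns None for accents that are not rising bitonal accents,
    since alignment was only coded for those.
    """
    if accent not in RISING_ACCENTS:
        return None
    return (syllable_offset_s - peak_time_s) * 1000.0
```

For instance, under this convention a peak located 50 ms before the end of the stressed syllable yields a value of about 50, while a peak displaced 50 ms into the post-tonic syllable yields about –50.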
Subject focus: Taking utterances with informational subject focus (IS) as the point of comparison, the regression model indicates that subjects in this context displayed significantly later peaks than those in utterances with contrastive subject focus (CS) (β = 38.19, SE = 15.55, t = 2.45, p < 0.05) or informational object focus (IO) (β = 26.43, SE = 12.71, t = 2.07, p < 0.05). With regard to the realization of post-focal material, it was found that objects in the IS condition were realized with significantly later peaks as compared to those in contexts of contrastive object focus (CO) (β = 79.38, SE = 34.73, t = 2.28, p < 0.05) or informational object focus (IO) (β = 29.41, SE = 14.81, t = 1.98, p < 0.05).
Object focus: When comparing the peak alignment patterns in utterances with informational object focus (IO) with those in the remaining conditions, it was found that objects display significantly earlier peaks in this condition than objects in utterances with informational subject focus (IS) and in utterances with broad focus (B) (β = –46.88, SE = 19.23, t = –2.43, p < 0.05). No significant differences were found when comparing informationally and contrastively focused objects but it seems that alignment alone can favor the distinction between focused and non-focused objects. Furthermore, no significant differences were found in the realization of the remaining constituents.
3.2.3. Duration
Figure 12 shows the normalized duration of the stressed vowel for each constituent in each condition.
Subject focus: This is the prosodic feature that gave rise to the most significant differences when taking utterances with informational subject focus (IS) as the baseline. The results from the regression model reveal that, for subjects, the stressed vowel was significantly longer when they were contrastively focused (CS) (β = 1.06, SE = 0.26, t = 3.99, p < 0.001) but significantly shorter in utterances where the object was contrastively focused (CO) (β = –0.48, SE = 0.24, t = –2.03, p < 0.05). This allows for a distinction between different focus types: Contrastively focused subjects displayed longer stressed vowels than informationally focused subjects which, in turn, displayed significantly longer stressed vowels than non-focused subjects in contexts of contrastive object focus. No significant differences were found between informationally focused subjects and subjects in broad focus statements. Regarding the realization of post-focal material, the only significant difference concerned objects, which displayed significantly shorter stressed vowels in the IS condition than when they were contrastively focused (CO) (β = –0.78, SE = –0.23, t = –3.03, p < 0.01).
Object focus: Interesting differences in terms of duration arise as well when the point of comparison is utterances with informational object focus (IO). The results from the regression model reveal that the stressed vowel in informationally focused objects is also significantly shorter than that of objects marked with contrastive focus (CO) (β = –0.67, SE = 0.23, t = –2.91, p < 0.01) establishing, as was the case with subjects, a distinction between informational and contrastive focus. With respect to the realization of the remaining constituents, it was found that subjects displayed significantly longer stressed vowels when contrastively focused (CS) than when produced in utterances with informational object focus (β = 1.19, SE = 0.26, t = 4.58, p < 0.001). Furthermore, adjuncts displayed longer duration of their stressed vowels in utterances with broad focus (B) than in utterances with informational object focus (β = 0.60, SE = 0.24, t = 2.43, p < 0.05), which could be related to the hypoarticulation of post-focal material.
To summarize, these results indicate that in the realization of subjects, none of the prosodic parameters considered were relevant in the distinction between focused and non-focused subjects. However, alignment and duration were used to differentiate informational and contrastive subject focus: Contrastively focused subjects displayed earlier peaks and longer duration than informationally focused subjects. Regarding the realization of objects, it was found that alignment and pitch range allowed for the distinction between focused and non-focused objects, since informationally focused objects displayed earlier peaks and increased pitch range. In order to distinguish informationally from contrastively focused objects, the most relevant cue was duration, which was longer in contrastive focus contexts. Pitch range did not have any significant effect in the contrast between informational and contrastive focus, either for subjects or for objects. Finally, some differences were found in the realization of the final constituent: a) Indirect objects were realized within a much narrower pitch range when the subject was informationally focused than when the object was focused; b) The stressed vowel in adjuncts produced in utterances with informational object focus was significantly shorter than in broad focus contexts. The relevance of these findings will be discussed in the following section.
The production results presented above provide some insight on the prosodic realization of focus in Asturian Spanish. First, the analysis concentrated on the use of pitch accents and boundary tones. Then, the manifestation of other prosodic cues (i.e., pitch range, alignment, and duration) was further explored in order to determine their role in the phonetic implementation of focus in-situ.
The first research question and hypothesis (H1) were concerned with the use of pitch accents and boundary tones. It was predicted that a phonological contrast based on alignment would be established between rising pitch accents: L+<H* vs. L+H*. Thereby, L+H* would be associated with focused constituents while pitch accents realized on non-focused constituents or broad focus statements would be realized with a later peak (L+<H*) when produced in non-final position. This was not the case, since the most common pitch accent across conditions was L+<H*. The use of L+H* increased slightly in utterances expressing contrastive focus but the difference was not significant.
Interesting trends were found in the use of intermediate boundary tones after the focused constituent. In this sense, a variety of boundary tones were almost consistently used in cases of contrastive focus, and more than 50% of the time in contexts of informational focus; after non-focused constituents and in cases of broad focus, on the other hand, the use of intermediate boundary tones decreased. Differences, however, did not reach significance. Nonetheless, the employment of intermediate boundary tones cannot be considered as the sole mechanism allowing for the distinction of different types of focus, as there was no division of labor between different boundary tones, and H- was the most common one, regardless of the strength of the focus being conveyed.
These results suggest that pitch categories alone cannot account for the realization of focus in Asturian Spanish, as opposed to what other studies have shown for other dialects of Peninsular Spanish (Face 2001; Nibert 2000; Vanrell and Fernández-Soriano in press), at least in the type of situations used in this experimental task to elicit informational focus. Interestingly, the configuration reported in some of the previous studies, that is, L+H* followed by a falling intermediate boundary tone (L-), was found most consistently in the speech of the participant who acknowledged the lowest degree of influence of Asturian in his Spanish. While this may be a simple coincidence, it would be worth exploring this trend further to determine whether speakers of Asturian Spanish disfavor the use of intonation as a way to mark focus and prefer to use syntactic strategies such as clefting or p-movement. This could explain the lack of use of specific phonological categories in the expression of focus.
Despite the lack of a phonological distinction between L+H* and L+<H*, speakers may still be using prosodic (intonational and non-intonational) parameters differently to signal the status of the information conveyed in their utterances. In this line, the second hypothesis (H2) stated that focused constituents would be realized with a wider pitch range, earlier peaks, and longer duration, and that these features would be exploited even more in contexts of contrastive focus. The individual analysis of subjects and objects revealed that, depending on the syntactic function of the focused word, different prosodic features could become relevant.
In contexts of informational subject focus, few prosodic features seemed to be used to highlight the information status of the subject. Subjects marked with contrastive focus, on the other hand, were produced with earlier peaks and longer duration than informationally-focused subjects. This suggests that the prosodic features realized on the subject only become relevant in the expression of contrastive focus, while they make no difference in contexts of informational or broad focus. The prosodic realization of objects, on the other hand, does differ as a result of the information status. Informationally focused objects displayed wider pitch range and earlier peaks than objects in broad focus statements. With respect to the distinction between contrastive and non-contrastive focus in the realization of objects, duration was the most relevant prosodic feature (i.e., longer stressed vowels were produced in contrastively focused objects) while pitch range and alignment did not differ significantly. All these parameters involve the use of an increased effort with the purpose of highlighting a specific constituent, as predicted by the Effort Code (Gussenhoven 2004). As suggested by Baumann et al. (2007), parameters other than pitch range can be exploited to mark focus.
The prosodic realization of other functions besides subjects and objects also points to the relevance of prosody in the distinction of utterances with different types of focus. Non-focused constituents tended to be realized with later peaks and shorter duration (e.g., objects in contexts of informational focus on the subject or subjects in utterances where the object was contrastively focused). It is interesting to note as well that the prosodic realization of adjuncts in utterances with broad focus was characterized by the use of longer stressed vowels as compared to adjuncts in utterances with informational object focus. In contexts of broad focus, the last constituent is the one supposed to be the most prominent one within the utterance; the reduced duration of the stressed vowel in adjuncts produced in contexts of focus on the object could then be the result of the hypoarticulation that characterizes the realization of post-focal material, as suggested by Vanrell and Nadeu (2015). The results from the present study, however, do not provide any evidence in favor of the use of deaccentuation or post-focal compression that has been reported in previous studies (Domínguez 2004; Labastía 2006; Vanrell and Fernández-Soriano in press), as no differences were found in terms of pitch range in the realization of all the different types of informational focus in Asturian Spanish.
In light of the results presented above, this study points towards an asymmetry between subjects and objects. As mentioned above, the phonetic implementation of focal pitch accents realized on informationally focused subjects was not different from that of subjects in broad focus statements. The realization of subjects only differed significantly when comparing informational and contrastive focus, since contrastively focused subjects were realized with earlier peaks and longer stressed vowels. Objects, on the contrary, displayed a different prosodic realization if they were informationally focused as compared to when they were produced in utterances with broad focus, since they were produced with an increased pitch range and earlier peaks. Furthermore, contrastively focused objects differed from informationally focused objects, since they were realized with longer stressed vowels. The immediate consequence of this asymmetry is that prosodic marking in-situ of informational subject focus does not seem to be available to Asturian Spanish speakers. More needs to be investigated about the reasons why the phonetic realization of objects is more susceptible to modification as a result of the informational context when the canonical order is maintained and what the consequences of this are for the grammar.
The experimental design proposed in the present study, while innovative, may have also motivated the asymmetry described above. The discourse completion task combined with a sentence completion technique avoided the use of one-word responses, although the pragmatic nature of the responses elicited in this manner may have differed from that of utterances elicited using question-answer pairs, since the communicative situation is slightly different. Additionally, the use of three dots may not have been the ideal method to prompt participants to complete the sentence, as three dots are associated with different communicative intentions (e.g., doubt, unfinished statement, etc.), each one of them associated with a specific intonational pattern characterized by the use of continuation rises or sustained pitch. Instead, it might have been better to use a long stretch of an underlined empty space, as in (10). The unbalanced nature of the situations included in the experimental design to elicit contrastive focus (2 instead of 3 for strategy and type of focus) was another flaw of the experimental design. While the intention was simply to reduce the length of the experiment under the assumption that more consistency would be found in those utterances, participants did not always produce the expected response (they tended to omit the adjunct/indirect object), which in turn resulted in the exclusion of several utterances from the analysis.
(10) c. Pasó la noche en la oficina_____________________________.
In summary, given the data collected in this study, it seems to be the case that speakers of Asturian Spanish do not use pitch categories systematically to mark the status of the information being introduced in the discourse. The use of the focal pitch accent L+H* found in other dialects of Spanish (Face 2001; Face and Prieto 2007; Vanrell and Fernández-Soriano in press) did not increase significantly enough to confirm its phonological role among the group of participants of this study, and it was only present in the speech of a few participants. This provides support to the idea of an underlying rising pitch accent (LH)* proposed in Hualde (2002). Thus, there would be no phonological distinction between its three possible realizations in Spanish (L*+H, L+<H* and L+H*). It is worth pointing out, nonetheless, that the pool of participants considered in this study was larger than that of most previous studies, which may have led to wider variation across speakers. That was also the case in Face (2002), a study with 20 participants, in which four different strategies were found in the realization of contrastive focus: The use of L*+H with a higher F0 peak, the use of H- or L- following the word marked with contrastive focus, and the use of L+H* with no boundary tone.
As discussed above, however, fine-grained prosodic details in the implementation of focal accents were significant despite the individual differences. This suggests that even if a specific category is not used systematically to mark focus, unlike in languages such as English or European Portuguese, speakers still make use of prosody to highlight the status of information and convey different focus strength (i.e., contrastive vs. non-contrastive). Alignment, in this regard, was still relevant. Even though peaks were not realized consistently within the stressed syllable of a focused constituent, which would have given rise to the consideration of L+H* as a phonological category, subjects were still produced with significantly earlier peaks in contexts of contrastive subject focus, as was the case with objects in utterances marked with informational focus when compared to those in broad focus statements. Furthermore, the prosodic features used for the purpose of distinguishing different focus types need not necessarily be F0-related (Baumann et al. 2007); duration was shown to play an important role in making these distinctions as well. The features that were found to be the most relevant ones in the expression of contrastive focus mostly coincide with those reported in previous studies (Vanrell et al. 2013), which in turn suggests that there need not be dialectal variation in this regard. Thus, given all this variability, the role and contribution of all these features need to be further explored in subsequent perceptual tasks. The labeling systems proposed within the AM framework, however, do not make it possible to account for these differences.
This study has provided an exploratory description of the prosodic strategies used in the realization of focus in Asturian Spanish, a variety of Peninsular Spanish that had not been described before. This was accomplished using an innovative elicitation task that had not been used in previous studies, which was aimed at favoring the elicitation of full sentences by avoiding the use of an overt wh-question. The data obtained through this elicitation technique suggested that, while there is not a systematic use of L+H* on focused constituents (as opposed to L+<H*), different prosodic features are exploited to convey meaningful differences. These differences pertain to focus strength in the realization of subjects (contrastive focus vs. other), or to both strength and informational status in the case of objects (narrow focus vs. broad focus). These findings point towards a subject/object asymmetry regarding the availability of prosodic marking of focus which should be further explored. This, as well as the specific type of focus being elicited, should be taken into consideration when describing the realization of focus, as pointed out by Féry (2013).
In the light of the results presented above, it could be concluded that prosody can be used in the expression of focus in Asturian Spanish. Nonetheless, the phonological role of specific pitch categories (L+H* vs. L+<H*) has not been established. Considering the proposal put forth by Face and D’Imperio (2005), suggesting that languages should be placed at some point in a continuum from plastic to non-plastic, the findings presented in this study suggest that Asturian Spanish leans more towards being a non-plastic language, although it should not be placed at the far end of the continuum. The reason for this is that the findings do not support the existence of a phonological contrast such as the one proposed for English (Pierrehumbert and Hirschberg 1990) and European Portuguese (Frota 2012, 2014). Nonetheless, the use of prosodic features (i.e., pitch range, peak alignment and duration) was still deemed relevant, especially in the realization of object focus, which would then suggest that there are prosodic strategies available to convey different kinds of focus, separating Asturian Spanish from that end of the continuum reserved for languages that use exclusively syntactic or morphological strategies, such as Hungarian (Vallduví and Engdahl 1996).
Further research needs to be carried out in order to determine whether prosody is used in Asturian Spanish to the same extent to which it is used in other dialects of Spain. Considering the subject/object asymmetry found in the data, more needs to be said about which strategy would be preferred among all the possible ones (e.g., prosodic marking in-situ, clefting, p-movement, etc.). For that purpose, the same situations employed in the experimental task presented in this study could be used to elicit spontaneous responses or as part of a preference task where all those strategies are presented as possible answers. In this regard, it would be very interesting if these tasks were performed by Spanish-Asturian bilingual speakers with different degrees of language dominance (Spanish-dominant, balanced bilinguals and Asturian dominant) as well as by speakers of Castilian Spanish. The phonetic implementation of focal pitch accents should also be compared across strategies and types of focus to determine as well whether the differences that were found in the present study in terms of pitch range, peak alignment or duration are also manifested when a specific syntactic strategy is employed to mark focus, either contrastive or non-contrastive. Furthermore, more perception tasks and acceptability judgment tasks are needed in order to clarify what the role of each of these prosodic parameters is in the conveyance of all the different types of focus. Finally, other strategies such as gesture could be acting as an additional cue, as has been shown in the expression of other pragmatic meanings such as incredulity (Armstrong and Prieto 2015; Crespo Sendra et al. 2013); its role could then be further explored as well in future studies.
The additional file for this article can be found as follows:
Appendix. Questions included in the linguistic background questionnaire (as originally presented in Spanish and with the corresponding English translation) and list of all the situations used in the experimental design. DOI: https://doi.org/10.5334/jpl.176.s1