Subject Pronoun Expression in Santomean Portuguese

Studies on Subject Pronoun Expression (SPE) in the Portuguese-speaking world have shown a distinction between European Portuguese, which is a Null Subject Language (NSL) with high rates of null subjects, and Brazilian Portuguese, which is controversially treated as a partial-NSL and exhibits a considerably lower rate of null subjects. No studies have specifically addressed SPE in Santomean Portuguese, but we know that both null and overt subject personal pronouns exist in this variety of Portuguese. The objective of this paper is to investigate variation in SPE in Santomean Portuguese, and to situate this variety in comparison with other varieties of Portuguese. Results of the variationist analyses show that Santomean Portuguese patterns more like European Portuguese in its high rate of null subjects. Interestingly, and contrary to previous studies, Santomeans with a higher level of education disfavor the use of null subjects, which I relate to a sensitivity to grammatical ideology and the favoring of overt subjects in more formal situations. Most of the results regarding the linguistic predictors, which are stronger than the social predictors, relate Santomean Portuguese to other varieties of Portuguese, and to Spanish.


Introduction
Languages that do not require the presence of an overt subject personal pronoun (henceforth SPP) are called Null Subject Languages (henceforth NSL), or pro-drop languages, and the ones that ordinarily require the presence of an overt SPP are called non-Null Subject Languages (henceforth non-NSL), or non-pro-drop languages. Italian, Spanish, and European Portuguese are NSL, while French, English, and German are non-NSL. Following are examples of null and overt SPP, where we see that SPP are obligatory in English (1), and optional in Spanish (2):
(1) a. you bought a computer
b. *bought a computer
(2) a. tu compraste un ordenador
b. compraste un ordenador
Although there is no agreement on this classification, some languages are also considered to be partial-NSL, such as Brazilian Portuguese, Finnish, and Marathi. Those languages allow null subjects under more restricted conditions than full-fledged NSL (Holmberg, Nayudu & Sheehan 2009).
Variation in Subject Pronoun Expression (henceforth SPE) is of interest to sociolinguists because the speaker has the option of expressing the SPP or omitting it. How does a speaker make a choice between those two options? The main objective of most sociolinguistic research on SPE has been to ascertain the linguistic, stylistic, and social factors that determine, or at least that influence, the expression or omission of the SPP. All sociolinguistic research has found correlations between those factors and SPE. Even so, this syntactic variable remains highly debated among scholars who work on the topic.
Linguists have investigated SPE in Brazilian and European Portuguese (cf. Barbosa 2000, 2009; Barbosa, Duarte & Kato 2001, 2005; Duarte 1993), but no studies have been done on SPE in the variety of Portuguese spoken in São Tomé and Príncipe. All we know is that both null and overt SPP are present in Santomean Portuguese, as shown in (3):
(3) STP: os Angolares, eles não falam peixe, [Ø] falam kikiê
ENG: the Angolar they neg say.3pl fish, speak.3pl kikiê
'the Angolares, they don't say fish, they say kikiê' (Suéli, 32 years old)
My objective is to investigate the social and linguistic factors that condition SPE in Santomean Portuguese, and to compare the results to previous research on SPE in Brazilian and European Portuguese.
São Tomé and Príncipe was a Portuguese colony from the late fifteenth century to 1975. From the sixteenth century to the beginning of the twentieth century, Forro, Angolar, and Lung'ie (three native creoles) were the most widely spoken languages on the islands (Hagemeijer, in press). However, the massive arrivals of contract laborers starting at the end of the nineteenth century, and the use of Portuguese as a lingua franca, completely changed the sociolinguistic setting. As a consequence, a process of linguistic shift (from creoles to Portuguese) started to take place. This shift was intensified from the 1960s, with the rise of the nationalist movement, the choice of Portuguese as a unifying language for Santomeans of different ethnic backgrounds, the independence of the country (in 1975), and the generalized access to education (Bouchard 2017; Seibert 2006). Since then, children have been growing up with the local variety of Portuguese as their first (and often only) language (Bouchard 2017). Today, 98.4% of the population speak Portuguese (as a first or second language) (INE 2012).
This paper's first section provides a background on SPE in Portuguese varieties. The second section details the methodology for analyzing SPE and the social and linguistic variables included in the quantitative analyses. The third section offers an overview of the distribution of null and overt subject pronouns in Santomean Portuguese, and the fourth section presents and analyzes the results. Finally, the last section summarizes the most important findings.

Subject Pronoun Expression in Portuguese varieties
Variable SPE constitutes a morphosyntactic feature that Portuguese inherited from Latin. However, not all of Latin's descendant languages developed into NSL: European Portuguese, Spanish, and Italian are still consistent NSL; Brazilian Portuguese is a partial-NSL, or an NSL with a lower rate of null subjects, depending on one's position; and Modern French and Haitian Creole are non-NSL (Orozco 2015a). This section reviews some of the literature on SPE in varieties of Portuguese, focusing on European and Brazilian Portuguese, as more studies exist on those varieties, in order to present how speakers of Portuguese use SPP.
The nature of null subjects in Brazilian Portuguese has been investigated by Negrão (1997), Modesto (2000a, 2000b, 2009), Rodrigues (2002, 2004), and Sheehan (2006), and in European Portuguese by Duarte (1995) and Barbosa (1995, 2000, 2009). Studies comparing SPE in European and Brazilian Portuguese are also numerous (cf. Barbosa, Duarte & Kato 2001; Magalhães & Santos 2006). As noted above, European Portuguese is a full-fledged NSL, while Brazilian Portuguese is undergoing language change in the direction of becoming a non-NSL (Martínez-Sanz 2011); it is considered a partial-NSL (cf. Holmberg, Nayudu & Sheehan 2009), or a semi pro-drop language (cf. Erker & Guy 2012). In one view, this partiality implies that "two different NSL grammars are available in the mental grammars of Brazilian speakers: on the one hand, a NSL grammar that allows for the licensing of null subjects, and a non-NSL grammar, on the other, responsible for widespread overt subjects" (Martínez-Sanz 2011: 154-155). The Brazilian SPE system could be viewed as one that allows null subjects in certain restricted environments, but that lacks unrestricted referential null subjects (Barbosa 2013; Martínez-Sanz 2011).
Lobo (2016) discusses variation in SPE comparing, among other languages, European and Brazilian Portuguese. According to her, one of the distinctions between these two varieties of Portuguese is the interpretation of the (null or overt) subject in reference to its antecedent. In European Portuguese, a null SPP usually refers to the subject of the main sentence, while an overt SPP usually refers to the complement. That means that in (4), it is João who won the race, and in (5), it is Pedro.
(4) EP: o João disse ao Pedro que Ø tinha ganho a corrida
ENG: the John told.3sg to.the Peter that had.3sg won the race
'John_i told Peter_j that he_i won the race' (Lobo 2016: 564)
(5) EP: o João disse ao Pedro que ele tinha ganho a corrida
ENG: the John told.3sg to.the Peter that he had.3sg won the race
'John_i told Peter_j that he_j won the race' (Lobo 2016: 564)
In Brazilian Portuguese (a partial-NSL), the overt SPP is not necessarily interpreted the same way as in European Portuguese, as the overt SPP in (6) may refer to João, or to Pedro.
(6) BP: o João disse ao Pedro que ele tinha ganho a corrida
ENG: the John told.3sg to.the Peter that he had.3sg won the race
'John_i told Peter_j that he_i/j won the race'
That being said, the pioneering work on the changing nature of SPE in Brazilian Portuguese is Duarte's (1995) dissertation. In her study, she demonstrates how Brazilian Portuguese differs from European Portuguese and other NSL regarding SPE, and how it is changing toward a more frequent use of overt subjects. Figure 1 illustrates this change.
The author relates this change in SPE to the reduction of the Brazilian inflectional paradigm. This reduction probably started with the loss of the second person singular tu 'you' and its replacement by você (which takes third person agreement) (Duarte 1993; Galves 1990). From the inflectional paradigm that originally showed six distinctive forms, Brazilian Portuguese now has three distinctive verb endings, as a result of the loss of tu 'you' and the replacement of first person plural nós 'we' by the expression a gente 'we', which also takes third person singular verbal agreement (e.g. nós comemos → a gente come 'we eat').1 To illustrate this change in the Brazilian inflectional paradigm, see the difference in Table 1 between the European and Brazilian systems with the verb falar 'to speak/to talk'.
According to Duarte (1995), as a consequence of this changing paradigm, the Avoid Pronoun Principle2 that usually leads to the null representation of the subject is lost, and the null subject becomes a less frequently used option. In her work, she showed that in the 1992 stage, 71% of SPP were phonologically realized, and 29% were not. However, this is a functional explanation of the change in Brazilian Portuguese. An alternative theory is that reduced verbal inflection and higher rates of SPE are both consequences of slavery in Brazil, and massive L2 acquisition of some perhaps creolized but certainly non-standard version of Portuguese by the Africans brought to Brazil (Guy 1981). A semi-acquired L2 version of Portuguese (as well as a creole) would probably lack verbal inflection and require overt SPP. In this view, contemporary Popular Brazilian Portuguese is a partially decreolized descendant of that earlier L2 version of the language (cf. Guy 1981; Lucchesi et al. 2009). However, it is somewhat reductionist to only include African descendants in this theory, as other linguistic and cultural groups (including Amerindians and European emigrants) also acquired Portuguese as an L2 in Brazil. Also, under a decreolization hypothesis, one might expect the language to go from non-NSL to NSL, i.e. the opposite of what is shown in Figure 1. Note that other studies on the syntax of subject licensing in Brazilian Portuguese agree with Duarte (1995) regarding the semantic and syntactic distinction of null subjects in this language (Barbosa 2009, 2013; Ferreira 2000; Holmberg 2005; Kato 1999; Rodrigues 2002, 2004; Sheehan 2006), which sets it apart from European Portuguese and other NSL.
There are not as many quantitative studies on SPE in European Portuguese as there are in Brazilian Portuguese.5 Much of the information we have about SPE in European Portuguese comes from comparative studies between this language and Brazilian Portuguese. Duarte (2000) has compared the distribution of overt and null subjects in those varieties of Portuguese. In Table 2 (adapted from Barbosa, Duarte & Kato 2005: 23; and Duarte 2000: 25) […] Santos 2006, in Oliveira & Ferreira dos Santos 2007: 97) shows their results regarding SPE.
We note in Table 3 that 1) like European and Brazilian Portuguese, both overt and null subjects are possible, and 2) that Angolan Portuguese seems to pattern more like European
4 Making generalizations about Brazilian Portuguese is challenging, as many different dialects of the variety exist. Regarding the second person singular for example, note that in some regions such as Rio Grande do Sul and Pará, the pronoun tu 'you' is used, but it usually agrees with the third person singular verb ending (e.g. tu fala instead of tu falas 'you talk').
5 One may refer to Ambulate (2008), and Costa, Lobo and Silva (2009), for work on acquisition of SPE, and to Barbosa (2000, 2010, 2013), Lobo (1995), and Raposo (1986) for work on the syntax of SPE.
Duarte (1995) for Brazilian Portuguese. However, not enough information about the methodology is given in Oliveira and Ferreira dos Santos (2007) and Teixeira (2013) to analyze and understand the difference between their results.
To my knowledge, Dias (2009) is the only paper on SPE in Mozambican Portuguese. However, her study differs greatly from the others mentioned above, as she used a written corpus. Dias investigated SPE in the written Portuguese of forty-five 5th grade students in a suburban region of Maputo. They are all bilingual speakers of Changana and Portuguese, and most of them learned Portuguese as an L2 at school. Her results show that 52.5% of SPP are null and 47.5% are overt. Interestingly, Dias noted that the first person singular mainly occurs with overt SPP, while the first person plural only occurs with null SPP. She also writes that null SPP use correlates with more verbal agreement.
Table 4 compares the results for the four varieties of Portuguese discussed in this section. However, remember that the first three varieties are in their spoken form, and the last one, in written form. These numbers come from different studies and are not balanced regarding sociolinguistic variables. More comparable studies on the topic are necessary.
To my knowledge, there are no studies on SPE in the variety of Portuguese spoken in São Tomé and Príncipe. However, it is possible to see in the literature on Santomean Portuguese that SPPs can be expressed (7) or not (8):
(7) STP: nós fazemos um pouco de tudo sem aprofundarmos bem no assunto
ENG: we do.3pl a little of all without deepening.3pl well in.the topic
'we do a little bit of everything without deepening too much in any topic' (Gonçalves 2010: 130)
(8) STP: queria matricular no instituto
ENG: wanted.1sg to.register in.the institute
'I wanted to register for the institute' (Gonçalves 2010: 50)
It is also possible to find sentences that contain both overt and null SPP, as in example (9):
(9) STP: depois cheguei (a) um momento que eu vi que era vazio
ENG: after arrived.1sg (to) one moment that I saw.1sg that was.3sg empty
'after, I arrived at some point and saw that it was empty' (Gonçalves 2010: 130)
Based on this information, I now turn to the methodology of the variationist analysis I conducted to investigate SPE in Santomean Portuguese.
7 Oliveira & Ferreira dos Santos (2007). Note that Teixeira (2013) had slightly different results, with 65% of overt SPPs.

Methodology for coding
The fieldwork for my data was mainly conducted in the city of São Tomé, the capital of São Tomé and Príncipe, and its surroundings, between June 2015 and March 2017. This study is based on roughly 46 hours of tape-recorded individual interviews with 48 adults and eight teenagers. These interviews were carried out employing techniques from both sociolinguistic interviews (Becker 2013; Labov 1984; Tagliamonte 2006) and ethnographic interviews (Spradley 1979). The participants included in this study are Santomeans born and raised on São Tomé Island who live in the capital or its surroundings. Many of the participants are monolingual Portuguese speakers, or have some knowledge of Forro, and a few (usually older) participants are bilingual native speakers of Forro and Portuguese. The interviews were transcribed, and 100 tokens per participant were coded for analysis.
Decisions regarding inclusion and exclusion of tokens are greatly influenced by the coding manual of Otheguy and Zentella (2012).The tokens included in the dataset are the ones where the null and overt subject alternation is possible.The initial dataset comprised 5,600 tokens (100 tokens per speaker), and was reduced to 4,512 tokens once the full noun phrase subjects were excluded.The envelope of variation is based around the Principle of Accountability (Labov 1972), i.e. all clauses where the variant is possible are analyzed to compare the number of tokens of null subjects with those of expressed subjects.

Dependent linguistic variable
type of pronoun expression. The dependent variable is how speakers express an SPP, whether it is with a null subject (e.g. falas 'you speak') or an overt pronominal subject (e.g. tu falas 'you speak'). This is summarized in Table 5.

Independent social variables
speaker. I have chosen 56 speakers from the capital of São Tomé and its surroundings. Region is a social variable that has been widely discussed in language variation and dialect studies (cf. Chambers & Trudgill 1998). The place where people grew up and spent most of their time is traditionally an important criterion when studying variation; speakers from different places speak differently. To control for this, all the participants of this study come from the same area.
gender. I selected an equal number of men and women in order to study the variable gender. Sociolinguistic studies have shown that linguistic variation often correlates with gender (or sex) of speakers (Cheshire 2004; Trudgill 2000). Following Eckert (1990), I choose the word gender to refer to the social and cultural elaboration of sex difference, as sex has become more politicized and problematized in the past few decades (Cheshire 2004). Gender separation is manifested in a number of domains of social life in São Tomé, including the division of labor regarding housekeeping, parenting, tasks and jobs, among other things.
age. The speakers selected can be divided into five age categories: 12-18, 20-29, 30-39, 40-49, and 50 and more. Note that São Tomé and Príncipe has a youthful age […] (Labov 1963, 1966), applying the apparent-time construct (Bailey 2004; Bailey et al. 1991).
level of education. Level of education is a good indicator of socioeconomic status in São Tomé, as in many other countries. Many sociolinguistic studies provide evidence that different social groups within a community differ in their usage of linguistic features (cf. Labov 1966; Trudgill 1974). For this study, level of education is divided into primary school (grade 1 to 6), high school (grade 7 to 12), and university (including bachelor, master and doctorate). All participants attended school, and some of them were still in school at the time of the interviews. The school grade that was attributed to them is the grade that was completed, or in progress in the case of those who were still in school.
ethnic origin. Labelling by ethnic origin is problematic (cf. Fought 2004), especially among a mixed-race and mixed-ethnic population such as that of São Tomé and Príncipe. I tried as much as possible to focus this research on Forros because, as for place of origin within the island, I am not sure if ethnic origin has an influence on language or not. By choosing mainly Forros to participate in this study, I wanted to avoid dealing with this problem. However, this did not work out as planned, as some of my "Forro" participants appeared to have one non-Forro parent (Angolar or Cape Verdean). Therefore, participants are divided into two (unbalanced) groups, depending on whether they have two Forro parents, or one Forro parent (the other one being of other ethnic origin).8
spoken language(s). The possible influence of creole on Portuguese is important to the present study. All the participants speak Portuguese, but knowledge of creole varies from one speaker to another. I divided speakers into three groups: monolingual speakers of Portuguese L1 (with no knowledge of creole), speakers of Portuguese L1 with "some" knowledge of creole, and bilinguals.9 Table 6 summarizes the independent social factors.10

Independent linguistic variables
type of clause. To see whether the type of clause has an impact on SPE, the clauses were divided into three groups: main clause, conjoined clause, and subordinate clause. Type of clause appeared as a constraint that significantly conditions SPE in previous studies of Spanish (Morales 1997; Orozco 2015a; Otheguy & Zentella 2012; Otheguy, Zentella & Livert 2007).
priming effects. Priming is a psycholinguistic process that consists of the repetition of an element or linguistic structure (Cameron & Flores-Ferrán 2004; Flores-Ferrán 2002; Travis 2005). Understanding the role of priming in speech can impact our understanding of SPE. This variable is divided into four groups: previous clause had a full NP subject, previous clause had an overt SPP, previous clause had a null SPP, and no priming. The "no priming" value was usually used when the previous clause was not spoken by the interviewee, or when there was a long pause or laughter. The hypothesis is variant continuity: use of one subject type favors the subsequent use of the same type (Orozco 2015a).
morphological regularity. Verb forms were divided into regular and irregular verbs. The website Conjuga-me (www.conjuga-me.net) was used to verify the morphological regularity of verbs. Verb forms were coded as irregular if there was a change in the root of the verb (e.g. medir 'to measure', meço 'I measure' and not *medo), or if there was a change in the regular ending of the verb (e.g. querer 'to want', ele quer 'he wants' and not *ele quere). Each form was coded independently of the other forms of the same word; for instance, leio 'I read' was coded as irregular and lemos 'we read' as regular, even though they share the same root ler 'to read'. Results for Spanish SPE in Erker and Guy (2012) show that irregular verbs are more often used with null subjects than regular verbs are, most notably among high-frequency verbs.
semantic content. Following Erker and Guy (2012), verb forms were divided into the following three semantic classes: mental activity (e.g. saber 'to know', pensar 'to think'), stative (e.g. ser/estar 'to be', ter 'to have'), and external activity (e.g. correr 'to run', beber 'to drink'). Previous studies have shown that mental activity verbs have the lowest null subject rates and that external activity verbs have the highest (Erker & Guy 2012; Orozco 2015a).
verb form.This contrasts complex verb forms (e.g.tinha falado 'I had talked', vou dizer 'I'll say'), and simple verbs (e.g.falei 'I talked', digo 'I say').Contrary to Otheguy and Zentella (2012), I did not consider querer 'to want' + infinitive to be a complex verb form.When a token is a complex verb, I look at the finite form of the verb to code it.For instance, for the token vou passar 'I'll pass', vou is irregular, so the token is coded as irregular, even if passar is a regular verb.
ambiguity paradigm. This refers to how clearly the verb form indicates what the subject is, based on its morphology. Some verb tenses provide a more obvious indication of the subject person/number (present, past tense, future, imperative) because all persons have a different ending. In others, this is less obvious (imperfect tense, conditional, subjunctive) because some persons have identical inflections. The idea behind this coding is that when the morphology of a verb makes it clear what person and number the subject is, an overt SPP may appear redundant.11 Consequently, I expect that the verbs that have a "more obvious" morphology favor the use of the null subject.
person and number. Verb forms were classified for one of the five person and number values: first singular, second singular, third singular, first plural, and third plural. Note that there were no tokens for second person plural, and the overt SPP vós 'you.2pl' is an old form that is basically no longer used in spoken Portuguese. The tokens for PERSON AND NUMBER are unevenly distributed, with 1,595 first singular, 46 second singular, 2,037 third singular, 406 first plural, and 428 third plural. When there was absence of verbal agreement (e.g. tu fala 'you speak.3sg'), it is still the verb form that was coded, and not the subject; therefore, tu fala is coded as third person singular. Based on Duarte (1993), who has illustrated the increased use of overt subjects over time in Brazilian Portuguese, the difference between each person and number might not be as significant as the changes across time (Figure 1).
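As a quick consistency check on the figures above, the per-category token counts can be summed and converted to shares. The sketch below uses only the counts reported in this section; the variable and function names are illustrative and not part of the original coding procedure.

```python
# Token counts per PERSON AND NUMBER value, as reported in the text.
counts = {
    "first singular": 1595,
    "second singular": 46,
    "third singular": 2037,
    "first plural": 406,
    "third plural": 428,
}

def shares(token_counts):
    """Return each category's share of all tokens, in percent."""
    total = sum(token_counts.values())
    return {label: 100 * n / total for label, n in token_counts.items()}

total_tokens = sum(counts.values())
for label, pct in shares(counts).items():
    print(f"{label:16s} {counts[label]:5d} {pct:5.1f}%")
print("total", total_tokens)
```

The total matches the 4,512 pronominal tokens retained for analysis, and second singular accounts for only about 1% of them, which is worth keeping in mind when interpreting results for that category.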
animacy.Animacy is expressed based on how "alive" the referent of a noun is, whether it is animate or inanimate.Barbosa, Duarte and Kato (2005) have shown that null subjects in European Portuguese are strongly favored when the subject referent of a verb is inanimate.
coreferentiality. Coreferentiality is defined in terms of the relationship between the target verb (i.e. the one being examined) and the trigger verb (i.e. the finite verb preceding the target verb). It is also sometimes referred to as switch reference (e.g. Erker & Guy 2012). The present study made a distinction between coreference with subject of previous clause, coreference with indirect object complement (IO), coreference with direct object complement (DO), coreference with oblique object complement (OO), and switch reference. Switch reference has consistently been found to favor overt subjects.
frequency. To find the frequency of a word, I used the Corpus do Português (www.corpusdoportugues.org), and their corpus called Web/Dialects, which has 1 billion words (Davies & Ferreira 2017).12 The different forms of a verb were considered all together; for instance, como 'I eat', comeram 'they ate', and comemos 'we ate' were all coded with the value 112 339 (log10 ≈ 5) under COMER 'to eat'. The frequency of all tokens was then grouped into seven categories based on a logarithm (base 10). I expect high frequency tokens to behave differently than the low frequency ones (Erker & Guy 2012).
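The logarithmic grouping can be sketched as follows. This is a minimal illustration, assuming that a verb's category is the integer part of the base-10 logarithm of its lemma frequency (so that 112 339 falls in category 5); the function name and the exact category boundaries are assumptions, not the coding script used for this study.

```python
def frequency_category(lemma_frequency: int) -> int:
    """Integer part of log10(lemma_frequency).

    Counting digits avoids floating-point rounding at exact powers of ten.
    Assumption: category boundaries sit at powers of ten (10, 100, 1000, ...).
    """
    if lemma_frequency < 1:
        raise ValueError("frequency must be a positive integer")
    return len(str(lemma_frequency)) - 1

# All forms of a verb share the frequency of the lemma (COMER, from the text).
lemma_frequency = {"COMER": 112339}
token_lemma = {"como": "COMER", "comeram": "COMER", "comemos": "COMER"}

for form, lemma in token_lemma.items():
    print(form, frequency_category(lemma_frequency[lemma]))
```

Coding by lemma rather than by inflected form means that rare forms of a common verb are still treated as high frequency, which is the design choice described above.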
Table 7 is a summary of these factors.

Considerations for coding
Coding for SPE was subject to the following considerations:
1) I only included finite verbs, and therefore did not include non-finite verbs (infinitives, participles, and gerunds). That means that I also excluded the inflected infinitive, which is morphologically marked in Portuguese, but was infrequent in my dataset.
2) I eliminated high frequency expressions made with a verb, such as quer dizer 'I mean', sei lá 'I don't know', não sei quê 'or whatever', tá(s) a ver 'you see', digamos 'let's say', como posso dizer 'how can I say', and sabe(s) 'you know'. These expressions are probably processed as one word, and not a sentence (cf. Heine, Claudi & Hünnemeyer 1991; Heine & Kuteva 2003).
3) Following Otheguy and Zentella (2012), I excluded the verbs ser and estar 'to be' when they have no subject and mean 'it's' (e.g. é um bolo de chocolate 'it's a chocolate cake', está bem 'it's fine') because they always take the third person singular agreement, but I included them when copulative (e.g. é o meu irmão 'he's my brother', você está longe 'you are far') because these cases are marked for agreement between the subject and the verb.
4) I excluded the existential verbs haver and ter 'to have', because they have no subject. For instance, há palavras que eu não percebo 'there are words that I don't understand', or tem muita gente 'there are a lot of people'.
5) I excluded the verbs that have no subject (the ones that have expletive subjects in English, for example), such as tá chovendo 'it's raining', or neva 'it snows'.
6) I included incomplete clauses when the expression of the subject was clear. For example, I included ele tem que… 'he has to…' because the overt subject is clearly expressed, but I excluded vivia… 'lived…', because the SPP could be expressed after the verb, as in vivia ele 'he lived'.
7) In Portuguese, when the antecedent of the relative clause is co-indexed with the subject of the relative clause, a null subject is expected, since subject resumption is uncommon. For instance, the token tinha in the sentence batia numa filha que tinha cinco anos 'he beat a daughter who was five years old' would not be included in the dataset because the antecedent of tinha is filha: they are co-indexed. However, the token tinha in the sentence batia numa filha que ele tinha 'he beat a daughter that he had' would be included because it is not co-indexed with filha, but rather with the subject of the verb batia 'he beat'.

Distribution of null and overt subject pronouns
One hundred tokens per speaker were coded (N = 5600).The overall distribution of subjects in my dataset was 55.2% null subjects, 25.4% overt subjects, and 19.4% full noun phrase subjects, as shown in Table 8.To be consistent with other studies and because the focus of this section is pronominal use, I removed from the dataset the tokens with full noun phrase subjects, and retained only the subject pronouns, whether they are expressed or not (N = 4,512).To Table 4, I now add the results regarding the use of null and overt SPP in Santomean Portuguese (Table 9).
At first glance, the results suggest that the pronominal use in Santomean Portuguese is more similar to European Portuguese than to Brazilian Portuguese. In fact, 68.5% of subject pronouns are unexpressed in Santomean Portuguese, which compares to 78% in European Portuguese and 44% in Brazilian Portuguese.13 Table 9 shows that the African varieties of Portuguese are situated between European and Brazilian Portuguese in their use of the null subject. However, remember that the methodologies and variables that underlie these studies are incomplete or unbalanced, and different from the ones used in the current study. Further studies with comparable data, methodology, and variables are necessary to validate this finding.
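The 68.5% figure follows from the overall distribution once full noun phrase subjects are set aside. A small sketch of the arithmetic, using only the percentages and token counts reported above (the variable names are illustrative):

```python
# Overall distribution of subjects (percent of all 5,600 coded tokens).
null_pct, overt_pct, full_np_pct = 55.2, 25.4, 19.4

# Share of null subjects among pronominal subjects only.
null_among_pronominal = 100 * null_pct / (null_pct + overt_pct)
overt_among_pronominal = 100 - null_among_pronominal

# Tokens removed when full noun phrase subjects are excluded.
initial_tokens, retained_tokens = 5600, 4512
excluded_tokens = initial_tokens - retained_tokens

print(f"null: {null_among_pronominal:.1f}%  overt: {overt_among_pronominal:.1f}%")
print(f"excluded full NP tokens: {excluded_tokens} "
      f"({100 * excluded_tokens / initial_tokens:.1f}% of the initial dataset)")
```

Renormalizing in this way is what makes the Santomean rate comparable to the null/overt rates cited for the other varieties, which are also computed over pronominal subjects.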
The following section deals with the social and linguistic constraints on the use of null subject.
13 SPE might vary in Brazil from one region to another, from one study to another, but generally speaking, the rate of null subjects is always lower than in European Portuguese.

Table 8. Overall distribution of subjects (%)
Null subject 55.2
Overt subject 25.4
Full noun phrase subject 19.4

Results and analysis
The variation of SPE was modeled through logistic mixed-effects regression with SPEAKER as a random effect, using the R package lme4 (version 1.1-12; Bates et al. 2015). Mixed-effects models have the advantage of taking into account that some speakers might favor a linguistic outcome while others might disfavor it, regardless of what their social characteristics would predict, and that some words might be treated distinctively (Johnson 2009).14 Rbrul was also used to perform one-level analyses and obtain factor weights, a statistical measure often used in sociolinguistics that indicates to what degree a factor favors or disfavors a variant. Both social and linguistic predictors of SPE, run as binary variables, were investigated. A backward elimination with the anova function was performed to find the best model. According to the best-fit model, the constraints education level, type of clause, priming effects, morphological regularity, semantic content, person and number, animacy, and coreferentiality were all significant. The random speaker effect is also significant (p<0.001). The factors age, gender, spoken language, ethnic origin, verb form, ambiguity paradigm and frequency15 were not significant. Each of the significant factors will be discussed one at a time.
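For readers less familiar with this kind of model, a logistic regression with a random intercept per speaker can be written as below. This is a generic textbook formulation rather than output from the analysis; the symbols are notation introduced here for illustration, and taking the null subject as the modeled outcome is an assumption.

```latex
\log\frac{p_{ij}}{1 - p_{ij}} = \beta_0 + \sum_{k} \beta_k x_{kij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Here $p_{ij}$ is the probability that token $i$ produced by speaker $j$ has a null subject, the $x_{kij}$ encode the social and linguistic predictors, the $\beta_k$ are their fixed-effect coefficients, and $u_j$ is the speaker-specific adjustment contributed by the SPEAKER random effect.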

Significant social factors for the use of null subject
In the full model with all factors included, education and ethnic origin were almost but not quite significant, but when doing the backward elimination in R, deleting education level would make ethnic origin significant, and eliminating ethnic origin would make education level significant.This is usually a sign of correlation.The crosstabulation in Table 10 shows why there is interaction between education level and ethnic origin.
The distribution of speakers is clearly skewed; those with just one Forro parent tend to be less well educated, so that most of the data for university educated speakers comes from those with two Forro parents, while data for speakers with only a primary education includes a high proportion of participants with only one Forro parent.16 Consequently, I decided to remove ethnic origin from the analysis and keep education level. Without ethnic origin as a factor, education level is significant. Little social significance seems to be attached to SPE. In fact, participants did not address this feature during interviews (contrary to other features, such as pronunciation of rhotics, which is more often mentioned; Bouchard 2017). As the results show, the conditioning effects of social predictors on SPE do not appear to be as strong as the linguistic predictors. The only social factor that appears as significant is education level. And surprisingly, the results are contrary to expectations and previous studies (e.g. Ávila-Jiménez 1996): having a university degree disfavors the use of null subjects (factor weight: 0.55), while having no more than a primary school education favors it (factor weight: 0.45) (Table 11).
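Factor weights and log-odds are two views of the same estimate: in Rbrul-style output, a factor weight is the inverse logit of the corresponding centered log-odds coefficient, so 0.5 is neutral. The sketch below converts the two weights reported for education level. It assumes this standard relationship, makes no assumption about which variant served as the application value in Table 11, and uses illustrative function names.

```python
import math

def weight_to_log_odds(weight: float) -> float:
    """Logit transform: factor weight (between 0 and 1) to log-odds."""
    return math.log(weight / (1 - weight))

def log_odds_to_weight(log_odds: float) -> float:
    """Inverse logit: log-odds to factor weight (between 0 and 1)."""
    return 1 / (1 + math.exp(-log_odds))

# Factor weights reported in the text for education level.
for label, weight in [("university", 0.55), ("primary school", 0.45)]:
    print(f"{label:15s} weight={weight:.2f} "
          f"log-odds={weight_to_log_odds(weight):+.3f}")
```

The two weights are symmetric around 0.5, corresponding to log-odds of about +0.20 and -0.20, a relatively small effect that is in line with the observation that social predictors are weaker than linguistic ones.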
Speakers generally associate formality with standard language. My supposition is that the overt subject in São Tomé is somehow associated with formality, or with "more proper" speech. Although SPE was never discussed during interviews or in informal conversations, grammatical ideology might be an explanation for the higher rate of overt subjects among highly educated Santomeans. Kroch and Small argue that this grammatical ideology prescriptively "favors the most direct correspondence between propositional form and surface syntax" (1978: 48); overt subject pronouns provide an explicit surface realization of a propositional form. Consequently, the results suggest that people with a higher level of education show greater adherence to the grammatical ideology of a standard language that favors the use of overt pronouns in Santomean Portuguese.
However, this finding raises many questions. First, Kroch and Small (1978) suggest that people do have prescriptive grammatical intuitions and that consciousness of the prestige norms can influence speech. In São Tomé and Príncipe, the prestige variety is still considered to be European Portuguese, which favors the use of null subjects. Consequently, if prescriptive grammatical intuitions were an explanation for SPE in Santomean Portuguese, then one would expect educated Santomeans to have a greater use of null subjects, with a rate similar to that of native speakers of European Portuguese.
Second, since the local creoles lack null subjects, one might expect speakers of a Santomean creole to show a greater use of overt subjects. In fact, among the adult participants, there is a high number of Santomeans with a low level of education who are bilinguals (i.e. people who learned creole when they were children and who still use it today). However, it is erroneous to assume that highly educated Santomeans do not speak creole. In fact, among the highly educated adult participants, 19% are bilingual and 56% have knowledge of creole as an L2 (Table 12).
Third, could this greater use of overt subjects be related to the interview setting? The speech data included in this study was elicited in individual interviews, and these interviews were collected after I had spent a period of time (starting during the third month, more precisely) in São Tomé to ensure that the questions asked were relevant. The first plan was to structure the interviews in modules that included demographic questions, as well as questions related to family, childhood, schooling, social network, identity, and language attitude. But in reality, the recording sessions followed no predetermined structure. The scope of the conversations was not limited to the question models; participants elaborated on topics that interested them. Therefore, the interview setting or any relation between the question-answer pair probably did not affect the speech of the participants more than it would have in other studies on SPE. Also, in order to mitigate the observer's paradox (Labov 1972b) and to collect casual speech, the tokens of SPP were taken from the middle of the interview.
Fourth, could this finding be related to the interviewer's variety of Portuguese? The interviewer (myself) speaks Brazilian Portuguese as an L2. Could these two elements of information (being a non-native speaker and speaking a Brazilian variety of Portuguese) have influenced the use of SPE of the participants? Were speakers with a higher level of education, i.e. speakers with a greater knowledge of the prescriptive grammar, adapting to the interviewer? If this were the case, the results regarding level of education and greater use of overt subjects would be due to the effects of external conditioning factors, and could not be considered a finding. However, because little social significance is attached to SPE, and because many of my participants did not know that I was not a native speaker of Portuguese, the Brazilian accent might have influenced the results more (if it did at all) than the fact that Portuguese is one of my L2s, because I have a good command of the language. To my knowledge, there is no evidence of a register effect that endows educated speakers with the ability to use a different register with more null subjects in formal contexts. Further studies on the matter would be relevant, as they could clarify whether the correspondence between propositional form and surface syntax is favored in formal contexts.

Significant linguistic factors for the use of null subjects
A total of seven linguistic constraints significantly condition SPE in Santomean Portuguese, as the results presented in Table 13 indicate. The significance of most of these factors is consistent with findings from previous studies, most specifically in the extensive literature on the topic in Spanish (cf. Cameron 1992; Orozco 2015a, 2015b, 2016; Otheguy & Zentella 2012; among others). The following sentences from my dataset illustrate these tendencies in the use of the pronouns:
[...]
This finding, which is consistent with previous studies (Erker & Guy 2012), is probably related to the fact that irregular verbs often have distinctive forms for their different persons and numbers (e.g. the verb ser 'to be': sou, és, é, somos, são).
Semantic content. External and mental activity verbs favor the use of null subjects with respective factor weights of 0.55 and 0.53, while stative verbs disfavor its use, with a factor weight of 0.43. One explanation for the disfavoring of null subjects with stative verbs might be the high frequency of at least two stative verbs: ser and estar 'to be'. Those two verbs show morphological irregularity in most of their inflectional forms, and as seen above, irregular forms of verb favor the use of null subject.
Coreferentiality. As expected, a null subject is favored (factor weight: 0.72) when there is complete coreference with the subject of the previous clause, and it is disfavored when there is a switch of reference (factor weight: 0.48). Interestingly, coreference with objects behaves differently according to the type of object: coreference with a direct object or an oblique object slightly favors the use of a null subject (respective factor weights of 0.54 and 0.52), but coreference with an indirect object strongly disfavors it (factor weight: 0.25). However, as seen in Table 13, there are very few tokens of coreference with a complement (total of 91 tokens, representing 2% of all tokens). To get a clearer picture of this factor, I collapsed the levels of coreferentiality to make a binary distinction between no switch in reference and switch in reference, with this last category including the complete switch and the partial switch with coreference with complements. This coding follows Erker and Guy (2012) (Table 17).
Results in Table 17 show clearly that the use of a null subject is favored when there is no switch in reference (factor weight: 0.62, and 20.8% more null subjects).
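The recoding of coreferentiality into a binary factor can be sketched as follows. This is only an illustration under stated assumptions: the level labels, the token format (a level paired with a flag for null subject), and the toy data are all hypothetical, and the sketch computes raw null-subject rates rather than the regression-based factor weights reported in Table 17.

```python
from collections import defaultdict

# Hypothetical level labels; complete coreference is the only "no switch" level,
# and partial switches (coreference with complements) are grouped with complete switches.
COLLAPSED = {
    "complete coreference": "no switch",
    "direct object": "switch",
    "indirect object": "switch",
    "oblique object": "switch",
    "switch of reference": "switch",
}

def null_rate_by_level(tokens):
    """Return the proportion of null subjects per collapsed coreferentiality level.

    `tokens` is an iterable of (original_level, is_null) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # level -> [null count, total count]
    for level, is_null in tokens:
        binary = COLLAPSED[level]
        counts[binary][0] += int(is_null)
        counts[binary][1] += 1
    return {level: nulls / total for level, (nulls, total) in counts.items()}

# Toy data for illustration only (not drawn from the corpus).
toy_tokens = [
    ("complete coreference", True),
    ("complete coreference", True),
    ("complete coreference", False),
    ("direct object", True),
    ("indirect object", False),
    ("switch of reference", True),
    ("switch of reference", False),
]
rates = null_rate_by_level(toy_tokens)
```

Collapsing sparse levels in this way trades detail for more reliable estimates, which is the motivation given above for the binary coding.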
Finally, Table 18 is an updated version of Table 13; it presents the significant linguistic factors for SPE with the revised and combined factor groups as discussed throughout this chapter. This table gives the clearest and most definitive picture of the meaningful linguistic constraints on the process.

Discussion and conclusion
Santomean Portuguese has a high rate of null subjects (68.5%), which makes it more similar to European Portuguese (78%) than to Brazilian Portuguese (44%) in its use of pronouns. Interestingly, when comparing different varieties of Portuguese, we see that the African varieties are situated between the European and Brazilian ones (as seen in Table 9). As is the case for Spanish, this suggests that SPE can serve as a tool to differentiate Portuguese varieties.
I examined the effects of five social and ten linguistic constraints. Results show that education level, type of clause, priming effects, morphological regularity, semantic content, person and number, animacy, and coreferentiality significantly condition SPE in Santomean Portuguese. The random speaker effects were also significant.
Regarding social constraints, Santomeans with a lower level of education favor the use of null subject, and the ones with a higher level of education disfavor it. I suggest that the use of overt subject gives a direct correspondence between surface syntax and propositional form, which might explain this preference among highly educated Santomeans.
Following [...] surprising as overt subjects with irregular verb forms may appear as being redundant. Irregular verbs give more salient information independently of the use of the pronoun.
As to the semantic content of the verb, external and mental activity verbs favor the use of a null subject. However, evidence from Orozco (to appear) suggests that the high frequency of certain verbs within each semantic category influences these results. If he is right, then the favoring of null or overt subjects would be a characteristic of each verb, and not of any semantic category. For person and number, the third persons behave quite differently than the other ones. A recoding of the third persons versus the others shows that they favor the use of a null subject. This supports the studies regarding Brazilian Portuguese in which third person subjects also behaved differently, and which led linguists to see third person subjects as a different type of empty category (cf. Barbosa, Duarte and Kato 2005). Animacy strongly conditions SPE, with most of the inanimate referents expressed with a null subject. Finally, as was expected regarding coreferentiality, no switch in reference favors the use of a null subject, as the referent remains the same.
I conclude with the following question: what happens to the variety of Portuguese spoken in São Tomé and Príncipe now that it is an independent nation? Santomean Portuguese can be seen as a language in contact with Forro and other creole languages of the islands, but the use of these creoles is restricted. Today, the great majority of young Santomeans are monolingual Portuguese speakers. Most of the Portuguese colonists left the islands as soon as São Tomé and Príncipe became independent, just as they abandoned the rest of the collapsing Portuguese colonies in Africa in 1974-1975. Since then, Santomeans have had a greater access to education and means of communication in Portuguese (e.g. television, Internet), and greater social mobility (in part related to Santomean immigration to Portugal).
Within this emergence of the Santomean variety of Portuguese, some features show the influence of the creole (Afonso 2008; D'Apresentação 2013; Figueiredo 2010; Gonçalves 2010, 2016; Lima 2009), others such as the use of rhotics show innovation (Bouchard 2017), and others such as SPE show conservatism. This conservatism from European Portuguese probably emerged after the colonial period, with better education opportunities for Santomeans. Little (or no) social significance is attached to this feature. One reason for this might be the fact that it is a feature that maintained a similar use to European Portuguese. Explanations for most findings regarding SPE in Santomean Portuguese can be found in previous studies. The one element that diverges from previous studies in other Portuguese-speaking countries is the fact that in São Tomé, highly educated people use overt subjects more frequently than less educated people, while null subjects are highly favored in European Portuguese regardless of level of education.
are the results of her study, which was based on spoken corpora. This table shows that European Portuguese favors null subjects and that Brazilian Portuguese favors overt subjects. Those numbers vary depending on the person; in Figure 2, we see that in Brazilian Portuguese overt subjects occur with the greatest frequency with second person while in European Portuguese they do so with first person. All the studies mentioned above have demonstrated the difference between European and Brazilian Portuguese regarding SPE. Very little work has been done on African varieties of Portuguese regarding SPE. Oliveira and Ferreira dos Santos (2007) and Teixeira (2013) investigated SPE in Angolan Portuguese. First, Oliveira and Ferreira dos Santos (2007) examined the pronominal system of Angolan Portuguese and noted how it is becoming more like Brazilian Portuguese in relation to the use of você 'you' and a gente 'we'. However, as is the case in European Portuguese, Angolan speakers of Portuguese also use tu 'you'; there is therefore variation between the two second person singular forms. Table 3 (adapted from

Table 2: Percentages of null and overt subjects in EP and BP.

Table 1: Spoken Brazilian and European inflectional paradigms.

Table 3: Frequency number of SPP in Angolan Portuguese (369 sentences).6

Table 5: Dependent linguistic variable.
..., with 60% of the population under the age of 25, and 6.5% over 55 (CIA World Factbook 2017). This explains why the age categories are low, and why I do not have more categories for older people. The correlation of linguistic variables with age is important when studying language change; in this case, investigating age will allow us to investigate change in progress.

Table 6: Independent social factors for coding SPE.

Table 7: Independent linguistic factors for coding SPE.

Table 10: Cross-tabulation between education level and ethnic origin: number of tokens and percentage of data.

Table 12: Cross-tabulation between SPOKEN LANGUAGES and EDUCATION LEVEL: number of participants.

Table 15: The significance of the use of null subject for PERSON AND NUMBER recoded (3rd persons vs. others) (intercept = 1.54; N = 4512; [Ø] = 68.5%).
These results are consistent with the findings of Barbosa, Duarte and Kato (2005: 23) for European Portuguese: "One major condition that contributes to the difference between null and overt pronouns is animacy. In this regard, the results are striking. When the referent is [-animate], [European Portuguese] shows, in the sample analyzed, 97% of null subjects." This is comparable to my sample of Santomean Portuguese in which 91% of the inanimate referents are expressed with null subjects.
The following are illustrations of these patterns: