Module talk:pt-pronunc


Stress

This module does not yet indicate stress unless the word is spelled (or respelled in the first positional parameter of {{pt-IPA}}) with an accent mark; see e.g. desdobrar#Pronunciation, which says /dɨʒ.do.bɾɐɾ/ instead of /dɨʒ.doˈbɾɐɾ/. —Aɴɢʀ (talk) 15:26, 29 June 2017 (UTC)

fixing this module

(Notifying Ungoliant MMDCCLXIV, Daniel Carrero, Jberkel): @Metaknowledge I am thinking of fixing this module to work properly. First question: Who are the active native speakers of Portuguese other than Ungoliant? Second question: What accent should be used? For {{es-IPA}}, I made it display up to six accents depending on distinción vs. seseo, lleísmo vs. yeísmo and Rioplatense accents. cebolla is an example with all 6; only 2 show by default, and the rest are hidden under a "More" button. There are lots of Portuguese accents, but I'm thinking at first we should show only two: standard European Portuguese (of Lisbon) and some newscaster type of Brazilian Portuguese. My Portuguese is exclusively Brazilian (particularly of Salvador da Bahia), so I may need some help with European Portuguese. I'm thinking the Brazilian variant should have the following:

  1. /tʃ/ and /dʒ/ before /i/, as in noite, teatro, rede, devagar, dezoito;
  2. vocalization of /l/ to /w/ syllable-finally;
  3. s should probably be /s/ not /ʃ/ word-finally, although when I get around to it I'll add a Carioca variant that has /ʃ/ word-finally;
  4. s and z before consonants I'm not sure, Bahian speech variably pronounces /s/ and /z/ or /ʃ/ and /ʒ/;
  5. epenthesis of /i/ before stressed final /s/, hence faz /fajs/, Jesus /ʒe.ˈzujs/ (Wikipedia says most Brazilians now do this);
  6. final -e and -o are /i/ and /u/ not /e/ and /o/ (which appears to be the case in far southern dialects, although I've never heard people speak this way);
  7. guttural r is maybe (?) written /ʁ/, although I've never liked this, IMO it's highly misleading; definitely, it will be accompanied by a phonetic notation in brackets that renders it as [h] (or maybe [χ], but I think this is not the most common rendering; but the Carioca variant will use [χ]);
  8. word-final r is written /(ʁ)/, glossed phonetically as [(h)];
  9. unstressed pretonic vowels not occurring before nasal consonants are written /a e i o u/ both phonemically and phonetically, and raising of written e -> /i/ and written o -> /u/ occurs only when explicitly respelled that way, as in devagar, optionally respelled divagar;
  10. stressed vowels preceding nasal consonants are indicated as nasalized even before nasal + vowel, at least in the phonetic notation; whether this should happen with unstressed vowels, I don't know;
  11. nh is written /ɲ/ but phonetically [j̃], as in unha /ˈu.ɲa/ [ˈũ.j̃ɐ];
  12. ou is maybe written /ou/ [o(ʊ)].

Anything I missed or people disagree with? Benwing2 (talk) 02:46, 14 April 2021 (UTC)
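
To make points 1 and 2 of the list above concrete, here is a minimal Python sketch (not the module's actual code, which is written in Lua). It assumes the input is already split into syllables and already in a rough phonemic form in which final -e has become /i/; the function name `apply_brazilian_rules` is hypothetical.

```python
import re


def apply_brazilian_rules(syllables):
    """Apply two of the proposed Brazilian rules to a list of phonemic syllables.

    Assumption: vowel raising (final -e to /i/) has already been applied,
    so palatalization only needs to look for a literal "i".
    """
    result = []
    for syllable in syllables:
        # Point 1: /t/ and /d/ become /tʃ/ and /dʒ/ before /i/.
        syllable = re.sub(r"t(?=i)", "tʃ", syllable)
        syllable = re.sub(r"d(?=i)", "dʒ", syllable)
        # Point 2: syllable-final /l/ vocalizes to /w/.
        syllable = re.sub(r"l$", "w", syllable)
        result.append(syllable)
    return result


print(".".join(apply_brazilian_rules(["noj", "ti"])))
print(".".join(apply_brazilian_rules(["fu", "nil"])))
```

Words that unexpectedly keep /t/ or /d/ before /i/ would need some way to block the first rule, which this sketch does not attempt.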

@Mahagaja Benwing2 (talk) 02:50, 14 April 2021 (UTC)
Ungoliant is very knowledgeable, but may not be able to respond promptly. Other active native speakers include @Munmula, Svjatysberega, Cpt.Guapo. @Ultimateria is not a native speaker, but is worth keeping in the loop, and may think of other people to ping as well. —Μετάknowledgediscuss/deeds 03:05, 14 April 2021 (UTC)
One minor thing I object to is defaulting EP to the Lisbon dialect. Not that tons of words will be affected, but I'd prefer it to appear under a dropdown link. I believe we should take a region-centric approach rather than a city-centric one; city dialects are subsets even if they're major cultural centers that create most of a country's media. Ultimateria (talk) 05:30, 14 April 2021 (UTC)
Thanks for pinging me, but I am really not knowledgeable (or Metaknowledgeable) enough to say much here. My knowledge of Portuguese phonology is limited to Brazilian as well, and even then only to what I learned in Portuguese class at the University of Texas over 30 years ago and what I notice Brazilian singers like Caetano Veloso, Gilberto Gil, Elis Regina etc. doing in their songs (and yes, I'm aware that singers don't necessarily pronounce words the same when singing as they do when speaking). —Mahāgaja · talk 06:37, 14 April 2021 (UTC)
A descriptive Brazilian pronunciation is so far removed from the orthography that it is questionable whether a template like this offers enough benefit to be worth the upkeep. I will post a list of several incongruities, but first I’ll make some comments and address your list. Still, given the mess that our transcriptions are in overall (much of it my fault), I’d support the template as long as the following conditions are met:
  • The template does not generate a transcription without a parameter if any potential ambiguity is detected
  • It is not added to pages by bot
  • Editors who are not familiar with Portuguese pronunciation be discouraged from adding the template to pages
But I’ll help regardless. Concerning Standard Brazilian Portuguese, it is not a ‘thing’ in the same way that for example Received Pronunciation is a thing. Transcriptions tagged as “Brazil” are those that lack features that have a limited regional scope. They include informal and unguarded pronunciations, as long as there is no feature that is considered regional. This means that they do not include the final /e/-/i/ distinction, nor should they include the /ti/ and /di/ of Northeastern BP. [ʃ] for the coda sibilant is less obvious; I do not include it because speakers trying to sound as formal as possible to a national audience will sometimes switch to [s], whereas the opposite does not occur, but I can see why someone would disagree. I have little advice to offer on European Portuguese and even less on African Portuguese.
Concerning your points:
  1. Always in noite, rede, devagar (?); optionally in teatro, dezoito (which also has [d͡zoj.tu], with no obvious underlying phonemic representation)
  2. Always
  3. See above
  4. [s] and [z] always match the voicing of what follows, including across word boundaries. Before pauses it’s [s]. Similarly for [ʃ] and [ʒ] but for most speakers it’s [z] across word boundaries where [ʒ] would be expected, and for many [ɦ] or similar before voiced consonants, especially in unguarded speech
  5. Should always be parenthesised as it is somewhat low in prestige and avoided in formal contexts. It’s probably best not to put it in the transcription of terms that are only used in formal contexts. It is also blocked by certain morpheme boundaries by most speakers (see -s)
  6. Shouldn’t be listed in transcriptions tagged as Brazil, as it is very regional and low in prestige. Note that this is a distinction, not a shift; júri and jure are minimal pairs in dialects where this occurs. If you’re curious: [1]; the caster uses standard pronunciation but lets slip a [kõˈfɛ.ɾe], the first interviewee uses mostly final [e]~[ɪ] and [ʊ] (but not in de).
  7. All options will be highly misleading if I’m being honest. I think /ʁ/ is a good compromise for the onset non-tap rhotic but /h/, /ɦ/ and /χ/ are all options I can live with.
  8. The best transcription of the coda rhotic is more debatable; for a long time I simply refused to add {{a|Brazil}} transcriptions to words with coda rhotics and added regional transcriptions with multiple variants as phonemic transcriptions (something I regret). Eventually I came to accept /ʁ/ for how well established it is; but I think /ɾ/ or even an abstract /r/ would be better options.
  9. The non-prescriptive pronunciation of unstressed, non-final <e> and <o> cannot be guessed from the spelling
  10. Not a fan of indicating this in phonemic transcription, even though it is admittedly widespread. com meu does not sound the same as comeu for at least some speakers. Also consider how some speakers have /ɔ/ before nasals in a handful of words (verb tomo, homem); would we then have a new phoneme /ɔ̃/ for these transcriptions?
  11. The sound of <nh> has been historically transcribed as /ɲ/; the push for /j̃/ is well-supported but I feel uncomfortable with it as I know from personal experience that a non-approximant is used by at least some speakers at least when speaking carefully. When I investigated the issue, I found the research commonly used to support the ubiquity of [j̃] on Wikipedia to have been too limited in scope. But I have no argument against it from a non-personal standpoint.
  12. Some words have /ow/ consistently (Douglas) or semi-consistently (Sousa); some have /ɔ/ in certain conditions (verb estouro). This cannot be guessed from the spelling. The /w/ (or /ʊ/ if we settle on that) should always be listed, as the careful, formal pronunciation always has it.
Ungoliant (falai) 14:19, 14 April 2021 (UTC)
@Ungoliant MMDCCLXIV Thank you very much for your detailed input. I've written or rewritten the modules for Russian, French, Spanish, Old English, Ukrainian and Belarusian. Generally, it's possible and useful to do so whenever the mapping from spelling to pronunciation is more or less predictable (in some cases with respelling hints). This includes e.g. French, where the spelling and pronunciation are radically different, but it's still mostly possible to map spelling to pronunciation (not the other way around). The only cases I've given up on are German (which should be possible to do but tricky and requiring a good deal of respelling hints to handle vowels properly) and English (which should still be possible, but it requires respelling of every word, hence maybe "our fathers brought forth a new nation" respelled as something like <our fáadharz braut forth a nue náishan>). In some cases the respelling is mandatory, in others there are defaults. For example, the Russian module requires that stress be indicated in words of more than one syllable that don't have the letter <ё>, since it's almost completely unpredictable, whereas palatalization before written <е> is the default, and in cases where it's absent (which happens in many loanwords but not in native words) you have to respell it as <э>. For Portuguese, I'd probably do the following:
  1. Definitely make it mandatory to mark the pronunciation of most stressed <e> and <o> as <é>/<ê> and <ó>/<ô>, and throw an error if the appropriate respelling isn't present or doesn't include an accent.
  2. On the other hand, I would default stressed <e> and <o> to high-mid /ẽ/ and /õ/ before nasals, since this is almost universally the case in Brazil and usually the case in Portugal. Similarly, I would default stressed <ei>, <eu>, <oi>, <ou> to have high-mid vowels in them, since the low-mid variants are usually marked in spelling.
  3. As for raising of pretonic <e> and <o>, my instinct is (at least for BP) to simply use <e> and <o> in respelling for /e/, /o/ and require all cases of /i/, /u/ to be respelled <i> and <u>. We could require that the respelling be explicitly provided in such cases, but that wouldn't prevent a lazy editor from just copying the spelling to the respelling. An alternative is to require that some symbol be explicitly added in respelling to signal that the specific respelling is in fact intended, e.g. <ẹ>/<ị> and <ọ>/<ụ>.
  4. In Russian we use the symbol <_> between two letters to indicate that the letters on either side should be pronounced separately, preventing any special handling involving the combination of the two. This would work, for example, to prevent <ti> from gaining /tʃ/ and stressed final <as> from gaining /(j)/. Hence, you might respell <teacher> as <t_ítcher> (I'm guessing here as to the pronunciation) and <nós> "knots" as <nó_s>.
  5. For words pronounced differently in Brazil vs. Portugal, where the difference isn't predictable, the module might support parameters |bp= and |ep= to indicate the appropriate respellings.

I'll respond in more detail to your comment below. Benwing2 (talk) 01:11, 15 April 2021 (UTC)
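
Points 1 and 2 of the proposal above could be checked roughly as follows. This is only a Python sketch under stated assumptions, not the module's code: the function name `stressed_mid_vowel_quality` is hypothetical, the caller is assumed to know which letter carries the stress, and "before a nasal" is approximated as "the next letter is m or n".

```python
import unicodedata

NASAL_LETTERS = {"m", "n"}
CLOSE_BY_DEFAULT_DIPHTHONGS = {"ei", "eu", "oi", "ou"}


def stressed_mid_vowel_quality(respelling, index):
    """Return "open", "close" or None for the stressed vowel letter at `index`.

    `index` refers to the NFC-normalized respelling. A ValueError is raised
    when a stressed <e> or <o> has no accent and no default applies.
    """
    word = unicodedata.normalize("NFC", respelling)
    letter = word[index]
    following = word[index + 1 : index + 2]  # empty string at end of word
    if letter in ("é", "ó"):
        return "open"
    if letter in ("ê", "ô"):
        return "close"
    if letter in ("e", "o"):
        # Defaults from point 2: close before a nasal and in <ei eu oi ou>.
        if following in NASAL_LETTERS:
            return "close"
        if letter + following in CLOSE_BY_DEFAULT_DIPHTHONGS:
            return "close"
        raise ValueError(
            "stressed <%s> in %r must be respelled with an accent" % (letter, word)
        )
    return None  # not a mid vowel, nothing to decide
```

With this sketch, `stressed_mid_vowel_quality("festa", 1)` raises an error, while the respelling `fésta` is accepted as open.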

Features that cannot be determined from the spelling

  • Height of non-nasal, unaccented <o> in stressed syllables: dor, gota, godo, coisa (/o/); moda, fole, pote (/ɔ/); poça (free variation); toda, choro (homographs)
  • Height of non-nasal, unaccented <e> in stressed syllables: cometa, negro (/e/); festa, guerra, quero (/ɛ/); tempero, medo (homographs)
  • /w/ for the <u> in <gue>, <gui>, <que> and <qui>
  • Strong vowels in unstressed syllables: cafezinho, sozinho, poveiro (EP only), caminha (in the diminutive, not in the verb); in BP this only occurs consistently in words with the diminutive, augmentative and adverbial suffixes, though there are exceptions; in EP it occurs in a lot more words
  • Secondary stress in words with morpheme boundaries: contrafilé, extraterrestre
  • Loanwords, in particular those that do not have any orthographical feature that marks them as foreign, such as core
  • Words that unexpectedly allow /t/ and /d/ before /i/ in BP: DJ, teacher, made in China, default; these are so rare that we could ignore them
  • The letter X can only be consistently assigned to /ʃ/ word-initially and perhaps after /j/. In all other places it could as easily be /s/ or /ks/
  • Whether <u> before a vowel (or <o> in some cases) should be /w/ or /u./ (or /o./)
  • Whether <i> or <e> before a vowel is /i./, /j/ or /e./
  • Word-final <o> after a vowel. The noun rio has /w/ and less commonly /.u/, the verb is always /.u/
  • Height of unstressed, non-final <e> in BP: perito (always /e/); penico (almost always /i/); feliz (varies); raising is more common when the following syllable has a stressed /i/ or /u/, and less common in formal words
  • Height of unstressed, non-final <o> in BP: coragem (/o/); comprimento, coberta (varies)
  • The stressed syllable of demonyms ending in <i> like somali and maori; the rules of orthography indicate word-final stress, but they are usually pronounced with stress on the penult. The headwords of Dicionário Aulete lend some prescriptive credibility to this practice
  • <s> after <n>: /s/ or /z/
  • <dj>: /d͡ʒ/ (azerbaidjano) or /d.ʒ/ (adjetivo) or /dʒ/ (Djalma)
  • intrusive /j/ before /s/: faz, Jesus, mas, pronoun nós; blocked by morpheme boundaries (más, noun nós) except in vocês

Addenda:

  • word-final rhotic: parenthesised if the word is a verb not ending in -or / -ôr
  • <h>: /h/ or nothing; I recommend defaulting to nothing word-initially and require respelling elsewhere (when not part of a digraph)

Ungoliant (falai) 16:13, 14 April 2021 (UTC)

@Ungoliant MMDCCLXIV Comments:

  1. <o> and <e> in stressed syllables: requires respelling to <ó>/<ô>, <é>/<ê>, as discussed above.
  2. /w/ in <gue>, <que>, <gui>, <qui>: I'd default these to have no /w/ and require respelling with <güe> etc. to indicate the /w/.
  3. "Strong vowels in unaccented syllables": what do you mean exactly by "strong vowels"? If the vowels are always of a single type, e.g. low-mid for <e>, <o>, we can handle this using a special symbol on these vowels, e.g. <ė>, <ȯ>.
  4. Secondary stress: Require respelling with <ò> etc. If the vowel is <e> or <o> not before a nasal, you'd also need to specify the quality, e.g. <é̀> (a bit awkward, admittedly).
  5. Loanwords simply need respelling.
  6. For <x>, I'd probably do the following: (1) Initial <x-> or <x> after vowel + <i> defaults to /ʃ/; (2) maybe, final <-x> defaults to /ks/, as in xérox; (3) other instances require respelling, e.g. exame respelled as <ezame>, trouxe respelled as <trousse>, axé respelled as <aché>.
  7. For unstressed <u> before vowel: From the examples I can find, e.g. cuecas, anual, ruim, suíça, it looks like they should default to /u/ in hiatus, requiring cases of /w/ to be respelled with <w>. Alternatively, if this isn't very predictable, there could be no default, requiring spelling either as <w> or as <u.>.
  8. For unstressed <i> before vowel: Similarly, using <y> to force a semivowel.
  9. For unstressed <e> before vowel: this should default to /e/, I think.
  10. Word final <o> after a vowel: Should default to /.u/, I think. Use /w/ to force a semivowel.
  11. Height of unstressed pre-tonic <e>, <o>: See above, I think these should default to /e/, /o/; use <i>, <u> to force high vowels.
  12. Demonyms like somali, maori: Use an accent in respelling; the default will generate word-final stress.
  13. <ns>: Either default to /ns/, with /nz/ requiring respelling, or require respelling in all cases as either <nc>, <nç> or <nz>.
  14. <dj>: What is the exact difference between /d͡ʒ/, /d.ʒ/ and /dʒ/? Does /d.ʒ/ actually mean [dʒiʒ] (in which case it should be respelled as <dij>)?
  15. Intrusive /j/ before /s/: Should be the default, as indicated above; to force this not to happen, put an <_> between the vowel and the <s>.

Benwing2 (talk) 01:50, 15 April 2021 (UTC)
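
As an illustration of point 6 above (defaults for <x>), here is a small Python sketch. It is not the module's implementation; the name `default_x` and the vowel inventory are assumptions made for the example, and the "maybe" default for final <-x> is treated as if it had been adopted.

```python
import unicodedata

# Assumed set of vowel letters for the "vowel + <i>" test.
VOWEL_LETTERS = set("aeiouáéíóúâêôãõ")


def default_x(respelling, index):
    """Return the default phonemes for the <x> at `index`, or raise ValueError.

    `index` refers to the NFC-normalized, lowercased respelling.
    """
    word = unicodedata.normalize("NFC", respelling).lower()
    if word[index] != "x":
        raise ValueError("no <x> at position %d in %r" % (index, word))
    # (1) Initial <x-> defaults to /ʃ/.
    if index == 0:
        return "ʃ"
    # (1) <x> after vowel + <i> also defaults to /ʃ/.
    if index >= 2 and word[index - 1] == "i" and word[index - 2] in VOWEL_LETTERS:
        return "ʃ"
    # (2) Final <-x> defaults to /ks/ (marked "maybe" in the proposal).
    if index == len(word) - 1:
        return "ks"
    # (3) Anything else has no default and needs a respelling.
    raise ValueError("<x> in %r has no default; please respell it" % word)
```

Under these assumptions exame is rejected until it is respelled (for example as <ezame>), whereas both instances of <x> in xérox get a default.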

  1. Agreed
  2. I could live with it; but keep in mind that this is something that will reflect poorly on Wiktionary whenever we get it wrong
  3. By strong vowels I mean /ɛ/, /ɔ/ and mandatory /ɐ/. These only occur in unstressed syllables in specific circumstances (note: unstressed <a> before nasals is in free variation between [a] and [ɐ], affected by dialect, lexical stress and guardedness); for BP, it can be assumed to be a possibility in words ending in -mente, -inho, -inha, -inhos, -inhas, -ão, -ões, -ona and -onas. Until 50 years ago, the first two cases were spelt with ` (sòzinho, cafèzinho)
  4. Come to think of it, probably the best way to deal with secondary stress is to represent the terms as multiple words, e.g. {{pt-pronunc|contra-filé}}, {{pt-pronunc|êstra-terréstre}}
  5. Agreed
  6. Agreed, but if we use <ch> to respell /ʃ/, these respellings must not be used for the dialect of northern Portugal
  7. Any option is tolerable as there is almost always some degree of free variation
  8. Excellent idea
  9. Not sure
  10. I’d rather always require the respelling here
  11. Perfect
  12. Agreed
  13. I recommend defaulting to /◌̃z/ in words with trans- and /◌̃s/ elsewhere
  14. d͡ʒ is a single phoneme, found in jazz and in {{a|Brazil}} dia. /dʒ/ and /d.ʒ/ are two phonemes pronounced separately, one after the other. As written, they only occur in careful pronunciation; one also hears the breaking you describe (i.e. a-di-je-ti-vo, di-jal-ma or de-jal-ma)
  15. Consider how many entries are plurals, where this is a concern. If we do default, it’s better to default to no intrusive /j/, as that is guaranteed to be correct at least in the formal pronunciation.
Ungoliant (falai) 02:53, 15 April 2021 (UTC)
@Ungoliant MMDCCLXIV Does your addendum above mean that in non-verbs like mar and âmbar, as well as verbs in -or (e.g. por), the final <r> is always pronounced, even in dialects where it sounds as [h]? Also, what is the pronunciation of coda <r> in words like porto, carta in São Paulo? Is it still [ɾ] or is it increasingly being pronounced as [h] or [χ], as in Rio, Belo Horizonte, Salvador, etc.? Benwing2 (talk) 04:23, 15 April 2021 (UTC)
That’s right. R-dropping in other words should not be indicated in pronunciations tagged with (Brazil). Coda R in São Paulo is [ɾ], [ɹ] or [ɻ]. [ɾ] is primarily used in the metropolis and the other two primarily in the rest of the state. [ɾ] is the prestige variety and people trying to improve their social status will often switch to it; [ɻ] in particular is stigmatised as r caipira (hillbilly r). Some older speakers use [r] as an emphatic variant of [ɾ]. I’ve never heard of people from São Paulo state using a fricative for it; it might exist in border areas.

getting fixed

(Notifying Ungoliant MMDCCLXIV, Daniel Carrero, Jberkel): @Metaknowledge, Munmula, Svjatysberega, Cpt.Guapo, Ultimateria The first version of the new module is implemented. You can see it in action so far at Module:pt-pronunc/testcases and User:Benwing2/test-pt-IPA. It currently can generate up to four variants (Rio, São Paulo, Lisbon and non-Lisbon Portugal). Here, "non-Lisbon Portugal" represents standard Central Portugal speech minus lowering of /e/ -> /ɐ/ before palatals. It still needs some work and a whole lot of test cases need to be added. You can see from User:Benwing2/test-pt-IPA that you can specify country-specific and variant-specific respellings, and if there are two separate per-country variants (or conceivably more than two, if/when they exist), one of them is hidden behind a "More" button. Currently, the "closed" form of the display shows Lisbon and São Paulo speech and will expand to show São Paulo + Rio and Lisbon + non-Lisbon Portugal. I'm not totally happy with this solution in particular for Brazil, and I'm thinking of creating a "composite Brazilian" variant that is similar in most respects to São Paulo but where coda <r> is /ʁ/ [h] rather than São Paulo /ɾ/ or Rio /ʁ/ [χ]. Is there a specific speech community that speaks this way (e.g. Belo Horizonte)? I will document more and add more test cases in the next day or so. Benwing2 (talk) 06:39, 25 April 2021 (UTC)

As for your question about [h], yes, this is the general pronunciation found in Belo Horizonte. It is also widespread in large parts of the North and Northeast regions, and also in Brasilia (my city). Svjatysberega (talk) 07:02, 25 April 2021 (UTC)

some more questions

(Notifying Ungoliant MMDCCLXIV, Daniel Carrero, Jberkel, Svjatysberega, Cpt.Guapo, Munmula): @Munmula, Svjatysberega, Cpt.Guapo Some questions for native speakers:

  1. Final -x as in xérox, cóccix: Is this always [ks] or is it [kʃ] in Rio and Portugal?
  2. In Brazil, final -r when not dropped, before a word beginning with a vowel: Is this always /ɾ/ or can it be /ʁ/? E.g. por um lado, por enquanto, mar azul, âmbar em madeira, pôr um prato na mesa, etc.
  3. Pronunciation of unstressed por: I gather this is [puɾ] in Portugal, what about Brazil? Always with /o/, always with /u/ or it depends?
  4. Pronunciation of unstressed <e> or <i> in hiatus: I gather this can be variably /i/ or /j/ in Brazil. Is it always or regularly /j/ in Portugal? E.g. tédio, vocabulário, silêncio, presentear, recrear, passear, teatro, Timóteo. If so, I'll make this the default in Portugal, requiring a respelling such as 'tédi.o' to avoid it.
  5. Same question for unstressed <u> in hiatus in Portugal: atuar, continuar, luar, visual, linguiça, etc.
  6. Diphthongs in narrow phonetic notation: I am indicating them as e.g. /aj/, /oj/, /ew/ in phonemic notation: meu /mew/, noite /ˈnoj.t͡ʃi/ (Brazil), saia /ˈsaj.ɐ/. Should we do the same in phonetic notation or use something narrower like /aɪ/, /aɪ̯/ or /ai̯/?
  7. Diphthongs in hiatus: As indicated above, saia becomes /ˈsaj.ɐ/. Is that right or is /ˈsa.jɐ/ better?
  8. <ou>, <ei>: I made the module indicate written <ou> as /o(w)/, but indicate written <ei> as /ej/. Does that make sense or should <ei> also be written /e(j)/ or similar?
  9. as gentes in Brazil: I gather this is /a ˈʒẽ.t͡ʃiʃ/ in Rio, but is it /az ˈʒẽ.t͡ʃis/ or /a ˈʒẽ.t͡ʃis/ in São Paulo? I have assumed the latter but Wikipedia isn't clear on this.
  10. <ld> in Portugal e.g. aldeia, humilde: Wikipedia claims that written <b> <d> <g> are pronounced as fricatives unless they're at the beginning of an utterance or after a nasal, but in Spanish, <d> is a hard [d] in <ld>. Is this also true in Portugal? The existing transcriptions in Wiktionary are inconsistent.
  11. Final <-ing>: It seems like final <-ing> in English loanwords is /ĩ/ in Brazil, but what about Portugal? I see some pronunciations like shopping /ˈʃɔ.pĩŋ/ and xing ling /ʃiŋ.liŋ/, could this be the norm in Portugal?
  12. Initial unstressed <em->/<en-> in Brazil: I see pronunciations like emprego /ĩ.ˈpɾe.ɡu/|/ẽ.ˈpɾe.ɡu/, empório /ĩ.ˈpɔ.ɾju/, embolar /ĩ.bo.ˈla(ɾ)/|/ĩ.boˈla(χ)/, enxágue /ĩ.ˈʃa.ɡwi/ indicated for Brazil, whereas the Portugal pronunciations always have /ẽ-/. Is this /ĩ-/ the norm in Brazil?
  13. culposamente, certamente, completamente, rapidamente, etc.: Is the <a> before -mente pronounced as [a] or [ɐ] in Brazil? I know an <e> in this position is pronounced /i/ as in e.g. freqüentemente, as if word-final.

Thanks! Benwing2 (talk) 16:46, 25 April 2021 (UTC)

  1. /k͡s/, both in Rio and Portugal.
  2. Always /ɾ/.
  3. /puɾ/ in Portugal, /puʁ/ in Brazil.
  4. In Portugal, "io" and "eo" are /ju/, "ea" is /i.ˈa/ → /ˈtɛ.dju/, /vu.kɐ.bu.ˈla.ɾju/; /ti.ˈa.tɾu/, /pɐ.si.ˈaɾ/.
  5. "ua" is /wa/ in Portugal and /u.ˈa/ in Brazil; "ui" is /wi/ in both.
  6. /aj/, /ew/ and /ow/ in broad transcription, /aɪ̯/, /eʊ̯/, /oʊ̯/ in narrow transcription.
  7. /ˈsaj.ɐ/.
  8. It is /e(j)/ only before /ɾ/: /paʁ.ˈse(j).ɾu/, /pɐɾ.ˈse(j).ɾu/ (Portugal).
  9. /az.ˈʒẽ.t͡ʃis/ is the standard in São Paulo, but in careless speech the /z/ is dropped.
  10. Indeed /ɫd/, never lenited /ɫð/.
  11. I might be wrong, but it actually seems to be /ĩɡ/.
  12. /ĩ/ is the norm throughout the country, it is only /ẽ/ in careful speech, but sounds kind of unnatural.
  13. More often /a/; only a few speakers use /ɐ/ (not specific to any region; it is a personal preference of some people). And yes, it is /i/ in frequent-e-mente. Svjatysberega (talk) 18:41, 25 April 2021 (UTC)

I would like to add more about the pronunciation of guttural R /ʁ/; it was addressed earlier, but I did not follow that discussion.

Onset /ʁ/, as in "rio" and "carro":

  1. In São Paulo and the South, it is uvular /χ/ or /ʁ/.
  2. In Rio de Janeiro, it is velar /x/ or /ɣ/.
  3. It is also velar in Espírito Santo.
  4. In Minas Gerais, it is glottal /h/ or /ɦ/.
  5. It is also glottal in the Northeast, North and Central-West regions.

The voiced and voiceless fricatives overlap each other in onset position and there is no rule that determines which one is used; it simply depends on the speaker. I pronounce it as voiced, "rio" /ˈɦiw/ and "carro" /ˈka.ɦu/; but there are people around me who say /ˈhiw/ and /ˈka.hu/. This applies to all variants /ʁ/-/χ/, /ɣ/-/x/, /ɦ/-/h/. I think it is better to use the voiced ones in broad transcription.

Coda /ʁ/, as in "mar", "porta", and "guarda":

  1. In metropolitan São Paulo, it is /ɾ/.
  2. In countryside São Paulo, it is /ɹ/.
  3. In Rio, it is /x/, but /ɣ/ before voiced consonants.
  4. In the South, it is /χ/, but /ʁ/ before voiced consonants.
  5. In the Northeast and North regions, it is /h/, but /ɦ/ before voiced consonants.
  6. In the Central-West region, it is either /h/ or /ɹ/ in urban areas, /ɹ/ or /ɻ/ in rural areas.

When a fricative is used, it assimilates in voicing to the next consonant: porta /ˈpɔh.tɐ/, guarda /ˈɡwaɦ.dɐ/. When there is no following consonant, both can be found: mar /ˈmah/, /ˈmaɦ/.
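
The regional coda realizations and the voicing assimilation just described can be summarized in a small lookup. This is only an illustrative Python sketch: the region keys, the set of voiced consonants and the function name `coda_rhotic` are assumptions made for the example, and the Central-West region is left out because several realizations were listed for it.

```python
# (voiceless realization, realization before a voiced consonant) per region.
CODA_RHOTIC = {
    "sao_paulo_metro": ("ɾ", "ɾ"),
    "sao_paulo_countryside": ("ɹ", "ɹ"),
    "rio": ("x", "ɣ"),
    "south": ("χ", "ʁ"),
    "northeast_north": ("h", "ɦ"),
}

# Assumed inventory of voiced consonants that can follow a coda rhotic.
VOICED_CONSONANTS = {"b", "d", "ɡ", "g", "v", "z", "ʒ", "m", "n", "ɲ", "l", "ʎ"}


def coda_rhotic(region, following=None):
    """Pick the coda rhotic for `region` given the next consonant, if any.

    With no following consonant both fricatives are reported as possible;
    this sketch arbitrarily returns the voiceless one in that case.
    """
    voiceless, voiced = CODA_RHOTIC[region]
    if following in VOICED_CONSONANTS:
        return voiced
    return voiceless


# porta and guarda as described for the Northeast and North:
print("ˈpɔ" + coda_rhotic("northeast_north", "t") + ".tɐ")
print("ˈɡwa" + coda_rhotic("northeast_north", "d") + ".dɐ")
```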


I still recommend transcribing all with /ʁ/, and leaving the other options to regional pronunciations:

  1. rio /ˈʁiw/
  2. carro /ˈka.ʁu/
  3. mar /ˈmaʁ/
  4. porta /ˈpɔʁ.tɐ/, narrow [ˈpɔχ.tɐ]
  5. guarda /ˈɡwaʁ.dɐ/

@Benwing2 Svjatysberega (talk) 20:06, 25 April 2021 (UTC)

@Svjatysberega Thanks for the detail! This is great and will help the module a lot. I've been adding testcases to Module:pt-pronunc/testcases but there are more to add and some of the existing ones are failing. As for <r>, so far I've used /ʁ/ for the coda except for the São Paulo pronunciation, where it seems a bit strange to me to use /ʁ/ for [ɾ] given that /ɾ/ is also a phoneme. I wonder what others think. My experience with Portuguese is primarily from spending time in Salvador so I am most familiar with [h], and [ɦ] before voiced consonants. It actually sounds to me like there's a slight [ᵊ] after [ɦ] in words like guarda but it's not clear we need to indicate that. (Also for /ɔ/, /ɛ/ I actually hear something like [oɔ̯], [eɛ̯].) Benwing2 (talk) 20:36, 25 April 2021 (UTC)

@Benwing2 Keep it this way: use coda /ʁ/ for general Brazilian and /ɾ/ for São Paulo specifically, for instance:

It's true that there is a [ᵊ] after [ɦ]. As for /ɛ/ and /ɔ/, I have never heard about that nor noticed it in others' speech; maybe it's typical of Salvador only. There's no need to indicate any of them ([ɦᵊ], [eɛ̯], [oɔ̯]) because it's superfluous.

^ Svjatysberega (talk) 22:12, 26 April 2021 (UTC)

@Svjatysberega, Ungoliant MMDCCLXIV Thanks! I have a few more questions:

  1. You mentioned that final -r before vowels e.g. in por enquanto, mar azul is always /ɾ/ even in Brazil. What about final -r of verb forms before vowels, e.g. in amar uma garota, fazer a coisa? Would it be /(ɾ)/ in Brazil?
  2. Final -io. The current transcriptions are totally inconsistent in how this is rendered, esp. for Brazil. For example, just considering unstressed -rio, we have território given as only /te.ʁi.ˈtɔ.ɾju/, but transitório given as either /ˌtɾɐ̃.zi.ˈtɔ.ɾi.u/ or /ˌtɾɐ̃.zi.ˈtɔ.ɾju/. Similarly for unstressed -dio, we have tédio given as either /ˈtɛ.d͡ʒi.u/ or /ˈtɛ.d͡ʒju/, but médio given as any of /ˈmɛ.d͡ʒi.u/, /ˈmɛ.d͡ʒju/ or /ˈmɛ.dʒiw/, and subsídio given as either /sub.ˈsi.dʒju/ or /sub.ˈsi.dʒiw/, while sódio is given as /ˈsɔ.d͡ʒu/ (?) or /ˈsɔ.d͡ʒju/. With stressed -io it's no better: baldio is given as only /baw.ˈdʒi.u/, while tardio is given as "Brazil" only /taʁ.ˈd͡ʒiw/ but "Paulista" either /taɹ.ˈd͡ʒi.u/ or /taɹ.ˈd͡ʒiw/. The Portugal pronunciations are a bit more consistent but not perfect, e.g. próprio is given as /ˈpɾɔ.pɾju/, but proprietário is given as /pɾu.pɾi.ɛ.ˈta.ɾju/ with unresolved hiatus for the <ie> (and /ɛ/, which seems strange). I gather in some cases there may be differences depending on part of speech, e.g. rio "river" is given as Portugal /ˈʁi.u/, Brazil /ˈʁiw/, but rio "I laugh" is given as Portugal/Brazil /ˈʁi.u/. But the current transcriptions are extremely messy; can you give some pointers as to what's actually going on and how I should render words with final -io in them? Note that you can always force different pronunciations through respelling, e.g. i.o for the two-syllable variant, iw or yu for the one-syllable variants, so the issue is what should be done by default.
  3. Is there any phonetic difference between final -u in a diphthong and final -l in Brazil? I gather that mau "bad" and mal "badly" are pronounced exactly the same, and that pariu and funil may be exact rhymes, but what about bolsa vs. bouça? I assume these are not the same because /w/ from <l> cannot be dropped whereas /w/ in <ou> can. Also, what is the correct phonetic rendering of mal, funil, bolsa: are [maʊ̯], [fu.niʊ̯], [boʊ̯.sɐ] correct? Finally, what is the correct phonetic rendering of /uw/ e.g. in Raul?
  4. You mentioned that <eir> is pronounced /e(j)ɾ/ with optional /j/. What about in Lisbon, where e.g. beira becomes /ˈbɐj.ɾɐ/? I take it the /j/ is mandatory here?
  5. You mentioned above that <ea> is /i.ˈa/ in Portugal; I take it that applies only when the <a> is stressed? E.g. fêmea /ˈfe.mjɐ/, área /ˈa.ɾjɐ/, etc. In general any help/rules you can give on how to handle hiatuses with written e/i/o/u + vowel would be greatly appreciated, including cases where the first vowel is stressed, the second vowel is stressed, or neither vowel is stressed.
  6. Secondary stress: Sometimes secondary stress is indicated in the current transcriptions. As with so much else, it seems quite random, e.g. diminuir /d͡ʒi.ˌmi.nu.ˈi(ʁ)/ but distribuir /ˌd͡ʒis.tɾi.bu.ˈi(ɻ)/. Sometimes the same word is given with secondary stress in Brazil but not in Portugal, e.g. destruidor Brazil /ˌdes.tɾu.i.ˈdoɻ/, Portugal /dɨʃ.tɾu.i.ˈðoɾ/. I gather that secondary stress is important in some cases, e.g. in rapidamente stressed as ràpidamênte, which results e.g. in Portugal /ˌʁa.pi.ðɐˈmẽ.tɨ/, with /a/ not /ɐ/ in the first syllable due to the secondary stress. But in the other words above I wonder if it's real. Any comments on how secondary stress works?
  7. (added) Initial unstressed e- in Brazil, e.g. está, exame, excelente, espanhol, espelho, escolha, errar: The transcriptions are inconsistent in whether and under what circumstances this is raised to /i-/. I gather this is normal in Rio in <esC...> words, e.g. estar, espelho, but what about other e- words and what about outside of Rio?
  8. (added) How is /l/ pronounced in Portugal at the end of a word before a vowel, as in mal e mal, azul e branco? Is it [l] or [ɫ]?
  9. (added) Final -n in Portugal: I gather this is /n/, but causes a nasal vowel in Brazil. What is the quality of the vowel preceding /n/? The transcriptions are inconsistent for final <-on>, with bóson /ˈbɔ.son/ vs. most others in /-ɔn/, e.g. fóton /ˈfɔ.tɔn/, próton /ˈprɔ.tɔn/, íon /ˈi.ɔn/, etc. Similarly, I'm guessing final <-en>, e.g. in abdómen, flúmen, glúten, regímen, is /ɛn/. What about final <-an>? The only transcribed example is Renan /ʁɨ.ˈnɐn/, which is stressed.
  10. (added) How is the vowel before /z/ pronounced in Brazil in audazmente, atrozmente, eficazmente, ferozmente, etc. Is there an intrusive /j/?

Thank you for all your help. Benwing2 (talk) 04:48, 27 April 2021 (UTC)Reply

@Benwing2

  1. Yes, /(ɾ)/.
  2. "io" is /ju/ when unstressed and /ˈiw/ when stressed. /i.u/ and /ˈi.u/ are only used in very careful speech by a few speakers. The messy transcriptions were caused by many editors adding different pronunciations; I admit I contributed to this some time ago and have not corrected it.
  3. No difference, "mau" and "mal" are both /ˈmaw/ [ˈmaʊ̯]. "ul" is /uw/ [uʊ̯].
  4. When /ej/ becomes /ɐj/, the /j/ is mandatory, but when it does not, it is optional: /ˈbɐj.ɾɐ/, /ˈbe(j).ɾɐ/.
  5. Yes, only when the <a> is stressed. I can write you a list of hiatuses and their pronunciations.
  6. Secondary stress should be indicated in four-syllable words with stress on the last syllable, as in /d͡ʒis.ˌtɾi.bu.ˈi(ʁ)/, /su.ˌpe.ɾi.ˈo(ʁ)/, and in five-syllable words with stress on the last or penultimate syllable: /ʁe.ku.ˌpe.ɾa.ˈsɐ̃w̃/, /ĩ.ˌte.li.ˈʒẽ.t͡ʃi/. It falls two syllables before the stressed one, but in the case of adverbs from adjectives, it is the stressed syllable of the adjective root, as /ˌʁa.pi.da.ˈmẽ.t͡ʃi/.
  7. Unstressed es- should be /is/ because it is the most common everywhere, other initial "e's" should be always /e/, /e.ˈzɐ.mi/, /e.se.ˈlẽ.t͡ʃi/, /e.ˈʁa(ʁ)/.
  8. /l/.
  9. -on is /ɔn/, -en is /ɛn/, -an is /an/. (In Brazil, /õw̃/, /ẽj̃/, /ɐ̃/).
  10. The /j/ is not present in the adverbial form.

I'd be thankful if you gave me examples of words with the hiatuses you mentioned (e/i/o/u + vowel, stressed-unstressed, unstressed-stressed, unstressed-unstressed), so I will just have to tell you how they are pronounced. Svjatysberega (talk) 09:41, 28 April 2021 (UTC)Reply
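
The secondary-stress rule in item 6 of the reply above can be sketched in Python as follows. This is only an illustration, not the module's Lua code: the names `secondary_stress_index` and `mark_stress` are hypothetical, syllables are assumed to be already split, and the adverb exception (secondary stress on the adjective root, as in rapidamente) is deliberately not handled.

```python
from typing import List, Optional


def secondary_stress_index(n_syllables: int, primary: int) -> Optional[int]:
    """Return the 0-based index of the secondary-stress syllable, or None.

    Rule as stated in the discussion: four-syllable words stressed on the
    last syllable, and five-syllable words stressed on the last or the
    penultimate syllable, get secondary stress two syllables before the
    primary stress. Other shapes are left unmarked in this sketch.
    """
    last = n_syllables - 1
    if n_syllables == 4 and primary == last:
        return primary - 2
    if n_syllables == 5 and primary in (last, last - 1):
        return primary - 2
    return None


def mark_stress(syllables: List[str], primary: int) -> str:
    """Join syllables with dots, adding primary and secondary stress marks."""
    secondary = secondary_stress_index(len(syllables), primary)
    out = []
    for i, syl in enumerate(syllables):
        if i == primary:
            out.append("ˈ" + syl)
        elif i == secondary:
            out.append("ˌ" + syl)
        else:
            out.append(syl)
    return ".".join(out)
```

For instance, `mark_stress(["su", "pe", "ɾi", "o"], 3)` places the secondary stress on "pe", in line with the /su.ˌpe.ɾi.ˈo(ʁ)/ example above.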

@Svjatysberega I have added a "General" Brazilian style. I don't know if this makes any sense; see User:Benwing2/test-pt-IPA for a few examples. "General" Brazilian is an attempt to represent a compromise accent that uses [s]/[z] along with /ʁ/ [h] in codas. I know that there isn't exactly such a thing as "General" Brazilian but I want to indicate that coda /ʁ/, pronounced as [h] or [χ] or similar, is the most common pronunciation of written 'r' in codas. An alternative is to not include the phonetic [...] version for "General" Brazilian, but then I'd want to include a specific accent that has coda [s]/[z] and [h] (or maybe [χ]) along with the Rio and São Paulo accents, and list it first of the three as most representative of a compromise accent. I originally thought of labeling this as "Belo Horizonte" or "Central Brazil" (or "Brasilia"?) but I don't know if this is accurate. I will include a bunch of examples of hiatuses tomorrow. Benwing2 (talk) 04:52, 30 April 2021 (UTC)Reply

@Benwing2 It's better to completely replace /ʁ/ with /h/ and not add a phonetic transcription:

Svjatysberega (talk) 04:39, 4 May 2021 (UTC)Reply

hiatuses[edit]

@Svjatysberega Words with hiatuses where the first vowel is i:

  1. ia:
    1. Both unstressed, word-final: abundância, agência, ambulância, etc.; acurácia, águia, Albânia, angústia, ânsia, Antuérpia, Amália, Amazônia, Islândia, etc.
    2. Both unstressed, pre-tonic: apreciador, invariavelmente, inviabilidade, maquiavélico, mediação, etc.
    3. Second vowel stressed: apreciar, apropriado, arrepiado, judicial, Juliana, miasma, mundial, etc.
  2. iã:
    1. Second vowel stressed: camião, cirurgião, guião, lampião, legião, opinião, ocasião, união; sutiã
  3. ie:
    1. Both unstressed, word-final: cárie, espécie, série, superfície, etc.
    2. Both unstressed, post-tonic: alien
    3. Both unstressed, pre-tonic: alienar, científico, experienciar, parietal, piedade, proprietário, sociedade, Vietname, etc.
    4. Second vowel stressed: quieto, recipiente, rapieira, Trieste, viela, vieira, Xavier, etc.
  4. io:
    1. Both unstressed, word-final: absíntio, advérbio, alumínio, Anastásio, egípcio, escritório, exercício, genocídio, impróprio, etc.
    2. Both unstressed, pre-tonic: adicional, agiotagem, fisionomia, funcionar, idiossincrasia, marionete, ociosidade, piorar, ultravioleta, violão, etc.
    3. Second vowel stressed: estudioso, piolho, pior, (ele) pressiona, quiosque, superior, viola, etc.
    4. First vowel stressed: rio, tardio, vazio
  5. iõ:
    1. Second vowel stressed: religiões and similar plurals of nouns in -ão
  6. iu:
    1. Both unstressed, post-tonic: médium
    2. Both unstressed, pre-tonic: diurese, friulano, poliuria, triunfal
    3. Second vowel stressed: oriundo, triunfo
    4. First vowel stressed: abiu, pariu, viu

Benwing2 (talk) 05:19, 1 May 2021 (UTC)Reply

@Svjatysberega Words with hiatuses where the first vowel is u:

  1. ua (not including qua, gua):
    1. Both unstressed, word-final: Mântua, tábua
    2. Both unstressed, pre-tonic: atualmente, continuamente, persuadir, situação, etc.
    3. Second vowel stressed: arruaça, bissexual, carruagem, continuar, desjejuar, evacuar, Luanda, suave, tatuagem, usual, virtual, etc.
  2. uã (not including quã, guã):
    1. Second vowel stressed: Irapuã, Itapuã (this is in Salvador and appears in several songs), muçuã, puã, ruão
  3. ue (not including que, gue, qüe, güe):
    1. Both unstressed, word-final: ténue
    2. Both unstressed, pre-tonic: consuetudinário, crueldade, fueguino, influenciar, pueril, etc.
    3. Second vowel stressed: afluente, cruel, cuecas, cueiro, duende, embu-guaçuense, Noruega, silhueta, sueco, etc.
  4. ui (not including qui, gui, qüi, güi):
    1. Both unstressed, pre-tonic: cuidar, cuidadoso, distribuição, fruiteira, juizado, poluição, suicídio, vacuidade, etc.
    2. Second vowel stressed: fuinha, gratuito, juiz, possuir, ruim, substituir, etc.
    3. First vowel stressed: chuiva, fui, ruivo, uivo, etc.
  5. uo (not including quo, guo):
    1. Both unstressed, word-final: árduo, contínuo, indivíduo, inócuo, melífluo, mútuo, promíscuo, vácuo, etc.
    2. Both unstressed, pre-tonic: duodécimo, fluoração, luxuosamente, sinuosidade, vacuolado, virtuosamente
    3. Second vowel stressed: impetuoso, luxuoso, monstruoso, suor, virtuoso
    4. First vowel stressed: luo?, recuo?

Benwing2 (talk) 18:55, 1 May 2021 (UTC)Reply

@Sarilho1 Hi. I see you are active and are a native Portuguese speaker. Perhaps you can help; I'm creating a module to generate Portuguese pronunciation from spelling. The stuff just above is a request for how to pronounce words with hiatuses in them. I gather that in most of these words in European Portuguese, the i or u is a glide /j/ or /w/, but I'm not sure if that applies to all of these words. Benwing2 (talk) 20:24, 1 May 2021 (UTC)Reply
Hi, Benwing2. I'm indeed a European Portuguese native speaker, but I don't quite understand what you need me to do. I'm not a linguist; my edits on pronunciation are based on what's stated in Infopédia, compared with what I would expect from the content already in Wiktionary. I can provide you with sound samples, if that helps, but otherwise I don't think I can help you, though if there is something else you have in mind, I will try to help the best I can. I sincerely hope you succeed in the project. - Sarilho1 (talk) 15:24, 2 May 2021 (UTC)Reply
That being said, I'm not sure about the category of Luanda. I would say it would be in the same category of the words with uã. Also, the 'ue' in duende, silhueta, and sueco, for instance are pronounced in three different ways. - Sarilho1 (talk) 15:29, 2 May 2021 (UTC)Reply
@Sarilho1 Thanks. I agree that Luanda probably goes with uã, and that in duende, silhueta, and sueco, the e is pronounced differently in each case. What I'm curious about is the pronunciation of the i and u in each case above: is it pronounced as i/u (i.e. it occupies a separate syllable), or is it pronounced as y/w (i.e. it is what linguists call a glide, and does not occupy a syllable)? For example, the u in guarda is pronounced like w and does not occupy a syllable (there are two syllables, guar-da), whereas I'm pretty sure the u in a word like Itapuã does occupy a separate syllable (there are four syllables, I-ta-pu-ã). For many of the words above, I'm not sure about the syllable division, and I'm trying to see if there are patterns depending on the spelling and/or the position of the stress. Any help you can give would be much appreciated. Benwing2 (talk) 20:52, 2 May 2021 (UTC)Reply


@Benwing2

  1. ia
    1. /jɐ/
    2. /i.a/; /ja/ in Portugal
    3. /i.ˈa/ (but Juliana is /i.ˈɐ/ because of the nasal consonant); /ˈja/ in Portugal
    1. (iã) /i.ˈɐ̃/; /ˈjɐ̃/ in Portugal
  2. ie
    1. /(j)i/
    2. /je/
    3. /i.e/; /je/ in Portugal
    4. /i.ˈe/ or /i.ˈɛ/ (viela) but quieto is an exception, it is /ˈkjɛ/
  3. io
    1. /ju/
    2. /jo/
    3. /i.ˈo/ or /i.ˈɔ/ (pior, viola, quiosque); /ˈjo/ or /ˈjɔ/ in Portugal
    4. /ˈiw/
    1. (iõ) /i.ˈõ/; /ˈjõ/ in Portugal
  4. iu
    1. /ju/
    2. /i.u/; /ju/ in Portugal
    3. /i.ˈu/; /ˈju/ in Portugal
    4. /ˈiw/
  5. ua
    1. /wɐ/
    2. /u.a/; /wa/ in Portugal
    3. /u.ˈa/, Luanda is /u.ˈɐ̃/; in Portugal /ˈwa/ or /ˈwɐ̃/ (Luanda)
    1. (uã) /u.ˈɐ̃/
  6. ue
    1. /wi/
    2. /u.e/; /we/ in Portugal
    3. /u.ˈe/ or /u.ˈɛ/ (cruel, cuecas, Noruega, sueco); /ˈwe/ or /ˈwɛ/ in Portugal
  7. ui
    1. /uj/
    2. /u.ˈi/, but gratuito is an exception, it is /ˈuj/ (in Brazil and Portugal); /ˈwi/ in Portugal (but juiz and ruim are /u.ˈi/)
    3. /ˈuj/
  8. uo
    1. /(w)u/
    2. /u.o/ or /u.ɔ/ for luxuosamente and virtuosamente (because adjectives ending in "-oso" are /ˈo.zu/ in the masculine singular form but /ˈɔ.zus/ /ˈɔ.zɐ/ /ˈɔ.zɐs/ in the other forms); /wo/ in Portugal
    3. /u.ˈo/ or /u.ˈɔ/ (suor); /ˈwo/ in Portugal
    4. /ˈu.u/

@Benwing2 Sorry for replying so late Svjatysberega (talk) 04:38, 4 May 2021 (UTC)Reply
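
The answers above amount to a lookup from (spelling, stress position) to a Brazilian and a European rendering, which can be sketched in Python as a small table. This is a hedged illustration only: the names `HIATUS`, `render_hiatus` and `is_glide`, and the position labels, are hypothetical. Only rows for which the reply states both renderings explicitly are included.

```python
from typing import Dict, Tuple

# (spelling, position) -> (Brazil, Portugal), copied from the reply above.
HIATUS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("ia", "pretonic"): ("i.a", "ja"),
    ("ia", "second_stressed"): ("i.ˈa", "ˈja"),
    ("iu", "pretonic"): ("i.u", "ju"),
    ("iu", "second_stressed"): ("i.ˈu", "ˈju"),
    ("ua", "pretonic"): ("u.a", "wa"),
    ("ua", "second_stressed"): ("u.ˈa", "ˈwa"),
    ("ue", "pretonic"): ("u.e", "we"),
}


def render_hiatus(spelling: str, position: str, dialect: str) -> str:
    """Look up the rendering; dialect is 'br' (Brazil) or 'pt' (Portugal)."""
    brazil, portugal = HIATUS[(spelling, position)]
    return brazil if dialect == "br" else portugal


def is_glide(rendering: str) -> bool:
    """True when there is no syllable break, i.e. the first vowel is a glide."""
    return "." not in rendering
```

With this table, `is_glide(render_hiatus("ia", "pretonic", "pt"))` is true while the Brazilian counterpart is not, which is the syllable-count difference asked about earlier in the thread.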

@Svjatysberega Thanks again. I repeated the above exercise for hiatuses involving e and o.

Words with hiatuses where the first vowel is e:

  1. ea:
    1. Both unstressed, word-final: área, fêmea, línea, orquídea, rosácea
    2. Both unstressed, pre-tonic: ameaçar, barbeador, Beatriz, Ceará, meação, neandertal (nasal), nomeação, realmente, zoneamento
    3. Second vowel stressed: alhear, ameaça, areal, baseado, bloquear, chatear, chateado, cardeal, coclear, eneágono, folhear, frear, geada, golpear, nuclear, oceano (nasal), pancreático, passear, real, teatro
  2. eã:
    1. Second vowel stressed: camaleão, campeão, campeã, leão, meã
  3. ee:
    1. Both unstressed, pre-tonic: apreender (nasal), Beemôt, compreensão (nasal), neerlandês, Neerlândia, preexistir, reeleger, reencher (nasal), reestruturar, surpreendido (nasal), Teerã, veemência
    2. Second vowel stressed: arreeiro, baleeiro, candeeiro, cumeeira, guineense (nasal), oleento (nasal)
  4. ei:
    1. Second vowel stressed: acroleína, aldeído, ateísmo, codeína, europeísta, meinha (nasal), oleífero, proteína, reinha (nasal), veículo
  5. eo:
    1. Both unstressed, word-final: -áceo (e.g. cetáceo, crustáceo, malváceo, sebáceo), aéreo, áureo, calcâneo, felídeo, gêmeo, lácteo, Mediterrâneo, óleo, petróleo, vídeo
    2. Both unstressed, pre-tonic: campeonato (nasal), creolina, Deodato, geográfico, ideologia, Jeová, neozelandês, praseodímio, preocupar, Teodoro, teoria
    3. Second vowel stressed: anteontem (nasal), eoo, estereótipo, geógrafo, Geórgia, leopardo
    4. First vowel stressed: alvéola, auréola, céo, Leo, nucléolo, Quéops (almost certainly two separate vowels), Téo

Words with hiatuses where the first vowel is o:

  1. oa:
    1. Both unstressed, word-final: amêijoa, amêndoa, mágoa, Páscoa, póvoa
    2. Both unstressed, pre-tonic: criptoanálise, coabitar, joalheiro, Joaquim, poaense, povoação, soalheiro, voador
    3. Second vowel stressed: abençoar, abotoar, amontoado, assoalho, boate, coar, consoante (nasal), Croácia, croata, doação, Eloá, enjoar, feijoada, impessoal, oásis, perdoar, razoável, soalho, soar, toalha, samoano (nasal), voar
  2. oã:
    1. Second vowel stressed: Itapoã, João, Lagoão, Nipoã, Taboão
  3. oe:
    1. Both unstressed, word-final: áloe, Calírroe
    2. Both unstressed, pre-tonic: adoecer, ajoelhar, autoestrada, autoevidente, capoeirista, Groenlândia (nasal), Noemi, poesia
    3. Second vowel stressed: cachoeira, capoeira, coelho, doença (nasal), joelho, moeda, Noé, noroeste, oboé, oeste, poema, poeta, Proença (nasal), roer
  4. oi:
    1. Both unstressed, pre-tonic: coincidir (nasal), proibido
    2. Second vowel stressed: amendoim (nasal), Coimbra (nasal), egoísta, Heloísa, heroísmo, joinha (nasal), maoísmo, moído, moinho (nasal), ventoinha (nasal)
  5. oo:
    1. Both unstressed, post-tonic: álcool
    2. Both unstressed, pre-tonic: cooperação, zoologia
    3. Second vowel stressed: coorte, zoólogo
    4. First vowel stressed: enjoo, voo

I'm aware that not all these words in a given category may behave the same, but if you can tell me the pronunciation it would still be helpful as I can look for patterns. Benwing2 (talk) 02:58, 5 May 2021 (UTC)Reply

@Benwing2

  1. ea:
    1. /jɐ/
    2. /e.a/, /jɐ/ in Portugal
    3. /e.ˈa/, /i.ˈa/ in Portugal
  2. eã:
    1. /e.ˈɐ̃/, /i.ˈɐ̃/ in Portugal
  3. ee:
    1. /e.e/, in Portugal /i.ɨ/, but "een" is /i.ẽ/ in both
    2. /e.ˈe/, /i.ˈe/ in Portugal
  4. ei:
    1. /e.ˈi/, /i.ˈi/ in Portugal
  5. eo:
    1. /ju/
    2. /e.o/, /ju/ in Portugal
    3. /e.ˈo/, /i.ˈo/ in Portugal
    4. /ˈɛw/, /ˈew/
  1. oa:
    1. /wɐ/
    2. /o.a/, /wɐ/ in Portugal
    3. /o.ˈa/, /ˈwa/ in Portugal
  2. oã:
    1. /o.ˈɐ̃/, /ˈwɐ̃/ in Portugal
  3. oe:
    1. /wi/
    2. /o.e/, /we/ in Portugal
    3. /o.ˈe/ or /o.ˈɛ/, in Portugal /ˈwe/ or /ˈwɛ/
  4. oi:
    1. /o.i/, /u.i/ in Portugal
    2. /o.ˈi/, /u.ˈi/ in Portugal
  5. oo:
    1. /o/, /u/ in Portugal (álcool is /ɔl/ because of the coda L)
    2. /o.o/, /u.u/ in Portugal
    3. /o.ˈo/, /u.ˈo/ in Portugal
    4. /ˈo.u/ Svjatysberega (talk) 02:58, 8 May 2021 (UTC)Reply
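
For the "second vowel stressed" rows of the e/o lists above, the European renderings follow a simple pattern, sketched here in Python. This is an observation drawn only from the rows listed in this reply, not a confirmed rule of the module, and the function name `pt_second_stressed` is hypothetical.

```python
def pt_second_stressed(first: str, second: str) -> str:
    """European Portuguese rendering of e/o + stressed vowel, per the rows above.

    'first' is the written first vowel ('e' or 'o'); 'second' is the
    phonemic quality of the stressed second vowel (e.g. 'a', 'ɐ̃', 'e').
    """
    if first == "e":
        # ea, eã, ee, ei, eo: the first vowel is a syllabic /i/.
        return "i.ˈ" + second
    if first == "o":
        # oi, oo: the first vowel is a syllabic /u/.
        if second in ("i", "o"):
            return "u.ˈ" + second
        # oa, oã, oe: the first vowel becomes the glide /w/.
        return "ˈw" + second
    raise ValueError("only written e and o are covered by this sketch")
```

The Brazilian renderings in the same rows keep the written vowel as a separate syllable (/e.ˈa/, /o.ˈa/, and so on), so no function is needed for them.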

Bullets[edit]

@Benwing2 The template is doing weird things to a following line starting with an asterisk for a bullet. See e.g. the hyphenation line at repto or desafio. —Mahāgaja · talk 15:31, 6 May 2021 (UTC)Reply

@Mahagaja I fixed this. However, you need to be careful using this template. For example, per [2], the correct pronunciation of desafio is [ʤi.zaˈfiw], which would currently require a respelling of 'disafiu'. The default pronunciation of final '-io' in Brazil as generated by this module is probably going to change to [iw] per a discussion with User:Svjatysberega that is still ongoing; see above. Before you changed it, there were two manual pronunciations of desafio, [de.zaˈfi.u] and [de.zaˈfiw], which in the current scheme would respectively use respellings of 'desafio' (the default) and 'desafiu', but in the new scheme that I will probably switch to, they will require respellings of 'desafi.o' and 'desafio' (the default), respectively. I'm not sure whether the correct pronunciation at the beginning is [dez] or [ʤiz] or both (I suspect both). The upshot of this is that in order to use this template properly you either need to be a native speaker or have good references (which certainly exist but I'm not sure of what they are). Even for repto, I suspect there are two possible pronunciations, which would respectively use respellings of 'répto' and 'répito'. Benwing2 (talk) 02:52, 7 May 2021 (UTC)Reply
@Mahagaja See [3], which includes user-submitted audio samples. The two samples respectively sound to me like [de.zaˈfi.u] and [de.zaˈfiw]. Benwing2 (talk) 02:56, 7 May 2021 (UTC)Reply
Yeah, I was wondering about repto, whether it's pronounced "répito" and/or "reto" in Brazil. —Mahāgaja · talk 07:20, 7 May 2021 (UTC)Reply
@Benwing2: Now it's putting too much space below the template if the hyphenation line isn't there, e.g. at hélice. —Mahāgaja · talk 20:34, 7 May 2021 (UTC)Reply
@Mahagaja I don't know how to fix this. In particular I don't know why it doesn't recognize the asterisk at the beginning of the line without adding the extra newline, because the template itself has a newline following in normal usage. Must be a weird parsing bug in MediaWiki. The best suggestion I have is to not have the template append a newline and to require that you manually insert an extra newline before the asterisk (this works). Benwing2 (talk) 02:19, 8 May 2021 (UTC)Reply
@Mahagaja OK, I think I figured it out, using a nasty hack. Benwing2 (talk) 02:24, 8 May 2021 (UTC)Reply
@Benwing2: I hate to sound always unhappy, but now the template is breaking Tabbed languages view. Basically, all languages after Portuguese are being included in the Portuguese tab instead of having their own tabs. —Mahāgaja · talk 06:57, 8 May 2021 (UTC)Reply
@Benwing2: That last tweak seems to have worked. Thanks! —Mahāgaja · talk 07:31, 8 May 2021 (UTC)Reply

European Portuguese vowels before velarized l[edit]

Hi @Benwing2! I noticed a problem with the module in some EP cases, so I will just cite them:

 

The module's output for -ável should instead be IPA(key): (Portugal) /ˈa.vɛl/ or something like it, with the /l/ "opening" the vowel to /ɛ/.
 
  • (Brazil) IPA(key): /ˈtũ.new/ [ˈtũ.neʊ̯]
    • (Southern Brazil) IPA(key): /ˈtu.new/ [ˈtu.neʊ̯]

The module's output for túnel should instead be IPA(key): (Portugal) ˈtunɛɫ.
 
  • (Brazil) IPA(key): /aw.maˈna.ki/ [aʊ̯.maˈna.ki]
    • (Southern Brazil) IPA(key): /aw.maˈna.ke/ [aʊ̯.maˈna.ke]

The module's output for almanaque should instead be IPA(key): (Portugal) /al.mɐˈna.kɨ/.
 
  • (Brazil) IPA(key): /aw.voˈɾa.dɐ/ [aʊ̯.voˈɾa.dɐ]
    • (Southern Brazil) IPA(key): /aw.voˈɾa.da/ [aʊ̯.voˈɾa.da]
 

The module's output for alvorada should instead be IPA(key): (Portugal) /aɫvuˈɾadɐ/.

The following cases I'm not so sure if the current pronunciations are wrong or are just alternatives:

 
  • (Brazil) IPA(key): /kowˈmej.ɐ/ [koʊ̯ˈmeɪ̯.ɐ]
    • (Southern Brazil) IPA(key): /kowˈmej.a/ [koʊ̯ˈmeɪ̯.a]
 
  • (Portugal) IPA(key): /kolˈmɐj.ɐ/ [koɫˈmɐj.ɐ], /kɔlˈmɐj.ɐ/ [kɔɫˈmɐj.ɐ]
    • (Northern Portugal) IPA(key): /kolˈmej.ɐ/ [koɫˈmej.ɐ], /kɔlˈmej.ɐ/ [kɔɫˈmej.ɐ]
    • (Central Portugal) IPA(key): /kolˈmej.ɐ/ [koɫˈmej.ɐ], /kɔlˈmej.ɐ/ [kɔɫˈmej.ɐ]
    • (Southern Portugal) IPA(key): /kolˈme.ɐ/ [koɫˈme.ɐ], /kɔlˈme.ɐ/ [kɔɫˈme.ɐ]

These are the outputs for colmeia, which is (also?) read IPA(key): (Portugal) /koɫ.ˈmɐj.ɐ/, /koɫ.ˈmej.ɐ/.
  • (Brazil) IPA(key): /mowˈda.ʒẽj̃/ [moʊ̯ˈda.ʒẽɪ̯̃]
  • (Portugal) IPA(key): /molˈda.ʒɐ̃j̃/ [moɫˈda.ʒɐ̃j̃], /mɔlˈda.ʒɐ̃j̃/ [mɔɫˈda.ʒɐ̃j̃]

These are the outputs for moldagem, which is (also?) read IPA(key): (Portugal) /moɫˈdaʒɐ̃j̃/.
 
  • (Brazil) IPA(key): /powˈtɾõ.nɐ/ [poʊ̯ˈtɾõ.nɐ]
    • (Southern Brazil) IPA(key): /powˈtɾo.na/ [poʊ̯ˈtɾo.na]
  • (Portugal) IPA(key): /polˈtɾo.nɐ/ [poɫˈtɾo.nɐ], /pɔlˈtɾo.nɐ/ [pɔɫˈtɾo.nɐ]

These are the outputs for poltrona, which is read both IPA(key): (Portugal) /poɫˈtɾonɐ/, /puɫˈtɾonɐ/.

I don't know if this passage on Wikipedia helps: All vowels are lowered and retracted before /l/ (see ref). Sorry, I'm not a specialist, so I doubt I can help any further. - Sarilho1 (talk) 15:43, 10 June 2021 (UTC)Reply
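
The treatment of syllable-final l that runs through the examples above (vocalized to /w/ in Brazil, velarized [ɫ] in the European phonetic output) can be sketched in Python as follows. This is a simplified illustration with hypothetical names (`render_coda_l`, `join_syllables`). It takes already-syllabified phonemic input and does not model the vowel lowering before /l/ raised in this section.

```python
from typing import List


def join_syllables(syllables: List[str]) -> str:
    """Join syllables with dots, omitting the dot before a stress mark."""
    if not syllables:
        return ""
    result = syllables[0]
    for syl in syllables[1:]:
        result += syl if syl.startswith("ˈ") else "." + syl
    return result


def render_coda_l(syllables: List[str], dialect: str) -> str:
    """Rewrite syllable-final l: 'br' gives w, anything else gives ɫ."""
    out = []
    for syl in syllables:
        if syl.endswith("l"):
            syl = syl[:-1] + ("w" if dialect == "br" else "ɫ")
        out.append(syl)
    return join_syllables(out)
```

For example, `render_coda_l(["kol", "ˈmɐj", "ɐ"], "pt")` yields the same string as the phonetic [koɫˈmɐj.ɐ] quoted above for colmeia.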

Placement of accent labels[edit]

@Benwing2, can you put the accent labels ((Brazil), (Portugal), etc.) before the "• IPA(key):" instead of after? That's the usual order. Thanks! —Mahāgaja · talk 12:32, 31 July 2021 (UTC)Reply

Some wrong results[edit]

Hi @Benwing2! I've been using the template and I've noticed several errors. Some of them seem easily fixable whereas others (like the varying stress between pt-br and pt-pt) might be quite annoying, though I think it might be useful to list them.

Fixed
  1. Wrong place for syllable break (between "f"/"v" and "r"/"l"):
  1. /ʃ/ instead of /s/ in pt-pt (can be fixed with writing ss in the template, but can't it be fixed without doing so?)
  2. /a/ instead of /ɐ/ in 1-syllable words in pt-pt:
Fixed
  1. /ɐ/ instead of /a/ in pt-pt:
  1. /u/ instead of /ɔ/ in pt-pt (in pt-br: /o/):
    1. auto- IPA(key): /awtɔ-/
Fixed
  1. necro- IPA(key): /nɨkrɔ-/
  2. orto- IPA(key): /ɔrtɔ-/
  3. video- IPA(key): /vidjɔ-/
Fixed
Fixed

There might be a few other problems, but I think these are the main ones that I noticed. - Sarilho1 (talk) 16:02, 14 August 2021 (UTC)Reply

Update[edit]

I've decided to update this discussion. I'm not sure if this option already existed or was added in the meantime, but many cases were resolved using parameters like so: {{pt-IPA|br=eletricidade|pt=elètricidade}} (I marked them as fixed, above). There are some cases, however, that I can't fix. For instance:

  1. /ʃ/ instead of /s/ and /ʒ/ instead of /z/ in pt-pt can technically be fixed by adding an extra s in the parameter: {{pt-IPA|ssuave}}, but cases like {{pt-IPA|casual}}, which returns IPA(key): /kɐˈʒwal/ rather than IPA(key): /kɐˈzwal/ for European Portuguese, can't be fixed as easily. There are some places that do pronounce IPA(key): /kɐˈʒwal/, but the standard is clearly IPA(key): /kɐˈzwal/. Other examples: césio, magnésio, amnésia, falésia, sósia, ardósia.

- Sarilho1 (talk) 12:39, 17 December 2021 (UTC)Reply

Creation of "general Portugal" dialect[edit]

I've recently reworked the algorithm to express Brazilian dialects. Rather than creating if-statements for every possible combination (e.g.: if not rio_lisbon_different and not rio_sp_different and not sp_gbr_different and not lisbon_cpt_different then), I've changed the code to only add different dialects when they differ from the "general Brazil" one. I believe that this shall allow us to easily add more dialects without having to state how different they are from each other. A casualty from this method is that the message "Brazil including São Paulo" is no more, but I personally don't think it's bad, since, in my opinion, it means no local dialect shall have a precedence over the others. Note, that this doesn't exclude the fact that our "general Brazil" might be closer to São Paulo dialect.

Now, I personally would like to do something similar with European Portuguese in order to increase modularity and facilitate the introduction of other European Portuguese dialects. This would require the creation of a new style that I propose to call "general Portugal" (gpt) by analogy with gbr. I think this is a better solution than assuming "lisbon" as the standard dialect, since there are peculiarities of the Lisbon accent that I don't think should be considered standard, and this way they could be marked as such in the future. Of course, I still argue that this gpt should be closer to the Lisbon-Central Portugal variants.

I do foresee some arguments about how the "gpt" might be favoring Lisbon too much, though, but the same arguments could occur now anyway when "lisbon" is given primacy, so I think proceeding this way shall be beneficial. - Sarilho1 (talk) 12:35, 3 June 2022 (UTC)Reply
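
The approach described above for the Brazilian dialects (list the "general" style first, then only those dialects whose output differs from it) can be sketched in Python as follows. This is an illustration, not the module's Lua code: the function name `dialects_to_display` is hypothetical, `gbr` is the style code mentioned above, and `rio` and `sbr` are placeholder codes used only for this example.

```python
from typing import Dict, List, Tuple


def dialects_to_display(general: str, outputs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (style, IPA) pairs: the general style, then differing dialects.

    Each dialect is compared only against the general style, so adding a
    new dialect needs no pairwise comparison flags between dialects.
    """
    baseline = outputs[general]
    shown = [(general, baseline)]
    for style, ipa in outputs.items():
        if style != general and ipa != baseline:
            shown.append((style, ipa))
    return shown


# Example using the almanaque outputs quoted earlier on this page.
example = dialects_to_display(
    "gbr",
    {
        "gbr": "/aw.maˈna.ki/",
        "rio": "/aw.maˈna.ki/",  # identical to general, so not listed
        "sbr": "/aw.maˈna.ke/",  # differs, so listed under general
    },
)
```

The same function would serve a "general Portugal" (gpt) style unchanged, which is the modularity argument made above.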

I welcome the change. One small issue with the implementation: at farmeiro, I can expand Brazil and see two variations underneath, but when I expand Portugal, it transforms it into two variations and Portugal disappears. I think it should work the same way as Brazil does. Ultimateria (talk) 18:40, 3 June 2022 (UTC)Reply
@Ultimateria Yeah, that's exactly how I was thinking about doing it. - Sarilho1 (talk) 21:43, 3 June 2022 (UTC)Reply

On some Testcases pronunciations[edit]

I don't think I've ever heard anyone in Brazil pronounce "fauna" as "fão-na" or "córtex" with an open E. The latter doesn't really strike me as odd but the first *sure* does. All results on YouTube that happen to mention the words pronounce them the way I would: "fau-na" and then córtex with a closed E. ...then again, both people ended up being from around the same region as me and also spoke the Caipira accent (Paulistano with a retroflex coda R). It might be a good idea to look into how other regions say it?

A similar thing goes for "ainhum". Both "à inhum" and "ãe-nhum" seem like perfectly fine pronunciations to me. I'd recommend adding them as alternate pronunciations? Frankly, this alternate pronunciation thing should go for quite a few other words too.
For example, the em/en to im/in thing really isn't universal; it's just like first syllable unstressed "de/te" being pronounced as either "de/te" or "di/ti" (which despite being just as common isn't at all represented in the PT-IPA template. Weird), where both ways work, the same person might say any given word both ways and some people only ever use one method (it depends on the individual rather than on the Brazilian region they're in).
Examples are entrada/devia/teatro/empregado/Deodoro/depois/em/endereço/enciclopédia. Those would be en/both/ti/im/De/de/im/both/en for me, but I hear things like "divia", "T atro", "dipois", "hein dereço" all the time. ...It's possible region actually plays a role in this, now that I think about it. I think I've only specifically heard "dipois" from people from Rio. 2804:1B0:1903:3600:E413:9FDC:36B6:2C25 13:18, 2 August 2022 (UTC)Reply

There may be some Brazilian out there who might pronounce all words I listed in the last paragraph the way the template predicts it, so I thought it'd be a good idea to list a few more words, like "desinteressada", "desentenderam", "enteado" and "dezoito". For me, all of the words with "des-" would have that "des-" turned into a "dis-" (which then becomes just "dz-" when actually talking) and "enteado" is "hem tiado". 2804:1B0:1903:3600:E413:9FDC:36B6:2C25 13:31, 2 August 2022 (UTC)Reply

Initial open "o" in European Portuguese[edit]

I've changed the module in order to automatically open the initial "o" in European Portuguese words. This fixes dozens of cases that appeared previously and wrongly with a "u", but breaks the pronunciations with articles (e.g.: os dentes). I still think the change was worth it, but a more permanent fix will be needed in the future. - Sarilho1 (talk) 10:23, 16 August 2022 (UTC)Reply

South Brazil; secondary stress[edit]

@Sarilho1 Should we add "South Brazil" pronunciation automatically? It's in tons of articles. If so, what are the rules? Also, the manually-added IPA often includes secondary stress that seems questionable, e.g. in componente, comportamento. I have not been preserving these stresses when converting to {{pt-IPA}} (although I haven't made very many changes of this sort). Is this reasonable? It looks to be trying to add secondary stress on alternating syllables. I believe we should only mark secondary stress when it's clear and phonemic, presumably as in setecentos. Also, in the future please ping me whenever you change the module or comment on this talk page, as I don't automatically see the changes. Thanks! Benwing2 (talk) 05:34, 9 September 2022 (UTC)Reply

@Sarilho1 Also, you should avoid using secondary stress to get open syllables (esp. in European Portuguese) as in pregar, ativo, unless the secondary stress is real; use a dot under the syllable instead. Benwing2 (talk) 05:37, 9 September 2022 (UTC)Reply
Also, I see you have added secondary stresses in various places like comprometer. Is this real? Do you have an online source that reliably indicates secondary stress? Benwing2 (talk) 05:41, 9 September 2022 (UTC)Reply
I don't actually remember why I added. I can't find a justification for it. I'll try to recall why I did so, but I removed it for now. - Sarilho1 (talk) 09:22, 9 September 2022 (UTC)Reply
Can't we simply change the module so that the grave accent triggers the dot under rather than the secondary stress? Secondary stresses can still be easily signaled using two (or more) acute accents. I think it makes the template simpler to use, and it's in line with the conventions that some dictionaries, like Priberam, use. - Sarilho1 (talk) 09:21, 9 September 2022 (UTC)Reply
@Sarilho1 I need to think about this a bit. We can definitely define the accents any way we want, although grave accents are most commonly used for secondary stress. Any solution should work for both BP and EP and for all three of a/e/o. I don't really like the idea of doubling up accents of the same sort because they aren't easily visible in many fonts. One possibility is to make the dot-under automatically open unstressed a/e/o, hence for EP you get [a ɛ ɔ] instead of [ɐ e o]. For BP the dot-under would have no effect on a (which would be [a] regardless except word-finally) but otherwise behave the same. Another change I'm thinking of that should be useful is to make the hyphen join two words without space but otherwise treat them as if they were separate words; so setecentos can be written séte-centos and you'd get the right output for both EP and BP. Benwing2 (talk) 09:55, 9 September 2022 (UTC)Reply
@Sarilho1 I think it is fine to use a grave accent to indicate unstressed open syllables. Secondary stress can be handled a bit like the way {{it-IPA}} works: if there are two or more accented vowels in a single word, all but the last are converted to secondary stress. If for some reason you need to put secondary stress after the stressed vowel, you can use a line-under, e.g. é̱ or ê̱. This is how {{it-IPA}} does it. I have some other questions, though:
  1. In exposto spelled expôsto, currently the module generates /isˈpos.tu/ for Brazil and /(i)ʃˈpoʃ.tu/ for Portugal. However, Infopédia indicates two pronunciations for Portugal, /ɐjʃˈpoʃ.tu/ and /(i)ʃˈpoʃ.tu/. I think we should generate both of these automatically; but I have some questions about this: (1) Does this apply to all words in ex- + consonant? I see it also applies to extenso, expresso and explícito, for example. (2) What about non-Lisbon accents? Do they also have two pronunciations, and if so, what are they?
  2. Should we auto-generate a "South Brazil" accent, and if so, what are the rules?
Thanks. Benwing2 (talk) 04:18, 12 September 2022 (UTC)Reply
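
The accent convention proposed in the comment above (grave marks an unstressed open vowel; with two or more acute/circumflex vowels, all but the last become secondary stress) can be sketched in Python as follows. This is a hedged illustration, not the module's Lua code: the name `classify_accents` and the role labels are hypothetical, the input 'rápidamênte' is an invented respelling used only to exercise the rule, and the line-under notation for post-tonic secondary stress is not handled.

```python
import unicodedata
from typing import List, Tuple

ACUTE, GRAVE, CIRCUMFLEX = "\u0301", "\u0300", "\u0302"


def classify_accents(respelling: str) -> List[Tuple[str, str]]:
    """Return (base letter, role) for each accented vowel, in order."""
    # Decompose so that each accent is a separate combining character.
    decomposed = unicodedata.normalize("NFD", respelling)
    marks = []
    base = ""
    for ch in decomposed:
        if unicodedata.combining(ch) == 0:
            base = ch  # remember the most recent base letter
        elif ch in (ACUTE, CIRCUMFLEX):
            marks.append([base, "primary"])
        elif ch == GRAVE:
            marks.append([base, "open-unstressed"])
    # All stress-bearing accents except the last are demoted to secondary.
    stressed = [m for m in marks if m[1] == "primary"]
    for m in stressed[:-1]:
        m[1] = "secondary"
    return [(letter, role) for letter, role in marks]
```

Other combining marks, such as the tilde, are ignored by this sketch, so a word like 'elètricidade' yields a single open-unstressed entry and leaves the primary stress to the default rules.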
Regarding 1: Both pronunciations for ex- + consonant do indeed occur and it would be great to generate both automatically. Also note that ex- is always pronounced as "eis" in EP, so forms like "ex-aluno" are read as (see [4]):

As for regional accents, Priberam lists the two options for pronouncing ex-: "eis-" and "es-" [5], but I'm not sure if "eis-" is a characteristic of the generalized Lisbon accent and thus is always pronounced as "âis-" or if there are regional instances where it is indeed "eis-". I've been favoring the latter, but I will try to find regional examples of it.
As for 2., I think it would be great to have it, but unfortunately I don't know the rules. - Sarilho1 (talk) 10:36, 13 September 2022 (UTC)Reply

lots more fixes[edit]

@Sarilho1 I have a bunch more fixes I'm going to make, please comment:

  1. 'des^-' will be a shortcut for both 'dis-' and 'des-' in that order in Brazil, but just 'des-' in Portugal. I have made all the offline changes to add 'des^-' pronunciations where correct (to about 1,000 entries).
  2. After 'des^-', the remainder of the word will be treated as if word-initial, so that 'des^emprêgo' for desemprego gives the right results.
  3. 'i^' before vowel will be a shortcut for both /i/ and /j/ in Brazil, e.g. desistência spelled 'desistênci^a'.
  4. 'à, è, ò' will be unstressed open vowels /a ɛ ɔ/ in Portugal, but /a e o/ in Brazil. This should make it possible to spell a lot of words with a single spelling that currently require separate pt/br spellings.
  5. Unstressed 'o' in hiatus (e.g. in -oar) will be /w/ in Portugal but /o/ in Brazil; currently it's /u/ in Portugal.
  6. Unstressed 'e' in hiatus (e.g. in -ear) will be /j/ in Portugal and /i/ in Brazil; currently it's /i/ in Portugal and /e/ in Brazil. This change for Portugal seems unquestionably correct but for Brazil there are exceptions such as leal that will require respelling. (UPDATE: This change for Brazil is probably more trouble than it's worth.) With this in mind, what do Portugal spellings like 'tiológico,teológico' for teológico and 'terráquio,terráqueo' for terráqueo represent? Infopédia gives only /tjuˈlɔʒiku/ and /təˈʀakju/.
  7. In '-CriV' in Portugal where C = consonant and V = vowel, 'i' will be rendered as /i/ not /j/; similarly '-CruV' as in destruição. But 'CreV' should probably be /CrjV/ as in preocupado spelled 'preòcupado', given as /prjɔkuˈpadu/ in Infopédia. Please comment; triangular is spelled 'tryangularh' currently, but Infopédia gives /triɐ̃ɡuˈlar/.
  8. With two words joined by a hyphen, their pronunciation will be generated as if two separate words but the stress of the first converted to secondary stress, and the result concatenated. That way, setecentos can be written 'séte-centos' and get the right result in Brazil (in particular, the 'e' at the end of 'séte' will be /i/, as when word-finally). Note that internally this already happens automatically (more or less) with -mente and -zinho suffixes. Words like desculpar-se will be special cased so they work correctly automatically.
  9. With two words joined by +, the same things will happen as with hyphen but the stress on the first word will remain unwritten, so that e.g. entreter can be written 'entre+têr' and give the right results in Brazil (in particular, the 'e' at the end of 'entre' will be /i/).
  10. Unstressed 'el' not before a vowel will be [ɛɫ] in Portugal, as in selvagem, adelfia, beldade, delgado, túnel, -ável, -ível, etc.
  11. Unstressed 'ol' not before a vowel will be [oɫ] in Portugal, as in desenvolver, solteiro, molduragem, etc.
  12. 'exC' will be [ɐjʃ] in Lisbon Portugal, as in expresso, descontextualizar, contexto, etc.
  13. 'o' and 'os' as separate words in Portugal will be fixed to be /u/, /uʃ/ respectively.
  14. '+' by itself will stand for the pagename in respelling.
  15. I'm thinking of changing 'ui' to be /uj/ by default rather than /wi/ (Portugal) and /ui/ (Brazil). This seems to be what Priberam thinks ought to be the default, but I don't know if this makes sense. Cf. Priberam writes continuidade with 'u-i' to indicate /wi/, and doesn't write anything in descuidar to indicate /uj/.
  16. I'm thinking initial 'em-' in Brazil should have /ĩ/ (natural), /ẽ/ (careful) rather than just render as /ĩ/.
  17. I will support '<q:...>' at the end of a respelling to add a qualifier before the respelling, and '<qq:...>' at the end of a respelling to add a qualifier after the respelling. This notation is already supported for {{syn}}/{{ant}}, {{desc}} and various other templates.
  18. Words in 'dess-' in Portugal will have /dɨʃ.s-/ rather than /dɨ.s-/, e.g. dessalinizar. If you want /dɨ.s-/, you can always write 'deç-'.
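
A minimal sketch of how the 'des^' shortcut from items 1 and 2 could be expanded, written in Python rather than the module's Lua purely for illustration. The function name `expand_des_prefix` and the region codes "br" and "pt" are assumptions, not names from the module. The prefix and the remainder are returned as a pair so that the remainder can still be treated as word-initial, as item 2 requires.

```python
def expand_des_prefix(respelling: str, region: str) -> list[tuple[str, str]]:
    """Expand a leading 'des^' into (prefix, remainder) variants.

    Hypothetical helper: 'br' yields 'dis' then 'des' (in that order),
    'pt' yields only 'des'. Respellings without the shortcut come back
    unchanged with an empty prefix.
    """
    if region not in ("br", "pt"):
        raise ValueError(f"unknown region: {region!r}")
    marker = "des^"
    if not respelling.startswith(marker):
        return [("", respelling)]
    remainder = respelling[len(marker):]
    prefixes = ["dis", "des"] if region == "br" else ["des"]
    # Keeping the remainder separate preserves the boundary, so later
    # rules can apply word-initial behaviour to it.
    return [(prefix, remainder) for prefix in prefixes]
```

With the respelling 'des^emprêgo' from item 2, this gives two variants for Brazil and one for Portugal; the later word-initial handling of 'emprêgo' is outside the sketch.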

Finally, a few questions (I will have more):

  1. Is there a rule that unstressed 'ilV' is optionally pronounced like 'elV' in Portugal in certain circumstances? desmilitarização, desmobilizar, desestabilização, desequilíbrio and several other words have it. If so, in which circumstances? Specifically in 'ili'?
  2. insuscetível, imiscível with second Portugal pronunciation 'insuchètível', 'michível': What is the deal here? It's not in Infopédia. Is this a fast pronunciation and if so do we really want to indicate it?
  3. esquecer with Portugal pronunciation 'ɨsquècêr': This doesn't agree with Infopédia. Can you explain?
  4. imperdível with second Portugal pronunciation 'impredível'. Explain?
  5. loa, voo with Portugal pronunciation 'loua', 'vouo' (and similar for lots of other words). Explain?
  6. desengano with second Portugal pronunciation 'desingano'. Explain?
  7. acelerador with second Portugal pronunciation 'acelaradôr'; same for acelerar. Explain?
  8. acoplar with second Portugal pronunciation 'acopular'. Explain?
  9. administração with first Portugal pronunciation 'a.dmenistração'. Explain the 'e'?
  10. You have added manual syllabifications to a lot of words for Portugal, e.g. 'a.bdicar', 'a.bdominal', 'a.bduzir', etc., plus 'a.bjèção', 'a.bjéto', 'a.bjurar', 'abru.pto', 'a.bsidal', 'a.ctínio', 'ada.ptação', 'a.dmitir', 'a.dvento', 'a.dvérbio', etc. etc. What is the purpose of these? These are IMO very unnatural syllabifications, and if for some reason they are correct, the module should be adjusted to do this automatically rather than manually adding syllabifications everywhere.
  11. Brazilian pronunciation spellings like 'acihma', 'alumíhnio', 'aluhno', 'apêhnas', 'Atêhnas', 'biquíhni', 'blasfêhmya', 'cêhna', 'ciclôhne', 'sihma' (for cima), and several others: Did you add these? If so, why? They contradict explicit statements made by various authors that stressed vowels are nasalized in Brazil before nasal consonants. Never mind, it's User:OweOwnAwe adding them.
  12. Words in '-ulo': Quite a lot of them have two Portugal pronunciations like 'título,títelo' and 'trémulo,trémelo'. These are not in Infopédia; do they represent a colloquial pronunciation or something? Are they consistent in all words in '-ulo'?
  13. 'tinhoso' respelled 'tinhôso,tenhôso'. Again not in Infopédia. Is this a colloquial pronunciation again? Similarly 'tijolo' respelled 'tijôlo,tjôlo'. I take it 'tjôlo' is a colloquial variant of 'tejôlo'?

Thanks for any comments. Benwing2 (talk) 04:44, 19 September 2022 (UTC)

Those are a lot of very good changes.
  • I like your solution for 1, 2, 3, 8, and 9. Which implementation should be used for words like líderes where the first "e" is open in EP? pt=líder+es?
  • We have already discussed 4 but what about the case where "o" is actually "ô" instead of "u" in EP. Will we be able to spell it with "ô" or do we need to write "ộ"?
  • I also welcome the changes to 5 and 6. Regarding your question, I tried to represent the fact that pronunciations like IPA(key): /tɨˈʁa.kɨ.u/ sometimes also occur, but it's fine to remove them and add them later with more research backing their existence.
  • Regarding 7, I think it's mostly a choice we can make. Personally, I pronounce both /i/ and /j/ in criança or triangular. It's fine, in my opinion, to follow Infopédia's lead.
  • 10 is fine, but regarding 11, I have some doubts about having "ol" become [oɫ] in EP, by default. Infopédia does generally favor [oɫ], although some words (like voltagem) register both [oɫ] and [ɔɫ] options. Words like moldar are registered by Infopédia as pronounced with [oɫ], but Instituto Camões gives it as an example of a word that is pronounced as [ɔɫ] [6]. I would say this depends on the speaker (I personally pronounce all of your examples with [ɔɫ]). Do you think both options should always be presented or is it better to have [oɫ] as the default and then add [ɔɫ] on a case-by-case basis?
  • For 12, do you plan to add to Lisbon Portugal or to the general Portugal? The pronunciation is quite widespread and not only a feature of Lisbon's accent.
  • All the other changes seem fine by me.

I will try to add answers to your questions later. - Sarilho1 (talk) 10:42, 21 September 2022 (UTC)

@Sarilho1 Thank you very much for your comments. This is what I'm thinking:
  • For cases like 'líderes', I'm not completely sure. My original idea for + is that the part before it is treated like a separate word. In a case like 'líderes' there are two issues; one is that it's the suffix rather than the prefix that needs to be unstressed, and two is that you don't want the -es treated like a separate word because then the 'e' would become /i/. We could introduce new symbols for this case but maybe it would just be best to write 'lídères'.
  • I'd definitely like to avoid ộ if possible, as it's awkward. My current thought is that ô and ó preceding another stressed syllable would be turned into secondary stress. That would mean we'd need a different way of notating an unstressed /o/ and /e/. Maybe this could be ō and ē or something like o* and e* with a symbol afterwards. Another possibility is [o] and [e]; I've used this bracket notation in {{it-IPA}} as a general way of indicating that the symbol inside should be pronounced exactly as in IPA, based on the idea that brackets indicate IPA pronunciation. So for example in {{it-IPA}}, [s] indicates a literal [s] (useful e.g. where 's' would normally be voiced), [x] indicates a literal [x], etc.
  • For 'ol' I think there are two possibilities; either we default to [oɫ] and require that something like 'òl' be written to indicate an open /ɔ/, or we can make it list both [oɫ] and [ɔɫ] by default and require that you need to indicate the specific vowel quality if you don't want this. It comes down to what is more common among speakers; presenting both [oɫ] and [ɔɫ] by default would imply that most speakers don't make any phonemic distinction between the two, i.e. there aren't any minimal or near-minimal pairs involving the two sequences. I suspect that is probably indeed the case, but Instituto Camões seems to suggest some speakers distinguish the two. The problem then becomes where do we get a source for pronunciation to distinguish the two; Infopédia is the only source I currently know of for European Portuguese pronunciation. That would suggest maybe we should present both.
  • For 'exC', what is your recommendation? For example, if you think it makes sense, I can make both Lisbon and "general" Portugal render as [ɐjʃ], but not regional Central or Southern Portugal.

Benwing2 (talk) 03:12, 22 September 2022 (UTC)

@Benwing2, your solutions seem quite fine to me. Regarding the third point, the problem might not be exclusive to 'ol', but the variation also occurs in words with initial unstressed 'o' (cf. original) [7]. I changed the code to currently return /ɔ/ by default, whereas before it simply returned /u/ (which certain sources state that it occurs, but it's mostly dialectal), but maybe the solution you adopt (inclusive or exclusive by default) should also be extended to these cases. For the last point, I asked since I would like to add a "Lisbon" accent in the future that is similar but not equal to the "general" Portugal, but I think that for now adding [ɐjʃ] to the "general Portugal" is good enough since the pronunciation is quite widespread across the country. - Sarilho1 (talk) 11:13, 22 September 2022 (UTC)
Finally, to answer your questions:
  • Regarding your first question, the first "i" in iCi is often pronounced as /ɨ/. This phenomenon is very prevalent in Lisbon and was once considered the standard pronunciation (and some still prescribe it) [8], [9], [10]. Examples are the pronunciation of ministro as menistro; civil as cevil; Filipe as Felipe; militar as melitar, administração as admenistração (point 9), as well as all of your examples. It can occur, though to a lesser degree, in the rest of the country (e.g.: feminino as femenino). Less standard, but that can sometimes also be heard (mostly in Lisbon, imo) is the dissimilation of the second "i" when the first is stressed, like in vício, tília, mínimo.
  • The careful pronunciation of sc is /ʃs/, but I would say that the vast majority of speakers either pronounce it as /s/ (the most conservative pronunciation, as indicated by the Old Portuguese and Galician nacer) or as /ʃ/, thus registering the possibilities nascer/nacer/nacher or crescer/crecer/crecher [11] (the source states /s/ is a characteristic of the Center-North, but I've heard it in Alentejo and Algarve and I've heard /ʃ/ quite commonly in the North, so I don't know how truthful it still is). I think /ʃ/ is the most widespread, though, so I tried to represent it (some entries had it already, in fact). However, I've been leaning lately toward passing as few arguments to the module as possible, so if you think registering all three possibilities doesn't make much sense, then it's better to remove them and keep only /ʃs/.
  • Regarding ɨsquècêr, I think it was simply because the previous version of the module only returned isquècêr. At the time I was undecided if es- should be represented as /ɨʃ/, /iʃ/, /(ɨ)ʃ/, or /(i)ʃ/. I've already standardized it to pt=esquècer.
  • Impredível is a non-standard pronunciation (and possibly a common misspelling?). I think it's better to remove it and only add it back with proper sources.
  • The "u" in Lisboua, loua, vouo is common in the North of Portugal, in particular in Porto. Since the module represents ou as "o(w)" I added it, but maybe it should only be added in the proper subdialects.
  • I would say that in some cases en- and in- can be pronounced either as /ẽ/ or /ĩ/. I would say that's the case of (des)emprego (sometimes read (des)imprego) or (des)importante (sometimes read (des)emportante). Those are all non-standard and should probably be removed.
  • Regarding "acelarar", that's a very common mispronunciation [12]. I don't know if it's an actual sound change or simply an exception, so it should be fine to keep this one.
  • Acoplar and unstressed "ulo" have a similar explanation. Indeed colloquial, there's variation between /Vulu/ and /Vɨlu/, and given that ɨ is often suppressed (reason why vale is often said /vaɫ/ and pelicano is /plicɐno/), one gets /Vlu/. The opposite phenomenon can also occur (e.g.: ciclo read as cíquelo or cículo). Again, these are nonstandard pronunciations that I think we should now remove and only add back later with proper research backing them.
  • Regarding the syllabifications, I find it way more natural to pronounce the consonant with the following cluster than with the previous vowel. I think they can be better thought of in EP (and maybe also BP?) as being respelled with an "e" (that is, abedicar), which is suppressed in normal speech so we either get a.be.di.car (slow) or a.bdi.car (normal/fast).
  • Regarding the last point, it's mostly a particularity of the Lisbon accent that sometimes spreads: that is, the phenomenon of /i/ becoming /ɨ/ before palatals (the latter with the ɨ removed as it often happens in EP). Other examples would be "pejama" instead of "pijama", "camenhar" instead of "caminhar", "Esrael" instead of "Israel", etc.
I hope these answers clarify the extra options. Still, as I said, I think it's fine for now to focus simply on adding the most widespread pronunciations and only add more later if necessary, so it shouldn't be problematic to remove the vast majority of them. - Sarilho1 (talk) 11:13, 22 September 2022 (UTC)
@Sarilho1 Thanks very much for your comments. I agree that we should strive to pass as few params to the module as possible and automate whatever we can; otherwise we will end up with a lot of inconsistencies. I also agree we should avoid nonstandard pronunciations and only keep colloquial pronunciations when they're common in normal speech; hence I would suggest having the module automatically generate /nɐˈʃer/ (and maybe /nɐˈser/) along with /nɐʃˈser/, with appropriate qualifiers (maybe /nɐʃˈser/ can say "in careful speech" and /nɐˈʃer/ can say "in normal speech"). I will be adding support for user-specified qualifiers, which should make it easier to distinguish colloquialisms, regionalisms, etc. that we do want to keep.
  • As for cases like a.bdicar, if you think it should syllabify this way, we should definitely do it in the module. Which clusters get syllabified this way? Is it all CC clusters where the first C is an obstruent (surely 'so.rte' is wrong)?
  • It is interesting you mention insertion of an 'e' between consonants; I thought only Brazilian Portuguese did this (where it is an 'i' that's inserted). BP is well known for saying /ad͡ʒivoˈgadu/ for 'advogado'. Sometimes (I don't know under what circumstances) the 'i' can be deleted but its underlying presence is revealed by the fact that 'd' and 't' still palatalize (hence Sprite = /iˈsprajt͡ʃ(i)/).
Benwing2 (talk) 02:58, 23 September 2022 (UTC)
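
The idea of generating a natural-speech variant of -sc- alongside the careful one, each with a qualifier, might be sketched as follows. This is an illustration in Python, not the module's Lua code; the function name `sc_variants` is hypothetical, the qualifier wording is the tentative wording suggested above, and the sketch assumes the careful transcription writes a stress mark or a syllable dot between /ʃ/ and /s/.

```python
def sc_variants(careful: str) -> list[tuple[str, str]]:
    """Return (qualifier, transcription) pairs for a careful-speech form.

    Assumption: /ʃ/ and /s/ are separated by a stress mark or a dot,
    as in /nɐʃˈser/. The natural form merges the two into a single /ʃ/
    that opens the following syllable.
    """
    natural = careful
    for boundary in ("ˈ", "."):
        natural = natural.replace("ʃ" + boundary + "s", boundary + "ʃ")
    if natural == careful:
        # No /ʃs/ sequence found: a single form with no qualifier.
        return [("", careful)]
    return [("in careful speech", careful), ("in normal speech", natural)]
```

For /nɐʃˈser/ this yields the careful form first and /nɐˈʃer/ second; the optional third variant with /s/ is left out of the sketch.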
@Sarilho1 A few more questions I've encountered as I clean up existing pronuns:
  1. The iCi -> eCi pronun: how does this apply if you have iCiCi? For example, poliginia has 'pòliginia,pòligenia' with the second i being affected, but dirigível has 'dirigível,derigível' with the first i affected; similarly feminilidade 'feminilidade,femenilidade'.
  2. arraial spelled 'arraial,arra.yal'. Similarly faiança with manual pronunciations /faj.ˈɐ̃.sɐ/, /fɐ.ˈjɐ̃.sɐ/. Infopédia has /ɐʀɐjˈaɫ/ but /fajˈɐ̃sɐ/; I assume /ɐʀɐjˈaɫ/ is what's intended by 'arra.yal'. Is this consistent with all words in -aia-, -aio-, -aie-, etc. in Portugal? Or in all -aia-, -aio-, -aie-, where the 'a' is unstressed? E.g. alfaiate, baiano, caiaque, desmaiar, etc., with 'a' unstressed; alfaia, praia with 'a' stressed; aioli, maiores, Arraiolos, baioneta, etc. with 'a' unstressed; balaio with 'a' stressed; caieira, Lafaiete, praieiro with 'a' unstressed; Aiaie (only example) with 'a' stressed. (I should add, Infopédia has /ɐʀɐjˈaɫ/, /aɫfɐjˈat(ə)/, /bɐjˈɐnu/ with /ɐj/ but /kajˈak(ə)/, /kajˈar/, /dəʒmajˈar/, /ɐvajˈɐnu/, /ẽbrajˈaʒɐ̃j̃/ with /aj/; no consistency that I can see.)
  3. aurícula spelled 'aurícula, ourícula'. Is the latter a colloquial pronunciation or mispronunciation? Same for aurífice, auscultação, auscultar.
  4. orelha: Currently respelled 'orêlha,ourêlha'. Infopédia has /oˈrɐ(j)ʎɐ, oˈreʎɐ/ = 'ōrêlha'. Should we have the same with closed initial /o/, and is the spelling 'ourêlha' dialectal?
  5. outono: Currently respelled 'outôno,òtôno'; the latter pronun not in Infopédia.
  6. Azerbaijão spelled 'Azerbaijão,Azerbeijão'. Explain? Interestingly, Infopédia has /ɐzərbɐjʒɐˈneʃ/ for azerbaijanês. Is the /ɐj/ here due to the following /ʒ/? If so is this consistent? Our module doesn't currently handle this.
  7. Benjamim spelled 'Benjamim,Bãejamím'. The second spelling implies underlying -em diphthong; is this again due to the /ʒ/? If so, is it consistent?
  8. Words in pretonic -ie-: E.g. abalienar, ansiedade, hierarquia, piezoelétrico, aquiescer, bienal, bielorusso, piedade; almost all seem to have /jɛ/. Is this a rule we can implement? There are some that are given in Infopédia with /je/, e.g. alienar (despite abalienar having /jɛ/). I suspect this isn't a real distinction, but maybe is just a speaker-to-speaker variation. Correct?
  9. bienal spelled 'biènal,bianal'. I have deleted the second spelling as an assumed mispronunciation or colloquialism.
  10. bilhar spelled 'bilhar,belhar'. You mentioned this above as a "particularity of the Lisbon accent that sometimes spreads". Is this i -> ɨ before palatals consistent in Lisbon, and if so should we represent it, either alongside /i/ or instead of it? Similarly digestão spelled 'digestão,degestão' (likewise digestível), is this another instance of this, happening before /ʒ/? Similarly discernimento spelled 'discernimento,dechernimento', is this another instance before /ʃ/?
  11. discutível spelled 'discutível,descutível', similarly dispensável spelled 'dispensável,despensável', disputar spelled 'disputar,desputar', distração spelled 'distràção,destràção', also distrator, distrativo. Not sure what is going on here, but I'm planning on removing the second pronun in each case as they can't be verified in Infopédia.
  12. coima spelled 'coima,cóima'. Explain?
  13. equilíbrio given with /iki/ and /eki/ in Infopédia. Similarly, emaçar given with /em/ and /im/, deserdar given with /dəzerˈdar, dəzirˈdar/, ecuménico, epidemiologia, equador, equiparar, ervilha, herdeiro, heresia given with /e-/, /i-/ (or /i-/, /e-/). Note also eficaz, elástico where Infopédia gives only /i-/ but we give both /i-/ and /ɛ-/ (should this be /e-/?). Finally note e.g. errar where we give three pronunciations with /i-/, /e-/, /ɛ-/ and Infopédia gives two (/e-/, /i-/). Is there a pattern here that we can automate?
  14. esclarecer: We spell it 'esclarecêr,eisclarecêr' as if it were spelled 'exclarecer'. Neither Infopédia nor Priberam agrees. Is this a colloquial or nonstandard pronun, based on confusing es- with ex-? Something similar with estendível.
  15. inexistir: We spell it 'inisistir,inaisistir' as if it contained -exC-. Explain?
  16. festinha: We split this into respelling 'fèstinha' (= diminutive of festa (party)) and respelling 'fèstinha,festinha' (= diminutive of festa (caress)). This seems rather strange and it's not indicated either in Infopédia (which just gives /fəʃˈtiɲɐ/ for both) or in Priberam.
  17. saudade: A famous word. Infopédia lists /sɐwˈdad(ə), sɐuˈdad(ə)/. Why /ɐw/ not /aw/ in the first pronunciation? Is there a rule here? (Also occurs in saudável, saudar, saudoso, saudosismo, saudosista, but not for some reason in saudação, and not in unrelated saudita.)
Thanks again for your help. I am in the process of implementing the ideas from the previous discussions above. Benwing2 (talk) 07:32, 24 September 2022 (UTC)
Hi @Benwing2.
1. I actually don't know the rules for these cases. I tried to search, but I didn't find any information about it either. The only rules I found simply stated that the unstressed "i" before a stressed "i" suffers dissimilation. I don't have a proper explanation for dirigível: maybe it's another dissimilation process that is changing the first "i" (the pronunciations deregível or diregível don't sound too bad to me) or it's simply the effect of the suffix (e.g.: dirijo read as derijo). The last one could also explain why feminilidade is read femenilidade. Still, I have no proper answer for these cases, unfortunately. Just a small comment: in Portuguese linguistics, the term "affected pronunciation" actually refers to the one where there is no dissimilation, since the Lisbon pronunciation is assumed to be the oldest one (and for some authors, the 'correct' one).
6. Azerbaijão is often mispronounced and misspelled Azerbeijão (to the point that I don't think we should even consider a misspelling). Since Azerbaijão is prescribed, I added the alternative (though very common) mispronunciation. The alternative pronunciation might indeed be due to the following /ʒ/, but I can't think of another example that would prove that. It might be better to simply treat it as an exception.
7. The pronunciation of Benjamim as Bãejamim is likely due to reanalysing the word as bem+jamim (bem is pronounced as bãe in EP). Another example is Bemposta (reanalysed as bem+posta). I think these should simply be treated as exceptions that we ought to manually register, instead of trying to codify them in a rule. As a sidenote, I'm not sure that the pronunciation of em as /ẽj̃/ outside of Lisbon makes sense. I haven't found any documentation attesting it.
17. Again, I don't think it's a rule that transforms saudade in sâudade. I have the impression that it's common in Lisbon (but not in general Portuguese) to read the unstressed /aw/ as /ɐw/ and this might be a case that got registered, but it's certainly not a standard phenomenon.
Sorry for the delay in answering. I'll add a few answers for now and try to answer the rest later. - Sarilho1 (talk) 10:08, 29 September 2022 (UTC)
Some more answers @Benwing2
3. Indeed colloquial, which sometimes leads to misspellings, as can be seen with oscultar. Google searches seem to indicate oricular/ouricular are widespread both in Portugal and Brazil. Still, rather than making it a general (colloquial) rule, we should remove all except the ones with attested misspellings due to mispronunciations.
4. I think it's a feature that occurs mostly in the Northern dialects that are closer to Galician. I think adding it as dialectal is the best option.
5. It's in the same line as the previous two. In Northern dialects, it's common to open more the "o". I would say we should remove it.
8. I think it should be fine to implement /jɛ/ as the rule. I'm a bit surprised by the prescription of "e" in alienar. Personally, I would still pronounce it with /jɛ/. The unstressed /e/ seems to be what induces the colloquialisms of /jɐ/ (in 9.). In fact, "alianado" seems to produce some relevant hits in Google searches. Similar to vaículo (from veículo).
13. This article explains the phenomenon of initial /e/ instead of /i/ as a hypercorrection. I would say the double options in Infopédia are cases that went mainstream (mostly in Lisbon) and they registered them. The third option of errar, I think it could be taken as regional or colloquial and be removed since only the first two are prescribed.
14. Those are somewhat common mispronunciations, as you indicated. extendível is registered as a misspelling.
15. I think it was supposed to represent â, rather than à, based on the mispronunciation "eixistir". I removed it. - Sarilho1 (talk) 14:49, 11 October 2022 (UTC)
This thesis seems to suggest these improper clusters are plosive+plosive (pt, bt, bd, dk, kt), plosive+fricative (ps, bs, bv, bʒ, tz, dv, ks), plosive+nasal (pn, bn, tm, tn, dm, dn, gm, gn), fricative+plosive (ft), nasal+nasal (mn). Interestingly, it also states that some authors also consider plosive+liquid as cluster that can introduce silent "e" (e.g.: some speakers syllabify "planta" as "pe+lan+ta"). - Sarilho1 (talk) 20:03, 24 September 2022 (UTC)
@Sarilho1 Thanks again for your responses. I found a pronunciation /sɨ.vɨ.li.zɐ.ˈsɐ̃w̃/ given in the entry for civilização; this is also the first pronunciation listed in Infopédia (as /səvəlizɐˈsɐ̃w̃/). Maybe in cases of three /i/ in a row, both of the first two can be dissimilated? If this is general, I will implement it this way, otherwise if it just varies from word to word, I'll not list any dissimilated form. Benwing2 (talk) 02:30, 1 October 2022 (UTC)
Another issue has to do with syllable divisions like /ˈklawʃ.tɾu/. Esp. given the above discussion I'd definitely expect /ˈklaw.ʃtɾu/. These are found also in Infopédia, which has e.g. /kõbuʃˈtivɛɫ/. It is very strange to me that /kt/ is considered a possible onset but not /st/, but I will implement it this way for EP if you think this is right. Benwing2 (talk) 02:40, 1 October 2022 (UTC)
No, /ˈklaw.ʃtɾu/ definitely sounds weird to me. If the sound were /st/, that could happen (like stresse is a valid EP word), but /ʃt/ is quite hard for me to utter. The /l/, /ʃ/, /ʒ/ and /ɾ/ are definitely codas. Maybe this page might be a useful resource. - Sarilho1 (talk) 09:18, 3 October 2022 (UTC)
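
The Portugal syllable divisions discussed in this thread can be sketched roughly as below, in Python for illustration only. The names `IMPROPER_CLUSTERS`, `CODA_CONSONANTS` and `split_cluster_pt` are hypothetical; the cluster inventory is the one quoted from the thesis above, and the sketch deliberately covers only two-consonant clusters between vowels.

```python
# Cluster inventory as listed in the thread; treated here as an assumption.
IMPROPER_CLUSTERS = frozenset(
    "pt bt bd dk kt ps bs bv bʒ tz dv ks pn bn tm tn dm dn gm gn ft mn".split()
)
# Consonants described above as "definitely codas" in European Portuguese.
CODA_CONSONANTS = frozenset("lʃʒɾ")


def split_cluster_pt(cluster: str) -> tuple[str, str]:
    """Split an intervocalic two-consonant cluster into (coda, onset)."""
    if len(cluster) != 2:
        raise ValueError("this sketch only handles two-consonant clusters")
    if cluster in IMPROPER_CLUSTERS:
        # Both consonants open the next syllable, as in a.bdicar.
        return ("", cluster)
    if cluster[0] in CODA_CONSONANTS:
        # The first consonant closes the previous syllable, as in ʃ.t.
        return (cluster[0], cluster[1])
    raise ValueError(f"cluster {cluster!r} is not covered by this sketch")
```

So 'bd' stays together in the onset, while 'ʃt' and 'ɾt' divide between the consonants; proper onsets such as 'br' and three-consonant sequences such as 'ʃtɾ' would need extra rules.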

module FIXME's

@Sarilho1 FYI, here is my current list of FIXME's for the module, which I'm working through one-by-one:

  1. Implement i^ not before vowel = epenthetic i or deleted epenthetic i in Brazil (in that order), and i^^ not before vowel = opposite order. Epenthetic i should not affect stress but should otherwise be treated like a normal vowel. Deleted epenthetic i should trigger palatalization of t/d but have no other effects.
  2. Implement i^ before vowel = i.V or yV (in that order), and i^^ before vowel = opposite order.
  3. Implement i* = mandatory epenthetic i in Brazil.
  4. Implement o^ = u or o in Brazil (in that order), and o^^ = opposite order.
  5. Implement e^ = i or e in Brazil (in that order), and e^^ = opposite order.
  6. Implement des^ at beginning of word = /dis+/ or /des/ in Brazil (in that order), and des^^ = opposite order.
  7. In Portugal, before [ɫ], unstressed 'a' should be /a/; unstressed 'e' should be /ɛ/; and unstressed 'o' should be either /o/ or /ɔ/ (in that order).
  8. Support qualifiers using <q:...> and <qq:...>.
  9. Support references using <ref:...>. Syntax is the same as for IPA ref=.
  10. In Portugal, unstressed o in hiatus should be /w/, and unstressed e in hiatus should be /j/.
  11. Support - (hyphen) = left and right parts should be treated as distinct phonological words but written joined together, and non-final primary stresses turn into secondary stresses. Word-initial and word-final behavior should happen, e.g. Brazil epenthesis of (j) before word-final /s/ following a stressed vowel, Brazil raising of esC- and Portugal rendering of o- as ò-, but syllabification should ignore the hyphen, e.g. if the hyphen follows a consonant and precedes a vowel, the syllable division should happen before the consonant as normal.
  12. Support : (colon), similar to hyphen but in non-final parts, final vowels aren't rendered as closed.
  13. Support + (plus), similar to colon but non-final primary stresses aren't displayed.
  14. In Brazil, word-initial enC-, emC- should display as (careful pronunciation) ẽ-, (natural pronunciation) ĩ-.
  15. In Portugal, -sç- and -sc(e/i)- should show as (careful pronunciation) /ʃs/, (natural pronunciation) /ʃ/.
  16. In Portugal, grave accent indicates unstressed open a/e/o and macron indicates unstressed closed a/e/o; both are ignored in Brazil.
  17. In Portugal, iCi where the first i is before the stress should (maybe) show as iCi, (traditional pronunciation) ɨCi. In iCiCi, both of the first two i's show as ɨ in the traditional pronunciation (FIXME: verify this). C should be only a single consonant, hence not in piscina or distrito (FIXME: verify this). Does not apply if the first i is stressed (e.g. mínimo, tília, pírico, tísica) or if the stressed i is word-final (Mimi, Lili, chichizinho, piripiri), or in certain other words (felicíssimo, filhinho, estilista, pirite). Possibly this means it doesn't apply when the stressed i is in a suffix (-íssimo, -inho, -ista). We can always disable the eCi spelling by adding an h in 'ihCi' to make it look like a cluster between the i's. NOTE: It appears that iCi -> eCi should apply in dicionário, meaning if we apply it at the end, we have to distinguish between glides from original i and glides from e or y.
  18. In Portugal and Brazil, stressed o in hiatus should automatically be ô (voo, Samoa, Alagoas, perdoe, abençoe).
  19. In Portugal, stressed closed ô in hiatus (whether written explicitly as e.g. vôo, Côa or generated automatically) should show as e.g. /ˈbo.ɐ/, (regional) /ˈbo.wɐ/. (FIXME: Verify syllable division in second.)
  20. Recognize -zinha like -zinho, -mente. Just use hyphen (-) to handle these. We don't recognize -zão, -zona, -zito, -zita because of too many false positives; you can just write the hyphen explicitly before the suffix as needed. Cf. among our current vocabulary we have 10 -zão augmentatives (animalzão, aviãozão, cipozão, cuzão, homenzão, leãozão, paizão, pãozão, pezão, tatuzão), 2 -ão augmentatives after a word ending in -z (codornizão, felizão), and 7 non-augmentatives (alazão, coalizão, razão, rezão, sazão, sezão, vazão). Similarly for -zona: we have 5 -zona augmentatives (boazona, cuzona, maçãzona, mãezona, mãozona) against 8 non-augmentatives (amazona, aminofenazona, arilidrazona, Arizona, cronozona, ecozona, Eurozona, fenazona) and no -ona augmentatives after words ending in -z. For -zito, we have 1 -ito diminutive after a word ending in -z (Queluzito), one non-diminutive (quartzito), and no -zito diminutives. For -zita we have 1 -zita diminutive (maçãzita) and 4 non-diminutives (andaluzita, monazita, pedzita, stolzita).
  21. Don't special-case final 'a' before -zinho, -zinha, -mente. (FIXME: Ask Ungoliant about this. Maybe both unreduced and reduced '-a' are possible here.)
  22. Final 'r' isn't optional before -zinho, -zinha, -mente.
  23. Consider making secondary stress optional in cases like traduçãozinha where the stress is directly before the primary stress.
  24. In Brazil, unstressed final-syllable /a/ should be reduced even before a final consonant. Cf. açúcar, tórax. (Except possibly /l/? FIXME: Verify.)
  25. Support + = pagename.
  26. Deduplicate final pronunciations without distinct qualifiers.
  27. Implement support for dot-under without accompanying quality diacritic. When attached to a/e/o, it defaults to acute = open pronun, except in the following circumstances, where it defaults to circumflex: (1) in the diphthongs ei/eu/oi/ou; (2) in a nasal vowel.
  28. Portugal final -e should show as optional (ɨ) unless there is a vowel-initial word following, in which case it should not be displayed at all.
  29. Syllabification: "Improper" clusters of non-sibilant-obstruent + obstruent (pt, bt, bd, dk, kt; ps, bs, bv, bʒ, tz, dv, ks; ft), non-sibilant-obstruent + nasal (pn, bn, tm, tn, dm, dn, gm, gn), nasal + nasal (mn) are syllabified in Portugal as .pt, .bv, .mn, etc. Note ʃ.t, ʃ.p, ʃ.k, etc. But in Brazil, all of these divide between the consonants (p.t, b.v, ʃ.t, s.p, etc.). Particular case: ab-rogação divides as a.brr in Portugal but ab.rr in Brazil.
  30. -ão, -ãe, -õe should be recognized as nasal diphthongs with a circumflex added to force stress.
  31. In CluV, CruV, CliV, CriV, the 'u' and 'i' are vowels not glides in both Portugal and Brazil.

Benwing2 (talk) 08:01, 2 October 2022 (UTC)Reply
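
To make item 26 above concrete, here is a minimal sketch of one possible reading of "deduplicate final pronunciations without distinct qualifiers": a repeated pronunciation is dropped unless it carries a qualifier that the earlier copy lacks. This is not the module's code (the module is written in Lua); the function name and the (ipa, qualifiers) pair layout are assumptions made for illustration only.

```python
def dedupe_pronunciations(entries):
    """Drop exact repeats from a list of (ipa, qualifiers) pairs.

    Hypothetical data layout: `ipa` is a string and `qualifiers` is a
    sequence of strings such as ("regional",). Two entries count as
    duplicates only when both the IPA and the qualifiers match, so a
    repeat that carries a distinct qualifier is kept.
    """
    seen = set()
    result = []
    for ipa, qualifiers in entries:
        key = (ipa, tuple(qualifiers))
        if key in seen:
            continue  # same IPA, same qualifiers: redundant
        seen.add(key)
        result.append(key)
    return result
```

Order is preserved, so the first occurrence of each pronunciation stays in place.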

See User:Benwing2/test-pt-IPA, with a bunch of test cases to exercise these new features. Some are now implemented using a sandbox module, but a great deal of these cases are still broken. Benwing2 (talk) 01:06, 3 October 2022 (UTC)Reply
@Sarilho1 Hi. I implemented most of these fixes; see the test cases in User:Benwing2/test-pt-IPA as well as the sandbox module Module:User:Benwing2/pt-pronunc, which has an updated list of 47 FIXME's along with an indication of which ones are implemented. Some are still broken but will be fixed soon (e.g. I need to figure out the best way to handle prefixes like auto- in a way that works for both Portugal and Brazil). (NOTE: Inside of the list of FIXME's are a few "verify" FIXME's that are mostly questions I need to ask you; if you have a chance please take a look :-) ...) I also have a month's worth of changes across the whole set of Portuguese lemmas that I'm merging and will push them to the site when I'm done. In some cases they conflict with changes you've made in the last month, and in some cases your changes aren't correct; please note for example that 'aceno' (1st person singular present indic of acenar) is pronounced 'acêno' in Brazil, not 'acéno' as you have it; in general there are no words in Brazilian Portuguese with open 'é' or 'ó' followed by a nasal consonant. Another source of differences is words in '-ei-' + vowel like geleia; often these have 'éi' in Brazil but 'êi' in Portugal. In these cases, Priberam frequently says something like 'Grafia no Brasil: geléia' (which is no longer correct but was correct prior to the 1990 spelling reform). In both of these cases, I have implemented a shortcut in my new code, which is to write the Brazilian vowel followed by a *, which specifies to use the vowel as-is in Brazil and the opposite-colored vowel (é vs. ê, ó vs. ô) in Portugal. I wasn't sure whether this choice makes the most sense in terms of what is easiest to remember but it's consistent with i*, which I had already implemented to indicate an epenthetic i in Brazil that isn't present in Portugal. Benwing2 (talk) 17:57, 9 October 2022 (UTC)Reply
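
For illustration, the * shortcut described above could be sketched roughly as follows. This is only a Python approximation of the idea, not the module's Lua code; the function name expand_star and the choice to return a (Brazil, Portugal) pair of respellings are assumptions.

```python
import unicodedata

# Vowel qualities that swap between Brazil and Portugal under the '*' shortcut.
OPPOSITE = {"é": "ê", "ê": "é", "ó": "ô", "ô": "ó"}


def expand_star(respelling):
    """Return (brazil, portugal) respellings for a respelling that uses '*'.

    Hypothetical helper. A starred é/ê/ó/ô is kept as-is for Brazil and
    swapped for Portugal; a starred 'i' is kept for Brazil (epenthetic i)
    and dropped for Portugal. A '*' after any other letter is left alone.
    """
    text = unicodedata.normalize("NFC", respelling)
    brazil, portugal = [], []
    i = 0
    while i < len(text):
        ch = text[i]
        starred = i + 1 < len(text) and text[i + 1] == "*"
        if starred and ch in OPPOSITE:
            brazil.append(ch)
            portugal.append(OPPOSITE[ch])
            i += 2
        elif starred and ch == "i":
            brazil.append(ch)  # Brazil only: epenthetic i
            i += 2
        else:
            brazil.append(ch)
            portugal.append(ch)
            i += 1
    return "".join(brazil), "".join(portugal)
```

Under these assumptions, the respelling 'acê*no' would expand to 'acêno' for Brazil and 'acéno' for Portugal, and 'gelé*ia' to 'geléia' and 'gelêia' respectively.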
I have pushed the new module changes to production and am in the process of pushing all the lemma changes as well; it takes a few hours to do the latter as bots can only save at most one page a second (slower if it takes more than one second to regenerate a given page). Benwing2 (talk) 22:02, 9 October 2022 (UTC)Reply
Sorry @Benwing2. I will try to get back to you as quickly as possible. I've been editing mostly to take my mind off work, so I haven't put much effort into finding the sources I need to properly answer your questions. Still, I noticed something about the new changes: the plurals in Portugal no longer show a sibilant. Was that intended or is it just a bug? - Sarilho1 (talk) 22:34, 10 October 2022 (UTC)Reply
@Sarilho1 OK, just ping me when you have a chance to look into some of the questions, no need to do them all at once. The issue with plurals was definitely a bug, which I have fixed. Benwing2 (talk) 05:12, 11 October 2022 (UTC)Reply
Two quick notes. Indeed I've made the mistake of adding some pronunciations of verbal forms like aceno. However, since BP doesn't seem to support é before nasal consonants, I think we could simply have those cases automatically lowered to e. Regarding the notes in Priberam, it only shows the pre-1990 Brazilian spelling if you select the pre-AO1990 option. So when consulting it, make sure that you have selected that option; otherwise you might be misled. - Sarilho1 (talk) 14:57, 11 October 2022 (UTC)Reply
@Sarilho1 Thank you very much for your responses. Indeed BP doesn't normally support é or ó before nasal consonants, although I'm a bit wary of automatically making these vowels become ê and ô in this circumstance, both because there may be exceptions (e.g. manual /ˈɔ.mẽj̃/ is given currently for São Paulo for homem) and because it may not completely apply in unstressed syllables: The raising of 'e' and 'o' before nasals may be related to the nasalization of these vowels specifically when stressed before even heterosyllabic 'm' and 'n', hence aceno is pronounced with a nasal vowel in Brazil but not in Portugal. That said, I don't know of any exceptions in stressed or unstressed syllables other than the possible one I just mentioned. Also I have one more question about words like sobrio and others ending in '-io', e.g. rio. There are frequent manual pronunciations for Brazil that indicate both /i.o/ and /iw/ as possibilities, and occasionally both possibilities are given for Portugal as well. Do you think it's worth autogenerating both, or is the /iw/ pronunciation just a fast pronunciation not worth recording (along with lots of other fast pronunciations)? Benwing2 (talk) 05:07, 12 October 2022 (UTC)Reply
@Sarilho1 One more example (also marginal) where a vowel before a nasal consonant is open: ramen, pronounced either 'rámem' or 'râmem' in Brazil. Also I have pushed some fixes which include the handling of unstressed /ie/ -> /jɛ/ as discussed above (including in words like Teerão where the first 'e' is in hiatus and hence raised to /j/). Benwing2 (talk) 07:01, 13 October 2022 (UTC)Reply
One more thing: I am thinking of cleaning up etymologies to use {{bor+}} and {{inh+}} instead of writing out "Borrowed from {{bor|pt|...}}" or "From {{bor|pt|...}}" etc. Some people for some strange reasons object to the plus-variant templates so I want to make sure you're OK with them before making the change. Benwing2 (talk) 11:19, 13 October 2022 (UTC)Reply
I've already been using {{bor+}}, but I've abstained from replacing {{inh+}}, since there wasn't a very clear community consensus. I don't particularly mind the change, but I would still caution to be careful with mass replacements that might spark backlashes in the community. - Sarilho1 (talk) 08:30, 18 October 2022 (UTC)Reply
@Sarilho1 OK, thanks. Benwing2 (talk) 06:03, 19 October 2022 (UTC)Reply
One more thing :) ... I implemented Portugal eí, e.i -> aí, a.i in my sandbox module (ateísta, proteína, europeizar respelled 'europe.izar'). But I see now that veículo is written pt=vēículo|gpt=vaículo. Should I implement it this way? Benwing2 (talk) 00:04, 14 October 2022 (UTC)Reply
According to this discussion, it seems that eí is often pronounced as e(i)í (though the source proscribes it). Thus we get Lisbon's /ɐˈi/. Maybe a better solution would simply be to introduce the optional diphthong like we already do with words like lenha, thus rendering gpt's /ɐ(j)ˈi/. What do you think? - Sarilho1 (talk) 08:42, 18 October 2022 (UTC)Reply
@Sarilho1 Hmm, even though it's proscribed? Note that the module can always have special cases for different dialects to avoid proscribed variants. Benwing2 (talk) 06:03, 19 October 2022 (UTC)Reply
I don't usually give much credit to sources that proscribe certain pronunciations. I would prefer if we had a more descriptivist approach. However, if you think we shouldn't include them, it's still fine by me. - Sarilho1 (talk) 09:27, 19 October 2022 (UTC)Reply
@Sarilho1 IMO it all depends on what educated speakers think. For example, dropping the first 'r' in February (pronouncing it like 'Febyuary' or similar) may be proscribed by some sources but it's nonetheless common among educated speakers, whereas dropping the first 'r' in library sounds uneducated; we include it but correctly label it as nonstandard. Another example is pronouncing final '-o' as '-er' ("potater", "tomater", etc.); this is common in certain dialects (e.g. Appalachian English) but definitely avoided by educated speakers except to be deliberately humorous. In this case, tomater is given as an "eye dialect" spelling but not listed at all as a possible pronunciation. In general, given the multitude of possible Portuguese pronunciations and the possibility of information overload, I'd prefer to omit any pronunciations that are nonstandard or are avoided by educated speakers, but include those that are common among educated speakers, possibly with a note such as "proscribed but common among educated speakers" if needed. In this case, what is the situation with the inserted glide? Benwing2 (talk) 05:07, 20 October 2022 (UTC)Reply
@Sarilho1 I am going through and adding pronunciations for diminutives in '-inho/-inha'. It seems that in Brazil, such words always preserve the quality of the stressed vowel of the base noun or adjective. However, in Portugal it seems this often is not the case, e.g. per Infopédia, certinho, amarelinha, azedinha, capelinha, cedinho have a reduced /ɨ/; but betinho has /ɛ/ rather than reduced /ɨ/. Likewise, 'a' is reduced. On the other hand, abobrinha, agorinha, bolinha have /ɔ/ rather than reduced /u/; yet cebolinho/cebolinha have /u/ not /o/. Is there a general rule here? E.g. how are arrozinho, bolinho "small cake", bonequinha/bonequinho, boquinha, cabecinha, cachorrinho, carrocinha, ceguinho and velhinho pronounced in Portugal? Thanks! Benwing2 (talk) 02:22, 17 October 2022 (UTC)Reply
I'm not aware of any general rule for this. In your examples, arrozinho (chiefly), bolinho, cachorrinho, and boquinha are pronounced with /u/, carrocinha is chiefly pronounced with /ɔ/, bonequinha, bonequinho, ceguinho are pronounced with /ɛ/, while velhinho follows the pronunciation of velho (/ɐ(j)/ for gpt and /ɛ/ for the rest), cabecinha is pronounced with /ɨ/. It seems to suggest that /e/ is reduced to /ɨ/, while /ɛ/ might or might not be reduced, /o/ goes to /u/ while /ɔ/ is maintained. This reminds me that I still have to retrieve the source that led me to register festinha with two different pronunciations. - Sarilho1 (talk) 08:58, 18 October 2022 (UTC)Reply
@Sarilho1 It is strange there is no general rule. Note also that this link [13] which you supplied earlier claims that unstressed vowels are reduced before -inho and -íssimo in Portugal (unlike before -zinho). I guess this is somewhat incorrect? Benwing2 (talk) 06:03, 19 October 2022 (UTC)Reply

Questions for Ungoliant

@Ungoliant MMDCCLXIV I am working on Portuguese pronunciations, as you can see from the above discussions. I gather you are a native speaker of South Brazilian Portuguese, and I'd like to get your help on some matters.

  1. You've added quite a lot of manual South Brazil pronunciations. I'd like to remove them and have the module automatically generate South Brazil regional pronunciation whenever possible. Can you help me by enumerating the differences between South Brazil pronun and standard Brazilian dialect(s)? When looking at the manual South Brazil pronuns, I see some patterns:
    1. Final -o is /o/, and final -e is /e/.
    2. This dialect does have palatalization of 't' and 'd' before 'i'.
    3. Syllable-final 'l' is normally /w/ like elsewhere; but you write almofada as /ˌaɫ.mo.ˈfa.da/; can you explain?
    4. You write ab-rogar as /ab.ʁoˈɡa(ɻ)/|/ab.hoˈɡa(ɻ)/|/ab.χoˈɡa(ɻ)/|/ab.ɦoˈɡa(ɻ)/, which suggests that syllable-final 'r' is "Caipira" /ɻ/. OTOH you write abactor as /aˌbak(i)ˈtoɾ/, using /ɾ/, and abafador as [a.ˌba.fa.ˈdoɻ]|[-ˈdoɾ]; can you clarify the syllable-final 'r' situation?
    5. You write ab-rogar as above without epenthetic /i/ (where I think standard pronunciation would be something like [a.bi.ho.ˈga(h)]), and similarly administração as [ˌadmiˌnistɾaˈsɐ̃w] and advogado as [ˌad.vo.ˈɡa.do] (whereas you write Nordestino [ˌadimiˌnistɾaˈsɐ̃w], [ˌadi.vu.ˈɡa.dʊ]). Is epenthetic /i/ less common here?
    6. You write abano as /a.ˈbɐ.no/ without nasalization of /ɐ/; not sure if this means the dialect doesn't nasalize stressed vowels before nasals in open syllables, or if this is just a phonemic representation. You even write phonetic [awˈkɐno] as well.
    7. You write -rr- as /h/ in acorrentado; is this the normal pronunciation?
    8. You write agente as [aˈʒẽte]|[aˈʒente]. Is nasalization optional vs. a pronounced /n/?
    9. You write agnosticismo as [aɡˌnostʃiˈsismo] instead of expected /izmo/. Is this a feature of this dialect or just a phonemic representation?
    10. You write alvor as /aw.ˈvɔɻ/ instead of expected /aw.ˈvoɻ/. Can you explain?
    11. It appears that raising of -ear to -iar is absent; similarly calcâneo /kaw.ˈkɐ.ne.o/. Is this related to the lack of raising of final 'e' to /i/?
    12. Any other things I should be aware of?
  2. Do you know of any online sources for Brazilian Portuguese pronunciation? There are lots of online monolingual Brazilian Portuguese dictionaries but none seem to have IPA in them.
  3. 'i' in hiatus: any rules you can give? It appears that final '-ia' is usually /ja/, and '-io' is usually /ju/. Or maybe it can be either /i.a/ or /ja/, and /i.u/ or /ju/?
  4. '-ear' in verbs: is this pretty consistently pronounced as if written '-iar', or are there no rules?
  5. Epenthetic /i/: The manual IPA seems to insert this less often than I would expect, e.g. ab-rogar /ab.hoˈɡah/ where I would expect /a.bi.hoˈgah/ (and I verified various Youtube recordings with this latter pronunciation), and magnânimo /magˈnɐ̃.ni.mu/ where I would expect /ma.giˈnɐ̃.ni.mu/. OTOH you have (or someone has) added pronunciations with epenthetic /i/ in /ks/, /kt/ and /ps/ clusters, where I thought it wasn't normally inserted (especially not in /ks/). Indeed /fakˈtʃi.vu/ rather than /fa.kiˈtʃi.vu/ seems more common on Youtube. Can you make any comments on whether and under what circumstances /i/ is epenthesized?
  6. '-ejar' and '-elhar' verbs: Earl W. Thomas, The Grammar of Spoken Brazilian Portuguese, p. 46 says verbs in '-ejar' and '-elhar' take /e/ not /ɛ/ in stressed forms in Brazil ('eu desejo', 'eu aconselho'), with the only exception being invejar. Is this true? We have plenty of '-ejar' verbs like bocejar, cacarejar, cortejar, ensejar, festejar, gargarejar, gorgolejar, gracejar, lampejar, latejar, manejar, motejar, ornejar, pejar, sobejar, traquejar, varejar, etc. All of these just listed have deverbal nouns in '-ejo' pronounced /eʒu/, and for some of them (lampejo, manejo, pejo) the pronunciation of the corresponding verbal form is explicitly given with /ɛʒu/. Similarly, espelho verb form of espelhar is given with South Brazil /ɛlu/ (although conceivably that is South-Brazil-specific).

Thanks for any help. Benwing2 (talk) 04:25, 26 September 2022 (UTC)Reply

Optional /ɨ/

Hi @Benwing2. I noticed that you added the optional final /ɨ/; however, I think there are some aspects that now look a bit weird. First, I would assume the syllabification of cidade with the dropped /ɨ/ would be IPA(key): /siˈdad/ instead of IPA(key): /siˈda.d/ (compare Madrid). However, I understand that it would be hard to indicate that in the module in a compact way, so if you think it's not a major problem, I don't think it needs a change. However, there's one case where the current version of the module doesn't seem to make much sense, which is when removing /ɨ/ leaves /l/ as the final letter, such as in vale. I think in this case, the /l/ should be IPA(key): [ˈvaɫ] rather than the current IPA(key): [ˈva.l]. - Sarilho1 (talk) 10:42, 7 November 2022 (UTC)Reply

@Sarilho1 Hi, I missed this. Good point about /l/, I will fix it. As for the syllabification issue, it would require the module to generate two outputs, which is definitely possible but not clear if it's worth it; I'll leave it as-is for now but we can always change it later. Benwing2 (talk) 02:04, 16 November 2022 (UTC)Reply
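
As a rough illustration of the two points raised in this thread (resyllabifying once /ɨ/ is dropped, and velarizing a newly final /l/), the variant without /ɨ/ could be derived along these lines. This is a sketch in Python rather than the module's Lua, the function name is made up, and it assumes the final syllable is unstressed and separated from the preceding one by a dot.

```python
def drop_final_schwa(ipa):
    """Return the variant of `ipa` with the final /ɨ/ dropped.

    Hypothetical helper. The consonant stranded by the deletion is merged
    into the preceding syllable (the last '.' is removed), and a final
    /l/ that ends up in coda position is rendered as dark [ɫ].
    """
    if not ipa.endswith("ɨ"):
        return ipa
    stem = ipa[:-1]
    # Split at the last syllable break; if there is none, `head` is empty.
    head, _dot, onset = stem.rpartition(".")
    merged = head + onset
    if merged.endswith("l"):
        merged = merged[:-1] + "ɫ"
    return merged
```

With these assumptions, 'siˈda.dɨ' gives 'siˈdad' and 'ˈva.lɨ' gives 'ˈvaɫ'. Showing both the full and the reduced form would still need two outputs per word, which is the cost mentioned in the reply above.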

Respelling of "x" as "sh" instead of "ch"

@Benwing2, do you mind if we change the asked respelling of "x" with value of IPA(key): /ʃ/ from "ch" to "sh"? I would like to leave the etymological "ch" available for a future introduction of the North Portuguese dialects that read it as IPA(key): /t͡ʃ/ (for instance chuva as IPA(key): t͡ʃuβɐ). - Sarilho1 (talk) 15:23, 15 November 2022 (UTC)Reply

@Sarilho1 That is fine with me. Just a question, though, how common are the North Portugal dialects that distinguish 'x' from 'ch'? Benwing2 (talk) 02:06, 16 November 2022 (UTC)Reply
The dialects are definitely being supplanted by standard Portuguese, but since they conserve features from Galician-Portuguese, I think it would be interesting to include them at a later date. This map (the two areas in blue) indicates the approximate extent of the Northern dialects. - Sarilho1 (talk) 23:28, 16 November 2022 (UTC)Reply
@Sarilho1 I implemented this and your previous request concerning -le, and respelled all terms where 'ch' was being used to respell 'x' with 'sh' instead (these changes are still being pushed; they will be done in an hour or so). Benwing2 (talk) 09:04, 19 November 2022 (UTC)Reply

Don't display phonetic IPA

@Benwing2 Is the following change, "76. Don't display phonetic IPA if identical to phonemic IPA. [DONE]", necessary? I find it a bit confusing that sometimes Portugal presents the phonetic IPA, but Brazil doesn't or vice-versa. - Sarilho1 (talk) 15:42, 22 November 2022 (UTC)Reply

@Benwing2 Hmm. For me I find it simplifies the output by not including redundant info; usually the purpose of having both phonetic and phonemic IPA is that the phonetic IPA contains more detailed info not present in the phonemic IPA, but if they're both identical, that kind of defeats the purpose. Maybe there's some other way to indicate this? E.g. if the phonetic IPA is identical, write '[same]' or something similar in place of the phonetic IPA? That would maybe be less confusing for you. Benwing2 (talk) 10:16, 24 November 2022 (UTC)Reply
@Sarilho1 Oops, I pinged myself, not you :) ... Benwing2 (talk) 10:17, 24 November 2022 (UTC)Reply
Or maybe just indicate the phonetic IPA when they are equal. Since it's the most detailed one, it would be clearer that there is no further information missing. - Sarilho1 (talk) 11:10, 7 December 2022 (UTC)Reply
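
The two display options discussed in this section can be compared with a small sketch. It is written in Python for illustration only (the module itself is Lua), and the function name and the prefer_phonetic_when_equal flag are hypothetical.

```python
def format_pronunciation(phonemic, phonetic, prefer_phonetic_when_equal=False):
    """Format one pronunciation line from phonemic and phonetic IPA.

    Hypothetical helper. When the two transcriptions differ, both are
    shown. When they are identical, only one is shown: the phonemic form
    by default (the behaviour of FIXME 76 as described above), or the
    phonetic form if `prefer_phonetic_when_equal` is set (the alternative
    suggested in this thread).
    """
    if phonemic != phonetic:
        return f"/{phonemic}/ [{phonetic}]"
    if prefer_phonetic_when_equal:
        return f"[{phonetic}]"
    return f"/{phonemic}/"
```

Either way the output stays one transcription long when nothing is lost; the flag only decides which bracket style signals that.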

Please add Angolan Portuguese

Can someone please add Angolan Portuguese to this template and module? An IPA table can be found at wikipedia:Angolan Portuguese#Phonology. Angola is the second most populated Lusophone country after Brazil (even more people than Portugal), so I think adding Angolan pronunciation would be good. Can someone please do this? Thanks! 153.107.26.53 02:50, 30 November 2022 (UTC)Reply

I support adding Angolan and more dialects to this template, I think it's helpful to have IPA for words in less well-known dialects. 2001:8004:44F0:F2AF:2C2C:A387:9D83:BFA1 20:14, 5 December 2022 (UTC)Reply
IMO it's already complex enough without new dialects, and we need a lot more than just the Wikipedia Angolan Portuguese phonology page to do justice to the Angolan dialect; we need all the individual words that are pronounced unpredictably, and I doubt that reference is easily available. Benwing2 (talk) 09:00, 7 December 2022 (UTC)Reply

Breaking-up of hiatuses in Brazilian Portuguese

In Brazilian Portuguese, hiatuses where the first vowel of the hiatus is the stressed one tend to get broken up. This is very standard and actually a reason for terms like Corrêa, idéa to have become Correia, ideia in the modern-day orthography. It's represented in verbs too, see all the verbs that get their "-e-"s turned into "-ei-"s once they become stressed. Also the cognates with Spanish and Galician where the only difference between them is stressed e -> ei. This happens with open Es, closed Es and closed Os alike. I can't think of any hiatuses where the first element is an open O. The 'breaking up' results in the addition of a bunch of semi-vowels, which are ignored by this module in other cases too for some reason. There are some studies on the matter too, and I could've sworn I've seen someone once say that it's weird how 'Brazil really hates hiatuses and puts glides everywhere in order to avoid them' in some online.

I actually feel like it'd be accurate to put the optional glide on both sides of the hiatus, so "boa", "idéa" would be /ˈbo(w).(w)ɐ/, /i.ˈdɛ(j).(j)ɐ/ and not /ˈbo(w).ɐ/, /i.ˈdɛ(j).ɐ/. For simplicity, though (and by taking some influence from the current glide-averse module), I'll be keeping /ˈbo(w).ɐ/-like forms here. Here's a small, non-exhaustive list off the top of my head:

"Enéas" and "Corrêa" can be/are written as "Eneias" and "Correia". In those cases, I don't think the glide should be optional (/eˈnɛj.ɐs/,/koˈʁej.ɐ/) since it's actually written.

It can also happen when the first term isn't stressed if said first term is E*, too:

  • cear: /se(j)ˈaɾ/ (don't mind the rhotic; that could've been pretty much anything lmao)
  • freada: /fɾe(j)ˈa.dɐ/

"Ceiar" and "freiada" are both common misspellings, mind. This is the case for a reason. I've caught myself saying enseada as /en.sejˈa.dɐ/ plenty of times, too (I've never heard "ensiada" in Brazil, though I don't doubt someone somewhere might say it that way).

*: It doesn't happen with O. Instead, it either gets reduced to U... "feijoada": /fe(j).ʒuˈa.dɐ/ (the module doesn't represent this either. Huh.) Or it gets "merged" with the O that follows it. "Coordenador": /ko.oɾ.de.na.ˈdoɾ/,/koɾ.de.na.ˈdoɾ/ (yet again, the module can't do it... also pay no mind to the rhotic). MedK1 (talk) 01:05, 18 October 2023 (UTC)Reply
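
To pin down the rule as described in this section, here is a minimal sketch of the optional-glide insertion. It is written in Python for illustration, is not taken from the module, and makes several assumptions: input is syllabified IPA using '.' and 'ˈ' as boundaries, a syllable counts as stressed when 'ˈ' directly precedes it, and the vowel inventory in VOWELS is only a rough stand-in.

```python
import re

# Rough inventory of oral vowel symbols; an assumption for this sketch.
VOWELS = "aeiouɐɛɔɨ"


def break_hiatus(ipa):
    """Insert optional glides into hiatuses, following the description above.

    Hypothetical helper. A syllable ending in e/ɛ gets '(j)' before a
    vowel-initial syllable whether or not it is stressed; a syllable
    ending in o gets '(w)' only when it is stressed.
    """
    # Even indexes are syllables, odd indexes are the boundary markers.
    parts = re.split(r"([.ˈ]+)", ipa)
    out = []
    for idx, part in enumerate(parts):
        if idx % 2 == 1 or not part or idx + 2 >= len(parts):
            out.append(part)
            continue
        following = parts[idx + 2]
        stressed = idx >= 1 and "ˈ" in parts[idx - 1]
        if following and following[0] in VOWELS:
            if part[-1] in "eɛ":
                part += "(j)"
            elif part[-1] == "o" and stressed:
                part += "(w)"
        out.append(part)
    return "".join(out)
```

On this reading 'ˈbo.ɐ' becomes 'ˈbo(w).ɐ' and 'fɾeˈa.dɐ' becomes 'fɾe(j)ˈa.dɐ', while the unstressed o.o of 'ko.oɾ.de.na.ˈdoɾ' is left alone. The reduction of unstressed o to u (as in feijoada) is not attempted here.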

Pinging @Benwing2 so this can get looked at. Paging @Sarilho1 so we can maybe talk about Template_talk:pt-IPA#Suggestion and Template_talk:pt-IPA#Pronunciation_of_unstressed_hiatuses/diphthongs at the same time. MedK1 (talk) 02:05, 17 November 2023 (UTC)Reply
Also, can we talk about how the pronunciation at dezesseis doesn't match the module's predicted spelling at all? This isn't to say the speaker said it wrong, oh no, I say it exactly the same way, it's a problem with the module. I talked a little about it above, but the TL;DR is that it goes for pretty much anything starting in "des-" or "dez-". MedK1 (talk) 02:15, 17 November 2023 (UTC)Reply
@MedK1 Hmm, are you saying the normal pronunciation of des- is /dz/ in Brazil? That seems strange and not in accordance with what I heard when I was in Brazil. Maybe for European Portuguese? Benwing2 (talk) 02:52, 17 November 2023 (UTC)Reply
@MedK1 Also in general when we specify pronunciations, we give more enunciated/formal pronunciations than people might say when speaking fast; otherwise it just gets impossible. Benwing2 (talk) 02:53, 17 November 2023 (UTC)Reply
I'm Brazilian, @Benwing. When "des-" is preceded by a vowel (or a voiced consonant like in "desgarrar"), I'd say [d͡z] is absolutely a normal pronunciation, even when speaking carefully: just listen to the audio in the page; there's no vowel. I think it's far more common than /de/zesseis actually. /d͡ʒi/zesseis is plenty common too though. The other pronunciations in this section and the ones I linked to above have nothing to do with speaking quickly either; indeed, the hiatus breaking thing actually makes speech slower since it adds an extra sound. And the other one is just the normal pronunciation: like I said in the other section, I haven't heard sanguín/e.u/ ever. It sounds really really off to my ears.
Everything I'm reporting here is stuff I'd do even if I were to speak in slow-motion. You say only 'enunciated/formal pronunciations' are given, and yet feijoada has a /w/ in there. All those faster pronunciations changing Us in hiatuses to /w/ are actually things I wouldn't do, even when speaking at mach 1. atuar is always a-tu-á(r) for example (and "adiar" is always ad/i/ar). I'm not sure what might've led these pronunciations to be made standard for the module; I've actually talked about this online once a few years ago -- I don't remember the specifics of the conversation, but my point here is that the only person I've seen who says they pronounce those kinds of words like that was from Curitiba, which I believe counts as "South Brazil" for this module.
I mean, maybe I'm the one who has a regional pronunciation, but I really don't think so: my family's from all over Brazil, loads of voice actors on the TV are from Rio and I go to near Greater São Paulo yearly... I seriously don't get the feeling that I'm talking about something that's just regional, you know? MedK1 (talk) 03:12, 17 November 2023 (UTC)Reply
A few more notes: I don't think European Portuguese's phonetics allow something like this because most of the time, in an environment like this, they'd have turned their Ds into dentals afaik.
Also, this goes for "des-" before an unvoiced consonant too, with [d͡s], and with "tes" too: /ɐ̃t͡s/ for antes is listed as "Caipira" only but when I look at it, I can't help but think "no way!". MedK1 (talk) 03:14, 17 November 2023 (UTC)Reply
I'd been trying to come up with some fancy sentence that had everything I'd been commenting about. I couldn't. So have me rambling instead here and here. I tried to speak a bit more slowly, but I'm not sure if I succeeded lol. 2804:1B0:1903:FF5F:C4EA:573B:1EBF:6CEC 03:30, 17 November 2023 (UTC)Reply
Here's a third one concerning glides. I hope it's alright if the audios are mostly in Portuguese? It completely slipped my mind that I might talk in a way that's hard to understand or something... 2804:1B0:1903:FF5F:C4EA:573B:1EBF:6CEC 03:37, 17 November 2023 (UTC)Reply
@MedK1 OK, I can't fix this right now due to time constraints but if you make a list of the things you think should be changed, I will look into it. The current situation of the module is due to my trying to match the previous manually-specified pronunciations along with comments from other native speakers. Keep in mind the module is very, very complicated, so anything that moves in the direction of simplification is welcome. Also, different native speakers seem to disagree rather strongly with each other about what is normal and what isn't, and it seems to differ from region to region, which makes things a lot trickier. BTW IMO the "South Brazil" accent with final /o/ and /e/ should probably be deleted because it represents a vanishingly small fraction of speakers; it's there because someone (I think User:Ungoliant MMDCCLXIV) added over 2000 manually specified pronunciations of this nature and I tried to match them rather than delete them. Benwing2 (talk) 03:47, 17 November 2023 (UTC)Reply
@Benwing2 For Brazil:
  • -ia, -io endings shouldn't have two pronunciations. Keep only the "faster" one, the one that matches Portugal.
  • -ie is correct; do put the j in parentheses though: "superfície"'s ending can be pronounced as both /si.i/ and /s(j)i/.
  • -ea, -eo, -ee unstressed endings should work exactly like -ia and -io after the above changes. No more /e.u/; /ju/ only.
  • The same goes for -oa and -oe; not "o.ɐ" and "o.i", but "wɐ" and "oj".
  • unstressed "o" before stressed "ada" should be reduced to "u"; /u/s shouldn't become /w/s like that.
  • "a", "da", etc. should have reduced (upside down) As in the module. Both pronunciations are common and the difference between Brazil and Portugal isn't so much an actual 'different pronunciations for the word' thing, but moreso a difference in prosody. Words like "uma", "aquela" and "ela" often get their final As 'unreduced' in Brazil.
Brazil, things that complicate rather than simplify:
  • Word-initial "NH" should be treated in narrow transcriptions like word-medial NH and be represented as a nasal j.
  • There's the hiatus breaking I mentioned above. I kind of feel bad talking about this since the module must've taken a lot of work already, and neither this nor the nasal glide thing is even phonemic. I have a section on the glides below this one. Maybe they could show up only in the narrow transcription?
    • In narrow transcriptions, nasal E and O should always have a diacritic under them saying they're slightly more open than the closed vowels; I have a paper backing this up, but I'm on mobile right now...
  • For a word like "emporcar", maybe instead of "/ẽporka(ʁ)/ (careful pronunciation), /ĩporka(ʁ)/ (natural pronunciation)", you could make it show up as "/ẽporka(ʁ)/, /ĩ-/"? We do something like that for accent. Also, pay no mind to the umlauts; I'm on mobile and it won't let me use tildes on them.
    • A similar thing could go for... I think /kade^'adu/, where instead of /kade'adu/, /kadʒi'adu/, it could do /kade'adu/, /-dʒi-/.
      • ...I think I'm seeing a pattern actually. Could it be that before stressed vowels, E/O get reduced to I/U? Unless it's the same vowel as them of course. This includes nasals though: "coando" is "cuando". I'm not sure if this goes for everything, but I can't think of any exceptions rn.
  • Stressed oxytone O/Es next to Ls should be presumed open O/Es. Like, say, "anel" or "anzol". They shouldn't need to be specified; the default plural of stressed "-el" and "-ol" are "-éis", "-óis" in the pt-verb module for a reason.
  • We should be able to type something to get initial unstressed /des/, /dis/, /diz/ and /dez/ to be displayed as /d(ʒi)z/ or /d(ʒi)s/ according to what comes afterward (the module can already handle the z/s part, so the difference here would be in making sure the vowel goes from E/I to 'optional I').
  • This feels nitpicky, but imo the way we use graves in the module is a little weird. If they were swapped with the dot under vowels, we could make them able to read old spellings like òtimamente just fine. On letters w/o open-closed distinction (aka Is and Us), they could force a hiatus like "." preceding them does. This would allow them to read faìscazinha if need be, and if the dieresis accent could do the same thing, the same would go for saüdade.
    • The old graves have been partially replaced by the acute accent here too, which led protocooperação to have accidentally been left marked as some preproparoxytone for a while. So yeah, this change would really make it feel more... 'accurate', I guess?
  • A way to see words that have 'specified spellings' in order for us to notice patterns more clearly was suggested. Nvm if it's too hard to implement, but I thought it might be interesting. 2804:1B0:1903:FF5F:C4EA:573B:1EBF:6CEC 05:12, 17 November 2023 (UTC)Reply
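
As a sketch of the des-/dez- request in the list above, the choice between /d(ʒi)z/ and /d(ʒi)s/ could be made from the following letter. This is Python for illustration only, not module code. The function name, the letter-based voicing test and the VOICED_OR_VOWEL set are assumptions, and the caller is assumed to know already that the prefix is unstressed; a following silent 'h' is not handled.

```python
import unicodedata

# Letters treated as vowels or voiced consonants; an assumption for this sketch.
VOICED_OR_VOWEL = set("aeioubdgjlmnrvz")


def render_des_prefix(word):
    """Return the proposed IPA for an initial des-/dez-/dis-/diz-, or None.

    Hypothetical helper: 'd(ʒi)z' before a vowel or voiced consonant,
    'd(ʒi)s' otherwise. Stress is not checked here.
    """
    lowered = word.lower()
    if len(lowered) < 4 or lowered[:3] not in ("des", "dez", "dis", "diz"):
        return None
    # Strip diacritics so that accented vowels still count as vowels.
    following = unicodedata.normalize("NFD", lowered[3])[0]
    return "d(ʒi)z" if following in VOICED_OR_VOWEL else "d(ʒi)s"
```

So dezesseis and desgarrar would both start with /d(ʒi)z/, and desfazer with /d(ʒi)s/.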
I agree that South Brazil shouldn't be here. The differences are really, really predictable and there really aren't a lot of speakers for it. I find it odd that South Brazil is here but not Nordestino nor Angola Portuguese. ...I mean, I get why they're not here, but like, you're 100% correct. 2804:1B0:1903:FF5F:C4EA:573B:1EBF:6CEC 05:15, 17 November 2023 (UTC)Reply
@MedK1 There is no Northeastern Brazilian Portuguese or Angolan Portuguese because I don't have a good reference or description on the latter and the former seems to consist of a zillion different accents so I wouldn't know which one to use unless someone supplies a good description. I'll respond to your other comments in a bit. Yes, the module took a lot of work; you can see 81 FIXME issues listed at the top of the file, and I think they're all dealt with. As for grave accents, this convention was suggested by User:Sarilho1 based on usage in Priberam, to make it easier to note unreduced vowels in European Portuguese. Also when you have a chance can you clean up your comment to use standard IPA notation? I'm not sure what some of the symbols mean that you're using (e.g. the 3). Benwing2 (talk) 05:48, 17 November 2023 (UTC)Reply
Right, of course -- it's fixed up now (the 3 was a ʒ). And yeah, fair points, I can think of at least 3 different varieties of Northeastern Brazilian Portuguese. I believe @Stríðsdrengur is Northeastern though: perhaps he could supply exactly that? About Angolan Portuguese, the Wikipedia article seems to have a bunch of sources, but I can't tell you if they're very reliable; I've never actually interacted with somebody from Angola... MedK1 (talk) 12:30, 17 November 2023 (UTC)
@Benwing I found a few more things worth fixing:
  • Epenthetic "i" in Brazilian Portuguese messes up European Portuguese if used after "k"; see acne and técnico, where the words get divided as ".kn" instead of "k.n".
    • Honestly, the epenthesis should happen by default for 'foreign' clusters (read: anytime a syllable wants to end in a consonant that isn't S/R/semi-vowel), there are way more cases where it happens than cases where it doesn't, and the cases where it doesn't tend to explicitly be because the word is being used in a formal context and people actively watch out for that when they're speaking highly formally.
  • I asked for the module to assume open E/Os for unaccented words that end in L. It should also assume closed E/Os for words ending in R. That'd reduce the amount of written-up 'exceptions' by A LOT (most -er verbs would be good to go). Off the top of my head, the only "ér" word I can think of is mister, but even then, the module was asking us to specify it regardless so it's a win-win either way.
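To make the L/R default concrete, here is a rough Python sketch of the rule proposed above. It is only an illustration and not the module's code (the module itself is Lua); the function name `default_mid_vowel_quality` and the choice of which accent marks count as "accented" are assumptions made up for this example.

```python
import unicodedata

# Combining acute, grave, circumflex and tilde (assumed set of "accent marks").
ACCENT_MARKS = {"\u0301", "\u0300", "\u0302", "\u0303"}


def default_mid_vowel_quality(word):
    """Guess the default quality of a final-syllable E/O (hypothetical helper).

    Returns "open" for unaccented words ending in -el/-ol, "closed" for
    unaccented words ending in -er/-or, and None when the rule does not apply.
    """
    # Decompose so that accented letters become base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", word.lower())
    # Words spelled with an accent mark already specify stress and quality.
    if any(ch in ACCENT_MARKS for ch in decomposed):
        return None
    # The rule only concerns E/O directly before the final consonant.
    if len(decomposed) < 2 or decomposed[-2] not in "eo":
        return None
    if decomposed.endswith("l"):
        return "open"
    if decomposed.endswith("r"):
        return "closed"
    return None
```

Under this sketch papel and sol would default to open and comer and flor to closed, while mister would come out as closed and so would still need a respelling.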
Also, Stríðs gave me a reference (two videos by the same guy, he makes some good linguistics vids) for Northeastern Brazilian Portuguese, but neither of us had time to write up the list of elements for it. Since I wrote up the above, I might as well...:
  • No affrication of T/D before /i/.
  • End-of-syllable R is often transcribed as /h/.
  • Replacement of [v], [z] and [ʒ] with [ɦ] (the ɦ forms are alternate, informal versions and the 'standard' ones are used too)
  • Lack of vowel breaking for stressed i (dia is always /ˈdiɐ/ and never /ˈdij.ɐ/ or /dij.ja/, see my original comment at the start of the section for the j.j thing)
  • Vowel harmony (usually on verbs, the module could generalize it to "unaccented words ending in R" and editors could take care of any exceptions), where all E/O vowels preceding a stressed syllable will match the openness of the vowel in the stressed syllable.
    1. p/ɔ/dar
    2. c/o/rrer
    3. p/u/lir
    4. d/u/rmir
    5. v/i/stir
    6. s/ẽ/ntar
    7. s/ĩ/ntir
  • /s/ and /z/ become [ʃ] and [ʒ] before /t/ and /d, l, n/ respectively.
    1. ve[ʃ]tir, e[ʃ]trela
    2. e[s]paço
    3. o[ʒ] dois, o[ʃ] três, o[s] quatro
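The /s, z/ rule in the last bullet can be sketched in Python as a pair of substitutions. This is a hedged illustration only, not how the Lua module does it: the function name `palatalize_sibilants` is hypothetical, and it assumes the input is a phonemic string in which voicing assimilation has already applied (so /z/ is what appears before /d, l, n/) and in which t/d are not affricated.

```python
import re

# Syllable dots, spaces and stress marks may sit between the sibilant
# and the following consonant (assumed transcription conventions).
BOUNDARY = r"[.\sˈˌ]*"


def palatalize_sibilants(ipa):
    """Turn /s/ into [ʃ] before /t/ and /z/ into [ʒ] before /d, l, n/."""
    ipa = re.sub(r"s(?=" + BOUNDARY + r"t)", "ʃ", ipa)
    ipa = re.sub(r"z(?=" + BOUNDARY + r"[dln])", "ʒ", ipa)
    return ipa
```

With this sketch, "ves.ˈtiɾ" and "uz ˈdojs" are changed, while "es.ˈpa.su" and "us ˈkwa.tɾu" pass through untouched, which matches the three numbered examples.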
MedK1 (talk) 00:20, 28 November 2023 (UTC)
@MedK1 Thanks, I'll take a look. BTW my experience of Northeastern Portuguese is from spending time in Salvador, and they definitely do affricate ti/di there, but the non-affrication may occur elsewhere. Benwing2 (talk) 00:43, 28 November 2023 (UTC)
Ah, yes, that's for sure! The video does mention it's mostly for the area marked with a 7 in this here map. That's the area denominated "Nordestino" in Wikipedia, too. Salvador is marked under "3", Baiano. A bunch of the other features fluctuate a bit in the actual Northeast region too (you were 100% on the mark when you said it was a gajillion accents!), especially the /s,z/ thing. MedK1 (talk) 00:48, 28 November 2023 (UTC)

Extra glides after non-ã/an nasal vowels in Brazilian Portuguese[edit]

Wikipedia's Portuguese phonology page mentions how nasals are often realized as vowel+ȷ̃, vowel+ɰ̃, or vowel+w̃. It's some pretty accurate* information backed up by scientific papers and by how natives around me (plus literally me, too!) pronounce it, and yet, this module doesn't include those at all.

*: I'd actually write them as diphthongs, replacing ȷ̃,w̃ with ɪ̯̃,ʊ̯̃ respectively in narrow transcriptions (to match how we treat "tem") and prevent learners from mistakenly thinking there might be some allophonic schwa-like sound between, say, ȷ̃ and the next sound in the transcription; but either way, 90% accurately representing the glides is far better than not having them there at all.

So here are my suggestions about what to do:

  • bom = /ˈbõw̃/ [ˈbõ̞ʊ̯̃],[ˈbõ̞],[ˈbõ̞ɰ̯̃]
The diphthongized pronunciation is the standard/most-frequent one. It's inconsistent to write "tem" as /ˈtẽj̃/ with the glide but leave it out with "bom".
  • tempo = /ˈtẽ.pu/ [ˈtẽ̞ɪ̯̃pu],[ˈtẽ̞pu],[ˈtẽ̞ɰ̯̃pu]
The glide (j̃) is hidden in the broad transcription because it's not at the end. It's alright to omit the glide mid-word but at the end of an utterance, it just sounds off. Keeping it like this makes it so the broad transcription matches European Portuguese, too, so that's pretty neat, right?
Maybe the second one should be [ˈtẽ̞ᵐpu] with the little M?
  • cinza = /ˈsĩ(j̃).zɐ/ [ˈsĩɪ̯̃zɐ],[ˈsĩzɐ],[ˈsĩɰ̯̃zɐ]
Maybe the second one should be [ˈsĩⁿzɐ] with the little N?
  • juntos = /ˈʒũ.tus/ [ˈʒũʊ̯̃tus],[ˈʒũtus],[ˈʒũɰ̯̃tus]
The omitted glide (in the broad transcription) here is w̃.
  • bonde = /ˈbõ.d͡ʒi/ [ˈbõ̞ʊ̯̃d͡ʒi],[ˈbõ̞d͡ʒi],[ˈbõ̞ɰ̯̃d͡ʒi]

The ones with ã/an stay unchanged as there's no glide after it.

Alternatively, maybe the narrow transcriptions could be represented as

  • [ˈtẽ̞(ɪ̯̃/ɰ̯̃)pu]
  • [ˈsĩ(ɪ̯̃/ɰ̯̃)zɐ]
  • [ˈʒũ(ʊ̯̃/ɰ̯̃)tus]

That'd make it way more concise.
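For anyone wondering whether the concise form loses information: it can be expanded back into the long lists mechanically. Below is a small Python sketch under the assumption that "(a/b)" means "a, or nothing, or b"; the function name `expand_optional` is made up, and plain parentheses without a slash, such as (ʁ), are deliberately left alone.

```python
import itertools
import re

# A parenthesized group that contains at least one "/" separator.
GROUP = re.compile(r"\(([^()/]+(?:/[^()/]+)+)\)")


def expand_optional(transcription):
    """Expand every "(a/b)" group into: a, omitted, b (hypothetical helper)."""
    # re.split with one capture group alternates literal text (even indexes)
    # and group contents (odd indexes).
    parts = GROUP.split(transcription)
    choices = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            choices.append([part])
        else:
            first, *rest = part.split("/")
            # Same order as the long lists above: first glide, no glide, other glide.
            choices.append([first, "", *rest])
    return ["".join(combo) for combo in itertools.product(*choices)]
```

So one concise entry with a single group stands for exactly three narrow transcriptions, and an entry with two groups would stand for nine.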

It's worth noting that, in my experience, "ɰ̯̃" (instead of ɪ̯̃) tends to be WAY more frequent in Rio de Janeiro. Although I've heard people from other places use it too, I swear 9 in 10 people from outside Rio use ɪ̯̃ and 9 in 10 people from Rio use ɰ̯̃... This is pretty anecdotal though so probably not worth making it a regional thing. Unless there's some sort of paper on the matter? I haven't found any. MedK1 (talk) 01:45, 18 October 2023 (UTC)

I really think the idea of adding "/" could be useful as otherwise, this and the previous section could lead to terms like "enjoar" becoming a little tricky. Compare:
  • /ẽ.ʒoˈa(ʁ)/,/ĩ.ʒoˈa(ʁ)/,/ẽ.ʒuˈa(ʁ)/,/ĩ.ʒuˈa(ʁ)/ [ẽ̞ɪ̯̃ʒoˈa(ʁ)],[ẽ̞ʒoˈa(ʁ)],[ẽ̞ɰ̯̃ʒoˈa(ʁ)],[ĩɪ̯̃ʒoˈa(ʁ)],[ĩʒoˈa(ʁ)],[ĩɰ̯̃ʒoˈa(ʁ)],[ẽ̞ɪ̯̃ʒuˈa(ʁ)],[ẽ̞ʒuˈa(ʁ)],[ẽ̞ɰ̯̃ʒuˈa(ʁ)],[ĩɪ̯̃ʒuˈa(ʁ)],[ĩʒuˈa(ʁ)],[ĩɰ̯̃ʒuˈa(ʁ)] (And this is without including the 'natural pronunciation' bit)
  • /ẽ.ʒoˈa(ʁ)/,/ĩ.ʒoˈa(ʁ)/,/ẽ.ʒuˈa(ʁ)/,/ĩ.ʒuˈa(ʁ)/ [ẽ̞(ɪ̯̃/ɰ̯̃)ʒoˈa(ʁ)],[ĩ(ɪ̯̃/ɰ̯̃)ʒoˈa(ʁ)],[ẽ̞(ɪ̯̃/ɰ̯̃)ʒuˈa(ʁ)],[ĩ(ɪ̯̃/ɰ̯̃)ʒuˈa(ʁ)]
Ideally, there'd be a way to note 'obligatorily pick one between these' in IPA. That'd make things extremely simple. If we use, say, {} for that...
  • /{ẽ/ĩ}.ʒ{o/u}ˈa(ʁ)/ [{ẽ̞/ĩ}(ɪ̯̃/ɰ̯̃)ʒ{o/u}ˈa(ʁ)].
I wonder if IPA possesses something like that. It HAS to, right? MedK1 (talk) 02:00, 18 October 2023 (UTC)
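Whatever the notation ends up being, the curly-brace idea is easy to expand by machine. Here is a minimal Python sketch, assuming "{a/b}" means "pick exactly one of a or b"; the brace syntax is only the ad hoc suggestion from this thread and `expand_choices` is a hypothetical name. Parenthesized segments such as (ʁ) are not touched.

```python
import itertools
import re

# A brace group such as "{o/u}"; the slash separates the alternatives.
CHOICE = re.compile(r"\{([^{}]+)\}")


def expand_choices(transcription):
    """Expand every "{a/b}" group into one string per alternative."""
    # Even indexes are literal text, odd indexes are brace-group contents.
    parts = CHOICE.split(transcription)
    options = [
        part.split("/") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*options)]
```

Applied to the single broad transcription with two brace groups, this yields four strings, the same count as the four separate broad transcriptions listed for enjoar.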
IMO this is way too much info; this would be impossible for a learner to make sense of. Benwing2 (talk) 02:55, 17 November 2023 (UTC)