Module talk:fa-IPA

From Wiktionary, the free dictionary

@Fenakhay Is this module ready to replace {{fa-IPA}}? I could do a bot run to replace it if so. —AryamanA (मुझसे बात करेंयोगदान) 15:42, 24 March 2021 (UTC)Reply

Pls add the functionality to suppress a variety or explain how

Pls add the functionality to suppress a variety or explain how. E.g. at کره جنوبی no classical Persian should exist. Anatoli T. (обсудить/вклад) 13:49, 15 March 2023 (UTC)Reply

@Atitarev The style= param is supposed to do that but it may be broken. Benwing2 (talk) 07:23, 18 March 2023 (UTC)Reply
@Benwing2, @Atitarev See my edit. It is already possible.—Saranamd (talk) 07:29, 18 March 2023 (UTC)Reply
@Saranamd OK, but that doesn't really do what Anatoli wants. You are just manually giving the spelling for all supported variants. What if e.g. they're all the same but Classical should be omitted? The style= param in the original code that this module was copied from does this (e.g. |style=-cls), but I tried to use it in this module and it appears broken. Benwing2 (talk) 07:34, 18 March 2023 (UTC)Reply
@Benwing2, Saranamd: That's right. There should be more control for contributors over which varieties to include. Automation based on Classical Persian transliteration is good, but it may not be correct to include ALL of the varieties if they don't exist or are unknown. There are examples less obvious than کرهٔ جنوبی (kore-ye jonubi). Anatoli T. (обсудить/вклад) 08:21, 20 March 2023 (UTC)

unnecessary stress mark

(Notifying Ariamihr, Atitarev, Dijan, Mazsch, Qehath, ZxxZxxZ, Sameerhameedy, Saranamd): @Fenakhay Why does the stress need to be specified manually in this module? Per w:Persian phonology, the stress is almost always word-final. We can default it to word-final and only require a stress mark in the exceptional cases. (Apologies for the dual ping, I moved this to Module talk:fa-IPA.)

I also have a question about Tehrani and Kabuli pronunciation. The old {{fa-IPA}} generates these pronunciations automatically if |teh=1 (for Tehrani) or |kab=1 (for Kabuli) is set. The new module requires that Tehrani and Kabuli pronunciations be spelled out phonetically to be included, which is more awkward and prevents me auto-converting the old template to the new one. Is there a reason for this? Are Tehrani and Kabuli pronunciations too unpredictable to be generated automatically, or is this just an implementation gap? Benwing2 (talk) 07:28, 18 March 2023 (UTC)Reply

@Benwing2 Stress is predictable for most lemma entries, yes. The main cases of non-final stress are found in inflected non-lemma verbs.
Tehrani pronunciation cannot be generated automatically because the rules are applied unpredictably depending on various factors (how common or colloquial the word is, whether the word is or is perceived as a loan, etc.).
The current IPA module is flawed overall and in dire need of phonetic IPA. I have written down all the necessary rules (which are not very many) in User:Saranamd/Persian phonetic IPA, but this has not been coded into the module yet. Saranamd (talk) 07:37, 18 March 2023 (UTC)Reply
@Saranamd I can probably work on this sometime soon. Can you let me know what the phonemic flaws are in the current module? Also I took a look at User:Saranamd/Persian phonetic IPA. The Wikipedia w:Persian phonology article claims that syllable-final vowels are *ALL* long, including "short" a e o, and sources it to [1]. This is not mentioned in your page, is this an omission on your part or a mistake in the Wikipedia article? Benwing2 (talk) 07:44, 18 March 2023 (UTC)Reply
The main issue with Iranian phonemic IPA isn’t factual inaccuracy per se, but that it is potentially misleading for learners because it misses key phonetic features like the initial glottal stop, the assimilation of nasals, the fact that historical /k/ is much more commonly pronounced as [c], etc. A naive reader would pronounce many if not most words wrongly just going by the phonemic IPA.
The Classical phonemic IPA is factually problematic for other reasons, e.g. glottal stops were not pronounced in ninth-century Early New Persian other than as a non-phonemic feature word-initially and assumed phonemic status only under later Arabic influence.
I do not know enough about Dari to say anything substantial.
The vowel length change is somewhat controversial. The article cited by Wikipedia mentions an erasure of quantity distinctions in closed syllables and cites Sokolova as the source. But the Oxford Handbook of Persian Linguistics says:
Early acoustic investigations had shown that the length distinction between the two sets has disappeared in Standard Modern Persian except in open unstressed syllables (Sokolova et al. 1952; Hodge 1957; Mohammadova 1974). However, recent work on the duration of vowels in Persian has shown that the duration distinction is also present in closed stressed syllables (Modarresi Ghavami 2015 [1393]). Table 4.6 shows the duration of vowels in closed stressed syllables. As the numbers indicate, the long vowels [i,u,ɑ] are consistently longer than [e,o,a] respectively.
Hence it might be safer (and would be in line with the Handbook’s implicit conclusion) to omit it.
Not sure about Kabuli, but the Tehrani IPA does represent common Tehrani pronunciation shifts. The issue is that these are not consistently applied to all words, so they should be turned off by default.—Saranamd (talk) 08:08, 18 March 2023 (UTC)Reply
@Saranamd What do the auto-generated Tehrani and Kabuli pronunciations that the old templates can generate correspond to? You can see the definitions here Template:fa-IPA-teh and here Template:fa-IPA-kab. Thanks! Benwing2 (talk) 07:47, 18 March 2023 (UTC)Reply
@Saranamd: Thanks. I guess the main question I have about Tehrani pronunciation is can we add a feature |teh=1 that adds those Tehrani features to the words that have them and leaves them out otherwise, or are they applied so inconsistently that we have to spell absolutely everything out? Benwing2 (talk) 21:03, 18 March 2023 (UTC)Reply
BTW can you help me with automating syllable division in Persian words? I would like to implement auto-stressing of the final syllable (which can be overridden) but in that case I need to know how to divide the consonants preceding the stressed vowel. I can make some guesses based on existing words but better would be if you could spell out the rules. Thanks! Benwing2 (talk) 21:06, 18 March 2023 (UTC)Reply
|teh=1 sounds good.
Persian syllable structure is very simple, just CV(C)(C). All Persian syllables must begin with a consonant, with /j/ as hiatus breaker. تلویزیون (televiziyon) had a wrong IPA in this regard. Saranamd (talk) 05:54, 19 March 2023 (UTC)Reply
@Benwing2 Ping.—Saranamd (talk) 05:54, 19 March 2023 (UTC)Reply
@Saranamd Thanks. I guess what I meant is, how do I know how to split a cluster of consonants? In Romance languages, for example, Cl and Cr are normally kept together; I take it that isn't the case in Persian? So CC always splits as C.C, and CCC always splits as CC.C? Also, it is good to know that you have to put y between vowels; User:Atitarev and I weren't sure about that. Another issue we weren't sure of is translit like reyâz or riyâz for رياض (Saudi Arabian capital). Can both eyV and iyV exist in Persian and if so how do you know which one to use? (And how to indicate them differently when vocalizing?) Benwing2 (talk) 06:20, 19 March 2023 (UTC)Reply
@Benwing2 Yes, CC always splits as C.C because syllable-initial clusters are forbidden.
Both iyV and eyV are possible, but the latter I think mainly across morpheme boundaries, like نیستان (neyestân, reedbed) from نی (ney). But ریاض should be riyâz.
One thing I missed regarding vowels is that Classical /ajj/ is still realized as /aj(j)/, not /ej(j)/. This only affects Arabic loans like حی (hayy).—Saranamd (talk) 06:52, 19 March 2023 (UTC)Reply
@Benwing2 But note that this is not universal, e.g. سید is normally pronounced seyyed and not sayyed (dated reading). Thinking about it, it seems that only a few Arabic loans resist the shift.—Saranamd (talk) 07:01, 19 March 2023 (UTC)
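For reference, a minimal Python sketch of the splitting rule described above (CV(C)(C) syllables, so a cluster between vowels gives only its last consonant to the following syllable). This is not the module's actual Lua code; the function name `syllabify`, the vowel inventory, and the one-character-per-phoneme transliteration are assumptions made for illustration.

```python
# Hypothetical helper, not part of Module:fa-IPA.
# Assumes each character of the transliteration is exactly one phoneme.
VOWELS = set("aeiouâ")


def syllabify(word):
    """Split a transliterated word into CV(C)(C) syllables."""
    vowel_idx = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not vowel_idx:
        return [word]
    starts = [0]  # the first syllable always starts at the beginning
    for prev, cur in zip(vowel_idx, vowel_idx[1:]):
        # The onset is the single consonant right before the vowel, if any;
        # all earlier consonants stay in the previous syllable's coda.
        starts.append(cur - 1 if cur - 1 > prev else cur)
    ends = starts[1:] + [len(word)]
    return [word[s:e] for s, e in zip(starts, ends)]


print(".".join(syllabify("šotormorğ")))
print(".".join(syllabify("televizyon")))
```

With this rule šotormorğ comes out as šo.tor.morğ and televizyon as te.le.viz.yon. The sketch does not insert the hiatus-breaking /j/; adjacent vowels are simply split between two syllables.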
@Saranamd Thanks. I am working on Module:fa-IPA now to add various features and fix bugs. I'll probably add the phonetic form in the process (for Iranian Persian only; it seems we'd need a totally different set of rules for Tajik, Dari and Classical). Benwing2 (talk) 07:07, 19 March 2023 (UTC)Reply
@Saranamd Also is there a good reference for Persian pronunciation of written words, including the vowels? E.g. an online-accessible monolingual dictionary or something like that (I can't read Persian but I will be able to figure out the pronunciation if given in any reasonable system). Benwing2 (talk) 07:08, 19 March 2023 (UTC)Reply
TBH I think Hayyim’s dictionary is the best, although it is missing many modern words and the pronunciation can be dated (giving sayyed for سید for example).
The most exhaustive monolingual dictionary is Dehkhoda, but unfortunately its pronunciation guidelines (which rely on the short vowel diacritics) leave much to be desired. As mentioned, حی (hayy) and حیوان (heyvân) should have different vowels (mentioned explicitly in the Oxford Handbook) but both are marked indiscriminately as حَیّ and حَیْوان.
By contrast, نیستان is given indiscriminately as نِیِستان (neyestân), نَیِستان (nayestân), نَیَستان (nayastân), نَیسِتان (naysetân), and نِیسِتان (neysetân). The -setân forms are not normal (they may be used to fit classical poetic meters), and there seems to be no clear indication that the normal pronunciations are only the first two.
Dehkhoda could be useful for modern words not included in Hayyim. For example, I checked تلویزیون and it gives تِلِویزیُن, which shows that both my televiziyon and the previous televizion were wrong and the actual translit should be four-syllable televizyon. Saranamd (talk) 07:38, 19 March 2023 (UTC)Reply
@Benwing2 And thank you for adding the phonetic forms.—Saranamd (talk) 07:41, 19 March 2023 (UTC)Reply
@Saranamd Trying to understand how the two dictionaries work. It looks like Hayyim indiscriminately writes short a and long â as a; this is not good if so. Dehkhoda seems to leave out certain consonants, e.g. for تلویزیون the pronunciation appears to be given as [تِ لِ یُنْ], is that correct? Are you supposed to infer the parts that are missing? Benwing2 (talk) 08:15, 19 March 2023 (UTC)
@Benwing2
Hayyim does write both as a, but that is because â and a are easily distinguishable based on the Arabic script.
If not marked in Dehkhoda, the default assumption is that the vowel letters are pronounced as long vowels and that the consonants are not followed by any short vowels.
شترمرغ (šotormorğ) is given as [ شُ تُ مُ ], but this implies شُتُرْمُرْغ. They will mark the sokun only when they think it is ambiguous/not correctly inferrable otherwise.—Saranamd (talk) 08:35, 19 March 2023 (UTC)Reply
@Saranamd, Atitarev A question about word stress: Can you give me the list of unstressable words in Persian? This might consist of articles (if Persian has any), determiners, pronouns, conjunctions and/or prepositions, especially if monosyllabic. The idea is that the remaining monosyllabic words will be marked with stress. This is so that the pronunciation of phrases can be correctly indicated. If Persian has any sandhi rules, please let me know as well. Thanks! Benwing2 (talk) 01:56, 20 March 2023 (UTC)Reply
@Benwing2: In my limited observation, one-syllable prepositions don't take stress. Full word pronouns do have a stress. The enclitic pronouns (suffixes) don't take the stress. There are no articles.
Negative ن takes the stress نرفتم (ná-raftam) (I didn't go).
Verbs are a bit complicated, but some current entries already show the correct stress.
The abbreviated one-syllable copulas seem unstressed, e.g است (ast). Anatoli T. (обсудить/вклад) 03:06, 20 March 2023 (UTC)Reply

@Atitarev Can you help me make a list of the copulas and one-syllable prepositions? Apologies, I know little of Persian. Benwing2 (talk) 03:47, 20 March 2023 (UTC)

@Benwing2: We can both start by looking at Category:Persian prepositions (many will have ezâfe, making them longer than one syllable).
I am a bit worried that I don't have a good knowledge of the pronunciation either and may give you some wrong information. The online resources seem scarce as well. Let's see if native speakers become more active.
Making a phonemic module with a manual accent with ` (as is done currently) seems straightforward, though. Anatoli T. (обсудить/вклад) 03:57, 20 March 2023 (UTC)Reply
@Atitarev OK. There aren't very many (if any) phrases currently being run through {{fa-IPA}} so it's less of an issue right now. My plan is to make word stress in multisyllabic words default to word-final, requiring a manual accent only if it's elsewhere. Benwing2 (talk) 04:07, 20 March 2023 (UTC)Reply
@Benwing2: It is usually on the last syllable, but it may or may not shift in certain circumstances: ezâfe doesn't change the main stress, but plural markers do. A good reference would help. I've got bits and pieces, but they may not be complete; the book I sent you has information on stress as well.
The current verb conjugation templates seem good on stresses and match my sources. Anatoli T. (обсудить/вклад) 04:18, 20 March 2023 (UTC)Reply
@Benwing2, @Atitarev:
Excluding inflected verbs (irrelevant because the lemmatized infinitive has word-final stress) and enclitic-attached nouns, stress is almost always word-final. It can be automated.
There are only twenty exceptional cases of word-initial stress, again per the Oxford Handbook:
  • آری (âri), بلی (bali), بله (bale). All formal interjections meaning “yes”. Note that colloquial آره (âre, yeah) has normal word-final stress.
  • نخیر (naxeyr). Formal word meaning “no.”
  • ولی (vali), اما (ammâ), بلکه (balke), لاکن (lâken). All conjunctions meaning “but; however”.
  • شاید (šâyad, maybe).
  • آیا (âyâ, formal particle used for polar questions).
  • همین (hamin, this very) and همان (hamân, that very).
  • حتی (hattâ, even).
  • کاشکه (kâške, I wish...) and variant کاشکی (kâški)
  • ماشاءالله (mâšâllâh, the mashallah)
  • خیلی (xeyli, a lot of), etymologically noun خیل (xeyl, cavalry army) + unstressed enclitic
  • اگر (agar, if) and etymological antonym مگر (magar, unless)
  • گویا (guyâ, as if), originally an inflected verb
  • بعضی (ba’zi) and برخی (barxi), both meaning “some” and etymologically involving unstressed enclitics
  • چرا (čerâ, why), see below
  • زیرا (zirâ, because), see below
  • مرسی (mersi, thanks), French loanword with unexplained initial stress
  • چونکه (čonke, because), etymologically a compound of two conjunctions
  • آفرین (âfarin) and باریکلا (bârikalâ), interjections meaning “bravo”
  • هرچند (harčand, even though)
  • وقتی (vaqti, when)
The Handbook says these are “the full list of these words with stress on the first syllable”, presumably in modern Iranian. (The obsolete verbal particle همی (hamē) should also have word-initial stress IIRC, but this word is no longer used outside classicizing poetry.)
Note that in Persian phonology, the plural and comparative suffixes behave as derivational rather than inflectional suffixes and accordingly take the word-final stress.
IIRC there are only six true (non-ezafe) one-syllable prepositions. These are به (be), از (az), با (bâ), تا (tâ), در (dar), بر (bar). All the rest are theoretically nouns with ezafe, e.g. روی (ru-ye, on, literally on the face of), and behave accordingly.
The particle را (râ) also does not take stress, which etymologically explains the initial stress of چرا (čerâ) and زیرا (zirâ) above.—Saranamd (talk) 09:57, 23 March 2023 (UTC)
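To make the proposal concrete, here is a rough Python sketch of default word-final stress with an exception list, as discussed above. It is only an illustration, not the module's code: the names `INITIAL_STRESS`, `UNSTRESSED` and `add_stress` are made up, both sets are deliberately partial (a few of the words listed above), and the input is assumed to be already syllabified.

```python
# Hypothetical sketch; the sets below are partial and only illustrative.
INITIAL_STRESS = {
    "âri", "bali", "bale", "vali", "ammâ", "balke", "šâyad",
    "hamin", "hattâ", "xeyli", "agar", "magar", "čerâ", "zirâ",
    "mersi", "vaqti",
}
UNSTRESSED = {"be", "az", "dar", "bar"}  # one-syllable prepositions


def add_stress(syllables):
    """Return the word with an IPA stress mark before the stressed syllable."""
    word = "".join(syllables)
    if word in UNSTRESSED:
        return ".".join(syllables)  # no stress mark at all
    # Word-initial stress for the listed exceptions, word-final otherwise.
    index = 0 if word in INITIAL_STRESS else len(syllables) - 1
    marked = list(syllables)
    marked[index] = "ˈ" + marked[index]
    return ".".join(marked)


print(add_stress(["ke", "tâb"]))
print(add_stress(["va", "li"]))
print(add_stress(["dar"]))
```

A manual stress mark in the respelling would still have to take priority over this default (for inflected verbs and enclitic-attached nouns); that override is not shown here.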
@Saranamd, Benwing2: Thank you for all these examples. I have a few resources (I can share) with information on stress. It's interesting how some titles and names can shift the stress in the vocative (when calling). I have already edited خانم and آقا on this.
@Saranamd, are you sure about ماشاءالله? Where is the stress? At Forvo, a Farsi speaker stressed the last syllable. Anatoli T. (обсудить/вклад) 23:06, 23 March 2023 (UTC)Reply
@Saranamd Thank you! Benwing2 (talk) 23:19, 23 March 2023 (UTC)Reply
@Atitarev Yes, mashallah is explicitly mentioned as having word-initial stress in the Oxford Handbook.
All nouns and proper nouns in the vocative have initial stress, not just some. This is a grammatical rule and should probably not be included.—Saranamd (talk) 08:35, 24 March 2023 (UTC)Reply

multiple stresses

(Notifying Ariamihr, Atitarev, Dijan, Mazsch, Qehath, ZxxZxxZ, Sameerhameedy, Saranamd): @Fenakhay Several words in {{fa-IPA}} are given multiple stresses, sometimes on every syllable (as with تلویزیون, spelled {{fa-IPA|ir=ti`li`wī`zī`un|prs=tal`wī`zyūn}}). Is this really correct? Benwing2 (talk) 07:31, 18 March 2023 (UTC)Reply

@Benwing2: This has been addressed in an edit. Just adding a response, so everybody knows it's closed. Anatoli T. (обсудить/вклад) 02:24, 21 March 2023 (UTC)Reply

سیاره and رادیو

{{fa-IPA|ir=rādi`yo}} produces /ɾɒːdeˈjo/. I was expecting /ɾɒːdiˈjo/

It works correctly with {{fa-IPA|rūsiya}}, which gives /ɾuːsije/ for Iranian.

{{fa-IPA|sayyā`ra}} gives /sejjɒːˈɾe/ for Iranian. I was expecting /sajjɒːˈɾe/ or /sæjjɒːˈɾe/. Anatoli T. (обсудить/вклад) 23:43, 20 March 2023 (UTC)Reply

@Atitarev Not sure about rūsiya vs. rādiyo, but sayyā`ra appears intentional; written 'ay' not before a vowel is transformed into 'ey'. Is this wrong? User:Saranamd can you comment? Benwing2 (talk) 02:29, 21 March 2023 (UTC)Reply
@Benwing2: Ah, thanks for pointing out what causes it. It's a case with a diphthong ey/ay on my talk page:
It seems a different case with سیاره (sayyâre), رادیو (râdiyo) or اسپانیا (espâniyâ). They are not considered "ey" + vowel. I checked forvo recordings by native speakers and previous edits. They don't have "e" before "y" in the translit or IPA. Pretty sure it's wrong in this case. Anatoli T. (обсудить/вклад) 02:38, 21 March 2023 (UTC)Reply
@Atitarev OK, that should be easy to fix. I am rewriting Module:fa-IPA so I'll include this. Benwing2 (talk) 03:20, 21 March 2023 (UTC)Reply
What about sayyid -> seyyed though? I think User:Saranamd mentioned above that some Arabic loanwords have eyy and some have ayy. If that's the case, we need to make ayy -> ayy by default and add an overriding Iranian-specific respelling for the ayy -> eyy cases (or potentially vice-versa). Benwing2 (talk) 03:22, 21 March 2023 (UTC)Reply
I think سید should be probably treated separately, as there are several variations in Iranian Persian at least: sayyed (very formal), seyd (informal), seyyed (both formal and informal) --Z 14:16, 30 March 2023 (UTC)Reply
Well if they are written as saiyā`ra or rādīyu in the fa-IPA template, they will be converted into the Iranian IPA as sæijɒː're and rɒːdiːyo. Alternatively, if we preemptively convert a -> æ and put it in as sæyyā`ra, the template will convert it to sæjjɒː're. Sameerhameedy (talk) 03:29, 21 March 2023 (UTC)Reply
@Sameerhameedy, Benwing2: Thank you! So سیاره is fixed. Is a long "ī" correct in rɒːdiːyo? Anatoli T. (обсудить/вклад) 03:38, 21 March 2023 (UTC)Reply
@Sameerhameedy: Hi. Just repeating my question re long "ī" before "y" in رادیو‎, اسپانیا‎ or روسیه. Is "i" really long here or the module just can't handle such scenarios?
(Notifying Ariamihr, Benwing2, Dijan, Mazsch, Qehath, Rodrigo5260, ZxxZxxZ, Sameerhameedy, Saranamd): I've got a new question. How do you add a long "ô" ("ō" in the template) to make it [oː] or does it never-ever occur in modern Iranian? Anatoli T. (обсудить/вклад) 03:22, 30 March 2023 (UTC)Reply
/i/ is short before /j/. User:Dick Laurent once told me he hears it to be more like [ɪ] rather than [i] when it comes before [j], and I think he was right, particularly when the word is not pronounced emphatically. I'm a native speaker of Iranian Persian and can upload recordings of some words if that helps. --Z 14:27, 30 March 2023 (UTC)Reply
It does occur in Iranian Persian, most importantly as the most common pronunciation of the ـَوْ aw/ow diphthong. I think it is found in loanwords, too, such as توکیو, where the final o could be pronounced longer. --Z 14:20, 30 March 2023 (UTC)
@Atitarev mine was a suggestion based on the limitations of the template. Afaik Iranian Persian doesn't typically distinguish [ɪ] from [e]/[i] or [i] from [iː] (which is why the template doesn't include them) so I thought it would be fine. Maybe ZxxZxxZ would be better to ask about that though. Also from what I know ô/ō and ê/ē both occur in Iranian Persian but usually as allophones of ow/aw and ey/ay, and not as distinct vowel phonemes like they do in Dari Persian. Sameerhameedy (talk) 17:16, 30 March 2023 (UTC)
Thank you all! Based on the above, it seems the pronunciation module needs to have a way to force modern Iranian Persian pronunciations literally, i.e. when "i" means a short [i], not [e] (typically before [y]), also long vowels ô/ō and ê/ē to be taken literally, not converted to [uː] or [iː]. Same for diphthongs "ay" and "aw" (if the latter ("aw") occurs as well in modern Iranian).
A long [oː] would be more appropriate for loanwords, such as توکیو (tokyô), not allophones as چطور (četowr).
@Benwing2: will you be able to find a way to force this? E.g. using such letters in quotes or similar? E.g. {{fa-IPA|"say"yid}} (/sajjed/ as an older/alternative reading of /sejjed/ سید) Anatoli T. (обсудить/вклад) 22:32, 30 March 2023 (UTC)Reply
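One possible shape for such an escape, as a rough Python sketch. The quote syntax is only the hypothetical one suggested above, not an existing feature, and the function name `iranianize` and the vowel set are assumptions. The sketch covers just the ay > ey rule ("ay" not before a vowel); anything inside double quotes is passed through untouched.

```python
import re

# "ay" not followed by a vowel; the vowel set is an assumption.
AY_NOT_BEFORE_VOWEL = re.compile(r"ay(?![aeiouâāīūēō])")


def iranianize(respelling):
    """Apply ay > ey outside double quotes; keep quoted text literally."""
    parts = respelling.split('"')
    out = []
    for n, part in enumerate(parts):
        if n % 2 == 1:
            out.append(part)  # odd-numbered pieces were inside quotes
        else:
            out.append(AY_NOT_BEFORE_VOWEL.sub("ey", part))
    return "".join(out)


print(iranianize("sayyid"))
print(iranianize('"say"yid'))
```

Here sayyid becomes seyyid by default, while "say"yid keeps its ay. One known gap: an unquoted piece ending in ay directly before a quoted vowel is still converted, because the lookahead only sees the unquoted piece.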

ɢ

The module currently renders ق in non-initial position as /ɢ/ in Iranian Persian, whereas it is actually [ɣ], as in other accents, and [ɢ] only in initial position. I think the phonemic representation should be /ɣ/. Z 14:55, 3 April 2023 (UTC)

@ZxxZxxZ, Benwing2: Should it be /ɣ/ in the non-initial position only? Anatoli T. (обсудить/вклад) 23:15, 3 April 2023 (UTC)Reply
@ZxxZxxZ, Atitarev Wikipedia says this:
In Classical Persian, the uvular consonants غ and ق denoted the original Arabic phonemes, the fricative [ʁ] and the plosive [q], respectively. In modern Tehrani Persian (which is used in the Iranian mass media, both colloquial and standard), there is no difference in the pronunciation of غ and ق. The actual realisation is usually that of a voiced stop [ɢ], but a voiced fricative [ɣ]~[ʁ] is common intervocalically. The classic pronunciations of غ and ق are preserved in the eastern varieties, Dari and Tajiki, as well as in the southern varieties (e.g. Zoroastrian Dari language and other Central / Central Plateau or Kermanic languages). Benwing2 (talk) 23:37, 3 April 2023 (UTC)
@Benwing2, ZxxZxxZ: Thanks. Based on this, what's the outcome? Leave as is or use the phonemic and make Iranian and Tehrani the same as other varieties? Anatoli T. (обсудить/вклад) 13:11, 4 April 2023 (UTC)Reply
I'm not sure about this part: "The actual realisation is usually that of a voiced stop [ɢ], but a voiced fricative [ɣ]~[ʁ] is common intervocalically"
In "standard" Iranian Persian and the Tehran accent, [ɢ] is used in initial position only, and using it in all positions is one of the features of the Azeri accent of Persian (لهجه ترکی). Indeed, this relatively recent transition from [ɣ] to [ɢ] in Persian was under the influence of the Azeri language.
If I'm not mistaken, this is only a phonetic, not phonemic difference by definition, because there is no difference in meaning. Yet we are labeling these IPA representations as "Iranian Persian" and "Tehrani", so maybe we should represent the differences among accents accurately. Z 13:15, 4 April 2023 (UTC)Reply
@ZxxZxxZ: Thanks, this makes sense. I am sure when @Benwing2 gets around to work on this module, this can be done. BTW, did you try to challenge the info on WP? I'm getting used to the sounds but I haven't been exposed enough.
BTW, you can add cases for both [ɢ] and [ɣ] under test_iranian_persian (or others for contrasting) in Module:fa-IPA/testcases. You can always add a comment for what you're testing, e.g. -- غ and ق in a non-initial position, etc. That way, it won't be completely forgotten and can be reviewed. Anatoli T. (обсудить/вклад) 13:45, 4 April 2023 (UTC)
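As an illustration of the distribution described above ([ɢ] word-initially, [ɣ] elsewhere), a small Python sketch with checks in the spirit of the testcases page. It is not the module's Lua code; `realize_gh` is a made-up name, and it assumes that غ and ق are both written ğ in the Iranian transliteration and that only this one segment is converted.

```python
def realize_gh(translit):
    """Replace ğ with [ɢ] word-initially and [ɣ] in any other position."""
    out = []
    for i, ch in enumerate(translit):
        if ch == "ğ":
            out.append("ɢ" if i == 0 else "ɣ")
        else:
            out.append(ch)  # every other segment is left as transliterated
    return "".join(out)


# غ and ق in initial vs. non-initial position
assert realize_gh("ğafase") == "ɢafase"
assert realize_gh("šotormorğ") == "šotormorɣ"
```

If the Wikipedia description turns out to be the better one, the same function could instead test for a preceding and following vowel rather than for word-initial position.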

Pitch accent

I think for example wa`tan should create /watán/, putting a diacritic on the next vowel, since Persian has a pitch accent? I also think being able to indicate syllables would be helpful, unless there is some reason that is not possible. I think readings like /dʊ.ʔɑ́ː/, /sɑː.ʔát/, /baʔd/, /maw.qɪ́ʔ/ are easier to understand but I may be wrong.

Also is it possible to make it so that if text entered is surrounded by [ ] (something like this: [text]), to not convert it? سَمِیر | sameer (talk) 21:01, 16 July 2023 (UTC)Reply
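A rough Python sketch of the first suggestion, moving the backtick stress marker onto the next vowel as an acute accent. This only illustrates the idea and is not how the module works; `backtick_to_acute` and the vowel set are assumptions.

```python
import unicodedata

VOWELS = set("aeiouāīūēō")  # assumed vowel letters of the respelling


def backtick_to_acute(respelling):
    """Drop each backtick and put an acute accent on the next vowel."""
    out = []
    pending = False
    for ch in respelling:
        if ch == "`":
            pending = True  # remember that the next vowel carries the accent
            continue
        out.append(ch)
        if pending and ch in VOWELS:
            out.append("\u0301")  # combining acute accent
            pending = False
    # Compose base letter + accent where a precomposed character exists.
    return unicodedata.normalize("NFC", "".join(out))


print(backtick_to_acute("wa`tan"))
```

For wa`tan this prints watán. Long vowels with a macron have no precomposed accented form, so for those the combining accent stays as a separate character after normalization.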

Updates

@Atitarev you said to continue the conversation here, but on the issue of Vocalization:

I am probably going to keep sukoon for both of them, since both pronunciation spellings are intended to be detailed. I was thinking about omitting diacritics for the Iranian vocalization that are only used for extremely detailed vocalization, and labeling it as "Iranian (simplified)", but I decided that might imply Iran legally deprecated some diacritics (they didn't; afaik neither Afghanistan nor Iran has ever standardized diacritics), so I decided to use the most detailed vocalizations in both.

I think vocalizations in the headword would be redundant if vocalized spellings are auto-generated. On a similar note, I think transliterations should be moved from the header to the pronunciation section, since the table can consistently mark transliterations and can prevent incorrect romanizations (obviously they should remain in links). That would solve a lot of issues that Persian has been having with incorrect transliterations. But I'm not sure.

And on the Tajik capitalization: I'll see if I can copy some text from another project or something. سَمِیر | sameer (مشارکت‌هابا مرا گپ بزن) 04:22, 26 August 2023 (UTC)

@Atitarev one more thing, I think fa-IPA could generate Tajik spellings, do you think that's an idea worth considering? I'm on the fence about it. Something like this: template:User:Sameerhameedy/Romanizations سَمِیر | Sameer (مشارکت‌هابا مرا گپ بزن) 06:47, 27 August 2023 (UTC)Reply
@Sameerhameedy: Re: headword: it's OK for now but it's rare that headwords don't generate any transliteration. There could be multiple vocalisations and transliterations or there could be just one, default transliteration. It's predictable that someone may demand it back, since we already provide transliterations. Thai, Khmer and Burmese entries may produce just one transliteration in the headword, even if there are more than one reading.
Also detailed vocalisation is best if the automation is planned. I haven't used sokun much until now, since it's easier to find Persian vocalisation without them but I will change that practice. Urdu uses sokun.
Re: Tajik. Not sure it's a good idea or if it's possible. The IPA generated is currently 90-95% accurate but never 100%. There could be also unexpected vowel changes, which differ from both classical and modern. It doesn't hurt to try out the functionality but ultimately, the info should be based on real-life evidence. I check against both https://sahifa.tj/russko_tadzhikskij.aspx and https://sahifa.tj/tadzhiksko_russkij.aspx when the site is up. It has occasional misspellings too.
Re: capitalisation. One trick is to use "^" in Korean, Japanese and Mandarin Chinese modules: 서울 (Seoul), ソウル (Souru). Japanese also automatically capitalises transliteration of proper nouns in the headword. With Korean, as I mentioned, it's |cap=y. Anatoli T. (обсудить/вклад) 07:22, 27 August 2023 (UTC)Reply
@Atitarev on Tajik, that's true. On second thought if I did anything like that it would NOT be labeled as "Tajik spelling" but rather as "Tajik (translit)" and would be listed with all the phonetic spellings, and obviously the transliterations definitely won't create links. I think a Tajik translit/phonetic spelling would be helpful, even if the word isn't used in Tajik. The main worry holding me back is that someone unaware of the meaning of "transliteration" or "translit." could make a Tajik entry based off the transliteration without checking first....
The trick with the caret is definitely feasible, and would be very easy to add. Adding a "cap=" field would be more difficult but I can probably copy text from somewhere.
On the romanizations... yeah, I'll probably talk to Persian editors about it. If nothing else, I think Majhūls should always be marked in headers and links even if we just continue to use the Iranian romanization (which is likely). سَمِیر | Sameer (مشارکت‌هابا مرا گپ بزن) 08:34, 27 August 2023 (UTC)
Tajik tends to use Russian capitalization rules for obvious reasons, while romanized Chinese, Japanese and Korean tend to use English ones (also for obvious reasons). Rodrigo5260 (talk) 13:32, 27 August 2023 (UTC)Reply

Iranian Persian velar stops as palatal?

I’m seeing the two velar stops ک and گ in (Iranian) Persian being transcribed as /c/ and /ɟ/ respectively, which are the palatal stops. This seems to be an error to me, unless you are citing some phonemic manual I am not aware of. All other Persian variants - Dari, Tajik - are correct. Glenohumeral13 (talk) 01:29, 25 October 2023 (UTC)

For reference see کتاب and مگر Glenohumeral13 (talk) 01:31, 25 October 2023 (UTC)Reply
Seems to be introduced with this change Special:MobileDiff/75791361 @Sameerhameedy I am going to undo this since these transcriptions are unattested. Please comment further if you have a source. Glenohumeral13 (talk) 01:41, 25 October 2023 (UTC)Reply
the phonetic Iranian Persian was provided to me by @Saranamd, you should discuss it with them before making changes to the Iranian transcription. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 01:46, 25 October 2023 (UTC)Reply
@Glenohumeral13, Sameerhameedy: I've reverted your edit because we are providing a phonetic transcription for Iranian Persian. I think @Saranamd commented once on Discord about changing them. I've implemented his suggestion: ɟ > ɡʲ and c > Fenakhay (حيطي · مساهماتي) 01:57, 25 October 2023 (UTC)Reply
Interesting, this is correct. I consulted the Oxford Handbook of Persian Linguistics and it confirms this phonetic transcription: ɟ for گ and c for ک for Standard Modern Persian (Ghavami, 2018). It might be good to add an academic source for future readers, since almost all glosses for Farsi use /k, g/ as the phonemes and there is no agreement on how to transcribe these exactly:
"There is no consistency in the recognition of the phonemic status of the palatal and velar stops. While some scholars consider them as velars or prevelars (c.f. Mahootian 1997; Majidi and Ternes 1999; Windfuhr 2009b; UPSID), Pisowicz (1985) considers the palatal articulation as the chief one. Velars occur only in the syllable-onset position when the nucleus is a back vowel [e.g. کار ] while palatals occur in all other positions. Therefore, posterodorso-velar stops [k, g] should be considered as allophones of anterodorso-palatal stops /c, ɟ/." (Bijankhan, 113)
Thank you for the response. Glenohumeral13 (talk) 03:45, 25 October 2023 (UTC)Reply
(By add source, I mean in comments of transcription module) Glenohumeral13 (talk) 03:46, 25 October 2023 (UTC)Reply
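To spell out the distribution in the quoted passage (velars only in the syllable onset before a back vowel, palatals in all other positions), here is a hedged Python sketch. It is not the module's code; `realize_velars` is a made-up name, the input is assumed to be already syllabified transliteration, and treating o, u and â as the back vowels is an assumption on my part.

```python
BACK_VOWELS = set("ouâ")          # assumed set of back vowels
PALATAL = {"k": "c", "g": "ɟ"}    # velar letter -> palatal allophone


def realize_velars(syllables):
    """Turn k/g into c/ɟ except in a syllable onset before a back vowel."""
    result = []
    for syl in syllables:
        chars = list(syl)
        for i, ch in enumerate(chars):
            if ch not in PALATAL:
                continue
            onset_before_back = (
                i == 0 and len(chars) > 1 and chars[1] in BACK_VOWELS
            )
            if not onset_before_back:
                chars[i] = PALATAL[ch]
        result.append("".join(chars))
    return result


print(realize_velars(["kâr"]))
print(realize_velars(["ke", "tâb"]))
print(realize_velars(["ma", "gar"]))
```

So کار keeps its velar (kâr), while کتاب and مگر come out with palatals (ce.tâb, ma.ɟar). Velars that stay velar are left as plain k and g here rather than being converted to IPA symbols.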

Iranian ق

@Sameerhameedy: Hi. I noticed the Iranian ق is [q] again (phonemically). Don't know which edit changed that. However, translit shows "ğ", e.g. قَفَسِهٔ کِتاب (ğafase-ye ketâb, bookshelf). Anatoli T. (обсудить/вклад) 01:18, 14 December 2023 (UTC)

Long majhul vowels in Dari

The long majhul vowels ē and ō need to be shown before the glottal consonants h and ʔ in Dari, as ی and و (as well as ــِـ and ــُـ) are pronounced مجهول before ه ح ء ع . (Note that in Urdu, however, only the short vowels ــِـ and ــُـ are pronounced مجهول (e and o) before ه ح ء ع ; the long vowels ی and و are pronounced معروف (ī and ū) before ه ح ء ع .) 144.121.237.29 18:09, 29 January 2024 (UTC)Reply

Ů and Å

For the transliteration readings in the Latin alphabet, please change Tajik ü to ů (Cyrillic у̊, مجهول ُ\و), and Iranian â to å (ا). 144.121.237.29 18:32, 29 January 2024 (UTC)Reply