Wiktionary talk:About Ottoman Turkish

From Wiktionary, the free dictionary

Unicode, not clear what should be used

@Erutuon Looking at your list of wrong scripts in Ottoman links, which seems to correspond to a correct observation of the practice, I must say that I am not sure though who has thought out the rule that Ottoman Turkish, on the internet, should use U+0643 ARABIC LETTER KAF and not U+06A9 ARABIC LETTER KEHEH, but U+06CC ARABIC LETTER FARSI YEH. That of course corresponds to the Wikipedia page Ottoman Turkish alphabet (as also on Wikipedia in other languages); U+0643 ARABIC LETTER KAF does not correspond in shape to U+06AF ARABIC LETTER GAF which is also used here but to U+06AD ARABIC LETTER NG which is also used here because it was used for the velar nasal /ŋ/, e.g. in سیڭك (sinek, fly); funny, this last is the only entry of Category:Ottoman Turkish letters – for other letters, we are not completely sure which should get an entry. Also we use U+0647 ARABIC LETTER HEH with Zero-width non-joiner while I have seen on texts transcribed to the internet U+06D5 ARABIC LETTER AE in this place, which corresponds to the current usage in Uyghur and Kazakh; however they also use U+06BE ARABIC LETTER HEH DOACHASHMEE and U+0647 ARABIC LETTER HEH. This rather confounds me and I give no recommendation, which I should not do anyway because of lacking Ottoman experience; I just continued to use the encoding that was already on Wiktionary, though almost all entries had been entered by Dijan (talk • contribs). Currently we count 1,115 pages with Ottoman Turkish lemmas; I just added some 300 for etymological crosslinks.

If some people seriously start Ottoman work, they need to sort out which Unicode characters they want to use. This of course all strengthens my opinion that Unicode has been wrong to encode all separately here, like in many places (remember the big blooper of encoding many Greek letters twice). The distinctions between the various /k/ letters should be a function of the langcode and solved by markup, like the Cyrillic /b/ is not encoded separately for Russian and Serbian, Bulgarian – though it usually looks different in print, this is solved on the internet aptly by markup. Fay Freak (talk) 12:04, 6 June 2019 (UTC)

Generally I think Unicode tries to make texts display correctly without using different fonts for different languages. This would not be true if Persian and Arabic used the same character for their /k/ letter. Not sure about the Cyrillic issue you mention, though. — Eru·tuon 18:49, 6 June 2019 (UTC)Reply

Looking a bit on scans of books on manuscripts, it seems that at the end, they indeed wrote and printed the kāf like in Arabic. The printing resembles mostly the Arabic printing. However there are also examples of printing that resemble Persian, and especially in handwriting the Persian forms seem very popular. The gāf گ was also readily employed in handwriting at the end; though at the same time I find specimens of Arabic kāf ك. They probably were under impression of Qurʾān manuscripts, but at the same time the access to Persian culture was more direct. It all corresponds to the general notion that an educated Ottoman person would know Persian and Arabic at the same time and both layers could be used for loans.

I note that even for Arabic in the 19th century, the forms used in printing were not clearly standardized. One also finds Arabic books where there is no ى but always the dotted yāʾ ي (y); similarly I have found Ottoman printing where the ye is always dotted at the end, but this is clearly non-standard and dispreferred. Fay Freak (talk) 13:54, 6 June 2019 (UTC)

If we extrapolate what would have happened had the alphabet not been reformed, I assume the Turks would now consistently use the Arabic kāf and U+06CC ARABIC LETTER FARSI YEH and U+06AF ARABIC LETTER GAF as this is the trend by the 20th century, and it is definitely not wrong to have it so on Wiktionary, though this is perhaps not classical Ottoman. We have Ottoman Turkish here at its latest. But I doubt how the /h/ writing would be implemented. Fay Freak (talk) 15:57, 6 June 2019 (UTC)Reply

@Fay Freak: So do you think it would at least be okay to replace ي (y) (U+064A, yeh) with ی (U+06CC, Farsi yeh) and ک (U+06A9, keheh) with ك (U+0643, kaf) in Ottoman Turkish? If so, I can add those replacements to the script that I have used for other languages. — Eru·tuon 18:23, 6 June 2019 (UTC)Reply
@Erutuon Yes. Fay Freak (talk) 18:24, 6 June 2019 (UTC)
Okay, I'm starting on that. — Eru·tuon 18:35, 6 June 2019 (UTC)
I was wondering if I should also replace ى (U+0649, alef maksura) with ی (U+06CC, Farsi yeh). I guess so based on this edit. — Eru·tuon 19:33, 6 June 2019 (UTC)Reply

Also found a few Ottoman Turkish entries with the wrong characters in the title:

{"chars":["ي"],"title":"اختيار"}
{"chars":["ک"],"title":"ایکی"}
{"chars":["ي"],"title":"بايقوش"}
{"chars":["ي"],"title":"تصديق"}
{"chars":["ي"],"title":"تفسير"}
{"chars":["ى"],"title":"صارغى"}
{"chars":["ي","ي"],"title":"قيزغين"}
{"chars":["ى"],"title":"ماوى"}
{"chars":["ي"],"title":"نصيحت"}
{"chars":["ي"],"title":"پين"}

I guess those should be moved, but my script doesn't do that. — Eru·tuon 19:22, 6 June 2019 (UTC)
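For readers who want to reproduce a report like the one above, here is a minimal sketch in Python. It is not the script used in this thread; the names `UNWANTED`, `flag_title` and `report` are hypothetical, and the set of flagged characters is simply taken from the replacements discussed here.

```python
import json

# Code points this thread treats as unwanted in Ottoman Turkish text
# (assumption: the set is exactly the three characters discussed above).
UNWANTED = {
    "\u064A",  # ARABIC LETTER YEH
    "\u06A9",  # ARABIC LETTER KEHEH
    "\u0649",  # ARABIC LETTER ALEF MAKSURA
}


def flag_title(title):
    """Return a report dict if the title contains unwanted characters, else None."""
    chars = [ch for ch in title if ch in UNWANTED]
    if not chars:
        return None
    return {"chars": chars, "title": title}


def report(titles):
    """Return one compact JSON line per offending title, keeping input order."""
    lines = []
    for title in titles:
        entry = flag_title(title)
        if entry is not None:
            lines.append(json.dumps(entry, ensure_ascii=False, separators=(",", ":")))
    return lines
```

Each offending character is listed once per occurrence, so a title with two dotted yehs yields two entries in `chars`, as in the list above.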

Finished replacing ي (y) (U+064A, yeh) with ی (U+06CC, Farsi yeh), ک (U+06A9, keheh) with ك (U+0643, kaf), ى (U+0649, alef maksura) with ی (U+06CC, Farsi yeh). Now much of what remains in the list is terms written in Latin script, which should be moved to transliteration parameters. — Eru·tuon 20:43, 6 June 2019 (UTC)Reply
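The three replacements listed above can be expressed as a single character-translation table. The following is only an illustrative sketch, not the script actually run on the wiki; `OTA_REPLACEMENTS` and `normalize_ota` are hypothetical names.

```python
# Replacements agreed on in this thread for Ottoman Turkish text:
#   U+064A YEH          -> U+06CC FARSI YEH
#   U+06A9 KEHEH        -> U+0643 KAF
#   U+0649 ALEF MAKSURA -> U+06CC FARSI YEH
OTA_REPLACEMENTS = str.maketrans({
    "\u064A": "\u06CC",
    "\u06A9": "\u0643",
    "\u0649": "\u06CC",
})


def normalize_ota(text):
    """Apply the character replacements; all other characters pass through unchanged."""
    return text.translate(OTA_REPLACEMENTS)
```

The mapping is idempotent, so running it again over already normalized text changes nothing. It only rewrites text; moving pages whose titles contain the old characters is a separate step.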

Finished moving Latin-script Ottoman Turkish terms to transliteration parameters. — Eru·tuon 18:13, 8 June 2019 (UTC)Reply

Transliteration

Moved from Talk:جلاد#Ottoman_Turkish_translit

@Fay Freak: Hi. Why is Ottoman Turkish translit "cellat", not "cellad", at least? Do you base the transliteration entirely on modern Turkish? I can see you made WT:AOTA but should some consonants, at least be transliterated as they are spelled? --Anatoli T. (обсудить/вклад) 07:26, 13 August 2020 (UTC)Reply

@Atitarev: Because it is transcription, not translit (using |tr=, but it isn’t necessarily for transliteration if |ts= isn’t present, like before). Here you clearly see from its descendants that it was devoiced, even if most original Ottoman dictionaries transcribed otherwise – Turkish “transcriptions” before Modern Turkish cannot be trusted, they just converted the graphs into Latin letters and e.g. transliterate دیمك (demɛk) “dimɛk” when it is clearly inherited from Proto-Turkic *dē- (to say, call) and pronounced /e/ unchanged up to Modern Turkish: Meninski already as some others writes it is “vulgarly” pronounced demɛk, which is apparently to be understood “in RL always heard demɛk, dimɛk is a learned fiction of bookscholars”, same situation with ایتمك (etmɛk) which isn’t even always spelled with ی. Or they disregard labial harmony and transcribe كدی اوتی (kedi otu) as kedi otı (the inflectional endings in Ottoman had consistent spelling defying harmony), as if labial harmony were only to be invented by evil reformers. Transliterating as spelled without transcribing as spoken is misleading. Because it is a bit that the Ottomans did not even have spelling but used logographs. Like in معدنوس (parsley) there is no point to transliterate ع – what would you “transliterate” there? /ʕ/ has not been spoken in Turkish. Fay Freak (talk) 12:50, 13 August 2020 (UTC)Reply
@Fay Freak: Thank you. This policy is better than no policy. I understand the rationale but this may be challenged by others later on. If you look at WT:FA TR, it addresses various letters one by one. It may be hard to do but perhaps you may want to mention in the policy some discrepancies between written and transliterated symbols with some examples. The actual spelling (and corresponding transliterations) may also helpful to understand the original Turkish or the source language pronunciation and spelling, please take this into account. Persian transliterates ع with an apostrophe, even if it's not pronounced but it may be ignored in Urdu, if it follows Hindi transliteration entirely. --Anatoli T. (обсудить/вклад) 00:06, 14 August 2020 (UTC)Reply
@Atitarev: Glottal stops have been pronounced even in Modern Turkish (usually without being signified in the Latin script), now old-fashioned. The Routledge grammar Turkish by Göksel-Kerslake writes in 2005: “The glottal stop survives mainly in the speech of some elderly speakers, and is going out of usage. It is confined to words of Arabic origin, and mostly to those in which it is intervocalic.” Thus according to them fiil (verb) is /fiʔil/. Apparently not in that word معدنوس (parsley) (there ع (ʕ) is /j/ as implied or lengthens the previous vowel in variants). I have not yet assessed how to represent it in transcription. Fay Freak (talk) 13:45, 14 August 2020 (UTC)Reply
Hello @Atitarev. It is not just Persian, but the Turkish Latin alphabet used to mark it as well, up until the 1960s. See the following excerpt:

After the introduction of the new alphabet, the apostrophe (kesme işareti, koma işareti) was originally employed to transliterate the Arabic letters hamza and ayn, as in mes’ele ‘issue’ and mes’ut ‘happy’. (...) At a first stage, the apostrophe constitutes a phonographic marker for the transliteration of hamza and ayn in Arabic and Persian loanwords (san’at ‘art’). At a second stage, the apostrophe is gradually refunctionalized as a morphographic marker in proper names (Ankara’da ‘in Ankara’). At a final stage, the apostrophe is absent from Arabic and Persian loanwords (san’at > sanat ‘art’) and is mainly associated with inflected proper names. The absence of the apostrophe with Arabic and Persian loanwords reflects the loss of the glottal stop in spoken Turkish. After vowels, glottal stop deletion can be accompanied by compensatory vowel lengthening. This is the case with tesir [tɛːsiɾ] ‘effect’ and malum [maːlum] ‘known’, which originally contained hamza and ayn, respectively (Steuerwald 1964: 117, 120; Lewis 1967: 14). The glottal stop still occured in the 1960s with old-fashioned words such as kur’a ‘prize draw’ (Steuerwald 1964: 23, 122–123). Today, it is mainly found among older speakers (Kornfilt 1997: 488–489).

It may be possible to make use of the same style on Wiktionary, given that it was certainly pronounced by speakers of Ottoman Turkish and has precedent in Turkish orthography. Samiollah1357 (talk) 17:43, 11 September 2023 (UTC)Reply
“Ottoman Turkish” covers a long period, and we don't really know about the historical pronunciations. Also, it's not just Modern Turkish, it's New Ottoman Turkish too, so taking it as the basis for transliteration/transcription is, on the contrary, the right move. But we can think of other solutions for earlier pronunciations that we are, to some extent, sure about. Dohqo (talk) 11:40, 16 June 2022 (UTC)

Fay Freak, I expect you to rework this page and bring it into accordance with the current state of the field. Since you took on to write a policy page on Ottoman Turkish, that is to say. Allahverdi Verdizade (talk) 14:41, 13 January 2021 (UTC)Reply

@Allahverdi Verdizade: I don’t know what you mean. Also I don’t think there even is a field, especially in so far as could influence a policy concerning the encoding of Ottoman on the web. Fay Freak (talk) 15:09, 13 January 2021 (UTC)Reply
Here are just a small number of things to consider:
  • Rewrite so that it doesn't look like an essay or collection of thoughts, but like an actual introduction to the language on Wiktionary, written in clear and comprehensible English language. Introduce the language, get straight to the point, when was it spoken, where and by whom. Remove all irrelevant and erroneous statements, such as reference to the Azerbaijani language or the claim found in the first sentence giving the impression that Armenian alphabet was widely used to write Ottoman. Remove the erroneous statement that "/ɛ/ and /e/ were distinguished in Ottoman Turkish". Rewrite the section 'its like modern Turkish' with the understanding that transliteration is not the same thing as transcription.
  • Lexicographic sources, where to find them and how to use them.
  • Read and write about alternative forms and how they should be lemmatized.
  • How Arabic and Persian terms should be transliterated.
  • What steps should be taken on the unification of the Ottoman transliteration.
  • How compounds should be written (ZWNJ or a single space? Dash?), ایش گوج (iş-güç), گوج ‌‌بلا (güç-belâ)
  • How Turkish and Gagauz terms should be linked to Ottoman.

Allahverdi Verdizade (talk) 15:26, 13 January 2021 (UTC)Reply

@Allahverdi Verdizade: I deny that it contains irrelevant or erroneous statements. Probably your interpretations are irrelevant or erroneous. The Armenian alphabet was widely used to write Ottoman. The fact that Ottoman Arabic printing only started after 1800 while Armenian existed before and even provided for the first printed Ottoman Turkish novel is highly relevant, if only owing to the fact that few people even realize it. It’s a problem that the last century of the language’s use is so prominent.
Lexicographic sources can be done – like on Wiktionary:About Arabic#How to add references to sources? I thought people don’t like this bloat. Myself I consider it a burden, although people find it useful.
“How Arabic and Persian terms should be transliterated.” – how is that relevant? But the difference between Ottoman and Iranian Azerbaijani script forms is a relevant consideration. There is a problem in Turkish and Azerbaijani ending up at Old Anatolian Turkish but Azerbaijani, alternatively Turkish, ending up under a different encoding of kāf.
“What steps should be taken on the unification of the Ottoman transliteration.” I don’t know what that means. Use transcriptions like Modern Turkish for example, that is very practicable.
“How compounds should be written (ZWNJ or a single space? Dash?)” Like they were written, as with every other language. I can’t decide that it shall be done always in one of the possibilities.
“How Turkish and Gagauz terms should be linked to Ottoman.” Why not in the descendants section? Fay Freak (talk) 15:41, 13 January 2021 (UTC)
Turkish in Armenian script was indeed widespread. More than 2500 books were printed starting from 1727. 130 periodicals were published. But it was used only by Armenians and a few Turks who had learned the script to be able to read newspapers. --Vahag (talk) 17:32, 13 January 2021 (UTC)Reply

Transliteration

What does this mean? "Transcribe as though it were Modern Turkish, and according to the pronunciation how ever inferred."

But it's not Modern Turkish and differs from Modern Turkish in many aspects, most importantly the alphabet. Ottoman Turkish has 2 "D" sounds, 2 "G" sounds, 3 "H" sounds, 2 "K" sounds, 3 "S" sounds, 2 "T" sounds and 3 "Z" sounds, whereas Modern Turkish has 1 for all except G and K, which have their own IPA symbolizations and for good reason. Are the letter Kaf and Kef both supposed to be transliterated as "K" for instance, when all sources use Q for Kaf and K for Kef? Arabic and Persian transliterations use Ḳ and K; same symbols are displayed at the Wikipedia entry for Ottoman Turkish. Is this already established table supposed to be followed, or do we "Modern Turkish-ify" all transliterations on virtually the same website? Orexan (talk) 12:53, 23 February 2023 (UTC)Reply

You write nonsense. Transcription refers to the language and not the alphabet, otherwise it would be transliteration, the distinction of which you seem to principally know, falling back to this term in the last sentence. What you refer to as “sounds” are only the consonant signs of the alphabet, which to repeat by romanization is pointless. “All sources” use such a romanization in so far as they do not print the Arabic alphabet spelling but make it reconstructable from their romanizations (academic publishing sucks). So on Wiktionary you are supposed to represent in |tr= what you think it was supposed to be pronounced. Reason is also that editors copy-paste between Modern Turkish and Ottoman entries, thus save their attention to only mark differently what was actually differently pronounced. I am sure @Samubert96, Vox Sciurorum have appreciated this circumstance, visibly allowing them to add larger amounts of entries in both Ottoman and post-Ottoman Turkish than they would have added if they had been vexed by unnecessarily distinctly styled transcriptions, while maintaining sanity by reason of not juggling between systems, instead staying aware of the oral nature of the language. Fay Freak (talk) 16:17, 23 February 2023 (UTC)Reply
So in the Etymology section of an item in Turkish to refer to the Ottoman origin, are we to show "tr=" as "soğuk" of a word spelled as "صوغوق", even though this isn't how it was pronounced in Ottoman and it's not what I think it was supposed to be pronounced. Writing this as "soğuk" doesn't make any sense to me, because Sad isn't the same as Sin, nor Kaf same as Kef etc. Are we to ignore all nuance in pronunciation and just show modern adaptations? Orexan (talk) 18:24, 23 February 2023 (UTC)Reply
Then you think wrong. soğuk represents the Ottoman pronunciation perfectly, although and because it was more often /ɣ/ than today’s mute pronunciation and some other words had /ɡ/ while Modern Turkish has /ğ/, neither was Sad differently pronounced from Sin etc., there is no “nuance”, there is only primary articulation and secondary articulation; the latter is present in the Arabic language from which the characters have been taken but not in Turkish, while for ق and ك it is not phonematic but they are positional allophones hence to be predicted by the vowels surrounding them, apart from the original spelling we also give allowing prediction. You just thought wrong all the time. Does not surprise me, earlier, another driveby editor contended that there was no ö or ü either in Ottoman. People don’t actually learn anything in school. Fay Freak (talk) 20:13, 23 February 2023 (UTC)
"neither was Sad differently pronounced from Sin"
This is just utterly false.
"while for ق‎ and ك‎ it is not phonematic but they are positional allophones hence to be predicted by the vowels surrounding them"
This is somehow admitting you're wrong, but simultaneously insisting you're right. You're saying vowels surrounding it change it, yet also say they're not different and should be shown with the same symbol.
"People don’t actually learn anything in school."
This is just a juvenile insult from someone, hiding behind internet anonymity, who's been given a sliver of responsibility on an online platform and now they think they're everyone's boss.
So at first I read this policy and thought I must be missing something, but now I'm convinced that it's completely illogical and unfounded. It's inventing a new system when there already is one that makes sense. I would understand it for simplicity or convenience sake, if other editors choose to do so for efficiency, they may. I don't mind putting proper markers, though.
Thank you for your time. Orexan (talk) 08:02, 24 February 2023 (UTC)Reply
@Orexan: Then tell us how sad and sin were differently pronounced. If there was a difference then the phonologic science should provide you the vocabulary to express it.
Of course I insist I am right. Surrounding vowels change the pronunciation but not in a manner which distinguishes meaning (the definition of phoneme: the smallest unit distinguishing meaning); or if there are rare minimal pairs (which can only occur by means of foreign words) then the distinction bears almost no functional load. We don’t transcribe the х (x) in хер (xer) хуй (xuj) differently either, in spite of it being changed in pronunciation by the surrounding vowel. Fay Freak (talk) 15:57, 24 February 2023 (UTC)Reply

Other scripts

There are important dictionaries and a vast literature written in Ottoman-period Turkish using the Latin, Armenian, Cyrillic and Greek scripts. These often supply words and forms unattested in the Arabic script. For example Italo-Turkish bucciacgí (attested in 1611) = buçakcı is a by-form of بیچاقجی (bıçakcı), Armeno-Turkish եիւսիւք (yüsük) = yüsük (attested in 1843) is a by-form of یوكسوك (yüksük). We should devise a way to show these. One option is to list these under ===Alternative forms=== like this: {{l|ota|bucciacgí|tr=buçakcı}}. Another one is to normalize all Ottoman words to a modern-Turkish Latin spelling and list at that entry all the attested forms in various scripts like in Akkadian siparrum. Yet another option is to list the alternative transcriptions supplied by other scripts on the headword line of the Arabic spelling, but that would be a lie and sometimes the Arabic-script version may not be attested at all. What do others think? Pinging @Fay Freak, Vox Sciurorum, Samubert96, Allahverdi Verdizade Vahag (talk) 20:53, 21 March 2023 (UTC)Reply

Category:Ottoman Turkish language allows the Armenian script. I would hesitate to add dictionary-only words in other scripts, especially Latin. Do you have a use of եիւսիւք or only a mention? I found it in dictionaries. I suggest starting with an appendix page, Ottoman Turkish forms only attested in unsupported scripts. If it grows large we can consider promoting its contents. Vox Sciurorum (talk) 18:51, 22 March 2023 (UTC)Reply
I agree with @Vox Sciurorum, I'd prefer avoiding the normalization of all the Ottoman words in other scripts too, since the official writing system during the Ottoman era was a variant of the Arabic one. In my opinion, it could be viewed as "historically inappropriate", but that's just my opinion. The use of an appendix page could be a useful way of showing Ottoman words in other scripts, but I wouldn't mind listing them under ===Alternative forms=== either. Thanks for this discussion Samubert96 (talk) 20:19, 23 March 2023 (UTC)Reply
Appendix pages are where we banish non-useful curiosities. Transcribed Turkish is useful: it supplies words, forms and senses unattested in the Arabic script or attested earlier than in the Arabic script. Note especially the work of Rocchi on pre-Meninski Latin-transcribed Turkish, e.g. {{R:ota:Rocchi:2013b}}, {{R:ota:Rocchi:2013a}}, {{R:ota:Rocchi:2011}}, {{R:ota:Rocchi:2020}}. He normalizes the Latin spelling into modern Turkish, e.g. bucciacgí > buçakcı.
Transcribed attestations help to explain the shapes of borrowings in other languages. For example bucciacgí /buçakcı/ explains Armenian պուչախճի (pučʻaxči) instead of ըչախճի (*pəčʻaxči). Giazi, giasi, giaszi /cazı/ explains the -z- in Armenian and Laz descendants of جادو (cadu, cadı).
They also supply old evidence for dialectal Turkish forms recorded in modern times. For example, Ottoman եիւսիւք /yüsük/ is recorded in Ankara and Çorum in {{R:tr:DS}} as a dialectal form of yüksük.
I can't attest եիւսիւք /yüsük/ outside of dictionaries, but it certainly existed as the form is given as an alternative form of եիւքսիւք /yüksük/ and is supported by modern dialectological records.
We need to think of a way to incorporate this useful information in the Ottoman pages. Vahag (talk) 13:51, 25 March 2023 (UTC)Reply
I had the same issue with Albanian sheh. It would be nice to have a way to show the variant's existence in the Ottoman Turkish entry somehow. I like the idea of listing them as alts, although I'm not sure about having them as full-fledged entries. Catonif (talk) 21:48, 22 August 2023 (UTC)Reply

@Samubert96, Vox Sciurorum, I have written an automatic transliteration module for Ottoman Turkish in Armenian script. Would you take a look at the transliteration of these two pieces of text to check if anything contradicts our transliteration practice of the Ottoman Turkish in the Arabic script:

Book of Mormon (1901)
MOSYAHIN KİTABI 27:1
Ve vakı՚ oldu ki, iman etmeyenler tarafından kilise üzerine gelen te՚addi ol kadar şiddetli oldu ki kilisenin ՚azaları mırıldanmağa, ve bu husuua da՚՚ir kilisenin kılağuzlarına çikâyet etmeye başladılar; ve anler dahi Almaya teşekki etdiler. Ve Alma maddeyi anlerin meliyi Mosayah huzurına getirdi, ve Mosayah kâhinler ile istiçare eyledi.

Don Quixote (1868)
KISMI EVVVEL
Bundan az vakıt evvel Sbanianın Manş nam vilayetinde, ekseri elinde bir mızrak ve bir eski kalkan ve zayif bir bargir ve bir kaç tane dahi tazı taşıyan takımdan bir kibar zade var ıdı.

--Vahag (talk) 20:40, 16 August 2023 (UTC)Reply

@Vahagn Petrosyan As always I really appreciate your work here in the Wiktionary community. Even though I consider my knowledge on the Ottoman Turkish language still rather amateurish, I think that the transliteration from the Armenian script to the Turkish one is excellent, I didn't find errors or contradictions. The only word that puzzles me a bit is Sbanianın, mentioned in the Don Quixote passage, since it should be İspanyanın I guess, but other than this I think that your work proved to be really good. Samubert96 (talk) 18:38, 17 August 2023 (UTC)Reply
Thanks. Do we transliterate ayn and hamza? In Armeno-Turkish, they are shown with a single apostrophe and a double apostrophe, respectively.
Սպանիա (Sbania) is a borrowing from Armenian Սպանիա (Spania). Vahag (talk) 19:54, 17 August 2023 (UTC)Reply
I think they have to, being pronounced as a glottal stop. The editors have also used rings ʾ and ʿ, and zero, due to its non-pronunciation also being colloquially unmarked. I have not observed consistency. However, by the same principle by which not every distinction in the script is expressed (a distinction which, to note, is more on the token level), only one sign should be used for one phoneme, i.e. ʼ for both, like k instead of q, whereas on the other hand even the distinction between /ɛ/ and /e/ is levelled out by being transcribed the same. Fay Freak (talk) 20:57, 17 August 2023 (UTC)
Ok, both ayn and hamza will now transliterate to <ʼ>. Vahag (talk) 09:04, 18 August 2023 (UTC)Reply
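As a rough illustration of the outcome above, the merger of both marks into one sign can be sketched in Python. This is not the code of the actual transliteration module. It assumes that the single and double apostrophes appear in the intermediate output as one or two U+055A ARMENIAN APOSTROPHE characters, as in the sample passages above; the function name `merge_ayn_hamza` is hypothetical.

```python
import re

ARMENIAN_APOSTROPHE = "\u055A"  # assumed source mark: single = ayn, double = hamza
MODIFIER_APOSTROPHE = "\u02BC"  # U+02BC MODIFIER LETTER APOSTROPHE, used for both

# Greedily match a double apostrophe first, then a single one.
_APOSTROPHES = re.compile(ARMENIAN_APOSTROPHE + "{1,2}")


def merge_ayn_hamza(text):
    """Collapse single and double Armenian apostrophes into one ʼ each."""
    return _APOSTROPHES.sub(MODIFIER_APOSTROPHE, text)
```

With this, both a word-final single apostrophe and a word-internal double apostrophe come out as a single ʼ, which deliberately discards the ayn/hamza distinction, in line with the one-sign-per-phoneme principle stated above.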

Some change proposals

@Fay Freak, Vox Sciurorum, Samubert96, Vahagn Petrosyan, Orexan, Allahverdi Verdizade. I'm sorry for whoever I missed. I'd like to propose some changes to this policy. Posting here, as I'm not sure what better place there might be.

  1. About the encoding, as already envisioned by the policy itself with "editors might later decide", I propose to lemmatise entries with گ and ڭ to ك, either leaving the distinction to the |head= or simply not using the two characters anywhere at all, which would be my preference, following the same principle for which we don't add tashkeel to headwords.
  2. All the other points are about transliteration. At the moment the section for transliteration is a bit hasty, so I'd like to expand a bit. Firstly I think we could formalise the usage of ʼ U+02BC MODIFIER LETTER APOSTROPHE (or some other apostrophe encoding) for hamza and 3ayn, as was already decided upon for MOD:ota-Armn-translit in the discussion above, and is already in place in a considerable number of entries.
  3. Allow circumflexes on â and û after k g l to indicate consonantal change. The policy is currently unclear, as it seems to assume all circumflexes are used only to infer Arabic script spelling. In this condition they bear quite a significant pronunciation value. Words like لوقوم (lokum) would of course stay as lokum despite the /l/ since ô isn't a Turkish letter. Similarly the distinction would still remain veiled also in cases where the consonant in question is word-final. This practice is also already quite widespread among entries.
  4. Disallow transcribing consonantal devoicing, assimilation and word-final degemination. This is supposed to be a transliteration after all. For example, بیچاقجی (bıçakcı) yet New Turkish bıçakçı, ولد (veled) yet NT velet, شرانپول (şaranpol) yet NT şarampol, حل (hall) yet NT hal. This is another very widespread but undocumented practice. Phonetic precision is to be left to the pronunciation section.

These are the ideas I can remember at the moment of writing this. They of course don't need to all pass together, each proposal should be considered individually. The exact wording will be worked on. Not being particularly drastic, there wouldn't be any pressing need to go back to all entries made thus far and go through the hassle of imposing the changes which pass. They would just be guidelines to bear in mind when creating new or editing existing entries. Catonif (talk) 16:53, 22 November 2023 (UTC)
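To make proposal 1 concrete, here is a minimal sketch of how a page title could be derived from a spelling that uses گ or ڭ, with the original spelling kept for a possible |head=. This is only an illustration of the proposal, not existing module code; `lemma_title` is a hypothetical name.

```python
# Proposal 1: lemmatise U+06AF GAF and U+06AD NG under U+0643 KAF.
KAF_FOLD = str.maketrans({
    "\u06AF": "\u0643",  # GAF -> KAF
    "\u06AD": "\u0643",  # NG  -> KAF
})


def lemma_title(spelling):
    """Return (title, head): the folded page title, and the original spelling
    as a candidate |head= value, or None when folding changes nothing."""
    title = spelling.translate(KAF_FOLD)
    head = spelling if title != spelling else None
    return title, head
```

Under the stricter variant of the proposal, where the two characters are not used anywhere at all, the second element of the result would simply be ignored.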

How about transcribing qaf as q? How about the Arabic emphatic consonants in Arabic words (but not Turkish ones)? Allahverdi Verdizade (talk) 18:11, 22 November 2023 (UTC)Reply
And maybe use x (or ) in Arabic and Persian words where appropriate. The Armenian script distinguishes q from k, and x from h. Vahag (talk) 19:53, 22 November 2023 (UTC)
So, does that mean you two agree with the points above? You haven't expressed yourself about them. About this new proposal, personally I would object since it's not what modern romanisations of OsmT do. Compare most notably the systems employed by {{R:tr:OTK}} and the works of Rocchi. Catonif (talk) 20:06, 22 November 2023 (UTC)Reply
I don't know enough about Turkish or the Arabic script to comment on your initial proposals. Vahag (talk) 20:25, 22 November 2023 (UTC)Reply
I support your suggestion only as part of a broader set of amendments to the transliteration which would also include my suggestions and the one that Vahag mentioned. It's bizarre to only amend the ones that you suggested (which are in themselves good) at the same time as opposing a broader set of amendments that would take the separation of pronunciation and transliteration to its logical completion. Allahverdi Verdizade (talk) 20:45, 22 November 2023 (UTC)
At first I supported strict, reversible, scholarly transliteration systems that mechanically replaced native script's letters with Latin letters. But now I like systems that are closer to the pronunciation and explain the shapes of borrowings in other languages. I know the pronunciation should go to the etymon's pronunciation section, but usually the etymon is a redlink and even if it is a bluelink, it often doesn't have a pronunciation section, at least not the Ottoman Turkish. Additionally, it is far away from my beloved etymology and descendants sections.
It is confusing to say Armenian խալլ (xall) is borrowed from Turkish حل (hal), because then people wonder why is it not borrowed as *հալ (*hal). For Armenian too I am thinking about bringing the transliteration scheme closer to pronunciation. I don't like seeing the monstrosity kiwvērčin in կիւվէրճին (kiwvērčin) when it is simply a Western Armenian orthography to say güverǰin. Fay Freak once said we don't have to be strict with the reversibility of the transliteration because unlike traditional sources we include the original script right beside it. We can cheat and sneak more useful information into the transliteration. Vahag (talk) 22:34, 22 November 2023 (UTC)Reply
Why would we “take the separation of pronunciation and transliteration to its logical completion”? It did not occur to me to transliterate rather than transcribe in the first place, it is understood.
Even transcription might not hinder disregarding devoicing however, like Middle High German wrote devoicing but then German forwent it, the morphological spelling being more iconic. I don’t think in the categories of transcription and transliteration at all. Impractical confusing categories professors teach people to make exams about.
But transliteration would assume that the Arabic spelling is Ottoman, when at the same time it is Armenian spellings, and mostly audio. More inconsistency. The greatest point was to avoid manual adjustment of transcriptions since I know people like to copy Turkish and Ottoman transcriptions between each other, so it did accelerate entry creation with no loss, save of workload.
is probably better though than x … like for Turks, and dialect transcriptions, though I realize its being known from Azerbaijani (after Russian I suppose, while the IPA /x/ depends on Spanish according to myself after Lagarde on the page x, and the later Persian transcriptions are from the IPA alphabet, and Assyriological ~ Semitist is an organic development?).
As the about page gave the reasoning “No circumflex signs, since these are used in scholarly works for readers to infer the Ottoman spelling but this is not needed since in this dictionary we have the Ottoman Turkish alphabet right next” this was only valid in so far as the purpose of these spellings is to infer the Arabic spelling. It did not disallow them, and hence they are widespread in the other contexts. There were other options like (ǵ, ľ?) that I did not write any position about; I made editors feel emboldened rather than confused. Fay Freak (talk) 22:53, 22 November 2023 (UTC)Reply
Verdi, I understand how it may look bizarre as your "logical completion" seems to be Azeri, but the modern scholarly romanisations of OsmT I'm accustomed to, namely Çağbayır and Rocchi, which I mentioned earlier, don't seem to embrace your additional amendments (if not to transliterate the Arabic language itself). Not saying they're not valid points, merely that they are not to be equated with the changes I proposed.
The situation of /x/ seems analogous to that of /ɛ/, both posing an actual pronunciation difference lost in Modern Standard Turkish and as a result not transcribed in modern scholarly romanisations. It doesn't seem analogous, on the other hand, to emphatics, which never suggested a separate pronunciation in the first place, and /ʔ/, which is still transcribed. If we are to transcribe either one of /x ɛ/ I suggest we transcribe both, though not as x ə, maybe ḫ ä?
Anyways, that definitely seems more controversial and complicated than whatever I'm trying to do now, which seems relatively trivial. About the practicality of copy-pasting transcriptions, I can see why one would value convenience, but I'd rather not sacrifice quality for quantity. If there is no opposition to the initial points alone (esp. to point 1, which I think is the one with the greatest impact) I will go ahead and make the changes to the policy. Catonif (talk) 21:26, 23 November 2023 (UTC)Reply
To me, your proposals seem uncontroversial incremental improvements. Go ahead if no one objects. I just wanted to use this opportunity to voice my хотелка (xotelka)s. Vahag (talk) 13:14, 24 November 2023 (UTC)Reply
An opinion was sought, an opinion was given... Allahverdi Verdizade (talk) 13:19, 24 November 2023 (UTC)Reply
Clarifying my position: transliteration in historical languages should transliterate, pronunciation should give historical pronunciations. A link to modern Turkish should be given in Descendants. That is, all Arabic letters should be properly transliterated. A possible exception is the emphatics in inherited Turkic vocabulary, which were used to indicate the frontedness of following vowels. But actually, those could also be transliterated properly, because transliteration =/= pronunciation (so if 'water' is spelled صو I don't see any problems with transliterating it as ṣu). This is still one step short of full transliteration, which would leave out vowels, but I think this is a good place to stop. Understanding this will likely not be adopted, I nevertheless wanted to contribute to this festival of khotelkes. Allahverdi Verdizade (talk) 14:31, 25 November 2023 (UTC)Reply
@Catonif RE Proposal #1. There definitely is an inconsistency on the platform with regard to entries containing these letters, which is understandable because the language itself had similar irregularities and inconsistencies among its users and dictionary writers. I'm not strongly leaning either way, but Ottoman Turkish used a version of the Perso-Arabic alphabet and these are individual letters in the OTA alphabet, not variations created with diacritics, as exist in Modern Turkish with "â, î, û". We do lemmatise Modern TR entries containing these as they are, when in my opinion we should follow the Russian model, where the word моде́ль (modélʹ) for example is lemmatised on the page модель (modelʹ), as they're only accents and circumflexes. Whereas with the OTA entries, these are separate letters. Not a 1:1 equivalent, but Persian entries containing these letters are lemmatised as they are. I'd rather گ and ڭ be used as they are, but my main area of focus is Modern Turkish.
RE Proposal #2-3. Yes, this is a necessary change. The reason given on several occasions by the creator of the current policy is, more or less, "using accents would slow down entry creation, because copy-pasting takes too much time". But for some reason, after I said "I don't mind spending the extra time to copy-paste", some of the editors pinged under this section and some others have time and time again changed my entries/edits to fit the policy. I thought we were in such a time crunch to create as many entries as possible that we couldn't even spare the extra time to copy-paste proper diacritics, but we find time to change others' entries or revert their changes with the sole purpose of having them conform to the policy in this specific regard. In addition to consonant change and the apostrophe, long vowels also need to be indicated because there are examples of semantic change due to short/long vowels.
RE Proposal #4. Is it me or is the wording a little confusing? If what is meant is to transliterate بیچاقجی as "bıçakcı" and ولد as "veled", I support it, except بیچاقجی is spelled with a "ق" and the proposed transliteration doesn't discern "ق" from "ك", and I have an issue with that. Orexan (talk) 16:25, 24 November 2023 (UTC)Reply
#4: Because it is not a transliteration and you have generally two other ways to discern the spelling difference 1. by looking at the spelling 2. by vowel harmony, as it is conditioned. And it would be actually contradictory to distinguish that when other distinctions of little functional load, devoicing and assimilation of /n/ to /p/, are removed. Fay Freak (talk) 16:54, 24 November 2023 (UTC)Reply
I have little knowledge of Ottoman Turkish to contribute but I know the transliteration is not looked after very well. What Fay Freak offered is based on modern Turkish, which makes it easier (?) to match with Turkish but is heavily mismatched with the actual older spelling. Having said this, it's common to transliterate various letters phonetically, despite differences in spelling. Compare with modern Persian:
  1. Letters س, ص and ث are transliterated as "s".
  2. Letters ز and ظ are transliterated as "z".
  3. Letters ت and ط are transliterated as "t".
  4. Letters ق and غ are transliterated as "ğ".
Just saying it's not unique that multiple letters have the same value. Anatoli T. (обсудить/вклад) 02:50, 25 November 2023 (UTC)Reply
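The four Persian rules listed above can be written out as a many-to-one lookup, which makes it visible why such a scheme cannot be reversed from the Latin alone. This is only a minimal sketch: the names `RULES`, `TO_LATIN` and `candidates` are hypothetical and are not taken from any Wiktionary module.

```python
# Merged letters exactly as in the list above: several Arabic-script
# letters share one Latin value (modern Iranian Persian).
RULES = [
    ("س ص ث", "s"),
    ("ز ظ", "z"),
    ("ت ط", "t"),
    ("ق غ", "ğ"),
]

# Forward table: one entry per Arabic-script letter.
TO_LATIN = {
    letter: latin
    for letters, latin in RULES
    for letter in letters.split()
}


def candidates(latin):
    """Return every letter that could stand behind a given Latin value."""
    return [letter for letter, value in TO_LATIN.items() if value == latin]
```

Calling `candidates("s")` returns three letters, so the original spelling has to be read from the native script shown next to the transliteration.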
@Anatoli T. "Letters ق‎ and غ‎ are transliterated as "ğ"."
Every Persian entry I have seen on Wiktionary containing "ق", whether at the start, middle or end of a word, is transliterated with "q". Can you provide any examples for this Wikipedia table? Orexan (talk) 07:22, 25 November 2023 (UTC)Reply
@Orexan: I should have made clear that ق and غ are merged in modern standard Iranian but remain distinct in Dari, Classical Persian and Tajik. Per w:Persian_phonology#Allophonic_variation: In modern Tehrani Persian (which is used in the Iranian mass media, both colloquial and standard), there is no difference in the pronunciation of غ and ق.
The link I gave above Wiktionary:Persian transliteration is specific to modern Iranian. There's also a page for Dari/Classical in Wiktionary:Persian transliteration/Dari.
The policy page changes reflect the module changes. Please see how automated transliteration handles ق in different varieties using specific language codes; only 'fa-ira' (Iranian Persian) produces "ğ".
  1. Classical Persian قِصَّه (qissa) or Dari قِصَّه (qissa)
  2. Iranian Persian قِصِّه (ğesse)
  3. Tajik қисса (qissa)
The change to the module was made by @Sameerhameedy. There has been no full endorsement yet, but there are not many active Persian editors right now. Anatoli T. (обсудить/вклад) 01:53, 26 November 2023 (UTC)Reply
Also, most Persian entries use Iranian-specific transliterations, but "q" was used traditionally, since غ and ق differ in Classical Persian and Dari. The difference can be highlighted in translations sections where the varieties are nested under Persian, e.g. in tale#Translations, where you can check the nested Persian translations. Anatoli T. (обсудить/вклад) 01:57, 26 November 2023 (UTC)Reply
I made the changes to the policy; we may discuss all of our hotelkas in more detail on a second occasion. I made two additional changes I had forgotten to mention in this discussion: (1) keeping the spaces of the Arabic spelling, whenever NT univerbates, and (2) disallowing capitalisation, following the practice of Arabic and Persian. Since I forgot them when creating this post, these have not been properly discussed, so feel free to get rid of either if you disagree. Catonif (talk) 16:57, 26 November 2023 (UTC)Reply
@Catonif: Did you mean those hotelkas? Anatoli T. (обсудить/вклад) 23:14, 29 November 2023 (UTC)Reply
I'm sorry for replying only now. I'd like to start off by saying that I don't really consider myself an expert in the Ottoman Turkish language or the Arabic and Persian scripts. Despite this, I agree with the majority of @Catonif's points, including the last two he mentioned: (1) keeping the spaces of the Arabic spelling and (2) disallowing capitalisation. But, as for the first proposal (lemmatise entries with گ and ڭ to ك), I share @Orexan's opinion, mainly because they are presented as separate letters in the Ottoman Turkish language, even though most historical dictionaries don't make this separation. Samubert96 (talk) 10:42, 27 November 2023 (UTC)Reply
They are indeed theoretically separate letters, but given how rarely they are employed, I believe the most sensible choice is to follow what most dictionaries do. I have not looked at many, but I've actually yet to find any dictionary, modern or old, using ڭ. Catonif (talk) 20:55, 29 November 2023 (UTC)Reply
I am by no means an expert on Turkish or Ottoman TK, but from what I understand certain consonants in OT simply condition nearby vowels to be back vowels (according to Wikipedia these consonants are ح خ ص ض ط ظ ع غ ق) and other consonants condition front vowels (ت س ك گ ه). Since we already mark the front and back vowels that these consonants indicate, I understand why we'd choose not to distinguish them in translit. However, I do think it's a bit strange to distinguish them in the Armenian script but not the OT Arabic script. That kind of inconsistency is kinda weird tbh.
For the most part I support your proposal, but I do think it's weird to have major transliteration differences between the Arabic and Armenian script. Even Hindi and Urdu try to be similar despite their alphabets being very different.

However, regarding your suggestion to lemmatize گ and ڭ to ك, I strongly disagree. I understand your reasoning but I think treating them the way Hindi treats nuqtaless forms is a better approach. - سَمِیر | Sameer (مشارکت‌ها · بحث) 22:22, 30 November 2023 (UTC)Reply
There are no transliteration differences between the Arabic and Armenian script. In Module:ota-Armn-translit I have implemented exactly the same rules as for Arabic script. The two should be kept in sync. The only out of sync feature now is the capitalization. Vahag (talk) 13:04, 1 December 2023 (UTC)Reply
@RagingPichu: did you read our discussion here before implementing the changes to Module:ota-Armn-translit? We are trying to synchronize the transliteration of the Arabic and Armenian scripts. Vahag (talk) 07:48, 19 April 2024 (UTC)Reply
I read this page but didn't reach this far down. The changes I made were for the sake of consistency anyway. I didn't see that ghayn was transliterated as ğ but rather as ġ, which is the basis of my change. My two cents on this issue would be to normalize the various kaf flavors into kaf, gaf, ñaf (with yaf being represented by kaf), and then to romanize each letter differently but with the base corresponding to Modern Turkish; i.e., use d ḍ g ġ h ḥ ẖ k ḳ n ñ s ṣ s̱ t ṭ z ẓ ẕ for د ض گ غ ه ح خ ك ق ن ڭ س ص ث ت ط ز ظ ذ respectively. Armeno-Turkish, which distinguishes k ḳ and g ġ with some consistency and n ñ in older texts, should, as mentioned, be transliterated the same. There might be room to use x instead of ẖ, since this letter isn't unknown to Turkish speakers, and it looks more pleasant anyway in my opinion. The circumflexes for palatalization are not useful if the letters are distinguished with a mark and the spelling normalized, but they should be retained for length, like in 'âciz. RagingPichu (talk) 11:16, 19 April 2024 (UTC)Reply
@RagingPichu: You also added a նկ, նղ > ñ rule, which is now generating üzeñisini in the transliteration of the quote at اوزنگی (üzengi). It is better to remove the automatic rule, otherwise all occurrences of նկ, նղ will have to be tracked and manually overridden, which is an unnecessary pain. The beauty of the Armenian script as opposed to Arabic is that no manual work is needed. Vahag (talk) 15:41, 19 April 2024 (UTC)Reply
I added that rule based on consistency too, actually. That beauty only works if the script is consistent, which, since նկ is ambiguous, it isn't. ZWNJ can prevent the automatic rule from applying in examples, which I tested in چرگه (çerge), but it might be less desirable. If removing the automatic rule and maintaining the ambiguity is better, though, then it's better. RagingPichu (talk) 23:28, 19 April 2024 (UTC)Reply
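For illustration, the one-to-one consonant scheme proposed above can be written as a lookup table. This is a rough sketch only, not the code of any existing module: the names `ARABIC`, `LATIN`, `CONSONANTS` and `romanize_consonants` are hypothetical, vowels and the normalization of kaf variants are left out, and any character not in the list is passed through unchanged.

```python
import unicodedata

# The two lists are copied in the order given in the comment above,
# so that the n-th Arabic-script letter pairs with the n-th Latin value.
ARABIC = "د ض گ غ ه ح خ ك ق ن ڭ س ص ث ت ط ز ظ ذ".split()
LATIN = "d ḍ g ġ h ḥ ẖ k ḳ n ñ s ṣ s̱ t ṭ z ẓ ẕ".split()

# One Latin value per letter, so the consonant mapping stays reversible.
CONSONANTS = dict(zip(ARABIC, LATIN))


def romanize_consonants(text):
    """Replace each listed consonant; leave everything else untouched."""
    out = "".join(CONSONANTS.get(ch, ch) for ch in text)
    # Normalize so that precomposed and combining forms compare equal.
    return unicodedata.normalize("NFC", out)
```

Because every letter gets its own Latin value, the table can be inverted, which is the property the merged Modern Turkish-based scheme gives up.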
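A toy version of the digraph rule and the ZWNJ workaround discussed here might look as follows. It is a sketch under stated assumptions and not the actual code of Module:ota-Armn-translit: the names `DIGRAPHS`, `LETTERS` and `translit_armn` are hypothetical, and the single-letter values (ն as n, կ as g, ղ as ġ) are assumed for illustration only.

```python
ZWNJ = "\u200c"  # zero-width non-joiner

# The automatic rule under discussion: both digraphs become ñ.
DIGRAPHS = {"նկ": "ñ", "նղ": "ñ"}

# Assumed single-letter values, for illustration only.
LETTERS = {"ն": "n", "կ": "g", "ղ": "ġ"}


def translit_armn(text):
    """Apply the digraph rule first, then the per-letter values."""
    for digraph, latin in DIGRAPHS.items():
        text = text.replace(digraph, latin)
    # A ZWNJ typed between the two letters has already kept the digraph
    # from matching, so it can now be dropped from the output.
    text = text.replace(ZWNJ, "")
    return "".join(LETTERS.get(ch, ch) for ch in text)
```

With these assumed values, նկ comes out as ñ, while the same two letters separated by a ZWNJ come out as ng, which is the manual override described above.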