Wiktionary talk:Language treatment

From Wiktionary, the free dictionary
Latest comment: 4 months ago by Vininn126 in topic RFM discussion: December 2023–January 2024

moving dialect codes to the etyl namespace

For cases where we consider the macrolanguage to be the individual language and the subdivisions to be dialects, I think we should move the subdivision language code templates to the etyl: namespace. Similar to language families & other dialects, this is where we house codes that should only be used in Etymologies and not as valid L2 languages. Sound fine? --Bequw¢τ 21:59, 20 January 2010 (UTC)

Treatment by SIL

I thought it may be interesting to post what SIL's (the Registration Authority for ISO 639-3) criteria are for determining if language varieties are dialects or distinct languages. It can be found on their Change Request Form (page 3).

For this part of ISO 639, judgments regarding when two varieties are considered to be the same or different languages are based on a number of factors, including linguistic similarity, intelligibility, a common literature (traditional or written), a common writing system, the views of users concerning the relationship between language and identity, and other factors. The following basic criteria are followed:

  • Two related varieties are normally considered varieties of the same language if users of each variety have inherent understanding of the other variety (that is, can understand based on knowledge of their own variety without needing to learn the other variety) at a functional level.
  • Where intelligibility between varieties is marginal, the existence of a common literature or of a common ethnolinguistic identity with a central variety that both understand can be strong indicators that they should nevertheless be considered varieties of the same language.
  • Where there is enough intelligibility between varieties to enable communication, the existence of well-established distinct ethnolinguistic identities can be a strong indicator that they should nevertheless be considered to be different languages.

We are of course independent of these, but they may be useful nonetheless. --Bequw¢τ 21:58, 21 January 2010 (UTC)

Allowing macro and non-standard dialects

Sometimes (as with Latvian and Estonian) we treat the subdivisions of a macrolanguage as individual languages, but we use the macrolanguage name/code in place of the "standard" dialect name/code. I just added this option to the table. Are there other macrolanguages where this is the case (possibly Arabic and Malay)? --Bequw¢τ 17:09, 23 January 2010 (UTC)

Aramaic

Apparently some have been treating "Jewish Babylonian Aramaic" (aka "Talmudic Aramaic", code=tmr) as a variety of Aramaic. Does anyone know if this is standard, or if this is true of other ISO 639-3 coded Aramaic varieties? --Bequwτ 18:34, 8 February 2010 (UTC)

"Apparently" should now link to an archive. —msh210 18:52, 15 February 2010 (UTC)

Chinese

I'd like to update the Chinese entry. Is there any way to just write in plain English, without passing through a template? Mglovesfun (talk) 16:41, 25 March 2010 (UTC)

Use the templates, please, because they standardize the possible texts, and standardization is good. Another way to contribute to the page is to type here what you need, so that I can update the table. --Daniel. 14:15, 22 April 2010 (UTC)
I've changed the table to a regular wikitable so that anyone can edit it and so that it can handle more complex situations and the presence of deleted codes. Cheers, - -sche (discuss) 21:05, 23 May 2013 (UTC)

Aramaic redux

Because at least one RFM is ongoing(?), I'll list this here rather than on the main page: oar (Old Aramaic, up to 700 BCE) is not used, as it has been superseded by arc and syc. tmr (Jewish Babylonian Aramaic, circa 200-1200 CE) is not used, as it has been superseded by arc and etyl:tmr. - -sche (discuss) 00:11, 16 July 2013 (UTC)

Montagnais/Innu

Currently, some main-namespace pages use Montagnais/Innu's language code (probably mostly in translations tables) while a few use other Cree dialects' language codes. Innu is different enough from Cree that Innu is regularly considered side-by-side with (rather than subordinated under) Cree; e.g. the Linguistic Atlas of Canada speaks of "different Cree and Innu dialects". OTOH, they're not that different, and splitting them at the L2 level would raise questions of what to do with e.g. Naskapi. I'm curious whether we should (a) allow Innu its own L2, (b) merge it completely into Cree, or (c) leave it subordinated under / merge into cr at the L2 level, but let it keep its code (it currently still has one, as no-one ever deleted it) so that it can be used in translations tables (like the Romani lects' codes). The translations could be nested under Cree/cr, or could be separate, sorted under M or I depending on which name we end up using for the lect. - -sche (discuss) 22:26, 20 July 2013 (UTC)

East Frisian: frs, stq

RFM 1

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits.

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


This is an old, old mistake in ISO. Both codes refer to the very same language, namely the Frisian dialect spoken in Saterland, which is an Eastern Frisian dialect. I have no idea how that was overlooked, but it means the two codes should be merged somehow. I'd prefer {{frs}}, since that one is in 639-2. -- Liliana 14:24, 17 October 2011 (UTC)

Should the language name be "East Frisian" or "Saterland Frisian"? I'd prefer to use the code "frs", but the name "Saterland Frisian". - -sche (discuss) 19:08, 17 October 2011 (UTC)
To me it seems like Saterland Frisian is the most common name, so we should probably use that. -- Liliana 19:14, 17 October 2011 (UTC)
Alright, frs = Saterland Frisian it is, but {{stq}} is in fact widely used — someone will need to replace it by bot. - -sche (discuss) 23:50, 23 October 2011 (UTC)
Or a really bored person like me needs to spend an hour or two. -- Liliana 00:12, 24 October 2011 (UTC)
But what about etymologies involving Eastern Frisian, at the time it still existed? With no code, how should they be entered? Or even, how should the etymologies that already exist be fixed? —CodeCat 11:23, 24 October 2011 (UTC)
If it warrants a distinction, it should get one of these constructed codes. It isn't covered by the code frs anyway, which ISO classifies as a "living" language, not an extinct one. -- Liliana 12:13, 24 October 2011 (UTC)
East Frisian isn't really extinct strictly, but the only surviving instance of it is now called Saterland Frisian. —CodeCat 16:11, 24 October 2011 (UTC)
This doesn't explain why ISO assigned two codes to one language. We do not have that for any other language of the world. Using frs for a different language than what ISO intended would make a precedent case, and almost certainly require a vote.
Another problem is that the current name "East Frisian" is really confusing, since there's an (unrelated) Low German dialect which is also called East Frisian. So in any case, you would have to sort out the erroneous uses. -- Liliana 16:15, 24 October 2011 (UTC)
I agree with Liliana, we need a separate code of our own for non-Saterland varieties of East Frisian (or we need to clearly indicate that we are using "frs" to refer to a language other than the one the ISO refers to as "frs"). If a word is derived from a variety of East Frisian other than the one the ISO calls "stq", it cannot be derived from what the ISO calls "frs", because "frs" is living, and the only living East Frisian lect is "stq". - -sche (discuss) 00:34, 26 October 2011 (UTC)


Proposed additions / clarifications

These are all from translation tables, which I will edit to reflect consensus for any of these cases:

  • Macro languages:
    • Chinese: dng, ltc, och
    • Sorbian: dsb, hsb
    • Apache: apw, apm, apj, apl, apk
    • Sami: smn, smj, sms, sma, se
    • Frisian: fy, ofs, frr
    • Berber: shi
    • Marquesan: mrq, mrm
  • Dialects / script group:
    • sq: als does not exist any more, change to just Tosk
    • cop: Bohairic, Sahidic, Fayyumic
    • lt: Aukštaitian
    • ms: Rumi, Jawa
    • sc: Nugorese
    • tly: Asalemi, Anbarani, Masali
    • sh: Cyrillic, Roman, Arebica, Latin
    • arc: Hebrew, Syriac
    • ks: Arabic, Devanagari
    • cu: Cyrillic, Glagolitic
    • ro: mo no longer exists; Latin, Cyrillic
    • os: Digor, Iron
    • kea: Badiu, São Vicente, ALUPEC, Sotavento, Barlavento, Santo Antão
    • az: Cyrillic, Roman, Perso-Arabic, Arabic, Persic
    • avd: Vidari
    • egy: Archaic Egyptian, Old Egyptian, Middle Egyptian, Late Egyptian
    • tt: Cyrillic, Roman
    • lad: Roman, Hebrew, Latin
    • pa: Gurmukhi, Shahmukhi (has its own code?)
    • nso: Sepedi
    • vot: Roman, Cyrillic
    • rom: table says that rmc, rmf, rml, rmn, rmo, rmw, rmy are deprecated but they still exist in the languages module
    • kw: Kernewek Kemmyn
    • be: Cyrillic, Roman, Narkamaŭka, Taraškievica, Tarashkevitsa
    • tg: Cyrillic, Persic, Roman
    • ug: Persic, Roman, Cyrillic, Perso-Arabic
    • uz: Cyrillic, Roman, table says that uzn and uzs are deprecated but they still exist in the languages module
    • zza: Persic, Roman
    • ko: South, North
    • fia: Fadicca, Kenzi
    • cr: some codes are deprecated but still in languages module
    • lmo: Eastern, Western, Milanese
    • ms: Rumi, Jawi, Latin, Arabic
    • la: New Latin
    • pi: Burmese, Devanagari, Latin
  • Other:
    • ar: xaa, mey
    • fr: frm, fro
    • de: ksh, gsw
    • nds: deprecated but still in languages module, add nds-de, nds-nl
    • mn: cmg
    • es: osp
    • hy: xcl
    • pnb: pa
    • id: ace, ban, bjn, bug, jv, mad, mak, min, nia, sas, su
    • ga: sga, pgl
    • fy: stq
    • arc: syc
    • tt: crh
    • ko: oko, okm
    • rom: rmq
    • pl: zlw-opl
  • pl: zlw-opl

I apologize if this is in an inconvenient format; rearrange it as you like. DTLHS (talk) 00:44, 20 August 2013 (UTC)

Nice. Some additional things that I noticed after a quick read: okm should be under ko, pgl should be under ga, zlw-opl should be under pl, there are tons of missing Arabic sublects that should be under ar, and grc (and possibly some other lects) should be under el. —Μετάknowledgediscuss/deeds 02:40, 20 August 2013 (UTC)
grc is already under el on the page. What Arabic sublects aren't in my list or the existing table? DTLHS (talk) 03:08, 20 August 2013 (UTC)
Never mind. Only mt, which shouldn't be under ar anyway (well, linguistically it should, but not sociopolitically). —Μετάknowledgediscuss/deeds 04:04, 20 August 2013 (UTC)

Use title text for the language names?

A lot of the language codes in the table don't have a name next to them, but if we added the name it would become very hard to see. Would it be useful to turn it into title text, so that the name is shown when you hover the mouse over the code? —CodeCat 19:36, 25 August 2013 (UTC)

Hmm. One downside to that is that it would no longer be possible (would it?) to hit Ctrl+F and search the page for a particular dialect's name. Given that one of the reasons this page exists is so that people can see if the reason we don't have a code is because we've merged it into something else (vs we just haven't added it yet), that's a significant downside. - -sche (discuss) 05:15, 26 February 2014 (UTC)

RFC discussion: May 2013

The following discussion has been moved from Wiktionary:Requests for cleanup.

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


This is causing some script errors because some of the codes have since been deleted. I'm not sure what to do about that. —CodeCat 13:34, 23 May 2013 (UTC)

It needs to be redesigned so that the table can contain/mention codes that have been deleted, for the reason you mention and several other reasons. - -sche (discuss) 19:03, 23 May 2013 (UTC)
I've started redoing the table. - -sche (discuss) 19:30, 23 May 2013 (UTC)


List of codes the ISO has retired

This was previously at User:-sche/retired codes, but I think it is useful to have it in the Wiktionary: namespace. - -sche (discuss) 23:53, 21 December 2014 (UTC)

Retired codes which were not used on Wiktionary in February 2014

Codes which were retired from the ISO and which were not used on Wiktionary as of February of 2014. (Since then, several other codes which were retired from the ISO by that date have also been retired on Wiktionary; see the following sections.)

Retired codes which have been discussed since February 2014

Please see Wiktionary:Language treatment/Discussions#Various_code_retirements.2C_part_one (Wiktionary:Beer parlour/2014/February#Codes_the_ISO_has_split_or_merged_.28first_batch.29) and Wiktionary:Language treatment/Discussions#Various_code_retirements.2C_part_two (Wiktionary:Beer parlour/2014/March#Codes_the_ISO_has_split_or_merged_.28second_batch.29).

Retired codes which are still used on Wiktionary

Some codes which were retired from the ISO but which are still used on Wiktionary. (This list is not necessarily comprehensive.) Some codes in the list have been discussed, and these have been intentionally retained: sh "Serbo-Croatian", gio "Gelao", kzh "Dongolawi" / "Kenuzi-Dongola", mnt "Maykulan". Meanwhile, these have not yet been discussed.

List of ISO 639 codes absent from Wiktionary

Most of the 7865 codes present in ISO 639 are present on Wiktionary; most of those which are not are recorded on WT:LT. The only ones which have slipped between those two cracks are these, which should be investigated and discussed in the coming weeks. In many cases, the exclusion is likely nothing more than an oversight; in some cases, it's clearly because a naming conflict prevented importation of the codes back when Wiktionary bot-imported ISO 639 en masse (something we can now solve with disambiguators):

  1. cek — Eastern Khumi Chin - a dialect of cnk (Khumi Chin)
  2. dda — Dadi Dadi
  3. dgw — Daungwurrung
  4. dja — Djadjawurrung
  5. deq — Dendi (Central African Republic) - presumably failed to be included because of the naming conflict with ddn — Dendi (Benin)
  6. dmd — Madhi Madhi (Muthimuthi)
  7. dth — Adithinngithigh - compare rrt, which is said to be a different language
  8. dty — Dotyali
  9. gku — ǂUngkue
  10. gll — Garlali
  11. gpe — Ghanaian Pidgin English - probably to be combined with other African Pidgin English (see RFM)
  12. gwm — Awngthim
  13. gmz - Mgbo
  14. hna — Mina (Cameroon) - presumably failed to be included because of the naming conflict with myi Mina (India), which, however, is spurious
  15. ihw — Bidhawal - a dialect of/with unn
  16. jan — Jandai
  17. jbi — Badjiri - possibly not even Karnic; cf my notes about ekc above and on User:-sche/retired codes
  18. jbk (Barikewa) and jmw (Mouwase) — varieties of {{mgx}} Omati/Mini, said to be quite divergent from each other: but we should either have mgx or have jbk+jmw, not all three
  19. jbw — Yawijibaya
  20. jgk — Gwak
  21. jjr — Bankal
  22. jms — Mashi (Nigeria)
  23. jog — Jogi
  24. jui — Ngadjuri
  25. kbn — Kare (Central African Republic)
  26. kmf — Kare (Papua New Guinea)
  27. kol — Kol (Papua New Guinea)
  28. myi — Mina (India) (see hna)
  29. nmx — Nama (Papua New Guinea)
  30. npg — Ponyo-Gongwang Naga
  31. nqy — Akyaung Ari Naga
  32. nsf — Northwestern Nisu
  33. ntx — Tangkhul Naga (Myanmar)
  34. nwg — Ngayawung
  35. nxk — Koki Naga
  36. oke — Okpe (Southwestern Edo)
  37. okx — Okpe (Northwestern Edo)
  38. olk — Olkol
  39. orc — Orma
  40. pnl — Paleni
  41. ptq — Pattapu
  42. sfe — Eastern Subanen
  43. sgj — Surgujia - Suraji, Surguja, Surgujia-Chhattisgarhi, Surjugia
  44. sim — Mende (Papua New Guinea)
  45. sng — Sanga (Democratic Republic of Congo)
  46. sox — Swo
  47. spb — Sepa (Indonesia)
  48. tcl — Taman (Myanmar) - (extinct)
  49. tgj — Tagin
  50. tgz — Tagalaka - (extinct)
  51. tjl — Tai Laing
  52. tmn — Taman (Indonesia)
  53. tnz — Tonga (Thailand)
  54. tst — Tondi Songway Kiini
  55. xsn — Sanga (Nigeria)
  56. xud — Umiida - (extinct)
  57. xun — Unggaranggu - (extinct)
  58. xyy — Yorta Yorta
  59. yhs — Yan-nhaŋu Sign Language - signed by 10 people, not that distinct from ygs (exclude?)
  60. ykn — Kua-nsi
  61. yku — Kuamasi
  62. ysg — Sonaga
  63. yxy — Yabula Yabula - (extinct)

(This list is complete as of August 2015, before the 2015 change requests were finalized. Notes and misc.) - -sche (discuss) 15:47, 11 August 2015 (UTC)

Codes in the above list which have been added to Module:languages or WT:LT or otherwise dealt with have been struck. - -sche (discuss) 03:11, 21 August 2016 (UTC)

Bidhawal

The ISO added a code for Bidhawal, which we never got around to adding. That seems to be OK; Robert M. W. Dixon says in Australian Languages: Their Nature and Development (2002) that "Bidhawal appears not to constitute a separate language, but rather to be the most eastern dialect of Q, Muk-thang (or Kurnai). The grammatical forms given by Mathews for Bidhawal are almost identical to those for Muk-thang, as are most of the verbs and a good proportion of nouns." - -sche (discuss) 03:02, 21 August 2016 (UTC)

Treatment of reconstructed languages?

We merged Proto-Finno-Ugric and Proto-Finno-Permic into Proto-Uralic, and Proto-Baltic into Proto-Balto-Slavic. The original languages remain as etymology codes. Should this be mentioned here? —CodeCat 18:48, 21 August 2015 (UTC)

Sure. Maybe in a separate table, though? Since those aren't cases where we deprecated, split, or broadened an ISO code, but rather cases where we assigned a code of our own devising and then went "wait, on second thought, nah". - -sche (discuss) 19:10, 21 August 2015 (UTC)

Akan and its subdivisions

As for Akan, we currently find that both the macrolanguage and its subdivisions are treated as languages, though Category:Fanti language and Category:Twi language were merged previously. It seems that we have to modify the description. How's that? --Eryk Kij (talk) 22:53, 26 May 2016 (UTC)

Like so; thanks for pointing out that this page still needed to be updated. - -sche (discuss) 23:21, 26 May 2016 (UTC)

ISO code changes 2018

Some codes have been merged or retired following the ISO's 2018 code changes; these changes are not necessarily recorded on WT:LT because the codes in question were not just merged or retired by Wiktionary but by the ISO. See Wiktionary:Beer parlour/2019/February#2018_ISO_code_changes for a list. - -sche (discuss) 00:07, 24 February 2019 (UTC)

Scope of the page

I think we need to be more specific in terms of what we mean by language treatment. It should only apply to how languages are treated for entry making, but not anywhere else. For example, we allow many etymology-only languages in etymologies. See Wiktionary talk:About Chinese#Language treatment: Only the macrolanguage is treated as a language? — justin(r)leung (t...) | c=› } 04:52, 9 June 2020 (UTC)

Oh my~ got to extirpate the remnants of truth from Wiktionary! This is just about pretending Chinese is a language when we know it's a macrolanguage. Just treat Chinese like any other macrolanguage group. So sad. --Geographyinitiative (talk) 05:05, 9 June 2020 (UTC)
@Geographyinitiative: Many macrolanguages are treated the same way Chinese is. Have you even read the page? — justin(r)leung (t...) | c=› } 05:21, 9 June 2020 (UTC)
If I may inquire, what other macrolanguage group is treated like Chinese is treated on Wiktionary? If you can give me a good answer on this, I could be much more convinced that the current system for covering Chinese languages on Wiktionary is not a disaster. --Geographyinitiative (talk) 05:24, 9 June 2020 (UTC)
@Geographyinitiative: Zhuang is one of among many. Please see the main page - any macrolanguage in the table that is marked with "Only the macrolanguage is treated as a language" would be the same situation (more or less). — justin(r)leung (t...) | c=› } 05:35, 9 June 2020 (UTC)
@Justinrleung: A rhetorical question: How many times the sad geography troll should be spoiled with responses so he stops treating Wiktionary and its editors as a complete disaster? --Anatoli T. (обсудить/вклад) 07:20, 9 June 2020 (UTC)
If I may ask, what are the different Zhuang languages? Are there any other macrolanguage groups not associated with Chinese characters or influenced by Chinese politics that are not split up by language? I think every language should have its own header on Wiktionary, don't you? Atitarev, please don't hate me man! I am bringing a perspective that represents the opinions of many others and I am trying to make honest inquiries about really important things. There were no Wade Giles or Tongyong Pinyin derived geo terms before I came here, and I helped add an important perspective which was being neglected. I am a 'troll' because I bring an outsider perspective, but I am not a troll because I am actively working and negotiating to make the dictionary better with tangible results. Geographyinitiative (talk) 22:43, 20 June 2020 (UTC)
You don't bring anything new. All valid forms are welcome and nobody blocked any language or any script or dialect or transliteration scheme. Yes, that includes Wade-Giles, Tongyong Pinyin, Min Nan in Chinese characters and Min Nan in POJ. Your conspiracy theories have no grounds at all. Bring your perspective, but don't poison people's minds about the achievements of this site. You don't raise any awareness, everyone is aware of what's out there. You just don't want to see it. You're slinging dirt around, then apologise or start praising people, which I find hypocritical. You talk a lot about your own achievements but nobody does it here, this is called narcissism. If there is not enough coverage for anything, then there were not enough contributors. Languages are somewhat like currencies. If the value of the currency of a small third-world country is low, nobody is interested in it, but people of that country have to use it. Even if you do add Wade-Giles, Tongyong Pinyin, you pose it as an opposition to Mandarin and Hanyu Pinyin domination, which you blame this site for, not accepting the reality but it's still someone's fault, isn't it? And you keep blaming someone and no-one in particular for that. Everything is doable and achievable. You want to make the distinction between Min Nan and Mandarin, just do it within the existing infrastructure. Nobody stops you from defining specific senses, usage examples, etc. You want to add alternative English spellings, varieties of Chinese, go ahead, just do it in a positive way. Stop blaming everyone or the site. You just turn away people from your cause. All the work is welcome, if it's not breaking agreed conventions or rules. In short, add your Wade-Giles forms, POJ, Min Nan, whatever, but start making sense, stop attacking pinyin, Mandarin, this site, etc.
The Zhuang situation is a good example of a macrolanguage but it's harder to demonstrate at Wiktionary as the Zhuang coverage is very low at Wiktionary. The unified approach for other languages, other than Chinese, is better demonstrated by Serbo-Croatian, which combines two to four different standards, depending on how you count - Croatian, Serbian, Bosnian and Montenegrin, two scripts - Cyrillic and Roman (Latin), two major dialects - Ekavian and Ijekavian (+Kajkavian). I don't want to cause more trolling from you but Serbo-Croatian "unification" had much stronger opposition and hate. You can imagine the passions after the Yugoslav war where language identity was a reason to be shot at or imprisoned. Nevertheless, at Wiktionary, the scientific and technical reasoning prevailed over hate. Don't imagine for a second that Chinese varieties and Serbo-Croatian standards and dialects are comparable. No way. They are not. Chinese varieties are mostly not mutually comprehensible. However, the rationale for the unified approach was presented and it won. You won't achieve anything by whinging and trolling negative messages. Yes, I consider your mentioning at every opportunity that this site may be a complete disaster, or similar, to be trolling. --Anatoli T. (обсудить/вклад) 23:37, 20 June 2020 (UTC)
Zhuang lects. (Whether these pronunciations actually belong on the raemx entry is a different matter) —Suzukaze-c (talk) 23:39, 20 June 2020 (UTC)
Let me mull it over a little more. However, I think that it would be wildly difficult to reach the conclusion "the Chinese macrolanguage header, including all language in modern China's borders from Cangjie to to-day, should be portrayed as equivalent to the Danish, Norwegian, Swedish, English, German, French etc headers, implying they are all equally "languages"." I would that in my worst case scenario some kind of further disclaimer should be added automatically to every page that has the "Chinese" header so we know it means "any Chinese characters used in China since the Shang dynasty til today, including numerous unintelligible dialects with independent Wikipedia versions". What an expansive header it is! --Geographyinitiative (talk) 00:20, 22 June 2020 (UTC)
Wiktionary doesn't have to apologise on every page on how it works. The votes on Chinese and Serbo-Croatian unifications defined the dictionary policies, which is or should be mentioned on appropriate About pages. If it's too hard to accept, which is understandable, you have two options: 1. get a new vote and win it or 2. leave, which was the case with some unhappy Croatian and Serbian editors. We didn't have a precedent in my memory with unhappy Chinese editors wanting to reverse the change and leaving, you may be the first. If you decide to stay, my personal advice is, you have to stop complaining at every opportunity in talk pages or edit summaries, bite the bullet and contribute in your favourite area, including enhancing dialectal coverage, strive to make it work, so that all promises in the vote to adequately cover all Chinese varieties are kept. --Anatoli T. (обсудить/вклад) 00:53, 22 June 2020 (UTC)

Stale but unresolved discussions of languages to add or remove

Because they are so stale, I am unilaterally moving these off the RFM page because that page has grown too massive (800,000 bytes) to be usable; however, because they are unresolved, I don't want to hide them away on WT:LTD... so here they go... - -sche (discuss) 06:52, 28 December 2023 (UTC)

RFM discussion: July 2016–October 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


Even more languages without ISO codes, part 6

This next batch is of languages from lists other than Ethnologue and LinguistList. As before, I've tried to vet them all beforehand, but I will have doubtlessly made some mistakes. NB if you want to find more: I've avoided dealing with most of the Loloish languages, because all the literature seems to be in Chinese. —Μετάknowledgediscuss/deeds 04:54, 6 July 2016 (UTC)

Australian languages

Tasmanian and other

Northeastern Tasmanian:
  • Northeastern, Pyemmairre language (aus-pye)  Done
    alt names/varieties: Plangermaireener, Plangamerina, Cape Portland, Ben Lomond, Pipers River
  • North Midlands, Tyerrernotepanner language (aus-tye) — Bowern considers this a dialect; perhaps we should just trust her
    now has an ISO code which should be added instead, see BP shortly - -sche (discuss) 04:27, 14 October 2020 (UTC)
  • Lhotsky/Blackhouse Tasmanian language (aus-lbt) — the worst name in Bowern's set!
    I'm not sure... the very language is "reconstructed" by Bowern on the assumption that three wordlists (of which only two make it into the name) attest the same language, although apparently none of the three bothered to name the language. The chance of someone "would run across [a word in] it and want to know what it means" seems nonexistent. If we wanted to host the wordlists, we could do that in an appendix or on Wikisource. - -sche (discuss) 16:09, 9 August 2016 (UTC)
Bowern's methods are scientific; but I would feel better if more than one scholar was saying there was one language in this set of wordlists, the way that for e.g. Port Sorrell, Dixon & Crowley and Glottolog agree that there is a unit/lect there. - -sche (discuss) 16:55, 4 June 2017 (UTC)
and what of "Norman Tasmanian"? - -sche (discuss)


RFM discussion: August 2016

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


Some more missing American languages

Here are a few more North American languages for which we could add codes:

  • Akokisa (nai-ako). WP says it is attested certainly in two words in Spanish records (Yegsa "Spaniard[s]", which Swanton suggests is similar to Atakapa yik "trade" + ica[k] "people"; and the female name Quiselpoo), and possibly in more words in a wordlist by Jean Béranger in 1721 (if the wordlist is not some other language).
  • Algonquian–Basque pidgin (crp-abp). Wikipedia has a sample. The Atlas of Languages of Intercultural Communication, citing Bakker, says it was spoken from at least 1580 (and perhaps as early as 1530s) through 1635, and "only a few phrases and less than 30 words attributable to Basque were written down" (though apparently more words, attributable to other sources, were also recorded).
  • Guachichil (Cuauchichil, Quauhchichitl, Chichimeca) (nai-gch or, if Guachí is added as sai-gch, perhaps nai-gcl to prevent the two similarly-named lects from being mixed up by only typoing the initial n vs s), apparently sparsely attested.
  • Concho (nai-cnc). The Handbook of North American Indians, volume 10, says "three words of Concho [...] were recorded in 1581 [and] look like they may be [...] Uto-Aztecan".
  • Jumano (Humano, Jumana, Xumana, Chouman, Zumana, Zuma, Suma, and Yuma) (nai-jmn). The Handbook says "It has been established that the Jumano and Suma spoke the same language. Three words have been recorded" of it.

and from South America:

  • Peba / Peva (sai-peb), said by Erben to more properly be called Nijamvo, Nixamvo. Spoken in "the department of Loreto" in Peru. Attested in wordlists by Erben and Castelnau, which Loukotka provides, and which disagree with each other substantially: munyo (Erben) / money (Castelnau) "canoe, small boat"; nero (E) / yuna (C) "demon"; nebi (E) / nemey (C) "jaguar"; teki (E) / tomen-lay (C) "one", manaxo (E) / nomoira (C) "two"; etc. I would even consider that one might not be the same language as the other... what's with these languages that survive in disparate wordlists? lol.
  • possibly Saynáwa: fr.Wikt grants a code to this variety of Yaminawá language, described here (see also [1]).

- -sche (discuss) 04:04, 16 August 2016 (UTC)

Support all except possibly Akokisa. I think it's a dialect of Atakapa, and that the wordlist is very likely not being linked correctly. That said, it's so few words, that there's no real reason not to accept it as a separate language, just to be conservative about it. —Μετάknowledgediscuss/deeds 04:08, 16 August 2016 (UTC)
Good point about Akokisa. (I am reminded that you had mentioned its dialectness earlier; sorry I forgot!) The wordlist, labelled only with a tribal name per WP, is possibly plain Atakapa, but Yegsa is supposedly recorded as specifically Akokisa; OTOH that doesn't rule out that Akokisa is a dialect. Indeed, M. Mithun's Languages of Native North America treats as dialects Akokisa, Eastern ("the most divergent, [...] known from a list of 287 entries") and Western ("the best documented. Gatschet recorded around 2000 words and sentences, as well as texts [...] Swanton recorded a few Western forms", all published in 1932 in a dictionary). I suppose the benefit to treating it as a dialect would be that we could context-label Yegsa and Quiselpoo as {{lb|aqp|Akokisa}} and then Béranger's forms as {{lb|aqp|possibly|Akokisa}} without needing to agonize over which header to put them under. - -sche (discuss) 15:31, 16 August 2016 (UTC)


RFM discussion: April 2017–October 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


More unattested languages

The following languages have ISO codes, but those codes should be removed, as there is no linguistic material that can be added to Wiktionary. This list is taken from Wikipedia's list of unattested languages, but I have excluded languages which are not definitively extinct (and thus which may have material become available). If there was any reliable source I could find corroborating the WP article's claim of lack of attestation, it is given after the language. —Μετάknowledgediscuss/deeds 04:15, 4 April 2017 (UTC)

  • Aguano language [aga]
    Unclear if it even existed per The Indigenous Languages of South America: A Comprehensive Guide (Campbell and Grondona).
  • Giyug language [giy]
    AIATSIS has the following to say: "According to Ian Green (2007 p.c.), this language probably died before the 1920's and neighbouring groups in the Daly claim it was the language of Peron Island which was linguistically and perhaps culturally distinctive from the nearby mainland societies. Black & Walsh (1989) say that this may or may not have been a dialect of Wadiginy N31." —Μετάknowledge
    The 1992 International Encyclopedia of Linguistics, v. 1, p. 337, says "Giyug: 2 speakers reported in 1981, in the Peron Islands in Anson Bay, southwest of Darwin." The 2003 edition repeats the claim that "2 speakers remain". Wikipedia says it's extinct and unattested, but Glottolog, although having no resources on it, suggests it's not extinct. Might be best to leave it alone for now. - -sche (discuss) 01:13, 6 August 2020 (UTC)
  • Mawa language (Nigeria) [wma] (We call this "Mawa"; if it is removed, [mcw] Mahwa (Mawa language (Chad)) can be renamed to the evidently more common spelling "Mawa".)
    Removed, and mcw renamed. Glottolog had only one reference to support the existence of Mawa, Temple (1922), which does not even include a section under that header. There may be confusion with the section on the "Marawa", but that does not even mention what language those people speak. (Temple also knows very little about linguistics; while skimming through, I found that Margi (a Chadic language) was said to be similar to the languages of South Africa.) —Μετάknowledgediscuss/deeds 01:39, 6 August 2020 (UTC)
  • Nagarchal language [nbg]
    Appendix I in The Indo-Aryan Languages records this language as a subdialect of Dhundari [dhd], and the 1901 Indian Census concurs; this is at odds with its description as an unattested Dravidian language, but the geographical specifications seem to match up.
  • Ngurmbur language [nrx]
    AIATSIS says: "Harvey (PMS 5822) treats Ngomburr as a dialect of Umbukarla N43, but in Harvey (ASEDA 802), it is listed as a separate language." Nicholas Evans confirms in The Non-Pama-Nyungan Languages of Northern Australia that it is unattested.
  • Wasu language [was]
    Unclassified due to its absence of data per The Indigenous Languages of South America: A Comprehensive Guide (Campbell and Grondona).


RFM discussion: May 2017–October 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


Some spurious languages to merge or remove, 2
remove Adabe [adb]

Geoffrey Hull, director of research for the Instituto Nacional de Linguística in East Timor, notes (in a 2004 Tetum Reference Grammar, page 228) that "the alleged Atauran Papuan language called 'Adabe' is a case of the mistaken identity of Raklungu," a dialect (along with Rahesuk and Resuk) of Wetarese. He notes (in The Languages of East Timor, Some Basic Facts) that only Wetarese is spoken on the island, and Studies in Languages and Cultures of East Timor likewise says "The three Atauran dialects—with the northernmost of which the dialect of nearby Lirar is mutually intelligible—are unquestionably Wetarese, and not dialects of Galoli, as Fox and Wurm suggest for two of them (n. 32). The same authors refer (ibidem) to a supposedly Papuan language of Atauro, the existence of which appears to be entirely illusory." (The error appears to have originated not with Fox and Wurm but with Antonio de Almeida in 1966.) - -sche (discuss) 01:45, 31 May 2017 (UTC)

We could repurpose the code into one for the three Atauran varieties of Malayo-Polynesian Wetarese: Rahesuk, Resuk, and Raklu Un / Raklungu (the last of which Ethnologue does list as an alternative name of adb, despite its erroneous family assignment of that code), perhaps under the name "Atauran Wetarese" for clarity. - -sche (discuss) 01:52, 31 May 2017 (UTC)
remove Agaria [agi]

Glottolog makes the case that this is spurious. - -sche (discuss) 07:57, 31 May 2017 (UTC)

Arma

Arma (aoh) is also said to be "a possible but unattested extinct language"; I am trying to see if that means it is entirely unattested, or if there are personal/ethnic/place names, etc. - -sche (discuss) 09:45, 3 June 2017 (UTC)

Removed, see Wiktionary:Beer_parlour/2020/October#2019-2020_ISO_code_changes. - -sche (discuss) 06:18, 14 October 2020 (UTC)
Aghu language

The VU Amsterdam report linked to here seems to indicate that one lect has been given multiple codes, and that "Jair" at least is spurious. Further research wouldn't hurt. —Μετάknowledgediscuss/deeds 00:24, 3 October 2019 (UTC)

RFM

Splitting Selkup

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


{{ping|Surjection|Tropylium|Kaarkemhveel}} After some deliberation with Kaarkemhveel and two other future Selkup editors, we have come to the conclusion that it's best to split Selkup into two codes: Northern Selkup (sel-nor) and Southern Selkup (sel-sou) [the exact form of the codes is up for debate], which will both be part of the Selkup family (sel).

These two dialect areas are so different that treating them as a single language would be too bothersome. All subdialects are going to be marked with labels, and provided as languages in descendants sections (much like the two Karelian proper varieties are, or the Zyrian dialects).

The two branches are often named as different: Glottolog splits Selkup into "Kety-Central-Southern Selkup" (Southern) and "Taz-Turukhan" (Northern); The Oxford Guide to the Uralic Languages also shows a split between "Northern Selkup" and "Tomsk region Selkup" (p.778). A few more examples of papers that do this include Wurm (1997), Budzisch (2015), Vorobeva et al. (2017)...

There is precedent for treating these as different languages: ELP splits the family into three full-fledged languages ([2] [3] [4]). On the pages there is the following reasoning for this split: "The three main varieties of Selkup have traditionally been counted as dialects of a single language; their differences are, however, comparable to those between, for instance, Ket, Yug, and Pumpokol".

The Russian institute RAN also splits Selkup into Northern and Southern, as two full-fledged languages.

So, does anyone have an issue with this split? Thadh (talk) 11:04, 19 June 2023 (UTC)

No opposition, as there are clear differences, both lexical and cultural. Tollef Salemann (talk) 11:14, 19 June 2023 (UTC)
The Wikipedia article also mentions a Central Selkup. What are you doing with that one? Does it belong to Southern Selkup? —Mahāgaja · talk 14:03, 19 June 2023 (UTC)
Yes, that one will then be handled as Southern Selkup, just like it is by the above sources. Thadh (talk) 14:12, 19 June 2023 (UTC)
No opposition on this much; Northern Selkup is by now clearly distinct from non-Northern Selkup and has its own literary standard. Bridging historical data exists but would probably be better handled in Proto-Selkup entries anyway; almost all of it is field records rather than direct literary use by the speaker community.
Depending on how work on non-Northern Selkup develops, further division could eventually be meaningful too. The other recent handbook, Routledge's The Uralic Languages, Second Edition, discusses things from a primarily tripartite Southern / Central / Northern perspective and notes that, though the sharpest modern boundary is Central vs. Northern, the most taxonomically significant difference is Southern vs. {Central, Northern}. I believe Southern is currently better documented than Central, but the latter is what still has some attempts at literary usage and revival. --Tropylium (talk) 14:48, 19 June 2023 (UTC)
Done. Cleanup is ongoing. Thadh (talk) 20:01, 28 June 2023 (UTC)


RFM discussion: December 2023–January 2024

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


Rusyn

Pinging (possibly) interested users, as always, feel free to ping more: @Vininn126, Sławobóg, Chernorizets, Mahagaja, -sche, Benwing2, Atitarev.

I propose we split Carpathian and Pannonian Rusyn into two codes (rue and rsk respectively, in line with their ISO 639-3 codes), and then set Old Slovak as the ancestor of Pannonian Rusyn. I have made a list of typical Slavic developments on User:Thadh/Rusyn and given both a Pannonian Rusyn form (from Ramač 1995, Српско-русински речник) and a Carpathian Rusyn form (from Kercha 2012, Словник русько-русинськый). I think this proves beyond much doubt that Pannonian Rusyn belongs to the West Slavic group, and specifically to the Slovak dialects, while Carpathian Rusyn is part of the East Slavic group. This view is also supported by many scholars. Thadh (talk) 13:28, 14 December 2023 (UTC)

Support Sławobóg (talk) 13:36, 14 December 2023 (UTC)
@Thadh would it be possible to add an Eastern Slovak column to your tables (presumably the variety of Slovak that Pannonian Rusyn would be closest to) for comparison? I'm not sure how much extra work that would be, but if it's not a huge amount, it would be helpful. Chernorizets (talk) 13:44, 14 December 2023 (UTC)
Unfortunately, I don't have an Eastern Slovak dictionary at hand, but if anyone does, they're encouraged to add the forms! Thadh (talk) 13:59, 14 December 2023 (UTC)
Strong support. The reflexes are clear, there are language codes, and it's the right moment to do this as Rusyn isn't highly developed yet, so splitting will be easier. Vininn126 (talk) 13:49, 14 December 2023 (UTC)
Thanks for pinging me. I don't have enough background in Rusyn to wager a strong opinion here. Benwing2 (talk) 21:39, 14 December 2023 (UTC)
Support @Thadh: Does Pannonian Rusyn completely lack native pleophony (polnoglasie), or are all such forms late borrowings? E.g. Pannonian/Carpathian брег (breh) vs бе́рег (béreh) and злато (zlato) vs зо́лото (zóloto). If yes, then it looks like it can't belong to the East Slavic languages. I support tentatively, but I don't have much knowledge of Pannonian. Anatoli T. (обсудить/вклад) 22:31, 14 December 2023 (UTC)
@Atitarev: Yes (which is kind of the point). Similarly reflexes of PS palatals, strong yers, and other things. Everything points to Pannonian being West Slavic and Carpathian being East Slavic. Thadh (talk) 22:34, 14 December 2023 (UTC)
@Thadh: I see, thanks. I have yet to digest other differences.
Pannonian examples гарло (harlo) and дороги (dorohy) kind of contradict the overall differences, no? Anatoli T. (обсудить/вклад) 23:26, 14 December 2023 (UTC)
@Atitarev: The language has been influenced quite intensively by Czech, Ruthenian (> Ukrainian/Rusyn), Hungarian and Serbo-Croatian for the last two hundred years, so some inconsistencies due to borrowings are expected. For гарло, this might be a language-specific innovation (I can imagine grdl- and -rdl- overall not being a very easy cluster, and for this specific example Slovincian also does some simplification). дороги is undoubtedly a borrowing, though. Thadh (talk) 23:36, 14 December 2023 (UTC)
@Thadh: I think it's worth addressing possible loanwords for your case (e.g. дороги, etc.). Compare English, which has more Romance words than native ones, and Korean, which has more Sinitic words than native ones; this doesn't change their language-family membership. These languages are well described, though, whereas for Pannonian Rusyn we need to make it explicit, IMO, in case someone questions it. Anatoli T. (обсудить/вклад) 23:49, 14 December 2023 (UTC)
@Atitarev I think the words chosen are unlikely to have been borrowed. Or at least there are enough that are unlikely to have been borrowed that it's even more unlikely that we chose only borrowed words. Vininn126 (talk) 09:49, 15 December 2023 (UTC)

It's been a month and there's been overall support for this. I'm going to mark this thread as closed and lang codes for Carpathian and Pannonian should be assigned. Vininn126 (talk) 12:49, 14 January 2024 (UTC)