Wiktionary:Beer parlour/2024/May


Arabic and Hebrew transliteration

Wiktionary currently transliterates the glottal stop in both Arabic and Hebrew as ʔ and the voiced pharyngeal fricative in both languages as ʕ. Would it be possible to correct these to respectively transliterate the glottal stop as ʾ and the voiced pharyngeal fricative as ʿ so they would be in line with Wiktionary's transliteration of other Semitic languages, which all use ʾ and ʿ?

Wiktionary also currently transliterates the Arabic voiceless velar fricative as ḵ. However, an alternate transliteration as ḫ is also used for this sound. Since ḫ is used for the transliteration of the voiceless velar fricative in most Semitic languages except Hebrew and Aramaic, I would like to request that Wiktionary's transliteration of the Arabic voiceless velar fricative be changed from ḵ to ḫ as well. Antiquistik (talk) 13:08, 1 May 2024 (UTC)

@Antiquistik: No, we switched the other day. As for ḵ to ḫ, I don’t know, perhaps it’s better if you want to make an etymological statement that ḵ is fricativized k which we keep in begadkefat affected languages while ḫ is organic. Fay Freak (talk) 18:08, 1 May 2024 (UTC)
@Fay Freak In this case, I will add my opposition to the discussion regarding that change.
Concerning ḵ to ḫ, should I make another request, or should I add it to this one itself? Antiquistik (talk) 18:55, 1 May 2024 (UTC)
@Antiquistik IMO the opposite change should happen and other Semitic languages should use ʔ and ʕ. The problem with the forward and backward quotes is that they're too small and too easily confused in many fonts. I also think ḵ is better than ḫ; ḫ is easily confused with the pharyngeal fricative. Benwing2 (talk) 23:49, 1 May 2024 (UTC)Reply
Personally, I agree with Benwing, although I am sympathetic to the idea that we should use whatever is most widely used, and I am also sensitive to the issue of words being findable by people who search for them using other transliteration systems. I would like us to implement having the templates/modules produce (but then potentially set to be invisible / display:none by default) other common transliterations so the entries can be found if people use our site search or Google and search for ʾiʿlān etc, as discussed in the 2022 discussion, unless that would cause problems. Then we could probably also set different CSS classes for the different transliterations so people could select whether they see ʾiʿlān or ʔiʕlān, similar to the way people can choose to see or not see {{,}} (and we could debate which one would be most helpful to have on by default for the average lay reader). - -sche (discuss) 02:33, 2 May 2024 (UTC)Reply
@-sche I think this is a good idea. AFAICT it would require some changes to Module:languages (which handles transliteration) so that a given transliteration method can return multiple transliterations rather than just one, each transliteration associated with properties such as CSS class, with one of them identified as "canonical" (meaning it is displayed while the others aren't). The only tricky thing here is manual transliterations; ideally, there would be a method to convert a manual transliteration in the canonical system into each of the other systems, so that users have to specify only one transliteration rather than multiple. In the examples here, that conversion isn't hard, but sometimes it may not be possible (e.g. the current Hebrew transliterations are based on modern Hebrew pronunciation, which has several mergers compared with Biblical Hebrew, so we couldn't convert modern to Biblical Hebrew transliterations). Benwing2 (talk) 02:45, 2 May 2024 (UTC)
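As a rough illustration of the idea above (not Module:languages itself, which is written in Lua), the following Python sketch shows one transliteration call returning several variants, each tagged with a CSS class, with one of them marked canonical. The `Translit` type, the function names and the class names `tr-ipa-signs` and `tr-half-rings` are all hypothetical; the only mapping taken from this discussion is ʔ to ʾ and ʕ to ʿ.

```python
from dataclasses import dataclass

# Hypothetical mapping from the IPA-style signs to the half-rings,
# based only on the two pairs mentioned in this discussion.
IPA_TO_HALF_RING = str.maketrans({"ʔ": "ʾ", "ʕ": "ʿ"})


@dataclass(frozen=True)
class Translit:
    text: str        # the transliterated string
    css_class: str   # hypothetical class name a reader could toggle
    canonical: bool  # True for the variant displayed by default


def variants(canonical_tr: str) -> list[Translit]:
    """Derive all variants from one (possibly manual) canonical transliteration."""
    return [
        Translit(canonical_tr, "tr-ipa-signs", True),
        Translit(canonical_tr.translate(IPA_TO_HALF_RING), "tr-half-rings", False),
    ]


def to_html(items: list[Translit]) -> str:
    """Render every variant; non-canonical ones are hidden by default."""
    spans = []
    for item in items:
        style = "" if item.canonical else ' style="display:none"'
        spans.append(f'<span class="{item.css_class}"{style}>{item.text}</span>')
    return "".join(spans)


result = variants("ʔiʕlān")
html = to_html(result)
```

A simple character table like this only works when the two systems differ by one-to-one substitutions; it would not cover the modern versus Biblical Hebrew case mentioned above, where the mergers cannot be undone.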
@Benwing2: I believe that some of the existing manual transliteration entries may need to be reviewed to see whether their use was actually justified in the first place. Some of them are there only to work around various technical issues that have since ceased to exist. For example, this manually added transliteration for a Belarusian quotation became unnecessary after this fix. And I definitely support the idea of having multiple transliteration schemas, because this would allow introducing Belarusian Łacinka in addition to the current WT:BE TR scholarly transliteration. As @-sche mentioned, the primary motivation is that words should preferably be searchable via Google or via the search box from the Wiktionary front page. Belarusian entries currently solve the searchability problem via manually added "Alternative forms" sections with red links, but this isn't ideal. So the proposed improvement has uses even beyond Arabic and Hebrew. --Ssvb (talk) 16:41, 2 May 2024 (UTC)
Yes, I'm also in favour of having multiple transliteration schemes for this reason. Theknightwho (talk) 11:44, 8 May 2024 (UTC)Reply
@-sche This is a good proposal.
@Benwing2 I understand that ʔ and ʕ are more visible than the small half-rings, but I question how useful using them would be for the average reader since they are barely used in current transliteration schemes. If it hinders readers' ability to find these entries, we should avoid using them. Additionally, when is ḫ confused with the pharyngeal fricative? Antiquistik (talk) 05:42, 2 May 2024 (UTC)Reply
@Antiquistik I'm not sure what you mean by "barely used in current transliteration schemes". Are you referring to transliteration schemes outside of Wiktionary? If so, why do you think the average reader will be familiar with them, but won't be familiar with IPA? As for using ḫ, my point is that this is easily confused with ḥ (the transliteration for the pharyngeal fricative), and having all three of h ḫ ḥ is going to make for endless confusion. Benwing2 (talk) 05:47, 2 May 2024 (UTC)
@Benwing2 While I don't think that the average reader will be more familiar with the IPA signs, I doubt that they will be searching Arabic terms with signs from the current standard transliteration schemes substituted by IPA signs that are rarely used for Arabic transliteration.
And, as pointed out by @Ssvb, the entries need to be searchable. Using the more widely employed transliteration is the better option for this.
As for the transliteration of /x/, I strongly disagree with your position. The transliterations for other Afroasiatic languages like Old South Arabian, Ugaritic and Ancient Egyptian use both ḫ and ḥ without any problem, and I don't see why the organic /x/ in Arabic should be represented through a character used for sounds affected by begadkefat. Antiquistik (talk) 11:19, 3 May 2024 (UTC)
@Antiquistik: Your premise of the signs being but used in IPA transcriptions before having been adopted by Wiktionary is wrong. We realized that there are lots of linguistic books, more or less traditionally Semitist, with them as their editorial choice for transcription. I have doomsurfed the philologies enough in the last 1½ decade to know that this is by far not so uncommon as to be stunting someone’s dictionary use. I also want to raise your attention towards pertinent languages without native writing system that can only be entered in an academic transcription, the Modern South Arabian languages, which have suffered some variations in transcription styles over the decades and native countries of researchers but I think are amenable as written down at أَيْدَع (ʔaydaʕ), whereas with all their diacritics the rings would strain the readers’ tempers. Fay Freak (talk) 11:37, 3 May 2024 (UTC)Reply
@Fay Freak How prevalent are Arabic transliterations using the IPA signs compared to the half-rings? Antiquistik (talk) 13:06, 3 May 2024 (UTC)Reply
@Antiquistik: No one, or at least not me, can do stats on such a thing. There is also a qualitative difference in the kinds of resources that use them. In purely Semitist sources due to tradition the rings hold their ground. I have clicked around in my Semitics folder for you. I wanted to say that Leonid Kogan uses MODIFIER LETTER GLOTTAL STOP ˀ a lot, which is a bit more conspicuous and between the two extremes, but the second work by him I opened ({{R:tig:Kogan:2011}} after {{R:sem-pro:GC}}) goes the whole hog and uses ʔ for Arabic and the other Semitic languages. {{R:sqt:CSOL}} and {{R:sem-pro:SED}} use ˀ, anything published in the Journal of Semitic Studies such as doi: 10.1093/jss/fgt038 the rings, we may see it as a publisher decision, in more relaxed journal pieces he seems to prefer the IPA letters? In the old and long series Perspectives on Arabic linguistics you got the IPA letters all around. There is a lot of socialization behind letter choices, you just need to get used to them, but not lose aesthetic sense. University docents may teach something specific but there is a point where one shan’t believe other people. Younglings learn and adults function by imitation but science by organized skepticism, a dilemma.
The complicated part: I can hold you a lecture how it has to do with spatial-temporal memory, again the first chapter of the handbook of memory, ASD and the law I mentioned. Everything normal in the head, you guys tribally react to relations previously experienced with and from other people, in spite of the meatspace effecting the worst selection bias, contrary to universalism of science. You underestimate the psychological background behind all this. I did hardly positively respond to what teachers required or expected from me in terms of organizing a treatise, by some internal logics which aren’t strictly rationally evident, writing points of a paper in this and that order and not missing out a super-influential fashionable nonsense in the field I mean, which is detrimental to exams, and self-portrayal in job applications, however exquisitely able to judge the merits of the matter in isolation, and I am now very aware how strong feelings about signs come about, without sustaining them myself. We don’t just count voices together to let the loudest party win, this is not how creating good stuff works, only a working hypothesis. Fay Freak (talk) 14:09, 3 May 2024 (UTC)

Descendant tree design

Here's my idea for a horizontal tree style that could be generated by {{etymon}}. I've switched up the colour scheme, since this is a descendants tree rather than an etymology tree. We can also include question marks or labels just as in the etymology tree. Let me know what you think! @Vininn126, Equinox, Sławobóg, -sche, 0DF Ioaxxere (talk) 21:24, 1 May 2024 (UTC)Reply

How would you represent borrowings and morphological reshaping in this format? Also I think I prefer Design 2, because in Design 1 the single right-branching node might be interpreted as somehow different from the below-branching nodes (and in addition, in Design 1 someone might e.g. interpret the juncture where Proto-Italic branches off as its own node, a daughter of PIE rather than just an artifact of the design). However, even better than either IMO would be one where the parent is centered vertically among all of its children rather than being at the top. Benwing2 (talk) 02:55, 2 May 2024 (UTC)Reply
@Benwing2: Probably with the same label system that {{etymon}} already uses. I like your idea for centering the node, although for trees with a huge number of lines it might lead to the ultimate ancestor being far down the page. Possibly the ultimate ancestor could be given some kind of special status where it always goes at the top left of the page. Ioaxxere (talk) 05:31, 2 May 2024 (UTC)Reply
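The layout suggested here, with each parent centred vertically among its children, can be sketched in Python as follows. This is only an illustration of the idea, not the code behind {{etymon}}; the `Node` class and `assign_rows` function are hypothetical, and the sample tree uses only the three top-level branches from the mockup.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)
    row: float = 0.0  # vertical position, filled in by assign_rows


def assign_rows(node: Node, next_row: int = 0) -> int:
    """Give each leaf its own row and centre each parent among its children.

    Returns the next free row after this subtree.
    """
    if not node.children:
        node.row = float(next_row)
        return next_row + 1
    for child in node.children:
        next_row = assign_rows(child, next_row)
    # Midpoint between the first and last child.
    node.row = (node.children[0].row + node.children[-1].row) / 2
    return next_row


tree = Node("Proto-Indo-European *ph₂tḗr", [
    Node("Proto-Germanic *fadēr"),
    Node("Proto-Italic *patēr"),
    Node("Proto-Celtic *ɸatīr"),
])
total_rows = assign_rows(tree)
```

With this rule the root's row grows with the number of leaves, which is the concern raised above about the ultimate ancestor ending up far down the page in very large trees.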
I think Design 2 is also my preference, at least on desktop. Vininn126 (talk) 13:25, 3 May 2024 (UTC)Reply

Design 1

Proto-Indo-European *ph₂tḗr
Proto-Germanic *fadēr
Proto-West Germanic *fader
Old English fæder
Middle English fader
English father
Scots faither
English faeder
Proto-Italic *patēr
Latin pater
Old French pere
Middle French pere
French père
English père
English pater
Tok Pisin pater
Proto-Celtic *ɸatīr
Old Irish athair
Manx ayr
English ayr

Designs 2 to 5

[Mockups of the same tree as Design 1, with the same nodes in the same order, each rendered in a different visual style.]

At the risk of stating the obvious, only a small fraction of the descendants are being shown here. Is this focussed on English? Nicodene (talk) 21:49, 1 May 2024 (UTC)
@Nicodene: This is just a mockup. I created all the HTML by hand, but the full (automatically-generated) tree will have all the descendants. Ioaxxere (talk) 22:11, 1 May 2024 (UTC)
How would they all fit? Some of the ‘nodes’ have dozens of direct descendants. Nicodene (talk) 22:16, 1 May 2024 (UTC)
@Nicodene: The tree would be extremely tall in that case. Either way, it would still be significantly more readable than something like what we currently have at Reconstruction:Proto-Sino-Tibetan/s-la#Descendants. Ioaxxere (talk) 22:19, 1 May 2024 (UTC)
I have to agree with Nicodene. With etymology trees and the vertical format, it makes more sense to me because the tree will be much more compressed, but for descendants, I can't really see it working as well. It'll get really unwieldy, and fast. The list you've pointed to isn't good either, but I don't like replacing one problem with another one. Looking at the link you've sent, how would this interact with etymology-only languages or the situation with Chinese? AG202 (talk) 03:06, 2 May 2024 (UTC)
Etymology-only languages shouldn't be too difficult to handle in general. For Chinese, I feel like including dozens of dialectal pronunciations in Reconstruction:Proto-Sino-Tibetan/s-la is excessive and we should reduce that to only those forms which were borrowed into other languages. It's also possible that descendants trees will end up having less automation than etymology trees in general. Ioaxxere (talk) 05:31, 2 May 2024 (UTC)Reply
One thing that needs to be addressed is alternative forms. In Middle English, there are loads of them for everything. They can't always be ignored, because there are enough cases like catch and chase from Old French: chacier, chacer; cachier, flour and flower from Middle English: flour, fflour, fflowr, fleur, flor, floure, flower, flowr, flowre, flowyr, flur or even morrow and morn from Middle English: morwe, morewe, morowe, morow, morrou, morue, morw, morȝe, morewen, morowen, morȝen, morwen, morwyn, morwhen, morwoun, morun, moron, moryn, morn; morgen, marhen, mareȝen, morghen, moruwe, where different alternative forms have different descendants. Chuck Entz (talk) 18:31, 3 May 2024 (UTC)Reply
Love it. After a quick glance at the HTML, is the only difference alignment? Since this could appear early on in a number of entries that have right-floating tables of contents, I think left-alignment makes the most sense to avoid some of the inevitable bunching. —Justin (koavf)TCM 22:14, 1 May 2024 (UTC)
@Koavf: No, the difference is whether there are connectors on the bottom of the boxes. I have no idea why the alignment is different, actually... Ioaxxere (talk) 22:16, 1 May 2024 (UTC)
Ah, I see that now. —Justin (koavf)TCM 22:17, 1 May 2024 (UTC)

get rid of noun and adjective plural form categories once and for all

There appears to be consensus established here, here and here, as well as in this diff, to not categorize noun and adjective non-lemma forms in separate 'noun plural forms' and 'adjective plural forms' categories. Yet when I made such a change for newly added Chadian Arabic terms, my favorite editor User:Fenakhay went on a revert spree. By longstanding consensus, we do not in general categorize non-lemma forms as e.g. Category:Russian noun prepositional case forms etc., so I don't see why an exception needs to be made for noun plural forms. However, I'd like to get clear consensus here to remove all such categories and delete the entries from Module:category tree/poscatboiler/data/non-lemma forms that allow such categories to be recognized. We have already done this for some languages; for example, there is intentionally no Category:English noun plural forms, and that page is protected against re-creation by bots or non-admins.

The alternative is to outline a clear rationale for why we need such categories and a rule for which situations they are allowed and which situations they aren't allowed. Either way, the current haphazard situation, where some languages have such categories and some don't, and the categories are incomplete, is unmaintainable.

Benwing2 (talk) 23:45, 1 May 2024 (UTC)Reply

And a stronger consensus at Wiktionary:Requests for deletion/Others#Category:Adjective plural forms by language. It seems that Fenakhay is the only editor who supports the retention of these categories. Consensus is against them. This, that and the other (talk) 02:55, 2 May 2024 (UTC)
I support getting rid - trivial category intersections like this are a waste of time. Theknightwho (talk) 03:24, 2 May 2024 (UTC)
I don't see any rationale for this kind of category either and so am in favour of deleting them. Nicodene (talk) 14:13, 2 May 2024 (UTC)
I agree as well. Ioaxxere (talk) 17:57, 2 May 2024 (UTC)
Support deleting these. Ultimateria (talk) 17:21, 6 May 2024 (UTC)
If we have this kind of thing, it should be with a clear rationale for when/where and why (as Benwing says) and it should be added automatically, probably by whatever headword- or definition-line templates we're using to declare something as a noun plural form, paucal form, etc in the first place — I say this because as far as I saw in the prior RFDs, the categories were populated haphazardly and manually with handfuls of entries, which is not useful. The usefulness of categorizing non-lemma forms by their specific non-lemma-ness seems small (though not nonexistent) to me; I suppose if I wanted to know what kinds of endings Foobarian noun plural forms had, a category would be useful, but the array of endings which Foobarian noun plural forms have could alternatively be mentioned on the About Foobar page, or on the Foobarian equivalent of Appendix:English grammar. Can anyone articulate something these categories would be useful for? (Absent that, I have no objection to deleting them, and indeed voted to do so in some of the prior RFDs.) - -sche (discuss) 19:21, 2 May 2024 (UTC)Reply
Personally, I find these categories very useful from a navigational standpoint, so I'd like to see them kept. That said, they should be added automatically as part of templates like {{infl of}} and {{plural of}}, not added manually by users. Binarystep (talk) 11:26, 5 May 2024 (UTC)Reply
@Binarystep Do you realize this is simply an intersection category? In general we don't usually include intersection categories because you can search for any combination using the Search feature. In this case, e.g. to do the equivalent of CAT:Chadian Arabic noun plural forms, you can search for the combination of category CAT:Chadian Arabic noun forms and template Template:plural of. Adding them automatically using templates like {{infl of}} and {{plural of}} has already been tried, but it turns out to be difficult from a programmatic standpoint in some cases and a maintenance headache, which is the reason I want them removed. Benwing2 (talk) 20:02, 5 May 2024 (UTC)Reply
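The search described in this comment could be written with the `incategory:` and `hastemplate:` keywords of MediaWiki's CirrusSearch. The Python sketch below only builds the query string and a search URL; the helper name `intersection_query` is made up for illustration, and it is an assumption that the results would match the contents of the category exactly.

```python
from urllib.parse import urlencode


def intersection_query(category: str, template: str) -> str:
    """Build a CirrusSearch query for pages in `category` that transclude `template`."""
    return f'incategory:"{category}" hastemplate:"{template}"'


query = intersection_query("Chadian Arabic noun forms", "plural of")
# Special:Search takes the query text in the `search` parameter.
url = "https://en.wiktionary.org/w/index.php?" + urlencode(
    {"title": "Special:Search", "search": query}
)
```

Note that both keywords work at page level, so a page with several language sections could match even if the {{plural of}} call belongs to a different language than the category.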

template similar to Template:alt or Template:desc for Derived terms, Related terms, etc.?

Hi. User:Fay Freak and I have been having a discussion about using {{alt}} or {{desc}}, or creating a similar template, for Derived terms and the like. This came up because Fay Freak has been using {{desc|nolb=1}} in Derived terms sections. (Note: |nolb=1 disables the language name at the beginning. FF proposes renaming |nolb= to |nolang= to avoid confusion with |lb= for labels and because what's being suppressed is a language name, not a label.) Both {{alt}} and {{desc}} let you specify a series of terms along with per-term properties plus overall labels for the whole set of terms, although the syntax of the two templates is different and {{desc}} has some extra features specific to descendants. Note that we also have {{syn}}, {{ant}}, etc. for inline synonyms/antonyms/etc., which likewise have support for specifying a series of terms with both per-term properties and overall labels. The current syntax for Derived terms, Related terms and such involves manually listing each term with {{l}} and using {{q}} to add qualifiers as needed, but compared with {{alt}} and {{desc}} this is both more cumbersome and less standardized, meaning that different people format things differently. I think we ought to have some way for Derived terms sections and the like of specifying a list of terms plus labels, similar to {{alt}} and {{desc}}. The question is, should we just reuse e.g. {{alt}} for this purpose, or create another template? (If the latter, I'd maybe call it {{terms}}.) Potentially we could rename {{alt}} to {{terms}} or something similarly generic and keep {{alt}} as an alias, since there isn't really anything about {{alt}} that is specific to Alternative forms.

I'm omitting mention of {{col3}} and the like; while these are useful especially for long lists of similar terms, they don't provide the ability to specify a set of labels at the end of the list of terms, as {{alt}} and {{desc}} do.

Benwing2 (talk) 05:22, 2 May 2024 (UTC)Reply

That'd be quite nice. All I have to add is that it'd help to have the option to split derived terms into columns or put them in collapsible boxes, as people have been doing with a variety of other templates (cf. cado). Nicodene (talk) 14:01, 2 May 2024 (UTC)
I think we'd be able to scrape this to be honest. All it'd need is an etymology section for most terms... Vininn126 (talk) 16:20, 5 May 2024 (UTC)
@Vininn126 I don't quite understand what you mean, can you clarify? Benwing2 (talk) 20:03, 5 May 2024 (UTC)
Sorry, misinterpreted. Not sure I have a strong opinion. Vininn126 (talk) 07:17, 6 May 2024 (UTC)
I'm having trouble understanding the need for such a template beyond stringing multiple {{l}}s together. Can you give an example? I'm also confused by the association being made between Derived terms and Alternative forms. They're pretty distinct in my mind. -- Sokkjō 03:41, 11 May 2024 (UTC)Reply
@Sokkjo User:Fay Freak gave the example in Sittenstrolch of using {{desc|de|Sittich|lb=prison slang|nolb=1}} under Derived terms in order to get the label functionality; it displays as
You can get a somewhat similar effect using {{alt|de|Sittich||prison slang}}:
Here, only one term is listed but you can easily imagine listing multiple terms and multiple labels, which are supported in both syntaxes. Note that you couldn't so easily just use a qualifier because the labels autolink like {{lb}} labels, but don't categorize. I suppose you could write
* {{l|de|foo}}, {{l|de|bar}}, {{l|de|baz}} {{lb|de|prison slang|Austria|nocat=1}}
which displays as
much like writing
* {{alt|de|foo|bar|baz||prison slang|Austria}}
but as you can see, the former is much more awkward.
The reason I brought this up is that there's not a lot of functionality (and arguably no functionality) that's specific to {{alt}}; that's why I mentioned generalizing (or simply renaming) {{alt}} so it can be used outside of Alternative forms sections. Benwing2 (talk) 07:10, 11 May 2024 (UTC)Reply
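To make the comparison concrete, here is a small Python sketch that generates both wikitext forms from the same list of terms and labels. It only assembles strings following the two examples given in the comment above; the function names are hypothetical and nothing here is part of any existing Wiktionary tool.

```python
def with_l_and_lb(lang: str, terms: list[str], labels: list[str]) -> str:
    """Manual style: one {{l}} per term, then {{lb}} with nocat=1."""
    links = ", ".join("{{l|" + lang + "|" + term + "}}" for term in terms)
    label_part = "|".join(labels)
    return "* " + links + " {{lb|" + lang + "|" + label_part + "|nocat=1}}"


def with_alt(lang: str, terms: list[str], labels: list[str]) -> str:
    """{{alt}} style: the terms, one empty parameter, then the labels."""
    parts = [lang, *terms, "", *labels]
    return "* {{alt|" + "|".join(parts) + "}}"


terms = ["foo", "bar", "baz"]
labels = ["prison slang", "Austria"]
manual_line = with_l_and_lb("de", terms, labels)
alt_line = with_alt("de", terms, labels)
```

The {{alt}} form needs the language code once and no per-term template call, which is the reduction in awkwardness the comment points to.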
In the example Sittenstrolch, there is no reason a usage label would belong there -- that should be left to the entry page. If I saw a user add that, I would delete it. -- Sokkjō 07:27, 11 May 2024 (UTC)
Obviously not everyone agrees with you, because qualifiers and labels are extremely common in derived terms, synonyms and the like. I would tread lightly and think twice before deleting such a label. Benwing2 (talk) 08:39, 11 May 2024 (UTC)
What other users are putting usage labels in the derived terms section?! -- Sokkjō 05:07, 12 May 2024 (UTC)
Being able to string together multiple {{l}}’s is all I ever wanted for Christmas. Nicodene (talk) 05:56, 12 May 2024 (UTC)

Plurals on head lines and declension tables

Is there any point in having both plurals on the head line and a declension table showing the plural for a noun lemma? I would be inclined to omit the plural(s) when there is a declension table. --RichardW57m (talk) 16:36, 2 May 2024 (UTC)Reply

@RichardW57m it would perhaps help to specify which language you're thinking of and give an example. This, that and the other (talk) 03:07, 6 May 2024 (UTC)Reply
The specific language where this has come up is Lithuanian, avìdė, which currently only displays the plural through the declension table. A similar specific is with the Lithuanian adjective headword template, where until recently many ordinals' neuter form was wrong and contradicted the following declension table. --RichardW57m (talk) 11:27, 7 May 2024 (UTC)Reply
IMO it depends on how regular the inflections in question are. If they serve as something like principal parts, I think it's useful to put them on the headword line as well as in the declension table, because then someone with some familiarity with the language will know how to inflect the term without needing to look through the whole declension table to figure out what the most important parts are. This is similar to how we list the past historic and past participle for Italian verbs. OTOH if they are largely predictable, putting them in the headword line is less useful. Benwing2 (talk) 23:27, 8 May 2024 (UTC)Reply
As Benwing suggested, I would say the answer is language-specific. For example, in German, plurals seem to be the most unpredictable declined form of a noun, so it makes some sense to give the plural in the head line.--Urszag (talk) 22:38, 9 May 2024 (UTC)Reply

A way to more easily connect with readers: a follow-up

Following Wiktionary:Beer_parlour/2024/March#A_way_to_more_easily_connect_with_readers, I wrote to WMF in an attempt to figure out how to best resolve this issue. @Johan Jönsson replied and has given us an option, I think. He suggests we create a new mailing list for admins and for us to put enwiktionary in the name somehow. What do people think of this solution? Vininn126 (talk) 16:03, 3 May 2024 (UTC)Reply

Support Ioaxxere (talk) 16:56, 3 May 2024 (UTC)
Support This, that and the other (talk) 08:30, 5 May 2024 (UTC)
Support Binarystep (talk) 12:45, 5 May 2024 (UTC)
Support Thadh (talk) 11:09, 6 May 2024 (UTC)
Okay, I'm going to move forward with this. See phabricator:/T364731. Vininn126 (talk) 10:38, 13 May 2024 (UTC)
Update, we have a private mailing list for admins (please open phabricator thread for details). Any active admins may sign up. Ladsgroup mentioned we may also open a public general use mailing list if we want. I'll leave that discussion for another time. Vininn126 (talk) 07:20, 14 May 2024 (UTC)

Volga Türki language

Greetings, I'd like to propose giving Volga Türki an L2.

It is a significant member of the Middle Turkic literary languages, and is as important as Ottoman Turkish, Chagatai and Karakhanid, all of which already have their own Wiktionary categories: Category:Ottoman Turkish language, Category:Chagatai language, Category:Karakhanid language. Volga Türki is considered a descendant of Karakhanid, together with Chagatai, however they all are roughly contemporary.

It was in wide use in the Volga-Ural region from the 15th century (or from the 12th century, if the poem Qissa-i Yosof by Qul Ghali is included) until the adoption of Cyrillic and Latin scripts for the Tatar and Bashkir languages under Soviet rule. Although the written languages for Tatar and Bashkir had started to diverge slightly from Volga Türki before Soviet rule, in the late 18th and early 19th centuries, it remained a common standard for international affairs, especially between other Turkic groups.

Its addition would not only help with etymological sections, but also help connect the cognates with other Turkic languages, similarly to other Middle Turkic literary languages' sections.

As for Unicode characters, numerals and readings, I already have prepared all of this, and will work on adding them as soon as the category is created. The sources of lemmas are going to be taken from books, dictionaries and other written resources from that time period. I will try to list a source for each lemma whenever possible.

The only issue, however, is that the language does not have its own ISO 639-2 code yet. I propose one of the following codes to be used for the language: iut (for İdil-Ural Turkic); tui (Turkic of İdil-Ural). I deprecate codes like vut (Volga-Ural Turkic) and ott (Old Tatar) firstly due to the name Volga not being used by the locals, especially during the era of Volga Türki, and secondly due to the name Volga/İdil/İdel Türki being neutral, and Old Tatar primarily referring to the diverged variant of Volga Türki that was used specifically for Tatar. Bababashqort (talk) 16:06, 3 May 2024 (UTC)Reply

What is the Volga Turki corpus and how accessible is it? Qissa-i Yosof poem by Qul Ghali should definitely not be included, as it is covered by Khorezmian Turkic [1]. Allahverdi Verdizade (talk) 20:40, 3 May 2024 (UTC)Reply
Support BurakD53 (talk) 17:57, 4 May 2024 (UTC)Reply
Its corpus mostly isn't digitalised, but practically all Bashkir and Tatar literature from at least 16th century until late 19th century is written in Volga Türki. The books, manuscripts and magazines are still preserved in a lot of libraries in Tatarstan and Bashkortostan. As for Qissa-i Yusuf, that is somewhat debatable, but given the timeframe it probably suits Khorezmian, as one of the ancestors of Volga Türki. Bababashqort (talk) 07:24, 5 May 2024 (UTC)Reply
@Bababashqort: for the last issue, we generally make up our own codes using the code for the group it belongs to (probably "trk") followed by a hyphen ("-") followed by some sequence of letters that's not already in use by us. That way there's no chance of our code conflicting with an ISO code. Since this is strictly for internal use and our modules and CSS/JS code convert everything for browsers, we don't have to use existing ISO codes. Chuck Entz (talk) 18:24, 4 May 2024 (UTC)
Yes, I've been told that wiki uses a placeholder, but didn't exactly know how it worked. Thank you for explaining!
In this case I'd suggest trk-iut Bababashqort (talk) 07:25, 5 May 2024 (UTC)Reply
@Bababashqort We try to use the first three letters of the lect in the second part of names like this. What do you think of trk-idi or trk-vol? Benwing2 (talk) 08:04, 5 May 2024 (UTC)Reply
trk-idi includes only the Volga part, as well as trk-vol. The name itself, however, is taken from the most widespread naming of the language, which unfortunately is shortened to Volga Türki, omitting Ural. And speaking of İdil, it is actually spelled as İdel in Tatar itself, İdil is just more Common Turkic. Therefore the only solution seems to be trk-iut, it's not that hard to deduce I think. Bababashqort (talk) 11:54, 5 May 2024 (UTC)Reply
@Allahverdi Verdizade suggested making a Turki category instead, which I'd very much prefer. It would remove the need to add more distinct subvariants of it, such as North Caucasian Turki, Nogay Turki and others. This would also allow the derivation templates to be used for all languages that used it: Crimean Tatar, Kumyk, Nogay, Bashkir and others. Bababashqort (talk) 13:21, 5 May 2024 (UTC)Reply
@Bababashqort Sure, that works. What language is this a category of? Benwing2 (talk) 19:56, 5 May 2024 (UTC)Reply
I think he meant he wants Türki as a language code, not specifically Volga Türki Bortkastningskonto (talk) 07:01, 6 May 2024 (UTC)Reply
@Bortkastningskonto @Bababashqort OK, I need more information then. Is "Türki" supposed to be an L2 language? This is an awfully generic name for a language, and I would likely oppose this name for this reason. And I will repeat my assertion that the code for Volga Türki should be 'trk-vol' in keeping with the name. The code should reflect the first three letters of the lect name barring extraordinary circumstances (usually due to ambiguity when there are multiple lects sharing the first three letters, which is not an issue here). @Allahverdi Verdizade can you weigh in here? I am not qualified enough to tell whether this should be an L2 language, an etym-only language or just a label of some other language (the last two being rather similar). Benwing2 (talk) 07:11, 6 May 2024 (UTC)Reply
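The naming convention described in this thread (family code, hyphen, then the first three letters of the lect name, provided the result is not already in use) can be sketched roughly as follows. This is only an illustration: `propose_code` and `codes_in_use` are hypothetical names, not part of any Wiktionary module, and the stripping of diacritics is an assumption about how a name like "Türki" or "İdil" would be reduced to plain letters.

```python
import unicodedata


def propose_code(family_code, lect_name, codes_in_use):
    """Build a candidate code such as 'trk-vol' from a family code and a lect name.

    Hypothetical helper: it only illustrates the convention discussed above.
    """
    # Decompose accented letters so that combining marks can be dropped
    # (assumption: codes use plain ASCII letters only).
    decomposed = unicodedata.normalize("NFD", lect_name)
    letters = [ch.lower() for ch in decomposed if ch.isascii() and ch.isalpha()]
    if len(letters) < 3:
        raise ValueError("lect name is too short to derive a three-letter suffix")
    candidate = family_code + "-" + "".join(letters[:3])
    # Ambiguity is the "extraordinary circumstance" in which a human has to
    # pick a different suffix, so the sketch just reports the clash.
    if candidate in codes_in_use:
        raise ValueError(candidate + " is already in use; choose another suffix")
    return candidate
```

With an empty `codes_in_use`, the name "Volga Türki" yields `trk-vol`, and a name starting with "İdil" yields `trk-idi`, which are the two options floated above. A suffix such as `iut` is an acronym rather than a prefix of the name, so it would have to be chosen by hand.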
I didn't actually suggest making Türki a L2, rather I wondered whether it wouldn't be better to do so depending on how different Volga Türki is from, say, North Caucasian Türki. I can't answer that question myself, and I think, in general, very few people can give a well-informed opinion on that. Reading this book on North Caucasian Turki (in Russian) might help a little. Considering that Bababashqort is likely only going to work with sources written in the Volga variety, maybe it is the safest to create a Volga Turki L2, in which case you would circumvent the problem with "awfully generic name". Documents in North Caucasian Turki are terribly inaccessible (not digitized or normalized), so I don't think anyone is going to work with them.
In any case, there is also the problem of classifying "literary languages" and fitting them into genealogical tree schemes. It is often said that this or that language "is mostly X, but also incorporates elements of Y", at the same time as it "continues the literary tradition of Z". I can't exactly tell you what it means that "Volga Turki continues the tradition of Khorezmian Turki", which in turn "continues the tradition of Karakhanid", as it oftentimes is put in Russian books on the matter. Too much arbitrariness for my taste. So my opinion is that these "literary languages" maybe should not have ancestors and descendants. Allahverdi Verdizade (talk) 17:35, 7 May 2024 (UTC)Reply
Support Yorınçga573 (talk) 20:23, 9 May 2024 (UTC)Reply

Request for a new language

Yet again, I request for Old Lombard to be listed separately, as for now Old Lombard is listed as a dialect and not a language. That Northern Irish Historian (talk) 17:30, 4 May 2024 (UTC)Reply

I notice that Old Italian is currently an etym-only variant of Italian. Why can't Old Lombard be the same? How different are Lombard and Old Lombard? Benwing2 (talk) 18:59, 4 May 2024 (UTC)Reply
Old Lombard:
  • Faremo preg a Deo a Questi cominzament
  • et a la soa mather ke preg l’omnipotent.
  • Ke n’des a dir et a far tute l so placiment
  • Ço ked is la scritura si se conven a dir
  • De la pasin de Christ a ki ne plas hodir
  • La qual per nu katif je plase sostegnir
  • Bene questi paroli de panzer e da stremir
  • Qui longa fis e dis del pasio del fy de la rayna.
  • La qual si m’dia gratia et a mi sia vesina
  • Ke parlo dritament de la pasion divina
  • St’apreso si me scampo da la infernal pena.
Modern Lombard:
  • Ambiaróm con ‘na preghiéra a Dio
  • e a sò madèr che la préghes l’Onipotent
  • Che nómes a dì e a fa töt de so gradimènt
  • E per bontà sò el vègnes a compimènt
  • Chèl che la dis la Scritüra isé come l’è giöst a dìl
  • De la pasiù de Cristo a chi che öl sintìl
  • Pasiù che per notèr pecadùr la sèrf a soportà
  • Con rasegnasiù chèste parole de pianzer e dè dulùr
  • Ché se parla e se dìs del fiöl de la regina
  • Che la me dàghes gràsia e la me stàghes vizìna
  • ‘Ntat che parle drit de la pasiù divina
  • Semài che scamparó de la pena infernal.
That Northern Irish Historian (talk) 22:35, 8 May 2024 (UTC)Reply
@That Northern Irish Historian That's not what I was looking for; you have pasted in two different translations which naturally will be different. If you try to match up the corresponding words, they are IMO marginally different enough to maybe be considered different L2's (although they differ less e.g. than the current Occitan dialects). I notice however that there are 0 lemmas currently listed as Old Lombard; are you actually planning on adding some? Benwing2 (talk) 23:16, 8 May 2024 (UTC)Reply
Yes, but see zinqui, Jesu, and other pages. It is not working. That Northern Irish Historian (talk) 23:24, 8 May 2024 (UTC)Reply

That's how we enter these words. If you have any objections, please write here. BurakD53 (talk) 14:29, 5 May 2024 (UTC)Reply

lol. Yes, I have objections. Allahverdi Verdizade (talk) 16:11, 5 May 2024 (UTC)Reply
As I said before, I want the {{trk-ogz-pro}} code to be removed and replaced with {{trk-ogz}}. Since we have already reconstructed them all under the {{trk-pro}} pages, Proto-Oghuz is quite unnecessary. If anyone still wants to reconstruct Proto-Oghuz, you can reconstruct it using the * sign on the Oghuz page. (Which is quite unnecessary) Likewise, {{trk-klj}} can also refer to the Arghu language, but the data in this language consists of a few words. {{trk-ogz}} is the direct ancestor of all Oghuz languages, in short, it is the same as Proto-Oghuz {{trk-ogz-pro}}. However, we cannot enter these Oghuz or Proto Oghuz words recorded in the Diwan into the site as entries. It requires reconstruction in order to be entered to us. However, these Proto Oghuz words, also Proto Khalaj words, are not a reconstruction. I think that both of them should be entered as input on the site, the biggest reason is that these languages cannot be assumed to be dialects of other languages. But since the Arghu language consists of only a few words, it can be entered under the name Proto. Oghuz language is mentioned many times in the Diwan and even information about its grammar is given. A few Proto Khalaj, i.e. Arghu, words may be added as exceptions. But since this is the case for Oghuz, there is no need to create a language code called Proto-Oghuz. This is my opinion. I firmly reject the addition of these Oghuz words to Old Anatolian Turkish. Not every word mentioned in the Diwan has been witnessed in Old Anatolian Turkish, and the place where Kashgarî shows the Oghuzs on the map in the period he mentions is not Iran, but Central Asia. Also the words here are more archaic than the form in which they are found in Old Anatolian Turkish. BurakD53 (talk) 18:22, 5 May 2024 (UTC)Reply
Support Yorınçga573 (talk) 20:10, 9 May 2024 (UTC)Reply

Lemma categories

Discussion moved from WT:Beer parlour/2024/April#Lemma categories.

I've been cleaning up Special:UncategorizedPages, and I've run across a number where @Nicodene has disabled categorization for alternative forms. My understanding is that all mainspace entries should be in either Category:[Language] lemmas or Category:[Language] non-lemma forms. While an alternative form is supposed to be a stub that links to the main form, as far as the categories are concerned, it's a lemma. It's certainly not a non-lemma form, because it has its own non-lemma forms. Leaving it out of both categories raises the question of why we have the entry at all, if we feel we need to hide it: if we don't link to it in the main entry, there's no way to navigate to it.

This has come up before over the years, and we've more than once decided to do it this way. As far as I can tell, Nicodene is the only editor who's doing otherwise. Has anything changed? Chuck Entz (talk) 03:13, 6 May 2024 (UTC)Reply

Why should Category:Franco-Provençal lemmas be clogged with twelve different renditions of ôtro, seventeen of ôtrament, and ten of solament? Why should Category:Old French lemmas (not to mention Category:Old French adverbs) be clogged with two hundred seventy one renditions of iluec? The whole point of a lemma is to provide a citation form to cover the variants. That is how altforms and altspellings are handled by the vast majority of dictionaries. Nicodene (talk) 03:23, 6 May 2024 (UTC)Reply
I'm of two minds here. Yes, we generally include alternative spellings and forms as lemmas; otherwise, for example, we'd end up including only one of oxidi{s,z}e as a lemma, and the other would go nowhere. At the same time, however, including 171 alt variants of iluec seems like serious overkill. Maybe we need a separate policy for non-standardized languages vs. standardized ones. Benwing2 (talk) 07:16, 6 May 2024 (UTC)Reply
At a minimum, every entry should be in some category. As far as how that's been accomplished up to now, my understanding matches Chuck's, that every entry is supposed to be categorized as either a lemma or a nonlemma (or both) and that alternatively-spelled nouns are still nouns (and lemmas, from the category / grammatical perspective). We could change that, e.g. add a parameter which, instead of turning categorization off, moves the entries from "Category:Foobarian nouns" to at least a POS-agnostic catchall "Category:Foobarian alternative forms and spellings", or something more specific like "Category:Foobarian alternative forms and spellings of nouns", "Category:Foobarian alternative forms and spellings of lemmas", but I do think we should continue to regard a completely uncategorized entry—an entry that cannot be accessed from any part of our category tree—as a problem.
There was support for not putting just any alternative spelling into topical categories in this 2022 discussion, but that didn't leave the entries categoryless.
FWIW, the issue of terms having tons of spellings isn't strictly limited to overall-nonstandardized languages, e.g. English has lots of spellings of kinnikinnick, Muhammad, voivode... but I think Benwing's suggestion of handling this on a per-language basis (and just accepting that the English categories will have a few cases like Muhammad where there are a bunch of spellings) is probably more workable than e.g. trying to decide (in a way that can be maintained over time with any consistency) on a per-spelling basis what counts, in a mostly-standardized (but standards-body-less "ungoverned") language like English, as a "standard" spelling. (E.g., several of the alternative spellings of Muhammad are used mainly in scholarly works, so dismissing them as nonstandard seems hard; and in the other direction, for a largely dialectal word, determining why any one spelling should be considered more standard than another seems hard.) - -sche (discuss) 13:54, 6 May 2024 (UTC)Reply
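The parameter floated above (moving alternative forms into a dedicated category instead of switching categorization off) could behave roughly like the sketch below. Everything here is hypothetical: `entry_categories`, `alt_mode` and its values are invented for illustration and do not correspond to an existing template parameter. The only point is that every mode still returns at least one category.

```python
def entry_categories(language, pos_plural, is_alternative_form=False, alt_mode="lemma"):
    """Return the categories for an entry; never returns an empty list.

    alt_mode (hypothetical) controls how alternative forms are categorized:
      "lemma"    - current practice: treated like any other lemma
      "catchall" - one POS-agnostic category per language
      "by_pos"   - one alternative-forms category per part of speech
    """
    if not is_alternative_form or alt_mode == "lemma":
        return [
            "Category:" + language + " lemmas",
            "Category:" + language + " " + pos_plural,
        ]
    if alt_mode == "catchall":
        return ["Category:" + language + " alternative forms and spellings"]
    if alt_mode == "by_pos":
        return ["Category:" + language + " alternative forms and spellings of " + pos_plural]
    # There is deliberately no "off" mode: an uncategorized entry is the
    # problem the discussion wants to avoid.
    raise ValueError("unknown alt_mode: " + repr(alt_mode))
```

Under this sketch, the many renditions of iluec would leave "Category:Old French adverbs" when a non-default mode is used, yet would remain reachable from the category tree.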
We could change that, e.g. add a parameter which, instead of turning categorization off, moves the entries from "Category:Foobarian nouns" to at least a POS-agnostic catchall "Category:Foobarian alternative forms and spellings"
I would be quite happy to use that if it were available as an option.
My main concern is keeping the categories clear and usable. When I look up 'Foobarian feminine nouns', for instance, I'd rather not have to wade through 5–10 (+) duplicates for every distinct noun. That is a serious headache with languages like Franco-Provençal or Romansch. Nicodene (talk) 07:54, 7 May 2024 (UTC)Reply
@-sche: I would like this to be implemented for English as well. Having full-fledged entries for minor spelling variants was a bad idea. Ioaxxere (talk) 03:08, 10 May 2024 (UTC)Reply
I disagree. All words should be given equal status, at least when it comes to categorization. I don't think Wiktionary should be treating variant spellings as inferior forms of the main entries. For starters, every spelling is (or was) the "default" spelling to someone. Using the example of Muhammad, for instance, there are plenty of people named Mohamad, Mohamed, Mohammad, Muhamad, Muhammet, etc. and it seems weird to claim that their names are merely lesser variants of the single "canonical" spelling. There's also the fact that some spellings carry unique etymological information, have slightly different pronunciations, or are used primarily by certain groups (regional spellings, for instance, or spellings used primarily by non-native speakers). Frankly, I find it troubling that there have been so many attempts lately to get us to reduce our coverage rather than expand it. At this rate, I won't be surprised if someone starts a proposal to convert alternate spellings into hard redirects. Binarystep (talk) 19:20, 11 May 2024 (UTC)Reply

Edit with "username removed"

This edit has the user name removed. How can one see (if not who the user is), which user removed it and why? [2] Equinox 09:33, 6 May 2024 (UTC)Reply

I removed it because it was an accidental IP/logged-out edit by an editor (the same as did a similar change to unrapable). — SURJECTION / T / C / L / 10:17, 6 May 2024 (UTC)Reply
I'm officially saying: don't do that. You can revert, delete, but do not wipe content unless it's real serious stuff like child porn. Thank you. Equinox 23:01, 7 May 2024 (UTC)Reply
Re how to see which admin performed the revdel: it's technically in the "View logs for this page" link on the edit history page, [3]. (If there were a lot of revdels and they did not follow so closely after the time the edits themselves were made, e.g. if I now went to the page and hid a revision from two months ago, and then Surjection hid a revision from one month ago as well as your edit just now, it might be hard for non-admins [who don't have "diff" links] to discern from that log who hid which thing... I guess in that case they'd just have to say "hey, who revdel'd X" and admins could check.) - -sche (discuss) 14:28, 6 May 2024 (UTC)Reply
@Surjection: I want you to understand how it looked to me: I saw that someone had made an edit, they had no name, I couldn't see them, or talk to them, or discuss, it was like a GHOST DID IT. And I couldn't see who removed their name either. If you ever spent time on WP:OFFICE then ...well. Equinox 22:50, 7 May 2024 (UTC)Reply
I would, personally, be happy to see text like "edit made by a user whose name is hidden by this admin: Surjection". What I think is wrong and bad and goes against our free openness is just that MYSTERY NO-NAME. Equinox 22:51, 7 May 2024 (UTC)Reply
Side point: I know Chuck Entz (for example) likes to "clean the graffiti wall" so that vandals can't see their names. But I don't like that. The wiki should be a public space and we should only hide the history in real serious situations like "doxxing" (real name-addresses) or... am I wrong? @-sche @Chuck Entz @Surjection (and even worse, are there Wikipedia rules we are supposed to obey as children.) Equinox 22:55, 7 May 2024 (UTC)Reply
AFAIK it's global WMF policy to suppress this kind of thing (the IP addresses of users who've accidentally edited logged out), and indeed to suppress it way harder than a mere revision-deletion like Surjection did: "oversighters" have (or had?) database access to delete the information so hard that not even admins can see it. (But it also takes time to contact them, so it's fine for admins to revdel it in the meantime, like this.) This is precisely because of doxxing concerns, because many IP addresses identify the person's real address. (Other IP addresses, of course, merely send you to that one farm in Kansas.) If you ever see an edit where you think the content of the edit is wrong, just undo the edit... as you saw in this case, the username being suppressed doesn't prevent you from undoing the edit. - -sche (discuss) 01:39, 8 May 2024 (UTC)Reply
Would there be an issue if contributors were to hide their IP address with their screen names after, say, a week? CitationsFreak (talk) 03:46, 8 May 2024 (UTC)Reply
I should clarify that AFAIK such hiding only happens when someone requests it—usually the person who made the edit, though plausibly someone else who simply noticed what was going on. Last I heard, WMF folks were trying to roll out something that automatically obfuscates all IP addresses by making them show up in edit histories as e.g. incrementing numbers that change periodically or on request (so anytime someone thinks their current [non-]IP is getting too much attention from admins, they can hit "refresh" and start doing vandalism under a new identity, just like logged-in users can by creating multiple accounts), which will probably remove the need to do this in the future, if it gets implemented. - -sche (discuss) 05:12, 8 May 2024 (UTC)Reply
Would there be an issue if contributors could request that their IP address be hidden by their screen name? CitationsFreak (talk) 05:42, 8 May 2024 (UTC)Reply
Tangent: Is there a way to "claim" an edit you made while accidentally logged out? Caoimhin ceallach (talk) 21:51, 12 May 2024 (UTC)Reply

The issue of Old Kashubian (Old Pomeranian?)

I came to a recent realization about the {{R:zlw-opl:SPJSP|Old Polish dictionary}}: it contains texts from Pomerania with Pomeranian features, as it was made during a time when Kashubian was considered a dialect of Polish. However, typologically, this is very, very wrong. Pomeranian is considered North Lechitic, and anything "Polish" (Masovian, Upper Polish, Lower Polish, and Silesian) is considered East Lechitic; therefore anything Old Kashubian should not be considered Old Polish. I propose a split; I intend to add the location of creation for any Old Polish documents anyway for a future dialectal project (for Old Polish this means somehow categorizing the location of attestation by dialect) and separating any texts from Pomerania for "Old Pomeranian" with a code zlw-opm, or perhaps "Old Kashubian" zlw-ocb with Kashubian and Slovincian as the children. These codes seem clunky to me and I am open to others. I have also corroborated this by emailing the editors of the Old Polish dictionary, who have told me that it indeed is "Old Kashubian", which they accept in their framework of Old Polish. Gorazd also holds the same view. @Thadh @Sławobóg @Rakso43243 @Benwing2 @Mahagaja @Silmethule. Vininn126 (talk) 10:50, 6 May 2024 (UTC)Reply

An alternative: if we accept Kashubian and Slovincian as the descendants of Old Pomeranian, then we could set them both to be descendants of Old Polish. The argument for this is that one could accept "Old Kashubian" as a constituent of Old Polish: not a dialect, but a constituent. This is what the editor of the Old Polish dictionary told me, quote (translated from Polish): "I did not write that it is a dialect. I wrote that it is a constituent element of the Old Polish language. That is a big difference. Old Kashubian is a constituent element of the Old Polish language." Another alternative is to ignore this, which seems wrong to me as well. Vininn126 (talk) 11:42, 6 May 2024 (UTC)Reply
Another solution: give Old Kashubian an etycode and make it an alt of Old Polish and if a term is attested in Pomerania, we could set the Kashubian and Slovincian reflexes as inherited from that? Otherwise directly from Proto-Slavic. Vininn126 (talk) 14:36, 6 May 2024 (UTC)Reply
@Vininn126 I think this last solution is maybe the best. This is similar to what is done with Old Northern French, which is considered an etym-only variety of Old French even though Old French as normally construed refers to the Old French of the Paris area whereas Old Northern French refers to the Old French of Normandy, and neither is an ancestor or descendant of the other. The two differ significantly in phonology, e.g. Old French chacier /tʃatsiɛr/ -> English "chase" vs. Old Northern French cachier /katʃiɛr/ -> English "catch". Anglo-Norman and modern Norman are both descendants of Old Northern French (although we currently list Norman as a descendant of Middle French, which is wrong) and modern French is a descendant of Old French per se. Benwing2 (talk) 18:38, 6 May 2024 (UTC)Reply
I know @Silmethule also mentioned a similar situation with Ancient and Mycenean Greek and also Old Norse and Swedish/Icelandic. See also my question on WT:About Old Polish. Related to that, I'm unsure how to handle labels for all of this. I think we'd want to list Kashubian/Slovincian in the Old Polish entries if and only if a text from Pomerania has an attestation. And any Kashubian/Slovincian words should still have "inherited from Old Kashubian/Pomeranian". Vininn126 (talk) 18:49, 6 May 2024 (UTC)Reply
@Nicodene As our resident Romance expert, do you agree with changing the ancestor of Norman to be Old Northern French instead of Middle French? This will cause the 5 terms in CAT:Norman terms inherited from Middle French to throw errors, I think. Can you fix up those 5 terms? Also I notice there are 30 terms in CAT:Norman terms inherited from Medieval Latin, which seems impossible and probably need to be cleaned up. Benwing2 (talk) 19:54, 6 May 2024 (UTC)Reply
I've just cleared out the categories in question. Agreed on removing Middle French as an ancestor of Norman. As for its further ancestor, I would leave it as just Old French, which includes ONF as-is. I think the latter are best treated as one overall language.
I've been meaning to eliminate '[Romance] terms inherited from Medieval Latin' in general, reassigning them to '...inherited from Early Medieval Latin' or '...borrowed from [later] Medieval Latin'. That will take some time. When it's done, perhaps we can make {{inh|romance language|ML.|...}} throw an error message and a brief comment. Nicodene (talk) 00:50, 7 May 2024 (UTC)Reply
@Nicodene Thanks! I think the basic advantage of setting the ancestor of Norman to be Old Northern French is it more clearly shows the ancestry (when you go CAT:Norman language and look at the Ancestors panel) than just setting it to Old French. Since Old Northern French is an etym-only variant of Old French, I don't think it will make any difference in terms of what Norman terms are allowed to inherit from. What do you think? Benwing2 (talk) 01:44, 7 May 2024 (UTC)Reply
Oh, so setting it to ONF won't disallow inheritance from Old French. In that case it sounds fine to me. Nicodene (talk) 01:50, 7 May 2024 (UTC)Reply
Yeah that's right. Benwing2 (talk) 01:56, 7 May 2024 (UTC)Reply
@Nicodene @Benwing2 Here's how it works. If you set a variety (etym-only language) as an ancestor, the descendant can inherit from:
  • That ancestor and any (sub)varieties of that ancestor (in this case, Old Northern French, and any varieties it might have).
  • The parent (in this case, Old French) unless the ancestral variety is also explicitly ancestral to its parent (read: the thing it's a variety of), which doesn't apply here. This is for situations like Tajik having Classical Persian as an ancestor: Classical Persian's set as a variety of Persian, but is also set as its ancestor. Since Tajik's ancestor is also Classical Persian, it's only possible for it to inherit from Classical Persian (and any varieties thereof), not Persian in general.
It can't inherit from:
  • Any other varieties of the parent which aren't in the direct lineage of its ancestor (i.e. it wouldn't be able to inherit from other varieties of Old French, unless they're ancestral to/descended from/a subvariety of Old Northern French). To use an Italic example: if we set the proto-language of Romance to be Vulgar Latin, instead of simply Latin, the Romance languages could also inherit from Classical Latin (its ancestor), Latin (the general parent) and Old Latin (set as the ancestor of Latin), but they wouldn't be able to inherit from varieties like Medieval Latin or New Latin, since they aren't in the direct lineage.
It sounds complicated, but it seems to line up pretty neatly with most people's intuitions in practice. Theknightwho (talk) 15:40, 10 May 2024 (UTC)Reply
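As a rough model of the rules just listed, the sketch below computes which lects a descendant may inherit from. It is not the actual module code: the `Lect` class, the function names and the sample data are all assumptions made for illustration, and the lect settings are simplified from the examples in this thread rather than copied from Wiktionary's data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lect:
    name: str
    variety_of: Optional[str] = None  # set only for etym-only varieties
    ancestor: Optional[str] = None    # explicitly configured ancestor, if any


def is_ancestral_to(lects, candidate, target):
    """True if `candidate` appears on `target`'s chain of configured ancestors."""
    seen = set()
    current = lects[target].ancestor
    while current is not None and current not in seen:
        if current == candidate:
            return True
        seen.add(current)
        current = lects[current].ancestor
    return False


def subvarieties(lects, name):
    """All lects that are, directly or transitively, varieties of `name`."""
    found, stack = set(), [name]
    while stack:
        parent = stack.pop()
        for lect in lects.values():
            if lect.variety_of == parent and lect.name not in found:
                found.add(lect.name)
                stack.append(lect.name)
    return found


def allowed_sources(lects, descendant):
    """Lects the descendant may inherit from, per the rules sketched above."""
    allowed, visited = set(), set()
    # Each entry: (lect name, whether its subvarieties are allowed too).
    queue = [(lects[descendant].ancestor, True)]
    while queue:
        name, with_subvarieties = queue.pop()
        if name is None or (name, with_subvarieties) in visited:
            continue
        visited.add((name, with_subvarieties))
        allowed.add(name)
        if with_subvarieties:
            allowed |= subvarieties(lects, name)
        lect = lects[name]
        queue.append((lect.ancestor, True))
        # The parent of a variety is allowed (without its other varieties)
        # unless the variety is itself set as an ancestor of that parent.
        parent = lect.variety_of
        if parent is not None and not is_ancestral_to(lects, name, parent):
            queue.append((parent, False))
    return allowed


# Illustrative sample data only.
SAMPLE = [
    Lect("Old French"),
    Lect("Old Northern French", variety_of="Old French"),
    Lect("Other Old French variety", variety_of="Old French"),
    Lect("Norman", ancestor="Old Northern French"),
    Lect("Persian", ancestor="Classical Persian"),
    Lect("Classical Persian", variety_of="Persian"),
    Lect("Tajik", ancestor="Classical Persian"),
    Lect("Latin", ancestor="Old Latin"),
    Lect("Old Latin", variety_of="Latin"),
    Lect("Classical Latin", variety_of="Latin"),
    Lect("Vulgar Latin", variety_of="Latin", ancestor="Classical Latin"),
    Lect("Medieval Latin", variety_of="Latin"),
    Lect("Romance", ancestor="Vulgar Latin"),
]
LECTS = {lect.name: lect for lect in SAMPLE}
```

On this sample data, `allowed_sources(LECTS, "Norman")` contains Old Northern French and Old French but not the other Old French variety; Tajik gets Classical Persian but not Persian; and the hypothetical Romance setup gets Vulgar, Classical and Old Latin plus Latin itself, but not Medieval Latin.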
So it would be possible to set Old Kashubian as an etym-only variant of Old Polish and then set Kashubian and Slovincian as the children of Old Kashubian but not Old Polish? Vininn126 (talk) 15:43, 10 May 2024 (UTC)Reply
@Vininn126 Per the rules just outlined, we could definitely make Old Kashubian an etym-only variety of Old Polish and set the ancestor of Kashubian and Slovincian to Old Kashubian, but people would still be able to "inherit" Kashubian and Slovincian terms from Old Polish. It'd be like the situation with Old French. If you wanted to avoid that, either we'd need a new flag or rule of some sort, or we'd need to change the name of Old Polish to e.g. "Old Lechitic" and make Old Polish an etym-only variety of Old Lechitic. @Theknightwho Here's a thought though. If we explicitly set the ancestor of Old Kashubian to Proto-Slavic, would that make it impossible to inherit Kashubian terms from Old Polish? That would be like a slight generalization of the special-case rule for ancestral-to-parent etym languages. Benwing2 (talk) 21:18, 10 May 2024 (UTC)Reply
I've dreamed of "Old Lechitic", but it doesn't encompass Polabian. Vininn126 (talk) 22:08, 10 May 2024 (UTC)Reply
@Vininn126 Sorry, why does Polabian matter here? It can just be excluded from Old Lechitic just as it would be excluded from Old Polish. Benwing2 (talk) 08:41, 11 May 2024 (UTC)Reply
@Benwing2 I have actually tossed the idea of "Old Lechitic" around before with @Sławobóg and @Silmethule. I suppose since it contains Old Kashubian as well there is more precedent for the name. Vininn126 (talk) 08:45, 11 May 2024 (UTC)Reply
@Benwing2 So far I think the name change and etycodes might be the best solution. I'd like to see if anyone else has any thoughts. If we agree, we can make this change, maybe once I finish adding location information to the quotation templates (or maybe that's not necessary...). Vininn126 (talk) 12:03, 13 May 2024 (UTC)Reply
Also pinging @KamiruPL as the other main Old Polish editor so he can be aware of the goings-on and give his opinion. Vininn126 (talk) 08:34, 11 May 2024 (UTC)Reply
I wouldn't like this. This is almost akin to handling Old East Slavic as an Old Church Slavonic variety. Pomeranian and Polish are two distinct branches, and the fact that an earlier stage was highly influenced in their literary variety by the other doesn't make them one and the same. Thadh (talk) 20:43, 6 May 2024 (UTC)Reply
There's actually a similar issue with texts from Pomerania from {{R:pl:SXVI}} and {{R:pl:SXVII}} but I think we can safely nest these under modern Kashubian with a label, as I have done with Middle Polish. Vininn126 (talk) 19:43, 6 May 2024 (UTC)Reply
Absolutely no to changing the name of Old Polish to Old Lechitic or something. Since Kashubian belongs to a different group, it should be a separate Old Pomeranian L2 language. It would work better as Proto-Pomeranian too. Having an etym-only code would be an alternative solution too, but then we are not consistent with our system (that made BG and MK descendants of OCS :)). Sławobóg (talk) 13:12, 14 May 2024 (UTC)Reply
@Sławobóg As to your second point: are you saying we could set Old Pomeranian as an etycode within Old Polish? What about the issue where people would be able to type e.g. {{inh+|csb|zlw-opl}} with no issues? Vininn126 (talk) 13:30, 14 May 2024 (UTC)Reply
Having Kashubian as descendant of Old Polish is just wrong. Having "Old Pomeranian" as etym-code for Old Polish would be better, but still not as good as having separate lang, but it might make editing easier. Sławobóg (talk) 13:44, 14 May 2024 (UTC)Reply
Would you be able to 1) assist in establishing spelling norms? 2) Dealing with the texts? 3) Understanding the grammar? 4) What about the fact that there are very few texts? 5) What about the fact that all of Old Polish already is a collection of dialects? Vininn126 (talk) 13:54, 14 May 2024 (UTC)Reply
1-3) Probably not. 4) We have languages like that. 5) Pomeranian being part of it is wrong. I'm not gonna fight here, you asked me for my opinion, I gave my opinion. And if you plan having Middle Kashubian, having Old Kashubian/Pomeranian as L2 would be a good thing. Sławobóg (talk) 14:48, 14 May 2024 (UTC)Reply
I'm not arguing, I'm just asking questions. I have no problem if you question the points I raised earlier! Vininn126 (talk) 14:50, 14 May 2024 (UTC)Reply
@Vininn126 I think we can fix the issue of {{inh+|csb|zlw-opl}}, if that would help. Benwing2 (talk) 14:54, 14 May 2024 (UTC)Reply
@Benwing2 That could be a good compromise. Vininn126 (talk) 14:55, 14 May 2024 (UTC)Reply

Old Polish regional categorization

As a sort of continuation of Wiktionary:Beer_parlour/2024/May#The issue of Old Kashubian (Old Pomeranian?) and Wiktionary talk:About Old Polish#Regional Old Polish, I'm trying to figure out the best way to handle regional information for Old Polish. I have a document explaining the origin of most texts in Old Polish, so it should be easy to figure out which of the 5 lects currently considered Old Polish (those being Masovian, Greater Polish, Lesser Polish, Silesian, and Pomeranian/Kashubian) each text belongs to. I think it would be useful for readers to know in which region a definition/term has been attested, as Old Polish wasn't a single entity and ultimately is the source of those modern dialects today, so we can see more clearly regional features and the like. My concern about using labels is that they would imply that a term might have been limited to a given lect, which we can't know for sure. What do others think? Vininn126 (talk) 19:17, 6 May 2024 (UTC)Reply

One solution could be to use {{lb}} but print the text {{lb|zlw-opl|attested in|Masovia|Lesser Poland}} etc. @Benwing2, would this be technically bad? Vininn126 (talk) 15:56, 8 May 2024 (UTC)Reply
@Vininn126 No, I don't see why that would be an issue. attested in isn't currently a recognized label but could easily be made one, so that it suppresses the following comma. Benwing2 (talk) 23:21, 8 May 2024 (UTC)Reply
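To illustrate what "suppresses the following comma" would mean for the displayed text, here is a minimal sketch of label joining. It is an assumption-laden model, not the labels module: `render_labels` and `NO_COMMA_AFTER` are hypothetical names, and real label handling involves more than string joining.

```python
# Hypothetical set of labels after which no comma is printed.
NO_COMMA_AFTER = {"attested in"}


def render_labels(labels):
    """Join labels with commas, except after comma-suppressing labels."""
    parts = []
    for index, label in enumerate(labels):
        parts.append(label)
        if index < len(labels) - 1:
            # A suppressing label is followed by a plain space instead.
            parts.append(" " if label in NO_COMMA_AFTER else ", ")
    return "(" + "".join(parts) + ")"
```

Under these assumptions, the labels "attested in", "Masovia", "Lesser Poland" would display as "(attested in Masovia, Lesser Poland)" rather than "(attested in, Masovia, Lesser Poland)".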
@Benwing2 Alright, that would be fine, and I think that's a good solution. Vininn126 (talk) 07:32, 9 May 2024 (UTC)Reply
@Benwing2 Another solution would be to have the quotation templates categorize by dialect when added to a page. This probably would be a bad idea? Vininn126 (talk) 07:44, 9 May 2024 (UTC)Reply
@Vininn126 Yeah the quotation templates do take a label but I feel uncomfortable categorizing based on that label. You could for example imagine someone illustrating a general-use term with a sentence written in a dialect, and labeling the quotation with the dialect in question; that doesn't mean in this case that the term is in the dialect. Benwing2 (talk) 08:24, 9 May 2024 (UTC)Reply
@Benwing2 Alright so for now I'm going to add the location of creation of the documents and a note saying what label the quotation template should count toward, see {{RQ:zlw-opl:AcCas}}, and I'll add the labels and regions manually from there. Unless it'd be possible to do a bot job after. Vininn126 (talk) 08:26, 9 May 2024 (UTC)Reply
@Vininn126 Might be possible, depends on how regular everything is and you making a list of all the quotation templates and associated lect/labels. Benwing2 (talk) 08:36, 9 May 2024 (UTC)Reply
@Benwing2 Can we add this to the labels module? I'm slowly working through these sources and I think this is the best solution. Vininn126 (talk) 12:01, 13 May 2024 (UTC)Reply

Continental Celtic

We have Continental Celtic as a family, but my understanding is that the consensus among Celticists is that CC isn't a clade but just a term of convenience for Celtic languages other than the Insular Celtic ones. Isn't our custom at Wiktionary to have only actual genetic families, not convenient groupings? —Mahāgaja · talk 11:28, 7 May 2024 (UTC)Reply

@Mahagaja Yeah we should get rid of this. BTW the Wikipedia article on Continental Celtic was in a terrible state due to a bunch of crap added a month ago, which I reverted. Benwing2 (talk) 22:06, 7 May 2024 (UTC)Reply
Yeah, agreed. Theknightwho (talk) 04:09, 10 May 2024 (UTC)Reply

Ban one-descendant Proto-Italic and Proto-Hellenic redlinks[edit]

There are already far too many one-descendant Proto-Italic and Proto-Hellenic entries, and adding one-descendant redlinks to, for example, a descendant tree or an etymology section is only going to encourage more of these entries being created. These redlinks should be banned. -saph 🍏 13:31, 7 May 2024 (UTC)Reply

Right, there should be above-average incentive to create such a page, so unless it is already decided to have one, bots should neutralize these links. Fay Freak (talk) 13:51, 7 May 2024 (UTC)Reply
In practice, what does a 'ban' on making certain kinds of redlinks mean, and what is the alternative it is supposed to incentivize? I guess mentioning the same form but not linking it would be slightly better, as it doesn't encourage creating an entry, but I'm not totally happy with that either in some cases. E.g. if the reconstructed form is itself doubtful, I wouldn't want it to be mentioned anywhere.--Urszag (talk) 15:44, 7 May 2024 (UTC)Reply
For example:
From Proto-Italic *fworom, from Proto-Indo-European *dʰwor-om (enclosure, courtyard, i.e. something enclosed by the door, or the place outside, i.e. through the door), from *dʰwer- (door, gate).
With the Proto-Italic word displaying as just plain text, rather than what we currently have (forum). As for the reconstructed form being doubtful, we should just list the hypothesised PIE form, e.g.:
As opposed to the current etymology given at serius. -saph 🍏 15:58, 7 May 2024 (UTC)Reply
The alternative it is supposed to incentivize is not creating such entries. You would have to have a more serious motive than ticking off a removed red link, since they are not apparent in the first place. Fay Freak (talk) 16:03, 7 May 2024 (UTC)Reply
Agreed. Down the line it may also be worth discussing a general ban of reconstructions (and their associated redlinks) that have only one descendant and no derived terms. Nicodene (talk) 22:49, 9 May 2024 (UTC)Reply
Could someone run a bot to do this? -saph 🍏 19:50, 10 May 2024 (UTC)Reply

Add "Muslim", "Hindu" etc. labels?[edit]

Proposal to add labels for lemmas used by people of specific faiths (which are not necessarily religious terms; rather, they're only used by certain groups). Case in point: মিঞা (mĩa), which has a Muslim gloss, but the Muslim label is an alias for 'Islam', though it's not an 'Islamic' term, just used by Muslims. Urdu dictionaries, which I concern myself with, have used these labels for centuries without prejudice. I know this would be useful for languages in the Indian subcontinent, as well as European languages (especially English). نعم البدل (talk) 20:55, 7 May 2024 (UTC)Reply

@نعم البدل There are (at least) two possibilities here. One is to disentangle the labels 'Muslim' and 'Islam' in a language-independent fashion, and the other is to do it for specific languages. I suspect the aliasing of 'Muslim' and 'Islam' was done with English entries in mind, where on the surface it makes a certain amount of sense (e.g. we have 'Muslim finance' as an alias of 'Islamic finance' and 'Christian' as an alias of 'Christianity'). A third possibility is to create a separate label, something like 'Muslim usage' or 'Muslim speakers', which makes it clear that the term is used by particular speech communities. Note that the advantage of doing it in a language-specific fashion is we can create associated categories, such as Category:Muslim Bengali, to categorize such terms, which wouldn't make so much sense if done language-independently. Finally, the adjective-noun issue you're bringing up isn't limited to this case; there is for example the issue of 'British India' (English terms formerly used in British India) vs. 'British Indian' (English terms currently used by Brits of Indian background).
BTW if you think the terms should be disentangled language-independently, you can see all current uses of the label 'Muslim' here: Special:WhatLinksHere/Wiktionary:Tracking/labels/label/Muslim (there are only 9 of them). Benwing2 (talk) 21:58, 7 May 2024 (UTC)Reply
@Benwing2: I think the 'Muslim' (etc.) tag should be detached from the 'Islam' label and made into an independent label and placed under the Module:labels/data/topical so that, as you say, it can generate associated categories, something like Category:Bengali Muslim speech (similar to Category:English women's speech terms, a minor difference between 'Muslim Bengali' as the label I'm proposing should be shed of its religious connotations as much as possible).
  • you can see all current uses of the label 'Muslim' here – Thank you for this! As far as I can see, apart from marabout, all of the other terms should be placed under my proposed label, as that's what was probably implied. Note how the 'Muslim' tag in মিঞা (mĩa) was encapsulated with Template:a (added by an IP), not the 'Muslim' label – likely because the 'Muslim' label appends the lemma to Category:Islam which doesn't fit. نعم البدل (talk) 02:21, 8 May 2024 (UTC)Reply
@نعم البدل OK, let's see if there are any objections/comments, and if not I'll make this change in a few days. Benwing2 (talk) 03:04, 8 May 2024 (UTC)Reply
Yeah no worries! نعم البدل (talk) 17:34, 8 May 2024 (UTC)Reply

──────────────────────────────────────────────────────────────────────────────────────────────────── @Benwing2, نعم البدل For a while there was a category named CAT:Musalman Gujarati, which is now empty. The handful of terms that were in it were moved to CAT:Gujarati dialectal terms. It would be helpful if there is a category named something like CAT:Gujarati Muslim speech as a replacement for CAT:Musalman Gujarati.

There is a phenomenon known as being a Cultural Muslim, but not a practising Muslim, who might use the terms in a category such as CAT:Muslim speech but not necessarily identify with the terms in CAT:Islam. The same would probably be applicable to other faiths.

Would greetings such as salaam alaikum that are associated with a Muslim context but may or may not be intended to be Islamic be in the proposed CAT:Muslim speech alongside CAT:Islam? For this particular term, it says on Wikipedia that it is ‘common among Arabic speakers of other religions (such as Arab Christians and Mizrahi Jews)’. The usage notes section of नमस्ते says ‘it is often considered gracious to greet someone in their religion’s greeting’ [even if that differs from their own religion]. Kutchkutch (talk) 03:44, 10 May 2024 (UTC)Reply

@Kutchkutch: I might be drifting away from the subject a little since I'm a little interested in this :) The case with salaam alaikum is slightly complex, though. In Arabic, it's a common greeting, and used by people who follow Abrahamic faiths. I'm not really sure about the exact perception of that phrase in Arabic but in Urdu, it's sometimes the same, people who speak Urdu, regardless of their faith, might use that term, but some hardliners might be of the opinion that it's even forbidden to say 'Salam' to a non-Muslim, while other Muslims might not even bat an eye to the other's faith, and a label might not even be considered. Generally, I would say it applies to a CAT:Muslim speech (but not Category:Islam) because of alternatives like آداب (ādāb) being considered more 'neutral'. Is नमस्ते (namaste) considered to be inherently a Hindu phrase, as is generally the perception of Urdu speakers – even when it comes to Hindi, or is it somewhat neutral? نعم البدل (talk) 01:44, 11 May 2024 (UTC)Reply
@نعم البدل: Thanks for the clarification about سَلام عَلَیکُم (salām 'alaikum).
  • Is नमस्ते considered to be inherently a Hindu phrase, …when it comes to Hindi, or is it somewhat neutral?
  • With respect to this proposal, नमस्ते and नमस्कार could go in CAT:Hindi Hindu speech. However, there is inherently nothing Hindu about the words नमस्ते and नमस्कार in and of themselves other than Sanskrit being the liturgical language of Hinduism (similar to how Arabic is the liturgical language of Islam). What may be considered inherently Hindu/Buddhist/Jain/Sikh about नमस्ते and नमस्कार is when the salutation (and related hand gesture 🙏) is toward a deity rather than an actual person.
  • Although the words नमस्ते and नमस्कार are found in Vedic literature in the context of worshipping Hindu deities, the words themselves are formations derived from नमस्, which is cognate to نماز. نماز was probably associated with Zoroastrianism rather than Islam before the Islamic conquest of Persia, and this is indicative that the term was not inherently bound to a particular religion.
  • Even though नमस्ते and नमस्कार are considered Hindu greetings, they seem to be neutral when speaking Hindi because it may only be inappropriate to use them if both the speaker and listener belong to a community that has its own community-specific greeting such as सलाम अलैकुम (salām alaikum) among Muslims, जय जिनेंद्र (jay jinendra) among Jains and सत श्री अकाल (sat śrī akāl) among Sikhs. The reason for this may be that India is 79.8% Hindu (according to the 2011 census). If there are no overt indicators to guess the other person’s religion when talking to strangers, using the Hindu greetings (alongside the English greetings) may be considered as neutral since there is an 80% probability that the other person is a Hindu.
──────────────────────────────────────────────────────────────────────────────────────────────────── Thinking about the "Judeo-Urdu" vs "Urdu Jewish speech" question has me wondering how sensible labels/categories for "Jewish speech", "Muslim speech" etc really are...though I don't know what a better alternative is.
In theory, if we add a "Jewish speech" label, all of our entries in any Judeo-X lect, whether we treat it as a distinct language ("Judeo-Italian", "Judeo-Tat") or a dialect ("Judeo-Arabic"), could simultaneously gain this new "Jewish speech" label, by definition, no? Arguably a bulk of our Hebrew and Yiddish entries would also gain it. But is that useful? (So maybe, for such languages, we forgo the label? But where is the line? Do we have a Hindu-speech Hindi category, or do we assume the default for Hindi is Hindu and exceptions must be specified? But even more Bangladeshis are Muslim than Indians are Hindu...)
On a practical level, I worry that users will not grasp or maintain a distinction between "Muslim" and "Islam", because "it's mostly Muslims who use [such-and-such Islamic-religion-related word]" → "I'll label it 'Muslim'" is just too logical a train of thought, as is "only people who believe in Islam use this word" → "I'll label it 'Islam' [even if the word just means 'man' and not a per se religious concept]", so it'll be a perpetual maintenance task to keep the labels straight. I also wonder... would nonreligious people from 'traditionally Muslim' areas not use (e.g.) মিঞা? Is using মিঞা really bound up in being Islamic, something only Muslims do—and if so, why is it not then a {{lb|en|Islam}} term? I wonder if this is not better handled (like also the ostensible "English women's speech terms")* by usage notes, that a particular term is typically used by people from "culturally Muslim" communities...? But I concede that even usage notes imply that if there are many such terms, they could be in a category (and if something is a dialect, even a cultural dialect, well, we do often have labels for that), even if I wonder if it would be possible to find clearer wording ("culturally Muslim"?)...
Does Judeo-Urdu, in particular the spoken form, have a lot of words in common with "general" Urdu? Is the main distinction that Judeo-Urdu is Urdu written in Hebrew script, or are there pervasive "dialectal" differences e.g. in how vowels are pronounced or how words inflect? If, in speech, lots of words are held in common between Jewish speakers' Urdu and other people's Urdu, then it might be weird to call those common words "Jewish speech" words solely because Jews tend to use a different script in writing, no? (Conversely, if lots of words are different and Judeo-Urdu is its own lect, whether an independent language like Judeo-Italian or a dialect like Judeo-Arabic, is there much benefit to categorizing things as both "Judeo-Urdu" and "Urdu Jewish speech"? But as I said above, I suppose we could set it up so that the labels that would generate "Urdu Jewish speech" and "Judeo-Urdu" were aliases for Urdu and only generated one of those categories no matter which one was entered...)
This all seems...thorny.
(*Re "English women's speech": "Women's speech" does not seem to be a distinct lect in English the way it is in e.g. Sumerian ... but until very recently, the nature of our label- and category- system meant that any label that a single small language needed, had to be put into the singular big mishmash everyone saw presented as "labels that are available to all languages", so various people tried to find ways to apply myriad Sumerian-and Chinese- etc- specific labels to English... so I am tempted to change the few entries that use that label in English to instead have usage notes, where such notes/label would even be accurate, saying mostly women use it... and to restrict the label to only those languages which actually have distinct women's speech registers...) - -sche (discuss) 19:25, 12 May 2024 (UTC)Reply
@-sche You've made a lot of good points. I agree with you about converting the terms in CAT:English women's speech terms into Usage notes. Interestingly, all but one are primarily used in foreign contexts; possibly the native languages in those contexts do have a women's register. And the one remaining (bestie) seems questionable; it has a cutesy feel to it, which is probably why it's being considered "women's speech" but I'm sure you can find examples of men using it. As for Judeo-Urdu etc. I think any variety that is predominantly used by a particular ethnic community should probably not be redundantly tagged using that community's speech tag. Hence you could have e.g. "Sikh speech" or "Jain speech" in Hindi but not "Hindu speech". Same goes e.g. for Yiddish and Ladino being tagged as "Jewish speech". Benwing2 (talk) 21:31, 12 May 2024 (UTC)Reply
Sure we should have the labels. In other cases they are or have been even L2, as Christian Palestinian Aramaic and Jewish Palestinian Aramaic, the situation with information exchange in the past was of course more seclusive. For Arabic we agreed that the separate codes were exaggerated, but there were cases where I had to label Moroccan Arabic terms as Jewish-only, and even for Serbo-Croatian we have terms only working for Muslims.
Having to label most Hebrew, Yiddish and Ladino terms as Jewish is a strawman, Arab Israelis do great, of course etymologically many terms used by the minority will have foundation in the historical religion of the majority, the question is whether the respective frequency differs significantly. Fay Freak (talk) 21:51, 12 May 2024 (UTC)Reply

Englishman picture[edit]

So User:Shoshin000 (among other trollish activities) has been insisting on adding a picture of an angry football hooligan as the picture of "Englishman". I reverted it once, he restored. I mention this because I know the modus operandi and soon I'll be accused of being a badmin. Check out the entry and you know the previous picture was nicer. Equinox 22:47, 7 May 2024 (UTC)Reply

I personally think your picture is better (although I wonder, do we need a picture to illustrate this?). Benwing2 (talk) 00:00, 8 May 2024 (UTC)Reply
Honestly, I like Shoshin's pic, as it's more stereotypical.[1] There's nothing inherently Englishman-y about Eq's pic, besides the depicted person being English.
[1] Then again, that's a good argument against the pic. CitationsFreak (talk) 03:25, 8 May 2024 (UTC)Reply
It could be argued that pictures of nationalities, if they exist at all, should show someone of that nationality in characteristic clothing (although that is probably more appropriate for nationalities that actually have characteristic clothing that most people wear on a day-to-day basis). OTOH it's in general very hard to capture a nationality in a single picture (for this reason, Wikipedia usually supplies a whole collection of pictures to illustrate a nationality), and in any case this is more encyclopedic than dictionaric (a real but rare word). Benwing2 (talk) 04:06, 8 May 2024 (UTC)Reply
Yeah, I was thinking that a collage would be best. I'm not sure what a recognizable British outfit would be, and having one person stand in for Britain could imply that British people all are X. Highly unlikely, but possible. CitationsFreak (talk) 04:12, 8 May 2024 (UTC)Reply
I don't think nationalities should have photos at all, but I also disagree that File:ENG-BEL (6).jpg is "a picture of an angry football hooligan". The person in that photo doesn't look angry, nor is he doing anything hooliganish. His Englishness is clearly shown by the St George's Cross painted on his face. He arguably does illustrate [[Englishman]] better than the photo of Greg Rutherford, since Rutherford is representing the entire UK (not just England) in his photo. All that said, however, it is probably better to leave such entries unillustrated to avoid stereotyping. —Mahāgaja · talk 08:07, 8 May 2024 (UTC)Reply
I agree, this is not an entry that requires an image. Vininn126 (talk) 08:09, 8 May 2024 (UTC)Reply
Aren't photos appropriate where there is an attestable, probably dated and often derogatory or demeaning, definition of a stereotype? Eg, Bavarians with lederhosen, Prussians with spiked helmets, Mexicans with sombreros and/or serapes.
There is no such definition here, nor would I expect us to attest any such definition. DCDuring (talk) 17:34, 8 May 2024 (UTC)Reply
I don't really see this picture as a problem, really, even though I wouldn't pick it myself. It'd probably be fine as part of a collage. Theknightwho (talk) 17:51, 8 May 2024 (UTC)Reply

Fixing Telugu rhymes[edit]

For years now, User:Rajasekhar1961 has been adding Telugu rhymes written in Telugu script instead of IPA. There is a special hack in Module:rhymes to deal with this, but IMO Telugu should (obviously) use IPA for rhymes, just like all other languages. Does anyone object to this? Can anyone out there read Telugu script well enough to tell me if the rhymes listed under Rhymes:Telugu (e.g. Rhymes:Telugu/రం) and Category:Rhymes:Telugu are even salvageable, or should just be nuked? I don't know much about Telugu but scripts are generally not 1-to-1 mappable to IPA, so I don't know what it means to have a rhyme listed using Telugu script. Benwing2 (talk) 00:40, 8 May 2024 (UTC)Reply

Strongly agree. Theknightwho (talk) 11:40, 8 May 2024 (UTC)Reply
@Benwing2, Theknightwho Rajasekhar1961 has certainly put effort into creating CAT:Telugu rhymes. However, unless the definition of a rhyme in a Telugu or Dravidian context differs from
‘the second part of a syllable, from the vowel on, as opposed to the onset’
you are correct in pointing out that these do not appear to have been done correctly. From an orthographic perspective, the final consonant (or consonant cluster) followed by the final diacritic (or the inherent schwa) of a word written in Telugu script (which is a Southern Brahmic abugida) does not constitute a rhyme. The entries in CAT:Telugu rhymes categorise words by word-final syllables rather than rhymes because the onset is included.
A Telugu editor could probably rectify the words mentioned on the entries in CAT:Telugu rhymes. However, even if there is a user with the appropriate background to do so, it would be a lot of work, and it would be the equivalent of deleting the entries currently in CAT:Telugu rhymes and starting over again. Kutchkutch (talk) 11:13, 10 May 2024 (UTC)Reply
@Kutchkutch Thanks. User:Rajasekhar1961 can you comment on why you did this? If I don't hear from you in a few days I will go ahead and delete all the Telugu rhymes. Benwing2 (talk) 14:50, 10 May 2024 (UTC)Reply
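To make the distinction raised above concrete (a rhyme runs from the vowel on and excludes the onset, so grouping words by their whole final syllable is not the same thing), here is a minimal Python sketch. Everything in it is an assumption for illustration: the vowel set is a toy inventory, not a Telugu phonology, the syllabified transcriptions are made up, and the function names are hypothetical rather than anything in Module:rhymes.

```python
# Toy vowel inventory for illustration only (an assumption, not a Telugu phonology).
VOWELS = set("aeiouəɛɔ")

def split_syllable(syllable: str) -> tuple[str, str]:
    """Split one transcribed syllable into (onset, rhyme) at its first vowel."""
    for i, ch in enumerate(syllable):
        if ch in VOWELS:
            return syllable[:i], syllable[i:]
    # No vowel found: treat the whole string as onset, with an empty rhyme.
    return syllable, ""

def final_rhyme(syllables: list[str]) -> str:
    """Return the rhyme of the last syllable, i.e. without its onset."""
    return split_syllable(syllables[-1])[1]

# Two made-up syllabified words whose final syllables differ only in the onset.
word_a = ["pu", "ram"]
word_b = ["ka", "lam"]

same_final_syllable = word_a[-1] == word_b[-1]            # compares "ram" with "lam"
same_rhyme = final_rhyme(word_a) == final_rhyme(word_b)   # compares "am" with "am"
```

Under these assumptions the two words end in different syllables but share a rhyme, which is the case that a category keyed on the full final syllable (onset included) would miss.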

Kwami is messing with translingual entries, again[edit]

Just want to make sure there are some eyes on Kwami, as they've been making mass edits to Translingual entries that seem... worrying. After being reverted by @Theknightwho and @Benwing2 for deleting the translingual section, Kwami has recently begun deleting all the definitions from the translingual section instead.

I reverted all (but one) of the single character edits they've made today. However, they've been editing hundreds of TL entries and I have no idea how many entries are affected, as I've been very busy recently and can't check.

I'm not sure how bad the situation is so I don't want to "call out" Kwami. Just want to make sure people are aware before it becomes out of hand, like the last time this was discussed on here. — Sameer مشارکت‌هابحث﴿ 23:54, 8 May 2024 (UTC)Reply

@Sameerhameedy Thank you. I have blocked him for a month this time; I am getting seriously sick of this. I think he has used up all his lives; next time we should consider a permablock. Benwing2 (talk) 00:38, 9 May 2024 (UTC)Reply
Thank you, I'm also a bit annoyed since Kwami has gotten so many warnings and continues to do the same action. Now, Kwami has indicated that they will actually start a discussion on this issue before acting. There's no way to know if Kwami will actually follow through on that statement, but hopefully they do, so we don't have to do this every month. — Sameer مشارکت‌هابحث﴿ 00:51, 9 May 2024 (UTC)Reply
Just to clarify, these weren't random articles. I went through the whole Latin Extended Additional block and replaced physical descriptions (e.g. "the letter N with a line below") with requests for definition. I didn't delete actual definitions that would tell the reader what the letter meant or what it was used for.
Sameer, the discussion is the next thread. kwami (talk) 06:16, 10 May 2024 (UTC)Reply
@Kwamikagami That is exactly the issue. You are continuing to fail to see that there is no consensus for doing what you did, after 10+ times that you've been asked to get consensus *BEFORE* doing mass changes. If you're not seeing this now, I doubt you will ever see it, and if you're not willing to defer to and respect consensus, you're in for a permablock. Benwing2 (talk) 08:01, 10 May 2024 (UTC)Reply
@Benwing2, Sameer was concerned that there may be many more such edits, so I clarified what edits I had made. That included the category of articles I had edited, and the kind of edits I had made on them. I thought they might find that helpful.
As to your point, I wonder how possible it is to get consensus to do anything here. Hopefully the discussion below will produce consensus. My hopes aren't high, given that previous discussions got nowhere, but you never know. kwami (talk) 09:06, 10 May 2024 (UTC)Reply
I've been at Wiktionary for almost 20 years and have never yet seen a Beer parlour discussion result in consensus, so my hopes aren't high either. —Mahāgaja · talk 09:16, 10 May 2024 (UTC)Reply
You can't have read many Beer Parlour discussions, then. Kwami is simply trying to convince themselves that what they're being asked to do is impossible, because they can't ever accept they're wrong about anything, ever. It's not complicated. Theknightwho (talk) 10:57, 10 May 2024 (UTC)Reply
@Mahagaja I have seen plenty of Beer Parlour discussions that result in consensus; not sure what you're referring to. Benwing2 (talk) 14:53, 10 May 2024 (UTC)Reply
I guess ironically there's no consensus if there's consensus? And while I think us Wiktionarians like to bicker and we often disagree over certain details and such, I do think there's enough cooperation, compromise, and agreement to say that plenty of threads end in consensus. Vininn126 (talk) 15:10, 10 May 2024 (UTC)Reply
If I decide that "q" is not a proper English letter unless followed by "u" and I want to get rid of all the English entries with a "q" not followed by "u", there is no way that I can get consensus for that via any process. That doesn't mean that I can go ahead and remove the English entries for words like Qatar and Iraq or even BBQ (it's not a proper abbreviation) because the usual process doesn't work. It means I should find something else to do. The unwritten question underlying all of this is "how can I get my way when I'm right and I can't get people to say they agree with me". Yes, the process isn't perfect, and sometimes doesn't work- but rejecting it entirely won't fix it. Chuck Entz (talk) 16:16, 10 May 2024 (UTC)Reply
That's why I'm here. The question is straightforward: do we have standards for what counts as a definition? If so, what are they? Where can I find them?
In this case, does a graphic description count as a definition? Quite a few editors have said they do not, but there seems to be difficulty in implementing that.
Also, should we have a translingual section without providing evidence of translingual use? Especially when there is no definition in that translingual section?
Do we have consensus that such things should be tagged with RFDef or RFD, and how should I respond if I tag them and someone goes through and deletes the tags without discussion because they don't like the extra work?
It's fine to say 'go to RFD', but why spend months doing that if it should be obvious from the outset that they're not going to pass? That's a waste of everyone's time. That's why I'd like some concrete standards to follow. I assume Wikt must have standards; if you could just show me where they are (I don't see anything in the help pages), I could add a link to my user page and refer to them when making edits. Then instead of arguing over every edit, I could point to the standards and show that I've been following them, or they could point to them and show that I've been violating them. I don't mean about the RFD process, but about the content of our articles. kwami (talk) 19:37, 10 May 2024 (UTC)Reply
An example to provoke thought is zebra. A definition as 'an equid with prominent black and white stripes' would be an accurate description, and would work even if they were not a clade. (An early cladistic study concluded that they were not - morphology is a poor guide to details of relationships.) For Unicode characters - and Unicode provides an important high level classification of glyphs - the general principle is that combining marks are distinguished on the basis of shape. Therefore, graphic descriptions are quite relevant for 'precomposed' characters.
If there's an objection to a claim of translinguality, then raise a request for verification.
If tags are deleted without reason, then complain to the Beer Parlour if you can't find a helpful admin.
What may seem obvious to you is not necessarily even true. There are some very obscure characters around, quite possibly restricted to expensive books.
Wiktionary has a general disdain for wikilawyering, so don't expect everything to be laid down. Wiktionary also seems quite bad at documenting things - I'd love a guide on the anatomy of a definition. --RichardW57m (talk) 12:56, 13 May 2024 (UTC)Reply
@RichardW57m For physical objects, a physical description is fine. And I think it would be fine if we had "the letter a with an acute accent, used for ...," where we went on to give its use or meaning. And a short description without definition would be fine under a 'description' section or even 'etymology', assuming it's accurate (many Unicode names are not, they're just labels that need to differ from all others). But these cases are like defining 'zebra' as "a word spelled Z-E-B-R-A", and placing it under 'translingual' because there are multiple languages that have such a word, even if they mean different things (maybe some languages use it only in the sense of a crosswalk, others only for the equid), and give it only the English pronunciation /ˈziːbrə/ because English is the most important language. (Yes, we have characters listed under 'translingual' even though we give them the pronunciation of a particular language.)
Many people have now said that the Unicode name of a character or emoji, or similar sum-of-parts description, is not appropriate on its own as the definition. The problem I've had is trying to implement that. I've been told to take it to RFD, but that generally doesn't work. It would be nice to have some agreement as to what counts as a definition. kwami (talk) 18:19, 13 May 2024 (UTC)Reply
"the letter a with an acute accent, used for ...,"

I don't know how many languages change the meaning of an accent per character. It would be much more efficient to say "the letter a with an acute accent" with the page acute accent explaining what the diacritic does. — SAMEER (؂؄؏) 18:49, 13 May 2024 (UTC)Reply
I'd be fine if we had consensus on doing that, but would think we'd want something more. Such a description often wouldn't say anything more than the Unicode name, and conflicts with our sum-of-parts criterion. Plus, people often just use the Unicode name even when it's not an accurate description. Another problem is that it creates a blue link that makes it look like we have a definition when we don't have a functional one. For people like me who use red links as a guide to creating missing articles, that can be a problem. Also, not all Unicode characters actually exist, some are errors. And some are rare enough that giving a graphical description as a definition doesn't do much for the reader, who may still not know what the character is used for or what languages it appears in. kwami (talk) 19:10, 13 May 2024 (UTC)Reply
The problem I've had is trying to implement that. I've been told to take it to RFD, but that generally doesn't work. - the small problem with this is that you didn't take things to RFD, though, and it won't become any more true just because you keep repeating it. It's very clear that Kwami will never, ever understand why their approach is wrong. Theknightwho (talk) 18:53, 13 May 2024 (UTC)Reply
I've taken dozens, possibly hundreds, of articles to RFD, or at least tagged them so that the people at RFD respond to them. I don't know why you keep denying that. You repeating things over and over doesn't make them true either. Repeated false accusations like this are one reason I have a hard time accepting that you act in good faith. kwami (talk) 19:01, 13 May 2024 (UTC)Reply
@Kwamikagami I've found one, plus this which isn't relevant. Are you referring to all those entries you mis-tagged with {{d}} (speedy deletion), which exists to avoid doing the RFD process for routine deletions? I've just remembered that, after I found this comment where you try to bullshit about that, as well. Good grief. Theknightwho (talk) 19:22, 13 May 2024 (UTC)Reply
Yes, the POV of anyone who disagrees with you is "bullshit", while your POV is "truth". Again, arguing in bad faith as you habitually do.
I don't recall which abbreviation of which template I used for which article. Some of them were probably speedies. Some were RFD. Some later on were RFDef (which I know you think is somehow illegitimate, but I maintain is still a valid use of process). kwami (talk) 20:17, 13 May 2024 (UTC)Reply
@Kwamikagami Saying I've taken dozens, possibly hundreds, of articles to RFD after taking two is bullshit, yes. Theknightwho (talk) 20:23, 13 May 2024 (UTC)Reply
@Kwamikagami Replacing a definition with {{rfdef}} is not a valid process. In general you should never delete content even if you don't like it. Benwing2 (talk) 20:40, 13 May 2024 (UTC)Reply
I understand that. I meant that rfdef itself is a valid process. kwami (talk) 20:48, 13 May 2024 (UTC)Reply
RFDef is not a process - it's just the request template {{rfdef}}. It can't be used instead of RFD, and what you've just said makes absolutely no sense when the only times you used RFDef were in an attempt to circumvent the RFD process. Theknightwho (talk) 20:55, 13 May 2024 (UTC)Reply
If RFD doesn't work, then RFDef is another possibility. If no-one can furnish a definition, then there's an argument that the entry should be deleted. How is that "circumventing" the process? Again, you attribute bad faith to anything you don't like, which simply shows bad faith on your part. kwami (talk) 21:01, 13 May 2024 (UTC)Reply
@Kwamikagami If RFD doesn't work You've only ever taken two things to RFD, and one of those wasn't an entry, so you cannot possibly make that claim. You also seem to be under the bizarre impression that you're entitled to delete entries in other ways if you don't get what you want out of the RFD process.
I'm out of patience with this complete and utter refusal to understand the problem, and it's pretty clear other people are as well. Theknightwho (talk) 21:08, 13 May 2024 (UTC)Reply
A few dozen articles were deleted, so somehow your count is off.
Not deleting entries in other ways, requesting completion or deletion in other ways. I didn't delete these entries I was blocked for. I replaced non-definitions with requests for definition -- and I won't do that again -- but the entries remained. kwami (talk) 22:02, 13 May 2024 (UTC)Reply
@Kwamikagami There's nothing wrong with my counting: you only brought two things to RFD. It's not difficult to understand. Theknightwho (talk) 22:13, 13 May 2024 (UTC)Reply
To clarify, it looks like @Kwamikagami only brought two articles to RFD, but tagged a bunch of articles for speedy deletion, which were deleted before the admins realized those speedy deletions were bogus. RFD is in general the correct process for requesting deletion of pages you believe ought to be deleted that don't meet the speedy deletion criteria; but IMO if you are going to request deletion of a large number of articles, you should not tag every article with RFD, just make a post in RFD with a title "all articles meeting such-and-such criteria" and give your reasons. Or alternatively, bring it up in the Beer Parlour if it's controversial and merits being seen more generally. Benwing2 (talk) 22:31, 13 May 2024 (UTC)Reply
Precisely. Theknightwho (talk) 22:32, 13 May 2024 (UTC)Reply
Okay, so that's what I'm doing: bringing it up at the Beer Parlour, as I was advised to do.
But if it's better at RFD than here, I can start a thread there.
As for "bogus", the opinion I got from other editors at the time was that, if an article had no real content, then it met the criteria and it should be deleted. That specifically included articles that consisted of nothing but the character box and Unicode name in the definition section. I didn't start this, but picked up from where I saw others acting. This has been happening for years, especially with emojis, where someone would go through and create batches of emoji articles defined simply as their unicode names, then someone else going through and deleting them, then someone else recreating them, etc. You can see that in their deletion histories. Less common with non-emoji characters, but there's a history of this there as well. So not only did I have no reason to think this was inappropriate, I was told it was appropriate and was something Wikt needed to keep on top of. kwami (talk) 00:45, 14 May 2024 (UTC)Reply
Yes, and those people would have expected you to take any entries like that to RFD, instead of unilaterally deleting them, as there needs to be consensus that they contain no content. How are you still failing to understand such a simple concept? Theknightwho (talk) 10:42, 14 May 2024 (UTC)Reply
What he wrote was "I've taken dozens, possibly hundreds, of articles to RFD, or at least tag them so that the people at RFD respond to them". Bickering about numbers is not constructive. The issue is whether or not he (or anyone) should be wholesale deleting translingual definitions and it seems pretty clear that the answer is no. —Justin (koavf)TCM 23:20, 13 May 2024 (UTC)Reply
I didn't see that as deleting definitions. I replaced Unicode names with requests for actual definitions. I won't do that again. kwami (talk) 00:47, 14 May 2024 (UTC)Reply
I appreciate that you're acting in good faith and trying to do what you think is best, but it's not clear to me that you're correct and I'm honestly very surprised that you keep on doing things that seem like big unilateral changes without consulting others first because this kind of complaint has come up repeatedly. —Justin (koavf)TCM 00:59, 14 May 2024 (UTC)Reply
Yes, it has. I need to be more careful. I could've just tacked the RFDef tag on the end of the description. kwami (talk) 03:31, 14 May 2024 (UTC)Reply
@Kwamikagami That's not the right thing to do either. IMO you need to get consensus *BEFORE* making changes to large numbers of pages, even just adding {{rfdef}}; if you can't get that consensus, don't make the changes. Benwing2 (talk) 03:44, 14 May 2024 (UTC)Reply
Okay. We'll see how the question below pans out. kwami (talk) 03:45, 14 May 2024 (UTC)Reply
The letter 'a' with a grave accent is a translingual description. The particular meaning of the grave obviously varies from language to language, and sometimes it may merely be an arbitrary diacritic, perhaps even just a word diacritic as in French. Translingually, the accent part is often used as a tone mark, but I think it can also be a symbol used as an input to stress assignment rules (though I may be being confused by my own private invention). Perhaps we need a vote on whether precomposed characters can be dismissed as sums of parts.
We even have some interesting language assignment questions. For example, 'ṉ' denotes the alveolar nasal of Tamil and Malayalam, but I think it's arguable that it isn't a letter of those languages. I'm a bit bothered by a pair of Lithuanian diacritics which don't seem to be any part of standard Lithuanian. --RichardW57m (talk) 12:41, 14 May 2024 (UTC)Reply
Yes, we define 'ṉ' as the Latin transliteration of letters in Tamil and Malayalam scripts, and place that under a translingual heading rather than under Tamil or Malayalam headings. I think that's appropriate. For á, we don't have a translingual heading, as no-one has come up with a translingual use/definition. Not saying one doesn't exist; e.g. there's IPA [a] with high tone, but as you say that's SOP. Again, I think that approach is appropriate. The question then is whether we want to make this our general approach, and reserve the translingual section for definable translingual uses. (Perhaps if we had a translingual use of 'á', we might also give the SOP use in IPA for clarity, but that wouldn't be enough to create the translingual section in the first place.) kwami (talk) 18:12, 14 May 2024 (UTC)Reply
In favor of this. Vininn126 (talk) 09:41, 10 May 2024 (UTC)Reply

Do descriptions count as "definitions"?[edit]

I'm not being facetious here. This is a serious question for something I haven't understood for a long time.

For instance, in the article á, would "the letter a with an acute accent" be a valid definition? If so, should such descriptions be added to all letters? If not, should they be removed (perhaps placed under a "Description" heading instead)? And if not, and the only material for an article is such a non-definition, should the entry be tagged as needing a definition, or the article tagged for deletion for having no content?

I suspect that if I were to add a definition to cat as "the word spelled C-A-T", I would be blocked for vandalism. I don't see any meaningful difference between that and defining á as "the letter a with an acute accent". I've been told this is a straw-man argument, but I really don't understand what's appropriate in our entries if graphical descriptions are allowed as actual definitions.

The same applies to emojis, of course. Should an emoji of a face with tears be defined as "a face with tears", or should the definition be what it means and what it's used for? kwami (talk) 00:29, 9 May 2024 (UTC)Reply

@Kwamikagami I agree that "the letter a with an acute accent" is not a good definition. If á had a translingual section it should at least explain how the letter is typically used across languages. I assume it usually represents some kind of /a/? Ioaxxere (talk) 01:43, 9 May 2024 (UTC)Reply
A definition that depended on users understanding IPA, however, would be unsatisfactory. DCDuring (talk) 01:56, 9 May 2024 (UTC)Reply
Personally, I don't see the point of a translingual section, except for things like the IPA or IAST transliteration, and this particular case would only be sum-of-parts in such cases.
But my question is what should be done with articles that have such non-definitions. Since starting this discussion, I was blocked for pasting [rfdef] tags yesterday on a bunch of articles in place of such descriptions.
So,
  1. since it is not a good definition, can it be removed?
  2. should it be replaced with a request for definition, or should the empty section be deleted?
kwami (talk) 01:56, 9 May 2024 (UTC)Reply
@Kwamikagami: I think it's better to improve the definition rather than adding a ton of {{rfdef}}s as this creates lots of work for other editors. Ioaxxere (talk) 02:09, 9 May 2024 (UTC)Reply
I agree, and I've been doing that where I can. But how do I improve the definition when there is no definition? What is the translingual definition of a letter that does not have translingual use? What is the definition of a letter that has no evidence of any kind of use? What I've been tagging are cases where I can't find any definition to provide.
The reason I've been adding rfdef tags is that I'm not allowed to delete empty entries.
So, if there's an empty article or section, one that has no content except a character box, and no definition of what's supposedly being defined, what's the solution? Do we leave it as a joke, or do we try to improve it? If we want to improve it, how do we do that, when there's no available data to improve it with?
If someone added a bunch of articles, all with the definition being "it's a word", shouldn't we at the very least tag them as needing actual definitions, even if that creates work for people? kwami (talk) 02:17, 9 May 2024 (UTC)Reply
I've gone through and added definitions to hundreds of these articles. The ones I've tagged are ones where I can find no definition to give. It's a choice of adding a tag or leaving Wikt looking like a joke. kwami (talk) 02:24, 9 May 2024 (UTC)Reply
Clearly, most definitions are simple descriptions, prototypically, for nouns, having a hypernym and differentia. The descriptions are also supposed to be useful to users. "The word spelled C-A-T" is not useful being redundant to the graphic representation of the headword itself, thus a straw-man. For a Latin letter with a diacritical mark it might be useful for some that the definition explained how the so-marked letter differs from the Latin character without the mark or those with other marks in each relevant character set.
A definition of a word naming an emoji might include a description as well as what the emoji is understood to mean, like a good definition of green light. The entry might also have the appropriate image, too, constituting an ostensive definition, redundant to the headword in the case of Latin characters diacritically marked. DCDuring (talk) 01:56, 9 May 2024 (UTC)Reply
> "The word spelled C-A-T" is not useful being redundant to the graphic representation of the headword itself"
But "the letter a with an acute accent" is equally redundant to the graphic representation of the headword itself. So no, it's not a straw-man, it's reductio ad absurdum.
Because our users include many people who are not familiar with diacritical marks on Latin letters we give them an explanation of what they are looking at. Those descriptions also help those who, like me, don't have great vision and can't necessarily discriminate among the various diacritical marks and don't necessarily know the names of those marks. DCDuring (talk) 12:29, 9 May 2024 (UTC)Reply
I have no problem with that. That's what the 'description' section is for. But it's not a definition.
We also have a pronunciation section for people who don't know how to pronounce a word. But again, the pronunciation of a word is not its definition.
In many cases I've moved the description to a description section, and tagged the definition section as needing a definition. But then people get annoyed that I'm creating work for them, because now they're expected to treat Wikt as an actual dictionary.
Much of the opposition to improving articles seems to center around it being more important for Wikt to be correctly formatted and to look good, than for it to actually contain any content or be useful as a dictionary. kwami (talk) 21:58, 9 May 2024 (UTC)Reply
Definitions of nouns are not descriptions of the word, but of the meaning of the word. Orthography and pronunciation belong in other sections: they are not the definition itself. Why should graphemes (incl. emojis) be different? Many have (graphical) description, etymology and pronunciation sections. Those cover the description of the letter as a mark on paper or as said. A definition concerns itself with meaning. If no meaning is provided, so the reader can't tell what the symbol is for, then we're not providing a functional definition. kwami (talk) 02:00, 9 May 2024 (UTC)Reply
BTW, we also have cases of 'translingual' sections with zero evidence of translingual use. Sometimes letters are specific to a particular language, yet they have a translingual section with no definition. All that does is push the actual definition down lower on the page, where you can't see it without scrolling. That is a minority situation, but we do have hundreds of articles with "the letter a with an acute accent"-type descriptions as their 'definition'. kwami (talk) 02:10, 9 May 2024 (UTC)Reply
I guess my question is, if it's not acceptable to add these fake definitions, and it's not acceptable to tag them for improvement, why is it not acceptable to delete them? kwami (talk) 02:29, 9 May 2024 (UTC)Reply
@Kwamikagami What's not acceptable is deleting them without going through RFV or RFD. Come on, we've said this many times by now. Theknightwho (talk) 02:36, 9 May 2024 (UTC)Reply

English anagrams[edit]

English anagrams haven't been updated in a while. Could someone run a bot to update them? Maybe @Kiril kovachev, Benwing2? Ioaxxere (talk) 01:43, 9 May 2024 (UTC)Reply

@Ioaxxere I can try, but I'm not sure if I trust myself to do it properly. Specifically, the part about which characters (like punctuation) should be removed when comparing two words. Kiril kovachev (talkcontribs) 12:41, 9 May 2024 (UTC)Reply
@Kiril kovachev: It doesn't matter too much, since the vast majority of English terms don't have any special characters. Punctuation (periods, commas, etc.) as well as different casing should definitely be ignored, but I have no preference with respect to diacritics. Ioaxxere (talk) 15:45, 9 May 2024 (UTC)Reply
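
The comparison rule described above (ignore punctuation and casing, with diacritics left as an open choice) could be sketched as follows. This is only an illustration, not the code any existing bot uses; the names `anagram_key` and `group_anagrams` and the `strip_diacritics` switch are made up for the example.

```python
import unicodedata
from collections import defaultdict


def anagram_key(term, strip_diacritics=False):
    """Return a sorted-letter key; two terms are anagrams if their keys match.

    Hypothetical helper: casing and non-letters (spaces, periods, hyphens,
    apostrophes, digits) are ignored. Diacritics are kept unless
    strip_diacritics is True, since no preference was stated for them.
    """
    text = unicodedata.normalize("NFC", term).casefold()
    if strip_diacritics:
        # Decompose, then drop the combining marks (é -> e).
        text = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
    letters = [ch for ch in text if ch.isalpha()]
    return "".join(sorted(letters))


def group_anagrams(terms, strip_diacritics=False):
    """Group terms sharing a key; only groups with 2+ distinct terms are returned."""
    groups = defaultdict(set)
    for term in terms:
        groups[anagram_key(term, strip_diacritics)].add(term)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

With this key, "Listen" and "silent" land in the same group, and "e.g." is compared as "eg". Whether "café" should match "face" then depends only on the `strip_diacritics` switch, so the decision can be deferred.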

becocked, and whether we want Trivia sections[edit]

Recently, Trump used the word becocked, which attracted some attention because it's an unusual word, and quite a lot of people thought he'd made it up (even though he didn't). Is this the kind of thing we want to note in trivia sections? To me, it seems like the kind of thing no-one will care about in a month, and that it adds pointless clutter. Pinging @Ioaxxere, who originally added it as a usage note, but later changed it to the little-used Trivia heading. Theknightwho (talk) 02:12, 9 May 2024 (UTC)Reply

His use of the term attracted some media coverage ([4] [5]) making it probably the most notable event in the history of becocked. Does a single sentence about this really add so much clutter? Ioaxxere (talk) 02:18, 9 May 2024 (UTC)Reply
Ask yourself this: in two years' time, if someone came across this in an entry, would they feel like this addition was the cringeworthy result of terminally online recency bias? Almost certainly yes. It's basically just celebrity gossip. Theknightwho (talk) 02:21, 9 May 2024 (UTC)Reply
"X said this word" should never go in Trivia/Useful Notes. It can go as a quote, however. CitationsFreak (talk) 03:08, 10 May 2024 (UTC)Reply
I agree completely. I think trivia sections should chiefly be used for things like noting that a word is thought to be the longest in a particular language, has no vowels, doesn’t rhyme with any other word—that sort of thing. — Sgconlaw (talk) 11:35, 10 May 2024 (UTC)Reply
I agree too. PUC 19:24, 11 May 2024 (UTC)Reply

Manipuri vs Meitei language (moved to RFM)[edit]

Discussion moved to WT:RFM#Manipuri vs Meitei language.

Performing bulk edits for Bengali/Bangla[edit]

Discussion moved from Wiktionary talk:Beer parlour/2024/May.

I'm a NLP researcher who uses Wiktionary to collect pronunciation data. As part of this effort we have noticed various inconsistencies in phonemic transcription. For example,

1. According to various sources (Khan, 2010; Dasgupta, 2003; Ferguson & Chowdhuri, 1960; Chatterji, 1970), Bengali has only one glottal fricative, the voiceless /h/, so /ɦ/ > /h/. E.g.: অকৃতোদ্বাহ 'bachelor' /ɔ.kri.t̪od̪.ba.ɦo/ > /ɔ.kri.t̪od̪.ba.ho/. This IPA symbol is not correctly represented in Wiktionary's Bengali transliteration guide. Therefore, I propose to edit the guide.

2. The correct phonemic transcription (ref. Dasgupta, 2003; Ferguson & Chowdhuri, 1960; Chatterji, 1970) for affricates should include the tie-bar, so /tʃ, tʃʰ, dʒ, dʒʱ/ > /t͡ʃ, t͡ʃʰ, d͡ʒ, d͡ʒʱ/. E.g.: চরম 'extreme' /tʃɔɾom/ > /t͡ʃɔɾom/, ছায়াছবি 'film' /tʃʰae̯atʃʰbi/ > /t͡ʃʰae̯at͡ʃʰbi/, জল 'water' /dʒɔl/ > /d͡ʒɔl/, ঝিনুক 'sea shells' /dʒʱinuk/ > /d͡ʒʱinuk/. This tie-bar is not included in Wiktionary's Bengali transliteration guide. I propose to include the tie-bar in the affricate symbols.

3. According to various sources (Khan, 2010; Dasgupta, 2003; Ferguson & Chowdhuri, 1960; Chatterji, 1970), Bengali doesn't have the palatal plosives /c/ and /ɟ/. Instead it has postalveolar affricates (ref. https://en.wiktionary.org/wiki/Wiktionary:Bengali_transliteration). Therefore, /c/ > /t͡ʃ/ and /ɟ/ > /d͡ʒ/. E.g.: অগোচর 'beyond one's knowledge' /ɔɡocɔr/ > /ɔɡot͡ʃɔr/, অগ্নিযুগ '(figurative) the age of revolution' /oɡniɟuɡ/ > /oɡnid͡ʒuɡ/.

Does there exist any tool or API that could allow us to apply bulk edits? If this sounds right, I will start to make corrections. Arundhatisgupta (talk) 16:06, 9 May 2024 (UTC)Reply

I relocated this post because it was in the wrong place. — Sgconlaw (talk) 16:20, 9 May 2024 (UTC)Reply
The IPA has long held that the tie bar is not necessary when transcribing languages that don't distinguish affricates from stop-fricative sequences. If Bengali doesn't distinguish /t͡ʃʰ/ from ?/tʃʰ/, then our current transcription convention is fine.
In describing the phonetics of a language, you want to be as precise as possible, so the ties are a good thing. But with a key like we have, they're not necessary.
The tie bars clutter a transcription and can make it more difficult to read. If we did implement them, it would probably be better to use the under-tie, ⟨t͜ʃʰ⟩. That's generally more legible because our eyes pick up details better at the top of a symbol, so the under-tie is less distracting. kwami (talk) 05:12, 10 May 2024 (UTC)Reply
While "the tie bar is not necessary", it is good practice to include it and most languages on Wiktionary do. I don't see why Bengali would be an exception. Thadh (talk) 11:36, 10 May 2024 (UTC)Reply
I agree with @Thadh.
@kwami It is not necessary for English either. Why did you include it in English? Also, there is no consistency. If you think it is not necessary then make sure that you maintain that consistency. E.g.: অগচ্ছিত 'not entrusted to anyone' /ɔɡot͡ʃt͡ʃʰit̪o/ has the tie bar but চরম 'extreme' /tʃɔɾom/ doesn't. What do you think about that? Arundhatisgupta (talk) 16:37, 10 May 2024 (UTC)Reply
Whichever convention is chosen, it should be consistent, and should match the key. kwami (talk) 19:19, 10 May 2024 (UTC)Reply
There will be confusion when /t/ and /ʃ/ occur together but do not form an affricate. E.g. কুৎসা 'slander' /kutʃa/ and বচসা 'contention' /bɔtʃoʃa/. Without a tie-bar they seem to have the same pronunciation of /tʃ/, but the correct pronunciations are /kutʃa/ and /bɔt͡ʃoʃa/. Arundhatisgupta (talk) 16:51, 10 May 2024 (UTC)Reply
That can be handled as ⟨kut.ʃa⟩ and ⟨bɔtʃoʃa⟩ or as ⟨kutʃa⟩ and ⟨bɔt͜ʃoʃa⟩ -- or, for maximal clarity, as ⟨kut.ʃa⟩ and ⟨bɔt͜ʃoʃa⟩. Just as long as we're consistent, or people will get really confused. kwami (talk) 19:22, 10 May 2024 (UTC)Reply
I personally think there should be a tie bar and it should go above, which is the more common practice. Benwing2 (talk) 21:06, 10 May 2024 (UTC)Reply
@Kwamikagami 1. If you are introducing a syllable break (indicated with a dot), then it should be applied consistently for all words in Wiktionary.
2. According to Wikipedia, undertie is used to represent linking (absence of a break) in the International Phonetic Alphabet. E.g.: /vuz‿ave/ (Ref. https://en.wikipedia.org/wiki/Tie_(typography)#cite_note-6) Arundhatisgupta (talk) 21:34, 10 May 2024 (UTC)Reply
Linking is used to override the orthographic spaces we insert between words in transcription. In that example, the words are //vuz ave// but the pronunciation is /vu.za.ve/. The /za/ forms a single syllable. That tie is not the same thing as the 'slur' tie used for affricates, which comes from musical notation (slurred notes). kwami (talk) 21:56, 10 May 2024 (UTC)Reply
I think that there should be a tie bar and it should go above, which is the more common and established practice for phonemic/phonetic transcription. It is important to maintain consistency within the language and across Wiktionary.
Is there any objection regarding other inconsistencies mentioned in the proposal? Arundhatisgupta (talk) 09:52, 11 May 2024 (UTC)Reply
@Arundhatisgupta No objections from me although I don't know enough about Bengali phonology to say whether e.g. the use of palatal plosives or affricates is correct. IMO the best way to go about making these changes is either manually or through AWB or JWB, which let you quickly do semi-manual changes based on regexes. Benwing2 (talk) 18:15, 11 May 2024 (UTC)Reply
@Benwing2 Could you please add me to Wiktionary:AutoWikiBrowser/CheckPageJSON ? Arundhatisgupta (talk) 14:39, 14 May 2024 (UTC)Reply
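
For the regex-based, semi-manual route suggested above, the three proposed substitutions could be prototyped roughly like this before being ported to AWB/JWB rules. It is a sketch under assumptions: the function name `normalize_bengali_ipa` is invented, it assumes the input is a bare IPA string in which ASCII "c" only ever stands for the palatal plosive, and it is not an existing Wiktionary tool.

```python
import re

TIE = "\u0361"  # combining double inverted breve, i.e. the tie bar placed above


def normalize_bengali_ipa(ipa):
    """Apply the three proposed substitutions to one IPA string (hypothetical helper)."""
    # 1. Voiced glottal fricative -> voiceless /h/.
    ipa = ipa.replace("ɦ", "h")
    # 3. Palatal plosives -> postalveolar affricates, written with the tie bar.
    ipa = ipa.replace("c", "t" + TIE + "ʃ").replace("ɟ", "d" + TIE + "ʒ")
    # 2. Add the tie bar to untied affricates. The lookahead only matches when
    #    the fricative follows immediately, so dental t̪, already-tied t͡ʃ and
    #    dotted sequences such as t.ʃ are left untouched.
    ipa = re.sub(r"t(?=ʃ)", "t" + TIE, ipa)
    ipa = re.sub(r"d(?=ʒ)", "d" + TIE, ipa)
    return ipa
```

Note that step 2 cannot tell an affricate from a plain stop-plus-fricative sequence: an undotted /kutʃa/ for কুৎসা would also receive a tie bar, which is exactly the case raised earlier in the thread. Each hit would therefore still need human review, which is what the semi-manual workflow is for.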

How should we present Latin adjectives that inflect like nouns (or that are really appositive nouns?)[edit]

A few times now, I've been puzzled about how to handle showing the inflected forms of certain Latin third-declension adjectives that don't fit well into any of the usual adjective inflection patterns, because they show the endings typical for a noun instead. Currently, these seem to mostly be treated in our entries as third declension adjectives of "one ending", but I think there are some issues with the accuracy of this in terms of showing forms and usage.

A particularly clear case is certain rare words that are attested with adjectival function but that have the form of feminine nouns, such as silvicultrīx, -trīcis and Nīlōtis, -tidis (which have the forms respectively of Latin and Greek feminine agent nouns). The masculine counterparts would presumably be *silvicultor and *Nīlōtēs, but these do not to my knowledge occur, and in any case, we normally treat agent nouns as noun lemmas (distinct for masculine and feminine) rather than combining the masculine and feminine versions under one adjective lemma. Should we lemmatize such words as nouns and include a usage note saying that they're used appositively? Or should we put them as adjectives (as many dictionaries do) but include some kind of special headword and declension table coding to avoid showing masculine or neuter forms, which I think aren't accurate in this case? For example, Gaffiot marks silvicultrix as "adj. f".

Not quite as clearcut are cases like senex, iuvenis, mās that generally have the form of nouns, commonly function as either nouns or as adjectival or appositional modifiers of masculine or feminine nouns, but are extremely rare or unattested in the neuter. (I've found some neuter forms attested in some cases in New Latin.) Functionally, I think there isn't much difference between how mās and fēmina are used, but we treat mās as a noun or adjective and fēmina as only a noun. Urszag (talk) 02:49, 11 May 2024 (UTC)Reply

@Urszag We have a whole category Category:Latin first declension adjectives for words like amnicola and indigena that don't seem so different from the words you've cited, and there are unquestionably third-declension non-i-stem adjectives (e.g. vetus, concolor) that "show the endings typical for a noun", so I don't see an issue treating these as adjectives. Benwing2 (talk) 05:33, 11 May 2024 (UTC)Reply
Yes, we have that category for first-declension adjectives. The full inflection of those words as adjectives is actually a bit questionable also (there was an RFV that I closed based on New Latin examples, but I added some notes in Appendix:Latin_first_declension discussing how the neuter plural nominative/accusative/vocative forms in -a and dative/ablative forms in -īs are rather hypothetical and ambiguous, as they've often (at least since Priscian's time) been interpreted as belonging to a second-declension paradigm instead (e.g. that of indigenus)).
Third-declension non-i-stem adjectives such as vetus exist, but are rare (aside from comparative forms). When there are attested neuter forms distinct from the masculine/feminine forms (such as vetus in the accusative singular, or vetera in the nominative/accusative plural), this establishes that a word is formally distinct from an appositive noun (and also establishes whether the neuter plural ends in -ia or -a, which can't always be predicted from the forms of the ablative singular or genitive plural). But I think that such neuter forms are often unattested (except sometimes in very late periods of the language) and in that case it's arguably misleading to just present a single full declension table. E.g. I found iuvenia once in Medieval Latin and occasionally in New Latin, and a couple of New Latin cases of iuvena (both from the same author), but I think it's more misleading than not to present either of these as established or standard Latin forms: a late imperial-era grammarian says that this word simply lacks neuter plural forms. In cases like this, there's an existing parameter to mark an adjective as lacking neuter forms, so I ended up using that and mentioning other forms in usage notes. But in cases like silvicultrīx and Nīlōtis, I don't know how to best present the fact that they occur only as adjectival modifiers of feminine nouns: for now I've just removed the declension table from the second word (since several forms are unattested, or only attested in post-Classical texts, and the Greek origin makes it tricky to actually infer what missing forms would be), but for silvicultrīx it seems fairly clear that it would simply inflect like victrīx. If we continued to categorize these as adjectives, does it sound reasonable to establish a parametric way to mark them as feminine-only?--Urszag (talk) 06:33, 11 May 2024 (UTC)Reply
@Urszag Yes, I think so. We have done that for some other languages, e.g. French adjective headwords have an |onlyg= parameter that you can set to a gender (m or f), a number (s or p) or a gender-number combination (e.g. m-s, f-p). Now mind you, some of the terms that make use of this (e.g. enceinte) would IMO be better treated as conventional adjectives that are simply rare in other genders or numbers; there's even a usage note for enceinte that says
The masculine form enceint is occasionally used with regard to transgender men, for species with male pregnancy such as seahorses, as well as in metaphorical, jocular, or fantastic contexts.
And indeed you will find that in Spanish, the corresponding words like encinto, embarazado and preñado are given in the masculine, with quotes establishing that such usage does exist. But if the term is indeed unattested in some genders, I would definitely support adding a flag to suppress those genders in the declension table and make sure that the title next to the declension table reflects this. Benwing2 (talk) 06:53, 11 May 2024 (UTC)Reply

Default font size for polytonic Greek[edit]

Does anyone else think the default font size for polytonic Greek should be increased? It looks small to me, especially in SBL Greek, which is the font with the highest priority in the CSS. Weylaway (talk) 19:55, 12 May 2024 (UTC)Reply

It seems OK to me; can you post screen shots showing how it looks for you? BTW I do notice when comparing polytonic Ἀριστοτέλης (Aristotélēs) to non-polytonic Αριστοτέλης (Aristotélis) that the latter seems relatively ugly because it uses a sans-serif font. Benwing2 (talk) 21:17, 12 May 2024 (UTC)Reply
FWIW, this is what it looks like for me (top is the level of zoom I usually use, bottom is 100% zoom). - -sche (discuss) 22:13, 12 May 2024 (UTC)Reply
Interesting; your polytonic looks sans-serif, while your non-polytonic looks more serif, which is the reverse of what I see. Benwing2 (talk) 22:19, 12 May 2024 (UTC)Reply
For non-polytonic, it's using Gentium for me, the second font in the list (because I don't have Athena, the first font in the list). For polytonic it was using DejaVu Sans (the third font in the list, because I didn't have SBL Greek or Athena). Now that I've downloaded SBL Greek, it looks like this, displaying polytonic in SBL Greek, which is heavily (IMO distractingly) serifed and slanted and handwritingesque. SBL Greek polytonic text is indeed smaller than other text, though not unreadably so IMO. (But I do think SBL Greek looks worse than other fonts in the list, so I might be tempted to move it down in the ranking... but perhaps it is first because it has the best diacritic support?) - -sche (discuss) 06:56, 13 May 2024 (UTC)Reply
@-sche Hmm. I checked using the Computed tab in Chrome and, if it's correct, my polytonic font is using Times (postscript name "Times-Roman"), which is way down the list, and my non-polytonic font is Arial Unicode MS, which is likewise way down the list. Maybe this is because I'm on a Mac, although I'm surprised there aren't more fonts installed by default. BTW here is what it looks like: [6] Benwing2 (talk) 07:25, 13 May 2024 (UTC)Reply
I personally think SBL Greek looks the best and I use a custom style to display it at 130% size. I was just thinking of learners who may find the default hard to read – in -sche's example the x-heights of the Greek letters are smaller than those of the Latin letters next to them. But clearly there is variation across operating systems and difference in personal preference, so maybe it's better for me just to use my custom style. Weylaway (talk) 17:46, 13 May 2024 (UTC)Reply
Thank you @Weylaway for your question. Greek (script Grek, polytonic and monotonic alike) looks miserable and small at en.wikt. I have no idea what the default font is for this site (as in sc=Latn), or what the designers wish their readers to view. Default looks much better. Or perhaps an equivalent making sure that grave accent is shown (not a vertical accent). User:Sarri.greek/fonts#default. Ancient Greek inflectional tables, which should have a 110%, are even smaller. Even the prosody marks look better with normal default fonts. Thank you again, for putting this. ‑‑Sarri.greek  I 21:38, 13 May 2024 (UTC)Reply

Blottoism[edit]

Don't let him die forgotten. Talk:upput. I hope my gastric bad temper will also survive. True story: our librarian/researcher has asked me why our system contains some non-existent records. It's because Artefactual's API is wrong. But I've got half a day of billable debugging before I can prove it. Equinox 02:00, 13 May 2024 (UTC)Reply

Old Pskovian[edit]

I propose to add an etymological code for Old Pskovian (~zle-ops?), as part of Old Novgorodian (zle-ono) in the branch of East Slavic languages. Cases of mention of Old Pskovian. This is a dialect and variety of Old Novgorodian, which was in ancient Pskov and its environs (https://ru.wikipedia.org/wiki/Древнепсковский_диалект). What do you think @Thadh? AshFox (talk) 04:29, 13 May 2024 (UTC)Reply

Don't see any issue with this. If nobody opposes, I'll add it in a week or so. Thadh (talk) 19:23, 13 May 2024 (UTC)Reply
No objections. Benwing2 (talk) 20:32, 13 May 2024 (UTC)Reply

Should we split up multi-language pages?[edit]

Currently, a user trying to get to da#Zhuang on desktop has to:

  1. Type "da" into the search bar.
  2. Wait for the massive page to load (this could take a while on older devices or on slower connections).
  3. Scroll for a very long time until reaching "Zhuang" in the table of contents.
  4. Click it.

On mobile, the situation is even worse since in fact there is no table of contents.

Maybe a better option would be to have da function as a sort of disambiguation page which lists all of the available languages in a compact table. In this case, the user would quickly be able to locate and click "Zhuang" which would take them to da/Zhuang. Also, since da and da/Zhuang would both be very compact, the loading times would be practically instantaneous.

Doing this would also solve all of our Lua-related problems (at least for the near future). What do we think @Chuck Entz, Benwing2, Theknightwho? Ioaxxere (talk) 05:50, 13 May 2024 (UTC)Reply

@Ioaxxere This has been proposed various times but it would be an enormous undertaking and would (of course) have some downsides, such as requiring more clicks to view anything and not so easily being able to see the similarities among different languages that share the same spelling. Maybe a less radical solution for the time being would be, as Chuck proposes, to move letter information out of letter pages into an Appendix or something. Benwing2 (talk) 05:58, 13 May 2024 (UTC)Reply
(Not saying I think we should split pages, but) something I suggested in past discussions which would address the "more clicks to view anything" problem is: if we split, transclude the subpages back onto the 'main' entry, so someone looking up e.g. the main sender page rather than sender/da still sees all the languages. Transclusion could be the default, and for the few pages with excessively many language sections where it wouldn't be feasible (particularly because I think transcluding a page causes it to count 2x against the PEIS limit? and causes any templates it transcludes to thus count 4x? and even Tim Starling has said that raising the PEIS limit is not something the devs will do), we could fall back on having a table like Ioaxxere suggests. BTW I think the usual proposal is to use language codes in naming the subpages, rather than language names, which may be long and contain untypable characters. Either way, we have to watch out for conflicts with pages that actually contain slashes, e.g. s/he.
In this case, I'm inclined to agree that moving the letters to alphabet appendices is a solution to most of the immediate problem. Let something like Appendix:Dutch alphabet give the names and pronunciations of all the Dutch letters in one place, rather than giving them on a, b, etc. Just have a ==Translingual== entry on a and maybe categorize all the Appendices that use a into a category like "Category:Alphabets that use Latin a" or something and then have a link in the Translingual entry to that category...? - -sche (discuss) 06:31, 13 May 2024 (UTC)Reply
@-sche: I'm very confused as to why letter entries are being mentioned? The initial page mentioned is da, which isn't even about a single letter. There's only one "letter name" entry on it, namely Tagalog da. mi is one of the worst pages, if not the worst, when it comes to this, so again, trying to focus on letters is not the way to go about it. This type of proposal was proposed in 2020 and was not passed then either. Also, as I mentioned on said vote, most of the bytes on a aren't from letter entries either. Let's focus on finding a solution that actually fixes the overarching problem, rather than throwing us into the issue of letters again. AG202 (talk) 14:13, 13 May 2024 (UTC)Reply
Additionally, the notion of moving letter entries comes from a clear, whether intentional or not, Latin script language-bias. As I mentioned in the vote, entries like (n) should not belong in an Appendix or Translingual just because some other languages on Wiktionary do their letter entries poorly. AG202 (talk) 15:24, 13 May 2024 (UTC)Reply
@AG202: It's definitely not a Latin-script bias. The same applies to Cyrillic, Greek, Perso-Arabic, Georgian... The fact that certain writing systems are slightly more complex and language-specific doesn't mean that all those that aren't deserve an entry for every language and every grapheme.
By the way, I don't see a Jeju entry for , and something tells me that if you were to duplicate this content three times (Middle Korean, Korean, Jeju) you will not be such a fan of keeping the three entries on one page. Thadh (talk) 16:36, 13 May 2024 (UTC)Reply
I would be. Just as I am with every other letter entry. The only reason I haven't created them myself is because I haven't had the time to. Also, even if it's not from a Latin-script bias, I still do not think that several smaller language communities are being considered.
That being said, this still doesn't address my main point that this doesn't actually fix the problem. If you don't want letter entries that's fine, that's another conversation, but let's not pretend that it's going to fix this current lua memory issue. It's not even a solution to "most" of the immediate problem at pages like a, nor does it fix anything at all at pages like da or mi or la. AG202 (talk) 17:21, 13 May 2024 (UTC)Reply
For reference at a: there are 170 L2s, and 64 letter entries (with the header "Letter"). So not even half of the L2s there have letter entries. It's frankly overblown. That's not even considering the L2s that have significantly more content in their non-letter entries compared to the letter entry, such as English a. AG202 (talk) 22:17, 13 May 2024 (UTC)Reply
@AG202 Not sure it's overblown, since a is the article causing the most headache in terms of Lua and parser limits. a is currently at 1.887MB out of an allowed 2.097MB in post-expand include size, and removing all the letter entries would bring that down noticeably. People are also thinking ahead to the fact that there are 5,000+ languages that use the Latin script, and we can't possibly have an entry for the letter a in every such language; whereas the number of languages where a is a word (excluding those where it's the name of the letter a) is much more limited. (Note also that when I just previewed a, I got a CPU timeout. User:Surjection may have inadvertently made this worse by lite-ifying a bunch of the templates again; the preview showed 43 seconds of CPU time and 53 seconds of real time, vs. 23 seconds of CPU time and 30 seconds of real time when previewing a slightly earlier version not using the lite templates. YMMV though, as there is a lot of variation in the CPU times.) Benwing2 (talk) 22:44, 13 May 2024 (UTC)Reply
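For readers following the numbers, here is a rough sketch of how much headroom the quoted post-expand include size figures leave. It uses only the two megabyte values given in the comment above; the function name is made up for illustration.

```python
# Figures quoted in the comment above, in megabytes.
PEIS_USED_MB = 1.887
PEIS_LIMIT_MB = 2.097


def peis_headroom(used_mb, limit_mb):
    """Return (remaining MB, fraction of the limit already used)."""
    return limit_mb - used_mb, used_mb / limit_mb


remaining, used_fraction = peis_headroom(PEIS_USED_MB, PEIS_LIMIT_MB)
print(f"{remaining:.3f} MB left, {used_fraction:.1%} of the limit used")
```

On those figures the page is at roughly nine tenths of the limit, with about 0.21 MB to spare.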
@Benwing2: "Removing all the letter entries would bring that down noticeably", can we actually get the numbers for this? Because when I tested that back in 2020, that wasn't the case. CC:@Surjection
"Whereas the number of languages where a is a word (excluding those where it's the name of the letter a) is much more limited": I also don't think that's the case. Again, looking at what's on the ground right now, there are significantly more non-letter entries that are taking up "space". There are only 64 L2s with letters or letter names, out of 170. Clearly the focus should be elsewhere.
Letter entries don't even take up that much space relatively; they don't have quotations like Sassarese a and they don't need usage notes like Serbo-Croatian a. I'm much more worried about 100 more L2s with non-Letter POSs as that's more realistic and takes up significantly more space, instead of a very rare possibility of 5000+ letter entries. Hell, the English entry at a has twelve etymologies outside of the letter entry, and is itself equivalent to several languages' letter entries.
Let's focus on actual long-term solutions, like the TOC option being discussed below, rather than taking out information that users like myself and others find useful. AG202 (talk) 23:18, 13 May 2024 (UTC)Reply
User:Thadh/a I've removed all noun senses for letter names (except for the Norwegian figurative ones) and letter senses. I don't know how to measure whether the page loads better, but at least there are no lua errors anymore. Thadh (talk) 13:47, 14 May 2024 (UTC)Reply
Thank you! Yeah looking at the page you've linked, removing the letters (and letter names which I thought were a separate issue) only removed 27225 bytes, which may seem like a lot, but that's out of 197738 bytes initially. That means that letters & letter names only account for ~14% of the bytes on the page, which is exactly what I was talking about. Letters on their own account for even less. We'd reach the max byte limit again in no time even if we barred letters from being added. (Also by my count only 14 L2s have solely a letter and/or letter name entry out of 170) AG202 (talk) 14:19, 14 May 2024 (UTC)Reply
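The proportions cited in this exchange can be checked with a few lines. This is only a sketch using the counts quoted in the thread; the thread does not say whether the byte counts refer to wikitext size or expanded size, so the result should not be read as a statement about the Lua or PEIS limits.

```python
def share(part, whole):
    """Return part/whole as a fraction between 0 and 1."""
    return part / whole


# Counts quoted in the thread for the page "a".
removed_bytes = 27225      # bytes removed with letters and letter names
initial_bytes = 197738     # bytes before removal
letter_l2s = 64            # L2s with a "Letter" header
only_letter_l2s = 14       # L2s with solely a letter and/or letter name entry
total_l2s = 170            # all L2s on the page

print(f"bytes from letters/letter names: {share(removed_bytes, initial_bytes):.1%}")
print(f"L2s with a letter entry:         {share(letter_l2s, total_l2s):.1%}")
print(f"L2s with only a letter entry:    {share(only_letter_l2s, total_l2s):.1%}")
```

This gives about 13.8% of bytes, 37.6% of L2s with a letter entry, and 8.2% of L2s with only a letter entry, consistent with the "~14%" and "not even half" statements above.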
«no table of contents»? What do you mean? Tollef Salemann (talk) 06:19, 13 May 2024 (UTC)Reply
Are the pages titled "Appendix: Variations of the letter [LETTER]" complete? Could they be made to include all the languages that use the given letter? If they could, that would eliminate one advantage of the current letter-page structure: comparison of letter use across languages. DCDuring (talk) 12:00, 13 May 2024 (UTC)Reply
At the moment, the "Variations of" appendices are language-neutral. We could agree to change that, but I think a separate set of appendices would probably be the better approach. —Mahāgaja · talk 12:38, 13 May 2024 (UTC)Reply
That's not what those pages are for - those are to list confusable/similar terms so that we don't clutter {{also}} with massive lists at the top of a page. Theknightwho (talk) 12:39, 13 May 2024 (UTC)Reply
Right, I am wondering about extending their purpose to overcome a shortcoming of off-loading letters to language- or script-based appendices: that one loses the ability to compare across languages. If we need to create something additional to preserve the purity of their current purpose, I would not object. It just seems that their current narrow purpose could be broadened to make them more effective even at achieving their current purpose. DCDuring (talk) 13:52, 13 May 2024 (UTC)Reply
I suppose, but I'm not sure how useful it'd be (especially with letters like "a"). It'd make more sense with less common letters, though. Theknightwho (talk) 14:01, 13 May 2024 (UTC)Reply
Based on the persistence of this problem over at least a decade, it seems that we are forced to use incremental solutions. Not all incremental solutions have to be technical. As was observed above, letters (and symbols) are not like the rest of our content, so perhaps we can use a different content model for them. If the different content model allows us to reduce the number of module-error pages, our attention will be led to a somewhat different set of violations, requiring or suggesting different partial solutions. A different content model for letters may lead to a better (more comprehensive) handling of letters. There are more than 60 Letter L3/4 headers on a. That does not seem a negligible amount of content to offload, but it seems likely to be smaller than the number of languages likely to use the letter a. DCDuring (talk) 17:19, 13 May 2024 (UTC)Reply
  • The problem isn't just individual letter pages, though. A page like [[an]] is also difficult to navigate around and could benefit from being split up. On the other hand, at [[bachall]] I find it very convenient to have the Old Irish entry and its homographic Irish and Scottish Gaelic descendants all on the same page. —Mahāgaja · talk 12:34, 13 May 2024 (UTC)Reply
@Benwing2: Well, the number of clicks is equal if you count the table of contents. Also, it seems like splitting the letter entries off doesn't address the Lua issue, since letters make up only a small proportion of the content at a.
@Tollef Salemann: If you don't have a phone handy, go to https://en.m.wiktionary.org/wiki/da and resize your browser to make it narrow — the table of contents disappears.
@Mahagaja, -sche: Yes, there could be some kind of controller template on da that would automatically transclude the pages if the number of languages is under a certain reasonable value (say 5), otherwise display the disambiguation table. Ioaxxere (talk) 13:32, 13 May 2024 (UTC)Reply
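The controller idea described above could be sketched roughly as follows. This is an illustration in Python rather than actual template or module code; the function name, the subpage naming scheme (title/language code) and the threshold of 5 are assumptions taken from the suggestions in this thread, not an existing implementation.

```python
def render_main_page(title, lang_codes, threshold=5):
    """Return wikitext for the main page `title`.

    With fewer than `threshold` languages, transclude every subpage so the
    reader still sees all entries. Otherwise, emit a compact list of links
    to the subpages (a stand-in for the proposed disambiguation table).
    """
    subpages = [f"{title}/{code}" for code in sorted(lang_codes)]
    if len(subpages) < threshold:
        # {{:Page}} is the MediaWiki syntax for transcluding a mainspace page.
        return "\n".join("{{:" + page + "}}" for page in subpages)
    # The language code is whatever follows the last slash, so a title that
    # itself contains a slash (e.g. s/he) still gets a sensible label.
    return "\n".join(
        f"* [[{page}|{page.rsplit('/', 1)[1]}]]" for page in subpages
    )


print(render_main_page("bachall", ["ga", "sga", "gd"]))
```

With three languages, bachall would transclude all three subpages; a page with many languages would instead get the link list. The sketch does not address the naming conflict with existing titles that contain slashes, nor the editing-convenience concern.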
What I find convenient about having all three languages at [[bachall]] is not so much being able to read them all at once as being able to edit them all at once, and if they're transcluded from three separate pages named "bachall/ga", "bachall/sga", "bachall/gd" or the like, then that's not convenient anymore. —Mahāgaja · talk 14:30, 13 May 2024 (UTC)Reply
No, it doesn't want to disappear on my handy. Not on my neighbors' either. We both use an Apple handy (iPhone). What do you all mean, it disappears? It's just smaller, so you have to tap it to make it bigger, and then you can easily navigate, like on Wikipedia. Are you all talking about non-Apple devices? Tollef Salemann (talk) 17:44, 13 May 2024 (UTC)Reply
How about only splitting up pages over a certain size? At the moment when I look up a short word I type "da#Zhuang" in the search bar to get me straight to the entry I need, but that's still annoying. —Caoimhin ceallach (talk) 13:52, 13 May 2024 (UTC)Reply
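As a small illustration of the "da#Zhuang" shortcut mentioned above, a direct link to a language section can be built from the page title and the language name. The helper name is hypothetical, and the sketch assumes that the section anchor is the language name with spaces replaced by underscores.

```python
from urllib.parse import quote


def entry_url(title, language):
    """Build a direct link to a language section of an English Wiktionary entry."""
    anchor = language.replace(" ", "_")
    return f"https://en.wiktionary.org/wiki/{quote(title)}#{quote(anchor)}"


print(entry_url("da", "Zhuang"))
print(entry_url("bachall", "Scottish Gaelic"))
```

This skips the scrolling step but still loads the whole page, so it does not help with the loading time on large entries.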
I don't like the idea of splitting pages. Is there any way to personalise the table of contents? Being able to collapse language names per letter of the alphabet would go a long way. Thadh (talk) 14:03, 13 May 2024 (UTC)Reply
@Thadh This may be possible with CSS. I know for example that User:Sarri.greek has been experimenting with different layouts for the TOC. If not, and you can create a clear plan for what functionality you'd like, the MediaWiki devs might be amenable (e.g. if you contact Tim Starling directly; he's the one who increased our memory and timeout limits). Benwing2 (talk) 20:32, 13 May 2024 (UTC)Reply
@Thadh See [7] for an example of what Sarri did. Benwing2 (talk) 21:16, 13 May 2024 (UTC)Reply
That looks good! Maybe this at least solves the issue of navigation. Thadh (talk) 21:17, 13 May 2024 (UTC)Reply
I agree - it does look good. Theknightwho (talk) 21:19, 13 May 2024 (UTC)Reply
Not bad. Is this intended as a default for certain types of pages (What kind?), opt-in, or only in custom JS/CSS? DCDuring (talk) 01:38, 14 May 2024 (UTC)Reply
@DCDuring I think we could clean it up a bit and use it for long pages where we'd otherwise compress the TOC by omitting subheadings. Benwing2 (talk) 01:59, 14 May 2024 (UTC)Reply
@Benwing2, Thadh, Theknightwho: I rewrote the template: {{minitoc}}. Maybe it can be automatically added to any entry with more than (say) ten languages. Ioaxxere (talk) 05:14, 14 May 2024 (UTC)Reply
@Ioaxxere Looks good to me but let's solicit more comment first. Also it would be great if there was a way, after you expand it, to further expand it to show the subheadings. Some people (maybe User:RichardW57?) have complained that with the shortened TOCs you can't so easily navigate to the subheadings of a particular language. Benwing2 (talk) 05:19, 14 May 2024 (UTC)Reply
@Ioaxxere, thank you for your Template:minitoc! and your help for [8], [9], [10], [11], [12], ... Also, perhaps variations for few (1-5) languages by L2, something like this? For related language periods, like this? (A reconstruction of the magic word __TOC__, because it might be taken away from us in future skins like vector22; e.g. discussion@el.wikt for modifications.) Thank you, thank you! ‑‑Sarri.greek  I 05:36, 14 May 2024 (UTC)Reply
@Sarri.greek: Yes, a multi-column TOC is certainly possible. You can add the following into your common.css:
div.toc > ul {
    display: flex;
    flex-direction: column;
    gap: 0 20px;
    flex-wrap: wrap;
    overflow: auto;
    max-height: 30em; /* change max-height as desired */
}
div.toc {
    width: 100%;
}
I can't promise you that it'll look very good, though. Ioaxxere (talk) 06:08, 14 May 2024 (UTC)Reply
@Ioaxxere @Sarri.greek I've rewritten Module:minitoc somewhat to take advantage of the pre-computed list of L2s that is already calculated by Module:headword/page, since it can cope with a bunch of weird edge-cases that can't be dealt with by a simple Lua pattern (and it's also faster, since it means we don't need to parse the page again). Theknightwho (talk) 14:00, 14 May 2024 (UTC)Reply
Re complaining when TOCs are collapsed, FWIW I have also complained that when TOCs are collapsed you can't easily navigate to subsections of a given language section, but as long as we're only deploying that on entries with a truly excessive number of L2s, like a, I'll live with it. If we're deploying it on tons of pages, e.g. cat—11 L2s—I'm less happy. Maybe if we're only deploying it on mobile, that's better than also deploying it on desktop; OTOH, someone was just complaining in another discussion that entries are hard to navigate because TOCs are collapsed or sometimes hidden(?) on mobile. So maybe the ideal would be to make it a gadget/pref, whether opt in or opt out, so people who wanted collapsed TOCs could get them—maybe even on all entries, if they wanted—and people who wanted uncollapsed TOCs on all entries could keep them. Or to make it possible to expand the collapsed TOCs (all at once or on a per-L2 basis) as mentioned above.
Not directly relevant to this specific concern, but relevant to the general topic is Wiktionary:Grease_pit/2021/June#Experience_on_mobile which also links a number of other prior discussions; see also some history at Wiktionary:Beer_parlour/2021/April#collapsed/minimized_language_headers. - -sche (discuss) 15:37, 14 May 2024 (UTC)Reply
I can understand why __NOTOC__ has been included, since it avoids having two TOCs with Vector, but with Vector 2022 it's a bit detrimental since it means there's now no longer a TOC in the left-hand sidebar, which can normally be used even if you're scrolled halfway down the page. Theknightwho (talk) 15:46, 14 May 2024 (UTC)Reply
@Theknightwho: That's a good point. It's actually possible to specify by-skin behaviour by changing MediaWiki:Vector-2022.css and similar, so maybe we could use that to override __NOTOC__ in Vector 2022 since the TOC doesn't take up space in the document flow anymore. Ioaxxere (talk) 16:04, 14 May 2024 (UTC)Reply
Instead of moving the Zhuang entry for da to da/Zhuang, we could move it to Zhuang/da, and we could move all the Zhuang entries that way and add a specialised search bar searching only in entries starting with “Zhuang/”, like we do with the search bar on top of the beer parlour here. That would reduce the scrolling and the clicks for people interested in Zhuang. MuDavid 栘𩿠 (talk) 01:54, 14 May 2024 (UTC)Reply

Enabling categories for logged-out users[edit]

Currently, categories are hidden on mobile unless a user is logged in and has "advanced mode" enabled. I don't think there's any good reason to do this since categories are a pretty important part of the site. Apparently we need to get community consensus and then open a Phabricator request to set $wgMinervaShowCategories['base'] = true; Would you support this? Ioaxxere (talk) 14:16, 14 May 2024 (UTC)Reply

@Ioaxxere Support I had always assumed there was some reason for not doing so already such as:
  • not making entries look too cluttered
  • categories are too technical for viewers of Wiktionary who do not edit
  • or there are technical difficulties involved
Kutchkutch (talk) 14:27, 14 May 2024 (UTC)Reply
Support. Binarystep (talk) 14:42, 14 May 2024 (UTC)Reply
I assume this would slow things down a bit for all users. How much? DCDuring (talk) 14:49, 14 May 2024 (UTC)Reply
Support. Benwing2 (talk) 14:55, 14 May 2024 (UTC)Reply
I don't think this would cause any noticeable slowdown, even on very large pages. Theknightwho (talk) 14:57, 14 May 2024 (UTC)Reply
Support SAMEER (؂؄؏) 18:01, 14 May 2024 (UTC)Reply
Support, and I wish Wikipedia would follow suit but alas. lattermint (talk) 23:38, 14 May 2024 (UTC)Reply
Support - -sche (discuss) 01:32, 15 May 2024 (UTC)Reply
Support Fay Freak (talk) 01:55, 15 May 2024 (UTC)Reply

Sign up for the language community meeting on May 31st, 16:00 UTC[edit]

Hello all,

The next language community meeting is scheduled in a few weeks - May 31st at 16:00 UTC. If you're interested, you can sign up on this wiki page.

This is a participant-driven meeting, where we share language-specific updates related to various projects, collectively discuss technical issues related to language wikis, and work together to find possible solutions. For example, in the last meeting, the topics included the machine translation service (MinT) and the languages and models it currently supports, localization efforts from the Kiwix team, and technical challenges with numerical sorting in files used on Bengali Wikisource.

Do you have any ideas for topics to share technical updates related to your project? Any problems that you would like to bring for discussion during the meeting? Do you need interpretation support from English to another language? Please reach out to me at ssethi(__AT__)wikimedia.org and add agenda items to the document here.

We look forward to your participation!


MediaWiki message delivery 21:23, 14 May 2024 (UTC)Reply