Wiktionary:Beer parlour/2023/January


Request for rollback and/or autopatrol

Hello, I have 2500+ edits and 420+ entry creations, and I have also made many reverts and tagged bad changes for speedy deletion. I would like to ask whether I can have rollback and/or autopatrol here. I already have rollbacker rights at Simple English Wikipedia and Simple English Wiktionary, so I'm sure I know how to use it. Thank you. MathXplore (talk) 08:08, 1 January 2023 (UTC) (Updated edit count, withdrew autopatrol request per feedback, MathXplore (talk) 09:30, 3 January 2023 (UTC))

My reverts at this project can be seen here, and a full record of my reverts can be seen at m:User:MathXplore#Reverts. MathXplore (talk) 06:14, 3 January 2023 (UTC)
I am not familiar with this user and I'm not completely sure what autopatrol functionality is and how it differs from rollback functionality but I don't think it's a particularly big deal to grant it. Any comments/objections? Benwing2 (talk) 07:37, 3 January 2023 (UTC)
@Benwing2 They've been reverting a fair amount of obvious vandalism, which is good, but they've also been nominating things for speedy deletion based on criteria that don't make all that much sense according to Wiktionary standards; I'm not so sure they're up to speed on Wiktionary standards, overall.
As for autopatroller status, that just means that all of their edits show as patrolled. That makes me a little nervous because sometimes reverting isn't enough: the perpetrators may need to be blocked, spam or personal information may need to be hidden, etc. The rollback tool marks the reverted edits as patrolled, so they drop off the radar for those of us who filter for unpatrolled edits in Recent Changes. If someone is both an autopatroller and rollbacker, both the reverts and the reverted edits disappear. Chuck Entz (talk) 08:25, 3 January 2023 (UTC)
Hello, thank you for the feedback. In order to respond to your concerns, I decided to withdraw the autopatrol request for now. I would like to keep the discussion open to hear comments about my rollback request. MathXplore (talk) 09:30, 3 January 2023 (UTC)

Propose to rename Template:uncertain to Template:defn-uncertain and reuse Template:uncertain for uncertain etymologies

The current template {{uncertain}} is a definition line template that displays text something like The meaning of this term is uncertain. Possibilities include:. It is non-parallel with {{unknown}}, which is an etymology template for terms with unknown etymologies. In general, templates with plain names like {{uncertain}} are used for etymologies, and definition-line templates are often identified by additional text such as of, with or for (compare {{clipping}} vs. {{clipping of}}). Furthermore, {{uncertain}} is currently used on around 238 pages, while {{unk|title=Uncertain}} or similar is used in 1,588 instances on 1,553 pages. I propose therefore to rename {{uncertain}} to either {{defn-uncertain}}, {{uncertain defn}} or {{uncertain meaning}} and reuse {{uncertain}} (shortcut {{unc}}) for uncertain etymologies. The way I conceive of it, it displays Uncertain instead of Unknown (as {{unknown}} does) but otherwise categorizes the same, e.g. into Category:English terms with unknown etymologies. Possibly we should rename that category to Category:English terms with unknown or uncertain etymologies to better reflect the reality of these terms ("unknown" suggests we have no idea where a term comes from, "uncertain" suggests we have some guesses but we can't substantiate them, which is usually closer to the truth). But I would advocate not splitting the category into "unknown" vs. "uncertain" categories; there is too much uncertainty (so to speak ...) in the etymologies to be able to draw a clear line between the two categories. Thoughts? Benwing2 (talk) 08:32, 1 January 2023 (UTC)

I've seen templatized-"Uncertain" in enough etymologies that my initial reaction upon reading the section title was "isn't T:uncertain already for etymologies?", even though upon thinking about it I realize I have of course encountered it in its actual current use on definition lines — so that's a clear sign of where the naming format suggests it belongs, as you say! Hah. Yeah, it seems reasonable to give the name "Template:uncertain" to an etymology-section template, and name the current definition-line template something else. (And something in the vein of "defn-uncertain", though it feels odd to me because I'm not used to it, is upon consideration pretty intuitive.) - -sche (discuss) 09:52, 1 January 2023 (UTC)
Side note, the fact that the various {{clipping}}- vs {{clipping of}}-type templates you mention are so similarly named that people continually mix them up (most of the results of a search for insource:"Etymology acronym of" are using T:acronym of where they should have T:acronym: taser, supo, COBOL, ...) almost makes me wonder if all definition templates should be renamed to start with "definition:", e.g. {{definition:acronym of}}. But I'm loath to make people type extra letters, especially when even spelling out "definition:" might not stop them using it in the wrong place. - -sche (discuss) 09:52, 1 January 2023 (UTC)
Support the main point. (To save a keystroke, and by analogy with {{defdate}} and {{rfdef}} (itself redone from {{defn}}), I'd prefer "def" to "defn", though.) I'm not sure unknown and uncertain need to be split at the category level, but I agree with the idea of making a shortcut for uncertain etymologies—I also tend to think the default should be uncertain rather than unknown anyway. The problem with splitting them rigorously is that I think the intuitive distinction is less clear-cut—IMO it's useful to call something where there are conjectures, but they're relatively weak and/or there are a lot of them, "unknown" rather than just "uncertain" (e.g. Latin -ensis). —Al-Muqanna المقنع (talk) 12:40, 1 January 2023 (UTC)
Yeah, I agree with you and Benwing about keeping one category. (And I like the clarity of Benwing's "Category:English terms with unknown or uncertain etymologies" as a category name.) - -sche (discuss) 19:50, 1 January 2023 (UTC)
Agree. Personally I think the system we have now is fine, but I can understand why people would want to make the change. Vininn126 (talk) 20:55, 1 January 2023 (UTC)
I went ahead and renamed {{uncertain}} to {{def-uncertain}} (following the name recommended by User:Al-Muqanna) and created a new {{uncertain}} for uncertain etymologies. (There are actually more than ~1500 uses of "uncertain" in {{unk}}/{{unknown}}, maybe more like ~2500.) The category still reads e.g. Category:English terms with unknown etymologies; I would like to rename this to Category:English terms with unknown or uncertain etymologies but will hold off a few days longer to see if there are any comments about this name. Benwing2 (talk) 06:54, 9 January 2023 (UTC)
Is this also catching instances of the words manually written in the etymology lines? Vininn126 (talk) 09:16, 9 January 2023 (UTC)
@Vininn126 Do you mean where they write out "Uncertain" or similar but don't use any template? I haven't dealt with those yet; to do that I need to look through a recent dump file. Benwing2 (talk) 19:09, 9 January 2023 (UTC)
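For illustration only, here is a minimal Python sketch of how untemplated "Uncertain" etymology lines might be spotted in page wikitext taken from a dump. It is not the script anyone in this thread used; the regular expressions, the function name `bare_uncertain_lines` and the sample wikitext are all assumptions made up for the example, and real entries vary far more than this handles.

```python
import re

# Matches an Etymology heading (optionally numbered, e.g. "Etymology 2") and
# captures the first non-blank line that follows it.
ETYM_RE = re.compile(r"^=+\s*Etymology(?: \d+)?\s*=+\s*\n+([^\n]*)", re.MULTILINE)

# A bare word at the start of the line, i.e. not a template call such as {{unc}}.
BARE_RE = re.compile(
    r"^(?:Uncertain|Unknown|Origin (?:uncertain|unknown))\b", re.IGNORECASE
)


def bare_uncertain_lines(wikitext):
    """Return the first etymology lines that open with an untemplated 'Uncertain' etc."""
    hits = []
    for match in ETYM_RE.finditer(wikitext):
        first_line = match.group(1).strip()
        if BARE_RE.match(first_line):
            hits.append(first_line)
    return hits


# Hypothetical sample entries, invented for the example.
sample = """==English==
===Etymology===
Uncertain. Perhaps from Dutch.

===Noun===
# A thing.
"""
templated = "===Etymology===\n{{unc}}. Perhaps from Dutch.\n"

print(bare_uncertain_lines(sample))
print(bare_uncertain_lines(templated))
```

A list produced this way would still need manual review before any replacement with {{unc}}, since "Unknown" at the start of a line is not always meant as the whole etymology.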

"A toponym"

I don't think "A toponym in ..." should be allowed as a definition. It's just a longer way of saying "place", but in a non-gloss way (so actually more like "name for a place"), and I think we would all agree that that is a very vague definition that should be avoided in favor of something specific like "neighborhood", "street", "fort", etc., except in extenuating cases like a place mentioned once in an ancient document with no further context. If a toponym is actually attested in multiple sentences, it should usually be possible to infer what kind of place it is. 70.172.194.25 09:13, 2 January 2023 (UTC)

Can't we use {{place}}? Vininn126 (talk) 10:19, 2 January 2023 (UTC)
Further to the discussion about some of these at Wiktionary:Requests for verification/Non-English#Oudorte, it seems these tend to be place names that are mentioned on a map but without any real significance (individual fields, clearings, copses, etc.) that shouldn't have entries anyway per WT:CFI#Place names. They're often just small labels on stretches of countryside on géoportail. If something like this can only be called "a toponym" there's a good chance it doesn't need an entry. —Al-Muqanna المقنع (talk) 10:27, 2 January 2023 (UTC)
@Al-Muqanna I completely agree. We already have lots of entries for unincorporated communities, which are generally in Wikipedia but many of which exist only in databases and on old maps. If you poke around Wikipedia you can find tons of them, e.g. (just randomly looking a bit) Hoover, Missouri. This one isn't in Wiktionary but under e.g. Florence you can see a bunch of them mentioned. If you look up the one in Stephenson County, Illinois in Wikipedia here: Florence, Stephenson County, Illinois, you see nothing but a stub, and you probably can't find much info on it in Google. If it has any significant population and cohesiveness it's generally called a "census-designated place" instead. Benwing2 (talk) 04:22, 3 January 2023 (UTC)
  • The best of these French "toponym" entries might be on a par with Florence, Stephenson County, for which one can actually find several hits on Google Books, including in sentences. And to be clear, I'm relatively inclusionist and welcome the creation of entries for place names with that level of attestability, e.g. Roudoulous (see history). That place may only have 1,170 hits on Google (about twice as many as "Florence, Stephenson County"), but it has good citations available on Google Books. I would just prefer if 1. people were to check whether a place name is attested before adding it, 2. such entries were defined using a more specific term as appropriate (e.g. "hamlet") instead of the vague "toponym".
  • The worst of these place names are infinitely less attestable, with no Google-indexed evidence that they exist. I challenge anyone to find a single mention of e.g. Arrominguerasse or Touteillac anywhere on the Web, let alone a use in a sentence in a durably archived source. Géoportail must have gotten them from somewhere, but I have no idea where. Maybe manual data entry or OCR from old maps, IDK. How do we know this database doesn't have typos/scannos, etc.? I don't like relying on it as the sole available source, not to mention that CFI doesn't allow us to do so. 70.172.194.25 01:25, 4 January 2023 (UTC)
I think it's just data from IGN, the French equivalent of Ordnance Survey (don't think there's an exact equivalent in the US). You can turn on the IGN map as an overlay on géoportail and the labels will be there. —Al-Muqanna المقنع (talk) 17:17, 5 January 2023 (UTC)

Transliteration for Classical Chinese

Rationale: Currently all our Classical Chinese quotations are transliterated into pinyin when {{zh-x}} is used. However there are several problems with this: 1) it is anachronistic as people from the ancient times obviously did not speak Mandarin; 2) the auto-generated pinyin is often incorrect, and in some cases the character is so rare that it is not in the data modules (see diff for example); 3) the Classical Chinese text can be equivalently read in many other Chinese lects or other languages, and there is no reason to always assume MSM by default.

Proposal: Transliterations would not be provided by {{zh-x}} when the third parameter is specified to be CL. Variants of CL (including a new CL-M for MSM) would only be used where appropriate, e.g. when the quotation in consideration uses certain features such that it would be possible to identify the lect in question, or when the author is known to speak a particular lect.

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, ND381): Wpi31 (talk) 17:36, 2 January 2023 (UTC)

@Wpi31: Not providing any romanization would not be great because it would make it very inaccessible to the target audience, who we would assume is unable to read Chinese characters. In most scholarly work on Classical Chinese in English, Mandarin is the default romanization. I am not saying that this is necessarily the best thing to do, but it's not unprecedented. — justin(r)leung 17:46, 2 January 2023 (UTC)
@Wpi31: Outline of Classical Chinese Grammar by Edwin G. Pulleyblank gives Mandarin romanizations. RcAlex36 (talk) 18:02, 2 January 2023 (UTC)
@Justinrleung Obviously I do acknowledge that it is fairly standard in academia to provide Mandarin romanisations (well, since it's "standard Chinese" in modern times), but in my view this is a barely acceptable solution to the larger problem(s) of romanisation of Chinese, and the problem(s) caused by Wiktionary's approach of lumping Chinese together. Would it be possible to instead supply OC/MC reconstructions or Baxter's transcription that we are already providing on many pages, or even splitting Classical Chinese as a separate L2?
(TBH we really need to reconsider how we deal with Classical Chinese, it really deserves some love and attention. Only the written texts are attested, so theoretically all the existing OC/MC reconstruction stuff should be in the reconstruction namespace.) Wpi31 (talk) 18:41, 2 January 2023 (UTC)
@Wpi31: I don't really know if OC or MC are necessarily better since OC reconstructions are highly controversial, and OC/MC reconstructions are also possibly anachronistic because they only represent particular points in the history of Chinese. — justin(r)leung 22:12, 2 January 2023 (UTC)
I support keeping pinyin for Classical Chinese. It is not perfect, but neither are reconstructions, which as Justin points out, are highly controversial. As editors, we will need to monitor pinyin transcriptions to ensure they are accurate, and edit in overrides where necessary. I also note that pinyin is used across the board in pretty much every Classical Chinese publication both in and outside of China, especially for obscure characters. The readings are meant as a guide to the modern reader, and are not taken to be historical readings. ---> Tooironic (talk) 00:58, 3 January 2023 (UTC)
@Wpi31: Template:ltc-l and Template:och-l may be useful to you. 恨国党非蠢即坏 (talk) 03:53, 16 January 2023 (UTC)

Placeholder pronouns in headwords and inflection lines

We do not have a consistent way of handling placeholder pronouns in the headwords and inflection lines of some multiword phrases, e.g., leave someone to their own devices. Often the result is to have a form that doesn't include all possible pronouns (e.g., 1st- and 2nd-person pronouns) as well as not allowing for the possibility of nouns in the expression and therefore not helping learners use the expression correctly. I understand that there is usually (always?) no perfect solution to the difficulty in presenting such expressions. Other lexical references follow a variety of practices.

In leave someone to their own devices, the use of their is more inclusive than his or her (what MWOnline uses) and, though imperfect, is adequate for 3rd-person use, though it could be taken as implying that his and her are wrong when used with a singular someone or what this placeholder stands in for. I think that most users understand that someone can be replaced by a plural, but I fear that fewer recognize that nouns (including some referring to certain things) and 1st- and 2nd-person pronouns can occupy the 1st placeholder position and fewer yet will be confident in selecting the proper occupant of the 2nd placeholder position.

  1. Possible changes in presentation:
    1. Would it be an improvement to enclose the pronouns someone and their in parentheses to emphasize those pronouns' role as placeholders: leave (someone) to (their) own devices? At least one other OneLook reference (Farlex Dictionary of Idioms) does just that. AHD and Random House omit the 1st placeholder and parenthesize the 2nd, as we did until recently.
      1. Should such a change appear in the headword as well as the inflection line?
    2. Some other references show placeholders unemboldened.
    3. Some references exclude placeholders entirely!: leave to own devices. Search engines may allow this to enable users to find the entry.
  2. Should we facilitate the automated or semi-automated addition of hard or soft redirect entries with placeholders replaced by all possible personal pronouns (including no replacement)?
  3. Should we have some kind of Appendix or Wiktionary page to explain in detail what usages placeholders indicate? To be useful such a page would need to be referenced, probably in a standard usage note.

DCDuring (talk) 17:14, 3 January 2023 (UTC)

On point 2, I am absolutely in favour of using redirects more widely in cases like this, for variations with other pronouns or verb conjugations; redirects are cheap, and left you to your own devices, leave me to my own devices, etc. aren't going to be valid entries in any other language, so there's no conflict. (If one is, just have a proper entry for that language, and a soft redirect for the English L2 section.) Usexes and citations in the lemmatized form should try to show the range of possibilities.
Would it be useful to have a (semi-templatized?) usage note explicitly spelling out what substitutions are possible? Or does the fact that almost any phrase can (if rarely) be modified make that pointless? E.g., physician, heal thyself usually uses that second person, but there are sometimes hits for google books:"physician heals himself" / google books:"physician needs to heal himself"; you and whose army can be modified to ask about google books:"him and whose army", etc. - -sche (discuss) 02:43, 4 January 2023 (UTC)
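As a rough illustration of point 2 above (semi-automated redirects with the placeholders replaced by personal pronouns), the following Python sketch generates candidate redirect titles from a lemma. The function name `redirect_titles` and the pronoun table are assumptions for the example, not an existing Wiktionary tool, and the sketch only swaps whole words: it does not inflect the verb or handle nouns in the placeholder slots.

```python
# Object/possessive pairs for the personal pronouns. The lemma itself, with
# "someone"/"their", is the unreplaced form and is not repeated in the output.
PRONOUN_PAIRS = [
    ("me", "my"),
    ("you", "your"),
    ("him", "his"),
    ("her", "her"),
    ("it", "its"),
    ("us", "our"),
    ("them", "their"),
]


def redirect_titles(lemma, obj="someone", poss="their"):
    """Return the lemma with its two placeholder words replaced by each pronoun pair."""
    words = lemma.split()
    titles = []
    for obj_pronoun, poss_pronoun in PRONOUN_PAIRS:
        variant = [
            obj_pronoun if word == obj else poss_pronoun if word == poss else word
            for word in words
        ]
        titles.append(" ".join(variant))
    return titles


for title in redirect_titles("leave someone to their own devices"):
    print(title)
```

Whether each generated title should become a hard or soft redirect, and whether inflected forms such as "left you to your own devices" should be generated as well, would still be a matter for the discussion above.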

Persian contributions for HAKHSIN

(Notifying Ariamihr, Dijan, Mazsch, Qehath, ZxxZxxZ): Hi. Please review Special:Contributions/HAKHSIN. It seems they are adding made-up words or unattestable neologisms. Anatoli T. (обсудить/вклад) 04:54, 5 January 2023 (UTC)

List of unreferenced English terms with few hits in IA corpus

I have created Wiktionary:Todo/Rare IA terms. So far I've only generated tables for entries starting with A-Z and a-c, checking about one third of English entries. It's turned out to be a good way to find terms that may not exist at all (like Afbachir), as well as real but rare terms deserving of quotations/referencing (like advowrer). If anyone finds these lists useful I can generate lists for the rest of the alphabet. 70.172.194.25 19:21, 5 January 2023 (UTC)

This is why we need more corpora research on this project. It's just so invaluable. Vininn126 (talk) 19:26, 5 January 2023 (UTC)
This seems useful. Clearly it's helping catch nonexistent terms. (I wonder, in the other direction, if we could figure out what the most common words in the IA corpus that lack entries here are. That's probably harder to do.) - -sche (discuss) 20:17, 5 January 2023 (UTC)

A Potential Bot for pages that need Topic Categories

The Topic Categories are underused, even for basic things such as light or fire. I suggest that a bot be created to go over the wiki and put pages that do not have a topic category into a hypothetical Category:Request for topic categories, with a template/module made to populate it for each language.

I brought this up in the Discord, and it was pointed out that a number of exceptions would need to be made, such as grammar words like of and to, and there are bound to be other considerations. Akaibu1 (talk) 20:58, 5 January 2023 (UTC)

@Akaibu1 Someone made a similar request a few months ago. I'm not quite sure what you're proposing; if it's to add all content pages that aren't in any topic to a language-specific Category:Request for topic categories page, there are far too many such pages, and the category would become useless. More useful maybe would be to do this selectively only for the top X pages of a language by frequency (e.g. the top 10,000). What was proposed a few months ago was to auto-add pages to categories based on words occurring in the pagetitle, which would lead to tons of issues. Anything like this ultimately needs to be at least reviewed manually; if I were to do this I would set up a procedure to auto-create a text file of potential pages to add and the categories to add them to, then manually edit the text file to fix issues, then run a script to auto-push the resulting categories. But this is hard to get working for people who don't have a bot; ideally you'd want a UI to do this, but that would take some (nontrivial) programming work by someone (e.g. User:Erutuon) who knows how to create UIs. Benwing2 (talk) 21:57, 6 January 2023 (UTC)
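The review procedure described above (generate a text file, edit it by hand, then push) could look roughly like the Python sketch below. This is only one possible shape for it, not Benwing2's actual scripts: the tab-separated file format, the `#` convention for rejecting a line, the function names and the sample category names are all assumptions, and the final push step is deliberately left out because it depends on the bot framework in use.

```python
def format_candidates(candidates):
    """Render {page: [categories]} as one hand-editable line per page."""
    lines = [
        f"{page}\t{', '.join(categories)}"
        for page, categories in sorted(candidates.items())
    ]
    return "\n".join(lines) + "\n"


def parse_reviewed(text):
    """Read the hand-edited text back; blank lines and '#' lines are skipped."""
    reviewed = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        page, _, categories = line.partition("\t")
        reviewed[page] = [c.strip() for c in categories.split(",") if c.strip()]
    return reviewed


# Hypothetical automatic suggestions (category names are illustrative only).
suggested = {"light": ["Light"], "fire": ["Fire", "Heat"]}
draft = format_candidates(suggested)

# A reviewer rejects the "light" suggestion by commenting the line out.
edited = draft.replace("light\tLight", "# light\tLight")
print(parse_reviewed(edited))
```

Keeping the intermediate file as plain text means the manual step needs nothing more than a text editor, which is the point of the procedure; a UI, as suggested above, would replace only that middle step.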

Affixes added to syllable word categories when they shouldn't be?

See Category:English_1-syllable_words.

This is likely a template issue, if it is not in fact intended. Akaibu1 (talk) 21:02, 5 January 2023 (UTC)

Have Quiet Quentin also work on Google Scholar?

Can we have Quiet Quentin also support searching (and formatting citations from) Google Scholar, like it does Google Books? - -sche (discuss) 01:31, 6 January 2023 (UTC)

Read Wiktionary:Quotations#Quotation_gadgets; there's a secondary gadget. It'd be nice to merge the code. I bet I could figure out a way to modify it to work with a certain Polish library website but that's not important at the moment. Vininn126 (talk) 01:40, 6 January 2023 (UTC)
Yeah, if we could either merge the code, or just house the code as a PREFS-enablable gadget rather than something people have to go outside of Wiktionary to turn on, it'd be great. - -sche (discuss) 01:44, 6 January 2023 (UTC)
I might be willing to work on this (merging / streamlining the code) someday if nobody else does. 70.172.194.25 01:45, 6 January 2023 (UTC)

It would be nice to have it work on Firefox. Its subwindow is currently unscrollable, which makes it unusable. --RichardW57m (talk) 12:35, 9 January 2023 (UTC)

The page currently at User:Dan Polansky/Thesaurus Benefits was placed by me at Wiktionary:Thesaurus/Benefits. The page ought to be moved back to Wiktionary namespace where it was created. The thesaurus project needs a fine description of its benefits and I have written one. The page I have written is a Thesaurus project asset, not a personal user page content. What I did was in keeping with Wiktionary common law, the practice by which Wiktionary namespace pages are written by initiative of individual editors and then developed together. Issues can be raised on the talk page and addressed. I will not provide examples now of other Wiktionary namespace pages like that since they are very easy to find, and all is on record. I even added a disclaimer at the top of the page: "This page reflects reasoning not necessarily shared by all editors." I am one of the few key contributors to the Thesaurus over many years, and have a natural role in stating the benefits as I understand them. Thesaurus project is better off with at least one description of its benefits; however, if someone thinks they can provide a better description of the benefits, we can use that content in Wiktionary:Thesaurus/Benefits instead. In the mean time, what I created ought to be moved back. Dan Polansky (talk) 07:13, 6 January 2023 (UTC)

You might have written the best thing in the world, and beaten us all, but unless there is community consensus, that's userspace. And you know it too. This isn't your personal blog. Equinox 07:15, 6 January 2023 (UTC)
That's not an attempt at argumentation, merely a statement of disagreement. The above does not say why the Thesaurus project ought not have a description of benefits by one of its main authors; it says that editors disagree with that. By that, the principle of the strength of the argument is violated and is replaced with numerical majority. --Dan Polansky (talk) 07:21, 6 January 2023 (UTC)
I assume you are able to read, Dan: the argument is - very obviously - that you lack consensus. Three different users moved it back to your userspace after you repeatedly edit warred to reinstate it, and had I not salted the page, I have no doubt you would have continued doing so. Just be thankful you didn’t receive another ban, because you’re on thin ice. Theknightwho (talk) 07:25, 6 January 2023 (UTC)
"Lack of consensus" is not an argument; to the contrary, on Wikipedia, what they call "consensus" is based on arguments. --Dan Polansky (talk) 07:33, 6 January 2023 (UTC)
It is an argument for why we don’t have to reinstate the page as some kind of status quo ante, which is what you seem to want. Theknightwho (talk) 07:37, 6 January 2023 (UTC)
I am not arguing "status quo ante"; I did not use the phrase. I am arguing the page is a good thing and those who oppose it have to explain why it is a bad thing; their saying "we oppose" ought alone have no force. --Dan Polansky (talk) 07:49, 6 January 2023 (UTC)
"In the mean time, what I created ought to be moved back." Don’t lie. Theknightwho (talk) 07:55, 6 January 2023 (UTC)
Arguing against this level of argumentation stupidity or dishonesty is unproductive. I won't. If some editors want to see the benefits page restored, it's their call. --Dan Polansky (talk) 07:58, 6 January 2023 (UTC)
Threats against my user account belong on my talk page, not to this discussion. The above is unacceptable intimidation of a major Thesaurus contributor, which the above isn't. --Dan Polansky (talk) 07:33, 6 January 2023 (UTC)
I don't need an "attempt at argumentation" because, guess what, Wiktionary doesn't work on your rules (which are: "we argue until someone gets tired, and then Dan wins"), but rather on consensus. Go to sleep mate. Equinox 07:28, 6 January 2023 (UTC)
These are not my rules; these are Wikipedia's rules, viz that what matters is the strength of the argument, not the numerical majority. Therefore, it is required that at least an appearance of the search for strong argument is maintained. --Dan Polansky (talk) 07:33, 6 January 2023 (UTC)
This isn’t Wikipedia. Theknightwho (talk) 07:35, 6 January 2023 (UTC)
It was the above editor who claimed in another discussion that Wiktionary ought to decide by combination of strength of argument and numerical majority. My position is that Wiktionary ought to adopt much more of the Wikipedia principle that the strength of the argument makes a huge difference and that those who do not create at least an appearance of seeking strong arguments have no say. --Dan Polansky (talk) 07:37, 6 January 2023 (UTC)

skill issue and whether a noun phrase should be included as an interjection only because it is frequently uttered in isolation

I have repeatedly observed that the interjection PoS is applied very liberally on Wiktionary. The supposed interjections whose inclusion I disagree with are usually nouns that happen to be regularly uttered in isolation. I hold it that this is not sufficient grounds for inclusion; some sort of semantic or syntactic alteration has to be observed. Compare

with

FWIW, Wikipedia describes a similar distinction: w:English_interjections#Interjections_vs._nouns. — Fytcha T | L | C 11:12, 6 January 2023 (UTC)

Cf. the discussions at WT:Requests for deletion/English#victory and bad. To summarise what I wrote on victory, my preference for a principle of exclusion is roughly similar to what you've suggested, that secondary interjections which simply amount to "this is X", "X has happened", "X is here", or vocatives ("Waiter!" etc.) should be excluded. However, I think common "hortative" interjections ("Freedom!") should be included. But note that Wikipedia's "interjections vs. nouns" section is not correct: its citation is discussing primary interjections, a distinction the Wiki article fails to make (the title of the cited journal article is even "From proper name to primary interjection"); nouns in isolation are still, linguistically, interjections even if there's nothing peculiar semantically about that form. If a word retains its function as another part of speech then it's a secondary interjection, even if there's something particular about what it means as an interjection (the point about "gee" in the journal piece is that it's lost those other functions completely). By the same token, if we only accepted primary interjections, as the Wiki section would suggest, then clearly congratulations, hell, shit would fail since they haven't been "bleached of their original meaning" as nouns. —Al-Muqanna المقنع (talk) 11:22, 6 January 2023 (UTC)
@Al-Muqanna: Thanks for the pointers. The primary/secondary distinction that you're describing is strongly reminiscent to me of the lexical-syntactic distinction drawn with respect to adverbs in German grammar: Wiktionary:Beer parlour/2022/July § (German) Categorically removing adverbially used adjectives, Adverb#Usage_notes. If we're going to contend that most if not all nouns can (at least syntactically) 'be' interjections (what has been referred to as a secondary interjection), then that's all the more reason for me to oppose their double inclusion for the same reasons that I've already laid out in that BP thread. As for hortatives, I haven't thought too much about them but at present I'm probably also against their inclusion.
if we only accepted primary interjections, as the Wiki section would suggest, then clearly congratulations, hell, shit would fail since they haven't been "bleached of their original meaning" as nouns I disagree. When someone says Oh, hell! I got another parking ticket., the noun senses of the word hell are not invoked. The semantics that the interjection hell adds to this utterance (frustration, discontent) can only be accessed when hell is used interjectionally. For words like bummer, the part of speech does not fence off special semantics; something is labeled a disappointment, regardless of whether bummer syntactically acts as a noun or an interjection. — Fytcha T | L | C 11:58, 6 January 2023 (UTC)[reply]
That might be argued for hell, but I think it's much harder to argue that congratulations, condolences, apologies aren't invoking their original noun senses. —Al-Muqanna المقنع (talk) 12:00, 6 January 2023 (UTC)[reply]
@Al-Muqanna: I would have to think more about these in particular (intuitively, they seem to contain more "meat" than, say, bummer but I can't articulate my intuition right now), but even if it turns out that they are indeed secondary interjections, that's a bullet I'd be willing to bite, considering that they aren't getting deleted anyway thanks to WT:THUB and perhaps also WT:PB. — Fytcha T | L | C 12:06, 6 January 2023 (UTC)[reply]
I've always considered nouns to be quasi-interjections at best. My idea of "true" interjection is "oops" or "ouch". I have no literature on those quasi-interjections and what to do about them, so these are just my 2 cents at this point. --Dan Polansky (talk) 12:36, 6 January 2023 (UTC)[reply]
I agree with the "this is" criteria for determining interjections versus nouns like victory. In the case of skill issue I could see it being reworked into a noun (although I've mainly seen it used as an interjection). Ioaxxere (talk) 13:40, 6 January 2023 (UTC)[reply]
(The problem there is that like I pointed out at RFD the "this is" criterion doesn't apply to victory, the OED's definition of the interjection sense is "an expression of triumph or encouragement" (hence my "hortative" category, which needs separate discussion).) —Al-Muqanna المقنع (talk) 13:48, 6 January 2023 (UTC)[reply]
Whom are we serving when we include the PoS in these cases? Any normal user, including language learners, hearing or reading an NP used in isolation thereby knows that it is very probably being used as an interjection, an answer to a question, a question, a command, encouragement, etc. IOW, this is a matter of pragmatics and, therefore, it would rarely (never?) be lexical. I suppose that some kind of shift of lexical meaning might warrant an interjection PoS, but such shifts seem likely to be rare. DCDuring (talk) 16:26, 6 January 2023 (UTC)[reply]
Well, I think if it's an interjection, as you say, it should be listed as one. For example, grow up is defined as a verb, and one of the senses is described as being often used as an imperative interjection. If we agree that it's an interjection, why not break it out into its own section? That's just the argument from a rules-as-rules sense, but you asked about benefits ..... having a separate part of speech will give us more room to illustrate the range of meaning with quotes and use-examples instead of cramming it into a single sense under the noun definition where it might be easily missed. An interjection might have more variety of meaning than its parent noun or verb does. Soap 09:28, 16 January 2023 (UTC)[reply]
FYI, I've gone ahead and reworked skill issue into a noun (referencing new quotations that have become available since I created the entry back in October). Ioaxxere (talk) 19:18, 6 January 2023 (UTC)[reply]

List of administrators[edit]

There probably is such a list somewhere. Can anyone point me to it? Thanks in advance. DonnanZ (talk) 16:55, 6 January 2023 (UTC)[reply]

@Donnanz: Wiktionary:Administrators — Fytcha T | L | C 16:57, 6 January 2023 (UTC)[reply]
Special:ListUsers/sysop should do it on most any MediaWiki wiki. —Justin (koavf)TCM 17:57, 6 January 2023 (UTC)[reply]
Sadly, some of the old guard seem to be inactive now, with their last contributions in 2022. DonnanZ (talk) 18:05, 6 January 2023 (UTC)[reply]
To be fair, we're only 6 days into 2023... and I personally find it quite refreshing that of the 34 admins elected before 2010, all have been active in the last 2 months or so (Internoob is the only exception). I guess we can't be too toxic a place, after all. Celui qui crée ébauches de football anglais (talk) 21:13, 6 January 2023 (UTC)[reply]
OTOH, 17 of our 108 human admins haven't edited after 2019, leaving more than 80,000 content pages per admin. It's a good thing we have more than 1,000 autopatrollers. DCDuring (talk) 23:04, 6 January 2023 (UTC)[reply]

Unattested intermediates of inherited descendents[edit]

@Thadh, Hythonia, Fenakhay, Sławobóg, Surjection but also this is very much a site-wide decision. I would like to establish a policy for words that were never recorded in parent languages but were inherited nonetheless. I see three options: 1) do not list the unattested intermediate at all anywhere; 2) create a reconstruction; 3) list it but unlinked.

Option three seems like the best middle ground. Vininn126 (talk) 18:26, 6 January 2023 (UTC)[reply]

Option 3. PUC 18:30, 6 January 2023 (UTC)[reply]
It should depend on whether the parent language has only one descendant or not. — SURJECTION / T / C / L / 18:32, 6 January 2023 (UTC)[reply]
So option 3 if one desc, option 2 if 2? Vininn126 (talk) 18:38, 6 January 2023 (UTC)[reply]
Sounds good to me, or alternatively option 1 for one descendant (I don't have a strong opinion either way). — SURJECTION / T / C / L / 18:38, 6 January 2023 (UTC)[reply]
My two cents:
  • For languages that are well-attested (Old Polish, Old East Slavic for instance), reconstructing ancestors when both a parent entry and a daughter entry exist is fairly straightforward in most cases, and should just be done, linked and given.
  • For more obscure languages where we can't yet reconstruct everything safely, I agree these should be skipped. But I can't even think of one such example off the top of my head.
  • For languages that only have a few inscriptions (Proto-Norse): Do we even want to have these separate from the parent language in many cases?
Realistically, for most attested intermediate languages I can think of (Old Lithuanian, Old Polish, Old Czech, Old Dutch, Old French, Old Swedish...) the corpus is more than large enough to be able to reconstruct a form for the era we're lemmatising under (and all these languages either already do or should normalise their entries at least to some extent), so we should do that and add them to the chain in my opinion. For reconstructed languages - if we're making entries for them at all, that means they should be based on some kind of model, and this model should allow for reconstructing from a parent language, provided that the daughter language is regular.
Now the only problem with any of this I can think of is irregularities in daughter languages - should we assume these irregularities are older or younger than the language we're reconstructing? In these - pretty rare - cases, I agree that we should omit the term (so use an {{inh}} template and leave the second parameter empty) and explain the irregularities in the attested language's etymology. Thadh (talk) 23:52, 6 January 2023 (UTC)[reply]
All (1, 2, 3), but with mobility in certain conditions. Gnosandes ✿ (talk) 00:20, 7 January 2023 (UTC)[reply]
Strong 1. See User talk:Catonif#Does *farβātos exist? and User:Catonif/great-pit-cleanup, where I'm keeping the project. Proto-Italic is not the intermediate stage between PIE and Latin, it's the language reconstructible from comparison between the attested terms of Italic languages, and without any other IE data. (though if I'm not convincing enough, I can settle for 2 as well, I meant 3). Catonif (talk) 19:52, 7 January 2023 (UTC)[reply]
That sounds to me more like a Proto-Italic problem than a site-wide problem. I also disagree that it adds clutter to the pages, since ideally you'd only want Proto-Italic to be given at Latin, Umbrian, Oscan etc. entries, and it provides an intermediate form between 8000 BC and 1000 BC, which is quite a big gap. Thadh (talk) 21:01, 7 January 2023 (UTC)[reply]
Yes, it's very much a Proto-Italic problem (and I presume also Proto-Hellenic, though I don't meddle there), but it seemed like the problem was shared with this. Also, well, we seem to be going off two different definitions of Proto-Italic, for me it's the reconstructible language through internal reconstruction of the Italic languages. It might be a good thing to agree what Proto-Italic even is. Catonif (talk) 22:49, 7 January 2023 (UTC)[reply]
In an ideal world, the language we'd get through internal reconstruction would be identical to the one obtained through applying sound changes from its parent language. Sadly, we don't live in an ideal world, which is why our Indo-European etymological communities should actually work together on this rather than go by the individual sources, because an etymological dictionary that doesn't handle a single model for its etymologies is frankly useless. But yes, this is a quite broad issue and I think that it has little to do with, say, whether or not to reconstruct Old French based on Latin and modern French. Thadh (talk) 23:15, 7 January 2023 (UTC)[reply]
Proto-Italic is a good illustration of how not to handle reconstructions. There's a notoriously bad IP who geolocates to the Pays-de-Loire region of France that treats all the world's languages as their private conlang, and who is especially fond of playing with Proto-Italic. I expanded an abuse filter that I had created to keep them out of the Reconstruction namespace in October of 2021, but there are lots of edits from before that and they've gotten around it on occasion when their IPs have changed. Some of their ranges: 89.225.181.102/14, 90.12.53.215/16, 109.211.233.154/24, 2A01:CB05:8B96:E000:A5:89DF:5351:CDCD/32. They've also used the accounts Inkbolt and Dim Blob. See Proto-Italic *leɣʷis for one of their creations that no one has touched.
At any rate, they may be fairly knowledgeable in some areas, but they don't bother with references and they make stuff up to fill in the gaps. For every possible way to go wrong with insufficient data, I'm sure you can find an example in their contributions (8 of the first 14 deletes on your page are theirs, for instance). Chuck Entz (talk) 22:19, 7 January 2023 (UTC)[reply]
It seems the general consensus is to avoid reconstructions. Can we generate a list of possible entries to clean up? Vininn126 (talk) 21:48, 14 January 2023 (UTC)[reply]
@Vininn126: Did you just base that on three supports and two opposes? Because I'd argue we need a little more than that. Thadh (talk) 22:04, 14 January 2023 (UTC)[reply]

Option 1, in some rare cases option 3. Delete everything in Category:Proto-Albanian lemmas. Ban Proto-Armenian. These things are magnets for charlatans. --Vahag (talk) 10:30, 20 January 2023 (UTC)[reply]

@Vahagn Petrosyan: Why not just delete Proto-Armenian and Proto-Albanian and then do option 2? Thadh (talk) 13:19, 20 January 2023 (UTC)[reply]
@Thadh: I would like to be able to mention Proto-Armenian (without linking) in rare cases like *ɣwino-. Vahag (talk) 13:42, 20 January 2023 (UTC)[reply]
Maybe change it to Pre-Armenian? Like Pre-Germanic? Thadh (talk) 14:12, 20 January 2023 (UTC)[reply]
To be clear: This proposal was only for languages that we have a code from (if I understand Vininn correctly), so intermediate forms may still be given without having to link to them according to option 2 and option 1 alike. I doubt anyone's actively proposing creating Pre-Germanic entries or removing them altogether Thadh (talk) 14:15, 20 January 2023 (UTC)[reply]
Yes. Vininn126 (talk) 14:23, 20 January 2023 (UTC)[reply]

This user is adding unattested "Old Prussian" neologisms as descendants of Proto-Balto-Slavic (e.g., astars), which is preposterous because those terms have only been invented/"reconstructed" in the 21st century based on the other descendants! They're also adding attested ones but using non-attested orthography from the conlang site twanksta.org instead of actual historical primary sources (e.g., awwins instead of awins). 70.172.194.25 23:29, 6 January 2023 (UTC)[reply]

They're even changing attested terms to non-attested ones, and are obviously ignorant of Baltic morphology. E.g. this edit which changed the Prussian descendant of the agent noun *arˀtāˀjas ("ploughman") from the attested word artoys with the same meaning, to the twanksta.org neologism artuwan meaning the instrument plough (itself based on the attested form preartue). 70.172.194.25 23:38, 6 January 2023 (UTC)[reply]
Thanks, let me know if you want this IP blocked. Benwing2 (talk) 00:41, 7 January 2023 (UTC)[reply]
Well, they seem to have stopped for now, so it may not be necessary. I'll leave them a message on their talk page, and if they continue to make questionable edits after that then maybe a block would be in order. 70.172.194.25 00:44, 7 January 2023 (UTC)[reply]

Notifying other Wiktionaries when a term fails RfV[edit]

Is there a procedure for this? E.g. see Æscleah, which has corresponding entries on the Northern Kurdish and Polish subdomains. 70.172.194.25 02:10, 7 January 2023 (UTC)[reply]

Unfortunately no, as each Wiktionary has its own RFV guidelines (ex: some entries on fr.wiktionary would not pass RFV here but have passed their guidelines there). Though you could maybe reach out to an editor there to get it removed. At the same time, some smaller Wiktionaries may take even longer to delete the entries as there's little moderation. AG202 (talk) 08:53, 17 January 2023 (UTC)[reply]

Etymologies of adverbs with lexicalized inflections[edit]

There is fairly wide consensus that we shouldn't have affix templates on nonlemma forms. What about if there is an adverb, particle, etc. that contains case suffixes? They are etymologically case forms, but are not inflected (outside of possible alternative forms which may have different case endings). An example of such a case is seinemmälle. It is etymologically an allative singular inflection (-lle) of a comparative form of seinä (despite it being a noun, as usually only adjectives have comparative forms, but the comparative form *seinempi does not exist outside of the case forms used as adverbs).

Possible options:

  1. Simply write out "(from?) (the) allative singular of {{m|fi|...}}" in the etymology.
  2. Implement a template like {{inflection of}} that is meant to be used in etymology sections (like {{initialism}} for etymologies, {{initialism of}} for definition lines).
    1. The template does not categorize anything.
    2. The template only categorizes into a single category per language.
    3. The template categorizes by form (and language, naturally).
  3. Use affixation templates. Write out the case that is represented either in the text, or use |posN=, |qN= to mark the case ending specifically.

Apparently Arabic has {{adverbial accusative}} for something like this.

(Bonus question: loitommalle is a case where all three external locative forms (i.e. a subset of all possible case forms for nominals) exist, but none other do. seinemmälle only has two of the three. Finnish has quite many of these - perhaps there is a case for avoiding duplication/redundancy with etymologies?)

@Vininn126, Thadh, Theknightwho, Fenakhay as users who were involved in the original Discord discussion. — SURJECTION / T / C / L / 21:25, 7 January 2023 (UTC)[reply]

I personally would prefer a template, if it were to categorize it should categorize into one category. The reasons for this are:
1) if we allow for affixation categories of just an ending, then people will wonder why we might categorize non-lemmas and ask why we don't add the affixes to each non-lemma. I recognize this is probably my weakest argument, since we can add a rule saying "don't add etymologies to non-lemmas", but I still foresee some sort of issue.
Vininn126 (talk) 21:32, 7 January 2023 (UTC)[reply]
When discussing how to minimise duplication when documenting non-lemmas, it was agreed that non-lemmas could have etymologies. An example is mice. --RichardW57m (talk) 13:30, 11 January 2023 (UTC)[reply]
2) It gives a sort of weight to these terms, which I believe the other methods don't. These are lemmas after all, and something special is happening with their lexicalization, that should be represented somehow, and I feel an affix template or something else doesn't quite get the job done. Also, I don't think these categories will be overly populated, but I'm sure some language out there might prove me wrong. Until then, I think it makes a certain sense.
Vininn126 (talk) 21:32, 7 January 2023 (UTC)[reply]
So far, I have approached this with the first option (cf. Ingrian alemmaal, hyvin), and I'm quite happy with it.
I don't think categorisation is a good approach here, but I would be fine with a non-categorising template.
Alternatively, categorisation is fine by me, as long as we try to keep these forms to a minimum - i.e. categorising loitommalle as inherited from Proto-Finnic *loit'omballek, which is morphologically regular and expected from a reconstructed lemma form, and reserving the categorisation for such cases like seinemmälle, where it appears there was no regular counterpart at any point in time. Thadh (talk) 21:38, 7 January 2023 (UTC)[reply]

We have a precedent with -es, which is referenced in the etymology of once. --RichardW57m (talk) 13:30, 11 January 2023 (UTC)[reply]

This user is hard to work with. He ignores rules, he edit wars, he doesn't want to cooperate. I don't know what to do.

Proto-Slavic:
*rarog - he pushed fringe theories that are ignored by 95% of modern etymologists. He didn't mention a single source stating that etymology, and he warred even after I added like 4 sources (9 at the end) for the fixed etymology. We spent a few hours on Discord explaining to him why his etymology doesn't make any sense; it didn't work.
*grimati - another absurd edit war without a single word of explanation. He stated that the etymology is unknown even though all dictionaries explain the etymology of that word. Again, we spent a few hours talking about it without a good result. We even emailed {{R:sla:ESSJa}} to make sure...
He just reverted my change at *voľa - he pushes his own rules against our rules: for Proto-Slavic we accept accent paradigms (a), (b) or (c) only and we have templates supporting only these, other paradigms are not very popular in scientific community and we should use traditional ones (WT:ASLA). Again, he doesn't care.
Polabian
I started editing this language some time ago, and my goal is to make WT have all its words. An "About" page was missing, so I made WT:APOX, also to standardize our alphabet. I proposed the most popular one, used by {{R:pox:SejDp}}, with small changes. The author of this alphabet is perhaps the most significant researcher of this language, and the alphabet I have proposed is used in his last work ({{R:pox:Polanski:2010}}). However, Gnosandes keeps ignoring that and uses the simplified alphabet used in {{R:pox:Polanski:1967}}, which is not that popular. Additionally, a few days ago I made a template ({{R:pox:Olesch1983}}) for a dictionary that Gnosandes published on webarchive, and this researcher also uses the alphabet I proposed, which Gnosandes also ignores. Additionally, I mentioned more sources that use my orthography; that was also ignored. "Problematic" letters: ʒ vs dz (always preferred ʒ), i̯ vs i (diphthong, usually preferred i̯), u̯ vs u (diphthong, usually preferred u̯), χ vs x (variously). Additionally, ʒ, i̯, u̯ and χ are used in Slovincian, and these letters are also used by uk.wikt and pl.wikt (the only 2 wiktionaries with a lot of Polabian words); changing the alphabet so it fits his weird ideas will break interwiki.
So that is it. Sławobóg (talk) 21:47, 7 January 2023 (UTC)[reply]

@Sławobóg This is not the first time people have complained about this user. He has been blocked several times in fact, as you can see here: [1] Not only does he seem to believe in and push fringe Slavic theories but he has several times injected political commentary into usage examples (and been blocked for doing so). I'm not sure what the correct response is, maybe we need to block him again. User:Atitarev can you comment as you've dealt more directly with him? Benwing2 (talk) 02:22, 8 January 2023 (UTC)[reply]
@Sławobóg, Benwing2: My personal negative experience with Gnosandes was mostly limited to his pushing some political garbage in usage examples you can hear from pro-Putin channels, his failure to stay neutral and his whataboutist responses. That was back in July, though. I haven't checked or noticed anything since then. I also dislike his failure (in some instances) to apply normalisation and standards accepted by the community to Slavic entries and share your dismay. Anatoli T. (обсудить/вклад) 02:40, 8 January 2023 (UTC)[reply]
My experience with the user is a general lack of willingness to cooperate; he instead focuses on what he believes I am saying rather than actually responding to what I am saying. Either there is a language barrier or a lack of willingness to cooperate; either way, it's nigh impossible to get anything done. Vininn126 (talk) 10:42, 8 January 2023 (UTC)[reply]
It's both but more the latter. Should have been perma-blocked a long time ago. -- Skiulinamo (talk) 10:51, 16 January 2023 (UTC)[reply]

Request for template editor[edit]

(Continued from Module talk:quote#Archiveurl and Archivedate, pinging @Sgconlaw) I would like to be able to edit and preview changes for Template:cite-meta. I haven't touched templates for the most part so far, but I have made edits on Module:quote which were well-received and I plan to do more template edits in the future. If there are criteria I haven't yet met, please let me know. Ioaxxere (talk) 17:47, 8 January 2023 (UTC)[reply]

@Sgconlaw does this pass? Ioaxxere (talk) 01:07, 3 February 2023 (UTC)[reply]

Upcoming vote on the revised Enforcement Guidelines for the Universal Code of Conduct[edit]

You can find this message translated into additional languages on Meta-wiki.

Hello all,

In mid-January 2023, the Enforcement Guidelines for the Universal Code of Conduct will undergo a second community-wide ratification vote. This follows the March 2022 vote, which resulted in a majority of voters supporting the Enforcement Guidelines. During the vote, participants helped highlight important community concerns. The Board’s Community Affairs Committee requested that these areas of concern be reviewed.

The volunteer-led Revisions Committee worked hard reviewing community input and making changes. They updated areas of concern, such as training and affirmation requirements, privacy and transparency in the process, and readability and translatability of the document itself.

The revised Enforcement Guidelines can be viewed here, and a comparison of changes can be found here.

How to vote?

Beginning January 17, 2023, voting will be open. This page on Meta-wiki outlines information on how to vote using SecurePoll.

Who can vote?

The eligibility requirements for this vote are the same as for the Wikimedia Board of Trustees elections. See the voter information page for more details about voter eligibility. If you are an eligible voter, you can use your Wikimedia account to access the voting server.

What happens after the vote?

Votes will be scrutinized by an independent group of volunteers, and the results will be published on Wikimedia-l, the Movement Strategy Forum, Diff and on Meta-wiki. Voters will again be able to vote and share concerns they have about the guidelines. The Board of Trustees will look at the levels of support and concerns raised as they look at how the Enforcement Guidelines should be ratified or developed further.

On behalf of the UCoC Project Team,

Mervat (WMF) (talk) 13:12, 10 January 2023 (UTC)[reply]

Italian participle+object constructions like scatenatosi, scatenantesi[edit]

I asked a question about scatenatosi that led to a wider discussion about such forms. I'm moving it here to get a broader perspective.

(begin copied discussion)

(Notifying GianWiki, SemperBlotto, Ultimateria, Jberkel, Imetsia, Sartma, Catonif): I was under the impression that past participles with enclitic -si were archaic, but I can find many examples of scatenatosi, scatenatasi etc. in recent texts. Is this a special case or is it relatively common? Are there other reflexive past participles like this? Benwing2 (talk) 02:20, 9 January 2023 (UTC)[reply]

@Benwing2 Not a special case. Participles with -si are archaic when being participles (i.e. we now say si è scatenato and not è scatenatosi) but are still in use as adjectives (or other specific formations), eg:
  • La guerra, scatenatasi nel 1500 in Germania, giunse finalmente al termine. — The war, unleashed in 1500 in Germany, finally came to its end.
  • Scatenatosi, si calmò. — [After] having gone wild, he calmed down.
Catonif (talk) 08:32, 9 January 2023 (UTC)[reply]
@Benwing2: Not a special case (and pretty common indeed), as per Catonif's explanation. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 10:35, 9 January 2023 (UTC)[reply]
@Benwing2: I believe Catonif explained it pretty well. — GianWiki (talk) 17:41, 9 January 2023 (UTC)[reply]
@Catonif, Sartma, GianWiki Hmm. I presume that not all reflexive past participles can randomly be used as adjectives ending in -si (similarly to how only certain past participles in English can function as adjectives). If so, then where can I find these reflexive past participles (functioning as adjectives) in dictionaries? Treccani, for example, has scatenato listed under scatenare but not scatenatosi. Hoepli and Olivetti similarly have entries for scatenato but not scatenatosi, and none of these entries give any examples containing scatenatosi. Can you give some other examples of adjectives ending in -atosi, -itosi and -utosi? Benwing2 (talk) 19:05, 9 January 2023 (UTC)[reply]
@Benwing2 If a verb can take the -si, so can the participle: trovatosi, tramutatosi, credutosi, mangiatosi, etc. It's probably more rare and literary if the -si is reciprocal: *conosciutisi sounds very old-book-ish, and might not even exist. As said above, no *andatosi can exist (since there can be no *andarsi) but andatosene can (because andarsene). Dictionaries don't have them because they're non-lemmas. Catonif (talk) 19:36, 9 January 2023 (UTC)[reply]
@Catonif: andatosi can exist, though, if the si is borrowed from another verb: «Così dopo pochi giorni il Brutto Anatroccolo si svegliò, ed andatosi a specchiare nello stagno vide che tutte le sue piume erano diventate bianche come il latte, e la sua goffaggine si era trasformata in un portamento elegante ed aggraziato: era diventato un cigno!». — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 11:10, 10 January 2023 (UTC)[reply]
Aha, you're right! Catonif (talk) 15:17, 10 January 2023 (UTC)[reply]
Maybe it's worth noting that the -si is -mi for first person singular.
  • Credutomi spacciato, mi arresi. — Believing I was done for, I surrendered.
Formations with -oti (2s), -ici (1p), -ivi (2p) and -isi (3p) on the other hand all sound old to me. Catonif (talk) 19:55, 9 January 2023 (UTC)[reply]
@Catonif OK. I'm asking because a while ago I deleted a bunch of such forms that were auto-generated, on the theory that they were obsolete and needed individual attestation. One such example of many is amatosi; you can see the deletion message, including the old content. conosciutosi also was auto-generated and later deleted. andatosene still exists. This leads to a bunch of questions, e.g. for reflexive verbs we currently display the past participle without any clitics, e.g. andarsene just lists its past participle as andato. Should the clitic-full forms be listed instead, or in addition? And how do we know automatically (maybe not possible) whether a given reflexive verb has such a form? For example, conoscersi does exist, and for this reason SemperBlotto auto-generated *conosciutosi. Benwing2 (talk) 20:59, 9 January 2023 (UTC)[reply]
BTW in your examples Scatenatosi, si calmò and Credutomi spacciato, mi arresi, the participle is behaving like what in Russian would be called an adverbial participle (which has a distinction between adverbial and adjectival participles, each of which can be active or passive and if adjectival can also be distinguished as present or past). (In Latin they would likely use an ablative absolute in these circumstances.) Not sure how to gloss such usages in Italian. Benwing2 (talk) 21:10, 9 January 2023 (UTC)[reply]
Yes, "ablative absolute" is exactly how I would describe it. Catonif (talk) 15:17, 10 January 2023 (UTC)[reply]
Related question ... present participles? I find recent examples of scatenantesi, is the situation the same with these as with past participles or are there some extra nuances here? Under scatenarsi the present participle is given just as scatenante, should it be scatenantesi instead or in addition? Benwing2 (talk) 06:59, 10 January 2023 (UTC)[reply]
@Benwing2: I personally think that in cases like andarsene we should give andatosene as a past participle; I find it clearer, and if I was a learner that's what I'd prefer.
Participles are indeed used like the ablative absolute in Latin, to form what in Italian are called "implicit relative clauses" (proposizioni relative implicite), and can use both past and present participles. (See for example: Relativa implicita > Participio)
Our attestation principle doesn't go well together with grammatical forms like the one we are discussing, since any native speaker could come up with something "never said or written" any time, as long as it is allowed by Italian grammar. Whatever is grammatically possible might be written or said, the fact that you can't find it online doesn't mean it's not Italian or that it cannot exist. I believe dictionaries should also give you "possible" forms, not only "attested" ones, but I do understand that this is a different, and much bigger topic.
So, when it comes to participles with pronouns attached to them, as long as we keep our criteria for inclusion the way they are, the job of determining what should be given and what not is going to be colossal... — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 11:29, 10 January 2023 (UTC)[reply]
@Benwing2: Random comment, but I wouldn't consider conosciutosi wrong. It's usually found in the plural conosciutisi or conosciutesi, but since we always give the masculine singular as a citation form, we have to go with conosciuto here. This doesn't mean that the past participle of conoscersi doesn't exist, though. It's only rare to find a context that would require that form, but nonetheless not impossible at all: «e via via, conosciutosi il suo valore, egli entrò in relazione con dotti e illustri uomini» (Here, for example. A bit old, but still modern Italian. In this case the si is not reflexive, but impersonal, but still). I could say "Conosciutosi meglio, Giovanni fu finalmente in grado di modificare il suo comportamento". It's unusual, but neither impossible nor wrong. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 11:44, 10 January 2023 (UTC)[reply]
@Benwing2: Same goes for amatosi: «I due, amatisi a Roma, avevano deciso di divorziare dai rispettivi consorti per unirsi in matrimonio.» From here. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 11:46, 10 January 2023 (UTC)[reply]
@Benwing2 I'd discourage the presence of these forms as entries here, as they don't mean anything more than what they are formed by, just like I'd discourage conoscerla, trovarci, saperne, etc. and as I would also discourage conjugation boxes on verbs with particles attached to them, like andarsene, trovarsi, andarci, comprami, etc. These particles dance around a sentence without a single care (even more so in older texts), and the fact that there's no space between them and the unlucky word they'll stick on is mostly an orthographical matter, and shouldn't distract the learner into thinking that these are "new" words. Having all the possible combinations of particles on top of every verb form (because yes, nothing stopped Dante from saying fécemi or trarròtti) of every verb really doesn't seem ideal. They might have occasionally helped learners but it would take us all the way to fàbbricamicene. These are the problems that arise with this current system, and I have no good solution if not just ditching it and 1) removing conjugation boxes from particled verbs and 2) removing combined forms. Of course I realize this would take time and thought and discussion. Catonif (talk) 15:17, 10 January 2023 (UTC)[reply]

(end copied discussion)

Several users e.g. User:Catonif and User:Sartma make some very good points. My personal take is it's fine to include conjugation tables esp. of reflexive verbs (e.g. ritenersi) and probably also verbs with attached particles (e.g. andarsene) but I don't see the point of creating combined forms like aggiustarmela or agitali except maybe in a few cases where there are irregularities or archaic forms involved, e.g. andovvi and the above-cited trarròtti. I also don't see the point of including combined forms in verb conjugation tables in most cases. (We do this for Spanish but not Italian; it makes a certain amount of sense in Spanish because the rules for accent placement are somewhat complicated and get messy when you have particles attached, but this doesn't apply to Italian.) However, we have 15,000+ entries currently in Category:Italian combined forms. Per our WT:CFI, I'm not sure I can just mass-delete them (as I'd like to do) and nothing prevents an energetic user with a bot from creating a zillion more. I wonder if we need to rethink or at least clarify the rules for such combined forms in Italian. See also Wiktionary:Requests for deletion/Non-English#essersi, where User:Fytcha enumerated some potential rules for situations like this. Benwing2 (talk) 03:00, 11 January 2023 (UTC)[reply]
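
As a rough illustration of how mechanical these combined forms are (and hence how easily a bot could mass-produce them), here is a minimal Python sketch. The function name is hypothetical, it assumes a regular masculine singular participle ending in -o, and it expects the clitic cluster already spelled as it attaches (e.g. "sene" for andarsene). It only concatenates; it says nothing about whether a given form is attested or idiomatic.

```python
def combined_participle_forms(participle, clitics):
    """Attach an enclitic cluster to the four agreement forms of a participle.

    Hypothetical helper: `participle` is the masculine singular form
    (e.g. "scatenato"), `clitics` the cluster as written (e.g. "si", "sene").
    """
    if not participle.endswith("o"):
        raise ValueError("expected a masculine singular participle ending in -o")
    stem = participle[:-1]
    # Order: masculine singular, feminine singular, masculine plural, feminine plural.
    return [stem + ending + clitics for ending in ("o", "a", "i", "e")]


print(combined_participle_forms("scatenato", "si"))
print(combined_participle_forms("andato", "sene"))
```

Because the output is fully predictable from the participle and the clitic cluster, any rule about which of these deserve entries has to come from policy rather than from the generation step itself.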

I've just implemented this template/module. The rationale behind this is that our English entries usually have better categorization, labels and {{place}} arguments than their non-English counterparts. By using this template, we ensure that the non-English entries get all these nice things, too, without editors having to tediously copy-paste the senses over. Furthermore, changes or fixes to the English sense automatically propagate to all non-English entries. Note that this template may only be used for terms that have perfect translations between languages. Terms that have perfect translations are predominantly those that are "accepted to be prescribed"; in practice they are mostly proper nouns (especially toponyms) or scientific vocabulary (taxonomic names, units, mathematical objects etc.).

For this template to work, the English donor sense has to be annotated with {{senseid}} (right after the #). If there is no {{senseid}} present and if you're about to add it, please briefly check whether Wikidata has an entry for the referent and if so, use Wikidata's Q... code as the senseid.
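As a rough illustration of why the {{senseid}} annotation matters, here is a minimal Python sketch of looking up a donor sense by its ID in raw wikitext. It is not the module's actual code (the real implementation is a Lua module); the function name `find_sense_line` and the sample entry text are made up for this example.

```python
import re
from typing import Optional

def find_sense_line(wikitext: str, lang: str, sense_id: str) -> Optional[str]:
    """Return the definition line tagged with {{senseid|lang|sense_id}}, or None."""
    marker = "{{senseid|" + lang + "|" + sense_id + "}}"
    for line in wikitext.splitlines():
        # Definition lines start with one or more "#"; lines such as "#:" (usage
        # examples) or "#*" (quotations) are not definitions and are skipped.
        if re.match(r"#+(?![#:*])", line) and marker in line:
            return line
    return None

# Hypothetical donor entry text, written for this example only.
sample = (
    "==English==\n"
    "===Proper noun===\n"
    "# {{senseid|en|Q31}} {{place|en|country|r/Western Europe}}.\n"
    "#: {{ux|en|I live in Belgium.}}\n"
)
print(find_sense_line(sample, "en", "Q31"))
```

Without the {{senseid}} on the donor line there is nothing stable to match against, which is why the annotation has to be added before the template can be used.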

For the categories to propagate, they have to occur in the definition line, too. This style is not that common on Wiktionary at present but I'd argue that it is the most sensible thing to do anyway, even if it weren't for this template. IMO, the best place to put them is between {{senseid}} and {{lb}}.

If there is interest in this template, I'll write the documentation and fix the sortkey shenanigans that are currently going on. Ideally, I'd also want it to propagate the "firstness" (the property that that entry appears as the first one within a topical category, see e.g. koira in Category:fi:Dogs) because firstness is invariant under perfect translation.

It currently doesn't work with United States of America due to some interference between {{,}} and {{place}}: {{,}} appears outside of {{place}} in the English entry but {{transclude sense}} moves it inside in non-English transclusions (compare English Belgium / Romanian Belgia to see why). I've looked at the template code generated by {{transclude sense}} and it looks fine so the issue is on {{place}}'s end (i.e. {{place}} misinterprets the HTML generated by {{,}}).

For examples of this template in action, see Romanian Belgia, Romanian hidrogen, Portuguese Atari. Any feedback welcome. — Fytcha T | L | C 14:53, 11 January 2023 (UTC)[reply]

I have been thinking about a template like this for a long time - it's more easily machine readable and easy to link definitions. In theory this template can be modified to link to the given definition, no? The same way senseid's work.
I think we could potentially add a bunch of senseid's to defs from WikiData automatically, particularly pages with one def, and check other pages (I'm thinking of things like boring adverbs and such), potentially check trans boxes and also pages with only one defline and convert them. Vininn126 (talk) 15:06, 11 January 2023 (UTC)[reply]
@Vininn126: In theory this template can be modified to link to the given definition, no? The same way senseid's work. It already does that. It generates a link to the correct English entry and sense and then displays the English definition as the gloss (which can be turned off using |nogloss=1 but I'd argue should be left as is for almost all entries). Check out the three examples I've provided. — Fytcha T | L | C 15:16, 11 January 2023 (UTC)[reply]
That's definitely ideal, then. Obviously my suggestion is for words with one def, and this is better for words with many defs. If it's things like chemical compounds or elements or such, then we're in luck. Vininn126 (talk) 15:19, 11 January 2023 (UTC)[reply]
Like {{desctree}}, this depends a lot on the details of another entry without any indication visible at the other entry. That makes me nervous. It also reads the wikitext of the other entry into Lua memory, which is the kind of thing that makes Han-character entries such memory hogs. I can see how this might cause problems in pages where there are lots and lots of languages with the same spelling for the same thing. Used in the right entries in the right way, it will be good- but we have no control over that. Chuck Entz (talk) 15:48, 11 January 2023 (UTC)[reply]
That's a double-edged sword - there will be times where making mass updates will be good, and times where the defs don't exactly line up. In the second, the template should not have been deployed in the first place. Vininn126 (talk) 16:13, 11 January 2023 (UTC)[reply]
@Chuck Entz: Those are two valid concerns. When I considered implementing the template, I just thought the benefits outweigh the downsides.
As for memory: This template, of course, increases the memory consumption on pages it is used on, no question. To give some figures: revision 70798354 (10 {{transclude sense}}) uses 14.1±0.013MiB, revision 70439449 (no {{transclude sense}}) uses 12.8±0.028MiB (N=10). However, this is overly pessimistic because we'd need to compare the consumption of two visually identical pages as the former revision displays more information than the latter. Another point is that this template still has a lot of potential in terms of memory-efficiency; during development, I managed to reduce its memory consumption to about half of what it initially was. I still don't understand all the nuances behind how much memory frame:preprocess and frame:expandTemplate consume but I firmly believe there is more to be squeezed out of them than what I arrived at by more or less trial and error. Furthermore, comparing Fenakhay's and my revision on (history) Belgia, it is clear that the module doesn't memoize at all (between invocations on the same page), even though the entire parsing phase is wholly language-independent (i.e. is identical among all invocations of {{transclude sense}} on Belgia). I don't know if there is a solution to this but if there is, this is going to be another huge improvement on pages with many languages and one common referent.
As for inadvertently causing errors in another article à la {{desctree}}, that is true and it is a problem, but I'd say it becomes less of a concern the more robust and mature {{transclude sense}} grows. Moreover, carelessly changing the {{senseid}} in an article is already an error (in that it causes dangling links) even if it is not currently a Lua error, so this template converts this kind of silent error into loud (and thus fixable) ones (which is good). — Fytcha T | L | C 16:29, 11 January 2023 (UTC)[reply]
  • "I don't know if there is a solution to this but if there is, this is going to be another huge improvement on pages with many languages and one common referent." I don't think there's any way to pass/store data between different module invocations. This is an intentional choice on the part of the Scribunto developers, because it would interfere with Parsoid's design goals. Even if there were a way around this, it would be considered a defect to be removed and therefore couldn't be relied upon.
  • From a few experiments (invoking Module:Sandbox/43 in WT:Sandbox with various parameters), it seems like there might already be some kind of caching between calls to mw.title.new(...):getContent() for the same title, so that doing so over and over "only" adds a constant amount of memory each time (~82 KB) that seemingly doesn't depend on page size, but calling it with previously unseen titles adds more. The processing itself could not be cached, though. 70.172.194.25 06:21, 13 January 2023 (UTC)[reply]
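To make the memoization point concrete, here is a small Python sketch of what caching the language-independent parsing step would look like in an environment that allows shared state. Per the comment above, Scribunto does not permit this between module invocations, so this only illustrates the idea and is not something that could be dropped into the module. The names `parse_donor` and `render` are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def parse_donor(wikitext: str) -> tuple:
    """Language-independent step: collect the definition lines of the donor entry."""
    return tuple(line for line in wikitext.splitlines() if line.startswith("# "))

def render(wikitext: str, target_lang: str) -> list:
    """Language-dependent step: reuse the cached parse for every target language."""
    return ["[" + target_lang + "] " + sense[2:] for sense in parse_donor(wikitext)]

donor = "# {{senseid|en|Q31}} A country.\n#: an example line\n"
for lang in ("ro", "fi", "de"):
    render(donor, lang)

# The donor text is parsed once; the later calls are served from the cache.
print(parse_donor.cache_info())
```

On a page with many languages sharing one referent, only the first call would pay for the parse; the rest would pay only for the language-specific rendering.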
I'm not persuaded that a Lua function is better than a system of templates. I think a family of templates (one per sense maybe plus some general templates) would be better:
(1) It avoids the problem of running out of memory for Lua.
(2) Quite possibly quicker.
(3) Already tooled up for tracking usage, e.g. when we find that technical words have multiple meanings in English but not in other languages.
--RichardW57m (talk) 16:12, 12 January 2023 (UTC)[reply]
Many languages have a different focus than others on what a country is, or what hydrogen may signify: I'm sure your average Ingrian speaker doesn't care that the Netherlands are "the main constituent country of the Kingdom of the Netherlands, located primarily in northwestern Europe bordering Germany and Belgium", they just care that it's a country in Europe, so giving all this encyclopedic information is completely useless, and also lexicographically not faithful to the speakers' world view. As such I'm not sure how useful this template is, since it essentially reduces definitions to a kind of machine code, whereas our entries should ideally aim to describe the definitions with relation to the speakers' world views. Thadh (talk) 17:08, 11 January 2023 (UTC)[reply]
Although that's a fair point in principle, I'm not sure that in practice having "Budapest (the capital city of Hungary)" manually duplicated 10 times at Budapest provides much insight into Weltanschauungen. Fytcha did say "this template may only be used for terms that have perfect translations between languages", which seems like an adequate proviso to me. —Al-Muqanna المقنع (talk) 17:23, 11 January 2023 (UTC)[reply]
As long as it is, that seems fine; I'm just saying it shouldn't be used indiscriminately and added by people that don't know the language. Thadh (talk) 18:05, 11 January 2023 (UTC)[reply]
Yeah, I kinda agree that the benefit of having an explicitly written gloss for common words is unclear to me, especially when repeated over and over on the same page, even if the fact that we can do it is technically cool. For example, Fytcha put this on hidrogen#Romanian, but I'm not sure I actually find that any more aesthetically pleasing, or useful, than the simpler definitions above. I can also imagine that someday someone feels like adding more detail to the English entry, which will then make the already long gloss even longer. Ultimately this is basically a stylistic preference, I suppose. 70.172.194.25 06:51, 13 January 2023 (UTC)[reply]
Thanks for the feedback. Maybe it is indeed ultimately a stylistic preference but I think there are some (perhaps underappreciated) objective facts that should at least inform our perception of this matter. For instance, I don't find it particularly satisfying if an entry such as Nullvektor contained nothing more than what is basically a soft-redirect to the English entry zero vector because if a reader who is unfamiliar with both Nullvektor and zero vector looks up Nullvektor, we force additional cognitive workload on them by only telling them the meaning of the German term by transitivity (i.e. we force them to learn the English middleman and only then do we allow them to figure out the meaning of the German term). We also force them to visit one more page. It's particularly bad if the reader is interested in different aspects of the German word (e.g. first the meaning, then the gender) because they are then forced to jump back and forth. All this is solved by just having the mathematical definition in the German entry as well.
Intuitively, many editors seem to dislike it if non-English entries contain more than a simple soft-redirect. I share this intuition but I'd say it mostly stems from a desire to reduce redundancy which this template takes care of without having to sacrifice self-sufficiency. I strongly disagree with the notion that non-English entries don't have a right to self-sufficiency; I understand Wiktionary as a multilingual dictionary, not as an English dictionary with some (bidirectional) translations. — Fytcha T | L | C 12:04, 16 January 2023 (UTC)[reply]
I, too, share this feeling. I really hate synonym listing and gloss-less defs. Vininn126 (talk) 13:03, 16 January 2023 (UTC)[reply]
@Thadh: If it doesn't fit, don't use it. This doesn't take away from the fact that there are cases where it is fitting. Additionally, the gloss can be suppressed using |nogloss=1 which still gives you the benefit of labels and categories being propagated (which less well documented languages usually direly lack). In the case of a non-English term linking to Netherlands, I disagree that the gloss is encyclopedic; it serves to disambiguate the polysemic English translation. Even if it weren't for that, I would, while reading a dictionary, never take the degree of verbosity of a gloss to be a reflection of the average native speaker's attitude towards the referent; I would have never taken the presence or absence of |nogloss=1 as a documentation of how much the average Ingrian speaker cares about the legal minutiae of the Netherlands. — Fytcha T | L | C 17:51, 11 January 2023 (UTC)[reply]
I'm sure there are plenty of cases where two groups share the same reality and view of a word in which case it completely makes sense to link them, as we already do with senseid. If they are different, we shouldn't link them, as mentioned in the original post. Vininn126 (talk) 17:58, 11 January 2023 (UTC)[reply]
I love this! I was recently lamenting the lack of something similar when editing the entry maguey that English has borrowed from the Spanish. It seemed unfair (although perfectly understandable) that the Spanish definition of maguey was maguey while the English contained a full gloss. This is a great solution. Could it also be used as a replacement for {{syn of}} for words that are perfect synonyms within the same language for something like condicionado? What about some sort of {{transclude lemma}} for forms? JeffDoozan (talk) 17:56, 11 January 2023 (UTC)[reply]
This is an interesting idea but I fear it might be TOO broad sweeping - there might be times where a word is expanded and has some new senses not shared by other languages. Vininn126 (talk) 17:59, 11 January 2023 (UTC)[reply]
I meant for transcluding senses/lemmas within the same language, to avoid duplicating the sense text between synonyms or for showing the lemma's senses on the form's page. I agree it would not be ideal for transclusions between languages for the same reason you stated. JeffDoozan (talk) 18:08, 11 January 2023 (UTC)[reply]
That makes sense, sorry. Vininn126 (talk) 18:12, 11 January 2023 (UTC)[reply]
@JeffDoozan: Thanks. An analogous template for terms within the same language could be implemented but I'd be very wary of that; perfect synonymy is much rarer and much harder to identify than perfect translationhood. — Fytcha T | L | C 13:18, 12 January 2023 (UTC)[reply]
I can't support it for the same technical reasons that were pointed out above, and Thadh's concerns also make me feel like this isn't really a good idea in practice. — SURJECTION / T / C / L / 18:24, 11 January 2023 (UTC)[reply]
Agreed. The idea is good, but the implementation (inevitably) hacky. We need a real datastore, not more regular expressions. – Jberkel 20:18, 11 January 2023 (UTC)[reply]
@Jberkel: I'd argue, if you like the idea but not the current implementation, having (and using) this template is still a big plus. Once we figure out a better implementation, the information of what is a perfect translation of what will already be here (in the form of {{transclude sense}} invocations). — Fytcha T | L | C 13:18, 12 January 2023 (UTC)[reply]
I like the idea of having a single place for the meat of the definitions, but I do not like the idea of using templates to make a kludge like this. If anything, we should be tearing down as many of our existing kludges as possible, or we will never get rid of eternal grievances like Lua memory errors. The correct solution for consolidating data is to completely restructure the dictionary into something machine-readable instead of the semiformatted wikitext we have now, but Wiktionary would stop being Wiktionary at that point. — SURJECTION / T / C / L / 15:21, 12 January 2023 (UTC)[reply]
Hear, hear!, to the idea of reducing kludginess.
Not sure I agree that doing so would un-Wiktionarify Wiktionary. I refuse to accept that kludges are any inherent part of our identity as a community of editors. 😄 ‑‑ Eiríkr Útlendi │Tala við mig 19:48, 12 January 2023 (UTC)[reply]
This seems OK only when used for the lack of anything better. But what we call synonyms in English are often merely near-synonyms. I doubt that the situation differs in other languages. I would expect that meanings would evolve differently in different languages and that different levels of detail and different nuances/connotations are appropriate for different languages. I don't think this should ever over-ride an existing definition that consists of more than a single term. It also has no bearing on a one-word gloss in any language where the one English word is polysemic. DCDuring (talk) 19:00, 11 January 2023 (UTC)[reply]
Meh. This seems like it could work for placenames; I'm not sure what else would be so perfectly synonymous, so I might be wary of having it as a general template people are liable to widely misuse, since people are already, in this thread, floating using it for things like synonyms within the same language, which {{synonym of}} or just defining Barfoo as "# [[Foobar]]" is for.
There are a few things it might be helpful if we could store in a more database-like structure which multiple entries could pull from, perhaps most famously etymologies (so that if it's thought English borrowed foo from Hindi, which borrowed it from Tibetan, and this gets propagated out to every other language which borrowed foo from English so they all say they got it "from English, from Hindi, from Tibetan", but we later find the English word is actually from Persian, from Arabic, we don't have a zillion still-wrong entries after someone fixes the English one), but also this kind of thing (recording in one place that City X is in Area Y). (I suppose, technically, this type of thing would also allow displaying the definitions of flavouring / flavoring in two places at once...) - -sche (discuss) 22:17, 11 January 2023 (UTC)[reply]
@-sche: I'm not sure what else would be so perfectly synonymous Scientific vocabulary (at least in the hard sciences); I've just created Nullvektor using this template. — Fytcha T | L | C 13:18, 12 January 2023 (UTC)[reply]
There are plenty of situations where there is an exact translation. Vininn126 (talk) 13:52, 12 January 2023 (UTC)[reply]
Yeah, I'm a bit confused by the comments about how no/very few words have exact translations etc. There's a huge domain of technical vocabulary with objective referents, terms that are typically either shared as internationalisms or translated via calques. What might be useful is to have the template add entries to a hidden maintenance category that can be monitored for abuse. —Al-Muqanna المقنع (talk) 16:03, 12 January 2023 (UTC)[reply]
Not only that but also certain things like specific colors or even everyday vocabulary like "dictionary" or "guitar"... Or even everyday verbs. A maintenance category could be good. The concern raised earlier about a change to an English page radiating to many other pages is a real concern. I'd still like to get a bot to assign ID's to definitions, however. Vininn126 (talk) 16:12, 12 January 2023 (UTC)[reply]
Can you give examples of reliable colours? Basic colours are notoriously linguistically variable. Verb equivalences tend to be inexact. --RichardW57m (talk) 16:37, 12 January 2023 (UTC)[reply]
These tend to be the fancier ones, like alabastrowy and alabaster. Vininn126 (talk) 16:44, 12 January 2023 (UTC)[reply]
How would we handle the sudden recognition of a difference between English homomorphism and French homomorphisme? It's only in French that I've seen the requirement that such functions be open mappings, i.e. map open sets to open sets. --RichardW57m (talk) 16:20, 12 January 2023 (UTC)[reply]
I assume the same way we would handle it now, by either a usage note or by changing the gloss (and removing the template). Such a difference is not mentioned in the existing entry either. Someone with more maths knowledge like @Fytcha can probably comment. —Al-Muqanna المقنع (talk) 16:38, 12 January 2023 (UTC)[reply]
Ah, I overlooked that the proposed system still generates a link to the English term. Sorry. --RichardW57m (talk) 16:55, 12 January 2023 (UTC)[reply]
I personally find this neat. Nonetheless, all the technical concerns mentioned above are more or less valid. They're not unique to this template, and would also apply to a host of widely used CJK templates (including but not limited to {{zh-see}} and {{ja-see}}), {{desctree}}, {{senseno}}, and probably lots of others. For the Lua memory issue in particular, it probably won't be much of a problem if one simply refrains from using this on pages near the memory limit (e.g., it's fine to use on something small like Nullvektor, we obviously wouldn't want to use it on a). Along the lines of Chuck Entz's comment, the issue of senseids potentially getting out of sync comes to mind. I definitely don't think this should (at least for now) be adopted as a standard or that we should have a bot mass-convert entries to use this, but small-scale experimentation seems alright. Then we can see how it empirically works out in practice with all the potential issues people are raising here, vs. the potential benefits. 70.172.194.25 06:21, 13 January 2023 (UTC)[reply]
Perhaps not convert, but mass ID assigning is going to be a pain in the tuchus. Vininn126 (talk) 11:31, 14 January 2023 (UTC)[reply]

A verification page for the Reconstruction namespace

Over the years, we've dealt with challenged reconstructions in different places: WT:RFDO, the Etymology scriptorium(s), and currently WT:RFVN. I think we should have a dedicated page for these. Reconstructions are different from regular entries because they're an outgrowth of the etymologies and because they are, by definition, not attested. Also, RFVN is seriously overloaded due to how much longer it takes to verify entries in many languages.

I propose we create a page called "Requests for verification/Reconstruction" with an alias of "WT:RFVR".

We should create it using the method we use for the monthly forum pages: move the latest Etymology scriptorium page to that name, then move it back and replace the redirect with the appropriate content for an rfv page. That will add it to the watchlists of everyone who has the Etymology scriptorium page watchlisted. I'm suggesting that because the closest thing to the new RFVR is the requests for verification of etymologies (tagged with {{rfv-etym}}). The only difference is that this involves an entry which is proposed for deletion rather than an item in an etymology. The people involved and the methods of resolving the request are basically identical.

The rest of the tasks for setting this up are pretty straightforward: changing the code in the modules behind the {{rfv}} template to point to RFVR instead of RFVN when the entry is in the Reconstruction namespace, changing the display for that template, and changing the documentation for the template and in whatever WT-namespace pages deal with reconstructions, verifications and discussion pages, and perhaps a maintenance category or two. Chuck Entz (talk) 19:30, 13 January 2023 (UTC)[reply]
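The routing change to the {{rfv}} modules could look roughly like the following. This is a Python sketch of the logic only (the real change would be in the Lua modules behind the template), and the function name `rfv_page` and its `default_page` parameter are hypothetical.

```python
def rfv_page(title: str, default_page: str = "WT:RFVN") -> str:
    """Pick the discussion page that a verification request should point to."""
    # Entries in the Reconstruction namespace go to the proposed new page.
    if title.startswith("Reconstruction:"):
        return "WT:RFVR"
    # Everything else keeps whatever page it is routed to today.
    return default_page

print(rfv_page("Reconstruction:Proto-Slavic/voda"))
print(rfv_page("voda"))
```

Keeping the existing routing as a fallback means the change only affects Reconstruction-namespace entries.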

Perhaps the page should be *Requests for verification/Reconstruction with a star. At this point, anything to reduce the RFV pages would be nice. I agree that reconstructions are special. Vininn126 (talk) 19:49, 13 January 2023 (UTC)[reply]
Support. 70.172.194.25 20:01, 13 January 2023 (UTC)[reply]
Support as above (and yeah, anything to trim WT:RFVN) —Al-Muqanna المقنع (talk) 20:11, 13 January 2023 (UTC)[reply]
Support. BTW Maybe it's time to implement the split of RFVN for Italic or Italic+Greek (or just Romance). This was discussed when I split CJK out of RFVN but never implemented. Thoughts? Benwing2 (talk) 07:29, 14 January 2023 (UTC)[reply]
@Benwing2 Yes, sorry for not replying there, but let's do it! AG202 (talk) 13:50, 7 February 2023 (UTC)[reply]
Support. — Fenakhay (حيطي · مساهماتي) 08:50, 14 January 2023 (UTC)[reply]
We don't "verify" reconstructions, as they are by their nature unattested. The RFV process is irrelevant to reconstructions. The page should be called Wiktionary:Requests for deletion/Reconstructions. Definitely support the idea though - I've considered raising this myself. This, that and the other (talk) 06:20, 15 January 2023 (UTC)[reply]
Support Sławobóg (talk) 13:16, 19 January 2023 (UTC)[reply]
Support. Thadh (talk) 13:32, 19 January 2023 (UTC)[reply]
Support. Gnosandes ✿ (talk) 08:56, 20 January 2023 (UTC)[reply]
Support. ‑‑ Eiríkr Útlendi │Tala við mig 21:55, 24 January 2023 (UTC)[reply]
This has clear widespread support, and RFV NE is nigh unusable. Can we move ahead with this? What is the next step we need to take? Vininn126 (talk) 18:19, 31 January 2023 (UTC)[reply]
@Chuck Entz I will even help as I can. Just need to know what to do! Vininn126 (talk) 13:47, 7 February 2023 (UTC)[reply]

Limit headers to L4

Wiktionary's current policy requires that for words with multiple etymologies, the headers be nested, with etymologies at L3, the words at L4, and additional notes at L5. By itself this is a very good idea; however, L5 headers do not seem to be supported by Wiktionary. On PC, L5 headers are not visually different from L4 headers and, even worse, on mobile L5 headers completely break and turn the header into plain text. This completely defeats the purpose of nesting by making it difficult to find usage notes, synonyms, etc. (You can test this by visiting beat on a mobile device and a computer.) As such, I think it would be best to limit headers to L4. As L5 headers are either the same as L4 or break the layout, I don't believe there's any benefit to having them.

However, I am only requesting this due to L5 headers not being supported. If Wiktionary is working on a fix (I haven't seen any announcements) then I DO NOT want this to go into effect.

Edit: as for replacements maybe we could make all L5 headers indented L4 headers? at least that way they'd be visible. Sameerhameedy (talk) 03:23, 15 January 2023 (UTC)[reply]

Completely disagree. L5 headers are everywhere (for good reason) and the fix is to figure out how to make them show up better esp. on mobile, and not to disallow them. Benwing2 (talk) 07:27, 14 January 2023 (UTC)[reply]
But haven't L5 headers been required by policy for a few years now? I don't know exactly when it became required but Wiktionary started about 2 decades ago. It doesn't seem like they have any plans to add in L5 headers anytime soon. Since it's a problem with the website and not the policy, there's really nothing we users can do to make it show up better. It seems like this is a backend problem.
Perhaps all current L5 headers can become indented L4 headers? that way they'd be visibly different and won't break on mobile. Sameerhameedy (talk) 03:23, 15 January 2023 (UTC)[reply]
Thanks for bringing this up. I've been frustrated by the limited nesting ability too, and on a private wiki I switched to using indentation instead, like with discussion threads, so I could have unlimited nesting. I doubt that solution will catch fire here, though .... I've never seen it used anywhere else at all.
Anyway, the formatting is the same on Wikipedia ... there aren't too many articles that get that deep, but I've seen it from time to time where a section title just appears as a normal word. On Wikipedia, I dont think it's so bad, since the header is always on its own line, and Wikipedia has very few content sections that are just one word or even sentence long.
I hadnt known about L4 and L5 headers appearing the same .... I've been using custom color highlighting, and so for me they're distinguished by color and not by size. I doubt that solution will catch on here either, and like my indentation idea, I've never seen a wiki that uses color to distinguish headers from content and from each other.
It seems that the L4 and L5 headers are just written in the code of <h4> and <h5>, which surprises me a bit ... in theory, if this thread doesnt come up with any good ideas, we could offer a custom CSS snippet just a few lines long that would make the headers more distinct, perhaps by magnifying all of them, and then if it's popular it could be added to the default skins. Soap 21:27, 14 January 2023 (UTC)[reply]
The bigger type on L2 and L3 is too big, wasting vertical screen space. Smaller type can be hard to read on phones. Indentation and possibly bold/italics would be good ways to differentiate the other levels of headings. Color is an accessibility problem for the color-blind, though we might be able to reduce the scale of the problem by addressing the needs of the red-green colorblind. DCDuring (talk) 21:44, 14 January 2023 (UTC)[reply]
In theory, since we don't currently use L1 in entries, we could move everything up a level (language names at L1 instead of L2, numbered etymology sections at L2 instead of L3, etc), making the headers which are currently L5 into L4 headers. In practice there is probably a better idea. Our whole system of headers involves a lot of empty whitespace, e.g. giving "Noun" its own line all to itself, with nothing else on that line, and then the headword line beneath it: it would be more compact to have "Noun: headword (plural: headwords)". This is especially noticeable on mobile. - -sche (discuss) 05:12, 16 January 2023 (UTC)[reply]
I actually don't know why L1 headers aren't used, I tested it and it seemed to work great in the preview. However that's just the preview, maybe on the actual page it would cause issues. If there's no technical reason for it, I would be in favor of adopting L1 headers.
I do disagree with the compaction thing though, I personally think it's better that everything is spaced out.-Sameerhameedy (talk) 06:24, 18 January 2023 (UTC)[reply]
I've created User:ExcarnateSojourner/beat as a copy of beat with all headings promoted one level, if anyone is interested in seeing how it looks and feels. — excarnateSojourner (talk · contrib) 02:33, 5 February 2023 (UTC)[reply]
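For anyone who wants to try the "move everything up a level" idea on other entries, here is a minimal Python sketch that promotes every wikitext heading by one level. It is only an assumption about how such a mock-up could be produced, not a description of how the page above was made; the function name `promote_headings` is made up.

```python
import re

# A heading line: 2 to 6 "=" signs, the title, then the same number of "=" signs.
HEADING = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)

def promote_headings(wikitext: str) -> str:
    """Turn L2 into L1, L3 into L2, and so on; other lines are left untouched."""
    def promote(match):
        marks = match.group(1)[1:]  # drop one "=" from each side
        return marks + match.group(2) + marks
    return HEADING.sub(promote, wikitext)

sample = "==English==\n===Etymology 1===\n====Verb====\n=====Derived terms=====\n* [[beat up]]\n"
print(promote_headings(sample))
```

With this, the current L5 headers become L4 headers, which is the level that reportedly still renders as a header on mobile.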
Content boxes make L5s worth it. As Benwing says, we should fix what it looks like, not ignore the issue. Thadh (talk) 09:37, 18 January 2023 (UTC)[reply]
  • Can someone do a mock-up of the appearance of a radical vertical-screen-space-saving reduction of whitespace by reducing font size and moving some headers (e.g., PoS per -sche) next to content lines? Even if there is a (near-?)consensus on the mock-up, we would need to make sure that there were no major adverse interactions with all the templates and modules, and widely used tools. I would expect that font-size- and vertical-whitespace-reduction are fairly easy and low risk. (Could they be implemented in Custom CSS and JavaScript?) I'd expect other changes to be more problematic. DCDuring (talk) 18:42, 18 January 2023 (UTC)[reply]

@Sameerhameedy: This is not a structural problem, this is a styling problem.

L5 headers are correctly marked up in the rendered HTML as H5. Here's an excerpt of the source from that same [[beat]] entry, specifically the [[beat#Derived terms]] section header (pretty-printed for legibility).

<h5>
    <span class="mw-headline" id="Derived_terms">Derived terms</span>
    <span class="mw-editsection">
        <a href="/w/index.php?title=beat&action=edit&section=5" title="Edit section: Derived terms" data-section="5" class="mw-ui-icon mw-ui-icon-element mw-ui-icon-wikimedia-edit-base20 edit-page mw-ui-icon-flush-right mw-ui-button mw-ui-quiet">Edit</a>
    </span>
</h5>

This is taken from the mobile version.

From what I can glean from dev tools in Firefox, it looks like the style is coming from the "Minerva" skin. Experimenting just now in my laptop browser, if I go to Special:Preferences, click Appearance, and select MinervaNeue, I get the same bad styling in my laptop -- L5 headers are still <h5> in the markup, but they are now visually indistinguishable from regular text.

In my iOS Safari, I can click the hamburger icon in the upper right (the three horizontal lines) and go to Settings, which offers me the solitary option of changing the font size -- this only affects the size of text that isn't indented.

I think the MinervaNeue default skin for mobile is broken, from an accessibility and usability standpoint.

I've tried a few things, but I cannot figure out how to force the mobile UI to use any skin other than MinervaNeue.

If anyone has advice on how to change the skin in mobile, and even better how to change it for users who aren't logged in, I would be very interested to hear it. ‑‑ Eiríkr Útlendi │Tala við mig 22:42, 24 January 2023 (UTC)[reply]

New L2: Old Tibetan (otb)[edit]

We currently don't support the Old Tibetan language, which was the Tibetic language spoken prior to the development of Classical Tibetan, but after (the unattested) Proto-Sino-Tibetan. It is the ancestor to numerous Tibetic languages, some of which diverged before the Classical Tibetan period, and is the oldest attested ancestor to modern Tibetan.

Old Tibetan is attested from around the 7th century onwards, and (notably) uses a considerably more complex orthography that reflects a much earlier period of the spoken language; one which seems to have been dated even at the time of the earliest transcriptions still extant today. The reform of that orthography, starting in the 9th century, marked the period of transition to Classical Tibetan, which was unambiguously in use from the 12th century onwards. One example of this was the abolishment of the "da drag" (a ད (da) used as a second suffix), which - for better or worse - still lingers on in an indirect form in modern Tibetan pronunciation, despite the fact it hasn't been written in over 1,000 years. For example, the Old Tibetan word དབྱརད (dbyard, summer) developed into the modern word དབྱར (dbyar, summer), but still declines as though it had the old spelling. As such, adding Old Tibetan would be of benefit both for its own sake (as a distinct period of the language), as well as for the etymology sections of numerous terms we already have.

For what it's worth, Old Tibetan has its own ISO language code, too: otb. I suggest we use it. Theknightwho (talk) 00:37, 15 January 2023 (UTC)[reply]

Support. AG202 (talk) 07:52, 16 January 2023 (UTC)[reply]
Support. Three cites, for all senses.

Durability of National Corpora[edit]

Welp, it's my turn to ask one of these questions.

How durable are we to consider something like an online National Corpus, such as NKJP (National Corpus of the Polish Language)? It includes many internet citations, mostly from UseNet, but the main issue with internet citations has always been their durability. For a while UseNet was considered durable, but this has been called into question as we have SEEN the deletion of certain threads by Google.

I would assume a resource like this would be considered highly durable, and therefore any citations from it would count toward CFI, no? Vininn126 (talk) 11:05, 15 January 2023 (UTC)[reply]

I’m inclined towards calling these durable. Theknightwho (talk) 21:23, 15 January 2023 (UTC)[reply]
If this is true, we need to consider the status of other corpora such as COCA. Vininn126 (talk) 22:04, 15 January 2023 (UTC)[reply]
A corpus whose access is funded by a national government with stable institutions deserves special consideration. COCA is not in that position. DCDuring (talk) 22:54, 15 January 2023 (UTC)[reply]

Unattested senses in extinct languages[edit]

Sometimes an extinct language - be it due to its small corpus or topical bias - may have a sense that is unattested, but can be reconstructed based on the descendants and ancestor. My question is: how should we handle these? Currently, I have implemented a less-than-ideal solution on 𐍒𐍩𐍠𐍢, but maybe a more durable solution can be found? Thadh (talk) 14:07, 15 January 2023 (UTC)[reply]

If that were categorized, I think that would be a huge improvement. Vininn126 (talk) 14:26, 15 January 2023 (UTC)[reply]
Let me be clear what I mean:
I think modifying {{label}} to include a reconstructed link and categorization would be best. Vininn126 (talk) 17:07, 15 January 2023 (UTC)[reply]
Latin focus has the label "vulgar Latin" for the sense "fire". I'd propose in descendants as a label. One could also say rare|chiefly|in descendants, in case the sense is, or close to, a hapax, but is the one main meaning in all of its descendants. Catonif (talk) 14:40, 15 January 2023 (UTC)[reply]
We need a template for it, a kind of stylized star that links to an explanation, being obvious enough since the ASCII asterisk is insufficient also for its Wikisyntax employment. What also occurs is that a sense can be assumed for a dialect but it is only attested for another, as for زُوم (zūm). Fay Freak (talk) 17:23, 15 January 2023 (UTC)[reply]
I think I’d find this label confusing. I’d prefer (reconstructed), perhaps with some kind of formatting that distinguishes the sense in some way. Theknightwho (talk) 20:50, 15 January 2023 (UTC)[reply]
I agree "reconstructed" would be better than "unattested", definitely. Vininn126 (talk) 20:53, 15 January 2023 (UTC)[reply]

The Uindiorix problem[edit]

As you know, the newest entry feature of categories malfunctions when you mix attested and reconstructed entries. I'd like to keep the "newest entry" list functional so that it's easier for editors to watch over newly created entries in a given proto-language, and see how/if the creation of entries in the language is active.

The problem with the malfunction is obvious when dealing with the Uindiorix inscription. Editors can't decide between assigning it to Proto-Brythonic or elsewhere; anywhere it's assigned, it causes the "newest entry" feature to break.

AFAIK the only workarounds I can think of to get the "newest entry" feature to work again are:

  • Give the language of the Uindiorix inscription a separate language code from both Proto-Brythonic and Proto-Celtic. It can't be Proto-Brythonic (nouns have case endings in the inscription; which were already all lost at the late stage we reconstruct it) and it can't be Proto-Celtic (Proto-Celtic is dated much earlier than this Roman-era inscription), so it's probably warranted to treat it separately from both.
  • Move the entry into Reconstruction space, restore reconstruction-only status, and write notes stating that the term is attested without creating an attested-form entry.

Ceso femmuin mbolgaig mbung, mellohi! (投稿) 03:37, 16 January 2023 (UTC)[reply]

I can't speak to the category issue but it would seem very strange to me to give it a special language code. The Proto-Brythonic reconstructions may represent a late stage of the language, but if a word is attested from a different stage then it makes sense for it still to be noted as Proto-Brythonic, with an appropriate note about the period it is attested from, rather than inventing a new ad hoc language category for it. —Al-Muqanna المقنع (talk) 03:54, 16 January 2023 (UTC)[reply]
Would solution 2 (move to Reconstruction namespace but state that it's actually attested and it's only in Reconstruction: for software reasons (possibly with a banner template?)) work for you? — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 04:08, 16 January 2023 (UTC)[reply]

As far as I can tell, the dynamic page list can only pull from one namespace if any namespace is specified. The code in the category page is

<dynamicpagelist>category=Proto-Celtic lemmas
namespace=
count=10
mode=ordered
ordermethod=categoryadd
order=descending</dynamicpagelist>

You can get the wikitext generated by {{auto cat}} by plugging the wikitext of the category page into "Input wikitext:" in Special:ExpandTemplates and setting the title to Category:Proto-Celtic lemmas. It shows only mainspace entries because of namespace= (that is, namespace is an empty string, mainspace). namespace=,Reconstruction doesn't work. If namespace= is removed, both main and Reconstruction namespaces will be shown, but so will User and Wiktionary and Appendix namespaces, if someone is testing headword-line templates elsewhere and they end up categorizing.

Lines starting with line 284 in Module:category tree control the inclusion of namespace=. The dynamic page lists on the category page show pages in the Reconstruction namespace for solely reconstructed languages (namespace=Reconstruction), Appendix for appendix languages (namespace=Appendix), main for others (namespace=).

In theory we want the list (whatever use it is) to show main and Reconstruction namespaces always. If a reconstructed-only language has mainspace entries (which it shouldn't), we'd want to see them. A non-reconstructed and non-appendix language can have reconstructed entries and those should be shown. An appendix-only language should show Appendix pages, but it doesn't hurt to show main and Reconstruction pages because there shouldn't be any in any case. But there's no way to do that without allowing all namespaces, which means non-entries will sometimes show in there.

No good general solution, but in this case there could be some kind of exception for Proto-Celtic that makes the dynamic page lists show Reconstructed entries only. I hardcoded it in Module:category tree (diff), since it's such an exception that I don't know a better place to put it. — Eru·tuon 01:51, 17 January 2023 (UTC)[reply]

I extended the exception to other languages whose code ends in -pro except for Proto-Norse (gmq-pro) because Proto-Brythonic (cel-bry-pro) was in the same boat as Proto-Celtic and most others will be as well if they get any non-reconstructed entries. Suggestion of Chuck Entz on my talk page. It would be more scientific to do some kind of census, but unnecessary at this point. — Eru·tuon 02:37, 17 January 2023 (UTC)[reply]
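For anyone trying to follow the namespace logic described above, here is a rough sketch in Python (not the actual Lua in Module:category tree). The function names and the language-type strings ("reconstructed", "appendix", "regular") are made up for illustration; only the rules themselves come from the comments above: Reconstruction for reconstructed-only languages, Appendix for appendix languages, Reconstruction for codes ending in -pro other than gmq-pro, and mainspace otherwise.

```python
def dpl_namespace(lang_code: str, lang_type: str) -> str:
    """Pick the value of the DPL `namespace=` parameter for a language.

    `lang_type` is a hypothetical label: "reconstructed", "appendix",
    or anything else for an ordinary attested language.
    """
    if lang_type == "reconstructed":
        return "Reconstruction"
    if lang_type == "appendix":
        return "Appendix"
    # Exception described above: proto-languages that allow attested
    # entries still list Reconstruction pages, except Proto-Norse.
    if lang_code.endswith("-pro") and lang_code != "gmq-pro":
        return "Reconstruction"
    # An empty string means mainspace.
    return ""


def newest_entries_dpl(category: str, lang_code: str, lang_type: str) -> str:
    """Build the dynamic page list wikitext shown on a category page."""
    lines = [
        f"<dynamicpagelist>category={category}",
        f"namespace={dpl_namespace(lang_code, lang_type)}",
        "count=10",
        "mode=ordered",
        "ordermethod=categoryadd",
        "order=descending</dynamicpagelist>",
    ]
    return "\n".join(lines)
```

Calling `newest_entries_dpl("Proto-Celtic lemmas", "cel-pro", "regular")` would give the same block as quoted earlier, but with `namespace=Reconstruction` in place of the empty `namespace=`. Since the list can only take one namespace, there is still no branch here that returns both main and Reconstruction.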

Demonstration of such a solution 2 template[edit]

This entry contains directly attested terms in a reconstructed language. Even though the term is attested, the term is in the Reconstruction namespace for technical reasons.

@Skiulinamo Your move (which switched Proto-Celtic to allowing attested entries) is causing technical problems with categorization amenities, since they cease to function properly when attested and reconstructed entries are mixed. I would like to move Uindiorix into Reconstruction but mark the fact that they're attested with a banner template as a workaround. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 04:21, 16 January 2023 (UTC)[reply]

Uindiorix is obviously attested and shouldn't be moved to a reconstruction. What "categorization amenities" are you referring to? --Skiulinamo (talk) 04:35, 16 January 2023 (UTC)[reply]
@Skiulinamo: The "Newest pages ordered by last category link update" function. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 04:36, 16 January 2023 (UTC)[reply]
Link. --Skiulinamo (talk) 04:37, 16 January 2023 (UTC)[reply]
@Skiulinamo: This box on the top-right of the category page. (I circled it in red.) It cannot display newly-added reconstructed entries if an attested entry is also part of the category. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 04:39, 16 January 2023 (UTC)[reply]
@Erutuon --Skiulinamo (talk) 04:44, 16 January 2023 (UTC)[reply]
Until it's fixed if you just want to track new entries you can do the same thing by filtering the related changes—this also shows removals from the category, which the box doesn't. It looks like there are two other attested entries at Proto-Brythonic in any case (which is presumably where Uindiorix should be, not Proto-Celtic, given that the dating is nowhere near the Proto-Celtic era), Artognou and Ἀργεντοκόξος. —Al-Muqanna المقنع (talk) 05:02, 16 January 2023 (UTC)[reply]
At this point I'm waiting for Erutuon to comment, which might take a couple of days (since he might be sleeping for a while, who knows). — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 05:05, 16 January 2023 (UTC)[reply]
Alright, Erutuon switched the newest pages list box up so that only reconstructed entries are displayed, partially fixing my concerns. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 02:19, 17 January 2023 (UTC)[reply]
We shouldn't be putting terms in the wrong namespace "for technical reasons". Theknightwho (talk) 04:57, 16 January 2023 (UTC)[reply]

Where is the best place for me to lodge a complaint about Wiktionary's completely inaccurate and idiosyncratic use of the term "Proto-Brythonic" for what should properly be termed (Common) Neo-Brittonic (or, as some more Welsh-centric scholars prefer, [Common] Neo-Brythonic)? Proto-Brittonic/Brythonic should (obviously!) be the ancestor of Brittonic, the ancient British Celtic language of Britain attested from the 4th century BC through the mid-6th century AD, after which it gave way to (Common) Neo-Brittonic (the immediate ancestor of Welsh, Cornish, Breton, and Cumbric). Proto-Brittonic would be a daughter of Insular Celtic, itself a daughter of Proto-Celtic, and would roughly date to the early Iron Age. Any attested or reconstructed ancient British Celtic name/word dating from circa the 4th century BC through the mid-6th century AD should be labeled here as Brittonic and anything dating to the mid-6th century through the 8th century should be labeled Neo-Brittonic (or one of its variants, such as John Koch's Common Archaic Neo-Brittonic). M.Aurelius.Viator (talk) 16:04, 18 January 2023 (UTC)[reply]

I know Schrijver uses "Late Proto-British" for the stage Wiktionary deals with. He also uses "Proto-British" for the stage of the proto-language more in line with the Uindiorix description. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 18:30, 18 January 2023 (UTC)[reply]
Yes, but few other scholars use this term and Schrijver even admits that his use of the term is unusual and "requires elucidation"; Common Neo-Brittonic/-Brythonic (Archaic Common Neo-Brittonic for John Koch) is the preferred term among most Celticists for post-apocope, post-syncope, pre-dialectal British Celtic of c. mid-6th century AD through c. 8th century AD. Also, for anyone who hasn't read Schrijver, you can see his explanation of the term in his book Studies in British Celtic Historical Phonology [1995], p. 12. M.Aurelius.Viator (talk) 18:53, 18 January 2023 (UTC)[reply]
Specifically Schrijver says there that using it to refer to an earlier stage needs elucidation, i.e. what you are claiming Wiktionary should adopt exclusively, and that proto-languages generally refer to "the final stage of common development of a number of languages", i.e. how Wiktionary in general uses Proto-Brythonic at the moment. —Al-Muqanna المقنع (talk) 18:59, 18 January 2023 (UTC)[reply]
I don't know how many times I need to repeat it (have already mentioned it several times on the other page where this was being discussed) but no one other than Schrijver calls post-apocope/post-syncope British Celtic of the early medieval period "Proto-British" and certainly no one (outside of Wiktionary!) calls it "Proto-Brythonic"; the latter is inaccurate, confusing for non-specialists in Celtic linguistics, and out of line with the academic consensus today. It needs to be changed. M.Aurelius.Viator (talk) 19:51, 18 January 2023 (UTC)[reply]
My point was to point out the irony in that Schrijver's late-stage reconstruction is equivalent to "Proto-Brythonic" on Wiktionary even though he himself did not actually call it that... — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 20:24, 18 January 2023 (UTC)[reply]
@M.Aurelius.Viator: Schrijver says, clearly, and in line with standard linguistics, that a proto-language is in general the last common ancestor of a group of languages, i.e., in this case, the Brittonic/Brythonic languages. This is what is being reconstructed on Wiktionary. You've already been told by @Chuck Entz elsewhere that continual appeals to your own authority are not going to impress anybody, so do you have any specific reference or point to make beyond asserting it over and over? —Al-Muqanna المقنع (talk) 20:32, 18 January 2023 (UTC)[reply]
But is there a scholarly consensus calling this specific stage (with all grammatical endings destroyed by apocope and umlaut in full swing) "Proto-Brythonic"? I do not believe it exists, and I provided a major source that specifically avoided the word "Proto-Brythonic/Proto-Brittonic" for the post-apocope stage. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 20:54, 18 January 2023 (UTC)[reply]
No one that I'm aware of - and I am very well read on this subject - refers to post-apocope, post-syncope British Celtic - i.e., the language of the mid-6th century AD through the 8th century AD - as "Proto-Brythonic". Most call this stage of the language Neo-Brittonic/-Brythonic. After this period we are in the Old Welsh/Cornish/Breton/Cumbric periods. M.Aurelius.Viator (talk) 22:11, 18 January 2023 (UTC)[reply]
But he is right, I don’t understand why everyone is so dead against using proper terminology? Doesn’t make sense. Why would you not use the same common terminology as people who study the field and people who have a profession in the field. Seems like an ego thing, just don’t want to admit all the entries that have been submitted this far are incorrect.
Also, what do I do if I want to make an entry for Brittonic but not the stage of Neo-Brittonic? What if I want to make entries for forms of Early Brittonic, Late Brittonic and Neo-Brittonic? I can’t because you are lumping all of these together under one term “Proto-Brythonic”. I can’t make sense of any of this, why not allow us to make entries based on what is best for people who study or want to study these languages. I want to create entries on the Brittonic verbs ‘to know’ for instance, let’s take the 1.sing.pres. first I want to create a Proto-Celtic. *wintnua, then Early Brittonic. *winnua, then Late Brittonic. *wunnoa, then Neo-Brittonic. *gwnn. I can’t do this because 2 stages of Brittonic and Neo-Brittonic are all lumped together? Why? Senseless, absolutely senseless… Silurhys (talk) 21:31, 18 January 2023 (UTC)[reply]
Exactly - it's completely bizarre that there is so much pushback against a very common sense proposal - at least for those of who have actually studied Celtic historical linguistics! M.Aurelius.Viator (talk) 22:07, 18 January 2023 (UTC)[reply]
@Al-Muqanna المقنع I can provide ample sources to back up my statements here - can you provide even a single one to back up yours? Do you have any background whatsoever in Celtic historical linguistics? I have almost 40 years experience, have run/moderated several academic mailing lists on the subject, and have been consulted by and cited as a reliable source by numerous well known scholars in the field (Delamarre, Stifter, et al.); what have you done in this field? M.Aurelius.Viator (talk) 22:16, 18 January 2023 (UTC)[reply]
@M.Aurelius.Viator, Al-Muqanna I don't know much about the timeline of Brythonic linguistic stages but we definitely need to use proper terminology, and from what I know about Celtic, I would be highly skeptical of treating something as late as 8th century AD as Proto-Brythonic. There is a lot of text written up above but it seems only User:Mellohi! is objecting to using proper terminology, is that right? If so I think we can ignore their objections given the number of people in this discussion. Can you enumerate what exact changes need to be made? I may be able to help with the technical aspects. Benwing2 (talk) 01:13, 21 January 2023 (UTC)[reply]
@Benwing: I oppose using "Proto-Brythonic" and in fact agree with you and Viator it's "Proto-Brythonic" that's the incorrect terminology. The current "Proto-Brythonic" should be renamed to something else. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 01:23, 21 January 2023 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── @Benwing2, Mellohi!: There are two distinct discussions, I think, and I'm not sure which one Benwing was referring to when saying you don't support standard terminology. First, I 100% agree that Uindiorix should not be listed at Proto-Celtic, which is about a millennium off the mark. What it should be is a bit more complicated. The reason not to split these stages into separate languages is pretty simple from a comparative linguistics standpoint; there are no additional languages from which to reconstruct them, and the "Proto-" standard is also uniform with our use e.g. of "Proto-Slavic" rather than "Common Slavic" (also used in literature) for the common ancestor of the Slavic languages and the like. It seems to be an accepted guideline that reconstruction pages are not created where there's only a single descendant, which would obviously be the case for various stages of a reconstructed language. Many languages (e.g. obviously Latin) have period-specific labels, which could be useful in this case. I think generally speaking the reconstruction entry should be at the last stage, following standard comparative practice, and extra info listed there; the few terms that are actually attested like Uindiorix could be handled with labels.

The renaming of Proto-Brythonic is a separate issue (and a derail of the original discussion IMO), which I'll deal with below.

@M.Aurelius.Viator: You've already been provided a "single source" in the RFV discussion, but to write two recent sources out, 1. "Around 800 AD, Late Proto-Brittonic branched into West-Brittonic, which then became Old Welsh, and Southwest-Brittonic, which further sub-divided into Breton and Cornish from about 1100 onwards." (Irslinger 2014, "The Gender of Abstract Noun Suffixes in the Brittonic Languages", p. 65); 2. Anders Richard Jørgensen 2022, "Celtic", in The Indo-European Language Family: A Phylogenetic Perspective, has "Proto-Brittonic" and "PBrit." for the phase of the language before Old Welsh etc., thus "a Proto-Brittonic suffix *-ɪnn" (p. 145, i.e. our Reconstruction:Proto-Brythonic/-ɨnn). Now, you've repeatedly been asked to provide sources yourself specifically treating Proto-Brittonic/Brythonic as an early stage in the language, and, from what I can tell, have still not done so despite abundant assertions that you can.

I don't personally object at all in principle to changing the name to something else with greater currency, e.g. Common Brittonic, but I do object very strongly to doing so simply because a new editor is stamping their foot and demanding it on their own authority, which is being lent to the obviously false claim that no Celticist has ever used the term Proto-Brythonic/Brittonic for this stage of the language; it sets a pretty terrible precedent. There should be a serious discussion of the merits, and what the sources actually say, rather than arguing the toss over unverified credentials. This is to me a "governance" issue independent of the matter at hand. (Maybe @Skiulinamo has a perspective on this as well.) —Al-Muqanna المقنع (talk) 02:26, 21 January 2023 (UTC)[reply]

@Al-Muqanna, Mellohi! Blah. My apologies for getting confused about what was going on; upon rereading I realize I totally misread some of what was written. Let me know if you need any technical help; otherwise I'll refrain from further comments. Benwing2 (talk) 04:08, 21 January 2023 (UTC)[reply]
What's annoying about all this is that instead of asking a good question, like "should we rethink how we reconstruct Proto-Brythonic?", it's a pedantic and silly one of "why are you idiots calling it this? what should we call it?". If there was a vote to reconstruct PB in an earlier form, with case endings and internal thematic vowels, I would support that, same as I did 6 years ago. -- Skiulinamo (talk) 07:03, 21 January 2023 (UTC)[reply]
Let’s take a vote then to split Proto-Brythonic into two stages; Brittonic and Neo-Brittonic. It would certainly make entries easier, more reasonable and closer in line with modern academia. Silurhys (talk) 12:11, 21 January 2023 (UTC)[reply]
@Silurhys: I'm not sure if you're intentionally being obtuse, but I promise you, a vote to split PB would fail horribly and be a waste of time. -- Skiulinamo (talk) 23:09, 21 January 2023 (UTC)[reply]
How am I being obtuse? All I’m doing is trying to improve the situation, you have to put attested Brittonic names under Proto-Celtic because the system that’s in place is a mess. I can’t understand the thinking here, why would it fail? Why don’t people want to do what’s best. I want to make Brittonic entries on words that may not have existed in Proto-Celtic. What am I to do in that situation? A reasonable answer would be nice but nobody has one. Sad that it’s like this, people can’t rely on Wiktionary to have accurate information. Silurhys (talk) 23:38, 21 January 2023 (UTC)[reply]
@Silurhys Your proposal is exactly what is needed and it's bizarre that people are arguing against it! M.Aurelius.Viator (talk) 16:59, 22 January 2023 (UTC)[reply]
> the obviously false claim that no Celticist has ever used the term Proto-Brythonic/Brittonic for this stage of the language; it
You don't have a clue as to what you're talking about. No one uses this term for Common Neo-Brittonic. The closest you have come is Schrijver's idiosyncratic Late Proto-British, which is not the same thing. Brittonic (aka Brythonic) is the ancient language of Britain, attested from the 4th century BC through the mid-6th century AD, after which it became Common Neo-Brittonic. Proto-Brittonic would be the ancestor of Brittonic, not its descendant, LOL! Yet this is the ridiculous way that it is being used on Wiktionary. M.Aurelius.Viator (talk) 17:04, 22 January 2023 (UTC)[reply]
John Koch, Celtic Culture, ABC-CLIO, 2006, p. 305, "In contemporary Celtic studies, ‘Brittonic’ and ‘Brythonic’ are interchangeable terms with precisely the same meaning. ‘Brythonic’ has been adopted in this Encyclopedia, since it has the advantage of resembling the Welsh terms from which it is derived, namely Brython ‘Britons’ and Brythoneg ‘Brythonic’, thus reminding the generalist reader of the languages to which it refers, and avoiding potential confusion owing to the similarity of ‘Brittonic’ and the name of the modern nation state of Britain (i.e., the UK) and of the British Empire, whose chief language is the non-Brythonic (and non-Celtic) English. In Jackson’s highly regarded scheme as set out in Language and History in Early Britain, ‘Brittonic’ is used in the same sense as ‘Brythonic’ here. ...(T)his Encyclopedia follows Jackson’s usage in Language and History in Early Britain, using ‘British’ only for the ancient form of Brythonic as found in documentary evidence from the Iron Age and the Roman period in Britain. In linguistic terms, ‘British’ therefore refers to Brythonic as long as it retained its Proto-Celtic (and ultimately Indo-European) syllable structure; for example, the Brythonic name written in ancient sources as British Cunobelinos is written in the early Middle Ages as an archaic Neo-Brythonic (or early Old Welsh) Cunbelin, revealing the loss of two unaccented Proto-Celtic syllables. Using ‘British’ to refer only to the ancient form of Brythonic (the ‘Cunobelinos stage’) has the advantage of using ‘British’ only for a period at which this language was actually the most widely spoken language of Britain; this avoids the confusions discussed above with reference to later periods when English had become the predominant language of Britain and then the British Empire, for which periods ‘British’ might be misunderstood as shorthand for ‘British English’ in contrast to ‘American’ or ‘Indian English’, &c."
M.Aurelius.Viator (talk) 17:27, 22 January 2023 (UTC)[reply]
Review of Patrick Sims-Williams, "Dating the Transition to Neo-Brittonic: Phonology and History, 400-600" in: "Britain 400-600: Language and History". Ed. by Alfred Bammesberger and Alfred Wollmann (Anglistische Forschungen 205) by Poppe, Erich. Amsterdamer Beiträge zur Älteren Germanistik; Amsterdam Vol. 34, (Jan 1, 1991): 194: "In his closely argued contribution Sims-Williams sets up criteria for neo-Brittonic (loss of final syllables, syncope, Second Spirantization, new quantity system) and then examines the relevant developments in detail." M.Aurelius.Viator (talk) 17:33, 22 January 2023 (UTC)[reply]
[Despite its age, still the standard texts on Brittonic and early Neo-Brittonic]: Kenneth Jackson, Language and History in Early Britain, Edinburgh, 1953, pp. 4-5: "In this book British (Brit.)1 [n. 1: Some writers employ Old British, and in German Altbritisch, but this is meaningless, because in linguistic usage Old implies Middle and Modern, and there is no such thing as Middle and Modern British. The adjective is unnecessary and is therefore omitted here, in accordance with the usual practice.] is used as a general term for the Brittonic language from the time of the oldest Greek information about it (derived from Pytheas of Marseilles, c. 325 B.C.) down to the sub-Roman period in the fifth century and on into the sixth. Where it is necessary to be more precise, a distinction is made between Early British, during the Roman occupation and as far as the coming of the Saxons in the middle of the fifth century, and Late British (Late Brit.), from that time until and including the earlier half of the sixth century. As we shall see, the earlier period of British coincides with the oldest dateable Brittonic sound-changes, beginning in the first century B.C. and the first to second century A.D., and consisting mainly of certain modifications of vowels and diphthongs; and Late British covers a number of important transitional phonetic developments, from lenition through final vowel affection down to the loss of final and unstressed internal syllables. Since some of the special features of the separate modern languages reach right back into the British period, it is necessary to postulate a West British dialect, that of Wales and the Midlands (and Late West British, Late W.Brit.), and a South-West British (and Late South-West British, Late SW.Brit.), that spoken in the peninsula of Devon and Cornwall. Romano-British (Rom.-Brit.) is confined to forms of the language reported by Roman writers in Latinised spelling; and the term is stretched to include those given by Greek authors, chiefly derived from Latin sources, in Greek spelling, e.g. by Ptolemy. British Latin on the other hand is something quite different, the variety of Vulgar Latin spoken in Britain during and for some time after the Roman occupation."
"With the drastic changes which occurred during the Late British period we reach an entirely new stage; the ancient language, with its final syllables, its case terminations, and the rest, gave place to what is really a mediaeval one, and to the rise of what will be called here the Neo-Brittonic tongues, Welsh, Cornish, and Breton. From the middle of the sixth century we can begin to speak of these as separating languages, and from the end of the century as separate. In this period, and down to the time of their earliest written records (other than inscriptions and a few names in sixth- to eighth-century Latin sources), the new terms Primitive Welsh (Pr.W.), Primitive Cornish (Pr.C.), and Primitive Breton (Pr.B.) are used here. With these written records we enter upon the stage of Old Welsh (OW.), beginning with the later eighth century, Old Cornish (OC.), from the late ninth century, and Old Breton (OB.), from the early part or middle of the ninth century." M.Aurelius.Viator (talk) 17:50, 22 January 2023 (UTC)[reply]
Richard Coates, "Rethinking Romano-British *Corinium", The Antiquaries Journal, Cambridge, 2013, "Before we embark on the discussion, we need to attend to some terminology for the languages of Britain in antiquity and later. The main native language of Britain in the Roman period was British Celtic, recorded almost exclusively in a latinized form called Romano-British (RB), for which the main evidence is personal and place-names. In the period after the collapse of Roman rule, this language evolved through a stage called in this article Brittonic (roughly 450-550 C. E.), for which there is little direct evidence, and from this into the distinct modern languages Welsh, Cornish, and Breton (collectively called the neo-Brittonic languages; 550 C. E. onwards). The earliest stages of these, before written texts appear (550-750/800 C. E.), are called e.g. Proto-Welsh and Proto-Cornish, and the earliest texts are described as being in e.g. Old Welsh and Old Cornish." M.Aurelius.Viator (talk) 18:03, 22 January 2023 (UTC)[reply]
Martin J Ball and James Fife, The Celtic Languages, Routledge, 1993 (2005 repr.), p. 75-77: "P-Celtic and Q Celtic: sources and transition in the neo-Celtic languages of the British Isles. P-Celtic, Brythonic, Old Brythonic or Common Brythonic was spoken in Britain south of the Rivers Forth and Clyde down to the middle of the sixth century AD. Before the rise of the neo-Brythonic languages, from the middle to the end of the century, we observe the formation of dialects..."
"In dealing with the emergence of the neo-Brythonic languages from Common Brythonic, three facts must be taken into consideration. First, the breakdown of Late British (450-550) which, according to Jackson (1973: 121), probably started in the Lowland Zone and was 'the consequence of violent social disturbances and the disappearance of a linguistically conservative class' (ibid.). Among the phonological changes which mark the end of Late British, four transformations are particularly important: (a) the dropping of final syllables (as well as of interior ones) caused by the penultimate stress access and resulting in the change of inflection type (from synthetic to analytic)... (b) lenition of consonants in intervocalic position... (c) vowel affection, e.g. umlaut... (d) quantity collapse with the lengthening of short stressed vowels before short consonants..." M.Aurelius.Viator (talk) 18:23, 22 January 2023 (UTC)[reply]

Mass-adding translations

There seems to be a class of editors who have a penchant for mass-adding translations in languages they don't know without taking proper care. Usually they just copy Wikipedia titles and leave behind a mess of unattested terms and missing qualifiers/genders/noun classes/... Recent cases include StuckInLagToad, VGPaleontologist, Numberguy6, Hippietrail and Sgconlaw (the first two have left the project after being reprimanded for this behavior, the last two have stopped doing it of their own accord AFAICT) along with a number of IP editors. From my experience, translation boxes contain the lowest quality content on Wiktionary and I wouldn't trust them except in the case of a few select languages. It seems that bad content goes unnoticed the longest in translation boxes. It doesn't have to be this way and that's why I'm suggesting that we crack down on low quality translation content. Cracking down would mean to me that recent changes patrollers revert all such mass-additions and that previous contributions of this sort are converted to {{t-check}} by a bot. Opinions?

(As a side note, I've also done this once but I have learned from it since then.) — Fytcha T | L | C 12:36, 16 January 2023 (UTC)[reply]
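As a rough illustration of the {{t-check}} conversion proposed above, here is a minimal shell sketch. It assumes translations are written as {{t|code|...}} or {{t+|code|...}} and that {{t-check}} and {{t+check}} are the intended targets. The function name to_tcheck is made up for this example, and a real bot would still have to work out which translations came from a mass-addition in the first place.

```shell
# Hypothetical helper: rewrite {{t|LANG|...}} and {{t+|LANG|...}} read from
# stdin to {{t-check|LANG|...}} and {{t+check|LANG|...}} for one language code.
# Assumption: LANG contains no regex metacharacters (e.g. gsw, prg, yo).
to_tcheck() {
  lang=$1
  sed -E \
    -e "s/\{\{t\|${lang}\|/{{t-check|${lang}|/g" \
    -e "s/\{\{t\+\|${lang}\|/{{t+check|${lang}|/g"
}

# Example: only the gsw line is changed; WORD is a placeholder, not a real term.
printf '%s\n' '* Alemannic German: {{t|gsw|WORD}}' '* German: {{t+|de|Wort|n}}' | to_tcheck gsw
```

Restricting the rewrite to one language code at a time keeps well-maintained languages untouched, which seems closer to the intent than converting every translation on a page.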

I also propose including Rajkiandris to the list, who added tons of incorrect translations in his time here. I agree that we need to do something about the translations, even though I don't like editing them myself... Thadh (talk) 12:53, 16 January 2023 (UTC)[reply]
Toad and VGP were the same person. He more or less confessed to it in his last message here. I think he really wanted to help, and like you, I hope he decides to come back, but at a much slower pace. Translation is difficult, and we can't rely on our fellow project. Wikipedia often will link to the most appropriate article, even if it's not an exact translation. So even knowing the basics of the grammar of a certain language often isn't enough. As I learned just recently, bubble bath and hot tub are two entirely different things in English, but in many languages, such as Dutch, they are covered by one word, in this case bubbelbad, but with the second meaning more common (schuimbad is the word that specifically means a bath with bubbles). Wikipedia cannot link one article to two, so at some point we decided to go with the literal translation of bubble bath, and I had to ask a native speaker to figure out why things didn't seem to line up. Soap 14:24, 16 January 2023 (UTC)[reply]
I'm a native speaker of Hungarian and will still generally double-check every translation I add, by checking Hungarian dictionaries and/or by searching to make sure it's actually attested. On the other hand there are a small number of translations I've added for languages I'm not fluent in but can understand source material for, where they're straightforward and I've made sure they're properly attested (e.g. the Spanish indulgencia plenaria at plenary indulgence). So IMO it's not just a fluency issue, it's about taking proper care and not just porting over stuff from Wikipedia and the like. —Al-Muqanna المقنع (talk) 14:48, 16 January 2023 (UTC)[reply]
Well it's not necessarily a lack of fluency, but a lack of even being able to understand the material in the first place. Vininn126 (talk) 15:20, 16 January 2023 (UTC)[reply]
Translations require knowledge of both languages and often of the subject matter. There are also a lot of cases where there's only partial overlap semantically, grammatically/morphologically, sociolinguistically, etc. An English word may correspond to a grammatical case or an SOP phrase in another language, and vice versa. The English word or the translation may be more general, or may be everyday speech vs. absolutely taboo. The default sense in English may require a qualifier in the other language, and vice versa. There may be regional variation in one or both languages. There are a million ways to go wrong.
On top of that, there are lots of easily available bad sources: Google (or Bing) Translate, Wikipedias, general translation websites, etc.
There are specific red flags I look for: isolated and less studied languages such as Basque, Albanian, and Khmer. Dead languages with limited corpora that people are trying to revive, such as Gothic and Old Prussian (even Old English, to some extent). Regions such as Africa, the Americas and parts of Asia that are only known by people from the region and specialists (and even those are too diverse for mastery of an entire region). There are people who are qualified to add translations to any one of those, but combinations of more than a couple are suspect. Chuck Entz (talk) 19:30, 16 January 2023 (UTC)[reply]
To raise a connected issue: at least one editor in that list has added thousands of bare lemmas on the basis of translation boxes alone. In some cases these have required correcting or simply deleting.
Also, I propose a mass delete of every translation added by Rajkiandris. There are thousands and thousands of them, and they're often of extremely low quality due to catastrophic competency issues. One example I remember was him misunderstanding the difference between vessel (boat) and vessel (container), leading to a slew of incorrect entries. Theknightwho (talk) 02:42, 17 January 2023 (UTC)[reply]
This is a really good idea. Mass-additions cannot be trusted and should be reverted on sight. Megathonic (talk) 07:03, 17 January 2023 (UTC)[reply]
I agree. Part of my TODOs is going through the mess that was Yorùbá translations before our recent group effort started. There might also need to be a way to cite where translations are found, as currently there's no way to do so. Also, seeing the case of User:Rajkiandris, as a project I think we need to be much more proactive about blocks unfortunately. Personally, I used to lean more on giving people more chances, as with User:Spacestationtrustfund, but clearly doing this only causes more harm to the project and creates more issues that have to be corrected in the end, and most active editors don't have the time to parse through all their good and bad edits, leading the bad ones to remain for much longer than they needed to. AG202 (talk) 08:51, 17 January 2023 (UTC)[reply]
@Fytcha, Soap, Rajkiandris, AG202, Thadh Completely agree about both the need to be more proactive about blocks of repeat-offending users and the wisdom of mass-removing bad translations. Some users do tons of harm adding stuff that at first looks semi-reasonable but is revealed to be garbage as soon as you scratch the surface. This is much more harmful than obvious vandalism, which usually gets reverted quickly. If we agree, for example, to remove all of Rajkiandris's translations, I may be able to write a bot script to do this; same with User:Fytcha's proposal, although the specifics of how to do that need to be ironed out. Benwing2 (talk) 01:05, 21 January 2023 (UTC)[reply]
As said above, I support removing all translations added by Rajkiandris. It's better to not give any translation than risk giving a wrong translation, and nobody has the time to go through the thousands upon thousands of translations this user has added and fix them.
On a different note, it would also be useful for me with Ingrian (and maybe for others, like AG202, with the languages they edit) if a list could be generated for all pages featuring translations in the language. Of course this isn't feasible for bigger languages, but for languages under a few thousand entries it's possible, surely? Thadh (talk) 01:17, 21 January 2023 (UTC)[reply]
In case anybody else needs this, this can be achieved with a simple CirrusSearch query: [2]Fytcha T | L | C 12:40, 21 January 2023 (UTC)[reply]
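The linked query is not reproduced here, so the following is only one possible shape for such a search, not necessarily the one behind the link. It is a minimal sketch that assumes translations use {{t|code|...}} or {{t+|code|...}} and relies on CirrusSearch's insource:/regex/ syntax; the function name translation_query is hypothetical, and other translation templates are not covered.

```shell
# Hypothetical helper: print a CirrusSearch regex query that matches pages
# whose wikitext contains {{t|LANG|  or  {{t+|LANG|  for the given code.
# Braces, pipes and the plus sign are escaped because they are regex syntax.
translation_query() {
  printf 'insource:/\\{\\{t\\+?\\|%s\\|/\n' "$1"
}

# Example for Ingrian (code izh); paste the printed line into the search box.
translation_query izh
```

Regex searches are expensive, so on a large wiki it may be necessary to narrow the query further before it finishes.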
I've had a version that looks only for language codes bookmarked since you first shared it with us. While looking through the results for Old Prussian (prg), which is a language that died centuries ago and left a very limited corpus, I found a dubious translation for passport, which led me to this. Aside from 1 edit by Wingerbot to the Derived terms section, everything is the same IP: a solid block of ~ 107 translations added over 4 days from every inhabited continent except possibly Australia. How many people know all of Ojibwe, Yoruba, Cornish, Scots Gaelic, Samoan, Dzongkha, Tarifit, Venetian, Chuvash and Cantonese well enough to add translations? Chuck Entz (talk) 06:10, 23 January 2023 (UTC)[reply]
I'm wondering if there should be a different way to handle translations, because they're completely impossible to trust at the moment. Theknightwho (talk) 19:27, 23 January 2023 (UTC)[reply]
It may just be the case that translations are inherently hopeless for a Wiki-like passive review system (by that I mean the system where the hope is that somebody knowledgeable will eventually come across nonsense and delete it). We have more English entries than the English Wikipedia but only a fraction of their active users. To make matters worse, any given user can only verify so many languages on a page. To make matters even worse, unlike Wikipedia, most of our stuff is not sourced. These three factors make the Wikipedia-style passive review paradigm incredibly ineffective for Wiktionary translations.
Cracking down on edit patterns that are known to lead to many errors is a good first step but may just be a drop in the bucket at this point. Bflanzekùnd has been up for 9 years, Sauerstoff for 14 (this doesn't even show up in the headline for me because it is incorrectly formatted, making it essentially undetectable), a Turkish swear word was up as a supposed Quechua translation for over 15 years. Perhaps we need a role along the lines of "translations approved" but then we'd have laughably few translations. I see no way around either sacrificing quality or quantity and we're currently sacrificing quality. — Fytcha T | L | C 23:41, 23 January 2023 (UTC)[reply]
(Not to say 'I told you so', but I proposed making sourcing obligatory a while back and everybody said no) Thadh (talk) 00:19, 24 January 2023 (UTC)[reply]
Your proposal only pertained to entries, no? In that case, I fail to see how it solves the issue of bad translations considering that they almost never point to any entries. Or did you envision that translations must be backed up by sources, too? — Fytcha T | L | C 15:01, 24 January 2023 (UTC)[reply]
We are not hopeless, we do our best, also by attracting and retaining valuable editors by rigidly showing, verbalizing good editing standards. There is not too much quality sacrificed, have you searched together your bucket of junk? It’s not like we are standing in the muck. Fay Freak (talk) 00:29, 24 January 2023 (UTC)[reply]
The severity of the problem strongly depends on the language, of course. German translations are mostly fine, Finnish (I'd imagine) must be pretty good, but when it comes to smaller languages or languages with smaller editor communities such as gsw, I really don't have to search much to find obvious nonsense. I don't even have to search much to find nonsense that stood for over a decade. I can't help but think that this is the reality for most smaller languages (which make up most languages). My point was also rather that the usual Wiki "self-fixing mechanisms" aren't really effective under these circumstances. — Fytcha T | L | C 15:01, 24 January 2023 (UTC)[reply]
Yoruba translations until the recent effort were by and large problematic and stood for years as well. There's so many issues that I've seen pop up, but as mentioned, I'm only one person, and there's so much that needs to be done. This is part of why I'd been more adamant about creating a more welcoming environment in this project so that more people feel welcome to come contribute and learn, especially for smaller languages. But unfortunately, it has seemed to me (which is part of why I took a break from participating in discussions like these) that there's a very vocal portion of editors that want to maintain the status quo and protect "the traditions" of the website, pushing away new editors and not wanting to be more accessible, even though it's supposed to be collaborative. It seriously negatively affects the quality of the project as a whole. AG202 (talk) 15:57, 24 January 2023 (UTC)[reply]
Equating "traditions" developed over years of experience with exclusionary gatekeeping is not at all fair. If we allow people to come here and do anything and ignore all the rules, will you accept the enormous burden of cleanup? Equinox 16:06, 24 January 2023 (UTC)[reply]

Color change to Polish, Kashubian, and Czech templates

@BigDom @Hythonia @Benwing2 @Thadh @Sławobóg Most Slavic-language declension templates use blue, whereas the templates for the languages in the title use gray. Any strong opinions if we switch them over to blue? Vininn126 (talk) 18:06, 17 January 2023 (UTC)[reply]

The Kashubian one is already blue. And I support the change to blue, seems reasonable to have all Slavic languages in one colour. Thadh (talk) 18:07, 17 January 2023 (UTC)[reply]
I just changed it to our dank grey West Slavic color. :V Sławobóg (talk) 18:15, 17 January 2023 (UTC)[reply]
If anything, we should use red, as red is stereotypical for Slavic. Plus, we could actually use family colors from Wikipedia for all languages and standardize all templates, some of them look crazy. Sławobóg (talk) 18:18, 17 January 2023 (UTC)[reply]
@Vininn126 I support using blue. I'd be opposed to red as it would look gaudy. Benwing2 (talk) 00:55, 21 January 2023 (UTC)[reply]
I vote blue. Vininn126 (talk) 11:12, 21 January 2023 (UTC)[reply]
I’ll go a step further, and say we should use the same blue for all declension templates. Kyrgyz and Mongolian also use it (and a few other non-Slavic languages, too, I think). Theknightwho (talk) 14:36, 21 January 2023 (UTC)[reply]
No. This should be a decision per branch, and I like my purple and green templates, thank you very much. Thadh (talk) 15:06, 21 January 2023 (UTC)[reply]

Voting now open on the revised Enforcement Guidelines for the Universal Code of Conduct

Hello all,

The voting period for the revised Universal Code of Conduct Enforcement Guidelines is now open! Voting will be open for two weeks and will close at 23:59 UTC on January 31, 2023. Please visit the voter information page on Meta-wiki for voter eligibility information and details on how to vote.

For more details on the Enforcement Guidelines and the voting process, see our previous message.

On behalf of the UCoC Project Team,

JPBeland-WMF (talk) 00:23, 18 January 2023 (UTC)[reply]

Ancient Greek "Derived terms"

For the derived terms-section in Ancient Greek entries, how should their size be judged? Is it preferable to have derivatives with (presumably) multiple steps between them on the page of the main verb (à la: λύτρον > λυτρωτός) for ease of use, or to split them up so that each is only one step removed (à la: λύτρον > λυτρόω > λυτρωτός) for the sake of size? Is there a standard (written or unwritten) for this? AntiquatedMan (talk) 15:05, 18 January 2023 (UTC)[reply]

Working with Latin, I only specify direct derivatives, leaving any indirect one for the entry for whatever form they directly derive from. Otherwise we'd have a nightmare on our hands for entries for 'prolific' base verbs. Nicodene (talk) 17:50, 21 January 2023 (UTC)[reply]

Deletion of the Westrobothnian lect

A proposal has been made on RFDO to delete the Westrobothnian lect: WT:RFDO#Category:Westrobothnian language, on the grounds that it is actively misleading. This seems quite important so I thought I would cross-post it here for wider visibility and input.

Let's continue the discussion at RFDO. This, that and the other (talk) 22:19, 18 January 2023 (UTC)[reply]

The use of Template:rel-top for cognates

I propose that we use a different template for collapsible cognate sections in etymologies. {{rel-top}}'s documentation states that it is intended for the "Related terms" section of an entry, not the etymology section, which would also have been my guess based on its name. The issue with abusing {{rel-top}} for cognates is that it makes it harder to customize these boxes separately (with user CSS/JS). In my case, I want to completely remove the cognate boxes (i.e. show the contents without the box) because it's a waste of both time and space considering that most boxes (e.g. see, cold, ship) contain content that easily fits on a single line on my monitor (even despite using tabbed languages which takes away horizontal space).

Pinging @Benwing2 who has done some work on these templates. — Fytcha T | L | C 12:32, 21 January 2023 (UTC)[reply]

I abuse {{rel-top}} for hiding a long list of references like in խմորուկ (xmoruk). We need a general-purpose content hiding template. Vahag (talk) 17:17, 21 January 2023 (UTC)[reply]
@Vahagn Petrosyan It exists at {{box-top}} and {{box-bottom}}. Benwing2 (talk) 20:53, 21 January 2023 (UTC)[reply]
@Fytcha Yeah I agree with not collapsing cognates in most cases. Benwing2 (talk) 20:54, 21 January 2023 (UTC)[reply]
@Fytcha Rather than having a proliferation of such box templates I'd rather have a single template with a param that abbreviates both the title and the CSS class, maybe something like {{box-top|[cog]}} for cognates, {{box-top|[rel]}} for related terms, etc. The param can also be arbitrary text. Benwing2 (talk) 22:24, 21 January 2023 (UTC)[reply]
@Benwing2: While that seems reasonable and logical, I think in practice it would necessitate some heavy botting because I can't imagine the Wiktionary editor base correctly using these tags, seeing how often {{der-top}} and {{rel-top}} are currently being misused. Just abbreviating and pasting the parent L3/L4 header into the box header seems to be very easily bottable, though. I would also encourage editors to stop putting cognates into boxes so much; the cognate line in, for instance, cold is merely ~1500 pixels long but I guess the argument is going to be that we should also take mobile users into consideration. — Fytcha T | L | C 09:50, 22 January 2023 (UTC)[reply]
Confer WT:RFDO#remove lesser-used column templates. As stated there, I'm supportive of the effort to consolidate this confusing proliferation of templates. Benwing2 proposed a {{ctop*}} family of templates, {{ctop2}} through {{ctop5}}; is that still your current thinking, Ben? Or was that superseded by {{box-top}}? The intended outcome of the changes proposed at RFDO needs a little more clarity imho.
As far as cognates are specifically concerned - I have never seen an entry that benefited from the hiding of cognates in a collapsible box. Where we link to high-quality proto-language entries that have complete lists of descendants, it should not be necessary to list all imaginable cognates in the etymology section - just selected ones from key languages. This, that and the other (talk) 08:07, 22 January 2023 (UTC)[reply]
@This, that and the other I'm thinking {{box-top*}} because some people objected to {{ctop*}} as being too ambiguous. I'm open to better names though. Maybe just {{box}}, {{box2}}, etc. in place of {{box-top}}, {{box-top2}}, etc., and {{-box}} in place of {{box-bottom}}, to save typing. I do need to clean up the table in WT:RFDO#remove lesser-used column templates now that some of the changes have been made, and clarify the remaining changes. Benwing2 (talk) 08:57, 22 January 2023 (UTC)[reply]
Thanks This, that and the other, I was not aware of this RFD but I'm fully in support of it. — Fytcha T | L | C 09:50, 22 January 2023 (UTC)[reply]
Excuse my ignorance @Benwing, Fytcha. I have always wondered. What is the difference of so many col templates? I have only used one: {{col2}} or col3 or col1 under all kinds of headings. At el.wikt we only have template ( and ) for separated thematic columns as in el:πείθω and templates (( and )) for continuous columns (but we do not have sort by lang) (++for Translations el:Template:μτφ-αρχή). Also have box. Thank you. ‑‑Sarri.greek  I 18:35, 22 January 2023 (UTC)[reply]
@Sarri.greek No ignorance here, you are completely right that there are too many of them, and we're trying to fix this. Benwing2 (talk) 21:57, 22 January 2023 (UTC)[reply]

Option to suppress archaic gen. sg. in Latin nouns in -ius/-ium

Since I've had to manually override this in quite a few entries now I think it'd be helpful to have an in-built option in the declension module to suppress the "pre-Augustan" genitive singular form for these nouns which is unhelpful for terms from Late/Medieval/etc. Latin. (A particularly bad example is caeliscalpium "skyscraper", where the non-lemma entry caeliscalpi was actually created by a well-meaning editor.) I tend to think this ought to be opt-in as well, especially since the wording "Found in older Latin" implies the form actually is attested for the entry it's displayed on, but that might be too much of a hassle at this point. @Benwing2Al-Muqanna المقنع (talk) 06:48, 22 January 2023 (UTC)[reply]

Can't you already do this by changing the <2> to <2.-ium>, etc.? 70.172.194.25 06:53, 22 January 2023 (UTC)[reply]
I guess you can! I wasn't aware of that since I've never seen it in use (and I've had to change a fair number of entries like this at this point). The opt-in point might be worth considering, though. —Al-Muqanna المقنع (talk) 07:09, 22 January 2023 (UTC)[reply]
@Al-Muqanna At one point I went through and edited all the Neo-Latin chemical elements, plus various Neo-Latin toponyms, etc. to add .-ium or .-ius, so I understand the issue can be annoying. (If this isn't properly documented, my fault.) Changing everything to opt-in can be done by bot without too much difficulty if the Latin editors agree; essentially you just need to add another flag with the opposite sense, something like .+ium or .+ius, and then flip all the terms ending in -ium or -ius. Not sure who works on Latin entries these days, maybe User:This, that and the other, User:Nicodene, User:Fay Freak? Benwing2 (talk) 07:22, 22 January 2023 (UTC)[reply]
I agree that this would be better as an opt-in flag, rather than opt-out as it is currently. Also ping @Urszag. This, that and the other (talk) 08:00, 22 January 2023 (UTC)[reply]
If it's easy to do, I would support changing it to an opt-in. While there are a fair number of words where it may apply, I think words with the short nominative singular in are more of a closed category. I'd imagine that at this point, new Latin nouns that are added to Wiktionary are more likely to be from post-Classical Latin rather than early or Classical Latin. Incidentally, would there be any way to generate a list of which words are currently displayed with both -ī and -iī to facilitate auditing these entries?--Urszag (talk) 08:28, 22 January 2023 (UTC)[reply]
This search query seems to work. 70.172.194.25 08:32, 22 January 2023 (UTC)[reply]
What I can probably do is generate a page containing all the nouns displayed one way or the other along with their definitions, for easy auditing. The auditing can potentially be done by editing that page, and then a bot script will make the appropriate changes. It also sounds like there's a consensus to switch to opt-in short genitives. Benwing2 (talk) 08:53, 22 January 2023 (UTC)[reply]
@Al-Muqanna, Urszag, This, that and the other Please see User:Benwing2/la-noun-ius-ium and User:Benwing2/la-proper-noun-ius-ium. These include all Latin nouns and proper nouns in -ius/-ium, along with their declensions, defns, a column identifying whether the short genitive is suppressed, and a column in which a "yes" or "no" should be entered to indicate whether the noun is currently wrong in this regard (i.e. whether the value of the "Suppress Short Gen" column is wrong). This last column currently has a ? in it for all nouns and proper nouns but should be changed to a "yes" or "no" (or blank for "no"). Once this is done I will fix all the nouns and proper nouns with "yes" in the Wrong column. Benwing2 (talk) 11:05, 22 January 2023 (UTC)[reply]
@Urszag Thanks for starting on auditing the terms. Given the number of them, it might make more sense if you just identify all of the ones that you think do have a short genitive, and I'll have them opt into having a short genitive while all the others opt out. That way you don't have to spend a lot of time looking up the not-sure cases. Benwing2 (talk) 00:44, 23 January 2023 (UTC)[reply]
Sure, I can do that! I got distracted just now into looking into whether quō is ever really used as a feminine form of quis, but identifying the words that definitely do have a short genitive should be fairly simple (I can just check PHI and PedeCerto and if I see the genitive, that's done) so I'll work on that now. Thanks!--Urszag (talk) 00:49, 23 January 2023 (UTC)[reply]

I updated the list manually with some common words found in Plautus and other early Latin authors. I just found a book with tables on this subject, "On the Contracted Genitive in I in Latin" (William Augustus Merrill, 1969); while I can transcribe the information manually, I'm now wondering whether there's some way to pull the list from the existing scan.--Urszag (talk) 02:52, 23 January 2023 (UTC)[reply]

I got a plain text version of the list, so I will clean that up now and then post it on a userpage of my own.--Urszag (talk) 02:59, 23 January 2023 (UTC)[reply]
OK, I'll keep working on this tomorrow or later this week, but if anyone else wants to work on it, I put up the list I got from Merrill here: User:Urszag/genitive-ius-ium Somewhat annoyingly, Merrill lists the words by the genitive form.--Urszag (talk) 03:57, 23 January 2023 (UTC)[reply]
@Urszag Thanks for the info! Let me know when you feel it's in a relatively final form and I'll get to work fixing up the module code and the individual terms. Even if we make uncontracted-only the default, we can support a (redundant) indicator to specify this explicitly in case we want to surface this info (e.g. a note above the declension table saying "Only uncontracted genitive singular attested" or something). Benwing2 (talk) 23:11, 23 January 2023 (UTC)[reply]
@Benwing2 Thanks! I've finished marking all the words on that list that Merrill says are attested with a contracted genitive form. I haven't marked all the ones where he says that only the uncontracted form is attested. I think that information is not significant enough to warrant a note, and is likely to not be entirely certain. For example, one doubtful case that I ran into is "acatium": Merrill says only examples of acatii are attested, but some editions print "acati" where others have "acatii" in Pliny's Natural History. So in my opinion, it would be better to include it as a possible form. Working on this project also made me think a bit more about the reasons why we ought to include contracted forms. I think one fairly good reason, perhaps the best, is so that if a certain contracted form is reasonably likely to be encountered in reading by a Latin student, we can tell them what it means and direct them to the lemma: I think all of the most common attested contracted genitive forms are now marked, although some may have been omitted from Merrill's list.
Another, separate reason might be if we are trying to give an accurate picture of what forms are likely to have been used, regardless of whether we happen to have them attested or not. The value of this is I think a bit less clear: for example, scholars who are working on deciphering newly found documents surely would not need our help to point out that this kind of contracted form could have been used. However, perhaps listing the contracted form on some words, but not others might give a naive reader a misleading impression as to how common contracted forms were in the time period when they were used. For example, I think it seems pretty certain that words like amasius, used by Plautus, would have had in Plautus' time genitive forms like *amasī despite the fact that this form does not happen to be attested. That said, every other dictionary I've seen (L&S, Gaffiot, OLD) just gives the genitive of amasius in -ĭī, so I think it's fine for us to default to that as well; I would expect it to be declined this way if any modern author made use of it, and a reader isn't ever going to come across any examples of the hypothetical early Latin form *amasi.
So in the end, I still support changing to make the contracted forms an opt-in rather than opt-out for common nouns, and I think that should be ready to go forward now. (I haven't gone through proper nouns yet: I've read that for personal names, at least, -i is supposed to have continued to be used for a notably longer time period.) But I think the wording of the note "Found in older Latin (until the Augustan Age)" should be revised. Sturtevant 1902 writes that "the form with -i was the only one in use in early Latin for substantives. Adjectives, however, ended in -ii. [...] the ending -ii [..] is seen already in Propertius, and in Vergil, if one accepts Aen. III. 702 with its genitive fluvii. Ovid and most of the later poets use -ii prevailingly, but Manilius, Persius, and Martial use -i with a very few exceptions. The use of -ii on inscriptions dates from the end of Augustus's reign, but -i is the prevalent form throughout the empire. Proper names are especially conservative in the retention of -i" (Contraction in the case forms of the Latin io- and ia stems...). That is, it sounds like the Augustan Age is closer to being the start of the use of forms in -ii rather than the end of the use of forms in -i (for common nouns). I propose the wording "Found in early Latin; usual before the Augustan Age" to describe contracted forms.--Urszag (talk) 02:42, 24 January 2023 (UTC)[reply]
One other thing I thought of: if it does still seem useful to have an option to explicitly suppress the contracted form (for example, if we don't want to lose that information in case there's a decision in the future to change the default behavior back) I think it would be more helpful for the criteria and note to be based not on attestation of the genitive form, but on the age of the term itself. E.g. we could add a note saying "This word is first attested after the contracted genitive singular in -ī had passed out of regular use." on "New Latin" nouns with these endings.--Urszag (talk) 02:54, 24 January 2023 (UTC)[reply]
@Urszag Awesome, thanks! I'll go ahead and make the opt-in change. I agree with you that it would be more useful to specify the age of a term than simply that the contracted genitive is unattested; having this info specified should help for a whole range of reasons. We sort-of specify this now by notating words as Late Latin, Medieval Latin, New Latin, etc. but not consistently, and some places it's indicated may be wrong. This sort of info best goes in the Etymology section, though. Benwing2 (talk) 04:13, 24 January 2023 (UTC)[reply]
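To make the opt-in behaviour agreed above concrete, here is a minimal shell sketch. It is not the actual declension module code: the function name gen_sg and the "short" argument are made up for illustration, and it simply assumes that the uncontracted genitive singular is stem plus -ī and the contracted one drops the stem-final i.

```shell
# Hypothetical sketch of opt-in contracted genitives for nouns in -ius/-ium.
# Usage: gen_sg LEMMA [short]
#   default     -> only the uncontracted form in -iī
#   with short  -> uncontracted form followed by the contracted form in -ī
gen_sg() {
  lemma=$1
  mode=${2:-}
  case $lemma in
    *ius) stem=${lemma%us} ;;   # e.g. fīlius  -> fīli
    *ium) stem=${lemma%um} ;;   # e.g. auxilium -> auxili
    *) echo "not a noun in -ius/-ium: $lemma" >&2; return 1 ;;
  esac
  if [ "$mode" = "short" ]; then
    printf '%s %s\n' "${stem}ī" "${stem%i}ī"
  else
    printf '%s\n' "${stem}ī"
  fi
}

gen_sg caeliscalpium      # prints: caeliscalpiī
gen_sg auxilium short     # prints: auxiliī auxilī
```

With this default, a New Latin entry such as caeliscalpium never shows the contracted form unless an editor asks for it, which is the point of switching from opt-out to opt-in.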

Tool to find words in passage not on Wiktionary

I recall that such a tool existed, but I cannot find it. Does anyone know of this? 70.172.194.25 01:36, 23 January 2023 (UTC)[reply]

What do you mean "words in passage not on Wiktionary"? Benwing2 (talk) 01:51, 23 January 2023 (UTC)[reply]
Given a passage (block of text), the tool would output the words contained in the passage that do not exist on Wiktionary. The version I remember was a Lua-based template that I think just checked each word (sequence of letters separated by spaces), without trying to normalize to a lemma form, to see if it was a redlink. But if anyone knows of another tool already written to accomplish this, I would be interested. 70.172.194.25 01:58, 23 January 2023 (UTC)[reply]
Palaeolithic method: copy-paste into a text file. Using search and replace in a text editor that can recognize control characters, convert everything that's not alphanumeric into "]] [newline] [[" (getting rid of duplicate/empty lines in the process). If you copypaste this into a page on Wiktionary and preview it inside a self-sorting column template, all the redlinks will be easily visible. If you want to eliminate the 47 "the"s, you can first save the text file to csv, import it into a database or spreadsheet and do a unique select query on it, export it to a text file, then copypaste it into a page on Wiktionary and preview it. Tedium being the true mother of invention, you'll probably come up with a far more efficient way to do this long before you're finished. Chuck Entz (talk) 02:37, 23 January 2023 (UTC)
Yeah, I was hoping someone else had already done the work. FWIW, here is a Unix one-liner to accomplish this:
grep -o -E '\w+' | sort -u | awk '{print "[[" $0 "]]"}'
You can add | tr '[:upper:]' '[:lower:]' to the end to make everything lowercase. A more complicated way of handling that is to output both the original and lowercase forms:
grep -o -E '\w+' | sort -u | PERLIO=:utf8 perl -lne 'print m|\p{Lu}|?"[[$_]]([[@{[lc]}]])":"[[$_]]"'
You can wrap the whole output in
{{l|en|2=[...]}}
to support OrangeLinks and entry name normalization for languages where that's relevant. The first regular expression can be changed if you want a different definition of a word, e.g. '(\w|-)+' will make it count pseudo-science as one word instead of breaking it into "pseudo" and "science". 70.172.194.25 01:11, 24 January 2023 (UTC)[reply]
I see, I have a script to find red and yellow links on a given set of pages (among the 559 Wiktionary scripts I've written in the last 8 years or so) but presumably you're looking for something interactive? Benwing2 (talk) 02:38, 23 January 2023 (UTC)[reply]
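For readers without a Unix shell, the one-liners above can be approximated in Python. This is only a sketch under stated assumptions: the function name `wikilinks` and its parameters are hypothetical, it lowercases any word that differs from its lowercase form (slightly broader than the Perl test for an uppercase letter), and like the one-liners it does not itself check whether an entry exists; the output still has to be previewed on a wiki page to see the redlinks.

```python
import re


def wikilinks(passage: str, word_pattern: str = r"\w+") -> list[str]:
    """Return one wikilink per distinct word in the passage, sorted.

    Words containing uppercase letters get a second link to their
    lowercase form, mirroring the second one-liner above.
    """
    # Note: use a pattern without capturing groups (e.g. r"[\w-]+"),
    # because re.findall returns the group contents when groups are present.
    words = sorted(set(re.findall(word_pattern, passage)))
    links = []
    for word in words:
        lower = word.lower()
        if lower != word:
            links.append(f"[[{word}]]([[{lower}]])")
        else:
            links.append(f"[[{word}]]")
    return links


if __name__ == "__main__":
    text = "The pseudo-science of the Romans"
    print("\n".join(wikilinks(text)))
    # Treat hyphenated compounds as single words:
    print("\n".join(wikilinks(text, r"[\w-]+")))
```

Python sorts by code point, so capitalized words come before lowercase ones; `sort -u` in the shell version may order them differently depending on locale.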

Etymology language for Classical Latin

Any reason this doesn't exist? Normally I'd just create it but it seems such an obvious gap that I wonder if it's intentional. Yes, "Latin" by default is Classical Latin, but it can still be useful to express this explicitly to distinguish a word attested during the Classical period (c. 100 BC - 200 AD?) from Old Latin and Late Latin. Benwing2 (talk) 01:50, 23 January 2023 (UTC)[reply]

Because this is of various or vague definition, perhaps not even usually a chronolect, but a narrow corpus, a standard and register attained. In some cases klassisches Latein is = goldene Latinität, which used to be a square concept of nothing but how Cicero and Caesar would have written under exclusion of contemporary writing and speaking, as in this book for university students to become Latin teachers, making it end with Augusti death in 14 CE and then squeezing in a “spätklassisches Latein” (Late-Classical Latin) between it and an actual Spätlatein (Late Latin).
Still in lack of a better term it is amenable if you fill this gap by it, let it be an idiosyncratic meaning fixation of Wiktionary – in analogy to our consequential chronolectization of Hebrew and Greek. If @Nicodene does not know a more preferable denotation. Fay Freak (talk) 02:23, 23 January 2023 (UTC)
Yes, traditionally Classicists regard the death of Augustus as a major turning point, hence L&S's marking several lexemes as 'Post-Augustan', but I've yet to see any linguistic work that does so. In my view, Latin did not change drastically enough in the span from 100 BC to 14 CE for a new category to really be needed. Nicodene (talk) 10:40, 23 January 2023 (UTC)[reply]
I would personally support adding a Classical Latin ety language, it'd be helpful to use e.g. when specifying that a term was borrowed from a post-Classical word deriving from another Classical Latin word, and I remember wanting to use it in certain entries I've made. @Fay Freak: We do already have a specific in-house chronology, see the category descriptions for Category:Old Latin, Category:Classical Latin, and Category:Late Latin; "Classical" means around 75 BC to AD 200. —Al-Muqanna المقنع (talk) 10:26, 23 January 2023 (UTC)[reply]
(Incidentally as a specific example the etymology I just added at jujube could do with this. —Al-Muqanna المقنع (talk) 09:13, 25 January 2023 (UTC))[reply]

User:Fenakhay is going around systematically replacing "Equivalent to" in etymologies with {{surface analysis}}, and is more than happy to revert anyone who disagrees with its adoption. I find this template both pedantically ambiguating and templatization overkill, and as far as I know, it has been neither mandated nor approved for systematic replacement. -- Skiulinamo (talk) 10:15, 23 January 2023 (UTC)

Going around systematically replacing it? How about you start your thread with a bit of truth instead of accusing people left and right. It was only one replacement.
Writing “Equivalent to” is ambiguous since it can be interpreted in many ways, and using {{surf}} only improves the readability of the etymology with a link to the glossary entry. You are always against any change without any reason. — Fenakhay (حيطي · مساهماتي) 10:22, 23 January 2023 (UTC)[reply]
Fenakhay, please stop adding templates to languages you don't edit; I don't like {{surf}} either and you should respect the decision of other editors when it comes to the languages they edit instead of going around sneakily adding your preferred style to any entry that looks abandoned. Thadh (talk) 10:26, 23 January 2023 (UTC)[reply]
The accusation of "systematically" is still by far an exaggeration, and at that "equivalent to" is not machine readable, a huge disadvantage. Vininn126 (talk) 10:27, 23 January 2023 (UTC)[reply]
This is far from the first time someone's caught Fenakhay at adding this kind of template ({{inh+}}, {{syn}}, {{col-auto}}...) without consulting with the editing community first, so I don't really care at this point whether it's systematic or not, it's just annoying and should stop. If you want to have your preferred style on all pages, start your own wiki. Thadh (talk) 10:33, 23 January 2023 (UTC)[reply]
@Fenakhay: are you saying you're not going around replacing "Equivalent to" with {{surface analysis}} in etymologies? Are you truly going to make me provide a list of examples? I've also given two legitimate reasons for my preference for "Equivalent to". -- Skiulinamo (talk) 10:34, 23 January 2023 (UTC)
Yes, please provide me examples of me replacing “Equivalent to” with {{surf}} after your edits? — Fenakhay (حيطي · مساهماتي) 10:42, 23 January 2023 (UTC)[reply]
User:Fenakhay has now blocked me from editing the entry I linked as an example. Fenakhay: use {{surface analysis}} or be blocked. -- Skiulinamo (talk) 10:34, 23 January 2023 (UTC)[reply]
I have reverted the page back to its original state. I expect it to stay that way until this discussion is over. Thadh (talk) 10:38, 23 January 2023 (UTC)[reply]
I've blocked you on that page for edit warring a.k.a. reverting my edit more than once. — Fenakhay (حيطي · مساهماتي) 10:43, 23 January 2023 (UTC)[reply]
I wish that we could instead try to find a middle ground - consistency greatly improves readability for users. Vininn126 (talk) 10:36, 23 January 2023 (UTC)
I don't know what to say to this accusation. {{col-auto}} replacements were a mistake because I forgot to specify Maltese as the language. I've apologised for that. So I don't know why you are bringing this up right now to this conversation.
Using {{syn}} on Afar entries, imo, was only an improvement to the readability of the entries (I've only done it in a few entries), but I stopped after you pointed it out to me.
What do you mean by {{inh+}}? I don't even use the template... — Fenakhay (حيطي · مساهماتي) 10:41, 23 January 2023 (UTC)[reply]
(All of this stands up to what I've seen from the user and the accusations seem to be made more out of frustration and exaggerated towards dislike of the templates). Vininn126 (talk) 10:42, 23 January 2023 (UTC)[reply]
Just to add to this whole pointless conversation. Editors don't own pages and this isn't the first time victar (or whatever their name is right now) behaved like this. — Fenakhay (حيطي · مساهماتي) 10:47, 23 January 2023 (UTC)[reply]
@Fenakhay: Behaved like what, disagreeing with you? Don't turn it around just because you've been called out. -- Skiulinamo (talk) 10:57, 23 January 2023 (UTC)[reply]
From someone who has no stake in this and is an observer, it's much harder for folks to side with you if you're leveling what seem to be disgruntled personal attacks that aren't based on what the person actually even did. If you have a problem with the template, then actually focus on that rather than bringing up unnecessary drama (that according to Fenakhay, they didn't even do?). You should also be coming in with actual examples of the problems you see, as if Fenakhay really only did it once, then this is being made to be a much bigger issue (and amounts to dishonesty frankly) than it actually is. AG202 (talk) 11:58, 23 January 2023 (UTC)
Looking into it more, there aren't even any Talk page messages, on user pages nor on the entry itself, just one edit war that was immediately moved here. It really does seem like the discussion was moved here to almost "tell on" Fenakhay by making it seem to the unknowledgeable third-party that they're doing something much worse than they actually are. And as seen by Fytcha's response below, multiple German editors already use the template, something that may have already been taken into account (let alone the benefits of {{surface analysis}} for readers). Also, as mentioned, this is a collaborative project, no editor owns any page. There can be guidelines and rules made as seen in WT:ADE or WT:CFI, but otherwise, we should be treating other editors and their edits with good-faith (unless actually proven otherwise). I would highly suggest that this issue in the future be resolved otherwise sans these types of accusations and discussion moves. AG202 (talk) 12:13, 23 January 2023 (UTC)[reply]
I recommend you read Fenakhay's edit comments, which make his intentions abundantly clear: use {{surface analysis}} or you will be reverted and blocked. My complaint was not personal, but factual, and I'm not going to fall into the trap of a tu quoque argument. --Skiulinamo (talk) 01:28, 24 January 2023 (UTC)
??? I did read them, and while the block may have gone too far, I agree with them that no one owns any specific page and telling someone not to edit a page with comments like "youre welcome to use that, doesnt mean i have to" is completely inappropriate and against the idea of a collaborative project. You have been here long enough and should know better to actually have a discussion/use the talk page before reverting good faith edits like that. You explicitly stated that and opened up this discussion with "Fenakhay is going around systematically replacing "Equivalent to" in etymologies with {{surface analysis}}." That has been found to be false. Putting it on one entry is objectively not "systematically replacing it", and I'm glad that others are noticing that falsehood as well (take note especially of Benwing's comment below). Even if that is not a personal attack, that was absolutely not factual. And you have yet to provide a counterargument or examples of Fenakhay replacing more than one entry, when it was explicitly asked for by them. Again, please stop bringing personal beef into these discussions and bring actual facts into the matter, or else, like I said, people won't take you as seriously, as has started already. AG202 (talk) 12:28, 24 January 2023 (UTC)[reply]
@AG202: Instead of gaslighting and making ad hominem attacks, you could have just looked at User:Fenakhay's edit log and seen 60 such mass replacements on 11 January 2023. -- Skiulinamo (talk) 06:19, 26 January 2023 (UTC)[reply]
Are you still at it? Those only concern Maltese entries, of which I am the sole editor. Go make more Proto-Germanic entries to cool down. — Fenakhay (حيطي · مساهماتي) 06:22, 26 January 2023 (UTC)
LMFAO, yes, I'm still at replying to users. And for the record, Qehath created several of those entries. -- Skiulinamo (talk) 07:03, 26 January 2023 (UTC)[reply]
Well, in that case, apologies for that, but aside from that, my other points still stand. It feels especially pointed, and I wish that there were more actual discussion on issues like this, like the discussion below. AG202 (talk) 14:22, 26 January 2023 (UTC)[reply]
I also don't think it's very fair to expect folks to scroll through hundreds of edits to find the edits from 2 weeks ago, when you were asked multiple times to provide examples, but it's fine I guess. AG202 (talk) 14:24, 26 January 2023 (UTC)[reply]
Let's ask the German editing base to decide on whether to use this template and hopefully that'll be that. (Notifying Matthias Buchmeier, -sche, Jberkel, Mahagaja, Fay Freak, Fytcha): . Thadh (talk) 11:01, 23 January 2023 (UTC)[reply]
AFAICT, {{surf}} is in frequent use by multiple German editors, including but not limited to Jberkel, the prolific IP editor from Bonn and myself. I admit that the template leaves some things to be desired (first and foremost, it doesn't support internal derivations that are not expressible with {{af}} as would be needed in the case of Sudel, an apparent deverbal) but the arguments against its use in the case of simple affixations/compounds have not been convincing to me, while the benefits have. — Fytcha T | L | C 11:02, 23 January 2023 (UTC)[reply]
I absolutely agree {{surf}} should be able to handle more than just affixation. Vininn126 (talk) 11:47, 23 January 2023 (UTC)[reply]
I had forgotten this template. The machine readability argument speaks for it, but stereotypical phrasing makes me suppress it from consideration. There should be a way to have it or even link the appendix but still use the phrasing “equivalent to” and “analyzable as” etc. Editors don’t like the Systemzwang of having to change their style of formulation for so trifling a cause, and human readers cannot esteem it high either when they realize it is a Textbaustein. Fay Freak (talk) 15:21, 23 January 2023 (UTC)[reply]
I routinely use {{surf}} in Italian etymologies for standardization purposes. I'm not completely happy with the "By surface analysis, ..." wording but substituting with no template at all is not an improvement. Also, some editors in this discussion have been making unfounded accusations and not assuming good faith (AGF); it's important to AGF at all times otherwise discussions will rapidly degenerate into flame wars. Benwing2 (talk) 22:17, 23 January 2023 (UTC)[reply]
What's wrong with 'by surface analysis'? Nicodene (talk) 22:26, 23 January 2023 (UTC)[reply]
@Nicodene I hadn't heard of "surface analysis" before this and it somehow rubs me the wrong way as it seems to pass a negative judgment on the resulting analysis as it suggests it's just "on the surface". I prefer "synchronic analysis", which is more accurate and which is more standard in the linguistic community. Benwing2 (talk) 23:05, 23 January 2023 (UTC)[reply]
What about just "Synchronously"? Vininn126 (talk) 23:08, 23 January 2023 (UTC)[reply]
Synchronically? PUC 23:28, 23 January 2023 (UTC)
Doesn't that mean "happening at the same time?" Vininn126 (talk) 23:32, 23 January 2023 (UTC)[reply]
I think "synchronically" is better. The adjective "synchronic" is used in this sense in linguistics, not "synchronous". And if anything, I read "synchronously" as meaning "at the same time" more so than "synchronically" does, which sounds like a specialized term. Also compare Google Books results for "synchronically analyzed" vs. "synchronously analyzed". 70.172.194.25 01:07, 24 January 2023 (UTC)[reply]
I think we should have "Synchronic" and "Diachronic" subsections as an option for etymologies (which should probably be blue-linked). At the moment, it's too-often a jumble of both. Theknightwho (talk) 01:08, 24 January 2023 (UTC)[reply]
I don't know about it being 'more accurate and [...] standard', but I certainly have no issues accepting 'by synchronic analysis' as a possible replacement. Nicodene (talk) 00:16, 24 January 2023 (UTC)[reply]
Oh yeah, we should have many formulation variants in an ordered list within a parameter and then a template substitution for random text generation. Fay Freak (talk) 00:45, 24 January 2023 (UTC)[reply]
What? I said no such thing. Nicodene (talk) 12:56, 24 January 2023 (UTC)[reply]
{{surf|mode=Fay Freak}} Jberkel 00:59, 24 January 2023 (UTC)
The general vibe seems to be some form of "By synchronic analysis". If we do that, can we change the name of the template as well? Vininn126 (talk) 10:23, 24 January 2023 (UTC)
"Surface analysis" is linguistic jargon. "Equivalent to" is plain English that doesn't need to be spelled out in a glossary link. Honestly, can someone explain the "machine readability" argument to me? I'm not sure what you would even do with the information from a "surface analysis". Even if you did something with it, you can data scrape "Equivalent to {{af}}" just as well as you could "{{surf}}". If it truly is helpful beyond what-if theory, a {{eq}} shell over {{af}} would be fine, but the lead text is a non-starter for me. --Skiulinamo (talk) 01:28, 24 January 2023 (UTC)[reply]
Yes, you can scrape "Equivalent to {{af}}" just as well as you could "{{surf}}". The issue is that editors will not consistently write "Equivalent to {{af}}" in absence of such a template as I've already demonstrated in Template talk:surface analysis (I view the implicitly arising homogeneity as another upside of this template). Also, as you asked what we would want to do with that template anyway, I can presently imagine two realistic scenarios where it would come in handy: 1. We may, at one point, decide that surface analyses should not categorize into the usual affix categories anymore which would be very easy to implement if we use different templates for derivations and surface analyses. 2. We may want to be able to automatically detect potentially incorrect derived terms; this can be done by checking whether the potentially derived term actually derives (i.e. uses {{af}} or similar) from a given term or not (i.e. merely mentions that term inside {{surf}}). — Fytcha T | L | C 14:43, 24 January 2023 (UTC)[reply]
If I'm being honest, the wording "equivalent to" is equally vague and unhelpful - I was very confused by it for a long time, not realizing it was referring to a specific thing in linguistics. Vininn126 (talk) 14:46, 24 January 2023 (UTC)
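The machine-readability argument in this thread can be illustrated with a short Python sketch. It is only an illustration, not an existing bot or script: the function name and the returned labels are hypothetical, and the sample etymologies are made up. It separates a dedicated {{surf}} / {{surface analysis}} call, the exact free-text phrasing "Equivalent to {{af}}", and a plain {{af}} / {{affix}} derivation.

```python
import re

# Dedicated surface-analysis templates.
SURF_RE = re.compile(r"\{\{\s*(?:surf|surface analysis)\s*\|")
# Ordinary affix/derivation templates.
AF_RE = re.compile(r"\{\{\s*(?:af|affix)\s*\|")
# One specific free-text phrasing followed directly by an affix template.
EQUIV_RE = re.compile(r"equivalent to\s*\{\{\s*(?:af|affix)\s*\|", re.IGNORECASE)


def analysis_kind(etymology: str) -> str:
    """Classify an etymology's wikitext by how its analysis is marked up."""
    if SURF_RE.search(etymology):
        return "surface"
    if EQUIV_RE.search(etymology):
        return "surface (free text)"
    if AF_RE.search(etymology):
        return "derivation"
    return "unknown"


if __name__ == "__main__":
    samples = [
        "{{surf|de|Sonne|-ig}}",
        "Equivalent to {{af|de|Sonne|-ig}}.",
        "Analyzable as {{af|de|Sonne|-ig}}.",
        "From {{af|de|Sonne|-ig}}.",
    ]
    for sample in samples:
        print(analysis_kind(sample), "<-", sample)
```

The third sample is reported as a derivation even though it is meant as a surface analysis, which is the consistency problem raised above: free-text phrasing can be scraped only as long as every editor writes exactly the same words.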

Misleading <ll> representations in Spanish + representing a true picture of the Hispanosphere

(Notifying Ungoliant MMDCCLXIV, Metaknowledge, Ultimateria, Koavf): @Soap, @Nicodene, @Ser be etre shi To get to the point, the way <ll> is currently represented in {{es-pr}} and the corresponding module is misleading. Having /ʝ/ be the default that readers see immediately doesn't paint the whole picture, and we shouldn't keep the other pronunciations in the "more" click. Similar to how we list multiple pronunciations for English or French or many other languages, we can easily list multiple for Spanish as well by default. Also, the descriptions don't paint the whole picture either. /ʎ/ is not only seen in Northern rural Spain in terms of the peninsula, nor is it only seen in the "Andes Mountains" in Latin America (Paraguay is one of the major countries without yeísmo, yet it's not listed). /ʒ/ is also not limited to Argentina & Uruguay, as seen here. In fact, according to Coloma, German (2011) “Valoración socioeconómica de los rasgos fonéticos dialectales de la lengua española.”, in Lexis, page 103, one of the cited sources at yeísmo, the population studied that does yeísmo (371.53 million) is less than the population that does seseo (376.72 million), which clearly makes sense with the division between Spain & LatAm (note: the study unfortunately does not include lects outside of those two areas), but is interesting to note as we separate out distinción by default but not /ʎ/. We also notably do not list Equatorial Guinea or the Philippines at all in pronunciation, even though they're both noted to have distinción in their dialects, and the latter is noted to have /ʎ/. Overall, this module needs some serious fixing in terms of proper representation of the regional Spanish lects in order to paint a true image of the Hispanosphere. I would've made these changes myself, but I would rather have discussion first about it. AG202 (talk) 15:03, 23 January 2023 (UTC)

This is all completely sensible and backed up by evidence. I support, based on my understanding of ceceo/seseo/distincion and lleismo/yeismo. —Justin (koavf)TCM 17:44, 23 January 2023 (UTC)[reply]
@AG202 First of all, you should ping me as I wrote this module, otherwise I may miss the request. Secondly I don't agree that we need to unhide lleismo pronunciations by default; at a certain point with more information displayed by default you get diminishing returns and overwhelm the user with excess information. The difference between lleismo and distincion is that the latter is the overwhelming standard pronunciation of Spain while the former is scattered in various places and has significant regional differences in how the pronunciation is realized. Thirdly there is a limit to how much info we can pack into the dialectal tags without it getting excessive. We can definitely change the wording but I'd be opposed to a laundry list of every country and subregion where lleismo is found just to be "complete". Please propose specific changes to the wording, and we can discuss it. (Fourthly, there are subtleties in the way the module handles regional varieties; I don't know your level of programming skills but you'd need to review the module carefully before hacking on it. It might be better for me to make the changes once they're agreed upon.) Benwing2 (talk) 22:13, 23 January 2023 (UTC)[reply]
@Benwing2: Apologies for not pinging you, I completely forgot to. In terms of hiding lleísmo by default, a compromise would be maybe to list both pronunciations on the same line. Ex: at ella, we could do something like IPA(key): /ˈeʝa/, /ˈeʎa/ or something similar that I've seen other languages do. And honestly, I don't see the big deal with listing two (or more) lines anyways; French often has multiple lines as with maître or English with daughter. I agree that it sometimes gets excessive and can lead to clutter, but I don't think that including /ʎ/ reaches anywhere near that level, and it also starts to raise questions. What's the standard dialect that we're supposedly putting at the default in entries like ella, there's no qualifier for it. Spanish is a linguistically diverse language, and we should show that to readers by default. We don't need to list every subregion, but if we're listing any country, then we should be more specific about where. Just listing "Andean Mountains" when Paraguay is one of the main regions that has /ʎ/ is misleading, and the exclusion of African & Philippine Spanish leaves much to be desired. If it were a case of "we're working on it, and it'll come in the future", as with some other projects, it'd be fine, but it doesn't seem like that's the case at the moment. AG202 (talk) 22:33, 23 January 2023 (UTC)
@AG202 I don't like IPA(key): /ˈeʝa/, /ˈeʎa/ because no info is provided on which pronun is correct in which circumstances; it wrongly implies that there is free variation. This is worse IMO than providing a somewhat cluttered view that at least gives reasons for different pronuns. I'm still not really sure why you are objecting to having the info there but hidden; keep in mind if we unhide all the info, we'll end up with up to 11 lines in some cases; see cebolla for an example. My idea in having yeismo be the default for Latin America is that AFAIK this represents the standard prescribed pronunciation more or less everywhere; can you give me an example of a country where lleismo is prescribed and standard (and if so, how are the ll and y pronounced)? We can definitely change "Latin America" to read "Latin America (prescribed)" or something. In the meantime let's fix the wording of the dialect tags; can you make suggestions? Benwing2 (talk) 23:00, 23 January 2023 (UTC)
@Benwing2 It's hard to find out what exactly is the "standard" for a specific country, let alone considering the fact that /ʎ/ is still the "standard" for the RAE in careful pronunciation (at least that's how I interpret this link: section 2b). And with that, iirc Spain still prescribes lleísmo as well. But at the very least, Paraguay is almost universally said to do lleísmo and preserves a distinction between <ll> & <y>. I'm unsure how exactly <y> is pronounced there, but from what I'm seeing the representation is /ʝ/, as cited at the following as well.
Jaime Peña Arce (2015) “Yeísmo en el español de América. Algunos apuntes sobre su extensión”, in Revista de Filología de la Universidad de La Laguna (in Spanish), number 33, Universidad Complutense de Madrid, →ISSN, pages 193-194:
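Paraguay, contrariamente a la mayoría de Hispanoamérica, es casi en su totalidad distinguidor; de hecho, la articulación de la lateral se ha convertido en emblema y motivo de orgullo para los paraguayos frente a sus vecinos, en especial frente a los argentinos, donde las articulaciones fricativas más adelantadas –especialmente en el Río de la Plata– se han desarrollado hasta el extremo de modificar totalmente su sonoridad.
(Paraguay, unlike most of Spanish America, almost entirely maintains the distinction; in fact, the articulation of the lateral has become an emblem and a source of pride for Paraguayans with respect to their neighbours, especially the Argentines, among whom the more advanced fricative articulations (especially in the Río de la Plata) have developed to the point of completely changing their voicing.)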
It also goes into more detail about the situation in Colombia. Let alone the Philippines as well, which is said to preserve /ʎ/. Also, for the record, yeísmo isn't just the standard for LatAm, it's the default for everything at the moment; there's no qualifier at entries like ella until you click "More". Overall, this goes to show how it can be misleading to only have /ʝ/ be the default, as to an uninformed reader who's skimming, they can think that /ʝ/ is the sole pronunciation or that it's standard; especially when we already split distinción vs seseo out by default.
In terms of the wording on the dialect tags, I'd maybe need more time to really go in depth, but in the meantime, I'd for sure add Paraguay & the Philippines to the line with /ʎ/. /ʒ/ is cited elsewhere, but I don't want to give incomplete info right now. For distinción at entries like ceceo, "Equatorial Guinea" should be added next to Spain; we should also start thinking about how to incorporate ceceo from Andalucía for the future as well, though it's not as pertinent right now. AG202 (talk) 23:28, 23 January 2023 (UTC)[reply]
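The trade-off discussed in this section, between showing every variant and hiding some behind "More", can be sketched as a small data structure in Python. This is not how {{es-pr}} or its module is implemented; the class, the field names and the dialect labels are assumptions made for illustration, with the labels loosely following the regions mentioned in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pronunciation:
    ipa: str
    label: str              # dialect tag shown next to the transcription
    shown_by_default: bool  # False = only visible after clicking "More"


# Illustrative entry for "ella"; labels are assumptions, not the module's tags.
ELLA = [
    Pronunciation("/ˈeʝa/", "yeísmo", True),
    Pronunciation("/ˈeʎa/", "lleísmo (e.g. rural northern Spain, Andes, Paraguay, Philippines)", False),
    Pronunciation("/ˈeʒa/", "Rioplatense", False),
]


def visible(prons: list[Pronunciation], expand: bool = False) -> list[Pronunciation]:
    """Return the lines a reader sees, before or after expanding "More"."""
    return [p for p in prons if expand or p.shown_by_default]


if __name__ == "__main__":
    for p in visible(ELLA):
        print(p.ipa, p.label)
    print("--- after clicking More ---")
    for p in visible(ELLA, expand=True):
        print(p.ipa, p.label)
```

Keeping a label on every line, including the default one, addresses both concerns raised above: a skimming reader sees that /ʝ/ is one variety rather than the only pronunciation, and listing two transcriptions does not imply free variation.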

{{sl-pronounce}} and {{sl-pronounce-other}} (as seen on akrobatka)

I really don't think we want these templates. I don't see why they can't be some pronunciation module. Vininn126 (talk) 18:09, 23 January 2023 (UTC)[reply]

I am also fond of that idea, however I don't know how to code so I made those templates instead. The current module-operated template, Template:sl-IPA is outdated in relation to the Wikipedia article and offers too little variation for regional differences.
The similar issue is with inflection table templates as they are also outdated, and there are some unresolved issues from 2016 regarding the ordering of cases. Slovene has two standards and the tonal (less common, but more detailed version) Slovene was decided to be the standard for diacritical marks of the head templates, however the inflection tables support non-tonal Slovene.
Also, Appendix:Slovene pronunciation and Help:IPA/Slovene are outdated as well because I was never able to find enough interested editors to actually decide on how they should be updated. Garygo golob (talk) 19:34, 23 January 2023 (UTC)[reply]

Review the anti-Ukrainian propaganda edit by User:Overlordnat1

Rather than spamming the Kiev/Kyiv linguistic discussion, I'd like to check whether this kind of edit is OK in this community (diff made by Overlordnat1 (talk • contribs)) and whether a block is merited. Anatoli T. (обсудить/вклад) 01:01, 24 January 2023 (UTC)

Perhaps Overlordnat1 thinks that he is merely making a statement about a Ukrainian politician. Sorry, but associating Ukraine with Nazism is a major theme of Russian propaganda used to motivate the killing of Ukrainians because they are Ukrainians. It resembles hate speech, and if tolerated in a public forum it will just enable more of the same. It clearly violates Wikimedia’s code of conduct, and it is way beyond just some rude comment. Best if they can immediately take it back and apologize, and we can all move on. If not, then action is necessary. —Michael Z. 01:59, 24 January 2023 (UTC)[reply]
False dichotomy: “not about what they believe in but about what they write.” I don’t know about what they believe from anywhere else than their writing, and I couldn’t obtain the same reaction from it as Anatoli T. and Mzajac have. “Associating the elected head of state and elected government with Nazism” is what one constantly has to do to understand a certain existing position (which we oppose). It is an interesting specimen of how many simple and ugly people think, and this is how I took it, didn’t really matter whose opinion that is. Putin propaganda, commissioned by the Russian state, looks different and should be curtailed, but calling a public figure, so important as to lead a country, a nasty name in some long Wiki talk page thread is not suited to harm this public figure nor should it resonate with any individual of the said country; it is wild that this has been taken so far as abetting a narrative motivating the killing of Ukrainians: rah, how could you reasonably have interpreted the statement in a way that could allow inferring this point behind the message? There should be high standards concerning insults of state officials (this is also what freedom of speech is originally for), and the discussion contributions should not be interpreted that fast as badmouthing individuals of a whole country. Now I see Overlordnat1 probably isn’t fighting on the Ukrainian lines too actively, for he would have been more sensitive in that case, but still you can’t tell if he isn’t elsewhere supportive of the country in general though “sickened” by that leader.
It does not look a correct approach to interpreting utterances either in the result, as a control consideration, if the result of who is offended is totally different depending on whether Zelensky or Putin or Erdoğan is inveighed against, and who is at war with whom to which extent. (Also a mode Russian propaganda wants us to think, don’t let them set our themes! Some Nazis in the lines don’t make a Nazi country and Russia is, for all we hope for, also not only Putin’s or Dugin's ideology, this collectivist logic is untrue, and this consciousness, in favour of constant application of facts and logics rather than the watchword from above, is more important than those symbolic victories of whether somebody may or may not be considered a Nazi or gay or pussy-grabber at some point, or a city is spelt more Russian or more Ukrainian.)
Don’t forget, also and on the other hand, that every measure and utterance must be proportionate to and in relation with the amount of work a user invests into the dictionary product. Kremlin propaganda writers, as I imagine them, don’t do all that normal dictionary stuff, and if they do, then anti-Ukrainian utterances maybe even won’t be propaganda. You did as though some evil had encroached, and indeed state-sponsored actors who attempt to rewrite narratives in favour of their masters should not feel safe, while real people, actually more likely to make a mistake than those who have an elaborate supervisory structure, should be met with charity, and what’s that kind of trend of writing general discussion topics about Wiktionary users 😫? (So far, there are few enough people here for me to remember who is probably real and who could be a paid actor making that rogue state propaganda.)
Now go on, he was wrong and you are right, and also you have been wrong and he had a point, made us think, perversely. (Although it will be smart for him to apologize and/or elucidate his risky statement.) Fay Freak (talk) 02:50, 24 January 2023 (UTC)[reply]
Please don’t imply that we wrote things that we did not. I did not write that the offending editor’s point is killing Ukrainians.
The comment is unacceptable because it indulges themes of Ukraine is responsible for the war, Ukraine’s government is illegitimate, Ukraine is led by Nazis: all present in anti-Ukrainian conspiracy theory, all used to delegitimize and dehumanize a nation and its state that are under existential attack. It is unacceptable because if allowed it will help normalize expressions of hate speech and allow Wiktionary to become a hostile environment. Not just about Ukrainians, but about any group. —Michael Z. 03:09, 24 January 2023 (UTC)[reply]
How is this lexical? Vininn126 (talk) 12:57, 24 January 2023 (UTC)[reply]
User:Mzajac, in my opinion both User:Overlordnat1's statement and yours above pertain to politics and not word usage, do not belong on Wiktionary and should not have any bearing on Wiktionary discussions. If his comments were insulting or offensive, he should be sanctioned on that basis, not simply because he takes the Russian side or creates a 'hostile environment'. Beaneater00 (talk) 20:20, 26 January 2023 (UTC)[reply]
If they haven't been warned before, I'd maybe go that route first, but I will say that I have been noticing oddly out-of-place political comments on RFVs/RFDs that don't really have anything to do with the usage of the word, see Wiktionary:Requests for verification/English § Warshington, D.C. for example. AG202 (talk) 12:37, 24 January 2023 (UTC)[reply]
Sickening Kremlin propaganda. Nicodene (talk) 12:51, 24 January 2023 (UTC)[reply]
The major problem with this ENTIRE discussion is that no one is providing MODERN QUOTES in RUNNING TEXT. It's all based on personal opinion and committee's recommendations and what "should" be correct. You are all forgetting the point of this project which is documentation of running text. I don't care whose opinion is what, I care how people in reality are using what forms. If you're going to make a claim either way you need to present QUOTES, nothing else matters. Vininn126 (talk) 12:54, 24 January 2023 (UTC)[reply]
IMO we shouldn't be policing the political ideology of discussion participants. While I vehemently disagree with the opinions expressed, and I realize that @Atitarev and @Mzajac have good reason to be sensitive about such things, given their ties to the region- for our purposes the only thing that matters is that these were inflammatory and unnecessary political remarks. Wiktionary is a community of people from all over the world, and with that comes a need to be sensitive about how what we say will be interpreted by people from outside our individual personal spheres. @Overlordnat1 seems to be young and inexperienced. I hope he's learned something from all of this. Chuck Entz (talk) 14:31, 24 January 2023 (UTC)[reply]
I very strongly agree. Political ideologies expressed on fora, but not in content, should be treated differently and we shouldn't be conducting witch-hunts, even if I strongly disagree with the expressed opinions. It's immature. Vininn126 (talk) 14:34, 24 January 2023 (UTC)[reply]
This project has an obligation to deal with breaches of the Wikimedia code of conduct.  —Michael Z. 15:58, 24 January 2023 (UTC)[reply]
“Individual personal sphere” and “ties to the region” is presumptive and condescending. Please don’t try to make this about me. This is about comments that dehumanize a recognizable group. I would sanction users trying to normalize unacceptable speech about any group.  —Michael Z. 16:04, 24 January 2023 (UTC)[reply]
@Chuck Entz: Thanks for being the voice of reason in all of this. Thadh (talk) 16:32, 24 January 2023 (UTC)[reply]
I see that @Overlordnat1 has deleted the comment,[4] and is perhaps trying to decide how to word a brief apology to the community.  —Michael Z. 16:12, 24 January 2023 (UTC)[reply]
My comment was emphatically not a breach of any code but I apologise for any offence caused and will not be making any political comments on this forum in the future. I have no further comment to make about the matter. —Overlordnat1 (talk) 17:48, 24 January 2023 (UTC)[reply]
It’s probably a good idea to avoid political comments. Since you don’t acknowledge that what you stated went way beyond that into anti-Ukrainian sentiment, I still have concerns.  —Michael Z. 17:58, 24 January 2023 (UTC)[reply]
I have to say I was quite shocked to see the comment, which was largely irrelevant to the discussion. I would say that comments which annoy or upset other editors and thus threaten to derail discussions by introducing digressions are disruptive to the project. A warning may be warranted in the first instance, but if the behaviour continues a block may be in order. — Sgconlaw (talk) 21:22, 24 January 2023 (UTC)[reply]
@Sgconlaw: But then Mr. Atitarev also needs to be issued a warning. In my opinion, he uses an anti-scientific, obscurantist phrase; he repeats a phrase that was created by ideologists inventing a certain kind of intellectual (linguistic) schemes, stamps, cliches, labels, images, generalizing examples and samples, parables, catchphrases, slogans, and not as auxiliary means on the way to cognition of being as it is, but as a final and the highest result of cognition. Gnosandes ✿ (talk) 22:34, 24 January 2023 (UTC)[reply]
The only obscurantism I see here is that of your own comment. I cannot discern a coherent point, as you appear to be saying that we should issue warnings simply for the use of the word “propaganda”. Is that seriously what you are suggesting? Theknightwho (talk) 23:00, 24 January 2023 (UTC)[reply]
@Theknightwho: I wrote a phrase, not a word. Stop twisting my words. Gnosandes ✿ (talk) 23:36, 24 January 2023 (UTC)[reply]
@Gnosandes What words did I twist, exactly? Theknightwho (talk) 23:52, 24 January 2023 (UTC)[reply]
@Theknightwho: I wrote above. This was an amendment to your comment. "Nazilensky" ⇔ "Putin's propaganda". Gnosandes ✿ (talk) 00:28, 25 January 2023 (UTC)[reply]
Yeah, right. The best way to defend is to attack. Good try. Of course, in your world Putin's propaganda with hundreds of millions of dollars doesn't exist, you come up with those offensive terms and the view on the events yourselves.
I am less concerned, since the editor removed his comment, consider this as a weak apology. Anatoli T. (обсудить/вклад) 04:15, 25 January 2023 (UTC)[reply]
@Atitarev: I didn't attack anyone. There really is no "Putin's propaganda with hundreds of millions of dollars" in my world, but there is also no "Nazilensky" (diff) or "ₚосія". What you are suggesting is a primitive view of the world. I have never come up with offensive terms and I have never used it, as well as a view on the events.
I will not consider this as a weak apology. Gnosandes ✿ (talk) 12:19, 25 January 2023 (UTC)[reply]
A new meme LUL Gnosandes ✿ (talk) 21:20, 24 January 2023 (UTC)[reply]

Translation glosses in single-sense entries[edit]

@Sgconlaw just reverted me in kangaroo court. In my opinion, there is zero use in copy-pasting definitions into the translation gloss, especially lengthy and wordy ones as in the present case. There actually being text there (instead of the default "Translations" text) gives off the false impression that there is actual information to be found there; I certainly wouldn't expect there to be totally redundant text on a Wiktionary page. What's more, sooner or later the definition and translation gloss will start to diverge in content, giving off the wrong impression that there actually is some kind of difference. Lastly, the header of translation boxes is used by the translation gadget to show translations in one's preferred languages. The useless text takes away precious space and makes visually skimming whether one's languages are present or not much harder.

Some editors opt for a strongly simplified version of the definition (something like "quasi-judicial proceeding" in this case) but I'd opine that we should not use them unless absolutely necessary. The problem with them is that editors are then tempted to only translate the term/description found in the translation gloss which can lead to some very inaccurate translations. If a translation gloss is necessary (i.e. if there are multiple senses), I prefer the style that I employed in ameliorative, i.e. simply writing (philosophy) because it takes minimal space, is minimally redundant and makes finding the corresponding sense very easy.

Thoughts? — Fytcha T | L | C 17:05, 24 January 2023 (UTC)[reply]

While I'm not sure about this case, there are cases where having multiple translation boxes for a single sense would honestly be better. The first examples that come to mind are uncle & aunt, and oddly enough we have paternal aunt, paternal uncle, etc. as surprisingly not translation hubs while also having a number of translations there as well. I think that there could be some leeway along with more cleanup. AG202 (talk) 17:10, 24 January 2023 (UTC)[reply]
I think it's absolutely redundant to include a gloss in the translation box of a single-definition word. Vininn126 (talk) 17:16, 24 January 2023 (UTC)[reply]
I think it is better to include a gloss even if there is only one sense in the entry, (1) for consistency across entries, and (2) to remind editors of the sense that they are adding translations for, particularly when an entry is long (for example, because of quotations separating the sense from the translation table). — Sgconlaw (talk) 17:19, 24 January 2023 (UTC)[reply]
You need to be reminded of a single sense? There aren't any other senses to confuse even. Vininn126 (talk) 17:20, 24 January 2023 (UTC)[reply]
Reminding? Of the text that is literally 100px further up? — Fytcha T | L | C 17:22, 24 January 2023 (UTC)[reply]
I actually thought of addressing this point before I saw these more recent comments. See the addition to my comment above. — Sgconlaw (talk) 17:34, 24 January 2023 (UTC)[reply]
There's still only one sense. There are no other senses to confuse it with. Someone editing kangaroo court isn't suddenly going to think they're adding translations for "green". Vininn126 (talk) 17:36, 24 January 2023 (UTC)[reply]
I've added thousands of translations to Wiktionary but the "issue" of needing to be reminded of the definition in single-sense entries never once occurred to me. Unlike the issues I've presented, I honestly don't think yours exists. Furthermore, quotations are collapsed by default and a couple of nyms are, from my experience, not amnesia-inducing enough to justify the reminder. — Fytcha T | L | C 17:50, 24 January 2023 (UTC)[reply]
"Sooner or later the definition and translation gloss will start to diverge in content"—this is key I think. We should avoid duplication of info wherever possible, and I fairly regularly come across translation boxes with glosses that diverge from the senses they're meant to correspond to (because they weren't updated when the sense was tweaked/corrected), which makes me hesitant to add translations. —Al-Muqanna المقنع (talk) 17:40, 24 January 2023 (UTC)[reply]
I agree with Fytcha on this, and actually remove these when I encounter them. They’re redundant, waste space, and often cause confusion when they start to diverge from the actual definition. Get rid. Theknightwho (talk) 17:42, 24 January 2023 (UTC)[reply]
When that happens then it makes sense to add a gloss. Vininn126 (talk) 17:43, 24 January 2023 (UTC)[reply]
@Al-Muqanna: I feel that's a general issue not specific to translation tables. Editors just have to be more careful when they edit (or clean up after those who aren't, unfortunately). I always check that there's correspondence between senses and the glosses of translation tables. — Sgconlaw (talk) 17:44, 24 January 2023 (UTC)[reply]

@Fytcha: I'm not going to revert this edit, but I think the discussion should be allowed to continue for longer than two days. I also suggest recording any consensus reached at "Wiktionary:Translations". — Sgconlaw (talk) 14:23, 26 January 2023 (UTC)[reply]

Descendants of Non-lemmas[edit]

I believe the general view strongly disfavours the formal concept. However, in specific words, Pali seems to have borrowed forms from related languages. There is the general phenomenon of Magadhisms, e.g. the plural bhikkhave of bhikkhu, which is the form typical of Magadhi, rather than the form typical of Pali. The word buddha is in general not a borrowing from Sanskrit, but is a common inheritance. However, in at least the phrase namo buddhāya, the usage of that dative singular form appears to be a borrowing from Sanskrit. To promote the borrowing of that case form, which occurs natively with a restricted sense (dative of purpose) in other words from that declension, to a borrowing of the lemma, seems wrong. May I therefore list that Pali case form as a borrowing of the Sanskrit dative singular non-lemma? --RichardW57m (talk) 13:04, 25 January 2023 (UTC)[reply]

So far I've been handling this type of problem by listing the descendant under the ancestor lemma with an explanatory note such as 'via the dative singular'. I'm also interested in hearing what the community thinks is the best approach here. Nicodene (talk) 13:14, 25 January 2023 (UTC)[reply]
That I think is a different case. In this case, it's only the dative singular of the Pali that's been affected by Sanskrit. The rest of the paradigm is in keeping with the rest of the language, and is inherited rather than borrowed. Or are you suggesting that in the Sanskrit lemma's descendants section, I add a note such as 'dative singular only'? RichardW57m (talk) 13:28, 25 January 2023 (UTC)[reply]
That is how I would have been inclined to do it. Nicodene (talk) 13:31, 25 January 2023 (UTC)[reply]
I would personally prefer listing them under the non-lemmas. AG202 (talk) 13:17, 25 January 2023 (UTC)[reply]
I don't know if there's a decisive argument either way but I would opt for Nicodene's approach if only because it's more likely the reader will then actually see it, and I prefer to have this kind of information centralised. The same would go for descendants from alternative forms. —Al-Muqanna المقنع (talk) 13:21, 25 January 2023 (UTC)[reply]
We could have something like {{see desc}} placed before or after the list of descendants saying "see <non-lemma form> for further descendants". Vahag (talk) 13:49, 25 January 2023 (UTC)[reply]
I think that's suboptimal, honestly. Non-lemmas should be as empty as possible. Vininn126 (talk) 14:01, 25 January 2023 (UTC)[reply]
It creates the issue that we see a lot of times with Latin right imho. It gets confusing when you see etymology sections like Portuguese comer that point to comedere but then aren't listed as descendants there. And then when you get to comedō, it doesn't make it clear that the verbs are actually derived from the present active infinitive instead of the present first-person singular (which would actually be the ancestor of como [Etymology 2] if anything). So overall, trying to keep etymologies & descendants of non-lemmas at their actual pages would increase readability to me and avoid situations where we have to keep adding labels and adding extra explanations in etymologies and hope that the reader understands what's going on. AG202 (talk) 18:51, 25 January 2023 (UTC)[reply]
I would say that's a good example of where it shouldn't be listed at a separate entry, since it's a pure circumstance of the dictionary form in Portuguese being the infinitive and the one in Latin being the form. The etymology should link to the dictionary lemma, with the infinitive optionally as the given alt form, which is indeed what the entry at comer does. —Al-Muqanna المقنع (talk) 19:27, 25 January 2023 (UTC)[reply]
See that doesn't really explain to the user why it's linked to the dictionary lemma though. (Nor why Spanish comer does it differently). For a user that's not familiar with Latin etymologies or the way we chose to lemmatize Latin, they could be easily confused as to why comedere is displayed and links to comedo directly, without an explanation of what form is which. The logic is broken unless you're someone who's been used to editing with or seeing the way we do things already for a while. I remember having to read multiple discussions to finally figure out why we do certain things like this, still without figuring out the benefits. Edit: While I think the Spanish way is better at that entry, I still find it suboptimal from a user-experience and user journey perspective. AG202 (talk) 20:58, 25 January 2023 (UTC)[reply]
I also think our situation with non-lemmas and their etymologies and more leaves much to be desired, and I'm rather surprised that there's a push to limit them even more. Though sure there might be duplication, but for example, there's no connection between Spanish verb forms and things like (a suffix indicating the third-person singular indicative preterite of a verb in -ar) or -ábamos (indicates the imperfect tense first-person plural form of an -ar verb). There's little explanation for where each of the f- forms for French être come from; "inflected form of être" at fût tells me little and the explanation provided at être does not illuminate that much and is already crowded even. Unless you know historical linguistics back to Latin already, Wiktionary as is provides you with very little compared to other etymologies. This is something that really needs to be worked on if we're aiming for the goal for concise and clear information for readers. AG202 (talk) 21:09, 25 January 2023 (UTC)[reply]
The etymology at être does in fact state that the forms starting in f- are derived from Latin sum, and ultimately from PIE *bʰuH-, but it's not necessarily organized in a way that would be easy to find. I think this is a situation where having etymologies at the non-lemma forms might make sense, as it's an irregular verb that mixes different stems. That said, having it all on one page (instead of repeated on fussions, fussiez, fussent, etc.) reduces duplication of information that may get out of sync over time, so maybe we could just keep the etymology at the lemma but find a way to organize it better.
To make things more complicated, we actually do list French fus as a descendant under the non-lemma Latin form fuī, but we don't list furent under fuērunt. (I guess because it's even less lemma-ish, being the third-person plural?) 70.172.194.25 21:24, 25 January 2023 (UTC)[reply]
Yes, that's what I was referencing with the part saying "the explanation provided at être does not illuminate that much and is already crowded even". That section really is already crowded, and I don't see how continually adding to the info on those pages will really help with readability. There needs to be a way to link the verb forms to the suffixes that create them, and if they're especially different from the verb (irregulars like the f- series) that they're linked etymologically to where they're from. AG202 (talk) 22:08, 25 January 2023 (UTC)[reply]

There is also the question of whether this is a normal borrowing or a learned borrowing. I get the feeling that the phrase namo buddhāya came over as a blending of Theravadin rituals (Pali) and Mahayana rituals (Sanskrit). I would therefore see it as a normal borrowing. --RichardW57m (talk) 13:04, 25 January 2023 (UTC)[reply]

My views: (1) borrowings from non-lemma forms are to be listed under the lemma (with an explanatory note if necessary), (2) etymologies should link to lemmas, even if they display non-lemma forms, (3) non-lemma forms should ideally not contain etymologies; suppletive stems should be documented in the etymology section of the lemma and (4) we shouldn't do anything special for verbs if the dictionary forms are different between one language and another (since, as stated above, that is an artifact of how we lemmatize entries); etymologies for verbs should be for the verb stems, not the particular forms we use to lemmatize them. — SURJECTION / T / C / L / 22:04, 25 January 2023 (UTC)[reply]

Agree. Vininn126 (talk) 22:10, 25 January 2023 (UTC)[reply]
On (2), there needs to be a way for readers to know exactly what the non-lemma is though in the etymology. comer does it well, though it's not my favorite implementation. Otherwise, it leaves information lacking. On (4), what exactly do you mean by verb stems? If we're talking about the stem of the verb (ex: com- in Spanish comer or in Korean 가다 (gada)), I've never seen that done for the languages that I'm familiar with, and I'm confused on what you mean. AG202 (talk) 22:12, 25 January 2023 (UTC)[reply]
(2) I was talking about suppletion, like in être. Regarding (2) and (4), no, it doesn't leave any information lacking. "Verb stem" here is a somewhat abstract concept, but if language A uses a form A as the dictionary form for verbs and a descendant language B uses a form B, then we shouldn't be documenting the difference between forms A and B, especially not if language B has a descendant of form A even if it is not used as the dictionary form. comer has a descendant como that is a direct descendant of comedō. Any stance that we should be documenting the difference between these forms as part of the verb etymology to me stems from a misunderstanding of the etymologies of such words. As I said, our lemmatization is an artifact and we should seek to not give it any importance under etymology sections. An entry like comer would ideally do "From Latin comedō / comedere" or something along those lines. — SURJECTION / T / C / L / 22:24, 25 January 2023 (UTC)[reply]
No, I was not saying that we should document the differences between comer & como in the etymology section of “comer” for example. I was saying that, if anything, it should be marked or categorized somewhere that como comes from comer + -o, maybe on an appendix I don’t know. That’s one issue. (2) Another issue is the one of suppletion. For fût, for example, it’s really unclear how we get to that form as it clearly doesn’t descend from être + a suffix, so for a reader that wants to know more, that should be more clear. It’s similar to what we do with English entries like are which have a more complex etymology separate from their modern stem. (3) I like the slash idea as a compromise, but I’d still mark which one is the infinitive. A user shouldn’t have to learn and be familiar with our multiple lemmatization systems to figure out which word came from which other word. This also applies to descendant sections. I feel like a lot of folks here are already used to this system so there’s little incentive to change it (as with the discussion to lemmatize Latin verbs at the infinitive, one of the rationales was that it’d be too much work), but for folks that are not accustomed to it, it makes it much more confusing and decreases readability. AG202 (talk) 23:21, 25 January 2023 (UTC)[reply]
That is what I was saying about suppletion - that all the different roots should be documented under the etymology section of the lemma, in this case être. When it comes to lemmatization, the issue is that there could be a legitimate case (although I don't think there are any) where a first infinitive ending became part of a verb stem, and those cases would be then indistinguishable from every other verb where that didn't happen. As for not having to understand multiple lemmatization systems, our etymologies are already not aimed at readers who understand nothing about grammar and etymology (and that's why I think they shouldn't be the first thing people see in an entry...), since they have their own terminology and jargon that one must learn to comprehend. I'm sure anyone who understands those terms also understands that a verb may be lemmatized under a different dictionary form for a descendant than for an ancestor, and thus it isn't something that we need to pay any attention to more than necessary. Again, if all Spanish verbs were too lemmatized under the same form as in Latin, we wouldn't even be having this discussion. — SURJECTION / T / C / L / 07:07, 26 January 2023 (UTC)[reply]
I definitely agree that they shouldn't be the first thing people see, and voted for that, but alas, that's not the situation that we're in right now. I can see your point otherwise though. Hmmm that makes me wonder how we'd get the etymologies from am, are, is to move to be if this is a change that we'd want to make universally. It's already crowded. But overall yeah, there just needs to be some kinda compromise for folks where we can limit duplication and redundancies but also be able to give as much concise information and readability for users. AG202 (talk) 20:27, 26 January 2023 (UTC)[reply]
Trying to match up Romance languages and Latin is trickier than it looks: for instance, Latin had nominal inflected forms for five basic grammatical cases (let's ignore locative and vocative for the moment) and three grammatical genders. Spanish has no forms inflected for case, and two genders. As I understand it (maybe @Nicodene can confirm), the Spanish endings came from the Latin accusatives. Do we really want to have most of the Latin nominal lemmas with no Spanish descendants? (I'm sure it's not always like that, with some deriving from other forms but taking the accusative-descended endings by analogy- but that's even worse). Then there's the matter of consistency: most general sources give lemmas in their etymologies, so you then have to wonder whether lemmas with descendants have them only because someone didn't have access to the right sources, or whether the descendants really did come from that specific form. Language change often doesn't go where we expect, and often doesn't leave footprints we can follow all the way back. Do we want to imply more precision than we can deliver? Chuck Entz (talk) 07:14, 26 January 2023 (UTC)[reply]
Yes, stuff like the noun phenomenon you mention where the Romance languages generally take after the Latin oblique stem would cause an almighty mess if the descendants were separated pedantically per non-lemma form, for not much benefit that I can see. This seems to me rather different from Richard's original question, which is about a specific instance of a borrowing of a non-lemma form. —Al-Muqanna المقنع (talk) 09:30, 26 January 2023 (UTC)[reply]
Agreed. I don't have much to add, other than the incidental observation that the general Spanish feminine singular -a reflects the Latin nominative -a equally as well as the accusative -am, on account of the early loss of /-m/ in polysyllabic words. Nicodene (talk) 10:14, 26 January 2023 (UTC)[reply]

In my opinion, descendants should only be listed in the lemma entry, potentially with qualifying notes which morphological form they derive from. This is superior from a usability perspective because this way, descendants don't run the danger of being overlooked and because presenting all information (including descendants) on a single page saves clicks. I also think our existing {{desctree}} infrastructure doesn't mesh well with the approach of spreading out descendants onto multiple pages: Wiktionary talk:Votes/pl-2022-07/Stubifying alternative forms § Derived and descendants from the stub. As for etymologies, I don't have any strong opinions; showing the non-lemma as an alt-text and linking to the lemma is one option, just showing and linking to the non-lemma and then explaining that it is a non-lemma (as in înțelepciune) is another. — Fytcha T | L | C 09:08, 26 January 2023 (UTC)[reply]

Would {{desctree}} handle buddha being borrowed into Pali from Sanskrit because of the dative singular and being borrowed into Sanskrit from Pali because of the sense 'the Buddha'? Of course, the dative singular route generally wouldn't propagate with the borrowing of the word from Pali! --RichardW57m (talk) 13:58, 26 January 2023 (UTC)[reply]

IMO, the non-lemmas of language A can also derive into the lemmas of language B; this case should be kept. A lot of Thai (B) lemmas came from Sanskrit/Pali (A) non-lemmas, for example. --Octahedron80 (talk) 09:28, 26 January 2023 (UTC)[reply]

One method that's been recommended for this is to display the non-lemma but link to the lemma in the borrowing template. --RichardW57m (talk) 12:30, 26 January 2023 (UTC)[reply]
I prefer the / option that Surjection gave. AG202 (talk) 20:20, 26 January 2023 (UTC)[reply]
@AG202: I can't work out what you are talking about. --RichardW57m (talk) 11:05, 27 January 2023 (UTC)[reply]

It currently contains loads of number-terms like 100th and 13375p34k. Should perhaps be renamed "English words spelled without vowels" (since the pronunciation is not taken into account at all). Even then, I think anything containing digits ought to be omitted. Equinox 02:27, 26 January 2023 (UTC)[reply]

Almost all entries are abbreviations/initialisms because they just do not include AEIOU. (Is Y a vowel?) How can we exclude these? Or we can just cancel the category. --Octahedron80 (talk) 09:41, 26 January 2023 (UTC)[reply]

To be honest looking through the category there are so few legitimate examples that aren't just initialisms or abbreviations that it could probably just be handled manually if we exclude those. —Al-Muqanna المقنع (talk) 09:55, 26 January 2023 (UTC)[reply]
These are the (main lemma) entries I could find that aren't just abbreviations, censored forms, number substitutions etc., out of about 5,000+ in the category, though probably missing a couple: brr, brrm, bzz, bzzt, cwm, cwmwd (but w is a vowel in Welsh), grr, grrrl, hmm, hmph, hnn, hnnng, hrm, hrmph, mhm, mmm, mmph, pff, pfft, pht, pshh, psst, shh, skrrt, tsk, tsktsk, tsk tsk, whrr, zhng, zzz. Maybe they can have their own category. —Al-Muqanna المقنع (talk) 13:39, 26 January 2023 (UTC)[reply]
100th is not a word this is a disgusting accountant-bureaucrat abbreviation reminiscent of 1984 or of Soviet-Bolshevik butchering of language . The worst totalitarian Excesses of the 20th century. So it can't be considered as a 'word' but rather an eye-spelling or (nonstandard, unacceptable and frankly repulsive) slang (?) variation of something that is spoken in the same way. So I have to agree that the category should be deleted because none of the entries it contains actually qualify as unique words. Anyway 100th sounds like 'hundredth' which contains vowels ?? you can't shorten it and write it so funny that you get rid of all the vowels and then -- haha checkmate !! there aren't any vowels, so it doesn't count as a word ! you couldn't say anything like that without vowels anyway .. who on earth could come up with such a ridiculous idea ? anyway 'one' and 'zero' characters contain vowels ?? or if we were to have entries entirely comprised of digits such as 100 is this then a 'word ' without a 'vowel ' ?? no, it's merely a different spelling . i think the discussion should be suspended immediately no more time needs to be wasted on such absurd and degenerative nonsense. we are going to be grunting and gesturing like cavemen in 2 seconds flat if we allow this to continue . down with this sort of thing . and as for your << leetspeak >> entry again there are ways for normal and well-adjusted people to express this idea (God knows why any of them would ever have a reason, or feel the need to ) using the 26 letters of the alphabet that God gave us and that have been perfectly O.K. for hundreds of years . Dispense with the category immediately and the entries as well as you asked me. Perhaps entries containing digits should have a separate designation other than 'words' or even be moved to a separate namespace ? Counting Wiktionary mainspace entries as 'words' would be quite misleading if this were not resolved. Beaneater00 (talk) 20:29, 26 January 2023 (UTC)[reply]
@Equinox, Al-Muqanna, Octahedron80 Dafuq regarding the comment just above? But I agree this should be a manually-added category if kept. There aren't any programmatic ways of excluding abbreviations, censored forms, etc. other than by examining the page contents, and we don't really want to go down that route. Benwing2 (talk) 00:48, 27 January 2023 (UTC)[reply]
This is an automated category?! I agree, that's ridiculous. The category itself is of interest, if you like linguistic trivia, which a lot of people do. Lots of children go through a stage where they explore words without vowels (or without consonants), as part of a growing understanding of the difference between sounds and the letters used to write them. (E.g. like understanding that Italian doesn't have 5 vowels just because it has 5 letters for its vowels.) But filling up the category with items such as "2nd" and "f**k" obscures the only interest that it would have. So no, don't delete, but add to it manually. If it has any maintenance use, it should be a hidden category. kwami (talk) 01:05, 27 January 2023 (UTC)[reply]
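To make the limits of a title-only filter concrete, here is a minimal sketch in Python. It is not how the category is actually populated; the function name and the choice of treating only a, e, i, o, u as vowels are assumptions for illustration. It shows that titles such as "2nd" and "f**k" can be excluded from the title alone, while abbreviations cannot.

```python
# Hypothetical helper, not part of any Wiktionary module.
# Assumption: only a, e, i, o, u count as vowel letters (y and Welsh w do not).
VOWELS = set("aeiou")

def is_vowelless_word(title: str) -> bool:
    """Return True if the title consists only of ASCII letters (spaces allowed)
    and none of those letters is a vowel."""
    letters = title.replace(" ", "")
    if not letters or not letters.isascii() or not letters.isalpha():
        # Rejects digits ("2nd"), censoring characters ("f**k"), empty titles.
        return False
    return not (VOWELS & set(letters.lower()))

candidates = ["psst", "tsk tsk", "2nd", "f**k", "beer", "BBC"]
kept = [t for t in candidates if is_vowelless_word(t)]
# "BBC" is kept: the title alone cannot reveal that it is an abbreviation.
```

The last case is the point made above: excluding abbreviations would need the page contents or manual curation.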
OK. Cute talk. The bunchberry plant thing sounds like bullshit, like when Americans say "oh the English have a law that you get murdered with a cannonball if you pick a rose after midnight". Should I "RFDO" this silly category, or is anyone interested in doing anything with it? Equinox 03:10, 28 January 2023 (UTC)[reply]

Discouraged Characters

The Unicode Consortium discourages the use of some characters, because there is now a generally better encoding. Should we put the old and new form (usually a sequence) in the same page, as is done with normal and halfwidth ASCII characters? Should we do the same with the chillus of the Malayalam script, which have an old (normally three-character) form that should still be supported, and a new form which is an indivisible character? Note that these equivalences are not handled by canonical equivalence, unlike the multitude of precomposed characters in the Roman script. --RichardW57m (talk) 13:03, 26 January 2023 (UTC)[reply]
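The difference between the two cases can be checked with Python's standard `unicodedata` module. This is only an illustrative sketch: the helper name is hypothetical, and the chillu example assumes the old sequence is NA + VIRAMA + ZERO WIDTH JOINER and the new form is U+0D7B (chillu N).

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFC forms are identical."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Roman script: precomposed e-acute vs. e + combining acute accent.
precomposed = "\u00E9"
decomposed = "e\u0301"

# Malayalam: atomic chillu N vs. the older three-character sequence
# (NA + VIRAMA + ZERO WIDTH JOINER), as assumed above.
chillu_n_atomic = "\u0D7B"
chillu_n_sequence = "\u0D28\u0D4D\u200D"

roman_same = canonically_equivalent(precomposed, decomposed)
chillu_same = canonically_equivalent(chillu_n_atomic, chillu_n_sequence)
```

Normalization merges the two Roman-script spellings but leaves the two chillu encodings distinct, so any merging of such pages would have to be done by redirects or an explicit mapping.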

There is some discussion of how to handle discouraged characters and character sequences at the Grease Pit. RichardW57m (talk) 13:03, 26 January 2023 (UTC)[reply]

Perhaps you can look at mainspace Wikipedia usage for guidance. If there is an obviously newer and more correct/functional encoding, then there should be no problem using that as opposed to providing two different encodings for the same word (in the same script, that is -- different Unicode characters used to express what would be/is written the same way in real life). To 'put the old and new form in the same page', as you say, would be out of the ordinary in my opinion. Beaneater00 (talk) 20:16, 26 January 2023 (UTC)[reply]
Wikipedia practice with two different names for the same thing, and only that same thing, is to have a common page by the technique of hard redirects. When it comes to words in Wiktionary, putting two different spellings (i.e. visual forms) on the same page is standard practice, except that hard redirects are discouraged in favour of soft redirects, for there may be distinctions of meaning, not least because spellings equivalent in one language are not equivalent in others.
I had previously noted a reluctance to enter letters composed of two starters, such as ph, but I see that they are now around. That seems precedent enough to merge the two entries. --RichardW57m (talk) 10:58, 27 January 2023 (UTC)[reply]

Templatizing "Unknown" and "Uncertain" in etymologies

I'm planning on replacing the word "Unknown" at the beginning of an etymology with {{unk|LANG}}, and similarly for "Uncertain" with {{unc|LANG}}, as recommended e.g. by User:Vininn126. I'm posting here because I'm also planning on replacing the following expressions in the same fashion and I want to verify that no one has significant concerns:

  • Unknown etymology
  • Unknown origin
  • Etymology unknown
  • Origin unknown
  • Of unknown etymology
  • Of unknown origin

And similarly for uncertain. That is, in place of the above expressions, it will simply read Unknown or Uncertain, appropriately templatized. I did some spot-checking and (a) I can't discern any semantic differences among these different expressions, (b) replacing the above expressions with just Unknown or Uncertain still reads fine in all the cases I examined. A few examples:

etc.

Benwing2 (talk) 00:57, 27 January 2023 (UTC)[reply]
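The replacement described above could be sketched roughly as follows. This is not the bot code that was actually run; the function name, the regular expression and the assumption that the expression sits at the very start of the etymology text are all illustrative.

```python
import re

# Leading expressions treated as equivalent in the proposal:
# "Unknown", "Unknown etymology/origin", "Etymology/Origin unknown",
# "Of unknown etymology/origin", and the same with "uncertain".
_LEADING = re.compile(
    r"^(?:(?:Of\s+)?(?P<a>unknown|uncertain)(?:\s+(?:etymology|origin))?"
    r"|(?:Etymology|Origin)\s+(?P<b>unknown|uncertain))\b",
    re.IGNORECASE,
)

_TEMPLATE = {"unknown": "unk", "uncertain": "unc"}

def templatize(etymology: str, lang: str) -> str:
    """Replace a leading 'unknown'/'uncertain' expression with {{unk|LANG}} or {{unc|LANG}}."""
    def repl(match: re.Match) -> str:
        word = (match.group("a") or match.group("b")).lower()
        return "{{%s|%s}}" % (_TEMPLATE[word], lang)
    return _LEADING.sub(repl, etymology, count=1)

examples = [
    templatize("Of uncertain origin; compare foo.", "en"),
    templatize("Etymology unknown.", "fr"),
    templatize("From Latin foo.", "en"),  # no leading expression: left unchanged
]
```

Anything after the matched expression (punctuation, "perhaps from ...") is left as it is, which is why some cases would still need review by hand.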

I believe that {{uncertain}} should be used whenever a theory is suggested. The last one looks more like pure speculation on our part? I.e. someone said "these two words are probably related, and etymologists have given an etymology for piggin". Vininn126 (talk) 08:48, 27 January 2023 (UTC)[reply]
It's saying it's cognate with piggin, which has an unknown etymology ("origin obscure" per the entry)—there isn't a concrete etymology given for piggin, just a list of "compares". "Obscure", incidentally, should probably also be treated as analogous to "uncertain". Benwing's proposal for handling "etymology ~", "origin ~" etc. seems fine to me. —Al-Muqanna المقنع (talk) 09:23, 27 January 2023 (UTC)[reply]
@Al-Muqanna I agree with treating 'obscure' as 'uncertain'. Benwing2 (talk) 07:02, 28 January 2023 (UTC)[reply]
@Al-Muqanna BTW there are only 72 places where 'Obscure' or 'Obscure etymology/origin' or 'Of obscure etymology/origin' occurs at the beginning of an Etymology section, compared with 2400+ for uncertain+unknown, and some need to be processed by hand (which I can do as there are so few of them). Benwing2 (talk) 07:21, 28 January 2023 (UTC)[reply]

Oghuz language again (continuation of Wiktionary:Beer parlour/2022/December#Oghuz language)

I do not want to be too insistent in this discussion, and I also find your arguments sensible; after all, there is no language registered as such. But I want you to understand me as well. According to an article on Oghuz words in the DLT, the number of Oghuz words is over 250. This is the language that Kashgari recorded the most in the DLT, apart from his own language, Karakhanid. Although Old Anatolian Turkish starts from the 11th century (some say the 12th/13th c.), I do not want the more than 250 words mentioned here to be treated as Old Anatolian Turkish, because you will see on Kashgari's map that the place referred to as the land of the Oghuz is in the middle of Central Asia, not even in Iran. Oghuz is in the position of the Yabgu state. In other words, it cannot be the ancestor of only Azerbaijani and Turkish. If you say that it is attested in only one work, yes, it is, but the Oghuz and Kipchaks are also described intensively there, and Oghuz the most. It is not like Arghu in the number of recorded words. That's why I say the Oghuz language should be added with the code ogz. We already have the Kipchak language, but I think Oghuz should be present as well. I won't say this for languages like Arghu and Çiğil, but over 250 words are not few. I would appreciate it if you could also give your opinion on this matter: @Allahverdi Verdizade, @Rd1978, @Itidal. BurakD53 (talk) 04:01, 27 January 2023 (UTC)[reply]

Re: Oghuz language should be added with ogz code
The volunteer editing community of Wiktionary is not in charge of ISO 639 codes. Wiktionary cannot create a new language code designation. Are you proposing that an Oghuz language edition of Wiktionary should be created? What kind of code are you talking about? Beaneater00 (talk) 05:36, 27 January 2023 (UTC)[reply]
Is there such a language as Proto-Oghuz? If the code trk-ogz-pro, under which nothing will ever be entered, exists in Wiktionary, then the Oghuz language from before Old Anatolian Turkish can exist as well. In addition, trk-ogz is also present in the data. I am only asking for it to be given a status that allows entries. But I am not insistent on this. Objections are also justified. It is not a literary language. But the fact that it is not a literary language does not mean that it does not exist. I just think that the language in question is not Old Anatolian Turkish. And this language will have more data than many languages on this site.
If it were up to me, the data doesn't matter, I'd add Arghu too. Because you cannot put this language under the dialect of anything. But it's not up to me. This is a community and I'm only suggesting in good faith. In common parlance, just saying.
I just realized that the trk-ogz code works with cognate ({{cog|trk-ogz}}). It doesn't work with descendant ({{desc|trk-ogz}}). If you raise the status of this code, my problem will be solved. This may be the way a language that has no ISO code can be added to Wiktionary. BurakD53 (talk) 06:38, 27 January 2023 (UTC)[reply]
It seems like Burak knows more about this code business than I do, so I'll leave it to the people he originally asked to answer. Beaneater00 (talk) 06:44, 27 January 2023 (UTC)[reply]
Then can you tell me where to put these more than 250 words? Shall we say Karakhanid? Shall I add it under the name Proto-Oghuz? I'm okay with that. Just last time you said it should be OAT. I do not think so. I don't claim to know better than you. BurakD53 (talk) 06:52, 27 January 2023 (UTC)[reply]
@BurakD53 I think there was some confusion, as Beaneater00 didn't realise we can add custom language codes. Theknightwho (talk) 16:24, 27 January 2023 (UTC)[reply]
How about this? ابا#Karakhanid They are mentioned as such in the database. BurakD53 (talk) 16:41, 27 January 2023 (UTC)[reply]
@BurakD53 Please don't add any more such terms until we've ironed out where they should go. You need to find some knowledgeable sources about this so-called "Oghuz Language". I'm skeptical it's different from all of the languages already in Wiktionary, which generally reflect the scholarly consensus on these matters. Pinging @-sche who works on language codes. Benwing2 (talk) 06:56, 28 January 2023 (UTC)[reply]
OK. I created only three: ابا, داغ, قون. Two of them are Arghu. BurakD53 (talk) 13:55, 28 January 2023 (UTC)[reply]
I can't find any more resources on the so-called Oghuz language, because there aren't any. You do not understand the dictionary of Kashgarî: he wrote a comparative dictionary of the Turkic languages of his time. Since you cannot add non-language codes, they can be collected under Karakhanid. How else could they be added? (If the trk-ogz code had worked for descendant, I wouldn't have messed it up at all. 😬 This is how it all started: Reconstruction:Proto-Turkic/serče, Reconstruction:Proto-Turkic/koń, Reconstruction:Proto-Turkic/köń-) BurakD53 (talk) 14:35, 28 January 2023 (UTC)[reply]
So, what did we decide to do? Shall we consider them as Karakhanid? I think we can clear this up before next month comes. We can count them as dialects of the Karakhanid period and separate them by adding them to categories. In this way, not only the Oghuz problem will be solved, but also that of dialects such as Arghu, Chigil, Kenchek, Kipchak, Suvar and Bulgar. You were right in the first place that you couldn't add a new language. It was ridiculous that it took this long. Also, I'm the guilty one. BurakD53 (talk) 04:46, 31 January 2023 (UTC)[reply]

What remains to be done at Wiktionary?

I found that there was always something to do at Wikipedia, something to be added. But I'm under the impression that you already have all of the words and all of their definitions. I could add a few maybe, or give pronunciations and usages in a Southern U.S. context, but it seems from the viewpoint of this unenlightened observer that Wiktionary is full; it is complete. Beaneater00 (talk) 04:21, 27 January 2023 (UTC)[reply]

That's correct: Wiktionary is finished. However, you can still
* vote on which entries should be Word of the Day;
* add unnecessary pictures;
* vandalise entries so that we have something to fix;
* argue about the Russian war on discussion pages.
Equinox 04:30, 27 January 2023 (UTC)[reply]
Why do you call it the Russian war? I don't like this trend. Nobody called the war in Vietnam the "American war". Well, the Vietnamese did. I don't know if anyone calls the wars in Libya and Syria the American War. I guess this is the first time in a while that we didn't start the thing (well, that's debatable: we certainly egged the Russians on when we spied on them and gave the Ukes weapons and intel and then turned around and told everyone Kiev would fall in 3 days. Perhaps the whole business could have been avoided? But avoiding war isn't fun; appear weak when you are strong, apparently. All is fair in love and war). But the constant reference in the media to "Russia's war in Ukraine" tells me they think we're so stupid and childlike that we can't keep their narrative straight in our heads for five seconds; they have to keep reminding us that the Russians started the thing, that all the pictures of dead soldiers and blown-up apartment buildings on TV are their fault, and that we should hate them that much more, especially that (pick one) tyrant/madman/fool/clown/cancer patient Putin. Not that Russian state media is any better.
Neither do I like the political misuse of language, such as "counteroffensives" against areas that Ukraine has not controlled for months. A counteroffensive in military terminology is not a political or historiographical designation but refers to a specific kind of military operation that is conducted against or immediately in the wake of an ongoing offensive, i.e. when the enemy has not prepared a defense and established solid control in the areas under attack. It has nothing to do with who you think are the good guys and the bad guys. Ditto for the use of "Ukrainian defenders" to refer to soldiers, combatants. There are probably a lot more examples; thankfully I don't get as much of this drivel rammed down my throat as I rightly should, so I don't know. Beaneater00 (talk) 04:48, 27 January 2023 (UTC)[reply]
It’s not a war! It is a special military operation. May we call it “the Russian special military operation”, pretty please?  --Lambiam 12:48, 1 February 2023 (UTC)[reply]
Seems like you're getting the hang of it already!
On a more serious note, Equinox's comment was intended as a joke, not an earnest list of tasks. Wiktionary might be more or less "complete" from the perspective of having a basic definition for the most commonly encountered English words, but it's certainly not complete in the sense of describing "all words of all languages", which is our goal. Heck, even for English there are plenty of words (and subsenses) missing, and we definitely haven't finished adding "etymologies, pronunciations, sample quotations, synonyms, antonyms and translations" for all entries.
If you're only interested in English, I might direct you to User:Brian0918/Hotlist and WT:REE, as well as Category:Requests concerning English. 70.172.194.25 05:33, 27 January 2023 (UTC)[reply]
Don't forget all the obsolete, archaic, and dated stuff in English. I found a dozen or two terms we didn't previously have recently just from reading a book written in the 1950s. —Al-Muqanna المقنع (talk) 09:28, 27 January 2023 (UTC)[reply]
Seeing someone earnestly "both sides" a landgrab invasion in response to a flippant joke is one of the funniest things I've seen all week. Thanks! Theknightwho (talk) 16:01, 27 January 2023 (UTC)[reply]
Wiktionary doesn't care about political sides, no? Did I ever talk about who was wrong and who was right? But should we not use language as we ourselves, the eminent Encyclopedia -- and I am proud to say that I have gone first to Wiktionary for years because of its lucid, uncorrupted, and familiar layout compared to other sites -- define it? That is, based on existing conventions of language? It's true that words acquire new forms and meanings sometimes directly contrary to what one would assume, as in the case of a << false friend >>; this is often the consequence of linguistic evolution, finger-wagging about << folk etymology >> aside. Don't mind me, I'm just a rambling fool... Beaneater00 (talk) 21:55, 27 January 2023 (UTC)[reply]


Anglocentrism at its finest... Thadh (talk) 14:50, 27 January 2023 (UTC)[reply]
It was a prank bro. Look at the literally hundreds of English entries I added last night while responding to this joker. Equinox 19:29, 27 January 2023 (UTC)[reply]
My respected friend, I'm not a joker; I am simply new here. Beaneater00 (talk) 21:55, 27 January 2023 (UTC)[reply]
I know, but while you add "sophisticative" the majority of languages lack the word for "mother"... Thadh (talk) 13:39, 1 February 2023 (UTC)[reply]
We have many significant quality problems in all kinds of English entries. Our entries have many inane definitions; too many trivial wikilinks; deficient and missing derived terms; unfulfilled requests for cleanup, definitions, quotations, examples, etymologies, images; redundancies; etc. We also don't know much about how ordinary users use Wiktionary (or would use, if they knew about it). DCDuring (talk) 14:47, 27 January 2023 (UTC)[reply]
Finally. I never thought we'd be done, but here we are. What shall we do next, get girlfriends? Nicodene (talk) 22:02, 27 January 2023 (UTC)[reply]

Need for update of Slovene templates and appendices

This is a generalization of the discussion about Template:sl-pronounce.

All the current module-operated Slovene templates for pronunciation and inflection are outdated and missing some key features, and should therefore be updated. Template {{sl-IPA}} is only phoneme-based: it does not take into account even the most basic allophones, and all predictable regional changes also have to be inserted manually. I am thinking maybe a template similar to {{zh-pron}} could be made that would account for all the differences. The inflection templates support only non-tonal Slovene, while the rest of Wiktionary uses tonal Slovene. This issue needs to be resolved, as the current state is probably very confusing to users.

There are detailed articles regarding the pronunciation, declension and conjugation on Wikipedia; it just has to be decided what to include and what not. Consequently, most other Wiktionary appendices about Slovene have to be updated, including Wiktionary:about Slovene. Also, it would be great to add a template to warn users that non-tonal accent is used for the entries where the tone is unknown.

As far as I know, the Slovene community is non-existent, so I made the new pronunciation and declension templates by myself; however, everything has to be entered manually, as I have no knowledge of modules.

What I’m saying is that Slovene is in desperate need of an update: first it has to be decided what to include and what not, and then the templates have to be updated. Garygo golob (talk) 11:38, 27 January 2023 (UTC)[reply]

@Garygo golob User:Rua used to work on Slovene but she isn't active any more. I have done some work on the Slovene modules but not fundamental work on them. Supporting different varieties of Slovene is IMO better done by following the example of the Spanish, Portuguese or Basque pronunciation modules than the Chinese one, which has too much in it that is specific to Chinese. Also IMO we should use tonal notation everywhere. I'm not sure how common it is that the tone is unknown but it can't be frequent since tonal Slovene is regularly used in radio broadcasts and such and they need to know what the tones are for every word they pronounce. Benwing2 (talk) 07:00, 28 January 2023 (UTC)[reply]
@Benwing2 I know, I left a message to Rua already. I gave the example of Chinese because I was also doing some Chinese entries, but yeah, the Spanish example is better, I guess. I also support the tonal notation, but the tone can be hard to determine for words that are not in the SSKJ/SP or only appear in non-tonal dialects. I guess that for now, while the basic words are being added, this is not much of an issue. So apart from this issue, there is also the problem with the order of cases (the current one is not the traditional order, but is based on one source where the accusative was put second because of its similarity with the nominative) and with the names of accentual types, as the mixed accent is currently named mobile, which is actually the name for the other type. Also, IMO we should add SNPT pronunciation, as Slovene is very rarely transcribed with IPA and even foreigners are taught SNPT when learning Slovene.
So, will you be able to update the modules? Garygo golob (talk) 09:22, 28 January 2023 (UTC)[reply]
@Garygo golob I didn't get your ping; in order for it to go through, the ping and your signature need to be saved/published at the same time. I may be able to help with the modules somewhat but I have a lot on my plate now. It would make it easier if you could specify exactly what needs to be changed, in as much detail as possible. Benwing2 (talk) 07:07, 29 January 2023 (UTC)[reply]
@Benwing2 The sl-IPA template needs the most work. Firstly, it would be great if it would automatically change between IPA and SNPT (data about that you can find in the Appendix:SNPT#for standard Slovene). The basic transcription for conversion from phonetic respelling to IPA is already present in the template, but some allophones are missing:
  • pf → /p͡f/, voiced as /b͡v/ (voicing and devoicing is already correctly implemented for other consonants, except that it does not ignore spaces and therefore does not have support for multiword entries)
  • dz → /d͡z/, devoiced as /t͡s/
  • kh → /k͡x/, voiced as /ɡ͡ɣ/
  • v:
    • before a vowel, it should be /ʋ/
    • after a vowel (and not followed by vowel), it should be /u̯/
    • between two consonants or at the beginning of a word, it should be /ʷ/ or /u/ (e. g. vsi → ʷsi, usi)
    • after r, it should change to /w/
  • ł:
    • it should be transcribed as /u̯/, except after r, when it is /w/
  • p, b should change to /p̪, b̪/ before /f, v/
  • nasals
    • n should change to /ŋ/ before /k, g, x, ɣ/
    • m/n should change to /ɱ/ before /f, v, p̪, b̪/
    • n should change to /m/ before bilabial consonants
  • When a dental/alveolar fricative or affricate is followed by a postalveolar fricative, affricate or ⟨nj⟩/⟨lj⟩/⟨rj⟩, it should become postalveolar
  • When two or more of the same consonants are pronounced one after another, they become geminated, taking the allophone of the first one: brezzọ̑b → [breˈzːôp]. The same thing happens with /u̯/ and /ʷ/ (sȋv vzọ̑rec → [ˈsîːˈu̯ːzóːɾə̀t͡s])
  • When a dental/alveolar stop is followed by a dental/alveolar affricate, then they can be pronounced separately or combine into a geminated affricate; e.g., od čebẹ̑le [otʃːeˈbéːlɛ̀]
  • When a dental/alveolar stop or affricate is followed by a dental/alveolar fricative, they combine into a geminated affricate (pọ̑dse [ˈpóːtsːɛ̀]).
  • stressed i before r should be transcribed as ɪ with the correct diacritic on top (that is already partially implemented, see below for some needed additions)
  • ȁ should be transcribed as /ʌ/ with the correct diacritic on top
  • in the final syllable of the word, sequences ej and ou should be transcribed as /e̞j/ and /o̞u̯/, regardless of the stress (with the correct diacritic)
  • unstressed e and o should be transcribed as close-mid before the stress, secondarily stressed and in words with no stress and as open-mid after the stress.
  • When unstressed /ə, i, e/ are followed by [u̯], they can also be pronounced together as [u] (but the usual form with the usual transcription is also valid)
  • between two vowels, a syllable break should be inserted, except after i, where j should be inserted. Exceptions are unstressed /i/ and /u/, which can also turn into [j] and [u̯], respectively if preceded by a vowel (but the usual form with the syllable break is also valid)
  • the last syllable should have the opposite tone to the stressed syllable. If the last syllable is stressed, then ◌̑ → /◌̂/ and ◌́ → /◌̌/
  • Every second syllable should be secondarily stressed
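As an illustration of how a few of the rules above could be automated, here is a minimal sketch in Python (the actual modules are written in Lua). It covers only the labiodental and nasal assimilation rules, operates on a simplified phonemic string rather than the template's real respelling format, and the function name is hypothetical.

```python
import re

DENTAL_MARK = "\u032A"  # combining bridge below, as in p̪ and b̪

def apply_labial_and_nasal_rules(phonemes: str) -> str:
    """Apply a small, assumed subset of the allophone rules listed above."""
    s = phonemes
    # p, b become labiodental before f, v
    s = re.sub(r"([pb])(?=[fv])", r"\1" + DENTAL_MARK, s)
    # m, n become ɱ before f, v and before labiodental p̪, b̪
    s = re.sub(r"[mn](?=[fv]|[pb]" + DENTAL_MARK + ")", "ɱ", s)
    # n becomes ŋ before velars (both ASCII g and IPA ɡ are accepted)
    s = re.sub(r"n(?=[kgɡxɣ])", "ŋ", s)
    # n becomes m before bilabial consonants
    s = re.sub(r"n(?=[pbm])", "m", s)
    return s

examples = {w: apply_labial_and_nasal_rules(w)
            for w in ["banka", "bonbon", "simfonija"]}
```

The order of the substitutions matters: the labiodental stops have to be marked before the ɱ rule runs, and the ɱ rule has to run before the general "n before bilabial" rule. The remaining rules (gemination, vowel quality, tone placement) interact in similar ways and would need the same care.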
What is listed up to now should be denoted with a qualifier Tonal Slovene. For non-tonal Slovene, /ʌ/ should be transcribed as /a/ instead and sequences /əl, əm, ən/ should be transcribed as /l̩, m̩ and n̩/, and of course, tones should be omitted.
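The non-tonal form could in principle be derived from the tonal one mechanically. A rough sketch under stated assumptions: the tonal transcription marks tone only with the combining acute, grave, circumflex and caron, and the function name is hypothetical.

```python
import re
import unicodedata

# Assumed tone diacritics in the tonal IPA: grave, acute, circumflex, caron.
TONE_MARKS = {"\u0300", "\u0301", "\u0302", "\u030C"}
SYLLABIC = "\u0329"  # combining vertical line below, as in l̩

def to_non_tonal(tonal_ipa: str) -> str:
    """Derive a non-tonal transcription from a tonal one (illustrative only)."""
    # Decompose so that tone marks become separate code points, then drop them.
    s = unicodedata.normalize("NFD", tonal_ipa)
    s = "".join(ch for ch in s if ch not in TONE_MARKS)
    # /ʌ/ is transcribed as /a/ in non-tonal Slovene.
    s = s.replace("ʌ", "a")
    # /əl, əm, ən/ become syllabic sonorants.
    s = re.sub(r"ə([lmn])", r"\1" + SYLLABIC, s)
    return unicodedata.normalize("NFC", s)

non_tonal = to_non_tonal("ˈp\u00F3ːtsːɛ\u0300")  # the tonal form of pọ̑dse quoted above
```

Stress and length marks are left untouched; only the tone diacritics and the two segmental substitutions are applied.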
Visible by default should also be the pronunciation for the Natisone Valley dialect, Resian dialect and Prekmurje dialect (the IPA will be inserted manually and no SNPT), as they have their own official standard. The template should look like this (the indented bullet points should be hidden by default):
  • (Standard Slovene, tonal):
  • (Standard Slovene, non-tonal):
    • (Rosen Valley, Resian, Torre Valley, Natisone Valley, southern Soča, rural areas of Upper Carniolan and Lower Carniolan dialects, in Standard Slovene archaic or high literary): added separately, just needs to be transcribed into IPA the same way as tonal Slovene
    • (Rosen Valley, Resian, Torre Valley, Natisone Valley, southern Soča dialects, in Standard Slovene obsolete): added separately, just needs to be transcribed into IPA the same way as tonal Slovene
    • (Area around Ljubljana, in Standard Slovene neologic): every /ə/ should be changed to /ɛ/
    • (colloquial): every /e/, /ɛ/ before or after a nasal should change to /ə/, but it would be great if other forms could also be added manually.
    • (Dialects with ń–n distinction, in Standard Slovene archaic): all nj + consonant sequences should be transcribed as nʲ +cons. and nː + cons.
    • (Dialects with ĺ–l distinction, in Standard Slovene archaic): all lj + consonant sequences should be transcribed as lʲ + cons. and lː + cons.
    • (Dialects with t’–č distinction, in Standard Slovene obsolete): t' should in all other cases be transcribed the same as č, but here it should be transcribed as t͡ɕ, voiced d͡ʑ, SNPT t’, voiced d’.
    • (Švapanje): every /ʋ/, /l/ and /ł/ should be pronounced as u̯, except before /e/, /ɛ/, /ɪ/, /i/
    • Alpine Slovene (1000–1200 AD): will be inserted manually in the notation similar to that of Proto-Slavic (i. e. no IPA template)
  • (Resian): is not the same as any of its subpronunciations, which will be added manually
    • (Bila/Bela/San Giorgio):
    • (Njïva/Njiva/Gniva):
    • (Osoanë/Osojane/Oseacco):
    • (Solbica/Solbica/Stolvizza):
  • (Natisone Valley dialect): should be the same as for western microdialects
    • (western microdialects):
    • (eastern microdialects): added manually
  • (Prekmurje dialect old standard form):
  • Rhymes: currently are added without any allophones and in non-tonal variety, but if you feel that also needs to be in tonal variety, then go ahead (I'm guessing this is relatively easy to automate from pronunciation)
  • Hyphenation: will be done manually
  • homophones: it would be great to have two categories: one for tonal Slovene and an additional one for non-tonal.
If any of the forms is not inserted (which may also be Standard Slovene pronunciations), or if any of the forms are the same, then they should not be shown. Also, a prosodic word is not the same as a written word, but the changes occur in relation to prosodic words, so either there is no space in the pronunciation and the words made up of multiple prosodic words have spaces in between, or some other mark is inserted, e.g. a full stop to mark the end of a prosodic word.
For the inflection tables, if we wanted to continue Rua's vision of just having to write one form and having it automatically generate all the others, the work for the IPA template would be nothing compared to that. The articles on Wikipedia are huge (260000 bytes for inflection and 100000 for conjugation) and I therefore recommend we make those templates manual. I made the Template:sl-decl-noun-table3, which I am quite happy with and which does not require a module, but it would be great if you could make it of constant width even when not all forms are shown or, if you have time, add automatic formation of the vocative (◌́ and ◌̄ should change to ◌̑, and ◌̀  and ◌̄ (if on ə) should change to ◌̏; otherwise the form is the same as the nominative of the same number); it should also allow the addition of other forms.
For the adjective, I made the Template:sl-decl-adj-table2- series. For most adjectives, declension is much simpler and if you have time to also help me with that, I would happily explain. Again, vocative can be quickly automated by the same principles as nouns.
For verbs, the current module-operated template also does not support tonal accents, so I am in the process of making one table myself.
If you have time, you can make module-operated tables for combining forms of adjectives and verbs, but that is not really crucial.
Thank you in advance and I hope it is not too much. If you still don't understand something, or have a better idea, feel free to ask. Garygo golob (talk) 14:00, 29 January 2023 (UTC)[reply]
@Garygo golob Thanks! This is a lot of info to digest and it may take some time before I get to it. Benwing2 (talk) 02:28, 30 January 2023 (UTC)[reply]
@Benwing2 I can help you with the list of consonants and vowels and how they should convert from IPA to SNPT. Also, two things I forgot to mention: the diacritics used in a word or in SNPT are only on the stressed vowels; the last vowel, if unstressed, does not have a dedicated diacritic. In the template, it should also be possible to add the tonal IPA pronunciation manually, as some loanwords and interjections don't follow all these changes. Thank you. Garygo golob (talk) 06:11, 3 February 2023 (UTC)[reply]

blocks of long-time problematic but currently inactive users?

I just went through and reviewed and undid the last few years' changes of a particular Italian editor who (a) doesn't know what the fuck they're doing at all, (b) has extremely wrong ideas about Italian phonology and grammar, (c) when challenged, either doesn't respond at all or is combative and denies there is any issue. These sorts of editors are the most problematic of all, much worse than outright vandals. This user also has edited other languages and is equally problematic there. Thankfully this user has not been very active since around 2020, but I'm concerned if they ever come back they will continue the damage. Is it worth it and/or reasonable to proactively block them (indefinitely)? I am losing patience for problematic editors and have come to the conclusion there are certain people who won't ever learn. Benwing2 (talk) 02:27, 30 January 2023 (UTC)[reply]

Temporary bans are useful for temporary issues and permanent/indefinite ones for permanent issues. If Editor X has persistent problems making quality edits and playing nice with others, then it makes sense to block him indefinitely until/unless he can explain on his talk that he understands and accepts why he was blocked. (It's odd that the block hasn't already happened, but I'll put a pin in that, since I don't know anything about the details.) Even an editor who has competent edits but who pushes away others by being combative would be a candidate for a permanent block, since this is a collaborative venture. So, in short, I think this is a completely valid idea in principle. —Justin (koavf)TCM 06:05, 30 January 2023 (UTC)[reply]
My learned friend,
It may not surprise you at all to learn that I entirely agree with your simple rule: temporary for temporary, permanent for permanent. Under this principle it is shocking and distressing to me that I have received an indefinite block (immediately combined with TPE ban) for one offense of possibly wrong changes and polite talk page disagreement with an admin when he rolled me back without explanation. Technicalrestrictions01 (talk) 13:24, 30 January 2023 (UTC)[reply]
I don't know anything about that, but it seems like you're using multiple accounts to edit, which I advise against. —Justin (koavf)TCM 13:40, 30 January 2023 (UTC)[reply]
I agree with Koavf. A preemptive block can be justified, and does no harm in any case if they're inactive. If they've changed their ways in the meantime they can explain it and request an unblock. —Al-Muqanna المقنع (talk) 07:55, 30 January 2023 (UTC)[reply]
For reference, how can a user request an unblock if his talk page access was revoked (for no discernible reason)? Technicalrestrictions01 (talk) 13:29, 30 January 2023 (UTC)[reply]
Seems like you've figured it out. —Al-Muqanna المقنع (talk) 13:58, 30 January 2023 (UTC)[reply]
I agree with Al-Muqanna. When I notice someone has made edits that merit a block, I sometimes issue one even if the user isn't active, partly so the log reflects that they did blockable things, in case they become active and problematic again. It's not the best example, but at the time I gave this user a two-week block, he hadn't edited in nearly a month (and had made very few edits in the months before that), but I put a block in the log to make clear to both the user and people in the future that the edits had been block-worthy: it wasn't initially obvious to me that he had been warned multiple times before, so I knew it'd be easy for someone else to miss, if he returned in another few months and made more problematic edits... but with a block in the log (and one mentioning where the user was previously warned), I knew if problematic edits resumed later it'd be easier for a future admin to notice "ah, this user has already been warned (and blocked)". (In this case, the block seems to have motivated the user to shape up, so it worked!) If an inactive editor has been long-term problematic in a way that merits an indefinite block, issuing one is either good (if it prevents them from returning and causing more problems) or harmless (if they remain inactive), as Al-Muqanna says, as long as people will notice if they return and contest it on their talk page. - -sche (discuss) 08:00, 3 February 2023 (UTC)[reply]
Thanks all for the comments. I have blocked two such users (User:DelvecchioSimone12 5 96 and User:Angelucci). I'm sure there are other longtime problematic users of this nature. I know of at least one still-active user, User:Diddy-sama6, who will probably have to be blocked as well; among other things, they wantonly add pronunciation templates for large numbers of languages they don't know, continue despite numerous warnings and requests to stop, and deny there is an issue with their edits. Benwing2 (talk) 05:07, 7 February 2023 (UTC)[reply]

Voting closes soon on the revised Enforcement Guidelines for the Universal Code of Conduct

You can find this message translated into additional languages on Meta-wiki.

Hello all,

Voting closes on the revised Universal Code of Conduct Enforcement Guidelines at 23:59 UTC on January 31, 2023. Please visit the voter information page on Meta-wiki for voter eligibility information and details on how to vote. More information on the Enforcement Guidelines and the voting process is available in this previous message.

On behalf of the UCoC Project Team, Mervat (WMF) (talk) 10:26, 30 January 2023 (UTC)[reply]

Last call to vote on revised UCoC enforcement guidelines!

Hi all,

A friendly and final reminder that the voting period for the revised Universal Code of Conduct Enforcement Guidelines closes tomorrow, Tuesday, 31 January at 23:59:59 UTC.

The UCoC supports Wikimedia’s equity objectives and commitment to ensuring a welcoming, diverse movement, and it applies to all members of our communities. Voting is an opportunity for you to be a part of deciding how we uphold this commitment to our community and each other!

To vote, visit the voter information page on Meta-wiki, which outlines how to participate using SecurePoll.

Many thanks for your interest and participation in the UCoC!

On behalf of the UCoC Project Team,

JPBeland-WMF (talk) 21:23, 30 January 2023 (UTC)[reply]

Automatic override of sc= and sort=

Currently, sc= and sort= are two standard parameters which we use on all headword and linking templates. For context, sc= manually specifies the script, while sort= manually specifies the sortkey (i.e. how that term should be sorted in any categories it’s added to by that template, if any). If they aren’t given, the script is automatically detected for the language, and the sortkey is handled according to language-specific rules.

Now, most of the time these are redundant, and generally date from a time before we were able to automate this stuff. However, in the cases where these are actually doing something, the vast majority of the time they’re actually causing problems:

  1. Although these aren’t commonly given for individual terms, they’re very common in headword templates. For languages that use multiple scripts, this can be an issue if whoever made the template didn’t account for that. For example, until today, many Vietnamese headword templates would automatically use the Latin script unless the user manually overrode this (which, of course, they often didn’t). This was, therefore, causing issues on many Han entries.
  2. In the case of sortkeys, the automatic schema usually follows a very specific format, which is designed to ensure that terms all come out in the correct order. These have absolutely nothing to do with the grammar of the language, because they’re tailored for the MediaWiki software that Wiktionary runs on. It’s completely unreasonable to expect a user to know that, much less to actually implement it. Plus, by their very nature, it’s generally desirable to use the same sorting method for all terms, and users may not necessarily agree on what that is (e.g. Vietnamese has two systems). As such, manual sortkeys generally just mess everything up.

In both cases, the only situation where manual specification is actually desirable is if a language uses multiple scripts, and the two scripts have overlapping character sets. The only example I can think of where this applies is for the Translingual Cyrillic letters, as they have Cyrillic and Old Cyrillic headwords. In the case of sorting, manual specification is never necessary, as the only thing that could impact this is the script.

As such, I propose the following:

  • sc is automatically overridden unless the language has multiple scripts assigned to it and the term could plausibly be in either, or the language has no scripts assigned.
  • sort is automatically overridden unless the language has no sorting rules. If there are issues with the current sorting method, then this needs to be dealt with properly by updating the language data module/sortkey module, not by forcing users to manually specify it every single time a given term is linked to.

Theknightwho (talk) 18:32, 31 January 2023 (UTC)[reply]
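As a rough illustration of the two proposed rules, here is a minimal sketch. It is written in Python rather than the Lua that the modules use, and every name in it (`Language`, `resolve_script`, `resolve_sortkey`, `TOY_CHARSETS`) is hypothetical and invented for this example; it is not the actual module code, and the character sets are toy data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Language:
    """Hypothetical stand-in for a language data record."""
    code: str
    scripts: List[str] = field(default_factory=list)
    make_sortkey: Optional[Callable[[str], str]] = None


# Toy character sets, for illustration only.
TOY_CHARSETS: Dict[str, Set[str]] = {
    "Latn": set("abcdefghijklmnopqrstuvwxyz"),
    "Hani": set("字喃"),
    "Cyrl": set("абвгд"),
    "Cyrs": set("абвгдѣ"),
}


def detect_scripts(term: str, lang: Language, charsets: Dict[str, Set[str]]) -> List[str]:
    """Return each script assigned to the language that covers the whole term."""
    return [
        sc for sc in lang.scripts
        if all(ch in charsets[sc] for ch in term if not ch.isspace())
    ]


def resolve_script(term: str, lang: Language, charsets: Dict[str, Set[str]],
                   manual_sc: Optional[str] = None) -> Optional[str]:
    """Honour manual sc= only where the proposal says it should survive."""
    if not lang.scripts:
        # No scripts assigned: nothing to detect, so keep the manual value.
        return manual_sc
    candidates = detect_scripts(term, lang, charsets)
    if len(candidates) > 1 and manual_sc is not None:
        # Term could plausibly be in more than one script: keep the manual value.
        return manual_sc
    # Otherwise the automatic result overrides whatever was passed in.
    return candidates[0] if candidates else None


def resolve_sortkey(term: str, lang: Language,
                    manual_sort: Optional[str] = None) -> str:
    """Honour manual sort= only when the language has no sorting rules."""
    if lang.make_sortkey is None:
        return manual_sort if manual_sort is not None else term
    return lang.make_sortkey(term)


# Usage: a Han-script term in a language that also uses Latin script.
vi = Language("vi", scripts=["Latn", "Hani"])
print(resolve_script("字", vi, TOY_CHARSETS, manual_sc="Latn"))
```

In this sketch the stray `sc=Latn` on a Han-script term is overridden, while a term whose characters fit both Cyrillic and Old Cyrillic would still respect the manual value. Cases such as the letter-shape category or Japanese readings, raised in the replies, would need a further exception that this sketch does not model.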

  • There are cases where sort is useful. The most recent example is that I changed the sorting of all entries in Category:English terms derived from the shape of letters so that they sorted under the letter from which they were derived, instead of the first letter of the term, which IMO makes the category much more usable (e.g. "esses" under S, "wye" under Y). Most of these were bare category links, but a few used {{cln}} with a sort parameter. That's kinda a unique case, admittedly.
  • A much more widespread use of sort is in Japanese kanji entries, to specify the pronunciation-based hiragana under which the term should be sorted, and similarly for other Japonic languages. This seems useful.
  • sort=* has been used to specify a main entry for a set category.
  • There are probably other valid uses that I can't think of off the top of my head. I wonder if a better solution would be to add a parameter not for the final sort key, but for the human-understandable input that gets passed to the sort key function (that then gets transformed in hacky ways to get around MediaWiki's limitations, e.g. the replacement of I by I in Turkish) 70.172.194.25 18:53, 31 January 2023 (UTC)[reply]
So the two solutions I see there are:
  1. Allow manual overrides on a language-specific basis. We already do something similar for transliteration.
  2. Allow manual overrides to be specified in a template (perhaps by making sort a module parameter, or instead by having a whitelist). The overall issue here is that it needs to be done systematically.
Really, though, I don’t think sorting should ever be done by specifying another term (unless that’s how the language does things, as with Japanese, or there is a very specific use-case, as with the shapes of letters you mention) - that usually isn’t what sorting is for. Theknightwho (talk) 19:11, 31 January 2023 (UTC)[reply]
|sort= is needed for sorting Welsh words in Welsh order, e.g. to ensure that Llanberis comes before Llangollen, unless we encode the latter with a (semi-)invisible character (e.g. soft hyphen or CGJ) to prevent its being sorted as though its third letter were the letter 'ng', which comes between 'g' and 'h' in the alphabet. I believe the same type of issue arises in Slovak. --02:30, 1 February 2023 (UTC)
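To make the Welsh problem concrete, here is a small Python sketch of a digraph-aware sort key. It is an illustration under stated assumptions, not the existing Wiktionary sorting code: `welsh_sortkey` is a hypothetical function, accented vowels are simply ignored, and U+034F (CGJ) is assumed to be the separator marking that 'n' and 'g' are separate letters.

```python
# Welsh alphabet order, with digraphs counted as single letters.
WELSH_ALPHABET = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h", "i",
    "j", "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s", "t", "th",
    "u", "w", "y",
]
RANK = {letter: index for index, letter in enumerate(WELSH_ALPHABET)}

CGJ = "\u034f"  # combining grapheme joiner, assumed here as the separator


def welsh_sortkey(term: str) -> tuple:
    """Map a term to a tuple of letter ranks, greedily matching digraphs."""
    text = term.lower()
    key = []
    i = 0
    while i < len(text):
        if text[i] == CGJ:
            # The separator only blocks digraph matching; it has no rank.
            i += 1
            continue
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in RANK:
            key.append(RANK[pair])
            i += 2
        elif text[i] in RANK:
            key.append(RANK[text[i]])
            i += 1
        else:
            # Simplification: skip anything outside the basic alphabet.
            i += 1
    return tuple(key)


naive = welsh_sortkey("Llangollen")               # 'ng' read as one letter
marked = welsh_sortkey("Llan" + CGJ + "gollen")   # 'n' + 'g' kept apart
beris = welsh_sortkey("Llanberis")

print(naive < beris)   # naive key puts Llangollen first
print(beris < marked)  # marked key puts Llanberis first
```

Greedy matching cannot tell from spelling alone whether "ng" is one letter or two, which is why either a manual key or an invisible separator in the input is needed for words like Llangollen.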
Pali, of course, can be considered as having overlapping scripts within the ISO 15924 Burmese, Lao, Lanna and Thai scripts, but so far this has been handled by manual transliteration and passing in a partial writing system description for inflection tables. We have lazily tolerated English collation order for the Roman script, and indeed there is one Roman script dictionary that actually uses it. We do have lurking issues in the Burmese script with local variations in the interpretation of NNYA and, allegedly, SHAN KA. In the Thai script, สุตวา can be sutvā or sutavā. --RichardW57 (talk) 02:30, 1 February 2023 (UTC)[reply]
@RichardW57 Isn't that handled automatically already for Welsh? I'm pretty certain that it is. In any event, where there are no automatic rules specified, this proposal wouldn't change anything. I see the issue now. That is something that is feasible to solve, albeit likely to be quite complex. Theknightwho (talk) 02:34, 1 February 2023 (UTC)[reply]
There is a temporary usage of |sc= I can think of. There was a group of Turks who switched to speaking Russian, but continued to use the Arabic script for writing. If someone starts writing up the vocabulary from the surviving documents, should they not be entering the script as some form of Arabic script so as to assist font selection? An ordinary editor cannot add Arabic to the list of scripts used by Russian.
I had a request to add support for the Chakma script to Pali. I patched it into the Pali modules as far as I could go without changing the protected data modules. I should probably have added |sc=Cakm for the header line modules for the few test entries I added, but thought the requester (@Apisite?) would add more Chakma script Pali entries himself. I only had one page of Pali text in the Chakma script, which is more than can be said for all the Shan writing system forms reportedly around.
Shouldn't there be a maintenance category for the use of unusual script codes? --RichardW57m (talk) 14:15, 1 February 2023 (UTC)[reply]
@RichardW57 I believe Category:Terms written in foreign scripts by language serves this purpose, but it is added manually and only covers three languages. IMO it should be added automatically by the head temp. Wpi31 (talk) 06:18, 2 February 2023 (UTC)[reply]
Why is this a distinct category hierarchy from Category:Terms by script subcategories by language, e.g. Category:Terms in Latin script by language? 70.172.194.25 08:07, 3 February 2023 (UTC)[reply]
No good reason that I can see; they should be merged. Benwing2 (talk) 09:11, 3 February 2023 (UTC)[reply]
@Benwing2: These categories confuse me. How do I find Balti terms written in Tibetan script? --RichardW57m (talk) 12:50, 3 February 2023 (UTC)[reply]
The needs of maintenance are different; at least, I believe maintenance is mostly done by language. For example, I occasionally look through Pali maintenance categories to look at what needs attention. To do that for strange scripts, one would have to look through nearly 50 pages. I suppose one could add a bespoke category with them as subcategories, but that's still dozens of lines to scan, or perhaps even read. (The generic specification would be to add one for the subcategories not listed in the data for Module:languages as using that script.)
Even then, it relies on |sccat= and |sc= being set somewhere in the call chain going down to full_headword() in Module:headword. And this only addresses headword lines - it doesn't address issues with links, which prompted this thread. --RichardW57m (talk) 11:44, 10 February 2023 (UTC)[reply]
There is another technique for searching for words in odd scripts. If there is a TOC template (can't remember name), one can click on the page at the start of each Unicode block and look at the previous page, and also look at the collationally last entry in the index. That again is another fiddly manual process. --RichardW57m (talk) 11:44, 10 February 2023 (UTC)[reply]
@Wpi31: It would fail in the case I had in mind, where a {{lbor}} invocation had a weird combination of language and script, namely Hebrew and "polytonic". I think 'polytonic' would get folded into Greek. --RichardW57m (talk) 10:39, 3 February 2023 (UTC)[reply]
Forgot about the relevant Category:Terms written in multiple scripts by language, whose usage is even more odd. – Wpi31 (talk) 11:21, 3 February 2023 (UTC)[reply]