Wiktionary:Beer parlour/2023/March

From Wiktionary, the free dictionary

I have the impression that HAKHSIN (talk • contribs) is pushing through made-up terms. I don't know Persian well enough, but his words are not even searchable on Google. He has been adding Persian translations using terms for which he has created entries on the Persian Wiktionary. (Notifying Ariamihr, Dijan, Mazsch, Qehath, ZxxZxxZ): Please check if you care.

I had one brief conversation with him; my last question/comment was left unanswered. Anatoli T. (обсудить/вклад) 04:32, 1 March 2023 (UTC)

@Atitarev I blocked this user for 3 days, since they're not responding to questions about the likely made-up terms and this isn't the first time this user has come up in connection with problematic edits. Benwing2 (talk) 06:50, 1 March 2023 (UTC)
I am a native Persian speaker, and I can confirm that his terms were made up and meaningless. Karen kalantari (talk) 06:41, 16 March 2023 (UTC)


Reminder: Office hours about updating the Wikimedia Terms of Use

You can find this message translated into additional languages on Meta-wiki.

Hello everyone,

This is a reminder that the Wikimedia Foundation Legal Department is hosting office hours with community members about updating the Wikimedia Terms of Use.

The office hours will be held on March 2, from 17:00 UTC to 18:30 UTC. See here on Meta for more details.

Another office hours session will be held on April 4.

We hereby kindly invite you to participate in the discussion. Please note that this meeting will be held in English and led by members of the Wikimedia Foundation Legal Team, who will take and answer your questions. Facilitators from the Movement Strategy and Governance Team will provide the necessary assistance and other meeting-related services.

On behalf of the Wikimedia Foundation Legal Team, Mervat (WMF) (talk) 18:19, 1 March 2023 (UTC)

Unverifiable derogatory term for a specific modern-day person

@Justinrleung and Thadh added information about a specific political figure to 影帝 (lit. "acting emperor") (changes), stating that it is a derogatory label for that political figure. There is indeed a book whose title labels that political figure as such, but neither user gave any reliable source to verify that this usage is widespread. Does Wiktionary policy allow such information, even with an rfv-sense template? Sameboat (talk) 22:41, 1 March 2023 (UTC)

@Sameboat: We are not adding something new, but restoring something that was removed out of process. The RFV process requires the relevant senses to remain on the page until the process officially fails. — justin(r)leung (t...) | c=› } 22:43, 1 March 2023 (UTC)
Yes, we do. Any information that is added and hasn't gone through RFV yet should go through RFV before being removed. If there is reason to assume that a sense is a fabrication or vandalism, an RFV may be speedied, but I don't think that's applicable here. Thadh (talk) 22:43, 1 March 2023 (UTC)
I'd agree if this didn't involve a derogatory label for a modern-day person. Sameboat (talk) 22:45, 1 March 2023 (UTC)
I don't know if this will make it more acceptable to you, but the verification of derogatory terms is already expedited compared to other terms (WT:DEROG). — justin(r)leung (t...) | c=› } 22:48, 1 March 2023 (UTC)
I understand your concerns, but if we start going around removing unflattering nicknames of public figures, we lose all credibility as an unbiased source. As Justin said, we do handle this kind of term more quickly than other words. Thadh (talk) 22:51, 1 March 2023 (UTC)
Please read our rules at the top of Wiktionary:Requests for verification/CJK and our WT:CFI. Vininn126 (talk) 22:47, 1 March 2023 (UTC)
How long does the RFV take? Sameboat (talk) 22:53, 1 March 2023 (UTC)
It depends on the individual request. Usually a month at minimum, unless there is reason to speedy a request. Vininn126 (talk) 22:59, 1 March 2023 (UTC)
This one in particular should take two weeks since it's derogatory. — justin(r)leung (t...) | c=› } 23:08, 1 March 2023 (UTC)
@Justinrleung, Vininn126: I don't see the point of removing Sameboat's etymology for the common noun; the article can list both until the RFV is resolved. —Al-Muqanna المقنع (talk) 23:17, 1 March 2023 (UTC)
@Al-Muqanna: It's general practice in Chinese entries not to elaborate on compounds in the etymology section, because that is largely redundant with the {{zh-forms}} box. — justin(r)leung (t...) | c=› } 23:28, 1 March 2023 (UTC)
Yeah, I'm just not sure the significance of 影 would be obvious from the automatic "picture, image, reflection" gloss to someone unfamiliar with the language. I see it's been clarified in the box now, though, which works for me. —Al-Muqanna المقنع (talk) 23:49, 1 March 2023 (UTC)

Translations sections in non-English terms

See കൂപമണ്ഡൂകം. Created by User:Vis M. Normally I'd just delete the section, but this term is a language-specific idiom (see Kupamanduka) that doesn't appear to have an equivalent lemma in English, and the translations are of equivalent terms in other languages. What is our policy in such cases? Benwing2 (talk) 05:14, 3 March 2023 (UTC)

As I understood it, translation sections were only for English entries; I do see the dilemma here. I don't see why we can't just move these to the etymology section using {{cog}}, introduced with "compare". Vininn126 (talk) 09:27, 3 March 2023 (UTC)
For this particular entry, it seems that "frog in a well" / "frog in the well" is used in English. It should be possible to create an English entry to house the translations. – Wpi31 (talk) 15:54, 3 March 2023 (UTC)
I'm having difficulty finding examples of idiomatic usage in English: in the uses I can find, it's either in translations from Chinese/Sanskrit/etc. or given an explicit explanation (with the understanding that the reader won't otherwise be familiar with it). Regardless, even if it's unverifiable as an English idiom, I think it would make pragmatic sense to create frog in a well as a translation hub. Given how widespread the idiom is in Asia (there are also at least Japanese and Korean versions), it wouldn't make sense to arbitrarily host it at the Malayalam entry, where people who come across the term in other languages won't find it. —Al-Muqanna المقنع (talk) 17:28, 3 March 2023 (UTC)
Yeah, let's create a translation hub. Some other sets of non-English terms that all mean the same thing but lack an English translation are at Appendix:Terms considered difficult or impossible to translate into English; perhaps the various frog in a well phrases in different languages should be listed there, and/or the other sets of terms listed there should have translation hub entries. In a few similar cases, I've resorted to ===See also=== to link such things (although this is not ideal, as it doesn't clarify the nature of the connection and some people think "See also" should only link same-language terms), or to the etymology section as Vininn suggests. - -sche (discuss) 21:23, 3 March 2023 (UTC)
@Vininn126, Wpi31, -sche, Al-Muqanna Thanks for the suggestions; I created frog in a well as a translation hub. Benwing2 (talk) 23:28, 3 March 2023 (UTC)

Ban Donnanz from participating in RFD

Time after time, Donnanz has proved woefully incapable of participating constructively in RFD discussions. He very rarely, if ever, makes any attempt at presenting cogent arguments. When someone asks him for clarification, all they're met with is dismissiveness.

Therefore, I hereby propose that Donnanz be banned from taking part in RFD debates. PUC 20:36, 3 March 2023 (UTC)

Can you provide a few examples? Ioaxxere (talk) 21:08, 3 March 2023 (UTC)
For example:
The commonality is that the user often gives either no argument at all or arguments that (given our CFI) are completely irrelevant, while refusing to acknowledge the relevant CFI rules. The user’s dismissiveness of others’ arguments at times gets a bit vitriolic.  --Lambiam 06:36, 4 March 2023 (UTC)
I have to agree with this. It's frustrating and off-putting. Theknightwho (talk) 09:11, 4 March 2023 (UTC)
To be fair, the "beyond repair" one has a valid argument (the "this is a set phrase" part), and he does reply afterwards, so I don't think that specific example is the best. AG202 (talk) 13:47, 4 March 2023 (UTC)
Oppose. While I find it irritating, banning a specific user seems like the wrong end of the stick: if RFDs are being resolved on the basis of comments like this, then there's a problem with the policy for how RFDs are resolved; if they aren't, then it seems pointlessly punitive. —Al-Muqanna المقنع (talk) 15:20, 4 March 2023 (UTC)
Well I never. Apart from my not receiving any notification of this (why?), many of the "reasons" quoted by PUC and Lambiam are quite trivial, there appears to be prejudice against anyone voting keep without giving a reason, and I am often outvoted anyway. Does this mean deletionists want entries deleted without contest? It doesn't bother me if I'm banned from RFD (even if I just comment and don't vote at all?); I have other things to concentrate on. Re the Dickens point, another way around that would be the definition: "A surname, notably that of Charles Dickens". Then you could remove the separate definition. DonnanZ (talk) 20:52, 4 March 2023 (UTC)
Banning individual editors from specific tasks is clearly not the way forward, as the recent I-am-annoyed-about-Wonderfool's-admin-nominations vote evinces. I wouldn't worry, Donnanz; you and I both know there's not a snowball's chance in hell this dumb suggestion will come to anything. You have my 100% support, and we appreciate your hard work here. Van Man Fan (talk) 21:09, 4 March 2023 (UTC)
It is unseemly to label editors as “deletionists” for trying in earnest to apply our criteria (which do not necessarily correspond to their personal preferences). A manifest, contemptuous disregard of this policy is not merely annoying but interferes with the discussions.  --Lambiam 08:07, 5 March 2023 (UTC)
Using such a label doesn't give me any pleasure. I don't vote on every RFD that comes along; some I am indifferent to. I have been accused of trolling by another "dodgy" admin, and I'm still awaiting an explanation of that. I have been editing here for 9½ years now, and in that time PUC in his various guises has been enthusiastic (obsessed?) about removing what he sees as contraventions of CFI, SoP policy or whatever. I am the opposite, and now that PUC is an admin he thinks he can ban me from RFD, which is rather draconian. Semi-related to this is the permanent ban of Dan Polansky, which seems to be a fate worse than death. I seem to get on better these days adding to Wikipedia, mainly surname disambiguation pages. DonnanZ (talk) 13:56, 5 March 2023 (UTC)
The only reason you call me “dodgy” is because you liked the fact Dan Polansky voted “keep” on pretty much everything - and it’s no surprise that you brought that up, really.
What you have pointedly not done is actually address any of the concerns raised here. It’s disappointing. Theknightwho (talk) 17:13, 5 March 2023 (UTC)
I don't think it's best to compare yourself to Dan Polansky, as he was permabanned primarily for racist attacks on other editors... AG202 (talk) 17:37, 5 March 2023 (UTC)
No, I am not comparing myself to Dan; I was referring to his ban, which is rather harsh.
I have addressed one concern re Dickens, withdrawing my vote and suggesting a solution. DonnanZ (talk) 17:47, 5 March 2023 (UTC)
God Defend New Zealand dealt with. I am an NZer, BTW. DonnanZ (talk) 22:15, 5 March 2023 (UTC)
North Atlantic Treaty Organization: I naturally prefer British spellings, but I know that it would have the "z" spelling if reinstated. I think I was misunderstood by my critic, but I revised my comment anyway. DonnanZ (talk) 21:45, 6 March 2023 (UTC)
Any editor could have proposed this ban. Being an admin has nothing to do with it.  --Lambiam 19:29, 5 March 2023 (UTC)
I accept that point if it's true; PUC was the only one motivated, though. DonnanZ (talk) 21:03, 5 March 2023 (UTC)
Oppose. Just ignore his vote if it's nonsensical; banning him from any voting makes it impossible for him to give any arguments at all, even if they do make sense. Thadh (talk) 19:57, 5 March 2023 (UTC)
Oppose. He's far from the only person to vote without providing a justification (or providing an unconvincing one). --Overlordnat1 (talk) 22:08, 5 March 2023 (UTC)
Oppose. For all the reasons already given by others above. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 00:16, 13 March 2023 (UTC)
Oppose. No adequate reason given for an unprecedented selective ban, let alone any broader action. DCDuring (talk) 15:46, 13 March 2023 (UTC)
Oppose. While I share your frustrations with DonnanZ's inability to explain his decisions or adhere to CFI, I don't think that warrants a ban. Though I really do wish discussions were less... frustrating. Vininn126 (talk) 15:57, 13 March 2023 (UTC)
Oppose. A ban is far too harsh a response. Plenty of people vote "keep" without giving a good reason (or any reason at all); providing one is not mandatory. If someone makes "illogical" arguments, they can be ignored and outvoted. When outvoted, DonnanZ accepts the outcome, even if they don't personally agree with it. Thus they have my support to continue participating in RFD. Megathonic (talk) 20:17, 16 March 2023 (UTC)
I was hoping that User:PUC would see which way the wind is blowing and withdraw this proposal, but no such luck. I am very grateful for the support given to me by many editors. DonnanZ (talk) 11:22, 23 March 2023 (UTC)
I would kindly ask you to take some of the comments to heart and consider things like CFI and explaining yourself more in the future. Vininn126 (talk) 11:39, 23 March 2023 (UTC)
@Vininn126: I have already. As for CFI, it has shortcomings and isn't perfect, so it won't satisfy every user. DonnanZ (talk) 12:11, 23 March 2023 (UTC)

Proposed change to CFI

Through my experience closing RFVs, I've noticed that CFI doesn't always align with actual practice. My suggestion is to add this to Wiktionary:Criteria for inclusion § Inflections:

Regular inflections of terms, such as English feats or crossed, may be cited with only one attestation in a durably archived source, even if they are not in a limited documentation language. What counts as a "regular" inflection should be decided for each individual language. This does not apply if the regular inflection is nonstandard or uncommon relative to the lemma, such as beed (regular past tense of English be).

Inflections, including irregular ones, count towards attesting their lemma form. For example, three quotations for soars, soaring, and soared are sufficient to attest soar.

Any unattested entries should be noted as such in a usage note.

Would anyone be interested in voting for this? If so, what about changing the first paragraph to "zero attestations"? That would mean three quotations for soar would be sufficient to create soars, soaring, and soared.

Ioaxxere (talk) 20:41, 3 March 2023 (UTC)

I feel like it should be zero attestations. For example, if a Spanish speaker were to read "123ar", they would know that for the first-person plural present they would say "123amos", and likewise for the other forms. Three citations, for all senses. (talk) 21:04, 3 March 2023 (UTC)
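The predictability argument in the comment above can be illustrated with a small Python sketch. It assumes only the textbook present-indicative endings of regular Spanish -ar verbs and deliberately ignores stem-changing and spelling-changing verbs, which are exactly the cases where attestation would still matter; `present_ar` is a hypothetical helper, not part of any Wiktionary module.

```python
# Hypothetical helper: conjugates a *regular* Spanish -ar verb in the
# present indicative. Stem-changing verbs (e.g. pensar -> pienso) and
# spelling-changing verbs are deliberately out of scope.
PRESENT_AR_ENDINGS = {
    "1sg": "o",
    "2sg": "as",
    "3sg": "a",
    "1pl": "amos",
    "2pl": "áis",
    "3pl": "an",
}


def present_ar(infinitive: str) -> dict[str, str]:
    """Return the present-indicative forms of a regular -ar verb."""
    if not infinitive.endswith("ar"):
        raise ValueError(f"not an -ar verb: {infinitive!r}")
    stem = infinitive[:-2]  # strip the infinitive ending "-ar"
    return {person: stem + ending for person, ending in PRESENT_AR_ENDINGS.items()}


if __name__ == "__main__":
    for person, form in present_ar("hablar").items():
        print(person, form)
```

Calling `present_ar("123ar")["1pl"]` yields "123amos", the form given in the comment; the point is that every form follows mechanically from the lemma once the verb is known to be regular.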
In practice we already don't require attestation of regularly formed inflections of attested lemmas unless there is some question about whether the form is attestable (e.g. if there is an irregular inflection with the same meaning, like 'fungi', or if the term is singular-only, etc.). The main exception is dead languages, esp. those with low attestation like Old Irish; and even among dead languages, for well-attested ones (e.g. Latin) we often create entries for non-lemma inflections without attestation. Benwing2 (talk) 21:36, 3 March 2023 (UTC)
I also wonder how we should handle regular spelling variation. For example, converbialisation is not attestable while converbialization is, but it feels very silly to delete the former. It's just not a very common word. Theknightwho (talk) 22:02, 3 March 2023 (UTC)
We have only been documenting attested Middle and Old Polish forms, mostly due to a lack of uniformity; however, for obsolete terms we generate full inflection tables. Vininn126 (talk) 22:27, 3 March 2023 (UTC)
I would support the version without attestations. There was an RFV in October where we agreed that the rare English verb soccer was considered cited by two uses of soccer and one of soccered, and that this also extended to the inflected form soccers, which was the form originally sent to RFV. So my understanding is that it is already our policy that we don't need to find attestations for every inflected form of a word, at least in English. I would hope not, anyway ... that could generate a lot of spurious RFVs. It only matters when the forms are irregular or unpredictable .... for another rare English verb, coorie, I had trouble finding uses of the -ing spelling, but eventually was able to turn up coorying ... another editor later added coorieing, so it seems that this verb can take both spellings. Soap 04:36, 4 March 2023 (UTC)
I’m fine with zero for regular inflections in living languages, which I think reflects current practice. 70.172.194.25 21:41, 4 March 2023 (UTC)
I agree with Soap and Benwing2 that we should keep credible, regular forms, even without dredging for instances. (There's a good case for considering ephemeral evidence.) --RichardW57m (talk) 16:31, 6 March 2023 (UTC)
Do we yet have a mechanism for challenging the content of inflection tables? Dan Polansky tried to RfV an English comparative lurking in an inflection line, but I never managed to find a record of the challenge later. I have doubts about some of the forms in Pali inflection tables, but they've stayed for lack of a consensus. (I would welcome advice on how to publicly accumulate evidence for endings as 'regular'.) --RichardW57m (talk) 16:31, 6 March 2023 (UTC)
With the proposed rule, what would happen to inflected forms in a table if they failed RfV after venturing onto their own page? Many table generators seem to lack a mechanism to remove a form. I loathe orange links in inflection tables because they appear blue to visitors. Fixing them looks non-trivial. --RichardW57m (talk) 16:31, 6 March 2023 (UTC)
As far as I know, there are no policies as to what should be in a table. Generally they should contain whatever is most helpful to readers, I guess. Ioaxxere (talk) 21:58, 6 March 2023 (UTC)
On a different subject, I noticed that entries for names of species (botanical, zoological) aren't mentioned in CFI. They are accepted by default, I guess. DonnanZ (talk) 13:49, 19 March 2023 (UTC)

Proposal to expose and clean up HTML comments

There are currently about 26,000 sections with HTML comments scattered throughout the main Wiktionary namespace. Here's a random sample of 1000 comments.

The comments seem to fall into a few categories:

  1. Comments left to save future editors from duplicating an effort
    • all but:English:Adverb Note: Do not add the sense "all except", as in "all but three of them were left", as it is not a set phrase and its meaning can be derived from "all" and "but"
    • cẩu:Tày:Etymology 2 The Vietnamese word is never used as a neutral word for "dog" and its humorous connotation is probably recent, so unless the Tày word is also humorous, it's probably a direct Sinitic loan.
  2. Comments that raise questions or doubts
  3. Comments urging some action
  4. Comments that disable something
  5. Comments that could be categories
  6. Comments that could have been part of the edit summary
    • basis:Latin:Noun my own translation, since all the ones I found online either didn't include Sirach or didn't translate 'bases virtutis' literally.
  7. Comments that may be better on the talk page
    • bat-fowling:English:Noun Had a description from a Robert Graves short story here previously, but what he was describing was in Majorca and seemed a little different from this Cyclopaedia description. Believe Graves was describing only something similar rather than being authoritative on the term.

I think in many of these cases, there's a better option than using HTML comments, ideally a template like {{attention}} that would automatically categorize the page for cleanup and expose comments to users who want to see them.

I propose creating a new template, possibly {{rfc-comment}}, and adding it by bot to the beginning of each section containing an HTML comment, excluding Translations, which already uses {{t-check}} for most of the lines with comments. The new template could generate a banner visible at the beginning of each section, alerting users that the section contains HTML comments and needs manual cleanup. The banner could link to a page describing how best to handle the various types of HTML comments, and possibly even show the HTML comment itself (or its first non-blank line, if it's a multi-line comment). Additionally, the template would categorize the page into categories like "Category:Requests for cleanup in LanguageName" and "Category:Requests for cleanup in LanguageName SectionName".

This wouldn't have to be done all at once; we could start by tagging only specific sections within a few languages that have editors interested in this type of cleanup, and iterate from there.

I'm interested in feedback, suggestions, and specific ideas for how to categorize the pages and how best to handle the various classes of comments. JeffDoozan (talk) 02:39, 4 March 2023 (UTC)

@JeffDoozan: None of them is meant to draw attention to the article, which is what a category would be for. You just drew mine with a row of bot edits (5 in my watchlist), none of which justified the insertion of {{attn}}. Comments are there only for the eventual attention of those who want to edit the entry; editors also know about {{attn}} but intentionally did not use it. In most of the cases I know, the comments give the exact reasoning behind a definition, which could later help decipher vague wording, but they do not call for any action.
So in tahulla someone explained why it is a superseded spelling; in alarguez I added a number of vernacular names typically equated in reference works, from which I could map onto modernly appropriate botanical glosses; álabe contains technical terms in German found in various dictionaries, to make it reconstructible how the English glosses came about (we sometimes feel we can be transparent that our definitions are based on translating German, Latin, Russian ones, etc.); similarly, even for pastry items like alcorza the German versions felt much lighter (also note that people translate my dictionary entries into Chinese etc., so surely it can make a difference; it is still just an exception, for suspected irresolvably inexact or ambiguous mappings between languages); and segur contains the source of a descendant that is otherwise not easy to find, since we avoid cluttering descendants with footnotes.
Or, of course, they just disable something, or warn so that something dubious is not added. Your number three is very telling, as the comment urges precisely not to take a particular action (reinserting the hidden text), since it belongs elsewhere.
So none of them needs a category, cleanup, or a template (which would cost RAM and draw undeserved attention). It is also unheard of for the reasoning in FOSS source code to be purged while the code it comments on is kept; of course we use HTML comments just as in any programmed app, so how could one think otherwise? I also frequently conflate “edit summaries” with commit messages (no difference, in my opinion): sometimes one uses commit messages and sometimes source comments, and in either case one assumes that the text should trigger no activity. Fay Freak (talk) 04:20, 4 March 2023 (UTC)
@JeffDoozan From the example comments, it appears only some of them need addressing; my concern about adding an {{rfc-comment}} banner at the top of every section with an HTML comment is that it would be a lot of noise. Also, you'd potentially be duplicating the functionality of {{attn}}; maybe it's better to just fix {{attn}} as needed. For example, it might make sense to have {{attn}} comments displayed by default; having them hidden makes them invisible to someone browsing the page, and hence much less likely to be addressed. HTML comments are in a sense like {{attn}} but even more invisible, and can be used in place of a visible {{attn}} to leave an invisible comment.
BTW, I have absolutely no idea what Fay Freak's comment means, which is par for the course.
Benwing2 (talk) 05:34, 4 March 2023 (UTC)
The display of comments from {{attn}} was disabled a few months ago, based on an earlier discussion. It might be handy to have an invisible-by-default template that marked the presence of HTML comments, to facilitate the use of Cirrus search to find comments (using insource=, which is not usable without a filter such as hastemplate=). Alternatively, the XML dump could be processed to yield a complete list of entries with HTML comments, preferably grouped by the language of the L2 section(s) in which they appear. DCDuring (talk) 19:34, 4 March 2023 (UTC)
As is, one can identify HTML comments in even large categories of entries by using searches such as "incategory:"English adverbs" insource:/\<\!\-\-/". This search would have a number of false positives (i.e., the HTML comments could be in non-English L2s), but it is very practical, requiring no investment of technical or other resources beyond learning basic Cirrus search, which has many other applications. DCDuring (talk) 19:45, 4 March 2023 (UTC)
I think your category 6 example would be lost in a change history. It's telling an editor that other translations may be unsuitable, and giving criteria to use in assessing them as replacements. --RichardW57m (talk) 17:35, 6 March 2023 (UTC)
On the topic of HTML comments, let me briefly also link to here, where there is a list of potentially problematic (unclosed) HTML comments, and where I suggested we might benefit from identifying which pages have the longest HTML comments (e.g. someone commented out four language sections and no one noticed); see this for how some long comments on Wikipedia were found by HaeB. - -sche (discuss) 03:19, 13 March 2023 (UTC)
Unclosed HTML comments seem to be more clearly problematic than HTML comments in general. DCDuring (talk) 16:04, 13 March 2023 (UTC)
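The dump-processing idea DCDuring mentions, together with -sche's interest in unclosed and unusually long comments, could be prototyped roughly as below. This is a minimal Python sketch, assuming each page's wikitext is already available as a string (reading the XML dump itself is omitted) and that L2 headers sit on their own line as `==Language==`; `scan_comments` is a hypothetical name. It does not special-case `<!--` inside `<nowiki>` or `<pre>`, so its output is best treated as candidates for manual review.

```python
import re
from collections.abc import Iterator

# An L2 header has exactly two '=' on each side, e.g. "==English==";
# "===Noun===" is rejected because the third character must not be '='.
L2_HEADER = re.compile(r"^==([^=].*?)==\s*$", re.MULTILINE)


def scan_comments(wikitext: str) -> Iterator[tuple[str, str, int, bool]]:
    """Yield (L2 language, stripped comment text, raw length, closed?) per comment."""
    # Start offset and language name of every L2 section on the page.
    sections = [(m.start(), m.group(1).strip()) for m in L2_HEADER.finditer(wikitext)]

    def language_at(pos: int) -> str:
        lang = "(before first L2)"
        for start, name in sections:
            if start > pos:
                break
            lang = name
        return lang

    pos = 0
    while True:
        start = wikitext.find("<!--", pos)
        if start == -1:
            return
        end = wikitext.find("-->", start + 4)
        closed = end != -1
        # Assumption: an unclosed comment runs to the end of the page.
        stop = end if closed else len(wikitext)
        body = wikitext[start + 4:stop]
        yield language_at(start), body.strip(), len(body), closed
        if not closed:
            return
        pos = end + 3


if __name__ == "__main__":
    sample = "==English==\n# all but <!-- not a set phrase -->\n==Latin==\n<!-- forgot to close"
    for lang, text, length, closed in scan_comments(sample):
        print(lang, length, "closed" if closed else "UNCLOSED", text[:40])
```

Sorting the yielded tuples by the length field would surface the longest comments first, and filtering on `closed` being false would list the unclosed ones; grouping by the first field gives the per-language breakdown suggested above.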

Retiring derivative subpages

I have been going through the 50-something derivative subpages and moving their contents into the main entries, while making changes to ensure they do not go over the Lua memory limit. <technical stuff> Most of the entries only need changes such as replacing inefficient templates (e.g. {{ja-r}}, {{ko-l}}, {{col3-u}}) with more efficient ones like {{ja-r/multi}}/{{ja-r/args}} and {{der-top}} etc., and the use of some lite templates. Some entries required manual transliteration in {{zh-x}}, which otherwise uses a large amount of memory when loading the pronunciation tables. The few especially problematic pages are , , , and (in particular its descendants section); although I have successfully brought them under the 50MB limit, there is not much leeway to further reduce memory usage if more content is added. </technical stuff>

With these changes, the derivative subpages are no longer needed, but they do include some useful history (which I have attributed when copying the contents over), so should they be deleted? Please also raise any other concerns about retiring the derivative subpages. Wpi31 (talk) 06:52, 4 March 2023 (UTC)

@Wpi31 I don't have any particular concerns with deleting unneeded derivative subpages. However, it would be great if you could write up a guide to optimizing memory usage, including such things as which inefficient templates to replace with which others and how, and which {{*-lite}} templates exist and (as far as you know) what their limitations are. A lot of this stuff is currently just tribal knowledge, and having it written up would go a long way towards helping other editors figure out how to do this stuff. Maybe User:Theknightwho, User:Surjection and/or IP 70.* can help augment the guide as they also have done significant work reducing memory usage. Benwing2 (talk) 07:20, 4 March 2023 (UTC)
Good idea. In fact, sometimes I have no idea how some of the lite templates work and miss out a parameter. The guide is already partly covered by Wiktionary:Lua memory errors#Tactics, but that only skims the surface of the topic, and I think it's better to split it into a separate page. – Wpi31 (talk) 07:28, 4 March 2023 (UTC)
@Wpi31 I've been seeing a few pages returning to CAT:E with memory errors after your edits. The problem is that the trend toward increasing memory usage is still going on, so what works now may not keep working for long. Be careful not to overdo it, and keep an eye on CAT:E. Chuck Entz (talk) 01:32, 6 March 2023 (UTC)

Template editor permission request

I want to edit some protected modules (such as Module:bo-pron and Module:languages/data*). Please grant me template editor permission. Thanks. -- 14:49, 4 March 2023 (UTC)

Seeking feedback on trying to add a kind of usage for terms like "fifty" as in "I was going 63 in a 50".

Is there any way to include these kinds of uses, which are common in the United States? These usages refer to the speed limit in a given region, such as "You will get a reckless endangerment charge if you're going 70 (miles per hour) in a 30 (mile per hour speed limit zone)". My suspicion is that there is no real way to catalogue these, but I'm open to feedback on whether there's some way to include this usage. —Justin (koavf)TCM 09:16, 5 March 2023 (UTC)

Not sure I understand. Are you thinking of something other than just adding a third usage under fifty#Noun? I'd think we'd also want to add an adverbial entry for the 70 in your example. kwami (talk) 10:09, 5 March 2023 (UTC)
I am thinking of that, yes. Do you think that would be an appropriate usage to add? In theory, there could be an infinite number of these added, but in practice, they would only be 25 to 70 in multiples of five. Good point on the adverb as well. —Justin (koavf)TCM 10:31, 5 March 2023 (UTC)
You would have to put up with 20 mph speed limits around here, whether it's a suburban cul-de-sac or a main road. DonnanZ (talk) 14:39, 5 March 2023 (UTC)
The phenomenon seems to me to be an example of context-specific shortening by omission of what would be the head of an NP. This seems to me to be simple pragmatics, not something lexical. Other examples using fifty:
I got a fifty(-dollar bill) from my grandma for my birthday.
He looked like he was in his fifties. (years of age)
I don't want anyone to hear that 'fifty(-caliber machine gun) at night, understand? Just cover it and leave that fat barrel sticking out for show.
6IH and 6CFE combined stations and are using a "fifty"(-watt transmitter).
[…] with Herman Hyde and John Stoker captains of fifties (units of fifty men)
He had an office on Lexington in the fifties. (from 50th to 59th Street)
By the 10th century it had become the custom to divide the Psalter into three books of fifties (fifty psalms)
Thus each fifties selector has […] access to every terminal selector of each fifty. (telephone circuits)
These large and fairly constant yields during the forties and fifties, moreover, seem not to have been caused by high prices […] (1850s)
The differences among these are whether the usage context is widely experienced (first two) or not and in what time period it was experienced (last one), not attestability. DCDuring (talk) 19:13, 5 March 2023 (UTC)
No other OneLook dictionary has fifty as an adverb. Does OED have it as one?
I'm not sure that we should have this kind of adverbial use of a noun, but we do, eg, home#Adverb. DCDuring (talk) 19:25, 5 March 2023 (UTC)
Thank you for clarifying my thinking on this: this really puts it into perspective. I was driving around thinking about how I'm "going from a 50 to a 40" and wondering how I would explain it to someone with poor English skills and whether or not it would fit here. Seems like there's no real way to capture this as a kind of definition, per the examples you gave above. —Justin (koavf)TCM 19:27, 5 March 2023 (UTC)
I agree with DCDuring. This seems to be just an omission of certain words rather than a different part of speech. — Sgconlaw (talk) 01:39, 6 March 2023 (UTC)
I'm slightly in two minds about it. The meaning is dependent on context: if you were talking about your speed in continental Europe you might use a number (say 'X') to mean 'X km/h' or 'X km/h zone' rather than mph, for example, but we do have things like both forty and 40 being used to refer to a 40 fl oz bottle and 40 referring to a score in tennis. On what basis are we deciding to keep these and reject the meaning of '40 units of speed' (which could even be 40 knots) and 'a zone with a speed limit of 40 units of speed'? --Overlordnat1 (talk) 02:44, 6 March 2023 (UTC)
Beside the size of a beer bottle, we list a monetary denomination with "A banknote or coin with a denomination of 50."
So, if we wish to be consistent, we need to decide how to handle those entries as well: Should we have no examples at all, and delete the current currency and beer-bottle entries; list only a consensus list of the most common (e.g. currencies, speed limits, ages), or have open-ended and potentially indefinite lists for whatever someone happens to find attestation for?
Or, should we perhaps have a single entry for "ellipsis of a noun phrase containing a numeral", and restrict the specific definitions of age, currency, bottle-size etc. to an illustrative but non-exhaustive list of examples under that single entry? We could then argue that the definition is complete even if some particular usage is not listed. That and the original context should hopefully then be enough for a non-native speaker to understand any instances that they come across. kwami (talk) 03:54, 6 March 2023 (UTC)
I think something like that is the best approach: have a definition that is a clipping of [number] with [unit] and usage examples "Can you lend me a 20?" and "He was going 70 in a 40", etc. But do we add them to all numbers? Multiples of five? :/ —Justin (koavf)TCM 04:38, 6 March 2023 (UTC)

Signs of a potentially problematic IP

Some of you may remember Fête, who was well known to live in Quebec and who combined poor English skills with a very peculiar way of looking at things to create lots of bad edits. That and their annoying habit of hanging out on people's talk pages and constantly asking questions got them (and their socks) globally banned. I just noticed an IP geolocating to Quebec, Special:Contributions/65.92.244.151, spending a lot of time on the entry for Phung. Based on the name of one of their sock accounts, that may very well be Fête's name in real life. I've noticed this IP for at least a few months working a lot on given names and surnames, but had no reason to connect them with anyone.

I would note that there's another Quebec IP who has been systematically adding entries on technical subjects for many years. In spite of my misgivings based on their wide-ranging subject matter and closeness to Fête's location, their edits have checked out every time I looked into it, and I don't think they're the same person.

All of this is very circumstantial. It's also true that Fête was fairly young when they were active a decade or so ago, so they might have grown out of their problematic stage. Nonetheless, this makes me nervous and I'd appreciate others looking at this IP's edits. Chuck Entz (talk) 04:31, 6 March 2023 (UTC)

For context: There are 8.5 million people in Quebec and it's two times larger than Texas. WordyAndNerdy (talk) 05:39, 6 March 2023 (UTC)
I'm aware of that, but there are certain IP editors associated with certain areas, whether it's the south end of Long Island in New York, St. Louis, Missouri (not that far from where my mother was born and raised), Occidental College in Glendale, California not far from me, Philadelphia, Pennsylvania, etc., as well as the north end of London, the Pays de Loire in France, Land Berlin in Germany, Thailand and Vietnam. That's not even getting into the IPs I run into in my checkuser work. You may not be aware of them, because your focus is on other things, but they do exist and they do have distinctive editing signatures that one learns to spot after years of patrolling IP edits. Chuck Entz (talk) 06:21, 6 March 2023 (UTC)
I don't trust that Fête has grown up one bit. — SURJECTION / T / C / L / 07:59, 6 March 2023 (UTC)

Arabic presentation forms

I've come across links from Wikipedia to , , and . These are isolated presentation forms of ۇ, ۋ, and ى, respectively, whose entries include character info boxes listing those forms. The first two links, however, give 404. Seeing that presentation forms for some Arabic letters have redirects like the third one, and that there even seems to be an rcat template, {{R character variation}}, exactly for this kind of redirect, I thought I'd create the missing two for ﯗ and ﯞ, but it turns out page titles matching .*[\x{FB50}-\x{FBB1}\x{FBD3}-\x{FDC7}\x{FE70}-\x{FEFC}].* are disallowed. Yet, some redirects matching it like clearly exist. I think it would be natural for any single-letter entry with more than one character info box on it to have them all redirecting to it. What is the policy on Arabic presentation forms? –mwgamera (talk) 07:39, 6 March 2023 (UTC)

@MwGamera:
Wiktionary:Beer parlour/2020/May § why are the unicode Arabic Pedagogical symbols blacklisted?
Wiktionary:Beer parlour/2021/August § Deleting "Hangul syllable" entries
There doesn’t really need to be a policy since we cover the language and are not a Unicode database. In the latter linked thread, RichardW57 accurately called Arabic presentation forms dead waste, not actually part of any language but a technical nuisance you would need a specific reason for to include—there wasn’t any reason other than this consideration to delete existing redirects either, so it is logical that we are inconsistent. Unresolved wilderness is also natural, often, rather than consistency. Fay Freak (talk) 14:13, 6 March 2023 (UTC)
So I understand it works as intended? Some consistency would be nice. Dictionaries should sort out the mess nature creates, not multiply it ;) And sure, I can't imagine these being anything else than redirects, but it's not that obvious they shouldn't exist at all as we have single-letter entries which prominently list their alternative encodings and calling these entries parts of a language is already a bit of a stretch. Thanks for the links and explaining the current status quo anyway!
Btw, where do I vote on removal of red links from the character info that was previously mentioned? Red links are an invitation to contribute and shouldn't point to names that are not supposed to exist. mwgamera (talk) 00:59, 7 March 2023 (UTC)
It would be possible to redirect terms containing presentation forms with JavaScript if we had a map from presentation form to regular shape-changing letters. I haven't found such a thing myself, but it's probably out there somewhere. — Eru·tuon 01:14, 2 April 2023 (UTC)
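For illustration only: Unicode itself may already supply the map Erutuon mentions, since most characters in these ranges carry a compatibility decomposition to the ordinary letter(s) (for example U+FBD7 ARABIC LETTER U ISOLATED FORM decomposes to U+06C7 ۇ), and NFKC normalization applies it. The Python sketch below assumes that NFKC is an acceptable source for the map; the function names are hypothetical, and the regex merely re-expresses the blacklist ranges quoted in the first post. It is not how any existing Wiktionary gadget works.

```python
import re
import unicodedata

# Character ranges from the title-blacklist pattern quoted in this thread,
# rewritten with Python's \uXXXX escapes (the re module does not accept
# PCRE-style \x{...}).
PRESENTATION_FORM = re.compile("[\uFB50-\uFBB1\uFBD3-\uFDC7\uFE70-\uFEFC]")


def has_presentation_form(title: str) -> bool:
    """Return True if any character of the title falls in the blacklisted ranges."""
    return PRESENTATION_FORM.search(title) is not None


def ordinary_spelling(title: str) -> str:
    """Replace each presentation-form character with its NFKC normalization.

    Only characters in the ranges above are touched, so other compatibility
    characters in the title (e.g. fullwidth Latin letters) are left alone.
    A lam-alef ligature maps to two letters; a character with no
    compatibility decomposition comes back unchanged.
    """
    return PRESENTATION_FORM.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), title
    )


# Print code points rather than glyphs to avoid terminal-encoding issues.
for form in ["\uFBD7", "\uFBDE", "\uFEEF"]:
    target = " ".join(f"U+{ord(c):04X}" for c in ordinary_spelling(form))
    print(f"U+{ord(form):04X} -> {target}")
```

In a JavaScript gadget, `String.prototype.normalize("NFKC")` should give the same mapping, though the Unicode version behind it depends on the browser.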

Inclusion of italics in head

Should they be added? Some examples are “abc conjecture”, “ad valorem tax”, “EverQuester or EverQuester”, “in terrorem clause”, “k-cell”, “Palko test”, “p-adic order”, “r/K selection theory”, “uno flatu”, “Zelda-like or Zelda-like (sense 2)” (which started this discussion: Talk:Zelda-like). J3133 (talk) 10:04, 6 March 2023 (UTC)

“Head” here means {{en-noun}}, etc., which displays it in bold. J3133 (talk) 11:13, 6 March 2023 (UTC)
Wikipedia does this in articles on certain topics, but I have found this can't be repeated on their disambiguation pages. DonnanZ (talk) 10:54, 6 March 2023 (UTC)
Were the entries only in English, then yes, at the very least for creative work terms. The use of italics for foreign terms that are not entirely localized is inconsistent, but the use of italics for book or movie or video game titles is very standard. That said, a term could be multi-lingual and the conventions of that language may not be to italicize a certain thing, so I'm inclined against it for the page title, but in favor of using it in the text of entries as appropriate. —Justin (koavf)TCM 11:04, 6 March 2023 (UTC)
None of those entries have a double header (except for EverQuester, which you created two days ago), which is what I think looks strange. Ioaxxere (talk) 13:23, 6 March 2023 (UTC)
@Ioaxxere: I do not think it looks strange. Wade-Giles has “Wade-Giles or Wade–Giles”. Perhaps the strangeness is that the words have more than one form, which itself is unusual (therefore both forms are included); otherwise, I do not see another way, unless you have a suggestion? J3133 (talk) 13:45, 6 March 2023 (UTC)
I've removed the second head from Wade-Giles. It's a typographic difference, not a legitimate alternative form (which belong at other pages anyway). Ultimateria (talk) 01:10, 15 March 2023 (UTC)
@Ultimateria: The page after your edit does not indicate that the dash is a valid alternative to the hyphen. The dash form was the only one included from 6 April 2022 (@Chuck Entz) to 5 March 2023—now it is the opposite. Also pinging @Geographyinitiative. J3133 (talk) 01:20, 15 March 2023 (UTC)
I have no effin clue what should be done on this question, not even .0001% of a clue. Wikipedia uses Wade–Giles, Wikimedia Commons uses Wade–Giles, but Wiktionary's cites show that Wade-Giles is the form in actual use, lulz. --Geographyinitiative (talk) 01:24, 15 March 2023 (UTC)
@Geographyinitiative: Wikipedia follows its own Manual of Style, which mandates the en-dash when the two linked terms aren't a morphological compound or blended into one (i.e., it's a system ascribed to Wade and to Giles, not by one person with the surname Wade-Giles). In practice Wikipedia follows this rule rather more strictly than most publishers; for example their en-dash in Polish–Lithuanian Commonwealth is very rare in published works. It's not difficult to find Wade–Giles printed with the en-dash, though; this example was on the second page of Google Books results for an intext search for "Wade-Giles" for me. —Al-Muqanna المقنع (talk)
The problem with including it as an alternative is that it's not specific to this term but potentially any hyphenated term. I don't find it necessary on this or any other entry to "indicate that the dash is a valid alternative to the hyphen". Al-Muqanna's point about the two surnames is interesting, but that's covered in the etymology of this page at least. If someone thinks it's important, feel free to revert. Ultimateria (talk) 02:06, 15 March 2023 (UTC)
@Ultimateria: I have changed Wade–Giles from a redirect to an alternative form per other terms in Category:English terms spelled with –: e.g., Trans–New Guinea was changed from a redirect by Equinox. If you think there should be a vote, then I will make one. J3133 (talk) 12:02, 15 March 2023 (UTC)
Also, this is consistent with the other entries using italics such as “Palko test”. Providing only one form would be incorrect (e.g., if “Palko test” was also used). J3133 (talk) 13:49, 6 March 2023 (UTC)
I.e., excluding the form using italics would be inconsistent if the italics were kept in other entries, because it would be misleading—indicating that italics are not used. J3133 (talk) 14:00, 6 March 2023 (UTC)

Dates

What is our policy on dating quotations and what do we think it should be? Should it be in a YYYY-MM-DD format or in either the standard British or American formats and should we be including the month and day at all? I ask because the dates for the citations for knob were changed yesterday so that the months and days were removed, though when I brought it up in a user’s talk page they did partially change it back to include the months. I don’t want to single any particular editor out, though anyone reading this can easily find out the specifics for themselves, but I think that a clear and official policy on the issue would be welcome as we should aim to be consistent. Overlordnat1 (talk) 14:26, 6 March 2023 (UTC)

I just do it arbitrarily. Where does it end? You could ask about policy to order the parameters of quotation templates and then templates in general, and at some point it is no fun any more. I think one can’t make a reliable rule on it, people would vote on it arbitrarily and subjectively too and nothing would be gained plus one would waste time to correct rule violations. Fay Freak (talk) 14:30, 6 March 2023 (UTC)
Personally, I provide whatever information is available in or about the source. If a full date is known I provide it, and if a month and year is known I provide them. As for the date format, the quotation templates currently display the year, followed by the month and date if provided. It’s helpful to have the month spelled out as a word, I think, to avoid ambiguity. — Sgconlaw (talk) 14:36, 6 March 2023 (UTC)
I agree. Vininn126 (talk) 14:47, 6 March 2023 (UTC)
I also agree. Perhaps we should create a policy that states that more specific dates should never be altered to less specific ones unless there is very real doubt as to the accuracy of the more specific date? --Overlordnat1 (talk) 14:54, 6 March 2023 (UTC)
I should also add that full dates are very useful, if not essential, when trying to verify quotations from serial publications such as magazines and newspapers. It makes little sense to indicate an article from a daily newspaper as having been published in "1950". — Sgconlaw (talk) 15:00, 6 March 2023 (UTC)
@Overlordnat1 forgot to mention that this was specifically about {{quote-book}} dates. I agree that for regularly published material (magazines, newspapers etc) we should always include the full date, I simply question the usefulness of specifying the day of publication of books. Jberkel 15:04, 6 March 2023 (UTC)
I think full dates are less critical for books. Nonetheless, if the information is provided I generally just add it. It can help in arranging quotations chronologically if there are two works published in the same year, but again this isn't a biggie. If full dates aren't available in such a situation, I just arrange them alphabetically by the authors' surnames. — Sgconlaw (talk) 15:08, 6 March 2023 (UTC)
There are occasions where a new edition is printed in the same year, I suppose. Vininn126 (talk) 15:09, 6 March 2023 (UTC)
My personal policy is to only add publishing info as stated in the frontmatter, since Google's metadata is frequently wrong, and it's pretty rare for a month or day to be listed there for a book. I wouldn't remove it if someone's added it though. —Al-Muqanna المقنع (talk) 15:15, 6 March 2023 (UTC)
Yes, I generally also use data that is published in works and don’t use Google’s metadata either, but I have sometimes accepted what is stated in a Wikipedia article about a work in good faith. — Sgconlaw (talk) 22:08, 6 March 2023 (UTC)
Virtually all the quotes I add now use {{quote-journal}}, where dates are critical. They are from magazines both new and old, the old ones are ones I bought second-hand years ago; finding their contents on the Internet is highly unlikely. DonnanZ (talk) 09:14, 7 March 2023 (UTC)

RFVE Mass Closures (Again)

A continuation of Wiktionary:Beer parlour/2023/February § Disallowing mass closures. After being told by multiple folks to slow down on the mass closures due to multiple instances of not following CFI, @Ioaxxere is only continuing and keeps closing entries against CFI. The most egregious example that was brought to my attention was with bigenital surgery, where they passed an entry with two links to white supremacist/fake news websites; two sites that have never ever been accepted for CFI. This made me go through several RFVs yet again, only to see that they've passed entries like MAMAA, and have had multiple back-and-forths with entries like antijapanese, praecognita, and Falklands Fritillary Butterfly. They've also been closing RFVs almost as soon as the entries hit the month point, which, while it's the written guideline, is usually pushed out so that folks have enough time to go through them (though the warning is appreciated). This is especially concerning because of the high number of RFVs that they've closed and then archived, leaving possible entries that would've passed or failed RFV if given enough proper review out in the wind. An example of this is y'all'd'nt've, which still does not have cites, even though I explicitly pointed out that it'd need cites. (Resolved) At first I had thought that this is just an example of inexperience, but the fact that they keep continuing and these problems keep coming up is very problematic, especially with the bigenital surgery one. Something needs to be done. Pinging @Theknightwho, @-sche, @WordyAndNerdy, @Benwing2, @Chuck Entz AG202 (talk) 15:42, 6 March 2023 (UTC)

Okay, I'm experienced enough to admit my own inexperience. I will stop passing RFVs (for at least the rest of the month), and if I think an RFV should pass I'll ping you (or any other editor who closes RFVs). By the way, the cites for y'all'd'nt've are on the lemma entry. Ioaxxere (talk) 16:09, 6 March 2023 (UTC)
I'll move those cites and edit my initial comment, apologies for that, and thanks. AG202 (talk) 16:10, 6 March 2023 (UTC)
"Lemma" doesn't mean "the main spelling of several alternate spellings": it means the main form of an inflected word, like dog for dogs. Alt spelling citations should go at the correct spelling, the one they actually provide evidence for. Equinox 16:18, 6 March 2023 (UTC)
@Equinox Some people argue that alternative spellings are not themselves lemmas, even though they may get inflected. Personally, I think there's a hierarchy of lemmas, but I can envisage a demand to categorise non-Roman script Pali words and stems as forms rather than lemmas. --RichardW57m (talk) 16:13, 10 March 2023 (UTC)
@RichardW57m: I think this is a foolish argument because we could create either "color" or "colour" as an alt form of the other one, depending on whether we felt more British or American (or other places). Clearly a "base word" is a lemma, even if it isn't our favourite one. But a conjugated form like "coloring, colouring" is not. Equinox 04:27, 13 March 2023 (UTC)
I wouldn't try giving the community a deadline ("for at least the rest of the month"). Clearly some experienced users find your standards for closing RfVs questionable. IMHO, although the size of RFVE is a problem, it is not more of a problem than premature closure of RfVs. DCDuring (talk) 16:46, 6 March 2023 (UTC)
It's not a deadline, I just thought it wouldn't be believable to declare "I will never do X for as long as I live". By the way, it's not the size of RFV that's the problem but rather the fact that RFVs are created and immediately forgotten about. Before I started going through them, there were literally hundreds of uncited terms (including hoaxes) in the mainspace with no one interested in clearing them out. Ioaxxere (talk) 17:15, 6 March 2023 (UTC)
Also, what's the time I need to wait before failing an RFV if not a month? I am following the guideline "After a discussion has sat for more than a month without being “cited”, or after a discussion has been “cited” for more than a week without challenge, the discussion may be closed." Ioaxxere (talk) 17:27, 6 March 2023 (UTC)
The time to close a difficult RfV is a matter of judgment. Judgment is largely achieved by learning from relevant experience. Also one needs a repertoire of means of resolution of RfVs. Sometimes the best thing to do is to try to find cites. Sometimes one should see what other dictionaries do (by using {{R:OneLook}} or OED, the latter by asking at RfV for help). Maybe the definition could be reworded to be more citable. Maybe it would be good to ask for help in finding cites from durably archived sources other than Google Books and News. Maybe one could guess at who might have an interest in the particular definition.
Maybe it just isn't that big a deal to let the RfV go for a while longer. DCDuring (talk) 17:38, 6 March 2023 (UTC)
What about the time to fail an "easy" RFV (no hits anywhere)? Ioaxxere (talk) 18:27, 6 March 2023 (UTC)
"No hits anywhere"? There are many sources that are not found by general Google web searches. Examples are regional terms and dated terms, but there are many types of terms that need a lot of love to find support. If no one is interested enough to do the work after some longish period, then I suppose we have to let it go, saving the record to the entry's talk page and any cites (even those not found durably archived or supporting a definition different from the challenged one) to the citations page. DCDuring (talk) 19:04, 6 March 2023 (UTC)
It occurs to me that perhaps you could ask for advice from User:Kiwima who worked hard at citing and closing RfVs for 4-5 years. DCDuring (talk) 17:44, 6 March 2023 (UTC)
When I had taken on the task of keeping the RFV list up-to-date, I never failed an RFV without personally doing a search for cites. It's time consuming, but tends to avoid closing entries that are simply uncited either because they are difficult (e.g. too many false hits because of a more common alternate definition) or that no one is particularly interested in. Kiwima (talk) 18:46, 6 March 2023 (UTC)
There’s also the fact that you need to judge whether you’re the best person to be doing the closure. For example, regionalisms or historical terms may be something that another user is much more knowledgeable about. Theknightwho (talk) 19:11, 6 March 2023 (UTC)
Yeah, agreed, this also tends to be a reason why RFVs are pending for a long time: with early modern English, for example, there are a fair number of cases where a word or sense is probably attestable, but needs legwork that most people don't have the time or inclination for. It's true that a term that failed RFV can be re-created, but failing the RFV removes the term from to-do lists and it becomes much less likely anyone will ever put in that work. —Al-Muqanna المقنع (talk) 01:04, 7 March 2023 (UTC)
Exactly. Much of Hansard (100m+ words) is not directly searchable from Google - though is easily searchable from the official website - and the same goes for a lot of legislation or law reports, where it’s simply a matter of knowing where you need to search. Never mind the fact that archives of material are often (unfortunately) behind a paywall that another user may have access to. I’m reminded of the case of plantage, where I was only able to cite it because I have access to Westlaw.
In some cases, these archives will have tens-if-not-hundreds of millions of words of material, and it’s a shame that we don’t have our own archive of resources as to where to start looking, frankly. Perhaps we should use this as the impetus to make one, because this stuff all counts as being durably archived! Theknightwho (talk) 17:19, 7 March 2023 (UTC)
FWIW, there is a list at Wiktionary:Searchable external archives which could be expanded; unfortunately, I don't know what we could do to publicize it any further, it's already mentioned in the header of WT:RFV, but I don't recall noticing it until many years after I first started editing, and evidently you didn't read or see it in that boilerplate either, heh. Maybe we could link to it in a notice that displays when someone goes to edit an RFV page (like the notices Wikipedia uses on certain contentious pages), but I don't know if people who use visual editor would ever see that. I suppose if we wanted to make it really prominent we could expand the {{rfv}} and {{rfv-sense}} templates to say something like "(check these sources)". - -sche (discuss) 23:34, 7 March 2023 (UTC)
I think adding something to {{rfv}} and maybe also {{rfv-sense}} that links to Wiktionary:Searchable external archives or similar (personally, I think a design akin to Wiktionary:Corpora is superior, though admittedly, not user friendly) is a very good idea. Kinda like how {{rfi}} links to Wikimedia Commons. Another thing is mentioning the Wikipedia Library which can let you access great corpuses like the British Newspaper Archive, Newspapers.com, and NewspaperARCHIVE.com. —The Editor's Apprentice (talk) 00:43, 8 March 2023 (UTC)
For the record, another instance of a bad closure in mind is cephalophore as a term related to fungi. 98.170.164.88 explicitly mentioned that they thought the term was citable and linked to a Google Scholar search with a few good hits. In my own search, I found these two papers as well: [1], [2]. Nonetheless, Ioaxxere closed the discussion with the determination that the term failed noting no results at all for "cephalophore mushroom" etc. which didn't really address the previous discussion, such as 98.170.164.88's search nor the fact that the term's context is around fungus/fungi rather than mushroom. In this case it has been a few months since the discussion started, so the timing wasn't the problem, but instead the approach.
I'll also say I'm still glad to have Ioaxxere as a fellow editor responsible for entries like grounders or senses like climber (structure on a playground designed to be climbed on) though those could probably benefit from a larger time span of cites/some non-web cites. I think Kiwima's statement about not closing RfVs without first giving it a real go yourself as well as the other comments under that are a good advice for how to improve things going forward. —The Editor's Apprentice (talk) 00:35, 8 March 2023 (UTC)
@The Editor's Apprentice In that discussion User:Chuck Entz and 98.* noted that the term might be attestable in a different sense, but the RFV was only for that specific sense. By the way, if you like non-web cites check out the quotations on devilfish. Ioaxxere (talk) 04:19, 9 March 2023 (UTC)
It's (or at least it was) fairly common for editors to change the definition after discussions like those, so that the cites can fall under one definition. AG202 (talk) 12:37, 9 March 2023 (UTC)
Thanks for the reply @Ioaxxere I get your reasoning and will echo what AG202 said. On another note, I want to apologize for the note about web cites. It was an unnecessary comment, especially given your work with other terms like devilfish, as you point out. Overall, it made the message a backhanded compliment. I'll leave minor and irrelevant criticism aside in the future when I'm trying to show my appreciation to you and other editors. —The Editor's Apprentice (talk) 19:54, 10 March 2023 (UTC)

We were blessed with a new frequency list today. This one includes collocations, as well as lots of unwanted crap, but there are plenty of missing entries that we would be delighted to include. I made a list of a small selection of semi-glaring omissions at Wiktionary talk:Frequency lists/English/Wikipedia (2016)/10001-20000 Van Man Fan (talk) 20:58, 6 March 2023 (UTC)

A lot of these are not entry-worthy, but it might be interesting to build a "collocation suggester" with this data. – Jberkel 21:09, 6 March 2023 (UTC)
@Jberkel I suspected as much, I was kinda disappointed with the amount of dross - I might try again with more of a mix of sources, which has worked better for some languages than others.
P.s. if you are thinking of building a collocation suggester, I recommend going to the original source, since a lot of the work has already been done - though it wasn't directly relevant to what I am using the data for. Even better, it's under a suitable licence. Helrasincke (talk) 09:53, 19 March 2023 (UTC)

Coverage of Sign Languages

Though it was thriving at one point, our coverage of Sign Languages has fallen to the wayside in recent times. There's a strong need for more robust entries, but the barrier to entry is very high currently. Sign language entries either follow very complex entry names like 5@NearInsideNosehigh-PalmBack-5@NearInsideNosehigh-PalmBack 5@NearInsideNeckhigh-PalmBack-5@NearInsideNeckhigh-PalmBack or use Signwriting in entry names like 𝡌𝪛𝨒𝤆 or 𝣷𝪜 𝤃𝪜 𝣜𝪜 which isn't encoded the way it should be in Unicode (should also be vertical, which I've attempted to recreate in my own common.css). I myself have been trying to create the ASL entry for BODY as seen at BODY at Handspeak, but being that it's a multi-move sign, it gets very complex. The first entry type would be OpenB@Chest-PalmBack-OpenB@Chest-PalmBack Contact Contact OpenB@Abdomen-PalmBack-OpenB@Abdomen-PalmBack based on WT:AASE, and then the second type with Signwriting would be even more complex with something like 𝡚𝪧𝡚𝪡𝤅𝤅𝤪𝪛𝪤𝤪𝪤𝤅𝤅, but it doesn't feel the best. This is before we start getting into nonmanual markers and differences between for example, SNOW vs ¡SNOW!(chhh) (which can mean "blizzard/snowstorm", nonmanual marker depending on the speaker). Thus, there needs to be more clarity on it, especially since a lot of main contributors haven't been active in years.

On a second point, there's also a lack of proper family/etymology coverage for Sign Languages. (Old) French Sign Language is almost universally accepted as an ancestor of ASL, but it's not set as an ancestor here, nor is Old French Sign Language even an etymology-only language. Category:French Sign Languages only has three sign languages, and is missing many more. This makes it more difficult for future sign language coverage (as we can't show signs coming from ancestor languages). For these changes, I plan on going through at least the ASL subfamily on my own at some point, though support would be appreciated. Pinging @Rodasmith, @Numberguy6, @Msh210 AG202 (talk) 05:39, 7 March 2023 (UTC)

Thinking out loud a bit, this seems like a topic that would really benefit from specialists (even more so than most languages), and would probably be a good application for a grant. —Justin (koavf)TCM 07:39, 7 March 2023 (UTC)
@AG202, Koavf Agree on all counts with the points both of you are making. I have passing familiarity with ASL and I know it's very different from your typical spoken language due to the way signs work: hands cannot move as fast as the mouth, so to make up for this the individual signs encode a lot more information than phonemes do. Sign-language research seems much less developed than spoken-language research and most linguists do not focus on sign languages, with the result that (AFAIK) there isn't even a universally accepted way of symbolically representing signs. Benwing2 (talk) 23:45, 12 March 2023 (UTC)
If you're motivated to work on it, I'm a grant writer and I have a personal interest in sign language documentation, so I'd be happy to collaborate on asking for funds, reporting, etc. —Justin (koavf)TCM 23:47, 12 March 2023 (UTC)
Alas, I don't have time to focus on sign languages currently but maybe someone else will; I didn't realize you are a grant writer. Benwing2 (talk) 05:04, 13 March 2023 (UTC)
I believe the records indicate that ASL was partly relexified with FSL, but not actually descended from it. But in general the evidence for genealogical relationships among sign languages is extremely poor, and often family proposals are little better than guesswork. If we don't expect "FSL family" to mean anything more than "contains significant FSL vocab" (the way for example English would be a Romance language, and Japanese both a Sinitic and a Germanic language), then listing ASL in the FSL family would probably make navigation easier. kwami (talk) 05:40, 13 March 2023 (UTC)
@Kwamikagami My understanding was that ASL descends directly from Old LSF (Old French Sign) with a mix of native languages. At least that's what the works that I've looked at say. For example, A Historical and Etymological Dictionary of American Sign Language (2015, published by Gallaudet Uni Press) (intro linked here) states, "Since some of the first generations of ASD students were from the island, it is likely that a number of Martha's Vineyard Sign Language (MVSL) signs were incorporated into ASL, though probably less than is typically assumed." It also estimates that 20% of the 300 documented MVSL signs were cognate with ASL signs, but it's unclear which loaned to which. Still, it makes clear that there's a link from Old LSF to ASL and that the latter inherits from the former. Thus, with this + other sources, I'd say that it's clear enough to give ASL Old LSF as an ancestor and make it part of that language family. AG202 (talk) 23:05, 14 March 2023 (UTC)
From my understanding, Clerc couldn't understand the students and needed to learn something of their language. The original students mostly spoke 3 village SL's, with MVSL numerically dominant. I don't know if the result was a mixture, but the basis of ASL was not FSL. Especially since Clerc would have been mostly teaching them vocab: nearly all the grammar would have been whatever the students converged on, as they were already fluent, and there was only one speaker of FSL but dozens of MVSL. They certainly adopted a lot of FSL words, but for oral languages that would be considered secondary. kwami (talk) 00:15, 15 March 2023 (UTC)
Is it possible that you could provide sources? Not denying what you’re saying, but from what I’ve seen, there’s more evidence to prove that it does in part come from Old FSL, at least what we have access to. And re: MVSL, the above says: “No more than four students from Martha’s Vineyard were present at ASD at the same time until the 1850s and 1860s, when their attendance peaked at around twelve students (Annual Report 1887)”. This is not as much as expected, and if there are truly only 60 documented cognates between MVSL & ASL vs the many, many cognates of LSF, then I’d be hard-pressed to see how the grammar + syntax would be as heavily impacted by MVSL. AG202 (talk) 00:24, 15 March 2023 (UTC)
I don't recall the ref for Clerc not being able to understand the students. Something somewhere in what he himself wrote, if I remember correctly, rather than from Gallaudet.
May've been wrong about the number of MVSL-speakers. But when you bring deaf children together, they develop their own language. Yes, FSL will be the lexifier, but, as with a creole, it's not likely to be the basis of the grammar, because the students are already fluent in their own grammar. FSL wasn't transplanted here and then impacted by contact with other SLs; rather, children speaking those SLs were taught FSL by a single teacher, but most of their interaction, reinforcing the grammar they used, would've been with each other. This is quite a common occurrence with SL's, and in general SL 'families' are not going to be structured the same way as oral-language families, where it's native speakers who diverge from each other over time. kwami (talk) 00:34, 15 March 2023 (UTC)
Thank you so much for the info. I don't have any personal experience with grant writing, but I'd be happy to collaborate where I can. AG202 (talk) 13:12, 13 March 2023 (UTC)

Requesting rollback

I started checking Special:RecentChanges regularly recently to revert vandalism. I'd like to request the rollback right since some people make several edits that should be reverted in one go. I'm generally conservative and only revert if I'm sure. I sometimes point questionable edits out on Discord. Thanks! -- tbm (talk) 07:41, 8 March 2023 (UTC)

Approved by @Fenakhay. Vininn126 (talk) 09:03, 8 March 2023 (UTC)

Northern Kurdish alphabet

Hi, everybody.
I wanted to ask: what is the policy on Latin-script Northern Kurdish entries? Are we supposed to employ the base Hawar alphabet, or the version including ⟨ḧ ẍ '⟩ for /ħ ɣ ʕ/? Should we follow the system listed at Wiktionary:Kurdish transliteration? Thanks in advance for any input on the subject. — GianWiki (talk) 17:28, 8 March 2023 (UTC)

@GianWiki AFAIK, Northern Kurdish is by default written in Latin script, so we need to follow whatever the actual usage is. If you have a Northern Kurdish dictionary that lists terms in Arabic script, it's undoubtedly outdated, and you should use a different one if possible. You may have to do some research to find out what the current usage is. I do see Wikipedia's article Kurdish alphabets, which mentions the base Hawar alphabet and in addition the ⟨ḧ ẍ '⟩ that Celadet Alî Bedirxan proposed using. I would (a) look to see what current entries do; (b) look at the "Kurdî" Wiktionary (which is Kurmanji/Northern Kurdish) at [3]; (c) consult with native Northern Kurdish speakers. I don't know if any are very active currently at Wiktionary but I know I spoke with some when I split "Kurdish" into Northern Kurdish and Central Kurdish. You can find the discussion of this split in Wiktionary:Beer parlour/2020/September#Kurdish and Wiktionary:Beer parlour/2020/October#Remaining Kurdish lemmas, where I spoke with @Balyozxane, Calak, Şêr and also with @Vahagn Petrosyan (who is Armenian but may be able to help with Kurdish). Benwing2 (talk) 23:33, 12 March 2023 (UTC)
@GianWiki: there is no formally agreed policy, but in practice we use the base Hawar alphabet. <ḧ ẍ '> as well as the aspiration symbol <’> should not be included in the pagename, but they should be shown in the headword line using the parameter head= as in antêx. Vahag (talk) 15:00, 13 March 2023 (UTC)
I see. Thank you very much for your help! — GianWiki (talk) 16:50, 13 March 2023 (UTC)
There's something I forgot to ask: aside from ⟨ḧ ẍ ' ’⟩, should the head= parameter also show a particular character for the trilled /r/? I saw the transliteration chart uses ⟨ř⟩, while the Ferhenga Birûskî: Kurmanji–English Dictionary uses ⟨r̄⟩. Is one of them preferable to the other? — GianWiki (talk) 17:31, 13 March 2023 (UTC)
You should show the trilled r in the headword. I don't think we ever discussed which symbol is preferable. Vahag (talk) 18:47, 13 March 2023 (UTC)
Many thanks for the advice; you've been extremely helpful! — GianWiki (talk) 16:51, 13 March 2023 (UTC)
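Vahag's advice amounts to a simple rule for deriving a page name from a head= value. The Python sketch below is a minimal illustration under stated assumptions: ḧ and ẍ map to plain h and x, the trilled-r spellings ř and r̄ map to plain r (inferred from "show the trilled r in the headword", not stated as a rule), ' and ’ are dropped, and every other base Hawar letter (ç, ê, î, ş, û) is left untouched. pagename_from_head is a hypothetical helper, not an existing Wiktionary module function.

```python
import unicodedata

# Assumed headword-only characters and their page-name equivalents.
_REPLACEMENTS = {
    "\u1E27": "h",  # ḧ  /ħ/
    "\u1E26": "H",  # Ḧ
    "\u1E8D": "x",  # ẍ  /ɣ/
    "\u1E8C": "X",  # Ẍ
    "\u0159": "r",  # ř  trilled r (assumed to be plain r in the page name)
    "\u0158": "R",  # Ř
    "'": "",        # /ʕ/ marker, dropped
    "\u2019": "",   # ’ aspiration marker, dropped
}


def pagename_from_head(head: str) -> str:
    # NFC first, so decomposed input such as "h" + U+0308 becomes ḧ.
    text = unicodedata.normalize("NFC", head)
    # r̄ has no precomposed form, so remove the combining macron explicitly.
    text = text.replace("r\u0304", "r").replace("R\u0304", "R")
    return "".join(_REPLACEMENTS.get(ch, ch) for ch in text)
```

Mapping only the listed characters, rather than stripping all diacritics, is deliberate: a blanket "remove combining marks" approach would also destroy ç, ê, î, ş and û, which belong to the base alphabet.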

Proto-Italic/Proto-Hellenic IPA

Would anyone be opposed to a mass removal of unsourced IPA from Proto-Hellenic and Proto-Italic entries? These have been added by one IP-hopping anonymous editor and the quality is questionable. Are there any other protolanguages with this same problem? — SURJECTION / T / C / L / 21:05, 8 March 2023 (UTC)

(Yes, kill, they're wrong, thank you.) Catonif (talk) 21:09, 8 March 2023 (UTC)
I don't oppose that. In general, I see no justification for including IPA on proto-language entries: since these words are not attested in writing, their spelling itself is typically phonemic. IPA or narrow phonetic transcriptions are unnecessary and often debatable.--Urszag (talk) 04:10, 9 March 2023 (UTC)
Yeah, I would support not having reconstructed IPA pronunciations without a source (as some PIE entries have, like *h₂éwis). If there are no objections within the next few days, I'll have a bot job remove itc-pro and grk-pro IPAs. — SURJECTION / T / C / L / 12:18, 10 March 2023 (UTC)
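A bot pass of the kind proposed here might look roughly like the following Python sketch. It assumes each unwanted transcription sits on its own bulleted line of the form * {{IPA|<code>|...}}; the codes in TARGET_CODES are assumptions to be checked against Wiktionary's language data, strip_ipa_lines is a hypothetical name, and the actual bot job may work quite differently.

```python
import re

# Assumed language codes for Proto-Italic and Proto-Hellenic.
TARGET_CODES = frozenset({"itc-pro", "grk-pro"})

# A whole line consisting of a bulleted {{IPA|<code>|...}} call.
IPA_LINE = re.compile(r"^\*\s*\{\{IPA\|([^|}]+)\|[^\n]*\}\}\s*$")


def strip_ipa_lines(wikitext: str, codes: frozenset = TARGET_CODES) -> str:
    """Drop IPA lines whose language code is in `codes`; keep everything else."""
    kept = []
    for line in wikitext.split("\n"):
        match = IPA_LINE.match(line)
        if match and match.group(1) in codes:
            continue  # unsourced reconstructed IPA: remove the whole line
        kept.append(line)
    return "\n".join(kept)
```

Note that this leaves any now-empty Pronunciation header in place; a real run would need to remove those too, and would probably log each change for review.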
Remove them for Proto-Celtic and Proto-Brythonic as well. --– Sokkjō 06:12, 16 March 2023 (UTC)
Absolutely, and block the IP if they revert any deletions. --– Sokkjō 05:25, 12 March 2023 (UTC)
@Surjection Yes, please remove the IPA. I'm generally opposed in any case to IPA attached to reconstructed languages for the reasons enumerated by User:Urszag. I would even argue we should remove the IPA from Proto-Germanic, because it doesn't seem to add a lot compared with the spelling and may not represent a consensus (and if it's kept, it should DEFINITELY be generated by a pronunciation module rather than hard-coded manually). Benwing2 (talk) 23:39, 12 March 2023 (UTC)
I don't think that IPA is a good idea for reconstructed languages, unless the reconstructions actually use IPA. People are going to read IPA transcriptions as indicating pronunciation, which will be misleading. Any orthography is likely to mislead people that way, but using IPA is likely to make it worse. Certainly if it's appended to the reconstruction, as if to say "this is what it really sounded like," that would be problematic. kwami (talk) 05:48, 13 March 2023 (UTC)
I think having a phonemic transcription for proto-languages that use an obscure orthography is not a bad idea. Karen kalantari (talk) 05:31, 16 March 2023 (UTC)
What you're suggesting is that we add a second orthography. It won't be any more phonemic than the first one. At that point, IMO it would be simpler to use a reformed orthography that is more accessible. kwami (talk) 05:52, 16 March 2023 (UTC)
That's a good solution, or at least talk about the orthography in the "about: proto-language" section. Some languages, like Proto-Turkic, lack this. Karen kalantari (talk) 06:38, 16 March 2023 (UTC)
If an orthography is obscure, i.e. limited to one or a few researchers, I think it might be reasonable to transliterate it into a more international system. But if it's the norm in its field, then we will presumably want to stick with convention because that's what most RS's will be using. (That would be different from the IPA, where we generally normalize transcriptions, because the IPA has internationally accepted values. Reconstructions in general do not.) I'm doubtful about the utility of having multiple transcriptions. I suspect in many cases that would only cause confusion.
In some cases, reconstructions are intentionally agnostic. (E.g. capital letters that could mean almost anything.) Also, researchers may disagree as to what sound *G was. In such cases, using IPA could give a wrong impression of phonetic precision or of agreement among scholars. kwami (talk) 06:45, 16 March 2023 (UTC)

POS headers / headword lines

I'm currently in the process of recreating the entry stats for Wiktionary, and came across some inconsistencies while working on the parser: According to WT:EL, there's always one headword line per POS header.

Each entry has one or more POS sections. In each, there is a headword line

For two nouns, this would look like:

===Noun===
{{head|xx|noun}}

# def

===Noun===
{{head|xx|noun}}

# def

However, some entries group headwords under a single POS header:

===Noun===
{{head|xx|noun}}

# def

{{head|xx|noun}}

# def

Am I interpreting WT:EL correctly? What should be the standard formatting? From a parsing perspective, the first option is easier. Jberkel 10:54, 10 March 2023 (UTC)
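For a parser like the one described here, the two layouts can be told apart by counting headword lines per POS section. A minimal Python sketch, assuming headword lines start with {{head| (language-specific templates such as {{en-noun}} would need to be added) and using a deliberately small, assumed set of POS headers; headwords_per_pos and nonconforming are hypothetical names.

```python
import re

# A header line such as "===Noun===", capturing its title.
HEADER = re.compile(r"^(=+)\s*([^=]+?)\s*\1\s*$")
# A small, assumed subset of the POS headers allowed by WT:EL.
POS_HEADERS = {"Noun", "Proper noun", "Verb", "Adjective", "Adverb", "Pronoun"}


def headwords_per_pos(wikitext: str) -> list[tuple[str, int]]:
    """Return (POS header, number of {{head|...}} lines) for each POS section."""
    sections: list[list] = []  # [title, count] pairs in page order
    current = None
    for line in wikitext.splitlines():
        m = HEADER.match(line)
        if m:
            # Any header ends the previous section; only POS headers open a new one.
            current = [m.group(2), 0] if m.group(2) in POS_HEADERS else None
            if current is not None:
                sections.append(current)
        elif current is not None and line.startswith("{{head|"):
            current[1] += 1
    return [(title, count) for title, count in sections]


def nonconforming(wikitext: str) -> list[str]:
    """POS sections that do not have exactly one headword line."""
    return [title for title, count in headwords_per_pos(wikitext) if count != 1]
```

On the first layout above this reports two Noun sections with one headword each; on the second it reports a single Noun section with two.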

Can you give concrete examples? In cases like English denier and Turkish melemen, the two meanings have different etymologies and are in different Etymology sections. Are there nouns that are homographs and have a common etymology, yet are in some sense distinct rather than a single noun with two distinct senses? --Lambiam 00:08, 12 March 2023 (UTC)
@Jberkel I have encountered the second style above occasionally and I view it as purely an error, and always correct it by duplicating the header above the second headword to make it look like the first style given above. The second style is not common and I think it occurs either because people aren't familiar with WT:EL, or because it's a holdover from several years ago before WT:EL got solidified. Benwing2 (talk) 23:16, 12 March 2023 (UTC)
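The manual fix described here (repeating the POS header above each additional headword) could be automated along these lines. This is a minimal Python sketch, assuming headword lines begin with {{head| and only the listed POS headers are considered; split_shared_pos is a hypothetical name. Its output should still be reviewed by hand, since an extra headword sometimes belongs in a separate Etymology section rather than a sibling POS section.

```python
import re

# A header line such as "===Noun===", capturing its title.
HEADER = re.compile(r"^(=+)\s*([^=]+?)\s*\1\s*$")


def split_shared_pos(wikitext: str,
                     pos_headers: frozenset = frozenset({"Noun", "Verb", "Adjective"})) -> str:
    """Repeat the POS header above every extra {{head|...}} line in a section."""
    out: list[str] = []
    current_header = None  # POS header line currently in effect, if any
    seen_head = False      # whether this POS section already has a headword
    for line in wikitext.splitlines():
        m = HEADER.match(line)
        if m:
            current_header = line if m.group(2) in pos_headers else None
            seen_head = False
        elif current_header is not None and line.startswith("{{head|"):
            if seen_head:
                # Second headword under one header: give it its own copy of the
                # header, separated from the preceding text by a blank line.
                if out and out[-1] != "":
                    out.append("")
                out.append(current_header)
            seen_head = True
        out.append(line)
    return "\n".join(out)
```

Applied to the second layout in the original question, this produces exactly the first layout, and it leaves already-conforming pages unchanged.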
@Lambiam: Yes, at least given what we have on Wiktionary at the moment.
  1. There are dhātu f and dhātu m (root of a word), though personally I think only the masculine form is borrowed rather than inherited. Even if the feminine is chiefly inherited, it is also used to mean 'root of a word', as well as in a range of other meanings.
  2. More securely, there are palāsa m or n (leaf), palāsa n (foliage) and palāsa m (bastard teak). The latter is so called because of its red petals; in Sanskrit the word includes the meaning 'petal'. The first two senses would benefit from an assembly of quotations to confirm the associations of gender, number and meaning. There's also an adjective sense under the same etymology, palāsa (green).
For English, we might wind up with monosyllabic cafe in the same section as disyllabic cafe. However, the pronunciation difference might lead to separate etymologies! --RichardW57m (talk) 17:40, 13 March 2023 (UTC)

Frequency information

Discussion moved from Wiktionary:Tea_room/2023/March.

I recently found (and made a template for) {{R:pl:SFPW}}, and I am thinking about using this somehow. Sadly, it's from 1990, which is a little dated, so some things might have changed, but it should still be interesting for people. I am considering making a template, something like {{pl-freq 1990}}, which when given certain parameters would print various information about the frequency of the given word automatically.

Question 1: Has frequency information like this ever been documented in the mainspace? The closest I've seen is the information on surnames. Question 2: What section should this go under? Currently surname information is listed under the non-standard header "Statistics", but I think this and frequency information should be put under the header "Trivia", which is a header listed on WT:ELE. Vininn126 (talk) 00:06, 10 March 2023 (UTC)

I have included an example of this on the page sprawa. If anyone thinks it should be done differently, please let me know. Otherwise I would like to set this as the standard for such things in the future. Vininn126 (talk) 11:26, 10 March 2023 (UTC)
This is how I would rewrite the template for clarity and concision:
The Słownik frekwencyjny polszczyzny współczesnej (1990) found sprawa to be the 47th most common word in Polish, appearing 77 times in scientific texts, 243 times in news, 335 times in essays, 114 times in fiction, and 114 times in plays, totaling 883 uses.
I've moved the frequency up, removed the factoid "one of the top 10,355" (obviously if it's 47th), and removed the size of the corpus since it didn't seem relevant to the frequency of any particular word. I recognize that the latter two changes are a matter of personal taste. I think the implementation is fine; the entry looks great overall. Ultimateria (talk) 00:58, 15 March 2023 (UTC)
I think the size of the corpus is hugely important when presenting these numbers, so people can do the math themselves. Vininn126 (talk) 11:12, 15 March 2023 (UTC)
Why does it still say "one of the top 10,355"? This isn't useful information.
The breakdown of numbers by different genre is also quite useless, as we don't know the breakdown of the totals.
If anything, only parameters 6 and 7 should be included; this will also help readability, as a list of numbers is a little bit hard to digest. Something like "in a corpus of ... words, ... appeared ... times, making it the ... common word", etc. 85.255.237.74 04:28, 2 June 2023 (UTC)
1) How is that not useful?
2) How is this not useful either? Vininn126 (talk) 09:19, 2 June 2023 (UTC)
1) As already mentioned by Ultimateria, if you say a word is the 147th most common word, then that it is in the top 1234567 words is already known, hence not useful.
2) When I say that "in a corpus of Y words, word A appears X times", this gives me useful information about the frequency of the word appearing: I can expect the word to appear X/Y of the time, as a fraction. But notice that if you omit Y, X loses all meaning. Similarly, you might be interested in the ranking "word A is the 147th most common word". At the moment there is information like "word A appears 3874 times in science etc." omitting the total number of words in the science corpus (!). This only makes sense when you make some assumptions about what proportion of the corpus of Y words is made up of science words. 82.46.123.120 14:19, 2 June 2023 (UTC)
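The X/Y point can be made concrete with a small Python sketch. per_million and genre_rates are hypothetical helpers, not part of any existing template: a raw count only becomes comparable across words or sources once it is divided by the size of the corpus it was counted in. The example figures are the ones quoted for sprawa in this thread (883 occurrences in total, 77 in scientific texts), with the 500,000-word total and the equal 100,000-word subcorpora taken from the replies below.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Scale a raw count to occurrences per million words (X / Y * 1,000,000)."""
    if corpus_size <= 0:
        raise ValueError("corpus size must be positive")
    return count / corpus_size * 1_000_000


def genre_rates(counts: dict[str, int], subcorpus_size: int) -> dict[str, float]:
    """Per-million rate for each genre, assuming all subcorpora have the same size."""
    return {genre: per_million(n, subcorpus_size) for genre, n in counts.items()}


# Figures quoted for "sprawa" in this discussion.
total_rate = per_million(883, 500_000)                            # about 1766 per million
science_rate = genre_rates({"science": 77}, 100_000)["science"]  # about 770 per million
```

Without the corpus size, the same count of 883 could mean a very common word or a rare one, which is exactly the objection raised above.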
I support having this information in general, but saying it is in the XXXX most common words is completely useless if you then say what actual rank this has. The way the template is worded doesn't even tell me that the top ten thousand-whatever words were analyzed, because for all I know (and this would have been my interpretation if the number was a round number), there is a second category that goes to 20,000 words. So I'd either say, "Of 10,XXX words analyzed" or drop that information. And if they only counted the most common words, I would drop it in that case as well, since then the rank of the word tells you all you need to know. Andrew Sheedy (talk) 14:28, 2 June 2023 (UTC)
I have updated the template to just take that part out of the wording. Vininn126 (talk) 14:57, 2 June 2023 (UTC)
I find that first argument somewhat compelling and I can probably omit that. As far as the genres are concerned, I feel it's important because if you look at some words, you'll see, for example, that no#Polish is more popular in some areas than in others. It's similar to labels, etc. I could explain how big each corpus is (they are all equal in size, equally dividing the entire corpus). Vininn126 (talk) 14:30, 2 June 2023 (UTC)
Ok, makes sense. In that case each subject corpus is a round 100,000 words.
If you want to keep all the info in, maybe something like this:
According to ..., ... is the Xst most common word in a corpus of 500,000 words, appearing A times in scientific texts, B times in news, C times in essays, D times in fiction, and E times in plays, each out of a corpus of 100,000 words, totaling A+B+C+D+E times, making it the Zst most common word in a corpus of 500,000 words. 82.46.123.120 15:00, 2 June 2023 (UTC)
Oops, removed the duplication I left in:
According to ..., ... is the Xst most common word, appearing 191 times in scientific texts, 161 times in news, 128 times in essays, 199 times in fiction, and 169 times in plays, each out of a corpus of 100,000 words, totaling 848 times, in a corpus of 500,000 words. 82.46.123.120 15:02, 2 June 2023 (UTC)
Sure. Vininn126 (talk) 15:08, 2 June 2023 (UTC)
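The wording agreed on above could be generated mechanically from the five genre counts. The Python below is only a sketch of that logic, not the template actually used on sprawa; the function names, the fixed genre order, and the assumption of equal-sized subcorpora are illustrative.

```python
GENRES = ("scientific texts", "news", "essays", "fiction", "plays")


def ordinal(n: int) -> str:
    """English ordinal: 1st, 2nd, 3rd, 4th, ..., 11th, 12th, 13th, 21st, ..."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def freq_sentence(source: str, word: str, rank: int,
                  counts: list[int], subcorpus_size: int) -> str:
    """Render the agreed wording from one count per genre (in GENRES order)."""
    if len(counts) != len(GENRES):
        raise ValueError(f"expected {len(GENRES)} counts, got {len(counts)}")
    parts = [f"{n} times in {genre}" for n, genre in zip(counts, GENRES)]
    listed = ", ".join(parts[:-1]) + ", and " + parts[-1]
    total_size = subcorpus_size * len(GENRES)  # equal-sized subcorpora assumed
    return (f"According to {source}, {word} is the {ordinal(rank)} most common word, "
            f"appearing {listed}, each out of a corpus of {subcorpus_size:,} words, "
            f"totaling {sum(counts)} times in a corpus of {total_size:,} words.")
```

Computing the total from the genre counts, rather than passing it in separately, keeps the two figures from drifting apart; for sprawa (77, 243, 335, 114, 114) it gives the 883 quoted earlier.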
  • A long, long time ago, Wiktionary contained word frequencies for English words, using Template:rank. We decided later on it was dumb. Van Man Fan (talk) 03:10, 18 March 2023 (UTC)
    Interesting. I don't see any arguments why in that thread, but I'm willing to hear them. Vininn126 (talk) 10:18, 18 March 2023 (UTC)
    @Vininn126 I'm not huge on the idea of including frequency here mostly because it's a very vague concept (in what linguistic domain? encompassing which register/s, time periods, geographic groupings?) and we don't have the resources or access to the kind of high quality data which are required for these statistics to really be reliable. Relative frequency rank is only really valid if you have a truly representative corpus. That said, if you are interested in working on this anyway with the resources we do have access to, there's no need to reinvent the wheel. Here it is in action. Maybe these numbers could be incorporated somehow if you want to build a test-case. Helrasincke (talk) 11:43, 20 March 2023 (UTC)
    @Helrasincke I agree providing context is incredibly important with this. I am basing this off a frequency dictionary printed some time ago, but have tried to include all the relevant information. I wouldn't be opposed to including other sources, of course. If you actually look at the implementation, I think you'll see I've tried to explain everything needed. Vininn126 (talk) 11:47, 20 March 2023 (UTC)
@Vininn126: As someone who relies heavily on frequency lists for language learning, I'm in favor of using frequency lists to improve coverage, but I'm not sold on including it as a Trivia section for the reasons mentioned by User:Helrasincke. A quick look at our frequency lists shows that the top 10 most common English words in English Wikipedia are "the of and to in a is was that for" while the top 10 words used in [4] are "the I to and a of was he you it". They don't even agree on the top two most common words, but at least 9 words do appear on both lists. The 2000th most common fiction word is "teen", which appears at position 17,159 in the Wikipedia list. Telling readers that "teen" is either the 2000th most common word or the 17,159th most common word really doesn't give them any useful information, and presenting only one could be unintentionally misleading. JeffDoozan (talk) 16:40, 2 June 2023 (UTC)
@JeffDoozan So how should that be presented? I also don't see how it could be misleading. Vininn126 (talk) 16:42, 2 June 2023 (UTC)
@Vininn126: I don't think it should be presented to the reader at all. It could be misleading because if I'm unfamiliar with the word "teen", search for its definition on Wiktionary, and am told that it's the 17,159th most common word, I might think that it's a relatively uncommon word (on par with interspersed, microcode, or socioeconomic according to the Wikipedia list). JeffDoozan (talk) 16:48, 2 June 2023 (UTC)
So how would you word it? I'm just trying to present the information in this specific dictionary as flatly and clearly as possible - i.e. giving the year, specific dictionary, etc. Vininn126 (talk) 17:04, 2 June 2023 (UTC)
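The point that a word's rank depends entirely on which corpus it was counted in is easy to demonstrate. In this Python sketch, rank_of is a hypothetical helper and the two token lists are invented toy data, not real frequency lists.

```python
from collections import Counter
from typing import Optional


def rank_of(word: str, tokens: list[str]) -> Optional[int]:
    """1-based frequency rank of `word` in `tokens`, or None if it never occurs.

    Counter.most_common() orders equal counts by first occurrence, so ties
    are broken by where a word first appears in the token list.
    """
    counts = Counter(tokens)
    if word not in counts:
        return None
    for rank, (candidate, _) in enumerate(counts.most_common(), start=1):
        if candidate == word:
            return rank
    return None  # unreachable, kept for clarity


# Invented toy corpora, only to show that the same word can rank very differently.
corpus_a = "the cat saw the dog and the teen".split()
corpus_b = "teen teen the teen saw a teen".split()
```

Here "teen" ranks 6th in the first toy corpus and 1st in the second, so any rank shown to readers is only meaningful alongside the corpus it came from.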

More on alternative forms

This time it's the several variants of inscripturated. All of them, best I can tell from informal research, are of relatively recent coinage, probably by theologians, who are the main users of these words. There is inscripturate, which (like its coordinate term incarnate) can be either a verb or an adjective, and clearly deserves its own entry. Then there's enscripturated, which appears to be merely an alternative spelling (much less common) of "inscripturated". Last and least, a day or so ago the ongoing compulsion to dig further here got the better of me, and I found that enscriptured is yet another form (even less common, but still attested in reputable sources). How should it be listed? As simply yet another alternative form of "inscripturate"? That doesn't quite seem to fit. Surely not as a full-fledged word in its own right, a synonym of "inscripturate" that just happens to look similar. Or are we allowed to label a word as an alternative form of an alternative form? That is what this seems to be. – HelpMyUnbelief (talk) 13:26, 10 March 2023 (UTC)

I think the argument can be made either way really: whether certain forms ultimately represent a single word is a qualitative judgement and your example seems borderline. I've seen some people suggest that altforms have to be purely orthographic, but I imagine few people disagree with aluminum being listed as an altform of aluminium, and that's not a purely orthographic variation: they're pronounced differently too. The guideline at WT:FORMS is simply that altforms are "variants of a single word" that should be identical in meaning, not errors, and satisfy the CFI. —Al-Muqanna المقنع (talk) 14:02, 10 March 2023 (UTC)

On our list of English one-letter words

The list for English one-letter words only has three, the standard a, I, and O. However, we list far more one-letter words than that. Here's a list, separated by how iffy they are:

Words

  • A - "London euph. for arsehole"
  • c - "Alt. form of c., as in circa"
  • C - "One hundred dollars"
  • D - "Slang for dick"
  • d - "Abbr. for down, in the crossword sense"
  • E - "Slang for ecstasy"
  • e - A Spivak pronoun
  • F - "Fahrenheit"
  • f - "Euph. for fuck"
  • G/g - "Unit of gravitational acceleration"
  • H - "Abbr. for heroin"
  • h - "Internet filler response"
  • J/j - "A marijuana cigarette"
  • K/k - "OK"
  • L - "Slang for loss"
  • n - "Shortening of and"
  • o - "Zero"
  • p - "pretty"
  • Q - "QAnon, anon. person on message boards"
  • R/r - "radius"
  • T/t - "time"
  • U - "Char. of the upper class, as in language"
  • v - "Abbr. of versus, in the name of a case"
  • W - "Slang for win."
  • X - "Obscene, as in a film"
  • x - "Ship indicator"
  • Y - "Facility run by the YMCA/YWCA"
  • Z - "Z-drug"

Iffy

I suggest that we include at least some of the ones from the upper list, and leave a note at the top of the category saying that the common ones are a, I, and maybe O. Three citations, for all senses. (talk) 20:15, 11 March 2023 (UTC)

What is more iffy about those on the second list? Most I recognize as being commonly used abbreviations. --Lambiam 23:52, 11 March 2023 (UTC)
If we include cases where the "word" is just the name of the letter itself used with a particular sense (e.g. D, H, Q) then there seems to be no reason to not just include all letters, as any letter can be used as a word to refer to the letter itself.-Urszag (talk) 23:55, 11 March 2023 (UTC)
I don't understand your criteria for iffiness. I'm also going to RFV "h"! Equinox 00:04, 12 March 2023 (UTC)
Mainly "would a reasonable person say the letter as itself (eg "U" as "yoo" or "W" as "dub") or as something else (eg "b" for "born")?" I also put V on the Iffy list as it's just a subsense of the shape. Although maybe D isn't as iffy. Three citations, for all senses. (talk) 00:08, 12 March 2023 (UTC)
All v iffy. – Sokkjō 05:31, 12 March 2023 (UTC)
A = arsehole, c = circa, q = question, b = billion, p = pretty I've all heard pronounced as the letter. —Al-Muqanna المقنع (talk) 14:41, 12 March 2023 (UTC)
'V' is not a word, just as '3', '@' and '♃' are not words. 'V' is just a letter. The name of that letter is vee, not 'V'.
Looks to me that the only one-letter words here, including letter names, are 'a', 'e', 'i', 'I', 'o', 'O', 'u' and 'n', though usually 'n' is written with an apostrophe. There are going to be some interjections as well, such as 'm' indicating something tastes good, or as a variant of 'hm' or 'um'. kwami (talk) 05:30, 13 March 2023 (UTC)
Only if you exclude words that aren't spelled phonetically, which would be an arbitrary restriction. Nobody writes e.g. "bee" for "billion"; the spelling "b" is standard, so if treated as a separate word it's a word with one letter. CitationsFreak's suggestion makes somewhat more sense for distinguishing "separate words" in that independent pronunciation is a relatively standard way of distinguishing mere abbreviations (i.e., ones that function only as written representations of other words) from forms with some independent lexical character. —Al-Muqanna المقنع (talk) 10:19, 13 March 2023 (UTC)
I agree about following the pronunciation, but by that standard, '3' (three), '@' (at), '+' (plus) and '♃' (Jupiter) are all words, and the restriction to letters is arbitrary. kwami (talk) 10:34, 13 March 2023 (UTC)
Yeah, it's arbitrary, like any other starting principle, but it's the choice that happens to have been selected ("English one-letter words"). The restriction to pronunciation spellings doesn't follow as a necessity. I don't consider '@' etc. to be in the same category anyway, by the same principle that they're mere representations of words and not lexically independent (there isn't some special pronunciation of '@' as opposed to 'at')—though there might perhaps be cases where a symbol has its own special pronunciation, in which case it's more interesting. —Al-Muqanna المقنع (talk) 12:23, 13 March 2023 (UTC)
Like many others have said above, most of the 'iffy' ones are no such thing. A search for 'b.1756 d.1791' unsurprisingly yields many hits about Mozart, for example[5]. 'A' for 'arsehole' seems deeply iffy though. Even if someone does insult someone by calling them an 'A' (does this actually happen?) then how do we know they're not calling them an 'arse' rather than an 'arsehole'? That would be more consistent with how people say 'A-hole' to mean arsehole/asshole. --Overlordnat1 (talk) 13:22, 13 March 2023 (UTC)
All of those would just seem to be the letters standing in for the word, not words of their own. 'B' for 'bitch' is common, and it's clearly 'bitch' as opposed to anything else, but again it's the letter as a euphemism, not a distinct word. Similarly with 'F you!'.
And if we're going to accept all the letters of the English alphabet, why not the Greek alphabet too? A muon is often called a 'μ', a photon a 'γ', etc. These aren't written out 'mu' or 'gamma' any more than 'b' for 'born' is written out 'bee'. kwami (talk) 14:22, 13 March 2023 (UTC)
I would consider those to be translingual. A photon is still a γ particle in another language, even a language which doesn't include a /g/ phoneme and normally transliterates /g/ by /k/ or some other sound. So, they would be out of place in this category. Soap 14:56, 13 March 2023 (UTC)
A letter that's being said as a euphemism is a word, at least according to my gut. Also, "γ" (as in "photon") is totally a Translingual one-letter word, as it's pronounced like "gamma" in the various tongues that use it (and is just a shortening of "gamma ray"). CitationsFreak: Accessed 2023/01/01 (talk) 22:53, 13 March 2023 (UTC)
Euphemisms are distinct words, yes. That shouldn't be problematic. —Al-Muqanna المقنع (talk) 22:56, 13 March 2023 (UTC)
I think your average user is going to expect the category to contain words that are pronounced either as the name of the letters or the letter's sound. I think that's where you're going with the "iffy" list. I would exclude all abbreviations that are pronounced as the full word (for instance, if I see "q." I read "question", not "queue/cue"). Unless there's evidence that it's pronounced the latter way, I wouldn't want to see it in the category, because it would be a purely typographical convention, not a single-letter "word". Andrew Sheedy (talk) 14:35, 2 June 2023 (UTC)
That was my intent. The "iffy" was me having no evidence for people ONLY pronouncing, say "50 L" as "50 Liter[s]". (And V is just a reference to its shape.) CitationsFreak: Accessed 2023/01/01 (talk) 14:40, 2 June 2023 (UTC)

I saw this discussion, and I just want to state the obvious: it is wicked hard to cite some of these words, and it's not hard merely because it's actually rare, it's just hard because you aren't sure how to look for it. I'm so proud of my citations for the ancient Chinese kingdom of E, also romanized as O. But I feel I only found the third clear cite for 'O' just a minute ago, despite doing cites for years. You sometimes have to think outside the box to find these things. But I tell you, just as much as a, I and O are words, I have fully confirmed that E and O are English language proper noun terms: a name for that ancient state. --Geographyinitiative (talk) 14:57, 2 June 2023 (UTC)

Category:Administration

I propose deleting Category:Administration, since most of the language subcats are empty except for the category "Public administration". Even Category:en:Administration only has one entry. --Numberguy6 (talk) 18:39, 12 March 2023 (UTC)

@Numberguy6 Agreed although this should probably be discussed at WT:RFDO. Benwing2 (talk) 23:55, 12 March 2023 (UTC)[reply]

Wikimania 2023 Welcoming Program Submissions

Do you want to host an in-person or virtual session at Wikimania 2023? Maybe a hands-on workshop, a lively discussion, a fun performance, a catchy poster, or a memorable lightning talk? Submissions are open until March 28. The event will have dedicated hybrid blocks, so virtual submissions and pre-recorded content are also welcome. If you have any questions, please join us at an upcoming conversation on March 12 or 19, or reach out by email at wikimania@wikimedia.org or on Telegram. More information on-wiki.

"Someone must have slandered Josef K., for one morning, without having done anything truly wrong, he was arrested."

Time after time my edits have been reverted by the admin @Fenakhay. I have repeatedly tried, on his talk page, to get him to tell me what rule I have broken, to no avail.

This culminated in me being blocked by him for "refusing to learn" God knows what. I appealed the block, but no one noticed.

Each time my "crime" was to point out similarities between Hebrew and Arabic, under the guise of "duplicated entries", whatever that means. Of course, I have not seen Fenakhay make a problem out of this for any pair of languages other than Hebrew and Arabic. I've also pointed out similarities between Dutch and German; here, for example, he edited right after me, and it seemed to be no problem to him.

I demand to know concretely, finally, what rule I have broken, and whether this welcoming behavior is to be expected from other Wiktionary admins. Synotia (talk) 09:39, 14 March 2023 (UTC)

Your edits were irrelevant where placed and badly formatted. Rule of the whole WWW: only post relevant content. Relevance may also be affected by duplication. Fay Freak (talk) 10:32, 14 March 2023 (UTC)
How are they irrelevant? According to what criteria? Synotia (talk) 10:34, 14 March 2023 (UTC)
You have already been told the considerations by various people, yet refuse to learn. Fay Freak (talk) 10:43, 14 March 2023 (UTC)
@Synotia The reason why your additions are duplicative is that you can simply click on the Proto-Semitic ancestor to see all the descendants. We try to avoid duplication like this because it tends to lead to errors. (Also, referring to Hebrew as the "language of the Zionists" sounds pejorative and is best avoided.) Benwing2 (talk) 02:59, 15 March 2023 (UTC)
@Synotia: I am with @Fenakhay, @Fay Freak and @Benwing2 on this, and I have explained it to you on Fenakhay's talk page.
Imagine if I added all the Slavic cognates to the Ukrainian term вода́ (vodá, water). There are far more than just a couple; there is no point in this, and it would be a huge duplication when *voda exists and lists all the descendants. Anatoli T. (обсудить/вклад) 03:30, 15 March 2023 (UTC)
I wonder how many site visitors do this. Synotia (talk) 20:53, 15 March 2023 (UTC)

Cleaning up Persian templates

(Notifying Ariamihr, Dijan, Mazsch, Qehath, ZxxZxxZ, Sameerhameedy): There don't seem to be many active Persian editors, but User:Atitarev and I have been discussing cleaning up the Persian-specific templates, which are currently in a messy state. IMO, e.g. {{fa-adv}}, {{fa-conjunction}}, {{fa-interjection}}, {{fa-phrase}}, {{fa-preposition}}, {{fa-pronoun}} don't really accomplish anything and should be eliminated in favor of directly calling {{head}}, and some of the other templates have weird and non-standard param usages that could stand to be cleaned up and standardized. There are also things like {{fa-verb/new}}, {{fa-IPA/old}} etc. that are in a halfway state. We also have 61 (?!) verb conjugation templates, which are certainly in an awful state (although cleaning that up will take significant effort). Do any Persian editors have any thoughts on this? Benwing2 (talk) 03:05, 15 March 2023 (UTC)

Thanks. {{fa-proper noun}} should be kept, IMO, and it needs an optional |g= for pluralia tantum, such as قرون وسطی (qorun-e vostâ, the Middle Ages).
{{fa-IPA/old}} can possibly be converted by a bot to {{fa-IPA}}, which does the work. Anatoli T. (обсудить/вклад) 03:22, 15 March 2023 (UTC)
@Atitarev Agreed on {{fa-proper noun}}; I'm only proposing removing the 6 templates I mentioned above, which take only a head= and tr= param. It's probably possible to convert {{fa-IPA/old}} to {{fa-IPA}} by bot, although I haven't looked into the details, and likewise for {{fa-verb}} vs. {{fa-verb/new}}. Benwing2 (talk) 03:28, 15 March 2023 (UTC)
@Atitarev FYI even after deleting some old templates there are 123 remaining fa-* templates. It will take some time to clean these all up. Benwing2 (talk) 04:52, 17 March 2023 (UTC)
@Benwing2: Thank you! Anatoli T. (обсудить/вклад) 04:57, 17 March 2023 (UTC)

Entries for bird names

I just wanted to double-check that this is worthwhile doing. I've been adding Welsh names for birds and I'm finding that many of the English translations don't have their own entry. Some entries like pileated woodpecker have been around since 2010, but others like downy woodpecker or bright-rumped attila were missing until I recently added them.
I was thinking that these probably qualified for inclusion based on existing entries, how a non-native speaker might approach these terms and look them up (I wouldn't fault anybody for not knowing that a bright-rumped attila was even a bird), and that they aren't SOP since they refer to a specific species of bird and not to any bird which merely fits the characteristics of the name. But I just wanted to confirm and hear the community's thoughts first before making too many more entries and potentially wasting my time. – Guitarmankev1 (talk) 15:05, 15 March 2023 (UTC)

I think they are fine, as long as they are attestable. (See WT:ATTEST.) Some bird names are hard to find in use, though they often appear principally in synonym listings in wildlife books, i.e., in mentions, not uses. DCDuring (talk) 15:33, 15 March 2023 (UTC)
I tend to agree with DCDuring, tbh; I honestly don't see any other reason to exclude them. User: The Ice Mage talk to meh 19:00, 15 March 2023 (UTC)

@Sgconlaw Can you explain what the purpose of this category is and why it's needed? It seems very strange to me, esp. given that it only ever contains one subcategory, 'Carbon'. Benwing2 (talk) 06:34, 16 March 2023 (UTC)

It was created by @Solomonfromfinland, so I just added it to the module. It is the parent category of “Category:en:Categories named after chemical elements”, which has several entries in it. — Sgconlaw (talk) 11:36, 16 March 2023 (UTC)
@Sgconlaw, Chuck Entz This seems highly questionable. It appears that User:Solomonfromfinland created a zillion element-specific categories, each of which is a grab bag of junk; e.g. Category:en:Iron contains cast iron, ductile iron, etc. but also ferrous, iron-sick, blacksmith (??), irony (totally wrong), and other randomness. Chuck, this is pushing the limits of the topics-as-sets vs. topics-as-related-terms issue; should we consider trying once and for all to solve this, e.g. by renaming the 'related terms' categories to something like 'Iron-related'? Benwing2 (talk) 04:47, 17 March 2023 (UTC)
@Benwing2 Re: "irony (totally wrong)": irony#Etymology 2 is correctly associated with the element. DCDuring (talk) 15:43, 17 March 2023 (UTC)
@DCDuring Hmm, thanks, never heard that usage but you are right. Benwing2 (talk) 15:46, 17 March 2023 (UTC)
Reminds me of liver#Etymology 2. Is there a name for words like that? Theknightwho (talk) 18:06, 17 March 2023 (UTC)
@Theknightwho You mean "someone who lives"? I would call that an agent noun. Benwing2 (talk) 18:11, 17 March 2023 (UTC)
@Benwing2 I meant situations where adding an affix to one word creates an uncommon homograph(?) of a much more common word that’s unrelated, like iron + -y and irony or live + -er and liver. The kind of thing that would trip up a non-native speaker. Theknightwho (talk) 18:21, 17 March 2023 (UTC)
@Theknightwho: Oh. I bet there's an obscure term for this but I don't know it. Benwing2 (talk) 18:27, 17 March 2023 (UTC)
@Theknightwho: Only the 25,000th time I've done that. Benwing2 (talk) 18:27, 17 March 2023 (UTC)
@Benwing2 I meant they'd get tripped up going the other way round, haha. I guess the rule is whether native speakers think the derived term probably doesn't exist (because I had the same thought as you, tbh). Theknightwho (talk) 18:29, 17 March 2023 (UTC)
I don't know if there's a term for it, but Granger and some other users and I have a list of them here, in the Anteroom of Silliness, along with some silly definitions. - -sche (discuss) 01:44, 18 March 2023 (UTC)
detail is also often used in crosswords to indicate removing the last letter of a word: it's de-tail, cheekily written without the hyphen (though I doubt this could be attested as an actual word). --Overlordnat1 (talk) 02:03, 18 March 2023 (UTC)
@-sche Thanks for this - exactly what I was looking for. Theknightwho (talk) 21:59, 18 March 2023 (UTC)
@Benwing2: They simply don't understand the abstract structures behind categorization (see their talk page), but our category organization has some real problems, too. This mess was a response to a particular oddity: Iron is an Ossetian language. Category:Iron Ossetian was mistakenly moved to Category:Iron and a category redirect was left behind when it was moved back. Instead of asking someone what to do about this, they improvised a hacky workaround. I would just delete the current Category:Iron, replace it with a daughter category of Category:Chemical elements, then orphan this category by moving everything else to its proper place under that category.
As for the contents of these categories: I have yet to see a workable way to deal with the overlap of topical and set categories. That leaves situations where the intersection of the topical structure and the set structure is artificially narrow, but there are enough such intersections to bloat one or the other if no subcategories are made. I experimented a little with categories for such overlaps in the case of maize, which has a wide body of terminology that makes even narrow categories workable in some languages. See Category:Maize (crop), Category:Maize (food) and Category:Maize (plant). This is just the main problem that occurred to me; I'm sure there are others. Chuck Entz (talk) 00:22, 18 March 2023 (UTC)
@Chuck Entz Thanks. Can you give me some examples of what you mean by this sentence:
Topical categories for specific things often only fit into the same conceptual framework as that used by the set categories for those things, but there are plenty of cases where they fit better into other conceptual frameworks, with the distribution of which is which not predictable from the specifics of either framework.
Also, do you think it's worth trying to separate actual chemical-element categories (see Category:en:Categories named after chemical elements; User:Solomonfromfinland manually created a bunch of them) from subcategories like "Halogens", "Chalcogens", etc.? Or do you think we should just move halogen elements under "Halogens", chalcogen elements under "Chalcogens" etc.? The name Category:en:Categories named after chemical elements is terrible and needs to go. Any suggestions for rearranging the subhierarchy under CAT:Chemical elements are welcome. Benwing2 (talk) 00:30, 18 March 2023 (UTC)
I created the remaining group categories needed to cover all of the elements (except hydrogen, which is unique), then added all of the chemical elements that are likely to need categories to the module as subcategories of the group categories, and finally converted all of the chemical element categories to {{autocat}}. As of now, the main category contains nothing but empty subcategories. This particular edifice of coat hangers and duct tape is ready to be deleted. I'm sure there are more of them, though. Chuck Entz (talk) 07:19, 3 April 2023 (UTC)

Ottoman borrowings of Arabic adverbial accusatives

@Itidal, Fay Freak, Rd1978, Ardahan Karabağ. Many Turkish adverbs were originally Arabic adverbial accusatives. These often end in -en in modern Turkish. An IP editor has defined Ottoman Turkish and Turkish suffixes ـاً and -en. I do not think these suffixes exist. Terms like اولا (evvela), تماماً (tamamen), and kısmen were formed long ago from Arabic nouns by the rules of Arabic grammar, mostly in Arabic as far as I can tell. I propose to delete any mention of the supposed Turkish suffixes. In general, these are borrowings from Arabic that happen to end in the same sound (plus or minus a nasal). Some of them may be pseudo-loans. Vox Sciurorum (talk) 16:16, 16 March 2023 (UTC)

Yes. It would have to exist in native Turkish words (even pseudo-Arabisms with it might not be enough if they occur only exceptionally). Fay Freak (talk) 17:25, 16 March 2023 (UTC)
Actually, I agree with deleting them. These suffixes don't exist in native Turkish vocabulary and lexicon; they entered our language directly. If a word ending in -an/-en occurred in Turkish with no corresponding Arabic form, I wouldn't see a problem. For example, Tr. tamamen < Ar. تماما.
As you can see, there is an Arabic form. Ardahan Karabağ (talk) 17:42, 16 March 2023 (UTC)
There are some cases where Turkish speakers actually coin adverbs using -en, such as tekniken ("technically") (there are probably more which I can't remember). Categorization of the Arabic adverbial accusative derivatives is functional, as an average speaker will mostly have an easy time analyzing those adverbs into the stem and the suffix at issue. IMHO deletion is unnecessary. What are the criteria that decide whether a Turkish word is "native" or not? Itidal (talk) 21:52, 16 March 2023 (UTC)
If there are cases where -an/-en is a genuine suffix, then it's maybe OK to keep it, but IMO no words borrowed whole from Arabic should mention it. Benwing2 (talk) 04:51, 17 March 2023 (UTC)
We already mention those terms with the surface etymology template; we obviously don't consider them terms that were genuinely coined in Turkish, e.g. müttefiken, hakikaten, binaen. Itidal (talk) 10:14, 17 March 2023 (UTC)
"but IMO no words borrowed whole from Arabic should mention it", as I wrote, because the surface etymology categorizes them. Benwing2 (talk) 15:47, 17 March 2023 (UTC)
I didn't know the suffix was productive. I checked {{R:tr:OTK}} and indeed it is there. It must be a modern Turkish innovation. It is not listed as belonging to Ottoman Turkish. Educated Ottomans would have recognized -en applied to Turkish roots as a barbarism. I will delete mentions of the Ottoman suffix ـاً and leave the modern suffix alone. Vox Sciurorum (talk) 12:28, 22 March 2023 (UTC)

Medieval Greek

from Sarri.greek: notifying @Al-Muqanna, who initiated the Koine/Byzantine discussion, and @Benwing2. Also @Mahagaja, Erutuon, JohnC5, the administrators for Ancient Greek, and, although inactive, the 'fathers' of the grc section, @Atelaes, ObsequiousNewt. Mr @A. T. Galenitis has shown great interest in the subject during our discussions.
Subject: a proposal to create a Medieval Greek (gkm) language section; it is currently an etymology-only language (2016, Beer Parlour), Category:Byzantine Greek, which results in categories like Category:Ancient Greek terms derived from Old Anatolian Turkish. Three issues are raised, also concerning the periodization of Greek as described at WT:About Ancient_Greek#Divisions of the Greek language:

  • 1) Could the term 'Byzantine' be changed to 'Medieval Greek' (a term used in many of our contemporary sources)? It is also visible at {{grc-IPA}}.
  • 2) Would en.wiktionary agree to make Medieval Greek an autonomous language section? Not many lemmata would be added, but I feel that a gap of 1,000 years (6th to 17th century) is a somewhat serious omission. WT hosts many languages with very few lemmata; I wonder if this one could be added too.
  • and, 3) revisiting and updating texts about Greek language periodization, especially for Koine and Medieval Greek, in appendices, templates and lemmata.

The basic sources for documenting Medieval Greek, by period: Early Medieval, from Iustinianus up to 1100, learned, an extension of Late Koine: the {{R:LBG}} dictionary. The main period, with vulgar texts, from 1100 to 1453, and Late Medieval or Early Modern (they coincide), 1453‑1699, i.e. 1500‑1700. Dictionaries: {{R:Kriaras Medieval}} & {{R:Kriaras Medieval2}} (22 printed volumes, up to τέως (téōs)), {{R:Dimitrakos 1964}}. Grammar: the 2019 Cambridge Grammar [6]. No inflection tables are required for this language; IPA is already included at {{grc-IPA}}. Texts are available on the internet.
I realise that a lot of technical interventions are needed to add a new language, and unfortunately I do not have the capacity to make them. Still, I hope that en.wiktionary will look favourably on this proposal. Thank you ‑‑Sarri.greek  I 18:41, 17 March 2023 (UTC)

PS: Of course, what I can do is review and update all existing lemmata of the category. ‑‑Sarri.greek  I 19:23, 17 March 2023 (UTC)
1) No, because Byzantine Greek can be from the 4th century CE. Wikipedia treating “Byzantine Greek” and “Medieval Greek” as synonyms (the latter is a subset of the former) and hence restricting the former to begin from “c. 600” makes them stupid, but they don’t know because they disregard primary sources for terminology usage and hence contrary evidence. I have frequently used it that way for borrowings into the Arabic language, which was spoken before the frontiers of the Byzantine Empire from the 4th century CE up to the 7th century CE (when the Arabs expanded and a new era began); the most detailed treatment is perhaps at س ج ن (s-j-n), and from there the categories bring you to other cases, often military and food terms.
2) As I have understood it, the language is, however, shorter at the end, marked by Islam’s capture of Constantinople in 1453.
I do not deny that there were significant sound and grammar changes over the whole period (which sometimes have to be known for borrowings), but nothing prevents us from specifying periods and variants more exactly under a single language name.
It is easiest if we don’t split that hard; I don’t see the problem. Fay Freak (talk) 01:26, 18 March 2023 (UTC)
They are treated as synonyms on Wiktionary too at the moment: Byzantine Greek occupies the ISO language code gkm = Medieval Greek. So if they aren't in fact being used as synonyms then they will need to be split. In general, though, they have been used synonymously in the literature, so we would probably need a better justification to use a bespoke definition. AFAIK most contemporary Anglophone specialists would not use the term "Byzantine" for the empire before around the 6th century (if at all), which happens to coincide with the relevant linguistic shift.
The recent Cambridge Grammar of Medieval and Early Modern Greek rejects the term "Byzantine Greek" and suggests a periodisation of Greek into Early Medieval from 500 to 1100 and Late Medieval from 1100 to 1500 (p. xix) based on linguistic turning points. —Al-Muqanna المقنع (talk) 12:14, 18 March 2023 (UTC)
Does not look like it if we have a recent work Byzantium and the Arabs in the Fourth Century. But I have a sinister suspicion that correct philological usage and the usage to be expected from fashionable historians differ. Anglophone scientists appear to speak the same language, to laymen, but they don’t; specialized interests create echo chambers. In this fashion you can be as convinced that “most” specialists (wherein?) use the term for one range as I am that it is for another. Wikipedia opts for the loudest, most circular echo chamber. Fay Freak (talk) 13:42, 18 March 2023 (UTC)
The Byzantium and the Arabs ... series started in the 1980s, so the title's not recent, but in any case I said "most" for a reason: there isn't a decisive consensus in usage, so it's a somewhat opaque term if the Constantinian period is meant (as opposed to "late Koine", which is clear and used in the literature). It also has no bearing on what gkm is labelled; it sounds like it ought to be renamed from "Byzantine Greek" to "Medieval Greek" either way. —Al-Muqanna المقنع (talk) 13:52, 18 March 2023 (UTC)
Cool, then we actually agree. The implication of the language code is of course a problem if it differs from that of the language name assigned to it when somebody uses it: another of the casual inexactitudes of those language databases (they didn’t go that deep into chronolects in those standardization bodies, did they? They copied together overviews but did not investigate the actual usage of the terminology). It is possible to have separate codes or separate concepts “Byzantine Greek” and “Medieval Greek” with an intersection, just as for Latin “Renaissance Latin” and “Medieval Latin” overlap, and somewhere in between, as a register, we have “Ecclesiastical Latin” (an unfair comparison, since those are not merely chronolects). If we have “Late Koine” then the meaning of “Koine” would also be affected; I tended to view “Koine Greek” as belonging to the time just before it, but of course it is true that my pre-Medieval Byzantine Greek is also Koine. Two bad, ambiguous names for a particular era we have right now: I could arbitrarily have chosen either the code for “Koine Greek” or that for “Byzantine Greek” for those borrowings imagined to have taken place from the 4th to 7th century.
I don’t know, we could have Byzantine Greek with subcategories Late Koine and Medieval Greek, and Koine Greek with subcategories Late Koine and Middle and Early Koine (somewhere between the latter two the term Hellenistic Greek also ends; Sarri had fun using this epithet). For etymologies, “Byzantine Greek” might be seen as a cleanup category, and for labels in “Ancient Greek” entries it would make sense if editors want to express that the terms are from the era from the 4th century to the 15th. They do anyway, depending on how exact they want to be; we won’t forbid people to use lect names where they do the job, and T:defdate is too ideal to oust them in reality. Fay Freak (talk) 18:33, 18 March 2023 (UTC)

Medieval or Byzantine

I have no linguistic training; none. My notes here reflect what we read as Wiktionary editors. I have no arguments or opinions. But still, I would like to present the case of Medieval Greek as a language lover. Allow me, then, to rephrase point 1 from above. The reasons we asked en.wikt to use 'Medieval' instead of 'Byzantine' are:

  • a) to avoid using historical terms and events as boundaries for language change. Boundaries are always conventional. Lexicographers and linguists of the previous millennium tended to adopt historical turning points as boundaries, even when these had no impact on language change.
  • b) because the Cambridge Grammar says (p. xix): «The system of periodization that we have used is not based on external criteria, which might relate to historically significant dates, such as wars, conquest or independence. For this reason we do not employ the term “Byzantine Greek”: for almost the whole of the period that we are concerned with, a substantial part of the Greek-speaking world was not “Byzantine” in a political sense. Our criteria are instead internal ones, based on clusters of important linguistic changes that we see as occurring around 1100, 1500 and 1700».
  • c) because Kriaras, and Greek lexicography generally, prefer the term Medieval (assuming that the lexicographical tradition of each language is taken into account here, in en.wiktionary).

Thank you ‑‑Sarri.greek  I 19:37, 18 March 2023 (UTC)

Strange to refer to the seventeenth century as 'medieval'. Nicodene (talk) 18:52, 19 March 2023 (UTC)
Yes, @Nicodene, it might seem so (note: some lexicographers extend it to 1800). Of course, it is about language, not history. Also, the 'medievalism', both historically and linguistically, had to do with the resurrection of the Greek state in 1821 after some 400 years of occupation. So the term is not imprecise but describes a rare delay of the renaissance. Thank you ‑‑Sarri.greek  I 19:03, 19 March 2023 (UTC)

Periodization Koine & terminus

Let me elaborate on point 3 from above for the ending of Koine and beginning of Medieval Greek.

  • Koine. The 330 terminus was chosen by lexicographers of the previous century to coincide with the historical beginning of Byzantium: the founding of Constantinople. But this had nothing to do with language. The official language of the empire was Latin, and people spoke Late Koine. The 6th-century terminus makes sense because that is when we first have official legal texts in Greek, so conventionally we can make it the boundary between Koine and Medieval. Note that Koine continued to be used by authors for many centuries afterwards. E.g. Eustathios of the 12th century is quoted in the Koine section under Ancient Greek for his Scholia, not under Medieval. (Not to mention the Atticists.) So, we have
  • Koine1: 3rd, 2nd, 1st centuries BCE,
  • Koine2: 1st, 2nd, 3rd centuries CE,
  • Koine3 is Late Koine: 4th, 5th, 6th centuries (as included at {{R:DGE}}, TLG, and recent lexicography for Ancient Greek), and, for some writers, up to ... the 1970s.

Thank you ‑‑Sarri.greek  I 19:37, 18 March 2023 (UTC)

Periodization Medieval Greek & terminus

Let me explain point 3 from above for the ending of Medieval.

  • Med starts (conventionally) in the 6th century with the Iustinianus Novellae (Νεαραί). Greek is now the official language of the Byzantine Empire. So we have
  • Med1: 7th-11th centuries. An extended Late Koine, because only learned texts survive.
  • Med2: 1100-1453, the main Medieval period, with vulgar texts in abundance. Mainly, but not only, in the Byzantine Empire, plus the Venetian and Frankish occupations. It is much more like Modern Greek than Koine. Language change was significant.
  • Med3: 1453-1669 (or 1500-1700 if you like) is Late Medieval, interchangeably Early Modern Greek. The end date of 1669, the year of the Fall of Crete, coincides with the end of Cretan literature, theatre and poetry, which are sung and popular even today. Kriaras explains this in his 1100-1669 dictionary of «the last Byzantine and first post-Byzantine centuries» (cover), in his own words:
    Quote, p. XI: «Kretschmer preferred as boundaries the basic chronological limits of Byzantine history (324, 1453). We have placed 1669 […] because a substantial portion of Greek literary output echoes Byzantine tradition. […] Indeed, these Cretan texts, in spite of their individual characteristics, are placed in the linguistic atmosphere of the most vulgar texts of the last Byzantine centuries.» The Cambridge Grammar uses the term Early Modern Greek. We could place this Med3 under either the Modern or the Medieval section. The reason el.wiktionary places it under Medieval is a) the Kriaras Dictionary and his rationale, and b) a technical reason: we study it in polytonic script.

Thank you all ‑‑Sarri.greek  I 19:37, 18 March 2023 (UTC)

Comments on Medieval Greek

I support all of the points immediately above (renaming "Byzantine Greek" to "Medieval Greek", standardising the periodisation of Koine as 3rd century BC to 6th century AD, and distinguishing the 600–1100, 1100–1500, and 1500–1700 periods). I also lean towards supporting splitting Medieval Greek as a distinct language: this is no different in principle from distinguishing e.g. Old French / Middle French / French as three distinct languages, or many other medieval European "Old" languages that we've added with no problems (see @Vininn126's work on Old Polish, which has no ISO code at all even in ISO 639-3). The one problem is that the grc code is currently formally defined as referring to Greek up to 1453, but as the ayin transliteration debate shows this is something we can choose whether to follow at our leisure. —Al-Muqanna المقنع (talk) 00:46, 19 March 2023 (UTC)

I like the idea of renaming "Byzantine Greek" to "Medieval Greek" and taking it up to 1669 or 1700, as the Greek sources do. Now, when I see a μσν. = μεσαιωνική ελληνική term quoted in Kriaras or {{R:DSMG}}, I have to check if it is attested before the arbitrary (and sad) date of 1453, which is difficult. I am undecided on splitting Medieval Greek from grc. That will lead to a lot of duplication. — This unsigned comment was added by Vahagn Petrosyan (talkcontribs) at 09:13, 19 March 2023(UTC).

@Vahagn Petrosyan, {{R:Kriaras Medieval}} is for vulgar language. The date is stated if known (e.g. λέξη του 11ου αιώνα ("word of the 11th century")), or we infer it from the authors (here is the guide PDF). Note that some writers continued to write in Koine. Kriaras and the Cambridge Grammar do not deal with them at all. Their vocabulary, up to the 11th century, is covered at {{R:LBG}}, where just getting a password will show you that it is very similar to Koine, as we see it at LSJ.
No inflectional endings are given in any Medieval Greek lemma of either LBG or Kriaras, because of the variety.
But, Vahagn, how many centuries and millennia would you need to make it an autonomous section? The etymological and other categories, and the lemmata themselves, are so weird under the title Ancient Greek. How can a word of the 11th century CE be under a title with the word 'Ancient'? Ancient means ancient. Thank you, and especially for all your great work on Greek dialects! ‑‑Sarri.greek  I 14:26, 19 March 2023 (UTC)
@Sarri.greek: so would a word first attested in a Koine text written after the 6th century be classified as gkm or grc? Vahag (talk) 15:10, 19 March 2023 (UTC)[reply]
@Vahagn Petrosyan if it is a new word, a neologism of its era, but in Koine style, {{R:LBG}} has it. Also {{R:Dimitrakos 1964}} (a bit difficult to read this dictionary, he does each definition diachronically from grc to el). LBG has the authors (so, we get the dates of their lifetime). A 10th century new word found at LBG: if you cannot find it at LOGEION, you know, it is not Koine, but Med. It would be labelled either as Late Koine (Scholia to ancient texts) or as learned medieval (like religious texts, laws etc), just as we label learned for any such case.
Had it been vulgar, people's language, LBG would not have it, Kriaras would.
We do not ask what the dating of the text/word was, but who the author was. The high prestige and Ancient Greek resulted to a continuous diglossia. Take an extreme example. Anna Comnena who had this dream of becoming the female Thucydides: she wrote her Alexias in attic dialect. No medieval dictionary would deal with words, inflectional forms of a revived Attic. Perhaps they would only include placenames, people's names, or vocabulary of things of her era.... ‑‑Sarri.greek  I 15:31, 19 March 2023 (UTC)[reply]
@Vahagn Petrosyan example wikt:el:βάμβαξ. But the ending, is like Koine and Ancient. el:ἀκουμπῶ / Cat. of Early Med words ‑‑Sarri.greek  I 15:43, 19 March 2023 (UTC)[reply]
If I understood you correctly, a Koine-style neologism after the 6th century should be put under ==Medieval Greek==. How would the etymology section of such a word look like? For example, ὀφθαλμοπονία (ophthalmoponía, eye pain) in LBG. Would you say {{affix|gkm|ὀφθαλμός|πονέω}} and then create Medieval Greek entries for ὀφθαλμός (ophthalmós) and πονέω (ponéō)? Would the Medieval Greek entry for ὀφθαλμός (ophthalmós) contain the definition "eye" or just the two new senses that are attested in the Medieval period, namely "a kind of stone; water intake of a mill". I am asking because I haven't figured out how to handle the medieval period of another language with diglossia — Armenian. I now regret having code axm for "Middle" Armenian. Vahag (talk) 16:41, 19 March 2023 (UTC)[reply]
@Vahagn Petrosyan I do not write blind etymologies. I would have to find a dictionary that states: {{l|gkm|ὀφθαλμοπόν(ος)}} (< Koine {{l|grc|ὀφθαλμοπόνος}}) + {{af|gkm|-ία}}. http://stephanus.tlg.uci.edu/lbg/#eid=51338
In general, I would treat the word just as in any other language. ‑‑Sarri.greek  I 16:50, 19 March 2023 (UTC)
I may be showing my ignorance, but that reminds me a lot of Katharevousa: is the way we deal with that relevant here? Even with English we have Edmund Spenser, who wrote what sometimes looked like Middle English in what we consider to be the Early Modern English period. Sometimes you just have to come up with an arbitrary cutoff date and stick with it. Chuck Entz (talk) 00:24, 20 March 2023 (UTC)
Yes, it is a common theme in Greek @Chuck Entz, the revival (an artificial one) of old styles. But my question here is: is Medieval Greek recognized at en.wiktionary as an existing and documented language period, like all the other Middle and Medieval periods of other languages covered here? Do the diglossic literary styles overshadow its existence? Thank you. ‑‑Sarri.greek  I 00:43, 20 March 2023 (UTC)

Conclusion for Medieval Greek

[by Sarri.greek] I thank you all for contributing to this discussion and helping to clarify it.
During the last 15 days, I have contacted the three administrators for Ancient Greek on their talk pages, and two of them responded: @JohnC5 and @Mahagaja. I thank JohnC5 and @Mahagaja for their positive responses to the above changes. I think there is no objection, except one about Wikipedia's different periodization (w:en:Template:Greek language periods already correctly has Koine up to c. 600). I could notify WP that the period 1453–1669, called either Late Medieval Greek or Early Modern Greek, is covered at en.wiktionary under the Medieval Greek language, in coordination with the dictionaries of Medieval Greek (w:en:Template:Greek language has not been updated yet).
If administrators agree that there are no other objections, could the necessary changes start being implemented?

I thank you in advance, ‑‑Sarri.greek  I 11:05, 31 March 2023 (UTC)

@Sarri.greek: Before we do anything, I must point out that the code for Medieval Greek (or Byzantine Greek) is gkm, not byz, which is the code for Banaro, a language of Papua New Guinea. —Mahāgaja · talk 11:16, 31 March 2023 (UTC)
@Mahagaja: yes, of course. byz is used at IPA. Thank you. ‑‑Sarri.greek  I 11:20, 31 March 2023 (UTC)
Actually, I just noticed that gkm is not actually an official ISO 639-3 code. It was requested back in 2006, but no decision has been made yet. That being the case, is it in keeping with Wiktionary policy to use gkm for Medieval Greek, or do we have to use an explicitly Wiktionary-only code like grk-gkm until such time as the code is made official? —Mahāgaja · talk 11:22, 31 March 2023 (UTC)
@Mahagaja: if the question is for me, I don't know your policy, or whether you add such language codes to a list as 'under trial use' or something like that. ‑‑Sarri.greek  I 11:30, 31 March 2023 (UTC)
The question isn't for you but for other admins. Reading Wiktionary:Languages § Language codes, I think we do have to use grk-gkm and list it at Module:languages/data/exceptional, not at Module:languages/data/3/g. —Mahāgaja · talk 11:36, 31 March 2023 (UTC)
Yes, @Mahagaja. I have read the ISO.proposal2006 for gkm, which is very old. I hope the proposal will be renewed with updated sources besides 'Robert Browning', perhaps from some official institution (I'll try to find out what is available). By the way, at el.wikt we already use labels for dialect codes: gkm-cyp (Medieval Greek Cypriot) and gkm-crt (Medieval Greek Cretan). ‑‑Sarri.greek  I 11:59, 31 March 2023 (UTC)
Could @Benwing2 help or perhaps recommend an admin for {alert|languages} to assist with the code gkm or ...? Thank you. ‑‑Sarri.greek  I 18:47, 31 March 2023 (UTC)
@Sarri.greek Apologies for not keeping up with this discussion, as it's long and technical and I don't know enough about Greek periodization. What is the request exactly? Is it to convert 'gkm' to a full language from an etymology-only language? Anything else? Also, User:Mahagaja you are suggesting a different code 'grc-gkm'? I don't think there's any rule here that says we can't use non-official ISO 639-3 codes for languages, although User:-sche can correct me if I'm wrong. Definitely it would be better to stick with 'gkm' if possible as it's four fewer characters to type. Benwing2 (talk) 18:57, 31 March 2023 (UTC)
BTW I don't think it will be difficult to make this conversion (going in the other direction, from full to etym-only language, is harder). When you create the new 'Medieval Greek' lemmas, you'll have to use {{head}} for the moment until we have new gkm-specific headword templates. User:Sarri.greek maybe you can specify how you want the templates to behave, and someone who knows Greek well (User:Erutuon, if you have time?) can help implement them? Benwing2 (talk) 19:01, 31 March 2023 (UTC)
@Benwing2: Are there any examples of non-ISO codes in use at Wiktionary that have the form "xxx" rather than "xxx-xxx"? I thought we kept them carefully separate. —Mahāgaja · talk 19:08, 31 March 2023 (UTC)
Yeah, if gkm has never been an official ISO code for Medieval/Byzantine Greek, then for internal clarity and to avoid problems if the ISO assigns gkm to another (newly-encoded) language (which they would be free to do!), it should be formatted as an exceptional code xxx-yyy where xxx is the nearest family code and yyy is some string, as described on WT:LANG. To my knowledge the only time we use ISO-like but non-ISO codes is when something used to be an official ISO code and the ISO retired it but we didn't, like sh and some minor languages with three-letter codes that they've split or merged but which we haven't (yet). Because we use that ISO-like three-letter code for it, I didn't realize that gkm wasn't an official ISO code! For a time, we used LinguistList's qot for Sahaptin (which was actually less of a problem since IIRC that's within the range the ISO allots for private use), but when that was noticed we re-coded it, too. Perhaps someone can prod the ISO to approve gkm, but until then, let's use grk-gkm. Probably it would be beneficial to check ISO's code list against ours and see which ISO codes are absent from our modules and which two- or three-letter codes we use are absent from the current ISO standard. I did this years ago and noticed quite a few discrepancies where we needed to either add a code or record on WT:LANGTREAT that we were intentionally excluding it. - -sche (discuss) 19:51, 31 March 2023 (UTC)
1) Thank you very much @Benwing2. Template {{head|gkm|POS}} looks fine. Perhaps a label might be used (learned or formal). Bibliography templates already exist. No declensions are needed. Inflectional forms are discussed in dictionaries as attested.
It would be nice if gkm could anticipate a renewed ISO proposal (hopefully soon). ‑‑Sarri.greek  I
2) Thank you @-sche for your help and explanations. I hope someone will renew the gkm ISO proposal soon. At the moment, any code would be fine! Is grk-gkm = Hellenic language, Medieval OK? ‑‑Sarri.greek  I 20:33, 31 March 2023 (UTC)
@-sche, +parent = Ancient Greek grc. descendants = Modern Greek el, Pontic pnt, and Cappadocian cpg... ‑‑Sarri.greek  I 20:40, 31 March 2023 (UTC)

@Sarri.greek, you did not answer how we should deal with Ancient Greek words developing new senses in the Medieval period. For example, ὀφθαλμός (ophthalmós) has the new meanings "a kind of stone; water intake of a mill" according to LBG. Should we create ==Medieval Greek== with just those two senses? --Vahag (talk) 20:01, 31 March 2023 (UTC)

Yes, @Vahagn Petrosyan. We would treat the senses as we do for every language. In Modern Greek too, senses may be identical to ancient ones, identical plus new senses, or different. It is no problem: we follow our dictionaries' definitions. ‑‑Sarri.greek  I 20:21, 31 March 2023 (UTC)
But we don't follow dictionaries. We include what is attested. Obviously, the bulk of Ancient Greek lexicon is attested also in the Medieval Period, in the same form and meaning. You are proposing to duplicate the whole of Ancient Greek lexicography under ==Medieval Greek==. Note that LBG and Kriaras are dictionaries of differences from Ancient Greek, namely of words, wordforms and meanings unattested in Ancient Greek. For example, they do not include κύων (kúōn, dog). They are supposed to complement Ancient Greek dictionaries, not duplicate them.
Another problem besides duplication is that if gkm is separate from grc, we will have to determine whether each borrowing into other languages (Old Armenian, Old Georgian, Coptic, Aramaic etc.) happened before or after 500 AD. That is difficult. Vahag (talk) 20:46, 31 March 2023 (UTC)
@Vahagn Petrosyan, no, the dictionaries of Medieval Greek deal with all words attested in the particular periods without 'duplicating the whole of Ancient'. No, {{R:LBG}} and {{R:Kriaras Medieval}} do not include only 'different' senses. They have all senses that are attested and found in medieval texts within their scope. κύων (kúōn) is an ancient word. Greeks of today may use it too if they wish. That does not make it a Modern Greek word. In Greek, you may use, reuse, or quote any word or inflectional form from any period. Dictionaries do not deal with such references, but with real usages.
For the borrowings: Medieval Greek has been overlooked, but a borrowing might happen through ancient texts too, not necessarily through contact with post-7th-century speakers of Med. Greek. You do not have to change anything from your sources. Thank you, I hope this covers your question. ‑‑Sarri.greek  I 21:00, 31 March 2023 (UTC)
I will give you an example. LBG has the word βρομερός meaning "stinky", attested in the phrase βρωμερός κύων "stinky dog" in a 6th–7th-century medieval text. LBG has no entry for κύων because there is nothing different about that word compared to Ancient Greek. You see that it does not include all words and senses attested in the texts of its scope (it's the same text). If you split code gkm, then according to Wiktionary's principles both βρωμερός and κύων will be eligible to be entered under ==Medieval Greek== because they are attested in the medieval period: see WT:CFI. Κύων is attested in modern Greek texts too so it is eligible to be entered also under ==Greek==.
Regarding the borrowings in other languages, if gkm is split from grc, then all oral borrowings happening after the cut-off date are by definition not from grc. I can't rely on my sources anymore. I will have to figure out whether the borrowing of ղենջակ (łenǰak) happened before or after the cut-off date. That may be impossible for non-literary languages like Laz. Vahag (talk) 21:36, 31 March 2023 (UTC)
@Vahagn Petrosyan, it is normal to use an ancient word. Also, Med. Greek has always been present at English Wiktionary, either as a category under grc or as grk-gkm. It did not alter your etymologies. ‑‑Sarri.greek  I 21:42, 31 March 2023 (UTC)
As Vahag says, entries will have to be based on cites, not copying other dictionaries (although presumably dictionaries will be of help in figuring out what stage a term is attested in). That said, and while I can't contribute much to the question of whether to split Medieval Greek off as a separate language, it's probably best for that decision to be based on 'Greek-internal' factors (whether the stages are as different as other stages we would typically consider different lects, etc) and not on what's easiest for borrower-languages' etymology sections [regarding the point about Laz, above], because even for languages as well-attested and well-written-about in reference works as English and French, it can require more resources than I am able to track down, to figure out whether a particular word for e.g. armor or heraldry was borrowed from Modern or Middle (or possibly even Old) French or even Anglo-Norman, but it does not follow that having these as separate languages is bad; in such cases all I can do is spell out the uncertainty like "from modern or Middle French foo". In other situations it can be hard to figure out which of several closely (or even not closely!) related languages a term was borrowed from, e.g. bensin. - -sche (discuss) 22:28, 31 March 2023 (UTC)
I completely agree with User:-sche here. It sounds like there is still some discussion to be had; I'll wait to make any changes until the issues are resolved. Benwing2 (talk) 22:34, 31 March 2023 (UTC)
@Benwing2, issues? I agree with -sche too: that it should be based on Greek-internal factors, and that having these as separate languages is not bad. ‑‑Sarri.greek  I 22:41, 31 March 2023 (UTC)
@Sarri.greek User:Vahagn Petrosyan seems not to support this, and User:-sche just said that the decision should not be made based on how easy it is for etymologies to be created (which I agree with), but didn't explicitly support the change. I think we should try to resolve Vahag's concerns. Benwing2 (talk) 22:59, 31 March 2023 (UTC)
@Benwing2, Vahagn Petrosyan. I hope so too. But I have not studied languages other than Greek, and I cannot comment on their etymologies. Our concern here is to correct the title 'Ancient' over medieval words, and to keep en.wiktionary updated with references and bibliography published in the last decade, thus filling the gap from the 7th century to modern times that is now covered by 'Ancient'. Thank you. ‑‑Sarri.greek  I 23:07, 31 March 2023 (UTC)
I don't understand what Greek-internal factor makes ὀφθαλμός (ophthalmós, a kind of stone; water inlet) a word in a different language stage than ὀφθαλμός (ophthalmós, eye). Did it undergo a sound change? No. It just developed yet another figurative meaning. Before we move ahead with this momentous change, @Sarri.greek can you please create a sample ==Medieval Greek== page in a sandbox in your userspace for ὀφθαλμός (ophthalmós) and another one for ὀφθαλμόπονος (ophthalmóponos) to see what they will look like? Vahag (talk) 23:07, 31 March 2023 (UTC)
Normally I need more time to study words and date all the authors in whom they are attested, but here is a sample page of your requested terms, Vahagn Petrosyan: User:Sarri.greek/gkm-test. ‑‑Sarri.greek  I 00:00, 1 April 2023 (UTC)
@Sarri.greek: thank you for creating the samples. These look like normal Ancient Greek words that can be presented under ==Ancient Greek== with a label {{lb|grc|Medieval Greek}}, using the vast infrastructure that has already been developed for Ancient Greek.
Note that there are no gaps. All Greek words until 1453 can currently be entered under code grc (we can raise the cut-off date to 1663). All Greek words after that can currently be entered under code el, with a label {{lb|el|obsolete}} if not used anymore. I understand that Medieval Greek under Ancient Greek is somewhat oxymoronic, but language names are conventional, we don't need to pay attention to that formal contradiction.
If we are to carve gkm out of grc and el, then there should be practical benefits in terms of organizing and presenting information. One benefit I can think of is allowing the normalizing of polytonic spellings to monotonic for gkm, like Kriaras does (that is not acceptable for grc). That way we can freely use the monotonic headwords and quotations found in Kriaras without having to find and restore the polytonic spelling. Another benefit is presenting the vulgar alternative spellings found in medieval texts in the ===Alternative forms=== section of ==Medieval Greek==, like αφθαλμός, εφταλμός, ουφθαλμός for ὀφθαλμός. I assume most of our users would not appreciate seeing those barbarities in the Ancient Greek section. Vahag (talk) 14:42, 1 April 2023 (UTC)
Yeah, it's not clear to me what the reason they would have to be split is. Sarri, you said in your initial post that if we "make Medieval Greek an autonomus language section[,] not many lemmata would be added, but I feel that a gap of 1,000 years (6th to 17th century) is a somehow serious omission", but that seems like a misunderstanding, there shouldn't be any omissions at present, everything is entered as either Ancient or modern Greek depending on when it's attested relative to the cutoff between the two. And it seems from Vahag's comments like many duplicate entries would be added if we split, since all the words that just continued to be used from ancient through medieval times would be duplicated (since, as Vahag said, we add what's attested, not only what other dictionaries have, and hence not only words first attested in the medieval period). It is unfortunate we don't have many active editors familiar with Ancient Greek who could weigh in; most of the discussion seems to be about what to call the lect (even if it's just an etymology-only language), but other than the proposer and Al-Muqanna is anyone else supporting a split as opposed to just commenting, like me and Mahagaja, that they aren't in the best position to judge whether it's necessary or not? And if so, what's the rationale for the split? (My comment above was just to say that I don't think "it's hard to tell when another language borrowed a term" is an impediment to splitting the lects, but I'm not seeing what internal changes would necessitate a split.) From Vahag's comments it seems like a lot of the vocabulary is unchanged, just in some cases with semantic changes which could be handled via labels like they should already be. - -sche (discuss) 02:01, 1 April 2023 (UTC)

@-sche. I am not a linguist. If the dictionaries {{R:Kriaras Medieval}} for 1100-1669 and {{R:LBG}} for the early period (9th to 11th century), and the 2019 Cambridge Grammar [7], do not suffice as justification for a separate language section, what could I, a little editor, answer you? Or what could I say to all the administrators of en.wikt, which prides itself on its accuracy and its plurality, covering thousands of languages? ‑‑Sarri.greek  I 02:10, 1 April 2023 (UTC)

To the future

I am certain that gkm will inevitably become an autonomous language one day, which is the correct thing. A Hellenist interested in it may arrive here in some years, perhaps in some decades. α! φίλε μου (ah, my friend), I apologise for being so inadequate! ‑‑Sarri.greek  I 14:49, 1 April 2023 (UTC)

Change Proto-Mon-Khmer to Proto-Austroasiatic

Proto-Mon-Khmer is deprecated. The name of Category:Proto-Mon-Khmer language needs to be changed to Category:Proto-Austroasiatic language, just like how we have Category:Proto-Sino-Tibetan language rather than Category:Proto-Tibeto-Burman language. See the Wikipedia article on Austroasiatic languages to get an idea of why Mon-Khmer is no longer valid, because Munda and Nicobarese are simply regular branches that are sisters of the other so-called Mon-Khmer languages. So how can this name change be done? Ngôn Ngữ Học (talk) 21:51, 18 March 2023 (UTC)

@Ngôn Ngữ Học Normally this would be handled at Wiktionary:Requests for moves, mergers and splits.
Thanks, I have just moved this discussion to Wiktionary:Requests for moves, mergers and splits. Ngôn Ngữ Học (talk) 22:19, 18 March 2023 (UTC)
@-sche, PhanAnh123, Patnugot123 This suggests we need to rename the Proto-Mon-Khmer lemmas to Proto-Austroasiatic. Can they simply be renamed or do they need updates to the reconstructed forms? I don't know a damn thing about these languages but I can help with bot stuff. Benwing2 (talk) 22:07, 18 March 2023 (UTC)
@Benwing2 They can simply be renamed. Category:Proto-Sino-Tibetan language is a perfect example of this. The Proto-Sino-Tibetan lemmas are actually all Proto-Tibeto-Burman reconstructed forms by James A. Matisoff, who considers Tibeto-Burman to be a branch of Sino-Tibetan. Now, more scholars are thinking that Chinese is simply another regular sister branch of the various Sino-Tibetan languages out there, rather than its own special branch. Same goes for Mon-Khmer. Ngôn Ngữ Học (talk) 22:18, 18 March 2023 (UTC)

Remove language name from reconstructed entries' title

I propose we move our reconstructed entries from Reconstruction:Langname/entry to simply Reconstruction:entry (and merge consequential homographs). Note how we currently have an L2 header repeating what the title already says. Nowhere else in the project (mainspace, citations, thesaurus, ...) are language names in the title of an entry. This would make it easier to look up the terms in the search bar, and setting R: as shorthand for Reconstruction: would help even more. The current situation feels like a remnant of when these entries used to be in the Appendix namespace. Catonif (talk) 09:25, 19 March 2023 (UTC)

  • Support, though I'll defer to editors who regularly edit within the namespace. — excarnateSojourner (talk · contrib) 04:25, 12 April 2023 (UTC)
  • Support. Not only is it inconsistent, but it means special logic has to be used in (e.g.) Module:links, and - much more of a headache - in the lite templates. We’re fast approaching some of the non-Lua page limits on certain pages with a lot of lite templates, and one reason for that is the process of adding the language name for every reconstruction link. Removing that would reduce load time and add a buffer against the limits on those pages. Theknightwho (talk) 16:34, 12 April 2023 (UTC)
Support if no issues result from this. Nicodene (talk) 16:43, 12 April 2023 (UTC)
Oppose, reconstruction namespaces are generally very large already and a reconstruction like *a would have to host so many entries it will quickly follow the fate of its attested counterpart. If anything, we should think about splitting the mainspace. Thadh (talk) 17:01, 12 April 2023 (UTC)
Are they, compared to pages like para? Theknightwho (talk) 17:20, 12 April 2023 (UTC)
Proto-Sino-Tibetan *ŋa, Proto-Indo-European *bʰer-, Proto-Athabaskan *tuˑ; Yes. Thadh (talk) 17:51, 12 April 2023 (UTC)
@Thadh None of those even come close. Theknightwho (talk) 11:11, 13 April 2023 (UTC)
To one section? Thadh (talk) 11:36, 13 April 2023 (UTC)
They are just one section, but I’m sceptical that any combined reconstruction pages would have the 1,000+ template calls that we see on the mainspace pages that are causing us headaches. Theknightwho (talk) 11:43, 13 April 2023 (UTC)
Oppose, per Thadh. I don't see why we'd merge them. — Fenakhay (حيطي · مساهماتي) 17:10, 12 April 2023 (UTC)
Oppose. Vininn126 (talk) 17:16, 12 April 2023 (UTC)
Abstain. I kind of like being able to search for talk pages of only specific reconstructed languages (e.g. all Reconstruction talk:Proto-Algonquian/), but that's a minor issue. If I recall correctly, this was proposed years ago and one of the objections raised then was that reconstruction entries' orthographies are somewhat arbitrary so we'd be putting e.g. Proto-Algonquian terms with θ or x on the same pages as Proto-Germanic (etc.) terms with those letters as if those two languages both had a term spelled or pronounced that way, when in fact neither term was written that way historically and they represent completely different sounds. I don't consider that overly persuasive, since with e.g. cag, g likewise represents something completely different in English vs Hmong (where it's not a sound at all but just indicating the tone of the preceding letters), but the argument was that at least there all the listed languages really do spell their terms that way. - -sche (discuss) 17:25, 12 April 2023 (UTC)
Oppose. There would be more language ambiguity if we remove the language name and more confusion than having a single language per reconstruction entry. Kwékwlos (talk) 18:04, 12 April 2023 (UTC)
But we do this in mainspace already... It creates more confusion to have two different ways of doing it. Theknightwho (talk) 18:21, 12 April 2023 (UTC)
This is likely something that'd need to go to a full vote. I'm abstaining for now, but I do think that the Lua issues should be taken into account, along with readability issues of entries like a. AG202 (talk) 18:14, 12 April 2023 (UTC)
Abstain for now. I would like to see some data on how many reconstruction entries would be merged due to this (and hence may lead to readability/Lua memory issues). Wpi31 (talk) 18:33, 12 April 2023 (UTC)
Oppose Would mess with my reconstruction page tracking and scraping, and I think it would hurt SEO results, for whatever that's worth. Also, don't want to give people any Proto-World ideas, LMFAO, and what Thadh said. -- Sokkjō 19:27, 12 April 2023 (UTC)
Oppose. The technical argument is not convincing - the real way to get rid of this 'special case' would be to retire the Reconstruction namespace entirely and have e.g. Reconstruction:Langname/entry directly under *entry, but I cannot support that either. — SURJECTION / T / C / L / 20:01, 16 April 2023 (UTC)

To hyphen or not to hyphen

In Bantu languages, there are many morphemes that are technically prefixes (they go before the content-word stem), but cannot be at the beginning of a word. All sources that I know of write these with hyphens both before and after, as -me-. It seems, however, that the editors of the Nguni languages have lemmatised many such morphemes with only a hyphen after, such as sa- (etymology 2). What is the “canonical” way of doing this here at Wiktionary? The entry layout page is vague (“where it links with other words”; these morphemes link with other affixes, though, not always with actual words). I’d prefer two hyphens, as these morphemes must attach to something both in front and after, and that’s what many (all?) sources do. But there’s a lot of precedent against it. MuDavid 栘𩿠 (talk) 07:18, 21 March 2023 (UTC)

@MuDavid If all sources use hyphens on both sides, we should do the same (and these should potentially be reclassified as interfixes). Benwing2 (talk) 22:28, 23 March 2023 (UTC)
I agree, if they can't go at the start of the word then it makes sense for there to be hyphens before and after. —Al-Muqanna المقنع (talk) 23:48, 23 March 2023 (UTC)
Thank you for the answers!
@Benwing2 According to both Wiktionary and Wikipedia (the sources of all knowledge ☺) an interfix is a meaningless morpheme, which is not the case here. I’m still looking for a better term; I’m tempted to make -me- a particle like Vietnamese đã, given that they function exactly the same except for the presence of whitespace. MuDavid 栘𩿠 (talk) 01:57, 24 March 2023 (UTC)
The term in this case is infix, which often gets confused with interfix for understandable reasons but is also an admissible part of speech (see WT:POS, Category:English infixes etc.). —Al-Muqanna المقنع (talk) 02:05, 24 March 2023 (UTC)
The word infix normally means a morpheme that goes inside another morpheme, which is not the case here either: -me- goes between different morphemes. Many sources call it an infix and we also used to, but I’m not sure we should. MuDavid 栘𩿠 (talk) 02:34, 24 March 2023 (UTC)
There is prescriptive disapproval of this application, for sure, but ultimately Wiktionary decides for itself how to use particular terms when other sources differ in their usage, and our own glossary definition atm is that an infix is a morpheme inserted inside a word, not specifically in another morpheme. In that case the IP's justification for changing the POS header there ("infixes go inside stems/root not simply inside words") was simply ignoring how the term's been defined for in-house use. I can also think of counterexamples I've come across in reference grammars for other languages myself, e.g. Chinese 得 being defined as an infix in constructions like 聽得懂/听得懂 (tīngdedǒng). "Prefix" or "Infix" are both comprehensible in this case IMO (and I don't think "Prefix" has to mean no hyphen at the front). My only suggestion would be to avoid a bespoke option not used in the sources like the "Particle" idea, which is likely to be opaque to readers. —Al-Muqanna المقنع (talk) 03:07, 24 March 2023 (UTC)
I agree with Al-Muqanna regarding the scope of infix: perhaps some works prescribe a narrow definition, but in practice (things described as) infixes are clearly not limited to being inside a morpheme, even in works about grammar. (All of Wikipedia's examples of English infixes are not limited in that way, e.g. hiphop hizouse or Homeric edumacation.) To the immediate question about Bantu, though... how do works about Bantu describe these? As prefixes? It's apparently possible for something to be a prefix even if it can't appear at the beginning of a word: compare Wiktionary:Tea room/2023/March#-nil-, and various Category:Navajo prefixes like -ł-, -∅-, and -ba. (OTOH it is also conceivable that some of the Navajo affixes are mislabeled.) - -sche (discuss) 22:02, 24 March 2023 (UTC)
Trask defines infix as “An affix which occupies a position in which it interrupts another single morpheme”. If we use a different meaning, we should definitely say so more clearly in our glossary.
I’ve searched some works on Bantu grammars, and it seems some conspicuously avoid calling those anything but “markers”, but the term infix is also common, see for example here. MuDavid 栘𩿠 (talk) 02:37, 27 March 2023 (UTC)
"Infix" is common in Bantu linguistics, or at least it used to be, but that usage conflicts with normal usage of the term. AFAIR, and pace Trask, in general an infix does not necessarily appear inside a morpheme, but must appear within a stem. That is, it applies when you derive one word by adding an affix between the morphemes of an existing word. (Generally this would also allow appearance within a morpheme, but there may be exceptions.) But a sequence of affixes does not turn the internal affixes into infixes. I'd oppose labeling these "infixes" because (1) it is technically incorrect, and (2) it may be passing out of favor even within in-house Bantuist convention.
I agree however that we should write two hyphens. That does not imply that the element is an infix, only that it is affixed to something on both sides. kwami (talk) 08:41, 30 March 2023 (UTC)
If “infix” is passing out of favor among Bantuists, what do they call it instead then? For example, what do they call it when a morpheme is inserted in between two different stems? (It happens in Swahili, where the -o of reference is suffixed to auxiliary verbs, and some of these grammaticalized and ended up attached in front of their main verb.) MuDavid 栘𩿠 (talk) 09:39, 30 March 2023 (UTC)
As for passing out of favor, you'll have to ask the people above who said that. The older lit I'm familiar with usually does call them infixes, but this has long been criticized by linguists who work on languages that actually have infixes. IMO Wikipedia should not intentionally misuse technical terminology. We could call them "Bantuist infixes", I suppose, so as to not mislead the reader into thinking they're actually infixes.
Yes, the suffix to the compounded auxiliary is an interesting complication. I don't know that there is a term for it, but it's certainly not an infix, as that would mean the auxiliary had been compounded to the main verb first, to form a functional word of its own, and then that that verb was further derived by inserting the relative suffix. kwami (talk) 23:26, 30 March 2023 (UTC)
Nobody “above” said that infix is passing out of favor. I found some sources that call them markers and concords, which is what they are; that does not mean they are called prefixes now.
Words can have different meanings in different fields of study. If you cannot accept that, stay out of language, of all things. “Infix” may be limited to what you say in non-Bantu linguistics, but that does not mean it cannot have other meanings in other fields.
If you’re unable to come up with a solution for infixed suffixes, I’ll stick with the solution that is generally accepted and continue calling them infixes. And to “not mislead the reader into thinking they're [what pedants call] infixes”, that’s only a question of expanding our glossary definition. MuDavid 栘𩿠 (talk) 03:44, 4 April 2023 (UTC)
Would anyone else like to weigh in here? (The only other user who comes to mind as knowing about Bantu is @Metaknowledge, who has sadly been inactive of late.) Benwing is OK with 'infix' (User_talk:MuDavid#Swahili_infixes), MuDavid is going with 'infix', and although I have no strong feelings, I'd go with whichever of our POS categories Bantu literature treats these as, which is apparently 'infix'. Kwami unilaterally changed the entries and categories to 'prefix', claiming there is a lack of consensus to continue using 'infix', and ignoring the rather more obvious lack of consensus for his own abrupt change away from the Bantuist-literature-standard term we've been using for years to the one he (alone?) prefers. Unless anyone else wants to weigh in in favour of 'prefix', I intend to undo that change on principle. (This is not the first time Kwami has tried to unilaterally implement his own preferences, claiming there was a lack of consensus to stop him, even if there was.) - -sche (discuss) 21:56, 4 April 2023 (UTC)
The infix entries were just made while this discussion was going on.
Anyway, if you have evidence that a Bantu language actually has infixes, then sure. Hadza is in the area, and has both infixes and suffixes. But if you're calling suffixes "infixes" because that's the local convention, we have a problem: Wiktionary is not about local conventions. So we will have a conflict between RS's that they're both infixes and affixes, and the consequent possibility of duplicate entries and requests for merging, when we could simply limit the label "infix" to infixes in the first place. I mean, if we had a RS that called verbs "action words" and nouns "thing words", we wouldn't add those as POS labels and have the same word twice, once as a "verb" and once as an "action word". kwami (talk) 22:03, 4 April 2023 (UTC)
David Crystal (2008) A dictionary of linguistics and phonetics, Wiley-Blackwell pub., defines an "infix" as:
A term used in morphology referring to an affix which is added within a root or stem. The process of infixation (or infixing) is not encountered in European languages, but it is commonly found in Asian, American Indian and African languages (e.g. Arabic).
Hadumod Bussmann (1999) Routledge Dictionary of Language and Linguistics defines an "infix" as:
Word formation morpheme that is inserted into the stem, e.g. -n- in Lat. iungere ‘to tie’ vs iugum (‘yoke’) or the -t- in the reflexive function between the first and second consonants of the root in the eighth binyan of classical Arabic, cf. ftarag ‘to separate,’ ʕtarad ‘to place before oneself.’ Ablaut and umlaut are often considered infixes.
and under "affix":
infixes are inserted into the stem (e.g. -m- in Lat. rumpo ‘I break’ vs ruptum ‘broken’).
These are standard linguistic definitions, and Bantu agreement prefixes do not match. Indeed, Bussmann in summarizing Bantu languages says,
Complex verb morphology (agreement prefixes, tense/mood/polarity prefixes, voice-marking suffixes)
that is, labeling the morphemes in question as "prefixes".
I could easily find more, including professional descriptions of individual Bantu languages. If we insist on labeling the Swahili perfect morpheme -me- an "infix", then following RS's we would need a second perfect morpheme -me- labeled a "prefix". kwami (talk) 22:22, 4 April 2023 (UTC)
You still have not provided any reference work (let alone sufficiently many to establish consensus) describing the -o of reference as a “prefix” when this is suffixed to an auxiliary and followed by some other verb complex.
And don’t lie, Kwami. The “infix entries [] just made while this discussion was going on” were made after the discussion died down and before you revived it, the Category:Lingala words infixed with -el- was created in 2018, and @Metaknowledge created -me- with the “infix” header in 2016. MuDavid 栘𩿠 (talk) 01:12, 5 April 2023 (UTC)
Were made without any consensus or much discussion at all, in addition to being demonstrably wrong.
Finding a morphological classification for the -o- won't be easy. It's obviously not an infix; I suspect it will just be a suffix to a compounded aux (that is, AUX-o+VERB, not -o-), but it will probably take some time to find something. kwami (talk) 02:18, 5 April 2023 (UTC)
@MuDavid, can you give me specific examples of the AUX-o-VERB construction in question, or the equivalent in other Bantu languages? It's hard to address this without something concrete to base it on. kwami (talk) 03:37, 5 April 2023 (UTC)
Wait wait wait, you mean you made those edits to -ye- and its ilk without having the faintest idea of what it is or how it works? You had all this discussion without the slightest idea of what you’re talking about? That’s, erm, audacious is the only polite word I can think of. Do you even know anything of Bantu languages at all besides terminology nitpicking?
Well, before my patience runs out:
  • watakao kula 'they who want to eat'
  • watakaokula 'they who will eat'
How is -o- prefixed to -la here? Or:
  • watakacho kukisoma 'they who want to read it'
  • watakachokisoma 'they who will read it'
How is -cho- prefixed to -soma? MuDavid 栘𩿠 (talk) 01:50, 6 April 2023 (UTC)
I was wondering if maybe you meant -taka-, but that is not an auxiliary. It was one historically, but now it is simply a TAM prefix. You said "sometimes". "Sometimes" doesn't mean "all the time", which is what we have here: -taka- is always used for the future tense: it forms a paradigm with wanaokula and waliokula. So, yes, historically this derives from -o as a suffix to an auxiliary verb, watakao kula. But that's -o, not -o-. -o- is the same morpheme in wa-taka-o-kula as it is in wa-li-o-kula. The fact that the historical origin of -taka- from "want" is more transparent than that of -na- or -li- doesn't change the analysis of the -o-.
BTW, -li- and -na- have the same stress assignment as -taka-, which is why you will frequently see these written as wanao kula and walio kula, both in Latin and in Arabic script.
So, if you wish to analyze wanao kula as two words, AUX and lexical verb, then -o is a suffix to the AUX. If you wish to analyze it as a single word, then -o- is one of a string of prefixes to the root. -taka- is more complicated semantically, because if you analyze the above as two words, wanao kula, then you have watakao kula as two lexical verbs (those who want to eat) vs watakao kula as AUX + lexical verb (those who will eat). But that's no different in principle from English "will" as AUX vs lexical verb, or "going to" as verb of motion vs lexicalized as intention or a prospective aspectual marker, and doesn't affect the status of -o.
Anyway, -o- cannot be an infix in watakaokula, because there is no *watakakula for it to be infixed into. That's assuming that, like me, you accept the more lenient definition that an infix can appear between morphemes in a stem; the standard definition requires an infix to be placed inside a root. There are, BTW, true infixes in Bantu languages -- you can find examples in Nurse & Philippson. For example, David Odden says of Rufiji-Ruvuma,
Polysyllabic roots infix the vowel i, as in tukuta → tukwiite 'run'.
That's an infix. kwami (talk) 02:07, 6 April 2023 (UTC)
Did you even read what I wrote above? I’m perfectly aware that -taka grammaticalised and became a TAM marker (to use better terminology). The details of the grammaticalisation depend on the stress: without a suffix it becomes -ta-, with a suffix it remains -taka-. (And the etymology of -na- and -li- is not less transparent if you know basic Swahili. That’s just the preposition na and the stem -li of -wa.)
Anyway, you’re saying that the “analysis” depends on white space. More than one linguist I have interacted with in the past said that speech is primary, which means white space may not change your “analysis”. Swahili doesn’t put white space everywhere, Tsonga does. So what?
You still haven’t produced even a single reference work that explicitly “analyses” -ye- and its ilk as prefixes, while you did admit several times Bantu works in general call them infixes. MuDavid 栘𩿠 (talk) 02:36, 6 April 2023 (UTC)
Of course speech is primary. The white space reflects stress assignment, which is speech. Or at least that's Schadeberg's take on it. I was just pointing out that the way speakers write these constructions shows that they see them as being parallel, and that -taka- isn't an auxiliary that just "sometimes" happens to compound with the following verb.
Yes, some Bantuist works call them "infixes". But that's not specific to -ye-/-o-: all non-initial prefixes are called "infixes". -taka- is also an "infix". Watakaokula is a prefix wa- followed by two infixes, -taka- and -o-, and a root kula (unless you want to count the -ku- as a third infix). In Nurse and Philippson, there are places where they call something an "infix (prefix)", presumably because the material they're using calls the prefixes "infixes". There is plenty of discussion in the lit that, for cross-linguistic usage, calling non-initial prefixes "infixes" is not useful, because now you need a new word for infix. That terminology may function acceptably if your language has no infixes, but here on Wikt we have words from languages which do have infixes, and if we call prefixes and suffixes "infixes", what do we call the infixes? This is a terminological distinction, not a difference in analysis. kwami (talk) 02:52, 6 April 2023 (UTC)
Robert Botne ("Lega" in N&P) speaks of "relative prefixes". E.g., In object relative clauses, the relative prefix occurs in the SUB slot. In subject relative clauses, the prefix occurs in the SP slot, and Object relative constructions require two agreement prefixes, a relative prefix determined by the object noun, followed by a subject prefix determined by the agentive subject.
Here it's the relative that occurs initially and the subject that follows, so it's the subject prefix that would be the "infix" in the tradition you're referring to. The phrase "relative infix" does not occur once in the entire volume.
You keep speaking of the relative prefixes, as if they were morphologically distinct from the object and TAM prefixes that are also often called "infixes" in the Bantu tradition. Can you cite anyone that distinguishes them, object "prefix" -o- vs relative "infix" -o-? If not, then you have no RS for the distinction you draw, and we're reduced to the oft-noted fact that non-initial prefixes have often been called "infixes" in Bantu linguistics.
BTW, that used to be the case more generally in linguistics, but once true infixes started to be found, people left off calling prefixes and suffixes "infixes". That development came late to Bantu linguistics, presumably because infixes are marginal among Bantu languages (and don't occur at all in most). kwami (talk) 03:12, 6 April 2023 (UTC)
Sigh. Could you pleeease read the context there? The “SUB” slot is the very first one. The “SP” slot is the first one that’s required. So Lega actually has relative prefixes. Sweet, but has no bearing on Swahili. You’re just wasting my time. MuDavid 栘𩿠 (talk) 03:28, 6 April 2023 (UTC)
Could you please read what I wrote about the context? Yes, we have relative prefixes here. But then we have subject prefixes, which by your definition are infixes. To repeat myself, do you have a RS that Swahili relative infixes are more infixy than Swahili object and TAM infixes? That somehow they're the real infixes, so even if the object and TAM infixes are actually prefixes, the relative infixes remain infixes? That's the gap in your argument I should have started with. kwami (talk) 03:32, 6 April 2023 (UTC)
I just came across this in an older source:
[Swahili] reciprocal verbs are usually followed by "na" (with) reminding us of the frequent English prefix (or infix) "con".
with reconciled as an example of -con- as an infix in English. This is the same tradition as that of the Bantu 'infix'. Shall we create an entry for the English infix -con-? These are not infixes by modern linguistic definition, and do not fit the definition of 'infix' either here on wikt or on WP, or in modern treatments of morphology or linguistic dictionaries. kwami (talk) 03:38, 6 April 2023 (UTC)
Okay, a couple things I've found so far, specifically for Swahili and specifically for the relative prefixes (though AFAICT there is no relative–non-relative distinction in the use of 'infix'):
Joan Maw (1999) Swahili for Starters
"Notice that any object prefix comes after the relative prefix."
"In the case of a compound verb, the object prefix occurs only in the main verb, in contrast to the relative prefix, which occurs only in the auxiliary."
Example: kitabu ninachokisoma 'the book which I am reading' vs kitabu nilichokuwa ninakisoma 'the book which I was reading'
Ultimate Swahili Notebook (2020)
"Verb structure in Swahili: Subject prefix - TAM prefix - Relative prefix - Object prefix - Verb stem - Extension(s)"
"Relatives are verbs used as adjectives by being relativised using a relative prefix (or suffix) which agrees with the noun's class."
Example: ndege aliyekufa.
Edward Steere & Augustine Hellier (1934) Swahili Exercises
"The use of the relative prefix in the verb will no doubt be found difficult."
Mühlhäusler, Ludwig & Pagel (2019) Linguistic Ecology and Language Contact. CUP.
"Sheng appears to be losing or to have already lost the subsystems of the object prefix and the nominal relative prefix."
[U.S.] Foreign Service Institute, Earl W. Stevick (1966) Swahili: An Active Introduction
"This form is characterized by a 'relative prefix,' which stands between the tense prefix and the object prefix (if any). The relative prefixes all contain /-o-/, except for the third person singular personal relative prefix, /-ye-/."
Examples: wanaokaa 'those who live', anayekaa 'he/she who lives'
"The relative of the /ta/ tense has /taka/ plus the relative prefix."
"The relative prefix /po/ (Class 16) is often used without any special Class 16 word before it."
Example: Utakapofika penye mto ... 'When you arrive at a stream ...'
So there you have a relative 'prefix' even after the TAM prefix -taka-. And neatly using the double hyphen that was the original point of this thread. kwami (talk) 04:08, 6 April 2023 (UTC)

@-sche, I’m tired of this pointless discussion. Care to judge? Or how do you suggest we move on from here? MuDavid 栘𩿠 (talk) 02:35, 11 April 2023 (UTC)

Agreed. I provided you with exactly what you asked for, which makes any further argument pointless. kwami (talk) 02:43, 11 April 2023 (UTC)
I really hoped more people would weigh in here, but in the absence of that, as said above, there was no consensus for the one editor to change these to prefix, against other editors saying infix, so yes, I will revert the changes. - -sche (discuss) 23:09, 11 April 2023 (UTC)
We have multiple sources that these "infixes" are really prefixes. I concentrated on the specific case of the relative prefixes, because that's what MuDavid was most concerned about, but there are additional RS's for the Swahili concord prefixes in general. If an "infix" on Wiktionary is a non-initial prefix simply because that's what some sources call them, then for consistency we need to add a duplicate entry for English con- as an "infix" because that's what some sources call it in words like reconcile. Shall I start a proposal on a mass creation of English "infixes"? kwami (talk) 23:17, 11 April 2023 (UTC)
I'd think even the scholars who consider -con- in that word to be an infix would agree that there's a difference between it and the more traditional infixes of some other languages, which occupy inflectional slots, not derivational ones, and which in most cases can be omitted and still leave behind a grammatical word. By contrast, there is no *recile. Soap 16:43, 12 April 2023 (UTC)
The -con- argument is completely irrelevant; that’s just a strawman of Kwami’s. Reconcile is a borrowing from Latin, not an English word with affixes, and there is a word conciliō in Latin, such that it’s “better” (if such a thing exists) to analyse con- (just like Swahili subject concord) as a prefix. There are, however, no Swahili verb forms *okula, *takaokula, *chokisoma, or *takachokisoma. That’s why Bantuists generally call these infixes, unless they call them something more specific like “concord”. (And no, I don’t consider language learning materials to be authoritative.) MuDavid 栘𩿠 (talk) 02:13, 13 April 2023 (UTC)
"Infix" for such things is dated in Bantuist material, just as it is for reconcile in English. That's why I brought up the latter. kwami (talk) 02:15, 13 April 2023 (UTC)
  • They're prefixes. Infixes are squeezed into single morphemes, this does not happen with the Bantu prefixes in question. The only special thing about them is that they are subject to positional constraints. You find similar things in Athabaskan or Northwest Caucasian languages, and all these tiny morphemes concatenated before the root are generally called "prefixes" in the literature, even if they are obligatorily preceded by other prefixes. Search for "Navajo"+"Classifier prefixes" vs. "Navajo"+"Classifier infixes" in Google, and you'll see what I mean. The only attestations for the latter are from Wiktionary talk pages(!)
    The same thing is found in IE languages with thematic suffixes; many are obligatorily followed by case-marking suffixes, yet hardly anyone speaks of thematic "infixes". Austronesier (talk) 20:41, 12 April 2023 (UTC)
    And that’s the difference with Bantu infixes: if you Google for them, you’ll see it’s *very* common terminology, as even Kwami pointed out several times above. (Furthermore, infixes do not need to go into single morphemes, as even Kwami conceded above. And from what I gather, it appears Navajo classifiers are not used in the same way as Bantu infixes, in that Navajo classifiers must be glued to the verb stem. But as I don’t know Navajo and it seems you don’t know much about Bantu, that’s something we won’t be able to debate here.) MuDavid 栘𩿠 (talk) 02:31, 13 April 2023 (UTC)
No, I use the term "infix" for an affix placed within a stem, but that's not the formal definition in modern linguistics.
What you're arguing is that "infix" should mean different things in the context of different languages, based on common usage, so it shouldn't matter if you know Athabascan or if Austronesier knows Bantu. But if you're going to go by common usage, why shouldn't learning material be authoritative? Doesn't that reflect common usage? kwami (talk) 02:34, 13 April 2023 (UTC)
("I don’t know Navajo" → "it seems you don’t know much about Bantu") That's an interesting piece of deduction, but well, anyway...) I know enough about linguistic typology and the morphological structure of Bantu person-marking and TAM morphemes like Swahili me- to say that they are not infixes in the common cross-linguistic sense of the word. I am aware of the common parlance used in scholarly and non-scholarly descriptions of Bantu languages, but that is "insular" usage. Insular usage is very common in descriptive linguistics and perfectly fine within its context, but once you leave the bubble you should at least be aware of it. And also consider how to deal with it in Wiktionary, which is the main purpose of this discussion. If insular terminology is not in conflict with general usage because it never appears outside context, like "screeve" in the description of Georgian, there's no problem to use it here. In the worst case, a reader says 'Wtf is "screeve"?', looks it up and gets a clear definition. Otherwise, it might not be useful to follow insular usage. E.g. to use "focus" in the dated usage in grammatical descriptions of Philippine languages, which is not focus in the common and also specialized linguistic sense of the word (luckily, Philippinists/Austronesianists stopped using it the wrong way). Or the traditional way of calling certain prefixes "infixes" in the description of Bantu languages. On the one hand, it creates an inconsistency within Wiktionary (and also cross-linguistically), but on the other hand increases recognizability of our data when compared with much of the existing specialized literature. This is the principal issue for consideration here.
(Just out of curiosity: if you wanted to submit a paper for publication in a work that follows Leipzig Glossing Rules, would you really parse nimesoma as ni-<me>soma with the "infix" <me> in angle brackets per rule 9?) Austronesier (talk) 13:05, 13 April 2023 (UTC)
(It’s difficult to discuss with people who make up deductions I never made and who scoff at the study of the languages of half a continent and 350 million people as “insular”, but well, anyway…)
You can argue all day long that -me- and -cho- are not infixes, that does not automatically make them prefixes. You can easily find tons of papers online arguing for different analyses: that -me- should still be analyzed as an auxiliary despite having been phonetically reduced and orthographically glued to other words, that -cho- is a suffix even when in infix position, etc. Heck, there’s even people arguing that Swahili subject prefixes are actually pronominal clitics; this may not be widely accepted, but it does show the situation is more complex than shouting “they aren’t infixes!”
So, does anyone have anything constructive to add? MuDavid 栘𩿠 (talk) 03:41, 19 April 2023 (UTC)
Do you have any RS's to support your claims, given the general linguistic consensus that these affixes do not meet the modern international definition of 'infix'? It's not up to me to prove that you're wrong, it's up to you to demonstrate that you're right. kwami (talk) 04:49, 19 April 2023 (UTC)

Getsnoopy and canonical forms

Getsnoopy changed center to an alternative form (“Made this the alternative form, since the previous alternative form labels were not exhaustive, and this one is the exception to the rule.”) of centre (“Made this the alternative form, since the previous alternative form labels were not exhaustive, and this one is the exception to the rule.”) on 19 March.

Likewise with airplane (“Made this an alternative form.”; “Pointed the translations to the canonical spelling entry.”) and aeroplane (“Removed the labels, as those labels are not exhaustive (it's much easier to list the exceptions to the rule), and moved the translations here.”) on 14 January.

And analyze (“Made this the alternative form.”) and analyse (“Made this the canonical form.”) on 23 November 2022.

Should the edits be reverted?

Note that I blocked this user from Category and Module namespaces on 12 October 2022 for changing spellings of categories (also in July; see Category talk:Theater). J3133 (talk) 20:20, 21 March 2023 (UTC)

@J3133 Yes, please revert all these changes and warn this user not to make Pondian changes in the future. I don't know what their logic is but a cardinal rule at Wiktionary is not to try to impose either American or British spelling on entries. I consider it acceptable e.g. to modify a definition to add the other spelling after the existing spelling (e.g. if an Italian definition says "aeroplane tyre" I think it's fine to modify it to say "aeroplane tyre/airplane tire" or similar) but not to simply replace "aeroplane tyre" with "airplane tire", and especially not to reverse the direction of canonical vs. alternative forms. Benwing2 (talk) 22:26, 23 March 2023 (UTC)
@J3133 & @Benwing2 It's not about "picking" one or the other; it's that the labels that are required to enumerate the spellings used outside of the US (and sometimes Canada) are numerous, and it usually results in the labels either not being exhaustive or being tedious to add and to read. For example, many of the entries have labels like "UK, Ireland, South Africa, Canada, Australia, New Zealand". Not only is this list not exhaustive, but it clutters the page. For the reverse, however, it's almost always only "US" or "US and Canada", and that's it. Hence, it's a much more elegant solution.
It makes little sense to have definitions listed in the exceptional/minority case as the canonical entry and then have all of the normal/majority cases listed as variants. Either way, this issue needs to be discussed and solved. Getsnoopy (talk) 19:05, 24 March 2023 (UTC)
@Getsnoopy The US and Canada are not the "exceptional/minority case" and the consensus for not changing Pondian choices is long-established. Please don't try to disturb this consensus or you may end up blocked. Benwing2 (talk) 19:15, 24 March 2023 (UTC)
The US makes up 4% of the world and 26–27% of the English-speaking world, so very much in the minority. Could you point me to where this consensus was established? And there's still the issue I brought up that needs to be addressed. Getsnoopy (talk) 19:44, 24 March 2023 (UTC)
@Benwing2, J3133: Obviously Getsnoopy should not be moving these around but I do think there needs to be a better solution for the label issue. If we're just being exhaustive there are plenty of countries with their own English dialects that use the -re spelling, like Singapore and India, which are arbitrarily excluded at the moment. For an orthographic difference like this it might make sense to only list dialects that actually have distinct spelling standards, i.e. (as far as I know) British, Canadian, Australian, and NZ, and leave "British spelling" to cover Ireland and other places I've mentioned. —Al-Muqanna المقنع (talk) 19:57, 24 March 2023 (UTC)
Maybe we can add a label "Commonwealth spelling" (and the corresponding category "Category:Commonwealth English forms", a subcategory of "Category:Commonwealth English") along the lines of "British spelling" and "Category:British English forms". — Sgconlaw (talk) 20:12, 24 March 2023 (UTC)
@Al-Muqanna, Sgconlaw This sounds fine with me. Benwing2 (talk) 21:13, 24 March 2023 (UTC)
Works for me too. —Al-Muqanna المقنع (talk) 21:20, 24 March 2023 (UTC)

In the past, it's been pointed out that some countries which use "British"/"Commonwealth" spellings are not in the Commonwealth (and would not tolerate being called British), like Ireland (and there's also the issue of countries like Canada which are in the Commonwealth but don't always use its spellings), so people who just see a label displaying "Commonwealth" or "Commonwealth spelling/form" will still probably recurringly add Ireland (etc), or even change it to "UK, Ireland," etc to indicate Canada doesn't use that spelling. Perhaps we could make {{lb|en|Commonwealth spelling}} the input but have it display "Commonwealth, Ireland"? That would solve the Irish problem although not the Canadian one.
I had been going to suggest creating some shorthand input that would be quick to type (e.g. {{lb|en|Bspell}} or C-form or literally whatever) which could display and add categories for all the places that use what we're loosely calling the "British" or "Commonwealth" spelling, including any that are currently omitted as mentioned above, so that e.g. typing {{lb|en|C-form}} would display "Britain, Ireland, Australia, New Zealand, India, Singapore," etc, but the idea of just displaying something like "Commonwealth and Irish spelling" is definitely more succinct. - -sche (discuss) 21:49, 24 March 2023 (UTC)[reply]
I agree about the politics—for the spelling label (not for "Commonwealth English" in general) having it display "Commonwealth and Irish spelling" is probably the best way forward, since as far as I know they will pretty much always be the same. It might be worth thinking about options for things like Commonwealth-except-Canada the same way we have an option for non-Oxford British spelling, but that's less important. —Al-Muqanna المقنع (talk) 22:08, 24 March 2023 (UTC)
TIL Ireland is not a member of the Commonwealth … — Sgconlaw (talk) 04:50, 25 March 2023 (UTC)
The problem with doing this, like you mentioned, is that it's complicated in a lot of cases—both politically and factually. If we follow @-sche's approach, then it would completely clutter the interface. That's why it's much easier to have the majority spelling as the canonical one and have the exceptional ones as the variants that can easily be listed. I don't see why this would be controversial at all given that this is how many of the entries have been structured, which is why I employed this strategy. I didn't realize this would rustle so many feathers. Getsnoopy (talk) 02:58, 25 March 2023 (UTC)
It's quite obvious to you which variant should be chosen, but there have been people over the years for whom it's obvious to choose the other one- it depends on which criteria you decide are the most important. Making unilateral decisions like this runs the risk of endless edit wars. That's why we arrived at the arrangement years ago where the variant that's created first as a full entry is left as the main entry. As a result, we've gone a long, long time without any serious dispute. You're focusing on the ones where the US variant has priority, but I can tell you that I've reverted just as many edits that switched the priority to the US variant as the other way around.
If you want to make a major change like this you must get consensus first. Chuck Entz (talk) 03:44, 25 March 2023 (UTC)
But that is exactly what I'm referring to: let's just get multilateral consensus here. The numbers are very clear which variants are more popular than the others, and given that it's almost always in favour of Commonwealth spellings (just due to their sheer 3:1 outnumbering of US spellings), this would make the label issue very simple to solve. Of course, there are exceptions to this (e.g., program is one I can think of off the top of my head), where those would be the canonical ones. There shouldn't be any edit wars if the policy is made clear, and the policy is actually sound. With the "whoever created it first" policy, as is clear from the existence of this discussion, it's almost always going to be inaccurate or cumbersome somehow. It's almost always going to be easier to list the exceptions rather than exhaustively list the norms (which is a big factor for a community-driven platform like Wiktionary), so even from a "number of users willing to contribute and how much effort they have to put in" perspective, the solution I proposed makes more sense. Getsnoopy (talk) 21:23, 25 March 2023 (UTC)
You are trolling, please stop. Benwing2 (talk) 20:01, 26 March 2023 (UTC)
...? No, I am not. Are you? Getsnoopy (talk) 16:20, 28 March 2023 (UTC)
If you are not then you are incredibly misinformed and should not be making changes related to this. Your claims are also highly unsourced. Vininn126 (talk) 16:33, 28 March 2023 (UTC)
Which claims are you referring to? Getsnoopy (talk) 17:46, 28 March 2023 (UTC)
Prevalence of lects? Vininn126 (talk) 17:54, 28 March 2023 (UTC)
I often come across American spellings that haven't been labelled as such, probably because American editors don't recognise them as such, which could account for fewer entries with {{lb|en|American spelling}} compared to {{lb|en|British spelling}}. I label the American spellings as I come across them. -ise spellings are readily identifiable as British, -ize spellings are not, but are rarely used in Britain. I sympathise with the issue of American spellings, such as center, being usually treated as the main spelling; this can cause problems for British English editors. This is illustrated by draught, where I decided to add notes to quotations about which sense of draft they belong to. Another problem has been the use of {{lb|en|British}} where, in reality, it's a British spelling. DonnanZ (talk) 10:28, 29 March 2023 (UTC)
@Vininn126 I'm going by the English-speaking population sizes of countries, and that US spelling is only widely used (outside of the US) in the Philippines, Liberia, South Korea, Japan, China, Turkey, Israel, and maybe Saudi Arabia. Even adding in all the Latin American countries, if one tallies the English-speaking populations of those countries with the global count of ~1.2 billion speakers, the number is at maximum ~38%, which is still well in the minority. And the numbers that that article cites have been reported as being too low for many of those countries listed in that article, so the 37% number would be conservative (or generous). This is consistent with common knowledge that US English is a minor variant of English around the world.

A case for Proto-Yoruba and Proto-Edekiri language codes[edit]

Subject: applying to create Proto-Yoruba and Proto-Edekiri language sections - to serve as etymology languages.

Background[edit]

I have been doing a lot of work on Proto-Yoruboid entries and would like to see the creation of a Proto-Yoruba language. Proto-Yoruboid is the ancestor of the Yoruboid language family, which includes the Yoruba language (all the dialects of Yoruba), Igala, all the Ede languages, and Olukumi.

The issue is that this is not really a sufficient ancestry tree for the Yoruba languages.

Instead, Proto-Yoruboid actually split into Proto-Edekiri and Proto-Igala. Proto-Edekiri is also known as Proto-Yoruba-Itsekiri. I will first argue for Proto-Yoruba, but I'd like to see Proto-Edekiri considered as well, since it's usually reconstructed in data for Proto-Yoruba.

However, I do believe that Proto-Yoruba deserves a space on Wiktionary. Proto-Yoruba is recognized as the direct ancestor of the 5 dialectal groups that are lumped together as the "Yoruba language": Northwest Yoruba, Southeast Yoruba, Central Yoruba, and Northeast Yoruba, as well as the Ede language family. A lot of work has been done on the many dialects in each of these groups, and it's clear that they have a common ancestor.

Has the reconstruction of Proto-Yoruba been minimal? Yes. But there is a wide variety of papers that not only cite the existence of Proto-Yoruba, but reconstruct both its vocabulary and its phonology.

Proto-Yoruba phonology[edit]

In his paper On the high non-expanded vowels of Yoruboid, Hounkpati B. Capo discusses the evolution of Yoruboid vowels. There, he cites work from Akinkugbe 1976 in which a family tree was established identifying Proto-Yoruba as the direct ancestor of the Yoruba language and its dialects. According to Akinkugbe, CY [Central Yoruba]; NEY [North-eastern Yoruba]; SWY [South-western Yoruba], which is now regarded as the Ede language group; NWY [North-western Yoruba]; and SEY [South-eastern Yoruba] are all descendants of Proto-Yoruba.

To continue, Oyelaran 1973 & 1977 did early work on identifying the vowel system of Proto-Yoruba. This work was prompted largely by the wide variety of allophonic vowels in the Yoruba dialects, and the main question was whether Proto-Yoruba maintained the 9-vowel harmony system seen in Central Yoruba dialects, or instead a more restricted vowel system as seen in the Standard Yoruba language, Northwest Yoruba, etc. These papers (Fresco 1970, Oyelaran 1973, and Capo 1985) argue that Proto-Yorùbá did not have harmony in its high vowels and that high vowel harmony developed in the Central Yoruba dialects, but several linguists still believe that Proto-Yoruba and Proto-Yoruboid did have this vowel harmony and that it was instead lost over time.

Proto-Yoruba reconstructions[edit]

In Archaeology and Language IV: Language Change and Cultural Transformation by Roger Blench, we actually see some reconstructed vocabulary of Proto-Yoruba: *ɔ̀-cʊ̀kpá, from *ò-cù (seen in Proto-Yoruboid for moon) + *kpá (perhaps a suffix relating to shining). It also reconstructs Proto-Yoruba vocabulary like *bi, which is not seen in languages like Igala, supporting the view that Proto-Yoruba is different from Proto-Yoruboid and does deserve its own category.

However, the greatest evidence for Proto-Yoruba is almost certainly in "A comparative phonology of Yoruba dialects, Itsekiri, and Igala," by Olufemi Akinkugbe in 1978. There we see a reconstruction of three hundred basic vocabulary words for Proto-Yoruboid, Proto-Edekiri (then known as Proto-Yoruba-Itsekiri), and Proto-Yoruba.

Examples of reconstructions in Proto-Yoruba and Proto-Edekiri[edit]

Some examples: Proto-Yoruboid *ɔ́-bɪ̃̀lɪ̃ (woman) becomes *ɔ́-bɪ̃̀rɪ̃ in Proto-Edekiri and *ɔ-bɪ̃̀rɪ̃ in Proto-Yoruba. Proto-Yoruboid *ɔ́-bɛ (knife) stays the same as *ɔ́-bɛ in Proto-Edekiri and becomes *ɔ-bɛ in Proto-Yoruba.

The basic differences we see from Proto-Yoruboid to Proto-Yoruba are a change from high-tone prefixes to mid tone, a simplification of implosive consonants like /ɓ/ and /ɗ/, new roots not found in Proto-Edekiri or Proto-Yoruboid, and vowel changes, usually a shift from /i/ to /e/.

Yoruba would be listed under Proto-Yoruba, as would the Ede languages (Ede Ife, Ede Idaca, etc.). Proto-Yoruba-Itsekiri would include Proto-Yoruba as well as the Itsekiri and Olukumi languages.

Historical evidence[edit]

Akin Ogundiran's relatively recent book The Yoruba: A New History provides the first look into how these people would have actually migrated and where they may have lived, which also provides some nice context.

References[edit]

  • Akinkugbe, Olufemi. (1978). A comparative phonology of Yoruba dialects, Itsekiri, and Igala.
  • Blench, Roger. (1999). Archaeology and Language IV: Language Change and Cultural Transformation.
  • Capo, Hounkpati. (1985). On the high non-expanded vowels of Yoruboid.
  • Fresco, Edward M. (1970). Topics in Yoruba dialect phonology.
  • Ogundiran, Akin. (2020). The Yoruba: A New History.
  • Oyelaran, O. O. (1973). Yoruba vowel co-occurrence restrictions. Retrieved from https://www.proquest.com/scholarly-journals/yoruba-vowel-co-occurrence-restrictions/docview/1308685672/se-2

I'm also going to ping those who work on Yoruba and Yoruboid language lemmas @Egbingíga, @Oníhùmọ̀, @AG202. Oniwe (talk) 21:16, 21 March 2023 (UTC)[reply]

This looks great 👍🏿
However, my only question is: how do Isekiri and SEY fit into this Proto-Yoruba analysis? Numerous authors, including Akinkugbe, claim that Isekiri-SEY diverged from Proto-Yoruba-Isekiri, and not from Proto-Yoruba. It is quite apparent that in the southeastern varieties, particularly Ijebu, Standard Yoruba and other non-SEY varieties, which descended from Proto-Yoruba, have greatly influenced varieties of SEY, but these authors still claim that SEY doesn't stem from the proposed Proto-Yoruba language. The term Yoruba is applied to these varieties purely as a result of the politics within the region over the last two hundred or so years, with Isekiri avoiding such a fate due to cultural differences and its geographic separation from the rest of Yorubaland.
In Akinkugbe's 1976 paper he states, "The only plausible explanation we can offer for these common innovations between Iṣẹkiri and SEY is that the YIS [Proto-Yoruba-Isekiri] branch split into two an Iṣẹkiri/SEY branch and a YOR [Proto-Yoruba] branch. However, part of the evidence before us suggests that at a later point in time, because of the geographical contiguity of SEY with the rest of Yoruba, SEY converged with YOR, while Isekiri became relatively isolated and, partly due to Edoid influence, developed in a more divergent manner than SEY." Some of the "innovations" of which he speaks include the usage of ọ̀bọ̀n (ọ̀bù in Ijebu) in lieu of Proto-Yoruba's ọjà in all SEY varieties as well as Iṣẹkiri, the usage of ẹnẹ/ẹni for the first person plural in lieu of Yoruba's various wa forms, and the usage of àghan and its variations for both the 2nd and 3rd person plural, whereas varieties descended from Proto-Yoruba have distinct forms for both, in addition to a few other "innovations" that support his claim.
Overall, relying on Akinkugbe and other papers I've come across but can't seem to find 😅, it seems as though there is a consensus that Isekiri/SEY branched off from Proto-Yoruba-Isekiri and that the main clusters under Proto-Yoruba are Northwest Yoruba, Central Yoruba, Northeast Yoruba, and the Ede languages.
Akinkugbe's paper: https://main.journalofwestafricanlanguages.org/index.php/downloads?task=download.send&id=131&catid=31&m=0 Egbingíga (talk) 08:05, 23 March 2023 (UTC)[reply]
Thanks for your response; you make a great point about SEY. I think that since SEY doesn't have its own code, that doesn't make creating Proto-Edekiri much of an issue. However, in Akinkugbe's reconstruction it looks like she is suggesting that Itsekiri diverged from Proto-Edekiri and SEY simply diverged very early on from Proto-Yoruba, so that the time between Itsekiri and SEY is much closer than the rest of Proto-Yoruba. So think of a Proto-Yoruba-Itsekiri which became Proto-Yoruba, but SEY split off from Proto-Yoruba quite early.
But I guess your argument here is that Itsekiri and SEY have similarities that don't exist in Proto-Yoruba or Igala which would mean they had their own ancestor, Proto-Itsekiri-SEY, which may have been a dialect along with Proto-Yoruba, but were not Proto-Yoruba? I think I can support that but I don't know how or if we even need to make such a distinction when making Proto-Yoruba or Proto-Edekiri since SEY doesn't have its own language code. Oniwe (talk) 17:47, 23 March 2023 (UTC)[reply]
I believe such distinctions are necessary, especially for the SEY cluster, at least for it not to be grouped with the other clusters under Proto-Yoruba. I understand the lack of online infrastructure for SEY when it comes to language codes and hope that changes, but as you stated it doesn't make the creation of codes for Edekiri and Proto-Yoruba an issue. Egbingíga (talk) 19:24, 23 March 2023 (UTC)[reply]
Would it perhaps be better if we moved the Regional Yoruba category of Southeastern Yoruba to under Edekiri? Thanks for your input. Oniwe (talk) 19:56, 23 March 2023 (UTC)[reply]
That's a difficult question because there are two reasonable answers. One option is that yes, we should move it under Edekiri, as that is where it belongs from a purely linguistic viewpoint. On the other hand, in a modern political sense SEY is viewed as falling under Yoruba, and speakers of its various varieties see the cluster that way too (some even believe it to be under Standard Yoruba), primarily due to the political forces of the last 200 years, which is wrong. By keeping SEY under the Regional Yoruba category, I would assume it would be easier for speakers and those interested to access it, as that is where they would expect their variety to be, even if that does not reflect the historical linguistics of the variety. Egbingíga (talk) 06:30, 24 March 2023 (UTC)[reply]
I agree! I'd appreciate it if @AG202 and @Oníhùmọ̀ could add any input, perhaps on how you feel about SEY being considered under Proto-Edekiri and not Proto-Yoruba! Oniwe (talk) 04:00, 26 March 2023 (UTC)[reply]
@Egbingíga, @Oniwe: Apologies for the late response, I thought I had responded already. I'd support adding language codes for Proto-Yoruba & Proto-Edekiri. In terms of SEY, one workaround would be to have Yoruba have two ancestors officially listed (Proto-Yoruba & Proto-Edekiri) and then for SEY entries, they can show derivation directly from Proto-Edekiri, similar to how Norwegian Bokmål has two ancestors. AG202 (talk) 01:14, 10 April 2023 (UTC)[reply]
@Benwing2 Hi, this post is over a month old and the people who work on Yoruba here agree with the addition of these languages, so I'd appreciate it if you could make codes for Proto-Edekiri (Proto-Yoruba-Itsekiri) and Proto-Yoruba. Proto-Edekiri should be directly under Proto-Yoruboid and listed as an ancestor of Yoruba, Itsekiri, and Proto-Yoruba. Proto-Yoruba is a descendant of Proto-Edekiri and is the direct ancestor of Yoruba, the Ede languages, and Olukumi. Thank you!! Oniwe (talk) 19:02, 20 April 2023 (UTC)[reply]
@Theknightwho Oniwe (talk) 01:13, 24 April 2023 (UTC)[reply]
@Oniwe @AG202 What would the actual language codes be? I don't see anything on Wikipedia. Vininn126 (talk) 12:49, 24 April 2023 (UTC)[reply]
Not sure. How was the Proto-Yoruboid code made, since there was nothing on Wikipedia when that was made? Theoretically we could make one but I'm not sure of the process of that. Oniwe (talk) 13:39, 24 April 2023 (UTC)[reply]
I don't think Glottolog or ISO have one. It should be in-line with other langcodes that we use. Vininn126 (talk) 13:58, 24 April 2023 (UTC)[reply]
The naming format is the family code plus "-pro", which means we also need to create family codes for the families the proto-languages are the ancestors of (for which this describes the naming scheme), since we don't have a "Category:Yoruba languages" family yet. But we treat Yoruba as just one language; Wikipedia likewise treats "Yoruba" as a language, and redirects "Yoruba languages" to Edekiri languages. Are there any other languages in the "Yoruba languages family" of which "Proto-Yoruba" is the ancestor, besides Yoruba? I'm struggling to think of a situation where we create a family and Proto-Language for something we consider one language (even for Proto-Basque and the Vasconic languages, we're saying it's Basque + Aquitanian, though I know some people do reconstruct a Proto-Basque just from the dialects of Basque), so we should consider this carefully. When sources reconstruct a "Proto-Yoruba" as a different thing from Yoruba, how can we best handle this? Is it like Proto-Romance (handled in the reconstruction namespace as a reconstructed form of a language, Latin, rather than as a separate language), or should we indeed reconstruct a "Yoruba family" with only(?) Yoruba in it? - -sche (discuss) 14:45, 24 April 2023 (UTC)[reply]
@-sche: "Yoruba would be listed under Proto-Yoruba, as well as the Ede languages (Ede Ife, Ede Idaca, etc). Proto-Yoruba-Itsekiri would include Proto-Yoruba as well as Itsekiri and Olukumi languages." To rephrase (@Oniwe, please fact-check me):
  • The Yoruba languages family (alv-you? maybe?) would include Yoruba proper (yo), the Ede languages family (alv-ede), and Lucumí (luq, a descendant of Yoruba). Edit: Olukumi (ulb) as well.
  • The Yoruba-Itsekiri (or Edekiri) family (alv-edk or alv-yit or something) would include the Yoruba languages family & Itsekiri (its). Olukumi (ulb)
  • The Yoruboid (alv-yor) family would include Yoruba-Itsekiri & Igala (igl).
Also, Lucumí & Itsekiri still need to be updated per Wiktionary:Requests for moves, mergers and splits § Rename Ulukwumi (ulb) to Olukumi, add Yoruba as an ancestor for Lucumí (luq), & the case of Itsekiri (its) AG202 (talk) 05:19, 25 April 2023 (UTC)[reply]
I think the consensus is that Olukumi actually descended from Proto-Yoruba, not Proto-Yoruba-Itsekiri. Oniwe (talk) 17:09, 25 April 2023 (UTC)[reply]
Thanks! I'll update the list. AG202 (talk) 18:05, 25 April 2023 (UTC)[reply]
Yes, Olukumi is a language (ISO code ulb) that is descended from Proto-Yoruba and is regarded as different from Yoruba. The Ede languages (Ede Ife, Idaca), as AG202 explained, are also all under Proto-Yoruba, and each of these has its own ISO code.
Yoruba is treated as one language, but the dialects are quite different and could probably be defined as their own languages to some extent in another political/social context, though this is debatable, of course, and we're not looking to change how Wikipedia and people look at Yoruba. The term "Yoruba language" refers both to the standardized Yoruba language and to the 30 spoken dialects.
When Proto-Yoruba is reconstructed, it is quite different from modern Yoruba because it reconstructs the ancestors of all these dialects that make up the "Yoruba language". The standard Yoruba dialect is simply a construction of about two of these dialects. So "Standard Yoruba" is essentially within the dialectal continuum that could be reconstructed to Proto-Yoruboid. I believe the Proto-Romance approach would not accurately represent that. So I believe that, since we are not giving codes to each of the dialect groups, the Yoruba family should only include Yoruba, Ede Idaca, Ede Nago, Ede Ije, Ede Sabe, Ede Mokole (I'm probably missing a few more Ede languages), and Olukumi. However, we can reflect in entries that Yoruba actually consists of Northwest Yoruba, Northeast Yoruba, and Central Yoruba dialects, in addition to Standard Yoruba.
The goal is to reconstruct a Yoruba family, but Yoruba is not the only language in it. Oniwe (talk) 17:17, 25 April 2023 (UTC)[reply]
OK, so there are enough languages; thanks for explaining. :) Then yes, we'd need something like "alv-you" for the family (it's a pity "Yoruboid" has already taken "alv-yor" instead of using something like "alv-yrd"), "alv-edk" for Edekiri and + -pro for the Proto-languages. I can add those later, or someone else can, if there are no objections/impediments. - -sche (discuss) 15:30, 26 April 2023 (UTC)[reply]
@-sche Any updates? CC: @Theknightwho. Would it also maybe be possible to change the Yoruboid family to be alv-yrd and then have Yoruba be alv-yor? Or would that involve too much of a change? AG202 (talk) 03:20, 1 June 2023 (UTC)[reply]
CC: @Benwing2, @Surjection. It's been two months since this discussion was ended with a consensus to create. AG202 (talk) 19:52, 29 June 2023 (UTC)[reply]
I can check when I have time. Is the list of changes above still accurate? — SURJECTION / T / C / L / 20:32, 29 June 2023 (UTC)[reply]
@AG202 Do any of these changes require bot work? Benwing2 (talk) 21:26, 29 June 2023 (UTC)[reply]
@Surjection: Yes the list of changes is still accurate. CC: @Oniwe. @Benwing2: I'm not sure, but I assume that the creation of categories and such and updating related modules might need bot work. Thanks y'all! AG202 (talk) 18:10, 30 June 2023 (UTC)[reply]
I think bot work may be needed to move the codes. alv-yor is a much better fit for the Yoruba languages than the Yoruboid languages (which could be something like alv-yrd). In general it might be a good idea to agree on the final codes for all of the new families before starting. — SURJECTION / T / C / L / 18:59, 30 June 2023 (UTC)[reply]
Here are all the codes
Edekiri: alv-edk
Yoruboid: alv-yrd
Yoruba (family): alv-yor
Proto-Edekiri: alv-edk-pro
Proto-Yoruboid: change to alv-yrd-pro
Proto-Yoruba: alv-yor-pro Oniwe (talk) 00:59, 1 July 2023 (UTC)[reply]
Sounds like a good proposal. If @AG202 is okay with these, I can start the necessary cleanup work. — SURJECTION / T / C / L / 09:17, 1 July 2023 (UTC)[reply]
@Surjection That looks good to me! Feel free to go ahead with them, thanks! AG202 (talk) 14:44, 1 July 2023 (UTC)[reply]
 Done The bot jobs are still running. Does the descendant tree at Category:Proto-Yoruboid language look correct? — SURJECTION / T / C / L / 15:24, 1 July 2023 (UTC)[reply]
Yup still accurate! Oniwe (talk) 00:48, 1 July 2023 (UTC)[reply]

Idea to make quotations easier to read[edit]

(pinging @Sgconlaw who is probably interested) In quotations, the ISBN code is hidden since presumably very few readers care about an arbitrary string of digits. My idea is that the OCLC, ISSN, LCCN, and DOI codes should be similarly hidden.

Before:

  • 2021 October 4, “With Quantum Computing’s Rise, Cybersecurity Takes Center Stage”, in Wired, San Francisco, C.A.: Condé Nast Publications, →ISSN, →OCLC, archived from the original on 2022-12-16:
    The U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) is stepping in to prepare for this post-quantum future.


After:

  • 2021 October 4, “With Quantum Computing’s Rise, Cybersecurity Takes Center Stage”, in Wired, San Francisco, C.A.: Condé Nast Publications →ISSN →OCLC, archived from the original on 2022-12-16:
    The U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) is stepping in to prepare for this post-quantum future.

Should this be implemented? Ioaxxere (talk) 01:20, 23 March 2023 (UTC)[reply]

Yes, I think this is a (small) improvement. The availability of the data is useful but the values per se are meaningless and most people will scan over them. Equinox 01:23, 23 March 2023 (UTC)[reply]
Would support this change. —Al-Muqanna المقنع (talk) 01:48, 23 March 2023 (UTC)[reply]
No objection, I guess. — Sgconlaw (talk) 04:14, 23 March 2023 (UTC)[reply]
Alright, in that case I've made the switch. Ioaxxere (talk) 04:38, 23 March 2023 (UTC)[reply]
  • Somewhat relatedly: The display of {{...}} is the same as that of {{nb...}}, which appears (almost?) exclusively in quotation templates of the R:Q: variety. As the two differ in function from the user's PoV, shouldn't there be a visual distinction? DCDuring (talk) 14:31, 23 March 2023 (UTC)[reply]
    There are exactly two differences between {{nb...}} and {{...}}: {{nb...}} doesn't add a space afterwards, and the space before {{nb...}} is non-breaking. Their code is identical other than that, and based on the documentation those are the only intended differences. The only reason {{nb...}} tends to appear more often in RQ templates is, as far as I can tell, that the people who make those templates are more likely familiar with its existence, but I've come across and used both in normal quote-X citations as well. —Al-Muqanna المقنع (talk) 14:55, 23 March 2023 (UTC)[reply]

Remove accessdate parameter in quotations[edit]

The Internet Archive Bot seems to be adding the accessdate parameter indiscriminately (as in diff). This isn't standard practice at all and in my opinion serves as pointless bloat—after all, the whole point of an archive is to ensure that the website doesn't change over time. I propose to remove the accessdate parameter in all quotation and citation templates and presumably clean up all current use via a bot job. Would this be okay with everyone? Ioaxxere (talk) 02:35, 23 March 2023 (UTC)[reply]

I would support removing accessdate where an archive link has been supplied for the reason you mention, but not removing it indiscriminately where there isn't one. —Al-Muqanna المقنع (talk) 02:48, 23 March 2023 (UTC)[reply]
@Al-Muqanna Why? In a bare URL quotation, either the URL works, in which case the accessdate is pointless, or the URL is dead, in which case an archiveurl should be added straight away, and the accessdate again becomes redundant. Ioaxxere (talk) 03:33, 23 March 2023 (UTC)[reply]
@Ioaxxere: The reason that most style guides require access dates for web references is that the content at a URL can change over time. If an archived version hasn't yet been supplied, the access date is in principle necessary to decide which archived snapshot corresponds to the state of the website at the time it was referenced (and even when not necessary, like when dealing with standard link rot, it can certainly help). I don't consider it a necessity but it would be silly to remove the option. —Al-Muqanna المقنع (talk) 03:54, 23 March 2023 (UTC)[reply]
"the URL works, in which case the accessdate is pointless" is a very dangerous assumption to make. Websites are not books - they can change after they're published. — SURJECTION / T / C / L / 20:24, 23 March 2023 (UTC)[reply]

Board of Trustees has Ratified the UCoC Enforcement Guidelines[edit]

You can find this message translated into additional languages on Meta-wiki.

Hello all, an important update on the UCoC Enforcement Guidelines:

The vote on the Enforcement Guidelines in January 2023 showed a majority approval of the Enforcement Guidelines. There were 369 comments received, and a detailed summary of the comments will be published shortly. Just over three thousand (3,097) voters voted, and 76% approved of the Enforcement Guidelines. You can view the vote statistics on Meta-wiki.

As the support increased, this signifies to the Board that the current version has addressed some of the issues indicated during the last review in 2022. The Board of Trustees voted to ratify the Enforcement Guidelines. The resolution can be found on Foundation wiki and you can read more about the process behind the 2023 Enforcement Guidelines review on Diff.

There are some next steps to take with the important recommendations provided by the Enforcement Guidelines. More details will come soon about timelines. Thank you for your interest and participation.

On behalf of the UCoC Project Team, Mervat (WMF) (talk) 11:49, 23 March 2023 (UTC)[reply]

Derivatives[edit]

What's the appropriate convention on Wiktionary when it comes to derivatives and derivative categories? What kinds of words are appropriate to include in an entry, and which ones are not?

articiocco and its descendants made that question pop up in my mind. Synotia (talk) 12:15, 23 March 2023 (UTC)[reply]

If you mean things like types of borrowings/inheritance etc you should read the documentation for {{descendant}} (just click the link). Vininn126 (talk) 12:24, 23 March 2023 (UTC)[reply]
Hmmm not really that.
For example, the Hebrew term ארטישוק. The diff was initially this, then I thought why not this, but I have this feeling it's too long and now I am lost. Synotia (talk) 12:35, 23 March 2023 (UTC)[reply]
For etymologies it's best to find a reliable etymological dictionary and use that as a source. You can check out reference templates for various languages. Vininn126 (talk) 12:38, 23 March 2023 (UTC)[reply]
So, essentially, you're telling me there is no "hard" guideline, but that I should best emulate what I find in dictionaries? Synotia (talk) 12:48, 23 March 2023 (UTC)[reply]
You should read WT:Etymology. That has all the answers to your questions, I believe :) Vininn126 (talk) 12:49, 23 March 2023 (UTC)[reply]
@Synotia Also on tchórz - you can see that the two noun sections are nested at L4 under the first etymology, which is at L3. This is because it's the same etymology, but just a different gender/declension. Vininn126 (talk) 12:54, 23 March 2023 (UTC)[reply]
Regarding tchórz: my goal was just to point out the similarity with трус, and vice versa. Synotia (talk) 12:59, 23 March 2023 (UTC)[reply]
'Twould be better under ety1!
Here's another thing about adding cognates to words: there is no consensus on how that should be handled. The general preference is to avoid that if the appropriate descendants section lists many languages. Vininn126 (talk) 13:07, 23 March 2023 (UTC)[reply]
Really? Because I meant for the sense of "coward". You could look at the Russian etymology. Synotia (talk) 13:18, 23 March 2023 (UTC)[reply]
Looking into it, these two words aren't even etymologically related? They're only semantically related under coward, I don't think that's a very useful use of {{cog}}. Vininn126 (talk) 13:20, 23 March 2023 (UTC)[reply]
Was I too enthusiastic when I realized the similarity between трусить and tchórzyć? Synotia (talk) 13:24, 23 March 2023 (UTC)[reply]
If you follow the links to Proto-Slavic you'll see the two terms aren't related to each other. Vininn126 (talk) 13:25, 23 March 2023 (UTC)[reply]
The roots perhaps. But the verb? Hmm. I don't know what to do.
Also, while I use the {{cog}} template, my intention is to draw a parallel. I've done this with gelijktijdig and gleichzeitig, for example, to point out that the words have the same construction. In the same vein, I've done the same with трусливый and tchórzliwy. Although, again, I'm no longer certain if it's relevant. Synotia (talk) 13:32, 23 March 2023 (UTC)[reply]
What is the parallel between these two? Vininn126 (talk) 13:33, 23 March 2023 (UTC)[reply]
They sound very similar, have the same meaning, the same suffix (it could be another one), and I (erroneously) thought the root was the same as well. Synotia (talk) 13:39, 23 March 2023 (UTC)[reply]
That is definitely too enthusiastic. It would make more sense if the roots shared more semantic content, basing this on phonetics alone is a very, very bad idea. Vininn126 (talk) 13:42, 23 March 2023 (UTC)[reply]
Alright, thanks for sharing your advice Synotia (talk) 13:43, 23 March 2023 (UTC)[reply]
Thanks, it's somewhat clearer to me now. Is the form of ארטישוק correct, for example? Synotia (talk) 12:57, 23 March 2023 (UTC)[reply]
I would say the formatting is. As to the etymology, not sure, we'd have to check an etymological dictionary ;) Vininn126 (talk) 13:07, 23 March 2023 (UTC)[reply]
It's rather lacking in quotations. --RichardW57m (talk) 14:10, 23 March 2023 (UTC)[reply]

Should we remove the default text from {{lbor}} and create {{lbor+}} for symmetry with other similar etymology templates? ({{bor}} / {{bor+}} and the like). PUC – 20:24, 23 March 2023 (UTC)[reply]

It would also make sense to do this for other etymology templates, like {{deverbal}}. Vininn126 (talk) 20:26, 23 March 2023 (UTC)[reply]
Why? —Al-Muqanna المقنع (talk) 20:54, 23 March 2023 (UTC)[reply]
Symmetry. Vininn126 (talk) 20:55, 23 March 2023 (UTC)[reply]
If that's the only reason it's a solution in search of a problem. Oppose unless there are convincing cases where a non-+ version would be preferable. —Al-Muqanna المقنع (talk) 20:57, 23 March 2023 (UTC)[reply]
I understand the symmetry argument but at the same time, it could be argued that {{lbor}}, {{obor}}, {{ubor}} and the like are specialized templates which are conceptually different from the general templates {{bor}}, {{inh}} and {{der}}. Once upon a time, at least {{bor}} and {{inh}} had text auto-added, and they were switched both because {{bor}} said "Borrowing from" instead of "Borrowed from" and because it was common to have to suppress the text, which ended up being awkward. I imagine that for {{lbor}} etc. the cases where the text needs to be suppressed are many fewer. (On a rainy day we could look to see how often the text actually needs to be suppressed.) Benwing2 (talk) 22:18, 23 March 2023 (UTC)[reply]
Yeah, my specific concerns are that a) specialised templates like {{lbor}} and {{deverbal}} use jargon that's much less likely to be understood than "inherited" and "borrowed", so the glossary link is needed, and b) for these templates there's virtually never any justification for the term itself to be suppressed. I don't see any use case for separate versions of those templates without the automatic text (especially given that this can already be done with the notext parameter if absolutely necessary), and creating them purely for aesthetic reasons will encourage needlessly obscure etymology sections. From Discord it seems the concern is just that at least one user has used {{lbor+}} out of habit—if that's the only problem then the + version can simply be redirected to the right place (and I have done so). —Al-Muqanna المقنع (talk) 02:13, 24 March 2023 (UTC)[reply]

Making Illyrian language code xil an etymology-only code (again)[edit]

Illyrian is a woefully unattested language and we've essentially banned the creation of entries in it for years. User:Balltari, however, recently came along and added a handful of entries. Their reconstruction is wishful at best, nationalist at worst, and such words should be left to the etymologies of the languages they are found in, like Latin sīca; see Reconstruction talk:Illyrian/sika. This makes me think that perhaps the Illyrian language code xil should be made back into an etymology-only code. Thoughts, objections? @Thadh, Vahagn Petrosyan, Chuck Entz, -sche – Sokkjō 08:18, 24 March 2023 (UTC)[reply]

I agree, this is a case where we really don't know enough about the language to make any good-quality reconstructions. Thadh (talk) 15:47, 24 March 2023 (UTC)[reply]
How are they wishful or nationalistic? It's not like I made any of the terms up myself. I only made entries for words I found, either on Wiki or other sources. If any are to be removed, they should be *Terg and *Tergitio, because of the capitalization and duplicate entries, and *Enchelei, because that was the Greek version of the name. I created a page for the native Illyrian name of the tribe, *Engelanes, attested in Greek literature, and following phonetic changes. Balltari (talk) 19:20, 24 March 2023 (UTC)[reply]
@Balltari: Wishful because the language is unattested, and nationalistic because some people hijack it for Albanian nationalism, creating dubious etymologies. All you're doing is taking Greek and Latin words and presuming they're letter-for-letter Illyrian, if they're even Illyrian words at all. Your example of **Engelanes/**Enchelei is extra spurious, as Greek Εγχελάνες (Enkhelánes)/Εγχελει (Enkhelei) is thought by some to be an exonym deriving from Greek ἔγχελῠς (énkhelus, eel), but then you also went and created **engel. All your Illyrian entries should be deleted. – Sokkjō 22:32, 24 March 2023 (UTC)[reply]
I agree. Vahag (talk) 09:44, 25 March 2023 (UTC)[reply]

Switch to dercat

I propose that we establish a new rule and bot-update existing entries to follow this rule, that borrowings in etylines should point to the nearest step back, i.e. the loaner, and long chains of der should be collapsed into {{dercat}}. Vininn126 (talk) 10:50, 24 March 2023 (UTC)[reply]
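As a rough illustration of what a bot for this proposal would minimally have to do, here is a hypothetical Python sketch: keep the first step of a chain, turn the remaining source languages into a {{dercat}} call, and leave the entry untouched if the chain contains anything besides bare templates. It assumes that {{dercat}} takes the entry's language code followed by the source-language codes, that chains are written "from X, from Y, from Z." with plain three-parameter templates, and that "long" means three or more steps. None of this is an existing bot, and the output format is only one possibility.

```python
import re

# One bare etymology template with exactly three positional parameters,
# e.g. {{der|de|ar|foo}}. Assumption: no named parameters, glosses or
# nested templates; anything more complex means "leave this entry alone".
TEMPLATE = re.compile(r"\{\{(bor\+?|inh\+?|der)\|([^|{}]+)\|([^|{}]+)\|([^|{}]*)\}\}")


def collapse_der_chain(etym):
    """Keep only the first step of a plain 'from X, from Y, from Z.' chain
    and replace the rest with a {{dercat}} call (assumed syntax).
    Returns None unless every step is a bare template with nothing else."""
    text = etym.strip()
    prefix = ""
    if text.startswith("From "):
        prefix, text = "From ", text[len("From "):]
    if not text.endswith("."):
        return None
    parts = text[:-1].split(", from ")
    steps = []
    for part in parts:
        match = TEMPLATE.fullmatch(part)
        if match is None:
            return None  # extra prose, qualifiers or glosses: skip the entry
        steps.append(match.groups())
    if len(steps) < 3:
        return None  # hypothetical threshold: short chains stay as they are
    _, lang, _, _ = steps[0]
    for name, step_lang, _, _ in steps[1:]:
        if name not in ("der", "inh") or step_lang != lang:
            return None
    # dict.fromkeys drops repeated source codes but keeps their order
    sources = list(dict.fromkeys(src for _, _, src, _ in steps[1:]))
    dercat = "{{dercat|" + "|".join([lang] + sources) + "}}"
    return prefix + parts[0] + "." + dercat
```

For example, "From {{bor|de|sw|foo}}, from {{der|de|ar|bar}}, from {{der|de|sem-pro|*baz}}." would become "From {{bor|de|sw|foo}}.{{dercat|de|ar|sem-pro}}", while a chain containing an extra step such as "from Pre-Slavic {{m|...}}" is returned as None and left for a human.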

Support except for English. I expect more of an explanation in English entries. Ultimateria (talk) 18:17, 31 March 2023 (UTC)[reply]
Agreed; however, I don't think we need to go to the lengths that SGConlaw goes to... Vininn126 (talk) 18:32, 31 March 2023 (UTC)[reply]
Support. Cutting down on ety-spam is always welcome. Nicodene (talk) 18:30, 31 March 2023 (UTC)[reply]
We act as if we already had this rule, but I doubt it can be botted. The step back may contain no information, or different information from the entry for the younger lexeme. The result would also differ depending on whether one used a code for a language group or that of a macrolanguage, such as sem-ara Aramaic languages and sem-osa Old South Arabian languages vs. arc Aramaic language and sem-srb Old South Arabian language, and also inc-hnd vs. hi and ur. Fay Freak (talk) 19:25, 31 March 2023 (UTC)[reply]
Support I'm with Nicodene (and I like the coinage.). DCDuring (talk) 20:39, 31 March 2023 (UTC)[reply]
In general I agree with Ultimateria (and Vininn): I'd be OK doing this with languages other than English, but in the English Wiktionary I'd expect more detail in English entries (but yes, not necessarily to the level SGConlaw goes, heh). One block to implementing this is: what if e.g. French or German borrowed a word from Swahili which borrowed it from Arabic which derived it from some root, but we don't have entries for the Swahili or Arabic words yet (and hence can't insert "derived from Arabic foo" into the nonexistent Swahili entry, nor "derived from root X" into the nonexistent Arabic entry), and someone wants to add the German word's etymology but doesn't speak Arabic and doesn't feel up to the task of adding the Arabic entry? - -sche (discuss) 21:34, 31 March 2023 (UTC)[reply]
This has been mentioned, and in such cases I think we should leave chains until redlinks are blueified. Vininn126 (talk) 21:39, 31 March 2023 (UTC)[reply]
Oppose any codified rule (it should be left to users who work in that area) and/or bot scripts, as I don't believe this can be done with a bot without accidentally losing content. – Sokkjō 21:39, 31 March 2023 (UTC)[reply]
I'm referring to a specific type of chain. Take, for example, this edit. Nothing was lost because the information was copied directly from "above" entries. TONS of entries have this. Vininn126 (talk) 21:41, 31 March 2023 (UTC)[reply]
I follow (I created {{dercat}}), I'm just saying I don't think a script can properly parse the data to understand whether or not the entries higher up in the chain have all the correctly matching etymology info. It's common to find a child entry with a better etymology section than its parents. -- Sokkjō 22:17, 31 March 2023 (UTC)[reply]
I think a script could easily look at sections like the one I showed and skip others. Vininn126 (talk) 22:37, 31 March 2023 (UTC)[reply]
"Easily"? Care to write that script then? -- Sokkjō 23:26, 31 March 2023 (UTC)[reply]
My ability to do so is irrelevant? Vininn126 (talk) 08:07, 1 April 2023 (UTC)[reply]
One's ability to do something and one's ability to assess something are different skills. One can have an incredible knowledge of a language's grammar and assess what would or wouldn't be normal without actually speaking the language. I don't care if you disagree, just do it in a respectful manner, please. You've always come off as very rude, Victar, and I'm not having it. Please learn to say your thoughts without being a jerk. Vininn126 (talk) 08:29, 1 April 2023 (UTC)[reply]
@Vininn126: If you don't want your ability called into question, don't make claims about whether something can be easily done or not without the credentials to do so. As a programmer and someone who has written modules here, I can tell you what you're suggesting is not easy, nor should it be automated. You're just salty I'm calling you out, and lashing out now calling me "very rude" and a "jerk" for doing so. -- Sokkjō 15:43, 1 April 2023 (UTC)[reply]
I'm "salty" because I think you can voice your opinions in a much different way, and that's beside the point. I don't see what would be lost in checking long lists of "from {{der}}"; you haven't even brought up a single potential edge case, merely stating "trust me bro I'm a programmer and you don't know nothin". Perhaps you are right, but you haven't mentioned any concrete cases where that is the case. Vininn126 (talk) 15:47, 1 April 2023 (UTC)[reply]
I already gave a reason it shouldn't be automated. -- Sokkjō 18:14, 1 April 2023 (UTC)[reply]
I'm not talking about cases where there's more etymology in the child. I'm talking about cases where it's the same. I said that. Why are you changing what I am presenting? Vininn126 (talk) 18:18, 1 April 2023 (UTC)[reply]
Obviously, my guy. I'm saying a bot would be ill-equipped to parse and process the difference, which will lead to mistakes, i.e. the deletion and/or misinterpretation of information. What aren't you understanding about that? -- Sokkjō 18:40, 1 April 2023 (UTC)[reply]
A bot could detect if there was information beyond just "from {{der}}". If there was, it could just skip that entry. What aren't you getting about that? Vininn126 (talk) 18:42, 1 April 2023 (UTC)[reply]
To which I say, show me a bot script that can do that, repeating myself again. -- Sokkjō 18:46, 1 April 2023 (UTC)[reply]
@Benwing2 Would such a script be possible? Vininn126 (talk) 18:59, 1 April 2023 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── @Vininn126 It depends on how sophisticated it should be. I don't think it would be too hard to detect exact matches, but a lot of copied etymologies have been modified slightly, and it might be difficult to reliably detect whether a given modification is safe to delete (i.e. it would depend on what sort of modifications have been made and how much work is put into the script). Possibly the best it could do in such situations would be to output a list of cases where there is likely duplication, and require someone to manually flag whether it's OK to delete. This sort of flagging can be done by dumping all the cases to review into a single file, and just requiring that someone enter "yes" or "no" (or even a single character) by the entries to indicate whether to process them. Contrary to what Victar claims, I'm pretty confident this can be done without risk of losing information, as I've done several similar things in the past. It's just a case of how many manual cases there will be. Benwing2 (talk) 20:25, 1 April 2023 (UTC)[reply]
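The triage described here (act automatically on exact matches, queue near-matches in a single file for a human yes/no) might look roughly like the following hypothetical Python sketch. The function names, the tab-separated review-file layout and the 0.8 similarity cut-off are assumptions for illustration; `difflib.SequenceMatcher` from the standard library stands in for whatever comparison a real script would use.

```python
import difflib


def triage(child_tail, parent_ety, threshold=0.8):
    """Classify one candidate page.

    'auto'   - copied etymology matches the parent's exactly (ignoring
               whitespace), so a bot could act on it;
    'review' - similar but not identical, so a human must decide;
    'skip'   - too different to be a copy at all.
    The 0.8 cut-off is an arbitrary illustrative value."""
    a = " ".join(child_tail.split())
    b = " ".join(parent_ety.split())
    if a == b:
        return "auto"
    if difflib.SequenceMatcher(None, a, b).ratio() >= threshold:
        return "review"
    return "skip"


def write_review_file(pages, path):
    """One page per line, with an empty first column for the reviewer."""
    with open(path, "w", encoding="utf-8") as f:
        for page in pages:
            f.write("\t" + page + "\n")


def read_decisions(lines):
    """Return the pages whose first column the reviewer set to y or yes."""
    approved = []
    for line in lines:
        decision, _, page = line.rstrip("\n").partition("\t")
        if decision.strip().lower() in ("y", "yes"):
            approved.append(page)
    return approved
```

Anything left blank or marked "no" is simply not processed, which is what keeps the manual cases from losing information.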

@Benwing2 If that's the case, we can at least check. Vininn126 (talk) 20:29, 1 April 2023 (UTC)[reply]
If users want to go manually through a dump file, that's a different matter. My opposition was automating the process using a bot, which is not the same thing. -- Sokkjō 01:46, 2 April 2023 (UTC)[reply]
He's still referring to the switch I was talking about. Automating the clear ones while skipping the not clear ones. Read it carefully ;) Vininn126 (talk) 09:14, 2 April 2023 (UTC)[reply]
This is your example you gave of what you're saying should be automated:
{{inh+|pl|sla-pro|*ǫzъkъ}}, from Pre-Slavic {{m|ine-bsl-pro||*anźukas}}, from {{inh|pl|ine-bsl-pro|*anźus}}, from {{inh|pl|ine-pro|*h₂énǵʰus}}.
compared to the parent entry:
From Pre-Slavic {{m|ine-bsl-pro||*anźukas}}, from {{der|sla-pro|ine-bsl-pro|*anźus}}, from {{der|sla-pro|ine-pro|*h₂énǵʰus}}.
Your very example would require manual adjustment, as the derivation templates are not an exact match.
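The mismatch in this example is between {{inh|pl|...}} in the child and {{der|sla-pro|...}} in the parent, so a plain string comparison fails even though the sources and terms are identical. A script could compare a normalized form instead, as in this hypothetical Python sketch, which drops the template name and the destination language code before comparing. Whether treating inh and der as interchangeable is acceptable is an editorial decision; the sketch only shows that the comparison itself is mechanical.

```python
import re

# {{inh|DEST|SRC|TERM...}} or {{der|DEST|SRC|TERM...}}. The destination code
# (pl in the child, sla-pro in the parent) and the inh/der choice are
# dropped; the source code and any further parameters are kept.
STEP = re.compile(r"\{\{(?:inh|der)\|[^|{}]+\|([^|{}]+)\|([^{}]*)\}\}")


def normalize(chain):
    """Canonical form of an etymology chain, used for comparison only."""
    text = STEP.sub(r"<\1|\2>", chain.strip())
    if text[:5].lower() == "from ":
        text = text[5:]
    return text.rstrip(".")


def tail_after_first_step(child_ety):
    """The part of the child's etymology that may duplicate the parent's."""
    return child_ety.partition(", ")[2]


CHILD = ("{{inh+|pl|sla-pro|*ǫzъkъ}}, from Pre-Slavic {{m|ine-bsl-pro||*anźukas}}, "
         "from {{inh|pl|ine-bsl-pro|*anźus}}, from {{inh|pl|ine-pro|*h₂énǵʰus}}.")
PARENT = ("From Pre-Slavic {{m|ine-bsl-pro||*anźukas}}, "
          "from {{der|sla-pro|ine-bsl-pro|*anźus}}, from {{der|sla-pro|ine-pro|*h₂énǵʰus}}.")
```

Here `tail_after_first_step(CHILD)` and `PARENT` differ as raw text, but their normalized forms are identical.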
But also back to my first point, it should be up to the editors in an area if they use {{dercat}} or not, and for how many steps. As you can read in the discussion I posted above, on Germanic entries, we "stop etymologies at the first bluelink that is either a reconstruction, or a borrowing" and it's "not a strict science". As a Germanic editor, I do not welcome "establishing a new rule and bot-update" for our area. -- Sokkjō 10:36, 2 April 2023 (UTC)[reply]

Adding Yilan Creole Japanese

Hello, could someone please add Yilan Creole Japanese (ycr) to Module:languages? I've found a few papers on the language, which should provide us with a decent amount of entries. (P.S. the Wikipedia article claims that Japanese-derived words are written in Kunrei-shiki, but this seems to be the convention only used in one of the papers, and all other sources more or less follow the Atayal orthography) – Wpi31 (talk) 18:09, 24 March 2023 (UTC)[reply]

P.P.S. The language name should be simply "Yilan Creole", both for conciseness and for following most of the sources' usages. Wpi31 (talk) 07:01, 25 March 2023 (UTC)[reply]
 Added, since the only reason we didn't have this code is that ISO also didn't have it until two months ago, and I see no reason not to follow them in adding it, and no objections were raised when this was brought up in February. - -sche (discuss) 09:22, 25 March 2023 (UTC)[reply]

Old Dutch & descendants missing from Germanic family trees

For some reason Old Dutch and all of its descendant languages are missing from the family trees at both Category:Proto-Germanic language and Category:Proto-West Germanic language. I'm not sure how to go about changing this, so would someone be able to look into this? Helrasincke (talk) 02:47, 25 March 2023 (UTC)[reply]

I looked at the language data for Old Dutch and the other West Germanic languages and couldn't find a reason why Old Dutch was not being placed below Proto-West Germanic in the tree. Someone needs to look at Module:family tree and see what happens when it encounters Old Dutch. I could because I've worked on the module in the past, but am not sure if I will get to it. — Eru·tuon 03:21, 25 March 2023 (UTC)[reply]
@Theknightwho, Erutuon, Helrasincke Old Dutch is listed in Module:family tree/data with a spec indicating that it should have Frankish as its parent. Frankish is an etymology-only variant of Proto-West Germanic. All similar cases are also broken, e.g. Greek lists gkm (Byzantine Greek) as its parent and is missing from the family tree at Category:Proto-Hellenic language. I suspect this is related to some breaking change by User:Theknightwho, can you look into it when you have a chance? Benwing2 (talk) 19:59, 26 March 2023 (UTC)[reply]
@Benwing2, Theknightwho: I think it's a problem in Module:family tree/nested data because the module looks for the "parent" field to distinguish an etymology language from a regular language or a language family, but after restructuring of Module:etymology languages/data that field is moved to field 3. Not sure what the solution is because field 3 also existed in the language data and now also exists in language family data, but it is the language family there. Etymology languages can have families as their parent, so it's not sufficient to check if field 3 is not a language family. I guess we could modify the etymology language data beforehand to set a "type" field to "etymology language" and check that. — Eru·tuon 21:05, 26 March 2023 (UTC)[reply]
Came back to this and I didn't explain the problem properly. Basically it looked to me like the etymology languages are all omitted from the table that is used to generate trees because they no longer have the "parent" field and the module hasn't been told to interpret field 3 as parent. — Eru·tuon 02:13, 27 March 2023 (UTC)[reply]
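The ambiguity can be illustrated with simplified data. Index 3 holds a family code for languages and families but a parent code for etymology-only languages, and because an etymology language's parent may itself be a family, checking whether the value is a family code cannot tell them apart. Tagging each entry with the module it came from, as suggested above, does. This Python sketch uses made-up entries (qfa-sub-xyz is purely hypothetical) and a hypothetical "type" key; it is not the real data layout.

```python
# Made-up, simplified entries. Index 3 means "family" in LANGUAGES and
# FAMILIES, but "parent" in ETYM_LANGUAGES.
LANGUAGES = {"odt": {1: "Old Dutch", 3: "gmw"}}
FAMILIES = {"gmw": {1: "West Germanic", 3: "gem"}}
ETYM_LANGUAGES = {
    "frk": {1: "Frankish", 3: "gmw-pro"},                  # parent is a language
    "qfa-sub-xyz": {1: "Hypothetical substrate", 3: "qfa-sub"},  # parent is a family
}


def tag(data, kind):
    """Copy every entry, recording which data module it came from."""
    return {code: {**entry, "type": kind} for code, entry in data.items()}


ALL = {**tag(LANGUAGES, "language"),
       **tag(FAMILIES, "family"),
       **tag(ETYM_LANGUAGES, "etymology language")}


def tree_parent(code):
    """Parent code for tree nesting if index 3 means 'parent', else None.
    Testing whether index 3 is a family code would misfire on
    qfa-sub-xyz, whose parent really is a family."""
    entry = ALL[code]
    return entry[3] if entry["type"] == "etymology language" else None
```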
@Erutuon @Benwing2 I've done a total redesign of Module:etymology languages that makes this problem fairly easy to solve (among other things): Module:User:Theknightwho/etymology languages. It works entirely on the basis of class inheritance, so a child etymology language inherits all of the features of its parent, but in a way that means any modifications overwrite those base qualities. For example, a customised transliteration module for Classical Persian (which is something that came up the other day). Plus, it works transitively, so there is no issue having an etym-only language which inherits different things from its parent and grandparent (and so on).
From a technical perspective, self._rawData is an empty object with a metatable that is connected to self._stack, which is a table of the actual raw data for each level. self._rawData has a metatable that iterates through the current lang and each parent looking for a match for that particular key (e.g. accessing self._sortkey will start at the top (e.g.) self._stack[3]._sortkey, and will work its way down until it reaches self._stack[1]._sortkey, which is the base language). This is important for two reasons:
  1. It means the raw data can be slotted into each level of the stack as-is, and simply accessed as ._rawData by any method. This avoids any increased complexity, because the method doesn't know/care what its own generation is, or what the generation of any data it accesses is. Plus, it works well with mw.loadData.
  2. It makes it easy to automatically handle issues like this, where different data modules use the same key for different things. Each generation is created recursively from the base language, and certain bits of data are simply handed over, to avoid needing to recalculate them. This includes the parent, family and non-etym codes, which means :getFamilyCode() still works, because the etym-only lang has a ._familyCode value on creation. So long as any clashes like this are accounted for during the makeObject process, this should be very manageable.
On a side note, it also solves the isAncestralToParent problem, because it makes it possible to set etym-only languages as ancestors in the language data modules. The etym-only language simply gets added to the table of ancestors in the chain. That means we can make the isAncestralToParent check in Module:etymology redundant, because we can just check lang:hasAncestor(otherlang) (and it doesn't matter if otherlang is an etym-only language or not). I've tested this with Latin and Old Latin, and it works okay.
What I haven't worked out is what to do with etym-only languages with families for parents. They don't really fit into the model very well, as they feel like a bit of a bodge. Theknightwho (talk) 13:58, 27 March 2023 (UTC)[reply]
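Python's `collections.ChainMap` offers a compact analogue of the stacked lookup described above. It searches a list of mappings in order and returns the first hit, so a key defined on a variety shadows the same key on its parent and grandparent, while anything not overridden falls through to the base language. This is a sketch of the idea only, not a port of the Lua module. The class, keys and codes (including fa-cls and itc-ola) are illustrative, and `has_ancestor` is a simplified stand-in for `lang:hasAncestor()`.

```python
from collections import ChainMap


class Lang:
    """Python analogue of the stacked raw-data idea; not the Lua module.
    ChainMap looks keys up in its maps in order, so data given for the
    most specific variety shadows the same key further down the stack."""

    def __init__(self, code, raw, parent=None):
        self.code = code
        self.parent = parent
        inherited = parent.data.maps if parent is not None else []
        self.data = ChainMap(raw, *inherited)

    def has_ancestor(self, code, registry):
        """True if `code` is among this object's (possibly inherited)
        ancestors or, recursively, among theirs. `registry` maps codes
        to Lang objects; every listed ancestor must be present in it."""
        for anc in self.data.get("ancestors", []):
            if anc == code or registry[anc].has_ancestor(code, registry):
                return True
        return False


# Illustrative data only; codes, keys and module names are hypothetical.
fa = Lang("fa", {"name": "Persian", "translit": "fa-translit", "sort": "fa-sort"})
fa_cls = Lang("fa-cls", {"name": "Classical Persian",
                         "translit": "fa-cls-translit"}, parent=fa)

ola = Lang("itc-ola", {"name": "Old Latin"})
la = Lang("la", {"name": "Latin", "ancestors": ["itc-ola"]})
REGISTRY = {lang.code: lang for lang in (ola, la)}
```

Here `fa_cls.data["translit"]` returns the variety's own module name, `fa_cls.data["sort"]` falls through to Persian's, and `la.has_ancestor("itc-ola", REGISTRY)` is true without caring whether the ancestor is an etymology-only language.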
I fixed the problem in Module:family tree/nested data so that Dutch shows up in the Proto-West Germanic tree once again. — Eru·tuon 16:28, 27 March 2023 (UTC)[reply]
@Erutuon One thing this touches on (which is also related to my new code above) is that it would be good to distinguish between “descendant” and “variety” when it comes to etym-only languages on the language family trees. To use an example: Medieval Latin is an etym-only variety of Latin. Renaissance Latin is a descendant of ML, while Early Medieval Latin is a variety of ML. At the moment, these two things become conflated on the language tree, and it would be good to have some way of distinguishing them. Theknightwho (talk) 16:46, 28 March 2023 (UTC)[reply]
@Theknightwho: I guess that's what ancestors = {"la-med"} in the data for Renaissance Latin would signify. And then the family tree should probably prioritize the "ancestors" field over field 3 for etymology languages in deciding how to nest. Edit: Made Renaissance Latin a variety of Latin (it was formerly a variety of Medieval Latin) and a descendant of Medieval Latin and changed Module:family tree/nested data to nest it under its ancestor and not its parent language variety. — Eru·tuon 19:28, 28 March 2023 (UTC)[reply]
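The rule adopted here (prefer a declared ancestor over the parent variety) can also report which relationship it used, which is what a tree would need in order to draw descendants and varieties differently. This Python sketch uses simplified entries. la-med is taken from the discussion, while la-ren and la-med-early are hypothetical codes, and the function name is an assumption.

```python
def tree_position(code, data):
    """Return (code to nest under, relationship) for an etymology-only
    variety: its first declared ancestor if it has one, otherwise the
    parent variety stored at index 3."""
    entry = data[code]
    ancestors = entry.get("ancestors")
    if ancestors:
        return ancestors[0], "descendant"
    return entry[3], "variety"


# Simplified, illustrative entries (not copied from the real data module).
LATIN = {
    "la-med": {1: "Medieval Latin", 3: "la"},
    "la-ren": {1: "Renaissance Latin", 3: "la", "ancestors": ["la-med"]},
    "la-med-early": {1: "Early Medieval Latin", 3: "la-med"},
}
```

Both la-ren and la-med-early end up under la-med, but with different relationship labels, so a renderer could distinguish them.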
@Erutuon Makes sense. I’ve mentioned this to @Benwing2 already, but I would support renaming these “varieties”, as I think it would be clearer as to what their purpose is - and it would open the door to using them in (e.g.) pronunciation sections. Also, perhaps we could rename “parent” to “variety of” in the documentation? It’s not hugely important, but it might avoid further confusion. I know there are a bunch more that need amending, for this same reason. Theknightwho (talk) 19:36, 28 March 2023 (UTC)[reply]
@Erutuon, Theknightwho: On the specific point, I believe "Renaissance Latin" should be treated in a parallel manner to Early Medieval Latin and as a subtype of New Latin, not an autonomous stage, because the division between Medieval Latin and New Latin is defined by the Renaissance reform of Latinity and there are very few terms specific to the Renaissance. It is essentially "Early New Latin". This is also how I rearranged the category structure at Category:New Latin some time ago. —Al-Muqanna المقنع (talk) 21:11, 28 March 2023 (UTC)[reply]
@Al-Muqanna @Erutuon I think that suggests the family tree needs to somehow display these two things differently, as this change would make it show New Latin → Renaissance Latin, which is obviously misleading. Theknightwho (talk) 21:30, 28 March 2023 (UTC)[reply]
Now I think about it, this system of class inheritance could also work for families to regular languages (which might also assist with solving the substrate issue). That would make it possible to define behaviour for whole (sub-)families. For example, Hellenic sortkeys. You could then simply override this with language-specific data. Substrates could simply be etym-only languages that inherit directly from the family object.
One small advantage of this is it'd mean we could probably get rid of Module:language-like, too, or possibly even merge Module:languages, Module:etymology languages and Module:families into a single module. Theknightwho (talk) 21:37, 28 March 2023 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── @Theknightwho I'll try to take a look at your new code tomorrow. Benwing2 (talk) 04:13, 29 March 2023 (UTC)[reply]

@Theknightwho I took a look at your code in Module:User:Theknightwho/etymology languages. It generally looks fine. If you're "thinking big", you might think about how to merge etymology languages, regular languages and families; this is IMO too big a change right now, but keep it in mind in case we ever decide to move in that direction so that you make your code "future-proof". Also, I took a look at your changes to Module:languages and it looks like you were maybe trying to implement stacks for regular languages. IMO we should do this at first only for etymology languages, and iron out the kinks (e.g. possible memory hits) that way. Finally, you mentioned something about etymology-only languages that have families as parents being problematic in your new scheme; which such etymology languages exist? I agree it's strange to have such things, maybe there's a different way of handling them. Benwing2 (talk) 08:09, 31 March 2023 (UTC)[reply]
@Benwing2 Thanks - I had the same thought about stacks, so they’ve come out of the latest version of Module:User:Theknightwho/languages. I also agree with you about the possible merge, too.
At the moment, substrates like qfa-sub-bma have qfa-sub as their parent, which is a family code. I feel like this could probably be handled in a better way. Theknightwho (talk) 12:06, 31 March 2023 (UTC)[reply]
@Theknightwho Yeah the whole substrate thing is weird. In reality substrates may not be unique languages at all, more like families at best, so maybe they should just be families. Benwing2 (talk) 17:52, 31 March 2023 (UTC)[reply]
@Benwing2 For the moment, Module:etymology languages uses an und language object as the baseline for substrates, but makes sure the family is noted as the parent. It's not ideal, but it should be okay for the moment. Substrate codes seem to be really rare, anyway, so I don't think it's pressing. Theknightwho (talk) 05:51, 1 April 2023 (UTC)[reply]
@Theknightwho Sounds good. Benwing2 (talk) 05:59, 1 April 2023 (UTC)[reply]
@Benwing2 @Erutuon Just flagging up that Module:family tree/data is now obsolete, as the data has been ported over into the proper language modules. Theknightwho (talk) 05:21, 9 April 2023 (UTC)[reply]

Lemmatizing Proto-Romance verbs

A while ago I proposed lemmatizing all Latin verbs by the infinitive for a number of reasons. Ultimately it seemed best to abandon that idea for two reasons:

1) We would have to change nearly 6000 entries, many of which act as hubs for dozens of other entries (inflections/derivatives/synonyms/antonyms; cedō is a good example). Various modules and templates would also have to be recoded accordingly.

2) All traditional dictionaries for Classical Latin cite by the first-person singular.

However, neither of these factors applies to our entries for verbs reconstructed from Romance data:

1) There are only about 200 of them, they do not act as 'mega-hubs', and they cannot use normal Latin inflection tables anyway due to the impossibility of reconstructing all of the Classical inflections from Romance.

2) Not a single (!) notable scholarly source that refers to these verbs cites them in the first-person singular.

  • See: the FEW, REW, Coromines & Pascual, DEX, DCVB, Oxford Guide to the Romance Languages, Cambridge History of the Romance languages, etc. (all use the infinitive).

In addition, using first-person singular lemmas is problematic in several cases due to the impossibility of actually reconstructing that specific form as we have given it.

Let's take *lū́ciō as an example. (I will be using the acute accent to indicate stress.) There is no doubt about the infinitive *lucī́re: all of the Romance languages which inherited this verb show infinitives ending in -í(r)(e).

Per normal Latin rules, we would be forced to reconstruct the first-person singular inflection of *lucī́re as *lū́ciō, which is what has been done to make the lemma here. The problem is that not a single Romance language actually reflects that form. They instead reflect *lū́scō (> French luis, Spanish luzco), *lūcíscō (> Catalan llueixo, Romanian lucesc) and even *lū́ō (> Catalan lluo, which might however be a recent derivative from llueixo, with subtraction of -eix-).

There are also examples where Romance reflects a stress position that differs from what we would predict by Latin rules. For instance, *cominítiō does not actually survive as such in Romance, which instead reflects *comín(i)tiō. The infinitive *cominitiā́re, on the other hand, survives as such everywhere.

Further examples include:

  • *imprūmū́tō, where Romance instead reflects *imprū́mūtō
    • **imprū́mŭtō is impossible due to e.g. the Piemontese outcome.
  • *repænítiō, where Romance instead reflects *repǽnitō.
    • As an experiment, I have already relemmatized this to the infinitive.
  • *scrībiō, where Romance instead reflects *scrībō.
  • *sórbiō, where Romance instead reflects *sórbō and *sorbíscō.
  • *suffériō, where Romance instead reflects *súfferō and *sufferíscō.
  • *surrū́pō, where Romance (at least Balkan Romance) instead reflects *súrrūpō.
  • *wárniō, where Romance instead reflects *wárnō and *warníscō.

Needless to say, it is awkward to have a reconstructed lemma that cannot actually be reconstructed (without caveats). However, all of these lemmas, and their stress positions, are 'demanded' by the correctly-reconstructed infinitives, if we follow normal Latin rules. If we abandon the Latin rules but still adhere to the first-person singular lemmas, then we are forced to split several of these entries into two (that is, one for *wárnō, another for *warníscō, and so forth).

Far simpler would be to simply lemmatize the infinitive which was straightforwardly inherited everywhere (*warnīre).

Apart from the attractive stability of the infinitive in the evolution of Romance, lemmatizing it would have, as mentioned in the earlier proposal, the advantages of:

1) Consistency with Romance descendants lemmatized by the infinitive.

2) Consistency with impersonals such as *nivat 'it snows', where a first-person lemma *nivō would have the nonsensical meaning of 'I snow' (whereas the infinitive *nivāre conveys a perfectly reasonable 'to snow').

In sum, switching to infinitive lemmatization for Proto-Romance entries would clear up numerous inconsistencies (absurdities, even) and would be a fairly trivial task. I could probably do it manually over the course of a week or two. Nicodene (talk) 16:39, 26 March 2023 (UTC)[reply]

Support; I think you've made a good case. PUC17:26, 26 March 2023 (UTC)[reply]
Support for the same reasons. Benwing2 (talk) 19:49, 26 March 2023 (UTC)[reply]
Support. Vininn126 (talk) 19:53, 26 March 2023 (UTC)[reply]
Support. The important difference from Latin is that Latin is and was its own language with its own grammatical traditions that lemmatised on the first-person-singular present indicative, while Proto-Romance is primarily a reconstruction based on later languages- all of which lemmatise on the infinitive. I opposed treating Latin like Proto-Romance, but it would be just as wrong to treat Proto-Romance like Latin. Chuck Entz (talk) 20:07, 26 March 2023 (UTC)[reply]
Support. AG202 (talk) 20:49, 26 March 2023 (UTC)[reply]
Support. Urszag (talk) 21:25, 26 March 2023 (UTC)[reply]
Support. —Al-Muqanna المقنع (talk) 16:37, 28 March 2023 (UTC)[reply]
Oppose. Ha-ha, just kidding. I'll get to work then. Nicodene (talk) 23:00, 30 March 2023 (UTC)[reply]

displaying headword inflections after a comma instead of between parens

I changed the format of headwords slightly so that inflections are shown after a comma instead of between parens if there are translits displayed for the inflections. This is because with translits, we were getting nested parens, which IMO looks ugly. You can see an example of this at عربي. Question: Should we make this change across the board? Compare e.g. Russian демократия (demokratija), where the inflections are not transliterated and are within parens, with عربي as mentioned above, where the inflections in the various languages are transliterated. Benwing2 (talk) 02:42, 27 March 2023 (UTC)[reply]

Oppose, and if anything, change the Arabic headword back to parentheses. I find it much easier to see where the headword/inflections end with parentheses than with commas, especially with languages that have gender, translit etc in inflections. Thadh (talk) 07:28, 27 March 2023 (UTC)[reply]
Oppose. It is easier to distinguish between the headword and the inflections with the parentheses, as Thadh has said. If the double parentheses look ugly, perhaps we could use [square brackets] or {curly brackets} for the inflections instead? – Wpi31 (talk) 08:10, 27 March 2023 (UTC)[reply]
Hmm, OK. I don't think we should use braces (curly brackets), but square brackets sound good. Should we put the inflections themselves in square brackets and the translit inside it in parens, or vice-versa? Benwing2 (talk) 06:11, 29 March 2023 (UTC)[reply]
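A minimal illustration of the nesting problem and the bracket alternative, in Python. The function name, its arguments and the simplified output are assumptions; real headword lines also carry gender, bullets and other elements that are not modelled here, and the Arabic forms are unvocalized for brevity.

```python
def format_headword(head, tr, inflections, style="parens"):
    """Render a simplified headword line.

    inflections is a list of (label, form, translit) tuples, with translit
    None when the form is not transliterated. style chooses how the whole
    inflection group is set off from the headword."""
    def with_tr(form, translit):
        return f"{form} ({translit})" if translit else form

    base = with_tr(head, tr)
    if not inflections:
        return base
    body = ", ".join(f"{label} {with_tr(form, t)}" for label, form, t in inflections)
    if style == "parens":    # original format: nests parens when translit is shown
        return f"{base} ({body})"
    if style == "comma":     # the comma format described above
        return f"{base}, {body}"
    if style == "brackets":  # the square-bracket suggestion
        return f"{base} [{body}]"
    raise ValueError(f"unknown style: {style}")
```

With `style="parens"` the Arabic example renders as "عربي (ʿarabiyy) (feminine عربية (ʿarabiyya))", with nested parens; with `style="brackets"` the group becomes "[feminine عربية (ʿarabiyya)]", while an untransliterated Russian inflection such as "(genitive демократии)" is unaffected by the nesting issue.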

using self-published books

Where do we stand on allowing self-published books to count as durably archived? Is it a "don't make waves" issue that we prefer to have no policy on, since few if any people are going to go rummaging around just to find cites of obscure words? If it were up to me, I'd think it should be like Twitter, since Twitter is also a self-published medium. Therefore they could be used to cite a word's existence, but they could be challenged just like Twitter cites can. Best regards, Soap 10:46, 27 March 2023 (UTC)[reply]

Agreed that vanity press books are totally valid for showing how words are used, but if an entry only has a single self-published book, that is a bit dodgy. —Justin (koavf)TCM 10:56, 27 March 2023 (UTC)[reply]
I think it's fine for there to be no policy because there's no real reason for there to be a one-size-fits-all policy about whether a source was published by a third party or not. I doubt many people would object to a 17th-century source that was printed by the same person who wrote it, for example, and basing admissibility on whether there's editorial oversight seems too prescriptive. Manuscript sources that are in archival collections but were never published to begin with should (obviously IMO) still be admissible. It probably makes more sense to handle it case by case on the basis of the medium: if it's just an ebook someone's thrown online I doubt it should be considered durably archived, if it has an ISBN and they've put out an entire printed edition then it probably should be. It also needs the context of the entry in question: some 2010s Internet neologism that stands or falls on self-published books would be very different from, say, something that's from a regional dialect where better-quality sources might not be available in principle. —Al-Muqanna المقنع (talk) 11:18, 27 March 2023 (UTC)[reply]
I think it is noteworthy that we don't have a reference or quotation template for the w:Ormulum, of which all we have is the original composition. (We do, though, have a category for spellings found only in the Ormulum.) I have a similar problem with hunting a source for the (TBC) very modern Shan writing of Pali. This month I finally found a Vipassana meditation manual in Shan which has some Pali text, and a book of Pali chants with connecting text and translations in Shan. However, the former is in the form of a PDF which looks as though the reader is intended to print it off and bind it himself, and the latter looks more like a hand-out that the reader is intended to print off and staple together himself. I'm also worried that these are just the products of a band of promoters of the spelling - older Shan materials use pretty much the same spelling as the Burmese do. From this point of view, the meditation manual would be better, though much scantier, evidence. --RichardW57m (talk) 13:19, 29 March 2023 (UTC)[reply]
It wouldn't be too difficult to whip up a reference template for the 1878 critical edition of the Ormulum, which still seems to be the one used in recent scholarly literature. I have cited unedited manuscripts before (see Lux Mundi and the 1766 citation for egger-on), though I do think it might be useful to have an actual {{quote-manuscript}} template since the expected citation parameters are rather different from a published book. —Al-Muqanna المقنع (talk) 13:36, 29 March 2023 (UTC)[reply]
The issue with the Ormulum would be its idiosyncratic spelling. While it meets the letter of the LDL concession, it's not just the scarcity of the available material that makes it unusual. --RichardW57m (talk) 14:54, 29 March 2023 (UTC)[reply]
I don't think there are sufficient grounds for banning all self-published books outright, but if there is a choice between self-published and non-self-published works for particular entries, I generally choose non-self-published ones. Editors should also remember that self-published works are often poorly edited, and so may contain idiosyncratic spelling and ungrammatical sentence construction. Thus, they shouldn't be relied on solely as evidence of such usage. (I generally work on modern English quotation templates, but if anyone wants help creating a quotation template for the Ormulum, let me know.) — Sgconlaw (talk) 15:18, 29 March 2023 (UTC)[reply]
I think a huge factor in this is the level of cataloguing. If a self-published book has been catalogued by some source we deem as durable, the book could be too. Vininn126 (talk) 15:28, 29 March 2023 (UTC)[reply]
@Vininn126: weeell, if a book is assigned an ISBN (and pretty much every self-published book is), it will usually appear in the OCLC. Thus I’m not sure the fact that such books appear in a catalogue is much of a criterion, unless you mean to exclude the OCLC. — Sgconlaw (talk) 22:40, 31 March 2023 (UTC)[reply]
@Vininn126: I too am confused by 'catalogued by some source we deem as durable'. Surely what is more relevant for durability is being held by some library, and at that, one which holds its stock for many decades. --RichardW57 (talk) 15:28, 1 April 2023 (UTC)[reply]
@Sgconlaw @RichardW57 I should have clarified: by "catalogued by a source we deem durable" I was referring predominantly to books held in libraries, which do on occasion take in self-published books and catalogue them, not merely to books that have an ISBN or something similar. Vininn126 (talk) 15:41, 1 April 2023 (UTC)[reply]
They are usable but tend to be low-quality (e.g. no proof-reading). I try to avoid them unless there is no alternative. Equinox 20:05, 30 March 2023 (UTC)[reply]

Sassarese reference template[edit]

Hello, everybody. I recently acquired a copy of Salvatore Dedola's Fabiddággiu etimológicu di lu sassaresu (Etymological dictionary of Sassarese). I was thinking about creating a reference template based on this publication, and I figured I'd need some kind of approval before doing that (something along the lines of "This source is good enough to be used in a reference template", I imagine): is that a thing? Is there anyone who could help me with this matter? — GianWiki (talk) 16:40, 27 March 2023 (UTC)[reply]

Not a thing; if it's physically published and makes sense, feel free to create a reference template and use it as a reference. If it's not published it gets more tricky, but in this case there really isn't any issue at all. Thadh (talk) 17:00, 27 March 2023 (UTC)[reply]
I see. Thank you very much for your input! — GianWiki (talk) 18:53, 27 March 2023 (UTC)[reply]
Hi. When I answered your comment yesterday, I had hurriedly misread it: I thought you had written If it's physically published it makes sense.
About the “making sense” part, it's... kind of hard for me to tell, actually. I started reading the dictionary, and it seems Dedola is a proponent of a fringe theory (that's what it looks like to me, at the very least; maybe I'm not informed enough on the subject) wherein the languages in the Mediterranean Basin area have experienced a very strong Semitic influence. He often proposes etymologies going back to the Sumerian, Akkadian and Babylonian languages, even in cases where it seems kind of baffling to do so. For example, he proposes that Sassarese pésciu (fish) is ultimately derived from Sumerian peš- (I don't have the dictionary on hand at the moment, and I don't exactly remember what this component meant; I think it was “to gather”) + 𒄩 (ku₆, kud /⁠kud⁠/, fish), a compound meaning “(a gathering of) that which has been fished”, rather than from a change in declension of Latin piscis. It honestly seems unnecessarily complicated (what with Ockham's Razor and all that), but I'm willing to concede that my point of view may be utterly biased. It's a case wherein we're dealing with either a genius, or a madman (for lack of a better word). Any thoughts on this? — GianWiki (talk) 07:46, 28 March 2023 (UTC)[reply]
If his non-Semitic etymologies are less controversial, you could still use it and ignore the Semitic ones. Thadh (talk) 09:01, 28 March 2023 (UTC)[reply]
@GianWiki The etymology for pésciu that you mention is obviously garbage, and as a result I would be wary of any of his etymologies. (Reminds me of one book I found in my university's library that tried to prove that all Indo-European languages were descended from Tamil.) Benwing2 (talk) 04:16, 29 March 2023 (UTC)[reply]
Yeah, I've taken another look at different entries, and every single one of those I've seen is presented as having etymologies going back to ancient Near Eastern languages (particularly Sumerian and Akkadian). While I have a passion for linguistics in general, I'm definitely not a linguist; despite this, I've come to think the publication is objectively ridiculous. I don't think there's anything to be gained in keeping that hastily-made template ({{R:sdc:Dedola}}): feel free to delete it (I'd do it myself, but I don't think I have the clearance/permissions to do that).
Really, I'm just trying to take this as a lesson about informing oneself on the nature of a work, when possible, instead of blindly spending money on it.
I'm sorry for wasting your time with this stuff, but I'd like to thank you, @Thadh and @Benwing2, for all of your input on this matter. — GianWiki (talk) 07:11, 29 March 2023 (UTC)[reply]
As requested, I have deleted the reference template. — Sgconlaw (talk) 08:35, 29 March 2023 (UTC)[reply]
Thank you very much. — GianWiki (talk) 14:41, 29 March 2023 (UTC)[reply]

Removed NSFW images[edit]

194.247.255.130 (who is currently blocked) removed NSFW images from 1 December 2022 to 3 February 2023 on anal beads (“Fixed issue”, writing on the talk page, “I feel that the image shouldn’t be on this entry, considering how it has a drawing of a naked woman using it and I can only imagine someone at work or school accidentally seeing this”), map of Tasmania, manbulge (both reverted once by @MathXplore (“Unexplained content removal”), then removed again with the edit summary “Removing NSFW image” (by 149.12.2.185 for the former)), moose knuckle, sex organ, ball gag, ahegao, areola (all with the edit summary “Removed NSFW image”; the nipple edit was reverted by @DCDuring 18 days later, after the block). The block was for unrelated edits that were reverted; should the images be restored? J3133 (talk) 08:58, 29 March 2023 (UTC)[reply]

Explanations should be given in the edit summary when content removal is really needed, so I reverted once to encourage the user to do so. If the images are really helpful for the dictionary content, then they should be restored. MathXplore (talk) 09:08, 29 March 2023 (UTC)[reply]
In principle, we're not a censored dictionary, so there is plenty of objectionable material that is simply documented here. That said, if we have images to accompany NSFW topics, they should be less shocking while being informative. E.g. a picture of anal beads is certainly useful for understanding what anal beads are, but we don't need to have a photograph of someone using anal beads. —Justin (koavf)TCM 09:31, 29 March 2023 (UTC)[reply]
I agree with Koavf's thoughts. The entries for ball gag, areola, anal beads, and probably ahegao merit a picture that constitutes an ostensive definition. Sex organ does not seem to warrant an image for each sex organ. Moose knuckle seems to provide a sufficiently vivid image without a picture. I don't know what to think or do about pictures of sex acts for which an image could easily be "required" for an ostensive definition. Probably we need the same kind of "censorship" that we apply to images that involve racism, violence, etc.
I don't think we can be both "safe for children" and open in the way we like. We seem to be very committed to openness. Most of our entries are "safe for work". A user can usually anticipate which ones may not be. A problem could arise with polysemic terms, for which an ordinary-seeming term has a definition that would, in itself, warrant an NSFW image. I think no NSFW image should be on such a page, at least if there is or could be a synonym to bear the image. Along the same lines, it might pay to make sure that none of our images are too large, eg, screen-filling. The sudden appearance of a large, colorful image on a screen that generally displays text or similar content could draw attention to whatever is on the screen and to the user. Perhaps images some object to could be made available only by a user action, eg, a click on an image box that contained a useful reference to the definition associated with the image. DCDuring (talk) 11:54, 29 March 2023 (UTC)[reply]
Agree with all of the above. I think it's best to look at it from a user utility standpoint: how much does the image contribute to understanding the entry? If the image is more shocking or distracting than informative then it probably doesn't need to be there. —Al-Muqanna المقنع (talk) 12:14, 29 March 2023 (UTC)[reply]
Generally agree with folks that have already commented. I also just find the comment, "I feel that the image shouldn’t be on this entry, considering how it has a drawing of a naked woman using it and I can only imagine someone at work or school accidentally seeing this" funny as I would expect going to the page for "anal beads" would be questionable enough and that you wouldn't accidentally end up on it... AG202 (talk) 12:28, 29 March 2023 (UTC)[reply]
I too agree with what's been said, especially that an image should only show what's necessary (so for anal beads just the beads, not beads in use), and that when possible we should put images in the most expectable places, e.g. it's reasonable that cock doesn't have an image of a penis (nor a definition of what a penis is, the definition just says "A penis" and directs users to go there if they don't know what that word means), but penis does have a diagram. I would reinstate the image of areola, worth a thousand words as far as clarifying what exactly the term denotes, but I don't see a need to reinstate the image of pubic hair on map of Tasmania, since it seems reasonable to just define it as "pubic hair" and let anyone who doesn't know what that is click through to that entry and Wikipedia. - -sche (discuss) 00:13, 30 March 2023 (UTC)[reply]
Another tactic for reducing surprise NSFW images. At least for longer entries, a problematic image could appear some distance down the page so as not to leap into view on opening the page. DCDuring (talk) 00:21, 30 March 2023 (UTC)[reply]
What do people think of hiding NSFW images in a collapsible div with some sort of warning? This approach is used at the Russian Wiktionary, for example. Benwing2 (talk) 01:50, 30 March 2023 (UTC)[reply]
Meh. I'm not strongly opposed to it if it's what most people want to do, but personally I think that'd be going too far in the other direction. In a recent discussion on Wikipedia about deploying explanatory footnotes to something, it was brought up that there are statistics on how few people click through to read/see footnotes or references (approximately 0.3%), and we've gotten complaints here over the years from people saying something was uncited because they didn't realize there was a "quotations" button they could click, or it didn't work because some part of the javascript didn't load, so I suspect many users might not notice or figure out or be able to unhide images. - -sche (discuss) 05:36, 30 March 2023 (UTC)[reply]
Also, it would probably lead to more debates around what counts as NSFW. Things which people have made headlines and laws(!) trying to eliminate from visible spaces in just the last week include Michelangelo's [[David]], trans people, and various books we have quotes from. Currently, if someone tries to remove the image of Michelangelo's sculpture from David for being NSFW, I feel like people will just revert them since it's a fine image and doesn't matter if it's NSFW or not since we don't have any absolute ban on NSFW... but if we implement a Hide feature and people want it hidden, do we entertain that? I don't oppose a Hide functionality if it's what people want, but I'd like us to have a clear idea of what, exactly, it should apply to. - -sche (discuss) 23:24, 30 March 2023 (UTC)[reply]
Human anatomy is not inherently salacious. A biological or medical textbook would have pictures of the penis, etc. and I would say we should too, on anatomical entries. It's just as educational as a picture at "nose" or "toe". Not sure that we need it at "anal beads" though: just show the beads, not the anus. Also I wouldn't include them at slang entries like "dick": unnecessary. Equinox 20:03, 30 March 2023 (UTC)[reply]
I think Benwing2's idea about hiding them behind some sort of "adult material warning" is acceptable, since we do get all kinds and ages of visitor. Everybody would click it though, out of curiosity :) Equinox 20:08, 30 March 2023 (UTC)[reply]
@-sche I think it should be possible to define reasonable policies as to what should be hidden by default. We've managed to define "derogatory" in WT:DEROGATORY, for example. Search engines seem to have figured this out pretty well; e.g. they won't show pornographic pics unless your intent to see them is fairly clear from the search term, and I think some engines implement an NSFW feature. One thing we could do is have a preference setting "Hide NSFW images by default", which is normally enabled but you could disable it, similarly again to the way that search engines operate. Benwing2 (talk) 00:39, 31 March 2023 (UTC)[reply]
Do any search engines currently suppress any of the images that have been considered by some to be NSFW (eg, anal beads) when they load Wiktionary or Wikipedia content? Do they suppress only the image or the other page content too? DCDuring (talk) 14:48, 31 March 2023 (UTC)[reply]

Does Japanese need automatic transliterations?[edit]

General language link templates like {{m}}, {{l}}, {{bor}}, {{syn}}, {{alternative form of}} etc. are now able to produce automatic transliterations for Chinese links, e.g.:

{{obor|ja|zh|輸送}} => Orthographic borrowing from Chinese 輸送／输送 (shūsòng)

Japanese may have good reasons to follow suit. If we enable its automatic transliterations, Japanese editors can also do this:

{{obor|zh|ja|輸送}} => Orthographic borrowing from Japanese 輸送 (yusō)

without having to do:

{{obor|zh|ja|輸送|tr=yusō}}

It will use a tactic similar to that of Chinese by transcluding the target page to fetch the phonetic information, and thus cost more Lua memory. But I guess this should not be a problem since Chinese has been already doing so.

(Automatic ruby and sortkeys are also available, though at the price of perhaps still more Lua memory usage.)

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria): -- Huhu9001 (talk) 13:37, 29 March 2023 (UTC)[reply]

Broadly, I like the idea. One question remains -- what about the many terms in Japanese that have multiple readings? Presumably editors would still have the option of specifying a reading manually? ‑‑ Eiríkr Útlendi │Tala við mig 18:11, 29 March 2023 (UTC)[reply]
It will be much easier to do this in an efficient way if we do this via the main modules (at least behind the scenes). However, that shouldn't be too difficult, I think. Theknightwho (talk) 18:48, 29 March 2023 (UTC)[reply]
I also like the idea. As an aside, I don't think that Chinese automatically defaulting to Pinyin & Mandarin is really a good idea (especially with {{bor}}, {{l}}, etc.) This issue, while not the topic at hand, is an important one and part of the problem that arises with putting all lects into one (such as the descendant-listing and specific etymology issues that keep coming up). I'd honestly prefer that there be no automatic transliteration for Chinese (zh) links unless absolutely needed and that folks focus on using the specific codes like "cmn", "yue", etc., so that it disincentivizes people from just defaulting to "zh" unless truly needed. AG202 (talk) 22:28, 29 March 2023 (UTC)[reply]
My general sense is that zh should be reserved for the written form of Chinese, without including any pronunciation. If a user wants the template to output the pronunciation for a given Chinese language, they should specify which one: cmn for Mandarin, yue for Cantonese, nan for Min Nan, hak for Hakka, etc. ‑‑ Eiríkr Útlendi │Tala við mig 04:30, 30 March 2023 (UTC)[reply]
I support the idea. However, I thought with Japanese, it will be based on the full kana in the translit, with all tricks as in {{ja-usex}} - dots (.), spaces, carets (^). Multiple readings could be separated by commas or similar. Anatoli T. (обсудить/вклад) 22:30, 29 March 2023 (UTC)[reply]
賛成(さんせい) (sansei)。I agree with Anatoli T. that transliteration based on kana and including the standards used in ja-usex would be useful. Such an approach might also work around the issue raised by Eiríkr Útlendi of lexemes with multiple readings – though perhaps not, if it would require specifying both kanji and kana in the template. If that were the case, ja-r currently does that. (It also adds ruby, which may or may not be helpful in particular cases.) Cnilep (talk) 00:28, 30 March 2023 (UTC)[reply]
@Cnilep: Thanks. Ruby would be good, but there will be problems with multiple readings. Alternatively, different readings can always be split into multiple words, e.g. 字面(じづら) (jizura), 字面(じめん) (jimen) (rather than making it one word: 字面 (じづら, jizura, じめん, jimen)). Even Russian words with different word stresses are now typically given as two words, rather than as one word with two comma-separated readings. Anatoli T. (обсудить/вклад) 00:57, 30 March 2023 (UTC)[reply]
I have no issue with this, although the issue of multiple readings needs to be thought about carefully. I think it's even more important to have automatic sortkey generation (if possible), because the current situation is a massive pain (if you're lucky enough to have a term that is Japanese-only, you can use {{DEFAULTSORT:...}}, but otherwise you have to specify a sort key in EVERY SINGLE template that generates a category, which litters the wikicode with such sort keys and is hard to maintain). Benwing2 (talk) 01:55, 30 March 2023 (UTC)[reply]
@Benwing2: Automatic sortkey generation is undoubtedly possible, but at what cost? The major challenge of Japanese sortkeys comes from {{lb}} and less often {{tlb}}, which are tasked with generating Japanese categories but have no access to the phonetic information. To overcome this difficulty:
  • Plan Z: Do nothing. Sortkeys are just not that important.
  • Plan A: Transclude the page and pick a phonetic input found in this page as the sortkey. When multiple phonetic inputs exist, choose one arbitrarily. This can sometimes cause wrong sortkeys for homonyms. One may find this tolerable or not.
  • Plan A+: When multiple phonetic inputs exist, compare the arguments received by the module with those of all instances of {{lb|ja}} found in the page, to deduce the correct sortkey for homonyms. Largely incompatible with the current {{lb}}; may need some new hacky {{ja-lb}}.
  • Plan B: Relocate all Japanese lemmas to kana-titled pages, or subpages. Then all {{lb|ja}} will have no trouble with sortkeys anymore. I remember there were some previous votes concerning the choice of Japanese lemmata, but none produced a substantive conclusion.
-- Huhu9001 (talk) 07:02, 30 March 2023 (UTC)[reply]
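A minimal sketch of how Plans A and A+ differ, in Python, assuming the module has already collected the kana readings found on the target page and, for A+, a mapping from each {{lb|ja}} call's arguments to the reading of the section it appears in. The function names and the label/reading pairs are hypothetical; the actual Wiktionary modules are written in Lua and their interfaces may differ.

```python
from typing import Mapping, Optional, Sequence, Tuple


def sortkey_plan_a(readings: Sequence[str]) -> Optional[str]:
    """Plan A: use a phonetic input found on the page.

    When several readings exist, the first one is taken: an arbitrary but
    deterministic choice that can be wrong for homonyms.
    """
    return readings[0] if readings else None


def sortkey_plan_a_plus(
    readings: Sequence[str],
    label_args: Tuple[str, ...],
    readings_by_labels: Mapping[Tuple[str, ...], str],
) -> Optional[str]:
    """Plan A+: look up this {{lb|ja}} call's arguments among the label
    calls recorded on the page to find the reading of the right homonym.

    Falls back to Plan A when the arguments match no recorded call.
    """
    if label_args in readings_by_labels:
        return readings_by_labels[label_args]
    return sortkey_plan_a(readings)
```

The fallback shows Plan A's weakness: if nothing matches, the first reading wins regardless of which homonym the label belongs to. A+ also cannot tell apart two homonym sections whose label calls are identical, because their arguments collide as a single key.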
Japanese lemmas are currently sorted by kana readings - kanji, hiragana and katakana alike. You can check by looking at any PoS categories or lemmas. Anatoli T. (обсудить/вклад) 07:23, 30 March 2023 (UTC)[reply]
You did not understand what I have said. -- Huhu9001 (talk) 07:37, 30 March 2023 (UTC)[reply]
For any Japanese grapheme that has multiple readings, any borrowing or other derivation of that term will likely be dependent on one single reading, making it problematic to take any approach of "output all the readings in one long list".
More technically speaking, a Japanese "word" for lexicography purposes exists at the intersection of graphemic and phonemic units. The problem we are discussing here in this thread is how to specify a Japanese term using only one of these two axes. For some Japanese terms, this is not a problem -- written forms like 志す have only one reading (kokorozasu) for this particular spelling. But for spellings like 生, our entry currently includes four readings that represent four distinct terms -- and our entry is missing ten additional readings that are also distinct terms, all fourteen of which happen to share this 生 spelling. (Providing one hint for why it takes so damn long to learn to read Japanese.) Several of these readings also have alternative spellings, like ubu, which could be spelled as , , or 初心.
→ Getting back to the core idea, I strongly suspect that Anatoli and I are thinking of different use cases.
  1. When specifying a Japanese term in a template like {{bor}} or {{der}}, we probably need to specify a single reading, since only a specific term (grapheme x reading) is generally the source of a borrowing or other derivation. If the Japanese written form only has one reading, then no problem -- but if the written form has multiple readings, we need to have some way of specifying which one -- presumably using the tr= parameter we've been using for years.
  2. When linking in some other way to a Japanese term, it might be desirable to output all readings at once. I can't think of a good case for this right at the moment, but I'm open to the idea that someone might need to do this.
‑‑ Eiríkr Útlendi │Tala við mig 04:47, 30 March 2023 (UTC)[reply]
@Eirikr: I don't see that we disagree on anything or are talking about different things. If there is only one reading, there are several options: 1. read it from the entry, as Chinese does; 2. provide one kana reading and get one transliteration, displayed horizontally with |tr=; or 3. provide it as ruby, as {{ja-r}} or {{ja-usex}} does.
With multiple readings, one traditional option with |tr= is always this method: 字面 (じづら, jizura, じめん, jimen). The alternative is to split and list readings separately: 字面(じづら) (jizura), 字面(じめん) (jimen).
Compare with Russian three way display of words with alternative stresses:
  1. обеспече́ние (obespečénije), обеспе́чение (obespéčenije) (split)
  2. обеспе́че́ние (obespéčénije) (both stresses together, this is confusing)
  3. обеспечение (obespečénije, obespéčenije) (two transliterations)
I prefer # 1.
The automated transliteration currently fails for Mandarin when a headword has multiple readings, and requires a manual transliteration. The same can apply to Japanese. Anatoli T. (обсудить/вклад) 05:23, 30 March 2023 (UTC)[reply]
(This might be a tangent...)
I can't think of a good reason to output multiple readings though, in either format: 字面 (じづら, jizura, じめん, jimen) or 字面(じづら) (jizura), 字面(じめん) (jimen).
I guess I'm confused why you'd ever need to do that? ‑‑ Eiríkr Útlendi │Tala við mig 05:28, 30 March 2023 (UTC)[reply]
@Eirikr: In translations tables on English entries, if both readings are equal. Unless you prefer to restrict to just one (which is OK but incomplete).
seven o'clock#Translations (not the best example, if it's an SoP) has only one Japanese reading but there are two readings: 七時(しちじ) (shichiji) or 七時(ななじ) (nanaji). Should there be only one, in your opinion? Anatoli T. (обсудить/вклад) 05:40, 30 March 2023 (UTC)[reply]
If it's a tangent, we can split off but it seems a bit relevant. Anatoli T. (обсудить/вклад) 05:42, 30 March 2023 (UTC)[reply]
  • If a written form has a set of readings that all have the same meaning, that's great. But many multi-reading Japanese terms have distinct meanings for the different readings. The single-kanji terms are probably the worst for this, as mentioned above with 生. Multi-kanji terms can still have their own complications as well, such as 宮殿. With the reading kyūden, this refers to a king's palace, or a shrine for a kami. With the reading kūden, this refers instead to a kind of miniature Buddhist temple, small enough to put on a 仏壇 (butsudan) dais.
As another example, there's the spelling 強る. Read as tsuyoru, this means "to get strong(er)". Read as kowaru, this means either "to get stiff and hard", or "to have a pain in the belly".
I'm open to the idea of having a template that returns all the readings of a given Japanese spelling, but I do not think that this should be the default behavior of the {{bor}} or {{der}} examples given at the top of this thread. ‑‑ Eiríkr Útlendi │Tala við mig 06:02, 30 March 2023 (UTC)[reply]
@Eirikr: I am pretty sure @Huhu9001 refers to general language templates, which can now handle Chinese transliterations automatically. Your examples are valid as well just in a different scenario. Anatoli T. (обсудить/вклад) 06:22, 30 March 2023 (UTC)[reply]
You may still need to provide multiple readings but with glosses (senses). Anatoli T. (обсудить/вклад) 06:45, 30 March 2023 (UTC)[reply]
Is the senseid/etymid system too difficult to use? Do we need some system to make the IDs more visible - the only technique I know is to open the term's definition in a suitable editor. --RichardW57m (talk) 09:23, 30 March 2023 (UTC)[reply]
I've been adding {{etymid|ja|READING}} to entries with multiple readings (where READING is the modified-Hepburn romanization), just under the ===Etymology X=== heading and just above {{ja-kanjitab}}. Since {{ja-kanjitab}} is used to indicate the reading for each etym section, it might make more sense to just have the {{ja-kanjitab}} template itself add an {{etymid}}.
Another much more extensive approach, discussed multiple times over the years, would be to locate each term's actual entry data at some combination of the kanji spelling and the reading, such as putting all the nama information for the spelling at something like [[生/なま]], and then transclude from there into the Japanese section at [[生#Japanese]]. But this would require a lot of reworking and testing, and editors of other languages have voiced concerns about the consistency of the page structure, etc. etc.
(For that matter, I've long thought that including all languages' data in one big page under a shared grapheme is a bit wrong-headed, and each language should have its own sub-page of that grapheme. So pages like [[a]] wouldn't be so huge -- the English entry wikicode would be at something like [[a/en]], the Hawaiian at [[a/haw]], etc. This idea has faced similar opposition based on concerns about having to retool. However, since the last time I recall this coming up, we have adopted basically this same approach to pages like WT:BEER, where the wikicode for all the content shown at WT:BEER actually lives instead at sub-pages like Wiktionary:Beer_parlour/2021/June, so it's clearly a workable approach...) ‑‑ Eiríkr Útlendi │Tala við mig 20:03, 30 March 2023 (UTC)[reply]
Japanese headword templates have already been adding special section names for different readings since a long time ago. E.g. 生#き brings you directly to Etymology 3. -- Huhu9001 (talk) 01:36, 31 March 2023 (UTC)[reply]
@Huhu9001 -- Hmm, just tried that link to 生#き, it just takes me to the top of the entry. A quick look at the page source shows nothing with id="き", which would presumably be needed as the target for the #き part of the hyperlink.
Any chance something changed in the eleven days between when you wrote the above, and now when I just tried the link? ‑‑ Eiríkr Útlendi │Tala við mig 05:31, 11 April 2023 (UTC)[reply]
@Eirikr: Yes, it is now changed to 生#Japanese:_き. -- Huhu9001 (talk) 05:41, 11 April 2023 (UTC)[reply]
@Huhu9001 -- Thank you! That works.
FWIW, it appears twice in the page source, once for each POS. The link only works for the first one.
Shouldn't this link instead to the top of the etymology section, rather than to a specific POS? ‑‑ Eiríkr Útlendi │Tala við mig 05:52, 11 April 2023 (UTC)[reply]
I don't know what other Japanese editors want. -- Huhu9001 (talk) 08:41, 16 April 2023 (UTC)[reply]
(PS This looks exciting. As a non-Japanese speaker, I'd love to have something automatic when I write etymologies for Japanese colonial nomenclature in Taiwan like Shiodome. --Geographyinitiative (talk) 07:47, 30 March 2023 (UTC))[reply]
@Huhu9001 I don't think the sortkey issue is as difficult as you're making it out to be. The very existence of large numbers of pages using {{DEFAULTSORT:...}} means there are lots of pages with only one possible sortkey. Sortkeys can still be specified manually for pages/terms with multiple readings, possibly with defaults. I can also imagine that categories of readings could be defined; e.g. if a term has multiple readings in general but only one reading as a toponym, given name or surname, the {{place}}, {{given name}} or {{surname}} template could choose the right reading automatically. Even if it's not perfect and requires manual help in some situations, it would be a lot better than the current all-manual situation. Benwing2 (talk) 00:45, 31 March 2023 (UTC)[reply]
@Benwing2: So basically you are saying you prefer Plan A I described above. -- Huhu9001 (talk) 01:29, 31 March 2023 (UTC)[reply]
@Atitarev: Please correct me if I misunderstood you. Did you suggest an input format of {{l|ja|来る|tr=くる}} with the |tr=くる part compulsory and oppose reading kana from the entry for transliterations? -- Huhu9001 (talk) 15:57, 31 March 2023 (UTC)[reply]
@Huhu9001: No, I don’t oppose it, if this is what everyone wants and it’s going to work. I think kana reading should be exposed, though, as ruby or before rōmaji. I won’t insist on this. Anatoli T. (обсудить/вклад) 03:12, 1 April 2023 (UTC)[reply]
@Atitarev I agree, though I’d really prefer rubytext over putting it in the transliteration parameter. Theknightwho (talk) 12:49, 1 April 2023 (UTC)[reply]

@Eirikr, AG202, Atitarev, Cnilep, Benwing2 To address the heteronym problem, which solution do you prefer?

  • Solution A: Show no romaji for heteronyms.
    開ける, 一, さん, 輸送 (yusō)
  • Solution B: Show the romaji of all heteronyms.
    開ける (akeru/hirakeru), 一 (ichi/ichi/itsu/hito/hito-/hi/hī/ī/Hajime), さん (san/-san), 輸送 (yusō)
  • Solution C: Show the romaji of some heteronyms, chosen arbitrarily.

-- Huhu9001 (talk) 05:24, 3 April 2023 (UTC)[reply]
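The three options differ only in what is output once more than one reading has been found. A sketch, assuming the romanized readings of each heteronym have already been fetched from the target page; the function names are hypothetical and not part of any existing module.

```python
from typing import List, Optional


def romaji_solution_a(readings: List[str]) -> Optional[str]:
    """Solution A: show romaji only when there is exactly one reading;
    heteronyms get nothing and need a manual |tr=."""
    return readings[0] if len(readings) == 1 else None


def romaji_solution_b(readings: List[str]) -> Optional[str]:
    """Solution B: show every heteronym's romaji, separated by slashes."""
    return "/".join(readings) if readings else None


def romaji_solution_c(readings: List[str]) -> Optional[str]:
    """Solution C: show one reading chosen arbitrarily (here, the first)."""
    return readings[0] if readings else None


examples = {
    "開ける": ["akeru", "hirakeru"],
    "輸送": ["yusō"],
}
for term, readings in examples.items():
    print(term, romaji_solution_a(readings),
          romaji_solution_b(readings), romaji_solution_c(readings))
```

Under Solution A an unambiguous term such as 輸送 still gets its romaji automatically, while a heteronym such as 開ける falls back to manual input, which is how the Chinese handling behaves.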

Solution A: fail to produce romaji on heteronyms. That's what the Chinese equivalent does. Manual input is required. Anatoli T. (обсудить/вклад) 05:32, 3 April 2023 (UTC)[reply]
I'll weigh in with +1 for Solution A, much as Anatoli says. ‑‑ Eiríkr Útlendi │Tala við mig 05:34, 11 April 2023 (UTC)[reply]

I changed my mind. Now I don't think this is a good idea but instead I prefer a solution similar to that of Arabic. -- Huhu9001 (talk) 02:08, 21 April 2023 (UTC)[reply]

@Huhu9001: Do you mean always using kana in |tr= (or |r=, which doesn't exist yet)?
Arabic uses vocalisations: الْيَابَان (al-yābān), compare with unvocalised اليابان. I don't see how you can show terms with kanji or mixed with kanji.
Something like 日本 (にほん) (input) to get 日本(にほん) (Nihon) or 日本 (にほん, Nihon) or 日本 (Nihon)? Anatoli T. (обсудить/вклад) 02:17, 21 April 2023 (UTC)[reply]
@Atitarev: This is what I mean:
  • {{m|ja|日本}} -> 日本: no transliteration without phonetic notations.
    Like {{m|ar|اليابان}} -> اليابان: no transliteration when not vocalized.
  • {{m|ja|[日](^に)[本](ほん)}} -> 日(に)本(ほん) (Nihon): transliteration only with phonetic notations.
    Like {{m|ar|الْيَابَان}} -> الْيَابَان (al-yābān): transliteration only when vocalized.
-- Huhu9001 (talk) 02:28, 21 April 2023 (UTC)[reply]
@Huhu9001: I see, thanks. You don't expect users to enter something like {{m|ja|[日](^に)[本](ほん)}}. A method like {{m|ja|日本(^にほん)}} (or use ^に%ほん, which I dislike) is more palatable. Anatoli T. (обсудить/вклад) 02:42, 21 April 2023 (UTC)[reply]
@Atitarev: If you can do {{m|ja|[日](^に)[本](ほん)}}, surely you can do {{m|ja|[日本](^にほん)}}, and {{m|ja|日本}} still remains a choice for the laziest user. A user can easily decide how much effort they put in the input, depending on how beautiful they want the output to be. -- Huhu9001 (talk) 03:09, 21 April 2023 (UTC)[reply]
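The input notation proposed here can be sketched in Python: each [base](reading) pair contributes its reading, bare kana passes through unchanged, and any other character (an unannotated kanji) suppresses the transliteration, much as unvocalized Arabic gets none. This assumes, as in {{ja-usex}}, that ^ capitalizes the following syllable. The three-entry kana table is a hypothetical stand-in for a real modified-Hepburn romanizer, which would also need long vowels, small kana, the apostrophe after ん before a vowel, and so on.

```python
import re
from typing import Optional

# Hypothetical, deliberately tiny kana table covering only the example.
KANA_TO_ROMAJI = {"に": "ni", "ほ": "ho", "ん": "n"}

# Matches one "[base](reading)" pair, as in [日](^に).
RUBY_PAIR = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def reading_from_markup(text: str) -> str:
    """Replace each [base](reading) pair by its reading; keep bare text."""
    return RUBY_PAIR.sub(lambda m: m.group(2), text)


def transliterate(text: str) -> Optional[str]:
    """Romanize the reading, or return None if any character (e.g. an
    unannotated kanji) is not kana this table can romanize."""
    out = []
    capitalize_next = False
    for ch in reading_from_markup(text):
        if ch == "^":
            # Capitalization marker: applies to the next romanized unit.
            capitalize_next = True
            continue
        romaji = KANA_TO_ROMAJI.get(ch)
        if romaji is None:
            return None
        out.append(romaji.capitalize() if capitalize_next else romaji)
        capitalize_next = False
    return "".join(out)


for markup in ["[日](^に)[本](ほん)", "[日本](^にほん)", "日本"]:
    print(markup, transliterate(markup))
```

Per-kanji and whole-word annotations give the same romaji, so, as noted above, the choice between them only affects how the ruby is displayed.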
Support kana only input for Japanese Anatoli T. (обсудить/вклад) 07:33, 21 April 2023 (UTC)[reply]

See also[edit]

Are the 'See also' sections on Gwoyeu Romatzyh and Hanyu Pinyin (for instance) a correct use of Wiktionary:See also? Is a semantic relation between the entry and the See also link needed, and if so, does it exist for those lists? Also, do you think my revert of that o so wonderful fool (@Van Man Fan) at diff and diff was justified on the basis I gave? (Please ping me. Thanks in advance!) --Geographyinitiative (talk) 14:08, 29 March 2023 (UTC) (Modified)[reply]

I personally don't care about any of those reverts. Van Man Fan (talk) 18:31, 29 March 2023 (UTC)
The ones at Gwoyeu Romatzyh etc should be listed as coordinate terms, since that's what they are (I already did this at Yale a while back). —Al-Muqanna المقنع (talk) 11:23, 30 March 2023 (UTC)

{{col}} vs bare lists in "Derived terms" headings?

I was wondering whether policy expresses any preference between using the {{col}} family of templates (e.g. {{col-auto}}) and plain bulleted lists when listing terms, such as in derived-terms sections under a word. I see a mix of this template and plain bulleted lists used around Wiktionary, and I was wondering whether there is any reason to use one or the other, besides aesthetic ones. Any opinions or agreements on this? Kiril kovachev (talk) 19:47, 30 March 2023 (UTC)

There’s no policy, but the column templates do automatic sorting. I’d only use bulleted lists if there are 3 terms or fewer, personally. Theknightwho (talk) 19:52, 30 March 2023 (UTC)
I have been leaving five or fewer terms as bare lists, and applying column templates for six or more terms. — Sgconlaw (talk) 21:49, 30 March 2023 (UTC)
I recently had Polish and Czech entries use {{col-auto}} because it's easier. Wonderfool came along and then it was applied to Spanish. There is no policy. If you and the group of editors within a certain language agree it should be used, you can, and can even have a bot convert existing entries. Vininn126 (talk) 20:06, 30 March 2023 (UTC)
Maybe English should use this template more. If we converted English entries to that, it would bring a level of consistency not yet seen, and it would be very useful. @Equinox @Theknightwho @DCDuring @JeffDoozan and I'm sure I'm missing more, but what do y'all think? I know WF approves, but take that for what it's worth. Vininn126 (talk) 20:32, 30 March 2023 (UTC)
Strongly agree. Theknightwho (talk) 20:46, 30 March 2023 (UTC)
I have two problems with tables that can probably be addressed:
  1. I work on tables that have both English and Translingual items with a given line being [English item] ([Translingual item]). The Translingual item appears orange rather than blue even when there is a Translingual L2 section linked. That means I can't use the orange as an indication that there is no Translingual entry rather than a capitalized word in, say, Latin or German.
  2. Elimination of duplicates and other changes are more difficult because the auto-alphabetized display means that those adding items or merging tables seem to feel no need to put them in alphabetical order. It can be quite tedious trying to improve large tables with this problem.
I am hopeful that both of these problems can be resolved so I can support the emerging consensus. DCDuring (talk) 21:28, 30 March 2023 (UTC)
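DCDuring's second point is that once the display is auto-sorted, the source order of a list drifts and duplicates become hard to spot. As a rough illustration only (not how {{col-auto}} is implemented), a cleanup script could sort a term list and flag duplicates before the list is written back into a template. The function name is hypothetical, and sorting by case-folded text is an assumed stand-in for whatever language-specific sorting the templates actually apply.

```python
from collections import Counter
from typing import List, Tuple


def normalize_term_list(terms: List[str]) -> Tuple[List[str], List[str]]:
    """Return (sorted unique terms, terms that appear more than once)."""
    # Strip stray whitespace so "rose" and "rose " count as the same item.
    counts = Counter(term.strip() for term in terms)
    # str.casefold is a stand-in sort key (assumption); real per-language
    # sorting on Wiktionary may order some items differently.
    unique = sorted(counts, key=str.casefold)
    duplicates = [term for term in unique if counts[term] > 1]
    return unique, duplicates
```

For example, normalize_term_list(["rose", "Rosa", "briar", "rose "]) returns (["briar", "Rosa", "rose"], ["rose"]), so the duplicate shows up explicitly instead of being hidden by the sorted display.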
What if biological terms were skipped until they were addressed? Vininn126 (talk) 21:29, 30 March 2023 (UTC)
One major advantage of the {{colN}} templates is that they are a single template invocation, vs. a large number of calls to {{l}} when bulleted lists are used with large lists. This dramatically reduces memory usage. Benwing2 (talk) 23:44, 30 March 2023 (UTC)
Sure, that's one reason why I think it's a good idea to switch. They are also easier to use for new people, and less of a hassle. And also tidier and easier to maintain. Vininn126 (talk) 06:23, 31 March 2023 (UTC)
@User:Vininn126 I'm not sure exactly what you mean. Do you mean if there is a single instance in a table of a term with formatting like ''[[Rosa]]''? Even that doesn't address the issue completely as taxonomic terms above the rank of genus don't have any distinguishing formatting.
@User:Theknightwho, @User:Benwing2 I suppose that we could put items that contain or potentially contain taxonomic names into a separate derived/related terms table. This already occasionally occurs, for other reasons. There are some gray areas (as for items that are the product (fruit, wood, shell, etc.) of specific groups of organisms), but it could be workable. DCDuring (talk) 14:24, 31 March 2023 (UTC)
Using a columns template for a small number of entries can just make it harder to read, which is super frustrating when people just apply it carte blanche. – Sokkjō 07:41, 31 March 2023 (UTC)
Personally I do not experience this. Vininn126 (talk) 07:43, 31 March 2023 (UTC)
Great, but personally, I do. – Sokkjō 07:44, 31 March 2023 (UTC)
Honestly I dislike looking at bare lists. I think we'd have to see how many people have the same problem as you; if it's considerable then it's something worth considering. Vininn126 (talk) 07:48, 31 March 2023 (UTC)
True, some people are obsessed with using templates. – Sokkjō 08:16, 31 March 2023 (UTC)
@Sokkjo All this suggests is that we need column templates to handle a small number of items better. It’s not about being “obsessed” - it’s about ensuring consistency and ease of use, instead of giving people bare lists which are in some random format and order. Theknightwho (talk) 12:09, 31 March 2023 (UTC)
There are users that use column templates for two or even one item, sacrificing legibility for the sake of using a template. I call that obsessed. – Sokkjō 21:14, 31 March 2023 (UTC)
@Sokkjo I just addressed that point, so I'm not sure why you repeated it. The obvious solution is to make sure it looks acceptable with one or two terms, instead of mixing formats. Theknightwho (talk) 05:52, 1 April 2023 (UTC)
Great, so you agree that they're template-obsessed zealots, perfect. -- Sokkjō 13:51, 1 April 2023 (UTC)
You've yet to say why the mentioned upsides don't outweigh the upsides of not using templates, you've only referred to a type of ad-hominem (calling them zealots) and avoided any other questions. Would you care to explain, or do you have nothing to actually add instead of rude comments? Vininn126 (talk) 14:02, 1 April 2023 (UTC)
What questions am I not answering and which nameless person am I being rude to? -- Sokkjō 15:32, 1 April 2023 (UTC)
I refer to the point of the column templates 1) allowing for better homogeneity 2) being easier to use 3) it has been mentioned they use less memory. So far your downside is "they look bad to me" and nothing else, and to be honest I consider this less of a point because it's so subjective - that aesthetic value will be different for each person. If enough people voice that same opinion, we can talk, but there are so many technical upsides.
The rudeness is using terms like "template-obsessed zealots", it's very clear you are referring to people supporting the conversion, i.e. many people in this thread.
Pretending you don't understand doesn't make this less true. Vininn126 (talk) 15:35, 1 April 2023 (UTC)
You seem to think "they look bad to me" is an invalid point. You also seem to think I'm in the minority but several people above agree that columned lists with few items "look bad". I've also noted that I find them less legible. Is that also invalid? I haven't heard any "technical upsides", just cries for policy creation. -- Sokkjō 15:54, 1 April 2023 (UTC)
Did you just completely ignore the memory point? That's a "technical upside". And I didn't say it's invalid, I just said that it's subjective. They do objectively save memory. Maybe I should say that yet again because you've managed to miss that point 2-3 times in this discussion. Vininn126 (talk) 15:57, 1 April 2023 (UTC)
I haven't missed it, it's just a moot point as it only comes into play with very large lists, and I don't think anyone is against their usage there. The issue of contention is templates being used in short lists. -- Sokkjō 16:16, 1 April 2023 (UTC)
It's actually something very important for people with slower computers - it can vastly speed up load time. I thought surely a programmer of your level would be aware of such a thing... Vininn126 (talk) 16:19, 1 April 2023 (UTC)
When we talk about memory, we're not talking about user-end computer memory -- we're talking about Lua memory usage. Clearly, you have no idea what you're talking about. -- Sokkjō 17:36, 1 April 2023 (UTC)
I am aware - it can still reduce load time? Vininn126 (talk) 17:41, 1 April 2023 (UTC)
@Vininn126: No, you are not, because if you did, you would have never said "slower computers". Please stop falsely claiming you have knowledge or understanding you do not possess. -- Sokkjō 17:55, 1 April 2023 (UTC)
Fine, that aside, the actual technical point is moot, but your subjective preference is more important? It sounds like you're bending over backwards to make a point just because you have a distaste for templates. It's fine to dislike them, but disregarding objective points while giving weight to subjective ones is bad reasoning. Vininn126 (talk) 17:57, 1 April 2023 (UTC)
I don't have a "distaste for templates" -- I've created many, ones you use every day here on the project -- I have a distaste for over-templatization, like when users use a columns template for a single item, which is supported by others above and below. -- Sokkjō 18:08, 1 April 2023 (UTC)
I don't know what everyone thinks, but my opinion is that the different-colored background and the sorted, collapsible entries that the {{col-auto}} template generates are super helpful visually, cleaner to use, and just prettier. I don't want to be profligate with it, though, if people are finding it inconvenient for whatever reason; @Sokkjo could you explain why you find col-lists hard to read please? I also noticed that your (@JeffDoozan) bot was recasting bare lists under Spanish entries to this col format, and this has been very much to my liking as I have been using Wiktionary to learn Spanish lately. My personal conclusion is that I'd like to use {{col}} more in editing, but I also hope we can agree on what is best and whether to systematise whatever we eventually settle on. Kiril kovachev (talk) 20:22, 31 March 2023 (UTC)
  • I hope no one objects to my removal of organism-name items from such templates for inclusion in separate tables of, e.g., derived terms. Such separate tables are already in use for non-technical reasons in some sections of some entries, notably for Latin adjectives used as specific epithets.


Reminder: Office hours about updating the Wikimedia Terms of Use

You can find this message translated into additional languages on Meta-wiki.

Hello everyone,

This is a reminder that the Wikimedia Foundation Legal Department is hosting office hours with community members about updating the Wikimedia Terms of Use.

The office hours will be held on April 4, from 17:00 UTC to 18:30 UTC. See here on Meta for more details.

We hereby kindly invite you to participate in the discussion. Please note that this meeting will be held in English and led by the members of the Wikimedia Foundation Legal Team, who will take and answer your questions. Facilitators from the Movement Strategy and Governance Team will provide the necessary assistance and other meeting-related services.

On behalf of the Wikimedia Foundation Legal Team, Mervat (WMF) (talk) 12:52, 31 March 2023 (UTC)

/ø/ in Dutch

I am wondering whether in the Dutch language we use /ø/ or /øː/ in words like leuk. The page Appendix:Dutch_pronunciation lists /ø/, but on the other hand we have the page Rhymes:Dutch/øːk. –Jérôme (talk) 16:49, 31 March 2023 (UTC)

Quoting from Wikipedia, Dutch phonology § Monophthongs:
  • The native /eː, øː, oː, aː/ as well as the non-native nasal /ɛ̃ː, œ̃ː, ɔ̃ː, ɑ̃ː/ are sometimes transcribed without the length marks, as ⟨e, ø, o, a, ɛ̃, œ̃, ɔ̃, ɑ̃⟩.
The length is not distinctive, and I suppose that in secondary stress positions (mensenheugenis, zwamneus) it may be shortish. I think, though, the appendix page should use /eː, øː, oː, aː/, just like we do in the pronunciation sections of Dutch terms with IPA pronunciation.  --Lambiam 07:40, 3 April 2023 (UTC)