Wiktionary:Grease pit/2022/May

Bot for self creating conjugations from verb

I'd love a bot that would create all conjugated forms of a verb, especially for German, where doing it by hand can be quite tedious (even with acceleration) because many of the forms, like the zu-infinitive and the preterite, are often not shown. ADDSamuels (talk) 09:59, 1 May 2022 (UTC)

Unicode Private Use Area

I tried to create (U+E864), but I was unable to do so.

Specifically, I tried to post the following content:

{{character info}} ==Chinese== ===GB18030=== ''For pronunciation and definitions of <span style="font-family : 'SimSun', 'MingLiU', 'Dotum', 'Gulim', 'Gungsuh'">{{PAGENAME}}</span> – see [[龻]]''

See w:GB 18030. The entry would be useful as a soft redirect. --Charidri (talk) 09:22, 3 May 2022 (UTC)

Chinese
GB18030

For pronunciation and definitions of  – see 龻　← This is how I would like it to look. --Charidri (talk) 09:33, 3 May 2022 (UTC)

@Charidri: we don't create entries for Private Use Area characters. The character you see on your system and the character I see on my system may be completely different. Anyone can use those codepoints for anything they want. Chuck Entz (talk) 13:40, 3 May 2022 (UTC)[reply]

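I understand that. However, code points such as U+E815 through U+E864, where the correspondence between Unicode and GB18030 (the Chinese standard) is clear, are worth creating entries for. Even where the glyph looks the same, an entry can state clearly that the character is different. And anyone can verify this with standard fonts such as SimSun. It is a de facto standard. --Charidri (talk) 04:54, 4 May 2022 (UTC)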

@Charidri: I just created Appendix:Unicode/Private Use Area/GB 18030. This would be sufficient. --172.58.36.21 01:14, 7 May 2022 (UTC)[reply]

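Thank you for creating the list. However, I think it would be better to have an individual entry for each character. A single entry would let readers compare across standards: for example, U+E864 is 龻 in GB 18030, 藮 in HKSCS, and ᄫᅡᇹ in Hanyang. Readers could check the various representative assignments, and it would be clear that the character is a PUA character. --Charidri (talk) 06:11, 7 May 2022 (UTC)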

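For now, I have specified GB-compatible fonts at Appendix:Unicode/Private Use Area/GB 18030. However, that does not change the value of creating PUA entries. The benefit of creating them is that, even when the appearance is the same, readers can tell that the character is a PUA character. I still think it should be possible to create individual PUA entries, including (U+E864). --Charidri (talk) 10:16, 25 May 2022 (UTC)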

Module:el-translit

There is currently a module error at Greek ψεύτρα (pséftra) due to interaction between the respelling used in a transliteration parameter:

wikitext: {{m|el|ψεύτης|g=m|t=liar|ψεύ(της)}}

and a mw.ustring.gsub capture on line 43 of the module. @Sarri.greek pinged @Benwing2 in her edit summary, but he hasn't been active for over a month.

Her edit summary was:

ety -τρα // bug: if i write {{l or {{af or {{m|el|ψεύτη|ψεύτ(ης)}} it works, but gives error if τ is moved: {{m|el|ψεύτη|ψεύ(της)}} Attn. Benwing2

Any help would be appreciated, as I have no clue about the finer points of ustring functions. Thanks! Chuck Entz (talk) 15:07, 5 May 2022 (UTC)[reply]

Thank you @Chuck Entz. I did not realize it was a transliteration problem. It is caused by the parenthesis, which is not expected at Module:el-translit: in the line text = gsub(text, "([αεηΑΕΗ])([υύ])(.?)", the 'following' letters do not include a parenthesis symbol.

This intrusion of a parenthesis mark is not common and occurs only in etymology (for composition). A simple solution is |tr=- or tr=manual. ‑‑Sarri.greek ^♫ I 15:22, 5 May 2022 (UTC)[reply]
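A minimal Python sketch of the failure mode described above; it is not the actual Module:el-translit code. It assumes, as a simplification, that υ after α/ε/η is romanized as f before a voiceless consonant and as v otherwise. The `VOICELESS` set and both function names are hypothetical. It shows that the third capture group picks up "(" when the parenthesis directly follows the digraph, and that skipping non-letters recovers the consonant that was meant.

```python
import re
import unicodedata

# Simplified, assumed set of voiceless consonants (not taken from the module).
VOICELESS = set("θκξπσςτφχψ")

# Same shape as the pattern quoted above: vowel, upsilon, then any one character.
# NFC normalization makes the accented upsilon compare equal however it was typed.
DIGRAPH = re.compile(
    unicodedata.normalize("NFC", "([αεηΑΕΗ])([υύ])(.?)"), re.DOTALL
)


def captured_following(word):
    """Return what the third capture group sees after the first digraph."""
    match = DIGRAPH.search(unicodedata.normalize("NFC", word))
    return match.group(3) if match else None


def upsilon_sound(word):
    """Return 'f' or 'v' for the first digraph, skipping non-letters like '('."""
    word = unicodedata.normalize("NFC", word)
    match = DIGRAPH.search(word)
    if match is None:
        return None
    # Look past punctuation to the next real letter instead of trusting group 3.
    for ch in word[match.end(2):]:
        if ch.isalpha():
            return "f" if ch in VOICELESS else "v"
    return "v"  # arbitrary fallback for this sketch when no letter follows


print(captured_following("ψεύτ(ης)"))  # the consonant τ
print(captured_following("ψεύ(της)"))  # the parenthesis
print(upsilon_sound("ψεύ(της)"))
```

With the respelling ψεύτ(ης) the capture is the letter τ, which a lookup table of letters can handle; with ψεύ(της) it is the parenthesis, which such a table would not contain.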

The module does not allow manual correction |tr=pséf(tis). ‑‑Sarri.greek ^♫ I 15:54, 5 May 2022 (UTC)[reply]

Fixed the module error at least. — Eru·tuon 20:23, 6 May 2022 (UTC)[reply]

Also made consonantal υ be more often transliterated correctly. — Eru·tuon 21:09, 6 May 2022 (UTC)[reply]

Caught in the vandalism filter

Tried to make an entry for sexx0r, only to be told that I can't because it contains "xx". What should I do? Binarystep (talk) 12:21, 6 May 2022 (UTC)[reply]

@Chuck Entz. Binarystep (talk) 12:39, 6 May 2022 (UTC)[reply]

I created the entry. The filter should probably be adjusted to be less restrictive about which users it applies to. - TheDaveRoss 13:12, 6 May 2022 (UTC)[reply]

I added a check for edit count. The problem that led me to create the filter was the proliferation of new and clueless smartphone users who thought they could pull up porn by typing in "xx". Those are all IPs and brand-new accounts (we still get them all the time, years later). The filter could still use some tightening up to allow editing of things like Roman-numeral entries. Chuck Entz (talk) 21:24, 8 May 2022 (UTC)[reply]
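The rule described above can be sketched roughly as follows. This is plain Python for illustration, not AbuseFilter syntax, and the threshold `MIN_EDITS` is a made-up value, since the real cutoff is not stated in this thread.

```python
# Hypothetical threshold; the actual filter's edit-count cutoff is not given here.
MIN_EDITS = 50


def is_blocked(title, user_edit_count, min_edits=MIN_EDITS):
    """Disallow a page whose title contains "xx" unless the user is established."""
    has_trigger = "xx" in title.lower()
    is_new_user = user_edit_count < min_edits  # IPs are treated as having 0 edits
    return has_trigger and is_new_user


print(is_blocked("sexx0r", 0))       # new user or IP: blocked
print(is_blocked("sexx0r", 10000))   # established editor: allowed
print(is_blocked("XXI", 0))          # Roman numeral still caught for new users
```

The last call illustrates the remaining tightening mentioned above: a Roman-numeral title still trips the check for a new user.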

fixing title display

Hello. What is the template to correct the display of an article title? For example, if a character displays as an emoji in some fonts, and we wish to suppress that, or if we want the title to display as italic or with subscripts. French WK has tl:titre incorrect, but that doesn't connect to anything here. kwami (talk) 20:32, 8 May 2022 (UTC)[reply]

As far as I can tell, there isn't one. The MediaWiki magic word is DISPLAYTITLE: but searching for it in the template namespace turns up only a few very specialized templates. Like DEFAULTSORT, this can be problematic: for instance, some Han characters display differently for different languages, or even for simplified vs. traditional Chinese. A template like you're describing might interfere with that kind of thing and perhaps even lead to disputes. Chuck Entz (talk) 21:12, 8 May 2022 (UTC)[reply]

So, how should I handle something like M_☉ (solar mass), where the M is italic and the ☉ subscript? The actual characters are M☉, but that's not how the symbol is rendered.

I see for H₂O we get around this by using the hack H₂O, even though those are the wrong characters. And there is a Unicode italic 𝑀 we could use with a redirect, but no subscript ☉. kwami (talk) 21:27, 8 May 2022 (UTC)[reply]

Well, even problematic things like redirects have exceptions where they're appropriate, and that looks like one of them. Whenever something new is proposed, I always ask myself "what could go wrong", but that doesn't mean I'm against everything. If other people think it's a good idea to create such a template, I'm not going to object. Chuck Entz (talk) 21:42, 8 May 2022 (UTC)[reply]

A template like that could indeed be used to force a simplified or traditional display for CJK characters, but someone would have to go to some effort to find the proper variation selector to achieve that. The question would be why they did so, and if that's an appropriate reason.

I added a rd from H2O to H₂O, but with this template, the entry could be moved to H2O where it would be easier for readers to find, and there would be no need for a rd. The same for H₂SO₄ and similar chemical formulae where we currently use hacks.

Another use would be with the signs of the zodiac and the astronomical symbol for a comet, ☄. We generally don't want entries for emojis, and these aren't. But they have an emoji option, and some popular fonts display them as emojis by default. Readers using such a font in their browser may wonder why I get to create Wikt articles for emojis but they don't. Forcing ☄ &c. to display in their non-emoji form would address that issue, while having no effect on most readers (since most fonts display them as text by default).

What could go wrong there: a reader might not have any font that supports text display of ☄, so if we forced it, they'd only see a box. But again that requires a variation selector, and it wouldn't be a problem for simple HTML formatting such as italics and subscripts. kwami (talk) 21:55, 8 May 2022 (UTC)[reply]

Another example is e.g. xkuMS in Chatino. That should be xku^MS, but there is not yet Unicode support for Chatino superscript S. Chatino ^C and ^F were just added to Unicode last year, but don't have much font support yet, and so will be impractical for probably a few years. So for the time being, the best approach would probably be to write them as we do now and use a title-formatting template to fix. kwami (talk) 16:32, 9 May 2022 (UTC)[reply]

I guess there is not a template for this yet(?), just the magic word. If there is some advantage to having a template and not just using the magic word, and you're able to write a template, go ahead, I guess. Fixing up entries' titles would be useful; the issue, as Chuck says, is just that there'd be certain entries it couldn't be used on (if there were multiple language sections that should display differently), but that's no reason not to let most of the entries get fixed up. If there are particular sets of terms which use particular templates and which are unlikely to be homographic to terms in another language that would need to display differently, we could even consider using the magic word / template inside those templates (or their underlying modules), like we already do to verticalize Mongolian-script entries' titles. E.g., maybe we could/should make the Chatino headword templates always superscript terminal capitals with only a parameter to turn it off in the apparently-uncommon situation of conflict (e.g. if there were a Chatino word *a^E it'd have to be turned off there because there's also German aE), or having the {{taxon|genus always italicize the page title except in the apparently statistically-rare case of homography with something that shouldn't be italicized, like Homo. (I feel like the second one must've been discussed before already somewhere...) - -sche (discuss) 16:06, 18 May 2022 (UTC)[reply]

Lua memory usage on bar

The page bar was exceeding the Lua memory quota. I tried changing a bunch of the {{IPA}} and {{head}} templates to their -lite equivalents, and this only helped a little. But then I noticed there was a recent change to the German declension templates and modules, and when I undid this, it fixed the issue. Can someone look into whether those templates are doing something unreasonably costly? It may also be that they aren't super costly, but that they just happened to be the straw that broke the camel's back on a rather long page. 70.172.194.25 05:07, 10 May 2022 (UTC)[reply]

@Benwing2 Chuck Entz (talk) 05:11, 10 May 2022 (UTC)[reply]

Synonym collapser thing looks wrong

See:

A definition
Synonym: hi

For me, the definition line looks like:

A definition synonym ▲

When it should look more like:

A definition [synonym ▲]

Just started happening today, and it only affects nyms, not quotations. This, that and the other (talk) 12:01, 11 May 2022 (UTC)[reply]

@This, that and the other: See MediaWiki talk:Gadget-defaultVisibilityToggles.js § CSS class. J3133 (talk) 12:19, 11 May 2022 (UTC)[reply]

Bot request: redundant Italian rhymes

This search shows 634 pages that use {{it-pr}}, which generates rhymes automatically, but also {{rhymes|it|}}, which is added by edits in the Rhymes namespace. Can someone remove the rhyme templates from these pages? I can't think of a case where the pronunciation template doesn't take precedence over the one for rhymes. Ultimateria (talk) 00:27, 12 May 2022 (UTC)[reply]
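A rough Python sketch of what such a cleanup could look like on a page's wikitext. It is only an illustration under assumptions: the function name and the regular expression are hypothetical, it treats the rhymes template as sitting on its own list line, and it says nothing about how the job was actually run.

```python
import re

# Matches a whole line such as "* {{rhymes|it|ato|s=3}}" (the bullet is optional).
RHYMES_LINE = re.compile(
    r"^[*:#]*[ \t]*\{\{rhymes\|it\|[^{}]*\}\}[ \t]*\n?", re.MULTILINE
)


def strip_redundant_rhymes(wikitext):
    """Remove {{rhymes|it|...}} lines, but only on pages that also use {{it-pr}}."""
    if "{{it-pr" not in wikitext:
        return wikitext  # no automatic rhymes here, so keep the manual template
    return RHYMES_LINE.sub("", wikitext)


page = "===Pronunciation===\n{{it-pr}}\n* {{rhymes|it|ato|s=3}}\n\n===Verb===\n"
print(strip_redundant_rhymes(page))
```

Pages without {{it-pr}} are returned unchanged, so a manual rhyme is only dropped where the pronunciation template already supplies one.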

Done. —Svārtava (t/u) • 11:30, 15 May 2022 (UTC)[reply]

Template for Onkelos quotations?

Hi! Lately I've been adding quotations from the Targum Onkelos to Aramaic terms, such as דתאה. For this purpose I used the RQ:Tanach template, since the Targum is a translation of the Bible in Aramaic. But I was wondering - is there a template specific for Targumic citations? And if there is no such template - could it be created? (such a template may be identical to RQ:Tanach, with an additional note that the citation is from the Targum, and not the original Hebrew text). Thanks! Cymelo (talk) 07:41, 11 May 2022 (UTC)[reply]

https://en.wiktionary.org/wiki/Template:quote-book Vininn126 (talk) 11:48, 16 May 2022 (UTC)[reply]

quote-book is poorly suited for classical texts; {{Q}} is the recommended alternative. However, in Cymelo's case, you could also add a parameter to {{RQ:Tanach}}, or ask for someone to help you if you aren't confident to edit that hot mess of template syntax! This, that and the other (talk) 12:36, 16 May 2022 (UTC)[reply]

Why has nobody seen {{RQ:Onkelos}}? Fay Freak (talk) 14:37, 16 May 2022 (UTC)[reply]

If this template, which I did not make, is lame, you are free to borrow the code from one of my reference templates and even link various books and editions of the Targum and Talmud, or also the Mishna, e.g. on the model of {{R:ar:GdQ}} or {{RQ:Ibn Batoutah}}, and even provide links to the respective sections or pages of editions, e.g. from Sefaria, as their URLs do not look too unreasonable; I don't know what you use exactly. (I have little experience with adding such functionality to {{Q}}.) Fay Freak (talk) 14:48, 16 May 2022 (UTC)

Thanks, but I'm still not 100% sure about how to create or change templates (I'm quite new to Wiktionary). It would be better if somebody else could create it (I don't want to make a mess). Cymelo (talk) 12:10, 17 May 2022 (UTC)

I'm really not sure how to use this template, it seems pretty different from {{RQ:Tanach}}, and not suited for citations... Cymelo (talk) 12:04, 17 May 2022 (UTC)[reply]

Yes, I think that this would be the best option, since the Onkelos is an Aramaic translation of the Pentateuch, so apart from the language everything would be the same. An additional parameter such as Targum:Onkelus is basically what we need. Do you know whom I could ask for help? Cymelo (talk) 12:03, 17 May 2022 (UTC)

Category:Jandavra_language

Can someone edit the relevant module(s) to include the missing data for this language please? Info can be easily found in the linked WP article. Acolyte of Ice (talk) 13:10, 18 May 2022 (UTC)[reply]

Requesting the same for Category:Kalkoti language. Acolyte of Ice (talk) 13:13, 18 May 2022 (UTC)[reply]

Quiet Quinton Further Development

Is it possible to get QQ to not only backend G-books, but also WikiSource? Vininn126 (talk) 14:44, 18 May 2022 (UTC)[reply]

Big support for this idea from me. Theknightwho (talk) 11:58, 23 May 2022 (UTC)[reply]

Proto-Meso-Melanesian and Meso-Melanesian languages

@Kwékwlos has used these terms in the new entry for poke- so I'm wondering should we have these in our system here at Wiktionary? Researching this kind of stuff isn't really my thing so while I've glanced at Wikipedia I said I'd post here and see what people think and hopefully someone who knows the language template/module system can add data on this stuff if need be. Acolyte of Ice (talk) 12:12, 19 May 2022 (UTC)[reply]

@Acolyte of Ice Proto-Meso-Melanesian was first described by Ross (1988) as the ancestral language to the Meso-Melanesian linkage. But as a strict proto-language, it probably doesn't exist, being only a Western Oceanic residue of mutually intelligible dialects. Currently I am focusing on Bali (Uneapa) which is the most conservative language of Oceanic in phonology, but lacks a dictionary that could be used for comparative purposes. Besides I have to deal with areal words (shared by Willaumez and the Bariai languages). Kwékwlos (talk) 12:18, 19 May 2022 (UTC)[reply]

zh-pron Issue

@Fish bowl, Justinrleung, RcAlex36, Theknightwho Several years ago (I don't remember when anymore), either I or someone else was able to add Tongyong Pinyin to Template:zh-pron for both one-syllable and multi-syllable Chinese character entries. However, there was something stopping us from unlocking Wade-Giles for the multi-syllable entries. Keep in mind: all the syllables are already inputted into zh-pron, and they get displayed in the zh-pron box for all one-syllable Chinese characters with no problem! It feels like Wiktionary is one small step away from having Wade-Giles on the multi-syllable Chinese character entries. I don't know what that step was exactly; it feels like it was a technical issue and not a linguistic theory issue. Here is a book of multi-syllabic Wade-Giles forms for reference: [1]. I think that everything related to linguistics should already be inputted into zh-pron; it's just that there's some key element missing that's preventing Wade-Giles from being displayed in zh-pron for the multi-syllable entries. Can anyone help me identify what that small remaining problem is so we can determine how to overcome it? Category:English terms derived from Wade-Giles is a category filled with over 400 English-language loan words derived from the Wade-Giles transliteration scheme, most of which are multi-syllable terms; after nearly twenty years of being ignored they cry out to you for your help. --Geographyinitiative (talk) 19:49, 20 May 2022 (UTC)

Does this change work correctly? You can test it by previewing with {{zh-pron/sandbox}} instead of {{zh-pron}} on Chinese entries. 98.170.164.88 19:56, 20 May 2022 (UTC)[reply]

God bless you IP. If this can be implemented on the mainspace, please do it. I think technically there needs to be a "dash" between the syllables, but if it can't be done, a space is okay; this is still an important step forward on this issue. Love you 98 IP. --Geographyinitiative (talk) 21:01, 20 May 2022 (UTC)

I've added the code created by 98 IP into the real deal. It's not perfect yet, but neither is this dictionary website which for twenty years just ignored Wade-Giles for multi-syllable Chinese character entries.
7 “Ask and it will be given to you; seek and you will find; knock and the door will be opened to you. 8 For everyone who asks receives; the one who seeks finds; and to the one who knocks, the door will be opened.
--Geographyinitiative (talk) 22:12, 20 May 2022 (UTC)[reply]

User:Geographyinitiative: what about this? [2] 98.170.164.88 00:48, 21 May 2022 (UTC)[reply]

Hey 98.170.164.88, this really works and accomplishes the exact purpose I was intending. Thanks. I have one final problem though: now, on the single-syllable Chinese character entries like 喉, there are TWO Wade-Giles spots under zh-pron. Can you help me delete that duplicated one? Thanks for your work. --Geographyinitiative (talk) 13:32, 21 May 2022 (UTC)[reply]

@Geographyinitiative You can remove the entire if block at Module:cmn-pron#L-1167 to 1171 to prevent this redundant behavior.

By the way, you can move the code that is currently on lines 1186–1192 to wherever you feel is appropriate in the ordering of romanizations. Currently it is before sinological IPA, but I just put it there arbitrarily. 98.170.164.88 15:32, 21 May 2022 (UTC)[reply]

Account deletion

Can I please have my account deleted and my edits reattributed. – Ilovemydoodle (talk) 03:54, 22 May 2022 (UTC)[reply]

@Ilovemydoodle: Is there some reason why you need your account deleted? You can just abandon the account if you don't want it anymore. - TheDaveRoss 16:49, 24 May 2022 (UTC)[reply]

Also, what do you mean by “reattributed”? To whom should the credit or blame for your edits be given? --Lambiam

Disallowing page creations as well as edits with abuse filters

There's a certain very persistent Greek IP editor who is convinced that their allegedly superior knowledge of advanced physics and philosophy makes their version of English much better and more important than that of the mere mortals that actually speak the language. After years of getting their protologisms deleted and cleaning up their incomprehensible definitions in entries, I finally came up with an abuse filter(#128) that prevents any of their IP ranges from editing any entries or entry talk pages that aren't Greek. The last part is to allow some functionality to innocent third parties who have the misfortune of using the same IP ranges.

So far, this has worked quite well. They do have a tendency to follow the link to the Grease pit in the abuse filter message and post explanations in their usual unreadable private language, but those are so out of place that they're easy to spot and revert.

Recently, though, they seem to have discovered a loophole: the filter won't stop them from creating and saving the entry or talk page the first time, even if it stops them from editing it once it exists. I'm not sure if it's because something in the variables I check isn't available for page creations, or there's just an error in my code. I've taught myself abuse-filter syntax by browsing the manuals and trial-and-error, so I certainly could have missed something.

The first page they created (that I know of) is physicsism, with the definition "Overestimation of the descriptive ability of a future and ideal physics; The belief that physics is evolvable into a general descriptor." @Surjection recognized this for the quasi-gibberish it was and replaced that with {{rfdef}}. I'm sure the IP has a good idea of what they think the word means, but A) there's no guarantee that it matches what anyone else means when they use the word, and B) they're unable to explain it so that anyone else can understand it. I have yet to see any but the most trivial of their edits that was an improvement. I would appreciate it if anyone with access to the abuse filter would fix it or let me know how to fix it myself. Thanks! Chuck Entz (talk) 21:49, 22 May 2022 (UTC)[reply]

The filter is working perfectly fine and is catching page creations as well. The reason it didn't catch that particular edit is entirely different, and I have already updated the filter to address it. — SURJECTION ^{/ T / C / L /} 05:37, 23 May 2022 (UTC)[reply]

Template:Han compound Issue

At 廣／广 (guǎng), we see: "+ phonetic 黃 (OC *ɡʷaːŋ)" in the Glyph Origin section. The exact same content should appear at the exact same spot on the 潢 (huáng) entry, but instead we see: "+ phonetic 黄 ()". I assume this must be a tech issue so I send it to you all to look at. --Geographyinitiative (talk) 15:53, 24 May 2022 (UTC)[reply]

Wiktionary:Statistics

It hasn't been updated since the April wiki dump. @Ungoliant MMDCCLXIV Could you update it? — Fenakhay ^{(حيطي · مساهماتي)} 15:55, 24 May 2022 (UTC)[reply]

Janus page unviewable – reported to be a phishing site

Please check this out: Wiktionary:Tea room/2022/May § Janus. --Lambiam 16:49, 24 May 2022 (UTC)[reply]

Old Occitan link normalization

On laüt#Occitan, the link to Old Occitan takes you to laut (no diaeresis) when it should go to laüt#Old Occitan. Or maybe the entry should be moved to laut? 70.172.194.25 00:04, 25 May 2022 (UTC)[reply]

`{{quote-av}}` transcript URL

IMO, it would be great if {{quote-av}} supported up to two URL parameters, one to view the audiovisual content and another to see a transcript. When only one of these options is available, of course, you could just supply that one. For comparison, the English Wikipedia's equivalent, Template:Cite AV media, has a transcripturl parameter. I think this would improve accessibility and searchability. Sometimes transcripts are not identical to what actually gets said, but often they're close enough. 70.172.194.25 04:14, 25 May 2022 (UTC)[reply]

Transliteration Systems in Etymologies

@Inqilābī, Justinrleung, Theknightwho & all: I would like to add two transliteration systems to Template:borrowed which would "fall under" Mandarin (cmn): one called "wg" (Wade-Giles) and one called "hp" (Hanyu Pinyin). See the last three posts in Talk:Kuomintang for discussion of this issue. See the first half of the Etymology section of 'Xizhi' for a potential example of what this might look like if implemented: "From the Hanyu Pinyin romanization of Mandarin […] ". On the Xizhi page, you would hypothetically write "From the {{bor|en|hp|-}}" and produce that text (or similar), and all the attendant categorization, etc that cmn would normally produce. ***Note: this issue can become extremely complex- I want to keep it narrowly focused right on the request in the first sentence so something can actually get DONE rather than endless debate. Please don't discuss new categories, different transliteration schemes, etc. yet.*** Thanks for any help here! --Geographyinitiative (talk) 18:59, 25 May 2022 (UTC) (modified)[reply]

This is a very reasonable idea. These could be set up as etymology-only languages; maybe cmn-hp and cmn-wg would be more appropriate codes. The existing Lua infrastructure for language codes would need some extensions, such as allowing for multiple Wikipedia links in the language name, but that is probably easy work for a Lua expert.

Another option, with slightly different output, would be to add a parameter to {{bor+}}, like {{bor+|en|cmn|-|rom=hanyu}} = "Borrowed from the Hanyu Pinyin romanization of Mandarin".

Still a third option would be to have separate templates which also integrate the functionality of {{zh-l}}, like {{bor-cmn-hanyu|en|汐止|tr=Xìzhǐ}}. = "From the Hanyu Pinyin romanization of Mandarin 汐止 (Xìzhǐ)." This seems like the most flexible option, and there is precedent in {{zh-l}}, the Chinese-specific variant of {{l}}. However, it doesn't scale very well if this approach gets expanded to more languages and romanization systems. This, that and the other (talk) 04:41, 26 May 2022 (UTC)[reply]

I'm supportive of this idea, and I think that etymology-only languages are the best way to refer to these. I would suggest that we slightly modify the ISO standard (for consistency going forward) and use cmn-pny for Pinyin, but with Wade-Giles I agree that cmn-wg is the best option. The main reasons I'd oppose your other two suggestions are:

Added complexity, which becomes relevant when you may have editors adding etymologies for more distant descendants who may not be familiar with Chinese-specific templates (e.g. a German term borrowed from English, which was itself a Wade-Giles Romanisation). Etymology-only languages are a system that editors are already familiar with.
I don't envision a Romanisation applying to more than one parent language (e.g. you aren't going to have Pinyin of any language other than Mandarin), so there doesn't seem to be a reason to allow the Romanisation to be specified in a separate field from the parent language. It's just a "flavour" of that language, in the same way Medieval Latin is a "flavour" of Latin.
Incorporating the functionality of {{zh-l}} is a good idea, but I think that's a wider point that we should be getting the etymology templates to do in general; let's not compound the divergence, but deal with that issue properly (and separately).

Theknightwho (talk) 21:54, 26 May 2022 (UTC)[reply]

Very reasonable points. Some code would need to be added to Module:etymology languages, which would need additional code added by someone with appropriate powers. I would suggest, though, that co-opting pny in the code cmn-pny isn't the greatest idea, since pny is the code for an African language called Pinyin that has nothing to do with Chinese, and as Geographyinitiative reminds us from time to time, other Pinyin systems exist besides Hanyu Pinyin (see Tongyong Pinyin). This, that and the other (talk) 02:57, 27 May 2022 (UTC)[reply]

My mistake! Let's use cmn-hp then. Theknightwho (talk) 04:00, 27 May 2022 (UTC)[reply]

Why not use the BCP 47 variant tags? As far as I can tell, admissible codes would be cmn-pinyin and cmn-wadegile, though there might be a pedantic argument that one SHOULD use zh-cmn-pinyin and zh-cmn-wadegile. I'll ask about that tonight on the BCP 47 forum. --RichardW57m (talk) 12:53, 27 May 2022 (UTC)[reply]

@RichardW57m I think one of the main issues is the point that Pinyin isn't limited to Hanyu Pinyin, which is the system we usually associate with the term. Theknightwho (talk) 14:13, 27 May 2022 (UTC)[reply]

IANA has assigned pinyin to Hanyu Pinyin and tongyong to Tongyong Pinyin, so there is justification behind the use of those variant tags. This, that and the other (talk) 14:29, 27 May 2022 (UTC)[reply]

Fourth option: modifying {{transliteration}} to have a "system" parameter? —Fish bowl (talk) 22:02, 26 May 2022 (UTC)[reply]

Is that not just option 2? Theknightwho (talk) 22:06, 26 May 2022 (UTC)[reply]

Geographyinitiative has argued that we should avoid using the word "transliteration" to refer to romanizations of Chinese. See Talk:Kongmoon. So implementing it in {{bor}} may be less controversial. 70.172.194.25 01:03, 27 May 2022 (UTC)[reply]

Lua memory usage on Han character pages

Currently, CAT:E is full of Han character pages that are exceeding Lua memory limits. Does anyone know which modules might be the culprits? 70.172.194.25 17:45, 26 May 2022 (UTC)[reply]

The CJKV modules in general are a bit of a mess and could probably be optimized a lot in terms of memory usage. — SURJECTION ^{/ T / C / L /} 20:13, 26 May 2022 (UTC)[reply]

Why is Supratiṣṭhitacāritra auto-categorised in Category:Long English words?

The category is intended for "English words that are 25 letters long or more". Equinox ◑ 03:28, 29 May 2022 (UTC)[reply]

Because the code that populates this category counts bytes, not Unicode characters. In Lua, ("Supratiṣṭhitacāritra"):len(), or equivalently string.len("Supratiṣṭhitacāritra"), evaluates to 25. On the other hand, mw.ustring.len("Supratiṣṭhitacāritra") evaluates to the expected 20. The extra bytes come from the diacritics. 70.172.194.25 03:43, 29 May 2022 (UTC)[reply]
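The same byte-versus-character difference can be reproduced outside Scribunto. The Python sketch below is only an analogy: `byte_length` and `char_length` are illustrative names standing in for what string.len and mw.ustring.len report on UTF-8 text.

```python
def byte_length(text):
    """Length in UTF-8 bytes, which is what a byte-oriented length reports."""
    return len(text.encode("utf-8"))


def char_length(text):
    """Length in Unicode code points, analogous to a Unicode-aware length."""
    return len(text)


# Escapes keep the precomposed letters unambiguous:
# U+1E63 (s with dot below), U+1E6D (t with dot below), U+0101 (a with macron).
word = "Suprati\u1e63\u1e6dhitac\u0101ritra"

print(byte_length(word))  # 25: 17 ASCII bytes + 3 + 3 + 2
print(char_length(word))  # 20
```

The three letters with diacritics take 3, 3 and 2 bytes in UTF-8, which accounts for the five extra bytes.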

That is a bug then. Equinox ◑ 03:57, 29 May 2022 (UTC)[reply]

@Equinox: I have done 70.172.194.25’s fix. J3133 (talk) 08:01, 29 May 2022 (UTC)[reply]

On a wider note, we should never be using the string functions in Lua - they should always be mw.ustring. Theknightwho (talk) 17:57, 30 May 2022 (UTC)

Script for Pali in Eastern Nagari Script

If anyone rushes in to make changes, please note that there are related changes to be done listed in Tweaking Eastern Nagari Script Definitions.

We currently have two versions of the Eastern Nagari script - Bengali (code 'Beng') which uses U+09B0 র RA for 'r' and Assamese (code 'as-Beng') which uses U+09F0 ৰ RA WITH MIDDLE DIAGONAL for 'r'. Template {{sa-sc}} uses the difference to determine whether a Sanskrit word is in the Bengali script or the Assamese script. Inconveniently, Pali in the Eastern Nagari script nowadays uses both letters - the first for 'r' and the second for 'v'. (See Template_talk:pi-alt for the elucidation of evidence.) Pali is currently declared to use the Bengali script as its Eastern Nagari script.

Unfortunately, this prevents the script detection and thus automatic transliteration of the indeclinable particle ৰ (va). This appears to be the only word affected. What is the proper solution? Is it to manually specify the script and transliterations, including replacing {{pi-particle}} in the entry with {{head|pi|particle}}, or should I create a third Eastern Nagari script for Pali? Today I modified the page for the particle to work around the problem. --14:12, 29 May 2022 (UTC) — This unsigned comment was added by RichardW57 (talk • contribs) at 14:12, 29 May 2022.
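To make the conflict concrete, here is a small Python sketch of a detection rule of the kind described above. It is an assumption for illustration only: the function name and the exact rule are hypothetical and are not taken from {{sa-sc}} or any Wiktionary module.

```python
RA = "\u09b0"                  # used for 'r' in Bengali and in Pali
RA_MIDDLE_DIAGONAL = "\u09f0"  # used for 'r' in Assamese and for 'v' in Pali


def detect_script(text):
    """Guess a script code from the letters present (assumed, simplified rule)."""
    if RA_MIDDLE_DIAGONAL in text:
        return "as-Beng"
    # Any other character from the Bengali Unicode block counts as Bengali.
    if any("\u0980" <= ch <= "\u09ff" for ch in text):
        return "Beng"
    return None


print(detect_script(RA))                  # Beng
print(detect_script(RA_MIDDLE_DIAGONAL))  # as-Beng, although Pali is declared to use Beng
```

Under a rule like this, the Pali particle written with U+09F0 alone is guessed to be in the Assamese script, which is why it needs either a manual override or a script definition that includes both letters.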

If this issue only affects one word, then it is not unreasonable to handle it using manual overrides on that one entry. If other entries with this character are being misclassified, then you could consider removing this detection rule from the module that detects scripts and requiring the script to be manually specified in such cases. I'm not sure what you mean by creating a third Eastern Nagari script for Pali; would it only include these exceptional words? 70.172.194.25 15:06, 29 May 2022 (UTC)[reply]

No, the third script would be pi-Beng, would (probably) only be used for Pali, and would include both the letters above as well as the present repertoire of Beng. It might be possible to remove some unused characters, but it is unlikely to be worth the effort, which could backfire. I will leave a note at Module:pi-headword to say that it should be modified to handle ৰ; at the moment I am simply bypassing it and going direct to Module:headword. --RichardW57m (talk) 12:26, 30 May 2022 (UTC)[reply]

RQ:Byron Childe Harold

On its face, RQ:Byron Childe Harold makes Lord Byron out to be a contemporary of the Han dynasty by saying his works were written in the year 181. --Geographyinitiative (talk) 00:04, 31 May 2022 (UTC)

Getting the HTML of the Flexion namespace in German Wiktionary

I am trying to parse the German Wiktionary. I have been using the HTML dumps and they have been working great, however for some pages the inflections are not on the page itself, but instead on a subpage in the form of https://de.wiktionary.org/wiki/Flexion:spole%C4%8Dn%C3%BD . And these pages are not included in the HTML dumps, unfortunately. Does anyone have any idea what the best way to solve the problem would be? One could use the XML dump and then some technique similar to this project to turn the XML into HTML, however this would be very difficult to implement. After thinking about it, the simplest technique would be to simply scrape all the desired pages' HTML. Pretty ugly, but would probably work. Does anyone have a better idea? --MrBeef12 (talk) 11:36, 31 May 2022 (UTC)[reply]
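If scraping turns out to be necessary, the subpage URLs can at least be built mechanically from the lemma. The Python sketch below only constructs the URL and makes no request; `flexion_url` is a hypothetical helper, and it assumes every lemma's inflection page follows the Flexion:<lemma> pattern shown above.

```python
from urllib.parse import quote

BASE = "https://de.wiktionary.org/wiki/"


def flexion_url(lemma):
    """Build the Flexion subpage URL for a lemma, percent-encoding non-ASCII letters."""
    title = "Flexion:" + lemma.replace(" ", "_")
    # Keep ":" and "/" literal so the namespace prefix stays readable.
    return BASE + quote(title, safe=":/")


print(flexion_url("spole\u010dn\u00fd"))
```

For the lemma in the example above this yields the same percent-encoded address as the link quoted in the post.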

I think that the best solution is to get the Wikimedia developers to include the Flexion namespace in the German Wiktionary's HTML dump. Using the XML dump involves extra steps because you have to process the templates in the Flexion pages to get the inflected forms. jberkel posted a Phabricator task for including the Appendix, Thesaurus, Reconstruction, and Citations namespaces in the English Wiktionary HTML dump, and maybe other Wiktionaries' namespaces that include entries or dictionary-related information could be mentioned in the same task. I'm not sure how soon that will be addressed. I'm surprised that addressing the English Wiktionary task wasn't as simple as just adding more namespaces to a list, but I don't really know how the HTML dumping works. — Eru·tuon 18:57, 31 May 2022 (UTC)

Thank you, this is probably the best option. Let's see when they become available. MrBeef12 (talk) 11:16, 2 June 2022 (UTC)

If you want, leave a comment on the Phabricator ticket indicating your interest in this; maybe that helps. – Jberkel 11:31, 2 June 2022 (UTC)

Good idea, I did so. Let's hope it'll get fixed. MrBeef12 (talk) 12:22, 3 June 2022 (UTC)

Yes, I wouldn't recommend using HTML dumps at the moment; they are incomplete and unreliable (aka enterprisey). On the other hand, parsing wiki markup with anything other than MediaWiki is usually doomed to fail at some point. But it depends on what kind of data you want to extract. – Jberkel 19:21, 31 May 2022 (UTC)
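
For anyone who ends up with the scraping fallback mentioned above, here is a minimal Python sketch. It assumes the standard MediaWiki `action=parse` API on de.wiktionary.org is an acceptable way to get the rendered HTML of a Flexion page. The function names, the User-Agent string, and the choice to fetch one page at a time are illustrative assumptions, not anything confirmed in this thread.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Standard MediaWiki API endpoint for the German Wiktionary.
API_URL = "https://de.wiktionary.org/w/api.php"


def flexion_title(lemma: str) -> str:
    """Return the title of the Flexion subpage for a lemma (hypothetical helper)."""
    return f"Flexion:{lemma}"


def page_url(lemma: str) -> str:
    """Build the human-readable page URL, percent-encoding non-ASCII letters."""
    return "https://de.wiktionary.org/wiki/" + quote(flexion_title(lemma), safe=":")


def parse_api_url(lemma: str) -> str:
    """Build an action=parse request URL that asks for the rendered HTML."""
    params = {
        "action": "parse",
        "page": flexion_title(lemma),
        "prop": "text",
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urlencode(params)


def fetch_flexion_html(lemma: str) -> str:
    """Fetch the rendered HTML of one Flexion page (needs network access)."""
    # The User-Agent value is a placeholder; use one that identifies your tool.
    request = Request(parse_api_url(lemma), headers={"User-Agent": "flexion-sketch/0.1"})
    with urlopen(request, timeout=30) as response:
        data = json.load(response)
    text = data["parse"]["text"]
    # Older response formats wrap the HTML in a dict under the "*" key.
    return text if isinstance(text, str) else text["*"]
```

Calling `fetch_flexion_html("společný")` should return the HTML body of the example page from the question. A real crawler would also want rate limiting and error handling for missing pages, which this sketch leaves out.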
