User talk:Erutuon/2019

Flag of Portuguese

Hello. I see you are an administrator who deals with flags so maybe you can help me. We currently use only the flag of Portugal to represent Portuguese and here it was requested twice that it be replaced by this flag, which represents Brazil as well (consider that Brazil has twenty times more Portuguese speakers than Portugal). Both requests, the first in May 2016 by myself and the other one in June 2018, have been largely ignored. I'm here to request that change for the third time. - Alumnum (talk) 22:50, 3 January 2019 (UTC)

Done, since nobody has objected to it. — Eru·tuon 23:04, 3 January 2019 (UTC)

Thanks! - Alumnum (talk) 23:07, 3 January 2019 (UTC)

Change to MediaWiki:Common.js

This is about this change. IE9 and older browsers get Grade C support, which means our JS does not even get to run on them. More info. Giorgi Eufshi (talk) 06:37, 4 January 2019 (UTC)

@Giorgi Eufshi: Thank you! I was trying to find that information, but didn't succeed. That makes things easier. I'll remove stuff relating to unsupported versions of IE. — Eru·tuon 06:52, 4 January 2019 (UTC)

do ... end?

I noticed you added a block of code to Module:nyms that begins with do and ends with end, but it doesn't seem to loop at all. Is this some Lua construct I'm not aware of? There's nothing on https://www.mediawiki.org/wiki/Extension:Scribunto/Lua_reference_manual about it. —Rua (mew) 21:33, 6 January 2019 (UTC)

It's a simple block. It's mentioned briefly under mw:Extension:Scribunto/Lua reference manual § Statements. The only effect it has is to make the local variable thesaurus_links inaccessible below where it's actually used. I'm not sure I've ever seen it on Wiktionary before. — Eru·tuon 21:50, 6 January 2019 (UTC)

problem with `{{der3}}`

At [[rock]] Derived terms the control that expands the list does not appear. Clicking the place it should appear does expand the list. DCDuring (talk) 19:36, 7 January 2019 (UTC)

@DCDuring: In all of the Derived terms sections that have enough terms in them, the control appears for me and works. The one for "an act of rocking" doesn't have a control because it doesn't have enough terms. Are there any other show–hide things that aren't working, like the translations boxes? — Eru·tuon 20:09, 7 January 2019 (UTC)

Others on the same page appear. I wonder if it is some kind of interference from the rhs table of contents or other stuff appearing on the right side. DCDuring (talk) 20:18, 7 January 2019 (UTC)

@DCDuring: Which one are you talking about? They all appear for me except Etym 2 noun which doesn't have enough entries to activate it (set for 4 columns). Not a system I am fond of. DonnanZ (talk) 20:30, 7 January 2019 (UTC)

The first is the one that doesn't appear for me. The Translation controls appear. DCDuring (talk) 20:38, 7 January 2019 (UTC)

I'm doubtful that anything else on the page would interfere with it. On my screen, the image on the right pushes the derived terms over, but the control still appears and works. If you're in mobile mode, the control will not appear. It sounds to me like the JavaScript for the list switcher is not working. I am curious if your browser console has any error messages indicating that this is the case. If you have Firefox or Chrome you can press F12 and select the Console tab to view it. There are usually a lot of annoying warning messages, but maybe the only thing relevant would be a message containing "Error" (or "TypeError", or some variation on that). — Eru·tuon 20:48, 7 January 2019 (UTC)

It's OK for me. Earlier I had to change {{hyp4}} to {{hyp3}} at developed because of odd behaviour; it doesn't need four columns anyway. DonnanZ (talk) 21:17, 7 January 2019 (UTC)

The problem also arises in Chrome. I found lots of messages, but none showing error. The control to reduce the list does appear, of course well after the right-hand side table of contents. I wonder whether anyone has looked at that gadget in the last few years. DCDuring (talk) 01:49, 8 January 2019 (UTC)

Disabling the rhs ToC gadget gave me the control. But I'd rather have the rhs ToC than the control or that configuration. I am also loath to defeat it with JS as we have often had long periods where the JS functionally lagged dreadfully. I've stripped a lot of optional JS away. DCDuring (talk) 02:01, 8 January 2019 (UTC)

We are talking about {{der3}}, right? That's one of the templates that I redid recently and that the discussion and vote in November's Beer Parlour was about. I don't have the RHS ToC so I will have to enable that to see if I can reproduce what you're talking about. — Eru·tuon 02:07, 8 January 2019 (UTC)

Whew, I enabled the RHS ToC and it is quite horrifying what it does to the control for the derived terms list: pushes it all the way down into the next set of definitions. It's some weird interaction between all the images on the right and the HTML elements in the derived terms list. I'll look into it. — Eru·tuon 02:14, 8 January 2019 (UTC)

Okay, it was a CSS issue and I think it will be resolved whenever your browser manages to get ahold of the most recent version of MediaWiki:Common.css. — Eru·tuon 02:37, 8 January 2019 (UTC)

Does the rhs ToC gadget have dated, deprecated elements? DCDuring (talk) 03:06, 8 January 2019 (UTC)

Thanks. It works. DCDuring (talk) 03:09, 8 January 2019 (UTC)

Thank you for pointing out the bug. I've noticed lagginess too. I'm working on making more of the default JavaScript into gadgets loaded by default, which could help if the problem is download time. — Eru·tuon 02:21, 9 January 2019 (UTC)

The RHS ToC gadget just involves a very short CSS file, so no HTML elements are changed. The issue was a CSS property that was applied to the control for the derived terms list (float: right;), which was putting the control into the stack of elements floating on the right side of the page, below the TOC and most of the images. Removing that property fixed the problem. — Eru·tuon 03:17, 8 January 2019 (UTC)

(float: right;) could be an issue with {{WOTD}} causing problems at the top of the ~~page~~ entry. That code is definitely in there. DonnanZ (talk) 10:42, 8 January 2019 (UTC)

This is what I mean, see telemark, which hasn't been modified. There is a workaround which fixes the gap (which I see, maybe you don't) where images and Wikipedia links are placed under the first header, which would be Etymology here. DonnanZ (talk) 00:09, 11 January 2019 (UTC)

@Donnanz: Where are you seeing a gap? With ToC on the left side, I see the WOTD link under the header, with a reasonably sized gap that is simply due to the bottom padding of the header above it; with ToC on the right side, the WOTD link is pushed down by the ToC but there isn't a significant gap between it and the ToC. — Eru·tuon 00:29, 11 January 2019 (UTC)

I have the TOC on the left, with a large gap alongside the image, with Etymology pushed down to the line below the image. It could be my browser doing it, I'm on Windows 10. Hiding the TOC makes no difference. I have discussed this with Sgconlaw before, he now modifies current WOTDs. DonnanZ (talk) 00:42, 11 January 2019 (UTC)

@Donnanz: I'm using Firefox on Linux Mint. I can get the same effect by adding the CSS properties clear: right; or clear: both; to the "Etymology" header. Maybe there is a gadget that adds that CSS property to the header. — Eru·tuon 01:01, 11 January 2019 (UTC)

I haven't got a clue. I experimented with placing the image on the left, which gets rid of the gap for me, but it had some odd effects, with bullets and numbers showing through the image, so I didn't save it that way. Images default to the right, unless they are modified; and so do {{wp}}, {{swp}}, {{wikipedia}}, so maybe {{WOTD}} should too, and not use (float: right;). I don't know, I'm not a programmer. DonnanZ (talk) 10:32, 11 January 2019 (UTC)

I still use IE, as I have all my favourites stored there, but I thought I would try Edge and Orange. No gap on Orange, everything as it should be, but Edge is the same as IE, a massive gap. So I guess it's a Microsoft problem. DonnanZ (talk) 13:05, 11 January 2019 (UTC)

Why did I say Orange? I meant Chrome. Oops. DonnanZ (talk) 16:20, 26 January 2019 (UTC)

@Sgconlaw, I think you should know we're discussing this. DonnanZ (talk) 15:14, 11 January 2019 (UTC)

@Donnanz: Is there anything specific you'd like my input on? — SGconlaw (talk) 16:26, 11 January 2019 (UTC)

@Sgconlaw: Not that I can think of at the moment, I was just drawing your attention. I know many editors use Firefox or Orange rather than Microsoft products, but countless other users (passive or otherwise) may use Edge or IE, so we still have to cater for them. DonnanZ (talk) 17:15, 11 January 2019 (UTC)

@Donnanz: At some point I should go into Windows and see if I can reproduce the problem and figure out a solution. — Eru·tuon 07:34, 15 January 2019 (UTC)

OK, I was tempted to modify telemark, but I will leave it as it is for now. DonnanZ (talk) 09:39, 15 January 2019 (UTC)

Okay, I'm in Windows and the problem in telemark is the clear: right; CSS property on the HTML element that encloses the image (<div class="thumb tright">...</div>). For some reason, Microsoft Edge thinks that the property means that the etymology heading has to be below the image, but Firefox and Chrome don't. Just removing the property isn't desirable; then the image appears to the left of the "WOTD" text. — Eru·tuon 18:39, 17 January 2019 (UTC)

I take it that means there's nothing you can do. Ironically, IE seems to have "packed up", so I've been using Chrome (in preference to Edge) for the last two days. DonnanZ (talk) 16:28, 26 January 2019 (UTC)

Well, I did some web searches and didn't find any references to the issue or any solutions. I don't feel very motivated to do more searching, but I might go back to it. — Eru·tuon 01:01, 27 January 2019 (UTC)

-ύς epic declension

Hi! Could you please have a look here? Thank you very much, --Epìdosis (talk) 10:20, 9 January 2019 (UTC)

Issue with "Template:WOTD" and audio files

Wonder if you can see if I did something wrong. I updated {{WOTD}} so that it would recognize audio files in the format "File:En-au-[entry].ogg" which Commander Keane has been diligently uploading and inserting into entries. However, it works for some entries and not others. For example, if you look at the January 2019 WOTDs at "Wiktionary:Word of the day/Archive/2019/January", the audio file of emu appears but that of Tiggerish doesn't. I tried resetting the transcode of File:En-au-Tiggerish.ogg but that didn't make a difference. Any idea what might be going wrong? Thanks. — SGconlaw (talk) 06:47, 15 January 2019 (UTC)

@Sgconlaw: Heh, I spent a lot of time looking at the template code and seeing no problems, and then finally edited the section and the audio showed up, so I pressed "refresh" in the upper right hand side of the WOTD box to make the audio show up on the page. It was apparently a caching issue. — Eru·tuon 07:22, 15 January 2019 (UTC)

Ohhh. I tried refreshing the entry page and the WOTD archive page, but didn't think it was an issue of the template page needing to be refreshed as well. Thanks for discovering that! — SGconlaw (talk) 07:27, 15 January 2019 (UTC)

I mean, I clicked one of the "refresh" buttons in a WOTD box in Wiktionary:Word of the day/Archive/2019/January, not in Template:WOTD. That purges the page (action=purge in the URL), different from reloading the browser. — Eru·tuon 07:32, 15 January 2019 (UTC)

Ah, I see. — SGconlaw (talk) 07:41, 15 January 2019 (UTC)

Whatever you did seems to have sped up the loading of audio on a page; I had noticed recently there was a delay where you had to wait for it to catch up before the page or entry could be edited. DonnanZ (talk) 11:02, 15 January 2019 (UTC)

That's weird. — SGconlaw (talk) 11:06, 15 January 2019 (UTC)
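
The difference between reloading in the browser and purging, mentioned in this thread, can be made concrete with a small sketch. It assumes the standard MediaWiki `index.php` entry point; the base URL for English Wiktionary and the helper name `purge_url` are assumptions made for illustration only.

```python
from urllib.parse import urlencode


def purge_url(title, base="https://en.wiktionary.org/w/index.php"):
    """Build a URL that asks MediaWiki to purge the cached rendering of a page.

    The base URL is an assumption; other wikis use a different host or path.
    """
    # A purge link is the ordinary page URL with action=purge added.
    return base + "?" + urlencode({"title": title, "action": "purge"})


print(purge_url("Wiktionary:Word of the day/Archive/2019/January"))
```

Requesting such a URL asks the server to re-render the page instead of serving its cached copy, which a plain browser reload does not do. Depending on the wiki's configuration, MediaWiki may ask for confirmation before purging.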

What have you done?

I know you're trying to clean up my (admittedly) poorly constructed category, but you do realize that 糹 is not a triplication? Johnny Shiz (talk) 16:19, 9 February 2019 (UTC)

@Johnny Shiz: Oops! That should be fixed with this edit. — Eru·tuon 20:46, 9 February 2019 (UTC)

rookie's question

Eru, you don't have to answer this... But if you ever have time: I'm trying to understand Lua (at my age, impossible), which is needed at el.wiktionary, because the last person who could handle it disappeared last year. I know that neither is correct, but which one is worse, the 1st or the 2nd? sarri.greek (talk) 23:56, 13 February 2019 (UTC)

@Sarri.greek: Neither of them will work, but I added a version that probably will. — Eru·tuon 01:22, 14 February 2019 (UTC)

Αχ @Erutuon:, thank you, ευχαριστώ! I'll study it, I promise. I am indebted to you. sarri.greek (talk) 01:43, 14 February 2019 (UTC)

I remember when I was just starting to learn Lua. It was pretty hard and I made a lot of mistakes. It was just about my first programming language. These cheat sheets might be helpful: 1, 2. — Eru·tuon 02:21, 14 February 2019 (UTC)

Another idea is to try playing around with a REPL, where you can type in code and see the result. There is a console below the editing area in module pages, or you could try the console on the Fengari website (yes, they named it after the Greek word for moon!) which uses a more recent version of Lua and doesn't have MediaWiki-related stuff. — Eru·tuon 02:40, 14 February 2019 (UTC)

You are so sweet, and a genius!! I did it! el:Πρότυπο:sarritest and el:Module:sarritest (I only changed 'local' at function). And I will study your links too. I'll try to keep most of the things in simple templates. If something happens to me, there will be no one to continue or correct things. sarri.greek (talk) 05:04, 14 February 2019 (UTC)

Just informing that the Saudi IP has an agenda for removing computing-related senses

Regarding the IP’s removal of the quote on غَزَا (ḡazā): this follows a long line of removals of anything implying usage of Arabic words for computing; see the history of قُرْصَان (qurṣān), English hacker, خ ر ق (ḵ-r-q), and others I cannot name off the cuff. The removal of such references may also be the only motivation for the layout changes; this IP appears to frequently camouflage removals with changes of dubious worth in other respects. Informing also @Chuck Entz, Surjection who have previously tackled this IP. Fay Freak (talk) 23:30, 23 February 2019 (UTC)

@Fay Freak: Ahh, thanks for the information. That makes sense of what the user was doing. — Eru·tuon 23:34, 23 February 2019 (UTC)

I don't think I have ever seen this editor communicate, but I have seen them edit-war in the past. They simply undo any edits that counteract theirs. — sur jec tion ⟨?⟩ 23:44, 23 February 2019 (UTC)

proper way to clone a table

Why doesn't mw.clone() work on loadData'd tables? What is the error? What is the proper way to clone a table? Maybe table.shallowClone() and/or table.deepcopy()? Module:parameters should *DEFINITELY* not be side-effecting the params table passed into it; that's bad juju and can lead to all sorts of subtle and hard-to-debug errors. Benwing2 (talk) 01:36, 1 March 2019 (UTC)

@Benwing2: deepcopy from Module:table is intended to copy tables loaded with mw.loadData, but when I tried plugging it into your edit, there was a stack overflow. Not sure how that happened. — Eru·tuon 01:40, 1 March 2019 (UTC)

The problem with mw.clone is that it copies the metatable, and the metatable makes the copied table read-only, and prevents mw.clone from writing any keys to it. deepcopy allows the metatable not to be copied. — Eru·tuon 01:43, 1 March 2019 (UTC)

I see. Well, in this case, either deepcopy() without metatable copying or shallowClone() should work, as only the top level is being side-effected. Benwing2 (talk) 01:45, 1 March 2019 (UTC)
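
The failure mode described in this thread can be imitated outside Scribunto. What follows is a Python analogy only, not Lua and not the code of Module:table. `ReadOnlyDict` stands in for a table whose metatable forbids writes, `clone_keeping_type` mimics a clone that carries that protection over to the copy, and `deep_copy_plain` mimics a copy that drops it. All three names are hypothetical.

```python
class ReadOnlyDict(dict):
    """Stand-in for a read-only table: reading works, item assignment fails."""

    def __setitem__(self, key, value):
        raise TypeError("table is read-only")


def clone_keeping_type(table):
    """Create an empty object of the same read-only type, then try to fill it."""
    clone = type(table)()
    for key, value in table.items():
        clone[key] = value  # fails: the clone inherited the write protection
    return clone


def deep_copy_plain(value):
    """Copy nested dicts into ordinary dicts, dropping the write protection."""
    if isinstance(value, dict):
        return {key: deep_copy_plain(item) for key, item in value.items()}
    return value


# Hypothetical parameter spec, loosely modelled on a params table.
params = ReadOnlyDict({"lang": ReadOnlyDict({"required": True})})

try:
    clone_keeping_type(params)
except TypeError as error:
    print("clone failed:", error)

plain = deep_copy_plain(params)
plain["lang"]["required"] = False  # modifies the copy; the original is untouched
print(params["lang"]["required"], plain["lang"]["required"])
```

The point of the analogy is the last two lines: working on an unprotected copy lets a function adjust the table it was given without side-effecting the caller's data.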

jahvatama

I am not familiar with the particular standard used here, but I do not believe "alternative" is an accurate term for these terms. I have previously used "alternative form" for slightly different forms of the same word that are more or less equal in the standard language, like "kaitsema" and "kaitsma". In this case, these forms are not entirely equal in meaning. "Jauhatama" for instance is the Võro word, and would not be considered an "Estonian" word by most. While "jahvama" is listed in the ÕS as a dialectal term, I would still not consider it an "alternative form", but rather a dialectal synonym. If used, it carries a dialectal connotation which makes it different from "jahvatama". Strombones (talk) 09:55, 6 March 2019 (UTC)

@Strombones: "Alternative forms" is not meant to be very specific or descriptive; it's simply the header that's used (see WT:ALTER) for dialectal forms, as well as quite a few other things. For instance, see the Alternative forms sections of ἐγώ (egṓ) or ἠώς (ēṓs), which list dialectal forms. The words in the Alternative forms section should ideally be labeled with the name of the dialect or dialects that they belong to. Often the entry for these words will contain a definition line like "alternative form of x" or "{dialect name} form of x".

However, I think Võro is a special case; since it is considered a separate language here on Wiktionary (meaning, it uses the "Võro", not the "Estonian" header), a Võro word should probably be linked from some other part of the entry, like the etymology section (as a cognate), though I don't know what would be appropriate in this case. Usually the Alternative forms section only contains words that have or will have an entry with the same language header as the current entry. — Eru·tuon 10:11, 6 March 2019 (UTC)

Ah, thank you. I misconstrued the heading because I had only seen it being used one way. The thing with Võro terms is that sometimes they are used in standard Estonian for a dialectal "twist" of sorts, along with other non-Võro dialectal words. That's probably irrelevant here, so I think I'll remove the Võro words and just keep "jahvama". Strombones (talk) 14:06, 6 March 2019 (UTC)

grc-noun form

Frankly, I do not think {{grc-noun form}} is necessarily preferable to {{head|grc|noun form}}. --Dan Polansky (talk) 06:25, 23 March 2019 (UTC)

@Dan Polansky: It might be worth a discussion. As noted in the documentation for Module:grc-headword, the module does some things that {{head}} does not. — Eru·tuon 06:45, 23 March 2019 (UTC)

All right, then. You probably mean the following: "This module tracks the monophthongs α, ι, υ (a, i, u) without macrons, breves, circumflexes, or iota subscripts (◌̄, ◌̆, ◌͂, ◌ͅ) with the tracking template grc-headword/ambig, so that length can be marked as policy requires, and it categorizes all Ancient Greek words into categories for accent type, such as Ancient Greek oxytone terms." --Dan Polansky (talk) 06:49, 23 March 2019 (UTC)

Yeah. So converting non-lemma entries to use Module:grc-headword is mainly so that these services are provided for non-lemmas as well as lemmas. However, tracking ambiguous vowels and listing terms by accent could be done by analyzing the dump instead. — Eru·tuon 06:55, 23 March 2019 (UTC)
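
As a rough illustration of what "tracking ambiguous vowels" by analyzing a dump could involve, here is a minimal Python sketch. It is not the logic of Module:grc-headword. The function names are invented, and the diphthong test is a simplifying assumption that a real implementation would need to refine.

```python
import unicodedata

# Combining marks that settle vowel length: macron, breve,
# circumflex (perispomeni) and iota subscript (ypogegrammeni).
LENGTH_MARKS = {"\u0304", "\u0306", "\u0342", "\u0345"}
DIAERESIS = "\u0308"


def split_letters(word):
    """Return [base letter, set of combining marks] pairs for a word."""
    letters = []
    for char in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(char) and letters:
            letters[-1][1].add(char)
        else:
            letters.append([char, set()])
    return letters


def in_diphthong(letters, index):
    """Rough test (an assumption of this sketch): is this vowel part of a diphthong?"""
    base, marks = letters[index]
    # Second element: ι or υ without diaeresis, directly after another vowel.
    if base in ("ι", "υ") and DIAERESIS not in marks and index > 0:
        previous = letters[index - 1][0]
        if previous in "αεηοω" or (base == "ι" and previous == "υ"):
            return True
    # First element: α before ι or υ, or υ before ι (no diaeresis on the second).
    if index + 1 < len(letters):
        following, following_marks = letters[index + 1]
        if DIAERESIS not in following_marks:
            if (base == "α" and following in ("ι", "υ")) or (base == "υ" and following == "ι"):
                return True
    return False


def has_ambiguous_vowel(word):
    """True if some monophthong α, ι, υ carries no mark that shows its length."""
    letters = split_letters(word)
    for index, (base, marks) in enumerate(letters):
        if base in ("α", "ι", "υ") and not (marks & LENGTH_MARKS):
            if not in_diphthong(letters, index):
                return True
    return False


for word in ["καλός", "κᾱλός", "καί", "μᾶλλον"]:
    print(word, has_ambiguous_vowel(word))
```

Normalizing to NFD first means precomposed and decomposed spellings of the same word are treated alike, which matters when the input comes from arbitrary entry titles.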

dot= in form-of templates

Can you rerun your script checking for any of the following templates? Some of them don't end in 'of' (particularly the shortcut aliases). Thanks:

language_specific_alt_form_of_templates = [
  u"be-Taraškievica",
  "bg-pre-reform",
  "ceb-superseded spelling of",
  "egy-alt",
  "egy-alternative transliteration of",
  "en-ing form of",
  "fr-post-1990",
  "fr-pre-1990",
  #"ga-lenition of",
  "hy-reformed",
  "jbo-rafsi of",
  "morse code abbreviation",
  "morse code for",
  "morse code prosign",
  "my-ICT of",
  u"pt-superseded-paroxytone-éi",
  u"pt-superseded-paroxytone-ói",
  "ru-abbrev of",
  "ru-acronym of",
  "ru-clipping of",
  "ru-initialism of",
  "ru-pre-reform",
  "uk-pre-reform",
  "yi-alternatively pointed form of",
  "yi-phonetic spelling of",
  "yi-unpointed form of",
]

alt_form_of_templates = [
  "abbreviation of", "abb", "abbreviation", "ao",
  "acronym of",
  "archaic form of",
  "archaic spelling of",
  "aspirate mutation of",
  "clipping of", "clipped form of", "clip",
  "contraction of",
  "dated form of",
  "dated spelling of",
  "deliberate misspelling of",
  "eclipsis of", "eclipsed",
  "eggcorn of", "eggcorn",
  "elongated form of",
  "euphemistic form of",
  "euphemistic spelling of",
  "former name of",
  "hard mutation of",
  "informal form of",
  "informal spelling of",
  "initialism of", "io",
  "lenition of", "lenited",
  "misromanization of",
  "misspelling of", "common misspelling of", "misspell",
  "mixed mutation of",
  "mutation of",
  "nasal mutation of",
  "nomen sacrum form of",
  "nonstandard form of",
  "nonstandard spelling of",
  "obsolete form of",
  "official form of",
  "rare form of", "rareform",
  "rare spelling of", "rarespell", "rarspell",
  "short for", "short form of", "short of", "shortfor",
  "soft mutation of",
  "standard form of",
  "standard spelling of", "standspell",
  "superseded spelling of", "deprecated spelling of", "superseded form of",
  "uncommon form of",
  "uncommon spelling of",
]

language_specific_form_of_templates = [
  "ar-act-participle",
  "ar-adj-inf-def",
  "ar-noun-inf-cons",
  "ar-noun-pl-coll-cons",
  "ar-pass-participle",
  "ar-verb-form",
  "ar-verbal noun of",
  "bg-adjective extended of",
  "bg-adjective feminine definite of",
  "bg-adjective feminine indefinite of",
  "bg-adjective masculine definite object of",
  "bg-adjective masculine definite subject of",
  "bg-adjective neuter definite of",
  "bg-adjective neuter indefinite of",
  "bg-adjective plural definite of",
  "bg-adjective plural indefinite of",
  "bg-plural count of",
  "bg-singular definite object form of",
  "bg-singular definite subject form of",
  "blk-past of",
  "cs-imperfective form of",
  "cu-Glag spelling of",
  "da-pl-genitive",
  "de-du contraction",
  "de-inflected form of",
  "egy-verb form of",
  "el-comp-form-of",
  "el-form-of-verb",
  "el-super-form-of",
  "en-archaic second-person singular of",
  "en-comparative of",
  "en-irregular plural of",
  "en-past of",
  "en-simple past of",
  "en-superlative of",
  "fy-NPL",
  "fy-noun-entry-pl",
  "ga-emphatic of",
  "ga-lenition of",
  "hu-exaggerated of",
  "hy-traditional",
  "ia-form of",
  "ie-past and pp of",
  "ja-new/r",
  "ja-past of verb",
  "ja-romaji",
  "ja-romanization of",
  "ja-te form of verb",
  "ja-verb form of",
  u"kyūjitai spelling of",
  "la-comp-form",
  "la-part-form",
  "lb-inflected form of",
  "mn-verb form of",
  "pt-cardinal form of",
  "pt-pronoun-with-l",
  "pt-pronoun-with-n",
  "ro-adj-form of",
  u"ru-alt-ё",
  "ru-participle of",
  "sa-desiderative of",
  "sa-frequentative of",
  "sa-root form of",
  "sce-verb form of",
  "sco-past of",
  "sco-simple past of",
  "sga-verbnec of",
  "sino-vietnamese reading of",
  "ug-latin",
  "ug-uly of",
  "ug-uyy of",
  "yi-inflected form of",
  "za-sawndip form of",
]

form_of_templates = [
  "abessive plural of",
  "abessive singular of",
  "abstract noun of",
  "accusative of",
  "accusative plural of",
  "accusative singular of",
  "active participle of",
  "agent noun of",
  "alternative case form of", "alternative capitalisation of", "alternative capitalization of", "altcaps", "altcase",
  "alternative form of", "alternate form of", "alt form", "altform", "alt form of", "alt-form",
  "alternative plural of",
  "alternative reconstruction of",
  "alternative spelling of", "alternate spelling of", "altspelling", "altspell", "alt-sp", "alt spell of",
  "alternative typography of",
  "ancient form of",
  "aphetic form of",
  "apocopic form of",
  "associative plural of",
  "associative singular of",
  "attributive form of", "attributive of",
  "augmentative of",
  "broad form of",
  "causative of",
  "combining form of",
  "comitative plural of",
  "comitative singular of",
  "comparative of", "comparative form of",
  "comparative plural of",
  "comparative singular of",
  "dative dual of",
  "dative of",
  "dative plural definite of",
  "dative plural indefinite of",
  "dative plural of",
  "dative singular of",
  "definite of",
  "distributive plural of",
  "distributive singular of",
  "dual of",
  "e-form of", "definite and plural of",
  "early form of",
  "elative of",
  "ellipsis of", "anapodoton of", "ellipse of",
  "equative of",
  "exclusive plural of",
  "exclusive singular of",
  "female form of", "fem form",
  "feminine noun of",
  "feminine of",
  "feminine plural of",
  "feminine plural past participle of",
  "feminine singular of",
  "feminine singular past participle of", "feminine past participle of",
  "form of",
  "frequentative of",
  "future participle of",
  "genitive of",
  "genitive plural definite of",
  "genitive plural indefinite of",
  "genitive plural of",
  "genitive singular definite of",
  "genitive singular indefinite of",
  "genitive singular of",
  "gerund of",
  "harmonic variant of",
  "honorific alternative case form of", "honoraltcaps",
  "imperative of",
  "imperfective form of",
  "inflected form of",
  "inflection of", "conjugation of",
  "iterative of",
  "late form of",
  "masculine animate plural past participle of",
  "masculine inanimate plural past participle of",
  "masculine noun of",
  "masculine of",
  "masculine plural of",
  "masculine plural past participle of",
  "masculine singular past participle of",
  "medieval spelling of",
  "men's speech form of", "men's form of",
  "misconstruction of",
  "monotonic form of",
  "negative of",
  "neuter plural of",
  "neuter plural past participle of",
  "neuter singular of", "neuter of",
  "neuter singular past participle of", "neuter past participle of",
  "nominalization of",
  "nominative plural of",
  "nominative singular of",
  "nuqtaless form of",
  "oblique plural of",
  "oblique singular of",
  "obsolete spelling of", "obssp", "obs-sp",
  "obsolete typography of",
  "participle of",
  "passive of", "passive form of",
  "passive participle of",
  "passive past tense of", "past passive of", "passive past of",
  "past active participle of",
  "past participle of", "past participle",
  "past passive participle of",
  "past tense of", "past of",
  "paucal of",
  "pejorative of",
  "perfect participle of",
  "perfective form of",
  "plural definite of", "definite plural of",
  "plural indefinite of", "indefinite plural of",
  "plural of", "plural form of",
  "present active participle of",
  "present participle of",
  "present tense of", "present of",
  "reflexive of",
  "rfform",
  "second-person singular of",
  "second-person singular past of",
  "singular definite of", "definite singular of",
  "singular of",
  "singulative of",
  "slender form of",
  "spelling of",
  "substantivisation of", "substantivization of",
  "superlative attributive of",
  "superlative of", "superlative form of",
  "superlative predicative of",
  "supine of",
  "syncopic form of",
  "synonym of", "alternative term for", "altname", "synonym", "alternative name of", "synof", "syn-of", "syn of",
  "terminative plural of",
  "terminative singular of",
  "verbal noun of",
  "vocative plural of",
  "vocative singular of",
]

Benwing2 (talk) 05:00, 25 March 2019 (UTC)

Sure. I will have to rewrite the program a bit first though so that it can digest a list of templates to look up instances of. — Eru·tuon 05:05, 25 March 2019 (UTC)

Done! Now I can make a listing of multiple templates pretty quickly. Do you find it useful to have the wikitext of the templates printed on the page under the titles like this, or no? — Eru·tuon 06:29, 25 March 2019 (UTC)

Thank you! This is very cool. Having the wikitext of the templates is useful so I can inspect it, e.g. I wouldn't have thought to look for cases like |dot=<nowiki/>. Benwing2 (talk) 09:53, 25 March 2019 (UTC)

@Erutuon Can you make a list of all pages that have a {{form of}} template with the |dot= param? Thanks! Benwing2 (talk) 01:34, 28 March 2019 (UTC)

@Benwing2: I went through the previously generated data file and there were only a few matches: B.J., C.I.A., J.C., M.S., steerike. They are already on the list above. — Eru·tuon 02:20, 28 March 2019 (UTC)

Oh, I forgot that {{form of}} was on the preceding list. Thanks! Benwing2 (talk) 02:21, 28 March 2019 (UTC)
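
A lookup of the kind discussed in this thread can be sketched in a few lines of Python. This is not the program that produced the list, only a minimal illustration. The function name `templates_with_dot` and the sample wikitext are invented, and the regular expression deliberately ignores nested templates, which a real dump scan would need to handle.

```python
import re

# Matches a template call that contains no nested braces (a simplification).
TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}")


def templates_with_dot(wikitext, names):
    """Return the wikitext of each listed template call that sets |dot=."""
    wanted = set(names)
    found = []
    for match in TEMPLATE_RE.finditer(wikitext):
        parts = match.group(1).split("|")
        name = parts[0].strip()
        if name not in wanted:
            continue
        # Named parameters look like "key=value"; check the key only.
        keys = [part.split("=", 1)[0].strip() for part in parts[1:] if "=" in part]
        if "dot" in keys:
            found.append(match.group(0))
    return found


# Invented sample text, not taken from any real entry.
sample = (
    "# {{abbreviation of|en|Central Intelligence Agency|dot=}}\n"
    "# {{plural of|en|cat}}\n"
    "# {{initialism of|en|Master of Science|dot=<nowiki/>}}\n"
)
for hit in templates_with_dot(sample, ["abbreviation of", "initialism of", "plural of"]):
    print(hit)
```

Returning the full wikitext of each match, rather than just the page title, is what makes unusual values such as `|dot=<nowiki/>` visible for inspection.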

καλός

Sir, have it your way, if you must. But, with respect, you need to understand that language and poetic scansion are not the same thing. You seem to be confusing the two. Traditional metric scansion can sometimes do violence to language, forcing it to do abnormal things, but that does not mean that those abnormalities then become part and parcel of normal everyday speech. Thus, we can be quite certain that, while forced to scan κᾱλός while reciting certain kinds of poetry, speakers of Attic-Ionic never said κᾱλός in actual normal speech.

Now, I am not saying that information about the linguistic abnormalities of metric scansion is not germane to the Wiktionary. On the contrary, I think it is very useful to a student of Classical Greek poetry, provided that it is placed in the appropriate context, such as in a Usage Note (as it is now), and stating very clearly that what applies to the traditional metric scansion of poetry does not apply to normal everyday language.

However, that kind of metric information certainly does not belong under Pronunciation. “Epic Greek” scansion is not a Greek dialect, as Doric, Attic, Ionic, Aeolian or Boeotian are. Nor are “certain other cases” forms of language in the way that dialects are.

Perhaps, as a student of Classical Greek poetry (a highly commendable endeavor to be sure), you have little concern for language outside of metric scansion. But I very much doubt that ancient Greeks went about their day reciting Homer all the time (thus saying κᾱλός much more often than κᾰλός). I am certain that they actually spoke their Greek as a real everyday language.

The Wiktionary is primarily a dictionary, not a guide to metric scansion. And the purpose of a dictionary is to record actual language, not the abnormalities of metric scansion. Pasquale (talk) 16:02, 26 March 2019 (UTC)

@Pasquale: As I understand it, both pronunciations of καλός (kalós) are inferred from poetry: the short-vowel version from Attic drama and some other types of poetry, the long-vowel version from epic poetry and some other types of poetry. As you say, presumably the usual pronunciation had a short vowel since Attic drama would probably use the usual pronunciation, but the long-vowel pronunciation would have been used when reading Homer. (As the entry points out, the long-vowel pronunciation was probably not the actual Homeric pronunciation since at the time of composition the word would have had a short vowel and a digamma. It was a reinterpretation after the digamma was lost.) But that doesn't matter; here on Wiktionary we show transcriptions for both the usual and unusual vowel lengths. — Eru·tuon 17:41, 26 March 2019 (UTC)

@Erutuon: Indeed, that's absolutely correct. When it comes to α, ι, and υ, vowel quantity has to be inferred from poetry. But then linguistic analysis takes over. There have been several important volumes written about the phonetics and phonology of Attic-Ionic Greek, as well as other dialects, for over a century now. As a result, we know for certain that the short-vowel pronunciation κᾰλός was, in fact, the standard spoken Attic-Ionic pronunciation, while the long-vowel pronunciation κᾱλός was restricted to scanning certain types of poetry, especially epic poetry, ergo not part of the actual spoken language. Back to what I wrote about the difference between actual language and poetic scansion... Thanks. Pasquale (talk) 21:00, 26 March 2019 (UTC)

Yes, and there are a variety of other ways it's inferred, for instance in μᾶλλον (mâllon) from the circumflex, or in πρᾱ́ττω (prā́ttō) from either the circumflex in certain forms or the fact that Ionic had η (ē), or from the expected vowel grade in certain formations, or from the fact that a form has undergone compensatory lengthening or quantitative metathesis.

Anyway, are you arguing that the long-vowel version καλός (kalós) would never have actually been used by Athenians, even when reciting Homer, that they would have recited it un-metrically? — Eru·tuon 21:18, 26 March 2019 (UTC)

(butting in) I agree with Pasquale that artificial pronunciations that exist solely for the sake of the meter should not be put on an equal footing with the natural, prosaic ones. Chignon – Пу чок 21:22, 26 March 2019 (UTC)

What do you mean exactly? Marking the artificial pronunciations somehow, not showing them at all? At the moment the κᾱλός pronunciation is labeled as being used in epic poetry.

Note that there are some metrically modified forms that have or will have their own entries, like words with other vowel lengthenings, like ε to ει or ο to ου, or with consonants doubled for the sake of meter, or with οω instead of ω. — Eru·tuon 21:41, 26 March 2019 (UTC)

I think we can still show the artificial pronunciations, but yes, in my view we should explicitly mark them as artificial. Maybe we could write (artificial lengthening for the sake of the meter) (although that's a bit long) or something like that?

As to your second point, yes, I agree that those deserve their own entries of course. But again, let's write explicitly where they come from. Chignon – Пу чок 21:55, 26 March 2019 (UTC)

Now I'm kind of interested in writing a script that looks through page titles to find words that might be metrical modifications of other words. — Eru·tuon 22:13, 26 March 2019 (UTC)
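One way such a title scan might look, as a rough sketch only: it strips diacritics from each title and then checks whether applying a single lengthening (ε to ει, ο to ου, or a doubled consonant, the changes named earlier in this thread) to one title yields another title. The rule table, the function names, and the idea of comparing diacritic-free forms are all assumptions made for illustration; the output is only a list of candidates, which would include regular dialect variants as well as metrical forms.

```python
import unicodedata

# Hypothetical rule table based on the lengthenings mentioned in this thread.
VOWEL_LENGTHENINGS = {"ε": "ει", "ο": "ου"}
CONSONANTS = set("βγδζθκλμνξπρστφχψ")


def strip_diacritics(word):
    """Remove accents and breathings so only base letters are compared."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lengthened_variants(plain):
    """Yield every string reachable from `plain` by exactly one lengthening."""
    for i, ch in enumerate(plain):
        if ch in VOWEL_LENGTHENINGS:
            yield plain[:i] + VOWEL_LENGTHENINGS[ch] + plain[i + 1:]
        if ch in CONSONANTS:
            yield plain[:i] + ch + plain[i:]  # double the consonant


def find_candidates(titles):
    """Return sorted (base title, possibly modified title) pairs."""
    by_plain = {}
    for title in titles:
        by_plain.setdefault(strip_diacritics(title), []).append(title)
    pairs = []
    for plain, originals in by_plain.items():
        for variant in set(lengthened_variants(plain)):
            for modified in by_plain.get(variant, []):
                for original in originals:
                    pairs.append((original, modified))
    return sorted(pairs)
```

Every pair would still need a human check, since the comparison ignores vowel length marks, accents, and etymology.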

According to w:Proto-Greek language, "Loss of /h/ and /w/ after a consonant was often accompanied by compensatory lengthening of a preceding vowel." This suggests that this is a regular change and not artificial at all. —Rua (mew) 22:00, 26 March 2019 (UTC)

Yes, I think I've encountered that before. Maybe καλός (kalós) is a bad example, but my point still stands: there are some lengthenings that have no etymological justification. Chignon – Пу чок 22:06, 26 March 2019 (UTC)

Compensatory lengthening after the loss of /w/ after a consonant may be more common in Ionic than Attic, though: *monwos → Attic μόνος (mónos), Ionic μοῦνος (moûnos); *ksenwos → Attic ξένος (xénos), Ionic ξεῖνος (xeînos). — Eru·tuon 22:10, 26 March 2019 (UTC)

So perhaps the longer form is just non-Attic, but really was used somewhere. Given that Homer was an Ionian himself, that seems like the first place to look. —Rua (mew) 22:21, 26 March 2019 (UTC)

@Erutuon: In reply to your question: "are you arguing that the long-vowel version κᾱλός (kālós) would never have actually been used by Athenians, even when reciting Homer, that they would have recited it un-metrically?": No, of course, I am not suggesting any such thing. And that's perfectly clear from my previous comments, I believe. What I did say is that information about the linguistic abnormalities of metric scansion is fine, and indeed very useful, as long as it is placed in a Usage Note. But it certainly does not belong under Pronunciation and should be removed from that section. The Pronunciation section strictly references the Attic dialect (and not, for example, the Homeric). Look at what the Pronunciation section says now:

In most cases:

(5th BCE Attic) IPA: /ka.lós/

(1st CE Egyptian) IPA: /kaˈlos/

(4th CE Koine) IPA: /kaˈlos/

(10th CE Byzantine) IPA: /kaˈlos/

(15th CE Constantinopolitan) IPA: /kaˈlos/

In epic poetry and in some other cases:

(5th BCE Attic) IPA: /kaː.lós/

(1st CE Egyptian) IPA: /kaˈlos/

(4th CE Koine) IPA: /kaˈlos/

(10th CE Byzantine) IPA: /kaˈlos/

(15th CE Constantinopolitan) IPA: /kaˈlos/

This is just silly and probably incorrect. As I repeat, metric scansion is distinct from normal everyday language, and it has a history of its own. An experienced reciter of poetry may well have pronounced /kaː.lós/ with a long ᾱ while scanning Homeric verse, not only in 5th BCE Attic, but also well into 1st CE Egyptian, 4th CE Koine, and even later. On the other hand, there were surely speakers of 5th BCE Attic Greek who never recited Homeric verse in their lives and only ever said /ka.lós/, which we know was the normal everyday pronunciation.

Lots of other languages, ancient and modern, offer similar peculiarities of metric scansion, which often shed light on earlier stages of those languages, but synchronically are always artificial. They merit mention in a note, but are not listed as synchronic pronunciation variants. For example, there are numerous words in Sanskrit that have peculiar scansions in Vedic hymns (sometimes only in the oldest hymns of the Rigveda); e.g. स्वर् (svár, but metrically súar or súvar in Vedic hymns), see स्वर्. Or, in Italian, the word oriente, which is normally pronounced in three syllables, but is often pronounced in four syllables in poetry (and may be spelled orïente), see oriente. There are myriad such cases. But, invariably, the synchronically artificial metric scansion is discussed at best in a note, never listed as a normal pronunciation variant. Hope this is clear. Pasquale (talk) 17:24, 27 March 2019 (UTC)

Hmm, I agree that the post-Classical Attic pronunciations of καλός (kalós) are probably inaccurate: it's not clear when Greek speakers would have stopped reciting Homer metrically. So it would make more sense to only show the Classical Attic pronunciation for any special Epic forms.

As mentioned above, we tend to include {{grc-IPA}} in every entry, and some entries are for special metrical forms that are spelled differently from the normal forms. So to apply your preferred policy, pronunciations would have to be removed altogether from certain entries. Another option is to label these forms and to only show the Classical Attic pronunciation, because it is not clear how the pronunciation used in reciting poetry would have evolved, but it is plausible that poetry known to the Athenians would have been recited in something like an Athenian accent. I'm more inclined to the latter because I think it is helpful to provide some kind of transcription of poetic words or pronunciations. — Eru·tuon 18:06, 27 March 2019 (UTC)

Wikipedia links

I am interested in the inline links to Wikipedia that we have enclosed in [[w:|]], {{w}} (96K pages of transclusions, mostly multiple), and {{w2}} (3K). There is an argument to be made that the frequency with which we feel compelled to have such a link is an indication that it might make a useful entry, provided, of course, that it meets CFI.

I could extract these using modifications of the Perl scripts that I use for {{taxon}} (15K) and {{taxlink}} (25K). Are you aware of any such compilation or of any well-designed code to do the same thing?

Does it make sense to do it from the tool server? How? DCDuring (talk) 23:33, 29 March 2019 (UTC)

@DCDuring: I haven't worked with the toolserver yet, and am not sure if there are tools for getting statistics on wikilinks, but I generated a list of {{w}} and {{w2}} instances and could make a program to find wikilinks if you'd like. I figure it would be faster to generate the data from a list of templates and wikilinks than the whole dump. — Eru·tuon 00:12, 30 March 2019 (UTC)

Where are said lists of the template instances?

How would I get a count of links that use [[w:]], which is not a template?

I didn't even know about {{w2}} until I was looking for the number of pages that transcluded {{w}}. My {{vern}} does a link to WP, but obviously I know all about them. Are there other templates that link to WP?

My needs are only for links to English Wikipedia. (I think I've already extirpated all links to Wikispecies other than via {{taxlink}}.) Ideally, I would like to merge all the lists of instances and get counts. Specifically, for a given link, I would like the Wikipedia link and the display. The WP links might include links to headers and there could be different displays for a given WP link. The sort should group instances by WP link page, and subsort by any header links and then by display. The groups should sort by decreasing frequency of the entire group. DCDuring (talk) 01:55, 30 March 2019 (UTC)
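The grouping and sorting described here could be sketched roughly as follows for plain [[w:...]] wikilinks. This is only an illustration under stated assumptions: the regular expression, the function names, and the choice to skip any link whose target contains a second colon (which drops language-prefixed links such as [[w:de:...]] but also drops English page titles that contain a colon) are not taken from any existing script, and when a link has no display text the sketch simply records the page name.

```python
import re
from collections import Counter, defaultdict

# Matches [[w:Page#Header|display]]; header and display are optional.
# Targets containing a further colon are deliberately not matched.
WIKILINK = re.compile(
    r"\[\[w:([^\[\]|#:]+)(?:#([^\[\]|]*))?(?:\|([^\[\]]*))?\]\]"
)


def extract_links(wikitext):
    """Yield (page, header, display) for each [[w:...]] link in the text."""
    for match in WIKILINK.finditer(wikitext):
        page = match.group(1).strip()
        header = (match.group(2) or "").strip()
        display = (match.group(3) or "").strip() or page
        yield page, header, display


def tally(texts):
    """Group links by page, most frequent page first.

    Each result is (page, total count, [((header, display), count), ...]),
    with the inner list sorted by header and then by display.
    """
    groups = defaultdict(Counter)
    for text in texts:
        for page, header, display in extract_links(text):
            groups[page][(header, display)] += 1
    ordered = sorted(
        groups.items(), key=lambda kv: (-sum(kv[1].values()), kv[0])
    )
    return [(page, sum(c.values()), sorted(c.items())) for page, c in ordered]
```

Instances collected from {{w}} and {{w2}} could be fed into the same `groups` structure once their parameters are reduced to the same (page, header, display) triple.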

I probably can do this myself. Please don't spend time on it unless I come begging. DCDuring (talk) 01:57, 30 March 2019 (UTC)

@DCDuring: I've got the {{w}} and {{w2}} files on my computer. Unfortunately the {{w}} file is too big to save on a Wiktionary page (about 3.5 MB), the other one a bit smaller and available here. If you'd like the {{w}} file, you can send me an email via Special:Emailuser. (I'm not super familiar with file-sharing sites.) — Eru·tuon 02:22, 30 March 2019 (UTC)

Done, I think. DCDuring (talk) 16:30, 30 March 2019 (UTC)

Not quite done etyl cleanups

Hi, if I edit {{etyl}} to show that a certain language is done, but there are actually still a few forms left, please don't undo my edit to the template. I often list language codes as done that don't actually have an "etyl cleanup/xx" subcategory precisely so that I can find the last few stragglers via CAT:E. Obviously if I've really jumped the gun and there are suddenly dozens of pages in CAT:E, you can and should revert me, but if there's only a handful of pages, then please let it be. I'll find and fix the module errors soon. Thanks! —Mahāgaja · talk 14:43, 9 April 2019 (UTC)

@Mahagaja: Sorry about that. I'll stop interfering. I'm not super enthusiastic about using module errors, but it works.

If it would help, I can supply lists of pages that had a given language code as of the last dump. I created a folder with separate files for each language code in a format similar to this. (It comes to about 4 MB.) I was doing a JWB run through a list like that to clean up the last instances. Usually there are several pages that have already been cleaned up since the dump was generated, so it's not the best for manual editing, unless you have a way to filter out pages that have already been cleaned up. — Eru·tuon 21:06, 9 April 2019 (UTC)

form-of templates: Full information

Thought you might find this useful ... I created a programmatic list of all the non-language-specific form-of templates and their properties. Not sure if you use Python but if you do it should be very easy to fetch whatever you want out of this list. Each template has an associated dict of properties:

"aliases": List of aliases
"deprecated-aliases": List of deprecated aliases (should no longer be used)
"withcap": If true, template displays a default initial capital and supports |nocap=
"withdot": If true, template displays a default final period and supports |nodot= and |dot=
"withfrom": If true, template supports |from=, |from2=, etc. to specify regional dialects or whatever
"withPOS": If true, template supports |POS= to control the part of speech of the category
"cat": If present, non-language-specific portion of the category to which the page belongs (prepended with the canonical name of the language to form the actual category name); value could potentially be a list of multiple categories, but no such entries exist among the non-language-specific templates

I'm still working on the corresponding language-specific list. These templates are much messier, often work in idiosyncratic ways, and are often defined manually instead of using a function in Module:form of/templates. I'm gradually converting them and cleaning them up.

form_of_templates = [
  ("abbreviation of", {"aliases": ["abbr of"], "withcap": True, "withdot": True, "cat": "abbreviations"}),
  ("abstract noun of", {"withcap": True, "withfrom": True, "cat": "abstract nouns"}),
  ("accusative of", {}),
  ("accusative plural of", {}),
  ("accusative singular of", {}),
  ("acronym of", {"withcap": True, "withdot": True, "cat": "acronyms"}),
  ("active participle of", {}),
  ("agent noun of", {"cat": "agent nouns"}),
  ("alternative case form of", {"aliases": ["alt case"], "withcap": True}),
  ("alternative form of", {"aliases": ["alt form", "altform"], "deprecated-aliases": ["alt form of"], "withcap": True, "withfrom": True}),
  ("alternative plural of", {}),
  ("alternative reconstruction of", {}),
  ("alternative spelling of", {"aliases": ["alt sp"], "withcap": True, "withfrom": True}),
  ("alternative typography of", {}),
  ("aphetic form of", {"withcap": True, "cat": "aphetic forms"}),
  ("apocopic form of", {"cat": "apocopic forms"}),
  ("archaic form of", {"withcap": True, "withdot": True, "cat": "archaic forms"}),
  ("archaic spelling of", {"withcap": True, "withdot": True, "cat": "archaic forms"}),
  ("aspirate mutation of", {"withcap": True, "withdot": True, "cat": "aspirate-mutation forms"}),
  ("attributive form of", {}),
  ("augmentative of", {"withcap": True, "withPOS": True, "cat": "augmentative {{{POS|noun}}}s"}),
  ("broad form of", {"withfrom": True}),
  ("causative of", {"cat": "causative verbs"}),
  ("clipping of", {"aliases": ["clip of"], "withcap": True, "withdot": True, "cat": "clippings"}),
  ("combining form of", {"cat": "combining forms"}),
  ("comparative of", {"withPOS": True, "cat": "comparative {{{POS|adjective}}}s"}),
  ("construed with", {}),
  ("contraction of", {"withcap": True, "withdot": True, "cat": "contractions"}),
  ("dated form of", {"withcap": True, "withdot": True, "cat": "dated forms"}),
  ("dated spelling of", {"withcap": True, "withdot": True, "cat": "dated forms"}),
  ("dative of", {}),
  ("dative plural of", {}),
  ("dative singular of", {}),
  ("definite of", {}),
  ("deliberate misspelling of", {"withcap": True, "withdot": True, "cat": "misspellings"}),
  ("diminutive of", {"aliases": ["dim of"], "withPOS": True, "cat": "diminutive {{{POS|noun}}}s"}),
  ("diminutive plural of", {"withPOS": True, "cat": "diminutive {{{POS|noun}}}s"}),
  ("dual of", {}),
  ("eclipsis of", {"withcap": True, "withdot": True, "cat": "eclipsed forms"}),
  ("eggcorn of", {"withcap": True, "withdot": True, "cat": "eggcorns"}),
  ("elative of", {}),
  ("ellipsis of", {"withcap": True, "cat": "ellipses"}),
  ("elongated form of", {"withcap": True, "withdot": True, "cat": "elongated forms"}),
  ("endearing form of", {"withPOS": True, "cat": "endearing {{{POS|noun}}}s"}),
  ("equative of", {"withPOS": True, "cat": "{{{POS|adjective}}} equative forms"}),
  ("euphemistic form of", {"withcap": True, "withdot": True, "cat": "euphemisms"}),
  ("euphemistic spelling of", {"withcap": True, "withdot": True, "cat": "euphemisms"}),
  ("eye dialect of", {"withcap": True, "withdot": True, "withfrom": True, "cat": "eye dialect"}),
  ("feminine noun of", {}),
  ("feminine of", {}),
  ("feminine plural of", {}),
  ("feminine plural past participle of", {"cat": "past participle forms"}),
  ("feminine singular of", {}),
  ("feminine singular past participle of", {"cat": "past participle forms"}),
  ("form of", {}),
  ("former name of", {"withcap": True, "withdot": True}),
  ("frequentative of", {"cat": "frequentative verbs"}),
  ("future participle of", {}),
  ("genitive of", {}),
  ("genitive plural definite of", {}),
  ("genitive plural indefinite of", {}),
  ("genitive plural of", {}),
  ("genitive singular definite of", {}),
  ("genitive singular indefinite of", {}),
  ("genitive singular of", {}),
  ("gerund of", {"cat": "gerunds"}),
  ("h-prothesis of", {"cat": "h-prothesized forms"}),
  ("hard mutation of", {"withcap": True, "withdot": True, "cat": "hard-mutation forms"}),
  ("harmonic variant of", {}),
  ("honorific alternative case form of", {"aliases": ["honor alt case"], "withcap": True}), # FIXME, rewrite with withdot=
  ("imperative of", {}),
  ("imperfective form of", {"cat": "imperfective verbs"}),
  ("inflected form of", {}),
  ("inflection of", {"deprecated-aliases": ["conjugation of"]}),
  ("informal form of", {"withcap": True, "withdot": True, "cat": "informal forms"}),
  ("informal spelling of", {"withcap": True, "withdot": True, "cat": "informal forms"}),
  ("initialism of", {"aliases": ["init of"], "withcap": True, "withdot": True, "cat": "initialisms"}),
  ("iterative of", {"cat": "iterative verbs"}),
  ("lenition of", {"withcap": True, "withdot": True, "cat": "lenited forms"}),
  ("masculine noun of", {}),
  ("masculine of", {}),
  ("masculine plural of", {}),
  ("masculine plural past participle of", {"cat": "past participle forms"}),
  ("medieval spelling of", {"cat": "medieval spellings"}),
  ("men's speech form of", {"cat": "men's speech terms"}),
  ("misconstruction of", {"withcap": True, "cat": "misconstructions"}),
  ("misromanization of", {"withcap": True, "withdot": True, "cat": "misromanizations"}),
  ("misspelling of", {"aliases": ["missp"], "withcap": True, "withdot": True, "cat": "misspellings"}),
  ("mixed mutation of", {"withcap": True, "withdot": True, "cat": "mixed-mutation forms"}),
  ("nasal mutation of", {"withcap": True, "withdot": True, "cat": "nasal-mutation forms"}),
  ("negative of", {}),
  ("neuter plural of", {}),
  ("neuter singular of", {}),
  ("neuter singular past participle of", {"cat": "past participle forms"}),
  ("nomen sacrum form of", {"withcap": True, "withdot": True, "cat": "nomina sacra"}),
  ("nominalization of", {"cat": "nominalized adjectives"}),
  ("nominative plural of", {}),
  ("nominative singular of", {}),
  ("nonstandard form of", {"withcap": True, "withdot": True, "cat": "nonstandard forms"}),
  ("nonstandard spelling of", {"withcap": True, "withdot": True, "cat": "nonstandard forms"}),
  ("nuqtaless form of", {}),
  ("obsolete form of", {"withcap": True, "withdot": True, "cat": "obsolete forms"}),
  ("obsolete spelling of", {"aliases": ["obs sp"], "withcap": True, "cat": "obsolete forms"}),
  ("obsolete typography of", {"cat": "obsolete forms"}),
  ("official form of", {"withcap": True, "withdot": True, "cat": "official forms"}),
  ("participle of", {"cat": "participles"}),
  ("passive of", {"cat": "verb passive forms"}),
  ("passive participle of", {}),
  ("passive past tense of", {}),
  ("past active participle of", {"cat": "past active participles"}),
  ("past participle form of", {"cat": "past participle forms"}),
  ("past participle of", {"cat": "past participles"}),
  ("past passive participle of", {"cat": "past passive participles"}),
  ("past tense of", {}),
  ("pejorative of", {"withcap": True, "cat": "derogatory terms"}),
  ("perfect participle of", {"cat": "perfect participles"}),
  ("perfective form of", {"cat": "perfective verbs"}),
  ("plural definite of", {}),
  ("plural indefinite of", {}),
  ("plural of", {"deprecated-aliases": ["plural form of"]}),
  ("present active participle of", {"cat": "present active participles"}),
  ("present participle of", {"cat": "present participles"}),
  ("present tense of", {}),
  ("pronunciation spelling of", {"withcap": True, "withdot": True, "withfrom": True, "cat": "pronunciation spellings"}),
  ("pronunciation variant of", {"withcap": True, "withdot": True, "withfrom": True, "cat": "pronunciation variants"}),
  ("rare form of", {"withcap": True, "withdot": True, "cat": "rare forms"}),
  ("rare spelling of", {"aliases": ["rare sp"], "withcap": True, "withdot": True, "cat": "rare forms"}),
  ("reflexive of", {"cat": "reflexive verbs"}),
  ("rfform", {}),
  ("romanization of", {"withcap": True}),
  ("short for", {"withcap": True, "withdot": True, "cat": "short forms"}),
  ("singular definite of", {}),
  ("singular of", {}),
  ("singulative of", {}),
  ("slender form of", {"withfrom": True}),
  ("soft mutation of", {"withcap": True, "withdot": True, "cat": "soft-mutation forms"}),
  ("spelling of", {"cat": "{{#if:{{{lang|}}}|{{{1}}} forms|{{{2}}} forms}}"}),
  ("standard form of", {"withcap": True, "withdot": True}),
  ("standard spelling of", {"aliases": ["stand sp"], "withcap": True, "withdot": True}),
  ("superlative attributive of", {}),
  ("superlative of", {"withPOS": True, "cat": "superlative {{{POS|adjective}}}s"}),
  ("superlative predicative of", {}),
  ("superseded spelling of", {"withcap": True, "withdot": True, "cat": "superseded forms"}),
  ("supine of", {}),
  ("syncopic form of", {"cat": "syncopic forms"}),
  ("synonym of", {"aliases": ["syn of"], "withcap": True}),
  ("t-prothesis of", {"cat": "t-prothesized forms"}),
  ("uncommon form of", {"withcap": True, "withdot": True, "cat": "uncommon forms"}),
  ("uncommon spelling of", {"withcap": True, "withdot": True, "cat": "uncommon forms"}),
  ("verbal noun of", {"cat": "verbal nouns"}),
  ("vocative plural of", {}),
  ("vocative singular of", {}),
]

Benwing2 (talk) 02:48, 10 April 2019 (UTC)
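As a small illustration of fetching things out of a list in this shape, the sketch below filters templates by a property and builds an alias lookup. It uses only a three-entry excerpt of the list so that it runs on its own, and the helper names `templates_with` and `canonical_names` are invented for the example.

```python
# Three entries copied from the full list, for a self-contained example.
form_of_templates = [
  ("abbreviation of", {"aliases": ["abbr of"], "withcap": True, "withdot": True, "cat": "abbreviations"}),
  ("accusative of", {}),
  ("alternative form of", {"aliases": ["alt form", "altform"], "deprecated-aliases": ["alt form of"], "withcap": True, "withfrom": True}),
]


def templates_with(prop):
    """Return canonical names of templates whose property `prop` is truthy."""
    return [name for name, props in form_of_templates if props.get(prop)]


def canonical_names():
    """Map each template name and alias (deprecated or not) to its canonical name."""
    mapping = {}
    for name, props in form_of_templates:
        mapping[name] = name
        for alias in props.get("aliases", []) + props.get("deprecated-aliases", []):
            mapping[alias] = name
    return mapping
```

For instance, `templates_with("withcap")` lists the templates that display a default initial capital, and `canonical_names()["alt form of"]` resolves a deprecated alias to "alternative form of".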

Thanks, that's already been useful to help me determine that a list of form-of templates that I made would include one with capitalization. (I've written a little Python that used Pywikibot and mwparserfromhell, but I'm not as familiar with it as with Lua, JavaScript, and C.) — Eru·tuon 04:55, 10 April 2019 (UTC)

Lang-specific form-of templates

Here is my current list of lang-specific form-of templates and their aliases (if there are multiple comma-separated template names listed on a single line, the first one is the canonical name and the remainder are aliases). I haven't gotten around yet to classifying them by behavior, which is difficult in any case because each one is so idiosyncratic and my plan is to obsolete as many as possible.

ar-instance noun of
ar-verbal noun of
be-Taraškievica spelling of
bg-adj form of
bg-noun form of
bg-pre-reform
bg-verb form of
blk-past of
br-noun-mutation of,br-noun-mutated
br-noun-plural
ca-adj form of
ca-form of
ca-verb form of
caret notation of
ceb-superseded spelling of
chm-inflection of
cmn-erhua form of,zh-erhua form of
cu-Glag spelling of
cu-form of
da-e-form of
da-pl-genitive
de-du contraction
de-form-adj
de-form-noun
de-inflected form of
de-superseded spelling of,de-deprecated spelling of
de-umlautless spelling of
de-verb form of
de-zu-infinitive of
egy-alternative transliteration of,egy-alt
egy-verb form of
el-Cretan dialect form of
el-Cypriot dialect form of
el-Italiot dialect form of
el-Katharevousa form of
el-Maniot dialect form of
el-Pontian dialect form of
el-form-of-adv
el-form-of-nounadj,el-form-of-pronoun
el-form-of-verb,el-verb form of
el-monotonic form of
el-participle of
el-polytonic form of
en-archaic second-person singular of
en-archaic second-person singular past of
en-archaic third-person singular of
en-comparative of
en-ing form of
en-irregular plural of
en-past of
en-simple past of
en-superlative of
en-third-person singular of,en-third person singular of
enm-first-person singular of
enm-first/third-person singular past of
enm-inflected form of
enm-plural of
enm-plural past of
enm-plural subjunctive of
enm-plural subjunctive past of
enm-second-person singular of
enm-second-person singular past of
enm-singular subjunctive of
enm-singular subjunctive past of
enm-third-person singular of
eo-form of
eo-root of
es-adj form of
es-compound of
es-note-noun-mf
es-verb form of
es-verb form of/adverbial
es-verb form of/conditional
es-verb form of/imperative
es-verb form of/indicative
es-verb form of/participle
es-verb form of/subjunctive
es-verb form of/subtense-name
es-verb form of/subtense-pronoun
et-nom form of
et-participle of
et-verb form of
fa-adj form of,fa-adj-form
fa-form-verb
ff-fuc-form of
fi-form of
fi-infinitive of
fi-participle of
fi-verb form of
fr-post-1990
fr-pre-1990
fy-pronadv of
ga-emphatic of
ga-lenition of
gl-verb form of
gl-verb form of/conditional
gl-verb form of/doWork
gl-verb form of/error
gl-verb form of/imperative
gl-verb form of/indicative
gl-verb form of/participle
gl-verb form of/pronoun
gl-verb form of/subjunctive
gl-verb form of/subtense-name
gl-verb form of/subtense-pronoun
gmq-bot-verb-form-sup
got-compound of
got-nom form of
got-verb form of
han tu form of,vi-hantu form of
he-adj form of
he-defective spelling of
he-excessive spelling of
he-infinitive of
he-noun form of
he-prep form of
he-verb form of
hi-form-adj
hi-form-adj-verb
hi-form-noun
hi-form-verb
hit-broad transcription of
hit-transliteration of
hu-exaggerated of
hu-inflection of
hu-participle
hy-form-noun
hy-reformed
hy-traditional
ia-form of
ie-past and pp of
io-form of
is-conjugation of
is-inflection of
it-adj form of
iu-spel
ja-form of
ja-kyujitai spelling of,kyu,ja-kyu sp
ja-past of verb
ja-romanization of,ja-romanization-of
ja-te form of verb
ja-verb form of
jbo-rafsi of
jyutping reading of
ka-form of
ka-verb-form-of
ka-verbal for,ka-verbal of
ko-hanja form of,hanja form of
ko-mixed form of
ko-root of
ku-verb form of
la-praenominal abbreviation of
lb-inflected form of
liv-conjugation of
liv-inflection of
liv-participle of
lt-būdinys,lt-budinys
lt-dalyvis-1,lt-dalyvis
lt-dalyvis-2
lt-form-adj
lt-form-adj-is
lt-form-noun
lt-form-part
lt-form-pronoun
lt-form-verb
lt-padalyvis
lt-pusdalyvis
lv-adv form of
lv-comparative of
lv-definite of
lv-inflection of
lv-negative of
lv-participle of
lv-reflexive of
lv-superlative of
lv-verbal noun of
mfe-medial of,mfe-short of
mn-verb form of
morse code abbreviation
morse code for
morse code prosign
mr-form-adj
mt-prep-form
my-ICT of
nb-noun-form-def-gen
nb-noun-form-def-gen-pl
nb-noun-form-indef-gen-pl
nb-noun-form-indef-pl
nl-adj form of
nl-noun form of
nl-pronadv of
nl-verb form of
nn-verb-form of
nn-verb-form-imp
nn-verb-form-past
nn-verb-form-pastpart
nn-verb-form-pre
no-noun-form-def
no-noun-form-def-pl
ofs-nom form of
osx-nom form of
pi-sc
pinyin reading of,pinread,pinof
pt-adj form of
pt-adv form of
pt-apocopic-verb
pt-article form of
pt-cardinal form of
pt-noun form of
pt-obsolete-differential-accent
pt-obsolete-hellenism
pt-obsolete-sc
pt-obsolete-secondary-stress
pt-obsolete-silent-letter-1911
pt-obsolete-éia
pt-obsolete-ôo
pt-obsolete-ü
pt-ordinal form,pt-ordinal def
pt-pron def
pt-pronoun-with-l
pt-pronoun-with-n
pt-superseded-hyphen
pt-superseded-paroxytone
pt-superseded-silent-letter-1990
pt-verb form of
pt-verb-form-of
ro-Cyrillic of
ro-adj-form of,ro-form-adj
ro-form-noun
ro-form-verb
ro-superseded spelling of
roa-opt-noun plural of
ru-abbrev of
ru-acronym of
ru-alt-ё
ru-clipping of
ru-initialism of
ru-participle of
ru-pre-reform
sa-desiderative of,sa-desi
sa-frequentative of,sa-freq
sa-root form of
sce-verb form of
sco-past of
sco-simple past of
sco-third-person singular of
sga-verbnec of
sh-form-noun
sh-form-proper-noun
sh-verb form of,sh-form-verb
sino-vietnamese reading of
sl-form-adj
sl-form-noun
sl-form-verb,sl-verb form of
sl-participle of
sv-adj-form-abs-def
sv-adj-form-abs-def+pl
sv-adj-form-abs-def-m
sv-adj-form-abs-indef-n
sv-adj-form-abs-pl
sv-adj-form-comp
sv-adj-form-comp-pl
sv-adj-form-sup-attr
sv-adj-form-sup-attr-m
sv-adj-form-sup-pred
sv-adj-form-sup-pred-pl
sv-adv-form-comp
sv-adv-form-sup
sv-noun-form-adj
sv-noun-form-def
sv-noun-form-def-gen
sv-noun-form-def-gen-pl
sv-noun-form-def-pl
sv-noun-form-indef-gen
sv-noun-form-indef-gen-pl
sv-noun-form-indef-pl
sv-proper-noun-gen
sv-verb-form-imp
sv-verb-form-inf-pass
sv-verb-form-past
sv-verb-form-past-pass
sv-verb-form-pastpart
sv-verb-form-pre
sv-verb-form-pre-pass
sv-verb-form-prepart
sv-verb-form-pres-pass
sv-verb-form-subjunctive
sv-verb-form-sup
sv-verb-form-sup-pass
sw-adj form of
tg-adj form of,tg-adj-form
tg-form-verb
tl-superseded spelling of
tl-verb form of
tr-copulative form of
tr-inflection of
tr-possessive form of
ug-uly of
ug-uyy of
uk-pre-reform
ur-form-adj
ur-form-noun
ur-form-verb
vi-Nom form of,Nom form of,nomof
xh-combining stem of
yi-alternatively pointed form of
yi-inflected form of
yi-phonetic spelling of
yi-unpointed form of
za-sawndip form of
zh-alt-form
zh-altname,zh-alt-name
zh-altterm,zh-alt-term
zh-misspelling of,zh-misspelling
zh-old-name
zh-only used in,zh-only
zh-original
zh-short,zh-abbrev
zh-subst-char
zh-sum of parts
zh-synonym of,zh-synonym
zu-combining stem of
zu-verb inf of

Benwing2 (talk) 23:55, 13 April 2019 (UTC)

@Benwing2: I've made a file of instances of these templates (121 MiB!) if you need to do any text analysis on them. — Eru·tuon 21:16, 14 April 2019 (UTC)

Thanks! What I actually need currently though is a list of any instances of the inflection tag "mp" in {{inflection of}}; i.e. any cases where "mp" (possibly with spaces on either end) occurs in param 3 or greater in a call to {{inflection of}}. BTW I missed two templates in the list above (now corrected): Template:he-infinitive of (I just forgot it) and Template:fy-pronadv of (recently added). Benwing2 (talk) 21:24, 14 April 2019 (UTC)
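The condition stated here (an "mp" value, ignoring surrounding spaces, in parameter 3 or higher) can be sketched with the standard library alone. This is a simplified illustration, not the program actually used: the regular expression and function names are invented, calls containing nested templates are not matched at all, and a positional value that contains a pipe or an equals sign (inside a link, say) would be misread. A real run would more likely use a proper wikitext parser.

```python
import re

# Matches {{inflection of|...}} calls that contain no nested braces.
INFLECTION_OF = re.compile(r"\{\{\s*inflection of\s*\|([^{}]*)\}\}")


def numbered_params(body):
    """Return {index: value} for positional and explicitly numbered params."""
    params = {}
    next_index = 1
    for part in body.split("|"):
        name, sep, value = part.partition("=")
        if not sep:
            params[next_index] = part  # unnamed parameter
            next_index += 1
        elif name.strip().isdigit():
            params[int(name.strip())] = value  # e.g. |3=mp
        # other named parameters such as |lang= are ignored
    return params


def find_mp_calls(wikitext):
    """Return the calls in which "mp" occurs in parameter 3 or greater."""
    hits = []
    for match in INFLECTION_OF.finditer(wikitext):
        params = numbered_params(match.group(1))
        if any(i >= 3 and v.strip() == "mp" for i, v in params.items()):
            hits.append(match.group(0))
    return hits
```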

Okay, here's the list of {{inflection of}} containing |mp|. There shouldn't be many cases in which mp isn't a grammar label because it isn't a language code and isn't very likely to be a word, and no instances with explicitly numbered parameters include mp as a grammar tag. — Eru·tuon 21:56, 14 April 2019 (UTC)

Thanks! Benwing2 (talk) 22:12, 14 April 2019 (UTC)

Scripts scripts scripts

BTW as part of my cleanup of the lang-specific form-of templates I wrote some general scripts to rewrite templates in various ways. One of them lets you do fairly simple things like rename templates or remove or rename parameters using command-line arguments; e.g. I used the following:

python rewrite_template.py -t 'e-form of' -n 'da-e-form of' -r lang --filter lang=da --save

to rename {{e-form of}} to {{da-e-form of}} and remove the |lang= parameter, with a filter added saying to operate only when |lang=da, for safety's sake. Another one lets you specify complex rewrite specifications in code. An example is for rewriting {{et-verb form of}} to {{Inflection of|et|...}} (this latter template doesn't exist yet but it will):

  ("et-verb form of", (
    # The template code supports m=ptc and categorizes specially, but
    # it never occurs.
    "Inflection of",
    ("error-if", ("present-except", ["1", "p", "m", "t"])),
    ("set", "1", [
      "et",
      ("copy", "1"),
      "",
      ("lookup", "p", {
        "1s": ["1", "s"],
        "2s": ["2", "s"],
        "3s": ["3", "s"],
        "1p": ["1", "p"],
        "2p": ["2", "p"],
        "3p": ["3", "p"],
        "pass": "pass",
        "": [],
      }),
      ("lookup", "m", {
        "pres": "pres",
        "past": "past",
        "cond": "cond",
        "impr": "impr",
        "quot": "quot",
        "": [],
      }),
      ("lookup", "t", {
        "da": "da-infinitive",
        "conn": "conn",
        "": [],
      }),
    ]),
  )),

This will, for example, rewrite {{et-verb form of|foobar|p=1p|t=conn}} to {{Inflection of|et|foobar||1|p|conn}}, but will complain and refuse to do anything if it sees an unfamiliar parameter or an unexpected value for a known parameter. I also have lots of other scripts to do things like regex-based lookups and rewrites, lists of pages in a given category or namespace or referencing a given page, etc. All of these scripts operate online, although most of them can be passed a list of pages to operate on, making it possible to interface them with scripts that search through a dump. If you're interested, I can make these scripts available. Benwing2 (talk) 22:34, 14 April 2019 (UTC)
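To make the behaviour of such a specification concrete, here is a minimal interpreter sketch. It is not the actual script: the semantics of "error-if", "set", "copy", and "lookup" are inferred from the example above, the names `rewrite`, `evaluate`, and `RewriteError` are invented, and the specification is an abbreviated copy with fewer lookup entries. Template parameters are passed in as an already-parsed dict rather than parsed from wikitext.

```python
class RewriteError(Exception):
    """Raised when a template call does not fit the rewrite specification."""


# Abbreviated copy of the "et-verb form of" specification quoted above.
SPEC = (
    "Inflection of",
    ("error-if", ("present-except", ["1", "p", "m", "t"])),
    ("set", "1", [
        "et",
        ("copy", "1"),
        "",
        ("lookup", "p", {"1s": ["1", "s"], "1p": ["1", "p"], "pass": "pass", "": []}),
        ("lookup", "m", {"pres": "pres", "past": "past", "": []}),
        ("lookup", "t", {"da": "da-infinitive", "conn": "conn", "": []}),
    ]),
)


def evaluate(item, params):
    """Expand one element of a "set" list into a list of parameter values."""
    if isinstance(item, str):
        return [item]
    op = item[0]
    if op == "copy":
        return [params.get(item[1], "")]
    if op == "lookup":
        value = params.get(item[1], "")  # a missing parameter looks up ""
        table = item[2]
        if value not in table:
            raise RewriteError("unexpected value %r for |%s=" % (value, item[1]))
        result = table[value]
        return [result] if isinstance(result, str) else list(result)
    raise RewriteError("unknown directive %r" % (op,))


def rewrite(params, spec):
    """Return the rewritten template call as wikitext, or raise RewriteError."""
    new_name, *directives = spec
    out = {}
    for directive in directives:
        if directive[0] == "error-if":
            kind, allowed = directive[1]
            if kind != "present-except":
                raise RewriteError("unknown condition %r" % (kind,))
            extra = sorted(set(params) - set(allowed))
            if extra:
                raise RewriteError("unfamiliar parameters: %s" % ", ".join(extra))
        elif directive[0] == "set":
            start = int(directive[1])
            values = [v for item in directive[2] for v in evaluate(item, params)]
            for offset, value in enumerate(values):
                out[str(start + offset)] = value
        else:
            raise RewriteError("unknown directive %r" % (directive[0],))
    positional = [out[str(i)] for i in range(1, len(out) + 1)]
    return "{{" + "|".join([new_name] + positional) + "}}"
```

Under these assumed semantics, `rewrite({"1": "foobar", "p": "1p", "t": "conn"}, SPEC)` produces the same call as the example in the comment above, and an unfamiliar parameter or value raises `RewriteError` instead of producing output.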

@Benwing2: Hi again! Now that I have a bot these scripts would be very useful. I made a script to swap parameters in {{R:itc:EDL}} or move a numbered parameter to a named one, and realized I might have saved some effort by using your scripts instead, because it turned out to be more complex than I thought. — Eru·tuon 19:59, 23 December 2019 (UTC)Reply

Your Latin>Cyrillic edits

Hi Erutuon, I appreciate your Latin>Cyrillic edits for the terms in Turkic languages.

Just wanted to ask: are you sure those terms prior to your edits were actually typed using Latin characters? Each time I took the effort to use the actual Cyrillic characters using the respective character sets. If so, then I will have those character sets corrected.

Regards, Borovi4ok (talk) 09:07, 19 April 2019 (UTC)Reply

@Borovi4ok: I'm quite sure. My program uses regular expressions to find words with non-Cyrillic characters and a set of replacements based on the Lat2CyrMap here to automatically replace Latin characters with Cyrillic. (I also sometimes verify using a program that I paste text into to see the names of the characters.) If you have trouble finding the characters, you can use the "Cyrillic" menu under the edit box (also available here) as a reference; all the letters there are Cyrillic except in the "Transliteration" section. — Eru·tuon 09:36, 19 April 2019 (UTC)Reply
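A rough Python sketch of the kind of check described above, assuming a small hypothetical lookalike table (the real Lat2CyrMap is larger and is not reproduced here). It only touches words that mix Latin and Cyrillic letters, and `describe` plays the role of the character-naming tool.

```python
import re
import unicodedata

# Hypothetical subset of a Latin-to-Cyrillic lookalike map:
# a c e o p x y  ->  U+0430 U+0441 U+0435 U+043E U+0440 U+0445 U+0443
LAT2CYR = dict(zip("aceopxy", "\u0430\u0441\u0435\u043e\u0440\u0445\u0443"))

CYRILLIC = re.compile(r"[\u0400-\u04FF]")
LATIN = re.compile(r"[A-Za-z]")


def fix_word(word):
    """Replace Latin lookalikes in a word that also contains Cyrillic."""
    if not (CYRILLIC.search(word) and LATIN.search(word)):
        return word  # purely Latin or purely Cyrillic: leave it alone
    return "".join(LAT2CYR.get(ch, ch) for ch in word)


def fix_text(text):
    """Apply fix_word to every word in the text."""
    return re.sub(r"\w+", lambda m: fix_word(m.group(0)), text)


def describe(text):
    """List the code point and Unicode name of each character."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in text]
```

Calling `describe` on a suspicious word shows at a glance whether a letter is, say, LATIN SMALL LETTER O or CYRILLIC SMALL LETTER O.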

Thanks. I actually routinely use the "Cyrillic" menu under the edit box. So I am confused now. Can I be sure that it actually has all the correct characters in it? Borovi4ok (talk) 10:12, 19 April 2019 (UTC)Reply

@Borovi4ok: Yeah, I just checked the letters in the Cyrillic menu and they're all Cyrillic, except the ones in the Transliteration section. If you used the menu, I don't know how you could have been adding the Latin lookalikes. — Eru·tuon 19:57, 19 April 2019 (UTC)Reply

I wish there was a way to access the edit tools when using the translation adding tool. — SGconlaw (talk) 03:51, 20 April 2019 (UTC)Reply

Thanks, Erutuon! Borovi4ok (talk) 07:24, 22 April 2019 (UTC)Reply

List of inflection tags by usage?

Hey ... one of the side effects of my adding a whole bunch of inflection tags is that some pages are now running out of memory. One way to attack this is to separate the tags into more and less common ones, and only load the less common set if an unknown tag is encountered. To do this I need a list of all tags by usage; is this something you can produce? Benwing2 (talk) 00:45, 21 April 2019 (UTC)Reply

@Benwing2: Yep, see here. I had a Lua script go through the uses of {{inflection of}}, convert the tags from shortcuts to full forms if possible, and count them. — Eru·tuon 01:29, 21 April 2019 (UTC)Reply

Oops, I didn't parse HTML comments. But apart from that, it's okay. — Eru·tuon 01:30, 21 April 2019 (UTC)Reply

Restricted the list to tags from Module:form of/data. — Eru·tuon 01:46, 21 April 2019 (UTC)Reply

Thank you! Can you also make a list of all the cases of {{inflection of}} involving tags not in Module:form of/data? That way I can fix them up appropriately or add the missing tags to Module:form of/data. Benwing2 (talk) 15:38, 21 April 2019 (UTC)Reply

@Benwing2: Done. But it comes to about 3 or 4 MiB depending on the format, too large to conveniently save on-wiki. I don't do filesharing much; how do you want me to get it to you? — Eru·tuon 19:41, 21 April 2019 (UTC)Reply

Hmm, you could email it to me as an attachment; you should be able to do it using the "email this user" link on the left-hand side. Another possibility is to categorize each page by the inflection tag and make a list of each inflection tag and, under the tag, just the names of the pages containing the tag; that should be much smaller. BTW there may be a bug in your script that computed the counts above; for example, you have only 66 entries listed under "prepositional" but there should be > 40,000, since there are that many Russian nouns and each one has at least a prepositional singular non-lemma form (and usually also a prepositional plural). Similarly there should be thousands of entries under "first-person", "second-person", "third-person", "animate" and "inanimate". Benwing2 (talk) 02:52, 22 April 2019 (UTC)Reply

Yeah, you're right, those counts are way off. I'll see if I can fix it. I sent you an email. — Eru·tuon 03:30, 22 April 2019 (UTC)Reply

Fixed. It was simply that I was skipping the very first tag in each template. 🙄 — Eru·tuon 03:48, 22 April 2019 (UTC)Reply
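A minimal sketch of this kind of tag count, written in Python rather than Lua and assuming the parameter layout seen in the examples in these threads (first positional parameter is the lemma, second the display form, tags from the third onward, with |lang= as a named parameter). The `SHORTCUTS` table is a tiny hypothetical subset of the real data, and the regex deliberately ignores nested templates.

```python
import re
from collections import Counter

# Hypothetical subset of the shortcut table; the real one is much larger.
SHORTCUTS = {
    "nom": "nominative", "acc": "accusative",
    "s": "singular", "p": "plural",
}

# Matches calls without nested templates only.
TEMPLATE = re.compile(r"\{\{inflection of\|([^{}]*)\}\}")


def count_tags(wikitext):
    """Count inflection tags, expanding shortcuts to full forms."""
    counts = Counter()
    for match in TEMPLATE.finditer(wikitext):
        params = match.group(1).split("|")
        positional = [p for p in params if "=" not in p]
        # Index 0 is the lemma and index 1 the display form, so tags start
        # at index 2; starting one position later would silently drop the
        # first tag of every template.
        for tag in positional[2:]:
            if tag and tag != ";":
                counts[SHORTCUTS.get(tag, tag)] += 1
    return counts
```

An off-by-one in the starting index is easy to miss because the totals still look plausible for tags that rarely come first.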

Thanks for the email. I definitely see some tags that can be added, e.g. to better support Irish and Old Irish, as well as a lot of junk, some of which can be easily cleaned up by bot and some of which is harder to do because it's idiosyncratic. I bet though that at least 90% of the 56,980 entries can be eliminated without a lot of work. Benwing2 (talk) 04:40, 22 April 2019 (UTC)Reply

Also we'll need to do another run after I finish converting the lang-specific form-of templates to generic templates; this will hugely increase the frequency of some tags. Benwing2 (talk) 04:45, 22 April 2019 (UTC)Reply

18 rules plus elimination of empty tags plus addition of a dozen or so tags to Module:form of/data2 (which will hold the less frequent tags) leads to elimination of > 94% of the cases:

Fraction of templates with bad tags = 3165 / 56980 = 5.55%
Bad tags:
other = 1138
autonomous = 314
{{lb|ga|archaic}} = 125
Epic = 121
Attic = 107
copulative = 50
negative conjugation = 49
duoplural = 42
definite form = 41
resultative = 40
variant = 39
Doric = 37
unaugmented = 36
Verbal noun = 34
Passive participle = 32
inalienable = 32
possession = 30
(multiple possessions) = 30
indefinite form = 25
{{lb|ga|obsolete}} = 25
...

Figuring out what to do with the "other" tag will eliminate more than 1/3 of the remainder. Benwing2 (talk) 11:15, 22 April 2019 (UTC)Reply

Every one of the "other" tags comes from Polish and corresponds to the rightmost column of e.g. abonować, which is listed in the conjugation table as "masculine animate or masculine inanimate or feminine or neuter", as opposed to "masculine personal". Suggestions for how to handle this? Should I list out 'm|an//in|and|f|and|n', or 'm|nonpersonal|and|f|and|n' (with a new 'nonpersonal' tag), or 'non-masculine-personal' (perhaps handled by a special 'non-' tag), or ...? Benwing2 (talk) 11:26, 22 April 2019 (UTC)Reply

I think it would be a good idea to discuss this with Polish editors. (I'm not very familiar with Polish grammar.) — Eru·tuon 20:08, 22 April 2019 (UTC)Reply

(Notifying Hergilei, Tweenk, Shumkichi, Wrzodek, Asank neo): Hello Polish editors ... could you read the preceding paragraph? There are over 1,000 entries that use the "other" tag in {{inflection of}}. All of these are Polish past-tense forms like abonowałyśmy, where "other" means "not masculine personal", but this is far from clear without context. I'd like to replace the "other" tag with something more specific, do you agree? Benwing2 (talk) 14:16, 10 May 2019 (UTC)Reply

This is easy because we use the "nonvirile" tag in newer entries, e.g. grałyśmy, srałyśmy. Wrzodek (talk) 17:56, 10 May 2019 (UTC)Reply

@Wrzodek Thanks! This looks easy enough to implement, just other -> nv (= nonvirile), right? Benwing2 (talk) 23:19, 10 May 2019 (UTC)Reply

@Benwing2 Yes, assuming all "other" tags are in Polish entries, this should fix it once and for all. I can't find any case where "other" could not be made into "nonvirile". Wrzodek (talk) 16:17, 11 May 2019 (UTC)Reply

@Wrzodek I made the change, using my bot. Benwing2 (talk) 03:17, 12 May 2019 (UTC)Reply

Looking at the list, I believe "autonomous" is a particular verb form in Irish, so that one is legitimate. Copulative is used in Zulu and its relatives, for a special form that has the function of a copulative verb. —Rua (mew) 14:22, 10 May 2019 (UTC)Reply

List ϝείδω for etymology of εἴδομαι, εἶδον, οἶδα, and ϝοράω+ϝείδω for ὁράω.

ϝείδω and ϝοράω warrant unique inclusion, as they are the common Ancient Greek ancestors of ὁράω, εἴδομαι, and εἶδον. Their existence explains the weirdness of ὁράω, εἶδον, and οἶδα, from two common verbs of origin, and warrants an exception to the usual tendency to skip reconstructed Ancient Greek forms. Indeed ϝείδω's mention in ὁράω is very useful, and instantly explains why its imperfect is ἐώρων. Wing gundam (talk) 00:29, 25 April 2019 (UTC)Reply

@Wing gundam: I think these irregular forms can be explained without spelling reconstructed forms in Greek script. (What does ϝείδω have to do with ἐώρων? Perhaps you mean that ϝοράω explains ἐώρων?) — Eru·tuon 19:20, 26 April 2019 (UTC)Reply

Grease Pit reversions

Are you sure you got each one or should I revert everything to Rua's edit of 29 minutes ago? DCDuring (talk) 22:59, 28 April 2019 (UTC)Reply

@DCDuring: Yeah, it should be good; I actually was reverting to Rua's edit over and over. — Eru·tuon

Oops, apparently I wasn't. Fixed. — Eru·tuon 23:03, 28 April 2019 (UTC)Reply

Yes. I'd noticed no reversion after those last few. Sorry I didn't catch it and block the IP shortly after he started. DCDuring (talk) 23:05, 28 April 2019 (UTC)Reply

and vs. // etc.

Hello. I remember a while ago you wondered if we could convert uses of |and| in {{inflection of}} to |//|. I wrote a script to do that. It's careful only to combine things of the same type, and I have special exceptions for certain cases where combining doesn't make sense. The script also combines things like |nom|p|;|acc|p|;|voc|p| to |nom//acc//voc|p| and |2|p|pres|indc|;|2|p|pres|subj| to |2|p|pres|indc//subj|. A couple of issues that I'd like your input on:

1. The use of |and| can be ambiguous in how loosely or tightly it joins. There are cases like |nom|and|voc|and|dat|and|strong|gen|p| (in Modern Irish, which should be read as "(nominative + vocative + dative + strong-genitive) plural") and |def|s|and|p| (in Norwegian, which should be read as "(definite singular) + plural") and |1|s|and|3|p|aor|act|ind| (in Ancient Greek, which should be read in the obvious way). I propose to introduce the code _ to bind more tightly than //, so that the above three examples could be written as |nom//voc//dat//strong_gen|p|, |def_s//p|, and |1_s//3_p|aor|act|ind|. I'm not sure how to display this to indicate the binding, maybe nominative, vocative, dative and strong_genitive plural (with an underscore) or definite-singular and plural (with a hyphen). What do you think?
2. When you have multiple "and"'s or "//"'s, sometimes the display can be confusing, e.g. dative and ablative masculine and feminine plural, which should be read as "(dative and ablative) (masculine and feminine) plural" but might be confused as "(dative) and (ablative masculine) and (feminine) plural". I wonder if we should display them differently, e.g. dative–ablative masculine–feminine plural (with en dash) or dative+ablative masculine+feminine plural (with +) or some other way. Comments?

Benwing2 (talk) 02:19, 4 May 2019 (UTC)Reply
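A simplified sketch of the two combination steps described above, assuming a small hypothetical `DIMENSION` table that maps each tag to its category. It has none of the special exceptions mentioned in the post, so it would wrongly turn the Norwegian |def|s|and|p| into |def|s//p|; it is meant only to show the basic idea, not the actual script.

```python
# Hypothetical tag-to-dimension table (a tiny subset for illustration).
DIMENSION = {
    "nom": "case", "acc": "case", "voc": "case", "dat": "case", "abl": "case",
    "s": "number", "p": "number",
    "1": "person", "2": "person", "3": "person",
    "indc": "mood", "subj": "mood", "impr": "mood",
}


def join_and(tags):
    """Rewrite tag|and|tag as tag//tag when both tags share a dimension."""
    out = []
    i = 0
    while i < len(tags):
        if tags[i] == "and" and out and i + 1 < len(tags):
            previous = out[-1].split("//")[-1]
            following = tags[i + 1]
            if (previous in DIMENSION
                    and DIMENSION.get(following) == DIMENSION[previous]):
                out[-1] += "//" + following
                i += 2
                continue
        out.append(tags[i])
        i += 1
    return out


def combine(a, b):
    """Merge two tag sets differing in exactly one same-dimension slot."""
    if len(a) != len(b):
        return None
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    if len(diffs) != 1:
        return None
    i = diffs[0]
    dims = {DIMENSION.get(t) for t in a[i].split("//") + b[i].split("//")}
    if len(dims) != 1 or None in dims:
        return None
    return a[:i] + [a[i] + "//" + b[i]] + a[i + 1:]


def combine_all(tag_sets):
    """Repeatedly merge pairs of tag sets until nothing more combines."""
    sets = [list(s) for s in tag_sets]
    merged = True
    while merged:
        merged = False
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                combined = combine(sets[i], sets[j])
                if combined is not None:
                    sets[i] = combined
                    del sets[j]
                    merged = True
                    break
            if merged:
                break
    return sets
```

Because `join_and` refuses to join tags from different dimensions, a call like the Italian paca example (|3|s|pres|indc|and|2|s|impr|) is left untouched, which matches the suggestion to split such cases with a semicolon instead.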

BTW, I think if the proper binding can't be expressed using _ and //, the tag set should be split into multiple tag sets. For example, litear currently has {{inflection of|ligh||pres|indc|and|pres|subj|and|impr|autonomous|lang=ga}}. This could be expressed as |pres_indc//pres_subj//impr|autonomous|, but might better be expressed as |pres|ind//sub|autonomous|;|impr|autonomous|. I think this especially goes for cases like paca, which has {{inflection of|pacare||3|s|pres|indc|and|2|s|impr|lang=it}}, where the two things being joined share almost nothing; why not use {{inflection of|pacare||3|s|pres|indc|;|2|s|impr|lang=it}}? Benwing2 (talk) 03:02, 4 May 2019 (UTC)

I think it would be okay not to make the display of cases like "dative and ablative masculine and feminine plural" any clearer, and to rely on language-specific context for disambiguation. The incorrect reading "(dative) and (ablative masculine) and (feminine) plural" is technically possible syntactically speaking, but someone who understands the basic grammar of the language (or even has basic understanding of the meanings of the terms) will probably know that's nonsensical and that the correct reading is "(dative and ablative) (masculine and feminine) plural", because only cases and genders can be joined by a conjunction. But of the options given, I prefer dashes (dative–ablative masculine–feminine plural) to plus signs (dative+ablative masculine+feminine plural) because they're more commonly used in good typography. Actually the dashed version is far more readable than the version with "and".

The other issue seems more complicated, aside from the person–number labels which feel very obvious to me because they have parallel structure. I don't really like putting unusual characters like underscores in the output, but I don't have a better idea right now. Maybe the binding of "strong genitive" and "definite singular" is something that can be left to language-specific context though. (I don't have a great sense for these particular examples though.) Using underscores with a special meaning in the template code confuses me a little, because underscores are equivalent to spaces in wikilinks and page titles and they are a character in C-ish identifiers (which don't have internal semantic structure). — Eru·tuon 03:50, 4 May 2019 (UTC)Reply

Thanks for your input. I'm not wedded to underscores, my other thought is colon, e.g. def:s//p. The advantage of having *some* code like this is that the underlying template code has an unambiguous interpretation (even if the output doesn't show it), which can enable various use cases. The interpretation of either underscore or colon as a separator would be inhibited if the tag contains either a link (i.e. any of the [ or ] or | chars) or HTML (i.e. the < or > chars). It isn't necessary to inhibit interpretation of // in this fashion because // doesn't normally occur in links or HTML (which is why I chose it); this allows things like {{lb|grc|Epic}}//{{lb|grc|Attic}}, which occurs frequently.

As for your comment about the en-dash version, I agree that it's more readable than the version with "and"; maybe I'll implement this. Benwing2 (talk) 04:50, 4 May 2019 (UTC)Reply

@Benwing2: Colon does seem less confusing. I also like the idea of having the template code clearly convey the intent, though I'm not sure what non-programmers will think.

I had the idea of adding HTML so that JavaScript can find the output of these conjoined tags and change the way they are displayed. It would be sufficient to enclose each of the separators and the whole sequence with a class. Say, inflection-of-sep and inflection-of-conjoined. If the separator ends or begins with an ASCII space, I think the space has to be replaced with &#32; to prevent the MediaWiki parser from moving the space outside of the HTML tag. For nom//acc//voc, this would look roughly like

<span class="inflection-of-conjoined">nominative<span class="inflection-of-sep">,&#32;</span>accusative<span class="inflection-of-sep"><span class="serial-comma">,</span><span class="serial-and">&#32;and</span></span>vocative</span>

if the linking is omitted. Then JavaScript can iterate over each .inflection-of-conjoined element and find the child .inflection-of-sep elements and change their displayed text. — Eru·tuon 18:58, 10 May 2019 (UTC)Reply

This is a good idea. I'll implement it. Benwing2 (talk) 23:17, 10 May 2019 (UTC)Reply

Implemented. Benwing2 (talk) 03:08, 12 May 2019 (UTC)Reply

Another cleanup task: I think that in Ancient Greek at least, all the incorrectly conjoined tags from the same category in the formation tag|tag|and|tag, like nom|acc|and|voc, can safely be changed to tag//tag//tag. This list should include all the templates that need to be fixed. — Eru·tuon 07:33, 4 May 2019 (UTC)Reply

Yup, my script already handles those as well. Benwing2 (talk) 07:50, 4 May 2019 (UTC)Reply

Sweet. Oh, also your script could remove trailing semicolons. — Eru·tuon 16:09, 4 May 2019 (UTC)Reply

I'll make sure it handles that also. Benwing2 (talk) 16:14, 4 May 2019 (UTC)Reply

The script has finished running. Let me know if you see anything that's wrong. BTW for a comparison between "and" and en-dashes, see User:Benwing2/billigen vs. User:Benwing2/billigen2. You can view any page in en-dash format by locally changing the return value of export.multipart_join_strategy() to "en-dash" in Module:form of/functions, and previewing the page. Benwing2 (talk) 16:46, 5 May 2019 (UTC)Reply

I also cleaned up the remaining entries in your list above that my bot didn't handle and that you hadn't already fixed. Benwing2 (talk) 16:56, 5 May 2019 (UTC)Reply

Thank you so much! — Eru·tuon 23:24, 5 May 2019 (UTC)Reply

combining adjacent calls to `{{inflection of}}`

Hey ... sorry to see all the vandalism on your page. I wrote a script to combine adjacent calls to {{inflection of}} into a single call with semicolon separators, and then apply combination logic when sets of inflections differ along only one axis (the same thing I already did to existing calls to {{inflection of}} with semicolons in them). I am thinking of running it, what do you think? Benwing2 (talk) 01:20, 8 May 2019 (UTC)Reply

I would be very glad if you ran that on Ancient Greek entries; I was considering starting a bot to do it because WT:ACCEL doesn't yet do it. I would merge by multiple dimensions, but since Rua disagrees, it's best not to do that without a vote; on the other hand, people are unlikely to disagree with merging by a single dimension, and the templates can later be merged by multiple dimensions if that is agreed on. — Eru·tuon 19:57, 8 May 2019 (UTC)Reply

@Rua Are you dead set against having syncretism along two axes? As mentioned above, I wrote a script to combine adjacent calls to {{inflection of}} and combine syncretisms as much as possible. I first went through the latest dump and identified subsections where such combination is potentially possible (producing 442,504 subsections on 420,214 pages), and then ran my script on those subsections. The script first combines adjacent calls to {{inflection of}} that can be combined (same language, same lemma, etc.), using |;|, and then seeks to further combine tag sets that differ in a single dimension. Some stats after all combining is done:

Num tag sets seen = 691737
Num tag sets with 1 multipart tags = 342350 (49.49%)
Num tag sets with 0 multipart tags = 300938 (43.50%)
Num tag sets with 2 multipart tags =  48445 (7.00%)
Num tag sets with 3 multipart tags =      4 (0.00%)
Tag sets by ordered dimensions of multipart tags:
                                         = 300938 (43.50%)
case                                     = 169362 (24.48%)
gender                                   =  65031 (9.40%)
person                                   =  52322 (7.56%)
mood                                     =  47584 (6.88%)
case, gender                             =  44323 (6.41%)
tense-aspect                             =   5490 (0.79%)
person, mood                             =   2947 (0.43%)
number                                   =   2146 (0.31%)
person, number                           =    792 (0.11%)
gender, case                             =    318 (0.05%)
animacy                                  =    204 (0.03%)
voice-valence                            =    122 (0.02%)
state                                    =     75 (0.01%)
case, number                             =     34 (0.00%)
person, tense-aspect                     =      7 (0.00%)
voice-valence, mood                      =      7 (0.00%)
unknown                                  =      7 (0.00%)
class, case                              =      6 (0.00%)
state, case                              =      4 (0.00%)
class                                    =      4 (0.00%)
case, gender, number                     =      4 (0.00%)
grammar                                  =      3 (0.00%)
number, case                             =      2 (0.00%)
number, mood                             =      1 (0.00%)
number, gender                           =      1 (0.00%)
person, grammar                          =      1 (0.00%)
animacy, case                            =      1 (0.00%)
gender, number                           =      1 (0.00%)

What this means is that 691,737 tag sets were left after combinations were applied (where a "tag set" is a single grouping of tags representing a single inflection, and the semicolon separates tag sets), of which 342,350 (about half) had a single multipart tag in them (where a multipart tag is something like nom//acc//voc, i.e. it denotes syncretism along an axis), while 300,938 (43.5%) had no multipart tags, 48,445 (7%) had two multipart tags, and only 4 had three multipart tags. The rest of the info specifies the dimensions of the multipart tags: 169,362 (24.48%) of the tag sets had a single multipart tag along the case dimension; 44,323 (6.41%) of the tag sets had two multipart tags, with the earlier one along the case dimension and the later one along the gender dimension (this accounts for almost all the cases of two-axis syncretism); etc.

Typical examples of two-axis syncretism are like this:

abjure: inflection of abjurer:

1. first/third-person singular present indicative/subjunctive
2. second-person singular imperative

abscissīs: dative/ablative masculine/feminine/neuter plural of abscissus

BTW the only examples of three-axis syncretism come from Slovenian, like this:

glavnih: genitive/locative masculine/feminine/neuter dual/plural of glaven

Note that the above three examples rendered using en dashes (which I think looks better) are:

inflection of abjurer:

1. first/third-person singular present indicative/subjunctive
2. second-person singular imperative

dative/ablative masculine/feminine/neuter plural of abscissus
genitive/locative masculine/feminine/neuter dual/plural of glaven

(Sorry, the multiline calls to {{inflection of}} aren't getting formatted right but you get the idea.)

If we are to syncretize along only one axis at a time, how should this be done? Should we first seek to minimize the number of inflection lines (hence dat m//f//n and abl m//f//n, rather than dat//abl m, dat//abl f and dat//abl n), and then choose some ordering of dimensions? If so, what should the ordering be? In general I think the two-axis syncretisms are compact and readable and help readers to know the common syncretism patterns, e.g. the dative and ablative plural are almost always the same in Latin, which is obscured by splitting them into separate dat m//f//n and abl m//f//n lines, whereas splitting them the other way obscures the fact that the dative and ablative plural in Latin adjectives almost always have the same form for all genders.

Benwing2 (talk) 01:33, 10 May 2019 (UTC)Reply

I'm not totally against it, but clarity to the user has to come first. Keep in mind that not everything that's clear to us experienced users is also clear to new users. —Rua (mew) 09:48, 10 May 2019 (UTC)Reply

@Rua I agree. Can you comment more specifically on the examples above, whether you think they are clear, and if you'd prefer to have syncretism along only one axis, how you'd prefer it done? Benwing2 (talk) 14:11, 10 May 2019 (UTC)Reply

I find the hyphens clearer than using "and", because it makes it more clear which terms are on the same axis. But whether that is clear enough for everyone I can't say. Perhaps actual slashes, like in the code, are even clearer (i.e. first/third-person). This is probably something that needs more eyes. —Rua (mew) 14:13, 10 May 2019 (UTC)Reply

@Rua I'll see what others have to say, but for the moment I modified the examples above to use slash instead of en-dash for joining. Benwing2 (talk) 15:59, 10 May 2019 (UTC)Reply

Javascript tooling

You seem to be a JS “poweruser”. What do you recommend for adding small JavaScript-based refactoring tools? I came across meta:TemplateScript; is this any good? I have some Python scripts I use for formatting, but I always need to switch back to the terminal, copy and paste, etc.; I'd like to streamline this. – Jberkel 15:45, 9 May 2019 (UTC)

@Jberkel: Well, I created a few TemplateScript scripts, but it didn't seem quite flexible enough for everything I wanted to do, so I created User:Erutuon/scripts/CleanupButtons.js to add buttons above the textbox based on arbitrary conditions that usually have to do with the wikitext in the edit box, and have them execute callbacks when clicked. I think most of all I wanted to be able to decide when to add the buttons, rather than having a whole flock of buttons appear on every page, and to have them near the textbox so that I don't have to page up to the sidebar (since I'm using Monobook). CleanupButtons started as a part of User:Erutuon/scripts/cleanup.js, and you can see examples there. Then I moved the button-adding code because I was using it in another script and on Wikipedia. — Eru·tuon 16:45, 9 May 2019 (UTC)Reply

πολύγονον

Thanks for taking a look at my addition. I admit to being a little out of my depth when it comes to some of the finer details like the declension- I basically combined information from an entry starting with πολύ- and one ending with -γονον after checking the LSJ at Perseus and the Gaffiot entry. I also checked as many of the forms as I could get the word study tool at Perseus to show me, though for some reason I couldn't get the genitive to display.

I was wondering if we have any reference template for pages from the Naples Dioscurides. It seems to be an alphabetized condensation of De Materia Medica from the Middle Ages, but it has very nice illustrations, and it's viewable online here. If we do, it would be nice to link to folio 121 for this entry. Chuck Entz (talk) 05:31, 11 May 2019 (UTC)

@Chuck Entz: Well, you got the declension and so on right. There are probably hardly any nouns in -ον that don't belong to the -ον, -ου neuter second declension.

I'm pretty sure there isn't such a reference template; I didn't see it while recategorizing entries on plant names, and it doesn't seem to be in Category:Ancient Greek reference templates either. — Eru·tuon 23:52, 11 May 2019 (UTC)Reply

WT:NEWS

Although, https://www.unicode.org/versions/Unicode12.1.0/ did come out in May, specifically for 令和 :p —Suzukaze-c ◇◇ 02:39, 14 May 2019 (UTC)Reply

Ooh, interesting... added that character. — Eru·tuon 02:45, 14 May 2019 (UTC)Reply

rookie's question 2

Dear Erutuon! May I bother you again... it is not urgent. I am writing this little module: if a Greek word begins with x, x, x letters, then write article την.
I do not know exactly how to write them. I know, I should not use commata, and that they need U+ codes (I have them) and something like local gsub = mw.ustring.gsub. Is there a module where I can see examples? I've looked at transliteration modules, but they substitute letters which is a bit different. --sarri.greek (talk) 11:52, 15 May 2019 (UTC)Reply

@Sarri.greek: The function that you want for testing that a term begins with a Greek letter is mw.ustring.find. (string.find does not always work for Greek letters because it looks at bytes and Greek letters are two or three bytes long in UTF-8.) It returns a number (actually two numbers, but that doesn't matter in the code that you showed me) if the letter was found or nil if it was not, so it can be used in the protasis of an if-statement (if mw.ustring.find(...) then ... end or if mw.ustring.find(...) ~= nil then ... end if you want to explicitly convert to a boolean). To check if a term begins with α, you can use mw.ustring.find(term, '^α'). To check if a term begins with one of multiple characters, put them in square brackets: mw.ustring.find(term, '^[αεηιουω]') checks if term begins with a lowercase vowel letter. ^ at the beginning of the pattern forces the pattern to match only at the beginning of the term, so mw.ustring.find('τη', '^[αεηιουω]') returns nil but mw.ustring.find('τη', '[αεηιουω]') returns a number.

To avoid having to list a bunch of letters with diacritics, you can decompose the term with term = mw.ustring.toNFD(term) before using mw.ustring.find. When decomposed, for instance ά (U+03AC GREEK SMALL LETTER ALPHA WITH TONOS) becomes ά (U+03B1 GREEK SMALL LETTER ALPHA, U+0301 COMBINING ACUTE ACCENT), and mw.ustring.find(mw.ustring.toNFD('ά'), '^[αεηιουω]') will return a number while mw.ustring.find('ά', '^[αεηιουω]') returns nil.

I'm not sure if there is a good Greek module for this type of thing, but I hope this long post helps. I can give a module with examples if you need it. — Eru·tuon 19:52, 15 May 2019 (UTC)Reply
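For comparison, a rough Python analogue of the decompose-then-match check described above (not Scribunto Lua; `begins_with_vowel` is a hypothetical name). Python strings are sequences of code points rather than bytes, so the byte-length problem with string.find does not arise, but the decomposition step is still needed for accented letters.

```python
import re
import unicodedata

# Lowercase Greek vowels α ε η ι ο υ ω, written as escapes to avoid
# accidentally mixing in Latin lookalikes.
VOWEL = re.compile("[\u03b1\u03b5\u03b7\u03b9\u03bf\u03c5\u03c9]")


def begins_with_vowel(term):
    """True if the term starts with a lowercase Greek vowel letter."""
    # NFD splits e.g. U+03AC (alpha with tonos) into U+03B1 + U+0301,
    # so the bare vowel is what re.match sees first.
    decomposed = unicodedata.normalize("NFD", term)
    # re.match anchors at the start, like ^ in the Lua pattern.
    return VOWEL.match(decomposed) is not None
```

Without the `normalize` call, a precomposed ά would not match the character class, just as in the Lua example.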

ow this is wonderful: you are a great teacher. I will practice with all the instructions you gave me. Your previous help with the module that recognizes affixes, is a great hit!! We are very grateful. --sarri.greek (talk) 01:47, 16 May 2019 (UTC)Reply

I will experiment with accented letters -which will be very useful-, but in the module I will do the easy thing and reverse the rule: I will state which letters do NOT get the article την (they are just β, γ, δ, θ, φ, χ, λ, μ, ν, ρ, σ, ζ). Thank you!! --sarri.greek (talk) 01:57, 16 May 2019 (UTC)Reply

@Erutuon! itttt works! (tests are all ok) THANK you teacher: now I can do all declensions! --sarri.greek (talk) 12:49, 16 May 2019 (UTC)Reply

Akkadian IPA

Hello, could you help me out with Akkadian traditional transcription and IPA? I could use a template that could convert the transcription to IPA. Luckily, it's pretty straightforward. Each letter has a single correspondence except for e, which would have to be entered manually. – Tom 144 (𒄩𒇻𒅗𒀸) 22:09, 26 May 2019 (UTC)

Franc-Comtois

Cheers, I made a request for Franc-Comtois. --Lo Ximiendo (talk) 03:40, 3 June 2019 (UTC)Reply

@Erutuon: You left the flag at sixty-eight pixels. — This unsigned comment was added by Lo Ximiendo (talk • contribs) at 04:45, 3 June 2019 (UTC).Reply

@Lo Ximiendo: Hi, when discussing this gadget, please ping me at MediaWiki_talk:Gadget-WiktCountryFlags.css to get my attention. — Eru·tuon 16:48, 3 June 2019 (UTC)Reply

How do I find a diff I know only by number and wiki?

I know a specific edit number for a WP edit that allegedly triggered a ban of a veteran user. I'd like to see it and the context and judge for myself. I don't know what page was being edited, nor the date. If you don't know how to do this, do you have any idea where I can look? DCDuring (talk) 12:03, 12 June 2019 (UTC)Reply

@DCDuring: If I understand correctly, you can look up the diff by entering https://wiki domain/w/index.php?diff=edit number. You don't need the page name because all edit numbers on a wiki are unique. If you then want to look at the history for more context, you can note the date, click the History tab, and enter the date to view edits around that time. — Eru·tuon 18:16, 12 June 2019 (UTC)Reply
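The URL pattern above can be assembled mechanically; a minimal sketch, where the domain and revision number used in the call are placeholders and `diff_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode


def diff_url(domain, revision_id):
    """Build the index.php URL that shows the diff for one revision."""
    return f"https://{domain}/w/index.php?" + urlencode({"diff": revision_id})


# Placeholder values for illustration only.
url = diff_url("en.wikipedia.org", 123456)
```

No page title is needed in the query because the revision number alone identifies the edit on that wiki.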

Thanks. It worked perfectly. I am right now finding the confrontation that probably led to the ban. DCDuring (talk) 19:14, 12 June 2019 (UTC)Reply

CAT:E

Lots of errors in documentation pages of translit modules, which you seem to have introduced. Benwing2 (talk) 05:01, 28 June 2019 (UTC)Reply

@Benwing2: Sorry, fixed. It was a design flaw in Module:array. — Eru·tuon 06:11, 28 June 2019 (UTC)Reply

Your miracles

@Erutuon! THANK YOU. What you have taught me at this module, I applied here annnddd it works wonderfully! (el:λύση, gen.sg). Ι will expand now! You are my hero. sarri.greek (talk) 05:44, 25 July 2019 (UTC)Reply

Module:fi-pronunciation

I'm creating a new module designed to implement a template that would replace {{fi-IPA}}, {{fi-hyphenation}} and {{rhymes|fi}}. I was planning to name these Module:fi-pronunciation and Template:fi-pronunciation (after I realized {{fi-pron}} was taken for pronouns). However, the name Module:fi-pronunciation is taken too, by a module of yours that seems to be unused and meant to replace (?) Module:fi-IPA. Mind if I (eventually) take the name for my module? — sur jec tion ⟨?⟩ 20:00, 3 August 2019 (UTC)

@Surjection: That'd be fine. I created Module:fi-pronunciation by mistake when I didn't notice there was already Module:fi-IPA, so it serves no purpose. — Eru·tuon 00:12, 4 August 2019 (UTC)Reply

Example sentences in usage notes

I pressed ENTER too fast, and now my message in the edit summary of antun appears rude. What I mean is that such sentences do not look marked enough, not sufficiently set off (herausgehoben). There are also usage examples on new lines with various templates, for example in erinnern, but that seems excessive, and I imagine the templates are abused this way. None of the methods I know is satisfactory. Are there better ones? Fay Freak (talk) 22:12, 11 August 2019 (UTC)

@Fay Freak: Personally I prefer to put the notes themselves on lines without bullets and put examples in a bulleted list below. That's demonstrated here, where the previous state is very weird HTML-wise, since it contains single-item unordered lists that contain the usage note and then series of dd tags created by : containing the usage examples. Having notes in paragraph tags and examples in unordered lists (as in the linked diff) makes sense HTML-tag-wise, though I can't speak for whether it looks good or not.

In general if the examples are inline, as in antun, and in Latin script, I put them in italics rather than quotation marks, like this (and put a gloss or translation in quotes, if it's provided). That might make them stand out more visually, though it does not make it clear that the two consecutive examples are separate as the quotation marks do. — Eru·tuon 22:36, 11 August 2019 (UTC)Reply

Module:User:Erutuon/Wonderfool

You made a Wonderfool Module? That's so lame. --Gibraltar Rocks (talk) 15:38, 15 August 2019 (UTC)Reply

You're welcome! I'm glad you like it. — Eru·tuon 16:10, 15 August 2019 (UTC)Reply

I was trying to have my "revenge" by making an Erutuon Module, but I soon realised I still haven't learned Scribunto. So essentially, I lost the game. --Gibraltar Rocks (talk) 16:38, 15 August 2019 (UTC)Reply

RE diacritic automatically removed

I was under the influence of măceș. When I pass a multiple-word term as the third parameter (the alt display parameter) of the normal linking templates and use square brackets to link the separate terms, the diacritic stripping does not run, so I added {{der|ro|bg||*[[мечешки|мѐчешка]] ([[шипка|шѝпка]])}}. I could pass the same thing to the second parameter with the desired effect, so it does not make sense to use the third, but something else does not make sense either. I remember having this problem outside Bulgarian: I think Latin diacritics did not get stripped in such an environment. I only now see the pattern, and yes, the test code works with Latin content. Before, because of the described error, I thought that ѝ does not get stripped because of special handling, so that people can link ѝ. But how do people link ѝ anyway if the diacritic is stripped? That is another thing here that does not make sense. (Arguably, the page should not exist, and the content should live at и with the diacritic in the headword lines.) And I do remember that there was a discussion about characters removed from Arabic-script links, differently in Arabic and Persian; I remember the mechanism left much to be desired. Fay Freak (talk) 06:31, 21 August 2019 (UTC)

Right now the Bulgarian entry-name replacements do not allow linking to ѝ. (As you pointed out, entry-name replacements can be sidestepped by putting links in alt parameters, because alt parameters are not modified in any way. Then the links also don't point to a language section unless they are explicitly written that way: {{m|bg||[[мечешки#Bulgarian|мѐчешка]]}}.) Perhaps they should be refined to leave the accent on this word. Without hardcoding anything in Module:languages, that could be done by replacing the word ѝ with a placeholder, removing grave accents, then putting ѝ back again. Hacky, but it would work. The other option is moving the Bulgarian entry from ѝ to и – if the word is usually spelled without the grave accent, outside of teaching materials or dictionaries. — Eru·tuon 07:12, 21 August 2019 (UTC)Reply
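The placeholder idea described above (protect the word, strip the accents, restore the word) might look roughly like this in plain Lua. This is only a sketch under stated assumptions, not the code in Module:languages: the function name is hypothetical, the text is handled as raw UTF-8 bytes with the standard string library rather than mw.ustring, words are assumed to be separated by ASCII whitespace, the byte "\1" is assumed never to occur in input, and a decomposed и plus combining grave standing alone is not protected.

```lua
-- UTF-8 byte sequences, written as escapes so the sketch does not depend on
-- how the source file happens to be normalized.
local GRAVE   = "\204\128" -- U+0300 combining grave accent
local I_GRAVE = "\209\157" -- U+045D ѝ (precomposed)
local E_GRAVE = "\209\144" -- U+0450 ѐ (precomposed)
local I_PLAIN = "\208\184" -- U+0438 и
local E_PLAIN = "\208\181" -- U+0435 е
local PLACEHOLDER = "\1"   -- assumed never to occur in real input

-- Hypothetical name; strips grave accents but keeps the standalone word ѝ.
local function strip_grave_except_word(text)
	-- Pad with spaces so a word at either end is delimited like any other.
	text = " " .. text .. " "
	-- 1. Protect ѝ when it stands alone between whitespace.
	text = text:gsub("(%s)" .. I_GRAVE .. "%f[%s]", "%1" .. PLACEHOLDER)
	-- 2. Remove grave accents everywhere else.
	text = text:gsub(GRAVE, "")
	text = text:gsub(I_GRAVE, I_PLAIN)
	text = text:gsub(E_GRAVE, E_PLAIN)
	-- 3. Put the protected word back and drop the padding.
	text = text:gsub(PLACEHOLDER, I_GRAVE)
	return text:sub(2, -2)
end
```

With this, a lone ѝ comes back unchanged, while ѝ or ѐ inside a longer word loses its accent.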

Overriding Skt. adjective templates?

Hello - I added inflection templates to अल्प, but am not sure how to add the irregular masc. nom. pl. in -e. Do you know how to override the adjective templates? Hölderlin2019 (talk) 23:33, 28 August 2019 (UTC)Reply

@Hölderlin2019: Sorry, I don't know much about Sanskrit templates. Perhaps JohnC5 would know? — Eru·tuon 23:40, 28 August 2019 (UTC)Reply

@Erutuon, Hölderlin2019: So, I did not make the template {{sa-decl-adj-mfn}}, which is just the templates {{sa-decl-noun-m}}, {{sa-decl-noun-f}}, and {{sa-decl-noun-n}} concatenated together. It would be possible to add parameters like |m_nom_s=, |m_gen_s=, etc., which punch through to the |nom_s=, |gen_s= parameters of the {{sa-decl-noun-m}} template inside. Regardless, this is not how I wanted to build this template since Sanskrit adjectives sometimes decline differently from the nouns. So... yeah. —*i̯óh₁n̥C ^[5] 05:01, 29 August 2019 (UTC)Reply

2 things

Hi :)

Could you help with this?

1. Review and commit my change to TranslationAdder.js removing the balancer buttons and the reliance on trans-mid. I have used it daily since the change and have not seen any problems.
2. Add me to this list so that I can use JWB.--So9q (talk) 10:24, 9 September 2019 (UTC)

Your change to the gadget looks okay, so I'll copy it to the gadget page.

I'm just an interface admin, so I can't edit Wiktionary:AutoWikiBrowser/CheckPage. You'll have to get the attention of a real admin (sysop). — Eru·tuon 18:07, 9 September 2019 (UTC)Reply

Lua memory usage

Hi, I found this via your common.js: User:Erutuon/scripts/simpleTranslations.js. It contains this: {{[[T:t-simple|t-simple]]}} for Latin-script terms with just lang, term, and gender, to reduce Lua memory usage, using [[User:Erutuon/simpleTranslations.js|JavaScript]]

Is this still relevant? If yes, would it not be a good idea to improve the TranslationAdder.js to insert these for da, no, nb, etc.? WDYT?

I saw that some pages have sub-pages /translations to work around the Lua memory issue. Can massive use of t-simple avoid that?--So9q (talk) 10:40, 9 September 2019 (UTC)Reply

No, the translation adder shouldn't use {{t-simple}}. It's just a workaround on pages that are in CAT:E because they are using too much Lua memory. And {{t-simple}} doesn't always reduce memory enough to remove the error messages; that's why there are translation subpages. — Eru·tuon 17:05, 9 September 2019 (UTC)Reply

Community Insights Survey

Share your experience in this survey

Hi Erutuon/2019,

The Wikimedia Foundation is asking for your feedback in a survey about your experience with Wiktionary and Wikimedia. The purpose of this survey is to learn how well the Foundation is supporting your work on wiki and how we can change or improve things in the future. The opinions you share will directly affect the current and future work of the Wikimedia Foundation.

Please take 15 to 25 minutes to give your feedback through this survey. It is available in various languages.

This survey is hosted by a third-party and governed by this privacy statement (in English).

Find more information about this project. Email us if you have any questions, or if you don't want to receive future messages about taking this survey.

Sincerely,

RMaung (WMF) 14:34, 9 September 2019 (UTC)Reply

Context deprecation and red message

In {{context}}, I restored the version that does not show the long red message. The point of deprecation, as opposed to deletion, is to keep page histories legible. I did that after noticing unexpected illegibility in page histories and then tracking down its source.

I understand this was an attempt to prevent people from using the template. There is a better way, preserving history legibility: create an edit filter that is going to prevent people from saving an entry that contains a deprecated template. No one created such a filter yet and I don't know why; I fear I do not have enough user rights to edit these filters.

In any case, we have deprecation under control via Category:Pages using deprecated templates, which now contains 4 pages. I am cleaning up the category once in a while, and I remember similar counts. It is very manageable. With the edit filter, it would be even easier. --Dan Polansky (talk)

Not a bad approach to the problem. So much edit history is virtually unusable because of deprecation. DCDuring (talk) 14:47, 12 September 2019 (UTC)Reply

@Dan Polansky: I'm also generally in favor of keeping histories legible, but got a bit carried away so I added the error message. Since you are keeping an eye on the category, it makes sense to remove it. I do like the idea of an edit filter for frequently used deprecated templates, but I'm not an admin either. — Eru·tuon 00:15, 13 September 2019 (UTC)Reply

Administrator?

You do a lot of valuable work with templates and modules. Would you consider becoming an administrator? — SGconlaw (talk) 11:51, 13 September 2019 (UTC)Reply

Good idea. You would have access to more things. We wouldn't make you do more patrolling. DCDuring (talk) 13:10, 13 September 2019 (UTC)Reply

Not that I would mind if we had more people patrolling... —Μετάknowledge^{discuss/deeds} 16:51, 13 September 2019 (UTC)Reply

I'm grateful for what he does. I run into vandalism that he's undone all the time. Chuck Entz (talk) 22:23, 13 September 2019 (UTC)Reply

I'm surprised Erutuon is not an admin already! —Aryaman^A ^{(मुझसे बात करें • योगदान)} 18:50, 14 September 2019 (UTC)Reply

He's been offered the position before: see here. 31.173.87.215 18:54, 14 September 2019 (UTC)Reply

I refused before, but I guess I'd be willing now if there's something I could do with the admin tools. Perhaps protecting vandalized modules and templates and moving pages. — Eru·tuon 19:31, 14 September 2019 (UTC)Reply

Great! Let me see if I can figure out how to nominate you. (Unless someone else wants to jump in and do it first ...) — SGconlaw (talk) 19:51, 14 September 2019 (UTC)Reply

@Sgconlaw: Done. Please endorse the nomination. 31.173.83.164 12:15, 15 September 2019 (UTC)Reply

Oh, thanks, 31.173.83.164! Erutuon, you need to indicate your acceptance on the voting page. — SGconlaw (talk) 14:46, 15 September 2019 (UTC)Reply

Erroneous conversion to t-simple

Hi, I just discovered that these entries have been converted by you to t-simple because of the Lua memory bug but in a way that does not show the information about gender.

* Danish: {{t-simple|da|næse|c|langname=Danish|interwiki=1}}

This is correct:

* Danish: {{t-simple|da|næse|g=c|langname=Danish|interwiki=1}}

--So9q (talk) 11:33, 16 September 2019 (UTC)Reply

Ouch. Good catch. I'm going to have to figure out if it's better to make parameter 3 be gender, or convert these to use |g= and change my script. — Eru·tuon 18:24, 16 September 2019 (UTC)Reply

Census of parameters in {{t-simple}} from the latest dump:

|1=: 16129
|2=: 16129
|3=: 3716
|4=: 1
|alt=: 141
|g=: 323
|interwiki=: 6342
|lang=: 1
|langname=: 15341
|lit=: 1
|sc=: 66
|tr=: 317

Since |3= is so common (because of me no doubt), {{t-simple}} now accepts the gender in either |3= or |g=. I also checked and there was only one instance with both |3= and |g=, which I corrected. — Eru·tuon 20:28, 16 September 2019 (UTC)Reply

Nice! Thank you, again, again :)--So9q (talk) 20:56, 16 September 2019 (UTC)Reply

English at top

Concerning this do you have a link to a policy or vote stating this norm? I found nothing in wt:EL and other style pages I looked at.--So9q (talk) 08:05, 18 September 2019 (UTC)Reply

@So9q: From ELE:

Priority is given to Translingual: this heading includes terms that remain the same in all languages. This includes taxonomic names, symbols for the chemical elements, and abbreviations for international units of measurement; for example Homo sapiens, He (“helium”), and km (“kilometre”). English comes next, because this is the English Wiktionary. After that come other languages in alphabetical order.

Giorgi Eufshi (talk) 10:43, 18 September 2019 (UTC)Reply

OK, that makes sense. --So9q (talk) 11:18, 18 September 2019 (UTC)Reply

Admin

Congratulations! Chuck Entz (talk) 13:01, 30 September 2019 (UTC)Reply

Yeah, you are awesome and admin --Vealhurl (talk) 17:52, 10 October 2019 (UTC)Reply

Indeed, congrats! — SGconlaw (talk) 20:16, 10 October 2019 (UTC)Reply

wikt:majolica n.

Re your reversion, removal of images: The word majolica has been dogged with confusion since it is used for two distinctly different products in different countries in different periods of time. All other dictionaries than Wiktionary define it inaccurately or omit one sense of the word. Hard to believe but true. The two products, the two meanings of majolica, the two majolicas are visibly different. I feel the deleted images assist understanding and warrant an exception to the 'minimal images' rule.
Davidmadelena (talk) 23:10, 15 October 2019 (UTC)Reply

@Davidmadelena: I have no objection to illustrating the two definitions – it's just not clear to me why so many images are needed. Why wouldn't two images, one for each definition, be enough? (This is an honest question – I hadn't heard of majolica before the entry showed up in my possibly incorrect headers cleanup page.) If you could find two images that clearly illustrate the differences in the two techniques, that would be ideal. To allow people to see more images, you can create a page on Wikimedia Commons (see c:Category:Majolica) and link it from the entry using {{commons}}. — Eru·tuon 23:33, 15 October 2019 (UTC)Reply

@Eru: Overnight I had reached the same conclusion: two images to clearly illustrate the difference. Done, and thanks, much better now.Davidmadelena (talk) 10:15, 16 October 2019 (UTC)Reply

I think you mean @Erutuon: :) Eru (talk) 16:36, 16 October 2019 (UTC)Reply

@Davidmadelena, Eru: My confusing signature is to blame... — Eru·tuon 16:39, 16 October 2019 (UTC)Reply

Removing control chars

Some of these should not be removed, but rather replaced with an em dash, e.g. [1]. Equinox ◑ 21:25, 18 October 2019 (UTC)Reply

@Equinox: Oh, yeah, that makes sense. I'll go and clean up after myself. — Eru·tuon 21:36, 18 October 2019 (UTC)Reply

Template:t-simple

Regarding diff, I thought the whole point of translation subpages was that they would avoid Lua memory problems without the need for the clumsy {{t-simple}} template. That's why I've been going through them and removing it from them. If you're readding it though, then we're working at cross purposes. —Mahāgaja · talk 09:40, 24 October 2019 (UTC)Reply

@Mahagaja: I switched translations in fire/translations to {{t-simple}} because it was running out of memory. In general I'm in favor of having translation subpages use {{t}}, {{t+}} if they can without running out of memory; if fire/translations can be switched back (maybe I should make a script for this), it should be. — Eru·tuon 16:07, 24 October 2019 (UTC)Reply

Good heavens, you're right: it was running out of memory. That's kind of appalling. But I agree that using {{t-simple}} in that case is unavoidable. —Mahāgaja · talk 19:50, 24 October 2019 (UTC)Reply

Thank you

I'm extremely new to Lua. Having a solid background in JavaScript has helped me transition, but I appreciate the improvements you've offered. I just wanted to tell you that I've been working on a major update to the module script, which I've been editing offline because...Wiktionary's editor isn't as convenient as EditPad for indentation, regular expressions text search and replacement, etc.

Some background information: I know that ideally, if I can get more people to help me out with Marshallese maintenance on Wiktionary (and on Wikipedia, where I'm mostly responsible for it too), I can't just treat scripts like something I can write and maintain unilaterally. But for now, the script is still very much in flux, not just in the state of the code but in the wisdom of the coding decisions. For instance, I think I made a huge mistake embedding separate MED vs. Choi vs. Willson IPA symbols, because they don't actually represent different dialects, but merely different published researchers' occasionally conflicting phonological analyses of the language. Honestly, the state of Marshallese linguistics publications can be a bit of a mish-mash of different researchers doing their own things and not always agreeing on conventions, which has led to me occasionally having to get a tad...creative. Lately I've been asking for more peer review on w:Talk:Marshallese language to help improve the occasionally confused and OR-prone state of the article and pronunciation templates, and the scripting I write here is something I hope can eventually be used there as well where appropriate. That effort on Wikipedia, like this script and the Wiktionary:About Marshallese proposal, is still very much a work in progress. For the most part I've had to maintain it all myself, and inadequate peer review means the mistakes I make tend to become the decisive word in how the wikis describe the language, sometimes for years on end until someone (or I myself) notices the problem.

So thank you for your help with scripting and setting up some simple test cases, etc. While I'm still improving the script offline, I've made note of your improvements and am trying to add them in the offline editing before I submit and test features of a new update, all while trying not to break currently deployed invocations in the process. - Gilgamesh~enwiki (talk) 08:03, 31 October 2019 (UTC)Reply

Glad that my tinkering was appreciated. I just encountered some module errors due to outdated input in {{mh-ipa-rows}} and provided more informative module errors, and then possibly made the errors useless by removing u from the supported characters. (All the erroring instances had u.) Wiktionary:About Marshallese still needs updating though. — Eru·tuon 17:18, 2 November 2019 (UTC)Reply

Thanks again. And yeah, my bad.

Again, some background, and what motivated me to make such drastic changes today: When writing Marshallese templates on Wikipedia and Wiktionary years ago, I devised an ASCII-based symbol system loosely based on the MED phoneme transcription developed by Byron W. Bender and used in the Marshallese-English Dictionary. But in my effort to simplify it into an ASCII-inputtable system, I changed Bender's a e ẹ i notation to a e o u, since at the time we were treating the vertical vowel system phonemes as underspecified for backness or roundedness—which is true, they are underspecified for that, but at the time we were representing the phonemes using central vowel symbols /a ɜ ɘ ɨ/. But in the most recent discussion at w:Talk:Marshallese language where I asked for review by other editors to improve the quality of the language's representation and to reduce original research, it was agreed that only one of the published linguists had represented the phonemes with central vowel symbols at all, and that was Choi (1992). No one else used his ad hoc system, and it excluded one of the vowels altogether, representing only three. Other published researchers had either phonetically represented the vowels only as allophones, or echoed Bender's half century of Marshallese research using front vowel symbols (instead of central vowel symbols) to represent the underlying phonemes, which meant that the a e o u notation used before had come to make even less logical sense now. We agreed to change the way the article on Wikipedia represents the phonology. Many of those edits are still pending—I've been focusing most of my edits so far on Wiktionary because it will be most affected by these changes. 
Anyway, it was observed that before Bender started using a e ẹ i, he represented them in his earlier works from 1968 and 1969 as a e & i using an ampersand instead of ẹ, and I realized that since those four characters are still ASCII, they're as good as any symbols to represent those phonemes in modules and templates. I changed every instance I could find in the word entries, and I checked Category:E to check for stragglers, but at the time there weren't any, so I thought I'd gotten them all. Obviously, it seems I missed two of them.

But yes, the use of "o" and "u" as symbols are in the process of being retired, and I edited the parse function to no longer recognize them when I thought I'd at least already updated all the examples in the word entries. (I still need to edit Wiktionary:About Marshallese, and the examples in talk pages have the lowest priority at the moment.)

This time, I was quick to incorporate your most recent changes to the module code in my offline editing copy. But I admit...I don't understand the syntax text:gsub or what that snippet of code does. I didn't know Lua at all before a couple of weeks ago, and I've adapted to writing it much more quickly than I imagined possible, but that's thanks to where I've been able to convert my equivalent JavaScript knowledge. Though I understand your error-checking edits were to diagnose straggling "u" as the culprit, I don't fully understand what your added code actually does in regards to error message reporting. Could you please explain it, if possible? When I've gotten errors from the module, I've mainly just been browsing the stack trace and the line numbers of where the error was generated. - Gilgamesh~enwiki (talk) 18:52, 2 November 2019 (UTC)Reply

Thanks for the further explanation.

I learned JavaScript (and C) after learning Lua, so I can try to explain the colon syntax by comparison with JavaScript. In JavaScript, a.method() is a method call and passes an implicit this, equal to a, to the method. In Lua, a:method() is the closest equivalent; it passes a as the first argument to the method. a.method() would call the method with no arguments. The functions in the string library are available when a string value is indexed (via the __index field in the metatable for strings), so if text is a string, text.gsub gives a function equal to string.gsub, and text:gsub(pattern, replacement) is equivalent to string.gsub(text, pattern, replacement), and is analogous to text.replace(regex, replacement) in JavaScript. text.gsub(a, b) would fail to pass text as the first argument to the function, so is equivalent to string.gsub(a, b): a is the string, and b is the Lua pattern. (Lua will throw a runtime error because the replacement value is required: "string/function/table expected".) In JavaScript, it would be sort of similar to do { const replace = text.replace; replace.call(a, b); }.
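The equivalences described above can be checked in a few lines of plain Lua. This is just an illustrative sketch; the string "bat" and the pattern are arbitrary examples, and the exact wording of the error message varies between Lua versions, so it is printed rather than quoted.

```lua
local text = "bat"

-- Method-call syntax: `text` is passed as the first argument.
local with_colon = text:gsub("b", "c")
-- The same call written out in full.
local spelled_out = string.gsub(text, "b", "c")
print(with_colon, spelled_out)

-- Indexing a string value finds the functions of the string library.
local same_function = (text.gsub == string.gsub)
print(same_function)

-- Dot syntax does not pass `text`: this is string.gsub("b", "c"), where "b" is
-- the subject and "c" is the pattern, so the required replacement is missing
-- and Lua raises a runtime error.
local ok, err = pcall(function() return text.gsub("b", "c") end)
print(ok, err)
```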

The error messages I added were to avoid the incomprehensible error for indexing of a nil value for map[a][d] and such indexings. If local map = {} and local a = "u", then accessing map[a][anything] will cause the error "attempt to index field '?' (a nil value)" because map[a] is nil (there is no value indexed by a) and nil values can't be indexed in vanilla Lua. So I added a check that will prevent the "indexing of nil" error message, since I like error messages to be somewhat understandable (even though average users can't fix them). The error message might be wrong, since I was writing it quickly, and it's possible the check is no longer needed, if the module ensures that the transcription has correct phonotactics or syntax before that point. — Eru·tuon 20:29, 2 November 2019 (UTC)Reply
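A small plain-Lua sketch of the situation described above, assuming a toy table in place of the module's real data; the function name lookup and the wording of the custom message are made up for illustration.

```lua
-- Toy lookup table standing in for the module's real data (an assumption).
local map = { a = { x = "A" } }

-- Unguarded: map["u"] is nil, and indexing nil raises a runtime error whose
-- message does not say which input caused it.
local ok, err = pcall(function() return map["u"]["x"] end)
print(ok, err)

-- Guarded: fail early with a message that names the offending key.
local function lookup(tbl, first, second)
	local row = tbl[first]
	if row == nil then
		error("unrecognized symbol '" .. tostring(first) .. "'")
	end
	return row[second]
end

print(lookup(map, "a", "x"))
local ok2, err2 = pcall(lookup, map, "u", "x")
print(ok2, err2)
```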

Thank you. I didn't even know that calling syntax was possible in Lua, but it looks elegant. I'm tempted to use it more. - Gilgamesh~enwiki (talk) 06:52, 3 November 2019 (UTC)Reply

So, to be clear, arg:func() is syntactic sugar for func(arg), right? And arg:func(a, b, c) is equivalent to func(arg, a, b, c)? - Gilgamesh~enwiki (talk) 07:12, 3 November 2019 (UTC)Reply

Apparently it's not quite that simple... But I'd love to understand it. - Gilgamesh~enwiki (talk) 08:13, 3 November 2019 (UTC)Reply

No, not just any local or global variable can be called with method syntax. For arg:func() to work, indexing arg.func (or arg["func"]) has to yield a function. So, setting func as a field in a table with local arg = { func = table.insert } enables it to be used as a method: arg:func("elem"). (The same can be done by setting the metatable for the table: local arg = setmetatable({}, { __index = { func = table.insert } }).)

In the Scribunto variety of Lua, we can only modify the fields or metatables of tables. As mentioned, strings have a metatable that allows using the functions in the string library as methods, but it can't be modified. — Eru·tuon 15:35, 3 November 2019 (UTC)Reply
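Both ways of making a function reachable through method syntax can be tried out in plain Lua. This is a minimal sketch with arbitrary variable names; shout is a made-up function included only to show that an ordinary local is not found by method syntax.

```lua
-- A function stored as a field can be called with method syntax.
local list = { func = table.insert }
list:func("elem") -- same as table.insert(list, "elem")

-- The same effect through a metatable: `func` is found via __index,
-- so it is not stored in the table itself.
local other = setmetatable({}, { __index = { func = table.insert } })
other:func("elem")

-- An ordinary local function is not reachable with method syntax, because
-- indexing the string "x" with "shout" yields nil.
local function shout(s) return s:upper() end
local ok = pcall(function() return ("x"):shout() end)

print(list[1], other[1], rawget(other, "func"), ok, shout("x"))
```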

I see... - Gilgamesh~enwiki (talk) 23:01, 3 November 2019 (UTC)Reply

Well, I hope I'm making sense. Methods in JavaScript and Lua are pretty similar apart from the this thing and the difference between prototypes and metatables. — Eru·tuon 17:24, 4 November 2019 (UTC)

Also, if you don't mind my asking, are there any thoughts or critiques you could offer on how I structure the module code, the things I'm doing in the functions, etc.? I'm trying not to make my code too convoluted, but I'm also consciously aware I'm exercising some degree of feature creep. And when I realized you were also exporting the internal conversion functions, I changed the export naming convention so that all such functions are prefixed with an underscore to indicate they are internal functions not intended for normal exported use rather than the actual exports functions. - Gilgamesh~enwiki (talk) 19:00, 2 November 2019 (UTC)Reply

In regard to design, it would be simpler (at least conceptually, and for the testcases module) if the transcription-generating functions took a string and yielded a string, rather than an array of strings. Then multiple transcriptions can be handled by applying the functions multiple times. And it would be consistent with {{IPA}} to have the separate inputs in numbered parameters, rather than separating them with commas in a single parameter, and to bracket them separately: for instance, {{mh-ipa-rows|j&ngw&wil|jengwewil}} instead of {{mh-ipa-rows|j&ngw&wil, jengwewil}}, yielding /tʲeŋʷewilʲ/, /tʲɛŋʷɛwilʲ/ as the phonemic transcription instead of /tʲeŋʷewilʲ, tʲɛŋʷɛwilʲ/. But this might complicate {{mh-ipa-rows}} or the module, so you should be the one to decide. — Eru·tuon 18:51, 5 November 2019 (UTC)
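The string-in, string-out design suggested above could be sketched as follows. Everything here is hypothetical: to_phonemic is a stand-in that only wraps its input in slashes (it does not do any real Marshallese conversion), and transcribe_all is an invented name for the step that applies it once per numbered parameter.

```lua
-- Hypothetical stand-in for a real transcription function: string in, string out.
local function to_phonemic(input)
	return "/" .. input .. "/"
end

-- Apply a string-to-string function to each input and bracket each result
-- separately, rather than passing an array through the transcriber itself.
local function transcribe_all(inputs, transcribe)
	local out = {}
	for i, input in ipairs(inputs) do
		out[i] = transcribe(input)
	end
	return table.concat(out, ", ")
end

print(transcribe_all({ "j&ngw&wil", "jengwewil" }, to_phonemic))
```

A testcases module then only needs to compare one input string with one output string per case.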

Okay, so to be clear...calling gsub with tbl is equivalent to function(match) return tbl[match] or match end? I thought if the item wasn't in the table, it might return nil or something, which is why I wrote it as a function that returns the item or match. Also, I noticed you replaced all those substitutions with "("..V..")(ː*)%1". I was honestly not aware it was possible to reference a capture within the same pattern. - Gilgamesh~enwiki (talk) 20:40, 4 November 2019 (UTC)Reply

Yes, that's correct. Similarly, if a function supplied to gsub returns nil for a particular match, no change will be made to that match. For instance, both ("bat"):gsub(".", { ["b"] = "c" }) and ("bat"):gsub(".", function(char) if char == "b" then return "c" end end) return "cat". (Whereas in JavaScript if you do "bat".replace(/./g, function(char) { if (char === "b") { return "c"; } }) you get "cundefinedundefined". Heh.) — Eru·tuon 20:55, 4 November 2019 (UTC)Reply
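The behavior described above, together with the back-reference inside a pattern mentioned in the question, can be tried in plain Lua. This is a minimal sketch with arbitrary example strings.

```lua
-- A table replacement: characters without an entry are left unchanged.
local by_table = ("bat"):gsub(".", { b = "c" })

-- A function replacement: returning nil (here, by falling off the end)
-- also leaves the match unchanged.
local by_func = ("bat"):gsub(".", function(char)
	if char == "b" then
		return "c"
	end
end)

-- The second return value counts matches, including the unchanged ones.
local _, match_count = ("bat"):gsub(".", { b = "c" })

-- %1 inside a pattern refers back to the first capture, so "(a)%1"
-- matches a doubled "a".
local doubled = ("baat"):gsub("(a)%1", "%1:")

print(by_table, by_func, match_count, doubled)
```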

I appreciate what you've further done with the testcases, in making tests appear on the main module's page itself. And since I really didn't write any of the testcases script and am not sure what to change without breaking it, I should probably let you know that the MED/Choi/Willson stuff is not coming back. I don't know what I was thinking, putting linguists' conflicting vowel symbols in pronunciation sections as if they were different dialects—that was really unwise of me to begin with. - Gilgamesh~enwiki (talk) 12:10, 5 November 2019 (UTC)Reply

@Gilgamesh~enwiki: In case you haven't noticed, I've set up the testcases on Module:mh-pronunc/documentation to compare the outputs of Module:mh-pronunc and Module:mh-pronunc/sandbox. In each of the table cells for which the sandbox module differs, its output is shown below the output of the main module. — Eru·tuon 16:13, 6 November 2019 (UTC)

Yes, I noticed. It helps. Though I still don't quite understand how you're getting that word list programmatically, as it doesn't seem to have updated since I added new word entries on the wiki. - Gilgamesh~enwiki (talk) 16:36, 6 November 2019 (UTC)

The list of pages and template inputs isn't automatically updated; I generated it from this list of all {{mh-ipa-rows}} templates, which I made two days ago with Pywikibot. I can regenerate it soon if you like. — Eru·tuon 16:41, 6 November 2019 (UTC)Reply

Oh. Okay, that makes sense. - Gilgamesh~enwiki (talk) 17:24, 6 November 2019 (UTC)Reply

Since you've been helping me maintain the module code, I thought I should let you know that I made some major changes to the code structure. I wrote a new local function, gsubBatch, to help reduce boilerplate in the source, since gsub is called a lot and I wanted to streamline it. - Gilgamesh~enwiki (talk) 23:55, 13 November 2019 (UTC)Reply
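The thread does not show how gsubBatch is written, so the following is only a guess at one possible shape for such a helper, under the assumption that it takes a list of pattern and replacement pairs and applies them in order. The name gsubBatchSketch is deliberately different from the real function to mark it as hypothetical.

```lua
-- Hypothetical sketch, not the module's actual gsubBatch: apply a list of
-- { pattern, replacement } pairs to the text in sequence.
local function gsubBatchSketch(text, steps)
	for _, step in ipairs(steps) do
		-- Keep only the first return value of gsub (the new string).
		text = text:gsub(step[1], step[2])
	end
	return text
end

-- Each step sees the result of the previous one.
print(gsubBatchSketch("abc", {
	{ "a", "b" },
	{ "b", "c" },
}))
```

Because each replacement can be a string, table, or function, a helper like this cuts down on repeated assignments without changing what any single substitution does.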

@Gilgamesh~enwiki: I like it. You might want to take a look at this edit applying the useful behavior of the function replacement value. I think it makes the code more readable. — Eru·tuon 21:01, 14 November 2019 (UTC)Reply

My gsubBatch function may not have been as wise as I once thought. Though it makes code more elegant to read, it can actually make it harder to debug, because errors that occur inside anonymous functions don't seem to report their line numbers if they generate an error, which in a long batch makes it harder to determine where the error came from. I may find myself restructuring code again, but if a lot of sequential gsub calls are necessary, I think I'd rather reduce the length of some variable names, because the sheer amount of boilerplate can be awful. - Gilgamesh~enwiki (talk) 00:55, 18 November 2019 (UTC)Reply

@Gilgamesh~enwiki: Hmm, this should be an improvement. However, if you aren't aware, you can click the Lua error to get a backtrace (assuming JavaScript is working). — Eru·tuon 05:20, 18 November 2019 (UTC)Reply

If I adopt a gsubBatch mechanism again, I'll look into it. - Gilgamesh~enwiki (talk) 17:05, 19 November 2019 (UTC)Reply

I just noticed a strange abundance of words in the table spelt "Wiktionary:About Marshallese", with six different phonological forms. :) Also, been adding more words up to moments ago. - Gilgamesh~enwiki (talk) 17:05, 19 November 2019 (UTC)Reply

@Gilgamesh~enwiki: Yeah, I wasn't sure if you had gotten all the new transcriptions, so I ran the Pywikibot script. It prints the contents of the transclusions of {{mh-ipa-rows}} in Wiktionary:About Marshallese as well as in entries; then I have to remove the unwanted titles. I added a list of titles to exclude so that in the future the unwanted titles can be automatically removed. Perhaps alternative spelling entries could just be soft redirects using {{alternative spelling of}}, without any definition or pronunciation (because both of those are the same for all spellings). I changed M̧ajōļ to an alternative spelling entry for M̧ajeļ based on something you said in the Wikipedia discussion, but am not sure about the others. — Eru·tuon 17:18, 19 November 2019 (UTC)Reply

Yeah, the orthography takes a while to get a feel for. I'm still learning new mini-rules about it, especially recently since I started writing that script. Of the examples off the top of my head, where Bender phonemes are otherwise identical...

io̧kwe over iakwe or yokwe. io̧kio̧kwe isn't difficult from there.
eok over yuk, etc. The Marshallese new orthography, strictly speaking, has no Y.
jukwa over juga. The new orthography has no G, either. Just AĀBDEIJKLĻMM̧NŅN̄OO̧ŌPRTUŪW.
wōja over oja, and similar examples.
Wūjae over Ujae, and similar examples.
I'm not 100% sure whether Bok-ak or Bokaak should be considered primary. I'm guessing Bok-ak, because Bokaak unusually spells out an epenthetic vowel that the new orthography largely avoids.
Between spaces, hyphens and unspaced unhyphenated compound words, there's really no difference in pronunciation, so just one can be picked from multiple. Multiple words undergo assimilations in uninterrupted speech, and individual morphemes of words can be enunciated as needed. The logic of that is...a work in progress; I'm still trying to reconcile the differences between normal vowels and epenthetic vowels when they neighbor glide consonants {y h w}. Anyway, I'd probably go with unhyphenated words over hyphenated ones, and hyphenated words over spaced words.
Note overall that as I've written vowel simplifications into the module, I've largely been following orthographic norms in deciding which surface vowel to express. And I've been trying to leave notes as to "{this} is [that], not [that]", etc.

And thank you again. :) - Gilgamesh~enwiki (talk) 19:49, 19 November 2019 (UTC)Reply

And Jāmo̧ over Jemo̧. - Gilgamesh~enwiki (talk) 22:35, 19 November 2019 (UTC)Reply

Efficiency

I may have significantly increased the module's execution time, which may be extending table load times. I changed it so that forRemainder is actually (pretty much unconditionally) called twice and the duplicate result discarded. This is for careful mode (variable name subject to change), to satisfy inconsistencies between the way Bender (1968) and Willson (2003) described the language, and the more careful pronunciations prescribed by Naan (2014). Basically, in careful mode, the nasal consonant cluster assimilations are avoided, there's a handful more cases where clusters have epenthesis instead of assimilation, and the behavior of epenthetic vowels neighboring glides has changed. I don't necessarily see an inconsistency in including both, since most languages (including English) have words or phrases that differ notably in pronunciation when spoken more rapidly or more slowly, and can change how people perceive the word in their own speech. Compare "ornge" vs. orange, where some people primarily speak it as two syllables, and some (like me) say it as one syllable. - Gilgamesh~enwiki (talk) 20:02, 20 November 2019 (UTC)Reply

Yes, execution time is definitely way up according to the "Lua time usage" measurement (at the bottom of the edit page). According to the profile as I am writing this, 4160 ms (85.2%) of that is mw.ustring.gsub. It's not a very efficient function because it's implemented using PHP regex and calls go over the Lua–PHP boundary. Sometimes the number of calls can be reduced by generalizing the patterns (regexes) and using a function replacement. — Eru·tuon 20:17, 20 November 2019 (UTC)Reply
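As a rough illustration of the point about generalizing patterns: the sketch below compares one substitution call per rule with a single call that uses a general pattern and a function replacement. The rule table, the consonant set and the function names are made up for the example and are not taken from Module:mh-pronunc. It uses the plain string library so that it runs anywhere; mw.ustring.gsub accepts the same kinds of replacement argument.

```lua
-- Hypothetical assimilation rules, keyed by two-consonant cluster.
local ASSIMILATIONS = {
	np = "mp",
	nk = "Nk",
	tn = "nn",
}

-- Many calls: one gsub per rule.
local function assimilate_many(text)
	for cluster, replacement in pairs(ASSIMILATIONS) do
		text = string.gsub(text, cluster, replacement)
	end
	return text
end

-- One call: a general pattern plus a function replacement.
local CONSONANT = "[ptkmnN]"
local function assimilate_once(text)
	local pattern = "(" .. CONSONANT .. ")(" .. CONSONANT .. ")"
	return (string.gsub(text, pattern, function(c1, c2)
		-- A nil result leaves the matched text unchanged.
		return ASSIMILATIONS[c1 .. c2]
	end))
end
```

With three rules the saving is small, but the number of calls in the second version stays at one however many rules are added to the table.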

By the way, I like how the "careful" mode avoids assimilations. Assuming Arņo is a native word, it seems strange for the r to be assimilated into a ņ, when the only reason for the r to be in the spelling is if it is sometimes pronounced. Otherwise, it should be Aņņo. Similarly with Aujtōrōlia, which could be Auttōrōlia, though since it's a loanword and the j might be needed to represent the original s, it's not very strong evidence against assimilation. — Eru·tuon 20:44, 20 November 2019 (UTC)Reply

Youch... So would it actually be more efficient to pass a function substitutor argument than a string substitutor argument? I'm all for increasing the efficiency of the script by whatever practical means available. It is also my very first Lua script.

And yes...Marshallese orthography has always been a strange creature. The new orthography since the 1970s is not purely phonemic, obviously, if you compare it with Bender's phonemes, but is designed so that syllables in isolation are reasonably easy to learn how to pronounce once you learn which sound each letter stands for, and is something foreigners (most of whose languages do not have vertical vowel systems) can more easily learn to pronounce. Native speakers of the language already know words in isolation, and know how to string them together into compound words and sentences, so their orthography can simply string together morphemes and allow epenthesis, sandhi, assimilations, etc. to take their natural course. In this way, it also preserves the morphemic structure and thus more of the etymology of words, in an orthographic approach also preferred in languages like French and Icelandic. Arņo is a compound name of two morphemes: ar "lagoon beach" and ņo "wave". If you simply write the assimilations and write it Aņņo, the etymology is relatively more obscured. What seems to be relatively new to the equation is learning how to pronounce words as they are written in a stable orthography already provided. This means that some consonant clusters that were previously routinely assimilated, may now be enunciated more carefully by people who have learnt to read and write at school. Spellings like kw increasingly are no longer taken as single consonant phonemes, but as sequences of k and w. Two-syllable words like io̧kwe may instead come to be analyzed as three-syllable words because of how they are written. rn is pronounced as two different consonants because it is written that way. I've seen evidence of these trends in the pronunciation guides prescribed by Naan (2014), my discovery of which led me to rethink how to write the Lua module. 
I honestly can't say I know how realistic these "careful" pronunciations are among native Marshallese speakers (some of it may well be more artificial than not), but it certainly seems to be increasingly how Marshallese is taught, at least in a college environment. If only we had more access to more native Marshallese speakers, but internet access is too expensive and unreliable for most of the population. (I'm impressed that the undersea fiberoptic cable connecting Majuro to Guam manages to span the Marianas Trench.) - Gilgamesh~enwiki (talk) 22:09, 20 November 2019 (UTC)Reply

I just noticed you made changes to the script. I haven't fully assessed the changes yet, but I've seen just enough to pique my interest. - Gilgamesh~enwiki (talk) 22:31, 20 November 2019 (UTC)Reply

Yeah, I think a function substitution can be more efficient. The function replacement handling assimilation is slightly faster, if the "Lua time usage" figures for the "before" and "after" versions of the module are accurate. (But sometimes the figures vary unpredictably. Greater differences are less likely to be the result of chance.) It means only one mw.ustring.gsub call to handle all assimilations, and perhaps the overhead of calling a function for every series of two consonants is less than the overhead of multiple calls to mw.ustring.gsub. I think that's plausible because of all that PHP has to do for each mw.ustring.gsub call.

I didn't realize Arņo was a compound (naturally, since I'm pretty ignorant). That does provide an explanation for the spelling, even if there's assimilation. — Eru·tuon 22:35, 20 November 2019 (UTC)Reply

Is it all right if I rename the substitutor function's variable names? Not just because I generally start non-constant variable names with a lowercase letter, but C2 already exists as a separate higher scope variable, and using a different variable name may reduce the risk of variable name confusion and make the code more readable.

And s'fine. A lot of common Marshallese morphemes are only two letters long, and there was no Wiktionary Marshallese entry for ar yet anyway. - Gilgamesh~enwiki (talk) 22:42, 20 November 2019 (UTC)Reply

Yeah, the variable name duplication is not a good idea. I noticed it and was displeased. I do prefer somewhat descriptive variable names over "a, b, c, d" though. — Eru·tuon 22:47, 20 November 2019 (UTC)Reply

I tend to think of captures as a, b, c, d as a sequence of captures, and easier on the eyes than letter-numbering them like c1, c2, c3, c4, etc. Anyway, I think I know what you're trying to accomplish. Your code broke some of the (as of yet unused) nʷtˠ logic, but what you're doing here looks very, very clever and I think I know how to take it and run with it with other parts of the code. - Gilgamesh~enwiki (talk) 22:58, 20 November 2019 (UTC)Reply

Well, the variable names C1, A1, C2, A2 were abbreviations of "consonant 1", "articulation 1", "consonant 2", "articulation 2" (though that's not completely accurate terminology, since it's more like primary and secondary articulation), so more descriptive than either a, b, c, d or c1, c2, c3, c4. — Eru·tuon 23:03, 20 November 2019 (UTC)Reply

I've thought of it: x, xx, y, yy. It helps that neither X nor Y are in the standard new orthography. And when I realized what you were doing, I rewrote your function. May I demonstrate...? - Gilgamesh~enwiki (talk) 23:36, 20 November 2019 (UTC)Reply

Ahh, that's much more readable! — Eru·tuon 01:20, 21 November 2019 (UTC)Reply

Thanks. :D And I'm not even done yet. You gave me the idea, and I'm running with it. About to try another edit. - Gilgamesh~enwiki (talk) 02:14, 21 November 2019 (UTC)Reply

In response to your question, "Why did the epenthetic vowel disappear between the p and the k in Āneeļļapkaņ?", the pattern is not matching the /pʲkˠ/ when mw.ustring.gsub is called the second time, because /lˠlˠ/ is not changed when mw.ustring.gsub is called the first time, and is matched both times. Here is a technique for cases like this that also allows mw.ustring.gsub to be called only once. (Gah, in the edit summary I meant to say "getting the surrounding consonants with mw.ustring.sub", not "mw.ustring.gsub".) — Eru·tuon 02:54, 21 November 2019 (UTC)Reply
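A minimal sketch of the general idea, not the actual edit: match one consonant at a time, capture the position just after it with an empty capture `()`, and inspect the neighbouring character with string.sub. Because each match consumes only a single consonant, overlapping clusters are all seen in one pass. The consonant set, the `@` placeholder and the rule itself (epenthesis between two different consonants, none inside geminates) are assumptions for illustration, and the example uses ASCII so that byte positions and character positions coincide.

```lua
local CONSONANT = "[ptklmn]"

local function add_epenthesis(text)
	return (string.gsub(text, "(" .. CONSONANT .. ")()", function(consonant, next_index)
		-- Look at the character right after the match in the original string.
		local following = string.sub(text, next_index, next_index)
		if following ~= consonant and string.find(following, "^" .. CONSONANT .. "$") then
			return consonant .. "@"
		end
		-- Returning nothing keeps the match unchanged.
	end))
end
```

A pattern that consumed both consonants at once would turn `apkta` into `ap@kta` in one pass and need a second pass for `kt`; the version above produces `ap@k@ta` directly.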

Your solution with the i and j indices was clever. (I renamed them xvi and yvi.) It all...seems to work now. Now let's see if I can rewrite the logic of another expensive regex batch without breaking it too badly.

Oh, and...the table's Rālik vs. Ratak logic seems reversed. When both forms are the same, it shows two table cells. But when the forms differ, it only shows the Ratak form. - Gilgamesh~enwiki (talk) 03:13, 21 November 2019 (UTC)Reply

How much time do you think was shaved off the module's execution, comparing right after I added "careful" mode to when we rewrote this regex batch? - Gilgamesh~enwiki (talk) 03:15, 21 November 2019 (UTC)Reply

Whoops, fixed the logic. Glad you spotted it.

It is apparently somewhat faster; I previewed Module:mh-pronunc/documentation three times with the old version and the new version, and got 5.3 or 5.4 or 7.1 seconds and 4.5 or 4.6 or 3.0 seconds respectively. Significant variation, so it's hard to say just how much faster, but there wasn't overlap. The number of calls to mw.ustring.gsub in Module:mh-pronunc in the generation of the testcases table (counted thus) has been reduced from 228,294 to 156,516.

We should probably be editing Module:mh-pronunc/sandbox to avoid changing transcriptions in entries (and avoid asking the server to update pages).... — Eru·tuon 07:18, 21 November 2019 (UTC)Reply

So, edit sandbox for experimental code, and the main module for stable milestones? Yeah, I can see how that's a good idea. - Gilgamesh~enwiki (talk) 13:36, 21 November 2019 (UTC)Reply

I've been considering an alternative approach to programming the phonetic algorithm. As it currently stands, the regex approach is effective in thoroughly processing the input text, but it's also proven a lot more inefficient than I predicted. Putting more logic into substitutor functions improves the performance somewhat, but in a process where regex replaces matches one by one, it's not as practical in making necessary adjustments to vowels that were already replaced. For example, this existing code:

				-- {yekʷey, yewan} are [ɛɡʷɛ, ɛwɑnʲ], not [ɛ̯ɔɡʷɛ, ɛ̯ɔwɑnʲ]
				text = gsub(text,
					"(ɦʲ@*)([ɔou])(@*.ʷ.?ʷ?@*[æɛeiɑʌɤɯ])", function(a, b, c)
						return a..VOWELS_Y[b]..c
					end)

Unlike other logic that replaces text based on what already exists to the match's left-hand side, this replacement can only be made if the stable value of the vowel on the right is already known. This is how I earlier solved the Ānewātak problem so that its phonetics were properly displayed as [ænʲeːwæːtˠɑk] instead of [ænʲeowæːtˠɑk]. In a more optimized approach, that could be fixed in a second regex pass, but I think I have a better idea—I just don't know beforehand how practical it will be.

Basically, my idea is, instead of relying so much on regex, just parse the input text and represent its data as a doubly linked list of table objects, where each node represents either a consonant or a vowel. Code could loop through the link nodes, make changes in them informed by nodes that come before or after, and can make secondary changes to previous node data as needed. Then, when the linked list is done being manipulated, convert it back to text.

But can this all be done in Lua using only linked lists and logic, more efficiently than batches of regex replacements can do it? - Gilgamesh~enwiki (talk) 18:46, 22 November 2019 (UTC)Reply
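To make the proposal concrete, here is a minimal sketch of what such a structure might look like. Everything in it is hypothetical: the node fields, the function names and the single toy rule are assumptions for illustration, and it treats each byte of an ASCII string as one segment rather than handling the module's real internal format.

```lua
-- Parse a string into a doubly linked list with one node per character.
local function parse(text)
	local head, tail
	for char in string.gmatch(text, ".") do
		local node = {
			value = char,
			is_vowel = string.find(char, "[aeiou]") ~= nil,
			prev = tail,
		}
		if tail then
			tail.next = node
		else
			head = node
		end
		tail = node
	end
	return head
end

-- Toy rule: a vowel with "w" on both sides becomes "o".
-- Each node can look at, and modify, the nodes before and after it.
local function round_vowels(head)
	local node = head
	while node do
		local prev, next_node = node.prev, node.next
		if node.is_vowel and prev and next_node
			and prev.value == "w" and next_node.value == "w" then
			node.value = "o"
		end
		node = node.next
	end
	return head
end

-- Convert the list back to text once all rules have run.
local function to_text(head)
	local parts = {}
	local node = head
	while node do
		parts[#parts + 1] = node.value
		node = node.next
	end
	return table.concat(parts)
end
```

Whether this ends up faster than batches of pattern replacements would have to be measured; the sketch only shows the shape of the data.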

I'm not sure, but I think it could end up being faster because the overhead of many mw.ustring.gsub calls is considerable. It could also reduce memory because fewer intermediate strings would be created. But I'm speculating.

I haven't done anything quite like this; the closest thing is the pair of functions make_tokens in Module:grc-utilities and tr in Module:grc-utilities. The former processes Greek characters into "tokens" (sub-sequences, mainly to handle diphthongs and single vowels correctly), and uses objects to represent the characteristics of the Greek characters, and the latter processes the tokens to create a transliteration. Not super elegant, but my version of the tokenization function was much faster than the previous one, probably because it got rid of most of the calls to mw.ustring functions.

Using a doubly linked list is an interesting idea. It could be more elegant, though I can't imagine all the details of how it could work. — Eru·tuon 03:24, 24 November 2019 (UTC)Reply

Well, practically any grc script has to be easier to maintain than the pre-Scribunto version, which I wrote back in the day. That was such a beast... - Gilgamesh~enwiki (talk) 14:37, 24 November 2019 (UTC)Reply

Wait...you said mw.ustring functions were inefficient. Does that include mw.ustring.sub? - Gilgamesh~enwiki (talk) 14:40, 24 November 2019 (UTC)Reply

mw.ustring.sub is noticeably inefficient when there are many calls, for instance when you iterate through strings using for i = 1, mw.ustring.len(str) do local character = mw.ustring.sub(str, i, i) end. In the previous version of the tokenization function, mw.ustring.sub was called up to about three times for every code point in the string. My impression is that that explained most of the inefficiency in the old version of the function, though it's not a great testcase because the old and new versions are so different. The overhead is probably not as noticeable in the function replacement in Module:mh-pronunc though, where it currently has only 2,028 calls, as opposed to 115,872 for mw.ustring.gsub to create the testcases table. (And I guess mw.ustring.gsub probably has greater overhead.) It's not so inefficient that the function should be avoided altogether.

I should say, the module is already efficient enough in entries (it looks like {{mh-ipa-rows}} takes about a twentieth of a second in entries), so don't feel obligated to remodel it for that reason at least. (Not to discourage you from rewriting it if you want to – I do quite a bit of random rewriting of modules for various reasons.) — Eru·tuon 23:08, 24 November 2019 (UTC)Reply

It's not just Wiktionary I have to think about. I also want to be able to migrate the code to Wikipedia. Most WP articles where it would be relevant might need the entry only once, but articles like Kwajalein Atoll provide Marshallese names for many islets, most of which are not notable enough to get separate articles of their own. And some of these islands have two or three separate Marshallese names depending on context. Obviously, being WP, pronunciations aren't embedded in the same format as Template:mh-ipa-rows, and perhaps that means fewer functions called, but toPhonetic would certainly be called multiple times in an article like that. I'd rather not add that much extra load time there. - Gilgamesh~enwiki (talk) 00:12, 25 November 2019 (UTC)Reply

Also, as I've tried to write linked list code, I'm realizing that I'm still creating a beast of a different kind: far fewer mw.ustring calls, but immensely more bloated code. I get the impression that functions like mw.ustring.sub are so expensive because the strings are probably encoded in UTF-8, but logic required to seek codepoint indices—or worse, conceivably to convert between UTF-8 and UTF-16 and back—may involve a lot of overhead if called often enough (I'm not sure which, if any of these things, is actually being done). Obviously we're working with a lot of Unicode text and the data needs to be preserved in that format.

I wonder...what if I completely redesign the internal code format (returned by parse and passed to the other internal functions) to use only ASCII surrogates and byte-based string functions for the text-crunching, and then convert them to Unicode forms to represent their final forms? Are there also byte-based functions available for regex that are more efficient? - Gilgamesh~enwiki (talk) 00:12, 25 November 2019 (UTC)Reply

I just had a thought. Many calls to mw.ustring.sub can be expensive, right? But most of the time I only need a single Unicode character. What if I...split a string into an array of characters first, and just reference the array's indices? No dynamic linear behavior involved in retrieving an indexed Unicode code point from a byte string. - Gilgamesh~enwiki (talk) 02:00, 25 November 2019 (UTC)Reply

Hm, yeah, maybe some Wikipedia articles could invoke the module enough to noticeably increase Lua time usage. There are quite a few words in Kwajalein Atoll that could have IPA transcriptions.

I certainly hope mw.ustring.sub doesn't do any conversion between UTF-8 and UTF-16. That would be madness. I found that the implementation of mw.ustring.sub calls mb_substr in PHP, which calls mbfl_substr, but I didn't figure out what it does to UTF-8.

The byte-based functions are the string library functions (the ones that can be called as methods on strings). They are much more efficient because they call directly into C and don't have to deal with UTF-8 or Unicode categories. But using ASCII replacements for the Unicode characters sounds like a bit of a pain; it could make the intermediate forms a bit harder to understand.

Yeah, using an array of characters should be cheaper if you're calling mw.ustring.sub to get multiple characters from the same string. To be super cheap, I would use string.gmatch:

function get_character_array(str)
	local arr, i = {}, 1
	-- Each match is one UTF-8 character: a lead byte (or ASCII byte)
	-- followed by any continuation bytes.
	for char in string.gmatch(str, "[%z\1-\127\194-\244][\128-\191]*") do
		arr[i] = char
		i = i + 1
	end
	return arr
end

— Eru·tuon 05:38, 25 November 2019 (UTC)Reply

I'm increasingly wondering if UTF-16 isn't involved under the hood at all. But then, Unicode code point operations on UTF-8 data still means that the functions cannot know in advance which byte index contains which code point index, which means that it has to measure from the start of the string. That means linear behavior, and that isn't much better than converting the whole string to UTF-16.

Anyway, the string-to-character-array code I had in mind was mw.text.split(text, ""), called only once before a major mw.ustring.gsub operation whose substitutor function would have otherwise needed mw.ustring.sub multiple times per match. I hadn't considered your string.gmatch approach before, but it looks interesting—might there be a way to expand it to work with three- and four-byte UTF-8 code points?

And yeah, trying to find an ASCII-based surrogate code has proven...challenging, to the point I think maybe I won't do it. I tried to design a Unicode-to-ASCII-to-Unicode cipher mostly based on X-SAMPA, but it had its constraints, and a lot of X-SAMPA sequences use two or more ASCII characters where Unicode IPA would only use one code point. It's fortunate I'm pretty knowledgeable in X-SAMPA, which greatly improved since I wrote an offline JS utility (downloadable here) that automatically converts X-SAMPA input to IPA as you type. (I wrote it several years ago, and my coding conventions have certainly improved since then, so don't be too horrified if you view source. If I could write the identical utility today, there would be so many things I'd change. But I digress.) So, to try to come up with a one-code-point-to-one-character cipher, I had to think of ways to simplify some sequences. [æɛeiɑʌɤɯɒɔou] already has a one-to-one conversion with {EeiAV7MQOou, but when writing regex sequences, { would have to become %{, so I could just replace it with a instead. The secondary articulations are where it gets trickier, as the equivalents of [ʲ ˠ ʷ] are ' _G _w. Since I only use [w] as a final phonetic presentation form, I could conceivably just use j G w, but it's again complicated where the X-SAMPA equivalent of [ɦ] is h\. Lots of these little things call for lots of little simplifications, until you get to the point where the internal string /ɦʲænʲeɦʲelˠlˠæpʲkˠænˠ/ (Āneeļļapkaņ) has a pseudo-X-SAMPA appearance of hjanjehjelGlGapjkGanG, and...I end up kinda not wanting to go that route anymore. Regex and the algorithm can already get complex enough without making the internal IPA so much harder to read. - Gilgamesh~enwiki (talk) 16:27, 25 November 2019 (UTC)Reply

Oh, just now realized that your "[%z\1-\127\194-\244][\128-\191]*" does support three- and four-byte code points. - Gilgamesh~enwiki (talk) 16:36, 25 November 2019 (UTC)Reply

Wait, your example code just grows an array by assigning new indices to the end of it? That seems bad to me from a JS background, where an array becomes much more inefficient unless you grow it with array.push(element). You sure that doesn't hurt array storage efficiency on the JIT side? (Or does Scribunto/Lua not use a JIT anyway?) I'd probably find myself writing it with push's Lua equivalent, table.insert. - Gilgamesh~enwiki (talk) 16:41, 25 November 2019 (UTC)Reply

Huh... Okay, then, your approach is better. :) - Gilgamesh~enwiki (talk) 16:44, 25 November 2019 (UTC)Reply

Hm, is it generally safe (and hopefully performs better) to use byte-string-based regex functions on UTF-8 strings in situations where it doesn't have to care how the Unicode code points are encoded? UTF-8 searches, UTF-8 replacements, etc. It seems to me like it would only really get unsafe if you tried to mix non-ASCII characters into single-character regex logic ([xyz] x? x* x+ etc.), as it would test for the byte rather than the codepoint. But stuff like simple substring replacements and multi-character captures (xyz) could be fine even with UTF-8 code points included. - Gilgamesh~enwiki (talk) 17:02, 25 November 2019 (UTC)Reply

table.insert isn't any more efficient than t[i]. As mentioned in the link, it's actually slower because of the two meanings that table.insert has (table.insert(t, val) vs. table.insert(t, i, val)). Scribunto doesn't use LuaJIT. It would probably improve performance to allocate the entire array at once with { nil, nil, nil, ... }, but that requires knowing the number of code points and having a function that can return that many nils.

Yep, those are two cases in which the string library doesn't work with multi-byte characters; also several of the character classes like %s are Unicode-dependent in the mw.ustring library. I wrote a little about this at WT:LUA § Ustring patterns and created Module:User:Erutuon/patterns, which contains a function that tests whether a pattern will match correctly (according to UTF-8 and Unicode semantics) in the string library functions.

I imagine that converting UTF-8 to UTF-16 and back requires memory allocation, so there should be a significant performance penalty if mw.ustring.sub is implemented that way. Certainly indexing UTF-8 by code point is slower than byte indexing, but I imagine with this decoding technique it could be fairly fast. — Eru·tuon
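The safe and unsafe cases discussed above can be checked with a few lines of standard Lua. This is only an illustrative sketch: the strings are written as byte escapes so the result does not depend on how the source is saved, and the variable names are arbitrary.

```lua
local E_ACUTE = "\195\169" -- "é" (U+00E9) encoded as two UTF-8 bytes
local E_CIRC = "\195\170"  -- "ê" (U+00EA)
local word = "caf" .. E_ACUTE

-- Safe: a literal multi-byte sequence is matched byte for byte.
local plain = string.gsub(word, E_ACUTE, "e")

-- Safe: a capture that spans whole characters.
local captured = string.match(word, "f(" .. E_ACUTE .. ")$")

-- Unsafe: inside [...] each byte is a separate set member,
-- so the two bytes of "é" are replaced one by one.
local in_class = string.gsub(word, "[" .. E_ACUTE .. "x]", "_")

-- Unsafe: "?" applies only to the last byte of the sequence,
-- so the lead byte of "ê" is matched and removed on its own.
local optional = string.gsub(E_CIRC, E_ACUTE .. "?", "")
```

Here `plain` is `cafe` and `captured` is the two-byte `é`, while `in_class` contains two underscores where one character was and `optional` is left with a stray continuation byte, which is no longer valid UTF-8.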

I've given the theoretical Unicode-to-ASCII-pseudo-X-SAMPA cipher more thought, and I believe if I were to use it, it would look something like this:

p	b	t	d	z	k	ɡ	m	n	ŋ	r	l	ĭ	ī	ɣ	ɦ	ɧ	_	ʲ	ˠ	ʷ	æ	ɛ	e	i	ï	ɑ	ʌ	ɤ	ɯ	ɒ	ɔ	o	u	◌̯	ː	◌͡◌
`p`	`b`	`t`	`d`	`d`	`k`	`g`	`m`	`n`	`N`	`r`	`l`	`y`	`Y`	`H`	`h`	`H`	`_`	`j`	`G`	`w`	`a`	`E`	`e`	`i`	`I`	`A`	`V`	`7`	`M`	`Q`	`O`	`o`	`u`	`^`	`:`	`=`

Because, on second thought, hjanjehjelGlGapjkGanG is rather hard to read, but then, so is /ɦʲænʲeɦʲelˠlˠæpʲkˠænˠ/. These are internal formats, not display formats (even the internal IPA is pseudo-IPA), and at least X-SAMPA is well documented enough for a pseudo-X-SAMPA approach to be viable. I'm still working with code ideas offline. - Gilgamesh~enwiki (talk) 21:23, 26 November 2019 (UTC)Reply

I've tried a variety of coding approaches, and I'm realizing there may be no real substitute for batches of regex. Regexp can be written fairly concisely, and the more bloated code becomes, the harder it is to read. And after multiple attempted rewrites, I've found that I've stopped writing comments to reduce mental gear-shifting. Well-written code doesn't need many comments anyway. I just want to write something that balances readability with efficiency. Fortunately, I've had decent success with the pseudo-X-SAMPA approach in concept, and I can minimize the use of UTF-8 regex functions and rely more on faster functions like string.gsub. (At least I hope it's faster...) - Gilgamesh~enwiki (talk) 08:16, 2 December 2019 (UTC)Reply

This revision does seem to be noticeably more efficient than this: about 1.7 seconds versus 2.7 or so. Since some of that is the less efficient Module:mh-pronunc, which should account for about half of the 2.7 seconds, I guess the sandbox module takes 1.7 − (2.7 / 2), or about 0.4 seconds. But there is a tradeoff between efficiency and readability. 20:34, 2 December 2019 (UTC)

I wonder...how are Lua's regular expression functions implemented? string.gsub, string.find, etc. I cringe to think that the engine has to compile a new regex edifice every time the regex code is passed to one of these functions. I hope they are at least being cached between calls, either in an internal hashtable or attached to the internalized pattern strings themselves. - Gilgamesh~enwiki (talk) 02:08, 3 December 2019 (UTC)Reply

Since Lua patterns are so much simpler than proper regular expressions, they're just interpreted. You can see the pattern-interpreting function used by all of the string-library pattern-matching functions, except string.find when the plain flag is set, here. — Eru·tuon 04:15, 3 December 2019 (UTC)Reply

I see... I hadn't considered that. Keeping it simple means implementing it simple. - Gilgamesh~enwiki (talk) 04:27, 3 December 2019 (UTC)Reply

I finished writing the new draft and ironing out the bugs, and replaced the non-sandbox version with it. How does the performance compare now with the previous version? - Gilgamesh~enwiki (talk) 21:32, 5 December 2019 (UTC)Reply

Wow! Considerably faster for the whole testcases table: less than half a second. — Eru·tuon 22:52, 5 December 2019 (UTC)Reply

Seems like a winner, then. And the code is readable? The pseudo-X-SAMPA isn't too much trouble? I had to deviate significantly for some symbols, like c J h H y Y a I @ which do not represent their conventional X-SAMPA counterparts, for the sake of being more regex-pattern-friendly and single-character-friendly. The way I use them, c is actually [t͡s], J is [d͡z], h and H are transitional representations of unsurfaced and surfaced glides, y is {yi'y} ([i̯]), Y is {'yiy} ([iː]), a is [æ] ({ isn't as readably regex-friendly), I is a dotless [ı] that is friendlier to IPA tie bars, and @ is the diacritic [◌̆]. Otherwise (unless I've forgotten any), the symbols are the same as their X-SAMPA counterparts (or _-notated forms thereof), which are mostly the same as their IPA counterparts when they are plain Latin lowercase letters. The system works well. (Right now, in edit preview, it complains that [ı] is invalid IPA, but the choice is really just to keep the tie bar from hovering so much higher than over other pairs of vowels when [i] is present—[u͡i] vs. [u͡ı]. If it proves problematic, it can be reverted to [i]—I just wanted to polish the presentation a bit, which makes a difference with certain IPA typefaces like Gentium and certain browsers like Firefox.) - Gilgamesh~enwiki (talk) 01:35, 6 December 2019 (UTC)Reply

It looks pretty readable to me, since I'm familiar with a fair amount of X-SAMPA.

An alternative to using the dotless i would be to use ͜ (U+035C COMBINING DOUBLE BREVE BELOW) if either of the two vowels is i: [u͜i]. I prefer that because the dotless i confuses me: it looks somewhat like ɪ, and I think I'm used to seeing the dot when there's a tie bar. The equals sign could be converted to the tie character above or below before the rest of the ASCII characters at the end. — Eru·tuon 04:40, 6 December 2019 (UTC)Reply

That is a very good point. I think I'll do what you suggest. - Gilgamesh~enwiki (talk) 04:49, 6 December 2019 (UTC)Reply

You know, it has been my conventional wisdom for decades that regular expressions are one of the slowest devices in scripting, and that practically any other conventional means of parsing text is preferable for speed. But that isn't always true, is it? At least, not in Lua. In some cases, string.gsub actually seems faster than trying to do the same thing procedurally, even if you try to do it all with arrays of one-character strings. These calls are actually a lot faster than I gave them credit for—I knew they would be faster than mw.ustring.gsub, but not that they might actually be faster than my attempts to do the same thing procedurally. I suppose it also helps that, this time, I eliminated most throwaway lookup tables, and instead generate them only once and cache them.

All that said...I still kinda hate Lua. Too many thens and nots and not enough curly braces, and arrays starting at 1 instead of 0 is consistently maddening. I miss JavaScript. Would love to write modules in modern JS. - Gilgamesh~enwiki (talk) 05:09, 6 December 2019 (UTC)Reply

I made a small change that could significantly improve performance, at least for some regex replacements, though I don't know by how much. The change is:

local function string_gsub2(text, pattern, subst)
	local result = text
	result = string.gsub(result, pattern, subst)
	-- If it didn't change the first time, it won't change the second time.
	if result ~= text then
		result = string.gsub(result, pattern, subst)
	end
	return result
end

Still looking for small ways I can improve efficiency. - Gilgamesh~enwiki (talk) 19:44, 21 January 2020 (UTC)Reply
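As an illustration of why a second pass can matter at all (this is an assumption about the motivation, since the thread does not state it): `string.gsub` does not rescan text it has already consumed, so two matches that share a boundary character need two passes. The input and pattern below are made up for the example, and the function is repeated only to keep the snippet self-contained.

```lua
-- Copy of string_gsub2 from above so the example runs on its own.
local function string_gsub2(text, pattern, subst)
	local result = string.gsub(text, pattern, subst)
	-- If it didn't change the first time, it won't change the second time.
	if result ~= text then
		result = string.gsub(result, pattern, subst)
	end
	return result
end

-- Made-up input: the middle "a" belongs to two overlapping matches.
local once = string.gsub("a.a.a", "a%.a", "a-a")
local twice = string_gsub2("a.a.a", "a%.a", "a-a")
print(once, twice)
```

A single pass leaves the second dot in place (`a-a.a`), while the two-pass helper replaces both (`a-a-a`). When nothing matches, the early check skips the second call entirely.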

toMOD

I wrote a simple new function, toMOD, that I need tested, perhaps with a new column in the table. It converts standard orthographic spelling to the format used by the Marshallese-English Online Dictionary, converting ĻļM̧m̧ŅņN̄n̄O̧o̧ to ḶḷṂṃṆṇÑñỌọ. This has potential applications in Marshallese reference templating, where a word in standard orthographic spelling can be automatically converted to MOD's spelling so that references can link directly to dictionary entry anchors on that site without us needing to directly embed a differently-spelt word in the external link. No such template has been written yet. It may be a good idea for each row of the "term" column and a potential MOD column to share a table cell where the forms have identical spelling. And, in any event, the separate MOD spelling should probably not link to a Wiktionary entry with that spelling, as it is and always was a non-standard alteration to Marshallese orthography which is largely limited to the MOD, Naan and associated media intended for offline distribution to available computers in the Marshall Islands. I imagine that, if the standard orthography were considered friendlier to older Windows and Mac computers and their available font rendering, MOD and Naan would be using the standard orthography out of the box, but for the time being they are what they are. - Gilgamesh~enwiki (talk) 07:44, 10 December 2019 (UTC)Reply
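A rough sketch of what such a conversion could look like in plain Lua. This is not the actual toMOD in Module:mh-pronunc. It assumes Ļ, ļ, Ņ and ņ arrive precomposed, and that M̧, m̧, N̄, n̄, O̧ and o̧ arrive as a base letter plus a combining mark. The letters are written as UTF-8 byte escapes so the normalization form is unambiguous.

```lua
-- Hypothetical reimplementation for illustration only.
-- Each pair maps a standard-orthography letter to its MOD counterpart.
local MOD_PAIRS = {
	{ "\196\187", "\225\184\182" },  -- Ļ (U+013B) -> Ḷ (U+1E36)
	{ "\196\188", "\225\184\183" },  -- ļ (U+013C) -> ḷ (U+1E37)
	{ "M\204\167", "\225\185\130" }, -- M + U+0327 -> Ṃ (U+1E42)
	{ "m\204\167", "\225\185\131" }, -- m + U+0327 -> ṃ (U+1E43)
	{ "\197\133", "\225\185\134" },  -- Ņ (U+0145) -> Ṇ (U+1E46)
	{ "\197\134", "\225\185\135" },  -- ņ (U+0146) -> ṇ (U+1E47)
	{ "N\204\132", "\195\145" },     -- N + U+0304 -> Ñ (U+00D1)
	{ "n\204\132", "\195\177" },     -- n + U+0304 -> ñ (U+00F1)
	{ "O\204\167", "\225\187\140" }, -- O + U+0327 -> Ọ (U+1ECC)
	{ "o\204\167", "\225\187\141" }, -- o + U+0327 -> ọ (U+1ECD)
}

local function toMOD(text)
	-- None of these byte sequences contain Lua pattern magic characters,
	-- so they can be passed to string.gsub as-is.
	for _, pair in ipairs(MOD_PAIRS) do
		text = string.gsub(text, pair[1], pair[2])
	end
	return text
end
```

Under those assumptions, the standard spelling ļo̧n̄ comes out as the MOD spelling ḷọñ. Input in a different normalization form would need to be normalized first.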

That is a useful function to have. I think it would be useful to display the MOD spelling in the entry, unlinked – that would allow people to search for the MOD spelling (ḷọñ) and find the entry (ļo̧n̄), provided there's no entry for a homograph of the MOD spelling. — Eru·tuon 22:09, 10 December 2019 (UTC)Reply

I thought most modern browsers allow Ctrl-F text searches that recognize letters and ignore diacritics. Right now I press Ctrl-F and type unmarked "lon" and it finds both of those words you just mentioned. However, just displaying the MOD spelling in the entry might be doable...might need some new templates. But I think I've been hesitant to dive into new Marshallese entry templating design too soon when there are still so many aspects of the language's grammar I don't fully understand. For instance, all Marshallese adjectives are verbs, and beyond suspecting that adjectives are stative verbs (equivalent to English "to be <adjective>"), I don't know what else that actually means. Yet for now, a Marshallese entry template doesn't have to be complicated—it can just redirect to the standard entry template, but display the MOD spelling as an alternate where they differ.

By the way, I've not yet figured out how to display actual wiki markup using Scribunto/Lua; everything I print out seems to be the same as the contents of <nowiki></nowiki>. If I knew how to write scripts that generate more complex wiki markup output, I might be able to migrate more of the functionality of {{mh-ipa-rows}} to a template.

It also occurs to me that Module:mh-pronunc is getting big, at over 30K now. Conventional wisdom suggests splitting it up into multiple scripts that can be imported into each other as needed, but then a multi-file project isn't as simple to mirror at Wikipedia. (A copy exists at wikipedia:Module:mh-pronunc, and its comment at the top links back here.) So maybe the most portable, reusable portions could be maintained as one script, and more site-specific applications could be separate scripts that stay on this wiki. For instance, mh-ipa-rows is useful at Wiktionary but not so much at Wikipedia. - Gilgamesh~enwiki (talk) 03:04, 11 December 2019 (UTC)Reply

Oh, by search I mean the search engine for Wiktionary. Right now ļo̧n̄ is the 17th result in the search for ḷọñ, but if it is displayed in one of the templates, it should be higher in the results. I was thinking the MOD spelling could be displayed in the pronunciation template, but that isn't quite appropriate, and anyway alternative spelling entries probably need a MOD spelling, but might not have a pronunciation template. Probably the template that displays the MOD spelling should be placed in the Alternative forms section.

I've maintained a sort-of mirrored version of a set of Wiktionary modules on Wikipedia (Module:Unicode data), but the Wikipedia and Wiktionary versions have drifted apart in some ways; it's tedious copying the source code. It might be easier with a Pywikibot script, but I can't edit the Wikipedia module anymore because it's been template-protected. — Eru·tuon 04:05, 11 December 2019 (UTC)Reply

I didn't realize that's what you meant—I put it in (newly-created and under-featured) {{mh-head}} for now. At least the MOD spelling is being displayed, though. And I don't think it may be the best idea to put the MOD spelling in an alternative forms section, because it may prompt a naive third-party editor to turn the unlinked term into a linked term and create a word entry. My concern is that it may motivate an unnecessary duplication of many entries with the non-standard orthographic variants. It also doesn't help that some sources for the language write Marshallese words without any diacritics, and it seems dan was created from one of these sources as an unknowing duplicate of dān. - Gilgamesh~enwiki (talk) 08:05, 11 December 2019 (UTC)Reply

If I may ask, could you please update the table? I was updating it manually, but then I added so many new entries that I got behind. Most of the new entries are words that start with ri-—demonyms, mainly. - Gilgamesh~enwiki (talk) 05:08, 15 December 2019 (UTC)Reply

Done. And finally the script is fully automatic: it reads the "excluded titles" list and updates the list of template input without me copy-pasting anything. — Eru·tuon 09:59, 15 December 2019 (UTC)Reply

Thank you. What do you think of the state of the script and entries now? It's still only a tiny selection of the language, but I've been trying to steadily add more words. I'll also try to add words of phonological interest that help continue to refine the script. - Gilgamesh~enwiki (talk) 11:01, 15 December 2019 (UTC)Reply

Overhauling Template:mh-head

Marshallese doesn't have all the complex noun cases of an agglutinative language, but it does have some inflected forms, and {{mh-head}} would seem to be the appropriate place to list these. I have an idea of what I want to accomplish, but it may require some additional Scribunto/Lua API I'm not that familiar with, since I think template-only logic would become unnecessarily bloated. I was wondering if you could help me write such a template and backing script. I need to figure out how vanilla {{head}} creates its inflection list and handles the appropriate automatic categories with language-sensitive sorting keys, and how I can extend or replicate that in a script, with possibilities like default inflected forms, more than one of the same kind of inflected form, etc. I can conceptualize what I want to achieve, but API-wise I'm in over my head. - Gilgamesh~enwiki (talk) 02:14, 24 December 2019 (UTC)Reply

I think I found some resources to start with, chiefly Module:headword. - Gilgamesh~enwiki (talk) 18:02, 24 December 2019 (UTC)Reply

Yeah, the language-specific headword-line modules call full_headword in Module:headword and if necessary format_categories in Module:utilities to format extra categories that don't begin with the language name. In the Marshallese module there could be a main function that generates the MOD spelling and it can call one of the pos_functions to handle part-of-speech-specific stuff. I'm not sure what is a good module to base the Marshallese one on though. Much of Module:eo-headword is probably understandable because the morphology is simple at least. — Eru·tuon 19:52, 24 December 2019 (UTC)Reply

Now that I understand the technical aspects better of implementing the template, I realize I still need a better understanding of the grammar, so I'll put it off for the time being. After all, I'm sure there may be all sorts of unforeseen errors in the Wiktionary entries that could be remedied with a better understanding of both Marshallese grammar and the MOD entry structure. - Gilgamesh~enwiki (talk) 05:04, 25 December 2019 (UTC)Reply

Distributive verbs

I think sometimes I forget just how much technical work you do here at Wiktionary, beyond just helping me with a Marshallese module. I created a new category, Category:Marshallese distributive verbs, but {{auto cat}} shows this category is not supported. What would be involved in creating new grammar categories? - Gilgamesh~enwiki (talk) 13:45, 14 January 2020 (UTC)Reply

Some brief background: Marshallese distributive verbs basically modify a noun or verb with the rough inflected meaning of "there are a lot of [something]s." This particular grammatical form is demonstrated extensively in example sentences throughout the Marshallese-English Online Dictionary. - Gilgamesh~enwiki (talk) 13:53, 14 January 2020 (UTC)Reply

The "distributive verbs" category should only be added to the category system (Module:category tree/poscatboiler/data/lemmas probably) if it's going to be used in other languages and the meaning is roughly the same for all of them – meaning if there are distributive verbs in another language with a different meaning, that doesn't allow us to have a single description for every language's distributive verbs category. At least to start with, it can have manual content. — Eru·tuon 23:38, 15 January 2020 (UTC)Reply

That seems logical. Since I'm not specifically aware of distributive verbs being in any other language, I couldn't guarantee they would mean the same thing in those languages. As it is, Marshallese already uses at least a few relatively exotic grammatical forms that only one or a few other languages use—for instance, besides Category:Marshallese noun construct forms, there's only Category:Hebrew noun construct forms as subcategories of Category:Noun construct forms by language. Then there's also adjective verbs, which I initially categorized as Category:Marshallese adjectives, but then wondered if they shouldn't be better in Category:Marshallese stative verbs (there are no adjectives that are not verbs), when in reality these grammatical categories don't always easily fit in the existing conventional hierarchy, and I'm not proficient enough in the language myself to make confident decisions about their placement, and I fear I may be introducing errors that might have to be fixed in bulk at a later date. - Gilgamesh~enwiki (talk) 06:28, 16 January 2020 (UTC)Reply

@Erutuon Wow, you are a busy bee. I think I have even greater respect for what you do here than I did even just 24 hours ago. As much as I would appreciate your continued feedback in my ongoing endeavors, I can still wait. - Gilgamesh~enwiki (talk) 23:28, 15 January 2020 (UTC)Reply

Bug

@Erutuon There's a bug in the module's debug table, most noticeable with words whose Bender spellings start with "yiy" and a vowel. In line with references explaining how Marshallese words can be enunciated phoneme by phoneme, I'm testing an experimental enunciate-mode, where short prosodic breaks [|] are inserted in the middle of consonant clusters. The problem is...the International Phonetic Alphabet specifies these as pipe characters |. I already tried hard-coding {{!}} in the module output, but it only looks like {{!}}. So now I'm using a normal pipe character, but there's a bug in the way the module's debug table displays it. What's only displaying æ.e.kʷwɤtʲ] should actually be displaying [i | æ.e.kʷwɤtʲ] - Gilgamesh~enwiki (talk) 19:03, 16 January 2020 (UTC)Reply

@Gilgamesh~enwiki: Fixed, in the testcases module, by escaping the pipes. They are part of template syntax, and in this case the stuff before the pipe was being treated as attributes for the table cell. — Eru·tuon 19:14, 16 January 2020 (UTC)Reply
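One way to escape pipes in module output, sketched under the assumption that an HTML character reference is acceptable inside the table cell. The thread does not say which escape was actually used in the testcases module, and `escape_pipes` is a hypothetical name.

```lua
-- Hypothetical helper; the actual fix in the testcases module may differ.
local function escape_pipes(text)
	-- "|" is not a magic character in Lua patterns, so it needs no escaping here.
	local escaped = string.gsub(text, "|", "&#124;")
	return escaped
end
```

With the pipe replaced by `&#124;`, the wikitext parser no longer treats the text before it as attributes for the table cell, while the reader still sees a pipe character.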

Thank you. :) - Gilgamesh~enwiki (talk) 19:25, 16 January 2020 (UTC)Reply

Just FYI: it's unnecessary to ping someone on their talk page, because they already get a notification just from someone else editing their talk page. Chuck Entz (talk) 04:11, 17 January 2020 (UTC)Reply

Ahh, good to know. - Gilgamesh~enwiki (talk) 06:37, 21 January 2020 (UTC)Reply

Ratak and Rālik specific word categories

How do I set this up so that things work in {{lb}}, and so forth? I know similar categories exist for Category:Indian English, Category:New Zealand English, etc. The Ratak Chain and Rālik Chain dialects of Marshallese are mutually intelligible, and differ mainly by some regular variations in pronunciation reflex, and some vocabulary differences. But many of the different forms are often still written differently depending on dialect. For instance, m̧m̧an "good" is the common stem, em̧m̧an is the Rālik reflex, and m̧ōm̧an is the Ratak reflex, but in both dialects the prothetic vowel vanishes if the stem takes a bare vowel prefix: rūm̧m̧an (ri- + m̧m̧an) means "good person." I want to start making articles for the stem forms, and have their dialect reflex entries (by spelling) automatically categorized through {{lb|mh|Ratak}}, {{lb|mh|Ralik}}/{{lb|mh|Rālik}}, etc. I should add that I don't know if the dialects themselves have supplemental language codes, the same way Tosk Albanian is "als" (Albanian, South) and Gheg Albanian is "aln" (Albanian, North).

I'm not sure what to name the categories, though—"Rālik Marshallese"? "Rālik dialect Marshallese"? "Rālik Chain Marshallese"? I'm not sure what the most stable nomenclature would be. In the Marshallese-English Online Dictionary, they're also frequently just called "Dial. W" and "Dial E.", since Rālik ("sunset") is the western chain and Ratak ("sunrise") is the eastern chain, but the two dialects' native isogloss line still runs between the two chains themselves.

I should probably add...I'm not 100% sure that I know what I'm doing. It's one thing to know how templating and scripting languages work (which I increasingly know), and another thing entirely to know how existing templates and scripts are set up so I can extend them for specific editing needs. - Gilgamesh~enwiki (talk) 01:14, 20 January 2020 (UTC)Reply

@Gilgamesh~enwiki: Categories for most language varieties are added to entries via Module:labels/data/subvarieties. You can add definitions for the labels {{lb|mh|Ratak}} and {{lb|mh|Ralik}} there, with categories and linked display text if desired. Personally, I like the shorter category name: "Rālik Marshallese". The category page can explain what it means. It looks like there aren't ISO codes for Rālik and Ratak, but if they might be referred to in etymologies (for instance, {{der|en|<code for ralik>|word}}), then they could be given Wiktionary codes in Module:etymology languages/data too. — Eru·tuon 19:34, 20 January 2020 (UTC)Reply

Thank you, I'll check out the subvarieties. And if nothing else, "mh-ralik" and "mh-ratak" may suffice as ad hoc language codes if ever needed. - Gilgamesh~enwiki (talk) 19:41, 20 January 2020 (UTC)Reply

Enunciated columns in the debug table

In addition to the previous section I just wrote, I was wondering...do we risk the module timing out if we add additional enunciated columns to it? Seeing that enunciated mode has since been fully deployed to articles wherever a consonant cluster exists in the phonemic form, acting on previously unread documents that Austronesier and I discussed at wikipedia:Talk:Marshallese language—see kajin M̧ajeļ for a good example of how normal phonetic and enunciated IPA can differ. And it's not just the absence of consonant assimilations or epenthetic vowels, but also some different vowel reflexes simply as a consequence of the last vowel before a consonant cluster being the last vowel of its prosodic fragment and the first vowel after a consonant cluster being the first vowel of its prosodic fragment—see eakeak, tuen̄ and utut to see what I mean. (Incidentally, you may be pleased to see that Arņo now shows two different consonants when enunciated.)

As for how the added columns would work, enunciated forms would only differ between dialects if their normal phonetic forms already differ (because of the limits in the differences between dialect reflexes), so I'm thinking something like: phonetic (Rālik), enunciated (Rālik), phonetic (Ratak), enunciated (Ratak), with each dialect's phonetic and enunciated columns merging if they're the same, and all four columns merging if all four forms are the same.
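The merging rule described above could be sketched roughly as follows. The function name `merge_cells` and the `{ text, colspan }` cell layout are assumptions for illustration, not part of the existing testcases module.

```lua
-- Hypothetical sketch of the four-column merging rule.
-- Returns a list of cells, each with its text and how many columns it spans.
local function merge_cells(ralik_phon, ralik_enun, ratak_phon, ratak_enun)
	-- All four forms identical: a single cell spanning every column.
	if ralik_phon == ralik_enun and ralik_phon == ratak_phon and ralik_phon == ratak_enun then
		return { { text = ralik_phon, colspan = 4 } }
	end

	local cells = {}
	-- Within one dialect, merge phonetic and enunciated when they match.
	local function add_dialect(phon, enun)
		if phon == enun then
			table.insert(cells, { text = phon, colspan = 2 })
		else
			table.insert(cells, { text = phon, colspan = 1 })
			table.insert(cells, { text = enun, colspan = 1 })
		end
	end

	add_dialect(ralik_phon, ralik_enun)
	add_dialect(ratak_phon, ratak_enun)
	return cells
end
```

A caller would turn each returned cell into a table cell with the matching `colspan` attribute. This sketch does not merge across dialects unless all four forms agree, which follows the description above.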

If we'd be taxing our Scribunto/Lua allowances too much for the one table, I could instead set it to show enunciated mode in the sandboxed version as a temporary visual aid during relevant discussions, but still there are now effectively four different phonetic modes to debug. - Gilgamesh~enwiki (talk) 16:11, 20 January 2020 (UTC)Reply

@Gilgamesh~enwiki: At the moment there's no risk of the testcases timing out, even if they take twice as much Lua processing time as they do now, because it's still under a second, and they've got a limit of ten seconds. The page does take a bit long to parse now though: the "real time" can be as much as 2 seconds (not quite as long as for Wiktionary:List of languages: ~6 seconds).

I'll take a look at how to handle the enunciated mode. I do like it using spaces; it looks quite intuitive to me. — Eru·tuon 09:26, 21 January 2020 (UTC)Reply

Well, it looks like if the table starts to balloon that big, we may have to start excluding other words that simply won't get displayed. Perhaps some of the least bug-prone words with the least complicated logic involved, like for instance those with invariable /ʲVʲ, ˠVˠ, ʷVʷ/ vowels and no clusters, like jeen and ļan̄. But for now, nothing needs to be removed and it may never get to that point. And I suppose there's still a chance I could improve the module's efficiency in other areas. - Gilgamesh~enwiki (talk) 13:14, 21 January 2020 (UTC)Reply

Voicing of fricatives in Old English pronunciation transcriptions

I saw that you've recently edited a bunch of Old English entries to replace /z/ with /s/, leaving the comment that [z] is an allophone of /s/ in Old English. That is arguably true, but I think the removal of /z/ from Old English transcriptions brings up a few more issues that ought to be addressed. First, the reason I say the allophonic status of [z] is "arguable" is because there are in fact some contexts where the use of a voiced vs. a voiceless fricative may not be completely predictable from the phonological context. See "Phonemically Contrastive Fricatives in Old English?", by Donka Minkova, for a description of some of the relevant evidence and references to prior literature that discusses the topic (Minkova does support the interpretation that the voiced and voiceless fricatives were allophones in Old English). The other issue, more important in my opinion, is a matter of consistency: two other voiced fricatives, [v] and [ð], are commonly analyzed as allophones of /f/ and /θ/. So a transcription like "/ˈt͡ʃiyvese/" for ciefese seems fairly problematic: if we decide to use /s/ here, I think it would be better to also use /f/, giving /ˈt͡ʃiyfese/. And in fact, considering that the allophonic realization of voiceless fricative phonemes as voiced fricatives doesn't come naturally to modern English speakers, and that (as mentioned above), the distribution of the voiced and voiceless allophones in Old English is somewhat complicated, I think it would be worthwhile to include a phonetic transcription using [v] and [z] in addition to a phonemic transcription with /f/ and /s/ for words like this.--Urszag (talk) 21:36, 31 October 2019 (UTC)Reply

@Urszag: Sorry, I really did make a mess with my edits. I will search for phonemic transcriptions with /ð/ and /v/ and correct them as well.

It would be easier to just generate Old English transcriptions with Module:ang-pronunciation, which I started but never completed. I agree that there should be phonetic transcriptions for words in which /f s θ/ are voiced. Words with hard allophones of /j/, like eċġ, whose phonemic transcription is /ejj/, would also benefit from phonetic transcriptions (assuming that the "hard" and "soft" pronunciations of ġ are indeed allophones) because the change from /j/ to [d͡ʒ] is a bit surprising. — Eru·tuon 14:46, 1 November 2019 (UTC)Reply

Review of NEC rewrite

WDYT about the result? Should I move the function processor() and function setup_click_keyup() out of the setup_infl()?--So9q (talk) 19:17, 4 November 2019 (UTC)Reply

I'm still very confused by the script, but it looks much improved. I have some cleanup ideas. It's probably a good idea to add a nec- prefix to the NEC parameters in the URL, to avoid collisions, and it's traditional to use hyphens in class names rather than underscores. I've made the script use mw.util.getParamValue instead of a custom function.

I loaded the scripts, and some of the translation links are colored; but clicking the links doesn't show the NEC. Maybe I broke User:So9q/new-entry-creator.js when I edited it? — Eru·tuon 20:12, 4 November 2019 (UTC)Reply

I just tested and it still works for me when clicking translation links, although for now CreateTranslation.js only supports fetching the first PoS. There is also a bug with lang=code not being set.--So9q (talk) 16:30, 6 November 2019 (UTC)Reply

Oh, it's working now for me too. That's odd. — Eru·tuon 17:07, 6 November 2019 (UTC)Reply

Adding aliases to Module:family tree

You've done a lot of work on this. Now that we have aliases for etymology languages, I'd like to display them, either in the family tree or in an info box, similar to what we have with {{langcatboiler}}. Maybe we should have {{etym lang cat}} for etymology language categories; currently these categories, when they exist, aren't standardized in name or contents. Benwing2 (talk) 05:40, 15 November 2019 (UTC)Reply

@Benwing2: I've thought of creating a template for etymology language categories, but I got hung up over an unresolved issue. At the moment, many etymology language categories just have a category for the canonical name (Category:Attic Greek), though there is also Category:Kölsch Central Franconian corresponding to Kölsch (ksh). Entries are added to the categories using {{lb}} and {{tlb}}. Ideally lemmas and non-lemma forms would be in different categories, but I didn't know how to do that. It would be weird to have to specify lemmas or non-lemma forms in {{tlb}}, like having {{tlb|grc|Epic Greek lemmas}} or {{tlb|grc|Epic Greek non-lemma forms}} display as "(Epic)" but add different categories, and I didn't know how to accommodate that in Module:labels and couldn't think of another good way to add the categories. So I never came up with any kind of action plan. Maybe this issue doesn't have to be solved right away though. — Eru·tuon 19:52, 15 November 2019 (UTC)Reply

One possibility is to allow etymology languages in {{head}}, which knows about the POS and hence whether it's a lemma or not. The only other way I can think of without having the POS or lemma status marked explicitly in {{tlb}} is for {{tlb}} to look through the page text, which is expensive and likely error-prone. Benwing2 (talk) 18:11, 16 November 2019 (UTC)Reply

Χαῖρε! On 21st century Wiktionary we shouldn't perpetuate the biases of 19th century Englishmen; Doric is real Ancient Greek! Not a subdialect of Attic...

Χαῖρε, hello, nice to (virtually) meet you...

With regard to recent edits on ἅρπα I wasn't sure where to post this, I was just responding specifically vis-à-vis the Doric Greek morphology of ἅρπα but ran long touching on the broader subject of Greek dialects and their inclusion on Wiktionary, so I'll post this full comment on your talk page too...

Personally I am bewildered that a simple 1st declension noun like Doric ἅρπα for Attic ἅρπη would be controversial...? This is pretty basic Ancient Greek dialectal morphology variance. Doric (and Aeolic) retain original ᾱ which Attic changed to η in many cases (there are exceptions after certain letters ε, ι, ρ; whereas Ionic nearly always changes old ᾱ to η). 1st declension singular -ᾱ, -ᾱς, -ᾳ, -ᾱν. In the plural the forms are the same as Attic except in the genitive plural Doric -ᾱων typically contracts to -ᾶν. Unlike some other dialectal variances, on an academic level Doric 1st declension in -ᾱ, -ᾱς for Attic -η, -ης is a fairly well-established consistent paradigm, a minor lengthening of one vowel...

....and Western/Central Greek dialects (Doric-Aeolic) preserved ᾱ which was the original Ancient Greek form; Attic-Ionic lengthening ᾱ to η was a later dialectal novelty unique to the Eastern Greek dialects (Attic-Ionic). Attic is in fact the variant form here from the original authentic archaic Greek form which Aeolic and Doric much more faithfully preserved...to this day Tsakonian, descended from Doric, spoken in the Peloponnese (albeit sadly endangered) preserves ancient α where later Attic-derived Greek substituted η.

And in the ancient world, Doric and Aeolic Greek is what they spoke in Sparta and all of Laconia, in Thebes and all of Boeotia, in Epirus, in Achaea and Thessaly, Corinth and Olympia, on the islands of Lesbos and of Crete (also a bastion of preservation for the most authentic original Ancient Greek, being the birthplace of Greek civilization going back to the Mycenaean Greeks and Minoan Greeks), and also in much of Magna Græcia (Italy and Sicily), including Syracusæ in Sicily, the home of Archimedes, and by the Classical period the greatest and most significant rival city of Athens in the Hellenic world, by some sources Syracusæ was even larger and more significant than Athens. (And of course if you know your history, Athens deciding to launch an infamous "Sicilian Expedition" to attack Doric Syracusæ during the Peloponnesian War would prove a catastrophic ruinous mistake for the Athenians).

This seems to touch on the other general problem raised by recent edit reverts, which is bias in Wiktionary's coverage of Ancient Greek hitherto, bias that should be removed. A 21st century electronic 'Wiktionary' should not perpetuate biases of 19th century-20th century elite French and Englishmen who based on historical judgments idolized all things Athens, put up on an Ionic pedestal (the other 2 Greek column orders being Doric and Corinthian, both Dorian speakers!) while demonizing and denigrating Sparta and all of the Doric and Aeolic Greek worlds, in fact all of Ancient Greek linguistic history except for c. 5th century BC Athens. Biased scholars many centuries later decided that Attic was superior and real Greek while other dialects mere imitators, Archimedes in Syracusæ did not speak Ancient Greek of the Doric dialect, rather he spoke an inferior "Doric forms" of REAL Greek which is only Attic.

Other than such historical bias, there is no reason why distinct words and forms of Ancient Greek in Doric or Aeolic should just link to the Attic form as REAL Ancient Greek. Attic has more unique local novelties diverging from standard Ancient Greek than Doric/Aeolic. In their time Doric and Aeolic Greek were of equal if not greater significance, and spoken by far more people than the novel local dialect of Athens, which again only became looked at as the "model"

Doric Greek is different from Attic Greek, different enough that Doric/Aeolic forms deserve their own entry (at least a West Doric/Aeolic separate from Attic/Ionic). It is different but an equally valid form of Ancient Greek in its own right, and merits inclusion of Doric/Aeolic forms that stand on their own, not just (mis)represented as inferior variant forms of Attic. The language is called "Ancient Greek", NOT "Attic Greek". Doric/Aeolic Greek words and forms should be added/provided whenever possible, and as their own entries, not links to Attic; 'tis biased historical revisionism to imply Doric and Aeolic Greek are just variant forms of REAL (Attic) Greek, when in fact the dialects developed independently and were of equal standing and significance in the time when they were actually spoken and used as living languages (and Doric was actually closer to the original, Attic was the odd local provincial dialect that diverged most from Proto-Hellenic). As a reference source for all languages including ancient languages no longer spoken (some of which are far more speculative, e.g. Phoenician/Punic), Wiktionary (and Wiktionarians) should seek to provide Doric Greek entries no less so than Attic entries. The biases of the recent past against any form of Greek except 5th century BC Athens dialect should be left on the ash heap of history. Rather, for a fair, unbiased and thorough modern reference source on Ancient Greek, the dialects should be treated equally as their own forms of Ancient Greek language with their own unique morphology.

Reducing Doric/Aeolic Greek words to mere dialectal variants of Athens just linking to the Attic variant is akin to having Aragonese, Asturian, Catalan, Galician, Leonese, Occitan, even Portuguese, all just have links to the (Castilian) Spanish entry e.g. Catalan joventut entry should say just "Catalan form of juventud" with a link to the Castilian Spanish juventud entry. After all, like Attic among Greek dialects, Castilian Spanish is the clear historical winner of the Ibero-Romance languages, the other Ibero-Romance languages are historical losers, just inferior imitation dialect forms of Spanish language not worth recording and preserving in their own right, like Doric and Aeolic are just inferior imitation dialects of Attic REAL Greek...

Respectfully, I would suggest perhaps re-examining your potential ingrained Athenocentric biases that have plagued Greek classrooms and textbooks and lexicons for the past few centuries which conflate Attic Greek with Ancient Greek, and which ignore or disparage other dialects as irrelevant inferior imitations of Attic at best, missing the forest for the trees; try to zoom out and get a new bigger picture perspective conscious of these insidious deeply ingrained...some of us have actually studied and are actually interested in researching and preserving Doric and Aeolic Greek for their own sake as equally valid and historically and linguistically significant forms of Ancient Greek, not as mere trivial inferior variant subdialects of Attic. Someone who wants to research Doric Greek forms should not have to click through every entry to go see the Attic variant as the "real" form. Attic is the spin-off from the original, not Doric! And at the very least Doric and Aeolic Greek entries deserve to exist! Especially such simple forms conforming to basic paradigms of what we know about the standard morphology and usage of Doric and Aeolic Greek dialects. Wiktionary cannot claim to have comprehensive coverage of Ancient Greek as a reference source if it neglects the other equally significant, equally legitimate, equally valid, equally deserving divergent dialects. Wiktionarians should seek to add Doric Greek entries just like they add Catalan and Galician or Asturian despite being variants of far more well-known and widely used Castilian Spanish which like Attic Greek just happened to win the historical winners-and-losers lottery...

And this is the case with Doric-Aeolic ἅρπα, ἅρπᾱς, an equally valid independent Western Greek form deserving of its own entry distinct from the Eastern Greek Attic-Ionic variant ἅρπη, ἅρπης... Across many other languages there are many far more redundant forms of words in closely related languages (often identical or nearly identical forms, more closely related than the rainbow of diverse Western Ancient Greek and Eastern Ancient Greek dialects) that may not be so commonly used but are considered worthwhile to preserve in a comprehensive linguistic reference database.

Herbert Weir Smyth, A Greek Grammar for Colleges http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A1999.04.0007%3Apart%3D2%3Achapter%3D13%3Asection%3D13 Smyth grammar 2.13.13 FIRST DECLENSION (STEMS IN α_)

214. The dialects show various forms.

214 D. 1. For η, Doric and Aeolic have original α_; thus, νί_κα_, νί_κα_ς, νί_κᾳ, νί_κα_ν; πολί_τα_ς, κριτά_ς, Ἀτρείδα_ς.

2. Ionic has η for the α_ of Attic even after ε, ι, and ρ; thus, γενεή, οἰκίη, ἀγορή, μοίρης, μοίρῃ (nom. μοῖρα^), νεηνίης. Thus, ἀγορή, -ῆς, -ῇ, -ήν; νεηνίης, -ου, -ῃ, -ην. But Hom. has θεά_ goddess, Ἑρμεία_ς Hermes.

3. The dialects admit -α^ in the nom. sing. less often than does Attic. Thus, Ionic πρύμνη stern, κνί_ση savour (Att. πρύμνα, κνῖσα), Dor. τόλμα_ daring. Ionic has η for α^ in the abstracts in -είη, -οίη (ἀληθείη truth, εὐνοίη good-will). Hom. has νύμφα^ oh maiden from νύμφη.

8. Gen. plur.—(a) -ά_ων, the original form, occurs in Hom. (μουσά_ων, ἀγορά_ων). In Aeolic and Doric -ά_ων contracts to (b) -ᾶν (ἀγορᾶν). The Doric -ᾶν is found also in the choral songs of the drama (πετρᾶν rocks). (c) -έων, the Ionic form, appears in Homer, who usually makes it a single syllable by synizesis (60) as in βουλέων, from βουλή plan. -έων is from -ήων, Ionic for -ά_ων. (d) -ῶν in Hom. generally after vowels (κλισιῶν, from κλισίη hut).

Perseus Greek Word Study Tool:

http://www.perseus.tufts.edu/hopper/morph?l=arpa&la=greek#lexicon ἅρπα noun sg fem nom doric aeolic ἅρπα noun sg fem nom doric aeolic

http://www.perseus.tufts.edu/hopper/morph?l=arpas&la=greek#lexicon ἅρπας noun sg fem gen doric aeolic

Greek morphological index (Ελληνική μορφολογικούς δείκτες):

Nominative: https://morphological_el.academic.ru/687234/%E1%BC%85%CF%81%CF%80%CE%B1%CF%82#sel=10:3,10:3 ἅρπας

   ἅρπᾱς , ἅρπη
   bird of prey
   fem acc pl
   ἅρπᾱς , ἅρπη
   bird of prey
   fem gen sg (doric aeolic)

Accusative: https://morphological_el.enacademic.com/687226/%E1%BC%85%CF%81%CF%80%CE%B1%CE%BD ἅρπαν

   ἅρπᾱν , ἅρπη
   bird of prey
   fem acc sg (doric aeolic)

Inqvisitor (talk) 08:24, 16 November 2019 (UTC)Reply

Hi, it looks like your post in WT:RFVN is substantially the same. In future, please post in just one place. You can bring my attention to the post by including a link to my user page (Erutuon). That will send me a notification. — Eru·tuon 09:04, 16 November 2019 (UTC)Reply

On the reversal of my edit on the article on ışık

You reverted my edit on the page ışık. Why is that? The declension adds nothing to the article (the nominative form is the word itself and the accusative form is already given in the {{tr-noun}} template: "ışık (definite accusative ışığı, plural ışıklar)"). In my opinion, the templates {{tr-infl-noun-c}} and {{tr-infl-noun-v}} shouldn't be used anywhere on Wiktionary, as they provide no information that {{tr-noun}} doesn't already provide and only bloat the site. --Fytcha (talk) 18:16, 6 December 2019 (UTC)Reply

@Fytcha: There are a lot more forms in the table than just the definite accusative and the plural (ışık, ışığı, ışıklar, ışıkları, ışığa, ışıklara, ışıkta, ışıklarda, ışıktan, ışıklardan, ışığın, ışıkların, ışığım, ışıklarım, ışığımız, ışıklarımız, ışığınız, ışıklarınız), but they are hidden by default. You've got to click two "more" buttons on the right side of the table to see them. — Eru·tuon 18:22, 6 December 2019 (UTC)Reply

Another Rustacean :)

I noticed that you are working in Rust. It has become my favourite language recently, although for Wiktionary bot work I still use Python. —Rua (mew) 11:01, 9 December 2019 (UTC)Reply

I've become quite fond of it as well, and now often miss features like return values from blocks and match blocks when programming in Lua. — Eru·tuon 19:36, 9 December 2019 (UTC)Reply

@Rua, Erutuon: I'm interested in things you dislike about Rust. I looked at it a while ago, and there was a lack of libs for doing standard stuff (talking to a database etc.), but that's probably changed in the meantime. - Jberkel 00:26, 10 December 2019 (UTC)Reply

Yeah, the development is going pretty fast. Not just the language itself, but library infrastructure as well. —Rua (mew) 10:14, 10 December 2019 (UTC)Reply

If you ever have time

I hate to bother you all the time. If you ever have time, could you check el:Module:sarritest? The only person on el.wikt who knew Lua is now a 'vanished' user. sarri.greek (talk) 00:00, 11 December 2019 (UTC)Reply
Thank you so much! sarri.greek (talk) 18:48, 11 December 2019 (UTC)Reply

@sarri.greek: Let me know if you need any more help or further explanation. — Eru·tuon 18:51, 11 December 2019 (UTC)Reply

The basic ideas of Lua I cannot grasp. I have tried all kinds of combinations of the words 'local' and 'frame', but I cannot make the collective function.main work. It is just an exercise; it is not important.

One general question, if I may: When we have a module which produces declensions automatically, like el:Module:κλίση/el/ουσιαστικό, is it preferable to do all the paradigms IN the Module? Or to create wikitext Templates with the parameters for the endings? There are so many, and the Module page becomes so long! sarri.greek (talk) 16:08, 13 December 2019 (UTC)Reply

It turns out I had reversed the logic for getting args. That's not uncommon with me.

Do you mean separate templates for each declension? I suppose either way works, but I like to be able to edit all the paradigms at once and compare them, so having them in a single module helps. For Ancient Greek, the module is Module:grc-decl/decl/staticdata/paradigms. If each is in a separate template, then there are more pages to edit. — Eru·tuon 19:04, 13 December 2019 (UTC)Reply

Thank you SO much. For the many pages of paradigmata: I was worried about what is best for ...errr... you call some actions 'expensive' or bad, or not good. I will study the examples you have shown me. sarri.greek (talk) 19:09, 13 December 2019 (UTC)Reply

Ahh, I see. I'm not sure which is least expensive in memory and Lua processing time. — Eru·tuon 19:20, 13 December 2019 (UTC)Reply

Req

Hi Erutuon. Can you run a bot to do this:

moving translations with ku code and Latin script to kmr code and Northern Kurdish dialect

moving translations with ku code and Arabic script to ckb code and Central Kurdish dialect

also this:

changing translations with ku code and Latin script to kmr code

changing translations with ku code and Arabic script to ckb code

Also, we shouldn't allow people to add translations with the ku code; they should use the Kurdish dialect codes (kmr, ckb, ...) instead of using the ku code directly. Thanks.--Calak (talk) 16:50, 13 December 2019 (UTC)Reply

Hmm, I know how to identify scripts, but don't have a method to modify translations yet. I can at least make a list to start with. — Eru·tuon 08:13, 14 December 2019 (UTC)Reply

Oh, no! You don't need to modify translations, you should change "ku" code to "ckb" or "kmr" per its script.--Calak (talk) 11:15, 14 December 2019 (UTC)Reply

Right, by modifying translations I mean moving translations from "Kurdish" to "Northern Kurdish" etc. while using the correct format (the first diff). For that, it would be nice to have a method that would move translation x from language a to language b and format everything correctly. It seems complicated though. Perhaps someone else has worked this out already. But I might be able to change language codes easily (the second diff). — Eru·tuon 22:25, 14 December 2019 (UTC)Reply
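The simpler of the two tasks discussed here, changing the language code according to the script of the translated term, might be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the code that was actually run: the names `single_script`, `recode_kurdish`, `CODE_FOR_SCRIPT` and `KU_TRANSLATION` are hypothetical, the script is guessed from the first word of each letter's Unicode character name (the standard library has no script property), only the `{{t}}`, `{{t+}}`, `{{tt}}` and `{{tt+}}` templates are handled, and the mapping Latin → kmr, Arabic → ckb simply follows the request above. Arabic-script terms that are really Southern Kurdish (sdh) would still need human review.

```python
import re
import unicodedata
from typing import Optional

# Assumed mapping, taken from the request above: Latin script -> kmr, Arabic script -> ckb.
CODE_FOR_SCRIPT = {"Latin": "kmr", "Arabic": "ckb"}

# Matches the start of {{t|ku|term, {{t+|ku|term, {{tt|ku|term and {{tt+|ku|term.
KU_TRANSLATION = re.compile(r"\{\{(tt?\+?)\|ku\|([^|{}]+)")


def single_script(term: str) -> Optional[str]:
    """Return "Latin" or "Arabic" if all letters of term are in that script, else None."""
    scripts = set()
    for ch in term:
        if not ch.isalpha():
            continue  # ignore spaces, hyphens, digits, combining marks, joiners
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN"):
            scripts.add("Latin")
        elif name.startswith("ARABIC"):
            scripts.add("Arabic")
        else:
            scripts.add("other")
    if len(scripts) == 1 and "other" not in scripts:
        return scripts.pop()
    return None  # no letters, mixed scripts, or an unexpected script


def recode_kurdish(wikitext: str) -> str:
    """Replace the ku code in translation templates when the term's script is unambiguous."""
    def replace(match):
        template, term = match.group(1), match.group(2)
        # Leave the code as ku when the script cannot be determined.
        code = CODE_FOR_SCRIPT.get(single_script(term), "ku")
        return "{{" + template + "|" + code + "|" + term
    return KU_TRANSLATION.sub(replace, wikitext)
```

With these definitions, `recode_kurdish("{{t|ku|av}}")` gives `{{t|kmr|av}}`, while a term that mixes scripts keeps the ku code so that a person can look at it. Moving the translation to a different language line (the first diff) is not attempted here.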

OK. How about preventing people from using the ku code in translations? Can you add some code (in the TranslationAdder gadget) to do this?--Calak (talk) 16:19, 15 December 2019 (UTC)Reply

@Calak: Hmm, perhaps the TranslationAdder could suggest inserting the translation under ckb, kmr, or sdh instead of ku? I might be able to figure out how to do that but I've mostly stayed away from that gadget because its code confuses me. — Eru·tuon 09:14, 17 December 2019 (UTC)Reply

It is OK Erutuon. I will be thankful if you can apply any one of them.--Calak (talk) 07:12, 21 December 2019 (UTC)Reply

Reverted Edit

Hello, it is not an "odd alternative pronunciation". Several million people pronounce it that way, whereas the mispronunciation of "decade" has about five variants on the site for about 10 speakers. ABAlphaBeta (talk) 08:39, 17 December 2019 (UTC)Reply

@ABAlphaBeta: I'm sorry for my hasty reversion. I've restored the alternative pronunciation that you probably meant (as User:Mellohi! pointed out to me), but moved it into {{fr-IPA}}: {{fr-IPA|écuidistant|équidistant}}. I know very little about the fine details of French pronunciation and you may be right. Words with équi- (or ultimately derived from aequus) are transcribed with either /e.kɥi/ or /e.ki/ on Wiktionary, and while the sound files of équidistant on the French Wiktionary and on Forvo have /e.kɥi/, perhaps some people pronounce it with /e.ki/ like équilibre and other words, because it may be as confusing for French speakers as it is for foreigners like me. — Eru·tuon 09:10, 17 December 2019 (UTC)Reply

Deletion reasons

Hi. In October, you added "Incorrect title: a mixture of Latin- and Cyrillic-script characters". Do you think this could be merged into the existing "Bad entry title"? How do they differ? Equinox ◑ 08:05, 20 December 2019 (UTC)Reply

@Equinox: Well, it's certainly a subtype, but I prefer to be clear since it's not always easy to see what's wrong with the title. I was thinking maybe something like "mixed script" or "incorrect lookalike characters" would work as well. At the time there was a backlog of these titles, and I was getting tired of re-entering the deletion reason since the "content: ..." bit prevented the input box history from working. But perhaps it won't be needed now that there's this abuse filter. It displays a message showing which characters are in which script, which seems to enable editors to create the entry at the right title, so there aren't any new badly titled entries to delete. — Eru·tuon 08:33, 20 December 2019 (UTC)Reply

Yeah, went and removed it. — Eru·tuon 08:56, 20 December 2019 (UTC)Reply
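For illustration, the kind of check described in this thread (spotting a title that mixes Latin and Cyrillic letters and showing which characters are in which script) could be sketched in Python as below. This is not the abuse filter's actual logic, which is not shown here; the function names are hypothetical, and the script of each letter is approximated by the first word of its Unicode character name, which is an assumption that works for Latin and Cyrillic but is not the real Unicode Script property.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List


def letters_by_script(title: str) -> Dict[str, List[str]]:
    """Group the letters of title by an approximate script name."""
    groups = defaultdict(list)
    for ch in title:
        if ch.isalpha():
            # e.g. "CYRILLIC SMALL LETTER O" -> "Cyrillic"
            script = unicodedata.name(ch, "UNKNOWN").split(" ")[0].title()
            groups[script].append(ch)
    return dict(groups)


def is_mixed_latin_cyrillic(title: str) -> bool:
    """True if title contains both Latin and Cyrillic letters."""
    scripts = letters_by_script(title)
    return "Latin" in scripts and "Cyrillic" in scripts
```

For a title such as "c" + U+043E (Cyrillic о) + "t", `letters_by_script` groups "c" and "t" under Latin and the lookalike under Cyrillic, which is the information an editor needs to recreate the entry at the right title.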

Help needed at simple.wikt

Hi Erutuon, can you help me with the Lua Module:number list on simple.wikt? Minorax (talk) 05:10, 29 December 2019 (UTC)Reply

@Minorax: Sure... I did fix one problem that caused a module error. — Eru·tuon 05:30, 29 December 2019 (UTC)Reply

So that was the problem, forgot about that. Thank you :) Minorax (talk) 05:37, 29 December 2019 (UTC)Reply

And since simple.wikt only contains English words, Module:number list/data/en isn't really needed as a separate submodule. Is it possible to merge it into the main module? Minorax (talk) 05:41, 29 December 2019 (UTC)Reply

It's possible, but I wouldn't recommend it. Putting data in the main module adds many lines, making it harder to edit, and if you want to keep the Simple Wiktionary module in sync with the English Wiktionary module, it will be harder to copy code. — Eru·tuon 05:51, 29 December 2019 (UTC)Reply

Alright :) Minorax (talk) 05:52, 29 December 2019 (UTC)Reply
