Wiktionary:Beer parlour/2022/November

From Wiktionary, the free dictionary

We have Category:Adjective feminine forms by language and Category:Adjective plural forms by language for certain languages, esp. Romance languages. Do we really need these categories? Do they add anything useful? In general we don't categorize non-lemma forms according to their inflectional properties, so I'm not sure why we're doing it here. Benwing2 (talk) 05:32, 1 November 2022 (UTC)[reply]

Do we really need categories by etymology?
The information can be useful for the collection of oddities. For example, cat:Welsh adjective plural forms collects plurals that are distinct from the masculine singular, very much a minority of Welsh adjectives. Now, the current method of collection leaves a great deal to be desired. One needs to know that plural forms should be categorised as 'adjective plural form' rather than 'adjective form' via the PoS headline, which is not mentioned in WT:About Welsh. Consequently, the category is much shorter than it should be, omitting for instance gwynion.
In this case, better coverage would be obtained by generating a category 'Welsh adjectives with distinct plural', though there may be some awkward corner cases. A specific 'form of' template would also work, though there is the problem of training editors to choose the right template.
Category:Hebrew adjective feminine forms could likewise be useful, if one can restrict the display to feminines ending in taw. RichardW57m (talk) 10:59, 1 November 2022 (UTC)[reply]
If we want to categorize irregular / unexpected forms, it would be better to add something like "irregular" to the category names (and update the contents); as it is, Category:Portuguese adjective feminine forms, with its combination of regular singular and plural form-of soft redirects, which swamp any irregular forms that may be in there, seems kinda useless. Probably we should also rename the Welsh category something like "...irregular plural forms" or "...distinct plural forms" instead of just "...plural forms" for consistency, although if regular plural forms are identical to the singular and wouldn't be categorized at all (since we seem to in general not put "inflected form of itself" sense lines on pages), the need is less pressing. - -sche (discuss) 16:35, 1 November 2022 (UTC)[reply]
The Portuguese case is more difficult, but for the Hebrew case one can use a search such as:
incategory:"Hebrew adjective feminine forms" intitle:/...*ת/
Unfortunately, regular expressions seem not to support anchors at all. Let us not make the best the enemy of the good. --RichardW57m (talk) 10:14, 3 November 2022 (UTC)[reply]
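As a stopgap for the missing anchors, the titles returned by such a search could be filtered offline. The following is only a minimal sketch, assuming the category members have already been exported to a plain list of titles; the sample titles and the function name `feminines_in_taw` are hypothetical and are not taken from the actual category.

```python
import re

# Hypothetical sample of page titles, standing in for an exported list of
# the members of Category:Hebrew adjective feminine forms.
titles = ["גדולה", "ראשונית", "עתיקה", "אמיתית"]

# Anchored pattern: the title must END in taw (U+05EA).
# \Z matches only at the very end of the string.
ENDS_IN_TAW = re.compile("\u05ea\\Z")


def feminines_in_taw(page_titles):
    """Keep only the titles whose final letter is taw."""
    return [t for t in page_titles if ENDS_IN_TAW.search(t)]


for title in feminines_in_taw(titles):
    print(title)
```

The same approach would carry over to other endings (for instance French plurals in -x) by swapping the final letter in the pattern.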
I say delete them. @Embryomystic may want to weigh in. Ultimateria (talk) 02:35, 2 November 2022 (UTC)[reply]
While we're at it, why do we split lemmas by part of speech? --RichardW57m (talk) 10:28, 3 November 2022 (UTC)[reply]
You mean categories like "English nouns"? I find those very useful for filtering searches. I regularly include or exclude results by part of speech category. Ultimateria (talk) 03:34, 5 November 2022 (UTC)[reply]
I agree with User:-sche here; categories like this are only useful if they track only irregular forms (and have the appropriate name). Tracking all forms (the vast majority of which will be regular) isn't terribly helpful. Benwing2 (talk) 02:39, 2 November 2022 (UTC)[reply]
I don't think I have any arguments to offer in favour of retaining them, but I agree that there are situations in Welsh and Hebrew (and probably others) where subcategories of adjective forms might be a good idea even if the general concept is discarded. embryomystic (talk) 01:42, 3 November 2022 (UTC)[reply]
It would be good to hear from the creators such as @Ruakh, LlywelynII before we trash their work on templates and modules. --RichardW57m (talk) 10:27, 3 November 2022 (UTC)[reply]
Speaking only for myself — I don't have strong feelings either way. I don't actually remember doing work on templates and modules to support these categories, but whatever it was that I did, I imagine that most of it would have been needed anyway in order to show the right display text. —RuakhTALK 19:14, 11 November 2022 (UTC)[reply]
You are I take it aware that the concepts of a regular Welsh plural noun and of a regular masculine Arabic plural are dubious, just like the concept of the regular perfect of a Latin 3rd conjugation verb. --RichardW57m (talk) 10:27, 3 November 2022 (UTC)[reply]
@RichardW57m I am not proposing to "trash" Hebrew and Welsh template or module work. It's not even strictly necessary to eliminate all categories named "adjective feminine forms" and "adjective plural forms" etc. But I see no benefit at all to keeping these categories for Romance languages; do you? BTW there do exist regular Arabic masculine plurals (aka "sound masculine plurals"). The irregular ones you're thinking of are broken plurals, and IMO the categories should be named as such, i.e. in a language-specific manner. In fact, we do have such categories; take a look for example at Category:Arabic nouns by inflection type and you'll see a lot of them. Benwing2 (talk) 04:17, 5 November 2022 (UTC)[reply]
These categories are populated by templates and modules. If the categories are deleted, an implicit invitation to recreate them (namely, a red link) will be shown to everyone who views the categories of a page still being placed in them. The only way to remove the categories permanently is to change the templates and modules. Now, it may be simple to orphan the categories by adjusting the code invoked by {{auto cat}}, but that strikes me as a retrograde step if the categories still exist. Once objects are no longer placed in them, these categories will be caught up in the regular slaughter of empty categories.
Eliminating these categories for Romance languages is extra work - and I'm not sure that French adjective plurals in -x are not of interest. (Unfortunately, anchors are currently missing from regular expressions in searches - someone should raise a Phabricator ticket to add them.) --RichardW57m (talk) 10:09, 7 November 2022 (UTC)[reply]
OT: Arabic sound masculine plurals are just one, circumscribed option - it's hard to describe them as the 'regular' form, except when the singular fits certain patterns of derivation, and there are also predictable broken plurals, e.g. for diminutives. --RichardW57m (talk) 11:57, 7 November 2022 (UTC)[reply]
@RichardW57m I absolutely do not understand your objection concerning eliminating the Romance categories given that no one else is in favor of keeping them. "It's extra work to get rid of them" is a pretty questionable reason for keeping them (and in any case the actual work is trivial). Benwing2 (talk) 02:15, 8 November 2022 (UTC)[reply]
I suspect you're uttering an untruth. Not all users read the Beer Parlour every week. Should you even expect non-editors to read the Beer Parlour at all? You haven't even announced the threat to delete them on the category pages themselves! And how do you propose to eliminate these categories properly? How to restore them isn't obvious to everyone - your knowledge of the systems employed is excellent, but the systems are not well documented. Indeed, different languages do the same thing differently. --RichardW57m (talk) 11:21, 8 November 2022 (UTC)[reply]
Now, some of their functionality should be addressed better. But one should put the alternative functionality in place before deleting the old. --RichardW57m (talk) 11:21, 8 November 2022 (UTC)[reply]
@RichardW57m Damn you are blustery. I want to at least eliminate the Portuguese adjective categories, which are populated only partially and only when you use {{adj form of}}, and you are acting like the gatekeeper of all category changes. You haven't given a single reason why these categories are actually useful and worth the maintenance burden (which falls on people like me, not you). Please let me know why you are so desperate to keep them -- do *you* actually use them? Or is this just a sort of "nothing should ever be removed because someone might possibly find them useful"? Benwing2 (talk) 02:13, 14 November 2022 (UTC)[reply]
It is partly that someone went to the trouble of creating them; I'd be a lot happier if on reconsideration they accepted that it was not useful work. I'd be a lot happier if you put notices on the categories you want to get rid of alerting any who actually use them of the categories' imminent removal. In general, these adjective categories are being generated by two routes - the {{inflection_of}}-type route, as in {{adj form of}}, and by direct invocations of {{head}}. I don't see any saving in eliminating the former for Portuguese; you eliminate one occurrence of "romance_adjective_categorization," in Module:form_of/cats. Now, there would be a saving of maintenance effort if you eliminated the categorisation of entries by inflection for gender and number, but if that is what you are proposing, say so. --RichardW57m (talk) 13:24, 14 November 2022 (UTC)[reply]
Having looked into this mechanism, I am now wondering if it could actually be useful when revising inflection tables and reducing the number of forms. Several cases spring to mind for the dative singular of Pali a-stems:
  1. The ending in -tthaṃ does not actually seem to be a case-ending, and one day we may be able to get rid of it. (There was no mechanism to formally challenge it.) By that time, there may be senses of words ending in -tthaṃ claiming them as dative singulars. We would then need to eliminate or redescribe them. However, for this one, it might actually be quicker to search for noun and adjective forms in -tthaṃ, and rely on such forms having entries in the Roman script.
  2. We may be overstating the number of masculine and neuter dative singulars in -āya. This is not a rare form for feminines, being used for several cases. We may therefore need to revise such case forms when entered as terms.
  3. At present, we distinguish Pali datives from genitives by their meaning. In Prakrit, the criterion is form, and therefore many words lack datives entirely. If we switched Pali to the same treatment as Prakrit, what are currently described as dative/genitive case forms would have to be redescribed.
It would seem a shame for categorisation by inflection to have to be re-implemented for such filtering. --RichardW57m (talk) 13:24, 14 November 2022 (UTC)[reply]
I am already using some Pali verb forms as the basis of categorisation, but categorising the lemma rather than the inflected form, and classifying the categories as maintenance categories. The problem is that the textbooks help one recognise a form, rather than tell one if it does not exist. --RichardW57m (talk) 13:24, 14 November 2022 (UTC)[reply]

Duplicated words in Category borrowed

Notifying @Benwing2, Erutuon. It is me again, about borrowed terms. Example: under Category:Greek terms borrowed from French, the members of the subcategories Category:Greek learned borrowings from French, Cat:unadapted, Cat:obor, etc. are duplicated. They appear twice, in both categories, so we cannot tell which ones have the template {{bor}}. The terms under calques and semantic loans are OK; they do not appear twice. The same happens at e.g. Category:English terms borrowed from French, and so on. For Greek languages, and perhaps others too, the {{bor}} template is very significant and distinct from the other templates. It would be great if this duplication could be avoided. Thank you. ‑‑Sarri.greek  I 07:58, 1 November 2022 (UTC)

@Sarri.greek I guess you're requesting that template {{lbor}} does not categorize into 'DEST terms borrowed from SOURCE' but only 'DEST learned borrowings from SOURCE'? My original logic for categorizing into both is that a learned borrowing is still a borrowing, and if you remove them from the parent category, it would be easy for a new Wiktionary user to miss the fact that they also have to look in all child subcategories to find all borrowings. Also there was a vote in favor of including 'DEST terms borrowed from SOURCE' also in 'DEST terms derived from SOURCE', and this is in the spirit of that vote. OTOH I suppose this same argument could potentially be made for including all terms in all subcategories in all their parent categories, which might be undesirable. Benwing2 (talk) 02:48, 2 November 2022 (UTC)[reply]
No, this time I am not requesting any template change (I changed my mind, since en.wikt thinks differently), @Benwing2. As I have seen here and there, there are 2 kinds of categories:
1) The index-like categories (all the members of all subcategories can be viewed there). (Probably they should have a different name too: Index:C....)
2) and the 'non-index' ones, which are
  • 2a) either empty, so that only subcategories can be seen,
  • or 2b) subcategories hold their hyponyms, and in the general category we see only the words which have no characteristic of a hyponym.
The above, e.g. Category:Greek learned borrowings from French and the like, are a bit sloppy in the sense that there is no way to spot the {bor} terms, i.e. the ones that are NOT hyponyms (I have understood that in English dictionaries, {bor} is a general word and means no specific kind of borrowing). So: structure 1 or 2b (I would love to have 2b, because it also serves other languages which need to separate {bor} from {lbor}, {ubor} ...). I am sorry that I cannot express myself better on the linguistics side of things. Thank you for your attention. ‑‑Sarri.greek  I 03:00, 2 November 2022 (UTC)

Ecclesiastical Latin vs. Medieval and New Latin

For purposes of classification what's the difference between them meant to be exactly on WT? The definitions currently on the category pages are (Ecclesiastical Latin) "a form of Latin initially developed to discuss Christian thought and later used as a lingua franca by the Medieval and Early Modern upper class of Europe"; (New Latin) "a revival in the use of Latin in original, scholarly, and scientific works since c. 1375/1500"; (Medieval Latin) "a primarily written form of Latin used across Europe in the Middle Ages". The definition of Ecclesiastical Latin is the sticking point here since it makes it synonymous with, or a collective term for, Medieval and New Latin, or weirdly implies that the latter are basilects (not "upper class").

My own thought, which seems to better reflect the terms that are actually in the category and how I've used it as a label myself, is that Ecclesiastical Latin should be limited to terms with a specifically liturgical or theological bearing, especially ones that have been current in the Catholic Church up to the contemporary era (apart from the liturgy, many Catholic specialist journals were still written in Latin up to the mid-20th century). The "lingua franca" stuff should be dropped from the description—Ecclesiastical Latin is Latin used by the Church, not just "the upper class" and not specifically in medieval or early modern times. —Al-Muqanna المقنع (talk) 12:03, 1 November 2022 (UTC)[reply]

Do we need a category for Ecclesiastical at all? As you mention, it spans multiple periods in history. It almost amounts to a topic label, such as 'food' or 'types of potato'. Nicodene (talk) 14:07, 1 November 2022 (UTC)[reply]
I tend to agree actually, it would make more sense to just have straightforward chronological categories and use Category:la:Theology, Category:la:Bible, Category:la:Christianity etc. where appropriate, and maybe treat existing "Ecclesiastical Latin" labels as meaning "post-Classical". I was thinking about this when I made dēcrētum horribile, which is very much a theological term but a Protestant one (the term is Calvin's and both of my Latin citations are from Lutherans)—is there "Protestant Ecclesiastical Latin", or should it just be listed as New Latin? Might be easier to avoid the question and just use Medieval/Renaissance/New with topics as appropriate. —Al-Muqanna المقنع (talk) 14:14, 1 November 2022 (UTC)[reply]
How much does it cost us to maintain these labels and categories? If all we get is a bit of tidiness, it doesn't seem worthwhile to suppress the information reflected in the labels and categories. Not all of our category groups are mutually exclusive and collectively exhaustive, nor should they have to be. DCDuring (talk) 15:05, 1 November 2022 (UTC)
@DCDuring: The problem isn't tidiness, it's that it isn't clear what the label is actually intended to mean, and the description of the category (which is also the intro of the Wikipedia page the label links to) contradicts how it's used in practice. I don't mind if it's kept with an explanation, e.g. along the lines of my suggestion above (Latin as used by the Church, up to the contemporary age). But I am sympathetic to Nicodene's point to the extent that getting rid of the term would not actually suppress any information, since as actually used it doesn't seem to contribute anything that wouldn't be covered by a chronological + topical combination like "New Latin, theology" and the like. —Al-Muqanna المقنع (talk) 16:44, 1 November 2022 (UTC)[reply]
Exactly. The meaning isn't tidy.
It certainly doesn't contribute anything to someone not interested in what it might mean. Is it really true that all Ecclesiastical Latin is about academic theology, rather than, say, maintenance of churches, canon law, or the conduct of rituals? Has anyone knowledgeable taken a good look at how the labels are actually used? What was the source of the labels? How did the source apply them? Is "Ecclesiastical Latin" actually used only for terms used in theological discourse? Do we have anyone who respects the subject(s) enough to make an improvement on the current labels and categories? Ecclesiastical Latin seems to have had more uniformity than, say, scientific, literary, legal or medical Latins. Doesn't that add to the value of the existing label? DCDuring (talk) 22:55, 1 November 2022 (UTC)
A lot of it relates to law, and it's not entirely appropriate to put the word "ecclesiastical" on that. Yes, much of it obviously was used in that way by the church, but certainly not exclusively. Theknightwho (talk) 22:57, 1 November 2022 (UTC)[reply]
@DCDuring: I think I get your point a little better, but I'm not concerned about e.g. the use of "Ecclesiastical Latin" in etymology sections and the like, imported from dictionaries, although those could be more precise in some cases. I am myself a specialist and I add terms that I come across in primary sources. It isn't clear to me when "Ecclesiastical Latin" should be applied to a term that is being added, or, conversely, what it means when someone else adds one, because our definition of the term is poor. I imagine for a non-specialist it would be even less helpful. So, I think it would probably be good to clarify how we are using it. If you're asking for someone knowledgeable to take a look, well, I am here and taking a look at it, hence this thread. "Ecclesiastical Latin" of course does not only apply to academic theology, hence my point above about theological or liturgical bearing and my suggestion to describe it expansively as language used in relation to Church matters and especially terms that are not obviously circumscribed by era.
I do disagree, as a point of fact, that "Ecclesiastical Latin seems to have had more uniformity than, say, scientific, literary, legal or medical Latins": I think FWIW that in practice precisely the opposite is true. Law Latin developed over a much shorter period, is entirely technical and constituted more of a pan-European argot because the study of law was dominated by a small number of institutions (Orléans, Bologna) from the time of the reintroduction of the Corpus Iuris Civilis. By contrast, liturgy in the Middle Ages was not developed by technicians and, before the advent of printing, Trent, and Quo primum, the language of clergy reflected a much more diverse set of local practices, often developed diocese by diocese. Anyway, all this is just to say that I think we should decide on an in-house definition of Ecclesiastical Latin that can be applied with reasonable consistency and can be explained to non-specialist readers, rather than just point to or copy what's on the Wikipedia page, which is fine as it is but wasn't written with a dictionary in mind. —Al-Muqanna المقنع (talk) 00:30, 2 November 2022 (UTC)[reply]
It may be relevant to this discussion to note that there is an official Vatican body responsible for (among other things) creating a dictionary of neologisms for modern concepts, which are likely often not used, but are probably incorporated into the official Latin translations of Vatican documents. Most of these are not ecclesiastical terms per se, but I would think they are primarily used in ecclesiastical contexts (papal encyclicals and the like). Andrew Sheedy (talk) 04:20, 2 November 2022 (UTC)[reply]
That's worth noting, for sure. Our current definition of EL focuses on medieval and early modern usage, and sometimes dictionaries use it just to mean "Medieval Latin": but that's a very different beast from Latin as used by the Vatican now. I think my "era-independent" suggestion would encompass that better. —Al-Muqanna المقنع (talk) 19:46, 3 November 2022 (UTC)[reply]

Pre-Proto-Mongolic

Modern literature on Mongolic languages tends to make a distinction between Proto-Mongolic (the direct ancestor to Middle Mongolian, spoken between the 10th/11th and 13th centuries) and Pre-Proto-Mongolic, the ancestor to that language, tracing back to approximately the 5th century. Although Proto-Mongolic and Pre-Proto-Mongolic are both unattested, the distinction does still matter, as they're reconstructed by very different means: Proto-Mongolic is primarily reconstructed from extant (and attested) languages within the Mongolic family (though obviously with Turkic, Tungusic and Sino-Tibetan influence where appropriate). On the other hand, Pre-Proto-Mongolic is only possible to reconstruct externally (i.e. indirectly), from what we can infer from known/suspected contact with other language families at the time, and then cross-comparing to what we know about Proto-Mongolic + later developments.

Obviously the number of Pre-Proto-Mongolic lemmas is inevitably going to be quite small for a very long time, but I think the difference between the two is significant enough that it warrants creating a separate L2. For comparison, Pre-Proto-Mongolic would be (near-)contemporary with Old Turkic. Theknightwho (talk) 16:53, 1 November 2022 (UTC)[reply]

@Theknightwho: Unless there are descendants of Pre-Proto-Mongolic other than Proto-Mongolic, it seems quite shaky to reconstruct it at all. You can always give the reconstructed older forms (with appropriate references) in the etymology sections of Proto-Mongolic, there's no need to make separate lemmas for them. Thadh (talk) 17:05, 2 November 2022 (UTC)[reply]
@Thadh We do know that there was some influence of Pre-Proto-Mongolic during that period, which is how we are able to do any reconstructions. I would also feel uncomfortable adding reconstructions under a name not used for them outside of Wiktionary. Theknightwho (talk) 17:09, 2 November 2022 (UTC)[reply]
@Theknightwho: Even so, reconstructing purely on the basis of (supposed) loanwords is... eh. And I'm not saying you should add PPM lemmas under the name of PM, I'm rather referring to things like Proto-Finnic *hüvä, where the earlier stage (early Proto-Finnic, or pre-Proto-Finnic, if you wish) is given in the etymology section. Same thing is also widely done for Pre-Germanic. No need to make links out of them. Thadh (talk) 17:13, 2 November 2022 (UTC)[reply]
@Thadh I should probably have mentioned that much of this comes from the attempted reconstruction of Khitan, which is a para-Mongolic language, of which Proto-Mongolic is only one (or is its sister family, depending on which academic you talk to). Although this is tentative (and I'm unsure quite how many actual pages we can be confident enough to create), there are certainly a small handful. Theknightwho (talk) 18:33, 4 November 2022 (UTC)[reply]
@Theknightwho: Usually, creating full-fledged codes for proto-languages that contain just one more descendant than another code has not provided fantastic results here on Wiktionary - take Proto-Polynesian (compared to Proto-Nuclear Polynesian) and Proto-Semitic (compared to Proto-West Semitic) - usually the former is identical to the latter and people just link the older language making the whole categorisation and lemmatisation a mess. Thadh (talk) 20:40, 4 November 2022 (UTC)[reply]
I'm not sure that would happen here. There aren't many PPM reconstructions, compared to the large number of reconstructions for PM. Theknightwho (talk) 21:19, 4 November 2022 (UTC)[reply]
Support. AG202 (talk) 05:39, 4 November 2022 (UTC)[reply]
I don't think it's terribly necessary if we are talking about loans from let's say Proto-Turkic into pre-Proto-Mongolic. @Thadh's example of Proto-Finnic *hüvä shows how to illustrate the etymology elegantly without an additional entry for pre-Proto-Finnic. In such a case, we can include the Proto-Mongolic form among the descendants of the Proto-Turkic reconstruction, thus reciprocally linking the two forms to each other.
The opposite case is more interesting if let's say again Proto-Turkic borrowed from pre-Proto-Mongolic, i.e. if the Proto-Turkic form cannot be derived from Proto-Mongolic but definitely reflects an earlier form preceding the Proto-Mongolic stage (similar to pre-Grimm's law Germanic borrowings into Finnic). In the etymology of the Proto-Turkic form, we could mention the putative pre-Proto-Mongolic form and link to the Proto-Mongolic reconstruction derived from the latter, but we cannot include the Proto-Turkic reconstruction among the descendants of the Proto-Mongolic entry. In such a case (to ensure reciprocal linking), pre-Proto-Mongolic entries make sense. –Austronesier (talk) 18:36, 4 November 2022 (UTC)[reply]
@Austronesier: I think we could bend the rules a little and give the Proto-Turkic descendant on the Proto-Mongolian entry with a necessary qualifier From earlier *PPM_form: in the descendants section, something like on Proto-Finnic *omena (there are much better examples but I can't come up with one off the top of my head and I think the premise is quite clear here). Thadh (talk) 20:34, 4 November 2022 (UTC)[reply]
@Thadh I'm not sure I understand why that should be necessary, instead of doing it properly. Theknightwho (talk) 21:16, 4 November 2022 (UTC)[reply]
@Theknightwho: In practice, how many entries will we get for pre-Proto-Mongolic as donor? –Austronesier (talk) 21:20, 4 November 2022 (UTC)[reply]
@Austronesier I wouldn't say very many - at least not at this stage. We're probably looking at 20 reconstructions which are possible at all, which theoretically could be used on the pages for about 10 languages each. Theknightwho (talk) 21:32, 4 November 2022 (UTC)[reply]
@Theknightwho: That is doing it properly. Reconstructions of languages based on borrowings are very speculative, and we don't host terms that would normally have two (**) or even three (***) asterisks.
If we're aiming at a language with under thirty terms that can be (relatively) safely reconstructed, while having a solid reconstruction of a descendant that is also the ancestor of all its other descendants, then just adding this note to thirty lemmas out of hundreds of potential pages isn't a problem and saves space and a lot of headache.
If we're talking about an actual solidly reconstructed language with a lot of reconstructions, then that means that pretty much any modern Mongolic term will need to have one more code added to its etymology, and that's becoming bothersome on that end. Thadh (talk) 21:25, 4 November 2022 (UTC)[reply]
So your argument is that if there aren't many there's no point, and if there are lots then it's too much work? Hmm. Forgive me if I'm misunderstanding you there. Theknightwho (talk) 21:29, 4 November 2022 (UTC)[reply]
@Theknightwho: I'm saying if there's few then there's no point, and if there are lots it may be better to just switch to generally giving the older form instead of the newer in the reconstructions. Thadh (talk) 16:33, 6 November 2022 (UTC)[reply]
Oppose - Even the reconstruction of Proto-Mongolic is tentative and based upon a handful of works. There is no consensus on the reconstruction of Pre-PM and indeed the reconstruction of the Khitan sound system itself is still in its early phases. The needs of linking Turkic and Tungusic cognates and Khitan entries can be well served by the PM pages themselves. Hromi duabh (talk) 14:28, 25 November 2022 (UTC)[reply]

Apply for Funding through the Movement Strategy Community Engagement Package to Support Your Community

The Wikimedia Movement Strategy implementation is a collaborative effort for all Wikimedians. Movement Strategy Implementation Grants support projects that take the current state of a Movement Strategy Initiative and push it one step forward. If you are looking for an example or some guide on how to engage your community further on Movement Strategy and the Movement Strategy Implementation Grants specifically, you may find this community engagement package helpful.

The goal of this community engagement package is to support more people to access the funding they might need for the implementation work. By becoming a recipient of this grant, you will be able to support other community members to develop further grant applications that fit with your local contexts to benefit your own communities. With this package, the hope is to break down language barriers and to ensure community members have needed information on Movement Strategy to connect with each other. Movement Strategy is a two-way exchange, we can always learn more from the experiences and knowledge of Wikimedians everywhere. We can train and support our peers by using this package, so more people can make use of this great funding opportunity.

If this information interests you or if you have any further thoughts or questions, please do not hesitate to reach out to us as your regional facilitators to discuss further. We will be more than happy to support you. When you are ready, follow the steps on this page to apply. We look forward to receiving your application.

Best regards,
Movement Strategy and Governance Team
Wikimedia Foundation Mervat (WMF) (talk) 13:49, 2 November 2022 (UTC)[reply]

Braille

I propose we move Braille from Translingual to Alt Forms of the appropriate L2s and create something like {{braille form of}}. As they stand, Braille entries are a mess. @Binarystep @AG202 @Thadh, and anyone else interested. Vininn126 (talk) 14:44, 2 November 2022 (UTC)

That seems sensible for many entries. Can it be automated? —Justin (koavf)TCM 14:49, 2 November 2022 (UTC)[reply]
Isn't some Braille translingual? Maybe numeric digits, music notation, etc.? Equinox 14:57, 2 November 2022 (UTC)[reply]
This definitely seems true, so some translingual braille will have to stay. Vininn126 (talk) 15:30, 2 November 2022 (UTC)[reply]
There is already {{Brai-def}}, but that seems to be only ever used for Japanese. – Wpi31 (talk) 16:00, 2 November 2022 (UTC)[reply]
I'm inclined to oppose: Braille is essentially an alternative orthography never used in print media nor on the web; We don't include morse code, attested encoding mechanisms or shorthand either, and for good reason: It takes five minutes to look up the braille alphabet and you'll be able to read any braille text with the table, assuming you even manage to find a braille text that doesn't have a regular text next to it. And why on earth Unicode decided to add braille is beyond me. Thadh (talk) 17:01, 2 November 2022 (UTC)[reply]
Braille books exist? Vininn126 (talk) 17:28, 2 November 2022 (UTC)[reply]
Okay, I guess that wasn't a perfect wording, I rather meant "print media intended for visual consumption" - braille books are still intended for a very specific group of people that would probably prefer using regular text types if they could. Thadh (talk) 17:31, 2 November 2022 (UTC)[reply]
Of course they aren't for visual consumption, the vast majority of people reading these books can't see. I don't think I'm understanding the difference you are making. Is your argument based on the fact we should be recording printed letters as opposed to cues for other senses because these alternative "alphabets" are usually based on a visual alphabet? Somewhat relatedly, do you think what we have at is what we should be doing? Vininn126 (talk) 17:36, 2 November 2022 (UTC)[reply]
The point I'm making is that braille, along with morse, shorthand etc., are specialised respellings of the regular (in English's case, Latin) orthographies. So they don't have any place in a dictionary, plain and simple: If someone seriously wants to see what a braille text says, they should use a converter, or a chart, but not a dictionary. To give some more examples of specialised respellings: binary code, hexadecimal code, UTF-codings... So no, I don't think is something we should be doing, I'm fine with keeping the translingual entry for consistency's sake, but making language-specific entries makes no sense to me. Thadh (talk) 17:42, 2 November 2022 (UTC)[reply]
This is essentially the discussion from a while ago trying to determine if we should collapse a lot of Language's letter content into translingual; ultimately the consensus from that was that we should keep them separate. I think it's rather inconsistent to have separate letter information for a in each L2 but not for various symbols such as this. Vininn126 (talk) 17:46, 2 November 2022 (UTC)[reply]
@Thadh Braille can be radically different from language to language and country to country though… it’s not the same as Morse code at all. You can’t look up a Braille converter for Braille in Japan for example and expect it to be the same. Also there are shorthand words made from Braille that don’t align with the letters. It feels oddly similar to the arguments made against including Sign Languages. Looking at ⠁⠉ for example, in English Braille it means “according” from the shorthand of “ac” but in w:Korean Braille it means 그러나 (geureona, but, however), which you wouldn’t even be able to easily guess from the Korean Braille alphabet. Another example is which differs from language to language significantly. Who knows what other shorthand Braille there are? This is actually one of the better things that Unicode has added, along with SignWriting as it can increase access significantly (who knows how Braille can interact with screen readers?) To quote w:English Braille: “Braille is frequently portrayed as a re-encoding of the English orthography by sighted people. However, braille is a separate writing system, not a variant of the printed English alphabet”. To label it as a respelling of a regular orthography is inaccurate. This is lexical information that’s important to users and increases accessibility and awareness of how Braille works. I support this proposal wholeheartedly. CC: @Vininn126 AG202 (talk) 21:26, 2 November 2022 (UTC)[reply]
See also: w:English Braille#Contractions & w:American Braille. You can’t pull out a dictionary and read everything out automatically. And that’s only three Braille systems that I’ve looked into, let alone the many many more. AG202 (talk) 21:40, 2 November 2022 (UTC)[reply]
I hate to use the "as someone [relative clause]" formation but as someone whose mother frequently uses Braille and teaches it, this "code" stance is fairly wrong.
"A few shorthands" (note: this was wording used on the English Wiktionary Discord, not here) does not come close to covering the amount of contractions, multisymbol contractions, symbols, and deprecated usages in Modern English Braille. There are sixty-four (64) possible Braille cells and the amount of distinct symbols and indicators in modern English Braille far exceeds that.
Braille, as we know, is not a language, but it is a specialized orthography deserving of demarcation from translingual lemmata. This discussion must acknowledge that not only is there of course multilingual Braille, but there is Braille specifically designed for technical purposes, e.g. Nemeth Braille Code (used for encoding mathematical + physical notation). These technical codes (which exist in tongues beyond English) are extremely complex and cannot likely be explained away in a translingual section.
Again, in a Braille cell, there are sixty-four possible individual characters. Multiple Braille cells are used to represent completely different letters, contractions, and symbols in different languages. N is not exclusively a translingual page. Why should be so?
I am aware, Thadh, that you yourself don't like letter pages anyway. But there is a precedent. Jodi1729 (talk) 17:10, 3 November 2022 (UTC)[reply]
I would like to add a clarification - when I say split, I mean just split the existing letters by language. I do not wish to imply things like transliterations of each words. If there are interesting, non predictable attestable forms of words and such then we can discuss that. Vininn126 (talk) 23:50, 2 November 2022 (UTC)[reply]
Support. Binarystep (talk) 07:59, 3 November 2022 (UTC)[reply]
Largely oppose. For the one-cell characters, they are mostly better not split by natural language. is nice and compact - it would be disastrous to split the Bharati Braille usage by language, and I wouldn't like to split the lemma by script. Abbreviations and logograms are possible exceptions - I wonder what multilingual Braille systems do for the word-like abbreviations. In this case, perhaps Wiktionary should act like a reference manual and list transliterators from Braille. --RichardW57m (talk) 12:48, 4 November 2022 (UTC)[reply]
I concede that there may be a case for L2 Braille-system headers, such as 'Unified'. --RichardW57m (talk) 12:48, 4 November 2022 (UTC)[reply]
Why would it be any more disastrous than splitting for any other script? Theknightwho (talk) 13:06, 4 November 2022 (UTC)[reply]
@Theknightwho: The letter 'a' only has entries for languages written in the Roman alphabet, the corresponding Braille letter would have entries for every language written in Braille - you would add most of the languages of mainland south and southeast Asia. --RichardW57m (talk) 10:42, 7 November 2022 (UTC)[reply]
Why is that a problem, though? Theknightwho (talk) 15:05, 7 November 2022 (UTC)[reply]
@Theknightwho: do you really think umpteen entries for is better than what we currently have? --RichardW57m (talk) 10:20, 9 November 2022 (UTC)[reply]
@RichardW57m: If they’re generally semantically different, then yes. Theknightwho (talk) 16:07, 9 November 2022 (UTC)[reply]
They will usually map in the first instance to "1" or the "character used to represent /a/ or the approximation thereto". --RichardW57m (talk) 17:26, 9 November 2022 (UTC)[reply]
So do normal letters. I take it you would support merging those into translingual as once proposed? Vininn126 (talk) 17:31, 9 November 2022 (UTC)[reply]
Yes, and I note a lot of letters have false precision, as exemplified by definitions like "the nineteenth letter of the Welsh alphabet". The word 'nineteenth' is false precision - it depends on whether 'j' is in the Welsh alphabet, and some such definitions have been inconsistent. (It wasn't when I was a boy.)
Sheer aesthetics argue for the collapse to a single lemma in the case of Braille. --RichardW57m (talk) 11:12, 10 November 2022 (UTC)[reply]
That's an argument for improving the quality of those entries; not removing them. Theknightwho (talk) 16:52, 10 November 2022 (UTC)[reply]
Is there even multilingual braille? AG202 (talk) 13:24, 4 November 2022 (UTC)[reply]
w:Bharati Braille. RichardW57m (talk) 10:45, 7 November 2022 (UTC)[reply]
Thank you, I missed that in your original comment. I do wonder though, seeing how Braille systems often have contractions and shorthand, if those could differ for the languages that implement the Bharati Braille system, as you mentioned. Though I disagree with the implementation that Wiktionary should only act as a reference manual. Maybe an L2 Braille system like "Bharati Braille" would be useful, because as is, "Translingual" is not clear and has become a catch-all which is a problem. AG202 (talk) 15:23, 7 November 2022 (UTC)[reply]
I'm not saying that Wiktionary should act only as a reference manual. If someone is trying to decipher some Braille text, I think it is too much to hope that we will have found attestations for the Braille spelling of every English word in Braille, let alone Welsh. What we can do is point to a transliteration service. We might even supply them ourselves - if we list abbreviations, let alone words, we should probably offer transliterations, just as we do for other scripts, though notably on a language by language basis. (Hmm - non-Roman targeted Brailles need two levels of transliteration - target script and Latin script. And Bharati Braille is script-agnostic - it even supports basic Latin!) --RichardW57m (talk) 13:04, 10 November 2022 (UTC)[reply]
The idea of an L2 heading "Bharati Braille" has some appeal, especially if we must break translingual up. As far as I can tell, Bharati Braille has no contractions - its 'level 1' equivalent employs all codes for simply written words. There must be some subtleties in the writing - I need to draw up a cell to letter etc. coding table. --RichardW57m (talk) 13:04, 10 November 2022 (UTC)[reply]
I believe that lack of a contractions in one language isn't an argument to not separate other languages. Lack of a word for "bombard" in one language is not evidence to not add it in another language. Also I want to emphasize the point of the thread is not to provide transliterations, just change the presentation of the current entries to be more consistent with other letters. There was an attempt to merge them into translingual before, ultimately leading to no change. Vininn126 (talk) 13:13, 10 November 2022 (UTC)[reply]
Ah, so you are just talking about Braille letters, and not other Braille characters? Note that we haven't split ÷, whose Scandinavian meaning ("minus sign") is different to its English meaning. I will remark here that Unicode considers the Braille characters to be symbols, not letters! Unfortunately, consistency is overrated. --RichardW57m (talk) 18:15, 10 November 2022 (UTC)[reply]
Look at my second comment to myself above. Also, disagree on the consistency! It makes a huge difference for readers. Vininn126 (talk) 18:38, 10 November 2022 (UTC)[reply]
So does Unified English Braille have 26, about 51 or how many letters? Is there anywhere a Wiktionary taxonomy for entities in Braille script? It's the 26 that are amongst the most translingual! --RichardW57m (talk) 12:52, 11 November 2022 (UTC)[reply]
I'm not following. Could you please elaborate? Vininn126 (talk) 12:54, 11 November 2022 (UTC)[reply]
Not all 64 6-dot Braille cells are letters. I think I've seen ligature and logogram used, and, irrelevantly, of course there are the ten numerals which double as letters. Decade 5 is mostly punctuation, and the right-shifted cells are mostly 'format' or similar characters. The dotless cell does not function as a letter. --RichardW57m (talk) 14:51, 11 November 2022 (UTC)[reply]
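As a side illustration of the "64 cells" figure used in this thread, here is a minimal sketch in Python. It assumes the layout of Unicode's Braille Patterns block, where each cell is U+2800 plus one bit per raised dot (dot 1 is the lowest bit). The helper names `cell` and `dots_of` are hypothetical and exist only for this example. The sketch shows only which code point a dot pattern maps to; it says nothing about what a cell means in any particular language, which is the point under debate above.

```python
BRAILLE_BASE = 0x2800  # U+2800 BRAILLE PATTERN BLANK (the dotless cell)


def cell(*dots: int) -> str:
    """Return the Unicode Braille character with the given raised dots (1-8)."""
    mask = 0
    for d in dots:
        if not 1 <= d <= 8:
            raise ValueError(f"dot number out of range: {d}")
        mask |= 1 << (d - 1)  # dot n corresponds to bit n-1
    return chr(BRAILLE_BASE + mask)


def dots_of(ch: str) -> tuple[int, ...]:
    """Return the raised dots of a single Braille pattern character."""
    mask = ord(ch) - BRAILLE_BASE
    if not 0 <= mask <= 0xFF:
        raise ValueError(f"not a Braille pattern character: {ch!r}")
    return tuple(d for d in range(1, 9) if mask & (1 << (d - 1)))


# The 6-dot cells use only dots 1-6, so there are 2**6 = 64 of them,
# including the dotless cell at U+2800.
SIX_DOT_CELLS = [chr(BRAILLE_BASE + m) for m in range(2 ** 6)]

if __name__ == "__main__":
    print(len(SIX_DOT_CELLS))      # number of 6-dot cells
    print(cell(1), cell(1, 4))     # the two cells of the sequence cited above
    print(dots_of(cell(1, 4)))     # dots raised in the second cell
```

The two cells printed in the usage lines are the ones in the sequence ⠁⠉ that AG202 cites: the same pair of code points is read as a contraction in English Braille and as a different word in Korean Braille, so the encoding alone does not settle the meaning.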
(honestly, it should be split) AG202 (talk) 19:00, 10 November 2022 (UTC)[reply]
There are five characters in Bharati Braille (and more for Indian Urdu) whose writing includes format characters. There are also a couple of ambiguous characters - or at least, that's implicit in the documentation I can find. --RichardW57m (talk) 12:14, 11 November 2022 (UTC)[reply]

Frequency information in usage notes

A user removed frequency information that I added to entry supermajority:

"The term supermajority is much more common in the American corpus while qualified majority is much more common in the British corpus."

It traced to {{R:GNV}}.

The user said it belongs in a context label but did not add any context label himself. This kind of procedure seems very unwiki to me.

I don't think we can fairly describe this in a context label: "Chiefly British" or "Chiefly American" do not seem appropriate context labels. It is not so clear what the prevalence in the corpora means; all we can do is state the prevalence and let the reader follow the GNV link to see for themselves. All it can mean is that Americans use "supermajority" to refer to their political supermajorities while EU uses "qualified majority" to refer to what they do.

What do you think? Does the usage note do more harm than good? I find it very useful, especially when paired with a link to follow.

--Dan Polansky (talk) 13:07, 3 November 2022 (UTC)[reply]

If that isn’t what the word “chiefly” means, then it’s not at all clear what it is ever supposed to mean. It’s also clearly escaped your attention that I did add a context label, but it would have been helpful if you could have bothered to do it yourself.
Rather than putting this information in a usage note that uses 5-10 times as many words as necessary, it is much better to simply use a context label - something that we do almost everywhere else. I also wasn’t aware that “British English” and “EU English” are synonymous. Theknightwho (talk) 13:12, 3 November 2022 (UTC)[reply]
Per WT:EL: These notes should not take the place of context labels when those are adequate for the job. Case closed. Theknightwho (talk) 14:59, 3 November 2022 (UTC)[reply]
It would, however, be helpful to indicate in the entry the more common British equivalent. Andrew Sheedy (talk) 15:20, 3 November 2022 (UTC)[reply]
@Andrew Sheedy It’s right under the definition. Theknightwho (talk) 15:21, 3 November 2022 (UTC)[reply]
supermajority,(qualified majority*4) at Google Ngram Viewer shows supermajority to be about 4 times as common as the other term in the American corpus. Does it make qualified majority "chiefly British"? Not to me: the term still sees very significant use in the American corpus. To me, "chiefly British" would require much smaller use in the American corpus. A problem is that we do not define anywhere what "chiefly" means numerically, something a professional dictionary would have to do. We have too many things uncodified. --Dan Polansky (talk) 18:27, 3 November 2022 (UTC)[reply]
If your concern is the precise meaning of the adverb "chiefly", then that is solvable by using a different adverb. I strongly suspect you're just nitpicking, though. Theknightwho (talk) 18:42, 3 November 2022 (UTC)[reply]
So which adverb? As I explained, my understanding of "chiefly" is different from what the data shows. The sentence I used does not suffer from that problem. I do not recall ever tagging entries as "chiefly US" or "chiefly UK" and I do not know what our guideline is for that tagging. My suspicion is that it is based on whim. The problem with crude labels is apparent in color entry, which says "color (countable and uncountable, plural colors) (American spelling) (Canadian spelling, rare)". By contrast, OED says "colour | color" and data shows "color" to be fairly common in the British corpus as of late[1]. Crude labels do not do justice to facts and OED does a better job than we do in its "color" entry. --Dan Polansky (talk) 19:10, 3 November 2022 (UTC)[reply]
So you're arguing that a word can be "much more common" in one corpus without it being "chiefly" used in that corpus?
Your color example is actually a great demonstration of why we need to take these NGram numbers with a heavy dose of salt anyway. Just because a variant is used in a corpus doesn't mean that it is actually accepted as being part of a particular variety of English by the speakers of that variety. There are other reasons why it might occur instead: spellcheckers, for instance. Theknightwho (talk) 19:22, 3 November 2022 (UTC)[reply]
'a word can be "much more common" in one corpus without it being "chiefly" used in that corpus?' Yes. Kind of obvious to me.
The color example shows real data, not guesses and unsubstantiated opinions. OED seems to think so as well given they say "colour | color". Given the data and the OED entry, it seems that "color" is now widely accepted in the British English. Supplementary evidence could challenge that idea, but mere unsubstantiated opinions won't. --Dan Polansky (talk) 14:50, 4 November 2022 (UTC)[reply]
The color example shows that you don't understand how to interpret raw data, and that you don't understand that the OED isn't limited to British English; you've failed to address both of these points. As a native speaker of British English, I can tell you pretty definitively that color is not "widely accepted" in British English. Other corpora do not support your point, either, given that BASE and British English 2006 contain almost 0 instances of color, and UkWac Complete and GloWbE show very low usage compared to colour. As someone who is not a native speaker of English and who does not even live in a country where English is the dominant language, you are not in a position to make the claim that you are; especially when you've stonewalled the obvious explanation that I've already given you.
Stop embarrassing yourself. You seem to have absolutely no idea what you're doing, and seem to be completely incapable of accepting that the conclusions you've hastily jumped to might be flawed; often fatally so. Theknightwho (talk) 20:56, 5 November 2022 (UTC)[reply]
As an American, qualified majority would confuse me; I would have assumed it's the normal use of qualified + majority, which could mean anything given the context. I'd think that the use of qualified majority in US English would either be in that broader sense, or specifically talking about the EU procedures and using the language they use to describe them.--Prosfilaes (talk) 18:00, 16 November 2022 (UTC)[reply]
I googled for US uses of qualified majority, and after pages of British or European uses, and a few US pages talking about the EU, I found a George Washington University article that used the phrase "qualified majority": "Along with Costa Rica, Argentina, Ecuador, and Nicaragua have adopted qualified majority-runoff rules; i.e., to win outright, the leading candidate must reach a threshold, but the threshold is lower than 50 percent of the vote."[2] That is, this US source uses "qualified majority" for almost the exact opposite of "supermajority".--Prosfilaes (talk) 18:07, 16 November 2022 (UTC)[reply]

Albanian proper noun lemmas - indefinite vs definite

I think lemmas for Albanian proper nouns should be the indefinite forms, like with common nouns, even if definite forms are more commonly used. There is no consistency and there are many duplications. So I have already created or changed indef. forms to be lemmas and def forms to be an inflected form only (focusing on country names for now).

Examples of pairs indefinite - definite (already checked and edited by me)

  1. Indi - India
  2. Francë - Franca
  3. Afganistan - Afganistani

Inflection and headword templates (incomplete) currently support the indefinite form to be the lemma.

Question: not sure if indefinite forms are always easily found but they probably exist. Do all proper nouns have both forms?

Please note Albanian Wikipedia uses definite forms in article names, e.g. "India", not "Indi".

Please comment if you have preferences or knowledge on the subject, as I have been making changes, so that less rework would be required. Anatoli T. (обсудить/вклад) 23:35, 3 November 2022 (UTC)[reply]

How come Korean verb conjugation templates split between different degrees of politeness, but Japanese don't?

Compare Category:Japanese verb inflection-table templates with Category:Korean verb inflection-table templates.

I just feel like Japanese verb templates would really benefit from the addition of ます forms, etc. These wouldn't be immediately intuitive to new language learners from the Japanese verb templates as they currently stand, when they're just as essential in the appropriate contexts in Japanese as they are in Korean.

So, what do you think? Dennis Dartman (talk) 00:57, 4 November 2022 (UTC)[reply]

@Dennis Dartman: The Japanese inflection of the formal ます (masu) is consistent and simple. It doesn't change dependent on conjugation type, unlike Korean (even if there are commonalities in Korean). Types 1, 2 or 3 have the same formal endings - -ました, -ませ, -ません, etc. Perhaps a link or a note in the conjugation table will suffice.
BTW, unfortunately, the Korean templates don't handle the conjugation of irregular verbs with 100% accuracy when the formal forms are lacking or the informal forms are lacking. Anatoli T. (обсудить/вклад) 01:11, 4 November 2022 (UTC)[reply]
Could you give an example for the wrong conjugations? @Atitarev AG202 (talk) 05:36, 4 November 2022 (UTC)[reply]
@AG202: There are a few. The latest issues are in Module_talk:ko-conj#Issues_with_the_module but you can see me in the same talk page. Both good knowledge of the Korean grammar and module writing skills are required but this could be a combined effort with building cases. Suppression and manual overrides (or a different type for the copulas, special verbs) would be required. Anatoli T. (обсудить/вклад) 07:30, 4 November 2022 (UTC)[reply]
@Dennis Dartman: Because the current Japanese inflection-table templates are actually not "tables", but rather "lists". As you can notice, these templates in a single line have kanji, kana and romaji, the three items essentially constituting just one single inflected form. Thus they are 1-dimensional, which I would call "lists", and can only give a few forms before becoming almost unreadable. We would need 2-dimensional tables to have enough space for those additional ます forms.
While it is possible to convert these templates to 2-D table-like structures, this may increase Lua memory usage which I am not sure would be a good idea to everyone. -- Huhu9001 (talk) 03:02, 4 November 2022 (UTC)[reply]
Well, considering Wiktionary is apparently okay with the likes of Template:sw-conj... Dennis Dartman (talk) 03:47, 4 November 2022 (UTC)[reply]
Re: memory issues. The Japanese conjugation template already uses Lua memory due to invoking Module:ja repeatedly, even though the main glue of the template is not written in Lua. In my testing, removing the conjugation template from 愛玩 saved a little over 2 MB, out of the 52 MB limit. For comparison, both Russian conjugation templates on минимизировать (minimizirovatʹ) combined, fully implemented using Lua, take less than 1 MB. My guess is that if the list/table were extended to have twice as many forms using its current implementation, that would require about twice as much memory. If the whole table were implemented using Lua (like the Russian ones are), the addition of more forms might incur a smaller additional cost because it wouldn't require loading the modules over and over. I haven't tested this though.
Anyway, memory would mainly be a concern on verb entries whose title is a single kanji character, since single-character pages tend to be the worst offenders for Lua memory overuse (due to having many language sections, each of which can be long). 98.170.164.88 04:44, 4 November 2022 (UTC)[reply]
I don’t think the additional workload is likely to be majorly intensive, and there are economies of scale in Lua if done right. The new Mongolian inflection template uses about 3MB (which is an inherent issue due to how the forms have to be generated, though there are about 3 times as many as Russian). Splitting off the independent genitives so that they have their own tables on the relevant pages didn’t save that much memory, despite cutting the number of forms from 117 to 53. Theknightwho (talk) 12:28, 4 November 2022 (UTC)[reply]

The Swahili conjugation template: is it unnecessarily convoluted? Could it use trimming?

Template:sw-conj is massive. Gargantuan.

And are we okay with this?

Feedback from a Swahili speaker preferred. Dennis Dartman (talk) 03:48, 4 November 2022 (UTC)[reply]

I recommend that you explain why this is bad, and what ideas you have got to fix it. (Feedback from someone who wants to solve problems preferred.) Equinox 03:50, 4 November 2022 (UTC)[reply]
I agree it seems rather unwieldy, but as long as it's collapsed by default how much of an issue is it? One benefit of having a comprehensive table is that if you come across a conjugated form and search for it, you'll find the entry for the stem.
I guess one drawback is that big pages take longer to load. For comparison, the table on ruka takes up about 309 kB of HTML, while the whole page (excluding JS, images, etc.) is 439 kB. The arm photograph in the Lower Sorbian section is 36.3 kB. 98.170.164.88 04:20, 4 November 2022 (UTC)[reply]
I already mentioned this at several places, without receiving any response. Another issue is that the table is …drumroll… woefully incomplete; there’s many more relative forms than that. (For example, on the very first page of Duniani kuna watu you’ll find the form linalompeleka. We do have the entry -peleka, but this form doesn’t show in search and I certainly can’t find it anywhere in the interminable collapsible boxes in the table.)
Furthermore the template often generates wrong forms.
Anyway, most of the forms (including the lacking ones) are 100% predictable. Trying to include all forms is like including “I wouldn’t have been robbed” in the conjugation table at rob. What’s the point? It drowns the useful information in a sea of cruft.
Finally, looking up every single word in a language you don’t know the basics of is a completely pointless exercise. Try and look up all the words in “I’ll make it up to you.” This won’t tell you the meaning of the sentence at all, because you should look up make up instead of individual words. Someone who doesn’t know the basics of a language should learn the basics of the language rather than looking up every single string surrounded by spaces they come across. Otherwise you just get Bud Carry Without Being in Love. MuDavid 栘𩿠 (talk) 02:06, 5 November 2022 (UTC)[reply]
There does come a point when agglutinative languages get completely out of hand, yes. Are any of the suffixes involved derivational? Certainly for Mongolian and Turkish, voices like the causative are treated as deverbal, and participles are given their own tables. Otherwise, their verbs would also end up with hundreds/thousands of forms in their conjugation tables. My Mongolian spellcheck dictionary boasts that it can handle 1.8 billion possible inflections, and there's no genuine grammatical reason for it to stop there; just a practical one. We should take a similar view, but probably want to draw the line in a different place (e.g. "with one who is not without one that has a horse" is one for the spellchecker that we probably don't need, and that's just looking at nouns). Theknightwho (talk) 03:04, 5 November 2022 (UTC)[reply]
I also complained about this a few months ago, but it seems I somehow did not have the tables collapsed by default the way nearly everyone else does. If other people complain, I just want to make sure that they are not having the same problem I once had. Soap 11:16, 5 November 2022 (UTC)[reply]

pinging @Dennis Dartman, Equinox, Theknightwho, Soap, Jodi1729, MartinMichlmayr, Metaknowledge, JohnC5, Habst, anybody else? I’ve been working on a proposal for new, trimmed, tables: here. There’s still work to do (see the issues I mention at the bottom of the page), but I feel I’ve progressed enough that feedback is welcome. Let me know what y’all think. MuDavid 栘𩿠 (talk) 03:38, 14 December 2022 (UTC)[reply]

Replacing bare lists of adjectives & nouns in usage notes

We currently have a number of entries which have usage notes containing (sometimes very lengthy) lists of nouns and adjectives that the main term is commonly used with. For example: at argument, majority & practice. In my opinion, these are extremely low-effort and of little-to-no use to a reader, given they provide zero contextual information, can't be used as signposts, and are laid out in a format that is much too dense for those who would use the info that they're trying to convey anyway. They're just unlinked blocks of text (which is particularly bad on mobile), with absolutely no information about how any of the listed terms are used with the word in question.

The largest problem, though, is that this is a major misuse of usage notes, which makes it harder to pick out any genuine usage information which is buried underneath. Usage notes are frequently one of the most important sections in an entry, given they usually contain (sometimes critically) important contextual information, which someone unfamiliar with the term needs to know in order to understand the term properly. We do not want to train our readers to skip over them, as these lists invariably will do.

Fortunately, we already have ways of displaying this kind of information: collocations and derived terms. These have several advantages, not least of which are that they're segmented off from other info (and therefore easier to parse), as well as the fact that they show how the two terms are used together. Not every adjective is in the attributive position, after all.

Given this only affects a relatively small number of entries (<50) at present, I suggest we nip this in the bud by converting all of these sections into collocations (or whatever else is appropriate in the context), and we disallow the addition of these bare lists going forward. @Dan Polansky has claimed to me that this is the "traditional" way of doing things, but if there was ever truth in that, it's obviously not how things are generally done now. Theknightwho (talk) 13:49, 4 November 2022 (UTC)[reply]

I used this format in the English Wiktionary for over a decade. It is more compact and the information conveyed is the same as the space-wasting format for collocations. The lists are as useful as collocations; the difference is that, instead of writing "A X, "B X" and "C X", I write A, B, and C and leave it to the reader to fill in X. This format is used by some collocation dictionaries. The format is very compact, ensuring that even a fairly long list of items takes little screen space.
I don't oppose anyone wanting to convert this to the space-wasteful collocation format, but it is not worth my effort and I find the more compact format preferable.
I find the lists very useful. They sometimes reveal deficiencies in our definitions. They are often more useful than badly chosen quotations of use, of which we have many.
The information content is nearly the same as with collocations, just more compact.
If I were a reader, I would be glad someone is actually willing to do this kind of menial work.
I would therefore appreciate if I were allowed to continue using that format, in part as impetus for and a recognition of the work being done, even if uninspiring menial work. It will be easy to convert the information to a collocation format using a bot later in volume if desired. --Dan Polansky (talk) 14:46, 4 November 2022 (UTC)[reply]
We use "Adjectives often used with" on 28 entries, "Nouns often used with" on 16, "Verbs often used with" on 6 and "Adverbs often used with" on 2. Several entries have multiple lists. If you have been using this format for over a decade, you clearly haven't been using it very often. It is certainly not a "traditional" format.
You also fail to account for the fact (which I have already mentioned) that we cannot use a bot to convert into collocations, because it cannot accurately predict how each of the collocated terms will fit together. In fact, you've addressed none of the issues. Please just do the work properly in the first place, instead of lying to everyone that your niche way of doing things is the de facto standard. I would also appreciate if you did not misrepresent the issue: the problem is obviously how the information is being presented, and not the fact that it is there at all. Theknightwho (talk) 14:59, 4 November 2022 (UTC)[reply]
Agreed with User:Theknightwho on all accounts; please use the collocation templates and section. See for instance how nice and tidy broken looks, much better than those plain lists that visually interfere with the actual usage notes. Furthermore, I hope that the new translation table improvements also affect collocation tables because that would allow us to display them even more concisely while not sacrificing readability. By the way, {{co-top}} has already been deployed 262 times, {{coi}} 4'902 times. If there's any standard, it's this. — Fytcha T | L | C 15:15, 4 November 2022 (UTC)[reply]
I don't like how broken looks; the repetition of the adjective feels unnecessary and a relatively short list takes so much space, using only two columns. German Wiktionary uses a compact format with plain comma-separated lists. And it is of course more typing. More place taken in the wiki code. The adjectives are not linked either in broken. The lists usually do not interfere with anything since most entries do not have usage notes. And collocations are in fact usage.
If forced, I will probably resort to using this unseemly co-top business, but it is really annoying. --Dan Polansky (talk) 15:23, 4 November 2022 (UTC)[reply]
In revision history of hopeless, I noticed there used to be my list of collocating nouns and someone has converted this to the new collocation format later. We did use to have many more of my lists before the new collocation format vote. --Dan Polansky (talk) 15:30, 4 November 2022 (UTC)[reply]
We used to live in huts and shit in the woods. It is equally irrelevant. Theknightwho (talk) 15:36, 4 November 2022 (UTC)[reply]
It supports my claim about traditional practice. The new practice is unlikely to be objectively better: German Wiktionary does not use it and Polish Wiktionary does not either, from what I remember. Some people happen to prefer this wasteful format and so do some collocation dictionaries; other collocation dictionaries don't. To liken other professional collocation dictionaries to "shitting in the woods" is outlandish. I always recommend paying attention to objective verifiable facts and contrast them to subjective preferences, whims and value statements. It would be more respectful and true to facts to recognize that people and their preferences differ and start from there. --Dan Polansky (talk) 15:41, 4 November 2022 (UTC)[reply]
The issue is that you've given no actual argument other than calling it wasteful, which is trivially disproven by the fact that we can put it in a collapsible box, and doesn't address the fact that not all collocations are formulaic. What other Wiktionaries do is not of any relevance, given they frequently mimic our practices. Theknightwho (talk) 15:47, 4 November 2022 (UTC)[reply]
"wasteful, which is trivially disproven by the fact that we can put it in a collapsible box": nonsense. It is visually wasteful once uncollapsed. Should not need to be said.
What other Wiktionaries and other collocation dictionaries do confirms that there is no "objectively best" way of doing it, and in fact, there's often no arguing about taste. I rest my case that we are dealing with subjective preferences, not objective facts of value. --Dan Polansky (talk)
Why does that matter when they can collapse and uncollapse it at will? These objections are absolutely surreal. You've not addressed a single concern raised here; you've just had a self-absorbed tantrum about having to do things a bit differently. Theknightwho (talk) 16:11, 4 November 2022 (UTC)[reply]
An interesting question: how many of these collocations we now have were added anew by people willing to do the menial work and how many of them are just converted collocations entered by me. This would help show how much editors take the value of collocations seriously beyond talking about them and regulating them. --Dan Polansky (talk) 15:46, 4 November 2022 (UTC)[reply]
Tagging @Vininn126, who is a big fan of collocations. Theknightwho (talk) 15:49, 4 November 2022 (UTC)[reply]
All of the collocations I add are taken from a Polish National Corpus. Sometimes it takes a very long time to do them. Vininn126 (talk) 16:00, 4 November 2022 (UTC)[reply]
  • I would think we would want these hidden by default, so that they wasted less space. Text in a show-hide bar could explain or hint at what lurked beneath. DCDuring (talk) 17:53, 4 November 2022 (UTC)[reply]
    It really depends on the amount of collocations. When it gets close to 7 for a definition I move them from inline to the collapsible box. Sometimes I even set up multiple boxes with senses or even senseid's. Vininn126 (talk) 19:16, 4 November 2022 (UTC)[reply]
    IMHO as soon as they take up more space than the show/hide bar, they should appear under it, collapsed. I'd use the number of columns that led to the smallest amount of vertical screen space occupied by these lists when expanded. Everyone wants to get a lot of space for their favorite content: etymology, pronunciation, citations, usage examples, etc.; now, collocations too. I still have the strong suspicion that users want definitions first and foremost. Their other interests vary. Registered users get to make appear what they want, whatever the default. DCDuring (talk) 22:29, 4 November 2022 (UTC)[reply]
  • Agree with Theknightwho and DCDuring, including when Theknightwho is being rude. MuDavid 栘𩿠 (talk) 01:36, 5 November 2022 (UTC)[reply]

The minds seem set, but for the benefit of the reader, let's consider 4 major collocation dictionaries:

Very interesting. The only one that does anything like Wiktionary is Cambridge. By my subjective taste, the format chosen by Wiktionary is greatly inferior, requiring visual parsing of the same repetitive element again and again. 3 of 4 collocation dictionaries agree. Oh well. I will add that whether an adjective collocates attributively or predicatively is largely irrelevant: it is still a collocation. A "rule" can be "rigid" in the predicative position, and that is also interesting. --Dan Polansky (talk) 14:32, 5 November 2022 (UTC)[reply]

Stop being so fragile. You have already made it very clear that you have contempt for the concerns of other users over the way you want to do things. No need to repeat yourself. Theknightwho (talk) 14:38, 5 November 2022 (UTC)[reply]
The current Wiktionary format is superior because it is also flexible in its presentation. You can just change your user JS to display collocations as text lists again if you insist. It's not possible the other way around: a script can't reliably parse your unstandardized plain text lists and convert them to bulleted lists. — Fytcha T | L | C 14:45, 5 November 2022 (UTC)[reply]
And how do I customize it so that the repetitive element gets hidden, to match the presentation in the collocation dictionaries? If it at least used tilde instead of the repetitive element, that would be quite an improvement. --Dan Polansky (talk) 14:54, 5 November 2022 (UTC)[reply]
You could make your own template and apply it. If there were enough use or interest someone might module-ize it. DCDuring (talk) 15:07, 5 November 2022 (UTC)[reply]
What is that supposed to mean? That is not JavaScript. I would have to edit mainspace, wouldn't I? That's not personal customization. --Dan Polansky (talk) 15:18, 5 November 2022 (UTC)[reply]
@Dan Polansky: You can do that by adding this line to User:Dan Polansky/common.js: document.querySelectorAll('.collocation .e-example b').forEach(e => e.textContent = '~') This only works if the proper template ({{co}}/{{coi}}) is used and if the term is bolded (which should be done anyway). — Fytcha T | L | C 15:33, 5 November 2022 (UTC)[reply]
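For readers who want to try the customization suggested above, here is a minimal sketch of the same idea wrapped in a reusable function. The selector is the one quoted in the comment; whether it matches depends on the markup that {{co}}/{{coi}} actually emit, which is assumed here rather than verified. The function name `abbreviateHeadword` and the `placeholder` parameter are made up for illustration.

```javascript
// Sketch: replace the bolded headword in collocation examples with a tilde.
// Assumption: collocations are rendered inside ".collocation .e-example"
// and the headword is wrapped in <b>, as stated in the comment above.
function abbreviateHeadword(root, placeholder = '~') {
  const nodes = root.querySelectorAll('.collocation .e-example b');
  nodes.forEach((el) => {
    el.textContent = placeholder;
  });
  return nodes.length; // number of elements rewritten
}

// In a browser (for example from a user common.js), apply it to the page.
// The guard keeps the snippet harmless where no DOM is available.
if (typeof document !== 'undefined') {
  abbreviateHeadword(document);
}
```

Passing a different second argument (for instance an en dash) would change the placeholder. Because this only rewrites the rendered page for the logged-in user, it does not alter the wikitext or what other readers see.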
Thank you; fair enough. The boldface is another bad idea: the repetitive items are obtrusive enough even without boldface. If the collocating items that vary were in boldface, that would make a little bit more sense. A problem with the customization idea is that we should provide best defaults possible. We will have to assume that the choice made is the best default from usability standpoint. I don't believe that at all and 3 dictionaries agree with me, but the minds are set, so it's what it is. --Dan Polansky (talk) 15:38, 5 November 2022 (UTC)[reply]
With your mindset, no-one would ever innovate. Theknightwho (talk) 15:52, 5 November 2022 (UTC)[reply]

Change "Middle Mongolian" to "Middle Mongol"

Discussion moved to WT:RFM.

Comments in quotations

If I need to clarify something minor in usage examples or quotations (for instance things for which English lacks a distinction), I abuse {{abbr}} to add my comment (see obe for a recent example). However, I'm wondering what the best approach would be for quotations where every other word merits a comment. This mainly happens when a language that is correctly written with (many) diacritics is informally written without them (see mierli for a recent example). I want to add the correct forms with diacritics so that learners know what to look up and how it is pronounced but I don't know how to best present this information. {{abbr}} seems inadequate because it is impossible to copy from the hover text (well, apart from editing the page) while {{sic}} after every other word looks woefully ugly and disrupts the reading flow. Ideas? — Fytcha T | L | C 20:03, 5 November 2022 (UTC)[reply]

@Fytcha: I always use block brackets [ ] - cf. the quote at hää. Thadh (talk) 21:53, 5 November 2022 (UTC)[reply]
Brackets or even another line if a whole quote needs normalizing (analogous to how translations are given on a second line) seems like the best approach. The latter might require updating the template if anyone wanted to have the template do it rather than formatting the cite "manually". If multiple words need normalizing or {{sic}}ing, it seems advisable to move the brackets to the end, i.e. not knot [not] sick [sic] everi [every] wurd [word], but instead knot sick everi wurd [not sic every word]. Related issue: in the 2003 quote on that page,
  • 2003 April 13, Spaima Limbricilor, “Ploua, ploua, Bombonel se oua! [It rains, it rains, Bombonel lays eggs!]”, in soc.culture.romanian, Usenet [It rains, it rains, Bombonel lays eggs!][3]:
it's weird that the template puts the trans-title= redundantly in two places (and not in the best place either time; I'd think it would ideally be placed outside of, but directly next to, the quoted title). - -sche (discuss) 01:20, 7 November 2022 (UTC)[reply]
What about using |tr= or |ts= for the normalized spelling? 98.170.164.88 01:24, 7 November 2022 (UTC)[reply]
What about languages that do need transcription or transliteration? I don't think that's desirable. Thadh (talk) 07:14, 7 November 2022 (UTC)[reply]
What I've done, e.g. at Pali ປິ (pi), is to split the transliteration line into transliteration of text as is and as normalised/corrected, separated by
"<br><span style='font-style:normal;'>With ambiguities resolved:</span><br>"
or similar using angle brackets in the actual text. The path is a bit complicated - the text is stored in Module:RQ:pi:Anisongfree in variable resolve and is formatted by {{quote-web}}. This technique is useful for Lao script, where the writing is usually ambiguous, and for older Tai Tham texts, where the spelling is idiosyncratic or simply atrocious. --RichardW57m (talk) 12:55, 7 November 2022 (UTC)[reply]
Interesting. Maybe this tells us that we could do with an additional parameter in our templates, something along the lines of |copyedited=. As for [], I think a new template should be created that attaches a separate CSS class to these comments. This has the advantages that their appearance is customizable and even toggleable and also that it is always clear which [] are from the source and which were inserted by us. — Fytcha T | L | C 13:05, 7 November 2022 (UTC)[reply]
@Fytcha: Good ideas! We need two versions of |copyedited=, one for the original script, and one for the transliteration, as it may sometimes be appropriate to edit the original script. While one could copy-edit my Tai Tham instances, typical Lao-script Pali writing systems cannot make the distinctions, which is why there was pressure to encode the Buddhist Institute's additions/restorations. In some cases, we might even want to correct the original and then resolve sandhi in the transliteration! That makes me think we need explanatory lines saying what the change is. --RichardW57m (talk) 13:26, 7 November 2022 (UTC)[reply]

Was this change a 'substantial or contested change' that should have required a formal vote?

In late 2014 and early 2015 a vote was held to decide whether to "[make] it official policy to delete entries which do not meet WT:CFI [...] even if there is a consensus to keep". That change was not enacted and the vote was closed "no consensus" with 7 supporting votes and 9 opposing votes (44% support).

Shortly after the vote was closed, Kephir, who entered an "abstain" in the vote, removed the template marking Wiktionary:Criteria for inclusion as a "policy, guideline or common practices page" and instead marked it as obsolete and "not intended to be used ever again" saying "how else can you interpret [the vote]?". Shortly afterward, BD2412, who did not participate in the vote, undid the change saying that "CFI can still be a guideline even if it is not mandatory where there is consensus for an exception". After that BD2412 added the following passage to Wiktionary:Criteria for inclusion:

In rare cases, a phrase that is arguably unidiomatic may be included by the consensus of the community, based on the determination editors that inclusion of the term is likely to be useful to readers.

In their edit summary for the change BD2412 wrote that "[t]his is what the vote really means."

More recently, PUC removed the passage added by BD2412 saying "[it] was never approved by vote" and also explained "so that it can no longer be invoked by the likes of Dan Polansky".

For context, during the time that the passage added by BD2412 was part of Wiktionary:Criteria for inclusion, I am aware of seven instances where it was referenced as part of a discussion (1, 2, 3, 4, 5, 6, 7)

Per a 2012 vote, "[a]ny substantial or contested changes [to CFI] require a VOTE". My question for now is "was the removal of the passage originally added by BD2412 a 'substantial or contested change' that should have required a formal vote?" If the consensus is "no", then this discussion can resolve with no further action. If the consensus is "yes", I will start a vote about whether the passage should be removed from Wiktionary:Criteria for inclusion. I appreciate hearing everyone's thoughts and hope we can approach this question narrowly. Take care. —The Editor's Apprentice (talk) 23:33, 5 November 2022 (UTC)[reply]

The removal by PUC is both substantial and contested, and per policy requires a vote. I understand PUC's action: the sentence arrived into CFI without a vote about that sentence. But the sentence was in CFI for over 7 years (diff) and has become entrenched; any admin who opposed its addition could have removed it in the days, weeks and even months that followed. Plus of the sentence: it documents our widespread practice of policy overrides, supported in some cases by nearly everyone, e.g. for hot words, for which no one played the stick-to-the-rules game. Without a sentence like this, CFI would be less honest. I don't really need the sentence anyway: I can always invoke User:Dan Polansky/IA § Policy override. Without policy overrides, editors must not invoke LEMMING ever again, nor "set phrase", nor "term of art", etc. We don't have Wikipedia's W:Wikipedia:IAR, and the sentence does this job in a nuanced and fairly weak manner, not so aggressive as the "Ignore all rules" phrase. And if we want to remove from CFI things that arrived without a vote and are not supported by consensus, fine, let's remove the irrational WT:COMPANY. If people want to improve the sentence and place it to other location of CFI, fine, let's do that, but as a replacement, not a removal. --Dan Polansky (talk) 07:38, 6 November 2022 (UTC)[reply]
It has not become entrenched, and was in fact already contested one year ago, as can be seen from the discussion I linked to above. Maybe no admin cared enough to remove it until now, and maybe some did not even notice it.
It's the addition of this sentence that should be submitted to a vote and approved by a 2/3 (or 60 percent, I don't particularly care) majority, not its removal. Submitting its removal to a vote means a 1/3 minority could have its say in what appears in the CFI. PUC 10:10, 6 November 2022 (UTC)[reply]
Should WT:COMPANY be removed from CFI as not arriving into CFI via a vote? --Dan Polansky (talk) 11:03, 6 November 2022 (UTC)[reply]
For others, do you wish the controversial phrasebook provision in CFI to be removed from it as not arriving into CFI via a vote? --Dan Polansky (talk) 11:04, 6 November 2022 (UTC)[reply]
None of this is relevant to the discussion. It's just whataboutism. Theknightwho (talk) 11:48, 6 November 2022 (UTC)[reply]
This kind of invocation of the concept of whataboutism is largely nonsense. The idea is: if one proposes to act in a way that can be questioned, one should identify the principle behind the action and examine the acceptability of the principle on a variety of specific examples. The idea is Kantian and Popperian. Here, the principle seems to be: "If part of CFI is controversial and was added to CFI without a vote, it should be removed without a vote even if it was in CFI for several years". I propose to investigate whether we want to accept the principle by applying it to a variety of cases. To apply the principle to some cases but not to others is to reject the principle.
PUC, are you acting on this principle or on another principle? Your answers to questions would be appreciated; I have no desire to interact with the Knight. --Dan Polansky (talk) 12:37, 6 November 2022 (UTC)[reply]
This is obstructionism, and an obvious attempt to muddy the water by making the discussion about something that it is not. Please stop. OP even requested that we approach this question narrowly. Theknightwho (talk) 12:43, 6 November 2022 (UTC)[reply]
If we are to approach the question "narrowly", then the only question is whether the change is a) substantial or b) contested. It is in fact both a) and b). Everyone should agree on that (which is glaringly obvious), make the agreement on record and move on. The core of this pickle is that a "narrow" approach seems unfair to PUC. It seems to me that there could be a meaningful interaction with PUC in which he would himself realize the approach he has taken is unworkable, and would undo his edit. --Dan Polansky (talk) 13:05, 6 November 2022 (UTC)[reply]
No, that isn't what it means to approach a question narrowly, and you're just trying to find ways to keep talking about things that are not relevant. You do love your false dichotomies, though. Theknightwho (talk) 13:20, 6 November 2022 (UTC)[reply]
I forbid myself to respond to the Knight in this thread. I allow myself to respond to PUC. --Dan Polansky (talk) 13:27, 6 November 2022 (UTC)[reply]
I believe this was a substantial change as well, as I have always operated under the assumption that this rule applied. I would think many editors did, since few of us were here back in 2015. Thadh (talk) 13:59, 6 November 2022 (UTC)[reply]
To quote what I said on Discord: "Agree it should not have been removed [as that] clause is not being evoked in every discussion anyways and plenty of entries get deleted [anyways]". If the issue is with Dan Polansky, then talk to him directly. As seen in the prior discussion there was no consensus as to whether or not that line should be deleted, and it was inappropriate to delete it, especially after having participated in the aforementioned discussion. And it was even more inappropriate to make such a sweeping change with no discussion in this forum and to target a specific user while doing so. I maintain that this, along with prior instances, is unbecoming of someone that has recently become an admin. I hope that another admin will revert this change while this discussion takes place. AG202 (talk) 15:11, 6 November 2022 (UTC)[reply]
See also: User_talk:PUC#call_the_fire_department, pages shouldn't be deleted like this while ignoring CFI. This combined with other behaviors related to "inclusionists" is very concerning, and it's not the first time that this has been brought up either. AG202 (talk) 15:18, 6 November 2022 (UTC)[reply]
I see two possible types of concern:
  • genuine concern: you simply care about my (not) following proper procedure, and would have objected just as strongly if I had summarily removed a sentence of CFI you don't agree with, or summarily deleted an otherwise valid entry you personally disliked (on account of offensiveness, for example);
  • ideologically motivated concern: you don't care as much about my (not) following proper procedure as about my challenging things that support your views.
Which one is it here?
As for your quote:
  • "[that] clause is not being evoked in every discussion anyways": what's the logic there? That clause still is a bad argument. It's a good thing when a virus isn't spreading everywhere; that doesn't mean we shouldn't try to get rid of it completely.
  • "plenty of entries get deleted [anyways]": so we should start grasping at straws?
PUC 20:53, 7 November 2022 (UTC)[reply]
I’ve called out “inclusionists” and “deletionists” alike for not following procedure and for many other reasons. Deleting a clause in CFI without discussion, however, is unheard of for me in my time here, and the fact that the edit summary was targeted towards a specific user, the other example I cited, and prior encounters, it raises it to a level of concern that is untenable for me at the moment. If you had edited CFI in the other “direction”, I still would not have been pleased. Though I don’t agree with everything done here, I have never gone as far as to unilaterally change something major on my own without discussion and consensus, even when I’m the only one with expertise, let alone something as powerful as CFI.
As for the quotes you’ve cited, the first one was in response to your edit summary where you removed the clause. When you said “Dan Polansky and the likes”, it makes it seem like it’s something that’s seeping in every discussion, when it’s not, and even if it was, you should’ve still had discussion about it here first. The second quote was less of an argument and more of a personal observation. In my time here, I haven’t really seen that clause itself save an entry enough times to warrant any sort of backlash at this rate (“a virus,” really?). I understand that you don’t like it, but that’s not how things work here as I’ve seen myself many times. There should be consensus. I’m just overall concerned that you took it upon yourself to make that change with the admin power that you recently got, almost in retaliation against described “inclusionists” when, to me, that’s not really what admin power should be for. AG202 (talk) 02:55, 8 November 2022 (UTC)[reply]

I've reverted myself. @The Editor's Apprentice: I do hope this will be put to the vote. As you can see, the sentence is controversial, and should certainly not get a free pass imo. It's unfortunate that it's been sitting there unchallenged for so many years. PUC 19:29, 6 November 2022 (UTC)[reply]

Since there seems to be consensus that the removal was "substantial" and/or should have followed a formal vote, I have started a formal vote, which is currently in the premature stage, to answer the question of whether the passage originally added by BD2412 should remain. Please give it a look and discuss any possible improvements or fixes on the vote's talk page. Take care. —The Editor's Apprentice (talk) 06:42, 8 November 2022 (UTC)[reply]

Let's deprecate the Thesaurus namespace

To be clear, I think there's a lot of value in giving synonyms, but I think there are some serious flaws in how we do it at the moment. I don't want to set out a detailed proposal for this without getting a sense for what the consensus is, but my overall impression is that we probably want to integrate it into the mainspace:

  1. Badly neglected and inconsistent. I think it's fair to say that not very many editors maintain thesaurus pages. They're inconsistently categorised (Category:English thesaurus entries is extremely incomplete), and have no standardised layout, which creates confusion for the reader. There are also wide inconsistencies as to whether we should be including the language code in the page name. None of this is desirable, and it certainly does not aid the reader. It's also obvious that the various clean-up jobs which have been done over the years on Wiktionary have bypassed the Thesaurus: the template still uses the acronym "ws" (despite the Wikisaurus name being deprecated back in 2017), there are still a bunch of interlanguage links (which have been removed everywhere else), and the templates still follow schemes that have been deprecated (e.g. they still use lang). These are all obviously fixable, but it is highly indicative of how much attention is actually paid to these pages by the majority of the editor-base (read: not much). The lists even still require manual alphabetizing, which is absurd.
  2. Potential clutter is easily avoidable. As with other sections which often contain lengthy lists (e.g. derived terms), there are ways of including these that don't clutter the page. The most obvious solution being to ensure that the section is collapsible.
  3. Better to be consistent with everything else. I can't think of a compelling reason for treating synonyms differently to everything else; particularly given that we only do this when lists of synonyms become longer. It's far more accessible (and better meets reader expectations) to treat synonyms consistently across all pages, whether the list has 3 entries or 300; just as we do with derived terms et al.
  4. A better model is already in use. The pages for Chinese already make use of an extensive system of modularised thesaurus templates, which can be placed on each page as necessary, and update automatically as new synonyms are added (i.e. bypassing the reason for having thesaurus pages in the first place). In the case of Chinese, these are primarily used to show dialectal distribution, but there is no reason why a similar system can't be used in a more general purpose way. You can see a bunch of these in use at 條#Chinese. Note: I am not saying we should use this layout; just that the underlying system can obviously be utilised by other languages. It's also not the only possible solution, but simply an example of how we could do things better. A less radical model would be doing what we do with translations, which is to point the user to the translation section on the primary entry.

What are people's thoughts? Theknightwho (talk) 22:47, 7 November 2022 (UTC)[reply]

One use case for a thesaurus is to try to gradually navigate to the mot juste (or a word you have forgotten). I've also used this e.g. when composing cryptic crossword clues and trying to create a convincing "surface reading". In such cases it's very useful to navigate in a thesaurus-only mode: when I see a candidate, I click it, and jump to the thesaurus page for that word, and thus get closer and closer. You see the same interface in e.g. Microsoft Word thesaurus. (We don't really have enough thesaurus root words to be able to do this, yet.) Equinox 23:03, 7 November 2022 (UTC)[reply]
That is a good point. I think the main problem that we have is that our thesaurus at the moment essentially just acts like an overflow, and I don't think it's likely to change anytime soon. I also suspect that any modularized implementation would allow both formats, and for greatly expanded coverage in thesaurus-only mode as well (as any input to page sections would also benefit that). Theknightwho (talk) 23:14, 7 November 2022 (UTC)[reply]
I agree that there is a lot of improvement to be made with respect to synonyms, and a dropdown template with a dozen words or so automatically added in seems like a great idea. But in my opinion, many thesaurus entries are so impractically long that they deserve their own pages. Do we really want the full list of synonyms in Thesaurus:drunk to display in the main entry? Ioaxxere (talk) 05:39, 8 November 2022 (UTC)[reply]
We already include pages with very large numbers of derived terms (comparable to Thesaurus:drunk); it doesn't seem like too much of an issue to me. Just have a look at the derived terms on neuro-, where they're not too difficult to parse (remembering that most thesaurus entries won't have lots of similar-looking terms like that, either). Theknightwho (talk) 07:01, 8 November 2022 (UTC)[reply]
Don't forget about the Lua memory limits. Having more content on main will likely push some entries over the cliff. Could we have a mixed model, where most of the content is in Thesaurus:, but some of it could be pulled into the main space? Perhaps the most salient synonyms? – Jberkel 08:22, 8 November 2022 (UTC)[reply]
Good point. I almost never remove synonyms from the mainspace. In my view, the mainspace should list some of the most common synonyms and then link to the thesaurus. By contrast, I have seen some editors remove synonyms from the mainspace, which I find not so good. --Dan Polansky (talk) 08:44, 8 November 2022 (UTC)[reply]
Lua memory limits are a concern only on a comparatively tiny number of pages. While the issue is obviously there, it's important to remember that the back-end for labels alone is considerably more burdensome than synonyms are ever likely to be, given the size of the module tables involved. Theknightwho (talk) 09:46, 8 November 2022 (UTC)[reply]
Yes, concerns are now only on a small number of pages, but only because content was moved *out* of pages. I think this should be the general direction to follow, moving non-essential content out of main, either to namespaces or to Wikidata. Editing large pages is already very slow right now. – Jberkel 10:29, 8 November 2022 (UTC)[reply]
Including a massive list of synonyms within the entry, many of them slang or obscure, would seriously hinder writers who are just looking for a single decent word. We should provide 10-20 of the most common and useful synonyms (and a handful of antonyms) as part of a template, provided with a link to the full thesaurus page. See the layout of Google's dictionary for what I basically mean. Ioaxxere (talk) 14:28, 8 November 2022 (UTC)[reply]
There are plenty of options for how we could lay things out. There is no obligation to do a massive list with no additional context. Theknightwho (talk) 15:26, 8 November 2022 (UTC)[reply]
I agree. Your example with neuro- gave the impression that you would like a list of synonyms to be laid out as such, but that would of course be less than ideal. Ioaxxere (talk) 15:33, 8 November 2022 (UTC)[reply]
I guess my point was just that there's plenty of precedent for having large lists in mainspace. I'd certainly prefer them to be subdivided sensibly, though. Theknightwho (talk) 15:48, 8 November 2022 (UTC)[reply]
Benefits are described here:
In brief, the thesaurus helps ease maintenance of semantic lists by centralization, allows focus on a single sense or place in the semantic space, allows focus on semantic relations to the exclusion of etymology, pronunciation, etc., and provides hints where to navigate next to find other semantic lists via the "=> Thesaurus" links next to items.
The problems raised do not seem intractable or very serious. The greatest problem is the lack of interest of editors, but I don't expect using the mainspace would improve that very much. The work on the thesaurus involves hard and unique challenges that most editors are not interested in. The bulk of the English thesaurus was made by two people with serious interest in it; AdamBMorgan did a lot of work there. An entry to consider is Thesaurus:number with all its structure and rich content, not constrained to synonyms, hyponyms and meronyms. To form a better idea of what's involved and what the mentioned benefits mean in practice, one has to look at some of the more interesting complex non-synonymic entries. (As an aside, the voters in Wiktionary:Votes/pl-2017-11/Restricting Thesaurus to English thought having a separate thesaurus is a good idea.) --Dan Polansky (talk) 07:25, 8 November 2022 (UTC)[reply]
Just compare how much attention derived terms get compared to the thesaurus. The difference is enormous. You haven't really explained why it needs to be in a separate namespace or presented any solutions to the (numerous) issues outlined, which are well-proven to be a problem judging by just how neglected and problematic the Thesaurus namespace currently is.
By the way, I'm going to nip in the bud any attempt to misrepresent this as being about the work involved or whether synonyms are valuable, because quite obviously I want to improve access to that, not remove it. Theknightwho (talk) 07:43, 8 November 2022 (UTC)[reply]
Derived terms are entirely trivial to add and figure out, requiring no skill to speak of at all; semantic relations, which emphatically are not just true synonymy, which is relatively boring and uninspiring, are a whole different beast. I recommend that readers read the page with the benefits articulated, and if anyone has any questions for me, please ask, and I will do my best. --Dan Polansky (talk) 08:01, 8 November 2022 (UTC)[reply]
What relevance does any of that have to how we present information? You're also wrong, but it simply doesn't have anything to do with the topic at hand. The thesaurus pages are in a sorry state, whichever way you slice things. Theknightwho (talk) 08:17, 8 November 2022 (UTC)[reply]
Let's try something different: where is the thesaurus data for Thesaurus:number to be stored? Directly in the mainspace, in number? What about Thesaurus:drunk, in drunk? Will there be templates and modules to extract the content from the mainspace entry drunk and show it in the synonym entries? --Dan Polansky (talk) 12:09, 10 November 2022 (UTC)[reply]
From reading the top again, the answer seems to be templates, like in some Chinese entries. So all the people who could not figure out the thesaurus will be able and willing to do essentially the same kind of information filtering, selecting, taxonomizing and sequential ordering (e.g. Thesaurus:number), just using the template namespace and template technology? Is using templates and modules really easier for non-technical mortals, perhaps semanticists, ontologists and philosophers in general, than using the markup in use in the thesaurus? And why could the proposed templating and module technology not be used in the thesaurus namespace? Could we thus retain the namespace but use the proposed technological change, provided the change really brings more pros than cons? --Dan Polansky (talk) 12:29, 10 November 2022 (UTC)[reply]
People seem to have no problem doing so with everything else that's done through modules. It's not that we're all too thick to work out how to use the thesaurus, if that's what you're implying. Theknightwho (talk) 12:52, 10 November 2022 (UTC)[reply]
Okay, let us assume (I don't) that editing modules and templates is generally as easy and general-editor-friendly as editing the current setup of the thesaurus. How is the semantic focus to the exclusion of everything else going to be achieved, given the semantic relationships are going to be transcluded into the mainspace in some way? --Dan Polansky (talk) 13:13, 10 November 2022 (UTC)[reply]
It's possible to use module data for more than one purpose. This is trivially obvious. Theknightwho (talk) 16:34, 11 November 2022 (UTC)[reply]
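To illustrate the point about one data set serving more than one purpose, here is a minimal sketch in Python. Wiktionary modules are actually written in Lua, and nothing below is existing code: the table `SYNONYMS` and the functions `entry_summary` and `thesaurus_page` are hypothetical names, and the word list is only illustrative. It assumes a single shared table, ordered most common first, from which a short list is shown in the entry and the full list on the thesaurus page.

```python
# Hypothetical shared data: synonyms per headword, most common first.
# This structure is an assumption for illustration, not an existing module.
SYNONYMS = {
    "drunk": ["intoxicated", "inebriated", "tipsy", "plastered", "sloshed"],
}


def entry_summary(word, limit=3):
    """Short list for the mainspace entry, with a pointer to the full page."""
    terms = SYNONYMS.get(word, [])
    shown = ", ".join(terms[:limit])
    more = len(terms) - limit
    if more > 0:
        return f"Synonyms: {shown} (see Thesaurus:{word} for {more} more)"
    return f"Synonyms: {shown}"


def thesaurus_page(word):
    """Full list for the thesaurus view, one term per line."""
    return "\n".join(f"* {term}" for term in SYNONYMS.get(word, []))


if __name__ == "__main__":
    print(entry_summary("drunk"))
    print(thesaurus_page("drunk"))
```

Both views read the same table, so an addition made once would appear in both places; how the data would be subdivided by sense is left open here.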
@Dan Polansky: I think it would have been better had you disclosed that you are the sole editor of the benefits page. By linking to a Wiktionary namespace entry where arguments are collected, people who forget to check the history are tempted to think that there is more support for your personal views than there actually is. Please don't try to make it look like something that is not the case (e.g. by writing Benefits are described here instead of "I've described the benefits here"). — Fytcha T | L | C 11:47, 8 November 2022 (UTC)[reply]
Fair point; I could have made it explicit that I am the sole author of the argumentation. However, the benefits are an exercise in argumentation and do not necessarily have objective factual validity, as is all too often the case with "benefits". Everyone has to form their own judgment. While there are some purely factually valid claims such as the listing of thesauri in other dictionaries, to what extent the factual claims are relevant or convincing is for the reader to determine. --Dan Polansky (talk) 11:54, 8 November 2022 (UTC)[reply]
Perhaps your views would be better-suited to your personal userspace. Theknightwho (talk) 15:25, 8 November 2022 (UTC)[reply]

Keep, but rename it to DanThoughts. --Vahag (talk) 11:07, 10 November 2022 (UTC)[reply]

new Vector 2022: wasted space

Looks like they've just added a banner prompting people to switch to Vector 2022. I tried it and the first thing I notice is that there's a lot of wasted whitespace on the right, which seems to serve no purpose at all; in addition, the left rail got wider, which combined with the wasted space on the right means the contents in the middle are a lot narrower. I gather many people will switch skins, but all the wasted space will mean we potentially need to make things significantly more vertical and less horizontal (maybe necessary anyway for mobile devices, but otherwise non-ideal). Is there a way to recover the space with some customization settings while not switching entirely back to Vector 2010? Benwing2 (talk) 04:04, 8 November 2022 (UTC)[reply]

Honestly it looks like a mobile version to me, but I'm just being grumpy. Vininn126 (talk) 10:11, 8 November 2022 (UTC)[reply]
They also somehow managed to make the title non-copyable in edit mode (again)  :/ (phab:T322725) – Jberkel 10:37, 8 November 2022 (UTC)[reply]
The only hope I have left after seeing that they didn't respond to well-founded criticism is that Vector 2022 is never rolled out as a default skin on en.wikt (do we have control over that?). It's patently clear that this skin was created with only Wikipedia in mind. — Fytcha T | L | C 11:53, 8 November 2022 (UTC)[reply]
Unchecking "Enable limited width mode" under "Skin preferences" recovers the space on the right, but the left side remains well padded. JeffDoozan (talk) 20:23, 8 November 2022 (UTC)[reply]
It might be possible to change the sidebar width. The DIV class seems to be .mw-panel but I was trying to figure it out a few days ago and for some reason it didn't work on the site, even though it did work in my HTML editor on my computer. There might be some CSS that's loading externally that's interfering with it. What I do know is that I've hidden the sidebar entirely on a private wiki where it just doesn't serve much purpose. I wouldn't want to hide the sidebar on Wiktionary, but I'd hope it is at least possible to compress it and make the font smaller since most high-volume editors won't need it very often. Soap 19:34, 11 November 2022 (UTC)[reply]

Including lists of notable people in a field

@Dan Polansky wants to include a list of notable philosophers in our thesaurus page (history) Thesaurus:philosopher. What do other people think about this? — Fytcha T | L | C 12:18, 8 November 2022 (UTC)[reply]

We do list instances in various entries in the thesaurus, and it makes sense, e.g. Thesaurus:country and Thesaurus:political party. The "instance of" relationship is well established in the thesaurus. In the discussed entry, it follows the example of Moby II and WordNet. In so far as mainspace entries should better be covered in the thesaurus, e.g. the sense for Aristotle should be covered somewhere and it naturally belongs to Thesaurus:philosopher. The choice of the notable philosophers is driven by criteria that, while arbitrary, are bound to two specific external lists, providing for maintainability. --Dan Polansky (talk) 12:26, 8 November 2022 (UTC)[reply]
I really don't understand the point of this proposal, at least as a proposal vs. an opportunity to troll. Most dictionaries do not include such lists. In particular, at least one dictionary with an affiliated encyclopedia does not: Merriam-Webster. Why should we be duplicating content available from a sister project? We would have at best the same content WP has in a mere listing page or category. DCDuring (talk) 16:09, 10 November 2022 (UTC)[reply]
@DCDuring To understand the objections better, I have posed a list of analytical questions below. If you would be so inclined and answered some of them, that would be great. The lead question is whether all instance-of relationships are a problem or something else is a problem. As for background, I do recall your objections to our having geographic names, so I am not surprised by your opposition. I am surprised that you spend most of your time here making a vastly incomplete replica of Wikispecies, but that's your choice, not for me to judge. --Dan Polansky (talk) 16:19, 10 November 2022 (UTC)[reply]
Not in the least interested in encouraging any more of this. DCDuring (talk) 18:21, 10 November 2022 (UTC)[reply]
@DCDuring We do have geographic entries as per voted policy supported by a 2/3-supermajority and we do have planets. Should Mars be removed from Thesaurus:planet since it is in an instance-of relationship? And should biological taxa be removed from the mainspace since they are generally considered names of specific entities and they do not show attributive use in widely understood meaning? There is in fact no policy protection for names of taxa. Where are your principles, if any? --Dan Polansky (talk) 19:06, 10 November 2022 (UTC)[reply]
I can see this as a type of hyponym, and I think it makes a certain amount of sense. I wonder if a link to a category page or something would be better. Vininn126 (talk) 12:27, 8 November 2022 (UTC)[reply]
One thing is certain: listing all merely "notable" philosophers rather than "very notable" would become unwieldy. One can try to figure out where to draw the line, and include fewer notable instances. The choice I made was based on two reasonably short external lists, one a thesaurus, one a semantic network that we picked the semantic relationships from. If there is a shorter canonical list, we can consider using that one. A comprehensive list of notable philosophers should indeed be delegated to a category. However, including senses for specific philosophers in the mainspace is still a controversial issue, with no policy regulating the subject, so filling the category would be controversial. --Dan Polansky (talk) 12:39, 8 November 2022 (UTC)[reply]
I see this as a can of worms: It opens us up to potentially endless content disputes over who is or isn't a philosopher, whether someone's pet philosopher is noteworthy enough etc. with no good mechanism to determine who's in the right. We could theoretically do the heavy legwork of meticulously defining a razor-sharp demarcation such that there is no dispute possible, but OTOH we could also just not include these. Providing a list of luminaries in a field is an encyclopedia's job. I also want to remind that there is currently majority (though currently not supermajority) support in an ongoing RFD to delete such a "surname-person-sense": Wiktionary:Requests_for_deletion/English#Dickens — Fytcha T | L | C 12:44, 8 November 2022 (UTC)[reply]
The thesaurus as a whole provides for potentially endless content disputes since so much of it cannot be algorithmically and deterministically regulated. Arbitrary external lists can be picked and there does not need to be any dispute. I picked two lists that do not in any way cater to my preferences but rather are "natural" picks. While Dickens is perhaps more vulnerable to poorly argued deletionist whims, Aristotle could be less so. I will also note that -ian/-ist nouns describing adherents (Platonist, Aristotelian, Marxist) are natural hyponyms of Thesaurus:philosopher, and will be probably listed anyway, and the selection problem will be the same or similar. As for the "covered by encyclopedia" argument, that alone has almost no force since a lot of dictionary content is necessarily covered by encyclopedias, and better so, e.g. names of laws, theorems and principles. Many dictionaries/networks do think this kind of content is inclusion worthy. --Dan Polansky (talk) 12:58, 8 November 2022 (UTC)[reply]
I strongly advise that you read WT:What Wiktionary is not. It is becoming extremely tiresome having to relitigate all the minutiae of Wiktionary because of your constant attempts at rules lawyering. There is an obvious and material difference between terms like Marxism and the philosopher Karl Marx. We focus on the former, and leave the details of the things those terms describe to Wikipedia. Theknightwho (talk) 15:35, 8 November 2022 (UTC)[reply]
My point is that the problem of selection criteria for surnames and -ian/-ist items is the same. If all -ian/-ist items are listed, they will be too many. And the list is fairly selective and interesting. Such a list is in fact not found in Wikipedia; you can try. --Dan Polansky (talk) 15:53, 8 November 2022 (UTC)[reply]
Why do we need lists for surnames or -ian/-ist terms? Both of those are already covered by categories. Theknightwho (talk) 15:57, 8 November 2022 (UTC)[reply]
To make the thesaurus more complete as regards hyponymy. The categories do not list hyponyms of "philosopher"; they list all derivations in -ian/-ist, not all of which are such hyponyms. At the very least, one should list a few examples to remind the reader that such hyponyms exist. --Dan Polansky (talk) 16:00, 8 November 2022 (UTC)[reply]
What you seem to be doing is manually creating lists that could be trivially generated from categories. Theknightwho (talk) 16:02, 8 November 2022 (UTC)[reply]
A selection of very notable instances cannot be generated from categories. That holds true for all instances for which we have names in Wiktionary, whether people, countries, cities, rivers, mountains, etc. In my view, listing at least some instances adds value. WordNet agrees. --Dan Polansky (talk) 16:13, 8 November 2022 (UTC)[reply]
Notable by whose judgment? Why do we care about a list of philosophers that you personally consider noteworthy? How does any of this relate to terms? Theknightwho (talk) 16:18, 8 November 2022 (UTC)[reply]
By the determination of an external list, not mine. Popper is missing on the list, a scandal. The point is that exemplification is better than no exemplification. I am fine discussing how many should be there, whether 50, 100 or 200. For rivers, I picked the longest ones, that's easier and more easily measurable than notability of philosophers. But notability of philosophers is also a fact; some are much more notable than others. --Dan Polansky (talk) 16:25, 8 November 2022 (UTC)[reply]
So you've just copied a list you found somewhere else? Theknightwho (talk) 16:26, 8 November 2022 (UTC)[reply]
(outdent) The list is a union of Moby Thesaurus II and WordNet for "philosopher", and precisely the union. Nothing personal. We may choose a different standard if we wish. --Dan Polansky (talk) 16:41, 8 November 2022 (UTC)[reply]
Oppose. Equinox 15:27, 8 November 2022 (UTC)[reply]
What is the substantive argument? Consensus should be based on a combination of voting and reasoning. --Dan Polansky (talk) 15:53, 8 November 2022 (UTC)[reply]
Maybe you better start another vote saying "everyone must explain their votes to Dan's satisfaction". I'm not an idiot. Equinox 09:07, 25 November 2022 (UTC)[reply]
Oppose. Obviously not dictionary material. Theknightwho (talk) 15:29, 8 November 2022 (UTC)[reply]
What is the substantive argument? Consensus should be based on a combination of voting and reasoning. --Dan Polansky (talk) 15:53, 8 November 2022 (UTC)[reply]
Nothing "obvious" about it given WordNet disagrees and so do multiple other dictionaries that do contain biographical entries. From the point of view of an external observer with no bias, it is dictionary material in so far as it is found in dictionaries. --Dan Polansky (talk) 15:56, 8 November 2022 (UTC)[reply]
It's explained in point 1 of WT:What Wiktionary is not: Wiktionary is not an encyclopedia, a genealogy database, or an atlas; that is, it is not an in-depth collection of factual information, or of data about places and people. Encyclopedic information should be placed in our sister project, Wikipedia. Wiktionary entries are about words. A Wiktionary entry should focus on matters of language and wordsmithing: spelling, pronunciation, etymology, translation, concept, usage, quotations and links to related words. Theknightwho (talk) 16:01, 8 November 2022 (UTC)[reply]
That's fine; it's just the "instance-of" relationship, not "in-depth collection of factual information" and not "data about places and people". And the quoted passage is flawed in that it does not even recognize semantic relationships as valid content. --Dan Polansky (talk) 16:09, 8 November 2022 (UTC)[reply]
Oppose. We are not a short-attention-span version of WP. DCDuring (talk) 16:46, 8 November 2022 (UTC)[reply]
More substance please. Neither is WordNet. Exemplification is a great principle. --Dan Polansky (talk) 16:48, 8 November 2022 (UTC)[reply]
You have thousands of words here. Stop demanding that people give you lengthy explanations; especially when you make absolutely no effort to come to a common understanding with other users anyway. Theknightwho (talk) 18:47, 8 November 2022 (UTC)[reply]
Clarifying my previous comment - ultimately oppose. At most listing a category or something; it's not lexical. Vininn126 (talk) 15:36, 8 November 2022 (UTC)[reply]
Oppose as well, it's not lexical as Vininn stated and amounts to pure taxonomising of things rather than words. —Al-Muqanna المقنع (talk) 16:04, 8 November 2022 (UTC)[reply]
What does it mean "it's not lexical"? Is the "instance of" relationship a problem, or just notable people? Can countries be listed in Thesaurus:country? --Dan Polansky (talk) 16:06, 8 November 2022 (UTC)[reply]
Individuals aren't lexical, that's an axiom. Vininn126 (talk) 16:11, 8 November 2022 (UTC)[reply]
Names of individual entities (people, rivers, etc.) are words, unless they are multi-word names and thus are "lexical" and even multi-word names are as lexical as phrases. "Instance of" is a semantic relationship, as per WordNet and common sense. --Dan Polansky (talk) 16:16, 8 November 2022 (UTC)[reply]
And categories of specific rivers are also not "lexical"? Should they therefore be deleted as encyclopedic? --Dan Polansky (talk) 16:17, 8 November 2022 (UTC)[reply]
Again: why are you duplicating the function of categories, while also removing a load of info? It just means either more maintenance work, or yet another thing that will become neglected; like much of the thesaurus already is. Theknightwho (talk) 16:25, 8 November 2022 (UTC)[reply]
What am I "removing"? I don't recall removing anything. --Dan Polansky (talk) 16:34, 8 November 2022 (UTC)[reply]
Re-read what I wrote. Theknightwho (talk) 16:42, 8 November 2022 (UTC)[reply]
Nothing to see there. I am not removing anything; I am doing exemplification on the model of Moby II and WordNet. WordNet is an amazing role model, absolutely astounding, regardless of the flaws that it necessarily has. There is no maintenance problem: the list is frozen as a union of Moby II and WordNet. --Dan Polansky (talk) 16:46, 8 November 2022 (UTC)[reply]
You're unbelievable. I obviously meant that you are including a cut-down version of a list that we already have. Stop finding every excuse to miss the point. Theknightwho (talk) 18:48, 8 November 2022 (UTC)[reply]
Yes, exemplification means that not the complete list is included. What you are saying is "exemplification is bad", without saying why it is bad. To my mind, complete lists are uninteresting: I have no interest in looking at a comprehensive list of 10,000 philosophers, most of whom will not ring any bell in my mind. The same for rivers: I would rather be reminded of some notable instances than of the first 200 items of a comprehensive list where the only claim the items make for themselves is that they lead the alphabet. If I wanted a complete list of rivers, I could go to Wikipedia anyway, or make a Wikidata query; I don't need a dictionary for that. --Dan Polansky (talk) 18:58, 8 November 2022 (UTC)[reply]
There have been arguments for that! Proper nouns are inherently different. Keeping them for etymologies and other information is one reason to keep them, but they are inherently different from common nouns. Vininn126 (talk) 16:28, 8 November 2022 (UTC)[reply]
Are you now saying that "Amazon" is not a word? I placed a source to Appendix:Wordhood claiming otherwise, although I find the claim that it is not a word an absurdity. --Dan Polansky (talk) 16:34, 8 November 2022 (UTC)[reply]
I did not say that! I said they are different. Please stick to the words that I use! Vininn126 (talk) 16:37, 8 November 2022 (UTC)[reply]
Fine, just answer "No" and we move on. We now have that "Amazon" is a word. Now, is the "instance of" relationship between "Amazon" and "river" a relationship that is "lexical"? --Dan Polansky (talk) 16:39, 8 November 2022 (UTC)[reply]
Amazon IS a river specifically, but it's ONE river, which is a different relationship than a TYPE of river which can refer to many instances of it. THAT is lexical. Vininn126 (talk) 16:46, 8 November 2022 (UTC)[reply]
But surely "instance of" is the relationship between the meaning of words "Amazon" and "River" and therefore is "lexical" (of or pertaining to words)? Why would hyponymy be lexical and "instance of" not given both are relationships between word meanings? --Dan Polansky (talk) 16:51, 8 November 2022 (UTC)[reply]
These are inherently different kinds of instances. This is a singular instance, one-of-a-kind, inherently by definition. Other instances, countable or uncountable, still refer to something that can be or be shared with multiple entities - which is why proper nouns are different from non-proper nouns and why the relationship is not lexical. If the Amazon belonged to a category of rivers that behaved differently than other categories of rivers, whatever word we used to describe that category would be lexical. Vininn126 (talk) 16:57, 8 November 2022 (UTC)[reply]
(outdent) There is no doubt "hyponymy" and "instance of" are different relationships, as recognized by Wikidata, although WordNet confuses the two a bit. But what does it have to do with the word "lexical"? What is the definition of the word "lexical" other than "of or relating to words"? And what is the business with the word "lexical" anyway? The words "Amazon" and "river" are semantically connected, and the thesaurus relationships are semantic relationships; the word "lexical" is not used for the purpose. And why are categories allowed to do something that the thesaurus is not? --Dan Polansky (talk) 17:02, 8 November 2022 (UTC)[reply]
Inclined to oppose. The example of political parties is shaky as I'm sure that many of those could be subject to RFD themselves (some have already been deleted), but the example of countries isn't relevant because those are explicitly allowed by CFI, whereas philosophers are not. A lot of the entries linked don't even have an entry for the philosopher mentioned, and a few of the ones that do don't feel super notable and could be subject to RFV/RFD. AG202 (talk) 16:51, 8 November 2022 (UTC)[reply]
Now that's a different line of reasoning. It would mean listing countries in Thesaurus:country would be okay because the names themselves as countries are guaranteed to be included. For philosophers, I would argue that their names are going to be included in some form, e.g. as Aristotle or as Russell, so they will mostly be bluelinks, and where they would be redlinks, {{ws}} enables saying link= to disable linking.
I believe among the large set of all Wikipedia-notable philosophers, those relatively few listed are likely to be very notable, given the selection made by the authors of WordNet and Moby II. A different list of notable philosophers could be chosen; I am fine with that. --Dan Polansky (talk) 16:57, 8 November 2022 (UTC)[reply]
No. Since when are dictionaries to present the instance level? There may be a “philosophy dictionary” doing this, but only inasmuch as it is not a dictionary but misnamed. You also have a link to a list of philosophers on Wikipedia which has better personnel for the same job.
Your attempts to deduce arguments from assumptions that you put in our mouths but we have not mentioned are all beside the point and a waste. You are the only one dropping the term “word” in this thread. WT:CFI saying “all words in all languages” is not specific enough to demand inclusion of anything that you deem a word. And still we do not ascribe value to “notability” in an absolute sense—the instance may be as notable as it can be, in the context of this project it won’t be as much. Fay Freak (talk) 17:05, 8 November 2022 (UTC)[reply]
Ever heard of WordNet? And Moby II thesaurus? Both lexicographical works? I made neither of them. Ever heard of Wiktionary topical categories for specific rivers? Not dictionary content? As for "word", I did not introduce the word "lexical" into the discussion and "lexical" means "of or relating to words". The point is not really notability but exemplification, and to achieve exemplification, one needs to make some arbitrary cut off or choice, which for rivers may be length and for philosophers may be notability. --Dan Polansky (talk) 17:13, 8 November 2022 (UTC)[reply]
It seems like you are the only one who esteems it necessary to cut off and around the dictionary arbitrarily. The other editors here work on some kind of system, which you seek every opportunity to deny by introducing anything to estrange em, though its foreignness to this place be immediately discernible, supported by the observation that few have much ambition to formulate rules—but this is your personal guiding theme, others just want to write a dictionary and not a philosophy of dictionaries, which they aren’t at a loss about. Fay Freak (talk) 17:27, 8 November 2022 (UTC)[reply]
Well, it is not reasonable to list all rivers as instances in Thesaurus:river, so if examples are to be given on the model of WordNet and Moby II, some selection has to take place, some arbitrary cut off. I don't understand what the above is all about. How does that relate to anything that I have said above? How does my editing of the thesaurus interfere with anything that others are doing? How does it impact "work on some kind of system"? --Dan Polansky (talk) 17:48, 8 November 2022 (UTC)[reply]
Why do you want to manually duplicate what we can already do with categories? And why do you want to do so in a way that requires an "arbitrary cut off"? These are arguments against your approach, because they make it very clear that there is no underlying principle here other than what you've decided to hyperfocus on today. Theknightwho (talk) 18:51, 8 November 2022 (UTC)[reply]
To exemplify, as I write above, not just philosophers but rivers, countries, mountains, mountain ranges, etc. I want to follow WordNet's wisdom. All I hear is "exemplification is bad", with no argument to support that notion. Exemplification is not duplication of a comprehensive list, by definition. --Dan Polansky (talk) 19:04, 8 November 2022 (UTC)[reply]

This is a lost cause, but let me make the point that the thesaurus is a word finder. To find the name of a very notable philosopher is to find a word, by starting with another related word, here "philosopher". Why make the word finding function less rich? Sure, other sources such as WordNet already do the job and are one click away, but why make the "word finder" less rich in its "word finding" capacity? It has enough space on the page. --Dan Polansky (talk) 20:40, 8 November 2022 (UTC)[reply]

The list provided a tool to navigate from philosophers to the derived adjectives: you click on the name to get to the mainspace and there you see the derived adjective. For that, the philosopher does not necessarily need to have a sense in the mainspace, only an entry for the name. Thus, one can answer the question: what notable philosophers have an adjective derived from them? Without the list, there is no way to do that in Wiktionary. A category of philosophers would only be there to serve the purpose if they all had senses in the mainspace, which is controversial; the thesaurus can work without that. --Dan Polansky (talk) 21:05, 8 November 2022 (UTC)[reply]

DuckDuckGo and Yandex and Twitter’s and Reddit’s search functions are also word finders, most useful to find usage and discussions of terms; doesn’t mean Wiktionarians should build a search engine to be accessed by the Thesaurus namespace. It’s a word finder but not for all kinds of words and only to find these limited kinds of “words” in a specific fashion. You are not making a point but a petitio principii the whole time, defining things as what you want them to be.
The list was a kludge, like an improvised explosive device. But you don’t have any mission to take on this site, repurpose the tools offered here to achieve your objectives, as you can just enter other sites and employ their devices. You are acting as though there were a frontline that, to extend your influence sphere, you would have to break by any argument imaginable, not being able to desert from your position, but actually we have to cooperatively restrict ourselves for a concentrated and coordinated effort to allocate scanty manhours, which are diluted if there is no prospect of contours in the resulting work. Fay Freak (talk) 21:41, 8 November 2022 (UTC)[reply]
I'm about 85% sure that I agree with you, but I have to admit that I did get lost in your second paragraph. Theknightwho (talk) 21:47, 8 November 2022 (UTC)[reply]
What nonsense. The semantic relations employed by Wiktionary and the thesaurus are modeled on WordNet, and I am merely following WordNet's lead, forming my own thoughts along the way and finding that I like the result, which is still in the revision history. There is no "repurposing" of the tool: there is use of the tool as designed by the tool maker WordNet. The above is pure rhetoric full of buzzwords and figures of speech while making no substantive argument. The claim that I am doing it "my" way is absurd since I am doing it the WordNet and Moby II way, and without them I would perhaps not even have come up with the idea that we should list a considerable number of instances. I tried to do what they are doing, already before the philosopher entry in geographic entries, and I find it cool. By contrast, it is the opposition that is doing it "their" way by disregarding practice in external sources that serve as inspiration. It is all the more curious given the opposition does not spend any resources on the thesaurus and has made derogatory remarks about it. --Dan Polansky (talk) 22:09, 8 November 2022 (UTC)[reply]
This an obvious no, this is what w:Category:Philosophers is for and does. - -sche (discuss) 09:52, 9 November 2022 (UTC)[reply]

In the spirit of Sisyphus, I will try to understand the problems raised or implied. Questions:

  • Is the problem with instance-of relationship? If so, planets have to go from Thesaurus:planet and countries have to go from Thesaurus:country.
  • Is the problem with cut-off on the number of instances covered? If so, specific rivers have to go from Thesaurus:river: it is not practical to list all the rivers and only a sample can be given.
  • Is the problem specifically with humans? If so, why? Why are specific humans more encyclopedic than specific rivers?
  • Is the problem with poor measurability of notability? If so, rivers could be kept in Thesaurus:river, but something would have to be done about individuals in Thesaurus:philosopher. Could we perhaps include philosophers whose names are used figuratively, as in "he is no Socrates"? We could thus exemplify without relying on notability.
  • Is the problem with including items that have no sense in the mainspace? If so, I could modify the list to include only such items: "items from Moby II and WordNet that are covered by mainspace" or "only items covered by mainspace".
  • Is the problem with duplicating Wikipedia? If so, why should we have Category:en:Rivers and why should its category structure involve the encyclopedic CAT:en:Rivers in the United States and CAT:en:Rivers in Alabama, USA?

To prevent derailing the discussion, I pledge to avoid responding to individuals who have been shown to produce unproductive arguments. There are some individuals who have produced interesting and relevant thought and I would like to hear from them. --Dan Polansky (talk) 09:42, 10 November 2022 (UTC)[reply]

I would personally be fine with, and encourage, removing the existing instances from rivers, philosophers, countries, and planets. When I say it is non-lexical, I mean that it is taxonomising referents rather than words. That is unavoidable to some extent when mapping semantic relationships, but instance-of relationships are entirely about the referents and shed virtually no light on words. Thesaurus:country, for example, would be much more useful to my mind if it had a more detailed list of terms related to countries, rather than the vast majority of the entry being a mechanical list of existing sovereign states. —Al-Muqanna المقنع (talk) 11:23, 10 November 2022 (UTC)[reply]
If you remove planets, the thesaurus will show no connection between Thesaurus:planet and Thesaurus:Earth and no connection between Thesaurus:country and Thesaurus:United States of America. I don't see how that disconnection can be desirable. Thesaurus:country listing countries does not prevent it from listing other terms. Granted, the country entry lists quite many instances, but if it is to connect thesaurus entries that are semantically related, it has to do it, or have a separate thesaurus entry just for the purpose, e.g. Thesaurus:country/instances. --Dan Polansky (talk) 11:52, 10 November 2022 (UTC)[reply]
I think our readers are smart enough to understand that absence on a Thesaurus page does not mean absence of any connection whatsoever. It can be replaced by a see also link to a category, which also prevents having to manually edit the information in multiple places. —Al-Muqanna المقنع (talk) 11:55, 10 November 2022 (UTC)[reply]
"any connection whatsoever" is a red herring and not under discussion, e.g. phonological connections. As for "referents rather than words", semantic relations are done via relationships between referents; for instance, hyponymy is for subset relationship on referents and meronymy is on part-of relationship of referents. Thus, we have meronymy in Thesaurus:Brazil that connects the referent of Brazil to the referent of Mato Grosso. --Dan Polansky (talk) 12:01, 10 November 2022 (UTC)[reply]
I agree that phonological connections are a red herring and not under discussion, and if people intuitively understand that exclusion from a thesaurus page doesn't exclude phonological connections, I'm sure they can also be trusted to understand that it doesn't exclude non-lexical instance-of relationships. These various examples are not particularly impressive to me; I don't think we gain anything from using the Thesaurus namespace to detail everything that's e.g. located within a country and if that is all it's being used for we could probably do without it entirely. —Al-Muqanna المقنع (talk) 12:32, 10 November 2022 (UTC)[reply]
Okay, hyponymy is a subset relationship on referents, how about that? --Dan Polansky (talk) 12:41, 10 November 2022 (UTC)[reply]

Collapsing the table of contents to only show language names

For many entries, it is quite difficult to find the language one is interested in due to the table of contents being excessively long. Take for example this page and compare it to the same page on the Spanish Wiktionary.

The Spanish Wiktionary has a nice solution: The table of contents is collapsed for all languages except the dictionary's main language (Spanish). I want to propose that we do the same.

This proposal is different from the last discussion in that section names are only collapsed, not removed, and the section names of the English entry would be shown.

(On a related note: Does it not annoy others that the sections on the mobile version aren't collapsed, or that we don't have a table of contents there? It takes forever to scroll down to the section one is interested in.)

--Hvergi (talk) 09:51, 9 November 2022 (UTC)[reply]

This seems pretty reasonable, with the caveat that we should keep the table of contents floating to the right instead of making a big block that forces all of the actual definitions in the entry way down the page. —Justin (koavf)TCM 10:04, 9 November 2022 (UTC)[reply]
I don't see benefit to this if there is only one language in the entry, or even two or three. The benefit seems to arise only for the relatively small proportion of entries that have a large number of L2 sections. I also don't see the benefit for Translingual items, whether they be for symbols, CJKV or other characters, or taxonomic names.
Is there a way to address the problem in the case of entries with large numbers of L2 sections without diminishing the value of the ToC in cases where it poses no problem? DCDuring (talk) 16:18, 17 November 2022 (UTC)[reply]
Honestly I wonder if mobile UI designers in general have been having a laugh at us for fifteen years, as the lack of scrollbars on any mobile browser that I'm aware of has been forcing us to flick, flick, flick our way through pages all this time, and it seems like such an easily solvable problem since there are quite often scrollbars in other areas of the mobile interface such as (on Android at least) the list of installed apps. The problem you mention would be more annoying to me if I wasn't dealing with the same thing on every other site already. Thanks for bringing it up, though. Soap 00:19, 25 November 2022 (UTC)[reply]

I was confused. You mean Vector legacy 2010 without Tabbed Languages enabled. With new Vector 2022, the TOC in the sidebar is collapsed for pages with more than 20 sections. --Vriullop (talk) 13:57, 21 November 2022 (UTC)[reply]

20 sections?? I have the misfortune of studying two languages that frequently coincide with other languages in their linguistic family and are last or near-last in the alphabetic list for that family. Namely, Portuguese, which comes after Catalan, Galician, Ligurian, Old (Catalan, Galician, Ligurian), Old French, and Old Portuguese (boy, makes me feel for students of Spanish!), and Ukrainian, which comes last in the Slavic family and almost dead-last in Cyrillic languages in general.
It’s excruciating when there are four fully-fleshed-out entries (including etymologies, declension tables, quotes, and related words). (I get annoyed when entries in Japanese are buried beneath “Translingual” and “Chinese”—but that’s just being curmudgeonly.) Why was the number 20 chosen, and can it be made a user-modifiable variable?
(I’ve found, btw, that currently it is much faster to collapse the earlier sections than it is to scroll them.)TreyHarris (talk) 19:55, 30 November 2022 (UTC)[reply]

Plato and whether concrete persons are subsenses of name senses

I converted the "Greek philosopher" sense of Plato to a subsense of the given name sense twice and was reverted by @Dan Polansky both times. I pointed to Trump, Clinton and Hitler for analogous cases, though there are admittedly also entries like Stalin where the person sense is on the same line as the name sense and entries like Plato, Aristotle and Socrates (before I changed them) where the persons and names are on different lines entirely. Dan then demanded that I provide evidence up to an impossible-to-meet standard (evidence that this is the "dominant" practice which is apparently more stringent than "many but far from all entries"), which is why I elected to instead bring attention to it in the BP (again).

I would be in favor of disallowing all person senses on pages where the set of page title words is a (potentially improper) subset of the set of name words of that person. As an example, Donald Trump should not be a permissible sense for either of the pages Donald, Trump or Donald Trump but it is okay for something entirely different such as Cheetolini. I don't think there's currently super-majority support for this so if we must include these person senses, we should at least include them in some way subordinate to the name senses (i.e. either as a subsense (which I prefer) or as is done in Stalin but certainly not as a separate sense) because that's what they are: The set of referents of the name sense of Plato is any person called Plato, which thus makes the philosopher sense merely a restriction of that, a subset of the set of referents, hence a subsense. — Fytcha T | L | C 21:27, 10 November 2022 (UTC)[reply]
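The subset test proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `person_sense_disallowed` is hypothetical, words are split on whitespace, and nothing here corresponds to an existing Wiktionary tool or policy.

```python
def person_sense_disallowed(page_title: str, person_name: str) -> bool:
    """Hypothetical check: True when every word of the page title is one of
    the person's name words, i.e. the title words form a (possibly improper)
    subset of the name words."""
    title_words = set(page_title.split())  # assumption: whitespace tokenization
    name_words = set(person_name.split())
    return title_words <= name_words       # subset test, equality allowed


# The examples from the comment above, for the person "Donald Trump":
for title in ["Donald", "Trump", "Donald Trump", "Cheetolini"]:
    print(title, person_sense_disallowed(title, "Donald Trump"))
```

Under this sketch, Donald, Trump and Donald Trump would all be ruled out as pages for a Donald Trump sense, while Cheetolini would remain permissible.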

I favour doing exactly what you did in the first place.
On a related note, I think we should introduce something similar to WP:POINT (if we don't have it already). I think we are all getting sick of having Wiktionary held hostage at this point. Theknightwho (talk) 22:03, 10 November 2022 (UTC)[reply]
  • The reason I reverted is in the edit summary: "restore the philosopher as the main sense following a long-term tradition: this is the primary activated semantic node under the symbol out of context; Plato the philosopher is extraordinarily notorious". I did so because I found the new format ugly and stupid, which of course is subjective. The nesting indentation helps nothing from usability perspective. Some editors started to change that practice, so it is now inconsistent. The objective of your edit seems to be to doubly demote the primary semantic node by changing it to the 2nd place and indenting it at the same time; and yet, the only translation table in Plato is for the philosopher. If there is consensus for a change, fine, let's find what the consensus is and make it a policy, issue closed. And since it seems to be a matter of preference and not of factually correct or incorrect, I think 60% should be a pass in this case; we should not be deadlocked on such issues only because we require the high standard of 2/3-supermajority and then let people fight the issues by back-and-forth in the mainspace. As an aside, having a dedicated sense in Trump for the president is user-friendly: if the user asks "what are the nicknames for Donald Trump", it is most straightforward to search for them in Trump entry, and there they are. It would be better if the president sense were not nested and indented, though; now it looks ugly and stupid. --Dan Polansky (talk) 07:21, 11 November 2022 (UTC)[reply]
    The issue is, I'm providing rigorous arguments for why something is a subsense, whereas you're just talking about your feelings and completely irrelevant things like translation tables. From your reply I take it that you have nothing to object to the actual logic of my argument which reduces your objection to "I acknowledge that subsenses are used correctly here but I object to their correct usage anyhow because I dislike them for subjective reasons." Is this an accurate characterization of your position? Also, judging off of WT:Subsenses and the linked discussion WT:Beer_parlour/2015/May#ELE:_explicitly_ban_nested_subdefinitions/subsenses?_Or_allow_in_rare_cases?, it seems like there is good consensus to not only keep them but to employ them more often. And while I personally don't care about WT:LEMMING much, I know that you do and I want to point out the fact that the majority of monolingual dictionaries (that I use) make frequent use of subsenses. — Fytcha T | L | C 12:09, 11 November 2022 (UTC)[reply]
    The contrast is between "we should be using subsensing more often" vs. "whenever there is arguably a subsense relationship, we should indicate it by indenting and nesting even if there are only two or three sense lines and even if the subsense has priority in the sense activation list over the broader sense." Maybe there is consensus for the latter as well, I don't know. As you see from the thread title, it asks whether we should "allow in rare cases", whereas some people seem to think it should be done nearly always when possible in principle. As for lemmings, I know of no lemming that has a Plato entry done the way you propose. As for subjectivity, there is an element of subjectivity but also objectivity: I believe the notion that the philosopher sense leads the activation list is very likely to be correct. What is subjective is the assessment of what takes priority, whether semantic relations (hyponymy and the like) or activation frequency relations and usability. --Dan Polansky (talk) 13:37, 11 November 2022 (UTC)[reply]
    On another related note, is anyone else getting sick of Dan asserting (without evidence) that whatever he prefers is always the status quo, and that it’s up to other people to overturn it? How about he accepts the burden of proof for once, given he has provided absolutely no evidence for that. As far as I can tell, it’s just a rhetorical tactic to stack the deck in his favour in every discussion. Theknightwho (talk) 15:09, 11 November 2022 (UTC)[reply]
His extremely long filibusters make discussions hard to follow. Equinox 15:29, 11 November 2022 (UTC)[reply]
I don't think Plato-person is a subsense of Plato-name, because a person is not a name. Rather, Plato-person has an instance of Plato-name; or his name (but not he himself) is an example of the name. In object-oriented programming you would never derive Person from Name. (My preference with specific people like Plato and Einstein is to put their Wikipedia links in the "See also" section, and only include them at all if they are the overwhelmingly commonest known person of that name.) Equinox 15:30, 11 November 2022 (UTC)[reply]
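The object-oriented analogy above can be made concrete with a minimal sketch. The class and field names here are purely illustrative assumptions, not anything used by Wiktionary: a person has a name (composition) rather than being a kind of name (inheritance).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """A name as a word, e.g. the given name "Plato"."""
    text: str


@dataclass(frozen=True)
class Person:
    """A person holds a Name as a field; Person does not derive from Name."""
    name: Name
    description: str


# Illustrative instance: the philosopher has the name, he is not the name.
plato = Person(name=Name("Plato"), description="Greek philosopher")
print(isinstance(plato.name, Name))  # the person's name is a Name
print(isinstance(plato, Name))       # the person himself is not
```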
This gets into some quite tedious (literal) semantics but it's worth noting that our name senses don't (generally) define the word as referring to a name, they tend to be non-gloss definitions explaining that the word is used as a name (for people). In that case I believe it's fair to talk about instances being subsenses, though it's ultimately really a presentational issue and I don't really mind either way. —Al-Muqanna المقنع (talk) 17:54, 11 November 2022 (UTC)[reply]
I don't care how tedious I am, if I'm right. ("Bureaucrat Conrad, you are technically correct. The best kind of correct!") It's definitely a strange question, and actually opens up whole cans of worms: e.g. if Smith is a name, but we have a plural Smiths, then what is it a plural of? Two Smiths are two people, not two names, but we wouldn't define Smith as a person (unless it was Einstein, haha!), and then even if we did define it as a person, then the plural would usually be two of any people with the name, and not two of the defined person. I can see how this seems like boring semantic dancing, but I think it's actually a strong indicator of why a dictionary, defining words, should not get into questions of individual personalities. Equinox 05:04, 12 November 2022 (UTC)[reply]
On the specific issue, like Al-Muqanna, I'm not really bothered by either presentation.
On the broader issue, Dan has been an obstructionist for as long as he's been here, also years before his hiatus. I'm loath to block a long-time editor who does also do some good work, but I do think we have a "one disruptive editor" problem more than a "we need new rules about POINTing" problem — no shade to TKW, rules can help in general or future cases, but rules can also be gamed and part of this user's MO is rules-lawyering, so at some point a community has to exercise discretion and block people who are not participating in the collaboratively-building-a-dictionary part of working together to build a dictionary. Seeing how many other people are also fed up with his obstructionism and filibustering independent of their feelings on the specific issues like this, I have bitten the bullet and blocked him, and repeat my block summary here in case the length gets cut off in the block log: "persistent, years-long history of disruptive editing and obstructionism; in particular, I highlight as w:WP:DE does that disruptive editing need not be "intentional. Editors may be accidentally disruptive because they [...] lack the social skills or competence necessary to work collaboratively. That the disruption occurs in good faith does not change that it is harmful"."
- -sche (discuss) 19:12, 11 November 2022 (UTC)[reply]
Further discussion at User talk:-sche#Dan_block. - -sche (discuss) 00:24, 12 November 2022 (UTC)[reply]
I want to thank @-sche for their courage in this decision; I don't see myself as ever having the courage to permanently block a long-time editor. It goes without saying that I am saddened that we miss out on the good work Dan could have done in the future for this project but I also agree that the status quo would have been untenable in the long term. I want it to be known that I, being a lover of second chances, would be in favor of unblocking him, provided that he abstains from participating in Wiktionary policy making, edit wars and the like (I don't want to provide a comprehensive list here because I'm sure Dan is smart enough to figure out by himself which kinds of edits are fine and which ones aren't). — Fytcha T | L | C 01:05, 12 November 2022 (UTC)[reply]

Is Proto-Norse a dialect of Proto-Germanic

The differences between Proto-Norse and Proto-Germanic are pretty small. Would it perhaps be a good idea to treat Proto-Norse as a late dialect of Proto-Germanic, much like we did for Frankish? -- {{victar|talk}} 02:46, 12 November 2022 (UTC)[reply]

@Mårtensås as the main (only?) editor of Proto-Norse. Thadh (talk) 00:44, 13 November 2022 (UTC)[reply]
Proto-Norse is an attested language; we have several hundred words in it. Now, the earliest Proto-Norse is so close to Proto-Germanic that it might better be classified as Proto-North-West Germanic (the common ancestor of North and West Germanic); Elmer Antonsen argues for this, and he is right in that it does not show any specifically North Germanic innovations, only common ones like *ē > ā, *-ai > ē, *-ō > -u. This would also solve the issue of a word like ᚱᚨᛇᚺᚨᚾ, which as it is is classified as Proto-Norse, even though it might just as well be "Proto-English", the two languages being almost identical at this time.
We further have certain innovations common to Anglo-Frisian and North Germanic, but not shared with the more southern West Germanic languages, such as the 3rd person plural present indicative *eʀun,[1] or 2nd person plural pres. ind. *eʀt[2]. Another one would be the collapse of the n-stem oblique conjugation into that of the accusative, as we already see in the genitive raihan above: Old English: -a, -an, -an, -an, Old Norse: -i, -a, -a, -a[3] Old High German: -o, -on, -en, -en. ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 11:19, 13 November 2022 (UTC)[reply]
All the pings: @Mahagaja, Rua, Mnemosientje, Mulder1982 --{{victar|talk}} 08:51, 13 November 2022 (UTC)[reply]
  1. ^ Old English: earon, Old Norse: eru, Old High German: sind
  2. ^ Old English: eart, Old Norse: ert, Old High German: bist
  3. ^ -n has been lost word-finally, but survives in early inscriptions which show that the change is at least from the 400s.

Japanese verb conjugation template discussion Part 2

User:Huhu9001 has created a new conjugation template here. I think it's fine, but what do you think of it? (Memory issues, etc?) Dennis Dartman (talk) 16:14, 12 November 2022 (UTC)[reply]

I really like the fact that forms and polarity are now split by axis but, due to the (IMO useless) furigana and romaji that are now presented vertically instead of horizontally, I still prefer the current table. The table in Module:User:Huhu9001/000/documentation is 1249 pixels high, the one in 泳ぐ#Conjugation only 568. — Fytcha T | L | C 16:36, 12 November 2022 (UTC)[reply]
Remember that on mobile, screen width is at a premium. Theknightwho (talk) 16:42, 12 November 2022 (UTC)[reply]
Well, tell that to the creator and supporters of Template:sw-conj! Dennis Dartman (talk) 18:44, 12 November 2022 (UTC)[reply]
I'm not sure how that's relevant, really. There are obviously extremes at both ends of the spectrum. Theknightwho (talk) 23:49, 15 November 2022 (UTC)[reply]
I like the improved completeness and precision of this new one for sure. It specifies clearly all the "principal parts", but I don't like that they aren't written in Japanese script anymore. It was nice to see hyperlinked 未然形, 終止形, etc., which are now missing. Also, the boxes, as we've established, are quite tall, and the comparatively thinner boxes of the old template were more aesthetically pleasing in my eyes. They wasted less space in conveying the same information.
Nevertheless, a huge bonus of the new one is the extra forms it has that the old one doesn't: distinguished "adverbial" forms, optative and presumptive, all the polite forms, etc.
In comparison, the old template was much more terse, so it wasn't so good a reference for all the possible forms the verb could actually take. In the wild, I'd seen the presumptive form used in writing, but Wiktionary never had anything to say about it in the conjugation template, so I only now got to understand it a little better. Kiril kovachev (talk) 22:52, 12 November 2022 (UTC)[reply]
I love it. I find it a bit odd to have the names of the "principal parts" without -kei (I don't think you ever use those words without -kei in Japanese when referring to the grammatical form). I'm not too bothered by having the actual Japanese words in kanji. I don't think principal parts is the term we want there though. It works with Latin, Greek, etc., but it's not really the same thing in JP. If we were actually giving "principal parts", we would (1) not repeat the identical forms (oyogu x2, oyoge x2), and (2) we would give "oyoi-" and "oyogo-" too. I would stick to something more traditional, like "bases" or "stems". (Well, if it was up to me I wouldn't even include them, since they were created to describe Classical Japanese and they are not really functional/useful when talking about Modern Japanese; on the contrary, they only create confusion, but that's another question.) — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 16:05, 15 November 2022 (UTC)[reply]
  • On the whole, I think this represents a number of improvements. This new format fills some important gaps in our older infrastructure, such as including polite forms.
  • I agree with @Kiril kovachev that the new format leaves out the linked Japanese grammatical terms that would go in the header under "Principal Parts", and with @Sartma that the final (-kei) should be added on the end of those terms, but these are minor issues and easy to fix. Perhaps something like "Verb Stem Forms" might be a better label than "Principal Parts". I do see value in including the verb stems under the labels, as these are still relevant in modern Japanese grammar. Where the stem forms are identical, we could merge the table cells -- easy enough, since the identical forms always occur between the adjacent 終止形 (shūshikei, terminal or predicate form) and the 連体形 (rentaikei, attributive form) (both ending in -u), and between the adjacent 已然形 (izenkei, realis form, for Classical) / 仮定形 (kateikei, hypothetical form, for modern) and the 命令形 (meireikei, imperative form) (both ending in -e for so-called quintigrade verbs, but distinct for other verb classes).
  • Minor note: "Reitai" is a typo for "Rentai", cf. 連体形 (rentaikei, attributive form).
  • I think a couple of the English-language labels on the left might be problematic. "Subjunctive" doesn't quite capture how ~たら (-tara) or ~ば (-ba) function, for instance: both can express "if" conditions, with ~ば expressing more of a prerequisite causal relationship than ~たら. I am more accustomed to seeing these described as "Conditional". By way of example, a subjunctive statement such as "it would have been better if I had gone" can be expressed in Japanese without using either ~たら or ~ば: 行ったほうがよかったのに (itta hō ga yokatta no ni). Meanwhile, the ~て (-te) forms are more conjunctives than adverbs: compare 眩しく明るい (mabushiku akarui, blindingly bright) and 眩しくて明るい (mabushikute akarui, blinding and bright).
  • In terms of usability, I am concerned about the use of gray text on a gray background for the Japanese terms in the table -- lower contrast is not ideal for accessibility reasons, and some of our readers will struggle to see this clearly. Black text on a white background would work better.
  • I also agree with @Fytcha that the furigana (the smaller characters above the kanji, provided as a phonetic guide) are not terribly useful here -- anyone who can read the hiragana used for the furigana can already understand how the okurigana break down, so the correlation with the romanized text is obvious. And anyone who cannot read even the hiragana would need the romanized text, and would have no use for the furigana. I would propose to remove the furigana, and thereby save some space in the layout. The furigana features require Module:ja-ruby, so cutting out this dependency might also save a bit on Lua memory.
These minor changes aside, I think we would be well served to try this out with the different verb types (quintigrade, -i and -e monograde; also the different ending morae for the quintigrade, such as ~ぐ -gu vs. ~く -ku, etc.), and ideally use this to replace our existing Japanese verb conjugation table templates. Props to @Huhu9001 for tackling this. ‑‑ Eiríkr Útlendi │Tala við mig 19:38, 15 November 2022 (UTC)[reply]

I insist on my terminology.

Principal parts:
  • This name most accurately describes the role of mizenkei, etc. in Japanese grammar. They were inventions of traditional grammarians, as a mnemonic tool to help remember the whole inflection, despite language change in Modern Japanese rendering them flawed. Traditional dictionaries would give them as clues for other inflected forms, should they give any. They can be ignored if you have better ways to memorize.
  • "verb stem forms" is a particularly uninformative name. It could not be more obvious that this term is a "verb", and that everything in the table is a "form". Meanwhile the word "stem" is overused. In the headline we already have a "stem" which means ren'yokei, and sometimes we talk about "consonant or vowel stems" which is actually something like せ for する. Here we are not to add another six "stems", as this word is becoming almost meaningless.
Subjunctive:
  • Calling them "conditional" is simply a misunderstanding of what the "conditional mood" means. In an if-sentence, the conditional mood is the one in the main clause. The mood in the if-clause is called "subjunctive". Japanese -ba and -tara are obviously in the if-clause, not the main clause.
  • The example 行ったほうがよかった only shows that -ba/-tara are paraphrasable. That's an unrelated topic.
Adverbial:
  • While some -te forms translate into English "(verb) and (verb)", not all of them do. -te has a variety of meanings other than "(verb) and (verb)" which names like "conjunctive" would obscure. In any case -te forms are, even when they do translate into "(verb) and (verb)", syntactically adverbial phrases, making "adverbial" the most proper name.
  • "Conjunctive" is easily conflated with the "conjunctive mood", something totally different. -- Huhu9001 (talk) 03:11, 18 November 2022 (UTC)[reply]
@Huhu9001: Principal parts are something very precise. Despite their name, principal parts are not just "parts", but fully formed/inflected words. That's why I'm saying this definition doesn't apply in the case of the 活用形 (katsuyōkei). The mizenkei of iku (to go) is ika-, and it's not a fully formed/inflected word. The fully formed word would be ikanai or ikō. Actual principal parts for iku would be, for instance, iku, ikimasu, ikanai, ikeba, ikō, itta, since remembering those forms would allow you to derive all other inflections.
So no, "principal parts" is not the name that most accurately describes the role of Japanese katsuyōkei. On the very contrary. If you stick to it, you'll just be misusing it. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 16:43, 18 November 2022 (UTC)[reply]
@Huhu9001: Actually, for iku you would just need iku and itta as principal parts. That's all one needs to know to conjugate the verb in all the other forms. You see what thinking in terms of katsuyōkei does to one's brain? It stops you from seeing the obvious! Lol. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 17:40, 18 November 2022 (UTC)[reply]

Correct regional label for Switzerland Alemannic

What is the correct regional label for entries like Chlapf? I added (Switzerland) but, strictly speaking, this is wrong because there are Swiss dialects that use Klapf instead. The reason why I feel like I need to add a label in the first place is because I don't know whether it is used in Alsace, Northern Italy etc. and so if I don't add a label, it implicitly gives off the impression that this word is used in these dialects as well. It's not the first time I've been confronted with this dilemma. Pinging @Widsith, Linshee. — Fytcha T | L | C 18:06, 12 November 2022 (UTC)[reply]

We've discussed this kind of thing a bit before on our respective Talk pages I think. Personally I don't think having a "Switzerland" label on gsw entries makes much sense, since I think it's assumed we're talking about Swiss German in the absence of any other label. Even in such a diverse dialect continuum as Alemannic German, at some point you have to assume that a particular form is a majority form, and then the other variants would be listed under an "Alternative forms" section. At least that is how I would handle it, from a practical point of view. Ƿidsiþ 06:16, 13 November 2022 (UTC)[reply]
@Widsith: Apologies for the late reply. Yeah, I think this is a very reasonable middle-ground solution and I think it should be codified into WT:AGSW (WIP). The only downside I can think of is that it would be impossible to find all Swiss German lemmas (without external processing), i.e. all lemmas that would be understood in Switzerland. It seems at least half-way plausible that somebody would want to look that up, but oh well, let's just keep it simple. — Fytcha T | L | C 16:44, 26 November 2022 (UTC)[reply]
@Fytcha If we knew it wasn't used outside Switzerland, then I'd think "Switzerland" was a fine label, since I don't think any of our labels assert that every single speaker and sub-dialect in the region uses the term. For example, I'm sure there are terms labelled "(UK)" or "(US)" for which some sub-dialects have other terms, but "(UK)" is still a useful label. But if you aren't sure it's limited to Switzerland, then...yeah, that's tricky; I understand the desire not to present something as pan- or non-dialectal if you don't know that it is, but presenting it as Switzerland-only (when you don't know that it is) also seems suboptimal. What we really need, not just for Alemannic but also Low German and probably other languages (Chinese?), is a system for marking "known to be used in at least the following dialects: ...". - -sche (discuss) 10:43, 26 November 2022 (UTC)[reply]
@-sche: A system to distinguish whether a given set of context labels is known to be complete or not would probably be the best long-term solution, but I don't see us implementing that anytime soon. I agree with your contention and I think we should just use no label at all for such entries for the time being. — Fytcha T | L | C 16:44, 26 November 2022 (UTC)[reply]

Lingua Libre .ogg Format?

Under Help:Audio pronunciations, it looks like it's been agreed upon that we should use the .ogg format, "because it is a free format". Naturally, I agree with this, but using the Lingua Libre resource mentioned on that same page creates .wav files, rather than .ogg. Is this okay, and is there any way to record in the .ogg format on Lingua Libre? I'm just concerned that the wave files generated are uncompressed, so might waste a lot of space on the Wikimedia servers when a simple ogg would do fine. Thanks for any feedback, Kiril kovachev (talk) 22:41, 12 November 2022 (UTC)[reply]

@Kiril kovachev I wouldn't worry about wasting space - there's a filesize limit for a reason, but you won't be anywhere near it. Theknightwho (talk) 17:00, 15 November 2022 (UTC)[reply]
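For a rough sense of scale, the size of an uncompressed PCM WAV recording is roughly sample rate × bytes per sample × channels × duration, plus a small header. A minimal sketch of that arithmetic follows. The 48 kHz, 16-bit, mono, two-second figures and the 44-byte header are illustrative assumptions, not values confirmed for Lingua Libre, and `pcm_wav_bytes` is a hypothetical helper name.

```python
def pcm_wav_bytes(seconds, sample_rate=48_000, bits_per_sample=16,
                  channels=1, header_bytes=44):
    """Approximate size in bytes of an uncompressed PCM WAV file.

    All defaults are assumptions chosen for illustration; real recordings
    may use a different sample rate, bit depth, or header layout.
    """
    bytes_per_sample = bits_per_sample // 8
    audio_bytes = int(seconds * sample_rate * bytes_per_sample * channels)
    return header_bytes + audio_bytes


# A single-word recording of about two seconds (assumed duration).
size = pcm_wav_bytes(2.0)
print(f"{size} bytes, about {size / 1024:.1f} KiB")
```

Under these assumptions a single-word clip comes to a couple of hundred kilobytes, which is consistent with the point that individual recordings are nowhere near any file size limit.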

the vowel of floor, horse, etc in GenAm

Many entries notate the vowel of floor, core, hoarse, horse, etc only as /ɔ/, as if it were the vowel of flaw and caw. Since it's not (in GenAm), and one can even contrive minimal sets like core vs. a poetic caw'r (monosyllabized like o'er...for which we comically give /oʊ/ only in a disyllabic pronunciation), I think /ɔ/ is misleading. Some editors add /o/ or /oʊ/ to entries here and there, but we should approach this categorically. Merriam-Webster says the vowel+r of floor, core, hoarse, horse, north and force is "[oɚ, ɔɚ]" (they write "ȯr", but clarify that this means IPA [oɚ, ɔɚ], vs. the "ȯ" of flaw which is [ɔ]). Dictionary.com writes the vowel of floor, core, and hoarse (but not horse) as /ɔ, oʊ/ without any suggestion that /oʊ/ is restricted to the few dialects that haven't undergone the hoarse-horse merger. (The 1933 OED, which distinguished hoarse (hōᵊɹs) from horse (hǭɹs, with italics), also distinguished horse's vowel from haw's (hǭ, no italics).)

I think we should categorically include GenAm /o/ or /oɚ/ pronunciations in all these entries, either alongside /ɔ/, or relegating GenAm /ɔ/ to a separate {{a|without the hoarse-horse merger}} line. (/o/ is not limited to the few accents that don't have the hoarse-horse merger, as entries like floor currently claim, because as M-W indicates, it's the pronunciation with and which resulted from the merger.) Agree, disagree, other ideas? Catalyst for this was diff ~ diff and Talk:Florida; pinging Whoop whoop pull up, Tharthan and Soap, plus Mahagaja who's had things to say about this vowel in the past. - -sche (discuss) 21:48, 13 November 2022 (UTC)[reply]

/ˈflɔɹ.ɪ.də/ sounds to me like something from Noo Yawk and points north, but I don't use /ɔ/ except in diphthongs- so I may be an outlier. I don't think it's a coincidence that Tharthan is from an area that uses /ɔ/ a lot. Chuck Entz (talk) 22:21, 13 November 2022 (UTC)[reply]
It's an extremely complicated issue. For speakers with the horse/hoarse merger, the starting point of the rhotacized vowel in question is more open than the starting point of the goat vowel but closer than the thought vowel, even for people who distinguish cot and caught and have a real /ɔ/ in the latter. Therefore, the north/force vowel can be thought of either as /ɔɹ/ (with the understanding that the /ɔ/ is closer here than in its nonrhotacized equivalent) or as /oɹ/ (with the understanding that the /o/ is more open here than in its nonrhotacized equivalent). (Incidentally, I wouldn't use /oʊɹ/ since there's really no [ʊ] offglide, and we aren't generally using /ɚ/ to transcribe the ends of rhotacized diphthongs.) For speakers that distinguish horse and hoarse, the transcriptions /ɔɹ/ and /oɹ/ respectively make much more sense (but in the U.S., most speakers who make this distinction are either also nonrhotic speakers, meaning it's actually a matter of /ɔː/ vs. /oə/, or else they've merged north with start rather than with force, meaning it's a matter of /ɑɹ/ vs. /oɹ/). However, it's important to remember that Florida (and foreign, forest, orange, etc.) are neither north words nor force words, but lot words where the stressed vowel precedes an intervocalic /ɹ/ (as shown by the fact that New Yorkers say /ˈflɑɹɪdə/ but not */nɑɹθ/). So however we decide we want to transcribe north and force, it won't have an effect on Florida, which will need a separate decision. I'd also point out that all previous attempts to impose any sort of consistency to our transliteration of GenAm and other U.S. accents have failed miserably, as everyone has their own ideas as to the best system, and that inconsistency has already been enshrined in Appendix:English pronunciation, which has been written to be tolerant of using a wide range of symbols to represent the same sounds. —Mahāgaja · talk 22:28, 13 November 2022 (UTC)[reply]
I'm originally from Central Massachusetts, and /ɔɹ/ in any context sounds completely alien (in fact, whenever I try to pronounce it myself, it comes out as /ɔɚ/), even though I'm from an area that uses /ɔ/ a lot (you can thank the cot/caught and father/bother mergers for that) and where /o/ is itself completely alien except in /oɹ/ (as in bore), /ol/ (as in bowl), /oʊ/ (as in bow), and /oɪ/ (as in boy). Just throwing this out there. (And, for that matter, thinking about some of the contexts where /o/ does occur is leading me to suspect that a lot of our diphthong notations show the wrong vowel for their offglides...) Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 00:55, 14 November 2022 (UTC)[reply]
I would like to point out that we currently transcribe the word Florida with the syllable boundary after the r, not before it, so it's FLOR-i-da, not FLO-ri-da according to us. This is in keeping with words like massive, where a single consonant after a stressed syllable is grouped with the preceding vowel. My theory that I posted on talk:Florida was that the dual pronunciations arise because different speakers, even within the same dialect, might parse the syllables in different ways, with some of us thinking of it as FLOR-i-da and others as FLO-ri-da. Since as you point out there are very few speakers of American English for whom IPA(key): [flɔr] is a possible word, it may be that those of us who group the /r/ with the first syllable have only the "floor" pronunciation to choose from, while those who mentally parse the first syllable as open can pronounce it with any vowel that's valid in an open syllable. Thus, it might be that this word does not follow the usual dialectal patterns. Soap 01:03, 14 November 2022 (UTC)[reply]
Yes IMO the vowel of 'floor' and 'horse' is clearly /o/ not /ɔ/ in GA. Note that GA speakers who merge 'cot' and 'caught' do not merge 'car' and 'core', which suggests rather strongly that the vowel in 'core' is not /ɔ/. It is probably time for me to try creating an English pronunciation module; I've been spending the last couple of months on Portuguese pronunciations but that work is coming to a close. Benwing2 (talk) 02:19, 14 November 2022 (UTC)[reply]
Exactly (re car). Another example that just occurred to me is that "floor eight" or "the floor ate it" (to invent a more creative excuse for not having my homework than "the dog ate it") clearly has a different vowel from "flaw rate" or "the flaw rate it (has is XYZ)" in GenAm. - -sche (discuss) 02:34, 17 November 2022 (UTC)[reply]
I support the /oɹ/ representation for the north and force merged vowel. I think I analyzed it as another instance of the goat vowel /oʊ/, before I learned about these IPA systems that write it with an ɔ symbol. It is generally a bit lower phonetically than the vowel in goat, but I think that's a natural effect of the velarized /ɹ/ after it. It's usually very different in quality from the caught vowel /ɔ/ or /ɑ/ in accents under the General American umbrella, such that I don't think anyone perceives north and force as having the same vowel as caught. Though it's a separate issue as Mahagaja says, for words like Florida I would support two General American transcriptions, one for the Flarrida pronunciation (I guess /ˈflɑɹədə/ with the same vowel as lot), probably influenced by NYC and neighboring accents, and one for the Midwestern one with the same vowel as north and force (/ˈfloɹədə/), because these represent slightly different phonological systems. — Eru·tuon 00:42, 15 November 2022 (UTC)[reply]
I would agree that the o vowel before /l/ and /r/ monophthongizes, at least for many dialects including mine. Vininn126 (talk) 16:53, 15 November 2022 (UTC)[reply]

Dictionaries give the following pronunciations for "Florida" in GA:

  • OED: /ˈflɔrᵻdə/
  • Routledge: /ˈflɔrᵻdə/
  • Cambridge: /ˈflɔːrɪdə/, /ˈflɑːrɪdə/
  • Longman: /ˈflɔːrɪdə/, /ˈflɑːrɪdə/
  • Merriam-Webster: ˈflȯrədə, ˈflärədə

Hope that helps. Nosferattus (talk) 07:18, 14 November 2022 (UTC)[reply]

@Tharthan, Soap, -sche, Mahagaja, Chuck Entz, Benwing2, Vininn126, Nosferattus Have we come up with any sort of consensus re: GA /o/? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 00:26, 18 November 2022 (UTC)[reply]

Well, you, I, Benwing and Erutuon are in favour of /o/, and Vininn seems to be in favour. Chuck doesn't use /ɔ/: Chuck, do you pronounce north/force words with /o/, or what? Mahagaja seems to be saying either could work (and in the past mentioned the notation [o̞] which might be something to consider for the narrow transcription). I can't tell if Soap is taking a stance; I think Tharthan would prefer to stick to /ɔ/ but I'm not sure. I mocked up one possible way of handling these at hoarse and horse, tentatively leaving /ɔ/ alongside /o/. If there's not (much) opposition, I reckon we update Appendix:English pronunciation. But are we merely adding /o/ so these words will have "/ɔ/, /o/", or are we dropping /ɔ/ from the modern pronunciation? (I trust we're still mentioning it as the pre-merger pronunciation in any case, something that'll be easier to do systematically once we have a pronunciation module.) BTW, I'd like us to add the words that make up Wells' lexical sets' names into that appendix, to make it easier to see which line covers force... - -sche (discuss) 23:06, 19 November 2022 (UTC)[reply]
My opinion for words like floor is that they should be listed as having /or/ in GenAm because almost no American dialect still allows [ɔr] in a closed syllable .... basically it'd be people who are horse/hoarse unmerged but still pronounce final /r/. But there are words like Florida that have idiosyncratic pronunciations and should not be forced to show only /or/. Soap 23:16, 19 November 2022 (UTC)[reply]
@-sche: I am basically in agreement with you on this, -sche. I would say that I don't think that we necessarily ought to drop /ɔ/ altogether from the modern pronunciation. Something along the lines of what you have right now at horse looks good, although I am not sure that it makes the most sense for an Early Modern English pronunciation of the word to be placed right next to that (Modern English) pronunciation that you have there near the bottom. The Early Modern English pronunciation ought to be separate, on its own line, not immediately next to that pronunciation.
When it comes to some words, such as Florida, I think that what Soap has brought up regarding syllable boundary differences from speaker to speaker impacting how that word might be pronounced probably warrants some indication of that when such differences lead to a real difference in pronunciation. In the case of that particular word, that would mean including an [-ɔ.ɹɪ-] pronunciation in addition to the other pronunciations. Tharthan (talk) 00:16, 20 November 2022 (UTC)[reply]
I think General American-type accents that pronounce the vowel of Florida differently from north and force typically pronounce it with the lot vowel, not the thought vowel. So then the correct transcription would be /ɑɹ/ (ignoring syllable break considerations), contrasting with /oɹ/ (or /ɔɹ/?) for north, force, and I guess glory. My reasoning is that these accents seem to be influenced by New York City pronunciations (I hear them from TV presenters a lot), and there I think the vowel of Florida is more similar to the low vowel of lot and not the back and sometimes raised and rounded vowel of thought. The syllable break certainly is the condition that would explain why Florida hasn't merged with force but north has, but it isn't enough to describe the current phonemic distinction between Florida and north-force in General American-type accents that I typically hear. I could be wrong because I don't live in an area where these pronunciations are at all common. But I'm curious now if for you, Tharthan, because you have a north and force contrast, would Florida and north have the caught vowel, and glory and force the goat vowel (contrasting with the vowel of starry or lot)? — Eru·tuon 22:20, 23 November 2022 (UTC)[reply]
You are talking about something that is different from what Soap and I were referring to with regard to Florida. The issue at play that we were bringing up about Florida is that, for some speakers, there are syllable boundary differences, as Soap said: different speakers parse Florida's syllables in different ways. Most say /ˈfloɹ.ɪ.də/ (the first part like the word floor), but others say [ˈflɔ.ɹɪ.də] (the first part like flaw).
The /ˈflɑɹ.ɪ.də/ pronunciation is an entirely different subject, one that I think is adequately covered in our entry for Florida. The /ˈflɑɹ.ɪ.də/ pronunciation is listed as New York City, Philadelphian, and non-Bostonian traditional Eastern New England English. A /ˈflɒɹ.ɪ.də/ pronunciation is indicated as the traditional Bostonian pronunciation, as it is in England's Received Pronunciation.
With regard to your last question, I am sorry to say Erutuon but I only have a partial horse-hoarse distinction. And I suspect that the only reason that I have any distinction at all is because I spent a lot of time from a very young age around relatives who had a full distinction in their speech.
A couple of examples: mourning is different from morning for me, and four is different from for for me. But, in contrast, the first word in a hypothetical "more *ning" would not have the GOAT vowel, and fore doesn't have the GOAT vowel either. Nor does boar or bore. And yet hoarse does have that vowel.
I guess the situation with the horse-hoarse distinction for me is not dissimilar to the situation that another New Englander Wiktionarian here briefly mentioned that they have with the father-bother merger. As they said about their own speech in a discussion related to this one: "full father/bother merger except before /ɹ/ plus a handful of other scattered exceptions which remain unmerged." Tharthan (talk) 00:05, 24 November 2022 (UTC)[reply]
@Tharthan As the New Englander Wiktionarian in question, it's "the situation that she has" and "as she said about her own speech", FYI. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 22:51, 24 November 2022 (UTC)[reply]
Does anyone else want to express an opinion on whether to have both /ɔ/ and /o/ as post-merger GenAm pronunciations, or just /o/ (with /ɔ/ only as the pre-merger pronunciation)? Normally when we list multiple pronunciations for the same accent they're contrastive, whereas these are not, they're competing ways of representing the same pronunciation for words in the horse, hoarse sets (setting Florida aside as a different beast that may be pronounced both like floor+ɪdə and like flaw+ɹɪdə). I'm interpreting Benwing, Erutuon and Whoop as preferring just /o/. For now, I've done this. BTW if Canadian English should also use /oɹ/, let's discuss that... - -sche (discuss) 23:43, 22 November 2022 (UTC)[reply]
Erutuon's edit changing the Canadian borrow and forest vowels from /ɔɹ/ to /oɹ/ highlighted that we also needed to update the appendix with regard to which vowel those words have in GenAm. The appendix claimed that borrow and a handful of other words only have /ɑɹ/ while horror and forest only have /ɔɹ/, but this is wrong in both directions: in truth, looking at Youtube and Merriam-Webster, any of these words can have either the horse-hoarse vowel or the start vowel in GenAm, like in the other listed dialects. I boldly made this change; if this needs to be tweaked (or reverted) and discussed further, please tweak/revert/discuss. It is unclear which accents or pronunciations the note "This sequence only occurs before another vowel." is intended to refer to. - -sche (discuss) 00:23, 24 November 2022 (UTC)[reply]
Do we need to update the enPR given for that line? It's just ŏr, which covers the first of the possibilities several of the dialects allow, /ɒɹ/ (although it means we're representing GenAm's /ɑɹ/ with two different enPR transcriptions), but it means we're not indicating the separate existence of the other possible pronunciation which all of the dialects allow, ɔɹ, oɹ (unless we're saying that sound gets represented two different ways, as ŏr (which thus stands for two things) or ôr, based on... what? facts about the etymology which someone looking to add an enPR transcription may not know?). - -sche (discuss) 00:38, 24 November 2022 (UTC)[reply]
The reason for having different lines for borrow and horror is that borrow has /ɑɹ/ in all GA, whereas horror has /ɑɹ/ in New York City-influenced (and similar) GA and /oɹ/ elsewhere, as shown in the chart of mergers of /ɒr/ and /ɔr/ (using their diaphonemic symbols) or Aschmann's table of r-colored vowels. I think Merriam-Webster's /oɹ/-type transcription for borrow must be for Canadian English; I don't know of a US accent that pronounces it that way. — Eru·tuon 02:23, 24 November 2022 (UTC)[reply]
@Erutuon - borrow doesn't have /ɑɹ/ in all GA, though; for me, it comes out as /ˈbɔ.ɹou/, with /ɔ.ɹ/ (the phonotactic incompatibility between these two phones being averted by the intervening syllable break). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:15, 24 November 2022 (UTC)[reply]
Just to be clear, are you saying you have an American accent without the cot-caught merger and you pronounce borrow with the THOUGHT vowel, in contrast with both /o/ as in boring and /ɑ/ as in starring? So borrow (or sorrow) would essentially rhyme with "raw row" or "saw row" (setting aside stress) but not with "spa row" or "sore O"?--Urszag (talk) 06:23, 24 November 2022 (UTC)[reply]
@Urszag - With the cot/caught merger, but the rest of that is correct. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:41, 24 November 2022 (UTC)[reply]
Oh! Then, do you have /ɔ/ in cot/caught/bother/borrow and lack the merger of this with the father/spa/starry vowel /ɑ/? Urszag (talk) 06:47, 24 November 2022 (UTC)[reply]
/ɔ/ in cot/caught/father/bother/borrow and /ɑ/ in spa/starry (basically, it's /ɑ/ when immediately before /ɹ/ without an intervening syllable break and also in some foreign loans like spa/bra/Nazi, /ɔ/ otherwise). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 07:59, 24 November 2022 (UTC)[reply]
...You have /ɔ/ in father?
I do not recall having ever encountered anyone who pronounced the word father /ˈfɔðɚ/, and was not aware that there was a North American dialect that pronounced it that way. I guess that if someone spoke a dialect which had first had a form of the father-bother merger take place that resulted in historical /ɑ/ and /ɒ/ merging to /ɒ/ in the dialect, and that then afterwards had a cot-caught merger occur with /ɔ/ being the resulting vowel, /ˈfɔðɚ/ could conceivably result from that. With that said, again, I have no recollection of having ever heard /ˈfɔðɚ/ for father in my life. Tharthan (talk) 09:07, 24 November 2022 (UTC)[reply]
...yes, I do. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 10:48, 24 November 2022 (UTC)[reply]
@Erutuon: Hmm... M-W seems to specify when they're giving a Canadian or other non-GenAm/US pronunciation (as for schedule, [3]), and they don't call borrow-with-o Canadian. Well, I have no objection to re-splitting the lines, I'd just like to understand why it's considered useful: is there a reason these 4–5 words having a different main pronunciation (in one dialect) than the other members of their broader set (while both sets would still need notes about how they can have the other pronunciation) means they need their own line in the table; like, does there need to be a separate line for every anomalous word, or is there something special about these? Were they historically in a different class than the horror, forest words, or are they separated out by other reference works so often that we'd be remiss not to do likewise? - -sche (discuss) 19:29, 24 November 2022 (UTC)[reply]
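
The transcription change proposed in this thread can be sketched as a simple rewrite rule. This is only an illustration under stated assumptions: `north_force_to_o` is a hypothetical name, not an existing Wiktionary module, and the sketch assumes syllable breaks are written with `.` as in the transcriptions quoted above. It rewrites /ɔ/ to /o/ only when /ɹ/ follows immediately, so a form such as /ˈbɔ.ɹou/, with the break before /ɹ/, is left alone.

```python
import re

# Assumption: "ɔ" immediately followed by "ɹ" (no "." syllable break in
# between) marks the merged north/force vowel in a broad GenAm transcription.
_NORTH_FORCE = re.compile("ɔ(?=ɹ)")


def north_force_to_o(ipa: str) -> str:
    """Rewrite /ɔɹ/ as /oɹ/; leave every other /ɔ/ untouched."""
    return _NORTH_FORCE.sub("o", ipa)


# Transcriptions of the kind discussed in the thread.
examples = [
    ("horse", "/hɔɹs/"),
    ("floor", "/flɔɹ/"),
    ("flaw", "/flɔ/"),
    ("borrow", "/ˈbɔ.ɹou/"),
]
for word, ipa in examples:
    print(word, ipa, "->", north_force_to_o(ipa))
```

A purely mechanical rule like this cannot tell north and force words apart from lot words such as Florida, forest, or orange, so those would still need the separate, word-by-word decision discussed above.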

ʌ in American English pronunciations

There's something of a schism between dictionary editors over whether a stressed schwa should be represented as ʌ or ə in American English. Some dictionaries transcribe stressed schwas as ʌ in IPA so that the American and British transcriptions match. Other dictionaries transcribe stressed schwas as ə in IPA because stressed and unstressed schwas are allophones and should be represented by the same symbol. Dictionaries that follow the first convention include the Longman Pronunciation Dictionary and the Cambridge English Pronouncing Dictionary, while dictionaries that follow the second convention include the Merriam-Webster Collegiate Dictionary, the Oxford English Dictionary, and the Routledge Dictionary of Pronunciation for Current English. Geoff Lindsey convincingly argues that the first convention doesn't make any sense at https://www.youtube.com/watch?v=wt66Je3o0Qg. As far as I can tell, Wiktionary follows the first convention (using ʌ), but maybe we should be using ə instead. See examples at above#Pronunciation, Russia#Pronunciation, and love#Pronunciation. Thoughts? Nosferattus (talk) 06:56, 14 November 2022 (UTC)[reply]

I want to be against this because I have a clear stressed phonetic schwa in words like pull ~ full ~ bull, and it is neither [ʊ] nor a syllabic consonant. Meanwhile there is never an unstressed [ʊ], whereas I would say I do have unstressed [ʌ] in words like above. There is a minimal pair between [ʌ] and [ə] in unstressed position .... Rosa's roses. So if any stressed vowel is to be united with schwa, for me it must be /ʊ/.
However as I intimated above, I live in New England, don't travel much, and don't consume mass media, and it took me well into adulthood to realize that most people even in America don't talk quite the same way as me. So all I can really say is that for some Americans the /ʌ/=/ə/ analysis doesn't work. Soap 10:21, 14 November 2022 (UTC)[reply]
@Soap For me (another New Englander born and raised, albeit one since transplanted to Minnesota), Rosa's roses has [ə] and [ɪ] rather than [ʌ] and [ə] (my personal go-to unstressed [ʌ]/[ə] minimal pair is untangle/entangle), and [ʊ] does exist in unstressed environments (for instance, the second vowel in fishhook). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 02:27, 16 November 2022 (UTC)[reply]
@Soap: Interesting. Are could and cud homophones for you? How do you pronounce the vowel in strut? For me (also a native GenAm speaker), it's definitely a mid central schwa (with a relaxed mouth), identical to ⟨a⟩ in about. Whereas for pull, I purse my lips on the vowel. Do you think it would make sense for us to list both /əˈbʌv/ and /əˈbəv/ as GenAm pronunciations for above, since for many (most?) GenAm speakers the vowel sounds are identical (other than the stress)? Nosferattus (talk) 17:24, 14 November 2022 (UTC)[reply]
My understanding has always been that the ʌ=ə argument is based on a technicality ... that there are no minimal pairs if you posit secondary stress .... and therefore that the two phonemes can be united even though they sound audibly quite distinct. I've never heard it claimed that any sizable group of people actually pronounces the vowels of cut ~ strut etc as a literal IPA [ə] schwa. If there are people actually saying [ə'bəv] for "above", here in the United States, then perhaps I need to get out more, because I've never heard that at all. That sounds sarcastic, but obviously the people here and the many dictionaries using this scheme can't all be wrong. Still, I just want to make sure this isn't based on a misunderstanding similar to the one in the Florida discussion above, where dictionaries have been using "ô" or some other symbol to indicate two audibly distinct vowels. The fact that this seems to come up over and over again not just here but on linguistics forums indicates to me that there are at least two credible sides to the debate.
As for my pronunciations .... no, I don't unify could / cud or any other pair of words in which one member has /ʌ/ and the other has /ʊ/. STRUT also has /ʌ/. Perhaps my dialect is in the minority, although I want to at least add that I wouldn't assume it is just New England that speaks like me, as this has nothing to do with cot/caught or horse/hoarse and may have a completely different distribution. The word above is [ʌˈbʌv] for me, though I'd accept [ə'bʌv] in rapid speech, but never *[ə'bəv]. I don't have a good answer for how to transcribe above but we should be able to find an agreement as a community. Soap 22:14, 14 November 2022 (UTC)[reply]
@Soap You should watch the video by Geoff Lindsey linked by the OP. Vininn126 (talk) 22:15, 14 November 2022 (UTC)[reply]
I admit I didn't have time to play the video this morning. Having played it, I can't change anything I said above. It's clear to me that he pronounces the words like double and fungus with [ʌ] in the first syllable and a reduced vowel in the second syllable ([ə] in fungus, and either [ə] or just a syllabic consonant in double). He never explains in the video why he considers both sounds to belong to the same vowel phoneme, but I believe I'm familiar with the argument, as I said up above ..... essentially, we can argue that [ə] and [ʌ] are the same phoneme in some (perhaps most) dialects on a technicality, even though they're audibly distinct. But the reduced vowel system is less accurate even in those dialects where the ʌ=ə analysis works, and simply wrong for the dialects where the ʌ=ə analysis breaks down.
As for the schoolbook respelling "uh" for both /ə/ and /ʌ/? I'd say that's a weak argument. There simply aren't enough vowel symbols in the English alphabet to represent every IPA phoneme properly. Schoolbooks often run into the same problem with /æ/, but nobody would take that as evidence that English is losing its /æ/ vowel.
I can accept phonemic stressed /ə/ as a phoneme, even if it is realized as essentially [ʌ], since people seem to want to reduce the vowel inventory as much as possible when transcribing words. But even Lindsey admits that there are some speakers, a minority, who continue to contrast /ə/ and /ʌ/. I think we should therefore continue to mark the contrast, so that we can better cover the whole range of English dialects. Soap 22:40, 14 November 2022 (UTC)[reply]
I hate to say it but I wonder how much psychology is playing a role in this. Perhaps your expectations are affecting it. I also believe that east coast US accents are more conservative when it comes to this split, but if you check youglish you'll see most people using schwa. Vininn126 (talk) 22:45, 14 November 2022 (UTC)[reply]
Could we at least agree that ['fʌŋ.gəs] and ['dʌb.əl ~ dʌb.l̩] would be the best narrow IPA transcriptions of the two words he spoke in that video at around the 2:50 mark? If you're hearing ['fəŋ.gəs] and ['dəb.əl] then yes we disagree on the core issue at hand. Thanks, Soap 22:55, 14 November 2022 (UTC)[reply]
Well, first of all, in that list I don't think he's necessarily trying to present the phoneme; rather, he's listing words that would contain it. That said, stressed schwa and unstressed schwa will sound different from each other, and both will sound different from /ʌ/. I also wouldn't call that /ʌ/ in those words. Vininn126 (talk) 23:00, 14 November 2022 (UTC)[reply]
But there are no minimal pairs between stressed /ə/ and stressed /ʌ/ in any dialect that I'm aware of. So why not just call the supposed stressed schwa /ʌ/? Soap 23:04, 14 November 2022 (UTC)[reply]
The lack of minimal pairs usually points to the lack of a phoneme, not the existence of one. Vininn126 (talk) 23:06, 14 November 2022 (UTC)[reply]
But analyzing stressed /ʌ~ə/ as a single phoneme doesn't eliminate /ʌ/ as a phoneme because of words like undone, which have an unstressed /ʌ/ that sounds clearly different from the schwa. (At least in the speech I've heard.) Are you saying that the word undone should be transcribed as /ən'dən/? If so, that complicates our transcription of words like embattle, for which we list initial schwa as a valid pronunciation .... analyzing un- as /ən/ would make it seem that at least some Americans merge en- with un-, which I've never heard claimed even by people who believe in the ʌ=ə theory.
We can get around this problem by positing secondary stress for the prefix un-, which would allow us to explain how it's pronounced as [ʌ] without actually being /ʌ/ ..... that analysis works well enough, but it's also more complicated than just keeping /ə/ and /ʌ/ as separate phonemes. Soap 09:09, 15 November 2022 (UTC)[reply]
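The minimal-pair test that this part of the discussion keeps appealing to can be sketched as follows. The transcriptions are toy data assumed purely for illustration: they encode one reported idiolect in which untangle and entangle differ only in an unstressed [ʌ] versus [ə], and they are not claims about General American. The segment lists are split by hand, and `minimal_pairs` is a hypothetical helper name.

```python
from itertools import combinations


def minimal_pairs(lexicon):
    """Return (word_a, word_b, seg_a, seg_b) for every pair of entries
    that have the same number of segments and differ in exactly one."""
    pairs = []
    for (w1, s1), (w2, s2) in combinations(lexicon.items(), 2):
        if len(s1) != len(s2):
            continue
        diffs = [(a, b) for a, b in zip(s1, s2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, *diffs[0]))
    return pairs


# Hand-segmented toy transcriptions (assumed, not sourced from any dictionary).
lexicon = {
    "untangle": ["ʌ", "n", "ˈ", "t", "æ", "ŋ", "ɡ", "ə", "l"],
    "entangle": ["ə", "n", "ˈ", "t", "æ", "ŋ", "ɡ", "ə", "l"],
    "undone":   ["ʌ", "n", "ˈ", "d", "ʌ", "n"],
}

for word_a, word_b, seg_a, seg_b in minimal_pairs(lexicon):
    print(f"{word_a} / {word_b}: [{seg_a}] vs [{seg_b}]")
```

Whether such a pair counts as evidence for two phonemes still depends on the analysis chosen for the input: if un- is transcribed with secondary stress, as suggested above, the two words no longer differ in a single segment and the pair disappears.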
Both vowels in undone are schwas for me. As to en- and em-, I tend to pronounce them more with /ɪ/ or the like. As to the notion that this would complicate entries, I fail to see how this is more complicated. Vininn126 (talk) 09:13, 15 November 2022 (UTC)[reply]
Neither undone vowel is a schwa for me, and I'm definitely an AmE speaker. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 02:07, 16 November 2022 (UTC)[reply]
Well, embattle only listed schwa as a pronunciation because it was recently added in diff by Whoop whoop, but AFAICT this is mistaken: it may be a regional pronunciation, but it's not GenAm, where (as Vininn says) it's /ɪ/ (or else /ɛ/), and hence en-, em- is distinct from un-.
Whoop whoop, I think we should also discuss /ɾ/ before you add it to /broad/ transcriptions; until now we've only given that in the [narrow] transcriptions. - -sche (discuss) 03:07, 16 November 2022 (UTC)[reply]
@-sche Discussion of /ɾ/ would probably best be split out into a new section, given that it isn't directly related to the current one and how long the current one is. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 03:47, 16 November 2022 (UTC)[reply]
I doubt strut and comma are consistently audibly distinct in General American. They are variable in Standard Southern British as Geoff Lindsey points out in his blog post about the strut vowel. When he cuts a bunch of strut vowels out with audio editing software so that we can play them back in isolation, some of them sound like a mid-central schwa, others like a lower and backer [ʌ], and others like a lower central [ä]. He shows similar variation in the comma vowel (unstressed schwa) and the first element of the diphthong of goat. So all three vowels vary within a sort of triangle in the phonetic vowel space. Lindsey's post is not about American English, but I suspect there's similar variation in the strut and comma vowels in many General American type accents, and that if you were to play a wide selection of instances of each vowel sound back in isolation, you couldn't very reliably tell which was strut and which was comma by the phonetic quality of the isolated vowel sounds. (Though I think not many North Americans would have the extremely open [ä] pronunciation of strut or comma that sounds either Queen Elizabethy or Cockney.)
Double is a bad example because the strut and comma vowels are in different environments (stressed and unstressed; the second is before /l/ so it's affected by velarization and possibly l-vocalization) so you can expect them to sound different. Some better strut and comma minimal pairs can be generated with the un- prefix because it has an unstressed strut: unequal versus an equal for instance. I can contrast those if I'm emphasizing the difference, but it's not consistent. I might have a contrast in schwa-y vowels, but it doesn't correspond to the strut and comma lexical sets. Like I sometimes might distinguish between the final vowels of Rosa's and roses as Soap does, but that isn't the strut and comma distinction because they have the vowels /ə/ (comma) and /ɪ/ in old-fashioned RP.
Strut equaling comma is weird for me for a different reason. I don't consistently distinguish strut from comma, and my /ɪ/ vowel is schwa-like too so using the same symbol for kit and comma makes sense to me. So, if strut equals comma and kit equals comma, then strut equals kit, but that's not true because I pronounce kit with a higher vowel than strut. So in my weird idiolect I'm not sure exactly how to divide up the historical strut, comma, and kit lexical sets into phonemes. — Eru·tuon 00:17, 15 November 2022 (UTC)[reply]
Geoff Lindsey's accent is nowadays significantly more Southern British English influenced, which has the distinction (and also the FOOT/STRUT split). When he talks about his native accent he's talking about the significantly more Scouse accent he had as a child. I agree that when he says the words the phones are different, however when the Americans in the video say the words they are largely the same. --Muzer (talk) 01:12, 16 November 2022 (UTC)[reply]
@Soap, separate question re your earlier comment: where's the /ʌ/ you say you have in Rosa's roses? I thought the traditional analysis was Rosa's = /ə/ and roses = /ɨ/ ~ /ɪ/ (~ /ə/), do you lower and back both vowels (Rosa's to /ʌ/ and roses to /ə/)? - -sche (discuss) 22:58, 14 November 2022 (UTC)[reply]
Thanks for asking. Yes, Rosa's has /ʌ/ and roses has /ə/. I had actually written out that Rose is would have /ɪ/, and then deleted it as I felt it would be a distraction since people could say it's secondary stress, a word boundary, or something else. Which is all fair. I don't know offhand whether I have a three-way minimal pair for /ʌ ə ɪ/ in unstressed position. Soap 23:17, 14 November 2022 (UTC)[reply]
Interesting. I don't want to doubt someone's perception of what they're saying, but (like Vininn) I think psychology is influencing how people interpret the differences in sound that they hear, like the guy who said there was a schwa-vs-/ʌ/ contrast in "undone" because he (correctly!) perceived there was a contrast ... it just isn't a schwa-vs-/ʌ/ contrast, because for speakers who distinguish schwa-vs-/ʌ/ undone's vowels are both /ʌ/. I'm not aware of any historical precedent for Rosa's having /ʌ/, so it might be helpful if we could find (or record) some audio and try to check that the difference is not, in fact, something else. - -sche (discuss) 01:22, 15 November 2022 (UTC)[reply]
@Soap: Could you clarify your reasoning about how the schwa sound you have in pull is a problem for the theory that /ʌ/ is the same phoneme as /ə/? Does it contrast with a [ʌ] sound in a similar environment in words like pulverize (/pʌl/) or mull and how exactly? And how is it phonetically different from the vowel in could so that you wouldn't consider it to belong to the same phoneme? I think I sometimes pronounce pull a similar way, though I'm from the Upper Midwest. I keep hearing that fellow North Americans have a single vowel phoneme in pull (/ʊ/), mull (/ʌ/), bowl (/oʊ/) and maybe even ball (/ɔ/) or pall (/ɑ/), though that is not true for me, so probably some clarification would be helpful. I do contrast all of these, though pull and mull are closest phonetically, and I also have a schwa-like vowel in pill (/ɪ/) which is nevertheless quite distinct from the others, I guess higher than pull and mull. The velarized or vocalized /l/ does weird things to the preceding vowel. — Eru·tuon 23:07, 14 November 2022 (UTC)[reply]
Just chipping in saying that for me pull will have schwa, and bowl will have /oʊ/, and ball has /ɑ/. Western with Wisconsin influence. Vininn126 (talk) 23:10, 14 November 2022 (UTC)[reply]
Yes, pull and mull have different vowels. I would say that could has a different vowel as well. At this point, though, while I thank you for your interest, this might be no more than a distraction from the issue at hand, the ʌ=ə argument. Up above, when I said that I might just as well say that ʊ=ə, I did not expect this thread to grow so fast, and wasn't intending to present it as a counterargument so much as an afterthought. After all, the stressed schwa for me occurs in just three words, all of them very similar. My preferred analysis is that /ʌ ə ʊ/ are all separate phonemes and that /ə/ is confined to unstressed syllables. Soap 23:16, 14 November 2022 (UTC)[reply]
For the vast majority of Americans and even a good number of Brits, Lindsay Ellis is right. If we are updating our GA transcriptions I believe we should be using schwa, not upturned v. As to regional lects such as various East Coast lects, as I remember Soap speaks, we should consider that differently. Vininn126 (talk) 18:07, 14 November 2022 (UTC)[reply]
Support this change. I've always wondered why this was the case, especially comparing it to languages that have a true /ʌ/ which can cause a disconnect while trying to compare sounds. AG202 (talk) 20:33, 14 November 2022 (UTC)[reply]
I was sceptical of stressed schwas, but am persuaded it's a fine analysis for modern GenAm; I would just suggest that we should ideally preserve the information about which schwas were /ʌ/ at earlier points in history (compare e.g. the obsolete pronunciations we list on some entries like one) — it sometimes feels like we, the English Wiktionary, cover Ancient Greek phonological history and verb conjugations more comprehensively than English ones! It'd be great if we could get an English pronunciation module going, even if it only works on many but not all entries and some cases still need manual {{IPA|en|...}}, so that we could automatically show developments like older American /ʌ/, /ɝ/ to modern GenAm /ə/, /ɚ/ — perhaps displaying them in reverse order, of course, putting the modern pronunciation first since it's of the most interest to readers, and maybe even collapsing older pronunciations similar to how we do for Ancient Greek. - -sche (discuss) 21:23, 14 November 2022 (UTC)[reply]
To be fair, many British dialects do this, too. However, I agree that marking it as a historical pronunciation is a good idea, but presenting it as modern is just lying to the readers. Vininn126 (talk) 21:25, 14 November 2022 (UTC)[reply]
I agree with this point, it would be helpful to show (or have the option of showing) historical development similar to the various stages shown for Ancient Greek entries. —Al-Muqanna المقنع (talk) 12:06, 15 November 2022 (UTC)[reply]
Fortunately in this case modern Standard Southern British retains the distinction, so you can always look in that transcription to find the traditional distinction. But I agree that proper historical support would be pretty cool. --Muzer (talk) 01:15, 16 November 2022 (UTC)[reply]
I second (fifth?) the proposal to show the stages of historical development in English pronunciation entries. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 16:29, 16 November 2022 (UTC)[reply]
This may be getting off-topic, but the mentions of bull and bowl remind me of when we had a user adding /kl̩/, /bl̩/, etc (!) as GenAm pronunciations of cull, cole, and bull, bowl, etc, which was undone (discussion in 2014, short 2015 follow-up) because even if some people merge those, it's not in General American and it's almost certainly not to /l̩/. - -sche (discuss) 01:22, 15 November 2022 (UTC)[reply]
It is arguable that English sonorants are vowel-like, but with an inserted schwa, so I think our current transcription of including /ə/ in parentheses is best. Vininn126 (talk) 06:43, 15 November 2022 (UTC)[reply]
Support stressed schwas as ə. —Al-Muqanna المقنع (talk) 12:04, 15 November 2022 (UTC)[reply]
I’m not sure how relevant this is, because phonemes denoted by the same IPA symbol in different languages may have different phonetic realizations, but while Dutch does not have the phoneme /ʌ/, it does have stressed schwas, such as the first vowel in je van het. To me, it does not sound close to /ʌ/ or in any case much closer to /ʏ/. Apparently the unstressed schwa does so too to Dutch ears, as witnessed by pronunciation spellings like Het leven is vurrukkulluk.  --Lambiam 23:16, 15 November 2022 (UTC)[reply]
I still think this is a bad idea, but my arguments don't seem to be moving anybody, and I don't want to wear out my welcome. I just have one more thing to add ..... in all other Germanic languages that I know of, the schwa is treated like a vowel in its own right, and nobody feels the need to try to unite it with one of the full vowels, even if it would be technically possible to do so. To do what we're doing is against the tradition of all Germanic languages .... simplifying the phonology on a technicality .... and in my mind, it will make things more confusing for our readers. That's all I can add without repeating myself. Thank you, Soap 21:01, 17 November 2022 (UTC)[reply]
FWIW, I feel a lot less strongly about merging /ʌ/ with schwa (don't take me as a !vote for it, I would just be OK with it if it's what other people think is best), than about the floor vowel. If we continue presenting /ʌ/ and schwa as different and yet the best analysis turns out to be that GenAm actually only has schwas that differ by (secondary or primary) stress, we're still only requiring people to learn that multiple* symbols for pretty similar sounds both (always) indicate the same sound, whereas representing floor with /ɔ/ is asking people to realize that the same symbol means two markedly different sounds and it's not 100% guessable which is meant in a particular case.
*I was going to say "two symbols", but the sources that merge vs distinguish /ʌ/ and /ə/ seem to be largely the same as those that merge vs split /ɝ/ and /ɚ/, so presumably if we're going for maximum distinction we'll be retaining the traditional /ɝ/ notation of term or turf, and thus using three symbols where e.g. Merriam-Webster uses just schwa? But some of the users who've been making /ɝ/→/ɚ/ changes are against /ʌ/→/ə/; is there any scholarly work saying term has schwa but undone doesn't, or...? - -sche (discuss) 22:27, 17 November 2022 (UTC)[reply]
Oh, I've just noticed Merriam-Webster is doing something a bit slick (and wishy-washy): they do use ‹ə› in their non-IPA notation of a lot of things in their dictionary, e.g. hurry, under, bird ‹ˈhə-rē / ˈhər-ē, ˈən-dər, ˈbərd›, as Lindsey noted ... but the pronunciation key doesn't commit to all of these being IPA [ə], instead they talk about unstressed syllables' ‹ə› corresponding to IPA [ə], primary- or secondary-stress syllables' ‹ə› corresponding to IPA [ʌ], ‹ər› as in merger, bird being [ɝ, ɚ], ‹ər, ə-r› as in hurry being its own thing, etc. - -sche (discuss) 09:23, 18 November 2022 (UTC)[reply]
@Soap, the argument that we don't unite vowels to schwa in other Germanic languages seems to me to be mistaken or begging the question. For example, we do represent the final vowel of machen and Tage and Apfel as a schwa, although earlier in German history they were different; presumably you think the difference is "well, those vowels really are now schwas, not separate sounds", but then ... that's the argument being made by people who want to unite these, too. - -sche (discuss) 16:53, 18 November 2022 (UTC)[reply]
I would say it's more analogous to analyzing entdecken with all three vowels as schwa .... in essence, picking an arbitrary vowel to unite with schwa purely for the purpose of simplifying the vowel inventory. But again, there's nothing else I can add without repeating what I've already said above. Soap 23:20, 19 November 2022 (UTC)[reply]
Consider the sentence, “Tatooine was a circumbinary planet that had two Suns, which were called the Summer Sun and the Winter Sun.” I think that the second word of the compound “Summer Sun” in this sentence is unstressed. Is this compound proper noun homophone in American English with the surname Summerson?  --Lambiam 21:50, 19 November 2022 (UTC)[reply]
No, they would be pronounced differently. This, and words like uppercut, have been sometimes used as evidence that English has a phonemic contrast between /ʌ/ and /ə/ that cannot be analyzed away. However, anyone on the opposite side of the argument can always posit secondary stress for any vowel that gets in the way of the analysis. While this isn't the system I prefer, it is consistent, and therefore I think this question doesn't really get to the core of the /ʌ~ə/ debate, which is really about whether stressed schwa contrasts with /ʌ/. Soap 23:24, 19 November 2022 (UTC)[reply]
In other words, the position of one side in the debate is that American English has an unstressed /ʌ/, contrasting with unstressed /ə/, but no stressed /ʌ/.  --Lambiam 21:43, 20 November 2022 (UTC)[reply]
No, of course not. The idea is that apparent contrasts between unstressed /ʌ/ and /ə/ can actually be explained as contrasts between /ə/ with some degree of stress and /ə/ that is completely unstressed. See the Wikipedia article w:Stress and vowel reduction in English#Degrees_of_lexical_stress, in particular "it is common for tertiary stress to be assigned to those syllables that, while not assigned primary or secondary stress, nonetheless contain full vowels": if the degree of stress rather than the level of vowel reduction is taken to be the underlying contrast, then the degree of vowel reduction can be interpreted as a secondary, allophonic property of a vowel.--Urszag (talk) 22:20, 20 November 2022 (UTC)[reply]
The problem with saying that the debate "is really about whether stressed schwa contrasts with /ʌ/" is that if we go by the traditional definition of schwa where it does not occur in stressed syllables, it obviously does not contrast with any stressed vowel: there is nothing special in terms of patterns of contrast about how it compares to stressed /ʌ/, you could equally transcribe stressed /ɪ/ or /ʊ/ as /ə/ with no ambiguity. Identifying /ə/ with /ʌ/ would have to be based either on a supposed greater phonetic similarity between /ə/ and this particular stressed vowel, or on the phonological phenomenon of certain words like what, was, because, of having strong (stressed) forms with /ʌ/ in some American English accents, where the replacement of the word's original stressed vowel quality with /ʌ/ can be explained as caused by vowel reduction followed by "re-stressing" of /ə/ .--Urszag (talk) 22:20, 20 November 2022 (UTC)[reply]
Okay, thank you. I agree with everything you say in this paragraph. I'd even say that if we must unite schwa with some other vowel, /ʌ/ seems like the best choice. However I still oppose this change because I believe it does nothing useful, and makes analysis more complicated. As I said above, this is against the tradition of all Germanic languages, and makes about as much sense to me as transcribing German entdecken with three schwas. Perhaps English is different somehow. But is it?
What do we gain, exactly, by reducing the English vowel inventory by one? People up above have admitted that the stressed schwa has a clear allophone of [ʌ], and the ʌ=ə analysis would make it the only vowel in the English language with allophones so far apart. This may mislead readers who assume, following the pattern of all the other vowels in our transcriptions, that stressed schwa really is pronounced as [ə] and not as [ʌ].
If we do go ahead with the change, it seems we'll need to start positing secondary stress all over the place, in a pattern that seems arbitrary to me (is it just going to be for [ʌ]?), and in your other paragraph you mention that some scholars also believe in tertiary stress, which I'd never heard of before now. If a phonetic analysis requires that much extra work to keep itself together, I think it's more sensible to just stick with the traditional system in which the schwa is a vowel in its own right, occurring only in unstressed syllables. Just like all the other Germanic languages, along with some Uralic languages and Southeast Asian languages. Soap 23:01, 20 November 2022 (UTC)[reply]
I also support continuing to use /ʌ/ rather than replacing it with /ə/.--Urszag (talk) 23:24, 20 November 2022 (UTC)[reply]
While looking for words with both wedge and schwa in their conventional transcriptions (I later found some, like London), I stumbled across this post by a linguistics professor making a case for keeping /ʌ, ɝ/ separate from /ə, ɚ/ in spite of them sounding the same, because they convey stress differences and other information. (As I commented below, this would also support undoing the /ɝ/→/ɚ/ changes that some of the proponents of /ʌ/-not-/ə/ have made, though: I'm not seeing any source or rationale that keeps /ʌ/ that doesn't also keep /ɝ/.) - -sche (discuss) 02:15, 21 November 2022 (UTC)[reply]
@-sche What I don't understand about this argument is why would we want to indicate a difference in stress by using a completely different phoneme/symbol rather than stress markers? And why would we only do this for schwa and not for other vowels? (The professor's argument that it's about the importance of the vowel just seems like a repackaging of the tautological stress argument.) We seem to be bending over backwards to find excuses for why they should remain separate ('some AmEng speakers pronounce them differently', 'we would lose information about the historical pronunciation', 'we would lose information about the importance of the vowel', or just 'it's the traditional transcription'), while ignoring the fact that the entire reason we are providing IPA transcriptions is to tell readers how a word is pronounced, and keeping /ʌ/ and /ə/ separate clearly doesn't help that. At best, it's confusing; at worst, it's teaching non-native speakers how to pronounce GA incorrectly. Unless we're going to put a big warning message next to every /ʌ/ in GA pronunciations, explaining that it's actually just a stressed /ə/ and not the open-mid back unrounded vowel (as described at Wikipedia and elsewhere), I think we're doing our readers a disservice. And it's not like we'd be blazing a new trail. Merriam-Webster, Oxford, and Routledge have all switched to using just /ə/ for GA. It seems that we would be in good company. Nosferattus (talk) 20:05, 22 November 2022 (UTC)[reply]
I think we may have reached an impasse, as at this point some of us (myself included) are simply talking past each other. Just a few paragraphs above I used similar words to you, but arguing for the opposite point ..... I said we'd be misleading our readers if we transcribed words like cut with schwa. My worry was that English language learners might take it literally and believe that they are actually supposed to say [kət] instead of [kʌt]. So we see the same problem, but propose opposite solutions for it. It may be that, as some others have suggested, we are simply hearing the clips differently. Assuming you're as confident of your analysis as I am of mine, it may be that we will never convince each other of anything and might need to work towards some out-of-the-box solution instead of just declaring one viewpoint correct and the other(s) incorrect. Soap 23:22, 24 November 2022 (UTC)[reply]

Pronunciation Layout[edit]

Hi, everyone! @Vininn126 and I had this discussion where the Pronunciation section in an entry (regardless of language) should follow the table of contents found in WT:Pronunciation. The reason is because I'm an editor of Tagalog language entries here, and we put the IPA above the hyphenation of the word (which he pointed out should be syllabification, which is technically what we were going for). However, looking at WT:Pronunciation, Audio File is at the bottom, so if we're to follow the Table of Contents in ordering the Pronunciation section, that would mean we put Audio at the bottom, which is contrary to all English entries, afaik, but he said that was an exception. My reaction was, I find that weird that there's an exception, why couldn't we have a template provided similar to what we have in the Entry layout page, if we're all gonna follow a specific order? Because if not, if we lack a template, I kinda feel that it's open to interpretation, and the order of the Pronunciations section we use in Tagalog language entries could be just as valid. Thoughts? Mar vin kaiser (talk) 11:20, 15 November 2022 (UTC)[reply]

User_talk:Mar_vin_kaiser#Pronunciation_order For reference. Vininn126 (talk) 11:45, 15 November 2022 (UTC)[reply]
Hyphenation is of less importance and less interest than IPA and audio. The Polish order makes the most sense. Ultimateria (talk) 02:29, 17 November 2022 (UTC)[reply]
Frankly, I find the Audio template too bulky, so I usually place it in the end. Unless the template is changed, I would prefer the order to be [IPA, Rhymes, Homophones, Hyphenation, Audio]. I find a template specifically for syllabification quite useless, since it can be represented in the IPA part. Hyphenation, on the other hand, often isn't intuitive and thus should be present somewhere in the entry (e.g.: Portuguese carro is syllabified as ca.rro, but hyphenated as car-ro). - Sarilho1 (talk) 11:30, 17 November 2022 (UTC)[reply]
However, I do think that it would be good for us to agree on a common standard, including regarding the use of hyphenation, rather than having each language or user do as they please. - Sarilho1 (talk) 11:32, 17 November 2022 (UTC)[reply]
I believe the standard should be the order of headers on WT:Pronunciation but with audio under IPA. It's what most people want. We should do this by adding a section on the page with the header Order of items and list out all the possibilities in order. Vininn126 (talk) 17:08, 17 November 2022 (UTC)[reply]
Do we have any sources about it being what most people want? - Sarilho1 (talk) 17:13, 17 November 2022 (UTC)[reply]
I suppose not, I'm using guesswork. I think the other most logical place for audio is at the top, then IPA, then rhymes, then syllabification, then hyphenation. Vininn126 (talk) 17:28, 17 November 2022 (UTC)[reply]
I for one honestly don't care much, as long as IPA and audio are before the rest. And even so, the Tagalog order was also fine, I didn't mind. Thadh (talk) 18:07, 17 November 2022 (UTC)[reply]
Whatever order we decide on, are we grouping things that pertain to one pronunciation vs another, and ordering within each grouping? To me, it makes more sense to present "IPA of non-rhotic UK pronunciation of fiver; audio of non-rhotic UK pronunciation" followed by "IPA of rhotic US pronunciation; audio of rhotic US pronunciation", rather than "non-rhotic IPA; rhotic IPA followed by non-rhotic audio as if that audio goes with that IPA; then rhotic audio", though I can also see how listing only a UK pronunciation and all its trappings (rhymes, etc) then a US pronunciation and its different rhymes (e.g. decal), could push the existence of e.g. a New Zealand pronunciation off the screen, whereas listing all the IPAs up front would keep them all visible at once. I suppose there are benefits or drawbacks to either. I agree that IPA and audio of how to pronounce a word is probably what most people looking at a ===Pronunciation=== section are most interested in, not hyphenation or rhymes. - -sche (discuss) 22:35, 17 November 2022 (UTC)[reply]
I believe anything nested would then follow the rules as presented before, so if you have British IPA determining something then British audio would go underneath it, and American audio would introduce a new "L1" as it were. Vininn126 (talk) 00:07, 18 November 2022 (UTC)[reply]
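To make the nested ordering discussed above concrete, here is a minimal Python sketch. It assumes one possible item order (IPA, then audio, then rhymes, homophones, hyphenation), which is only one of the orders proposed in this thread, and it groups items by accent before ordering within each group. The names `ORDER` and `lay_out`, the tuple layout, and the audio file names are all hypothetical and do not correspond to any existing Wiktionary module or template.

```python
# Assumed item order within one accent: IPA first, audio directly under it.
# This is only one of the orders proposed in the thread, not an agreed standard.
ORDER = ["ipa", "audio", "rhymes", "homophones", "hyphenation"]

def lay_out(items):
    """Group (accent, kind, value) tuples by accent, then order each group.

    Accents keep the order in which they first appear in the input.
    """
    accents = []
    for accent, _, _ in items:
        if accent not in accents:
            accents.append(accent)

    lines = []
    for accent in accents:
        group = [item for item in items if item[0] == accent]
        group.sort(key=lambda item: ORDER.index(item[1]))
        for _, kind, value in group:
            lines.append(f"({accent}) {kind}: {value}")
    return lines

# Illustrative input for "fiver"; the audio file names are made up.
items = [
    ("UK", "audio", "uk-fiver.ogg"),
    ("US", "ipa", "/ˈfaɪvɚ/"),
    ("UK", "ipa", "/ˈfaɪvə/"),
    ("US", "audio", "us-fiver.ogg"),
]
for line in lay_out(items):
    print(line)
```

With this grouping each audio file sits directly under the IPA it belongs to, at the cost that later accents are pushed further down the section, which is the trade-off raised above.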
BTW: as described in its documentation, the hyphenation template is indeed intended to be for hyphenation (car-ro), not syllabification, which as Sarilho says should already be indicated in the IPA. Many people don't grasp the distinction, so many uses are wrong, but I don't know how we could make it any clearer. I suppose we should wikilink the word "hyphenation" to a page describing what hyphenation is and how it differs from syllabification. Many of our hyphenation listings are incomplete and it's not always clear if references even exist which would support them, as a separate matter. - -sche (discuss) 22:48, 17 November 2022 (UTC)[reply]

Also, I'd like to suggest this format for tabla:

  • (tablá) IPA(key): /tabˈla/, [tɐbˈla]
  • (tabla) IPA(key): /ˈtabla/, [ˈtɐb.lɐ]

or similar, rather than list all the definitions for those spellings/pronunciations. Ultimateria (talk) 05:44, 18 November 2022 (UTC)[reply]

Hiragana in Japanese inflection tables[edit]

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo):

Is there any reason why we give a full hiragana transcription between the conjugated form and its romanization in Japanese inflection templates, like in 豪快? It's completely redundant (it just repeats the same information given in the two other columns). I would propose to get rid of that column to increase readability and reduce redundancy. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 15:47, 15 November 2022 (UTC)[reply]

Romanization is not Japanese, but it serves as a reference for people not familiar with Japanese. A conjugated form may contain kanji that have more than one reading, so a full hiragana transcription is appropriate to distinguish them. See 行く, which can be pronounced as both いく and ゆく. Shen233 (talk) 15:54, 15 November 2022 (UTC)[reply]
@Shen233: I think you missed my point. Even if a word has more possible pronunciations, like 行く, having いく (iku) or ゆく (yuku) doesn't add anything to the information already given by the romanizations "iku" and "yuku". You're just repeating the same information twice. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 22:22, 15 November 2022 (UTC)[reply]
Fair, one thing I can envision is to get rid of hiragana column, but put furigana instead on all the kanji.Shen233 (talk) 22:25, 19 November 2022 (UTC)[reply]
If we're getting rid of the kana anyway due to being mostly extraneous, how would furigana be useful? ‑‑ Eiríkr Útlendi │Tala við mig 08:11, 20 November 2022 (UTC)[reply]
If anything, I'd rather delete the romaji column in favor of kana. Shen233 (talk) 08:22, 20 November 2022 (UTC)[reply]
I've got the same feeling about removing kana. All textbooks I used were based on kana inflections with no rōmaji but we don't have to remove kana to reduce the number of columns. Using ruby we could have both in the same column, e.g.
  1. 来る(くる) (kuru)
  2. 来い(こい) (koi)
  3. 来れ(くれ) (kure)
(no hyperlinks are required). Anatoli T. (обсудить/вклад) 08:31, 20 November 2022 (UTC)[reply]
@Shen233: I still think furigana is redundant (it's English Wiktionary, and the hiragana reading is given in the "kanji in this term" box anyway), but it's definitely 100 times better than having the hiragana column, that takes space for no good reason AT ALL. Though I do feel like we all need to remind ourselves here that this is not Japanese Wiktionary but ENGLISH Wiktionary. Treating every Japanese text as if it was a Japanese book for children or a textbook for foreigners (instead of a DICTIONARY for ENGLISH speakers) should not be the direction we want to follow here.
@Eirikr: I agree with you, too. My preference would be to get rid of ANY hiragana (hiragana column AND furigana), but I feel a lot of editors here love their kana (for reasons that escape me completely), so being realistic I understand it's not going to happen. But yeah, if it was up to me I'd delete all unnecessary hiragana and furigana, it only adds noise and makes every entry unnecessarily heavy.
@Atitarev: That would be the least bad solution for me, so I could compromise to that. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 12:35, 20 November 2022 (UTC)[reply]
@Sartma: Japanese inflections without kana? Seriously? No way. It's a native way, any Japanese dictionary would use only kana. Besides, there is no 100% correspondence between kana and rōmaji. Nothing is redundant. Keep the way it is. Anatoli T. (обсудить/вклад) 22:15, 15 November 2022 (UTC)[reply]
@Atitarev: Japanese dictionaries don't give inflections in kana like we're doing on Wiktionary, so I'm not sure what you're referring to. Using a column to give a "kana reading" of the inflected forms (which already are mostly in kana anyway) when you have the romanization next to it is 100%, absolutely redundant. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 22:31, 15 November 2022 (UTC)[reply]
I think there are ways to make the table more space-efficient without reducing the amount of information they contain. They're very spread-out at the moment. Theknightwho (talk) 23:41, 15 November 2022 (UTC)[reply]
@Theknightwho: Using furigana should be enough. (even though, again, I don't see the use of them in a dictionary for English speakers that gives romanizations...). But at least everything would be more compact. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 21:53, 17 November 2022 (UTC)[reply]

I take your point, that for words like 食べる or 食事する including hiragana on every line of the paradigm is repetitive. (By the way, could you guess that I'm on my lunch break?) There are some words (e.g. (), (), 来る(くる), 来れ(くれ), 来い(こい)) where that is not the case, but in the majority of cases the hiragana is predictable. I would support making the tables more compact or efficient, but would hesitate to get rid of (usually) redundant elements wholesale. Cnilep (talk) 03:33, 17 November 2022 (UTC)[reply]

@Cnilep: How would 来る (kuru) be a different case? As long as you have the romanization, all furigana or kana transcription would be redundant. What's "not enough" with 来ない (konai), 来ます (kimasu), 来れば (kureba), 来い (koi), etc? There's no need for the furigana if you give the romanization. They carry the exact same information. The weird argument that "it's a native way of indicating pronunciation" makes no sense at all in a dictionary that's aimed at English speakers, like the English version of Wiktionary. English speakers are not Japanese native speakers... — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 22:02, 17 November 2022 (UTC)[reply]
Oh, I guess I didn't quite take your point. I thought that your argument was that repeating the furigana is redundant, since the differing okurigana is present on the kanji forms. You actually meant that kana is redundant to romaji, correct? In that case, my comments are off base. Even so, I am not sufficiently convinced that kana is unwarranted, even for an English-speaking readership. My personal opinion, for what it's worth, is unchanged: I support efficiency but not removing information. Cnilep (talk) 02:05, 18 November 2022 (UTC)[reply]
@Cnilep: but removing the hiragana-only column would not remove any information. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 08:18, 18 November 2022 (UTC)[reply]
Removing a column would not remove information if the information is repeated elsewhere (e.g. as furigana). Potentially, however, removing hiragana and not including it elsewhere would remove information. In both of my replies efficiency is meant to describe such actions as removing the hiragana column from the chart and including furigana on each kanji form in the chart. Cnilep (talk) 01:24, 21 November 2022 (UTC)[reply]
I think it would not be a good idea to remove the hiragana. While it is not usual in Japanese to spell verbs fully in hiragana, there are nevertheless conventional aspects to hiragana spellings that aren't fully captured by many common styles of romanization, like the use of づ (as in つづか tsuzuka) vs. ず (as in こず kozu).--Urszag (talk) 02:48, 18 November 2022 (UTC)[reply]
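The point about romanization not capturing the kana spelling can be illustrated with a minimal Python sketch. The table below is a deliberately tiny, hypothetical mapping covering only the kana needed for the examples (it is not Wiktionary's actual romanization module), and the names `HEPBURN`, `romanize`, and `collisions` are made up for illustration.

```python
# Hypothetical, deliberately tiny kana-to-Hepburn table for illustration only.
HEPBURN = {
    "つ": "tsu", "づ": "zu", "ず": "zu", "か": "ka", "こ": "ko",
    "じ": "ji", "ぢ": "ji",
}

def romanize(kana: str) -> str:
    """Romanize a kana string one character at a time."""
    return "".join(HEPBURN[ch] for ch in kana)

def collisions(table):
    """Return romanizations that more than one kana maps to."""
    groups = {}
    for kana, roma in table.items():
        groups.setdefault(roma, []).append(kana)
    return {roma: ks for roma, ks in groups.items() if len(ks) > 1}

print(romanize("つづか"))    # tsuzuka
print(romanize("こず"))      # kozu
print(collisions(HEPBURN))  # {'zu': ['づ', 'ず'], 'ji': ['じ', 'ぢ']}
```

Because both づ and ず come out as "zu", the kana spelling cannot be recovered from the romanized form alone; whether that matters for the inflection tables depends on whether the kana spelling is already given elsewhere in the entry.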
@Urszag: I understand that point, but you do have the hiragana spelling (as furigana) in the headword, AND in the "Kanji in this term" box already, so it's again just another instance of redundancy. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 08:25, 18 November 2022 (UTC)[reply]
  • Some general thoughts.
  • This is the English Wiktionary. We can only assume that our readers have facility in the English language.
  • Removing romanizations completely from Japanese inflection tables removes a large amount of information for 100% of our entries, and renders them effectively useless to a chunk of our target audience -- English readers who may not have any knowledge of Japanese.
  • Removing kana completely from Japanese inflection tables removes a small amount of information for a small percentage of specific cases -- those few cases where Hepburn romanization is a lossy process, such as losing the distinction between づ (zu) and ず (zu), or ぢ (ji) and じ (ji), or を (o) and お (o), or (ha) and (wa), etc.
→ That said, does this matter?
Serious question. I see three relevant categories of our readership to consider here:
  • That portion of our readership that doesn't read Japanese at all won't care.
  • That portion of our readership that already reads Japanese will know enough that they also won't care.
  • That portion of our readership that is trying to learn Japanese will care, and will be potentially impacted by not knowing that zu in the romanized spellings of certain words should be spelled as づ in kana, etc.
However, if the kana spellings are given elsewhere in the same entry, does the absence of kana from the conjugation tables matter? I don't think it does.
  • Including a kana column does make our tables larger than necessary. This is a problem for smaller UIs, such as mobile phones.
  • Including kana as ruby / furigana obviates the need for a dedicated column, and could be a viable workaround here.
I do have concerns about the potential for ruby causing confusion among our English-reading audience, as ruby text is not commonly used in the English-language world. Reader unfamiliarity with the conventions of ruby text could cause confusion that things like a kanji annotated with (しゅつ) (shutsu) might represent a single grapheme, rather than four separate graphemes (the kanji plus し, ゅ, and つ), with the latter three providing a phonetic guide to the first -- which guide is wholly useless to a portion of our English-reading audience.
I understand the utility of kana for those Wiktionary users who are trying to learn Japanese. I am worried that we are overusing kana in places that might not be appropriate for a broader audience, much as Sartma points out above.
‑‑ Eiríkr Útlendi │Tala við mig 20:05, 21 November 2022 (UTC)[reply]
@Eirikr: Amen! — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 09:30, 22 November 2022 (UTC)[reply]

Furigana and search indexing

My immediate reaction upon seeing this: no, no, please, NO! Sorry for the emphatics, but there is a huge wrinkle I think people have missed here. (Not your fault for not noticing, since the discussion has mostly been about aesthetics and the needs of human readers, and the problem is easy to miss unless you are actively studying Japanese and view English Wiktionary as a priceless Japanese learning tool, as I am and I do.)

Furigana messes with searchability, in several big ways:

  1. Replacing both the current kana and romaji with furigana could make the inflected forms impossible to locate by their reading, unless they have their own non-lemma entries as soft-redirects (and the vast majority of Japanese inflections do not).
  2. Still, adding furigana to supplant one or the other of kana or romaji seems like a no-brainer, then, right? Unfortunately, furigana requires the (human or machine) annotator to decide on segmentation—that is, which kanji goes with what sound(s)/kana/romaji. In non-jukugo compounds, this can be entirely arbitrary. That means that you have to make a choice:
    • Don’t segment—then the kanji will be indexed for search, and so will the reading in kana (or ruby romaji, I suppose), but you will have to find the segmentation elsewhere, which—as mentioned above—is an important pedagogical and etymological tool. Also, from some CodePen experimentation, I believe if you put more furigana than fit over the first kanji, you must segment unless you want empty space between the kanji.
    • Segment—but then each segment will be indexed separately, so the whole will become unsearchable.

Here, an example entry that doesn’t require much language expertise may help: 一本道. It usefully has furigana in its entry: 一(いっ)本(ぽん)道(みち). Now, copy 一本道 and do a search-on-page. Notice that the one with ruby/furigana doesn’t get selected; the same is true for the full-text search indexing. Do the same with the kana and you get the same result (though you match the furigana that doesn’t require any extraneous formatting, which is untrue for conjugation forms). (If you know how to use your DOM inspector, or just want to “view source” and look for the <ruby> tags, you’ll see how each kanji, and each segment of kana, is separated from its neighbors: the format is like an HTML dictionary, kanji-kana, kanji-kana, kanji-kana…)
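
To make the segmentation problem concrete, here is a minimal Python sketch. It assumes a simplified ruby markup for 一本道 (the exact HTML that Wiktionary's templates emit may differ) and models an indexer as nothing more than naive tag stripping, which is an assumption and not a description of the real search engine. The helper names flatten, base_text, and reading are hypothetical.

```python
import re

# Hypothetical, simplified ruby markup for 一本道, segmented kanji by kanji.
# This is an assumed shape, not the exact HTML produced by any template.
RUBY_HTML = (
    "<ruby>一<rp>(</rp><rt>いっ</rt><rp>)</rp></ruby>"
    "<ruby>本<rp>(</rp><rt>ぽん</rt><rp>)</rp></ruby>"
    "<ruby>道<rp>(</rp><rt>みち</rt><rp>)</rp></ruby>"
)


def flatten(html):
    """Strip every tag and keep all text, as a naive indexer might."""
    return re.sub(r"<[^>]+>", "", html)


def base_text(html):
    """Remove the <rt>/<rp> annotation elements first, keeping only the base kanji."""
    without_annotations = re.sub(r"<(rt|rp)>.*?</\1>", "", html)
    return flatten(without_annotations)


def reading(html):
    """Concatenate the contents of every <rt> element to recover the full reading."""
    return "".join(re.findall(r"<rt>(.*?)</rt>", html))


flat = flatten(RUBY_HTML)
print(flat)                      # kanji and kana interleaved
print("一本道" in flat)           # the kanji string is no longer contiguous
print("いっぽんみち" in flat)      # neither is the reading
print(base_text(RUBY_HTML), reading(RUBY_HTML))
```

In this toy model, neither the kanji spelling nor the kana reading survives as a contiguous string once the markup is flattened, but both can be recovered if base text and annotations are extracted separately. That is roughly what any "make ruby searchable" fix would have to do.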

This entry is not a great example of the issue—because it has a clear one-to-one kanji-to-kana composition and no conjugated forms—but I couldn’t think of one with ambiguous kana that also has conjugation off the top of my head.

So, the existing furigana (and other furigana created by the {{ja-noun}} template) isn’t a problem with this particular reading/kanji: it doesn’t conjugate. So it exists in bare kana and kanji form elsewhere in the entry, and will be indexed and findable in search. But the conjugation tables are typically the only place in an entry each of those forms can be found. Convert them to ruby text, and they may end up indexed inaccessibly.

The {{ja-noun}} template is smart enough to do most segmentation itself, so perhaps the underlying {{ruby}} templates could be modified to ensure searchability (via invisible auxiliary divs? I don’t know).

But unless and until that’s done, this change would be very disruptive. There are sources about the tradeoffs of ruby text on the web (I’ll update if you want me to cite them, they’re accessible enough but many are not in English), and the W3C has had proposals to address these issues, but AFAIK today, there’s no simple block-level-HTML-only solution to include ruby text arbitrarily inline without breaking the flow of text indexing. (Japanese’s lack of spaces in running text doesn’t help here, either.)

Why does this matter? It is very common for Japanese writers to switch to kana when kanji is available or even more common—for effect, to increase or lower formality, or because they just forgot a kanji. But also, for at least a century, many words have become more and more commonly written in kana, and this slow drift means that it isn’t always clear-cut whether the lemma entry belongs in kanji or not. (There’s an entire class of entries that soft-redirect in one direction, and another class in the other, and then there are entry pairs like ください and 下さい that—for various more-or-less explicable reasons—have full entries for both, even though they are pronounced and mean the same.)

I almost never need to use full-text search for the other languages I study—the search box completion is good enough! But I use full-text search for Japanese multiple times a day—usually to search readings for possible conjugated lemmas.

I personally wouldn’t care if we take away the romaji in the conjugated tables—that seems fine (but I can read kana and hundreds of kanji; I don’t know how many potential users of Japanese conjugation tables can’t, and I have trouble imagining their usage patterns).

The “redundancy” mentioned previously in the discussion is mostly visual—you see the kana/kanji makeup of the headword, and you can (usually) apply it to other non-lemma uses. But the search engine cannot do this cleverness. (As I mentioned, perhaps the ja lang modules used for things like {{ja-noun}} can—but we must make sure it can and will, in all cases.)

Even still—the “redundancy” only works if you know the kanji already; if you put いっぽんみち into the search box, as if you only knew the reading or encountered the word in hiragana alone, nothing will come up directing you to 一本道 (at least, until someone makes a soft-redirect page.)

I was now going to get into the accessibility issues with non-human screen readers (for blind people or other people with disabilities, or for AI-powered audible reference works), but I’m already 850 words in, and I think you can imagine much of what I’m going to say. (—But if you don’t, I’m happy to expand on it later!)

In any case: if you do decide to remove both and/or replace the plaintext kanji/kana word with a furigana equivalent, please ensure it still indexes correctly.

Thank you for putting up with this rant! TreyHarris (talk) 21:32, 30 November 2022 (UTC)[reply]

Oops—I totally missed that there was an ongoing discussion about what Japanese “lemmas” even are, further down the page. I haven’t absorbed it yet, so I don’t know how it relates (and it may—if “form which most easily tracks spelling with reading” is a candidate), but I wanted to clarify that I had a very specific idea of “lemma” in mind in my repeated usage of the term above, which may not be congruent with anything in that discussion or anything I’d want to defend in that discussion. By “Japanese lemma”, I simply meant “a headword that does not soft-redirect anywhere else”.
 
These tend to have lemma templates and serve a purpose like “dictionary-form, subjective, nominative, infinitive, ergative, etc.”, of being where to expect an inflected form’s primary headword to be, so I called them lemmas, that’s all. (The dual full entries for things like both ください (kudasai, ’please’) and 下さい (kudasai, ’please’) complicate that a bit, but for the most part it works even for my point: If they conjugated—which they, thankfully, don’t!—full-text searching of readings would still turn up both as top hits). TreyHarris (talk) 21:59, 30 November 2022 (UTC)[reply]
Oh, and another small thought: if it were to come down to some choice between keeping hiragana or keeping romaji, keeping the kana has the advantage that search indexing for Japanese must contend with the language’s lack of spaces, so you can search for substrings of kana; romaji are recognized as Latin-script, which is used for languages with spaces between words, so you cannot search substrings.
A “killer feature” of English Wiktionary as a Japanese language learning tool for me is when I misidentify a long word as a multiword construction, or vice versa. If you search for dasai because you’re looking for a word you don’t know yet is ください, you won’t find it; but searching for ださい, you will. TreyHarris (talk) 22:32, 30 November 2022 (UTC)[reply]
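
The difference between the two searches can be sketched as a toy model in Python. The function toy_match is hypothetical, and the rule it encodes (Latin-script queries match whole space-delimited words only, other scripts match any substring) is a deliberate simplification of the behaviour described above, not the actual logic of the site's search engine.

```python
def is_latin(text):
    """Treat a query as Latin-script if every character is ASCII (a crude assumption)."""
    return all(ch.isascii() for ch in text)


def toy_match(query, indexed_text):
    """Toy model of the behaviour described above.

    Latin-script queries must equal a whole space-delimited word;
    queries in other scripts may match any substring.
    """
    if is_latin(query):
        return query in indexed_text.split()
    return query in indexed_text


print(toy_match("dasai", "kudasai"))    # partial romaji word
print(toy_match("kudasai", "kudasai"))  # whole romaji word
print(toy_match("ださい", "ください"))    # kana substring
```

Under these assumptions the romaji fragment fails to match while the kana fragment succeeds, which is the asymmetry that makes kana the more useful column to keep for searching.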

IPA transcriptions for Southern British English

(NOTE: I've moved this discussion from the Tea Room as I realised too late that was completely the wrong place for it. Sorry for any déjà vu!)

Over the past couple of years I've been following the work of Dr Geoff Lindsey (laid out on his blog posts, YouTube channel, and his book English After RP, →ISBN), and specifically his increasingly popular suggestion that the traditional vowel transcription of Conservative RP used to represent a modern Southern British English accent is now so out of date as to be actively misleading, confusing, and generally unhelpful; and that Gimson's intent with his RP transcription system is that it should always be revised and updated to reflect modern changes in pronunciation, which it generally-speaking hasn't been. He therefore lays out a much more consistent and also phonetically accurate transcription system (on his blog and in a half hour video, as well as in the aforementioned book) which I've noticed has gained a little traction, and has certainly received favourable opinions from highly regarded linguists in the field. As a site that should probably remain up-to-date with linguistic research and scholarly opinion, I wonder if there could ever be a consensus for Wiktionary (and presumably also Standard Southern British transcriptions in Wikipedia) to adopt these new symbols and perhaps also new terminology like "SSB" ("Standard Southern British") to replace the confusing and no longer accurate descriptor of "RP" (which largely gets its name from a set of social conditions that no longer exist). Does anyone have any strong opinions on this matter? How might such an idea be progressed were it to come about? (Sorry, I'm not a big wiki-er so I'm not sure if this is the best place to solicit general discussion on this topic). Muzer (talk) 19:59, 14 November 2022 (UTC)[reply]

I’m a fan of Geoff Lindsey’s. He explains many things about phonetic representations that are confusing even to native speakers, and makes some good suggestions about labelling some sounds as diphthongs that are traditionally considered to be monophthongs and some as monophthongs that are traditionally considered to be diphthongs. He also rightly debunks the silly notion that a stressed schwa is impossible (though this doesn’t typically occur in SSB). Some things are a little less convincing to me though: I’d say that ‘near levelling’ is still a minority pronunciation, and the onset of the goose vowel isn’t quite as far back as he suggests for most speakers. It seems a bit exaggerated to say that the traditional PUT vowel (the capital letter omega) is nearly extinct too. His idea of transcribing the final part of the diphthong in words like ‘no’ and ‘how’ as a ‘w’ does make a lot more sense than using an omega ever did though, even in traditional RP (U-RP). --Overlordnat1 (talk) 02:16, 15 November 2022 (UTC)[reply]
Cheers. Yes, I agree that some of his observations on frequency are maybe not completely right in my experience, though I would say he's probably right on the "near" front; hearing it as a monophthong is very common in my experience. I find the "PUT" vowel is also very notably different between south and north; I hear it a lot more fronted in the South, so I think I'm inclined to agree with him there too. In any case I don't think his subjective observations on frequency really affect the core of his suggested new transcription system, which I think most linguists would agree is very much sound. I've also just realised I've put this entirely in the wrong section; I will move it to the beer hall. --Muzer (talk) 01:02, 16 November 2022 (UTC)[reply]
I don't see why SSB could not be added alongside traditional RP. I'd still keep the latter to illustrate diachronic change. Nicodene (talk) 00:21, 17 November 2022 (UTC)[reply]
Yes, exactly, we should keep RP and add any modern, different pronunciations alongside it. (I'm in the process of sandboxing ideas for ways we might display older vs modern pronunciations.) - -sche (discuss) 00:51, 17 November 2022 (UTC)[reply]

en- as /ən-/ in GenAm

Splitting this off of the discussion above because it's a somewhat separate issue and that discussion is big enough as it is: /ən-/, /əm-/ was recently added to some entries like entangle, embattle (see edit history) as a GenAm pronunciation. I haven't been able to find any evidence of this being a GenAm pronunciation (and it would make the prefixes indistinct from un- for GenAm speakers who, as discussed above, pronounce un- with schwa). Do we have sources for these words starting with a schwa, either as a GenAm pronunciation or as a pronunciation in some regions or dialects we could label? (Pinging User:Whoop whoop pull up.) - -sche (discuss) 03:25, 16 November 2022 (UTC)[reply]

@-sche In my experience, GA speakers pronounce unstressed en- (as in entangle) as /ən-/ (stressed en- [which is generally restricted to very formal registers of GA when dealing with en- as a prefix, although it does occur in the colloquial registers in other contexts, such as Endermen] remains /ɛn-/, except in some foreign loans like encore or en garde, where it's pronounced /ɔn-/ instead), while un- is pronounced /ʌn-/ in both stressed and unstressed contexts (thus preserving, e.g., entangle/untangle as a ə/ʌ minimal pair and avoiding the collapse of the phonological distinction between the two prefixes that would, as noted, occur if un- were also pronounced with a schwa). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 03:42, 16 November 2022 (UTC)[reply]
@Whoop whoop pull up: is that /ən-/ or is it /n̩/ due to loss of an unstressed vowel before a syllabic nasal? Chuck Entz (talk) 16:01, 16 November 2022 (UTC)[reply]
@Chuck Entz Examining my own speech (it being the one most immediately available to me for testing), there's definitely a /ə/ starting off that syllable, with an initial (if short) burst of air making it out through the mouth before the tongue makes contact with the alveolar ridge. In my experience, that's also how the speech of other GA speakers sounds, although with a somewhat-lesser degree of certainty there. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 16:25, 16 November 2022 (UTC)[reply]
If un- is treated as having the vowel /ə/, it would probably be a good idea to transcribe it with a secondary stress marker, which would prevent ambiguity with fully unstressed /ən/ en-. There are other reasons to think of un- as having secondary stress: I believe that, like thirteen and similar words, disyllables starting with un- can be accented on the first syllable in some cases due to stress retraction (e.g. in our audio for "unknown quantity", I don't hear much stronger stress on the second syllable of "unknown" vs the first).--Urszag (talk) 19:03, 17 November 2022 (UTC)[reply]
OK, I (and other editors, see the edit history of America) have been changing these back to /ɛ/ or /ɪ/ as appropriate: reduction to schwa may occur (allophonically?) for some speakers (some regions?) when the words are particularly unstressed, but it doesn't seem to be the GenAm pronunciation. Perhaps all our differing opinions over whether un- or en- (or bird, etc) has a schwa are a cautionary tale about editors unscientifically assessing their own pronunciations of things and assuming that's the phonemic value of a whole dialect, against the analysis of more scholarly and in some cases scientific sources. Perhaps the best approach is, while using the separate symbols (/ɪ/, /ɛ/, /ʌ/, /ɝ/, ...), to indicate in Appendix:English pronunciation the circumstances under which these may reduce to schwa, or the linguists who do vs don't think they reduce... - -sche (discuss) 22:50, 21 November 2022 (UTC)[reply]

Are we transcribing American English diphthongs wrong?

Going through our IPA transcriptions for lots and lots of diphthong-containing English words, I'm struck by how little most of our diphthong transcriptions have to do with how the diphthong in question is actually pronounced. I'm a native speaker of American English (born and raised in Central Massachusetts, since transplanted to Minnesota), so I'm gonna reserve judgment on our IPA transcriptions of diphthongs in other English dialects, but as regards AmE, the transcriptions in the third column of the following table seem to reflect actual pronunciation much better than the current ones (in the second column):

Suggested corrections in American English IPA diphthong transcription
Examples | Current IPA transcription | More-accurate IPA transcription
day, jade, raise | eɪ | ɛi
beer, mere, mirror | ɪɚ | iɚ
gyro, rice, why | aɪ | ɑi
hose, soda, stow | oʊ | ou
boing, joint, koi | ɔɪ | oi
chow, Mao, tau | aʊ | æu

Should we instead be transcribing these diphthongs using the transcriptions from the third column, rather than the second? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 03:27, 16 November 2022 (UTC)[reply]

Where I live (mid-western Canada), /aɪ/ is an accurate transcription. I'm so used to seeing /aʊ/ and /oʊ/ that I'm not sure about my ability to judge whether those are accurate or not, but I think they are. I think the first sound is most accurately transcribed as /ei/ where I live (and that's how I've been transcribing it in entries...whoops). The other two suggestions seem right to me. Makes me wonder just how close to standard American my accent is, though. I thought it was closer, but maybe I haven't been perceiving a lot of differences. Andrew Sheedy (talk) 03:44, 16 November 2022 (UTC)[reply]
Re your last point, I feel this is why we really need [as Wikipedia would put it] "reliable sources" and not just our intuitions when deciding all these pronunciation questions, because people too often misidentify sounds under the influence of expectations (context, etc), like the undone example further up this page. - -sche (discuss) 04:29, 16 November 2022 (UTC)[reply]
Are you sure it's /aɪ/? Have a listen to Geoff Lindsey's clips of King Charles saying it (2nd clip on the page) [4], I might be wrong but don't think I've heard that from Canadians. —Al-Muqanna المقنع (talk) 13:56, 16 November 2022 (UTC)[reply]
Hmm... I think it might actually be /ai/ for me. I was mainly focusing on the first vowel, since Whoop whoop pull up originally had /ɒi/ in the table. Andrew Sheedy (talk) 22:44, 17 November 2022 (UTC)[reply]
@Andrew Sheedy (and also pinging @Tharthan since it's relevant to some of the points they've made) Apologies there; I originally had the wrong lead vowel in the suggested transcription for the gyro/rice/why diphthong (damn you, Latin alpha v. reversed Latin alpha). Should be fixed now. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 00:16, 18 November 2022 (UTC)[reply]
I once wondered the same thing about /eɪ/. I, too, used to wonder if [ɛi] might be better. But I was reminded that Dutch has an /ɛi̯/ sound, and to my ears at least it doesn't sound the same as English's /eɪ/ sound. /eɪ/ does genuinely sound like [e] with an offglide. For plenty of American English speakers, the sound even seems to be [e(ː)]. So in General American /eɪ/ is definitely /eɪ/ (again, for some, [e(ː)]).
I disagree with /aɪ/ actually being [ɒi]. That's simply not true. Obviously, there is what can be called an /aɪ/ - /ʌɪ/ (often [ɐɪ̯]) split for many speakers, but /aɪ/ is not [ɒi]. [EDIT: I see that the table above has been changed to suggest /ɑi/ instead of /aɪ/. For reference, the table originally suggested /ɒi/ instead of /aɪ/. That was what this segment of what I said was responding to.]
I also disagree that /ɔɪ/ is actually [oi].
I think that the transcriptions that we use for diphthongs for General American are, broadly speaking, pretty accurate. Tharthan (talk) 13:53, 16 November 2022 (UTC)[reply]
@Tharthan The offglide in raise, gyro, and koi is definitely not ɪ, though, and neither is the offglide in stow or Mao ʊ (as a matter of fact, the constructions with ɪ or ʊ as the offglide are nearly unpronounceable if you actually try to produce them, and, if you somehow do manage to do so, sound nothing like the diphthongs that they supposedly represent). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 16:20, 16 November 2022 (UTC)[reply]
I would sooner analyse the day, jade vowel as just /e/; indeed, Baruch College (City University of New York)'s pages on how to pronounce English have the vowel of say and paid as just /e/ (not even diphthongal /eɪ/, let alone /ɛi/), and the vowel of boat as just /o/; this Berkeley page, although they accept the conventional notation, similarly counts "[i] [eɪ] [ɑ] [oʊ] [u]" among the "vowels", and not among the "diphthongs" ("[aɪ] [aʊ] [ɔɪ]"). Amusingly, the page on how to say /ɔɪ/ outright says to say /o/ and then /ɪ/, seemingly acknowledging that the first element is /o/ (although I agree the second element seems more /i/-y, leading to pronunciations with a very pronounced /i/, like google:boyeee). FWIW Baruch also uses ɑɪ, ɑʊ (something I have also seen a few other reference works do) where we use aɪ, aʊ, and they retain /ʌ/ rather than Merriam-Webster's schwa for words like come. All of these sources seem rather basic and I'd like us to find even better references. - -sche (discuss) 00:47, 17 November 2022 (UTC)[reply]
For me, definitely [ɑɪ] and [æʊ] are closer to the truth about the first half of these diphthongs. But I disagree that the second half of any of the diphthongs is accurately represented as [i] or [u]. If anything I would say more like [ɑe] and [æo]. I think the issue that is confusing Whoop whoop pull up is that English [ɪ] and [ʊ] are much more central than cardinal [ɪ] and [ʊ], and representations like [aɪ] and [aʊ] are using the cardinal variants of these vowels even while [ɪ] and [ʊ] by themselves stand for the centralized variants used in English. Benwing2 (talk) 05:28, 18 November 2022 (UTC)[reply]
There is a how-now split of sorts where for me some words retain [aʊ] and others have shifted to [æʊ] with no clear pattern that I can see. It also happens before consonants, so town and gown don't rhyme. I'm not sure how widespread this is and yet again I might be just living in a bubble, but since this split likely has nothing to do with the cot/caught, horse/hoarse, etc, it could be quite widespread after all. My guess is that some of the dialects where the shift is completely to [æʊ] lent us some of their words through cultural osmosis and that is why there is no perceivable pattern. Soap 11:28, 18 November 2022 (UTC)[reply]
Well, it is always fascinating to see when different, (relatively) nearby dialects influence one another. I would be curious to read some studies—from this century—on exactly to what extent, say, Midwestern and/or Southern speakers for instance living in places close by to or right where the Midwest and the South meet have each been being influenced by the other dialect. I have personally encountered speakers who came across to me as speaking what sounded markedly like a half-Midwestern, half-Southern dialect, and I have wondered 'Where exactly is that person from?'
It will be interesting to see how things develop going forward. Mind, you have to remember that even if, in the case of your dialect, there actually is what would appear to be an informal (and not even entirely conscious) split happening with /aʊ/ (where, as you describe it, some words have [æʊ] and others have what you describe as [aʊ]), it could just be a transitional state within a shift where one pronunciation is being slowly superseded by another. You'll have to see what the situation is a decade or two from now. Tharthan (talk) 02:10, 19 November 2022 (UTC)[reply]
In response to the corrected table: while the forms on the right seem reasonable alternatives to me, I don't find any of the forms on the left inaccurate enough to justify calling it "wrong" for American English as a whole. The forms on the right are closer to my perception of the sounds, but I am not actually sure that this corresponds to being more phonetically precise. For example, I am pretty sure that the start of my MOUTH vowel isn't actually front of center; it just sounds like an "æ" to me because my TRAP vowel has a range that covers both front and central qualities (e.g. I have a central vowel for TRAP before dark /l/), so I think a narrow phonetic transcription of my pronunciation would use [a] in both [aɫ] as in pal and [aʊ] as in cow. Likewise (in reverse), I'm not sure my PRICE vowel actually starts with a phonetically backer quality than MOUTH: it might just sound like that due to the contrast with its later trajectory, like one of those optical illusions where the same shade of gray looks either white or black depending on what color is next to it. Regarding the use of ɪ and ʊ to represent the offglides of English diphthongs, this is defended from a phonetic point of view by the linguist Mark Liberman in the blog post "The rɑɪt sɑʊnz?" (Language Log, October 2, 2010). My not very thought out reactions are that I would be comfortable with changing the transcription of the nuclei in ɪɚ and ɔɪ to [i] and [o], but uncomfortable with changing the transcription of the nuclei in eɪ and aʊ, and I'm unsure about the other proposals.--Urszag (talk) 06:53, 19 November 2022 (UTC)[reply]
I like i u or j w as the second element of the diphthongs in General American phonemic transcription. The ɪ ʊ convention seems to indicate that the second element of diphthongs is opener than the /j/ phoneme at the beginning of a syllable (unlike the similar diphthongs in Spanish; for instance, compare nylon with Spanish nailon), but I don't think this is phonologically relevant since we don't have a contrast between more and less close offglides in diphthongs (between /ai/ and /ae/ for instance). I feel like the offglides have the phonological feature of "high" or "close" and I would reserve the exact details of just how high or close for phonetic transcription. And ɪ ʊ as the second element of a diphthong confuses some people because the second element is usually different from the independent phonemes /ɪ ʊ/, which are usually not near-close near-front, but closer to [ə], and seems to give some people the impression that ɪ ʊ at the end of a diphthong basically mean nonsyllabic i̯ u̯. Writing the offglides as e o might lead to less confusion with the independent phonemes at least, though it might encourage some people to use weird pronunciations with very low offglides.
The first element of /oi/ makes sense to me. The first element feels to me close enough to the goat vowel. It doesn't really match my thought vowel, which is very close or identical to the lot vowel /ɑ/. I feel like you'd have to have a very close and rounded caught vowel or a very low and unrounded first element of choice for the two to be similar, which might be true of some Southern accents I've heard.
/ɛi/ doesn't describe my pronunciation because I don't perceive that vowel as being related to the dress vowel but rather as a closer monophthong. It looks like the Standard Southern British /ɛj/ transcription and my face vowel starts higher and is often not very diphthongal. But /ɛi/ might make sense for some eastern accents and related varieties of General American.
The transcription of the first element of /ɑi æu/ isn't true for all General American. The first element, when it is not affected by Canadian raising, can be front or back in three main configurations and I think all of them might occur in General American-type accents. There are the accents where the first element of the diphthong has the same frontness as the second element (/æi ɑu/), where the first element is the same for both /ai au/, and where the first and second elements have different frontness (/ɑi æu/) like in your transcription. These can be schematically represented vowel-chart-style as ||, V, and X. I'm not sure which of these is more common in General American. /ɑi ɑu/ would make sense to me because I intuitively analyze the first element as the lot vowel, but that probably doesn't make sense for all accents.
I like /iɚ/ or /iɹ/ because my near vowel seems to have the fleece vowel plus /ɹ/. I think /ɪɹ/ makes sense for the vowel of mirror in those American accents where it's distinguished from the vowel of serious, but in my accent those aren't distinguished and the merged vowel seems to belong to the tense phoneme to me, even though it is opener than the same vowel without an /ɹ/ after it. (I feel like most of my vowels before /ɹ/ are tense, so here, square, force, cure are /iɹ eɹ oɹ uɹ/, not /ɪɹ ɛɹ ɔɹ ʊɹ/.)
Sorry for the long post, but I hope it's useful. — Eru·tuon 00:07, 20 November 2022 (UTC)[reply]
@Erutuon - TL;DR: AmE diphthongs are really, really messy? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:06, 24 November 2022 (UTC)[reply]
Seems like it. I discovered a map in the Atlas of North American English showing some of the variation in the ow and eye diphthongs as it relates to the group of accents that they call "the North". It's on page 2 of the PDF of Chapter 14. The legend is a bit cryptic, but as I understand it, it's describing the relative frontness of the first part of the diphthong, and "F2(aw) < F2(ayV)" means that ow is backer than eye (roughly /ɑu æi/, ||), "F2(aw) - F2(ayV) < 75" means ow is slightly fronter than eye (roughly /au ai/, V), and "F2(aw) < F2(ayV)" means ow is significantly fronter than eye (roughly /æu ɑi/, X). So ow is backer or roughly equal to eye from the northern prairies of the US and Canada into the southern Great Lakes region, and some parts of upstate New York and New England, but the rest of the country has ow fronter than eye. I feel like some pronunciations from either group would sound GA enough, as long as the first element is open and not very rounded, and the second element of ow is rounded and not too fronted. — Eru·tuon 18:10, 5 December 2022 (UTC)[reply]
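
The three-way reading of the legend can be written out as a small Python sketch. Everything here is an assumption layered on the description above: classify_onsets is a hypothetical helper, F2 values are taken to be in Hz (higher F2 meaning a fronter vowel), the 75 threshold is taken from the quoted legend, and the third category is treated simply as "everything else", since the legend text for it is not certain. The sample numbers are invented for illustration and are not measurements from the Atlas.

```python
def classify_onsets(f2_aw, f2_ay, threshold=75.0):
    """Classify the relative frontness of the ow and eye onsets.

    f2_aw, f2_ay: second-formant values of the first element of each
    diphthong, assumed to be in Hz (higher F2 = fronter onset).
    Returns "||", "V", or "X", following the schematic labels above.
    """
    diff = f2_aw - f2_ay
    if diff < 0:
        return "||"   # ow backer than eye, roughly /ɑu æi/
    if diff < threshold:
        return "V"    # ow about equal to or slightly fronter than eye, roughly /au ai/
    return "X"        # ow clearly fronter than eye, roughly /æu ɑi/ (assumed complement)


# Invented example values, purely illustrative.
for aw, ay in [(1400, 1500), (1550, 1500), (1700, 1500)]:
    print(aw, ay, classify_onsets(aw, ay))
```

Treating the difference as a single number makes it clear that the three configurations are points on a continuum, which fits the observation that pronunciations from either group can still sound GA.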
Owtchie. It looks like Appendix:English pronunciation's vowel section'll need a big chunk of added explanation regarding diphthongs in GA. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 07:12, 6 December 2022 (UTC)[reply]
@Erutuon About the only common thread I'm picking up on between all of those different realizations of the GA ow and eye diphthongs is that none of them are accurately represented by our present convention of using /aʊ aɪ/! Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 07:16, 6 December 2022 (UTC)[reply]
@Whoop whoop pull up: I wouldn't say /aʊ aɪ/ is inaccurate. It's one possible representation of a same-first-element pronunciation /au ai/, but the use of /ʊ ɪ/ indicates that the second element of the diphthong isn't fully close, which I think is correct (haven't seen studies on this though). But it's confusing because the second element doesn't match many people's lowered and centralized pronunciations of the independent /ʊ ɪ/ vowels, and I think it's an unnecessary degree of detail. — Eru·tuon 15:07, 7 December 2022 (UTC)[reply]
@Erutuon: So what should we do with Appendix:English pronunciation regarding the highly-variable pronunciation of GA diphthongs? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 21:18, 10 December 2022 (UTC)[reply]
The appendix is currently describing (broad) phonemes, yes? And the various competing ways of representing /aɪ/ (etc) aren't contrastive with each other (they're competing notations / realizations of the same phoneme) ... so pick one notation for the broad phoneme (perhaps the current, traditional one, or perhaps we reach consensus for a different one) and mention competing representations and various narrow realizations in a footnote like /aɪ/'s footnote about /ʌɪ/? If specific dialects normally use a different phoneme/notation, the way e.g. Southern does (using just /a/ in certain circumstances), I've suggested mentioning that in the table or footnotes. (IMO we should try to have standards for all the dialects/lines we commonly include, since we don't want different people representing the same Boston pronunciation of a word two ways based on different sources' house notation styles any more than we want people representing the same GenAm pronunciation of the bed vowel two ways just because certain Collins dictionaries notate it /bed/...) - -sche (discuss) 22:26, 10 December 2022 (UTC)[reply]
For whatever it's worth, the book I mentioned in the discussion further down of /ol/, Syllable Structure: The Limits of Variation (2009) by San Duanmu (professor of linguistics at the University of Michigan), lists on page 185 various common, competing ways of notating diphthongs: ai vs aɪ, au vs aʊ, oi vs oɪ vs ɔɪ vs ɔi, ei vs eɪ vs (ɛɪ), ou vs oʊ, and writes of these: "no one uses [ɛɪ] for British or American English, probably because the starting point is higher than [ɛ]. One might also point out that the ending point of a diphthong does not quite reach the height of [i] or [u] [...] However, it is possible that the ending target of a diphthong is a tense high vowel, and owing to the lack of time the target is not quite reached. There are two other reasons against using lax vowels. First, a diphthong is like a tense vowel, and so we should represent diphthongs with tense vowels. Second, the lax vowels [ɪ] and [ʊ] do not occur in word-final position in American English, but diphthongs can. [...] Therefore [...] for now I use tense vowels to represent diphthongs, although I shall argue later that the feature [tense] is not contrastive in diphthongs." Some of that might be getting into higher-level theory than dictionary transcriptions normally do. - -sche (discuss) 22:26, 10 December 2022 (UTC)[reply]

/ɾ/ in GenAm

(Split out of a discussion above.) How do people feel about using /ɾ/ in place of /t/ (and /d/?) in /broad/ GenAm transcriptions of words that can exhibit Flapping? My impression is that our practice has been to restrict it to [narrow] transcriptions, which seems good to me because AFAIK flapping is facultative and noncontrastive, but some entries have been changed to use it as the broad transcription, like sanity.
BTW, for completeness: in the past, it has even been suggested to use /d/ (e.g. in otter), which someone in that discussion said the OED does. - -sche (discuss) 08:55, 16 November 2022 (UTC)[reply]

Is there any reliable source that has /ɾ/ as a separate phoneme of GenAm? I feel like I've only seen it as an allophone of /t/ or /d/. It also doesn't seem like GenAm natives internalize it as a contrasting phoneme either (considering the difficulty that arises when trying to explain the pronunciation of it in languages that actually have it as a phoneme). See also: w:Flapping. AG202 (talk) 13:51, 16 November 2022 (UTC)[reply]
Given that, a. when a word is stressed, /t/ and /d/ reappear immediately in whatever the relevant word is, b. the exact extent of flapping in speech differs from speaker to speaker; some speakers have it to the extent that they would consistently pronounce winter as [ˈwɪɾ̃ɚ] and twenty as [ˈtw̥ɛɾ̃i], others have it only to the extent that when a word with a medial /t/ or /d/ is said quickly in speech, it becomes [ɾ], and c. to AG202's point, speakers do not perceive it as a separate phoneme, I would strongly oppose using /ɾ/ in place of /t/ and /d/ in broad General American transcriptions. Tharthan (talk) 14:09, 16 November 2022 (UTC)[reply]
There is a contrast, in that flapped /d/ leaves a lengthening effect on a preceding vowel, while flapped /t/ does not. (A difference which does not appear, in practice, to be used as a cue for distinguishing /d/ and /t/.) See here. Nicodene (talk) 14:23, 16 November 2022 (UTC)[reply]
That's not the contrast being talked about though. That wouldn't lead to a difference in the representation of [ɾ] or in whether or not it should be a phoneme. AG202 (talk) 14:47, 16 November 2022 (UTC)[reply]
Yes, it does lead to different phonemic representations of [ɾ] < /t/ and [ɾ] < /d/. Otherwise, you'd have to posit a newly phonemic vowel length contrast. Nicodene (talk) 14:56, 16 November 2022 (UTC)[reply]
Yes, but the initial question is whether or not /ɾ/ should be a phoneme itself in the place of /d/ or /t/. AG202 (talk) 15:10, 16 November 2022 (UTC)[reply]
And? This is an argument against /ɾ/. Nicodene (talk) 20:05, 16 November 2022 (UTC)[reply]
Ahhh apologies, I wasn't sure why that was being brought up, but it makes sense now. AG202 (talk) 14:04, 17 November 2022 (UTC)[reply]
@Tharthan a. In flapped accents, /ɾ/ is mandatory in some contexts (e.g., GA shutter/shudder) and persists even when the word is stressed, without reverting to /t/ or /d/. In some other contexts (e.g., GA winter or militaristic), flapping is nonmandatory and does sometimes (though far from always) disappear with stress. b. The existence of minimal pairs between /ɾ/ and /d/ (and likely /t/ as well, though I haven't pinned those down yet), such as Beatty (/ˈbi.ɾi/) / beady (/ˈbi.di/), argues strongly in favor of /ɾ/ being a separate phoneme in GA. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 16:11, 16 November 2022 (UTC)[reply]
@Whoop whoop pull up Couldn't the example of Beatty vs beady be argued as /t/ with an allophone of [ɾ] vs /d/? I don't see a place where [ɾ] would have a minimal pair with /d/ where it's not clearly an allophone of /t/. (Same thing with the reverse, [ɾ] vs /t/) Also, regardless, the consonant is still the same to me in those two words, I don't pronounce it differently, both as [ɾ] (I can feel the flap happening). The distinguishing factor between the two for me (and also listening to forvo and merriam-webster) focuses on vowel length, which Nicodene discussed above. AG202 (talk) 17:06, 16 November 2022 (UTC)[reply]
In that particular case, I suppose you could make that argument (although I strongly suspect that, with enough digging, you'll find at least a few full ɾ/t/d minimal triplets), but the case for /ɾ/ being nonphonemic in GA still founders on the fact that there's a solid core of cases (such as shutter/shudder, duty/doody, metal/medal, etc.) where /t/ or /d/ cannot be substituted for /ɾ/ (attempting to do so makes it sound like you're imitating a foreign accent, which is a pretty surefire way of telling that that is not a way it can be pronounced in your home dialect), making any argument that /ɾ/ is merely an allophone of /t/ or /d/ untenable for those words (although there do exist other words which do exhibit facultative, rather than mandatory, flapping, such as dentist, Mediterranean, militaristic, planting, etc., for which one could make a reasonable case for /ɾ/ being allophonic rather than phonemic). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 04:55, 17 November 2022 (UTC)[reply]
But that does not prove that there is a phoneme /ɾ/: even if it's mandatory to use [ɾ] in some words, this can still be interpreted as a conditioned allophone rather than as a separate phoneme. Compare the standard treatment of word-initial [st] [sp] [sk] (start, spin, skin): they are analyzed as /st/ /sp/ /sk/, with the same phonemes as /t k p/ in tart, pin, kin, even though the latter set of words are pronounced with aspirated allophones [tʰ pʰ kʰ] which would sound unnatural in the first set of words (we can't use *[stʰ spʰ skʰ]), and even though the contrast between /t p k/ and /d b g/ is neutralized after word-initial /s/. It is not standard to treat unaspirated /t˭ k˭ p˭/ as their own phonemes.--Urszag (talk) 15:33, 17 November 2022 (UTC)[reply]
Word-initial /t/ can also be flapped, even in words like tonight per YouGlish. The initial sound there is, indisputably, an underlying /t/. A similar case can be made for the /t/ in bet versus betting. Nicodene (talk) 18:57, 17 November 2022 (UTC)[reply]
I'm not so sure about there being minimal triplets; the environments in which flapping can vs can't occur seem to make it difficult for there to be sequences otherwise identical in their phonemic elements like vowels and stress, where /t/ (or /d/) can flap in only one. If we discount word boundary and secondary stress differences, we might contrast "bee tea" [tʰ-] (unless it can also be flapped like the tonight example), "Beatty" [t-, ɾ-], "beady" [d-, ɾ-], but that approach would seem to equally well phonemicize things like aspiration ("hat spin", "hat's pin"), and dark l ("pool abs", "poo labs"). I don't know, perhaps we should consider phonemicizing all these things, including the vowel length differences that phonemicizing [ɾ] would require — when I look in scholarly sources, much of what I've seen so far that directly speaks against phonemicizing [ɾ] seems to amount to "it would make the description of the system more complicated, which we find less phonemic, albeit truer to the narrow phonetic realization",* but I am wary of our transcriptions straying far out away from what reliable sources seem to consider the /broad/ phonemes of American English to be, positing something like /ˈpʰiɾi/ where everyone else sees /ˈpiti/, [ˈpʰiɾi]. (That said, would someone like to add the length difference to the [narrow] transcription of rider vs writer, which is mentioned above [and below] as contrastive, but not indicated in the entries?)
*For example, William Frawley, International Encyclopedia of Linguistics: 4-Volume Set, 2003, page 332, says: "Another much-discussed problem is the minimal pair writer [...] vs. rider [...]. The surface contrast is in the length of the vowel. But most analysts felt that the correct phonemicization registers the contrast in the consonant as [t] vs [d]. [] Comparison with the morphologically related write and ride, as well as the restricted distribution of the flap, justifies locating the contrast in the consonant. The phonetic forms can be derived by ordered application of the independently needed rules of vowel-length assignment and flapping. But if we proceed simply on the basis of minimal pairs, then we must phonemicize the vowel length—even though this contrast only appears before the flap, and correlates elsewhere with voicing of the following consonant."
- -sche (discuss) 20:02, 17 November 2022 (UTC)[reply]
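The ordered-rule derivation described in the Frawley quotation can be sketched roughly as follows. This is only a toy illustration in Python, not anything used on Wiktionary: the segment lists, the function names, the set of voiced obstruents, and the simplified flapping environment (any /t/ or /d/ between two vowels, ignoring stress) are all assumptions made for the example.

```python
# Toy illustration of ordered rules: vowel lengthening applies first, flapping second.
# The segment inventories and rule environments are simplified assumptions.

VOICED_OBSTRUENTS = {"b", "d", "g", "v", "z", "ð", "ʒ", "dʒ"}
VOWELS = {"aɪ", "i", "ɪ", "æ", "ɛ", "ɚ", "ə", "ʌ"}


def lengthen(segments):
    """Mark a vowel as long (ː) when the next segment is a voiced obstruent."""
    out = list(segments)
    for i in range(len(out) - 1):
        if out[i] in VOWELS and out[i + 1] in VOICED_OBSTRUENTS:
            out[i] = out[i] + "ː"
    return out


def flap(segments):
    """Replace /t/ or /d/ with [ɾ] between two vowels (stress is ignored here)."""
    out = list(segments)
    for i in range(1, len(out) - 1):
        prev_is_vowel = out[i - 1].rstrip("ː") in VOWELS
        next_is_vowel = out[i + 1].rstrip("ː") in VOWELS
        if out[i] in {"t", "d"} and prev_is_vowel and next_is_vowel:
            out[i] = "ɾ"
    return out


def derive(segments):
    """Apply lengthening before flapping, so the /t/ vs /d/ contrast survives on the vowel."""
    return flap(lengthen(segments))


writer = ["ɹ", "aɪ", "t", "ɚ"]
rider = ["ɹ", "aɪ", "d", "ɚ"]

print("".join(derive(writer)))
print("".join(derive(rider)))
```

With this ordering the two words come out with the same flap but different vowel length; if flapping were applied first, the lengthening rule would no longer see the /d/ and the two outputs would be identical, which is the point the quotation makes about rule ordering.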
I think you're right to connect phonemicizing dark L and aspiration with the current discussion. I think they're on a similar level and it's quite possible that they are developing to the point of being phonemes in GenAm. However, I'm disinclined to say that these features are already part of General American. It seems more likely that specific regional accents are moving towards phonemicization, but not the "standard" accent.
My skepticism about a broad transition of /ɾ/ to phonemic status is that I'm not aware of any English speakers ignorant of linguistics who perceive it as such. For instance, people teaching non-native speakers English will often use /tʰ/ or /d/ in its place in order to differentiate, for instance, the pronunciation of "latter" and "ladder". Or people sometimes use /tʰ/ and /ɾ/, which suggests that they perceive /ɾ/ as the same sound (and therefore an allophone) of /d/, which lends support to the idea of using /d/ in broad transcription. The argument against that, however, is that it would confuse many people who haven't even noticed that they pronounce pairs like "latter" and "ladder" the same way, which is also common, in my experience. This all suggests that in General American, /ɾ/ is just an allophone. The difficulty English speakers often have with flapping /ɾ/ in foreign languages, as mentioned above, is a further reason to think that /ɾ/ is not perceived as a phoneme.
However, a potentially interesting data point, which may contradict what I've said, is that most of my siblings, before they learned to read, would typically render /ɾ/ as /tʰ/ when stressing the pronunciation of a word. I have heard several young children pronounce "ladder" as [ɫætʰɚ]. Andrew Sheedy (talk) 23:15, 17 November 2022 (UTC)[reply]
Maybe wait another couple decades and see, then? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 09:11, 20 November 2022 (UTC)[reply]

{{ja-noun}} for multiple counters

There are many Japanese nouns which have more than one counter (for example, (ろう)(そく) (rōsoku, candle) can be counted with (ちょう) (chō), (ちょう) (chō) or (ほん) (hon)). Is it possible to put more than one counter in {{ja-noun}}? Also, do we need a classifier-by-language category system for Japanese? --TongcyDai (talk) 10:10, 16 November 2022 (UTC)[reply]

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma): AG202 (talk) 13:46, 16 November 2022 (UTC)[reply]
I'm not really up on our Lua infrastructure and defer to others regarding {{ja-noun}}. Separately, I think a classifier-by-language category system for Japanese might well be useful. ‑‑ Eiríkr Útlendi │Tala við mig 18:53, 16 November 2022 (UTC)[reply]
Support inclusion of multiple classifiers and categorisations for Japanese and Korean and addition to Category:Nouns by classifier by language. Multiple classifiers are already supported in a number of languages, such as Thai, Khmer, etc. Please check Thai กรณีพิพาท (gɔɔ-rá-nii-pí-pâat), which uses two classifiers in the headword like this: {{th-noun|กรณี|เรื่อง}}. Probably worth checking the implementation of Module:th-headword.
It seems Vietnamese supports only one classifier with |cls=.
BTW, what should be done in cases where a classifier used depends on the sense? These could be split into multiple headwords. Anatoli T. (обсудить/вклад) 22:14, 16 November 2022 (UTC)[reply]
Sometimes a Japanese word is spelled using different kanji depending on the sense. We've used {{ja-def|[KANJI SPELLING HERE]}} on the specific sense line(s) for those cases. Could we do something similar for counters? ‑‑ Eiríkr Útlendi │Tala við mig 23:32, 16 November 2022 (UTC)[reply]
Vietnamese supports multiple classifiers actually, with auto categorization; see đá, for example. PhanAnh123 (talk) 04:37, 18 November 2022 (UTC)[reply]
@PhanAnh123: Thanks. The display there is OK, not broken but it's not like the Thai implementation where entry กรณีพิพาท (gɔɔ-rá-nii-pí-pâat) is categorised correctly by both Category:Thai nouns classified by กรณี and Category:Thai nouns classified by เรื่อง (both classifiers are used correctly for categorisations). So the Vietnamese solution is imperfect. The Thai entry doesn't require extra sets of brackets: {{th-noun|กรณี|เรื่อง}}, compare with the Vietnamese {{vi-noun|𥒥|cls=tảng, hòn, viên, cục}}. Anatoli T. (обсудить/вклад) 05:09, 18 November 2022 (UTC)[reply]
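The difference between the two approaches can be illustrated with a small sketch. It is written in Python rather than the Lua that Wiktionary modules actually use, and the function names, the comma-splitting rule, and the Vietnamese category names are assumptions for illustration only (the pattern is modelled on the Thai category names quoted above); it does not reproduce Module:th-headword or the Vietnamese headword module.

```python
def split_classifiers(raw):
    """Split a comma-separated classifier value such as 'tảng, hòn, viên, cục'."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def classifier_categories(language, classifiers):
    """Build one category name per classifier, e.g. 'Thai nouns classified by กรณี'."""
    return [f"Category:{language} nouns classified by {c}" for c in classifiers]


# Thai style: each classifier arrives as its own positional parameter.
thai = classifier_categories("Thai", ["กรณี", "เรื่อง"])

# Vietnamese style: a single cls= value that has to be split before categorising.
# The resulting Vietnamese category names are hypothetical.
vietnamese = classifier_categories(
    "Vietnamese", split_classifiers("tảng, hòn, viên, cục")
)

for name in thai + vietnamese:
    print(name)
```

A Japanese counter parameter could in principle follow either shape; which one fits {{ja-noun}} best is for whoever maintains that template to judge.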

What is the point of "partial" rhyme pages?

e.g. Rhymes:English/ɛnv... -- who would find this useful? Not even a stumped poet, as far as I can see. Equinox 18:42, 16 November 2022 (UTC)[reply]

They're mid-points in the infrastructure for getting to rhymes if you're wondering if any words end in a certain sequence (say -envik) and you can't think of one to look up, e.g. if you're adding what you think may be the first one (something I've used when adding Arabic loanwords with unusual codas) — or if, like at present with the two -envik words listed there, we don't have English entries with {{rhymes}} to look up. You go to Rhymes:English and click on the right vowel /ɛ/, get all the rhymes starting with Rhymes:English/ɛ-, Rhymes:English/ɛn..., Rhymes:English/ɛnv..., and find out if there are -envik words. If/as we migrate away from Rhymes: pages to categories, this ability will not be lost, because Category:Rhymes:English allows much the same thing by just presenting all possible rhymes up front, which is perhaps more efficient and better at requiring that only English words (with English entries) are listed and not foreign words and music album names and such. - -sche (discuss) 19:27, 16 November 2022 (UTC)[reply]
Speaking from experience, those pages are also useful for looking for near rhymes of a word, when an exact rhyme doesn't work. This specific page doesn't accomplish that, but many do. Andrew Sheedy (talk) 23:17, 17 November 2022 (UTC)[reply]
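The drill-down through partial rhyme pages described above can be sketched as follows. This is an illustrative Python function, not how the Rhymes: pages are actually generated: the function name, the depth limit, and the treatment of each character as one segment are assumptions, while the "-" and "..." suffixes follow the page names quoted in the comment above.

```python
def partial_rhyme_pages(rhyme, language="English", depth=3):
    """Return the chain of partial rhyme pages leading towards a full rhyme.

    Each character of the rhyme is treated as one segment, which is a
    simplification that would not hold for diphthongs or affricates.
    """
    pages = []
    for length in range(1, min(depth, len(rhyme) - 1) + 1):
        suffix = "-" if length == 1 else "..."
        pages.append(f"Rhymes:{language}/{rhyme[:length]}{suffix}")
    return pages


for page in partial_rhyme_pages("ɛnvɪk"):
    print(page)
```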

Etymology section for irregular non-lemmas

The standard established in Wiktionary:Beer parlour/2016/March#Etymology section for non-lemmas is that non-lemma forms shouldn't include etymologies. However, there are many cases where a form is irregular and it would be interesting to include a short etymology section that explains the origin of the irregularity (doing so in the lemma seems too verbose). One example is Portuguese fará, the third-person singular future indicative of the verb fazer, whose expected form is *fazerá. Furthermore, I find the current message left by Template:nonlemma insufficient in several cases. For instance, it would be interesting, imo, to briefly state that the verb form crerá is a regular suffixation of the lemma crer with -á, thus giving the reader the opportunity to follow not only the etymology of the lemma form but also that of the suffix (which is interesting by itself). The example above could be summarized in a template that returned something akin to "From crer + -á (future forming suffix). For further etymology, see the corresponding lemma form." This approach also brings the benefit of categorization.

In short, I would like to have your thoughts on the following two changes:

  1. Irregular non-lemmas can have etymologies explaining the origin of the irregularity
  2. Regular non-lemmas can have short etymologies (preferably templatized) linking to the lemma form and to an affix or to the glossary term that explains the derivation.

What do you think? - Sarilho1 (talk) 11:57, 17 November 2022 (UTC)[reply]
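The templatized message for regular non-lemmas could look roughly like this. It is a minimal Python sketch of the output only: the function name and parameters are hypothetical, no existing Wiktionary template is being described, and the suffix -á is an assumption inferred from crerá = crer + -á.

```python
def nonlemma_etymology(lemma, affix, gloss):
    """Build the short etymology line proposed for regular non-lemma forms."""
    return (
        f"From {lemma} + {affix} ({gloss}). "
        "For further etymology, see the corresponding lemma form."
    )


# The suffix "-á" is an assumption inferred from crerá = crer + -á.
print(nonlemma_etymology("crer", "-á", "future forming suffix"))
```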

I support common sense in cases like this. I don't think there's any point adding special etymologies to every non-lemma form out there, but I don't think they should be banned either, and they can certainly be helpful to explain irregular forms. I'm having a hard time finding where in the discussion you link that standard was established—it seems like the discussion was about a separate issue, whether to group all non-lemmas in a single etymology section, and it doesn't come to an obvious decision—but if it is a norm it doesn't seem particularly well-enforced anyway; I noticed that despite having a very predictable morphology Hungarian does have etymologies on stuff like plurals (lakatosok etc.). —Al-Muqanna المقنع (talk) 12:35, 17 November 2022 (UTC)[reply]
When one inflected form has a substantially different etymology, as in cases of suppletion, we absolutely allow etymologies: see English was. When you have different stems for whole blocks of the paradigm, though, I believe we cover that at the lemma. Chuck Entz (talk) 15:13, 17 November 2022 (UTC)[reply]
Somewhat relatedly, I note that we don't have any etymology for many inflectional morphemes and don't have entries for some. DCDuring (talk) 15:51, 17 November 2022 (UTC)[reply]
On one hand, there've been irregular forms where I've been tempted to put etymological information about the source of the irregularity; OTOH, we so uniformly centralize information to lemmas that I don't know how many people would think to look up an inflected form instead of a lemma. This is also my concern with certain plural-only senses of words most people would be able to figure out (and hence likely to look up) the lemma/singular of; like in those cases, I think we should have little pointers between the entries, e.g. if there's more information (about was) in was than there is in be, be should say something like "see was for more on that form". - -sche (discuss) 20:07, 17 November 2022 (UTC)[reply]
That seems sensible to me. I had a similar concern about Latin pluralia tantum not being signposted at the "singular" forms that people might look up; I added a see also to plural-only minae from mina after an IP tried to add definitions of minae to the second one. —Al-Muqanna المقنع (talk) 22:39, 17 November 2022 (UTC)[reply]

Osco-Umbrian language code

We should have it, for better categorization. Could be an etymology-only code to Proto-Italic. See entries like lupus, farfecchie. The name I believe should be Osco-Umbrian as that seems to be the preferred term in recent literature, with Sabellic being the older alternative, also still in great use (see ngrams). I believe Sabellian is usually the historical rather than linguistic term. The code could be itc-sbl. Catonif (talk) 13:52, 19 November 2022 (UTC)[reply]

As an encouragement to add this, I also note bitumen, botulus, rufus, lumbus, Vibius, bos, omentum, popina. I correct my previous statement "etymonly code to Proto Italic" with "language family code", as the parent of Oscan, Umbrian, South Picene and all the other minor w:Osco-Umbrian languages. Catonif (talk) 19:50, 1 December 2022 (UTC)[reply]

/ɝ/ vs /ɚ/ in GenAm

Given the disagreement about /ʌ/ vs /ə/, I want to bring up /ɝ/ vs /ɚ/. Traditionally, words like nurse, termite and turf are considered to have /ɝ/ in GenAm, but recently some users (including some who want to keep /ʌ/!) have changed /ɝ/ words to merge and unite that vowel to /ɚ/. AFAICT, the sources which support vs oppose /ʌ/ vs /ə/ are the same as support vs oppose /ɝ/ vs /ɚ/. Cambridge, Collins, Dictionary.com, Oxford Learner's, and MacMillan all have the US pronunciation of termite as /ɝ/~/ɜr/~/ɜːr/, and though Merriam-Webster's non-IPA notation is "ˈtər-ˌmīt", they clarify that this means IPA [ɝ, ɚ]. So if there's no consensus to change e.g. un- to schwa, should we also be undoing the edits that changed e.g. turf to schwa? - -sche (discuss) 23:38, 20 November 2022 (UTC)[reply]

Although I prefer to differentiate /ʌ/ vs /ə/, I am not sure whether I prefer to use /ɝ/ vs /ɚ/ or just /ɚ/. Phonetically, I can't hear a clear difference between the vowels themselves, which makes /ɚ/ seem like the simpler option, but I do see the argument that /ɝ/ vs /ɚ/ is more consistent. Phonologically, if we do not indicate tertiary stress/stress on syllables coming after the primary stressed syllable, the use of /ɝ/ vs /ɚ/ could be helpful to indicate differences in prosody that correspond to use vs. non-use of t-flapping etc.: e.g., in the word dramaturgy. But I guess unflapped t can occur before /ɚ/ in some cases, e.g. Mediterranean, militaristic, so even in this case /ɝ/ vs /ɚ/ isn't perfectly informative.--Urszag (talk) 00:16, 21 November 2022 (UTC)[reply]
Given that there's neither a phonemic nor a phonetic difference between stressed and unstressed GA rhotic schwas (with the use of /ɝ/ as well as /ɚ/ being simply a notational convention to differentiate stressed from unstressed rhotic schwas, a convention rendered completely redundant by the use of ˈ and ˌ to indicate primary and non-primary stress, respectively), that the GA realization of both is [ɚ], and that GA rhotic schwas are produced by R-coloring /ə/, whereas /ɝ/ is the R-colored version of a vowel that does not exist in GA in non-rhotic form, I would argue strongly in favor of deprecating /ɝ/ for IPA transcription of GA rhotic schwas and using /ɚ/ to represent both stressed and unstressed rhotic schwas in GA, with /ɝ/ being restricted to use in IPA transcriptions of regional dialects where /ɝ/ (and /ɜ/) actually exist. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 23:47, 21 November 2022 (UTC)[reply]
I think their clarification amounts to "ɝ is what you will find in traditional transcriptions of American English". Nicodene (talk) 23:57, 21 November 2022 (UTC)[reply]
I agree that /ɚ/ or /əɹ/ should be used instead of /ɝ/ or /ɜɹ/ in General American because there's no such phonemic contrast in typical GA accents. There's probably a phonemic contrast in some accents like a stereotypical old-fashioned New York City accent (in which /ɝ/ sounds kind of like oi). But this isn't true for General American and using two separate symbols might lead some people trying to learn General American pronunciation to believe that they need to pronounce the first syllable of murder with an opener vowel than the second syllable, which is not true, or to try to hear a difference that isn't there. — Eru·tuon 01:15, 22 November 2022 (UTC)[reply]
I don't have much to add to this thread, as opposed to the STRUT/COMMA thread above. I suspect the Americans who maintain the distinction are mostly nonrhotic speakers, and that within that dialect pool, the distinction might be not in the vowel height but in whether the speaker pronounces the /r/ in their otherwise nonrhotic dialect. Wikipedia suggests here that at least in NYC, the nonrhotic speakers now pronounce /r/ in words like bird. Whether this can be analyzed as due to stress or not, I don't know. Soap 12:04, 24 November 2022 (UTC)[reply]
In a sense the older pronunciation of bird with [əɪ] (coil–curl merger) was still pronouncing /r/, just with a different tongue shape. Geoff Lindsey has a blog post where the second sound file demonstrates the similarity of a bunched r to a palatal y sound. It was pretty eye-opening to me because it shows that the odd nurse vowel pronunciation could have developed just by slight changes in tongue shape from a more typical American r. — Eru·tuon 14:58, 12 December 2022 (UTC)[reply]
@Kwamikagami, given diff, perhaps you want to weigh in here? Personally I'd prefer to retain the info about what was originally /ɝ/ vs /ɚ/ (even if only as e.g. "older GenAm"), but my impression of the above discussion and Wiktionary:Tea room/2022/September#NURSE_vowel_vs_stressed_schwa_in_turfy is that people prefer /ɚ/ for modern {{a|GenAm}}. - -sche (discuss) 04:23, 6 December 2022 (UTC)[reply]
I just reverted a change where the two should be kept distinct. If we use /ɚ/ for both full and reduced vowels, then there is no way to indicate that the second syllable in t-girl has what Webster's calls "secondary stress" except by adding a spurious stress mark, which is not a phonemic IPA transcription and would not be used in the OED, despite GA and RP having the same stress pattern. kwami (talk) 04:33, 6 December 2022 (UTC)[reply]
@Kwamikagami IPA has secondary-stress marking. That's what ˌ is for. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 05:41, 6 December 2022 (UTC)[reply]
Yes, it does. But it's supposed to mark stress, not vowel quality as in Webster's (which isn't IPA anyway). kwami (talk) 05:51, 6 December 2022 (UTC)[reply]
There is no vowel-quality difference here. The vowels in the first and second syllables of "murdered" are identical except for stress, and the second-syllable vowel of "t-girl" is identical in quality to both.
On a closely-related note, User:Kwamikagami's been revert-warring on fur, claiming that there's some difference in pronunciation between fur and unstressed for, when, in fact, the two are completely homophonous in GA, both being pronounced /fɚ/. (Maybe there's some distinction between the two in Kwami's personal regiolect, but no such distinction is present in GA itself.) Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:17, 6 December 2022 (UTC)[reply]
We make the distinction in this key. That's what this discussion is about. (Especially when we don't mark stress, as with fur, marking vowel reduction becomes important.) If you want to retire the distinction, fine, but you should get consensus here first, and change the key accordingly, and only then change the transcriptions of the articles. You shouldn't edit-war over imposing your POV while the discussion is still in progress, and contradicting the key that users are referred to.
Also, MW distinguishes fur from the reduced pronunciation of for, so you're contradicted by at least that source. kwami (talk) 06:23, 6 December 2022 (UTC)[reply]
MW uses outdated pronunciations for GA, as demonstrated both here and at rolloff below. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:50, 6 December 2022 (UTC)[reply]
And, as for making a distinction between /ɚ/ and "/ɝ/" for GA in our pronunciation key, that's a distinction without a difference (and one that, I suspect, has been kept around in zombie form by dictionaries like MW continuing to parrot outdated pronunciations rather than updating their pronunciation keys to reflect how words are actually pronounced nowadays) - modern-day GA has absolutely no phonemic or phonetic difference in vowel quality between stressed and unstressed rhotic schwas, which is why we're discussing retiring "/ɝ/" from use for GA and why there appears to be a consensus to go ahead and retire "/ɝ/". Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 07:06, 6 December 2022 (UTC)[reply]
Re "MW distinguishes fur from the reduced pronunciation of for": do they?? They have "fər, (ˈ)fȯr Southern also (ˈ)fär" for for and "ˈfər" for fur, differing only in stress, which is ... what everyone is saying here, that people who write /ɝ/ are just doing so to indicate "the /ɚ/ sound, but with stress", which MW is thus indicating in a more traditional or straightforward manner (by just writing the ər sound, but with stress). (Now, this is also what people use /ʌ/ for — "the /ə/ sound, but with more stress" — so I continue to be intrigued that the sets of people who want to fold /ɝ/ into /ɚ/ and those who want to fold /ʌ/ into /ə/ differ, but nonetheless...) - -sche (discuss) 07:22, 6 December 2022 (UTC)[reply]
@-sche: (I strongly suspect that the reason for those two sets differing is that not everyone in favor of folding /ɝ/ into /ɚ/ uses stressed /ʌ/ to represent "the /ə/ sound, but with more stress"; a significant subset of the former category [me, for instance] use stressed /ə/ to represent a stressed neutral vowel [i.e., "the /ə/ sound, but with more stress"] and stressed /ʌ/ to represent a stressed vowel that's significantly more open (and usually somewhat more back as well) than the stressed neutral vowel represented by /ə/.) Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 07:31, 6 December 2022 (UTC)[reply]
Could you provide an example of stressed /ə/ vs stressed /ʌ/? I've never seen an analysis of English like that. kwami (talk) 20:52, 7 December 2022 (UTC)[reply]
@-sche Yes, MW distinguishes them by marking 'fur' as having stress. But we don't mark stress, so we need to distinguish the vowels. My point was that MW is a RS that 'fur' and reduced 'for' are not homophones, so we shouldn't claim they are homophones, which is what would happen if we collapsed this distinction. I've gotten into arguments with people here who insist that monosyllables like 'fur' don't have stress, and who revert my attempts to add it. So if the Wikt convention is to not mark monosyllables for stress, we need some other remedy for distinguishing them. kwami (talk) 20:49, 7 December 2022 (UTC)[reply]
No we don't, because that distinction no longer exists in modern GA, MW's fossil pronunciations notwithstanding. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 20:59, 7 December 2022 (UTC)[reply]
I'm with you on wanting to transcribe stress on non-clitic monosyllables, but I don't care enough to argue with everyone else about it. — Eru·tuon 14:10, 8 December 2022 (UTC)[reply]

@-sche @Urszag @Nicodene @Erutuon @Soap @Kwamikagami: Do we have a consensus either way yet? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 20:37, 7 December 2022 (UTC)[reply]

My vote is to deprecate /ɝ/ for General American. Nicodene (talk) 21:25, 7 December 2022 (UTC)[reply]
I'm in favor of using /ɚ/ or /əɹ/ in place of /ɝ/ or /ɜɹ/ for General American. — Eru·tuon 14:28, 12 December 2022 (UTC)[reply]
OK, that makes me, Nicodene, and Erutuon in favor of deprecating /ɝ/ for GA, on the one hand, and Kwami opposed to deprecating /ɝ/ for GA, on the other. @-sche, Urszag, Soap, wanna weigh in? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 09:08, 16 December 2022 (UTC)[reply]
I have some thoughts on the matter as well.
In Ohio, some other nearby Midwestern states, as well as some states more southwards, /ɜ/ has been clearly established to exist as part of the inventory of vowels. I hardly think that it would be strange for General American speakers from those areas to have /ɝ/ or /ɜɹ/ in a word like her.
If [ɝ] or [ɜɹ] is indeed a possibility for General American speakers, discarding it in our transcriptions and uniformly using /ɚ/ or /əɹ/ might not be the best idea. Tharthan (talk) 15:41, 16 December 2022 (UTC)[reply]
@Tharthan: Do you remember where you heard or read about it? I don't know what such a distinction would be like in near-General American accents. I guess older New York City English might have a distinction with the nurse vowel sounding like [əi] and the letter vowel being often just a derhoticized schwa, but that wouldn't be the difference in Ohio. Unfortunately on Wikipedia I don't remember seeing anything about the nurse-letter distinction in relation to American English except in the w:General American English article, which says there isn't one. — Eru·tuon 16:35, 16 December 2022 (UTC)[reply]
I wasn't suggesting that there exists a conscious distinction for those speakers. I was pointing out that the actual pronunciation of what is being suggested be uniformly transcribed as /ɚ/ may, at least in certain environments, be closer to [ɝ] than to [ɚ] for them.
As for where /ɜ/ is used by such speakers, from what I understand it is often their STRUT vowel. I have only personally ever heard it myself—for the STRUT vowel—in the speech of some Southerners. But I have read that it occurs in certain Midwestern states as well, including Ohio.
Anecdotally, I recall some years ago seeing a Southern speaker write a stressed "love" (in all capitals) as "lurve". In speech where /ɜ/ has been established to exist as a vowel (such as by being its STRUT vowel), "lurve" would seem to represent at the very least [lɜv], if not [lɝv]. Tharthan (talk) 01:48, 17 December 2022 (UTC)[reply]
I don't have a strong opinion either way. But if we use just /ɚ/ or /əɹ/ (I think I'd prefer /ɚ/) and not /ɝ/, the issue of how to transcribe unreduced/tertiary-stressed vs. reduced/fully unstressed syllables after the main stress in words like t-girl (kwami's example) or dramaturgy (my example) does not seem to be resolved yet. I don't consider stress to be a fully phonetic characteristic, so I'm not entirely satisfied with kwami's argument that we shouldn't transcribe stress in this position because it isn't phonetically present. Also, there is a contrast between "strong"/unreduced and "weak"/reduced /ɪ/ even though I don't think we transcribe them differently: e.g. I think autism and autist can be pronounced with strong /ɪ/, resulting in no flapping of the preceding /t/, but emphatic has weak /ɪ/ in the final syllable ("emphatic" example taken from John Wells's blog post strong and weak, Friday, 25 March 2011). To me, using distinct symbols for the vowels in these kinds of syllables seems like a more unwieldy way of transcribing this kind of contrast than using stress marks (as in the Oxford English Dictionary's American English transcriptions: "/ˈdrɑməˌtərdʒ/"). But if there is a consensus for kwami's position that we should not transcribe any stress on these syllables, then I can understand why /ɝ/ could be considered useful. I would like to see further discussion of this issue so we can get an answer to that at the same time.--Urszag (talk) 16:53, 22 December 2022 (UTC)[reply]

/ɜ/ vs /ə/ in RP / British

I notice that Appendix:English pronunciation already had (for some time now) a note about bird/nurse words, "For RP, /əː/ is sometimes used as an alternative to /ɜː/—for example, in dictionaries of the Oxford University Press." So should we be merging these not only in GenAm but also in RP / British? Any British editors want to weigh in? Does a word like murdered have two different vowels, or one vowel that just differs by length? - -sche (discuss) 00:57, 25 November 2022 (UTC)[reply]

@-sche: As something resembling a phoneme in British English, I've only noticed length on consonants. I think the height difference is variable, though it should be noted that the second vowel is a schwa, and thus quite variable in itself. The first vowel is long; British English fundamentally retains length, with roughly three length phones in closed syllables, as short before nominally voiced and long before nominally voiceless are about the same. The quality of the first vowel is what IPA writes as [ɘ] (close mid central), though most linguists stick with "[ɜ]", which has been declared by the IPA to be open mid central. --RichardW57 (talk) 14:56, 5 December 2022 (UTC)[reply]
Formally, this is similar to the traditional use of "ʌ" for what is now [ɐ], the STRUT vowel. --RichardW57 (talk) 14:56, 5 December 2022 (UTC)[reply]
In my pronunciation of murdered the two vowels are subtly different beyond just length, though if I force myself to pronounce both as /ə(ː)/ it doesn't sound particularly unusual. The first one is somewhere on the ɜ~ə spectrum, [ɘ] is probably right. —Al-Muqanna المقنع (talk) 15:05, 5 December 2022 (UTC)[reply]
Is the difference something that cannot be explained by stress? Nicodene (talk) 15:14, 5 December 2022 (UTC)[reply]
It is transcribed that way in the American tradition, at least with Merriam-Webster, but that's not IPA. E.g. t-girl in the thread above. The distinction typically comes up in compound words, where one of the elements loses its stress but retains a full vowel. So in t-girl the second (unstressed) vowel is the same as the first (stressed) vowel of murdered, rather than as the second. If we're going to follow the OED in using stress marks for stress, then I believe we need different symbols here. kwami (talk) 04:41, 6 December 2022 (UTC)[reply]
@Kwamikagami Could you please clearly explain what you're trying to say? IPA does use secondary-stress marking (that's what ˌ is for), and in GA itself the two vowels of murdered and the second-syllable vowel of t-girl are completely identical except for stress (certain regiolects do have a distinction between those vowels that goes beyond stress, but that's a feature of those specific regiolects, not of GA). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 05:41, 6 December 2022 (UTC)[reply]
There is no distinction in stress. Both murdered and t-girl are stressed on the first syllable. Webster's would mark the second syllable of t-girl as having "secondary stress", but that's just how they distinguish full vowels from reduced, since they don't have enough symbols for all the vowels. It's not stress, so we shouldn't mark it with the IPA stress mark. (English doesn't have secondary lexical stress, but it's common to transcribe all but the last stressed syllable as having secondary stress, because the last stressed syllable gets additional phrasal stress when the word is pronounced in isolation.) kwami (talk) 05:46, 6 December 2022 (UTC)[reply]
Hm? They say they use the secondary stress mark to mark secondary stress, and when I spot-check now, the things they mark with secondary stress seem to be things that bear some secondary stress. You say stress marks are "just how they distinguish full vowels from reduced, since they don't have enough symbols for all the vowels", but they mark secondary stress also on words like girlfriend and battlefield where the symbols they notate the second syllables with already indicate that the vowels are not reduced. - -sche (discuss) 07:49, 6 December 2022 (UTC)[reply]
Yes, they interpret it as stress, but that's contradicted by phoneticians as well as by other dictionaries. That pattern is typical of compound words. But according to Ladefoged (who should know the IPA) and the OED, those words don't have secondary stress. kwami (talk) 08:00, 6 December 2022 (UTC)[reply]
Unless they specifically state 'these words don't have secondary stress', there is no reason to interpret their transcriptions that way. Secondary stress is often simply left untranscribed, especially on the phonemic level. Nicodene (talk) 15:09, 6 December 2022 (UTC)[reply]
He actually says that no words in English have secondary stress, so that would include these words. kwami (talk) 20:59, 7 December 2022 (UTC)[reply]
The phonetic correlates of stress are often not simple to interpret. John Wells made a blog post suggesting that in his view, it's not the case that secondary stress is impossible after a primary-stressed syllable, it's just that it's optional, and he indicates that there are different traditions about how to transcribe stress in English. Speaking of the example irritating, Wells writes "Actual rhythmic beats following the main word stress accent are all pretty optional, which is why the British tradition is not to show any secondary stress in words like this: ˈɪrɪteɪtɪŋ, not *ˈɪrɪˌteɪtɪŋ. The alternative tradition, usually followed in the States and (for example) Japan, is to recognize a secondary stress on the penultimate, írritàting." irritating hamburgers (John Wells’s phonetic blog, 29 September 2009).--Urszag (talk) 15:18, 6 December 2022 (UTC)[reply]
According to Ladefoged, in GA that's a matter of vowel reduction, not stress. He was unable to find any phonetic indication of secondary stress: after primary stress, those are just un-reduced vowels, and before primary stress, they're just primary stress that doesn't have phrase-final intonation. (The difference between primary and secondary stress disappears when you put the word in a phrase so it's no longer phrase-final.) kwami (talk) 21:03, 7 December 2022 (UTC)[reply]
Out of curiosity I recorded myself and cropped the vowels: they are definitely a different quality though whether that's just because my stressed articulation of /ə/ is more open is past my pay grade. —Al-Muqanna المقنع (talk) 14:22, 6 December 2022 (UTC)[reply]

GenAm vs US in Template:accent

This was discussed years ago, but maybe with new editors and interest we can decide: what is the difference between a bare {{a|US}} accent label (as in doggy), and {{a|GenAm}} (as in body)? Do we want to standardize on one and make the other an alias, or establish an intended distinction, e.g. using "US" as an (empty? diaphonemic?) element atop a list of "GenAm; NYC; Southern US" etc pronunciations? (And "UK" atop a list of "RP, SSB, Geordie, Wales" etc?) I'm not talking about the use of {{a|US}} as part of a label like {{a|US|regional|Southern US|Midwestern US}} which, while I think it's unnecessary — just say {{a|Southern US|Midwestern US}}, or if you don't know which regions, say {{a|regional US}}! — is at least a different beast. - -sche (discuss) 23:06, 22 November 2022 (UTC)[reply]

I like the idea of using 'US' and 'UK' as empty geographic headers. Nicodene (talk) 23:38, 22 November 2022 (UTC)[reply]
As a Canadian, I very much don't. Canada is not part of the US, but Canadian English is part of GenAm. It would be silly to add a separate "Canada" label to almost every entry and inaccurate to just leave it as "US". Andrew Sheedy (talk) 04:49, 23 November 2022 (UTC)[reply]
This is the first time I've heard GenAm defined as explicitly including Canadian English. I would have previously said that Canadian English is strictly speaking something else, just similar enough to be mostly covered by GenAm transcriptions. It looks like w:General American English does suggest Canadian English might be included. Might it be clearer to label entries that are meant to cover both Canadian and US accents as "North American" rather than "General American"?--Urszag (talk) 06:01, 23 November 2022 (UTC)[reply]
I would be fine with that. I could understand someone wanting to restrict GenAm to a more specific accent or set of accents, but the fact stands that Canadian pronunciation is a subcategory of the class of accents found in the US. There's much more of a difference between different accents in the UK than between any variety of Canadian English and standard American English. Andrew Sheedy (talk) 15:59, 23 November 2022 (UTC)[reply]
We can simply make the header 'North America'. Nicodene (talk) 18:01, 23 November 2022 (UTC)[reply]
My understanding is that GenAm is a specific accent whereas US refers to all accents in the states. Vininn126 (talk) 17:52, 23 November 2022 (UTC)[reply]
No, General American is a continuum of accents. It is not a single unified accent. Tharthan (talk) 22:39, 23 November 2022 (UTC)[reply]
What is a label that "refers to all accents in the states" useful for? Surely if I put {{a|US}} {{IPA|en|[bəɪd]|[bɜd]}} in the entry bird, this—although technically true, these are pronunciations found in the US—is unhelpful because I should be labelling them by where in the US each one is found. Should {{a|US}} try to be a diaphonemic representation of all US accents? But this is probably impossible/inadvisable, as Erutuon says. Should it be an empty placeholder? But I don't expect casual users to understand why there's a blank line, they'll probably keep using it for GenAm pronunciations as at present. Should it be an alias that displays the same as GenAm? IMO that would be most maintainable. - -sche (discuss) 02:18, 25 November 2022 (UTC)[reply]
I prefer the use of {{a|US}} as a header and not a label for individual phonemic transcriptions, because we do not have a diaphonemic transcription system that accommodates all the diverse pronunciations in the US (nor should we, because it would be way too confusing) and {{a|US}} is sometimes placed in front of transcriptions that definitely do not describe some pronunciations in the US. For instance, the entry for glory gives two incompatible US pronunciations, /ˈɡlɔɹ.i/ and /ˈɡlo(ː)ɹi/, but only the first is labeled as US; it is actually intended to represent the General American pronunciation and should be labeled with {{a|GA}} or {{a|GenAm}}. (Granted the transcription of glory will change to /ˈɡloɹi/ with the discussion further up this page, but still there will be two US pronunciations in entries like north or horror, and no matter what transcription system we could devise, it would never accommodate all US accents without having way too many hard-to-remember symbols.) There are some words in which the phonemes in the different US accents probably don't need different symbols, but it's better to err on the side of being specific because most people can't reliably determine which words those are (even I'd be unsure if we really tried to transcribe the full variety of accents). — Eru·tuon 22:45, 23 November 2022 (UTC)[reply]
On the subject of headers, perhaps we can have three to cover the main native zones, namely British Isles (UK + Ireland), North America (US + Canada), and Oceania (Australia + New Zealand). Nicodene (talk) 05:33, 24 November 2022 (UTC)[reply]
Regarding the inclusion of Canadian in GenAm: while various sources (summarized on Wikipedia) do say Canadian resembles GenAm, it seems like most people do interpret "General American" (when named that way) as a US thing; Appendix:English pronunciation not only treats GenAm and Canadian as different accents but specifies quite a few differences; and entries (e.g. orange, out) specify some pronunciations as Canadian-only and others as GenAm-only. So, if we're intending to subsume Canadian into a "General North American", it might be necessary to rename along those lines to make the scope clear. But since most sources only describe US/GenAm and UK/British/RP pronunciations, if we just start using sources that describe US-GenAm to support our pronouncements about North American, it'd seem kinda fishy (source hijacking/falsification). - -sche (discuss) 02:18, 25 November 2022 (UTC)[reply]
With regard to out, I think the current treatment may not actually be ideal—so-called "Canadian raising" is definitely not confined to Canadian speakers, but occurs for a non-negligible number of US speakers as well. However, I think this phonemic (or incipiently phonemic) split is poorly documented, so I'm not sure how good a job Wiktionary can do at showing information about it.--Urszag (talk) 09:38, 25 November 2022 (UTC)[reply]

My preference, FWIW, would be: make "US" an alias of "GenAm" so writing {{a|US}} produces "(General American)", and nest NYC, Boston, Southern US, California, etc underneath GenAm (and let Canadian keep its own line whenever it's different from GenAm, otherwise combine labels when the accents have the same pronunciation, like we already do: {{a|GenAm|Canada}}). The UK situation may need to be handled differently (e.g. not nesting things under RP) since so few British people use RP. - -sche (discuss) 02:19, 25 November 2022 (UTC)[reply]
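
The alias proposal above can be illustrated with a minimal sketch. This is not how Template:accent or its backing module is actually implemented; the `ALIASES` and `DISPLAY` tables and the `render_accent_labels` function are hypothetical names made up for illustration, and treating "GA" as a second alias is likewise an assumption. The sketch only shows the intended behaviour: an alias resolves to one canonical label, and combined labels are not shown twice.

```python
# Hypothetical alias table: label code typed by an editor -> canonical code.
# These mappings are assumptions for illustration, not Wiktionary's real data.
ALIASES = {"US": "GenAm", "GA": "GenAm"}

# Hypothetical display text for canonical codes; unknown codes display as typed.
DISPLAY = {"GenAm": "General American", "Canada": "Canada"}


def render_accent_labels(*codes):
    """Resolve aliases, drop duplicates while keeping order, and format."""
    canonical_codes = []
    for code in codes:
        canonical = ALIASES.get(code, code)
        if canonical not in canonical_codes:
            canonical_codes.append(canonical)
    return "(" + ", ".join(DISPLAY.get(c, c) for c in canonical_codes) + ")"


print(render_accent_labels("US"))
print(render_accent_labels("GenAm", "Canada"))
```

Under these assumptions, a bare "US" label and a "GenAm" label would display identically, while regional labels such as "Southern US" would pass through unchanged.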

I'm in favour of this. I don't think it's necessary to specify "Canadian" alongside "GenAm" when they're the same, provided "General American" is what readers see. Andrew Sheedy (talk) 06:00, 25 November 2022 (UTC)[reply]
I was not aware that the way that we intend to proceed regarding Received Pronunciation had been entirely decided yet (maybe I have missed something, but I don't recall seeing anything agreed to on that), but if regional British pronunciations are not going to be placed beneath Received Pronunciation pronunciations, then I don't think that regional American pronunciations ought to be placed beneath General American pronunciations either.
To be clear where I stand on this, rather than making the "US" label an alias for General American, I think that we ought to either continue having our primary US pronunciations be focused on General American pronunciations, and try to avoid using the {{a|US}} label unless perhaps it has some qualifier (and even then, as -sche pointed out, it isn't necessary: regional pronunciations can simply be listed by the region) or we ought to have "US" be some sort of empty header, as Nicodene suggested. Tharthan (talk) 16:32, 25 November 2022 (UTC)[reply]

Japanese lemma or non-lemma

Entries like そうきん or ぬいぐるみ are in Category:Japanese non-lemma forms. But given that English oeconomy and naïve are in Category:English lemmas, should those Japanese entries be moved to Category:Japanese lemmas? -- Huhu9001 (talk) 01:53, 23 November 2022 (UTC)[reply]

Both of the Japanese entries are soft redirects. I don't think redirection entries count as lemmata. ‑‑ Eiríkr Útlendi │Tala við mig 19:44, 23 November 2022 (UTC)[reply]
Hiragana entries that don't use {{ja-see}} aren't put in the non-lemma category, though. It's also inconsistent with the way alternative spellings are handled in other languages. Binarystep (talk) 22:42, 23 November 2022 (UTC)[reply]
Japanese writing has wrinkles that alphabetic systems don't have. 猫 (neko, “cat”) can be written as ねこ, but ねこ is more of a phonetic guide than an "alternative spelling", not quite the same as the way that nite is an alternative spelling for night.
This might also have to do with different assumptions about what "lemma" means. My understanding is that "the lemma is the 'main' form of the word at Wiktionary for purposes of locating the definitions and other details". How do you define "lemma", specifically in relation to Wiktionary data structure?
Pinging @Fish bowl, Atitarev, TAKASUGI Shinji as a couple other JA editors off the top of my head. ‑‑ Eiríkr Útlendi │Tala við mig 23:39, 23 November 2022 (UTC)[reply]
I really don't want to say so, but it is definitely not a good sign that some Wiktionarians are even trying to invent a new law that demotes kana from a legitimate writing system to a mere "phonetic guide". Do they think kana are equivalent to romaji or Chinese pinyin? Had they ever paid attention to what actual Japanese people write, they would immediately notice that "ねこ" is just as common as "猫" in casual texts, such as on social media. And all of a sudden they become "phonetic guides", presumably to show that Japanese people are all idiots who need to be taught again and again how to pronounce "cat"? -- Huhu9001 (talk) 00:45, 24 November 2022 (UTC)[reply]

: Japanese writing has wrinkles that alphabetic systems don't have. 猫 (neko, “cat”) can be written as ねこ, but ねこ is more of a phonetic guide than an "alternative spelling", not quite the same as the way that nite is an alternative spelling for night.

How isn't it an alternate spelling? Hiragana spellings are actually used in running text (though not always very often), which makes them actual words rather than a mere spelling guide.

This might also have to do with different assumptions about what "lemma" means. My understanding is that "the lemma is the 'main' form of the word at Wiktionary for purposes of locating the definitions and other details". How do you define "lemma", specifically in relation to Wiktionary data structure?

I define a lemma as the base form of a word, without any inflections. Binarystep (talk) 05:19, 24 November 2022 (UTC)[reply]
@Huhu9001, @Binarystep, @Eirikr, @Fytcha: In purely Wiktionary terms, "lemma" is the non-inflected word. So 猫、ねこ、ネコ are all equally lemmas, just "Alternative forms". neko is a "Romanization", and should be treated as such (see for instance pi1). Akkadian works exactly the same as Japanese, but fortunately scholars use the Romanization as main entry in dictionaries, so I can just give all the alternative spellings and Sumerograms in a table inside the entry page (see for example 𒈗 (šarrum)). The issue here is that Japanese editors inexplicably, and very unwisely, decided to go against what's done in every single existing Japanese dictionary, and instead of lemmatising the hiragana spelling, they went for a random mix of whatever they thought the "most common" spelling would be. The truth is that Japanese entries should be in hiragana (for Japanese and Sino-Japanese words) and katakana (for foreign words). All other spellings should have been given in the entry page. For instance: とりあつかい should be the main entry, and 取扱, 取り扱い and 取扱い should have been given in the entry page under something like "Common spellings".
All your troubles start there. The day bad decisions were made for Japanese on Wiktionary. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 09:22, 1 December 2022 (UTC)[reply]
@Sartma: I agree with what you say about lemmas but I don't see the advantage of having the main entry always at the kana lemma. Minimizing the number of clicks one has to perform to get to the information is also something we should strive for. Almost nobody is going to look up とりあつかい (1 page view in the last 30 days) in contrast to 取り扱い (14 page views in the last 30 days). Ideally, we would in some way mirror what jpdb is doing with alt forms but I don't think that's ever going to happen (primarily for technical reasons). — Fytcha T | L | C 16:04, 1 December 2022 (UTC)[reply]
@Fytcha: The advantage would be that you will always get to the lemma entry straight away, while pretty much solving the issue of "what is a lemma". It's well worth the price of one click. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 22:36, 1 December 2022 (UTC)[reply]
  • @Sartma: Much of the data structure you describe was inherited from before I became an active editor here. The only real changes I'm aware of from that initial base state are:
  • More effort at deduplication. Soft-redirect entries previously had more information provided in the interests of usability, but this was often manually copied and thus a maintenance challenge.
  • Changes to basic ideas about where to put 'main' entries. Initially, kanji renderings were preferred for almost everything. For native Japonic terms, that can get quite complex: one kanji may have multiple Japonic terms, as we see at , or conversely one Japonic term might have multiple kanji spellings, as we see at つく. After various discussions, the rough consensus emerged at that time to locate the 'main' entries for Japonic terms (i.e. kun'yomi) at the kana spellings, since these are more closely tied together, and to locate the 'main' entries for Sinic terms (i.e. on'yomi) at the kanji spellings, for similar reasons.
FWIW, the JA Wiktionary seems to put the 'main' entries for Japonic terms at the kana spellings, and for Sinic terms at the kanji spellings. Compare their Japonic-term entry at ja:きく and their Sinic-term entry at ja:高校. One good argument for not using kana spellings as the 'main' entries for Sinic terms is the large number of homophones, which become kana homographs. Consider ja:こうこう, or our not-quite-as-complete entry at こうこう. ‑‑ Eiríkr Útlendi │Tala við mig 23:16, 2 December 2022 (UTC)[reply]
@Sartma: Have you considered bringing this to the ballot? I have to admit, I have also grown increasingly frustrated with some aspects of the organization of Japanese on Wiktionary. Where we place the main entry oftentimes isn't even related to the usage frequency of the different spellings. — Fytcha T | L | C 13:11, 6 January 2023 (UTC)[reply]
@Fytcha: I don't have enough faith in Wiktionary Japanese editors to try, sorry... But you'll have my support if you do! — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 22:17, 6 January 2023 (UTC)[reply]

More analogies, Chinese 后天 and 儿童 are Category:Chinese lemmas. 兒童 is both Category:Korean lemmas and Category:Vietnamese lemmas. -- Huhu9001 (talk) 00:43, 24 November 2022 (UTC)[reply]

  • @Huhu9001: I invent nothing. Perhaps I expressed myself poorly? I'll try restating.
  • @Binarystep: In light of Lemma (morphology) at Wikipedia and our own entry at lemma, and your comment, we seem to have two senses at play here: 1) "the canonical form of an inflected word; i.e., the form usually found as the headword in a dictionary," and 2) "the base form of a word, without any inflections". While neither Japanese 猫 nor ねこ are inflected, the "canonical" or "headword" form we have chosen here at Wiktionary is 猫, so the full entry for the intersection of the pronunciation ねこ (neko) and the kanji spelling 猫 only exists on the 猫 page.
As discussed separately in other threads multiple times in years past, electronic Japanese dictionaries usually support lookup by both kana and kanji (and sometimes even romaji), returning all the records that match the input string. Wiktionary's technological underpinnings do not allow this (or at least, not as we have the site currently organized), so we have had to choose one form alone for the headword, and have the other forms redirect the reader to that headword entry.
Perhaps a different question I should pose is, what is the use case for having the category Category:Japanese_lemmas? For that use case, however it is defined, does it make sense to include multiple different renderings of the same word (kanji, kana, romaji), even if the full entry only exists at one of these? For our example, is there value in categorizing all three forms -- 猫, ねこ, and neko -- as "lemma entries", when the full entry only exists at 猫? ‑‑ Eiríkr Útlendi │Tala við mig 07:57, 24 November 2022 (UTC)[reply]
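
As a rough illustration of the electronic-dictionary lookup described above (any written form returns every matching record), here is a minimal sketch. It is not a description of how Wiktionary or any particular dictionary works; the `ENTRIES` records and the `build_index` and `lookup` functions are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical records: one full entry per word, listing every written form.
ENTRIES = [
    {"headword": "猫", "forms": ["猫", "ねこ", "ネコ", "neko"], "gloss": "cat"},
    {"headword": "高校", "forms": ["高校", "こうこう", "kōkō"], "gloss": "high school"},
    {"headword": "孝行", "forms": ["孝行", "こうこう", "kōkō"], "gloss": "filial piety"},
]


def build_index(entries):
    """Map every written form (kanji, kana, romaji) to the entries it spells."""
    index = defaultdict(list)
    for entry in entries:
        for form in entry["forms"]:
            index[form].append(entry)
    return index


def lookup(index, query):
    """Return the headwords of all entries that match the query string."""
    return [entry["headword"] for entry in index.get(query, [])]


INDEX = build_index(ENTRIES)
print(lookup(INDEX, "ねこ"))
print(lookup(INDEX, "こうこう"))
```

In this sketch a kana query such as こうこう returns several headwords at once, which is the homophone situation raised elsewhere in this thread; a one-page-per-string wiki has to approximate the same behaviour with soft redirects.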

* In light of Lemma (morphology) at Wikipedia and our own entry at lemma, and your comment, we seem to have two senses at play here: 1) "the canonical form of an inflected word; i.e., the form usually found as the headword in a dictionary," and 2) "the base form of a word, without any inflections". While neither Japanese 猫 nor ねこ are inflected, the "canonical" or "headword" form we have chosen here at Wiktionary is 猫, so the full entry for the intersection of the pronunciation ねこ (neko) and the kanji spelling 猫 only exists on the 猫 page.

My definition is more consistent with Wiktionary policy. Currently, to-night, colour, and gasolene are categorized as lemmas, despite their "canonical" spellings being tonight, color, and gasoline. Simplified Chinese entries like 两广 (Liǎngguǎng) still go in Category:Chinese lemmas, even though they only serve to redirect to the traditional forms (in this case 兩廣 (Liǎngguǎng)). The way we handle Japanese, on the other hand, is unusual and contradictory.

As discussed separately in other threads multiple times in years past, electronic Japanese dictionaries usually support lookup by both kana and kanji (and sometimes even romaji), returning all the records that match the input string. Wiktionary's technological underpinnings do not allow this (or at least, not as we have the site currently organized), so we have had to choose one form alone for the headword, and have the other forms redirect the reader to that headword entry.

Japanese is far from the only language where we use alternative spellings as soft redirects, but it *is* the only one (as far as I know) where we don't categorize them as lemmas.

Perhaps a different question I should pose is, what is the use case for having the category Category:Japanese_lemmas? For that use case, however it is defined, does it make sense to include multiple different renderings of the same word (kanji, kana, romaji), even if the full entry only exists at one of these? For our example, is there value in categorizing all three forms -- 猫, ねこ, and neko -- as "lemma entries", when the full entry only exists at 猫?

Romaji spellings aren't lemmas. They're normally not used in running text (I've only seen one exception, which used romanized Japanese to look like fake English text), and are completely subjective, since a number of popular romanization systems exist. For instance, we romanize しょうねん as shōnen, but I've been on sites that would use shounen or shônen instead, and you can even find syounen, shohnen, or shoonen in some places. Binarystep (talk) 20:26, 24 November 2022 (UTC)[reply]
  • @Binarystep: I see a key difference between how multiple alphabetical spellings for a single word contrast, and how kanji vs. kana spellings contrast.
By way of example, English night and nite both refer to the same basic concept (as the opposite of day). However, the latter spelling nite has specific associations due to the spelling, having to do with social register and context -- aspects that we can, and should, describe in our entry as lexically relevant information. Likewise, color and colour refer to the same concept, and the spelling difference indicates something we can talk about lexically (in this case, regional differences).
However, not every English word has multiple spellings.
In Japanese, every word that has a kanji spelling also has a kana spelling. The existence of a kana spelling for a given word that is usually spelled in kanji is completely expected and unsurprising, and unworthy of anything beyond simply noting this in the main entry. We have only created any such kana spelling entries at all due to the technical shortcomings of our platform.
The only places I can think of where using "alternative spelling" for a kana entry makes any sense are where the kana spelling deviates from the spoken pronunciation, due to historical sound shifts in "mainstream" Tokyo Japanese. This could include instances like つずく (tsuzuku), understandable as a non-standard and proscribed pronunciation-based kana spelling for つづく (tsuzuku). (Nota bene: some dialects of Japanese distinguish between ず (zu) and づ (zu) in speech, the former as /zu/ and the latter as /d͡zu/, and speakers of these lects would never confuse these two kana renderings. See also w:Yotsugana.)
So unlike English night and nite, or color and colour, there is nothing we can lexically say about the contrast between Japanese and ねこ (neko). We could talk about how kanji and kana are used in Japanese writing -- but that seems like a topic more appropriate for an encyclopedia than a dictionary, and indeed there is such content at w:Japanese_writing_system#Use_of_scripts. The use of the term script there is important: while night and nite are two different words that both use the same script, and ねこ are two different scripts that both spell the same word. This phenomenon in English only affects certain words, and involves lexically important differences between these distinct words. Meanwhile, the phenomenon in Japanese occurs across the entirety of the written language, and involves a difference in script used to spell the same words.
Looking at Chinese, the contrast between simplified and traditional is similar -- this is a phenomenon that occurs across the entirety of the written language, and involves a difference in script used to spell the same words. Electronic dictionaries for Chinese (that I'm familiar with) treat simplified and traditional as the same thing: if you look up traditional-script (shū) or simplified-script (shū), you get the same information. Depending on the dictionary, you might get a list of derived terms mirrored in traditional and simplified (example here). The difference between (shū) and (shū) is not a difference between two distinct words, but rather a difference between two distinct scripts.
My understanding of lemma with regard to Wiktionary entries is that the "lemma" is the "address" where we put the main entry for that word. The "lemma" for goes is go. We exclude goes from the list of English lemmata, and the expectation is that users should go to the lemma entry for the full information about that word. The entry at goes is essentially a soft redirect, pointing the reader to the lemma entry, and it is accordingly treated as a non-lemma.
Along those lines, I and the other Japanese editors have understood that the "lemma" for Japanese terms should be wherever the full entry goes. So the "lemma" for the Japanese word neko ("cat") is at 猫, and ねこ is a soft-redirect entry, and it is accordingly treated as a non-lemma.
(Side note: Frankly, I don't think we (the Japanese editors here) have paid much attention to the lemma categories, and many of our soft-redirect entries using older infrastructure like the basic {{ja-noun}} or {{ja-verb}} templates categorize the entries by default in Category:Japanese lemmas. Newer infrastructure like {{ja-see}} or {{ja-see-kango}} was apparently created with more awareness of categorization, and these templates categorize entries in Category:Japanese non-lemma forms instead.)
If "lemma" for purposes of categorizing Japanese terms should mean "the canonical form of the headword where the main entry is located", then any soft-redirect entry should be in Category:Japanese non-lemma forms.
If instead "lemma" should mean something more like "the canonical form of the headword written in any script that a native speaker would use", then all kana and kanji entries in Japanese (and arguably some romaji and Arabic numeral entries) should be in Category:Japanese lemmas. However, that also makes the category useless for purposes of identifying the "main" entries.
‑‑ Eiríkr Útlendi │Tala við mig 20:20, 28 November 2022 (UTC)[reply]
@Eirikr:

By way of example, English night and nite both refer to the same basic concept (as the opposite of day). However, the latter spelling nite has specific associations due to the spelling, having to do with social register and context -- aspects that we can, and should, describe in our entry as lexically relevant information. Likewise, color and colour refer to the same concept, and the spelling difference indicates something we can talk about lexically (in this case, regional differences).

Even in English, not all alternate spellings have specific connotations. Just look at all the various ways to spell pampelmoes, for instance.
As for the rest of your comment, everything you've mentioned isn't unique to Japanese. I'm curious what you think of our treatment of Serbo-Croatian, a language where every word can be written in Cyrillic or Latin letters. There's certainly nothing unique or "interesting" about the fact that хокејаш can also be written as hokejaš, yet both are in Category:Serbo-Croatian lemmas. Much like kana and kanji, these are nothing more than different scripts spelling the same word. Does that mean we should de-lemmatize half of our Serbo-Croatian entries?

My understanding of lemma with regard to Wiktionary entries is that the "lemma" is the "address" where we put the main entry for that word. The "lemma" for goes is go. We exclude goes from the list of English lemmata, and the expectation is that users should go to the lemma entry for the full information about that word. The entry at goes is essentially a soft redirect, pointing the reader to the lemma entry, and it is accordingly treated as a non-lemma.

goes is an inflected form of go, not an alternate spelling, so it's not really relevant to this discussion. Binarystep (talk) 22:02, 28 November 2022 (UTC)[reply]
  • @Binarystep: goes was intended purely as an example of a non-lemma entry. The kind of script difference that exists in Japanese does not exist in English, so there is no direct parallel.
Re: Serbo-Croatian, if both Cyrillic and Latin were used in mixed texts, and readers and writers would be expected to treat them identically, then yes, I would strongly be in favor of choosing one form for the lemmata, and having the other form reduced to soft-redirection stubs that would be treated as non-lemma entries. More ideally, if the Wiktionary platform could be made to support this approach, the user could enter either form and land on the same page, showing both forms and providing a unified entry.
However, given that (so far as I understand it) Serbo-Croatian texts are written either in Cyrillic or in Latin, these two are not really interchangeable. Moreover, it looks like we have full entries at both the Cyrillic and Latin spellings, neither is a soft-redirect to the other, and since both are full entries, both are sensibly treated as lemmata. The treatment of Serbo-Croatian is not really comparable to how Japanese works. ‑‑ Eiríkr Útlendi │Tala við mig 22:41, 28 November 2022 (UTC)[reply]
Can't say it is not funny to see a Japanese exceptionalism movement emerging from English Wiktionary. -- Huhu9001 (talk) 01:17, 30 November 2022 (UTC)[reply]
  • Can't say as I'm advocating for Japanese exceptionalism either. My position hinges entirely on how we define "lemma" for purposes of Wiktionary entry categorization. This could apply just as well to Chinese: those entries that are soft redirects, such as Chinese (shū) (soft redirect to (shū)), should presumably be handled similarly. ‑‑ Eiríkr Útlendi │Tala við mig 04:22, 30 November 2022 (UTC)[reply]
@Eirikr: I don't like the language used against you in this thread and I distance myself from that, but I agree with the substance brought forth by the other commenters: I understand lemma and non-lemma-ness to be solely defined on morphological grounds (i.e. a term is a lemma iff it abides by the morphological constraints imposed on lemmas). Lemma-ness doesn't have anything to do with main-entry-ness: 可愛い, かわいい and カワイイ are all equally much lemmas while 可愛くて, かわいくて and カワイクテ are all equally much non-lemmas, even though only one of the lemmas should have a main entry (i.e. more than a soft redirect). I also agree that this is exactly how it is handled for all other languages on Wiktionary and that Japanese is the sore thumb. — Fytcha T | L | C 02:26, 30 November 2022 (UTC)[reply]
  • @Fytcha: As I understand your reply, your position is that "lemma" should mean "'main' or uninflected form of a word, regardless of script or whether that particular entry is our 'main' entry for that term".
With that in mind, why would you lemmatize at カワイイ (kawaii), not a common form for this word, and not at kawaii, also not a common form for this word? What distinction do you draw?
Definitions for "lemma" that I've seen consistently describe how a term is indexed in a dictionary. This sounds to me like the "address" at which we expect to find the main entry. Per w:Lemma, "lemma refers to the particular form that is chosen by convention to represent the lexeme." For the Japanese word meaning "cat", for instance, the Wiktionary editors working on Japanese have chosen the form 猫 to represent this lexeme, and we have treated this as the 'main' entry. The Japanese editors have, so far as I'm aware, endeavored to avoid data duplication and instead consolidate the 'main' entries for each word at one specific script rendering, using templates like {{ja-see}} at the other renderings to direct users to the 'main' entries.
If instead "lemma" just means "uninflected form of a word, regardless of script", what use are the "[LANGUAGE] lemmas" categories for our users? This artificially inflates the lemma counts for CJKV languages -- all Korean hanja renderings have corresponding hangul renderings; all Chinese simplified entries have corresponding traditional entries (albeit sometimes identical); all Vietnamese Hán tự entries have corresponding Vietnamese alphabet entries; and all Japanese kanji entries have corresponding hiragana, katakana, and possibly even Latin alphabet and Arabic numeral entries. For these multi-script written languages, what word or category should we use instead of "lemma" for users to find the 'main' entries? Serious questions.
  • One partial parallel with English has occurred to me. Some time back, we had discussions and votes on the handling of Middle and Old English words using the letter Ƿ or wynn, such as at Wiktionary:Votes/2020-09/Removing Old English entries with wynns and Wiktionary:Votes/2020-12/Bringing back wynn entries. The outcome of that was that all entries using ⟨Ƿ⟩ spellings were deleted and redirected (via some software configuration rather than using #REDIRECT) to the equivalent spellings using ⟨W⟩ instead, such as at ƿynn. Meanwhile, words using the letter Þ or thorn still exist, such as at þat or oþer, and apparently these are still categorized as lemma entries, in contrast to the complete removal of ⟨Ƿ⟩ entries.
For Middle and Old English, what utility is there in classifying an entry like Middle English oþer as a "lemma" when the bulk of usable entry information is located at Middle English other instead? And if we treat the oþer spelling as a "lemma", why do we not do the same for spellings like ƿynn?
I ask these as serious and honest questions. I do not understand how we (the wider Wiktionary editing community) intend for "lemma" categories to be used. ‑‑ Eiríkr Útlendi │Tala við mig 18:46, 30 November 2022 (UTC)[reply]
@Eirikr:

your position is that "lemma" should mean "'main' or uninflected form of a word, regardless of script or whether that particular entry is our 'main' entry for that term".

Yes, this is a fair summary of my position.
With that in mind, why would you lemmatize at カワイイ (kawaii), not a common form for this word, and not at kawaii, also not a common form for this word?

If we accept what I've laid out thus far, there's not really a question as to why カワイイ (kawaii) should be treated as a lemma. To answer the part about romaji, I draw a distinction because romaji is not a native script of Japanese (much like Pinyin is not a native script of Mandarin). (Even if you could attest kawaii three times in native Japanese books, I would still be opposed to the inclusion of kawaii as anything more than a romanization soft-redirect, just like I'd be opposed to the inclusion of spurious but perhaps attested joke spellings of English words in non-native scripts ([5], [6]).) It all hinges on what is considered to be a predominant script of a language. Kanji and kana are; romaji isn't. The only reason why we should include romaji entries at all is because they're extremely useful for people who don't have a Japanese keyboard layout installed and because the MediaWiki software doesn't allow for a better solution (contrast this with jisho.org where you can enter romaji in the search bar even though the dictionary itself doesn't contain romaji entries).
Per w:Lemma, "lemma refers to the particular form that is chosen by convention to represent the lexeme."

It seems that this is taken from w:Lemma (morphology) and reading that page makes clear that they actually agree with my position, that they also understand lemma to be defined on morphological grounds (they even explicitly lay out some of these morphological constraints that I was talking about). The word form in your quote was clearly meant to refer to morphological forms, not orthographic forms.
The Japanese editors have, so far as I'm aware, endeavored to avoid data duplication and instead consolidate the 'main' entries for each word at one specific script rendering, using templates like {{ja-see}} at the other renderings to direct users to the 'main' entries.

I think you are still confounding lemma-ness with main-entry-ness. We all endeavor to avoid redundancy (ok, let's not talk about Serbo-Croatian) and we use soft-redirect templates in every language, but that has nothing to do with lemma-ness. coöperation is a soft-redirect because we don't want to duplicate the entirety of the content found in cooperation but we still correctly consider it a lemma and categorize it as such. Even if we start to correctly categorize ねこ as a lemma, it would still only ever be a soft-redirect, as is coöperation. Nothing changes in terms of duplication.
If instead "lemma" just means "uninflected form of a word, regardless of script", what use are the "[LANGUAGE] lemmas" categories for our users?

I think this is not really relevant to the issue at hand. If we come to agree on what lemma means and if we further agree that we should employ one consistent definition of lemma all across Wiktionary (which we currently do, bar Japanese), the utility argument does not really change anything. I would not be opposed if we additionally wanted to create a category for a subset of lemmas, namely precisely those lemmas that contain at least one non-redirecting definition. However, this is probably not really implementable using only templates and modules and would thus require some massive botting just for upkeep, which comes with its own set of problems. I cannot give you a more direct answer to your question of what the use is because I use Wiktionary in some specific ways that lead me to not have any use at all for either the present lemma categories or the lemma subset category that you envision.
I want to present two additional points to consider. Not only is Wiktionary's Japanese inconsistent in its use of the lemma categories compared to the rest of Wiktionary, it is also internally inconsistent:
  1. {{ja-see}} adds the non-lemma category to an entry while {{ja-def}} doesn't. See for instance this entry: すすめる. Both templates are used in the exact same way, to provide a redirection from one spelling to another.
  2. {{ja-see}} actually places the entries it is used in in the corresponding part of speech category: ねこ is categorized as Category:Japanese nouns. The issue with that is that these part of speech categories are subcategories of Category:Japanese lemmas. Transitively speaking, ねこ is both a lemma and a non-lemma.
Everything I've brought up so far is solved by simply making {{ja-see}} categorize into Category:Japanese lemmas instead of Category:Japanese non-lemma forms. — Fytcha T | L | C 15:54, 1 December 2022 (UTC)[reply]
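The transitive inconsistency described in point 2 can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the two dictionaries are hand-written stand-ins for the category relationships mentioned above, not data pulled from the wiki, and the function names are hypothetical.

```python
# Assumed toy data mirroring the situation described above for ねこ.
PARENT_CATEGORY = {
    # Part-of-speech categories sit under the lemma category.
    "Japanese nouns": "Japanese lemmas",
}

ENTRY_CATEGORIES = {
    # What {{ja-see}} is described as adding to the kana entry.
    "ねこ": {"Japanese nouns", "Japanese non-lemma forms"},
}


def all_categories(entry):
    """Collect an entry's direct categories plus every ancestor category."""
    seen = set()
    stack = list(ENTRY_CATEGORIES[entry])
    while stack:
        category = stack.pop()
        if category in seen:
            continue
        seen.add(category)
        parent = PARENT_CATEGORY.get(category)
        if parent is not None:
            stack.append(parent)
    return seen


def is_contradictory(entry):
    """True if the entry ends up as both a lemma and a non-lemma."""
    categories = all_categories(entry)
    return (
        "Japanese lemmas" in categories
        and "Japanese non-lemma forms" in categories
    )


if __name__ == "__main__":
    print(is_contradictory("ねこ"))
```

In this toy model, replacing "Japanese non-lemma forms" with "Japanese lemmas" in the entry's direct categories makes the check return False, which corresponds to the proposed change to {{ja-see}}.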
@Fytcha: The case of t:ja-def is a little complicated: t:ja-def was indeed initially (2007) created as a redirect template. 11 years later in 2018, the creator of t:ja-see repurposed it as a main-entry template serving to tell t:ja-see which lines of the definitions are to be fetched and displayed on the redirect page. But the change was never fully implemented, perhaps due to lack of consensus. That's the main reason why 2 drastically different styles of redirect pages coexist on English Wiktionary. -- Huhu9001 (talk) 04:20, 2 December 2022 (UTC)[reply]
  • @Fytcha, thank you for bearing with me and hashing this all out.  :)
  • Re: katakana forms, I am not a fan of creating these by default unless they are lexically important. If someone wants to bot-create these as soft-redirects, I suppose I would not be opposed. That said, the katakana ↔ hiragana difference is very roughly analogous to the upper case ↔ lower case difference in bicameral alphabets – there is a one-to-one equivalency for each kana / letter. We don't have upper-case entries for words like dog, even though that is a valid lemma form for the term (per the definition arrived at in this thread; while DOG exists, this has no sense line referring to a canine animal), and for similar reasons I don't think we need to have katakana entries for words like かわいい (kawaii, cute).
Happy to leave romanized entries out of any "lemma" categories. Still unsure of proper categorization for proposed entries like katakana カワイイ (kawaii), when we don't have analogous English entries like CUTE. If your position is that CUTE should exist and should be categorized as a lemma, then fine, I'll grant that カワイイ (kawaii) should be treated similarly. But if conversely your position is that CUTE should not exist and/or should not be categorized as a lemma, then I am unsure why we would treat カワイイ (kawaii) differently.
  • Re: the Wikipedia treatment of "Lemma" at w:Lemma (morphology), you mentioned that "The word form in your quote was clearly meant to refer to morphological forms, not orthographic forms." That isn't as clear to me, considering the subsection further down at w:Lemma (morphology)#Headword. My working understanding of the term "lemma" is closer to what is described in this section, where the focus is on "where the 'main' entry is located" in a dictionary.
By way of background, Japanese print dictionaries are indexed by pronunciation, and the headwords that are printed in that order are shown with the kanji spellings. There is no functional difference between looking up ねこ (neko) or 猫 (neko, cat).
This particular word neko only has the one entry; a more illustrative example might be こうこう (kōkō) vs. 高校 (kōkō, high school) -- there are several homophones pronounced the same as こうこう (kōkō), so a reader would look up the right page of the dictionary for this reading, and skim through the headwords to find the 高校 (kōkō) headword and associated entry. There is a small functional difference between looking up こうこう (kōkō) and 高校 (kōkō, high school), but again the headwords are listed using the kanji spellings.
The key point is that Japanese dictionary entries exist at the intersection between the pronunciations (as represented by the kana spellings) and the kanji spellings.
The disjuncture between readings and headwords here at Wiktionary is caused by the technological shortcomings of our platform -- and hence this (at least my) confusion about what we treat as "lemma" here.
→ I am happy to change my working definition to match this thread, as in something along the lines of "any uninflected form of a word, regardless of script, so long as it is written in a script used by native writers".
  • I'll skip the deduping comment, as that is addressed by the "what is a lemma" point above. :)
  • Re: lemma categories, I don't think I've used those for much either. I am happy to concede on this.
  • Re: {{ja-see}} vs. {{ja-def}}, in terms of template use, while both redirect the user to another entry, they are not used the same at all -- {{ja-def}} just creates an inline link to the Japanese entry for the provided string, while {{ja-see}} displays a reformatted summarization of the targeted entry with additional explanatory text. {{ja-see}} is much more heavy-lifting, using Lua to look up the targeted entry and extract only certain relevant information. {{ja-def}} was much simpler, using the older template architecture to just add a link to the Japanese entry for each of up to eight argument strings.
Incidentally, @Fytcha, it's not clear to me why you Lua-ized {{ja-def}}. Isn't it more problematic to add Lua to this, increasing the Lua memory load? I'm not sure that Lua is actually needed for anything this template does?
  • I'd be happy to have {{ja-see}} reworked to categorize differently. Note that something similar would likely be needed for related templates {{ja-see-kango}} and {{ja-gv}}.
Cheers! ‑‑ Eiríkr Útlendi │Tala við mig 03:52, 6 December 2022 (UTC)[reply]
@Eirikr: Apologies for the very belated reply.
  • Re: katakana forms, I would argue that all of カワイイ, CUTE and gebäude are indeed lemmas (i.e. that they satisfy the morphological constraints imposed on them) but I, of course, agree with you that the latter two are not worthy of inclusion despite being attested and despite having a slightly different nuance than cute and Gebäude (i.e. being emphasized and being sloppily typed on a computer, respectively). We regularly exclude lemmas for all sorts of reasons. However, I concede that there is a priori no difference to カワイイ and that it thus shouldn't be included. The only reason why I would still be okay with a redirect at カワイイ is because the MediaWiki software redirects the search queries CUTE and gebäude automatically but not カワイイ.
  • Re: w:Lemma (morphology), I think we will have to agree to disagree about our respective readings of the Wikipedia article but given that you said you're willing to change your working definition to the one presented in this thread, this is already all I could ask for anyway :)
  • Re: {{ja-def}}, I have Lua-ized it because there has been a case where there have been more than 8 arguments (can't find it right now). Just adding more slots seemed like a band-aid solution to me so I did it "the proper way". I have almost exclusively encountered this template on kana pages so I thought the increase in memory usage wouldn't make a difference. If it does, I can change it to a hybrid template that switches to module only after a certain number of arguments have been passed.
Fytcha T | L | C 12:47, 6 January 2023 (UTC)[reply]

@Huhu9001, Eirikr, Binarystep, Sartma: I have now implemented this change: diff. — Fytcha T | L | C 13:00, 6 January 2023 (UTC)[reply]

Looks good! Binarystep (talk) 07:00, 10 January 2023 (UTC)[reply]

English pronunciation appendix fails to account for General American three-way back-vowel merger

Looking through the vowel table in Appendix:English pronunciation, the GA column lists the father/palm and not/boss vowels as /ɑ/, and the law/caught vowel as /ɔ/. However, this doesn't match with my experience of GenAm speakers, who, so far as I can tell, mostly use /ɔ/ for all three (except for a few anomalous instances of persistent /ɑ/, mostly in foreign loans like bra or Nazi).

Should we update our English vowel chart to reflect the GA merger of all three non-/ʌ/ open and open-mid back vowels to /ɔ/? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:00, 24 November 2022 (UTC)[reply]

  • FWIW, I pronounce the initial vowels in father and not similarly, but distinct from either palm or boss. For me, the vowels in law and caught are close, but also still distinct. ‑‑ Eiríkr Útlendi │Tala við mig 08:01, 24 November 2022 (UTC)[reply]
    Whereas (as you might've guessed from the question) I pronounce all of those with the same vowel. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 08:17, 24 November 2022 (UTC)[reply]
    Could you clarify how many total distinct vowels you have? Are you saying that father=not, palm=boss, and law=caught, but no two of these three sets are equal to each other? (Usually, "palm" and "boss" are grouped either with father=not (as /ɑ/) or law=caught (as /ɔ/), although not necessarily both with the same one.) --Urszag (talk) 08:37, 24 November 2022 (UTC)[reply]
    @Urszag I have eleven distinct vowels in total (if we lump R-colored vowels in with their non-R-colored versions): four front or near-front (/i/, /ɪ/, /ɛ/, and /æ/), one central (/ə/), and six back or near-back (/u/, /ʊ/, /o/, /ʌ/, /ɔ/, and /ɑ/). Of these, /o/ and /ɑ/ only show up in limited circumstances (in my 'lect, /o/ can only occur immediately prior to the approximants /l/ and /ɹ/ or as the onset of the diphthongs /ou/ and /oi/, while almost all cases of /ɑ/ are either immediately prior to /ɹ/ [a context which, in GA, as noted earlier this month, is allergic to /ɔ/ unless the two are separated by a syllable break] or in relatively-recent loanwords). And for me, as regards their (main) vowels, father=not=palm=boss=law=caught=sorry=/ɔ/, while Nazi=bra=star=/ɑ/. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 11:07, 24 November 2022 (UTC)[reply]
    @Eirikr Would you be able to give an elaboration similar to the above?--Urszag (talk) 03:14, 28 November 2022 (UTC)[reply]
  • @Urszag: I am not as well-versed in vowel IPA as I'd like, so I won't dive into that. While close, father ≠ not. The "A" in father is more open, I think. Also, palm ≠ boss, where the "L" renders the vowels quite distinct -- palm = balsam, for instance. Meanwhile, my vowel for law is pretty "flat" or "straight", while for caught there's maybe a tiny bit of a diphthong.
To crib from Whoop whoop's post, I'd group the words as follows based on the vowel values for how I speak:
  • father = Nazi = bra = star = sorry
  • not
  • palm
  • boss = caught
  • law
‑‑ Eiríkr Útlendi │Tala við mig 19:13, 28 November 2022 (UTC)[reply]
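For anyone comparing the two self-reported inventories so far, here is a rough Python sketch that lists the word pairs on which the groupings disagree. The two lists simply transcribe the groupings given in this thread (Whoop whoop's sets and the five groups just above); the variable and function names are hypothetical, and nothing here says anything about phonemic versus allophonic status.

```python
from itertools import combinations

# Groupings as reported in this thread; each set shares one vowel for that speaker.
WHOOP_WHOOP = [
    {"father", "not", "palm", "boss", "law", "caught", "sorry"},
    {"Nazi", "bra", "star"},
]
EIRIKR = [
    {"father", "Nazi", "bra", "star", "sorry"},
    {"not"},
    {"palm"},
    {"boss", "caught"},
    {"law"},
]


def same_vowel(groups, first, second):
    """True if both words fall in the same group for this speaker."""
    return any(first in group and second in group for group in groups)


def disagreements(groups_a, groups_b):
    """Word pairs merged by one speaker but kept apart by the other."""
    words = sorted(set().union(*groups_a) & set().union(*groups_b))
    return [
        (first, second)
        for first, second in combinations(words, 2)
        if same_vowel(groups_a, first, second)
        != same_vowel(groups_b, first, second)
    ]


if __name__ == "__main__":
    for pair in disagreements(WHOOP_WHOOP, EIRIKR):
        print(pair)
```

Pairs such as boss and caught, which both speakers merge, do not appear in the output; pairs such as father and not, which only one speaker merges, do.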
Thanks for the reply! That is definitely interesting. When you say that palm = balm and is different from boss because of the "L", do you mean that you pronounce palm, balm with a sequence of a vowel + the consonant /l/, as in ball, bald, Paul? There are more sets here than generally exist in North American phoneme inventories, so I suspect some of the distinctions you mention might be allophonic and conditioned by the surrounding sounds rather than phonemic. Another follow up question I am wondering about is whether differences you hear between father and not and between boss/caught and law might be based on the voicing of the following consonant: comparing nod, hop, block and broad, hawk, pause might shed light on that.--Urszag (talk) 19:56, 28 November 2022 (UTC)[reply]
  • @Urszag: For me, bomb and balm are contrastive: the "L" is definitely present, and causes me to start the vowel sound with my lips more rounded and with my tongue pulled further forward in the mouth. The difference is essentially the same as between ah and all. FWIW, all and awl are identical for me. For not vs. boss and caught, a better minimal pair here might be not and naught (where the latter vowel sound for me is the same as in boss and caught, and also bog for that matter). ‑‑ Eiríkr Útlendi │Tala við mig 20:42, 28 November 2022 (UTC)[reply]
  • I was quite surprised to read further up on the page that you have /ɔ/ in father vs. /ɑ/ in bra. Descriptions or dictionary transcriptions of General American almost always use /ɑ/ to represent the vowel in father, bra, bother, which have the same vowel phoneme for most General American English speakers. See the section "Unrounded LOT" in Phonological history of English open back vowels. ("Palm" is complicated for the separate reason that many American English speakers either restore the /l/, or have a pronunciation that is in some way affected by the presence of the letter L.) Retaining a rounded vowel in LOT seems to be a particular feature of the New England region (which would I think fit with your background); I guess that the word father, which in RP has an unrounded vowel that was irregularly lengthened to [ɑː], developed a rounded vowel in your accent by analogy with the much larger LOT set. I am from California and in my accent, there is no phonemic distinction at all between /ɒ/, /ɔ/ and /ɑ/; for me father=not=palm=boss=law=caught=Nazi=bra=star=sorry. It's possible that my merged vowel is phonetically [ɔ] before /l/, but I would certainly not use [ɔ] as the general transcription of the merged vowel. Its quality for me is much better represented by [ɑ], so I transcribe my phoneme in these words as /ɑ/. My impression has been that aside from other speakers with a three-way merger, who seem to be common and who I hear as having [ɑ] like myself, the other largest group in North America is speakers with a two-way distinction between [ɑ] in LOT and [ɒ] (conventionally transcribed as /ɔ/, although I believe it is often not really that raised, or even necessarily all that rounded) in CLOTH and THOUGHT. Although speakers with a two-way distinction are supposedly not uncommon, my personal experience has been that the only people I've encountered who say they make a distinction are older speakers like my grandparents or people I have communicated with online. 
"General American" is I think generally not defined either as having or as lacking the cot-caught merger: maintaining the distinction is a more conservative feature, but from what I can tell many American English speakers are poor at perceiving whether another American speaker has a low back merger or not, and there isn't much stigma attached to any of the various possible configurations of phonemes in this area (as opposed to some distinctive regional allophones, such as certain New York-associated pronunciations of the /ɔ/ phoneme). --Urszag (talk) 08:37, 24 November 2022 (UTC)[reply]
    It looks like you also have a three-way /ɒ/-/ɔ/-/ɑ/ merger (and one that's more complete than my own, in fact), just with a different endpoint... which actually strengthens my point that Appendix:English pronunciation should in some way take note of the tendency of more-evolved-GA speakers towards a full three-way lowish-back-vowel-other-than-/ʌ/ merger. As regards my background, one, most descriptions of New England English that I've seen seem to focus mainly on the coastal non-rhotic vowel-merge-resistant 'lects, without much notice being taken of the apparently-less-conservative inland-New-England 'lects (the linked WP article is apparently unaware of the possibility of a full or near-full three-way low-back merger occurring with a rounded LOT vowel), and, two, before someone asks, I'm 99% certain that how I pronounce things isn't just from my family passing me an idiosyncratic way of pronouncing English, given that, so far as I can recall, my classmates in middle and high school also tended to pronounce things the way my family does. (Interestingly, judging by the table later on in the abovelinked WP article, my accent is apparently actually closer to a rounded-LOT variation of Canadian English than it is to General American, at least as regards low back vowels, which is not a result I would've expected.) Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 11:30, 24 November 2022 (UTC)[reply]
The result of the merger is /ɑ/, surely. Nicodene (talk) 23:18, 24 November 2022 (UTC)[reply]
I think I don't have the contrast that you're talking about between palm-not-law and bra, and am not sure I've noticed it in others or what phonetic difference the notation /ɔ/ and /ɑ/ indicates exactly, though you mentioned the former vowel is rounded. Is it similar to the cardinal values (official strict phonetic pronunciation, like Polish o and Received Pronunciation father) or something else? (Forgive my skepticism that it's the cardinal value, because people tend to use phonetic symbols in a way highly conditioned by the way English is transcribed so I'm never sure.) Are there any sound files or videos that you can link to (with timestamps) that demonstrate the contrast? — Eru·tuon 22:45, 25 November 2022 (UTC)[reply]
Just noting for clarity, I moved "boss" (and "moth") out of the "not, wasp" line: independent of if we want to start mentioning the additional/alternate vowels words have after cot-caught-merging, "not, wasp" and "boss, moth" aren't in the same set (moth, at least, is in cloth; this can merge back to lot for speakers who merge caught to cot, but that merger changes multiple lines, so it's more clearly handled by noting as much on each line rather than making a frankenline, IMO). - -sche (discuss) 00:16, 26 November 2022 (UTC)[reply]
The mergers in question are the father-bother and cot-caught mergers, yes? But as others have said, the outcome of these is /ɑ/ not /ɔ/, except in dialects like Boston (and, outside GenAm, some varieties of Canadian) that have an /ɒ/ I could see someone interpreting as /ɔ/. I agree the appendix should mention the mergers, though enough varieties of GenAm don't merge cot-caught that IMO we're best off continuing to show the distinct vowels first, and mentioning the merger in a labelled way — if not in footnotes, then maybe (for e.g. law) "ɔ (with cot-caught merger: ɑ)", like how entries put the distinct sounds first and the merged {{a|cot-caught}} sound next? - -sche (discuss) 07:30, 26 November 2022 (UTC)[reply]
@-sche: I think that that makes sense. Tharthan (talk) 22:03, 26 November 2022 (UTC)[reply]
@-sche: The problem is, not all cot/caught-and-father/bother-merging dialects merge to ɑ; for those Northeastern and Canadian dialects that you mention (and that I evidently grew up in), the cot/caught and father/bother mergers keep things like law at /ɔ/ (which some of those dialects might well realize as [ɒ]) and, instead, merge most instances (the exceptions being where this is blocked by /ɹ/ plus in some more-recent loanwords) of /ɑ/ to /ɔ/ ([ɒ]?), which makes the proposed label (which pretty much flat-out states that all back-vowel-merging dialects merge /ɔ/ to /ɑ/ rather than the other way around) wrong. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 19:27, 27 November 2022 (UTC)[reply]
The Northeastern dialect that -sche mentioned was the Boston dialect, which traditionally does not have the father-bother merger.
I can believe you when you assert that certain accents in more central or western parts of Massachusetts that have both the cot-caught merger and the father-bother merger can end up with all three groups having a single, noticeably rounded vowel. But that definitely is not the case with the Boston dialect. The Boston dialect pronounces father as [ˈfaðə] and bother as [ˈbɒðə]. In the Boston dialect, a word like law also has [ɒ] because the Boston dialect does have the cot-caught merger. Tharthan (talk) 22:49, 27 November 2022 (UTC)[reply]
@Tharthan: Ah, that makes sense regarding the Boston accent's vowel-merger status. Regardless, there are still those inland-Northeast accents that merge all three vowels to /ɔ/ (or perhaps [ɒ]), which makes the "ɔ (with cot-caught merger: ɑ)" notation misleading (because of said inland-Northeast accents where the merger goes the other way). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 23:44, 27 November 2022 (UTC)[reply]
(e/c) As far as I understand, the conventional answer to that in linguistics literature, and what seems to be the current approach here, is that these dialects are distinct things, treated on separate lines like (Inland Northern American) or (Boston) (etc, as applicable), distinct from {{a|GenAm}}. Of course, entries like thought vs cot vs caught are inconsistent about whether they are nested under GenAm yet (or labelled at all beyond the uselessly vague "US"), etc, but maybe if we can find / agree on which label(s) to use here, we can set about standardizing and fleshing out such entries. AFAICT, these dialects are generally analysed not as shifting /ɑ/ to /ɔ/ (or even necessarily having GenAm /ɔ/ at all), but as lowering caught to /ɒ/ (so that, incidentally, it sounds to speakers outside the dialects roughly like they have simply merged it into [how speakers outside the dialects would say] cot), and either also merging cot to that, or — since in some of these, like Inland North, not all speakers merge cot-caught — keeping cot distinct by fronting it to [kat]. Or, as in Boston, speakers may merge both cot and caught but still keep words like cart distinct by fronting it to [kat].
My guess is that this lowering to /ɒ/ might be what you have, too. (I've revised this comment a few times, trying to be neither oververbose nor so short it's curt or making statements without explaining them.) Brains already need to ignore a lot of variation in the sound of a vowel from speaker to speaker that just comes from voices differing, so we shockingly often also don't perceive when people make contrasts or mergers we don't, as long as we can resolve their speech into expected word-slots, i.e. unless their accents strongly differ or a particular collision creates confusion that context can't readily resolve. Hence, for example, this discussion led me to ask Americans I know whether they merge lot, thought etc, and I was surprised that some of them did and I hadn't registered it (even though I knew the merger existed). Likewise, it's probably how you were able to think most Americans merged these sounds. (In the past, it's what led Gilgamesh to argue most Americans merged bull and bowl like him, and it's what led Mahagaja to make the still-relevant advice that this is why it's important we look to linguistic literature that looks at formants, etc, rather than our own assessments, heh.) I suspect that if you have this regional /ɒ/, and know that the two nearest GenAm phonemes are /ɑ/ and /ɔ/, and you have a few words with canonical /ɑ/, that may be why you're interpreting your merged lot-thought vowel as /ɔ/ (even if you're comparing it to other GenAm varieties' /ɔ/, GenAm /ɔ/ is already closer to /ɒ/ than RP /ɔ/ is). - -sche (discuss) 00:31, 28 November 2022 (UTC)[reply]
(Palm moving to /ɔ/ for some people is probably influenced by /l/ having been present at some stage in their lect's history as Urszag says; balm and psalm can have the same /ɔ/, and salt, etc. Why father would have /ɔ/ or /ɒ/, I don't know; I'd love to see data on where that pronunciation occurs. Maybe absence of non-r-coloured/r-adjacent /ɑ/ in any(?) other words reduced the extent to which /ɑ/ was distinguished as a phoneme, until its reintroduction in learnèd or loan terms, which is where you say you have /ɑ/...?) - -sche (discuss) 02:13, 28 November 2022 (UTC)[reply]
The word "father" is recorded as having /ɔ/ in some accents, but I didn't know it occurred in North America. The varying vowel by dialect of "father" (/ɑ/, although common, is after all an irregular development) is in fact part of what led John Wells to choose PALM instead of FATHER as the keyword for this set; even though, in the context of North American English, I would say palm is certainly unsatisfactory. Wells mentions "if we are discussing Hiberno-English, [...] father often has not the expected aː of Armagh, Karachi, Java etc but the ɔː of THOUGHT" in his blog post "lexical sets" (John Wells’s phonetic blog, Monday, 1 February 2010).--Urszag (talk) 03:07, 28 November 2022 (UTC)[reply]
I am late to this discussion but I have a cot-caught distinction which is /ɑ/ vs. /ɒ/, and for me, palm has the /ɒ/ vowel without any /l/; similarly bomb /bɑm/ vs. balm /bɒm/. I grew up in Tucson but spent the first four years of my life in New Haven CT, which might explain this. Benwing2 (talk) 06:40, 3 December 2022 (UTC)[reply]

Gloss/category for non-religious Jewish terms

Arising from a chat with User:Jodi1729: we often gloss religious terms by the religion (e.g. "Christianity", "Hinduism", "Judaism"): however, in the case of Jewish people, there are a lot of very distinctive culturally Jewish terms that have got no religious connection, like shiksa, bubele, alter kaker. Jodi and I thought that it might be appropriate to have some sort of "Jewish culture" gloss for these things. Opinions? Equinox 05:16, 25 November 2022 (UTC)[reply]

Agreed. 98.170.164.88 05:34, 25 November 2022 (UTC)[reply]
Makes sense to me. Binarystep (talk) 06:19, 25 November 2022 (UTC)[reply]
Disagree. It's better to be specific, since not all Jewish cultures speak Yiddish. Some better labels are Ashkenazi or Yiddishism. Ioaxxere (talk) 19:10, 25 November 2022 (UTC)[reply]
You're still in favour of a label, though? That's all I was saying. We can't put "Judaism" on alter kaker because it isn't a religious term. But it's also a phrase that is definitely not used outside of Jewish/Yiddish-related communities. Equinox 19:20, 25 November 2022 (UTC)[reply]
Yep, I agree with all that Ioaxxere (talk) 19:28, 25 November 2022 (UTC)[reply]
Seems sensible. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 19:28, 27 November 2022 (UTC)[reply]

Join the Movement Charter Regional Conversation Hours

You can find this message translated into additional languages on Meta-wiki.

Hi all,

As most of you are aware, the Movement Charter Drafting Committee (MCDC) is currently collecting community feedback about three draft sections of the Movement Charter: Preamble, Values & Principles, and Roles & Responsibilities (intentions statement).

How can you participate and share your feedback?

The MCDC is looking forward to receiving all types of feedback in different languages from the community members across the Movement and Affiliates. You can participate in the following ways:

  • Attend the community conversation hours with MCDC members. Details about the regional community conversation hours are published here
  • Fill out a survey (optional and anonymous)
  • Share your thoughts and feedback on the Meta talk page
  • Share your thoughts and feedback on the MS Forum
  • Send an email to: movementcharter@wikimedia.org if you have other feedback to the MCDC.

Please check the appointments of the Community consultation hours here and register for the meeting that suits your availability. The conversations will not be recorded, except for the section where participants are invited to share what they discussed in the breakout rooms. We will take notes and produce a summary report afterward.

If you want to learn more about the Movement Charter, its goals, why it matters and how it impacts your community, please watch the recording of the “Ask Me Anything about Movement Charter” sessions which took place earlier in November 2022.

Thank you for your participation.

On behalf of the Movement Charter Drafting Committee,

Mervat (WMF) (talk) 19:41, 27 November 2022 (UTC)[reply]

Updating {{nonlemma}}'s documentation

I've edited the documentation of {{nonlemma}} as well as Wiktionary:Etymology#Inflected forms to reflect the way this template is typically used and non-lemmas are treated. I wanted to discourage the wholesale addition of {{nonlemma}} to entries that are strictly non-lemma forms; it strikes me as a misguided attempt to complete every entry when leaving the etymology section out of most non-lemma entries entirely suggests to me that they're fine the way they are. Ultimateria (talk) 05:06, 30 November 2022 (UTC)[reply]

I can get behind that. It has the same vibes as demoting alt forms to not include etymologies and the like. Vininn126 (talk) 17:18, 30 November 2022 (UTC)[reply]

/ol/

Discussion moved from Appendix talk:English pronunciation.

Should words like pole be notated /ol/ rather than /oʊl/ in GenAm? Some editors have said that something like this is the (allophonic / [narrow]) pronunciation they have, in a few discussions over the years, most recently Vininn and Whoop whoop here. Lately, I've seen it being added to entries, but in that case we need to update the appendix which AFAICT only recognizes /oʊ/+/l/. The main rationale against a switch that I can see is that outside of a few British dialects, it doesn't seem to be contrastive; the difference between row labs and roll abs is regular vs dark l rather than a change in the vowel, and the difference in the vowels of bone and bowl is perhaps not so great as to be phonemic. - -sche (discuss) 21:08, 29 November 2022 (UTC)[reply]

@-sche Row labs v. roll abs does involve a change in the first word's vowel, though; row labs has a diphthong before the /l/, whereas roll abs is monophthongal in that location. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:01, 6 December 2022 (UTC)[reply]
That would imply /o/ is a unique phoneme vs /oʊ/ which it isn't. My accent has even more extreme dark-L shenanigans, but I still notate "goal" [gɔɰ] as /gɐʉl/. Just put more realised pronunciations on pages, that's what I've been doing for cases like this where the phonemic vs phonetic pronunciation differs quite a bit. – Nixinova [‌T|C] 03:31, 30 November 2022 (UTC)[reply]
True. (Moving this to BP in hopes of more input; I initially didn't want to start yet another BP pronunciation discussion, but it needs input...) Unlike with or, where e.g. floor eight does contrast with both flaw rate and flow rate, supporting a distinction between /o/ and /ɔ/ and /oʊ/ before /ɹ/, I'm not sure if there's a phonemic contrast between /oʊ+l/ vs /ol/ (the distinction seems to reside more in the quality of l than o). - -sche (discuss) 22:15, 30 November 2022 (UTC)[reply]
For my accent, that difference is completely predictable as a matter of syllabification (sometimes also a difference in foot structure/secondary stress): "flow ring" vs. "flooring" is parallel to the differences I have between "key ring" and a hypothetical "keering", or pairs with /l/ like "slowly" [sloʊli] vs. "goalie" [goəɫi] or "gayly" [geɪli] and "gale-y" [geəɫi]. The pair of phonemes could be transcribed as /e/ and /o/ or /eɪ/ and /oʊ/, but I consider them just two phonemes, not four: I have no cases of tautosyllabic [oʊl] and [eɪl] so there is no contrast when syllable division is taken into account. Overall, I think my preference would be to use /o/ /e/ everywhere in phonemic transcriptions of American English.--Urszag (talk) 23:10, 30 November 2022 (UTC)[reply]
I wouldn't necessarily disagree (on just using /o/ and /e/ for flow, day, etc) if we switched to e.g. the /o̞/ that was suggested in a prior discussion for floor, because the sounds are different in minimal pairs like flowrate vs (some pronunciations of) fluorate. (An exhaustive search would probably find more minimal pairs.) - -sche (discuss) 22:51, 1 December 2022 (UTC)[reply]
That would be no better from my perspective: as I said, the reason I'd be for switching to using /o/ everywhere is because I consider them to be the same vowel phoneme. I have a phonetic difference between "flow-rate" and "fluorate", but I do not accept that as an example of a minimal pair for the vowel phoneme in the first syllable because for me the difference is entirely explained by the prosodic/syllabic structure of the word. They are like the pairs "night rate" and "nitrate" or "sea king" and "seeking", which are not pronounced completely identically despite having the same sequence of phonemes.--Urszag (talk) 01:05, 2 December 2022 (UTC)[reply]
I'd personally be in favor of using /e o/. In my speech they are sometimes monophthongal, sometimes closing or centering diphthongs as Urszag points out and /i u æ/ are as well though not with the exact same patterns or degree of diphthongality. And I think it's true of some General American accents. Not sure if that is true of all General American accents, if they are influenced to some degree by a regional accent that does have strong diphthongs for /e o/, like a Southern accent. — Eru·tuon 03:25, 2 December 2022 (UTC)[reply]
I would personally argue for using /oʊ/ where the vowel in question is a diphthong (e.g., in Southern American accents, where "roll" actually is pronounced /ɹoʊl/, as if it were "row'l"), and /o/ where it's a monophthong (e.g., in GA itself, where "roll" is not pronounced with the "row" vowel, but, instead, with a monophthong, as /ɹol/). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:01, 6 December 2022 (UTC)[reply]
Pronunciation of "rolloff" in GA, as /ˈɹol.ɔf/; note the absence of diphthongs.
How "rolloff" would be pronounced if MW's pronunciation were still accurate for GA, as /ˈɹoʊ.lɔf/; note that this pronunciation contains a spurious diphthong that does not exist in the word as it's actually pronounced in GA.
For example, User:Kwamikagami's been reverting rolloff to show an /oʊ/ diphthong in its pronunciation, claiming Merriam-Webster Online as authority for doing so. The problem with this is that MW's GA pronunciations are considerably out of date and do not reflect how the word is actually pronounced in GA (hear included audio files for demonstration of same). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:54, 6 December 2022 (UTC)[reply]
I don't think there's a phonemic contrast between a monophthongal and diphthongal oh vowel. That was lost hundreds of years ago in mainstream English accents (like toe /toː/ used to contrast with tow /tou/). We shouldn't be transcribing the same phoneme two different ways based on allophonic differences; that can be put in phonetic [] transcriptions. Unfortunately we're already kind of transcribing an allophone by writing north-force with /oɹ/ rather than /oʊɹ/ in harmony with other cases of the vowel, but we don't need to add more inconsistencies.... I'd rather switch to /o e/ everywhere. — Eru·tuon 00:50, 8 December 2022 (UTC)[reply]
@Urszag: on one hand, the traditional use of that argument is to keep floor as /ɔ/, and to the extent we rejected the idea that the difference between floor eight and flaw rate was just allophony predictable from the syllable boundary, I'm not convinced about using it to dismiss the difference between flowrate vs fluorate... on the other hand, I concede the issue arises with all r-colored vowels (the precise vowels in /-ɑ.ɹ-/ and /-ɑɹ-/ also sound different, and as this discussion shows, so do the precise vowels used before same-morpheme /l/). I find conflicting evidence of whether, if a speaker produces the flowrate vowel where the fluorate vowel is expected, the difference is so large that it's heard as the other word / a clearly different phoneme and not merely an allophone: this writer says his father's unmerged hoarse vowel made moron (VCV) sound like a name Mo Ron rather than like moron, but apparently he couldn't hear the difference in hoarse vs horse (VCC) since he didn't understand his father saying those didn't sound the same. I suppose I could get behind representing both flowrate and fluorate and row and roll with /o/ in broad IPA as Erutuon and others have suggested, if we can agree to try and more routinely include narrow IPA showing the differences, and to continue showing the contrastive syllable breaks in these cases, not drop them as proposed recently. (But if people are disagreeing on whether the vowel in roll vs rolloff is the same vs different... well, hopefully scholarship can clarify.)
Another interesting (not minimal, because they differ in number of syllables) pair is sewer (sewist) (where the entry currently says "/ˈsoʊɚ/") vs sore; if we change to /o/ everywhere and don't mark syllable breaks, and hence change sewer to /soɚ/, it'll be confusing, given how many reference works treat /oɚ/ and /oɹ/ as interchangeable notations or notate one with the other: soɚ is the notation Merriam-Webster uses for monosyllabic sore, different from a sewing sewer, and we ourselves inconsistently notate e.g. the air sound interchangeably as /ɛɚ/, /ɛɹ/ or /ɛəɹ/ (or with other vowels than ɛ) in various entries. (So that's another example of the need to retain syllable breaks, /soɚ/.) - -sche (discuss) 20:36, 10 December 2022 (UTC)[reply]
Again, if you want to change how we transcribe words, get consensus to change the key first, and then change the articles. Our entry for roll, for example, shows a diphthong in GA, so your edit was inconsistent -- it would be very strange for GA to have a diphthong in roll but a monophthong in rolloff. I speak something very close to GA, and I also have a monophthong in roll, but a diphthong in rolloff. Your two pronunciations are lexically distinct for me, a verb phrase vs a noun. So, sources, plus consensus to change the key. kwami (talk) 07:08, 6 December 2022 (UTC)[reply]
@Kwamikagami: "Your two pronunciations are lexically distinct for me" - please explain how this is relevant? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 07:23, 6 December 2022 (UTC)[reply]
I don't understand the question. If the pronunciations of two words are different, then when transcribing them we should show them as being different. kwami (talk) 07:28, 6 December 2022 (UTC)[reply]
@Kwamikagami Which is what I've been doing. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 07:33, 6 December 2022 (UTC)[reply]
That doesn't clarify your question. kwami (talk) 07:56, 6 December 2022 (UTC)[reply]
You described the two pronunciations as "a verb phrase vs a noun", which doesn't make sense to me, given that, for me, both the verb phrase and the noun use the first, monophthongal pronunciation (the only difference between the verb phrase and the noun being the insertion of a hiatus between the two syllables in the verb phrase, and the absence of said hiatus in the noun). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 08:39, 6 December 2022 (UTC)[reply]
Yes, it's the hiatus that makes the difference. For me, the /l/ in the noun "rolloff" is ambisyllabic, and doesn't monophthongize the vowel, just as in the sound file at MW. In the verbal phrase, "roll" and "off" are separate words, and the /l/ does monophthongize the vowel. Or perhaps the vowel in rolloff is ambiguous, neither quite a monophthong as in roll nor a diphthong as in row, just as the syllabification is ambiguous. Either way, it would be weird to transcribe rolloff with a monophthong but roll with a diphthong. kwami (talk) 21:09, 7 December 2022 (UTC)[reply]
I've been trying to find sources for any of the positions people are taking here, since our personal assessments of how general Americans speak differ. So far, FWIW, I've found Matthew Gordon, in a section on the phonology of "New York, Philadelphia, and other northern cities" in Edgar Schneider, The Americas and the Caribbean (2008), page 81: "In some areas [of the Inland North], GOAT and GOAL appear with long monophthongs as they do in the Upper Midwest (see [...]) and Canada", whereas Bas Aarts, April McMahon, Lars Hinrichs, The Handbook of English Linguistics (2021), page 333, says "while the mid vowels /e/ and /o/ are indeed realized as monophthongs by some speakers of Northern British English dialects (Wells 1982; Watt 2002), these vowels are realized as diphthongs ([eɪ], [oʊ]) in American English (Labov et al. 2006)." I do see San Duanmu saying in Syllable Structure: The Limits of Variation (2009), page 185, that "The tense vowels [o] and [e] are monophthongs when they are followed by [l] or [ɚ], as in [oɚ] or, [gol] goal, [eɚ] air, and [pel] pale. They are often diphthongs in open syllables". (Conversely, Merriam-Webster mentions as an aside in their pronunciation guide that "in coastal South Carolina, Georgia, and Florida stressed \o\ is often monophthongal when final, but when a consonant follows it is often a diphthong moving from \o\ to \ə\".) I get the sense that while there may be a [narrow, phonetic] difference between the vowels of roll and row, which one has the mono- vs diphthong differs between the North and South, and they're the same /broad/ phoneme, for which /oʊ/ is the traditional notation. - -sche (discuss) 09:12, 6 December 2022 (UTC)[reply]
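For anyone who wants to see the Duanmu statement quoted above as an explicit rule, here is a minimal sketch in Python. It is an illustration only, not anything Wiktionary's modules do: the function name `narrow`, the use of "." for syllable breaks, and the choice to leave /o e/ unchanged before consonants other than /l/ and /ɚ/ (the quote says nothing about that environment, and "often diphthongs in open syllables" is treated here as "always") are all assumptions made for the example.

```python
# Hypothetical illustration of the quoted rule: broad /o e/ stay monophthongs
# before tautosyllabic [l] or [ɚ] and are written as diphthongs in open syllables.
DIPHTHONG = {"o": "oʊ", "e": "eɪ"}   # narrow diphthongal realizations
MONOPHTHONGIZERS = {"l", "ɚ"}        # following sounds that keep the vowel a monophthong


def narrow(broad: str) -> str:
    """Turn a broad transcription (syllables separated by '.') into a narrow one."""
    syllables = []
    for syl in broad.split("."):
        chars = []
        for i, ch in enumerate(syl):
            nxt = syl[i + 1] if i + 1 < len(syl) else None
            if ch in DIPHTHONG and nxt is None:
                # Open syllable: the vowel ends the syllable, so diphthongize it.
                chars.append(DIPHTHONG[ch])
            elif ch in DIPHTHONG and nxt in MONOPHTHONGIZERS:
                # Before [l] or [ɚ] in the same syllable: keep the monophthong.
                chars.append(ch)
            else:
                # Any other environment is not covered by the quote; leave as is.
                chars.append(ch)
        syllables.append("".join(chars))
    return ".".join(syllables)


if __name__ == "__main__":
    for word in ["ɡol", "ɹo", "slo.li", "pel", "oɚ"]:
        print(word, "->", narrow(word))
```

Because the rule only looks inside each syllable, the output depends on where the syllable break is written, which is the same point made above about needing to keep contrastive syllable breaks in broad transcriptions.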
I don't have a problem transcribing it /oʊ/. Per Duanmu, I have [o] a monophthong in goal, or and air, but I wouldn't find [oʊ] to be confusing. Also, I have a diphthong in pale (quite audible, even sesquisyllabic, as happens with diphthongs before /l/). kwami (talk) 21:19, 7 December 2022 (UTC)[reply]
@Kwamikagami ...You have /o/ in air? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 22:20, 7 December 2022 (UTC)[reply]
Oops, sorry. I changed something after I drafted that and didn't iron out all the inconsistencies. I was referring to the claim that GA has monophthongs [o] or [e] in those words; I don't for pale but do for the rest. kwami (talk) 22:56, 7 December 2022 (UTC)[reply]
I'm surprised to hear that there're AmE speakers who don't have a diphthong or outright syllable break in pale! Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 22:18, 7 December 2022 (UTC)[reply]