Wiktionary talk:Improving entries for the most common English words

From Wiktionary, the free dictionary

Word list

The top 500 from User:DCDuring/GSL/GSL sortable. See New General Service List on Wikipedia and its official website. DCDuring (talk) 20:57, 25 August 2023 (UTC)

Examples

I had a personal project to check that basic English words were as fleshed out as they could be and flesh them out if they weren't, User:-sche/basic English; unfortunately, it is quite time-consuming to make an entry truly comprehensive, so I only got a few done (including as, at, high, go, know, take, that). - -sche (discuss) 00:43, 24 August 2023 (UTC)

I didn't specifically know you were doing so, but I'm not surprised. Those seven are tough. (Was high the easiest?) I'm not sure that we want to work on including all the rare PoSes and definitions rather than the core. But, I suppose that while one is working on the core, the others show up in references and citations, so one may as well go for completeness as well. DCDuring (talk) 01:29, 24 August 2023 (UTC)
@-sche: I have proposed that your seven entries be used as exemplars for use in discussions determining what it means for this kind of entry to be "good". DCDuring (talk) 18:07, 24 August 2023 (UTC)
I think -sche's entries look great overall. There are still a few ways I think they could be improved. I think for consistency of quality, it would be good to agree upon a list of things we are going to look at. Perhaps we could look at a selection of words in phases, beginning with the definitions and then moving on to other aspects of the entries. Here are the specific things I would improve in the seven entries:
  1. Add usexes and quotations to all senses that are missing them;
  2. Move quotations out of "quotations" sections and under corresponding definitions;
  3. Ensure that there are translation tables for every non-obsolete sense and subsense (because this can get unwieldy, could we perhaps consider indenting translation tables for subsenses like I did at quarantine?);
  4. Depending on who's participating, make sure that pronunciation encompasses every major English-speaking country;
  5. Ensure consistent capitalization and punctuation at the beginning/end of all senses in an entry;
  6. Make sure that -nyms are displayed either together after all the definitions or listed under each one.
Andrew Sheedy (talk) 19:23, 24 August 2023 (UTC)
The numbered items all sound good. It seems to me that list item 0 is making sure that we have definitions that include all contemporary senses for the word (e.g., by comparing with those in the OED and, for those of us who don't have convenient access to the OED, MWOnline). That can include determining what PoSes we are including and whether we have duplicate definitions for different PoSes, especially when it comes to Adverbs vs. Conjunctions (e.g., wh-words), Determiners vs. Pronouns, and Adverbs vs. Prepositions vs. Particles. (Or is that list item -1?) Eliminating duplicates and excessive overlap is part of the defining process, which cannot be readily performed in phases or by more than one party at a time, or at least needs one party leading the effort. Trying to keep the defining vocabulary as simple as possible also seems like a good idea.
In any event, the definition/PoS parts of the process should be as complete as possible before we do steps 1-3 and 6. Even 4 and 5 might be performed more efficiently if the PoSes and definitions were basically resolved. DCDuring (talk) 20:39, 24 August 2023 (UTC)
I agree. I'll list what I like about the entries as additional things we can watch for:
  1. Definitions:
    1. Comprehensive coverage of definitions;
    2. Clearly worded definitions;
    3. Key terms wikified;
    4. Logical grouping/ordering of senses (I didn't look at this one very closely);
    5. Use of sense-subsense structure for long entries to make it easier to read;
    6. All verb senses labelled either "transitive" or "intransitive" for clarity;
    7. Good use and formatting of usexes and quotes.
  2. Etymology:
    1. As complete as possible;
    2. Consistent, clear formatting;
    3. Extra information like cognates is collapsible.
  3. Pronunciation:
    1. Presence of at least American and British pronunciation;
    2. Audio files for at least American and British pronunciation.
  4. Other:
    1. Usage notes are present and offer examples where needed;
    2. Fairly comprehensive list of derived and related terms.
Other things to consider are the inclusion of images, Wiki-links, etc., but that doesn't really apply much to these entries. Andrew Sheedy (talk) 22:17, 24 August 2023 (UTC)
I like the idea of organizing things so that some items could be done separately, even by different people, without messing up other work processes. That is the usual wiki way. As an example, when I am adding derived terms, I have a bunch of browser windows open and special clipboard contents to reduce keystrokes by pasting therefrom. I might want to do a few of those once I'm set up for them. Many items don't really depend on definitions being done first, e.g., Etymology, Pronunciation, and Derived and Related terms.
Many of these terms will have many definitions which may take a while to settle down. This raises the possibility that we might want to indicate when the definitions are not fixed enough to allow the other steps to proceed without wasted effort. {{under construction}} or something more specific might be useful. DCDuring (talk) 23:32, 24 August 2023 (UTC)
Yeah, to clarify, I was mainly working on making sure the definitions were comprehensive; I didn't add audio etc, so that remains to be done even on the few entries I overhauled. I agree Quotations sections should be done away with. If we get a list together of "entries to overhaul", I'll indicate the ones I've already worked on the definitions of: others are however, when, word, the noun and verb sections of man, and some random ones like absolute, concrete, create, get off, next and next to, settle, warn, and the adjective section of low. Widsith did of, and other entries have been fleshed out by other people. Obviously, it wouldn't hurt if anyone wanted to look through the definitions even of all those 'done' entries and see if anything is missing, since 2+ people notice more than 1 person.
I agree on all the numbered points above, especially logical grouping of senses (which I think makes sense despite the periodic debates over whether to list senses in chronological order, or in order of how common they are: most helpful is clearly neither of those, it's to group related senses). - -sche (discuss) 05:34, 25 August 2023 (UTC)

"Stewards", checklists for entries?[edit]

Would it make sense for each entry to have one or more stewards, monitors, shepherds, or cat herders at least through to an entry achieving "good entry" status? Each term should be on the watchlist of one or more contributors to track all changes. Would it make sense to have a templated table on each entry talk page to track the state of each entry, a checklist for each of the items suggested above? DCDuring (talk) 14:11, 25 August 2023 (UTC)

I think a checklist is a good idea. Having monitors for each entry might not be necessary, but I would be happy to do it. I think it would be enough to either assign entries to different people or (preferably) allow people to "sign up" to take on an entry from a given list (maybe with several distinct phases: (1) definitions, (2) quotations and usexes, (3) synonyms, derived terms, etc., (4) final formatting and quality check; things like pronunciation could be taken care of at any point before the final quality check). Andrew Sheedy (talk) 16:23, 25 August 2023 (UTC)
I was thinking that the monitor/steward would usually be the same as the definer.
I'll see about creating a templated checklist. DCDuring (talk) 19:11, 25 August 2023 (UTC)
I have created {{English entry quality status}} that is meant to be for:
  1. initial quality assessment
  2. guidance for elements of entry "goodness"
  3. reporting progress, next steps.
It is dauntingly long, but incomplete and with insufficient detail for some purposes.
The idea would be to have one on every one of the entries on our improvement list. I suppose that each willing participant should try it out on one entry. DCDuring (talk) 00:10, 29 August 2023 (UTC)
  • @DCDuring Do you mean for this to be used on each entry's talk page? (I'm wondering what categories I can put the template into.) -- excarnateSojourner (talk · contrib) 04:19, 4 September 2023 (UTC)
    My intent is to use the template on talk pages for entries in this project. I was hoping to get more reactions to the template itself. I have placed it only on Talk:the. But we don't have any category for entry-quality-evaluative templates. WP has that kind of template, also placed on entry talk pages. Their categorization would be something to consider in categorizing this and any similar templates. DCDuring (talk) 15:34, 4 September 2023 (UTC)
    I've been very busy, because I was in the midst of moving and starting a new semester of university, but I hope to be more involved in a couple weeks. I'm mostly pleased with the template at first glance. I would be inclined to have three or four phases, though. Having just two makes it feel more overwhelming, IMO, since it makes it look like a whole ton of things are equal priority. If there are specific aspects you want feedback on, let me know. Andrew Sheedy (talk) 01:15, 7 September 2023 (UTC)
    It is definitely overwhelming.
    It seems to me that there are at least two tracks: one concerned with or dependent on definitions and other tracks concerned with appearance that can operate without delay, independently of the definition track. Completeness of definitions requires checking against other references and has to be fairly early in the process. The translation section may as well not be started until the definitions are tentatively "complete". Etymology can and should proceed at a very early stage, because it can help with entry structuring and, therefore, definitions. Checking completeness of PoSes, relative to other references, should be early. We might need a template {{rfOED}} so those without convenient OED access can ask for things (PoS, Middle English, Etymologies, definitions, citations) to be checked. RfV can begin with existing uncited definitions, but, for common function words, we might need to cut directly to {{rfOED}}. Pronunciation can begin early, probably right after dividing things up by Etymology and, possibly, PoS. Use of {{rfe|en}} for missing etymologies and {{rfp|en}} for missing pronunciations can start immediately. Selectively starting discussions on WT:ES may be necessary to draw attention to missing or seriously deficient etymologies. Subsense structuring can be a late step. Checking for goodness of definition wording and for "completeness" of nyms and derived terms seems like part of a late phase. Pictures, examples, and 'final' review of appearance seem to be deferrable to a last stage, earning an entry "excellent" rating. This analysis is not complete. I will take a run at editing the template and possibly producing a shorter one, designed to get the process started without seeming too overwhelming.
    I'm also thinking we don't need an "entry steward" as much as we need someone to tackle the definitions. "Stewarding" can possibly be crowdsourced. DCDuring (talk) 18:42, 7 September 2023 (UTC)
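    The ordering constraints described in the comment above can be sketched as a small dependency graph. This is only an illustrative reading of the discussion, not an agreed workflow: the task names and the exact edges are assumptions, and the sketch uses Python's standard `graphlib` module (Python 3.9+) to produce one valid work order.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that should be finished first.
# Task names and edges are hypothetical, chosen to mirror the comment:
# etymology and PoS checks early, translations after definitions,
# pictures and final review last.
PREREQUISITES = {
    "etymology": set(),
    "pos_check": set(),
    "pronunciation": {"etymology"},
    "definitions": {"etymology", "pos_check"},
    "translations": {"definitions"},
    "subsense_structure": {"definitions"},
    "wording_nyms_derived": {"definitions"},
    "final_review": {
        "pronunciation",
        "translations",
        "subsense_structure",
        "wording_nyms_derived",
    },
}


def work_order(prerequisites):
    """Return one valid order in which the tasks can be tackled."""
    return list(TopologicalSorter(prerequisites).static_order())


if __name__ == "__main__":
    for step, task in enumerate(work_order(PREREQUISITES), start=1):
        print(step, task)
```

    Several orders satisfy the same constraints, which matches the point that some tracks can run independently of the definition track.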

Quotations

Thanks for creating this page @DCDuring. I volunteer for the somewhat tedious job of adding quotations, with the goal of having (at least) 3 quotations per sense. Let me know what entries we're happy with (definitions-wise) and I will try to get them cited. Adding quotations is also a good test for duplicate definitions: on good, the phrases "a good worker", "a good watch", and "a good swimmer" are claimed to be different senses of "good", which will have to be addressed at some point. Ioaxxere (talk) 00:00, 26 August 2023 (UTC)

That could happen even with entries that have been worked on by our best contributors. The first 500 terms on the table at User:DCDuring/GSL/GSL sortable are our master list for now. Only 15 of them have been worked on comprehensively, by -sche and by Widsith. Pick any one of those 15. The function words are tough if you like a challenge. Other words are better for building up to the challenges. I will be soliciting nominations from the list of 500 for words that are good, at least with respect to definitions. I certainly agree that attestation efforts could easily lead to more or fewer definitions, but we can hope they won't completely upset the applecart. After all, really good dictionaries work from a corpus, at least for new definitions. Maybe good would be a place to start. It might help us better understand the quality issues for these entries. DCDuring (talk) 01:43, 26 August 2023 (UTC)
One thing I like to do is to take the cites from some more obscure word and throw them onto the page of a common word, for instance, dance and Bible. --Geographyinitiative (talk) 22:07, 27 August 2023 (UTC)
That can help for some entries. But how can it help with words like of or it? It would be hard to go through all the definitions to find the right place to put the cite. We will need some special sources and/or tools and/or cleverality to get cites for words like that. DCDuring (talk) 22:22, 27 August 2023 (UTC)
You don't find it by looking, you find it by knowing how pitiful Wiktionary's coverage is, and then when you incidentally run into something, you can fill in the gap. That's how I did the cite here: God#Interjection. Also, I know how pitiful Wiktionary is on hyphenated terms, so I can predict where there's going to be a missing alternative form. I just found cites for "of" and "it" on the Nansi entry (random word I was just working on). Cannibalize the cites that exist. You could find a word that often pairs with a particular sense of "of" or "it" and then search that on Wiktionary and find it. Or you could search "New York Times" on Wiktionary plus whatever word you're looking for. After you've done a few of those, then what's left is going to be rarer senses, and those can be dealt with either in RFV or whatever. --Geographyinitiative (talk) 22:44, 27 August 2023 (UTC)

Cutting corners

I think we may need some shortcuts. For many of the function words, it is not very easy to find attestation for all of the great variety of senses. We may have to rely on usage examples without attestation or on authority: other dictionaries (OED, OneLook references) or definitive grammar books, such as the CGELs. To actually make visible progress, perhaps we should start our efforts on nouns, verbs, and adjectives that are not used as function words. DCDuring (talk) 19:05, 27 August 2023 (UTC)

I've been making hopefully pretty visible progress on concrete with the help of OED citations: my plan is to RFV whatever senses I can't find citations for. Ioaxxere (talk) 20:19, 27 August 2023 (UTC)
Sounds good. I guess we do need to keep to our well-established processes. I hope we can put items on an RfV fast track in some way. DCDuring (talk) 21:54, 27 August 2023 (UTC)

Some common words still with Webster tag

The top 24:

Fond of sanddunes (talk) 17:35, 28 August 2023 (UTC)

So, 3 (start, hope, and reach) appear among the top 500 of the NGSL; 8 do not appear on it at all; 13 are on the list, but not among the top 500. We often work from this kind of list, sometimes much, much longer ones. This particular page is an effort to focus the efforts of some contributors on the most "important" entries, just using frequency as a gauge of "importance". Looking at all the things that would make up a really good (still not perfect) entry, I can get intimidated, especially by the work required for many of the common function words. But we need to keep some kind of focus on "important" words. Good, experienced contributors can tackle noun, verb, and adjective entries from our list of 500 with some hope of achieving good entries, probably with contributions from others. Only our very best, most experienced contributors are likely to succeed with the complex ones. They will probably need all the help they can get. DCDuring (talk) 23:14, 28 August 2023 (UTC)
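The three-way split in the comment above (top 500, on the list but lower, not on the list) can be sketched as a small tally. This is a minimal sketch under stated assumptions: the function names, the placeholder words, and the rank numbers are all hypothetical and are not actual NGSL data.

```python
from collections import Counter


def bucket(word, ngsl_rank, cutoff=500):
    """Classify a word by its position on a ranked word list."""
    rank = ngsl_rank.get(word)
    if rank is None:
        return "not on list"
    if rank <= cutoff:
        return "in top %d" % cutoff
    return "on list, below top %d" % cutoff


def tally(words, ngsl_rank, cutoff=500):
    """Count how many words fall into each bucket."""
    return Counter(bucket(word, ngsl_rank, cutoff) for word in words)


if __name__ == "__main__":
    # Placeholder words with made-up ranks, for illustration only.
    hypothetical_rank = {"w1": 10, "w2": 500, "w3": 501}
    tagged_words = ["w1", "w2", "w3", "w4"]
    for name, count in tally(tagged_words, hypothetical_rank).items():
        print(name, count)
```

With a real rank table and the real list of tagged entries, the same tally would reproduce the kind of 3 / 13 / 8 breakdown given above.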

More modest objectives

I don't see that we have any chance of making enough progress on even the top 100 terms with all of the things that we say we want to do to develop any serious momentum. Personally, I intend to limit myself to working on definition coverage, improved wording, and sense/subsense structure for the words that are both basic and among those most looked for by our users. Other cleanup may ensue. DCDuring (talk) 02:59, 4 November 2023 (UTC)

Yeah, I think I was a bit ambitious with my expectations. I underestimated just how busy this school year would be and I simply don't have time to devote to a project like this. I think part of the problem is the more basic words can be very intimidating and it's a big time commitment just to see how our sense coverage compares to other dictionaries. I might try to tackle some of the easier words, but real life is getting in the way. Andrew Sheedy (talk) 04:43, 4 November 2023 (UTC)
I underestimated the degree of incompleteness of apparently satisfactory entries for many common words.
The most useful thing that I have done for this is creating User:DCDuring/GSL/GSL sortable, especially the right columns, which have some information on entry-viewing frequency at Wiktionary. Most GSL terms don't appear on the relatively short list I looked at, but some do. Those seem like a good place to start, though they are "hard" by my reckoning.
I've started work on definitions of the and be. DCDuring (talk) 14:59, 4 November 2023 (UTC)