Wiktionary talk:Forms and spellings

Latest comment: 3 years ago by Geographyinitiative in topic Pocket of Uncertainty

Various comments

I've been mulling this over and putting off commenting for a while now so let me rip...

  • Splitting "Alternative spellings" up into 3 specific types is a bad idea. They are not so polarized. There are words which cross over and blur these categories and there are doubtless other smaller categories.
  • This may highlight that "Alternative spellings" is not the best wording for this section. It's always been contentious. It has been argued over and various people have used their own names for the section from time to time. Not so long ago I suggested that either "Other spellings" or just "Spelling" were better.
  • The purpose of the section has always been to show that under various conditions, the same word has been or is spelled differently in some regions or in some circles or by some people for various reasons. Sometimes they reflect spelling reforms, sometimes they reflect borrowing from another language, sometimes they reflect simplification of diacritics or ligatures, sometimes they are due to misanalysis, sometimes they are due to various romanization schemes for a language written in another script, sometimes they are due to various people attempting to capture words from an unwritten native language in roman script. Not so uncommonly various of these reasons combine and blur and overlap and I'm sure there are a few I didn't think of to list here.
  • they have identical pronunciation and meaning
    • Neither is necessarily true. My guess is the person who wrote this did not study a large sample of words and just went on a hunch. Pronunciation of a single spelling can vary greatly between regions and even depending on part of speech or sense. Meanings shift and it's not unknown for words which were historically the same to subtly split pronunciation or spelling or both and each variant take a different shade of meaning.
  • frequency of usage of either term is within an order of magnitude of that of the other, or two orders of magnitude if either is very common
    • This is just arbitrary and it seems like the writer is confused about whether all forms are equal or one form is an alternative of another.
  • neither term arises directly from common accidental typos, human errors, scanning errors, etc
    • We don't need anything different for these than for other errors.
  • region-specific spellings
    • This is just unneeded. It attempts to polarize issues which are simply not so black and white. We don't need a separate concept. We need to label spellings with regions and sometimes we need usage notes. For instance many people believe -ise is British and -ize is American and colour is British and color is American yet the words "colorise" and "colourize" can be attested.
  • obsolete spelling
    • This is too simplified. We already have "dated", "archaic", and "obsolete" labels for words and senses. We just need to use them here. A word or spelling is dated if your grandparents use it but you don't. A word or spelling is archaic if it is now used only to create a feeling of the past and a more modern word or spelling would be used generally. A word or spelling is obsolete if it has not been used at all for a long time. Again there are other cases such as German's recent spelling reform - are spellings replaced by it archaic or obsolete now? Maybe they are deprecated. And what about Belarus and China where there are forms based on outmoded romanizations or too closely related to unpopular political situations in the past?
  • What about languages or dialects written in more than one script or version of a script?
    • Serbian and Belarusian can both be written in Cyrillic and Latin.
    • Hindi and Urdu are closely related dialects of the same language, the majority of whose vocabulary can be used in both languages, but Urdu is written in Arabic script and Hindi in Devanagari.
    • Korean used to be written in Hanja and is now written mostly in Hangul, but was written in a combination for a long time and still is in some sectors or by some writers. Hanja words can always be written in Hangul but the opposite is not always the case. Eg: Seoul has no Hanja form.
    • Chinese is written in Traditional characters in some countries and Simplified characters in others. Only some characters were affected by simplification.
    • Some Japanese characters were simplified but differently to Chinese. The older forms can still be found especially in older texts but modern words will never have been spelled in traditional characters. Many words can be written in Kanji or Katakana, sometimes in Kanji or Hiragana. Usually one form will be more common than the other but there are trends over time.

I propose that we all work on trying to fix this mess with better understanding of all the issues and all the affected parties, but keeping in mind simplification, standardization, and ease of use.

Thanks for your attention. — Hippietrail 17:40, 16 April 2007 (UTC)

I agree with your comments, and think/hope that my proposal below addresses most of them. —RuakhTALK 20:19, 16 April 2007 (UTC)

Alternative proposal

  • Two or more spellings are considered to identify the same word if all of the following are the case:
    • They are the same part of speech, and the same form of that part of speech (e.g., all are present participles).
    • They are closely related etymologically.
    • They have approximately the same pronunciation (ignoring regional accents and whatnot).
    • They have approximately the same range of meanings, such that they would be defined the same way.
    • A writer using one of them in a text is unlikely to use another in the same text (except by mistake, or in quoted text).
  • Further, a spelling might identify multiple words, one of which is also identified by another spelling; for example, a Simplified Chinese character might identify a few different words, each having a separate Traditional Chinese character.
  • Two or more spellings that identify the same word in this fashion are referred to below as "variants".
  • Variants that have the exact same pronunciation (ignoring regional accents and whatnot) are referred to below as "alternative spellings". For example, the variants color and colour are alternative spellings, while the variants negotiate (with [ʃ]) and negociate (with [s]) are not.
  • Where possible, a lemma with variants should have a full entry at only one of them, with the others being mostly soft redirects (but having variant-specific information, such as differing pronunciation, where appropriate). The title of the full entry should be chosen as follows:
    • Non-standard variants (misspellings like accomodate, eye-dialect spellings like wuz, and so on) should not be given full entries. (Indeed, only the most common misspellings should be given entries at all.)
    • If a given spelling is a variant of a few otherwise-differently-spelled words … ? [I have no idea. On the one hand, it might be preferable to give full entries to the latter (because otherwise each soft redirect would require additional information on which sense applies); on the other, it would be irksome to look up the more common spelling and then have to visit five different softly-redirected pages to find all the possible senses. Input is welcome.]
    • If it is clear which standard variant is the most common, then it should be given a full entry.
    • If a few standard variants seem to be roughly tied for most common, then one of them should be given a full entry.
  • Assuming there is more than one non-misspelled variant, the full entry should have a section headed "Variants" with a link to each non-misspelled variant, followed by a parenthesized, italicized context note if relevant. This includes a self-"link" which will appear in bold (included for the purpose of attaching a context note to it, though for consistency's sake it should appear whether or not it requires a context note). This allows a distinction to be drawn between sense contexts (pants, in the sense of under-trousers, is a British word) and spelling contexts (realise is a British spelling), which is essential for the sake of people coming in from a soft redirect.
  • As mentioned above, a non-full lemma entry should be a soft redirect to the full entry, with the following information:
    • The language, part of speech, inflected forms, etc.
    • If it's an alternative spelling, then the kind of spelling ("Mostly U.K. spelling of ___", "Dated spelling of ___", "Rare spelling of ___", "Common misspelling of ___", etc.), unless the two spellings really are equivalent and there's nothing to say on that point, in which case it should be "Alternative spelling of ___". [{{alternative spelling of}} should be modified to add support for that modifier.]
    • Otherwise, the kind of variant ("Dated variant of ___", etc.), with the default being simply "Variant of ___." Erroneous variants like ect. should be labeled "Erroneous variant of ___." [A {{variant of}} should be created for this.]
    • The pronunciation, if it differs from that at the full entry. (This could happen even with alternative spellings, since sometimes a spelling is strongly associated with a specific region that normally gets its own pronunciation line.)
    • The etymology, if it differs from that at the full entry; for example, something like "From ___, with influence from ___."
  • Variants of inflected forms are treated normally, with soft redirects to the corresponding lemmata; the lemma pages then identify only their corresponding inflected forms (so color would not link to coloured, nor colour to colored; but age would link to both aging and ageing).

Improvements welcome. :-)

RuakhTALK 20:17, 16 April 2007 (UTC)
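
For illustration only, the selection rule in the proposal above could be sketched roughly as follows. This is a minimal sketch and not part of the proposal: the `Variant` class, the function names, the frequency counts, and the alphabetical tie-break are all assumptions (the proposal only says that "one of them" should get the full entry when variants are roughly tied), and the unresolved case of one spelling serving several words is not handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    """Hypothetical record for one spelling of a word."""
    spelling: str
    frequency: int            # made-up usage count, for illustration only
    standard: bool = True     # False for misspellings, eye dialect, etc.
    pronunciation: str = ""   # any comparable key, e.g. an IPA string


def choose_full_entry(variants: List[Variant]) -> Optional[Variant]:
    """Pick the variant that would hold the full entry.

    Non-standard variants are never chosen. Among standard variants the
    most frequent one wins; exact ties are broken alphabetically, which
    is an assumption made here and not something the proposal specifies.
    """
    standard = [v for v in variants if v.standard]
    if not standard:
        return None
    return min(standard, key=lambda v: (-v.frequency, v.spelling))


def are_alternative_spellings(a: Variant, b: Variant) -> bool:
    """Variants count as alternative spellings only if pronounced the same."""
    return a.pronunciation == b.pronunciation


# Hypothetical data: the counts are invented.
colour_variants = [
    Variant("colour", 800, pronunciation="ˈkʌlə"),
    Variant("color", 900, pronunciation="ˈkʌlə"),
]
accommodate_variants = [
    Variant("accomodate", 5000, standard=False),
    Variant("accommodate", 1000),
]
negotiate = Variant("negotiate", 700, pronunciation="nɪˈɡoʊʃieɪt")
negociate = Variant("negociate", 20, pronunciation="nɪˈɡoʊsieɪt")
```

With this invented data, `choose_full_entry(colour_variants)` picks color, the misspelling accomodate is never chosen however frequent it is, and negotiate/negociate count as variants but not as alternative spellings.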

Sample of actual uses

Rather than make assumptions about how the Alternative spellings section (and variants of it) is used, I've made a random sample of actual uses, which shows a greater than expected variety in how our various contributors really use this heading. I've done some analysis of them. Hopefully this will help the development of an improved, standardized spellings section:

Wiktionary:Alternative spellings/sample

Hippietrail 20:12, 21 April 2007 (UTC)

Idioms?

What about alternative forms of idioms?

I'm considering esprit d'escalier/esprit de l'escalier/l'esprit d'escalier/l'esprit de l'escalier, all of which occur with similar frequencies.

I raised this also at Wiktionary talk:Idioms#Alternative forms.3F.

Nbarth 01:36, 27 January 2008 (UTC)

OIC: use {{alternative form of}}
Nbarth 01:53, 27 January 2008 (UTC)

Question

I know of many alternative spellings for words in non-English languages that are redirected to the main spelling article instead of having their own entries. Should this be changed? DaGizza 09:01, 1 May 2008 (UTC)

There are some exceptions with diacritics, but otherwise yes, they should. Conrad.Irwin 09:09, 1 May 2008 (UTC)

Misspellings

This needs a decent explanation, but nobody seems to know what that explanation should be. Mglovesfun (talk) 14:53, 25 December 2009 (UTC)

I think the whole structure of [[Category:Misspellings by language]] needs looking at. I think a lot of these aren't common enough to be here. Mglovesfun (talk)
I agree, this needs a decent explanation. This whole page is still a stub, which I'm slowly editing (and copying information into from other places) after it was abandoned for more than a year. If more information on the subject becomes available, I'll happily update it.--Daniel. 15:17, 25 December 2009 (UTC)

What about Template:alter

Shouldn't {{alter}} be mentioned somewhere? — Orgyn (talk) 21:49, 6 March 2018 (UTC)

Orthographic differences only?

Are Alternative forms just for orthographic differences? MiguelX413 (talk) 05:23, 20 October 2019 (UTC)

Pocket of Uncertainty

Rules 2 and 3 create a pocket of uncertainty for certain words.

2 they each satisfy Wiktionary's Criteria for Inclusion

3 frequency of usage of either term is within an order of magnitude of that of the other, or two orders of magnitude if either is very common

Imagine a word with an extremely large amount of usage. Imagine another word that is spelled differently but is intended to be pronounced the same way. Such could be the case with a hyper-high-frequency Hanyu Pinyin-derived word and an uncommon Wade-Giles-derived word or postal romanization. Now if I proved the Wade-Giles word via CFI criteria, but the frequency of the word is not within one or two orders of magnitude of that of the Hanyu Pinyin word today on Google, or maybe in fifty years on Google Mars, then where does the Wade-Giles form go on the page of the more common word? If it's not true already, this situation will happen to Beijing and Pei-ching. Will Pei-ching be kicked off the Beijing page by this proposed rule, but allowed to remain as a lonely island page? Would it be kicked to synonyms? --Geographyinitiative (talk) 15:28, 29 April 2021 (UTC)

Based on a proposal above, I changed rule three to read:
3. frequency of usage of either term is within an order of magnitude of that of the other, or two orders of magnitude if either is very common AND/OR they have approximately the same pronunciation and are closely related etymologically
This would conform the rule to my practice with regard to Hanyu Pinyin-Wade-Giles-Postal Romanizations (into Mandarin) &c. --Geographyinitiative (talk) 15:51, 29 April 2021 (UTC)
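
As a rough illustration of how the original and amended versions of rule three differ, here is a minimal sketch. The function names and the frequency figures are hypothetical, "AND/OR" is read here as an inclusive "or" (one possible reading), and whether a term is "very common" is simply passed in as a flag, since the rule does not define it.

```python
def frequency_within_range(freq_a: int, freq_b: int, very_common: bool = False) -> bool:
    """Original rule three: the usage frequencies lie within one order of
    magnitude of each other, or two if either term is very common."""
    if freq_a <= 0 or freq_b <= 0:
        return False
    allowed_orders = 2 if very_common else 1
    # Integer comparison avoids floating-point rounding at the boundary.
    return max(freq_a, freq_b) <= min(freq_a, freq_b) * 10 ** allowed_orders


def amended_rule_three(freq_a: int, freq_b: int, very_common: bool,
                       similar_pronunciation: bool, related_etymology: bool) -> bool:
    """Amended rule three, reading "AND/OR" as an inclusive "or"."""
    return (frequency_within_range(freq_a, freq_b, very_common)
            or (similar_pronunciation and related_etymology))


# Invented counts for a very common form and a rare romanization of it.
common_form_count = 5_000_000
rare_form_count = 300

original_result = frequency_within_range(common_form_count, rare_form_count,
                                         very_common=True)
amended_result = amended_rule_three(common_form_count, rare_form_count,
                                    very_common=True,
                                    similar_pronunciation=True,
                                    related_etymology=True)
```

With these invented counts the two forms are more than two orders of magnitude apart, so the original wording excludes the rare form, while the amended wording admits it on the strength of shared pronunciation and etymology. That is the Beijing/Pei-ching situation described above.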