User talk:Sameerhameedy/example entry

From Wiktionary, the free dictionary
Latest comment: 6 months ago by Atitarev in topic Header format

Header format

@Sameerhameedy, @Benwing2:

Thank you for these examples.

Generally, it's getting much better, and you are producing mostly good results using {{fa-IPA}}, for both the respellings and the transliterations.

I'd like to see the reverse at work as well: going from vocalised Persian to transliterations, and using them in the headers too, just like Arabic and now Urdu.

What is lacking is an agreement on what variety to use in the header. I'd prefer something, rather than nothing (OK to include Tajik spelling).

The choices for vocalisations in the header:

  1. No vocalisations or transliterations, handle everything in the Pronunciation section. This will be a downgrade from the existing practice (transliterations). Chinese entries are the only exception but that was supported by a vote. We can decide, at least on one.
  2. Default to Classical or Iranian
    1. Classical is more academic and used by serious resources.
    2. Iranian is more modern, understood and perceived by many as a standard. Modern terms, loanwords, especially specific to Iran will need to be displayed accordingly.
  3. Split headers into multiple lines, one for each variety, if there is more than one.
  4. Have separate L2 language headers, basically unmerge Persian. Probably dispreferred by the majority.

The transliteration modules need to be enabled in the main space. They are as good as the current Urdu module but are not used. There is still trouble with the language codes: they need to be allowed generally, not just for etymology purposes.

The translation adder needs to allow automated nesting for "fa-ira", "fa-cls" and "prs", regardless of the decisions on the above. They all link to "fa" entries. Manual editing sort of works, but not through the tool. Anatoli T. (обсудить/вклад) 05:05, 18 October 2023 (UTC)

@Atitarev I would definitely be opposed to splitting Iranian Persian, Dari and Classical Persian into separate L2 languages, and AFAIK there's no current support for sticking multiple transliterations into a single transliteration slot. OTOH User:Theknightwho implemented a // notation in links for use by Mandarin Chinese (simplified vs. traditional), but it's general and maybe we could use it for showing both the Classical/Dari vocalization and the Iranian vocalization. I'm not sure if it supports a three-way split but if so we could use it to show something like CLASSICAL // DARI // IRANIAN if all three exist. It would be great if we could add small subscripted language codes by each one to make it clearer which is which. Not sure how transliteration of // links is currently handled, User:Theknightwho can you comment? As for the translation adder, I think it shouldn't be too hard to fix it to allow for nesting of Persian lects since it already has the built-in support. It might need a few hacks to support etym-only language varieties in nesting but maybe not. I'll take a look soon. Benwing2 (talk) 05:15, 18 October 2023 (UTC)
@Benwing2 Currently it shows the first, since we haven’t yet got consensus on the best way to display multiple translits. It can certainly be customised on a language-specific basis, but I’m still prioritising work on the parser at the moment, so won’t be able to work on it in the short term. Theknightwho (talk) 05:28, 18 October 2023 (UTC)
@Benwing2, @Theknightwho: Thanks. There is still an issue with the // notation when there are multiple forms. Please see how 臺灣台灣台湾 (zh) (Táiwān) is linked to zh:wiki. I am not sure if Theknightwho is working on the fix. I mentioned it at WT:GP.
@Benwing2, are you able to show an example with links, please?
Maybe use دَقیقِه (dağiğe) (Iranian) and دَقِیقَه (daqīqa) (Dari) as an example? Anatoli T. (обсудить/вклад) 05:28, 18 October 2023 (UTC)
@Theknightwho Just curious, how close is your parser work to being in the alpha and beta stages? Benwing2 (talk) 05:30, 18 October 2023 (UTC)
@Benwing2 Alpha is pretty close - beta’s still quite a way off. It’s gone through a bunch of redesigns, since the hardest part has been working out what form the output should actually take. In brief:
  1. Output is a node tree, consisting of wikitext (i.e. plaintext exploded into UTF8 arrays), and any number of special-purpose nodes. The wikitext parser has wikilinks, external links and html tags, while the template parser has ones for templates and arguments. These can be iterated over recursively using their methods, and they call their children’s methods in turn, so nodes can be mixed and matched arbitrarily, and will always remain compatible with each other. Naturally, they can have special-purpose methods as well.
  2. The plan is to add a bunch of options to the iteration methods, so that they can be used for searching, text replacement, etc. This still needs a lot of work, since I’ve barely started on it.
  3. The Parser object is designed in a general purpose way and can be used to parse anything, so long as you give it the right set of handlers and nodes; this part is essentially complete. I’m currently using it for the wikitext and template parsers, but it could easily be used for complex linguistic parsing as well. As an example of its capability, it’s what allows the wikitext parser to handle an input like [<<<!-- -->!-- -->!-- -->[foo]] in a single pass, where Parsoid (using regex) takes at least 4: foo. It’s also extremely fast and lightweight, so I’m reasonably optimistic we’ll be able to use it elsewhere without causing memory issues.
Theknightwho (talk) 06:37, 18 October 2023 (UTC)
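To make the node-tree idea in point 1 concrete, here is a minimal Python sketch. It is not the actual parser (which is a Lua module); the class names `Node`, `Wikilink` and `Template` and the methods `walk` and `text` are hypothetical and chosen only for illustration. It shows plain text and special-purpose nodes mixed freely as children, with each node delegating iteration to its children.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union

# A child is either plain text or another node (hypothetical model).
Child = Union[str, "Node"]


@dataclass
class Node:
    children: List[Child] = field(default_factory=list)

    def walk(self) -> Iterator[Child]:
        """Yield every descendant depth-first, delegating to child nodes."""
        for child in self.children:
            yield child
            if isinstance(child, Node):
                yield from child.walk()

    def text(self) -> str:
        """Concatenate the plain text of this subtree."""
        return "".join(
            c if isinstance(c, str) else c.text() for c in self.children
        )


@dataclass
class Wikilink(Node):
    target: str = ""  # special-purpose data carried by this node type


@dataclass
class Template(Node):
    name: str = ""


# Nodes of different kinds can be nested arbitrarily.
tree = Node([
    "See ",
    Wikilink(["foo"], target="foo"),
    " and ",
    Template([], name="fa-IPA"),
])
link_targets = [n.target for n in tree.walk() if isinstance(n, Wikilink)]
```

Because every node exposes the same iteration methods, searching or text replacement (point 2) could in principle be layered on top of `walk` without each node type needing to know about the others.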
@Atitarev thank you for responding. Yes, I've been thinking of allowing fa-IPA to use vocalized text, and will probably add it soon. However, I made fa-IPA a bit messy by having it create the IPA, reading, and Arabic spelling all separately (which has led to maintenance difficulties). I am going to do a (hopefully small) rewrite to have fa-IPA make the romanizations first and do 1-to-1 conversions for the IPA and Arabic spellings, so those won't require (as much) maintenance. Unless @Benwing2 knows of a better way for me to organize the module. Ideally it would be as clean as Module:ko-pron, but my Lua ability is not there yet, and I can't understand the code used there. But if it created the romanizations first, it would also be very easy to have the input text be vocalized Arabic, since the conversion would only need to happen once. Stress marks may be an issue, though Ben did mention having it add stress marks automatically.
Though (as fa-IPA currently requires with Latin text) the input text must be Classical Persian. I think it would be a bit awkward and inconsistent to require classical-style vocalization in fa-IPA but require a completely different style of vocalization in the header. That is part of why I think we should avoid vocalizations in the header. It also causes issues for Dari and Classical Persian terms. Even for terms that are not exclusive to Dari and Classical, it's quite annoying to have to transcribe a word twice using two different schemes. Activating automatic transliteration in this state would just mean I need to add diacritics to the same word twice. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 07:38, 18 October 2023 (UTC)
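As a rough illustration of the "romanization first, then 1-to-1 conversions" idea, here is a small Python sketch. It is not the fa-IPA module: the table `ROMAN_TO_IPA` is a hypothetical and deliberately incomplete set of Classical correspondences, and the function names `to_ipa` and `transcribe` are made up for this example. Stress and syllable boundaries are ignored.

```python
import unicodedata

# Hypothetical, incomplete table: Classical romanization letter -> IPA.
# Letters not listed are passed through unchanged.
ROMAN_TO_IPA = str.maketrans({
    "\u0101": "\u0251\u02d0",  # ā -> ɑː
    "\u012b": "i\u02d0",       # ī -> iː
    "\u016b": "u\u02d0",       # ū -> uː
    "\u0113": "e\u02d0",       # ē -> eː
    "\u014d": "o\u02d0",       # ō -> oː
    "\u0161": "\u0283",        # š -> ʃ
    "y": "j",                  # y -> j
})


def to_ipa(romanization: str) -> str:
    """Convert a romanization to IPA with one table lookup per letter."""
    # NFC makes sure letters like ī are single code points before lookup.
    return unicodedata.normalize("NFC", romanization).translate(ROMAN_TO_IPA)


def transcribe(romanization: str) -> dict:
    """Derive every output from the single romanized form."""
    roman = unicodedata.normalize("NFC", romanization)
    return {"romanization": roman, "ipa": "/" + to_ipa(roman) + "/"}


result = transcribe("daq\u012bqa")  # daqīqa
```

With this shape, any input format (Latin or vocalized Arabic script) would only need one converter into the romanization; everything else is a table lookup.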
Oh, I should mention: both Persian transliteration modules are a bit behind Urdu's. I've been feeling a bit unmotivated recently and haven't given them the attention they need. I will fix them soon though. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 09:26, 18 October 2023 (UTC)
@Sameerhameedy I think this would be confusing without small superscripts (or maybe tooltips) indicating what the different transliterations refer to. I also think the Iranian Persian one should go first since it corresponds to the large majority of speakers today. Benwing2 (talk) 04:50, 9 November 2023 (UTC)
@Benwing2, @Sameerhameedy: Hi
On Template talk:User:Sameerhameedy/fa-l I also suggested adding the Iranian vocalisation, which can possibly be derived from the classical one, but not always.
E.g. for خْوَد بَه خْوَد (C. xwad ba xwad, I. xod be xod), the Iranian vocalisation would be
خود بِه خود (xod be xod) (manual substitution or translation may be required). Anatoli T. (обсудить/вклад) 05:13, 9 November 2023 (UTC)
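A rule-based derivation with a manual fallback could look roughly like the Python sketch below. The rules are toy assumptions inferred only from the two examples in this thread (daqīqa → dağiğe, xwad ba xwad → xod be xod) plus the usual vowel correspondences; they are not the rules of any existing module, and `classical_to_iranian` and its `overrides` parameter are hypothetical names.

```python
import unicodedata

# Toy single-letter correspondences, Classical -> Iranian romanization.
# str.translate applies them in one pass, so ī -> i is not re-mapped to e.
_CHAR_MAP = str.maketrans({
    "q": "\u011f",       # q -> ğ
    "i": "e",            # short i -> e
    "u": "o",            # short u -> o
    "\u012b": "i",       # ī -> i
    "\u016b": "u",       # ū -> u
    "\u0113": "i",       # ē -> i
    "\u014d": "u",       # ō -> u
    "\u0101": "\u00e2",  # ā -> â
})


def classical_to_iranian(text, overrides=None):
    """Derive an Iranian romanization word by word; overrides win."""
    overrides = overrides or {}
    out = []
    for word in unicodedata.normalize("NFC", text).split():
        if word in overrides:
            # Manual substitution for words the rules get wrong.
            out.append(overrides[word])
            continue
        word = word.replace("xwa", "xo")  # assumed rule: xwad -> xod
        if word.endswith("a"):
            word = word[:-1] + "e"        # assumed rule: final -a -> -e
        out.append(word.translate(_CHAR_MAP))
    return " ".join(out)


derived = classical_to_iranian("xwad ba xwad")
```

The `overrides` mapping stands in for the "manual substitution" case: whenever the automatic result is wrong, an editor-supplied form takes precedence.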