User:KovachevBot/bg-anagrams Report

From Wiktionary, the free dictionary

Report of details

  • This project tasked itself with expanding as many Bulgarian entries as possible with anagrams, using the database of rechnik.chitanka.info as a source from which the list of anagrams was generated.
  • The methodology was to generate a dictionary mapping alphagrams (sequences where the letters of a word are arranged in alphabetical order) to the anagrams that are spelled with that alphagram's letters.
  • This produced an easily iterable structure, which the bot then enumerated, adding an Anagrams section to an entry if it didn't exist, and otherwise adding elements to the existing section.
  • There was considerable debate amongst fellow editors as to what the scope of the project should be: whether non-lemmas should be included, whether to automatically generate their entries if they don't exist, and more.
  • In the end, I went with Chernorizets's idea of skipping non-lemmas, until such a time comes as we're ready to document them. For now, the lemma-only coverage will help us to add common lemmas, whereas non-lemma coverage can be incrementally added later.
  • The script ran over more than 3,000 anagrams, but in the end far fewer were added, because many of the corresponding entries do not yet exist on Wiktionary. Some of the most curious anagrams were енцефаломенингоцеле (encefalomeningocele) and менингоенцефалоцеле (meningoencefalocele), which shockingly both have the alphagram агеееееиллмннноофцц!
  • We await a future instalment of the project in which we add non-lemmas into the mix.
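
The alphagram mapping described above can be sketched roughly as follows. This is a minimal illustration rather than the bot's actual code: the function names and the existing_entries parameter are hypothetical, and the sample word list is made up for the example. It assumes that sorting by Unicode code point is good enough, which holds for the lowercase Bulgarian alphabet.

```python
from collections import defaultdict


def alphagram(word):
    """Return the letters of `word` sorted by code point."""
    return "".join(sorted(word))


def group_anagrams(words, existing_entries=None):
    """Map each alphagram to the sorted list of distinct words sharing it.

    `existing_entries` is a hypothetical set of page titles that already
    exist; words outside it are skipped, mirroring the lemma-only scope.
    Groups with fewer than two words are dropped, since a word on its own
    has no anagrams.
    """
    groups = defaultdict(set)
    for word in words:
        if existing_entries is not None and word not in existing_entries:
            continue
        groups[alphagram(word)].add(word)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


def anagrams_of(word, groups):
    """Return the other words that share `word`'s alphagram."""
    return [other for other in groups.get(alphagram(word), []) if other != word]


# Illustrative input only; the real list came from rechnik.chitanka.info.
sample = ["енцефаломенингоцеле", "менингоенцефалоцеле", "от"]
groups = group_anagrams(sample)
print(anagrams_of("енцефаломенингоцеле", groups))
```

Keeping the groups as a plain dictionary means the add-or-extend step only needs to look up an entry's alphagram and compare the result with whatever the entry's Anagrams section already lists.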

Errors

  • There were a few minor mistakes generated, largely centering around forms which also appear with a hyphen at the end (as a prefix marker), such as от linking to an "anagram" от-. These aren't anagrams, since their arrangement of letters is exactly the same.
  • I wrote a new script to scour through all the generated terms; it flagged roughly twenty mistakes, which I believe I have now corrected.
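
A check of the kind described above might look something like the sketch below. It is not the script that was actually run: the function names are hypothetical, the input is assumed to be a dictionary from entry title to its listed anagrams, and the sample data is made up. The only rule it applies is the one stated above, namely that two terms whose letters are in exactly the same order once hyphens are removed are not anagrams of each other.

```python
def strip_hyphens(term):
    """Remove hyphens so that affix forms such as 'от-' compare equal to 'от'."""
    return term.replace("-", "")


def is_spurious_pair(title, candidate):
    """True if the two terms have identical letter order once hyphens are ignored."""
    return strip_hyphens(title) == strip_hyphens(candidate)


def find_spurious(entries):
    """Return (title, candidate) pairs that should not be listed as anagrams.

    `entries` maps an entry title to the list of anagrams recorded for it.
    """
    return [
        (title, candidate)
        for title, candidates in entries.items()
        for candidate in candidates
        if is_spurious_pair(title, candidate)
    ]


# Illustrative data only.
listed = {"от": ["от-", "то"]}
print(find_spurious(listed))
```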