User:Bequw/GraphAnalysis


I would like to plan a program to do a more complete graph analysis of the definitions in Wiktionary (basic graph idea). It would be a generalization of similar projects such as User:RJFJR/WTconcord and User:Robert_Ullmann/Missing. It would find not only missing words but also circular definitions (small cycles). To work, it would have to extract only the definition sections of each entry (if it also pulled in the synonyms, most complete definitions would be circular). It should also be able to gauge the size of Wiktionary's defining vocabulary (the set of 2k+ words that are circularly defined). Though we probably don't want to make it as strict as Longman's (see here and here), it would be helpful to know what the current set is. If it's too big, maybe we can take steps to reduce it.
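A rough sketch of the idea in Python, to make the plan concrete. It assumes the definition lines have already been extracted into a plain dict of headword to definition strings. That input format, the crude tokenizer, and all function names are hypothetical, not part of any existing tool. It treats each entry as a node with an edge to every word used in its definitions, then reports missing words, small cycles, and the words that are circularly defined (used here as a rough stand-in for the defining vocabulary).

```python
import re


def build_graph(definitions):
    """Map each headword to the set of words used in its definitions.

    `definitions` is an assumed input format: {headword: [definition, ...]},
    containing definition text only (no synonym sections).
    """
    graph = {}
    for word, senses in definitions.items():
        used = set()
        for sense in senses:
            # Crude tokenizer (an assumption): lowercase ASCII letter runs.
            used.update(re.findall(r"[a-z]+", sense.lower()))
        graph[word] = used
    return graph


def missing_words(graph):
    """Words used in some definition that have no entry of their own."""
    return set().union(*graph.values()) - graph.keys()


def small_cycles(graph, max_len=3):
    """Directed cycles of at most `max_len` words.

    Each cycle is reported once, as a tuple starting at its
    alphabetically smallest word.
    """
    cycles = set()

    def walk(start, node, path):
        for nxt in graph.get(node, ()):
            if nxt == start:
                cycles.add(tuple(path))
            elif nxt > start and nxt not in path and len(path) < max_len:
                walk(start, nxt, path + [nxt])

    for start in graph:
        walk(start, start, [start])
    return cycles


def circularly_defined(graph):
    """Headwords that can be reached again by following their own definitions."""
    result = set()
    for word in graph:
        stack, seen = list(graph[word]), set()
        while stack:
            node = stack.pop()
            if node == word:
                result.add(word)
                break
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, ()))
    return result


# Toy data for illustration only; not taken from Wiktionary.
sample = {
    "big": ["large in size"],
    "large": ["big"],
    "size": ["how big something is"],
    "in": ["within"],
}
g = build_graph(sample)
print(sorted(missing_words(g)))
print(sorted(small_cycles(g)))
print(sorted(circularly_defined(g)))
```

The per-word search in `circularly_defined` is quadratic in the worst case, so a full run would probably want a strongly-connected-components pass instead. The sketch only shows the shape of the analysis.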