Botwork on sr.wp during 2008
From Cpp Tea :: Articles
Cleanup
Saturday, 9th August, 2008 at 21:10
Over the past few days I have put my bot to work on basically two types of article corrections: cyrlat words and units of measurement.
In Serbian, both the Cyrillic and the Latin script are valid. Still, no single word should contain both Cyrillic and Latin letters, so such words have to be repaired, and they occur often. The problem is not as simple as in Bulgarian or Russian, because no algorithm can tell whether a given letter needs a homoglyph replacement or a transliteration. The bot therefore had to produce a detailed log showing which word was changed to what, and in which article. That log had to be read in full, and any needed corrections made, before the job could be called done. To be on the safe side, the bot made only 769 edits here, which fixed over 4500 words. More will follow in the next pass. Here is an example.
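To make the detection and logging step concrete, here is a minimal Python sketch. It is not the bot's actual code: the function names, the log format, and the small lookalike table are all assumptions made for illustration. It only proposes a fix when every Latin letter in a mixed word has a Cyrillic lookalike, and otherwise leaves the word for a human to decide.

```python
import re
import unicodedata

# Assumed, deliberately partial table: Latin letters mapped to the Cyrillic
# letters they visually resemble (written as escapes to avoid confusion).
LATIN_LOOKALIKES = {
    "a": "\u0430", "e": "\u0435", "o": "\u043e", "j": "\u0458",
    "A": "\u0410", "E": "\u0415", "O": "\u041e", "J": "\u0408",
    "K": "\u041a", "M": "\u041c", "T": "\u0422",
}

def script_of(ch):
    """Classify a character as 'latin', 'cyrillic' or 'other' by its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "latin"
    if name.startswith("CYRILLIC"):
        return "cyrillic"
    return "other"

def find_mixed_words(text):
    """Return the words that contain both Latin and Cyrillic letters."""
    mixed = []
    for word in re.findall(r"[^\W\d_]+", text):  # runs of letters only
        scripts = {script_of(ch) for ch in word}
        if {"latin", "cyrillic"} <= scripts:
            mixed.append(word)
    return mixed

def propose_fix(word):
    """Return an all-Cyrillic candidate, or None if a human must decide.

    Assumption: the word was meant to be Cyrillic. A Latin letter without
    a lookalike may need transliteration instead, so no guess is made.
    """
    out = []
    for ch in word:
        if script_of(ch) == "latin":
            if ch not in LATIN_LOOKALIKES:
                return None
            out.append(LATIN_LOOKALIKES[ch])
        else:
            out.append(ch)
    return "".join(out)

def build_log(title, text):
    """One log row per mixed word: (article title, word found, proposed fix)."""
    return [(title, w, propose_fix(w)) for w in find_mixed_words(text)]
```

A row whose proposed fix is None marks exactly the kind of case that needs a human to read the log.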
Because both scripts are valid in Serbian, the site also has tabs that let a user read all of its content transliterated into either Latin or Cyrillic script. But the rules of written Serbian allow only Latin script for the names of units of measurement such as kg, m, kN, mph etc., which creates the need for a tag that stops the transliterator from converting these words. The tags are -{ and }-, and they have to be placed around every unit. So "m" would be incorrect, while "-{m}-" would be correct. This is tedious, though, and many people write the units in Cyrillic, because it is inconvenient to keep switching from the Cyrillic to the Latin keyboard layout just to type a unit. That means plain "кг" instead of "-{kg}-", and so on. I set my bot the task of cleaning these up by converting them to Latin script wherever it finds one and adding -{ }- tags wherever appropriate. Here I had problems again, because some units are ambiguous with other words, but these cases were under 7% of all edits and did not spoil whole edits, only one or two of the units changed in each. I corrected them, of course. Here the bot made 726 edits and fixed over 5000 units. It practically cleaned Wikipedia of the bad cases. Here is an example.
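The unit cleanup can be sketched in Python along the following lines. This is an illustration under stated assumptions, not the filter the bot actually used: the unit tables are partial and hypothetical, and requiring a digit directly before the unit is one assumed way to reduce (not eliminate) the ambiguity with ordinary words.

```python
import re

# Assumed, partial map of units typed in Cyrillic to their Latin spelling.
CYRILLIC_UNITS = {"кг": "kg", "км": "km", "цм": "cm", "м": "m"}
# Assumed, partial list of units already typed in Latin.
LATIN_UNITS = ["kg", "km", "cm", "kN", "mph", "m"]

# Longest alternatives first, so that "km" is tried before "m".
_ALTERNATION = "|".join(
    sorted(map(re.escape, list(CYRILLIC_UNITS) + LATIN_UNITS), key=len, reverse=True)
)
# A unit counts only right after a number (optional space) and only when it
# is not the beginning of a longer word.
UNIT_PATTERN = re.compile(r"(?<=\d)(\s?)(" + _ALTERNATION + r")(?!\w)")

def wrap_units(text):
    """Convert Cyrillic-typed units to Latin and wrap units in -{ }- tags."""
    def repl(match):
        space, unit = match.group(1), match.group(2)
        latin = CYRILLIC_UNITS.get(unit, unit)
        return space + "-{" + latin + "}-"
    return UNIT_PATTERN.sub(repl, text)
```

Because an already tagged unit starts with "-{" rather than with the unit itself, running the function a second time leaves the text unchanged, which matters for a bot that revisits articles.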
While doing this I kept correcting the filters until they became quite stable, which gave me the idea to combine them and let the bot fix articles in real time, as it finds them in the list of recent changes. Of course, I made sure that the bot does not interrupt a user who has just started working on an article. However, I can speed it up manually.
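How the bot avoids interrupting an editor is not spelled out above. One plausible approach, shown here purely as an assumption, is a grace period: an article is picked up only once its latest edit is older than some threshold. The names `GRACE` and `ready_for_bot`, the 30-minute value, and the (title, timestamp) input format are all hypothetical.

```python
from datetime import datetime, timedelta

GRACE = timedelta(minutes=30)  # assumed waiting time, not taken from the bot

def ready_for_bot(changes, now, grace=GRACE):
    """Return titles whose most recent edit is at least `grace` old.

    `changes` is an iterable of (title, timestamp) pairs, as they might be
    read from a recent-changes list (the input format is an assumption).
    """
    latest = {}
    for title, stamp in changes:
        # Keep only the newest edit time seen for each article.
        if title not in latest or stamp > latest[title]:
            latest[title] = stamp
    return sorted(t for t, stamp in latest.items() if now - stamp >= grace)
```

Passing a smaller `grace` value would correspond to speeding the bot up manually.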
Cleanup 2
Saturday, 30th August, 2008 at 12:37
Some time ago I started running the bot through the list of recent changes to repair cyrlat words and units of measurement. On average, the bot makes about 30 edits per day on that account alone. It is amazing that so many things need repairing in articles on a daily basis.
