CppBot's alive

From Cpp Tea :: Articles

Date:
Wednesday, 28.11.2007, 11:17

What pulled me away from my daily obligations is the fact that this morning my Wikipedia bot, CppBot, succeeded in making its first edit. Over the past few months I have been developing an API capable of some harder work on Wikipedia articles, but actually editing them was always my weak spot. Getting that part working opens the door to many new possibilities.

The main problem with articles on the Serbian-language Wikipedia is that the language allows both the Latin and the Cyrillic alphabet. As a result, people sometimes edit a Cyrillic article in the Latin alphabet (which is not allowed), and vice versa, and, sadly, people often write words that mix letters from both alphabets. This is often hard to spot visually, and according to the statistics I have been collecting, the problem is growing from month to month. For example, since last month sr.wiki gained about a million words, but the percentage of bad words grew from 0.21% to 0.27%, and the percentage of articles containing bad words grew from 28.36% to 28.60%.
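The check itself is easy to automate once words are decoded into Unicode code points: a word is bad if it contains at least one Latin and at least one Cyrillic letter. The C++ sketch below is not CppBot's actual code, and its function names are hypothetical. It assumes well-formed UTF-8 input and articles that are already split into words, treats U+0400 to U+04FF as Cyrillic, and treats Basic Latin, Latin-1 Supplement and Latin Extended-A letters (which include č, ć, đ, š and ž) as Latin. It also computes the two percentages quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Decode one UTF-8 code point starting at s[i] and advance i past it.
// Assumption for this sketch: the input is well-formed UTF-8.
char32_t nextCodePoint(const std::string& s, std::size_t& i) {
    assert(i < s.size());
    unsigned char c = static_cast<unsigned char>(s[i]);
    char32_t cp;
    int extra;
    if (c < 0x80)      { cp = c;        extra = 0; }  // 1-byte sequence
    else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }  // 2-byte sequence
    else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }  // 3-byte sequence
    else               { cp = c & 0x07; extra = 3; }  // 4-byte sequence
    ++i;
    for (int k = 0; k < extra && i < s.size(); ++k, ++i)
        cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
    return cp;
}

// Cyrillic block, which contains all Serbian Cyrillic letters.
bool isCyrillic(char32_t cp) { return cp >= 0x0400 && cp <= 0x04FF; }

// Basic Latin letters plus Latin-1 Supplement and Latin Extended-A letters
// (excluding the multiplication and division signs).
bool isLatin(char32_t cp) {
    return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z') ||
           (cp >= 0x00C0 && cp <= 0x017F && cp != 0x00D7 && cp != 0x00F7);
}

// A "bad" word contains letters from both alphabets.
bool isMixedWord(const std::string& word) {
    bool latin = false, cyrillic = false;
    for (std::size_t i = 0; i < word.size();) {
        char32_t cp = nextCodePoint(word, i);
        latin = latin || isLatin(cp);
        cyrillic = cyrillic || isCyrillic(cp);
    }
    return latin && cyrillic;
}

struct MixedStats {
    double badWordPercent;     // bad words / all words * 100
    double badArticlePercent;  // articles with a bad word / all articles * 100
};

// Each article is given as a list of already tokenized words.
MixedStats computeStats(const std::vector<std::vector<std::string>>& articles) {
    std::size_t words = 0, badWords = 0, badArticles = 0;
    for (const auto& article : articles) {
        bool hasBad = false;
        for (const auto& w : article) {
            ++words;
            if (isMixedWord(w)) { ++badWords; hasBad = true; }
        }
        if (hasBad) ++badArticles;
    }
    MixedStats st{0.0, 0.0};
    if (words > 0) st.badWordPercent = 100.0 * badWords / words;
    if (!articles.empty()) st.badArticlePercent = 100.0 * badArticles / articles.size();
    return st;
}
```

For example, "mama" typed with a Cyrillic "а" in the second position counts as mixed, while an all-Latin word such as "čaj" does not, even though "č" lies outside ASCII.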

Since a new dump also came out yesterday, I indexed all the words from sr.wikipedia as well, which gave me some insight into how to organize the bot's future changes so as to minimize the number of words written in mixed alphabets.
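One way such an index can guide the bot, assuming the goal is to remove as many bad occurrences as possible per fix, is to count how often each distinct word appears and then fix flagged words in order of decreasing frequency. The C++ sketch below is a hypothetical illustration, not CppBot's code: `WordIndex`, `indexText`, `rankBadWords` and `occurrencesFixedByTop` are invented names, tokenization is simplified to splitting on whitespace (real wikitext also needs markup and punctuation handling), and the test for a bad word is passed in as a parameter, for example a check for letters from both alphabets.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Word index: how many times each word occurs across all indexed texts.
using WordIndex = std::unordered_map<std::string, std::size_t>;
using Ranked = std::vector<std::pair<std::string, std::size_t>>;

// Add every whitespace-separated token of one article text to the index.
void indexText(const std::string& text, WordIndex& index) {
    std::istringstream in(text);
    std::string word;
    while (in >> word) ++index[word];
}

// Collect the words flagged by isBad, most frequent first, so that fixing
// them in this order removes the most bad occurrences per distinct word.
Ranked rankBadWords(const WordIndex& index,
                    const std::function<bool(const std::string&)>& isBad) {
    Ranked bad;
    for (const auto& [word, count] : index)
        if (isBad(word)) bad.emplace_back(word, count);
    std::sort(bad.begin(), bad.end(), [](const auto& a, const auto& b) {
        if (a.second != b.second) return a.second > b.second;
        return a.first < b.first;  // deterministic order for ties
    });
    return bad;
}

// Number of bad occurrences removed if the first k ranked words are fixed.
std::size_t occurrencesFixedByTop(const Ranked& ranked, std::size_t k) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < ranked.size() && i < k; ++i) total += ranked[i].second;
    return total;
}
```

Because bad words tend to repeat across articles, ranking by frequency lets a small number of distinct fixes cover a large share of the bad occurrences.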

I will report back with the results as soon as my daily obligations allow me to play with all this stuff.