Seek-and-replace
From Cpp Tea :: Articles
Sunday, 24.02.2008, 12:03
Recently, a phpBB2 forum I participate in tried to migrate to phpBB3. The bad luck was that the forum already ran on UTF-8, a case the phpBB2-to-phpBB3 converter does not handle. As a result, every letter that is not plain ASCII was stored in UTF-8 on the phpBB3 board as if its bytes had been separate single-byte ("ASCII") characters on the phpBB2 board. For example, the German letter "ä" is represented in UTF-8 by two bytes (0xC3 0xA4), which look like "Ã¤" when read one byte at a time in a single-byte encoding such as Windows-1252. The problem after the conversion was that such an umlaut was stored as the UTF-8 form of "Ã¤", which in turn looks like "ÃƒÂ¤" when read byte by byte.
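To see where this kind of garbage comes from, here is a minimal Python sketch of the double encoding. It assumes Windows-1252 as the single-byte encoding (the article only says "ASCII", so that choice is a guess), and the helper name misread_as_single_byte is made up for illustration; it is not part of the converter or of my program.

```python
# Sketch only: Windows-1252 ("cp1252") is an assumption standing in for
# the single-byte reading that the article loosely calls "ASCII".

def misread_as_single_byte(text, single_byte="cp1252"):
    """Encode text as UTF-8, then wrongly decode every byte as one character."""
    return text.encode("utf-8").decode(single_byte)

original = "ä"
once = misread_as_single_byte(original)   # "Ã¤": a correct UTF-8 "ä" seen byte by byte
twice = misread_as_single_byte(once)      # "ÃƒÂ¤": the same damage applied a second time

# Hex dumps of the UTF-8 bytes at each stage
print(original.encode("utf-8").hex())     # c3a4
print(once.encode("utf-8").hex())         # c383c2a4
```

Each pass turns one non-ASCII character into two or more, which is why a single umlaut ends up as a short run of odd symbols.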
So yeah, it was a crappy deal that the installation didn't handle that. Since the forum staff could not solve this, I wrote a simple program that searches for certain entries and replaces them; the entries are listed in an external dictionary file. Perhaps one could devise an ultimate algorithm that reverses this unnatural transformation, but I didn't bother doing that. You can find the program I made here:
And the usage is simple:
fr <input file> <output file>
The dictionary, dic.html, is defined as a list of triplets: bad entity, TAB, good entity. Each triplet must be written on its own line, and TAB is a single byte with decimal value 9. The dictionary you will find in the ZIP file is the one used to convert the forum DB I spoke of. So, let me state the procedure exactly:
- Lock the forum
- Get yourself a copy of the DB locally
- Use this program to convert it
- You will get a new file. Try to restore your forum DB from that file
- If everything works, unlock the forum. Otherwise, bring back the old DB and then unlock the forum, noting that this app didn't help you.
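
For readers who want to see the idea rather than run the binary, the dictionary-driven replacement described above can be sketched in Python roughly like this. This is not the source of my program, just an illustration under assumptions: the function names are invented, the files are assumed to be readable as UTF-8, and matching longest entries first in a single pass is a design choice of the sketch.

```python
import re

def load_dictionary(lines):
    """Parse 'bad<TAB>good' lines into a list of (bad, good) pairs."""
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue                      # skip blank or malformed lines
        bad, good = line.split("\t", 1)   # TAB is the byte with decimal value 9
        if bad:
            pairs.append((bad, good))
    return pairs

def replace_all(text, pairs):
    """Replace every bad entry with its good counterpart in one pass."""
    table = dict(pairs)
    if not table:
        return text
    # Longest entries first, so a short entry never splits a longer one.
    ordered = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(bad) for bad in ordered))
    return pattern.sub(lambda m: table[m.group(0)], text)

def convert_file(input_path, output_path, dictionary_path="dic.html"):
    """Rough equivalent of: fr <input file> <output file>."""
    with open(dictionary_path, encoding="utf-8") as f:
        pairs = load_dictionary(f)
    # newline="" keeps the dump's line endings exactly as they were
    with open(input_path, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(output_path, "w", encoding="utf-8", newline="") as f:
        f.write(replace_all(text, pairs))
```

With a dictionary containing, say, the lines "ÃƒÂ¤<TAB>ä" and "ÃƒÂ¶<TAB>ö" (example entries, not necessarily the ones in the ZIP), replace_all would turn "MÃƒÂ¤dchen" back into "Mädchen". Doing everything in one pass means the output of one replacement is never picked up again by another entry.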
