Umlauts and other diacritics are broken
Moderator: Alastair
Try Games > Browse by language and select a language that uses diacritical marks, such as Danish or German. Now browse through the list of games and you'll see all sorts of strange characters where umlauts or other non-ASCII characters should appear in game titles and authors' names.
Can someone please fix this?
Re: Umlauts and other diacritics are broken
Thanks, Garry, for bringing this to our attention. I noticed this for one entry back on the 16th, but I hadn't realised it was site-wide. I hope there is an easy solution; the thought of correcting each error individually does not appeal.
- Gunness
- Site Admin
- Posts: 1951
- Joined: Tue Dec 07, 2004 7:04 pm
- Location: Copenhagen, Denmark
- Contact:
Re: Umlauts and other diacritics are broken
I presume that it's a display error, as I don't know why these specific characters should have been corrupted.
As far as I can tell, all pages are displayed with UTF-8, which *is* able to display special characters such as these. I don't have any ideas straight away but I'll investigate.
-
Mr Creosote
- Posts: 1146
- Joined: Tue Sep 22, 2009 9:23 am
- Contact:
Re: Umlauts and other diacritics are broken
My guess is a database upgrade gone wrong. I can check this afternoon.
- Gunness
- Site Admin
- Posts: 1951
- Joined: Tue Dec 07, 2004 7:04 pm
- Location: Copenhagen, Denmark
- Contact:
Re: Umlauts and other diacritics are broken
Much appreciated!
-
Mr Creosote
- Posts: 1146
- Joined: Tue Sep 22, 2009 9:23 am
- Contact:
Re: Umlauts and other diacritics are broken
Can you help assemble a list of characters that need replacing, i.e. how they look now and what they are supposed to be? I have so far:
Code: Select all
'ü' => 'ü',
'ä' => 'ä',
'ö' => 'ö',
'Ü' => 'Ü',
'é' => 'é',
'á' => 'á',
'ê' => 'ê'
Also, are fields other than the title affected?
Re: Umlauts and other diacritics are broken
Add
Code: Select all
½ => ½
â => â
è => è
Ç => Ç
Å« => ū
Also, I am seeing î (see links below); I think it should be Î or î.
https://solutionarchive.com/game/id%2C9 ... %2C+L.html
https://solutionarchive.com/game/id%2C1 ... 2C+Le.html
I'm guessing that the ¿ in ¿La sentencia? - https://solutionarchive.com/game/id%2C6 ... 2C+La.html - should be ¿ and that the Ã in the author's name "José Luis DÃaz" should be í, but then we have an instance where just one faulty character, rather than two, represents the real one.
(Note that since ½ => ½ and ¿ may => ¿, it could be that all instances of Â{character} resolve to {character}.)
I also see https://solutionarchive.com/game/id%2C9 ... A9lye.html; what the title and the author's name are supposed to be I cannot guess.
Yes, other fields are affected too: see https://solutionarchive.com/game/id%2C5 ... glub!.html for an example where "Related" (no surprise, since it contains the game's title) and "Notes" are affected.
Re: Umlauts and other diacritics are broken
This is a rough equation in hex I've devised that may explain what is going on, where:
x is the value of the actual character
y is the value of the first rogue character
z is the value of the second rogue character
I'm getting the values from https://en.wikipedia.org/wiki/List_of_U ... characters
x = 40(y - C2) + z
Some examples, all of which agree with what is known (for the third example, see the note):
Code: Select all
Ã = 00C3
¤ = 00A4
x = 40(C3 - C2) + A4 = 40 + A4 = E4
00E4 = ä
Code: Select all
Å = 00C5
« = 00AB
x = 40(C5 - C2) + AB = C0 + AB = 16B
016B = ū
Code: Select all
Ã = 00C3
{Soft hyphen} = 00AD
x = 40(C3 - C2) + AD = 40 + AD = ED
00ED = í
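The equation above can be checked with a short Python sketch. The helper name `combine` is made up for illustration, and the cross-check rests on one assumption: that the two rogue characters are really the two bytes of a UTF-8 sequence, which is what the formula amounts to for two-byte sequences.

```python
def combine(y: int, z: int) -> int:
    """Apply x = 0x40 * (y - 0xC2) + z to the two rogue code points."""
    return 0x40 * (y - 0xC2) + z


# (first rogue character, second rogue character) from the three examples
pairs = [(0xC3, 0xA4), (0xC5, 0xAB), (0xC3, 0xAD)]
decoded = [chr(combine(y, z)) for y, z in pairs]
print(decoded)

# Cross-check the formula against Python's own UTF-8 decoder
# for every valid two-byte sequence (lead byte C2..DF, trail byte 80..BF).
matches_utf8 = all(
    chr(combine(y, z)) == bytes([y, z]).decode("utf-8")
    for y in range(0xC2, 0xE0)
    for z in range(0x80, 0xC0)
)
print(matches_utf8)
```

The formula only covers two-byte sequences. Characters that UTF-8 stores in three bytes, such as a typographic apostrophe, show up as three rogue characters and need the general decoder instead.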
N.B. The soft hyphen would explain the "Díaz" issue.
- Gunness
- Site Admin
- Posts: 1951
- Joined: Tue Dec 07, 2004 7:04 pm
- Location: Copenhagen, Denmark
- Contact:
Re: Umlauts and other diacritics are broken
Mr Creosote wrote: Sat May 30, 2026 12:41 pm
Can you help assembling a list of characters which will need replacing - i.e. how they look now and what they are supposed to be? I have so far:
Code: Select all
'ü' => 'ü',
'ä' => 'ä',
'ö' => 'ö',
'Ü' => 'Ü',
'é' => 'é',
'á' => 'á',
'ê' => 'ê'
Also, are other fields than the title also affected?
I have these:
'Ã¥' => 'å'
'Ø' => 'Ø'
'ø' => 'ø'
'æ' => 'æ'
'Ä' => 'Ä'
This ought to cover the various Scandinavian languages.
A few French characters:
'Ç' => 'Ç'
'â' => 'â'
'è' => 'è'
- Gunness
- Site Admin
- Posts: 1951
- Joined: Tue Dec 07, 2004 7:04 pm
- Location: Copenhagen, Denmark
- Contact:
Re: Umlauts and other diacritics are broken
Yes, the user comments - see: https://solutionarchive.com/game/id%2C3 ... Karma.html:
"I’m a big fan of Avalon Hill Microcomputer Games and have nearly all of these titles in my game collection. Lords of Karma can’t be 1978 given one primary fact. Avalon Hill in context to computer games first presented offerings to the public for sale at the Origins Gaming Convention on June 27–29 1980"
What's worrying here is that the bug also seems to affect apostrophes and dashes?
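One possible explanation, sketched in Python: typographic apostrophes and en dashes are non-ASCII characters too, so they garble the same way as umlauts, only into three characters instead of two. The sketch assumes the damage comes from UTF-8 bytes being read as Windows-1252, which matches the characters seen in the quoted comment but is not confirmed anywhere in this thread. The function name `garble` is hypothetical.

```python
def garble(text: str) -> str:
    """Encode text as UTF-8, then misread those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


curly = garble("I\u2019m")      # U+2019 RIGHT SINGLE QUOTATION MARK
dash = garble("27\u201329")     # U+2013 EN DASH
plain = garble("I'm 27-29")     # plain ASCII apostrophe and hyphen

print(curly)
print(dash)
print(plain)
```

Plain ASCII punctuation survives unchanged, so only text typed or pasted with typographic punctuation would be affected.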
-
Mr Creosote
- Posts: 1146
- Joined: Tue Sep 22, 2009 9:23 am
- Contact:
Re: Umlauts and other diacritics are broken
Well, I can run a script over a number of database fields to re-encode characters. Though honestly, it will never be fully complete. You don't happen to have a backup from before this happened, do you?
- Gunness
- Site Admin
- Posts: 1951
- Joined: Tue Dec 07, 2004 7:04 pm
- Location: Copenhagen, Denmark
- Contact:
Re: Umlauts and other diacritics are broken
No, it would be pretty outdated, I'm afraid.
To avoid several passes, maybe we should ensure that the list of characters to be replaced is as complete as possible. I can take another look tomorrow, but don't have the time tonight.
Equally important, what can be done to avoid this in the future, other than restoring backups or running char replacement scripts?
Re: Umlauts and other diacritics are broken
Gunness wrote: Mon Jun 01, 2026 1:51 pm
Yes, the user comments - see: https://solutionarchive.com/game/id%2C3 ... Karma.html:
"I’m a big fan of Avalon Hill Microcomputer Games and have nearly all of these titles in my game collection. Lords of Karma can’t be 1978 given one primary fact. Avalon Hill in context to computer games first presented offerings to the public for sale at the Origins Gaming Convention on June 27–29 1980"
What's worrying here is that the bug also seems to affect apostrophes and dashes?
Looking at that page in the Wayback Machine - https://web.archive.org/web/20251015052 ... Karma.html - shows that the apostrophes and the dash in 27–29 are not the standard ASCII ' and -; they probably came from cutting and pasting from a word processor.
The Wayback Machine also shows that the problem occurred between 3rd April - https://web.archive.org/web/20260403032 ... chive.com/ - and 8th May - https://web.archive.org/web/20260508140 ... chive.com/ (search for "sur ma Cour" or "ologie en Aveugle" for a couple of examples).
-
Mr Creosote
- Posts: 1146
- Joined: Tue Sep 22, 2009 9:23 am
- Contact:
Re: Umlauts and other diacritics are broken
I think I have an algorithmic solution, not relying on a whitelist of specific characters (thanks, Alastair, your math approach nudged me in the right direction).
Here is an overview of what it would do if I put it live: https://solutionarchive.com/__umlauts/ (temporary page, will be deleted again after committing the fix). The ones marked in red do contain question marks, which could happen if my algorithm fails. Though of course, a question mark may be correctly part of a game title or a sentence. I.e. it just means we should manually check whether this proposed result is correct. A bit of sanity checking would be appreciated.
I'm still looking into what other database fields could be affected. Is there any impact on the forums as well or only on the website?
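For illustration only, here is a minimal Python sketch of one whitelist-free approach; it is not necessarily the algorithm used for the overview page. It assumes the damage came from UTF-8 bytes being read as Windows-1252, so it reverses that step, leaves a field untouched when the reversal fails, and flags any repaired result containing a question mark for manual checking. `repair_field` is a hypothetical name.

```python
from typing import Tuple


def repair_field(text: str) -> Tuple[str, bool]:
    """Return (text, needs_review) after trying to undo the garbling."""
    try:
        # Turn the rogue characters back into bytes, then read them as UTF-8.
        fixed = text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # The round trip failed, so the field is probably not garbled
        # in this way: leave it unchanged.
        return text, False
    # A question mark may be legitimate, so this only requests a manual check.
    return fixed, "?" in fixed


print(repair_field("J\u00c3\u00b6rg"))                 # garbled umlaut
print(repair_field("\u00c2\u00bfLa sentencia?"))       # garbled, ends in "?"
print(repair_field("J\u00f6rg"))                       # already correct
```

A field that was already correct fails the round trip and comes back unchanged, which is what makes the pass safe to run over mixed data. Python's `cp1252` codec leaves a few byte values undefined, so fields containing those would also be left untouched by this sketch.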
Re: Umlauts and other diacritics are broken
Mr Creosote wrote: Thu Jun 04, 2026 4:51 pm
I'm still looking into what other database fields could be affected. Is there any impact on the forums as well or only on the website?
The umlauts in the thread Jörg Walkowiak's "Gold Fever" - https://solutionarchive.com/phpBB3/viewtopic.php?t=871 - are present and correct. So the forum is probably unaffected.