Hello,
I tried to AutoTag an audiobook from Amazon Germany (de), and there seems to be a Unicode error with the German umlauts (ä, ö, ü). I guess more characters are affected as well, like ß.
Example: Search for the following album: "Die drei ??? - Folge 137/Pfad der Angst"
Screenshot: http://img717.imageshack.us/img717/1444 ... 133832.png
On the website, all information is displayed correctly: http://www.amazon.de/gp/product/B003273MKS/
I'm using the latest build (4.0.2) on Windows 7 x64.
Thanks, best regards,
tony.
AutoTag Web from Amazon - Encoding Error (Unicode)
Re: AutoTag Web from Amazon - Encoding Error (Unicode)
It's said to be a bug on Amazon's side: http://www.ventismedia.com/mantis/view.php?id=5974
Lowlander (MediaMonkey user since 2003)
Re: AutoTag Web from Amazon - Encoding Error (Unicode)
I'm not sure my issue is related to that, as there are no "unknown characters" (like the box or the question mark in the issue you posted).
The bug you posted describes an "invalid Unicode detection", but what I'm posting is just a decoding issue.
It's actually a pretty common Unicode (UTF-8) decoding error.
I'll try to expand on it further:
In UTF-8, these umlauts are encoded with two bytes (16 bits), while "normal" ASCII characters take a single byte (8 bits).
An ISO-8859-1 decoder maps every single byte to one character, so when it reads UTF-8 text it turns each umlaut into two 8-bit characters.
Example:
UTF-8 character "ü": 11000011 10111100
gets interpreted as two 8-bit blocks:
11000011 --> "Ã"
10111100 --> "¼"
which results in the two characters "ü" instead of "ü".
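Python's built-in codecs make it easy to check the byte values and reproduce the effect (a minimal sketch; the variable names are illustrative only):

```python
# Reproduce the decoding error: UTF-8 bytes read back as ISO-8859-1.
text = "ü"

raw = text.encode("utf-8")  # two bytes: 0xC3 0xBC
bits = " ".join(f"{byte:08b}" for byte in raw)
print(bits)  # 11000011 10111100

# ISO-8859-1 maps every byte to exactly one character,
# so the two-byte sequence comes out as two characters.
garbled = raw.decode("iso-8859-1")
print(garbled)  # Ã¼
```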
There are plenty of libraries that can handle the conversion.
As a workaround, a simple replacer could substitute "ü" with "ü" (and likewise for the other common characters); there are lists of these pairs on the internet, too.
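Instead of a replace list, the wrong decoding step can also be reversed directly with a standard library. A minimal Python sketch, assuming the text was mis-decoded as Windows-1252 (cp1252), which agrees with ISO-8859-1 everywhere except bytes 0x80 to 0x9F and is the codec that turns "Ü" into "Ãœ"; `repair_mojibake` is a hypothetical helper name, not part of MediaMonkey:

```python
def repair_mojibake(text: str, wrong_codec: str = "cp1252") -> str:
    """Undo one round of "UTF-8 bytes decoded as wrong_codec".

    If the text cannot be a mis-decoded UTF-8 string, it is returned
    unchanged, so correctly decoded input is normally left alone.
    """
    try:
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair_mojibake("Gr\u00c3\u00bc\u00c3\u0178e"))  # Grüße
print(repair_mojibake("Grüße"))                        # Grüße (unchanged)
```

A correct "ü" becomes the single byte 0xFC in Windows-1252, which is not valid UTF-8, so correct strings land in the `except` branch. The catch is that the whole string must have been mis-decoded; if only some characters are broken, a replace list is the better fit.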
Re: AutoTag Web from Amazon - Encoding Error (Unicode)
tonydl: we naturally handle UTF-8 strings, but sometimes the problem is on Amazon's side: the string MM receives as the XML response from their server is not plain UTF-8, because some parts of the response are encoded to UTF-8 twice. So when MM decodes it, it is still UTF-8-encoded, as you described. It only affects some records; the same album with the same umlauts can sometimes be received correctly from another Amazon server, or as another related record on the same Amazon server. They have it right on the web, but sometimes they send it incorrectly in the XML response.
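The double encoding described above can be reconstructed in a few lines of Python. This is a sketch under an assumption not confirmed in the thread: that the wrong step on the server side was a Windows-1252 decode of bytes that were already UTF-8. It shows why a single, correct UTF-8 decode on the client still yields "ü", and that a second decoding step recovers the text:

```python
# Hypothetical reconstruction of the server-side fault: correct UTF-8 bytes
# are mistaken for Windows-1252 text (assumption) and encoded to UTF-8 again.
original = "Grüße"

once = original.encode("utf-8")                # b'Gr\xc3\xbc\xc3\x9fe'
twice = once.decode("cp1252").encode("utf-8")  # encoded a second time

# The client decodes the XML text once, as it should ...
received = twice.decode("utf-8")
print(received)  # GrÃ¼ÃŸe

# ... but only a second decoding step recovers the original text.
recovered = received.encode("cp1252").decode("utf-8")
print(recovered)  # Grüße
```

Since only some parts of a response are affected, such a second step would have to be applied per field rather than to the whole XML document.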
Re: AutoTag Web from Amazon - Encoding Error (Unicode)
Thanks for the answer.
Could you maybe implement a replace workaround?
For the German letters I'm using the following replace() calls myself:
Code:
Ä --> Ä
ä --> ä
Ãœ --> Ü
ü --> ü
Ö --> Ö
ö --> ö
ß --> ß
As it's highly unlikely that the character pairs on the left appear in this combination in real text, it should be pretty failsafe. And the characters on the right are used quite a lot.
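The German replace() pairs can be sketched as a Python mapping (hypothetical names, not MediaMonkey's code). The keys are written as escapes because several of the garbled characters („, œ, –, Ÿ) are Windows-1252 characters that are easy to mistake for similar-looking ones:

```python
# Garbled sequence -> correct character, following the table above.
GERMAN_FIXES = {
    "\u00c3\u201e": "Ä",  # Ã„
    "\u00c3\u00a4": "ä",  # Ã¤
    "\u00c3\u0153": "Ü",  # Ãœ
    "\u00c3\u00bc": "ü",  # Ã¼
    "\u00c3\u2013": "Ö",  # Ã–
    "\u00c3\u00b6": "ö",  # Ã¶
    "\u00c3\u0178": "ß",  # ÃŸ
}


def fix_german(text: str) -> str:
    """Apply each replace() pair in turn; no key overlaps another."""
    for garbled, correct in GERMAN_FIXES.items():
        text = text.replace(garbled, correct)
    return text


print(fix_german("Stra\u00c3\u0178e"))  # Straße
```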
What do you think?
Edit: I found a larger table on the web, from which people from Spain, Portugal, Greece, etc. would benefit, too.
The replacement should again be pretty failsafe (no "wrong replaces"), because the character combinations on the right won't make any sense in real text.
I think it would be a pretty good solution.
(The table is reversed compared to the one above: correct character on the left, string to replace on the right.)
Code:
"¡" = "¡"
"¢" = "¢"
"£" = "£"
"¤" = "¤"
"¥" = "Â¥"
"¦" = "¦"
"§" = "§"
"¨" = "¨"
"©" = "©"
"ª" = "ª"
"«" = "«"
"¬" = "¬"
"®" = "®"
"¯" = "¯"
"°" = "°"
"±" = "±"
"²" = "²"
"³" = "³"
"´" = "´"
"µ" = "µ"
"¶" = "¶"
"·" = "·"
"¸" = "¸"
"¹" = "¹"
"º" = "º"
"»" = "»"
"¼" = "¼"
"½" = "½"
"¾" = "¾"
"¿" = "¿"
"À" = "À"
"Â" = "Â"
"Ã" = "Ã"
"Ä" = "Ä"
"Å" = "Ã…"
"Æ" = "Æ"
"Ç" = "Ç"
"È" = "È"
"É" = "É"
"Ê" = "Ê"
"Ë" = "Ë"
"Ì" = "ÃŒ"
"Î" = "ÃŽ"
"Ñ" = "Ñ"
"Ò" = "Ã’"
"Ó" = "Ó"
"Ô" = "Ô"
"Õ" = "Õ"
"Ö" = "Ö"
"×" = "×"
"Ø" = "Ø"
"Ù" = "Ù"
"Ú" = "Ú"
"Û" = "Û"
"Ü" = "Ü"
"Þ" = "Þ"
"ß" = "ß"
"à" = "Ã "
"á" = "á"
"â" = "â"
"ã" = "ã"
"ä" = "ä"
"å" = "Ã¥"
"æ" = "æ"
"ç" = "ç"
"è" = "è"
"é" = "é"
"ê" = "ê"
"ë" = "ë"
"ì" = "ì"
"í" = "Ã"
"î" = "î"
"ï" = "ï"
"ð" = "ð"
"ñ" = "ñ"
"ò" = "ò"
"ó" = "ó"
"ô" = "ô"
"õ" = "õ"
"ö" = "ö"
"÷" = "÷"
"ø" = "ø"
"ù" = "ù"
"ú" = "ú"
"û" = "û"
"ü" = "ü"
"ý" = "ý"
"þ" = "þ"
"ÿ" = "ÿ"
"†" = "†"
"Š" = "Å "
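A list like this can also be generated instead of typed in. A minimal Python sketch, assuming (as pairs such as "Ä" suggest) that the text was mis-decoded as Windows-1252; `build_fix_table` and `apply_fixes` are hypothetical helper names. Characters whose second UTF-8 byte is unassigned in Windows-1252 (Á, Í, Ï, Ð, Ý) cannot be garbled into a clean pair this way and are skipped, which matches the gaps in the list:

```python
def build_fix_table(wrong_codec: str = "cp1252") -> dict[str, str]:
    """Map each garbled sequence to the character it was meant to be."""
    # Latin-1 range from U+00A1 to U+00FF, plus the two extra
    # characters at the end of the list above († and Š).
    candidates = [chr(cp) for cp in range(0xA1, 0x100)] + ["\u2020", "\u0160"]
    table = {}
    for ch in candidates:
        try:
            garbled = ch.encode("utf-8").decode(wrong_codec)
        except UnicodeDecodeError:
            # e.g. "Á" is C3 81 in UTF-8, and 0x81 is unassigned in cp1252
            continue
        table[garbled] = ch
    return table


def apply_fixes(text: str, table: dict[str, str]) -> str:
    """Replace every garbled sequence in text with its correct character."""
    # Longer sequences first, so a longer match is never pre-empted
    # by a shorter key.
    for garbled in sorted(table, key=len, reverse=True):
        text = text.replace(garbled, table[garbled])
    return text


table = build_fix_table()
print(len(table))                               # 92
print(apply_fixes("Espa\u00c3\u00b1a", table))  # España
```

Generating the table avoids typos in hand-copied entries, and unlike re-decoding the whole string, a replace table also works on strings where only some characters are broken.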
Re: AutoTag Web from Amazon - Encoding Error (Unicode)
I agree, it could be a good improvement. I've reopened the issue for it: http://www.ventismedia.com/mantis/view.php?id=5974