AutoTag Web from Amazon - Encoding Error (Unicode)

This forum is for reporting bugs in MediaMonkey for Windows 4. Note that version 4 is no longer actively maintained as it has been replaced by version 5.

tonydl
Posts: 35
Joined: Tue Jan 19, 2010 11:20 pm

AutoTag Web from Amazon - Encoding Error (Unicode)

Post by tonydl »

Hello,

I tried to AutoTag an audiobook from Amazon Germany (de). There seems to be a Unicode error with the German umlauts (ä, ö, ü), and I guess other characters such as ß are affected too.

Example: Search for the following album: "Die drei ??? - Folge 137/Pfad der Angst"

Screenshot: http://img717.imageshack.us/img717/1444 ... 133832.png

On the website, all information is displayed correctly: http://www.amazon.de/gp/product/B003273MKS/

I'm using the latest build (4.0.2) on Windows 7 x64.


Thanks, best regards,
tony.
Lowlander
Posts: 58631
Joined: Sat Sep 06, 2003 5:53 pm

Re: AutoTag Web from Amazon - Encoding Error (Unicode)

Post by Lowlander »

It is said to be a bug on Amazon's side: http://www.ventismedia.com/mantis/view.php?id=5974
tonydl
Posts: 35
Joined: Tue Jan 19, 2010 11:20 pm

Re: AutoTag Web from Amazon - Encoding Error (Unicode)

Post by tonydl »

I'm not sure my issue is related to that one, as there are no "unknown characters" (like the box or the question mark in the issue you posted).
The bug you posted describes an "invalid unicode detection", but what I'm reporting is just a decoding issue.

It's actually a pretty common Unicode (UTF-8) decoding error.
I'll try to expand on it further:

These umlauts are encoded with 16 bits (two bytes) in UTF-8, while "normal" (ASCII) characters are encoded with 8 bits.
If an ISO-8859-1 interpreter parses UTF-8 it will produce two chars, because it treats every byte as one character and so reads the umlaut as two 8-bit characters.

Example:
UTF-8 character "ü": 11000011 10111100
will get interpreted as two 8-bit blocks:
11000011 --> "Ã"
10111100 --> "¼"
which will result in the two characters "Ã¼" instead of "ü".

There are tons of frameworks which will help with converting.
As a workaround, a simple replacer would be possible that replaces "Ã¼" with "ü" and does the same for the other most common chars - there are lists for that on the internet, too.
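
The misreading described above can be reproduced in a few lines. This is only a minimal sketch in Python, not MediaMonkey's own code; the helper names `utf8_bits` and `misdecode` are made up for illustration, and it assumes the wrong decoder is Windows-1252, which agrees with ISO-8859-1 for the two byte values involved in "ü".

```python
def utf8_bits(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated 8-bit groups."""
    return " ".join(f"{b:08b}" for b in ch.encode("utf-8"))


def misdecode(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with a single-byte codec."""
    return text.encode("utf-8").decode(wrong_encoding)


if __name__ == "__main__":
    print(utf8_bits("\u00fc"))          # 11000011 10111100
    print(misdecode("\u00fc"))          # Ã¼
    print(misdecode("Pfad der Angst"))  # plain ASCII comes through unchanged
```

Plain ASCII text survives because its UTF-8 bytes are identical to its single-byte encoding, which is why only the umlauts and similar characters look broken.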
tonydl
Posts: 35
Joined: Tue Jan 19, 2010 11:20 pm

Re: AutoTag Web from Amazon - Encoding Error (Unicode)

Post by tonydl »

*push*
MiPi
Posts: 902
Joined: Tue Aug 18, 2009 2:56 pm
Location: Czech Republic

Re: AutoTag Web from Amazon - Encoding Error (Unicode)

Post by MiPi »

tonydl: we naturally handle UTF-8 strings, but sometimes the problem is on Amazon's side: the string MM receives as the XML response from their server is not plain UTF-8; some parts of the response are encoded to UTF-8 twice. So when MM decodes it once, the result is still UTF-8 bytes, which show up as the characters you described. It only affects some records; the same album with the same umlauts can sometimes be received correctly from another Amazon server, or from another related record on the same Amazon server. They have it right on the website, but sometimes they send it incorrectly in the XML response.
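
If some fields really do arrive encoded to UTF-8 twice, one way to undo it is to reverse the extra step: turn the characters back into the bytes they were misread from, then decode those bytes as UTF-8 once more. The following is a rough sketch in Python, not what MM does internally; `repair_double_utf8` is a hypothetical helper, and using Windows-1252 as the intermediate codec is an assumption.

```python
def repair_double_utf8(text: str, codec: str = "cp1252") -> str:
    """Undo one layer of 'UTF-8 read as a single-byte codec', if present."""
    try:
        return text.encode(codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside the codec, or the bytes
        # are not valid UTF-8: most likely the string was fine already.
        return text


if __name__ == "__main__":
    print(repair_double_utf8("Gr\u00c3\u00bc\u00c3\u0178e"))  # Grüße
    print(repair_double_utf8("Gr\u00fc\u00dfe"))              # Grüße (unchanged)
```

Strings that do not round-trip are returned as they are, so records that Amazon sends correctly should pass through untouched.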
tonydl
Posts: 35
Joined: Tue Jan 19, 2010 11:20 pm

Re: AutoTag Web from Amazon - Encoding Error (Unicode)

Post by tonydl »

Thanks for the answer.

Could you maybe implement a replace-workaround?
For the German letters I'm using the following replace()'s myself:

Code:

Ã„ --> Ä
Ã¤ --> ä
Ãœ --> Ü
Ã¼ --> ü
Ã– --> Ö
Ã¶ --> ö
ÃŸ --> ß
As it's highly unlikely that the first chars are used in this combination it should be pretty failsafe. And the chars on the right are used quite a lot.

What do you think?
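
For illustration, the replace table above could be applied roughly like this. It is a minimal Python sketch, not MediaMonkey script code; `GERMAN_FIXES` and `fix_german` are names I made up, and the broken sequences are written as escapes so they survive copy and paste.

```python
# Broken sequence (UTF-8 bytes read as Windows-1252) -> intended character.
GERMAN_FIXES = {
    "\u00c3\u201e": "\u00c4",  # Ä
    "\u00c3\u00a4": "\u00e4",  # ä
    "\u00c3\u0153": "\u00dc",  # Ü
    "\u00c3\u00bc": "\u00fc",  # ü
    "\u00c3\u2013": "\u00d6",  # Ö
    "\u00c3\u00b6": "\u00f6",  # ö
    "\u00c3\u0178": "\u00df",  # ß
}


def fix_german(text: str) -> str:
    """Apply every replacement in GERMAN_FIXES to text."""
    for broken, correct in GERMAN_FIXES.items():
        text = text.replace(broken, correct)
    return text


if __name__ == "__main__":
    print(fix_german("Gr\u00c3\u00bc\u00c3\u0178e"))  # Grüße
```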



Edit: I found a larger table on the web, from which people from Spain, Portugal, Greece, etc. would benefit, too.
The replace should again be pretty failsafe (no "wrong replaces") because the char-combinations on the right won't make any sense.
I think it would be a pretty good solution.

(the table is reversed compared to the one above - correct char on the left, String to replace on the right)

Code:

    "¡" = "Â¡"
    "¢" = "Â¢"
    "£" = "Â£"
    "¤" = "Â¤"
    "¥" = "Â¥"
    "¦" = "Â¦"
    "§" = "Â§"
    "¨" = "Â¨"
    "©" = "Â©"
    "ª" = "Âª"
    "«" = "Â«"
    "¬" = "Â¬"
    "®" = "Â®"
    "¯" = "Â¯"
    "°" = "Â°"
    "±" = "Â±"
    "²" = "Â²"
    "³" = "Â³"
    "´" = "Â´"
    "µ" = "Âµ"
    "¶" = "Â¶"
    "·" = "Â·"
    "¸" = "Â¸"
    "¹" = "Â¹"
    "º" = "Âº"
    "»" = "Â»"
    "¼" = "Â¼"
    "½" = "Â½"
    "¾" = "Â¾"
    "¿" = "Â¿"
    "À" = "Ã€"
    "Â" = "Ã‚"
    "Ã" = "Ãƒ"
    "Ä" = "Ã„"
    "Å" = "Ã…"
    "Æ" = "Ã†"
    "Ç" = "Ã‡"
    "È" = "Ãˆ"
    "É" = "Ã‰"
    "Ê" = "ÃŠ"
    "Ë" = "Ã‹"
    "Ì" = "ÃŒ"
    "Î" = "ÃŽ"
    "Ñ" = "Ã‘"
    "Ò" = "Ã’"
    "Ó" = "Ã“"
    "Ô" = "Ã”"
    "Õ" = "Ã•"
    "Ö" = "Ã–"
    "×" = "Ã—"
    "Ø" = "Ã˜"
    "Ù" = "Ã™"
    "Ú" = "Ãš"
    "Û" = "Ã›"
    "Ü" = "Ãœ"
    "Þ" = "Ãž"
    "ß" = "ÃŸ"
    "à" = "Ã "
    "á" = "Ã¡"
    "â" = "Ã¢"
    "ã" = "Ã£"
    "ä" = "Ã¤"
    "å" = "Ã¥"
    "æ" = "Ã¦"
    "ç" = "Ã§"
    "è" = "Ã¨"
    "é" = "Ã©"
    "ê" = "Ãª"
    "ë" = "Ã«"
    "ì" = "Ã¬"
    "í" = "Ã­"
    "î" = "Ã®"
    "ï" = "Ã¯"
    "ð" = "Ã°"
    "ñ" = "Ã±"
    "ò" = "Ã²"
    "ó" = "Ã³"
    "ô" = "Ã´"
    "õ" = "Ãµ"
    "ö" = "Ã¶"
    "÷" = "Ã·"
    "ø" = "Ã¸"
    "ù" = "Ã¹"
    "ú" = "Ãº"
    "û" = "Ã»"
    "ü" = "Ã¼"
    "ý" = "Ã½"
    "þ" = "Ã¾"
    "ÿ" = "Ã¿"
    "†" = "â€ "
    "Š" = "Å "
MiPi
Posts: 902
Joined: Tue Aug 18, 2009 2:56 pm
Location: Czech Republic

Re: AutoTag Web from Amazon - Encoding Error (Unicode)

Post by MiPi »

I agree, it could be a good improvement. Reopened the issue for it: http://www.ventismedia.com/mantis/view.php?id=5974