[2.5.5.996] ID3v2.3 Unicode (UTF-16) Problem

Mizery_Made
Posts: 2283
Joined: Tue Aug 29, 2006 1:09 pm
Location: Kansas City, Missouri, United States

Post by Mizery_Made »

Yeah, 2.5.5 has been finalized and all the work is now going into 3.0, but I ran across this, and since 3.0 is still in ALPHA, I haven't tested it there. Maybe someone running 3.0 could check whether it still behaves the same as 2.5.5.

Anyway, even with the option to write UTF-16 selected, at least for me, it's still writing ANSI. There's not much more to say; if you need to know more, let me know what and I'll get back to you.
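One way to see what a file actually contains is to read the encoding byte at the start of each ID3v2.3 text frame: per the ID3v2.3 spec, 0x00 means ISO-8859-1 and 0x01 means UTF-16 with a BOM. The Python sketch below is illustrative only; the function names are hypothetical, and it assumes the tag starts at byte 0 with no extended header and no unsynchronisation.

```python
import struct

# Encoding byte values defined for ID3v2.3 text frames.
ENCODINGS = {0: "ISO-8859-1", 1: "UTF-16"}


def syncsafe_to_int(b: bytes) -> int:
    """Decode a 4-byte ID3v2 syncsafe integer (7 significant bits per byte)."""
    value = 0
    for byte in b:
        value = (value << 7) | (byte & 0x7F)
    return value


def text_frame_encodings(data: bytes) -> dict:
    """Map each text frame ID (TIT2, TPE1, ...) to its declared encoding.

    Assumes an ID3v2.3 tag at offset 0, no extended header and no
    unsynchronisation (simplifications for this sketch).
    """
    if data[:3] != b"ID3" or data[3] != 3:
        raise ValueError("not an ID3v2.3 tag")
    end = 10 + syncsafe_to_int(data[6:10])  # header is 10 bytes
    pos = 10
    result = {}
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id[0] == 0:
            break  # reached the padding after the last frame
        # In ID3v2.3 the frame size is a plain big-endian 32-bit integer.
        (size,) = struct.unpack(">I", data[pos + 4:pos + 8])
        body = data[pos + 10:pos + 10 + size]
        if frame_id.startswith(b"T") and body:
            result[frame_id.decode("ascii")] = ENCODINGS.get(body[0], f"unknown ({body[0]})")
        pos += 10 + size
    return result
```

To use it, read the whole file in binary mode (for example `with open("song.mp3", "rb") as f:`, where the file name is hypothetical) and pass the bytes in; frames stored as plain ASCII text, as reported in this thread, would show up as `ISO-8859-1`.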
Teknojnky
Posts: 5537
Joined: Tue Sep 06, 2005 11:01 pm

Post by Teknojnky »

Edit: to clarify, I get the same thing (ANSI tags when UTF-16 is set in the options).

I have not checked MM 3.0 to see whether it behaves the same.
Mizery_Made
Posts: 2283
Joined: Tue Aug 29, 2006 1:09 pm
Location: Kansas City, Missouri, United States

Post by Mizery_Made »

It would appear that 3.0 has the same trouble.
jiri
Posts: 5399
Joined: Tue Aug 14, 2001 7:00 pm
Location: Czech Republic

Post by jiri »

It probably should have been documented somewhere (I'm not sure if it is), but it isn't a bug: the idea is that if you store a string that consists only of standard ASCII characters, UTF-16 isn't necessary, so ANSI is used. Maybe a special option, such as a 'Mixed' mode, could be introduced to make this clearer.

Jiri
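The rule described above can be sketched in a few lines. This is a minimal Python illustration, not MediaMonkey's actual code; the function name is hypothetical, and it assumes that "ANSI" in this thread corresponds to ID3v2.3 encoding byte 0x00 (ISO-8859-1), with 0x01 (UTF-16 with BOM) used for everything else.

```python
def encode_text_frame_body(text: str) -> bytes:
    """Build the body of an ID3v2.3 text frame such as TIT2.

    Plain ASCII text gets encoding byte 0x00 (ISO-8859-1); any other text
    gets encoding byte 0x01 (UTF-16 with a byte order mark).
    """
    if text.isascii():
        # ASCII is a subset of ISO-8859-1, so one byte per character suffices.
        return b"\x00" + text.encode("latin-1")
    # Python's "utf-16" codec prepends a BOM, which encoding 0x01 requires.
    return b"\x01" + text.encode("utf-16")
```

With this rule, `encode_text_frame_body("Hello")` returns `b'\x00Hello'` even when a UTF-16 option is selected, which matches the behaviour reported above; a 'Mixed' option label would simply describe this branch honestly.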
Mizery_Made
Posts: 2283
Joined: Tue Aug 29, 2006 1:09 pm
Location: Kansas City, Missouri, United States

Post by Mizery_Made »

I just took a quick look in the Help and it doesn't really mention anything along those lines (that I found, anyway). It's not a huge deal to me; I just found it odd that one option was selected while it was doing something else, you know? Cheers, though. :)
Teknojnky
Posts: 5537
Joined: Tue Sep 06, 2005 11:01 pm

Post by Teknojnky »

jiri wrote: It probably should have been documented somewhere (I'm not sure if it is), but it isn't a bug: the idea is that if you store a string that consists only of standard ASCII characters, UTF-16 isn't necessary, so ANSI is used. Maybe a special option, such as a 'Mixed' mode, could be introduced to make this clearer.

Jiri
This makes sense and is probably a good idea for extra compatibility.

A useful place for this tidbit would be the mouse-over help, indicating that UTF-16 will only be used when necessary.

Also, the option text could be changed to indicate that UTF-16 is *ALLOWED* to be used, rather than that it *WILL* be used.

proposed:

ID3v2 text encoding: Ansi + Unicode UTF-16 (only when needed)
gk
Posts: 35
Joined: Mon Apr 23, 2007 7:01 pm

Post by gk »

Teknojnky wrote: proposed:

ID3v2 text encoding: Ansi + Unicode UTF-16 (only when needed)
This is a very reasonable clarification, which, 8 years later, has still not been implemented in MM v. 4.1.7.1741.

I am dealing with this problem while trying to support the import of iTunes playlists in the iPlaylist Importer plugin. Filenames from iTunes, encoded in UTF-8, are imported into MM but do not play, because these UTF-8 characters are always interpreted as ANSI and I cannot force MM to interpret them the way iTunes does.

The unpredictability of MM is really a problem here, especially when some glyphs exist in both the ANSI and Unicode encodings.

One case in point (among many others), just as an example:

I name an MP3 file in Windows 7:
™.mp3

I import it into iTunes and export the XML playlist including this file.

The iTunes <Location> path (in UTF-8) to this file is:
file://localhost/E:/Docs/My Projects/Music/_Software/Music - Library Mgmt/MediaMonkey/iPlaylist Importer/Testing/â„¢.mp3

In a hex editor, â„¢ is the byte sequence E2 84 A2, which is the UTF-8 encoding of the trademark symbol (U+2122).

The problem is that in Windows 7, ™.mp3 is not Unicode: it uses the ANSI (Windows-1252) code page, in which the trademark character is represented by 0x99.
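The byte sequences above can be reproduced with Python's built-in codecs (Python is used here only for illustration). The snippet also shows where the â„¢ text comes from: it is the three UTF-8 bytes of ™ read as three separate Windows-1252 characters.

```python
tm = "\u2122"  # ™ TRADE MARK SIGN

utf8_bytes = tm.encode("utf-8")      # what the iTunes XML contains
cp1252_bytes = tm.encode("cp1252")   # single byte in the Windows-1252 code page

print(utf8_bytes.hex(" "))    # e2 84 a2
print(cp1252_bytes.hex(" "))  # 99

# Decoding the UTF-8 bytes as Windows-1252 produces the mojibake seen in the path.
print(utf8_bytes.decode("cp1252"))  # â„¢
```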

MM always interprets these cases as ANSI, so I cannot import an iTunes playlist that includes a UTF-8 track: searching the MM library will never find the UTF-8 encoding if the same glyph also occurs in ANSI, or perhaps (?) in Windows-1252, which, by the way, is not the same as ANSI.
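One possible workaround on the plugin side, sketched here in Python as an assumption rather than anything MediaMonkey or iTunes provides, is to undo "UTF-8 read as Windows-1252" mojibake before searching the library. The function name is hypothetical.

```python
def repair_mojibake(path: str) -> str:
    """Undo UTF-8-read-as-Windows-1252 mojibake, if present.

    Re-encodes the string as cp1252 to recover the original bytes, then
    decodes those bytes as UTF-8. If either step fails, the string was most
    likely not mojibake, so it is returned unchanged.
    """
    try:
        return path.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return path
```

For example, `repair_mojibake("â„¢.mp3")` gives `"™.mp3"`, while an already-correct `"™.mp3"` is left alone because byte 0x99 is not valid UTF-8 on its own. The heuristic can still misfire on genuine Windows-1252 text that happens to form valid UTF-8, so a repaired path should be treated as a candidate match rather than a certainty.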

Clarification?

Here are a couple of good technical references about this problem:

http://www.joelonsoftware.com/articles/Unicode.html
http://www.i18nqa.com/debug/utf8-debug.html
Peke
Posts: 13680
Joined: Tue Jun 10, 2003 7:21 pm
Location: Serbia

Post by Peke »

From what I see, you have a BOM character being imported at the beginning of the XML.

Have you tried removing the BOM from the beginning and then importing into MMW?

XML files natively support UTF-8, and a BOM character is not needed there as it is for M3U/M3U8.
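Checking for, and stripping, a UTF-8 BOM (the bytes EF BB BF) before handing the XML to an importer could look like the minimal Python sketch below; the function name is hypothetical.

```python
import codecs
from typing import Tuple


def strip_utf8_bom(raw: bytes) -> Tuple[bytes, bool]:
    """Return (data without a leading UTF-8 BOM, whether a BOM was found)."""
    if raw.startswith(codecs.BOM_UTF8):  # codecs.BOM_UTF8 == b"\xef\xbb\xbf"
        return raw[len(codecs.BOM_UTF8):], True
    return raw, False
```

When decoding straight to text instead, Python's `utf-8-sig` codec drops a leading BOM automatically, so the explicit check is mainly useful for reporting whether a BOM was there.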

I have Cyrillic filenames, and MMW imports them without problems from the iTunes Library XML.
Best regards,
Pavle
MediaMonkey Team lead QA/Tech Support guru
Admin of Free MediaMonkey addon Site HappyMonkeying
gk
Posts: 35
Joined: Mon Apr 23, 2007 7:01 pm

Post by gk »

Peke wrote: From what I see, you have a BOM character being imported at the beginning of the XML.

Have you tried removing the BOM from the beginning and then importing into MMW?
I apologize.
My post here was off-topic, so I think you misunderstood my problem: I don't have any problem with MMW, only with the iPlaylist Importer plugin that I'm working on.

I am not aware of any BOM in my input.
The XML I'm importing is just the standard iTunes XML, which begins with "<?xml" and doesn't appear to have a BOM character, which, as I understand it, would be 0xEF 0xBB 0xBF in UTF-8.

[The reason I have posted here, in addition to discussing with trixmoto, is that trixmoto doesn't know how to fix the problem and I haven't found a better place to ask the question yet. Perhaps I should have started a new thread, but in any case, here we are. Thanks for listening.]