[2.5.5.996] ID3v2.3 Unicode (UTF-16) Problem


Re: [2.5.5.996] ID3v2.3 Unicode (UTF-16) Problem

by gk » Sun May 24, 2015 10:35 pm

Peke wrote:From what I see you have a case of BOM character import at the beginning of XML.

Have you tried to remove BOM from the beginning and then import in MMW?
I apologize.
My post here was off-topic so I think you have misunderstood my problem: I don't have any problem with MMW, only with the iPlaylist Importer plugin that I'm working on.

I am not aware of any BOM in my input.
The XML I'm importing is just the standard iTunes XML, that begins with "<?xml", and doesn't appear to have a BOM character, which, as I understand it, would be 0xEF 0xBB 0xBF in UTF-8.
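For anyone who wants to rule a BOM in or out rather than eyeball it, here is a minimal Python sketch that checks the first three bytes. The helper names and the sample data are made up for illustration; they are not part of iPlaylist Importer or MM.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data

# Hypothetical sample input: the first bytes of an XML file, with and without a BOM.
sample_plain = b'<?xml version="1.0" encoding="UTF-8"?>'
sample_bom = codecs.BOM_UTF8 + sample_plain

print(has_utf8_bom(sample_plain), has_utf8_bom(sample_bom))
```

With a real export you would read the file in binary mode (`open(path, "rb")`) and pass the first few bytes to `has_utf8_bom`.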

[The reason I have posted here, in addition to discussing with trixmoto, is that trixmoto doesn't know how to fix the problem and I haven't found a better place to ask the question yet. Perhaps I should have started a new thread, but in any case, here we are. Thanks for listening.]

Re: [2.5.5.996] ID3v2.3 Unicode (UTF-16) Problem

by Peke » Sun May 24, 2015 4:19 pm

From what I see you have a case of BOM character import at the beginning of XML.

Have you tried to remove BOM from the beginning and then import in MMW?

XML uses UTF-8 by default, so a BOM character is not needed, unlike in the case of M3U/M3U8.

I have Cyrillic filenames and MMW import them without problem from iTunes Library XML.

Re:

by gk » Sun May 24, 2015 4:20 am

Teknojnky wrote:
jiri wrote:proposed:

ID3v2 text encoding: Ansi + Unicode UTF-16 (only when needed)
This is a very reasonable clarification, which, 8 years later, has not been implemented in MM v. 4.1.7.1741

I am dealing with this problem while trying to support import of iTunes playlists in the iPlaylist Importer plugin. Filenames from iTunes, encoded in UTF-8, are imported into MM and do not play, because these UTF-8 characters are always interpreted as Ansi and I cannot force them to be interpreted the way iTunes interprets them.

The unpredictability of MM is really a problem here, especially when some glyphs exist in both the Ansi and Unicode encodings.

One case in point (among many others), just as an example:

I name an mp3 file, in Windows 7:
™.mp3

I import it into iTunes and export the XML playlist including this file.

iTunes <Location> Path (in UTF-8) to this file is:
file://localhost/E:/Docs/My Projects/Music/_Software/Music - Library Mgmt/MediaMonkey/iPlaylist Importer/Testing/â„¢.mp3

If you use a hex editor, â„¢ is the byte sequence E2 84 A2, which is the UTF-8 encoding of the trademark symbol (U+2122).

The problem is that, in Windows 7, ™.mp3 is not Unicode: it uses the ANSI (Windows 1252) code page, in which the trademark character is represented by 0x99.
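The two byte representations can be compared directly. This is only a Python sketch of the encodings involved, not of how MM or iTunes handle the name internally; the variable names are mine.

```python
tm = "\u2122"  # TRADE MARK SIGN

utf8_bytes = tm.encode("utf-8")      # the three bytes E2 84 A2
cp1252_bytes = tm.encode("cp1252")   # the single byte 0x99

# Reading the UTF-8 bytes as if they were Windows-1252 yields three
# unrelated characters instead of one trademark sign.
mojibake = utf8_bytes.decode("cp1252")

# The misreading is reversible: re-encode as Windows-1252, then decode as UTF-8.
repaired = mojibake.encode("cp1252").decode("utf-8")

print(utf8_bytes.hex(" "), cp1252_bytes.hex(), len(mojibake), repaired == tm)
```

So a lookup that compares the bytes of one encoding against text decoded with the other cannot match, even though both sides mean the same glyph.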

MM always interprets these cases as Ansi, so I cannot import an iTunes playlist that includes a UTF-8 track: searching the MM library will never find a UTF-8 encoding if the same glyph also occurs in Ansi (or perhaps Windows 1252, which, by the way, is not the same as Ansi).

Clarification?

Here are a couple of good technical references about this problem:

http://www.joelonsoftware.com/articles/Unicode.html
http://www.i18nqa.com/debug/utf8-debug.html

by Teknojnky » Tue Mar 06, 2007 11:08 am

jiri wrote:It probably should have been documented somewhere (I'm not sure if it is), but it isn't a bug - the idea is that if you store a string that consists only of standard ASCII characters, UTF-16 isn't necessary and so ANSI is used. Maybe a special option, some 'Mixed' mode could be introduced to make it clearer.

Jiri
This makes sense and is probably a good idea for extra compatibility.

A useful place for this tidbit of info would be the mouse over help, indicating that utf-16 will only be used when necessary.

Also, changing the option text to indicate that utf-16 will be *ALLOWED* to be used, instead of *WILL* be used.

proposed:

ID3v2 text encoding: Ansi + Unicode UTF-16 (only when needed)

by Mizery_Made » Tue Mar 06, 2007 10:00 am

I just took a quick look in the Help and it doesn't really mention anything along those lines (that I found anyway). I mean, it's not a huge deal or anything to me, it's just that I found it odd that while One option was selected, it was doing something else, you know? Cheers though. :)

by jiri » Tue Mar 06, 2007 8:11 am

It probably should have been documented somewhere (I'm not sure if it is), but it isn't a bug - the idea is that if you store a string that consists only of standard ASCII characters, UTF-16 isn't necessary and so ANSI is used. Maybe a special option, some 'Mixed' mode could be introduced to make it clearer.

Jiri
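To make the "only when needed" idea concrete, here is a rough Python sketch of that kind of mixed mode. It is not MM's actual code, and the function name is hypothetical. The only thing taken from the ID3v2.3 format is the leading encoding byte of a text frame: 0x00 for ISO-8859-1 and 0x01 for Unicode with a BOM.

```python
ENC_LATIN1 = 0x00  # ID3v2.3 text encoding byte: ISO-8859-1
ENC_UTF16 = 0x01   # ID3v2.3 text encoding byte: Unicode with BOM

def encode_text_frame_body(text: str) -> bytes:
    """Pick the single-byte encoding for pure-ASCII text, UTF-16 otherwise.

    Assumption: this mirrors the behaviour described above, where UTF-16
    is only used when the string contains non-ASCII characters.
    """
    if text.isascii():
        return bytes([ENC_LATIN1]) + text.encode("ascii")
    # Python's "utf-16" codec prepends a BOM in the platform's byte order.
    return bytes([ENC_UTF16]) + text.encode("utf-16")

ascii_body = encode_text_frame_body("Hello")
unicode_body = encode_text_frame_body("\u2122")
print(ascii_body[0], unicode_body[0])
```

Under this scheme a tag containing only ASCII text comes out with encoding byte 0x00 even when UTF-16 is selected, which matches what was reported at the start of the thread.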

by Mizery_Made » Mon Feb 26, 2007 11:45 pm

It would appear that 3.0 has the same trouble.

by Teknojnky » Fri Feb 23, 2007 6:28 pm

edit: same as in I get the same thing (ansi tags when utf-16 is set in the options)

Have not checked MM 3.0 to see if same behavior.

[2.5.5.996] ID3v2.3 Unicode (UTF-16) Problem

by Mizery_Made » Fri Feb 23, 2007 6:25 pm

Yeah, 2.5.5 was finalized and all the work is being done on 3.0 now, but I ran across this and since 3.0 is still ALPHA, I haven't tested. So, maybe someone with 3.0 could check to see if it still behaves the same as 2.5.5.

Anyway, even with the option for it to write in UTF-16 selected, at least for me, it's still writing in ANSI. Not much more to really say; if you need/want me to say more, let me know what you need to know and I'll get back to you.
