Teknojnky wrote: jiri wrote: proposed:
ID3v2 text encoding: Ansi + Unicode UTF-16 (only when needed)
This is a very reasonable clarification which, 8 years later, has still not been implemented in MM v. 4.1.7.1741.
I am dealing with this problem while trying to support import of iTunes playlists in the iPlaylist Importer plugin. Filenames from iTunes, encoded in UTF-8, are imported into MM and do not play, because the UTF-8 bytes are always interpreted as ANSI and I cannot force MM to interpret them the way iTunes does.
The unpredictability of MM is really a problem here, especially when a glyph exists in both the ANSI and Unicode encodings.
One case in point (among many others), just as an example:
I name an mp3 file, in Windows 7:
™.mp3
I import it into iTunes and export the XML playlist including this file.
The iTunes <Location> path (in UTF-8) to this file is:
file://localhost/E:/Docs/My Projects/Music/_Software/Music - Library Mgmt/MediaMonkey/iPlaylist Importer/Testing/â„¢.mp3
In a hex editor, â„¢ is the byte sequence E2 84 A2,
which is the UTF-8 encoding of the trademark symbol (U+2122).
The problem is that, in Windows 7, ™.mp3 is not Unicode: it uses the ANSI (Windows-1252) code page, in which the trademark character is represented by the single byte 0x99.
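To make the byte-level difference concrete, here is a minimal Python sketch (not MM script code, just an illustration) that encodes the trademark character both ways and then decodes the UTF-8 bytes as Windows-1252. The variable names are mine; the only assumption is Python's standard "utf-8" and "cp1252" codecs.

```python
# The trademark sign, written as an escape so the source file's own
# encoding cannot interfere with the demonstration.
TRADEMARK = "\u2122"

utf8_bytes = TRADEMARK.encode("utf-8")      # what iTunes writes in the playlist
cp1252_bytes = TRADEMARK.encode("cp1252")   # the single-byte Windows-1252 form


def as_hex(data: bytes) -> str:
    """Format bytes the way a hex editor shows them."""
    return " ".join(f"{b:02X}" for b in data)


print(as_hex(utf8_bytes))    # E2 84 A2
print(as_hex(cp1252_bytes))  # 99

# Reading the three UTF-8 bytes as if they were Windows-1252 yields three
# separate characters, which is the garbled name seen in the path above.
mojibake = utf8_bytes.decode("cp1252")
print(len(mojibake))         # 3
```

So the same glyph is one byte in one encoding and three bytes in the other, and a reader that assumes the wrong encoding sees a different filename.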
MM always interprets these cases as ANSI, so I cannot import an iTunes playlist that includes a track with a UTF-8 path: searching the MM library never finds the UTF-8 encoding when the same glyph also exists in ANSI. Or perhaps in Windows-1252 (?), which, by the way, is not strictly the same thing as ANSI.
Clarification?
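For what it is worth, when a path has already been mis-read this way, the damage can in principle be reversed outside MM. The sketch below is Python, not something I have verified can be done from an MM script, and repair_mojibake is a hypothetical helper name. It assumes the text was UTF-8 that got decoded as Windows-1252, and it leaves the text alone when that assumption does not hold.

```python
def repair_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-Windows-1252 mix-up.

    Returns the text unchanged if it cannot be re-encoded as
    Windows-1252 or if the resulting bytes are not valid UTF-8.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# The garbled file name from the example, written with escapes:
# U+00E2, U+201E, U+00A2 are the three characters shown for E2 84 A2.
garbled = "Testing/\u00e2\u201e\u00a2.mp3"
fixed = repair_mojibake(garbled)
print(fixed == "Testing/\u2122.mp3")  # True

# A name that is already correct comes back untouched, because the lone
# byte 0x99 is not valid UTF-8 and the decode step is rejected.
print(repair_mojibake(fixed) == fixed)  # True
```

This only helps when the direction of the mix-up is known. It does not answer the question of which encoding MM itself assumes.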
Here are a couple of good technical references about this problem:
http://www.joelonsoftware.com/articles/Unicode.html
http://www.i18nqa.com/debug/utf8-debug.html