2075: Wrong URL encoding when using getURLContentAsync


Post by TIV73 » Thu Sep 07, 2017 6:29 pm

Hey there,
I believe I found a (possible) issue when using app.utils.web.getURLContentAsync. I'm currently working on a plugin for the new autotag framework for vgmdb.net and noticed that lookups with getURLContentAsync consistently fail if the URL contains certain Japanese characters, e.g. 瀬 or 々. For example, if I run the following code:
Code:
// Send a browser-like User-Agent header with the request
var headers = newStringList();
headers.add('User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36');
// The URL is passed raw (not percent-encoded); this is what triggers the problem
var requestURL = 'http://vgmdb.info/search/albums?format=json&q=瀬';
app.utils.web.getURLContentAsync(requestURL, {headers: headers}).then(function (content) {
    console.log(content);
});

An object containing no results is returned, even though there should be close to 140 albums containing this character. To verify, just open the same URL in a browser, and the correct results will be returned.

To rule out that the difference in results comes from the browser handling the string differently or doing some other magic, I opened the PowerShell ISE and ran the following statement:
Code:
$response=Invoke-WebRequest -Uri "http://vgmdb.info/search/albums?format=json&q=瀬";$response.Content

Again, the correct results were returned. To narrow down the issue, I had a closer look at the returned object and noticed that it contains the performed query as a link parameter. When performing the query with PowerShell or in the browser, the link property contained "search/albums/%E7%80%AC", while the same query in MediaMonkey returned "search/albums/%E7%C2%3F%AC", so for some reason the character was encoded incorrectly.
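
For reference, the correct value quoted above can be reproduced in plain JavaScript. The sketch below percent-encodes every UTF-8 byte of a string; utf8PercentEncode is a hypothetical helper written for illustration only, not part of the MediaMonkey API, and it assumes an environment where TextEncoder is available (modern browsers, Node.js).

```javascript
// Hypothetical helper (not a MediaMonkey function): percent-encode
// every UTF-8 byte of the given text as an uppercase %XX triplet.
function utf8PercentEncode(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always produces UTF-8
  return Array.from(bytes)
    .map(function (b) {
      return '%' + b.toString(16).toUpperCase().padStart(2, '0');
    })
    .join('');
}

console.log(utf8PercentEncode('瀬'));   // %E7%80%AC
console.log(utf8PercentEncode('々'));   // %E3%80%85
console.log(encodeURIComponent('瀬'));  // %E7%80%AC, same bytes as above
```

The three bytes E7 80 AC are the UTF-8 form of 瀬, which is what the browser and PowerShell sent; the MediaMonkey value differs only in how the middle byte was written.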

Originally I thought the server was just misinterpreting the provided URL for some reason, or returning nothing because I didn't provide any details about language, charset, etc., so I ran both requests in PowerShell and MM again and captured both sessions. It turns out that the request sent to the server by MM already contained the wrong encoding:

Code:
ps => GET /search/albums?format=json&q=%E7%80%AC
MM => GET /search/albums?format=json&q=%E7%C2?%AC


Now, this actually has a quite straightforward solution. Simply manually encoding the URL and then providing the pre-encoded string to getURLContentAsync yields the correct result:
Code:
var requestURL = encodeURI('http://vgmdb.info/search/albums?format=json&q=瀬')

I can't really claim to understand why it works, because it looks like getURLContentAsync internally already calls encodeURI on the provided URL before passing it to _loadDataFromServer, but for some reason it does. Anyway, it's probably not the biggest problem in practical terms, since the wrong encoding can easily be avoided by encoding the URL before calling getURLContentAsync, but I still wanted to provide some feedback about it, as it could create issues down the road.
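
To make the workaround concrete, here is a small sketch of what encodeURI produces for this URL, using only standard JavaScript. buildSearchURL is a hypothetical helper, not something from MediaMonkey or vgmdb; it is included on the assumption that search terms may also contain reserved characters such as &, which encodeURI leaves untouched.

```javascript
var base = 'http://vgmdb.info/search/albums?format=json&q=';

// encodeURI keeps URL syntax characters (: / ? & =) as they are and
// percent-encodes non-ASCII characters as UTF-8 bytes.
var encoded = encodeURI(base + '瀬');
console.log(encoded); // http://vgmdb.info/search/albums?format=json&q=%E7%80%AC

// Hypothetical helper: encode only the search term, so that reserved
// characters inside the term cannot be mistaken for query separators.
function buildSearchURL(term) {
  return base + encodeURIComponent(term);
}

console.log(buildSearchURL('瀬'));   // ...&q=%E7%80%AC
console.log(buildSearchURL('A&B'));  // ...&q=A%26B
console.log(encodeURI(base + 'A&B')); // ...&q=A&B (the ampersand is left as-is)
```

For a term like 瀬 both approaches give the same URL. Note that encodeURI is not idempotent: running it on an already encoded string turns each % into %25.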

Re: 2075: Wrong URL encoding when using getURLContentAsync

Post by PetrCBR » Fri Sep 08, 2017 3:51 am

Hi. We're using the Indy library, which requires an encoded URL, so encodeURI is required whenever your URL contains special or Unicode characters.

Re: 2075: Wrong URL encoding when using getURLContentAsync

Post by TIV73 » Fri Sep 08, 2017 6:07 am

Alright, so using encodeURI before passing a URL to getURLContentAsync is not just a workaround but the accepted solution. Thanks for letting me know!
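
One way to apply this consistently is a thin wrapper that always encodes before fetching. The following is only a sketch: getEncodedURLContent and stubFetch are hypothetical names, and the fetch function is passed in as a parameter so the example can run outside MediaMonkey. Inside a script, the function passed in would presumably be a small wrapper around app.utils.web.getURLContentAsync.

```javascript
// Hypothetical wrapper: percent-encode the URL, then pass it on to
// whatever fetch function is supplied (fetchFn is an assumption here).
function getEncodedURLContent(fetchFn, url, options) {
  return fetchFn(encodeURI(url), options);
}

// Stub standing in for the real fetch function; it records the URL it
// receives and resolves with an empty JSON object.
var seen = [];
function stubFetch(url, options) {
  seen.push(url);
  return Promise.resolve('{}');
}

getEncodedURLContent(stubFetch, 'http://vgmdb.info/search/albums?format=json&q=瀬', {})
  .then(function (content) {
    console.log(seen[0], content);
  });

// Inside MediaMonkey the call might look like this (untested assumption):
// getEncodedURLContent(function (u, o) { return app.utils.web.getURLContentAsync(u, o); },
//                      requestURL, {headers: headers});
```

The wrapper should only be given raw URLs; passing an already encoded URL through it would encode the percent signs a second time.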

