2075: Wrong URL encoding when using getURLContentAsync

Re: 2075: Wrong URL encoding when using getURLContentAsync

Post by TIV73 » Fri Sep 08, 2017 6:07 am

Alright, so using encodeURI before passing a URL to getURLContentAsync is not just a workaround but the accepted solution. Thanks for letting me know!

Re: 2075: Wrong URL encoding when using getURLContentAsync

Post by PetrCBR » Fri Sep 08, 2017 3:51 am

Hi. We're using the Indy library, and it requires an encoded URL, so encodeURI is required whenever your URL contains special or Unicode characters.
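For illustration, here is what encodeURI does to the URL from the original report. This is standard JavaScript and runs in any engine (e.g. Node.js); that MediaMonkey's script engine produces identical output is assumed here, not verified.

```javascript
// The URL from the original report, containing the Japanese character 瀬 (U+702C).
const rawURL = 'http://vgmdb.info/search/albums?format=json&q=瀬';

// encodeURI percent-encodes the UTF-8 bytes of non-ASCII characters but leaves
// URL delimiters such as ':', '/', '?', '&' and '=' untouched.
const encodedURL = encodeURI(rawURL);

console.log(encodedURL);
// http://vgmdb.info/search/albums?format=json&q=%E7%80%AC
```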

2075: Wrong URL encoding when using getURLContentAsync

Post by TIV73 » Thu Sep 07, 2017 6:29 pm

Hey there,
I believe I found a (possible) issue with app.utils.web.getURLContentAsync. I'm currently working on a plugin for the new autotag framework for vgmdb.net and noticed that lookups with getURLContentAsync consistently fail if the URL contains certain Japanese characters, e.g. 瀬 or 々. For example, if I run the following code:
Code:
// Send a browser-like User-Agent with the request
var headers = newStringList();
headers.add('User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36');
// The query string contains a raw (unencoded) Japanese character
var requestURL = 'http://vgmdb.info/search/albums?format=json&q=瀬';
app.utils.web.getURLContentAsync(requestURL, {headers: headers}).then(function (content) {
    console.log(content);
});

An object containing no results is returned, even though there should be close to 140 albums containing this character. To verify, just open the same URL in a browser, and the correct results will be returned.

To rule out that the difference in results is just the browser handling the string differently or doing some other magic, I opened the PowerShell ISE and ran the following statement:
Code:
$response=Invoke-WebRequest -Uri "http://vgmdb.info/search/albums?format=json&q=瀬";$response.Content

Again, the correct results were returned. To narrow down the issue, I took a closer look at the returned object and noticed that it contains the performed query in its link property. When performing the query in PowerShell or in the browser, the link property contained "search/albums/%E7%80%AC", while the same query in MediaMonkey returned "search/albums/%E7%C2%3F%AC". So, for some reason, the character was encoded incorrectly.
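To see where the correct value comes from: %E7%80%AC is simply the UTF-8 encoding of 瀬, with one percent-escape per byte. The sketch below rebuilds it by hand (TextEncoder is available in browsers and Node.js; its availability in MediaMonkey's engine is not assumed). Comparing the two strings, MediaMonkey's version keeps the first and last bytes (E7 and AC) but replaces the middle byte 0x80 with C2 followed by a '?' (%3F).

```javascript
// Break 瀬 (U+702C) down into its UTF-8 bytes.
const bytes = new TextEncoder().encode('瀬'); // Uint8Array of 3 bytes: 0xE7, 0x80, 0xAC

// Percent-encode each byte as %XX with uppercase hex digits.
const percentEncoded = Array.from(bytes)
  .map(b => '%' + b.toString(16).toUpperCase().padStart(2, '0'))
  .join('');

console.log(percentEncoded); // %E7%80%AC
```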

Originally I thought that the server was just misinterpreting the provided URL, or returning no results because I didn't provide any details about language, charset, etc., so I ran the request in both PowerShell and MM again and captured both sessions. It turns out that the request MM sent to the server already contained the wrong encoding:

Code:
ps => GET /search/albums?format=json&q=%E7%80%AC
MM => GET /search/albums?format=json&q=%E7%C2?%AC
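One way to confirm from script that the MediaMonkey request is malformed, and not merely encoded differently, is to try decoding both captured query values: decodeURIComponent throws a URIError when the percent-escapes do not form valid UTF-8. The helper name tryDecode is hypothetical; the two strings are copied from the captures above.

```javascript
// Query values as they appeared on the wire in the two captured requests.
const fromPowerShell = '%E7%80%AC';
const fromMediaMonkey = '%E7%C2?%AC';

// Returns the decoded string, or null if the escapes are not valid UTF-8.
function tryDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch (e) {
    if (e instanceof URIError) return null; // malformed percent-encoding
    throw e;
  }
}

console.log(tryDecode(fromPowerShell));  // 瀬
console.log(tryDecode(fromMediaMonkey)); // null (0xC2 is not a valid continuation byte after 0xE7)
```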

Now, this actually has a fairly straightforward solution. Manually encoding the URL and passing the pre-encoded string to getURLContentAsync yields the correct result:
Code:
var requestURL = encodeURI('http://vgmdb.info/search/albums?format=json&q=瀬')
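A slightly more robust variant, offered as a sketch rather than as the fix described in this thread: encodeURI leaves characters such as &, =, ? and # unescaped, so a search term containing them would still break the query string. Encoding only the search term with encodeURIComponent avoids that. The helper name buildSearchURL is hypothetical; the base URL is the vgmdb.info endpoint from the post.

```javascript
// Hypothetical helper: builds a vgmdb.info album search URL for an arbitrary term.
function buildSearchURL(term) {
  // encodeURIComponent escapes non-ASCII characters as UTF-8 (like encodeURI)
  // and additionally escapes '&', '=', '?', '#' and '/', so the term cannot
  // break out of the q parameter.
  return 'http://vgmdb.info/search/albums?format=json&q=' + encodeURIComponent(term);
}

console.log(buildSearchURL('瀬'));    // http://vgmdb.info/search/albums?format=json&q=%E7%80%AC
console.log(buildSearchURL('A & B')); // http://vgmdb.info/search/albums?format=json&q=A%20%26%20B
```

In MediaMonkey the result would then take the place of requestURL, e.g. `app.utils.web.getURLContentAsync(buildSearchURL('瀬'), {headers: headers})`.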

I can't really claim to understand why this works, because getURLContentAsync appears to call encodeURI internally on the provided URL before passing it to _loadDataFromServer, but for some reason it does. Anyway, it's probably not the biggest problem in practical terms, since the wrong encoding can easily be avoided by encoding the URL before calling getURLContentAsync, but I still wanted to report it, as it could cause issues down the road.
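One standard-JavaScript observation that bears on the "why does it work" question: encodeURI is not idempotent, because it also escapes the % sign itself. If a pre-encoded URL really were run through encodeURI a second time, the server would receive %25E7%2580%25AC instead of %E7%80%AC. Since the pre-encoded request works, this suggests the URL is not simply passed through encodeURI again on its way to Indy, though that is an inference, not something verified against MediaMonkey's source.

```javascript
const raw = 'http://vgmdb.info/search/albums?format=json&q=瀬';

const once = encodeURI(raw);   // what the workaround passes in
const twice = encodeURI(once); // what a second, internal encodeURI would produce

console.log(once);  // http://vgmdb.info/search/albums?format=json&q=%E7%80%AC
console.log(twice); // http://vgmdb.info/search/albums?format=json&q=%25E7%2580%25AC

// decodeURI undoes exactly one round of encoding.
console.log(decodeURI(twice) === once); // true
```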