Why does java.net.URLEncoder give different results for the same string?


On the webapp server, when I try encoding "médicaux_jérôme.txt" using java.net.URLEncoder, I get the following string:

me%CC%81dicaux_je%CC%81ro%CC%82me.txt

while on the backend server, encoding the same string gives the following:

m%C3%A9dicaux_j%C3%A9r%C3%B4me.txt

Can anyone help me understand why the same input produces different output? And how can I get a standardized output every time for the same string?

The outcome depends on the platform if you don't specify the encoding.

See the java.net.URLEncoder Javadocs:

encode(String s)

Deprecated.

The resulting string may vary depending on the platform's default encoding. Instead, use the encode(String,String) method to specify the encoding.

So, use the suggested method and specify the encoding:

String urlEncodedString = URLEncoder.encode(stringToBeUrlEncoded, "UTF-8");
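As a minimal sketch of why the charset matters, the hypothetical helper encode below just wraps URLEncoder.encode(String, String). Encoding the same precomposed é with two different charsets shows how a different platform default could silently change the output.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeWithCharset {

    // URL-encodes s using an explicitly named charset, so the result does not
    // depend on the platform's default encoding.
    static String encode(String s, String charsetName) {
        try {
            return URLEncoder.encode(s, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // é as one precomposed code point (U+00E9)
        // The same character becomes different bytes, and therefore different
        // %XX sequences, depending on which charset is used.
        System.out.println(encode(eAcute, "UTF-8"));      // %C3%A9
        System.out.println(encode(eAcute, "ISO-8859-1")); // %E9
    }
}
```

On Java 10 and later there is also an URLEncoder.encode(String, Charset) overload, so passing StandardCharsets.UTF_8 avoids the checked UnsupportedEncodingException altogether.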

About the different representations of the same string, even if you specify "UTF-8":

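The two URL-encoded strings you gave in the question, although encoded differently, decode to the same text as displayed (in Unicode terms they are canonically equivalent), so there is nothing inherently wrong there. If you paste both into a decoding tool, you can verify that they look the same. Note, however, that the decoded Java strings are not equal under String.equals, because they contain different code points.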
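To check this claim in code, here is a small sketch (class and method names are hypothetical) that decodes both strings from the question as UTF-8. It shows that they differ code point by code point but become identical once both are normalized to the same Unicode form with java.text.Normalizer.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.text.Normalizer;

public class CompareDecoded {

    // Decodes a URL-encoded string, interpreting the %XX bytes as UTF-8.
    static String decodeUtf8(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Two strings are canonically equivalent if their NFC forms are equal.
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String a = decodeUtf8("me%CC%81dicaux_je%CC%81ro%CC%82me.txt");
        String b = decodeUtf8("m%C3%A9dicaux_j%C3%A9r%C3%B4me.txt");
        System.out.println(a.length() + " vs " + b.length()); // 22 vs 19 (3 extra combining marks)
        System.out.println(a.equals(b));                      // false: different code points
        System.out.println(canonicallyEqual(a, b));           // true: same text once normalized
    }
}
```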

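This is due, as you are seeing in your case, to the fact that there are multiple ways to represent, and therefore to URL-encode, the same string, especially if it contains accented characters (because of combining characters such as the combining acute accent, which is precisely what happens in your case).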

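In your case, specifically, the first string encoded é as e + ´ (LATIN SMALL LETTER E + COMBINING ACUTE ACCENT, U+0301), resulting in e%CC%81. The second encoded é directly as %C3%A9 (LATIN SMALL LETTER E WITH ACUTE, U+00E9; two %XX sequences because it takes 2 bytes in UTF-8). The same applies to ô, which appears as o%CC%82 (o + COMBINING CIRCUMFLEX ACCENT, U+0302) in the first string and as %C3%B4 in the second.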
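The following sketch (hypothetical class and helper names) reproduces both outputs from the question on a single JVM, using UTF-8 both times: it only changes whether the file name is in decomposed (NFD) or composed (NFC) form before encoding. The codePoints helper prints the code points so the e + U+0301 versus U+00E9 difference is visible.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.text.Normalizer;

public class ReproduceBothForms {

    // URL-encodes s as UTF-8.
    static String encodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Lists the code points of s, e.g. "U+0065 U+0301".
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String name = "m\u00E9dicaux_j\u00E9r\u00F4me.txt";
        String nfd = Normalizer.normalize(name, Normalizer.Form.NFD); // e + U+0301, o + U+0302
        String nfc = Normalizer.normalize(name, Normalizer.Form.NFC); // U+00E9, U+00F4

        System.out.println(encodeUtf8(nfd)); // me%CC%81dicaux_je%CC%81ro%CC%82me.txt
        System.out.println(encodeUtf8(nfc)); // m%C3%A9dicaux_j%C3%A9r%C3%B4me.txt

        String eAcute = "\u00E9";
        System.out.println(codePoints(Normalizer.normalize(eAcute, Normalizer.Form.NFD))); // U+0065 U+0301
        System.out.println(codePoints(Normalizer.normalize(eAcute, Normalizer.Form.NFC))); // U+00E9
    }
}
```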

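Again, there is no problem with either representation: both are valid Unicode normalization forms (the first is the decomposed form, NFD; the second is the composed form, NFC). Mac OS X, for example, is known to tend to store file names in decomposed form, using the combining acute accent; in the end, it is a matter of the encoder's preference. In your case, there may be different JREs involved or, if the file name is user-generated, the user may have used a different OS (or tool) that produced the decomposed form.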
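If you need byte-for-byte identical output regardless of where the string came from, one option (an assumption on our part, not something the Javadocs prescribe) is to pick one normalization form and apply it before encoding. This sketch, with a hypothetical encodeNormalized helper, normalizes to NFC and then URL-encodes as UTF-8, so both inputs from the question produce the same string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.text.Normalizer;

public class StableUrlEncoder {

    // Normalizes to NFC (composed form), then URL-encodes as UTF-8, so that
    // canonically equivalent inputs always yield the same encoded string.
    static String encodeNormalized(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        try {
            return URLEncoder.encode(nfc, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String composed = "m\u00E9dicaux_j\u00E9r\u00F4me.txt";          // é = U+00E9, ô = U+00F4
        String decomposed = "me\u0301dicaux_je\u0301ro\u0302me.txt";     // combining accents
        System.out.println(encodeNormalized(composed));   // m%C3%A9dicaux_j%C3%A9r%C3%B4me.txt
        System.out.println(encodeNormalized(decomposed)); // m%C3%A9dicaux_j%C3%A9r%C3%B4me.txt
    }
}
```

NFC is used here only because it is the more compact form; normalizing to NFD everywhere would be equally consistent, as long as every server applies the same choice.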

