Why does java.net.URLEncoder give different results for the same string?
On the webapp server, encoding "médicaux_jérôme.txt" with java.net.URLEncoder gives the following string:
me%CC%81dicaux_je%CC%81ro%CC%82me.txt
while on the backend server, encoding the same string gives the following:
m%C3%A9dicaux_j%C3%A9r%C3%B4me.txt
Can someone help me understand why the same input produces different output? How can I standardize the output so that I get the same string each time I decode it?
The outcome depends on the platform if you don't specify the encoding.
See the java.net.URLEncoder Javadocs:
encode(String s)
Deprecated. The resulting string may vary depending on the platform's default encoding. Instead, use the encode(String,String) method to specify the encoding.
So, use the suggested method and specify the encoding:
String urlEncodedString = URLEncoder.encode(stringToBeUrlEncoded, "UTF-8");
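As a minimal sketch of why the charset argument matters, the snippet below encodes the same file name with two explicitly named charsets. The class name ExplicitCharsetEncoding and the helper encode are made up for illustration; only URLEncoder.encode(String, String) comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ExplicitCharsetEncoding {
    // Hypothetical helper: URL-encodes a value with an explicitly named charset,
    // so the result never depends on the platform default.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Only reached if the charset name is not supported by this JVM
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // \u00e9 is the precomposed "é", \u00f4 is the precomposed "ô"
        String fileName = "m\u00e9dicaux_j\u00e9r\u00f4me.txt";

        // prints m%C3%A9dicaux_j%C3%A9r%C3%B4me.txt
        System.out.println(encode(fileName, "UTF-8"));

        // prints m%E9dicaux_j%E9r%F4me.txt (one byte per accented letter)
        System.out.println(encode(fileName, "ISO-8859-1"));
    }
}
```

Passing the charset explicitly on both servers removes the platform default from the picture, although it does not by itself explain the two outputs in the question.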
As for the different representations of the same string, even when "UTF-8" is specified:
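The two URL-encoded strings you gave in the question, although encoded differently, represent the same unencoded text, so there is nothing inherently wrong there. Pasting both into a decode tool, you can verify that they display the same. Strictly speaking, the decoded strings are canonically equivalent rather than identical sequences of code points, so a plain equals() comparison still reports them as different.
This is due, as we are seeing in this case, to the fact that there are multiple ways to URL-encode the same string, especially if it has accents (because of the combining acute accent, which is precisely what happens in your case).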
In your case, specifically, the first string encoded é as e + ´ (LATIN SMALL LETTER E + COMBINING ACUTE ACCENT), resulting in e%CC%81. The second encoded é directly as %C3%A9 (LATIN SMALL LETTER E WITH ACUTE; two % escapes because in UTF-8 it takes 2 bytes).
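The two spellings of é can be reproduced directly, as a rough sketch. The class name ComposedVersusDecomposed and the helper encodeUtf8 are illustrative names, not part of any library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ComposedVersusDecomposed {
    // Hypothetical helper: URL-encodes a value as UTF-8.
    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 must be supported by every Java platform implementation
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // e + COMBINING ACUTE ACCENT (U+0301)
        String composed = "\u00e9";    // LATIN SMALL LETTER E WITH ACUTE (U+00E9)

        System.out.println(encodeUtf8(decomposed)); // prints e%CC%81
        System.out.println(encodeUtf8(composed));   // prints %C3%A9

        // The two strings look alike on screen but hold different code points
        System.out.println(decomposed.equals(composed)); // prints false
    }
}
```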
Again, there is no problem with either representation. Both are Unicode normalization forms: the first is the decomposed form (NFD) and the second is the composed form (NFC). Mac OS X is known to tend to encode using the combining acute accent; in the end, it is a matter of preference of the encoder. In your case, there must be different JREs or, if the file name is user generated, the user may have used a different OS (or tool) that generated that encoding.
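To get the same output every time, one option is to normalize the string to a single form before encoding and again after decoding. The sketch below assumes NFC is the form you want. The class and method names are made up for illustration, while java.text.Normalizer, URLEncoder and URLDecoder are standard JDK classes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.text.Normalizer;

class NormalizedUrlEncoding {
    // Hypothetical helper: composes the input (NFC), then URL-encodes it as UTF-8.
    static String encodeNormalized(String value) {
        String nfc = Normalizer.normalize(value, Normalizer.Form.NFC);
        try {
            return URLEncoder.encode(nfc, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Hypothetical helper: URL-decodes as UTF-8, then composes the result (NFC).
    static String decodeNormalized(String encoded) {
        try {
            String decoded = URLDecoder.decode(encoded, "UTF-8");
            return Normalizer.normalize(decoded, Normalizer.Form.NFC);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String decomposed = "me\u0301dicaux_je\u0301ro\u0302me.txt";
        String composed = "m\u00e9dicaux_j\u00e9r\u00f4me.txt";

        // Both lines print m%C3%A9dicaux_j%C3%A9r%C3%B4me.txt
        System.out.println(encodeNormalized(decomposed));
        System.out.println(encodeNormalized(composed));

        // Both encoded strings from the question decode to the same composed value
        String a = decodeNormalized("me%cc%81dicaux_je%cc%81ro%cc%82me.txt");
        String b = decodeNormalized("m%c3%a9dicaux_j%c3%a9r%c3%b4me.txt");
        System.out.println(a.equals(b)); // prints true
    }
}
```

Whether NFC or NFD is the better target depends on what consumes the decoded name; the important part is that both servers agree on one form.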