Discussion:
[ruby-dev:40672] URI methods for application/x-www-form-urlencoded
Tanaka Akira
15 years ago
I have a few comments on the methods for handling application/x-www-form-urlencoded,
such as URI.encode_www_form, which Naruse-san added recently.

* URI.encode_www_component

* The method name does not make it clear that this is for forms, so I do not think it is a good name.
It should contain the word "form"; how about URI.encode_www_form_component,
for example?

* "\x00" becomes "%0".
% bin/ruby -ruri -e 'p URI.encode_www_component("\x00")'
"%0"

* Processing the argument by calling force_encoding with Encoding::ASCII_8BIT
does not seem right to me.
When a byte that corresponds to an ASCII character appears inside a multibyte character
(in Shift_JIS, for example), that byte can be left as plain ASCII.

% bin/ruby -ruri -e 'p URI.encode_www_component("\x83\x41".force_encoding("Shift_JIS"))'
"%83A"

According to http://www.w3.org/TR/html5/forms.html#url-encoded-form-data,
processing is done character by character, so shouldn't the result be "%83%41"?

That said, I doubt this causes trouble in practice.

* The encoding of the generated result is the encoding of the argument,
but aren't there cases where that is a problem?
When the argument is in an ASCII-incompatible encoding such as UTF-16BE, it is plainly wrong:
the generated "%" is not a valid character in that encoding.

I doubt this causes trouble in practice either.
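
To make the two encoding complaints above concrete, here is a minimal byte-wise sketch. The helper name sketch_encode_form_component and the constant FORM_SAFE_BYTES are hypothetical names chosen for illustration; this is not the uri library's code. It only shows zero-padded %HH output and what byte-wise processing does to the Shift_JIS example.

```ruby
# Illustrative sketch only; names are hypothetical and this is not
# the uri library's implementation.
FORM_SAFE_BYTES = [*('0'..'9'), *('A'..'Z'), *('a'..'z'), '*', '-', '.', '_'].map(&:ord).freeze

def sketch_encode_form_component(str)
  # Work on raw bytes so the encoding label of the input does not matter.
  str.b.each_byte.map do |byte|
    if byte == 0x20
      '+'                      # a space becomes a plus sign
    elsif FORM_SAFE_BYTES.include?(byte)
      byte.chr                 # safe ASCII bytes are kept as they are
    else
      format('%%%02X', byte)   # always two hex digits, zero-padded
    end
  end.join
end

p sketch_encode_form_component("\x00")                                              # => "%00"
p sketch_encode_form_component("\x83\x41".dup.force_encoding(Encoding::Shift_JIS))  # => "%83A"
```

With byte-wise processing the trailing byte 0x41 of the Shift_JIS character stays as "A", which is exactly the behaviour questioned above.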

* URI.decode_www_component

* The method name does not make it clear that this is for forms, so I do not think it is a good name.
It should contain the word "form"; how about URI.decode_www_form_component,
for example?

* URI.decode_www_component("%20") returns an empty string.
% bin/ruby -ruri -e 'p URI.decode_www_component("%20")'
""

* Shouldn't it be possible to specify an encoding as the second argument?
application/x-www-form-urlencoded data carries no character encoding
information, so with the current URI.decode_www_component
the only way to attach the correct encoding is to call force_encoding
on the return value.
However, force_encoding is basically something that should not be used, so
wouldn't it be better for URI.decode_www_component itself to accept the encoding
as an argument and call force_encoding internally?

As for the default, I think it should be ASCII-8BIT or UTF-8.
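
The proposal above can be sketched as follows. The helper name sketch_decode_form_component is hypothetical, and the UTF-8 default is an assumption taken from the options discussed in this thread, not a statement about what the library does.

```ruby
# Illustrative sketch only; the helper name is hypothetical.
# The decoder accepts the target encoding and labels the result itself,
# so callers do not need to call force_encoding on the return value.
def sketch_decode_form_component(str, enc = Encoding::UTF_8)
  decoded = str.b.gsub(/\+|%\h\h/n) do |match|
    match == '+' ? ' ' : match[1, 2].hex.chr   # "+" is a space, %HH is one byte
  end
  decoded.force_encoding(enc)
end

p sketch_decode_form_component("%20")   # => " "
v = sketch_decode_form_component("%A1%A2", "EUC-JP")
p v.encoding                            # => #<Encoding:EUC-JP>
```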

* URI.encode_www_form

* The appendix of the HTML specification also describes using ; as the separator,
so offering that might be an option.
http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2

I think it is fine without it, though.
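
For illustration, a separator option could look roughly like the sketch below. The helper sketch_encode_form and its separator parameter are hypothetical; the sketch assumes a Ruby version in which URI.encode_www_form_component is available under that name.

```ruby
require 'uri'

# Illustrative sketch only; sketch_encode_form and its separator
# parameter are hypothetical, not part of the uri library.
def sketch_encode_form(pairs, separator = '&')
  pairs.map do |key, value|
    "#{URI.encode_www_form_component(key)}=#{URI.encode_www_form_component(value)}"
  end.join(separator)
end

p sketch_encode_form([['q', 'ruby lang'], ['page', '2']])        # => "q=ruby+lang&page=2"
p sketch_encode_form([['q', 'ruby lang'], ['page', '2']], ';')   # => "q=ruby+lang;page=2"
```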

* URI.decode_www_form

* I naturally expected this to exist, but it does not seem to.
Why is it that URI.encode_www_form exists but this one does not?
--
[田中 哲][たなか あきら][Tanaka Akira]
NARUSE, Yui
15 years ago
Post by Tanaka Akira
I have a few comments on the methods for handling application/x-www-form-urlencoded.
* URI.encode_www_component
It should contain the word "form"; how about URI.encode_www_form_component,
for example?
Hmm, I will change it.
...
I will change it.
...
Right. Shall we make the default UTF-8?
Post by Tanaka Akira
* URI.encode_www_form
* The appendix of the HTML specification also describes using ; as the separator,
http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2
I think it is fine without it, though.
I did consider letting the caller specify the separator as a second argument,
but for now I will leave it out.
Post by Tanaka Akira
* URI.decode_www_form
* I naturally expected this to exist, but it does not seem to.
That was because I could not decide on the return value, but I will add it and have it return an Array.
In the rdoc I will explain why it is an Array and how to work with it conveniently.
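
As a short illustration of why an Array of pairs is a reasonable return value: form data may repeat a key and its order is meaningful, and a Hash would lose both. The sketch below uses URI.decode_www_form as it exists in later Ruby releases; the variable names are only for illustration.

```ruby
require 'uri'

# An Array of [name, value] pairs keeps duplicate keys and their order.
pairs = URI.decode_www_form('a=1&a=2&b=3')
p pairs                                   # => [["a", "1"], ["a", "2"], ["b", "3"]]

# Array#assoc returns the first pair for a key.
first_a = pairs.assoc('a').last           # => "1"

# Collect every value for a repeated key.
all_a = pairs.select { |key, _| key == 'a' }.map(&:last)   # => ["1", "2"]

# Convert to a Hash when duplicates do not matter (the last one wins).
as_hash = pairs.to_h                      # => {"a"=>"2", "b"=>"3"}
```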
--
NARUSE, Yui <***@airemix.jp>
Tanaka Akira
15 years ago
Post by NARUSE, Yui
Post by Tanaka Akira
* URI.encode_www_component
aren't there cases where that is a problem?
When the argument is in an ASCII-incompatible encoding such as UTF-16BE, it is plainly wrong:
I doubt this causes trouble in practice either.
I have thought about it, and this does not seem to work well.

% ./ruby -ruri -e '
v = URI.encode_www_form_component("aあ".encode("UTF-16BE"))
puts v.dump, v.encoding'
"\x00a%30%42"
US-ASCII

In this result, the first 2 bytes are the character "a" in UTF-16BE,
and what follows is the 6 characters %30%42 in US-ASCII.
UTF-16BE and US-ASCII are mixed inside a single string,
which is plainly odd.

Consider the following part of http://www.w3.org/TR/html5/forms.html#url-encoded-form-data:

4. For each character in the entry's name and value, apply the following
subsubsteps:

1. If the character isn't in the range U+0020, U+002A, U+002D, U+002E,
U+0030 to U+0039, U+0041 to U+005A, U+005F, U+0061 to U+007A then
replace the character with a string formed as follows: Start with
the empty string, and then, taking each byte of the character when
expressed in the selected character encoding in turn, append to the
string a U+0025 PERCENT SIGN character (%) followed by two
characters in the ranges U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE
(9) and U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL
LETTER F representing the hexadecimal value of the byte
(zero-padded if necessary).

2. If the character is a U+0020 SPACE character, replace it with a
single U+002B PLUS SIGN character (+).

Reading this part, the question is what the "selected character encoding" is.
A little earlier in the same document, it says that some ASCII-compatible
character encoding is chosen as the selected character encoding.

UTF-16BE is plainly not an ASCII-compatible character encoding, but suppose we
ignore that and select it anyway. Here is what happens for each character
("a", "あ") of the string "aあ":
"a" is U+0061, which is in the list, so it is left as it is.
"あ" is U+3042, which is not in the list, so each byte of the character as
expressed in the selected character encoding (UTF-16BE here), namely (0x30, 0x42),
is written in %HH form, which gives "%30%42".
Putting these together, the result is "a%30%42".
(There is no U+0020 in the string, so nothing becomes "+".)

Now consider RFC 3986 (URI).
The ASCII characters corresponding to %30 and %42 are unreserved,
so whether or not they are percent-encoded does not change the meaning.
That is, %30 is equivalent to 0 and %42 is equivalent to B.
Decoding the string "a%30%42" therefore has to yield "a0B".
In other words, the original string "aあ" does not get through.

So there is a reason why an ASCII-compatible character encoding is supposed to be chosen.
Ignoring that requirement is not a good idea.
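
The walk-through above can be reproduced with a few lines of Ruby. This is only a sketch of the reasoning, not the uri library's code; the constant SPEC_SAFE_CODEPOINTS is a hypothetical name, and space handling is omitted because the example string contains no U+0020.

```ruby
# Illustrative sketch only; SPEC_SAFE_CODEPOINTS is a hypothetical name.
# Code points that the quoted HTML5 step leaves unescaped (space omitted here).
SPEC_SAFE_CODEPOINTS = [0x2A, 0x2D, 0x2E, 0x5F,
                        *(0x30..0x39), *(0x41..0x5A), *(0x61..0x7A)].freeze

utf16 = "a\u3042".encode(Encoding::UTF_16BE)   # "aあ" in UTF-16BE
p utf16.bytes                                   # => [0, 97, 48, 66]

# Per-character encoding, deliberately using UTF-16BE as the selected encoding.
encoded = utf16.each_char.map do |ch|
  if SPEC_SAFE_CODEPOINTS.include?(ch.ord)
    ch.ord.chr                                  # "a" is kept as an ASCII "a"
  else
    ch.bytes.map { |b| format('%%%02X', b) }.join
  end
end.join
p encoded                                       # => "a%30%42"

# Plain percent-decoding cannot tell that %30%42 was one UTF-16BE character.
decoded = encoded.gsub(/%(\h\h)/) { $1.hex.chr }
p decoded                                       # => "a0B"
```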

The countermeasures that come to mind are roughly these:

* Convert to UTF-8 automatically
* Percent-encode every byte, as is done for ISO-2022-JP and the like
* Raise an exception

Note that I think the current behavior of percent-encoding every byte for non-Unicode
ASCII-incompatible encodings such as ISO-2022-JP does not need to change.
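
A minimal sketch of the first option in the list above, assuming that only UTF-16 and UTF-32 inputs are transcoded. The helper name sketch_encode_via_utf8 and the constant UTF8_SKETCH_SAFE_BYTES are hypothetical, and the sketch does not cover ISO-2022-JP and similar encodings.

```ruby
# Illustrative sketch only; names are hypothetical and this is not
# the uri library's implementation.
UTF8_SKETCH_SAFE_BYTES = [*('0'..'9'), *('A'..'Z'), *('a'..'z'), '*', '-', '.', '_'].map(&:ord).freeze

def sketch_encode_via_utf8(str)
  # Assumption: only UTF-16/UTF-32 variants are converted to UTF-8 first.
  str = str.encode(Encoding::UTF_8) if str.encoding.name.start_with?('UTF-16', 'UTF-32')
  str.b.each_byte.map do |byte|
    if byte == 0x20
      '+'
    elsif UTF8_SKETCH_SAFE_BYTES.include?(byte)
      byte.chr
    else
      format('%%%02X', byte)
    end
  end.join
end

p sketch_encode_via_utf8("a\u3042".encode(Encoding::UTF_16BE))   # => "a%E3%81%82"
```

After the conversion every escaped byte is 0x80 or above, so the result cannot be confused with unreserved ASCII characters when it is decoded.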
Post by NARUSE, Yui
Post by Tanaka Akira
* URI.decode_www_component
* Shouldn't it be possible to specify an encoding as the second argument?
Right. Shall we make the default UTF-8?
It is not a default; it is being forced.

% ./ruby -ruri -e '
v = URI.decode_www_form_component("%A1%A2", "EUC-JP")
p v
p v.encoding
'
"\xA1\xA2"
#<Encoding:UTF-8>

It goes to the trouble of accepting an enc argument, but does not use it.

Also, having TBLDECWWWCOMP_['+'] = ' ' if i == 0x20
inside the loop is wasteful.
--
[田中 哲][たなか あきら][Tanaka Akira]
NARUSE, Yui
15 years ago
...
Following the rule of matching existing implementations when in doubt, I decided to convert to UTF-8.
Also, regarding the Shift_JIS encoding pointed out in the earlier mail,
I reverted it so that "\x83\x41" becomes "%83A".
Post by Tanaka Akira
Post by NARUSE, Yui
Post by Tanaka Akira
* URI.decode_www_component
* Shouldn't it be possible to specify an encoding as the second argument?
Right. Shall we make the default UTF-8?
Oops, you are right. Fixed.
Post by Tanaka Akira
Also, having TBLDECWWWCOMP_['+'] = ' ' if i == 0x20
inside the loop is wasteful.
Fixed.
--
NARUSE, Yui <***@airemix.jp>
NARUSE, Yui
15 years ago
...
About this: the point raised has been reflected in the latest HTML5, which now encodes byte by byte.
http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#url-encoded-form-data
--
NARUSE, Yui <***@airemix.jp>