review.tizen.org Git - platform/upstream/perl.git/commit

author	Father Chrysostomos <sprout@cpan.org>
	Tue, 23 Jul 2013 20:15:34 +0000 (13:15 -0700)
committer	Father Chrysostomos <sprout@cpan.org>
	Sun, 25 Aug 2013 19:22:40 +0000 (12:22 -0700)
commit	25fdce4a165b6305e760d4c8d94404ce055657a0
tree	7c3aa76b83b1518991bf23909ee072c55de29138
parent	428ccf1e2d78d72b07c5e959e967569a82ce07ba

Stop pos() from being confused by changing utf8ness

The value of pos() is stored as a byte offset. If it is stored on a
tied variable or a reference (or glob), then the stringification could
change, resulting in pos() now pointing to a different character
offset or pointing to the middle of a character:

$ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, a; print pos $x'
2
$ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, "\x{1000}"; print pos $x'
Malformed UTF-8 character (unexpected end of string) in match position at -e line 1.
0

So pos() should be stored as a character offset.

The regular expression engine always expects byte offsets, so store
pos() as a byte offset when possible (a pure, non-magical string) and
as a character offset otherwise.

This does result in more complexity than I should like, but the
alternative (always storing a character offset) would slow down
regular expressions, which is a big no-no.

dump.c
embed.fnc
embed.h
ext/Devel-Peek/t/Peek.t
inline.h
mg.c
mg.h
pp.c
pp_ctl.c
pp_hot.c
proto.h
regexec.c
regexp.h
sv.c
sv.h
t/op/pos.t