utf8: add tests for behavior change in v5.15.6-407-gc710240, and more
In v5.15.6-407-gc710240 Father Chrysostomos patched utf8::decode() so
that it calls SvPV_force_nolen() on its argument. This means that calling
utf8::decode() with a non-blessed non-overloaded reference now coerces
the reference scalar itself to a string, i.e. before we'd get:
$ ./perl -Ilib -MDevel::Peek -wle 'use strict; print $]; my $s = shift; my $s_ref = \$s; utf8::decode($s_ref); Dump $s_ref; print $$s_ref' ævar
5.019011
SV = IV(0x2579fd8) at 0x2579fe8
REFCNT = 1
FLAGS = (PADMY,ROK)
RV = 0x25c33d8
SV = PV(0x257ab08) at 0x25c33d8
REFCNT = 2
FLAGS = (PADMY,POK,pPOK)
PV = 0x25a1338 "\303\246var"\0
CUR = 5
LEN = 16
ævar
But now that utf8::decode() calls SvPV_force_nolen(sv) we instead get:
$ ./perl -Ilib -MDevel::Peek -wle 'use strict; print $]; my $s = shift; my $s_ref = \$s; utf8::decode($s_ref); Dump $s_ref; print $$s_ref' ævar
5.019011
SV = PVIV(0x140e4b8) at 0x13e7fe8
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
IV = 0
PV = 0x140c578 "SCALAR(0x14313d8)"\0
CUR = 17
LEN = 24
Can't use string ("SCALAR(0x14313d8)") as a SCALAR ref while "strict refs" in use at -e line 1.
I think this is arguably the right thing to do. Previously we wouldn't
actually UTF-8 decode the referenced scalar, so this reveals bugs in code
that passed references to utf8::decode(). What you want to do instead is
dereference first:
$ ./perl -CO -Ilib -MDevel::Peek -wle 'use strict; print $]; my $s = shift; my $s_ref = \$s; utf8::decode($$s_ref); Dump $s_ref; print $$s_ref' ævar
5.019011
SV = IV(0x1aa8fd8) at 0x1aa8fe8
REFCNT = 1
FLAGS = (PADMY,ROK)
RV = 0x1af23d8
SV = PV(0x1aa9b08) at 0x1af23d8
REFCNT = 2
FLAGS = (PADMY,POK,pPOK,UTF8)
PV = 0x1ad0338 "\303\246var"\0 [UTF8 "\x{e6}var"]
CUR = 5
LEN = 16
ævar
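For callers that only hold a reference, a minimal sketch of the same fix
as a standalone script. The helper name decode_referent() is hypothetical
(it is not part of the utf8 API or of this patch), and the input bytes are
hard-coded here instead of being taken from the command line:

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: decode the scalar that
# $ref points at instead of handing the reference to utf8::decode().
sub decode_referent {
    my ($ref) = @_;
    die "expected a SCALAR reference\n" unless ref($ref) eq 'SCALAR';
    return utf8::decode($$ref);    # operates on the referent in place
}

my $s     = "\303\246var";         # UTF-8 encoded bytes for "ævar"
my $s_ref = \$s;

my $ok = decode_referent($s_ref);

# Report only ASCII so no output layer (-CO) is needed.
printf "decoded=%s is_utf8=%s length=%d still_ref=%s\n",
    ($ok                      ? "yes" : "no"),
    (utf8::is_utf8($s)        ? "yes" : "no"),
    length($s),
    (ref($s_ref) eq 'SCALAR'  ? "yes" : "no");
```

With this, $s_ref stays a reference and $s ends up as a four-character
string with the UTF8 flag on, matching the dump above.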
However, I think we should be more consistent here: e.g. we die when
utf8::upgrade() is passed a reference, but utf8::downgrade() just
passes it through. I'll file a bug for that separately.