Steve Hay [Wed, 10 Oct 2012 12:56:36 +0000 (13:56 +0100)]
Fix VC compilation of universal.c as C++ following commit 613875e219
Steve Hay [Wed, 10 Oct 2012 12:48:00 +0000 (13:48 +0100)]
Minor tidy-ups from 624a1c42c1
Fix indents and update comments to reflect the fact that the compiler's
msvcr*.dll is now used rather than loading msvcrt.dll too.
Daniel Dragan [Tue, 9 Oct 2012 09:15:55 +0000 (05:15 -0400)]
clean up vmem.h, remove unused instrumentation hooks
Removed 'virtual'. Removed the dynamic loading of msvcrt.dll and the
function pointers, replacing them with the compiler's native CRT malloc.
Moved the critical-section (CS) parts into _USE_LINKED_LIST blocks: there
is nothing to protect if we aren't putting headers on. This commit results
in faster startup. Before: .text be8df, .rdata 21171; after: .text be88f,
.rdata 21121. As an experiment I turned off _USE_LINKED_LIST; it compiled
successfully and passed the t/op/*.t tests (the only ones I ran). I did
not try the Knuth stuff. See also this message by Jan Dubois:
https://rt.perl.org/rt3/Ticket/Display.html?id=88840#txn-1144384
Steve Hay [Wed, 10 Oct 2012 08:07:18 +0000 (09:07 +0100)]
Fixes to enable building win32 files as C++ with VC
Mostly providing explicit casts where required by VC with /TP option,
plus one renamed variable (can't have a variable called 'new' in C++).
Daniel Dragan [Tue, 9 Oct 2012 23:00:24 +0000 (19:00 -0400)]
add const to Perl_boot_core_UNIVERSAL's xsub registration struct
Move struct xsub_details details from RW memory to RO memory. This
increases the number of bytes of the image that the OS can share between
Perl processes; conversely, each perl process takes less process-specific
memory. I saw no change in .text; .rdata went from 0x21121 to 0x21391
(+0x270) and .data went from 0x3b18 to 0x38b8 (-0x260). 32-bit Visual C.
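A minimal C sketch of the idea, using hypothetical names (struct xsub_entry, xsub_table, find_xsub) rather than the real universal.c definitions: declaring the table object itself const lets the compiler place it in a read-only section that the OS can share between processes. Exactly where it lands (.rdata, .rodata, or a relocated read-only section when it holds function pointers) depends on the compiler and linker; the numbers above are the only measurement here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for an XSUB registration table; the names and
 * fields are illustrative, not Perl's real universal.c definitions. */
typedef void (*xsub_fn)(void);

static void xs_example_a(void) { /* body omitted */ }
static void xs_example_b(void) { /* body omitted */ }

struct xsub_entry {
    const char *name;   /* Perl-visible sub name */
    xsub_fn     xsub;   /* C implementation */
    const char *proto;  /* prototype string, or NULL for none */
};

/* The table object itself is const, so the compiler may place it in a
 * read-only section instead of writable .data. */
static const struct xsub_entry xsub_table[] = {
    { "Example::a", xs_example_a, NULL },
    { "Example::b", xs_example_b, "$"  },
};

#define XSUB_TABLE_LEN (sizeof xsub_table / sizeof xsub_table[0])

/* Linear lookup by name; returns NULL when the name is not registered. */
static const struct xsub_entry *find_xsub(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < XSUB_TABLE_LEN; i++)
        if (strcmp(xsub_table[i].name, name) == 0)
            return &xsub_table[i];
    return NULL;
}
```

Nothing in the lookup code changes; only the const qualifier on the table decides which section it can live in.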
Father Chrysostomos [Tue, 9 Oct 2012 20:34:54 +0000 (13:34 -0700)]
[perl #26986] Skip subst const repl optimisation for logops
pm_runtime iterates through the ops that make up the replacement part
of s///, to see whether the ops on the rhs can have side effects or
contain match vars (in which case they must only be evaluated after the
pattern). If they do not have side-effects, the rhs is presumed to be
constant and evaluated first, and then pp_subst hangs on to the return
value and reuses it in each iteration of s///g.
This iteration simply follows op_next pointers. Logops are not that
simple, so it is possible to hide match vars inside them, resulting in
incorrect optimisations:
"g" =~ /(.)/;
@l{'a'..'z'} = 'a'..'z';
$_ = "hello";
s/(.)/$l{$a||$1}/g;
print;
__END__
ggggg
This commit skips the optimisation whenever a logop is present.
This does not fix all the optimisation problems. See ticket #49190.
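To see why following op_next alone is not enough, here is a deliberately tiny, hypothetical model in C (struct op, its next and other fields, and the K_* kinds are illustrative, not Perl's OP layout). It encodes the replacement $l{$a||$1} from the example above: the || op's next pointer goes straight to the hash element fetch, while $1 is reachable only through its other branch.

```c
#include <stdbool.h>
#include <stddef.h>

/* A tiny, hypothetical model of an op chain; NOT Perl's OP structure.
 * `next` plays the role of op_next (execution order) and `other` the role
 * of a logop's op_other (the alternative branch). */
enum op_kind { K_GVSV, K_OR, K_MATCHVAR, K_HELEM };

struct op {
    enum op_kind kind;
    const struct op *next;
    const struct op *other;   /* only set for logops such as K_OR */
};

/* The scan as described: follow `next` only, looking for match vars. */
static bool naive_sees_matchvar(const struct op *o) {
    for (; o != NULL; o = o->next)
        if (o->kind == K_MATCHVAR)
            return true;
    return false;
}

/* The conservative rule: any logop (or match var) disables the
 * constant-replacement optimisation. */
static bool replacement_is_constant(const struct op *o) {
    for (; o != NULL; o = o->next)
        if (o->kind == K_MATCHVAR || o->kind == K_OR)
            return false;
    return true;
}

/* $l{$a||$1}: $a, then ||, whose `other` branch evaluates $1; both paths
 * continue at the hash element fetch. */
static const struct op *example_rhs(void) {
    static const struct op helem    = { K_HELEM,    NULL,   NULL };
    static const struct op matchvar = { K_MATCHVAR, &helem, NULL };
    static const struct op or_op    = { K_OR,       &helem, &matchvar };
    static const struct op gvsv     = { K_GVSV,     &or_op, NULL };
    return &gvsv;
}
```

The naive scan visits $a, || and the hash fetch, never meets $1, and so wrongly concludes the rhs is constant; refusing whenever a logop appears avoids that at the cost of skipping some safe cases.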
Daniel Dragan [Mon, 8 Oct 2012 06:13:22 +0000 (02:13 -0400)]
have embed.pl add PERL_CALLCONV_NO_RET to noreturn statics
In commit 12a2785c7e8, PERL_CALLCONV_NO_RET was added to allow MS Visual
C's noreturn to work. In that commit, statics did not get a
PERL_CALLCONV_NO_RET, so Visual C may not always figure out that a certain
static is noreturn. This patch fixes that and allows statics to be Visual
C noreturns. I observed a drop in the .text section from 0xBEAAF to
0xBE8CF on a non-DEBUGGING 32-bit VC 2003 build with -O1 -GL/-LTCG after
applying this.
Karl Williamson [Tue, 9 Oct 2012 17:07:18 +0000 (11:07 -0600)]
perlreapi.pod: grammar and other nits
Karl Williamson [Sat, 6 Oct 2012 21:06:53 +0000 (15:06 -0600)]
regexec.c: White-space only; comment only; no code changes
Karl Williamson [Sun, 7 Oct 2012 20:08:42 +0000 (14:08 -0600)]
regexec.c: Refactor slightly for clarity
This reverses the sense of an if...else, so that the tiny, trivial code
is done after the 'if', and the larger, non-trivial part is done in the
else. This makes it easier to understand. It also makes it clear that the
label and goto are no longer needed, if they ever were.
Karl Williamson [Sat, 6 Oct 2012 21:08:19 +0000 (15:08 -0600)]
utf8.c: Remove an unnecessary conditional
The unconditional array lookup is clearer and, I suspect, faster than
testing for a special case.
Karl Williamson [Thu, 4 Oct 2012 04:17:19 +0000 (22:17 -0600)]
PATCH: [perl #114982]: case-insensitive regex bug with UTF8-flagged strings
It turns out that this whole area in regexec.c has never worked
properly, we just didn't have test coverage for it. Indeed, the portion
that deals with CURLYM had never been updated to include UTF-8!
On the blead that existed just prior to this commit, the failure rate for
the new tests added by this commit was 96% (4% passed).
This commit refactors the part of regexec.c that deals with quantifiers
and case insensitive matching, and adds comprehensive tests.
Consider a regex of the form A*B. How do we know how many of A* to
match? The answer is we don't, until we try B. But if B begins with
text, we can easily rule out places where the next thing isn't a B.
Only if the next character (following where we are) is the first
character of B, is this a potential match; otherwise it isn't, so there
is no need to even try B.
Code existed to do this, but it was wrong in many ways. If B can match
case-insensitively under /i, then there may be two or more possible
characters that could begin B. The previous code assumed there was a
max of two. It didn't adequately account for the differences between
/a, /d, /l, and /u; and as I mentioned above, in one place it didn't
know that there was such a thing as UTF-8. Also, it assumed that all
code points can fit into an I32.
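A sketch of the generalized check, assuming the set of possible first characters of B has already been computed (the helper names are hypothetical, and this is not the regexec.c code). Code points are held in an unsigned 64-bit type, and the candidate set is an arbitrary-length array rather than at most two values; under /u rules, for instance, 'k', 'K' and U+212A KELVIN SIGN all fold together.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned 64-bit code points: nothing assumes they fit in an I32. */
typedef uint64_t codepoint;

/* Can `next` begin B?  `starts` holds every code point B may begin with
 * (already expanded for /i); there may be any number of them. */
static bool could_start_B(codepoint next, const codepoint *starts, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (starts[i] == next)
            return true;
    return false;
}

/* While extending A*, only positions whose next character could begin B
 * are worth an attempt at B; count how many such attempts there are. */
static size_t attempts_needed(const codepoint *text, size_t len,
                              const codepoint *starts, size_t n) {
    size_t attempts = 0;
    for (size_t i = 0; i < len; i++)
        if (could_start_B(text[i], starts, n))
            attempts++;
    return attempts;
}
```

For the text a, K, b, KELVIN SIGN, k only three positions are worth trying B at; a two-candidate check would have missed the KELVIN SIGN position.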
To do this right takes quite a bit more intelligence than was there;
and it needs to be done in two different places. So, this commit
extracts out the code into a single subroutine, heavily modified to
account for the various oversights that were there previously.
The pre-existing infrastructure of fold_grind.t allowed the tests to be
added easily. I did not add tests for the above-I32 problems, as it
turns out that there are other bugs affecting these; see [perl #115166].
Karl Williamson [Sat, 6 Oct 2012 21:03:06 +0000 (15:03 -0600)]
Allow _swash_inversion_hash() to be called in regexec.c
To prevent this very-internal core function from being used by XS
writers, it isn't defined unless the preprocessor indicates it is
compiling certain .c files. Add regexec.c to the list.
Karl Williamson [Sat, 6 Oct 2012 20:57:38 +0000 (14:57 -0600)]
regex: Allow any single char to be SIMPLE
This commit relaxes the previous requirement that an EXACTish node must
contain a single Latin1-range character in order to be considered
SIMPLE. Now it allows any single character, not just Latin1.
This allows above-Unicode characters to be in optimizations like STAR or
CURLY, instead of having to match with the more complex CURLYM; and it
brings EXACTish nodes in alignment with other SIMPLE nodes, such as
those matching \w or the dot metacharacter, which all along have
supported any code point being SIMPLE.
Karl Williamson [Sat, 6 Oct 2012 17:21:02 +0000 (11:21 -0600)]
regcomp.c: Slightly relax restriction of SIMPLE nodes
Currently all EXACTish nodes that are SIMPLE must be a single UTF-8
invariant character. It turns out that the code works not just for
these, but for all Latin1 characters (when the pattern isn't UTF-8)
except the SHARP S under /d folding.
SIMPLE nodes allow for better optimization possibilities, such as CURLY
instead of CURLYM.
There is still a discrepancy in that non-EXACTish nodes that match a
single character, such as the dot (SANY), can be SIMPLE, but EXACTish
nodes have to be just a single byte.
Karl Williamson [Thu, 4 Oct 2012 17:01:57 +0000 (11:01 -0600)]
regexec.c: Turn test into an assertion
Commit 31c15ce5372b770c3ca899df6cf102f1ed6866ba should have made it so
that the situation of a quantifier {m,n} with m>n never happens. Remove
the check for it, but replace it with an assertion.
Karl Williamson [Thu, 4 Oct 2012 04:24:21 +0000 (22:24 -0600)]
regexec.c: White-space only
Add some spacing for clarity.
Karl Williamson [Thu, 4 Oct 2012 01:42:39 +0000 (19:42 -0600)]
regexec.c: indent properly and reflow some comments to 80 cols
Karl Williamson [Wed, 3 Oct 2012 22:35:00 +0000 (16:35 -0600)]
regcomp.c: Refactor join_exact() to handle all multi-char folds
join_exact() prior to this commit returned a delta for 3 problematic
sequences showing that the minimum length they match is less than their
nominal length. It turns out that this is needed for all
multi-character fold sequences; our test suite just did not have the
tests in it to show that. Tests that do show this will be added in a
future commit, but code elsewhere must be fixed before they pass.
Karl Williamson [Wed, 3 Oct 2012 20:42:51 +0000 (14:42 -0600)]
regen/regcharclass.pl: Generate macros for multi-char fold sequences
These will be used in future commits
Karl Williamson [Wed, 3 Oct 2012 03:48:26 +0000 (21:48 -0600)]
Add regen/regcharclass_multi_char_folds.pl
This takes as input the current Unicode Character Database, and outputs
lists of the multi-character folds in it, in a form suitable for input
to regen/regcharclass.pl
Karl Williamson [Wed, 3 Oct 2012 03:45:45 +0000 (21:45 -0600)]
regen/regcharclass.pl: Simplify regex
There doesn't need to be a quantifier or capturing on this regex.
Karl Williamson [Wed, 3 Oct 2012 03:42:59 +0000 (21:42 -0600)]
regen/regcharclass.pl: Add ability for more complex inputs
This adds the capability to get input to this program from another
program, thus allowing essentially arbitrary input.
This will be used in future commits.
Karl Williamson [Sun, 30 Sep 2012 15:41:51 +0000 (09:41 -0600)]
regcomp.c: min len is chars, not bytes
The traditionally-called tricky folds occur because, under /i, a
6-byte/3-character sequence can match a 2-byte/1-character sequence.
The code here has assumed that the delta quantity is measured in bytes
(6-2=4), whereas everywhere else (AFAICT) assumes the measure is to be
in characters (3-2=1).
Father Chrysostomos [Tue, 9 Oct 2012 06:05:04 +0000 (23:05 -0700)]
[perl #114658] perl5180delta: Mention B::Hooks::Parser
Father Chrysostomos [Tue, 9 Oct 2012 05:47:13 +0000 (22:47 -0700)]
[perl #114632] perl5180delta: Mention B::Generate in known problems
Father Chrysostomos [Tue, 9 Oct 2012 05:44:32 +0000 (22:44 -0700)]
Begin perl5180delta
(I have an ulterior motive. I need somewhere to list broken CPAN modules
once patches are submitted, so I can close RT tickets.)
Peter Martini [Tue, 9 Oct 2012 02:31:37 +0000 (22:31 -0400)]
Clarify that in-place editing actually creates a new file.
If the in-place editing dies, the original is gone.
Another implication of this is that hard links on UNIX
won't work properly, since a new inode will be generated.
I think that's a little too specific to spell out in the docs,
though.
Daniel Dragan [Mon, 8 Oct 2012 20:21:03 +0000 (16:21 -0400)]
remove redundant calls in S_minus_v in perl.c
Commit b0e47665895 added a large number of redundant calls to
PerlIO_stdout(). Fix: cache stdout. Also, commit 46807d8e809 added
multiple calls to SvPV and sv_len on the same 2 scalars, which do not
change between those calls. Since sv_len is a wrapper around SvPV*, just
call SvPV once on each scalar and cache everything.
Ruslan Zakirov [Fri, 5 Oct 2012 22:30:20 +0000 (02:30 +0400)]
use HVhek_KEYCANONICAL in hv_delete
Ruslan Zakirov [Fri, 5 Oct 2012 22:30:18 +0000 (02:30 +0400)]
there is no obvious reason not to set flags
I don't see any reason not to set flags properly in this
branch. It doesn't look like any useful optimization.
It's probably even a bug, but it can probably only be hit from
XS code. To hit the bug, keysv must be provided, be UTF8
and not SvIsCOW_shared_hash, but with flags containing
HVhek_KEYCANONICAL.
Ruslan Zakirov [Fri, 5 Oct 2012 22:30:17 +0000 (02:30 +0400)]
no need to get shared hash value here
hv_common does it later from the keysv. Also,
there are quite a few cases where the hash cannot
be trusted.
Ruslan Zakirov [Fri, 5 Oct 2012 22:30:19 +0000 (02:30 +0400)]
use && rather than &
Smylers [Thu, 4 Oct 2012 10:52:01 +0000 (11:52 +0100)]
No colon at end of subheading
Grammatically unnecessary, and it caused the HTML page anchor to end in a
colon:
https://metacpan.org/module/RJBS/perl-5.16.0/pod/perldelta.pod#Platforms-with-no-supporting-programmers:
That confuses some create-URLs-from-plain-text parsers, such as list
archives and mail clients, which don't see the trailing : as part of the
URL.
Daniel Dragan [Sun, 7 Oct 2012 16:06:24 +0000 (12:06 -0400)]
Merge 2 gv_fetch* calls in Perl_newXS_len_flags
Merge a gv_fetchpvn and a gv_fetchpv. This avoids a strlen call in
gv_fetchpv, and gives shorter machine code in Perl_newXS_len_flags,
because 2 different call destinations were merged into 1 and the
"GV_ADDMULTI | flags, SVt_PVCV" arguments are now evaluated
unconditionally rather than inside a branch.
Craig A. Berry [Sun, 7 Oct 2012 02:42:56 +0000 (21:42 -0500)]
Perl_sv_mortalcopy expects a return value.
Courtesy of the OpenVMS C compiler, which said:
SV *
^
%CC-I-MISSINGRETURN, Non-void function "Perl_sv_mortalcopy" does not contain a return statement.
at line number 1206 in file D0:[craig.blead]mathoms.c;1
Aaron Crane [Sun, 7 Oct 2012 11:40:52 +0000 (12:40 +0100)]
Add TODO tests for RT#115156
Father Chrysostomos [Sun, 7 Oct 2012 07:31:48 +0000 (00:31 -0700)]
Fix infinite loop with $tied =~ s/non-utf8/utf8/
Commit 3e462cdc208 fixed bug #41530 (s/non-utf8/utf8/ was not working
properly at all) by upgrading the target and redoing the substitution
if the replacement was utf8 and the target was not.
Commit c95ca9b8cd1 fixed one problem with it calling get-magic too
many times, by checking whether the upgrade caused a string
reallocation and only then redoing the substitution. But it only fixed
it when magic returns a pure ASCII string.
Redoing the substitution meant going back to where the target was
initially stringified and starting again. That meant calling get-magic
again.
So in those cases where magic returned something other than a UTF8 or
pure ASCII string, the substitution restarted and magic would be
triggered again, possibly resulting in infinite loops (because the target
would have to be upgraded again, resulting in a reallocation and a restart).
This happens with:
• Latin-1 strings
• Copy-on-write non-UTF8 strings
• References that stringify without UTF8
c95ca9b8cd1 also used SvPVX without first checking that the target is
SvPVX-able, so a typeglob causes an assertion failure.
It turned out that there were also two other places in pp_subst that
were calling FETCH a second time (the tests I added for the looping/
assertion bugs found this), so I changed them, too.
Karl Williamson [Sat, 6 Oct 2012 18:28:26 +0000 (12:28 -0600)]
sv.c: perlapi pod grammar
Karl Williamson [Sat, 6 Oct 2012 16:06:57 +0000 (10:06 -0600)]
regexec.c: PATCH: [perl #114808]
Commit c72077c4fff72b66cdde1621c62fb4fd383ce093 fixed a place where
to_byte_substr() fails, but the code continued as if it had succeeded.
There is yet another place where the return is not checked. This commit
adds a check there.
However, it turns out that there is another underlying problem to
[perl #114808]. The function to_byte_substr() tries to downgrade the
substr fields in the regex program it is passed. If it fails (because
something in it is expressible only in UTF-8), it permanently changes
that field to point to PL_sv_undef, thus losing the original
information. This is fine as long as the program will be used once and
discarded. However, there are places where the program is re-used, as
in the test case introduced by this commit, and the original value has
been lost.
To solve this, this commit also changes to_byte_substr() from returning
void to instead returning bool, indicating success or failure. On
failure, the original substrs are left intact.
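The bool-returning pattern can be sketched with a hypothetical UTF-8 to Latin-1 downgrade (utf8_to_latin1 is illustrative, not to_byte_substr itself): the input is only read, the result length is reported only on success, and a false return tells the caller to keep using the original UTF-8 data.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical downgrade: convert UTF-8 to Latin-1 into `out` (capacity at
 * least `len`).  Returns true and sets *out_len on success.  Returns false
 * if any character is above U+00FF (or the input is malformed); `out` may
 * then be partially written, but *out_len is untouched and the input is
 * never modified, so no information is lost. */
static bool utf8_to_latin1(const unsigned char *in, size_t len,
                           unsigned char *out, size_t *out_len) {
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[o++] = b;                       /* ASCII: copy as-is */
        } else if ((b == 0xC2 || b == 0xC3) && i + 1 < len
                   && (in[i + 1] & 0xC0) == 0x80) {
            /* two-byte sequence for U+0080..U+00FF */
            out[o++] = (unsigned char)(((b & 0x1F) << 6) | (in[i + 1] & 0x3F));
            i++;
        } else {
            return false;                       /* needs UTF-8 */
        }
    }
    *out_len = o;
    return true;
}
```

Because failure is reported rather than recorded by overwriting the source, a caller that re-uses the same data later still has the original to work with.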
The calls to this function are correspondingly changed. One of them had
a trace statement for when the failure happens; I reworded it to be more
general and accurate (it was slightly misleading) and added the trace
to every such place, not just the one.
In addition, I found the use of the same ternary operation in 3 or 4
consecutive lines very hard to understand; it is also inefficient unless
the compiler optimizes away the recalculation. So I expanded the several
nearly identical places in the code that do that, so that I could quickly
see what is going on.
Aaron Crane [Wed, 12 Sep 2012 15:04:38 +0000 (16:04 +0100)]
Fix spurious "uninitialized value" warning in regex match
The warning appeared if the pattern contains a floating substring for
which utf8 is needed, and the target string isn't in utf8. In this
situation, downgrading the floating substring yields undef, which
triggers the warning.
Matching can't succeed in this situation, because it's impossible for
the non-utf8 target string to contain any string which needs utf8 for
its own representation. So the warning is quelled by aborting the match
early.
Anchored substrings already have a check of this form; this commit makes
the corresponding change in the floating-substring case.
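A sketch of the early exit, with hypothetical helper names: in well-formed UTF-8, any character above U+00FF begins with a lead byte of 0xC4 or higher, so a byte scan is enough to tell that a required substring cannot occur in a non-UTF-8 target.

```c
#include <stdbool.h>
#include <stddef.h>

/* Does this well-formed UTF-8 string contain a character above U+00FF?
 * Continuation bytes are 0x80..0xBF and lead bytes 0xC2/0xC3 encode
 * U+0080..U+00FF, so only lead bytes >= 0xC4 indicate "needs UTF-8". */
static bool needs_utf8(const unsigned char *s, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (s[i] >= 0xC4)
            return true;
    return false;
}

/* A non-UTF-8 target holds only characters up to U+00FF, so a required
 * substring that needs UTF-8 can never occur in it: fail immediately
 * instead of downgrading the substring (which would yield undef). */
static bool match_can_possibly_succeed(bool target_is_utf8,
                                       const unsigned char *substr_utf8,
                                       size_t substr_len) {
    if (!target_is_utf8 && needs_utf8(substr_utf8, substr_len))
        return false;
    return true;   /* otherwise fall through to the real matcher */
}
```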
James E Keenan [Sat, 6 Oct 2012 03:42:26 +0000 (23:42 -0400)]
Silence an uninitialized value warning.
As per precedent about 15 lines earlier in file, with respect to same
variable.
Father Chrysostomos [Fri, 5 Oct 2012 22:56:15 +0000 (15:56 -0700)]
[perl #79824] Don’t cow for sv_mortalcopy call from XS
XS code doing sv_mortalcopy(sv) will expect to get a true copy, and
not a COW ‘copy’.
So make sv_mortalcopy a wrapper around the new sv_mortalcopy_flags
that passes it SV_DO_COW_SVSETSV, which is defined as 0 for XS code.
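The wrapper-plus-flags pattern can be sketched in C with a hypothetical reference-counted string type. COPY_ALLOW_SHARE, DEFAULT_COPY_FLAGS and MY_CORE_BUILD are made-up stand-ins for the roles that SV_DO_COW_SVSETSV and the core/XS distinction play; the sketch only models sharing versus copying, not Perl's actual copy-on-write un-sharing.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string with a shareable, reference-counted buffer. */
struct buf { char *data; int refcnt; };
struct str { struct buf *b; };

#define COPY_ALLOW_SHARE 0x1   /* hypothetical: permit a COW-style copy */

/* Core builds may share buffers; everything else gets a real copy. */
#ifdef MY_CORE_BUILD
#  define DEFAULT_COPY_FLAGS COPY_ALLOW_SHARE
#else
#  define DEFAULT_COPY_FLAGS 0
#endif

static struct buf *buf_new(const char *s) {
    struct buf *b = malloc(sizeof *b);
    if (!b) return NULL;
    b->data = malloc(strlen(s) + 1);
    if (!b->data) { free(b); return NULL; }
    strcpy(b->data, s);
    b->refcnt = 1;
    return b;
}

/* The flags-taking worker. */
static bool str_copy_flags(struct str *dst, const struct str *src, int flags) {
    if (flags & COPY_ALLOW_SHARE) {
        dst->b = src->b;              /* both now point at one buffer */
        dst->b->refcnt++;
        return true;
    }
    dst->b = buf_new(src->b->data);   /* independent copy */
    return dst->b != NULL;
}

/* The public wrapper supplies the default flags, so callers outside the
 * core always get a copy they may modify freely. */
static bool str_copy(struct str *dst, const struct str *src) {
    return str_copy_flags(dst, src, DEFAULT_COPY_FLAGS);
}

static void str_release(struct str *s) {
    if (s->b && --s->b->refcnt == 0) { free(s->b->data); free(s->b); }
    s->b = NULL;
}
```

Keeping the decision in a compile-time default means existing callers of the wrapper change behaviour without any source changes of their own.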
Father Chrysostomos [Fri, 5 Oct 2012 04:57:10 +0000 (21:57 -0700)]
Increase $PerlIO::encoding::VERSION to 0.16
Father Chrysostomos [Fri, 5 Oct 2012 04:56:00 +0000 (21:56 -0700)]
Make PerlIO::encoding more resilient to buffer changes
I was trying to figure out why Encode’s perlio.t was sometimes failing
under PERL_OLD_COPY_ON_WRITE (depending on the number of comments in
the source code, or meteorological conditions).
I noticed that PerlIO::encoding assumes that the buffer passed to
the encode method will come back SvPOKp. (It accesses SvCUR without
checking any flags.)
That means it can come back as a typeglob, reference, or undefined,
and PerlIO::encoding won’t care. This can result in crashes. Assigning
$_[1] = *foo inside an encode method is not a smart thing to do,
but it shouldn’t crash.
PerlIO::encoding was also assuming that SvPVX would not change between
calls to encode. It is very easy to reallocate it. This means the
internal buffer used by the encoding layer (which is owned by the
SV buffer passed to the encode method) can be freed and still
subsequently written to, which is not good.
This commit makes PerlIO::encoding force stringification of the value
returned. If it does not match its internal buffer pointers, it
resets them based on the buffer SV.
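A minimal sketch of the resynchronisation, using hypothetical structures (struct owner for the buffer SV, struct layer for the encoding layer's cached pointers). The previously seen address is kept as a uintptr_t so that the check never has to read a pointer into memory that realloc may already have freed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical owner of a growable byte buffer (standing in for the SV). */
struct owner { char *pv; size_t cur; size_t len; };

/* Hypothetical layer state caching raw pointers into owner->pv. */
struct layer {
    struct owner *sv;
    uintptr_t seen;           /* address of owner->pv when last synced */
    char *base, *ptr, *end;
};

/* A callback may reallocate owner->pv behind the layer's back. */
static int grow_and_fill(struct owner *o, const char *text) {
    size_t n = strlen(text);
    char *p = realloc(o->pv, n + 64);   /* may move the buffer */
    if (!p) return -1;
    o->pv = p;
    memcpy(o->pv, text, n);
    o->cur = n;
    o->len = n + 64;
    return 0;
}

/* (Re)derive every cached pointer from the owner's current buffer. */
static void layer_set(struct layer *l) {
    l->base = l->sv->pv;
    l->ptr  = l->base;
    l->end  = l->base + l->sv->cur;
    l->seen = (uintptr_t)l->base;
}

/* Call after the callback: rebuild if the buffer moved, otherwise just
 * refresh the end pointer, since the length may still have changed. */
static void layer_resync(struct layer *l) {
    if ((uintptr_t)l->sv->pv != l->seen)
        layer_set(l);
    else
        l->end = l->base + l->sv->cur;
}
```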
This probably makes Encode pass its tests under
PERL_OLD_COPY_ON_WRITE, but I have yet to confirm it. Encoding modules
are expected to write to the buffer ($_[1] = '') in certain
cases. If COW is enabled, that would cause the buffer’s SvPVX to
point to the same string as the rhs, which would explain why the lack
of accounting for SvPVX changes caused test failures under
PERL_OLD_COPY_ON_WRITE.
Steve Hay [Fri, 5 Oct 2012 07:26:40 +0000 (08:26 +0100)]
Remove exports of dummy set[ug]id functions on Windows
These are surely not required by anything, and are only stub functions
anyway so can easily be provided locally by anything that really does need
them. Also hide the declarations other than when building the core itself
as per the fix for [perl #114516].
Steffen Mueller [Fri, 5 Oct 2012 06:16:22 +0000 (08:16 +0200)]
Revert to string evals: EBCDIC
Archaeology tells me that it's string evals for a reason.
commit 933bc096593f55b9633fb193815ddd81d5b5ec1b
Author: Jarkko Hietaniemi <jhi@iki.fi>
Date: Tue Nov 27 01:22:22 2001 +0000
\141 is malformed "unexpected continuation byte" in UTF-EBCDIC.
Delay the match until runtime.
Steffen Mueller [Fri, 5 Oct 2012 06:08:04 +0000 (08:08 +0200)]
Note new Data::Dumper release in Porting/Maintainers.pl
Colin Kuskie [Wed, 3 Oct 2012 01:19:14 +0000 (18:19 -0700)]
Refactor Porting/Maintainers.pm to use Test::More instead of making TAP by hand.
With a small fix from the committer: there is no point in wrapping the module loading in eval {}.
Father Chrysostomos [Thu, 4 Oct 2012 16:37:26 +0000 (09:37 -0700)]
bignum: Suppress warnings under 5.6
5.6 does not like it when a sub is declared with a prototype after a
reference to it has been taken.
5.6 does not think lowercase module names should be exempt from
reserved word warnings before ->.
Father Chrysostomos [Thu, 4 Oct 2012 16:15:10 +0000 (09:15 -0700)]
bignum overrides.t: Fix for 5.8
We use the ;$ prototype for testing global overrides under 5.8, as it
had no _ prototype. But back then (before 5.14, in fact) ;$ did not
give a function unary precedence.
Comparing against 5.009004 in bigint scope is the same as comparing
against 5, resulting in incorrect version checks and skips being
skipped.
Father Chrysostomos [Thu, 4 Oct 2012 16:12:03 +0000 (09:12 -0700)]
bigint: Fix new oct override for older Math::BigInt
Older versions of Math::BigInt required the input to from_oct to
begin with a 0.
Father Chrysostomos [Thu, 4 Oct 2012 15:34:49 +0000 (08:34 -0700)]
bignum overrides.t: Skip some tests under 5.8
Father Chrysostomos [Thu, 4 Oct 2012 07:49:37 +0000 (00:49 -0700)]
Increase bignum versions to 0.31 after the preceding change
Father Chrysostomos [Thu, 4 Oct 2012 07:35:05 +0000 (00:35 -0700)]
Rewrite bignum’s hex and oct overrides
As mentioned in <https://rt.cpan.org/Ticket/Display.html?id=79915>,
bigint.pm does not use any prototype when globally overriding hex.
This means that map { hex } ... will stop working in completely
unrelated code if bigint happens to be loaded. (Explicit $_ will
continue to work.)
I thought it would be a simple matter of adding the right prototype
depending on perl version (and inferring $_), but the basic tests
I added failed for other reasons after I fixed the prototype and
$_ handling.
It turns out this whole thing is a mess, so I have basically
reimplemented these two overrides.
What bigint, bignum and bigrat were doing was this: In import,
*CORE::GLOBAL::hex and ::oct are assigned functions that create
Math::BigInt objects if the pragma is in effect. If import is passed
'hex' or 'oct', then the function assigned does not check the pragma
hints, but simply creates Math::BigInt objects regardless.
This means that ‘use bigrat’ stops hex() and oct() from creating
objects in ‘use bigint’ scopes, and vice versa. In fact, whichever
pragma is loaded last wins. Any scopes elsewhere in the program that
use the same pragma will have special hex() and oct() behaviour. But
the other two lowercase big* pragmata will be disabled with regard to
hex and oct.
Having ‘use bigint 'hex'’ override hex globally makes no sense to me.
I have no qualms about changing it, as it was already broken. Any
subsequent ‘use bigint;’ would turn off the global override. So now
it exports hex or oct to the calling package, just like a normal
module. You can now also call bigint::hex.
Also, in writing tests I found that oct("20") gives me 20. Apparently
this was never tested properly.
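For reference, oct("20") should give 16. A small C sketch of the prefix rules perlfunc documents for oct (leading whitespace ignored, 0x or x for hex, 0b or b for binary, octal otherwise); perlish_oct is a hypothetical name, and underscores, overflow and warnings are deliberately left out.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical sketch of oct()'s prefix handling; not Perl's pp_oct. */
static unsigned long perlish_oct(const char *s) {
    while (isspace((unsigned char)*s))
        s++;                                  /* skip leading whitespace */
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'b'))
        s++;                                  /* drop the 0 of 0x / 0b */
    if (*s == 'x')
        return strtoul(s + 1, NULL, 16);      /* hexadecimal */
    if (*s == 'b')
        return strtoul(s + 1, NULL, 2);       /* binary */
    return strtoul(s, NULL, 8);               /* octal: "20" gives 16 */
}
```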
I also found notes about ‘5.9.4 or later’ when the code checked
$] > 5.009004. (Actually, in one place the code checked > 5.009003,
so I made it match, as we use the _ prototype now, which was
introduced in 5.9.5.) One was in the docs, so I changed it to 5.10.0,
since it is not helpful to mention dev versions. The docs were also
wrong to imply that ‘no bigint’ would countermand ‘use bigint 'hex'’.
Steffen Mueller [Thu, 4 Oct 2012 07:33:03 +0000 (09:33 +0200)]
Data::Dumper: Promote dev version to stable release
For staying in sync with CPAN.
Yves Orton [Wed, 3 Oct 2012 17:05:03 +0000 (19:05 +0200)]
regen/regcharclass.pl: improved optree generation
Karl Williamson noticed that we don't always deal with common suffixes in
the most efficient way. This change reworks how we convert a trie to an
optree so that common suffixes are always grouped together.
Yves Orton [Wed, 3 Oct 2012 01:40:50 +0000 (03:40 +0200)]
regen/regcharclass.pl: add comments and some minor code cleanup
Father Chrysostomos [Wed, 3 Oct 2012 15:49:17 +0000 (08:49 -0700)]
substr.t: Move two tests outside run_tests
I inadvertently moved them inside run_tests in commit 5888debfcd,
resulting in closure warnings.
Steve Hay [Wed, 3 Oct 2012 13:06:34 +0000 (14:06 +0100)]
Steve Hay [Tue, 2 Oct 2012 08:11:48 +0000 (09:11 +0100)]
Bump $ExtUtils::CBuilder::VERSION and add Changes entry for prelink() fix
Steve Hay [Tue, 2 Oct 2012 08:07:03 +0000 (09:07 +0100)]
Add missing Changes entries for ExtUtils-CBuilder per its README.patching
Eric Brine [Tue, 2 Oct 2012 02:30:04 +0000 (19:30 -0700)]
Allow a list of symbols to export to be passed to link() when on Windows, as on other OSes.
Father Chrysostomos [Tue, 2 Oct 2012 16:59:30 +0000 (09:59 -0700)]
Fix uninit warnings under old cow
Applying copy-on-write to pad names when generating uninitialised
warnings results in modifications to SvPVX(sv) after
sv_setsv(sv,padname) affecting the pad name as well. sv_setsv_flags
with no flags avoids the COW.
Under PERL_OLD_COPY_ON_WRITE and warnings, this:
my (@ma);
$v = sin $ma[1000];
$v = sin $ma[$$];
was producing this:
$ pbpaste|./miniperl -w
Use of uninitialized value $ma[1000] in sin at - line 2.
Use of uninitialized value within $ma in sin at - line 3.
If you comment out the second line:
$ pbpaste|./miniperl -w
Name "main::v" used only once: possible typo at - line 3.
Use of uninitialized value within @ma in sin at - line 3.
Notice the @ma/$ma difference.
The first uninit warning was modifying the pad name.
Father Chrysostomos [Tue, 2 Oct 2012 05:10:53 +0000 (22:10 -0700)]
Stop sv_force_normal from crashing on ro globs under old cow
This allows t/lib/universal.t to pass under PERL_OLD_COPY_ON_WRITE.
Jerry D. Hedden [Tue, 2 Oct 2012 22:58:32 +0000 (18:58 -0400)]
Upgrade to threads::shared 1.42
Father Chrysostomos [Tue, 2 Oct 2012 23:11:17 +0000 (16:11 -0700)]
Revert "Upgrade to threads::shared 1.42"
This reverts commit 34bd199a87daedeaeadd8e9ef48032c8307eaa94.
Jerry D. Hedden [Tue, 2 Oct 2012 19:33:01 +0000 (15:33 -0400)]
Upgrade to threads::shared 1.42
Father Chrysostomos [Tue, 2 Oct 2012 03:05:45 +0000 (20:05 -0700)]
perly.c: Disarm the YYDEBUG defines in perly.h
See <craigberry-E9C729.16313730092012@cpc2-bmly6-0-0-cust974.2-3.cable.virginmedia.com>.
Move the YYDEBUG defines in perly.c back where they were before, but
undefine YYDEBUG first. That leaves bison 2.6’s YYDEBUG defines in
perly.h harmless.
Father Chrysostomos [Mon, 1 Oct 2012 22:55:54 +0000 (15:55 -0700)]
substr.t: Fix for substr_thr.t
I was putting tests below run_tests by mistake. When substr_thr.t is
run, the tests below run first. Any warnings they cause will make the
no warnings tests at the top of the script fail.
Father Chrysostomos [Mon, 1 Oct 2012 21:26:20 +0000 (14:26 -0700)]
Oops; fix threaded build
Father Chrysostomos [Mon, 1 Oct 2012 19:52:10 +0000 (12:52 -0700)]
[Merge] utf8 caches and overload; other bug fixes
I began by trying to fix the remaining issues in ticket #114410,
namely, overloading interacting badly with utf8 caching. I discovered
other bugs along the way (some of which touch on #114690 and #80190),
fixing them in the process.
The core no longer uses length magic on scalars. mg_length is deprecated
(see its docs for why). The PERL_MAGIC_scalar vtable no longer
contains a length function, and Perl_magic_len is gone. mg_length has
been corrected always to return a byte count, as it did originally,
instead of characters for scalars without length magic and bytes
otherwise.
sv_pos_u2b and sv_len_utf8 now only store utf8 caches on non-magical
PVs. They used to cause bugs not only on tied and overloaded values,
but also typeglobs and non-overloaded references.
sv_or_pv_len_utf8 and sv_or_pv_pos_u2b are two new macros for calling
the non-_or_pv versions on muggle scalars and using the pv directly
for SvGMAGICAL scalars.
sv_len_utf8 now returns a character count, as documented, instead of
assuming that what is passed to it is utf8 (an unreliable assumption,
as the UTF8 flag is meaningful only *after* stringification, and
sv_len_utf8 stringifies).
length, pos, substr and sprintf now avoid triggering overloading or
ties too many times or reading the UTF8 flag at the wrong time, by
following one of these sequences:
If the scalar is going to be stringified:
• Call get-magic.
• Stringify the scalar in place.
• Check the UTF8 flag.
• Use the non-magical variants of the above functions:
sv_pos_u2b_flags without SV_GMAGIC, and sv_len_utf8_nomg.
If the scalar is not going to be modified:
• Call get-magic.
• Stringify the scalar without coercing it, and save the string.
• Check the UTF8 flag.
• Use sv_or_pv_ macros.
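The core of both sequences (fetch once, then read the flag) can be sketched with a hypothetical magical scalar whose fetch callback returns a Latin-1 string on odd calls and a UTF-8 string on even calls. struct magic_scalar and its fields are illustrative only; the point is that the flag is read after the single fetch, so it always describes the string actually in hand.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "magical" scalar: every fetch runs a callback that may
 * produce a different string and a different UTF-8 flag each time. */
struct magic_scalar {
    const char *pv;
    bool utf8;
    int fetches;                                /* times get-magic ran */
    void (*fetch)(struct magic_scalar *);
};

/* Example fetch: alternates between a Latin-1 and a UTF-8 value. */
static void alternating_fetch(struct magic_scalar *sv) {
    sv->fetches++;
    if (sv->fetches % 2) { sv->pv = "caf\xE9";     sv->utf8 = false; }
    else                 { sv->pv = "caf\xC3\xA9"; sv->utf8 = true;  }
}

/* Call get-magic once, keep the string, and only then look at the flag,
 * so the string and the flag always agree. */
static size_t length_in_chars(struct magic_scalar *sv) {
    sv->fetch(sv);                        /* 1. get-magic, exactly once */
    const char *pv = sv->pv;              /* 2. save the string */
    bool utf8 = sv->utf8;                 /* 3. flag read afterwards */
    size_t bytes = strlen(pv), chars = 0;
    if (!utf8)
        return bytes;                     /* one byte per character */
    for (size_t i = 0; i < bytes; i++)    /* 4. non-magical counting */
        if (((unsigned char)pv[i] & 0xC0) != 0x80)
            chars++;                      /* count non-continuation bytes */
    return chars;
}
```

Reading the flag before fetching would pair the new string with the previous call's flag, which is exactly the stale-flag problem these commits describe.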
sysread, syswrite and pack no longer need to be as complicated, now
that sv_pos_u2b and sv_len_utf8 (and their _or_pv variants) are more
friendly, so they have been simplified.
utf8::encode now calls FETCH and STORE on tied variables. It stopped
calling fetch with the magic flags patch. It had never called STORE.
utf8::decode now calls STORE, too. It likewise had never called it.
This means it no longer preserves pos.
Father Chrysostomos [Mon, 1 Oct 2012 16:08:18 +0000 (09:08 -0700)]
Make sprintf "%1s" and "%.1s" call overloading once
Calling overloading multiple times can probably result in mangled
UTF8, but it is much easier just to test for the number of calls.
Father Chrysostomos [Mon, 1 Oct 2012 16:01:37 +0000 (09:01 -0700)]
sprintf{2,}.t: Explain why we have two test files
Father Chrysostomos [Mon, 1 Oct 2012 13:28:48 +0000 (06:28 -0700)]
pp_sys.c: Simplify uses of sv_len_utf8
sv_len_utf8 is now careful not to record caches on magical or
overloaded scalars (any non-PV, in fact). It also returns the number of
logical characters correctly, regardless of whether its input is utf8.
So we can take advantage of that to simplify pp_sysread.
For pp_syswrite, we can use sv_or_pv_len_utf8 with the existing
string buffer.
Father Chrysostomos [Mon, 1 Oct 2012 06:54:17 +0000 (23:54 -0700)]
pp_pack.c: Simplify sv length determination in one spot
sv_len_utf8 is now careful not to record caches on magical or
overloaded scalars (any non-PV, in fact). It also returns the number of
logical characters correctly, regardless of whether its input is utf8.
So we can greatly simplify this code.
Father Chrysostomos [Mon, 1 Oct 2012 06:28:12 +0000 (23:28 -0700)]
Call overloading once for utf8 ovld→substr assignment
Father Chrysostomos [Sun, 30 Sep 2012 20:04:53 +0000 (13:04 -0700)]
Make substr assignment work with changing UTF8ness
Assigning to a substr lvalue scalar was invoking overload too
many times if the target was a UTF8 string and the assigned
substring was not.
Since sv_insert_flags itself stringifies the scalar, the easiest
way to fix this is to force the target to a PV before doing
anything to it.
Father Chrysostomos [Sun, 30 Sep 2012 07:07:47 +0000 (00:07 -0700)]
mg.c:magic_setsubstr: rmv redundant null check
This was added in commit 9bf12eaf4 to fix a crash, but it is not
necessary any more, due to changes elsewhere.
Father Chrysostomos [Sun, 30 Sep 2012 07:06:46 +0000 (00:06 -0700)]
Test #7678
This was fixed in 9bf12eaf4, but apparently never tested.
It used to crash, so no is() is necessary.
Father Chrysostomos [Sun, 30 Sep 2012 06:56:56 +0000 (23:56 -0700)]
Make rvalue substr call overloading once on utf8 str
Father Chrysostomos [Sun, 30 Sep 2012 06:31:44 +0000 (23:31 -0700)]
sv.c: One less assignment in sv_pvutf8n_force
This function always assigns to *lp twice (though indirectly the first
time). It only needs to do so once.
Father Chrysostomos [Sun, 30 Sep 2012 05:58:55 +0000 (22:58 -0700)]
Make 4-arg substr call FETCH once when upgrading target
Father Chrysostomos [Sat, 29 Sep 2012 18:27:35 +0000 (11:27 -0700)]
Make 4-arg substr check SvUTF8(target) after stringifying
If it checks the UTF8 flag first, it might be looking at a stale flag,
resulting in malformed UTF8. Both tests added produced malformed utf8
strings before this commit.
Simply moving this:
    if (!DO_UTF8(sv))
        sv_utf8_upgrade(sv);
after the stringification is not enough to fix this, as the string
retrieved will be out of date after we do an upgrade. To avoid
stringifying twice, we use SvPV_force if there is a replacement. This
means rearranging if() blocks a little.
The use of SvPV_force also means that string overloading is no longer
called twice on the target scalar. This rearrangement also means
that targets upgraded to utf8 are no longer exempt from the
reference warning. (Oh, and the test for that warning was not testing
anything in its no warnings test, because the target was no longer a
reference; so I corrected the test.)
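The ordering problem can be illustrated with a small C sketch. It uses a hypothetical toy_sv struct whose fetch hook stands in for get-magic or string overloading and may change both the bytes and the UTF-8 flag; none of these names are perl's real API. Reading the flag before the fetch interprets a freshly produced UTF-8 string as bytes, while fetching first gives the character count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy scalar, not perl's SV. The fetch hook stands in for
 * get-magic/overloading and may rewrite both buf and the utf8 flag. */
typedef struct toy_sv {
    char buf[32];
    bool utf8;                       /* only trustworthy after a fetch */
    void (*fetch)(struct toy_sv *);  /* side effect: refreshes buf and utf8 */
} toy_sv;

/* Example hook: produces U+0100 followed by 'a', encoded as UTF-8.
 * The literal is split so that 'a' is not parsed as part of \x80. */
static void fetch_wide(toy_sv *sv) {
    strcpy(sv->buf, "\xC4\x80" "a");
    sv->utf8 = true;
}

/* Characters in s: every byte if not UTF-8, otherwise every byte that
 * is not a UTF-8 continuation byte (10xxxxxx). */
static size_t char_len(const char *s, bool utf8) {
    assert(s != NULL);
    size_t n = 0;
    for (; *s; s++)
        if (!utf8 || ((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Buggy order: the flag is read before the fetch, so it may be stale. */
static size_t len_flag_first(toy_sv *sv) {
    bool utf8 = sv->utf8;
    sv->fetch(sv);
    return char_len(sv->buf, utf8);
}

/* Fixed order: fetch (stringify) first, then consult the flag. */
static size_t len_fetch_first(toy_sv *sv) {
    sv->fetch(sv);
    return char_len(sv->buf, sv->utf8);
}
```

Starting from a scalar whose flag is off, len_flag_first returns 3 (the bytes of "\xC4\x80" "a") while len_fetch_first returns 2 (the characters).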
Father Chrysostomos [Fri, 28 Sep 2012 21:47:05 +0000 (14:47 -0700)]
Remove length magic on scalars
It is not possible to know how to interpret the returned length
without accessing the UTF8 flag, which is not reliable until
the SV has been stringified, which requires get-magic. So length
magic has not made sense since utf8 support was added. I have
removed all uses of length magic from the core, so this is now
dead code.
Father Chrysostomos [Fri, 28 Sep 2012 21:13:33 +0000 (14:13 -0700)]
Update utf8.t tests
Recent commits made utf8::decode and utf8::encode call STORE on tied
variables. This also causes pos to be reset. This seems the right
thing to me, as these functions actually change the content of the SV,
not just the internal representation.
Father Chrysostomos [Fri, 28 Sep 2012 21:01:53 +0000 (14:01 -0700)]
sv.c: Don’t cache utf8 length on gmagical SVs
The cache will just be invalidated on the next fetch. This commit avoids
extra work in those cases that are detectable. We still have to
invalidate caches in mg_get, because caches can be created while magic is
being called and SvMAGICAL is off.
Father Chrysostomos [Fri, 28 Sep 2012 20:44:48 +0000 (13:44 -0700)]
pp_length should stringify before checking DO_UTF8
Typeglobs and references can change their UTF8ness upon
stringification.
Father Chrysostomos [Fri, 28 Sep 2012 16:56:01 +0000 (09:56 -0700)]
Only cache utf8 offsets for PVs
References and typeglobs can change their stringification without the
SV itself being assigned to.
Father Chrysostomos [Fri, 28 Sep 2012 15:42:47 +0000 (08:42 -0700)]
Make substr = $utf8 call get-magic once
Father Chrysostomos [Fri, 28 Sep 2012 15:40:53 +0000 (08:40 -0700)]
Make utf8::decode respect set-magic
Father Chrysostomos [Fri, 28 Sep 2012 15:34:51 +0000 (08:34 -0700)]
Make utf8::encode respect magic
It has always ignored set-magic, as far as I can tell.
Since the magic flags patch (4bac9ae47b), it has been ignoring
get-magic on magical scalars that were already PVs.
sv_utf8_upgrade_flags_grow begins with an if(!SvPOK(sv)) check, which
used to mean ‘if this scalar is magic or not a string’, but now means
simply ‘if this scalar is not a string’. SvPOK_nog is the new SvPOK.
Due to the way the flags now work, I had to modify sv_pvutf8n_force
as well, to keep existing tests passing.
Father Chrysostomos [Fri, 28 Sep 2012 13:16:55 +0000 (06:16 -0700)]
sv.c:sv_pos_u2b: Don’t cache anything on magical SVs
This should not change the behaviour. It is just unnecessary to cache
positions on a magical string, as we then just have to invalidate the
cache in mg_get. It will be more efficient just not to create the
cache to begin with.
This does not allow us to remove the cache invalidation in mg_get,
because the cache can sometimes be created while magic is being called
and SvMAGICAL is off. It just avoids extra work in those cases that
are detectable.
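The caching policy described above might look roughly like the following C sketch. The toy_sv struct, its magical flag and toy_len_chars are hypothetical stand-ins, not perl's SV layout or sv_pos_u2b; the point is only that a computed character length is stored for plain scalars but returned uncached for magical ones, whose cache would be invalidated on the next fetch anyway.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy scalar, not perl's SV layout. */
typedef struct {
    const char *pv;       /* current string buffer */
    bool utf8;            /* true if pv holds UTF-8 */
    bool magical;         /* stands in for SvMAGICAL: value may change on fetch */
    bool cache_valid;     /* true if cached_len may be used */
    size_t cached_len;    /* cached character count */
} toy_sv;

/* Characters in s: every byte if not UTF-8, otherwise every byte that
 * is not a UTF-8 continuation byte (10xxxxxx). */
static size_t count_chars(const char *s, bool utf8) {
    assert(s != NULL);
    size_t n = 0;
    for (; *s; s++)
        if (!utf8 || ((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Return the character length, recording a cache only on non-magical
 * scalars, so no cache is created just to be thrown away later. */
static size_t toy_len_chars(toy_sv *sv) {
    if (sv->cache_valid)
        return sv->cached_len;
    size_t len = count_chars(sv->pv, sv->utf8);
    if (!sv->magical) {
        sv->cached_len = len;
        sv->cache_valid = true;
    }
    return len;
}
```

Both kinds of scalar get the same answer; only the plain one keeps a cache afterwards.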
Father Chrysostomos [Fri, 28 Sep 2012 12:52:53 +0000 (05:52 -0700)]
Make magic_setsubstr check UTF8 flag after stringification
By checking it before, it can end up treating a UTF8 string as bytes
when calculating offsets if the UTF8 flag is not turned on until
the target is stringified. This can happen with overloading and
typeglobs.
This is a regression from 5.14. 5.14 itself was buggy, too, but to
trigger it one would have to modify the target after creating the
substr lvalue but before assigning to it; this was only because another
bug, fixed by 83f78d1a27, was cancelling out this one.
    package o {
        use overload '""' => sub { $_[0][0] }
    }
    my $refee = bless ["\x{100}a"], o::;
    my $substr = \substr $refee, -2;
    $$substr = "b";
    warn $refee;
That prints:
    Wide character in warn at - line 7.
    Āb at - line 7.
In 5.14 it prints:
    b at - line 7.
Father Chrysostomos [Fri, 28 Sep 2012 12:52:36 +0000 (05:52 -0700)]
Stop substr lvalues from being confused by changing UTF8ness
Father Chrysostomos [Fri, 28 Sep 2012 04:29:33 +0000 (21:29 -0700)]
Stop pos from panicking when overloading changes UTF8ness
This touches on issues raised in tickets #114410 and #114690.
If the UTF8ness of an overloaded string changes with each call, it
will make magic_setpos panic if it tries to stringify the SV multiple
times. We have to avoid any sv-specific utf8 length functions when
it comes to overloading. And we should do the same thing for gmagic,
too, to avoid creating caches that will shortly be invalidated.
The test class is very closely based on code written by Nicholas Clark
in a response to #114410.
Father Chrysostomos [Thu, 27 Sep 2012 03:39:55 +0000 (20:39 -0700)]
Make sv_len_utf8 return a character count as documented
Brought up in ticket #114690.
sv_len_utf8 does not make sense. It assumes that the string is UTF-8.
If it is not, it just does the wrong thing. For magical variables, it
expects mg_length to return the number of characters, but it would
only sometimes do that until the previous commit, which restored it to
returning bytes for all scalars.
Since you have to know already that a string is in utf8 before you
can call sv_len_utf8, but sv_len_utf8 might call get-magic which will
change the utf8-ness, it really makes no sense as an API. Up till
now, it has been consistently buggy with any magic scalars.
So I have changed sv_len_utf8 to do exactly what the documentation
says: return the number of characters.
This also causes an existing buggy code path in sv_len_utf8_nomg to be
reached (SvCUR without checking SvPOK), so this fixes that too.
Father Chrysostomos [Thu, 27 Sep 2012 00:45:51 +0000 (17:45 -0700)]
sv.c: Document that sv_len sets the UTF8 flag
Father Chrysostomos [Wed, 26 Sep 2012 22:20:52 +0000 (15:20 -0700)]
Deprecate mg_length; make it return bytes
mg_length returns the number of bytes if a scalar has length magic,
but the number of characters otherwise.
sv_len_utf8 used to assume that mg_length would return bytes. The
first mistake was added in commit b76347f2eb, which assumed that
mg_length would return characters. But it was #ifdeffed out until
commit ffc61ed20e.
Later, commit 5636d518683 met sv_len_utf8’s assumptions by making
mg_length return the length in characters, without accounting for
sv_len, which also used mg_length.
So we ended up with a buggy sv_len that would return a character
count for scalars with get- but not length-magic, and a byte count
otherwise.
In the previous commit, I fixed sv_len not to use mg_length at all. I
plan shortly to remove any use of mg_length (the one remaining use is
in sv_len_utf8, which is currently not called on magical values).
The reason for removing all calls to mg_length is that the returned
length cannot be converted to characters without access to the PV as
well, which requires get-magic. So length magic on scalars makes no
sense since the advent of utf8.
This commit restores mg_length to its old behaviour and lists it as
deprecated. This is mostly cosmetic, as there are no CPAN users. But
it is in the API, and I don’t know whether we can easily remove it.
Father Chrysostomos [Wed, 26 Sep 2012 03:33:30 +0000 (20:33 -0700)]
Make pos less volatile when UTF8-ness can change
This was brought up in ticket #114690.
pos checks the length of the string and then its UTF8-ness. But the
UTF8-ness is not updated by length magic. So it can get confused if
simply stringifying a match var happens to flip the UTF8 flag:
    $ perl -le '"\x{100}a" =~ /(..)/; pos($1) = 2; print pos($1); "$1"; print pos($1)'
    2
    1
    $ perl -le '"\x{100}a" =~ /(.)/; pos($1) = 2; print pos($1); "$1"; print pos($1)'
    1
    Malformed UTF-8 character (unexpected end of string) in match position at -e line 1.
    0
As pointed out in that ticket, length magic on scalars cannot work
properly with UTF8, so stop using it.