Father Chrysostomos [Wed, 18 Jan 2012 02:22:16 +0000 (18:22 -0800)]
[perl #106726] Don’t crash on length(@arr) warning
The RT ticket blames this on 676a678ac, but it was actually commit
579333ee9e3.
676a678ac extended this problem to evals (and modules),
but it already occurred in the main program.
This crashes:
./miniperl -Ilib -we 'sub {length my @forecasts}'
because it is trying to find the variable name for the warning in the
CV returned by find_runcv, but this is a *compile-time* warning, so
using find_runcv is just wrong.
It ends up looking for the array in PL_main_cv’s pad, instead of
PL_compcv.
Tony Cook [Wed, 18 Jan 2012 03:36:12 +0000 (14:36 +1100)]
avoid truncating time values when long is smaller than time_t
long is only 32 bits on x64 Win32, but time_t is 64 bits. This produced the warning:
POSIX.xs(1777) : warning C4244: 'initializing' : conversion from 'time_t' to 'const long', possible loss of data
The check against (time_t)-1 is the approved check from ANSI C89 and C99.
Nicholas Clark [Tue, 17 Jan 2012 12:29:56 +0000 (13:29 +0100)]
Avoid 'defined(@array) is deprecated' warnings in tests.
Commit 604a99bd464c92d7 enabled the warning for package arrays, but failed
to lexically disable the warning in the various tests for the construction.
Even though the construction is deprecated, we'd still like to know if the
behaviour changes, in case it wasn't intentional.
Nicholas Clark [Tue, 17 Jan 2012 11:44:42 +0000 (12:44 +0100)]
In Perl_refcounted_he_fetch_pvn(), eliminate nested ? : ternary operators.
Chris 'BinGOs' Williams [Tue, 17 Jan 2012 10:19:13 +0000 (10:19 +0000)]
Update Locale-Maketext to CPAN version 1.22
[DELTA]
2012-01-14
* Minor POD documentation update to sync with upstream blead.
Nicholas Clark [Tue, 10 Jan 2012 14:43:55 +0000 (15:43 +0100)]
Make Pod::Html more robust against malformed L<> contents.
Pod::Html attempts to search for the contents to see if they are a suffix
of any entry in an existing list, using a regular expression.
Previously the contents were interpolated directly into a regex, which
meant that if they happened to be invalid regular expression syntax,
Pod::Html aborted with a runtime error.
Nicholas Clark [Tue, 10 Jan 2012 13:36:26 +0000 (14:36 +0100)]
Purge references to --netscape and --libpods, no longer in Pod::Html
The long-deprecated --netscape flag was removed in commit
27b29ec338b08496.
This was originally added to control use of Netscape-specific HTML
extensions, but became a no-op when that functionality was removed.
--libpods was removed in commit 3b49d8d9ac841d8e.
However neither commit removed use of these flags by callers to Pod::Html,
notably in installhtml and the Makefiles that invoke it. Hence this commit.
Father Chrysostomos [Tue, 17 Jan 2012 07:12:12 +0000 (23:12 -0800)]
perldelta: Typos and clarifications
Father Chrysostomos [Tue, 17 Jan 2012 06:59:42 +0000 (22:59 -0800)]
perldelta up to 1a50d74bac4
Father Chrysostomos [Tue, 17 Jan 2012 04:48:27 +0000 (20:48 -0800)]
perldelta for @& and PL_sawampersand
Father Chrysostomos [Tue, 17 Jan 2012 04:47:46 +0000 (20:47 -0800)]
perldelta for tying and autoviv
Father Chrysostomos [Tue, 17 Jan 2012 04:42:53 +0000 (20:42 -0800)]
perldelta for overload::Overloaded/can change
Father Chrysostomos [Tue, 17 Jan 2012 04:42:35 +0000 (20:42 -0800)]
perldelta for += warning
Father Chrysostomos [Tue, 17 Jan 2012 04:42:11 +0000 (20:42 -0800)]
Lower $overload::VERSION to 1.17
It doesn’t need to be increased twice between releases.
Tony Cook [Mon, 16 Jan 2012 10:11:10 +0000 (21:11 +1100)]
config.over is generated on some platforms, .gitignore it
Tony Cook [Mon, 16 Jan 2012 10:00:00 +0000 (21:00 +1100)]
avoid overflowing a 32-bit signed int
and the associated warning from Solaris C:
"regcomp.c", line 5294: warning: integer overflow detected: op "<<"
Tony Cook [Mon, 16 Jan 2012 22:02:50 +0000 (09:02 +1100)]
handle U suffixed unsigned int literals from regexp.h
Nicholas Clark [Mon, 16 Jan 2012 16:08:38 +0000 (17:08 +0100)]
Provide as much diagnostic information as possible in "panic: ..." messages.
The convention is that when the interpreter dies with an internal error, the
message starts "panic: ". Historically, many panic messages had been terse
fixed strings, which means that the out-of-range values that triggered the
panic are lost. Now we try to report these values, as such panics may not be
repeatable, and the original error message may be the only diagnostic we get
when we try to find the cause.
We can't report diagnostics when the panic message is generated by something
other than croak(), as we don't have *printf-style format strings. Don't
attempt to report values in panics related to *printf buffer overflows, as
attempting to format the values to strings may repeat or compound the
original error.
Father Chrysostomos [Mon, 16 Jan 2012 18:05:41 +0000 (10:05 -0800)]
Increase $Math::BigInt::VERSION to 1.998
Father Chrysostomos [Mon, 16 Jan 2012 18:04:10 +0000 (10:04 -0800)]
BigInt.pm: Suppress overload warning
BigInt.pm intentionally registered an unregisterable op.
Father Chrysostomos [Mon, 16 Jan 2012 17:59:49 +0000 (09:59 -0800)]
perldiag.pod: Document overload’s invalid arg warning
Father Chrysostomos [Mon, 16 Jan 2012 17:59:23 +0000 (09:59 -0800)]
overload.pm: Combine two loops
It should go faster if we only iterate through %arg once.
Father Chrysostomos [Mon, 16 Jan 2012 17:50:47 +0000 (09:50 -0800)]
Test invalid arg warning from overload
Father Chrysostomos [Mon, 16 Jan 2012 17:36:22 +0000 (09:36 -0800)]
overload.pm: Put invalid arg warning in "overload" category
Existing overload warnings work that way.
Father Chrysostomos [Mon, 16 Jan 2012 17:32:09 +0000 (09:32 -0800)]
Suppress ‘useless’ warning in overload.pm
jkeenan [Sun, 15 Jan 2012 14:07:13 +0000 (09:07 -0500)]
Emit a warning if an attempt is made to overload an invalid (e.g., misspelled) operator. For RT #74098.
Nicholas Clark [Wed, 4 Jan 2012 10:36:25 +0000 (11:36 +0100)]
defined(@array) now also warns for package variables.
Nicholas Clark [Wed, 4 Jan 2012 10:23:17 +0000 (11:23 +0100)]
Test that defined warns for package arrays and hashes.
Currently TODO'd for package arrays. The existing tests were only for
lexicals.
Nicholas Clark [Mon, 16 Jan 2012 15:21:21 +0000 (16:21 +0100)]
In Perl_feature_is_enabled() use cBOOL to convert the pointer to a "bool".
On some platforms which don't have a (real) bool type, bool is actually a
char, and hence (sadly) it's correct for the compiler to truncate when
assigning to it, instead of what the programmer thought was going to happen
(testing zero or not). In Perl_feature_is_enabled(), the expression is a
pointer, so if it converts to an integer with the bottom 8 bits zero, then
on these platforms it would truncate to "false". Not what was expected.
Steffen Mueller [Mon, 16 Jan 2012 07:25:09 +0000 (08:25 +0100)]
Shuffle my entry around to use my name instead of old email
Hojung Youn [Mon, 16 Jan 2012 07:19:07 +0000 (08:19 +0100)]
Correct Hojung Youn's name in AUTHORS
Chris 'BinGOs' Williams [Sun, 15 Jan 2012 21:49:01 +0000 (21:49 +0000)]
Bump ExtUtils::Manifest version due to commits 97bae9c5 and bf081550
Chris 'BinGOs' Williams [Sun, 15 Jan 2012 21:42:59 +0000 (21:42 +0000)]
Bump autouse version due to commit f965e9d4a
Father Chrysostomos [Sun, 15 Jan 2012 19:36:35 +0000 (11:36 -0800)]
stat.t: Avoid unconditional lstat on file name
lstat on handles is ok.
Father Chrysostomos [Sun, 15 Jan 2012 19:22:34 +0000 (11:22 -0800)]
stat.t, filetest.t: Actually gen rand file names
This mistake of mine was stupid enough I laughed out loud.
Father Chrysostomos [Sun, 15 Jan 2012 06:35:47 +0000 (22:35 -0800)]
perldelta up to 55b5114f4
Hojung Youn [Sun, 15 Jan 2012 03:03:19 +0000 (12:03 +0900)]
[perl #108224] B::Deparse doesn't recognize for continue block
The B::Deparse foreach scoping problem was fixed in cf24a84005,
which addressed #30504. But that commit temporarily blinded
B::Deparse, so that it could not recognize a foreach continue block.
When a foreach statement is not used as a one-line statement
modifier, it may generate a 'nextstate', 'stub', 'leave', or 'scope'
root opcode, so all of these cases should be checked.
Some tests for foreach scoping and continue blocks are attached.
related: #30504
Father Chrysostomos [Sun, 15 Jan 2012 06:05:13 +0000 (22:05 -0800)]
perldelta up to 9f71cfe6ef2
Craig A. Berry [Sun, 15 Jan 2012 04:40:24 +0000 (22:40 -0600)]
In vmsify, leave ../ path components alone.
Way back in 08c7cbbb0fc466967038dcb56ca4f1b828b96269, we started
eliminating ../ components when converting paths from Unix syntax
to VMS syntax. No corresponding change was made when converting
in the opposite direction, so this was inconsistent. We should
get a valid path either way, but doing more interpretation than
necessary seems uncalled for, so this patch restores the previous
behavior.
This also paves the way to eliminate some inconsistencies between
what we do when Extended Filename Syntax (EFS) is in effect and
when it's not.
Craig A. Berry [Sat, 14 Jan 2012 22:24:16 +0000 (16:24 -0600)]
Un-TODO some VMS file spec tests under EFS.
When Extended Filename Syntax is enabled, several tests were
expecting not to pass, but they do, so we should say so. Also,
reinstate a test removed in 1fe570cc5e24eecfb07059e53e95fa864bb44142.
Father Chrysostomos [Sat, 14 Jan 2012 21:12:07 +0000 (13:12 -0800)]
Make lstat($ioref) and lstat($gv) consistent
As documented in perldiag, lstat($gv) warns and does an fstat.
lstat($ioref) wasn’t doing what was documented, but after warning
would do the same as stat(_).
Father Chrysostomos [Sat, 14 Jan 2012 20:59:34 +0000 (12:59 -0800)]
stat.t: Make test for -T _ and lstat more robust
It is now less likely to fail if one runs stat.t manually when tests
are running, due to the random file name. It also works now when the
script chdirs (when run outside t), because it no longer relies on $0
being a real file.
That last part was broken by commit ad2d99e39, which made -T _ more
consistent when it cannot open the file. It used to set the stat
type, but not the success status (as of commit 25988e07, which
fixed #4253).
Craig A. Berry [Sat, 14 Jan 2012 19:16:26 +0000 (13:16 -0600)]
Document the data columns in vms/ext/filespec.t.
Craig A. Berry [Fri, 6 Jan 2012 20:23:59 +0000 (14:23 -0600)]
Stop inadvertently skipping Spec.t on VMS.
ae5a807c7dcf moved a check against $@ away from the eval it was
checking and inserted another eval in between, the effect of which
was to make the tests that can only run on VMS get skipped there
too. Ouch.
There are other problems with ae5a807c7dcf, but this is a start.
Father Chrysostomos [Sat, 14 Jan 2012 19:31:45 +0000 (11:31 -0800)]
pp_sys.c: goto mustn’t skip initialisation
Father Chrysostomos [Sat, 14 Jan 2012 09:16:58 +0000 (01:16 -0800)]
perldelta up to 7c2b3c783b
Father Chrysostomos [Sat, 14 Jan 2012 09:01:33 +0000 (01:01 -0800)]
magic.t: Correct miniperl skip count
Father Chrysostomos [Sat, 14 Jan 2012 08:23:23 +0000 (00:23 -0800)]
-T "unreadable file" should set stat info consistently
This was mentioned in ticket #77388. It turns out to be
related to #4253.
If the file cannot be opened, -T and -B on filenames set the last
handle to null and set the last stat type to stat, but leave the actual
stat buffer and success status as they were.
That means that stat(_) will continue to return the previous buffer,
but lstat(_) will no longer work.
This is another of those inconsistent cases where the internal stat
info is only partially set.
Originally, this code would set PL_laststatval (the success status) to
-1. Commit 25988e07 (the patch in ticket #4253) intentionally changed
this to make -T _ less surprising on read-only files.
But the patch ended up affecting -T with an explicit file name, too.
It also only partially fixed things for -T _, because the last stat
type *was* still being set.
This commit changes it to set all the stat info, for explicit file
names, or no stat info, for _ (if the previous stat was with a
file name).
Father Chrysostomos [Sat, 14 Jan 2012 08:40:05 +0000 (00:40 -0800)]
stat.t: Add bug number
Father Chrysostomos [Sat, 14 Jan 2012 08:33:15 +0000 (00:33 -0800)]
Don’t emit unopened warning for other stat(HANDLE) error
-r or -T on a GV with no IO or on an IO with no fp (or dirp for -r)
will produce an ‘unopened’ warning. stat() on a filehandle will warn
about an unopened filehandle not only if there is no fp, but also if
the fstat call fails (with errno containing EBADF, EFAULT or EIO, at
least on Darwin).
I don’t know if there is a way to test this.
(But pp_stat and my_stat_flags are getting closer, so this must be
correct. :-)
Father Chrysostomos [Sat, 14 Jan 2012 08:07:46 +0000 (00:07 -0800)]
Make -T BADHANDLE set errno with fatal warnings
Due to the order of the statements, SETERRNO would never be reached
with fatal warnings.
I’ve added another SETERRNO out of paranoia. If there is a nicely
behaved __WARN__ handler, we should still be setting errno just before
-T returns, in case the handler changed it. We can’t do much in
the case of fatal handlers that do system calls. (Is $! localised
for those?)
Father Chrysostomos [Sat, 14 Jan 2012 08:00:26 +0000 (00:00 -0800)]
Make -l HANDLE set PL_laststatval with fatal warnings
Fatal warnings were preventing it from being set, because the warning
came first.
(PL_laststatval records the success status of the previous stat.)
Father Chrysostomos [Sat, 14 Jan 2012 07:50:15 +0000 (23:50 -0800)]
Make -T HANDLE and -B HANDLE always set last stat type
-T and -B on handles always set PL_laststatval (which indicates the
success of the previous stat). But they don’t set the last stat type
(PL_laststype) for closed filehandles. Those two should always go
together. stat and -r, -w, etc. always set PL_laststype for a closed
or missing filehandle.
Father Chrysostomos [Sat, 14 Jan 2012 07:42:04 +0000 (23:42 -0800)]
pp_sys.c:pp_fttest: Don’t set PL_statname to SvPV(PL_statname)
This is a waste of CPU cycles.
PL_statname is always a PV.
Father Chrysostomos [Sat, 14 Jan 2012 07:38:57 +0000 (23:38 -0800)]
Make -T _ and -B _ always set PL_laststatval
-T _ and -B _ always do another stat() on the previous file handle or
filename, unless it is a handle that has been closed.
Normally, the internal stat buffer, status, etc., are reset even for
_. This happens even on a failed fstat().
-T HANDLE and -B HANDLE currently *do* reset the stat status
(PL_laststatval) if there is no IO thingy, so having -T _ and -B _ not
do that makes things needlessly inconsistent.
Father Chrysostomos [Sat, 14 Jan 2012 06:47:31 +0000 (22:47 -0800)]
pp_sys.c: Remove space from lstat($ioref) warning
This was emitting two spaces before the ‘at’:
lstat() on filehandle  at -e line 1.
Father Chrysostomos [Sat, 14 Jan 2012 06:19:23 +0000 (22:19 -0800)]
pp_sys.c:pp_fttext: Don’t extend the stack after popping
Father Chrysostomos [Sat, 14 Jan 2012 04:31:23 +0000 (20:31 -0800)]
Squash repetitititive code in doio.c:my_stat_flags
Father Chrysostomos [Sat, 14 Jan 2012 04:28:46 +0000 (20:28 -0800)]
Make failed filetests consistent with & w/out fatal warnings
The result of stat(_) after a failed -r HANDLE would differ depending
on whether fatal warnings are on. This corrects that, by setting the
internal status before warning about an unopened filehandle.
Father Chrysostomos [Sat, 14 Jan 2012 00:52:23 +0000 (16:52 -0800)]
stat $ioref should record the handle for -T _
stat $gv records the handle so that -T _ can use it. But stat $ioref
hasn’t been doing that, until this commit.
PL_statgv can now hold an SVt_PVIO instead of an SVt_PVGV.
Father Chrysostomos [Sat, 14 Jan 2012 00:43:30 +0000 (16:43 -0800)]
stat $ioref should reset the internal stat type
In addition to a stat buffer, Perl keeps track internally of which
type of stat was done last, either stat or lstat, so that lstat _ can
die if the previous type was stat.
This was not being reset for stat $ioref. Filetest ops were fine.
Father Chrysostomos [Fri, 13 Jan 2012 23:50:51 +0000 (15:50 -0800)]
Set PL_statgv to null when freed or coerced
If PL_statgv is not set to null when freed, that same SV could be
reused for another GV, in which case -T _ will then use another handle
unrelated to the previous stat.
Similarly, if PL_statgv points to a fake glob that gets coerced into
a non-glob before it is freed, it will not follow the code path in
sv_free that sets PL_statgv to null. Furthermore, if it becomes a GV
again, it could be a completely different filehandle, unrelated to the
previous stat.
Father Chrysostomos [Fri, 13 Jan 2012 22:26:19 +0000 (14:26 -0800)]
Suppress confusing uninit warning from -T _
-T _ uses the file name saved by a preceding stat. If there was no
preceding stat, the internal sv used to store the file name is
undefined, so SvPV produces an uninitialized warning. Normally a failed
-T will just return undefined and set $!. Normally stat on a
filehandle will set the internal stat file name to "".
This commit sets the internal file name to "" initially on startup,
instead of creating an undefined scalar.
Father Chrysostomos [Fri, 13 Jan 2012 17:32:20 +0000 (09:32 -0800)]
defined *{"+"} should not stop %+ from working
The same applies to %-.
This is something I broke when merging is_magical_gv with
gv_fetchpvn_flags.
gv_fetchpvn_flags must make sure its *+ glob is present in the symbol
table when it loads Tie::Hash::NamedCapture. If it adds it afterwards,
it will clobber another *+ that Tie::Hash::NamedCapture has
autovivified and tied in the meantime.
Father Chrysostomos [Fri, 13 Jan 2012 17:23:07 +0000 (09:23 -0800)]
defined *{"!"} should not stop %! from working
This is something I broke when merging is_magical_gv with
gv_fetchpvn_flags.
gv_fetchpvn_flags must make sure its *! glob is present in the symbol
table when it loads Errno. If it adds it afterwards, it will clobber
another *! that Errno has autovivified and tied in the meantime.
Father Chrysostomos [Fri, 13 Jan 2012 00:41:59 +0000 (16:41 -0800)]
Squash repetititive code in util.c:report_evil_fh
Karl Williamson [Fri, 13 Jan 2012 16:38:50 +0000 (09:38 -0700)]
perldelta for Unicode property performance changes
I put this under a major change, but would be fine if it is moved to an
=item change.
Karl Williamson [Sun, 8 Jan 2012 17:23:45 +0000 (10:23 -0700)]
util.c: Silence compiler warning
cc on Solaris is smart enough to figure out that this return isn't
reached.
Karl Williamson [Fri, 6 Jan 2012 22:05:11 +0000 (15:05 -0700)]
regcomp.c: Compile inverted character classes with \p{}
This commit causes character classes of the form [^\p{...}] to have
their code points known at compile time instead of runtime. This allows
for better optimization and runtime execution speed.
Karl Williamson [Fri, 6 Jan 2012 21:51:27 +0000 (14:51 -0700)]
regcomp.c: Prepare for allowing [^\p{...}]
It turns out that this code is buggy, except for the fact that
<nonbitmap> currently can't contain conflicts. The trouble would have
started when Unicode properties were moved to being looked at at compile
time -- except when the class is to be inverted, so there isn't a
problem. But in preparation for handling this case, we fix the
potential bugs, as specified in the comments.
Karl Williamson [Fri, 6 Jan 2012 21:38:37 +0000 (14:38 -0700)]
regcomp.c: Use Latin1 \p{} in optimization
This commit causes any Latin1-range characters from Unicode properties
to be placed at compile time into the bitmap of the ANYOF node that
implements those properties, and to remove the flag that says they
should be looked for at run time. This causes the optimizer to generate
a better start class, as it knows more fully which characters can be and
can't be in the start class, and speeds up runtime checking, as it can
just do a bitmap test for these, instead of having to go look at the
swash.
Karl Williamson [Fri, 6 Jan 2012 20:46:17 +0000 (13:46 -0700)]
regcomp.c: Better optimize [classes] under /aa.
An optimization introduced in 5.14 is for bracketed character classes of
the very special form like [Bb]. These can be optimized into an
EXACTFish node. In this case, they can be optimized to an EXACTFA node
since they are ASCII characters. If the surrounding options are /aa, it
is likely that any adjacent EXACTFish nodes will be EXACTFA, so optimize
to that node instead of the previous EXACTFU. This will allow the
optimizer to collapse any adjacent nodes. For example
qr/a[B]c/aai
will now get optimized to an EXACTFA of "abc". Previously it would
have gotten optimized to EXACTFA<a> . EXACTFU<b> . EXACTFA<c>.
Karl Williamson [Fri, 6 Jan 2012 20:27:17 +0000 (13:27 -0700)]
regcomp.c: Avoid unnecessary runtime fold checking
Since 5.14, the single-char folds have been calculated at compile time,
either by doing it there, or for properties, setting the swash name to
include a folded or non-folded version of the property. Thus this
patch could have been done much earlier.
Now, most of the properties are actually computed at compile time by
previous patches, but that isn't relevant to this one.
Thus there really doesn't need to be runtime folding for things that
aren't in the bitmap, except for those things under /d that match only
if the string is in UTF8.
Karl Williamson [Fri, 6 Jan 2012 17:18:53 +0000 (10:18 -0700)]
regcomp.c: Change loop variable name, associated changes
The variable 'value' is already used for something else. Using it as a
loop variable corrupts the other use. This commit changes to a
different name, and adds code to keep 'value', and 'prevvalue' in sync
with their other meanings.
Karl Williamson [Fri, 6 Jan 2012 04:15:45 +0000 (21:15 -0700)]
regexec.c: Use shared swash in bracketed character classes
This takes advantage of an earlier commit to use a swash that may be
shared across multiple character class instances. That means that if a
match in another class has to look up a value, that same value is
automatically available without further lookup to all character classes
that share the swash. This means that the lookup result only needs to be
cached once for all instances in the thread, saving time and memory.
Note that currently the only swashes that are shared are those that
consist solely of a single Unicode property definition. Some sort of
checksum would have to be computed if this were to be extended to
custom classes. But what this does is cause sharing for all Unicode
properties that aren't in bracketed classes (as they are implemented as
a bracketed class with a single element), as well as the few cases where
someone explicitly writes [\p{foo}] without anything else in the class.
Karl Williamson [Fri, 6 Jan 2012 04:10:28 +0000 (21:10 -0700)]
regexec.c: Allow for returning shared swash
This changes the function that returns the swash associated with a
bracketed character class so that it returns the original swash and not
a copy. The function is renamed and made accessible only from within
regexec.c, and a new wrapper function with the original name is created
that just calls the other one and returns a copy of the swash.
Thus, all access from outside regexec.c will use a copy, which, if
overwritten, will not harm others, while the option exists from within
regexec.c to use a shared version.
Karl Williamson [Thu, 5 Jan 2012 22:53:25 +0000 (15:53 -0700)]
regcomp.c: Clean up comment
Karl Williamson [Thu, 5 Jan 2012 22:42:08 +0000 (15:42 -0700)]
perlunicode: Discourage use of is_utf8_char()
Karl Williamson [Thu, 5 Jan 2012 22:27:35 +0000 (15:27 -0700)]
perlop: Typos, too long lines, corrections
Karl Williamson [Thu, 5 Jan 2012 22:24:11 +0000 (15:24 -0700)]
intrpvar.h: clarification in comment
Karl Williamson [Thu, 5 Jan 2012 22:23:16 +0000 (15:23 -0700)]
utf8.c: fix typo in pod
Karl Williamson [Thu, 5 Jan 2012 22:17:18 +0000 (15:17 -0700)]
regcomp.c: Avoid leaking a scalar
Karl Williamson [Thu, 5 Jan 2012 20:27:35 +0000 (13:27 -0700)]
regcomp.c: truncate long debug dump output
What an ANYOF node matches could theoretically be millions of characters
long. This commit outputs only the first portion of very long ones.
Karl Williamson [Thu, 5 Jan 2012 20:21:57 +0000 (13:21 -0700)]
regcomp.c: in debug output, don't duplicate code points
The non-bitmap portion of an ANYOF node may also be in the bitmap
portion. There is no sense in having duplicate output
Karl Williamson [Thu, 5 Jan 2012 20:17:19 +0000 (13:17 -0700)]
regcomp.c: Change debug dump of bitmap/non-bitmap
Instead of '...' separating the two components of the output, change it
to a single space, which is output only if the first component isn't
null.
Karl Williamson [Thu, 5 Jan 2012 20:13:55 +0000 (13:13 -0700)]
regcomp.c: Change \t to a - in debug dumping ranges
This changes the separator in the output of a range from a tab to a
hyphen, which is clearer.
Karl Williamson [Thu, 5 Jan 2012 20:01:36 +0000 (13:01 -0700)]
regcomp.c: White-space only
Remove trailing tabs
Karl Williamson [Thu, 5 Jan 2012 18:44:48 +0000 (11:44 -0700)]
regcomp.c: put_byte wants an ord, not a utf8 char
These were calling put_byte() incorrectly, with a utf8 char instead of
the ordinal.
Karl Williamson [Thu, 5 Jan 2012 18:41:36 +0000 (11:41 -0700)]
regcomp.c: White-space only
These lines were indented one stop too many for the enclosing block
Karl Williamson [Tue, 29 Nov 2011 21:57:02 +0000 (14:57 -0700)]
regcomp.c: Don't read beyond input
This code was assuming that there were several more bytes in the input
stream, when there may not be. This was discovered by valgrind.
Karl Williamson [Mon, 28 Nov 2011 19:32:02 +0000 (12:32 -0700)]
regcomp.c: Optimize a single Unicode property in a [character class]
All Unicode properties actually turn into bracketed character classes,
whether explicitly done or not. A swash is generated for each property
in the class. If that is the only thing not in the class's bitmap, it
specifies completely the non-bitmap behavior of the class, and can be
passed explicitly to regexec.c. This avoids having to regenerate the
swash. It also means that the same swash is used for multiple instances
of a property. And that means the number of duplicated data structures
is greatly reduced. This currently doesn't extend to cases where
multiple Unicode properties are used in the same class:
[\p{greek}\p{latin}] will not share the same swash as another character
class with the same components. This is because I don't know of an
efficient method to determine if a new class being parsed has the
same components as one already generated. I suppose some sort of
checksum could be generated, but that is for future consideration.
Karl Williamson [Mon, 28 Nov 2011 17:26:28 +0000 (10:26 -0700)]
Move Unicode property defn processing to compile time
This patch moves the processing of most Unicode property definitions
from execution (regexec.c) to compilation (regcomp.c). There is a cost
to doing this. By deferring it to execution, it may be that the affected
path will never be taken, and hence the work won't have to be done;
whereas, it's always done if it gets done at compilation.
However, doing it at compilation has many advantages. We can't
optimize what we don't know about, so this allows for better
optimization, as well as feature enhancements, such as set
manipulations, restricting matches to certain scripts, etc. A big one,
about to be committed, allows for significantly reducing the number of
copies of the data structure used for each property. (Currently, every
mention in every regular expression of a given property will generate a
new instance of its hash, and so results of look-ups of code points in
one instance aren't automatically known to other instances, so the code
point has to be looked-up again.)
This commit leaves the processing to execution time when the class is to
be inverted. This was done purely to make the commit smaller, and will
be removed in a future commit; hence the redundant test here will be
removed shortly.
It also has to leave to execution time processing of properties whose
definition is not known yet. That can happen when the property is
user-defined. We call _core_swash_init(), and if it fails, we assume
that it's because it's such a property, and if it turns out that it was
an unknown property, we leave to execution time the raising of a warning
for it, just as before.
Currently, the processing of properties in inverted character classes is
also left to execution time. This restriction will be lifted in a
future commit, and this patch assumes that, and doesn't indent some code
that it otherwise would, in anticipation of the surrounding 'if' tests
being removed.
Karl Williamson [Mon, 28 Nov 2011 16:43:54 +0000 (09:43 -0700)]
regcomp.c: Pass inversion list directly to regexec.c
Currently, any generated inversion list is stringified and passed in the
data structure to regexec.c as such. regexec.c then calls
_core_swash_init() to convert it into a swash and back into an inversion
list. This intermediate step is wasteful, and this commit dispenses
with it, based on preparatory commits in regexec.c and utf8.c
Karl Williamson [Mon, 28 Nov 2011 16:25:45 +0000 (09:25 -0700)]
regexec.c: Prepare for inversion lists in ANYOF nodes
Future commits will start passing inversion lists to regexec.c from the
compilation phase. This commit causes regexec.c to accept them, trace
them for debug output, and pass them along to utf8.c
Karl Williamson [Mon, 28 Nov 2011 16:20:12 +0000 (09:20 -0700)]
regcomp.c: Add _invlist_contents() to compactly dump inversion list
This will be used in future commits for debug traces
Karl Williamson [Mon, 28 Nov 2011 16:00:52 +0000 (09:00 -0700)]
utf8.c: White-space only
As a result of previous commits adding and removing if() {} blocks,
indent and outdent and reflow comments and statements to not exceed 80
columns.
Karl Williamson [Mon, 28 Nov 2011 15:36:54 +0000 (08:36 -0700)]
utf8.c: Add ability to pass inversion list to _core_swash_init()
Add a new parameter to _core_swash_init() that is an inversion list to
add to the swash, along with a boolean to indicate if this inversion
list is derived from a user-defined property. This capability will prove
useful in future commits
Karl Williamson [Mon, 28 Nov 2011 15:24:07 +0000 (08:24 -0700)]
utf8.c: Add flag to swash_init() to not croak on error
This adds the capability, to be used in future commits, for swash_init()
to return NULL instead of croaking if it can't find a property, so that
the caller can choose how to handle the situation.
Karl Williamson [Mon, 28 Nov 2011 03:55:33 +0000 (20:55 -0700)]
regcomp.c: Use '*a == b', not 'a == &b'
The latter doesn't always work. The consequences of this failure were
memory leaks
Karl Williamson [Mon, 28 Nov 2011 03:53:56 +0000 (20:53 -0700)]
regcomp.c: decrement ptr ref cnt before invalidating ptr
Otherwise there could be memory leaks.