Nicholas Clark [Fri, 4 Mar 2011 18:24:19 +0000 (18:24 +0000)]
Eliminate use of $::BugId in t/re/pat_rt_report.t
Pass the message in explicitly to the test functions. Change to use test.pl
compatible functions where appropriate. For now avoid renumbering lines, or
any other change that changes the generated TAP output. (Hence no splitting of
tests, and adding the seemingly useless 'Noname test;', as that was what
t/re/ReTest.pl's _ok() was defaulting to.)
Nicholas Clark [Fri, 4 Mar 2011 16:14:20 +0000 (16:14 +0000)]
Eliminate use of $::BugId in t/re/pat_re_eval.t
Pass the message in explicitly to the test functions. Change to use test.pl
compatible functions where appropriate. For now avoid renumbering lines, or
any other change that changes the generated TAP output. (Hence no splitting of
tests, and adding the seemingly useless 'Noname test;', as that was what
t/re/ReTest.pl's _ok() was defaulting to.)
Nicholas Clark [Fri, 4 Mar 2011 16:04:21 +0000 (16:04 +0000)]
Eliminate the global override $Message from t/re/ReTest.pl
Nicholas Clark [Fri, 4 Mar 2011 16:03:37 +0000 (16:03 +0000)]
Eliminate use of $::Message in t/re/pat_re_eval.t
Pass the message in explicitly to the test functions. Change to use test.pl
compatible functions where appropriate. For now avoid renumbering lines, or
any other change that changes the generated TAP output. (Hence no splitting of
tests.)
Nicholas Clark [Fri, 4 Mar 2011 15:31:17 +0000 (15:31 +0000)]
Eliminate use of $::Message in t/re/pat_rt_report.t
Pass the message in explicitly to the test functions. Change to use test.pl
compatible functions where appropriate. For now avoid renumbering lines, or
any other change that changes the generated TAP output. (Hence no splitting of
tests.)
Nicholas Clark [Fri, 4 Mar 2011 10:02:36 +0000 (10:02 +0000)]
Eliminate use of $::Message in t/re/pat_advanced.t
Pass the message in explicitly to the test functions. Change to use test.pl
compatible functions where appropriate. For now avoid renumbering lines, or
any other change that changes the generated TAP output. (Hence no splitting of
tests.)
Nicholas Clark [Thu, 3 Mar 2011 14:45:25 +0000 (14:45 +0000)]
Eliminate use of $::Message in t/re/pat.t
Pass the message in explicitly to the test functions. Change to use test.pl
compatible functions where appropriate. For now avoid renumbering lines, or
any other change that changes the generated TAP output. (Hence no splitting of
tests.)
Nicholas Clark [Thu, 3 Mar 2011 14:44:31 +0000 (14:44 +0000)]
In ReTest.pl, provide is(), isnt(), like() and unlike(), equivalent to test.pl
This will ease the migration of the users of ReTest.pl to test.pl
Nicholas Clark [Wed, 2 Mar 2011 16:23:17 +0000 (16:23 +0000)]
In ReTest.pl, convert iseq() and isneq() to the same logic as test.pl's is/isnt
Previously both would stringify first, then compare, which would mean that
any overloaded objects would have their stringify method called, instead of
'eq' or 'ne'.
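The difference can be shown with a Python analogy. This is not ReTest.pl or test.pl code. The class `CaseInsensitiveName` and both helper functions are hypothetical. `__str__` stands in for an overloaded stringify method and `__eq__` stands in for an overloaded `eq`. Because the first helper stringifies both values before comparing, it never consults the equality overload.

```python
class CaseInsensitiveName:
    """Hypothetical overloaded object: equality ignores case, stringification does not."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        # Analogous to an overloaded stringify method.
        return self.text

    def __eq__(self, other):
        # Analogous to an overloaded 'eq' method.
        return self.text.lower() == str(other).lower()

    def __ne__(self, other):
        # Analogous to an overloaded 'ne' method.
        return not self.__eq__(other)


def is_stringify_first(got, expected):
    # Old iseq() behaviour as described: stringify both sides, then compare strings.
    return str(got) == str(expected)


def is_compare_directly(got, expected):
    # test.pl-style behaviour: let the (possibly overloaded) comparison decide.
    return got == expected


a = CaseInsensitiveName("Perl")
b = CaseInsensitiveName("PERL")
print(is_stringify_first(a, b))   # False: only __str__ was consulted
print(is_compare_directly(a, b))  # True: __eq__ was consulted
```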
Nicholas Clark [Thu, 3 Mar 2011 08:30:16 +0000 (08:30 +0000)]
In test.pl, change like() and unlike() to avoid copying the tested scalar.
This means that side effects of matching regexps on it are maintained,
specifically the value of pos, making test.pl more useful for tests in t/re.
This is a subtle divergence from the behaviour of Test::More::{like,unlike}
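The toy Python model below shows why copying matters. It assumes, as the commit implies, that a copy of a Perl scalar keeps the string value but loses pos(). `Scalar`, `like_copying()` and `like_in_place()` are hypothetical names. `Pattern.match()` with a start position stands in for a `\G`-anchored match.

```python
import re


class Scalar:
    """Toy model of a Perl scalar: a string value plus its pos() position."""

    def __init__(self, value, pos=0):
        self.value = value
        self.pos = pos


def copy_scalar(scalar):
    # Modelled assumption: a copy gets the string value but not the pos() magic.
    return Scalar(scalar.value)


def matches_at_pos(scalar, pattern):
    # Stand-in for a \G-anchored match: the pattern must match starting at pos().
    return re.compile(pattern).match(scalar.value, scalar.pos) is not None


def like_copying(got, pattern):
    # Old like(): works on a copy, so pos() is effectively reset to 0.
    return matches_at_pos(copy_scalar(got), pattern)


def like_in_place(got, pattern):
    # New like(): works on the caller's scalar, so pos() is honoured.
    return matches_at_pos(got, pattern)


x = Scalar("foobar", pos=3)
print(like_in_place(x, "bar"))  # True
print(like_copying(x, "bar"))   # False
```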
Nicholas Clark [Sat, 5 Mar 2011 19:59:42 +0000 (19:59 +0000)]
59d6f6a4c05afa7f was too aggressive, as it disabled #! line -I on miniperl
Restore -I processing on the #! line for miniperl. This gets t/run/switchI.t
and t/run/switchd-78586.t passing again under minitest.
Michael Stevens [Mon, 28 Feb 2011 14:23:52 +0000 (14:23 +0000)]
Fix podchecker warnings.
Fix warnings due to empty lines containing whitespace in
Porting/epigraphs.pod.
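Here is a minimal Python sketch for finding such lines and blanking them before running podchecker. The function names are hypothetical. The sketch flags any non-empty line that contains only whitespace.

```python
def whitespace_only_lines(text):
    """Return 1-based numbers of lines that are non-empty but contain only whitespace."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line and not line.strip()
    ]


def blank_out_whitespace_lines(text):
    # Replace each whitespace-only line with a truly empty line, keeping the line count.
    return "\n".join("" if not line.strip() else line for line in text.split("\n"))


pod = "=head1 NAME\n\nfoo\n  \t\n=cut\n"
print(whitespace_only_lines(pod))  # [4]
```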
Nicholas Clark [Sat, 5 Mar 2011 18:14:47 +0000 (18:14 +0000)]
Avoid miniperl SEGVing when processing -I on the #! line
A side-effect of change 3185893b8dec1062 was to force av in S_incpush() to be
NULL, whilst other flag variables were still set as if it were non-NULL, in
certain cases (only when compiled with -DPERL_IS_MINIPERL).
The "obvious" fix is to also set all the flag variables to 0 under
-DPERL_IS_MINIPERL, to make everything consistent. However, this confuses (at
least) the local version of gcc, which issues warnings about passing a NULL
value (av, known always to be NULL) as a not-NULL parameter, despite the fact
that all the relevant calls are inside blocks which are actually dead code,
due to the if() conditions being const variables set to 0 under
-DPERL_IS_MINIPERL.
So, to avoid future bug reports about compiler warnings, the least bad option
seems to be to use #ifndef, letting the pre-processor eliminate the dead code
and the related variables.
Craig A. Berry [Fri, 4 Mar 2011 22:44:24 +0000 (16:44 -0600)]
In S_incpush, unixify libdir earlier.
This allows, for example, -I[.lib] to have Unix format appendages
added, such as "/buildcustomize.pl", "/sitecustomize.pl", etc.
It was previously only being converted to Unix syntax to allow the addition
of subdirectories, but the number of things that want to glue pieces onto
lib/ have multiplied over the years.
Craig A. Berry [Fri, 4 Mar 2011 22:35:16 +0000 (16:35 -0600)]
In S_incpush, omit subdirs when PERL_IS_MINIPERL.
The new logic in S_parse_body that loads lib/buildcustomize.pl in
miniperl relies on lib being in $INC[0], which it won't be if we've
loaded version- and architecture-specific directories before lib.
Since miniperl isn't installed and can't do dynamic loading, it
doesn't really need those subdirectories, so skip loading them
for miniperl.
Karl Williamson [Fri, 4 Mar 2011 02:54:02 +0000 (19:54 -0700)]
Karl Williamson [Fri, 4 Mar 2011 02:10:06 +0000 (19:10 -0700)]
UCD.pm: All code points are in some block
Code points that are not in a block are considered to be in the
pseudo-block 'No_Block' by the Unicode standard; so change to do that
instead of 'undef'
Karl Williamson [Fri, 4 Mar 2011 02:02:37 +0000 (19:02 -0700)]
UCD.pm: All code points have a script
Unassigned code points have the script 'Unknown', not undef.
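The following sketch shows the lookup rule that these two commits describe. If a code point falls outside every listed range, it gets the property's Unicode default instead of undef: 'No_Block' for blocks and 'Unknown' for scripts. The two-entry tables are hypothetical excerpts. The Python `charblock()` and `charscript()` only borrow the names of the UCD.pm functions, not their implementation.

```python
from bisect import bisect_right

# Hypothetical excerpts: (first, last, value) ranges, sorted and non-overlapping.
# With tables this small, most real code points would wrongly get the default.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
]
SCRIPTS = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
]


def lookup(ranges, cp, default):
    """Return the value of the range containing cp, else the property's default."""
    starts = [first for first, _, _ in ranges]
    i = bisect_right(starts, cp) - 1
    if i >= 0 and cp <= ranges[i][1]:
        return ranges[i][2]
    # Never undef/None: an unlisted code point gets the Unicode default value.
    return default


def charblock(cp):
    return lookup(BLOCKS, cp, "No_Block")


def charscript(cp):
    return lookup(SCRIPTS, cp, "Unknown")


print(charblock(0x41), charscript(0x41))        # Basic Latin Latin
print(charblock(0x50000), charscript(0x50000))  # No_Block Unknown
```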
Karl Williamson [Fri, 4 Mar 2011 02:01:34 +0000 (19:01 -0700)]
UCD.pm: Nits in pod
Karl Williamson [Fri, 4 Mar 2011 01:42:30 +0000 (18:42 -0700)]
UCD.pm: Fix typos in pod
Karl Williamson [Fri, 4 Mar 2011 01:33:18 +0000 (18:33 -0700)]
UCD.pm: Remove reliance on UnicodeData.txt
In doing so, a number of bugs were fixed, as it now relies on files
processed by mktables, which knows how to fix a number of problems with
UnicodeData.txt.
This is essentially a rewrite of charinfo(). It previously had
hard-coded the ranges in UnicodeData.txt, instead of examining the file
to see what was there. This had not been updated for some time, and was
out-of-date, with the result that the newer ranges (all CJK) were quite
wrong. The new code does not have such reliance, so new versions
of Unicode should not break this, as they previously would.
This may be slower than what was previously there, as it reads several
smaller files instead of one very large one. But the principal reason
to do this work was to save disk space. It was previously thought that
the function could continue to use UnicodeData.txt if it exists on the
machine, but this would have required fixing all the bugs that this
automatically fixes by using the processed files.
Karl Williamson [Fri, 4 Mar 2011 01:02:03 +0000 (18:02 -0700)]
UCD.pm: Use subclassed warnings
5.14 subclasses some UTF8 warnings, so that they can be turned off
more precisely.
Karl Williamson [Fri, 4 Mar 2011 01:00:08 +0000 (18:00 -0700)]
UCD.pm: Use traditional casing for script names
For some reason UCD.pm has lowercased the first letter of each
non-initial word in script names. For backwards compatibility, continue
to do so.
Karl Williamson [Fri, 4 Mar 2011 00:57:29 +0000 (17:57 -0700)]
mktables: Write Unicode_1_Name table for UCD.pm
Karl Williamson [Fri, 4 Mar 2011 00:48:04 +0000 (17:48 -0700)]
mktables: Add override for map tables output
This adds a hash so that we can more precisely control which map tables
get output and which are documented. The hash is populated to
suppress some messages and some redundant tables.
Karl Williamson [Fri, 4 Mar 2011 00:31:15 +0000 (17:31 -0700)]
mktables: White-space only
The previous patch introduced a closure, and this patch indents
the code in that closure.
Karl Williamson [Fri, 4 Mar 2011 00:20:37 +0000 (17:20 -0700)]
mktables: Add tables of just simple case foldings
This adds three tables for lc, uc, and title, which are the simple
mappings that are overridden by full mappings. These are quite
tiny, and will be used by UCD.pm to avoid using UnicodeData.txt
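The Python sketch below illustrates the split between simple and full mappings. It uses `str.lower()`, which applies the full lowercase mapping (possibly several code points). The one-entry override table is a hypothetical excerpt. The fallback assumes that whenever the full mapping is a single code point, it agrees with the simple one.

```python
# Hypothetical excerpt of a "simple lowercase" table: only code points whose
# full lowercase mapping is longer than one code point need an entry.
SIMPLE_LC_OVERRIDES = {
    0x0130: 0x0069,  # LATIN CAPITAL LETTER I WITH DOT ABOVE -> 'i'
}


def simple_lower(cp):
    """Return the simple (single code point) lowercase mapping of cp."""
    if cp in SIMPLE_LC_OVERRIDES:
        return SIMPLE_LC_OVERRIDES[cp]
    full = chr(cp).lower()
    # Assumption: a one-code-point full mapping equals the simple mapping.
    # Multi-code-point cases missing from this tiny excerpt fall back to cp.
    return ord(full) if len(full) == 1 else cp


print(chr(0x0130).lower() == "i\u0307")  # True: the full mapping is two code points
print(hex(simple_lower(0x0130)))          # 0x69
print(hex(simple_lower(0x0041)))          # 0x61
```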
Karl Williamson [Thu, 3 Mar 2011 23:53:20 +0000 (16:53 -0700)]
UCD.t: Add test for non-Unicode code point
Karl Williamson [Thu, 3 Mar 2011 23:48:47 +0000 (16:48 -0700)]
UCD.pm: remove no longer used variable
Karl Williamson [Thu, 3 Mar 2011 23:27:31 +0000 (16:27 -0700)]
mktables: Move some definitions to earlier
Karl Williamson [Thu, 3 Mar 2011 23:20:24 +0000 (16:20 -0700)]
UCD.t: Fix a test description
Karl Williamson [Thu, 3 Mar 2011 23:17:55 +0000 (16:17 -0700)]
UCD.pm: Nits in pod and comment
Chris 'BinGOs' Williams [Thu, 3 Mar 2011 18:19:14 +0000 (18:19 +0000)]
Update Digest-SHA to CPAN version 5.60
[DELTA]
5.60 Thu Mar 3 05:26:42 MST 2011
- added new SHA-512/224 and SHA-512/256 transforms
-- ref. NIST Draft FIPS 180-4 (February 2011)
- simplified shasum by removing duplicative text
- improved efficiency of Addfile
-- expensive -T test now occurs only in portable mode
Nicholas Clark [Thu, 3 Mar 2011 16:09:35 +0000 (16:09 +0000)]
Simplify the code for a group of tests in pat_advanced.t
Nicholas Clark [Thu, 3 Mar 2011 14:38:54 +0000 (14:38 +0000)]
In ReTest.pl's must_warn(), ignore $::Message.
Every caller was setting $name, so $::Message was never used.
Nicholas Clark [Thu, 3 Mar 2011 14:27:34 +0000 (14:27 +0000)]
In ReTest.pl's may_not_warn(), eliminate the use of $::Message.
For the one caller using the global variable, instead pass the message in as
a function parameter.
Nicholas Clark [Thu, 3 Mar 2011 11:57:29 +0000 (11:57 +0000)]
Eliminate the unused global override $WarnPattern from ReTest.pl
Nicholas Clark [Thu, 3 Mar 2011 11:44:46 +0000 (11:44 +0000)]
Eliminate the global override $DiePattern from t/re/{ReTest.pl,pat.t}
For the only user of this, instead explicitly pass the value into must_die()
As must_die() is always passed $name, eliminate its use of $Message.
Karl Williamson [Thu, 3 Mar 2011 05:14:24 +0000 (22:14 -0700)]
mktables: force code point tables to have range size 1
As stated in the comment, the Perl core is expecting a different
range value definition than this program outputs. But this isn't a
problem if the range size is set to 1. Currently the core only reads in
tables that map to code points, so do it only for them.
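The commit does not spell out the two range definitions. For illustration only, assume that one reader treats a range's value as constant across the range. Assume the other reader treats it as the value of the first code point, incrementing by one for each following code point. Under those assumptions, this Python sketch shows why range size 1 removes the ambiguity: with one code point per entry, both readings agree. All names are hypothetical.

```python
def value_if_constant(entries, cp):
    # Reading A: every code point in [first, last] maps to the same value.
    for first, last, value in entries:
        if first <= cp <= last:
            return value
    return None


def value_if_incrementing(entries, cp):
    # Reading B: value belongs to `first`; later code points add their offset.
    for first, last, value in entries:
        if first <= cp <= last:
            return value + (cp - first)
    return None


def to_range_size_1(entries):
    """Split every (first, last, value) range into one entry per code point."""
    return [(cp, cp, value)
            for first, last, value in entries
            for cp in range(first, last + 1)]


ranged = [(0x0041, 0x0043, 0x0061)]  # written with reading A in mind
print(value_if_constant(ranged, 0x0042), value_if_incrementing(ranged, 0x0042))  # 97 98
single = to_range_size_1(ranged)
print(value_if_constant(single, 0x0042), value_if_incrementing(single, 0x0042))  # 97 97
```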
Karl Williamson [Thu, 3 Mar 2011 05:07:27 +0000 (22:07 -0700)]
mktables: Nits in comment, white space
Karl Williamson [Thu, 3 Mar 2011 05:05:29 +0000 (22:05 -0700)]
UCD.pm: nits in comments and pod
Karl Williamson [Thu, 3 Mar 2011 05:02:12 +0000 (22:02 -0700)]
regexec.c: Remove '#if 0' code
This code was retained for a while until it was clear that the replacement
code worked.
Karl Williamson [Thu, 3 Mar 2011 05:00:36 +0000 (22:00 -0700)]
regcomp.c: Remove #if 0 code
This code is obsolete, as new code has been written to do folding;
now that smokes are all passing with that new code, there is no point to
retaining the old.
Nicholas Clark [Wed, 2 Mar 2011 17:25:17 +0000 (17:25 +0000)]
Eliminate the global error override $Error from t/re/{ReTest.pl,pat.t}
For the 5 use points, convey the information by appending to $Message, which
all of them already use.
Nicholas Clark [Wed, 2 Mar 2011 17:15:19 +0000 (17:15 +0000)]
Simplify pat.t by removing a loop over 2 items, which is mostly if/else
Karl Williamson [Wed, 2 Mar 2011 15:15:55 +0000 (08:15 -0700)]
Revert "In File::Copy, convert two regexps to explicit ranges, instead of using /i"
This reverts commit 7b7d8b152c027b50b260244da6f7c17a010279d6.
The performance issue that prompted this commit has been fixed.
Karl Williamson [Wed, 2 Mar 2011 15:15:05 +0000 (08:15 -0700)]
Revert "In Cwd, convert two regexps to explicit ranges, instead of using /i"
This reverts commit be6c6a23f06d680159ce323c1906d297abbe85cd.
The performance issue that prompted this commit has been fixed.
Nicholas Clark [Wed, 2 Mar 2011 13:46:10 +0000 (13:46 +0000)]
Fix the TODO handling in t/re/ReTest.pl's skip()
It now does TODO & SKIP. However, I believe that currently nothing that uses it
calls skip() with a TODO test.
Nicholas Clark [Mon, 28 Feb 2011 16:31:19 +0000 (16:31 +0000)]
Convert taint.t to lexical file and directory handles, and 3 argument open.
Retain tainting tests for package filehandles - augment these with analogous
tests for lexical filehandles.
Drop the use of File::Spec::Functions to determine a portable path for
'./TEST', added as part of the MacOS classic porting. We haven't built on
classic for many years, and the change itself was over-engineering - the
better fix at the time would have been to replace './TEST' with 'TEST'.
Nicholas Clark [Mon, 28 Feb 2011 16:08:29 +0000 (16:08 +0000)]
In taint.t, convert the Fcntl and *printf tests to violates_taint().
Nicholas Clark [Mon, 28 Feb 2011 14:49:43 +0000 (14:49 +0000)]
In taint.t, violates_taint() now tests more of the "insecure dependency" error.
Use the second parameter to determine the string to look for in the error
message, and add an optional third parameter for the test description, if it
should differ from the error message string.
Nicholas Clark [Mon, 28 Feb 2011 14:44:38 +0000 (14:44 +0000)]
In taint.t, add violates_taint(), to replace a repeated is()/like() pair.
Nicholas Clark [Mon, 28 Feb 2011 13:22:34 +0000 (13:22 +0000)]
In taint.t, avoid using ok() where better test functions are more suitable.
Nicholas Clark [Mon, 28 Feb 2011 13:17:14 +0000 (13:17 +0000)]
In taint.t, replace C<not any_tainted(..)> with calls to isnt_tainted()
Change tainted() to perform the actual test for tainting, eliminate all other
uses of any_tainted() and remove it.
Nicholas Clark [Mon, 28 Feb 2011 11:57:32 +0000 (11:57 +0000)]
In taint.t, replace calls to all_tainted() with a loop over is_tainted().
Nicholas Clark [Mon, 28 Feb 2011 11:50:38 +0000 (11:50 +0000)]
Add {is,isnt}_tainted() to taint.t, to replace use of C<ok(tainted(...))>
Nicholas Clark [Sat, 26 Feb 2011 10:46:52 +0000 (10:46 +0000)]
Convert taint.t to use test.pl's testing functions.
This eliminates the local sub test().
Chris 'BinGOs' Williams [Tue, 1 Mar 2011 21:39:52 +0000 (21:39 +0000)]
Update perldelta for Locale::Codes update
Sullivan Beck [Tue, 1 Mar 2011 21:05:43 +0000 (16:05 -0500)]
PATCH: Bump Locale-Codes from 3.15 to 3.16
Attached is a patch to upgrade the Locale-Codes distribution (containing
the core modules Locale::Country, Locale::Language, and
Locale::Currency) to the most recent version.
====
Background:
The core modules Locale::Country, Locale::Language, and Locale::Currency
(all part of the Locale-Codes distribution) should be updated on a
regular basis. They contain "codes" from various internet standards
which change over time.
I plan on releasing new versions at least twice a year to keep the codes
up-to-date. At this point, I'm not planning on any significant code
changes (other than bug fixes), so the only significant changes
between releases should be to update the codes.
From 5f7e59eac34b1b322f80686fbf431569789c222a Mon Sep 17 00:00:00 2001
From: Sullivan Beck <sbeck@cpan.org>
Date: Tue, 1 Mar 2011 15:56:18 -0500
Subject: [PATCH] Bump Locale-Codes from 3.15 to 3.16
Signed-off-by: Chris 'BinGOs' Williams <chris@bingosnet.co.uk>
Karl Williamson [Tue, 1 Mar 2011 17:03:25 +0000 (10:03 -0700)]
toke.c: Raise error for multiple regexp mods
When the new regular expression modifiers being allowed in suffix-form
were added on a very tight schedule, it was with the understanding that
the error checking to ensure that only one can occur per regular expression
would be added later. This accomplishes that.
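Here is a minimal Python sketch of that check. It assumes the modifiers in question are the charset modifiers /a, /d, /l and /u. The function name and error text are hypothetical and are not perl's diagnostic. The sketch also ignores any later refinements to the rule.

```python
# Charset modifiers assumed to be the ones the commit refers to.
CHARSET_MODIFIERS = "adlu"


def check_charset_modifiers(flags):
    """Return the single charset modifier in flags (or None); reject more than one."""
    seen = [flag for flag in flags if flag in CHARSET_MODIFIERS]
    if len(seen) > 1:
        # Hypothetical wording, not perl's actual diagnostic.
        raise ValueError("more than one charset modifier: /" + " /".join(seen))
    return seen[0] if seen else None


print(check_charset_modifiers("gu"))  # u
try:
    check_charset_modifiers("lu")
except ValueError as err:
    print(err)  # more than one charset modifier: /l /u
```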
Karl Williamson [Tue, 1 Mar 2011 15:53:05 +0000 (08:53 -0700)]
UCD.pm: Convert charscript to use mktables tables
This removes the need for Scripts.txt
Karl Williamson [Tue, 1 Mar 2011 15:23:21 +0000 (08:23 -0700)]
UCD.pm: Bump version
Karl Williamson [Tue, 1 Mar 2011 14:56:52 +0000 (07:56 -0700)]
Revert "mktables: Default map tables to range size 1."
This reverts commit cef6a343d5e19fe2dc2c3655ecf621c8ff26f252.
This commit had the unintended consequence of greatly increasing
the disk space used; the code in UCD.pm that didn't like ranges has now
been changed to accept them. The tables that the swashes currently
read had already been set to not put out ranges. If other tables
eventually do get read by swashes, things will have to be resolved at
that time.
Karl Williamson [Tue, 1 Mar 2011 14:50:11 +0000 (07:50 -0700)]
UCD.pm: Convert num() to use new fcn
A new function that reads mktables files has been created. Switch to
use this.
A test is added to make sure it's working right
Karl Williamson [Tue, 1 Mar 2011 14:45:45 +0000 (07:45 -0700)]
UCD.pm: Add internal fcn for reading mktables file
Chris 'BinGOs' Williams [Tue, 1 Mar 2011 13:17:36 +0000 (13:17 +0000)]
Update Maintainers.pl and perldelta for Math::BigInt::FastCalc update
Peter John Acklam [Tue, 1 Mar 2011 13:15:47 +0000 (13:15 +0000)]
[perl #85118] [PATCH] Update Math::BigInt::FastCalc to CPAN version 0.28
Add Perl v5.6 compatibility code to FastCalc.xs (Closes RT #63859).
Signed-off-by: Chris 'BinGOs' Williams <chris@bingosnet.co.uk>
brian d foy [Tue, 1 Mar 2011 08:46:27 +0000 (02:46 -0600)]
Point people to perlop for here doc errors
This makes this long warning slightly more useful:
Can't find string terminator %s anywhere before EOF
brian d foy [Tue, 1 Mar 2011 06:49:49 +0000 (00:49 -0600)]
Note here docs need a line separator after the last token
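The toy Python scanner below illustrates the rule. It assumes the terminator must be a complete line, including its trailing newline. If the program ends without a newline after the terminator, the scanner finds no match, which is the situation this note is about. This is not how toke.c is written, and `find_heredoc_end()` is a hypothetical name.

```python
def find_heredoc_end(source, body_start, terminator):
    """Return the index just past the terminator line, or None if it is never found."""
    pos = body_start
    while True:
        newline = source.find("\n", pos)
        if newline == -1:
            # The last line has no newline, so under this rule it cannot close the here-doc.
            return None
        if source[pos:newline] == terminator:
            return newline + 1
        pos = newline + 1


print(find_heredoc_end("hello\nEOT\n", 0, "EOT"))  # 10
print(find_heredoc_end("hello\nEOT", 0, "EOT"))    # None
```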
Karl Williamson [Mon, 28 Feb 2011 17:42:28 +0000 (10:42 -0700)]
regex: Remove obsolete code
This code has been rendered obsolete in 5.14 by using a different
mechanism altogether. This functionality is now provided at run-time,
user-selectable, via the /u and /d regex modifiers. This code was
for compile-time selection of which to use.
Karl Williamson [Mon, 28 Feb 2011 17:39:02 +0000 (10:39 -0700)]
fold_grind: Remove more tests under /d
This removes tests that cross the boundary between ascii and above latin1,
under /d, as they aren't testing different code than already tested under
/u
Karl Williamson [Mon, 28 Feb 2011 16:58:44 +0000 (09:58 -0700)]
fold_grind.t: Reduce some tests.
/d executes different code from /u really only when there are things in the
latin1 range, so its successes/failures should match those of /u for
things outside that range.
Karl Williamson [Mon, 28 Feb 2011 16:58:01 +0000 (09:58 -0700)]
fold_grind.t: Clarify comment
Karl Williamson [Mon, 28 Feb 2011 16:55:47 +0000 (09:55 -0700)]
fold_grind.t: fix confusingly-named variable
Karl Williamson [Mon, 28 Feb 2011 16:26:43 +0000 (09:26 -0700)]
regexec.c: remove no longer needed code
The code dealing with the sharp ss is now handled by the ANYOFV node,
and shouldn't appear here.
Karl Williamson [Mon, 28 Feb 2011 16:25:03 +0000 (09:25 -0700)]
regcomp.c: white space only
A previous commit collapsed nested blocks. This outdents the nested
part
Karl Williamson [Mon, 28 Feb 2011 15:40:30 +0000 (08:40 -0700)]
regcomp.c: collapse two blocks
An earlier commit removed code so that these two blocks can be written
as one.
Karl Williamson [Mon, 28 Feb 2011 15:38:30 +0000 (08:38 -0700)]
regcomp.c: Remove temporary code
This code was inserted to make sure no tests failed in the intermediate
commits leading up to d50a4f90cab527593b2dd218f71b66a6be555490, and
should have been removed in that commit, but I forgot to.
Chris 'BinGOs' Williams [Mon, 28 Feb 2011 15:17:27 +0000 (15:17 +0000)]
Update CPANPLUS to CPAN version 0.9102
[DELTA]
Changes for 0.9102 Mon Feb 28 11:35:43 2011
================================================
* Only send NAs for a 'perl' prereq when it actually is
Not Applicable
Chris 'BinGOs' Williams [Mon, 28 Feb 2011 15:12:22 +0000 (15:12 +0000)]
Update CPANPLUS::Dist::Build to CPAN version 0.54
[DELTA]
0.54 Mon Feb 28 11:52:04 GMT 2011
- Only skip 'perl' as a prereq when CPANPLUS version is
less than 0.9102
Nicholas Clark [Mon, 28 Feb 2011 15:17:17 +0000 (15:17 +0000)]
Convert the taint.t Fcntl tests to use tempfile(), instead of "foo".
tempfile() also automatically deletes all of its temporary files.
brian d foy [Mon, 28 Feb 2011 14:50:26 +0000 (08:50 -0600)]
Fix a grammar nit in perlfaq8
Nicholas Clark [Mon, 28 Feb 2011 14:05:38 +0000 (14:05 +0000)]
Correct taint.t to skip the truncate test if $Config{d_truncate} is false.
Previously it had a comment, present since 5.003_92, that "There is no feature
test in $Config{} for truncate, so we allow for the possibility that it's
missing."
This was never correct, as 5.000 had d_truncate, and taint.t had analogous
tests for $Config{d_chown} etc.
Chris 'BinGOs' Williams [Mon, 28 Feb 2011 11:25:33 +0000 (11:25 +0000)]
Corrected an incredibly small error in perlport.pod
Florian Ragwitz [Mon, 28 Feb 2011 10:37:07 +0000 (11:37 +0100)]
Ignore ExtUtils::Command release tests
Florian Ragwitz [Mon, 28 Feb 2011 10:31:46 +0000 (11:31 +0100)]
Upgrade Devel::DProf from version 20110225.01 to 20110228.00
Florian Ragwitz [Mon, 28 Feb 2011 10:30:03 +0000 (11:30 +0100)]
Ignore DProf release tests
Florian Ragwitz [Mon, 28 Feb 2011 09:53:40 +0000 (10:53 +0100)]
Upgrade ExtUtils::Command from version 1.16 to 1.17
Nicholas Clark [Mon, 28 Feb 2011 10:06:35 +0000 (10:06 +0000)]
Simplify the regression tests added in 3e6bd4bfcd175c61.
Unwind the innermost loop, used only for reporting, and remove the array @s,
only used to accumulate results for that loop. Fewer lines, less complexity,
better diagnostics, and with the same failure cases on 3e6bd4bfcd175c61^.
Karl Williamson [Mon, 28 Feb 2011 01:44:43 +0000 (18:44 -0700)]
Handle [folds] of 0-255 without swashes
Commit 56ca34cada940c7f6aae9a59da266e541530041e had the side effect of
causing regular expressions with things like [a-z], or even just [k] to
go out to disk to read tables to create swashes because it knew that
some of those characters matched outside the bitmap (and due to
l1_char_class_tab.h it knew which ones had those matches), but it didn't
know what the characters were that participated in those folds.
This patch hard-codes the Unicode 6.0 rules into regcomp.c for the
code points 0-255, so that the very slow utf8_heavy is not invoked on
them. (Code points above 255 will continue to invoke it.) It would,
of course, be better if these rules could be regen'd into regcomp.c, as
there is a risk that the standard will change, and the code will not.
But I don't think that has ever happened; in other words, I think that
the rules haven't changed so far since Day 1 of Unicode. (That would
not be the case if we were doing simple case folding, as the capital
sharp ss which folds to U+00DF was added later.) And the Standard is
getting more stable in this area. I believe one of their stability
policies now forbids them from adding something that simply folds to
one of the characters that already has a fold, such as M and m.
Ligatures are frowned on, and I doubt that new ones would be encoded,
so that leaves a new Unicode character that folds to a Latin-1 character plus
some sort of mark. For those, this code is a no-op, so those aren't a
problem either.
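The relationships being hard-coded can be explored in Python with `str.casefold()`. It applies full case folding from the Unicode version bundled with the interpreter, which is newer than 6.0, so results may differ slightly. For each Latin-1 code point, this exploratory sketch lists the code points above 255 whose fold is identical. It is not the table in regcomp.c, and it considers only single code points, not sequences.

```python
import sys


def latin1_fold_partners():
    """Map each Latin-1 code point to code points above 255 with the same full fold."""
    # Group Latin-1 code points by folded form, e.g. 'K' and 'k' both fold to 'k'.
    by_fold = {}
    for cp in range(256):
        by_fold.setdefault(chr(cp).casefold(), []).append(cp)
    partners = {cp: [] for cp in range(256)}
    # Scan everything above Latin-1 for characters folding to one of those forms.
    for cp in range(256, sys.maxunicode + 1):
        for latin1_cp in by_fold.get(chr(cp).casefold(), ()):
            partners[latin1_cp].append(cp)
    # Keep only Latin-1 code points that actually have partners outside the range.
    return {cp: found for cp, found in partners.items() if found}


table = latin1_fold_partners()
print([hex(cp) for cp in table[ord("k")]])  # includes 0x212a (KELVIN SIGN)
```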
Karl Williamson [Mon, 28 Feb 2011 01:31:51 +0000 (18:31 -0700)]
regcomp.c: Add deprecation macro with extra param
Karl Williamson [Sun, 27 Feb 2011 22:19:04 +0000 (15:19 -0700)]
regcomp.c: More prep for bitmap/nonbitmap folds
This sets things up in preparation for a future commit that will
move calculating all folds involving characters in the bit map.
Karl Williamson [Sun, 27 Feb 2011 21:21:47 +0000 (14:21 -0700)]
regcomp.c: Place marker for 2nd inversion list
The set_regclass_bit functions will be adding to a new inversion list.
This declares that list and passes it to them.
Karl Williamson [Sun, 27 Feb 2011 21:12:57 +0000 (14:12 -0700)]
Change to use new add_cp_to_invlist()
Karl Williamson [Sun, 27 Feb 2011 21:04:26 +0000 (14:04 -0700)]
regcomp.c: Add parameters to fcns
A pointer to the list of multi-char folds in an ANYOF node is now passed
to the routines that set the bit map. This is in preparation for those
routines to add to the list
Karl Williamson [Sun, 27 Feb 2011 20:55:03 +0000 (13:55 -0700)]
regcomp.c: Convert old-style to inversion list
The code that handles a false range in a [character class] hadn't been
converted to use inversion lists
Karl Williamson [Mon, 28 Feb 2011 00:51:33 +0000 (17:51 -0700)]
regcomp.c: Add fcn add_cp_to_invlist()
This is just an inline shorthand when a single code point is all that is
needed. A macro could have been used instead, but this just seemed nicer.
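For readers unfamiliar with the structure, here is a Python sketch of an inversion list: a sorted list of boundaries where membership toggles on and off. The `add_cp_to_invlist()` below mirrors only the C function's name and idea, which is adding the one-element range [cp, cp]. Its signature and the merge strategy are assumptions. The sketch also assumes an even-length list, meaning no range is open to the end of the code space.

```python
from bisect import bisect_right


def invlist_contains(invlist, cp):
    # Even indices hold range starts, odd indices hold exclusive ends, so cp is
    # inside a range exactly when an odd number of boundaries are <= cp.
    return bisect_right(invlist, cp) % 2 == 1


def add_range_to_invlist(invlist, start, end):
    """Return a new inversion list that also contains [start, end] (inclusive)."""
    ranges = [(invlist[i], invlist[i + 1]) for i in range(0, len(invlist), 2)]
    ranges.append((start, end + 1))
    ranges.sort()
    merged = []
    for lo, hi in ranges:
        if merged and lo <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous range.
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [boundary for pair in merged for boundary in pair]


def add_cp_to_invlist(invlist, cp):
    # Shorthand for the common case of adding a single code point.
    return add_range_to_invlist(invlist, cp, cp)


inv = add_cp_to_invlist([], 0x41)   # [0x41, 0x42]
inv = add_cp_to_invlist(inv, 0x42)  # adjacent, so merged into [0x41, 0x43]
print(inv, invlist_contains(inv, 0x42), invlist_contains(inv, 0x43))
```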
Karl Williamson [Mon, 28 Feb 2011 00:45:46 +0000 (17:45 -0700)]
regcomp.c: Move code to common place
This is part of the refactoring of the code that sets the alternate array
for multi-char folds. Changing the node type to ANYOFV can be done at
the last second, in pass 2, as it doesn't change any sizing, etc.
Karl Williamson [Sun, 27 Feb 2011 20:12:49 +0000 (13:12 -0700)]
regcomp.c: Factor code into a function.
A future commit uses this same code, so put it into a common place.
Karl Williamson [Sun, 27 Feb 2011 05:02:26 +0000 (22:02 -0700)]
Add #defines for 2 Latin1 chars
These will be used in a future commit; the ordinals are different on
EBCDIC vs. ASCII
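This small Python illustration shows why such #defines are needed. It uses the standard `cp037` codec as a stand-in for an EBCDIC code page. The commit does not say which two characters are defined, so `ß` (U+00DF) is only an example, and `native_ordinal()` is a hypothetical helper.

```python
def native_ordinal(ch, codepage="latin-1"):
    """Return the one-byte ordinal of ch in the given code page."""
    encoded = ch.encode(codepage)
    if len(encoded) != 1:
        raise ValueError(f"{ch!r} does not encode to a single byte in {codepage}")
    return encoded[0]


# Compare the ASCII/Latin-1 ordinal with the EBCDIC (cp037) ordinal.
for ch in ("A", "a", "\u00df"):
    print(repr(ch), hex(native_ordinal(ch)), hex(native_ordinal(ch, "cp037")))
```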