John Peacock [Sun, 29 Dec 2013 18:47:11 +0000 (13:47 -0500)]
Ensure that version::_VERSION is always exported
Now that version.pm doesn't mess with the symbol table, we need
to make sure that version::_VERSION exists at all times. Also
change the name of the method that implements UNIVERSAL::VERSION
so that it is visually distinct and matches the other version.pm
derived methods.
John Peacock [Sun, 29 Dec 2013 17:26:30 +0000 (12:26 -0500)]
Apply patch from Sprout to make vxs.inc better
[Committer’s note: I already had the vxs.inc changes in my branch,
so this patch only includes the version.pm changes.]
John Peacock [Sun, 29 Dec 2013 16:49:43 +0000 (11:49 -0500)]
Integrate CPAN release of version.pm 0.9905
When adding the CPAN-distributed files for version.pm, it is necessary
to delete an entire block out of lib/version.pm, since that code is
only necessary with the CPAN release. Within core Perl, there is no
version::vxs implementation class any more.
John Peacock [Wed, 25 Dec 2013 19:19:19 +0000 (14:19 -0500)]
Grab latest changes from CPAN 0.9905
John Peacock [Mon, 9 Dec 2013 23:23:20 +0000 (18:23 -0500)]
Integrate CPAN version.pm release into core
Father Chrysostomos [Fri, 25 Oct 2013 12:57:47 +0000 (05:57 -0700)]
vxs.inc: Fix thinko
This was causing test failures after rebasing against blead.
Father Chrysostomos [Fri, 25 Oct 2013 04:54:45 +0000 (21:54 -0700)]
Update Maintainers.pl for version.pm changes
I *hope* I got it all correct. At least cmp_version.t now passes.
Father Chrysostomos [Fri, 25 Oct 2013 00:56:13 +0000 (17:56 -0700)]
Use VXS_ prefix for XSUB bodies in CPAN version
The names of the functions in core and in the CPAN version will
conflict otherwise.
Since perl versions before 5.16.0 did not have XS_INTERNAL (which
could solve this problem another way, making the functions static),
it’s easier just to use different names.
Father Chrysostomos [Wed, 11 Sep 2013 20:19:31 +0000 (13:19 -0700)]
Integrate the rest of CPAN’s vxs.inc
Uppercase macros instead of functions (so the CPAN version can call
its own non-core functions if need be), plus a poor man’s typemap
(VTYPECHECK).
Father Chrysostomos [Wed, 11 Sep 2013 20:17:59 +0000 (13:17 -0700)]
vxs.inc: Disallow multiple args to XS_version_normal
Also rename the argument.
This is part of bringing perl and CPAN into synch.
Father Chrysostomos [Wed, 11 Sep 2013 20:10:15 +0000 (13:10 -0700)]
[rt.cpan.org #88495] bad string comparison in version->qv
qv is affected, too. A package called "ver" inheriting from version
should be able to create "ver" objects via ->qv.
Father Chrysostomos [Wed, 11 Sep 2013 19:51:44 +0000 (12:51 -0700)]
vxs.inc: Integrate the CPAN version of version_new
No behaviour changes; just rearranged, and with a few extra #ifdefs.
Father Chrysostomos [Wed, 11 Sep 2013 16:03:14 +0000 (09:03 -0700)]
[rt.cpan.org #88495] version->new str cmp bug
We shouldn’t consider ver and version to be the same class.
If ver inherits from version, ver->new should give a ver object.
This string comparison bug has only ever existed in the perl core
version of the version routines. It was introduced by commit
ed1db70e1224 in 5.16.
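The message does not show the faulty comparison itself. As a hypothetical illustration only (not the code from ed1db70e1224; both function names are made up), this shows how a comparison limited to the length of the caller's class name treats "ver" as "version", next to an exact comparison:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical buggy check: comparing only strlen(classname) bytes
 * means any prefix of "version", such as "ver", compares equal. */
int is_version_class_buggy(const char *classname)
{
    assert(classname != NULL);
    return strncmp(classname, "version", strlen(classname)) == 0;
}

/* Exact check: the whole name must match, so a subclass called
 * "ver" is treated as its own class. */
int is_version_class(const char *classname)
{
    assert(classname != NULL);
    return strcmp(classname, "version") == 0;
}
```

With the exact check, ver->new can bless into "ver" instead of silently producing a "version" object.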
Father Chrysostomos [Wed, 11 Sep 2013 07:23:07 +0000 (00:23 -0700)]
vxs.inc: Import UNIVERSAL::VERSION from CPAN
No functional changes, just cosmetic (and it works with older
perls, too).
This is part of bringing perl and CPAN into synch.
Father Chrysostomos [Wed, 11 Sep 2013 16:01:39 +0000 (09:01 -0700)]
universal.c: include vutil.h
Subsequent changes to vxs.inc will require it.
Father Chrysostomos [Wed, 11 Sep 2013 06:46:47 +0000 (23:46 -0700)]
vxs.inc: arg list checking for UNIVERSAL::VERSION
This brings it in line with the CPAN implementation. It’s hard to
test this, as the tests should go in cpan/version, but the pure-Perl
implementation doesn’t check the number of arguments.
Father Chrysostomos [Wed, 11 Sep 2013 05:12:45 +0000 (22:12 -0700)]
vxs.inc: Add dVAR define for CPAN use
This is part of bringing perl and CPAN into synch.
Father Chrysostomos [Wed, 11 Sep 2013 05:11:30 +0000 (22:11 -0700)]
vxs.inc: Don’t hard-code class name
This is part of bringing perl and CPAN into synch.
Father Chrysostomos [Tue, 10 Sep 2013 07:33:19 +0000 (00:33 -0700)]
vutil.c: Add preproc code specific to CPAN
The purpose is to bring the files into synch so that later version.pm
upgrades involve dropping files into place.
This requires changing vutil.h a bit to work in the core.
Father Chrysostomos [Wed, 11 Sep 2013 03:39:32 +0000 (20:39 -0700)]
Import vutil.h from the CPAN version dist
This will be needed when we switch vutil.c over to using macros for
version functions, the way the CPAN dist does it.
Father Chrysostomos [Tue, 10 Sep 2013 07:14:59 +0000 (00:14 -0700)]
Extract version routines into two new files
This is to make synchronisation between the CPAN distribution and the
perl core easier.
The files have different extensions to match what the CPAN
distribution will have. vutil.c is a separate compilation unit that
the CPAN dist already has. vxs.inc will be included by vxs.xs (vxs.c is
obviously already taken, being generated from vxs.xs).
In the perl core, util.c includes vutil.c and universal.c
includes vxs.inc.
Chris 'BinGOs' Williams [Sat, 4 Jan 2014 12:24:00 +0000 (12:24 +0000)]
Update ExtUtils-MakeMaker to CPAN version 6.86
[DELTA]
6.86 Sat Jan 4 12:17:53 GMT 2014
No changes from 6.85_07
6.85_07 Wed Jan 1 18:55:22 GMT 2014
Bug fixes:
* Expanded test coverage for metafiles
Doc fixes:
* Documentation expanded to mention JSON metafiles
6.85_06 Mon Dec 30 23:14:37 GMT 2013
Bug fixes:
* Explicitly require dynaloader before using mod2fname
6.85_05 Sun Dec 29 11:25:00 GMT 2013
Bug fixes:
* Export 'configure' section of prereqs when meta-spec version 2
Doc fixes:
* Document BUILD_REQUIRES defaults
6.85_04 Mon Dec 23 15:00:14 GMT 2013
No changes since v6.85_03 fixing repo tags
6.85_03 Mon Dec 23 14:55:37 GMT 2013
Bug fixes:
* RT#91540 PREREQ_FATAL not recognised on command line
6.85_02 Tue Dec 17 10:13:28 GMT 2013
New features:
* Added PPM_UNINSTALL_EXEC and PPM_UNINSTALL_SCRIPT options
to PPD generation
6.85_01 Mon Dec 16 13:15:43 GMT 2013
Bug Fixes:
* harden xsubpp locating loop in MM_Unix
Chris 'BinGOs' Williams [Sat, 4 Jan 2014 10:31:51 +0000 (10:31 +0000)]
Update Module-Load to CPAN version 0.28
[DELTA]
0.28 Sat Jan 4 11:07:27 GMT 2014
* Fix 'Prototype after' warnings
0.26 Sat Jan 4 10:08:35 GMT 2014
* New functions added (reisub)
* Documented by magnolia
Brian Fraser [Sat, 4 Jan 2014 03:03:35 +0000 (00:03 -0300)]
perldelta for d_libname_unique
Brian Fraser [Wed, 27 Nov 2013 16:25:25 +0000 (13:25 -0300)]
Teach ExtUtils::CBuilder to handle mod2fname properly
Brian Fraser [Wed, 20 Nov 2013 05:37:33 +0000 (02:37 -0300)]
Configure: Introduce d_libname_unique
Brian Fraser [Wed, 15 May 2013 11:52:18 +0000 (08:52 -0300)]
DynaLoader: Introduce d_libname_unique
Android's linker has some unusual behavior, in that it only uses
the basename of a library in its cache. That means that, as far as
dlopen() is concerned, the libraries for Hash::Util and List::Util,
both of which are called Util.so, are the same.
This commit teaches DynaLoader about d_libname_unique. When
defined, it signals DynaLoader to define a mod2fname sub that renames
the .so files to something "unique"; for example,
Hash/Util/Util.so becomes Hash/Util/PL_Hash__Util.so.
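A minimal sketch of the renaming, inferred only from the single example above (Hash/Util/Util.so becoming Hash/Util/PL_Hash__Util.so). The function name unique_libname() is hypothetical; the real mod2fname is a sub defined by DynaLoader and may apply rules not shown here:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: build "PL_" followed by the module name with each
 * "::" replaced by "__", so "Hash::Util" gives "PL_Hash__Util".
 * Returns 0 on success, or -1 if out (of size outlen) is too small. */
int unique_libname(const char *module, char *out, size_t outlen)
{
    size_t j;
    const char *p;

    assert(module != NULL && out != NULL);
    if (outlen < 4)                     /* room for "PL_" plus '\0' */
        return -1;
    memcpy(out, "PL_", 3);
    j = 3;
    for (p = module; *p != '\0'; p++) {
        int sep = (p[0] == ':' && p[1] == ':');
        size_t need = sep ? 2 : 1;
        if (j + need + 1 > outlen)      /* keep room for the '\0' */
            return -1;
        if (sep) {
            out[j++] = '_';
            out[j++] = '_';
            p++;                        /* skip the second ':' */
        } else {
            out[j++] = *p;
        }
    }
    out[j] = '\0';
    return 0;
}
```

Under this rule Hash::Util and List::Util map to PL_Hash__Util and PL_List__Util, so the two libraries no longer share the basename Util.so that Android's linker caches on.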
Karl Williamson [Fri, 3 Jan 2014 16:17:14 +0000 (09:17 -0700)]
porting/diag.t: Add comments
Karl Williamson [Fri, 3 Jan 2014 15:00:32 +0000 (08:00 -0700)]
perldiag.pod: Correct some categories
The warning categories were wrong in a few places here. diag.t had them
marked as to be ignored. Correcting them in the pod allows them to be
removed from the ignore list.
This commit additionally adds text for a few messages that can be either
fatal or just warnings.
Steve Hay [Fri, 3 Jan 2014 08:48:20 +0000 (08:48 +0000)]
Upgrade Encode from version 2.56 to 2.57
Father Chrysostomos [Thu, 2 Jan 2014 14:27:24 +0000 (06:27 -0800)]
perldiag: Rewrap an entry slightly
for better splain output
Karl Williamson [Thu, 2 Jan 2014 03:08:02 +0000 (20:08 -0700)]
Change some warnings in utf8n_to_uvchr()
This bottom level function decodes the first character of a UTF-8 string
into a code point. Using it directly is discouraged. This
commit cleans up some of the warnings it can raise. Now, tests for
malformations are done before any tests for other potential issues. One
of those issues involves code points so large that they have never
appeared in any official standard (the current standard has scaled back
the highest acceptable code point from earlier versions). It is
possible (though not done in CPAN) to warn and/or forbid these code
points, while accepting smaller code points that are still above the
legal Unicode maximum. The warning message for this now includes the
code point if representable on the machine. Previously it always
displayed raw bytes, which is what it still does for non-representable
code points.
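The ordering described above (malformations first, then the above-Unicode check on a well-formed value) can be sketched as a small decoder. This is a hypothetical illustration, not utf8n_to_uvchr(): the enum and function names are made up, and it handles only sequences of up to 4 bytes, whereas perl's extended encoding goes further:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum decode_status {
    DECODE_OK,
    DECODE_MALFORMED,      /* empty, bad start byte, too short, bad continuation, overlong */
    DECODE_ABOVE_UNICODE   /* well-formed, but above U+10FFFF */
};

/* Decode the first character of s[0..len). Malformations are checked
 * before anything else; only a well-formed value is then tested against
 * the Unicode maximum, and in that case *cp holds the code point so a
 * warning could report it rather than raw bytes. */
enum decode_status decode_first(const unsigned char *s, size_t len,
                                uint32_t *cp, size_t *consumed)
{
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t n, i;
    uint32_t value;

    assert(s != NULL && cp != NULL && consumed != NULL);
    if (len == 0) return DECODE_MALFORMED;
    if (s[0] < 0x80)      { n = 1; value = s[0]; }
    else if (s[0] < 0xC0) return DECODE_MALFORMED;   /* stray continuation byte */
    else if (s[0] < 0xE0) { n = 2; value = s[0] & 0x1F; }
    else if (s[0] < 0xF0) { n = 3; value = s[0] & 0x0F; }
    else if (s[0] < 0xF8) { n = 4; value = s[0] & 0x07; }
    else return DECODE_MALFORMED;                    /* beyond this sketch */

    if (n > len) return DECODE_MALFORMED;            /* too short: never read past the end */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return DECODE_MALFORMED;
        value = (value << 6) | (s[i] & 0x3F);
    }
    if (value < min_for_len[n]) return DECODE_MALFORMED;  /* overlong */

    *cp = value;
    *consumed = n;
    return value > 0x10FFFF ? DECODE_ABOVE_UNICODE : DECODE_OK;
}
```

Because the length check uses the caller-supplied len, a truncated sequence is reported as malformed instead of causing a read past the buffer.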
Ricardo Signes [Thu, 2 Jan 2014 00:21:43 +0000 (19:21 -0500)]
do not have perlbug talk about perlthanks
Ricardo Signes [Wed, 1 Jan 2014 23:20:21 +0000 (18:20 -0500)]
remove the claim that perlthanks gets an autoreply
Karl Williamson [Wed, 1 Jan 2014 05:41:39 +0000 (22:41 -0700)]
numeric.c: Use macros instead of strchr()
This replaces uses of strchr() (and hence its loop) with a simple array
lookup, mask, and test. This means an extra test is needed later in
the hex decoding case to get the hex value, where previously a
subtraction was used. However, these two tests are fewer than the
average number done in strchr().
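A sketch of the two approaches, assuming a 256-entry table of class bits. The table, bit names and functions here are hypothetical; perl has its own character-class table, which this does not reproduce, and a real implementation would more likely use a precomputed table than fill one at run time:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical character-class bits. */
#define CC_XDIGIT 0x01u
#define CC_OCTAL  0x02u

static unsigned char char_class[256];

/* Fill the table once; afterwards each classification is one array
 * index, one mask and one test, with no loop. */
void init_char_class(void)
{
    const char *p;
    for (p = "0123456789abcdefABCDEF"; *p; p++)
        char_class[(unsigned char)*p] |= CC_XDIGIT;
    for (p = "01234567"; *p; p++)
        char_class[(unsigned char)*p] |= CC_OCTAL;
}

int is_xdigit_table(unsigned char c)
{
    return (char_class[c] & CC_XDIGIT) != 0;
}

/* The replaced style: strchr() scans the string on every call. The
 * c != '\0' guard is needed because strchr() also finds the terminator. */
int is_xdigit_strchr(unsigned char c)
{
    return c != '\0' && strchr("0123456789abcdefABCDEF", c) != NULL;
}
```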
Karl Williamson [Wed, 1 Jan 2014 05:35:46 +0000 (22:35 -0700)]
handy.h: Add two macros
handy.h contains a macro that reads a hex digit and returns its value,
with fewer branches than a naive implementation would use. This commit
just copies and modifies it to create two macros for
1) just converting the hex value, without advancing the input; and
2) doing the same for an octal value.
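A sketch of the branch-light idea, assuming ASCII. The function names are hypothetical and this is not the handy.h text. '0'..'9' are 0x30..0x39, so the low nibble is already the value; 'a'..'f' (0x61..0x66) and 'A'..'F' (0x41..0x46) become 10..15 after adding 9 and keeping the low nibble, so one digit/non-digit test replaces separate case checks. The asserts reflect the requirement, noted in the handy.h assertion commit, that the input already be a valid digit:

```c
#include <assert.h>
#include <ctype.h>

/* Value of a hex digit; the caller must already know c is one. */
unsigned xdigit_value(unsigned char c)
{
    assert(isxdigit(c));
    return 0xF & (isdigit(c) ? c : c + 9u);
}

/* Value of an octal digit; the caller must already know c is one. */
unsigned octal_value(unsigned char c)
{
    assert(c >= '0' && c <= '7');
    return c - '0';
}
```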
Karl Williamson [Wed, 1 Jan 2014 05:19:45 +0000 (22:19 -0700)]
handy.h: Add debugging assertion
This macro requires the input to be a hex digit, without testing. It is
prudent to assert that under DEBUGGING.
Karl Williamson [Wed, 1 Jan 2014 05:13:06 +0000 (22:13 -0700)]
Move a macro from utf8.h to handy.h for wider use.
Future commits will want this available outside utf8.h
Karl Williamson [Wed, 1 Jan 2014 04:57:53 +0000 (21:57 -0700)]
regen/warnings.pl: Add comments
These note that warnings categories should be independent in the calls
to ckWARN() and packWARN() type macros.
Karl Williamson [Wed, 1 Jan 2014 05:05:45 +0000 (22:05 -0700)]
ext/XS-APItest/t/utf8.t: White-space only
Indent and reflow to fit into 79 columns due to a new enclosing block in
the previous commit
Karl Williamson [Wed, 1 Jan 2014 04:45:54 +0000 (21:45 -0700)]
utf8.c: Fix warning category and subcategory conflicts
The warnings categories non_unicode, nonchar, and surrogate are all
subcategories of 'utf8'. One should never call a packWARN() with both a
category and a subcategory of it, as it will mean that one can't
completely make the subcategory independent. For example, with

    use warnings 'utf8';
    no warnings 'surrogate';

surrogate warnings will still be output if they are tested with

    ckWARN2(WARN_UTF8, WARN_SURROGATE);

utf8.c was guilty of this.
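The problem can be seen in a toy model where warning categories are bits and a two-category check fires if either is enabled, which is how the ckWARN2() family behaves. The bit values and function names are hypothetical; perl's real warning bit strings are more involved:

```c
#include <assert.h>

/* Toy warning bits; surrogate, nonchar and non_unicode are
 * subcategories of utf8. */
enum {
    W_UTF8        = 1u << 0,
    W_SURROGATE   = 1u << 1,
    W_NONCHAR     = 1u << 2,
    W_NON_UNICODE = 1u << 3
};

/* "use warnings 'utf8'" turns on the category and its subcategories. */
unsigned use_warnings_utf8(unsigned enabled)
{
    return enabled | W_UTF8 | W_SURROGATE | W_NONCHAR | W_NON_UNICODE;
}

/* "no warnings 'surrogate'" turns off only that subcategory. */
unsigned no_warnings_surrogate(unsigned enabled)
{
    return enabled & ~(unsigned)W_SURROGATE;
}

/* Single-category check. */
int ck_warn(unsigned enabled, unsigned a)
{
    return (enabled & a) != 0;
}

/* Two-category check: true if EITHER category is enabled. */
int ck_warn2(unsigned enabled, unsigned a, unsigned b)
{
    return (enabled & (a | b)) != 0;
}
```

Because the parent bit stays on, testing utf8 together with surrogate still succeeds after the subcategory was turned off; testing the subcategory alone gives the intended answer.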
Karl Williamson [Wed, 1 Jan 2014 04:41:09 +0000 (21:41 -0700)]
utf8.c: Don't do redundant test
The test here for WARN_UTF8 is redundant, as only if one of the other
three warning categories is enabled will anything actually be output.
Karl Williamson [Wed, 1 Jan 2014 04:37:52 +0000 (21:37 -0700)]
utf8.c: Typo in comment, and clarification
Karl Williamson [Tue, 31 Dec 2013 19:30:35 +0000 (12:30 -0700)]
Unicode::UCD::prop_aliases(): Don't generate spurious warnings
Certain inputs to prop_aliases caused spurious warnings.
Karl Williamson [Wed, 1 Jan 2014 19:58:05 +0000 (12:58 -0700)]
t/test.pl: Reword comment
There was a typo in this comment, but looking at it closely made me
realize that I didn't really understand it. This clarifies it.
Dominic Hargreaves [Wed, 1 Jan 2014 19:45:35 +0000 (19:45 +0000)]
perl5180delta: typos
Karl Williamson [Wed, 1 Jan 2014 18:36:17 +0000 (11:36 -0700)]
regexec.c: Clarify comment
Karl Williamson [Wed, 1 Jan 2014 16:59:20 +0000 (09:59 -0700)]
regexec.c: Guard against malformed UTF-8 in [...]
The code that handles bracketed character classes assumed that the
string being matched against did not have the too-short malformation;
this could lead to reading beyond-the-end-of-buffer. (It did check for
other malformations.) This is solved by changing the function that
operates on bracketed character classes to take and use an extra
parameter, the actual buffer end.
Karl Williamson [Wed, 1 Jan 2014 16:52:55 +0000 (09:52 -0700)]
pp.c: Remove unnecessary mask operation.
An unsigned character (U8) should not have more than 8 bits of data, so
no need to force that by masking with 0xFF.
Karl Williamson [Wed, 1 Jan 2014 16:49:04 +0000 (09:49 -0700)]
pp.c: Guard against malformed UTF-8 input in ord()
This code got the actual length of the input scalar, but discarded it.
If that scalar contains malformed UTF-8 that has fewer bytes than is
indicated, a read beyond-buffer-end could happen. Simply use the actual
length.
H.Merijn Brand [Wed, 1 Jan 2014 17:32:50 +0000 (18:32 +0100)]
Regenerated Configure after backports
Father Chrysostomos [Wed, 1 Jan 2014 14:06:30 +0000 (06:06 -0800)]
pp.c: Simplify lc and uc stringification code
Originally, lc and uc would not warn about undef, due to an
implementation detail.
The implementation changed in 673061948, and extra code was added to
keep the behaviour the same.
Commit 0a0ffbced enabled the warnings about undef, but did so by adding
even more code in the midst of the blocks that existed solely to avoid
the warning.
We can just delete those blocks and put in a simple stringification.
Father Chrysostomos [Wed, 1 Jan 2014 13:56:15 +0000 (05:56 -0800)]
pp.c: Improve self-referential comment
pp.c:pp_lc has this:
/* Here is where we would do context-sensitive actions. See the
* commit message for this comment for why there isn't any */
If I try to look up the commit that added the comment, I get this:
commit 06b5486afd6f58eb7fdf8c5c8cdb8520a4c87f40
Author: Karl Williamson <public@khwilliamson.com>
Date: Fri Nov 11 10:13:28 2011 -0700
pp.c: White-space only
This outdents and reflows comments as a result of the removal of a
surrounding block
86510fb15 was the commit that added the comment, whose commit message
contains the explanation, so cite that directly.
Father Chrysostomos [Wed, 1 Jan 2014 13:51:36 +0000 (05:51 -0800)]
Reënable in-place lc/uc
It used to be that this code:
    for("$foo") {
        lc $_;
        ...
    }
would modify $_, allowing other code in the ‘for’ block to see the
changes (bug #43207). Commit 17fa077605 fixed that by changing the
logic that determined whether lc/uc(first) could modify the scalar
in place.
In doing so, it stopped in-place modification from happening at all,
because the condition became SvPADTMP && SvTEMP, which never happens.
(SvPADTMP usually indicates an operator return value stored in a pad;
i.e., a scalar that will next be used by the same operator again to
return another value. SvTEMP indicates that the REFCNT will go down
shortly, usually a temporary value created solely for the sake of
returning something.)
Now that bug #78194 is fixed, for("$foo") no longer exposes a PADTMP
to the following code, so we *can* now assume (as was done erroneously
before) that PADTMP indicates something like lc("$foo$bar") and modify
pp_stringify’s return value in place.
Also, we can extend this to apply to TEMP variables that have a
reference count of 1, since they cannot be in use elsewhere. We skip
TEMP variables with set-magic, because they could be tied, and
SvSETMAGIC would have a side effect. (That could happen with
lc(delete $h{tied_elem}).)
Previously, this was skipped for uc and lc for overloaded references,
since stringification could change the utf8ness. That is no longer
sufficient. As of Perl 5.16, typeglobs and non-overloaded blessed
references can also enable their utf8 flag upon stringification, if
the stash or glob names contain wide characters. So I changed the
!SvAMAGIC (not overloaded) to SvPOK (is a string already), which will
cover most cases where this optimisation helps. The two tests added
to the end of lc.t fail with !SvAMAGIC.
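The in-place condition described above can be summarised as a predicate. This is a simplified model, not pp.c: the struct and function are hypothetical stand-ins for the SV flag macros named in the message, and the real code may test further conditions (such as read-only or utf8 state) that are not described here:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the scalar flags named above. */
struct scalar_flags {
    bool padtmp;       /* operator return value held in a pad (SvPADTMP) */
    bool temp;         /* temporary about to be freed (SvTEMP) */
    bool set_magic;    /* has set-magic, e.g. a tied element */
    unsigned refcnt;   /* reference count (SvREFCNT) */
    bool pok;          /* already a string (SvPOK) */
};

/* Modify in place only a PADTMP, or a TEMP with a reference count of 1
 * and no set-magic, and in either case only if it is already a string
 * (so stringification cannot change its utf8ness). */
bool can_lc_in_place(const struct scalar_flags *sv)
{
    bool reusable = sv->padtmp
                 || (sv->temp && sv->refcnt == 1 && !sv->set_magic);
    return reusable && sv->pok;
}
```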
Father Chrysostomos [Tue, 31 Dec 2013 05:29:49 +0000 (21:29 -0800)]
lc.t: More tests for #43207
Chris 'BinGOs' Williams [Wed, 1 Jan 2014 09:03:39 +0000 (09:03 +0000)]
Update Time-Piece to CPAN version 1.26
[DELTA]
1.26 2013-12-29
- no changes since previous (trial) release
1.25_01 2013-12-16
- fix compiling for WinCE, execution is untested
- add a .gitignore (from Win32::API)
- fix a compiler warning about unused var, and add inlining
- add PERL_NO_GET_CONTEXT to XS to bring the binary into 21st century
- refactor XS code to remove large sections of duplicate machine code
- fix _crt_localtime to return year only once, previously
_crt_localtime returned year (item #6) twice in the list
Karl Williamson [Tue, 31 Dec 2013 15:42:27 +0000 (08:42 -0700)]
regexec.c: Remove redundant test
A string must be in utf8 format if the first code point that came from
it is above 255; therefore it is redundant to test for both.
Martin McGrath [Tue, 31 Dec 2013 16:22:28 +0000 (09:22 -0700)]
PATCH [perl #120901] perlbug.PL - Add to user feedback/docs
Karl Williamson [Tue, 31 Dec 2013 15:28:20 +0000 (08:28 -0700)]
Merge branch into blead that changes \p above-Unicode behavior
This changes the behavior of Unicode property matching for above-Unicode
code points. It changes the format of mktables output tables for binary
properties for efficiency, eliminating a pass while reading them in.
It changes the format somewhat of the output tables when output for
debugging purposes so that they no longer fail some tests.
Karl Williamson [Sun, 29 Dec 2013 04:36:58 +0000 (21:36 -0700)]
Remove no-longer used inversion list function
The function _invlist_invert_prop() is hereby removed. The recent
changes to allow \p{} to match above-Unicode mean that no special
handling of properties need be done when inverting.
This function was accessible to XS code that cheated by using #defines
to pretend it was something it wasn't, but it also has been marked
as subject to change since its inception, and never appeared in any
documentation.
Karl Williamson [Thu, 26 Dec 2013 21:01:49 +0000 (14:01 -0700)]
White-space only
This indents various newly-formed blocks (by the previous commit) in
these three files, and reflows lines to fit into 79 columns
Karl Williamson [Wed, 25 Dec 2013 03:11:23 +0000 (20:11 -0700)]
Change format of mktables output binary property tables
mktables now outputs the tables for binary properties as inversion
lists, with a size as the first element. This means simpler handling of
these tables in the core, including removal of an entire pass over them
(it was done just to get the size). These tables are marked as for
internal use by the Perl core only, so their format is changeable at
will.
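A sketch of how such a table can be queried once loaded into memory, assuming element 0 holds the count and the remaining entries are the sorted code points at which membership flips. The function name and array layout are hypothetical, not the core's data structures. The set contains [e1, e2), [e3, e4), ..., and an odd count means the last range runs to infinity; because the size comes first, no separate pass is needed to count entries:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* table[0] is the number of entries; table[1..] is the inversion list. */
int invlist_contains(const uint32_t *table, uint32_t cp)
{
    size_t count = table[0];
    const uint32_t *list = table + 1;
    size_t lo = 0, hi = count;

    /* Binary search for the number of entries <= cp. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* An odd number of entries <= cp means cp is inside a range. */
    return (lo & 1) != 0;
}
```

For example, { 2, 0x41, 0x5B } describes A..Z, and { 1, 0x110000 } describes every code point above the Unicode maximum.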
Karl Williamson [Tue, 24 Dec 2013 03:35:54 +0000 (20:35 -0700)]
Change \p{} matching for above-Unicode code points
http://markmail.org/message/eod7ukhbbh5tnll4 is the beginning of the
thread that led to this commit.
This commit revises the handling of \p{} and \P{} to treat above-Unicode
code points as typical Unicode unassigned ones, and only output a
warning during matching when the answer is arguable under strict Unicode
rules (that is "matched" for \p{}, and "didn't match" for \P{}). The
exception is if the warning category has been made fatal, then it tries
hard to always output the warning. The definition of \p{All} is changed
to be qr/./s, and no warning is issued at all for matching it against
above-Unicode code points.
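The warning rule above can be written as a small decision function. This is a hypothetical sketch of the described policy, not regexec.c: the names are made up, and giving \p{All} precedence over a fatalized warning is an assumption, since the message does not spell out that combination:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UNICODE_MAX 0x10FFFFu

/* matched: the result of \p{...} (negated == false) or \P{...}
 * (negated == true) after treating cp as an unassigned code point. */
bool should_warn_above_unicode(uint32_t cp, bool negated, bool matched,
                               bool is_all, bool warning_fatal)
{
    if (cp <= UNICODE_MAX || is_all)
        return false;                 /* in range, or \p{All}: no warning */
    if (warning_fatal)
        return true;                  /* fatalized: try hard to always warn */
    /* Arguable only when \p{} matched or \P{} did not match. */
    return negated ? !matched : matched;
}
```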
Karl Williamson [Thu, 19 Dec 2013 05:57:55 +0000 (22:57 -0700)]
regcomp.c: comment typo and rewording
Karl Williamson [Thu, 19 Dec 2013 05:53:46 +0000 (22:53 -0700)]
regcomp.c: Refactor 'if' statement
This refactoring makes it clear that within a (?[]), we don't try to
optimize the [] part. This is for clarity for the future only, as
currently the only changed behavior is if this is being compiled with /l
rules, and (?[]) generates a syntax error under /l.
Karl Williamson [Thu, 19 Dec 2013 05:41:35 +0000 (22:41 -0700)]
Fatalized non-unicode warnings skip regex optimization
This makes sure that fatalized non-unicode warnings actually get output.
For example \p{Line_Break=CR} would normally get optimized into an EXACT
node. But if the user has made non-unicode warnings fatal indicating
they want to be sure not to try to even match such code points, the
optimization is skipped so that the checks are made.
Documentation for this change will be in a future commit.
Karl Williamson [Wed, 27 Nov 2013 19:16:25 +0000 (12:16 -0700)]
mktables: Split off some functionality
This adds a new function that formats a count of code points. Currently
it calls the current function that formats a generic number. A future
commit will change things so that the outputs of the two functions differ. The
reason for this commit is to make that later commit's difference listing
smaller and easier to understand.
Karl Williamson [Wed, 27 Nov 2013 18:39:48 +0000 (11:39 -0700)]
mktables: Add \p{Unicode}
This is a clearer synonym for \p{Any}
Karl Williamson [Wed, 27 Nov 2013 17:59:08 +0000 (10:59 -0700)]
mktables: Separate out defns of \p{Any} and \p{All}
This is in preparation to making them mean different things, in a future
commit
Karl Williamson [Tue, 26 Nov 2013 03:18:31 +0000 (20:18 -0700)]
regcomp.h: Reorder some #defines
There are no logic changes. The previous commit changed the numbers for
some of the bits. This commit re-arranges things so that the #defines
are again in numerical order.
Karl Williamson [Tue, 26 Nov 2013 03:12:33 +0000 (20:12 -0700)]
Re-order some flag bits to avoid potential branches
The ANYOF_INVERT flag is used in every single pattern match of
[bracketed character classes]. With backtracking, this can be a huge
number. All the other flags' uses pale by comparison. I noticed that
by making it the lowest bit, we don't have to use cBOOL, as the only
possibilities are 0 and 1. cBOOL hopefully will be optimized away, but
not always. This commit reorders some of the flag bits to make this one
the lowest, and adds a compile check to make sure it isn't inadvertently
changed.
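A sketch of why bit 0 matters, with hypothetical flag values (not regcomp.h's). Masking bit 0 already yields 0 or 1, so it can be XORed directly with a 0/1 match result; a higher bit such as 0x08 would yield 0 or 8 and need normalising first. The C11 _Static_assert stands in for the compile-time check the commit describes:

```c
#include <assert.h>
#include <stdbool.h>

#define ANYOF_INVERT_SKETCH 0x01u   /* hypothetical: must be bit 0 */
#define OTHER_FLAG_SKETCH   0x08u   /* hypothetical unrelated flag */

/* Fails to compile if someone moves the invert flag off bit 0. */
_Static_assert(ANYOF_INVERT_SKETCH == 1u, "invert flag must be the lowest bit");

/* in_set: whether the code point is in the bracketed class's set.
 * (flags & ANYOF_INVERT_SKETCH) is 0 or 1, so no cBOOL()-style
 * normalisation is needed before the XOR. */
bool class_matches(unsigned flags, bool in_set)
{
    return (unsigned)in_set ^ (flags & ANYOF_INVERT_SKETCH);
}
```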
Karl Williamson [Mon, 30 Dec 2013 22:16:57 +0000 (15:16 -0700)]
Output regex above-Unicode matching in syn strt class
A warning is supposed to be raised under some conditions when matching
an above-Unicode code point against a Unicode property. Prior to this
patch, if the synthetic start class excluded the code point, the warning
would be skipped, even though a match was attempted.
Karl Williamson [Tue, 26 Nov 2013 02:40:12 +0000 (19:40 -0700)]
Convert regnode to a flag for [...]
Prior to this commit, there were 3 types of ANYOF nodes; now there are
two: regular, and one for the synthetic start class (ssc). This commit
converted the third type, which deals with warning about matching \p{}
against non-Unicode code points, into using the spare flag bit for ANYOF
nodes.
This allows this bit to apply to ssc ANYOF nodes, whereas previously it
couldn't. There is a bug in which the warning isn't raised if the match
is rejected by the optimizer, because of this inability. This bug will
be fixed in a later commit.
Another option would have been to create a new node-type which was an
ANYOF_SSC_WARN_SUPER node. But this adds extra complications to things;
and we have a spare bit that we might as well use. The comments give
better possibilities for freeing up 2 bits should they be needed.
Karl Williamson [Mon, 30 Dec 2013 22:04:37 +0000 (15:04 -0700)]
regcomp.c: Split #define into two
The synthetic start class regnode (SSC) and a bracketed character class
node share much of the same data structure, including a flags field,
and some of the same flag bits within it. Currently, only
locale-related flags (under /l rules) are the same between the two
during construction of the SSC. But a future commit will introduce
another common flag. This commit creates an extra #define for use where
we want the common flags, while retaining the existing one for use where
we want the locale flags. The new #define is just a copy of the
existing one, to be changed in the future commit.
Karl Williamson [Tue, 26 Nov 2013 02:31:57 +0000 (19:31 -0700)]
mktables: Better comment some variables
Karl Williamson [Fri, 15 Nov 2013 04:12:40 +0000 (21:12 -0700)]
mktables: Calculate debugging information placement
When outputting debugging information under the -annotate option, it's
nice to line up the columns. This commit does a pass through the tables
where the final real data column is variable width so that it can figure
out where to put the debugging info so that almost all of the columns can
be lined up, and not have to be right-shifted because of overlong real
data.
Certain tables prior to this commit had been manually eyeballed and
column information hard-coded in. This is no longer necessary. This
means that one parameter to the write() function is no longer used, and
is removed here.
Karl Williamson [Fri, 15 Nov 2013 02:30:42 +0000 (19:30 -0700)]
mktables: White-space only
Outdent a just-removed block, and better align several other statements
Karl Williamson [Fri, 15 Nov 2013 02:32:44 +0000 (19:32 -0700)]
mktables: Convert to use new function
The previous commit added a new function used in newly added code; this
changes some existing code to use that function
Karl Williamson [Thu, 14 Nov 2013 04:56:31 +0000 (21:56 -0700)]
mktables: Don't change table format with debugging info
The -annotate option to mktables causes it to output extra information
(in the form of comments) to its generated tables to make them human
readable and useful for debugging. Prior to this commit, this caused
the tables' formats to be changed somewhat by causing what are normally
ranges to have a line output for each element of the range. This bloats
the tables, and causes UCD.t to fail, as it is looking for a
particular syntax for the tables.
This commit causes the debugging information to be placed separately
but adjacent to the real data. The ranges remain as they would be
without -annotate. This removes the bloat (as the debugging info is
stripped out as the table is read in) and causes UCD.t to pass.
It also allows for the format of the real data to change in a later
commit, and the debugging info can remain relevant.
A future commit will improve the indentation of the comment annotations
Karl Williamson [Tue, 12 Nov 2013 19:09:19 +0000 (12:09 -0700)]
mktables: Improve display of debugging information
Under the -annotate option, mktables outputs the UTF-8 for the printable
characters. This commit adds a non-spacing blank before each such one
that is supposed to combine with its preceding character (marks). This
causes the display of the character to look better.
This necessitated making a local variable more global in scope.
Karl Williamson [Fri, 8 Nov 2013 16:34:54 +0000 (09:34 -0700)]
lib/Unicode/UCD.t: White-space only
Indent a newly formed block
Karl Williamson [Fri, 8 Nov 2013 16:26:51 +0000 (09:26 -0700)]
Add tests for legacy Unicode data files
There are 5 files in lib/unicore/To that may be in direct use by
applications, and which are not used by Perl itself. These have been
changed in an earlier stable release to have comments in them saying
that their use is deprecated, and that Unicode::UCD gives a stable API for
access to the data they contain. However, no warning is given if an
application reads these files, so the deprecation cycle needs to be
quite long. Until we decide to get rid of these files sometime in the
future, we should make sure they exist and are correct. Since they
aren't actually used by Perl, there were no such tests. This commit
adds some tests. It puts them in lib/Unicode/UCD.t, as that required
the least amount of work, as it already has nearly all the
infrastructure required for testing these.
Karl Williamson [Fri, 8 Nov 2013 16:21:11 +0000 (09:21 -0700)]
lib/Unicode/UCD.t: Anchor a couple of regexes
A future commit will need these to be anchored to avoid false positives.
Karl Williamson [Thu, 7 Nov 2013 19:38:31 +0000 (12:38 -0700)]
lib/Unicode/UCD.t: Clarify diagnostic
This diagnostic comes from either of 2 problems, so mention both of
them.
Karl Williamson [Thu, 7 Nov 2013 18:56:09 +0000 (11:56 -0700)]
lib/Unicode/UCD.t: Rename a $variable
This is in preparation for a future commit where the new name makes more
sense.
Karl Williamson [Wed, 6 Nov 2013 17:56:07 +0000 (10:56 -0700)]
Unicode/UCD.t: Add missing 'next' statement
When a test fails, it should do a 'next' to stop processing the current
property.
Karl Williamson [Wed, 6 Nov 2013 05:52:10 +0000 (22:52 -0700)]
mktables: White-space only
Align a few lines to begin on the same column, which has been outdented
so that nothing exceeds 79 columns
Karl Williamson [Wed, 6 Nov 2013 05:33:06 +0000 (22:33 -0700)]
Unicode::UCD: Remove access to some legacy-only properties
Five files are currently being kept around only because they existed
before Unicode::UCD gave access to the properties they define, and some
application programs may rely on their presence and format. More
compact files have supplanted the use of these files by the Perl core.
Mistakenly, Unicode::UCD gave access to these files via the made-up
property names that they are referred to by in mktables. This was
undocumented. This commit removes this access.
Karl Williamson [Mon, 4 Nov 2013 16:57:29 +0000 (09:57 -0700)]
mktables: Clarify overloaded variable name
The term 'full' is overloaded here in this small section of code. In
some cases it refers to the full case mapping versus the simple case
mapping; in other cases it refers to the full name for a property as
opposed to the abbreviated name. This commit expands each to indicate
which is meant.
Karl Williamson [Sun, 3 Nov 2013 05:22:48 +0000 (23:22 -0600)]
mktables, UCD.t: Fix nits in comments; add comment
Karl Williamson [Tue, 29 Oct 2013 01:49:55 +0000 (19:49 -0600)]
mktables: Don't output trailing tabs in tables
This makes sure that the tabs aren't output unless there is a following
non-null value, saving some disk space
Karl Williamson [Mon, 28 Oct 2013 23:00:25 +0000 (17:00 -0600)]
Unicode/UCD.t: white-space, comments
Wrap to 79 columns; add a comment
Karl Williamson [Mon, 28 Oct 2013 22:43:01 +0000 (16:43 -0600)]
mktables: Stop generating most leading zeros
Leading zeros were generated to conform with Unicode usage, but these
are machine-read files so this just takes up some extra space and extra
parsing cycles at run-time. It's a small matter, but we should design
our files to be the most efficient possible. It is possible to get more
human-readable files by using the -annotate option to mktables.
Certain files whose existence has been published have their formats
unchanged, in case some application is reading them. The files contain
comments that their use is deprecated, but there is no warning generated
if they are opened and read, nor is it really feasible to add such a
warning. At some time in the future, we may feel it's ok to remove
these files, as their contents have been available since v5.16 through a
stable API in Unicode::UCD, but until we remove them, we shouldn't
change their formats.
Not all other leading zeros are removed; just the ones that were
convenient to remove.
Karl Williamson [Sun, 20 Oct 2013 16:57:21 +0000 (10:57 -0600)]
mktables: Further explain how things work in a comment
Karl Williamson [Sun, 20 Oct 2013 16:27:42 +0000 (10:27 -0600)]
mktables: Add an advisory comment to generated files.
Karl Williamson [Sun, 20 Oct 2013 16:20:13 +0000 (10:20 -0600)]
mktables: Regenerate if called with different cmd line args
mktables acts pretty much like its own Makefile. This is because the
rules for regenerating are complicated and too hard to keep in sync in a
Makefile with new versions of Unicode. mktables itself already has
enough intelligence to automatically update the rules when it gets
modified to account for new files from Unicode.
However, prior to this commit, it didn't keep track of the options it
was called with, thus it wouldn't necessarily run when those options
changed to affect the desired outputs.
Karl Williamson [Sun, 20 Oct 2013 16:13:39 +0000 (10:13 -0600)]
mktables: Tighten regex match to real data
The actual file has spaces, so use \s instead of the more dangerous dot.
Also, after processing the line, no need to look to see if it matches
something else.
Karl Williamson [Fri, 18 Oct 2013 02:05:18 +0000 (20:05 -0600)]
mktables: Fixup debugging info
The -annotate parameter generates extra information in the tables
created by mktables which is useful to me in understanding the Unicode
standard and debugging. I doubt that anyone else has ever used it. It
has been broken for some tables for some time. This commit fixes those.
Karl Williamson [Mon, 30 Dec 2013 22:43:12 +0000 (15:43 -0700)]
mktables: Always strip off returned comments in tables
mktables generates (among other things) many perl .pl files which, when
executed, return a string containing many lines. Each line may end
with '#' comments. Previously, it didn't always strip off those
comments before returning the string, as it assumed the caller uses a
'do' statement to execute these files, so that the comments are
automatically ignored. However, it turns out that the 'mkheader' script
in Unicode::Normalize doesn't cope with these comments. That script is
usually called only once, at a time when these comments normally aren't
generated, but if it is called when they are, things don't compile. So,
just strip off the comments, rather than letting the 'do' handle it.
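mktables does this in Perl; the line-level operation is small enough to sketch in C. The function name is hypothetical, and the sketch assumes, as a simplification, that '#' never occurs inside the real data of a table line:

```c
#include <assert.h>
#include <string.h>

/* Truncate a table line at its '#' comment and drop the spaces and
 * tabs left in front of it. */
void strip_line_comment(char *line)
{
    char *hash;
    size_t len;

    assert(line != NULL);
    hash = strchr(line, '#');
    if (hash != NULL)
        *hash = '\0';
    len = strlen(line);
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
        line[--len] = '\0';
}
```

Stripping at the source means a consumer such as mkheader sees only data lines, regardless of whether it executes the file with 'do'.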
Karl Williamson [Fri, 18 Oct 2013 02:03:52 +0000 (20:03 -0600)]
mktables: White-space only: wrap to 79 cols