Karl Williamson [Sat, 5 Nov 2011 17:31:47 +0000 (11:31 -0600)]
utf8.c: Use proper Unicode property names
There are five functions in utf8.c that look up Unicode maps--the case-
changing functions. They look up these maps under the names ToDigit,
ToFold, ToLower, ToTitle, and ToUpper. The imminent expansion of Unicode::UCD
to return the mappings for all properties creates a naming conflict, as
three of those names are the same as other properties, Upper, Lower, and
Title.
It was an unfortunate choice of names originally. Now mktables has been
changed to create a list of mapping properties that utf8_heavy.pl reads.
It uses the official names of those properties, so change utf8.c to
correspond.
Karl Williamson [Sat, 5 Nov 2011 17:12:39 +0000 (11:12 -0600)]
utf8_heavy.pl: Find mapping files from table
Previously, utf8_heavy.pl only returned 4 mapping files, the ones that
change case, and their names are known to it. mktables now generates a
list of mapping files that it outputs. This adds these to utf8_heavy's
repertoire.
Karl Williamson [Sat, 5 Nov 2011 16:52:45 +0000 (10:52 -0600)]
utf8_heavy.pl: white-space only
Indenting to reflect being in a new block
Karl Williamson [Sat, 5 Nov 2011 16:34:01 +0000 (10:34 -0600)]
utf8_heavy: Reorder 2 if's
This saves a little redundant code, and will be useful in future commits
Karl Williamson [Sat, 5 Nov 2011 16:18:48 +0000 (10:18 -0600)]
mktables: Add %file_to_swash_name to utf8_heavy.pl
Karl Williamson [Sat, 5 Nov 2011 16:11:02 +0000 (10:11 -0600)]
mktables: Fix comment
utf8_heavy.pl is now being used by Unicode::UCD.
Karl Williamson [Sat, 5 Nov 2011 16:07:46 +0000 (10:07 -0600)]
mktables: Add %loose_property_to_file_of to utf8_heavy.pl
Karl Williamson [Sat, 5 Nov 2011 15:50:04 +0000 (09:50 -0600)]
mktables: Add comment to generated files
Karl Williamson [Sat, 5 Nov 2011 15:25:46 +0000 (09:25 -0600)]
mktables: Add %algorithmic_named_code_points to UCD.pl
Karl Williamson [Sat, 5 Nov 2011 15:17:07 +0000 (09:17 -0600)]
Unicode::UCD: pod: document new/old style block property names
Karl Williamson [Sat, 5 Nov 2011 15:08:19 +0000 (09:08 -0600)]
Unicode::UCD: Add prop_invlist()
This function returns a data structure of all the code points matching
a binary Unicode property or property-value
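Unicode::UCD documents the structure prop_invlist() returns as an inversion list: a sorted list of the code points at which membership switches on or off. The following is a minimal Python sketch of that idea, not the module's Perl code; the helper names are hypothetical.

```python
from bisect import bisect_right

def invlist_from_ranges(ranges):
    """Build an inversion list from sorted, non-overlapping inclusive ranges.

    Even-indexed elements start a run of matching code points; odd-indexed
    elements start a run of non-matching ones.
    """
    invlist = []
    for start, end in ranges:
        if invlist and invlist[-1] == start:
            # Adjacent to the previous range: extend it instead of
            # closing and immediately reopening a run.
            invlist[-1] = end + 1
        else:
            invlist.extend([start, end + 1])
    return invlist

def invlist_contains(invlist, code_point):
    # The number of boundaries <= code_point says which run it falls in;
    # an odd count means it lies inside a matching run.
    return bisect_right(invlist, code_point) % 2 == 1

# Hypothetical example: the ASCII letters.
letters = invlist_from_ranges([(0x41, 0x5A), (0x61, 0x7A)])
print(letters)  # [65, 91, 97, 123]
print(invlist_contains(letters, ord("q")), invlist_contains(letters, ord("[")))  # True False
```

Only the boundaries are stored, so a property covering large contiguous blocks stays small, and a membership test is a single binary search.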
Karl Williamson [Sat, 5 Nov 2011 14:56:37 +0000 (08:56 -0600)]
mktables: Add %loose_defaults to UCD.pl
Karl Williamson [Sat, 5 Nov 2011 14:49:48 +0000 (08:49 -0600)]
utf8_heavy.pl: Correct debugging statement
This was printing out the value before setting it (hence getting the old
value).
Karl Williamson [Sat, 5 Nov 2011 14:34:00 +0000 (08:34 -0600)]
utf8_heavy.pl: Return that property is user-defined
This adds an element to the returned hash that is a boolean as to
whether or not the property is user-defined.
Karl Williamson [Mon, 31 Oct 2011 20:07:25 +0000 (14:07 -0600)]
Unicode::UCD: add prop_aliases(), prop_value_aliases()
Karl Williamson [Fri, 4 Nov 2011 22:50:20 +0000 (16:50 -0600)]
mktables: Add %suppressed_properties to UCD.pl
Karl Williamson [Fri, 4 Nov 2011 22:28:27 +0000 (16:28 -0600)]
mktables: Add %ambiguous_names to UCD.pl
Karl Williamson [Fri, 4 Nov 2011 22:39:05 +0000 (16:39 -0600)]
mktables: Output ISO_Comment table
This is done for the reasons cited in the comment. The table is trivial
in size.
Karl Williamson [Fri, 4 Nov 2011 22:05:33 +0000 (16:05 -0600)]
mktables: Add %prop_value_aliases to UCD.pl
Karl Williamson [Fri, 4 Nov 2011 21:46:42 +0000 (15:46 -0600)]
mktables: Add %prop_aliases in UCD.pl
Karl Williamson [Fri, 4 Nov 2011 21:32:47 +0000 (15:32 -0600)]
mktables: Add %string_property_loose_to_name for UCD.pl
Karl Williamson [Fri, 4 Nov 2011 21:16:24 +0000 (15:16 -0600)]
mktables: White-space only
Earlier commits removed and inserted blocks. This changes the
indentation to correspond
Karl Williamson [Fri, 4 Nov 2011 21:06:52 +0000 (15:06 -0600)]
mktables: store method value in variable
This variable will have later use. This also changes 'table' to
'property' for more clarity. Here they point to the same thing
Karl Williamson [Fri, 4 Nov 2011 21:03:37 +0000 (15:03 -0600)]
mktables: Use method instead of relying on internals
A method call exists so that the caller doesn't need to know these
internals
Karl Williamson [Fri, 4 Nov 2011 20:26:22 +0000 (14:26 -0600)]
mktables: Refactor area of code for future commits
This commit changes this area of code to not skip the whole thing for
string properties. This is because future commits will use those parts
of it.
Karl Williamson [Fri, 4 Nov 2011 20:09:28 +0000 (14:09 -0600)]
mktables: Add %loose_to_standard_value to UCD.pl
This hash is to be used in Unicode::UCD
Karl Williamson [Fri, 4 Nov 2011 19:54:14 +0000 (13:54 -0600)]
mktables: Add %perlprop_to_aliases to UCD.pl
This is for future use in Unicode::UCD
Karl Williamson [Fri, 4 Nov 2011 19:42:17 +0000 (13:42 -0600)]
mktables: Create UCD.pl
This is the initial portion of this file that will be used in
Unicode::UCD.
Karl Williamson [Fri, 4 Nov 2011 19:36:03 +0000 (13:36 -0600)]
mktables: Add 'ucd' member to alias class
This member indicates if the alias is to be documented as being
accessible via Unicode::UCD (after future commits).
Karl Williamson [Fri, 4 Nov 2011 19:26:48 +0000 (13:26 -0600)]
mktables: Set proxy tables
As the comments indicate, this commit causes the tables that are usable
as proxies to be known.
Karl Williamson [Fri, 4 Nov 2011 19:18:26 +0000 (13:18 -0600)]
mktables: White space only
Indent a newly-formed block
Karl Williamson [Fri, 4 Nov 2011 18:39:06 +0000 (12:39 -0600)]
mktables: Add proxied fate
This adds a fate for tables for future use in Unicode::UCD in which the
content of the tables is available via a proxy table. This allows the
table to be retrievable without having to be output itself.
Karl Williamson [Fri, 4 Nov 2011 18:54:31 +0000 (12:54 -0600)]
mktables: Add method to change if outputting pod entry
A new method is created to allow changing whether an entry for a name is to
be output in the regular expression part of perluniprops.pod.
Karl Williamson [Wed, 2 Nov 2011 20:29:15 +0000 (14:29 -0600)]
mktables: don't re-use variable for sub-purpose
A future commit will need the main variable.
Karl Williamson [Wed, 2 Nov 2011 19:15:32 +0000 (13:15 -0600)]
mktables: Remove unnecessary test
This test no longer changes what is output in any way
Karl Williamson [Tue, 1 Nov 2011 13:59:11 +0000 (07:59 -0600)]
mktables: rename subroutine to reflect new reality
The pod's scope is about to be expanded in future commits, and so this
subroutine will only apply to a portion of the pod, not the whole one.
Karl Williamson [Sat, 29 Oct 2011 21:04:01 +0000 (15:04 -0600)]
mktables: rearrange make_Heavy() so can do one HERE doc
This just rearranges the code so that all the variable parts of the HERE
document are calculated first in the routine, then interpolated in.
This is clearer than having many HERE docs with code interspersed.
Karl Williamson [Mon, 17 Oct 2011 17:01:56 +0000 (11:01 -0600)]
mktables: Refactor a string set.
The unadorned full name will be needed for future commits
Karl Williamson [Wed, 12 Oct 2011 16:01:49 +0000 (10:01 -0600)]
mktables: Correct the descriptions of a few tables
The previous descriptions made no sense when reading them out-of-context
in the generated files
Karl Williamson [Wed, 12 Oct 2011 15:43:02 +0000 (09:43 -0600)]
mktables: Don't generate swash info unnecessarily
Certain map files are special purpose, for use by certain modules, as
documented in the files (but these comments are stripped for
non-DEBUGGING builds). They do not need to have the information about
the format, etc. of the file.
Karl Williamson [Wed, 12 Oct 2011 02:05:52 +0000 (20:05 -0600)]
mktables: Mark two internal tables as such
Karl Williamson [Wed, 12 Oct 2011 01:23:11 +0000 (19:23 -0600)]
mktables: Add fate
Some tables are predestined to not be output; others have different
fates. This patch creates a 'fate' field in the table data structure
which replaces the 'internal_only' field, the latter becoming just one
of the possible fates.
The SUPPRESSED and PLACEHOLDER statuses are moved to be fates. This
makes the code cleaner in places, and allows independent setting of the
table's fate from its pod entry.
The empty slot will be filled in later
Karl Williamson [Wed, 12 Oct 2011 00:05:24 +0000 (18:05 -0600)]
mktables: Use hash instead of grep the keys
It doesn't make sense to grep the keys, when you can just look up the
key in the hash.
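In Python terms, the change is the difference between scanning a dict's keys and testing membership directly. A minimal sketch; the table contents and file names are made up for illustration.

```python
# Hypothetical table mapping property names to the files that hold them.
property_files = {"gc": "Gc.pl", "sc": "Sc.pl", "bc": "Bc.pl"}

def has_property_scan(name):
    # Equivalent of grepping the keys: a linear pass over every key.
    return any(key == name for key in property_files)

def has_property_lookup(name):
    # Direct hash lookup: constant time on average, and says what it means.
    return name in property_files
```

Both give the same answer; the lookup avoids the linear pass and states the intent directly.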
Karl Williamson [Tue, 11 Oct 2011 14:30:54 +0000 (08:30 -0600)]
perluniprops: Discourage use of internal properties
These properties are publicly listed because they've always been listed.
I suspect, but haven't checked, that they pre-date the Unicode
properties that they are now equivalent to. They are therefore included
just for backwards compatibility. But new code shouldn't be using them,
unless it's trying to accept very old Unicode versions, so mark them in
the pod as discouraged. (That's the only effect, a pod change.)
Karl Williamson [Tue, 11 Oct 2011 02:04:38 +0000 (20:04 -0600)]
mktables: Fix up a Unihan property
Prior to the specified Unicode release, this property was not handled by
the code without special work. After that, it came packaged with other
properties that the code can handle.
Karl Williamson [Tue, 11 Oct 2011 01:56:20 +0000 (19:56 -0600)]
perluniprops: Insert placeholder
This adds text to perluniprops in case an installation has recompiled
with every Unicode property enabled, so that there isn't just a blank spot
in the pod.
Karl Williamson [Tue, 11 Oct 2011 01:54:27 +0000 (19:54 -0600)]
mktables: Factor out a constant string to a variable
A future commit will want to use this same text.
Karl Williamson [Mon, 10 Oct 2011 16:50:22 +0000 (10:50 -0600)]
mktables: Change variable name
A future commit will use the current name for a purpose for which it
makes more sense.
Karl Williamson [Sat, 8 Oct 2011 21:04:12 +0000 (15:04 -0600)]
perluniprops: Slight wording changes
Karl Williamson [Sat, 8 Oct 2011 20:55:19 +0000 (14:55 -0600)]
perluniprops: Add info about unused Unicode files
A new section to the pod is added.
Karl Williamson [Sat, 8 Oct 2011 19:25:56 +0000 (13:25 -0600)]
mktables: More detail on ignored files
This improves the wording on files that are delivered by Unicode and
aren't used by mktables; and adds the documentation files that
previously weren't listed. This is in preparation for perluniprops
listing these.
Karl Williamson [Sat, 8 Oct 2011 19:17:27 +0000 (13:17 -0600)]
mktables: Add reason for skipping input files
Some files may be in the directory but we don't currently process them.
Add a reason as to why, and add them to the global list of ignored
files. This is in preparation for changing perluniprops to note them.
Karl Williamson [Thu, 6 Oct 2011 04:15:52 +0000 (22:15 -0600)]
mktables: Use variable instead of hard-coded name
Earlier commits changed the properties that are to be internal-only to
have that flagged. Now use that instead of relying on the name
beginning with an underscore.
Karl Williamson [Wed, 5 Oct 2011 14:37:44 +0000 (08:37 -0600)]
mktables: Change name of field to indicate restricted use
Future commits will add a new section to the generated pod. The
existing pod entries are just for regular expressions. Change the names
to reflect that.
Karl Williamson [Tue, 4 Oct 2011 20:42:07 +0000 (14:42 -0600)]
mktables: Improve comment
The comment is repeated in the generated Name.pm file
Karl Williamson [Tue, 4 Oct 2011 17:58:52 +0000 (11:58 -0600)]
Use newly created Name.pm file
Name.pm has been populated in an earlier commit. This removes the
now redundant information in Name.pl, and changes charnames to include
the new Name.pm
Karl Williamson [Tue, 4 Oct 2011 20:06:58 +0000 (14:06 -0600)]
mktables: Actually write data into Name.pm
Earlier commits have paved the way for actual data to be placed into
this file
Karl Williamson [Tue, 4 Oct 2011 19:57:42 +0000 (13:57 -0600)]
mktables: White space only
Removing the enclosing if, together with the fact that this code was copied
from a place where it was nested more deeply, means that we can outdent.
Karl Williamson [Tue, 4 Oct 2011 19:41:00 +0000 (13:41 -0600)]
mktables: Rmv redundant checks
This code was just copied to its new location. It will only be used
once, so there is no need to check that it's for the correct properties;
and a line just above already checks whether it's needed, so the whole
if can be removed.
Karl Williamson [Tue, 4 Oct 2011 18:14:07 +0000 (12:14 -0600)]
mktables: Copy code to new location.
This is simply a straight copy of code that currently gets put into
Name.pl into the subroutine that creates Name.pm.
The only other change is to define and set $pre_body to the empty
string, so that this code will compile in the new place, but the results
are discarded.
Karl Williamson [Tue, 4 Oct 2011 17:57:57 +0000 (11:57 -0600)]
mktables: whitespace changes only
Indent to correspond to newly formed block around this code
Karl Williamson [Tue, 4 Oct 2011 16:41:28 +0000 (10:41 -0600)]
mktables: refactor algorithmically-defined names into globals
This is in preparation for splitting the Name.pl and Name.pm files.
Prior to this patch, the code was copied into any file that needed it.
After this patch, it will be possible to store the code in Name.pm
once.
The main issue here is that Perl creates the Perl_charname file,
and the To/Name.pl file, both of which contain the Name property (the
first one also contains other things). Normally the To/Name.pl is
suppressed, but things can be configured so it is output. The
subroutines that deal with algorithmically defined names are duplicated
in both files. Future commits will also have Unicode::UCD refer to
those subroutines. We do not want a calling program to load duplicate
or triplicate definitions of those subroutines, so we are refactoring
them into Name.pm, which can be included just once.
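To show what an algorithmically defined name is: Hangul syllable names are computed from the code point rather than stored in a table. (CJK unified ideographs are the other common case, named "CJK UNIFIED IDEOGRAPH-" followed by the hex code point.) Below is a Python sketch of the Unicode Standard's Hangul syllable naming rule. It illustrates the kind of subroutine being consolidated into Name.pm and is not the Perl code itself.

```python
# Jamo short names used by the Unicode Standard's Hangul syllable naming rule.
JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S", "SS", "",
          "J", "JJ", "C", "K", "T", "P", "H"]
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
          "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB",
          "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C",
          "K", "T", "P", "H"]

S_BASE = 0xAC00
T_COUNT = len(JAMO_T)                    # 28 (index 0 = no trailing consonant)
N_COUNT = len(JAMO_V) * T_COUNT          # 588 syllables per leading consonant
S_COUNT = len(JAMO_L) * N_COUNT          # 11172 precomposed syllables

def hangul_syllable_name(code_point):
    """Return the algorithmic name of a precomposed Hangul syllable."""
    s_index = code_point - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError(f"U+{code_point:04X} is not a Hangul syllable")
    l_index, rest = divmod(s_index, N_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return "HANGUL SYLLABLE " + JAMO_L[l_index] + JAMO_V[v_index] + JAMO_T[t_index]

print(hangul_syllable_name(0xAC00))  # HANGUL SYLLABLE GA
print(hangul_syllable_name(0xD4DB))  # HANGUL SYLLABLE PWILH
```

Because the name is derived arithmetically, 11,172 names cost only three short arrays. This is why such code is worth sharing from one file rather than being duplicated.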
Karl Williamson [Tue, 4 Oct 2011 15:57:35 +0000 (09:57 -0600)]
mktables: Generate empty Name.pm
This file in later commits will take over portions of Name.pl
Karl Williamson [Tue, 4 Oct 2011 14:38:46 +0000 (08:38 -0600)]
Fix up some properties internal-onlyness
Several properties that should be marked as internal-only aren't, and
vice versa. This doesn't make a difference now, but will for future
commits
Karl Williamson [Sun, 2 Oct 2011 18:26:45 +0000 (12:26 -0600)]
mktables: Refactor INTERNAL ONLY warning generation
All match tables are marked as for Perl core use only. This commit
causes this to happen in the header method for these tables. Most map
tables are also marked, and this is now done in its header method.
This will be useful for later commits
Karl Williamson [Sat, 1 Oct 2011 21:41:41 +0000 (15:41 -0600)]
mktables: Mark props that are perl extensions
Several properties are Perl extensions without this fact being marked.
Karl Williamson [Fri, 30 Sep 2011 03:53:23 +0000 (21:53 -0600)]
mktables: Change formal parameter name
This is to better reflect how future commits will treat this
Karl Williamson [Fri, 30 Sep 2011 03:39:45 +0000 (21:39 -0600)]
mktables: ALERT for U+0007 is really an alias
The previous fix for the problem of ALERT replacing BELL as the name of
U+0007 changed the name of that code point, but the better solution,
which will matter in future commits, is to change the name to nothing
and add an alias of ALERT.
Karl Williamson [Fri, 30 Sep 2011 02:00:43 +0000 (20:00 -0600)]
perluniprops: clarify why ISO_Comment is deprecated
Karl Williamson [Tue, 20 Sep 2011 14:38:22 +0000 (08:38 -0600)]
mktables: XONfoo properties are deprecated
An installation can choose to not suppress these properties, but they
are deprecated nonetheless, and should be listed in the documentation
as such.
Karl Williamson [Tue, 20 Sep 2011 14:34:18 +0000 (08:34 -0600)]
mktables: Move Decomposition_Type suppression
This table is ordinarily not written out. The place to make that decision is where
other similar decisions are made.
Karl Williamson [Wed, 14 Sep 2011 15:46:14 +0000 (09:46 -0600)]
UCD.t: fix test names
Karl Williamson [Wed, 14 Sep 2011 15:17:03 +0000 (09:17 -0600)]
Change internal sub name to begin with underscore
This is in the utf8 package, which is used in a number of places,
so change the name to avoid potential conflicts
Karl Williamson [Wed, 14 Sep 2011 01:54:32 +0000 (19:54 -0600)]
UCD.t: Convert to use done_testing()
Future commits will introduce many more tests that are fairly difficult
to count ahead of time.
Karl Williamson [Wed, 14 Sep 2011 01:50:04 +0000 (19:50 -0600)]
Unicode::UCD: various nits in pod
Karl Williamson [Wed, 14 Sep 2011 01:38:33 +0000 (19:38 -0600)]
Unicode::UCD: correct minor pod error
An unassigned code point is considered to be in the "Unknown" script by
Unicode, and since 5.14, this function returns that correct value.
Karl Williamson [Mon, 5 Sep 2011 04:12:11 +0000 (22:12 -0600)]
mktables: output floating pt as strings
mktables creates data structures in files that are later read in. Some
of these are nominally floating point numbers, like 2.0. But they are
actually version numbers, and if output without being quoted, they will
be read in as just an integer, and a string compare will fail. Actual
floating point numbers are also output; enclosing them in quotes is
harmless, since they will be coerced into the correct data type when
necessary.
This fix doesn't correct any known existing problems, but is needed for
future commits.
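The pitfall can be reproduced outside Perl. This Python sketch assumes, as a stand-in for Perl's behavior, that an unquoted literal is read as a number and later stringified with Perl's default format (roughly "%.15g"). The reader functions are hypothetical.

```python
import json

def perl_like_stringify(number: float) -> str:
    # Stand-in for Perl's number-to-string conversion: 2.0 becomes "2".
    return "%.15g" % number

def read_unquoted(literal: str) -> str:
    # An unquoted literal is parsed as a number, then later used as a string.
    return perl_like_stringify(float(literal))

def read_quoted(literal: str) -> str:
    # A quoted literal keeps its exact text.
    return json.loads(literal)

version = "2.0"
print(read_unquoted(version) == version)              # False: "2" != "2.0"
print(read_quoted(json.dumps(version)) == version)    # True
# A quoted real float still converts to a number when arithmetic needs one.
print(float(read_quoted(json.dumps("0.5"))) * 2)      # 1.0
```

Quoting everything costs nothing for genuine floats and preserves version strings exactly.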
Karl Williamson [Sat, 3 Sep 2011 19:12:26 +0000 (13:12 -0600)]
perluniprops: Clarify language about obsoleted properties
Karl Williamson [Sat, 3 Sep 2011 19:07:34 +0000 (13:07 -0600)]
perluniprops: Don't list newer internal-only properties
Certain property tables are constructed for Perl's internal use. The
ones already in existence when mktables was revamped were retained
just in case someone was using them; but the newer ones, and any yet
to be created, shouldn't be used outside of the Perl core, and hence
shouldn't be documented.
Karl Williamson [Sat, 3 Sep 2011 15:52:07 +0000 (09:52 -0600)]
mktables: Change variable name to prevent confusion
Best current practices say not to use 'last', as it is ambiguous. In this
case we mean 'max'
Karl Williamson [Tue, 23 Aug 2011 23:36:13 +0000 (17:36 -0600)]
mktables: Don't output Decomposition Type mapping table
This table is marked as for internal Perl use only, and its data are
already found in lib/unicore/Decomposition.pl
Karl Williamson [Tue, 23 Aug 2011 23:05:52 +0000 (17:05 -0600)]
mktables: Add comment to db file
The two properties have been combined into one table for some time.
This notes that fact
Karl Williamson [Tue, 23 Aug 2011 23:04:04 +0000 (17:04 -0600)]
mktables: rmv extra blank between words
Karl Williamson [Tue, 23 Aug 2011 23:02:05 +0000 (17:02 -0600)]
mktables: Don't use hard-coded strings as hash keys
I'm always nervous about using strings for hash keys. I could make a
typo and not know it. This changes a set of these into using $variables
instead, so that a typo will cause a syntax error.
Karl Williamson [Tue, 23 Aug 2011 22:59:29 +0000 (16:59 -0600)]
mktables: Don't hard-code true/false synonyms
Instead, use the values from a binary property furnished by Unicode.
Then, any changes to the synonyms are automatically propagated
Karl Williamson [Mon, 22 Aug 2011 17:44:22 +0000 (11:44 -0600)]
mktables: New table format descriptor
The ScriptExtensions property has a format that doesn't quite fit with
the existing ones. Every value is a string, which is the format it was
set to prior to this commit, but some strings are blank-separated lists
of scripts, and shouldn't be interpreted as just a single string. This
change is for the Unicode::UCD::prop_invmap() function, currently in
development.
Karl Williamson [Sat, 3 Sep 2011 20:56:57 +0000 (14:56 -0600)]
mktables: generalize uniprops swash format table
This makes the generated table of format strings in perluniprops go
through folding and tabbing.
Karl Williamson [Mon, 22 Aug 2011 01:14:42 +0000 (19:14 -0600)]
mktables: kIICore becomes a forced bin property
An earlier commit prepared the way to make this property be a
FORCED_BINARY type property, which will give better results when looking
at the db using future tools in Unicode::UCD than it does currently.
Karl Williamson [Mon, 22 Aug 2011 00:57:28 +0000 (18:57 -0600)]
mktables: Accept Unicode combo binary properties
Unicode currently has one property that is a combination of a property
with enum values, plus a binary property. The way it's supposed to work
is that the property can be referred to as true or false, and any
non-empty value will be considered true, regardless of what it is.
The one property is in Unihan only, which currently requires downloading
its db and recompiling to use in the Perl core. But mktables is
supposed to work on all Unicode properties.
Prior to the work that is going on to expose the Unicode db in
Unicode::UCD, this property was dealt with by simply changing all
non-empty values to true. But with the details of the db about to be
exposed, the real original values should be displayed.
What this patch does is to create a new property type, FORCED_BINARY.
The way it is handled, which Unicode doesn't seem to require, is to
allow matching on the original values, plus the standard true and false
ones.
I have tried to make it general, so truth is anything that isn't the
default value (so it doesn't have to be just the null string), plus I
don't believe that the table has to be an enum value to work.
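The FORCED_BINARY matching rule described above can be sketched in Python. Truth means "not the default value", the standard binary synonyms (Y/Yes/T/True, N/No/F/False) are accepted, and so are the original values. The class, its method names, and the sample values are hypothetical; this is not mktables' implementation.

```python
TRUE_SYNONYMS = {"y", "yes", "t", "true"}
FALSE_SYNONYMS = {"n", "no", "f", "false"}

class ForcedBinaryProperty:
    """Hypothetical model of a property that is both enum-like and binary."""

    def __init__(self, values, default=""):
        self.values = values      # code point -> original (non-default) value
        self.default = default    # value for every code point not listed

    def value_of(self, code_point):
        return self.values.get(code_point, self.default)

    def matches(self, code_point, query):
        q = query.strip().lower()
        value = self.value_of(code_point)
        if q in TRUE_SYNONYMS:
            # True means "anything other than the default", whatever it is.
            return value != self.default
        if q in FALSE_SYNONYMS:
            return value == self.default
        # Otherwise match against the original value itself.
        return value.lower() == q

# Made-up values, for illustration only.
prop = ForcedBinaryProperty({0x4E00: "A", 0x4E01: "B"})
print(prop.matches(0x4E00, "A"), prop.matches(0x4E01, "Yes"), prop.matches(0x41, "No"))
```

Keeping the original values means a lookup tool can still display them, while regex-style true/false queries behave as for any binary property.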
Karl Williamson [Mon, 22 Aug 2011 00:30:46 +0000 (18:30 -0600)]
mktables: Lock a complement table
A table that is defined as an inverse of another should be read-only,
with no ability to add or subtract code points. This commit changes the
'set_complement' method from using the default accessor to a custom one
which locks the table.
Karl Williamson [Mon, 22 Aug 2011 00:29:02 +0000 (18:29 -0600)]
mktables: White-space only
This patch indents a block that was surrounded by an 'else' in a
previous patch
Karl Williamson [Mon, 22 Aug 2011 00:14:40 +0000 (18:14 -0600)]
mktables: Don't populate inverse tables
This patch saves some memory and time by skipping the populating of
tables which are complements of other ones. It relies on a previous
commit that causes operations on the complement to actually work on
the values of the master.
Karl Williamson [Sun, 21 Aug 2011 16:24:11 +0000 (10:24 -0600)]
mktables: use master range list if inverse
This patch subclasses the method that returns the range associated with
a table to instead return its complement's range. This means that when
working on a table that is a complement, you are really working on the
table it complements. This will be used in a future commit to avoid populating
the complement.
The base class has to change so that the method can be subclassed.
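This commit and the two listed just above it (not populating inverse tables, locking complement tables) can be sketched together in Python. A complement table stores nothing of its own, derives its ranges from its master on every call, and refuses edits. The class and method names are hypothetical, not mktables' own.

```python
MAX_CODE_POINT = 0x10FFFF

class RangeTable:
    """Hypothetical table holding sorted, inclusive (start, end) ranges."""

    def __init__(self, ranges=None):
        self._ranges = sorted(ranges or [])

    def ranges(self):
        return list(self._ranges)

    def add_range(self, start, end):
        self._ranges.append((start, end))
        self._ranges.sort()

    def contains(self, code_point):
        # Goes through ranges(), so subclasses that override it are honored.
        return any(s <= code_point <= e for s, e in self.ranges())

class ComplementTable(RangeTable):
    """Never populated: its ranges are computed from the master's."""

    def __init__(self, master):
        super().__init__()
        self.master = master

    def ranges(self):
        # Recomputed on each call, so later changes to the master propagate.
        result, next_start = [], 0
        for start, end in self.master.ranges():
            if start > next_start:
                result.append((next_start, start - 1))
            next_start = max(next_start, end + 1)
        if next_start <= MAX_CODE_POINT:
            result.append((next_start, MAX_CODE_POINT))
        return result

    def add_range(self, start, end):
        # Locked: edits must be made to the master instead.
        raise TypeError("complement tables are read-only")
```

For example, `ComplementTable(RangeTable([(0x41, 0x5A)])).ranges()` yields `[(0, 0x40), (0x5B, 0x10FFFF)]`. Overriding `ranges()` in the subclass is what requires the base class to route everything through that method.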
Karl Williamson [Sun, 21 Aug 2011 20:54:05 +0000 (14:54 -0600)]
mktables: perluniprops change prop-val generation
This patch changes the variables slightly for generating the rhs of a
pod entry in perluniprops. This has no effect currently, but will be
useful in a future patch.
Karl Williamson [Sun, 21 Aug 2011 20:47:45 +0000 (14:47 -0600)]
perluniprops: Fix handling of null prop vals
Regexes don't match '\p{prop=}'. As a result, prior to this patch, if a
table's name was the null string it wasn't output in the pod file, even
if there were aliases that did have names. This patch fixes this so
only the empty names are suppressed. No current property handling is
affected by this bug, but it is in preparation for a later patch.
Karl Williamson [Sun, 21 Aug 2011 20:37:06 +0000 (14:37 -0600)]
mktables: White-space only
Indent a block that is now enclosed in an 'if' from a previous commit
Karl Williamson [Sun, 21 Aug 2011 20:33:27 +0000 (14:33 -0600)]
mktables: Don't look at complement tables for equivalences
The inserted comment explains why.
Karl Williamson [Sun, 21 Aug 2011 20:18:27 +0000 (14:18 -0600)]
mktables: White space-only
This just indents a block properly for the 'if' added in a previous
commit
Karl Williamson [Sun, 21 Aug 2011 20:16:21 +0000 (14:16 -0600)]
mktables: Add test to avoid undef
The scx property is not available in early Unicode versions. This adds
a test to avoid trying to use it if mktables is run on such a version
Karl Williamson [Sun, 21 Aug 2011 20:14:53 +0000 (14:14 -0600)]
mktables: Move declaration
This will be needed in a future commit