<TITLE>libstdc++-v3 HOWTO: Chapter 21</TITLE>
<LINK REL="home" HREF="http://sources.redhat.com/libstdc++/docs/21_strings/">
<LINK REL=StyleSheet HREF="../lib3styles.css">
-<!-- $Id: howto.html,v 1.2 2000/07/07 21:13:28 pme Exp $ -->
+<!-- $Id: howto.html,v 1.3 2000/07/11 21:45:07 pme Exp $ -->
</HEAD>
<BODY>
 are relying on special functions offered by the CString class.
</P>
<P>Things are not as bad as they seem. In
- <A HREF="http://egcs.cygnus.com/ml/egcs/1999-04/msg00233.html">this
+ <A HREF="http://gcc.gnu.org/ml/egcs/1999-04/msg00233.html">this
message</A>, Joe Buck points out a few very important things:
<UL>
<LI>The Standard <TT>string</TT> supports all the operations
#include <string>
#include <sstream>
- string f (string& incoming) // incoming is something like "foo N"
+ string f (string& incoming) // incoming is "foo N"
{
istringstream incoming_stream(incoming);
string the_word;
CString suffers from a common programming error that results in
poor performance. Consider the following code:
- CString n_copies_of (const CString& foo, unsigned n)
+ CString n_copies_of (const CString& foo, unsigned n)
{
CString tmp;
for (unsigned i = 0; i < n; i++)
</P>
<P>The solution is surprisingly easy. The original answer pages
- on the GotW website have been removed into cold storage, in
- preparation for a published book of GotW notes. Before being
+ on the GotW website were removed into cold storage, in
+ preparation for
+ <A HREF="http://cseng.aw.com/bookpage.taf?ISBN=0-201-61562-2">a
+ published book of GotW notes</A>. Before being
put on the web, of course, it was posted on Usenet, and that
posting containing the answer is <A HREF="gotw29a.txt">available
here</A>.
on why case-insensitive comparisons are not as easy as they seem,
and why creating a class is the <EM>wrong</EM> way to go about it in
 production code. (The GotW answer mentions one of the principal
- difficulties; this article mentions more.)
+ difficulties; his article mentions more.)
</P>
<P>Basically, this is "easy" only if you ignore some things,
things which may be too important to your program to ignore. (I chose
that nobody ever called me on it...) The GotW question and answer
remain useful instructional tools, however.
</P>
+ <P><B>Added September 2000:</B> James Kanze provided a link to a
+ <A HREF="http://www.unicode.org/unicode/reports/tr21/">Unicode
+ Technical Report discussing case handling</A>, which provides some
+ very good information.
+ </P>
<P>Return <A HREF="#top">to top of page</A> or
<A HREF="../faq/index.html">to the FAQ</A>.
</P>
a more general (but less readable) form of it for parsing command
strings and the like. If you compiled and ran this code using it:
<PRE>
- std::list<string> ls;
+ std::list<string> ls;
stringtok (ls, " this \t is\t\n a test ");
- for (std::list<string>::const_iterator i = ls.begin();
+ for (std::list<string>::const_iterator i = ls.begin();
i != ls.end(); ++i)
{
std::cerr << ':' << (*i) << ":\n";
<A HREF="stringtok_std_h.txt">Another version of stringtok is given
here</A>, suggested by Chris King and tweaked by Petr Prikryl,
and this one uses the
- transformation functions given below. If you are comfortable with
- reading the new function names, this version is recommended as an example.
+ transformation functions mentioned below. If you are comfortable
+ with reading the new function names, this version is recommended
+ as an example.
</P>
<P>Return <A HREF="#top">to top of page</A> or
<A HREF="../faq/index.html">to the FAQ</A>.
to all upper case." The word transformations is especially
apt, because the standard template function
<TT>transform<></TT> is used.
+ </P>
+ <P>This code will go through some iterations (no pun). Here's the
+ simplistic version usually seen on Usenet:
<PRE>
- #include <string>
- #include <algorithm>
- #include <cctype> // old <ctype.h>
- std::string s ("Some Kind Of Initial Input Goes Here");
-
- // Change everything into upper case
- std::transform (s.begin(), s.end(), s.begin(), toupper);
-
- // Change everything into lower case
- std::transform (s.begin(), s.end(), s.begin(), tolower);
-
- // Change everything back into upper case, but store the
- // result in a different string
- std::string capital_s;
- capital_s.reserve(s.size());
- std::transform (s.begin(), s.end(), capital_s.begin(), tolower); </PRE>
+ #include <string>
+ #include <algorithm>
+ #include <cctype> // old <ctype.h>
+
+ std::string s ("Some Kind Of Initial Input Goes Here");
+
+ // Change everything into upper case
+ std::transform (s.begin(), s.end(), s.begin(), toupper);
+
+ // Change everything into lower case
+ std::transform (s.begin(), s.end(), s.begin(), tolower);
+
+ // Change everything back into upper case, but store the
+ // result in a different string
+ std::string capital_s;
+ capital_s.resize(s.size());
+ std::transform (s.begin(), s.end(), capital_s.begin(), toupper); </PRE>
<SPAN CLASS="larger"><B>Note</B></SPAN> that these calls all involve
the global C locale through the use of the C functions
<TT>toupper/tolower</TT>. This is absolutely guaranteed to work --
- but only if you're using English text (bummer). A much better and
- more portable solution is to use a facet for a particular locale
- and call its conversion functions. (These are discussed more in
- Chapter 22.)
+ but <EM>only</EM> if the string contains <EM>only</EM> characters
+ from the basic source character set, and there are <EM>only</EM>
+ 96 of those. Which means that not even all English text can be
+ represented (certain British spellings, proper names, and so forth).
+ So, if all your input forevermore consists of only those 96
+ characters (hahahahahaha), then you're done.
+ </P>
+ <P>At minimum, you can write
+ </P>
+ <P>The correct method is to use a facet for a particular locale
+ and call its conversion functions. These are discussed more in
+ Chapter 22; the specific part is
+ <A HREF="../22_locale/howto.html#5">here</A>, which shows the
+ final version of this code. (Thanks to James Kanze for assistance
+ and suggestions on all of this.)
</P>
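 <P>As a rough illustration of calling a facet's conversion functions
 directly, here is a minimal sketch. The helper name
 <TT>upcase_with_facet</TT> is hypothetical and not part of the library;
 it uses the range form of <TT>ctype&lt;char&gt;::toupper</TT> and
 leaves the choice of locale to the caller.
 </P>

```cpp
#include <locale>
#include <string>

// Hypothetical helper (an assumption for illustration, not library code):
// returns a copy of s upper-cased according to the ctype facet of loc.
std::string upcase_with_facet (std::string s, const std::locale& loc)
{
    // Fetch the character-classification facet from the locale.
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);

    // The range form converts every character in [low, high) in place.
    if (!s.empty())
        ct.toupper (&s[0], &s[0] + s.size());
    return s;
}
```

 <P>Calling <TT>upcase_with_facet (s, std::locale::classic())</TT> behaves
 like the C-locale version above; passing a different locale changes the
 mapping without changing the calling code.
 </P>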
<P>Another common operation is trimming off excess whitespace. Much
like transformations, this task is trivial with the use of string's
Comments and suggestions are welcome, and may be sent to
<A HREF="mailto:pme@sources.redhat.com">Phil Edwards</A> or
<A HREF="mailto:gdr@egcs.cygnus.com">Gabriel Dos Reis</A>.
-<BR> $Id: howto.html,v 1.2 2000/07/07 21:13:28 pme Exp $
+<BR> $Id: howto.html,v 1.3 2000/07/11 21:45:07 pme Exp $
</EM></P>
<TITLE>libstdc++-v3 HOWTO: Chapter 22</TITLE>
<LINK REL="home" HREF="http://sources.redhat.com/libstdc++/docs/22_locale/">
<LINK REL=StyleSheet HREF="../lib3styles.css">
-<!-- $Id: howto.html,v 1.3 2000/08/25 08:52:56 bkoz Exp $ -->
+<!-- $Id: howto.html,v 1.4 2000/08/31 01:17:53 bkoz Exp $ -->
</HEAD>
<BODY>
<LI><A HREF="#2">Nathan Myers on Locales</A>
<LI><A HREF="#3">codecvt</A>
<LI><A HREF="#4">ctype</A>
+ <LI><A HREF="#5">Correct Transformations</A>
</UL>
<HR>
Programming Language (3rd Edition)</A>. It is a detailed
description of locales and how to use them.
</P>
+ <P>He also writes:
+ <BLOCKQUOTE><EM>
+ Please note that I still consider this detailed description of
+ locales beyond the needs of most C++ programmers. It is written
+ with experienced programmers in mind and novices will do best to
+ avoid it.
+ </EM></BLOCKQUOTE>
+ </P>
<P>Return <A HREF="#top">to top of page</A> or
<A HREF="../faq/index.html">to the FAQ</A>.
</P>
<A HREF="../faq/index.html">to the FAQ</A>.
</P>
+<HR>
+<H2><A NAME="5">Correct Transformations</A></H2>
+ <!-- Jumping directly here from chapter 21. -->
+ <P>A very common question on newsgroups and mailing lists is, "How
+ do I do <foo> to a character string?" where <foo> is
+ a task such as changing all the letters to uppercase, to lowercase,
+ testing for digits, etc. A skilled and conscientious programmer
+ will follow the question with another, "And how do I make the
+ code portable?"
+ </P>
+ <P>(Poor innocent programmer, you have no idea the depths of trouble
+ you are getting yourself into. 'Twould be best for your sanity if
+ you dropped the whole idea and took up basket weaving instead. No?
+ Fine, you asked for it...)
+ </P>
+ <P>Changing the case of a letter, or classifying a character as
+ numeric, graphical, etc., depends entirely on the cultural context of the
+ program at runtime. So, first you must take the portability question
+ into account. Once you have localized the program to a particular
+ natural language, only then can you perform the specific task.
+ Unfortunately, specializing a function for a human language is not
+ as simple as declaring
+ <TT> extern "Danish" int tolower (int); </TT>.
+ </P>
+ <P>The C++ code to do all this proceeds in the same way. First, a locale
+ is created. Then member functions of that locale are called to
+ perform minor tasks. Continuing the example from Chapter 21, we wish
+ to use the following convenience functions:
+ <PRE>
+ namespace std {
+ template <class charT>
+ charT
+ toupper (charT c, const locale& loc);
+ template <class charT>
+ charT
+ tolower (charT c, const locale& loc);
+ }</PRE>
+ Each of these functions extracts the appropriate "facet" from the
+ locale <EM>loc</EM> and calls the appropriate member function of that
+ facet, passing <EM>c</EM> as its argument. The resulting character
+ is returned.
+ </P>
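 <P>To make the facet lookup concrete, the sketch below spells out what
 the convenience functions do. The names <TT>facet_toupper</TT> and
 <TT>facet_tolower</TT> are hypothetical stand-ins chosen for this
 illustration; real code would simply call <TT>std::toupper</TT> and
 <TT>std::tolower</TT> with a locale argument.
 </P>

```cpp
#include <locale>

// Hypothetical stand-ins (names are assumptions for illustration) showing
// the lookup performed by std::toupper(c, loc) and std::tolower(c, loc).
template <class charT>
charT facet_toupper (charT c, const std::locale& loc)
{
    // Extract the ctype facet for this character type, then convert.
    return std::use_facet<std::ctype<charT> >(loc).toupper (c);
}

template <class charT>
charT facet_tolower (charT c, const std::locale& loc)
{
    return std::use_facet<std::ctype<charT> >(loc).tolower (c);
}
```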
+ <P>For the C/POSIX locale, the results are the same as calling the
+ classic C <TT>toupper/tolower</TT> function that was used in previous
+ examples. For other locales, the code should Do The Right Thing.
+ </P>
+ <P>Of course, these functions take a second argument, and the
+ transformation algorithm's operator argument can only take a single
+ parameter. So we write simple wrapper structs to handle that.
+ </P>
+ <P>The next-to-final version of the code started in Chapter 21 looks like:
+ <PRE>
+ #include <iterator> // for back_inserter
+ #include <locale>
+ #include <string>
+ #include <algorithm>
+ #include <cctype> // old <ctype.h>
+
+ struct Toupper
+ {
+ Toupper (std::locale const& l) : loc(l) {;}
+ char operator() (char c) const { return std::toupper(c,loc); }
+ private:
+ std::locale loc; // held by value, so a temporary locale argument is safe
+ };
+
+ struct Tolower
+ {
+ Tolower (std::locale const& l) : loc(l) {;}
+ char operator() (char c) const { return std::tolower(c,loc); }
+ private:
+ std::locale loc; // held by value, so a temporary locale argument is safe
+ };
+
+ int main ()
+ {
+ std::string s ("Some Kind Of Initial Input Goes Here");
+ Toupper up ( std::locale("C") );
+ Tolower down ( std::locale("C") );
+
+ // Change everything into upper case
+ std::transform (s.begin(), s.end(), s.begin(),
+ up
+ );
+
+ // Change everything into lower case
+ std::transform (s.begin(), s.end(), s.begin(),
+ down
+ );
+
+ // Change everything back into upper case, but store the
+ // result in a different string
+ std::string capital_s;
+ std::transform (s.begin(), s.end(), std::back_inserter(capital_s),
+ up
+ );
+ }</PRE>
+ </P>
+ <P>The final version of the code uses <TT>bind2nd</TT> to eliminate
+ the wrapper structs, but the resulting code is tricky. I have not
+ shown it here because no compilers currently available to me will
+ handle it.
+ </P>
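 <P>This is not the <TT>bind2nd</TT> version. As a different way of
 dropping the wrapper structs, here is a minimal sketch that assumes a
 compiler with lambda support, a language feature that postdates this
 text. The helper name <TT>to_upper_copy</TT> is hypothetical.
 </P>

```cpp
#include <algorithm>
#include <iterator>   // for back_inserter
#include <locale>
#include <string>

// Hypothetical helper (an assumption for illustration): returns an
// upper-cased copy of s under the given locale, with no wrapper struct.
std::string to_upper_copy (const std::string& s, const std::locale& loc)
{
    std::string result;
    result.reserve (s.size());

    // The lambda captures the locale and supplies the one-argument
    // callable that std::transform expects.
    std::transform (s.begin(), s.end(), std::back_inserter (result),
                    [&loc] (char c) { return std::toupper (c, loc); });
    return result;
}
```

 <P>The lambda plays the same role as the <TT>Toupper</TT> struct above:
 it binds the locale so that the algorithm sees a function of one
 character.
 </P>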
+ <P>Return <A HREF="#top">to top of page</A> or
+ <A HREF="../faq/index.html">to the FAQ</A>.
+ </P>
+
+
<!-- ####################################################### -->
Comments and suggestions are welcome, and may be sent to
<A HREF="mailto:pme@sources.redhat.com">Phil Edwards</A> or
<A HREF="mailto:gdr@egcs.cygnus.com">Gabriel Dos Reis</A>.
-<BR> $Id: howto.html,v 1.3 2000/08/25 08:52:56 bkoz Exp $
+<BR> $Id: howto.html,v 1.4 2000/08/31 01:17:53 bkoz Exp $
</EM></P>