Document shm_open.

diff --git a/manual/charset.texi b/manual/charset.texi

index 5063246..4042639 100644
--- a/manual/charset.texi
+++ b/manual/charset.texi
@@ -112,7 +112,7 @@ this type is capable of storing all elements of the basic character set.
  Therefore it would be legitimate to define @code{wchar_t} as @code{char},
  which might make sense for embedded systems.
  
-But for GNU systems @code{wchar_t} is always 32 bits wide and, therefore,
+But in @theglibc{} @code{wchar_t} is always 32 bits wide and, therefore,
  capable of representing all UCS-4 values and, therefore, covering all of
  @w{ISO 10646}.  Some Unix systems define @code{wchar_t} as a 16-bit type
  and thereby follow Unicode very strictly.  This definition is perfectly
@@ -204,10 +204,10 @@ defined in @file{wchar.h}.
  
  These internal representations present problems when it comes to storing
  and transmittal.  Because each single wide character consists of more
-than one byte, they are effected by byte-ordering.  Thus, machines with
+than one byte, they are affected by byte-ordering.  Thus, machines with
  different endianesses would see different values when accessing the same
  data.  This byte ordering concern also applies for communication protocols
-that are all byte-based and, thereforet require that the sender has to
+that are all byte-based and therefore require that the sender has to
  decide about splitting the wide character in bytes.  A last (but not least
  important) point is that wide characters often require more storage space
  than a customized byte-oriented character set.
@@ -225,7 +225,7 @@ fulfill one requirement: they are "filesystem safe."  This means that
  the character @code{'/'} is used in the encoding @emph{only} to
  represent itself.  Things are a bit different for character sets like
  EBCDIC (Extended Binary Coded Decimal Interchange Code, a character set
-family used by IBM), but if the operation system does not understand
+family used by IBM), but if the operating system does not understand
  EBCDIC directly the parameters-to-system calls have to be converted
  first anyhow.
  
@@ -257,7 +257,7 @@ state changes that cover more than the next character.  This has the
  big advantage that whenever one can identify the beginning of the byte
  sequence of a character one can interpret a text correctly.  Examples of
  character sets using this policy are the various EUC character sets
-(used by Sun's operations systems, EUC-JP, EUC-KR, EUC-TW, and EUC-CN)
+(used by Sun's operating systems, EUC-JP, EUC-KR, EUC-TW, and EUC-CN)
  or Shift_JIS (SJIS, a Japanese encoding).
  
  But there are also character sets using a state that is valid for more
@@ -361,7 +361,7 @@ the @code{LC_CTYPE} category of the current locale is used; see
  The functions handling more than one character at a time require NUL
  terminated strings as the argument (i.e., converting blocks of text
  does not work unless one can add a NUL byte at an appropriate place).
-The GNU C library contains some extensions to the standard that allow
+@Theglibc{} contains some extensions to the standard that allow
  specifying a size, but basically they also expect terminated strings.
  @end itemize
  
@@ -393,7 +393,7 @@ We already said above that the currently selected locale for the
  by the functions we are about to describe.  Each locale uses its own
  character set (given as an argument to @code{localedef}) and this is the
  one assumed as the external multibyte encoding.  The wide character
-character set always is UCS-4, at least on GNU systems.
+set is always UCS-4 in @theglibc{}.
  
  A characteristic of each multibyte character set is the maximum number
  of bytes that can be necessary to represent one character.  This
@@ -418,7 +418,7 @@ a compile-time constant and is defined in @file{limits.h}.
  maximum number of bytes in a multibyte character in the current locale.
  The value is never greater than @code{MB_LEN_MAX}.  Unlike
  @code{MB_LEN_MAX} this macro need not be a compile-time constant, and in
-the GNU C library it is not.
+@theglibc{} it is not.
  
  @pindex stdlib.h
  @code{MB_CUR_MAX} is defined in @file{stdlib.h}.
@@ -537,8 +537,8 @@ Code using @code{mbsinit} often looks similar to this:
  
  The code to emit the escape sequence to get back to the initial state is
  interesting.  The @code{wcsrtombs} function can be used to determine the
-necessary output code (@pxref{Converting Strings}).  Please note that on
-GNU systems it is not necessary to perform this extra action for the
+necessary output code (@pxref{Converting Strings}).  Please note that with
+@theglibc{} it is not necessary to perform this extra action for the
  conversion from multibyte text to wide character text since the wide
  character encoding is not stateful.  But there is nothing mentioned in
  any standard that prohibits making @code{wchar_t} using a stateful
@@ -577,8 +577,8 @@ The @code{btowc} function was introduced in @w{Amendment 1} to @w{ISO C90}
  and is declared in @file{wchar.h}.
  @end deftypefun
  
-Despite the limitation that the single byte value always is interpreted
-in the initial state this function is actually useful most of the time.
+Despite the limitation that the single byte value is always interpreted
+in the initial state, this function is actually useful most of the time.
  Most characters are either entirely single-byte character sets or they
  are extension to ASCII.  But then it is possible to write code like this
  (not that this specific example is very useful):
@@ -607,10 +607,10 @@ that there is no guarantee that one can perform this kind of arithmetic
  on the character of the character set used for @code{wchar_t}
  representation.  In other situations the bytes are not constant at
  compile time and so the compiler cannot do the work.  In situations like
-this it is necessary @code{btowc}.
+this, using @code{btowc} is required.
  
  @noindent
-There also is a function for the conversion in the other direction.
+There is also a function for the conversion in the other direction.
  
  @comment wchar.h
  @comment ISO
@@ -737,7 +737,7 @@ the return value is @math{0}.  If the next @var{n} bytes form a valid
  multibyte character, the number of bytes belonging to this multibyte
  character byte sequence is returned.
  
-If the the first @var{n} bytes possibly form a valid multibyte
+If the first @var{n} bytes possibly form a valid multibyte
  character but the character is incomplete, the return value is
  @code{(size_t) -2}.  Otherwise the multibyte character sequence is invalid
  and the return value is @code{(size_t) -1}.
@@ -786,14 +786,14 @@ mbslen (const char *s)
  This function simply calls @code{mbrlen} for each multibyte character
  in the string and counts the number of function calls.  Please note that
  we here use @code{MB_LEN_MAX} as the size argument in the @code{mbrlen}
-call.  This is acceptable since a) this value is larger then the length of
+call.  This is acceptable since a) this value is larger than the length of
  the longest multibyte character sequence and b) we know that the string
  @var{s} ends with a NUL byte, which cannot be part of any other multibyte
  character sequence but the one representing the NUL wide character.
  Therefore, the @code{mbrlen} function will never read invalid memory.
  
  Now that this function is available (just to make this clear, this
-function is @emph{not} part of the GNU C library) we can compute the
+function is @emph{not} part of @theglibc{}) we can compute the
  number of wide character required to store the converted multibyte
  character string @var{s} using
  
@@ -949,7 +949,7 @@ The functions described in the previous section only convert a single
  character at a time.  Most operations to be performed in real-world
  programs include strings and therefore the @w{ISO C} standard also
  defines conversions on entire strings.  However, the defined set of
-functions is quite limited; therefore, the GNU C library contains a few
+functions is quite limited; therefore, @theglibc{} contains a few
  extensions that can help in some important situations.
  
  @comment wchar.h
@@ -1030,7 +1030,7 @@ therefore, should never be used in generally used code.
  
  The generic conversion interface (@pxref{Generic Charset Conversion})
  does not have this limitation (it simply works on buffers, not
-strings), and the GNU C library contains a set of functions that take
+strings), and @theglibc{} contains a set of functions that take
  additional parameters specifying the maximal number of bytes that are
  consumed from the input string.  This way the problem of
  @code{mbsrtowcs}'s example above could be solved by determining the line
@@ -1234,7 +1234,7 @@ file_mbsrtowcs (int input, int output)
        /* @r{If any characters must be carried forward,}
           @r{put them at the beginning of @code{buffer}.} */
        if (filled > 0)
-        memmove (inp, buffer, filled);
+        memmove (buffer, inp, filled);
      @}
  
    return 1;
@@ -1528,8 +1528,8 @@ The conversion functions mentioned so far in this chapter all had in
  common that they operate on character sets that are not directly
  specified by the functions.  The multibyte encoding used is specified by
  the currently selected locale for the @code{LC_CTYPE} category.  The
-wide character set is fixed by the implementation (in the case of GNU C
-library it is always UCS-4 encoded @w{ISO 10646}.
+wide character set is fixed by the implementation (in the case of @theglibc{}
+it is always UCS-4 encoded @w{ISO 10646}).
  
  This has of course several problems when it comes to general character
  conversion:
@@ -1541,7 +1541,7 @@ character set is the character set of the locale for the @code{LC_CTYPE}
  category, one has to change the @code{LC_CTYPE} locale using
  @code{setlocale}.
  
-Changing the @code{LC_TYPE} locale introduces major problems for the rest
+Changing the @code{LC_CTYPE} locale introduces major problems for the rest
  of the programs since several more functions (e.g., the character
  classification functions, @pxref{Classification of Characters}) use the
  @code{LC_CTYPE} category.
@@ -1648,7 +1648,7 @@ An @code{iconv} descriptor is like a file descriptor as for every use a
  new descriptor must be created.  The descriptor does not stand for all
  of the conversions from @var{fromset} to @var{toset}.
  
-The GNU C library implementation of @code{iconv_open} has one
+The @glibcadj{} implementation of @code{iconv_open} has one
  significant extension to other implementations.  To ease the extension
  of the set of available conversions, the implementation allows storing
  the necessary files with data and code in an arbitrary number of
@@ -1740,7 +1740,7 @@ from the initial state.  It is important that the programmer never makes
  any assumption as to whether the conversion has to deal with states.
  Even if the input and output character sets are not stateful, the
  implementation might still have to keep states.  This is due to the
-implementation chosen for the GNU C library as it is described below.
+implementation chosen for @theglibc{} as it is described below.
  Therefore an @code{iconv} call to reset the state should always be
  performed if some protocol requires this for the output text.
  
@@ -1761,7 +1761,7 @@ Since the character sets selected in the @code{iconv_open} call can be
  almost arbitrary, there can be situations where the input buffer contains
  valid characters, which have no identical representation in the output
  character set.  The behavior in this situation is undefined.  The
-@emph{current} behavior of the GNU C library in this situation is to
+@emph{current} behavior of @theglibc{} in this situation is to
  return with an error immediately.  This certainly is not the most
  desirable solution; therefore, future versions will provide better ones,
  but they are not yet finished.
@@ -1980,7 +1980,7 @@ the door open for extensions and improvements, but this design is also
  limiting on some platforms since not many platforms support dynamic
  loading in statically linked programs.  On platforms without this
  capability it is therefore not possible to use this interface in
-statically linked programs.  The GNU C library has, on ELF platforms, no
+statically linked programs.  @Theglibc{} has, on ELF platforms, no
  problems with dynamic loading in these situations; therefore, this
  point is moot.  The danger is that one gets acquainted with this
  situation and forgets about the restrictions on other systems.
@@ -2054,38 +2054,38 @@ such conversion, one could make sure this also is true for indirect
  routes.
  
  @node glibc iconv Implementation
-@subsection The @code{iconv} Implementation in the GNU C library
+@subsection The @code{iconv} Implementation in @theglibc{}
  
  After reading about the problems of @code{iconv} implementations in the
  last section it is certainly good to note that the implementation in
-the GNU C library has none of the problems mentioned above.  What
+@theglibc{} has none of the problems mentioned above.  What
  follows is a step-by-step analysis of the points raised above.  The
  evaluation is based on the current state of the development (as of
  January 1999).  The development of the @code{iconv} functions is not
  complete, but basic functionality has solidified.
  
-The GNU C library's @code{iconv} implementation uses shared loadable
+@Theglibc{}'s @code{iconv} implementation uses shared loadable
  modules to implement the conversions.  A very small number of
  conversions are built into the library itself but these are only rather
  trivial conversions.
  
-All the benefits of loadable modules are available in the GNU C library
+All the benefits of loadable modules are available in the @glibcadj{}
  implementation.  This is especially appealing since the interface is
  well documented (see below), and it, therefore, is easy to write new
  conversion modules.  The drawback of using loadable objects is not a
-problem in the GNU C library, at least on ELF systems.  Since the
+problem in @theglibc{}, at least on ELF systems.  Since the
  library is able to load shared objects even in statically linked
  binaries, static linking need not be forbidden in case one wants to use
  @code{iconv}.
  
  The second mentioned problem is the number of supported conversions.
-Currently, the GNU C library supports more than 150 character sets.  The
+Currently, @theglibc{} supports more than 150 character sets.  The
  way the implementation is designed the number of supported conversions
  is greater than 22350 (@math{150} times @math{149}).  If any conversion
  from or to a character set is missing, it can be added easily.
  
  Particularly impressive as it may be, this high number is due to the
-fact that the GNU C library implementation of @code{iconv} does not have
+fact that the @glibcadj{} implementation of @code{iconv} does not have
  the third problem mentioned above (i.e., whenever there is a conversion
  from a character set @math{@cal{A}} to @math{@cal{B}} and from
  @math{@cal{B}} to @math{@cal{C}} it is always possible to convert from
@@ -2115,7 +2115,7 @@ the input to @w{ISO 10646} first.  The two character sets of interest
  are much more similar to each other than to @w{ISO 10646}.
  
  In such a situation one easily can write a new conversion and provide it
-as a better alternative.  The GNU C library @code{iconv} implementation
+as a better alternative.  The @glibcadj{} @code{iconv} implementation
  would automatically use the module implementing the conversion if it is
  specified to be more efficient.
  
@@ -2207,7 +2207,7 @@ file, however, specifies that the new conversion modules can perform this
  conversion with only the cost of @math{1}.
  
  A mysterious item about the @file{gconv-modules} file above (and also
-the file coming with the GNU C library) are the names of the character
+the file coming with @theglibc{}) are the names of the character
  sets specified in the @code{module} lines.  Why do almost all the names
  end in @code{//}?  And this is not all: the names can actually be
  regular expressions.  At this point in time this mystery should not be
@@ -2225,13 +2225,13 @@ become clear that this is the name for the representation used in the
  intermediate step of the triangulation.  We have said that this is UCS-4
  but actually that is not quite right.  The UCS-4 specification also
  includes the specification of the byte ordering used.  Since a UCS-4 value
-consists of four bytes, a stored value is effected by byte ordering.  The
+consists of four bytes, a stored value is affected by byte ordering.  The
  internal representation is @emph{not} the same as UCS-4 in case the byte
  ordering of the processor (or at least the running process) is not the
  same as the one required for UCS-4.  This is done for performance reasons
  as one does not want to perform unnecessary byte-swapping operations if
  one is not interested in actually seeing the result in UCS-4.  To avoid
-trouble with endianess, the internal representation consistently is named
+trouble with endianness, the internal representation consistently is named
  @code{INTERNAL} even on big-endian systems where the representations are
  identical.
  
@@ -2423,7 +2423,7 @@ loads the objects with the conversions.
  It is often the case that one conversion is used more than once (i.e.,
  there are several @code{iconv_open} calls for the same set of character
  sets during one program run).  The @code{mbsrtowcs} et.al.@: functions in
-the GNU C library also use the @code{iconv} functionality, which
+@theglibc{} also use the @code{iconv} functionality, which
  increases the number of uses of the same functions even more.
  
  Because of this multiple use of conversions, the modules do not get
@@ -2888,8 +2888,8 @@ gconv (struct __gconv_step *step, struct __gconv_step_data *data,
  @end deftypevr
  
  This information should be sufficient to write new modules.  Anybody
-doing so should also take a look at the available source code in the GNU
-C library sources.  It contains many examples of working and optimized
+doing so should also take a look at the available source code in the
+@glibcadj{} sources.  It contains many examples of working and optimized
  modules.
  
  @c File charset.texi edited October 2001 by Dennis Grace, IBM Corporation
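
The one code change in this patch, the @code{memmove} call in the @code{file_mbsrtowcs} example, swaps the first two arguments so that the unconverted tail is copied to the start of the buffer rather than the reverse. The following is a minimal sketch of that carry-forward step in isolation. The helper name carry_forward and its signature are hypothetical and chosen only for illustration; they do not appear in the manual or in glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of glibc): after a conversion pass has
   consumed everything before INP, move the FILLED leftover bytes that
   start at INP down to the beginning of BUFFER.  Returns the offset at
   which the next read should append new input.  */
static size_t
carry_forward (char *buffer, const char *inp, size_t filled)
{
  /* INP is assumed to point into BUFFER, at or after its start.  */
  assert (inp >= buffer);

  if (filled > 0)
    /* Destination first, source second.  memmove is used instead of
       memcpy because the two regions may overlap.  */
    memmove (buffer, inp, filled);

  return filled;
}
```

With the arguments in the original order, memmove (inp, buffer, filled), the already-converted bytes at the start of the buffer would overwrite the pending tail, so the next conversion pass would see stale input.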