From b483a630cbaddd8d3b4d1bd3bea646eb0d6fede1 Mon Sep 17 00:00:00 2001
From: sandra <sandra@138bc75d-0d04-0410-961f-82ee72b054a4>
Date: Sun, 1 Feb 2015 02:11:30 +0000
Subject: [PATCH] 2015-01-31  Sandra Loosemore  <sandra@codesourcery.com>

	gcc/
	* doc/md.texi (Machine Constraints): Alphabetize table by target.
	* doc/extend.texi (x86 Variable Attributes): Move section to
	correct alphabetization after renaming.
	(x86 Type Attributes): Likewise.
	(Target Builtins): Re-alphabetize menu.
	(x86 Built-in Functions): Move section to correct alphabetization
	after renaming.
	(x86 transactional memory intrinsics): Likewise.
	* doc/invoke.texi (Option Summary): Re-alphabetize x86 Options
	and x86 Windows Options in table and menu.
	(x86 Options): Move section to correct alphabetization after
	renaming.
	(x86 Windows Options): Likewise.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@220315 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog       |    16 +
 gcc/doc/extend.texi | 10230 ++++++++++++++++++++--------------------
 gcc/doc/invoke.texi | 12574 +++++++++++++++++++++++++-------------------------
 gcc/doc/md.texi     |  1587 +++----
 4 files changed, 12212 insertions(+), 12195 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7c06f05..0618d83 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,21 @@
 2015-01-31  Sandra Loosemore  <sandra@codesourcery.com>
 
+	* doc/md.texi (Machine Constraints): Alphabetize table by target.
+	* doc/extend.texi (x86 Variable Attributes): Move section to
+	correct alphabetization after renaming.
+	(x86 Type Attributes): Likewise.
+	(Target Builtins): Re-alphabetize menu.
+	(x86 Built-in Functions): Move section to correct alphabetization
+	after renaming.
+	(x86 transactional memory intrinsics): Likewise.
+	* doc/invoke.texi (Option Summary): Re-alphabetize x86 Options
+	and x86 Windows Options in table and menu.
+	(x86 Options): Move section to correct alphabetization after
+	renaming.
+	(x86 Windows Options): Likewise.
+
+2015-01-31  Sandra Loosemore  <sandra@codesourcery.com>
+
 	* doc/extend.texi: Use "x86", "x86-32", and "x86-64" as the
 	preferred names of the architecture and its 32- and 64-bit
 	variants.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 681812e..1806850 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -5521,6 +5521,23 @@ int cpu_clock __attribute__((cb(0x123)));
 
 @end table
 
+@subsection PowerPC Variable Attributes
+
+Three attributes currently are defined for PowerPC configurations:
+@code{altivec}, @code{ms_struct} and @code{gcc_struct}.
+
+For full documentation of the struct attributes please see the
+documentation in @ref{x86 Variable Attributes}.
+
+For documentation of the @code{altivec} attribute please see the
+documentation in @ref{PowerPC Type Attributes}.
+
+@subsection SPU Variable Attributes
+
+The SPU supports the @code{spu_vector} attribute for variables.  For
+documentation of this attribute please see the documentation in
+@ref{SPU Type Attributes}.
+
 @anchor{x86 Variable Attributes}
 @subsection x86 Variable Attributes
 
@@ -5659,23 +5676,6 @@ Here, @code{t5} takes up 2 bytes.
 @end enumerate
 @end table
 
-@subsection PowerPC Variable Attributes
-
-Three attributes currently are defined for PowerPC configurations:
-@code{altivec}, @code{ms_struct} and @code{gcc_struct}.
-
-For full documentation of the struct attributes please see the
-documentation in @ref{x86 Variable Attributes}.
-
-For documentation of @code{altivec} attribute please see the
-documentation in @ref{PowerPC Type Attributes}.
-
-@subsection SPU Variable Attributes
-
-The SPU supports the @code{spu_vector} attribute for variables.  For
-documentation of this attribute please see the documentation in
-@ref{SPU Type Attributes}.
-
 @subsection Xstormy16 Variable Attributes
 
 One attribute is currently defined for xstormy16 configurations:
@@ -6078,30 +6078,6 @@ Specifically, the @code{based}, @code{tiny}, @code{near}, and
 @code{far} attributes may be applied to either.  The @code{io} and
 @code{cb} attributes may not be applied to types.
 
-@anchor{x86 Type Attributes}
-@subsection x86 Type Attributes
-
-Two attributes are currently defined for x86 configurations:
-@code{ms_struct} and @code{gcc_struct}.
-
-@table @code
-
-@item ms_struct
-@itemx gcc_struct
-@cindex @code{ms_struct}
-@cindex @code{gcc_struct}
-
-If @code{packed} is used on a structure, or if bit-fields are used
-it may be that the Microsoft ABI packs them differently
-than GCC normally packs them.  Particularly when moving packed
-data between functions compiled with GCC and the native Microsoft compiler
-(either via function call or as data in a file), it may be necessary to access
-either format.
-
-Currently @option{-m[no-]ms-bitfields} is provided for the Microsoft Windows x86
-compilers to match the native Microsoft compiler.
-@end table
-
 @anchor{PowerPC Type Attributes}
 @subsection PowerPC Type Attributes
 
@@ -6134,6 +6110,30 @@ allows one to declare vector data types supported by the Sony/Toshiba/IBM SPU
 Language Extensions Specification.  It is intended to support the
 @code{__vector} keyword.
 
+@anchor{x86 Type Attributes}
+@subsection x86 Type Attributes
+
+Two attributes are currently defined for x86 configurations:
+@code{ms_struct} and @code{gcc_struct}.
+
+@table @code
+
+@item ms_struct
+@itemx gcc_struct
+@cindex @code{ms_struct}
+@cindex @code{gcc_struct}
+
+If @code{packed} is used on a structure, or if bit-fields are used,
+it may be that the Microsoft ABI packs them differently
+than GCC normally packs them.  Particularly when moving packed
+data between functions compiled with GCC and the native Microsoft compiler
+(either via function call or as data in a file), it may be necessary to access
+either format.
+
+Currently @option{-m[no-]ms-bitfields} is provided for the Microsoft Windows x86
+compilers to match the native Microsoft compiler.
+@end table
+
 @node Alignment
 @section Inquiring on Alignment of Types or Variables
 @cindex alignment
@@ -10113,8 +10113,6 @@ instructions, but allow the compiler to schedule those calls.
 * AVR Built-in Functions::
 * Blackfin Built-in Functions::
 * FR-V Built-in Functions::
-* x86 Built-in Functions::
-* x86 transactional memory intrinsics::
 * MIPS DSP Built-in Functions::
 * MIPS Paired-Single Support::
 * MIPS Loongson Built-in Functions::
@@ -10133,6 +10131,8 @@ instructions, but allow the compiler to schedule those calls.
 * TI C6X Built-in Functions::
 * TILE-Gx Built-in Functions::
 * TILEPro Built-in Functions::
+* x86 Built-in Functions::
+* x86 transactional memory intrinsics::
 @end menu
 
 @node AArch64 Built-in Functions
@@ -11484,5787 +11484,5787 @@ Use the @code{nldub} instruction to load the contents of address @var{x}
 into the data cache.  The instruction is issued in slot I1@.
 @end table
 
-@node x86 Built-in Functions
-@subsection x86 Built-in Functions
+@node MIPS DSP Built-in Functions
+@subsection MIPS DSP Built-in Functions
 
-These built-in functions are available for the x86-32 and x86-64 family
-of computers, depending on the command-line switches used.
+The MIPS DSP Application-Specific Extension (ASE) includes new
+instructions that are designed to improve the performance of DSP and
+media applications.  It provides instructions that operate on packed
+8-bit/16-bit integer data, Q7, Q15 and Q31 fractional data.
 
-If you specify command-line switches such as @option{-msse},
-the compiler could use the extended instruction sets even if the built-ins
-are not used explicitly in the program.  For this reason, applications
-that perform run-time CPU detection must compile separate files for each
-supported architecture, using the appropriate flags.  In particular,
-the file containing the CPU detection code should be compiled without
-these options.
+GCC supports MIPS DSP operations using both the generic
+vector extensions (@pxref{Vector Extensions}) and a collection of
+MIPS-specific built-in functions.  Both kinds of support are
+enabled by the @option{-mdsp} command-line option.
 
-The following machine modes are available for use with MMX built-in functions
-(@pxref{Vector Extensions}): @code{V2SI} for a vector of two 32-bit integers,
-@code{V4HI} for a vector of four 16-bit integers, and @code{V8QI} for a
-vector of eight 8-bit integers.  Some of the built-in functions operate on
-MMX registers as a whole 64-bit entity, these use @code{V1DI} as their mode.
+Revision 2 of the ASE was introduced in the second half of 2006.
+This revision adds extra instructions to the original ASE, but is
+otherwise backwards-compatible with it.  You can select revision 2
+using the command-line option @option{-mdspr2}; this option implies
+@option{-mdsp}.
 
-If 3DNow!@: extensions are enabled, @code{V2SF} is used as a mode for a vector
-of two 32-bit floating-point values.
+The SCOUNT and POS bits of the DSP control register are global.  The
+WRDSP, EXTPDP, EXTPDPV and MTHLIP instructions modify the SCOUNT and
+POS bits.  During optimization, the compiler does not delete these
+instructions and it does not delete calls to functions containing
+these instructions.
 
-If SSE extensions are enabled, @code{V4SF} is used for a vector of four 32-bit
-floating-point values.  Some instructions use a vector of four 32-bit
-integers, these use @code{V4SI}.  Finally, some instructions operate on an
-entire vector register, interpreting it as a 128-bit integer, these use mode
-@code{TI}.
+At present, GCC only provides support for operations on 32-bit
+vectors.  The vector type associated with 8-bit integer data is
+usually called @code{v4i8}, the vector type associated with Q7
+is usually called @code{v4q7}, the vector type associated with 16-bit
+integer data is usually called @code{v2i16}, and the vector type
+associated with Q15 is usually called @code{v2q15}.  They can be
+defined in C as follows:
 
-In 64-bit mode, the x86-64 family of processors uses additional built-in
-functions for efficient use of @code{TF} (@code{__float128}) 128-bit
-floating point and @code{TC} 128-bit complex floating-point values.
+@smallexample
+typedef signed char v4i8 __attribute__ ((vector_size(4)));
+typedef signed char v4q7 __attribute__ ((vector_size(4)));
+typedef short v2i16 __attribute__ ((vector_size(4)));
+typedef short v2q15 __attribute__ ((vector_size(4)));
+@end smallexample
 
-The following floating-point built-in functions are available in 64-bit
-mode.  All of them implement the function that is part of the name.
+@code{v4i8}, @code{v4q7}, @code{v2i16} and @code{v2q15} values are
+initialized in the same way as aggregates.  For example:
 
 @smallexample
-__float128 __builtin_fabsq (__float128)
-__float128 __builtin_copysignq (__float128, __float128)
+v4i8 a = @{1, 2, 3, 4@};
+v4i8 b;
+b = (v4i8) @{5, 6, 7, 8@};
+
+v2q15 c = @{0x0fcb, 0x3a75@};
+v2q15 d;
+d = (v2q15) @{0.1234 * 0x1.0p15, 0.4567 * 0x1.0p15@};
 @end smallexample
 
-The following built-in function is always available.
+@emph{Note:} The CPU's endianness determines the order in which values
+are packed.  On little-endian targets, the first value is the least
+significant and the last value is the most significant.  The opposite
+order applies to big-endian targets.  For example, the code above
+sets the lowest byte of @code{a} to @code{1} on little-endian targets
+and @code{4} on big-endian targets.
 
-@table @code
-@item void __builtin_ia32_pause (void)
-Generates the @code{pause} machine instruction with a compiler memory
-barrier.
-@end table
+@emph{Note:} Q7, Q15 and Q31 values must be initialized with their integer
+representation.  As shown in this example, the integer representation
+of a Q7 value can be obtained by multiplying the fractional value by
+@code{0x1.0p7}.  The equivalent for Q15 values is to multiply by
+@code{0x1.0p15}.  The equivalent for Q31 values is to multiply by
+@code{0x1.0p31}.
 
-The following floating-point built-in functions are made available in the
-64-bit mode.
+The table below lists the @code{v4i8} and @code{v2q15} operations for which
+hardware support exists.  @code{a} and @code{b} are @code{v4i8} values,
+and @code{c} and @code{d} are @code{v2q15} values.
 
-@table @code
-@item __float128 __builtin_infq (void)
-Similar to @code{__builtin_inf}, except the return type is @code{__float128}.
-@findex __builtin_infq
+@multitable @columnfractions .50 .50
+@item C code @tab MIPS instruction
+@item @code{a + b} @tab @code{addu.qb}
+@item @code{c + d} @tab @code{addq.ph}
+@item @code{a - b} @tab @code{subu.qb}
+@item @code{c - d} @tab @code{subq.ph}
+@end multitable
 
-@item __float128 __builtin_huge_valq (void)
-Similar to @code{__builtin_huge_val}, except the return type is @code{__float128}.
-@findex __builtin_huge_valq
-@end table
+The table below lists the @code{v2i16} operation for which
+hardware support exists for the DSP ASE REV 2.  @code{e} and @code{f} are
+@code{v2i16} values.
 
-The following built-in functions are always available and can be used to
-check the target platform type.
+@multitable @columnfractions .50 .50
+@item C code @tab MIPS instruction
+@item @code{e * f} @tab @code{mul.ph}
+@end multitable
 
-@deftypefn {Built-in Function} void __builtin_cpu_init (void)
-This function runs the CPU detection code to check the type of CPU and the
-features supported.  This built-in function needs to be invoked along with the built-in functions
-to check CPU type and features, @code{__builtin_cpu_is} and
-@code{__builtin_cpu_supports}, only when used in a function that is
-executed before any constructors are called.  The CPU detection code is
-automatically executed in a very high priority constructor.
+It is easier to describe the DSP built-in functions if we first define
+the following types:
 
-For example, this function has to be used in @code{ifunc} resolvers that
-check for CPU type using the built-in functions @code{__builtin_cpu_is}
-and @code{__builtin_cpu_supports}, or in constructors on targets that
-don't support constructor priority.
 @smallexample
-
-static void (*resolve_memcpy (void)) (void)
-@{
-  // ifunc resolvers fire before constructors, explicitly call the init
-  // function.
-  __builtin_cpu_init ();
-  if (__builtin_cpu_supports ("ssse3"))
-    return ssse3_memcpy; // super fast memcpy with ssse3 instructions.
-  else
-    return default_memcpy;
-@}
-
-void *memcpy (void *, const void *, size_t)
-     __attribute__ ((ifunc ("resolve_memcpy")));
+typedef int q31;
+typedef int i32;
+typedef unsigned int ui32;
+typedef long long a64;
 @end smallexample
 
-@end deftypefn
-
-@deftypefn {Built-in Function} int __builtin_cpu_is (const char *@var{cpuname})
-This function returns a positive integer if the run-time CPU
-is of type @var{cpuname}
-and returns @code{0} otherwise. The following CPU names can be detected:
+@code{q31} and @code{i32} are actually the same as @code{int}, but we
+use @code{q31} to indicate a Q31 fractional value and @code{i32} to
+indicate a 32-bit integer value.  Similarly, @code{a64} is the same as
+@code{long long}, but we use @code{a64} to indicate values that are
+placed in one of the four DSP accumulators (@code{$ac0},
+@code{$ac1}, @code{$ac2} or @code{$ac3}).
 
-@table @samp
-@item intel
-Intel CPU.
+Also, some built-in functions prefer or require immediate numbers as
+parameters, because the corresponding DSP instructions accept both immediate
+numbers and register operands, or accept immediate numbers only.  The
+immediate parameters are listed as follows.
 
-@item atom
-Intel Atom CPU.
+@smallexample
+imm0_3: 0 to 3.
+imm0_7: 0 to 7.
+imm0_15: 0 to 15.
+imm0_31: 0 to 31.
+imm0_63: 0 to 63.
+imm0_255: 0 to 255.
+imm_n32_31: -32 to 31.
+imm_n512_511: -512 to 511.
+@end smallexample
 
-@item core2
-Intel Core 2 CPU.
+The following built-in functions map directly to a particular MIPS DSP
+instruction.  Please refer to the architecture specification
+for details on what each instruction does.
 
-@item corei7
-Intel Core i7 CPU.
+@smallexample
+v2q15 __builtin_mips_addq_ph (v2q15, v2q15)
+v2q15 __builtin_mips_addq_s_ph (v2q15, v2q15)
+q31 __builtin_mips_addq_s_w (q31, q31)
+v4i8 __builtin_mips_addu_qb (v4i8, v4i8)
+v4i8 __builtin_mips_addu_s_qb (v4i8, v4i8)
+v2q15 __builtin_mips_subq_ph (v2q15, v2q15)
+v2q15 __builtin_mips_subq_s_ph (v2q15, v2q15)
+q31 __builtin_mips_subq_s_w (q31, q31)
+v4i8 __builtin_mips_subu_qb (v4i8, v4i8)
+v4i8 __builtin_mips_subu_s_qb (v4i8, v4i8)
+i32 __builtin_mips_addsc (i32, i32)
+i32 __builtin_mips_addwc (i32, i32)
+i32 __builtin_mips_modsub (i32, i32)
+i32 __builtin_mips_raddu_w_qb (v4i8)
+v2q15 __builtin_mips_absq_s_ph (v2q15)
+q31 __builtin_mips_absq_s_w (q31)
+v4i8 __builtin_mips_precrq_qb_ph (v2q15, v2q15)
+v2q15 __builtin_mips_precrq_ph_w (q31, q31)
+v2q15 __builtin_mips_precrq_rs_ph_w (q31, q31)
+v4i8 __builtin_mips_precrqu_s_qb_ph (v2q15, v2q15)
+q31 __builtin_mips_preceq_w_phl (v2q15)
+q31 __builtin_mips_preceq_w_phr (v2q15)
+v2q15 __builtin_mips_precequ_ph_qbl (v4i8)
+v2q15 __builtin_mips_precequ_ph_qbr (v4i8)
+v2q15 __builtin_mips_precequ_ph_qbla (v4i8)
+v2q15 __builtin_mips_precequ_ph_qbra (v4i8)
+v2q15 __builtin_mips_preceu_ph_qbl (v4i8)
+v2q15 __builtin_mips_preceu_ph_qbr (v4i8)
+v2q15 __builtin_mips_preceu_ph_qbla (v4i8)
+v2q15 __builtin_mips_preceu_ph_qbra (v4i8)
+v4i8 __builtin_mips_shll_qb (v4i8, imm0_7)
+v4i8 __builtin_mips_shll_qb (v4i8, i32)
+v2q15 __builtin_mips_shll_ph (v2q15, imm0_15)
+v2q15 __builtin_mips_shll_ph (v2q15, i32)
+v2q15 __builtin_mips_shll_s_ph (v2q15, imm0_15)
+v2q15 __builtin_mips_shll_s_ph (v2q15, i32)
+q31 __builtin_mips_shll_s_w (q31, imm0_31)
+q31 __builtin_mips_shll_s_w (q31, i32)
+v4i8 __builtin_mips_shrl_qb (v4i8, imm0_7)
+v4i8 __builtin_mips_shrl_qb (v4i8, i32)
+v2q15 __builtin_mips_shra_ph (v2q15, imm0_15)
+v2q15 __builtin_mips_shra_ph (v2q15, i32)
+v2q15 __builtin_mips_shra_r_ph (v2q15, imm0_15)
+v2q15 __builtin_mips_shra_r_ph (v2q15, i32)
+q31 __builtin_mips_shra_r_w (q31, imm0_31)
+q31 __builtin_mips_shra_r_w (q31, i32)
+v2q15 __builtin_mips_muleu_s_ph_qbl (v4i8, v2q15)
+v2q15 __builtin_mips_muleu_s_ph_qbr (v4i8, v2q15)
+v2q15 __builtin_mips_mulq_rs_ph (v2q15, v2q15)
+q31 __builtin_mips_muleq_s_w_phl (v2q15, v2q15)
+q31 __builtin_mips_muleq_s_w_phr (v2q15, v2q15)
+a64 __builtin_mips_dpau_h_qbl (a64, v4i8, v4i8)
+a64 __builtin_mips_dpau_h_qbr (a64, v4i8, v4i8)
+a64 __builtin_mips_dpsu_h_qbl (a64, v4i8, v4i8)
+a64 __builtin_mips_dpsu_h_qbr (a64, v4i8, v4i8)
+a64 __builtin_mips_dpaq_s_w_ph (a64, v2q15, v2q15)
+a64 __builtin_mips_dpaq_sa_l_w (a64, q31, q31)
+a64 __builtin_mips_dpsq_s_w_ph (a64, v2q15, v2q15)
+a64 __builtin_mips_dpsq_sa_l_w (a64, q31, q31)
+a64 __builtin_mips_mulsaq_s_w_ph (a64, v2q15, v2q15)
+a64 __builtin_mips_maq_s_w_phl (a64, v2q15, v2q15)
+a64 __builtin_mips_maq_s_w_phr (a64, v2q15, v2q15)
+a64 __builtin_mips_maq_sa_w_phl (a64, v2q15, v2q15)
+a64 __builtin_mips_maq_sa_w_phr (a64, v2q15, v2q15)
+i32 __builtin_mips_bitrev (i32)
+i32 __builtin_mips_insv (i32, i32)
+v4i8 __builtin_mips_repl_qb (imm0_255)
+v4i8 __builtin_mips_repl_qb (i32)
+v2q15 __builtin_mips_repl_ph (imm_n512_511)
+v2q15 __builtin_mips_repl_ph (i32)
+void __builtin_mips_cmpu_eq_qb (v4i8, v4i8)
+void __builtin_mips_cmpu_lt_qb (v4i8, v4i8)
+void __builtin_mips_cmpu_le_qb (v4i8, v4i8)
+i32 __builtin_mips_cmpgu_eq_qb (v4i8, v4i8)
+i32 __builtin_mips_cmpgu_lt_qb (v4i8, v4i8)
+i32 __builtin_mips_cmpgu_le_qb (v4i8, v4i8)
+void __builtin_mips_cmp_eq_ph (v2q15, v2q15)
+void __builtin_mips_cmp_lt_ph (v2q15, v2q15)
+void __builtin_mips_cmp_le_ph (v2q15, v2q15)
+v4i8 __builtin_mips_pick_qb (v4i8, v4i8)
+v2q15 __builtin_mips_pick_ph (v2q15, v2q15)
+v2q15 __builtin_mips_packrl_ph (v2q15, v2q15)
+i32 __builtin_mips_extr_w (a64, imm0_31)
+i32 __builtin_mips_extr_w (a64, i32)
+i32 __builtin_mips_extr_r_w (a64, imm0_31)
+i32 __builtin_mips_extr_s_h (a64, i32)
+i32 __builtin_mips_extr_rs_w (a64, imm0_31)
+i32 __builtin_mips_extr_rs_w (a64, i32)
+i32 __builtin_mips_extr_s_h (a64, imm0_31)
+i32 __builtin_mips_extr_r_w (a64, i32)
+i32 __builtin_mips_extp (a64, imm0_31)
+i32 __builtin_mips_extp (a64, i32)
+i32 __builtin_mips_extpdp (a64, imm0_31)
+i32 __builtin_mips_extpdp (a64, i32)
+a64 __builtin_mips_shilo (a64, imm_n32_31)
+a64 __builtin_mips_shilo (a64, i32)
+a64 __builtin_mips_mthlip (a64, i32)
+void __builtin_mips_wrdsp (i32, imm0_63)
+i32 __builtin_mips_rddsp (imm0_63)
+i32 __builtin_mips_lbux (void *, i32)
+i32 __builtin_mips_lhx (void *, i32)
+i32 __builtin_mips_lwx (void *, i32)
+a64 __builtin_mips_ldx (void *, i32) [MIPS64 only]
+i32 __builtin_mips_bposge32 (void)
+a64 __builtin_mips_madd (a64, i32, i32);
+a64 __builtin_mips_maddu (a64, ui32, ui32);
+a64 __builtin_mips_msub (a64, i32, i32);
+a64 __builtin_mips_msubu (a64, ui32, ui32);
+a64 __builtin_mips_mult (i32, i32);
+a64 __builtin_mips_multu (ui32, ui32);
+@end smallexample
 
-@item nehalem
-Intel Core i7 Nehalem CPU.
+The following built-in functions map directly to a particular MIPS DSP REV 2
+instruction.  Please refer to the architecture specification
+for details on what each instruction does.
 
-@item westmere
-Intel Core i7 Westmere CPU.
+@smallexample
+v4q7 __builtin_mips_absq_s_qb (v4q7);
+v2i16 __builtin_mips_addu_ph (v2i16, v2i16);
+v2i16 __builtin_mips_addu_s_ph (v2i16, v2i16);
+v4i8 __builtin_mips_adduh_qb (v4i8, v4i8);
+v4i8 __builtin_mips_adduh_r_qb (v4i8, v4i8);
+i32 __builtin_mips_append (i32, i32, imm0_31);
+i32 __builtin_mips_balign (i32, i32, imm0_3);
+i32 __builtin_mips_cmpgdu_eq_qb (v4i8, v4i8);
+i32 __builtin_mips_cmpgdu_lt_qb (v4i8, v4i8);
+i32 __builtin_mips_cmpgdu_le_qb (v4i8, v4i8);
+a64 __builtin_mips_dpa_w_ph (a64, v2i16, v2i16);
+a64 __builtin_mips_dps_w_ph (a64, v2i16, v2i16);
+v2i16 __builtin_mips_mul_ph (v2i16, v2i16);
+v2i16 __builtin_mips_mul_s_ph (v2i16, v2i16);
+q31 __builtin_mips_mulq_rs_w (q31, q31);
+v2q15 __builtin_mips_mulq_s_ph (v2q15, v2q15);
+q31 __builtin_mips_mulq_s_w (q31, q31);
+a64 __builtin_mips_mulsa_w_ph (a64, v2i16, v2i16);
+v4i8 __builtin_mips_precr_qb_ph (v2i16, v2i16);
+v2i16 __builtin_mips_precr_sra_ph_w (i32, i32, imm0_31);
+v2i16 __builtin_mips_precr_sra_r_ph_w (i32, i32, imm0_31);
+i32 __builtin_mips_prepend (i32, i32, imm0_31);
+v4i8 __builtin_mips_shra_qb (v4i8, imm0_7);
+v4i8 __builtin_mips_shra_r_qb (v4i8, imm0_7);
+v4i8 __builtin_mips_shra_qb (v4i8, i32);
+v4i8 __builtin_mips_shra_r_qb (v4i8, i32);
+v2i16 __builtin_mips_shrl_ph (v2i16, imm0_15);
+v2i16 __builtin_mips_shrl_ph (v2i16, i32);
+v2i16 __builtin_mips_subu_ph (v2i16, v2i16);
+v2i16 __builtin_mips_subu_s_ph (v2i16, v2i16);
+v4i8 __builtin_mips_subuh_qb (v4i8, v4i8);
+v4i8 __builtin_mips_subuh_r_qb (v4i8, v4i8);
+v2q15 __builtin_mips_addqh_ph (v2q15, v2q15);
+v2q15 __builtin_mips_addqh_r_ph (v2q15, v2q15);
+q31 __builtin_mips_addqh_w (q31, q31);
+q31 __builtin_mips_addqh_r_w (q31, q31);
+v2q15 __builtin_mips_subqh_ph (v2q15, v2q15);
+v2q15 __builtin_mips_subqh_r_ph (v2q15, v2q15);
+q31 __builtin_mips_subqh_w (q31, q31);
+q31 __builtin_mips_subqh_r_w (q31, q31);
+a64 __builtin_mips_dpax_w_ph (a64, v2i16, v2i16);
+a64 __builtin_mips_dpsx_w_ph (a64, v2i16, v2i16);
+a64 __builtin_mips_dpaqx_s_w_ph (a64, v2q15, v2q15);
+a64 __builtin_mips_dpaqx_sa_w_ph (a64, v2q15, v2q15);
+a64 __builtin_mips_dpsqx_s_w_ph (a64, v2q15, v2q15);
+a64 __builtin_mips_dpsqx_sa_w_ph (a64, v2q15, v2q15);
+@end smallexample
 
-@item sandybridge
-Intel Core i7 Sandy Bridge CPU.
 
-@item amd
-AMD CPU.
+@node MIPS Paired-Single Support
+@subsection MIPS Paired-Single Support
 
-@item amdfam10h
-AMD Family 10h CPU.
+The MIPS64 architecture includes a number of instructions that
+operate on pairs of single-precision floating-point values.
+Each pair is packed into a 64-bit floating-point register,
+with one element being designated the ``upper half'' and
+the other being designated the ``lower half''.
 
-@item barcelona
-AMD Family 10h Barcelona CPU.
+GCC supports paired-single operations using both the generic
+vector extensions (@pxref{Vector Extensions}) and a collection of
+MIPS-specific built-in functions.  Both kinds of support are
+enabled by the @option{-mpaired-single} command-line option.
 
-@item shanghai
-AMD Family 10h Shanghai CPU.
+The vector type associated with paired-single values is usually
+called @code{v2sf}.  It can be defined in C as follows:
 
-@item istanbul
-AMD Family 10h Istanbul CPU.
+@smallexample
+typedef float v2sf __attribute__ ((vector_size (8)));
+@end smallexample
 
-@item btver1
-AMD Family 14h CPU.
+@code{v2sf} values are initialized in the same way as aggregates.
+For example:
 
-@item amdfam15h
-AMD Family 15h CPU.
+@smallexample
+v2sf a = @{1.5, 9.1@};
+v2sf b;
+float e, f;
+b = (v2sf) @{e, f@};
+@end smallexample
 
-@item bdver1
-AMD Family 15h Bulldozer version 1.
+@emph{Note:} The CPU's endianness determines which value is stored in
+the upper half of a register and which value is stored in the lower half.
+On little-endian targets, the first value is the lower one and the second
+value is the upper one.  The opposite order applies to big-endian targets.
+For example, the code above sets the lower half of @code{a} to
+@code{1.5} on little-endian targets and @code{9.1} on big-endian targets.
 
-@item bdver2
-AMD Family 15h Bulldozer version 2.
+@node MIPS Loongson Built-in Functions
+@subsection MIPS Loongson Built-in Functions
 
-@item bdver3
-AMD Family 15h Bulldozer version 3.
+GCC provides intrinsics to access the SIMD instructions provided by the
+ST Microelectronics Loongson-2E and -2F processors.  These intrinsics,
+available after inclusion of the @code{loongson.h} header file,
+operate on the following 64-bit vector types:
 
-@item bdver4
-AMD Family 15h Bulldozer version 4.
+@itemize
+@item @code{uint8x8_t}, a vector of eight unsigned 8-bit integers;
+@item @code{uint16x4_t}, a vector of four unsigned 16-bit integers;
+@item @code{uint32x2_t}, a vector of two unsigned 32-bit integers;
+@item @code{int8x8_t}, a vector of eight signed 8-bit integers;
+@item @code{int16x4_t}, a vector of four signed 16-bit integers;
+@item @code{int32x2_t}, a vector of two signed 32-bit integers.
+@end itemize
 
-@item btver2
-AMD Family 16h CPU.
-@end table
+The intrinsics provided are listed below; each is named after the
+machine instruction to which it corresponds, with suffixes added as
+appropriate to distinguish intrinsics that expand to the same machine
+instruction yet have different argument types.  Refer to the architecture
+documentation for a description of the functionality of each
+instruction.
 
-Here is an example:
 @smallexample
-if (__builtin_cpu_is ("corei7"))
-  @{
-     do_corei7 (); // Core i7 specific implementation.
-  @}
-else
-  @{
-     do_generic (); // Generic implementation.
-  @}
+int16x4_t packsswh (int32x2_t s, int32x2_t t);
+int8x8_t packsshb (int16x4_t s, int16x4_t t);
+uint8x8_t packushb (uint16x4_t s, uint16x4_t t);
+uint32x2_t paddw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t paddh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t paddb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t paddw_s (int32x2_t s, int32x2_t t);
+int16x4_t paddh_s (int16x4_t s, int16x4_t t);
+int8x8_t paddb_s (int8x8_t s, int8x8_t t);
+uint64_t paddd_u (uint64_t s, uint64_t t);
+int64_t paddd_s (int64_t s, int64_t t);
+int16x4_t paddsh (int16x4_t s, int16x4_t t);
+int8x8_t paddsb (int8x8_t s, int8x8_t t);
+uint16x4_t paddush (uint16x4_t s, uint16x4_t t);
+uint8x8_t paddusb (uint8x8_t s, uint8x8_t t);
+uint64_t pandn_ud (uint64_t s, uint64_t t);
+uint32x2_t pandn_uw (uint32x2_t s, uint32x2_t t);
+uint16x4_t pandn_uh (uint16x4_t s, uint16x4_t t);
+uint8x8_t pandn_ub (uint8x8_t s, uint8x8_t t);
+int64_t pandn_sd (int64_t s, int64_t t);
+int32x2_t pandn_sw (int32x2_t s, int32x2_t t);
+int16x4_t pandn_sh (int16x4_t s, int16x4_t t);
+int8x8_t pandn_sb (int8x8_t s, int8x8_t t);
+uint16x4_t pavgh (uint16x4_t s, uint16x4_t t);
+uint8x8_t pavgb (uint8x8_t s, uint8x8_t t);
+uint32x2_t pcmpeqw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t pcmpeqh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t pcmpeqb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t pcmpeqw_s (int32x2_t s, int32x2_t t);
+int16x4_t pcmpeqh_s (int16x4_t s, int16x4_t t);
+int8x8_t pcmpeqb_s (int8x8_t s, int8x8_t t);
+uint32x2_t pcmpgtw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t pcmpgth_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t pcmpgtb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t pcmpgtw_s (int32x2_t s, int32x2_t t);
+int16x4_t pcmpgth_s (int16x4_t s, int16x4_t t);
+int8x8_t pcmpgtb_s (int8x8_t s, int8x8_t t);
+uint16x4_t pextrh_u (uint16x4_t s, int field);
+int16x4_t pextrh_s (int16x4_t s, int field);
+uint16x4_t pinsrh_0_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_1_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_2_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_3_u (uint16x4_t s, uint16x4_t t);
+int16x4_t pinsrh_0_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_1_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_2_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_3_s (int16x4_t s, int16x4_t t);
+int32x2_t pmaddhw (int16x4_t s, int16x4_t t);
+int16x4_t pmaxsh (int16x4_t s, int16x4_t t);
+uint8x8_t pmaxub (uint8x8_t s, uint8x8_t t);
+int16x4_t pminsh (int16x4_t s, int16x4_t t);
+uint8x8_t pminub (uint8x8_t s, uint8x8_t t);
+uint8x8_t pmovmskb_u (uint8x8_t s);
+int8x8_t pmovmskb_s (int8x8_t s);
+uint16x4_t pmulhuh (uint16x4_t s, uint16x4_t t);
+int16x4_t pmulhh (int16x4_t s, int16x4_t t);
+int16x4_t pmullh (int16x4_t s, int16x4_t t);
+int64_t pmuluw (uint32x2_t s, uint32x2_t t);
+uint8x8_t pasubub (uint8x8_t s, uint8x8_t t);
+uint16x4_t biadd (uint8x8_t s);
+uint16x4_t psadbh (uint8x8_t s, uint8x8_t t);
+uint16x4_t pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order);
+int16x4_t pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order);
+uint16x4_t psllh_u (uint16x4_t s, uint8_t amount);
+int16x4_t psllh_s (int16x4_t s, uint8_t amount);
+uint32x2_t psllw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psllw_s (int32x2_t s, uint8_t amount);
+uint16x4_t psrlh_u (uint16x4_t s, uint8_t amount);
+int16x4_t psrlh_s (int16x4_t s, uint8_t amount);
+uint32x2_t psrlw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psrlw_s (int32x2_t s, uint8_t amount);
+uint16x4_t psrah_u (uint16x4_t s, uint8_t amount);
+int16x4_t psrah_s (int16x4_t s, uint8_t amount);
+uint32x2_t psraw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psraw_s (int32x2_t s, uint8_t amount);
+uint32x2_t psubw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t psubh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t psubb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t psubw_s (int32x2_t s, int32x2_t t);
+int16x4_t psubh_s (int16x4_t s, int16x4_t t);
+int8x8_t psubb_s (int8x8_t s, int8x8_t t);
+uint64_t psubd_u (uint64_t s, uint64_t t);
+int64_t psubd_s (int64_t s, int64_t t);
+int16x4_t psubsh (int16x4_t s, int16x4_t t);
+int8x8_t psubsb (int8x8_t s, int8x8_t t);
+uint16x4_t psubush (uint16x4_t s, uint16x4_t t);
+uint8x8_t psubusb (uint8x8_t s, uint8x8_t t);
+uint32x2_t punpckhwd_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t punpckhhw_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t punpckhbh_u (uint8x8_t s, uint8x8_t t);
+int32x2_t punpckhwd_s (int32x2_t s, int32x2_t t);
+int16x4_t punpckhhw_s (int16x4_t s, int16x4_t t);
+int8x8_t punpckhbh_s (int8x8_t s, int8x8_t t);
+uint32x2_t punpcklwd_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t punpcklhw_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t punpcklbh_u (uint8x8_t s, uint8x8_t t);
+int32x2_t punpcklwd_s (int32x2_t s, int32x2_t t);
+int16x4_t punpcklhw_s (int16x4_t s, int16x4_t t);
+int8x8_t punpcklbh_s (int8x8_t s, int8x8_t t);
 @end smallexample
-@end deftypefn
 
-@deftypefn {Built-in Function} int __builtin_cpu_supports (const char *@var{feature})
-This function returns a positive integer if the run-time CPU
-supports @var{feature}
-and returns @code{0} otherwise. The following features can be detected:
+@menu
+* Paired-Single Arithmetic::
+* Paired-Single Built-in Functions::
+* MIPS-3D Built-in Functions::
+@end menu
 
-@table @samp
-@item cmov
-CMOV instruction.
-@item mmx
-MMX instructions.
-@item popcnt
-POPCNT instruction.
-@item sse
-SSE instructions.
-@item sse2
-SSE2 instructions.
-@item sse3
-SSE3 instructions.
-@item ssse3
-SSSE3 instructions.
-@item sse4.1
-SSE4.1 instructions.
-@item sse4.2
-SSE4.2 instructions.
-@item avx
-AVX instructions.
-@item avx2
-AVX2 instructions.
-@item avx512f
-AVX512F instructions.
-@end table
+@node Paired-Single Arithmetic
+@subsubsection Paired-Single Arithmetic
 
-Here is an example:
-@smallexample
-if (__builtin_cpu_supports ("popcnt"))
-  @{
-     asm("popcnt %1,%0" : "=r"(count) : "rm"(n) : "cc");
-  @}
-else
-  @{
-     count = generic_countbits (n); //generic implementation.
-  @}
-@end smallexample
-@end deftypefn
+The table below lists the @code{v2sf} operations for which hardware
+support exists.  @code{a}, @code{b} and @code{c} are @code{v2sf}
+values and @code{x} is an integral value.
 
+@multitable @columnfractions .50 .50
+@item C code @tab MIPS instruction
+@item @code{a + b} @tab @code{add.ps}
+@item @code{a - b} @tab @code{sub.ps}
+@item @code{-a} @tab @code{neg.ps}
+@item @code{a * b} @tab @code{mul.ps}
+@item @code{a * b + c} @tab @code{madd.ps}
+@item @code{a * b - c} @tab @code{msub.ps}
+@item @code{-(a * b + c)} @tab @code{nmadd.ps}
+@item @code{-(a * b - c)} @tab @code{nmsub.ps}
+@item @code{x ? a : b} @tab @code{movn.ps}/@code{movz.ps}
+@end multitable
 
-The following built-in functions are made available by @option{-mmmx}.
-All of them generate the machine instruction that is part of the name.
+Note that the multiply-accumulate instructions can be disabled
+using the command-line option @option{-mno-fused-madd}.
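+
+As a minimal sketch of the table above, the function below computes
+@code{a * b + c} on paired-single values.  The @code{v2sf} typedef
+and the function name @code{madd} are assumptions made for this
+illustration.  With paired-single support enabled, GCC may implement
+the body with a single @code{madd.ps}, unless
+@option{-mno-fused-madd} is given.
+
+@smallexample
+/* Assumed definition: a vector of two floats, 8 bytes wide.  */
+typedef float v2sf __attribute__ ((vector_size (8)));
+
+v2sf
+madd (v2sf a, v2sf b, v2sf c)
+@{
+  /* Element-wise multiply, then element-wise add.  */
+  return a * b + c;
+@}
+@end smallexample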
 
-@smallexample
-v8qi __builtin_ia32_paddb (v8qi, v8qi)
-v4hi __builtin_ia32_paddw (v4hi, v4hi)
-v2si __builtin_ia32_paddd (v2si, v2si)
-v8qi __builtin_ia32_psubb (v8qi, v8qi)
-v4hi __builtin_ia32_psubw (v4hi, v4hi)
-v2si __builtin_ia32_psubd (v2si, v2si)
-v8qi __builtin_ia32_paddsb (v8qi, v8qi)
-v4hi __builtin_ia32_paddsw (v4hi, v4hi)
-v8qi __builtin_ia32_psubsb (v8qi, v8qi)
-v4hi __builtin_ia32_psubsw (v4hi, v4hi)
-v8qi __builtin_ia32_paddusb (v8qi, v8qi)
-v4hi __builtin_ia32_paddusw (v4hi, v4hi)
-v8qi __builtin_ia32_psubusb (v8qi, v8qi)
-v4hi __builtin_ia32_psubusw (v4hi, v4hi)
-v4hi __builtin_ia32_pmullw (v4hi, v4hi)
-v4hi __builtin_ia32_pmulhw (v4hi, v4hi)
-di __builtin_ia32_pand (di, di)
-di __builtin_ia32_pandn (di,di)
-di __builtin_ia32_por (di, di)
-di __builtin_ia32_pxor (di, di)
-v8qi __builtin_ia32_pcmpeqb (v8qi, v8qi)
-v4hi __builtin_ia32_pcmpeqw (v4hi, v4hi)
-v2si __builtin_ia32_pcmpeqd (v2si, v2si)
-v8qi __builtin_ia32_pcmpgtb (v8qi, v8qi)
-v4hi __builtin_ia32_pcmpgtw (v4hi, v4hi)
-v2si __builtin_ia32_pcmpgtd (v2si, v2si)
-v8qi __builtin_ia32_punpckhbw (v8qi, v8qi)
-v4hi __builtin_ia32_punpckhwd (v4hi, v4hi)
-v2si __builtin_ia32_punpckhdq (v2si, v2si)
-v8qi __builtin_ia32_punpcklbw (v8qi, v8qi)
-v4hi __builtin_ia32_punpcklwd (v4hi, v4hi)
-v2si __builtin_ia32_punpckldq (v2si, v2si)
-v8qi __builtin_ia32_packsswb (v4hi, v4hi)
-v4hi __builtin_ia32_packssdw (v2si, v2si)
-v8qi __builtin_ia32_packuswb (v4hi, v4hi)
+@node Paired-Single Built-in Functions
+@subsubsection Paired-Single Built-in Functions
 
-v4hi __builtin_ia32_psllw (v4hi, v4hi)
-v2si __builtin_ia32_pslld (v2si, v2si)
-v1di __builtin_ia32_psllq (v1di, v1di)
-v4hi __builtin_ia32_psrlw (v4hi, v4hi)
-v2si __builtin_ia32_psrld (v2si, v2si)
-v1di __builtin_ia32_psrlq (v1di, v1di)
-v4hi __builtin_ia32_psraw (v4hi, v4hi)
-v2si __builtin_ia32_psrad (v2si, v2si)
-v4hi __builtin_ia32_psllwi (v4hi, int)
-v2si __builtin_ia32_pslldi (v2si, int)
-v1di __builtin_ia32_psllqi (v1di, int)
-v4hi __builtin_ia32_psrlwi (v4hi, int)
-v2si __builtin_ia32_psrldi (v2si, int)
-v1di __builtin_ia32_psrlqi (v1di, int)
-v4hi __builtin_ia32_psrawi (v4hi, int)
-v2si __builtin_ia32_psradi (v2si, int)
+The following paired-single functions map directly to a particular
+MIPS instruction.  Please refer to the architecture specification
+for details on what each instruction does.
 
-@end smallexample
+@table @code
+@item v2sf __builtin_mips_pll_ps (v2sf, v2sf)
+Pair lower lower (@code{pll.ps}).
 
-The following built-in functions are made available either with
-@option{-msse}, or with a combination of @option{-m3dnow} and
-@option{-march=athlon}.  All of them generate the machine
-instruction that is part of the name.
+@item v2sf __builtin_mips_pul_ps (v2sf, v2sf)
+Pair upper lower (@code{pul.ps}).
 
-@smallexample
-v4hi __builtin_ia32_pmulhuw (v4hi, v4hi)
-v8qi __builtin_ia32_pavgb (v8qi, v8qi)
-v4hi __builtin_ia32_pavgw (v4hi, v4hi)
-v1di __builtin_ia32_psadbw (v8qi, v8qi)
-v8qi __builtin_ia32_pmaxub (v8qi, v8qi)
-v4hi __builtin_ia32_pmaxsw (v4hi, v4hi)
-v8qi __builtin_ia32_pminub (v8qi, v8qi)
-v4hi __builtin_ia32_pminsw (v4hi, v4hi)
-int __builtin_ia32_pmovmskb (v8qi)
-void __builtin_ia32_maskmovq (v8qi, v8qi, char *)
-void __builtin_ia32_movntq (di *, di)
-void __builtin_ia32_sfence (void)
+@item v2sf __builtin_mips_plu_ps (v2sf, v2sf)
+Pair lower upper (@code{plu.ps}).
+
+@item v2sf __builtin_mips_puu_ps (v2sf, v2sf)
+Pair upper upper (@code{puu.ps}).
+
+@item v2sf __builtin_mips_cvt_ps_s (float, float)
+Convert pair to paired single (@code{cvt.ps.s}).
+
+@item float __builtin_mips_cvt_s_pl (v2sf)
+Convert pair lower to single (@code{cvt.s.pl}).
+
+@item float __builtin_mips_cvt_s_pu (v2sf)
+Convert pair upper to single (@code{cvt.s.pu}).
+
+@item v2sf __builtin_mips_abs_ps (v2sf)
+Absolute value (@code{abs.ps}).
+
+@item v2sf __builtin_mips_alnv_ps (v2sf, v2sf, int)
+Align variable (@code{alnv.ps}).
+
+@emph{Note:} The value of the third parameter must be 0 or 4
+modulo 8, otherwise the result is unpredictable.  Please read the
+instruction description for details.
+@end table
+
+The following multi-instruction functions are also available.
+In each case, @var{cond} can be any of the 16 floating-point conditions:
+@code{f}, @code{un}, @code{eq}, @code{ueq}, @code{olt}, @code{ult},
+@code{ole}, @code{ule}, @code{sf}, @code{ngle}, @code{seq}, @code{ngl},
+@code{lt}, @code{nge}, @code{le} or @code{ngt}.
+
+@table @code
+@item v2sf __builtin_mips_movt_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
+@itemx v2sf __builtin_mips_movf_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
+Conditional move based on floating-point comparison (@code{c.@var{cond}.ps},
+@code{movt.ps}/@code{movf.ps}).
+
+The @code{movt} functions return the value @var{x} computed by:
+
+@smallexample
+c.@var{cond}.ps @var{cc},@var{a},@var{b}
+mov.ps @var{x},@var{c}
+movt.ps @var{x},@var{d},@var{cc}
 @end smallexample
 
-The following built-in functions are available when @option{-msse} is used.
-All of them generate the machine instruction that is part of the name.
+The @code{movf} functions are similar but use @code{movf.ps} instead
+of @code{movt.ps}.
+
+@item int __builtin_mips_upper_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
+@itemx int __builtin_mips_lower_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
+Comparison of two paired-single values (@code{c.@var{cond}.ps},
+@code{bc1t}/@code{bc1f}).
+
+These functions compare @var{a} and @var{b} using @code{c.@var{cond}.ps}
+and return either the upper or lower half of the result.  For example:
 
 @smallexample
-int __builtin_ia32_comieq (v4sf, v4sf)
-int __builtin_ia32_comineq (v4sf, v4sf)
-int __builtin_ia32_comilt (v4sf, v4sf)
-int __builtin_ia32_comile (v4sf, v4sf)
-int __builtin_ia32_comigt (v4sf, v4sf)
-int __builtin_ia32_comige (v4sf, v4sf)
-int __builtin_ia32_ucomieq (v4sf, v4sf)
-int __builtin_ia32_ucomineq (v4sf, v4sf)
-int __builtin_ia32_ucomilt (v4sf, v4sf)
-int __builtin_ia32_ucomile (v4sf, v4sf)
-int __builtin_ia32_ucomigt (v4sf, v4sf)
-int __builtin_ia32_ucomige (v4sf, v4sf)
-v4sf __builtin_ia32_addps (v4sf, v4sf)
-v4sf __builtin_ia32_subps (v4sf, v4sf)
-v4sf __builtin_ia32_mulps (v4sf, v4sf)
-v4sf __builtin_ia32_divps (v4sf, v4sf)
-v4sf __builtin_ia32_addss (v4sf, v4sf)
-v4sf __builtin_ia32_subss (v4sf, v4sf)
-v4sf __builtin_ia32_mulss (v4sf, v4sf)
-v4sf __builtin_ia32_divss (v4sf, v4sf)
-v4sf __builtin_ia32_cmpeqps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpltps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpleps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpgtps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpgeps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpunordps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpneqps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpnltps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpnleps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpngtps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpngeps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpordps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpeqss (v4sf, v4sf)
-v4sf __builtin_ia32_cmpltss (v4sf, v4sf)
-v4sf __builtin_ia32_cmpless (v4sf, v4sf)
-v4sf __builtin_ia32_cmpunordss (v4sf, v4sf)
-v4sf __builtin_ia32_cmpneqss (v4sf, v4sf)
-v4sf __builtin_ia32_cmpnltss (v4sf, v4sf)
-v4sf __builtin_ia32_cmpnless (v4sf, v4sf)
-v4sf __builtin_ia32_cmpordss (v4sf, v4sf)
-v4sf __builtin_ia32_maxps (v4sf, v4sf)
-v4sf __builtin_ia32_maxss (v4sf, v4sf)
-v4sf __builtin_ia32_minps (v4sf, v4sf)
-v4sf __builtin_ia32_minss (v4sf, v4sf)
-v4sf __builtin_ia32_andps (v4sf, v4sf)
-v4sf __builtin_ia32_andnps (v4sf, v4sf)
-v4sf __builtin_ia32_orps (v4sf, v4sf)
-v4sf __builtin_ia32_xorps (v4sf, v4sf)
-v4sf __builtin_ia32_movss (v4sf, v4sf)
-v4sf __builtin_ia32_movhlps (v4sf, v4sf)
-v4sf __builtin_ia32_movlhps (v4sf, v4sf)
-v4sf __builtin_ia32_unpckhps (v4sf, v4sf)
-v4sf __builtin_ia32_unpcklps (v4sf, v4sf)
-v4sf __builtin_ia32_cvtpi2ps (v4sf, v2si)
-v4sf __builtin_ia32_cvtsi2ss (v4sf, int)
-v2si __builtin_ia32_cvtps2pi (v4sf)
-int __builtin_ia32_cvtss2si (v4sf)
-v2si __builtin_ia32_cvttps2pi (v4sf)
-int __builtin_ia32_cvttss2si (v4sf)
-v4sf __builtin_ia32_rcpps (v4sf)
-v4sf __builtin_ia32_rsqrtps (v4sf)
-v4sf __builtin_ia32_sqrtps (v4sf)
-v4sf __builtin_ia32_rcpss (v4sf)
-v4sf __builtin_ia32_rsqrtss (v4sf)
-v4sf __builtin_ia32_sqrtss (v4sf)
-v4sf __builtin_ia32_shufps (v4sf, v4sf, int)
-void __builtin_ia32_movntps (float *, v4sf)
-int __builtin_ia32_movmskps (v4sf)
+v2sf a, b;
+if (__builtin_mips_upper_c_eq_ps (a, b))
+  upper_halves_are_equal ();
+else
+  upper_halves_are_unequal ();
+
+if (__builtin_mips_lower_c_eq_ps (a, b))
+  lower_halves_are_equal ();
+else
+  lower_halves_are_unequal ();
 @end smallexample
+@end table
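+
+As a hedged illustration of the conditional-move functions above,
+the sketch below selects between two vectors half by half, following
+the @code{movt} instruction sequence shown earlier.  The function
+name @code{select_equal} and the @code{v2sf} typedef are assumptions
+made for this example.
+
+@smallexample
+/* Assumed definition: a vector of two floats, 8 bytes wide.  */
+typedef float v2sf __attribute__ ((vector_size (8)));
+
+v2sf
+select_equal (v2sf a, v2sf b, v2sf c, v2sf d)
+@{
+  /* Start from c, then move in the halves of d for which the
+     corresponding halves of a and b compare equal.  */
+  return __builtin_mips_movt_c_eq_ps (a, b, c, d);
+@}
+@end smallexample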
 
-The following built-in functions are available when @option{-msse} is used.
+@node MIPS-3D Built-in Functions
+@subsubsection MIPS-3D Built-in Functions
+
+The MIPS-3D Application-Specific Extension (ASE) includes additional
+paired-single instructions that are designed to improve the performance
+of 3D graphics operations.  Support for these instructions is controlled
+by the @option{-mips3d} command-line option.
+
+The functions listed below map directly to a particular MIPS-3D
+instruction.  Please refer to the architecture specification for
+more details on what each instruction does.
 
 @table @code
-@item v4sf __builtin_ia32_loadups (float *)
-Generates the @code{movups} machine instruction as a load from memory.
-@item void __builtin_ia32_storeups (float *, v4sf)
-Generates the @code{movups} machine instruction as a store to memory.
-@item v4sf __builtin_ia32_loadss (float *)
-Generates the @code{movss} machine instruction as a load from memory.
-@item v4sf __builtin_ia32_loadhps (v4sf, const v2sf *)
-Generates the @code{movhps} machine instruction as a load from memory.
-@item v4sf __builtin_ia32_loadlps (v4sf, const v2sf *)
-Generates the @code{movlps} machine instruction as a load from memory
-@item void __builtin_ia32_storehps (v2sf *, v4sf)
-Generates the @code{movhps} machine instruction as a store to memory.
-@item void __builtin_ia32_storelps (v2sf *, v4sf)
-Generates the @code{movlps} machine instruction as a store to memory.
+@item v2sf __builtin_mips_addr_ps (v2sf, v2sf)
+Reduction add (@code{addr.ps}).
+
+@item v2sf __builtin_mips_mulr_ps (v2sf, v2sf)
+Reduction multiply (@code{mulr.ps}).
+
+@item v2sf __builtin_mips_cvt_pw_ps (v2sf)
+Convert paired single to paired word (@code{cvt.pw.ps}).
+
+@item v2sf __builtin_mips_cvt_ps_pw (v2sf)
+Convert paired word to paired single (@code{cvt.ps.pw}).
+
+@item float __builtin_mips_recip1_s (float)
+@itemx double __builtin_mips_recip1_d (double)
+@itemx v2sf __builtin_mips_recip1_ps (v2sf)
+Reduced-precision reciprocal (sequence step 1) (@code{recip1.@var{fmt}}).
+
+@item float __builtin_mips_recip2_s (float, float)
+@itemx double __builtin_mips_recip2_d (double, double)
+@itemx v2sf __builtin_mips_recip2_ps (v2sf, v2sf)
+Reduced-precision reciprocal (sequence step 2) (@code{recip2.@var{fmt}}).
+
+@item float __builtin_mips_rsqrt1_s (float)
+@itemx double __builtin_mips_rsqrt1_d (double)
+@itemx v2sf __builtin_mips_rsqrt1_ps (v2sf)
+Reduced-precision reciprocal square root (sequence step 1)
+(@code{rsqrt1.@var{fmt}}).
+
+@item float __builtin_mips_rsqrt2_s (float, float)
+@itemx double __builtin_mips_rsqrt2_d (double, double)
+@itemx v2sf __builtin_mips_rsqrt2_ps (v2sf, v2sf)
+Reduced-precision reciprocal square root (sequence step 2)
+(@code{rsqrt2.@var{fmt}}).
 @end table
 
-The following built-in functions are available when @option{-msse2} is used.
-All of them generate the machine instruction that is part of the name.
+The following multi-instruction functions are also available.
+In each case, @var{cond} can be any of the 16 floating-point conditions:
+@code{f}, @code{un}, @code{eq}, @code{ueq}, @code{olt}, @code{ult},
+@code{ole}, @code{ule}, @code{sf}, @code{ngle}, @code{seq},
+@code{ngl}, @code{lt}, @code{nge}, @code{le} or @code{ngt}.
+
+@table @code
+@item int __builtin_mips_cabs_@var{cond}_s (float @var{a}, float @var{b})
+@itemx int __builtin_mips_cabs_@var{cond}_d (double @var{a}, double @var{b})
+Absolute comparison of two scalar values (@code{cabs.@var{cond}.@var{fmt}},
+@code{bc1t}/@code{bc1f}).
+
+These functions compare @var{a} and @var{b} using @code{cabs.@var{cond}.s}
+or @code{cabs.@var{cond}.d} and return the result as a boolean value.
+For example:
 
 @smallexample
-int __builtin_ia32_comisdeq (v2df, v2df)
-int __builtin_ia32_comisdlt (v2df, v2df)
-int __builtin_ia32_comisdle (v2df, v2df)
-int __builtin_ia32_comisdgt (v2df, v2df)
-int __builtin_ia32_comisdge (v2df, v2df)
-int __builtin_ia32_comisdneq (v2df, v2df)
-int __builtin_ia32_ucomisdeq (v2df, v2df)
-int __builtin_ia32_ucomisdlt (v2df, v2df)
-int __builtin_ia32_ucomisdle (v2df, v2df)
-int __builtin_ia32_ucomisdgt (v2df, v2df)
-int __builtin_ia32_ucomisdge (v2df, v2df)
-int __builtin_ia32_ucomisdneq (v2df, v2df)
-v2df __builtin_ia32_cmpeqpd (v2df, v2df)
-v2df __builtin_ia32_cmpltpd (v2df, v2df)
-v2df __builtin_ia32_cmplepd (v2df, v2df)
-v2df __builtin_ia32_cmpgtpd (v2df, v2df)
-v2df __builtin_ia32_cmpgepd (v2df, v2df)
-v2df __builtin_ia32_cmpunordpd (v2df, v2df)
-v2df __builtin_ia32_cmpneqpd (v2df, v2df)
-v2df __builtin_ia32_cmpnltpd (v2df, v2df)
-v2df __builtin_ia32_cmpnlepd (v2df, v2df)
-v2df __builtin_ia32_cmpngtpd (v2df, v2df)
-v2df __builtin_ia32_cmpngepd (v2df, v2df)
-v2df __builtin_ia32_cmpordpd (v2df, v2df)
-v2df __builtin_ia32_cmpeqsd (v2df, v2df)
-v2df __builtin_ia32_cmpltsd (v2df, v2df)
-v2df __builtin_ia32_cmplesd (v2df, v2df)
-v2df __builtin_ia32_cmpunordsd (v2df, v2df)
-v2df __builtin_ia32_cmpneqsd (v2df, v2df)
-v2df __builtin_ia32_cmpnltsd (v2df, v2df)
-v2df __builtin_ia32_cmpnlesd (v2df, v2df)
-v2df __builtin_ia32_cmpordsd (v2df, v2df)
-v2di __builtin_ia32_paddq (v2di, v2di)
-v2di __builtin_ia32_psubq (v2di, v2di)
-v2df __builtin_ia32_addpd (v2df, v2df)
-v2df __builtin_ia32_subpd (v2df, v2df)
-v2df __builtin_ia32_mulpd (v2df, v2df)
-v2df __builtin_ia32_divpd (v2df, v2df)
-v2df __builtin_ia32_addsd (v2df, v2df)
-v2df __builtin_ia32_subsd (v2df, v2df)
-v2df __builtin_ia32_mulsd (v2df, v2df)
-v2df __builtin_ia32_divsd (v2df, v2df)
-v2df __builtin_ia32_minpd (v2df, v2df)
-v2df __builtin_ia32_maxpd (v2df, v2df)
-v2df __builtin_ia32_minsd (v2df, v2df)
-v2df __builtin_ia32_maxsd (v2df, v2df)
-v2df __builtin_ia32_andpd (v2df, v2df)
-v2df __builtin_ia32_andnpd (v2df, v2df)
-v2df __builtin_ia32_orpd (v2df, v2df)
-v2df __builtin_ia32_xorpd (v2df, v2df)
-v2df __builtin_ia32_movsd (v2df, v2df)
-v2df __builtin_ia32_unpckhpd (v2df, v2df)
-v2df __builtin_ia32_unpcklpd (v2df, v2df)
-v16qi __builtin_ia32_paddb128 (v16qi, v16qi)
-v8hi __builtin_ia32_paddw128 (v8hi, v8hi)
-v4si __builtin_ia32_paddd128 (v4si, v4si)
-v2di __builtin_ia32_paddq128 (v2di, v2di)
-v16qi __builtin_ia32_psubb128 (v16qi, v16qi)
-v8hi __builtin_ia32_psubw128 (v8hi, v8hi)
-v4si __builtin_ia32_psubd128 (v4si, v4si)
-v2di __builtin_ia32_psubq128 (v2di, v2di)
-v8hi __builtin_ia32_pmullw128 (v8hi, v8hi)
-v8hi __builtin_ia32_pmulhw128 (v8hi, v8hi)
-v2di __builtin_ia32_pand128 (v2di, v2di)
-v2di __builtin_ia32_pandn128 (v2di, v2di)
-v2di __builtin_ia32_por128 (v2di, v2di)
-v2di __builtin_ia32_pxor128 (v2di, v2di)
-v16qi __builtin_ia32_pavgb128 (v16qi, v16qi)
-v8hi __builtin_ia32_pavgw128 (v8hi, v8hi)
-v16qi __builtin_ia32_pcmpeqb128 (v16qi, v16qi)
-v8hi __builtin_ia32_pcmpeqw128 (v8hi, v8hi)
-v4si __builtin_ia32_pcmpeqd128 (v4si, v4si)
-v16qi __builtin_ia32_pcmpgtb128 (v16qi, v16qi)
-v8hi __builtin_ia32_pcmpgtw128 (v8hi, v8hi)
-v4si __builtin_ia32_pcmpgtd128 (v4si, v4si)
-v16qi __builtin_ia32_pmaxub128 (v16qi, v16qi)
-v8hi __builtin_ia32_pmaxsw128 (v8hi, v8hi)
-v16qi __builtin_ia32_pminub128 (v16qi, v16qi)
-v8hi __builtin_ia32_pminsw128 (v8hi, v8hi)
-v16qi __builtin_ia32_punpckhbw128 (v16qi, v16qi)
-v8hi __builtin_ia32_punpckhwd128 (v8hi, v8hi)
-v4si __builtin_ia32_punpckhdq128 (v4si, v4si)
-v2di __builtin_ia32_punpckhqdq128 (v2di, v2di)
-v16qi __builtin_ia32_punpcklbw128 (v16qi, v16qi)
-v8hi __builtin_ia32_punpcklwd128 (v8hi, v8hi)
-v4si __builtin_ia32_punpckldq128 (v4si, v4si)
-v2di __builtin_ia32_punpcklqdq128 (v2di, v2di)
-v16qi __builtin_ia32_packsswb128 (v8hi, v8hi)
-v8hi __builtin_ia32_packssdw128 (v4si, v4si)
-v16qi __builtin_ia32_packuswb128 (v8hi, v8hi)
-v8hi __builtin_ia32_pmulhuw128 (v8hi, v8hi)
-void __builtin_ia32_maskmovdqu (v16qi, v16qi)
-v2df __builtin_ia32_loadupd (double *)
-void __builtin_ia32_storeupd (double *, v2df)
-v2df __builtin_ia32_loadhpd (v2df, double const *)
-v2df __builtin_ia32_loadlpd (v2df, double const *)
-int __builtin_ia32_movmskpd (v2df)
-int __builtin_ia32_pmovmskb128 (v16qi)
-void __builtin_ia32_movnti (int *, int)
-void __builtin_ia32_movnti64 (long long int *, long long int)
-void __builtin_ia32_movntpd (double *, v2df)
-void __builtin_ia32_movntdq (v2df *, v2df)
-v4si __builtin_ia32_pshufd (v4si, int)
-v8hi __builtin_ia32_pshuflw (v8hi, int)
-v8hi __builtin_ia32_pshufhw (v8hi, int)
-v2di __builtin_ia32_psadbw128 (v16qi, v16qi)
-v2df __builtin_ia32_sqrtpd (v2df)
-v2df __builtin_ia32_sqrtsd (v2df)
-v2df __builtin_ia32_shufpd (v2df, v2df, int)
-v2df __builtin_ia32_cvtdq2pd (v4si)
-v4sf __builtin_ia32_cvtdq2ps (v4si)
-v4si __builtin_ia32_cvtpd2dq (v2df)
-v2si __builtin_ia32_cvtpd2pi (v2df)
-v4sf __builtin_ia32_cvtpd2ps (v2df)
-v4si __builtin_ia32_cvttpd2dq (v2df)
-v2si __builtin_ia32_cvttpd2pi (v2df)
-v2df __builtin_ia32_cvtpi2pd (v2si)
-int __builtin_ia32_cvtsd2si (v2df)
-int __builtin_ia32_cvttsd2si (v2df)
-long long __builtin_ia32_cvtsd2si64 (v2df)
-long long __builtin_ia32_cvttsd2si64 (v2df)
-v4si __builtin_ia32_cvtps2dq (v4sf)
-v2df __builtin_ia32_cvtps2pd (v4sf)
-v4si __builtin_ia32_cvttps2dq (v4sf)
-v2df __builtin_ia32_cvtsi2sd (v2df, int)
-v2df __builtin_ia32_cvtsi642sd (v2df, long long)
-v4sf __builtin_ia32_cvtsd2ss (v4sf, v2df)
-v2df __builtin_ia32_cvtss2sd (v2df, v4sf)
-void __builtin_ia32_clflush (const void *)
-void __builtin_ia32_lfence (void)
-void __builtin_ia32_mfence (void)
-v16qi __builtin_ia32_loaddqu (const char *)
-void __builtin_ia32_storedqu (char *, v16qi)
-v1di __builtin_ia32_pmuludq (v2si, v2si)
-v2di __builtin_ia32_pmuludq128 (v4si, v4si)
-v8hi __builtin_ia32_psllw128 (v8hi, v8hi)
-v4si __builtin_ia32_pslld128 (v4si, v4si)
-v2di __builtin_ia32_psllq128 (v2di, v2di)
-v8hi __builtin_ia32_psrlw128 (v8hi, v8hi)
-v4si __builtin_ia32_psrld128 (v4si, v4si)
-v2di __builtin_ia32_psrlq128 (v2di, v2di)
-v8hi __builtin_ia32_psraw128 (v8hi, v8hi)
-v4si __builtin_ia32_psrad128 (v4si, v4si)
-v2di __builtin_ia32_pslldqi128 (v2di, int)
-v8hi __builtin_ia32_psllwi128 (v8hi, int)
-v4si __builtin_ia32_pslldi128 (v4si, int)
-v2di __builtin_ia32_psllqi128 (v2di, int)
-v2di __builtin_ia32_psrldqi128 (v2di, int)
-v8hi __builtin_ia32_psrlwi128 (v8hi, int)
-v4si __builtin_ia32_psrldi128 (v4si, int)
-v2di __builtin_ia32_psrlqi128 (v2di, int)
-v8hi __builtin_ia32_psrawi128 (v8hi, int)
-v4si __builtin_ia32_psradi128 (v4si, int)
-v4si __builtin_ia32_pmaddwd128 (v8hi, v8hi)
-v2di __builtin_ia32_movq128 (v2di)
+float a, b;
+if (__builtin_mips_cabs_eq_s (a, b))
+  true ();
+else
+  false ();
 @end smallexample
 
-The following built-in functions are available when @option{-msse3} is used.
-All of them generate the machine instruction that is part of the name.
+@item int __builtin_mips_upper_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
+@itemx int __builtin_mips_lower_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
+Absolute comparison of two paired-single values (@code{cabs.@var{cond}.ps},
+@code{bc1t}/@code{bc1f}).
+
+These functions compare @var{a} and @var{b} using @code{cabs.@var{cond}.ps}
+and return either the upper or lower half of the result.  For example:
 
 @smallexample
-v2df __builtin_ia32_addsubpd (v2df, v2df)
-v4sf __builtin_ia32_addsubps (v4sf, v4sf)
-v2df __builtin_ia32_haddpd (v2df, v2df)
-v4sf __builtin_ia32_haddps (v4sf, v4sf)
-v2df __builtin_ia32_hsubpd (v2df, v2df)
-v4sf __builtin_ia32_hsubps (v4sf, v4sf)
-v16qi __builtin_ia32_lddqu (char const *)
-void __builtin_ia32_monitor (void *, unsigned int, unsigned int)
-v4sf __builtin_ia32_movshdup (v4sf)
-v4sf __builtin_ia32_movsldup (v4sf)
-void __builtin_ia32_mwait (unsigned int, unsigned int)
+v2sf a, b;
+if (__builtin_mips_upper_cabs_eq_ps (a, b))
+  upper_halves_are_equal ();
+else
+  upper_halves_are_unequal ();
+
+if (__builtin_mips_lower_cabs_eq_ps (a, b))
+  lower_halves_are_equal ();
+else
+  lower_halves_are_unequal ();
 @end smallexample
 
-The following built-in functions are available when @option{-mssse3} is used.
-All of them generate the machine instruction that is part of the name.
+@item v2sf __builtin_mips_movt_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
+@itemx v2sf __builtin_mips_movf_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
+Conditional move based on absolute comparison (@code{cabs.@var{cond}.ps},
+@code{movt.ps}/@code{movf.ps}).
+
+The @code{movt} functions return the value @var{x} computed by:
 
 @smallexample
-v2si __builtin_ia32_phaddd (v2si, v2si)
-v4hi __builtin_ia32_phaddw (v4hi, v4hi)
-v4hi __builtin_ia32_phaddsw (v4hi, v4hi)
-v2si __builtin_ia32_phsubd (v2si, v2si)
-v4hi __builtin_ia32_phsubw (v4hi, v4hi)
-v4hi __builtin_ia32_phsubsw (v4hi, v4hi)
-v4hi __builtin_ia32_pmaddubsw (v8qi, v8qi)
-v4hi __builtin_ia32_pmulhrsw (v4hi, v4hi)
-v8qi __builtin_ia32_pshufb (v8qi, v8qi)
-v8qi __builtin_ia32_psignb (v8qi, v8qi)
-v2si __builtin_ia32_psignd (v2si, v2si)
-v4hi __builtin_ia32_psignw (v4hi, v4hi)
-v1di __builtin_ia32_palignr (v1di, v1di, int)
-v8qi __builtin_ia32_pabsb (v8qi)
-v2si __builtin_ia32_pabsd (v2si)
-v4hi __builtin_ia32_pabsw (v4hi)
+cabs.@var{cond}.ps @var{cc},@var{a},@var{b}
+mov.ps @var{x},@var{c}
+movt.ps @var{x},@var{d},@var{cc}
 @end smallexample
 
-The following built-in functions are available when @option{-mssse3} is used.
-All of them generate the machine instruction that is part of the name.
+The @code{movf} functions are similar but use @code{movf.ps} instead
+of @code{movt.ps}.
+
+@item int __builtin_mips_any_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
+@itemx int __builtin_mips_all_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
+@itemx int __builtin_mips_any_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
+@itemx int __builtin_mips_all_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
+Comparison of two paired-single values
+(@code{c.@var{cond}.ps}/@code{cabs.@var{cond}.ps},
+@code{bc1any2t}/@code{bc1any2f}).
+
+These functions compare @var{a} and @var{b} using @code{c.@var{cond}.ps}
+or @code{cabs.@var{cond}.ps}.  The @code{any} forms return true if either
+result is true and the @code{all} forms return true if both results are true.
+For example:
 
 @smallexample
-v4si __builtin_ia32_phaddd128 (v4si, v4si)
-v8hi __builtin_ia32_phaddw128 (v8hi, v8hi)
-v8hi __builtin_ia32_phaddsw128 (v8hi, v8hi)
-v4si __builtin_ia32_phsubd128 (v4si, v4si)
-v8hi __builtin_ia32_phsubw128 (v8hi, v8hi)
-v8hi __builtin_ia32_phsubsw128 (v8hi, v8hi)
-v8hi __builtin_ia32_pmaddubsw128 (v16qi, v16qi)
-v8hi __builtin_ia32_pmulhrsw128 (v8hi, v8hi)
-v16qi __builtin_ia32_pshufb128 (v16qi, v16qi)
-v16qi __builtin_ia32_psignb128 (v16qi, v16qi)
-v4si __builtin_ia32_psignd128 (v4si, v4si)
-v8hi __builtin_ia32_psignw128 (v8hi, v8hi)
-v2di __builtin_ia32_palignr128 (v2di, v2di, int)
-v16qi __builtin_ia32_pabsb128 (v16qi)
-v4si __builtin_ia32_pabsd128 (v4si)
-v8hi __builtin_ia32_pabsw128 (v8hi)
+v2sf a, b;
+if (__builtin_mips_any_c_eq_ps (a, b))
+  one_is_true ();
+else
+  both_are_false ();
+
+if (__builtin_mips_all_c_eq_ps (a, b))
+  both_are_true ();
+else
+  one_is_false ();
 @end smallexample
 
-The following built-in functions are available when @option{-msse4.1} is
-used.  All of them generate the machine instruction that is part of the
-name.
+@item int __builtin_mips_any_c_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
+@itemx int __builtin_mips_all_c_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
+@itemx int __builtin_mips_any_cabs_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
+@itemx int __builtin_mips_all_cabs_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
+Comparison of four paired-single values
+(@code{c.@var{cond}.ps}/@code{cabs.@var{cond}.ps},
+@code{bc1any4t}/@code{bc1any4f}).
+
+These functions use @code{c.@var{cond}.ps} or @code{cabs.@var{cond}.ps}
+to compare @var{a} with @var{b} and to compare @var{c} with @var{d}.
+The @code{any} forms return true if any of the four results are true
+and the @code{all} forms return true if all four results are true.
+For example:
 
 @smallexample
-v2df __builtin_ia32_blendpd (v2df, v2df, const int)
-v4sf __builtin_ia32_blendps (v4sf, v4sf, const int)
-v2df __builtin_ia32_blendvpd (v2df, v2df, v2df)
-v4sf __builtin_ia32_blendvps (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_dppd (v2df, v2df, const int)
-v4sf __builtin_ia32_dpps (v4sf, v4sf, const int)
-v4sf __builtin_ia32_insertps128 (v4sf, v4sf, const int)
-v2di __builtin_ia32_movntdqa (v2di *);
-v16qi __builtin_ia32_mpsadbw128 (v16qi, v16qi, const int)
-v8hi __builtin_ia32_packusdw128 (v4si, v4si)
-v16qi __builtin_ia32_pblendvb128 (v16qi, v16qi, v16qi)
-v8hi __builtin_ia32_pblendw128 (v8hi, v8hi, const int)
-v2di __builtin_ia32_pcmpeqq (v2di, v2di)
-v8hi __builtin_ia32_phminposuw128 (v8hi)
-v16qi __builtin_ia32_pmaxsb128 (v16qi, v16qi)
-v4si __builtin_ia32_pmaxsd128 (v4si, v4si)
-v4si __builtin_ia32_pmaxud128 (v4si, v4si)
-v8hi __builtin_ia32_pmaxuw128 (v8hi, v8hi)
-v16qi __builtin_ia32_pminsb128 (v16qi, v16qi)
-v4si __builtin_ia32_pminsd128 (v4si, v4si)
-v4si __builtin_ia32_pminud128 (v4si, v4si)
-v8hi __builtin_ia32_pminuw128 (v8hi, v8hi)
-v4si __builtin_ia32_pmovsxbd128 (v16qi)
-v2di __builtin_ia32_pmovsxbq128 (v16qi)
-v8hi __builtin_ia32_pmovsxbw128 (v16qi)
-v2di __builtin_ia32_pmovsxdq128 (v4si)
-v4si __builtin_ia32_pmovsxwd128 (v8hi)
-v2di __builtin_ia32_pmovsxwq128 (v8hi)
-v4si __builtin_ia32_pmovzxbd128 (v16qi)
-v2di __builtin_ia32_pmovzxbq128 (v16qi)
-v8hi __builtin_ia32_pmovzxbw128 (v16qi)
-v2di __builtin_ia32_pmovzxdq128 (v4si)
-v4si __builtin_ia32_pmovzxwd128 (v8hi)
-v2di __builtin_ia32_pmovzxwq128 (v8hi)
-v2di __builtin_ia32_pmuldq128 (v4si, v4si)
-v4si __builtin_ia32_pmulld128 (v4si, v4si)
-int __builtin_ia32_ptestc128 (v2di, v2di)
-int __builtin_ia32_ptestnzc128 (v2di, v2di)
-int __builtin_ia32_ptestz128 (v2di, v2di)
-v2df __builtin_ia32_roundpd (v2df, const int)
-v4sf __builtin_ia32_roundps (v4sf, const int)
-v2df __builtin_ia32_roundsd (v2df, v2df, const int)
-v4sf __builtin_ia32_roundss (v4sf, v4sf, const int)
+v2sf a, b, c, d;
+if (__builtin_mips_any_c_eq_4s (a, b, c, d))
+  some_are_true ();
+else
+  all_are_false ();
+
+if (__builtin_mips_all_c_eq_4s (a, b, c, d))
+  all_are_true ();
+else
+  some_are_false ();
 @end smallexample
+@end table
 
-The following built-in functions are available when @option{-msse4.1} is
-used.
+@node Other MIPS Built-in Functions
+@subsection Other MIPS Built-in Functions
+
+GCC provides other MIPS-specific built-in functions:
 
 @table @code
-@item v4sf __builtin_ia32_vec_set_v4sf (v4sf, float, const int)
-Generates the @code{insertps} machine instruction.
-@item int __builtin_ia32_vec_ext_v16qi (v16qi, const int)
-Generates the @code{pextrb} machine instruction.
-@item v16qi __builtin_ia32_vec_set_v16qi (v16qi, int, const int)
-Generates the @code{pinsrb} machine instruction.
-@item v4si __builtin_ia32_vec_set_v4si (v4si, int, const int)
-Generates the @code{pinsrd} machine instruction.
-@item v2di __builtin_ia32_vec_set_v2di (v2di, long long, const int)
-Generates the @code{pinsrq} machine instruction in 64bit mode.
+@item void __builtin_mips_cache (int @var{op}, const volatile void *@var{addr})
+Insert a @samp{cache} instruction with operands @var{op} and @var{addr}.
+GCC defines the preprocessor macro @code{__GCC_HAVE_BUILTIN_MIPS_CACHE}
+when this function is available.
+
+@item unsigned int __builtin_mips_get_fcsr (void)
+@itemx void __builtin_mips_set_fcsr (unsigned int @var{value})
+Get and set the contents of the floating-point control and status register
+(FPU control register 31).  These functions are only available in hard-float
+code but can be called in both MIPS16 and non-MIPS16 contexts.
+
+@code{__builtin_mips_set_fcsr} can be used to change any bit of the
+register except the condition codes, which GCC assumes are preserved.
 @end table
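+
+As a minimal sketch of the FCSR functions above, the function below
+updates only the rounding-mode field of the register.  It assumes
+that this field occupies the two low-order bits of the FCSR; check
+the architecture specification before relying on that layout.  The
+function name @code{set_rounding_mode} is hypothetical.
+
+@smallexample
+void
+set_rounding_mode (unsigned int mode)
+@{
+  unsigned int fcsr = __builtin_mips_get_fcsr ();
+
+  /* Assumed layout: bits 1:0 hold the rounding mode.  Every other
+     bit, including the condition codes, is written back unchanged.  */
+  __builtin_mips_set_fcsr ((fcsr & ~3u) | (mode & 3u));
+@}
+@end smallexample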
 
-The following built-in functions are changed to generate new SSE4.1
-instructions when @option{-msse4.1} is used.
+@node MSP430 Built-in Functions
+@subsection MSP430 Built-in Functions
+
+GCC provides a few special built-in functions to aid in writing
+interrupt handlers in C.
 
 @table @code
-@item float __builtin_ia32_vec_ext_v4sf (v4sf, const int)
-Generates the @code{extractps} machine instruction.
-@item int __builtin_ia32_vec_ext_v4si (v4si, const int)
-Generates the @code{pextrd} machine instruction.
-@item long long __builtin_ia32_vec_ext_v2di (v2di, const int)
-Generates the @code{pextrq} machine instruction in 64bit mode.
+@item __bic_SR_register_on_exit (int @var{mask})
+This clears the indicated bits in the saved copy of the status register
+currently residing on the stack.  This only works inside interrupt
+handlers and the changes to the status register will only take effect
+once the handler returns.
+
+@item __bis_SR_register_on_exit (int @var{mask})
+This sets the indicated bits in the saved copy of the status register
+currently residing on the stack.  This only works inside interrupt
+handlers and the changes to the status register will only take effect
+once the handler returns.
+
+@item __delay_cycles (long long @var{cycles})
+This inserts an instruction sequence that takes exactly @var{cycles}
+cycles (between 0 and about 17E9) to complete.  The inserted sequence
+may use jumps, loops, or no-ops, and does not interfere with any other
+instructions.  Note that @var{cycles} must be a compile-time constant
+integer; that is, you must pass a number, not a variable that may be
+optimized to a constant later.  The number of cycles delayed by this
+built-in function is exact.
 @end table
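+
+As a hedged sketch of how these functions might be combined, the
+handler below wakes the main program from a low-power mode.  The
+handler name, the use of the @code{interrupt} attribute without an
+argument, and the assumption that bit @code{0x0010} of the status
+register is the CPU-off bit are illustrative; consult the device
+documentation for the actual bit layout.
+
+@smallexample
+/* Assumed: the CPU-off bit is bit 4 of the status register.  */
+#define CPUOFF_BIT 0x0010
+
+void __attribute__ ((interrupt))
+wakeup_handler (void)
+@{
+  /* Busy-wait for a fixed, compile-time constant number of cycles.  */
+  __delay_cycles (100);
+
+  /* Clear the bit in the saved status register; the change takes
+     effect when the handler returns.  */
+  __bic_SR_register_on_exit (CPUOFF_BIT);
+@}
+@end smallexample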
 
-The following built-in functions are available when @option{-msse4.2} is
-used.  All of them generate the machine instruction that is part of the
-name.
+@node NDS32 Built-in Functions
+@subsection NDS32 Built-in Functions
 
-@smallexample
-v16qi __builtin_ia32_pcmpestrm128 (v16qi, int, v16qi, int, const int)
-int __builtin_ia32_pcmpestri128 (v16qi, int, v16qi, int, const int)
-int __builtin_ia32_pcmpestria128 (v16qi, int, v16qi, int, const int)
-int __builtin_ia32_pcmpestric128 (v16qi, int, v16qi, int, const int)
-int __builtin_ia32_pcmpestrio128 (v16qi, int, v16qi, int, const int)
-int __builtin_ia32_pcmpestris128 (v16qi, int, v16qi, int, const int)
-int __builtin_ia32_pcmpestriz128 (v16qi, int, v16qi, int, const int)
-v16qi __builtin_ia32_pcmpistrm128 (v16qi, v16qi, const int)
-int __builtin_ia32_pcmpistri128 (v16qi, v16qi, const int)
-int __builtin_ia32_pcmpistria128 (v16qi, v16qi, const int)
-int __builtin_ia32_pcmpistric128 (v16qi, v16qi, const int)
-int __builtin_ia32_pcmpistrio128 (v16qi, v16qi, const int)
-int __builtin_ia32_pcmpistris128 (v16qi, v16qi, const int)
-int __builtin_ia32_pcmpistriz128 (v16qi, v16qi, const int)
-v2di __builtin_ia32_pcmpgtq (v2di, v2di)
-@end smallexample
+These built-in functions are available for the NDS32 target:
 
-The following built-in functions are available when @option{-msse4.2} is
-used.
+@deftypefn {Built-in Function} void __builtin_nds32_isync (int *@var{addr})
+Insert an ISYNC instruction into the instruction stream where
+@var{addr} is an instruction address for serialization.
+@end deftypefn
 
-@table @code
-@item unsigned int __builtin_ia32_crc32qi (unsigned int, unsigned char)
-Generates the @code{crc32b} machine instruction.
-@item unsigned int __builtin_ia32_crc32hi (unsigned int, unsigned short)
-Generates the @code{crc32w} machine instruction.
-@item unsigned int __builtin_ia32_crc32si (unsigned int, unsigned int)
-Generates the @code{crc32l} machine instruction.
-@item unsigned long long __builtin_ia32_crc32di (unsigned long long, unsigned long long)
-Generates the @code{crc32q} machine instruction.
-@end table
+@deftypefn {Built-in Function} void __builtin_nds32_isb (void)
+Insert an ISB instruction into the instruction stream.
+@end deftypefn
 
-The following built-in functions are changed to generate new SSE4.2
-instructions when @option{-msse4.2} is used.
+@deftypefn {Built-in Function} int __builtin_nds32_mfsr (int @var{sr})
+Return the content of a system register which is mapped by @var{sr}.
+@end deftypefn
+
+@deftypefn {Built-in Function} int __builtin_nds32_mfusr (int @var{usr})
+Return the content of a user space register which is mapped by @var{usr}.
+@end deftypefn
+
+@deftypefn {Built-in Function} void __builtin_nds32_mtsr (int @var{value}, int @var{sr})
+Move the @var{value} to a system register which is mapped by @var{sr}.
+@end deftypefn
+
+@deftypefn {Built-in Function} void __builtin_nds32_mtusr (int @var{value}, int @var{usr})
+Move the @var{value} to a user space register which is mapped by @var{usr}.
+@end deftypefn
+
+@deftypefn {Built-in Function} void __builtin_nds32_setgie_en (void)
+Enable global interrupt.
+@end deftypefn
+
+@deftypefn {Built-in Function} void __builtin_nds32_setgie_dis (void)
+Disable global interrupt.
+@end deftypefn
+
+@node picoChip Built-in Functions
+@subsection picoChip Built-in Functions
+
+GCC provides an interface to selected machine instructions from the
+picoChip instruction set.
 
 @table @code
-@item int __builtin_popcount (unsigned int)
-Generates the @code{popcntl} machine instruction.
-@item int __builtin_popcountl (unsigned long)
-Generates the @code{popcntl} or @code{popcntq} machine instruction,
-depending on the size of @code{unsigned long}.
-@item int __builtin_popcountll (unsigned long long)
-Generates the @code{popcntq} machine instruction.
+@item int __builtin_sbc (int @var{value})
+Sign bit count.  Return the number of consecutive bits in @var{value}
+that have the same value as the sign bit.  The result is the number of
+leading sign bits minus one, giving the number of redundant sign bits in
+@var{value}.
+
+@item int __builtin_byteswap (int @var{value})
+Byte swap.  Return the result of swapping the upper and lower bytes of
+@var{value}.
+
+@item int __builtin_brev (int @var{value})
+Bit reversal.  Return the result of reversing the bits in
+@var{value}.  Bit 15 is swapped with bit 0, bit 14 is swapped with bit 1,
+and so on.
+
+@item int __builtin_adds (int @var{x}, int @var{y})
+Saturating addition.  Return the result of adding @var{x} and @var{y},
+storing the value 32767 if the result overflows.
+
+@item int __builtin_subs (int @var{x}, int @var{y})
+Saturating subtraction.  Return the result of subtracting @var{y} from
+@var{x}, storing the value @minus{}32768 if the result overflows.
+
+@item void __builtin_halt (void)
+Halt.  The processor stops execution.  This built-in is useful for
+implementing assertions.
+
 @end table
 
-The following built-in functions are available when @option{-mavx} is
-used. All of them generate the machine instruction that is part of the
-name.
+@node PowerPC Built-in Functions
+@subsection PowerPC Built-in Functions
 
+These built-in functions are available for the PowerPC family of
+processors:
 @smallexample
-v4df __builtin_ia32_addpd256 (v4df,v4df)
-v8sf __builtin_ia32_addps256 (v8sf,v8sf)
-v4df __builtin_ia32_addsubpd256 (v4df,v4df)
-v8sf __builtin_ia32_addsubps256 (v8sf,v8sf)
-v4df __builtin_ia32_andnpd256 (v4df,v4df)
-v8sf __builtin_ia32_andnps256 (v8sf,v8sf)
-v4df __builtin_ia32_andpd256 (v4df,v4df)
-v8sf __builtin_ia32_andps256 (v8sf,v8sf)
-v4df __builtin_ia32_blendpd256 (v4df,v4df,int)
-v8sf __builtin_ia32_blendps256 (v8sf,v8sf,int)
-v4df __builtin_ia32_blendvpd256 (v4df,v4df,v4df)
-v8sf __builtin_ia32_blendvps256 (v8sf,v8sf,v8sf)
-v2df __builtin_ia32_cmppd (v2df,v2df,int)
-v4df __builtin_ia32_cmppd256 (v4df,v4df,int)
-v4sf __builtin_ia32_cmpps (v4sf,v4sf,int)
-v8sf __builtin_ia32_cmpps256 (v8sf,v8sf,int)
-v2df __builtin_ia32_cmpsd (v2df,v2df,int)
-v4sf __builtin_ia32_cmpss (v4sf,v4sf,int)
-v4df __builtin_ia32_cvtdq2pd256 (v4si)
-v8sf __builtin_ia32_cvtdq2ps256 (v8si)
-v4si __builtin_ia32_cvtpd2dq256 (v4df)
-v4sf __builtin_ia32_cvtpd2ps256 (v4df)
-v8si __builtin_ia32_cvtps2dq256 (v8sf)
-v4df __builtin_ia32_cvtps2pd256 (v4sf)
-v4si __builtin_ia32_cvttpd2dq256 (v4df)
-v8si __builtin_ia32_cvttps2dq256 (v8sf)
-v4df __builtin_ia32_divpd256 (v4df,v4df)
-v8sf __builtin_ia32_divps256 (v8sf,v8sf)
-v8sf __builtin_ia32_dpps256 (v8sf,v8sf,int)
-v4df __builtin_ia32_haddpd256 (v4df,v4df)
-v8sf __builtin_ia32_haddps256 (v8sf,v8sf)
-v4df __builtin_ia32_hsubpd256 (v4df,v4df)
-v8sf __builtin_ia32_hsubps256 (v8sf,v8sf)
-v32qi __builtin_ia32_lddqu256 (pcchar)
-v32qi __builtin_ia32_loaddqu256 (pcchar)
-v4df __builtin_ia32_loadupd256 (pcdouble)
-v8sf __builtin_ia32_loadups256 (pcfloat)
-v2df __builtin_ia32_maskloadpd (pcv2df,v2df)
-v4df __builtin_ia32_maskloadpd256 (pcv4df,v4df)
-v4sf __builtin_ia32_maskloadps (pcv4sf,v4sf)
-v8sf __builtin_ia32_maskloadps256 (pcv8sf,v8sf)
-void __builtin_ia32_maskstorepd (pv2df,v2df,v2df)
-void __builtin_ia32_maskstorepd256 (pv4df,v4df,v4df)
-void __builtin_ia32_maskstoreps (pv4sf,v4sf,v4sf)
-void __builtin_ia32_maskstoreps256 (pv8sf,v8sf,v8sf)
-v4df __builtin_ia32_maxpd256 (v4df,v4df)
-v8sf __builtin_ia32_maxps256 (v8sf,v8sf)
-v4df __builtin_ia32_minpd256 (v4df,v4df)
-v8sf __builtin_ia32_minps256 (v8sf,v8sf)
-v4df __builtin_ia32_movddup256 (v4df)
-int __builtin_ia32_movmskpd256 (v4df)
-int __builtin_ia32_movmskps256 (v8sf)
-v8sf __builtin_ia32_movshdup256 (v8sf)
-v8sf __builtin_ia32_movsldup256 (v8sf)
-v4df __builtin_ia32_mulpd256 (v4df,v4df)
-v8sf __builtin_ia32_mulps256 (v8sf,v8sf)
-v4df __builtin_ia32_orpd256 (v4df,v4df)
-v8sf __builtin_ia32_orps256 (v8sf,v8sf)
-v2df __builtin_ia32_pd_pd256 (v4df)
-v4df __builtin_ia32_pd256_pd (v2df)
-v4sf __builtin_ia32_ps_ps256 (v8sf)
-v8sf __builtin_ia32_ps256_ps (v4sf)
-int __builtin_ia32_ptestc256 (v4di,v4di,ptest)
-int __builtin_ia32_ptestnzc256 (v4di,v4di,ptest)
-int __builtin_ia32_ptestz256 (v4di,v4di,ptest)
-v8sf __builtin_ia32_rcpps256 (v8sf)
-v4df __builtin_ia32_roundpd256 (v4df,int)
-v8sf __builtin_ia32_roundps256 (v8sf,int)
-v8sf __builtin_ia32_rsqrtps_nr256 (v8sf)
-v8sf __builtin_ia32_rsqrtps256 (v8sf)
-v4df __builtin_ia32_shufpd256 (v4df,v4df,int)
-v8sf __builtin_ia32_shufps256 (v8sf,v8sf,int)
-v4si __builtin_ia32_si_si256 (v8si)
-v8si __builtin_ia32_si256_si (v4si)
-v4df __builtin_ia32_sqrtpd256 (v4df)
-v8sf __builtin_ia32_sqrtps_nr256 (v8sf)
-v8sf __builtin_ia32_sqrtps256 (v8sf)
-void __builtin_ia32_storedqu256 (pchar,v32qi)
-void __builtin_ia32_storeupd256 (pdouble,v4df)
-void __builtin_ia32_storeups256 (pfloat,v8sf)
-v4df __builtin_ia32_subpd256 (v4df,v4df)
-v8sf __builtin_ia32_subps256 (v8sf,v8sf)
-v4df __builtin_ia32_unpckhpd256 (v4df,v4df)
-v8sf __builtin_ia32_unpckhps256 (v8sf,v8sf)
-v4df __builtin_ia32_unpcklpd256 (v4df,v4df)
-v8sf __builtin_ia32_unpcklps256 (v8sf,v8sf)
-v4df __builtin_ia32_vbroadcastf128_pd256 (pcv2df)
-v8sf __builtin_ia32_vbroadcastf128_ps256 (pcv4sf)
-v4df __builtin_ia32_vbroadcastsd256 (pcdouble)
-v4sf __builtin_ia32_vbroadcastss (pcfloat)
-v8sf __builtin_ia32_vbroadcastss256 (pcfloat)
-v2df __builtin_ia32_vextractf128_pd256 (v4df,int)
-v4sf __builtin_ia32_vextractf128_ps256 (v8sf,int)
-v4si __builtin_ia32_vextractf128_si256 (v8si,int)
-v4df __builtin_ia32_vinsertf128_pd256 (v4df,v2df,int)
-v8sf __builtin_ia32_vinsertf128_ps256 (v8sf,v4sf,int)
-v8si __builtin_ia32_vinsertf128_si256 (v8si,v4si,int)
-v4df __builtin_ia32_vperm2f128_pd256 (v4df,v4df,int)
-v8sf __builtin_ia32_vperm2f128_ps256 (v8sf,v8sf,int)
-v8si __builtin_ia32_vperm2f128_si256 (v8si,v8si,int)
-v2df __builtin_ia32_vpermil2pd (v2df,v2df,v2di,int)
-v4df __builtin_ia32_vpermil2pd256 (v4df,v4df,v4di,int)
-v4sf __builtin_ia32_vpermil2ps (v4sf,v4sf,v4si,int)
-v8sf __builtin_ia32_vpermil2ps256 (v8sf,v8sf,v8si,int)
-v2df __builtin_ia32_vpermilpd (v2df,int)
-v4df __builtin_ia32_vpermilpd256 (v4df,int)
-v4sf __builtin_ia32_vpermilps (v4sf,int)
-v8sf __builtin_ia32_vpermilps256 (v8sf,int)
-v2df __builtin_ia32_vpermilvarpd (v2df,v2di)
-v4df __builtin_ia32_vpermilvarpd256 (v4df,v4di)
-v4sf __builtin_ia32_vpermilvarps (v4sf,v4si)
-v8sf __builtin_ia32_vpermilvarps256 (v8sf,v8si)
-int __builtin_ia32_vtestcpd (v2df,v2df,ptest)
-int __builtin_ia32_vtestcpd256 (v4df,v4df,ptest)
-int __builtin_ia32_vtestcps (v4sf,v4sf,ptest)
-int __builtin_ia32_vtestcps256 (v8sf,v8sf,ptest)
-int __builtin_ia32_vtestnzcpd (v2df,v2df,ptest)
-int __builtin_ia32_vtestnzcpd256 (v4df,v4df,ptest)
-int __builtin_ia32_vtestnzcps (v4sf,v4sf,ptest)
-int __builtin_ia32_vtestnzcps256 (v8sf,v8sf,ptest)
-int __builtin_ia32_vtestzpd (v2df,v2df,ptest)
-int __builtin_ia32_vtestzpd256 (v4df,v4df,ptest)
-int __builtin_ia32_vtestzps (v4sf,v4sf,ptest)
-int __builtin_ia32_vtestzps256 (v8sf,v8sf,ptest)
-void __builtin_ia32_vzeroall (void)
-void __builtin_ia32_vzeroupper (void)
-v4df __builtin_ia32_xorpd256 (v4df,v4df)
-v8sf __builtin_ia32_xorps256 (v8sf,v8sf)
-@end smallexample
-
-The following built-in functions are available when @option{-mavx2} is
-used. All of them generate the machine instruction that is part of the
-name.
-
-@smallexample
-v32qi __builtin_ia32_mpsadbw256 (v32qi,v32qi,int)
-v32qi __builtin_ia32_pabsb256 (v32qi)
-v16hi __builtin_ia32_pabsw256 (v16hi)
-v8si __builtin_ia32_pabsd256 (v8si)
-v16hi __builtin_ia32_packssdw256 (v8si,v8si)
-v32qi __builtin_ia32_packsswb256 (v16hi,v16hi)
-v16hi __builtin_ia32_packusdw256 (v8si,v8si)
-v32qi __builtin_ia32_packuswb256 (v16hi,v16hi)
-v32qi __builtin_ia32_paddb256 (v32qi,v32qi)
-v16hi __builtin_ia32_paddw256 (v16hi,v16hi)
-v8si __builtin_ia32_paddd256 (v8si,v8si)
-v4di __builtin_ia32_paddq256 (v4di,v4di)
-v32qi __builtin_ia32_paddsb256 (v32qi,v32qi)
-v16hi __builtin_ia32_paddsw256 (v16hi,v16hi)
-v32qi __builtin_ia32_paddusb256 (v32qi,v32qi)
-v16hi __builtin_ia32_paddusw256 (v16hi,v16hi)
-v4di __builtin_ia32_palignr256 (v4di,v4di,int)
-v4di __builtin_ia32_andsi256 (v4di,v4di)
-v4di __builtin_ia32_andnotsi256 (v4di,v4di)
-v32qi __builtin_ia32_pavgb256 (v32qi,v32qi)
-v16hi __builtin_ia32_pavgw256 (v16hi,v16hi)
-v32qi __builtin_ia32_pblendvb256 (v32qi,v32qi,v32qi)
-v16hi __builtin_ia32_pblendw256 (v16hi,v16hi,int)
-v32qi __builtin_ia32_pcmpeqb256 (v32qi,v32qi)
-v16hi __builtin_ia32_pcmpeqw256 (v16hi,v16hi)
-v8si __builtin_ia32_pcmpeqd256 (c8si,v8si)
-v4di __builtin_ia32_pcmpeqq256 (v4di,v4di)
-v32qi __builtin_ia32_pcmpgtb256 (v32qi,v32qi)
-v16hi __builtin_ia32_pcmpgtw256 (16hi,v16hi)
-v8si __builtin_ia32_pcmpgtd256 (v8si,v8si)
-v4di __builtin_ia32_pcmpgtq256 (v4di,v4di)
-v16hi __builtin_ia32_phaddw256 (v16hi,v16hi)
-v8si __builtin_ia32_phaddd256 (v8si,v8si)
-v16hi __builtin_ia32_phaddsw256 (v16hi,v16hi)
-v16hi __builtin_ia32_phsubw256 (v16hi,v16hi)
-v8si __builtin_ia32_phsubd256 (v8si,v8si)
-v16hi __builtin_ia32_phsubsw256 (v16hi,v16hi)
-v32qi __builtin_ia32_pmaddubsw256 (v32qi,v32qi)
-v16hi __builtin_ia32_pmaddwd256 (v16hi,v16hi)
-v32qi __builtin_ia32_pmaxsb256 (v32qi,v32qi)
-v16hi __builtin_ia32_pmaxsw256 (v16hi,v16hi)
-v8si __builtin_ia32_pmaxsd256 (v8si,v8si)
-v32qi __builtin_ia32_pmaxub256 (v32qi,v32qi)
-v16hi __builtin_ia32_pmaxuw256 (v16hi,v16hi)
-v8si __builtin_ia32_pmaxud256 (v8si,v8si)
-v32qi __builtin_ia32_pminsb256 (v32qi,v32qi)
-v16hi __builtin_ia32_pminsw256 (v16hi,v16hi)
-v8si __builtin_ia32_pminsd256 (v8si,v8si)
-v32qi __builtin_ia32_pminub256 (v32qi,v32qi)
-v16hi __builtin_ia32_pminuw256 (v16hi,v16hi)
-v8si __builtin_ia32_pminud256 (v8si,v8si)
-int __builtin_ia32_pmovmskb256 (v32qi)
-v16hi __builtin_ia32_pmovsxbw256 (v16qi)
-v8si __builtin_ia32_pmovsxbd256 (v16qi)
-v4di __builtin_ia32_pmovsxbq256 (v16qi)
-v8si __builtin_ia32_pmovsxwd256 (v8hi)
-v4di __builtin_ia32_pmovsxwq256 (v8hi)
-v4di __builtin_ia32_pmovsxdq256 (v4si)
-v16hi __builtin_ia32_pmovzxbw256 (v16qi)
-v8si __builtin_ia32_pmovzxbd256 (v16qi)
-v4di __builtin_ia32_pmovzxbq256 (v16qi)
-v8si __builtin_ia32_pmovzxwd256 (v8hi)
-v4di __builtin_ia32_pmovzxwq256 (v8hi)
-v4di __builtin_ia32_pmovzxdq256 (v4si)
-v4di __builtin_ia32_pmuldq256 (v8si,v8si)
-v16hi __builtin_ia32_pmulhrsw256 (v16hi, v16hi)
-v16hi __builtin_ia32_pmulhuw256 (v16hi,v16hi)
-v16hi __builtin_ia32_pmulhw256 (v16hi,v16hi)
-v16hi __builtin_ia32_pmullw256 (v16hi,v16hi)
-v8si __builtin_ia32_pmulld256 (v8si,v8si)
-v4di __builtin_ia32_pmuludq256 (v8si,v8si)
-v4di __builtin_ia32_por256 (v4di,v4di)
-v16hi __builtin_ia32_psadbw256 (v32qi,v32qi)
-v32qi __builtin_ia32_pshufb256 (v32qi,v32qi)
-v8si __builtin_ia32_pshufd256 (v8si,int)
-v16hi __builtin_ia32_pshufhw256 (v16hi,int)
-v16hi __builtin_ia32_pshuflw256 (v16hi,int)
-v32qi __builtin_ia32_psignb256 (v32qi,v32qi)
-v16hi __builtin_ia32_psignw256 (v16hi,v16hi)
-v8si __builtin_ia32_psignd256 (v8si,v8si)
-v4di __builtin_ia32_pslldqi256 (v4di,int)
-v16hi __builtin_ia32_psllwi256 (16hi,int)
-v16hi __builtin_ia32_psllw256(v16hi,v8hi)
-v8si __builtin_ia32_pslldi256 (v8si,int)
-v8si __builtin_ia32_pslld256(v8si,v4si)
-v4di __builtin_ia32_psllqi256 (v4di,int)
-v4di __builtin_ia32_psllq256(v4di,v2di)
-v16hi __builtin_ia32_psrawi256 (v16hi,int)
-v16hi __builtin_ia32_psraw256 (v16hi,v8hi)
-v8si __builtin_ia32_psradi256 (v8si,int)
-v8si __builtin_ia32_psrad256 (v8si,v4si)
-v4di __builtin_ia32_psrldqi256 (v4di, int)
-v16hi __builtin_ia32_psrlwi256 (v16hi,int)
-v16hi __builtin_ia32_psrlw256 (v16hi,v8hi)
-v8si __builtin_ia32_psrldi256 (v8si,int)
-v8si __builtin_ia32_psrld256 (v8si,v4si)
-v4di __builtin_ia32_psrlqi256 (v4di,int)
-v4di __builtin_ia32_psrlq256(v4di,v2di)
-v32qi __builtin_ia32_psubb256 (v32qi,v32qi)
-v32hi __builtin_ia32_psubw256 (v16hi,v16hi)
-v8si __builtin_ia32_psubd256 (v8si,v8si)
-v4di __builtin_ia32_psubq256 (v4di,v4di)
-v32qi __builtin_ia32_psubsb256 (v32qi,v32qi)
-v16hi __builtin_ia32_psubsw256 (v16hi,v16hi)
-v32qi __builtin_ia32_psubusb256 (v32qi,v32qi)
-v16hi __builtin_ia32_psubusw256 (v16hi,v16hi)
-v32qi __builtin_ia32_punpckhbw256 (v32qi,v32qi)
-v16hi __builtin_ia32_punpckhwd256 (v16hi,v16hi)
-v8si __builtin_ia32_punpckhdq256 (v8si,v8si)
-v4di __builtin_ia32_punpckhqdq256 (v4di,v4di)
-v32qi __builtin_ia32_punpcklbw256 (v32qi,v32qi)
-v16hi __builtin_ia32_punpcklwd256 (v16hi,v16hi)
-v8si __builtin_ia32_punpckldq256 (v8si,v8si)
-v4di __builtin_ia32_punpcklqdq256 (v4di,v4di)
-v4di __builtin_ia32_pxor256 (v4di,v4di)
-v4di __builtin_ia32_movntdqa256 (pv4di)
-v4sf __builtin_ia32_vbroadcastss_ps (v4sf)
-v8sf __builtin_ia32_vbroadcastss_ps256 (v4sf)
-v4df __builtin_ia32_vbroadcastsd_pd256 (v2df)
-v4di __builtin_ia32_vbroadcastsi256 (v2di)
-v4si __builtin_ia32_pblendd128 (v4si,v4si)
-v8si __builtin_ia32_pblendd256 (v8si,v8si)
-v32qi __builtin_ia32_pbroadcastb256 (v16qi)
-v16hi __builtin_ia32_pbroadcastw256 (v8hi)
-v8si __builtin_ia32_pbroadcastd256 (v4si)
-v4di __builtin_ia32_pbroadcastq256 (v2di)
-v16qi __builtin_ia32_pbroadcastb128 (v16qi)
-v8hi __builtin_ia32_pbroadcastw128 (v8hi)
-v4si __builtin_ia32_pbroadcastd128 (v4si)
-v2di __builtin_ia32_pbroadcastq128 (v2di)
-v8si __builtin_ia32_permvarsi256 (v8si,v8si)
-v4df __builtin_ia32_permdf256 (v4df,int)
-v8sf __builtin_ia32_permvarsf256 (v8sf,v8sf)
-v4di __builtin_ia32_permdi256 (v4di,int)
-v4di __builtin_ia32_permti256 (v4di,v4di,int)
-v4di __builtin_ia32_extract128i256 (v4di,int)
-v4di __builtin_ia32_insert128i256 (v4di,v2di,int)
-v8si __builtin_ia32_maskloadd256 (pcv8si,v8si)
-v4di __builtin_ia32_maskloadq256 (pcv4di,v4di)
-v4si __builtin_ia32_maskloadd (pcv4si,v4si)
-v2di __builtin_ia32_maskloadq (pcv2di,v2di)
-void __builtin_ia32_maskstored256 (pv8si,v8si,v8si)
-void __builtin_ia32_maskstoreq256 (pv4di,v4di,v4di)
-void __builtin_ia32_maskstored (pv4si,v4si,v4si)
-void __builtin_ia32_maskstoreq (pv2di,v2di,v2di)
-v8si __builtin_ia32_psllv8si (v8si,v8si)
-v4si __builtin_ia32_psllv4si (v4si,v4si)
-v4di __builtin_ia32_psllv4di (v4di,v4di)
-v2di __builtin_ia32_psllv2di (v2di,v2di)
-v8si __builtin_ia32_psrav8si (v8si,v8si)
-v4si __builtin_ia32_psrav4si (v4si,v4si)
-v8si __builtin_ia32_psrlv8si (v8si,v8si)
-v4si __builtin_ia32_psrlv4si (v4si,v4si)
-v4di __builtin_ia32_psrlv4di (v4di,v4di)
-v2di __builtin_ia32_psrlv2di (v2di,v2di)
-v2df __builtin_ia32_gathersiv2df (v2df, pcdouble,v4si,v2df,int)
-v4df __builtin_ia32_gathersiv4df (v4df, pcdouble,v4si,v4df,int)
-v2df __builtin_ia32_gatherdiv2df (v2df, pcdouble,v2di,v2df,int)
-v4df __builtin_ia32_gatherdiv4df (v4df, pcdouble,v4di,v4df,int)
-v4sf __builtin_ia32_gathersiv4sf (v4sf, pcfloat,v4si,v4sf,int)
-v8sf __builtin_ia32_gathersiv8sf (v8sf, pcfloat,v8si,v8sf,int)
-v4sf __builtin_ia32_gatherdiv4sf (v4sf, pcfloat,v2di,v4sf,int)
-v4sf __builtin_ia32_gatherdiv4sf256 (v4sf, pcfloat,v4di,v4sf,int)
-v2di __builtin_ia32_gathersiv2di (v2di, pcint64,v4si,v2di,int)
-v4di __builtin_ia32_gathersiv4di (v4di, pcint64,v4si,v4di,int)
-v2di __builtin_ia32_gatherdiv2di (v2di, pcint64,v2di,v2di,int)
-v4di __builtin_ia32_gatherdiv4di (v4di, pcint64,v4di,v4di,int)
-v4si __builtin_ia32_gathersiv4si (v4si, pcint,v4si,v4si,int)
-v8si __builtin_ia32_gathersiv8si (v8si, pcint,v8si,v8si,int)
-v4si __builtin_ia32_gatherdiv4si (v4si, pcint,v2di,v4si,int)
-v4si __builtin_ia32_gatherdiv4si256 (v4si, pcint,v4di,v4si,int)
-@end smallexample
-
-The following built-in functions are available when @option{-maes} is
-used.  All of them generate the machine instruction that is part of the
-name.
-
-@smallexample
-v2di __builtin_ia32_aesenc128 (v2di, v2di)
-v2di __builtin_ia32_aesenclast128 (v2di, v2di)
-v2di __builtin_ia32_aesdec128 (v2di, v2di)
-v2di __builtin_ia32_aesdeclast128 (v2di, v2di)
-v2di __builtin_ia32_aeskeygenassist128 (v2di, const int)
-v2di __builtin_ia32_aesimc128 (v2di)
-@end smallexample
-
-The following built-in function is available when @option{-mpclmul} is
-used.
-
-@table @code
-@item v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int)
-Generates the @code{pclmulqdq} machine instruction.
-@end table
-
-The following built-in function is available when @option{-mfsgsbase} is
-used.  All of them generate the machine instruction that is part of the
-name.
-
-@smallexample
-unsigned int __builtin_ia32_rdfsbase32 (void)
-unsigned long long __builtin_ia32_rdfsbase64 (void)
-unsigned int __builtin_ia32_rdgsbase32 (void)
-unsigned long long __builtin_ia32_rdgsbase64 (void)
-void _writefsbase_u32 (unsigned int)
-void _writefsbase_u64 (unsigned long long)
-void _writegsbase_u32 (unsigned int)
-void _writegsbase_u64 (unsigned long long)
-@end smallexample
-
-The following built-in function is available when @option{-mrdrnd} is
-used.  All of them generate the machine instruction that is part of the
-name.
-
-@smallexample
-unsigned int __builtin_ia32_rdrand16_step (unsigned short *)
-unsigned int __builtin_ia32_rdrand32_step (unsigned int *)
-unsigned int __builtin_ia32_rdrand64_step (unsigned long long *)
-@end smallexample
-
-The following built-in functions are available when @option{-msse4a} is used.
-All of them generate the machine instruction that is part of the name.
-
-@smallexample
-void __builtin_ia32_movntsd (double *, v2df)
-void __builtin_ia32_movntss (float *, v4sf)
-v2di __builtin_ia32_extrq  (v2di, v16qi)
-v2di __builtin_ia32_extrqi (v2di, const unsigned int, const unsigned int)
-v2di __builtin_ia32_insertq (v2di, v2di)
-v2di __builtin_ia32_insertqi (v2di, v2di, const unsigned int, const unsigned int)
-@end smallexample
-
-The following built-in functions are available when @option{-mxop} is used.
-@smallexample
-v2df __builtin_ia32_vfrczpd (v2df)
-v4sf __builtin_ia32_vfrczps (v4sf)
-v2df __builtin_ia32_vfrczsd (v2df)
-v4sf __builtin_ia32_vfrczss (v4sf)
-v4df __builtin_ia32_vfrczpd256 (v4df)
-v8sf __builtin_ia32_vfrczps256 (v8sf)
-v2di __builtin_ia32_vpcmov (v2di, v2di, v2di)
-v2di __builtin_ia32_vpcmov_v2di (v2di, v2di, v2di)
-v4si __builtin_ia32_vpcmov_v4si (v4si, v4si, v4si)
-v8hi __builtin_ia32_vpcmov_v8hi (v8hi, v8hi, v8hi)
-v16qi __builtin_ia32_vpcmov_v16qi (v16qi, v16qi, v16qi)
-v2df __builtin_ia32_vpcmov_v2df (v2df, v2df, v2df)
-v4sf __builtin_ia32_vpcmov_v4sf (v4sf, v4sf, v4sf)
-v4di __builtin_ia32_vpcmov_v4di256 (v4di, v4di, v4di)
-v8si __builtin_ia32_vpcmov_v8si256 (v8si, v8si, v8si)
-v16hi __builtin_ia32_vpcmov_v16hi256 (v16hi, v16hi, v16hi)
-v32qi __builtin_ia32_vpcmov_v32qi256 (v32qi, v32qi, v32qi)
-v4df __builtin_ia32_vpcmov_v4df256 (v4df, v4df, v4df)
-v8sf __builtin_ia32_vpcmov_v8sf256 (v8sf, v8sf, v8sf)
-v16qi __builtin_ia32_vpcomeqb (v16qi, v16qi)
-v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi)
-v4si __builtin_ia32_vpcomeqd (v4si, v4si)
-v2di __builtin_ia32_vpcomeqq (v2di, v2di)
-v16qi __builtin_ia32_vpcomequb (v16qi, v16qi)
-v4si __builtin_ia32_vpcomequd (v4si, v4si)
-v2di __builtin_ia32_vpcomequq (v2di, v2di)
-v8hi __builtin_ia32_vpcomequw (v8hi, v8hi)
-v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi)
-v16qi __builtin_ia32_vpcomfalseb (v16qi, v16qi)
-v4si __builtin_ia32_vpcomfalsed (v4si, v4si)
-v2di __builtin_ia32_vpcomfalseq (v2di, v2di)
-v16qi __builtin_ia32_vpcomfalseub (v16qi, v16qi)
-v4si __builtin_ia32_vpcomfalseud (v4si, v4si)
-v2di __builtin_ia32_vpcomfalseuq (v2di, v2di)
-v8hi __builtin_ia32_vpcomfalseuw (v8hi, v8hi)
-v8hi __builtin_ia32_vpcomfalsew (v8hi, v8hi)
-v16qi __builtin_ia32_vpcomgeb (v16qi, v16qi)
-v4si __builtin_ia32_vpcomged (v4si, v4si)
-v2di __builtin_ia32_vpcomgeq (v2di, v2di)
-v16qi __builtin_ia32_vpcomgeub (v16qi, v16qi)
-v4si __builtin_ia32_vpcomgeud (v4si, v4si)
-v2di __builtin_ia32_vpcomgeuq (v2di, v2di)
-v8hi __builtin_ia32_vpcomgeuw (v8hi, v8hi)
-v8hi __builtin_ia32_vpcomgew (v8hi, v8hi)
-v16qi __builtin_ia32_vpcomgtb (v16qi, v16qi)
-v4si __builtin_ia32_vpcomgtd (v4si, v4si)
-v2di __builtin_ia32_vpcomgtq (v2di, v2di)
-v16qi __builtin_ia32_vpcomgtub (v16qi, v16qi)
-v4si __builtin_ia32_vpcomgtud (v4si, v4si)
-v2di __builtin_ia32_vpcomgtuq (v2di, v2di)
-v8hi __builtin_ia32_vpcomgtuw (v8hi, v8hi)
-v8hi __builtin_ia32_vpcomgtw (v8hi, v8hi)
-v16qi __builtin_ia32_vpcomleb (v16qi, v16qi)
-v4si __builtin_ia32_vpcomled (v4si, v4si)
-v2di __builtin_ia32_vpcomleq (v2di, v2di)
-v16qi __builtin_ia32_vpcomleub (v16qi, v16qi)
-v4si __builtin_ia32_vpcomleud (v4si, v4si)
-v2di __builtin_ia32_vpcomleuq (v2di, v2di)
-v8hi __builtin_ia32_vpcomleuw (v8hi, v8hi)
-v8hi __builtin_ia32_vpcomlew (v8hi, v8hi)
-v16qi __builtin_ia32_vpcomltb (v16qi, v16qi)
-v4si __builtin_ia32_vpcomltd (v4si, v4si)
-v2di __builtin_ia32_vpcomltq (v2di, v2di)
-v16qi __builtin_ia32_vpcomltub (v16qi, v16qi)
-v4si __builtin_ia32_vpcomltud (v4si, v4si)
-v2di __builtin_ia32_vpcomltuq (v2di, v2di)
-v8hi __builtin_ia32_vpcomltuw (v8hi, v8hi)
-v8hi __builtin_ia32_vpcomltw (v8hi, v8hi)
-v16qi __builtin_ia32_vpcomneb (v16qi, v16qi)
-v4si __builtin_ia32_vpcomned (v4si, v4si)
-v2di __builtin_ia32_vpcomneq (v2di, v2di)
-v16qi __builtin_ia32_vpcomneub (v16qi, v16qi)
-v4si __builtin_ia32_vpcomneud (v4si, v4si)
-v2di __builtin_ia32_vpcomneuq (v2di, v2di)
-v8hi __builtin_ia32_vpcomneuw (v8hi, v8hi)
-v8hi __builtin_ia32_vpcomnew (v8hi, v8hi)
-v16qi __builtin_ia32_vpcomtrueb (v16qi, v16qi)
-v4si __builtin_ia32_vpcomtrued (v4si, v4si)
-v2di __builtin_ia32_vpcomtrueq (v2di, v2di)
-v16qi __builtin_ia32_vpcomtrueub (v16qi, v16qi)
-v4si __builtin_ia32_vpcomtrueud (v4si, v4si)
-v2di __builtin_ia32_vpcomtrueuq (v2di, v2di)
-v8hi __builtin_ia32_vpcomtrueuw (v8hi, v8hi)
-v8hi __builtin_ia32_vpcomtruew (v8hi, v8hi)
-v4si __builtin_ia32_vphaddbd (v16qi)
-v2di __builtin_ia32_vphaddbq (v16qi)
-v8hi __builtin_ia32_vphaddbw (v16qi)
-v2di __builtin_ia32_vphadddq (v4si)
-v4si __builtin_ia32_vphaddubd (v16qi)
-v2di __builtin_ia32_vphaddubq (v16qi)
-v8hi __builtin_ia32_vphaddubw (v16qi)
-v2di __builtin_ia32_vphaddudq (v4si)
-v4si __builtin_ia32_vphadduwd (v8hi)
-v2di __builtin_ia32_vphadduwq (v8hi)
-v4si __builtin_ia32_vphaddwd (v8hi)
-v2di __builtin_ia32_vphaddwq (v8hi)
-v8hi __builtin_ia32_vphsubbw (v16qi)
-v2di __builtin_ia32_vphsubdq (v4si)
-v4si __builtin_ia32_vphsubwd (v8hi)
-v4si __builtin_ia32_vpmacsdd (v4si, v4si, v4si)
-v2di __builtin_ia32_vpmacsdqh (v4si, v4si, v2di)
-v2di __builtin_ia32_vpmacsdql (v4si, v4si, v2di)
-v4si __builtin_ia32_vpmacssdd (v4si, v4si, v4si)
-v2di __builtin_ia32_vpmacssdqh (v4si, v4si, v2di)
-v2di __builtin_ia32_vpmacssdql (v4si, v4si, v2di)
-v4si __builtin_ia32_vpmacsswd (v8hi, v8hi, v4si)
-v8hi __builtin_ia32_vpmacssww (v8hi, v8hi, v8hi)
-v4si __builtin_ia32_vpmacswd (v8hi, v8hi, v4si)
-v8hi __builtin_ia32_vpmacsww (v8hi, v8hi, v8hi)
-v4si __builtin_ia32_vpmadcsswd (v8hi, v8hi, v4si)
-v4si __builtin_ia32_vpmadcswd (v8hi, v8hi, v4si)
-v16qi __builtin_ia32_vpperm (v16qi, v16qi, v16qi)
-v16qi __builtin_ia32_vprotb (v16qi, v16qi)
-v4si __builtin_ia32_vprotd (v4si, v4si)
-v2di __builtin_ia32_vprotq (v2di, v2di)
-v8hi __builtin_ia32_vprotw (v8hi, v8hi)
-v16qi __builtin_ia32_vpshab (v16qi, v16qi)
-v4si __builtin_ia32_vpshad (v4si, v4si)
-v2di __builtin_ia32_vpshaq (v2di, v2di)
-v8hi __builtin_ia32_vpshaw (v8hi, v8hi)
-v16qi __builtin_ia32_vpshlb (v16qi, v16qi)
-v4si __builtin_ia32_vpshld (v4si, v4si)
-v2di __builtin_ia32_vpshlq (v2di, v2di)
-v8hi __builtin_ia32_vpshlw (v8hi, v8hi)
-@end smallexample
-
-The following built-in functions are available when @option{-mfma4} is used.
-All of them generate the machine instruction that is part of the name.
-
-@smallexample
-v2df __builtin_ia32_vfmaddpd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfmaddps (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfmaddsd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfmaddss (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfmsubpd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfmsubps (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfmsubsd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfmsubss (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfnmaddpd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfnmaddps (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfnmaddsd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfnmaddss (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfnmsubpd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfnmsubps (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfnmsubsd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfnmsubss (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfmaddsubpd  (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfmaddsubps  (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfmsubaddpd  (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfmsubaddps  (v4sf, v4sf, v4sf)
-v4df __builtin_ia32_vfmaddpd256 (v4df, v4df, v4df)
-v8sf __builtin_ia32_vfmaddps256 (v8sf, v8sf, v8sf)
-v4df __builtin_ia32_vfmsubpd256 (v4df, v4df, v4df)
-v8sf __builtin_ia32_vfmsubps256 (v8sf, v8sf, v8sf)
-v4df __builtin_ia32_vfnmaddpd256 (v4df, v4df, v4df)
-v8sf __builtin_ia32_vfnmaddps256 (v8sf, v8sf, v8sf)
-v4df __builtin_ia32_vfnmsubpd256 (v4df, v4df, v4df)
-v8sf __builtin_ia32_vfnmsubps256 (v8sf, v8sf, v8sf)
-v4df __builtin_ia32_vfmaddsubpd256 (v4df, v4df, v4df)
-v8sf __builtin_ia32_vfmaddsubps256 (v8sf, v8sf, v8sf)
-v4df __builtin_ia32_vfmsubaddpd256 (v4df, v4df, v4df)
-v8sf __builtin_ia32_vfmsubaddps256 (v8sf, v8sf, v8sf)
-
-@end smallexample
-
-The following built-in functions are available when @option{-mlwp} is used.
-
-@smallexample
-void __builtin_ia32_llwpcb16 (void *);
-void __builtin_ia32_llwpcb32 (void *);
-void __builtin_ia32_llwpcb64 (void *);
-void * __builtin_ia32_llwpcb16 (void);
-void * __builtin_ia32_llwpcb32 (void);
-void * __builtin_ia32_llwpcb64 (void);
-void __builtin_ia32_lwpval16 (unsigned short, unsigned int, unsigned short)
-void __builtin_ia32_lwpval32 (unsigned int, unsigned int, unsigned int)
-void __builtin_ia32_lwpval64 (unsigned __int64, unsigned int, unsigned int)
-unsigned char __builtin_ia32_lwpins16 (unsigned short, unsigned int, unsigned short)
-unsigned char __builtin_ia32_lwpins32 (unsigned int, unsigned int, unsigned int)
-unsigned char __builtin_ia32_lwpins64 (unsigned __int64, unsigned int, unsigned int)
-@end smallexample
-
-The following built-in functions are available when @option{-mbmi} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-unsigned int __builtin_ia32_bextr_u32(unsigned int, unsigned int);
-unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long);
-@end smallexample
-
-The following built-in functions are available when @option{-mbmi2} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-unsigned int _bzhi_u32 (unsigned int, unsigned int)
-unsigned int _pdep_u32 (unsigned int, unsigned int)
-unsigned int _pext_u32 (unsigned int, unsigned int)
-unsigned long long _bzhi_u64 (unsigned long long, unsigned long long)
-unsigned long long _pdep_u64 (unsigned long long, unsigned long long)
-unsigned long long _pext_u64 (unsigned long long, unsigned long long)
-@end smallexample
-
-The following built-in functions are available when @option{-mlzcnt} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-unsigned short __builtin_ia32_lzcnt_16(unsigned short);
-unsigned int __builtin_ia32_lzcnt_u32(unsigned int);
-unsigned long long __builtin_ia32_lzcnt_u64 (unsigned long long);
-@end smallexample
-
-The following built-in functions are available when @option{-mfxsr} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-void __builtin_ia32_fxsave (void *)
-void __builtin_ia32_fxrstor (void *)
-void __builtin_ia32_fxsave64 (void *)
-void __builtin_ia32_fxrstor64 (void *)
-@end smallexample
-
-The following built-in functions are available when @option{-mxsave} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-void __builtin_ia32_xsave (void *, long long)
-void __builtin_ia32_xrstor (void *, long long)
-void __builtin_ia32_xsave64 (void *, long long)
-void __builtin_ia32_xrstor64 (void *, long long)
-@end smallexample
-
-The following built-in functions are available when @option{-mxsaveopt} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-void __builtin_ia32_xsaveopt (void *, long long)
-void __builtin_ia32_xsaveopt64 (void *, long long)
-@end smallexample
-
-The following built-in functions are available when @option{-mtbm} is used.
-Both of them generate the immediate form of the bextr machine instruction.
-@smallexample
-unsigned int __builtin_ia32_bextri_u32 (unsigned int, const unsigned int);
-unsigned long long __builtin_ia32_bextri_u64 (unsigned long long, const unsigned long long);
-@end smallexample
-
-
-The following built-in functions are available when @option{-m3dnow} is used.
-All of them generate the machine instruction that is part of the name.
-
-@smallexample
-void __builtin_ia32_femms (void)
-v8qi __builtin_ia32_pavgusb (v8qi, v8qi)
-v2si __builtin_ia32_pf2id (v2sf)
-v2sf __builtin_ia32_pfacc (v2sf, v2sf)
-v2sf __builtin_ia32_pfadd (v2sf, v2sf)
-v2si __builtin_ia32_pfcmpeq (v2sf, v2sf)
-v2si __builtin_ia32_pfcmpge (v2sf, v2sf)
-v2si __builtin_ia32_pfcmpgt (v2sf, v2sf)
-v2sf __builtin_ia32_pfmax (v2sf, v2sf)
-v2sf __builtin_ia32_pfmin (v2sf, v2sf)
-v2sf __builtin_ia32_pfmul (v2sf, v2sf)
-v2sf __builtin_ia32_pfrcp (v2sf)
-v2sf __builtin_ia32_pfrcpit1 (v2sf, v2sf)
-v2sf __builtin_ia32_pfrcpit2 (v2sf, v2sf)
-v2sf __builtin_ia32_pfrsqrt (v2sf)
-v2sf __builtin_ia32_pfsub (v2sf, v2sf)
-v2sf __builtin_ia32_pfsubr (v2sf, v2sf)
-v2sf __builtin_ia32_pi2fd (v2si)
-v4hi __builtin_ia32_pmulhrw (v4hi, v4hi)
-@end smallexample
-
-The following built-in functions are available when both @option{-m3dnow}
-and @option{-march=athlon} are used.  All of them generate the machine
-instruction that is part of the name.
-
-@smallexample
-v2si __builtin_ia32_pf2iw (v2sf)
-v2sf __builtin_ia32_pfnacc (v2sf, v2sf)
-v2sf __builtin_ia32_pfpnacc (v2sf, v2sf)
-v2sf __builtin_ia32_pi2fw (v2si)
-v2sf __builtin_ia32_pswapdsf (v2sf)
-v2si __builtin_ia32_pswapdsi (v2si)
-@end smallexample
-
-The following built-in functions are available when @option{-mrtm} is used.
-They are used for restricted transactional memory.  These are the internal
-low-level functions.  Normally the functions in
-@ref{x86 transactional memory intrinsics} should be used instead.
-
-@smallexample
-int __builtin_ia32_xbegin ()
-void __builtin_ia32_xend ()
-void __builtin_ia32_xabort (status)
-int __builtin_ia32_xtest ()
-@end smallexample
-
-@node x86 transactional memory intrinsics
-@subsection x86 transactional memory intrinsics
-
-Hardware transactional memory intrinsics for x86.  These allow you to use
-memory transactions with RTM (Restricted Transactional Memory).
-To use HLE (Hardware Lock Elision), see @ref{x86 specific memory model extensions for transactional memory} instead.
-This support is enabled with the @option{-mrtm} option.
-
-A memory transaction commits all changes to memory in an atomic way,
-as visible to other threads. If the transaction fails it is rolled back
-and all side effects discarded.
-
-Generally there is no guarantee that a memory transaction ever succeeds
-and suitable fallback code always needs to be supplied.
-
-@deftypefn {RTM Function} {unsigned} _xbegin ()
-Start an RTM (Restricted Transactional Memory) transaction.
-Returns @code{_XBEGIN_STARTED} when the transaction
-started successfully (note this is not 0, so the constant has to be
-explicitly tested).  When the transaction aborts, all side effects
-are undone and an abort code is returned.  There is no guarantee that
-any transaction ever succeeds, so there always needs to be a valid,
-tested fallback path.
-@end deftypefn
-
-@smallexample
-#include <immintrin.h>
-
-if ((status = _xbegin ()) == _XBEGIN_STARTED) @{
-    ... transaction code...
-    _xend ();
-@} else @{
-    ... non transactional fallback path...
-@}
-@end smallexample
-
-Valid abort status bits (when the value is not @code{_XBEGIN_STARTED}) are:
-
-@table @code
-@item _XABORT_EXPLICIT
-Transaction explicitly aborted with @code{_xabort}.  The parameter passed
-to @code{_xabort} is available with @code{_XABORT_CODE(status)}.
-@item _XABORT_RETRY
-Transaction retry is possible.
-@item _XABORT_CONFLICT
-Transaction abort due to a memory conflict with another thread.
-@item _XABORT_CAPACITY
-Transaction abort due to the transaction using too much memory.
-@item _XABORT_DEBUG
-Transaction abort due to a debug trap.
-@item _XABORT_NESTED
-Transaction abort in an inner nested transaction.
-@end table
-
-@deftypefn {RTM Function} {void} _xend ()
-Commit the current transaction.  When no transaction is active this
-faults.  All memory side effects of the transaction become visible
-to other threads in an atomic manner.
-@end deftypefn
-
-@deftypefn {RTM Function} {int} _xtest ()
-Return a nonzero value when a transaction is currently active, otherwise 0.
-@end deftypefn
-
-@deftypefn {RTM Function} {void} _xabort (status)
-Abort the current transaction.  When no transaction is active this is a no-op.
-The @var{status} is an 8-bit constant; its value is included in the status code
-returned by @code{_xbegin}.
-@end deftypefn
-
-@node MIPS DSP Built-in Functions
-@subsection MIPS DSP Built-in Functions
-
-The MIPS DSP Application-Specific Extension (ASE) includes new
-instructions that are designed to improve the performance of DSP and
-media applications.  It provides instructions that operate on packed
-8-bit/16-bit integer data, Q7, Q15 and Q31 fractional data.
-
-GCC supports MIPS DSP operations using both the generic
-vector extensions (@pxref{Vector Extensions}) and a collection of
-MIPS-specific built-in functions.  Both kinds of support are
-enabled by the @option{-mdsp} command-line option.
-
-Revision 2 of the ASE was introduced in the second half of 2006.
-This revision adds extra instructions to the original ASE, but is
-otherwise backwards-compatible with it.  You can select revision 2
-using the command-line option @option{-mdspr2}; this option implies
-@option{-mdsp}.
-
-The SCOUNT and POS bits of the DSP control register are global.  The
-WRDSP, EXTPDP, EXTPDPV and MTHLIP instructions modify the SCOUNT and
-POS bits.  During optimization, the compiler does not delete these
-instructions and it does not delete calls to functions containing
-these instructions.
-
-At present, GCC only provides support for operations on 32-bit
-vectors.  The vector type associated with 8-bit integer data is
-usually called @code{v4i8}, the vector type associated with Q7
-is usually called @code{v4q7}, the vector type associated with 16-bit
-integer data is usually called @code{v2i16}, and the vector type
-associated with Q15 is usually called @code{v2q15}.  They can be
-defined in C as follows:
-
-@smallexample
-typedef signed char v4i8 __attribute__ ((vector_size(4)));
-typedef signed char v4q7 __attribute__ ((vector_size(4)));
-typedef short v2i16 __attribute__ ((vector_size(4)));
-typedef short v2q15 __attribute__ ((vector_size(4)));
-@end smallexample
-
-@code{v4i8}, @code{v4q7}, @code{v2i16} and @code{v2q15} values are
-initialized in the same way as aggregates.  For example:
-
-@smallexample
-v4i8 a = @{1, 2, 3, 4@};
-v4i8 b;
-b = (v4i8) @{5, 6, 7, 8@};
-
-v2q15 c = @{0x0fcb, 0x3a75@};
-v2q15 d;
-d = (v2q15) @{0.1234 * 0x1.0p15, 0.4567 * 0x1.0p15@};
-@end smallexample
-
-@emph{Note:} The CPU's endianness determines the order in which values
-are packed.  On little-endian targets, the first value is the least
-significant and the last value is the most significant.  The opposite
-order applies to big-endian targets.  For example, the code above
-sets the lowest byte of @code{a} to @code{1} on little-endian targets
-and @code{4} on big-endian targets.
-
-@emph{Note:} Q7, Q15 and Q31 values must be initialized with their integer
-representation.  As shown in this example, the integer representation
-of a Q7 value can be obtained by multiplying the fractional value by
-@code{0x1.0p7}.  The equivalent for Q15 values is to multiply by
-@code{0x1.0p15}.  The equivalent for Q31 values is to multiply by
-@code{0x1.0p31}.
-
-The table below lists the @code{v4i8} and @code{v2q15} operations for which
-hardware support exists.  @code{a} and @code{b} are @code{v4i8} values,
-and @code{c} and @code{d} are @code{v2q15} values.
-
-@multitable @columnfractions .50 .50
-@item C code @tab MIPS instruction
-@item @code{a + b} @tab @code{addu.qb}
-@item @code{c + d} @tab @code{addq.ph}
-@item @code{a - b} @tab @code{subu.qb}
-@item @code{c - d} @tab @code{subq.ph}
-@end multitable
-
-The table below lists the @code{v2i16} operation for which
-hardware support exists for the DSP ASE REV 2.  @code{e} and @code{f} are
-@code{v2i16} values.
-
-@multitable @columnfractions .50 .50
-@item C code @tab MIPS instruction
-@item @code{e * f} @tab @code{mul.ph}
-@end multitable
-
-It is easier to describe the DSP built-in functions if we first define
-the following types:
-
-@smallexample
-typedef int q31;
-typedef int i32;
-typedef unsigned int ui32;
-typedef long long a64;
+float __builtin_recipdivf (float, float);
+float __builtin_rsqrtf (float);
+double __builtin_recipdiv (double, double);
+double __builtin_rsqrt (double);
+uint64_t __builtin_ppc_get_timebase ();
+unsigned long __builtin_ppc_mftb ();
+double __builtin_unpack_longdouble (long double, int);
+long double __builtin_pack_longdouble (double, double);
 @end smallexample
 
-@code{q31} and @code{i32} are actually the same as @code{int}, but we
-use @code{q31} to indicate a Q31 fractional value and @code{i32} to
-indicate a 32-bit integer value.  Similarly, @code{a64} is the same as
-@code{long long}, but we use @code{a64} to indicate values that are
-placed in one of the four DSP accumulators (@code{$ac0},
-@code{$ac1}, @code{$ac2} or @code{$ac3}).
-
-Also, some built-in functions prefer or require immediate numbers as
-parameters, because the corresponding DSP instructions accept both immediate
-numbers and register operands, or accept immediate numbers only.  The
-immediate parameters are listed as follows.
+The @code{vec_rsqrt}, @code{__builtin_rsqrt}, and
+@code{__builtin_rsqrtf} functions generate multiple instructions to
+implement the reciprocal sqrt functionality using reciprocal sqrt
+estimate instructions.
 
-@smallexample
-imm0_3: 0 to 3.
-imm0_7: 0 to 7.
-imm0_15: 0 to 15.
-imm0_31: 0 to 31.
-imm0_63: 0 to 63.
-imm0_255: 0 to 255.
-imm_n32_31: -32 to 31.
-imm_n512_511: -512 to 511.
-@end smallexample
+The @code{__builtin_recipdiv} and @code{__builtin_recipdivf}
+functions generate multiple instructions to implement division using
+the reciprocal estimate instructions.
 
-The following built-in functions map directly to a particular MIPS DSP
-instruction.  Please refer to the architecture specification
-for details on what each instruction does.
+The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb}
+functions generate instructions to read the Time Base Register.  The
+@code{__builtin_ppc_get_timebase} function may generate multiple
+instructions and always returns the 64 bits of the Time Base Register.
+The @code{__builtin_ppc_mftb} function always generates one instruction and
+returns the Time Base Register value as an unsigned long, throwing away
+the most significant word on 32-bit environments.
 
+The following built-in functions are available for the PowerPC family
+of processors with ISA 2.06 or later (@option{-mcpu=power7}
+or @option{-mpopcntd}):
 @smallexample
-v2q15 __builtin_mips_addq_ph (v2q15, v2q15)
-v2q15 __builtin_mips_addq_s_ph (v2q15, v2q15)
-q31 __builtin_mips_addq_s_w (q31, q31)
-v4i8 __builtin_mips_addu_qb (v4i8, v4i8)
-v4i8 __builtin_mips_addu_s_qb (v4i8, v4i8)
-v2q15 __builtin_mips_subq_ph (v2q15, v2q15)
-v2q15 __builtin_mips_subq_s_ph (v2q15, v2q15)
-q31 __builtin_mips_subq_s_w (q31, q31)
-v4i8 __builtin_mips_subu_qb (v4i8, v4i8)
-v4i8 __builtin_mips_subu_s_qb (v4i8, v4i8)
-i32 __builtin_mips_addsc (i32, i32)
-i32 __builtin_mips_addwc (i32, i32)
-i32 __builtin_mips_modsub (i32, i32)
-i32 __builtin_mips_raddu_w_qb (v4i8)
-v2q15 __builtin_mips_absq_s_ph (v2q15)
-q31 __builtin_mips_absq_s_w (q31)
-v4i8 __builtin_mips_precrq_qb_ph (v2q15, v2q15)
-v2q15 __builtin_mips_precrq_ph_w (q31, q31)
-v2q15 __builtin_mips_precrq_rs_ph_w (q31, q31)
-v4i8 __builtin_mips_precrqu_s_qb_ph (v2q15, v2q15)
-q31 __builtin_mips_preceq_w_phl (v2q15)
-q31 __builtin_mips_preceq_w_phr (v2q15)
-v2q15 __builtin_mips_precequ_ph_qbl (v4i8)
-v2q15 __builtin_mips_precequ_ph_qbr (v4i8)
-v2q15 __builtin_mips_precequ_ph_qbla (v4i8)
-v2q15 __builtin_mips_precequ_ph_qbra (v4i8)
-v2q15 __builtin_mips_preceu_ph_qbl (v4i8)
-v2q15 __builtin_mips_preceu_ph_qbr (v4i8)
-v2q15 __builtin_mips_preceu_ph_qbla (v4i8)
-v2q15 __builtin_mips_preceu_ph_qbra (v4i8)
-v4i8 __builtin_mips_shll_qb (v4i8, imm0_7)
-v4i8 __builtin_mips_shll_qb (v4i8, i32)
-v2q15 __builtin_mips_shll_ph (v2q15, imm0_15)
-v2q15 __builtin_mips_shll_ph (v2q15, i32)
-v2q15 __builtin_mips_shll_s_ph (v2q15, imm0_15)
-v2q15 __builtin_mips_shll_s_ph (v2q15, i32)
-q31 __builtin_mips_shll_s_w (q31, imm0_31)
-q31 __builtin_mips_shll_s_w (q31, i32)
-v4i8 __builtin_mips_shrl_qb (v4i8, imm0_7)
-v4i8 __builtin_mips_shrl_qb (v4i8, i32)
-v2q15 __builtin_mips_shra_ph (v2q15, imm0_15)
-v2q15 __builtin_mips_shra_ph (v2q15, i32)
-v2q15 __builtin_mips_shra_r_ph (v2q15, imm0_15)
-v2q15 __builtin_mips_shra_r_ph (v2q15, i32)
-q31 __builtin_mips_shra_r_w (q31, imm0_31)
-q31 __builtin_mips_shra_r_w (q31, i32)
-v2q15 __builtin_mips_muleu_s_ph_qbl (v4i8, v2q15)
-v2q15 __builtin_mips_muleu_s_ph_qbr (v4i8, v2q15)
-v2q15 __builtin_mips_mulq_rs_ph (v2q15, v2q15)
-q31 __builtin_mips_muleq_s_w_phl (v2q15, v2q15)
-q31 __builtin_mips_muleq_s_w_phr (v2q15, v2q15)
-a64 __builtin_mips_dpau_h_qbl (a64, v4i8, v4i8)
-a64 __builtin_mips_dpau_h_qbr (a64, v4i8, v4i8)
-a64 __builtin_mips_dpsu_h_qbl (a64, v4i8, v4i8)
-a64 __builtin_mips_dpsu_h_qbr (a64, v4i8, v4i8)
-a64 __builtin_mips_dpaq_s_w_ph (a64, v2q15, v2q15)
-a64 __builtin_mips_dpaq_sa_l_w (a64, q31, q31)
-a64 __builtin_mips_dpsq_s_w_ph (a64, v2q15, v2q15)
-a64 __builtin_mips_dpsq_sa_l_w (a64, q31, q31)
-a64 __builtin_mips_mulsaq_s_w_ph (a64, v2q15, v2q15)
-a64 __builtin_mips_maq_s_w_phl (a64, v2q15, v2q15)
-a64 __builtin_mips_maq_s_w_phr (a64, v2q15, v2q15)
-a64 __builtin_mips_maq_sa_w_phl (a64, v2q15, v2q15)
-a64 __builtin_mips_maq_sa_w_phr (a64, v2q15, v2q15)
-i32 __builtin_mips_bitrev (i32)
-i32 __builtin_mips_insv (i32, i32)
-v4i8 __builtin_mips_repl_qb (imm0_255)
-v4i8 __builtin_mips_repl_qb (i32)
-v2q15 __builtin_mips_repl_ph (imm_n512_511)
-v2q15 __builtin_mips_repl_ph (i32)
-void __builtin_mips_cmpu_eq_qb (v4i8, v4i8)
-void __builtin_mips_cmpu_lt_qb (v4i8, v4i8)
-void __builtin_mips_cmpu_le_qb (v4i8, v4i8)
-i32 __builtin_mips_cmpgu_eq_qb (v4i8, v4i8)
-i32 __builtin_mips_cmpgu_lt_qb (v4i8, v4i8)
-i32 __builtin_mips_cmpgu_le_qb (v4i8, v4i8)
-void __builtin_mips_cmp_eq_ph (v2q15, v2q15)
-void __builtin_mips_cmp_lt_ph (v2q15, v2q15)
-void __builtin_mips_cmp_le_ph (v2q15, v2q15)
-v4i8 __builtin_mips_pick_qb (v4i8, v4i8)
-v2q15 __builtin_mips_pick_ph (v2q15, v2q15)
-v2q15 __builtin_mips_packrl_ph (v2q15, v2q15)
-i32 __builtin_mips_extr_w (a64, imm0_31)
-i32 __builtin_mips_extr_w (a64, i32)
-i32 __builtin_mips_extr_r_w (a64, imm0_31)
-i32 __builtin_mips_extr_s_h (a64, i32)
-i32 __builtin_mips_extr_rs_w (a64, imm0_31)
-i32 __builtin_mips_extr_rs_w (a64, i32)
-i32 __builtin_mips_extr_s_h (a64, imm0_31)
-i32 __builtin_mips_extr_r_w (a64, i32)
-i32 __builtin_mips_extp (a64, imm0_31)
-i32 __builtin_mips_extp (a64, i32)
-i32 __builtin_mips_extpdp (a64, imm0_31)
-i32 __builtin_mips_extpdp (a64, i32)
-a64 __builtin_mips_shilo (a64, imm_n32_31)
-a64 __builtin_mips_shilo (a64, i32)
-a64 __builtin_mips_mthlip (a64, i32)
-void __builtin_mips_wrdsp (i32, imm0_63)
-i32 __builtin_mips_rddsp (imm0_63)
-i32 __builtin_mips_lbux (void *, i32)
-i32 __builtin_mips_lhx (void *, i32)
-i32 __builtin_mips_lwx (void *, i32)
-a64 __builtin_mips_ldx (void *, i32) [MIPS64 only]
-i32 __builtin_mips_bposge32 (void)
-a64 __builtin_mips_madd (a64, i32, i32);
-a64 __builtin_mips_maddu (a64, ui32, ui32);
-a64 __builtin_mips_msub (a64, i32, i32);
-a64 __builtin_mips_msubu (a64, ui32, ui32);
-a64 __builtin_mips_mult (i32, i32);
-a64 __builtin_mips_multu (ui32, ui32);
+long __builtin_bpermd (long, long);
+int __builtin_divwe (int, int);
+int __builtin_divweo (int, int);
+unsigned int __builtin_divweu (unsigned int, unsigned int);
+unsigned int __builtin_divweuo (unsigned int, unsigned int);
+long __builtin_divde (long, long);
+long __builtin_divdeo (long, long);
+unsigned long __builtin_divdeu (unsigned long, unsigned long);
+unsigned long __builtin_divdeuo (unsigned long, unsigned long);
+unsigned int __builtin_cdtbcd (unsigned int);
+unsigned int __builtin_cbcdtd (unsigned int);
+unsigned int __builtin_addg6s (unsigned int, unsigned int);
 @end smallexample
 
-The following built-in functions map directly to a particular MIPS DSP REV 2
-instruction.  Please refer to the architecture specification
-for details on what each instruction does.
+The @code{__builtin_divde}, @code{__builtin_divdeo},
+@code{__builtin_divdeu}, and @code{__builtin_divdeuo} functions require a
+64-bit environment supporting ISA 2.06 or later.
 
+The following built-in functions are available for the PowerPC family
+of processors when hardware decimal floating point
+(@option{-mhard-dfp}) is available:
 @smallexample
-v4q7 __builtin_mips_absq_s_qb (v4q7);
-v2i16 __builtin_mips_addu_ph (v2i16, v2i16);
-v2i16 __builtin_mips_addu_s_ph (v2i16, v2i16);
-v4i8 __builtin_mips_adduh_qb (v4i8, v4i8);
-v4i8 __builtin_mips_adduh_r_qb (v4i8, v4i8);
-i32 __builtin_mips_append (i32, i32, imm0_31);
-i32 __builtin_mips_balign (i32, i32, imm0_3);
-i32 __builtin_mips_cmpgdu_eq_qb (v4i8, v4i8);
-i32 __builtin_mips_cmpgdu_lt_qb (v4i8, v4i8);
-i32 __builtin_mips_cmpgdu_le_qb (v4i8, v4i8);
-a64 __builtin_mips_dpa_w_ph (a64, v2i16, v2i16);
-a64 __builtin_mips_dps_w_ph (a64, v2i16, v2i16);
-v2i16 __builtin_mips_mul_ph (v2i16, v2i16);
-v2i16 __builtin_mips_mul_s_ph (v2i16, v2i16);
-q31 __builtin_mips_mulq_rs_w (q31, q31);
-v2q15 __builtin_mips_mulq_s_ph (v2q15, v2q15);
-q31 __builtin_mips_mulq_s_w (q31, q31);
-a64 __builtin_mips_mulsa_w_ph (a64, v2i16, v2i16);
-v4i8 __builtin_mips_precr_qb_ph (v2i16, v2i16);
-v2i16 __builtin_mips_precr_sra_ph_w (i32, i32, imm0_31);
-v2i16 __builtin_mips_precr_sra_r_ph_w (i32, i32, imm0_31);
-i32 __builtin_mips_prepend (i32, i32, imm0_31);
-v4i8 __builtin_mips_shra_qb (v4i8, imm0_7);
-v4i8 __builtin_mips_shra_r_qb (v4i8, imm0_7);
-v4i8 __builtin_mips_shra_qb (v4i8, i32);
-v4i8 __builtin_mips_shra_r_qb (v4i8, i32);
-v2i16 __builtin_mips_shrl_ph (v2i16, imm0_15);
-v2i16 __builtin_mips_shrl_ph (v2i16, i32);
-v2i16 __builtin_mips_subu_ph (v2i16, v2i16);
-v2i16 __builtin_mips_subu_s_ph (v2i16, v2i16);
-v4i8 __builtin_mips_subuh_qb (v4i8, v4i8);
-v4i8 __builtin_mips_subuh_r_qb (v4i8, v4i8);
-v2q15 __builtin_mips_addqh_ph (v2q15, v2q15);
-v2q15 __builtin_mips_addqh_r_ph (v2q15, v2q15);
-q31 __builtin_mips_addqh_w (q31, q31);
-q31 __builtin_mips_addqh_r_w (q31, q31);
-v2q15 __builtin_mips_subqh_ph (v2q15, v2q15);
-v2q15 __builtin_mips_subqh_r_ph (v2q15, v2q15);
-q31 __builtin_mips_subqh_w (q31, q31);
-q31 __builtin_mips_subqh_r_w (q31, q31);
-a64 __builtin_mips_dpax_w_ph (a64, v2i16, v2i16);
-a64 __builtin_mips_dpsx_w_ph (a64, v2i16, v2i16);
-a64 __builtin_mips_dpaqx_s_w_ph (a64, v2q15, v2q15);
-a64 __builtin_mips_dpaqx_sa_w_ph (a64, v2q15, v2q15);
-a64 __builtin_mips_dpsqx_s_w_ph (a64, v2q15, v2q15);
-a64 __builtin_mips_dpsqx_sa_w_ph (a64, v2q15, v2q15);
+_Decimal64 __builtin_dxex (_Decimal64);
+_Decimal128 __builtin_dxexq (_Decimal128);
+_Decimal64 __builtin_ddedpd (int, _Decimal64);
+_Decimal128 __builtin_ddedpdq (int, _Decimal128);
+_Decimal64 __builtin_denbcd (int, _Decimal64);
+_Decimal128 __builtin_denbcdq (int, _Decimal128);
+_Decimal64 __builtin_diex (_Decimal64, _Decimal64);
+_Decimal128 __builtin_diexq (_Decimal128, _Decimal128);
+_Decimal64 __builtin_dscli (_Decimal64, int);
+_Decimal128 __builtin_dscliq (_Decimal128, int);
+_Decimal64 __builtin_dscri (_Decimal64, int);
+_Decimal128 __builtin_dscriq (_Decimal128, int);
+unsigned long long __builtin_unpack_dec128 (_Decimal128, int);
+_Decimal128 __builtin_pack_dec128 (unsigned long long, unsigned long long);
 @end smallexample
 
+The following built-in functions are available for the PowerPC family
+of processors when the Vector Scalar (VSX) instruction set is
+available:
+@smallexample
+unsigned long long __builtin_unpack_vector_int128 (vector __int128_t, int);
+vector __int128_t __builtin_pack_vector_int128 (unsigned long long,
+                                                unsigned long long);
+@end smallexample
 
-@node MIPS Paired-Single Support
-@subsection MIPS Paired-Single Support
+@node PowerPC AltiVec/VSX Built-in Functions
+@subsection PowerPC AltiVec Built-in Functions
 
-The MIPS64 architecture includes a number of instructions that
-operate on pairs of single-precision floating-point values.
-Each pair is packed into a 64-bit floating-point register,
-with one element being designated the ``upper half'' and
-the other being designated the ``lower half''.
+GCC provides an interface for the PowerPC family of processors to access
+the AltiVec operations described in Motorola's AltiVec Programming
+Interface Manual.  The interface is made available by including
+@code{<altivec.h>} and using @option{-maltivec} and
+@option{-mabi=altivec}.  The interface supports the following vector
+types.
 
-GCC supports paired-single operations using both the generic
-vector extensions (@pxref{Vector Extensions}) and a collection of
-MIPS-specific built-in functions.  Both kinds of support are
-enabled by the @option{-mpaired-single} command-line option.
+@smallexample
+vector unsigned char
+vector signed char
+vector bool char
 
-The vector type associated with paired-single values is usually
-called @code{v2sf}.  It can be defined in C as follows:
+vector unsigned short
+vector signed short
+vector bool short
+vector pixel
 
-@smallexample
-typedef float v2sf __attribute__ ((vector_size (8)));
+vector unsigned int
+vector signed int
+vector bool int
+vector float
 @end smallexample
 
-@code{v2sf} values are initialized in the same way as aggregates.
-For example:
+If @option{-mvsx} is used the following additional vector types are
+implemented.
 
 @smallexample
-v2sf a = @{1.5, 9.1@};
-v2sf b;
-float e, f;
-b = (v2sf) @{e, f@};
+vector unsigned long
+vector signed long
+vector double
 @end smallexample
 
-@emph{Note:} The CPU's endianness determines which value is stored in
-the upper half of a register and which value is stored in the lower half.
-On little-endian targets, the first value is the lower one and the second
-value is the upper one.  The opposite order applies to big-endian targets.
-For example, the code above sets the lower half of @code{a} to
-@code{1.5} on little-endian targets and @code{9.1} on big-endian targets.
+The long types are only implemented for 64-bit code generation, and
+the long type is only used in the floating point/integer conversion
+instructions.
 
-@node MIPS Loongson Built-in Functions
-@subsection MIPS Loongson Built-in Functions
+GCC's implementation of the high-level language interface available from
+C and C++ code differs from Motorola's documentation in several ways.
 
-GCC provides intrinsics to access the SIMD instructions provided by the
-ST Microelectronics Loongson-2E and -2F processors.  These intrinsics,
-available after inclusion of the @code{loongson.h} header file,
-operate on the following 64-bit vector types:
+@itemize @bullet
 
-@itemize
-@item @code{uint8x8_t}, a vector of eight unsigned 8-bit integers;
-@item @code{uint16x4_t}, a vector of four unsigned 16-bit integers;
-@item @code{uint32x2_t}, a vector of two unsigned 32-bit integers;
-@item @code{int8x8_t}, a vector of eight signed 8-bit integers;
-@item @code{int16x4_t}, a vector of four signed 16-bit integers;
-@item @code{int32x2_t}, a vector of two signed 32-bit integers.
-@end itemize
+@item
+A vector constant is a list of constant expressions within curly braces.
 
-The intrinsics provided are listed below; each is named after the
-machine instruction to which it corresponds, with suffixes added as
-appropriate to distinguish intrinsics that expand to the same machine
-instruction yet have different argument types.  Refer to the architecture
-documentation for a description of the functionality of each
-instruction.
+@item
+A vector initializer requires no cast if the vector constant is of the
+same type as the variable it is initializing.
+
+@item
+If @code{signed} or @code{unsigned} is omitted, the signedness of the
+vector type is the default signedness of the base type.  The default
+varies depending on the operating system, so a portable program should
+always specify the signedness.
+
+@item
+Compiling with @option{-maltivec} adds keywords @code{__vector},
+@code{vector}, @code{__pixel}, @code{pixel}, @code{__bool} and
+@code{bool}.  When compiling ISO C, the context-sensitive substitution
+of the keywords @code{vector}, @code{pixel} and @code{bool} is
+disabled.  To use them, you must include @code{<altivec.h>} instead.
+
+@item
+GCC allows using a @code{typedef} name as the type specifier for a
+vector type.
+
+@item
+For C, overloaded functions are implemented with macros so the following
+does not work:
 
 @smallexample
-int16x4_t packsswh (int32x2_t s, int32x2_t t);
-int8x8_t packsshb (int16x4_t s, int16x4_t t);
-uint8x8_t packushb (uint16x4_t s, uint16x4_t t);
-uint32x2_t paddw_u (uint32x2_t s, uint32x2_t t);
-uint16x4_t paddh_u (uint16x4_t s, uint16x4_t t);
-uint8x8_t paddb_u (uint8x8_t s, uint8x8_t t);
-int32x2_t paddw_s (int32x2_t s, int32x2_t t);
-int16x4_t paddh_s (int16x4_t s, int16x4_t t);
-int8x8_t paddb_s (int8x8_t s, int8x8_t t);
-uint64_t paddd_u (uint64_t s, uint64_t t);
-int64_t paddd_s (int64_t s, int64_t t);
-int16x4_t paddsh (int16x4_t s, int16x4_t t);
-int8x8_t paddsb (int8x8_t s, int8x8_t t);
-uint16x4_t paddush (uint16x4_t s, uint16x4_t t);
-uint8x8_t paddusb (uint8x8_t s, uint8x8_t t);
-uint64_t pandn_ud (uint64_t s, uint64_t t);
-uint32x2_t pandn_uw (uint32x2_t s, uint32x2_t t);
-uint16x4_t pandn_uh (uint16x4_t s, uint16x4_t t);
-uint8x8_t pandn_ub (uint8x8_t s, uint8x8_t t);
-int64_t pandn_sd (int64_t s, int64_t t);
-int32x2_t pandn_sw (int32x2_t s, int32x2_t t);
-int16x4_t pandn_sh (int16x4_t s, int16x4_t t);
-int8x8_t pandn_sb (int8x8_t s, int8x8_t t);
-uint16x4_t pavgh (uint16x4_t s, uint16x4_t t);
-uint8x8_t pavgb (uint8x8_t s, uint8x8_t t);
-uint32x2_t pcmpeqw_u (uint32x2_t s, uint32x2_t t);
-uint16x4_t pcmpeqh_u (uint16x4_t s, uint16x4_t t);
-uint8x8_t pcmpeqb_u (uint8x8_t s, uint8x8_t t);
-int32x2_t pcmpeqw_s (int32x2_t s, int32x2_t t);
-int16x4_t pcmpeqh_s (int16x4_t s, int16x4_t t);
-int8x8_t pcmpeqb_s (int8x8_t s, int8x8_t t);
-uint32x2_t pcmpgtw_u (uint32x2_t s, uint32x2_t t);
-uint16x4_t pcmpgth_u (uint16x4_t s, uint16x4_t t);
-uint8x8_t pcmpgtb_u (uint8x8_t s, uint8x8_t t);
-int32x2_t pcmpgtw_s (int32x2_t s, int32x2_t t);
-int16x4_t pcmpgth_s (int16x4_t s, int16x4_t t);
-int8x8_t pcmpgtb_s (int8x8_t s, int8x8_t t);
-uint16x4_t pextrh_u (uint16x4_t s, int field);
-int16x4_t pextrh_s (int16x4_t s, int field);
-uint16x4_t pinsrh_0_u (uint16x4_t s, uint16x4_t t);
-uint16x4_t pinsrh_1_u (uint16x4_t s, uint16x4_t t);
-uint16x4_t pinsrh_2_u (uint16x4_t s, uint16x4_t t);
-uint16x4_t pinsrh_3_u (uint16x4_t s, uint16x4_t t);
-int16x4_t pinsrh_0_s (int16x4_t s, int16x4_t t);
-int16x4_t pinsrh_1_s (int16x4_t s, int16x4_t t);
-int16x4_t pinsrh_2_s (int16x4_t s, int16x4_t t);
-int16x4_t pinsrh_3_s (int16x4_t s, int16x4_t t);
-int32x2_t pmaddhw (int16x4_t s, int16x4_t t);
-int16x4_t pmaxsh (int16x4_t s, int16x4_t t);
-uint8x8_t pmaxub (uint8x8_t s, uint8x8_t t);
-int16x4_t pminsh (int16x4_t s, int16x4_t t);
-uint8x8_t pminub (uint8x8_t s, uint8x8_t t);
-uint8x8_t pmovmskb_u (uint8x8_t s);
-int8x8_t pmovmskb_s (int8x8_t s);
-uint16x4_t pmulhuh (uint16x4_t s, uint16x4_t t);
-int16x4_t pmulhh (int16x4_t s, int16x4_t t);
-int16x4_t pmullh (int16x4_t s, int16x4_t t);
-int64_t pmuluw (uint32x2_t s, uint32x2_t t);
-uint8x8_t pasubub (uint8x8_t s, uint8x8_t t);
-uint16x4_t biadd (uint8x8_t s);
-uint16x4_t psadbh (uint8x8_t s, uint8x8_t t);
-uint16x4_t pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order);
-int16x4_t pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order);
-uint16x4_t psllh_u (uint16x4_t s, uint8_t amount);
-int16x4_t psllh_s (int16x4_t s, uint8_t amount);
-uint32x2_t psllw_u (uint32x2_t s, uint8_t amount);
-int32x2_t psllw_s (int32x2_t s, uint8_t amount);
-uint16x4_t psrlh_u (uint16x4_t s, uint8_t amount);
-int16x4_t psrlh_s (int16x4_t s, uint8_t amount);
-uint32x2_t psrlw_u (uint32x2_t s, uint8_t amount);
-int32x2_t psrlw_s (int32x2_t s, uint8_t amount);
-uint16x4_t psrah_u (uint16x4_t s, uint8_t amount);
-int16x4_t psrah_s (int16x4_t s, uint8_t amount);
-uint32x2_t psraw_u (uint32x2_t s, uint8_t amount);
-int32x2_t psraw_s (int32x2_t s, uint8_t amount);
-uint32x2_t psubw_u (uint32x2_t s, uint32x2_t t);
-uint16x4_t psubh_u (uint16x4_t s, uint16x4_t t);
-uint8x8_t psubb_u (uint8x8_t s, uint8x8_t t);
-int32x2_t psubw_s (int32x2_t s, int32x2_t t);
-int16x4_t psubh_s (int16x4_t s, int16x4_t t);
-int8x8_t psubb_s (int8x8_t s, int8x8_t t);
-uint64_t psubd_u (uint64_t s, uint64_t t);
-int64_t psubd_s (int64_t s, int64_t t);
-int16x4_t psubsh (int16x4_t s, int16x4_t t);
-int8x8_t psubsb (int8x8_t s, int8x8_t t);
-uint16x4_t psubush (uint16x4_t s, uint16x4_t t);
-uint8x8_t psubusb (uint8x8_t s, uint8x8_t t);
-uint32x2_t punpckhwd_u (uint32x2_t s, uint32x2_t t);
-uint16x4_t punpckhhw_u (uint16x4_t s, uint16x4_t t);
-uint8x8_t punpckhbh_u (uint8x8_t s, uint8x8_t t);
-int32x2_t punpckhwd_s (int32x2_t s, int32x2_t t);
-int16x4_t punpckhhw_s (int16x4_t s, int16x4_t t);
-int8x8_t punpckhbh_s (int8x8_t s, int8x8_t t);
-uint32x2_t punpcklwd_u (uint32x2_t s, uint32x2_t t);
-uint16x4_t punpcklhw_u (uint16x4_t s, uint16x4_t t);
-uint8x8_t punpcklbh_u (uint8x8_t s, uint8x8_t t);
-int32x2_t punpcklwd_s (int32x2_t s, int32x2_t t);
-int16x4_t punpcklhw_s (int16x4_t s, int16x4_t t);
-int8x8_t punpcklbh_s (int8x8_t s, int8x8_t t);
+  vec_add ((vector signed int)@{1, 2, 3, 4@}, foo);
 @end smallexample
 
-@menu
-* Paired-Single Arithmetic::
-* Paired-Single Built-in Functions::
-* MIPS-3D Built-in Functions::
-@end menu
+@noindent
+Since @code{vec_add} is a macro, the vector constant in the example
+is treated as four separate arguments.  Wrap the entire argument in
+parentheses for this to work.
+@end itemize
+
+@emph{Note:} Only the @code{<altivec.h>} interface is supported.
+Internally, GCC uses built-in functions to achieve the functionality in
+the aforementioned header file, but they are not supported and are
+subject to change without notice.
+
+The following interfaces are supported for the generic and specific
+AltiVec operations and the AltiVec predicates.  In cases where there
+is a direct mapping between generic and specific operations, only the
+generic names are shown here, although the specific operations can also
+be used.
+
+Arguments that are documented as @code{const int} require literal
+integral values within the range required for that operation.
+
+@smallexample
+vector signed char vec_abs (vector signed char);
+vector signed short vec_abs (vector signed short);
+vector signed int vec_abs (vector signed int);
+vector float vec_abs (vector float);
+
+vector signed char vec_abss (vector signed char);
+vector signed short vec_abss (vector signed short);
+vector signed int vec_abss (vector signed int);
+
+vector signed char vec_add (vector bool char, vector signed char);
+vector signed char vec_add (vector signed char, vector bool char);
+vector signed char vec_add (vector signed char, vector signed char);
+vector unsigned char vec_add (vector bool char, vector unsigned char);
+vector unsigned char vec_add (vector unsigned char, vector bool char);
+vector unsigned char vec_add (vector unsigned char,
+                              vector unsigned char);
+vector signed short vec_add (vector bool short, vector signed short);
+vector signed short vec_add (vector signed short, vector bool short);
+vector signed short vec_add (vector signed short, vector signed short);
+vector unsigned short vec_add (vector bool short,
+                               vector unsigned short);
+vector unsigned short vec_add (vector unsigned short,
+                               vector bool short);
+vector unsigned short vec_add (vector unsigned short,
+                               vector unsigned short);
+vector signed int vec_add (vector bool int, vector signed int);
+vector signed int vec_add (vector signed int, vector bool int);
+vector signed int vec_add (vector signed int, vector signed int);
+vector unsigned int vec_add (vector bool int, vector unsigned int);
+vector unsigned int vec_add (vector unsigned int, vector bool int);
+vector unsigned int vec_add (vector unsigned int, vector unsigned int);
+vector float vec_add (vector float, vector float);
+
+vector float vec_vaddfp (vector float, vector float);
 
-@node Paired-Single Arithmetic
-@subsubsection Paired-Single Arithmetic
+vector signed int vec_vadduwm (vector bool int, vector signed int);
+vector signed int vec_vadduwm (vector signed int, vector bool int);
+vector signed int vec_vadduwm (vector signed int, vector signed int);
+vector unsigned int vec_vadduwm (vector bool int, vector unsigned int);
+vector unsigned int vec_vadduwm (vector unsigned int, vector bool int);
+vector unsigned int vec_vadduwm (vector unsigned int,
+                                 vector unsigned int);
 
-The table below lists the @code{v2sf} operations for which hardware
-support exists.  @code{a}, @code{b} and @code{c} are @code{v2sf}
-values and @code{x} is an integral value.
+vector signed short vec_vadduhm (vector bool short,
+                                 vector signed short);
+vector signed short vec_vadduhm (vector signed short,
+                                 vector bool short);
+vector signed short vec_vadduhm (vector signed short,
+                                 vector signed short);
+vector unsigned short vec_vadduhm (vector bool short,
+                                   vector unsigned short);
+vector unsigned short vec_vadduhm (vector unsigned short,
+                                   vector bool short);
+vector unsigned short vec_vadduhm (vector unsigned short,
+                                   vector unsigned short);
 
-@multitable @columnfractions .50 .50
-@item C code @tab MIPS instruction
-@item @code{a + b} @tab @code{add.ps}
-@item @code{a - b} @tab @code{sub.ps}
-@item @code{-a} @tab @code{neg.ps}
-@item @code{a * b} @tab @code{mul.ps}
-@item @code{a * b + c} @tab @code{madd.ps}
-@item @code{a * b - c} @tab @code{msub.ps}
-@item @code{-(a * b + c)} @tab @code{nmadd.ps}
-@item @code{-(a * b - c)} @tab @code{nmsub.ps}
-@item @code{x ? a : b} @tab @code{movn.ps}/@code{movz.ps}
-@end multitable
+vector signed char vec_vaddubm (vector bool char, vector signed char);
+vector signed char vec_vaddubm (vector signed char, vector bool char);
+vector signed char vec_vaddubm (vector signed char, vector signed char);
+vector unsigned char vec_vaddubm (vector bool char,
+                                  vector unsigned char);
+vector unsigned char vec_vaddubm (vector unsigned char,
+                                  vector bool char);
+vector unsigned char vec_vaddubm (vector unsigned char,
+                                  vector unsigned char);
 
-Note that the multiply-accumulate instructions can be disabled
-using the command-line option @code{-mno-fused-madd}.
+vector unsigned int vec_addc (vector unsigned int, vector unsigned int);
 
-@node Paired-Single Built-in Functions
-@subsubsection Paired-Single Built-in Functions
+vector unsigned char vec_adds (vector bool char, vector unsigned char);
+vector unsigned char vec_adds (vector unsigned char, vector bool char);
+vector unsigned char vec_adds (vector unsigned char,
+                               vector unsigned char);
+vector signed char vec_adds (vector bool char, vector signed char);
+vector signed char vec_adds (vector signed char, vector bool char);
+vector signed char vec_adds (vector signed char, vector signed char);
+vector unsigned short vec_adds (vector bool short,
+                                vector unsigned short);
+vector unsigned short vec_adds (vector unsigned short,
+                                vector bool short);
+vector unsigned short vec_adds (vector unsigned short,
+                                vector unsigned short);
+vector signed short vec_adds (vector bool short, vector signed short);
+vector signed short vec_adds (vector signed short, vector bool short);
+vector signed short vec_adds (vector signed short, vector signed short);
+vector unsigned int vec_adds (vector bool int, vector unsigned int);
+vector unsigned int vec_adds (vector unsigned int, vector bool int);
+vector unsigned int vec_adds (vector unsigned int, vector unsigned int);
+vector signed int vec_adds (vector bool int, vector signed int);
+vector signed int vec_adds (vector signed int, vector bool int);
+vector signed int vec_adds (vector signed int, vector signed int);
 
-The following paired-single functions map directly to a particular
-MIPS instruction.  Please refer to the architecture specification
-for details on what each instruction does.
+vector signed int vec_vaddsws (vector bool int, vector signed int);
+vector signed int vec_vaddsws (vector signed int, vector bool int);
+vector signed int vec_vaddsws (vector signed int, vector signed int);
 
-@table @code
-@item v2sf __builtin_mips_pll_ps (v2sf, v2sf)
-Pair lower lower (@code{pll.ps}).
+vector unsigned int vec_vadduws (vector bool int, vector unsigned int);
+vector unsigned int vec_vadduws (vector unsigned int, vector bool int);
+vector unsigned int vec_vadduws (vector unsigned int,
+                                 vector unsigned int);
 
-@item v2sf __builtin_mips_pul_ps (v2sf, v2sf)
-Pair upper lower (@code{pul.ps}).
+vector signed short vec_vaddshs (vector bool short,
+                                 vector signed short);
+vector signed short vec_vaddshs (vector signed short,
+                                 vector bool short);
+vector signed short vec_vaddshs (vector signed short,
+                                 vector signed short);
 
-@item v2sf __builtin_mips_plu_ps (v2sf, v2sf)
-Pair lower upper (@code{plu.ps}).
+vector unsigned short vec_vadduhs (vector bool short,
+                                   vector unsigned short);
+vector unsigned short vec_vadduhs (vector unsigned short,
+                                   vector bool short);
+vector unsigned short vec_vadduhs (vector unsigned short,
+                                   vector unsigned short);
 
-@item v2sf __builtin_mips_puu_ps (v2sf, v2sf)
-Pair upper upper (@code{puu.ps}).
+vector signed char vec_vaddsbs (vector bool char, vector signed char);
+vector signed char vec_vaddsbs (vector signed char, vector bool char);
+vector signed char vec_vaddsbs (vector signed char, vector signed char);
 
-@item v2sf __builtin_mips_cvt_ps_s (float, float)
-Convert pair to paired single (@code{cvt.ps.s}).
+vector unsigned char vec_vaddubs (vector bool char,
+                                  vector unsigned char);
+vector unsigned char vec_vaddubs (vector unsigned char,
+                                  vector bool char);
+vector unsigned char vec_vaddubs (vector unsigned char,
+                                  vector unsigned char);
 
-@item float __builtin_mips_cvt_s_pl (v2sf)
-Convert pair lower to single (@code{cvt.s.pl}).
+vector float vec_and (vector float, vector float);
+vector float vec_and (vector float, vector bool int);
+vector float vec_and (vector bool int, vector float);
+vector bool int vec_and (vector bool int, vector bool int);
+vector signed int vec_and (vector bool int, vector signed int);
+vector signed int vec_and (vector signed int, vector bool int);
+vector signed int vec_and (vector signed int, vector signed int);
+vector unsigned int vec_and (vector bool int, vector unsigned int);
+vector unsigned int vec_and (vector unsigned int, vector bool int);
+vector unsigned int vec_and (vector unsigned int, vector unsigned int);
+vector bool short vec_and (vector bool short, vector bool short);
+vector signed short vec_and (vector bool short, vector signed short);
+vector signed short vec_and (vector signed short, vector bool short);
+vector signed short vec_and (vector signed short, vector signed short);
+vector unsigned short vec_and (vector bool short,
+                               vector unsigned short);
+vector unsigned short vec_and (vector unsigned short,
+                               vector bool short);
+vector unsigned short vec_and (vector unsigned short,
+                               vector unsigned short);
+vector signed char vec_and (vector bool char, vector signed char);
+vector bool char vec_and (vector bool char, vector bool char);
+vector signed char vec_and (vector signed char, vector bool char);
+vector signed char vec_and (vector signed char, vector signed char);
+vector unsigned char vec_and (vector bool char, vector unsigned char);
+vector unsigned char vec_and (vector unsigned char, vector bool char);
+vector unsigned char vec_and (vector unsigned char,
+                              vector unsigned char);
 
-@item float __builtin_mips_cvt_s_pu (v2sf)
-Convert pair upper to single (@code{cvt.s.pu}).
+vector float vec_andc (vector float, vector float);
+vector float vec_andc (vector float, vector bool int);
+vector float vec_andc (vector bool int, vector float);
+vector bool int vec_andc (vector bool int, vector bool int);
+vector signed int vec_andc (vector bool int, vector signed int);
+vector signed int vec_andc (vector signed int, vector bool int);
+vector signed int vec_andc (vector signed int, vector signed int);
+vector unsigned int vec_andc (vector bool int, vector unsigned int);
+vector unsigned int vec_andc (vector unsigned int, vector bool int);
+vector unsigned int vec_andc (vector unsigned int, vector unsigned int);
+vector bool short vec_andc (vector bool short, vector bool short);
+vector signed short vec_andc (vector bool short, vector signed short);
+vector signed short vec_andc (vector signed short, vector bool short);
+vector signed short vec_andc (vector signed short, vector signed short);
+vector unsigned short vec_andc (vector bool short,
+                                vector unsigned short);
+vector unsigned short vec_andc (vector unsigned short,
+                                vector bool short);
+vector unsigned short vec_andc (vector unsigned short,
+                                vector unsigned short);
+vector signed char vec_andc (vector bool char, vector signed char);
+vector bool char vec_andc (vector bool char, vector bool char);
+vector signed char vec_andc (vector signed char, vector bool char);
+vector signed char vec_andc (vector signed char, vector signed char);
+vector unsigned char vec_andc (vector bool char, vector unsigned char);
+vector unsigned char vec_andc (vector unsigned char, vector bool char);
+vector unsigned char vec_andc (vector unsigned char,
+                               vector unsigned char);
 
-@item v2sf __builtin_mips_abs_ps (v2sf)
-Absolute value (@code{abs.ps}).
+vector unsigned char vec_avg (vector unsigned char,
+                              vector unsigned char);
+vector signed char vec_avg (vector signed char, vector signed char);
+vector unsigned short vec_avg (vector unsigned short,
+                               vector unsigned short);
+vector signed short vec_avg (vector signed short, vector signed short);
+vector unsigned int vec_avg (vector unsigned int, vector unsigned int);
+vector signed int vec_avg (vector signed int, vector signed int);
 
-@item v2sf __builtin_mips_alnv_ps (v2sf, v2sf, int)
-Align variable (@code{alnv.ps}).
+vector signed int vec_vavgsw (vector signed int, vector signed int);
 
-@emph{Note:} The value of the third parameter must be 0 or 4
-modulo 8, otherwise the result is unpredictable.  Please read the
-instruction description for details.
-@end table
+vector unsigned int vec_vavguw (vector unsigned int,
+                                vector unsigned int);
 
-The following multi-instruction functions are also available.
-In each case, @var{cond} can be any of the 16 floating-point conditions:
-@code{f}, @code{un}, @code{eq}, @code{ueq}, @code{olt}, @code{ult},
-@code{ole}, @code{ule}, @code{sf}, @code{ngle}, @code{seq}, @code{ngl},
-@code{lt}, @code{nge}, @code{le} or @code{ngt}.
+vector signed short vec_vavgsh (vector signed short,
+                                vector signed short);
 
-@table @code
-@item v2sf __builtin_mips_movt_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-@itemx v2sf __builtin_mips_movf_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-Conditional move based on floating-point comparison (@code{c.@var{cond}.ps},
-@code{movt.ps}/@code{movf.ps}).
+vector unsigned short vec_vavguh (vector unsigned short,
+                                  vector unsigned short);
 
-The @code{movt} functions return the value @var{x} computed by:
+vector signed char vec_vavgsb (vector signed char, vector signed char);
 
-@smallexample
-c.@var{cond}.ps @var{cc},@var{a},@var{b}
-mov.ps @var{x},@var{c}
-movt.ps @var{x},@var{d},@var{cc}
-@end smallexample
+vector unsigned char vec_vavgub (vector unsigned char,
+                                 vector unsigned char);
+
+vector float vec_copysign (vector float);
+
+vector float vec_ceil (vector float);
+
+vector signed int vec_cmpb (vector float, vector float);
+
+vector bool char vec_cmpeq (vector signed char, vector signed char);
+vector bool char vec_cmpeq (vector unsigned char, vector unsigned char);
+vector bool short vec_cmpeq (vector signed short, vector signed short);
+vector bool short vec_cmpeq (vector unsigned short,
+                             vector unsigned short);
+vector bool int vec_cmpeq (vector signed int, vector signed int);
+vector bool int vec_cmpeq (vector unsigned int, vector unsigned int);
+vector bool int vec_cmpeq (vector float, vector float);
+
+vector bool int vec_vcmpeqfp (vector float, vector float);
 
-The @code{movf} functions are similar but use @code{movf.ps} instead
-of @code{movt.ps}.
+vector bool int vec_vcmpequw (vector signed int, vector signed int);
+vector bool int vec_vcmpequw (vector unsigned int, vector unsigned int);
 
-@item int __builtin_mips_upper_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-@itemx int __builtin_mips_lower_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-Comparison of two paired-single values (@code{c.@var{cond}.ps},
-@code{bc1t}/@code{bc1f}).
+vector bool short vec_vcmpequh (vector signed short,
+                                vector signed short);
+vector bool short vec_vcmpequh (vector unsigned short,
+                                vector unsigned short);
 
-These functions compare @var{a} and @var{b} using @code{c.@var{cond}.ps}
-and return either the upper or lower half of the result.  For example:
+vector bool char vec_vcmpequb (vector signed char, vector signed char);
+vector bool char vec_vcmpequb (vector unsigned char,
+                               vector unsigned char);
 
-@smallexample
-v2sf a, b;
-if (__builtin_mips_upper_c_eq_ps (a, b))
-  upper_halves_are_equal ();
-else
-  upper_halves_are_unequal ();
+vector bool int vec_cmpge (vector float, vector float);
 
-if (__builtin_mips_lower_c_eq_ps (a, b))
-  lower_halves_are_equal ();
-else
-  lower_halves_are_unequal ();
-@end smallexample
-@end table
+vector bool char vec_cmpgt (vector unsigned char, vector unsigned char);
+vector bool char vec_cmpgt (vector signed char, vector signed char);
+vector bool short vec_cmpgt (vector unsigned short,
+                             vector unsigned short);
+vector bool short vec_cmpgt (vector signed short, vector signed short);
+vector bool int vec_cmpgt (vector unsigned int, vector unsigned int);
+vector bool int vec_cmpgt (vector signed int, vector signed int);
+vector bool int vec_cmpgt (vector float, vector float);
 
-@node MIPS-3D Built-in Functions
-@subsubsection MIPS-3D Built-in Functions
+vector bool int vec_vcmpgtfp (vector float, vector float);
 
-The MIPS-3D Application-Specific Extension (ASE) includes additional
-paired-single instructions that are designed to improve the performance
-of 3D graphics operations.  Support for these instructions is controlled
-by the @option{-mips3d} command-line option.
+vector bool int vec_vcmpgtsw (vector signed int, vector signed int);
 
-The functions listed below map directly to a particular MIPS-3D
-instruction.  Please refer to the architecture specification for
-more details on what each instruction does.
+vector bool int vec_vcmpgtuw (vector unsigned int, vector unsigned int);
 
-@table @code
-@item v2sf __builtin_mips_addr_ps (v2sf, v2sf)
-Reduction add (@code{addr.ps}).
+vector bool short vec_vcmpgtsh (vector signed short,
+                                vector signed short);
 
-@item v2sf __builtin_mips_mulr_ps (v2sf, v2sf)
-Reduction multiply (@code{mulr.ps}).
+vector bool short vec_vcmpgtuh (vector unsigned short,
+                                vector unsigned short);
 
-@item v2sf __builtin_mips_cvt_pw_ps (v2sf)
-Convert paired single to paired word (@code{cvt.pw.ps}).
+vector bool char vec_vcmpgtsb (vector signed char, vector signed char);
 
-@item v2sf __builtin_mips_cvt_ps_pw (v2sf)
-Convert paired word to paired single (@code{cvt.ps.pw}).
+vector bool char vec_vcmpgtub (vector unsigned char,
+                               vector unsigned char);
 
-@item float __builtin_mips_recip1_s (float)
-@itemx double __builtin_mips_recip1_d (double)
-@itemx v2sf __builtin_mips_recip1_ps (v2sf)
-Reduced-precision reciprocal (sequence step 1) (@code{recip1.@var{fmt}}).
+vector bool int vec_cmple (vector float, vector float);
 
-@item float __builtin_mips_recip2_s (float, float)
-@itemx double __builtin_mips_recip2_d (double, double)
-@itemx v2sf __builtin_mips_recip2_ps (v2sf, v2sf)
-Reduced-precision reciprocal (sequence step 2) (@code{recip2.@var{fmt}}).
+vector bool char vec_cmplt (vector unsigned char, vector unsigned char);
+vector bool char vec_cmplt (vector signed char, vector signed char);
+vector bool short vec_cmplt (vector unsigned short,
+                             vector unsigned short);
+vector bool short vec_cmplt (vector signed short, vector signed short);
+vector bool int vec_cmplt (vector unsigned int, vector unsigned int);
+vector bool int vec_cmplt (vector signed int, vector signed int);
+vector bool int vec_cmplt (vector float, vector float);
 
-@item float __builtin_mips_rsqrt1_s (float)
-@itemx double __builtin_mips_rsqrt1_d (double)
-@itemx v2sf __builtin_mips_rsqrt1_ps (v2sf)
-Reduced-precision reciprocal square root (sequence step 1)
-(@code{rsqrt1.@var{fmt}}).
+vector float vec_cpsgn (vector float, vector float);
 
-@item float __builtin_mips_rsqrt2_s (float, float)
-@itemx double __builtin_mips_rsqrt2_d (double, double)
-@itemx v2sf __builtin_mips_rsqrt2_ps (v2sf, v2sf)
-Reduced-precision reciprocal square root (sequence step 2)
-(@code{rsqrt2.@var{fmt}}).
-@end table
+vector float vec_ctf (vector unsigned int, const int);
+vector float vec_ctf (vector signed int, const int);
+vector double vec_ctf (vector unsigned long, const int);
+vector double vec_ctf (vector signed long, const int);
 
-The following multi-instruction functions are also available.
-In each case, @var{cond} can be any of the 16 floating-point conditions:
-@code{f}, @code{un}, @code{eq}, @code{ueq}, @code{olt}, @code{ult},
-@code{ole}, @code{ule}, @code{sf}, @code{ngle}, @code{seq},
-@code{ngl}, @code{lt}, @code{nge}, @code{le} or @code{ngt}.
+vector float vec_vcfsx (vector signed int, const int);
 
-@table @code
-@item int __builtin_mips_cabs_@var{cond}_s (float @var{a}, float @var{b})
-@itemx int __builtin_mips_cabs_@var{cond}_d (double @var{a}, double @var{b})
-Absolute comparison of two scalar values (@code{cabs.@var{cond}.@var{fmt}},
-@code{bc1t}/@code{bc1f}).
+vector float vec_vcfux (vector unsigned int, const int);
 
-These functions compare @var{a} and @var{b} using @code{cabs.@var{cond}.s}
-or @code{cabs.@var{cond}.d} and return the result as a boolean value.
-For example:
+vector signed int vec_cts (vector float, const int);
+vector signed long vec_cts (vector double, const int);
 
-@smallexample
-float a, b;
-if (__builtin_mips_cabs_eq_s (a, b))
-  true ();
-else
-  false ();
-@end smallexample
+vector unsigned int vec_ctu (vector float, const int);
+vector unsigned long vec_ctu (vector double, const int);
 
-@item int __builtin_mips_upper_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-@itemx int __builtin_mips_lower_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-Absolute comparison of two paired-single values (@code{cabs.@var{cond}.ps},
-@code{bc1t}/@code{bc1f}).
+void vec_dss (const int);
 
-These functions compare @var{a} and @var{b} using @code{cabs.@var{cond}.ps}
-and return either the upper or lower half of the result.  For example:
+void vec_dssall (void);
 
-@smallexample
-v2sf a, b;
-if (__builtin_mips_upper_cabs_eq_ps (a, b))
-  upper_halves_are_equal ();
-else
-  upper_halves_are_unequal ();
+void vec_dst (const vector unsigned char *, int, const int);
+void vec_dst (const vector signed char *, int, const int);
+void vec_dst (const vector bool char *, int, const int);
+void vec_dst (const vector unsigned short *, int, const int);
+void vec_dst (const vector signed short *, int, const int);
+void vec_dst (const vector bool short *, int, const int);
+void vec_dst (const vector pixel *, int, const int);
+void vec_dst (const vector unsigned int *, int, const int);
+void vec_dst (const vector signed int *, int, const int);
+void vec_dst (const vector bool int *, int, const int);
+void vec_dst (const vector float *, int, const int);
+void vec_dst (const unsigned char *, int, const int);
+void vec_dst (const signed char *, int, const int);
+void vec_dst (const unsigned short *, int, const int);
+void vec_dst (const short *, int, const int);
+void vec_dst (const unsigned int *, int, const int);
+void vec_dst (const int *, int, const int);
+void vec_dst (const unsigned long *, int, const int);
+void vec_dst (const long *, int, const int);
+void vec_dst (const float *, int, const int);
 
-if (__builtin_mips_lower_cabs_eq_ps (a, b))
-  lower_halves_are_equal ();
-else
-  lower_halves_are_unequal ();
-@end smallexample
+void vec_dstst (const vector unsigned char *, int, const int);
+void vec_dstst (const vector signed char *, int, const int);
+void vec_dstst (const vector bool char *, int, const int);
+void vec_dstst (const vector unsigned short *, int, const int);
+void vec_dstst (const vector signed short *, int, const int);
+void vec_dstst (const vector bool short *, int, const int);
+void vec_dstst (const vector pixel *, int, const int);
+void vec_dstst (const vector unsigned int *, int, const int);
+void vec_dstst (const vector signed int *, int, const int);
+void vec_dstst (const vector bool int *, int, const int);
+void vec_dstst (const vector float *, int, const int);
+void vec_dstst (const unsigned char *, int, const int);
+void vec_dstst (const signed char *, int, const int);
+void vec_dstst (const unsigned short *, int, const int);
+void vec_dstst (const short *, int, const int);
+void vec_dstst (const unsigned int *, int, const int);
+void vec_dstst (const int *, int, const int);
+void vec_dstst (const unsigned long *, int, const int);
+void vec_dstst (const long *, int, const int);
+void vec_dstst (const float *, int, const int);
 
-@item v2sf __builtin_mips_movt_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-@itemx v2sf __builtin_mips_movf_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-Conditional move based on absolute comparison (@code{cabs.@var{cond}.ps},
-@code{movt.ps}/@code{movf.ps}).
+void vec_dststt (const vector unsigned char *, int, const int);
+void vec_dststt (const vector signed char *, int, const int);
+void vec_dststt (const vector bool char *, int, const int);
+void vec_dststt (const vector unsigned short *, int, const int);
+void vec_dststt (const vector signed short *, int, const int);
+void vec_dststt (const vector bool short *, int, const int);
+void vec_dststt (const vector pixel *, int, const int);
+void vec_dststt (const vector unsigned int *, int, const int);
+void vec_dststt (const vector signed int *, int, const int);
+void vec_dststt (const vector bool int *, int, const int);
+void vec_dststt (const vector float *, int, const int);
+void vec_dststt (const unsigned char *, int, const int);
+void vec_dststt (const signed char *, int, const int);
+void vec_dststt (const unsigned short *, int, const int);
+void vec_dststt (const short *, int, const int);
+void vec_dststt (const unsigned int *, int, const int);
+void vec_dststt (const int *, int, const int);
+void vec_dststt (const unsigned long *, int, const int);
+void vec_dststt (const long *, int, const int);
+void vec_dststt (const float *, int, const int);
 
-The @code{movt} functions return the value @var{x} computed by:
+void vec_dstt (const vector unsigned char *, int, const int);
+void vec_dstt (const vector signed char *, int, const int);
+void vec_dstt (const vector bool char *, int, const int);
+void vec_dstt (const vector unsigned short *, int, const int);
+void vec_dstt (const vector signed short *, int, const int);
+void vec_dstt (const vector bool short *, int, const int);
+void vec_dstt (const vector pixel *, int, const int);
+void vec_dstt (const vector unsigned int *, int, const int);
+void vec_dstt (const vector signed int *, int, const int);
+void vec_dstt (const vector bool int *, int, const int);
+void vec_dstt (const vector float *, int, const int);
+void vec_dstt (const unsigned char *, int, const int);
+void vec_dstt (const signed char *, int, const int);
+void vec_dstt (const unsigned short *, int, const int);
+void vec_dstt (const short *, int, const int);
+void vec_dstt (const unsigned int *, int, const int);
+void vec_dstt (const int *, int, const int);
+void vec_dstt (const unsigned long *, int, const int);
+void vec_dstt (const long *, int, const int);
+void vec_dstt (const float *, int, const int);
 
-@smallexample
-cabs.@var{cond}.ps @var{cc},@var{a},@var{b}
-mov.ps @var{x},@var{c}
-movt.ps @var{x},@var{d},@var{cc}
-@end smallexample
+vector float vec_expte (vector float);
+
+vector float vec_floor (vector float);
+
+vector float vec_ld (int, const vector float *);
+vector float vec_ld (int, const float *);
+vector bool int vec_ld (int, const vector bool int *);
+vector signed int vec_ld (int, const vector signed int *);
+vector signed int vec_ld (int, const int *);
+vector signed int vec_ld (int, const long *);
+vector unsigned int vec_ld (int, const vector unsigned int *);
+vector unsigned int vec_ld (int, const unsigned int *);
+vector unsigned int vec_ld (int, const unsigned long *);
+vector bool short vec_ld (int, const vector bool short *);
+vector pixel vec_ld (int, const vector pixel *);
+vector signed short vec_ld (int, const vector signed short *);
+vector signed short vec_ld (int, const short *);
+vector unsigned short vec_ld (int, const vector unsigned short *);
+vector unsigned short vec_ld (int, const unsigned short *);
+vector bool char vec_ld (int, const vector bool char *);
+vector signed char vec_ld (int, const vector signed char *);
+vector signed char vec_ld (int, const signed char *);
+vector unsigned char vec_ld (int, const vector unsigned char *);
+vector unsigned char vec_ld (int, const unsigned char *);
 
-The @code{movf} functions are similar but use @code{movf.ps} instead
-of @code{movt.ps}.
+vector signed char vec_lde (int, const signed char *);
+vector unsigned char vec_lde (int, const unsigned char *);
+vector signed short vec_lde (int, const short *);
+vector unsigned short vec_lde (int, const unsigned short *);
+vector float vec_lde (int, const float *);
+vector signed int vec_lde (int, const int *);
+vector unsigned int vec_lde (int, const unsigned int *);
+vector signed int vec_lde (int, const long *);
+vector unsigned int vec_lde (int, const unsigned long *);
 
-@item int __builtin_mips_any_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-@itemx int __builtin_mips_all_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-@itemx int __builtin_mips_any_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-@itemx int __builtin_mips_all_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-Comparison of two paired-single values
-(@code{c.@var{cond}.ps}/@code{cabs.@var{cond}.ps},
-@code{bc1any2t}/@code{bc1any2f}).
+vector float vec_lvewx (int, float *);
+vector signed int vec_lvewx (int, int *);
+vector unsigned int vec_lvewx (int, unsigned int *);
+vector signed int vec_lvewx (int, long *);
+vector unsigned int vec_lvewx (int, unsigned long *);
 
-These functions compare @var{a} and @var{b} using @code{c.@var{cond}.ps}
-or @code{cabs.@var{cond}.ps}.  The @code{any} forms return true if either
-result is true and the @code{all} forms return true if both results are true.
-For example:
+vector signed short vec_lvehx (int, short *);
+vector unsigned short vec_lvehx (int, unsigned short *);
 
-@smallexample
-v2sf a, b;
-if (__builtin_mips_any_c_eq_ps (a, b))
-  one_is_true ();
-else
-  both_are_false ();
+vector signed char vec_lvebx (int, char *);
+vector unsigned char vec_lvebx (int, unsigned char *);
 
-if (__builtin_mips_all_c_eq_ps (a, b))
-  both_are_true ();
-else
-  one_is_false ();
-@end smallexample
+vector float vec_ldl (int, const vector float *);
+vector float vec_ldl (int, const float *);
+vector bool int vec_ldl (int, const vector bool int *);
+vector signed int vec_ldl (int, const vector signed int *);
+vector signed int vec_ldl (int, const int *);
+vector signed int vec_ldl (int, const long *);
+vector unsigned int vec_ldl (int, const vector unsigned int *);
+vector unsigned int vec_ldl (int, const unsigned int *);
+vector unsigned int vec_ldl (int, const unsigned long *);
+vector bool short vec_ldl (int, const vector bool short *);
+vector pixel vec_ldl (int, const vector pixel *);
+vector signed short vec_ldl (int, const vector signed short *);
+vector signed short vec_ldl (int, const short *);
+vector unsigned short vec_ldl (int, const vector unsigned short *);
+vector unsigned short vec_ldl (int, const unsigned short *);
+vector bool char vec_ldl (int, const vector bool char *);
+vector signed char vec_ldl (int, const vector signed char *);
+vector signed char vec_ldl (int, const signed char *);
+vector unsigned char vec_ldl (int, const vector unsigned char *);
+vector unsigned char vec_ldl (int, const unsigned char *);
 
-@item int __builtin_mips_any_c_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-@itemx int __builtin_mips_all_c_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-@itemx int __builtin_mips_any_cabs_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-@itemx int __builtin_mips_all_cabs_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-Comparison of four paired-single values
-(@code{c.@var{cond}.ps}/@code{cabs.@var{cond}.ps},
-@code{bc1any4t}/@code{bc1any4f}).
+vector float vec_loge (vector float);
 
-These functions use @code{c.@var{cond}.ps} or @code{cabs.@var{cond}.ps}
-to compare @var{a} with @var{b} and to compare @var{c} with @var{d}.
-The @code{any} forms return true if any of the four results are true
-and the @code{all} forms return true if all four results are true.
-For example:
+vector unsigned char vec_lvsl (int, const volatile unsigned char *);
+vector unsigned char vec_lvsl (int, const volatile signed char *);
+vector unsigned char vec_lvsl (int, const volatile unsigned short *);
+vector unsigned char vec_lvsl (int, const volatile short *);
+vector unsigned char vec_lvsl (int, const volatile unsigned int *);
+vector unsigned char vec_lvsl (int, const volatile int *);
+vector unsigned char vec_lvsl (int, const volatile unsigned long *);
+vector unsigned char vec_lvsl (int, const volatile long *);
+vector unsigned char vec_lvsl (int, const volatile float *);
 
-@smallexample
-v2sf a, b, c, d;
-if (__builtin_mips_any_c_eq_4s (a, b, c, d))
-  some_are_true ();
-else
-  all_are_false ();
+vector unsigned char vec_lvsr (int, const volatile unsigned char *);
+vector unsigned char vec_lvsr (int, const volatile signed char *);
+vector unsigned char vec_lvsr (int, const volatile unsigned short *);
+vector unsigned char vec_lvsr (int, const volatile short *);
+vector unsigned char vec_lvsr (int, const volatile unsigned int *);
+vector unsigned char vec_lvsr (int, const volatile int *);
+vector unsigned char vec_lvsr (int, const volatile unsigned long *);
+vector unsigned char vec_lvsr (int, const volatile long *);
+vector unsigned char vec_lvsr (int, const volatile float *);
 
-if (__builtin_mips_all_c_eq_4s (a, b, c, d))
-  all_are_true ();
-else
-  some_are_false ();
-@end smallexample
-@end table
+vector float vec_madd (vector float, vector float, vector float);
 
-@node Other MIPS Built-in Functions
-@subsection Other MIPS Built-in Functions
+vector signed short vec_madds (vector signed short,
+                               vector signed short,
+                               vector signed short);
 
-GCC provides other MIPS-specific built-in functions:
+vector unsigned char vec_max (vector bool char, vector unsigned char);
+vector unsigned char vec_max (vector unsigned char, vector bool char);
+vector unsigned char vec_max (vector unsigned char,
+                              vector unsigned char);
+vector signed char vec_max (vector bool char, vector signed char);
+vector signed char vec_max (vector signed char, vector bool char);
+vector signed char vec_max (vector signed char, vector signed char);
+vector unsigned short vec_max (vector bool short,
+                               vector unsigned short);
+vector unsigned short vec_max (vector unsigned short,
+                               vector bool short);
+vector unsigned short vec_max (vector unsigned short,
+                               vector unsigned short);
+vector signed short vec_max (vector bool short, vector signed short);
+vector signed short vec_max (vector signed short, vector bool short);
+vector signed short vec_max (vector signed short, vector signed short);
+vector unsigned int vec_max (vector bool int, vector unsigned int);
+vector unsigned int vec_max (vector unsigned int, vector bool int);
+vector unsigned int vec_max (vector unsigned int, vector unsigned int);
+vector signed int vec_max (vector bool int, vector signed int);
+vector signed int vec_max (vector signed int, vector bool int);
+vector signed int vec_max (vector signed int, vector signed int);
+vector float vec_max (vector float, vector float);
 
-@table @code
-@item void __builtin_mips_cache (int @var{op}, const volatile void *@var{addr})
-Insert a @samp{cache} instruction with operands @var{op} and @var{addr}.
-GCC defines the preprocessor macro @code{___GCC_HAVE_BUILTIN_MIPS_CACHE}
-when this function is available.
+vector float vec_vmaxfp (vector float, vector float);
 
-@item unsigned int __builtin_mips_get_fcsr (void)
-@itemx void __builtin_mips_set_fcsr (unsigned int @var{value})
-Get and set the contents of the floating-point control and status register
-(FPU control register 31).  These functions are only available in hard-float
-code but can be called in both MIPS16 and non-MIPS16 contexts.
+vector signed int vec_vmaxsw (vector bool int, vector signed int);
+vector signed int vec_vmaxsw (vector signed int, vector bool int);
+vector signed int vec_vmaxsw (vector signed int, vector signed int);
 
-@code{__builtin_mips_set_fcsr} can be used to change any bit of the
-register except the condition codes, which GCC assumes are preserved.
-@end table
+vector unsigned int vec_vmaxuw (vector bool int, vector unsigned int);
+vector unsigned int vec_vmaxuw (vector unsigned int, vector bool int);
+vector unsigned int vec_vmaxuw (vector unsigned int,
+                                vector unsigned int);
 
-@node MSP430 Built-in Functions
-@subsection MSP430 Built-in Functions
+vector signed short vec_vmaxsh (vector bool short, vector signed short);
+vector signed short vec_vmaxsh (vector signed short, vector bool short);
+vector signed short vec_vmaxsh (vector signed short,
+                                vector signed short);
 
-GCC provides a couple of special builtin functions to aid in the
-writing of interrupt handlers in C.
+vector unsigned short vec_vmaxuh (vector bool short,
+                                  vector unsigned short);
+vector unsigned short vec_vmaxuh (vector unsigned short,
+                                  vector bool short);
+vector unsigned short vec_vmaxuh (vector unsigned short,
+                                  vector unsigned short);
 
-@table @code
-@item __bic_SR_register_on_exit (int @var{mask})
-This clears the indicated bits in the saved copy of the status register
-currently residing on the stack.  This only works inside interrupt
-handlers and the changes to the status register will only take affect
-once the handler returns.
+vector signed char vec_vmaxsb (vector bool char, vector signed char);
+vector signed char vec_vmaxsb (vector signed char, vector bool char);
+vector signed char vec_vmaxsb (vector signed char, vector signed char);
 
-@item __bis_SR_register_on_exit (int @var{mask})
-This sets the indicated bits in the saved copy of the status register
-currently residing on the stack.  This only works inside interrupt
-handlers and the changes to the status register will only take affect
-once the handler returns.
+vector unsigned char vec_vmaxub (vector bool char,
+                                 vector unsigned char);
+vector unsigned char vec_vmaxub (vector unsigned char,
+                                 vector bool char);
+vector unsigned char vec_vmaxub (vector unsigned char,
+                                 vector unsigned char);
 
-@item __delay_cycles (long long @var{cycles})
-This inserts an instruction sequence that takes exactly @var{cycles}
-cycles (between 0 and about 17E9) to complete.  The inserted sequence
-may use jumps, loops, or no-ops, and does not interfere with any other
-instructions.  Note that @var{cycles} must be a compile-time constant
-integer - that is, you must pass a number, not a variable that may be
-optimized to a constant later.  The number of cycles delayed by this
-builtin is exact.
-@end table
+vector bool char vec_mergeh (vector bool char, vector bool char);
+vector signed char vec_mergeh (vector signed char, vector signed char);
+vector unsigned char vec_mergeh (vector unsigned char,
+                                 vector unsigned char);
+vector bool short vec_mergeh (vector bool short, vector bool short);
+vector pixel vec_mergeh (vector pixel, vector pixel);
+vector signed short vec_mergeh (vector signed short,
+                                vector signed short);
+vector unsigned short vec_mergeh (vector unsigned short,
+                                  vector unsigned short);
+vector float vec_mergeh (vector float, vector float);
+vector bool int vec_mergeh (vector bool int, vector bool int);
+vector signed int vec_mergeh (vector signed int, vector signed int);
+vector unsigned int vec_mergeh (vector unsigned int,
+                                vector unsigned int);
 
-@node NDS32 Built-in Functions
-@subsection NDS32 Built-in Functions
+vector float vec_vmrghw (vector float, vector float);
+vector bool int vec_vmrghw (vector bool int, vector bool int);
+vector signed int vec_vmrghw (vector signed int, vector signed int);
+vector unsigned int vec_vmrghw (vector unsigned int,
+                                vector unsigned int);
 
-These built-in functions are available for the NDS32 target:
+vector bool short vec_vmrghh (vector bool short, vector bool short);
+vector signed short vec_vmrghh (vector signed short,
+                                vector signed short);
+vector unsigned short vec_vmrghh (vector unsigned short,
+                                  vector unsigned short);
+vector pixel vec_vmrghh (vector pixel, vector pixel);
 
-@deftypefn {Built-in Function} void __builtin_nds32_isync (int *@var{addr})
-Insert an ISYNC instruction into the instruction stream where
-@var{addr} is an instruction address for serialization.
-@end deftypefn
+vector bool char vec_vmrghb (vector bool char, vector bool char);
+vector signed char vec_vmrghb (vector signed char, vector signed char);
+vector unsigned char vec_vmrghb (vector unsigned char,
+                                 vector unsigned char);
 
-@deftypefn {Built-in Function} void __builtin_nds32_isb (void)
-Insert an ISB instruction into the instruction stream.
-@end deftypefn
+vector bool char vec_mergel (vector bool char, vector bool char);
+vector signed char vec_mergel (vector signed char, vector signed char);
+vector unsigned char vec_mergel (vector unsigned char,
+                                 vector unsigned char);
+vector bool short vec_mergel (vector bool short, vector bool short);
+vector pixel vec_mergel (vector pixel, vector pixel);
+vector signed short vec_mergel (vector signed short,
+                                vector signed short);
+vector unsigned short vec_mergel (vector unsigned short,
+                                  vector unsigned short);
+vector float vec_mergel (vector float, vector float);
+vector bool int vec_mergel (vector bool int, vector bool int);
+vector signed int vec_mergel (vector signed int, vector signed int);
+vector unsigned int vec_mergel (vector unsigned int,
+                                vector unsigned int);
 
-@deftypefn {Built-in Function} int __builtin_nds32_mfsr (int @var{sr})
-Return the content of a system register which is mapped by @var{sr}.
-@end deftypefn
+vector float vec_vmrglw (vector float, vector float);
+vector signed int vec_vmrglw (vector signed int, vector signed int);
+vector unsigned int vec_vmrglw (vector unsigned int,
+                                vector unsigned int);
+vector bool int vec_vmrglw (vector bool int, vector bool int);
 
-@deftypefn {Built-in Function} int __builtin_nds32_mfusr (int @var{usr})
-Return the content of a user space register which is mapped by @var{usr}.
-@end deftypefn
+vector bool short vec_vmrglh (vector bool short, vector bool short);
+vector signed short vec_vmrglh (vector signed short,
+                                vector signed short);
+vector unsigned short vec_vmrglh (vector unsigned short,
+                                  vector unsigned short);
+vector pixel vec_vmrglh (vector pixel, vector pixel);
 
-@deftypefn {Built-in Function} void __builtin_nds32_mtsr (int @var{value}, int @var{sr})
-Move the @var{value} to a system register which is mapped by @var{sr}.
-@end deftypefn
+vector bool char vec_vmrglb (vector bool char, vector bool char);
+vector signed char vec_vmrglb (vector signed char, vector signed char);
+vector unsigned char vec_vmrglb (vector unsigned char,
+                                 vector unsigned char);
 
-@deftypefn {Built-in Function} void __builtin_nds32_mtusr (int @var{value}, int @var{usr})
-Move the @var{value} to a user space register which is mapped by @var{usr}.
-@end deftypefn
+vector unsigned short vec_mfvscr (void);
 
-@deftypefn {Built-in Function} void __builtin_nds32_setgie_en (void)
-Enable global interrupt.
-@end deftypefn
+vector unsigned char vec_min (vector bool char, vector unsigned char);
+vector unsigned char vec_min (vector unsigned char, vector bool char);
+vector unsigned char vec_min (vector unsigned char,
+                              vector unsigned char);
+vector signed char vec_min (vector bool char, vector signed char);
+vector signed char vec_min (vector signed char, vector bool char);
+vector signed char vec_min (vector signed char, vector signed char);
+vector unsigned short vec_min (vector bool short,
+                               vector unsigned short);
+vector unsigned short vec_min (vector unsigned short,
+                               vector bool short);
+vector unsigned short vec_min (vector unsigned short,
+                               vector unsigned short);
+vector signed short vec_min (vector bool short, vector signed short);
+vector signed short vec_min (vector signed short, vector bool short);
+vector signed short vec_min (vector signed short, vector signed short);
+vector unsigned int vec_min (vector bool int, vector unsigned int);
+vector unsigned int vec_min (vector unsigned int, vector bool int);
+vector unsigned int vec_min (vector unsigned int, vector unsigned int);
+vector signed int vec_min (vector bool int, vector signed int);
+vector signed int vec_min (vector signed int, vector bool int);
+vector signed int vec_min (vector signed int, vector signed int);
+vector float vec_min (vector float, vector float);
 
-@deftypefn {Built-in Function} void __builtin_nds32_setgie_dis (void)
-Disable global interrupt.
-@end deftypefn
+vector float vec_vminfp (vector float, vector float);
 
-@node picoChip Built-in Functions
-@subsection picoChip Built-in Functions
+vector signed int vec_vminsw (vector bool int, vector signed int);
+vector signed int vec_vminsw (vector signed int, vector bool int);
+vector signed int vec_vminsw (vector signed int, vector signed int);
 
-GCC provides an interface to selected machine instructions from the
-picoChip instruction set.
+vector unsigned int vec_vminuw (vector bool int, vector unsigned int);
+vector unsigned int vec_vminuw (vector unsigned int, vector bool int);
+vector unsigned int vec_vminuw (vector unsigned int,
+                                vector unsigned int);
 
-@table @code
-@item int __builtin_sbc (int @var{value})
-Sign bit count.  Return the number of consecutive bits in @var{value}
-that have the same value as the sign bit.  The result is the number of
-leading sign bits minus one, giving the number of redundant sign bits in
-@var{value}.
+vector signed short vec_vminsh (vector bool short, vector signed short);
+vector signed short vec_vminsh (vector signed short, vector bool short);
+vector signed short vec_vminsh (vector signed short,
+                                vector signed short);
 
-@item int __builtin_byteswap (int @var{value})
-Byte swap.  Return the result of swapping the upper and lower bytes of
-@var{value}.
+vector unsigned short vec_vminuh (vector bool short,
+                                  vector unsigned short);
+vector unsigned short vec_vminuh (vector unsigned short,
+                                  vector bool short);
+vector unsigned short vec_vminuh (vector unsigned short,
+                                  vector unsigned short);
 
-@item int __builtin_brev (int @var{value})
-Bit reversal.  Return the result of reversing the bits in
-@var{value}.  Bit 15 is swapped with bit 0, bit 14 is swapped with bit 1,
-and so on.
+vector signed char vec_vminsb (vector bool char, vector signed char);
+vector signed char vec_vminsb (vector signed char, vector bool char);
+vector signed char vec_vminsb (vector signed char, vector signed char);
 
-@item int __builtin_adds (int @var{x}, int @var{y})
-Saturating addition.  Return the result of adding @var{x} and @var{y},
-storing the value 32767 if the result overflows.
+vector unsigned char vec_vminub (vector bool char,
+                                 vector unsigned char);
+vector unsigned char vec_vminub (vector unsigned char,
+                                 vector bool char);
+vector unsigned char vec_vminub (vector unsigned char,
+                                 vector unsigned char);
 
-@item int __builtin_subs (int @var{x}, int @var{y})
-Saturating subtraction.  Return the result of subtracting @var{y} from
-@var{x}, storing the value @minus{}32768 if the result overflows.
+vector signed short vec_mladd (vector signed short,
+                               vector signed short,
+                               vector signed short);
+vector signed short vec_mladd (vector signed short,
+                               vector unsigned short,
+                               vector unsigned short);
+vector signed short vec_mladd (vector unsigned short,
+                               vector signed short,
+                               vector signed short);
+vector unsigned short vec_mladd (vector unsigned short,
+                                 vector unsigned short,
+                                 vector unsigned short);
 
-@item void __builtin_halt (void)
-Halt.  The processor stops execution.  This built-in is useful for
-implementing assertions.
+vector signed short vec_mradds (vector signed short,
+                                vector signed short,
+                                vector signed short);
 
-@end table
+vector unsigned int vec_msum (vector unsigned char,
+                              vector unsigned char,
+                              vector unsigned int);
+vector signed int vec_msum (vector signed char,
+                            vector unsigned char,
+                            vector signed int);
+vector unsigned int vec_msum (vector unsigned short,
+                              vector unsigned short,
+                              vector unsigned int);
+vector signed int vec_msum (vector signed short,
+                            vector signed short,
+                            vector signed int);
 
-@node PowerPC Built-in Functions
-@subsection PowerPC Built-in Functions
+vector signed int vec_vmsumshm (vector signed short,
+                                vector signed short,
+                                vector signed int);
 
-These built-in functions are available for the PowerPC family of
-processors:
-@smallexample
-float __builtin_recipdivf (float, float);
-float __builtin_rsqrtf (float);
-double __builtin_recipdiv (double, double);
-double __builtin_rsqrt (double);
-uint64_t __builtin_ppc_get_timebase ();
-unsigned long __builtin_ppc_mftb ();
-double __builtin_unpack_longdouble (long double, int);
-long double __builtin_pack_longdouble (double, double);
-@end smallexample
+vector unsigned int vec_vmsumuhm (vector unsigned short,
+                                  vector unsigned short,
+                                  vector unsigned int);
 
-The @code{vec_rsqrt}, @code{__builtin_rsqrt}, and
-@code{__builtin_rsqrtf} functions generate multiple instructions to
-implement the reciprocal sqrt functionality using reciprocal sqrt
-estimate instructions.
+vector signed int vec_vmsummbm (vector signed char,
+                                vector unsigned char,
+                                vector signed int);
 
-The @code{__builtin_recipdiv}, and @code{__builtin_recipdivf}
-functions generate multiple instructions to implement division using
-the reciprocal estimate instructions.
+vector unsigned int vec_vmsumubm (vector unsigned char,
+                                  vector unsigned char,
+                                  vector unsigned int);
 
-The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb}
-functions generate instructions to read the Time Base Register.  The
-@code{__builtin_ppc_get_timebase} function may generate multiple
-instructions and always returns the 64 bits of the Time Base Register.
-The @code{__builtin_ppc_mftb} function always generates one instruction and
-returns the Time Base Register value as an unsigned long, throwing away
-the most significant word on 32-bit environments.
+vector unsigned int vec_msums (vector unsigned short,
+                               vector unsigned short,
+                               vector unsigned int);
+vector signed int vec_msums (vector signed short,
+                             vector signed short,
+                             vector signed int);
 
-The following built-in functions are available for the PowerPC family
-of processors, starting with ISA 2.06 or later (@option{-mcpu=power7}
-or @option{-mpopcntd}):
-@smallexample
-long __builtin_bpermd (long, long);
-int __builtin_divwe (int, int);
-int __builtin_divweo (int, int);
-unsigned int __builtin_divweu (unsigned int, unsigned int);
-unsigned int __builtin_divweuo (unsigned int, unsigned int);
-long __builtin_divde (long, long);
-long __builtin_divdeo (long, long);
-unsigned long __builtin_divdeu (unsigned long, unsigned long);
-unsigned long __builtin_divdeuo (unsigned long, unsigned long);
-unsigned int cdtbcd (unsigned int);
-unsigned int cbcdtd (unsigned int);
-unsigned int addg6s (unsigned int, unsigned int);
-@end smallexample
+vector signed int vec_vmsumshs (vector signed short,
+                                vector signed short,
+                                vector signed int);
 
-The @code{__builtin_divde}, @code{__builtin_divdeo},
-@code{__builitin_divdeu}, @code{__builtin_divdeou} functions require a
-64-bit environment support ISA 2.06 or later.
+vector unsigned int vec_vmsumuhs (vector unsigned short,
+                                  vector unsigned short,
+                                  vector unsigned int);
 
-The following built-in functions are available for the PowerPC family
-of processors when hardware decimal floating point
-(@option{-mhard-dfp}) is available:
-@smallexample
-_Decimal64 __builtin_dxex (_Decimal64);
-_Decimal128 __builtin_dxexq (_Decimal128);
-_Decimal64 __builtin_ddedpd (int, _Decimal64);
-_Decimal128 __builtin_ddedpdq (int, _Decimal128);
-_Decimal64 __builtin_denbcd (int, _Decimal64);
-_Decimal128 __builtin_denbcdq (int, _Decimal128);
-_Decimal64 __builtin_diex (_Decimal64, _Decimal64);
-_Decimal128 _builtin_diexq (_Decimal128, _Decimal128);
-_Decimal64 __builtin_dscli (_Decimal64, int);
-_Decimal128 __builitn_dscliq (_Decimal128, int);
-_Decimal64 __builtin_dscri (_Decimal64, int);
-_Decimal128 __builitn_dscriq (_Decimal128, int);
-unsigned long long __builtin_unpack_dec128 (_Decimal128, int);
-_Decimal128 __builtin_pack_dec128 (unsigned long long, unsigned long long);
-@end smallexample
+void vec_mtvscr (vector signed int);
+void vec_mtvscr (vector unsigned int);
+void vec_mtvscr (vector bool int);
+void vec_mtvscr (vector signed short);
+void vec_mtvscr (vector unsigned short);
+void vec_mtvscr (vector bool short);
+void vec_mtvscr (vector pixel);
+void vec_mtvscr (vector signed char);
+void vec_mtvscr (vector unsigned char);
+void vec_mtvscr (vector bool char);
 
-The following built-in functions are available for the PowerPC family
-of processors when the Vector Scalar (vsx) instruction set is
-available:
-@smallexample
-unsigned long long __builtin_unpack_vector_int128 (vector __int128_t, int);
-vector __int128_t __builtin_pack_vector_int128 (unsigned long long,
-                                                unsigned long long);
-@end smallexample
+vector unsigned short vec_mule (vector unsigned char,
+                                vector unsigned char);
+vector signed short vec_mule (vector signed char,
+                              vector signed char);
+vector unsigned int vec_mule (vector unsigned short,
+                              vector unsigned short);
+vector signed int vec_mule (vector signed short, vector signed short);
+
+vector signed int vec_vmulesh (vector signed short,
+                               vector signed short);
 
-@node PowerPC AltiVec/VSX Built-in Functions
-@subsection PowerPC AltiVec Built-in Functions
+vector unsigned int vec_vmuleuh (vector unsigned short,
+                                 vector unsigned short);
 
-GCC provides an interface for the PowerPC family of processors to access
-the AltiVec operations described in Motorola's AltiVec Programming
-Interface Manual.  The interface is made available by including
-@code{<altivec.h>} and using @option{-maltivec} and
-@option{-mabi=altivec}.  The interface supports the following vector
-types.
+vector signed short vec_vmulesb (vector signed char,
+                                 vector signed char);
 
-@smallexample
-vector unsigned char
-vector signed char
-vector bool char
+vector unsigned short vec_vmuleub (vector unsigned char,
+                                  vector unsigned char);
 
-vector unsigned short
-vector signed short
-vector bool short
-vector pixel
+vector unsigned short vec_mulo (vector unsigned char,
+                                vector unsigned char);
+vector signed short vec_mulo (vector signed char, vector signed char);
+vector unsigned int vec_mulo (vector unsigned short,
+                              vector unsigned short);
+vector signed int vec_mulo (vector signed short, vector signed short);
 
-vector unsigned int
-vector signed int
-vector bool int
-vector float
-@end smallexample
+vector signed int vec_vmulosh (vector signed short,
+                               vector signed short);
 
-If @option{-mvsx} is used the following additional vector types are
-implemented.
+vector unsigned int vec_vmulouh (vector unsigned short,
+                                 vector unsigned short);
 
-@smallexample
-vector unsigned long
-vector signed long
-vector double
-@end smallexample
+vector signed short vec_vmulosb (vector signed char,
+                                 vector signed char);
 
-The long types are only implemented for 64-bit code generation, and
-the long type is only used in the floating point/integer conversion
-instructions.
+vector unsigned short vec_vmuloub (vector unsigned char,
+                                   vector unsigned char);
 
-GCC's implementation of the high-level language interface available from
-C and C++ code differs from Motorola's documentation in several ways.
+vector float vec_nmsub (vector float, vector float, vector float);
 
-@itemize @bullet
+vector float vec_nor (vector float, vector float);
+vector signed int vec_nor (vector signed int, vector signed int);
+vector unsigned int vec_nor (vector unsigned int, vector unsigned int);
+vector bool int vec_nor (vector bool int, vector bool int);
+vector signed short vec_nor (vector signed short, vector signed short);
+vector unsigned short vec_nor (vector unsigned short,
+                               vector unsigned short);
+vector bool short vec_nor (vector bool short, vector bool short);
+vector signed char vec_nor (vector signed char, vector signed char);
+vector unsigned char vec_nor (vector unsigned char,
+                              vector unsigned char);
+vector bool char vec_nor (vector bool char, vector bool char);
 
-@item
-A vector constant is a list of constant expressions within curly braces.
+vector float vec_or (vector float, vector float);
+vector float vec_or (vector float, vector bool int);
+vector float vec_or (vector bool int, vector float);
+vector bool int vec_or (vector bool int, vector bool int);
+vector signed int vec_or (vector bool int, vector signed int);
+vector signed int vec_or (vector signed int, vector bool int);
+vector signed int vec_or (vector signed int, vector signed int);
+vector unsigned int vec_or (vector bool int, vector unsigned int);
+vector unsigned int vec_or (vector unsigned int, vector bool int);
+vector unsigned int vec_or (vector unsigned int, vector unsigned int);
+vector bool short vec_or (vector bool short, vector bool short);
+vector signed short vec_or (vector bool short, vector signed short);
+vector signed short vec_or (vector signed short, vector bool short);
+vector signed short vec_or (vector signed short, vector signed short);
+vector unsigned short vec_or (vector bool short, vector unsigned short);
+vector unsigned short vec_or (vector unsigned short, vector bool short);
+vector unsigned short vec_or (vector unsigned short,
+                              vector unsigned short);
+vector signed char vec_or (vector bool char, vector signed char);
+vector bool char vec_or (vector bool char, vector bool char);
+vector signed char vec_or (vector signed char, vector bool char);
+vector signed char vec_or (vector signed char, vector signed char);
+vector unsigned char vec_or (vector bool char, vector unsigned char);
+vector unsigned char vec_or (vector unsigned char, vector bool char);
+vector unsigned char vec_or (vector unsigned char,
+                             vector unsigned char);
 
-@item
-A vector initializer requires no cast if the vector constant is of the
-same type as the variable it is initializing.
+vector signed char vec_pack (vector signed short, vector signed short);
+vector unsigned char vec_pack (vector unsigned short,
+                               vector unsigned short);
+vector bool char vec_pack (vector bool short, vector bool short);
+vector signed short vec_pack (vector signed int, vector signed int);
+vector unsigned short vec_pack (vector unsigned int,
+                                vector unsigned int);
+vector bool short vec_pack (vector bool int, vector bool int);
 
-@item
-If @code{signed} or @code{unsigned} is omitted, the signedness of the
-vector type is the default signedness of the base type.  The default
-varies depending on the operating system, so a portable program should
-always specify the signedness.
+vector bool short vec_vpkuwum (vector bool int, vector bool int);
+vector signed short vec_vpkuwum (vector signed int, vector signed int);
+vector unsigned short vec_vpkuwum (vector unsigned int,
+                                   vector unsigned int);
 
-@item
-Compiling with @option{-maltivec} adds keywords @code{__vector},
-@code{vector}, @code{__pixel}, @code{pixel}, @code{__bool} and
-@code{bool}.  When compiling ISO C, the context-sensitive substitution
-of the keywords @code{vector}, @code{pixel} and @code{bool} is
-disabled.  To use them, you must include @code{<altivec.h>} instead.
+vector bool char vec_vpkuhum (vector bool short, vector bool short);
+vector signed char vec_vpkuhum (vector signed short,
+                                vector signed short);
+vector unsigned char vec_vpkuhum (vector unsigned short,
+                                  vector unsigned short);
 
-@item
-GCC allows using a @code{typedef} name as the type specifier for a
-vector type.
+vector pixel vec_packpx (vector unsigned int, vector unsigned int);
 
-@item
-For C, overloaded functions are implemented with macros so the following
-does not work:
+vector unsigned char vec_packs (vector unsigned short,
+                                vector unsigned short);
+vector signed char vec_packs (vector signed short, vector signed short);
+vector unsigned short vec_packs (vector unsigned int,
+                                 vector unsigned int);
+vector signed short vec_packs (vector signed int, vector signed int);
 
-@smallexample
-  vec_add ((vector signed int)@{1, 2, 3, 4@}, foo);
-@end smallexample
+vector signed short vec_vpkswss (vector signed int, vector signed int);
 
-@noindent
-Since @code{vec_add} is a macro, the vector constant in the example
-is treated as four separate arguments.  Wrap the entire argument in
-parentheses for this to work.
-@end itemize
+vector unsigned short vec_vpkuwus (vector unsigned int,
+                                   vector unsigned int);
 
-@emph{Note:} Only the @code{<altivec.h>} interface is supported.
-Internally, GCC uses built-in functions to achieve the functionality in
-the aforementioned header file, but they are not supported and are
-subject to change without notice.
+vector signed char vec_vpkshss (vector signed short,
+                                vector signed short);
 
-The following interfaces are supported for the generic and specific
-AltiVec operations and the AltiVec predicates.  In cases where there
-is a direct mapping between generic and specific operations, only the
-generic names are shown here, although the specific operations can also
-be used.
+vector unsigned char vec_vpkuhus (vector unsigned short,
+                                  vector unsigned short);
 
-Arguments that are documented as @code{const int} require literal
-integral values within the range required for that operation.
+vector unsigned char vec_packsu (vector unsigned short,
+                                 vector unsigned short);
+vector unsigned char vec_packsu (vector signed short,
+                                 vector signed short);
+vector unsigned short vec_packsu (vector unsigned int,
+                                  vector unsigned int);
+vector unsigned short vec_packsu (vector signed int, vector signed int);
 
-@smallexample
-vector signed char vec_abs (vector signed char);
-vector signed short vec_abs (vector signed short);
-vector signed int vec_abs (vector signed int);
-vector float vec_abs (vector float);
+vector unsigned short vec_vpkswus (vector signed int,
+                                   vector signed int);
 
-vector signed char vec_abss (vector signed char);
-vector signed short vec_abss (vector signed short);
-vector signed int vec_abss (vector signed int);
+vector unsigned char vec_vpkshus (vector signed short,
+                                  vector signed short);
 
-vector signed char vec_add (vector bool char, vector signed char);
-vector signed char vec_add (vector signed char, vector bool char);
-vector signed char vec_add (vector signed char, vector signed char);
-vector unsigned char vec_add (vector bool char, vector unsigned char);
-vector unsigned char vec_add (vector unsigned char, vector bool char);
-vector unsigned char vec_add (vector unsigned char,
+vector float vec_perm (vector float,
+                       vector float,
+                       vector unsigned char);
+vector signed int vec_perm (vector signed int,
+                            vector signed int,
+                            vector unsigned char);
+vector unsigned int vec_perm (vector unsigned int,
+                              vector unsigned int,
                               vector unsigned char);
-vector signed short vec_add (vector bool short, vector signed short);
-vector signed short vec_add (vector signed short, vector bool short);
-vector signed short vec_add (vector signed short, vector signed short);
-vector unsigned short vec_add (vector bool short,
-                               vector unsigned short);
-vector unsigned short vec_add (vector unsigned short,
-                               vector bool short);
-vector unsigned short vec_add (vector unsigned short,
-                               vector unsigned short);
-vector signed int vec_add (vector bool int, vector signed int);
-vector signed int vec_add (vector signed int, vector bool int);
-vector signed int vec_add (vector signed int, vector signed int);
-vector unsigned int vec_add (vector bool int, vector unsigned int);
-vector unsigned int vec_add (vector unsigned int, vector bool int);
-vector unsigned int vec_add (vector unsigned int, vector unsigned int);
-vector float vec_add (vector float, vector float);
+vector bool int vec_perm (vector bool int,
+                          vector bool int,
+                          vector unsigned char);
+vector signed short vec_perm (vector signed short,
+                              vector signed short,
+                              vector unsigned char);
+vector unsigned short vec_perm (vector unsigned short,
+                                vector unsigned short,
+                                vector unsigned char);
+vector bool short vec_perm (vector bool short,
+                            vector bool short,
+                            vector unsigned char);
+vector pixel vec_perm (vector pixel,
+                       vector pixel,
+                       vector unsigned char);
+vector signed char vec_perm (vector signed char,
+                             vector signed char,
+                             vector unsigned char);
+vector unsigned char vec_perm (vector unsigned char,
+                               vector unsigned char,
+                               vector unsigned char);
+vector bool char vec_perm (vector bool char,
+                           vector bool char,
+                           vector unsigned char);
 
-vector float vec_vaddfp (vector float, vector float);
+vector float vec_re (vector float);
+
+vector signed char vec_rl (vector signed char,
+                           vector unsigned char);
+vector unsigned char vec_rl (vector unsigned char,
+                             vector unsigned char);
+vector signed short vec_rl (vector signed short, vector unsigned short);
+vector unsigned short vec_rl (vector unsigned short,
+                              vector unsigned short);
+vector signed int vec_rl (vector signed int, vector unsigned int);
+vector unsigned int vec_rl (vector unsigned int, vector unsigned int);
+
+vector signed int vec_vrlw (vector signed int, vector unsigned int);
+vector unsigned int vec_vrlw (vector unsigned int, vector unsigned int);
+
+vector signed short vec_vrlh (vector signed short,
+                              vector unsigned short);
+vector unsigned short vec_vrlh (vector unsigned short,
+                                vector unsigned short);
 
-vector signed int vec_vadduwm (vector bool int, vector signed int);
-vector signed int vec_vadduwm (vector signed int, vector bool int);
-vector signed int vec_vadduwm (vector signed int, vector signed int);
-vector unsigned int vec_vadduwm (vector bool int, vector unsigned int);
-vector unsigned int vec_vadduwm (vector unsigned int, vector bool int);
-vector unsigned int vec_vadduwm (vector unsigned int,
-                                 vector unsigned int);
+vector signed char vec_vrlb (vector signed char, vector unsigned char);
+vector unsigned char vec_vrlb (vector unsigned char,
+                               vector unsigned char);
 
-vector signed short vec_vadduhm (vector bool short,
-                                 vector signed short);
-vector signed short vec_vadduhm (vector signed short,
-                                 vector bool short);
-vector signed short vec_vadduhm (vector signed short,
-                                 vector signed short);
-vector unsigned short vec_vadduhm (vector bool short,
-                                   vector unsigned short);
-vector unsigned short vec_vadduhm (vector unsigned short,
-                                   vector bool short);
-vector unsigned short vec_vadduhm (vector unsigned short,
-                                   vector unsigned short);
+vector float vec_round (vector float);
 
-vector signed char vec_vaddubm (vector bool char, vector signed char);
-vector signed char vec_vaddubm (vector signed char, vector bool char);
-vector signed char vec_vaddubm (vector signed char, vector signed char);
-vector unsigned char vec_vaddubm (vector bool char,
-                                  vector unsigned char);
-vector unsigned char vec_vaddubm (vector unsigned char,
-                                  vector bool char);
-vector unsigned char vec_vaddubm (vector unsigned char,
-                                  vector unsigned char);
+vector float vec_recip (vector float, vector float);
 
-vector unsigned int vec_addc (vector unsigned int, vector unsigned int);
+vector float vec_rsqrt (vector float);
 
-vector unsigned char vec_adds (vector bool char, vector unsigned char);
-vector unsigned char vec_adds (vector unsigned char, vector bool char);
-vector unsigned char vec_adds (vector unsigned char,
-                               vector unsigned char);
-vector signed char vec_adds (vector bool char, vector signed char);
-vector signed char vec_adds (vector signed char, vector bool char);
-vector signed char vec_adds (vector signed char, vector signed char);
-vector unsigned short vec_adds (vector bool short,
-                                vector unsigned short);
-vector unsigned short vec_adds (vector unsigned short,
-                                vector bool short);
-vector unsigned short vec_adds (vector unsigned short,
-                                vector unsigned short);
-vector signed short vec_adds (vector bool short, vector signed short);
-vector signed short vec_adds (vector signed short, vector bool short);
-vector signed short vec_adds (vector signed short, vector signed short);
-vector unsigned int vec_adds (vector bool int, vector unsigned int);
-vector unsigned int vec_adds (vector unsigned int, vector bool int);
-vector unsigned int vec_adds (vector unsigned int, vector unsigned int);
-vector signed int vec_adds (vector bool int, vector signed int);
-vector signed int vec_adds (vector signed int, vector bool int);
-vector signed int vec_adds (vector signed int, vector signed int);
+vector float vec_rsqrte (vector float);
 
-vector signed int vec_vaddsws (vector bool int, vector signed int);
-vector signed int vec_vaddsws (vector signed int, vector bool int);
-vector signed int vec_vaddsws (vector signed int, vector signed int);
+vector float vec_sel (vector float, vector float, vector bool int);
+vector float vec_sel (vector float, vector float, vector unsigned int);
+vector signed int vec_sel (vector signed int,
+                           vector signed int,
+                           vector bool int);
+vector signed int vec_sel (vector signed int,
+                           vector signed int,
+                           vector unsigned int);
+vector unsigned int vec_sel (vector unsigned int,
+                             vector unsigned int,
+                             vector bool int);
+vector unsigned int vec_sel (vector unsigned int,
+                             vector unsigned int,
+                             vector unsigned int);
+vector bool int vec_sel (vector bool int,
+                         vector bool int,
+                         vector bool int);
+vector bool int vec_sel (vector bool int,
+                         vector bool int,
+                         vector unsigned int);
+vector signed short vec_sel (vector signed short,
+                             vector signed short,
+                             vector bool short);
+vector signed short vec_sel (vector signed short,
+                             vector signed short,
+                             vector unsigned short);
+vector unsigned short vec_sel (vector unsigned short,
+                               vector unsigned short,
+                               vector bool short);
+vector unsigned short vec_sel (vector unsigned short,
+                               vector unsigned short,
+                               vector unsigned short);
+vector bool short vec_sel (vector bool short,
+                           vector bool short,
+                           vector bool short);
+vector bool short vec_sel (vector bool short,
+                           vector bool short,
+                           vector unsigned short);
+vector signed char vec_sel (vector signed char,
+                            vector signed char,
+                            vector bool char);
+vector signed char vec_sel (vector signed char,
+                            vector signed char,
+                            vector unsigned char);
+vector unsigned char vec_sel (vector unsigned char,
+                              vector unsigned char,
+                              vector bool char);
+vector unsigned char vec_sel (vector unsigned char,
+                              vector unsigned char,
+                              vector unsigned char);
+vector bool char vec_sel (vector bool char,
+                          vector bool char,
+                          vector bool char);
+vector bool char vec_sel (vector bool char,
+                          vector bool char,
+                          vector unsigned char);
 
-vector unsigned int vec_vadduws (vector bool int, vector unsigned int);
-vector unsigned int vec_vadduws (vector unsigned int, vector bool int);
-vector unsigned int vec_vadduws (vector unsigned int,
-                                 vector unsigned int);
+vector signed char vec_sl (vector signed char,
+                           vector unsigned char);
+vector unsigned char vec_sl (vector unsigned char,
+                             vector unsigned char);
+vector signed short vec_sl (vector signed short, vector unsigned short);
+vector unsigned short vec_sl (vector unsigned short,
+                              vector unsigned short);
+vector signed int vec_sl (vector signed int, vector unsigned int);
+vector unsigned int vec_sl (vector unsigned int, vector unsigned int);
 
-vector signed short vec_vaddshs (vector bool short,
-                                 vector signed short);
-vector signed short vec_vaddshs (vector signed short,
-                                 vector bool short);
-vector signed short vec_vaddshs (vector signed short,
-                                 vector signed short);
+vector signed int vec_vslw (vector signed int, vector unsigned int);
+vector unsigned int vec_vslw (vector unsigned int, vector unsigned int);
 
-vector unsigned short vec_vadduhs (vector bool short,
-                                   vector unsigned short);
-vector unsigned short vec_vadduhs (vector unsigned short,
-                                   vector bool short);
-vector unsigned short vec_vadduhs (vector unsigned short,
-                                   vector unsigned short);
+vector signed short vec_vslh (vector signed short,
+                              vector unsigned short);
+vector unsigned short vec_vslh (vector unsigned short,
+                                vector unsigned short);
 
-vector signed char vec_vaddsbs (vector bool char, vector signed char);
-vector signed char vec_vaddsbs (vector signed char, vector bool char);
-vector signed char vec_vaddsbs (vector signed char, vector signed char);
+vector signed char vec_vslb (vector signed char, vector unsigned char);
+vector unsigned char vec_vslb (vector unsigned char,
+                               vector unsigned char);
 
-vector unsigned char vec_vaddubs (vector bool char,
-                                  vector unsigned char);
-vector unsigned char vec_vaddubs (vector unsigned char,
-                                  vector bool char);
-vector unsigned char vec_vaddubs (vector unsigned char,
-                                  vector unsigned char);
+vector float vec_sld (vector float, vector float, const int);
+vector signed int vec_sld (vector signed int,
+                           vector signed int,
+                           const int);
+vector unsigned int vec_sld (vector unsigned int,
+                             vector unsigned int,
+                             const int);
+vector bool int vec_sld (vector bool int,
+                         vector bool int,
+                         const int);
+vector signed short vec_sld (vector signed short,
+                             vector signed short,
+                             const int);
+vector unsigned short vec_sld (vector unsigned short,
+                               vector unsigned short,
+                               const int);
+vector bool short vec_sld (vector bool short,
+                           vector bool short,
+                           const int);
+vector pixel vec_sld (vector pixel,
+                      vector pixel,
+                      const int);
+vector signed char vec_sld (vector signed char,
+                            vector signed char,
+                            const int);
+vector unsigned char vec_sld (vector unsigned char,
+                              vector unsigned char,
+                              const int);
+vector bool char vec_sld (vector bool char,
+                          vector bool char,
+                          const int);
 
-vector float vec_and (vector float, vector float);
-vector float vec_and (vector float, vector bool int);
-vector float vec_and (vector bool int, vector float);
-vector bool int vec_and (vector bool int, vector bool int);
-vector signed int vec_and (vector bool int, vector signed int);
-vector signed int vec_and (vector signed int, vector bool int);
-vector signed int vec_and (vector signed int, vector signed int);
-vector unsigned int vec_and (vector bool int, vector unsigned int);
-vector unsigned int vec_and (vector unsigned int, vector bool int);
-vector unsigned int vec_and (vector unsigned int, vector unsigned int);
-vector bool short vec_and (vector bool short, vector bool short);
-vector signed short vec_and (vector bool short, vector signed short);
-vector signed short vec_and (vector signed short, vector bool short);
-vector signed short vec_and (vector signed short, vector signed short);
-vector unsigned short vec_and (vector bool short,
-                               vector unsigned short);
-vector unsigned short vec_and (vector unsigned short,
-                               vector bool short);
-vector unsigned short vec_and (vector unsigned short,
+vector signed int vec_sll (vector signed int,
+                           vector unsigned int);
+vector signed int vec_sll (vector signed int,
+                           vector unsigned short);
+vector signed int vec_sll (vector signed int,
+                           vector unsigned char);
+vector unsigned int vec_sll (vector unsigned int,
+                             vector unsigned int);
+vector unsigned int vec_sll (vector unsigned int,
+                             vector unsigned short);
+vector unsigned int vec_sll (vector unsigned int,
+                             vector unsigned char);
+vector bool int vec_sll (vector bool int,
+                         vector unsigned int);
+vector bool int vec_sll (vector bool int,
+                         vector unsigned short);
+vector bool int vec_sll (vector bool int,
+                         vector unsigned char);
+vector signed short vec_sll (vector signed short,
+                             vector unsigned int);
+vector signed short vec_sll (vector signed short,
+                             vector unsigned short);
+vector signed short vec_sll (vector signed short,
+                             vector unsigned char);
+vector unsigned short vec_sll (vector unsigned short,
+                               vector unsigned int);
+vector unsigned short vec_sll (vector unsigned short,
                                vector unsigned short);
-vector signed char vec_and (vector bool char, vector signed char);
-vector bool char vec_and (vector bool char, vector bool char);
-vector signed char vec_and (vector signed char, vector bool char);
-vector signed char vec_and (vector signed char, vector signed char);
-vector unsigned char vec_and (vector bool char, vector unsigned char);
-vector unsigned char vec_and (vector unsigned char, vector bool char);
-vector unsigned char vec_and (vector unsigned char,
+vector unsigned short vec_sll (vector unsigned short,
+                               vector unsigned char);
+vector bool short vec_sll (vector bool short, vector unsigned int);
+vector bool short vec_sll (vector bool short, vector unsigned short);
+vector bool short vec_sll (vector bool short, vector unsigned char);
+vector pixel vec_sll (vector pixel, vector unsigned int);
+vector pixel vec_sll (vector pixel, vector unsigned short);
+vector pixel vec_sll (vector pixel, vector unsigned char);
+vector signed char vec_sll (vector signed char, vector unsigned int);
+vector signed char vec_sll (vector signed char, vector unsigned short);
+vector signed char vec_sll (vector signed char, vector unsigned char);
+vector unsigned char vec_sll (vector unsigned char,
+                              vector unsigned int);
+vector unsigned char vec_sll (vector unsigned char,
+                              vector unsigned short);
+vector unsigned char vec_sll (vector unsigned char,
                               vector unsigned char);
+vector bool char vec_sll (vector bool char, vector unsigned int);
+vector bool char vec_sll (vector bool char, vector unsigned short);
+vector bool char vec_sll (vector bool char, vector unsigned char);
 
-vector float vec_andc (vector float, vector float);
-vector float vec_andc (vector float, vector bool int);
-vector float vec_andc (vector bool int, vector float);
-vector bool int vec_andc (vector bool int, vector bool int);
-vector signed int vec_andc (vector bool int, vector signed int);
-vector signed int vec_andc (vector signed int, vector bool int);
-vector signed int vec_andc (vector signed int, vector signed int);
-vector unsigned int vec_andc (vector bool int, vector unsigned int);
-vector unsigned int vec_andc (vector unsigned int, vector bool int);
-vector unsigned int vec_andc (vector unsigned int, vector unsigned int);
-vector bool short vec_andc (vector bool short, vector bool short);
-vector signed short vec_andc (vector bool short, vector signed short);
-vector signed short vec_andc (vector signed short, vector bool short);
-vector signed short vec_andc (vector signed short, vector signed short);
-vector unsigned short vec_andc (vector bool short,
-                                vector unsigned short);
-vector unsigned short vec_andc (vector unsigned short,
-                                vector bool short);
-vector unsigned short vec_andc (vector unsigned short,
-                                vector unsigned short);
-vector signed char vec_andc (vector bool char, vector signed char);
-vector bool char vec_andc (vector bool char, vector bool char);
-vector signed char vec_andc (vector signed char, vector bool char);
-vector signed char vec_andc (vector signed char, vector signed char);
-vector unsigned char vec_andc (vector bool char, vector unsigned char);
-vector unsigned char vec_andc (vector unsigned char, vector bool char);
-vector unsigned char vec_andc (vector unsigned char,
+vector float vec_slo (vector float, vector signed char);
+vector float vec_slo (vector float, vector unsigned char);
+vector signed int vec_slo (vector signed int, vector signed char);
+vector signed int vec_slo (vector signed int, vector unsigned char);
+vector unsigned int vec_slo (vector unsigned int, vector signed char);
+vector unsigned int vec_slo (vector unsigned int, vector unsigned char);
+vector signed short vec_slo (vector signed short, vector signed char);
+vector signed short vec_slo (vector signed short, vector unsigned char);
+vector unsigned short vec_slo (vector unsigned short,
+                               vector signed char);
+vector unsigned short vec_slo (vector unsigned short,
                                vector unsigned char);
-
-vector unsigned char vec_avg (vector unsigned char,
+vector pixel vec_slo (vector pixel, vector signed char);
+vector pixel vec_slo (vector pixel, vector unsigned char);
+vector signed char vec_slo (vector signed char, vector signed char);
+vector signed char vec_slo (vector signed char, vector unsigned char);
+vector unsigned char vec_slo (vector unsigned char, vector signed char);
+vector unsigned char vec_slo (vector unsigned char,
                               vector unsigned char);
-vector signed char vec_avg (vector signed char, vector signed char);
-vector unsigned short vec_avg (vector unsigned short,
-                               vector unsigned short);
-vector signed short vec_avg (vector signed short, vector signed short);
-vector unsigned int vec_avg (vector unsigned int, vector unsigned int);
-vector signed int vec_avg (vector signed int, vector signed int);
 
-vector signed int vec_vavgsw (vector signed int, vector signed int);
+vector signed char vec_splat (vector signed char, const int);
+vector unsigned char vec_splat (vector unsigned char, const int);
+vector bool char vec_splat (vector bool char, const int);
+vector signed short vec_splat (vector signed short, const int);
+vector unsigned short vec_splat (vector unsigned short, const int);
+vector bool short vec_splat (vector bool short, const int);
+vector pixel vec_splat (vector pixel, const int);
+vector float vec_splat (vector float, const int);
+vector signed int vec_splat (vector signed int, const int);
+vector unsigned int vec_splat (vector unsigned int, const int);
+vector bool int vec_splat (vector bool int, const int);
+vector signed long vec_splat (vector signed long, const int);
+vector unsigned long vec_splat (vector unsigned long, const int);
 
-vector unsigned int vec_vavguw (vector unsigned int,
-                                vector unsigned int);
+vector signed char vec_splats (signed char);
+vector unsigned char vec_splats (unsigned char);
+vector signed short vec_splats (signed short);
+vector unsigned short vec_splats (unsigned short);
+vector signed int vec_splats (signed int);
+vector unsigned int vec_splats (unsigned int);
+vector float vec_splats (float);
 
-vector signed short vec_vavgsh (vector signed short,
-                                vector signed short);
+vector float vec_vspltw (vector float, const int);
+vector signed int vec_vspltw (vector signed int, const int);
+vector unsigned int vec_vspltw (vector unsigned int, const int);
+vector bool int vec_vspltw (vector bool int, const int);
 
-vector unsigned short vec_vavguh (vector unsigned short,
-                                  vector unsigned short);
+vector bool short vec_vsplth (vector bool short, const int);
+vector signed short vec_vsplth (vector signed short, const int);
+vector unsigned short vec_vsplth (vector unsigned short, const int);
+vector pixel vec_vsplth (vector pixel, const int);
 
-vector signed char vec_vavgsb (vector signed char, vector signed char);
+vector signed char vec_vspltb (vector signed char, const int);
+vector unsigned char vec_vspltb (vector unsigned char, const int);
+vector bool char vec_vspltb (vector bool char, const int);
 
-vector unsigned char vec_vavgub (vector unsigned char,
-                                 vector unsigned char);
+vector signed char vec_splat_s8 (const int);
 
-vector float vec_copysign (vector float);
+vector signed short vec_splat_s16 (const int);
 
-vector float vec_ceil (vector float);
+vector signed int vec_splat_s32 (const int);
 
-vector signed int vec_cmpb (vector float, vector float);
+vector unsigned char vec_splat_u8 (const int);
 
-vector bool char vec_cmpeq (vector signed char, vector signed char);
-vector bool char vec_cmpeq (vector unsigned char, vector unsigned char);
-vector bool short vec_cmpeq (vector signed short, vector signed short);
-vector bool short vec_cmpeq (vector unsigned short,
-                             vector unsigned short);
-vector bool int vec_cmpeq (vector signed int, vector signed int);
-vector bool int vec_cmpeq (vector unsigned int, vector unsigned int);
-vector bool int vec_cmpeq (vector float, vector float);
+vector unsigned short vec_splat_u16 (const int);
 
-vector bool int vec_vcmpeqfp (vector float, vector float);
+vector unsigned int vec_splat_u32 (const int);
 
-vector bool int vec_vcmpequw (vector signed int, vector signed int);
-vector bool int vec_vcmpequw (vector unsigned int, vector unsigned int);
+vector signed char vec_sr (vector signed char, vector unsigned char);
+vector unsigned char vec_sr (vector unsigned char,
+                             vector unsigned char);
+vector signed short vec_sr (vector signed short,
+                            vector unsigned short);
+vector unsigned short vec_sr (vector unsigned short,
+                              vector unsigned short);
+vector signed int vec_sr (vector signed int, vector unsigned int);
+vector unsigned int vec_sr (vector unsigned int, vector unsigned int);
 
-vector bool short vec_vcmpequh (vector signed short,
-                                vector signed short);
-vector bool short vec_vcmpequh (vector unsigned short,
+vector signed int vec_vsrw (vector signed int, vector unsigned int);
+vector unsigned int vec_vsrw (vector unsigned int, vector unsigned int);
+
+vector signed short vec_vsrh (vector signed short,
+                              vector unsigned short);
+vector unsigned short vec_vsrh (vector unsigned short,
                                 vector unsigned short);
 
-vector bool char vec_vcmpequb (vector signed char, vector signed char);
-vector bool char vec_vcmpequb (vector unsigned char,
+vector signed char vec_vsrb (vector signed char, vector unsigned char);
+vector unsigned char vec_vsrb (vector unsigned char,
                                vector unsigned char);
 
-vector bool int vec_cmpge (vector float, vector float);
-
-vector bool char vec_cmpgt (vector unsigned char, vector unsigned char);
-vector bool char vec_cmpgt (vector signed char, vector signed char);
-vector bool short vec_cmpgt (vector unsigned short,
+vector signed char vec_sra (vector signed char, vector unsigned char);
+vector unsigned char vec_sra (vector unsigned char,
+                              vector unsigned char);
+vector signed short vec_sra (vector signed short,
                              vector unsigned short);
-vector bool short vec_cmpgt (vector signed short, vector signed short);
-vector bool int vec_cmpgt (vector unsigned int, vector unsigned int);
-vector bool int vec_cmpgt (vector signed int, vector signed int);
-vector bool int vec_cmpgt (vector float, vector float);
-
-vector bool int vec_vcmpgtfp (vector float, vector float);
-
-vector bool int vec_vcmpgtsw (vector signed int, vector signed int);
-
-vector bool int vec_vcmpgtuw (vector unsigned int, vector unsigned int);
-
-vector bool short vec_vcmpgtsh (vector signed short,
-                                vector signed short);
-
-vector bool short vec_vcmpgtuh (vector unsigned short,
-                                vector unsigned short);
+vector unsigned short vec_sra (vector unsigned short,
+                               vector unsigned short);
+vector signed int vec_sra (vector signed int, vector unsigned int);
+vector unsigned int vec_sra (vector unsigned int, vector unsigned int);
 
-vector bool char vec_vcmpgtsb (vector signed char, vector signed char);
+vector signed int vec_vsraw (vector signed int, vector unsigned int);
+vector unsigned int vec_vsraw (vector unsigned int,
+                               vector unsigned int);
 
-vector bool char vec_vcmpgtub (vector unsigned char,
-                               vector unsigned char);
+vector signed short vec_vsrah (vector signed short,
+                               vector unsigned short);
+vector unsigned short vec_vsrah (vector unsigned short,
+                                 vector unsigned short);
 
-vector bool int vec_cmple (vector float, vector float);
+vector signed char vec_vsrab (vector signed char, vector unsigned char);
+vector unsigned char vec_vsrab (vector unsigned char,
+                                vector unsigned char);
 
-vector bool char vec_cmplt (vector unsigned char, vector unsigned char);
-vector bool char vec_cmplt (vector signed char, vector signed char);
-vector bool short vec_cmplt (vector unsigned short,
+vector signed int vec_srl (vector signed int, vector unsigned int);
+vector signed int vec_srl (vector signed int, vector unsigned short);
+vector signed int vec_srl (vector signed int, vector unsigned char);
+vector unsigned int vec_srl (vector unsigned int, vector unsigned int);
+vector unsigned int vec_srl (vector unsigned int,
                              vector unsigned short);
-vector bool short vec_cmplt (vector signed short, vector signed short);
-vector bool int vec_cmplt (vector unsigned int, vector unsigned int);
-vector bool int vec_cmplt (vector signed int, vector signed int);
-vector bool int vec_cmplt (vector float, vector float);
-
-vector float vec_cpsgn (vector float, vector float);
+vector unsigned int vec_srl (vector unsigned int, vector unsigned char);
+vector bool int vec_srl (vector bool int, vector unsigned int);
+vector bool int vec_srl (vector bool int, vector unsigned short);
+vector bool int vec_srl (vector bool int, vector unsigned char);
+vector signed short vec_srl (vector signed short, vector unsigned int);
+vector signed short vec_srl (vector signed short,
+                             vector unsigned short);
+vector signed short vec_srl (vector signed short, vector unsigned char);
+vector unsigned short vec_srl (vector unsigned short,
+                               vector unsigned int);
+vector unsigned short vec_srl (vector unsigned short,
+                               vector unsigned short);
+vector unsigned short vec_srl (vector unsigned short,
+                               vector unsigned char);
+vector bool short vec_srl (vector bool short, vector unsigned int);
+vector bool short vec_srl (vector bool short, vector unsigned short);
+vector bool short vec_srl (vector bool short, vector unsigned char);
+vector pixel vec_srl (vector pixel, vector unsigned int);
+vector pixel vec_srl (vector pixel, vector unsigned short);
+vector pixel vec_srl (vector pixel, vector unsigned char);
+vector signed char vec_srl (vector signed char, vector unsigned int);
+vector signed char vec_srl (vector signed char, vector unsigned short);
+vector signed char vec_srl (vector signed char, vector unsigned char);
+vector unsigned char vec_srl (vector unsigned char,
+                              vector unsigned int);
+vector unsigned char vec_srl (vector unsigned char,
+                              vector unsigned short);
+vector unsigned char vec_srl (vector unsigned char,
+                              vector unsigned char);
+vector bool char vec_srl (vector bool char, vector unsigned int);
+vector bool char vec_srl (vector bool char, vector unsigned short);
+vector bool char vec_srl (vector bool char, vector unsigned char);
 
-vector float vec_ctf (vector unsigned int, const int);
-vector float vec_ctf (vector signed int, const int);
-vector double vec_ctf (vector unsigned long, const int);
-vector double vec_ctf (vector signed long, const int);
+vector float vec_sro (vector float, vector signed char);
+vector float vec_sro (vector float, vector unsigned char);
+vector signed int vec_sro (vector signed int, vector signed char);
+vector signed int vec_sro (vector signed int, vector unsigned char);
+vector unsigned int vec_sro (vector unsigned int, vector signed char);
+vector unsigned int vec_sro (vector unsigned int, vector unsigned char);
+vector signed short vec_sro (vector signed short, vector signed char);
+vector signed short vec_sro (vector signed short, vector unsigned char);
+vector unsigned short vec_sro (vector unsigned short,
+                               vector signed char);
+vector unsigned short vec_sro (vector unsigned short,
+                               vector unsigned char);
+vector pixel vec_sro (vector pixel, vector signed char);
+vector pixel vec_sro (vector pixel, vector unsigned char);
+vector signed char vec_sro (vector signed char, vector signed char);
+vector signed char vec_sro (vector signed char, vector unsigned char);
+vector unsigned char vec_sro (vector unsigned char, vector signed char);
+vector unsigned char vec_sro (vector unsigned char,
+                              vector unsigned char);
 
-vector float vec_vcfsx (vector signed int, const int);
+void vec_st (vector float, int, vector float *);
+void vec_st (vector float, int, float *);
+void vec_st (vector signed int, int, vector signed int *);
+void vec_st (vector signed int, int, int *);
+void vec_st (vector unsigned int, int, vector unsigned int *);
+void vec_st (vector unsigned int, int, unsigned int *);
+void vec_st (vector bool int, int, vector bool int *);
+void vec_st (vector bool int, int, unsigned int *);
+void vec_st (vector bool int, int, int *);
+void vec_st (vector signed short, int, vector signed short *);
+void vec_st (vector signed short, int, short *);
+void vec_st (vector unsigned short, int, vector unsigned short *);
+void vec_st (vector unsigned short, int, unsigned short *);
+void vec_st (vector bool short, int, vector bool short *);
+void vec_st (vector bool short, int, unsigned short *);
+void vec_st (vector pixel, int, vector pixel *);
+void vec_st (vector pixel, int, unsigned short *);
+void vec_st (vector pixel, int, short *);
+void vec_st (vector bool short, int, short *);
+void vec_st (vector signed char, int, vector signed char *);
+void vec_st (vector signed char, int, signed char *);
+void vec_st (vector unsigned char, int, vector unsigned char *);
+void vec_st (vector unsigned char, int, unsigned char *);
+void vec_st (vector bool char, int, vector bool char *);
+void vec_st (vector bool char, int, unsigned char *);
+void vec_st (vector bool char, int, signed char *);
 
-vector float vec_vcfux (vector unsigned int, const int);
+void vec_ste (vector signed char, int, signed char *);
+void vec_ste (vector unsigned char, int, unsigned char *);
+void vec_ste (vector bool char, int, signed char *);
+void vec_ste (vector bool char, int, unsigned char *);
+void vec_ste (vector signed short, int, short *);
+void vec_ste (vector unsigned short, int, unsigned short *);
+void vec_ste (vector bool short, int, short *);
+void vec_ste (vector bool short, int, unsigned short *);
+void vec_ste (vector pixel, int, short *);
+void vec_ste (vector pixel, int, unsigned short *);
+void vec_ste (vector float, int, float *);
+void vec_ste (vector signed int, int, int *);
+void vec_ste (vector unsigned int, int, unsigned int *);
+void vec_ste (vector bool int, int, int *);
+void vec_ste (vector bool int, int, unsigned int *);
 
-vector signed int vec_cts (vector float, const int);
-vector signed long vec_cts (vector double, const int);
+void vec_stvewx (vector float, int, float *);
+void vec_stvewx (vector signed int, int, int *);
+void vec_stvewx (vector unsigned int, int, unsigned int *);
+void vec_stvewx (vector bool int, int, int *);
+void vec_stvewx (vector bool int, int, unsigned int *);
 
-vector unsigned int vec_ctu (vector float, const int);
-vector unsigned long vec_ctu (vector double, const int);
+void vec_stvehx (vector signed short, int, short *);
+void vec_stvehx (vector unsigned short, int, unsigned short *);
+void vec_stvehx (vector bool short, int, short *);
+void vec_stvehx (vector bool short, int, unsigned short *);
+void vec_stvehx (vector pixel, int, short *);
+void vec_stvehx (vector pixel, int, unsigned short *);
 
-void vec_dss (const int);
+void vec_stvebx (vector signed char, int, signed char *);
+void vec_stvebx (vector unsigned char, int, unsigned char *);
+void vec_stvebx (vector bool char, int, signed char *);
+void vec_stvebx (vector bool char, int, unsigned char *);
 
-void vec_dssall (void);
+void vec_stl (vector float, int, vector float *);
+void vec_stl (vector float, int, float *);
+void vec_stl (vector signed int, int, vector signed int *);
+void vec_stl (vector signed int, int, int *);
+void vec_stl (vector unsigned int, int, vector unsigned int *);
+void vec_stl (vector unsigned int, int, unsigned int *);
+void vec_stl (vector bool int, int, vector bool int *);
+void vec_stl (vector bool int, int, unsigned int *);
+void vec_stl (vector bool int, int, int *);
+void vec_stl (vector signed short, int, vector signed short *);
+void vec_stl (vector signed short, int, short *);
+void vec_stl (vector unsigned short, int, vector unsigned short *);
+void vec_stl (vector unsigned short, int, unsigned short *);
+void vec_stl (vector bool short, int, vector bool short *);
+void vec_stl (vector bool short, int, unsigned short *);
+void vec_stl (vector bool short, int, short *);
+void vec_stl (vector pixel, int, vector pixel *);
+void vec_stl (vector pixel, int, unsigned short *);
+void vec_stl (vector pixel, int, short *);
+void vec_stl (vector signed char, int, vector signed char *);
+void vec_stl (vector signed char, int, signed char *);
+void vec_stl (vector unsigned char, int, vector unsigned char *);
+void vec_stl (vector unsigned char, int, unsigned char *);
+void vec_stl (vector bool char, int, vector bool char *);
+void vec_stl (vector bool char, int, unsigned char *);
+void vec_stl (vector bool char, int, signed char *);
 
-void vec_dst (const vector unsigned char *, int, const int);
-void vec_dst (const vector signed char *, int, const int);
-void vec_dst (const vector bool char *, int, const int);
-void vec_dst (const vector unsigned short *, int, const int);
-void vec_dst (const vector signed short *, int, const int);
-void vec_dst (const vector bool short *, int, const int);
-void vec_dst (const vector pixel *, int, const int);
-void vec_dst (const vector unsigned int *, int, const int);
-void vec_dst (const vector signed int *, int, const int);
-void vec_dst (const vector bool int *, int, const int);
-void vec_dst (const vector float *, int, const int);
-void vec_dst (const unsigned char *, int, const int);
-void vec_dst (const signed char *, int, const int);
-void vec_dst (const unsigned short *, int, const int);
-void vec_dst (const short *, int, const int);
-void vec_dst (const unsigned int *, int, const int);
-void vec_dst (const int *, int, const int);
-void vec_dst (const unsigned long *, int, const int);
-void vec_dst (const long *, int, const int);
-void vec_dst (const float *, int, const int);
+vector signed char vec_sub (vector bool char, vector signed char);
+vector signed char vec_sub (vector signed char, vector bool char);
+vector signed char vec_sub (vector signed char, vector signed char);
+vector unsigned char vec_sub (vector bool char, vector unsigned char);
+vector unsigned char vec_sub (vector unsigned char, vector bool char);
+vector unsigned char vec_sub (vector unsigned char,
+                              vector unsigned char);
+vector signed short vec_sub (vector bool short, vector signed short);
+vector signed short vec_sub (vector signed short, vector bool short);
+vector signed short vec_sub (vector signed short, vector signed short);
+vector unsigned short vec_sub (vector bool short,
+                               vector unsigned short);
+vector unsigned short vec_sub (vector unsigned short,
+                               vector bool short);
+vector unsigned short vec_sub (vector unsigned short,
+                               vector unsigned short);
+vector signed int vec_sub (vector bool int, vector signed int);
+vector signed int vec_sub (vector signed int, vector bool int);
+vector signed int vec_sub (vector signed int, vector signed int);
+vector unsigned int vec_sub (vector bool int, vector unsigned int);
+vector unsigned int vec_sub (vector unsigned int, vector bool int);
+vector unsigned int vec_sub (vector unsigned int, vector unsigned int);
+vector float vec_sub (vector float, vector float);
 
-void vec_dstst (const vector unsigned char *, int, const int);
-void vec_dstst (const vector signed char *, int, const int);
-void vec_dstst (const vector bool char *, int, const int);
-void vec_dstst (const vector unsigned short *, int, const int);
-void vec_dstst (const vector signed short *, int, const int);
-void vec_dstst (const vector bool short *, int, const int);
-void vec_dstst (const vector pixel *, int, const int);
-void vec_dstst (const vector unsigned int *, int, const int);
-void vec_dstst (const vector signed int *, int, const int);
-void vec_dstst (const vector bool int *, int, const int);
-void vec_dstst (const vector float *, int, const int);
-void vec_dstst (const unsigned char *, int, const int);
-void vec_dstst (const signed char *, int, const int);
-void vec_dstst (const unsigned short *, int, const int);
-void vec_dstst (const short *, int, const int);
-void vec_dstst (const unsigned int *, int, const int);
-void vec_dstst (const int *, int, const int);
-void vec_dstst (const unsigned long *, int, const int);
-void vec_dstst (const long *, int, const int);
-void vec_dstst (const float *, int, const int);
+vector float vec_vsubfp (vector float, vector float);
 
-void vec_dststt (const vector unsigned char *, int, const int);
-void vec_dststt (const vector signed char *, int, const int);
-void vec_dststt (const vector bool char *, int, const int);
-void vec_dststt (const vector unsigned short *, int, const int);
-void vec_dststt (const vector signed short *, int, const int);
-void vec_dststt (const vector bool short *, int, const int);
-void vec_dststt (const vector pixel *, int, const int);
-void vec_dststt (const vector unsigned int *, int, const int);
-void vec_dststt (const vector signed int *, int, const int);
-void vec_dststt (const vector bool int *, int, const int);
-void vec_dststt (const vector float *, int, const int);
-void vec_dststt (const unsigned char *, int, const int);
-void vec_dststt (const signed char *, int, const int);
-void vec_dststt (const unsigned short *, int, const int);
-void vec_dststt (const short *, int, const int);
-void vec_dststt (const unsigned int *, int, const int);
-void vec_dststt (const int *, int, const int);
-void vec_dststt (const unsigned long *, int, const int);
-void vec_dststt (const long *, int, const int);
-void vec_dststt (const float *, int, const int);
+vector signed int vec_vsubuwm (vector bool int, vector signed int);
+vector signed int vec_vsubuwm (vector signed int, vector bool int);
+vector signed int vec_vsubuwm (vector signed int, vector signed int);
+vector unsigned int vec_vsubuwm (vector bool int, vector unsigned int);
+vector unsigned int vec_vsubuwm (vector unsigned int, vector bool int);
+vector unsigned int vec_vsubuwm (vector unsigned int,
+                                 vector unsigned int);
 
-void vec_dstt (const vector unsigned char *, int, const int);
-void vec_dstt (const vector signed char *, int, const int);
-void vec_dstt (const vector bool char *, int, const int);
-void vec_dstt (const vector unsigned short *, int, const int);
-void vec_dstt (const vector signed short *, int, const int);
-void vec_dstt (const vector bool short *, int, const int);
-void vec_dstt (const vector pixel *, int, const int);
-void vec_dstt (const vector unsigned int *, int, const int);
-void vec_dstt (const vector signed int *, int, const int);
-void vec_dstt (const vector bool int *, int, const int);
-void vec_dstt (const vector float *, int, const int);
-void vec_dstt (const unsigned char *, int, const int);
-void vec_dstt (const signed char *, int, const int);
-void vec_dstt (const unsigned short *, int, const int);
-void vec_dstt (const short *, int, const int);
-void vec_dstt (const unsigned int *, int, const int);
-void vec_dstt (const int *, int, const int);
-void vec_dstt (const unsigned long *, int, const int);
-void vec_dstt (const long *, int, const int);
-void vec_dstt (const float *, int, const int);
+vector signed short vec_vsubuhm (vector bool short,
+                                 vector signed short);
+vector signed short vec_vsubuhm (vector signed short,
+                                 vector bool short);
+vector signed short vec_vsubuhm (vector signed short,
+                                 vector signed short);
+vector unsigned short vec_vsubuhm (vector bool short,
+                                   vector unsigned short);
+vector unsigned short vec_vsubuhm (vector unsigned short,
+                                   vector bool short);
+vector unsigned short vec_vsubuhm (vector unsigned short,
+                                   vector unsigned short);
 
-vector float vec_expte (vector float);
+vector signed char vec_vsububm (vector bool char, vector signed char);
+vector signed char vec_vsububm (vector signed char, vector bool char);
+vector signed char vec_vsububm (vector signed char, vector signed char);
+vector unsigned char vec_vsububm (vector bool char,
+                                  vector unsigned char);
+vector unsigned char vec_vsububm (vector unsigned char,
+                                  vector bool char);
+vector unsigned char vec_vsububm (vector unsigned char,
+                                  vector unsigned char);
 
-vector float vec_floor (vector float);
+vector unsigned int vec_subc (vector unsigned int, vector unsigned int);
 
-vector float vec_ld (int, const vector float *);
-vector float vec_ld (int, const float *);
-vector bool int vec_ld (int, const vector bool int *);
-vector signed int vec_ld (int, const vector signed int *);
-vector signed int vec_ld (int, const int *);
-vector signed int vec_ld (int, const long *);
-vector unsigned int vec_ld (int, const vector unsigned int *);
-vector unsigned int vec_ld (int, const unsigned int *);
-vector unsigned int vec_ld (int, const unsigned long *);
-vector bool short vec_ld (int, const vector bool short *);
-vector pixel vec_ld (int, const vector pixel *);
-vector signed short vec_ld (int, const vector signed short *);
-vector signed short vec_ld (int, const short *);
-vector unsigned short vec_ld (int, const vector unsigned short *);
-vector unsigned short vec_ld (int, const unsigned short *);
-vector bool char vec_ld (int, const vector bool char *);
-vector signed char vec_ld (int, const vector signed char *);
-vector signed char vec_ld (int, const signed char *);
-vector unsigned char vec_ld (int, const vector unsigned char *);
-vector unsigned char vec_ld (int, const unsigned char *);
+vector unsigned char vec_subs (vector bool char, vector unsigned char);
+vector unsigned char vec_subs (vector unsigned char, vector bool char);
+vector unsigned char vec_subs (vector unsigned char,
+                               vector unsigned char);
+vector signed char vec_subs (vector bool char, vector signed char);
+vector signed char vec_subs (vector signed char, vector bool char);
+vector signed char vec_subs (vector signed char, vector signed char);
+vector unsigned short vec_subs (vector bool short,
+                                vector unsigned short);
+vector unsigned short vec_subs (vector unsigned short,
+                                vector bool short);
+vector unsigned short vec_subs (vector unsigned short,
+                                vector unsigned short);
+vector signed short vec_subs (vector bool short, vector signed short);
+vector signed short vec_subs (vector signed short, vector bool short);
+vector signed short vec_subs (vector signed short, vector signed short);
+vector unsigned int vec_subs (vector bool int, vector unsigned int);
+vector unsigned int vec_subs (vector unsigned int, vector bool int);
+vector unsigned int vec_subs (vector unsigned int, vector unsigned int);
+vector signed int vec_subs (vector bool int, vector signed int);
+vector signed int vec_subs (vector signed int, vector bool int);
+vector signed int vec_subs (vector signed int, vector signed int);
 
-vector signed char vec_lde (int, const signed char *);
-vector unsigned char vec_lde (int, const unsigned char *);
-vector signed short vec_lde (int, const short *);
-vector unsigned short vec_lde (int, const unsigned short *);
-vector float vec_lde (int, const float *);
-vector signed int vec_lde (int, const int *);
-vector unsigned int vec_lde (int, const unsigned int *);
-vector signed int vec_lde (int, const long *);
-vector unsigned int vec_lde (int, const unsigned long *);
+vector signed int vec_vsubsws (vector bool int, vector signed int);
+vector signed int vec_vsubsws (vector signed int, vector bool int);
+vector signed int vec_vsubsws (vector signed int, vector signed int);
 
-vector float vec_lvewx (int, float *);
-vector signed int vec_lvewx (int, int *);
-vector unsigned int vec_lvewx (int, unsigned int *);
-vector signed int vec_lvewx (int, long *);
-vector unsigned int vec_lvewx (int, unsigned long *);
+vector unsigned int vec_vsubuws (vector bool int, vector unsigned int);
+vector unsigned int vec_vsubuws (vector unsigned int, vector bool int);
+vector unsigned int vec_vsubuws (vector unsigned int,
+                                 vector unsigned int);
 
-vector signed short vec_lvehx (int, short *);
-vector unsigned short vec_lvehx (int, unsigned short *);
+vector signed short vec_vsubshs (vector bool short,
+                                 vector signed short);
+vector signed short vec_vsubshs (vector signed short,
+                                 vector bool short);
+vector signed short vec_vsubshs (vector signed short,
+                                 vector signed short);
 
-vector signed char vec_lvebx (int, char *);
-vector unsigned char vec_lvebx (int, unsigned char *);
+vector unsigned short vec_vsubuhs (vector bool short,
+                                   vector unsigned short);
+vector unsigned short vec_vsubuhs (vector unsigned short,
+                                   vector bool short);
+vector unsigned short vec_vsubuhs (vector unsigned short,
+                                   vector unsigned short);
 
-vector float vec_ldl (int, const vector float *);
-vector float vec_ldl (int, const float *);
-vector bool int vec_ldl (int, const vector bool int *);
-vector signed int vec_ldl (int, const vector signed int *);
-vector signed int vec_ldl (int, const int *);
-vector signed int vec_ldl (int, const long *);
-vector unsigned int vec_ldl (int, const vector unsigned int *);
-vector unsigned int vec_ldl (int, const unsigned int *);
-vector unsigned int vec_ldl (int, const unsigned long *);
-vector bool short vec_ldl (int, const vector bool short *);
-vector pixel vec_ldl (int, const vector pixel *);
-vector signed short vec_ldl (int, const vector signed short *);
-vector signed short vec_ldl (int, const short *);
-vector unsigned short vec_ldl (int, const vector unsigned short *);
-vector unsigned short vec_ldl (int, const unsigned short *);
-vector bool char vec_ldl (int, const vector bool char *);
-vector signed char vec_ldl (int, const vector signed char *);
-vector signed char vec_ldl (int, const signed char *);
-vector unsigned char vec_ldl (int, const vector unsigned char *);
-vector unsigned char vec_ldl (int, const unsigned char *);
+vector signed char vec_vsubsbs (vector bool char, vector signed char);
+vector signed char vec_vsubsbs (vector signed char, vector bool char);
+vector signed char vec_vsubsbs (vector signed char, vector signed char);
 
-vector float vec_loge (vector float);
+vector unsigned char vec_vsububs (vector bool char,
+                                  vector unsigned char);
+vector unsigned char vec_vsububs (vector unsigned char,
+                                  vector bool char);
+vector unsigned char vec_vsububs (vector unsigned char,
+                                  vector unsigned char);
 
-vector unsigned char vec_lvsl (int, const volatile unsigned char *);
-vector unsigned char vec_lvsl (int, const volatile signed char *);
-vector unsigned char vec_lvsl (int, const volatile unsigned short *);
-vector unsigned char vec_lvsl (int, const volatile short *);
-vector unsigned char vec_lvsl (int, const volatile unsigned int *);
-vector unsigned char vec_lvsl (int, const volatile int *);
-vector unsigned char vec_lvsl (int, const volatile unsigned long *);
-vector unsigned char vec_lvsl (int, const volatile long *);
-vector unsigned char vec_lvsl (int, const volatile float *);
+vector unsigned int vec_sum4s (vector unsigned char,
+                               vector unsigned int);
+vector signed int vec_sum4s (vector signed char, vector signed int);
+vector signed int vec_sum4s (vector signed short, vector signed int);
 
-vector unsigned char vec_lvsr (int, const volatile unsigned char *);
-vector unsigned char vec_lvsr (int, const volatile signed char *);
-vector unsigned char vec_lvsr (int, const volatile unsigned short *);
-vector unsigned char vec_lvsr (int, const volatile short *);
-vector unsigned char vec_lvsr (int, const volatile unsigned int *);
-vector unsigned char vec_lvsr (int, const volatile int *);
-vector unsigned char vec_lvsr (int, const volatile unsigned long *);
-vector unsigned char vec_lvsr (int, const volatile long *);
-vector unsigned char vec_lvsr (int, const volatile float *);
+vector signed int vec_vsum4shs (vector signed short, vector signed int);
 
-vector float vec_madd (vector float, vector float, vector float);
+vector signed int vec_vsum4sbs (vector signed char, vector signed int);
 
-vector signed short vec_madds (vector signed short,
-                               vector signed short,
-                               vector signed short);
+vector unsigned int vec_vsum4ubs (vector unsigned char,
+                                  vector unsigned int);
 
-vector unsigned char vec_max (vector bool char, vector unsigned char);
-vector unsigned char vec_max (vector unsigned char, vector bool char);
-vector unsigned char vec_max (vector unsigned char,
-                              vector unsigned char);
-vector signed char vec_max (vector bool char, vector signed char);
-vector signed char vec_max (vector signed char, vector bool char);
-vector signed char vec_max (vector signed char, vector signed char);
-vector unsigned short vec_max (vector bool short,
-                               vector unsigned short);
-vector unsigned short vec_max (vector unsigned short,
-                               vector bool short);
-vector unsigned short vec_max (vector unsigned short,
-                               vector unsigned short);
-vector signed short vec_max (vector bool short, vector signed short);
-vector signed short vec_max (vector signed short, vector bool short);
-vector signed short vec_max (vector signed short, vector signed short);
-vector unsigned int vec_max (vector bool int, vector unsigned int);
-vector unsigned int vec_max (vector unsigned int, vector bool int);
-vector unsigned int vec_max (vector unsigned int, vector unsigned int);
-vector signed int vec_max (vector bool int, vector signed int);
-vector signed int vec_max (vector signed int, vector bool int);
-vector signed int vec_max (vector signed int, vector signed int);
-vector float vec_max (vector float, vector float);
+vector signed int vec_sum2s (vector signed int, vector signed int);
 
-vector float vec_vmaxfp (vector float, vector float);
+vector signed int vec_sums (vector signed int, vector signed int);
 
-vector signed int vec_vmaxsw (vector bool int, vector signed int);
-vector signed int vec_vmaxsw (vector signed int, vector bool int);
-vector signed int vec_vmaxsw (vector signed int, vector signed int);
+vector float vec_trunc (vector float);
 
-vector unsigned int vec_vmaxuw (vector bool int, vector unsigned int);
-vector unsigned int vec_vmaxuw (vector unsigned int, vector bool int);
-vector unsigned int vec_vmaxuw (vector unsigned int,
-                                vector unsigned int);
+vector signed short vec_unpackh (vector signed char);
+vector bool short vec_unpackh (vector bool char);
+vector signed int vec_unpackh (vector signed short);
+vector bool int vec_unpackh (vector bool short);
+vector unsigned int vec_unpackh (vector pixel);
 
-vector signed short vec_vmaxsh (vector bool short, vector signed short);
-vector signed short vec_vmaxsh (vector signed short, vector bool short);
-vector signed short vec_vmaxsh (vector signed short,
-                                vector signed short);
+vector bool int vec_vupkhsh (vector bool short);
+vector signed int vec_vupkhsh (vector signed short);
 
-vector unsigned short vec_vmaxuh (vector bool short,
-                                  vector unsigned short);
-vector unsigned short vec_vmaxuh (vector unsigned short,
-                                  vector bool short);
-vector unsigned short vec_vmaxuh (vector unsigned short,
-                                  vector unsigned short);
+vector unsigned int vec_vupkhpx (vector pixel);
 
-vector signed char vec_vmaxsb (vector bool char, vector signed char);
-vector signed char vec_vmaxsb (vector signed char, vector bool char);
-vector signed char vec_vmaxsb (vector signed char, vector signed char);
+vector bool short vec_vupkhsb (vector bool char);
+vector signed short vec_vupkhsb (vector signed char);
 
-vector unsigned char vec_vmaxub (vector bool char,
-                                 vector unsigned char);
-vector unsigned char vec_vmaxub (vector unsigned char,
-                                 vector bool char);
-vector unsigned char vec_vmaxub (vector unsigned char,
-                                 vector unsigned char);
+vector signed short vec_unpackl (vector signed char);
+vector bool short vec_unpackl (vector bool char);
+vector unsigned int vec_unpackl (vector pixel);
+vector signed int vec_unpackl (vector signed short);
+vector bool int vec_unpackl (vector bool short);
 
-vector bool char vec_mergeh (vector bool char, vector bool char);
-vector signed char vec_mergeh (vector signed char, vector signed char);
-vector unsigned char vec_mergeh (vector unsigned char,
-                                 vector unsigned char);
-vector bool short vec_mergeh (vector bool short, vector bool short);
-vector pixel vec_mergeh (vector pixel, vector pixel);
-vector signed short vec_mergeh (vector signed short,
-                                vector signed short);
-vector unsigned short vec_mergeh (vector unsigned short,
-                                  vector unsigned short);
-vector float vec_mergeh (vector float, vector float);
-vector bool int vec_mergeh (vector bool int, vector bool int);
-vector signed int vec_mergeh (vector signed int, vector signed int);
-vector unsigned int vec_mergeh (vector unsigned int,
-                                vector unsigned int);
+vector unsigned int vec_vupklpx (vector pixel);
 
-vector float vec_vmrghw (vector float, vector float);
-vector bool int vec_vmrghw (vector bool int, vector bool int);
-vector signed int vec_vmrghw (vector signed int, vector signed int);
-vector unsigned int vec_vmrghw (vector unsigned int,
-                                vector unsigned int);
+vector bool int vec_vupklsh (vector bool short);
+vector signed int vec_vupklsh (vector signed short);
 
-vector bool short vec_vmrghh (vector bool short, vector bool short);
-vector signed short vec_vmrghh (vector signed short,
-                                vector signed short);
-vector unsigned short vec_vmrghh (vector unsigned short,
-                                  vector unsigned short);
-vector pixel vec_vmrghh (vector pixel, vector pixel);
+vector bool short vec_vupklsb (vector bool char);
+vector signed short vec_vupklsb (vector signed char);
 
-vector bool char vec_vmrghb (vector bool char, vector bool char);
-vector signed char vec_vmrghb (vector signed char, vector signed char);
-vector unsigned char vec_vmrghb (vector unsigned char,
-                                 vector unsigned char);
+vector float vec_xor (vector float, vector float);
+vector float vec_xor (vector float, vector bool int);
+vector float vec_xor (vector bool int, vector float);
+vector bool int vec_xor (vector bool int, vector bool int);
+vector signed int vec_xor (vector bool int, vector signed int);
+vector signed int vec_xor (vector signed int, vector bool int);
+vector signed int vec_xor (vector signed int, vector signed int);
+vector unsigned int vec_xor (vector bool int, vector unsigned int);
+vector unsigned int vec_xor (vector unsigned int, vector bool int);
+vector unsigned int vec_xor (vector unsigned int, vector unsigned int);
+vector bool short vec_xor (vector bool short, vector bool short);
+vector signed short vec_xor (vector bool short, vector signed short);
+vector signed short vec_xor (vector signed short, vector bool short);
+vector signed short vec_xor (vector signed short, vector signed short);
+vector unsigned short vec_xor (vector bool short,
+                               vector unsigned short);
+vector unsigned short vec_xor (vector unsigned short,
+                               vector bool short);
+vector unsigned short vec_xor (vector unsigned short,
+                               vector unsigned short);
+vector signed char vec_xor (vector bool char, vector signed char);
+vector bool char vec_xor (vector bool char, vector bool char);
+vector signed char vec_xor (vector signed char, vector bool char);
+vector signed char vec_xor (vector signed char, vector signed char);
+vector unsigned char vec_xor (vector bool char, vector unsigned char);
+vector unsigned char vec_xor (vector unsigned char, vector bool char);
+vector unsigned char vec_xor (vector unsigned char,
+                              vector unsigned char);
+
+int vec_all_eq (vector signed char, vector bool char);
+int vec_all_eq (vector signed char, vector signed char);
+int vec_all_eq (vector unsigned char, vector bool char);
+int vec_all_eq (vector unsigned char, vector unsigned char);
+int vec_all_eq (vector bool char, vector bool char);
+int vec_all_eq (vector bool char, vector unsigned char);
+int vec_all_eq (vector bool char, vector signed char);
+int vec_all_eq (vector signed short, vector bool short);
+int vec_all_eq (vector signed short, vector signed short);
+int vec_all_eq (vector unsigned short, vector bool short);
+int vec_all_eq (vector unsigned short, vector unsigned short);
+int vec_all_eq (vector bool short, vector bool short);
+int vec_all_eq (vector bool short, vector unsigned short);
+int vec_all_eq (vector bool short, vector signed short);
+int vec_all_eq (vector pixel, vector pixel);
+int vec_all_eq (vector signed int, vector bool int);
+int vec_all_eq (vector signed int, vector signed int);
+int vec_all_eq (vector unsigned int, vector bool int);
+int vec_all_eq (vector unsigned int, vector unsigned int);
+int vec_all_eq (vector bool int, vector bool int);
+int vec_all_eq (vector bool int, vector unsigned int);
+int vec_all_eq (vector bool int, vector signed int);
+int vec_all_eq (vector float, vector float);
 
-vector bool char vec_mergel (vector bool char, vector bool char);
-vector signed char vec_mergel (vector signed char, vector signed char);
-vector unsigned char vec_mergel (vector unsigned char,
-                                 vector unsigned char);
-vector bool short vec_mergel (vector bool short, vector bool short);
-vector pixel vec_mergel (vector pixel, vector pixel);
-vector signed short vec_mergel (vector signed short,
-                                vector signed short);
-vector unsigned short vec_mergel (vector unsigned short,
-                                  vector unsigned short);
-vector float vec_mergel (vector float, vector float);
-vector bool int vec_mergel (vector bool int, vector bool int);
-vector signed int vec_mergel (vector signed int, vector signed int);
-vector unsigned int vec_mergel (vector unsigned int,
-                                vector unsigned int);
+int vec_all_ge (vector bool char, vector unsigned char);
+int vec_all_ge (vector unsigned char, vector bool char);
+int vec_all_ge (vector unsigned char, vector unsigned char);
+int vec_all_ge (vector bool char, vector signed char);
+int vec_all_ge (vector signed char, vector bool char);
+int vec_all_ge (vector signed char, vector signed char);
+int vec_all_ge (vector bool short, vector unsigned short);
+int vec_all_ge (vector unsigned short, vector bool short);
+int vec_all_ge (vector unsigned short, vector unsigned short);
+int vec_all_ge (vector signed short, vector signed short);
+int vec_all_ge (vector bool short, vector signed short);
+int vec_all_ge (vector signed short, vector bool short);
+int vec_all_ge (vector bool int, vector unsigned int);
+int vec_all_ge (vector unsigned int, vector bool int);
+int vec_all_ge (vector unsigned int, vector unsigned int);
+int vec_all_ge (vector bool int, vector signed int);
+int vec_all_ge (vector signed int, vector bool int);
+int vec_all_ge (vector signed int, vector signed int);
+int vec_all_ge (vector float, vector float);
 
-vector float vec_vmrglw (vector float, vector float);
-vector signed int vec_vmrglw (vector signed int, vector signed int);
-vector unsigned int vec_vmrglw (vector unsigned int,
-                                vector unsigned int);
-vector bool int vec_vmrglw (vector bool int, vector bool int);
+int vec_all_gt (vector bool char, vector unsigned char);
+int vec_all_gt (vector unsigned char, vector bool char);
+int vec_all_gt (vector unsigned char, vector unsigned char);
+int vec_all_gt (vector bool char, vector signed char);
+int vec_all_gt (vector signed char, vector bool char);
+int vec_all_gt (vector signed char, vector signed char);
+int vec_all_gt (vector bool short, vector unsigned short);
+int vec_all_gt (vector unsigned short, vector bool short);
+int vec_all_gt (vector unsigned short, vector unsigned short);
+int vec_all_gt (vector bool short, vector signed short);
+int vec_all_gt (vector signed short, vector bool short);
+int vec_all_gt (vector signed short, vector signed short);
+int vec_all_gt (vector bool int, vector unsigned int);
+int vec_all_gt (vector unsigned int, vector bool int);
+int vec_all_gt (vector unsigned int, vector unsigned int);
+int vec_all_gt (vector bool int, vector signed int);
+int vec_all_gt (vector signed int, vector bool int);
+int vec_all_gt (vector signed int, vector signed int);
+int vec_all_gt (vector float, vector float);
 
-vector bool short vec_vmrglh (vector bool short, vector bool short);
-vector signed short vec_vmrglh (vector signed short,
-                                vector signed short);
-vector unsigned short vec_vmrglh (vector unsigned short,
-                                  vector unsigned short);
-vector pixel vec_vmrglh (vector pixel, vector pixel);
+int vec_all_in (vector float, vector float);
 
-vector bool char vec_vmrglb (vector bool char, vector bool char);
-vector signed char vec_vmrglb (vector signed char, vector signed char);
-vector unsigned char vec_vmrglb (vector unsigned char,
-                                 vector unsigned char);
+int vec_all_le (vector bool char, vector unsigned char);
+int vec_all_le (vector unsigned char, vector bool char);
+int vec_all_le (vector unsigned char, vector unsigned char);
+int vec_all_le (vector bool char, vector signed char);
+int vec_all_le (vector signed char, vector bool char);
+int vec_all_le (vector signed char, vector signed char);
+int vec_all_le (vector bool short, vector unsigned short);
+int vec_all_le (vector unsigned short, vector bool short);
+int vec_all_le (vector unsigned short, vector unsigned short);
+int vec_all_le (vector bool short, vector signed short);
+int vec_all_le (vector signed short, vector bool short);
+int vec_all_le (vector signed short, vector signed short);
+int vec_all_le (vector bool int, vector unsigned int);
+int vec_all_le (vector unsigned int, vector bool int);
+int vec_all_le (vector unsigned int, vector unsigned int);
+int vec_all_le (vector bool int, vector signed int);
+int vec_all_le (vector signed int, vector bool int);
+int vec_all_le (vector signed int, vector signed int);
+int vec_all_le (vector float, vector float);
 
-vector unsigned short vec_mfvscr (void);
+int vec_all_lt (vector bool char, vector unsigned char);
+int vec_all_lt (vector unsigned char, vector bool char);
+int vec_all_lt (vector unsigned char, vector unsigned char);
+int vec_all_lt (vector bool char, vector signed char);
+int vec_all_lt (vector signed char, vector bool char);
+int vec_all_lt (vector signed char, vector signed char);
+int vec_all_lt (vector bool short, vector unsigned short);
+int vec_all_lt (vector unsigned short, vector bool short);
+int vec_all_lt (vector unsigned short, vector unsigned short);
+int vec_all_lt (vector bool short, vector signed short);
+int vec_all_lt (vector signed short, vector bool short);
+int vec_all_lt (vector signed short, vector signed short);
+int vec_all_lt (vector bool int, vector unsigned int);
+int vec_all_lt (vector unsigned int, vector bool int);
+int vec_all_lt (vector unsigned int, vector unsigned int);
+int vec_all_lt (vector bool int, vector signed int);
+int vec_all_lt (vector signed int, vector bool int);
+int vec_all_lt (vector signed int, vector signed int);
+int vec_all_lt (vector float, vector float);
 
-vector unsigned char vec_min (vector bool char, vector unsigned char);
-vector unsigned char vec_min (vector unsigned char, vector bool char);
-vector unsigned char vec_min (vector unsigned char,
-                              vector unsigned char);
-vector signed char vec_min (vector bool char, vector signed char);
-vector signed char vec_min (vector signed char, vector bool char);
-vector signed char vec_min (vector signed char, vector signed char);
-vector unsigned short vec_min (vector bool short,
-                               vector unsigned short);
-vector unsigned short vec_min (vector unsigned short,
-                               vector bool short);
-vector unsigned short vec_min (vector unsigned short,
-                               vector unsigned short);
-vector signed short vec_min (vector bool short, vector signed short);
-vector signed short vec_min (vector signed short, vector bool short);
-vector signed short vec_min (vector signed short, vector signed short);
-vector unsigned int vec_min (vector bool int, vector unsigned int);
-vector unsigned int vec_min (vector unsigned int, vector bool int);
-vector unsigned int vec_min (vector unsigned int, vector unsigned int);
-vector signed int vec_min (vector bool int, vector signed int);
-vector signed int vec_min (vector signed int, vector bool int);
-vector signed int vec_min (vector signed int, vector signed int);
-vector float vec_min (vector float, vector float);
+int vec_all_nan (vector float);
 
-vector float vec_vminfp (vector float, vector float);
+int vec_all_ne (vector signed char, vector bool char);
+int vec_all_ne (vector signed char, vector signed char);
+int vec_all_ne (vector unsigned char, vector bool char);
+int vec_all_ne (vector unsigned char, vector unsigned char);
+int vec_all_ne (vector bool char, vector bool char);
+int vec_all_ne (vector bool char, vector unsigned char);
+int vec_all_ne (vector bool char, vector signed char);
+int vec_all_ne (vector signed short, vector bool short);
+int vec_all_ne (vector signed short, vector signed short);
+int vec_all_ne (vector unsigned short, vector bool short);
+int vec_all_ne (vector unsigned short, vector unsigned short);
+int vec_all_ne (vector bool short, vector bool short);
+int vec_all_ne (vector bool short, vector unsigned short);
+int vec_all_ne (vector bool short, vector signed short);
+int vec_all_ne (vector pixel, vector pixel);
+int vec_all_ne (vector signed int, vector bool int);
+int vec_all_ne (vector signed int, vector signed int);
+int vec_all_ne (vector unsigned int, vector bool int);
+int vec_all_ne (vector unsigned int, vector unsigned int);
+int vec_all_ne (vector bool int, vector bool int);
+int vec_all_ne (vector bool int, vector unsigned int);
+int vec_all_ne (vector bool int, vector signed int);
+int vec_all_ne (vector float, vector float);
 
-vector signed int vec_vminsw (vector bool int, vector signed int);
-vector signed int vec_vminsw (vector signed int, vector bool int);
-vector signed int vec_vminsw (vector signed int, vector signed int);
+int vec_all_nge (vector float, vector float);
 
-vector unsigned int vec_vminuw (vector bool int, vector unsigned int);
-vector unsigned int vec_vminuw (vector unsigned int, vector bool int);
-vector unsigned int vec_vminuw (vector unsigned int,
-                                vector unsigned int);
+int vec_all_ngt (vector float, vector float);
 
-vector signed short vec_vminsh (vector bool short, vector signed short);
-vector signed short vec_vminsh (vector signed short, vector bool short);
-vector signed short vec_vminsh (vector signed short,
-                                vector signed short);
+int vec_all_nle (vector float, vector float);
 
-vector unsigned short vec_vminuh (vector bool short,
-                                  vector unsigned short);
-vector unsigned short vec_vminuh (vector unsigned short,
-                                  vector bool short);
-vector unsigned short vec_vminuh (vector unsigned short,
-                                  vector unsigned short);
+int vec_all_nlt (vector float, vector float);
 
-vector signed char vec_vminsb (vector bool char, vector signed char);
-vector signed char vec_vminsb (vector signed char, vector bool char);
-vector signed char vec_vminsb (vector signed char, vector signed char);
+int vec_all_numeric (vector float);
 
-vector unsigned char vec_vminub (vector bool char,
-                                 vector unsigned char);
-vector unsigned char vec_vminub (vector unsigned char,
-                                 vector bool char);
-vector unsigned char vec_vminub (vector unsigned char,
-                                 vector unsigned char);
+int vec_any_eq (vector signed char, vector bool char);
+int vec_any_eq (vector signed char, vector signed char);
+int vec_any_eq (vector unsigned char, vector bool char);
+int vec_any_eq (vector unsigned char, vector unsigned char);
+int vec_any_eq (vector bool char, vector bool char);
+int vec_any_eq (vector bool char, vector unsigned char);
+int vec_any_eq (vector bool char, vector signed char);
+int vec_any_eq (vector signed short, vector bool short);
+int vec_any_eq (vector signed short, vector signed short);
+int vec_any_eq (vector unsigned short, vector bool short);
+int vec_any_eq (vector unsigned short, vector unsigned short);
+int vec_any_eq (vector bool short, vector bool short);
+int vec_any_eq (vector bool short, vector unsigned short);
+int vec_any_eq (vector bool short, vector signed short);
+int vec_any_eq (vector pixel, vector pixel);
+int vec_any_eq (vector signed int, vector bool int);
+int vec_any_eq (vector signed int, vector signed int);
+int vec_any_eq (vector unsigned int, vector bool int);
+int vec_any_eq (vector unsigned int, vector unsigned int);
+int vec_any_eq (vector bool int, vector bool int);
+int vec_any_eq (vector bool int, vector unsigned int);
+int vec_any_eq (vector bool int, vector signed int);
+int vec_any_eq (vector float, vector float);
 
-vector signed short vec_mladd (vector signed short,
-                               vector signed short,
-                               vector signed short);
-vector signed short vec_mladd (vector signed short,
-                               vector unsigned short,
-                               vector unsigned short);
-vector signed short vec_mladd (vector unsigned short,
-                               vector signed short,
-                               vector signed short);
-vector unsigned short vec_mladd (vector unsigned short,
-                                 vector unsigned short,
-                                 vector unsigned short);
+int vec_any_ge (vector signed char, vector bool char);
+int vec_any_ge (vector unsigned char, vector bool char);
+int vec_any_ge (vector unsigned char, vector unsigned char);
+int vec_any_ge (vector signed char, vector signed char);
+int vec_any_ge (vector bool char, vector unsigned char);
+int vec_any_ge (vector bool char, vector signed char);
+int vec_any_ge (vector unsigned short, vector bool short);
+int vec_any_ge (vector unsigned short, vector unsigned short);
+int vec_any_ge (vector signed short, vector signed short);
+int vec_any_ge (vector signed short, vector bool short);
+int vec_any_ge (vector bool short, vector unsigned short);
+int vec_any_ge (vector bool short, vector signed short);
+int vec_any_ge (vector signed int, vector bool int);
+int vec_any_ge (vector unsigned int, vector bool int);
+int vec_any_ge (vector unsigned int, vector unsigned int);
+int vec_any_ge (vector signed int, vector signed int);
+int vec_any_ge (vector bool int, vector unsigned int);
+int vec_any_ge (vector bool int, vector signed int);
+int vec_any_ge (vector float, vector float);
 
-vector signed short vec_mradds (vector signed short,
-                                vector signed short,
-                                vector signed short);
+int vec_any_gt (vector bool char, vector unsigned char);
+int vec_any_gt (vector unsigned char, vector bool char);
+int vec_any_gt (vector unsigned char, vector unsigned char);
+int vec_any_gt (vector bool char, vector signed char);
+int vec_any_gt (vector signed char, vector bool char);
+int vec_any_gt (vector signed char, vector signed char);
+int vec_any_gt (vector bool short, vector unsigned short);
+int vec_any_gt (vector unsigned short, vector bool short);
+int vec_any_gt (vector unsigned short, vector unsigned short);
+int vec_any_gt (vector bool short, vector signed short);
+int vec_any_gt (vector signed short, vector bool short);
+int vec_any_gt (vector signed short, vector signed short);
+int vec_any_gt (vector bool int, vector unsigned int);
+int vec_any_gt (vector unsigned int, vector bool int);
+int vec_any_gt (vector unsigned int, vector unsigned int);
+int vec_any_gt (vector bool int, vector signed int);
+int vec_any_gt (vector signed int, vector bool int);
+int vec_any_gt (vector signed int, vector signed int);
+int vec_any_gt (vector float, vector float);
 
-vector unsigned int vec_msum (vector unsigned char,
-                              vector unsigned char,
-                              vector unsigned int);
-vector signed int vec_msum (vector signed char,
-                            vector unsigned char,
-                            vector signed int);
-vector unsigned int vec_msum (vector unsigned short,
-                              vector unsigned short,
-                              vector unsigned int);
-vector signed int vec_msum (vector signed short,
-                            vector signed short,
-                            vector signed int);
+int vec_any_le (vector bool char, vector unsigned char);
+int vec_any_le (vector unsigned char, vector bool char);
+int vec_any_le (vector unsigned char, vector unsigned char);
+int vec_any_le (vector bool char, vector signed char);
+int vec_any_le (vector signed char, vector bool char);
+int vec_any_le (vector signed char, vector signed char);
+int vec_any_le (vector bool short, vector unsigned short);
+int vec_any_le (vector unsigned short, vector bool short);
+int vec_any_le (vector unsigned short, vector unsigned short);
+int vec_any_le (vector bool short, vector signed short);
+int vec_any_le (vector signed short, vector bool short);
+int vec_any_le (vector signed short, vector signed short);
+int vec_any_le (vector bool int, vector unsigned int);
+int vec_any_le (vector unsigned int, vector bool int);
+int vec_any_le (vector unsigned int, vector unsigned int);
+int vec_any_le (vector bool int, vector signed int);
+int vec_any_le (vector signed int, vector bool int);
+int vec_any_le (vector signed int, vector signed int);
+int vec_any_le (vector float, vector float);
 
-vector signed int vec_vmsumshm (vector signed short,
-                                vector signed short,
-                                vector signed int);
+int vec_any_lt (vector bool char, vector unsigned char);
+int vec_any_lt (vector unsigned char, vector bool char);
+int vec_any_lt (vector unsigned char, vector unsigned char);
+int vec_any_lt (vector bool char, vector signed char);
+int vec_any_lt (vector signed char, vector bool char);
+int vec_any_lt (vector signed char, vector signed char);
+int vec_any_lt (vector bool short, vector unsigned short);
+int vec_any_lt (vector unsigned short, vector bool short);
+int vec_any_lt (vector unsigned short, vector unsigned short);
+int vec_any_lt (vector bool short, vector signed short);
+int vec_any_lt (vector signed short, vector bool short);
+int vec_any_lt (vector signed short, vector signed short);
+int vec_any_lt (vector bool int, vector unsigned int);
+int vec_any_lt (vector unsigned int, vector bool int);
+int vec_any_lt (vector unsigned int, vector unsigned int);
+int vec_any_lt (vector bool int, vector signed int);
+int vec_any_lt (vector signed int, vector bool int);
+int vec_any_lt (vector signed int, vector signed int);
+int vec_any_lt (vector float, vector float);
 
-vector unsigned int vec_vmsumuhm (vector unsigned short,
-                                  vector unsigned short,
-                                  vector unsigned int);
+int vec_any_nan (vector float);
 
-vector signed int vec_vmsummbm (vector signed char,
-                                vector unsigned char,
-                                vector signed int);
+int vec_any_ne (vector signed char, vector bool char);
+int vec_any_ne (vector signed char, vector signed char);
+int vec_any_ne (vector unsigned char, vector bool char);
+int vec_any_ne (vector unsigned char, vector unsigned char);
+int vec_any_ne (vector bool char, vector bool char);
+int vec_any_ne (vector bool char, vector unsigned char);
+int vec_any_ne (vector bool char, vector signed char);
+int vec_any_ne (vector signed short, vector bool short);
+int vec_any_ne (vector signed short, vector signed short);
+int vec_any_ne (vector unsigned short, vector bool short);
+int vec_any_ne (vector unsigned short, vector unsigned short);
+int vec_any_ne (vector bool short, vector bool short);
+int vec_any_ne (vector bool short, vector unsigned short);
+int vec_any_ne (vector bool short, vector signed short);
+int vec_any_ne (vector pixel, vector pixel);
+int vec_any_ne (vector signed int, vector bool int);
+int vec_any_ne (vector signed int, vector signed int);
+int vec_any_ne (vector unsigned int, vector bool int);
+int vec_any_ne (vector unsigned int, vector unsigned int);
+int vec_any_ne (vector bool int, vector bool int);
+int vec_any_ne (vector bool int, vector unsigned int);
+int vec_any_ne (vector bool int, vector signed int);
+int vec_any_ne (vector float, vector float);
 
-vector unsigned int vec_vmsumubm (vector unsigned char,
-                                  vector unsigned char,
-                                  vector unsigned int);
+int vec_any_nge (vector float, vector float);
 
-vector unsigned int vec_msums (vector unsigned short,
-                               vector unsigned short,
-                               vector unsigned int);
-vector signed int vec_msums (vector signed short,
-                             vector signed short,
-                             vector signed int);
+int vec_any_ngt (vector float, vector float);
 
-vector signed int vec_vmsumshs (vector signed short,
-                                vector signed short,
-                                vector signed int);
+int vec_any_nle (vector float, vector float);
 
-vector unsigned int vec_vmsumuhs (vector unsigned short,
-                                  vector unsigned short,
-                                  vector unsigned int);
+int vec_any_nlt (vector float, vector float);
 
-void vec_mtvscr (vector signed int);
-void vec_mtvscr (vector unsigned int);
-void vec_mtvscr (vector bool int);
-void vec_mtvscr (vector signed short);
-void vec_mtvscr (vector unsigned short);
-void vec_mtvscr (vector bool short);
-void vec_mtvscr (vector pixel);
-void vec_mtvscr (vector signed char);
-void vec_mtvscr (vector unsigned char);
-void vec_mtvscr (vector bool char);
+int vec_any_numeric (vector float);
 
-vector unsigned short vec_mule (vector unsigned char,
-                                vector unsigned char);
-vector signed short vec_mule (vector signed char,
-                              vector signed char);
-vector unsigned int vec_mule (vector unsigned short,
-                              vector unsigned short);
-vector signed int vec_mule (vector signed short, vector signed short);
+int vec_any_out (vector float, vector float);
+@end smallexample
 
-vector signed int vec_vmulesh (vector signed short,
-                               vector signed short);
+If the vector/scalar (VSX) instruction set is available, GCC also provides
+the following functions:
 
-vector unsigned int vec_vmuleuh (vector unsigned short,
-                                 vector unsigned short);
+@smallexample
+vector double vec_abs (vector double);
+vector double vec_add (vector double, vector double);
+vector double vec_and (vector double, vector double);
+vector double vec_and (vector double, vector bool long);
+vector double vec_and (vector bool long, vector double);
+vector long vec_and (vector long, vector long);
+vector long vec_and (vector long, vector bool long);
+vector long vec_and (vector bool long, vector long);
+vector unsigned long vec_and (vector unsigned long, vector unsigned long);
+vector unsigned long vec_and (vector unsigned long, vector bool long);
+vector unsigned long vec_and (vector bool long, vector unsigned long);
+vector double vec_andc (vector double, vector double);
+vector double vec_andc (vector double, vector bool long);
+vector double vec_andc (vector bool long, vector double);
+vector long vec_andc (vector long, vector long);
+vector long vec_andc (vector long, vector bool long);
+vector long vec_andc (vector bool long, vector long);
+vector unsigned long vec_andc (vector unsigned long, vector unsigned long);
+vector unsigned long vec_andc (vector unsigned long, vector bool long);
+vector unsigned long vec_andc (vector bool long, vector unsigned long);
+vector double vec_ceil (vector double);
+vector bool long vec_cmpeq (vector double, vector double);
+vector bool long vec_cmpge (vector double, vector double);
+vector bool long vec_cmpgt (vector double, vector double);
+vector bool long vec_cmple (vector double, vector double);
+vector bool long vec_cmplt (vector double, vector double);
+vector double vec_cpsgn (vector double, vector double);
+vector float vec_div (vector float, vector float);
+vector double vec_div (vector double, vector double);
+vector long vec_div (vector long, vector long);
+vector unsigned long vec_div (vector unsigned long, vector unsigned long);
+vector double vec_floor (vector double);
+vector double vec_ld (int, const vector double *);
+vector double vec_ld (int, const double *);
+vector double vec_ldl (int, const vector double *);
+vector double vec_ldl (int, const double *);
+vector unsigned char vec_lvsl (int, const volatile double *);
+vector unsigned char vec_lvsr (int, const volatile double *);
+vector double vec_madd (vector double, vector double, vector double);
+vector double vec_max (vector double, vector double);
+vector signed long vec_mergeh (vector signed long, vector signed long);
+vector signed long vec_mergeh (vector signed long, vector bool long);
+vector signed long vec_mergeh (vector bool long, vector signed long);
+vector unsigned long vec_mergeh (vector unsigned long, vector unsigned long);
+vector unsigned long vec_mergeh (vector unsigned long, vector bool long);
+vector unsigned long vec_mergeh (vector bool long, vector unsigned long);
+vector signed long vec_mergel (vector signed long, vector signed long);
+vector signed long vec_mergel (vector signed long, vector bool long);
+vector signed long vec_mergel (vector bool long, vector signed long);
+vector unsigned long vec_mergel (vector unsigned long, vector unsigned long);
+vector unsigned long vec_mergel (vector unsigned long, vector bool long);
+vector unsigned long vec_mergel (vector bool long, vector unsigned long);
+vector double vec_min (vector double, vector double);
+vector float vec_msub (vector float, vector float, vector float);
+vector double vec_msub (vector double, vector double, vector double);
+vector float vec_mul (vector float, vector float);
+vector double vec_mul (vector double, vector double);
+vector long vec_mul (vector long, vector long);
+vector unsigned long vec_mul (vector unsigned long, vector unsigned long);
+vector float vec_nearbyint (vector float);
+vector double vec_nearbyint (vector double);
+vector float vec_nmadd (vector float, vector float, vector float);
+vector double vec_nmadd (vector double, vector double, vector double);
+vector double vec_nmsub (vector double, vector double, vector double);
+vector double vec_nor (vector double, vector double);
+vector long vec_nor (vector long, vector long);
+vector long vec_nor (vector long, vector bool long);
+vector long vec_nor (vector bool long, vector long);
+vector unsigned long vec_nor (vector unsigned long, vector unsigned long);
+vector unsigned long vec_nor (vector unsigned long, vector bool long);
+vector unsigned long vec_nor (vector bool long, vector unsigned long);
+vector double vec_or (vector double, vector double);
+vector double vec_or (vector double, vector bool long);
+vector double vec_or (vector bool long, vector double);
+vector long vec_or (vector long, vector long);
+vector long vec_or (vector long, vector bool long);
+vector long vec_or (vector bool long, vector long);
+vector unsigned long vec_or (vector unsigned long, vector unsigned long);
+vector unsigned long vec_or (vector unsigned long, vector bool long);
+vector unsigned long vec_or (vector bool long, vector unsigned long);
+vector double vec_perm (vector double, vector double, vector unsigned char);
+vector long vec_perm (vector long, vector long, vector unsigned char);
+vector unsigned long vec_perm (vector unsigned long, vector unsigned long,
+                               vector unsigned char);
+vector double vec_rint (vector double);
+vector double vec_recip (vector double, vector double);
+vector double vec_rsqrt (vector double);
+vector double vec_rsqrte (vector double);
+vector double vec_sel (vector double, vector double, vector bool long);
+vector double vec_sel (vector double, vector double, vector unsigned long);
+vector long vec_sel (vector long, vector long, vector long);
+vector long vec_sel (vector long, vector long, vector unsigned long);
+vector long vec_sel (vector long, vector long, vector bool long);
+vector unsigned long vec_sel (vector unsigned long, vector unsigned long,
+                              vector long);
+vector unsigned long vec_sel (vector unsigned long, vector unsigned long,
+                              vector unsigned long);
+vector unsigned long vec_sel (vector unsigned long, vector unsigned long,
+                              vector bool long);
+vector double vec_splats (double);
+vector signed long vec_splats (signed long);
+vector unsigned long vec_splats (unsigned long);
+vector float vec_sqrt (vector float);
+vector double vec_sqrt (vector double);
+void vec_st (vector double, int, vector double *);
+void vec_st (vector double, int, double *);
+vector double vec_sub (vector double, vector double);
+vector double vec_trunc (vector double);
+vector double vec_xor (vector double, vector double);
+vector double vec_xor (vector double, vector bool long);
+vector double vec_xor (vector bool long, vector double);
+vector long vec_xor (vector long, vector long);
+vector long vec_xor (vector long, vector bool long);
+vector long vec_xor (vector bool long, vector long);
+vector unsigned long vec_xor (vector unsigned long, vector unsigned long);
+vector unsigned long vec_xor (vector unsigned long, vector bool long);
+vector unsigned long vec_xor (vector bool long, vector unsigned long);
+int vec_all_eq (vector double, vector double);
+int vec_all_ge (vector double, vector double);
+int vec_all_gt (vector double, vector double);
+int vec_all_le (vector double, vector double);
+int vec_all_lt (vector double, vector double);
+int vec_all_nan (vector double);
+int vec_all_ne (vector double, vector double);
+int vec_all_nge (vector double, vector double);
+int vec_all_ngt (vector double, vector double);
+int vec_all_nle (vector double, vector double);
+int vec_all_nlt (vector double, vector double);
+int vec_all_numeric (vector double);
+int vec_any_eq (vector double, vector double);
+int vec_any_ge (vector double, vector double);
+int vec_any_gt (vector double, vector double);
+int vec_any_le (vector double, vector double);
+int vec_any_lt (vector double, vector double);
+int vec_any_nan (vector double);
+int vec_any_ne (vector double, vector double);
+int vec_any_nge (vector double, vector double);
+int vec_any_ngt (vector double, vector double);
+int vec_any_nle (vector double, vector double);
+int vec_any_nlt (vector double, vector double);
+int vec_any_numeric (vector double);
 
-vector signed short vec_vmulesb (vector signed char,
-                                 vector signed char);
+vector double vec_vsx_ld (int, const vector double *);
+vector double vec_vsx_ld (int, const double *);
+vector float vec_vsx_ld (int, const vector float *);
+vector float vec_vsx_ld (int, const float *);
+vector bool int vec_vsx_ld (int, const vector bool int *);
+vector signed int vec_vsx_ld (int, const vector signed int *);
+vector signed int vec_vsx_ld (int, const int *);
+vector signed int vec_vsx_ld (int, const long *);
+vector unsigned int vec_vsx_ld (int, const vector unsigned int *);
+vector unsigned int vec_vsx_ld (int, const unsigned int *);
+vector unsigned int vec_vsx_ld (int, const unsigned long *);
+vector bool short vec_vsx_ld (int, const vector bool short *);
+vector pixel vec_vsx_ld (int, const vector pixel *);
+vector signed short vec_vsx_ld (int, const vector signed short *);
+vector signed short vec_vsx_ld (int, const short *);
+vector unsigned short vec_vsx_ld (int, const vector unsigned short *);
+vector unsigned short vec_vsx_ld (int, const unsigned short *);
+vector bool char vec_vsx_ld (int, const vector bool char *);
+vector signed char vec_vsx_ld (int, const vector signed char *);
+vector signed char vec_vsx_ld (int, const signed char *);
+vector unsigned char vec_vsx_ld (int, const vector unsigned char *);
+vector unsigned char vec_vsx_ld (int, const unsigned char *);
 
-vector unsigned short vec_vmuleub (vector unsigned char,
-                                  vector unsigned char);
+void vec_vsx_st (vector double, int, vector double *);
+void vec_vsx_st (vector double, int, double *);
+void vec_vsx_st (vector float, int, vector float *);
+void vec_vsx_st (vector float, int, float *);
+void vec_vsx_st (vector signed int, int, vector signed int *);
+void vec_vsx_st (vector signed int, int, int *);
+void vec_vsx_st (vector unsigned int, int, vector unsigned int *);
+void vec_vsx_st (vector unsigned int, int, unsigned int *);
+void vec_vsx_st (vector bool int, int, vector bool int *);
+void vec_vsx_st (vector bool int, int, unsigned int *);
+void vec_vsx_st (vector bool int, int, int *);
+void vec_vsx_st (vector signed short, int, vector signed short *);
+void vec_vsx_st (vector signed short, int, short *);
+void vec_vsx_st (vector unsigned short, int, vector unsigned short *);
+void vec_vsx_st (vector unsigned short, int, unsigned short *);
+void vec_vsx_st (vector bool short, int, vector bool short *);
+void vec_vsx_st (vector bool short, int, unsigned short *);
+void vec_vsx_st (vector pixel, int, vector pixel *);
+void vec_vsx_st (vector pixel, int, unsigned short *);
+void vec_vsx_st (vector pixel, int, short *);
+void vec_vsx_st (vector bool short, int, short *);
+void vec_vsx_st (vector signed char, int, vector signed char *);
+void vec_vsx_st (vector signed char, int, signed char *);
+void vec_vsx_st (vector unsigned char, int, vector unsigned char *);
+void vec_vsx_st (vector unsigned char, int, unsigned char *);
+void vec_vsx_st (vector bool char, int, vector bool char *);
+void vec_vsx_st (vector bool char, int, unsigned char *);
+void vec_vsx_st (vector bool char, int, signed char *);
 
-vector unsigned short vec_mulo (vector unsigned char,
-                                vector unsigned char);
-vector signed short vec_mulo (vector signed char, vector signed char);
-vector unsigned int vec_mulo (vector unsigned short,
-                              vector unsigned short);
-vector signed int vec_mulo (vector signed short, vector signed short);
+vector double vec_xxpermdi (vector double, vector double, int);
+vector float vec_xxpermdi (vector float, vector float, int);
+vector long long vec_xxpermdi (vector long long, vector long long, int);
+vector unsigned long long vec_xxpermdi (vector unsigned long long,
+                                        vector unsigned long long, int);
+vector int vec_xxpermdi (vector int, vector int, int);
+vector unsigned int vec_xxpermdi (vector unsigned int,
+                                  vector unsigned int, int);
+vector short vec_xxpermdi (vector short, vector short, int);
+vector unsigned short vec_xxpermdi (vector unsigned short,
+                                    vector unsigned short, int);
+vector signed char vec_xxpermdi (vector signed char, vector signed char, int);
+vector unsigned char vec_xxpermdi (vector unsigned char,
+                                   vector unsigned char, int);
 
-vector signed int vec_vmulosh (vector signed short,
-                               vector signed short);
+vector double vec_xxsldi (vector double, vector double, int);
+vector float vec_xxsldi (vector float, vector float, int);
+vector long long vec_xxsldi (vector long long, vector long long, int);
+vector unsigned long long vec_xxsldi (vector unsigned long long,
+                                      vector unsigned long long, int);
+vector int vec_xxsldi (vector int, vector int, int);
+vector unsigned int vec_xxsldi (vector unsigned int, vector unsigned int, int);
+vector short vec_xxsldi (vector short, vector short, int);
+vector unsigned short vec_xxsldi (vector unsigned short,
+                                  vector unsigned short, int);
+vector signed char vec_xxsldi (vector signed char, vector signed char, int);
+vector unsigned char vec_xxsldi (vector unsigned char,
+                                 vector unsigned char, int);
+@end smallexample
 
-vector unsigned int vec_vmulouh (vector unsigned short,
-                                 vector unsigned short);
+Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
+generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
+if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
+@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
+@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
 
-vector signed short vec_vmulosb (vector signed char,
-                                 vector signed char);
+If the ISA 2.07 additions to the vector/scalar (power8-vector)
+instruction set are available, the following additional functions are
+available for both 32-bit and 64-bit targets.  For 64-bit targets, you
+can use @var{vector long} instead of @var{vector long long},
+@var{vector bool long} instead of @var{vector bool long long}, and
+@var{vector unsigned long} instead of @var{vector unsigned long long}.
 
-vector unsigned short vec_vmuloub (vector unsigned char,
-                                   vector unsigned char);
+@smallexample
+vector long long vec_abs (vector long long);
 
-vector float vec_nmsub (vector float, vector float, vector float);
+vector long long vec_add (vector long long, vector long long);
+vector unsigned long long vec_add (vector unsigned long long,
+                                   vector unsigned long long);
 
-vector float vec_nor (vector float, vector float);
-vector signed int vec_nor (vector signed int, vector signed int);
-vector unsigned int vec_nor (vector unsigned int, vector unsigned int);
-vector bool int vec_nor (vector bool int, vector bool int);
-vector signed short vec_nor (vector signed short, vector signed short);
-vector unsigned short vec_nor (vector unsigned short,
-                               vector unsigned short);
-vector bool short vec_nor (vector bool short, vector bool short);
-vector signed char vec_nor (vector signed char, vector signed char);
-vector unsigned char vec_nor (vector unsigned char,
-                              vector unsigned char);
-vector bool char vec_nor (vector bool char, vector bool char);
+int vec_all_eq (vector long long, vector long long);
+int vec_all_eq (vector unsigned long long, vector unsigned long long);
+int vec_all_ge (vector long long, vector long long);
+int vec_all_ge (vector unsigned long long, vector unsigned long long);
+int vec_all_gt (vector long long, vector long long);
+int vec_all_gt (vector unsigned long long, vector unsigned long long);
+int vec_all_le (vector long long, vector long long);
+int vec_all_le (vector unsigned long long, vector unsigned long long);
+int vec_all_lt (vector long long, vector long long);
+int vec_all_lt (vector unsigned long long, vector unsigned long long);
+int vec_all_ne (vector long long, vector long long);
+int vec_all_ne (vector unsigned long long, vector unsigned long long);
 
-vector float vec_or (vector float, vector float);
-vector float vec_or (vector float, vector bool int);
-vector float vec_or (vector bool int, vector float);
-vector bool int vec_or (vector bool int, vector bool int);
-vector signed int vec_or (vector bool int, vector signed int);
-vector signed int vec_or (vector signed int, vector bool int);
-vector signed int vec_or (vector signed int, vector signed int);
-vector unsigned int vec_or (vector bool int, vector unsigned int);
-vector unsigned int vec_or (vector unsigned int, vector bool int);
-vector unsigned int vec_or (vector unsigned int, vector unsigned int);
-vector bool short vec_or (vector bool short, vector bool short);
-vector signed short vec_or (vector bool short, vector signed short);
-vector signed short vec_or (vector signed short, vector bool short);
-vector signed short vec_or (vector signed short, vector signed short);
-vector unsigned short vec_or (vector bool short, vector unsigned short);
-vector unsigned short vec_or (vector unsigned short, vector bool short);
-vector unsigned short vec_or (vector unsigned short,
-                              vector unsigned short);
-vector signed char vec_or (vector bool char, vector signed char);
-vector bool char vec_or (vector bool char, vector bool char);
-vector signed char vec_or (vector signed char, vector bool char);
-vector signed char vec_or (vector signed char, vector signed char);
-vector unsigned char vec_or (vector bool char, vector unsigned char);
-vector unsigned char vec_or (vector unsigned char, vector bool char);
-vector unsigned char vec_or (vector unsigned char,
-                             vector unsigned char);
+int vec_any_eq (vector long long, vector long long);
+int vec_any_eq (vector unsigned long long, vector unsigned long long);
+int vec_any_ge (vector long long, vector long long);
+int vec_any_ge (vector unsigned long long, vector unsigned long long);
+int vec_any_gt (vector long long, vector long long);
+int vec_any_gt (vector unsigned long long, vector unsigned long long);
+int vec_any_le (vector long long, vector long long);
+int vec_any_le (vector unsigned long long, vector unsigned long long);
+int vec_any_lt (vector long long, vector long long);
+int vec_any_lt (vector unsigned long long, vector unsigned long long);
+int vec_any_ne (vector long long, vector long long);
+int vec_any_ne (vector unsigned long long, vector unsigned long long);
 
-vector signed char vec_pack (vector signed short, vector signed short);
-vector unsigned char vec_pack (vector unsigned short,
+vector long long vec_eqv (vector long long, vector long long);
+vector long long vec_eqv (vector bool long long, vector long long);
+vector long long vec_eqv (vector long long, vector bool long long);
+vector unsigned long long vec_eqv (vector unsigned long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_eqv (vector bool long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_eqv (vector unsigned long long,
+                                   vector bool long long);
+vector int vec_eqv (vector int, vector int);
+vector int vec_eqv (vector bool int, vector int);
+vector int vec_eqv (vector int, vector bool int);
+vector unsigned int vec_eqv (vector unsigned int, vector unsigned int);
+vector unsigned int vec_eqv (vector bool unsigned int,
+                             vector unsigned int);
+vector unsigned int vec_eqv (vector unsigned int,
+                             vector bool unsigned int);
+vector short vec_eqv (vector short, vector short);
+vector short vec_eqv (vector bool short, vector short);
+vector short vec_eqv (vector short, vector bool short);
+vector unsigned short vec_eqv (vector unsigned short, vector unsigned short);
+vector unsigned short vec_eqv (vector bool unsigned short,
                                vector unsigned short);
-vector bool char vec_pack (vector bool short, vector bool short);
-vector signed short vec_pack (vector signed int, vector signed int);
-vector unsigned short vec_pack (vector unsigned int,
-                                vector unsigned int);
-vector bool short vec_pack (vector bool int, vector bool int);
-
-vector bool short vec_vpkuwum (vector bool int, vector bool int);
-vector signed short vec_vpkuwum (vector signed int, vector signed int);
-vector unsigned short vec_vpkuwum (vector unsigned int,
-                                   vector unsigned int);
-
-vector bool char vec_vpkuhum (vector bool short, vector bool short);
-vector signed char vec_vpkuhum (vector signed short,
-                                vector signed short);
-vector unsigned char vec_vpkuhum (vector unsigned short,
-                                  vector unsigned short);
+vector unsigned short vec_eqv (vector unsigned short,
+                               vector bool unsigned short);
+vector signed char vec_eqv (vector signed char, vector signed char);
+vector signed char vec_eqv (vector bool signed char, vector signed char);
+vector signed char vec_eqv (vector signed char, vector bool signed char);
+vector unsigned char vec_eqv (vector unsigned char, vector unsigned char);
+vector unsigned char vec_eqv (vector bool unsigned char, vector unsigned char);
+vector unsigned char vec_eqv (vector unsigned char, vector bool unsigned char);
 
-vector pixel vec_packpx (vector unsigned int, vector unsigned int);
+vector long long vec_max (vector long long, vector long long);
+vector unsigned long long vec_max (vector unsigned long long,
+                                   vector unsigned long long);
 
-vector unsigned char vec_packs (vector unsigned short,
-                                vector unsigned short);
-vector signed char vec_packs (vector signed short, vector signed short);
-vector unsigned short vec_packs (vector unsigned int,
-                                 vector unsigned int);
-vector signed short vec_packs (vector signed int, vector signed int);
+vector signed int vec_mergee (vector signed int, vector signed int);
+vector unsigned int vec_mergee (vector unsigned int, vector unsigned int);
+vector bool int vec_mergee (vector bool int, vector bool int);
 
-vector signed short vec_vpkswss (vector signed int, vector signed int);
+vector signed int vec_mergeo (vector signed int, vector signed int);
+vector unsigned int vec_mergeo (vector unsigned int, vector unsigned int);
+vector bool int vec_mergeo (vector bool int, vector bool int);
 
-vector unsigned short vec_vpkuwus (vector unsigned int,
-                                   vector unsigned int);
+vector long long vec_min (vector long long, vector long long);
+vector unsigned long long vec_min (vector unsigned long long,
+                                   vector unsigned long long);
 
-vector signed char vec_vpkshss (vector signed short,
-                                vector signed short);
+vector long long vec_nand (vector long long, vector long long);
+vector long long vec_nand (vector bool long long, vector long long);
+vector long long vec_nand (vector long long, vector bool long long);
+vector unsigned long long vec_nand (vector unsigned long long,
+                                    vector unsigned long long);
+vector unsigned long long vec_nand (vector bool long long,
+                                    vector unsigned long long);
+vector unsigned long long vec_nand (vector unsigned long long,
+                                    vector bool long long);
+vector int vec_nand (vector int, vector int);
+vector int vec_nand (vector bool int, vector int);
+vector int vec_nand (vector int, vector bool int);
+vector unsigned int vec_nand (vector unsigned int, vector unsigned int);
+vector unsigned int vec_nand (vector bool unsigned int,
+                              vector unsigned int);
+vector unsigned int vec_nand (vector unsigned int,
+                              vector bool unsigned int);
+vector short vec_nand (vector short, vector short);
+vector short vec_nand (vector bool short, vector short);
+vector short vec_nand (vector short, vector bool short);
+vector unsigned short vec_nand (vector unsigned short, vector unsigned short);
+vector unsigned short vec_nand (vector bool unsigned short,
+                                vector unsigned short);
+vector unsigned short vec_nand (vector unsigned short,
+                                vector bool unsigned short);
+vector signed char vec_nand (vector signed char, vector signed char);
+vector signed char vec_nand (vector bool signed char, vector signed char);
+vector signed char vec_nand (vector signed char, vector bool signed char);
+vector unsigned char vec_nand (vector unsigned char, vector unsigned char);
+vector unsigned char vec_nand (vector bool unsigned char, vector unsigned char);
+vector unsigned char vec_nand (vector unsigned char, vector bool unsigned char);
 
-vector unsigned char vec_vpkuhus (vector unsigned short,
-                                  vector unsigned short);
+vector long long vec_orc (vector long long, vector long long);
+vector long long vec_orc (vector bool long long, vector long long);
+vector long long vec_orc (vector long long, vector bool long long);
+vector unsigned long long vec_orc (vector unsigned long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_orc (vector bool long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_orc (vector unsigned long long,
+                                   vector bool long long);
+vector int vec_orc (vector int, vector int);
+vector int vec_orc (vector bool int, vector int);
+vector int vec_orc (vector int, vector bool int);
+vector unsigned int vec_orc (vector unsigned int, vector unsigned int);
+vector unsigned int vec_orc (vector bool unsigned int,
+                             vector unsigned int);
+vector unsigned int vec_orc (vector unsigned int,
+                             vector bool unsigned int);
+vector short vec_orc (vector short, vector short);
+vector short vec_orc (vector bool short, vector short);
+vector short vec_orc (vector short, vector bool short);
+vector unsigned short vec_orc (vector unsigned short, vector unsigned short);
+vector unsigned short vec_orc (vector bool unsigned short,
+                               vector unsigned short);
+vector unsigned short vec_orc (vector unsigned short,
+                               vector bool unsigned short);
+vector signed char vec_orc (vector signed char, vector signed char);
+vector signed char vec_orc (vector bool signed char, vector signed char);
+vector signed char vec_orc (vector signed char, vector bool signed char);
+vector unsigned char vec_orc (vector unsigned char, vector unsigned char);
+vector unsigned char vec_orc (vector bool unsigned char, vector unsigned char);
+vector unsigned char vec_orc (vector unsigned char, vector bool unsigned char);
 
-vector unsigned char vec_packsu (vector unsigned short,
-                                 vector unsigned short);
-vector unsigned char vec_packsu (vector signed short,
-                                 vector signed short);
-vector unsigned short vec_packsu (vector unsigned int,
-                                  vector unsigned int);
-vector unsigned short vec_packsu (vector signed int, vector signed int);
+vector int vec_pack (vector long long, vector long long);
+vector unsigned int vec_pack (vector unsigned long long,
+                              vector unsigned long long);
+vector bool int vec_pack (vector bool long long, vector bool long long);
 
-vector unsigned short vec_vpkswus (vector signed int,
-                                   vector signed int);
+vector int vec_packs (vector long long, vector long long);
+vector unsigned int vec_packs (vector unsigned long long,
+                               vector unsigned long long);
 
-vector unsigned char vec_vpkshus (vector signed short,
-                                  vector signed short);
+vector unsigned int vec_packsu (vector long long, vector long long);
+vector unsigned int vec_packsu (vector unsigned long long,
+                                vector unsigned long long);
 
-vector float vec_perm (vector float,
-                       vector float,
-                       vector unsigned char);
-vector signed int vec_perm (vector signed int,
-                            vector signed int,
-                            vector unsigned char);
-vector unsigned int vec_perm (vector unsigned int,
-                              vector unsigned int,
-                              vector unsigned char);
-vector bool int vec_perm (vector bool int,
-                          vector bool int,
-                          vector unsigned char);
-vector signed short vec_perm (vector signed short,
-                              vector signed short,
-                              vector unsigned char);
-vector unsigned short vec_perm (vector unsigned short,
-                                vector unsigned short,
-                                vector unsigned char);
-vector bool short vec_perm (vector bool short,
-                            vector bool short,
-                            vector unsigned char);
-vector pixel vec_perm (vector pixel,
-                       vector pixel,
-                       vector unsigned char);
-vector signed char vec_perm (vector signed char,
-                             vector signed char,
-                             vector unsigned char);
-vector unsigned char vec_perm (vector unsigned char,
-                               vector unsigned char,
-                               vector unsigned char);
-vector bool char vec_perm (vector bool char,
-                           vector bool char,
-                           vector unsigned char);
+vector long long vec_rl (vector long long,
+                         vector unsigned long long);
+vector unsigned long long vec_rl (vector unsigned long long,
+                                  vector unsigned long long);
 
-vector float vec_re (vector float);
+vector long long vec_sl (vector long long, vector unsigned long long);
+vector unsigned long long vec_sl (vector unsigned long long,
+                                  vector unsigned long long);
 
-vector signed char vec_rl (vector signed char,
-                           vector unsigned char);
-vector unsigned char vec_rl (vector unsigned char,
-                             vector unsigned char);
-vector signed short vec_rl (vector signed short, vector unsigned short);
-vector unsigned short vec_rl (vector unsigned short,
-                              vector unsigned short);
-vector signed int vec_rl (vector signed int, vector unsigned int);
-vector unsigned int vec_rl (vector unsigned int, vector unsigned int);
+vector long long vec_sr (vector long long, vector unsigned long long);
+vector unsigned long long vec_sr (vector unsigned long long,
+                                  vector unsigned long long);
 
-vector signed int vec_vrlw (vector signed int, vector unsigned int);
-vector unsigned int vec_vrlw (vector unsigned int, vector unsigned int);
+vector long long vec_sra (vector long long, vector unsigned long long);
+vector unsigned long long vec_sra (vector unsigned long long,
+                                   vector unsigned long long);
 
-vector signed short vec_vrlh (vector signed short,
-                              vector unsigned short);
-vector unsigned short vec_vrlh (vector unsigned short,
-                                vector unsigned short);
+vector long long vec_sub (vector long long, vector long long);
+vector unsigned long long vec_sub (vector unsigned long long,
+                                   vector unsigned long long);
 
-vector signed char vec_vrlb (vector signed char, vector unsigned char);
-vector unsigned char vec_vrlb (vector unsigned char,
-                               vector unsigned char);
+vector long long vec_unpackh (vector int);
+vector unsigned long long vec_unpackh (vector unsigned int);
 
-vector float vec_round (vector float);
+vector long long vec_unpackl (vector int);
+vector unsigned long long vec_unpackl (vector unsigned int);
 
-vector float vec_recip (vector float, vector float);
+vector long long vec_vaddudm (vector long long, vector long long);
+vector long long vec_vaddudm (vector bool long long, vector long long);
+vector long long vec_vaddudm (vector long long, vector bool long long);
+vector unsigned long long vec_vaddudm (vector unsigned long long,
+                                       vector unsigned long long);
+vector unsigned long long vec_vaddudm (vector bool unsigned long long,
+                                       vector unsigned long long);
+vector unsigned long long vec_vaddudm (vector unsigned long long,
+                                       vector bool unsigned long long);
 
-vector float vec_rsqrt (vector float);
+vector long long vec_vbpermq (vector signed char, vector signed char);
+vector long long vec_vbpermq (vector unsigned char, vector unsigned char);
 
-vector float vec_rsqrte (vector float);
+vector long long vec_cntlz (vector long long);
+vector unsigned long long vec_cntlz (vector unsigned long long);
+vector int vec_cntlz (vector int);
+vector unsigned int vec_cntlz (vector unsigned int);
+vector short vec_cntlz (vector short);
+vector unsigned short vec_cntlz (vector unsigned short);
+vector signed char vec_cntlz (vector signed char);
+vector unsigned char vec_cntlz (vector unsigned char);
 
-vector float vec_sel (vector float, vector float, vector bool int);
-vector float vec_sel (vector float, vector float, vector unsigned int);
-vector signed int vec_sel (vector signed int,
-                           vector signed int,
-                           vector bool int);
-vector signed int vec_sel (vector signed int,
-                           vector signed int,
-                           vector unsigned int);
-vector unsigned int vec_sel (vector unsigned int,
-                             vector unsigned int,
-                             vector bool int);
-vector unsigned int vec_sel (vector unsigned int,
-                             vector unsigned int,
-                             vector unsigned int);
-vector bool int vec_sel (vector bool int,
-                         vector bool int,
-                         vector bool int);
-vector bool int vec_sel (vector bool int,
-                         vector bool int,
-                         vector unsigned int);
-vector signed short vec_sel (vector signed short,
-                             vector signed short,
-                             vector bool short);
-vector signed short vec_sel (vector signed short,
-                             vector signed short,
-                             vector unsigned short);
-vector unsigned short vec_sel (vector unsigned short,
-                               vector unsigned short,
-                               vector bool short);
-vector unsigned short vec_sel (vector unsigned short,
-                               vector unsigned short,
-                               vector unsigned short);
-vector bool short vec_sel (vector bool short,
-                           vector bool short,
-                           vector bool short);
-vector bool short vec_sel (vector bool short,
-                           vector bool short,
-                           vector unsigned short);
-vector signed char vec_sel (vector signed char,
-                            vector signed char,
-                            vector bool char);
-vector signed char vec_sel (vector signed char,
-                            vector signed char,
-                            vector unsigned char);
-vector unsigned char vec_sel (vector unsigned char,
-                              vector unsigned char,
-                              vector bool char);
-vector unsigned char vec_sel (vector unsigned char,
-                              vector unsigned char,
-                              vector unsigned char);
-vector bool char vec_sel (vector bool char,
-                          vector bool char,
-                          vector bool char);
-vector bool char vec_sel (vector bool char,
-                          vector bool char,
-                          vector unsigned char);
+vector long long vec_vclz (vector long long);
+vector unsigned long long vec_vclz (vector unsigned long long);
+vector int vec_vclz (vector int);
+vector unsigned int vec_vclz (vector unsigned int);
+vector short vec_vclz (vector short);
+vector unsigned short vec_vclz (vector unsigned short);
+vector signed char vec_vclz (vector signed char);
+vector unsigned char vec_vclz (vector unsigned char);
 
-vector signed char vec_sl (vector signed char,
-                           vector unsigned char);
-vector unsigned char vec_sl (vector unsigned char,
-                             vector unsigned char);
-vector signed short vec_sl (vector signed short, vector unsigned short);
-vector unsigned short vec_sl (vector unsigned short,
-                              vector unsigned short);
-vector signed int vec_sl (vector signed int, vector unsigned int);
-vector unsigned int vec_sl (vector unsigned int, vector unsigned int);
+vector signed char vec_vclzb (vector signed char);
+vector unsigned char vec_vclzb (vector unsigned char);
 
-vector signed int vec_vslw (vector signed int, vector unsigned int);
-vector unsigned int vec_vslw (vector unsigned int, vector unsigned int);
+vector long long vec_vclzd (vector long long);
+vector unsigned long long vec_vclzd (vector unsigned long long);
 
-vector signed short vec_vslh (vector signed short,
-                              vector unsigned short);
-vector unsigned short vec_vslh (vector unsigned short,
-                                vector unsigned short);
+vector short vec_vclzh (vector short);
+vector unsigned short vec_vclzh (vector unsigned short);
 
-vector signed char vec_vslb (vector signed char, vector unsigned char);
-vector unsigned char vec_vslb (vector unsigned char,
-                               vector unsigned char);
+vector int vec_vclzw (vector int);
+vector unsigned int vec_vclzw (vector unsigned int);
 
-vector float vec_sld (vector float, vector float, const int);
-vector signed int vec_sld (vector signed int,
-                           vector signed int,
-                           const int);
-vector unsigned int vec_sld (vector unsigned int,
-                             vector unsigned int,
-                             const int);
-vector bool int vec_sld (vector bool int,
-                         vector bool int,
-                         const int);
-vector signed short vec_sld (vector signed short,
-                             vector signed short,
-                             const int);
-vector unsigned short vec_sld (vector unsigned short,
-                               vector unsigned short,
-                               const int);
-vector bool short vec_sld (vector bool short,
-                           vector bool short,
-                           const int);
-vector pixel vec_sld (vector pixel,
-                      vector pixel,
-                      const int);
-vector signed char vec_sld (vector signed char,
-                            vector signed char,
-                            const int);
-vector unsigned char vec_sld (vector unsigned char,
-                              vector unsigned char,
-                              const int);
-vector bool char vec_sld (vector bool char,
-                          vector bool char,
-                          const int);
+vector signed char vec_vgbbd (vector signed char);
+vector unsigned char vec_vgbbd (vector unsigned char);
 
-vector signed int vec_sll (vector signed int,
-                           vector unsigned int);
-vector signed int vec_sll (vector signed int,
-                           vector unsigned short);
-vector signed int vec_sll (vector signed int,
-                           vector unsigned char);
-vector unsigned int vec_sll (vector unsigned int,
-                             vector unsigned int);
-vector unsigned int vec_sll (vector unsigned int,
-                             vector unsigned short);
-vector unsigned int vec_sll (vector unsigned int,
-                             vector unsigned char);
-vector bool int vec_sll (vector bool int,
-                         vector unsigned int);
-vector bool int vec_sll (vector bool int,
-                         vector unsigned short);
-vector bool int vec_sll (vector bool int,
-                         vector unsigned char);
-vector signed short vec_sll (vector signed short,
-                             vector unsigned int);
-vector signed short vec_sll (vector signed short,
-                             vector unsigned short);
-vector signed short vec_sll (vector signed short,
-                             vector unsigned char);
-vector unsigned short vec_sll (vector unsigned short,
-                               vector unsigned int);
-vector unsigned short vec_sll (vector unsigned short,
-                               vector unsigned short);
-vector unsigned short vec_sll (vector unsigned short,
-                               vector unsigned char);
-vector bool short vec_sll (vector bool short, vector unsigned int);
-vector bool short vec_sll (vector bool short, vector unsigned short);
-vector bool short vec_sll (vector bool short, vector unsigned char);
-vector pixel vec_sll (vector pixel, vector unsigned int);
-vector pixel vec_sll (vector pixel, vector unsigned short);
-vector pixel vec_sll (vector pixel, vector unsigned char);
-vector signed char vec_sll (vector signed char, vector unsigned int);
-vector signed char vec_sll (vector signed char, vector unsigned short);
-vector signed char vec_sll (vector signed char, vector unsigned char);
-vector unsigned char vec_sll (vector unsigned char,
-                              vector unsigned int);
-vector unsigned char vec_sll (vector unsigned char,
-                              vector unsigned short);
-vector unsigned char vec_sll (vector unsigned char,
-                              vector unsigned char);
-vector bool char vec_sll (vector bool char, vector unsigned int);
-vector bool char vec_sll (vector bool char, vector unsigned short);
-vector bool char vec_sll (vector bool char, vector unsigned char);
+vector long long vec_vmaxsd (vector long long, vector long long);
 
-vector float vec_slo (vector float, vector signed char);
-vector float vec_slo (vector float, vector unsigned char);
-vector signed int vec_slo (vector signed int, vector signed char);
-vector signed int vec_slo (vector signed int, vector unsigned char);
-vector unsigned int vec_slo (vector unsigned int, vector signed char);
-vector unsigned int vec_slo (vector unsigned int, vector unsigned char);
-vector signed short vec_slo (vector signed short, vector signed char);
-vector signed short vec_slo (vector signed short, vector unsigned char);
-vector unsigned short vec_slo (vector unsigned short,
-                               vector signed char);
-vector unsigned short vec_slo (vector unsigned short,
-                               vector unsigned char);
-vector pixel vec_slo (vector pixel, vector signed char);
-vector pixel vec_slo (vector pixel, vector unsigned char);
-vector signed char vec_slo (vector signed char, vector signed char);
-vector signed char vec_slo (vector signed char, vector unsigned char);
-vector unsigned char vec_slo (vector unsigned char, vector signed char);
-vector unsigned char vec_slo (vector unsigned char,
-                              vector unsigned char);
+vector unsigned long long vec_vmaxud (vector unsigned long long,
+                                      vector unsigned long long);
 
-vector signed char vec_splat (vector signed char, const int);
-vector unsigned char vec_splat (vector unsigned char, const int);
-vector bool char vec_splat (vector bool char, const int);
-vector signed short vec_splat (vector signed short, const int);
-vector unsigned short vec_splat (vector unsigned short, const int);
-vector bool short vec_splat (vector bool short, const int);
-vector pixel vec_splat (vector pixel, const int);
-vector float vec_splat (vector float, const int);
-vector signed int vec_splat (vector signed int, const int);
-vector unsigned int vec_splat (vector unsigned int, const int);
-vector bool int vec_splat (vector bool int, const int);
-vector signed long vec_splat (vector signed long, const int);
-vector unsigned long vec_splat (vector unsigned long, const int);
+vector long long vec_vminsd (vector long long, vector long long);
 
-vector signed char vec_splats (signed char);
-vector unsigned char vec_splats (unsigned char);
-vector signed short vec_splats (signed short);
-vector unsigned short vec_splats (unsigned short);
-vector signed int vec_splats (signed int);
-vector unsigned int vec_splats (unsigned int);
-vector float vec_splats (float);
+vector unsigned long long vec_vminud (vector unsigned long long,
+                                      vector unsigned long long);
 
-vector float vec_vspltw (vector float, const int);
-vector signed int vec_vspltw (vector signed int, const int);
-vector unsigned int vec_vspltw (vector unsigned int, const int);
-vector bool int vec_vspltw (vector bool int, const int);
+vector int vec_vpksdss (vector long long, vector long long);
+vector unsigned int vec_vpksdss (vector long long, vector long long);
 
-vector bool short vec_vsplth (vector bool short, const int);
-vector signed short vec_vsplth (vector signed short, const int);
-vector unsigned short vec_vsplth (vector unsigned short, const int);
-vector pixel vec_vsplth (vector pixel, const int);
+vector unsigned int vec_vpkudus (vector unsigned long long,
+                                 vector unsigned long long);
 
-vector signed char vec_vspltb (vector signed char, const int);
-vector unsigned char vec_vspltb (vector unsigned char, const int);
-vector bool char vec_vspltb (vector bool char, const int);
+vector int vec_vpkudum (vector long long, vector long long);
+vector unsigned int vec_vpkudum (vector unsigned long long,
+                                 vector unsigned long long);
+vector bool int vec_vpkudum (vector bool long long, vector bool long long);
 
-vector signed char vec_splat_s8 (const int);
+vector long long vec_vpopcnt (vector long long);
+vector unsigned long long vec_vpopcnt (vector unsigned long long);
+vector int vec_vpopcnt (vector int);
+vector unsigned int vec_vpopcnt (vector unsigned int);
+vector short vec_vpopcnt (vector short);
+vector unsigned short vec_vpopcnt (vector unsigned short);
+vector signed char vec_vpopcnt (vector signed char);
+vector unsigned char vec_vpopcnt (vector unsigned char);
 
-vector signed short vec_splat_s16 (const int);
+vector signed char vec_vpopcntb (vector signed char);
+vector unsigned char vec_vpopcntb (vector unsigned char);
 
-vector signed int vec_splat_s32 (const int);
+vector long long vec_vpopcntd (vector long long);
+vector unsigned long long vec_vpopcntd (vector unsigned long long);
 
-vector unsigned char vec_splat_u8 (const int);
+vector short vec_vpopcnth (vector short);
+vector unsigned short vec_vpopcnth (vector unsigned short);
 
-vector unsigned short vec_splat_u16 (const int);
+vector int vec_vpopcntw (vector int);
+vector unsigned int vec_vpopcntw (vector unsigned int);
 
-vector unsigned int vec_splat_u32 (const int);
+vector long long vec_vrld (vector long long, vector unsigned long long);
+vector unsigned long long vec_vrld (vector unsigned long long,
+                                    vector unsigned long long);
 
-vector signed char vec_sr (vector signed char, vector unsigned char);
-vector unsigned char vec_sr (vector unsigned char,
-                             vector unsigned char);
-vector signed short vec_sr (vector signed short,
-                            vector unsigned short);
-vector unsigned short vec_sr (vector unsigned short,
-                              vector unsigned short);
-vector signed int vec_sr (vector signed int, vector unsigned int);
-vector unsigned int vec_sr (vector unsigned int, vector unsigned int);
+vector long long vec_vsld (vector long long, vector unsigned long long);
+vector unsigned long long vec_vsld (vector unsigned long long,
+                                    vector unsigned long long);
 
-vector signed int vec_vsrw (vector signed int, vector unsigned int);
-vector unsigned int vec_vsrw (vector unsigned int, vector unsigned int);
+vector long long vec_vsrad (vector long long, vector unsigned long long);
+vector unsigned long long vec_vsrad (vector unsigned long long,
+                                     vector unsigned long long);
 
-vector signed short vec_vsrh (vector signed short,
-                              vector unsigned short);
-vector unsigned short vec_vsrh (vector unsigned short,
-                                vector unsigned short);
+vector long long vec_vsrd (vector long long, vector unsigned long long);
+vector unsigned long long vec_vsrd (vector unsigned long long,
+                                    vector unsigned long long);
 
-vector signed char vec_vsrb (vector signed char, vector unsigned char);
-vector unsigned char vec_vsrb (vector unsigned char,
-                               vector unsigned char);
+vector long long vec_vsubudm (vector long long, vector long long);
+vector long long vec_vsubudm (vector bool long long, vector long long);
+vector long long vec_vsubudm (vector long long, vector bool long long);
+vector unsigned long long vec_vsubudm (vector unsigned long long,
+                                       vector unsigned long long);
+vector unsigned long long vec_vsubudm (vector bool long long,
+                                       vector unsigned long long);
+vector unsigned long long vec_vsubudm (vector unsigned long long,
+                                       vector bool long long);
 
-vector signed char vec_sra (vector signed char, vector unsigned char);
-vector unsigned char vec_sra (vector unsigned char,
-                              vector unsigned char);
-vector signed short vec_sra (vector signed short,
-                             vector unsigned short);
-vector unsigned short vec_sra (vector unsigned short,
-                               vector unsigned short);
-vector signed int vec_sra (vector signed int, vector unsigned int);
-vector unsigned int vec_sra (vector unsigned int, vector unsigned int);
+vector long long vec_vupkhsw (vector int);
+vector unsigned long long vec_vupkhsw (vector unsigned int);
 
-vector signed int vec_vsraw (vector signed int, vector unsigned int);
-vector unsigned int vec_vsraw (vector unsigned int,
-                               vector unsigned int);
+vector long long vec_vupklsw (vector int);
+vector unsigned long long vec_vupklsw (vector unsigned int);
+@end smallexample
 
-vector signed short vec_vsrah (vector signed short,
-                               vector unsigned short);
-vector unsigned short vec_vsrah (vector unsigned short,
-                                 vector unsigned short);
+If the ISA 2.07 additions to the vector/scalar (power8-vector)
+instruction set are available, the following additional functions are
+available for 64-bit targets.  New vector types
+(@var{vector __int128_t} and @var{vector __uint128_t}) are available
+to hold the @var{__int128_t} and @var{__uint128_t} types to use these
+builtins.
 
-vector signed char vec_vsrab (vector signed char, vector unsigned char);
-vector unsigned char vec_vsrab (vector unsigned char,
-                                vector unsigned char);
+The normal vector extract and set operations work on
+@var{vector __int128_t} and @var{vector __uint128_t} types,
+but the index value must be 0.
 
-vector signed int vec_srl (vector signed int, vector unsigned int);
-vector signed int vec_srl (vector signed int, vector unsigned short);
-vector signed int vec_srl (vector signed int, vector unsigned char);
-vector unsigned int vec_srl (vector unsigned int, vector unsigned int);
-vector unsigned int vec_srl (vector unsigned int,
-                             vector unsigned short);
-vector unsigned int vec_srl (vector unsigned int, vector unsigned char);
-vector bool int vec_srl (vector bool int, vector unsigned int);
-vector bool int vec_srl (vector bool int, vector unsigned short);
-vector bool int vec_srl (vector bool int, vector unsigned char);
-vector signed short vec_srl (vector signed short, vector unsigned int);
-vector signed short vec_srl (vector signed short,
-                             vector unsigned short);
-vector signed short vec_srl (vector signed short, vector unsigned char);
-vector unsigned short vec_srl (vector unsigned short,
-                               vector unsigned int);
-vector unsigned short vec_srl (vector unsigned short,
-                               vector unsigned short);
-vector unsigned short vec_srl (vector unsigned short,
-                               vector unsigned char);
-vector bool short vec_srl (vector bool short, vector unsigned int);
-vector bool short vec_srl (vector bool short, vector unsigned short);
-vector bool short vec_srl (vector bool short, vector unsigned char);
-vector pixel vec_srl (vector pixel, vector unsigned int);
-vector pixel vec_srl (vector pixel, vector unsigned short);
-vector pixel vec_srl (vector pixel, vector unsigned char);
-vector signed char vec_srl (vector signed char, vector unsigned int);
-vector signed char vec_srl (vector signed char, vector unsigned short);
-vector signed char vec_srl (vector signed char, vector unsigned char);
-vector unsigned char vec_srl (vector unsigned char,
-                              vector unsigned int);
-vector unsigned char vec_srl (vector unsigned char,
-                              vector unsigned short);
-vector unsigned char vec_srl (vector unsigned char,
-                              vector unsigned char);
-vector bool char vec_srl (vector bool char, vector unsigned int);
-vector bool char vec_srl (vector bool char, vector unsigned short);
-vector bool char vec_srl (vector bool char, vector unsigned char);
+@smallexample
+vector __int128_t vec_vaddcuq (vector __int128_t, vector __int128_t);
+vector __uint128_t vec_vaddcuq (vector __uint128_t, vector __uint128_t);
 
-vector float vec_sro (vector float, vector signed char);
-vector float vec_sro (vector float, vector unsigned char);
-vector signed int vec_sro (vector signed int, vector signed char);
-vector signed int vec_sro (vector signed int, vector unsigned char);
-vector unsigned int vec_sro (vector unsigned int, vector signed char);
-vector unsigned int vec_sro (vector unsigned int, vector unsigned char);
-vector signed short vec_sro (vector signed short, vector signed char);
-vector signed short vec_sro (vector signed short, vector unsigned char);
-vector unsigned short vec_sro (vector unsigned short,
-                               vector signed char);
-vector unsigned short vec_sro (vector unsigned short,
-                               vector unsigned char);
-vector pixel vec_sro (vector pixel, vector signed char);
-vector pixel vec_sro (vector pixel, vector unsigned char);
-vector signed char vec_sro (vector signed char, vector signed char);
-vector signed char vec_sro (vector signed char, vector unsigned char);
-vector unsigned char vec_sro (vector unsigned char, vector signed char);
-vector unsigned char vec_sro (vector unsigned char,
-                              vector unsigned char);
+vector __int128_t vec_vadduqm (vector __int128_t, vector __int128_t);
+vector __uint128_t vec_vadduqm (vector __uint128_t, vector __uint128_t);
 
-void vec_st (vector float, int, vector float *);
-void vec_st (vector float, int, float *);
-void vec_st (vector signed int, int, vector signed int *);
-void vec_st (vector signed int, int, int *);
-void vec_st (vector unsigned int, int, vector unsigned int *);
-void vec_st (vector unsigned int, int, unsigned int *);
-void vec_st (vector bool int, int, vector bool int *);
-void vec_st (vector bool int, int, unsigned int *);
-void vec_st (vector bool int, int, int *);
-void vec_st (vector signed short, int, vector signed short *);
-void vec_st (vector signed short, int, short *);
-void vec_st (vector unsigned short, int, vector unsigned short *);
-void vec_st (vector unsigned short, int, unsigned short *);
-void vec_st (vector bool short, int, vector bool short *);
-void vec_st (vector bool short, int, unsigned short *);
-void vec_st (vector pixel, int, vector pixel *);
-void vec_st (vector pixel, int, unsigned short *);
-void vec_st (vector pixel, int, short *);
-void vec_st (vector bool short, int, short *);
-void vec_st (vector signed char, int, vector signed char *);
-void vec_st (vector signed char, int, signed char *);
-void vec_st (vector unsigned char, int, vector unsigned char *);
-void vec_st (vector unsigned char, int, unsigned char *);
-void vec_st (vector bool char, int, vector bool char *);
-void vec_st (vector bool char, int, unsigned char *);
-void vec_st (vector bool char, int, signed char *);
+vector __int128_t vec_vaddecuq (vector __int128_t, vector __int128_t,
+                                vector __int128_t);
+vector __uint128_t vec_vaddecuq (vector __uint128_t, vector __uint128_t,
+                                 vector __uint128_t);
 
-void vec_ste (vector signed char, int, signed char *);
-void vec_ste (vector unsigned char, int, unsigned char *);
-void vec_ste (vector bool char, int, signed char *);
-void vec_ste (vector bool char, int, unsigned char *);
-void vec_ste (vector signed short, int, short *);
-void vec_ste (vector unsigned short, int, unsigned short *);
-void vec_ste (vector bool short, int, short *);
-void vec_ste (vector bool short, int, unsigned short *);
-void vec_ste (vector pixel, int, short *);
-void vec_ste (vector pixel, int, unsigned short *);
-void vec_ste (vector float, int, float *);
-void vec_ste (vector signed int, int, int *);
-void vec_ste (vector unsigned int, int, unsigned int *);
-void vec_ste (vector bool int, int, int *);
-void vec_ste (vector bool int, int, unsigned int *);
+vector __int128_t vec_vaddeuqm (vector __int128_t, vector __int128_t,
+                                vector __int128_t);
+vector __uint128_t vec_vaddeuqm (vector __uint128_t, vector __uint128_t,
+                                 vector __uint128_t);
 
-void vec_stvewx (vector float, int, float *);
-void vec_stvewx (vector signed int, int, int *);
-void vec_stvewx (vector unsigned int, int, unsigned int *);
-void vec_stvewx (vector bool int, int, int *);
-void vec_stvewx (vector bool int, int, unsigned int *);
+vector __int128_t vec_vsubecuq (vector __int128_t, vector __int128_t,
+                                vector __int128_t);
+vector __uint128_t vec_vsubecuq (vector __uint128_t, vector __uint128_t,
+                                 vector __uint128_t);
 
-void vec_stvehx (vector signed short, int, short *);
-void vec_stvehx (vector unsigned short, int, unsigned short *);
-void vec_stvehx (vector bool short, int, short *);
-void vec_stvehx (vector bool short, int, unsigned short *);
-void vec_stvehx (vector pixel, int, short *);
-void vec_stvehx (vector pixel, int, unsigned short *);
+vector __int128_t vec_vsubeuqm (vector __int128_t, vector __int128_t,
+                                vector __int128_t);
+vector __uint128_t vec_vsubeuqm (vector __uint128_t, vector __uint128_t,
+                                 vector __uint128_t);
 
-void vec_stvebx (vector signed char, int, signed char *);
-void vec_stvebx (vector unsigned char, int, unsigned char *);
-void vec_stvebx (vector bool char, int, signed char *);
-void vec_stvebx (vector bool char, int, unsigned char *);
+vector __int128_t vec_vsubcuq (vector __int128_t, vector __int128_t);
+vector __uint128_t vec_vsubcuq (vector __uint128_t, vector __uint128_t);
 
-void vec_stl (vector float, int, vector float *);
-void vec_stl (vector float, int, float *);
-void vec_stl (vector signed int, int, vector signed int *);
-void vec_stl (vector signed int, int, int *);
-void vec_stl (vector unsigned int, int, vector unsigned int *);
-void vec_stl (vector unsigned int, int, unsigned int *);
-void vec_stl (vector bool int, int, vector bool int *);
-void vec_stl (vector bool int, int, unsigned int *);
-void vec_stl (vector bool int, int, int *);
-void vec_stl (vector signed short, int, vector signed short *);
-void vec_stl (vector signed short, int, short *);
-void vec_stl (vector unsigned short, int, vector unsigned short *);
-void vec_stl (vector unsigned short, int, unsigned short *);
-void vec_stl (vector bool short, int, vector bool short *);
-void vec_stl (vector bool short, int, unsigned short *);
-void vec_stl (vector bool short, int, short *);
-void vec_stl (vector pixel, int, vector pixel *);
-void vec_stl (vector pixel, int, unsigned short *);
-void vec_stl (vector pixel, int, short *);
-void vec_stl (vector signed char, int, vector signed char *);
-void vec_stl (vector signed char, int, signed char *);
-void vec_stl (vector unsigned char, int, vector unsigned char *);
-void vec_stl (vector unsigned char, int, unsigned char *);
-void vec_stl (vector bool char, int, vector bool char *);
-void vec_stl (vector bool char, int, unsigned char *);
-void vec_stl (vector bool char, int, signed char *);
+vector __int128_t vec_vsubuqm (vector __int128_t, vector __int128_t);
+vector __uint128_t vec_vsubuqm (vector __uint128_t, vector __uint128_t);
 
-vector signed char vec_sub (vector bool char, vector signed char);
-vector signed char vec_sub (vector signed char, vector bool char);
-vector signed char vec_sub (vector signed char, vector signed char);
-vector unsigned char vec_sub (vector bool char, vector unsigned char);
-vector unsigned char vec_sub (vector unsigned char, vector bool char);
-vector unsigned char vec_sub (vector unsigned char,
-                              vector unsigned char);
-vector signed short vec_sub (vector bool short, vector signed short);
-vector signed short vec_sub (vector signed short, vector bool short);
-vector signed short vec_sub (vector signed short, vector signed short);
-vector unsigned short vec_sub (vector bool short,
-                               vector unsigned short);
-vector unsigned short vec_sub (vector unsigned short,
-                               vector bool short);
-vector unsigned short vec_sub (vector unsigned short,
-                               vector unsigned short);
-vector signed int vec_sub (vector bool int, vector signed int);
-vector signed int vec_sub (vector signed int, vector bool int);
-vector signed int vec_sub (vector signed int, vector signed int);
-vector unsigned int vec_sub (vector bool int, vector unsigned int);
-vector unsigned int vec_sub (vector unsigned int, vector bool int);
-vector unsigned int vec_sub (vector unsigned int, vector unsigned int);
-vector float vec_sub (vector float, vector float);
+vector __int128_t __builtin_bcdadd (vector __int128_t, vector __int128_t);
+int __builtin_bcdadd_lt (vector __int128_t, vector __int128_t);
+int __builtin_bcdadd_eq (vector __int128_t, vector __int128_t);
+int __builtin_bcdadd_gt (vector __int128_t, vector __int128_t);
+int __builtin_bcdadd_ov (vector __int128_t, vector __int128_t);
+vector __int128_t __builtin_bcdsub (vector __int128_t, vector __int128_t);
+int __builtin_bcdsub_lt (vector __int128_t, vector __int128_t);
+int __builtin_bcdsub_eq (vector __int128_t, vector __int128_t);
+int __builtin_bcdsub_gt (vector __int128_t, vector __int128_t);
+int __builtin_bcdsub_ov (vector __int128_t, vector __int128_t);
+@end smallexample
 
-vector float vec_vsubfp (vector float, vector float);
+If the cryptographic instructions are enabled (@option{-mcrypto} or
+@option{-mcpu=power8}), the following builtins are enabled.
 
-vector signed int vec_vsubuwm (vector bool int, vector signed int);
-vector signed int vec_vsubuwm (vector signed int, vector bool int);
-vector signed int vec_vsubuwm (vector signed int, vector signed int);
-vector unsigned int vec_vsubuwm (vector bool int, vector unsigned int);
-vector unsigned int vec_vsubuwm (vector unsigned int, vector bool int);
-vector unsigned int vec_vsubuwm (vector unsigned int,
-                                 vector unsigned int);
+@smallexample
+vector unsigned long long __builtin_crypto_vsbox (vector unsigned long long);
 
-vector signed short vec_vsubuhm (vector bool short,
-                                 vector signed short);
-vector signed short vec_vsubuhm (vector signed short,
-                                 vector bool short);
-vector signed short vec_vsubuhm (vector signed short,
-                                 vector signed short);
-vector unsigned short vec_vsubuhm (vector bool short,
-                                   vector unsigned short);
-vector unsigned short vec_vsubuhm (vector unsigned short,
-                                   vector bool short);
-vector unsigned short vec_vsubuhm (vector unsigned short,
-                                   vector unsigned short);
+vector unsigned long long __builtin_crypto_vcipher (vector unsigned long long,
+                                                    vector unsigned long long);
 
-vector signed char vec_vsububm (vector bool char, vector signed char);
-vector signed char vec_vsububm (vector signed char, vector bool char);
-vector signed char vec_vsububm (vector signed char, vector signed char);
-vector unsigned char vec_vsububm (vector bool char,
-                                  vector unsigned char);
-vector unsigned char vec_vsububm (vector unsigned char,
-                                  vector bool char);
-vector unsigned char vec_vsububm (vector unsigned char,
-                                  vector unsigned char);
+vector unsigned long long __builtin_crypto_vcipherlast
+                                     (vector unsigned long long,
+                                      vector unsigned long long);
+
+vector unsigned long long __builtin_crypto_vncipher (vector unsigned long long,
+                                                     vector unsigned long long);
 
-vector unsigned int vec_subc (vector unsigned int, vector unsigned int);
+vector unsigned long long __builtin_crypto_vncipherlast
+                                     (vector unsigned long long,
+                                      vector unsigned long long);
 
-vector unsigned char vec_subs (vector bool char, vector unsigned char);
-vector unsigned char vec_subs (vector unsigned char, vector bool char);
-vector unsigned char vec_subs (vector unsigned char,
-                               vector unsigned char);
-vector signed char vec_subs (vector bool char, vector signed char);
-vector signed char vec_subs (vector signed char, vector bool char);
-vector signed char vec_subs (vector signed char, vector signed char);
-vector unsigned short vec_subs (vector bool short,
-                                vector unsigned short);
-vector unsigned short vec_subs (vector unsigned short,
-                                vector bool short);
-vector unsigned short vec_subs (vector unsigned short,
-                                vector unsigned short);
-vector signed short vec_subs (vector bool short, vector signed short);
-vector signed short vec_subs (vector signed short, vector bool short);
-vector signed short vec_subs (vector signed short, vector signed short);
-vector unsigned int vec_subs (vector bool int, vector unsigned int);
-vector unsigned int vec_subs (vector unsigned int, vector bool int);
-vector unsigned int vec_subs (vector unsigned int, vector unsigned int);
-vector signed int vec_subs (vector bool int, vector signed int);
-vector signed int vec_subs (vector signed int, vector bool int);
-vector signed int vec_subs (vector signed int, vector signed int);
+vector unsigned char __builtin_crypto_vpermxor (vector unsigned char,
+                                                vector unsigned char,
+                                                vector unsigned char);
 
-vector signed int vec_vsubsws (vector bool int, vector signed int);
-vector signed int vec_vsubsws (vector signed int, vector bool int);
-vector signed int vec_vsubsws (vector signed int, vector signed int);
+vector unsigned short __builtin_crypto_vpermxor (vector unsigned short,
+                                                 vector unsigned short,
+                                                 vector unsigned short);
 
-vector unsigned int vec_vsubuws (vector bool int, vector unsigned int);
-vector unsigned int vec_vsubuws (vector unsigned int, vector bool int);
-vector unsigned int vec_vsubuws (vector unsigned int,
-                                 vector unsigned int);
+vector unsigned int __builtin_crypto_vpermxor (vector unsigned int,
+                                               vector unsigned int,
+                                               vector unsigned int);
 
-vector signed short vec_vsubshs (vector bool short,
-                                 vector signed short);
-vector signed short vec_vsubshs (vector signed short,
-                                 vector bool short);
-vector signed short vec_vsubshs (vector signed short,
-                                 vector signed short);
+vector unsigned long long __builtin_crypto_vpermxor (vector unsigned long long,
+                                                     vector unsigned long long,
+                                                     vector unsigned long long);
 
-vector unsigned short vec_vsubuhs (vector bool short,
-                                   vector unsigned short);
-vector unsigned short vec_vsubuhs (vector unsigned short,
-                                   vector bool short);
-vector unsigned short vec_vsubuhs (vector unsigned short,
-                                   vector unsigned short);
+vector unsigned char __builtin_crypto_vpmsumb (vector unsigned char,
+                                               vector unsigned char);
 
-vector signed char vec_vsubsbs (vector bool char, vector signed char);
-vector signed char vec_vsubsbs (vector signed char, vector bool char);
-vector signed char vec_vsubsbs (vector signed char, vector signed char);
+vector unsigned short __builtin_crypto_vpmsumh (vector unsigned short,
+                                                vector unsigned short);
 
-vector unsigned char vec_vsububs (vector bool char,
-                                  vector unsigned char);
-vector unsigned char vec_vsububs (vector unsigned char,
-                                  vector bool char);
-vector unsigned char vec_vsububs (vector unsigned char,
-                                  vector unsigned char);
+vector unsigned int __builtin_crypto_vpmsumw (vector unsigned int,
+                                              vector unsigned int);
 
-vector unsigned int vec_sum4s (vector unsigned char,
-                               vector unsigned int);
-vector signed int vec_sum4s (vector signed char, vector signed int);
-vector signed int vec_sum4s (vector signed short, vector signed int);
+vector unsigned long long __builtin_crypto_vpmsumd (vector unsigned long long,
+                                                    vector unsigned long long);
 
-vector signed int vec_vsum4shs (vector signed short, vector signed int);
+vector unsigned long long __builtin_crypto_vshasigmad
+                               (vector unsigned long long, int, int);
 
-vector signed int vec_vsum4sbs (vector signed char, vector signed int);
+vector unsigned int __builtin_crypto_vshasigmaw (vector unsigned int,
+                                                 int, int);
+@end smallexample
 
-vector unsigned int vec_vsum4ubs (vector unsigned char,
-                                  vector unsigned int);
+The second argument to the @var{__builtin_crypto_vshasigmad} and
+@var{__builtin_crypto_vshasigmaw} builtin functions must be a constant
+integer that is 0 or 1.  The third argument to these builtin functions
+must be a constant integer in the range of 0 to 15.
 
-vector signed int vec_sum2s (vector signed int, vector signed int);
+@node PowerPC Hardware Transactional Memory Built-in Functions
+@subsection PowerPC Hardware Transactional Memory Built-in Functions
+GCC provides two interfaces for accessing the Hardware Transactional
+Memory (HTM) instructions available on some of the PowerPC family
+of processors (e.g., POWER8).  The two interfaces are a low-level
+interface, consisting of built-in functions specific to PowerPC, and a
+higher-level interface consisting of inline functions that are common
+between PowerPC and S/390.
 
-vector signed int vec_sums (vector signed int, vector signed int);
+@subsubsection PowerPC HTM Low Level Built-in Functions
 
-vector float vec_trunc (vector float);
+The following low level built-in functions are available with
+@option{-mhtm} or @option{-mcpu=CPU} where CPU is `power8' or later.
+They all generate the machine instruction that is part of the name.
 
-vector signed short vec_unpackh (vector signed char);
-vector bool short vec_unpackh (vector bool char);
-vector signed int vec_unpackh (vector signed short);
-vector bool int vec_unpackh (vector bool short);
-vector unsigned int vec_unpackh (vector pixel);
+The HTM built-ins return true or false depending on their success, and
+their arguments match exactly the type and order of the associated
+hardware instruction's operands.  Refer to the ISA manual for a
+description of each instruction's operands.
 
-vector bool int vec_vupkhsh (vector bool short);
-vector signed int vec_vupkhsh (vector signed short);
+@smallexample
+unsigned int __builtin_tbegin (unsigned int)
+unsigned int __builtin_tend (unsigned int)
 
-vector unsigned int vec_vupkhpx (vector pixel);
+unsigned int __builtin_tabort (unsigned int)
+unsigned int __builtin_tabortdc (unsigned int, unsigned int, unsigned int)
+unsigned int __builtin_tabortdci (unsigned int, unsigned int, int)
+unsigned int __builtin_tabortwc (unsigned int, unsigned int, unsigned int)
+unsigned int __builtin_tabortwci (unsigned int, unsigned int, int)
 
-vector bool short vec_vupkhsb (vector bool char);
-vector signed short vec_vupkhsb (vector signed char);
+unsigned int __builtin_tcheck (unsigned int)
+unsigned int __builtin_treclaim (unsigned int)
+unsigned int __builtin_trechkpt (void)
+unsigned int __builtin_tsr (unsigned int)
+@end smallexample
 
-vector signed short vec_unpackl (vector signed char);
-vector bool short vec_unpackl (vector bool char);
-vector unsigned int vec_unpackl (vector pixel);
-vector signed int vec_unpackl (vector signed short);
-vector bool int vec_unpackl (vector bool short);
+In addition to the above HTM built-ins, we have added built-ins for
+some common extended mnemonics of the HTM instructions:
 
-vector unsigned int vec_vupklpx (vector pixel);
+@smallexample
+unsigned int __builtin_tendall (void)
+unsigned int __builtin_tresume (void)
+unsigned int __builtin_tsuspend (void)
+@end smallexample
 
-vector bool int vec_vupklsh (vector bool short);
-vector signed int vec_vupklsh (vector signed short);
+The following built-in functions provide access to the HTM-specific
+special purpose registers.
 
-vector bool short vec_vupklsb (vector bool char);
-vector signed short vec_vupklsb (vector signed char);
+@smallexample
+unsigned long __builtin_get_texasr (void)
+unsigned long __builtin_get_texasru (void)
+unsigned long __builtin_get_tfhar (void)
+unsigned long __builtin_get_tfiar (void)
 
-vector float vec_xor (vector float, vector float);
-vector float vec_xor (vector float, vector bool int);
-vector float vec_xor (vector bool int, vector float);
-vector bool int vec_xor (vector bool int, vector bool int);
-vector signed int vec_xor (vector bool int, vector signed int);
-vector signed int vec_xor (vector signed int, vector bool int);
-vector signed int vec_xor (vector signed int, vector signed int);
-vector unsigned int vec_xor (vector bool int, vector unsigned int);
-vector unsigned int vec_xor (vector unsigned int, vector bool int);
-vector unsigned int vec_xor (vector unsigned int, vector unsigned int);
-vector bool short vec_xor (vector bool short, vector bool short);
-vector signed short vec_xor (vector bool short, vector signed short);
-vector signed short vec_xor (vector signed short, vector bool short);
-vector signed short vec_xor (vector signed short, vector signed short);
-vector unsigned short vec_xor (vector bool short,
-                               vector unsigned short);
-vector unsigned short vec_xor (vector unsigned short,
-                               vector bool short);
-vector unsigned short vec_xor (vector unsigned short,
-                               vector unsigned short);
-vector signed char vec_xor (vector bool char, vector signed char);
-vector bool char vec_xor (vector bool char, vector bool char);
-vector signed char vec_xor (vector signed char, vector bool char);
-vector signed char vec_xor (vector signed char, vector signed char);
-vector unsigned char vec_xor (vector bool char, vector unsigned char);
-vector unsigned char vec_xor (vector unsigned char, vector bool char);
-vector unsigned char vec_xor (vector unsigned char,
-                              vector unsigned char);
+void __builtin_set_texasr (unsigned long)
+void __builtin_set_texasru (unsigned long)
+void __builtin_set_tfhar (unsigned long)
+void __builtin_set_tfiar (unsigned long)
+@end smallexample
 
-int vec_all_eq (vector signed char, vector bool char);
-int vec_all_eq (vector signed char, vector signed char);
-int vec_all_eq (vector unsigned char, vector bool char);
-int vec_all_eq (vector unsigned char, vector unsigned char);
-int vec_all_eq (vector bool char, vector bool char);
-int vec_all_eq (vector bool char, vector unsigned char);
-int vec_all_eq (vector bool char, vector signed char);
-int vec_all_eq (vector signed short, vector bool short);
-int vec_all_eq (vector signed short, vector signed short);
-int vec_all_eq (vector unsigned short, vector bool short);
-int vec_all_eq (vector unsigned short, vector unsigned short);
-int vec_all_eq (vector bool short, vector bool short);
-int vec_all_eq (vector bool short, vector unsigned short);
-int vec_all_eq (vector bool short, vector signed short);
-int vec_all_eq (vector pixel, vector pixel);
-int vec_all_eq (vector signed int, vector bool int);
-int vec_all_eq (vector signed int, vector signed int);
-int vec_all_eq (vector unsigned int, vector bool int);
-int vec_all_eq (vector unsigned int, vector unsigned int);
-int vec_all_eq (vector bool int, vector bool int);
-int vec_all_eq (vector bool int, vector unsigned int);
-int vec_all_eq (vector bool int, vector signed int);
-int vec_all_eq (vector float, vector float);
+Example usage of these low level built-in functions may look like:
 
-int vec_all_ge (vector bool char, vector unsigned char);
-int vec_all_ge (vector unsigned char, vector bool char);
-int vec_all_ge (vector unsigned char, vector unsigned char);
-int vec_all_ge (vector bool char, vector signed char);
-int vec_all_ge (vector signed char, vector bool char);
-int vec_all_ge (vector signed char, vector signed char);
-int vec_all_ge (vector bool short, vector unsigned short);
-int vec_all_ge (vector unsigned short, vector bool short);
-int vec_all_ge (vector unsigned short, vector unsigned short);
-int vec_all_ge (vector signed short, vector signed short);
-int vec_all_ge (vector bool short, vector signed short);
-int vec_all_ge (vector signed short, vector bool short);
-int vec_all_ge (vector bool int, vector unsigned int);
-int vec_all_ge (vector unsigned int, vector bool int);
-int vec_all_ge (vector unsigned int, vector unsigned int);
-int vec_all_ge (vector bool int, vector signed int);
-int vec_all_ge (vector signed int, vector bool int);
-int vec_all_ge (vector signed int, vector signed int);
-int vec_all_ge (vector float, vector float);
+@smallexample
+#include <htmintrin.h>
 
-int vec_all_gt (vector bool char, vector unsigned char);
-int vec_all_gt (vector unsigned char, vector bool char);
-int vec_all_gt (vector unsigned char, vector unsigned char);
-int vec_all_gt (vector bool char, vector signed char);
-int vec_all_gt (vector signed char, vector bool char);
-int vec_all_gt (vector signed char, vector signed char);
-int vec_all_gt (vector bool short, vector unsigned short);
-int vec_all_gt (vector unsigned short, vector bool short);
-int vec_all_gt (vector unsigned short, vector unsigned short);
-int vec_all_gt (vector bool short, vector signed short);
-int vec_all_gt (vector signed short, vector bool short);
-int vec_all_gt (vector signed short, vector signed short);
-int vec_all_gt (vector bool int, vector unsigned int);
-int vec_all_gt (vector unsigned int, vector bool int);
-int vec_all_gt (vector unsigned int, vector unsigned int);
-int vec_all_gt (vector bool int, vector signed int);
-int vec_all_gt (vector signed int, vector bool int);
-int vec_all_gt (vector signed int, vector signed int);
-int vec_all_gt (vector float, vector float);
+int num_retries = 10;
+
+while (1)
+  @{
+    if (__builtin_tbegin (0))
+      @{
+        /* Transaction State Initiated.  */
+        if (is_locked (lock))
+          __builtin_tabort (0);
+        ... transaction code...
+        __builtin_tend (0);
+        break;
+      @}
+    else
+      @{
+        /* Transaction State Failed.  Use locks if the transaction
+           failure is "persistent" or we've tried too many times.  */
+        if (num_retries-- <= 0
+            || _TEXASRU_FAILURE_PERSISTENT (__builtin_get_texasru ()))
+          @{
+            acquire_lock (lock);
+            ... non transactional fallback path...
+            release_lock (lock);
+            break;
+          @}
+      @}
+  @}
+@end smallexample
 
-int vec_all_in (vector float, vector float);
+One final built-in function has been added that returns the value of
+the 2-bit Transaction State field of the Machine Status Register (MSR)
+as stored in @code{CR0}.
 
-int vec_all_le (vector bool char, vector unsigned char);
-int vec_all_le (vector unsigned char, vector bool char);
-int vec_all_le (vector unsigned char, vector unsigned char);
-int vec_all_le (vector bool char, vector signed char);
-int vec_all_le (vector signed char, vector bool char);
-int vec_all_le (vector signed char, vector signed char);
-int vec_all_le (vector bool short, vector unsigned short);
-int vec_all_le (vector unsigned short, vector bool short);
-int vec_all_le (vector unsigned short, vector unsigned short);
-int vec_all_le (vector bool short, vector signed short);
-int vec_all_le (vector signed short, vector bool short);
-int vec_all_le (vector signed short, vector signed short);
-int vec_all_le (vector bool int, vector unsigned int);
-int vec_all_le (vector unsigned int, vector bool int);
-int vec_all_le (vector unsigned int, vector unsigned int);
-int vec_all_le (vector bool int, vector signed int);
-int vec_all_le (vector signed int, vector bool int);
-int vec_all_le (vector signed int, vector signed int);
-int vec_all_le (vector float, vector float);
+@smallexample
+unsigned long __builtin_ttest (void)
+@end smallexample
 
-int vec_all_lt (vector bool char, vector unsigned char);
-int vec_all_lt (vector unsigned char, vector bool char);
-int vec_all_lt (vector unsigned char, vector unsigned char);
-int vec_all_lt (vector bool char, vector signed char);
-int vec_all_lt (vector signed char, vector bool char);
-int vec_all_lt (vector signed char, vector signed char);
-int vec_all_lt (vector bool short, vector unsigned short);
-int vec_all_lt (vector unsigned short, vector bool short);
-int vec_all_lt (vector unsigned short, vector unsigned short);
-int vec_all_lt (vector bool short, vector signed short);
-int vec_all_lt (vector signed short, vector bool short);
-int vec_all_lt (vector signed short, vector signed short);
-int vec_all_lt (vector bool int, vector unsigned int);
-int vec_all_lt (vector unsigned int, vector bool int);
-int vec_all_lt (vector unsigned int, vector unsigned int);
-int vec_all_lt (vector bool int, vector signed int);
-int vec_all_lt (vector signed int, vector bool int);
-int vec_all_lt (vector signed int, vector signed int);
-int vec_all_lt (vector float, vector float);
+This built-in can be used to determine the current transaction state
+using the following code example:
 
-int vec_all_nan (vector float);
+@smallexample
+#include <htmintrin.h>
 
-int vec_all_ne (vector signed char, vector bool char);
-int vec_all_ne (vector signed char, vector signed char);
-int vec_all_ne (vector unsigned char, vector bool char);
-int vec_all_ne (vector unsigned char, vector unsigned char);
-int vec_all_ne (vector bool char, vector bool char);
-int vec_all_ne (vector bool char, vector unsigned char);
-int vec_all_ne (vector bool char, vector signed char);
-int vec_all_ne (vector signed short, vector bool short);
-int vec_all_ne (vector signed short, vector signed short);
-int vec_all_ne (vector unsigned short, vector bool short);
-int vec_all_ne (vector unsigned short, vector unsigned short);
-int vec_all_ne (vector bool short, vector bool short);
-int vec_all_ne (vector bool short, vector unsigned short);
-int vec_all_ne (vector bool short, vector signed short);
-int vec_all_ne (vector pixel, vector pixel);
-int vec_all_ne (vector signed int, vector bool int);
-int vec_all_ne (vector signed int, vector signed int);
-int vec_all_ne (vector unsigned int, vector bool int);
-int vec_all_ne (vector unsigned int, vector unsigned int);
-int vec_all_ne (vector bool int, vector bool int);
-int vec_all_ne (vector bool int, vector unsigned int);
-int vec_all_ne (vector bool int, vector signed int);
-int vec_all_ne (vector float, vector float);
+unsigned char tx_state = _HTM_STATE (__builtin_ttest ());
 
-int vec_all_nge (vector float, vector float);
+if (tx_state == _HTM_TRANSACTIONAL)
+  @{
+    /* Code to use in transactional state.  */
+  @}
+else if (tx_state == _HTM_NONTRANSACTIONAL)
+  @{
+    /* Code to use in non-transactional state.  */
+  @}
+else if (tx_state == _HTM_SUSPENDED)
+  @{
+    /* Code to use in transaction suspended state.  */
+  @}
+@end smallexample
 
-int vec_all_ngt (vector float, vector float);
+@subsubsection PowerPC HTM High Level Inline Functions
 
-int vec_all_nle (vector float, vector float);
+The following high level HTM interface is made available by including
+@code{<htmxlintrin.h>} and using @option{-mhtm} or @option{-mcpu=CPU}
+where CPU is `power8' or later.  This interface is common between PowerPC
+and S/390, allowing users to write one HTM source implementation that
+can be compiled and executed on either system.
 
-int vec_all_nlt (vector float, vector float);
+@smallexample
+long __TM_simple_begin (void)
+long __TM_begin (void* const TM_buff)
+long __TM_end (void)
+void __TM_abort (void)
+void __TM_named_abort (unsigned char const code)
+void __TM_resume (void)
+void __TM_suspend (void)
 
-int vec_all_numeric (vector float);
+long __TM_is_user_abort (void* const TM_buff)
+long __TM_is_named_user_abort (void* const TM_buff, unsigned char *code)
+long __TM_is_illegal (void* const TM_buff)
+long __TM_is_footprint_exceeded (void* const TM_buff)
+long __TM_nesting_depth (void* const TM_buff)
+long __TM_is_nested_too_deep (void* const TM_buff)
+long __TM_is_conflict (void* const TM_buff)
+long __TM_is_failure_persistent (void* const TM_buff)
+long __TM_failure_address (void* const TM_buff)
+long long __TM_failure_code (void* const TM_buff)
+@end smallexample
 
-int vec_any_eq (vector signed char, vector bool char);
-int vec_any_eq (vector signed char, vector signed char);
-int vec_any_eq (vector unsigned char, vector bool char);
-int vec_any_eq (vector unsigned char, vector unsigned char);
-int vec_any_eq (vector bool char, vector bool char);
-int vec_any_eq (vector bool char, vector unsigned char);
-int vec_any_eq (vector bool char, vector signed char);
-int vec_any_eq (vector signed short, vector bool short);
-int vec_any_eq (vector signed short, vector signed short);
-int vec_any_eq (vector unsigned short, vector bool short);
-int vec_any_eq (vector unsigned short, vector unsigned short);
-int vec_any_eq (vector bool short, vector bool short);
-int vec_any_eq (vector bool short, vector unsigned short);
-int vec_any_eq (vector bool short, vector signed short);
-int vec_any_eq (vector pixel, vector pixel);
-int vec_any_eq (vector signed int, vector bool int);
-int vec_any_eq (vector signed int, vector signed int);
-int vec_any_eq (vector unsigned int, vector bool int);
-int vec_any_eq (vector unsigned int, vector unsigned int);
-int vec_any_eq (vector bool int, vector bool int);
-int vec_any_eq (vector bool int, vector unsigned int);
-int vec_any_eq (vector bool int, vector signed int);
-int vec_any_eq (vector float, vector float);
+Using this common set of HTM inline functions, we can create
+a more portable version of the HTM example in the previous
+section that will work on either PowerPC or S/390:
 
-int vec_any_ge (vector signed char, vector bool char);
-int vec_any_ge (vector unsigned char, vector bool char);
-int vec_any_ge (vector unsigned char, vector unsigned char);
-int vec_any_ge (vector signed char, vector signed char);
-int vec_any_ge (vector bool char, vector unsigned char);
-int vec_any_ge (vector bool char, vector signed char);
-int vec_any_ge (vector unsigned short, vector bool short);
-int vec_any_ge (vector unsigned short, vector unsigned short);
-int vec_any_ge (vector signed short, vector signed short);
-int vec_any_ge (vector signed short, vector bool short);
-int vec_any_ge (vector bool short, vector unsigned short);
-int vec_any_ge (vector bool short, vector signed short);
-int vec_any_ge (vector signed int, vector bool int);
-int vec_any_ge (vector unsigned int, vector bool int);
-int vec_any_ge (vector unsigned int, vector unsigned int);
-int vec_any_ge (vector signed int, vector signed int);
-int vec_any_ge (vector bool int, vector unsigned int);
-int vec_any_ge (vector bool int, vector signed int);
-int vec_any_ge (vector float, vector float);
+@smallexample
+#include <htmxlintrin.h>
 
-int vec_any_gt (vector bool char, vector unsigned char);
-int vec_any_gt (vector unsigned char, vector bool char);
-int vec_any_gt (vector unsigned char, vector unsigned char);
-int vec_any_gt (vector bool char, vector signed char);
-int vec_any_gt (vector signed char, vector bool char);
-int vec_any_gt (vector signed char, vector signed char);
-int vec_any_gt (vector bool short, vector unsigned short);
-int vec_any_gt (vector unsigned short, vector bool short);
-int vec_any_gt (vector unsigned short, vector unsigned short);
-int vec_any_gt (vector bool short, vector signed short);
-int vec_any_gt (vector signed short, vector bool short);
-int vec_any_gt (vector signed short, vector signed short);
-int vec_any_gt (vector bool int, vector unsigned int);
-int vec_any_gt (vector unsigned int, vector bool int);
-int vec_any_gt (vector unsigned int, vector unsigned int);
-int vec_any_gt (vector bool int, vector signed int);
-int vec_any_gt (vector signed int, vector bool int);
-int vec_any_gt (vector signed int, vector signed int);
-int vec_any_gt (vector float, vector float);
+int num_retries = 10;
+TM_buff_type TM_buff;
 
-int vec_any_le (vector bool char, vector unsigned char);
-int vec_any_le (vector unsigned char, vector bool char);
-int vec_any_le (vector unsigned char, vector unsigned char);
-int vec_any_le (vector bool char, vector signed char);
-int vec_any_le (vector signed char, vector bool char);
-int vec_any_le (vector signed char, vector signed char);
-int vec_any_le (vector bool short, vector unsigned short);
-int vec_any_le (vector unsigned short, vector bool short);
-int vec_any_le (vector unsigned short, vector unsigned short);
-int vec_any_le (vector bool short, vector signed short);
-int vec_any_le (vector signed short, vector bool short);
-int vec_any_le (vector signed short, vector signed short);
-int vec_any_le (vector bool int, vector unsigned int);
-int vec_any_le (vector unsigned int, vector bool int);
-int vec_any_le (vector unsigned int, vector unsigned int);
-int vec_any_le (vector bool int, vector signed int);
-int vec_any_le (vector signed int, vector bool int);
-int vec_any_le (vector signed int, vector signed int);
-int vec_any_le (vector float, vector float);
+while (1)
+  @{
+    if (__TM_begin (TM_buff))
+      @{
+        /* Transaction State Initiated.  */
+        if (is_locked (lock))
+          __TM_abort ();
+        ... transaction code...
+        __TM_end ();
+        break;
+      @}
+    else
+      @{
+        /* Transaction State Failed.  Use locks if the transaction
+           failure is "persistent" or we've tried too many times.  */
+        if (num_retries-- <= 0
+            || __TM_is_failure_persistent (TM_buff))
+          @{
+            acquire_lock (lock);
+            ... non transactional fallback path...
+            release_lock (lock);
+            break;
+          @}
+      @}
+  @}
+@end smallexample
 
-int vec_any_lt (vector bool char, vector unsigned char);
-int vec_any_lt (vector unsigned char, vector bool char);
-int vec_any_lt (vector unsigned char, vector unsigned char);
-int vec_any_lt (vector bool char, vector signed char);
-int vec_any_lt (vector signed char, vector bool char);
-int vec_any_lt (vector signed char, vector signed char);
-int vec_any_lt (vector bool short, vector unsigned short);
-int vec_any_lt (vector unsigned short, vector bool short);
-int vec_any_lt (vector unsigned short, vector unsigned short);
-int vec_any_lt (vector bool short, vector signed short);
-int vec_any_lt (vector signed short, vector bool short);
-int vec_any_lt (vector signed short, vector signed short);
-int vec_any_lt (vector bool int, vector unsigned int);
-int vec_any_lt (vector unsigned int, vector bool int);
-int vec_any_lt (vector unsigned int, vector unsigned int);
-int vec_any_lt (vector bool int, vector signed int);
-int vec_any_lt (vector signed int, vector bool int);
-int vec_any_lt (vector signed int, vector signed int);
-int vec_any_lt (vector float, vector float);
+@node RX Built-in Functions
+@subsection RX Built-in Functions
+GCC provides built-in functions for some of the RX instructions that
+cannot be expressed directly in the C programming language.  The
+following functions are supported:
 
-int vec_any_nan (vector float);
+@deftypefn {Built-in Function}  void __builtin_rx_brk (void)
+Generates the @code{brk} machine instruction.
+@end deftypefn
 
-int vec_any_ne (vector signed char, vector bool char);
-int vec_any_ne (vector signed char, vector signed char);
-int vec_any_ne (vector unsigned char, vector bool char);
-int vec_any_ne (vector unsigned char, vector unsigned char);
-int vec_any_ne (vector bool char, vector bool char);
-int vec_any_ne (vector bool char, vector unsigned char);
-int vec_any_ne (vector bool char, vector signed char);
-int vec_any_ne (vector signed short, vector bool short);
-int vec_any_ne (vector signed short, vector signed short);
-int vec_any_ne (vector unsigned short, vector bool short);
-int vec_any_ne (vector unsigned short, vector unsigned short);
-int vec_any_ne (vector bool short, vector bool short);
-int vec_any_ne (vector bool short, vector unsigned short);
-int vec_any_ne (vector bool short, vector signed short);
-int vec_any_ne (vector pixel, vector pixel);
-int vec_any_ne (vector signed int, vector bool int);
-int vec_any_ne (vector signed int, vector signed int);
-int vec_any_ne (vector unsigned int, vector bool int);
-int vec_any_ne (vector unsigned int, vector unsigned int);
-int vec_any_ne (vector bool int, vector bool int);
-int vec_any_ne (vector bool int, vector unsigned int);
-int vec_any_ne (vector bool int, vector signed int);
-int vec_any_ne (vector float, vector float);
+@deftypefn {Built-in Function}  void __builtin_rx_clrpsw (int)
+Generates the @code{clrpsw} machine instruction to clear the specified
+bit in the processor status word.
+@end deftypefn
 
-int vec_any_nge (vector float, vector float);
+@deftypefn {Built-in Function}  void __builtin_rx_int (int)
+Generates the @code{int} machine instruction to generate an interrupt
+with the specified value.
+@end deftypefn
 
-int vec_any_ngt (vector float, vector float);
+@deftypefn {Built-in Function}  void __builtin_rx_machi (int, int)
+Generates the @code{machi} machine instruction to add the result of
+multiplying the top 16 bits of the two arguments into the
+accumulator.
+@end deftypefn
 
-int vec_any_nle (vector float, vector float);
+@deftypefn {Built-in Function}  void __builtin_rx_maclo (int, int)
+Generates the @code{maclo} machine instruction to add the result of
+multiplying the bottom 16 bits of the two arguments into the
+accumulator.
+@end deftypefn
 
-int vec_any_nlt (vector float, vector float);
+@deftypefn {Built-in Function}  void __builtin_rx_mulhi (int, int)
+Generates the @code{mulhi} machine instruction to place the result of
+multiplying the top 16 bits of the two arguments into the
+accumulator.
+@end deftypefn
 
-int vec_any_numeric (vector float);
+@deftypefn {Built-in Function}  void __builtin_rx_mullo (int, int)
+Generates the @code{mullo} machine instruction to place the result of
+multiplying the bottom 16 bits of the two arguments into the
+accumulator.
+@end deftypefn
 
-int vec_any_out (vector float, vector float);
-@end smallexample
+@deftypefn {Built-in Function}  int  __builtin_rx_mvfachi (void)
+Generates the @code{mvfachi} machine instruction to read the top
+32 bits of the accumulator.
+@end deftypefn
 
-If the vector/scalar (VSX) instruction set is available, the following
-additional functions are available:
+@deftypefn {Built-in Function}  int  __builtin_rx_mvfacmi (void)
+Generates the @code{mvfacmi} machine instruction to read the middle
+32 bits of the accumulator.
+@end deftypefn
 
-@smallexample
-vector double vec_abs (vector double);
-vector double vec_add (vector double, vector double);
-vector double vec_and (vector double, vector double);
-vector double vec_and (vector double, vector bool long);
-vector double vec_and (vector bool long, vector double);
-vector long vec_and (vector long, vector long);
-vector long vec_and (vector long, vector bool long);
-vector long vec_and (vector bool long, vector long);
-vector unsigned long vec_and (vector unsigned long, vector unsigned long);
-vector unsigned long vec_and (vector unsigned long, vector bool long);
-vector unsigned long vec_and (vector bool long, vector unsigned long);
-vector double vec_andc (vector double, vector double);
-vector double vec_andc (vector double, vector bool long);
-vector double vec_andc (vector bool long, vector double);
-vector long vec_andc (vector long, vector long);
-vector long vec_andc (vector long, vector bool long);
-vector long vec_andc (vector bool long, vector long);
-vector unsigned long vec_andc (vector unsigned long, vector unsigned long);
-vector unsigned long vec_andc (vector unsigned long, vector bool long);
-vector unsigned long vec_andc (vector bool long, vector unsigned long);
-vector double vec_ceil (vector double);
-vector bool long vec_cmpeq (vector double, vector double);
-vector bool long vec_cmpge (vector double, vector double);
-vector bool long vec_cmpgt (vector double, vector double);
-vector bool long vec_cmple (vector double, vector double);
-vector bool long vec_cmplt (vector double, vector double);
-vector double vec_cpsgn (vector double, vector double);
-vector float vec_div (vector float, vector float);
-vector double vec_div (vector double, vector double);
-vector long vec_div (vector long, vector long);
-vector unsigned long vec_div (vector unsigned long, vector unsigned long);
-vector double vec_floor (vector double);
-vector double vec_ld (int, const vector double *);
-vector double vec_ld (int, const double *);
-vector double vec_ldl (int, const vector double *);
-vector double vec_ldl (int, const double *);
-vector unsigned char vec_lvsl (int, const volatile double *);
-vector unsigned char vec_lvsr (int, const volatile double *);
-vector double vec_madd (vector double, vector double, vector double);
-vector double vec_max (vector double, vector double);
-vector signed long vec_mergeh (vector signed long, vector signed long);
-vector signed long vec_mergeh (vector signed long, vector bool long);
-vector signed long vec_mergeh (vector bool long, vector signed long);
-vector unsigned long vec_mergeh (vector unsigned long, vector unsigned long);
-vector unsigned long vec_mergeh (vector unsigned long, vector bool long);
-vector unsigned long vec_mergeh (vector bool long, vector unsigned long);
-vector signed long vec_mergel (vector signed long, vector signed long);
-vector signed long vec_mergel (vector signed long, vector bool long);
-vector signed long vec_mergel (vector bool long, vector signed long);
-vector unsigned long vec_mergel (vector unsigned long, vector unsigned long);
-vector unsigned long vec_mergel (vector unsigned long, vector bool long);
-vector unsigned long vec_mergel (vector bool long, vector unsigned long);
-vector double vec_min (vector double, vector double);
-vector float vec_msub (vector float, vector float, vector float);
-vector double vec_msub (vector double, vector double, vector double);
-vector float vec_mul (vector float, vector float);
-vector double vec_mul (vector double, vector double);
-vector long vec_mul (vector long, vector long);
-vector unsigned long vec_mul (vector unsigned long, vector unsigned long);
-vector float vec_nearbyint (vector float);
-vector double vec_nearbyint (vector double);
-vector float vec_nmadd (vector float, vector float, vector float);
-vector double vec_nmadd (vector double, vector double, vector double);
-vector double vec_nmsub (vector double, vector double, vector double);
-vector double vec_nor (vector double, vector double);
-vector long vec_nor (vector long, vector long);
-vector long vec_nor (vector long, vector bool long);
-vector long vec_nor (vector bool long, vector long);
-vector unsigned long vec_nor (vector unsigned long, vector unsigned long);
-vector unsigned long vec_nor (vector unsigned long, vector bool long);
-vector unsigned long vec_nor (vector bool long, vector unsigned long);
-vector double vec_or (vector double, vector double);
-vector double vec_or (vector double, vector bool long);
-vector double vec_or (vector bool long, vector double);
-vector long vec_or (vector long, vector long);
-vector long vec_or (vector long, vector bool long);
-vector long vec_or (vector bool long, vector long);
-vector unsigned long vec_or (vector unsigned long, vector unsigned long);
-vector unsigned long vec_or (vector unsigned long, vector bool long);
-vector unsigned long vec_or (vector bool long, vector unsigned long);
-vector double vec_perm (vector double, vector double, vector unsigned char);
-vector long vec_perm (vector long, vector long, vector unsigned char);
-vector unsigned long vec_perm (vector unsigned long, vector unsigned long,
-                               vector unsigned char);
-vector double vec_rint (vector double);
-vector double vec_recip (vector double, vector double);
-vector double vec_rsqrt (vector double);
-vector double vec_rsqrte (vector double);
-vector double vec_sel (vector double, vector double, vector bool long);
-vector double vec_sel (vector double, vector double, vector unsigned long);
-vector long vec_sel (vector long, vector long, vector long);
-vector long vec_sel (vector long, vector long, vector unsigned long);
-vector long vec_sel (vector long, vector long, vector bool long);
-vector unsigned long vec_sel (vector unsigned long, vector unsigned long,
-                              vector long);
-vector unsigned long vec_sel (vector unsigned long, vector unsigned long,
-                              vector unsigned long);
-vector unsigned long vec_sel (vector unsigned long, vector unsigned long,
-                              vector bool long);
-vector double vec_splats (double);
-vector signed long vec_splats (signed long);
-vector unsigned long vec_splats (unsigned long);
-vector float vec_sqrt (vector float);
-vector double vec_sqrt (vector double);
-void vec_st (vector double, int, vector double *);
-void vec_st (vector double, int, double *);
-vector double vec_sub (vector double, vector double);
-vector double vec_trunc (vector double);
-vector double vec_xor (vector double, vector double);
-vector double vec_xor (vector double, vector bool long);
-vector double vec_xor (vector bool long, vector double);
-vector long vec_xor (vector long, vector long);
-vector long vec_xor (vector long, vector bool long);
-vector long vec_xor (vector bool long, vector long);
-vector unsigned long vec_xor (vector unsigned long, vector unsigned long);
-vector unsigned long vec_xor (vector unsigned long, vector bool long);
-vector unsigned long vec_xor (vector bool long, vector unsigned long);
-int vec_all_eq (vector double, vector double);
-int vec_all_ge (vector double, vector double);
-int vec_all_gt (vector double, vector double);
-int vec_all_le (vector double, vector double);
-int vec_all_lt (vector double, vector double);
-int vec_all_nan (vector double);
-int vec_all_ne (vector double, vector double);
-int vec_all_nge (vector double, vector double);
-int vec_all_ngt (vector double, vector double);
-int vec_all_nle (vector double, vector double);
-int vec_all_nlt (vector double, vector double);
-int vec_all_numeric (vector double);
-int vec_any_eq (vector double, vector double);
-int vec_any_ge (vector double, vector double);
-int vec_any_gt (vector double, vector double);
-int vec_any_le (vector double, vector double);
-int vec_any_lt (vector double, vector double);
-int vec_any_nan (vector double);
-int vec_any_ne (vector double, vector double);
-int vec_any_nge (vector double, vector double);
-int vec_any_ngt (vector double, vector double);
-int vec_any_nle (vector double, vector double);
-int vec_any_nlt (vector double, vector double);
-int vec_any_numeric (vector double);
+@deftypefn {Built-in Function}  int __builtin_rx_mvfc (int)
+Generates the @code{mvfc} machine instruction which reads the control
+register specified in its argument and returns its value.
+@end deftypefn
+
+@deftypefn {Built-in Function}  void __builtin_rx_mvtachi (int)
+Generates the @code{mvtachi} machine instruction to set the top
+32 bits of the accumulator.
+@end deftypefn
+
+@deftypefn {Built-in Function}  void __builtin_rx_mvtaclo (int)
+Generates the @code{mvtaclo} machine instruction to set the bottom
+32 bits of the accumulator.
+@end deftypefn
+
+@deftypefn {Built-in Function}  void __builtin_rx_mvtc (int reg, int val)
+Generates the @code{mvtc} machine instruction which sets control
+register number @code{reg} to @code{val}.
+@end deftypefn
 
-vector double vec_vsx_ld (int, const vector double *);
-vector double vec_vsx_ld (int, const double *);
-vector float vec_vsx_ld (int, const vector float *);
-vector float vec_vsx_ld (int, const float *);
-vector bool int vec_vsx_ld (int, const vector bool int *);
-vector signed int vec_vsx_ld (int, const vector signed int *);
-vector signed int vec_vsx_ld (int, const int *);
-vector signed int vec_vsx_ld (int, const long *);
-vector unsigned int vec_vsx_ld (int, const vector unsigned int *);
-vector unsigned int vec_vsx_ld (int, const unsigned int *);
-vector unsigned int vec_vsx_ld (int, const unsigned long *);
-vector bool short vec_vsx_ld (int, const vector bool short *);
-vector pixel vec_vsx_ld (int, const vector pixel *);
-vector signed short vec_vsx_ld (int, const vector signed short *);
-vector signed short vec_vsx_ld (int, const short *);
-vector unsigned short vec_vsx_ld (int, const vector unsigned short *);
-vector unsigned short vec_vsx_ld (int, const unsigned short *);
-vector bool char vec_vsx_ld (int, const vector bool char *);
-vector signed char vec_vsx_ld (int, const vector signed char *);
-vector signed char vec_vsx_ld (int, const signed char *);
-vector unsigned char vec_vsx_ld (int, const vector unsigned char *);
-vector unsigned char vec_vsx_ld (int, const unsigned char *);
+@deftypefn {Built-in Function}  void __builtin_rx_mvtipl (int)
+Generates the @code{mvtipl} machine instruction to set the interrupt
+priority level.
+@end deftypefn
 
-void vec_vsx_st (vector double, int, vector double *);
-void vec_vsx_st (vector double, int, double *);
-void vec_vsx_st (vector float, int, vector float *);
-void vec_vsx_st (vector float, int, float *);
-void vec_vsx_st (vector signed int, int, vector signed int *);
-void vec_vsx_st (vector signed int, int, int *);
-void vec_vsx_st (vector unsigned int, int, vector unsigned int *);
-void vec_vsx_st (vector unsigned int, int, unsigned int *);
-void vec_vsx_st (vector bool int, int, vector bool int *);
-void vec_vsx_st (vector bool int, int, unsigned int *);
-void vec_vsx_st (vector bool int, int, int *);
-void vec_vsx_st (vector signed short, int, vector signed short *);
-void vec_vsx_st (vector signed short, int, short *);
-void vec_vsx_st (vector unsigned short, int, vector unsigned short *);
-void vec_vsx_st (vector unsigned short, int, unsigned short *);
-void vec_vsx_st (vector bool short, int, vector bool short *);
-void vec_vsx_st (vector bool short, int, unsigned short *);
-void vec_vsx_st (vector pixel, int, vector pixel *);
-void vec_vsx_st (vector pixel, int, unsigned short *);
-void vec_vsx_st (vector pixel, int, short *);
-void vec_vsx_st (vector bool short, int, short *);
-void vec_vsx_st (vector signed char, int, vector signed char *);
-void vec_vsx_st (vector signed char, int, signed char *);
-void vec_vsx_st (vector unsigned char, int, vector unsigned char *);
-void vec_vsx_st (vector unsigned char, int, unsigned char *);
-void vec_vsx_st (vector bool char, int, vector bool char *);
-void vec_vsx_st (vector bool char, int, unsigned char *);
-void vec_vsx_st (vector bool char, int, signed char *);
+@deftypefn {Built-in Function}  void __builtin_rx_racw (int)
+Generates the @code{racw} machine instruction to round the accumulator
+according to the specified mode.
+@end deftypefn
 
-vector double vec_xxpermdi (vector double, vector double, int);
-vector float vec_xxpermdi (vector float, vector float, int);
-vector long long vec_xxpermdi (vector long long, vector long long, int);
-vector unsigned long long vec_xxpermdi (vector unsigned long long,
-                                        vector unsigned long long, int);
-vector int vec_xxpermdi (vector int, vector int, int);
-vector unsigned int vec_xxpermdi (vector unsigned int,
-                                  vector unsigned int, int);
-vector short vec_xxpermdi (vector short, vector short, int);
-vector unsigned short vec_xxpermdi (vector unsigned short,
-                                    vector unsigned short, int);
-vector signed char vec_xxpermdi (vector signed char, vector signed char, int);
-vector unsigned char vec_xxpermdi (vector unsigned char,
-                                   vector unsigned char, int);
+@deftypefn {Built-in Function}  int __builtin_rx_revw (int)
+Generates the @code{revw} machine instruction which swaps the bytes in
+the argument so that bits 0--7 now occupy bits 8--15 and vice versa,
+and also bits 16--23 occupy bits 24--31 and vice versa.
+@end deftypefn
 
-vector double vec_xxsldi (vector double, vector double, int);
-vector float vec_xxsldi (vector float, vector float, int);
-vector long long vec_xxsldi (vector long long, vector long long, int);
-vector unsigned long long vec_xxsldi (vector unsigned long long,
-                                      vector unsigned long long, int);
-vector int vec_xxsldi (vector int, vector int, int);
-vector unsigned int vec_xxsldi (vector unsigned int, vector unsigned int, int);
-vector short vec_xxsldi (vector short, vector short, int);
-vector unsigned short vec_xxsldi (vector unsigned short,
-                                  vector unsigned short, int);
-vector signed char vec_xxsldi (vector signed char, vector signed char, int);
-vector unsigned char vec_xxsldi (vector unsigned char,
-                                 vector unsigned char, int);
-@end smallexample
+@deftypefn {Built-in Function}  void __builtin_rx_rmpa (void)
+Generates the @code{rmpa} machine instruction which initiates a
+repeated multiply and accumulate sequence.
+@end deftypefn
 
-Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
-generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
-if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
-@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
-@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
+@deftypefn {Built-in Function}  void __builtin_rx_round (float)
+Generates the @code{round} machine instruction which returns the
+floating-point argument rounded according to the current rounding mode
+set in the floating-point status word register.
+@end deftypefn
 
-If the ISA 2.07 additions to the vector/scalar (power8-vector)
-instruction set is available, the following additional functions are
-available for both 32-bit and 64-bit targets.  For 64-bit targets, you
-can use @var{vector long} instead of @var{vector long long},
-@var{vector bool long} instead of @var{vector bool long long}, and
-@var{vector unsigned long} instead of @var{vector unsigned long long}.
+@deftypefn {Built-in Function}  int __builtin_rx_sat (int)
+Generates the @code{sat} machine instruction which returns the
+saturated value of the argument.
+@end deftypefn
 
-@smallexample
-vector long long vec_abs (vector long long);
+@deftypefn {Built-in Function}  void __builtin_rx_setpsw (int)
+Generates the @code{setpsw} machine instruction to set the specified
+bit in the processor status word.
+@end deftypefn
 
-vector long long vec_add (vector long long, vector long long);
-vector unsigned long long vec_add (vector unsigned long long,
-                                   vector unsigned long long);
+@deftypefn {Built-in Function}  void __builtin_rx_wait (void)
+Generates the @code{wait} machine instruction.
+@end deftypefn
 
-int vec_all_eq (vector long long, vector long long);
-int vec_all_eq (vector unsigned long long, vector unsigned long long);
-int vec_all_ge (vector long long, vector long long);
-int vec_all_ge (vector unsigned long long, vector unsigned long long);
-int vec_all_gt (vector long long, vector long long);
-int vec_all_gt (vector unsigned long long, vector unsigned long long);
-int vec_all_le (vector long long, vector long long);
-int vec_all_le (vector unsigned long long, vector unsigned long long);
-int vec_all_lt (vector long long, vector long long);
-int vec_all_lt (vector unsigned long long, vector unsigned long long);
-int vec_all_ne (vector long long, vector long long);
-int vec_all_ne (vector unsigned long long, vector unsigned long long);
+@node S/390 System z Built-in Functions
+@subsection S/390 System z Built-in Functions
+@deftypefn {Built-in Function} int __builtin_tbegin (void*)
+Generates the @code{tbegin} machine instruction starting a
+non-constrained hardware transaction.  If the parameter is non-NULL, the
+memory area is used to store the transaction diagnostic buffer and
+is passed as the first operand to @code{tbegin}.  This buffer can be
+defined using the @code{struct __htm_tdb} C struct defined in
+@code{htmintrin.h} and must reside on a double-word boundary.  The
+second tbegin operand is set to @code{0xff0c}.  This enables
+save/restore of all GPRs and disables aborts for FPR and AR
+manipulations inside the transaction body.  The condition code set by
+the tbegin instruction is returned as an integer value.  The tbegin
+instruction by definition overwrites the content of all FPRs.  The
+compiler generates code which saves and restores the FPRs.  For
+soft-float code it is recommended to use the @code{*_nofloat}
+variant.  In order to prevent a TDB from being written it is required
+to pass a constant zero value as parameter.  Passing the zero value
+through a variable is not sufficient.  Although modifications of
+access registers inside the transaction will not trigger a
+transaction abort, modifying them is not supported.  Access
+registers do not get saved when entering a transaction.  They will have
+an undefined state when reaching the abort code.
+@end deftypefn
 
-int vec_any_eq (vector long long, vector long long);
-int vec_any_eq (vector unsigned long long, vector unsigned long long);
-int vec_any_ge (vector long long, vector long long);
-int vec_any_ge (vector unsigned long long, vector unsigned long long);
-int vec_any_gt (vector long long, vector long long);
-int vec_any_gt (vector unsigned long long, vector unsigned long long);
-int vec_any_le (vector long long, vector long long);
-int vec_any_le (vector unsigned long long, vector unsigned long long);
-int vec_any_lt (vector long long, vector long long);
-int vec_any_lt (vector unsigned long long, vector unsigned long long);
-int vec_any_ne (vector long long, vector long long);
-int vec_any_ne (vector unsigned long long, vector unsigned long long);
+Macros for the possible return codes of tbegin are defined in the
+@code{htmintrin.h} header file:
 
-vector long long vec_eqv (vector long long, vector long long);
-vector long long vec_eqv (vector bool long long, vector long long);
-vector long long vec_eqv (vector long long, vector bool long long);
-vector unsigned long long vec_eqv (vector unsigned long long,
-                                   vector unsigned long long);
-vector unsigned long long vec_eqv (vector bool long long,
-                                   vector unsigned long long);
-vector unsigned long long vec_eqv (vector unsigned long long,
-                                   vector bool long long);
-vector int vec_eqv (vector int, vector int);
-vector int vec_eqv (vector bool int, vector int);
-vector int vec_eqv (vector int, vector bool int);
-vector unsigned int vec_eqv (vector unsigned int, vector unsigned int);
-vector unsigned int vec_eqv (vector bool unsigned int,
-                             vector unsigned int);
-vector unsigned int vec_eqv (vector unsigned int,
-                             vector bool unsigned int);
-vector short vec_eqv (vector short, vector short);
-vector short vec_eqv (vector bool short, vector short);
-vector short vec_eqv (vector short, vector bool short);
-vector unsigned short vec_eqv (vector unsigned short, vector unsigned short);
-vector unsigned short vec_eqv (vector bool unsigned short,
-                               vector unsigned short);
-vector unsigned short vec_eqv (vector unsigned short,
-                               vector bool unsigned short);
-vector signed char vec_eqv (vector signed char, vector signed char);
-vector signed char vec_eqv (vector bool signed char, vector signed char);
-vector signed char vec_eqv (vector signed char, vector bool signed char);
-vector unsigned char vec_eqv (vector unsigned char, vector unsigned char);
-vector unsigned char vec_eqv (vector bool unsigned char, vector unsigned char);
-vector unsigned char vec_eqv (vector unsigned char, vector bool unsigned char);
+@table @code
+@item _HTM_TBEGIN_STARTED
+@code{tbegin} has been executed as part of normal processing.  The
+transaction body is supposed to be executed.
+@item _HTM_TBEGIN_INDETERMINATE
+The transaction was aborted due to an indeterminate condition which
+might be persistent.
+@item _HTM_TBEGIN_TRANSIENT
+The transaction aborted due to a transient failure.  The transaction
+should be re-executed in that case.
+@item _HTM_TBEGIN_PERSISTENT
+The transaction aborted due to a persistent failure.  Re-execution
+under the same circumstances will not be productive.
+@end table
+
+@defmac _HTM_FIRST_USER_ABORT_CODE
+The @code{_HTM_FIRST_USER_ABORT_CODE} defined in @code{htmintrin.h}
+specifies the first abort code which can be used for
+@code{__builtin_tabort}.  Values below this threshold are reserved for
+machine use.
+@end defmac
+
+@deftp {Data type} {struct __htm_tdb}
+The @code{struct __htm_tdb} defined in @code{htmintrin.h} describes
+the structure of the transaction diagnostic block as specified in the
+Principles of Operation manual chapter 5-91.
+@end deftp
+
+@deftypefn {Built-in Function} int __builtin_tbegin_nofloat (void*)
+Same as @code{__builtin_tbegin} but without FPR saves and restores.
+Using this variant in code making use of FPRs will leave the FPRs in
+an undefined state when entering the transaction abort handler code.
+@end deftypefn
+
+@deftypefn {Built-in Function} int __builtin_tbegin_retry (void*, int)
+In addition to @code{__builtin_tbegin}, a loop for transient failures
+is generated.  If tbegin returns a condition code of 2, the transaction
+is retried as often as specified in the second argument.  The
+perform processor assist instruction is used to tell the CPU about the
+number of failures so far.
+@end deftypefn
+
+@deftypefn {Built-in Function} int __builtin_tbegin_retry_nofloat (void*, int)
+Same as @code{__builtin_tbegin_retry} but without FPR saves and
+restores.  Using this variant in code making use of FPRs will leave
+the FPRs in an undefined state when entering the transaction abort
+handler code.
+@end deftypefn
 
-vector long long vec_max (vector long long, vector long long);
-vector unsigned long long vec_max (vector unsigned long long,
-                                   vector unsigned long long);
+@deftypefn {Built-in Function} void __builtin_tbeginc (void)
+Generates the @code{tbeginc} machine instruction starting a constrained
+hardware transaction.  The second operand is set to @code{0xff08}.
+@end deftypefn
 
-vector signed int vec_mergee (vector signed int, vector signed int);
-vector unsigned int vec_mergee (vector unsigned int, vector unsigned int);
-vector bool int vec_mergee (vector bool int, vector bool int);
+@deftypefn {Built-in Function} int __builtin_tend (void)
+Generates the @code{tend} machine instruction finishing a transaction
+and making the changes visible to other threads.  The condition code
+generated by tend is returned as integer value.
+@end deftypefn
 
-vector signed int vec_mergeo (vector signed int, vector signed int);
-vector unsigned int vec_mergeo (vector unsigned int, vector unsigned int);
-vector bool int vec_mergeo (vector bool int, vector bool int);
+@deftypefn {Built-in Function} void __builtin_tabort (int)
+Generates the @code{tabort} machine instruction with the specified
+abort code.  Abort codes from 0 through 255 are reserved and will
+result in an error message.
+@end deftypefn
 
-vector long long vec_min (vector long long, vector long long);
-vector unsigned long long vec_min (vector unsigned long long,
-                                   vector unsigned long long);
+@deftypefn {Built-in Function} void __builtin_tx_assist (int)
+Generates the @code{ppa rX,rY,1} machine instruction, where the
+integer parameter is loaded into rX and a value of zero is loaded into
+rY.  The integer parameter specifies the number of times the
+transaction has aborted.
+@end deftypefn
 
-vector long long vec_nand (vector long long, vector long long);
-vector long long vec_nand (vector bool long long, vector long long);
-vector long long vec_nand (vector long long, vector bool long long);
-vector unsigned long long vec_nand (vector unsigned long long,
-                                    vector unsigned long long);
-vector unsigned long long vec_nand (vector bool long long,
-                                   vector unsigned long long);
-vector unsigned long long vec_nand (vector unsigned long long,
-                                    vector bool long long);
-vector int vec_nand (vector int, vector int);
-vector int vec_nand (vector bool int, vector int);
-vector int vec_nand (vector int, vector bool int);
-vector unsigned int vec_nand (vector unsigned int, vector unsigned int);
-vector unsigned int vec_nand (vector bool unsigned int,
-                              vector unsigned int);
-vector unsigned int vec_nand (vector unsigned int,
-                              vector bool unsigned int);
-vector short vec_nand (vector short, vector short);
-vector short vec_nand (vector bool short, vector short);
-vector short vec_nand (vector short, vector bool short);
-vector unsigned short vec_nand (vector unsigned short, vector unsigned short);
-vector unsigned short vec_nand (vector bool unsigned short,
-                                vector unsigned short);
-vector unsigned short vec_nand (vector unsigned short,
-                                vector bool unsigned short);
-vector signed char vec_nand (vector signed char, vector signed char);
-vector signed char vec_nand (vector bool signed char, vector signed char);
-vector signed char vec_nand (vector signed char, vector bool signed char);
-vector unsigned char vec_nand (vector unsigned char, vector unsigned char);
-vector unsigned char vec_nand (vector bool unsigned char, vector unsigned char);
-vector unsigned char vec_nand (vector unsigned char, vector bool unsigned char);
+@deftypefn {Built-in Function} int __builtin_tx_nesting_depth (void)
+Generates the @code{etnd} machine instruction.  The current nesting
+depth is returned as integer value.  For a nesting depth of 0 the code
+is not executed as part of a transaction.
+@end deftypefn
 
-vector long long vec_orc (vector long long, vector long long);
-vector long long vec_orc (vector bool long long, vector long long);
-vector long long vec_orc (vector long long, vector bool long long);
-vector unsigned long long vec_orc (vector unsigned long long,
-                                   vector unsigned long long);
-vector unsigned long long vec_orc (vector bool long long,
-                                   vector unsigned long long);
-vector unsigned long long vec_orc (vector unsigned long long,
-                                   vector bool long long);
-vector int vec_orc (vector int, vector int);
-vector int vec_orc (vector bool int, vector int);
-vector int vec_orc (vector int, vector bool int);
-vector unsigned int vec_orc (vector unsigned int, vector unsigned int);
-vector unsigned int vec_orc (vector bool unsigned int,
-                             vector unsigned int);
-vector unsigned int vec_orc (vector unsigned int,
-                             vector bool unsigned int);
-vector short vec_orc (vector short, vector short);
-vector short vec_orc (vector bool short, vector short);
-vector short vec_orc (vector short, vector bool short);
-vector unsigned short vec_orc (vector unsigned short, vector unsigned short);
-vector unsigned short vec_orc (vector bool unsigned short,
-                               vector unsigned short);
-vector unsigned short vec_orc (vector unsigned short,
-                               vector bool unsigned short);
-vector signed char vec_orc (vector signed char, vector signed char);
-vector signed char vec_orc (vector bool signed char, vector signed char);
-vector signed char vec_orc (vector signed char, vector bool signed char);
-vector unsigned char vec_orc (vector unsigned char, vector unsigned char);
-vector unsigned char vec_orc (vector bool unsigned char, vector unsigned char);
-vector unsigned char vec_orc (vector unsigned char, vector bool unsigned char);
+@deftypefn {Built-in Function} void __builtin_non_tx_store (uint64_t *, uint64_t)
 
-vector int vec_pack (vector long long, vector long long);
-vector unsigned int vec_pack (vector unsigned long long,
-                              vector unsigned long long);
-vector bool int vec_pack (vector bool long long, vector bool long long);
+Generates the @code{ntstg} machine instruction.  The second argument
+is written to the first argument's location.  The store operation will
+not be rolled back in case of a transaction abort.
+@end deftypefn
 
-vector int vec_packs (vector long long, vector long long);
-vector unsigned int vec_packs (vector unsigned long long,
-                               vector unsigned long long);
+@node SH Built-in Functions
+@subsection SH Built-in Functions
+The following built-in functions are supported on the SH1, SH2, SH3 and SH4
+families of processors:
 
-vector unsigned int vec_packsu (vector long long, vector long long);
-vector unsigned int vec_packsu (vector unsigned long long,
-                                vector unsigned long long);
+@deftypefn {Built-in Function} {void} __builtin_set_thread_pointer (void *@var{ptr})
+Sets the @samp{GBR} register to the specified value @var{ptr}.  This is usually
+used by system code that manages threads and execution contexts.  The compiler
+normally does not generate code that modifies the contents of @samp{GBR} and
+thus the value is preserved across function calls.  Changing the @samp{GBR}
+value in user code must be done with caution, since the compiler might use
+@samp{GBR} in order to access thread-local variables.
 
-vector long long vec_rl (vector long long,
-                         vector unsigned long long);
-vector long long vec_rl (vector unsigned long long,
-                         vector unsigned long long);
+@end deftypefn
 
-vector long long vec_sl (vector long long, vector unsigned long long);
-vector long long vec_sl (vector unsigned long long,
-                         vector unsigned long long);
+@deftypefn {Built-in Function} {void *} __builtin_thread_pointer (void)
+Returns the value that is currently set in the @samp{GBR} register.
+Memory loads and stores that use the thread pointer as a base address are
+turned into @samp{GBR} based displacement loads and stores, if possible.
+For example:
+@smallexample
+struct my_tcb
+@{
+   int a, b, c, d, e;
+@};
 
-vector long long vec_sr (vector long long, vector unsigned long long);
-vector unsigned long long char vec_sr (vector unsigned long long,
-                                       vector unsigned long long);
+int get_tcb_value (void)
+@{
+  // Generate @samp{mov.l @@(8,gbr),r0} instruction
+  return ((struct my_tcb *) __builtin_thread_pointer ())->c;
+@}
 
-vector long long vec_sra (vector long long, vector unsigned long long);
-vector unsigned long long vec_sra (vector unsigned long long,
-                                   vector unsigned long long);
+@end smallexample
+@end deftypefn
 
-vector long long vec_sub (vector long long, vector long long);
-vector unsigned long long vec_sub (vector unsigned long long,
-                                   vector unsigned long long);
+@deftypefn {Built-in Function} {unsigned int} __builtin_sh_get_fpscr (void)
+Returns the value that is currently set in the @samp{FPSCR} register.
+@end deftypefn
 
-vector long long vec_unpackh (vector int);
-vector unsigned long long vec_unpackh (vector unsigned int);
+@deftypefn {Built-in Function} {void} __builtin_sh_set_fpscr (unsigned int @var{val})
+Sets the @samp{FPSCR} register to the specified value @var{val}, while
+preserving the current values of the FR, SZ and PR bits.
+@end deftypefn
 
-vector long long vec_unpackl (vector int);
-vector unsigned long long vec_unpackl (vector unsigned int);
+@node SPARC VIS Built-in Functions
+@subsection SPARC VIS Built-in Functions
 
-vector long long vec_vaddudm (vector long long, vector long long);
-vector long long vec_vaddudm (vector bool long long, vector long long);
-vector long long vec_vaddudm (vector long long, vector bool long long);
-vector unsigned long long vec_vaddudm (vector unsigned long long,
-                                       vector unsigned long long);
-vector unsigned long long vec_vaddudm (vector bool unsigned long long,
-                                       vector unsigned long long);
-vector unsigned long long vec_vaddudm (vector unsigned long long,
-                                       vector bool unsigned long long);
+GCC supports SIMD operations on the SPARC using both the generic vector
+extensions (@pxref{Vector Extensions}) and built-in functions for
+the SPARC Visual Instruction Set (VIS).  When you use the @option{-mvis}
+switch, the VIS extension is exposed as the following built-in functions:
 
-vector long long vec_vbpermq (vector signed char, vector signed char);
-vector long long vec_vbpermq (vector unsigned char, vector unsigned char);
+@smallexample
+typedef int v1si __attribute__ ((vector_size (4)));
+typedef int v2si __attribute__ ((vector_size (8)));
+typedef short v4hi __attribute__ ((vector_size (8)));
+typedef short v2hi __attribute__ ((vector_size (4)));
+typedef unsigned char v8qi __attribute__ ((vector_size (8)));
+typedef unsigned char v4qi __attribute__ ((vector_size (4)));
 
-vector long long vec_cntlz (vector long long);
-vector unsigned long long vec_cntlz (vector unsigned long long);
-vector int vec_cntlz (vector int);
-vector unsigned int vec_cntlz (vector int);
-vector short vec_cntlz (vector short);
-vector unsigned short vec_cntlz (vector unsigned short);
-vector signed char vec_cntlz (vector signed char);
-vector unsigned char vec_cntlz (vector unsigned char);
+void __builtin_vis_write_gsr (int64_t);
+int64_t __builtin_vis_read_gsr (void);
 
-vector long long vec_vclz (vector long long);
-vector unsigned long long vec_vclz (vector unsigned long long);
-vector int vec_vclz (vector int);
-vector unsigned int vec_vclz (vector int);
-vector short vec_vclz (vector short);
-vector unsigned short vec_vclz (vector unsigned short);
-vector signed char vec_vclz (vector signed char);
-vector unsigned char vec_vclz (vector unsigned char);
+void * __builtin_vis_alignaddr (void *, long);
+void * __builtin_vis_alignaddrl (void *, long);
+int64_t __builtin_vis_faligndatadi (int64_t, int64_t);
+v2si __builtin_vis_faligndatav2si (v2si, v2si);
+v4hi __builtin_vis_faligndatav4hi (v4hi, v4hi);
+v8qi __builtin_vis_faligndatav8qi (v8qi, v8qi);
 
-vector signed char vec_vclzb (vector signed char);
-vector unsigned char vec_vclzb (vector unsigned char);
+v4hi __builtin_vis_fexpand (v4qi);
 
-vector long long vec_vclzd (vector long long);
-vector unsigned long long vec_vclzd (vector unsigned long long);
+v4hi __builtin_vis_fmul8x16 (v4qi, v4hi);
+v4hi __builtin_vis_fmul8x16au (v4qi, v2hi);
+v4hi __builtin_vis_fmul8x16al (v4qi, v2hi);
+v4hi __builtin_vis_fmul8sux16 (v8qi, v4hi);
+v4hi __builtin_vis_fmul8ulx16 (v8qi, v4hi);
+v2si __builtin_vis_fmuld8sux16 (v4qi, v2hi);
+v2si __builtin_vis_fmuld8ulx16 (v4qi, v2hi);
+
+v4qi __builtin_vis_fpack16 (v4hi);
+v8qi __builtin_vis_fpack32 (v2si, v8qi);
+v2hi __builtin_vis_fpackfix (v2si);
+v8qi __builtin_vis_fpmerge (v4qi, v4qi);
+
+int64_t __builtin_vis_pdist (v8qi, v8qi, int64_t);
+
+long __builtin_vis_edge8 (void *, void *);
+long __builtin_vis_edge8l (void *, void *);
+long __builtin_vis_edge16 (void *, void *);
+long __builtin_vis_edge16l (void *, void *);
+long __builtin_vis_edge32 (void *, void *);
+long __builtin_vis_edge32l (void *, void *);
+
+long __builtin_vis_fcmple16 (v4hi, v4hi);
+long __builtin_vis_fcmple32 (v2si, v2si);
+long __builtin_vis_fcmpne16 (v4hi, v4hi);
+long __builtin_vis_fcmpne32 (v2si, v2si);
+long __builtin_vis_fcmpgt16 (v4hi, v4hi);
+long __builtin_vis_fcmpgt32 (v2si, v2si);
+long __builtin_vis_fcmpeq16 (v4hi, v4hi);
+long __builtin_vis_fcmpeq32 (v2si, v2si);
+
+v4hi __builtin_vis_fpadd16 (v4hi, v4hi);
+v2hi __builtin_vis_fpadd16s (v2hi, v2hi);
+v2si __builtin_vis_fpadd32 (v2si, v2si);
+v1si __builtin_vis_fpadd32s (v1si, v1si);
+v4hi __builtin_vis_fpsub16 (v4hi, v4hi);
+v2hi __builtin_vis_fpsub16s (v2hi, v2hi);
+v2si __builtin_vis_fpsub32 (v2si, v2si);
+v1si __builtin_vis_fpsub32s (v1si, v1si);
 
-vector short vec_vclzh (vector short);
-vector unsigned short vec_vclzh (vector unsigned short);
+long __builtin_vis_array8 (long, long);
+long __builtin_vis_array16 (long, long);
+long __builtin_vis_array32 (long, long);
+@end smallexample
 
-vector int vec_vclzw (vector int);
-vector unsigned int vec_vclzw (vector int);
+When you use the @option{-mvis2} switch, the VIS version 2.0 built-in
+functions also become available:
 
-vector signed char vec_vgbbd (vector signed char);
-vector unsigned char vec_vgbbd (vector unsigned char);
+@smallexample
+long __builtin_vis_bmask (long, long);
+int64_t __builtin_vis_bshuffledi (int64_t, int64_t);
+v2si __builtin_vis_bshufflev2si (v2si, v2si);
+v4hi __builtin_vis_bshufflev4hi (v4hi, v4hi);
+v8qi __builtin_vis_bshufflev8qi (v8qi, v8qi);
 
-vector long long vec_vmaxsd (vector long long, vector long long);
+long __builtin_vis_edge8n (void *, void *);
+long __builtin_vis_edge8ln (void *, void *);
+long __builtin_vis_edge16n (void *, void *);
+long __builtin_vis_edge16ln (void *, void *);
+long __builtin_vis_edge32n (void *, void *);
+long __builtin_vis_edge32ln (void *, void *);
+@end smallexample
 
-vector unsigned long long vec_vmaxud (vector unsigned long long,
-                                      unsigned vector long long);
+When you use the @option{-mvis3} switch, the VIS version 3.0 built-in
+functions also become available:
 
-vector long long vec_vminsd (vector long long, vector long long);
+@smallexample
+void __builtin_vis_cmask8 (long);
+void __builtin_vis_cmask16 (long);
+void __builtin_vis_cmask32 (long);
 
-vector unsigned long long vec_vminud (vector long long,
-                                      vector long long);
+v4hi __builtin_vis_fchksm16 (v4hi, v4hi);
 
-vector int vec_vpksdss (vector long long, vector long long);
-vector unsigned int vec_vpksdss (vector long long, vector long long);
+v4hi __builtin_vis_fsll16 (v4hi, v4hi);
+v4hi __builtin_vis_fslas16 (v4hi, v4hi);
+v4hi __builtin_vis_fsrl16 (v4hi, v4hi);
+v4hi __builtin_vis_fsra16 (v4hi, v4hi);
+v2si __builtin_vis_fsll32 (v2si, v2si);
+v2si __builtin_vis_fslas32 (v2si, v2si);
+v2si __builtin_vis_fsrl32 (v2si, v2si);
+v2si __builtin_vis_fsra32 (v2si, v2si);
 
-vector unsigned int vec_vpkudus (vector unsigned long long,
-                                 vector unsigned long long);
+long __builtin_vis_pdistn (v8qi, v8qi);
 
-vector int vec_vpkudum (vector long long, vector long long);
-vector unsigned int vec_vpkudum (vector unsigned long long,
-                                 vector unsigned long long);
-vector bool int vec_vpkudum (vector bool long long, vector bool long long);
+v4hi __builtin_vis_fmean16 (v4hi, v4hi);
 
-vector long long vec_vpopcnt (vector long long);
-vector unsigned long long vec_vpopcnt (vector unsigned long long);
-vector int vec_vpopcnt (vector int);
-vector unsigned int vec_vpopcnt (vector int);
-vector short vec_vpopcnt (vector short);
-vector unsigned short vec_vpopcnt (vector unsigned short);
-vector signed char vec_vpopcnt (vector signed char);
-vector unsigned char vec_vpopcnt (vector unsigned char);
+int64_t __builtin_vis_fpadd64 (int64_t, int64_t);
+int64_t __builtin_vis_fpsub64 (int64_t, int64_t);
 
-vector signed char vec_vpopcntb (vector signed char);
-vector unsigned char vec_vpopcntb (vector unsigned char);
+v4hi __builtin_vis_fpadds16 (v4hi, v4hi);
+v2hi __builtin_vis_fpadds16s (v2hi, v2hi);
+v4hi __builtin_vis_fpsubs16 (v4hi, v4hi);
+v2hi __builtin_vis_fpsubs16s (v2hi, v2hi);
+v2si __builtin_vis_fpadds32 (v2si, v2si);
+v1si __builtin_vis_fpadds32s (v1si, v1si);
+v2si __builtin_vis_fpsubs32 (v2si, v2si);
+v1si __builtin_vis_fpsubs32s (v1si, v1si);
 
-vector long long vec_vpopcntd (vector long long);
-vector unsigned long long vec_vpopcntd (vector unsigned long long);
+long __builtin_vis_fucmple8 (v8qi, v8qi);
+long __builtin_vis_fucmpne8 (v8qi, v8qi);
+long __builtin_vis_fucmpgt8 (v8qi, v8qi);
+long __builtin_vis_fucmpeq8 (v8qi, v8qi);
 
-vector short vec_vpopcnth (vector short);
-vector unsigned short vec_vpopcnth (vector unsigned short);
+float __builtin_vis_fhadds (float, float);
+double __builtin_vis_fhaddd (double, double);
+float __builtin_vis_fhsubs (float, float);
+double __builtin_vis_fhsubd (double, double);
+float __builtin_vis_fnhadds (float, float);
+double __builtin_vis_fnhaddd (double, double);
 
-vector int vec_vpopcntw (vector int);
-vector unsigned int vec_vpopcntw (vector int);
+int64_t __builtin_vis_umulxhi (int64_t, int64_t);
+int64_t __builtin_vis_xmulx (int64_t, int64_t);
+int64_t __builtin_vis_xmulxhi (int64_t, int64_t);
+@end smallexample
 
-vector long long vec_vrld (vector long long, vector unsigned long long);
-vector unsigned long long vec_vrld (vector unsigned long long,
-                                    vector unsigned long long);
+@node SPU Built-in Functions
+@subsection SPU Built-in Functions
 
-vector long long vec_vsld (vector long long, vector unsigned long long);
-vector long long vec_vsld (vector unsigned long long,
-                           vector unsigned long long);
+GCC provides extensions for the SPU processor as described in the
+Sony/Toshiba/IBM SPU Language Extensions Specification, which can be
+found at @uref{http://cell.scei.co.jp/} or
+@uref{http://www.ibm.com/developerworks/power/cell/}.  GCC's
+implementation differs in several ways.
 
-vector long long vec_vsrad (vector long long, vector unsigned long long);
-vector unsigned long long vec_vsrad (vector unsigned long long,
-                                     vector unsigned long long);
+@itemize @bullet
 
-vector long long vec_vsrd (vector long long, vector unsigned long long);
-vector unsigned long long char vec_vsrd (vector unsigned long long,
-                                         vector unsigned long long);
+@item
+The optional extension of specifying vector constants in parentheses is
+not supported.
 
-vector long long vec_vsubudm (vector long long, vector long long);
-vector long long vec_vsubudm (vector bool long long, vector long long);
-vector long long vec_vsubudm (vector long long, vector bool long long);
-vector unsigned long long vec_vsubudm (vector unsigned long long,
-                                       vector unsigned long long);
-vector unsigned long long vec_vsubudm (vector bool long long,
-                                       vector unsigned long long);
-vector unsigned long long vec_vsubudm (vector unsigned long long,
-                                       vector bool long long);
+@item
+A vector initializer requires no cast if the vector constant is of the
+same type as the variable it is initializing.
 
-vector long long vec_vupkhsw (vector int);
-vector unsigned long long vec_vupkhsw (vector unsigned int);
+@item
+If @code{signed} or @code{unsigned} is omitted, the signedness of the
+vector type is the default signedness of the base type.  The default
+varies depending on the operating system, so a portable program should
+always specify the signedness.
 
-vector long long vec_vupklsw (vector int);
-vector unsigned long long vec_vupklsw (vector int);
-@end smallexample
+@item
+By default, the keyword @code{__vector} is added.  The macro
+@code{vector} is defined in @code{<spu_intrinsics.h>} and can be
+undefined.
 
-If the ISA 2.07 additions to the vector/scalar (power8-vector)
-instruction set is available, the following additional functions are
-available for 64-bit targets.  New vector types
-(@var{vector __int128_t} and @var{vector __uint128_t}) are available
-to hold the @var{__int128_t} and @var{__uint128_t} types to use these
-builtins.
+@item
+GCC allows using a @code{typedef} name as the type specifier for a
+vector type.
 
-The normal vector extract, and set operations work on
-@var{vector __int128_t} and @var{vector __uint128_t} types,
-but the index value must be 0.
+@item
+For C, overloaded functions are implemented with macros so the following
+does not work:
 
 @smallexample
-vector __int128_t vec_vaddcuq (vector __int128_t, vector __int128_t);
-vector __uint128_t vec_vaddcuq (vector __uint128_t, vector __uint128_t);
-
-vector __int128_t vec_vadduqm (vector __int128_t, vector __int128_t);
-vector __uint128_t vec_vadduqm (vector __uint128_t, vector __uint128_t);
+  spu_add ((vector signed int)@{1, 2, 3, 4@}, foo);
+@end smallexample
 
-vector __int128_t vec_vaddecuq (vector __int128_t, vector __int128_t,
-                                vector __int128_t);
-vector __uint128_t vec_vaddecuq (vector __uint128_t, vector __uint128_t, 
-                                 vector __uint128_t);
+@noindent
+Since @code{spu_add} is a macro, the vector constant in the example
+is treated as four separate arguments.  Wrap the entire argument in
+parentheses for this to work.
 
-vector __int128_t vec_vaddeuqm (vector __int128_t, vector __int128_t,
-                                vector __int128_t);
-vector __uint128_t vec_vaddeuqm (vector __uint128_t, vector __uint128_t, 
-                                 vector __uint128_t);
+@item
+The extended version of @code{__builtin_expect} is not supported.
 
-vector __int128_t vec_vsubecuq (vector __int128_t, vector __int128_t,
-                                vector __int128_t);
-vector __uint128_t vec_vsubecuq (vector __uint128_t, vector __uint128_t, 
-                                 vector __uint128_t);
+@end itemize
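+
+The macro pitfall described in the list above is ordinary C preprocessor
+behavior and can be reproduced on any target.  The following is a minimal
+sketch, not SPU code: @code{vec4}, @code{add4_fn}, @code{ADD4}, and
+@code{sum_lanes} are hypothetical names standing in for a vector type and
+a macro-implemented intrinsic.
+

```c
/* Hypothetical stand-ins for illustration only; not the SPU API.  */
typedef struct { int v[4]; } vec4;

static vec4 add4_fn (vec4 a, vec4 b)
{
  vec4 r;
  for (int i = 0; i < 4; i++)
    r.v[i] = a.v[i] + b.v[i];
  return r;
}

/* A two-parameter function-like macro, similar in shape to an
   overloaded intrinsic that is implemented as a macro.  */
#define ADD4(a, b) add4_fn ((a), (b))

static int sum_lanes (void)
{
  vec4 foo = { { 10, 20, 30, 40 } };

  /* Writing ADD4 ((vec4){ { 1, 2, 3, 4 } }, foo) does not compile:
     braces do not protect commas from the preprocessor, so the macro
     sees five arguments.  The extra parentheses make the compound
     literal a single macro argument.  */
  vec4 r = ADD4 (((vec4){ { 1, 2, 3, 4 } }), foo);

  return r.v[0] + r.v[1] + r.v[2] + r.v[3];
}
```

+
+Without the extra parentheses, the preprocessor splits the compound
+literal at its commas and rejects the call for passing too many macro
+arguments.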
 
-vector __int128_t vec_vsubeuqm (vector __int128_t, vector __int128_t,
-                                vector __int128_t);
-vector __uint128_t vec_vsubeuqm (vector __uint128_t, vector __uint128_t,
-                                 vector __uint128_t);
+@emph{Note:} Only the interface described in the aforementioned
+specification is supported.  Internally, GCC uses built-in functions to
+implement the required functionality, but these are not supported and
+are subject to change without notice.
 
-vector __int128_t vec_vsubcuq (vector __int128_t, vector __int128_t);
-vector __uint128_t vec_vsubcuq (vector __uint128_t, vector __uint128_t);
+@node TI C6X Built-in Functions
+@subsection TI C6X Built-in Functions
 
-__int128_t vec_vsubuqm (__int128_t, __int128_t);
-__uint128_t vec_vsubuqm (__uint128_t, __uint128_t);
+GCC provides intrinsics to access certain instructions of the TI C6X
+processors.  These intrinsics, listed below, are available after
+inclusion of the @code{c6x_intrinsics.h} header file.  They map directly
+to C6X instructions.
 
-vector __int128_t __builtin_bcdadd (vector __int128_t, vector__int128_t);
-int __builtin_bcdadd_lt (vector __int128_t, vector__int128_t);
-int __builtin_bcdadd_eq (vector __int128_t, vector__int128_t);
-int __builtin_bcdadd_gt (vector __int128_t, vector__int128_t);
-int __builtin_bcdadd_ov (vector __int128_t, vector__int128_t);
-vector __int128_t bcdsub (vector __int128_t, vector__int128_t);
-int __builtin_bcdsub_lt (vector __int128_t, vector__int128_t);
-int __builtin_bcdsub_eq (vector __int128_t, vector__int128_t);
-int __builtin_bcdsub_gt (vector __int128_t, vector__int128_t);
-int __builtin_bcdsub_ov (vector __int128_t, vector__int128_t);
-@end smallexample
+@smallexample
 
-If the cryptographic instructions are enabled (@option{-mcrypto} or
-@option{-mcpu=power8}), the following builtins are enabled.
+int _sadd (int, int)
+int _ssub (int, int)
+int _sadd2 (int, int)
+int _ssub2 (int, int)
+long long _mpy2 (int, int)
+long long _smpy2 (int, int)
+int _add4 (int, int)
+int _sub4 (int, int)
+int _saddu4 (int, int)
 
-@smallexample
-vector unsigned long long __builtin_crypto_vsbox (vector unsigned long long);
+int _smpy (int, int)
+int _smpyh (int, int)
+int _smpyhl (int, int)
+int _smpylh (int, int)
 
-vector unsigned long long __builtin_crypto_vcipher (vector unsigned long long,
-                                                    vector unsigned long long);
+int _sshl (int, int)
+int _subc (int, int)
 
-vector unsigned long long __builtin_crypto_vcipherlast
-                                     (vector unsigned long long,
-                                      vector unsigned long long);
+int _avg2 (int, int)
+int _avgu4 (int, int)
 
-vector unsigned long long __builtin_crypto_vncipher (vector unsigned long long,
-                                                     vector unsigned long long);
+int _clrr (int, int)
+int _extr (int, int)
+int _extru (int, int)
+int _abs (int)
+int _abs2 (int)
 
-vector unsigned long long __builtin_crypto_vncipherlast
-                                     (vector unsigned long long,
-                                      vector unsigned long long);
+@end smallexample
 
-vector unsigned char __builtin_crypto_vpermxor (vector unsigned char,
-                                                vector unsigned char,
-                                                vector unsigned char);
+@node TILE-Gx Built-in Functions
+@subsection TILE-Gx Built-in Functions
 
-vector unsigned short __builtin_crypto_vpermxor (vector unsigned short,
-                                                 vector unsigned short,
-                                                 vector unsigned short);
+GCC provides intrinsics to access every instruction of the TILE-Gx
+processor.  The intrinsics are of the form:
 
-vector unsigned int __builtin_crypto_vpermxor (vector unsigned int,
-                                               vector unsigned int,
-                                               vector unsigned int);
+@smallexample
 
-vector unsigned long long __builtin_crypto_vpermxor (vector unsigned long long,
-                                                     vector unsigned long long,
-                                                     vector unsigned long long);
+unsigned long long __insn_@var{op} (...)
 
-vector unsigned char __builtin_crypto_vpmsumb (vector unsigned char,
-                                               vector unsigned char);
+@end smallexample
 
-vector unsigned short __builtin_crypto_vpmsumb (vector unsigned short,
-                                                vector unsigned short);
+@noindent
+where @var{op} is the name of the instruction.  Refer to the ISA manual
+for the complete list of instructions.
 
-vector unsigned int __builtin_crypto_vpmsumb (vector unsigned int,
-                                              vector unsigned int);
+GCC also provides intrinsics to directly access the network registers.
+The intrinsics are:
 
-vector unsigned long long __builtin_crypto_vpmsumb (vector unsigned long long,
-                                                    vector unsigned long long);
+@smallexample
 
-vector unsigned long long __builtin_crypto_vshasigmad
-                               (vector unsigned long long, int, int);
+unsigned long long __tile_idn0_receive (void)
+unsigned long long __tile_idn1_receive (void)
+unsigned long long __tile_udn0_receive (void)
+unsigned long long __tile_udn1_receive (void)
+unsigned long long __tile_udn2_receive (void)
+unsigned long long __tile_udn3_receive (void)
+void __tile_idn_send (unsigned long long)
+void __tile_udn_send (unsigned long long)
 
-vector unsigned int __builtin_crypto_vshasigmaw (vector unsigned int,
-                                                 int, int);
 @end smallexample
 
-The second argument to the @var{__builtin_crypto_vshasigmad} and
-@var{__builtin_crypto_vshasigmaw} builtin functions must be a constant
-integer that is 0 or 1.  The third argument to these builtin functions
-must be a constant integer in the range of 0 to 15.
-
-@node PowerPC Hardware Transactional Memory Built-in Functions
-@subsection PowerPC Hardware Transactional Memory Built-in Functions
-GCC provides two interfaces for accessing the Hardware Transactional
-Memory (HTM) instructions available on some of the PowerPC family
-of prcoessors (eg, POWER8).  The two interfaces come in a low level
-interface, consisting of built-in functions specific to PowerPC and a
-higher level interface consisting of inline functions that are common
-between PowerPC and S/390.
-
-@subsubsection PowerPC HTM Low Level Built-in Functions
+The intrinsic @code{void __tile_network_barrier (void)} is used to
+guarantee that no network operations before it are reordered with
+those after it.
 
-The following low level built-in functions are available with
-@option{-mhtm} or @option{-mcpu=CPU} where CPU is `power8' or later.
-They all generate the machine instruction that is part of the name.
+@node TILEPro Built-in Functions
+@subsection TILEPro Built-in Functions
 
-The HTM built-ins return true or false depending on their success and
-their arguments match exactly the type and order of the associated
-hardware instruction's operands.  Refer to the ISA manual for a
-description of each instruction's operands.
+GCC provides intrinsics to access every instruction of the TILEPro
+processor.  The intrinsics are of the form:
 
 @smallexample
-unsigned int __builtin_tbegin (unsigned int)
-unsigned int __builtin_tend (unsigned int)
 
-unsigned int __builtin_tabort (unsigned int)
-unsigned int __builtin_tabortdc (unsigned int, unsigned int, unsigned int)
-unsigned int __builtin_tabortdci (unsigned int, unsigned int, int)
-unsigned int __builtin_tabortwc (unsigned int, unsigned int, unsigned int)
-unsigned int __builtin_tabortwci (unsigned int, unsigned int, int)
+unsigned __insn_@var{op} (...)
 
-unsigned int __builtin_tcheck (unsigned int)
-unsigned int __builtin_treclaim (unsigned int)
-unsigned int __builtin_trechkpt (void)
-unsigned int __builtin_tsr (unsigned int)
 @end smallexample
 
-In addition to the above HTM built-ins, we have added built-ins for
-some common extended mnemonics of the HTM instructions:
-
-@smallexample
-unsigned int __builtin_tendall (void)
-unsigned int __builtin_tresume (void)
-unsigned int __builtin_tsuspend (void)
-@end smallexample
+@noindent
+where @var{op} is the name of the instruction.  Refer to the ISA manual
+for the complete list of instructions.
 
-The following set of built-in functions are available to gain access
-to the HTM specific special purpose registers.
+GCC also provides intrinsics to directly access the network registers.
+The intrinsics are:
 
 @smallexample
-unsigned long __builtin_get_texasr (void)
-unsigned long __builtin_get_texasru (void)
-unsigned long __builtin_get_tfhar (void)
-unsigned long __builtin_get_tfiar (void)
 
-void __builtin_set_texasr (unsigned long);
-void __builtin_set_texasru (unsigned long);
-void __builtin_set_tfhar (unsigned long);
-void __builtin_set_tfiar (unsigned long);
+unsigned __tile_idn0_receive (void)
+unsigned __tile_idn1_receive (void)
+unsigned __tile_sn_receive (void)
+unsigned __tile_udn0_receive (void)
+unsigned __tile_udn1_receive (void)
+unsigned __tile_udn2_receive (void)
+unsigned __tile_udn3_receive (void)
+void __tile_idn_send (unsigned)
+void __tile_sn_send (unsigned)
+void __tile_udn_send (unsigned)
+
 @end smallexample
 
-Example usage of these low level built-in functions may look like:
+The intrinsic @code{void __tile_network_barrier (void)} is used to
+guarantee that no network operations before it are reordered with
+those after it.
 
-@smallexample
-#include <htmintrin.h>
+@node x86 Built-in Functions
+@subsection x86 Built-in Functions
 
-int num_retries = 10;
+These built-in functions are available for the x86-32 and x86-64 family
+of computers, depending on the command-line switches used.
 
-while (1)
-  @{
-    if (__builtin_tbegin (0))
-      @{
-        /* Transaction State Initiated.  */
-        if (is_locked (lock))
-          __builtin_tabort (0);
-        ... transaction code...
-        __builtin_tend (0);
-        break;
-      @}
-    else
-      @{
-        /* Transaction State Failed.  Use locks if the transaction
-           failure is "persistent" or we've tried too many times.  */
-        if (num_retries-- <= 0
-            || _TEXASRU_FAILURE_PERSISTENT (__builtin_get_texasru ()))
-          @{
-            acquire_lock (lock);
-            ... non transactional fallback path...
-            release_lock (lock);
-            break;
-          @}
-      @}
-  @}
-@end smallexample
+If you specify command-line switches such as @option{-msse},
+the compiler could use the extended instruction sets even if the built-ins
+are not used explicitly in the program.  For this reason, applications
+that perform run-time CPU detection must compile separate files for each
+supported architecture, using the appropriate flags.  In particular,
+the file containing the CPU detection code should be compiled without
+these options.
 
-One final built-in function has been added that returns the value of
-the 2-bit Transaction State field of the Machine Status Register (MSR)
-as stored in @code{CR0}.
+The following machine modes are available for use with MMX built-in functions
+(@pxref{Vector Extensions}): @code{V2SI} for a vector of two 32-bit integers,
+@code{V4HI} for a vector of four 16-bit integers, and @code{V8QI} for a
+vector of eight 8-bit integers.  Some of the built-in functions operate on
+MMX registers as a whole 64-bit entity; these use @code{V1DI} as their mode.
 
-@smallexample
-unsigned long __builtin_ttest (void)
-@end smallexample
+If 3DNow!@: extensions are enabled, @code{V2SF} is used as a mode for a vector
+of two 32-bit floating-point values.
 
-This built-in can be used to determine the current transaction state
-using the following code example:
+If SSE extensions are enabled, @code{V4SF} is used for a vector of four 32-bit
+floating-point values.  Some instructions use a vector of four 32-bit
+integers; these use @code{V4SI}.  Finally, some instructions operate on an
+entire vector register, interpreting it as a 128-bit integer; these use mode
+@code{TI}.
 
-@smallexample
-#include <htmintrin.h>
+In 64-bit mode, the x86-64 family of processors uses additional built-in
+functions for efficient use of @code{TF} (@code{__float128}) 128-bit
+floating-point and @code{TC} 128-bit complex floating-point values.
 
-unsigned char tx_state = _HTM_STATE (__builtin_ttest ());
+The following floating-point built-in functions are available in 64-bit
+mode.  All of them implement the function that is part of the name.
 
-if (tx_state == _HTM_TRANSACTIONAL)
-  @{
-    /* Code to use in transactional state.  */
-  @}
-else if (tx_state == _HTM_NONTRANSACTIONAL)
-  @{
-    /* Code to use in non-transactional state.  */
-  @}
-else if (tx_state == _HTM_SUSPENDED)
-  @{
-    /* Code to use in transaction suspended state.  */
-  @}
+@smallexample
+__float128 __builtin_fabsq (__float128)
+__float128 __builtin_copysignq (__float128, __float128)
 @end smallexample
 
-@subsubsection PowerPC HTM High Level Inline Functions
+The following built-in function is always available.
 
-The following high level HTM interface is made available by including
-@code{<htmxlintrin.h>} and using @option{-mhtm} or @option{-mcpu=CPU}
-where CPU is `power8' or later.  This interface is common between PowerPC
-and S/390, allowing users to write one HTM source implementation that
-can be compiled and executed on either system.
+@table @code
+@item void __builtin_ia32_pause (void)
+Generates the @code{pause} machine instruction with a compiler memory
+barrier.
+@end table
 
-@smallexample
-long __TM_simple_begin (void)
-long __TM_begin (void* const TM_buff)
-long __TM_end (void)
-void __TM_abort (void)
-void __TM_named_abort (unsigned char const code)
-void __TM_resume (void)
-void __TM_suspend (void)
+The following floating-point built-in functions are also available in
+64-bit mode.
 
-long __TM_is_user_abort (void* const TM_buff)
-long __TM_is_named_user_abort (void* const TM_buff, unsigned char *code)
-long __TM_is_illegal (void* const TM_buff)
-long __TM_is_footprint_exceeded (void* const TM_buff)
-long __TM_nesting_depth (void* const TM_buff)
-long __TM_is_nested_too_deep(void* const TM_buff)
-long __TM_is_conflict(void* const TM_buff)
-long __TM_is_failure_persistent(void* const TM_buff)
-long __TM_failure_address(void* const TM_buff)
-long long __TM_failure_code(void* const TM_buff)
-@end smallexample
+@table @code
+@item __float128 __builtin_infq (void)
+Similar to @code{__builtin_inf}, except the return type is @code{__float128}.
+@findex __builtin_infq
 
-Using these common set of HTM inline functions, we can create
-a more portable version of the HTM example in the previous
-section that will work on either PowerPC or S/390:
+@item __float128 __builtin_huge_valq (void)
+Similar to @code{__builtin_huge_val}, except the return type is @code{__float128}.
+@findex __builtin_huge_valq
+@end table
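+
+As a minimal sketch of how the @code{__float128} built-in functions listed
+above fit together, the following assumes GCC targeting x86-64.  The
+function name @code{float128_checks} is hypothetical, and the sketch uses
+casts instead of literal suffixes to keep the extensions it relies on
+to a minimum.
+

```c
/* Assumes GCC targeting x86-64, where __float128 and the "q"
   built-in functions are available.  float128_checks is a
   hypothetical name used only for this sketch.  */
static int float128_checks (void)
{
  __float128 inf = __builtin_infq ();
  __float128 huge = __builtin_huge_valq ();

  /* Take the magnitude of the first argument and the sign of the
     second, giving -2.5.  */
  __float128 neg = __builtin_copysignq ((__float128) 2.5,
                                        (__float128) -1.0);

  int ok = 1;
  ok &= (inf > (__float128) 1.0);
  /* The format has infinities, so the "huge value" is infinity.  */
  ok &= (huge == inf);
  ok &= (neg == (__float128) -2.5);
  ok &= (__builtin_fabsq (neg) == (__float128) 2.5);
  return ok;
}
```

+
+Each comparison is done in @code{__float128} arithmetic; the function
+returns 1 only if every check holds.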
+
+The following built-in functions are always available and can be used to
+check the target platform type.
+
+@deftypefn {Built-in Function} void __builtin_cpu_init (void)
+This function runs the CPU detection code to check the type of CPU and the
+features supported.  The CPU detection code is automatically executed in a
+very high priority constructor, so you need to call this function
+explicitly before @code{__builtin_cpu_is} or
+@code{__builtin_cpu_supports} only in code that is executed before any
+constructors are called.
 
+For example, this function has to be used in @code{ifunc} resolvers that
+check for CPU type using the built-in functions @code{__builtin_cpu_is}
+and @code{__builtin_cpu_supports}, or in constructors on targets that
+don't support constructor priority.
 @smallexample
-#include <htmxlintrin.h>
 
-int num_retries = 10;
-TM_buff_type TM_buff;
+static void (*resolve_memcpy (void)) (void)
+@{
+  // ifunc resolvers run before constructors, so call the init
+  // function explicitly.
+  __builtin_cpu_init ();
+  if (__builtin_cpu_supports ("ssse3"))
+    return ssse3_memcpy; // super fast memcpy with ssse3 instructions.
+  else
+    return default_memcpy;
+@}
 
-while (1)
-  @{
-    if (__TM_begin (TM_buff))
-      @{
-        /* Transaction State Initiated.  */
-        if (is_locked (lock))
-          __TM_abort ();
-        ... transaction code...
-        __TM_end ();
-        break;
-      @}
-    else
-      @{
-        /* Transaction State Failed.  Use locks if the transaction
-           failure is "persistent" or we've tried too many times.  */
-        if (num_retries-- <= 0
-            || __TM_is_failure_persistent (TM_buff))
-          @{
-            acquire_lock (lock);
-            ... non transactional fallback path...
-            release_lock (lock);
-            break;
-          @}
-      @}
-  @}
+void *memcpy (void *, const void *, size_t)
+     __attribute__ ((ifunc ("resolve_memcpy")));
 @end smallexample
 
-@node RX Built-in Functions
-@subsection RX Built-in Functions
-GCC supports some of the RX instructions which cannot be expressed in
-the C programming language via the use of built-in functions.  The
-following functions are supported:
-
-@deftypefn {Built-in Function}  void __builtin_rx_brk (void)
-Generates the @code{brk} machine instruction.
 @end deftypefn
 
-@deftypefn {Built-in Function}  void __builtin_rx_clrpsw (int)
-Generates the @code{clrpsw} machine instruction to clear the specified
-bit in the processor status word.
-@end deftypefn
+@deftypefn {Built-in Function} int __builtin_cpu_is (const char *@var{cpuname})
+This function returns a positive integer if the run-time CPU
+is of type @var{cpuname}
+and returns @code{0} otherwise. The following CPU names can be detected:
 
-@deftypefn {Built-in Function}  void __builtin_rx_int (int)
-Generates the @code{int} machine instruction to generate an interrupt
-with the specified value.
-@end deftypefn
+@table @samp
+@item intel
+Intel CPU.
 
-@deftypefn {Built-in Function}  void __builtin_rx_machi (int, int)
-Generates the @code{machi} machine instruction to add the result of
-multiplying the top 16 bits of the two arguments into the
-accumulator.
-@end deftypefn
+@item atom
+Intel Atom CPU.
 
-@deftypefn {Built-in Function}  void __builtin_rx_maclo (int, int)
-Generates the @code{maclo} machine instruction to add the result of
-multiplying the bottom 16 bits of the two arguments into the
-accumulator.
-@end deftypefn
+@item core2
+Intel Core 2 CPU.
 
-@deftypefn {Built-in Function}  void __builtin_rx_mulhi (int, int)
-Generates the @code{mulhi} machine instruction to place the result of
-multiplying the top 16 bits of the two arguments into the
-accumulator.
-@end deftypefn
+@item corei7
+Intel Core i7 CPU.
 
-@deftypefn {Built-in Function}  void __builtin_rx_mullo (int, int)
-Generates the @code{mullo} machine instruction to place the result of
-multiplying the bottom 16 bits of the two arguments into the
-accumulator.
-@end deftypefn
+@item nehalem
+Intel Core i7 Nehalem CPU.
 
-@deftypefn {Built-in Function}  int  __builtin_rx_mvfachi (void)
-Generates the @code{mvfachi} machine instruction to read the top
-32 bits of the accumulator.
-@end deftypefn
+@item westmere
+Intel Core i7 Westmere CPU.
 
-@deftypefn {Built-in Function}  int  __builtin_rx_mvfacmi (void)
-Generates the @code{mvfacmi} machine instruction to read the middle
-32 bits of the accumulator.
-@end deftypefn
+@item sandybridge
+Intel Core i7 Sandy Bridge CPU.
 
-@deftypefn {Built-in Function}  int __builtin_rx_mvfc (int)
-Generates the @code{mvfc} machine instruction which reads the control
-register specified in its argument and returns its value.
-@end deftypefn
+@item amd
+AMD CPU.
 
-@deftypefn {Built-in Function}  void __builtin_rx_mvtachi (int)
-Generates the @code{mvtachi} machine instruction to set the top
-32 bits of the accumulator.
-@end deftypefn
+@item amdfam10h
+AMD Family 10h CPU.
 
-@deftypefn {Built-in Function}  void __builtin_rx_mvtaclo (int)
-Generates the @code{mvtaclo} machine instruction to set the bottom
-32 bits of the accumulator.
-@end deftypefn
+@item barcelona
+AMD Family 10h Barcelona CPU.
 
-@deftypefn {Built-in Function}  void __builtin_rx_mvtc (int reg, int val)
-Generates the @code{mvtc} machine instruction which sets control
-register number @code{reg} to @code{val}.
-@end deftypefn
+@item shanghai
+AMD Family 10h Shanghai CPU.
 
-@deftypefn {Built-in Function}  void __builtin_rx_mvtipl (int)
-Generates the @code{mvtipl} machine instruction set the interrupt
-priority level.
-@end deftypefn
+@item istanbul
+AMD Family 10h Istanbul CPU.
 
-@deftypefn {Built-in Function}  void __builtin_rx_racw (int)
-Generates the @code{racw} machine instruction to round the accumulator
-according to the specified mode.
-@end deftypefn
+@item btver1
+AMD Family 14h CPU.
 
-@deftypefn {Built-in Function}  int __builtin_rx_revw (int)
-Generates the @code{revw} machine instruction which swaps the bytes in
-the argument so that bits 0--7 now occupy bits 8--15 and vice versa,
-and also bits 16--23 occupy bits 24--31 and vice versa.
-@end deftypefn
+@item amdfam15h
+AMD Family 15h CPU.
 
-@deftypefn {Built-in Function}  void __builtin_rx_rmpa (void)
-Generates the @code{rmpa} machine instruction which initiates a
-repeated multiply and accumulate sequence.
-@end deftypefn
+@item bdver1
+AMD Family 15h Bulldozer version 1.
 
-@deftypefn {Built-in Function}  void __builtin_rx_round (float)
-Generates the @code{round} machine instruction which returns the
-floating-point argument rounded according to the current rounding mode
-set in the floating-point status word register.
-@end deftypefn
+@item bdver2
+AMD Family 15h Bulldozer version 2.
 
-@deftypefn {Built-in Function}  int __builtin_rx_sat (int)
-Generates the @code{sat} machine instruction which returns the
-saturated value of the argument.
-@end deftypefn
+@item bdver3
+AMD Family 15h Bulldozer version 3.
 
-@deftypefn {Built-in Function}  void __builtin_rx_setpsw (int)
-Generates the @code{setpsw} machine instruction to set the specified
-bit in the processor status word.
-@end deftypefn
+@item bdver4
+AMD Family 15h Bulldozer version 4.
 
-@deftypefn {Built-in Function}  void __builtin_rx_wait (void)
-Generates the @code{wait} machine instruction.
+@item btver2
+AMD Family 16h CPU.
+@end table
+
+Here is an example:
+@smallexample
+if (__builtin_cpu_is ("corei7"))
+  @{
+     do_corei7 (); // Core i7 specific implementation.
+  @}
+else
+  @{
+     do_generic (); // Generic implementation.
+  @}
+@end smallexample
 @end deftypefn
 
-@node S/390 System z Built-in Functions
-@subsection S/390 System z Built-in Functions
-@deftypefn {Built-in Function} int __builtin_tbegin (void*)
-Generates the @code{tbegin} machine instruction starting a
-non-constraint hardware transaction.  If the parameter is non-NULL the
-memory area is used to store the transaction diagnostic buffer and
-will be passed as first operand to @code{tbegin}.  This buffer can be
-defined using the @code{struct __htm_tdb} C struct defined in
-@code{htmintrin.h} and must reside on a double-word boundary.  The
-second tbegin operand is set to @code{0xff0c}. This enables
-save/restore of all GPRs and disables aborts for FPR and AR
-manipulations inside the transaction body.  The condition code set by
-the tbegin instruction is returned as integer value.  The tbegin
-instruction by definition overwrites the content of all FPRs.  The
-compiler will generate code which saves and restores the FPRs.  For
-soft-float code it is recommended to used the @code{*_nofloat}
-variant.  In order to prevent a TDB from being written it is required
-to pass an constant zero value as parameter.  Passing the zero value
-through a variable is not sufficient.  Although modifications of
-access registers inside the transaction will not trigger an
-transaction abort it is not supported to actually modify them.  Access
-registers do not get saved when entering a transaction. They will have
-undefined state when reaching the abort code.
+@deftypefn {Built-in Function} int __builtin_cpu_supports (const char *@var{feature})
+This function returns a positive integer if the run-time CPU
+supports @var{feature}
+and returns @code{0} otherwise.  The following features can be detected:
+
+@table @samp
+@item cmov
+CMOV instruction.
+@item mmx
+MMX instructions.
+@item popcnt
+POPCNT instruction.
+@item sse
+SSE instructions.
+@item sse2
+SSE2 instructions.
+@item sse3
+SSE3 instructions.
+@item ssse3
+SSSE3 instructions.
+@item sse4.1
+SSE4.1 instructions.
+@item sse4.2
+SSE4.2 instructions.
+@item avx
+AVX instructions.
+@item avx2
+AVX2 instructions.
+@item avx512f
+AVX512F instructions.
+@end table
+
+Here is an example:
+@smallexample
+if (__builtin_cpu_supports ("popcnt"))
+  @{
+     asm("popcnt %1,%0" : "=r"(count) : "rm"(n) : "cc");
+  @}
+else
+  @{
+     count = generic_countbits (n); // Generic implementation.
+  @}
+@end smallexample
 @end deftypefn
 
-Macros for the possible return codes of tbegin are defined in the
-@code{htmintrin.h} header file:
 
-@table @code
-@item _HTM_TBEGIN_STARTED
-@code{tbegin} has been executed as part of normal processing.  The
-transaction body is supposed to be executed.
-@item _HTM_TBEGIN_INDETERMINATE
-The transaction was aborted due to an indeterminate condition which
-might be persistent.
-@item _HTM_TBEGIN_TRANSIENT
-The transaction aborted due to a transient failure.  The transaction
-should be re-executed in that case.
-@item _HTM_TBEGIN_PERSISTENT
-The transaction aborted due to a persistent failure.  Re-execution
-under same circumstances will not be productive.
-@end table
+The following built-in functions are made available by @option{-mmmx}.
+All of them generate the machine instruction that is part of the name.
 
-@defmac _HTM_FIRST_USER_ABORT_CODE
-The @code{_HTM_FIRST_USER_ABORT_CODE} defined in @code{htmintrin.h}
-specifies the first abort code which can be used for
-@code{__builtin_tabort}.  Values below this threshold are reserved for
-machine use.
-@end defmac
+@smallexample
+v8qi __builtin_ia32_paddb (v8qi, v8qi)
+v4hi __builtin_ia32_paddw (v4hi, v4hi)
+v2si __builtin_ia32_paddd (v2si, v2si)
+v8qi __builtin_ia32_psubb (v8qi, v8qi)
+v4hi __builtin_ia32_psubw (v4hi, v4hi)
+v2si __builtin_ia32_psubd (v2si, v2si)
+v8qi __builtin_ia32_paddsb (v8qi, v8qi)
+v4hi __builtin_ia32_paddsw (v4hi, v4hi)
+v8qi __builtin_ia32_psubsb (v8qi, v8qi)
+v4hi __builtin_ia32_psubsw (v4hi, v4hi)
+v8qi __builtin_ia32_paddusb (v8qi, v8qi)
+v4hi __builtin_ia32_paddusw (v4hi, v4hi)
+v8qi __builtin_ia32_psubusb (v8qi, v8qi)
+v4hi __builtin_ia32_psubusw (v4hi, v4hi)
+v4hi __builtin_ia32_pmullw (v4hi, v4hi)
+v4hi __builtin_ia32_pmulhw (v4hi, v4hi)
+di __builtin_ia32_pand (di, di)
+di __builtin_ia32_pandn (di, di)
+di __builtin_ia32_por (di, di)
+di __builtin_ia32_pxor (di, di)
+v8qi __builtin_ia32_pcmpeqb (v8qi, v8qi)
+v4hi __builtin_ia32_pcmpeqw (v4hi, v4hi)
+v2si __builtin_ia32_pcmpeqd (v2si, v2si)
+v8qi __builtin_ia32_pcmpgtb (v8qi, v8qi)
+v4hi __builtin_ia32_pcmpgtw (v4hi, v4hi)
+v2si __builtin_ia32_pcmpgtd (v2si, v2si)
+v8qi __builtin_ia32_punpckhbw (v8qi, v8qi)
+v4hi __builtin_ia32_punpckhwd (v4hi, v4hi)
+v2si __builtin_ia32_punpckhdq (v2si, v2si)
+v8qi __builtin_ia32_punpcklbw (v8qi, v8qi)
+v4hi __builtin_ia32_punpcklwd (v4hi, v4hi)
+v2si __builtin_ia32_punpckldq (v2si, v2si)
+v8qi __builtin_ia32_packsswb (v4hi, v4hi)
+v4hi __builtin_ia32_packssdw (v2si, v2si)
+v8qi __builtin_ia32_packuswb (v4hi, v4hi)
 
-@deftp {Data type} {struct __htm_tdb}
-The @code{struct __htm_tdb} defined in @code{htmintrin.h} describes
-the structure of the transaction diagnostic block as specified in the
-Principles of Operation manual chapter 5-91.
-@end deftp
+v4hi __builtin_ia32_psllw (v4hi, v4hi)
+v2si __builtin_ia32_pslld (v2si, v2si)
+v1di __builtin_ia32_psllq (v1di, v1di)
+v4hi __builtin_ia32_psrlw (v4hi, v4hi)
+v2si __builtin_ia32_psrld (v2si, v2si)
+v1di __builtin_ia32_psrlq (v1di, v1di)
+v4hi __builtin_ia32_psraw (v4hi, v4hi)
+v2si __builtin_ia32_psrad (v2si, v2si)
+v4hi __builtin_ia32_psllwi (v4hi, int)
+v2si __builtin_ia32_pslldi (v2si, int)
+v1di __builtin_ia32_psllqi (v1di, int)
+v4hi __builtin_ia32_psrlwi (v4hi, int)
+v2si __builtin_ia32_psrldi (v2si, int)
+v1di __builtin_ia32_psrlqi (v1di, int)
+v4hi __builtin_ia32_psrawi (v4hi, int)
+v2si __builtin_ia32_psradi (v2si, int)
 
-@deftypefn {Built-in Function} int __builtin_tbegin_nofloat (void*)
-Same as @code{__builtin_tbegin} but without FPR saves and restores.
-Using this variant in code making use of FPRs will leave the FPRs in
-undefined state when entering the transaction abort handler code.
-@end deftypefn
+@end smallexample
 
-@deftypefn {Built-in Function} int __builtin_tbegin_retry (void*, int)
-In addition to @code{__builtin_tbegin} a loop for transient failures
-is generated.  If tbegin returns a condition code of 2 the transaction
-will be retried as often as specified in the second argument.  The
-perform processor assist instruction is used to tell the CPU about the
-number of fails so far.
-@end deftypefn
+The following built-in functions are made available either with
+@option{-msse}, or with a combination of @option{-m3dnow} and
+@option{-march=athlon}.  All of them generate the machine
+instruction that is part of the name.
 
-@deftypefn {Built-in Function} int __builtin_tbegin_retry_nofloat (void*, int)
-Same as @code{__builtin_tbegin_retry} but without FPR saves and
-restores.  Using this variant in code making use of FPRs will leave
-the FPRs in undefined state when entering the transaction abort
-handler code.
-@end deftypefn
+@smallexample
+v4hi __builtin_ia32_pmulhuw (v4hi, v4hi)
+v8qi __builtin_ia32_pavgb (v8qi, v8qi)
+v4hi __builtin_ia32_pavgw (v4hi, v4hi)
+v1di __builtin_ia32_psadbw (v8qi, v8qi)
+v8qi __builtin_ia32_pmaxub (v8qi, v8qi)
+v4hi __builtin_ia32_pmaxsw (v4hi, v4hi)
+v8qi __builtin_ia32_pminub (v8qi, v8qi)
+v4hi __builtin_ia32_pminsw (v4hi, v4hi)
+int __builtin_ia32_pmovmskb (v8qi)
+void __builtin_ia32_maskmovq (v8qi, v8qi, char *)
+void __builtin_ia32_movntq (di *, di)
+void __builtin_ia32_sfence (void)
+@end smallexample
 
-@deftypefn {Built-in Function} void __builtin_tbeginc (void)
-Generates the @code{tbeginc} machine instruction starting a constraint
-hardware transaction.  The second operand is set to @code{0xff08}.
-@end deftypefn
+The following built-in functions are available when @option{-msse} is used.
+All of them generate the machine instruction that is part of the name.
 
-@deftypefn {Built-in Function} int __builtin_tend (void)
-Generates the @code{tend} machine instruction finishing a transaction
-and making the changes visible to other threads.  The condition code
-generated by tend is returned as integer value.
-@end deftypefn
+@smallexample
+int __builtin_ia32_comieq (v4sf, v4sf)
+int __builtin_ia32_comineq (v4sf, v4sf)
+int __builtin_ia32_comilt (v4sf, v4sf)
+int __builtin_ia32_comile (v4sf, v4sf)
+int __builtin_ia32_comigt (v4sf, v4sf)
+int __builtin_ia32_comige (v4sf, v4sf)
+int __builtin_ia32_ucomieq (v4sf, v4sf)
+int __builtin_ia32_ucomineq (v4sf, v4sf)
+int __builtin_ia32_ucomilt (v4sf, v4sf)
+int __builtin_ia32_ucomile (v4sf, v4sf)
+int __builtin_ia32_ucomigt (v4sf, v4sf)
+int __builtin_ia32_ucomige (v4sf, v4sf)
+v4sf __builtin_ia32_addps (v4sf, v4sf)
+v4sf __builtin_ia32_subps (v4sf, v4sf)
+v4sf __builtin_ia32_mulps (v4sf, v4sf)
+v4sf __builtin_ia32_divps (v4sf, v4sf)
+v4sf __builtin_ia32_addss (v4sf, v4sf)
+v4sf __builtin_ia32_subss (v4sf, v4sf)
+v4sf __builtin_ia32_mulss (v4sf, v4sf)
+v4sf __builtin_ia32_divss (v4sf, v4sf)
+v4sf __builtin_ia32_cmpeqps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpltps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpleps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpgtps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpgeps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpunordps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpneqps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpnltps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpnleps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpngtps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpngeps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpordps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpeqss (v4sf, v4sf)
+v4sf __builtin_ia32_cmpltss (v4sf, v4sf)
+v4sf __builtin_ia32_cmpless (v4sf, v4sf)
+v4sf __builtin_ia32_cmpunordss (v4sf, v4sf)
+v4sf __builtin_ia32_cmpneqss (v4sf, v4sf)
+v4sf __builtin_ia32_cmpnltss (v4sf, v4sf)
+v4sf __builtin_ia32_cmpnless (v4sf, v4sf)
+v4sf __builtin_ia32_cmpordss (v4sf, v4sf)
+v4sf __builtin_ia32_maxps (v4sf, v4sf)
+v4sf __builtin_ia32_maxss (v4sf, v4sf)
+v4sf __builtin_ia32_minps (v4sf, v4sf)
+v4sf __builtin_ia32_minss (v4sf, v4sf)
+v4sf __builtin_ia32_andps (v4sf, v4sf)
+v4sf __builtin_ia32_andnps (v4sf, v4sf)
+v4sf __builtin_ia32_orps (v4sf, v4sf)
+v4sf __builtin_ia32_xorps (v4sf, v4sf)
+v4sf __builtin_ia32_movss (v4sf, v4sf)
+v4sf __builtin_ia32_movhlps (v4sf, v4sf)
+v4sf __builtin_ia32_movlhps (v4sf, v4sf)
+v4sf __builtin_ia32_unpckhps (v4sf, v4sf)
+v4sf __builtin_ia32_unpcklps (v4sf, v4sf)
+v4sf __builtin_ia32_cvtpi2ps (v4sf, v2si)
+v4sf __builtin_ia32_cvtsi2ss (v4sf, int)
+v2si __builtin_ia32_cvtps2pi (v4sf)
+int __builtin_ia32_cvtss2si (v4sf)
+v2si __builtin_ia32_cvttps2pi (v4sf)
+int __builtin_ia32_cvttss2si (v4sf)
+v4sf __builtin_ia32_rcpps (v4sf)
+v4sf __builtin_ia32_rsqrtps (v4sf)
+v4sf __builtin_ia32_sqrtps (v4sf)
+v4sf __builtin_ia32_rcpss (v4sf)
+v4sf __builtin_ia32_rsqrtss (v4sf)
+v4sf __builtin_ia32_sqrtss (v4sf)
+v4sf __builtin_ia32_shufps (v4sf, v4sf, int)
+void __builtin_ia32_movntps (float *, v4sf)
+int __builtin_ia32_movmskps (v4sf)
+@end smallexample
 
-@deftypefn {Built-in Function} void __builtin_tabort (int)
-Generates the @code{tabort} machine instruction with the specified
-abort code.  Abort codes from 0 through 255 are reserved and will
-result in an error message.
-@end deftypefn
+The following built-in functions are available when @option{-msse} is used.
 
-@deftypefn {Built-in Function} void __builtin_tx_assist (int)
-Generates the @code{ppa rX,rY,1} machine instruction.  Where the
-integer parameter is loaded into rX and a value of zero is loaded into
-rY.  The integer parameter specifies the number of times the
-transaction repeatedly aborted.
-@end deftypefn
+@table @code
+@item v4sf __builtin_ia32_loadups (float *)
+Generates the @code{movups} machine instruction as a load from memory.
+@item void __builtin_ia32_storeups (float *, v4sf)
+Generates the @code{movups} machine instruction as a store to memory.
+@item v4sf __builtin_ia32_loadss (float *)
+Generates the @code{movss} machine instruction as a load from memory.
+@item v4sf __builtin_ia32_loadhps (v4sf, const v2sf *)
+Generates the @code{movhps} machine instruction as a load from memory.
+@item v4sf __builtin_ia32_loadlps (v4sf, const v2sf *)
+Generates the @code{movlps} machine instruction as a load from memory.
+@item void __builtin_ia32_storehps (v2sf *, v4sf)
+Generates the @code{movhps} machine instruction as a store to memory.
+@item void __builtin_ia32_storelps (v2sf *, v4sf)
+Generates the @code{movlps} machine instruction as a store to memory.
+@end table
 
-@deftypefn {Built-in Function} int __builtin_tx_nesting_depth (void)
-Generates the @code{etnd} machine instruction.  The current nesting
-depth is returned as integer value.  For a nesting depth of 0 the code
-is not executed as part of an transaction.
-@end deftypefn
+The following built-in functions are available when @option{-msse2} is used.
+All of them generate the machine instruction that is part of the name.
+
+@smallexample
+int __builtin_ia32_comisdeq (v2df, v2df)
+int __builtin_ia32_comisdlt (v2df, v2df)
+int __builtin_ia32_comisdle (v2df, v2df)
+int __builtin_ia32_comisdgt (v2df, v2df)
+int __builtin_ia32_comisdge (v2df, v2df)
+int __builtin_ia32_comisdneq (v2df, v2df)
+int __builtin_ia32_ucomisdeq (v2df, v2df)
+int __builtin_ia32_ucomisdlt (v2df, v2df)
+int __builtin_ia32_ucomisdle (v2df, v2df)
+int __builtin_ia32_ucomisdgt (v2df, v2df)
+int __builtin_ia32_ucomisdge (v2df, v2df)
+int __builtin_ia32_ucomisdneq (v2df, v2df)
+v2df __builtin_ia32_cmpeqpd (v2df, v2df)
+v2df __builtin_ia32_cmpltpd (v2df, v2df)
+v2df __builtin_ia32_cmplepd (v2df, v2df)
+v2df __builtin_ia32_cmpgtpd (v2df, v2df)
+v2df __builtin_ia32_cmpgepd (v2df, v2df)
+v2df __builtin_ia32_cmpunordpd (v2df, v2df)
+v2df __builtin_ia32_cmpneqpd (v2df, v2df)
+v2df __builtin_ia32_cmpnltpd (v2df, v2df)
+v2df __builtin_ia32_cmpnlepd (v2df, v2df)
+v2df __builtin_ia32_cmpngtpd (v2df, v2df)
+v2df __builtin_ia32_cmpngepd (v2df, v2df)
+v2df __builtin_ia32_cmpordpd (v2df, v2df)
+v2df __builtin_ia32_cmpeqsd (v2df, v2df)
+v2df __builtin_ia32_cmpltsd (v2df, v2df)
+v2df __builtin_ia32_cmplesd (v2df, v2df)
+v2df __builtin_ia32_cmpunordsd (v2df, v2df)
+v2df __builtin_ia32_cmpneqsd (v2df, v2df)
+v2df __builtin_ia32_cmpnltsd (v2df, v2df)
+v2df __builtin_ia32_cmpnlesd (v2df, v2df)
+v2df __builtin_ia32_cmpordsd (v2df, v2df)
+v2di __builtin_ia32_paddq (v2di, v2di)
+v2di __builtin_ia32_psubq (v2di, v2di)
+v2df __builtin_ia32_addpd (v2df, v2df)
+v2df __builtin_ia32_subpd (v2df, v2df)
+v2df __builtin_ia32_mulpd (v2df, v2df)
+v2df __builtin_ia32_divpd (v2df, v2df)
+v2df __builtin_ia32_addsd (v2df, v2df)
+v2df __builtin_ia32_subsd (v2df, v2df)
+v2df __builtin_ia32_mulsd (v2df, v2df)
+v2df __builtin_ia32_divsd (v2df, v2df)
+v2df __builtin_ia32_minpd (v2df, v2df)
+v2df __builtin_ia32_maxpd (v2df, v2df)
+v2df __builtin_ia32_minsd (v2df, v2df)
+v2df __builtin_ia32_maxsd (v2df, v2df)
+v2df __builtin_ia32_andpd (v2df, v2df)
+v2df __builtin_ia32_andnpd (v2df, v2df)
+v2df __builtin_ia32_orpd (v2df, v2df)
+v2df __builtin_ia32_xorpd (v2df, v2df)
+v2df __builtin_ia32_movsd (v2df, v2df)
+v2df __builtin_ia32_unpckhpd (v2df, v2df)
+v2df __builtin_ia32_unpcklpd (v2df, v2df)
+v16qi __builtin_ia32_paddb128 (v16qi, v16qi)
+v8hi __builtin_ia32_paddw128 (v8hi, v8hi)
+v4si __builtin_ia32_paddd128 (v4si, v4si)
+v2di __builtin_ia32_paddq128 (v2di, v2di)
+v16qi __builtin_ia32_psubb128 (v16qi, v16qi)
+v8hi __builtin_ia32_psubw128 (v8hi, v8hi)
+v4si __builtin_ia32_psubd128 (v4si, v4si)
+v2di __builtin_ia32_psubq128 (v2di, v2di)
+v8hi __builtin_ia32_pmullw128 (v8hi, v8hi)
+v8hi __builtin_ia32_pmulhw128 (v8hi, v8hi)
+v2di __builtin_ia32_pand128 (v2di, v2di)
+v2di __builtin_ia32_pandn128 (v2di, v2di)
+v2di __builtin_ia32_por128 (v2di, v2di)
+v2di __builtin_ia32_pxor128 (v2di, v2di)
+v16qi __builtin_ia32_pavgb128 (v16qi, v16qi)
+v8hi __builtin_ia32_pavgw128 (v8hi, v8hi)
+v16qi __builtin_ia32_pcmpeqb128 (v16qi, v16qi)
+v8hi __builtin_ia32_pcmpeqw128 (v8hi, v8hi)
+v4si __builtin_ia32_pcmpeqd128 (v4si, v4si)
+v16qi __builtin_ia32_pcmpgtb128 (v16qi, v16qi)
+v8hi __builtin_ia32_pcmpgtw128 (v8hi, v8hi)
+v4si __builtin_ia32_pcmpgtd128 (v4si, v4si)
+v16qi __builtin_ia32_pmaxub128 (v16qi, v16qi)
+v8hi __builtin_ia32_pmaxsw128 (v8hi, v8hi)
+v16qi __builtin_ia32_pminub128 (v16qi, v16qi)
+v8hi __builtin_ia32_pminsw128 (v8hi, v8hi)
+v16qi __builtin_ia32_punpckhbw128 (v16qi, v16qi)
+v8hi __builtin_ia32_punpckhwd128 (v8hi, v8hi)
+v4si __builtin_ia32_punpckhdq128 (v4si, v4si)
+v2di __builtin_ia32_punpckhqdq128 (v2di, v2di)
+v16qi __builtin_ia32_punpcklbw128 (v16qi, v16qi)
+v8hi __builtin_ia32_punpcklwd128 (v8hi, v8hi)
+v4si __builtin_ia32_punpckldq128 (v4si, v4si)
+v2di __builtin_ia32_punpcklqdq128 (v2di, v2di)
+v16qi __builtin_ia32_packsswb128 (v8hi, v8hi)
+v8hi __builtin_ia32_packssdw128 (v4si, v4si)
+v16qi __builtin_ia32_packuswb128 (v8hi, v8hi)
+v8hi __builtin_ia32_pmulhuw128 (v8hi, v8hi)
+void __builtin_ia32_maskmovdqu (v16qi, v16qi)
+v2df __builtin_ia32_loadupd (double *)
+void __builtin_ia32_storeupd (double *, v2df)
+v2df __builtin_ia32_loadhpd (v2df, double const *)
+v2df __builtin_ia32_loadlpd (v2df, double const *)
+int __builtin_ia32_movmskpd (v2df)
+int __builtin_ia32_pmovmskb128 (v16qi)
+void __builtin_ia32_movnti (int *, int)
+void __builtin_ia32_movnti64 (long long int *, long long int)
+void __builtin_ia32_movntpd (double *, v2df)
+void __builtin_ia32_movntdq (v2df *, v2df)
+v4si __builtin_ia32_pshufd (v4si, int)
+v8hi __builtin_ia32_pshuflw (v8hi, int)
+v8hi __builtin_ia32_pshufhw (v8hi, int)
+v2di __builtin_ia32_psadbw128 (v16qi, v16qi)
+v2df __builtin_ia32_sqrtpd (v2df)
+v2df __builtin_ia32_sqrtsd (v2df)
+v2df __builtin_ia32_shufpd (v2df, v2df, int)
+v2df __builtin_ia32_cvtdq2pd (v4si)
+v4sf __builtin_ia32_cvtdq2ps (v4si)
+v4si __builtin_ia32_cvtpd2dq (v2df)
+v2si __builtin_ia32_cvtpd2pi (v2df)
+v4sf __builtin_ia32_cvtpd2ps (v2df)
+v4si __builtin_ia32_cvttpd2dq (v2df)
+v2si __builtin_ia32_cvttpd2pi (v2df)
+v2df __builtin_ia32_cvtpi2pd (v2si)
+int __builtin_ia32_cvtsd2si (v2df)
+int __builtin_ia32_cvttsd2si (v2df)
+long long __builtin_ia32_cvtsd2si64 (v2df)
+long long __builtin_ia32_cvttsd2si64 (v2df)
+v4si __builtin_ia32_cvtps2dq (v4sf)
+v2df __builtin_ia32_cvtps2pd (v4sf)
+v4si __builtin_ia32_cvttps2dq (v4sf)
+v2df __builtin_ia32_cvtsi2sd (v2df, int)
+v2df __builtin_ia32_cvtsi642sd (v2df, long long)
+v4sf __builtin_ia32_cvtsd2ss (v4sf, v2df)
+v2df __builtin_ia32_cvtss2sd (v2df, v4sf)
+void __builtin_ia32_clflush (const void *)
+void __builtin_ia32_lfence (void)
+void __builtin_ia32_mfence (void)
+v16qi __builtin_ia32_loaddqu (const char *)
+void __builtin_ia32_storedqu (char *, v16qi)
+v1di __builtin_ia32_pmuludq (v2si, v2si)
+v2di __builtin_ia32_pmuludq128 (v4si, v4si)
+v8hi __builtin_ia32_psllw128 (v8hi, v8hi)
+v4si __builtin_ia32_pslld128 (v4si, v4si)
+v2di __builtin_ia32_psllq128 (v2di, v2di)
+v8hi __builtin_ia32_psrlw128 (v8hi, v8hi)
+v4si __builtin_ia32_psrld128 (v4si, v4si)
+v2di __builtin_ia32_psrlq128 (v2di, v2di)
+v8hi __builtin_ia32_psraw128 (v8hi, v8hi)
+v4si __builtin_ia32_psrad128 (v4si, v4si)
+v2di __builtin_ia32_pslldqi128 (v2di, int)
+v8hi __builtin_ia32_psllwi128 (v8hi, int)
+v4si __builtin_ia32_pslldi128 (v4si, int)
+v2di __builtin_ia32_psllqi128 (v2di, int)
+v2di __builtin_ia32_psrldqi128 (v2di, int)
+v8hi __builtin_ia32_psrlwi128 (v8hi, int)
+v4si __builtin_ia32_psrldi128 (v4si, int)
+v2di __builtin_ia32_psrlqi128 (v2di, int)
+v8hi __builtin_ia32_psrawi128 (v8hi, int)
+v4si __builtin_ia32_psradi128 (v4si, int)
+v4si __builtin_ia32_pmaddwd128 (v8hi, v8hi)
+v2di __builtin_ia32_movq128 (v2di)
+@end smallexample
 
-@deftypefn {Built-in Function} void __builtin_non_tx_store (uint64_t *, uint64_t)
+The following built-in functions are available when @option{-msse3} is used.
+All of them generate the machine instruction that is part of the name.
 
-Generates the @code{ntstg} machine instruction.  The second argument
-is written to the first arguments location.  The store operation will
-not be rolled-back in case of an transaction abort.
-@end deftypefn
+@smallexample
+v2df __builtin_ia32_addsubpd (v2df, v2df)
+v4sf __builtin_ia32_addsubps (v4sf, v4sf)
+v2df __builtin_ia32_haddpd (v2df, v2df)
+v4sf __builtin_ia32_haddps (v4sf, v4sf)
+v2df __builtin_ia32_hsubpd (v2df, v2df)
+v4sf __builtin_ia32_hsubps (v4sf, v4sf)
+v16qi __builtin_ia32_lddqu (char const *)
+void __builtin_ia32_monitor (void *, unsigned int, unsigned int)
+v4sf __builtin_ia32_movshdup (v4sf)
+v4sf __builtin_ia32_movsldup (v4sf)
+void __builtin_ia32_mwait (unsigned int, unsigned int)
+@end smallexample
 
-@node SH Built-in Functions
-@subsection SH Built-in Functions
-The following built-in functions are supported on the SH1, SH2, SH3 and SH4
-families of processors:
+The following built-in functions are available when @option{-mssse3} is used.
+All of them generate the machine instruction that is part of the name.
 
-@deftypefn {Built-in Function} {void} __builtin_set_thread_pointer (void *@var{ptr})
-Sets the @samp{GBR} register to the specified value @var{ptr}.  This is usually
-used by system code that manages threads and execution contexts.  The compiler
-normally does not generate code that modifies the contents of @samp{GBR} and
-thus the value is preserved across function calls.  Changing the @samp{GBR}
-value in user code must be done with caution, since the compiler might use
-@samp{GBR} in order to access thread local variables.
+@smallexample
+v2si __builtin_ia32_phaddd (v2si, v2si)
+v4hi __builtin_ia32_phaddw (v4hi, v4hi)
+v4hi __builtin_ia32_phaddsw (v4hi, v4hi)
+v2si __builtin_ia32_phsubd (v2si, v2si)
+v4hi __builtin_ia32_phsubw (v4hi, v4hi)
+v4hi __builtin_ia32_phsubsw (v4hi, v4hi)
+v4hi __builtin_ia32_pmaddubsw (v8qi, v8qi)
+v4hi __builtin_ia32_pmulhrsw (v4hi, v4hi)
+v8qi __builtin_ia32_pshufb (v8qi, v8qi)
+v8qi __builtin_ia32_psignb (v8qi, v8qi)
+v2si __builtin_ia32_psignd (v2si, v2si)
+v4hi __builtin_ia32_psignw (v4hi, v4hi)
+v1di __builtin_ia32_palignr (v1di, v1di, int)
+v8qi __builtin_ia32_pabsb (v8qi)
+v2si __builtin_ia32_pabsd (v2si)
+v4hi __builtin_ia32_pabsw (v4hi)
+@end smallexample
 
-@end deftypefn
+The following built-in functions are available when @option{-mssse3} is used.
+All of them generate the machine instruction that is part of the name.
 
-@deftypefn {Built-in Function} {void *} __builtin_thread_pointer (void)
-Returns the value that is currently set in the @samp{GBR} register.
-Memory loads and stores that use the thread pointer as a base address are
-turned into @samp{GBR} based displacement loads and stores, if possible.
-For example:
 @smallexample
-struct my_tcb
-@{
-   int a, b, c, d, e;
-@};
+v4si __builtin_ia32_phaddd128 (v4si, v4si)
+v8hi __builtin_ia32_phaddw128 (v8hi, v8hi)
+v8hi __builtin_ia32_phaddsw128 (v8hi, v8hi)
+v4si __builtin_ia32_phsubd128 (v4si, v4si)
+v8hi __builtin_ia32_phsubw128 (v8hi, v8hi)
+v8hi __builtin_ia32_phsubsw128 (v8hi, v8hi)
+v8hi __builtin_ia32_pmaddubsw128 (v16qi, v16qi)
+v8hi __builtin_ia32_pmulhrsw128 (v8hi, v8hi)
+v16qi __builtin_ia32_pshufb128 (v16qi, v16qi)
+v16qi __builtin_ia32_psignb128 (v16qi, v16qi)
+v4si __builtin_ia32_psignd128 (v4si, v4si)
+v8hi __builtin_ia32_psignw128 (v8hi, v8hi)
+v2di __builtin_ia32_palignr128 (v2di, v2di, int)
+v16qi __builtin_ia32_pabsb128 (v16qi)
+v4si __builtin_ia32_pabsd128 (v4si)
+v8hi __builtin_ia32_pabsw128 (v8hi)
+@end smallexample
 
-int get_tcb_value (void)
-@{
-  // Generate @samp{mov.l @@(8,gbr),r0} instruction
-  return ((my_tcb*)__builtin_thread_pointer ())->c;
-@}
+The following built-in functions are available when @option{-msse4.1} is
+used.  All of them generate the machine instruction that is part of the
+name.
 
+@smallexample
+v2df __builtin_ia32_blendpd (v2df, v2df, const int)
+v4sf __builtin_ia32_blendps (v4sf, v4sf, const int)
+v2df __builtin_ia32_blendvpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_blendvps (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_dppd (v2df, v2df, const int)
+v4sf __builtin_ia32_dpps (v4sf, v4sf, const int)
+v4sf __builtin_ia32_insertps128 (v4sf, v4sf, const int)
+v2di __builtin_ia32_movntdqa (v2di *)
+v16qi __builtin_ia32_mpsadbw128 (v16qi, v16qi, const int)
+v8hi __builtin_ia32_packusdw128 (v4si, v4si)
+v16qi __builtin_ia32_pblendvb128 (v16qi, v16qi, v16qi)
+v8hi __builtin_ia32_pblendw128 (v8hi, v8hi, const int)
+v2di __builtin_ia32_pcmpeqq (v2di, v2di)
+v8hi __builtin_ia32_phminposuw128 (v8hi)
+v16qi __builtin_ia32_pmaxsb128 (v16qi, v16qi)
+v4si __builtin_ia32_pmaxsd128 (v4si, v4si)
+v4si __builtin_ia32_pmaxud128 (v4si, v4si)
+v8hi __builtin_ia32_pmaxuw128 (v8hi, v8hi)
+v16qi __builtin_ia32_pminsb128 (v16qi, v16qi)
+v4si __builtin_ia32_pminsd128 (v4si, v4si)
+v4si __builtin_ia32_pminud128 (v4si, v4si)
+v8hi __builtin_ia32_pminuw128 (v8hi, v8hi)
+v4si __builtin_ia32_pmovsxbd128 (v16qi)
+v2di __builtin_ia32_pmovsxbq128 (v16qi)
+v8hi __builtin_ia32_pmovsxbw128 (v16qi)
+v2di __builtin_ia32_pmovsxdq128 (v4si)
+v4si __builtin_ia32_pmovsxwd128 (v8hi)
+v2di __builtin_ia32_pmovsxwq128 (v8hi)
+v4si __builtin_ia32_pmovzxbd128 (v16qi)
+v2di __builtin_ia32_pmovzxbq128 (v16qi)
+v8hi __builtin_ia32_pmovzxbw128 (v16qi)
+v2di __builtin_ia32_pmovzxdq128 (v4si)
+v4si __builtin_ia32_pmovzxwd128 (v8hi)
+v2di __builtin_ia32_pmovzxwq128 (v8hi)
+v2di __builtin_ia32_pmuldq128 (v4si, v4si)
+v4si __builtin_ia32_pmulld128 (v4si, v4si)
+int __builtin_ia32_ptestc128 (v2di, v2di)
+int __builtin_ia32_ptestnzc128 (v2di, v2di)
+int __builtin_ia32_ptestz128 (v2di, v2di)
+v2df __builtin_ia32_roundpd (v2df, const int)
+v4sf __builtin_ia32_roundps (v4sf, const int)
+v2df __builtin_ia32_roundsd (v2df, v2df, const int)
+v4sf __builtin_ia32_roundss (v4sf, v4sf, const int)
 @end smallexample
-@end deftypefn
-
-@deftypefn {Built-in Function} {unsigned int} __builtin_sh_get_fpscr (void)
-Returns the value that is currently set in the @samp{FPSCR} register.
-@end deftypefn
-
-@deftypefn {Built-in Function} {void} __builtin_sh_set_fpscr (unsigned int @var{val})
-Sets the @samp{FPSCR} register to the specified value @var{val}, while
-preserving the current values of the FR, SZ and PR bits.
-@end deftypefn
-
-@node SPARC VIS Built-in Functions
-@subsection SPARC VIS Built-in Functions
-
-GCC supports SIMD operations on the SPARC using both the generic vector
-extensions (@pxref{Vector Extensions}) as well as built-in functions for
-the SPARC Visual Instruction Set (VIS).  When you use the @option{-mvis}
-switch, the VIS extension is exposed as the following built-in functions:
-
-@smallexample
-typedef int v1si __attribute__ ((vector_size (4)));
-typedef int v2si __attribute__ ((vector_size (8)));
-typedef short v4hi __attribute__ ((vector_size (8)));
-typedef short v2hi __attribute__ ((vector_size (4)));
-typedef unsigned char v8qi __attribute__ ((vector_size (8)));
-typedef unsigned char v4qi __attribute__ ((vector_size (4)));
 
-void __builtin_vis_write_gsr (int64_t);
-int64_t __builtin_vis_read_gsr (void);
+The following built-in functions are available when @option{-msse4.1} is
+used.
 
-void * __builtin_vis_alignaddr (void *, long);
-void * __builtin_vis_alignaddrl (void *, long);
-int64_t __builtin_vis_faligndatadi (int64_t, int64_t);
-v2si __builtin_vis_faligndatav2si (v2si, v2si);
-v4hi __builtin_vis_faligndatav4hi (v4si, v4si);
-v8qi __builtin_vis_faligndatav8qi (v8qi, v8qi);
+@table @code
+@item v4sf __builtin_ia32_vec_set_v4sf (v4sf, float, const int)
+Generates the @code{insertps} machine instruction.
+@item int __builtin_ia32_vec_ext_v16qi (v16qi, const int)
+Generates the @code{pextrb} machine instruction.
+@item v16qi __builtin_ia32_vec_set_v16qi (v16qi, int, const int)
+Generates the @code{pinsrb} machine instruction.
+@item v4si __builtin_ia32_vec_set_v4si (v4si, int, const int)
+Generates the @code{pinsrd} machine instruction.
+@item v2di __builtin_ia32_vec_set_v2di (v2di, long long, const int)
+Generates the @code{pinsrq} machine instruction in 64-bit mode.
+@end table
 
-v4hi __builtin_vis_fexpand (v4qi);
+The following built-in functions are changed to generate new SSE4.1
+instructions when @option{-msse4.1} is used.
 
-v4hi __builtin_vis_fmul8x16 (v4qi, v4hi);
-v4hi __builtin_vis_fmul8x16au (v4qi, v2hi);
-v4hi __builtin_vis_fmul8x16al (v4qi, v2hi);
-v4hi __builtin_vis_fmul8sux16 (v8qi, v4hi);
-v4hi __builtin_vis_fmul8ulx16 (v8qi, v4hi);
-v2si __builtin_vis_fmuld8sux16 (v4qi, v2hi);
-v2si __builtin_vis_fmuld8ulx16 (v4qi, v2hi);
+@table @code
+@item float __builtin_ia32_vec_ext_v4sf (v4sf, const int)
+Generates the @code{extractps} machine instruction.
+@item int __builtin_ia32_vec_ext_v4si (v4si, const int)
+Generates the @code{pextrd} machine instruction.
+@item long long __builtin_ia32_vec_ext_v2di (v2di, const int)
+Generates the @code{pextrq} machine instruction in 64-bit mode.
+@end table
 
-v4qi __builtin_vis_fpack16 (v4hi);
-v8qi __builtin_vis_fpack32 (v2si, v8qi);
-v2hi __builtin_vis_fpackfix (v2si);
-v8qi __builtin_vis_fpmerge (v4qi, v4qi);
+The following built-in functions are available when @option{-msse4.2} is
+used.  All of them generate the machine instruction that is part of the
+name.
 
-int64_t __builtin_vis_pdist (v8qi, v8qi, int64_t);
+@smallexample
+v16qi __builtin_ia32_pcmpestrm128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestri128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestria128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestric128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestrio128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestris128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestriz128 (v16qi, int, v16qi, int, const int)
+v16qi __builtin_ia32_pcmpistrm128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistri128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistria128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistric128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistrio128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistris128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistriz128 (v16qi, v16qi, const int)
+v2di __builtin_ia32_pcmpgtq (v2di, v2di)
+@end smallexample
 
-long __builtin_vis_edge8 (void *, void *);
-long __builtin_vis_edge8l (void *, void *);
-long __builtin_vis_edge16 (void *, void *);
-long __builtin_vis_edge16l (void *, void *);
-long __builtin_vis_edge32 (void *, void *);
-long __builtin_vis_edge32l (void *, void *);
+The following built-in functions are available when @option{-msse4.2} is
+used.
 
-long __builtin_vis_fcmple16 (v4hi, v4hi);
-long __builtin_vis_fcmple32 (v2si, v2si);
-long __builtin_vis_fcmpne16 (v4hi, v4hi);
-long __builtin_vis_fcmpne32 (v2si, v2si);
-long __builtin_vis_fcmpgt16 (v4hi, v4hi);
-long __builtin_vis_fcmpgt32 (v2si, v2si);
-long __builtin_vis_fcmpeq16 (v4hi, v4hi);
-long __builtin_vis_fcmpeq32 (v2si, v2si);
+@table @code
+@item unsigned int __builtin_ia32_crc32qi (unsigned int, unsigned char)
+Generates the @code{crc32b} machine instruction.
+@item unsigned int __builtin_ia32_crc32hi (unsigned int, unsigned short)
+Generates the @code{crc32w} machine instruction.
+@item unsigned int __builtin_ia32_crc32si (unsigned int, unsigned int)
+Generates the @code{crc32l} machine instruction.
+@item unsigned long long __builtin_ia32_crc32di (unsigned long long, unsigned long long)
+Generates the @code{crc32q} machine instruction.
+@end table
 
-v4hi __builtin_vis_fpadd16 (v4hi, v4hi);
-v2hi __builtin_vis_fpadd16s (v2hi, v2hi);
-v2si __builtin_vis_fpadd32 (v2si, v2si);
-v1si __builtin_vis_fpadd32s (v1si, v1si);
-v4hi __builtin_vis_fpsub16 (v4hi, v4hi);
-v2hi __builtin_vis_fpsub16s (v2hi, v2hi);
-v2si __builtin_vis_fpsub32 (v2si, v2si);
-v1si __builtin_vis_fpsub32s (v1si, v1si);
+The following built-in functions are changed to generate new SSE4.2
+instructions when @option{-msse4.2} is used.
 
-long __builtin_vis_array8 (long, long);
-long __builtin_vis_array16 (long, long);
-long __builtin_vis_array32 (long, long);
-@end smallexample
+@table @code
+@item int __builtin_popcount (unsigned int)
+Generates the @code{popcntl} machine instruction.
+@item int __builtin_popcountl (unsigned long)
+Generates the @code{popcntl} or @code{popcntq} machine instruction,
+depending on the size of @code{unsigned long}.
+@item int __builtin_popcountll (unsigned long long)
+Generates the @code{popcntq} machine instruction.
+@end table
 
-When you use the @option{-mvis2} switch, the VIS version 2.0 built-in
-functions also become available:
+The following built-in functions are available when @option{-mavx} is
+used.  All of them generate the machine instruction that is part of the
+name.
 
 @smallexample
-long __builtin_vis_bmask (long, long);
-int64_t __builtin_vis_bshuffledi (int64_t, int64_t);
-v2si __builtin_vis_bshufflev2si (v2si, v2si);
-v4hi __builtin_vis_bshufflev2si (v4hi, v4hi);
-v8qi __builtin_vis_bshufflev2si (v8qi, v8qi);
-
-long __builtin_vis_edge8n (void *, void *);
-long __builtin_vis_edge8ln (void *, void *);
-long __builtin_vis_edge16n (void *, void *);
-long __builtin_vis_edge16ln (void *, void *);
-long __builtin_vis_edge32n (void *, void *);
-long __builtin_vis_edge32ln (void *, void *);
+v4df __builtin_ia32_addpd256 (v4df,v4df)
+v8sf __builtin_ia32_addps256 (v8sf,v8sf)
+v4df __builtin_ia32_addsubpd256 (v4df,v4df)
+v8sf __builtin_ia32_addsubps256 (v8sf,v8sf)
+v4df __builtin_ia32_andnpd256 (v4df,v4df)
+v8sf __builtin_ia32_andnps256 (v8sf,v8sf)
+v4df __builtin_ia32_andpd256 (v4df,v4df)
+v8sf __builtin_ia32_andps256 (v8sf,v8sf)
+v4df __builtin_ia32_blendpd256 (v4df,v4df,int)
+v8sf __builtin_ia32_blendps256 (v8sf,v8sf,int)
+v4df __builtin_ia32_blendvpd256 (v4df,v4df,v4df)
+v8sf __builtin_ia32_blendvps256 (v8sf,v8sf,v8sf)
+v2df __builtin_ia32_cmppd (v2df,v2df,int)
+v4df __builtin_ia32_cmppd256 (v4df,v4df,int)
+v4sf __builtin_ia32_cmpps (v4sf,v4sf,int)
+v8sf __builtin_ia32_cmpps256 (v8sf,v8sf,int)
+v2df __builtin_ia32_cmpsd (v2df,v2df,int)
+v4sf __builtin_ia32_cmpss (v4sf,v4sf,int)
+v4df __builtin_ia32_cvtdq2pd256 (v4si)
+v8sf __builtin_ia32_cvtdq2ps256 (v8si)
+v4si __builtin_ia32_cvtpd2dq256 (v4df)
+v4sf __builtin_ia32_cvtpd2ps256 (v4df)
+v8si __builtin_ia32_cvtps2dq256 (v8sf)
+v4df __builtin_ia32_cvtps2pd256 (v4sf)
+v4si __builtin_ia32_cvttpd2dq256 (v4df)
+v8si __builtin_ia32_cvttps2dq256 (v8sf)
+v4df __builtin_ia32_divpd256 (v4df,v4df)
+v8sf __builtin_ia32_divps256 (v8sf,v8sf)
+v8sf __builtin_ia32_dpps256 (v8sf,v8sf,int)
+v4df __builtin_ia32_haddpd256 (v4df,v4df)
+v8sf __builtin_ia32_haddps256 (v8sf,v8sf)
+v4df __builtin_ia32_hsubpd256 (v4df,v4df)
+v8sf __builtin_ia32_hsubps256 (v8sf,v8sf)
+v32qi __builtin_ia32_lddqu256 (pcchar)
+v32qi __builtin_ia32_loaddqu256 (pcchar)
+v4df __builtin_ia32_loadupd256 (pcdouble)
+v8sf __builtin_ia32_loadups256 (pcfloat)
+v2df __builtin_ia32_maskloadpd (pcv2df,v2df)
+v4df __builtin_ia32_maskloadpd256 (pcv4df,v4df)
+v4sf __builtin_ia32_maskloadps (pcv4sf,v4sf)
+v8sf __builtin_ia32_maskloadps256 (pcv8sf,v8sf)
+void __builtin_ia32_maskstorepd (pv2df,v2df,v2df)
+void __builtin_ia32_maskstorepd256 (pv4df,v4df,v4df)
+void __builtin_ia32_maskstoreps (pv4sf,v4sf,v4sf)
+void __builtin_ia32_maskstoreps256 (pv8sf,v8sf,v8sf)
+v4df __builtin_ia32_maxpd256 (v4df,v4df)
+v8sf __builtin_ia32_maxps256 (v8sf,v8sf)
+v4df __builtin_ia32_minpd256 (v4df,v4df)
+v8sf __builtin_ia32_minps256 (v8sf,v8sf)
+v4df __builtin_ia32_movddup256 (v4df)
+int __builtin_ia32_movmskpd256 (v4df)
+int __builtin_ia32_movmskps256 (v8sf)
+v8sf __builtin_ia32_movshdup256 (v8sf)
+v8sf __builtin_ia32_movsldup256 (v8sf)
+v4df __builtin_ia32_mulpd256 (v4df,v4df)
+v8sf __builtin_ia32_mulps256 (v8sf,v8sf)
+v4df __builtin_ia32_orpd256 (v4df,v4df)
+v8sf __builtin_ia32_orps256 (v8sf,v8sf)
+v2df __builtin_ia32_pd_pd256 (v4df)
+v4df __builtin_ia32_pd256_pd (v2df)
+v4sf __builtin_ia32_ps_ps256 (v8sf)
+v8sf __builtin_ia32_ps256_ps (v4sf)
+int __builtin_ia32_ptestc256 (v4di,v4di,ptest)
+int __builtin_ia32_ptestnzc256 (v4di,v4di,ptest)
+int __builtin_ia32_ptestz256 (v4di,v4di,ptest)
+v8sf __builtin_ia32_rcpps256 (v8sf)
+v4df __builtin_ia32_roundpd256 (v4df,int)
+v8sf __builtin_ia32_roundps256 (v8sf,int)
+v8sf __builtin_ia32_rsqrtps_nr256 (v8sf)
+v8sf __builtin_ia32_rsqrtps256 (v8sf)
+v4df __builtin_ia32_shufpd256 (v4df,v4df,int)
+v8sf __builtin_ia32_shufps256 (v8sf,v8sf,int)
+v4si __builtin_ia32_si_si256 (v8si)
+v8si __builtin_ia32_si256_si (v4si)
+v4df __builtin_ia32_sqrtpd256 (v4df)
+v8sf __builtin_ia32_sqrtps_nr256 (v8sf)
+v8sf __builtin_ia32_sqrtps256 (v8sf)
+void __builtin_ia32_storedqu256 (pchar,v32qi)
+void __builtin_ia32_storeupd256 (pdouble,v4df)
+void __builtin_ia32_storeups256 (pfloat,v8sf)
+v4df __builtin_ia32_subpd256 (v4df,v4df)
+v8sf __builtin_ia32_subps256 (v8sf,v8sf)
+v4df __builtin_ia32_unpckhpd256 (v4df,v4df)
+v8sf __builtin_ia32_unpckhps256 (v8sf,v8sf)
+v4df __builtin_ia32_unpcklpd256 (v4df,v4df)
+v8sf __builtin_ia32_unpcklps256 (v8sf,v8sf)
+v4df __builtin_ia32_vbroadcastf128_pd256 (pcv2df)
+v8sf __builtin_ia32_vbroadcastf128_ps256 (pcv4sf)
+v4df __builtin_ia32_vbroadcastsd256 (pcdouble)
+v4sf __builtin_ia32_vbroadcastss (pcfloat)
+v8sf __builtin_ia32_vbroadcastss256 (pcfloat)
+v2df __builtin_ia32_vextractf128_pd256 (v4df,int)
+v4sf __builtin_ia32_vextractf128_ps256 (v8sf,int)
+v4si __builtin_ia32_vextractf128_si256 (v8si,int)
+v4df __builtin_ia32_vinsertf128_pd256 (v4df,v2df,int)
+v8sf __builtin_ia32_vinsertf128_ps256 (v8sf,v4sf,int)
+v8si __builtin_ia32_vinsertf128_si256 (v8si,v4si,int)
+v4df __builtin_ia32_vperm2f128_pd256 (v4df,v4df,int)
+v8sf __builtin_ia32_vperm2f128_ps256 (v8sf,v8sf,int)
+v8si __builtin_ia32_vperm2f128_si256 (v8si,v8si,int)
+v2df __builtin_ia32_vpermil2pd (v2df,v2df,v2di,int)
+v4df __builtin_ia32_vpermil2pd256 (v4df,v4df,v4di,int)
+v4sf __builtin_ia32_vpermil2ps (v4sf,v4sf,v4si,int)
+v8sf __builtin_ia32_vpermil2ps256 (v8sf,v8sf,v8si,int)
+v2df __builtin_ia32_vpermilpd (v2df,int)
+v4df __builtin_ia32_vpermilpd256 (v4df,int)
+v4sf __builtin_ia32_vpermilps (v4sf,int)
+v8sf __builtin_ia32_vpermilps256 (v8sf,int)
+v2df __builtin_ia32_vpermilvarpd (v2df,v2di)
+v4df __builtin_ia32_vpermilvarpd256 (v4df,v4di)
+v4sf __builtin_ia32_vpermilvarps (v4sf,v4si)
+v8sf __builtin_ia32_vpermilvarps256 (v8sf,v8si)
+int __builtin_ia32_vtestcpd (v2df,v2df,ptest)
+int __builtin_ia32_vtestcpd256 (v4df,v4df,ptest)
+int __builtin_ia32_vtestcps (v4sf,v4sf,ptest)
+int __builtin_ia32_vtestcps256 (v8sf,v8sf,ptest)
+int __builtin_ia32_vtestnzcpd (v2df,v2df,ptest)
+int __builtin_ia32_vtestnzcpd256 (v4df,v4df,ptest)
+int __builtin_ia32_vtestnzcps (v4sf,v4sf,ptest)
+int __builtin_ia32_vtestnzcps256 (v8sf,v8sf,ptest)
+int __builtin_ia32_vtestzpd (v2df,v2df,ptest)
+int __builtin_ia32_vtestzpd256 (v4df,v4df,ptest)
+int __builtin_ia32_vtestzps (v4sf,v4sf,ptest)
+int __builtin_ia32_vtestzps256 (v8sf,v8sf,ptest)
+void __builtin_ia32_vzeroall (void)
+void __builtin_ia32_vzeroupper (void)
+v4df __builtin_ia32_xorpd256 (v4df,v4df)
+v8sf __builtin_ia32_xorps256 (v8sf,v8sf)
 @end smallexample
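+
+The type names in the list above are not predefined.  As a sketch
+under that assumption, a program could declare @code{v4df} itself with
+the @code{vector_size} attribute and then call a built-in directly.
+The function name @code{add4} is hypothetical, and the code assumes
+compilation with @option{-mavx}.
+
+@smallexample
+/* Assumed declaration: a vector of four doubles, 32 bytes wide.  */
+typedef double v4df __attribute__ ((vector_size (32)));
+
+/* Hypothetical helper: adds two vectors element by element, which
+   should emit a single vaddpd instruction.  */
+v4df
+add4 (v4df a, v4df b)
+@{
+  return __builtin_ia32_addpd256 (a, b);
+@}
+@end smallexample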
 
-When you use the @option{-mvis3} switch, the VIS version 3.0 built-in
-functions also become available:
+The following built-in functions are available when @option{-mavx2} is
+used.  All of them generate the machine instruction that is part of the
+name.
 
 @smallexample
-void __builtin_vis_cmask8 (long);
-void __builtin_vis_cmask16 (long);
-void __builtin_vis_cmask32 (long);
-
-v4hi __builtin_vis_fchksm16 (v4hi, v4hi);
-
-v4hi __builtin_vis_fsll16 (v4hi, v4hi);
-v4hi __builtin_vis_fslas16 (v4hi, v4hi);
-v4hi __builtin_vis_fsrl16 (v4hi, v4hi);
-v4hi __builtin_vis_fsra16 (v4hi, v4hi);
-v2si __builtin_vis_fsll16 (v2si, v2si);
-v2si __builtin_vis_fslas16 (v2si, v2si);
-v2si __builtin_vis_fsrl16 (v2si, v2si);
-v2si __builtin_vis_fsra16 (v2si, v2si);
-
-long __builtin_vis_pdistn (v8qi, v8qi);
-
-v4hi __builtin_vis_fmean16 (v4hi, v4hi);
-
-int64_t __builtin_vis_fpadd64 (int64_t, int64_t);
-int64_t __builtin_vis_fpsub64 (int64_t, int64_t);
-
-v4hi __builtin_vis_fpadds16 (v4hi, v4hi);
-v2hi __builtin_vis_fpadds16s (v2hi, v2hi);
-v4hi __builtin_vis_fpsubs16 (v4hi, v4hi);
-v2hi __builtin_vis_fpsubs16s (v2hi, v2hi);
-v2si __builtin_vis_fpadds32 (v2si, v2si);
-v1si __builtin_vis_fpadds32s (v1si, v1si);
-v2si __builtin_vis_fpsubs32 (v2si, v2si);
-v1si __builtin_vis_fpsubs32s (v1si, v1si);
-
-long __builtin_vis_fucmple8 (v8qi, v8qi);
-long __builtin_vis_fucmpne8 (v8qi, v8qi);
-long __builtin_vis_fucmpgt8 (v8qi, v8qi);
-long __builtin_vis_fucmpeq8 (v8qi, v8qi);
+v32qi __builtin_ia32_mpsadbw256 (v32qi,v32qi,int)
+v32qi __builtin_ia32_pabsb256 (v32qi)
+v16hi __builtin_ia32_pabsw256 (v16hi)
+v8si __builtin_ia32_pabsd256 (v8si)
+v16hi __builtin_ia32_packssdw256 (v8si,v8si)
+v32qi __builtin_ia32_packsswb256 (v16hi,v16hi)
+v16hi __builtin_ia32_packusdw256 (v8si,v8si)
+v32qi __builtin_ia32_packuswb256 (v16hi,v16hi)
+v32qi __builtin_ia32_paddb256 (v32qi,v32qi)
+v16hi __builtin_ia32_paddw256 (v16hi,v16hi)
+v8si __builtin_ia32_paddd256 (v8si,v8si)
+v4di __builtin_ia32_paddq256 (v4di,v4di)
+v32qi __builtin_ia32_paddsb256 (v32qi,v32qi)
+v16hi __builtin_ia32_paddsw256 (v16hi,v16hi)
+v32qi __builtin_ia32_paddusb256 (v32qi,v32qi)
+v16hi __builtin_ia32_paddusw256 (v16hi,v16hi)
+v4di __builtin_ia32_palignr256 (v4di,v4di,int)
+v4di __builtin_ia32_andsi256 (v4di,v4di)
+v4di __builtin_ia32_andnotsi256 (v4di,v4di)
+v32qi __builtin_ia32_pavgb256 (v32qi,v32qi)
+v16hi __builtin_ia32_pavgw256 (v16hi,v16hi)
+v32qi __builtin_ia32_pblendvb256 (v32qi,v32qi,v32qi)
+v16hi __builtin_ia32_pblendw256 (v16hi,v16hi,int)
+v32qi __builtin_ia32_pcmpeqb256 (v32qi,v32qi)
+v16hi __builtin_ia32_pcmpeqw256 (v16hi,v16hi)
+v8si __builtin_ia32_pcmpeqd256 (v8si,v8si)
+v4di __builtin_ia32_pcmpeqq256 (v4di,v4di)
+v32qi __builtin_ia32_pcmpgtb256 (v32qi,v32qi)
+v16hi __builtin_ia32_pcmpgtw256 (v16hi,v16hi)
+v8si __builtin_ia32_pcmpgtd256 (v8si,v8si)
+v4di __builtin_ia32_pcmpgtq256 (v4di,v4di)
+v16hi __builtin_ia32_phaddw256 (v16hi,v16hi)
+v8si __builtin_ia32_phaddd256 (v8si,v8si)
+v16hi __builtin_ia32_phaddsw256 (v16hi,v16hi)
+v16hi __builtin_ia32_phsubw256 (v16hi,v16hi)
+v8si __builtin_ia32_phsubd256 (v8si,v8si)
+v16hi __builtin_ia32_phsubsw256 (v16hi,v16hi)
+v16hi __builtin_ia32_pmaddubsw256 (v32qi,v32qi)
+v16hi __builtin_ia32_pmaddwd256 (v16hi,v16hi)
+v32qi __builtin_ia32_pmaxsb256 (v32qi,v32qi)
+v16hi __builtin_ia32_pmaxsw256 (v16hi,v16hi)
+v8si __builtin_ia32_pmaxsd256 (v8si,v8si)
+v32qi __builtin_ia32_pmaxub256 (v32qi,v32qi)
+v16hi __builtin_ia32_pmaxuw256 (v16hi,v16hi)
+v8si __builtin_ia32_pmaxud256 (v8si,v8si)
+v32qi __builtin_ia32_pminsb256 (v32qi,v32qi)
+v16hi __builtin_ia32_pminsw256 (v16hi,v16hi)
+v8si __builtin_ia32_pminsd256 (v8si,v8si)
+v32qi __builtin_ia32_pminub256 (v32qi,v32qi)
+v16hi __builtin_ia32_pminuw256 (v16hi,v16hi)
+v8si __builtin_ia32_pminud256 (v8si,v8si)
+int __builtin_ia32_pmovmskb256 (v32qi)
+v16hi __builtin_ia32_pmovsxbw256 (v16qi)
+v8si __builtin_ia32_pmovsxbd256 (v16qi)
+v4di __builtin_ia32_pmovsxbq256 (v16qi)
+v8si __builtin_ia32_pmovsxwd256 (v8hi)
+v4di __builtin_ia32_pmovsxwq256 (v8hi)
+v4di __builtin_ia32_pmovsxdq256 (v4si)
+v16hi __builtin_ia32_pmovzxbw256 (v16qi)
+v8si __builtin_ia32_pmovzxbd256 (v16qi)
+v4di __builtin_ia32_pmovzxbq256 (v16qi)
+v8si __builtin_ia32_pmovzxwd256 (v8hi)
+v4di __builtin_ia32_pmovzxwq256 (v8hi)
+v4di __builtin_ia32_pmovzxdq256 (v4si)
+v4di __builtin_ia32_pmuldq256 (v8si,v8si)
+v16hi __builtin_ia32_pmulhrsw256 (v16hi,v16hi)
+v16hi __builtin_ia32_pmulhuw256 (v16hi,v16hi)
+v16hi __builtin_ia32_pmulhw256 (v16hi,v16hi)
+v16hi __builtin_ia32_pmullw256 (v16hi,v16hi)
+v8si __builtin_ia32_pmulld256 (v8si,v8si)
+v4di __builtin_ia32_pmuludq256 (v8si,v8si)
+v4di __builtin_ia32_por256 (v4di,v4di)
+v16hi __builtin_ia32_psadbw256 (v32qi,v32qi)
+v32qi __builtin_ia32_pshufb256 (v32qi,v32qi)
+v8si __builtin_ia32_pshufd256 (v8si,int)
+v16hi __builtin_ia32_pshufhw256 (v16hi,int)
+v16hi __builtin_ia32_pshuflw256 (v16hi,int)
+v32qi __builtin_ia32_psignb256 (v32qi,v32qi)
+v16hi __builtin_ia32_psignw256 (v16hi,v16hi)
+v8si __builtin_ia32_psignd256 (v8si,v8si)
+v4di __builtin_ia32_pslldqi256 (v4di,int)
+v16hi __builtin_ia32_psllwi256 (v16hi,int)
+v16hi __builtin_ia32_psllw256 (v16hi,v8hi)
+v8si __builtin_ia32_pslldi256 (v8si,int)
+v8si __builtin_ia32_pslld256 (v8si,v4si)
+v4di __builtin_ia32_psllqi256 (v4di,int)
+v4di __builtin_ia32_psllq256 (v4di,v2di)
+v16hi __builtin_ia32_psrawi256 (v16hi,int)
+v16hi __builtin_ia32_psraw256 (v16hi,v8hi)
+v8si __builtin_ia32_psradi256 (v8si,int)
+v8si __builtin_ia32_psrad256 (v8si,v4si)
+v4di __builtin_ia32_psrldqi256 (v4di,int)
+v16hi __builtin_ia32_psrlwi256 (v16hi,int)
+v16hi __builtin_ia32_psrlw256 (v16hi,v8hi)
+v8si __builtin_ia32_psrldi256 (v8si,int)
+v8si __builtin_ia32_psrld256 (v8si,v4si)
+v4di __builtin_ia32_psrlqi256 (v4di,int)
+v4di __builtin_ia32_psrlq256 (v4di,v2di)
+v32qi __builtin_ia32_psubb256 (v32qi,v32qi)
+v16hi __builtin_ia32_psubw256 (v16hi,v16hi)
+v8si __builtin_ia32_psubd256 (v8si,v8si)
+v4di __builtin_ia32_psubq256 (v4di,v4di)
+v32qi __builtin_ia32_psubsb256 (v32qi,v32qi)
+v16hi __builtin_ia32_psubsw256 (v16hi,v16hi)
+v32qi __builtin_ia32_psubusb256 (v32qi,v32qi)
+v16hi __builtin_ia32_psubusw256 (v16hi,v16hi)
+v32qi __builtin_ia32_punpckhbw256 (v32qi,v32qi)
+v16hi __builtin_ia32_punpckhwd256 (v16hi,v16hi)
+v8si __builtin_ia32_punpckhdq256 (v8si,v8si)
+v4di __builtin_ia32_punpckhqdq256 (v4di,v4di)
+v32qi __builtin_ia32_punpcklbw256 (v32qi,v32qi)
+v16hi __builtin_ia32_punpcklwd256 (v16hi,v16hi)
+v8si __builtin_ia32_punpckldq256 (v8si,v8si)
+v4di __builtin_ia32_punpcklqdq256 (v4di,v4di)
+v4di __builtin_ia32_pxor256 (v4di,v4di)
+v4di __builtin_ia32_movntdqa256 (pv4di)
+v4sf __builtin_ia32_vbroadcastss_ps (v4sf)
+v8sf __builtin_ia32_vbroadcastss_ps256 (v4sf)
+v4df __builtin_ia32_vbroadcastsd_pd256 (v2df)
+v4di __builtin_ia32_vbroadcastsi256 (v2di)
+v4si __builtin_ia32_pblendd128 (v4si,v4si,int)
+v8si __builtin_ia32_pblendd256 (v8si,v8si,int)
+v32qi __builtin_ia32_pbroadcastb256 (v16qi)
+v16hi __builtin_ia32_pbroadcastw256 (v8hi)
+v8si __builtin_ia32_pbroadcastd256 (v4si)
+v4di __builtin_ia32_pbroadcastq256 (v2di)
+v16qi __builtin_ia32_pbroadcastb128 (v16qi)
+v8hi __builtin_ia32_pbroadcastw128 (v8hi)
+v4si __builtin_ia32_pbroadcastd128 (v4si)
+v2di __builtin_ia32_pbroadcastq128 (v2di)
+v8si __builtin_ia32_permvarsi256 (v8si,v8si)
+v4df __builtin_ia32_permdf256 (v4df,int)
+v8sf __builtin_ia32_permvarsf256 (v8sf,v8sf)
+v4di __builtin_ia32_permdi256 (v4di,int)
+v4di __builtin_ia32_permti256 (v4di,v4di,int)
+v2di __builtin_ia32_extract128i256 (v4di,int)
+v4di __builtin_ia32_insert128i256 (v4di,v2di,int)
+v8si __builtin_ia32_maskloadd256 (pcv8si,v8si)
+v4di __builtin_ia32_maskloadq256 (pcv4di,v4di)
+v4si __builtin_ia32_maskloadd (pcv4si,v4si)
+v2di __builtin_ia32_maskloadq (pcv2di,v2di)
+void __builtin_ia32_maskstored256 (pv8si,v8si,v8si)
+void __builtin_ia32_maskstoreq256 (pv4di,v4di,v4di)
+void __builtin_ia32_maskstored (pv4si,v4si,v4si)
+void __builtin_ia32_maskstoreq (pv2di,v2di,v2di)
+v8si __builtin_ia32_psllv8si (v8si,v8si)
+v4si __builtin_ia32_psllv4si (v4si,v4si)
+v4di __builtin_ia32_psllv4di (v4di,v4di)
+v2di __builtin_ia32_psllv2di (v2di,v2di)
+v8si __builtin_ia32_psrav8si (v8si,v8si)
+v4si __builtin_ia32_psrav4si (v4si,v4si)
+v8si __builtin_ia32_psrlv8si (v8si,v8si)
+v4si __builtin_ia32_psrlv4si (v4si,v4si)
+v4di __builtin_ia32_psrlv4di (v4di,v4di)
+v2di __builtin_ia32_psrlv2di (v2di,v2di)
+v2df __builtin_ia32_gathersiv2df (v2df, pcdouble,v4si,v2df,int)
+v4df __builtin_ia32_gathersiv4df (v4df, pcdouble,v4si,v4df,int)
+v2df __builtin_ia32_gatherdiv2df (v2df, pcdouble,v2di,v2df,int)
+v4df __builtin_ia32_gatherdiv4df (v4df, pcdouble,v4di,v4df,int)
+v4sf __builtin_ia32_gathersiv4sf (v4sf, pcfloat,v4si,v4sf,int)
+v8sf __builtin_ia32_gathersiv8sf (v8sf, pcfloat,v8si,v8sf,int)
+v4sf __builtin_ia32_gatherdiv4sf (v4sf, pcfloat,v2di,v4sf,int)
+v4sf __builtin_ia32_gatherdiv4sf256 (v4sf, pcfloat,v4di,v4sf,int)
+v2di __builtin_ia32_gathersiv2di (v2di, pcint64,v4si,v2di,int)
+v4di __builtin_ia32_gathersiv4di (v4di, pcint64,v4si,v4di,int)
+v2di __builtin_ia32_gatherdiv2di (v2di, pcint64,v2di,v2di,int)
+v4di __builtin_ia32_gatherdiv4di (v4di, pcint64,v4di,v4di,int)
+v4si __builtin_ia32_gathersiv4si (v4si, pcint,v4si,v4si,int)
+v8si __builtin_ia32_gathersiv8si (v8si, pcint,v8si,v8si,int)
+v4si __builtin_ia32_gatherdiv4si (v4si, pcint,v2di,v4si,int)
+v4si __builtin_ia32_gatherdiv4si256 (v4si, pcint,v4di,v4si,int)
+@end smallexample
 
-float __builtin_vis_fhadds (float, float);
-double __builtin_vis_fhaddd (double, double);
-float __builtin_vis_fhsubs (float, float);
-double __builtin_vis_fhsubd (double, double);
-float __builtin_vis_fnhadds (float, float);
-double __builtin_vis_fnhaddd (double, double);
+The following built-in functions are available when @option{-maes} is
+used.  All of them generate the machine instruction that is part of the
+name.
 
-int64_t __builtin_vis_umulxhi (int64_t, int64_t);
-int64_t __builtin_vis_xmulx (int64_t, int64_t);
-int64_t __builtin_vis_xmulxhi (int64_t, int64_t);
+@smallexample
+v2di __builtin_ia32_aesenc128 (v2di, v2di)
+v2di __builtin_ia32_aesenclast128 (v2di, v2di)
+v2di __builtin_ia32_aesdec128 (v2di, v2di)
+v2di __builtin_ia32_aesdeclast128 (v2di, v2di)
+v2di __builtin_ia32_aeskeygenassist128 (v2di, const int)
+v2di __builtin_ia32_aesimc128 (v2di)
 @end smallexample
 
-@node SPU Built-in Functions
-@subsection SPU Built-in Functions
+The following built-in function is available when @option{-mpclmul} is
+used.
 
-GCC provides extensions for the SPU processor as described in the
-Sony/Toshiba/IBM SPU Language Extensions Specification, which can be
-found at @uref{http://cell.scei.co.jp/} or
-@uref{http://www.ibm.com/developerworks/power/cell/}.  GCC's
-implementation differs in several ways.
+@table @code
+@item v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int)
+Generates the @code{pclmulqdq} machine instruction.
+@end table
 
-@itemize @bullet
+The following built-in functions are available when @option{-mfsgsbase} is
+used.  All of them generate the machine instruction that is part of the
+name.
 
-@item
-The optional extension of specifying vector constants in parentheses is
-not supported.
+@smallexample
+unsigned int __builtin_ia32_rdfsbase32 (void)
+unsigned long long __builtin_ia32_rdfsbase64 (void)
+unsigned int __builtin_ia32_rdgsbase32 (void)
+unsigned long long __builtin_ia32_rdgsbase64 (void)
+void _writefsbase_u32 (unsigned int)
+void _writefsbase_u64 (unsigned long long)
+void _writegsbase_u32 (unsigned int)
+void _writegsbase_u64 (unsigned long long)
+@end smallexample
 
-@item
-A vector initializer requires no cast if the vector constant is of the
-same type as the variable it is initializing.
+The following built-in functions are available when @option{-mrdrnd} is
+used.  All of them generate the machine instruction that is part of the
+name.
 
-@item
-If @code{signed} or @code{unsigned} is omitted, the signedness of the
-vector type is the default signedness of the base type.  The default
-varies depending on the operating system, so a portable program should
-always specify the signedness.
+@smallexample
+unsigned int __builtin_ia32_rdrand16_step (unsigned short *)
+unsigned int __builtin_ia32_rdrand32_step (unsigned int *)
+unsigned int __builtin_ia32_rdrand64_step (unsigned long long *)
+@end smallexample
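+
+As a hedged sketch, the step built-ins might be wrapped in a retry
+loop, since the hardware generator can occasionally fail to deliver a
+value.  The helper name @code{get_random32} and its @code{tries}
+parameter are hypothetical.  The code assumes compilation with
+@option{-mrdrnd} and assumes that a nonzero return value indicates
+success, as with the corresponding @code{_rdrand32_step} intrinsic.
+
+@smallexample
+/* Hypothetical helper: returns 1 and stores a random value in *OUT
+   on success, or returns 0 after TRIES failed attempts.  */
+int
+get_random32 (unsigned int *out, int tries)
+@{
+  while (tries-- > 0)
+    if (__builtin_ia32_rdrand32_step (out))
+      return 1;
+  return 0;
+@}
+@end smallexample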
 
-@item
-By default, the keyword @code{__vector} is added. The macro
-@code{vector} is defined in @code{<spu_intrinsics.h>} and can be
-undefined.
+The following built-in functions are available when @option{-msse4a} is used.
+All of them generate the machine instruction that is part of the name.
 
-@item
-GCC allows using a @code{typedef} name as the type specifier for a
-vector type.
+@smallexample
+void __builtin_ia32_movntsd (double *, v2df)
+void __builtin_ia32_movntss (float *, v4sf)
+v2di __builtin_ia32_extrq  (v2di, v16qi)
+v2di __builtin_ia32_extrqi (v2di, const unsigned int, const unsigned int)
+v2di __builtin_ia32_insertq (v2di, v2di)
+v2di __builtin_ia32_insertqi (v2di, v2di, const unsigned int, const unsigned int)
+@end smallexample
 
-@item
-For C, overloaded functions are implemented with macros so the following
-does not work:
+The following built-in functions are available when @option{-mxop} is used.
+@smallexample
+v2df __builtin_ia32_vfrczpd (v2df)
+v4sf __builtin_ia32_vfrczps (v4sf)
+v2df __builtin_ia32_vfrczsd (v2df)
+v4sf __builtin_ia32_vfrczss (v4sf)
+v4df __builtin_ia32_vfrczpd256 (v4df)
+v8sf __builtin_ia32_vfrczps256 (v8sf)
+v2di __builtin_ia32_vpcmov (v2di, v2di, v2di)
+v2di __builtin_ia32_vpcmov_v2di (v2di, v2di, v2di)
+v4si __builtin_ia32_vpcmov_v4si (v4si, v4si, v4si)
+v8hi __builtin_ia32_vpcmov_v8hi (v8hi, v8hi, v8hi)
+v16qi __builtin_ia32_vpcmov_v16qi (v16qi, v16qi, v16qi)
+v2df __builtin_ia32_vpcmov_v2df (v2df, v2df, v2df)
+v4sf __builtin_ia32_vpcmov_v4sf (v4sf, v4sf, v4sf)
+v4di __builtin_ia32_vpcmov_v4di256 (v4di, v4di, v4di)
+v8si __builtin_ia32_vpcmov_v8si256 (v8si, v8si, v8si)
+v16hi __builtin_ia32_vpcmov_v16hi256 (v16hi, v16hi, v16hi)
+v32qi __builtin_ia32_vpcmov_v32qi256 (v32qi, v32qi, v32qi)
+v4df __builtin_ia32_vpcmov_v4df256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vpcmov_v8sf256 (v8sf, v8sf, v8sf)
+v16qi __builtin_ia32_vpcomeqb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomeqd (v4si, v4si)
+v2di __builtin_ia32_vpcomeqq (v2di, v2di)
+v16qi __builtin_ia32_vpcomequb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomequd (v4si, v4si)
+v2di __builtin_ia32_vpcomequq (v2di, v2di)
+v8hi __builtin_ia32_vpcomequw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomfalseb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomfalsed (v4si, v4si)
+v2di __builtin_ia32_vpcomfalseq (v2di, v2di)
+v16qi __builtin_ia32_vpcomfalseub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomfalseud (v4si, v4si)
+v2di __builtin_ia32_vpcomfalseuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomfalseuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomfalsew (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomgeb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomged (v4si, v4si)
+v2di __builtin_ia32_vpcomgeq (v2di, v2di)
+v16qi __builtin_ia32_vpcomgeub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomgeud (v4si, v4si)
+v2di __builtin_ia32_vpcomgeuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomgeuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomgew (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomgtb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomgtd (v4si, v4si)
+v2di __builtin_ia32_vpcomgtq (v2di, v2di)
+v16qi __builtin_ia32_vpcomgtub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomgtud (v4si, v4si)
+v2di __builtin_ia32_vpcomgtuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomgtuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomgtw (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomleb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomled (v4si, v4si)
+v2di __builtin_ia32_vpcomleq (v2di, v2di)
+v16qi __builtin_ia32_vpcomleub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomleud (v4si, v4si)
+v2di __builtin_ia32_vpcomleuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomleuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomlew (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomltb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomltd (v4si, v4si)
+v2di __builtin_ia32_vpcomltq (v2di, v2di)
+v16qi __builtin_ia32_vpcomltub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomltud (v4si, v4si)
+v2di __builtin_ia32_vpcomltuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomltuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomltw (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomneb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomned (v4si, v4si)
+v2di __builtin_ia32_vpcomneq (v2di, v2di)
+v16qi __builtin_ia32_vpcomneub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomneud (v4si, v4si)
+v2di __builtin_ia32_vpcomneuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomneuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomnew (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomtrueb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomtrued (v4si, v4si)
+v2di __builtin_ia32_vpcomtrueq (v2di, v2di)
+v16qi __builtin_ia32_vpcomtrueub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomtrueud (v4si, v4si)
+v2di __builtin_ia32_vpcomtrueuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomtrueuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomtruew (v8hi, v8hi)
+v4si __builtin_ia32_vphaddbd (v16qi)
+v2di __builtin_ia32_vphaddbq (v16qi)
+v8hi __builtin_ia32_vphaddbw (v16qi)
+v2di __builtin_ia32_vphadddq (v4si)
+v4si __builtin_ia32_vphaddubd (v16qi)
+v2di __builtin_ia32_vphaddubq (v16qi)
+v8hi __builtin_ia32_vphaddubw (v16qi)
+v2di __builtin_ia32_vphaddudq (v4si)
+v4si __builtin_ia32_vphadduwd (v8hi)
+v2di __builtin_ia32_vphadduwq (v8hi)
+v4si __builtin_ia32_vphaddwd (v8hi)
+v2di __builtin_ia32_vphaddwq (v8hi)
+v8hi __builtin_ia32_vphsubbw (v16qi)
+v2di __builtin_ia32_vphsubdq (v4si)
+v4si __builtin_ia32_vphsubwd (v8hi)
+v4si __builtin_ia32_vpmacsdd (v4si, v4si, v4si)
+v2di __builtin_ia32_vpmacsdqh (v4si, v4si, v2di)
+v2di __builtin_ia32_vpmacsdql (v4si, v4si, v2di)
+v4si __builtin_ia32_vpmacssdd (v4si, v4si, v4si)
+v2di __builtin_ia32_vpmacssdqh (v4si, v4si, v2di)
+v2di __builtin_ia32_vpmacssdql (v4si, v4si, v2di)
+v4si __builtin_ia32_vpmacsswd (v8hi, v8hi, v4si)
+v8hi __builtin_ia32_vpmacssww (v8hi, v8hi, v8hi)
+v4si __builtin_ia32_vpmacswd (v8hi, v8hi, v4si)
+v8hi __builtin_ia32_vpmacsww (v8hi, v8hi, v8hi)
+v4si __builtin_ia32_vpmadcsswd (v8hi, v8hi, v4si)
+v4si __builtin_ia32_vpmadcswd (v8hi, v8hi, v4si)
+v16qi __builtin_ia32_vpperm (v16qi, v16qi, v16qi)
+v16qi __builtin_ia32_vprotb (v16qi, v16qi)
+v4si __builtin_ia32_vprotd (v4si, v4si)
+v2di __builtin_ia32_vprotq (v2di, v2di)
+v8hi __builtin_ia32_vprotw (v8hi, v8hi)
+v16qi __builtin_ia32_vpshab (v16qi, v16qi)
+v4si __builtin_ia32_vpshad (v4si, v4si)
+v2di __builtin_ia32_vpshaq (v2di, v2di)
+v8hi __builtin_ia32_vpshaw (v8hi, v8hi)
+v16qi __builtin_ia32_vpshlb (v16qi, v16qi)
+v4si __builtin_ia32_vpshld (v4si, v4si)
+v2di __builtin_ia32_vpshlq (v2di, v2di)
+v8hi __builtin_ia32_vpshlw (v8hi, v8hi)
+@end smallexample
+
+The following built-in functions are available when @option{-mfma4} is used.
+All of them generate the machine instruction that is part of the name.
+
+@smallexample
+v2df __builtin_ia32_vfmaddpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmaddps (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfmaddsd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmaddss (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfmsubpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmsubps (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfmsubsd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmsubss (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfnmaddpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfnmaddps (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfnmaddsd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfnmaddss (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfnmsubpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfnmsubps (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfnmsubsd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfnmsubss (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfmaddsubpd  (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmaddsubps  (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfmsubaddpd  (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmsubaddps  (v4sf, v4sf, v4sf)
+v4df __builtin_ia32_vfmaddpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfmaddps256 (v8sf, v8sf, v8sf)
+v4df __builtin_ia32_vfmsubpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfmsubps256 (v8sf, v8sf, v8sf)
+v4df __builtin_ia32_vfnmaddpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfnmaddps256 (v8sf, v8sf, v8sf)
+v4df __builtin_ia32_vfnmsubpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfnmsubps256 (v8sf, v8sf, v8sf)
+v4df __builtin_ia32_vfmaddsubpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfmaddsubps256 (v8sf, v8sf, v8sf)
+v4df __builtin_ia32_vfmsubaddpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfmsubaddps256 (v8sf, v8sf, v8sf)
 
-@smallexample
-  spu_add ((vector signed int)@{1, 2, 3, 4@}, foo);
 @end smallexample
 
-@noindent
-Since @code{spu_add} is a macro, the vector constant in the example
-is treated as four separate arguments.  Wrap the entire argument in
-parentheses for this to work.
-
-@item
-The extended version of @code{__builtin_expect} is not supported.
-
-@end itemize
-
-@emph{Note:} Only the interface described in the aforementioned
-specification is supported. Internally, GCC uses built-in functions to
-implement the required functionality, but these are not supported and
-are subject to change without notice.
-
-@node TI C6X Built-in Functions
-@subsection TI C6X Built-in Functions
+The following built-in functions are available when @option{-mlwp} is used.
 
-GCC provides intrinsics to access certain instructions of the TI C6X
-processors.  These intrinsics, listed below, are available after
-inclusion of the @code{c6x_intrinsics.h} header file.  They map directly
-to C6X instructions.
+@smallexample
+void __builtin_ia32_llwpcb16 (void *);
+void __builtin_ia32_llwpcb32 (void *);
+void __builtin_ia32_llwpcb64 (void *);
+void * __builtin_ia32_slwpcb16 (void);
+void * __builtin_ia32_slwpcb32 (void);
+void * __builtin_ia32_slwpcb64 (void);
+void __builtin_ia32_lwpval16 (unsigned short, unsigned int, unsigned short)
+void __builtin_ia32_lwpval32 (unsigned int, unsigned int, unsigned int)
+void __builtin_ia32_lwpval64 (unsigned __int64, unsigned int, unsigned int)
+unsigned char __builtin_ia32_lwpins16 (unsigned short, unsigned int, unsigned short)
+unsigned char __builtin_ia32_lwpins32 (unsigned int, unsigned int, unsigned int)
+unsigned char __builtin_ia32_lwpins64 (unsigned __int64, unsigned int, unsigned int)
+@end smallexample
 
+The following built-in functions are available when @option{-mbmi} is used.
+All of them generate the machine instruction that is part of the name.
 @smallexample
+unsigned int __builtin_ia32_bextr_u32 (unsigned int, unsigned int);
+unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long);
+@end smallexample
 
-int _sadd (int, int)
-int _ssub (int, int)
-int _sadd2 (int, int)
-int _ssub2 (int, int)
-long long _mpy2 (int, int)
-long long _smpy2 (int, int)
-int _add4 (int, int)
-int _sub4 (int, int)
-int _saddu4 (int, int)
+The following built-in functions are available when @option{-mbmi2} is used.
+All of them generate the machine instruction that is part of the name.
+@smallexample
+unsigned int _bzhi_u32 (unsigned int, unsigned int)
+unsigned int _pdep_u32 (unsigned int, unsigned int)
+unsigned int _pext_u32 (unsigned int, unsigned int)
+unsigned long long _bzhi_u64 (unsigned long long, unsigned long long)
+unsigned long long _pdep_u64 (unsigned long long, unsigned long long)
+unsigned long long _pext_u64 (unsigned long long, unsigned long long)
+@end smallexample
 
-int _smpy (int, int)
-int _smpyh (int, int)
-int _smpyhl (int, int)
-int _smpylh (int, int)
+The following built-in functions are available when @option{-mlzcnt} is used.
+All of them generate the machine instruction that is part of the name.
+@smallexample
+unsigned short __builtin_ia32_lzcnt_u16 (unsigned short);
+unsigned int __builtin_ia32_lzcnt_u32 (unsigned int);
+unsigned long long __builtin_ia32_lzcnt_u64 (unsigned long long);
+@end smallexample
 
-int _sshl (int, int)
-int _subc (int, int)
+The following built-in functions are available when @option{-mfxsr} is used.
+All of them generate the machine instruction that is part of the name.
+@smallexample
+void __builtin_ia32_fxsave (void *)
+void __builtin_ia32_fxrstor (void *)
+void __builtin_ia32_fxsave64 (void *)
+void __builtin_ia32_fxrstor64 (void *)
+@end smallexample
 
-int _avg2 (int, int)
-int _avgu4 (int, int)
+The following built-in functions are available when @option{-mxsave} is used.
+All of them generate the machine instruction that is part of the name.
+@smallexample
+void __builtin_ia32_xsave (void *, long long)
+void __builtin_ia32_xrstor (void *, long long)
+void __builtin_ia32_xsave64 (void *, long long)
+void __builtin_ia32_xrstor64 (void *, long long)
+@end smallexample
 
-int _clrr (int, int)
-int _extr (int, int)
-int _extru (int, int)
-int _abs (int)
-int _abs2 (int)
+The following built-in functions are available when @option{-mxsaveopt} is used.
+All of them generate the machine instruction that is part of the name.
+@smallexample
+void __builtin_ia32_xsaveopt (void *, long long)
+void __builtin_ia32_xsaveopt64 (void *, long long)
+@end smallexample
 
+The following built-in functions are available when @option{-mtbm} is used.
+Both of them generate the immediate form of the @code{bextr} machine
+instruction.
+@smallexample
+unsigned int __builtin_ia32_bextri_u32 (unsigned int, const unsigned int);
+unsigned long long __builtin_ia32_bextri_u64 (unsigned long long, const unsigned long long);
 @end smallexample
 
-@node TILE-Gx Built-in Functions
-@subsection TILE-Gx Built-in Functions
 
-GCC provides intrinsics to access every instruction of the TILE-Gx
-processor.  The intrinsics are of the form:
+The following built-in functions are available when @option{-m3dnow} is used.
+All of them generate the machine instruction that is part of the name.
 
 @smallexample
+void __builtin_ia32_femms (void)
+v8qi __builtin_ia32_pavgusb (v8qi, v8qi)
+v2si __builtin_ia32_pf2id (v2sf)
+v2sf __builtin_ia32_pfacc (v2sf, v2sf)
+v2sf __builtin_ia32_pfadd (v2sf, v2sf)
+v2si __builtin_ia32_pfcmpeq (v2sf, v2sf)
+v2si __builtin_ia32_pfcmpge (v2sf, v2sf)
+v2si __builtin_ia32_pfcmpgt (v2sf, v2sf)
+v2sf __builtin_ia32_pfmax (v2sf, v2sf)
+v2sf __builtin_ia32_pfmin (v2sf, v2sf)
+v2sf __builtin_ia32_pfmul (v2sf, v2sf)
+v2sf __builtin_ia32_pfrcp (v2sf)
+v2sf __builtin_ia32_pfrcpit1 (v2sf, v2sf)
+v2sf __builtin_ia32_pfrcpit2 (v2sf, v2sf)
+v2sf __builtin_ia32_pfrsqrt (v2sf)
+v2sf __builtin_ia32_pfsub (v2sf, v2sf)
+v2sf __builtin_ia32_pfsubr (v2sf, v2sf)
+v2sf __builtin_ia32_pi2fd (v2si)
+v4hi __builtin_ia32_pmulhrw (v4hi, v4hi)
+@end smallexample
 
-unsigned long long __insn_@var{op} (...)
+The following built-in functions are available when both @option{-m3dnow}
+and @option{-march=athlon} are used.  All of them generate the machine
+instruction that is part of the name.
 
+@smallexample
+v2si __builtin_ia32_pf2iw (v2sf)
+v2sf __builtin_ia32_pfnacc (v2sf, v2sf)
+v2sf __builtin_ia32_pfpnacc (v2sf, v2sf)
+v2sf __builtin_ia32_pi2fw (v2si)
+v2sf __builtin_ia32_pswapdsf (v2sf)
+v2si __builtin_ia32_pswapdsi (v2si)
 @end smallexample
 
-Where @var{op} is the name of the instruction.  Refer to the ISA manual
-for the complete list of instructions.
-
-GCC also provides intrinsics to directly access the network registers.
-The intrinsics are:
+The following built-in functions are available when @option{-mrtm} is used.
+They are used for restricted transactional memory.  These are the internal
+low-level functions.  Normally the functions in
+@ref{x86 transactional memory intrinsics} should be used instead.
 
 @smallexample
+int __builtin_ia32_xbegin ()
+void __builtin_ia32_xend ()
+void __builtin_ia32_xabort (status)
+int __builtin_ia32_xtest ()
+@end smallexample
 
-unsigned long long __tile_idn0_receive (void)
-unsigned long long __tile_idn1_receive (void)
-unsigned long long __tile_udn0_receive (void)
-unsigned long long __tile_udn1_receive (void)
-unsigned long long __tile_udn2_receive (void)
-unsigned long long __tile_udn3_receive (void)
-void __tile_idn_send (unsigned long long)
-void __tile_udn_send (unsigned long long)
+@node x86 transactional memory intrinsics
+@subsection x86 transactional memory intrinsics
 
-@end smallexample
+Hardware transactional memory intrinsics for x86.  These allow you to use
+memory transactions with RTM (Restricted Transactional Memory).
+For using HLE (Hardware Lock Elision) see @ref{x86 specific memory model extensions for transactional memory} instead.
+This support is enabled with the @option{-mrtm} option.
 
-The intrinsic @code{void __tile_network_barrier (void)} is used to
-guarantee that no network operations before it are reordered with
-those after it.
+A memory transaction commits all changes to memory in an atomic way,
+as visible to other threads.  If the transaction fails, it is rolled back
+and all side effects are discarded.
 
-@node TILEPro Built-in Functions
-@subsection TILEPro Built-in Functions
+Generally there is no guarantee that a memory transaction ever succeeds
+and suitable fallback code always needs to be supplied.
 
-GCC provides intrinsics to access every instruction of the TILEPro
-processor.  The intrinsics are of the form:
+@deftypefn {RTM Function} {unsigned} _xbegin ()
+Start an RTM (Restricted Transactional Memory) transaction.
+Returns @code{_XBEGIN_STARTED} when the transaction
+started successfully (note this is not 0, so the constant has to be
+explicitly tested).  When the transaction aborts, all side effects
+are undone and an abort code is returned.  There is no guarantee
+any transaction ever succeeds, so there always needs to be a valid,
+tested fallback path.
+@end deftypefn
 
 @smallexample
+#include <immintrin.h>
 
-unsigned __insn_@var{op} (...)
-
+if ((status = _xbegin ()) == _XBEGIN_STARTED) @{
+    ... transaction code...
+    _xend ();
+@} else @{
+    ... non transactional fallback path...
+@}
 @end smallexample
 
-@noindent
-where @var{op} is the name of the instruction.  Refer to the ISA manual
-for the complete list of instructions.
-
-GCC also provides intrinsics to directly access the network registers.
-The intrinsics are:
+Valid abort status bits (when the value is not @code{_XBEGIN_STARTED}) are:
 
-@smallexample
+@table @code
+@item _XABORT_EXPLICIT
+Transaction explicitly aborted with @code{_xabort}.  The parameter passed
+to @code{_xabort} is available with @code{_XABORT_CODE(status)}.
+@item _XABORT_RETRY
+Transaction retry is possible.
+@item _XABORT_CONFLICT
+Transaction abort due to a memory conflict with another thread.
+@item _XABORT_CAPACITY
+Transaction abort due to the transaction using too much memory.
+@item _XABORT_DEBUG
+Transaction abort due to a debug trap.
+@item _XABORT_NESTED
+Transaction abort in an inner nested transaction.
+@end table
 
-unsigned __tile_idn0_receive (void)
-unsigned __tile_idn1_receive (void)
-unsigned __tile_sn_receive (void)
-unsigned __tile_udn0_receive (void)
-unsigned __tile_udn1_receive (void)
-unsigned __tile_udn2_receive (void)
-unsigned __tile_udn3_receive (void)
-void __tile_idn_send (unsigned)
-void __tile_sn_send (unsigned)
-void __tile_udn_send (unsigned)
+@deftypefn {RTM Function} {void} _xend ()
+Commit the current transaction.  When no transaction is active this
+faults.  All memory side effects of the transaction become visible
+to other threads in an atomic manner.
+@end deftypefn
 
-@end smallexample
+@deftypefn {RTM Function} {int} _xtest ()
+Return a nonzero value when a transaction is currently active, otherwise 0.
+@end deftypefn
 
-The intrinsic @code{void __tile_network_barrier (void)} is used to
-guarantee that no network operations before it are reordered with
-those after it.
+@deftypefn {RTM Function} {void} _xabort (status)
+Abort the current transaction.  When no transaction is active this is a no-op.
+@var{status} must be an 8-bit constant, which is included in the status code
+returned by @code{_xbegin}.
+@end deftypefn
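The abort status bits listed above can drive the decision between retrying a transaction and taking the fallback path. The following is a minimal sketch of that decoding, not the documented interface: the @code{DEMO_} names, the retry policy, and both helper functions are hypothetical. The bit positions are an assumption that mirrors the values GCC's @code{<immintrin.h>} is understood to give the @code{_XABORT_*} macros. The sketch only inspects a status word, so it does not need RTM hardware or @option{-mrtm}.

```c
#include <stdbool.h>

/* Assumed layout of the abort status word.  These mirror the values
   that <immintrin.h> is understood to assign to the _XABORT_* macros;
   the DEMO_ names are hypothetical and exist only for this sketch.  */
#define DEMO_XABORT_EXPLICIT (1u << 0)
#define DEMO_XABORT_RETRY    (1u << 1)
#define DEMO_XABORT_CONFLICT (1u << 2)
#define DEMO_XABORT_CAPACITY (1u << 3)
#define DEMO_XABORT_DEBUG    (1u << 4)
#define DEMO_XABORT_NESTED   (1u << 5)
#define DEMO_XABORT_CODE(status) (((status) >> 24) & 0xffu)

/* Hypothetical policy: retry only when the retry bit is set and the
   abort was not caused by the transaction using too much memory.  */
bool
demo_should_retry (unsigned status)
{
  if (status & DEMO_XABORT_CAPACITY)
    return false;
  return (status & DEMO_XABORT_RETRY) != 0;
}

/* Return the 8-bit value that was passed to _xabort, or -1 when the
   abort was not an explicit one.  */
int
demo_explicit_code (unsigned status)
{
  if (!(status & DEMO_XABORT_EXPLICIT))
    return -1;
  return (int) DEMO_XABORT_CODE (status);
}
```

In real code the status would come from @code{_xbegin} and the @code{_XABORT_*} macros from @code{<immintrin.h>} would replace the @code{DEMO_} constants. A bounded number of retries followed by the non-transactional fallback is one reasonable design, since no transaction is guaranteed to succeed.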
 
 @node Target Format Checks
 @section Format Checks Specific to Particular Target Machines
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 94ca947..ba81ec7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -676,44 +676,6 @@ Objective-C and Objective-C++ Dialects}.
 -mschedule=@var{cpu-type}  -mspace-regs  -msio  -mwsio @gol
 -munix=@var{unix-std}  -nolibdld  -static  -threads}
 
-@emph{x86 Options}
-@gccoptlist{-mtune=@var{cpu-type}  -march=@var{cpu-type} @gol
--mtune-ctrl=@var{feature-list} -mdump-tune-features -mno-default @gol
--mfpmath=@var{unit} @gol
--masm=@var{dialect}  -mno-fancy-math-387 @gol
--mno-fp-ret-in-387  -msoft-float @gol
--mno-wide-multiply  -mrtd  -malign-double @gol
--mpreferred-stack-boundary=@var{num} @gol
--mincoming-stack-boundary=@var{num} @gol
--mcld -mcx16 -msahf -mmovbe -mcrc32 @gol
--mrecip -mrecip=@var{opt} @gol
--mvzeroupper -mprefer-avx128 @gol
--mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol
--mavx2 -mavx512f -mavx512pf -mavx512er -mavx512cd -msha @gol
--maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfma -mprefetchwt1 @gol
--mclflushopt -mxsavec -mxsaves @gol
--msse4a -m3dnow -mpopcnt -mabm -mbmi -mtbm -mfma4 -mxop -mlzcnt @gol
--mbmi2 -mfxsr -mxsave -mxsaveopt -mrtm -mlwp -mmpx -mthreads @gol
--mno-align-stringops  -minline-all-stringops @gol
--minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
--mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} @gol
--mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
--m96bit-long-double -mlong-double-64 -mlong-double-80 -mlong-double-128 @gol
--mregparm=@var{num}  -msseregparm @gol
--mveclibabi=@var{type} -mvect8-ret-in-mem @gol
--mpc32 -mpc64 -mpc80 -mstackrealign @gol
--momit-leaf-frame-pointer  -mno-red-zone -mno-tls-direct-seg-refs @gol
--mcmodel=@var{code-model} -mabi=@var{name} -maddress-mode=@var{mode} @gol
--m32 -m64 -mx32 -m16 -mlarge-data-threshold=@var{num} @gol
--msse2avx -mfentry -mrecord-mcount -mnop-mcount -m8bit-idiv @gol
--mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol
--malign-data=@var{type} -mstack-protector-guard=@var{guard}}
-
-@emph{x86 Windows Options}
-@gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol
--mnop-fun-dllimport -mthread @gol
--municode -mwin32 -mwindows -fno-set-stack-executable}
-
 @emph{IA-64 Options}
 @gccoptlist{-mbig-endian  -mlittle-endian  -mgnu-as  -mgnu-ld  -mno-pic @gol
 -mvolatile-asm-stop  -mregister-names  -msdata -mno-sdata @gol
@@ -1081,6 +1043,44 @@ See RS/6000 and PowerPC Options.
 @gccoptlist{-mrtp  -non-static  -Bstatic  -Bdynamic @gol
 -Xbind-lazy  -Xbind-now}
 
+@emph{x86 Options}
+@gccoptlist{-mtune=@var{cpu-type}  -march=@var{cpu-type} @gol
+-mtune-ctrl=@var{feature-list} -mdump-tune-features -mno-default @gol
+-mfpmath=@var{unit} @gol
+-masm=@var{dialect}  -mno-fancy-math-387 @gol
+-mno-fp-ret-in-387  -msoft-float @gol
+-mno-wide-multiply  -mrtd  -malign-double @gol
+-mpreferred-stack-boundary=@var{num} @gol
+-mincoming-stack-boundary=@var{num} @gol
+-mcld -mcx16 -msahf -mmovbe -mcrc32 @gol
+-mrecip -mrecip=@var{opt} @gol
+-mvzeroupper -mprefer-avx128 @gol
+-mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol
+-mavx2 -mavx512f -mavx512pf -mavx512er -mavx512cd -msha @gol
+-maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfma -mprefetchwt1 @gol
+-mclflushopt -mxsavec -mxsaves @gol
+-msse4a -m3dnow -mpopcnt -mabm -mbmi -mtbm -mfma4 -mxop -mlzcnt @gol
+-mbmi2 -mfxsr -mxsave -mxsaveopt -mrtm -mlwp -mmpx -mthreads @gol
+-mno-align-stringops  -minline-all-stringops @gol
+-minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
+-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} @gol
+-mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
+-m96bit-long-double -mlong-double-64 -mlong-double-80 -mlong-double-128 @gol
+-mregparm=@var{num}  -msseregparm @gol
+-mveclibabi=@var{type} -mvect8-ret-in-mem @gol
+-mpc32 -mpc64 -mpc80 -mstackrealign @gol
+-momit-leaf-frame-pointer  -mno-red-zone -mno-tls-direct-seg-refs @gol
+-mcmodel=@var{code-model} -mabi=@var{name} -maddress-mode=@var{mode} @gol
+-m32 -m64 -mx32 -m16 -mlarge-data-threshold=@var{num} @gol
+-msse2avx -mfentry -mrecord-mcount -mnop-mcount -m8bit-idiv @gol
+-mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol
+-malign-data=@var{type} -mstack-protector-guard=@var{guard}}
+
+@emph{x86 Windows Options}
+@gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol
+-mnop-fun-dllimport -mthread @gol
+-municode -mwin32 -mwindows -fno-set-stack-executable}
+
 @emph{Xstormy16 Options}
 @gccoptlist{-msim}
 
@@ -11952,8 +11952,6 @@ platform.
 * GNU/Linux Options::
 * H8/300 Options::
 * HPPA Options::
-* x86 Options::
-* x86 Windows Options::
 * IA-64 Options::
 * LM32 Options::
 * M32C Options::
@@ -11989,6 +11987,8 @@ platform.
 * Visium Options::
 * VMS Options::
 * VxWorks Options::
+* x86 Options::
+* x86 Windows Options::
 * Xstormy16 Options::
 * Xtensa Options::
 * zSeries Options::
@@ -15361,6568 +15361,6170 @@ under HP-UX@.  This option sets flags for both the preprocessor and
 linker.
 @end table
 
-@node x86 Options
-@subsection x86 Options
-@cindex x86 Options
+@node IA-64 Options
+@subsection IA-64 Options
+@cindex IA-64 Options
 
-These @samp{-m} options are defined for the x86 family of computers.
+These are the @samp{-m} options defined for the Intel IA-64 architecture.
 
 @table @gcctabopt
+@item -mbig-endian
+@opindex mbig-endian
+Generate code for a big-endian target.  This is the default for HP-UX@.
 
-@item -march=@var{cpu-type}
-@opindex march
-Generate instructions for the machine type @var{cpu-type}.  In contrast to
-@option{-mtune=@var{cpu-type}}, which merely tunes the generated code 
-for the specified @var{cpu-type}, @option{-march=@var{cpu-type}} allows GCC
-to generate code that may not run at all on processors other than the one
-indicated.  Specifying @option{-march=@var{cpu-type}} implies 
-@option{-mtune=@var{cpu-type}}.
-
-The choices for @var{cpu-type} are:
+@item -mlittle-endian
+@opindex mlittle-endian
+Generate code for a little-endian target.  This is the default for AIX5
+and GNU/Linux.
 
-@table @samp
-@item native
-This selects the CPU to generate code for at compilation time by determining
-the processor type of the compiling machine.  Using @option{-march=native}
-enables all instruction subsets supported by the local machine (hence
-the result might not run on different machines).  Using @option{-mtune=native}
-produces code optimized for the local machine under the constraints
-of the selected instruction set.  
+@item -mgnu-as
+@itemx -mno-gnu-as
+@opindex mgnu-as
+@opindex mno-gnu-as
+Generate (or don't) code for the GNU assembler.  This is the default.
+@c Also, this is the default if the configure option @option{--with-gnu-as}
+@c is used.
 
-@item i386
-Original Intel i386 CPU@.
+@item -mgnu-ld
+@itemx -mno-gnu-ld
+@opindex mgnu-ld
+@opindex mno-gnu-ld
+Generate (or don't) code for the GNU linker.  This is the default.
+@c Also, this is the default if the configure option @option{--with-gnu-ld}
+@c is used.
 
-@item i486
-Intel i486 CPU@.  (No scheduling is implemented for this chip.)
+@item -mno-pic
+@opindex mno-pic
+Generate code that does not use a global pointer register.  The result
+is not position independent code, and violates the IA-64 ABI@.
 
-@item i586
-@itemx pentium
-Intel Pentium CPU with no MMX support.
+@item -mvolatile-asm-stop
+@itemx -mno-volatile-asm-stop
+@opindex mvolatile-asm-stop
+@opindex mno-volatile-asm-stop
+Generate (or don't) a stop bit immediately before and after volatile asm
+statements.
 
-@item pentium-mmx
-Intel Pentium MMX CPU, based on Pentium core with MMX instruction set support.
+@item -mregister-names
+@itemx -mno-register-names
+@opindex mregister-names
+@opindex mno-register-names
+Generate (or don't) @samp{in}, @samp{loc}, and @samp{out} register names for
+the stacked registers.  This may make assembler output more readable.
 
-@item pentiumpro
-Intel Pentium Pro CPU@.
+@item -mno-sdata
+@itemx -msdata
+@opindex mno-sdata
+@opindex msdata
+Disable (or enable) optimizations that use the small data section.  This may
+be useful for working around optimizer bugs.
 
-@item i686
-When used with @option{-march}, the Pentium Pro
-instruction set is used, so the code runs on all i686 family chips.
-When used with @option{-mtune}, it has the same meaning as @samp{generic}.
+@item -mconstant-gp
+@opindex mconstant-gp
+Generate code that uses a single constant global pointer value.  This is
+useful when compiling kernel code.
 
-@item pentium2
-Intel Pentium II CPU, based on Pentium Pro core with MMX instruction set
-support.
+@item -mauto-pic
+@opindex mauto-pic
+Generate code that is self-relocatable.  This implies @option{-mconstant-gp}.
+This is useful when compiling firmware code.
 
-@item pentium3
-@itemx pentium3m
-Intel Pentium III CPU, based on Pentium Pro core with MMX and SSE instruction
-set support.
+@item -minline-float-divide-min-latency
+@opindex minline-float-divide-min-latency
+Generate code for inline divides of floating-point values
+using the minimum latency algorithm.
 
-@item pentium-m
-Intel Pentium M; low-power version of Intel Pentium III CPU
-with MMX, SSE and SSE2 instruction set support.  Used by Centrino notebooks.
+@item -minline-float-divide-max-throughput
+@opindex minline-float-divide-max-throughput
+Generate code for inline divides of floating-point values
+using the maximum throughput algorithm.
 
-@item pentium4
-@itemx pentium4m
-Intel Pentium 4 CPU with MMX, SSE and SSE2 instruction set support.
+@item -mno-inline-float-divide
+@opindex mno-inline-float-divide
+Do not generate inline code for divides of floating-point values.
 
-@item prescott
-Improved version of Intel Pentium 4 CPU with MMX, SSE, SSE2 and SSE3 instruction
-set support.
+@item -minline-int-divide-min-latency
+@opindex minline-int-divide-min-latency
+Generate code for inline divides of integer values
+using the minimum latency algorithm.
 
-@item nocona
-Improved version of Intel Pentium 4 CPU with 64-bit extensions, MMX, SSE,
-SSE2 and SSE3 instruction set support.
+@item -minline-int-divide-max-throughput
+@opindex minline-int-divide-max-throughput
+Generate code for inline divides of integer values
+using the maximum throughput algorithm.
 
-@item core2
-Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3
-instruction set support.
+@item -mno-inline-int-divide
+@opindex mno-inline-int-divide
+Do not generate inline code for divides of integer values.
 
-@item nehalem
-Intel Nehalem CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
-SSE4.1, SSE4.2 and POPCNT instruction set support.
+@item -minline-sqrt-min-latency
+@opindex minline-sqrt-min-latency
+Generate code for inline square roots
+using the minimum latency algorithm.
 
-@item westmere
-Intel Westmere CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
-SSE4.1, SSE4.2, POPCNT, AES and PCLMUL instruction set support.
+@item -minline-sqrt-max-throughput
+@opindex minline-sqrt-max-throughput
+Generate code for inline square roots
+using the maximum throughput algorithm.
 
-@item sandybridge
-Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
-SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set support.
+@item -mno-inline-sqrt
+@opindex mno-inline-sqrt
+Do not generate inline code for @code{sqrt}.
 
-@item ivybridge
-Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
-SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C
-instruction set support.
+@item -mfused-madd
+@itemx -mno-fused-madd
+@opindex mfused-madd
+@opindex mno-fused-madd
+Do (don't) generate code that uses the fused multiply/add or multiply/subtract
+instructions.  The default is to use these instructions.
 
-@item haswell
-Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
-SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
-BMI, BMI2 and F16C instruction set support.
+@item -mno-dwarf2-asm
+@itemx -mdwarf2-asm
+@opindex mno-dwarf2-asm
+@opindex mdwarf2-asm
+Don't (or do) generate assembler code for the DWARF 2 line number debugging
+info.  This may be useful when not using the GNU assembler.
 
-@item broadwell
-Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
-SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
-BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set support.
+@item -mearly-stop-bits
+@itemx -mno-early-stop-bits
+@opindex mearly-stop-bits
+@opindex mno-early-stop-bits
+Allow stop bits to be placed earlier than immediately preceding the
+instruction that triggered the stop bit.  This can improve instruction
+scheduling, but does not always do so.
 
-@item bonnell
-Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and SSSE3
-instruction set support.
+@item -mfixed-range=@var{register-range}
+@opindex mfixed-range
+Generate code treating the given register range as fixed registers.
+A fixed register is one that the register allocator cannot use.  This is
+useful when compiling kernel code.  A register range is specified as
+two registers separated by a dash.  Multiple register ranges can be
+specified separated by a comma.
 
-@item silvermont
-Intel Silvermont CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
-SSE4.1, SSE4.2, POPCNT, AES, PCLMUL and RDRND instruction set support.
+@item -mtls-size=@var{tls-size}
+@opindex mtls-size
+Specify bit size of immediate TLS offsets.  Valid values are 14, 22, and
+64.
 
-@item k6
-AMD K6 CPU with MMX instruction set support.
+@item -mtune=@var{cpu-type}
+@opindex mtune
+Tune the instruction scheduling for a particular CPU.  Valid values are
+@samp{itanium}, @samp{itanium1}, @samp{merced}, @samp{itanium2},
+and @samp{mckinley}.
 
-@item k6-2
-@itemx k6-3
-Improved versions of AMD K6 CPU with MMX and 3DNow!@: instruction set support.
+@item -milp32
+@itemx -mlp64
+@opindex milp32
+@opindex mlp64
+Generate code for a 32-bit or 64-bit environment.
+The 32-bit environment sets int, long and pointer to 32 bits.
+The 64-bit environment sets int to 32 bits and long and pointer
+to 64 bits.  These are HP-UX specific flags.
 
-@item athlon
-@itemx athlon-tbird
-AMD Athlon CPU with MMX, 3dNOW!, enhanced 3DNow!@: and SSE prefetch instructions
-support.
+@item -mno-sched-br-data-spec
+@itemx -msched-br-data-spec
+@opindex mno-sched-br-data-spec
+@opindex msched-br-data-spec
+(Dis/En)able data speculative scheduling before reload.
+This results in generation of @code{ld.a} instructions and
+the corresponding check instructions (@code{ld.c} / @code{chk.a}).
+The default is 'disable'.
 
-@item athlon-4
-@itemx athlon-xp
-@itemx athlon-mp
-Improved AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow!@: and full SSE
-instruction set support.
+@item -msched-ar-data-spec
+@itemx -mno-sched-ar-data-spec
+@opindex msched-ar-data-spec
+@opindex mno-sched-ar-data-spec
+(En/Dis)able data speculative scheduling after reload.
+This results in generation of @code{ld.a} instructions and
+the corresponding check instructions (@code{ld.c} / @code{chk.a}).
+The default is 'enable'.
 
-@item k8
-@itemx opteron
-@itemx athlon64
-@itemx athlon-fx
-Processors based on the AMD K8 core with x86-64 instruction set support,
-including the AMD Opteron, Athlon 64, and Athlon 64 FX processors.
-(This supersets MMX, SSE, SSE2, 3DNow!, enhanced 3DNow!@: and 64-bit
-instruction set extensions.)
+@item -mno-sched-control-spec
+@itemx -msched-control-spec
+@opindex mno-sched-control-spec
+@opindex msched-control-spec
+(Dis/En)able control speculative scheduling.  This feature is
+available only during region scheduling (i.e.@: before reload).
+This results in generation of the @code{ld.s} instructions and
+the corresponding check instructions @code{chk.s}.
+The default is 'disable'.
 
-@item k8-sse3
-@itemx opteron-sse3
-@itemx athlon64-sse3
-Improved versions of AMD K8 cores with SSE3 instruction set support.
+@item -msched-br-in-data-spec
+@itemx -mno-sched-br-in-data-spec
+@opindex msched-br-in-data-spec
+@opindex mno-sched-br-in-data-spec
+(En/Dis)able speculative scheduling of the instructions that
+are dependent on the data speculative loads before reload.
+This is effective only with @option{-msched-br-data-spec} enabled.
+The default is 'enable'.
 
-@item amdfam10
-@itemx barcelona
-CPUs based on AMD Family 10h cores with x86-64 instruction set support.  (This
-supersets MMX, SSE, SSE2, SSE3, SSE4A, 3DNow!, enhanced 3DNow!, ABM and 64-bit
-instruction set extensions.)
+@item -msched-ar-in-data-spec
+@itemx -mno-sched-ar-in-data-spec
+@opindex msched-ar-in-data-spec
+@opindex mno-sched-ar-in-data-spec
+(En/Dis)able speculative scheduling of the instructions that
+are dependent on the data speculative loads after reload.
+This is effective only with @option{-msched-ar-data-spec} enabled.
+The default is 'enable'.
 
-@item bdver1
-CPUs based on AMD Family 15h cores with x86-64 instruction set support.  (This
-supersets FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A,
-SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set extensions.)
-@item bdver2
-AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
-supersets BMI, TBM, F16C, FMA, FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX,
-SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set 
-extensions.)
-@item bdver3
-AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
-supersets BMI, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, XOP, LWP, AES, 
-PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 
-64-bit instruction set extensions.
-@item bdver4
-AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
-supersets BMI, BMI2, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, AVX2, XOP, LWP, 
-AES, PCL_MUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, 
-SSE4.2, ABM and 64-bit instruction set extensions.
+@item -msched-in-control-spec
+@itemx -mno-sched-in-control-spec
+@opindex msched-in-control-spec
+@opindex mno-sched-in-control-spec
+(En/Dis)able speculative scheduling of the instructions that
+are dependent on the control speculative loads.
+This is effective only with @option{-msched-control-spec} enabled.
+The default is 'enable'.
 
-@item btver1
-CPUs based on AMD Family 14h cores with x86-64 instruction set support.  (This
-supersets MMX, SSE, SSE2, SSE3, SSSE3, SSE4A, CX16, ABM and 64-bit
-instruction set extensions.)
+@item -mno-sched-prefer-non-data-spec-insns
+@itemx -msched-prefer-non-data-spec-insns
+@opindex mno-sched-prefer-non-data-spec-insns
+@opindex msched-prefer-non-data-spec-insns
+If enabled, data-speculative instructions are chosen for schedule
+only if there are no other choices at the moment.  This makes
+the use of the data speculation much more conservative.
+The default is 'disable'.
 
-@item btver2
-CPUs based on AMD Family 16h cores with x86-64 instruction set support. This
-includes MOVBE, F16C, BMI, AVX, PCL_MUL, AES, SSE4.2, SSE4.1, CX16, ABM,
-SSE4A, SSSE3, SSE3, SSE2, SSE, MMX and 64-bit instruction set extensions.
+@item -mno-sched-prefer-non-control-spec-insns
+@itemx -msched-prefer-non-control-spec-insns
+@opindex mno-sched-prefer-non-control-spec-insns
+@opindex msched-prefer-non-control-spec-insns
+If enabled, control-speculative instructions are chosen for schedule
+only if there are no other choices at the moment.  This makes
+the use of the control speculation much more conservative.
+The default is 'disable'.
 
-@item winchip-c6
-IDT WinChip C6 CPU, dealt in same way as i486 with additional MMX instruction
-set support.
+@item -mno-sched-count-spec-in-critical-path
+@itemx -msched-count-spec-in-critical-path
+@opindex mno-sched-count-spec-in-critical-path
+@opindex msched-count-spec-in-critical-path
+If enabled, speculative dependencies are considered during
+computation of the instructions priorities.  This makes the use of the
+speculation a bit more conservative.
+The default is 'disable'.
 
-@item winchip2
-IDT WinChip 2 CPU, dealt in same way as i486 with additional MMX and 3DNow!@:
-instruction set support.
+@item -msched-spec-ldc
+@opindex msched-spec-ldc
+Use a simple data speculation check.  This option is on by default.
 
-@item c3
-VIA C3 CPU with MMX and 3DNow!@: instruction set support.  (No scheduling is
-implemented for this chip.)
+@item -msched-control-spec-ldc
+@opindex msched-control-spec-ldc
+Use a simple check for control speculation.  This option is on by default.
 
-@item c3-2
-VIA C3-2 (Nehemiah/C5XL) CPU with MMX and SSE instruction set support.
-(No scheduling is
-implemented for this chip.)
+@item -msched-stop-bits-after-every-cycle
+@opindex msched-stop-bits-after-every-cycle
+Place a stop bit after every cycle when scheduling.  This option is on
+by default.
 
-@item geode
-AMD Geode embedded processor with MMX and 3DNow!@: instruction set support.
-@end table
+@item -msched-fp-mem-deps-zero-cost
+@opindex msched-fp-mem-deps-zero-cost
+Assume that floating-point stores and loads are not likely to cause a conflict
+when placed into the same instruction group.  This option is disabled by
+default.
 
-@item -mtune=@var{cpu-type}
-@opindex mtune
-Tune to @var{cpu-type} everything applicable about the generated code, except
-for the ABI and the set of available instructions.  
-While picking a specific @var{cpu-type} schedules things appropriately
-for that particular chip, the compiler does not generate any code that
-cannot run on the default machine type unless you use a
-@option{-march=@var{cpu-type}} option.
-For example, if GCC is configured for i686-pc-linux-gnu
-then @option{-mtune=pentium4} generates code that is tuned for Pentium 4
-but still runs on i686 machines.
+@item -msel-sched-dont-check-control-spec
+@opindex msel-sched-dont-check-control-spec
+Generate checks for control speculation in selective scheduling.
+This flag is disabled by default.
 
-The choices for @var{cpu-type} are the same as for @option{-march}.
-In addition, @option{-mtune} supports 2 extra choices for @var{cpu-type}:
+@item -msched-max-memory-insns=@var{max-insns}
+@opindex msched-max-memory-insns
+Limit on the number of memory insns per instruction group, giving lower
+priority to subsequent memory insns attempting to schedule in the same
+instruction group. Frequently useful to prevent cache bank conflicts.
+The default value is 1.
 
-@table @samp
-@item generic
-Produce code optimized for the most common IA32/@/AMD64/@/EM64T processors.
-If you know the CPU on which your code will run, then you should use
-the corresponding @option{-mtune} or @option{-march} option instead of
-@option{-mtune=generic}.  But, if you do not know exactly what CPU users
-of your application will have, then you should use this option.
+@item -msched-max-memory-insns-hard-limit
+@opindex msched-max-memory-insns-hard-limit
+Makes the limit specified by @option{-msched-max-memory-insns} a hard limit,
+disallowing more than that number in an instruction group.
+Otherwise, the limit is ``soft'', meaning that non-memory operations
+are preferred when the limit is reached, but memory operations may still
+be scheduled.
 
-As new processors are deployed in the marketplace, the behavior of this
-option will change.  Therefore, if you upgrade to a newer version of
-GCC, code generation controlled by this option will change to reflect
-the processors
-that are most common at the time that version of GCC is released.
+@end table
 
-There is no @option{-march=generic} option because @option{-march}
-indicates the instruction set the compiler can use, and there is no
-generic instruction set applicable to all processors.  In contrast,
-@option{-mtune} indicates the processor (or, in this case, collection of
-processors) for which the code is optimized.
+@node LM32 Options
+@subsection LM32 Options
+@cindex LM32 options
 
-@item intel
-Produce code optimized for the most current Intel processors, which are
-Haswell and Silvermont for this version of GCC.  If you know the CPU
-on which your code will run, then you should use the corresponding
-@option{-mtune} or @option{-march} option instead of @option{-mtune=intel}.
-But, if you want your application performs better on both Haswell and
-Silvermont, then you should use this option.
+These @option{-m} options are defined for the LatticeMico32 architecture:
 
-As new Intel processors are deployed in the marketplace, the behavior of
-this option will change.  Therefore, if you upgrade to a newer version of
-GCC, code generation controlled by this option will change to reflect
-the most current Intel processors at the time that version of GCC is
-released.
+@table @gcctabopt
+@item -mbarrel-shift-enabled
+@opindex mbarrel-shift-enabled
+Enable barrel-shift instructions.
 
-There is no @option{-march=intel} option because @option{-march} indicates
-the instruction set the compiler can use, and there is no common
-instruction set applicable to all processors.  In contrast,
-@option{-mtune} indicates the processor (or, in this case, collection of
-processors) for which the code is optimized.
-@end table
+@item -mdivide-enabled
+@opindex mdivide-enabled
+Enable divide and modulus instructions.
 
-@item -mcpu=@var{cpu-type}
-@opindex mcpu
-A deprecated synonym for @option{-mtune}.
+@item -mmultiply-enabled
+@opindex mmultiply-enabled
+Enable multiply instructions.
 
-@item -mfpmath=@var{unit}
-@opindex mfpmath
-Generate floating-point arithmetic for selected unit @var{unit}.  The choices
-for @var{unit} are:
+@item -msign-extend-enabled
+@opindex msign-extend-enabled
+Enable sign extend instructions.
 
-@table @samp
-@item 387
-Use the standard 387 floating-point coprocessor present on the majority of chips and
-emulated otherwise.  Code compiled with this option runs almost everywhere.
-The temporary results are computed in 80-bit precision instead of the precision
-specified by the type, resulting in slightly different results compared to most
-of other chips.  See @option{-ffloat-store} for more detailed description.
+@item -muser-enabled
+@opindex muser-enabled
+Enable user-defined instructions.
 
-This is the default choice for x86-32 targets.
+@end table
 
-@item sse
-Use scalar floating-point instructions present in the SSE instruction set.
-This instruction set is supported by Pentium III and newer chips,
-and in the AMD line
-by Athlon-4, Athlon XP and Athlon MP chips.  The earlier version of the SSE
-instruction set supports only single-precision arithmetic, thus the double and
-extended-precision arithmetic are still done using 387.  A later version, present
-only in Pentium 4 and AMD x86-64 chips, supports double-precision
-arithmetic too.
+@node M32C Options
+@subsection M32C Options
+@cindex M32C options
 
-For the x86-32 compiler, you must use @option{-march=@var{cpu-type}}, @option{-msse}
-or @option{-msse2} switches to enable SSE extensions and make this option
-effective.  For the x86-64 compiler, these extensions are enabled by default.
+@table @gcctabopt
+@item -mcpu=@var{name}
+@opindex mcpu=
+Select the CPU for which code is generated.  @var{name} may be one of
+@samp{r8c} for the R8C/Tiny series, @samp{m16c} for the M16C (up to
+/60) series, @samp{m32cm} for the M16C/80 series, or @samp{m32c} for
+the M32C/80 series.
 
-The resulting code should be considerably faster in the majority of cases and avoid
-the numerical instability problems of 387 code, but may break some existing
-code that expects temporaries to be 80 bits.
+@item -msim
+@opindex msim
+Specifies that the program will be run on the simulator.  This causes
+an alternate runtime library to be linked in which supports, for
+example, file I/O@.  You must not use this option when generating
+programs that will run on real hardware; you must provide your own
+runtime library for whatever I/O functions are needed.
 
-This is the default choice for the x86-64 compiler.
+@item -memregs=@var{number}
+@opindex memregs=
+Specifies the number of memory-based pseudo-registers GCC uses
+during code generation.  These pseudo-registers are used like real
+registers, so there is a tradeoff between GCC's ability to fit the
+code into available registers, and the performance penalty of using
+memory instead of registers.  Note that all modules in a program must
+be compiled with the same value for this option.  Because of that, you
+must not use this option with GCC's default runtime libraries.
 
-@item sse,387
-@itemx sse+387
-@itemx both
-Attempt to utilize both instruction sets at once.  This effectively doubles the
-amount of available registers, and on chips with separate execution units for
-387 and SSE the execution resources too.  Use this option with care, as it is
-still experimental, because the GCC register allocator does not model separate
-functional units well, resulting in unstable performance.
 @end table
 
-@item -masm=@var{dialect}
-@opindex masm=@var{dialect}
-Output assembly instructions using selected @var{dialect}.  Supported
-choices are @samp{intel} or @samp{att} (the default).  Darwin does
-not support @samp{intel}.
+@node M32R/D Options
+@subsection M32R/D Options
+@cindex M32R/D options
 
-@item -mieee-fp
-@itemx -mno-ieee-fp
-@opindex mieee-fp
-@opindex mno-ieee-fp
-Control whether or not the compiler uses IEEE floating-point
-comparisons.  These correctly handle the case where the result of a
-comparison is unordered.
+These @option{-m} options are defined for Renesas M32R/D architectures:
 
-@item -msoft-float
-@opindex msoft-float
-Generate output containing library calls for floating point.
+@table @gcctabopt
+@item -m32r2
+@opindex m32r2
+Generate code for the M32R/2@.
 
-@strong{Warning:} the requisite libraries are not part of GCC@.
-Normally the facilities of the machine's usual C compiler are used, but
-this can't be done directly in cross-compilation.  You must make your
-own arrangements to provide suitable library functions for
-cross-compilation.
+@item -m32rx
+@opindex m32rx
+Generate code for the M32R/X@.
 
-On machines where a function returns floating-point results in the 80387
-register stack, some floating-point opcodes may be emitted even if
-@option{-msoft-float} is used.
+@item -m32r
+@opindex m32r
+Generate code for the M32R@.  This is the default.
 
-@item -mno-fp-ret-in-387
-@opindex mno-fp-ret-in-387
-Do not use the FPU registers for return values of functions.
+@item -mmodel=small
+@opindex mmodel=small
+Assume all objects live in the lower 16MB of memory (so that their addresses
+can be loaded with the @code{ld24} instruction), and assume all subroutines
+are reachable with the @code{bl} instruction.
+This is the default.
 
-The usual calling convention has functions return values of types
-@code{float} and @code{double} in an FPU register, even if there
-is no FPU@.  The idea is that the operating system should emulate
-an FPU@.
+The addressability of a particular object can be set with the
+@code{model} attribute.
 
-The option @option{-mno-fp-ret-in-387} causes such values to be returned
-in ordinary CPU registers instead.
+@item -mmodel=medium
+@opindex mmodel=medium
+Assume objects may be anywhere in the 32-bit address space (the compiler
+generates @code{seth/add3} instructions to load their addresses), and
+assume all subroutines are reachable with the @code{bl} instruction.
 
-@item -mno-fancy-math-387
-@opindex mno-fancy-math-387
-Some 387 emulators do not support the @code{sin}, @code{cos} and
-@code{sqrt} instructions for the 387.  Specify this option to avoid
-generating those instructions.  This option is the default on FreeBSD,
-OpenBSD and NetBSD@.  This option is overridden when @option{-march}
-indicates that the target CPU always has an FPU and so the
-instruction does not need emulation.  These
-instructions are not generated unless you also use the
-@option{-funsafe-math-optimizations} switch.
+@item -mmodel=large
+@opindex mmodel=large
+Assume objects may be anywhere in the 32-bit address space (the compiler
+generates @code{seth/add3} instructions to load their addresses), and
+assume subroutines may not be reachable with the @code{bl} instruction
+(the compiler generates the much slower @code{seth/add3/jl}
+instruction sequence).
 
-@item -malign-double
-@itemx -mno-align-double
-@opindex malign-double
-@opindex mno-align-double
-Control whether GCC aligns @code{double}, @code{long double}, and
-@code{long long} variables on a two-word boundary or a one-word
-boundary.  Aligning @code{double} variables on a two-word boundary
-produces code that runs somewhat faster on a Pentium at the
-expense of more memory.
+@item -msdata=none
+@opindex msdata=none
+Disable use of the small data area.  Variables are put into
+one of @code{.data}, @code{.bss}, or @code{.rodata} (unless the
+@code{section} attribute has been specified).
+This is the default.
 
-On x86-64, @option{-malign-double} is enabled by default.
+The small data area consists of sections @code{.sdata} and @code{.sbss}.
+Objects may be explicitly put in the small data area with the
+@code{section} attribute using one of these sections.
 
-@strong{Warning:} if you use the @option{-malign-double} switch,
-structures containing the above types are aligned differently than
-the published application binary interface specifications for the x86-32
-and are not binary compatible with structures in code compiled
-without that switch.
+@item -msdata=sdata
+@opindex msdata=sdata
+Put small global and static data in the small data area, but do not
+generate special code to reference them.
 
-@item -m96bit-long-double
-@itemx -m128bit-long-double
-@opindex m96bit-long-double
-@opindex m128bit-long-double
-These switches control the size of @code{long double} type.  The x86-32
-application binary interface specifies the size to be 96 bits,
-so @option{-m96bit-long-double} is the default in 32-bit mode.
+@item -msdata=use
+@opindex msdata=use
+Put small global and static data in the small data area, and generate
+special instructions to reference them.
 
-Modern architectures (Pentium and newer) prefer @code{long double}
-to be aligned to an 8- or 16-byte boundary.  In arrays or structures
-conforming to the ABI, this is not possible.  So specifying
-@option{-m128bit-long-double} aligns @code{long double}
-to a 16-byte boundary by padding the @code{long double} with an additional
-32-bit zero.
+@item -G @var{num}
+@opindex G
+@cindex smaller data references
+Put global and static objects less than or equal to @var{num} bytes
+into the small data or BSS sections instead of the normal data or BSS
+sections.  The default value of @var{num} is 8.
+The @option{-msdata} option must be set to one of @samp{sdata} or @samp{use}
+for this option to have any effect.
 
-In the x86-64 compiler, @option{-m128bit-long-double} is the default choice as
-its ABI specifies that @code{long double} is aligned on 16-byte boundary.
+All modules should be compiled with the same @option{-G @var{num}} value.
+Compiling with different values of @var{num} may or may not work; if it
+doesn't, the linker gives an error message---incorrect code is not
+generated.
 
-Notice that neither of these options enable any extra precision over the x87
-standard of 80 bits for a @code{long double}.
+@item -mdebug
+@opindex mdebug
+Makes the M32R-specific code in the compiler display some statistics
+that might help in debugging programs.
 
-@strong{Warning:} if you override the default value for your target ABI, this
-changes the size of 
-structures and arrays containing @code{long double} variables,
-as well as modifying the function calling convention for functions taking
-@code{long double}.  Hence they are not binary-compatible
-with code compiled without that switch.
+@item -malign-loops
+@opindex malign-loops
+Align all loops to a 32-byte boundary.
 
-@item -mlong-double-64
-@itemx -mlong-double-80
-@itemx -mlong-double-128
-@opindex mlong-double-64
-@opindex mlong-double-80
-@opindex mlong-double-128
-These switches control the size of @code{long double} type. A size
-of 64 bits makes the @code{long double} type equivalent to the @code{double}
-type. This is the default for 32-bit Bionic C library.  A size
-of 128 bits makes the @code{long double} type equivalent to the
-@code{__float128} type. This is the default for 64-bit Bionic C library.
-
-@strong{Warning:} if you override the default value for your target ABI, this
-changes the size of
-structures and arrays containing @code{long double} variables,
-as well as modifying the function calling convention for functions taking
-@code{long double}.  Hence they are not binary-compatible
-with code compiled without that switch.
+@item -mno-align-loops
+@opindex mno-align-loops
+Do not enforce a 32-byte alignment for loops.  This is the default.
 
-@item -malign-data=@var{type}
-@opindex malign-data
-Control how GCC aligns variables.  Supported values for @var{type} are
-@samp{compat} uses increased alignment value compatible uses GCC 4.8
-and earlier, @samp{abi} uses alignment value as specified by the
-psABI, and @samp{cacheline} uses increased alignment value to match
-the cache line size.  @samp{compat} is the default.
+@item -missue-rate=@var{number}
+@opindex missue-rate=@var{number}
+Issue @var{number} instructions per cycle.  @var{number} can only be 1
+or 2.
 
-@item -mlarge-data-threshold=@var{threshold}
-@opindex mlarge-data-threshold
-When @option{-mcmodel=medium} is specified, data objects larger than
-@var{threshold} are placed in the large data section.  This value must be the
-same across all objects linked into the binary, and defaults to 65535.
+@item -mbranch-cost=@var{number}
+@opindex mbranch-cost=@var{number}
+@var{number} can only be 1 or 2.  If it is 1, then branches are
+preferred over conditional code; if it is 2, then the opposite applies.
 
-@item -mrtd
-@opindex mrtd
-Use a different function-calling convention, in which functions that
-take a fixed number of arguments return with the @code{ret @var{num}}
-instruction, which pops their arguments while returning.  This saves one
-instruction in the caller since there is no need to pop the arguments
-there.
+@item -mflush-trap=@var{number}
+@opindex mflush-trap=@var{number}
+Specifies the trap number to use to flush the cache.  The default is
+12.  Valid numbers are between 0 and 15 inclusive.
 
-You can specify that an individual function is called with this calling
-sequence with the function attribute @code{stdcall}.  You can also
-override the @option{-mrtd} option by using the function attribute
-@code{cdecl}.  @xref{Function Attributes}.
+@item -mno-flush-trap
+@opindex mno-flush-trap
+Specifies that the cache cannot be flushed by using a trap.
 
-@strong{Warning:} this calling convention is incompatible with the one
-normally used on Unix, so you cannot use it if you need to call
-libraries compiled with the Unix compiler.
+@item -mflush-func=@var{name}
+@opindex mflush-func=@var{name}
+Specifies the name of the operating system function to call to flush
+the cache.  The default is @samp{_flush_cache}, but a function call
+is only used if a trap is not available.
 
-Also, you must provide function prototypes for all functions that
-take variable numbers of arguments (including @code{printf});
-otherwise incorrect code is generated for calls to those
-functions.
+@item -mno-flush-func
+@opindex mno-flush-func
+Indicates that there is no OS function for flushing the cache.
 
-In addition, seriously incorrect code results if you call a
-function with too many arguments.  (Normally, extra arguments are
-harmlessly ignored.)
+@end table
 
-@item -mregparm=@var{num}
-@opindex mregparm
-Control how many registers are used to pass integer arguments.  By
-default, no registers are used to pass arguments, and at most 3
-registers can be used.  You can control this behavior for a specific
-function by using the function attribute @code{regparm}.
-@xref{Function Attributes}.
+@node M680x0 Options
+@subsection M680x0 Options
+@cindex M680x0 options
 
-@strong{Warning:} if you use this switch, and
-@var{num} is nonzero, then you must build all modules with the same
-value, including any libraries.  This includes the system libraries and
-startup modules.
+These are the @samp{-m} options defined for M680x0 and ColdFire processors.
+The default settings depend on which architecture was selected when
+the compiler was configured; the defaults for the most common choices
+are given below.
 
-@item -msseregparm
-@opindex msseregparm
-Use SSE register passing conventions for float and double arguments
-and return values.  You can control this behavior for a specific
-function by using the function attribute @code{sseregparm}.
-@xref{Function Attributes}.
+@table @gcctabopt
+@item -march=@var{arch}
+@opindex march
+Generate code for a specific M680x0 or ColdFire instruction set
+architecture.  Permissible values of @var{arch} for M680x0
+architectures are: @samp{68000}, @samp{68010}, @samp{68020},
+@samp{68030}, @samp{68040}, @samp{68060} and @samp{cpu32}.  ColdFire
+architectures are selected according to Freescale's ISA classification
+and the permissible values are: @samp{isaa}, @samp{isaaplus},
+@samp{isab} and @samp{isac}.
 
-@strong{Warning:} if you use this switch then you must build all
-modules with the same value, including any libraries.  This includes
-the system libraries and startup modules.
+GCC defines a macro @code{__mcf@var{arch}__} whenever it is generating
+code for a ColdFire target.  The @var{arch} in this macro is one of the
+@option{-march} arguments given above.
 
-@item -mvect8-ret-in-mem
-@opindex mvect8-ret-in-mem
-Return 8-byte vectors in memory instead of MMX registers.  This is the
-default on Solaris@tie{}8 and 9 and VxWorks to match the ABI of the Sun
-Studio compilers until version 12.  Later compiler versions (starting
-with Studio 12 Update@tie{}1) follow the ABI used by other x86 targets, which
-is the default on Solaris@tie{}10 and later.  @emph{Only} use this option if
-you need to remain compatible with existing code produced by those
-previous compiler versions or older versions of GCC@.
+When used together, @option{-march} and @option{-mtune} select code
+that runs on a family of similar processors but that is optimized
+for a particular microarchitecture.
 
-@item -mpc32
-@itemx -mpc64
-@itemx -mpc80
-@opindex mpc32
-@opindex mpc64
-@opindex mpc80
+@item -mcpu=@var{cpu}
+@opindex mcpu
+Generate code for a specific M680x0 or ColdFire processor.
+The M680x0 @var{cpu}s are: @samp{68000}, @samp{68010}, @samp{68020},
+@samp{68030}, @samp{68040}, @samp{68060}, @samp{68302}, @samp{68332}
+and @samp{cpu32}.  The ColdFire @var{cpu}s are given by the table
+below, which also classifies the CPUs into families:
 
-Set 80387 floating-point precision to 32, 64 or 80 bits.  When @option{-mpc32}
-is specified, the significands of results of floating-point operations are
-rounded to 24 bits (single precision); @option{-mpc64} rounds the
-significands of results of floating-point operations to 53 bits (double
-precision) and @option{-mpc80} rounds the significands of results of
-floating-point operations to 64 bits (extended double precision), which is
-the default.  When this option is used, floating-point operations in higher
-precisions are not available to the programmer without setting the FPU
-control word explicitly.
+@multitable @columnfractions 0.20 0.80
+@item @strong{Family} @tab @strong{@samp{-mcpu} arguments}
+@item @samp{51} @tab @samp{51} @samp{51ac} @samp{51ag} @samp{51cn} @samp{51em} @samp{51je} @samp{51jf} @samp{51jg} @samp{51jm} @samp{51mm} @samp{51qe} @samp{51qm}
+@item @samp{5206} @tab @samp{5202} @samp{5204} @samp{5206}
+@item @samp{5206e} @tab @samp{5206e}
+@item @samp{5208} @tab @samp{5207} @samp{5208}
+@item @samp{5211a} @tab @samp{5210a} @samp{5211a}
+@item @samp{5213} @tab @samp{5211} @samp{5212} @samp{5213}
+@item @samp{5216} @tab @samp{5214} @samp{5216}
+@item @samp{52235} @tab @samp{52230} @samp{52231} @samp{52232} @samp{52233} @samp{52234} @samp{52235}
+@item @samp{5225} @tab @samp{5224} @samp{5225}
+@item @samp{52259} @tab @samp{52252} @samp{52254} @samp{52255} @samp{52256} @samp{52258} @samp{52259}
+@item @samp{5235} @tab @samp{5232} @samp{5233} @samp{5234} @samp{5235} @samp{523x}
+@item @samp{5249} @tab @samp{5249}
+@item @samp{5250} @tab @samp{5250}
+@item @samp{5271} @tab @samp{5270} @samp{5271}
+@item @samp{5272} @tab @samp{5272}
+@item @samp{5275} @tab @samp{5274} @samp{5275}
+@item @samp{5282} @tab @samp{5280} @samp{5281} @samp{5282} @samp{528x}
+@item @samp{53017} @tab @samp{53011} @samp{53012} @samp{53013} @samp{53014} @samp{53015} @samp{53016} @samp{53017}
+@item @samp{5307} @tab @samp{5307}
+@item @samp{5329} @tab @samp{5327} @samp{5328} @samp{5329} @samp{532x}
+@item @samp{5373} @tab @samp{5372} @samp{5373} @samp{537x}
+@item @samp{5407} @tab @samp{5407}
+@item @samp{5475} @tab @samp{5470} @samp{5471} @samp{5472} @samp{5473} @samp{5474} @samp{5475} @samp{547x} @samp{5480} @samp{5481} @samp{5482} @samp{5483} @samp{5484} @samp{5485}
+@end multitable
 
-Setting the rounding of floating-point operations to less than the default
-80 bits can speed some programs by 2% or more.  Note that some mathematical
-libraries assume that extended-precision (80-bit) floating-point operations
-are enabled by default; routines in such libraries could suffer significant
-loss of accuracy, typically through so-called ``catastrophic cancellation'',
-when this option is used to set the precision to less than extended precision.
+@option{-mcpu=@var{cpu}} overrides @option{-march=@var{arch}} if
+@var{arch} is compatible with @var{cpu}.  Other combinations of
+@option{-mcpu} and @option{-march} are rejected.
 
-@item -mstackrealign
-@opindex mstackrealign
-Realign the stack at entry.  On the x86, the @option{-mstackrealign}
-option generates an alternate prologue and epilogue that realigns the
-run-time stack if necessary.  This supports mixing legacy codes that keep
-4-byte stack alignment with modern codes that keep 16-byte stack alignment for
-SSE compatibility.  See also the attribute @code{force_align_arg_pointer},
-applicable to individual functions.
+GCC defines the macro @code{__mcf_cpu_@var{cpu}} when ColdFire target
+@var{cpu} is selected.  It also defines @code{__mcf_family_@var{family}},
+where the value of @var{family} is given by the table above.
 
-@item -mpreferred-stack-boundary=@var{num}
-@opindex mpreferred-stack-boundary
-Attempt to keep the stack boundary aligned to a 2 raised to @var{num}
-byte boundary.  If @option{-mpreferred-stack-boundary} is not specified,
-the default is 4 (16 bytes or 128 bits).
+@item -mtune=@var{tune}
+@opindex mtune
+Tune the code for a particular microarchitecture within the
+constraints set by @option{-march} and @option{-mcpu}.
+The M680x0 microarchitectures are: @samp{68000}, @samp{68010},
+@samp{68020}, @samp{68030}, @samp{68040}, @samp{68060}
+and @samp{cpu32}.  The ColdFire microarchitectures
+are: @samp{cfv1}, @samp{cfv2}, @samp{cfv3}, @samp{cfv4} and @samp{cfv4e}.
 
-@strong{Warning:} When generating code for the x86-64 architecture with
-SSE extensions disabled, @option{-mpreferred-stack-boundary=3} can be
-used to keep the stack boundary aligned to 8 byte boundary.  Since
-x86-64 ABI require 16 byte stack alignment, this is ABI incompatible and
-intended to be used in controlled environment where stack space is
-important limitation.  This option leads to wrong code when functions
-compiled with 16 byte stack alignment (such as functions from a standard
-library) are called with misaligned stack.  In this case, SSE
-instructions may lead to misaligned memory access traps.  In addition,
-variable arguments are handled incorrectly for 16 byte aligned
-objects (including x87 long double and __int128), leading to wrong
-results.  You must build all modules with
-@option{-mpreferred-stack-boundary=3}, including any libraries.  This
-includes the system libraries and startup modules.
+You can also use @option{-mtune=68020-40} for code that needs
+to run relatively well on 68020, 68030 and 68040 targets.
+@option{-mtune=68020-60} is similar but includes 68060 targets
+as well.  These two options select the same tuning decisions as
+@option{-m68020-40} and @option{-m68020-60} respectively.
 
-@item -mincoming-stack-boundary=@var{num}
-@opindex mincoming-stack-boundary
-Assume the incoming stack is aligned to a 2 raised to @var{num} byte
-boundary.  If @option{-mincoming-stack-boundary} is not specified,
-the one specified by @option{-mpreferred-stack-boundary} is used.
+GCC defines the macros @code{__mc@var{arch}} and @code{__mc@var{arch}__}
+when tuning for 680x0 architecture @var{arch}.  It also defines
+@code{mc@var{arch}} unless either @option{-ansi} or a non-GNU @option{-std}
+option is used.  If GCC is tuning for a range of architectures,
+as selected by @option{-mtune=68020-40} or @option{-mtune=68020-60},
+it defines the macros for every architecture in the range.
 
-On Pentium and Pentium Pro, @code{double} and @code{long double} values
-should be aligned to an 8-byte boundary (see @option{-malign-double}) or
-suffer significant run time performance penalties.  On Pentium III, the
-Streaming SIMD Extension (SSE) data type @code{__m128} may not work
-properly if it is not 16-byte aligned.
+GCC also defines the macro @code{__m@var{uarch}__} when tuning for
+ColdFire microarchitecture @var{uarch}, where @var{uarch} is one
+of the arguments given above.
 
-To ensure proper alignment of this values on the stack, the stack boundary
-must be as aligned as that required by any value stored on the stack.
-Further, every function must be generated such that it keeps the stack
-aligned.  Thus calling a function compiled with a higher preferred
-stack boundary from a function compiled with a lower preferred stack
-boundary most likely misaligns the stack.  It is recommended that
-libraries that use callbacks always use the default setting.
+@item -m68000
+@itemx -mc68000
+@opindex m68000
+@opindex mc68000
+Generate output for a 68000.  This is the default
+when the compiler is configured for 68000-based systems.
+It is equivalent to @option{-march=68000}.
 
-This extra alignment does consume extra stack space, and generally
-increases code size.  Code that is sensitive to stack space usage, such
-as embedded systems and operating system kernels, may want to reduce the
-preferred alignment to @option{-mpreferred-stack-boundary=2}.
+Use this option for microcontrollers with a 68000 or EC000 core,
+including the 68008, 68302, 68306, 68307, 68322, 68328 and 68356.
 
-@need 200
-@item -mmmx
-@opindex mmmx
-@need 200
-@itemx -msse
-@opindex msse
-@need 200
-@itemx -msse2
-@need 200
-@itemx -msse3
-@need 200
-@itemx -mssse3
-@need 200
-@itemx -msse4
-@need 200
-@itemx -msse4a
-@need 200
-@itemx -msse4.1
-@need 200
-@itemx -msse4.2
-@need 200
-@itemx -mavx
-@opindex mavx
-@need 200
-@itemx -mavx2
-@need 200
-@itemx -mavx512f
-@need 200
-@itemx -mavx512pf
-@need 200
-@itemx -mavx512er
-@need 200
-@itemx -mavx512cd
-@need 200
-@itemx -msha
-@opindex msha
-@need 200
-@itemx -maes
-@opindex maes
-@need 200
-@itemx -mpclmul
-@opindex mpclmul
-@need 200
-@itemx -mclfushopt
-@opindex mclfushopt
-@need 200
-@itemx -mfsgsbase
-@opindex mfsgsbase
-@need 200
-@itemx -mrdrnd
-@opindex mrdrnd
-@need 200
-@itemx -mf16c
-@opindex mf16c
-@need 200
-@itemx -mfma
-@opindex mfma
-@need 200
-@itemx -mfma4
-@need 200
-@itemx -mno-fma4
-@need 200
-@itemx -mprefetchwt1
-@opindex mprefetchwt1
-@need 200
-@itemx -mxop
-@opindex mxop
-@need 200
-@itemx -mlwp
-@opindex mlwp
-@need 200
-@itemx -m3dnow
-@opindex m3dnow
-@need 200
-@itemx -mpopcnt
-@opindex mpopcnt
-@need 200
-@itemx -mabm
-@opindex mabm
-@need 200
-@itemx -mbmi
-@opindex mbmi
-@need 200
-@itemx -mbmi2
-@need 200
-@itemx -mlzcnt
-@opindex mlzcnt
-@need 200
-@itemx -mfxsr
-@opindex mfxsr
-@need 200
-@itemx -mxsave
-@opindex mxsave
-@need 200
-@itemx -mxsaveopt
-@opindex mxsaveopt
-@need 200
-@itemx -mxsavec
-@opindex mxsavec
-@need 200
-@itemx -mxsaves
-@opindex mxsaves
-@need 200
-@itemx -mrtm
-@opindex mrtm
-@need 200
-@itemx -mtbm
-@opindex mtbm
-@need 200
-@itemx -mmpx
-@opindex mmpx
-These switches enable the use of instructions in the MMX, SSE,
-SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD,
-SHA, AES, PCLMUL, FSGSBASE, RDRND, F16C, FMA, SSE4A, FMA4, XOP, LWP, ABM,
-BMI, BMI2, FXSR, XSAVE, XSAVEOPT, LZCNT, RTM, MPX or 3DNow!@:
-extended instruction sets.  Each has a corresponding @option{-mno-} option
-to disable use of these instructions.
+@item -m68010
+@opindex m68010
+Generate output for a 68010.  This is the default
+when the compiler is configured for 68010-based systems.
+It is equivalent to @option{-march=68010}.
 
-These extensions are also available as built-in functions: see
-@ref{x86 Built-in Functions}, for details of the functions enabled and
-disabled by these switches.
+@item -m68020
+@itemx -mc68020
+@opindex m68020
+@opindex mc68020
+Generate output for a 68020.  This is the default
+when the compiler is configured for 68020-based systems.
+It is equivalent to @option{-march=68020}.
 
-To generate SSE/SSE2 instructions automatically from floating-point
-code (as opposed to 387 instructions), see @option{-mfpmath=sse}.
+@item -m68030
+@opindex m68030
+Generate output for a 68030.  This is the default when the compiler is
+configured for 68030-based systems.  It is equivalent to
+@option{-march=68030}.
 
-GCC depresses SSEx instructions when @option{-mavx} is used. Instead, it
-generates new AVX instructions or AVX equivalence for all SSEx instructions
-when needed.
+@item -m68040
+@opindex m68040
+Generate output for a 68040.  This is the default when the compiler is
+configured for 68040-based systems.  It is equivalent to
+@option{-march=68040}.
 
-These options enable GCC to use these extended instructions in
-generated code, even without @option{-mfpmath=sse}.  Applications that
-perform run-time CPU detection must compile separate files for each
-supported architecture, using the appropriate flags.  In particular,
-the file containing the CPU detection code should be compiled without
-these options.
+This option inhibits the use of 68881/68882 instructions that have to be
+emulated by software on the 68040.  Use this option if your 68040 does not
+have code to emulate those instructions.
 
-@item -mdump-tune-features
-@opindex mdump-tune-features
-This option instructs GCC to dump the names of the x86 performance 
-tuning features and default settings. The names can be used in 
-@option{-mtune-ctrl=@var{feature-list}}.
+@item -m68060
+@opindex m68060
+Generate output for a 68060.  This is the default when the compiler is
+configured for 68060-based systems.  It is equivalent to
+@option{-march=68060}.
 
-@item -mtune-ctrl=@var{feature-list}
-@opindex mtune-ctrl=@var{feature-list}
-This option is used to do fine grain control of x86 code generation features.
-@var{feature-list} is a comma separated list of @var{feature} names. See also
-@option{-mdump-tune-features}. When specified, the @var{feature} is turned
-on if it is not preceded with @samp{^}, otherwise, it is turned off. 
-@option{-mtune-ctrl=@var{feature-list}} is intended to be used by GCC
-developers. Using it may lead to code paths not covered by testing and can
-potentially result in compiler ICEs or runtime errors.
+This option inhibits the use of 68020 and 68881/68882 instructions that
+have to be emulated by software on the 68060.  Use this option if your 68060
+does not have code to emulate those instructions.
 
-@item -mno-default
-@opindex mno-default
-This option instructs GCC to turn off all tunable features. See also 
-@option{-mtune-ctrl=@var{feature-list}} and @option{-mdump-tune-features}.
+@item -mcpu32
+@opindex mcpu32
+Generate output for a CPU32.  This is the default
+when the compiler is configured for CPU32-based systems.
+It is equivalent to @option{-march=cpu32}.
 
-@item -mcld
-@opindex mcld
-This option instructs GCC to emit a @code{cld} instruction in the prologue
-of functions that use string instructions.  String instructions depend on
-the DF flag to select between autoincrement or autodecrement mode.  While the
-ABI specifies the DF flag to be cleared on function entry, some operating
-systems violate this specification by not clearing the DF flag in their
-exception dispatchers.  The exception handler can be invoked with the DF flag
-set, which leads to wrong direction mode when string instructions are used.
-This option can be enabled by default on 32-bit x86 targets by configuring
-GCC with the @option{--enable-cld} configure option.  Generation of @code{cld}
-instructions can be suppressed with the @option{-mno-cld} compiler option
-in this case.
+Use this option for microcontrollers with a
+CPU32 or CPU32+ core, including the 68330, 68331, 68332, 68333, 68334,
+68336, 68340, 68341, 68349 and 68360.
 
-@item -mvzeroupper
-@opindex mvzeroupper
-This option instructs GCC to emit a @code{vzeroupper} instruction
-before a transfer of control flow out of the function to minimize
-the AVX to SSE transition penalty as well as remove unnecessary @code{zeroupper}
-intrinsics.
+@item -m5200
+@opindex m5200
+Generate output for a 520X ColdFire CPU@.  This is the default
+when the compiler is configured for 520X-based systems.
+It is equivalent to @option{-mcpu=5206}, and is now deprecated
+in favor of that option.
 
-@item -mprefer-avx128
-@opindex mprefer-avx128
-This option instructs GCC to use 128-bit AVX instructions instead of
-256-bit AVX instructions in the auto-vectorizer.
+Use this option for microcontrollers with a 5200 core, including
+the MCF5202, MCF5203, MCF5204 and MCF5206.
 
-@item -mcx16
-@opindex mcx16
-This option enables GCC to generate @code{CMPXCHG16B} instructions.
-@code{CMPXCHG16B} allows for atomic operations on 128-bit double quadword
-(or oword) data types.  
-This is useful for high-resolution counters that can be updated
-by multiple processors (or cores).  This instruction is generated as part of
-atomic built-in functions: see @ref{__sync Builtins} or
-@ref{__atomic Builtins} for details.
+@item -m5206e
+@opindex m5206e
+Generate output for a 5206e ColdFire CPU@.  The option is now
+deprecated in favor of the equivalent @option{-mcpu=5206e}.
 
-@item -msahf
-@opindex msahf
-This option enables generation of @code{SAHF} instructions in 64-bit code.
-Early Intel Pentium 4 CPUs with Intel 64 support,
-prior to the introduction of Pentium 4 G1 step in December 2005,
-lacked the @code{LAHF} and @code{SAHF} instructions
-which are supported by AMD64.
-These are load and store instructions, respectively, for certain status flags.
-In 64-bit mode, the @code{SAHF} instruction is used to optimize @code{fmod},
-@code{drem}, and @code{remainder} built-in functions;
-see @ref{Other Builtins} for details.
+@item -m528x
+@opindex m528x
+Generate output for a member of the ColdFire 528X family.
+The option is now deprecated in favor of the equivalent
+@option{-mcpu=528x}.
 
-@item -mmovbe
-@opindex mmovbe
-This option enables use of the @code{movbe} instruction to implement
-@code{__builtin_bswap32} and @code{__builtin_bswap64}.
+@item -m5307
+@opindex m5307
+Generate output for a ColdFire 5307 CPU@.  The option is now deprecated
+in favor of the equivalent @option{-mcpu=5307}.
 
-@item -mcrc32
-@opindex mcrc32
-This option enables built-in functions @code{__builtin_ia32_crc32qi},
-@code{__builtin_ia32_crc32hi}, @code{__builtin_ia32_crc32si} and
-@code{__builtin_ia32_crc32di} to generate the @code{crc32} machine instruction.
+@item -m5407
+@opindex m5407
+Generate output for a ColdFire 5407 CPU@.  The option is now deprecated
+in favor of the equivalent @option{-mcpu=5407}.
 
-@item -mrecip
-@opindex mrecip
-This option enables use of @code{RCPSS} and @code{RSQRTSS} instructions
-(and their vectorized variants @code{RCPPS} and @code{RSQRTPS})
-with an additional Newton-Raphson step
-to increase precision instead of @code{DIVSS} and @code{SQRTSS}
-(and their vectorized
-variants) for single-precision floating-point arguments.  These instructions
-are generated only when @option{-funsafe-math-optimizations} is enabled
-together with @option{-finite-math-only} and @option{-fno-trapping-math}.
-Note that while the throughput of the sequence is higher than the throughput
-of the non-reciprocal instruction, the precision of the sequence can be
-decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
+@item -mcfv4e
+@opindex mcfv4e
+Generate output for a ColdFire V4e family CPU (e.g.@: 547x/548x).
+This includes use of hardware floating-point instructions.
+The option is equivalent to @option{-mcpu=547x}, and is now
+deprecated in favor of that option.
 
-Note that GCC implements @code{1.0f/sqrtf(@var{x})} in terms of @code{RSQRTSS}
-(or @code{RSQRTPS}) already with @option{-ffast-math} (or the above option
-combination), and doesn't need @option{-mrecip}.
+@item -m68020-40
+@opindex m68020-40
+Generate output for a 68040, without using any of the new instructions.
+This results in code that can run relatively efficiently on either a
+68020/68881 or a 68030 or a 68040.  The generated code does use the
+68881 instructions that are emulated on the 68040.
 
-Also note that GCC emits the above sequence with additional Newton-Raphson step
-for vectorized single-float division and vectorized @code{sqrtf(@var{x})}
-already with @option{-ffast-math} (or the above option combination), and
-doesn't need @option{-mrecip}.
+The option is equivalent to @option{-march=68020} @option{-mtune=68020-40}.
 
-@item -mrecip=@var{opt}
-@opindex mrecip=opt
-This option controls which reciprocal estimate instructions
-may be used.  @var{opt} is a comma-separated list of options, which may
-be preceded by a @samp{!} to invert the option:
+@item -m68020-60
+@opindex m68020-60
+Generate output for a 68060, without using any of the new instructions.
+This results in code that can run relatively efficiently on either a
+68020/68881 or a 68030 or a 68040.  The generated code does use the
+68881 instructions that are emulated on the 68060.
 
-@table @samp
-@item all
-Enable all estimate instructions.
+The option is equivalent to @option{-march=68020} @option{-mtune=68020-60}.
 
-@item default
-Enable the default instructions, equivalent to @option{-mrecip}.
+@item -mhard-float
+@itemx -m68881
+@opindex mhard-float
+@opindex m68881
+Generate floating-point instructions.  This is the default for 68020
+and above, and for ColdFire devices that have an FPU@.  It defines the
+macro @code{__HAVE_68881__} on M680x0 targets and @code{__mcffpu__}
+on ColdFire targets.
 
-@item none
-Disable all estimate instructions, equivalent to @option{-mno-recip}.
+@item -msoft-float
+@opindex msoft-float
+Do not generate floating-point instructions; use library calls instead.
+This is the default for 68000, 68010, and 68832 targets.  It is also
+the default for ColdFire devices that have no FPU.
 
-@item div
-Enable the approximation for scalar division.
+@item -mdiv
+@itemx -mno-div
+@opindex mdiv
+@opindex mno-div
+Generate (do not generate) ColdFire hardware divide and remainder
+instructions.  If @option{-march} is used without @option{-mcpu},
+the default is ``on'' for ColdFire architectures and ``off'' for M680x0
+architectures.  Otherwise, the default is taken from the target CPU
+(either the default CPU, or the one specified by @option{-mcpu}).  For
+example, the default is ``off'' for @option{-mcpu=5206} and ``on'' for
+@option{-mcpu=5206e}.
 
-@item vec-div
-Enable the approximation for vectorized division.
+GCC defines the macro @code{__mcfhwdiv__} when this option is enabled.
 
-@item sqrt
-Enable the approximation for scalar square root.
+@item -mshort
+@opindex mshort
+Consider type @code{int} to be 16 bits wide, like @code{short int}.
+Additionally, parameters passed on the stack are also aligned to a
+16-bit boundary even on targets whose ABI mandates promotion to 32-bit.
 
-@item vec-sqrt
-Enable the approximation for vectorized square root.
-@end table
+@item -mno-short
+@opindex mno-short
+Do not consider type @code{int} to be 16 bits wide.  This is the default.
 
-So, for example, @option{-mrecip=all,!sqrt} enables
-all of the reciprocal approximations, except for square root.
+@item -mnobitfield
+@itemx -mno-bitfield
+@opindex mnobitfield
+@opindex mno-bitfield
+Do not use the bit-field instructions.  The @option{-m68000}, @option{-mcpu32}
+and @option{-m5200} options imply @w{@option{-mnobitfield}}.
 
-@item -mveclibabi=@var{type}
-@opindex mveclibabi
-Specifies the ABI type to use for vectorizing intrinsics using an
-external library.  Supported values for @var{type} are @samp{svml} 
-for the Intel short
-vector math library and @samp{acml} for the AMD math core library.
-To use this option, both @option{-ftree-vectorize} and
-@option{-funsafe-math-optimizations} have to be enabled, and an SVML or ACML 
-ABI-compatible library must be specified at link time.
+@item -mbitfield
+@opindex mbitfield
+Do use the bit-field instructions.  The @option{-m68020} option implies
+@option{-mbitfield}.  This is the default if you use a configuration
+designed for a 68020.
 
-GCC currently emits calls to @code{vmldExp2},
-@code{vmldLn2}, @code{vmldLog102}, @code{vmldLog102}, @code{vmldPow2},
-@code{vmldTanh2}, @code{vmldTan2}, @code{vmldAtan2}, @code{vmldAtanh2},
-@code{vmldCbrt2}, @code{vmldSinh2}, @code{vmldSin2}, @code{vmldAsinh2},
-@code{vmldAsin2}, @code{vmldCosh2}, @code{vmldCos2}, @code{vmldAcosh2},
-@code{vmldAcos2}, @code{vmlsExp4}, @code{vmlsLn4}, @code{vmlsLog104},
-@code{vmlsLog104}, @code{vmlsPow4}, @code{vmlsTanh4}, @code{vmlsTan4},
-@code{vmlsAtan4}, @code{vmlsAtanh4}, @code{vmlsCbrt4}, @code{vmlsSinh4},
-@code{vmlsSin4}, @code{vmlsAsinh4}, @code{vmlsAsin4}, @code{vmlsCosh4},
-@code{vmlsCos4}, @code{vmlsAcosh4} and @code{vmlsAcos4} for corresponding
-function type when @option{-mveclibabi=svml} is used, and @code{__vrd2_sin},
-@code{__vrd2_cos}, @code{__vrd2_exp}, @code{__vrd2_log}, @code{__vrd2_log2},
-@code{__vrd2_log10}, @code{__vrs4_sinf}, @code{__vrs4_cosf},
-@code{__vrs4_expf}, @code{__vrs4_logf}, @code{__vrs4_log2f},
-@code{__vrs4_log10f} and @code{__vrs4_powf} for the corresponding function type
-when @option{-mveclibabi=acml} is used.  
+@item -mrtd
+@opindex mrtd
+Use a different function-calling convention, in which functions
+that take a fixed number of arguments return with the @code{rtd}
+instruction, which pops their arguments while returning.  This
+saves one instruction in the caller since there is no need to pop
+the arguments there.
 
-@item -mabi=@var{name}
-@opindex mabi
-Generate code for the specified calling convention.  Permissible values
-are @samp{sysv} for the ABI used on GNU/Linux and other systems, and
-@samp{ms} for the Microsoft ABI.  The default is to use the Microsoft
-ABI when targeting Microsoft Windows and the SysV ABI on all other systems.
-You can control this behavior for specific functions by
-using the function attributes @code{ms_abi} and @code{sysv_abi}.
-@xref{Function Attributes}.
+This calling convention is incompatible with the one normally
+used on Unix, so you cannot use it if you need to call libraries
+compiled with the Unix compiler.
 
-@item -mtls-dialect=@var{type}
-@opindex mtls-dialect
-Generate code to access thread-local storage using the @samp{gnu} or
-@samp{gnu2} conventions.  @samp{gnu} is the conservative default;
-@samp{gnu2} is more efficient, but it may add compile- and run-time
-requirements that cannot be satisfied on all systems.
+Also, you must provide function prototypes for all functions that
+take variable numbers of arguments (including @code{printf});
+otherwise incorrect code is generated for calls to those
+functions.
 
-@item -mpush-args
-@itemx -mno-push-args
-@opindex mpush-args
-@opindex mno-push-args
-Use PUSH operations to store outgoing parameters.  This method is shorter
-and usually equally fast as method using SUB/MOV operations and is enabled
-by default.  In some cases disabling it may improve performance because of
-improved scheduling and reduced dependencies.
+In addition, seriously incorrect code results if you call a
+function with too many arguments.  (Normally, extra arguments are
+harmlessly ignored.)
 
-@item -maccumulate-outgoing-args
-@opindex maccumulate-outgoing-args
-If enabled, the maximum amount of space required for outgoing arguments is
-computed in the function prologue.  This is faster on most modern CPUs
-because of reduced dependencies, improved scheduling and reduced stack usage
-when the preferred stack boundary is not equal to 2.  The drawback is a notable
-increase in code size.  This switch implies @option{-mno-push-args}.
+The @code{rtd} instruction is supported by the 68010, 68020, 68030,
+68040, 68060 and CPU32 processors, but not by the 68000 or 5200.
 
-@item -mthreads
-@opindex mthreads
-Support thread-safe exception handling on MinGW.  Programs that rely
-on thread-safe exception handling must compile and link all code with the
-@option{-mthreads} option.  When compiling, @option{-mthreads} defines
-@option{-D_MT}; when linking, it links in a special thread helper library
-@option{-lmingwthrd} which cleans up per-thread exception-handling data.
+@item -mno-rtd
+@opindex mno-rtd
+Do not use the calling conventions selected by @option{-mrtd}.
+This is the default.
 
-@item -mno-align-stringops
-@opindex mno-align-stringops
-Do not align the destination of inlined string operations.  This switch reduces
-code size and improves performance in case the destination is already aligned,
-but GCC doesn't know about it.
+@item -malign-int
+@itemx -mno-align-int
+@opindex malign-int
+@opindex mno-align-int
+Control whether GCC aligns @code{int}, @code{long}, @code{long long},
+@code{float}, @code{double}, and @code{long double} variables on a 32-bit
+boundary (@option{-malign-int}) or a 16-bit boundary (@option{-mno-align-int}).
+Aligning variables on 32-bit boundaries produces code that runs somewhat
+faster on processors with 32-bit busses at the expense of more memory.
 
-@item -minline-all-stringops
-@opindex minline-all-stringops
-By default GCC inlines string operations only when the destination is 
-known to be aligned to least a 4-byte boundary.  
-This enables more inlining and increases code
-size, but may improve performance of code that depends on fast
-@code{memcpy}, @code{strlen},
-and @code{memset} for short lengths.
+@strong{Warning:} if you use the @option{-malign-int} switch, GCC
+aligns structures containing the above types differently than
+most published application binary interface specifications for the m68k.
 
-@item -minline-stringops-dynamically
-@opindex minline-stringops-dynamically
-For string operations of unknown size, use run-time checks with
-inline code for small blocks and a library call for large blocks.
+@item -mpcrel
+@opindex mpcrel
+Use the pc-relative addressing mode of the 68000 directly, instead of
+using a global offset table.  At present, this option implies @option{-fpic},
+allowing at most a 16-bit offset for pc-relative addressing.  @option{-fPIC} is
+not presently supported with @option{-mpcrel}, though this could be supported for
+68020 and higher processors.
 
-@item -mstringop-strategy=@var{alg}
-@opindex mstringop-strategy=@var{alg}
-Override the internal decision heuristic for the particular algorithm to use
-for inlining string operations.  The allowed values for @var{alg} are:
+@item -mno-strict-align
+@itemx -mstrict-align
+@opindex mno-strict-align
+@opindex mstrict-align
+Do not (do) assume that unaligned memory references are handled by
+the system.
 
-@table @samp
-@item rep_byte
-@itemx rep_4byte
-@itemx rep_8byte
-Expand using i386 @code{rep} prefix of the specified size.
+@item -msep-data
+Generate code that allows the data segment to be located in a different
+area of memory from the text segment.  This allows for execute-in-place in
+an environment without virtual memory management.  This option implies
+@option{-fPIC}.
 
-@item byte_loop
-@itemx loop
-@itemx unrolled_loop
-Expand into an inline loop.
+@item -mno-sep-data
+Generate code that assumes that the data segment follows the text segment.
+This is the default.
 
-@item libcall
-Always use a library call.
-@end table
+@item -mid-shared-library
+Generate code that supports shared libraries via the library ID method.
+This allows for execute-in-place and shared libraries in an environment
+without virtual memory management.  This option implies @option{-fPIC}.
 
-@item -mmemcpy-strategy=@var{strategy}
-@opindex mmemcpy-strategy=@var{strategy}
-Override the internal decision heuristic to decide if @code{__builtin_memcpy}
-should be inlined and what inline algorithm to use when the expected size
-of the copy operation is known. @var{strategy} 
-is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets. 
-@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies
-the max byte size with which inline algorithm @var{alg} is allowed.  For the last
-triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the triplets
-in the list must be specified in increasing order.  The minimal byte size for 
-@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the 
-preceding range.
+@item -mno-id-shared-library
+Generate code that doesn't assume ID-based shared libraries are being used.
+This is the default.
 
-@item -mmemset-strategy=@var{strategy}
-@opindex mmemset-strategy=@var{strategy}
-The option is similar to @option{-mmemcpy-strategy=} except that it is to control
-@code{__builtin_memset} expansion.
+@item -mshared-library-id=@var{n}
+Specifies the identification number of the ID-based shared library being
+compiled.  Specifying a value of 0 generates more compact code; specifying
+other values forces the allocation of that number to the current
+library, but is no more space- or time-efficient than omitting this option.
 
-@item -momit-leaf-frame-pointer
-@opindex momit-leaf-frame-pointer
-Don't keep the frame pointer in a register for leaf functions.  This
-avoids the instructions to save, set up, and restore frame pointers and
-makes an extra register available in leaf functions.  The option
-@option{-fomit-leaf-frame-pointer} removes the frame pointer for leaf functions,
-which might make debugging harder.
+@item -mxgot
+@itemx -mno-xgot
+@opindex mxgot
+@opindex mno-xgot
+When generating position-independent code for ColdFire, generate code
+that works if the GOT has more than 8192 entries.  This code is
+larger and slower than code generated without this option.  On M680x0
+processors, this option is not needed; @option{-fPIC} suffices.
 
-@item -mtls-direct-seg-refs
-@itemx -mno-tls-direct-seg-refs
-@opindex mtls-direct-seg-refs
-Controls whether TLS variables may be accessed with offsets from the
-TLS segment register (@code{%gs} for 32-bit, @code{%fs} for 64-bit),
-or whether the thread base pointer must be added.  Whether or not this
-is valid depends on the operating system, and whether it maps the
-segment to cover the entire TLS area.
+GCC normally uses a single instruction to load values from the GOT@.
+While this is relatively efficient, it only works if the GOT
+is smaller than about 64k.  Anything larger causes the linker
+to report an error such as:
 
-For systems that use the GNU C Library, the default is on.
+@cindex relocation truncated to fit (ColdFire)
+@smallexample
+relocation truncated to fit: R_68K_GOT16O foobar
+@end smallexample
 
-@item -msse2avx
-@itemx -mno-sse2avx
-@opindex msse2avx
-Specify that the assembler should encode SSE instructions with VEX
-prefix.  The option @option{-mavx} turns this on by default.
+If this happens, you should recompile your code with @option{-mxgot}.
+It should then work with very large GOTs.  However, code generated with
+@option{-mxgot} is less efficient, since it takes 4 instructions to fetch
+the value of a global symbol.
 
-@item -mfentry
-@itemx -mno-fentry
-@opindex mfentry
-If profiling is active (@option{-pg}), put the profiling
-counter call before the prologue.
-Note: On x86 architectures the attribute @code{ms_hook_prologue}
-isn't possible at the moment for @option{-mfentry} and @option{-pg}.
+Note that some linkers, including newer versions of the GNU linker,
+can create multiple GOTs and sort GOT entries.  If you have such a linker,
+you should only need to use @option{-mxgot} when compiling a single
+object file that accesses more than 8192 GOT entries.  Very few do.
 
-@item -mrecord-mcount
-@itemx -mno-record-mcount
-@opindex mrecord-mcount
-If profiling is active (@option{-pg}), generate a __mcount_loc section
-that contains pointers to each profiling call. This is useful for
-automatically patching and out calls.
+These options have no effect unless GCC is generating
+position-independent code.
 
-@item -mnop-mcount
-@itemx -mno-nop-mcount
-@opindex mnop-mcount
-If profiling is active (@option{-pg}), generate the calls to
-the profiling functions as nops. This is useful when they
-should be patched in later dynamically. This is likely only
-useful together with @option{-mrecord-mcount}.
+@end table
 
-@item -mskip-rax-setup
-@itemx -mno-skip-rax-setup
-@opindex mskip-rax-setup
-When generating code for the x86-64 architecture with SSE extensions
-disabled, @option{-skip-rax-setup} can be used to skip setting up RAX
-register when there are no variable arguments passed in vector registers.
+@node MCore Options
+@subsection MCore Options
+@cindex MCore options
 
-@strong{Warning:} Since RAX register is used to avoid unnecessarily
-saving vector registers on stack when passing variable arguments, the
-impacts of this option are callees may waste some stack space,
-misbehave or jump to a random location.  GCC 4.4 or newer don't have
-those issues, regardless the RAX register value.
+These are the @samp{-m} options defined for the Motorola M*Core
+processors.
 
-@item -m8bit-idiv
-@itemx -mno-8bit-idiv
-@opindex m8bit-idiv
-On some processors, like Intel Atom, 8-bit unsigned integer divide is
-much faster than 32-bit/64-bit integer divide.  This option generates a
-run-time check.  If both dividend and divisor are within range of 0
-to 255, 8-bit unsigned integer divide is used instead of
-32-bit/64-bit integer divide.
+@table @gcctabopt
 
-@item -mavx256-split-unaligned-load
-@itemx -mavx256-split-unaligned-store
-@opindex mavx256-split-unaligned-load
-@opindex mavx256-split-unaligned-store
-Split 32-byte AVX unaligned load and store.
+@item -mhardlit
+@itemx -mno-hardlit
+@opindex mhardlit
+@opindex mno-hardlit
+Inline constants into the code stream if it can be done in two
+instructions or fewer.
 
-@item -mstack-protector-guard=@var{guard}
-@opindex mstack-protector-guard=@var{guard}
-Generate stack protection code using canary at @var{guard}.  Supported
-locations are @samp{global} for global canary or @samp{tls} for per-thread
-canary in the TLS block (the default).  This option has effect only when
-@option{-fstack-protector} or @option{-fstack-protector-all} is specified.
+@item -mdiv
+@itemx -mno-div
+@opindex mdiv
+@opindex mno-div
+Use the divide instruction.  (Enabled by default.)
 
-@end table
+@item -mrelax-immediate
+@itemx -mno-relax-immediate
+@opindex mrelax-immediate
+@opindex mno-relax-immediate
+Allow arbitrary-sized immediates in bit operations.
 
-These @samp{-m} switches are supported in addition to the above
-on x86-64 processors in 64-bit environments.
+@item -mwide-bitfields
+@itemx -mno-wide-bitfields
+@opindex mwide-bitfields
+@opindex mno-wide-bitfields
+Always treat bit-fields as @code{int}-sized.
 
-@table @gcctabopt
-@item -m32
-@itemx -m64
-@itemx -mx32
-@itemx -m16
-@opindex m32
-@opindex m64
-@opindex mx32
-@opindex m16
-Generate code for a 16-bit, 32-bit or 64-bit environment.
-The @option{-m32} option sets @code{int}, @code{long}, and pointer types
-to 32 bits, and
-generates code that runs on any i386 system.
+@item -m4byte-functions
+@itemx -mno-4byte-functions
+@opindex m4byte-functions
+@opindex mno-4byte-functions
+Force all functions to be aligned to a 4-byte boundary.
 
-The @option{-m64} option sets @code{int} to 32 bits and @code{long} and pointer
-types to 64 bits, and generates code for the x86-64 architecture.
-For Darwin only the @option{-m64} option also turns off the @option{-fno-pic}
-and @option{-mdynamic-no-pic} options.
+@item -mcallgraph-data
+@itemx -mno-callgraph-data
+@opindex mcallgraph-data
+@opindex mno-callgraph-data
+Emit callgraph information.
 
-The @option{-mx32} option sets @code{int}, @code{long}, and pointer types
-to 32 bits, and
-generates code for the x86-64 architecture.
+@item -mslow-bytes
+@itemx -mno-slow-bytes
+@opindex mslow-bytes
+@opindex mno-slow-bytes
+Prefer word access when reading byte quantities.
 
-The @option{-m16} option is the same as @option{-m32}, except for that
-it outputs the @code{.code16gcc} assembly directive at the beginning of
-the assembly output so that the binary can run in 16-bit mode.
+@item -mlittle-endian
+@itemx -mbig-endian
+@opindex mlittle-endian
+@opindex mbig-endian
+Generate code for a little-endian (big-endian) target.
 
-@item -mno-red-zone
-@opindex mno-red-zone
-Do not use a so-called ``red zone'' for x86-64 code.  The red zone is mandated
-by the x86-64 ABI; it is a 128-byte area beyond the location of the
-stack pointer that is not modified by signal or interrupt handlers
-and therefore can be used for temporary data without adjusting the stack
-pointer.  The flag @option{-mno-red-zone} disables this red zone.
+@item -m210
+@itemx -m340
+@opindex m210
+@opindex m340
+Generate code for the 210 (340) processor.
 
-@item -mcmodel=small
-@opindex mcmodel=small
-Generate code for the small code model: the program and its symbols must
-be linked in the lower 2 GB of the address space.  Pointers are 64 bits.
-Programs can be statically or dynamically linked.  This is the default
-code model.
+@item -mno-lsim
+@opindex mno-lsim
+Assume that runtime support has been provided and so omit the
+simulator library (@file{libsim.a}) from the linker command line.
 
-@item -mcmodel=kernel
-@opindex mcmodel=kernel
-Generate code for the kernel code model.  The kernel runs in the
-negative 2 GB of the address space.
-This model has to be used for Linux kernel code.
+@item -mstack-increment=@var{size}
+@opindex mstack-increment
+Set the maximum amount for a single stack increment operation.  Large
+values can increase the speed of programs that contain functions
+that need a large amount of stack space, but they can also trigger a
+segmentation fault if the stack is extended too much.  The default
+value is 0x1000.
 
-@item -mcmodel=medium
-@opindex mcmodel=medium
-Generate code for the medium model: the program is linked in the lower 2
-GB of the address space.  Small symbols are also placed there.  Symbols
-with sizes larger than @option{-mlarge-data-threshold} are put into
-large data or BSS sections and can be located above 2GB.  Programs can
-be statically or dynamically linked.
-
-@item -mcmodel=large
-@opindex mcmodel=large
-Generate code for the large model.  This model makes no assumptions
-about addresses and sizes of sections.
-
-@item -maddress-mode=long
-@opindex maddress-mode=long
-Generate code for long address mode.  This is only supported for 64-bit
-and x32 environments.  It is the default address mode for 64-bit
-environments.
-
-@item -maddress-mode=short
-@opindex maddress-mode=short
-Generate code for short address mode.  This is only supported for 32-bit
-and x32 environments.  It is the default address mode for 32-bit and
-x32 environments.
 @end table
 
-@node x86 Windows Options
-@subsection x86 Windows Options
-@cindex x86 Windows Options
-@cindex Windows Options for x86
-
-These additional options are available for Microsoft Windows targets:
+@node MeP Options
+@subsection MeP Options
+@cindex MeP options
 
 @table @gcctabopt
-@item -mconsole
-@opindex mconsole
-This option
-specifies that a console application is to be generated, by
-instructing the linker to set the PE header subsystem type
-required for console applications.
-This option is available for Cygwin and MinGW targets and is
-enabled by default on those targets.
-
-@item -mdll
-@opindex mdll
-This option is available for Cygwin and MinGW targets.  It
-specifies that a DLL---a dynamic link library---is to be
-generated, enabling the selection of the required runtime
-startup object and entry point.
 
-@item -mnop-fun-dllimport
-@opindex mnop-fun-dllimport
-This option is available for Cygwin and MinGW targets.  It
-specifies that the @code{dllimport} attribute should be ignored.
+@item -mabsdiff
+@opindex mabsdiff
+Enables the @code{abs} instruction, which computes the absolute
+difference between two registers.
 
-@item -mthread
-@opindex mthread
-This option is available for MinGW targets. It specifies
-that MinGW-specific thread support is to be used.
+@item -mall-opts
+@opindex mall-opts
+Enables all the optional instructions---average, multiply, divide, bit
+operations, leading zero, absolute difference, min/max, clip, and
+saturation.
 
-@item -municode
-@opindex municode
-This option is available for MinGW-w64 targets.  It causes
-the @code{UNICODE} preprocessor macro to be predefined, and
-chooses Unicode-capable runtime startup code.
 
-@item -mwin32
-@opindex mwin32
-This option is available for Cygwin and MinGW targets.  It
-specifies that the typical Microsoft Windows predefined macros are to
-be set in the pre-processor, but does not influence the choice
-of runtime library/startup code.
+@item -maverage
+@opindex maverage
+Enables the @code{ave} instruction, which computes the average of two
+registers.
 
-@item -mwindows
-@opindex mwindows
-This option is available for Cygwin and MinGW targets.  It
-specifies that a GUI application is to be generated by
-instructing the linker to set the PE header subsystem type
-appropriately.
+@item -mbased=@var{n}
+@opindex mbased=
+Variables of size @var{n} bytes or smaller are placed in the
+@code{.based} section by default.  Based variables use the @code{$tp}
+register as a base register, and there is a 128-byte limit to the
+@code{.based} section.
 
-@item -fno-set-stack-executable
-@opindex fno-set-stack-executable
-This option is available for MinGW targets. It specifies that
-the executable flag for the stack used by nested functions isn't
-set. This is necessary for binaries running in kernel mode of
-Microsoft Windows, as there the User32 API, which is used to set executable
-privileges, isn't available.
+@item -mbitops
+@opindex mbitops
+Enables the bit operation instructions---bit test (@code{btstm}), set
+(@code{bsetm}), clear (@code{bclrm}), invert (@code{bnotm}), and
+test-and-set (@code{tas}).
 
-@item -fwritable-relocated-rdata
-@opindex fno-writable-relocated-rdata
-This option is available for MinGW and Cygwin targets.  It specifies
-that relocated-data in read-only section is put into .data
-section.  This is a necessary for older runtimes not supporting
-modification of .rdata sections for pseudo-relocation.
+@item -mc=@var{name}
+@opindex mc=
+Selects which section constant data is placed in.  @var{name} may
+be @samp{tiny}, @samp{near}, or @samp{far}.
 
-@item -mpe-aligned-commons
-@opindex mpe-aligned-commons
-This option is available for Cygwin and MinGW targets.  It
-specifies that the GNU extension to the PE file format that
-permits the correct alignment of COMMON variables should be
-used when generating code.  It is enabled by default if
-GCC detects that the target assembler found during configuration
-supports the feature.
-@end table
+@item -mclip
+@opindex mclip
+Enables the @code{clip} instruction.  Note that @option{-mclip} is not
+useful unless you also provide @option{-mminmax}.
 
-See also under @ref{x86 Options} for standard options.
+@item -mconfig=@var{name}
+@opindex mconfig=
+Selects one of the built-in core configurations.  Each MeP chip has
+one or more modules in it; each module has a core CPU and a variety of
+coprocessors, optional instructions, and peripherals.  The
+@code{MeP-Integrator} tool, not part of GCC, provides these
+configurations through this option; using this option is the same as
+using all the corresponding command-line options.  The default
+configuration is @samp{default}.
 
-@node IA-64 Options
-@subsection IA-64 Options
-@cindex IA-64 Options
+@item -mcop
+@opindex mcop
+Enables the coprocessor instructions.  By default, this is a 32-bit
+coprocessor.  Note that the coprocessor is normally enabled via the
+@option{-mconfig=} option.
 
-These are the @samp{-m} options defined for the Intel IA-64 architecture.
+@item -mcop32
+@opindex mcop32
+Enables the 32-bit coprocessor's instructions.
 
-@table @gcctabopt
-@item -mbig-endian
-@opindex mbig-endian
-Generate code for a big-endian target.  This is the default for HP-UX@.
+@item -mcop64
+@opindex mcop64
+Enables the 64-bit coprocessor's instructions.
 
-@item -mlittle-endian
-@opindex mlittle-endian
-Generate code for a little-endian target.  This is the default for AIX5
-and GNU/Linux.
+@item -mivc2
+@opindex mivc2
+Enables IVC2 scheduling.  IVC2 is a 64-bit VLIW coprocessor.
 
-@item -mgnu-as
-@itemx -mno-gnu-as
-@opindex mgnu-as
-@opindex mno-gnu-as
-Generate (or don't) code for the GNU assembler.  This is the default.
-@c Also, this is the default if the configure option @option{--with-gnu-as}
-@c is used.
+@item -mdc
+@opindex mdc
+Causes constant variables to be placed in the @code{.near} section.
 
-@item -mgnu-ld
-@itemx -mno-gnu-ld
-@opindex mgnu-ld
-@opindex mno-gnu-ld
-Generate (or don't) code for the GNU linker.  This is the default.
-@c Also, this is the default if the configure option @option{--with-gnu-ld}
-@c is used.
+@item -mdiv
+@opindex mdiv
+Enables the @code{div} and @code{divu} instructions.
 
-@item -mno-pic
-@opindex mno-pic
-Generate code that does not use a global pointer register.  The result
-is not position independent code, and violates the IA-64 ABI@.
+@item -meb
+@opindex meb
+Generate big-endian code.
 
-@item -mvolatile-asm-stop
-@itemx -mno-volatile-asm-stop
-@opindex mvolatile-asm-stop
-@opindex mno-volatile-asm-stop
-Generate (or don't) a stop bit immediately before and after volatile asm
-statements.
+@item -mel
+@opindex mel
+Generate little-endian code.
 
-@item -mregister-names
-@itemx -mno-register-names
-@opindex mregister-names
-@opindex mno-register-names
-Generate (or don't) @samp{in}, @samp{loc}, and @samp{out} register names for
-the stacked registers.  This may make assembler output more readable.
+@item -mio-volatile
+@opindex mio-volatile
+Tells the compiler that any variable marked with the @code{io}
+attribute is to be considered volatile.
 
-@item -mno-sdata
-@itemx -msdata
-@opindex mno-sdata
-@opindex msdata
-Disable (or enable) optimizations that use the small data section.  This may
-be useful for working around optimizer bugs.
+@item -ml
+@opindex ml
+Causes variables to be assigned to the @code{.far} section by default.
 
-@item -mconstant-gp
-@opindex mconstant-gp
-Generate code that uses a single constant global pointer value.  This is
-useful when compiling kernel code.
+@item -mleadz
+@opindex mleadz
+Enables the @code{leadz} (leading zero) instruction.
 
-@item -mauto-pic
-@opindex mauto-pic
-Generate code that is self-relocatable.  This implies @option{-mconstant-gp}.
-This is useful when compiling firmware code.
+@item -mm
+@opindex mm
+Causes variables to be assigned to the @code{.near} section by default.
 
-@item -minline-float-divide-min-latency
-@opindex minline-float-divide-min-latency
-Generate code for inline divides of floating-point values
-using the minimum latency algorithm.
+@item -mminmax
+@opindex mminmax
+Enables the @code{min} and @code{max} instructions.
 
-@item -minline-float-divide-max-throughput
-@opindex minline-float-divide-max-throughput
-Generate code for inline divides of floating-point values
-using the maximum throughput algorithm.
+@item -mmult
+@opindex mmult
+Enables the multiplication and multiply-accumulate instructions.
 
-@item -mno-inline-float-divide
-@opindex mno-inline-float-divide
-Do not generate inline code for divides of floating-point values.
+@item -mno-opts
+@opindex mno-opts
+Disables all the optional instructions enabled by @option{-mall-opts}.
 
-@item -minline-int-divide-min-latency
-@opindex minline-int-divide-min-latency
-Generate code for inline divides of integer values
-using the minimum latency algorithm.
+@item -mrepeat
+@opindex mrepeat
+Enables the @code{repeat} and @code{erepeat} instructions, used for
+low-overhead looping.
 
-@item -minline-int-divide-max-throughput
-@opindex minline-int-divide-max-throughput
-Generate code for inline divides of integer values
-using the maximum throughput algorithm.
+@item -ms
+@opindex ms
+Causes all variables to default to the @code{.tiny} section.  Note
+that there is a 65536-byte limit to this section.  Accesses to these
+variables use the @code{$gp} base register.
 
-@item -mno-inline-int-divide
-@opindex mno-inline-int-divide
-Do not generate inline code for divides of integer values.
+@item -msatur
+@opindex msatur
+Enables the saturation instructions.  Note that the compiler does not
+currently generate these itself, but this option is included for
+compatibility with other tools, like @code{as}.
 
-@item -minline-sqrt-min-latency
-@opindex minline-sqrt-min-latency
-Generate code for inline square roots
-using the minimum latency algorithm.
+@item -msdram
+@opindex msdram
+Link the SDRAM-based runtime instead of the default ROM-based runtime.
 
-@item -minline-sqrt-max-throughput
-@opindex minline-sqrt-max-throughput
-Generate code for inline square roots
-using the maximum throughput algorithm.
+@item -msim
+@opindex msim
+Link the simulator run-time libraries.
 
-@item -mno-inline-sqrt
-@opindex mno-inline-sqrt
-Do not generate inline code for @code{sqrt}.
+@item -msimnovec
+@opindex msimnovec
+Link the simulator runtime libraries, excluding built-in support
+for reset and exception vectors and tables.
 
-@item -mfused-madd
-@itemx -mno-fused-madd
-@opindex mfused-madd
-@opindex mno-fused-madd
-Do (don't) generate code that uses the fused multiply/add or multiply/subtract
-instructions.  The default is to use these instructions.
+@item -mtf
+@opindex mtf
+Causes all functions to default to the @code{.far} section.  Without
+this option, functions default to the @code{.near} section.
 
-@item -mno-dwarf2-asm
-@itemx -mdwarf2-asm
-@opindex mno-dwarf2-asm
-@opindex mdwarf2-asm
-Don't (or do) generate assembler code for the DWARF 2 line number debugging
-info.  This may be useful when not using the GNU assembler.
+@item -mtiny=@var{n}
+@opindex mtiny=
+Variables that are @var{n} bytes or smaller are allocated to the
+@code{.tiny} section.  These variables use the @code{$gp} base
+register.  The default for this option is 4, but note that there's a
+65536-byte limit to the @code{.tiny} section.
 
-@item -mearly-stop-bits
-@itemx -mno-early-stop-bits
-@opindex mearly-stop-bits
-@opindex mno-early-stop-bits
-Allow stop bits to be placed earlier than immediately preceding the
-instruction that triggered the stop bit.  This can improve instruction
-scheduling, but does not always do so.
+@end table
 
-@item -mfixed-range=@var{register-range}
-@opindex mfixed-range
-Generate code treating the given register range as fixed registers.
-A fixed register is one that the register allocator cannot use.  This is
-useful when compiling kernel code.  A register range is specified as
-two registers separated by a dash.  Multiple register ranges can be
-specified separated by a comma.
+@node MicroBlaze Options
+@subsection MicroBlaze Options
+@cindex MicroBlaze Options
 
-@item -mtls-size=@var{tls-size}
-@opindex mtls-size
-Specify bit size of immediate TLS offsets.  Valid values are 14, 22, and
-64.
+@table @gcctabopt
 
-@item -mtune=@var{cpu-type}
-@opindex mtune
-Tune the instruction scheduling for a particular CPU, Valid values are
-@samp{itanium}, @samp{itanium1}, @samp{merced}, @samp{itanium2},
-and @samp{mckinley}.
+@item -msoft-float
+@opindex msoft-float
+Use software emulation for floating point (default).
 
-@item -milp32
-@itemx -mlp64
-@opindex milp32
-@opindex mlp64
-Generate code for a 32-bit or 64-bit environment.
-The 32-bit environment sets int, long and pointer to 32 bits.
-The 64-bit environment sets int to 32 bits and long and pointer
-to 64 bits.  These are HP-UX specific flags.
+@item -mhard-float
+@opindex mhard-float
+Use hardware floating-point instructions.
 
-@item -mno-sched-br-data-spec
-@itemx -msched-br-data-spec
-@opindex mno-sched-br-data-spec
-@opindex msched-br-data-spec
-(Dis/En)able data speculative scheduling before reload.
-This results in generation of @code{ld.a} instructions and
-the corresponding check instructions (@code{ld.c} / @code{chk.a}).
-The default is 'disable'.
+@item -mmemcpy
+@opindex mmemcpy
+Do not optimize block moves; use @code{memcpy} instead.
 
-@item -msched-ar-data-spec
-@itemx -mno-sched-ar-data-spec
-@opindex msched-ar-data-spec
-@opindex mno-sched-ar-data-spec
-(En/Dis)able data speculative scheduling after reload.
-This results in generation of @code{ld.a} instructions and
-the corresponding check instructions (@code{ld.c} / @code{chk.a}).
-The default is 'enable'.
+@item -mno-clearbss
+@opindex mno-clearbss
+This option is deprecated.  Use @option{-fno-zero-initialized-in-bss} instead.
 
-@item -mno-sched-control-spec
-@itemx -msched-control-spec
-@opindex mno-sched-control-spec
-@opindex msched-control-spec
-(Dis/En)able control speculative scheduling.  This feature is
-available only during region scheduling (i.e.@: before reload).
-This results in generation of the @code{ld.s} instructions and
-the corresponding check instructions @code{chk.s}.
-The default is 'disable'.
+@item -mcpu=@var{cpu-type}
+@opindex mcpu=
+Use features of, and schedule code for, the given CPU.
+Supported values are in the format @samp{v@var{X}.@var{YY}.@var{Z}},
+where @var{X} is the major version, @var{YY} is the minor version, and
+@var{Z} is the compatibility code.  Example values are @samp{v3.00.a},
+@samp{v4.00.b}, @samp{v5.00.a}, @samp{v5.00.b}, and @samp{v6.00.a}.
 
-@item -msched-br-in-data-spec
-@itemx -mno-sched-br-in-data-spec
-@opindex msched-br-in-data-spec
-@opindex mno-sched-br-in-data-spec
-(En/Dis)able speculative scheduling of the instructions that
-are dependent on the data speculative loads before reload.
-This is effective only with @option{-msched-br-data-spec} enabled.
-The default is 'enable'.
+@item -mxl-soft-mul
+@opindex mxl-soft-mul
+Use software multiply emulation (default).
 
-@item -msched-ar-in-data-spec
-@itemx -mno-sched-ar-in-data-spec
-@opindex msched-ar-in-data-spec
-@opindex mno-sched-ar-in-data-spec
-(En/Dis)able speculative scheduling of the instructions that
-are dependent on the data speculative loads after reload.
-This is effective only with @option{-msched-ar-data-spec} enabled.
-The default is 'enable'.
+@item -mxl-soft-div
+@opindex mxl-soft-div
+Use software emulation for divides (default).
 
-@item -msched-in-control-spec
-@itemx -mno-sched-in-control-spec
-@opindex msched-in-control-spec
-@opindex mno-sched-in-control-spec
-(En/Dis)able speculative scheduling of the instructions that
-are dependent on the control speculative loads.
-This is effective only with @option{-msched-control-spec} enabled.
-The default is 'enable'.
+@item -mxl-barrel-shift
+@opindex mxl-barrel-shift
+Use the hardware barrel shifter.
 
-@item -mno-sched-prefer-non-data-spec-insns
-@itemx -msched-prefer-non-data-spec-insns
-@opindex mno-sched-prefer-non-data-spec-insns
-@opindex msched-prefer-non-data-spec-insns
-If enabled, data-speculative instructions are chosen for schedule
-only if there are no other choices at the moment.  This makes
-the use of the data speculation much more conservative.
-The default is 'disable'.
+@item -mxl-pattern-compare
+@opindex mxl-pattern-compare
+Use pattern compare instructions.
 
-@item -mno-sched-prefer-non-control-spec-insns
-@itemx -msched-prefer-non-control-spec-insns
-@opindex mno-sched-prefer-non-control-spec-insns
-@opindex msched-prefer-non-control-spec-insns
-If enabled, control-speculative instructions are chosen for schedule
-only if there are no other choices at the moment.  This makes
-the use of the control speculation much more conservative.
-The default is 'disable'.
+@item -msmall-divides
+@opindex msmall-divides
+Use table lookup optimization for small signed integer divisions.
 
-@item -mno-sched-count-spec-in-critical-path
-@itemx -msched-count-spec-in-critical-path
-@opindex mno-sched-count-spec-in-critical-path
-@opindex msched-count-spec-in-critical-path
-If enabled, speculative dependencies are considered during
-computation of the instructions priorities.  This makes the use of the
-speculation a bit more conservative.
-The default is 'disable'.
+@item -mxl-stack-check
+@opindex mxl-stack-check
+This option is deprecated.  Use @option{-fstack-check} instead.
 
-@item -msched-spec-ldc
-@opindex msched-spec-ldc
-Use a simple data speculation check.  This option is on by default.
+@item -mxl-gp-opt
+@opindex mxl-gp-opt
+Use GP-relative @code{.sdata}/@code{.sbss} sections.
 
-@item -msched-control-spec-ldc
-@opindex msched-spec-ldc
-Use a simple check for control speculation.  This option is on by default.
+@item -mxl-multiply-high
+@opindex mxl-multiply-high
+Use multiply-high instructions for the high part of a 32x32 multiply.
 
-@item -msched-stop-bits-after-every-cycle
-@opindex msched-stop-bits-after-every-cycle
-Place a stop bit after every cycle when scheduling.  This option is on
-by default.
+@item -mxl-float-convert
+@opindex mxl-float-convert
+Use hardware floating-point conversion instructions.
 
-@item -msched-fp-mem-deps-zero-cost
-@opindex msched-fp-mem-deps-zero-cost
-Assume that floating-point stores and loads are not likely to cause a conflict
-when placed into the same instruction group.  This option is disabled by
-default.
+@item -mxl-float-sqrt
+@opindex mxl-float-sqrt
+Use hardware floating-point square root instruction.
 
-@item -msel-sched-dont-check-control-spec
-@opindex msel-sched-dont-check-control-spec
-Generate checks for control speculation in selective scheduling.
-This flag is disabled by default.
+@item -mbig-endian
+@opindex mbig-endian
+Generate code for a big-endian target.
 
-@item -msched-max-memory-insns=@var{max-insns}
-@opindex msched-max-memory-insns
-Limit on the number of memory insns per instruction group, giving lower
-priority to subsequent memory insns attempting to schedule in the same
-instruction group. Frequently useful to prevent cache bank conflicts.
-The default value is 1.
+@item -mlittle-endian
+@opindex mlittle-endian
+Generate code for a little-endian target.
 
-@item -msched-max-memory-insns-hard-limit
-@opindex msched-max-memory-insns-hard-limit
-Makes the limit specified by @option{msched-max-memory-insns} a hard limit,
-disallowing more than that number in an instruction group.
-Otherwise, the limit is ``soft'', meaning that non-memory operations
-are preferred when the limit is reached, but memory operations may still
-be scheduled.
-
-@end table
-
-@node LM32 Options
-@subsection LM32 Options
-@cindex LM32 options
-
-These @option{-m} options are defined for the LatticeMico32 architecture:
+@item -mxl-reorder
+@opindex mxl-reorder
+Use reorder instructions (swap and byte reversed load/store).
 
-@table @gcctabopt
-@item -mbarrel-shift-enabled
-@opindex mbarrel-shift-enabled
-Enable barrel-shift instructions.
+@item -mxl-mode-@var{app-model}
+Select application model @var{app-model}.  Valid models are:
+@table @samp
+@item executable
+normal executable (default), uses startup code @file{crt0.o}.
 
-@item -mdivide-enabled
-@opindex mdivide-enabled
-Enable divide and modulus instructions.
+@item xmdstub
+for use with the Xilinx Microprocessor Debugger (XMD) based
+software-intrusive debug agent called xmdstub.  This uses startup file
+@file{crt1.o} and sets the start address of the program to 0x800.
 
-@item -mmultiply-enabled
-@opindex multiply-enabled
-Enable multiply instructions.
+@item bootstrap
+for applications that are loaded using a bootloader.
+This model uses startup file @file{crt2.o} which does not contain a processor
+reset vector handler. This is suitable for transferring control on a
+processor reset to the bootloader rather than the application.
 
-@item -msign-extend-enabled
-@opindex msign-extend-enabled
-Enable sign extend instructions.
+@item novectors
+for applications that do not require any of the
+MicroBlaze vectors. This option may be useful for applications running
+within a monitoring application. This model uses @file{crt3.o} as a startup file.
+@end table
 
-@item -muser-enabled
-@opindex muser-enabled
-Enable user-defined instructions.
+Option @option{-xl-mode-@var{app-model}} is a deprecated alias for
+@option{-mxl-mode-@var{app-model}}.
 
 @end table
 
-@node M32C Options
-@subsection M32C Options
-@cindex M32C options
+@node MIPS Options
+@subsection MIPS Options
+@cindex MIPS options
 
 @table @gcctabopt
-@item -mcpu=@var{name}
-@opindex mcpu=
-Select the CPU for which code is generated.  @var{name} may be one of
-@samp{r8c} for the R8C/Tiny series, @samp{m16c} for the M16C (up to
-/60) series, @samp{m32cm} for the M16C/80 series, or @samp{m32c} for
-the M32C/80 series.
 
-@item -msim
-@opindex msim
-Specifies that the program will be run on the simulator.  This causes
-an alternate runtime library to be linked in which supports, for
-example, file I/O@.  You must not use this option when generating
-programs that will run on real hardware; you must provide your own
-runtime library for whatever I/O functions are needed.
+@item -EB
+@opindex EB
+Generate big-endian code.
 
-@item -memregs=@var{number}
-@opindex memregs=
-Specifies the number of memory-based pseudo-registers GCC uses
-during code generation.  These pseudo-registers are used like real
-registers, so there is a tradeoff between GCC's ability to fit the
-code into available registers, and the performance penalty of using
-memory instead of registers.  Note that all modules in a program must
-be compiled with the same value for this option.  Because of that, you
-must not use this option with GCC's default runtime libraries.
+@item -EL
+@opindex EL
+Generate little-endian code.  This is the default for @samp{mips*el-*-*}
+configurations.
 
-@end table
+@item -march=@var{arch}
+@opindex march
+Generate code that runs on @var{arch}, which can be the name of a
+generic MIPS ISA, or the name of a particular processor.
+The ISA names are:
+@samp{mips1}, @samp{mips2}, @samp{mips3}, @samp{mips4},
+@samp{mips32}, @samp{mips32r2}, @samp{mips32r3}, @samp{mips32r5},
+@samp{mips32r6}, @samp{mips64}, @samp{mips64r2}, @samp{mips64r3},
+@samp{mips64r5} and @samp{mips64r6}.
+The processor names are:
+@samp{4kc}, @samp{4km}, @samp{4kp}, @samp{4ksc},
+@samp{4kec}, @samp{4kem}, @samp{4kep}, @samp{4ksd},
+@samp{5kc}, @samp{5kf},
+@samp{20kc},
+@samp{24kc}, @samp{24kf2_1}, @samp{24kf1_1},
+@samp{24kec}, @samp{24kef2_1}, @samp{24kef1_1},
+@samp{34kc}, @samp{34kf2_1}, @samp{34kf1_1}, @samp{34kn},
+@samp{74kc}, @samp{74kf2_1}, @samp{74kf1_1}, @samp{74kf3_2},
+@samp{1004kc}, @samp{1004kf2_1}, @samp{1004kf1_1},
+@samp{loongson2e}, @samp{loongson2f}, @samp{loongson3a},
+@samp{m4k},
+@samp{m14k}, @samp{m14kc}, @samp{m14ke}, @samp{m14kec},
+@samp{octeon}, @samp{octeon+}, @samp{octeon2}, @samp{octeon3},
+@samp{orion},
+@samp{p5600},
+@samp{r2000}, @samp{r3000}, @samp{r3900}, @samp{r4000}, @samp{r4400},
+@samp{r4600}, @samp{r4650}, @samp{r4700}, @samp{r6000}, @samp{r8000},
+@samp{rm7000}, @samp{rm9000},
+@samp{r10000}, @samp{r12000}, @samp{r14000}, @samp{r16000},
+@samp{sb1},
+@samp{sr71000},
+@samp{vr4100}, @samp{vr4111}, @samp{vr4120}, @samp{vr4130}, @samp{vr4300},
+@samp{vr5000}, @samp{vr5400}, @samp{vr5500},
+@samp{xlr} and @samp{xlp}.
+The special value @samp{from-abi} selects the
+most compatible architecture for the selected ABI (that is,
+@samp{mips1} for 32-bit ABIs and @samp{mips3} for 64-bit ABIs)@.
 
-@node M32R/D Options
-@subsection M32R/D Options
-@cindex M32R/D options
+The native GNU/Linux toolchain also supports the value @samp{native},
+which selects the best architecture option for the host processor.
+@option{-march=native} has no effect if GCC does not recognize
+the processor.
 
-These @option{-m} options are defined for Renesas M32R/D architectures:
+In processor names, a final @samp{000} can be abbreviated as @samp{k}
+(for example, @option{-march=r2k}).  Prefixes are optional, and
+@samp{vr} may be written @samp{r}.
 
-@table @gcctabopt
-@item -m32r2
-@opindex m32r2
-Generate code for the M32R/2@.
+Names of the form @samp{@var{n}f2_1} refer to processors with
+FPUs clocked at half the rate of the core, names of the form
+@samp{@var{n}f1_1} refer to processors with FPUs clocked at the same
+rate as the core, and names of the form @samp{@var{n}f3_2} refer to
+processors with FPUs clocked at a ratio of 3:2 with respect to the core.
+For compatibility reasons, @samp{@var{n}f} is accepted as a synonym
+for @samp{@var{n}f2_1} while @samp{@var{n}x} and @samp{@var{n}fx} are
+accepted as synonyms for @samp{@var{n}f1_1}.
 
-@item -m32rx
-@opindex m32rx
-Generate code for the M32R/X@.
+GCC defines two macros based on the value of this option.  The first
+is @code{_MIPS_ARCH}, which gives the name of the target architecture, as
+a string.  The second has the form @code{_MIPS_ARCH_@var{foo}},
+where @var{foo} is the capitalized value of @code{_MIPS_ARCH}@.
+For example, @option{-march=r2000} sets @code{_MIPS_ARCH}
+to @code{"r2000"} and defines the macro @code{_MIPS_ARCH_R2000}.
 
-@item -m32r
-@opindex m32r
-Generate code for the M32R@.  This is the default.
+Note that the @code{_MIPS_ARCH} macro uses the processor names given
+above.  In other words, it has the full prefix and does not
+abbreviate @samp{000} as @samp{k}.  In the case of @samp{from-abi},
+the macro names the resolved architecture (either @code{"mips1"} or
+@code{"mips3"}).  It names the default architecture when no
+@option{-march} option is given.
 
-@item -mmodel=small
-@opindex mmodel=small
-Assume all objects live in the lower 16MB of memory (so that their addresses
-can be loaded with the @code{ld24} instruction), and assume all subroutines
-are reachable with the @code{bl} instruction.
-This is the default.
+@item -mtune=@var{arch}
+@opindex mtune
+Optimize for @var{arch}.  Among other things, this option controls
+the way instructions are scheduled, and the perceived cost of arithmetic
+operations.  The list of @var{arch} values is the same as for
+@option{-march}.
 
-The addressability of a particular object can be set with the
-@code{model} attribute.
+When this option is not used, GCC optimizes for the processor
+specified by @option{-march}.  By using @option{-march} and
+@option{-mtune} together, it is possible to generate code that
+runs on a family of processors, but optimize the code for one
+particular member of that family.
 
-@item -mmodel=medium
-@opindex mmodel=medium
-Assume objects may be anywhere in the 32-bit address space (the compiler
-generates @code{seth/add3} instructions to load their addresses), and
-assume all subroutines are reachable with the @code{bl} instruction.
+@option{-mtune} defines the macros @code{_MIPS_TUNE} and
+@code{_MIPS_TUNE_@var{foo}}, which work in the same way as the
+@option{-march} ones described above.
 
-@item -mmodel=large
-@opindex mmodel=large
-Assume objects may be anywhere in the 32-bit address space (the compiler
-generates @code{seth/add3} instructions to load their addresses), and
-assume subroutines may not be reachable with the @code{bl} instruction
-(the compiler generates the much slower @code{seth/add3/jl}
-instruction sequence).
+@item -mips1
+@opindex mips1
+Equivalent to @option{-march=mips1}.
 
-@item -msdata=none
-@opindex msdata=none
-Disable use of the small data area.  Variables are put into
-one of @code{.data}, @code{.bss}, or @code{.rodata} (unless the
-@code{section} attribute has been specified).
-This is the default.
+@item -mips2
+@opindex mips2
+Equivalent to @option{-march=mips2}.
 
-The small data area consists of sections @code{.sdata} and @code{.sbss}.
-Objects may be explicitly put in the small data area with the
-@code{section} attribute using one of these sections.
+@item -mips3
+@opindex mips3
+Equivalent to @option{-march=mips3}.
 
-@item -msdata=sdata
-@opindex msdata=sdata
-Put small global and static data in the small data area, but do not
-generate special code to reference them.
+@item -mips4
+@opindex mips4
+Equivalent to @option{-march=mips4}.
 
-@item -msdata=use
-@opindex msdata=use
-Put small global and static data in the small data area, and generate
-special instructions to reference them.
+@item -mips32
+@opindex mips32
+Equivalent to @option{-march=mips32}.
 
-@item -G @var{num}
-@opindex G
-@cindex smaller data references
-Put global and static objects less than or equal to @var{num} bytes
-into the small data or BSS sections instead of the normal data or BSS
-sections.  The default value of @var{num} is 8.
-The @option{-msdata} option must be set to one of @samp{sdata} or @samp{use}
-for this option to have any effect.
+@item -mips32r3
+@opindex mips32r3
+Equivalent to @option{-march=mips32r3}.
 
-All modules should be compiled with the same @option{-G @var{num}} value.
-Compiling with different values of @var{num} may or may not work; if it
-doesn't the linker gives an error message---incorrect code is not
-generated.
+@item -mips32r5
+@opindex mips32r5
+Equivalent to @option{-march=mips32r5}.
 
-@item -mdebug
-@opindex mdebug
-Makes the M32R-specific code in the compiler display some statistics
-that might help in debugging programs.
+@item -mips32r6
+@opindex mips32r6
+Equivalent to @option{-march=mips32r6}.
 
-@item -malign-loops
-@opindex malign-loops
-Align all loops to a 32-byte boundary.
+@item -mips64
+@opindex mips64
+Equivalent to @option{-march=mips64}.
 
-@item -mno-align-loops
-@opindex mno-align-loops
-Do not enforce a 32-byte alignment for loops.  This is the default.
+@item -mips64r2
+@opindex mips64r2
+Equivalent to @option{-march=mips64r2}.
 
-@item -missue-rate=@var{number}
-@opindex missue-rate=@var{number}
-Issue @var{number} instructions per cycle.  @var{number} can only be 1
-or 2.
+@item -mips64r3
+@opindex mips64r3
+Equivalent to @option{-march=mips64r3}.
 
-@item -mbranch-cost=@var{number}
-@opindex mbranch-cost=@var{number}
-@var{number} can only be 1 or 2.  If it is 1 then branches are
-preferred over conditional code, if it is 2, then the opposite applies.
+@item -mips64r5
+@opindex mips64r5
+Equivalent to @option{-march=mips64r5}.
 
-@item -mflush-trap=@var{number}
-@opindex mflush-trap=@var{number}
-Specifies the trap number to use to flush the cache.  The default is
-12.  Valid numbers are between 0 and 15 inclusive.
+@item -mips64r6
+@opindex mips64r6
+Equivalent to @option{-march=mips64r6}.
 
-@item -mno-flush-trap
-@opindex mno-flush-trap
-Specifies that the cache cannot be flushed by using a trap.
+@item -mips16
+@itemx -mno-mips16
+@opindex mips16
+@opindex mno-mips16
+Generate (do not generate) MIPS16 code.  If GCC is targeting a
+MIPS32 or MIPS64 architecture, it makes use of the MIPS16e ASE@.
 
-@item -mflush-func=@var{name}
-@opindex mflush-func=@var{name}
-Specifies the name of the operating system function to call to flush
-the cache.  The default is @samp{_flush_cache}, but a function call
-is only used if a trap is not available.
+MIPS16 code generation can also be controlled on a per-function basis
+by means of @code{mips16} and @code{nomips16} attributes.
+@xref{Function Attributes}, for more information.
 
-@item -mno-flush-func
-@opindex mno-flush-func
-Indicates that there is no OS function for flushing the cache.
+@item -mflip-mips16
+@opindex mflip-mips16
+Generate MIPS16 code on alternating functions.  This option is provided
+for regression testing of mixed MIPS16/non-MIPS16 code generation, and is
+not intended for ordinary use in compiling user code.
 
-@end table
+@item -minterlink-compressed
+@itemx -mno-interlink-compressed
+@opindex minterlink-compressed
+@opindex mno-interlink-compressed
+Require (do not require) that code using the standard (uncompressed) MIPS ISA
+be link-compatible with MIPS16 and microMIPS code, and vice versa.
 
-@node M680x0 Options
-@subsection M680x0 Options
-@cindex M680x0 options
+For example, code using the standard ISA encoding cannot jump directly
+to MIPS16 or microMIPS code; it must either use a call or an indirect jump.
+@option{-minterlink-compressed} therefore disables direct jumps unless GCC
+knows that the target of the jump is not compressed.
 
-These are the @samp{-m} options defined for M680x0 and ColdFire processors.
-The default settings depend on which architecture was selected when
-the compiler was configured; the defaults for the most common choices
-are given below.
+@item -minterlink-mips16
+@itemx -mno-interlink-mips16
+@opindex minterlink-mips16
+@opindex mno-interlink-mips16
+Aliases of @option{-minterlink-compressed} and
+@option{-mno-interlink-compressed}.  These options predate the microMIPS ASE
+and are retained for backwards compatibility.
 
-@table @gcctabopt
-@item -march=@var{arch}
-@opindex march
-Generate code for a specific M680x0 or ColdFire instruction set
-architecture.  Permissible values of @var{arch} for M680x0
-architectures are: @samp{68000}, @samp{68010}, @samp{68020},
-@samp{68030}, @samp{68040}, @samp{68060} and @samp{cpu32}.  ColdFire
-architectures are selected according to Freescale's ISA classification
-and the permissible values are: @samp{isaa}, @samp{isaaplus},
-@samp{isab} and @samp{isac}.
+@item -mabi=32
+@itemx -mabi=o64
+@itemx -mabi=n32
+@itemx -mabi=64
+@itemx -mabi=eabi
+@opindex mabi=32
+@opindex mabi=o64
+@opindex mabi=n32
+@opindex mabi=64
+@opindex mabi=eabi
+Generate code for the given ABI@.
 
-GCC defines a macro @code{__mcf@var{arch}__} whenever it is generating
-code for a ColdFire target.  The @var{arch} in this macro is one of the
-@option{-march} arguments given above.
+Note that the EABI has a 32-bit and a 64-bit variant.  GCC normally
+generates 64-bit code when you select a 64-bit architecture, but you
+can use @option{-mgp32} to get 32-bit code instead.
 
-When used together, @option{-march} and @option{-mtune} select code
-that runs on a family of similar processors but that is optimized
-for a particular microarchitecture.
+For information about the O64 ABI, see
+@uref{http://gcc.gnu.org/@/projects/@/mipso64-abi.html}.
 
-@item -mcpu=@var{cpu}
-@opindex mcpu
-Generate code for a specific M680x0 or ColdFire processor.
-The M680x0 @var{cpu}s are: @samp{68000}, @samp{68010}, @samp{68020},
-@samp{68030}, @samp{68040}, @samp{68060}, @samp{68302}, @samp{68332}
-and @samp{cpu32}.  The ColdFire @var{cpu}s are given by the table
-below, which also classifies the CPUs into families:
+GCC supports a variant of the o32 ABI in which floating-point registers
+are 64 rather than 32 bits wide.  You can select this combination with
+@option{-mabi=32} @option{-mfp64}.  This ABI relies on the @code{mthc1}
+and @code{mfhc1} instructions and is therefore only supported for
+MIPS32R2, MIPS32R3 and MIPS32R5 processors.
 
-@multitable @columnfractions 0.20 0.80
-@item @strong{Family} @tab @strong{@samp{-mcpu} arguments}
-@item @samp{51} @tab @samp{51} @samp{51ac} @samp{51ag} @samp{51cn} @samp{51em} @samp{51je} @samp{51jf} @samp{51jg} @samp{51jm} @samp{51mm} @samp{51qe} @samp{51qm}
-@item @samp{5206} @tab @samp{5202} @samp{5204} @samp{5206}
-@item @samp{5206e} @tab @samp{5206e}
-@item @samp{5208} @tab @samp{5207} @samp{5208}
-@item @samp{5211a} @tab @samp{5210a} @samp{5211a}
-@item @samp{5213} @tab @samp{5211} @samp{5212} @samp{5213}
-@item @samp{5216} @tab @samp{5214} @samp{5216}
-@item @samp{52235} @tab @samp{52230} @samp{52231} @samp{52232} @samp{52233} @samp{52234} @samp{52235}
-@item @samp{5225} @tab @samp{5224} @samp{5225}
-@item @samp{52259} @tab @samp{52252} @samp{52254} @samp{52255} @samp{52256} @samp{52258} @samp{52259}
-@item @samp{5235} @tab @samp{5232} @samp{5233} @samp{5234} @samp{5235} @samp{523x}
-@item @samp{5249} @tab @samp{5249}
-@item @samp{5250} @tab @samp{5250}
-@item @samp{5271} @tab @samp{5270} @samp{5271}
-@item @samp{5272} @tab @samp{5272}
-@item @samp{5275} @tab @samp{5274} @samp{5275}
-@item @samp{5282} @tab @samp{5280} @samp{5281} @samp{5282} @samp{528x}
-@item @samp{53017} @tab @samp{53011} @samp{53012} @samp{53013} @samp{53014} @samp{53015} @samp{53016} @samp{53017}
-@item @samp{5307} @tab @samp{5307}
-@item @samp{5329} @tab @samp{5327} @samp{5328} @samp{5329} @samp{532x}
-@item @samp{5373} @tab @samp{5372} @samp{5373} @samp{537x}
-@item @samp{5407} @tab @samp{5407}
-@item @samp{5475} @tab @samp{5470} @samp{5471} @samp{5472} @samp{5473} @samp{5474} @samp{5475} @samp{547x} @samp{5480} @samp{5481} @samp{5482} @samp{5483} @samp{5484} @samp{5485}
-@end multitable
+The register assignments for arguments and return values remain the
+same, but each scalar value is passed in a single 64-bit register
+rather than a pair of 32-bit registers.  For example, scalar
+floating-point values are returned in @samp{$f0} only, not a
+@samp{$f0}/@samp{$f1} pair.  The set of call-saved registers also
+remains the same in that the even-numbered double-precision registers
+are saved.
 
-@option{-mcpu=@var{cpu}} overrides @option{-march=@var{arch}} if
-@var{arch} is compatible with @var{cpu}.  Other combinations of
-@option{-mcpu} and @option{-march} are rejected.
+Two additional variants of the o32 ABI are supported to enable
+a transition from 32-bit to 64-bit registers.  These are FPXX
+(@option{-mfpxx}) and FP64A (@option{-mfp64} @option{-mno-odd-spreg}).
+The FPXX extension mandates that all code must execute correctly
+when run using 32-bit or 64-bit registers.  The code can be interlinked
+with either FP32 or FP64, but not both.
+The FP64A extension is similar to the FP64 extension but forbids the
+use of odd-numbered single-precision registers.  This can be used
+in conjunction with the @code{FRE} mode of FPUs in MIPS32R5
+processors and allows both FP32 and FP64A code to interlink and
+run in the same process without changing FPU modes.
 
-GCC defines the macro @code{__mcf_cpu_@var{cpu}} when ColdFire target
-@var{cpu} is selected.  It also defines @code{__mcf_family_@var{family}},
-where the value of @var{family} is given by the table above.
+@item -mabicalls
+@itemx -mno-abicalls
+@opindex mabicalls
+@opindex mno-abicalls
+Generate (do not generate) code that is suitable for SVR4-style
+dynamic objects.  @option{-mabicalls} is the default for SVR4-based
+systems.
 
-@item -mtune=@var{tune}
-@opindex mtune
-Tune the code for a particular microarchitecture within the
-constraints set by @option{-march} and @option{-mcpu}.
-The M680x0 microarchitectures are: @samp{68000}, @samp{68010},
-@samp{68020}, @samp{68030}, @samp{68040}, @samp{68060}
-and @samp{cpu32}.  The ColdFire microarchitectures
-are: @samp{cfv1}, @samp{cfv2}, @samp{cfv3}, @samp{cfv4} and @samp{cfv4e}.
+@item -mshared
+@itemx -mno-shared
+Generate (do not generate) code that is fully position-independent,
+and that can therefore be linked into shared libraries.  This option
+only affects @option{-mabicalls}.
 
-You can also use @option{-mtune=68020-40} for code that needs
-to run relatively well on 68020, 68030 and 68040 targets.
-@option{-mtune=68020-60} is similar but includes 68060 targets
-as well.  These two options select the same tuning decisions as
-@option{-m68020-40} and @option{-m68020-60} respectively.
+All @option{-mabicalls} code has traditionally been position-independent,
+regardless of options like @option{-fPIC} and @option{-fpic}.  However,
+as an extension, the GNU toolchain allows executables to use absolute
+accesses for locally-binding symbols.  It can also use shorter GP
+initialization sequences and generate direct calls to locally-defined
+functions.  This mode is selected by @option{-mno-shared}.
 
-GCC defines the macros @code{__mc@var{arch}} and @code{__mc@var{arch}__}
-when tuning for 680x0 architecture @var{arch}.  It also defines
-@code{mc@var{arch}} unless either @option{-ansi} or a non-GNU @option{-std}
-option is used.  If GCC is tuning for a range of architectures,
-as selected by @option{-mtune=68020-40} or @option{-mtune=68020-60},
-it defines the macros for every architecture in the range.
+@option{-mno-shared} depends on binutils 2.16 or higher and generates
+objects that can only be linked by the GNU linker.  However, the option
+does not affect the ABI of the final executable; it only affects the ABI
+of relocatable objects.  Using @option{-mno-shared} generally makes
+executables both smaller and quicker.
 
-GCC also defines the macro @code{__m@var{uarch}__} when tuning for
-ColdFire microarchitecture @var{uarch}, where @var{uarch} is one
-of the arguments given above.
+@option{-mshared} is the default.
 
-@item -m68000
-@itemx -mc68000
-@opindex m68000
-@opindex mc68000
-Generate output for a 68000.  This is the default
-when the compiler is configured for 68000-based systems.
-It is equivalent to @option{-march=68000}.
+@item -mplt
+@itemx -mno-plt
+@opindex mplt
+@opindex mno-plt
+Assume (do not assume) that the static and dynamic linkers
+support PLTs and copy relocations.  This option only affects
+@option{-mno-shared -mabicalls}.  For the n64 ABI, this option
+has no effect without @option{-msym32}.
 
-Use this option for microcontrollers with a 68000 or EC000 core,
-including the 68008, 68302, 68306, 68307, 68322, 68328 and 68356.
+You can make @option{-mplt} the default by configuring
+GCC with @option{--with-mips-plt}.  The default is
+@option{-mno-plt} otherwise.
 
-@item -m68010
-@opindex m68010
-Generate output for a 68010.  This is the default
-when the compiler is configured for 68010-based systems.
-It is equivalent to @option{-march=68010}.
+@item -mxgot
+@itemx -mno-xgot
+@opindex mxgot
+@opindex mno-xgot
+Lift (do not lift) the usual restrictions on the size of the global
+offset table.
 
-@item -m68020
-@itemx -mc68020
-@opindex m68020
-@opindex mc68020
-Generate output for a 68020.  This is the default
-when the compiler is configured for 68020-based systems.
-It is equivalent to @option{-march=68020}.
-
-@item -m68030
-@opindex m68030
-Generate output for a 68030.  This is the default when the compiler is
-configured for 68030-based systems.  It is equivalent to
-@option{-march=68030}.
-
-@item -m68040
-@opindex m68040
-Generate output for a 68040.  This is the default when the compiler is
-configured for 68040-based systems.  It is equivalent to
-@option{-march=68040}.
-
-This option inhibits the use of 68881/68882 instructions that have to be
-emulated by software on the 68040.  Use this option if your 68040 does not
-have code to emulate those instructions.
-
-@item -m68060
-@opindex m68060
-Generate output for a 68060.  This is the default when the compiler is
-configured for 68060-based systems.  It is equivalent to
-@option{-march=68060}.
-
-This option inhibits the use of 68020 and 68881/68882 instructions that
-have to be emulated by software on the 68060.  Use this option if your 68060
-does not have code to emulate those instructions.
-
-@item -mcpu32
-@opindex mcpu32
-Generate output for a CPU32.  This is the default
-when the compiler is configured for CPU32-based systems.
-It is equivalent to @option{-march=cpu32}.
-
-Use this option for microcontrollers with a
-CPU32 or CPU32+ core, including the 68330, 68331, 68332, 68333, 68334,
-68336, 68340, 68341, 68349 and 68360.
-
-@item -m5200
-@opindex m5200
-Generate output for a 520X ColdFire CPU@.  This is the default
-when the compiler is configured for 520X-based systems.
-It is equivalent to @option{-mcpu=5206}, and is now deprecated
-in favor of that option.
-
-Use this option for microcontroller with a 5200 core, including
-the MCF5202, MCF5203, MCF5204 and MCF5206.
+GCC normally uses a single instruction to load values from the GOT@.
+While this is relatively efficient, it only works if the GOT
+is smaller than about 64k.  Anything larger causes the linker
+to report an error such as:
 
-@item -m5206e
-@opindex m5206e
-Generate output for a 5206e ColdFire CPU@.  The option is now
-deprecated in favor of the equivalent @option{-mcpu=5206e}.
+@cindex relocation truncated to fit (MIPS)
+@smallexample
+relocation truncated to fit: R_MIPS_GOT16 foobar
+@end smallexample
 
-@item -m528x
-@opindex m528x
-Generate output for a member of the ColdFire 528X family.
-The option is now deprecated in favor of the equivalent
-@option{-mcpu=528x}.
+If this happens, you should recompile your code with @option{-mxgot}.
+This works with very large GOTs, although the code is also
+less efficient, since it takes three instructions to fetch the
+value of a global symbol.
 
-@item -m5307
-@opindex m5307
-Generate output for a ColdFire 5307 CPU@.  The option is now deprecated
-in favor of the equivalent @option{-mcpu=5307}.
+Note that some linkers can create multiple GOTs.  If you have such a
+linker, you should only need to use @option{-mxgot} when a single object
+file accesses more than 64k's worth of GOT entries.  Very few do.
 
-@item -m5407
-@opindex m5407
-Generate output for a ColdFire 5407 CPU@.  The option is now deprecated
-in favor of the equivalent @option{-mcpu=5407}.
+These options have no effect unless GCC is generating
+position-independent code.
 
-@item -mcfv4e
-@opindex mcfv4e
-Generate output for a ColdFire V4e family CPU (e.g.@: 547x/548x).
-This includes use of hardware floating-point instructions.
-The option is equivalent to @option{-mcpu=547x}, and is now
-deprecated in favor of that option.
+@item -mgp32
+@opindex mgp32
+Assume that general-purpose registers are 32 bits wide.
 
-@item -m68020-40
-@opindex m68020-40
-Generate output for a 68040, without using any of the new instructions.
-This results in code that can run relatively efficiently on either a
-68020/68881 or a 68030 or a 68040.  The generated code does use the
-68881 instructions that are emulated on the 68040.
+@item -mgp64
+@opindex mgp64
+Assume that general-purpose registers are 64 bits wide.
 
-The option is equivalent to @option{-march=68020} @option{-mtune=68020-40}.
+@item -mfp32
+@opindex mfp32
+Assume that floating-point registers are 32 bits wide.
 
-@item -m68020-60
-@opindex m68020-60
-Generate output for a 68060, without using any of the new instructions.
-This results in code that can run relatively efficiently on either a
-68020/68881 or a 68030 or a 68040.  The generated code does use the
-68881 instructions that are emulated on the 68060.
+@item -mfp64
+@opindex mfp64
+Assume that floating-point registers are 64 bits wide.
 
-The option is equivalent to @option{-march=68020} @option{-mtune=68020-60}.
+@item -mfpxx
+@opindex mfpxx
+Do not assume the width of floating-point registers.
 
 @item -mhard-float
-@itemx -m68881
 @opindex mhard-float
-@opindex m68881
-Generate floating-point instructions.  This is the default for 68020
-and above, and for ColdFire devices that have an FPU@.  It defines the
-macro @code{__HAVE_68881__} on M680x0 targets and @code{__mcffpu__}
-on ColdFire targets.
+Use floating-point coprocessor instructions.
 
 @item -msoft-float
 @opindex msoft-float
-Do not generate floating-point instructions; use library calls instead.
-This is the default for 68000, 68010, and 68832 targets.  It is also
-the default for ColdFire devices that have no FPU.
+Do not use floating-point coprocessor instructions.  Implement
+floating-point calculations using library calls instead.
 
-@item -mdiv
-@itemx -mno-div
-@opindex mdiv
-@opindex mno-div
-Generate (do not generate) ColdFire hardware divide and remainder
-instructions.  If @option{-march} is used without @option{-mcpu},
-the default is ``on'' for ColdFire architectures and ``off'' for M680x0
-architectures.  Otherwise, the default is taken from the target CPU
-(either the default CPU, or the one specified by @option{-mcpu}).  For
-example, the default is ``off'' for @option{-mcpu=5206} and ``on'' for
-@option{-mcpu=5206e}.
+@item -mno-float
+@opindex mno-float
+Equivalent to @option{-msoft-float}, but additionally asserts that the
+program being compiled does not perform any floating-point operations.
+This option is presently supported only by some bare-metal MIPS
+configurations, where it may select a special set of libraries
+that lack all floating-point support (including, for example, the
+floating-point @code{printf} formats).
+If code compiled with @option{-mno-float} accidentally contains
+floating-point operations, it is likely to suffer a link-time
+or run-time failure.
 
-GCC defines the macro @code{__mcfhwdiv__} when this option is enabled.
+@item -msingle-float
+@opindex msingle-float
+Assume that the floating-point coprocessor only supports single-precision
+operations.
 
-@item -mshort
-@opindex mshort
-Consider type @code{int} to be 16 bits wide, like @code{short int}.
-Additionally, parameters passed on the stack are also aligned to a
-16-bit boundary even on targets whose API mandates promotion to 32-bit.
+@item -mdouble-float
+@opindex mdouble-float
+Assume that the floating-point coprocessor supports double-precision
+operations.  This is the default.
 
-@item -mno-short
-@opindex mno-short
-Do not consider type @code{int} to be 16 bits wide.  This is the default.
+@item -modd-spreg
+@itemx -mno-odd-spreg
+@opindex modd-spreg
+@opindex mno-odd-spreg
+Enable the use of odd-numbered single-precision floating-point registers
+for the o32 ABI.  This is the default for processors that are known to
+support these registers.  When using the o32 FPXX ABI, @option{-mno-odd-spreg}
+is set by default.
 
-@item -mnobitfield
-@itemx -mno-bitfield
-@opindex mnobitfield
-@opindex mno-bitfield
-Do not use the bit-field instructions.  The @option{-m68000}, @option{-mcpu32}
-and @option{-m5200} options imply @w{@option{-mnobitfield}}.
+@item -mabs=2008
+@itemx -mabs=legacy
+@opindex mabs=2008
+@opindex mabs=legacy
+These options control the treatment of the special not-a-number (NaN)
+IEEE 754 floating-point data with the @code{abs.@i{fmt}} and
+@code{neg.@i{fmt}} machine instructions.
 
-@item -mbitfield
-@opindex mbitfield
-Do use the bit-field instructions.  The @option{-m68020} option implies
-@option{-mbitfield}.  This is the default if you use a configuration
-designed for a 68020.
+By default or when @option{-mabs=legacy} is used, the legacy
+treatment is selected.  In this case these instructions are considered
+arithmetic and avoided where correct operation is required and the
+input operand might be a NaN.  A longer sequence of instructions that
+manipulate the sign bit of the floating-point datum manually is used
+instead unless the @option{-ffinite-math-only} option has also been
+specified.
 
-@item -mrtd
-@opindex mrtd
-Use a different function-calling convention, in which functions
-that take a fixed number of arguments return with the @code{rtd}
-instruction, which pops their arguments while returning.  This
-saves one instruction in the caller since there is no need to pop
-the arguments there.
+The @option{-mabs=2008} option selects the IEEE 754-2008 treatment.  In
+this case these instructions are considered non-arithmetic and therefore
+operate correctly in all cases, including in particular where the
+input operand is a NaN.  These instructions are therefore always used
+for the respective operations.
 
-This calling convention is incompatible with the one normally
-used on Unix, so you cannot use it if you need to call libraries
-compiled with the Unix compiler.
+@item -mnan=2008
+@itemx -mnan=legacy
+@opindex mnan=2008
+@opindex mnan=legacy
+These options control the encoding of the special not-a-number (NaN)
+IEEE 754 floating-point data.
 
-Also, you must provide function prototypes for all functions that
-take variable numbers of arguments (including @code{printf});
-otherwise incorrect code is generated for calls to those
-functions.
+The @option{-mnan=legacy} option selects the legacy encoding.  In this
+case quiet NaNs (qNaNs) are denoted by the first bit of their trailing
+significand field being 0, whereas signalling NaNs (sNaNs) are denoted
+by the first bit of their trailing significand field being 1.
 
-In addition, seriously incorrect code results if you call a
-function with too many arguments.  (Normally, extra arguments are
-harmlessly ignored.)
+The @option{-mnan=2008} option selects the IEEE 754-2008 encoding.  In
+this case qNaNs are denoted by the first bit of their trailing
+significand field being 1, whereas sNaNs are denoted by the first bit of
+their trailing significand field being 0.
 
-The @code{rtd} instruction is supported by the 68010, 68020, 68030,
-68040, 68060 and CPU32 processors, but not by the 68000 or 5200.
+The default is @option{-mnan=legacy} unless GCC has been configured with
+@option{--with-nan=2008}.
 
-@item -mno-rtd
-@opindex mno-rtd
-Do not use the calling conventions selected by @option{-mrtd}.
-This is the default.
+@item -mllsc
+@itemx -mno-llsc
+@opindex mllsc
+@opindex mno-llsc
+Use (do not use) @samp{ll}, @samp{sc}, and @samp{sync} instructions to
+implement atomic memory built-in functions.  When neither option is
+specified, GCC uses the instructions if the target architecture
+supports them.
 
-@item -malign-int
-@itemx -mno-align-int
-@opindex malign-int
-@opindex mno-align-int
-Control whether GCC aligns @code{int}, @code{long}, @code{long long},
-@code{float}, @code{double}, and @code{long double} variables on a 32-bit
-boundary (@option{-malign-int}) or a 16-bit boundary (@option{-mno-align-int}).
-Aligning variables on 32-bit boundaries produces code that runs somewhat
-faster on processors with 32-bit busses at the expense of more memory.
-
-@strong{Warning:} if you use the @option{-malign-int} switch, GCC
-aligns structures containing the above types differently than
-most published application binary interface specifications for the m68k.
+@option{-mllsc} is useful if the runtime environment can emulate the
+instructions and @option{-mno-llsc} can be useful when compiling for
+nonstandard ISAs.  You can make either option the default by
+configuring GCC with @option{--with-llsc} and @option{--without-llsc}
+respectively.  @option{--with-llsc} is the default for some
+configurations; see the installation documentation for details.
 
-@item -mpcrel
-@opindex mpcrel
-Use the pc-relative addressing mode of the 68000 directly, instead of
-using a global offset table.  At present, this option implies @option{-fpic},
-allowing at most a 16-bit offset for pc-relative addressing.  @option{-fPIC} is
-not presently supported with @option{-mpcrel}, though this could be supported for
-68020 and higher processors.
+@item -mdsp
+@itemx -mno-dsp
+@opindex mdsp
+@opindex mno-dsp
+Use (do not use) revision 1 of the MIPS DSP ASE@.
+@xref{MIPS DSP Built-in Functions}.  This option defines the
+preprocessor macro @code{__mips_dsp}.  It also defines
+@code{__mips_dsp_rev} to 1.
 
-@item -mno-strict-align
-@itemx -mstrict-align
-@opindex mno-strict-align
-@opindex mstrict-align
-Do not (do) assume that unaligned memory references are handled by
-the system.
+@item -mdspr2
+@itemx -mno-dspr2
+@opindex mdspr2
+@opindex mno-dspr2
+Use (do not use) revision 2 of the MIPS DSP ASE@.
+@xref{MIPS DSP Built-in Functions}.  This option defines the
+preprocessor macros @code{__mips_dsp} and @code{__mips_dspr2}.
+It also defines @code{__mips_dsp_rev} to 2.
 
-@item -msep-data
-Generate code that allows the data segment to be located in a different
-area of memory from the text segment.  This allows for execute-in-place in
-an environment without virtual memory management.  This option implies
-@option{-fPIC}.
+@item -msmartmips
+@itemx -mno-smartmips
+@opindex msmartmips
+@opindex mno-smartmips
+Use (do not use) the MIPS SmartMIPS ASE.
 
-@item -mno-sep-data
-Generate code that assumes that the data segment follows the text segment.
-This is the default.
+@item -mpaired-single
+@itemx -mno-paired-single
+@opindex mpaired-single
+@opindex mno-paired-single
+Use (do not use) paired-single floating-point instructions.
+@xref{MIPS Paired-Single Support}.  This option requires
+hardware floating-point support to be enabled.
 
-@item -mid-shared-library
-Generate code that supports shared libraries via the library ID method.
-This allows for execute-in-place and shared libraries in an environment
-without virtual memory management.  This option implies @option{-fPIC}.
+@item -mdmx
+@itemx -mno-mdmx
+@opindex mdmx
+@opindex mno-mdmx
+Use (do not use) MIPS Digital Media Extension instructions.
+This option can only be used when generating 64-bit code and requires
+hardware floating-point support to be enabled.
 
-@item -mno-id-shared-library
-Generate code that doesn't assume ID-based shared libraries are being used.
-This is the default.
+@item -mips3d
+@itemx -mno-mips3d
+@opindex mips3d
+@opindex mno-mips3d
+Use (do not use) the MIPS-3D ASE@.  @xref{MIPS-3D Built-in Functions}.
+The option @option{-mips3d} implies @option{-mpaired-single}.
 
-@item -mshared-library-id=n
-Specifies the identification number of the ID-based shared library being
-compiled.  Specifying a value of 0 generates more compact code; specifying
-other values forces the allocation of that number to the current
-library, but is no more space- or time-efficient than omitting this option.
+@item -mmicromips
+@itemx -mno-micromips
+@opindex mmicromips
+@opindex mno-micromips
+Generate (do not generate) microMIPS code.
 
-@item -mxgot
-@itemx -mno-xgot
-@opindex mxgot
-@opindex mno-xgot
-When generating position-independent code for ColdFire, generate code
-that works if the GOT has more than 8192 entries.  This code is
-larger and slower than code generated without this option.  On M680x0
-processors, this option is not needed; @option{-fPIC} suffices.
+MicroMIPS code generation can also be controlled on a per-function basis
+by means of @code{micromips} and @code{nomicromips} attributes.
+@xref{Function Attributes}, for more information.
 
-GCC normally uses a single instruction to load values from the GOT@.
-While this is relatively efficient, it only works if the GOT
-is smaller than about 64k.  Anything larger causes the linker
-to report an error such as:
+@item -mmt
+@itemx -mno-mt
+@opindex mmt
+@opindex mno-mt
+Use (do not use) MT Multithreading instructions.
 
-@cindex relocation truncated to fit (ColdFire)
-@smallexample
-relocation truncated to fit: R_68K_GOT16O foobar
-@end smallexample
+@item -mmcu
+@itemx -mno-mcu
+@opindex mmcu
+@opindex mno-mcu
+Use (do not use) the MIPS MCU ASE instructions.
 
-If this happens, you should recompile your code with @option{-mxgot}.
-It should then work with very large GOTs.  However, code generated with
-@option{-mxgot} is less efficient, since it takes 4 instructions to fetch
-the value of a global symbol.
+@item -meva
+@itemx -mno-eva
+@opindex meva
+@opindex mno-eva
+Use (do not use) the MIPS Enhanced Virtual Addressing instructions.
 
-Note that some linkers, including newer versions of the GNU linker,
-can create multiple GOTs and sort GOT entries.  If you have such a linker,
-you should only need to use @option{-mxgot} when compiling a single
-object file that accesses more than 8192 GOT entries.  Very few do.
+@item -mvirt
+@itemx -mno-virt
+@opindex mvirt
+@opindex mno-virt
+Use (do not use) the MIPS Virtualization Application Specific instructions.
 
-These options have no effect unless GCC is generating
-position-independent code.
+@item -mxpa
+@itemx -mno-xpa
+@opindex mxpa
+@opindex mno-xpa
+Use (do not use) the MIPS eXtended Physical Address (XPA) instructions.
 
-@end table
+@item -mlong64
+@opindex mlong64
+Force @code{long} types to be 64 bits wide.  See @option{-mlong32} for
+an explanation of the default and the way that the pointer size is
+determined.
 
-@node MCore Options
-@subsection MCore Options
-@cindex MCore options
+@item -mlong32
+@opindex mlong32
+Force @code{long}, @code{int}, and pointer types to be 32 bits wide.
 
-These are the @samp{-m} options defined for the Motorola M*Core
-processors.
+The default size of @code{int}s, @code{long}s and pointers depends on
+the ABI@.  All the supported ABIs use 32-bit @code{int}s.  The n64 ABI
+uses 64-bit @code{long}s, as does the 64-bit EABI; the others use
+32-bit @code{long}s.  Pointers are the same size as @code{long}s,
+or the same size as integer registers, whichever is smaller.
 
-@table @gcctabopt
+@item -msym32
+@itemx -mno-sym32
+@opindex msym32
+@opindex mno-sym32
+Assume (do not assume) that all symbols have 32-bit values, regardless
+of the selected ABI@.  This option is useful in combination with
+@option{-mabi=64} and @option{-mno-abicalls} because it allows GCC
+to generate shorter and faster references to symbolic addresses.
 
-@item -mhardlit
-@itemx -mno-hardlit
-@opindex mhardlit
-@opindex mno-hardlit
-Inline constants into the code stream if it can be done in two
-instructions or less.
+@item -G @var{num}
+@opindex G
+Put definitions of externally-visible data in a small data section
+if that data is no bigger than @var{num} bytes.  GCC can then generate
+more efficient accesses to the data; see @option{-mgpopt} for details.
 
-@item -mdiv
-@itemx -mno-div
-@opindex mdiv
-@opindex mno-div
-Use the divide instruction.  (Enabled by default).
+The default @option{-G} option depends on the configuration.
 
-@item -mrelax-immediate
-@itemx -mno-relax-immediate
-@opindex mrelax-immediate
-@opindex mno-relax-immediate
-Allow arbitrary-sized immediates in bit operations.
+@item -mlocal-sdata
+@itemx -mno-local-sdata
+@opindex mlocal-sdata
+@opindex mno-local-sdata
+Extend (do not extend) the @option{-G} behavior to local data too,
+such as to static variables in C@.  @option{-mlocal-sdata} is the
+default for all configurations.
 
-@item -mwide-bitfields
-@itemx -mno-wide-bitfields
-@opindex mwide-bitfields
-@opindex mno-wide-bitfields
-Always treat bit-fields as @code{int}-sized.
+If the linker complains that an application is using too much small data,
+you might want to try rebuilding the less performance-critical parts with
+@option{-mno-local-sdata}.  You might also want to build large
+libraries with @option{-mno-local-sdata}, so that the libraries leave
+more room for the main program.
 
-@item -m4byte-functions
-@itemx -mno-4byte-functions
-@opindex m4byte-functions
-@opindex mno-4byte-functions
-Force all functions to be aligned to a 4-byte boundary.
+@item -mextern-sdata
+@itemx -mno-extern-sdata
+@opindex mextern-sdata
+@opindex mno-extern-sdata
+Assume (do not assume) that externally-defined data is in
+a small data section if the size of that data is within the @option{-G} limit.
+@option{-mextern-sdata} is the default for all configurations.
 
-@item -mcallgraph-data
-@itemx -mno-callgraph-data
-@opindex mcallgraph-data
-@opindex mno-callgraph-data
-Emit callgraph information.
+If you compile a module @var{Mod} with @option{-mextern-sdata} @option{-G
+@var{num}} @option{-mgpopt}, and @var{Mod} references a variable @var{Var}
+that is no bigger than @var{num} bytes, you must make sure that @var{Var}
+is placed in a small data section.  If @var{Var} is defined by another
+module, you must either compile that module with a high-enough
+@option{-G} setting or attach a @code{section} attribute to @var{Var}'s
+definition.  If @var{Var} is common, you must link the application
+with a high-enough @option{-G} setting.
 
-@item -mslow-bytes
-@itemx -mno-slow-bytes
-@opindex mslow-bytes
-@opindex mno-slow-bytes
-Prefer word access when reading byte quantities.
+The easiest way of satisfying these restrictions is to compile
+and link every module with the same @option{-G} option.  However,
+you may wish to build a library that supports several different
+small data limits.  You can do this by compiling the library with
+the highest supported @option{-G} setting and additionally using
+@option{-mno-extern-sdata} to stop the library from making assumptions
+about externally-defined data.
 
-@item -mlittle-endian
-@itemx -mbig-endian
-@opindex mlittle-endian
-@opindex mbig-endian
-Generate code for a little-endian target.
+@item -mgpopt
+@itemx -mno-gpopt
+@opindex mgpopt
+@opindex mno-gpopt
+Use (do not use) GP-relative accesses for symbols that are known to be
+in a small data section; see @option{-G}, @option{-mlocal-sdata} and
+@option{-mextern-sdata}.  @option{-mgpopt} is the default for all
+configurations.
 
-@item -m210
-@itemx -m340
-@opindex m210
-@opindex m340
-Generate code for the 210 processor.
+@option{-mno-gpopt} is useful for cases where the @code{$gp} register
+might not hold the value of @code{_gp}.  For example, if the code is
+part of a library that might be used in a boot monitor, programs that
+call boot monitor routines pass an unknown value in @code{$gp}.
+(In such situations, the boot monitor itself is usually compiled
+with @option{-G0}.)
 
-@item -mno-lsim
-@opindex mno-lsim
-Assume that runtime support has been provided and so omit the
-simulator library (@file{libsim.a)} from the linker command line.
+@option{-mno-gpopt} implies @option{-mno-local-sdata} and
+@option{-mno-extern-sdata}.
 
-@item -mstack-increment=@var{size}
-@opindex mstack-increment
-Set the maximum amount for a single stack increment operation.  Large
-values can increase the speed of programs that contain functions
-that need a large amount of stack space, but they can also trigger a
-segmentation fault if the stack is extended too much.  The default
-value is 0x1000.
+@item -membedded-data
+@itemx -mno-embedded-data
+@opindex membedded-data
+@opindex mno-embedded-data
+Allocate variables to the read-only data section first if possible, then
+next in the small data section if possible, otherwise in data.  This gives
+slightly slower code than the default, but reduces the amount of RAM required
+when executing, and thus may be preferred for some embedded systems.
 
-@end table
+@item -muninit-const-in-rodata
+@itemx -mno-uninit-const-in-rodata
+@opindex muninit-const-in-rodata
+@opindex mno-uninit-const-in-rodata
+Put uninitialized @code{const} variables in the read-only data section.
+This option is only meaningful in conjunction with @option{-membedded-data}.
 
-@node MeP Options
-@subsection MeP Options
-@cindex MeP options
+@item -mcode-readable=@var{setting}
+@opindex mcode-readable
+Specify whether GCC may generate code that reads from executable sections.
+There are three possible settings:
 
 @table @gcctabopt
+@item -mcode-readable=yes
+Instructions may freely access executable sections.  This is the
+default setting.
 
-@item -mabsdiff
-@opindex mabsdiff
-Enables the @code{abs} instruction, which is the absolute difference
-between two registers.
-
-@item -mall-opts
-@opindex mall-opts
-Enables all the optional instructions---average, multiply, divide, bit
-operations, leading zero, absolute difference, min/max, clip, and
-saturation.
+@item -mcode-readable=pcrel
+MIPS16 PC-relative load instructions can access executable sections,
+but other instructions must not do so.  This option is useful on 4KSc
+and 4KSd processors when the code TLBs have the Read Inhibit bit set.
+It is also useful on processors that can be configured to have a dual
+instruction/data SRAM interface and that, like the M4K, automatically
+redirect PC-relative loads to the instruction RAM.
 
+@item -mcode-readable=no
+Instructions must not access executable sections.  This option can be
+useful on targets that are configured to have a dual instruction/data
+SRAM interface but that (unlike the M4K) do not automatically redirect
+PC-relative loads to the instruction RAM.
+@end table
 
-@item -maverage
-@opindex maverage
-Enables the @code{ave} instruction, which computes the average of two
-registers.
+@item -msplit-addresses
+@itemx -mno-split-addresses
+@opindex msplit-addresses
+@opindex mno-split-addresses
+Enable (disable) use of the @code{%hi()} and @code{%lo()} assembler
+relocation operators.  This option has been superseded by
+@option{-mexplicit-relocs} but is retained for backwards compatibility.
 
-@item -mbased=@var{n}
-@opindex mbased=
-Variables of size @var{n} bytes or smaller are placed in the
-@code{.based} section by default.  Based variables use the @code{$tp}
-register as a base register, and there is a 128-byte limit to the
-@code{.based} section.
+@item -mexplicit-relocs
+@itemx -mno-explicit-relocs
+@opindex mexplicit-relocs
+@opindex mno-explicit-relocs
+Use (do not use) assembler relocation operators when dealing with symbolic
+addresses.  The alternative, selected by @option{-mno-explicit-relocs},
+is to use assembler macros instead.
 
-@item -mbitops
-@opindex mbitops
-Enables the bit operation instructions---bit test (@code{btstm}), set
-(@code{bsetm}), clear (@code{bclrm}), invert (@code{bnotm}), and
-test-and-set (@code{tas}).
+@option{-mexplicit-relocs} is the default if GCC was configured
+to use an assembler that supports relocation operators.
 
-@item -mc=@var{name}
-@opindex mc=
-Selects which section constant data is placed in.  @var{name} may
-be @samp{tiny}, @samp{near}, or @samp{far}.
+@item -mcheck-zero-division
+@itemx -mno-check-zero-division
+@opindex mcheck-zero-division
+@opindex mno-check-zero-division
+Trap (do not trap) on integer division by zero.
 
-@item -mclip
-@opindex mclip
-Enables the @code{clip} instruction.  Note that @option{-mclip} is not
-useful unless you also provide @option{-mminmax}.
+The default is @option{-mcheck-zero-division}.
 
-@item -mconfig=@var{name}
-@opindex mconfig=
-Selects one of the built-in core configurations.  Each MeP chip has
-one or more modules in it; each module has a core CPU and a variety of
-coprocessors, optional instructions, and peripherals.  The
-@code{MeP-Integrator} tool, not part of GCC, provides these
-configurations through this option; using this option is the same as
-using all the corresponding command-line options.  The default
-configuration is @samp{default}.
+@item -mdivide-traps
+@itemx -mdivide-breaks
+@opindex mdivide-traps
+@opindex mdivide-breaks
+MIPS systems check for division by zero by generating either a
+conditional trap or a break instruction.  Using traps results in
+smaller code, but is only supported on MIPS II and later.  Also, some
+versions of the Linux kernel have a bug that prevents a trap from
+generating the proper signal (@code{SIGFPE}).  Use @option{-mdivide-traps} to
+allow conditional traps on architectures that support them and
+@option{-mdivide-breaks} to force the use of breaks.
 
-@item -mcop
-@opindex mcop
-Enables the coprocessor instructions.  By default, this is a 32-bit
-coprocessor.  Note that the coprocessor is normally enabled via the
-@option{-mconfig=} option.
+The default is usually @option{-mdivide-traps}, but this can be
+overridden at configure time using @option{--with-divide=breaks}.
+Divide-by-zero checks can be completely disabled using
+@option{-mno-check-zero-division}.
 
-@item -mcop32
-@opindex mcop32
-Enables the 32-bit coprocessor's instructions.
+@item -mmemcpy
+@itemx -mno-memcpy
+@opindex mmemcpy
+@opindex mno-memcpy
+Force (do not force) the use of @code{memcpy} for non-trivial block
+moves.  The default is @option{-mno-memcpy}, which allows GCC to inline
+most constant-sized copies.
 
-@item -mcop64
-@opindex mcop64
-Enables the 64-bit coprocessor's instructions.
+@item -mlong-calls
+@itemx -mno-long-calls
+@opindex mlong-calls
+@opindex mno-long-calls
+Disable (do not disable) use of the @code{jal} instruction.  Calling
+functions using @code{jal} is more efficient but requires the caller
+and callee to be in the same 256 megabyte segment.
 
-@item -mivc2
-@opindex mivc2
-Enables IVC2 scheduling.  IVC2 is a 64-bit VLIW coprocessor.
+This option has no effect on abicalls code.  The default is
+@option{-mno-long-calls}.
 
-@item -mdc
-@opindex mdc
-Causes constant variables to be placed in the @code{.near} section.
+@item -mmad
+@itemx -mno-mad
+@opindex mmad
+@opindex mno-mad
+Enable (disable) use of the @code{mad}, @code{madu} and @code{mul}
+instructions, as provided by the R4650 ISA@.
 
-@item -mdiv
-@opindex mdiv
-Enables the @code{div} and @code{divu} instructions.
+@item -mimadd
+@itemx -mno-imadd
+@opindex mimadd
+@opindex mno-imadd
+Enable (disable) use of the @code{madd} and @code{msub} integer
+instructions.  The default is @option{-mimadd} on architectures
+that support @code{madd} and @code{msub} except for the 74k
+architecture where it was found to generate slower code.
 
-@item -meb
-@opindex meb
-Generate big-endian code.
+@item -mfused-madd
+@itemx -mno-fused-madd
+@opindex mfused-madd
+@opindex mno-fused-madd
+Enable (disable) use of the floating-point multiply-accumulate
+instructions, when they are available.  The default is
+@option{-mfused-madd}.
 
-@item -mel
-@opindex mel
-Generate little-endian code.
+On the R8000 CPU when multiply-accumulate instructions are used,
+the intermediate product is calculated to infinite precision
+and is not subject to the FCSR Flush to Zero bit.  This may be
+undesirable in some circumstances.  On other processors the result
+is numerically identical to the equivalent computation using
+separate multiply, add, subtract and negate instructions.
 
-@item -mio-volatile
-@opindex mio-volatile
-Tells the compiler that any variable marked with the @code{io}
-attribute is to be considered volatile.
+@item -nocpp
+@opindex nocpp
+Tell the MIPS assembler to not run its preprocessor over user
+assembler files (with a @samp{.s} suffix) when assembling them.
 
-@item -ml
-@opindex ml
-Causes variables to be assigned to the @code{.far} section by default.
+@item -mfix-24k
+@itemx -mno-fix-24k
+@opindex mfix-24k
+@opindex mno-fix-24k
+Work around the 24K E48 (lost data on stores during refill) errata.
+The workarounds are implemented by the assembler rather than by GCC@.
 
-@item -mleadz
-@opindex mleadz
-Enables the @code{leadz} (leading zero) instruction.
+@item -mfix-r4000
+@itemx -mno-fix-r4000
+@opindex mfix-r4000
+@opindex mno-fix-r4000
+Work around certain R4000 CPU errata:
+@itemize @minus
+@item
+A double-word or a variable shift may give an incorrect result if executed
+immediately after starting an integer division.
+@item
+A double-word or a variable shift may give an incorrect result if executed
+while an integer multiplication is in progress.
+@item
+An integer division may give an incorrect result if started in a delay slot
+of a taken branch or a jump.
+@end itemize
 
-@item -mm
-@opindex mm
-Causes variables to be assigned to the @code{.near} section by default.
-
-@item -mminmax
-@opindex mminmax
-Enables the @code{min} and @code{max} instructions.
+@item -mfix-r4400
+@itemx -mno-fix-r4400
+@opindex mfix-r4400
+@opindex mno-fix-r4400
+Work around certain R4400 CPU errata:
+@itemize @minus
+@item
+A double-word or a variable shift may give an incorrect result if executed
+immediately after starting an integer division.
+@end itemize
 
-@item -mmult
-@opindex mmult
-Enables the multiplication and multiply-accumulate instructions.
+@item -mfix-r10000
+@itemx -mno-fix-r10000
+@opindex mfix-r10000
+@opindex mno-fix-r10000
+Work around certain R10000 errata:
+@itemize @minus
+@item
+@code{ll}/@code{sc} sequences may not behave atomically on revisions
+prior to 3.0.  They may deadlock on revisions 2.6 and earlier.
+@end itemize
 
-@item -mno-opts
-@opindex mno-opts
-Disables all the optional instructions enabled by @option{-mall-opts}.
+This option can only be used if the target architecture supports
+branch-likely instructions.  @option{-mfix-r10000} is the default when
+@option{-march=r10000} is used; @option{-mno-fix-r10000} is the default
+otherwise.
 
-@item -mrepeat
-@opindex mrepeat
-Enables the @code{repeat} and @code{erepeat} instructions, used for
-low-overhead looping.
+@item -mfix-rm7000
+@itemx -mno-fix-rm7000
+@opindex mfix-rm7000
+Work around the RM7000 @code{dmult}/@code{dmultu} errata.  The
+workarounds are implemented by the assembler rather than by GCC@.
 
-@item -ms
-@opindex ms
-Causes all variables to default to the @code{.tiny} section.  Note
-that there is a 65536-byte limit to this section.  Accesses to these
-variables use the @code{%gp} base register.
+@item -mfix-vr4120
+@itemx -mno-fix-vr4120
+@opindex mfix-vr4120
+Work around certain VR4120 errata:
+@itemize @minus
+@item
+@code{dmultu} does not always produce the correct result.
+@item
+@code{div} and @code{ddiv} do not always produce the correct result if one
+of the operands is negative.
+@end itemize
+The workarounds for the division errata rely on special functions in
+@file{libgcc.a}.  At present, these functions are only provided by
+the @code{mips64vr*-elf} configurations.
 
-@item -msatur
-@opindex msatur
-Enables the saturation instructions.  Note that the compiler does not
-currently generate these itself, but this option is included for
-compatibility with other tools, like @code{as}.
+Other VR4120 errata require a NOP to be inserted between certain pairs of
+instructions.  These errata are handled by the assembler, not by GCC itself.
 
-@item -msdram
-@opindex msdram
-Link the SDRAM-based runtime instead of the default ROM-based runtime.
+@item -mfix-vr4130
+@opindex mfix-vr4130
+Work around the VR4130 @code{mflo}/@code{mfhi} errata.  The
+workarounds are implemented by the assembler rather than by GCC,
+although GCC avoids using @code{mflo} and @code{mfhi} if the
+VR4130 @code{macc}, @code{macchi}, @code{dmacc} and @code{dmacchi}
+instructions are available instead.
 
-@item -msim
-@opindex msim
-Link the simulator run-time libraries.
+@item -mfix-sb1
+@itemx -mno-fix-sb1
+@opindex mfix-sb1
+Work around certain SB-1 CPU core errata.
+(This flag currently works around the SB-1 revision 2
+``F1'' and ``F2'' floating-point errata.)
 
-@item -msimnovec
-@opindex msimnovec
-Link the simulator runtime libraries, excluding built-in support
-for reset and exception vectors and tables.
+@item -mr10k-cache-barrier=@var{setting}
+@opindex mr10k-cache-barrier
+Specify whether GCC should insert cache barriers to avoid the
+side-effects of speculation on R10K processors.
 
-@item -mtf
-@opindex mtf
-Causes all functions to default to the @code{.far} section.  Without
-this option, functions default to the @code{.near} section.
+In common with many processors, the R10K tries to predict the outcome
+of a conditional branch and speculatively executes instructions from
+the ``taken'' branch.  It later aborts these instructions if the
+predicted outcome is wrong.  However, on the R10K, even aborted
+instructions can have side effects.
 
-@item -mtiny=@var{n}
-@opindex mtiny=
-Variables that are @var{n} bytes or smaller are allocated to the
-@code{.tiny} section.  These variables use the @code{$gp} base
-register.  The default for this option is 4, but note that there's a
-65536-byte limit to the @code{.tiny} section.
+This problem only affects kernel stores and, depending on the system,
+kernel loads.  As an example, a speculatively-executed store may load
+the target memory into cache and mark the cache line as dirty, even if
+the store itself is later aborted.  If a DMA operation writes to the
+same area of memory before the ``dirty'' line is flushed, the cached
+data overwrites the DMA-ed data.  See the R10K processor manual
+for a full description, including other potential problems.
 
-@end table
+One workaround is to insert cache barrier instructions before every memory
+access that might be speculatively executed and that might have side
+effects even if aborted.  @option{-mr10k-cache-barrier=@var{setting}}
+controls GCC's implementation of this workaround.  It assumes that
+aborted accesses to any byte in the following regions do not have
+side effects:
 
-@node MicroBlaze Options
-@subsection MicroBlaze Options
-@cindex MicroBlaze Options
+@enumerate
+@item
+the memory occupied by the current function's stack frame;
 
-@table @gcctabopt
+@item
+the memory occupied by an incoming stack argument;
 
-@item -msoft-float
-@opindex msoft-float
-Use software emulation for floating point (default).
+@item
+the memory occupied by an object with a link-time-constant address.
+@end enumerate
 
-@item -mhard-float
-@opindex mhard-float
-Use hardware floating-point instructions.
+It is the kernel's responsibility to ensure that speculative
+accesses to these regions are indeed safe.
 
-@item -mmemcpy
-@opindex mmemcpy
-Do not optimize block moves, use @code{memcpy}.
+If the input program contains a function declaration such as:
 
-@item -mno-clearbss
-@opindex mno-clearbss
-This option is deprecated.  Use @option{-fno-zero-initialized-in-bss} instead.
+@smallexample
+void foo (void);
+@end smallexample
 
-@item -mcpu=@var{cpu-type}
-@opindex mcpu=
-Use features of, and schedule code for, the given CPU.
-Supported values are in the format @samp{v@var{X}.@var{YY}.@var{Z}},
-where @var{X} is a major version, @var{YY} is the minor version, and
-@var{Z} is compatibility code.  Example values are @samp{v3.00.a},
-@samp{v4.00.b}, @samp{v5.00.a}, @samp{v5.00.b}, @samp{v5.00.b}, @samp{v6.00.a}.
+then the implementation of @code{foo} must allow @code{j foo} and
+@code{jal foo} to be executed speculatively.  GCC honors this
+restriction for functions it compiles itself.  It expects non-GCC
+functions (such as hand-written assembly code) to do the same.
 
-@item -mxl-soft-mul
-@opindex mxl-soft-mul
-Use software multiply emulation (default).
+The option has three forms:
 
-@item -mxl-soft-div
-@opindex mxl-soft-div
-Use software emulation for divides (default).
+@table @gcctabopt
+@item -mr10k-cache-barrier=load-store
+Insert a cache barrier before a load or store that might be
+speculatively executed and that might have side effects even
+if aborted.
 
-@item -mxl-barrel-shift
-@opindex mxl-barrel-shift
-Use the hardware barrel shifter.
+@item -mr10k-cache-barrier=store
+Insert a cache barrier before a store that might be speculatively
+executed and that might have side effects even if aborted.
 
-@item -mxl-pattern-compare
-@opindex mxl-pattern-compare
-Use pattern compare instructions.
+@item -mr10k-cache-barrier=none
+Disable the insertion of cache barriers.  This is the default setting.
+@end table
 
-@item -msmall-divides
-@opindex msmall-divides
-Use table lookup optimization for small signed integer divisions.
+@item -mflush-func=@var{func}
+@itemx -mno-flush-func
+@opindex mflush-func
+Specifies the function to call to flush the I and D caches, or to not
+call any such function.  If called, the function must take the same
+arguments as the common @code{_flush_func}, that is, the address of the
+memory range for which the cache is being flushed, the size of the
+memory range, and the number 3 (to flush both caches).  The default
+depends on the target GCC was configured for, but commonly is either
+@code{_flush_func} or @code{__cpu_flush}.
 
-@item -mxl-stack-check
-@opindex mxl-stack-check
-This option is deprecated.  Use @option{-fstack-check} instead.
+@item -mbranch-cost=@var{num}
+@opindex mbranch-cost
+Set the cost of branches to roughly @var{num} ``simple'' instructions.
+This cost is only a heuristic and is not guaranteed to produce
+consistent results across releases.  A zero cost redundantly selects
+the default, which is based on the @option{-mtune} setting.
 
-@item -mxl-gp-opt
-@opindex mxl-gp-opt
-Use GP-relative @code{.sdata}/@code{.sbss} sections.
+@item -mbranch-likely
+@itemx -mno-branch-likely
+@opindex mbranch-likely
+@opindex mno-branch-likely
+Enable or disable use of Branch Likely instructions, regardless of the
+default for the selected architecture.  By default, Branch Likely
+instructions may be generated if they are supported by the selected
+architecture.  An exception is for the MIPS32 and MIPS64 architectures
+and processors that implement those architectures; for those, Branch
+Likely instructions are not generated by default because the MIPS32
+and MIPS64 architectures specifically deprecate their use.
 
-@item -mxl-multiply-high
-@opindex mxl-multiply-high
-Use multiply high instructions for high part of 32x32 multiply.
+@item -mfp-exceptions
+@itemx -mno-fp-exceptions
+@opindex mfp-exceptions
+Specifies whether FP exceptions are enabled.  This affects how
+FP instructions are scheduled for some processors.
+The default is that FP exceptions are
+enabled.
 
-@item -mxl-float-convert
-@opindex mxl-float-convert
-Use hardware floating-point conversion instructions.
+For instance, on the SB-1, if FP exceptions are disabled, and we are emitting
+64-bit code, then we can use both FP pipes.  Otherwise, we can only use one
+FP pipe.
 
-@item -mxl-float-sqrt
-@opindex mxl-float-sqrt
-Use hardware floating-point square root instruction.
+@item -mvr4130-align
+@itemx -mno-vr4130-align
+@opindex mvr4130-align
+The VR4130 pipeline is two-way superscalar, but can only issue two
+instructions together if the first one is 8-byte aligned.  When this
+option is enabled, GCC aligns pairs of instructions that it
+thinks should execute in parallel.
 
-@item -mbig-endian
-@opindex mbig-endian
-Generate code for a big-endian target.
+This option only has an effect when optimizing for the VR4130.
+It normally makes code faster, but at the expense of making it bigger.
+It is enabled by default at optimization level @option{-O3}.
 
-@item -mlittle-endian
-@opindex mlittle-endian
-Generate code for a little-endian target.
+@item -msynci
+@itemx -mno-synci
+@opindex msynci
+Enable (disable) generation of @code{synci} instructions on
+architectures that support it.  The @code{synci} instructions (if
+enabled) are generated when @code{__builtin___clear_cache} is
+compiled.
 
-@item -mxl-reorder
-@opindex mxl-reorder
-Use reorder instructions (swap and byte reversed load/store).
+This option defaults to @option{-mno-synci}, but the default can be
+overridden by configuring GCC with @option{--with-synci}.
 
-@item -mxl-mode-@var{app-model}
-Select application model @var{app-model}.  Valid models are
-@table @samp
-@item executable
-normal executable (default), uses startup code @file{crt0.o}.
+When compiling code for single processor systems, it is generally safe
+to use @code{synci}.  However, on many multi-core (SMP) systems, it
+does not invalidate the instruction caches on all cores and may lead
+to undefined behavior.
 
-@item xmdstub
-for use with Xilinx Microprocessor Debugger (XMD) based
-software intrusive debug agent called xmdstub. This uses startup file
-@file{crt1.o} and sets the start address of the program to 0x800.
+@item -mrelax-pic-calls
+@itemx -mno-relax-pic-calls
+@opindex mrelax-pic-calls
+Try to turn PIC calls that are normally dispatched via register
+@code{$25} into direct calls.  This is only possible if the linker can
+resolve the destination at link-time and if the destination is within
+range for a direct call.
 
-@item bootstrap
-for applications that are loaded using a bootloader.
-This model uses startup file @file{crt2.o} which does not contain a processor
-reset vector handler. This is suitable for transferring control on a
-processor reset to the bootloader rather than the application.
+@option{-mrelax-pic-calls} is the default if GCC was configured to use
+an assembler and a linker that support the @code{.reloc} assembly
+directive and @option{-mexplicit-relocs} is in effect.  With
+@option{-mno-explicit-relocs}, this optimization can be performed by the
+assembler and the linker alone without help from the compiler.
 
-@item novectors
-for applications that do not require any of the
-MicroBlaze vectors. This option may be useful for applications running
-within a monitoring application. This model uses @file{crt3.o} as a startup file.
-@end table
+@item -mmcount-ra-address
+@itemx -mno-mcount-ra-address
+@opindex mmcount-ra-address
+@opindex mno-mcount-ra-address
+Emit (do not emit) code that allows @code{_mcount} to modify the
+calling function's return address.  When enabled, this option extends
+the usual @code{_mcount} interface with a new @var{ra-address}
+parameter, which has type @code{intptr_t *} and is passed in register
+@code{$12}.  @code{_mcount} can then modify the return address by
+doing both of the following:
+@itemize
+@item
+Returning the new address in register @code{$31}.
+@item
+Storing the new address in @code{*@var{ra-address}},
+if @var{ra-address} is nonnull.
+@end itemize
 
-Option @option{-xl-mode-@var{app-model}} is a deprecated alias for
-@option{-mxl-mode-@var{app-model}}.
+The default is @option{-mno-mcount-ra-address}.
 
 @end table
 
-@node MIPS Options
-@subsection MIPS Options
-@cindex MIPS options
+@node MMIX Options
+@subsection MMIX Options
+@cindex MMIX Options
+
+These options are defined for the MMIX:
 
 @table @gcctabopt
+@item -mlibfuncs
+@itemx -mno-libfuncs
+@opindex mlibfuncs
+@opindex mno-libfuncs
+Specify that intrinsic library functions are being compiled, passing all
+values in registers, no matter the size.
 
-@item -EB
-@opindex EB
-Generate big-endian code.
+@item -mepsilon
+@itemx -mno-epsilon
+@opindex mepsilon
+@opindex mno-epsilon
+Generate floating-point comparison instructions that compare with respect
+to the @code{rE} epsilon register.
 
-@item -EL
-@opindex EL
-Generate little-endian code.  This is the default for @samp{mips*el-*-*}
-configurations.
+@item -mabi=mmixware
+@itemx -mabi=gnu
+@opindex mabi=mmixware
+@opindex mabi=gnu
+Generate code that passes function parameters and return values that (in
+the called function) are seen as registers @code{$0} and up, as opposed to
+the GNU ABI which uses global registers @code{$231} and up.
 
-@item -march=@var{arch}
-@opindex march
-Generate code that runs on @var{arch}, which can be the name of a
-generic MIPS ISA, or the name of a particular processor.
-The ISA names are:
-@samp{mips1}, @samp{mips2}, @samp{mips3}, @samp{mips4},
-@samp{mips32}, @samp{mips32r2}, @samp{mips32r3}, @samp{mips32r5},
-@samp{mips32r6}, @samp{mips64}, @samp{mips64r2}, @samp{mips64r3},
-@samp{mips64r5} and @samp{mips64r6}.
-The processor names are:
-@samp{4kc}, @samp{4km}, @samp{4kp}, @samp{4ksc},
-@samp{4kec}, @samp{4kem}, @samp{4kep}, @samp{4ksd},
-@samp{5kc}, @samp{5kf},
-@samp{20kc},
-@samp{24kc}, @samp{24kf2_1}, @samp{24kf1_1},
-@samp{24kec}, @samp{24kef2_1}, @samp{24kef1_1},
-@samp{34kc}, @samp{34kf2_1}, @samp{34kf1_1}, @samp{34kn},
-@samp{74kc}, @samp{74kf2_1}, @samp{74kf1_1}, @samp{74kf3_2},
-@samp{1004kc}, @samp{1004kf2_1}, @samp{1004kf1_1},
-@samp{loongson2e}, @samp{loongson2f}, @samp{loongson3a},
-@samp{m4k},
-@samp{m14k}, @samp{m14kc}, @samp{m14ke}, @samp{m14kec},
-@samp{octeon}, @samp{octeon+}, @samp{octeon2}, @samp{octeon3},
-@samp{orion},
-@samp{p5600},
-@samp{r2000}, @samp{r3000}, @samp{r3900}, @samp{r4000}, @samp{r4400},
-@samp{r4600}, @samp{r4650}, @samp{r4700}, @samp{r6000}, @samp{r8000},
-@samp{rm7000}, @samp{rm9000},
-@samp{r10000}, @samp{r12000}, @samp{r14000}, @samp{r16000},
-@samp{sb1},
-@samp{sr71000},
-@samp{vr4100}, @samp{vr4111}, @samp{vr4120}, @samp{vr4130}, @samp{vr4300},
-@samp{vr5000}, @samp{vr5400}, @samp{vr5500},
-@samp{xlr} and @samp{xlp}.
-The special value @samp{from-abi} selects the
-most compatible architecture for the selected ABI (that is,
-@samp{mips1} for 32-bit ABIs and @samp{mips3} for 64-bit ABIs)@.
+@item -mzero-extend
+@itemx -mno-zero-extend
+@opindex mzero-extend
+@opindex mno-zero-extend
+When reading data from memory in sizes shorter than 64 bits, use (do not
+use) zero-extending load instructions by default, rather than
+sign-extending ones.
 
-The native Linux/GNU toolchain also supports the value @samp{native},
-which selects the best architecture option for the host processor.
-@option{-march=native} has no effect if GCC does not recognize
-the processor.
+@item -mknuthdiv
+@itemx -mno-knuthdiv
+@opindex mknuthdiv
+@opindex mno-knuthdiv
+Make the result of a division yielding a remainder have the same sign as
+the divisor.  With the default, @option{-mno-knuthdiv}, the sign of the
+remainder follows the sign of the dividend.  Both methods are
+arithmetically valid, the latter being almost exclusively used.
 
-In processor names, a final @samp{000} can be abbreviated as @samp{k}
-(for example, @option{-march=r2k}).  Prefixes are optional, and
-@samp{vr} may be written @samp{r}.
+@item -mtoplevel-symbols
+@itemx -mno-toplevel-symbols
+@opindex mtoplevel-symbols
+@opindex mno-toplevel-symbols
+Prepend (do not prepend) a @samp{:} to all global symbols, so the assembly
+code can be used with the @code{PREFIX} assembly directive.
 
-Names of the form @samp{@var{n}f2_1} refer to processors with
-FPUs clocked at half the rate of the core, names of the form
-@samp{@var{n}f1_1} refer to processors with FPUs clocked at the same
-rate as the core, and names of the form @samp{@var{n}f3_2} refer to
-processors with FPUs clocked a ratio of 3:2 with respect to the core.
-For compatibility reasons, @samp{@var{n}f} is accepted as a synonym
-for @samp{@var{n}f2_1} while @samp{@var{n}x} and @samp{@var{b}fx} are
-accepted as synonyms for @samp{@var{n}f1_1}.
+@item -melf
+@opindex melf
+Generate an executable in the ELF format, rather than the default
+@samp{mmo} format used by the @command{mmix} simulator.
 
-GCC defines two macros based on the value of this option.  The first
-is @code{_MIPS_ARCH}, which gives the name of target architecture, as
-a string.  The second has the form @code{_MIPS_ARCH_@var{foo}},
-where @var{foo} is the capitalized value of @code{_MIPS_ARCH}@.
-For example, @option{-march=r2000} sets @code{_MIPS_ARCH}
-to @code{"r2000"} and defines the macro @code{_MIPS_ARCH_R2000}.
+@item -mbranch-predict
+@itemx -mno-branch-predict
+@opindex mbranch-predict
+@opindex mno-branch-predict
+Use (do not use) the probable-branch instructions, when static branch
+prediction indicates a probable branch.
 
-Note that the @code{_MIPS_ARCH} macro uses the processor names given
-above.  In other words, it has the full prefix and does not
-abbreviate @samp{000} as @samp{k}.  In the case of @samp{from-abi},
-the macro names the resolved architecture (either @code{"mips1"} or
-@code{"mips3"}).  It names the default architecture when no
-@option{-march} option is given.
+@item -mbase-addresses
+@itemx -mno-base-addresses
+@opindex mbase-addresses
+@opindex mno-base-addresses
+Generate (do not generate) code that uses @emph{base addresses}.  Using a
+base address automatically generates a request (handled by the assembler
+and the linker) for a constant to be set up in a global register.  The
+register is used for one or more base address requests within the range 0
+to 255 from the value held in the register.  This generally leads to short
+and fast code, but the number of different data items that can be
+addressed is limited.  This means that a program that uses lots of static
+data may require @option{-mno-base-addresses}.
 
-@item -mtune=@var{arch}
-@opindex mtune
-Optimize for @var{arch}.  Among other things, this option controls
-the way instructions are scheduled, and the perceived cost of arithmetic
-operations.  The list of @var{arch} values is the same as for
-@option{-march}.
+@item -msingle-exit
+@itemx -mno-single-exit
+@opindex msingle-exit
+@opindex mno-single-exit
+Force (do not force) generated code to have a single exit point in each
+function.
+@end table
 
-When this option is not used, GCC optimizes for the processor
-specified by @option{-march}.  By using @option{-march} and
-@option{-mtune} together, it is possible to generate code that
-runs on a family of processors, but optimize the code for one
-particular member of that family.
+@node MN10300 Options
+@subsection MN10300 Options
+@cindex MN10300 options
 
-@option{-mtune} defines the macros @code{_MIPS_TUNE} and
-@code{_MIPS_TUNE_@var{foo}}, which work in the same way as the
-@option{-march} ones described above.
+These @option{-m} options are defined for Matsushita MN10300 architectures:
 
-@item -mips1
-@opindex mips1
-Equivalent to @option{-march=mips1}.
+@table @gcctabopt
+@item -mmult-bug
+@opindex mmult-bug
+Generate code to avoid bugs in the multiply instructions for the MN10300
+processors.  This is the default.
 
-@item -mips2
-@opindex mips2
-Equivalent to @option{-march=mips2}.
+@item -mno-mult-bug
+@opindex mno-mult-bug
+Do not generate code to avoid bugs in the multiply instructions for the
+MN10300 processors.
 
-@item -mips3
-@opindex mips3
-Equivalent to @option{-march=mips3}.
+@item -mam33
+@opindex mam33
+Generate code using features specific to the AM33 processor.
 
-@item -mips4
-@opindex mips4
-Equivalent to @option{-march=mips4}.
+@item -mno-am33
+@opindex mno-am33
+Do not generate code using features specific to the AM33 processor.  This
+is the default.
 
-@item -mips32
-@opindex mips32
-Equivalent to @option{-march=mips32}.
+@item -mam33-2
+@opindex mam33-2
+Generate code using features specific to the AM33/2.0 processor.
 
-@item -mips32r3
-@opindex mips32r3
-Equivalent to @option{-march=mips32r3}.
+@item -mam34
+@opindex mam34
+Generate code using features specific to the AM34 processor.
 
-@item -mips32r5
-@opindex mips32r5
-Equivalent to @option{-march=mips32r5}.
+@item -mtune=@var{cpu-type}
+@opindex mtune
+Use the timing characteristics of the indicated CPU type when
+scheduling instructions.  This does not change the targeted processor
+type.  The CPU type must be one of @samp{mn10300}, @samp{am33},
+@samp{am33-2} or @samp{am34}.
 
-@item -mips32r6
-@opindex mips32r6
-Equivalent to @option{-march=mips32r6}.
+@item -mreturn-pointer-on-d0
+@opindex mreturn-pointer-on-d0
+When generating a function that returns a pointer, return the pointer
+in both @code{a0} and @code{d0}.  Otherwise, the pointer is returned
+only in @code{a0}, and attempts to call such functions without a prototype
+result in errors.  Note that this option is on by default; use
+@option{-mno-return-pointer-on-d0} to disable it.
 
-@item -mips64
-@opindex mips64
-Equivalent to @option{-march=mips64}.
+@item -mno-crt0
+@opindex mno-crt0
+Do not link in the C run-time initialization object file.
 
-@item -mips64r2
-@opindex mips64r2
-Equivalent to @option{-march=mips64r2}.
+@item -mrelax
+@opindex mrelax
+Indicate to the linker that it should perform a relaxation optimization pass
+to shorten branches, calls and absolute memory addresses.  This option only
+has an effect when used on the command line for the final link step.
 
-@item -mips64r3
-@opindex mips64r3
-Equivalent to @option{-march=mips64r3}.
+This option makes symbolic debugging impossible.
 
-@item -mips64r5
-@opindex mips64r5
-Equivalent to @option{-march=mips64r5}.
+@item -mliw
+@opindex mliw
+Allow the compiler to generate @emph{Long Instruction Word}
+instructions if the target is the @samp{AM33} or later.  This is the
+default.  This option defines the preprocessor macro @code{__LIW__}.
 
-@item -mips64r6
-@opindex mips64r6
-Equivalent to @option{-march=mips64r6}.
+@item -mnoliw
+@opindex mnoliw
+Do not allow the compiler to generate @emph{Long Instruction Word}
+instructions.  This option defines the preprocessor macro
+@code{__NO_LIW__}.
 
-@item -mips16
-@itemx -mno-mips16
-@opindex mips16
-@opindex mno-mips16
-Generate (do not generate) MIPS16 code.  If GCC is targeting a
-MIPS32 or MIPS64 architecture, it makes use of the MIPS16e ASE@.
+@item -msetlb
+@opindex msetlb
+Allow the compiler to generate the @emph{SETLB} and @emph{Lcc}
+instructions if the target is the @samp{AM33} or later.  This is the
+default.  This option defines the preprocessor macro @code{__SETLB__}.
 
-MIPS16 code generation can also be controlled on a per-function basis
-by means of @code{mips16} and @code{nomips16} attributes.
-@xref{Function Attributes}, for more information.
+@item -mnosetlb
+@opindex mnosetlb
+Do not allow the compiler to generate @emph{SETLB} or @emph{Lcc}
+instructions.  This option defines the preprocessor macro
+@code{__NO_SETLB__}.
 
-@item -mflip-mips16
-@opindex mflip-mips16
-Generate MIPS16 code on alternating functions.  This option is provided
-for regression testing of mixed MIPS16/non-MIPS16 code generation, and is
-not intended for ordinary use in compiling user code.
+@end table
 
-@item -minterlink-compressed
-@item -mno-interlink-compressed
-@opindex minterlink-compressed
-@opindex mno-interlink-compressed
-Require (do not require) that code using the standard (uncompressed) MIPS ISA
-be link-compatible with MIPS16 and microMIPS code, and vice versa.
+@node Moxie Options
+@subsection Moxie Options
+@cindex Moxie Options
 
-For example, code using the standard ISA encoding cannot jump directly
-to MIPS16 or microMIPS code; it must either use a call or an indirect jump.
-@option{-minterlink-compressed} therefore disables direct jumps unless GCC
-knows that the target of the jump is not compressed.
+@table @gcctabopt
 
-@item -minterlink-mips16
-@itemx -mno-interlink-mips16
-@opindex minterlink-mips16
-@opindex mno-interlink-mips16
-Aliases of @option{-minterlink-compressed} and
-@option{-mno-interlink-compressed}.  These options predate the microMIPS ASE
-and are retained for backwards compatibility.
+@item -meb
+@opindex meb
+Generate big-endian code.  This is the default for @samp{moxie-*-*}
+configurations.
 
-@item -mabi=32
-@itemx -mabi=o64
-@itemx -mabi=n32
-@itemx -mabi=64
-@itemx -mabi=eabi
-@opindex mabi=32
-@opindex mabi=o64
-@opindex mabi=n32
-@opindex mabi=64
-@opindex mabi=eabi
-Generate code for the given ABI@.
+@item -mel
+@opindex mel
+Generate little-endian code.
 
-Note that the EABI has a 32-bit and a 64-bit variant.  GCC normally
-generates 64-bit code when you select a 64-bit architecture, but you
-can use @option{-mgp32} to get 32-bit code instead.
+@item -mmul.x
+@opindex mmul.x
+Generate mul.x and umul.x instructions.  This is the default for
+@samp{moxiebox-*-*} configurations.
 
-For information about the O64 ABI, see
-@uref{http://gcc.gnu.org/@/projects/@/mipso64-abi.html}.
+@item -mno-crt0
+@opindex mno-crt0
+Do not link in the C run-time initialization object file.
 
-GCC supports a variant of the o32 ABI in which floating-point registers
-are 64 rather than 32 bits wide.  You can select this combination with
-@option{-mabi=32} @option{-mfp64}.  This ABI relies on the @code{mthc1}
-and @code{mfhc1} instructions and is therefore only supported for
-MIPS32R2, MIPS32R3 and MIPS32R5 processors.
+@end table
 
-The register assignments for arguments and return values remain the
-same, but each scalar value is passed in a single 64-bit register
-rather than a pair of 32-bit registers.  For example, scalar
-floating-point values are returned in @samp{$f0} only, not a
-@samp{$f0}/@samp{$f1} pair.  The set of call-saved registers also
-remains the same in that the even-numbered double-precision registers
-are saved.
+@node MSP430 Options
+@subsection MSP430 Options
+@cindex MSP430 Options
 
-Two additional variants of the o32 ABI are supported to enable
-a transition from 32-bit to 64-bit registers.  These are FPXX
-(@option{-mfpxx}) and FP64A (@option{-mfp64} @option{-mno-odd-spreg}).
-The FPXX extension mandates that all code must execute correctly
-when run using 32-bit or 64-bit registers.  The code can be interlinked
-with either FP32 or FP64, but not both.
-The FP64A extension is similar to the FP64 extension but forbids the
-use of odd-numbered single-precision registers.  This can be used
-in conjunction with the @code{FRE} mode of FPUs in MIPS32R5
-processors and allows both FP32 and FP64A code to interlink and
-run in the same process without changing FPU modes.
+These options are defined for the MSP430:
 
-@item -mabicalls
-@itemx -mno-abicalls
-@opindex mabicalls
-@opindex mno-abicalls
-Generate (do not generate) code that is suitable for SVR4-style
-dynamic objects.  @option{-mabicalls} is the default for SVR4-based
-systems.
+@table @gcctabopt
 
-@item -mshared
-@itemx -mno-shared
-Generate (do not generate) code that is fully position-independent,
-and that can therefore be linked into shared libraries.  This option
-only affects @option{-mabicalls}.
+@item -masm-hex
+@opindex masm-hex
+Force assembly output to always use hex constants.  Normally such
+constants are signed decimals, but this option is available for
+testsuite and/or aesthetic purposes.
 
-All @option{-mabicalls} code has traditionally been position-independent,
-regardless of options like @option{-fPIC} and @option{-fpic}.  However,
-as an extension, the GNU toolchain allows executables to use absolute
-accesses for locally-binding symbols.  It can also use shorter GP
-initialization sequences and generate direct calls to locally-defined
-functions.  This mode is selected by @option{-mno-shared}.
+@item -mmcu=
+@opindex mmcu=
+Select the MCU to target.  This is used to create a C preprocessor
+symbol based upon the MCU name, converted to upper case and pre- and
+post-fixed with @samp{__}.  This in turn is used by the
+@file{msp430.h} header file to select an MCU-specific supplementary
+header file.
 
-@option{-mno-shared} depends on binutils 2.16 or higher and generates
-objects that can only be linked by the GNU linker.  However, the option
-does not affect the ABI of the final executable; it only affects the ABI
-of relocatable objects.  Using @option{-mno-shared} generally makes
-executables both smaller and quicker.
+The option also sets the ISA to use.  If the MCU name is one that is
+known to only support the 430 ISA then that is selected; otherwise the
+430X ISA is selected.  A generic MCU name of @samp{msp430} can also be
+used to select the 430 ISA.  Similarly the generic @samp{msp430x} MCU
+name selects the 430X ISA.
 
-@option{-mshared} is the default.
+In addition an MCU-specific linker script is added to the linker
+command line.  The script's name is the name of the MCU with
+@file{.ld} appended.  Thus specifying @option{-mmcu=xxx} on the @command{gcc}
+command line defines the C preprocessor symbol @code{__XXX__} and
+causes the linker to search for a script called @file{xxx.ld}.
 
-@item -mplt
-@itemx -mno-plt
-@opindex mplt
-@opindex mno-plt
-Assume (do not assume) that the static and dynamic linkers
-support PLTs and copy relocations.  This option only affects
-@option{-mno-shared -mabicalls}.  For the n64 ABI, this option
-has no effect without @option{-msym32}.
+This option is also passed on to the assembler.
 
-You can make @option{-mplt} the default by configuring
-GCC with @option{--with-mips-plt}.  The default is
-@option{-mno-plt} otherwise.
+@item -mcpu=
+@opindex mcpu=
+Specifies the ISA to use.  Accepted values are @samp{msp430},
+@samp{msp430x} and @samp{msp430xv2}.  This option is deprecated.  The
+@option{-mmcu=} option should be used to select the ISA.
 
-@item -mxgot
-@itemx -mno-xgot
-@opindex mxgot
-@opindex mno-xgot
-Lift (do not lift) the usual restrictions on the size of the global
-offset table.
+@item -msim
+@opindex msim
+Link to the simulator runtime libraries and linker script.  Overrides
+any scripts that would be selected by the @option{-mmcu=} option.
 
-GCC normally uses a single instruction to load values from the GOT@.
-While this is relatively efficient, it only works if the GOT
-is smaller than about 64k.  Anything larger causes the linker
-to report an error such as:
+@item -mlarge
+@opindex mlarge
+Use large-model addressing (20-bit pointers, 32-bit @code{size_t}).
 
-@cindex relocation truncated to fit (MIPS)
-@smallexample
-relocation truncated to fit: R_MIPS_GOT16 foobar
-@end smallexample
+@item -msmall
+@opindex msmall
+Use small-model addressing (16-bit pointers, 16-bit @code{size_t}).
 
-If this happens, you should recompile your code with @option{-mxgot}.
-This works with very large GOTs, although the code is also
-less efficient, since it takes three instructions to fetch the
-value of a global symbol.
+@item -mrelax
+@opindex mrelax
+This option is passed to the assembler and linker, and allows the
+linker to perform certain optimizations that cannot be done until
+the final link.
 
-Note that some linkers can create multiple GOTs.  If you have such a
-linker, you should only need to use @option{-mxgot} when a single object
-file accesses more than 64k's worth of GOT entries.  Very few do.
+@item -mhwmult=
+@opindex mhwmult=
+Describes the type of hardware multiply supported by the target.
+Accepted values are @samp{none} for no hardware multiply, @samp{16bit}
+for the original 16-bit-only multiply supported by early MCUs,
+@samp{32bit} for the 16/32-bit multiply supported by later MCUs and
+@samp{f5series} for the 16/32-bit multiply supported by F5-series MCUs.
+A value of @samp{auto} can also be given.  This tells GCC to deduce
+the hardware multiply support based upon the MCU name provided by the
+@option{-mmcu} option.  If no @option{-mmcu} option is specified then
+@samp{32bit} hardware multiply support is assumed.  @samp{auto} is the
+default setting.
 
-These options have no effect unless GCC is generating position
-independent code.
+Hardware multiplies are normally performed by calling a library
+routine.  This saves space in the generated code.  When compiling at
+@option{-O3} or higher, however, the hardware multiplier is invoked
+inline.  This makes for bigger, but faster code.
 
-@item -mgp32
-@opindex mgp32
-Assume that general-purpose registers are 32 bits wide.
+The hardware multiply routines disable interrupts whilst running and
+restore the previous interrupt state when they finish.  This makes
+them safe to use inside interrupt handlers as well as in normal code.
 
-@item -mgp64
-@opindex mgp64
-Assume that general-purpose registers are 64 bits wide.
+@item -minrt
+@opindex minrt
+Enable the use of a minimum runtime environment, with no static
+initializers or constructors.  This is intended for memory-constrained
+devices.  The compiler includes special symbols in some objects
+that tell the linker and runtime which code fragments are required.
 
-@item -mfp32
-@opindex mfp32
-Assume that floating-point registers are 32 bits wide.
+@end table
 
-@item -mfp64
-@opindex mfp64
-Assume that floating-point registers are 64 bits wide.
+@node NDS32 Options
+@subsection NDS32 Options
+@cindex NDS32 Options
 
-@item -mfpxx
-@opindex mfpxx
-Do not assume the width of floating-point registers.
+These options are defined for NDS32 implementations:
 
-@item -mhard-float
-@opindex mhard-float
-Use floating-point coprocessor instructions.
+@table @gcctabopt
 
-@item -msoft-float
-@opindex msoft-float
-Do not use floating-point coprocessor instructions.  Implement
-floating-point calculations using library calls instead.
+@item -mbig-endian
+@opindex mbig-endian
+Generate code in big-endian mode.
 
-@item -mno-float
-@opindex mno-float
-Equivalent to @option{-msoft-float}, but additionally asserts that the
-program being compiled does not perform any floating-point operations.
-This option is presently supported only by some bare-metal MIPS
-configurations, where it may select a special set of libraries
-that lack all floating-point support (including, for example, the
-floating-point @code{printf} formats).  
-If code compiled with @option{-mno-float} accidentally contains
-floating-point operations, it is likely to suffer a link-time
-or run-time failure.
+@item -mlittle-endian
+@opindex mlittle-endian
+Generate code in little-endian mode.
 
-@item -msingle-float
-@opindex msingle-float
-Assume that the floating-point coprocessor only supports single-precision
-operations.
+@item -mreduced-regs
+@opindex mreduced-regs
+Use reduced-set registers for register allocation.
 
-@item -mdouble-float
-@opindex mdouble-float
-Assume that the floating-point coprocessor supports double-precision
-operations.  This is the default.
+@item -mfull-regs
+@opindex mfull-regs
+Use full-set registers for register allocation.
 
-@item -modd-spreg
-@itemx -mno-odd-spreg
-@opindex modd-spreg
-@opindex mno-odd-spreg
-Enable the use of odd-numbered single-precision floating-point registers
-for the o32 ABI.  This is the default for processors that are known to
-support these registers.  When using the o32 FPXX ABI, @option{-mno-odd-spreg}
-is set by default.
+@item -mcmov
+@opindex mcmov
+Generate conditional move instructions.
 
-@item -mabs=2008
-@itemx -mabs=legacy
-@opindex mabs=2008
-@opindex mabs=legacy
-These options control the treatment of the special not-a-number (NaN)
-IEEE 754 floating-point data with the @code{abs.@i{fmt}} and
-@code{neg.@i{fmt}} machine instructions.
+@item -mno-cmov
+@opindex mno-cmov
+Do not generate conditional move instructions.
 
-By default or when the @option{-mabs=legacy} is used the legacy
-treatment is selected.  In this case these instructions are considered
-arithmetic and avoided where correct operation is required and the
-input operand might be a NaN.  A longer sequence of instructions that
-manipulate the sign bit of floating-point datum manually is used
-instead unless the @option{-ffinite-math-only} option has also been
-specified.
+@item -mperf-ext
+@opindex mperf-ext
+Generate performance extension instructions.
 
-The @option{-mabs=2008} option selects the IEEE 754-2008 treatment.  In
-this case these instructions are considered non-arithmetic and therefore
-operating correctly in all cases, including in particular where the
-input operand is a NaN.  These instructions are therefore always used
-for the respective operations.
+@item -mno-perf-ext
+@opindex mno-perf-ext
+Do not generate performance extension instructions.
 
-@item -mnan=2008
-@itemx -mnan=legacy
-@opindex mnan=2008
-@opindex mnan=legacy
-These options control the encoding of the special not-a-number (NaN)
-IEEE 754 floating-point data.
+@item -mv3push
+@opindex mv3push
+Generate v3 push25/pop25 instructions.
 
-The @option{-mnan=legacy} option selects the legacy encoding.  In this
-case quiet NaNs (qNaNs) are denoted by the first bit of their trailing
-significand field being 0, whereas signalling NaNs (sNaNs) are denoted
-by the first bit of their trailing significand field being 1.
+@item -mno-v3push
+@opindex mno-v3push
+Do not generate v3 push25/pop25 instructions.
 
-The @option{-mnan=2008} option selects the IEEE 754-2008 encoding.  In
-this case qNaNs are denoted by the first bit of their trailing
-significand field being 1, whereas sNaNs are denoted by the first bit of
-their trailing significand field being 0.
+@item -m16-bit
+@opindex m16-bit
+Generate 16-bit instructions.
 
-The default is @option{-mnan=legacy} unless GCC has been configured with
-@option{--with-nan=2008}.
+@item -mno-16-bit
+@opindex mno-16-bit
+Do not generate 16-bit instructions.
 
-@item -mllsc
-@itemx -mno-llsc
-@opindex mllsc
-@opindex mno-llsc
-Use (do not use) @samp{ll}, @samp{sc}, and @samp{sync} instructions to
-implement atomic memory built-in functions.  When neither option is
-specified, GCC uses the instructions if the target architecture
-supports them.
+@item -misr-vector-size=@var{num}
+@opindex misr-vector-size
+Specify the size of each interrupt vector, which must be 4 or 16.
 
-@option{-mllsc} is useful if the runtime environment can emulate the
-instructions and @option{-mno-llsc} can be useful when compiling for
-nonstandard ISAs.  You can make either option the default by
-configuring GCC with @option{--with-llsc} and @option{--without-llsc}
-respectively.  @option{--with-llsc} is the default for some
-configurations; see the installation documentation for details.
+@item -mcache-block-size=@var{num}
+@opindex mcache-block-size
+Specify the size of each cache block,
+which must be a power of 2 between 4 and 512.
 
-@item -mdsp
-@itemx -mno-dsp
-@opindex mdsp
-@opindex mno-dsp
-Use (do not use) revision 1 of the MIPS DSP ASE@.
-@xref{MIPS DSP Built-in Functions}.  This option defines the
-preprocessor macro @code{__mips_dsp}.  It also defines
-@code{__mips_dsp_rev} to 1.
+@item -march=@var{arch}
+@opindex march
+Specify the name of the target architecture.
 
-@item -mdspr2
-@itemx -mno-dspr2
-@opindex mdspr2
-@opindex mno-dspr2
-Use (do not use) revision 2 of the MIPS DSP ASE@.
-@xref{MIPS DSP Built-in Functions}.  This option defines the
-preprocessor macros @code{__mips_dsp} and @code{__mips_dspr2}.
-It also defines @code{__mips_dsp_rev} to 2.
+@item -mcmodel=@var{code-model}
+@opindex mcmodel
+Set the code model to one of
+@table @asis
+@item @samp{small}
+All the data and read-only data segments must be within 512KB addressing space.
+The text segment must be within 16MB addressing space.
+@item @samp{medium}
+The data segment must be within 512KB while the read-only data segment can be
+within 4GB addressing space.  The text segment should still be within 16MB
+addressing space.
+@item @samp{large}
+All the text and data segments can be within 4GB addressing space.
+@end table
 
-@item -msmartmips
-@itemx -mno-smartmips
-@opindex msmartmips
-@opindex mno-smartmips
-Use (do not use) the MIPS SmartMIPS ASE.
+@item -mctor-dtor
+@opindex mctor-dtor
+Enable constructor/destructor feature.
 
-@item -mpaired-single
-@itemx -mno-paired-single
-@opindex mpaired-single
-@opindex mno-paired-single
-Use (do not use) paired-single floating-point instructions.
-@xref{MIPS Paired-Single Support}.  This option requires
-hardware floating-point support to be enabled.
+@item -mrelax
+@opindex mrelax
+Guide the linker to relax instructions.
 
-@item -mdmx
-@itemx -mno-mdmx
-@opindex mdmx
-@opindex mno-mdmx
-Use (do not use) MIPS Digital Media Extension instructions.
-This option can only be used when generating 64-bit code and requires
-hardware floating-point support to be enabled.
+@end table
 
-@item -mips3d
-@itemx -mno-mips3d
-@opindex mips3d
-@opindex mno-mips3d
-Use (do not use) the MIPS-3D ASE@.  @xref{MIPS-3D Built-in Functions}.
-The option @option{-mips3d} implies @option{-mpaired-single}.
+@node Nios II Options
+@subsection Nios II Options
+@cindex Nios II options
+@cindex Altera Nios II options
 
-@item -mmicromips
-@itemx -mno-micromips
-@opindex mmicromips
-@opindex mno-mmicromips
-Generate (do not generate) microMIPS code.
+These are the options defined for the Altera Nios II processor.
 
-MicroMIPS code generation can also be controlled on a per-function basis
-by means of @code{micromips} and @code{nomicromips} attributes.
-@xref{Function Attributes}, for more information.
+@table @gcctabopt
 
-@item -mmt
-@itemx -mno-mt
-@opindex mmt
-@opindex mno-mt
-Use (do not use) MT Multithreading instructions.
+@item -G @var{num}
+@opindex G
+@cindex smaller data references
+Put global and static objects less than or equal to @var{num} bytes
+into the small data or BSS sections instead of the normal data or BSS
+sections.  The default value of @var{num} is 8.
 
-@item -mmcu
-@itemx -mno-mcu
-@opindex mmcu
-@opindex mno-mcu
-Use (do not use) the MIPS MCU ASE instructions.
+@item -mgpopt=@var{option}
+@itemx -mgpopt
+@itemx -mno-gpopt
+@opindex mgpopt
+@opindex mno-gpopt
+Generate (do not generate) GP-relative accesses.  The following 
+@var{option} names are recognized:
 
-@item -meva
-@itemx -mno-eva
-@opindex meva
-@opindex mno-eva
-Use (do not use) the MIPS Enhanced Virtual Addressing instructions.
+@table @samp
 
-@item -mvirt
-@itemx -mno-virt
-@opindex mvirt
-@opindex mno-virt
-Use (do not use) the MIPS Virtualization Application Specific instructions.
+@item none
+Do not generate GP-relative accesses.
 
-@item -mxpa
-@itemx -mno-xpa
-@opindex mxpa
-@opindex mno-xpa
-Use (do not use) the MIPS eXtended Physical Address (XPA) instructions.
+@item local
+Generate GP-relative accesses for small data objects that are not 
+external or weak.  Also use GP-relative addressing for objects that
+have been explicitly placed in a small data section via a @code{section}
+attribute.
 
-@item -mlong64
-@opindex mlong64
-Force @code{long} types to be 64 bits wide.  See @option{-mlong32} for
-an explanation of the default and the way that the pointer size is
-determined.
+@item global
+As for @samp{local}, but also generate GP-relative accesses for
+small data objects that are external or weak.  If you use this option,
+you must ensure that all parts of your program (including libraries) are
+compiled with the same @option{-G} setting.
 
-@item -mlong32
-@opindex mlong32
-Force @code{long}, @code{int}, and pointer types to be 32 bits wide.
+@item data
+Generate GP-relative accesses for all data objects in the program.  If you
+use this option, the entire data and BSS segments
+of your program must fit in 64K of memory and you must use an appropriate
+linker script to allocate them within the addressable range of the
+global pointer.
 
-The default size of @code{int}s, @code{long}s and pointers depends on
-the ABI@.  All the supported ABIs use 32-bit @code{int}s.  The n64 ABI
-uses 64-bit @code{long}s, as does the 64-bit EABI; the others use
-32-bit @code{long}s.  Pointers are the same size as @code{long}s,
-or the same size as integer registers, whichever is smaller.
+@item all
+Generate GP-relative addresses for function pointers as well as data
+pointers.  If you use this option, the entire text, data, and BSS segments
+of your program must fit in 64K of memory and you must use an appropriate
+linker script to allocate them within the addressable range of the
+global pointer.
 
-@item -msym32
-@itemx -mno-sym32
-@opindex msym32
-@opindex mno-sym32
-Assume (do not assume) that all symbols have 32-bit values, regardless
-of the selected ABI@.  This option is useful in combination with
-@option{-mabi=64} and @option{-mno-abicalls} because it allows GCC
-to generate shorter and faster references to symbolic addresses.
+@end table
 
-@item -G @var{num}
-@opindex G
-Put definitions of externally-visible data in a small data section
-if that data is no bigger than @var{num} bytes.  GCC can then generate
-more efficient accesses to the data; see @option{-mgpopt} for details.
+@option{-mgpopt} is equivalent to @option{-mgpopt=local}, and
+@option{-mno-gpopt} is equivalent to @option{-mgpopt=none}.
 
-The default @option{-G} option depends on the configuration.
+The default is @option{-mgpopt} except when @option{-fpic} or
+@option{-fPIC} is specified to generate position-independent code.
+Note that the Nios II ABI does not permit GP-relative accesses from
+shared libraries.
 
-@item -mlocal-sdata
-@itemx -mno-local-sdata
-@opindex mlocal-sdata
-@opindex mno-local-sdata
-Extend (do not extend) the @option{-G} behavior to local data too,
-such as to static variables in C@.  @option{-mlocal-sdata} is the
-default for all configurations.
+You may need to specify @option{-mno-gpopt} explicitly when building
+programs that include large amounts of small data, including large
+GOT data sections.  In this case, the 16-bit offset for GP-relative
+addressing may not be large enough to allow access to the entire 
+small data section.
 
-If the linker complains that an application is using too much small data,
-you might want to try rebuilding the less performance-critical parts with
-@option{-mno-local-sdata}.  You might also want to build large
-libraries with @option{-mno-local-sdata}, so that the libraries leave
-more room for the main program.
+@item -mel
+@itemx -meb
+@opindex mel
+@opindex meb
+Generate little-endian (default) or big-endian (experimental) code,
+respectively.
 
-@item -mextern-sdata
-@itemx -mno-extern-sdata
-@opindex mextern-sdata
-@opindex mno-extern-sdata
-Assume (do not assume) that externally-defined data is in
-a small data section if the size of that data is within the @option{-G} limit.
-@option{-mextern-sdata} is the default for all configurations.
+@item -mbypass-cache
+@itemx -mno-bypass-cache
+@opindex mno-bypass-cache
+@opindex mbypass-cache
+Force all load and store instructions to always bypass cache by
+using I/O variants of the instructions.  The default is not to
+bypass the cache.
 
-If you compile a module @var{Mod} with @option{-mextern-sdata} @option{-G
-@var{num}} @option{-mgpopt}, and @var{Mod} references a variable @var{Var}
-that is no bigger than @var{num} bytes, you must make sure that @var{Var}
-is placed in a small data section.  If @var{Var} is defined by another
-module, you must either compile that module with a high-enough
-@option{-G} setting or attach a @code{section} attribute to @var{Var}'s
-definition.  If @var{Var} is common, you must link the application
-with a high-enough @option{-G} setting.
+@item -mno-cache-volatile
+@itemx -mcache-volatile
+@opindex mcache-volatile
+@opindex mno-cache-volatile
+Volatile memory accesses bypass the cache using the I/O variants of
+the load and store instructions.  The default is not to bypass the cache.
 
-The easiest way of satisfying these restrictions is to compile
-and link every module with the same @option{-G} option.  However,
-you may wish to build a library that supports several different
-small data limits.  You can do this by compiling the library with
-the highest supported @option{-G} setting and additionally using
-@option{-mno-extern-sdata} to stop the library from making assumptions
-about externally-defined data.
+@item -mno-fast-sw-div
+@itemx -mfast-sw-div
+@opindex mno-fast-sw-div
+@opindex mfast-sw-div
+Do not use table-based fast divide for small numbers.  The default
+is to use the fast divide at @option{-O3} and above.
 
-@item -mgpopt
-@itemx -mno-gpopt
-@opindex mgpopt
-@opindex mno-gpopt
-Use (do not use) GP-relative accesses for symbols that are known to be
-in a small data section; see @option{-G}, @option{-mlocal-sdata} and
-@option{-mextern-sdata}.  @option{-mgpopt} is the default for all
-configurations.
+@item -mno-hw-mul
+@itemx -mhw-mul
+@itemx -mno-hw-mulx
+@itemx -mhw-mulx
+@itemx -mno-hw-div
+@itemx -mhw-div
+@opindex mno-hw-mul
+@opindex mhw-mul
+@opindex mno-hw-mulx
+@opindex mhw-mulx
+@opindex mno-hw-div
+@opindex mhw-div
+Enable or disable emitting the @code{mul}, @code{mulx} and @code{div}
+families of instructions.  The default is to emit @code{mul}
+and not emit @code{div} and @code{mulx}.
 
-@option{-mno-gpopt} is useful for cases where the @code{$gp} register
-might not hold the value of @code{_gp}.  For example, if the code is
-part of a library that might be used in a boot monitor, programs that
-call boot monitor routines pass an unknown value in @code{$gp}.
-(In such situations, the boot monitor itself is usually compiled
-with @option{-G0}.)
+@item -mcustom-@var{insn}=@var{N}
+@itemx -mno-custom-@var{insn}
+@opindex mcustom-@var{insn}
+@opindex mno-custom-@var{insn}
+Each @option{-mcustom-@var{insn}=@var{N}} option enables use of a
+custom instruction with encoding @var{N} when generating code that uses 
+@var{insn}.  For example, @option{-mcustom-fadds=253} generates custom
+instruction 253 for single-precision floating-point add operations instead
+of the default behavior of using a library call.
 
-@option{-mno-gpopt} implies @option{-mno-local-sdata} and
-@option{-mno-extern-sdata}.
+The following values of @var{insn} are supported.  Except as otherwise
+noted, floating-point operations are expected to be implemented with
+normal IEEE 754 semantics and correspond directly to the C operators or the
+equivalent GCC built-in functions (@pxref{Other Builtins}).
 
-@item -membedded-data
-@itemx -mno-embedded-data
-@opindex membedded-data
-@opindex mno-embedded-data
-Allocate variables to the read-only data section first if possible, then
-next in the small data section if possible, otherwise in data.  This gives
-slightly slower code than the default, but reduces the amount of RAM required
-when executing, and thus may be preferred for some embedded systems.
+Single-precision floating point:
+@table @asis
 
-@item -muninit-const-in-rodata
-@itemx -mno-uninit-const-in-rodata
-@opindex muninit-const-in-rodata
-@opindex mno-uninit-const-in-rodata
-Put uninitialized @code{const} variables in the read-only data section.
-This option is only meaningful in conjunction with @option{-membedded-data}.
+@item @samp{fadds}, @samp{fsubs}, @samp{fdivs}, @samp{fmuls}
+Binary arithmetic operations.
 
-@item -mcode-readable=@var{setting}
-@opindex mcode-readable
-Specify whether GCC may generate code that reads from executable sections.
-There are three possible settings:
+@item @samp{fnegs}
+Unary negation.
 
-@table @gcctabopt
-@item -mcode-readable=yes
-Instructions may freely access executable sections.  This is the
-default setting.
+@item @samp{fabss}
+Unary absolute value.
 
-@item -mcode-readable=pcrel
-MIPS16 PC-relative load instructions can access executable sections,
-but other instructions must not do so.  This option is useful on 4KSc
-and 4KSd processors when the code TLBs have the Read Inhibit bit set.
-It is also useful on processors that can be configured to have a dual
-instruction/data SRAM interface and that, like the M4K, automatically
-redirect PC-relative loads to the instruction RAM.
+@item @samp{fcmpeqs}, @samp{fcmpges}, @samp{fcmpgts}, @samp{fcmples}, @samp{fcmplts}, @samp{fcmpnes}
+Comparison operations.
 
-@item -mcode-readable=no
-Instructions must not access executable sections.  This option can be
-useful on targets that are configured to have a dual instruction/data
-SRAM interface but that (unlike the M4K) do not automatically redirect
-PC-relative loads to the instruction RAM.
-@end table
+@item @samp{fmins}, @samp{fmaxs}
+Floating-point minimum and maximum.  These instructions are only
+generated if @option{-ffinite-math-only} is specified.
 
-@item -msplit-addresses
-@itemx -mno-split-addresses
-@opindex msplit-addresses
-@opindex mno-split-addresses
-Enable (disable) use of the @code{%hi()} and @code{%lo()} assembler
-relocation operators.  This option has been superseded by
-@option{-mexplicit-relocs} but is retained for backwards compatibility.
+@item @samp{fsqrts}
+Unary square root operation.
 
-@item -mexplicit-relocs
-@itemx -mno-explicit-relocs
-@opindex mexplicit-relocs
-@opindex mno-explicit-relocs
-Use (do not use) assembler relocation operators when dealing with symbolic
-addresses.  The alternative, selected by @option{-mno-explicit-relocs},
-is to use assembler macros instead.
+@item @samp{fcoss}, @samp{fsins}, @samp{ftans}, @samp{fatans}, @samp{fexps}, @samp{flogs}
+Floating-point trigonometric and exponential functions.  These instructions
+are only generated if @option{-funsafe-math-optimizations} is also specified.
 
-@option{-mexplicit-relocs} is the default if GCC was configured
-to use an assembler that supports relocation operators.
+@end table
 
-@item -mcheck-zero-division
-@itemx -mno-check-zero-division
-@opindex mcheck-zero-division
-@opindex mno-check-zero-division
-Trap (do not trap) on integer division by zero.
+Double-precision floating point:
+@table @asis
 
-The default is @option{-mcheck-zero-division}.
+@item @samp{faddd}, @samp{fsubd}, @samp{fdivd}, @samp{fmuld}
+Binary arithmetic operations.
 
-@item -mdivide-traps
-@itemx -mdivide-breaks
-@opindex mdivide-traps
-@opindex mdivide-breaks
-MIPS systems check for division by zero by generating either a
-conditional trap or a break instruction.  Using traps results in
-smaller code, but is only supported on MIPS II and later.  Also, some
-versions of the Linux kernel have a bug that prevents trap from
-generating the proper signal (@code{SIGFPE}).  Use @option{-mdivide-traps} to
-allow conditional traps on architectures that support them and
-@option{-mdivide-breaks} to force the use of breaks.
+@item @samp{fnegd}
+Unary negation.
 
-The default is usually @option{-mdivide-traps}, but this can be
-overridden at configure time using @option{--with-divide=breaks}.
-Divide-by-zero checks can be completely disabled using
-@option{-mno-check-zero-division}.
+@item @samp{fabsd}
+Unary absolute value.
 
-@item -mmemcpy
-@itemx -mno-memcpy
-@opindex mmemcpy
-@opindex mno-memcpy
-Force (do not force) the use of @code{memcpy} for non-trivial block
-moves.  The default is @option{-mno-memcpy}, which allows GCC to inline
-most constant-sized copies.
+@item @samp{fcmpeqd}, @samp{fcmpged}, @samp{fcmpgtd}, @samp{fcmpled}, @samp{fcmpltd}, @samp{fcmpned}
+Comparison operations.
 
-@item -mlong-calls
-@itemx -mno-long-calls
-@opindex mlong-calls
-@opindex mno-long-calls
-Disable (do not disable) use of the @code{jal} instruction.  Calling
-functions using @code{jal} is more efficient but requires the caller
-and callee to be in the same 256 megabyte segment.
+@item @samp{fmind}, @samp{fmaxd}
+Double-precision minimum and maximum.  These instructions are only
+generated if @option{-ffinite-math-only} is specified.
 
-This option has no effect on abicalls code.  The default is
-@option{-mno-long-calls}.
+@item @samp{fsqrtd}
+Unary square root operation.
 
-@item -mmad
-@itemx -mno-mad
-@opindex mmad
-@opindex mno-mad
-Enable (disable) use of the @code{mad}, @code{madu} and @code{mul}
-instructions, as provided by the R4650 ISA@.
+@item @samp{fcosd}, @samp{fsind}, @samp{ftand}, @samp{fatand}, @samp{fexpd}, @samp{flogd}
+Double-precision trigonometric and exponential functions.  These instructions
+are only generated if @option{-funsafe-math-optimizations} is also specified.
 
-@item -mimadd
-@itemx -mno-imadd
-@opindex mimadd
-@opindex mno-imadd
-Enable (disable) use of the @code{madd} and @code{msub} integer
-instructions.  The default is @option{-mimadd} on architectures
-that support @code{madd} and @code{msub} except for the 74k 
-architecture where it was found to generate slower code.
+@end table
 
-@item -mfused-madd
-@itemx -mno-fused-madd
-@opindex mfused-madd
-@opindex mno-fused-madd
-Enable (disable) use of the floating-point multiply-accumulate
-instructions, when they are available.  The default is
-@option{-mfused-madd}.
+Conversions:
+@table @asis
+@item @samp{fextsd}
+Conversion from single precision to double precision.
 
-On the R8000 CPU when multiply-accumulate instructions are used,
-the intermediate product is calculated to infinite precision
-and is not subject to the FCSR Flush to Zero bit.  This may be
-undesirable in some circumstances.  On other processors the result
-is numerically identical to the equivalent computation using
-separate multiply, add, subtract and negate instructions.
+@item @samp{ftruncds}
+Conversion from double precision to single precision.
 
-@item -nocpp
-@opindex nocpp
-Tell the MIPS assembler to not run its preprocessor over user
-assembler files (with a @samp{.s} suffix) when assembling them.
+@item @samp{fixsi}, @samp{fixsu}, @samp{fixdi}, @samp{fixdu}
+Conversion from floating point to signed or unsigned integer types, with
+truncation towards zero.
 
-@item -mfix-24k
-@item -mno-fix-24k
-@opindex mfix-24k
-@opindex mno-fix-24k
-Work around the 24K E48 (lost data on stores during refill) errata.
-The workarounds are implemented by the assembler rather than by GCC@.
+@item @samp{round}
+Conversion from single-precision floating point to signed integer,
+rounding to the nearest integer, with ties rounded away from zero.
+This corresponds to the @code{__builtin_lroundf} function when
+@option{-fno-math-errno} is used.
 
-@item -mfix-r4000
-@itemx -mno-fix-r4000
-@opindex mfix-r4000
-@opindex mno-fix-r4000
-Work around certain R4000 CPU errata:
-@itemize @minus
-@item
-A double-word or a variable shift may give an incorrect result if executed
-immediately after starting an integer division.
-@item
-A double-word or a variable shift may give an incorrect result if executed
-while an integer multiplication is in progress.
-@item
-An integer division may give an incorrect result if started in a delay slot
-of a taken branch or a jump.
-@end itemize
+@item @samp{floatis}, @samp{floatus}, @samp{floatid}, @samp{floatud}
+Conversion from signed or unsigned integer types to floating-point types.
 
-@item -mfix-r4400
-@itemx -mno-fix-r4400
-@opindex mfix-r4400
-@opindex mno-fix-r4400
-Work around certain R4400 CPU errata:
-@itemize @minus
-@item
-A double-word or a variable shift may give an incorrect result if executed
-immediately after starting an integer division.
-@end itemize
+@end table
 
-@item -mfix-r10000
-@itemx -mno-fix-r10000
-@opindex mfix-r10000
-@opindex mno-fix-r10000
-Work around certain R10000 errata:
-@itemize @minus
-@item
-@code{ll}/@code{sc} sequences may not behave atomically on revisions
-prior to 3.0.  They may deadlock on revisions 2.6 and earlier.
-@end itemize
+In addition, all of the following transfer instructions for internal
+registers X and Y must be provided to use any of the double-precision
+floating-point instructions.  Custom instructions taking two
+double-precision source operands expect the first operand in the
+64-bit register X.  The other operand (or only operand of a unary
+operation) is given to the custom arithmetic instruction with the
+least significant half in source register @var{src1} and the most
+significant half in @var{src2}.  A custom instruction that returns a
+double-precision result returns the most significant 32 bits in the
+destination register and the other half in 32-bit register Y.  
+GCC automatically generates the necessary code sequences to write
+register X and/or read register Y when double-precision floating-point
+instructions are used.
 
-This option can only be used if the target architecture supports
-branch-likely instructions.  @option{-mfix-r10000} is the default when
-@option{-march=r10000} is used; @option{-mno-fix-r10000} is the default
-otherwise.
+@table @asis
 
-@item -mfix-rm7000
-@itemx -mno-fix-rm7000
-@opindex mfix-rm7000
-Work around the RM7000 @code{dmult}/@code{dmultu} errata.  The
-workarounds are implemented by the assembler rather than by GCC@.
+@item @samp{fwrx}
+Write @var{src1} into the least significant half of X and @var{src2} into
+the most significant half of X.
 
-@item -mfix-vr4120
-@itemx -mno-fix-vr4120
-@opindex mfix-vr4120
-Work around certain VR4120 errata:
-@itemize @minus
-@item
-@code{dmultu} does not always produce the correct result.
-@item
-@code{div} and @code{ddiv} do not always produce the correct result if one
-of the operands is negative.
-@end itemize
-The workarounds for the division errata rely on special functions in
-@file{libgcc.a}.  At present, these functions are only provided by
-the @code{mips64vr*-elf} configurations.
+@item @samp{fwry}
+Write @var{src1} into Y.
 
-Other VR4120 errata require a NOP to be inserted between certain pairs of
-instructions.  These errata are handled by the assembler, not by GCC itself.
+@item @samp{frdxhi}, @samp{frdxlo}
+Read the most or least (respectively) significant half of X and store it in
+@var{dest}.
 
-@item -mfix-vr4130
-@opindex mfix-vr4130
-Work around the VR4130 @code{mflo}/@code{mfhi} errata.  The
-workarounds are implemented by the assembler rather than by GCC,
-although GCC avoids using @code{mflo} and @code{mfhi} if the
-VR4130 @code{macc}, @code{macchi}, @code{dmacc} and @code{dmacchi}
-instructions are available instead.
+@item @samp{frdy}
+Read the value of Y and store it into @var{dest}.
+@end table
 
-@item -mfix-sb1
-@itemx -mno-fix-sb1
-@opindex mfix-sb1
-Work around certain SB-1 CPU core errata.
-(This flag currently works around the SB-1 revision 2
-``F1'' and ``F2'' floating-point errata.)
+Note that you can gain more local control over generation of Nios II custom
+instructions by using the @code{target("custom-@var{insn}=@var{N}")}
+and @code{target("no-custom-@var{insn}")} function attributes
+(@pxref{Function Attributes})
+or pragmas (@pxref{Function Specific Option Pragmas}).
 
-@item -mr10k-cache-barrier=@var{setting}
-@opindex mr10k-cache-barrier
-Specify whether GCC should insert cache barriers to avoid the
-side-effects of speculation on R10K processors.
+@item -mcustom-fpu-cfg=@var{name}
+@opindex mcustom-fpu-cfg
 
-In common with many processors, the R10K tries to predict the outcome
-of a conditional branch and speculatively executes instructions from
-the ``taken'' branch.  It later aborts these instructions if the
-predicted outcome is wrong.  However, on the R10K, even aborted
-instructions can have side effects.
+This option enables a predefined, named set of custom instruction encodings
+(see @option{-mcustom-@var{insn}} above).  
+Currently, the following sets are defined:
 
-This problem only affects kernel stores and, depending on the system,
-kernel loads.  As an example, a speculatively-executed store may load
-the target memory into cache and mark the cache line as dirty, even if
-the store itself is later aborted.  If a DMA operation writes to the
-same area of memory before the ``dirty'' line is flushed, the cached
-data overwrites the DMA-ed data.  See the R10K processor manual
-for a full description, including other potential problems.
+@option{-mcustom-fpu-cfg=60-1} is equivalent to:
+@gccoptlist{-mcustom-fmuls=252 @gol
+-mcustom-fadds=253 @gol
+-mcustom-fsubs=254 @gol
+-fsingle-precision-constant}
 
-One workaround is to insert cache barrier instructions before every memory
-access that might be speculatively executed and that might have side
-effects even if aborted.  @option{-mr10k-cache-barrier=@var{setting}}
-controls GCC's implementation of this workaround.  It assumes that
-aborted accesses to any byte in the following regions does not have
-side effects:
+@option{-mcustom-fpu-cfg=60-2} is equivalent to:
+@gccoptlist{-mcustom-fmuls=252 @gol
+-mcustom-fadds=253 @gol
+-mcustom-fsubs=254 @gol
+-mcustom-fdivs=255 @gol
+-fsingle-precision-constant}
 
-@enumerate
-@item
-the memory occupied by the current function's stack frame;
+@option{-mcustom-fpu-cfg=72-3} is equivalent to:
+@gccoptlist{-mcustom-floatus=243 @gol
+-mcustom-fixsi=244 @gol
+-mcustom-floatis=245 @gol
+-mcustom-fcmpgts=246 @gol
+-mcustom-fcmples=249 @gol
+-mcustom-fcmpeqs=250 @gol
+-mcustom-fcmpnes=251 @gol
+-mcustom-fmuls=252 @gol
+-mcustom-fadds=253 @gol
+-mcustom-fsubs=254 @gol
+-mcustom-fdivs=255 @gol
+-fsingle-precision-constant}
 
-@item
-the memory occupied by an incoming stack argument;
+Custom instruction assignments given by individual
+@option{-mcustom-@var{insn}=} options override those given by
+@option{-mcustom-fpu-cfg=}, regardless of the
+order of the options on the command line.
 
-@item
-the memory occupied by an object with a link-time-constant address.
-@end enumerate
+Note that you can gain more local control over selection of an FPU
+configuration by using the @code{target("custom-fpu-cfg=@var{name}")}
+function attribute (@pxref{Function Attributes})
+or pragma (@pxref{Function Specific Option Pragmas}).
 
-It is the kernel's responsibility to ensure that speculative
-accesses to these regions are indeed safe.
+@end table
 
-If the input program contains a function declaration such as:
+These additional @samp{-m} options are available for the Altera Nios II
+ELF (bare-metal) target:
 
-@smallexample
-void foo (void);
-@end smallexample
+@table @gcctabopt
 
-then the implementation of @code{foo} must allow @code{j foo} and
-@code{jal foo} to be executed speculatively.  GCC honors this
-restriction for functions it compiles itself.  It expects non-GCC
-functions (such as hand-written assembly code) to do the same.
+@item -mhal
+@opindex mhal
+Link with HAL BSP.  This suppresses linking with the GCC-provided C runtime
+startup and termination code, and is typically used in conjunction with
+@option{-msys-crt0=} to specify the location of the alternate startup code
+provided by the HAL BSP.
 
-The option has three forms:
+@item -msmallc
+@opindex msmallc
+Link with a limited version of the C library, @option{-lsmallc}, rather than
+Newlib.
 
-@table @gcctabopt
-@item -mr10k-cache-barrier=load-store
-Insert a cache barrier before a load or store that might be
-speculatively executed and that might have side effects even
-if aborted.
+@item -msys-crt0=@var{startfile}
+@opindex msys-crt0
+@var{startfile} is the file name of the startfile (crt0) to use 
+when linking.  This option is only useful in conjunction with @option{-mhal}.
 
-@item -mr10k-cache-barrier=store
-Insert a cache barrier before a store that might be speculatively
-executed and that might have side effects even if aborted.
+@item -msys-lib=@var{systemlib}
+@opindex msys-lib
+@var{systemlib} is the library name of the library that provides
+low-level system calls required by the C library,
+e.g.@: @code{read} and @code{write}.
+This option is typically used to link with a library provided by a HAL BSP.
 
-@item -mr10k-cache-barrier=none
-Disable the insertion of cache barriers.  This is the default setting.
 @end table
 
-@item -mflush-func=@var{func}
-@itemx -mno-flush-func
-@opindex mflush-func
-Specifies the function to call to flush the I and D caches, or to not
-call any such function.  If called, the function must take the same
-arguments as the common @code{_flush_func}, that is, the address of the
-memory range for which the cache is being flushed, the size of the
-memory range, and the number 3 (to flush both caches).  The default
-depends on the target GCC was configured for, but commonly is either
-@code{_flush_func} or @code{__cpu_flush}.
-
-@item mbranch-cost=@var{num}
-@opindex mbranch-cost
-Set the cost of branches to roughly @var{num} ``simple'' instructions.
-This cost is only a heuristic and is not guaranteed to produce
-consistent results across releases.  A zero cost redundantly selects
-the default, which is based on the @option{-mtune} setting.
+@node PDP-11 Options
+@subsection PDP-11 Options
+@cindex PDP-11 Options
 
-@item -mbranch-likely
-@itemx -mno-branch-likely
-@opindex mbranch-likely
-@opindex mno-branch-likely
-Enable or disable use of Branch Likely instructions, regardless of the
-default for the selected architecture.  By default, Branch Likely
-instructions may be generated if they are supported by the selected
-architecture.  An exception is for the MIPS32 and MIPS64 architectures
-and processors that implement those architectures; for those, Branch
-Likely instructions are not be generated by default because the MIPS32
-and MIPS64 architectures specifically deprecate their use.
+These options are defined for the PDP-11:
 
-@item -mfp-exceptions
-@itemx -mno-fp-exceptions
-@opindex mfp-exceptions
-Specifies whether FP exceptions are enabled.  This affects how
-FP instructions are scheduled for some processors.
-The default is that FP exceptions are
-enabled.
+@table @gcctabopt
+@item -mfpu
+@opindex mfpu
+Use hardware FPP floating point.  This is the default.  (FIS floating
+point on the PDP-11/40 is not supported.)
 
-For instance, on the SB-1, if FP exceptions are disabled, and we are emitting
-64-bit code, then we can use both FP pipes.  Otherwise, we can only use one
-FP pipe.
+@item -msoft-float
+@opindex msoft-float
+Do not use hardware floating point.
 
-@item -mvr4130-align
-@itemx -mno-vr4130-align
-@opindex mvr4130-align
-The VR4130 pipeline is two-way superscalar, but can only issue two
-instructions together if the first one is 8-byte aligned.  When this
-option is enabled, GCC aligns pairs of instructions that it
-thinks should execute in parallel.
+@item -mac0
+@opindex mac0
+Return floating-point results in ac0 (fr0 in Unix assembler syntax).
 
-This option only has an effect when optimizing for the VR4130.
-It normally makes code faster, but at the expense of making it bigger.
-It is enabled by default at optimization level @option{-O3}.
+@item -mno-ac0
+@opindex mno-ac0
+Return floating-point results in memory.  This is the default.
 
-@item -msynci
-@itemx -mno-synci
-@opindex msynci
-Enable (disable) generation of @code{synci} instructions on
-architectures that support it.  The @code{synci} instructions (if
-enabled) are generated when @code{__builtin___clear_cache} is
-compiled.
+@item -m40
+@opindex m40
+Generate code for a PDP-11/40.
 
-This option defaults to @option{-mno-synci}, but the default can be
-overridden by configuring GCC with @option{--with-synci}.
+@item -m45
+@opindex m45
+Generate code for a PDP-11/45.  This is the default.
 
-When compiling code for single processor systems, it is generally safe
-to use @code{synci}.  However, on many multi-core (SMP) systems, it
-does not invalidate the instruction caches on all cores and may lead
-to undefined behavior.
+@item -m10
+@opindex m10
+Generate code for a PDP-11/10.
 
-@item -mrelax-pic-calls
-@itemx -mno-relax-pic-calls
-@opindex mrelax-pic-calls
-Try to turn PIC calls that are normally dispatched via register
-@code{$25} into direct calls.  This is only possible if the linker can
-resolve the destination at link-time and if the destination is within
-range for a direct call.
+@item -mbcopy-builtin
+@opindex mbcopy-builtin
+Use inline @code{movmemhi} patterns for copying memory.  This is the
+default.
 
-@option{-mrelax-pic-calls} is the default if GCC was configured to use
-an assembler and a linker that support the @code{.reloc} assembly
-directive and @option{-mexplicit-relocs} is in effect.  With
-@option{-mno-explicit-relocs}, this optimization can be performed by the
-assembler and the linker alone without help from the compiler.
+@item -mbcopy
+@opindex mbcopy
+Do not use inline @code{movmemhi} patterns for copying memory.
 
-@item -mmcount-ra-address
-@itemx -mno-mcount-ra-address
-@opindex mmcount-ra-address
-@opindex mno-mcount-ra-address
-Emit (do not emit) code that allows @code{_mcount} to modify the
-calling function's return address.  When enabled, this option extends
-the usual @code{_mcount} interface with a new @var{ra-address}
-parameter, which has type @code{intptr_t *} and is passed in register
-@code{$12}.  @code{_mcount} can then modify the return address by
-doing both of the following:
-@itemize
-@item
-Returning the new address in register @code{$31}.
-@item
-Storing the new address in @code{*@var{ra-address}},
-if @var{ra-address} is nonnull.
-@end itemize
+@item -mint16
+@itemx -mno-int32
+@opindex mint16
+@opindex mno-int32
+Use 16-bit @code{int}.  This is the default.
 
-The default is @option{-mno-mcount-ra-address}.
+@item -mint32
+@itemx -mno-int16
+@opindex mint32
+@opindex mno-int16
+Use 32-bit @code{int}.
 
-@end table
+@item -mfloat64
+@itemx -mno-float32
+@opindex mfloat64
+@opindex mno-float32
+Use 64-bit @code{float}.  This is the default.
 
-@node MMIX Options
-@subsection MMIX Options
-@cindex MMIX Options
-
-These options are defined for the MMIX:
-
-@table @gcctabopt
-@item -mlibfuncs
-@itemx -mno-libfuncs
-@opindex mlibfuncs
-@opindex mno-libfuncs
-Specify that intrinsic library functions are being compiled, passing all
-values in registers, no matter the size.
-
-@item -mepsilon
-@itemx -mno-epsilon
-@opindex mepsilon
-@opindex mno-epsilon
-Generate floating-point comparison instructions that compare with respect
-to the @code{rE} epsilon register.
-
-@item -mabi=mmixware
-@itemx -mabi=gnu
-@opindex mabi=mmixware
-@opindex mabi=gnu
-Generate code that passes function parameters and return values that (in
-the called function) are seen as registers @code{$0} and up, as opposed to
-the GNU ABI which uses global registers @code{$231} and up.
-
-@item -mzero-extend
-@itemx -mno-zero-extend
-@opindex mzero-extend
-@opindex mno-zero-extend
-When reading data from memory in sizes shorter than 64 bits, use (do not
-use) zero-extending load instructions by default, rather than
-sign-extending ones.
+@item -mfloat32
+@itemx -mno-float64
+@opindex mfloat32
+@opindex mno-float64
+Use 32-bit @code{float}.
 
-@item -mknuthdiv
-@itemx -mno-knuthdiv
-@opindex mknuthdiv
-@opindex mno-knuthdiv
-Make the result of a division yielding a remainder have the same sign as
-the divisor.  With the default, @option{-mno-knuthdiv}, the sign of the
-remainder follows the sign of the dividend.  Both methods are
-arithmetically valid, the latter being almost exclusively used.
+@item -mabshi
+@opindex mabshi
+Use @code{abshi2} pattern.  This is the default.
 
-@item -mtoplevel-symbols
-@itemx -mno-toplevel-symbols
-@opindex mtoplevel-symbols
-@opindex mno-toplevel-symbols
-Prepend (do not prepend) a @samp{:} to all global symbols, so the assembly
-code can be used with the @code{PREFIX} assembly directive.
+@item -mno-abshi
+@opindex mno-abshi
+Do not use @code{abshi2} pattern.
 
-@item -melf
-@opindex melf
-Generate an executable in the ELF format, rather than the default
-@samp{mmo} format used by the @command{mmix} simulator.
+@item -mbranch-expensive
+@opindex mbranch-expensive
+Pretend that branches are expensive.  This is for experimenting with
+code generation only.
 
-@item -mbranch-predict
-@itemx -mno-branch-predict
-@opindex mbranch-predict
-@opindex mno-branch-predict
-Use (do not use) the probable-branch instructions, when static branch
-prediction indicates a probable branch.
+@item -mbranch-cheap
+@opindex mbranch-cheap
+Do not pretend that branches are expensive.  This is the default.
 
-@item -mbase-addresses
-@itemx -mno-base-addresses
-@opindex mbase-addresses
-@opindex mno-base-addresses
-Generate (do not generate) code that uses @emph{base addresses}.  Using a
-base address automatically generates a request (handled by the assembler
-and the linker) for a constant to be set up in a global register.  The
-register is used for one or more base address requests within the range 0
-to 255 from the value held in the register.  The generally leads to short
-and fast code, but the number of different data items that can be
-addressed is limited.  This means that a program that uses lots of static
-data may require @option{-mno-base-addresses}.
+@item -munix-asm
+@opindex munix-asm
+Use Unix assembler syntax.  This is the default when configured for
+@samp{pdp11-*-bsd}.
 
-@item -msingle-exit
-@itemx -mno-single-exit
-@opindex msingle-exit
-@opindex mno-single-exit
-Force (do not force) generated code to have a single exit point in each
-function.
+@item -mdec-asm
+@opindex mdec-asm
+Use DEC assembler syntax.  This is the default when configured for any
+PDP-11 target other than @samp{pdp11-*-bsd}.
 @end table
 
-@node MN10300 Options
-@subsection MN10300 Options
-@cindex MN10300 options
+@node picoChip Options
+@subsection picoChip Options
+@cindex picoChip options
 
-These @option{-m} options are defined for Matsushita MN10300 architectures:
+These @samp{-m} options are defined for picoChip implementations:
 
 @table @gcctabopt
-@item -mmult-bug
-@opindex mmult-bug
-Generate code to avoid bugs in the multiply instructions for the MN10300
-processors.  This is the default.
 
-@item -mno-mult-bug
-@opindex mno-mult-bug
-Do not generate code to avoid bugs in the multiply instructions for the
-MN10300 processors.
+@item -mae=@var{ae_type}
+@opindex mae
+Set the instruction set, register set, and instruction scheduling
+parameters for array element type @var{ae_type}.  Supported values
+for @var{ae_type} are @samp{ANY}, @samp{MUL}, and @samp{MAC}.
 
-@item -mam33
-@opindex mam33
-Generate code using features specific to the AM33 processor.
+@option{-mae=ANY} selects a completely generic AE type.  Code
+generated with this option runs on any of the other AE types.  The
+code is not as efficient as it would be if compiled for a specific
+AE type, and some types of operation (e.g., multiplication) do not
+work properly on all types of AE.
 
-@item -mno-am33
-@opindex mno-am33
-Do not generate code using features specific to the AM33 processor.  This
-is the default.
+@option{-mae=MUL} selects a MUL AE type.  This is the most useful AE type
+for compiled code, and is the default.
 
-@item -mam33-2
-@opindex mam33-2
-Generate code using features specific to the AM33/2.0 processor.
+@option{-mae=MAC} selects a DSP-style MAC AE.  Code compiled with this
+option may suffer from poor performance of byte (char) manipulation,
+since the DSP AE does not provide hardware support for byte load/stores.
 
-@item -mam34
-@opindex mam34
-Generate code using features specific to the AM34 processor.
+@item -msymbol-as-address
+Enable the compiler to directly use a symbol name as an address in a
+load/store instruction, without first loading it into a
+register.  Typically, the use of this option generates larger
+programs, which run faster than when the option isn't used.  However, the
+results vary from program to program, so it is left as a user option,
+rather than being permanently enabled.
 
-@item -mtune=@var{cpu-type}
-@opindex mtune
-Use the timing characteristics of the indicated CPU type when
-scheduling instructions.  This does not change the targeted processor
-type.  The CPU type must be one of @samp{mn10300}, @samp{am33},
-@samp{am33-2} or @samp{am34}.
+@item -mno-inefficient-warnings
+Disables warnings about the generation of inefficient code.  These
+warnings can be generated, for example, when compiling code that
+performs byte-level memory operations on the MAC AE type.  The MAC AE has
+no hardware support for byte-level memory operations, so all byte
+load/stores must be synthesized from word load/store operations.  This is
+inefficient and a warning is generated to indicate
+that you should rewrite the code to avoid byte operations, or to target
+an AE type that has the necessary hardware support.  This option disables
+these warnings.
 
-@item -mreturn-pointer-on-d0
-@opindex mreturn-pointer-on-d0
-When generating a function that returns a pointer, return the pointer
-in both @code{a0} and @code{d0}.  Otherwise, the pointer is returned
-only in @code{a0}, and attempts to call such functions without a prototype
-result in errors.  Note that this option is on by default; use
-@option{-mno-return-pointer-on-d0} to disable it.
+@end table
 
-@item -mno-crt0
-@opindex mno-crt0
-Do not link in the C run-time initialization object file.
+@node PowerPC Options
+@subsection PowerPC Options
+@cindex PowerPC options
 
-@item -mrelax
-@opindex mrelax
-Indicate to the linker that it should perform a relaxation optimization pass
-to shorten branches, calls and absolute memory addresses.  This option only
-has an effect when used on the command line for the final link step.
+These are listed under @ref{RS/6000 and PowerPC Options}.
 
-This option makes symbolic debugging impossible.
+@node RL78 Options
+@subsection RL78 Options
+@cindex RL78 Options
 
-@item -mliw
-@opindex mliw
-Allow the compiler to generate @emph{Long Instruction Word}
-instructions if the target is the @samp{AM33} or later.  This is the
-default.  This option defines the preprocessor macro @code{__LIW__}.
+@table @gcctabopt
 
-@item -mnoliw
-@opindex mnoliw
-Do not allow the compiler to generate @emph{Long Instruction Word}
-instructions.  This option defines the preprocessor macro
-@code{__NO_LIW__}.
+@item -msim
+@opindex msim
+Links in additional target libraries to support operation within a
+simulator.
 
-@item -msetlb
-@opindex msetlb
-Allow the compiler to generate the @emph{SETLB} and @emph{Lcc}
-instructions if the target is the @samp{AM33} or later.  This is the
-default.  This option defines the preprocessor macro @code{__SETLB__}.
+@item -mmul=none
+@itemx -mmul=g13
+@itemx -mmul=rl78
+@opindex mmul
+Specifies the type of hardware multiplication support to be used.  The
+default is @samp{none}, which uses software multiplication functions.
+The @samp{g13} option is for the hardware multiply/divide peripheral
+only on the RL78/G13 targets.  The @samp{rl78} option is for the
+standard hardware multiplication defined in the RL78 software manual.
 
-@item -mnosetlb
-@opindex mnosetlb
-Do not allow the compiler to generate @emph{SETLB} or @emph{Lcc}
-instructions.  This option defines the preprocessor macro
-@code{__NO_SETLB__}.
+@item -m64bit-doubles
+@itemx -m32bit-doubles
+@opindex m64bit-doubles
+@opindex m32bit-doubles
+Make the @code{double} data type be 64 bits (@option{-m64bit-doubles})
+or 32 bits (@option{-m32bit-doubles}) in size.  The default is
+@option{-m32bit-doubles}.
 
 @end table
 
-@node Moxie Options
-@subsection Moxie Options
-@cindex Moxie Options
+@node RS/6000 and PowerPC Options
+@subsection IBM RS/6000 and PowerPC Options
+@cindex RS/6000 and PowerPC Options
+@cindex IBM RS/6000 and PowerPC Options
 
+These @samp{-m} options are defined for the IBM RS/6000 and PowerPC:
 @table @gcctabopt
+@item -mpowerpc-gpopt
+@itemx -mno-powerpc-gpopt
+@itemx -mpowerpc-gfxopt
+@itemx -mno-powerpc-gfxopt
+@need 800
+@itemx -mpowerpc64
+@itemx -mno-powerpc64
+@itemx -mmfcrf
+@itemx -mno-mfcrf
+@itemx -mpopcntb
+@itemx -mno-popcntb
+@itemx -mpopcntd
+@itemx -mno-popcntd
+@itemx -mfprnd
+@itemx -mno-fprnd
+@need 800
+@itemx -mcmpb
+@itemx -mno-cmpb
+@itemx -mmfpgpr
+@itemx -mno-mfpgpr
+@itemx -mhard-dfp
+@itemx -mno-hard-dfp
+@opindex mpowerpc-gpopt
+@opindex mno-powerpc-gpopt
+@opindex mpowerpc-gfxopt
+@opindex mno-powerpc-gfxopt
+@opindex mpowerpc64
+@opindex mno-powerpc64
+@opindex mmfcrf
+@opindex mno-mfcrf
+@opindex mpopcntb
+@opindex mno-popcntb
+@opindex mpopcntd
+@opindex mno-popcntd
+@opindex mfprnd
+@opindex mno-fprnd
+@opindex mcmpb
+@opindex mno-cmpb
+@opindex mmfpgpr
+@opindex mno-mfpgpr
+@opindex mhard-dfp
+@opindex mno-hard-dfp
+You use these options to specify which instructions are available on the
+processor you are using.  The default value of these options is
+determined when configuring GCC@.  Specifying the
+@option{-mcpu=@var{cpu_type}} overrides the specification of these
+options.  We recommend you use the @option{-mcpu=@var{cpu_type}} option
+rather than the options listed above.
 
-@item -meb
-@opindex meb
-Generate big-endian code.  This is the default for @samp{moxie-*-*}
-configurations.
-
-@item -mel
-@opindex mel
-Generate little-endian code.
-
-@item -mmul.x
-@opindex mmul.x
-Generate mul.x and umul.x instructions.  This is the default for
-@samp{moxiebox-*-*} configurations.
-
-@item -mno-crt0
-@opindex mno-crt0
-Do not link in the C run-time initialization object file.
-
-@end table
+Specifying @option{-mpowerpc-gpopt} allows
+GCC to use the optional PowerPC architecture instructions in the
+General Purpose group, including floating-point square root.  Specifying
+@option{-mpowerpc-gfxopt} allows GCC to
+use the optional PowerPC architecture instructions in the Graphics
+group, including floating-point select.
 
-@node MSP430 Options
-@subsection MSP430 Options
-@cindex MSP430 Options
+The @option{-mmfcrf} option allows GCC to generate the move from
+condition register field instruction implemented on the POWER4
+processor and other processors that support the PowerPC V2.01
+architecture.
+The @option{-mpopcntb} option allows GCC to generate the popcount and
+double-precision FP reciprocal estimate instruction implemented on the
+POWER5 processor and other processors that support the PowerPC V2.02
+architecture.
+The @option{-mpopcntd} option allows GCC to generate the popcount
+instruction implemented on the POWER7 processor and other processors
+that support the PowerPC V2.06 architecture.
+The @option{-mfprnd} option allows GCC to generate the FP round to
+integer instructions implemented on the POWER5+ processor and other
+processors that support the PowerPC V2.03 architecture.
+The @option{-mcmpb} option allows GCC to generate the compare bytes
+instruction implemented on the POWER6 processor and other processors
+that support the PowerPC V2.05 architecture.
+The @option{-mmfpgpr} option allows GCC to generate the FP move to/from
+general-purpose register instructions implemented on the POWER6X
+processor and other processors that support the extended PowerPC V2.05
+architecture.
+The @option{-mhard-dfp} option allows GCC to generate the decimal
+floating-point instructions implemented on some POWER processors.
 
-These options are defined for the MSP430:
+The @option{-mpowerpc64} option allows GCC to generate the additional
+64-bit instructions that are found in the full PowerPC64 architecture
+and to treat GPRs as 64-bit, doubleword quantities.  GCC defaults to
+@option{-mno-powerpc64}.
 
-@table @gcctabopt
+@item -mcpu=@var{cpu_type}
+@opindex mcpu
+Set architecture type, register usage, and
+instruction scheduling parameters for machine type @var{cpu_type}.
+Supported values for @var{cpu_type} are @samp{401}, @samp{403},
+@samp{405}, @samp{405fp}, @samp{440}, @samp{440fp}, @samp{464}, @samp{464fp},
+@samp{476}, @samp{476fp}, @samp{505}, @samp{601}, @samp{602}, @samp{603},
+@samp{603e}, @samp{604}, @samp{604e}, @samp{620}, @samp{630}, @samp{740},
+@samp{7400}, @samp{7450}, @samp{750}, @samp{801}, @samp{821}, @samp{823},
+@samp{860}, @samp{970}, @samp{8540}, @samp{a2}, @samp{e300c2},
+@samp{e300c3}, @samp{e500mc}, @samp{e500mc64}, @samp{e5500},
+@samp{e6500}, @samp{ec603e}, @samp{G3}, @samp{G4}, @samp{G5},
+@samp{titan}, @samp{power3}, @samp{power4}, @samp{power5}, @samp{power5+},
+@samp{power6}, @samp{power6x}, @samp{power7}, @samp{power8}, @samp{powerpc},
+@samp{powerpc64}, and @samp{rs64}.
 
-@item -masm-hex
-@opindex masm-hex
-Force assembly output to always use hex constants.  Normally such
-constants are signed decimals, but this option is available for
-testsuite and/or aesthetic purposes.
+@option{-mcpu=powerpc} and @option{-mcpu=powerpc64} specify pure 32-bit
+PowerPC and 64-bit PowerPC architecture machine
+types, with an appropriate, generic processor model assumed for
+scheduling purposes.
 
-@item -mmcu=
-@opindex mmcu=
-Select the MCU to target.  This is used to create a C preprocessor
-symbol based upon the MCU name, converted to upper case and pre- and
-post-fixed with @samp{__}.  This in turn is used by the
-@file{msp430.h} header file to select an MCU-specific supplementary
-header file.
+The other options specify a specific processor.  Code generated under
+those options runs best on that processor, and may not run at all on
+others.
 
-The option also sets the ISA to use.  If the MCU name is one that is
-known to only support the 430 ISA then that is selected, otherwise the
-430X ISA is selected.  A generic MCU name of @samp{msp430} can also be
-used to select the 430 ISA.  Similarly the generic @samp{msp430x} MCU
-name selects the 430X ISA.
+The @option{-mcpu} options automatically enable or disable the
+following options:
 
-In addition an MCU-specific linker script is added to the linker
-command line.  The script's name is the name of the MCU with
-@file{.ld} appended.  Thus specifying @option{-mmcu=xxx} on the @command{gcc}
-command line defines the C preprocessor symbol @code{__XXX__} and
-cause the linker to search for a script called @file{xxx.ld}.
+@gccoptlist{-maltivec  -mfprnd  -mhard-float  -mmfcrf  -mmultiple @gol
+-mpopcntb -mpopcntd  -mpowerpc64 @gol
+-mpowerpc-gpopt  -mpowerpc-gfxopt  -msingle-float -mdouble-float @gol
+-msimple-fpu -mstring  -mmulhw  -mdlmzb  -mmfpgpr -mvsx @gol
+-mcrypto -mdirect-move -mpower8-fusion -mpower8-vector @gol
+-mquad-memory -mquad-memory-atomic}
 
-This option is also passed on to the assembler.
+The particular options set for any particular CPU vary between
+compiler versions, depending on what setting seems to produce optimal
+code for that CPU; it doesn't necessarily reflect the actual hardware's
+capabilities.  If you wish to set an individual option to a particular
+value, you may specify it after the @option{-mcpu} option, like
+@option{-mcpu=970 -mno-altivec}.
 
-@item -mcpu=
-@opindex mcpu=
-Specifies the ISA to use.  Accepted values are @samp{msp430},
-@samp{msp430x} and @samp{msp430xv2}.  This option is deprecated.  The
-@option{-mmcu=} option should be used to select the ISA.
+On AIX, the @option{-maltivec} and @option{-mpowerpc64} options are
+not enabled or disabled by the @option{-mcpu} option at present because
+AIX does not have full support for these options.  You may still
+enable or disable them individually if you're sure it'll work in your
+environment.
 
-@item -msim
-@opindex msim
-Link to the simulator runtime libraries and linker script.  Overrides
-any scripts that would be selected by the @option{-mmcu=} option.
+@item -mtune=@var{cpu_type}
+@opindex mtune
+Set the instruction scheduling parameters for machine type
+@var{cpu_type}, but do not set the architecture type or register usage,
+as @option{-mcpu=@var{cpu_type}} does.  The same
+values for @var{cpu_type} are used for @option{-mtune} as for
+@option{-mcpu}.  If both are specified, the code generated uses the
+architecture and registers set by @option{-mcpu}, but the
+scheduling parameters set by @option{-mtune}.
 
-@item -mlarge
-@opindex mlarge
-Use large-model addressing (20-bit pointers, 32-bit @code{size_t}).
+@item -mcmodel=small
+@opindex mcmodel=small
+Generate PowerPC64 code for the small model: The TOC is limited to
+64k.
 
-@item -msmall
-@opindex msmall
-Use small-model addressing (16-bit pointers, 16-bit @code{size_t}).
+@item -mcmodel=medium
+@opindex mcmodel=medium
+Generate PowerPC64 code for the medium model: The TOC and other static
+data may be up to a total of 4G in size.
 
-@item -mrelax
-@opindex mrelax
-This option is passed to the assembler and linker, and allows the
-linker to perform certain optimizations that cannot be done until
-the final link.
+@item -mcmodel=large
+@opindex mcmodel=large
+Generate PowerPC64 code for the large model: The TOC may be up to 4G
+in size.  Other data and code is only limited by the 64-bit address
+space.
 
-@item mhwmult=
-@opindex mhwmult=
-Describes the type of hardware multiply supported by the target.
-Accepted values are @samp{none} for no hardware multiply, @samp{16bit}
-for the original 16-bit-only multiply supported by early MCUs.
-@samp{32bit} for the 16/32-bit multiply supported by later MCUs and
-@samp{f5series} for the 16/32-bit multiply supported by F5-series MCUs.
-A value of @samp{auto} can also be given.  This tells GCC to deduce
-the hardware multiply support based upon the MCU name provided by the
-@option{-mmcu} option.  If no @option{-mmcu} option is specified then
-@samp{32bit} hardware multiply support is assumed.  @samp{auto} is the
-default setting.
+@item -maltivec
+@itemx -mno-altivec
+@opindex maltivec
+@opindex mno-altivec
+Generate code that uses (does not use) AltiVec instructions, and also
+enable the use of built-in functions that allow more direct access to
+the AltiVec instruction set.  You may also need to set
+@option{-mabi=altivec} to adjust the current ABI with AltiVec ABI
+enhancements.
 
-Hardware multiplies are normally performed by calling a library
-routine.  This saves space in the generated code.  When compiling at
-@option{-O3} or higher however the hardware multiplier is invoked
-inline.  This makes for bigger, but faster code.
+When @option{-maltivec} is used, rather than @option{-maltivec=le} or
+@option{-maltivec=be}, the element order for AltiVec intrinsics such
+as @code{vec_splat}, @code{vec_extract}, and @code{vec_insert}
+matches array element order corresponding to the endianness of the
+target.  That is, element zero identifies the leftmost element in a
+vector register when targeting a big-endian platform, and identifies
+the rightmost element in a vector register when targeting a
+little-endian platform.
 
-The hardware multiply routines disable interrupts whilst running and
-restore the previous interrupt state when they finish.  This makes
-them safe to use inside interrupt handlers as well as in normal code.
+@item -maltivec=be
+@opindex maltivec=be
+Generate AltiVec instructions using big-endian element order,
+regardless of whether the target is big- or little-endian.  This is
+the default when targeting a big-endian platform.
 
-@item -minrt
-@opindex minrt
-Enable the use of a minimum runtime environment - no static
-initializers or constructors.  This is intended for memory-constrained
-devices.  The compiler includes special symbols in some objects
-that tell the linker and runtime which code fragments are required.
+The element order is used to interpret element numbers in AltiVec
+intrinsics such as @code{vec_splat}, @code{vec_extract}, and
+@code{vec_insert}.  By default, these match array element order
+corresponding to the endianness for the target.
 
-@end table
+@item -maltivec=le
+@opindex maltivec=le
+Generate AltiVec instructions using little-endian element order,
+regardless of whether the target is big- or little-endian.  This is
+the default when targeting a little-endian platform.  This option is
+currently ignored when targeting a big-endian platform.
 
-@node NDS32 Options
-@subsection NDS32 Options
-@cindex NDS32 Options
-
-These options are defined for NDS32 implementations:
-
-@table @gcctabopt
+The element order is used to interpret element numbers in AltiVec
+intrinsics such as @code{vec_splat}, @code{vec_extract}, and
+@code{vec_insert}.  By default, these match array element order
+corresponding to the endianness for the target.
 
-@item -mbig-endian
-@opindex mbig-endian
-Generate code in big-endian mode.
+@item -mvrsave
+@itemx -mno-vrsave
+@opindex mvrsave
+@opindex mno-vrsave
+Generate VRSAVE instructions when generating AltiVec code.
 
-@item -mlittle-endian
-@opindex mlittle-endian
-Generate code in little-endian mode.
+@item -mgen-cell-microcode
+@opindex mgen-cell-microcode
+Generate Cell microcode instructions.
 
-@item -mreduced-regs
-@opindex mreduced-regs
-Use reduced-set registers for register allocation.
+@item -mwarn-cell-microcode
+@opindex mwarn-cell-microcode
+Warn when a Cell microcode instruction is emitted.  An example
+of a Cell microcode instruction is a variable shift.
 
-@item -mfull-regs
-@opindex mfull-regs
-Use full-set registers for register allocation.
+@item -msecure-plt
+@opindex msecure-plt
+Generate code that allows @command{ld} and @command{ld.so}
+to build executables and shared
+libraries with non-executable @code{.plt} and @code{.got} sections.
+This is a PowerPC
+32-bit SYSV ABI option.
 
-@item -mcmov
-@opindex mcmov
-Generate conditional move instructions.
+@item -mbss-plt
+@opindex mbss-plt
+Generate code that uses a BSS @code{.plt} section that @command{ld.so}
+fills in, and
+requires @code{.plt} and @code{.got}
+sections that are both writable and executable.
+This is a PowerPC 32-bit SYSV ABI option.
 
-@item -mno-cmov
-@opindex mno-cmov
-Do not generate conditional move instructions.
+@item -misel
+@itemx -mno-isel
+@opindex misel
+@opindex mno-isel
+This switch enables or disables the generation of ISEL instructions.
 
-@item -mperf-ext
-@opindex mperf-ext
-Generate performance extension instructions.
+@item -misel=@var{yes/no}
+This switch has been deprecated.  Use @option{-misel} and
+@option{-mno-isel} instead.
 
-@item -mno-perf-ext
-@opindex mno-perf-ext
-Do not generate performance extension instructions.
+@item -mspe
+@itemx -mno-spe
+@opindex mspe
+@opindex mno-spe
+This switch enables or disables the generation of SPE SIMD
+instructions.
 
-@item -mv3push
-@opindex mv3push
-Generate v3 push25/pop25 instructions.
+@item -mpaired
+@itemx -mno-paired
+@opindex mpaired
+@opindex mno-paired
+This switch enables or disables the generation of PAIRED SIMD
+instructions.
 
-@item -mno-v3push
-@opindex mno-v3push
-Do not generate v3 push25/pop25 instructions.
+@item -mspe=@var{yes/no}
+This option has been deprecated.  Use @option{-mspe} and
+@option{-mno-spe} instead.
 
-@item -m16-bit
-@opindex m16-bit
-Generate 16-bit instructions.
+@item -mvsx
+@itemx -mno-vsx
+@opindex mvsx
+@opindex mno-vsx
+Generate code that uses (does not use) vector/scalar (VSX)
+instructions, and also enable the use of built-in functions that allow
+more direct access to the VSX instruction set.
 
-@item -mno-16-bit
-@opindex mno-16-bit
-Do not generate 16-bit instructions.
+@item -mcrypto
+@itemx -mno-crypto
+@opindex mcrypto
+@opindex mno-crypto
+Enable (disable) the use of the built-in functions that allow direct
+access to the cryptographic instructions that were added in version
+2.07 of the PowerPC ISA.
 
-@item -misr-vector-size=@var{num}
-@opindex misr-vector-size
-Specify the size of each interrupt vector, which must be 4 or 16.
+@item -mdirect-move
+@itemx -mno-direct-move
+@opindex mdirect-move
+@opindex mno-direct-move
+Generate code that uses (does not use) the instructions to move data
+between the general purpose registers and the vector/scalar (VSX)
+registers that were added in version 2.07 of the PowerPC ISA.
 
-@item -mcache-block-size=@var{num}
-@opindex mcache-block-size
-Specify the size of each cache block,
-which must be a power of 2 between 4 and 512.
+@item -mpower8-fusion
+@itemx -mno-power8-fusion
+@opindex mpower8-fusion
+@opindex mno-power8-fusion
+Generate code that keeps (does not keep) some integer operations
+adjacent so that the instructions can be fused together on power8 and
+later processors.
 
-@item -march=@var{arch}
-@opindex march
-Specify the name of the target architecture.
+@item -mpower8-vector
+@itemx -mno-power8-vector
+@opindex mpower8-vector
+@opindex mno-power8-vector
+Generate code that uses (does not use) the vector and scalar
+instructions that were added in version 2.07 of the PowerPC ISA.  Also
+enable the use of built-in functions that allow more direct access to
+the vector instructions.
 
-@item -mcmodel=@var{code-model}
-@opindex mcmodel
-Set the code model to one of
-@table @asis
-@item @samp{small}
-All the data and read-only data segments must be within 512KB addressing space.
-The text segment must be within 16MB addressing space.
-@item @samp{medium}
-The data segment must be within 512KB while the read-only data segment can be
-within 4GB addressing space.  The text segment should be still within 16MB
-addressing space.
-@item @samp{large}
-All the text and data segments can be within 4GB addressing space.
-@end table
+@item -mquad-memory
+@itemx -mno-quad-memory
+@opindex mquad-memory
+@opindex mno-quad-memory
+Generate code that uses (does not use) the non-atomic quad word memory
+instructions.  The @option{-mquad-memory} option requires use of
+64-bit mode.
 
-@item -mctor-dtor
-@opindex mctor-dtor
-Enable constructor/destructor feature.
+@item -mquad-memory-atomic
+@itemx -mno-quad-memory-atomic
+@opindex mquad-memory-atomic
+@opindex mno-quad-memory-atomic
+Generate code that uses (does not use) the atomic quad word memory
+instructions.  The @option{-mquad-memory-atomic} option requires use of
+64-bit mode.
 
-@item -mrelax
-@opindex mrelax
-Guide linker to relax instructions.
+@item -mupper-regs-df
+@itemx -mno-upper-regs-df
+@opindex mupper-regs-df
+@opindex mno-upper-regs-df
+Generate code that uses (does not use) the scalar double precision
+instructions that target all 64 registers in the vector/scalar
+floating point register set that were added in version 2.06 of the
+PowerPC ISA.  The @option{-mupper-regs-df} option is turned on by default if
+you use any of the @option{-mcpu=power7}, @option{-mcpu=power8}, or
+@option{-mvsx} options.
 
-@end table
+@item -mupper-regs-sf
+@itemx -mno-upper-regs-sf
+@opindex mupper-regs-sf
+@opindex mno-upper-regs-sf
+Generate code that uses (does not use) the scalar single precision
+instructions that target all 64 registers in the vector/scalar
+floating point register set that were added in version 2.07 of the
+PowerPC ISA.  The @option{-mupper-regs-sf} option is turned on by default if
+you use either of the @option{-mcpu=power8} or @option{-mpower8-vector}
+options.
 
-@node Nios II Options
-@subsection Nios II Options
-@cindex Nios II options
-@cindex Altera Nios II options
+@item -mupper-regs
+@itemx -mno-upper-regs
+@opindex mupper-regs
+@opindex mno-upper-regs
+Generate code that uses (does not use) the scalar
+instructions that target all 64 registers in the vector/scalar
+floating point register set, depending on the model of the machine.
 
-These are the options defined for the Altera Nios II processor.
+If the @option{-mno-upper-regs} option is used, it turns off both the
+@option{-mupper-regs-sf} and @option{-mupper-regs-df} options.
 
-@table @gcctabopt
+@item -mfloat-gprs=@var{yes/single/double/no}
+@itemx -mfloat-gprs
+@opindex mfloat-gprs
+This switch enables or disables the generation of floating-point
+operations on the general-purpose registers for architectures that
+support it.
 
-@item -G @var{num}
-@opindex G
-@cindex smaller data references
-Put global and static objects less than or equal to @var{num} bytes
-into the small data or BSS sections instead of the normal data or BSS
-sections.  The default value of @var{num} is 8.
+The argument @samp{yes} or @samp{single} enables the use of
+single-precision floating-point operations.
 
-@item -mgpopt=@var{option}
-@item -mgpopt
-@itemx -mno-gpopt
-@opindex mgpopt
-@opindex mno-gpopt
-Generate (do not generate) GP-relative accesses.  The following 
-@var{option} names are recognized:
+The argument @samp{double} enables the use of single and
+double-precision floating-point operations.
 
-@table @samp
+The argument @samp{no} disables floating-point operations on the
+general-purpose registers.
 
-@item none
-Do not generate GP-relative accesses.
+This option is currently only available on the MPC854x.
 
-@item local
-Generate GP-relative accesses for small data objects that are not 
-external or weak.  Also use GP-relative addressing for objects that
-have been explicitly placed in a small data section via a @code{section}
-attribute.
+@item -m32
+@itemx -m64
+@opindex m32
+@opindex m64
+Generate code for 32-bit or 64-bit environments of Darwin and SVR4
+targets (including GNU/Linux).  The 32-bit environment sets int, long
+and pointer to 32 bits and generates code that runs on any PowerPC
+variant.  The 64-bit environment sets int to 32 bits and long and
+pointer to 64 bits, and generates code for PowerPC64, as for
+@option{-mpowerpc64}.
 
-@item global
-As for @samp{local}, but also generate GP-relative accesses for
-small data objects that are external or weak.  If you use this option,
-you must ensure that all parts of your program (including libraries) are
-compiled with the same @option{-G} setting.
-
-@item data
-Generate GP-relative accesses for all data objects in the program.  If you
-use this option, the entire data and BSS segments
-of your program must fit in 64K of memory and you must use an appropriate
-linker script to allocate them within the addressible range of the
-global pointer.
-
-@item all
-Generate GP-relative addresses for function pointers as well as data
-pointers.  If you use this option, the entire text, data, and BSS segments
-of your program must fit in 64K of memory and you must use an appropriate
-linker script to allocate them within the addressible range of the
-global pointer.
+@item -mfull-toc
+@itemx -mno-fp-in-toc
+@itemx -mno-sum-in-toc
+@itemx -mminimal-toc
+@opindex mfull-toc
+@opindex mno-fp-in-toc
+@opindex mno-sum-in-toc
+@opindex mminimal-toc
+Modify generation of the TOC (Table Of Contents), which is created for
+every executable file.  The @option{-mfull-toc} option is selected by
+default.  In that case, GCC allocates at least one TOC entry for
+each unique non-automatic variable reference in your program.  GCC
+also places floating-point constants in the TOC@.  However, only
+16,384 entries are available in the TOC@.
 
-@end table
+If you receive a linker error message saying you have overflowed
+the available TOC space, you can reduce the amount of TOC space used
+with the @option{-mno-fp-in-toc} and @option{-mno-sum-in-toc} options.
+@option{-mno-fp-in-toc} prevents GCC from putting floating-point
+constants in the TOC and @option{-mno-sum-in-toc} forces GCC to
+generate code to calculate the sum of an address and a constant at
+run time instead of putting that sum into the TOC@.  You may specify one
+or both of these options.  Each causes GCC to produce very slightly
+slower and larger code in exchange for conserving TOC space.
 
-@option{-mgpopt} is equivalent to @option{-mgpopt=local}, and
-@option{-mno-gpopt} is equivalent to @option{-mgpopt=none}.
+If you still run out of space in the TOC even when you specify both of
+these options, specify @option{-mminimal-toc} instead.  This option causes
+GCC to make only one TOC entry for every file.  When you specify this
+option, GCC produces code that is slower and larger but which
+uses extremely little TOC space.  You may wish to use this option
+only on files that contain less frequently-executed code.
 
-The default is @option{-mgpopt} except when @option{-fpic} or
-@option{-fPIC} is specified to generate position-independent code.
-Note that the Nios II ABI does not permit GP-relative accesses from
-shared libraries.
+@item -maix64
+@itemx -maix32
+@opindex maix64
+@opindex maix32
+Enable 64-bit AIX ABI and calling convention: 64-bit pointers, 64-bit
+@code{long} type, and the infrastructure needed to support them.
+Specifying @option{-maix64} implies @option{-mpowerpc64},
+while @option{-maix32} disables the 64-bit ABI and
+implies @option{-mno-powerpc64}.  GCC defaults to @option{-maix32}.
 
-You may need to specify @option{-mno-gpopt} explicitly when building
-programs that include large amounts of small data, including large
-GOT data sections.  In this case, the 16-bit offset for GP-relative
-addressing may not be large enough to allow access to the entire 
-small data section.
+@item -mxl-compat
+@itemx -mno-xl-compat
+@opindex mxl-compat
+@opindex mno-xl-compat
+Produce code that conforms more closely to IBM XL compiler semantics
+when using AIX-compatible ABI@.  Pass floating-point arguments to
+prototyped functions beyond the register save area (RSA) on the stack
+in addition to argument FPRs.  Do not assume that the most significant
+double in a 128-bit long double value is properly rounded when comparing
+values and converting to double.  Use XL symbol names for long double
+support routines.
 
-@item -mel
-@itemx -meb
-@opindex mel
-@opindex meb
-Generate little-endian (default) or big-endian (experimental) code,
-respectively.
+The AIX calling convention was extended but not initially documented to
+handle an obscure K&R C case of calling a function that takes the
+address of its arguments with fewer arguments than declared.  IBM XL
+compilers access floating-point arguments that do not fit in the
+RSA from the stack when a subroutine is compiled without
+optimization.  Because always storing floating-point arguments on the
+stack is inefficient and rarely needed, this option is not enabled by
+default and is only necessary when calling subroutines compiled by IBM
+XL compilers without optimization.
 
-@item -mbypass-cache
-@itemx -mno-bypass-cache
-@opindex mno-bypass-cache
-@opindex mbypass-cache
-Force all load and store instructions to always bypass cache by 
-using I/O variants of the instructions. The default is not to
-bypass the cache.
+@item -mpe
+@opindex mpe
+Support @dfn{IBM RS/6000 SP} @dfn{Parallel Environment} (PE)@.  Link an
+application written to use message passing with special startup code to
+enable the application to run.  The system must have PE installed in the
+standard location (@file{/usr/lpp/ppe.poe/}), or the @file{specs} file
+must be overridden with the @option{-specs=} option to specify the
+appropriate directory location.  The Parallel Environment does not
+support threads, so the @option{-mpe} option and the @option{-pthread}
+option are incompatible.
 
-@item -mno-cache-volatile 
-@itemx -mcache-volatile       
-@opindex mcache-volatile 
-@opindex mno-cache-volatile
-Volatile memory access bypass the cache using the I/O variants of 
-the load and store instructions. The default is not to bypass the cache.
+@item -malign-natural
+@itemx -malign-power
+@opindex malign-natural
+@opindex malign-power
+On AIX, 32-bit Darwin, and 64-bit PowerPC GNU/Linux, the option
+@option{-malign-natural} overrides the ABI-defined alignment of larger
+types, such as floating-point doubles, and aligns them on their natural size-based boundary.
+The option @option{-malign-power} instructs GCC to follow the ABI-specified
+alignment rules.  GCC defaults to the standard alignment defined in the ABI@.
 
-@item -mno-fast-sw-div
-@itemx -mfast-sw-div
-@opindex mno-fast-sw-div
-@opindex mfast-sw-div
-Do not use table-based fast divide for small numbers. The default 
-is to use the fast divide at @option{-O3} and above.
+On 64-bit Darwin, natural alignment is the default, and @option{-malign-power}
+is not supported.
 
-@item -mno-hw-mul
-@itemx -mhw-mul
-@itemx -mno-hw-mulx
-@itemx -mhw-mulx
-@itemx -mno-hw-div
-@itemx -mhw-div
-@opindex mno-hw-mul
-@opindex mhw-mul
-@opindex mno-hw-mulx
-@opindex mhw-mulx
-@opindex mno-hw-div
-@opindex mhw-div
-Enable or disable emitting @code{mul}, @code{mulx} and @code{div} family of 
-instructions by the compiler. The default is to emit @code{mul}
-and not emit @code{div} and @code{mulx}.
+@item -msoft-float
+@itemx -mhard-float
+@opindex msoft-float
+@opindex mhard-float
+Generate code that does not use (uses) the floating-point register set.
+Software floating-point emulation is provided if you use the
+@option{-msoft-float} option, and pass the option to GCC when linking.
 
-@item -mcustom-@var{insn}=@var{N}
-@itemx -mno-custom-@var{insn}
-@opindex mcustom-@var{insn}
-@opindex mno-custom-@var{insn}
-Each @option{-mcustom-@var{insn}=@var{N}} option enables use of a
-custom instruction with encoding @var{N} when generating code that uses 
-@var{insn}.  For example, @option{-mcustom-fadds=253} generates custom
-instruction 253 for single-precision floating-point add operations instead
-of the default behavior of using a library call.
+@item -msingle-float
+@itemx -mdouble-float
+@opindex msingle-float
+@opindex mdouble-float
+Generate code for single- or double-precision floating-point operations.
+@option{-mdouble-float} implies @option{-msingle-float}.
 
-The following values of @var{insn} are supported.  Except as otherwise
-noted, floating-point operations are expected to be implemented with
-normal IEEE 754 semantics and correspond directly to the C operators or the
-equivalent GCC built-in functions (@pxref{Other Builtins}).
+@item -msimple-fpu
+@opindex msimple-fpu
+Do not generate @code{sqrt} and @code{div} instructions for the hardware
+floating-point unit.
 
-Single-precision floating point:
-@table @asis
+@item -mfpu=@var{name}
+@opindex mfpu
+Specify type of floating-point unit.  Valid values for @var{name} are
+@samp{sp_lite} (equivalent to @option{-msingle-float -msimple-fpu}),
+@samp{dp_lite} (equivalent to @option{-mdouble-float -msimple-fpu}),
+@samp{sp_full} (equivalent to @option{-msingle-float}),
+and @samp{dp_full} (equivalent to @option{-mdouble-float}).
 
-@item @samp{fadds}, @samp{fsubs}, @samp{fdivs}, @samp{fmuls}
-Binary arithmetic operations.
+@item -mxilinx-fpu
+@opindex mxilinx-fpu
+Perform optimizations for the floating-point unit on Xilinx PPC 405/440.
 
-@item @samp{fnegs}
-Unary negation.
+@item -mmultiple
+@itemx -mno-multiple
+@opindex mmultiple
+@opindex mno-multiple
+Generate code that uses (does not use) the load multiple word
+instructions and the store multiple word instructions.  These
+instructions are generated by default on POWER systems, and not
+generated on PowerPC systems.  Do not use @option{-mmultiple} on little-endian
+PowerPC systems, since those instructions do not work when the
+processor is in little-endian mode.  The exceptions are PPC740 and
+PPC750 which permit these instructions in little-endian mode.
 
-@item @samp{fabss}
-Unary absolute value.
+@item -mstring
+@itemx -mno-string
+@opindex mstring
+@opindex mno-string
+Generate code that uses (does not use) the load string instructions
+and the store string word instructions to save multiple registers and
+do small block moves.  These instructions are generated by default on
+POWER systems, and not generated on PowerPC systems.  Do not use
+@option{-mstring} on little-endian PowerPC systems, since those
+instructions do not work when the processor is in little-endian mode.
+The exceptions are PPC740 and PPC750 which permit these instructions
+in little-endian mode.
 
-@item @samp{fcmpeqs}, @samp{fcmpges}, @samp{fcmpgts}, @samp{fcmples}, @samp{fcmplts}, @samp{fcmpnes}
-Comparison operations.
+@item -mupdate
+@itemx -mno-update
+@opindex mupdate
+@opindex mno-update
+Generate code that uses (does not use) the load or store instructions
+that update the base register to the address of the calculated memory
+location.  These instructions are generated by default.  If you use
+@option{-mno-update}, there is a small window between the time that the
+stack pointer is updated and the address of the previous frame is
+stored, which means code that walks the stack frame across interrupts or
+signals may get corrupted data.
 
-@item @samp{fmins}, @samp{fmaxs}
-Floating-point minimum and maximum.  These instructions are only
-generated if @option{-ffinite-math-only} is specified.
+@item -mavoid-indexed-addresses
+@itemx -mno-avoid-indexed-addresses
+@opindex mavoid-indexed-addresses
+@opindex mno-avoid-indexed-addresses
+Generate code that tries to avoid (not avoid) the use of indexed load
+or store instructions.  These instructions can incur a performance
+penalty on Power6 processors in certain situations, such as when
+stepping through large arrays that cross a 16M boundary.  This option
+is enabled by default when targeting Power6 and disabled otherwise.
 
-@item @samp{fsqrts}
-Unary square root operation.
+@item -mfused-madd
+@itemx -mno-fused-madd
+@opindex mfused-madd
+@opindex mno-fused-madd
+Generate code that uses (does not use) the floating-point multiply and
+accumulate instructions.  These instructions are generated by default
+if hardware floating point is used.  The machine-dependent
+@option{-mfused-madd} option is now mapped to the machine-independent
+@option{-ffp-contract=fast} option, and @option{-mno-fused-madd} is
+mapped to @option{-ffp-contract=off}.
 
-@item @samp{fcoss}, @samp{fsins}, @samp{ftans}, @samp{fatans}, @samp{fexps}, @samp{flogs}
-Floating-point trigonometric and exponential functions.  These instructions
-are only generated if @option{-funsafe-math-optimizations} is also specified.
+@item -mmulhw
+@itemx -mno-mulhw
+@opindex mmulhw
+@opindex mno-mulhw
+Generate code that uses (does not use) the half-word multiply and
+multiply-accumulate instructions on the IBM 405, 440, 464 and 476 processors.
+These instructions are generated by default when targeting those
+processors.
 
-@end table
+@item -mdlmzb
+@itemx -mno-dlmzb
+@opindex mdlmzb
+@opindex mno-dlmzb
+Generate code that uses (does not use) the string-search @samp{dlmzb}
+instruction on the IBM 405, 440, 464 and 476 processors.  This instruction is
+generated by default when targeting those processors.
 
-Double-precision floating point:
-@table @asis
+@item -mno-bit-align
+@itemx -mbit-align
+@opindex mno-bit-align
+@opindex mbit-align
+On System V.4 and embedded PowerPC systems, do not (do) force structures
+and unions that contain bit-fields to be aligned to the base type of the
+bit-field.
 
-@item @samp{faddd}, @samp{fsubd}, @samp{fdivd}, @samp{fmuld}
-Binary arithmetic operations.
+For example, by default a structure containing nothing but 8
+@code{unsigned} bit-fields of length 1 is aligned to a 4-byte
+boundary and has a size of 4 bytes.  By using @option{-mno-bit-align},
+the structure is aligned to a 1-byte boundary and is 1 byte in
+size.
 
-@item @samp{fnegd}
-Unary negation.
+@item -mno-strict-align
+@itemx -mstrict-align
+@opindex mno-strict-align
+@opindex mstrict-align
+On System V.4 and embedded PowerPC systems, do not (do) assume that
+unaligned memory references are handled by the system.
 
-@item @samp{fabsd}
-Unary absolute value.
+@item -mrelocatable
+@itemx -mno-relocatable
+@opindex mrelocatable
+@opindex mno-relocatable
+Generate code that allows (does not allow) a static executable to be
+relocated to a different address at run time.  A simple embedded
+PowerPC system loader should relocate the entire contents of
+@code{.got2} and 4-byte locations listed in the @code{.fixup} section,
+a table of 32-bit addresses generated by this option.  For this to
+work, all objects linked together must be compiled with
+@option{-mrelocatable} or @option{-mrelocatable-lib}.
+@option{-mrelocatable} code aligns the stack to an 8-byte boundary.
 
-@item @samp{fcmpeqd}, @samp{fcmpged}, @samp{fcmpgtd}, @samp{fcmpled}, @samp{fcmpltd}, @samp{fcmpned}
-Comparison operations.
+@item -mrelocatable-lib
+@itemx -mno-relocatable-lib
+@opindex mrelocatable-lib
+@opindex mno-relocatable-lib
+Like @option{-mrelocatable}, @option{-mrelocatable-lib} generates a
+@code{.fixup} section to allow static executables to be relocated at
+run time, but @option{-mrelocatable-lib} does not use the smaller stack
+alignment of @option{-mrelocatable}.  Objects compiled with
+@option{-mrelocatable-lib} may be linked with objects compiled with
+any combination of the @option{-mrelocatable} options.
 
-@item @samp{fmind}, @samp{fmaxd}
-Double-precision minimum and maximum.  These instructions are only
-generated if @option{-ffinite-math-only} is specified.
+@item -mno-toc
+@itemx -mtoc
+@opindex mno-toc
+@opindex mtoc
+On System V.4 and embedded PowerPC systems, do not (do) assume that
+register 2 contains a pointer to a global area pointing to the addresses
+used in the program.
 
-@item @samp{fsqrtd}
-Unary square root operation.
+@item -mlittle
+@itemx -mlittle-endian
+@opindex mlittle
+@opindex mlittle-endian
+On System V.4 and embedded PowerPC systems, compile code for the
+processor in little-endian mode.  The @option{-mlittle-endian} option is
+the same as @option{-mlittle}.
 
-@item @samp{fcosd}, @samp{fsind}, @samp{ftand}, @samp{fatand}, @samp{fexpd}, @samp{flogd}
-Double-precision trigonometric and exponential functions.  These instructions
-are only generated if @option{-funsafe-math-optimizations} is also specified.
+@item -mbig
+@itemx -mbig-endian
+@opindex mbig
+@opindex mbig-endian
+On System V.4 and embedded PowerPC systems compile code for the
+processor in big-endian mode.  The @option{-mbig-endian} option is
+the same as @option{-mbig}.
 
-@end table
+@item -mdynamic-no-pic
+@opindex mdynamic-no-pic
+On Darwin and Mac OS X systems, compile code so that it is not
+relocatable, but that its external references are relocatable.  The
+resulting code is suitable for applications, but not shared
+libraries.
 
-Conversions:
-@table @asis
-@item @samp{fextsd}
-Conversion from single precision to double precision.
+@item -msingle-pic-base
+@opindex msingle-pic-base
+Treat the register used for PIC addressing as read-only, rather than
+loading it in the prologue for each function.  The runtime system is
+responsible for initializing this register with an appropriate value
+before execution begins.
 
-@item @samp{ftruncds}
-Conversion from double precision to single precision.
+@item -mprioritize-restricted-insns=@var{priority}
+@opindex mprioritize-restricted-insns
+This option controls the priority that is assigned to
+dispatch-slot restricted instructions during the second scheduling
+pass.  The argument @var{priority} takes the value @samp{0}, @samp{1},
+or @samp{2} to assign no, highest, or second-highest (respectively) 
+priority to dispatch-slot restricted
+instructions.
 
-@item @samp{fixsi}, @samp{fixsu}, @samp{fixdi}, @samp{fixdu}
-Conversion from floating point to signed or unsigned integer types, with
-truncation towards zero.
+@item -msched-costly-dep=@var{dependence_type}
+@opindex msched-costly-dep
+This option controls which dependences are considered costly
+by the target during instruction scheduling.  The argument
+@var{dependence_type} takes one of the following values:
 
-@item @samp{round}
-Conversion from single-precision floating point to signed integer,
-rounding to the nearest integer and ties away from zero.
-This corresponds to the @code{__builtin_lroundf} function when
-@option{-fno-math-errno} is used.
+@table @asis
+@item @samp{no}
+No dependence is costly.
 
-@item @samp{floatis}, @samp{floatus}, @samp{floatid}, @samp{floatud}
-Conversion from signed or unsigned integer types to floating-point types.
+@item @samp{all}
+All dependences are costly.
+
+@item @samp{true_store_to_load}
+A true dependence from store to load is costly.
 
+@item @samp{store_to_load}
+Any dependence from store to load is costly.
+
+@item @var{number}
+Any dependence for which the latency is greater than or equal to 
+@var{number} is costly.
 @end table
 
-In addition, all of the following transfer instructions for internal
-registers X and Y must be provided to use any of the double-precision
-floating-point instructions.  Custom instructions taking two
-double-precision source operands expect the first operand in the
-64-bit register X.  The other operand (or only operand of a unary
-operation) is given to the custom arithmetic instruction with the
-least significant half in source register @var{src1} and the most
-significant half in @var{src2}.  A custom instruction that returns a
-double-precision result returns the most significant 32 bits in the
-destination register and the other half in 32-bit register Y.  
-GCC automatically generates the necessary code sequences to write
-register X and/or read register Y when double-precision floating-point
-instructions are used.
+@item -minsert-sched-nops=@var{scheme}
+@opindex minsert-sched-nops
+This option controls which NOP insertion scheme is used during
+the second scheduling pass.  The argument @var{scheme} takes one of the
+following values:
 
 @table @asis
+@item @samp{no}
+Don't insert NOPs.
 
-@item @samp{fwrx}
-Write @var{src1} into the least significant half of X and @var{src2} into
-the most significant half of X.
-
-@item @samp{fwry}
-Write @var{src1} into Y.
+@item @samp{pad}
+Pad with NOPs any dispatch group that has vacant issue slots,
+according to the scheduler's grouping.
 
-@item @samp{frdxhi}, @samp{frdxlo}
-Read the most or least (respectively) significant half of X and store it in
-@var{dest}.
+@item @samp{regroup_exact}
+Insert NOPs to force costly dependent insns into
+separate groups.  Insert exactly as many NOPs as needed to force an insn
+to a new group, according to the estimated processor grouping.
 
-@item @samp{frdy}
-Read the value of Y and store it into @var{dest}.
+@item @var{number}
+Insert NOPs to force costly dependent insns into
+separate groups.  Insert @var{number} NOPs to force an insn to a new group.
 @end table
 
-Note that you can gain more local control over generation of Nios II custom
-instructions by using the @code{target("custom-@var{insn}=@var{N}")}
-and @code{target("no-custom-@var{insn}")} function attributes
-(@pxref{Function Attributes})
-or pragmas (@pxref{Function Specific Option Pragmas}).
+@item -mcall-sysv
+@opindex mcall-sysv
+On System V.4 and embedded PowerPC systems compile code using calling
+conventions that adhere to the March 1995 draft of the System V
+Application Binary Interface, PowerPC processor supplement.  This is the
+default unless you configured GCC using @samp{powerpc-*-eabiaix}.
 
-@item -mcustom-fpu-cfg=@var{name}
-@opindex mcustom-fpu-cfg
+@item -mcall-sysv-eabi
+@itemx -mcall-eabi
+@opindex mcall-sysv-eabi
+@opindex mcall-eabi
+Specify both @option{-mcall-sysv} and @option{-meabi} options.
 
-This option enables a predefined, named set of custom instruction encodings
-(see @option{-mcustom-@var{insn}} above).  
-Currently, the following sets are defined:
+@item -mcall-sysv-noeabi
+@opindex mcall-sysv-noeabi
+Specify both @option{-mcall-sysv} and @option{-mno-eabi} options.
 
-@option{-mcustom-fpu-cfg=60-1} is equivalent to:
-@gccoptlist{-mcustom-fmuls=252 @gol
--mcustom-fadds=253 @gol
--mcustom-fsubs=254 @gol
--fsingle-precision-constant}
+@item -mcall-aixdesc
+@opindex mcall-aixdesc
+On System V.4 and embedded PowerPC systems compile code for the AIX
+operating system.
 
-@option{-mcustom-fpu-cfg=60-2} is equivalent to:
-@gccoptlist{-mcustom-fmuls=252 @gol
--mcustom-fadds=253 @gol
--mcustom-fsubs=254 @gol
--mcustom-fdivs=255 @gol
--fsingle-precision-constant}
+@item -mcall-linux
+@opindex mcall-linux
+On System V.4 and embedded PowerPC systems compile code for the
+Linux-based GNU system.
 
-@option{-mcustom-fpu-cfg=72-3} is equivalent to:
-@gccoptlist{-mcustom-floatus=243 @gol
--mcustom-fixsi=244 @gol
--mcustom-floatis=245 @gol
--mcustom-fcmpgts=246 @gol
--mcustom-fcmples=249 @gol
--mcustom-fcmpeqs=250 @gol
--mcustom-fcmpnes=251 @gol
--mcustom-fmuls=252 @gol
--mcustom-fadds=253 @gol
--mcustom-fsubs=254 @gol
--mcustom-fdivs=255 @gol
--fsingle-precision-constant}
+@item -mcall-freebsd
+@opindex mcall-freebsd
+On System V.4 and embedded PowerPC systems compile code for the
+FreeBSD operating system.
 
-Custom instruction assignments given by individual
-@option{-mcustom-@var{insn}=} options override those given by
-@option{-mcustom-fpu-cfg=}, regardless of the
-order of the options on the command line.
-
-Note that you can gain more local control over selection of a FPU
-configuration by using the @code{target("custom-fpu-cfg=@var{name}")}
-function attribute (@pxref{Function Attributes})
-or pragma (@pxref{Function Specific Option Pragmas}).
+@item -mcall-netbsd
+@opindex mcall-netbsd
+On System V.4 and embedded PowerPC systems compile code for the
+NetBSD operating system.
 
-@end table
+@item -mcall-openbsd
+@opindex mcall-openbsd
+On System V.4 and embedded PowerPC systems compile code for the
+OpenBSD operating system.
 
-These additional @samp{-m} options are available for the Altera Nios II
-ELF (bare-metal) target:
+@item -maix-struct-return
+@opindex maix-struct-return
+Return all structures in memory (as specified by the AIX ABI)@.
 
-@table @gcctabopt
+@item -msvr4-struct-return
+@opindex msvr4-struct-return
+Return structures smaller than 8 bytes in registers (as specified by the
+SVR4 ABI)@.
 
-@item -mhal
-@opindex mhal
-Link with HAL BSP.  This suppresses linking with the GCC-provided C runtime
-startup and termination code, and is typically used in conjunction with
-@option{-msys-crt0=} to specify the location of the alternate startup code
-provided by the HAL BSP.
+@item -mabi=@var{abi-type}
+@opindex mabi
+Extend the current ABI with a particular extension, or remove such extension.
+Valid values are @samp{altivec}, @samp{no-altivec}, @samp{spe},
+@samp{no-spe}, @samp{ibmlongdouble}, @samp{ieeelongdouble},
+@samp{elfv1}, @samp{elfv2}@.
 
-@item -msmallc
-@opindex msmallc
-Link with a limited version of the C library, @option{-lsmallc}, rather than
-Newlib.
+@item -mabi=spe
+@opindex mabi=spe
+Extend the current ABI with SPE ABI extensions.  This does not change
+the default ABI; instead, it adds the SPE ABI extensions to the current
+ABI@.
 
-@item -msys-crt0=@var{startfile}
-@opindex msys-crt0
-@var{startfile} is the file name of the startfile (crt0) to use 
-when linking.  This option is only useful in conjunction with @option{-mhal}.
+@item -mabi=no-spe
+@opindex mabi=no-spe
+Disable Book-E SPE ABI extensions for the current ABI@.
 
-@item -msys-lib=@var{systemlib}
-@opindex msys-lib
-@var{systemlib} is the library name of the library that provides
-low-level system calls required by the C library,
-e.g. @code{read} and @code{write}.
-This option is typically used to link with a library provided by a HAL BSP.
+@item -mabi=ibmlongdouble
+@opindex mabi=ibmlongdouble
+Change the current ABI to use IBM extended-precision long double.
+This is a PowerPC 32-bit SYSV ABI option.
 
-@end table
+@item -mabi=ieeelongdouble
+@opindex mabi=ieeelongdouble
+Change the current ABI to use IEEE extended-precision long double.
+This is a PowerPC 32-bit Linux ABI option.
 
-@node PDP-11 Options
-@subsection PDP-11 Options
-@cindex PDP-11 Options
+@item -mabi=elfv1
+@opindex mabi=elfv1
+Change the current ABI to use the ELFv1 ABI.
+This is the default ABI for big-endian PowerPC 64-bit Linux.
+Overriding the default ABI requires special system support and is
+likely to fail in spectacular ways.
 
-These options are defined for the PDP-11:
+@item -mabi=elfv2
+@opindex mabi=elfv2
+Change the current ABI to use the ELFv2 ABI.
+This is the default ABI for little-endian PowerPC 64-bit Linux.
+Overriding the default ABI requires special system support and is
+likely to fail in spectacular ways.
 
-@table @gcctabopt
-@item -mfpu
-@opindex mfpu
-Use hardware FPP floating point.  This is the default.  (FIS floating
-point on the PDP-11/40 is not supported.)
+@item -mprototype
+@itemx -mno-prototype
+@opindex mprototype
+@opindex mno-prototype
+On System V.4 and embedded PowerPC systems assume that all calls to
+variable argument functions are properly prototyped.  Otherwise, the
+compiler must insert an instruction before every non-prototyped call to
+set or clear bit 6 of the condition code register (@code{CR}) to
+indicate whether floating-point values are passed in the floating-point
+registers in case the function takes variable arguments.  With
+@option{-mprototype}, only calls to prototyped variable argument functions
+set or clear the bit.
 
-@item -msoft-float
-@opindex msoft-float
-Do not use hardware floating point.
+@item -msim
+@opindex msim
+On embedded PowerPC systems, assume that the startup module is called
+@file{sim-crt0.o} and that the standard C libraries are @file{libsim.a} and
+@file{libc.a}.  This is the default for @samp{powerpc-*-eabisim}
+configurations.
 
-@item -mac0
-@opindex mac0
-Return floating-point results in ac0 (fr0 in Unix assembler syntax).
+@item -mmvme
+@opindex mmvme
+On embedded PowerPC systems, assume that the startup module is called
+@file{crt0.o} and the standard C libraries are @file{libmvme.a} and
+@file{libc.a}.
 
-@item -mno-ac0
-@opindex mno-ac0
-Return floating-point results in memory.  This is the default.
+@item -mads
+@opindex mads
+On embedded PowerPC systems, assume that the startup module is called
+@file{crt0.o} and the standard C libraries are @file{libads.a} and
+@file{libc.a}.
 
-@item -m40
-@opindex m40
-Generate code for a PDP-11/40.
+@item -myellowknife
+@opindex myellowknife
+On embedded PowerPC systems, assume that the startup module is called
+@file{crt0.o} and the standard C libraries are @file{libyk.a} and
+@file{libc.a}.
 
-@item -m45
-@opindex m45
-Generate code for a PDP-11/45.  This is the default.
+@item -mvxworks
+@opindex mvxworks
+On System V.4 and embedded PowerPC systems, specify that you are
+compiling for a VxWorks system.
 
-@item -m10
-@opindex m10
-Generate code for a PDP-11/10.
+@item -memb
+@opindex memb
+On embedded PowerPC systems, set the @code{PPC_EMB} bit in the ELF flags
+header to indicate that @samp{eabi} extended relocations are used.
 
-@item -mbcopy-builtin
-@opindex mbcopy-builtin
-Use inline @code{movmemhi} patterns for copying memory.  This is the
-default.
+@item -meabi
+@itemx -mno-eabi
+@opindex meabi
+@opindex mno-eabi
+On System V.4 and embedded PowerPC systems do (do not) adhere to the
+Embedded Applications Binary Interface (EABI), which is a set of
+modifications to the System V.4 specifications.  Selecting @option{-meabi}
+means that the stack is aligned to an 8-byte boundary, a function
+@code{__eabi} is called from @code{main} to set up the EABI
+environment, and the @option{-msdata} option can use both @code{r2} and
+@code{r13} to point to two separate small data areas.  Selecting
+@option{-mno-eabi} means that the stack is aligned to a 16-byte boundary,
+no EABI initialization function is called from @code{main}, and the
+@option{-msdata} option only uses @code{r13} to point to a single
+small data area.  The @option{-meabi} option is on by default if you
+configured GCC using one of the @samp{powerpc*-*-eabi*} options.
 
-@item -mbcopy
-@opindex mbcopy
-Do not use inline @code{movmemhi} patterns for copying memory.
+@item -msdata=eabi
+@opindex msdata=eabi
+On System V.4 and embedded PowerPC systems, put small initialized
+@code{const} global and static data in the @code{.sdata2} section, which
+is pointed to by register @code{r2}.  Put small initialized
+non-@code{const} global and static data in the @code{.sdata} section,
+which is pointed to by register @code{r13}.  Put small uninitialized
+global and static data in the @code{.sbss} section, which is adjacent to
+the @code{.sdata} section.  The @option{-msdata=eabi} option is
+incompatible with the @option{-mrelocatable} option.  The
+@option{-msdata=eabi} option also sets the @option{-memb} option.
 
-@item -mint16
-@itemx -mno-int32
-@opindex mint16
-@opindex mno-int32
-Use 16-bit @code{int}.  This is the default.
+@item -msdata=sysv
+@opindex msdata=sysv
+On System V.4 and embedded PowerPC systems, put small global and static
+data in the @code{.sdata} section, which is pointed to by register
+@code{r13}.  Put small uninitialized global and static data in the
+@code{.sbss} section, which is adjacent to the @code{.sdata} section.
+The @option{-msdata=sysv} option is incompatible with the
+@option{-mrelocatable} option.
 
-@item -mint32
-@itemx -mno-int16
-@opindex mint32
-@opindex mno-int16
-Use 32-bit @code{int}.
+@item -msdata=default
+@itemx -msdata
+@opindex msdata=default
+@opindex msdata
+On System V.4 and embedded PowerPC systems, if @option{-meabi} is used,
+compile code the same as @option{-msdata=eabi}, otherwise compile code the
+same as @option{-msdata=sysv}.
 
-@item -mfloat64
-@itemx -mno-float32
-@opindex mfloat64
-@opindex mno-float32
-Use 64-bit @code{float}.  This is the default.
+@item -msdata=data
+@opindex msdata=data
+On System V.4 and embedded PowerPC systems, put small global
+data in the @code{.sdata} section.  Put small uninitialized global
+data in the @code{.sbss} section.  Do not use register @code{r13}
+to address small data, however.  This is the default behavior unless
+other @option{-msdata} options are used.
 
-@item -mfloat32
-@itemx -mno-float64
-@opindex mfloat32
-@opindex mno-float64
-Use 32-bit @code{float}.
+@item -msdata=none
+@itemx -mno-sdata
+@opindex msdata=none
+@opindex mno-sdata
+On embedded PowerPC systems, put all initialized global and static data
+in the @code{.data} section, and all uninitialized data in the
+@code{.bss} section.
 
-@item -mabshi
-@opindex mabshi
-Use @code{abshi2} pattern.  This is the default.
+@item -mblock-move-inline-limit=@var{num}
+@opindex mblock-move-inline-limit
+Inline all block moves (such as calls to @code{memcpy} or structure
+copies) less than or equal to @var{num} bytes.  The minimum value for
+@var{num} is 32 bytes on 32-bit targets and 64 bytes on 64-bit
+targets.  The default value is target-specific.
 
-@item -mno-abshi
-@opindex mno-abshi
-Do not use @code{abshi2} pattern.
+@item -G @var{num}
+@opindex G
+@cindex smaller data references (PowerPC)
+@cindex .sdata/.sdata2 references (PowerPC)
+On embedded PowerPC systems, put global and static items less than or
+equal to @var{num} bytes into the small data or BSS sections instead of
+the normal data or BSS section.  By default, @var{num} is 8.  The
+@option{-G @var{num}} switch is also passed to the linker.
+All modules should be compiled with the same @option{-G @var{num}} value.
 
-@item -mbranch-expensive
-@opindex mbranch-expensive
-Pretend that branches are expensive.  This is for experimenting with
-code generation only.
-
-@item -mbranch-cheap
-@opindex mbranch-cheap
-Do not pretend that branches are expensive.  This is the default.
+@item -mregnames
+@itemx -mno-regnames
+@opindex mregnames
+@opindex mno-regnames
+On System V.4 and embedded PowerPC systems do (do not) emit register
+names in the assembly language output using symbolic forms.
 
-@item -munix-asm
-@opindex munix-asm
-Use Unix assembler syntax.  This is the default when configured for
-@samp{pdp11-*-bsd}.
+@item -mlongcall
+@itemx -mno-longcall
+@opindex mlongcall
+@opindex mno-longcall
+By default assume that all calls are far away so that a longer and more
+expensive calling sequence is required.  This is required for calls
+farther than 32 megabytes (33,554,432 bytes) from the current location.
+A short call is generated if the compiler knows
+the call cannot be that far away.  This setting can be overridden by
+the @code{shortcall} function attribute, or by @code{#pragma
+longcall(0)}.
 
-@item -mdec-asm
-@opindex mdec-asm
-Use DEC assembler syntax.  This is the default when configured for any
-PDP-11 target other than @samp{pdp11-*-bsd}.
-@end table
+Some linkers are capable of detecting out-of-range calls and generating
+glue code on the fly.  On these systems, long calls are unnecessary and
+generate slower code.  As of this writing, the AIX linker can do this,
+as can the GNU linker for PowerPC/64.  It is planned to add this feature
+to the GNU linker for 32-bit PowerPC systems as well.
 
-@node picoChip Options
-@subsection picoChip Options
-@cindex picoChip options
+On Darwin/PPC systems, @code{#pragma longcall} generates @code{jbsr
+callee, L42}, plus a @dfn{branch island} (glue code).  The two target
+addresses represent the callee and the branch island.  The
+Darwin/PPC linker prefers the first address and generates a @code{bl
+callee} if the PPC @code{bl} instruction reaches the callee directly;
+otherwise, the linker generates @code{bl L42} to call the branch
+island.  The branch island is appended to the body of the
+calling function; it computes the full 32-bit address of the callee
+and jumps to it.
 
-These @samp{-m} options are defined for picoChip implementations:
+On Mach-O (Darwin) systems, this option directs the compiler to emit
+the glue for every direct call, and the Darwin linker decides whether
+to use or discard it.
 
-@table @gcctabopt
+In the future, GCC may ignore all longcall specifications
+when the linker is known to generate glue.
 
-@item -mae=@var{ae_type}
-@opindex mcpu
-Set the instruction set, register set, and instruction scheduling
-parameters for array element type @var{ae_type}.  Supported values
-for @var{ae_type} are @samp{ANY}, @samp{MUL}, and @samp{MAC}.
+@item -mtls-markers
+@itemx -mno-tls-markers
+@opindex mtls-markers
+@opindex mno-tls-markers
+Mark (do not mark) calls to @code{__tls_get_addr} with a relocation
+specifying the function argument.  The relocation allows the linker to
+reliably associate a function call with its argument setup instructions for
+TLS optimization, which in turn allows GCC to better schedule the
+sequence.
 
-@option{-mae=ANY} selects a completely generic AE type.  Code
-generated with this option runs on any of the other AE types.  The
-code is not as efficient as it would be if compiled for a specific
-AE type, and some types of operation (e.g., multiplication) do not
-work properly on all types of AE.
+@item -pthread
+@opindex pthread
+Adds support for multithreading with the @dfn{pthreads} library.
+This option sets flags for both the preprocessor and linker.
 
-@option{-mae=MUL} selects a MUL AE type.  This is the most useful AE type
-for compiled code, and is the default.
+@item -mrecip
+@itemx -mno-recip
+@opindex mrecip
+This option enables use of the reciprocal estimate and
+reciprocal square root estimate instructions with additional
+Newton-Raphson steps to increase precision instead of doing a divide or
+square root and divide for floating-point arguments.  You should use
+the @option{-ffast-math} option when using @option{-mrecip} (or at
+least @option{-funsafe-math-optimizations},
+@option{-ffinite-math-only}, @option{-freciprocal-math} and
+@option{-fno-trapping-math}).  Note that while the throughput of the
+sequence is generally higher than the throughput of the non-reciprocal
+instruction, the precision of the sequence can be decreased by up to 2
+ulp (i.e.@: the inverse of 1.0 equals 0.99999994) for reciprocal square
+roots.
 
-@option{-mae=MAC} selects a DSP-style MAC AE.  Code compiled with this
-option may suffer from poor performance of byte (char) manipulation,
-since the DSP AE does not provide hardware support for byte load/stores.
+@item -mrecip=@var{opt}
+@opindex mrecip=opt
+This option controls which reciprocal estimate instructions
+may be used.  @var{opt} is a comma-separated list of options, which may
+be preceded by a @code{!} to invert the option:
 
-@item -msymbol-as-address
-Enable the compiler to directly use a symbol name as an address in a
-load/store instruction, without first loading it into a
-register.  Typically, the use of this option generates larger
-programs, which run faster than when the option isn't used.  However, the
-results vary from program to program, so it is left as a user option,
-rather than being permanently enabled.
+@table @samp
 
-@item -mno-inefficient-warnings
-Disables warnings about the generation of inefficient code.  These
-warnings can be generated, for example, when compiling code that
-performs byte-level memory operations on the MAC AE type.  The MAC AE has
-no hardware support for byte-level memory operations, so all byte
-load/stores must be synthesized from word load/store operations.  This is
-inefficient and a warning is generated to indicate
-that you should rewrite the code to avoid byte operations, or to target
-an AE type that has the necessary hardware support.  This option disables
-these warnings.
+@item all
+Enable all estimate instructions.
 
-@end table
+@item default 
+Enable the default instructions, equivalent to @option{-mrecip}.
 
-@node PowerPC Options
-@subsection PowerPC Options
-@cindex PowerPC options
+@item none 
+Disable all estimate instructions, equivalent to @option{-mno-recip}.
 
-These are listed under @xref{RS/6000 and PowerPC Options}.
+@item div 
+Enable the reciprocal approximation instructions for both 
+single and double precision.
 
-@node RL78 Options
-@subsection RL78 Options
-@cindex RL78 Options
+@item divf 
+Enable the single-precision reciprocal approximation instructions.
 
-@table @gcctabopt
+@item divd 
+Enable the double-precision reciprocal approximation instructions.
 
-@item -msim
-@opindex msim
-Links in additional target libraries to support operation within a
-simulator.
+@item rsqrt 
+Enable the reciprocal square root approximation instructions for both
+single and double precision.
 
-@item -mmul=none
-@itemx -mmul=g13
-@itemx -mmul=rl78
-@opindex mmul
-Specifies the type of hardware multiplication support to be used.  The
-default is @samp{none}, which uses software multiplication functions.
-The @samp{g13} option is for the hardware multiply/divide peripheral
-only on the RL78/G13 targets.  The @samp{rl78} option is for the
-standard hardware multiplication defined in the RL78 software manual.
+@item rsqrtf 
+Enable the single-precision reciprocal square root approximation instructions.
 
-@item -m64bit-doubles
-@itemx -m32bit-doubles
-@opindex m64bit-doubles
-@opindex m32bit-doubles
-Make the @code{double} data type be 64 bits (@option{-m64bit-doubles})
-or 32 bits (@option{-m32bit-doubles}) in size.  The default is
-@option{-m32bit-doubles}.
+@item rsqrtd 
+Enable the double-precision reciprocal square root approximation instructions.
 
 @end table
 
-@node RS/6000 and PowerPC Options
-@subsection IBM RS/6000 and PowerPC Options
-@cindex RS/6000 and PowerPC Options
-@cindex IBM RS/6000 and PowerPC Options
-
-These @samp{-m} options are defined for the IBM RS/6000 and PowerPC:
-@table @gcctabopt
-@item -mpowerpc-gpopt
-@itemx -mno-powerpc-gpopt
-@itemx -mpowerpc-gfxopt
-@itemx -mno-powerpc-gfxopt
-@need 800
-@itemx -mpowerpc64
-@itemx -mno-powerpc64
-@itemx -mmfcrf
-@itemx -mno-mfcrf
-@itemx -mpopcntb
-@itemx -mno-popcntb
-@itemx -mpopcntd
-@itemx -mno-popcntd
-@itemx -mfprnd
-@itemx -mno-fprnd
-@need 800
-@itemx -mcmpb
-@itemx -mno-cmpb
-@itemx -mmfpgpr
-@itemx -mno-mfpgpr
-@itemx -mhard-dfp
-@itemx -mno-hard-dfp
-@opindex mpowerpc-gpopt
-@opindex mno-powerpc-gpopt
-@opindex mpowerpc-gfxopt
-@opindex mno-powerpc-gfxopt
-@opindex mpowerpc64
-@opindex mno-powerpc64
-@opindex mmfcrf
-@opindex mno-mfcrf
-@opindex mpopcntb
-@opindex mno-popcntb
-@opindex mpopcntd
-@opindex mno-popcntd
-@opindex mfprnd
-@opindex mno-fprnd
-@opindex mcmpb
-@opindex mno-cmpb
-@opindex mmfpgpr
-@opindex mno-mfpgpr
-@opindex mhard-dfp
-@opindex mno-hard-dfp
-You use these options to specify which instructions are available on the
-processor you are using.  The default value of these options is
-determined when configuring GCC@.  Specifying the
-@option{-mcpu=@var{cpu_type}} overrides the specification of these
-options.  We recommend you use the @option{-mcpu=@var{cpu_type}} option
-rather than the options listed above.
+So, for example, @option{-mrecip=all,!rsqrtd} enables
+all of the reciprocal estimate instructions, except for the
+@code{FRSQRTE}, @code{XSRSQRTEDP}, and @code{XVRSQRTEDP} instructions
+which handle the double-precision reciprocal square root calculations.
 
-Specifying @option{-mpowerpc-gpopt} allows
-GCC to use the optional PowerPC architecture instructions in the
-General Purpose group, including floating-point square root.  Specifying
-@option{-mpowerpc-gfxopt} allows GCC to
-use the optional PowerPC architecture instructions in the Graphics
-group, including floating-point select.
+@item -mrecip-precision
+@itemx -mno-recip-precision
+@opindex mrecip-precision
+Assume (do not assume) that the reciprocal estimate instructions
+provide higher-precision estimates than is mandated by the PowerPC
+ABI.  Selecting @option{-mcpu=power6}, @option{-mcpu=power7} or
+@option{-mcpu=power8} automatically selects @option{-mrecip-precision}.
+The double-precision square root estimate instructions are not generated by
+default on low-precision machines, since they do not provide an
+estimate that converges after three steps.
 
-The @option{-mmfcrf} option allows GCC to generate the move from
-condition register field instruction implemented on the POWER4
-processor and other processors that support the PowerPC V2.01
-architecture.
-The @option{-mpopcntb} option allows GCC to generate the popcount and
-double-precision FP reciprocal estimate instruction implemented on the
-POWER5 processor and other processors that support the PowerPC V2.02
-architecture.
-The @option{-mpopcntd} option allows GCC to generate the popcount
-instruction implemented on the POWER7 processor and other processors
-that support the PowerPC V2.06 architecture.
-The @option{-mfprnd} option allows GCC to generate the FP round to
-integer instructions implemented on the POWER5+ processor and other
-processors that support the PowerPC V2.03 architecture.
-The @option{-mcmpb} option allows GCC to generate the compare bytes
-instruction implemented on the POWER6 processor and other processors
-that support the PowerPC V2.05 architecture.
-The @option{-mmfpgpr} option allows GCC to generate the FP move to/from
-general-purpose register instructions implemented on the POWER6X
-processor and other processors that support the extended PowerPC V2.05
-architecture.
-The @option{-mhard-dfp} option allows GCC to generate the decimal
-floating-point instructions implemented on some POWER processors.
-
-The @option{-mpowerpc64} option allows GCC to generate the additional
-64-bit instructions that are found in the full PowerPC64 architecture
-and to treat GPRs as 64-bit, doubleword quantities.  GCC defaults to
-@option{-mno-powerpc64}.
+@item -mveclibabi=@var{type}
+@opindex mveclibabi
+Specifies the ABI type to use for vectorizing intrinsics using an
+external library.  The only type supported at present is @samp{mass},
+which selects IBM's Mathematical Acceleration Subsystem (MASS)
+libraries.
+GCC currently emits calls to @code{acosd2}, @code{acosf4},
+@code{acoshd2}, @code{acoshf4}, @code{asind2}, @code{asinf4},
+@code{asinhd2}, @code{asinhf4}, @code{atan2d2}, @code{atan2f4},
+@code{atand2}, @code{atanf4}, @code{atanhd2}, @code{atanhf4},
+@code{cbrtd2}, @code{cbrtf4}, @code{cosd2}, @code{cosf4},
+@code{coshd2}, @code{coshf4}, @code{erfcd2}, @code{erfcf4},
+@code{erfd2}, @code{erff4}, @code{exp2d2}, @code{exp2f4},
+@code{expd2}, @code{expf4}, @code{expm1d2}, @code{expm1f4},
+@code{hypotd2}, @code{hypotf4}, @code{lgammad2}, @code{lgammaf4},
+@code{log10d2}, @code{log10f4}, @code{log1pd2}, @code{log1pf4},
+@code{log2d2}, @code{log2f4}, @code{logd2}, @code{logf4},
+@code{powd2}, @code{powf4}, @code{sind2}, @code{sinf4}, @code{sinhd2},
+@code{sinhf4}, @code{sqrtd2}, @code{sqrtf4}, @code{tand2},
+@code{tanf4}, @code{tanhd2}, and @code{tanhf4} when generating code
+for power7.  Both @option{-ftree-vectorize} and
+@option{-funsafe-math-optimizations} must also be enabled.  The MASS
+libraries must be specified at link time.
 
-@item -mcpu=@var{cpu_type}
-@opindex mcpu
-Set architecture type, register usage, and
-instruction scheduling parameters for machine type @var{cpu_type}.
-Supported values for @var{cpu_type} are @samp{401}, @samp{403},
-@samp{405}, @samp{405fp}, @samp{440}, @samp{440fp}, @samp{464}, @samp{464fp},
-@samp{476}, @samp{476fp}, @samp{505}, @samp{601}, @samp{602}, @samp{603},
-@samp{603e}, @samp{604}, @samp{604e}, @samp{620}, @samp{630}, @samp{740},
-@samp{7400}, @samp{7450}, @samp{750}, @samp{801}, @samp{821}, @samp{823},
-@samp{860}, @samp{970}, @samp{8540}, @samp{a2}, @samp{e300c2},
-@samp{e300c3}, @samp{e500mc}, @samp{e500mc64}, @samp{e5500},
-@samp{e6500}, @samp{ec603e}, @samp{G3}, @samp{G4}, @samp{G5},
-@samp{titan}, @samp{power3}, @samp{power4}, @samp{power5}, @samp{power5+},
-@samp{power6}, @samp{power6x}, @samp{power7}, @samp{power8}, @samp{powerpc},
-@samp{powerpc64}, and @samp{rs64}.
+@item -mfriz
+@itemx -mno-friz
+@opindex mfriz
+Generate (do not generate) the @code{friz} instruction when the
+@option{-funsafe-math-optimizations} option is used to optimize
+rounding of floating-point values to 64-bit integer and back to floating
+point.  The @code{friz} instruction does not return the same value if
+the floating-point number is too large to fit in an integer.
 
-@option{-mcpu=powerpc}, and @option{-mcpu=powerpc64} specify pure 32-bit
-PowerPC and 64-bit PowerPC architecture machine
-types, with an appropriate, generic processor model assumed for
-scheduling purposes.
+@item -mpointers-to-nested-functions
+@itemx -mno-pointers-to-nested-functions
+@opindex mpointers-to-nested-functions
+Generate (do not generate) code to load up the static chain register
+(@code{r11}) when calling through a pointer on AIX and 64-bit Linux
+systems where a function pointer points to a 3-word descriptor giving
+the function address, TOC value to be loaded in register @code{r2}, and
+static chain value to be loaded in register @code{r11}.  The
+@option{-mpointers-to-nested-functions} option is on by default.  You
+cannot call through pointers to nested functions or pointers
+to functions compiled in other languages that use the static chain if
+you use the @option{-mno-pointers-to-nested-functions} option.
 
-The other options specify a specific processor.  Code generated under
-those options runs best on that processor, and may not run at all on
-others.
+@item -msave-toc-indirect
+@itemx -mno-save-toc-indirect
+@opindex msave-toc-indirect
+Generate (do not generate) code to save the TOC value in the reserved
+stack location in the function prologue if the function calls through
+a pointer on AIX and 64-bit Linux systems.  If the TOC value is not
+saved in the prologue, it is saved just before the call through the
+pointer.  The @option{-mno-save-toc-indirect} option is the default.
 
-The @option{-mcpu} options automatically enable or disable the
-following options:
+@item -mcompat-align-parm
+@itemx -mno-compat-align-parm
+@opindex mcompat-align-parm
+Generate (do not generate) code to pass structure parameters with a
+maximum alignment of 64 bits, for compatibility with older versions
+of GCC.
 
-@gccoptlist{-maltivec  -mfprnd  -mhard-float  -mmfcrf  -mmultiple @gol
--mpopcntb -mpopcntd  -mpowerpc64 @gol
--mpowerpc-gpopt  -mpowerpc-gfxopt  -msingle-float -mdouble-float @gol
--msimple-fpu -mstring  -mmulhw  -mdlmzb  -mmfpgpr -mvsx @gol
--mcrypto -mdirect-move -mpower8-fusion -mpower8-vector @gol
--mquad-memory -mquad-memory-atomic}
+Older versions of GCC (prior to 4.9.0) incorrectly did not align a
+structure parameter on a 128-bit boundary when that structure contained
+a member requiring 128-bit alignment.  This is corrected in more
+recent versions of GCC.  This option may be used to generate code
+that is compatible with functions compiled with older versions of
+GCC.
 
-The particular options set for any particular CPU varies between
-compiler versions, depending on what setting seems to produce optimal
-code for that CPU; it doesn't necessarily reflect the actual hardware's
-capabilities.  If you wish to set an individual option to a particular
-value, you may specify it after the @option{-mcpu} option, like
-@option{-mcpu=970 -mno-altivec}.
+The @option{-mno-compat-align-parm} option is the default.
+@end table
 
-On AIX, the @option{-maltivec} and @option{-mpowerpc64} options are
-not enabled or disabled by the @option{-mcpu} option at present because
-AIX does not have full support for these options.  You may still
-enable or disable them individually if you're sure it'll work in your
-environment.
+@node RX Options
+@subsection RX Options
+@cindex RX Options
 
-@item -mtune=@var{cpu_type}
-@opindex mtune
-Set the instruction scheduling parameters for machine type
-@var{cpu_type}, but do not set the architecture type or register usage,
-as @option{-mcpu=@var{cpu_type}} does.  The same
-values for @var{cpu_type} are used for @option{-mtune} as for
-@option{-mcpu}.  If both are specified, the code generated uses the
-architecture and registers set by @option{-mcpu}, but the
-scheduling parameters set by @option{-mtune}.
+These command-line options are defined for RX targets:
 
-@item -mcmodel=small
-@opindex mcmodel=small
-Generate PowerPC64 code for the small model: The TOC is limited to
-64k.
+@table @gcctabopt
+@item -m64bit-doubles
+@itemx -m32bit-doubles
+@opindex m64bit-doubles
+@opindex m32bit-doubles
+Make the @code{double} data type be 64 bits (@option{-m64bit-doubles})
+or 32 bits (@option{-m32bit-doubles}) in size.  The default is
+@option{-m32bit-doubles}.  @emph{Note:} The RX floating-point hardware
+works only on 32-bit values, which is why the default is
+@option{-m32bit-doubles}.
 
-@item -mcmodel=medium
-@opindex mcmodel=medium
-Generate PowerPC64 code for the medium model: The TOC and other static
-data may be up to a total of 4G in size.
+@item -fpu
+@itemx -nofpu
+@opindex fpu
+@opindex nofpu
+Enables (@option{-fpu}) or disables (@option{-nofpu}) the use of RX
+floating-point hardware.  The default is enabled for the RX600
+series and disabled for the RX200 series.
 
-@item -mcmodel=large
-@opindex mcmodel=large
-Generate PowerPC64 code for the large model: The TOC may be up to 4G
-in size.  Other data and code is only limited by the 64-bit address
-space.
+Floating-point instructions are generated only for 32-bit floating-point
+values, however, so the FPU hardware is not used for doubles if the
+@option{-m64bit-doubles} option is used.
 
-@item -maltivec
-@itemx -mno-altivec
-@opindex maltivec
-@opindex mno-altivec
-Generate code that uses (does not use) AltiVec instructions, and also
-enable the use of built-in functions that allow more direct access to
-the AltiVec instruction set.  You may also need to set
-@option{-mabi=altivec} to adjust the current ABI with AltiVec ABI
-enhancements.
+@emph{Note:} If the @option{-fpu} option is enabled then
+@option{-funsafe-math-optimizations} is also enabled automatically.
+This is because the RX FPU instructions are themselves unsafe.
 
-When @option{-maltivec} is used, rather than @option{-maltivec=le} or
-@option{-maltivec=be}, the element order for Altivec intrinsics such
-as @code{vec_splat}, @code{vec_extract}, and @code{vec_insert} 
-match array element order corresponding to the endianness of the
-target.  That is, element zero identifies the leftmost element in a
-vector register when targeting a big-endian platform, and identifies
-the rightmost element in a vector register when targeting a
-little-endian platform.
+@item -mcpu=@var{name}
+@opindex mcpu
+Selects the type of RX CPU to be targeted.  Currently three types are
+supported: the generic @samp{RX600} and @samp{RX200} series hardware and
+the specific @samp{RX610} CPU.  The default is @samp{RX600}.
 
-@item -maltivec=be
-@opindex maltivec=be
-Generate Altivec instructions using big-endian element order,
-regardless of whether the target is big- or little-endian.  This is
-the default when targeting a big-endian platform.
+The only difference between @samp{RX600} and @samp{RX610} is that the
+@samp{RX610} does not support the @code{MVTIPL} instruction.
 
-The element order is used to interpret element numbers in Altivec
-intrinsics such as @code{vec_splat}, @code{vec_extract}, and
-@code{vec_insert}.  By default, these match array element order
-corresponding to the endianness for the target.
+The @samp{RX200} series does not have a hardware floating-point unit
+and so @option{-nofpu} is enabled by default when this type is
+selected.
 
-@item -maltivec=le
-@opindex maltivec=le
-Generate Altivec instructions using little-endian element order,
-regardless of whether the target is big- or little-endian.  This is
-the default when targeting a little-endian platform.  This option is
-currently ignored when targeting a big-endian platform.
+@item -mbig-endian-data
+@itemx -mlittle-endian-data
+@opindex mbig-endian-data
+@opindex mlittle-endian-data
+Store data (but not code) in the big-endian format.  The default is
+@option{-mlittle-endian-data}, i.e.@: to store data in the little-endian
+format.
 
-The element order is used to interpret element numbers in Altivec
-intrinsics such as @code{vec_splat}, @code{vec_extract}, and
-@code{vec_insert}.  By default, these match array element order
-corresponding to the endianness for the target.
+@item -msmall-data-limit=@var{N}
+@opindex msmall-data-limit
+Specifies the maximum size in bytes of global and static variables
+which can be placed into the small data area.  Using the small data
+area can lead to smaller and faster code, but the size of the area is
+limited and it is up to the programmer to ensure that the area does
+not overflow.  Also, when the small data area is used, one of the RX's
+registers (usually @code{r13}) is reserved for use pointing to this
+area, so it is no longer available for use by the compiler.  This
+could result in slower and/or larger code if variables are pushed onto
+the stack instead of being held in this register.
 
-@item -mvrsave
-@itemx -mno-vrsave
-@opindex mvrsave
-@opindex mno-vrsave
-Generate VRSAVE instructions when generating AltiVec code.
+Note, common variables (variables that have not been initialized) and
+constants are not placed into the small data area as they are assigned
+to other sections in the output executable.
 
-@item -mgen-cell-microcode
-@opindex mgen-cell-microcode
-Generate Cell microcode instructions.
+The default value is zero, which disables this feature.  Note, this
+feature is not enabled by default with higher optimization levels
+(@option{-O2} etc.@:) because of the potentially detrimental effects of
+reserving a register.  It is up to the programmer to experiment and
+discover whether this feature is of benefit to their program.  See the
+description of the @option{-mpid} option for a description of how the
+actual register to hold the small data area pointer is chosen.
 
-@item -mwarn-cell-microcode
-@opindex mwarn-cell-microcode
-Warn when a Cell microcode instruction is emitted.  An example
-of a Cell microcode instruction is a variable shift.
-
-@item -msecure-plt
-@opindex msecure-plt
-Generate code that allows @command{ld} and @command{ld.so}
-to build executables and shared
-libraries with non-executable @code{.plt} and @code{.got} sections.
-This is a PowerPC
-32-bit SYSV ABI option.
-
-@item -mbss-plt
-@opindex mbss-plt
-Generate code that uses a BSS @code{.plt} section that @command{ld.so}
-fills in, and
-requires @code{.plt} and @code{.got}
-sections that are both writable and executable.
-This is a PowerPC 32-bit SYSV ABI option.
-
-@item -misel
-@itemx -mno-isel
-@opindex misel
-@opindex mno-isel
-This switch enables or disables the generation of ISEL instructions.
+@item -msim
+@itemx -mno-sim
+@opindex msim
+@opindex mno-sim
+Use the simulator runtime.  The default is to use the libgloss
+board-specific runtime.
 
-@item -misel=@var{yes/no}
-This switch has been deprecated.  Use @option{-misel} and
-@option{-mno-isel} instead.
+@item -mas100-syntax
+@itemx -mno-as100-syntax
+@opindex mas100-syntax
+@opindex mno-as100-syntax
+When generating assembler output, use a syntax that is compatible with
+Renesas's AS100 assembler.  This syntax can also be handled by the GAS
+assembler, but it has some restrictions so it is not generated by default.
 
-@item -mspe
-@itemx -mno-spe
-@opindex mspe
-@opindex mno-spe
-This switch enables or disables the generation of SPE simd
-instructions.
+@item -mmax-constant-size=@var{N}
+@opindex mmax-constant-size
+Specifies the maximum size, in bytes, of a constant that can be used as
+an operand in an RX instruction.  Although the RX instruction set does
+allow constants of up to 4 bytes in length to be used in instructions,
+a longer value equates to a longer instruction.  Thus in some
+circumstances it can be beneficial to restrict the size of constants
+that are used in instructions.  Constants that are too big are instead
+placed into a constant pool and referenced via register indirection.
 
-@item -mpaired
-@itemx -mno-paired
-@opindex mpaired
-@opindex mno-paired
-This switch enables or disables the generation of PAIRED simd
-instructions.
+The value @var{N} can be between 0 and 4.  A value of 0 (the default)
+or 4 means that constants of any size are allowed.
 
-@item -mspe=@var{yes/no}
-This option has been deprecated.  Use @option{-mspe} and
-@option{-mno-spe} instead.
+@item -mrelax
+@opindex mrelax
+Enable linker relaxation.  Linker relaxation is a process whereby the
+linker attempts to reduce the size of a program by finding shorter
+versions of various instructions.  Linker relaxation is disabled by default.
 
-@item -mvsx
-@itemx -mno-vsx
-@opindex mvsx
-@opindex mno-vsx
-Generate code that uses (does not use) vector/scalar (VSX)
-instructions, and also enable the use of built-in functions that allow
-more direct access to the VSX instruction set.
+@item -mint-register=@var{N}
+@opindex mint-register
+Specify the number of registers to reserve for fast interrupt handler
+functions.  The value @var{N} can be between 0 and 4.  A value of 1
+means that register @code{r13} is reserved for the exclusive use
+of fast interrupt handlers.  A value of 2 reserves @code{r13} and
+@code{r12}.  A value of 3 reserves @code{r13}, @code{r12} and
+@code{r11}, and a value of 4 reserves @code{r13} through @code{r10}.
+A value of 0, the default, does not reserve any registers.
 
-@item -mcrypto
-@itemx -mno-crypto
-@opindex mcrypto
-@opindex mno-crypto
-Enable the use (disable) of the built-in functions that allow direct
-access to the cryptographic instructions that were added in version
-2.07 of the PowerPC ISA.
+@item -msave-acc-in-interrupts
+@opindex msave-acc-in-interrupts
+Specifies that interrupt handler functions should preserve the
+accumulator register.  This is only necessary if normal code might use
+the accumulator register, for example because it performs 64-bit
+multiplications.  The default is to ignore the accumulator as this
+makes the interrupt handlers faster.
 
-@item -mdirect-move
-@itemx -mno-direct-move
-@opindex mdirect-move
-@opindex mno-direct-move
-Generate code that uses (does not use) the instructions to move data
-between the general purpose registers and the vector/scalar (VSX)
-registers that were added in version 2.07 of the PowerPC ISA.
+@item -mpid
+@itemx -mno-pid
+@opindex mpid
+@opindex mno-pid
+Enables the generation of position-independent data.  When enabled, any
+access to constant data is done via an offset from a base address
+held in a register.  This allows the location of constant data to be
+determined at run time without requiring the executable to be
+relocated, which is a benefit to embedded applications with tight
+memory constraints.  Data that can be modified is not affected by this
+option.
 
-@item -mpower8-fusion
-@itemx -mno-power8-fusion
-@opindex mpower8-fusion
-@opindex mno-power8-fusion
-Generate code that keeps (does not keeps) some integer operations
-adjacent so that the instructions can be fused together on power8 and
-later processors.
+Note, using this feature reserves a register, usually @code{r13}, for
+the constant data base address.  This can result in slower and/or
+larger code, especially in complicated functions.
 
-@item -mpower8-vector
-@itemx -mno-power8-vector
-@opindex mpower8-vector
-@opindex mno-power8-vector
-Generate code that uses (does not use) the vector and scalar
-instructions that were added in version 2.07 of the PowerPC ISA.  Also
-enable the use of built-in functions that allow more direct access to
-the vector instructions.
+The actual register chosen to hold the constant data base address
+depends upon whether the @option{-msmall-data-limit} and/or the
+@option{-mint-register} command-line options are enabled.  Starting
+with register @code{r13} and proceeding downwards, registers are
+allocated first to satisfy the requirements of @option{-mint-register},
+then @option{-mpid} and finally @option{-msmall-data-limit}.  Thus it
+is possible for the small data area register to be @code{r8} if both
+@option{-mint-register=4} and @option{-mpid} are specified on the
+command line.
 
-@item -mquad-memory
-@itemx -mno-quad-memory
-@opindex mquad-memory
-@opindex mno-quad-memory
-Generate code that uses (does not use) the non-atomic quad word memory
-instructions.  The @option{-mquad-memory} option requires use of
-64-bit mode.
+By default this feature is not enabled.  The default can be restored
+via the @option{-mno-pid} command-line option.
 
-@item -mquad-memory-atomic
-@itemx -mno-quad-memory-atomic
-@opindex mquad-memory-atomic
-@opindex mno-quad-memory-atomic
-Generate code that uses (does not use) the atomic quad word memory
-instructions.  The @option{-mquad-memory-atomic} option requires use of
-64-bit mode.
+@item -mno-warn-multiple-fast-interrupts
+@itemx -mwarn-multiple-fast-interrupts
+@opindex mno-warn-multiple-fast-interrupts
+@opindex mwarn-multiple-fast-interrupts
+Prevents GCC from issuing a warning message if it finds more than one
+fast interrupt handler when it is compiling a file.  The default is to
+issue a warning for each extra fast interrupt handler found, as the RX
+only supports one such interrupt.
 
-@item -mupper-regs-df
-@itemx -mno-upper-regs-df
-@opindex mupper-regs-df
-@opindex mno-upper-regs-df
-Generate code that uses (does not use) the scalar double precision
-instructions that target all 64 registers in the vector/scalar
-floating point register set that were added in version 2.06 of the
-PowerPC ISA.  The @option{-mupper-regs-df} turned on by default if you
-use either of the @option{-mcpu=power7}, @option{-mcpu=power8}, or
-@option{-mvsx} options.
+@end table
 
-@item -mupper-regs-sf
-@itemx -mno-upper-regs-sf
-@opindex mupper-regs-sf
-@opindex mno-upper-regs-sf
-Generate code that uses (does not use) the scalar single precision
-instructions that target all 64 registers in the vector/scalar
-floating point register set that were added in version 2.07 of the
-PowerPC ISA.  The @option{-mupper-regs-sf} turned on by default if you
-use either of the @option{-mcpu=power8}, or @option{-mpower8-vector}
+@emph{Note:} The generic GCC command-line option @option{-ffixed-@var{reg}}
+has special significance to the RX port when used with the
+@code{interrupt} function attribute.  This attribute indicates a
+function intended to process fast interrupts.  GCC ensures
+that it only uses the registers @code{r10}, @code{r11}, @code{r12}
+and/or @code{r13} and only provided that the normal use of the
+corresponding registers has been restricted via the
+@option{-ffixed-@var{reg}} or @option{-mint-register} command-line
 options.
 
-@item -mupper-regs
-@itemx -mno-upper-regs
-@opindex mupper-regs
-@opindex mno-upper-regs
-Generate code that uses (does not use) the scalar
-instructions that target all 64 registers in the vector/scalar
-floating point register set, depending on the model of the machine.
-
-If the @option{-mno-upper-regs} option is used, it turns off both
-@option{-mupper-regs-sf} and @option{-mupper-regs-df} options.
-
-@item -mfloat-gprs=@var{yes/single/double/no}
-@itemx -mfloat-gprs
-@opindex mfloat-gprs
-This switch enables or disables the generation of floating-point
-operations on the general-purpose registers for architectures that
-support it.
+@node S/390 and zSeries Options
+@subsection S/390 and zSeries Options
+@cindex S/390 and zSeries Options
 
-The argument @samp{yes} or @samp{single} enables the use of
-single-precision floating-point operations.
+These are the @samp{-m} options defined for the S/390 and zSeries architecture.
 
-The argument @samp{double} enables the use of single and
-double-precision floating-point operations.
+@table @gcctabopt
+@item -mhard-float
+@itemx -msoft-float
+@opindex mhard-float
+@opindex msoft-float
+Use (do not use) the hardware floating-point instructions and registers
+for floating-point operations.  When @option{-msoft-float} is specified,
+functions in @file{libgcc.a} are used to perform floating-point
+operations.  When @option{-mhard-float} is specified, the compiler
+generates IEEE floating-point instructions.  This is the default.
 
-The argument @samp{no} disables floating-point operations on the
-general-purpose registers.
+@item -mhard-dfp
+@itemx -mno-hard-dfp
+@opindex mhard-dfp
+@opindex mno-hard-dfp
+Use (do not use) the hardware decimal-floating-point instructions for
+decimal-floating-point operations.  When @option{-mno-hard-dfp} is
+specified, functions in @file{libgcc.a} are used to perform
+decimal-floating-point operations.  When @option{-mhard-dfp} is
+specified, the compiler generates decimal-floating-point hardware
+instructions.  This is the default for @option{-march=z9-ec} or higher.
 
-This option is currently only available on the MPC854x.
+@item -mlong-double-64
+@itemx -mlong-double-128
+@opindex mlong-double-64
+@opindex mlong-double-128
+These switches control the size of the @code{long double} type.  A size
+of 64 bits makes the @code{long double} type equivalent to the @code{double}
+type.  This is the default.
 
-@item -m32
-@itemx -m64
-@opindex m32
-@opindex m64
-Generate code for 32-bit or 64-bit environments of Darwin and SVR4
-targets (including GNU/Linux).  The 32-bit environment sets int, long
-and pointer to 32 bits and generates code that runs on any PowerPC
-variant.  The 64-bit environment sets int to 32 bits and long and
-pointer to 64 bits, and generates code for PowerPC64, as for
-@option{-mpowerpc64}.
+@item -mbackchain
+@itemx -mno-backchain
+@opindex mbackchain
+@opindex mno-backchain
+Store (do not store) the address of the caller's frame as a backchain pointer
+into the callee's stack frame.
+A backchain may be needed to allow debugging using tools that do not understand
+DWARF 2 call frame information.
+When @option{-mno-packed-stack} is in effect, the backchain pointer is stored
+at the bottom of the stack frame; when @option{-mpacked-stack} is in effect,
+the backchain is placed into the topmost word of the 96/160 byte register
+save area.
 
-@item -mfull-toc
-@itemx -mno-fp-in-toc
-@itemx -mno-sum-in-toc
-@itemx -mminimal-toc
-@opindex mfull-toc
-@opindex mno-fp-in-toc
-@opindex mno-sum-in-toc
-@opindex mminimal-toc
-Modify generation of the TOC (Table Of Contents), which is created for
-every executable file.  The @option{-mfull-toc} option is selected by
-default.  In that case, GCC allocates at least one TOC entry for
-each unique non-automatic variable reference in your program.  GCC
-also places floating-point constants in the TOC@.  However, only
-16,384 entries are available in the TOC@.
+In general, code compiled with @option{-mbackchain} is call-compatible with
+code compiled with @option{-mno-backchain}; however, use of the backchain
+for debugging purposes usually requires that the whole binary is built with
+@option{-mbackchain}.  Note that the combination of @option{-mbackchain},
+@option{-mpacked-stack} and @option{-mhard-float} is not supported.  In order
+to build a Linux kernel, use @option{-msoft-float}.
 
-If you receive a linker error message that saying you have overflowed
-the available TOC space, you can reduce the amount of TOC space used
-with the @option{-mno-fp-in-toc} and @option{-mno-sum-in-toc} options.
-@option{-mno-fp-in-toc} prevents GCC from putting floating-point
-constants in the TOC and @option{-mno-sum-in-toc} forces GCC to
-generate code to calculate the sum of an address and a constant at
-run time instead of putting that sum into the TOC@.  You may specify one
-or both of these options.  Each causes GCC to produce very slightly
-slower and larger code at the expense of conserving TOC space.
+The default is to not maintain the backchain.
 
-If you still run out of space in the TOC even when you specify both of
-these options, specify @option{-mminimal-toc} instead.  This option causes
-GCC to make only one TOC entry for every file.  When you specify this
-option, GCC produces code that is slower and larger but which
-uses extremely little TOC space.  You may wish to use this option
-only on files that contain less frequently-executed code.
+@item -mpacked-stack
+@itemx -mno-packed-stack
+@opindex mpacked-stack
+@opindex mno-packed-stack
+Use (do not use) the packed stack layout.  When @option{-mno-packed-stack} is
+specified, the compiler uses all fields of the 96/160 byte register save
+area only for their default purpose; unused fields still take up stack space.
+When @option{-mpacked-stack} is specified, register save slots are densely
+packed at the top of the register save area; unused space is reused for other
+purposes, allowing for more efficient use of the available stack space.
+However, when @option{-mbackchain} is also in effect, the topmost word of
+the save area is always used to store the backchain, and the return address
+register is always saved two words below the backchain.
 
-@item -maix64
-@itemx -maix32
-@opindex maix64
-@opindex maix32
-Enable 64-bit AIX ABI and calling convention: 64-bit pointers, 64-bit
-@code{long} type, and the infrastructure needed to support them.
-Specifying @option{-maix64} implies @option{-mpowerpc64},
-while @option{-maix32} disables the 64-bit ABI and
-implies @option{-mno-powerpc64}.  GCC defaults to @option{-maix32}.
+As long as the stack frame backchain is not used, code generated with
+@option{-mpacked-stack} is call-compatible with code generated with
+@option{-mno-packed-stack}.  Note that some non-FSF releases of GCC 2.95 for
+S/390 or zSeries generated code that uses the stack frame backchain at run
+time, not just for debugging purposes.  Such code is not call-compatible
+with code compiled with @option{-mpacked-stack}.  Also, note that the
+combination of @option{-mbackchain},
+@option{-mpacked-stack} and @option{-mhard-float} is not supported.  In order
+to build a Linux kernel, use @option{-msoft-float}.
 
-@item -mxl-compat
-@itemx -mno-xl-compat
-@opindex mxl-compat
-@opindex mno-xl-compat
-Produce code that conforms more closely to IBM XL compiler semantics
-when using AIX-compatible ABI@.  Pass floating-point arguments to
-prototyped functions beyond the register save area (RSA) on the stack
-in addition to argument FPRs.  Do not assume that most significant
-double in 128-bit long double value is properly rounded when comparing
-values and converting to double.  Use XL symbol names for long double
-support routines.
+The default is to not use the packed stack layout.
 
-The AIX calling convention was extended but not initially documented to
-handle an obscure K&R C case of calling a function that takes the
-address of its arguments with fewer arguments than declared.  IBM XL
-compilers access floating-point arguments that do not fit in the
-RSA from the stack when a subroutine is compiled without
-optimization.  Because always storing floating-point arguments on the
-stack is inefficient and rarely needed, this option is not enabled by
-default and only is necessary when calling subroutines compiled by IBM
-XL compilers without optimization.
+@item -msmall-exec
+@itemx -mno-small-exec
+@opindex msmall-exec
+@opindex mno-small-exec
+Generate (or do not generate) code using the @code{bras} instruction
+to do subroutine calls.
+This only works reliably if the total executable size does not
+exceed 64k.  The default is to use the @code{basr} instruction instead,
+which does not have this limitation.
 
-@item -mpe
-@opindex mpe
-Support @dfn{IBM RS/6000 SP} @dfn{Parallel Environment} (PE)@.  Link an
-application written to use message passing with special startup code to
-enable the application to run.  The system must have PE installed in the
-standard location (@file{/usr/lpp/ppe.poe/}), or the @file{specs} file
-must be overridden with the @option{-specs=} option to specify the
-appropriate directory location.  The Parallel Environment does not
-support threads, so the @option{-mpe} option and the @option{-pthread}
-option are incompatible.
+@item -m64
+@itemx -m31
+@opindex m64
+@opindex m31
+When @option{-m31} is specified, generate code compliant to the
+GNU/Linux for S/390 ABI@.  When @option{-m64} is specified, generate
+code compliant to the GNU/Linux for zSeries ABI@.  This allows GCC in
+particular to generate 64-bit instructions.  For the @samp{s390}
+targets, the default is @option{-m31}, while the @samp{s390x}
+targets default to @option{-m64}.
 
-@item -malign-natural
-@itemx -malign-power
-@opindex malign-natural
-@opindex malign-power
-On AIX, 32-bit Darwin, and 64-bit PowerPC GNU/Linux, the option
-@option{-malign-natural} overrides the ABI-defined alignment of larger
-types, such as floating-point doubles, on their natural size-based boundary.
-The option @option{-malign-power} instructs GCC to follow the ABI-specified
-alignment rules.  GCC defaults to the standard alignment defined in the ABI@.
+@item -mzarch
+@itemx -mesa
+@opindex mzarch
+@opindex mesa
+When @option{-mzarch} is specified, generate code using the
+instructions available on z/Architecture.
+When @option{-mesa} is specified, generate code using the
+instructions available on ESA/390.  Note that @option{-mesa} is
+not possible with @option{-m64}.
+When generating code compliant to the GNU/Linux for S/390 ABI,
+the default is @option{-mesa}.  When generating code compliant
+to the GNU/Linux for zSeries ABI, the default is @option{-mzarch}.
 
-On 64-bit Darwin, natural alignment is the default, and @option{-malign-power}
-is not supported.
+@item -mmvcle
+@itemx -mno-mvcle
+@opindex mmvcle
+@opindex mno-mvcle
+Generate (or do not generate) code using the @code{mvcle} instruction
+to perform block moves.  When @option{-mno-mvcle} is specified,
+use a @code{mvc} loop instead.  This is the default unless optimizing for
+size.
 
-@item -msoft-float
-@itemx -mhard-float
-@opindex msoft-float
-@opindex mhard-float
-Generate code that does not use (uses) the floating-point register set.
-Software floating-point emulation is provided if you use the
-@option{-msoft-float} option, and pass the option to GCC when linking.
+@item -mdebug
+@itemx -mno-debug
+@opindex mdebug
+@opindex mno-debug
+Print (or do not print) additional debug information when compiling.
+The default is to not print debug information.
 
-@item -msingle-float
-@itemx -mdouble-float
-@opindex msingle-float
-@opindex mdouble-float
-Generate code for single- or double-precision floating-point operations.
-@option{-mdouble-float} implies @option{-msingle-float}.
+@item -march=@var{cpu-type}
+@opindex march
+Generate code that runs on @var{cpu-type}, which is the name of a system
+representing a certain processor type.  Possible values for
+@var{cpu-type} are @samp{g5}, @samp{g6}, @samp{z900}, @samp{z990},
+@samp{z9-109}, @samp{z9-ec} and @samp{z10}.
+When generating code using the instructions available on z/Architecture,
+the default is @option{-march=z900}.  Otherwise, the default is
+@option{-march=g5}.
 
-@item -msimple-fpu
-@opindex msimple-fpu
-Do not generate @code{sqrt} and @code{div} instructions for hardware
-floating-point unit.
+@item -mtune=@var{cpu-type}
+@opindex mtune
+Tune to @var{cpu-type} everything applicable about the generated code,
+except for the ABI and the set of available instructions.
+The list of @var{cpu-type} values is the same as for @option{-march}.
+The default is the value used for @option{-march}.
 
-@item -mfpu=@var{name}
-@opindex mfpu
-Specify type of floating-point unit.  Valid values for @var{name} are
-@samp{sp_lite} (equivalent to @option{-msingle-float -msimple-fpu}),
-@samp{dp_lite} (equivalent to @option{-mdouble-float -msimple-fpu}),
-@samp{sp_full} (equivalent to @option{-msingle-float}),
-and @samp{dp_full} (equivalent to @option{-mdouble-float}).
+@item -mtpf-trace
+@itemx -mno-tpf-trace
+@opindex mtpf-trace
+@opindex mno-tpf-trace
+Generate code that adds (does not add) in TPF OS-specific branches to trace
+routines in the operating system.  This option is off by default, even
+when compiling for the TPF OS@.
 
-@item -mxilinx-fpu
-@opindex mxilinx-fpu
-Perform optimizations for the floating-point unit on Xilinx PPC 405/440.
+@item -mfused-madd
+@itemx -mno-fused-madd
+@opindex mfused-madd
+@opindex mno-fused-madd
+Generate code that uses (does not use) the floating-point multiply and
+accumulate instructions.  These instructions are generated by default if
+hardware floating point is used.
 
-@item -mmultiple
-@itemx -mno-multiple
-@opindex mmultiple
-@opindex mno-multiple
-Generate code that uses (does not use) the load multiple word
-instructions and the store multiple word instructions.  These
-instructions are generated by default on POWER systems, and not
-generated on PowerPC systems.  Do not use @option{-mmultiple} on little-endian
-PowerPC systems, since those instructions do not work when the
-processor is in little-endian mode.  The exceptions are PPC740 and
-PPC750 which permit these instructions in little-endian mode.
+@item -mwarn-framesize=@var{framesize}
+@opindex mwarn-framesize
+Emit a warning if the current function exceeds the given frame size.  Because
+this is a compile-time check, a warning does not necessarily indicate a real
+problem at run time.  It is intended to identify functions that are most likely
+to cause a stack overflow.  It is useful in environments with limited stack
+size, e.g.@: the Linux kernel.
 
-@item -mstring
-@itemx -mno-string
-@opindex mstring
-@opindex mno-string
-Generate code that uses (does not use) the load string instructions
-and the store string word instructions to save multiple registers and
-do small block moves.  These instructions are generated by default on
-POWER systems, and not generated on PowerPC systems.  Do not use
-@option{-mstring} on little-endian PowerPC systems, since those
-instructions do not work when the processor is in little-endian mode.
-The exceptions are PPC740 and PPC750 which permit these instructions
-in little-endian mode.
-
-@item -mupdate
-@itemx -mno-update
-@opindex mupdate
-@opindex mno-update
-Generate code that uses (does not use) the load or store instructions
-that update the base register to the address of the calculated memory
-location.  These instructions are generated by default.  If you use
-@option{-mno-update}, there is a small window between the time that the
-stack pointer is updated and the address of the previous frame is
-stored, which means code that walks the stack frame across interrupts or
-signals may get corrupted data.
+@item -mwarn-dynamicstack
+@opindex mwarn-dynamicstack
+Emit a warning if the function calls @code{alloca} or uses dynamically-sized
+arrays.  This is generally a bad idea with a limited stack size.
 
-@item -mavoid-indexed-addresses
-@itemx -mno-avoid-indexed-addresses
-@opindex mavoid-indexed-addresses
-@opindex mno-avoid-indexed-addresses
-Generate code that tries to avoid (not avoid) the use of indexed load
-or store instructions. These instructions can incur a performance
-penalty on Power6 processors in certain situations, such as when
-stepping through large arrays that cross a 16M boundary.  This option
-is enabled by default when targeting Power6 and disabled otherwise.
+@item -mstack-guard=@var{stack-guard}
+@itemx -mstack-size=@var{stack-size}
+@opindex mstack-guard
+@opindex mstack-size
+If these options are provided the S/390 back end emits additional instructions in
+the function prologue that trigger a trap if the stack size is @var{stack-guard}
+bytes above the @var{stack-size} (remember that the stack on S/390 grows downward).
+If the @var{stack-guard} option is omitted the smallest power of 2 larger than
+the frame size of the compiled function is chosen.
+These options are intended to help debug stack overflow problems.
+The additionally emitted code causes only little overhead and hence can also be
+used in production-like systems without greater performance degradation.  The given
+values have to be exact powers of 2 and @var{stack-size} has to be greater than
+@var{stack-guard} without exceeding 64k.
+In order to be efficient the extra code makes the assumption that the stack starts
+at an address aligned to the value given by @var{stack-size}.
+The @var{stack-guard} option can only be used in conjunction with @var{stack-size}.
 
-@item -mfused-madd
-@itemx -mno-fused-madd
-@opindex mfused-madd
-@opindex mno-fused-madd
-Generate code that uses (does not use) the floating-point multiply and
-accumulate instructions.  These instructions are generated by default
-if hardware floating point is used.  The machine-dependent
-@option{-mfused-madd} option is now mapped to the machine-independent
-@option{-ffp-contract=fast} option, and @option{-mno-fused-madd} is
-mapped to @option{-ffp-contract=off}.
+@item -mhotpatch=@var{pre-halfwords},@var{post-halfwords}
+@opindex mhotpatch
+If the hotpatch option is enabled, a ``hot-patching'' function
+prologue is generated for all functions in the compilation unit.
+The function label is prepended with the given number of two-byte
+NOP instructions (@var{pre-halfwords}, maximum 1000000).  After
+the label, 2 * @var{post-halfwords} bytes are appended, using the
+largest NOP-like instructions the architecture allows (maximum
+1000000).
 
-@item -mmulhw
-@itemx -mno-mulhw
-@opindex mmulhw
-@opindex mno-mulhw
-Generate code that uses (does not use) the half-word multiply and
-multiply-accumulate instructions on the IBM 405, 440, 464 and 476 processors.
-These instructions are generated by default when targeting those
-processors.
+If both arguments are zero, hotpatching is disabled.
 
-@item -mdlmzb
-@itemx -mno-dlmzb
-@opindex mdlmzb
-@opindex mno-dlmzb
-Generate code that uses (does not use) the string-search @samp{dlmzb}
-instruction on the IBM 405, 440, 464 and 476 processors.  This instruction is
-generated by default when targeting those processors.
+This option can be overridden for individual functions with the
+@code{hotpatch} attribute.
+@end table
 
-@item -mno-bit-align
-@itemx -mbit-align
-@opindex mno-bit-align
-@opindex mbit-align
-On System V.4 and embedded PowerPC systems do not (do) force structures
-and unions that contain bit-fields to be aligned to the base type of the
-bit-field.
+@node Score Options
+@subsection Score Options
+@cindex Score Options
 
-For example, by default a structure containing nothing but 8
-@code{unsigned} bit-fields of length 1 is aligned to a 4-byte
-boundary and has a size of 4 bytes.  By using @option{-mno-bit-align},
-the structure is aligned to a 1-byte boundary and is 1 byte in
-size.
+These options are defined for Score implementations:
 
-@item -mno-strict-align
-@itemx -mstrict-align
-@opindex mno-strict-align
-@opindex mstrict-align
-On System V.4 and embedded PowerPC systems do not (do) assume that
-unaligned memory references are handled by the system.
+@table @gcctabopt
+@item -meb
+@opindex meb
+Compile code for big-endian mode.  This is the default.
 
-@item -mrelocatable
-@itemx -mno-relocatable
-@opindex mrelocatable
-@opindex mno-relocatable
-Generate code that allows (does not allow) a static executable to be
-relocated to a different address at run time.  A simple embedded
-PowerPC system loader should relocate the entire contents of
-@code{.got2} and 4-byte locations listed in the @code{.fixup} section,
-a table of 32-bit addresses generated by this option.  For this to
-work, all objects linked together must be compiled with
-@option{-mrelocatable} or @option{-mrelocatable-lib}.
-@option{-mrelocatable} code aligns the stack to an 8-byte boundary.
+@item -mel
+@opindex mel
+Compile code for little-endian mode.
 
-@item -mrelocatable-lib
-@itemx -mno-relocatable-lib
-@opindex mrelocatable-lib
-@opindex mno-relocatable-lib
-Like @option{-mrelocatable}, @option{-mrelocatable-lib} generates a
-@code{.fixup} section to allow static executables to be relocated at
-run time, but @option{-mrelocatable-lib} does not use the smaller stack
-alignment of @option{-mrelocatable}.  Objects compiled with
-@option{-mrelocatable-lib} may be linked with objects compiled with
-any combination of the @option{-mrelocatable} options.
+@item -mnhwloop
+@opindex mnhwloop
+Disable generation of @code{bcnz} instructions.
 
-@item -mno-toc
-@itemx -mtoc
-@opindex mno-toc
-@opindex mtoc
-On System V.4 and embedded PowerPC systems do not (do) assume that
-register 2 contains a pointer to a global area pointing to the addresses
-used in the program.
+@item -muls
+@opindex muls
+Enable generation of unaligned load and store instructions.
 
-@item -mlittle
-@itemx -mlittle-endian
-@opindex mlittle
-@opindex mlittle-endian
-On System V.4 and embedded PowerPC systems compile code for the
-processor in little-endian mode.  The @option{-mlittle-endian} option is
-the same as @option{-mlittle}.
+@item -mmac
+@opindex mmac
+Enable the use of multiply-accumulate instructions. Disabled by default.
 
-@item -mbig
-@itemx -mbig-endian
-@opindex mbig
-@opindex mbig-endian
-On System V.4 and embedded PowerPC systems compile code for the
-processor in big-endian mode.  The @option{-mbig-endian} option is
-the same as @option{-mbig}.
+@item -mscore5
+@opindex mscore5
+Specify the SCORE5 as the target architecture.
 
-@item -mdynamic-no-pic
-@opindex mdynamic-no-pic
-On Darwin and Mac OS X systems, compile code so that it is not
-relocatable, but that its external references are relocatable.  The
-resulting code is suitable for applications, but not shared
-libraries.
+@item -mscore5u
+@opindex mscore5u
+Specify the SCORE5U as the target architecture.
 
-@item -msingle-pic-base
-@opindex msingle-pic-base
-Treat the register used for PIC addressing as read-only, rather than
-loading it in the prologue for each function.  The runtime system is
-responsible for initializing this register with an appropriate value
-before execution begins.
+@item -mscore7
+@opindex mscore7
+Specify the SCORE7 as the target architecture. This is the default.
 
-@item -mprioritize-restricted-insns=@var{priority}
-@opindex mprioritize-restricted-insns
-This option controls the priority that is assigned to
-dispatch-slot restricted instructions during the second scheduling
-pass.  The argument @var{priority} takes the value @samp{0}, @samp{1},
-or @samp{2} to assign no, highest, or second-highest (respectively) 
-priority to dispatch-slot restricted
-instructions.
+@item -mscore7d
+@opindex mscore7d
+Specify the SCORE7D as the target architecture.
+@end table
 
-@item -msched-costly-dep=@var{dependence_type}
-@opindex msched-costly-dep
-This option controls which dependences are considered costly
-by the target during instruction scheduling.  The argument
-@var{dependence_type} takes one of the following values:
+@node SH Options
+@subsection SH Options
 
-@table @asis
-@item @samp{no}
-No dependence is costly.
+These @samp{-m} options are defined for the SH implementations:
 
-@item @samp{all}
-All dependences are costly.
+@table @gcctabopt
+@item -m1
+@opindex m1
+Generate code for the SH1.
 
-@item @samp{true_store_to_load}
-A true dependence from store to load is costly.
+@item -m2
+@opindex m2
+Generate code for the SH2.
 
-@item @samp{store_to_load}
-Any dependence from store to load is costly.
+@item -m2e
+Generate code for the SH2e.
 
-@item @var{number}
-Any dependence for which the latency is greater than or equal to 
-@var{number} is costly.
-@end table
+@item -m2a-nofpu
+@opindex m2a-nofpu
+Generate code for the SH2a without FPU, or for a SH2a-FPU in such a way
+that the floating-point unit is not used.
 
-@item -minsert-sched-nops=@var{scheme}
-@opindex minsert-sched-nops
-This option controls which NOP insertion scheme is used during
-the second scheduling pass.  The argument @var{scheme} takes one of the
-following values:
+@item -m2a-single-only
+@opindex m2a-single-only
+Generate code for the SH2a-FPU, in such a way that no double-precision
+floating-point operations are used.
 
-@table @asis
-@item @samp{no}
-Don't insert NOPs.
+@item -m2a-single
+@opindex m2a-single
+Generate code for the SH2a-FPU assuming the floating-point unit is in
+single-precision mode by default.
 
-@item @samp{pad}
-Pad with NOPs any dispatch group that has vacant issue slots,
-according to the scheduler's grouping.
+@item -m2a
+@opindex m2a
+Generate code for the SH2a-FPU assuming the floating-point unit is in
+double-precision mode by default.
 
-@item @samp{regroup_exact}
-Insert NOPs to force costly dependent insns into
-separate groups.  Insert exactly as many NOPs as needed to force an insn
-to a new group, according to the estimated processor grouping.
+@item -m3
+@opindex m3
+Generate code for the SH3.
 
-@item @var{number}
-Insert NOPs to force costly dependent insns into
-separate groups.  Insert @var{number} NOPs to force an insn to a new group.
-@end table
+@item -m3e
+@opindex m3e
+Generate code for the SH3e.
 
-@item -mcall-sysv
-@opindex mcall-sysv
-On System V.4 and embedded PowerPC systems compile code using calling
-conventions that adhere to the March 1995 draft of the System V
-Application Binary Interface, PowerPC processor supplement.  This is the
-default unless you configured GCC using @samp{powerpc-*-eabiaix}.
+@item -m4-nofpu
+@opindex m4-nofpu
+Generate code for the SH4 without a floating-point unit.
 
-@item -mcall-sysv-eabi
-@itemx -mcall-eabi
-@opindex mcall-sysv-eabi
-@opindex mcall-eabi
-Specify both @option{-mcall-sysv} and @option{-meabi} options.
+@item -m4-single-only
+@opindex m4-single-only
+Generate code for the SH4 with a floating-point unit that only
+supports single-precision arithmetic.
 
-@item -mcall-sysv-noeabi
-@opindex mcall-sysv-noeabi
-Specify both @option{-mcall-sysv} and @option{-mno-eabi} options.
+@item -m4-single
+@opindex m4-single
+Generate code for the SH4 assuming the floating-point unit is in
+single-precision mode by default.
 
-@item -mcall-aixdesc
-@opindex m
-On System V.4 and embedded PowerPC systems compile code for the AIX
-operating system.
+@item -m4
+@opindex m4
+Generate code for the SH4.
 
-@item -mcall-linux
-@opindex mcall-linux
-On System V.4 and embedded PowerPC systems compile code for the
-Linux-based GNU system.
+@item -m4-100
+@opindex m4-100
+Generate code for SH4-100.
 
-@item -mcall-freebsd
-@opindex mcall-freebsd
-On System V.4 and embedded PowerPC systems compile code for the
-FreeBSD operating system.
+@item -m4-100-nofpu
+@opindex m4-100-nofpu
+Generate code for SH4-100 in such a way that the
+floating-point unit is not used.
 
-@item -mcall-netbsd
-@opindex mcall-netbsd
-On System V.4 and embedded PowerPC systems compile code for the
-NetBSD operating system.
+@item -m4-100-single
+@opindex m4-100-single
+Generate code for SH4-100 assuming the floating-point unit is in
+single-precision mode by default.
 
-@item -mcall-openbsd
-@opindex mcall-netbsd
-On System V.4 and embedded PowerPC systems compile code for the
-OpenBSD operating system.
+@item -m4-100-single-only
+@opindex m4-100-single-only
+Generate code for SH4-100 in such a way that no double-precision
+floating-point operations are used.
 
-@item -maix-struct-return
-@opindex maix-struct-return
-Return all structures in memory (as specified by the AIX ABI)@.
+@item -m4-200
+@opindex m4-200
+Generate code for SH4-200.
 
-@item -msvr4-struct-return
-@opindex msvr4-struct-return
-Return structures smaller than 8 bytes in registers (as specified by the
-SVR4 ABI)@.
+@item -m4-200-nofpu
+@opindex m4-200-nofpu
+Generate code for SH4-200 in such a way that the
+floating-point unit is not used.
 
-@item -mabi=@var{abi-type}
-@opindex mabi
-Extend the current ABI with a particular extension, or remove such extension.
-Valid values are @samp{altivec}, @samp{no-altivec}, @samp{spe},
-@samp{no-spe}, @samp{ibmlongdouble}, @samp{ieeelongdouble},
-@samp{elfv1}, @samp{elfv2}@.
+@item -m4-200-single
+@opindex m4-200-single
+Generate code for SH4-200 assuming the floating-point unit is in
+single-precision mode by default.
 
-@item -mabi=spe
-@opindex mabi=spe
-Extend the current ABI with SPE ABI extensions.  This does not change
-the default ABI, instead it adds the SPE ABI extensions to the current
-ABI@.
+@item -m4-200-single-only
+@opindex m4-200-single-only
+Generate code for SH4-200 in such a way that no double-precision
+floating-point operations are used.
 
-@item -mabi=no-spe
-@opindex mabi=no-spe
-Disable Book-E SPE ABI extensions for the current ABI@.
+@item -m4-300
+@opindex m4-300
+Generate code for SH4-300.
 
-@item -mabi=ibmlongdouble
-@opindex mabi=ibmlongdouble
-Change the current ABI to use IBM extended-precision long double.
-This is a PowerPC 32-bit SYSV ABI option.
+@item -m4-300-nofpu
+@opindex m4-300-nofpu
+Generate code for SH4-300 in such a way that the
+floating-point unit is not used.
 
-@item -mabi=ieeelongdouble
-@opindex mabi=ieeelongdouble
-Change the current ABI to use IEEE extended-precision long double.
-This is a PowerPC 32-bit Linux ABI option.
+@item -m4-300-single
+@opindex m4-300-single
+Generate code for SH4-300 assuming the floating-point unit is in
+single-precision mode by default.
 
-@item -mabi=elfv1
-@opindex mabi=elfv1
-Change the current ABI to use the ELFv1 ABI.
-This is the default ABI for big-endian PowerPC 64-bit Linux.
-Overriding the default ABI requires special system support and is
-likely to fail in spectacular ways.
+@item -m4-300-single-only
+@opindex m4-300-single-only
+Generate code for SH4-300 in such a way that no double-precision
+floating-point operations are used.
 
-@item -mabi=elfv2
-@opindex mabi=elfv2
-Change the current ABI to use the ELFv2 ABI.
-This is the default ABI for little-endian PowerPC 64-bit Linux.
-Overriding the default ABI requires special system support and is
-likely to fail in spectacular ways.
+@item -m4-340
+@opindex m4-340
+Generate code for SH4-340 (no MMU, no FPU).
 
-@item -mprototype
-@itemx -mno-prototype
-@opindex mprototype
-@opindex mno-prototype
-On System V.4 and embedded PowerPC systems assume that all calls to
-variable argument functions are properly prototyped.  Otherwise, the
-compiler must insert an instruction before every non-prototyped call to
-set or clear bit 6 of the condition code register (@code{CR}) to
-indicate whether floating-point values are passed in the floating-point
-registers in case the function takes variable arguments.  With
-@option{-mprototype}, only calls to prototyped variable argument functions
-set or clear the bit.
+@item -m4-500
+@opindex m4-500
+Generate code for SH4-500 (no FPU).  Passes @option{-isa=sh4-nofpu} to the
+assembler.
 
-@item -msim
-@opindex msim
-On embedded PowerPC systems, assume that the startup module is called
-@file{sim-crt0.o} and that the standard C libraries are @file{libsim.a} and
-@file{libc.a}.  This is the default for @samp{powerpc-*-eabisim}
-configurations.
+@item -m4a-nofpu
+@opindex m4a-nofpu
+Generate code for the SH4al-dsp, or for a SH4a in such a way that the
+floating-point unit is not used.
 
-@item -mmvme
-@opindex mmvme
-On embedded PowerPC systems, assume that the startup module is called
-@file{crt0.o} and the standard C libraries are @file{libmvme.a} and
-@file{libc.a}.
+@item -m4a-single-only
+@opindex m4a-single-only
+Generate code for the SH4a, in such a way that no double-precision
+floating-point operations are used.
 
-@item -mads
-@opindex mads
-On embedded PowerPC systems, assume that the startup module is called
-@file{crt0.o} and the standard C libraries are @file{libads.a} and
-@file{libc.a}.
+@item -m4a-single
+@opindex m4a-single
+Generate code for the SH4a assuming the floating-point unit is in
+single-precision mode by default.
 
-@item -myellowknife
-@opindex myellowknife
-On embedded PowerPC systems, assume that the startup module is called
-@file{crt0.o} and the standard C libraries are @file{libyk.a} and
-@file{libc.a}.
+@item -m4a
+@opindex m4a
+Generate code for the SH4a.
 
-@item -mvxworks
-@opindex mvxworks
-On System V.4 and embedded PowerPC systems, specify that you are
-compiling for a VxWorks system.
+@item -m4al
+@opindex m4al
+Same as @option{-m4a-nofpu}, except that it implicitly passes
+@option{-dsp} to the assembler.  GCC doesn't generate any DSP
+instructions at the moment.
 
-@item -memb
-@opindex memb
-On embedded PowerPC systems, set the @code{PPC_EMB} bit in the ELF flags
-header to indicate that @samp{eabi} extended relocations are used.
+@item -m5-32media
+@opindex m5-32media
+Generate 32-bit code for SHmedia.
 
-@item -meabi
-@itemx -mno-eabi
-@opindex meabi
-@opindex mno-eabi
-On System V.4 and embedded PowerPC systems do (do not) adhere to the
-Embedded Applications Binary Interface (EABI), which is a set of
-modifications to the System V.4 specifications.  Selecting @option{-meabi}
-means that the stack is aligned to an 8-byte boundary, a function
-@code{__eabi} is called from @code{main} to set up the EABI
-environment, and the @option{-msdata} option can use both @code{r2} and
-@code{r13} to point to two separate small data areas.  Selecting
-@option{-mno-eabi} means that the stack is aligned to a 16-byte boundary,
-no EABI initialization function is called from @code{main}, and the
-@option{-msdata} option only uses @code{r13} to point to a single
-small data area.  The @option{-meabi} option is on by default if you
-configured GCC using one of the @samp{powerpc*-*-eabi*} options.
+@item -m5-32media-nofpu
+@opindex m5-32media-nofpu
+Generate 32-bit code for SHmedia in such a way that the
+floating-point unit is not used.
 
-@item -msdata=eabi
-@opindex msdata=eabi
-On System V.4 and embedded PowerPC systems, put small initialized
-@code{const} global and static data in the @code{.sdata2} section, which
-is pointed to by register @code{r2}.  Put small initialized
-non-@code{const} global and static data in the @code{.sdata} section,
-which is pointed to by register @code{r13}.  Put small uninitialized
-global and static data in the @code{.sbss} section, which is adjacent to
-the @code{.sdata} section.  The @option{-msdata=eabi} option is
-incompatible with the @option{-mrelocatable} option.  The
-@option{-msdata=eabi} option also sets the @option{-memb} option.
+@item -m5-64media
+@opindex m5-64media
+Generate 64-bit code for SHmedia.
 
-@item -msdata=sysv
-@opindex msdata=sysv
-On System V.4 and embedded PowerPC systems, put small global and static
-data in the @code{.sdata} section, which is pointed to by register
-@code{r13}.  Put small uninitialized global and static data in the
-@code{.sbss} section, which is adjacent to the @code{.sdata} section.
-The @option{-msdata=sysv} option is incompatible with the
-@option{-mrelocatable} option.
-
-@item -msdata=default
-@itemx -msdata
-@opindex msdata=default
-@opindex msdata
-On System V.4 and embedded PowerPC systems, if @option{-meabi} is used,
-compile code the same as @option{-msdata=eabi}, otherwise compile code the
-same as @option{-msdata=sysv}.
+@item -m5-64media-nofpu
+@opindex m5-64media-nofpu
+Generate 64-bit code for SHmedia in such a way that the
+floating-point unit is not used.
 
-@item -msdata=data
-@opindex msdata=data
-On System V.4 and embedded PowerPC systems, put small global
-data in the @code{.sdata} section.  Put small uninitialized global
-data in the @code{.sbss} section.  Do not use register @code{r13}
-to address small data however.  This is the default behavior unless
-other @option{-msdata} options are used.
+@item -m5-compact
+@opindex m5-compact
+Generate code for SHcompact.
 
-@item -msdata=none
-@itemx -mno-sdata
-@opindex msdata=none
-@opindex mno-sdata
-On embedded PowerPC systems, put all initialized global and static data
-in the @code{.data} section, and all uninitialized data in the
-@code{.bss} section.
+@item -m5-compact-nofpu
+@opindex m5-compact-nofpu
+Generate code for SHcompact in such a way that the
+floating-point unit is not used.
 
-@item -mblock-move-inline-limit=@var{num}
-@opindex mblock-move-inline-limit
-Inline all block moves (such as calls to @code{memcpy} or structure
-copies) less than or equal to @var{num} bytes.  The minimum value for
-@var{num} is 32 bytes on 32-bit targets and 64 bytes on 64-bit
-targets.  The default value is target-specific.
+@item -mb
+@opindex mb
+Compile code for the processor in big-endian mode.
 
-@item -G @var{num}
-@opindex G
-@cindex smaller data references (PowerPC)
-@cindex .sdata/.sdata2 references (PowerPC)
-On embedded PowerPC systems, put global and static items less than or
-equal to @var{num} bytes into the small data or BSS sections instead of
-the normal data or BSS section.  By default, @var{num} is 8.  The
-@option{-G @var{num}} switch is also passed to the linker.
-All modules should be compiled with the same @option{-G @var{num}} value.
+@item -ml
+@opindex ml
+Compile code for the processor in little-endian mode.
 
-@item -mregnames
-@itemx -mno-regnames
-@opindex mregnames
-@opindex mno-regnames
-On System V.4 and embedded PowerPC systems do (do not) emit register
-names in the assembly language output using symbolic forms.
+@item -mdalign
+@opindex mdalign
+Align doubles at 64-bit boundaries.  Note that this changes the calling
+conventions, and thus some functions from the standard C library do
+not work unless you recompile it first with @option{-mdalign}.
 
-@item -mlongcall
-@itemx -mno-longcall
-@opindex mlongcall
-@opindex mno-longcall
-By default assume that all calls are far away so that a longer and more
-expensive calling sequence is required.  This is required for calls
-farther than 32 megabytes (33,554,432 bytes) from the current location.
-A short call is generated if the compiler knows
-the call cannot be that far away.  This setting can be overridden by
-the @code{shortcall} function attribute, or by @code{#pragma
-longcall(0)}.
+@item -mrelax
+@opindex mrelax
+Shorten some address references at link time, when possible; uses the
+linker option @option{-relax}.
 
-Some linkers are capable of detecting out-of-range calls and generating
-glue code on the fly.  On these systems, long calls are unnecessary and
-generate slower code.  As of this writing, the AIX linker can do this,
-as can the GNU linker for PowerPC/64.  It is planned to add this feature
-to the GNU linker for 32-bit PowerPC systems as well.
+@item -mbigtable
+@opindex mbigtable
+Use 32-bit offsets in @code{switch} tables.  The default is to use
+16-bit offsets.
 
-On Darwin/PPC systems, @code{#pragma longcall} generates @code{jbsr
-callee, L42}, plus a @dfn{branch island} (glue code).  The two target
-addresses represent the callee and the branch island.  The
-Darwin/PPC linker prefers the first address and generates a @code{bl
-callee} if the PPC @code{bl} instruction reaches the callee directly;
-otherwise, the linker generates @code{bl L42} to call the branch
-island.  The branch island is appended to the body of the
-calling function; it computes the full 32-bit address of the callee
-and jumps to it.
+@item -mbitops
+@opindex mbitops
+Enable the use of bit manipulation instructions on SH2A.
 
-On Mach-O (Darwin) systems, this option directs the compiler emit to
-the glue for every direct call, and the Darwin linker decides whether
-to use or discard it.
+@item -mfmovd
+@opindex mfmovd
+Enable the use of the instruction @code{fmovd}.  Check @option{-mdalign} for
+alignment constraints.
 
-In the future, GCC may ignore all longcall specifications
-when the linker is known to generate glue.
+@item -mrenesas
+@opindex mrenesas
+Comply with the calling conventions defined by Renesas.
 
-@item -mtls-markers
-@itemx -mno-tls-markers
-@opindex mtls-markers
-@opindex mno-tls-markers
-Mark (do not mark) calls to @code{__tls_get_addr} with a relocation
-specifying the function argument.  The relocation allows the linker to
-reliably associate function call with argument setup instructions for
-TLS optimization, which in turn allows GCC to better schedule the
-sequence.
+@item -mno-renesas
+@opindex mno-renesas
+Comply with the calling conventions defined for GCC before the Renesas
+conventions were available.  This option is the default for all
+targets of the SH toolchain.
 
-@item -pthread
-@opindex pthread
-Adds support for multithreading with the @dfn{pthreads} library.
-This option sets flags for both the preprocessor and linker.
+@item -mnomacsave
+@opindex mnomacsave
+Mark the @code{MAC} register as call-clobbered, even if
+@option{-mrenesas} is given.
 
-@item -mrecip
-@itemx -mno-recip
-@opindex mrecip
-This option enables use of the reciprocal estimate and
-reciprocal square root estimate instructions with additional
-Newton-Raphson steps to increase precision instead of doing a divide or
-square root and divide for floating-point arguments.  You should use
-the @option{-ffast-math} option when using @option{-mrecip} (or at
-least @option{-funsafe-math-optimizations},
-@option{-finite-math-only}, @option{-freciprocal-math} and
-@option{-fno-trapping-math}).  Note that while the throughput of the
-sequence is generally higher than the throughput of the non-reciprocal
-instruction, the precision of the sequence can be decreased by up to 2
-ulp (i.e.@: the inverse of 1.0 equals 0.99999994) for reciprocal square
-roots.
+@item -mieee
+@itemx -mno-ieee
+@opindex mieee
+@opindex mno-ieee
+Control the IEEE compliance of floating-point comparisons, which affects the
+handling of cases where the result of a comparison is unordered.  By default
+@option{-mieee} is implicitly enabled.  If @option{-ffinite-math-only} is
+enabled @option{-mno-ieee} is implicitly set, which results in faster
+floating-point greater-equal and less-equal comparisons.  The implicit settings
+can be overridden by specifying either @option{-mieee} or @option{-mno-ieee}.
 
-@item -mrecip=@var{opt}
-@opindex mrecip=opt
-This option controls which reciprocal estimate instructions
-may be used.  @var{opt} is a comma-separated list of options, which may
-be preceded by a @code{!} to invert the option:
+@item -minline-ic_invalidate
+@opindex minline-ic_invalidate
+Inline code to invalidate instruction cache entries after setting up
+nested function trampolines.
+This option has no effect if @option{-musermode} is in effect and the selected
+code generation option (e.g.@: @option{-m4}) does not allow the use of the @code{icbi}
+instruction.
+If the selected code generation option does not allow the use of the @code{icbi}
+instruction, and @option{-musermode} is not in effect, the inlined code
+manipulates the instruction cache address array directly with an associative
+write.  This not only requires privileged mode at run time, but it also
+fails if the cache line had been mapped via the TLB and has become unmapped.
 
-@table @samp
+@item -misize
+@opindex misize
+Dump instruction size and location in the assembly code.
 
-@item all
-Enable all estimate instructions.
+@item -mpadstruct
+@opindex mpadstruct
+This option is deprecated.  It pads structures to a multiple of 4 bytes,
+which is incompatible with the SH ABI@.
 
-@item default 
-Enable the default instructions, equivalent to @option{-mrecip}.
+@item -matomic-model=@var{model}
+@opindex matomic-model=@var{model}
+Sets the model of atomic operations and additional parameters as a comma
+separated list.  For details on the atomic built-in functions see
+@ref{__atomic Builtins}.  The following models and parameters are supported:
 
-@item none 
-Disable all estimate instructions, equivalent to @option{-mno-recip}.
+@table @samp
 
-@item div 
-Enable the reciprocal approximation instructions for both 
-single and double precision.
+@item none
+Disable compiler generated atomic sequences and emit library calls for atomic
+operations.  This is the default if the target is not @code{sh*-*-linux*}.
 
-@item divf 
-Enable the single-precision reciprocal approximation instructions.
+@item soft-gusa
+Generate GNU/Linux compatible gUSA software atomic sequences for the atomic
+built-in functions.  The generated atomic sequences require additional support
+from the interrupt/exception handling code of the system and are only suitable
+for SH3* and SH4* single-core systems.  This option is enabled by default when
+the target is @code{sh*-*-linux*} and SH3* or SH4*.  When the target is SH4A,
+this option also partially utilizes the hardware atomic instructions
+@code{movli.l} and @code{movco.l} to create more efficient code, unless
+@samp{strict} is specified.
 
-@item divd 
-Enable the double-precision reciprocal approximation instructions.
+@item soft-tcb
+Generate software atomic sequences that use a variable in the thread control
+block.  This is a variation of the gUSA sequences which can also be used on
+SH1* and SH2* targets.  The generated atomic sequences require additional
+support from the interrupt/exception handling code of the system and are only
+suitable for single-core systems.  When using this model, the @samp{gbr-offset=}
+parameter has to be specified as well.
 
-@item rsqrt 
-Enable the reciprocal square root approximation instructions for both
-single and double precision.
+@item soft-imask
+Generate software atomic sequences that temporarily disable interrupts by
+setting @code{SR.IMASK = 1111}.  This model works only when the program runs
+in privileged mode and is only suitable for single-core systems.  Additional
+support from the interrupt/exception handling code of the system is not
+required.  This model is enabled by default when the target is
+@code{sh*-*-linux*} and SH1* or SH2*.
 
-@item rsqrtf 
-Enable the single-precision reciprocal square root approximation instructions.
+@item hard-llcs
+Generate hardware atomic sequences using the @code{movli.l} and @code{movco.l}
+instructions only.  This is only available on SH4A and is suitable for
+multi-core systems.  Since the hardware instructions support only 32-bit atomic
+variables, access to 8-bit or 16-bit variables is emulated with 32-bit accesses.
+Code compiled with this option is also compatible with other software
+atomic model interrupt/exception handling systems if executed on an SH4A
+system.  Additional support from the interrupt/exception handling code of the
+system is not required for this model.
 
-@item rsqrtd 
-Enable the double-precision reciprocal square root approximation instructions.
+@item gbr-offset=
+This parameter specifies the offset in bytes of the variable in the thread
+control block structure that should be used by the generated atomic sequences
+when the @samp{soft-tcb} model has been selected.  For other models this
+parameter is ignored.  The specified value must be an integer multiple of four
+and in the range 0-1020.
+
+@item strict
+This parameter prevents mixed usage of multiple atomic models, even if they
+are compatible, and makes the compiler generate atomic sequences of the
+specified model only.
 
 @end table
 
-So, for example, @option{-mrecip=all,!rsqrtd} enables
-all of the reciprocal estimate instructions, except for the
-@code{FRSQRTE}, @code{XSRSQRTEDP}, and @code{XVRSQRTEDP} instructions
-which handle the double-precision reciprocal square root calculations.
+@item -mtas
+@opindex mtas
+Generate the @code{tas.b} opcode for @code{__atomic_test_and_set}.
+Notice that depending on the particular hardware and software configuration
+this can degrade overall performance due to the operand cache line flushes
+that are implied by the @code{tas.b} instruction.  On multi-core SH4A
+processors the @code{tas.b} instruction must be used with caution since it
+can result in data corruption for certain cache configurations.
 
-@item -mrecip-precision
-@itemx -mno-recip-precision
-@opindex mrecip-precision
-Assume (do not assume) that the reciprocal estimate instructions
-provide higher-precision estimates than is mandated by the PowerPC
-ABI.  Selecting @option{-mcpu=power6}, @option{-mcpu=power7} or
-@option{-mcpu=power8} automatically selects @option{-mrecip-precision}.
-The double-precision square root estimate instructions are not generated by
-default on low-precision machines, since they do not provide an
-estimate that converges after three steps.
+@item -mprefergot
+@opindex mprefergot
+When generating position-independent code, emit function calls using
+the Global Offset Table instead of the Procedure Linkage Table.
 
-@item -mveclibabi=@var{type}
-@opindex mveclibabi
-Specifies the ABI type to use for vectorizing intrinsics using an
-external library.  The only type supported at present is @samp{mass},
-which specifies to use IBM's Mathematical Acceleration Subsystem
-(MASS) libraries for vectorizing intrinsics using external libraries.
-GCC currently emits calls to @code{acosd2}, @code{acosf4},
-@code{acoshd2}, @code{acoshf4}, @code{asind2}, @code{asinf4},
-@code{asinhd2}, @code{asinhf4}, @code{atan2d2}, @code{atan2f4},
-@code{atand2}, @code{atanf4}, @code{atanhd2}, @code{atanhf4},
-@code{cbrtd2}, @code{cbrtf4}, @code{cosd2}, @code{cosf4},
-@code{coshd2}, @code{coshf4}, @code{erfcd2}, @code{erfcf4},
-@code{erfd2}, @code{erff4}, @code{exp2d2}, @code{exp2f4},
-@code{expd2}, @code{expf4}, @code{expm1d2}, @code{expm1f4},
-@code{hypotd2}, @code{hypotf4}, @code{lgammad2}, @code{lgammaf4},
-@code{log10d2}, @code{log10f4}, @code{log1pd2}, @code{log1pf4},
-@code{log2d2}, @code{log2f4}, @code{logd2}, @code{logf4},
-@code{powd2}, @code{powf4}, @code{sind2}, @code{sinf4}, @code{sinhd2},
-@code{sinhf4}, @code{sqrtd2}, @code{sqrtf4}, @code{tand2},
-@code{tanf4}, @code{tanhd2}, and @code{tanhf4} when generating code
-for power7.  Both @option{-ftree-vectorize} and
-@option{-funsafe-math-optimizations} must also be enabled.  The MASS
-libraries must be specified at link time.
+@item -musermode
+@itemx -mno-usermode
+@opindex musermode
+@opindex mno-usermode
+Don't allow (allow) the compiler to generate privileged-mode code.  Specifying
+@option{-musermode} also implies @option{-mno-inline-ic_invalidate} if the
+inlined code would not work in user mode.  @option{-musermode} is the default
+when the target is @code{sh*-*-linux*}.  If the target is SH1* or SH2*
+@option{-musermode} has no effect, since there is no user mode.
 
-@item -mfriz
-@itemx -mno-friz
-@opindex mfriz
-Generate (do not generate) the @code{friz} instruction when the
-@option{-funsafe-math-optimizations} option is used to optimize
-rounding of floating-point values to 64-bit integer and back to floating
-point.  The @code{friz} instruction does not return the same value if
-the floating-point number is too large to fit in an integer.
+@item -multcost=@var{number}
+@opindex multcost=@var{number}
+Set the cost to assume for a multiply insn.
 
-@item -mpointers-to-nested-functions
-@itemx -mno-pointers-to-nested-functions
-@opindex mpointers-to-nested-functions
-Generate (do not generate) code to load up the static chain register
-(@code{r11}) when calling through a pointer on AIX and 64-bit Linux
-systems where a function pointer points to a 3-word descriptor giving
-the function address, TOC value to be loaded in register @code{r2}, and
-static chain value to be loaded in register @code{r11}.  The
-@option{-mpointers-to-nested-functions} is on by default.  You cannot
-call through pointers to nested functions or pointers
-to functions compiled in other languages that use the static chain if
-you use the @option{-mno-pointers-to-nested-functions}.
+@item -mdiv=@var{strategy}
+@opindex mdiv=@var{strategy}
+Set the division strategy to be used for integer division operations.
+For SHmedia @var{strategy} can be one of: 
 
-@item -msave-toc-indirect
-@itemx -mno-save-toc-indirect
-@opindex msave-toc-indirect
-Generate (do not generate) code to save the TOC value in the reserved
-stack location in the function prologue if the function calls through
-a pointer on AIX and 64-bit Linux systems.  If the TOC value is not
-saved in the prologue, it is saved just before the call through the
-pointer.  The @option{-mno-save-toc-indirect} option is the default.
+@table @samp
 
-@item -mcompat-align-parm
-@itemx -mno-compat-align-parm
-@opindex mcompat-align-parm
-Generate (do not generate) code to pass structure parameters with a
-maximum alignment of 64 bits, for compatibility with older versions
-of GCC.
+@item fp 
+Performs the operation in floating point.  This has a very high latency,
+but needs only a few instructions, so it might be a good choice if
+your code has enough easily-exploitable ILP to allow the compiler to
+schedule the floating-point instructions together with other instructions.
+Division by zero causes a floating-point exception.
 
-Older versions of GCC (prior to 4.9.0) incorrectly did not align a
-structure parameter on a 128-bit boundary when that structure contained
-a member requiring 128-bit alignment.  This is corrected in more
-recent versions of GCC.  This option may be used to generate code
-that is compatible with functions compiled with older versions of
-GCC.
+@item inv
+Uses integer operations to calculate the inverse of the divisor,
+and then multiplies the dividend with the inverse.  This strategy allows
+CSE and hoisting of the inverse calculation.  Division by zero calculates
+an unspecified result, but does not trap.
 
-The @option{-mno-compat-align-parm} option is the default.
-@end table
+@item inv:minlat
+A variant of @samp{inv} where, if no CSE or hoisting opportunities
+have been found, or if the entire operation has been hoisted to the same
+place, the last stages of the inverse calculation are intertwined with the
+final multiply to reduce the overall latency, at the expense of using a few
+more instructions, and thus offering fewer scheduling opportunities with
+other code.
 
-@node RX Options
-@subsection RX Options
-@cindex RX Options
+@item call
+Calls a library function that usually implements the @samp{inv:minlat}
+strategy.
+This gives high code density for @code{m5-*media-nofpu} compilations.
 
-These command-line options are defined for RX targets:
+@item call2
+Uses a different entry point of the same library function, where it
+assumes that a pointer to a lookup table has already been set up, which
+exposes the pointer load to CSE and code hoisting optimizations.
 
-@table @gcctabopt
-@item -m64bit-doubles
-@itemx -m32bit-doubles
-@opindex m64bit-doubles
-@opindex m32bit-doubles
-Make the @code{double} data type be 64 bits (@option{-m64bit-doubles})
-or 32 bits (@option{-m32bit-doubles}) in size.  The default is
-@option{-m32bit-doubles}.  @emph{Note} RX floating-point hardware only
-works on 32-bit values, which is why the default is
-@option{-m32bit-doubles}.
+@item inv:call
+@itemx inv:call2
+@itemx inv:fp
+Use the @samp{inv} algorithm for initial
+code generation, but if the code stays unoptimized, revert to the @samp{call},
+@samp{call2}, or @samp{fp} strategies, respectively.  Note that the
+potentially-trapping side effect of division by zero is carried by a
+separate instruction, so it is possible that all the integer instructions
+are hoisted out, but the marker for the side effect stays where it is.
+A recombination to floating-point operations or a call is not possible
+in that case.
 
-@item -fpu
-@itemx -nofpu
-@opindex fpu
-@opindex nofpu
-Enables (@option{-fpu}) or disables (@option{-nofpu}) the use of RX
-floating-point hardware.  The default is enabled for the RX600
-series and disabled for the RX200 series.
+@item inv20u
+@itemx inv20l
+Variants of the @samp{inv:minlat} strategy.  In the case
+that the inverse calculation is not separated from the multiply, they speed
+up division where the dividend fits into 20 bits (plus sign where applicable)
+by inserting a test to skip a number of operations in this case; this test
+slows down the case of larger dividends.  @samp{inv20u} assumes the case of such
+a small dividend to be unlikely, and @samp{inv20l} assumes it to be likely.
 
-Floating-point instructions are only generated for 32-bit floating-point 
-values, however, so the FPU hardware is not used for doubles if the
-@option{-m64bit-doubles} option is used.
+@end table
 
-@emph{Note} If the @option{-fpu} option is enabled then
-@option{-funsafe-math-optimizations} is also enabled automatically.
-This is because the RX FPU instructions are themselves unsafe.
+For targets other than SHmedia @var{strategy} can be one of:
 
-@item -mcpu=@var{name}
-@opindex mcpu
-Selects the type of RX CPU to be targeted.  Currently three types are
-supported, the generic @samp{RX600} and @samp{RX200} series hardware and
-the specific @samp{RX610} CPU.  The default is @samp{RX600}.
+@table @samp
 
-The only difference between @samp{RX600} and @samp{RX610} is that the
-@samp{RX610} does not support the @code{MVTIPL} instruction.
+@item call-div1
+Calls a library function that uses the single-step division instruction
+@code{div1} to perform the operation.  Division by zero calculates an
+unspecified result and does not trap.  This is the default except for SH4,
+SH2A and SHcompact.
 
-The @samp{RX200} series does not have a hardware floating-point unit
-and so @option{-nofpu} is enabled by default when this type is
-selected.
+@item call-fp
+Calls a library function that performs the operation in double-precision
+floating point.  Division by zero causes a floating-point exception.  This is
+the default for SHcompact with FPU.  Specifying this for targets that do not
+have a double-precision FPU defaults to @code{call-div1}.
 
-@item -mbig-endian-data
-@itemx -mlittle-endian-data
-@opindex mbig-endian-data
-@opindex mlittle-endian-data
-Store data (but not code) in the big-endian format.  The default is
-@option{-mlittle-endian-data}, i.e.@: to store data in the little-endian
-format.
+@item call-table
+Calls a library function that uses a lookup table for small divisors and
+the @code{div1} instruction with case distinction for larger divisors.  Division
+by zero calculates an unspecified result and does not trap.  This is the default
+for SH4.  Specifying this for targets that do not have dynamic shift
+instructions defaults to @code{call-div1}.
 
-@item -msmall-data-limit=@var{N}
-@opindex msmall-data-limit
-Specifies the maximum size in bytes of global and static variables
-which can be placed into the small data area.  Using the small data
-area can lead to smaller and faster code, but the size of area is
-limited and it is up to the programmer to ensure that the area does
-not overflow.  Also when the small data area is used one of the RX's
-registers (usually @code{r13}) is reserved for use pointing to this
-area, so it is no longer available for use by the compiler.  This
-could result in slower and/or larger code if variables are pushed onto
-the stack instead of being held in this register.
+@end table
 
-Note, common variables (variables that have not been initialized) and
-constants are not placed into the small data area as they are assigned
-to other sections in the output executable.
+When a division strategy has not been specified the default strategy is
+selected based on the current target.  For SH2A the default strategy is to
+use the @code{divs} and @code{divu} instructions instead of library function
+calls.
 
-The default value is zero, which disables this feature.  Note, this
-feature is not enabled by default with higher optimization levels
-(@option{-O2} etc) because of the potentially detrimental effects of
-reserving a register.  It is up to the programmer to experiment and
-discover whether this feature is of benefit to their program.  See the
-description of the @option{-mpid} option for a description of how the
-actual register to hold the small data area pointer is chosen.
+@item -maccumulate-outgoing-args
+@opindex maccumulate-outgoing-args
+Reserve space once for outgoing arguments in the function prologue rather
+than around each call.  Generally beneficial for performance and size.  Also
+needed for unwinding to avoid changing the stack frame around conditional code.
 
-@item -msim
-@itemx -mno-sim
-@opindex msim
-@opindex mno-sim
-Use the simulator runtime.  The default is to use the libgloss
-board-specific runtime.
+@item -mdivsi3_libfunc=@var{name}
+@opindex mdivsi3_libfunc=@var{name}
+Set the name of the library function used for 32-bit signed division to
+@var{name}.
+This only affects the name used in the @samp{call} and @samp{inv:call}
+division strategies, and the compiler still expects the same
+sets of input/output/clobbered registers as if this option were not present.
 
-@item -mas100-syntax
-@itemx -mno-as100-syntax
-@opindex mas100-syntax
-@opindex mno-as100-syntax
-When generating assembler output use a syntax that is compatible with
-Renesas's AS100 assembler.  This syntax can also be handled by the GAS
-assembler, but it has some restrictions so it is not generated by default.
+@item -mfixed-range=@var{register-range}
+@opindex mfixed-range
+Generate code treating the given register range as fixed registers.
+A fixed register is one that the register allocator cannot use.  This is
+useful when compiling kernel code.  A register range is specified as
+two registers separated by a dash.  Multiple register ranges can be
+specified separated by a comma.
 
-@item -mmax-constant-size=@var{N}
-@opindex mmax-constant-size
-Specifies the maximum size, in bytes, of a constant that can be used as
-an operand in a RX instruction.  Although the RX instruction set does
-allow constants of up to 4 bytes in length to be used in instructions,
-a longer value equates to a longer instruction.  Thus in some
-circumstances it can be beneficial to restrict the size of constants
-that are used in instructions.  Constants that are too big are instead
-placed into a constant pool and referenced via register indirection.
+@item -mindexed-addressing
+@opindex mindexed-addressing
+Enable the use of the indexed addressing mode for SHmedia32/SHcompact.
+This is only safe if the hardware and/or OS implement 32-bit wrap-around
+semantics for the indexed addressing mode.  The architecture allows the
+implementation of processors with 64-bit MMU, which the OS could use to
+get 32-bit addressing, but since no current hardware implementation supports
+this or any other way to make the indexed addressing mode safe to use in
+the 32-bit ABI, the default is @option{-mno-indexed-addressing}.
 
-The value @var{N} can be between 0 and 4.  A value of 0 (the default)
-or 4 means that constants of any size are allowed.
+@item -mgettrcost=@var{number}
+@opindex mgettrcost=@var{number}
+Set the cost assumed for the @code{gettr} instruction to @var{number}.
+The default is 2 if @option{-mpt-fixed} is in effect, 100 otherwise.
 
-@item -mrelax
-@opindex mrelax
-Enable linker relaxation.  Linker relaxation is a process whereby the
-linker attempts to reduce the size of a program by finding shorter
-versions of various instructions.  Disabled by default.
+@item -mpt-fixed
+@opindex mpt-fixed
+Assume @code{pt*} instructions won't trap.  This generally generates
+better-scheduled code, but is unsafe on current hardware.
+The current architecture
+definition says that @code{ptabs} and @code{ptrel} trap when the target 
+anded with 3 is 3.
+This has the unintentional effect of making it unsafe to schedule these
+instructions before a branch, or hoist them out of a loop.  For example,
+@code{__do_global_ctors}, a part of @file{libgcc}
+that runs constructors at program
+startup, calls functions in a list which is delimited by @minus{}1.  With the
+@option{-mpt-fixed} option, the @code{ptabs} is done before testing against @minus{}1.
+That means that all the constructors run a bit more quickly, but when
+the loop comes to the end of the list, the program crashes because @code{ptabs}
+loads @minus{}1 into a target register.  
 
-@item -mint-register=@var{N}
-@opindex mint-register
-Specify the number of registers to reserve for fast interrupt handler
-functions.  The value @var{N} can be between 0 and 4.  A value of 1
-means that register @code{r13} is reserved for the exclusive use
-of fast interrupt handlers.  A value of 2 reserves @code{r13} and
-@code{r12}.  A value of 3 reserves @code{r13}, @code{r12} and
-@code{r11}, and a value of 4 reserves @code{r13} through @code{r10}.
-A value of 0, the default, does not reserve any registers.
+Since this option is unsafe for any
+hardware implementing the current architecture specification, the default
+is @option{-mno-pt-fixed}.  Unless specified explicitly with 
+@option{-mgettrcost}, @option{-mno-pt-fixed} also implies @option{-mgettrcost=100};
+this deters register allocation from using target registers for storing
+ordinary integers.
 
-@item -msave-acc-in-interrupts
-@opindex msave-acc-in-interrupts
-Specifies that interrupt handler functions should preserve the
-accumulator register.  This is only necessary if normal code might use
-the accumulator register, for example because it performs 64-bit
-multiplications.  The default is to ignore the accumulator as this
-makes the interrupt handlers faster.
+@item -minvalid-symbols
+@opindex minvalid-symbols
+Assume symbols might be invalid.  Ordinary function symbols generated by
+the compiler are always valid to load with
+@code{movi}/@code{shori}/@code{ptabs} or
+@code{movi}/@code{shori}/@code{ptrel},
+but with assembler and/or linker tricks it is possible
+to generate symbols that cause @code{ptabs} or @code{ptrel} to trap.
+This option is only meaningful when @option{-mno-pt-fixed} is in effect.
+It prevents cross-basic-block CSE, hoisting and most scheduling
+of symbol loads.  The default is @option{-mno-invalid-symbols}.
 
-@item -mpid
-@itemx -mno-pid
-@opindex mpid
-@opindex mno-pid
-Enables the generation of position independent data.  When enabled any
-access to constant data is done via an offset from a base address
-held in a register.  This allows the location of constant data to be
-determined at run time without requiring the executable to be
-relocated, which is a benefit to embedded applications with tight
-memory constraints.  Data that can be modified is not affected by this
-option.
+@item -mbranch-cost=@var{num}
+@opindex mbranch-cost=@var{num}
+Assume @var{num} to be the cost for a branch instruction.  Higher numbers
+make the compiler try to generate more branch-free code if possible.  
+If not specified the value is selected depending on the processor type that
+is being compiled for.
 
-Note, using this feature reserves a register, usually @code{r13}, for
-the constant data base address.  This can result in slower and/or
-larger code, especially in complicated functions.
+@item -mzdcbranch
+@itemx -mno-zdcbranch
+@opindex mzdcbranch
+@opindex mno-zdcbranch
+Assume (do not assume) that zero displacement conditional branch instructions
+@code{bt} and @code{bf} are fast.  If @option{-mzdcbranch} is specified, the
+compiler prefers zero displacement branch code sequences.  This is
+enabled by default when generating code for SH4 and SH4A.  It can be explicitly
+disabled by specifying @option{-mno-zdcbranch}.
 
-The actual register chosen to hold the constant data base address
-depends upon whether the @option{-msmall-data-limit} and/or the
-@option{-mint-register} command-line options are enabled.  Starting
-with register @code{r13} and proceeding downwards, registers are
-allocated first to satisfy the requirements of @option{-mint-register},
-then @option{-mpid} and finally @option{-msmall-data-limit}.  Thus it
-is possible for the small data area register to be @code{r8} if both
-@option{-mint-register=4} and @option{-mpid} are specified on the
-command line.
+@item -mfused-madd
+@itemx -mno-fused-madd
+@opindex mfused-madd
+@opindex mno-fused-madd
+Generate code that uses (does not use) the floating-point multiply and
+accumulate instructions.  These instructions are generated by default
+if hardware floating point is used.  The machine-dependent
+@option{-mfused-madd} option is now mapped to the machine-independent
+@option{-ffp-contract=fast} option, and @option{-mno-fused-madd} is
+mapped to @option{-ffp-contract=off}.
 
-By default this feature is not enabled.  The default can be restored
-via the @option{-mno-pid} command-line option.
+@item -mfsca
+@itemx -mno-fsca
+@opindex mfsca
+@opindex mno-fsca
+Allow or disallow the compiler to emit the @code{fsca} instruction for sine
+and cosine approximations.  The option @option{-mfsca} must be used in
+combination with @option{-funsafe-math-optimizations}.  It is enabled by default
+when generating code for SH4A.  Using @option{-mno-fsca} disables sine and cosine
+approximations even if @option{-funsafe-math-optimizations} is in effect.
 
-@item -mno-warn-multiple-fast-interrupts
-@itemx -mwarn-multiple-fast-interrupts
-@opindex mno-warn-multiple-fast-interrupts
-@opindex mwarn-multiple-fast-interrupts
-Prevents GCC from issuing a warning message if it finds more than one
-fast interrupt handler when it is compiling a file.  The default is to
-issue a warning for each extra fast interrupt handler found, as the RX
-only supports one such interrupt.
+@item -mfsrra
+@itemx -mno-fsrra
+@opindex mfsrra
+@opindex mno-fsrra
+Allow or disallow the compiler to emit the @code{fsrra} instruction for
+reciprocal square root approximations.  The option @option{-mfsrra} must be used
+in combination with @option{-funsafe-math-optimizations} and
+@option{-ffinite-math-only}.  It is enabled by default when generating code for
+SH4A.  Using @option{-mno-fsrra} disables reciprocal square root approximations
+even if @option{-funsafe-math-optimizations} and @option{-ffinite-math-only} are
+in effect.
 
-@end table
+@item -mpretend-cmove
+@opindex mpretend-cmove
+Prefer zero-displacement conditional branches for conditional move instruction
+patterns.  This can result in faster code on the SH4 processor.
 
-@emph{Note:} The generic GCC command-line option @option{-ffixed-@var{reg}}
-has special significance to the RX port when used with the
-@code{interrupt} function attribute.  This attribute indicates a
-function intended to process fast interrupts.  GCC ensures
-that it only uses the registers @code{r10}, @code{r11}, @code{r12}
-and/or @code{r13} and only provided that the normal use of the
-corresponding registers have been restricted via the
-@option{-ffixed-@var{reg}} or @option{-mint-register} command-line
-options.
+@end table
 
-@node S/390 and zSeries Options
-@subsection S/390 and zSeries Options
-@cindex S/390 and zSeries Options
+@node Solaris 2 Options
+@subsection Solaris 2 Options
+@cindex Solaris 2 options
 
-These are the @samp{-m} options defined for the S/390 and zSeries architecture.
+These @samp{-m} options are supported on Solaris 2:
 
 @table @gcctabopt
-@item -mhard-float
-@itemx -msoft-float
-@opindex mhard-float
-@opindex msoft-float
-Use (do not use) the hardware floating-point instructions and registers
-for floating-point operations.  When @option{-msoft-float} is specified,
-functions in @file{libgcc.a} are used to perform floating-point
-operations.  When @option{-mhard-float} is specified, the compiler
-generates IEEE floating-point instructions.  This is the default.
+@item -mclear-hwcap
+@opindex mclear-hwcap
+@option{-mclear-hwcap} tells the compiler to remove the hardware
+capabilities generated by the Solaris assembler.  This is only necessary
+when object files use ISA extensions not supported by the current
+machine, but check at runtime whether or not to use them.
 
-@item -mhard-dfp
-@itemx -mno-hard-dfp
-@opindex mhard-dfp
-@opindex mno-hard-dfp
-Use (do not use) the hardware decimal-floating-point instructions for
-decimal-floating-point operations.  When @option{-mno-hard-dfp} is
-specified, functions in @file{libgcc.a} are used to perform
-decimal-floating-point operations.  When @option{-mhard-dfp} is
-specified, the compiler generates decimal-floating-point hardware
-instructions.  This is the default for @option{-march=z9-ec} or higher.
+@item -mimpure-text
+@opindex mimpure-text
+@option{-mimpure-text}, used in addition to @option{-shared}, tells
+the compiler to not pass @option{-z text} to the linker when linking a
+shared object.  Using this option, you can link position-dependent
+code into a shared object.
 
-@item -mlong-double-64
-@itemx -mlong-double-128
-@opindex mlong-double-64
-@opindex mlong-double-128
-These switches control the size of @code{long double} type. A size
-of 64 bits makes the @code{long double} type equivalent to the @code{double}
-type. This is the default.
+@option{-mimpure-text} suppresses the ``relocations remain against
+allocatable but non-writable sections'' linker error message.
+However, the necessary relocations trigger copy-on-write, and the
+shared object is not actually shared across processes.  Instead of
+using @option{-mimpure-text}, you should compile all source code with
+@option{-fpic} or @option{-fPIC}.
 
-@item -mbackchain
-@itemx -mno-backchain
-@opindex mbackchain
-@opindex mno-backchain
-Store (do not store) the address of the caller's frame as backchain pointer
-into the callee's stack frame.
-A backchain may be needed to allow debugging using tools that do not understand
-DWARF 2 call frame information.
-When @option{-mno-packed-stack} is in effect, the backchain pointer is stored
-at the bottom of the stack frame; when @option{-mpacked-stack} is in effect,
-the backchain is placed into the topmost word of the 96/160 byte register
-save area.
-
-In general, code compiled with @option{-mbackchain} is call-compatible with
-code compiled with @option{-mmo-backchain}; however, use of the backchain
-for debugging purposes usually requires that the whole binary is built with
-@option{-mbackchain}.  Note that the combination of @option{-mbackchain},
-@option{-mpacked-stack} and @option{-mhard-float} is not supported.  In order
-to build a linux kernel use @option{-msoft-float}.
+@end table
 
-The default is to not maintain the backchain.
+These switches are supported in addition to the above on Solaris 2:
 
-@item -mpacked-stack
-@itemx -mno-packed-stack
-@opindex mpacked-stack
-@opindex mno-packed-stack
-Use (do not use) the packed stack layout.  When @option{-mno-packed-stack} is
-specified, the compiler uses the all fields of the 96/160 byte register save
-area only for their default purpose; unused fields still take up stack space.
-When @option{-mpacked-stack} is specified, register save slots are densely
-packed at the top of the register save area; unused space is reused for other
-purposes, allowing for more efficient use of the available stack space.
-However, when @option{-mbackchain} is also in effect, the topmost word of
-the save area is always used to store the backchain, and the return address
-register is always saved two words below the backchain.
+@table @gcctabopt
+@item -pthreads
+@opindex pthreads
+Add support for multithreading using the POSIX threads library.  This
+option sets flags for both the preprocessor and linker.  This option does
+not affect the thread safety of object code produced by the compiler or
+that of libraries supplied with it.
 
-As long as the stack frame backchain is not used, code generated with
-@option{-mpacked-stack} is call-compatible with code generated with
-@option{-mno-packed-stack}.  Note that some non-FSF releases of GCC 2.95 for
-S/390 or zSeries generated code that uses the stack frame backchain at run
-time, not just for debugging purposes.  Such code is not call-compatible
-with code compiled with @option{-mpacked-stack}.  Also, note that the
-combination of @option{-mbackchain},
-@option{-mpacked-stack} and @option{-mhard-float} is not supported.  In order
-to build a linux kernel use @option{-msoft-float}.
+@item -pthread
+@opindex pthread
+This is a synonym for @option{-pthreads}.
+@end table
 
-The default is to not use the packed stack layout.
+@node SPARC Options
+@subsection SPARC Options
+@cindex SPARC options
 
-@item -msmall-exec
-@itemx -mno-small-exec
-@opindex msmall-exec
-@opindex mno-small-exec
-Generate (or do not generate) code using the @code{bras} instruction
-to do subroutine calls.
-This only works reliably if the total executable size does not
-exceed 64k.  The default is to use the @code{basr} instruction instead,
-which does not have this limitation.
+These @samp{-m} options are supported on the SPARC:
 
-@item -m64
-@itemx -m31
-@opindex m64
-@opindex m31
-When @option{-m31} is specified, generate code compliant to the
-GNU/Linux for S/390 ABI@.  When @option{-m64} is specified, generate
-code compliant to the GNU/Linux for zSeries ABI@.  This allows GCC in
-particular to generate 64-bit instructions.  For the @samp{s390}
-targets, the default is @option{-m31}, while the @samp{s390x}
-targets default to @option{-m64}.
+@table @gcctabopt
+@item -mno-app-regs
+@itemx -mapp-regs
+@opindex mno-app-regs
+@opindex mapp-regs
+Specify @option{-mapp-regs} to generate output using the global registers
+2 through 4, which the SPARC SVR4 ABI reserves for applications.  Like the
+global register 1, each global register 2 through 4 is then treated as an
+allocable register that is clobbered by function calls.  This is the default.
 
-@item -mzarch
-@itemx -mesa
-@opindex mzarch
-@opindex mesa
-When @option{-mzarch} is specified, generate code using the
-instructions available on z/Architecture.
-When @option{-mesa} is specified, generate code using the
-instructions available on ESA/390.  Note that @option{-mesa} is
-not possible with @option{-m64}.
-When generating code compliant to the GNU/Linux for S/390 ABI,
-the default is @option{-mesa}.  When generating code compliant
-to the GNU/Linux for zSeries ABI, the default is @option{-mzarch}.
+To be fully SVR4 ABI-compliant at the cost of some performance loss,
+specify @option{-mno-app-regs}.  You should compile libraries and system
+software with this option.
 
-@item -mmvcle
-@itemx -mno-mvcle
-@opindex mmvcle
-@opindex mno-mvcle
-Generate (or do not generate) code using the @code{mvcle} instruction
-to perform block moves.  When @option{-mno-mvcle} is specified,
-use a @code{mvc} loop instead.  This is the default unless optimizing for
-size.
+@item -mflat
+@itemx -mno-flat
+@opindex mflat
+@opindex mno-flat
+With @option{-mflat}, the compiler does not generate save/restore instructions
+and uses a ``flat'' or single register window model.  This model is compatible
+with the regular register window model.  The local registers and the input
+registers (0--5) are still treated as ``call-saved'' registers and are
+saved on the stack as needed.
 
-@item -mdebug
-@itemx -mno-debug
-@opindex mdebug
-@opindex mno-debug
-Print (or do not print) additional debug information when compiling.
-The default is to not print debug information.
+With @option{-mno-flat} (the default), the compiler generates save/restore
+instructions (except for leaf functions).  This is the normal operating mode.
 
-@item -march=@var{cpu-type}
-@opindex march
-Generate code that runs on @var{cpu-type}, which is the name of a system
-representing a certain processor type.  Possible values for
-@var{cpu-type} are @samp{g5}, @samp{g6}, @samp{z900}, @samp{z990},
-@samp{z9-109}, @samp{z9-ec} and @samp{z10}.
-When generating code using the instructions available on z/Architecture,
-the default is @option{-march=z900}.  Otherwise, the default is
-@option{-march=g5}.
+@item -mfpu
+@itemx -mhard-float
+@opindex mfpu
+@opindex mhard-float
+Generate output containing floating-point instructions.  This is the
+default.
 
-@item -mtune=@var{cpu-type}
-@opindex mtune
-Tune to @var{cpu-type} everything applicable about the generated code,
-except for the ABI and the set of available instructions.
-The list of @var{cpu-type} values is the same as for @option{-march}.
-The default is the value used for @option{-march}.
+@item -mno-fpu
+@itemx -msoft-float
+@opindex mno-fpu
+@opindex msoft-float
+Generate output containing library calls for floating point.
+@strong{Warning:} the requisite libraries are not available for all SPARC
+targets.  Normally the facilities of the machine's usual C compiler are
+used, but this cannot be done directly in cross-compilation.  You must make
+your own arrangements to provide suitable library functions for
+cross-compilation.  The embedded targets @samp{sparc-*-aout} and
+@samp{sparclite-*-*} do provide software floating-point support.
 
-@item -mtpf-trace
-@itemx -mno-tpf-trace
-@opindex mtpf-trace
-@opindex mno-tpf-trace
-Generate code that adds (does not add) in TPF OS specific branches to trace
-routines in the operating system.  This option is off by default, even
-when compiling for the TPF OS@.
+@option{-msoft-float} changes the calling convention in the output file;
+therefore, it is only useful if you compile @emph{all} of a program with
+this option.  In particular, you need to compile @file{libgcc.a}, the
+library that comes with GCC, with @option{-msoft-float} in order for
+this to work.
 
-@item -mfused-madd
-@itemx -mno-fused-madd
-@opindex mfused-madd
-@opindex mno-fused-madd
-Generate code that uses (does not use) the floating-point multiply and
-accumulate instructions.  These instructions are generated by default if
-hardware floating point is used.
+@item -mhard-quad-float
+@opindex mhard-quad-float
+Generate output containing quad-word (long double) floating-point
+instructions.
 
-@item -mwarn-framesize=@var{framesize}
-@opindex mwarn-framesize
-Emit a warning if the current function exceeds the given frame size.  Because
-this is a compile-time check it doesn't need to be a real problem when the program
-runs.  It is intended to identify functions that most probably cause
-a stack overflow.  It is useful to be used in an environment with limited stack
-size e.g.@: the linux kernel.
+@item -msoft-quad-float
+@opindex msoft-quad-float
+Generate output containing library calls for quad-word (long double)
+floating-point instructions.  The functions called are those specified
+in the SPARC ABI@.  This is the default.
 
-@item -mwarn-dynamicstack
-@opindex mwarn-dynamicstack
-Emit a warning if the function calls @code{alloca} or uses dynamically-sized
-arrays.  This is generally a bad idea with a limited stack size.
+As of this writing, there are no SPARC implementations that have hardware
+support for the quad-word floating-point instructions.  They all invoke
+a trap handler for one of these instructions, and then the trap handler
+emulates the effect of the instruction.  Because of the trap handler overhead,
+this is much slower than calling the ABI library routines.  Thus the
+@option{-msoft-quad-float} option is the default.
 
-@item -mstack-guard=@var{stack-guard}
-@itemx -mstack-size=@var{stack-size}
-@opindex mstack-guard
-@opindex mstack-size
-If these options are provided the S/390 back end emits additional instructions in
-the function prologue that trigger a trap if the stack size is @var{stack-guard}
-bytes above the @var{stack-size} (remember that the stack on S/390 grows downward).
-If the @var{stack-guard} option is omitted the smallest power of 2 larger than
-the frame size of the compiled function is chosen.
-These options are intended to be used to help debugging stack overflow problems.
-The additionally emitted code causes only little overhead and hence can also be
-used in production-like systems without greater performance degradation.  The given
-values have to be exact powers of 2 and @var{stack-size} has to be greater than
-@var{stack-guard} without exceeding 64k.
-In order to be efficient the extra code makes the assumption that the stack starts
-at an address aligned to the value given by @var{stack-size}.
-The @var{stack-guard} option can only be used in conjunction with @var{stack-size}.
+@item -mno-unaligned-doubles
+@itemx -munaligned-doubles
+@opindex mno-unaligned-doubles
+@opindex munaligned-doubles
+Assume that doubles have 8-byte alignment.  This is the default.
 
-@item -mhotpatch=@var{pre-halfwords},@var{post-halfwords}
-@opindex mhotpatch
-If the hotpatch option is enabled, a ``hot-patching'' function
-prologue is generated for all functions in the compilation unit.
-The funtion label is prepended with the given number of two-byte
-Nop instructions (@var{pre-halfwords}, maximum 1000000).  After
-the label, 2 * @var{post-halfwords} bytes are appended, using the
-larges nop like instructions the architecture allows (maximum
-1000000).
+With @option{-munaligned-doubles}, GCC assumes that doubles have 8-byte
+alignment only if they are contained in another type, or if they have an
+absolute address.  Otherwise, it assumes they have 4-byte alignment.
+Specifying this option avoids some rare compatibility problems with code
+generated by other compilers.  It is not the default because it results
+in a performance loss, especially for floating-point code.
 
-If both arguments are zero, hotpatching is disabled.
+@item -muser-mode
+@itemx -mno-user-mode
+@opindex muser-mode
+@opindex mno-user-mode
+Do not generate code that can only run in supervisor mode.  This is relevant
+only for the @code{casa} instruction emitted for the LEON3 processor.  The
+default is @option{-mno-user-mode}.
 
-This option can be overridden for individual functions with the
-@code{hotpatch} attribute.
-@end table
+@item -mno-faster-structs
+@itemx -mfaster-structs
+@opindex mno-faster-structs
+@opindex mfaster-structs
+With @option{-mfaster-structs}, the compiler assumes that structures
+should have 8-byte alignment.  This enables the use of pairs of
+@code{ldd} and @code{std} instructions for copies in structure
+assignment, in place of twice as many @code{ld} and @code{st} pairs.
+However, the use of this changed alignment directly violates the SPARC
+ABI@.  Thus, it's intended only for use on targets where the developer
+acknowledges that their resulting code is not directly in line with
+the rules of the ABI@.
 
-@node Score Options
-@subsection Score Options
-@cindex Score Options
-
-These options are defined for Score implementations:
-
-@table @gcctabopt
-@item -meb
-@opindex meb
-Compile code for big-endian mode.  This is the default.
+@item -mcpu=@var{cpu_type}
+@opindex mcpu
+Set the instruction set, register set, and instruction scheduling parameters
+for machine type @var{cpu_type}.  Supported values for @var{cpu_type} are
+@samp{v7}, @samp{cypress}, @samp{v8}, @samp{supersparc}, @samp{hypersparc},
+@samp{leon}, @samp{leon3}, @samp{leon3v7}, @samp{sparclite}, @samp{f930},
+@samp{f934}, @samp{sparclite86x}, @samp{sparclet}, @samp{tsc701}, @samp{v9},
+@samp{ultrasparc}, @samp{ultrasparc3}, @samp{niagara}, @samp{niagara2},
+@samp{niagara3} and @samp{niagara4}.
 
-@item -mel
-@opindex mel
-Compile code for little-endian mode.
+Native Solaris and GNU/Linux toolchains also support the value @samp{native},
+which selects the best architecture option for the host processor.
+@option{-mcpu=native} has no effect if GCC does not recognize
+the processor.
 
-@item -mnhwloop
-@opindex mnhwloop
-Disable generation of @code{bcnz} instructions.
+Default instruction scheduling parameters are used for values that select
+an architecture and not an implementation.  These are @samp{v7}, @samp{v8},
+@samp{sparclite}, @samp{sparclet}, @samp{v9}.
 
-@item -muls
-@opindex muls
-Enable generation of unaligned load and store instructions.
+Here is a list of each supported architecture and its supported
+implementations.
 
-@item -mmac
-@opindex mmac
-Enable the use of multiply-accumulate instructions. Disabled by default.
+@table @asis
+@item v7
+cypress, leon3v7
 
-@item -mscore5
-@opindex mscore5
-Specify the SCORE5 as the target architecture.
+@item v8
+supersparc, hypersparc, leon, leon3
 
-@item -mscore5u
-@opindex mscore5u
-Specify the SCORE5U of the target architecture.
+@item sparclite
+f930, f934, sparclite86x
 
-@item -mscore7
-@opindex mscore7
-Specify the SCORE7 as the target architecture. This is the default.
+@item sparclet
+tsc701
 
-@item -mscore7d
-@opindex mscore7d
-Specify the SCORE7D as the target architecture.
+@item v9
+ultrasparc, ultrasparc3, niagara, niagara2, niagara3, niagara4
 @end table
 
-@node SH Options
-@subsection SH Options
+By default (unless configured otherwise), GCC generates code for the V7
+variant of the SPARC architecture.  With @option{-mcpu=cypress}, the compiler
+additionally optimizes it for the Cypress CY7C602 chip, as used in the
+SPARCStation/SPARCServer 3xx series.  This is also appropriate for the older
+SPARCStation 1, 2, IPX etc.
 
-These @samp{-m} options are defined for the SH implementations:
+With @option{-mcpu=v8}, GCC generates code for the V8 variant of the SPARC
+architecture.  The only difference from V7 code is that the compiler emits
+the integer multiply and integer divide instructions which exist in SPARC-V8
+but not in SPARC-V7.  With @option{-mcpu=supersparc}, the compiler additionally
+optimizes it for the SuperSPARC chip, as used in the SPARCStation 10, 1000 and
+2000 series.
 
-@table @gcctabopt
-@item -m1
-@opindex m1
-Generate code for the SH1.
+With @option{-mcpu=sparclite}, GCC generates code for the SPARClite variant of
+the SPARC architecture.  This adds the integer multiply, integer divide step
+and scan (@code{ffs}) instructions which exist in SPARClite but not in SPARC-V7.
+With @option{-mcpu=f930}, the compiler additionally optimizes it for the
+Fujitsu MB86930 chip, which is the original SPARClite, with no FPU@.  With
+@option{-mcpu=f934}, the compiler additionally optimizes it for the Fujitsu
+MB86934 chip, which is the more recent SPARClite with FPU@.
 
-@item -m2
-@opindex m2
-Generate code for the SH2.
+With @option{-mcpu=sparclet}, GCC generates code for the SPARClet variant of
+the SPARC architecture.  This adds the integer multiply, multiply/accumulate,
+integer divide step and scan (@code{ffs}) instructions which exist in SPARClet
+but not in SPARC-V7.  With @option{-mcpu=tsc701}, the compiler additionally
+optimizes it for the TEMIC SPARClet chip.
 
-@item -m2e
-Generate code for the SH2e.
+With @option{-mcpu=v9}, GCC generates code for the V9 variant of the SPARC
+architecture.  This adds 64-bit integer and floating-point move instructions,
+3 additional floating-point condition code registers and conditional move
+instructions.  With @option{-mcpu=ultrasparc}, the compiler additionally
+optimizes it for the Sun UltraSPARC I/II/IIi chips.  With
+@option{-mcpu=ultrasparc3}, the compiler additionally optimizes it for the
+Sun UltraSPARC III/III+/IIIi/IIIi+/IV/IV+ chips.  With
+@option{-mcpu=niagara}, the compiler additionally optimizes it for
+Sun UltraSPARC T1 chips.  With @option{-mcpu=niagara2}, the compiler
+additionally optimizes it for Sun UltraSPARC T2 chips.  With
+@option{-mcpu=niagara3}, the compiler additionally optimizes it for Sun
+UltraSPARC T3 chips.  With @option{-mcpu=niagara4}, the compiler
+additionally optimizes it for Sun UltraSPARC T4 chips.
 
-@item -m2a-nofpu
-@opindex m2a-nofpu
-Generate code for the SH2a without FPU, or for a SH2a-FPU in such a way
-that the floating-point unit is not used.
+@item -mtune=@var{cpu_type}
+@opindex mtune
+Set the instruction scheduling parameters for machine type
+@var{cpu_type}, but do not set the instruction set or register set that the
+option @option{-mcpu=@var{cpu_type}} does.
 
-@item -m2a-single-only
-@opindex m2a-single-only
-Generate code for the SH2a-FPU, in such a way that no double-precision
-floating-point operations are used.
+The same values for @option{-mcpu=@var{cpu_type}} can be used for
+@option{-mtune=@var{cpu_type}}, but the only useful values are those
+that select a particular CPU implementation.  Those are @samp{cypress},
+@samp{supersparc}, @samp{hypersparc}, @samp{leon}, @samp{leon3},
+@samp{leon3v7}, @samp{f930}, @samp{f934}, @samp{sparclite86x}, @samp{tsc701},
+@samp{ultrasparc}, @samp{ultrasparc3}, @samp{niagara}, @samp{niagara2},
+@samp{niagara3} and @samp{niagara4}.  With native Solaris and GNU/Linux
+toolchains, @samp{native} can also be used.
 
-@item -m2a-single
-@opindex m2a-single
-Generate code for the SH2a-FPU assuming the floating-point unit is in
-single-precision mode by default.
+@item -mv8plus
+@itemx -mno-v8plus
+@opindex mv8plus
+@opindex mno-v8plus
+With @option{-mv8plus}, GCC generates code for the SPARC-V8+ ABI@.  The
+difference from the V8 ABI is that the global and out registers are
+considered 64 bits wide.  This is enabled by default on Solaris in 32-bit
+mode for all SPARC-V9 processors.
 
-@item -m2a
-@opindex m2a
-Generate code for the SH2a-FPU assuming the floating-point unit is in
-double-precision mode by default.
+@item -mvis
+@itemx -mno-vis
+@opindex mvis
+@opindex mno-vis
+With @option{-mvis}, GCC generates code that takes advantage of the UltraSPARC
+Visual Instruction Set extensions.  The default is @option{-mno-vis}.
 
-@item -m3
-@opindex m3
-Generate code for the SH3.
+@item -mvis2
+@itemx -mno-vis2
+@opindex mvis2
+@opindex mno-vis2
+With @option{-mvis2}, GCC generates code that takes advantage of
+version 2.0 of the UltraSPARC Visual Instruction Set extensions.  The
+default is @option{-mvis2} when targeting a CPU that supports such
+instructions, such as UltraSPARC-III and later.  Setting @option{-mvis2}
+also sets @option{-mvis}.
 
-@item -m3e
-@opindex m3e
-Generate code for the SH3e.
+@item -mvis3
+@itemx -mno-vis3
+@opindex mvis3
+@opindex mno-vis3
+With @option{-mvis3}, GCC generates code that takes advantage of
+version 3.0 of the UltraSPARC Visual Instruction Set extensions.  The
+default is @option{-mvis3} when targeting a CPU that supports such
+instructions, such as Niagara-3 and later.  Setting @option{-mvis3}
+also sets @option{-mvis2} and @option{-mvis}.
 
-@item -m4-nofpu
-@opindex m4-nofpu
-Generate code for the SH4 without a floating-point unit.
+@item -mcbcond
+@itemx -mno-cbcond
+@opindex mcbcond
+@opindex mno-cbcond
+With @option{-mcbcond}, GCC generates code that takes advantage of
+compare-and-branch instructions, as defined in the SPARC Architecture 2011.
+The default is @option{-mcbcond} when targeting a CPU that supports such
+instructions, such as Niagara-4 and later.
 
-@item -m4-single-only
-@opindex m4-single-only
-Generate code for the SH4 with a floating-point unit that only
-supports single-precision arithmetic.
+@item -mpopc
+@itemx -mno-popc
+@opindex mpopc
+@opindex mno-popc
+With @option{-mpopc}, GCC generates code that takes advantage of the UltraSPARC
+population count instruction.  The default is @option{-mpopc}
+when targeting a CPU that supports such instructions, such as Niagara-2 and
+later.
 
-@item -m4-single
-@opindex m4-single
-Generate code for the SH4 assuming the floating-point unit is in
-single-precision mode by default.
+@item -mfmaf
+@itemx -mno-fmaf
+@opindex mfmaf
+@opindex mno-fmaf
+With @option{-mfmaf}, GCC generates code that takes advantage of the UltraSPARC
+Fused Multiply-Add Floating-point extensions.  The default is @option{-mfmaf}
+when targeting a CPU that supports such instructions, such as Niagara-3 and
+later.
 
-@item -m4
-@opindex m4
-Generate code for the SH4.
+@item -mfix-at697f
+@opindex mfix-at697f
+Enable the documented workaround for the single erratum of the Atmel AT697F
+processor (which corresponds to erratum #13 of the AT697E processor).
 
-@item -m4-100
-@opindex m4-100
-Generate code for SH4-100.
+@item -mfix-ut699
+@opindex mfix-ut699
+Enable the documented workarounds for the floating-point errata and the data
+cache nullify errata of the UT699 processor.
+@end table
 
-@item -m4-100-nofpu
-@opindex m4-100-nofpu
-Generate code for SH4-100 in such a way that the
-floating-point unit is not used.
+These @samp{-m} options are supported in addition to the above
+on SPARC-V9 processors in 64-bit environments:
 
-@item -m4-100-single
-@opindex m4-100-single
-Generate code for SH4-100 assuming the floating-point unit is in
-single-precision mode by default.
+@table @gcctabopt
+@item -m32
+@itemx -m64
+@opindex m32
+@opindex m64
+Generate code for a 32-bit or 64-bit environment.
+The 32-bit environment sets int, long and pointer to 32 bits.
+The 64-bit environment sets int to 32 bits and long and pointer
+to 64 bits.
 
-@item -m4-100-single-only
-@opindex m4-100-single-only
-Generate code for SH4-100 in such a way that no double-precision
-floating-point operations are used.
+@item -mcmodel=@var{which}
+@opindex mcmodel
+Set the code model to one of
 
-@item -m4-200
-@opindex m4-200
-Generate code for SH4-200.
+@table @samp
+@item medlow
+The Medium/Low code model: 64-bit addresses, programs
+must be linked in the low 32 bits of memory.  Programs can be statically
+or dynamically linked.
 
-@item -m4-200-nofpu
-@opindex m4-200-nofpu
-Generate code for SH4-200 without in such a way that the
-floating-point unit is not used.
+@item medmid
+The Medium/Middle code model: 64-bit addresses, programs
+must be linked in the low 44 bits of memory, the text and data segments must
+be less than 2GB in size and the data segment must be located within 2GB of
+the text segment.
 
-@item -m4-200-single
-@opindex m4-200-single
-Generate code for SH4-200 assuming the floating-point unit is in
-single-precision mode by default.
+@item medany
+The Medium/Anywhere code model: 64-bit addresses, programs
+may be linked anywhere in memory, the text and data segments must be less
+than 2GB in size and the data segment must be located within 2GB of the
+text segment.
 
-@item -m4-200-single-only
-@opindex m4-200-single-only
-Generate code for SH4-200 in such a way that no double-precision
-floating-point operations are used.
+@item embmedany
+The Medium/Anywhere code model for embedded systems:
+64-bit addresses, the text and data segments must be less than 2GB in
+size, both starting anywhere in memory (determined at link time).  The
+global register %g4 points to the base of the data segment.  Programs
+are statically linked and PIC is not supported.
+@end table
 
-@item -m4-300
-@opindex m4-300
-Generate code for SH4-300.
+@item -mmemory-model=@var{mem-model}
+@opindex mmemory-model
+Set the memory model in force on the processor to one of
 
-@item -m4-300-nofpu
-@opindex m4-300-nofpu
-Generate code for SH4-300 without in such a way that the
-floating-point unit is not used.
+@table @samp
+@item default
+The default memory model for the processor and operating system.
 
-@item -m4-300-single
-@opindex m4-300-single
-Generate code for SH4-300 in such a way that no double-precision
-floating-point operations are used.
+@item rmo
+Relaxed Memory Order
 
-@item -m4-300-single-only
-@opindex m4-300-single-only
-Generate code for SH4-300 in such a way that no double-precision
-floating-point operations are used.
+@item pso
+Partial Store Order
 
-@item -m4-340
-@opindex m4-340
-Generate code for SH4-340 (no MMU, no FPU).
+@item tso
+Total Store Order
 
-@item -m4-500
-@opindex m4-500
-Generate code for SH4-500 (no FPU).  Passes @option{-isa=sh4-nofpu} to the
-assembler.
+@item sc
+Sequential Consistency
+@end table
 
-@item -m4a-nofpu
-@opindex m4a-nofpu
-Generate code for the SH4al-dsp, or for a SH4a in such a way that the
-floating-point unit is not used.
+These memory models are formally defined in Appendix D of the SPARC V9
+architecture manual, as set in the processor's @code{PSTATE.MM} field.
 
-@item -m4a-single-only
-@opindex m4a-single-only
-Generate code for the SH4a, in such a way that no double-precision
-floating-point operations are used.
+@item -mstack-bias
+@itemx -mno-stack-bias
+@opindex mstack-bias
+@opindex mno-stack-bias
+With @option{-mstack-bias}, GCC assumes that the stack pointer, and
+frame pointer if present, are offset by @minus{}2047 which must be added back
+when making stack frame references.  This is the default in 64-bit mode.
+Otherwise, assume no such offset is present.
+@end table
 
-@item -m4a-single
-@opindex m4a-single
-Generate code for the SH4a assuming the floating-point unit is in
-single-precision mode by default.
+@node SPU Options
+@subsection SPU Options
+@cindex SPU options
 
-@item -m4a
-@opindex m4a
-Generate code for the SH4a.
+These @samp{-m} options are supported on the SPU:
 
-@item -m4al
-@opindex m4al
-Same as @option{-m4a-nofpu}, except that it implicitly passes
-@option{-dsp} to the assembler.  GCC doesn't generate any DSP
-instructions at the moment.
+@table @gcctabopt
+@item -mwarn-reloc
+@itemx -merror-reloc
+@opindex mwarn-reloc
+@opindex merror-reloc
 
-@item -m5-32media
-@opindex m5-32media
-Generate 32-bit code for SHmedia.
+The loader for SPU does not handle dynamic relocations.  By default, GCC
+gives an error when it generates code that requires a dynamic
+relocation.  @option{-mno-error-reloc} disables the error;
+@option{-mwarn-reloc} generates a warning instead.
 
-@item -m5-32media-nofpu
-@opindex m5-32media-nofpu
-Generate 32-bit code for SHmedia in such a way that the
-floating-point unit is not used.
+@item -msafe-dma
+@itemx -munsafe-dma
+@opindex msafe-dma
+@opindex munsafe-dma
 
-@item -m5-64media
-@opindex m5-64media
-Generate 64-bit code for SHmedia.
+Instructions that initiate or test completion of DMA must not be
+reordered with respect to loads and stores of the memory that is being
+accessed.
+With @option{-munsafe-dma} you must use the @code{volatile} keyword to protect
+memory accesses, but that can lead to inefficient code in places where the
+memory is known to not change.  Rather than mark the memory as volatile,
+you can use @option{-msafe-dma} to tell the compiler to treat
+the DMA instructions as potentially affecting all memory.
 
-@item -m5-64media-nofpu
-@opindex m5-64media-nofpu
-Generate 64-bit code for SHmedia in such a way that the
-floating-point unit is not used.
+@item -mbranch-hints
+@opindex mbranch-hints
 
-@item -m5-compact
-@opindex m5-compact
-Generate code for SHcompact.
+By default, GCC generates a branch hint instruction to avoid
+pipeline stalls for always-taken or probably-taken branches.  A hint
+is not generated closer than 8 instructions away from its branch.
+There is little reason to disable them, except for debugging purposes,
+or to make an object a little bit smaller.
 
-@item -m5-compact-nofpu
-@opindex m5-compact-nofpu
-Generate code for SHcompact in such a way that the
-floating-point unit is not used.
+@item -msmall-mem
+@itemx -mlarge-mem
+@opindex msmall-mem
+@opindex mlarge-mem
 
-@item -mb
-@opindex mb
-Compile code for the processor in big-endian mode.
+By default, GCC generates code assuming that addresses are never larger
+than 18 bits.  With @option{-mlarge-mem} code is generated that assumes
+a full 32-bit address.
 
-@item -ml
-@opindex ml
-Compile code for the processor in little-endian mode.
+@item -mstdmain
+@opindex mstdmain
 
-@item -mdalign
-@opindex mdalign
-Align doubles at 64-bit boundaries.  Note that this changes the calling
-conventions, and thus some functions from the standard C library do
-not work unless you recompile it first with @option{-mdalign}.
+By default, GCC links against startup code that assumes the SPU-style
+main function interface (which has an unconventional parameter list).
+With @option{-mstdmain}, GCC links your program against startup
+code that assumes a C99-style interface to @code{main}, including a
+local copy of @code{argv} strings.
 
-@item -mrelax
-@opindex mrelax
-Shorten some address references at link time, when possible; uses the
-linker option @option{-relax}.
+@item -mfixed-range=@var{register-range}
+@opindex mfixed-range
+Generate code treating the given register range as fixed registers.
+A fixed register is one that the register allocator cannot use.  This is
+useful when compiling kernel code.  A register range is specified as
+two registers separated by a dash.  Multiple register ranges can be
+specified separated by a comma.
 
-@item -mbigtable
-@opindex mbigtable
-Use 32-bit offsets in @code{switch} tables.  The default is to use
-16-bit offsets.
+@item -mea32
+@itemx -mea64
+@opindex mea32
+@opindex mea64
+Compile code assuming that pointers to the PPU address space accessed
+via the @code{__ea} named address space qualifier are either 32 or 64
+bits wide.  The default is 32 bits.  As this is an ABI-changing option,
+all object code in an executable must be compiled with the same setting.
 
-@item -mbitops
-@opindex mbitops
-Enable the use of bit manipulation instructions on SH2A.
+@item -maddress-space-conversion
+@itemx -mno-address-space-conversion
+@opindex maddress-space-conversion
+@opindex mno-address-space-conversion
+Allow/disallow treating the @code{__ea} address space as a superset
+of the generic address space.  This enables explicit type casts
+between @code{__ea} and generic pointers as well as implicit
+conversions of generic pointers to @code{__ea} pointers.  The
+default is to allow address space pointer conversions.
 
-@item -mfmovd
-@opindex mfmovd
-Enable the use of the instruction @code{fmovd}.  Check @option{-mdalign} for
-alignment constraints.
+@item -mcache-size=@var{cache-size}
+@opindex mcache-size
+This option controls the version of libgcc that the compiler links to an
+executable and selects a software-managed cache for accessing variables
+in the @code{__ea} address space with a particular cache size.  Possible
+options for @var{cache-size} are @samp{8}, @samp{16}, @samp{32}, @samp{64}
+and @samp{128}.  The default cache size is 64KB.
 
-@item -mrenesas
-@opindex mrenesas
-Comply with the calling conventions defined by Renesas.
+@item -matomic-updates
+@itemx -mno-atomic-updates
+@opindex matomic-updates
+@opindex mno-atomic-updates
+This option controls the version of libgcc that the compiler links to an
+executable and selects whether atomic updates to the software-managed
+cache of PPU-side variables are used.  If you use atomic updates, changes
+to a PPU variable from SPU code using the @code{__ea} named address space
+qualifier do not interfere with changes to other PPU variables residing
+in the same cache line from PPU code.  If you do not use atomic updates,
+such interference may occur; however, writing back cache lines is
+more efficient.  The default behavior is to use atomic updates.
 
-@item -mno-renesas
-@opindex mno-renesas
-Comply with the calling conventions defined for GCC before the Renesas
-conventions were available.  This option is the default for all
-targets of the SH toolchain.
+@item -mdual-nops
+@itemx -mdual-nops=@var{n}
+@opindex mdual-nops
+By default, GCC inserts nops to increase dual issue when it expects
+it to increase performance.  @var{n} can be a value from 0 to 10.  A
+smaller @var{n} inserts fewer nops.  10 is the default; 0 is the
+same as @option{-mno-dual-nops}.  Disabled with @option{-Os}.
 
-@item -mnomacsave
-@opindex mnomacsave
-Mark the @code{MAC} register as call-clobbered, even if
-@option{-mrenesas} is given.
+@item -mhint-max-nops=@var{n}
+@opindex mhint-max-nops
+Maximum number of nops to insert for a branch hint.  A branch hint must
+be at least 8 instructions away from the branch it is affecting.  GCC
+inserts up to @var{n} nops to enforce this; otherwise it does not
+generate the branch hint.
 
-@item -mieee
-@itemx -mno-ieee
-@opindex mieee
-@opindex mno-ieee
-Control the IEEE compliance of floating-point comparisons, which affects the
-handling of cases where the result of a comparison is unordered.  By default
-@option{-mieee} is implicitly enabled.  If @option{-ffinite-math-only} is
-enabled @option{-mno-ieee} is implicitly set, which results in faster
-floating-point greater-equal and less-equal comparisons.  The implcit settings
-can be overridden by specifying either @option{-mieee} or @option{-mno-ieee}.
+@item -mhint-max-distance=@var{n}
+@opindex mhint-max-distance
+The encoding of the branch hint instruction limits the hint to be within
+256 instructions of the branch it is affecting.  By default, GCC makes
+sure it is within 125.
 
-@item -minline-ic_invalidate
-@opindex minline-ic_invalidate
-Inline code to invalidate instruction cache entries after setting up
-nested function trampolines.
-This option has no effect if @option{-musermode} is in effect and the selected
-code generation option (e.g. @option{-m4}) does not allow the use of the @code{icbi}
-instruction.
-If the selected code generation option does not allow the use of the @code{icbi}
-instruction, and @option{-musermode} is not in effect, the inlined code
-manipulates the instruction cache address array directly with an associative
-write.  This not only requires privileged mode at run time, but it also
-fails if the cache line had been mapped via the TLB and has become unmapped.
+@item -msafe-hints
+@opindex msafe-hints
+Work around a hardware bug that causes the SPU to stall indefinitely.
+By default, GCC inserts the @code{hbrp} instruction to make sure
+this stall won't happen.
 
-@item -misize
-@opindex misize
-Dump instruction size and location in the assembly code.
+@end table
 
-@item -mpadstruct
-@opindex mpadstruct
-This option is deprecated.  It pads structures to multiple of 4 bytes,
-which is incompatible with the SH ABI@.
+@node System V Options
+@subsection Options for System V
 
-@item -matomic-model=@var{model}
-@opindex matomic-model=@var{model}
-Sets the model of atomic operations and additional parameters as a comma
-separated list.  For details on the atomic built-in functions see
-@ref{__atomic Builtins}.  The following models and parameters are supported:
+These additional options are available on System V Release 4 for
+compatibility with other compilers on those systems:
 
-@table @samp
+@table @gcctabopt
+@item -G
+@opindex G
+Create a shared object.
+It is recommended that @option{-symbolic} or @option{-shared} be used instead.
 
-@item none
-Disable compiler generated atomic sequences and emit library calls for atomic
-operations.  This is the default if the target is not @code{sh*-*-linux*}.
+@item -Qy
+@opindex Qy
+Identify the versions of each tool used by the compiler, in a
+@code{.ident} assembler directive in the output.
 
-@item soft-gusa
-Generate GNU/Linux compatible gUSA software atomic sequences for the atomic
-built-in functions.  The generated atomic sequences require additional support
-from the interrupt/exception handling code of the system and are only suitable
-for SH3* and SH4* single-core systems.  This option is enabled by default when
-the target is @code{sh*-*-linux*} and SH3* or SH4*.  When the target is SH4A,
-this option also partially utilizes the hardware atomic instructions
-@code{movli.l} and @code{movco.l} to create more efficient code, unless
-@samp{strict} is specified.  
+@item -Qn
+@opindex Qn
+Refrain from adding @code{.ident} directives to the output file (this is
+the default).
 
-@item soft-tcb
-Generate software atomic sequences that use a variable in the thread control
-block.  This is a variation of the gUSA sequences which can also be used on
-SH1* and SH2* targets.  The generated atomic sequences require additional
-support from the interrupt/exception handling code of the system and are only
-suitable for single-core systems.  When using this model, the @samp{gbr-offset=}
-parameter has to be specified as well.
+@item -YP,@var{dirs}
+@opindex YP
+Search the directories @var{dirs}, and no others, for libraries
+specified with @option{-l}.
 
-@item soft-imask
-Generate software atomic sequences that temporarily disable interrupts by
-setting @code{SR.IMASK = 1111}.  This model works only when the program runs
-in privileged mode and is only suitable for single-core systems.  Additional
-support from the interrupt/exception handling code of the system is not
-required.  This model is enabled by default when the target is
-@code{sh*-*-linux*} and SH1* or SH2*.
+@item -Ym,@var{dir}
+@opindex Ym
+Look in the directory @var{dir} to find the M4 preprocessor.
+The assembler uses this option.
+@c This is supposed to go with a -Yd for predefined M4 macro files, but
+@c the generic assembler that comes with Solaris takes just -Ym.
+@end table
 
-@item hard-llcs
-Generate hardware atomic sequences using the @code{movli.l} and @code{movco.l}
-instructions only.  This is only available on SH4A and is suitable for
-multi-core systems.  Since the hardware instructions support only 32 bit atomic
-variables access to 8 or 16 bit variables is emulated with 32 bit accesses.
-Code compiled with this option is also compatible with other software
-atomic model interrupt/exception handling systems if executed on an SH4A
-system.  Additional support from the interrupt/exception handling code of the
-system is not required for this model.
+@node TILE-Gx Options
+@subsection TILE-Gx Options
+@cindex TILE-Gx options
 
-@item gbr-offset=
-This parameter specifies the offset in bytes of the variable in the thread
-control block structure that should be used by the generated atomic sequences
-when the @samp{soft-tcb} model has been selected.  For other models this
-parameter is ignored.  The specified value must be an integer multiple of four
-and in the range 0-1020.
+These @samp{-m} options are supported on the TILE-Gx:
 
-@item strict
-This parameter prevents mixed usage of multiple atomic models, even if they
-are compatible, and makes the compiler generate atomic sequences of the
-specified model only.
+@table @gcctabopt
+@item -mcmodel=small
+@opindex mcmodel=small
+Generate code for the small model.  The distance for direct calls is
+limited to 500M in either direction.  PC-relative addresses are 32
+bits.  Absolute addresses support the full address range.
 
-@end table
+@item -mcmodel=large
+@opindex mcmodel=large
+Generate code for the large model.  There is no limitation on call
+distance, pc-relative addresses, or absolute addresses.
 
-@item -mtas
-@opindex mtas
-Generate the @code{tas.b} opcode for @code{__atomic_test_and_set}.
-Notice that depending on the particular hardware and software configuration
-this can degrade overall performance due to the operand cache line flushes
-that are implied by the @code{tas.b} instruction.  On multi-core SH4A
-processors the @code{tas.b} instruction must be used with caution since it
-can result in data corruption for certain cache configurations.
+@item -mcpu=@var{name}
+@opindex mcpu
+Selects the type of CPU to be targeted.  Currently the only supported
+type is @samp{tilegx}.
 
-@item -mprefergot
-@opindex mprefergot
-When generating position-independent code, emit function calls using
-the Global Offset Table instead of the Procedure Linkage Table.
+@item -m32
+@itemx -m64
+@opindex m32
+@opindex m64
+Generate code for a 32-bit or 64-bit environment.  The 32-bit
+environment sets int, long, and pointer to 32 bits.  The 64-bit
+environment sets int to 32 bits and long and pointer to 64 bits.
 
-@item -musermode
-@itemx -mno-usermode
-@opindex musermode
-@opindex mno-usermode
-Don't allow (allow) the compiler generating privileged mode code.  Specifying
-@option{-musermode} also implies @option{-mno-inline-ic_invalidate} if the
-inlined code would not work in user mode.  @option{-musermode} is the default
-when the target is @code{sh*-*-linux*}.  If the target is SH1* or SH2*
-@option{-musermode} has no effect, since there is no user mode.
+@item -mbig-endian
+@itemx -mlittle-endian
+@opindex mbig-endian
+@opindex mlittle-endian
+Generate code in big/little endian mode, respectively.
+@end table
 
-@item -multcost=@var{number}
-@opindex multcost=@var{number}
-Set the cost to assume for a multiply insn.
+@node TILEPro Options
+@subsection TILEPro Options
+@cindex TILEPro options
 
-@item -mdiv=@var{strategy}
-@opindex mdiv=@var{strategy}
-Set the division strategy to be used for integer division operations.
-For SHmedia @var{strategy} can be one of: 
+These @samp{-m} options are supported on the TILEPro:
 
-@table @samp
+@table @gcctabopt
+@item -mcpu=@var{name}
+@opindex mcpu
+Selects the type of CPU to be targeted.  Currently the only supported
+type is @samp{tilepro}.
 
-@item fp 
-Performs the operation in floating point.  This has a very high latency,
-but needs only a few instructions, so it might be a good choice if
-your code has enough easily-exploitable ILP to allow the compiler to
-schedule the floating-point instructions together with other instructions.
-Division by zero causes a floating-point exception.
+@item -m32
+@opindex m32
+Generate code for a 32-bit environment, which sets int, long, and
+pointer to 32 bits.  This is the only supported behavior so the flag
+is essentially ignored.
+@end table
 
-@item inv
-Uses integer operations to calculate the inverse of the divisor,
-and then multiplies the dividend with the inverse.  This strategy allows
-CSE and hoisting of the inverse calculation.  Division by zero calculates
-an unspecified result, but does not trap.
+@node V850 Options
+@subsection V850 Options
+@cindex V850 Options
 
-@item inv:minlat
-A variant of @samp{inv} where, if no CSE or hoisting opportunities
-have been found, or if the entire operation has been hoisted to the same
-place, the last stages of the inverse calculation are intertwined with the
-final multiply to reduce the overall latency, at the expense of using a few
-more instructions, and thus offering fewer scheduling opportunities with
-other code.
+These @samp{-m} options are defined for V850 implementations:
 
-@item call
-Calls a library function that usually implements the @samp{inv:minlat}
-strategy.
-This gives high code density for @code{m5-*media-nofpu} compilations.
+@table @gcctabopt
+@item -mlong-calls
+@itemx -mno-long-calls
+@opindex mlong-calls
+@opindex mno-long-calls
+Treat all calls as being far away (near).  If calls are assumed to be
+far away, the compiler always loads the function's address into a
+register, and calls indirect through the pointer.
 
-@item call2
-Uses a different entry point of the same library function, where it
-assumes that a pointer to a lookup table has already been set up, which
-exposes the pointer load to CSE and code hoisting optimizations.
+@item -mno-ep
+@itemx -mep
+@opindex mno-ep
+@opindex mep
+Do not optimize (do optimize) basic blocks that use the same index
+pointer 4 or more times to copy the pointer into the @code{ep} register, and
+use the shorter @code{sld} and @code{sst} instructions.  The @option{-mep}
+option is on by default if you optimize.
 
-@item inv:call
-@itemx inv:call2
-@itemx inv:fp
-Use the @samp{inv} algorithm for initial
-code generation, but if the code stays unoptimized, revert to the @samp{call},
-@samp{call2}, or @samp{fp} strategies, respectively.  Note that the
-potentially-trapping side effect of division by zero is carried by a
-separate instruction, so it is possible that all the integer instructions
-are hoisted out, but the marker for the side effect stays where it is.
-A recombination to floating-point operations or a call is not possible
-in that case.
-
-@item inv20u
-@itemx inv20l
-Variants of the @samp{inv:minlat} strategy.  In the case
-that the inverse calculation is not separated from the multiply, they speed
-up division where the dividend fits into 20 bits (plus sign where applicable)
-by inserting a test to skip a number of operations in this case; this test
-slows down the case of larger dividends.  @samp{inv20u} assumes the case of a such
-a small dividend to be unlikely, and @samp{inv20l} assumes it to be likely.
+@item -mno-prolog-function
+@itemx -mprolog-function
+@opindex mno-prolog-function
+@opindex mprolog-function
+Do not use (do use) external functions to save and restore registers
+at the prologue and epilogue of a function.  The external functions
+are slower, but use less code space if more than one function saves
+the same number of registers.  The @option{-mprolog-function} option
+is on by default if you optimize.
 
-@end table
+@item -mspace
+@opindex mspace
+Try to make the code as small as possible.  At present, this just turns
+on the @option{-mep} and @option{-mprolog-function} options.
 
-For targets other than SHmedia @var{strategy} can be one of:
+@item -mtda=@var{n}
+@opindex mtda
+Put static or global variables whose size is @var{n} bytes or less into
+the tiny data area that register @code{ep} points to.  The tiny data
+area can hold up to 256 bytes in total (128 bytes for byte references).
 
-@table @samp
+@item -msda=@var{n}
+@opindex msda
+Put static or global variables whose size is @var{n} bytes or less into
+the small data area that register @code{gp} points to.  The small data
+area can hold up to 64 kilobytes.
 
-@item call-div1
-Calls a library function that uses the single-step division instruction
-@code{div1} to perform the operation.  Division by zero calculates an
-unspecified result and does not trap.  This is the default except for SH4,
-SH2A and SHcompact.
+@item -mzda=@var{n}
+@opindex mzda
+Put static or global variables whose size is @var{n} bytes or less into
+the first 32 kilobytes of memory.
 
-@item call-fp
-Calls a library function that performs the operation in double precision
-floating point.  Division by zero causes a floating-point exception.  This is
-the default for SHcompact with FPU.  Specifying this for targets that do not
-have a double precision FPU defaults to @code{call-div1}.
+@item -mv850
+@opindex mv850
+Specify that the target processor is the V850.
 
-@item call-table
-Calls a library function that uses a lookup table for small divisors and
-the @code{div1} instruction with case distinction for larger divisors.  Division
-by zero calculates an unspecified result and does not trap.  This is the default
-for SH4.  Specifying this for targets that do not have dynamic shift
-instructions defaults to @code{call-div1}.
+@item -mv850e3v5
+@opindex mv850e3v5
+Specify that the target processor is the V850E3V5.  The preprocessor
+constant @code{__v850e3v5__} is defined if this option is used.
 
-@end table
+@item -mv850e2v4
+@opindex mv850e2v4
+Specify that the target processor is the V850E3V5.  This is an alias for
+the @option{-mv850e3v5} option.
 
-When a division strategy has not been specified the default strategy is
-selected based on the current target.  For SH2A the default strategy is to
-use the @code{divs} and @code{divu} instructions instead of library function
-calls.
+@item -mv850e2v3
+@opindex mv850e2v3
+Specify that the target processor is the V850E2V3.  The preprocessor
+constant @code{__v850e2v3__} is defined if this option is used.
 
-@item -maccumulate-outgoing-args
-@opindex maccumulate-outgoing-args
-Reserve space once for outgoing arguments in the function prologue rather
-than around each call.  Generally beneficial for performance and size.  Also
-needed for unwinding to avoid changing the stack frame around conditional code.
+@item -mv850e2
+@opindex mv850e2
+Specify that the target processor is the V850E2.  The preprocessor
+constant @code{__v850e2__} is defined if this option is used.
 
-@item -mdivsi3_libfunc=@var{name}
-@opindex mdivsi3_libfunc=@var{name}
-Set the name of the library function used for 32-bit signed division to
-@var{name}.
-This only affects the name used in the @samp{call} and @samp{inv:call}
-division strategies, and the compiler still expects the same
-sets of input/output/clobbered registers as if this option were not present.
+@item -mv850e1
+@opindex mv850e1
+Specify that the target processor is the V850E1.  The preprocessor
+constants @code{__v850e1__} and @code{__v850e__} are defined if
+this option is used.
 
-@item -mfixed-range=@var{register-range}
-@opindex mfixed-range
-Generate code treating the given register range as fixed registers.
-A fixed register is one that the register allocator can not use.  This is
-useful when compiling kernel code.  A register range is specified as
-two registers separated by a dash.  Multiple register ranges can be
-specified separated by a comma.
+@item -mv850es
+@opindex mv850es
+Specify that the target processor is the V850ES.  This is an alias for
+the @option{-mv850e1} option.
 
-@item -mindexed-addressing
-@opindex mindexed-addressing
-Enable the use of the indexed addressing mode for SHmedia32/SHcompact.
-This is only safe if the hardware and/or OS implement 32-bit wrap-around
-semantics for the indexed addressing mode.  The architecture allows the
-implementation of processors with 64-bit MMU, which the OS could use to
-get 32-bit addressing, but since no current hardware implementation supports
-this or any other way to make the indexed addressing mode safe to use in
-the 32-bit ABI, the default is @option{-mno-indexed-addressing}.
+@item -mv850e
+@opindex mv850e
+Specify that the target processor is the V850E@.  The preprocessor
+constant @code{__v850e__} is defined if this option is used.
 
-@item -mgettrcost=@var{number}
-@opindex mgettrcost=@var{number}
-Set the cost assumed for the @code{gettr} instruction to @var{number}.
-The default is 2 if @option{-mpt-fixed} is in effect, 100 otherwise.
+If neither @option{-mv850} nor @option{-mv850e} nor @option{-mv850e1}
+nor @option{-mv850e2} nor @option{-mv850e2v3} nor @option{-mv850e3v5}
+is specified, then a default target processor is chosen and the
+relevant @samp{__v850*__} preprocessor constant is defined.
 
-@item -mpt-fixed
-@opindex mpt-fixed
-Assume @code{pt*} instructions won't trap.  This generally generates
-better-scheduled code, but is unsafe on current hardware.
-The current architecture
-definition says that @code{ptabs} and @code{ptrel} trap when the target 
-anded with 3 is 3.
-This has the unintentional effect of making it unsafe to schedule these
-instructions before a branch, or hoist them out of a loop.  For example,
-@code{__do_global_ctors}, a part of @file{libgcc}
-that runs constructors at program
-startup, calls functions in a list which is delimited by @minus{}1.  With the
-@option{-mpt-fixed} option, the @code{ptabs} is done before testing against @minus{}1.
-That means that all the constructors run a bit more quickly, but when
-the loop comes to the end of the list, the program crashes because @code{ptabs}
-loads @minus{}1 into a target register.  
+The preprocessor constants @code{__v850} and @code{__v851__} are always
+defined, regardless of which processor variant is the target.
 
-Since this option is unsafe for any
-hardware implementing the current architecture specification, the default
-is @option{-mno-pt-fixed}.  Unless specified explicitly with 
-@option{-mgettrcost}, @option{-mno-pt-fixed} also implies @option{-mgettrcost=100};
-this deters register allocation from using target registers for storing
-ordinary integers.
+@item -mdisable-callt
+@itemx -mno-disable-callt
+@opindex mdisable-callt
+@opindex mno-disable-callt
+This option suppresses generation of the @code{CALLT} instruction for the
+v850e, v850e1, v850e2, v850e2v3 and v850e3v5 flavors of the v850
+architecture.
 
-@item -minvalid-symbols
-@opindex minvalid-symbols
-Assume symbols might be invalid.  Ordinary function symbols generated by
-the compiler are always valid to load with
-@code{movi}/@code{shori}/@code{ptabs} or
-@code{movi}/@code{shori}/@code{ptrel},
-but with assembler and/or linker tricks it is possible
-to generate symbols that cause @code{ptabs} or @code{ptrel} to trap.
-This option is only meaningful when @option{-mno-pt-fixed} is in effect.
-It prevents cross-basic-block CSE, hoisting and most scheduling
-of symbol loads.  The default is @option{-mno-invalid-symbols}.
+This option is enabled by default when the RH850 ABI is
+in use (see @option{-mrh850-abi}), and disabled by default when the
+GCC ABI is in use.  If @code{CALLT} instructions are being generated
+then the C preprocessor symbol @code{__V850_CALLT__} is defined.
 
-@item -mbranch-cost=@var{num}
-@opindex mbranch-cost=@var{num}
-Assume @var{num} to be the cost for a branch instruction.  Higher numbers
-make the compiler try to generate more branch-free code if possible.  
-If not specified the value is selected depending on the processor type that
-is being compiled for.
+@item -mrelax
+@itemx -mno-relax
+@opindex mrelax
+@opindex mno-relax
+Pass on (or do not pass on) the @option{-mrelax} command line option
+to the assembler.
 
-@item -mzdcbranch
-@itemx -mno-zdcbranch
-@opindex mzdcbranch
-@opindex mno-zdcbranch
-Assume (do not assume) that zero displacement conditional branch instructions
-@code{bt} and @code{bf} are fast.  If @option{-mzdcbranch} is specified, the
-compiler prefers zero displacement branch code sequences.  This is
-enabled by default when generating code for SH4 and SH4A.  It can be explicitly
-disabled by specifying @option{-mno-zdcbranch}.
+@item -mlong-jumps
+@itemx -mno-long-jumps
+@opindex mlong-jumps
+@opindex mno-long-jumps
+Disable (or re-enable) the generation of PC-relative jump instructions.
 
-@item -mfused-madd
-@itemx -mno-fused-madd
-@opindex mfused-madd
-@opindex mno-fused-madd
-Generate code that uses (does not use) the floating-point multiply and
-accumulate instructions.  These instructions are generated by default
-if hardware floating point is used.  The machine-dependent
-@option{-mfused-madd} option is now mapped to the machine-independent
-@option{-ffp-contract=fast} option, and @option{-mno-fused-madd} is
-mapped to @option{-ffp-contract=off}.
+@item -msoft-float
+@itemx -mhard-float
+@opindex msoft-float
+@opindex mhard-float
+Disable (or re-enable) the generation of hardware floating point
+instructions.  This option is only significant when the target
+architecture is @samp{V850E2V3} or higher.  If hardware floating point
+instructions are being generated then the C preprocessor symbol
+@code{__FPU_OK__} is defined, otherwise the symbol
+@code{__NO_FPU__} is defined.
 
-@item -mfsca
-@itemx -mno-fsca
-@opindex mfsca
-@opindex mno-fsca
-Allow or disallow the compiler to emit the @code{fsca} instruction for sine
-and cosine approximations.  The option @option{-mfsca} must be used in
-combination with @option{-funsafe-math-optimizations}.  It is enabled by default
-when generating code for SH4A.  Using @option{-mno-fsca} disables sine and cosine
-approximations even if @option{-funsafe-math-optimizations} is in effect.
+@item -mloop
+@opindex mloop
+Enables the use of the e3v5 LOOP instruction.  The use of this
+instruction is not enabled by default when the e3v5 architecture is
+selected because its use is still experimental.
 
-@item -mfsrra
-@itemx -mno-fsrra
-@opindex mfsrra
-@opindex mno-fsrra
-Allow or disallow the compiler to emit the @code{fsrra} instruction for
-reciprocal square root approximations.  The option @option{-mfsrra} must be used
-in combination with @option{-funsafe-math-optimizations} and
-@option{-ffinite-math-only}.  It is enabled by default when generating code for
-SH4A.  Using @option{-mno-fsrra} disables reciprocal square root approximations
-even if @option{-funsafe-math-optimizations} and @option{-ffinite-math-only} are
-in effect.
+@item -mrh850-abi
+@itemx -mghs
+@opindex mrh850-abi
+@opindex mghs
+Enables support for the RH850 version of the V850 ABI.  This is the
+default.  With this version of the ABI the following rules apply:
 
-@item -mpretend-cmove
-@opindex mpretend-cmove
-Prefer zero-displacement conditional branches for conditional move instruction
-patterns.  This can result in faster code on the SH4 processor.
+@itemize
+@item
+Integer sized structures and unions are returned via a memory pointer
+rather than a register.
 
-@end table
+@item
+Large structures and unions (more than 8 bytes in size) are passed by
+value.
 
-@node Solaris 2 Options
-@subsection Solaris 2 Options
-@cindex Solaris 2 options
+@item
+Functions are aligned to 16-bit boundaries.
 
-These @samp{-m} options are supported on Solaris 2:
+@item
+The @option{-m8byte-align} command line option is supported.
 
-@table @gcctabopt
-@item -mclear-hwcap
-@opindex mclear-hwcap
-@option{-mclear-hwcap} tells the compiler to remove the hardware
-capabilities generated by the Solaris assembler.  This is only necessary
-when object files use ISA extensions not supported by the current
-machine, but check at runtime whether or not to use them.
+@item
+The @option{-mdisable-callt} command line option is enabled by
+default.  The @option{-mno-disable-callt} command line option is not
+supported.
+@end itemize
 
-@item -mimpure-text
-@opindex mimpure-text
-@option{-mimpure-text}, used in addition to @option{-shared}, tells
-the compiler to not pass @option{-z text} to the linker when linking a
-shared object.  Using this option, you can link position-dependent
-code into a shared object.
+When this version of the ABI is enabled the C preprocessor symbol
+@code{__V850_RH850_ABI__} is defined.
 
-@option{-mimpure-text} suppresses the ``relocations remain against
-allocatable but non-writable sections'' linker error message.
-However, the necessary relocations trigger copy-on-write, and the
-shared object is not actually shared across processes.  Instead of
-using @option{-mimpure-text}, you should compile all source code with
-@option{-fpic} or @option{-fPIC}.
+@item -mgcc-abi
+@opindex mgcc-abi
+Enables support for the old GCC version of the V850 ABI.  With this
+version of the ABI the following rules apply:
 
-@end table
+@itemize
+@item
+Integer sized structures and unions are returned in register @code{r10}.
 
-These switches are supported in addition to the above on Solaris 2:
+@item
+Large structures and unions (more than 8 bytes in size) are passed by
+reference.
 
-@table @gcctabopt
-@item -pthreads
-@opindex pthreads
-Add support for multithreading using the POSIX threads library.  This
-option sets flags for both the preprocessor and linker.  This option does
-not affect the thread safety of object code produced  by the compiler or
-that of libraries supplied with it.
+@item
+Functions are aligned to 32-bit boundaries, unless optimizing for
+size.
 
-@item -pthread
-@opindex pthread
-This is a synonym for @option{-pthreads}.
-@end table
+@item
+The @option{-m8byte-align} command line option is not supported.
 
-@node SPARC Options
-@subsection SPARC Options
-@cindex SPARC options
+@item
+The @option{-mdisable-callt} command line option is supported but not
+enabled by default.
+@end itemize
 
-These @samp{-m} options are supported on the SPARC:
+When this version of the ABI is enabled the C preprocessor symbol
+@code{__V850_GCC_ABI__} is defined.
+
+@item -m8byte-align
+@itemx -mno-8byte-align
+@opindex m8byte-align
+@opindex mno-8byte-align
+Enables support for @code{double} and @code{long long} types to be
+aligned on 8-byte boundaries.  The default is to restrict the
+alignment of all objects to at most 4 bytes.  When
+@option{-m8byte-align} is in effect the C preprocessor symbol
+@code{__V850_8BYTE_ALIGN__} is defined.
+
+@item -mbig-switch
+@opindex mbig-switch
+Generate code suitable for big switch tables.  Use this option only if
+the assembler/linker complain about out of range branches within a switch
+table.
+
+@item -mapp-regs
+@opindex mapp-regs
+This option causes r2 and r5 to be used in the code generated by
+the compiler.  This setting is the default.
 
-@table @gcctabopt
 @item -mno-app-regs
-@itemx -mapp-regs
 @opindex mno-app-regs
-@opindex mapp-regs
-Specify @option{-mapp-regs} to generate output using the global registers
-2 through 4, which the SPARC SVR4 ABI reserves for applications.  Like the
-global register 1, each global register 2 through 4 is then treated as an
-allocable register that is clobbered by function calls.  This is the default.
+This option causes r2 and r5 to be treated as fixed registers.
 
-To be fully SVR4 ABI-compliant at the cost of some performance loss,
-specify @option{-mno-app-regs}.  You should compile libraries and system
-software with this option.
+@end table
 
-@item -mflat
-@itemx -mno-flat
-@opindex mflat
-@opindex mno-flat
-With @option{-mflat}, the compiler does not generate save/restore instructions
-and uses a ``flat'' or single register window model.  This model is compatible
-with the regular register window model.  The local registers and the input
-registers (0--5) are still treated as ``call-saved'' registers and are
-saved on the stack as needed.
+@node VAX Options
+@subsection VAX Options
+@cindex VAX options
 
-With @option{-mno-flat} (the default), the compiler generates save/restore
-instructions (except for leaf functions).  This is the normal operating mode.
+These @samp{-m} options are defined for the VAX:
+
+@table @gcctabopt
+@item -munix
+@opindex munix
+Do not output certain jump instructions (@code{aobleq} and so on)
+that the Unix assembler for the VAX cannot handle across long
+ranges.
+
+@item -mgnu
+@opindex mgnu
+Do output those jump instructions, on the assumption that the
+GNU assembler is being used.
+
+@item -mg
+@opindex mg
+Output code for G-format floating-point numbers instead of D-format.
+@end table
+
+@node Visium Options
+@subsection Visium Options
+@cindex Visium options
+
+@table @gcctabopt
+
+@item -mdebug
+@opindex mdebug
+A program which performs file I/O and is destined to run on an MCM target
+should be linked with this option.  It causes the libraries libc.a and
+libdebug.a to be linked.  The program should be run on the target under
+the control of the GDB remote debugging stub.
+
+@item -msim
+@opindex msim
+A program which performs file I/O and is destined to run on the simulator
+should be linked with this option.  This causes libraries libc.a and libsim.a to
+be linked.
 
 @item -mfpu
 @itemx -mhard-float
 @opindex mfpu
 @opindex mhard-float
-Generate output containing floating-point instructions.  This is the
+Generate code containing floating-point instructions.  This is the
 default.
 
 @item -mno-fpu
 @itemx -msoft-float
 @opindex mno-fpu
 @opindex msoft-float
-Generate output containing library calls for floating point.
-@strong{Warning:} the requisite libraries are not available for all SPARC
-targets.  Normally the facilities of the machine's usual C compiler are
-used, but this cannot be done directly in cross-compilation.  You must make
-your own arrangements to provide suitable library functions for
-cross-compilation.  The embedded targets @samp{sparc-*-aout} and
-@samp{sparclite-*-*} do provide software floating-point support.
+Generate code containing library calls for floating point.
 
 @option{-msoft-float} changes the calling convention in the output file;
 therefore, it is only useful if you compile @emph{all} of a program with
@@ -21930,926 +21532,1324 @@ this option.  In particular, you need to compile @file{libgcc.a}, the
 library that comes with GCC, with @option{-msoft-float} in order for
 this to work.
 
-@item -mhard-quad-float
-@opindex mhard-quad-float
-Generate output containing quad-word (long double) floating-point
-instructions.
+@item -mcpu=@var{cpu_type}
+@opindex mcpu
+Set the instruction set, register set, and instruction scheduling parameters
+for machine type @var{cpu_type}.  Supported values for @var{cpu_type} are
+@samp{mcm}, @samp{gr5} and @samp{gr6}.
 
-@item -msoft-quad-float
-@opindex msoft-quad-float
-Generate output containing library calls for quad-word (long double)
-floating-point instructions.  The functions called are those specified
-in the SPARC ABI@.  This is the default.
+@samp{mcm} is a synonym of @samp{gr5} present for backward compatibility.
 
-As of this writing, there are no SPARC implementations that have hardware
-support for the quad-word floating-point instructions.  They all invoke
-a trap handler for one of these instructions, and then the trap handler
-emulates the effect of the instruction.  Because of the trap handler overhead,
-this is much slower than calling the ABI library routines.  Thus the
-@option{-msoft-quad-float} option is the default.
+By default (unless configured otherwise), GCC generates code for the GR5
+variant of the Visium architecture.  
 
-@item -mno-unaligned-doubles
-@itemx -munaligned-doubles
-@opindex mno-unaligned-doubles
-@opindex munaligned-doubles
-Assume that doubles have 8-byte alignment.  This is the default.
+With @option{-mcpu=gr6}, GCC generates code for the GR6 variant of the Visium
+architecture.  The only difference from GR5 code is that the compiler will
+generate block move instructions.
 
-With @option{-munaligned-doubles}, GCC assumes that doubles have 8-byte
-alignment only if they are contained in another type, or if they have an
-absolute address.  Otherwise, it assumes they have 4-byte alignment.
-Specifying this option avoids some rare compatibility problems with code
-generated by other compilers.  It is not the default because it results
-in a performance loss, especially for floating-point code.
+@item -mtune=@var{cpu_type}
+@opindex mtune
+Set the instruction scheduling parameters for machine type @var{cpu_type},
+but do not set the instruction set or register set that the option
+@option{-mcpu=@var{cpu_type}} would.
+
+@item -msv-mode
+@opindex msv-mode
+Generate code for the supervisor mode, where there are no restrictions on
+the access to general registers.  This is the default.
 
 @item -muser-mode
-@itemx -mno-user-mode
 @opindex muser-mode
-@opindex mno-user-mode
-Do not generate code that can only run in supervisor mode.  This is relevant
-only for the @code{casa} instruction emitted for the LEON3 processor.  The
-default is @option{-mno-user-mode}.
+Generate code for the user mode, where the access to some general registers
+is forbidden: on the GR5, registers r24 to r31 cannot be accessed in this
+mode; on the GR6, only registers r29 to r31 are affected.
+@end table
 
-@item -mno-faster-structs
-@itemx -mfaster-structs
-@opindex mno-faster-structs
-@opindex mfaster-structs
-With @option{-mfaster-structs}, the compiler assumes that structures
-should have 8-byte alignment.  This enables the use of pairs of
-@code{ldd} and @code{std} instructions for copies in structure
-assignment, in place of twice as many @code{ld} and @code{st} pairs.
-However, the use of this changed alignment directly violates the SPARC
-ABI@.  Thus, it's intended only for use on targets where the developer
-acknowledges that their resulting code is not directly in line with
-the rules of the ABI@.
+@node VMS Options
+@subsection VMS Options
 
-@item -mcpu=@var{cpu_type}
-@opindex mcpu
-Set the instruction set, register set, and instruction scheduling parameters
-for machine type @var{cpu_type}.  Supported values for @var{cpu_type} are
-@samp{v7}, @samp{cypress}, @samp{v8}, @samp{supersparc}, @samp{hypersparc},
-@samp{leon}, @samp{leon3}, @samp{leon3v7}, @samp{sparclite}, @samp{f930},
-@samp{f934}, @samp{sparclite86x}, @samp{sparclet}, @samp{tsc701}, @samp{v9},
-@samp{ultrasparc}, @samp{ultrasparc3}, @samp{niagara}, @samp{niagara2},
-@samp{niagara3} and @samp{niagara4}.
+These @samp{-m} options are defined for the VMS implementations:
 
-Native Solaris and GNU/Linux toolchains also support the value @samp{native},
-which selects the best architecture option for the host processor.
-@option{-mcpu=native} has no effect if GCC does not recognize
-the processor.
+@table @gcctabopt
+@item -mvms-return-codes
+@opindex mvms-return-codes
+Return VMS condition codes from @code{main}. The default is to return POSIX-style
+condition (e.g.@ error) codes.
 
-Default instruction scheduling parameters are used for values that select
-an architecture and not an implementation.  These are @samp{v7}, @samp{v8},
-@samp{sparclite}, @samp{sparclet}, @samp{v9}.
+@item -mdebug-main=@var{prefix}
+@opindex mdebug-main=@var{prefix}
+Flag the first routine whose name starts with @var{prefix} as the main
+routine for the debugger.
 
-Here is a list of each supported architecture and their supported
-implementations.
+@item -mmalloc64
+@opindex mmalloc64
+Default to 64-bit memory allocation routines.
 
-@table @asis
-@item v7
-cypress, leon3v7
+@item -mpointer-size=@var{size}
+@opindex mpointer-size=@var{size}
+Set the default size of pointers. Possible options for @var{size} are
+@samp{32} or @samp{short} for 32-bit pointers, @samp{64} or @samp{long}
+for 64-bit pointers, and @samp{no} for supporting only 32-bit pointers.
+The latter option disables @code{pragma pointer_size}.
+@end table
 
-@item v8
-supersparc, hypersparc, leon, leon3
+@node VxWorks Options
+@subsection VxWorks Options
+@cindex VxWorks Options
 
-@item sparclite
-f930, f934, sparclite86x
+The options in this section are defined for all VxWorks targets.
+Options specific to the target hardware are listed with the other
+options for that target.
 
-@item sparclet
-tsc701
+@table @gcctabopt
+@item -mrtp
+@opindex mrtp
+GCC can generate code for both VxWorks kernels and real time processes
+(RTPs).  This option switches from the former to the latter.  It also
+defines the preprocessor macro @code{__RTP__}.
 
-@item v9
-ultrasparc, ultrasparc3, niagara, niagara2, niagara3, niagara4
-@end table
+@item -non-static
+@opindex non-static
+Link an RTP executable against shared libraries rather than static
+libraries.  The options @option{-static} and @option{-shared} can
+also be used for RTPs (@pxref{Link Options}); @option{-static}
+is the default.
 
-By default (unless configured otherwise), GCC generates code for the V7
-variant of the SPARC architecture.  With @option{-mcpu=cypress}, the compiler
-additionally optimizes it for the Cypress CY7C602 chip, as used in the
-SPARCStation/SPARCServer 3xx series.  This is also appropriate for the older
-SPARCStation 1, 2, IPX etc.
+@item -Bstatic
+@itemx -Bdynamic
+@opindex Bstatic
+@opindex Bdynamic
+These options are passed down to the linker.  They are defined for
+compatibility with Diab.
 
-With @option{-mcpu=v8}, GCC generates code for the V8 variant of the SPARC
-architecture.  The only difference from V7 code is that the compiler emits
-the integer multiply and integer divide instructions which exist in SPARC-V8
-but not in SPARC-V7.  With @option{-mcpu=supersparc}, the compiler additionally
-optimizes it for the SuperSPARC chip, as used in the SPARCStation 10, 1000 and
-2000 series.
+@item -Xbind-lazy
+@opindex Xbind-lazy
+Enable lazy binding of function calls.  This option overrides the default
+@option{-Xbind-now} behavior and is defined for compatibility with Diab.
 
-With @option{-mcpu=sparclite}, GCC generates code for the SPARClite variant of
-the SPARC architecture.  This adds the integer multiply, integer divide step
-and scan (@code{ffs}) instructions which exist in SPARClite but not in SPARC-V7.
-With @option{-mcpu=f930}, the compiler additionally optimizes it for the
-Fujitsu MB86930 chip, which is the original SPARClite, with no FPU@.  With
-@option{-mcpu=f934}, the compiler additionally optimizes it for the Fujitsu
-MB86934 chip, which is the more recent SPARClite with FPU@.
+@item -Xbind-now
+@opindex Xbind-now
+Disable lazy binding of function calls.  This option is the default and
+is defined for compatibility with Diab.
+@end table
 
-With @option{-mcpu=sparclet}, GCC generates code for the SPARClet variant of
-the SPARC architecture.  This adds the integer multiply, multiply/accumulate,
-integer divide step and scan (@code{ffs}) instructions which exist in SPARClet
-but not in SPARC-V7.  With @option{-mcpu=tsc701}, the compiler additionally
-optimizes it for the TEMIC SPARClet chip.
+@node x86 Options
+@subsection x86 Options
+@cindex x86 Options
 
-With @option{-mcpu=v9}, GCC generates code for the V9 variant of the SPARC
-architecture.  This adds 64-bit integer and floating-point move instructions,
-3 additional floating-point condition code registers and conditional move
-instructions.  With @option{-mcpu=ultrasparc}, the compiler additionally
-optimizes it for the Sun UltraSPARC I/II/IIi chips.  With
-@option{-mcpu=ultrasparc3}, the compiler additionally optimizes it for the
-Sun UltraSPARC III/III+/IIIi/IIIi+/IV/IV+ chips.  With
-@option{-mcpu=niagara}, the compiler additionally optimizes it for
-Sun UltraSPARC T1 chips.  With @option{-mcpu=niagara2}, the compiler
-additionally optimizes it for Sun UltraSPARC T2 chips. With
-@option{-mcpu=niagara3}, the compiler additionally optimizes it for Sun
-UltraSPARC T3 chips.  With @option{-mcpu=niagara4}, the compiler
-additionally optimizes it for Sun UltraSPARC T4 chips.
+These @samp{-m} options are defined for the x86 family of computers.
 
-@item -mtune=@var{cpu_type}
-@opindex mtune
-Set the instruction scheduling parameters for machine type
-@var{cpu_type}, but do not set the instruction set or register set that the
-option @option{-mcpu=@var{cpu_type}} does.
+@table @gcctabopt
 
-The same values for @option{-mcpu=@var{cpu_type}} can be used for
-@option{-mtune=@var{cpu_type}}, but the only useful values are those
-that select a particular CPU implementation.  Those are @samp{cypress},
-@samp{supersparc}, @samp{hypersparc}, @samp{leon}, @samp{leon3},
-@samp{leon3v7}, @samp{f930}, @samp{f934}, @samp{sparclite86x}, @samp{tsc701},
-@samp{ultrasparc}, @samp{ultrasparc3}, @samp{niagara}, @samp{niagara2},
-@samp{niagara3} and @samp{niagara4}.  With native Solaris and GNU/Linux
-toolchains, @samp{native} can also be used.
+@item -march=@var{cpu-type}
+@opindex march
+Generate instructions for the machine type @var{cpu-type}.  In contrast to
+@option{-mtune=@var{cpu-type}}, which merely tunes the generated code 
+for the specified @var{cpu-type}, @option{-march=@var{cpu-type}} allows GCC
+to generate code that may not run at all on processors other than the one
+indicated.  Specifying @option{-march=@var{cpu-type}} implies 
+@option{-mtune=@var{cpu-type}}.
 
-@item -mv8plus
-@itemx -mno-v8plus
-@opindex mv8plus
-@opindex mno-v8plus
-With @option{-mv8plus}, GCC generates code for the SPARC-V8+ ABI@.  The
-difference from the V8 ABI is that the global and out registers are
-considered 64 bits wide.  This is enabled by default on Solaris in 32-bit
-mode for all SPARC-V9 processors.
+The choices for @var{cpu-type} are:
 
-@item -mvis
-@itemx -mno-vis
-@opindex mvis
-@opindex mno-vis
-With @option{-mvis}, GCC generates code that takes advantage of the UltraSPARC
-Visual Instruction Set extensions.  The default is @option{-mno-vis}.
+@table @samp
+@item native
+This selects the CPU to generate code for at compilation time by determining
+the processor type of the compiling machine.  Using @option{-march=native}
+enables all instruction subsets supported by the local machine (hence
+the result might not run on different machines).  Using @option{-mtune=native}
+produces code optimized for the local machine under the constraints
+of the selected instruction set.  
 
-@item -mvis2
-@itemx -mno-vis2
-@opindex mvis2
-@opindex mno-vis2
-With @option{-mvis2}, GCC generates code that takes advantage of
-version 2.0 of the UltraSPARC Visual Instruction Set extensions.  The
-default is @option{-mvis2} when targeting a cpu that supports such
-instructions, such as UltraSPARC-III and later.  Setting @option{-mvis2}
-also sets @option{-mvis}.
+@item i386
+Original Intel i386 CPU@.
 
-@item -mvis3
-@itemx -mno-vis3
-@opindex mvis3
-@opindex mno-vis3
-With @option{-mvis3}, GCC generates code that takes advantage of
-version 3.0 of the UltraSPARC Visual Instruction Set extensions.  The
-default is @option{-mvis3} when targeting a cpu that supports such
-instructions, such as niagara-3 and later.  Setting @option{-mvis3}
-also sets @option{-mvis2} and @option{-mvis}.
+@item i486
+Intel i486 CPU@.  (No scheduling is implemented for this chip.)
 
-@item -mcbcond
-@itemx -mno-cbcond
-@opindex mcbcond
-@opindex mno-cbcond
-With @option{-mcbcond}, GCC generates code that takes advantage of
-compare-and-branch instructions, as defined in the Sparc Architecture 2011.
-The default is @option{-mcbcond} when targeting a cpu that supports such
-instructions, such as niagara-4 and later.
+@item i586
+@itemx pentium
+Intel Pentium CPU with no MMX support.
 
-@item -mpopc
-@itemx -mno-popc
-@opindex mpopc
-@opindex mno-popc
-With @option{-mpopc}, GCC generates code that takes advantage of the UltraSPARC
-population count instruction.  The default is @option{-mpopc}
-when targeting a cpu that supports such instructions, such as Niagara-2 and
-later.
+@item pentium-mmx
+Intel Pentium MMX CPU, based on Pentium core with MMX instruction set support.
 
-@item -mfmaf
-@itemx -mno-fmaf
-@opindex mfmaf
-@opindex mno-fmaf
-With @option{-mfmaf}, GCC generates code that takes advantage of the UltraSPARC
-Fused Multiply-Add Floating-point extensions.  The default is @option{-mfmaf}
-when targeting a cpu that supports such instructions, such as Niagara-3 and
-later.
+@item pentiumpro
+Intel Pentium Pro CPU@.
 
-@item -mfix-at697f
-@opindex mfix-at697f
-Enable the documented workaround for the single erratum of the Atmel AT697F
-processor (which corresponds to erratum #13 of the AT697E processor).
+@item i686
+When used with @option{-march}, the Pentium Pro
+instruction set is used, so the code runs on all i686 family chips.
+When used with @option{-mtune}, it has the same meaning as @samp{generic}.
 
-@item -mfix-ut699
-@opindex mfix-ut699
-Enable the documented workarounds for the floating-point errata and the data
-cache nullify errata of the UT699 processor.
-@end table
+@item pentium2
+Intel Pentium II CPU, based on Pentium Pro core with MMX instruction set
+support.
 
-These @samp{-m} options are supported in addition to the above
-on SPARC-V9 processors in 64-bit environments:
+@item pentium3
+@itemx pentium3m
+Intel Pentium III CPU, based on Pentium Pro core with MMX and SSE instruction
+set support.
 
-@table @gcctabopt
-@item -m32
-@itemx -m64
-@opindex m32
-@opindex m64
-Generate code for a 32-bit or 64-bit environment.
-The 32-bit environment sets int, long and pointer to 32 bits.
-The 64-bit environment sets int to 32 bits and long and pointer
-to 64 bits.
+@item pentium-m
+Intel Pentium M; low-power version of Intel Pentium III CPU
+with MMX, SSE and SSE2 instruction set support.  Used by Centrino notebooks.
 
-@item -mcmodel=@var{which}
-@opindex mcmodel
-Set the code model to one of
+@item pentium4
+@itemx pentium4m
+Intel Pentium 4 CPU with MMX, SSE and SSE2 instruction set support.
 
-@table @samp
-@item medlow
-The Medium/Low code model: 64-bit addresses, programs
-must be linked in the low 32 bits of memory.  Programs can be statically
-or dynamically linked.
-
-@item medmid
-The Medium/Middle code model: 64-bit addresses, programs
-must be linked in the low 44 bits of memory, the text and data segments must
-be less than 2GB in size and the data segment must be located within 2GB of
-the text segment.
+@item prescott
+Improved version of Intel Pentium 4 CPU with MMX, SSE, SSE2 and SSE3 instruction
+set support.
 
-@item medany
-The Medium/Anywhere code model: 64-bit addresses, programs
-may be linked anywhere in memory, the text and data segments must be less
-than 2GB in size and the data segment must be located within 2GB of the
-text segment.
+@item nocona
+Improved version of Intel Pentium 4 CPU with 64-bit extensions, MMX, SSE,
+SSE2 and SSE3 instruction set support.
 
-@item embmedany
-The Medium/Anywhere code model for embedded systems:
-64-bit addresses, the text and data segments must be less than 2GB in
-size, both starting anywhere in memory (determined at link time).  The
-global register %g4 points to the base of the data segment.  Programs
-are statically linked and PIC is not supported.
-@end table
+@item core2
+Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3
+instruction set support.
 
-@item -mmemory-model=@var{mem-model}
-@opindex mmemory-model
-Set the memory model in force on the processor to one of
+@item nehalem
+Intel Nehalem CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
+SSE4.1, SSE4.2 and POPCNT instruction set support.
 
-@table @samp
-@item default
-The default memory model for the processor and operating system.
+@item westmere
+Intel Westmere CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
+SSE4.1, SSE4.2, POPCNT, AES and PCLMUL instruction set support.
 
-@item rmo
-Relaxed Memory Order
+@item sandybridge
+Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
+SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set support.
 
-@item pso
-Partial Store Order
+@item ivybridge
+Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
+SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C
+instruction set support.
 
-@item tso
-Total Store Order
+@item haswell
+Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
+SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
+BMI, BMI2 and F16C instruction set support.
 
-@item sc
-Sequential Consistency
-@end table
+@item broadwell
+Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
+SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
+BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set support.
 
-These memory models are formally defined in Appendix D of the Sparc V9
-architecture manual, as set in the processor's @code{PSTATE.MM} field.
+@item bonnell
+Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and SSSE3
+instruction set support.
 
-@item -mstack-bias
-@itemx -mno-stack-bias
-@opindex mstack-bias
-@opindex mno-stack-bias
-With @option{-mstack-bias}, GCC assumes that the stack pointer, and
-frame pointer if present, are offset by @minus{}2047 which must be added back
-when making stack frame references.  This is the default in 64-bit mode.
-Otherwise, assume no such offset is present.
-@end table
+@item silvermont
+Intel Silvermont CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
+SSE4.1, SSE4.2, POPCNT, AES, PCLMUL and RDRND instruction set support.
 
-@node SPU Options
-@subsection SPU Options
-@cindex SPU options
+@item k6
+AMD K6 CPU with MMX instruction set support.
 
-These @samp{-m} options are supported on the SPU:
+@item k6-2
+@itemx k6-3
+Improved versions of AMD K6 CPU with MMX and 3DNow!@: instruction set support.
 
-@table @gcctabopt
-@item -mwarn-reloc
-@itemx -merror-reloc
-@opindex mwarn-reloc
-@opindex merror-reloc
+@item athlon
+@itemx athlon-tbird
+AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow!@: and SSE prefetch instruction
+support.
 
-The loader for SPU does not handle dynamic relocations.  By default, GCC
-gives an error when it generates code that requires a dynamic
-relocation.  @option{-mno-error-reloc} disables the error,
-@option{-mwarn-reloc} generates a warning instead.
+@item athlon-4
+@itemx athlon-xp
+@itemx athlon-mp
+Improved AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow!@: and full SSE
+instruction set support.
 
-@item -msafe-dma
-@itemx -munsafe-dma
-@opindex msafe-dma
-@opindex munsafe-dma
+@item k8
+@itemx opteron
+@itemx athlon64
+@itemx athlon-fx
+Processors based on the AMD K8 core with x86-64 instruction set support,
+including the AMD Opteron, Athlon 64, and Athlon 64 FX processors.
+(This supersets MMX, SSE, SSE2, 3DNow!, enhanced 3DNow!@: and 64-bit
+instruction set extensions.)
 
-Instructions that initiate or test completion of DMA must not be
-reordered with respect to loads and stores of the memory that is being
-accessed.
-With @option{-munsafe-dma} you must use the @code{volatile} keyword to protect
-memory accesses, but that can lead to inefficient code in places where the
-memory is known to not change.  Rather than mark the memory as volatile,
-you can use @option{-msafe-dma} to tell the compiler to treat
-the DMA instructions as potentially affecting all memory.  
+@item k8-sse3
+@itemx opteron-sse3
+@itemx athlon64-sse3
+Improved versions of AMD K8 cores with SSE3 instruction set support.
 
-@item -mbranch-hints
-@opindex mbranch-hints
+@item amdfam10
+@itemx barcelona
+CPUs based on AMD Family 10h cores with x86-64 instruction set support.  (This
+supersets MMX, SSE, SSE2, SSE3, SSE4A, 3DNow!, enhanced 3DNow!, ABM and 64-bit
+instruction set extensions.)
 
-By default, GCC generates a branch hint instruction to avoid
-pipeline stalls for always-taken or probably-taken branches.  A hint
-is not generated closer than 8 instructions away from its branch.
-There is little reason to disable them, except for debugging purposes,
-or to make an object a little bit smaller.
+@item bdver1
+CPUs based on AMD Family 15h cores with x86-64 instruction set support.  (This
+supersets FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A,
+SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set extensions.)
+@item bdver2
+AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
+supersets BMI, TBM, F16C, FMA, FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX,
+SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set 
+extensions.)
+@item bdver3
+AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
+supersets BMI, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, XOP, LWP, AES, 
+PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 
+64-bit instruction set extensions.)
+@item bdver4
+AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
+supersets BMI, BMI2, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, AVX2, XOP, LWP, 
+AES, PCL_MUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, 
+SSE4.2, ABM and 64-bit instruction set extensions.)
 
-@item -msmall-mem
-@itemx -mlarge-mem
-@opindex msmall-mem
-@opindex mlarge-mem
+@item btver1
+CPUs based on AMD Family 14h cores with x86-64 instruction set support.  (This
+supersets MMX, SSE, SSE2, SSE3, SSSE3, SSE4A, CX16, ABM and 64-bit
+instruction set extensions.)
 
-By default, GCC generates code assuming that addresses are never larger
-than 18 bits.  With @option{-mlarge-mem} code is generated that assumes
-a full 32-bit address.
+@item btver2
+CPUs based on AMD Family 16h cores with x86-64 instruction set support. This
+includes MOVBE, F16C, BMI, AVX, PCL_MUL, AES, SSE4.2, SSE4.1, CX16, ABM,
+SSE4A, SSSE3, SSE3, SSE2, SSE, MMX and 64-bit instruction set extensions.
 
-@item -mstdmain
-@opindex mstdmain
+@item winchip-c6
+IDT WinChip C6 CPU, handled in the same way as i486 with additional MMX
+instruction set support.
 
-By default, GCC links against startup code that assumes the SPU-style
-main function interface (which has an unconventional parameter list).
-With @option{-mstdmain}, GCC links your program against startup
-code that assumes a C99-style interface to @code{main}, including a
-local copy of @code{argv} strings.
+@item winchip2
+IDT WinChip 2 CPU, handled in the same way as i486 with additional MMX and
+3DNow!@: instruction set support.
 
-@item -mfixed-range=@var{register-range}
-@opindex mfixed-range
-Generate code treating the given register range as fixed registers.
-A fixed register is one that the register allocator cannot use.  This is
-useful when compiling kernel code.  A register range is specified as
-two registers separated by a dash.  Multiple register ranges can be
-specified separated by a comma.
+@item c3
+VIA C3 CPU with MMX and 3DNow!@: instruction set support.  (No scheduling is
+implemented for this chip.)
 
-@item -mea32
-@itemx -mea64
-@opindex mea32
-@opindex mea64
-Compile code assuming that pointers to the PPU address space accessed
-via the @code{__ea} named address space qualifier are either 32 or 64
-bits wide.  The default is 32 bits.  As this is an ABI-changing option,
-all object code in an executable must be compiled with the same setting.
+@item c3-2
+VIA C3-2 (Nehemiah/C5XL) CPU with MMX and SSE instruction set support.
+(No scheduling is
+implemented for this chip.)
 
-@item -maddress-space-conversion
-@itemx -mno-address-space-conversion
-@opindex maddress-space-conversion
-@opindex mno-address-space-conversion
-Allow/disallow treating the @code{__ea} address space as superset
-of the generic address space.  This enables explicit type casts
-between @code{__ea} and generic pointer as well as implicit
-conversions of generic pointers to @code{__ea} pointers.  The
-default is to allow address space pointer conversions.
+@item geode
+AMD Geode embedded processor with MMX and 3DNow!@: instruction set support.
+@end table
 
-@item -mcache-size=@var{cache-size}
-@opindex mcache-size
-This option controls the version of libgcc that the compiler links to an
-executable and selects a software-managed cache for accessing variables
-in the @code{__ea} address space with a particular cache size.  Possible
-options for @var{cache-size} are @samp{8}, @samp{16}, @samp{32}, @samp{64}
-and @samp{128}.  The default cache size is 64KB.
+@item -mtune=@var{cpu-type}
+@opindex mtune
+Tune to @var{cpu-type} everything applicable about the generated code, except
+for the ABI and the set of available instructions.  
+While picking a specific @var{cpu-type} schedules things appropriately
+for that particular chip, the compiler does not generate any code that
+cannot run on the default machine type unless you use a
+@option{-march=@var{cpu-type}} option.
+For example, if GCC is configured for i686-pc-linux-gnu
+then @option{-mtune=pentium4} generates code that is tuned for Pentium 4
+but still runs on i686 machines.
 
-@item -matomic-updates
-@itemx -mno-atomic-updates
-@opindex matomic-updates
-@opindex mno-atomic-updates
-This option controls the version of libgcc that the compiler links to an
-executable and selects whether atomic updates to the software-managed
-cache of PPU-side variables are used.  If you use atomic updates, changes
-to a PPU variable from SPU code using the @code{__ea} named address space
-qualifier do not interfere with changes to other PPU variables residing
-in the same cache line from PPU code.  If you do not use atomic updates,
-such interference may occur; however, writing back cache lines is
-more efficient.  The default behavior is to use atomic updates.
+The choices for @var{cpu-type} are the same as for @option{-march}.
+In addition, @option{-mtune} supports two extra choices for @var{cpu-type}:
 
-@item -mdual-nops
-@itemx -mdual-nops=@var{n}
-@opindex mdual-nops
-By default, GCC inserts nops to increase dual issue when it expects
-it to increase performance.  @var{n} can be a value from 0 to 10.  A
-smaller @var{n} inserts fewer nops.  10 is the default, 0 is the
-same as @option{-mno-dual-nops}.  Disabled with @option{-Os}.
+@table @samp
+@item generic
+Produce code optimized for the most common IA32/@/AMD64/@/EM64T processors.
+If you know the CPU on which your code will run, then you should use
+the corresponding @option{-mtune} or @option{-march} option instead of
+@option{-mtune=generic}.  But, if you do not know exactly what CPU users
+of your application will have, then you should use this option.
 
-@item -mhint-max-nops=@var{n}
-@opindex mhint-max-nops
-Maximum number of nops to insert for a branch hint.  A branch hint must
-be at least 8 instructions away from the branch it is affecting.  GCC
-inserts up to @var{n} nops to enforce this, otherwise it does not
-generate the branch hint.
+As new processors are deployed in the marketplace, the behavior of this
+option will change.  Therefore, if you upgrade to a newer version of
+GCC, code generation controlled by this option will change to reflect
+the processors
+that are most common at the time that version of GCC is released.
 
-@item -mhint-max-distance=@var{n}
-@opindex mhint-max-distance
-The encoding of the branch hint instruction limits the hint to be within
-256 instructions of the branch it is affecting.  By default, GCC makes
-sure it is within 125.
+There is no @option{-march=generic} option because @option{-march}
+indicates the instruction set the compiler can use, and there is no
+generic instruction set applicable to all processors.  In contrast,
+@option{-mtune} indicates the processor (or, in this case, collection of
+processors) for which the code is optimized.
 
-@item -msafe-hints
-@opindex msafe-hints
-Work around a hardware bug that causes the SPU to stall indefinitely.
-By default, GCC inserts the @code{hbrp} instruction to make sure
-this stall won't happen.
+@item intel
+Produce code optimized for the most current Intel processors, which are
+Haswell and Silvermont for this version of GCC.  If you know the CPU
+on which your code will run, then you should use the corresponding
+@option{-mtune} or @option{-march} option instead of @option{-mtune=intel}.
+But, if you want your application to perform well on both Haswell and
+Silvermont, then you should use this option.
+
+As new Intel processors are deployed in the marketplace, the behavior of
+this option will change.  Therefore, if you upgrade to a newer version of
+GCC, code generation controlled by this option will change to reflect
+the most current Intel processors at the time that version of GCC is
+released.
 
+There is no @option{-march=intel} option because @option{-march} indicates
+the instruction set the compiler can use, and there is no common
+instruction set applicable to all processors.  In contrast,
+@option{-mtune} indicates the processor (or, in this case, collection of
+processors) for which the code is optimized.
 @end table
 
-@node System V Options
-@subsection Options for System V
+@item -mcpu=@var{cpu-type}
+@opindex mcpu
+A deprecated synonym for @option{-mtune}.
 
-These additional options are available on System V Release 4 for
-compatibility with other compilers on those systems:
+@item -mfpmath=@var{unit}
+@opindex mfpmath
+Generate floating-point arithmetic for selected unit @var{unit}.  The choices
+for @var{unit} are:
 
-@table @gcctabopt
-@item -G
-@opindex G
-Create a shared object.
-It is recommended that @option{-symbolic} or @option{-shared} be used instead.
+@table @samp
+@item 387
+Use the standard 387 floating-point coprocessor present on the majority of chips and
+emulated otherwise.  Code compiled with this option runs almost everywhere.
+The temporary results are computed in 80-bit precision instead of the precision
+specified by the type, resulting in slightly different results compared to most
+other chips.  See @option{-ffloat-store} for a more detailed description.
 
-@item -Qy
-@opindex Qy
-Identify the versions of each tool used by the compiler, in a
-@code{.ident} assembler directive in the output.
+This is the default choice for x86-32 targets.
 
-@item -Qn
-@opindex Qn
-Refrain from adding @code{.ident} directives to the output file (this is
-the default).
+@item sse
+Use scalar floating-point instructions present in the SSE instruction set.
+This instruction set is supported by Pentium III and newer chips,
+and in the AMD line
+by Athlon-4, Athlon XP and Athlon MP chips.  The earlier version of the SSE
+instruction set supports only single-precision arithmetic, so double- and
+extended-precision arithmetic is still done using the 387.  A later version, present
+only in Pentium 4 and AMD x86-64 chips, supports double-precision
+arithmetic too.
 
-@item -YP,@var{dirs}
-@opindex YP
-Search the directories @var{dirs}, and no others, for libraries
-specified with @option{-l}.
+For the x86-32 compiler, you must use @option{-march=@var{cpu-type}}, @option{-msse}
+or @option{-msse2} switches to enable SSE extensions and make this option
+effective.  For the x86-64 compiler, these extensions are enabled by default.
 
-@item -Ym,@var{dir}
-@opindex Ym
-Look in the directory @var{dir} to find the M4 preprocessor.
-The assembler uses this option.
-@c This is supposed to go with a -Yd for predefined M4 macro files, but
-@c the generic assembler that comes with Solaris takes just -Ym.
+The resulting code should be considerably faster in the majority of cases and avoid
+the numerical instability problems of 387 code, but may break some existing
+code that expects temporaries to be 80 bits.
+
+This is the default choice for the x86-64 compiler.
+
+@item sse,387
+@itemx sse+387
+@itemx both
+Attempt to utilize both instruction sets at once.  This effectively doubles the
+number of available registers, and on chips with separate execution units for
+387 and SSE the execution resources too.  Use this option with care, as it is
+still experimental, because the GCC register allocator does not model separate
+functional units well, resulting in unstable performance.
 @end table
 
-@node TILE-Gx Options
-@subsection TILE-Gx Options
-@cindex TILE-Gx options
+@item -masm=@var{dialect}
+@opindex masm=@var{dialect}
+Output assembly instructions using selected @var{dialect}.  Supported
+choices are @samp{intel} or @samp{att} (the default).  Darwin does
+not support @samp{intel}.
 
-These @samp{-m} options are supported on the TILE-Gx:
+@item -mieee-fp
+@itemx -mno-ieee-fp
+@opindex mieee-fp
+@opindex mno-ieee-fp
+Control whether or not the compiler uses IEEE floating-point
+comparisons.  These correctly handle the case where the result of a
+comparison is unordered.
 
-@table @gcctabopt
-@item -mcmodel=small
-@opindex mcmodel=small
-Generate code for the small model.  The distance for direct calls is
-limited to 500M in either direction.  PC-relative addresses are 32
-bits.  Absolute addresses support the full address range.
+@item -msoft-float
+@opindex msoft-float
+Generate output containing library calls for floating point.
 
-@item -mcmodel=large
-@opindex mcmodel=large
-Generate code for the large model.  There is no limitation on call
-distance, pc-relative addresses, or absolute addresses.
+@strong{Warning:} the requisite libraries are not part of GCC@.
+Normally the facilities of the machine's usual C compiler are used, but
+this can't be done directly in cross-compilation.  You must make your
+own arrangements to provide suitable library functions for
+cross-compilation.
 
-@item -mcpu=@var{name}
-@opindex mcpu
-Selects the type of CPU to be targeted.  Currently the only supported
-type is @samp{tilegx}.
+On machines where a function returns floating-point results in the 80387
+register stack, some floating-point opcodes may be emitted even if
+@option{-msoft-float} is used.
 
-@item -m32
-@itemx -m64
-@opindex m32
-@opindex m64
-Generate code for a 32-bit or 64-bit environment.  The 32-bit
-environment sets int, long, and pointer to 32 bits.  The 64-bit
-environment sets int to 32 bits and long and pointer to 64 bits.
+@item -mno-fp-ret-in-387
+@opindex mno-fp-ret-in-387
+Do not use the FPU registers for return values of functions.
 
-@item -mbig-endian
-@itemx -mlittle-endian
-@opindex mbig-endian
-@opindex mlittle-endian
-Generate code in big/little endian mode, respectively.
-@end table
+The usual calling convention has functions return values of types
+@code{float} and @code{double} in an FPU register, even if there
+is no FPU@.  The idea is that the operating system should emulate
+an FPU@.
+
+The option @option{-mno-fp-ret-in-387} causes such values to be returned
+in ordinary CPU registers instead.
+
+@item -mno-fancy-math-387
+@opindex mno-fancy-math-387
+Some 387 emulators do not support the @code{sin}, @code{cos} and
+@code{sqrt} instructions for the 387.  Specify this option to avoid
+generating those instructions.  This option is the default on FreeBSD,
+OpenBSD and NetBSD@.  This option is overridden when @option{-march}
+indicates that the target CPU always has an FPU and so the
+instruction does not need emulation.  These
+instructions are not generated unless you also use the
+@option{-funsafe-math-optimizations} switch.
+
+@item -malign-double
+@itemx -mno-align-double
+@opindex malign-double
+@opindex mno-align-double
+Control whether GCC aligns @code{double}, @code{long double}, and
+@code{long long} variables on a two-word boundary or a one-word
+boundary.  Aligning @code{double} variables on a two-word boundary
+produces code that runs somewhat faster on a Pentium at the
+expense of more memory.
+
+On x86-64, @option{-malign-double} is enabled by default.
+
+@strong{Warning:} if you use the @option{-malign-double} switch,
+structures containing the above types are aligned differently than
+the published application binary interface specifications for the x86-32
+and are not binary compatible with structures in code compiled
+without that switch.
+
+@item -m96bit-long-double
+@itemx -m128bit-long-double
+@opindex m96bit-long-double
+@opindex m128bit-long-double
+These switches control the size of @code{long double} type.  The x86-32
+application binary interface specifies the size to be 96 bits,
+so @option{-m96bit-long-double} is the default in 32-bit mode.
+
+Modern architectures (Pentium and newer) prefer @code{long double}
+to be aligned to an 8- or 16-byte boundary.  In arrays or structures
+conforming to the ABI, this is not possible.  So specifying
+@option{-m128bit-long-double} aligns @code{long double}
+to a 16-byte boundary by padding the @code{long double} with an additional
+32-bit zero.
+
+In the x86-64 compiler, @option{-m128bit-long-double} is the default choice as
+its ABI specifies that @code{long double} is aligned on a 16-byte boundary.
+
+Notice that neither of these options enables any extra precision over the x87
+standard of 80 bits for a @code{long double}.
+
+@strong{Warning:} if you override the default value for your target ABI, this
+changes the size of 
+structures and arrays containing @code{long double} variables,
+as well as modifying the function calling convention for functions taking
+@code{long double}.  Hence they are not binary-compatible
+with code compiled without that switch.
+
+@item -mlong-double-64
+@itemx -mlong-double-80
+@itemx -mlong-double-128
+@opindex mlong-double-64
+@opindex mlong-double-80
+@opindex mlong-double-128
+These switches control the size of @code{long double} type.  A size
+of 64 bits makes the @code{long double} type equivalent to the @code{double}
+type.  This is the default for the 32-bit Bionic C library.  A size
+of 128 bits makes the @code{long double} type equivalent to the
+@code{__float128} type.  This is the default for the 64-bit Bionic C library.
+
+@strong{Warning:} if you override the default value for your target ABI, this
+changes the size of
+structures and arrays containing @code{long double} variables,
+as well as modifying the function calling convention for functions taking
+@code{long double}.  Hence they are not binary-compatible
+with code compiled without that switch.
+
+@item -malign-data=@var{type}
+@opindex malign-data
+Control how GCC aligns variables.  Supported values for @var{type} are
+@samp{compat} uses increased alignment value compatible with GCC 4.8
+and earlier, @samp{abi} uses alignment value as specified by the
+psABI, and @samp{cacheline} uses increased alignment value to match
+the cache line size.  @samp{compat} is the default.
+
+@item -mlarge-data-threshold=@var{threshold}
+@opindex mlarge-data-threshold
+When @option{-mcmodel=medium} is specified, data objects larger than
+@var{threshold} are placed in the large data section.  This value must be the
+same across all objects linked into the binary, and defaults to 65535.
+
+@item -mrtd
+@opindex mrtd
+Use a different function-calling convention, in which functions that
+take a fixed number of arguments return with the @code{ret @var{num}}
+instruction, which pops their arguments while returning.  This saves one
+instruction in the caller since there is no need to pop the arguments
+there.
+
+You can specify that an individual function is called with this calling
+sequence with the function attribute @code{stdcall}.  You can also
+override the @option{-mrtd} option by using the function attribute
+@code{cdecl}.  @xref{Function Attributes}.
+
+@strong{Warning:} this calling convention is incompatible with the one
+normally used on Unix, so you cannot use it if you need to call
+libraries compiled with the Unix compiler.
+
+Also, you must provide function prototypes for all functions that
+take variable numbers of arguments (including @code{printf});
+otherwise incorrect code is generated for calls to those
+functions.
+
+In addition, seriously incorrect code results if you call a
+function with too many arguments.  (Normally, extra arguments are
+harmlessly ignored.)
+
+@item -mregparm=@var{num}
+@opindex mregparm
+Control how many registers are used to pass integer arguments.  By
+default, no registers are used to pass arguments, and at most 3
+registers can be used.  You can control this behavior for a specific
+function by using the function attribute @code{regparm}.
+@xref{Function Attributes}.
+
+@strong{Warning:} if you use this switch, and
+@var{num} is nonzero, then you must build all modules with the same
+value, including any libraries.  This includes the system libraries and
+startup modules.
+
+@item -msseregparm
+@opindex msseregparm
+Use SSE register passing conventions for float and double arguments
+and return values.  You can control this behavior for a specific
+function by using the function attribute @code{sseregparm}.
+@xref{Function Attributes}.
+
+@strong{Warning:} if you use this switch then you must build all
+modules with the same value, including any libraries.  This includes
+the system libraries and startup modules.
+
+@item -mvect8-ret-in-mem
+@opindex mvect8-ret-in-mem
+Return 8-byte vectors in memory instead of MMX registers.  This is the
+default on Solaris@tie{}8 and 9 and VxWorks to match the ABI of the Sun
+Studio compilers until version 12.  Later compiler versions (starting
+with Studio 12 Update@tie{}1) follow the ABI used by other x86 targets, which
+is the default on Solaris@tie{}10 and later.  @emph{Only} use this option if
+you need to remain compatible with existing code produced by those
+previous compiler versions or older versions of GCC@.
+
+@item -mpc32
+@itemx -mpc64
+@itemx -mpc80
+@opindex mpc32
+@opindex mpc64
+@opindex mpc80
+
+Set 80387 floating-point precision to 32, 64 or 80 bits.  When @option{-mpc32}
+is specified, the significands of results of floating-point operations are
+rounded to 24 bits (single precision); @option{-mpc64} rounds the
+significands of results of floating-point operations to 53 bits (double
+precision) and @option{-mpc80} rounds the significands of results of
+floating-point operations to 64 bits (extended double precision), which is
+the default.  When this option is used, floating-point operations in higher
+precisions are not available to the programmer without setting the FPU
+control word explicitly.
+
+Setting the rounding of floating-point operations to less than the default
+80 bits can speed some programs by 2% or more.  Note that some mathematical
+libraries assume that extended-precision (80-bit) floating-point operations
+are enabled by default; routines in such libraries could suffer significant
+loss of accuracy, typically through so-called ``catastrophic cancellation'',
+when this option is used to set the precision to less than extended precision.
+
+@item -mstackrealign
+@opindex mstackrealign
+Realign the stack at entry.  On the x86, the @option{-mstackrealign}
+option generates an alternate prologue and epilogue that realigns the
+run-time stack if necessary.  This supports mixing legacy codes that keep
+4-byte stack alignment with modern codes that keep 16-byte stack alignment for
+SSE compatibility.  See also the attribute @code{force_align_arg_pointer},
+applicable to individual functions.
+
+@item -mpreferred-stack-boundary=@var{num}
+@opindex mpreferred-stack-boundary
+Attempt to keep the stack boundary aligned to a 2 raised to @var{num}
+byte boundary.  If @option{-mpreferred-stack-boundary} is not specified,
+the default is 4 (16 bytes or 128 bits).
+
+@strong{Warning:} When generating code for the x86-64 architecture with
+SSE extensions disabled, @option{-mpreferred-stack-boundary=3} can be
+used to keep the stack boundary aligned to an 8-byte boundary.  Since
+the x86-64 ABI requires 16-byte stack alignment, this is ABI incompatible and
+intended to be used in controlled environments where stack space is an
+important limitation.  This option leads to wrong code when functions
+compiled with 16 byte stack alignment (such as functions from a standard
+library) are called with misaligned stack.  In this case, SSE
+instructions may lead to misaligned memory access traps.  In addition,
+variable arguments are handled incorrectly for 16 byte aligned
+objects (including x87 long double and __int128), leading to wrong
+results.  You must build all modules with
+@option{-mpreferred-stack-boundary=3}, including any libraries.  This
+includes the system libraries and startup modules.
+
+@item -mincoming-stack-boundary=@var{num}
+@opindex mincoming-stack-boundary
+Assume the incoming stack is aligned to a 2 raised to @var{num} byte
+boundary.  If @option{-mincoming-stack-boundary} is not specified,
+the one specified by @option{-mpreferred-stack-boundary} is used.
+
+On Pentium and Pentium Pro, @code{double} and @code{long double} values
+should be aligned to an 8-byte boundary (see @option{-malign-double}) or
+suffer significant run time performance penalties.  On Pentium III, the
+Streaming SIMD Extension (SSE) data type @code{__m128} may not work
+properly if it is not 16-byte aligned.
+
+To ensure proper alignment of these values on the stack, the stack boundary
+must be as aligned as that required by any value stored on the stack.
+Further, every function must be generated such that it keeps the stack
+aligned.  Thus calling a function compiled with a higher preferred
+stack boundary from a function compiled with a lower preferred stack
+boundary most likely misaligns the stack.  It is recommended that
+libraries that use callbacks always use the default setting.
+
+This extra alignment does consume extra stack space, and generally
+increases code size.  Code that is sensitive to stack space usage, such
+as embedded systems and operating system kernels, may want to reduce the
+preferred alignment to @option{-mpreferred-stack-boundary=2}.
+
+@need 200
+@item -mmmx
+@opindex mmmx
+@need 200
+@itemx -msse
+@opindex msse
+@need 200
+@itemx -msse2
+@need 200
+@itemx -msse3
+@need 200
+@itemx -mssse3
+@need 200
+@itemx -msse4
+@need 200
+@itemx -msse4a
+@need 200
+@itemx -msse4.1
+@need 200
+@itemx -msse4.2
+@need 200
+@itemx -mavx
+@opindex mavx
+@need 200
+@itemx -mavx2
+@need 200
+@itemx -mavx512f
+@need 200
+@itemx -mavx512pf
+@need 200
+@itemx -mavx512er
+@need 200
+@itemx -mavx512cd
+@need 200
+@itemx -msha
+@opindex msha
+@need 200
+@itemx -maes
+@opindex maes
+@need 200
+@itemx -mpclmul
+@opindex mpclmul
+@need 200
+@itemx -mclflushopt
+@opindex mclflushopt
+@need 200
+@itemx -mfsgsbase
+@opindex mfsgsbase
+@need 200
+@itemx -mrdrnd
+@opindex mrdrnd
+@need 200
+@itemx -mf16c
+@opindex mf16c
+@need 200
+@itemx -mfma
+@opindex mfma
+@need 200
+@itemx -mfma4
+@need 200
+@itemx -mno-fma4
+@need 200
+@itemx -mprefetchwt1
+@opindex mprefetchwt1
+@need 200
+@itemx -mxop
+@opindex mxop
+@need 200
+@itemx -mlwp
+@opindex mlwp
+@need 200
+@itemx -m3dnow
+@opindex m3dnow
+@need 200
+@itemx -mpopcnt
+@opindex mpopcnt
+@need 200
+@itemx -mabm
+@opindex mabm
+@need 200
+@itemx -mbmi
+@opindex mbmi
+@need 200
+@itemx -mbmi2
+@need 200
+@itemx -mlzcnt
+@opindex mlzcnt
+@need 200
+@itemx -mfxsr
+@opindex mfxsr
+@need 200
+@itemx -mxsave
+@opindex mxsave
+@need 200
+@itemx -mxsaveopt
+@opindex mxsaveopt
+@need 200
+@itemx -mxsavec
+@opindex mxsavec
+@need 200
+@itemx -mxsaves
+@opindex mxsaves
+@need 200
+@itemx -mrtm
+@opindex mrtm
+@need 200
+@itemx -mtbm
+@opindex mtbm
+@need 200
+@itemx -mmpx
+@opindex mmpx
+These switches enable the use of instructions in the MMX, SSE,
+SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD,
+SHA, AES, PCLMUL, FSGSBASE, RDRND, F16C, FMA, SSE4A, FMA4, XOP, LWP, ABM,
+BMI, BMI2, FXSR, XSAVE, XSAVEOPT, LZCNT, RTM, MPX or 3DNow!@:
+extended instruction sets.  Each has a corresponding @option{-mno-} option
+to disable use of these instructions.
+
+These extensions are also available as built-in functions: see
+@ref{x86 Built-in Functions}, for details of the functions enabled and
+disabled by these switches.
+
+To generate SSE/SSE2 instructions automatically from floating-point
+code (as opposed to 387 instructions), see @option{-mfpmath=sse}.
 
-@node TILEPro Options
-@subsection TILEPro Options
-@cindex TILEPro options
+GCC suppresses SSEx instructions when @option{-mavx} is used.  Instead, it
+generates new AVX instructions or AVX equivalents for all SSEx instructions
+when needed.
 
-These @samp{-m} options are supported on the TILEPro:
+These options enable GCC to use these extended instructions in
+generated code, even without @option{-mfpmath=sse}.  Applications that
+perform run-time CPU detection must compile separate files for each
+supported architecture, using the appropriate flags.  In particular,
+the file containing the CPU detection code should be compiled without
+these options.
 
-@table @gcctabopt
-@item -mcpu=@var{name}
-@opindex mcpu
-Selects the type of CPU to be targeted.  Currently the only supported
-type is @samp{tilepro}.
+@item -mdump-tune-features
+@opindex mdump-tune-features
+This option instructs GCC to dump the names of the x86 performance 
+tuning features and default settings. The names can be used in 
+@option{-mtune-ctrl=@var{feature-list}}.
 
-@item -m32
-@opindex m32
-Generate code for a 32-bit environment, which sets int, long, and
-pointer to 32 bits.  This is the only supported behavior so the flag
-is essentially ignored.
-@end table
+@item -mtune-ctrl=@var{feature-list}
+@opindex mtune-ctrl=@var{feature-list}
+This option is used to do fine-grained control of x86 code generation features.
+@var{feature-list} is a comma-separated list of @var{feature} names. See also
+@option{-mdump-tune-features}. When specified, the @var{feature} is turned
+on if it is not preceded with @samp{^}, otherwise, it is turned off. 
+@option{-mtune-ctrl=@var{feature-list}} is intended to be used by GCC
+developers. Using it may lead to code paths not covered by testing and can
+potentially result in compiler ICEs or runtime errors.
 
-@node V850 Options
-@subsection V850 Options
-@cindex V850 Options
+@item -mno-default
+@opindex mno-default
+This option instructs GCC to turn off all tunable features. See also 
+@option{-mtune-ctrl=@var{feature-list}} and @option{-mdump-tune-features}.
 
-These @samp{-m} options are defined for V850 implementations:
+@item -mcld
+@opindex mcld
+This option instructs GCC to emit a @code{cld} instruction in the prologue
+of functions that use string instructions.  String instructions depend on
+the DF flag to select between autoincrement or autodecrement mode.  While the
+ABI specifies the DF flag to be cleared on function entry, some operating
+systems violate this specification by not clearing the DF flag in their
+exception dispatchers.  The exception handler can be invoked with the DF flag
+set, which leads to wrong direction mode when string instructions are used.
+This option can be enabled by default on 32-bit x86 targets by configuring
+GCC with the @option{--enable-cld} configure option.  Generation of @code{cld}
+instructions can be suppressed with the @option{-mno-cld} compiler option
+in this case.
 
-@table @gcctabopt
-@item -mlong-calls
-@itemx -mno-long-calls
-@opindex mlong-calls
-@opindex mno-long-calls
-Treat all calls as being far away (near).  If calls are assumed to be
-far away, the compiler always loads the function's address into a
-register, and calls indirect through the pointer.
+@item -mvzeroupper
+@opindex mvzeroupper
+This option instructs GCC to emit a @code{vzeroupper} instruction
+before a transfer of control flow out of the function to minimize
+the AVX to SSE transition penalty as well as remove unnecessary @code{zeroupper}
+intrinsics.
 
-@item -mno-ep
-@itemx -mep
-@opindex mno-ep
-@opindex mep
-Do not optimize (do optimize) basic blocks that use the same index
-pointer 4 or more times to copy pointer into the @code{ep} register, and
-use the shorter @code{sld} and @code{sst} instructions.  The @option{-mep}
-option is on by default if you optimize.
+@item -mprefer-avx128
+@opindex mprefer-avx128
+This option instructs GCC to use 128-bit AVX instructions instead of
+256-bit AVX instructions in the auto-vectorizer.
 
-@item -mno-prolog-function
-@itemx -mprolog-function
-@opindex mno-prolog-function
-@opindex mprolog-function
-Do not use (do use) external functions to save and restore registers
-at the prologue and epilogue of a function.  The external functions
-are slower, but use less code space if more than one function saves
-the same number of registers.  The @option{-mprolog-function} option
-is on by default if you optimize.
+@item -mcx16
+@opindex mcx16
+This option enables GCC to generate @code{CMPXCHG16B} instructions.
+@code{CMPXCHG16B} allows for atomic operations on 128-bit double quadword
+(or oword) data types.  
+This is useful for high-resolution counters that can be updated
+by multiple processors (or cores).  This instruction is generated as part of
+atomic built-in functions: see @ref{__sync Builtins} or
+@ref{__atomic Builtins} for details.
 
-@item -mspace
-@opindex mspace
-Try to make the code as small as possible.  At present, this just turns
-on the @option{-mep} and @option{-mprolog-function} options.
+@item -msahf
+@opindex msahf
+This option enables generation of @code{SAHF} instructions in 64-bit code.
+Early Intel Pentium 4 CPUs with Intel 64 support,
+prior to the introduction of Pentium 4 G1 step in December 2005,
+lacked the @code{LAHF} and @code{SAHF} instructions
+which are supported by AMD64.
+These are load and store instructions, respectively, for certain status flags.
+In 64-bit mode, the @code{SAHF} instruction is used to optimize @code{fmod},
+@code{drem}, and @code{remainder} built-in functions;
+see @ref{Other Builtins} for details.
 
-@item -mtda=@var{n}
-@opindex mtda
-Put static or global variables whose size is @var{n} bytes or less into
-the tiny data area that register @code{ep} points to.  The tiny data
-area can hold up to 256 bytes in total (128 bytes for byte references).
+@item -mmovbe
+@opindex mmovbe
+This option enables use of the @code{movbe} instruction to implement
+@code{__builtin_bswap32} and @code{__builtin_bswap64}.
 
-@item -msda=@var{n}
-@opindex msda
-Put static or global variables whose size is @var{n} bytes or less into
-the small data area that register @code{gp} points to.  The small data
-area can hold up to 64 kilobytes.
+@item -mcrc32
+@opindex mcrc32
+This option enables built-in functions @code{__builtin_ia32_crc32qi},
+@code{__builtin_ia32_crc32hi}, @code{__builtin_ia32_crc32si} and
+@code{__builtin_ia32_crc32di} to generate the @code{crc32} machine instruction.
 
-@item -mzda=@var{n}
-@opindex mzda
-Put static or global variables whose size is @var{n} bytes or less into
-the first 32 kilobytes of memory.
+@item -mrecip
+@opindex mrecip
+This option enables use of @code{RCPSS} and @code{RSQRTSS} instructions
+(and their vectorized variants @code{RCPPS} and @code{RSQRTPS})
+with an additional Newton-Raphson step
+to increase precision instead of @code{DIVSS} and @code{SQRTSS}
+(and their vectorized
+variants) for single-precision floating-point arguments.  These instructions
+are generated only when @option{-funsafe-math-optimizations} is enabled
+together with @option{-finite-math-only} and @option{-fno-trapping-math}.
+Note that while the throughput of the sequence is higher than the throughput
+of the non-reciprocal instruction, the precision of the sequence can be
+decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
 
-@item -mv850
-@opindex mv850
-Specify that the target processor is the V850.
+Note that GCC implements @code{1.0f/sqrtf(@var{x})} in terms of @code{RSQRTSS}
+(or @code{RSQRTPS}) already with @option{-ffast-math} (or the above option
+combination), and doesn't need @option{-mrecip}.
 
-@item -mv850e3v5
-@opindex mv850e3v5
-Specify that the target processor is the V850E3V5.  The preprocessor
-constant @code{__v850e3v5__} is defined if this option is used.
+Also note that GCC emits the above sequence with additional Newton-Raphson step
+for vectorized single-float division and vectorized @code{sqrtf(@var{x})}
+already with @option{-ffast-math} (or the above option combination), and
+doesn't need @option{-mrecip}.
 
-@item -mv850e2v4
-@opindex mv850e2v4
-Specify that the target processor is the V850E3V5.  This is an alias for
-the @option{-mv850e3v5} option.
+@item -mrecip=@var{opt}
+@opindex mrecip=opt
+This option controls which reciprocal estimate instructions
+may be used.  @var{opt} is a comma-separated list of options, which may
+be preceded by a @samp{!} to invert the option:
 
-@item -mv850e2v3
-@opindex mv850e2v3
-Specify that the target processor is the V850E2V3.  The preprocessor
-constant @code{__v850e2v3__} is defined if this option is used.
+@table @samp
+@item all
+Enable all estimate instructions.
 
-@item -mv850e2
-@opindex mv850e2
-Specify that the target processor is the V850E2.  The preprocessor
-constant @code{__v850e2__} is defined if this option is used.
+@item default
+Enable the default instructions, equivalent to @option{-mrecip}.
 
-@item -mv850e1
-@opindex mv850e1
-Specify that the target processor is the V850E1.  The preprocessor
-constants @code{__v850e1__} and @code{__v850e__} are defined if
-this option is used.
+@item none
+Disable all estimate instructions, equivalent to @option{-mno-recip}.
 
-@item -mv850es
-@opindex mv850es
-Specify that the target processor is the V850ES.  This is an alias for
-the @option{-mv850e1} option.
+@item div
+Enable the approximation for scalar division.
 
-@item -mv850e
-@opindex mv850e
-Specify that the target processor is the V850E@.  The preprocessor
-constant @code{__v850e__} is defined if this option is used.
+@item vec-div
+Enable the approximation for vectorized division.
 
-If neither @option{-mv850} nor @option{-mv850e} nor @option{-mv850e1}
-nor @option{-mv850e2} nor @option{-mv850e2v3} nor @option{-mv850e3v5}
-are defined then a default target processor is chosen and the
-relevant @samp{__v850*__} preprocessor constant is defined.
+@item sqrt
+Enable the approximation for scalar square root.
 
-The preprocessor constants @code{__v850} and @code{__v851__} are always
-defined, regardless of which processor variant is the target.
+@item vec-sqrt
+Enable the approximation for vectorized square root.
+@end table
 
-@item -mdisable-callt
-@itemx -mno-disable-callt
-@opindex mdisable-callt
-@opindex mno-disable-callt
-This option suppresses generation of the @code{CALLT} instruction for the
-v850e, v850e1, v850e2, v850e2v3 and v850e3v5 flavors of the v850
-architecture.
+So, for example, @option{-mrecip=all,!sqrt} enables
+all of the reciprocal approximations, except for square root.
 
-This option is enabled by default when the RH850 ABI is
-in use (see @option{-mrh850-abi}), and disabled by default when the
-GCC ABI is in use.  If @code{CALLT} instructions are being generated
-then the C preprocessor symbol @code{__V850_CALLT__} is defined.
+@item -mveclibabi=@var{type}
+@opindex mveclibabi
+Specifies the ABI type to use for vectorizing intrinsics using an
+external library.  Supported values for @var{type} are @samp{svml} 
+for the Intel short
+vector math library and @samp{acml} for the AMD math core library.
+To use this option, both @option{-ftree-vectorize} and
+@option{-funsafe-math-optimizations} have to be enabled, and an SVML or ACML 
+ABI-compatible library must be specified at link time.
 
-@item -mrelax
-@itemx -mno-relax
-@opindex mrelax
-@opindex mno-relax
-Pass on (or do not pass on) the @option{-mrelax} command line option
-to the assembler.
+GCC currently emits calls to @code{vmldExp2},
+@code{vmldLn2}, @code{vmldLog102}, @code{vmldPow2},
+@code{vmldTanh2}, @code{vmldTan2}, @code{vmldAtan2}, @code{vmldAtanh2},
+@code{vmldCbrt2}, @code{vmldSinh2}, @code{vmldSin2}, @code{vmldAsinh2},
+@code{vmldAsin2}, @code{vmldCosh2}, @code{vmldCos2}, @code{vmldAcosh2},
+@code{vmldAcos2}, @code{vmlsExp4}, @code{vmlsLn4}, @code{vmlsLog104},
+@code{vmlsPow4}, @code{vmlsTanh4}, @code{vmlsTan4},
+@code{vmlsAtan4}, @code{vmlsAtanh4}, @code{vmlsCbrt4}, @code{vmlsSinh4},
+@code{vmlsSin4}, @code{vmlsAsinh4}, @code{vmlsAsin4}, @code{vmlsCosh4},
+@code{vmlsCos4}, @code{vmlsAcosh4} and @code{vmlsAcos4} for corresponding
+function type when @option{-mveclibabi=svml} is used, and @code{__vrd2_sin},
+@code{__vrd2_cos}, @code{__vrd2_exp}, @code{__vrd2_log}, @code{__vrd2_log2},
+@code{__vrd2_log10}, @code{__vrs4_sinf}, @code{__vrs4_cosf},
+@code{__vrs4_expf}, @code{__vrs4_logf}, @code{__vrs4_log2f},
+@code{__vrs4_log10f} and @code{__vrs4_powf} for the corresponding function type
+when @option{-mveclibabi=acml} is used.  
 
-@item -mlong-jumps
-@itemx -mno-long-jumps
-@opindex mlong-jumps
-@opindex mno-long-jumps
-Disable (or re-enable) the generation of PC-relative jump instructions.
+@item -mabi=@var{name}
+@opindex mabi
+Generate code for the specified calling convention.  Permissible values
+are @samp{sysv} for the ABI used on GNU/Linux and other systems, and
+@samp{ms} for the Microsoft ABI.  The default is to use the Microsoft
+ABI when targeting Microsoft Windows and the SysV ABI on all other systems.
+You can control this behavior for specific functions by
+using the function attributes @code{ms_abi} and @code{sysv_abi}.
+@xref{Function Attributes}.
 
-@item -msoft-float
-@itemx -mhard-float
-@opindex msoft-float
-@opindex mhard-float
-Disable (or re-enable) the generation of hardware floating point
-instructions.  This option is only significant when the target
-architecture is @samp{V850E2V3} or higher.  If hardware floating point
-instructions are being generated then the C preprocessor symbol
-@code{__FPU_OK__} is defined, otherwise the symbol
-@code{__NO_FPU__} is defined.
+@item -mtls-dialect=@var{type}
+@opindex mtls-dialect
+Generate code to access thread-local storage using the @samp{gnu} or
+@samp{gnu2} conventions.  @samp{gnu} is the conservative default;
+@samp{gnu2} is more efficient, but it may add compile- and run-time
+requirements that cannot be satisfied on all systems.
 
-@item -mloop
-@opindex mloop
-Enables the use of the e3v5 LOOP instruction.  The use of this
-instruction is not enabled by default when the e3v5 architecture is
-selected because its use is still experimental.
+@item -mpush-args
+@itemx -mno-push-args
+@opindex mpush-args
+@opindex mno-push-args
+Use PUSH operations to store outgoing parameters.  This method is shorter
+and usually as fast as the method using SUB/MOV operations and is enabled
+by default.  In some cases disabling it may improve performance because of
+improved scheduling and reduced dependencies.
 
-@item -mrh850-abi
-@itemx -mghs
-@opindex mrh850-abi
-@opindex mghs
-Enables support for the RH850 version of the V850 ABI.  This is the
-default.  With this version of the ABI the following rules apply:
+@item -maccumulate-outgoing-args
+@opindex maccumulate-outgoing-args
+If enabled, the maximum amount of space required for outgoing arguments is
+computed in the function prologue.  This is faster on most modern CPUs
+because of reduced dependencies, improved scheduling and reduced stack usage
+when the preferred stack boundary is not equal to 2.  The drawback is a notable
+increase in code size.  This switch implies @option{-mno-push-args}.
 
-@itemize
-@item
-Integer sized structures and unions are returned via a memory pointer
-rather than a register.
+@item -mthreads
+@opindex mthreads
+Support thread-safe exception handling on MinGW.  Programs that rely
+on thread-safe exception handling must compile and link all code with the
+@option{-mthreads} option.  When compiling, @option{-mthreads} defines
+@option{-D_MT}; when linking, it links in a special thread helper library
+@option{-lmingwthrd} which cleans up per-thread exception-handling data.
 
-@item
-Large structures and unions (more than 8 bytes in size) are passed by
-value.
+@item -mno-align-stringops
+@opindex mno-align-stringops
+Do not align the destination of inlined string operations.  This switch reduces
+code size and improves performance in case the destination is already aligned,
+but GCC doesn't know about it.
 
-@item
-Functions are aligned to 16-bit boundaries.
+@item -minline-all-stringops
+@opindex minline-all-stringops
+By default GCC inlines string operations only when the destination is 
+known to be aligned to at least a 4-byte boundary.  
+This enables more inlining and increases code
+size, but may improve performance of code that depends on fast
+@code{memcpy}, @code{strlen},
+and @code{memset} for short lengths.
 
-@item
-The @option{-m8byte-align} command line option is supported.
+@item -minline-stringops-dynamically
+@opindex minline-stringops-dynamically
+For string operations of unknown size, use run-time checks with
+inline code for small blocks and a library call for large blocks.
 
-@item
-The @option{-mdisable-callt} command line option is enabled by
-default.  The @option{-mno-disable-callt} command line option is not
-supported.
-@end itemize
+@item -mstringop-strategy=@var{alg}
+@opindex mstringop-strategy=@var{alg}
+Override the internal decision heuristic for the particular algorithm to use
+for inlining string operations.  The allowed values for @var{alg} are:
 
-When this version of the ABI is enabled the C preprocessor symbol
-@code{__V850_RH850_ABI__} is defined.
+@table @samp
+@item rep_byte
+@itemx rep_4byte
+@itemx rep_8byte
+Expand using i386 @code{rep} prefix of the specified size.
 
-@item -mgcc-abi
-@opindex mgcc-abi
-Enables support for the old GCC version of the V850 ABI.  With this
-version of the ABI the following rules apply:
+@item byte_loop
+@itemx loop
+@itemx unrolled_loop
+Expand into an inline loop.
 
-@itemize
-@item
-Integer sized structures and unions are returned in register @code{r10}.
+@item libcall
+Always use a library call.
+@end table
 
-@item
-Large structures and unions (more than 8 bytes in size) are passed by
-reference.
+@item -mmemcpy-strategy=@var{strategy}
+@opindex mmemcpy-strategy=@var{strategy}
+Override the internal decision heuristic to decide if @code{__builtin_memcpy}
+should be inlined and what inline algorithm to use when the expected size
+of the copy operation is known.  @var{strategy}
+is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets.
+@var{alg} is specified in @option{-mstringop-strategy}; @var{max_size} specifies
+the maximum byte size for which inline algorithm @var{alg} is allowed.  For the last
+triplet, @var{max_size} must be @code{-1}.  The @var{max_size} values of the triplets
+in the list must be specified in increasing order.  The minimum byte size for
+@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the
+preceding triplet for each later one.
 
-@item
-Functions are aligned to 32-bit boundaries, unless optimizing for
-size.
+@item -mmemset-strategy=@var{strategy}
+@opindex mmemset-strategy=@var{strategy}
+This option is similar to @option{-mmemcpy-strategy=} except that it controls
+@code{__builtin_memset} expansion.
 
-@item
-The @option{-m8byte-align} command line option is not supported.
+@item -momit-leaf-frame-pointer
+@opindex momit-leaf-frame-pointer
+Don't keep the frame pointer in a register for leaf functions.  This
+avoids the instructions to save, set up, and restore frame pointers and
+makes an extra register available in leaf functions.  The option
+@option{-fomit-frame-pointer} removes the frame pointer for all functions,
+which might make debugging harder.
 
-@item
-The @option{-mdisable-callt} command line option is supported but not
-enabled by default.
-@end itemize
+@item -mtls-direct-seg-refs
+@itemx -mno-tls-direct-seg-refs
+@opindex mtls-direct-seg-refs
+Controls whether TLS variables may be accessed with offsets from the
+TLS segment register (@code{%gs} for 32-bit, @code{%fs} for 64-bit),
+or whether the thread base pointer must be added.  Whether or not this
+is valid depends on the operating system, and whether it maps the
+segment to cover the entire TLS area.
 
-When this version of the ABI is enabled the C preprocessor symbol
-@code{__V850_GCC_ABI__} is defined.
+For systems that use the GNU C Library, the default is on.
 
-@item -m8byte-align
-@itemx -mno-8byte-align
-@opindex m8byte-align
-@opindex mno-8byte-align
-Enables support for @code{double} and @code{long long} types to be
-aligned on 8-byte boundaries.  The default is to restrict the
-alignment of all objects to at most 4-bytes.  When
-@option{-m8byte-align} is in effect the C preprocessor symbol
-@code{__V850_8BYTE_ALIGN__} is defined.
+@item -msse2avx
+@itemx -mno-sse2avx
+@opindex msse2avx
+Specify that the assembler should encode SSE instructions with VEX
+prefix.  The option @option{-mavx} turns this on by default.
 
-@item -mbig-switch
-@opindex mbig-switch
-Generate code suitable for big switch tables.  Use this option only if
-the assembler/linker complain about out of range branches within a switch
-table.
+@item -mfentry
+@itemx -mno-fentry
+@opindex mfentry
+If profiling is active (@option{-pg}), put the profiling
+counter call before the prologue.
+Note: On x86 architectures the attribute @code{ms_hook_prologue}
+cannot currently be combined with @option{-mfentry} and @option{-pg}.
 
-@item -mapp-regs
-@opindex mapp-regs
-This option causes r2 and r5 to be used in the code generated by
-the compiler.  This setting is the default.
+@item -mrecord-mcount
+@itemx -mno-record-mcount
+@opindex mrecord-mcount
+If profiling is active (@option{-pg}), generate a @code{__mcount_loc} section
+that contains pointers to each profiling call.  This is useful for
+automatically patching calls in and out.
 
-@item -mno-app-regs
-@opindex mno-app-regs
-This option causes r2 and r5 to be treated as fixed registers.
+@item -mnop-mcount
+@itemx -mno-nop-mcount
+@opindex mnop-mcount
+If profiling is active (@option{-pg}), generate the calls to
+the profiling functions as nops. This is useful when they
+should be patched in later dynamically. This is likely only
+useful together with @option{-mrecord-mcount}.
 
-@end table
+@item -mskip-rax-setup
+@itemx -mno-skip-rax-setup
+@opindex mskip-rax-setup
+When generating code for the x86-64 architecture with SSE extensions
+disabled, @option{-mskip-rax-setup} can be used to skip setting up the RAX
+register when there are no variable arguments passed in vector registers.
 
-@node VAX Options
-@subsection VAX Options
-@cindex VAX options
+@strong{Warning:} Since the RAX register is used to avoid unnecessarily
+saving vector registers on the stack when passing variable arguments, the
+impact of this option is that callees may waste some stack space,
+misbehave, or jump to a random location.  GCC 4.4 and newer do not have
+those issues, regardless of the RAX register value.
 
-These @samp{-m} options are defined for the VAX:
+@item -m8bit-idiv
+@itemx -mno-8bit-idiv
+@opindex m8bit-idiv
+On some processors, like Intel Atom, 8-bit unsigned integer divide is
+much faster than 32-bit/64-bit integer divide.  This option generates a
+run-time check.  If both dividend and divisor are within range of 0
+to 255, 8-bit unsigned integer divide is used instead of
+32-bit/64-bit integer divide.
 
-@table @gcctabopt
-@item -munix
-@opindex munix
-Do not output certain jump instructions (@code{aobleq} and so on)
-that the Unix assembler for the VAX cannot handle across long
-ranges.
+@item -mavx256-split-unaligned-load
+@itemx -mavx256-split-unaligned-store
+@opindex mavx256-split-unaligned-load
+@opindex mavx256-split-unaligned-store
+Split 32-byte AVX unaligned load and store.
 
-@item -mgnu
-@opindex mgnu
-Do output those jump instructions, on the assumption that the
-GNU assembler is being used.
+@item -mstack-protector-guard=@var{guard}
+@opindex mstack-protector-guard=@var{guard}
+Generate stack protection code using a canary at @var{guard}.  Supported
+locations are @samp{global} for global canary or @samp{tls} for per-thread
+canary in the TLS block (the default).  This option has effect only when
+@option{-fstack-protector} or @option{-fstack-protector-all} is specified.
 
-@item -mg
-@opindex mg
-Output code for G-format floating-point numbers instead of D-format.
 @end table
 
-@node Visium Options
-@subsection Visium Options
-@cindex Visium options
+These @samp{-m} switches are supported in addition to the above
+on x86-64 processors in 64-bit environments.
 
 @table @gcctabopt
+@item -m32
+@itemx -m64
+@itemx -mx32
+@itemx -m16
+@opindex m32
+@opindex m64
+@opindex mx32
+@opindex m16
+Generate code for a 16-bit, 32-bit or 64-bit environment.
+The @option{-m32} option sets @code{int}, @code{long}, and pointer types
+to 32 bits, and
+generates code that runs on any i386 system.
 
-@item -mdebug
-@opindex mdebug
-A program which performs file I/O and is destined to run on an MCM target
-should be linked with this option.  It causes the libraries libc.a and
-libdebug.a to be linked.  The program should be run on the target under
-the control of the GDB remote debugging stub.
-
-@item -msim
-@opindex msim
-A program which performs file I/O and is destined to run on the simulator
-should be linked with option.  This causes libraries libc.a and libsim.a to
-be linked.
-
-@item -mfpu
-@itemx -mhard-float
-@opindex mfpu
-@opindex mhard-float
-Generate code containing floating-point instructions.  This is the
-default.
+The @option{-m64} option sets @code{int} to 32 bits and @code{long} and pointer
+types to 64 bits, and generates code for the x86-64 architecture.
+For Darwin only the @option{-m64} option also turns off the @option{-fno-pic}
+and @option{-mdynamic-no-pic} options.
 
-@item -mno-fpu
-@itemx -msoft-float
-@opindex mno-fpu
-@opindex msoft-float
-Generate code containing library calls for floating-point.
+The @option{-mx32} option sets @code{int}, @code{long}, and pointer types
+to 32 bits, and
+generates code for the x86-64 architecture.
 
-@option{-msoft-float} changes the calling convention in the output file;
-therefore, it is only useful if you compile @emph{all} of a program with
-this option.  In particular, you need to compile @file{libgcc.a}, the
-library that comes with GCC, with @option{-msoft-float} in order for
-this to work.
+The @option{-m16} option is the same as @option{-m32}, except that
+it outputs the @code{.code16gcc} assembly directive at the beginning of
+the assembly output so that the binary can run in 16-bit mode.
 
-@item -mcpu=@var{cpu_type}
-@opindex mcpu
-Set the instruction set, register set, and instruction scheduling parameters
-for machine type @var{cpu_type}.  Supported values for @var{cpu_type} are
-@samp{mcm}, @samp{gr5} and @samp{gr6}.
+@item -mno-red-zone
+@opindex mno-red-zone
+Do not use a so-called ``red zone'' for x86-64 code.  The red zone is mandated
+by the x86-64 ABI; it is a 128-byte area beyond the location of the
+stack pointer that is not modified by signal or interrupt handlers
+and therefore can be used for temporary data without adjusting the stack
+pointer.  The flag @option{-mno-red-zone} disables this red zone.
 
-@samp{mcm} is a synonym of @samp{gr5} present for backward compatibility.
+@item -mcmodel=small
+@opindex mcmodel=small
+Generate code for the small code model: the program and its symbols must
+be linked in the lower 2 GB of the address space.  Pointers are 64 bits.
+Programs can be statically or dynamically linked.  This is the default
+code model.
 
-By default (unless configured otherwise), GCC generates code for the GR5
-variant of the Visium architecture.  
+@item -mcmodel=kernel
+@opindex mcmodel=kernel
+Generate code for the kernel code model.  The kernel runs in the
+negative 2 GB of the address space.
+This model has to be used for Linux kernel code.
 
-With @option{-mcpu=gr6}, GCC generates code for the GR6 variant of the Visium
-architecture.  The only difference from GR5 code is that the compiler will
-generate block move instructions.
+@item -mcmodel=medium
+@opindex mcmodel=medium
+Generate code for the medium model: the program is linked in the lower 2
+GB of the address space.  Small symbols are also placed there.  Symbols
+with sizes larger than @option{-mlarge-data-threshold} are put into
+large data or BSS sections and can be located above 2 GB.  Programs can
+be statically or dynamically linked.
 
-@item -mtune=@var{cpu_type}
-@opindex mtune
-Set the instruction scheduling parameters for machine type @var{cpu_type},
-but do not set the instruction set or register set that the option
-@option{-mcpu=@var{cpu_type}} would.
+@item -mcmodel=large
+@opindex mcmodel=large
+Generate code for the large model.  This model makes no assumptions
+about addresses and sizes of sections.
 
-@item -msv-mode
-@opindex msv-mode
-Generate code for the supervisor mode, where there are no restrictions on
-the access to general registers.  This is the default.
+@item -maddress-mode=long
+@opindex maddress-mode=long
+Generate code for long address mode.  This is only supported for 64-bit
+and x32 environments.  It is the default address mode for 64-bit
+environments.
 
-@item -muser-mode
-@opindex muser-mode
-Generate code for the user mode, where the access to some general registers
-is forbidden: on the GR5, registers r24 to r31 cannot be accessed in this
-mode; on the GR6, only registers r29 to r31 are affected.
+@item -maddress-mode=short
+@opindex maddress-mode=short
+Generate code for short address mode.  This is only supported for 32-bit
+and x32 environments.  It is the default address mode for 32-bit and
+x32 environments.
 @end table
 
-@node VMS Options
-@subsection VMS Options
+@node x86 Windows Options
+@subsection x86 Windows Options
+@cindex x86 Windows Options
+@cindex Windows Options for x86
 
-These @samp{-m} options are defined for the VMS implementations:
+These additional options are available for Microsoft Windows targets:
 
 @table @gcctabopt
-@item -mvms-return-codes
-@opindex mvms-return-codes
-Return VMS condition codes from @code{main}. The default is to return POSIX-style
-condition (e.g.@ error) codes.
-
-@item -mdebug-main=@var{prefix}
-@opindex mdebug-main=@var{prefix}
-Flag the first routine whose name starts with @var{prefix} as the main
-routine for the debugger.
+@item -mconsole
+@opindex mconsole
+This option
+specifies that a console application is to be generated, by
+instructing the linker to set the PE header subsystem type
+required for console applications.
+This option is available for Cygwin and MinGW targets and is
+enabled by default on those targets.
 
-@item -mmalloc64
-@opindex mmalloc64
-Default to 64-bit memory allocation routines.
+@item -mdll
+@opindex mdll
+This option is available for Cygwin and MinGW targets.  It
+specifies that a DLL---a dynamic link library---is to be
+generated, enabling the selection of the required runtime
+startup object and entry point.
 
-@item -mpointer-size=@var{size}
-@opindex mpointer-size=@var{size}
-Set the default size of pointers. Possible options for @var{size} are
-@samp{32} or @samp{short} for 32 bit pointers, @samp{64} or @samp{long}
-for 64 bit pointers, and @samp{no} for supporting only 32 bit pointers.
-The later option disables @code{pragma pointer_size}.
-@end table
+@item -mnop-fun-dllimport
+@opindex mnop-fun-dllimport
+This option is available for Cygwin and MinGW targets.  It
+specifies that the @code{dllimport} attribute should be ignored.
 
-@node VxWorks Options
-@subsection VxWorks Options
-@cindex VxWorks Options
+@item -mthreads
+@opindex mthreads
+This option is available for MinGW targets. It specifies
+that MinGW-specific thread support is to be used.
 
-The options in this section are defined for all VxWorks targets.
-Options specific to the target hardware are listed with the other
-options for that target.
+@item -municode
+@opindex municode
+This option is available for MinGW-w64 targets.  It causes
+the @code{UNICODE} preprocessor macro to be predefined, and
+chooses Unicode-capable runtime startup code.
 
-@table @gcctabopt
-@item -mrtp
-@opindex mrtp
-GCC can generate code for both VxWorks kernels and real time processes
-(RTPs).  This option switches from the former to the latter.  It also
-defines the preprocessor macro @code{__RTP__}.
+@item -mwin32
+@opindex mwin32
+This option is available for Cygwin and MinGW targets.  It
+specifies that the typical Microsoft Windows predefined macros are to
+be set in the preprocessor, but does not influence the choice
+of runtime library/startup code.
 
-@item -non-static
-@opindex non-static
-Link an RTP executable against shared libraries rather than static
-libraries.  The options @option{-static} and @option{-shared} can
-also be used for RTPs (@pxref{Link Options}); @option{-static}
-is the default.
+@item -mwindows
+@opindex mwindows
+This option is available for Cygwin and MinGW targets.  It
+specifies that a GUI application is to be generated by
+instructing the linker to set the PE header subsystem type
+appropriately.
 
-@item -Bstatic
-@itemx -Bdynamic
-@opindex Bstatic
-@opindex Bdynamic
-These options are passed down to the linker.  They are defined for
-compatibility with Diab.
+@item -fno-set-stack-executable
+@opindex fno-set-stack-executable
+This option is available for MinGW targets. It specifies that
+the executable flag for the stack used by nested functions isn't
+set. This is necessary for binaries running in kernel mode of
+Microsoft Windows, as there the User32 API, which is used to set executable
+privileges, isn't available.
 
-@item -Xbind-lazy
-@opindex Xbind-lazy
-Enable lazy binding of function calls.  This option is equivalent to
-@option{-Wl,-z,now} and is defined for compatibility with Diab.
+@item -fwritable-relocated-rdata
+@opindex fwritable-relocated-rdata
+This option is available for MinGW and Cygwin targets.  It specifies
+that relocated data in read-only sections is put into the @code{.data}
+section.  This is necessary for older runtimes not supporting
+modification of @code{.rdata} sections for pseudo-relocation.
 
-@item -Xbind-now
-@opindex Xbind-now
-Disable lazy binding of function calls.  This option is the default and
-is defined for compatibility with Diab.
+@item -mpe-aligned-commons
+@opindex mpe-aligned-commons
+This option is available for Cygwin and MinGW targets.  It
+specifies that the GNU extension to the PE file format that
+permits the correct alignment of COMMON variables should be
+used when generating code.  It is enabled by default if
+GCC detects that the target assembler found during configuration
+supports the feature.
 @end table
 
+See also under @ref{x86 Options} for standard options.
+
 @node Xstormy16 Options
 @subsection Xstormy16 Options
 @cindex Xstormy16 Options
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 03faa12..f2c25c2 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -1695,6 +1695,7 @@ constraints that aren't.  The compiler source file mentioned in the
 table heading for each architecture is the definitive reference for
 the meanings of that architecture's constraints.
 
+@c Please keep this table alphabetized by target!
 @table @emph
 @item AArch64 family---@file{config/aarch64/constraints.md}
 @table @code
@@ -1931,6 +1932,157 @@ A floating point constant 0.0
 A memory address based on Y or Z pointer with displacement.
 @end table
 
+@item Blackfin family---@file{config/bfin/constraints.md}
+@table @code
+@item a
+P register
+
+@item d
+D register
+
+@item z
+A call clobbered P register.
+
+@item q@var{n}
+A single register.  If @var{n} is in the range 0 to 7, the corresponding D
+register.  If it is @code{A}, then the register P0.
+
+@item D
+Even-numbered D register
+
+@item W
+Odd-numbered D register
+
+@item e
+Accumulator register.
+
+@item A
+Even-numbered accumulator register.
+
+@item B
+Odd-numbered accumulator register.
+
+@item b
+I register
+
+@item v
+B register
+
+@item f
+M register
+
+@item c
+Registers used for circular buffering, i.e.@: I, B, or L registers.
+
+@item C
+The CC register.
+
+@item t
+LT0 or LT1.
+
+@item k
+LC0 or LC1.
+
+@item u
+LB0 or LB1.
+
+@item x
+Any D, P, B, M, I or L register.
+
+@item y
+Additional registers typically used only in prologues and epilogues: RETS,
+RETN, RETI, RETX, RETE, ASTAT, SEQSTAT and USP.
+
+@item w
+Any register except accumulators or CC.
+
+@item Ksh
+Signed 16 bit integer (in the range @minus{}32768 to 32767)
+
+@item Kuh
+Unsigned 16 bit integer (in the range 0 to 65535)
+
+@item Ks7
+Signed 7 bit integer (in the range @minus{}64 to 63)
+
+@item Ku7
+Unsigned 7 bit integer (in the range 0 to 127)
+
+@item Ku5
+Unsigned 5 bit integer (in the range 0 to 31)
+
+@item Ks4
+Signed 4 bit integer (in the range @minus{}8 to 7)
+
+@item Ks3
+Signed 3 bit integer (in the range @minus{}4 to 3)
+
+@item Ku3
+Unsigned 3 bit integer (in the range 0 to 7)
+
+@item P@var{n}
+Constant @var{n}, where @var{n} is a single-digit constant in the range 0 to 4.
+
+@item PA
+An integer equal to one of the MACFLAG_XXX constants that is suitable for
+use with either accumulator.
+
+@item PB
+An integer equal to one of the MACFLAG_XXX constants that is suitable for
+use only with accumulator A1.
+
+@item M1
+Constant 255.
+
+@item M2
+Constant 65535.
+
+@item J
+An integer constant with exactly a single bit set.
+
+@item L
+An integer constant with all bits set except exactly one.
+
+@item H
+
+@item Q
+Any SYMBOL_REF.
+@end table
+
+@item CR16 Architecture---@file{config/cr16/cr16.h}
+@table @code
+
+@item b
+Registers from r0 to r14 (registers without stack pointer)
+
+@item t
+Register from r0 to r11 (all 16-bit registers)
+
+@item p
+Register from r12 to r15 (all 32-bit registers)
+
+@item I
+Signed constant that fits in 4 bits
+
+@item J
+Signed constant that fits in 5 bits
+
+@item K
+Signed constant that fits in 6 bits
+
+@item L
+Unsigned constant that fits in 4 bits
+
+@item M
+Signed constant that fits in 32 bits
+
+@item N
+Check for 64 bits wide constants for add/sub instructions
+
+@item G
+Floating point constant that is legal for store immediate
+@end table
+
 @item Epiphany---@file{config/epiphany/constraints.md}
 @table @code
 @item U16
@@ -2002,38 +2154,97 @@ Matches control register values to switch fp mode, which are encapsulated in
 @code{UNSPEC_FP_MODE}.
 @end table
 
-@item CR16 Architecture---@file{config/cr16/cr16.h}
+@item FRV---@file{config/frv/frv.h}
 @table @code
+@item a
+Register in the class @code{ACC_REGS} (@code{acc0} to @code{acc7}).
 
 @item b
-Registers from r0 to r14 (registers without stack pointer)
+Register in the class @code{EVEN_ACC_REGS} (@code{acc0} to @code{acc7}).
+
+@item c
+Register in the class @code{CC_REGS} (@code{fcc0} to @code{fcc3} and
+@code{icc0} to @code{icc3}).
+
+@item d
+Register in the class @code{GPR_REGS} (@code{gr0} to @code{gr63}).
+
+@item e
+Register in the class @code{EVEN_REGS} (@code{gr0} to @code{gr63}).
+Odd registers are excluded not in the class but through the use of a machine
+mode larger than 4 bytes.
+
+@item f
+Register in the class @code{FPR_REGS} (@code{fr0} to @code{fr63}).
+
+@item h
+Register in the class @code{FEVEN_REGS} (@code{fr0} to @code{fr63}).
+Odd registers are excluded not in the class but through the use of a machine
+mode larger than 4 bytes.
+
+@item l
+Register in the class @code{LR_REG} (the @code{lr} register).
+
+@item q
+Register in the class @code{QUAD_REGS} (@code{gr2} to @code{gr63}).
+Register numbers not divisible by 4 are excluded not in the class but through
+the use of a machine mode larger than 8 bytes.
 
 @item t
-Register from r0 to r11 (all 16-bit registers)
+Register in the class @code{ICC_REGS} (@code{icc0} to @code{icc3}).
 
-@item p
-Register from r12 to r15 (all 32-bit registers)
+@item u
+Register in the class @code{FCC_REGS} (@code{fcc0} to @code{fcc3}).
+
+@item v
+Register in the class @code{ICR_REGS} (@code{cc4} to @code{cc7}).
+
+@item w
+Register in the class @code{FCR_REGS} (@code{cc0} to @code{cc3}).
+
+@item x
+Register in the class @code{QUAD_FPR_REGS} (@code{fr0} to @code{fr63}).
+Register numbers not divisible by 4 are excluded not in the class but through
+the use of a machine mode larger than 8 bytes.
+
+@item z
+Register in the class @code{SPR_REGS} (@code{lcr} and @code{lr}).
+
+@item A
+Register in the class @code{QUAD_ACC_REGS} (@code{acc0} to @code{acc7}).
+
+@item B
+Register in the class @code{ACCG_REGS} (@code{accg0} to @code{accg7}).
+
+@item C
+Register in the class @code{CR_REGS} (@code{cc0} to @code{cc7}).
+
+@item G
+Floating point constant zero
 
 @item I
-Signed constant that fits in 4 bits
+6-bit signed integer constant
 
 @item J
-Signed constant that fits in 5 bits
-
-@item K
-Signed constant that fits in 6 bits
+10-bit signed integer constant
 
 @item L
-Unsigned constant that fits in 4 bits
+16-bit signed integer constant
 
 @item M
-Signed constant that fits in 32 bits
+16-bit unsigned integer constant
 
 @item N
-Check for 64 bits wide constants for add/sub instructions
+12-bit signed integer constant that is negative---i.e.@: in the
+range of @minus{}2048 to @minus{}1
+
+@item O
+Constant zero
+
+@item P
+12-bit signed integer constant that is greater than zero---i.e.@: in the
+range of 1 to 2047.
 
-@item G
-Floating point constant that is legal for store immediate
 @end table
 
 @item Hewlett-Packard PA-RISC---@file{config/pa/pa.h}
@@ -2107,615 +2318,68 @@ A memory operand for floating-point loads and stores
 A register indirect memory operand
 @end table
 
-@item PowerPC and IBM RS6000---@file{config/rs6000/constraints.md}
+@item Intel IA-64---@file{config/ia64/ia64.h}
 @table @code
-@item b
-Address base register
-
-@item d
-Floating point register (containing 64-bit value)
-
-@item f
-Floating point register (containing 32-bit value)
+@item a
+General register @code{r0} to @code{r3} for @code{addl} instruction
 
-@item v
-Altivec vector register
-
-@item wa
-Any VSX register if the -mvsx option was used or NO_REGS.
-
-@item wd
-VSX vector register to hold vector double data or NO_REGS.
-
-@item wf
-VSX vector register to hold vector float data or NO_REGS.
-
-@item wg
-If @option{-mmfpgpr} was used, a floating point register or NO_REGS.
-
-@item wh
-Floating point register if direct moves are available, or NO_REGS.
-
-@item wi
-FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS.
-
-@item wj
-FP or VSX register to hold 64-bit integers for direct moves or NO_REGS.
-
-@item wk
-FP or VSX register to hold 64-bit doubles for direct moves or NO_REGS.
-
-@item wl
-Floating point register if the LFIWAX instruction is enabled or NO_REGS.
-
-@item wm
-VSX register if direct move instructions are enabled, or NO_REGS.
-
-@item wn
-No register (NO_REGS).
-
-@item wr
-General purpose register if 64-bit instructions are enabled or NO_REGS.
-
-@item ws
-VSX vector register to hold scalar double values or NO_REGS.
-
-@item wt
-VSX vector register to hold 128 bit integer or NO_REGS.
-
-@item wu
-Altivec register to use for float/32-bit int loads/stores  or NO_REGS.
-
-@item wv
-Altivec register to use for double loads/stores  or NO_REGS.
-
-@item ww
-FP or VSX register to perform float operations under @option{-mvsx} or NO_REGS.
-
-@item wx
-Floating point register if the STFIWX instruction is enabled or NO_REGS.
-
-@item wy
-FP or VSX register to perform ISA 2.07 float ops or NO_REGS.
-
-@item wz
-Floating point register if the LFIWZX instruction is enabled or NO_REGS.
-
-@item wD
-Int constant that is the element number of the 64-bit scalar in a vector.
-
-@item wQ
-A memory address that will work with the @code{lq} and @code{stq}
-instructions.
-
-@item h
-@samp{MQ}, @samp{CTR}, or @samp{LINK} register
-
-@item q
-@samp{MQ} register
-
-@item c
-@samp{CTR} register
-
-@item l
-@samp{LINK} register
-
-@item x
-@samp{CR} register (condition register) number 0
-
-@item y
-@samp{CR} register (condition register)
-
-@item z
-@samp{XER[CA]} carry bit (part of the XER register)
-
-@item I
-Signed 16-bit constant
-
-@item J
-Unsigned 16-bit constant shifted left 16 bits (use @samp{L} instead for
-@code{SImode} constants)
-
-@item K
-Unsigned 16-bit constant
-
-@item L
-Signed 16-bit constant shifted left 16 bits
-
-@item M
-Constant larger than 31
-
-@item N
-Exact power of 2
-
-@item O
-Zero
-
-@item P
-Constant whose negation is a signed 16-bit constant
-
-@item G
-Floating point constant that can be loaded into a register with one
-instruction per word
-
-@item H
-Integer/Floating point constant that can be loaded into a register using
-three instructions
-
-@item m
-Memory operand.
-Normally, @code{m} does not allow addresses that update the base register.
-If @samp{<} or @samp{>} constraint is also used, they are allowed and
-therefore on PowerPC targets in that case it is only safe
-to use @samp{m<>} in an @code{asm} statement if that @code{asm} statement
-accesses the operand exactly once.  The @code{asm} statement must also
-use @samp{%U@var{<opno>}} as a placeholder for the ``update'' flag in the
-corresponding load or store instruction.  For example:
-
-@smallexample
-asm ("st%U0 %1,%0" : "=m<>" (mem) : "r" (val));
-@end smallexample
-
-is correct but:
-
-@smallexample
-asm ("st %1,%0" : "=m<>" (mem) : "r" (val));
-@end smallexample
-
-is not.
-
-@item es
-A ``stable'' memory operand; that is, one which does not include any
-automodification of the base register.  This used to be useful when
-@samp{m} allowed automodification of the base register, but as those are now only
-allowed when @samp{<} or @samp{>} is used, @samp{es} is basically the same
-as @samp{m} without @samp{<} and @samp{>}.
-
-@item Q
-Memory operand that is an offset from a register (it is usually better
-to use @samp{m} or @samp{es} in @code{asm} statements)
-
-@item Z
-Memory operand that is an indexed or indirect from a register (it is
-usually better to use @samp{m} or @samp{es} in @code{asm} statements)
-
-@item R
-AIX TOC entry
-
-@item a
-Address operand that is an indexed or indirect from a register (@samp{p} is
-preferable for @code{asm} statements)
-
-@item S
-Constant suitable as a 64-bit mask operand
-
-@item T
-Constant suitable as a 32-bit mask operand
-
-@item U
-System V Release 4 small data area reference
-
-@item t
-AND masks that can be performed by two rldic@{l, r@} instructions
-
-@item W
-Vector constant that does not require memory
-
-@item j
-Vector constant that is all zeros.
-
-@end table
-
-@item x86 family---@file{config/i386/constraints.md}
-@table @code
-@item R
-Legacy register---the eight integer registers available on all
-i386 processors (@code{a}, @code{b}, @code{c}, @code{d},
-@code{si}, @code{di}, @code{bp}, @code{sp}).
-
-@item q
-Any register accessible as @code{@var{r}l}.  In 32-bit mode, @code{a},
-@code{b}, @code{c}, and @code{d}; in 64-bit mode, any integer register.
-
-@item Q
-Any register accessible as @code{@var{r}h}: @code{a}, @code{b},
-@code{c}, and @code{d}.
-
-@ifset INTERNALS
-@item l
-Any register that can be used as the index in a base+index memory
-access: that is, any general register except the stack pointer.
-@end ifset
-
-@item a
-The @code{a} register.
-
-@item b
-The @code{b} register.
-
-@item c
-The @code{c} register.
-
-@item d
-The @code{d} register.
-
-@item S
-The @code{si} register.
-
-@item D
-The @code{di} register.
-
-@item A
-The @code{a} and @code{d} registers.  This class is used for instructions
-that return double word results in the @code{ax:dx} register pair.  Single
-word values will be allocated either in @code{ax} or @code{dx}.
-For example on i386 the following implements @code{rdtsc}:
-
-@smallexample
-unsigned long long rdtsc (void)
-@{
-  unsigned long long tick;
-  __asm__ __volatile__("rdtsc":"=A"(tick));
-  return tick;
-@}
-@end smallexample
-
-This is not correct on x86-64 as it would allocate tick in either @code{ax}
-or @code{dx}.  You have to use the following variant instead:
-
-@smallexample
-unsigned long long rdtsc (void)
-@{
-  unsigned int tickl, tickh;
-  __asm__ __volatile__("rdtsc":"=a"(tickl),"=d"(tickh));
-  return ((unsigned long long)tickh << 32)|tickl;
-@}
-@end smallexample
-
-
-@item f
-Any 80387 floating-point (stack) register.
-
-@item t
-Top of 80387 floating-point stack (@code{%st(0)}).
-
-@item u
-Second from top of 80387 floating-point stack (@code{%st(1)}).
-
-@item y
-Any MMX register.
-
-@item x
-Any SSE register.
-
-@item Yz
-First SSE register (@code{%xmm0}).
-
-@ifset INTERNALS
-@item Y2
-Any SSE register, when SSE2 is enabled.
-
-@item Yi
-Any SSE register, when SSE2 and inter-unit moves are enabled.
-
-@item Ym
-Any MMX register, when inter-unit moves are enabled.
-@end ifset
-
-@item I
-Integer constant in the range 0 @dots{} 31, for 32-bit shifts.
-
-@item J
-Integer constant in the range 0 @dots{} 63, for 64-bit shifts.
-
-@item K
-Signed 8-bit integer constant.
-
-@item L
-@code{0xFF} or @code{0xFFFF}, for andsi as a zero-extending move.
-
-@item M
-0, 1, 2, or 3 (shifts for the @code{lea} instruction).
-
-@item N
-Unsigned 8-bit integer constant (for @code{in} and @code{out}
-instructions).
-
-@ifset INTERNALS
-@item O
-Integer constant in the range 0 @dots{} 127, for 128-bit shifts.
-@end ifset
-
-@item G
-Standard 80387 floating point constant.
-
-@item C
-Standard SSE floating point constant.
-
-@item e
-32-bit signed integer constant, or a symbolic reference known
-to fit that range (for immediate operands in sign-extending x86-64
-instructions).
-
-@item Z
-32-bit unsigned integer constant, or a symbolic reference known
-to fit that range (for immediate operands in zero-extending x86-64
-instructions).
-
-@end table
-
-@item Intel IA-64---@file{config/ia64/ia64.h}
-@table @code
-@item a
-General register @code{r0} to @code{r3} for @code{addl} instruction
-
-@item b
-Branch register
+@item b
+Branch register
 
 @item c
 Predicate register (@samp{c} as in ``conditional'')
 
-@item d
-Application register residing in M-unit
-
-@item e
-Application register residing in I-unit
-
-@item f
-Floating-point register
-
-@item m
-Memory operand.  If used together with @samp{<} or @samp{>},
-the operand can have postincrement and postdecrement which
-require printing with @samp{%Pn} on IA-64.
-
-@item G
-Floating-point constant 0.0 or 1.0
-
-@item I
-14-bit signed integer constant
-
-@item J
-22-bit signed integer constant
-
-@item K
-8-bit signed integer constant for logical instructions
-
-@item L
-8-bit adjusted signed integer constant for compare pseudo-ops
-
-@item M
-6-bit unsigned integer constant for shift counts
-
-@item N
-9-bit signed integer constant for load and store postincrements
-
-@item O
-The constant zero
-
-@item P
-0 or @minus{}1 for @code{dep} instruction
-
-@item Q
-Non-volatile memory for floating-point loads and stores
-
-@item R
-Integer constant in the range 1 to 4 for @code{shladd} instruction
-
-@item S
-Memory operand except postincrement and postdecrement.  This is
-now roughly the same as @samp{m} when not used together with @samp{<}
-or @samp{>}.
-@end table
-
-@item FRV---@file{config/frv/frv.h}
-@table @code
-@item a
-Register in the class @code{ACC_REGS} (@code{acc0} to @code{acc7}).
-
-@item b
-Register in the class @code{EVEN_ACC_REGS} (@code{acc0} to @code{acc7}).
-
-@item c
-Register in the class @code{CC_REGS} (@code{fcc0} to @code{fcc3} and
-@code{icc0} to @code{icc3}).
-
-@item d
-Register in the class @code{GPR_REGS} (@code{gr0} to @code{gr63}).
-
-@item e
-Register in the class @code{EVEN_REGS} (@code{gr0} to @code{gr63}).
-Odd registers are excluded not in the class but through the use of a machine
-mode larger than 4 bytes.
-
-@item f
-Register in the class @code{FPR_REGS} (@code{fr0} to @code{fr63}).
-
-@item h
-Register in the class @code{FEVEN_REGS} (@code{fr0} to @code{fr63}).
-Odd registers are excluded not in the class but through the use of a machine
-mode larger than 4 bytes.
-
-@item l
-Register in the class @code{LR_REG} (the @code{lr} register).
-
-@item q
-Register in the class @code{QUAD_REGS} (@code{gr2} to @code{gr63}).
-Register numbers not divisible by 4 are excluded not in the class but through
-the use of a machine mode larger than 8 bytes.
-
-@item t
-Register in the class @code{ICC_REGS} (@code{icc0} to @code{icc3}).
-
-@item u
-Register in the class @code{FCC_REGS} (@code{fcc0} to @code{fcc3}).
-
-@item v
-Register in the class @code{ICR_REGS} (@code{cc4} to @code{cc7}).
-
-@item w
-Register in the class @code{FCR_REGS} (@code{cc0} to @code{cc3}).
-
-@item x
-Register in the class @code{QUAD_FPR_REGS} (@code{fr0} to @code{fr63}).
-Register numbers not divisible by 4 are excluded not in the class but through
-the use of a machine mode larger than 8 bytes.
-
-@item z
-Register in the class @code{SPR_REGS} (@code{lcr} and @code{lr}).
-
-@item A
-Register in the class @code{QUAD_ACC_REGS} (@code{acc0} to @code{acc7}).
-
-@item B
-Register in the class @code{ACCG_REGS} (@code{accg0} to @code{accg7}).
-
-@item C
-Register in the class @code{CR_REGS} (@code{cc0} to @code{cc7}).
-
-@item G
-Floating point constant zero
-
-@item I
-6-bit signed integer constant
-
-@item J
-10-bit signed integer constant
-
-@item L
-16-bit signed integer constant
-
-@item M
-16-bit unsigned integer constant
-
-@item N
-12-bit signed integer constant that is negative---i.e.@: in the
-range of @minus{}2048 to @minus{}1
-
-@item O
-Constant zero
-
-@item P
-12-bit signed integer constant that is greater than zero---i.e.@: in the
-range of 1 to 2047.
-
-@end table
-
-@item Blackfin family---@file{config/bfin/constraints.md}
-@table @code
-@item a
-P register
-
-@item d
-D register
-
-@item z
-A call clobbered P register.
-
-@item q@var{n}
-A single register.  If @var{n} is in the range 0 to 7, the corresponding D
-register.  If it is @code{A}, then the register P0.
-
-@item D
-Even-numbered D register
-
-@item W
-Odd-numbered D register
-
-@item e
-Accumulator register.
-
-@item A
-Even-numbered accumulator register.
-
-@item B
-Odd-numbered accumulator register.
-
-@item b
-I register
-
-@item v
-B register
-
-@item f
-M register
-
-@item c
-Registers used for circular buffering, i.e. I, B, or L registers.
-
-@item C
-The CC register.
-
-@item t
-LT0 or LT1.
-
-@item k
-LC0 or LC1.
-
-@item u
-LB0 or LB1.
-
-@item x
-Any D, P, B, M, I or L register.
-
-@item y
-Additional registers typically used only in prologues and epilogues: RETS,
-RETN, RETI, RETX, RETE, ASTAT, SEQSTAT and USP.
-
-@item w
-Any register except accumulators or CC.
-
-@item Ksh
-Signed 16 bit integer (in the range @minus{}32768 to 32767)
-
-@item Kuh
-Unsigned 16 bit integer (in the range 0 to 65535)
-
-@item Ks7
-Signed 7 bit integer (in the range @minus{}64 to 63)
-
-@item Ku7
-Unsigned 7 bit integer (in the range 0 to 127)
-
-@item Ku5
-Unsigned 5 bit integer (in the range 0 to 31)
-
-@item Ks4
-Signed 4 bit integer (in the range @minus{}8 to 7)
-
-@item Ks3
-Signed 3 bit integer (in the range @minus{}3 to 4)
-
-@item Ku3
-Unsigned 3 bit integer (in the range 0 to 7)
+@item d
+Application register residing in M-unit
 
-@item P@var{n}
-Constant @var{n}, where @var{n} is a single-digit constant in the range 0 to 4.
+@item e
+Application register residing in I-unit
 
-@item PA
-An integer equal to one of the MACFLAG_XXX constants that is suitable for
-use with either accumulator.
+@item f
+Floating-point register
 
-@item PB
-An integer equal to one of the MACFLAG_XXX constants that is suitable for
-use only with accumulator A1.
+@item m
+Memory operand.  If used together with @samp{<} or @samp{>},
+the operand can have postincrement and postdecrement which
+require printing with @samp{%Pn} on IA-64.
 
-@item M1
-Constant 255.
+@item G
+Floating-point constant 0.0 or 1.0
 
-@item M2
-Constant 65535.
+@item I
+14-bit signed integer constant
 
 @item J
-An integer constant with exactly a single bit set.
+22-bit signed integer constant
+
+@item K
+8-bit signed integer constant for logical instructions
 
 @item L
-An integer constant with all bits set except exactly one.
+8-bit adjusted signed integer constant for compare pseudo-ops
 
-@item H
+@item M
+6-bit unsigned integer constant for shift counts
+
+@item N
+9-bit signed integer constant for load and store postincrements
+
+@item O
+The constant zero
+
+@item P
+0 or @minus{}1 for @code{dep} instruction
 
 @item Q
-Any SYMBOL_REF.
+Non-volatile memory for floating-point loads and stores
+
+@item R
+Integer constant in the range 1 to 4 for @code{shladd} instruction
+
+@item S
+Memory operand except postincrement and postdecrement.  This is
+now roughly the same as @samp{m} when not used together with @samp{<}
+or @samp{>}.
 @end table
 
 @item M32C---@file{config/m32c/m32c.c}
@@ -3316,33 +2980,232 @@ Floating point constant 0.
 @item I
 An integer constant that fits in 16 bits.
 
-@item J
-An integer constant whose low order 16 bits are zero.
+@item J
+An integer constant whose low order 16 bits are zero.
+
+@item K
+An integer constant that does not meet the constraints for codes
+@samp{I} or @samp{J}.
+
+@item L
+The integer constant 1.
+
+@item M
+The integer constant @minus{}1.
+
+@item N
+The integer constant 0.
+
+@item O
+Integer constants @minus{}4 through @minus{}1 and 1 through 4; shifts by these
+amounts are handled as multiple single-bit shifts rather than a single
+variable-length shift.
+
+@item Q
+A memory reference which requires an additional word (address or
+offset) after the opcode.
+
+@item R
+A memory reference that is encoded within the opcode.
+
+@end table
+
+@item PowerPC and IBM RS6000---@file{config/rs6000/constraints.md}
+@table @code
+@item b
+Address base register
+
+@item d
+Floating point register (containing 64-bit value)
+
+@item f
+Floating point register (containing 32-bit value)
+
+@item v
+Altivec vector register
+
+@item wa
+Any VSX register if the @option{-mvsx} option was used or NO_REGS.
+
+@item wd
+VSX vector register to hold vector double data or NO_REGS.
+
+@item wf
+VSX vector register to hold vector float data or NO_REGS.
+
+@item wg
+If @option{-mmfpgpr} was used, a floating point register or NO_REGS.
+
+@item wh
+Floating point register if direct moves are available, or NO_REGS.
+
+@item wi
+FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS.
+
+@item wj
+FP or VSX register to hold 64-bit integers for direct moves or NO_REGS.
+
+@item wk
+FP or VSX register to hold 64-bit doubles for direct moves or NO_REGS.
+
+@item wl
+Floating point register if the LFIWAX instruction is enabled or NO_REGS.
+
+@item wm
+VSX register if direct move instructions are enabled, or NO_REGS.
+
+@item wn
+No register (NO_REGS).
+
+@item wr
+General purpose register if 64-bit instructions are enabled or NO_REGS.
+
+@item ws
+VSX vector register to hold scalar double values or NO_REGS.
+
+@item wt
+VSX vector register to hold 128 bit integer or NO_REGS.
+
+@item wu
+Altivec register to use for float/32-bit int loads/stores or NO_REGS.
+
+@item wv
+Altivec register to use for double loads/stores or NO_REGS.
+
+@item ww
+FP or VSX register to perform float operations under @option{-mvsx} or NO_REGS.
+
+@item wx
+Floating point register if the STFIWX instruction is enabled or NO_REGS.
+
+@item wy
+FP or VSX register to perform ISA 2.07 float ops or NO_REGS.
+
+@item wz
+Floating point register if the LFIWZX instruction is enabled or NO_REGS.
+
+@item wD
+Int constant that is the element number of the 64-bit scalar in a vector.
+
+@item wQ
+A memory address that will work with the @code{lq} and @code{stq}
+instructions.
+
+@item h
+@samp{MQ}, @samp{CTR}, or @samp{LINK} register
+
+@item q
+@samp{MQ} register
+
+@item c
+@samp{CTR} register
+
+@item l
+@samp{LINK} register
+
+@item x
+@samp{CR} register (condition register) number 0
+
+@item y
+@samp{CR} register (condition register)
+
+@item z
+@samp{XER[CA]} carry bit (part of the XER register)
+
+@item I
+Signed 16-bit constant
+
+@item J
+Unsigned 16-bit constant shifted left 16 bits (use @samp{L} instead for
+@code{SImode} constants)
+
+@item K
+Unsigned 16-bit constant
+
+@item L
+Signed 16-bit constant shifted left 16 bits
+
+@item M
+Constant larger than 31
+
+@item N
+Exact power of 2
+
+@item O
+Zero
+
+@item P
+Constant whose negation is a signed 16-bit constant
+
+@item G
+Floating point constant that can be loaded into a register with one
+instruction per word
+
+@item H
+Integer/Floating point constant that can be loaded into a register using
+three instructions
+
+@item m
+Memory operand.
+Normally, @code{m} does not allow addresses that update the base register.
+If the @samp{<} or @samp{>} constraint is also used, such addresses are
+allowed; in that case, on PowerPC targets, it is only safe
+to use @samp{m<>} in an @code{asm} statement if that @code{asm} statement
+accesses the operand exactly once.  The @code{asm} statement must also
+use @samp{%U@var{<opno>}} as a placeholder for the ``update'' flag in the
+corresponding load or store instruction.  For example:
+
+@smallexample
+asm ("st%U0 %1,%0" : "=m<>" (mem) : "r" (val));
+@end smallexample
+
+is correct but:
+
+@smallexample
+asm ("st %1,%0" : "=m<>" (mem) : "r" (val));
+@end smallexample
+
+is not.
+
+@item es
+A ``stable'' memory operand; that is, one which does not include any
+automodification of the base register.  This used to be useful when
+@samp{m} allowed automodification of the base register, but as that is now only
+allowed when @samp{<} or @samp{>} is used, @samp{es} is basically the same
+as @samp{m} without @samp{<} and @samp{>}.
+
+@item Q
+Memory operand that is an offset from a register (it is usually better
+to use @samp{m} or @samp{es} in @code{asm} statements)
+
+@item Z
+Memory operand that is an indexed or indirect access from a register (it is
+usually better to use @samp{m} or @samp{es} in @code{asm} statements)
+
+@item R
+AIX TOC entry
 
-@item K
-An integer constant that does not meet the constraints for codes
-@samp{I} or @samp{J}.
+@item a
+Address operand that is an indexed or indirect from a register (@samp{p} is
+preferable for @code{asm} statements)
 
-@item L
-The integer constant 1.
+@item S
+Constant suitable as a 64-bit mask operand
 
-@item M
-The integer constant @minus{}1.
+@item T
+Constant suitable as a 32-bit mask operand
 
-@item N
-The integer constant 0.
+@item U
+System V Release 4 small data area reference
 
-@item O
-Integer constants @minus{}4 through @minus{}1 and 1 through 4; shifts by these
-amounts are handled as multiple single-bit shifts rather than a single
-variable-length shift.
+@item t
+AND masks that can be performed by two rldic@{l, r@} instructions
 
-@item Q
-A memory reference which requires an additional word (address or
-offset) after the opcode.
+@item W
+Vector constant that does not require memory
 
-@item R
-A memory reference that is encoded within the opcode.
+@item j
+Vector constant that is all zeros.
 
 @end table
 
@@ -3462,6 +3325,79 @@ A constant in the range 0 to 15, inclusive.
 
 @end table
 
+@item S/390 and zSeries---@file{config/s390/s390.h}
+@table @code
+@item a
+Address register (general purpose register except r0)
+
+@item c
+Condition code register
+
+@item d
+Data register (arbitrary general purpose register)
+
+@item f
+Floating-point register
+
+@item I
+Unsigned 8-bit constant (0--255)
+
+@item J
+Unsigned 12-bit constant (0--4095)
+
+@item K
+Signed 16-bit constant (@minus{}32768--32767)
+
+@item L
+Value appropriate as displacement.
+@table @code
+@item (0..4095)
+for short displacement
+@item (@minus{}524288..524287)
+for long displacement
+@end table
+
+@item M
+Constant integer with a value of 0x7fffffff.
+
+@item N
+Multiple letter constraint followed by 4 parameter letters.
+@table @code
+@item 0..9:
+number of the part counting from most to least significant
+@item H,Q:
+mode of the part
+@item D,S,H:
+mode of the containing operand
+@item 0,F:
+value of the other parts (F---all bits set)
+@end table
+The constraint matches if the specified part of a constant
+has a value different from its other parts.
+
+@item Q
+Memory reference without index register and with short displacement.
+
+@item R
+Memory reference with index register and short displacement.
+
+@item S
+Memory reference without index register but with long displacement.
+
+@item T
+Memory reference with index register and long displacement.
+
+@item U
+Pointer with short displacement.
+
+@item W
+Pointer with long displacement.
+
+@item Y
+Shift count operand.
+
+@end table
+
 @need 1000
 @item SPARC---@file{config/sparc/sparc.h}
 @table @code
@@ -3581,199 +3517,56 @@ An immediate which can be loaded with @code{fsmbi}.
 @item A
 An immediate which can be loaded with the il/ila/ilh/ilhu instructions.  const_int is treated as a 32 bit value.
 
-@item B
-An immediate for most arithmetic instructions.  const_int is treated as a 32 bit value.
-
-@item C
-An immediate for and/xor/or instructions.  const_int is treated as a 32 bit value.
-
-@item D
-An immediate for the @code{iohl} instruction.  const_int is treated as a 32 bit value.
-
-@item I
-A constant in the range [@minus{}64, 63] for shift/rotate instructions.
-
-@item J
-An unsigned 7-bit constant for conversion/nop/channel instructions.
-
-@item K
-A signed 10-bit constant for most arithmetic instructions.
-
-@item M
-A signed 16 bit immediate for @code{stop}.
-
-@item N
-An unsigned 16-bit constant for @code{iohl} and @code{fsmbi}.
-
-@item O
-An unsigned 7-bit constant whose 3 least significant bits are 0.
-
-@item P
-An unsigned 3-bit constant for 16-byte rotates and shifts
-
-@item R
-Call operand, reg, for indirect calls
-
-@item S
-Call operand, symbol, for relative calls.
-
-@item T
-Call operand, const_int, for absolute calls.
-
-@item U
-An immediate which can be loaded with the il/ila/ilh/ilhu instructions.  const_int is sign extended to 128 bit.
-
-@item W
-An immediate for shift and rotate instructions.  const_int is treated as a 32 bit value.
-
-@item Y
-An immediate for and/xor/or instructions.  const_int is sign extended as a 128 bit.
-
-@item Z
-An immediate for the @code{iohl} instruction.  const_int is sign extended to 128 bit.
-
-@end table
-
-@item S/390 and zSeries---@file{config/s390/s390.h}
-@table @code
-@item a
-Address register (general purpose register except r0)
-
-@item c
-Condition code register
-
-@item d
-Data register (arbitrary general purpose register)
-
-@item f
-Floating-point register
-
-@item I
-Unsigned 8-bit constant (0--255)
-
-@item J
-Unsigned 12-bit constant (0--4095)
-
-@item K
-Signed 16-bit constant (@minus{}32768--32767)
-
-@item L
-Value appropriate as displacement.
-@table @code
-@item (0..4095)
-for short displacement
-@item (@minus{}524288..524287)
-for long displacement
-@end table
-
-@item M
-Constant integer with a value of 0x7fffffff.
-
-@item N
-Multiple letter constraint followed by 4 parameter letters.
-@table @code
-@item 0..9:
-number of the part counting from most to least significant
-@item H,Q:
-mode of the part
-@item D,S,H:
-mode of the containing operand
-@item 0,F:
-value of the other parts (F---all bits set)
-@end table
-The constraint matches if the specified part of a constant
-has a value different from its other parts.
-
-@item Q
-Memory reference without index register and with short displacement.
-
-@item R
-Memory reference with index register and short displacement.
-
-@item S
-Memory reference without index register but with long displacement.
-
-@item T
-Memory reference with index register and long displacement.
-
-@item U
-Pointer with short displacement.
-
-@item W
-Pointer with long displacement.
-
-@item Y
-Shift count operand.
-
-@end table
-
-@item Xstormy16---@file{config/stormy16/stormy16.h}
-@table @code
-@item a
-Register r0.
-
-@item b
-Register r1.
-
-@item c
-Register r2.
-
-@item d
-Register r8.
-
-@item e
-Registers r0 through r7.
-
-@item t
-Registers r0 and r1.
+@item B
+An immediate for most arithmetic instructions.  const_int is treated as a 32 bit value.
 
-@item y
-The carry register.
+@item C
+An immediate for and/xor/or instructions.  const_int is treated as a 32 bit value.
 
-@item z
-Registers r8 and r9.
+@item D
+An immediate for the @code{iohl} instruction.  const_int is treated as a 32 bit value.
 
 @item I
-A constant between 0 and 3 inclusive.
+A constant in the range [@minus{}64, 63] for shift/rotate instructions.
 
 @item J
-A constant that has exactly one bit set.
+An unsigned 7-bit constant for conversion/nop/channel instructions.
 
 @item K
-A constant that has exactly one bit clear.
-
-@item L
-A constant between 0 and 255 inclusive.
+A signed 10-bit constant for most arithmetic instructions.
 
 @item M
-A constant between @minus{}255 and 0 inclusive.
+A signed 16 bit immediate for @code{stop}.
 
 @item N
-A constant between @minus{}3 and 0 inclusive.
+An unsigned 16-bit constant for @code{iohl} and @code{fsmbi}.
 
 @item O
-A constant between 1 and 4 inclusive.
+An unsigned 7-bit constant whose 3 least significant bits are 0.
 
 @item P
-A constant between @minus{}4 and @minus{}1 inclusive.
-
-@item Q
-A memory reference that is a stack push.
+An unsigned 3-bit constant for 16-byte rotates and shifts
 
 @item R
-A memory reference that is a stack pop.
+Call operand, reg, for indirect calls
 
 @item S
-A memory reference that refers to a constant address of known value.
+Call operand, symbol, for relative calls.
 
 @item T
-The register indicated by Rx (not implemented yet).
+Call operand, const_int, for absolute calls.
 
 @item U
-A constant that is not between 2 and 15 inclusive.
+An immediate which can be loaded with the il/ila/ilh/ilhu instructions.  const_int is sign extended to 128 bit.
+
+@item W
+An immediate for shift and rotate instructions.  const_int is treated as a 32 bit value.
+
+@item Y
+An immediate for and/xor/or instructions.  const_int is sign extended to 128 bit.
 
 @item Z
-The constant 0.
+An immediate for the @code{iohl} instruction.  const_int is sign extended to 128 bit.
 
 @end table
 
@@ -4058,6 +3851,214 @@ Integer constant 0
 Integer constant 32
 @end table
 
+@item x86 family---@file{config/i386/constraints.md}
+@table @code
+@item R
+Legacy register---the eight integer registers available on all
+i386 processors (@code{a}, @code{b}, @code{c}, @code{d},
+@code{si}, @code{di}, @code{bp}, @code{sp}).
+
+@item q
+Any register accessible as @code{@var{r}l}.  In 32-bit mode, @code{a},
+@code{b}, @code{c}, and @code{d}; in 64-bit mode, any integer register.
+
+@item Q
+Any register accessible as @code{@var{r}h}: @code{a}, @code{b},
+@code{c}, and @code{d}.
+
+@ifset INTERNALS
+@item l
+Any register that can be used as the index in a base+index memory
+access: that is, any general register except the stack pointer.
+@end ifset
+
+@item a
+The @code{a} register.
+
+@item b
+The @code{b} register.
+
+@item c
+The @code{c} register.
+
+@item d
+The @code{d} register.
+
+@item S
+The @code{si} register.
+
+@item D
+The @code{di} register.
+
+@item A
+The @code{a} and @code{d} registers.  This class is used for instructions
+that return double word results in the @code{ax:dx} register pair.  Single
+word values will be allocated either in @code{ax} or @code{dx}.
+For example on i386 the following implements @code{rdtsc}:
+
+@smallexample
+unsigned long long rdtsc (void)
+@{
+  unsigned long long tick;
+  __asm__ __volatile__("rdtsc":"=A"(tick));
+  return tick;
+@}
+@end smallexample
+
+This is not correct on x86-64 as it would allocate tick in either @code{ax}
+or @code{dx}.  You have to use the following variant instead:
+
+@smallexample
+unsigned long long rdtsc (void)
+@{
+  unsigned int tickl, tickh;
+  __asm__ __volatile__("rdtsc":"=a"(tickl),"=d"(tickh));
+  return ((unsigned long long)tickh << 32)|tickl;
+@}
+@end smallexample
+
+
+@item f
+Any 80387 floating-point (stack) register.
+
+@item t
+Top of 80387 floating-point stack (@code{%st(0)}).
+
+@item u
+Second from top of 80387 floating-point stack (@code{%st(1)}).
+
+@item y
+Any MMX register.
+
+@item x
+Any SSE register.
+
+@item Yz
+First SSE register (@code{%xmm0}).
+
+@ifset INTERNALS
+@item Y2
+Any SSE register, when SSE2 is enabled.
+
+@item Yi
+Any SSE register, when SSE2 and inter-unit moves are enabled.
+
+@item Ym
+Any MMX register, when inter-unit moves are enabled.
+@end ifset
+
+@item I
+Integer constant in the range 0 @dots{} 31, for 32-bit shifts.
+
+@item J
+Integer constant in the range 0 @dots{} 63, for 64-bit shifts.
+
+@item K
+Signed 8-bit integer constant.
+
+@item L
+@code{0xFF} or @code{0xFFFF}, for andsi as a zero-extending move.
+
+@item M
+0, 1, 2, or 3 (shifts for the @code{lea} instruction).
+
+@item N
+Unsigned 8-bit integer constant (for @code{in} and @code{out}
+instructions).
+
+@ifset INTERNALS
+@item O
+Integer constant in the range 0 @dots{} 127, for 128-bit shifts.
+@end ifset
+
+@item G
+Standard 80387 floating point constant.
+
+@item C
+Standard SSE floating point constant.
+
+@item e
+32-bit signed integer constant, or a symbolic reference known
+to fit that range (for immediate operands in sign-extending x86-64
+instructions).
+
+@item Z
+32-bit unsigned integer constant, or a symbolic reference known
+to fit that range (for immediate operands in zero-extending x86-64
+instructions).
+
+@end table
+
+@item Xstormy16---@file{config/stormy16/stormy16.h}
+@table @code
+@item a
+Register r0.
+
+@item b
+Register r1.
+
+@item c
+Register r2.
+
+@item d
+Register r8.
+
+@item e
+Registers r0 through r7.
+
+@item t
+Registers r0 and r1.
+
+@item y
+The carry register.
+
+@item z
+Registers r8 and r9.
+
+@item I
+A constant between 0 and 3 inclusive.
+
+@item J
+A constant that has exactly one bit set.
+
+@item K
+A constant that has exactly one bit clear.
+
+@item L
+A constant between 0 and 255 inclusive.
+
+@item M
+A constant between @minus{}255 and 0 inclusive.
+
+@item N
+A constant between @minus{}3 and 0 inclusive.
+
+@item O
+A constant between 1 and 4 inclusive.
+
+@item P
+A constant between @minus{}4 and @minus{}1 inclusive.
+
+@item Q
+A memory reference that is a stack push.
+
+@item R
+A memory reference that is a stack pop.
+
+@item S
+A memory reference that refers to a constant address of known value.
+
+@item T
+The register indicated by Rx (not implemented yet).
+
+@item U
+A constant that is not between 2 and 15 inclusive.
+
+@item Z
+The constant 0.
+
+@end table
+
 @item Xtensa---@file{config/xtensa/constraints.md}
 @table @code
 @item a
-- 
2.7.4