From: Richard Sandiford <richard.sandiford@arm.com>
Date: Wed, 29 Jan 2020 16:06:58 +0000 (+0000)
Subject: aarch64: Add svbfloat16_t support to arm_sve.h
X-Git-Tag: upstream/12.2.0~18653
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=02fcd8ac408be56d2a6e67e2e09b26532862f233;p=platform%2Fupstream%2Fgcc.git

aarch64: Add svbfloat16_t support to arm_sve.h

This patch adds support for the bfloat16-related vectors to
arm_sve.h.  It also adds support for functions that just treat
bfloat16_t as a bag of 16 bits; these functions are available
for bf16 whenever they're available for other 16-bit types.

Previously "all_data" was used for both data movement and for arithmetic
that happened to be defined for all data types.  Adding bf16 means we
need to distinguish between the two cases.

The patch also reorders the mode definitions in aarch64-modes.def,
which means we no longer need separate VECTOR_MODE entries for BF
vectors.

2020-01-31  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* config/aarch64/arm_sve.h: Include arm_bf16.h.
	* config/aarch64/aarch64-modes.def (BF): Move definition before
	VECTOR_MODES.  Remove separate VECTOR_MODES for V4BF and V8BF.
	(SVE_MODES): Handle BF modes.
	* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Handle
	BF modes.
	(aarch64_full_sve_mode): Likewise.
	* config/aarch64/iterators.md (SVE_STRUCT): Add VNx16BF, VNx24BF
	and VNx32BF.
	(SVE_FULL, SVE_FULL_HSD, SVE_ALL): Add VNx8BF.
	(Vetype, Vesize, Vctype, VEL, Vel, VEL_INT, V128, v128, vwcore)
	(V_INT_EQUIV, v_int_equiv, V_FP_EQUIV, v_fp_equiv, vector_count)
	(insn_length, VSINGLE, vsingle, VPRED, vpred, VDOUBLE): Handle the
	new SVE BF modes.
	* config/aarch64/aarch64-sve-builtins.h (TYPE_bfloat): New
	type_class_index.
	* config/aarch64/aarch64-sve-builtins.cc (TYPES_all_arith): New macro.
	(TYPES_all_data): Add bf16.
	(TYPES_reinterpret1, TYPES_reinterpret): Likewise.
	(register_tuple_type): Increase buffer size.
	* config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): New type.
	(bf16): New type suffix.
	* config/aarch64/aarch64-sve-builtins-base.def (svabd, svadd, svaddv)
	(svcmpeq, svcmpge, svcmpgt, svcmple, svcmplt, svcmpne, svmad, svmax)
	(svmaxv, svmin, svminv, svmla, svmls, svmsb, svmul, svsub, svsubr):
	Change type from all_data to all_arith.
	* config/aarch64/aarch64-sve-builtins-sve2.def (svaddp, svmaxp)
	(svminp): Likewise.

gcc/testsuite/
	* g++.target/aarch64/sve/acle/general-c++/mangle_1.C: Test mangling
	of svbfloat16_t.
	* g++.target/aarch64/sve/acle/general-c++/mangle_2.C: Likewise for
	__SVBfloat16_t.
	* gcc.target/aarch64/sve/acle/asm/clasta_bf16.c: New test.
	* gcc.target/aarch64/sve/acle/asm/clastb_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/cnt_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/create2_1.c (create_bf16): Likewise.
	* gcc.target/aarch64/sve/acle/asm/create3_1.c (create_bf16): Likewise.
	* gcc.target/aarch64/sve/acle/asm/create4_1.c (create_bf16): Likewise.
	* gcc.target/aarch64/sve/acle/asm/dup_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/dup_lane_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/dupq_lane_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ext_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/get2_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/get3_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/get4_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/insr_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lasta_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lastb_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1rq_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld2_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld3_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld4_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnt1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/len_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c
	(reinterpret_f16_bf16_tied1, reinterpret_f16_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c
	(reinterpret_f32_bf16_tied1, reinterpret_f32_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c
	(reinterpret_f64_bf16_tied1, reinterpret_f64_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c
	(reinterpret_s16_bf16_tied1, reinterpret_s16_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c
	(reinterpret_s32_bf16_tied1, reinterpret_s32_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c
	(reinterpret_s64_bf16_tied1, reinterpret_s64_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c
	(reinterpret_s8_bf16_tied1, reinterpret_s8_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c
	(reinterpret_u16_bf16_tied1, reinterpret_u16_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c
	(reinterpret_u32_bf16_tied1, reinterpret_u32_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c
	(reinterpret_u64_bf16_tied1, reinterpret_u64_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c
	(reinterpret_u8_bf16_tied1, reinterpret_u8_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/rev_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/sel_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/set2_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/set3_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/set4_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/splice_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st2_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st3_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st4_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/stnt1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tbl_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn1q_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn2_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn2q_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/undef2_1.c (bfloat16_t): Likewise.
	* gcc.target/aarch64/sve/acle/asm/undef3_1.c (bfloat16_t): Likewise.
	* gcc.target/aarch64/sve/acle/asm/undef4_1.c (bfloat16_t): Likewise.
	* gcc.target/aarch64/sve/acle/asm/undef_1.c (bfloat16_t): Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp1q_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp2_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp2q_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip1q_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip2_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip2q_bf16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_1.c (ret_bf16, ret_bf16x2)
	(ret_bf16x3, ret_bf16x4): Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_2.c (fn_bf16, fn_bf16x2)
	(fn_bf16x3, fn_bf16x4): Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_3.c (fn_bf16, fn_bf16x2)
	(fn_bf16x3, fn_bf16x4): Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_4.c (fn_bf16, fn_bf16x2)
	(fn_bf16x3, fn_bf16x4): Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_5.c (fn_bf16, fn_bf16x2)
	(fn_bf16x3, fn_bf16x4): Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_6.c (fn_bf16, fn_bf16x2)
	(fn_bf16x3, fn_bf16x4): Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_7.c (fn_bf16, fn_bf16x2)
	(fn_bf16x3, fn_bf16x4): Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_bf16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_bf16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_bf16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_bf16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/gnu_vectors_1.c (bfloat16x16_t): New
	typedef.
	(bfloat16_callee, bfloat16_caller): New tests.
	* gcc.target/aarch64/sve/pcs/gnu_vectors_2.c (bfloat16x16_t): New
	typedef.
	(bfloat16_callee, bfloat16_caller): New tests.
	* gcc.target/aarch64/sve/pcs/return_4.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_4_128.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_4_256.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_4_512.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_4_1024.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_4_2048.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_5.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_5_128.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_5_256.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_5_512.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_5_1024.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_5_2048.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_6.c (bfloat16_t): New typedef.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_6_128.c (bfloat16_t): New typedef.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_6_256.c (bfloat16_t): New typedef.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_6_512.c (bfloat16_t): New typedef.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_6_1024.c (bfloat16_t): New typedef.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_6_2048.c (bfloat16_t): New typedef.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_7.c (callee_bf16): Likewise
	(caller_bf16): Likewise.
	* gcc.target/aarch64/sve/pcs/return_8.c (callee_bf16): Likewise
	(caller_bf16): Likewise.
	* gcc.target/aarch64/sve/pcs/return_9.c (callee_bf16): Likewise
	(caller_bf16): Likewise.
	* gcc.target/aarch64/sve2/acle/asm/tbl2_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/tbx_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/whilerw_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/whilewr_bf16.c: Likewise.
---

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index c45b1b6..d10ae92 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,34 @@
+2020-01-31  Richard Sandiford  <richard.sandiford@arm.com>
+
+	* config/aarch64/arm_sve.h: Include arm_bf16.h.
+	* config/aarch64/aarch64-modes.def (BF): Move definition before
+	VECTOR_MODES.  Remove separate VECTOR_MODES for V4BF and V8BF.
+	(SVE_MODES): Handle BF modes.
+	* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Handle
+	BF modes.
+	(aarch64_full_sve_mode): Likewise.
+	* config/aarch64/iterators.md (SVE_STRUCT): Add VNx16BF, VNx24BF
+	and VNx32BF.
+	(SVE_FULL, SVE_FULL_HSD, SVE_ALL): Add VNx8BF.
+	(Vetype, Vesize, Vctype, VEL, Vel, VEL_INT, V128, v128, vwcore)
+	(V_INT_EQUIV, v_int_equiv, V_FP_EQUIV, v_fp_equiv, vector_count)
+	(insn_length, VSINGLE, vsingle, VPRED, vpred, VDOUBLE): Handle the
+	new SVE BF modes.
+	* config/aarch64/aarch64-sve-builtins.h (TYPE_bfloat): New
+	type_class_index.
+	* config/aarch64/aarch64-sve-builtins.cc (TYPES_all_arith): New macro.
+	(TYPES_all_data): Add bf16.
+	(TYPES_reinterpret1, TYPES_reinterpret): Likewise.
+	(register_tuple_type): Increase buffer size.
+	* config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): New type.
+	(bf16): New type suffix.
+	* config/aarch64/aarch64-sve-builtins-base.def (svabd, svadd, svaddv)
+	(svcmpeq, svcmpge, svcmpgt, svcmple, svcmplt, svcmpne, svmad, svmax)
+	(svmaxv, svmin, svminv, svmla, svmls, svmsb, svmul, svsub, svsubr):
+	Change type from all_data to all_arith.
+	* config/aarch64/aarch64-sve-builtins-sve2.def (svaddp, svmaxp)
+	(svminp): Likewise.
+
 2020-01-31  Dennis Zhang  <dennis.zhang@arm.com>
 	    Matthew Malcomson  <matthew.malcomson@arm.com>
 	    Richard Sandiford  <richard.sandiford@arm.com>
diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index 1eeb8d8..af972e8 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -62,6 +62,10 @@ ADJUST_ALIGNMENT (VNx8BI, 2);
 ADJUST_ALIGNMENT (VNx4BI, 2);
 ADJUST_ALIGNMENT (VNx2BI, 2);
 
+/* Bfloat16 modes.  */
+FLOAT_MODE (BF, 2, 0);
+ADJUST_FLOAT_FORMAT (BF, &arm_bfloat_half_format);
+
 VECTOR_MODES (INT, 8);        /*       V8QI V4HI V2SI.  */
 VECTOR_MODES (INT, 16);       /* V16QI V8HI V4SI V2DI.  */
 VECTOR_MODES (FLOAT, 8);      /*                 V2SF.  */
@@ -69,13 +73,6 @@ VECTOR_MODES (FLOAT, 16);     /*            V4SF V2DF.  */
 VECTOR_MODE (FLOAT, DF, 1);   /*                 V1DF.  */
 VECTOR_MODE (FLOAT, HF, 2);   /*                 V2HF.  */
 
-/* Bfloat16 modes.  */
-FLOAT_MODE (BF, 2, 0);
-ADJUST_FLOAT_FORMAT (BF, &arm_bfloat_half_format);
-
-VECTOR_MODE (FLOAT, BF, 4);   /*		 V4BF.  */
-VECTOR_MODE (FLOAT, BF, 8);   /*		 V8BF.  */
-
 /* Oct Int: 256-bit integer mode needed for 32-byte vector arguments.  */
 INT_MODE (OI, 32);
 
@@ -96,6 +93,7 @@ INT_MODE (XI, 64);
   ADJUST_NUNITS (VH##HI, aarch64_sve_vg * NVECS * 4); \
   ADJUST_NUNITS (VS##SI, aarch64_sve_vg * NVECS * 2); \
   ADJUST_NUNITS (VD##DI, aarch64_sve_vg * NVECS); \
+  ADJUST_NUNITS (VH##BF, aarch64_sve_vg * NVECS * 4); \
   ADJUST_NUNITS (VH##HF, aarch64_sve_vg * NVECS * 4); \
   ADJUST_NUNITS (VS##SF, aarch64_sve_vg * NVECS * 2); \
   ADJUST_NUNITS (VD##DF, aarch64_sve_vg * NVECS); \
@@ -104,6 +102,7 @@ INT_MODE (XI, 64);
   ADJUST_ALIGNMENT (VH##HI, 16); \
   ADJUST_ALIGNMENT (VS##SI, 16); \
   ADJUST_ALIGNMENT (VD##DI, 16); \
+  ADJUST_ALIGNMENT (VH##BF, 16); \
   ADJUST_ALIGNMENT (VH##HF, 16); \
   ADJUST_ALIGNMENT (VS##SF, 16); \
   ADJUST_ALIGNMENT (VD##DF, 16);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.def b/gcc/config/aarch64/aarch64-sve-builtins-base.def
index c0efe05..332555b 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.def
@@ -18,15 +18,15 @@
    <http://www.gnu.org/licenses/>.  */
 
 #define REQUIRED_EXTENSIONS 0
-DEF_SVE_FUNCTION (svabd, binary_opt_n, all_data, mxz)
+DEF_SVE_FUNCTION (svabd, binary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svabs, unary, all_float_and_signed, mxz)
 DEF_SVE_FUNCTION (svacge, compare_opt_n, all_float, implicit)
 DEF_SVE_FUNCTION (svacgt, compare_opt_n, all_float, implicit)
 DEF_SVE_FUNCTION (svacle, compare_opt_n, all_float, implicit)
 DEF_SVE_FUNCTION (svaclt, compare_opt_n, all_float, implicit)
-DEF_SVE_FUNCTION (svadd, binary_opt_n, all_data, mxz)
+DEF_SVE_FUNCTION (svadd, binary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svadda, fold_left, all_float, implicit)
-DEF_SVE_FUNCTION (svaddv, reduction_wide, all_data, implicit)
+DEF_SVE_FUNCTION (svaddv, reduction_wide, all_arith, implicit)
 DEF_SVE_FUNCTION (svadrb, adr_offset, none, none)
 DEF_SVE_FUNCTION (svadrd, adr_index, none, none)
 DEF_SVE_FUNCTION (svadrh, adr_index, none, none)
@@ -51,17 +51,17 @@ DEF_SVE_FUNCTION (svcls, unary_to_uint, all_signed, mxz)
 DEF_SVE_FUNCTION (svclz, unary_to_uint, all_integer, mxz)
 DEF_SVE_FUNCTION (svcmla, ternary_rotate, all_float, mxz)
 DEF_SVE_FUNCTION (svcmla_lane, ternary_lane_rotate, hs_float, none)
-DEF_SVE_FUNCTION (svcmpeq, compare_opt_n, all_data, implicit)
+DEF_SVE_FUNCTION (svcmpeq, compare_opt_n, all_arith, implicit)
 DEF_SVE_FUNCTION (svcmpeq_wide, compare_wide_opt_n, bhs_signed, implicit)
-DEF_SVE_FUNCTION (svcmpge, compare_opt_n, all_data, implicit)
+DEF_SVE_FUNCTION (svcmpge, compare_opt_n, all_arith, implicit)
 DEF_SVE_FUNCTION (svcmpge_wide, compare_wide_opt_n, bhs_integer, implicit)
-DEF_SVE_FUNCTION (svcmpgt, compare_opt_n, all_data, implicit)
+DEF_SVE_FUNCTION (svcmpgt, compare_opt_n, all_arith, implicit)
 DEF_SVE_FUNCTION (svcmpgt_wide, compare_wide_opt_n, bhs_integer, implicit)
-DEF_SVE_FUNCTION (svcmple, compare_opt_n, all_data, implicit)
+DEF_SVE_FUNCTION (svcmple, compare_opt_n, all_arith, implicit)
 DEF_SVE_FUNCTION (svcmple_wide, compare_wide_opt_n, bhs_integer, implicit)
-DEF_SVE_FUNCTION (svcmplt, compare_opt_n, all_data, implicit)
+DEF_SVE_FUNCTION (svcmplt, compare_opt_n, all_arith, implicit)
 DEF_SVE_FUNCTION (svcmplt_wide, compare_wide_opt_n, bhs_integer, implicit)
-DEF_SVE_FUNCTION (svcmpne, compare_opt_n, all_data, implicit)
+DEF_SVE_FUNCTION (svcmpne, compare_opt_n, all_arith, implicit)
 DEF_SVE_FUNCTION (svcmpne_wide, compare_wide_opt_n, bhs_signed, implicit)
 DEF_SVE_FUNCTION (svcmpuo, compare_opt_n, all_float, implicit)
 DEF_SVE_FUNCTION (svcnot, unary, all_integer, mxz)
@@ -160,23 +160,23 @@ DEF_SVE_FUNCTION (svlsl, binary_uint_opt_n, all_integer, mxz)
 DEF_SVE_FUNCTION (svlsl_wide, binary_uint64_opt_n, bhs_integer, mxz)
 DEF_SVE_FUNCTION (svlsr, binary_uint_opt_n, all_unsigned, mxz)
 DEF_SVE_FUNCTION (svlsr_wide, binary_uint64_opt_n, bhs_unsigned, mxz)
-DEF_SVE_FUNCTION (svmad, ternary_opt_n, all_data, mxz)
-DEF_SVE_FUNCTION (svmax, binary_opt_n, all_data, mxz)
+DEF_SVE_FUNCTION (svmad, ternary_opt_n, all_arith, mxz)
+DEF_SVE_FUNCTION (svmax, binary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svmaxnm, binary_opt_n, all_float, mxz)
 DEF_SVE_FUNCTION (svmaxnmv, reduction, all_float, implicit)
-DEF_SVE_FUNCTION (svmaxv, reduction, all_data, implicit)
-DEF_SVE_FUNCTION (svmin, binary_opt_n, all_data, mxz)
+DEF_SVE_FUNCTION (svmaxv, reduction, all_arith, implicit)
+DEF_SVE_FUNCTION (svmin, binary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svminnm, binary_opt_n, all_float, mxz)
 DEF_SVE_FUNCTION (svminnmv, reduction, all_float, implicit)
-DEF_SVE_FUNCTION (svminv, reduction, all_data, implicit)
-DEF_SVE_FUNCTION (svmla, ternary_opt_n, all_data, mxz)
+DEF_SVE_FUNCTION (svminv, reduction, all_arith, implicit)
+DEF_SVE_FUNCTION (svmla, ternary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svmla_lane, ternary_lane, all_float, none)
-DEF_SVE_FUNCTION (svmls, ternary_opt_n, all_data, mxz)
+DEF_SVE_FUNCTION (svmls, ternary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svmls_lane, ternary_lane, all_float, none)
 DEF_SVE_FUNCTION (svmmla, mmla, none, none)
 DEF_SVE_FUNCTION (svmov, unary, b, z)
-DEF_SVE_FUNCTION (svmsb, ternary_opt_n, all_data, mxz)
-DEF_SVE_FUNCTION (svmul, binary_opt_n, all_data, mxz)
+DEF_SVE_FUNCTION (svmsb, ternary_opt_n, all_arith, mxz)
+DEF_SVE_FUNCTION (svmul, binary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svmul_lane, binary_lane, all_float, none)
 DEF_SVE_FUNCTION (svmulh, binary_opt_n, all_integer, mxz)
 DEF_SVE_FUNCTION (svmulx, binary_opt_n, all_float, mxz)
@@ -287,8 +287,8 @@ DEF_SVE_FUNCTION (svst2, store, all_data, implicit)
 DEF_SVE_FUNCTION (svst3, store, all_data, implicit)
 DEF_SVE_FUNCTION (svst4, store, all_data, implicit)
 DEF_SVE_FUNCTION (svstnt1, store, all_data, implicit)
-DEF_SVE_FUNCTION (svsub, binary_opt_n, all_data, mxz)
-DEF_SVE_FUNCTION (svsubr, binary_opt_n, all_data, mxz)
+DEF_SVE_FUNCTION (svsub, binary_opt_n, all_arith, mxz)
+DEF_SVE_FUNCTION (svsubr, binary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svtbl, binary_uint, all_data, none)
 DEF_SVE_FUNCTION (svtmad, tmad, all_float, none)
 DEF_SVE_FUNCTION (svtrn1, binary, all_data, none)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
index 5ab41c3..8daf8f7 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
@@ -31,7 +31,7 @@ DEF_SVE_FUNCTION (svabdlt, binary_long_opt_n, hsd_integer, none)
 DEF_SVE_FUNCTION (svaddlb, binary_long_opt_n, hsd_integer, none)
 DEF_SVE_FUNCTION (svaddlbt, binary_long_opt_n, hsd_signed, none)
 DEF_SVE_FUNCTION (svaddlt, binary_long_opt_n, hsd_integer, none)
-DEF_SVE_FUNCTION (svaddp, binary, all_data, mx)
+DEF_SVE_FUNCTION (svaddp, binary, all_arith, mx)
 DEF_SVE_FUNCTION (svaddwb, binary_wide_opt_n, hsd_integer, none)
 DEF_SVE_FUNCTION (svaddwt, binary_wide_opt_n, hsd_integer, none)
 DEF_SVE_FUNCTION (svbcax, ternary_opt_n, all_integer, none)
@@ -69,7 +69,7 @@ DEF_SVE_FUNCTION (svldnt1uw_gather, load_ext_gather_offset_restricted, d_integer
 DEF_SVE_FUNCTION (svldnt1uw_gather, load_ext_gather_index_restricted, d_integer, implicit)
 DEF_SVE_FUNCTION (svlogb, unary_to_int, all_float, mxz)
 DEF_SVE_FUNCTION (svmatch, compare, bh_integer, implicit)
-DEF_SVE_FUNCTION (svmaxp, binary, all_data, mx)
+DEF_SVE_FUNCTION (svmaxp, binary, all_arith, mx)
 DEF_SVE_FUNCTION (svmaxnmp, binary, all_float, mx)
 DEF_SVE_FUNCTION (svmla_lane, ternary_lane, hsd_integer, none)
 DEF_SVE_FUNCTION (svmlalb, ternary_long_opt_n, s_float_hsd_integer, none)
@@ -81,7 +81,7 @@ DEF_SVE_FUNCTION (svmlslb, ternary_long_opt_n, s_float_hsd_integer, none)
 DEF_SVE_FUNCTION (svmlslb_lane, ternary_long_lane, s_float_sd_integer, none)
 DEF_SVE_FUNCTION (svmlslt, ternary_long_opt_n, s_float_hsd_integer, none)
 DEF_SVE_FUNCTION (svmlslt_lane, ternary_long_lane, s_float_sd_integer, none)
-DEF_SVE_FUNCTION (svminp, binary, all_data, mx)
+DEF_SVE_FUNCTION (svminp, binary, all_arith, mx)
 DEF_SVE_FUNCTION (svminnmp, binary, all_float, mx)
 DEF_SVE_FUNCTION (svmovlb, unary_long, hsd_integer, none)
 DEF_SVE_FUNCTION (svmovlt, unary_long, hsd_integer, none)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 537c28e..d4d201d 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -184,9 +184,16 @@ CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
 /*     _f16 _f32 _f64
    _s8 _s16 _s32 _s64
    _u8 _u16 _u32 _u64.  */
-#define TYPES_all_data(S, D) \
+#define TYPES_all_arith(S, D) \
   TYPES_all_float (S, D), TYPES_all_integer (S, D)
 
+/*     _bf16
+	_f16 _f32 _f64
+   _s8  _s16 _s32 _s64
+   _u8  _u16 _u32 _u64.  */
+#define TYPES_all_data(S, D) \
+  S (bf16), TYPES_all_arith (S, D)
+
 /* _b only.  */
 #define TYPES_b(S, D) \
   S (b)
@@ -371,14 +378,17 @@ CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
   TYPES_inc_dec_n1 (D, u32), \
   TYPES_inc_dec_n1 (D, u64)
 
-/* {     _f16 _f32 _f64 }   {     _f16 _f32 _f64 }
-   { _s8 _s16 _s32 _s64 } x { _s8 _s16 _s32 _s64 }
-   { _u8 _u16 _u32 _u64 }   { _u8 _u16 _u32 _u64 }.  */
+/* {     _bf16           }   {     _bf16           }
+   {      _f16 _f32 _f64 }   {      _f16 _f32 _f64 }
+   { _s8  _s16 _s32 _s64 } x { _s8  _s16 _s32 _s64 }
+   { _u8  _u16 _u32 _u64 }   { _u8  _u16 _u32 _u64 }.  */
 #define TYPES_reinterpret1(D, A) \
+  D (A, bf16), \
   D (A, f16), D (A, f32), D (A, f64), \
   D (A, s8), D (A, s16), D (A, s32), D (A, s64), \
   D (A, u8), D (A, u16), D (A, u32), D (A, u64)
 #define TYPES_reinterpret(S, D) \
+  TYPES_reinterpret1 (D, bf16), \
   TYPES_reinterpret1 (D, f16), \
   TYPES_reinterpret1 (D, f32), \
   TYPES_reinterpret1 (D, f64), \
@@ -428,6 +438,7 @@ DEF_SVE_TYPES_ARRAY (all_signed);
 DEF_SVE_TYPES_ARRAY (all_float_and_signed);
 DEF_SVE_TYPES_ARRAY (all_unsigned);
 DEF_SVE_TYPES_ARRAY (all_integer);
+DEF_SVE_TYPES_ARRAY (all_arith);
 DEF_SVE_TYPES_ARRAY (all_data);
 DEF_SVE_TYPES_ARRAY (b);
 DEF_SVE_TYPES_ARRAY (b_unsigned);
@@ -3351,7 +3362,7 @@ register_tuple_type (unsigned int num_vectors, vector_type_index type)
 	      && TYPE_ALIGN (tuple_type) == 128);
 
   /* Work out the structure name.  */
-  char buffer[sizeof ("svfloat64x4_t")];
+  char buffer[sizeof ("svbfloat16x4_t")];
   const char *vector_type_name = vector_types[type].acle_name;
   snprintf (buffer, sizeof (buffer), "%.*sx%d_t",
 	    (int) strlen (vector_type_name) - 2, vector_type_name,
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.def b/gcc/config/aarch64/aarch64-sve-builtins.def
index a5a5aca..3dbf4f5 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins.def
@@ -61,6 +61,7 @@ DEF_SVE_MODE (u64offset, none, svuint64_t, bytes)
 DEF_SVE_MODE (vnum, none, none, vectors)
 
 DEF_SVE_TYPE (svbool_t, 10, __SVBool_t, boolean_type_node)
+DEF_SVE_TYPE (svbfloat16_t, 14, __SVBfloat16_t, aarch64_bf16_type_node)
 DEF_SVE_TYPE (svfloat16_t, 13, __SVFloat16_t, aarch64_fp16_type_node)
 DEF_SVE_TYPE (svfloat32_t, 13, __SVFloat32_t, float_type_node)
 DEF_SVE_TYPE (svfloat64_t, 13, __SVFloat64_t, double_type_node)
@@ -81,6 +82,7 @@ DEF_SVE_TYPE_SUFFIX (b8, svbool_t, bool, 8, VNx16BImode)
 DEF_SVE_TYPE_SUFFIX (b16, svbool_t, bool, 16, VNx8BImode)
 DEF_SVE_TYPE_SUFFIX (b32, svbool_t, bool, 32, VNx4BImode)
 DEF_SVE_TYPE_SUFFIX (b64, svbool_t, bool, 64, VNx2BImode)
+DEF_SVE_TYPE_SUFFIX (bf16, svbfloat16_t, bfloat, 16, VNx8BFmode)
 DEF_SVE_TYPE_SUFFIX (f16, svfloat16_t, float, 16, VNx8HFmode)
 DEF_SVE_TYPE_SUFFIX (f32, svfloat32_t, float, 32, VNx4SFmode)
 DEF_SVE_TYPE_SUFFIX (f64, svfloat64_t, float, 64, VNx2DFmode)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index 9513b49..f7f06d2 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -150,6 +150,7 @@ enum predication_index
 enum type_class_index
 {
   TYPE_bool,
+  TYPE_bfloat,
   TYPE_float,
   TYPE_signed,
   TYPE_unsigned,
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 11197bd..6581e4c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1656,6 +1656,7 @@ aarch64_classify_vector_mode (machine_mode mode)
     case E_VNx8HImode:
     case E_VNx4SImode:
     case E_VNx2DImode:
+    case E_VNx8BFmode:
     case E_VNx8HFmode:
     case E_VNx4SFmode:
     case E_VNx2DFmode:
@@ -1666,6 +1667,7 @@ aarch64_classify_vector_mode (machine_mode mode)
     case E_VNx16HImode:
     case E_VNx8SImode:
     case E_VNx4DImode:
+    case E_VNx16BFmode:
     case E_VNx16HFmode:
     case E_VNx8SFmode:
     case E_VNx4DFmode:
@@ -1674,6 +1676,7 @@ aarch64_classify_vector_mode (machine_mode mode)
     case E_VNx24HImode:
     case E_VNx12SImode:
     case E_VNx6DImode:
+    case E_VNx24BFmode:
     case E_VNx24HFmode:
     case E_VNx12SFmode:
     case E_VNx6DFmode:
@@ -1682,6 +1685,7 @@ aarch64_classify_vector_mode (machine_mode mode)
     case E_VNx32HImode:
     case E_VNx16SImode:
     case E_VNx8DImode:
+    case E_VNx32BFmode:
     case E_VNx32HFmode:
     case E_VNx16SFmode:
     case E_VNx8DFmode:
@@ -16109,8 +16113,10 @@ aarch64_full_sve_mode (scalar_mode mode)
       return VNx4SFmode;
     case E_HFmode:
       return VNx8HFmode;
+    case E_BFmode:
+      return VNx8BFmode;
     case E_DImode:
-	return VNx2DImode;
+      return VNx2DImode;
     case E_SImode:
       return VNx4SImode;
     case E_HImode:
diff --git a/gcc/config/aarch64/arm_sve.h b/gcc/config/aarch64/arm_sve.h
index 8df48b7..f2d2d0c 100644
--- a/gcc/config/aarch64/arm_sve.h
+++ b/gcc/config/aarch64/arm_sve.h
@@ -26,6 +26,7 @@
 #define _ARM_SVE_H_
 
 #include <stdint.h>
+#include <arm_bf16.h>
 
 typedef __fp16 float16_t;
 typedef float float32_t;
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index bac11b3..d5b60e0 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -321,19 +321,15 @@
 
 ;; All SVE vector structure modes.
 (define_mode_iterator SVE_STRUCT [VNx32QI VNx16HI VNx8SI VNx4DI
-				  VNx16HF VNx8SF VNx4DF
+				  VNx16BF VNx16HF VNx8SF VNx4DF
 				  VNx48QI VNx24HI VNx12SI VNx6DI
-				  VNx24HF VNx12SF VNx6DF
+				  VNx24BF VNx24HF VNx12SF VNx6DF
 				  VNx64QI VNx32HI VNx16SI VNx8DI
-				  VNx32HF VNx16SF VNx8DF])
-
-;; SVE_STRUCT restricted to 2-vector tuples.
-(define_mode_iterator SVE_STRUCT2 [VNx32QI VNx16HI VNx8SI VNx4DI
-				   VNx16HF VNx8SF VNx4DF])
+				  VNx32BF VNx32HF VNx16SF VNx8DF])
 
 ;; All fully-packed SVE vector modes.
 (define_mode_iterator SVE_FULL [VNx16QI VNx8HI VNx4SI VNx2DI
-			        VNx8HF VNx4SF VNx2DF])
+			        VNx8BF VNx8HF VNx4SF VNx2DF])
 
 ;; All fully-packed SVE integer vector modes.
 (define_mode_iterator SVE_FULL_I [VNx16QI VNx8HI VNx4SI VNx2DI])
@@ -349,7 +345,8 @@
 (define_mode_iterator SVE_FULL_BHSI [VNx16QI VNx8HI VNx4SI])
 
 ;; Fully-packed SVE vector modes that have 16-bit, 32-bit or 64-bit elements.
-(define_mode_iterator SVE_FULL_HSD [VNx8HI VNx4SI VNx2DI VNx8HF VNx4SF VNx2DF])
+(define_mode_iterator SVE_FULL_HSD [VNx8HI VNx4SI VNx2DI
+				    VNx8BF VNx8HF VNx4SF VNx2DF])
 
 ;; Fully-packed SVE integer vector modes that have 16-bit, 32-bit or 64-bit
 ;; elements.
@@ -395,6 +392,7 @@
 (define_mode_iterator SVE_ALL [VNx16QI VNx8QI VNx4QI VNx2QI
 			       VNx8HI VNx4HI VNx2HI
 			       VNx8HF VNx4HF VNx2HF
+			       VNx8BF
 			       VNx4SI VNx2SI
 			       VNx4SF VNx2SF
 			       VNx2DI
@@ -1005,6 +1003,7 @@
 			  (VNx16QI "b") (VNx8QI "b") (VNx4QI "b") (VNx2QI "b")
 			  (VNx8HI "h") (VNx4HI "h") (VNx2HI "h")
 			  (VNx8HF "h") (VNx4HF "h") (VNx2HF "h")
+			  (VNx8BF "h")
 			  (VNx4SI "s") (VNx2SI "s")
 			  (VNx4SF "s") (VNx2SF "s")
 			  (VNx2DI "d")
@@ -1021,6 +1020,7 @@
 (define_mode_attr Vesize [(VNx16QI "b") (VNx8QI "b") (VNx4QI "b") (VNx2QI "b")
 			  (VNx8HI "h") (VNx4HI "h") (VNx2HI "h")
 			  (VNx8HF "h") (VNx4HF "h") (VNx2HF "h")
+			  (VNx8BF "h")
 			  (VNx4SI "w") (VNx2SI "w")
 			  (VNx4SF "w") (VNx2SF "w")
 			  (VNx2DI "d")
@@ -1028,6 +1028,7 @@
 			  (VNx32QI "b") (VNx48QI "b") (VNx64QI "b")
 			  (VNx16HI "h") (VNx24HI "h") (VNx32HI "h")
 			  (VNx16HF "h") (VNx24HF "h") (VNx32HF "h")
+			  (VNx16BF "h") (VNx24BF "h") (VNx32BF "h")
 			  (VNx8SI  "w") (VNx12SI "w") (VNx16SI "w")
 			  (VNx8SF  "w") (VNx12SF "w") (VNx16SF "w")
 			  (VNx4DI  "d") (VNx6DI  "d") (VNx8DI  "d")
@@ -1038,6 +1039,7 @@
 (define_mode_attr Vctype [(VNx16QI "b") (VNx8QI "h") (VNx4QI "s") (VNx2QI "d")
 			  (VNx8HI "h") (VNx4HI "s") (VNx2HI "d")
 			  (VNx8HF "h") (VNx4HF "s") (VNx2HF "d")
+			  (VNx8BF "h")
 			  (VNx4SI "s") (VNx2SI "d")
 			  (VNx4SF "s") (VNx2SF "d")
 			  (VNx2DI "d")
@@ -1077,6 +1079,7 @@
 		       (VNx16QI "QI") (VNx8QI "QI") (VNx4QI "QI") (VNx2QI "QI")
 		       (VNx8HI "HI") (VNx4HI "HI") (VNx2HI "HI")
 		       (VNx8HF "HF") (VNx4HF "HF") (VNx2HF "HF")
+		       (VNx8BF "BF")
 		       (VNx4SI "SI") (VNx2SI "SI")
 		       (VNx4SF "SF") (VNx2SF "SF")
 		       (VNx2DI "DI")
@@ -1095,6 +1098,7 @@
 		       (VNx16QI "qi") (VNx8QI "qi") (VNx4QI "qi") (VNx2QI "qi")
 		       (VNx8HI "hi") (VNx4HI "hi") (VNx2HI "hi")
 		       (VNx8HF "hf") (VNx4HF "hf") (VNx2HF "hf")
+		       (VNx8BF "bf")
 		       (VNx4SI "si") (VNx2SI "si")
 		       (VNx4SF "sf") (VNx2SF "sf")
 		       (VNx2DI "di")
@@ -1102,19 +1106,19 @@
 
 ;; Element mode with floating-point values replaced by like-sized integers.
 (define_mode_attr VEL_INT [(VNx16QI "QI")
-			   (VNx8HI  "HI") (VNx8HF "HI")
+			   (VNx8HI  "HI") (VNx8HF "HI") (VNx8BF "HI")
 			   (VNx4SI  "SI") (VNx4SF "SI")
 			   (VNx2DI  "DI") (VNx2DF "DI")])
 
 ;; Gives the mode of the 128-bit lowpart of an SVE vector.
 (define_mode_attr V128 [(VNx16QI "V16QI")
-			(VNx8HI  "V8HI") (VNx8HF "V8HF")
+			(VNx8HI  "V8HI") (VNx8HF "V8HF") (VNx8BF "V8BF")
 			(VNx4SI  "V4SI") (VNx4SF "V4SF")
 			(VNx2DI  "V2DI") (VNx2DF "V2DF")])
 
 ;; ...and again in lower case.
 (define_mode_attr v128 [(VNx16QI "v16qi")
-			(VNx8HI  "v8hi") (VNx8HF "v8hf")
+			(VNx8HI  "v8hi") (VNx8HF "v8hf") (VNx8BF "v8bf")
 			(VNx4SI  "v4si") (VNx4SF "v4sf")
 			(VNx2DI  "v2di") (VNx2DF "v2df")])
 
@@ -1277,6 +1281,7 @@
 			  (VNx16QI "w") (VNx8QI "w") (VNx4QI "w") (VNx2QI "w")
 			  (VNx8HI "w") (VNx4HI "w") (VNx2HI "w")
 			  (VNx8HF "w") (VNx4HF "w") (VNx2HF "w")
+			  (VNx8BF "w")
 			  (VNx4SI "w") (VNx2SI "w")
 			  (VNx4SF "w") (VNx2SF "w")
 			  (VNx2DI "x")
@@ -1303,6 +1308,7 @@
 			       (HF    "HI")
 			       (VNx16QI "VNx16QI")
 			       (VNx8HI  "VNx8HI") (VNx8HF "VNx8HI")
+			       (VNx8BF  "VNx8HI")
 			       (VNx4SI  "VNx4SI") (VNx4SF "VNx4SI")
 			       (VNx2DI  "VNx2DI") (VNx2DF "VNx2DI")
 ])
@@ -1318,15 +1324,18 @@
 			       (SF   "si")
 			       (VNx16QI "vnx16qi")
 			       (VNx8HI  "vnx8hi") (VNx8HF "vnx8hi")
+			       (VNx8BF  "vnx8hi")
 			       (VNx4SI  "vnx4si") (VNx4SF "vnx4si")
 			       (VNx2DI  "vnx2di") (VNx2DF "vnx2di")
 ])
 
 ;; Floating-point equivalent of selected modes.
 (define_mode_attr V_FP_EQUIV [(VNx8HI "VNx8HF") (VNx8HF "VNx8HF")
+			      (VNx8BF "VNx8HF")
 			      (VNx4SI "VNx4SF") (VNx4SF "VNx4SF")
 			      (VNx2DI "VNx2DF") (VNx2DF "VNx2DF")])
 (define_mode_attr v_fp_equiv [(VNx8HI "vnx8hf") (VNx8HF "vnx8hf")
+			      (VNx8BF "vnx8hf")
 			      (VNx4SI "vnx4sf") (VNx4SF "vnx4sf")
 			      (VNx2DI "vnx2df") (VNx2DF "vnx2df")])
 
@@ -1508,51 +1517,63 @@
 ;; The number of subvectors in an SVE_STRUCT.
 (define_mode_attr vector_count [(VNx32QI "2") (VNx16HI "2")
 				(VNx8SI  "2") (VNx4DI  "2")
+				(VNx16BF "2")
 				(VNx16HF "2") (VNx8SF  "2") (VNx4DF "2")
 				(VNx48QI "3") (VNx24HI "3")
 				(VNx12SI "3") (VNx6DI  "3")
+				(VNx24BF "3")
 				(VNx24HF "3") (VNx12SF "3") (VNx6DF "3")
 				(VNx64QI "4") (VNx32HI "4")
 				(VNx16SI "4") (VNx8DI  "4")
+				(VNx32BF "4")
 				(VNx32HF "4") (VNx16SF "4") (VNx8DF "4")])
 
 ;; The number of instruction bytes needed for an SVE_STRUCT move.  This is
 ;; equal to vector_count * 4.
 (define_mode_attr insn_length [(VNx32QI "8")  (VNx16HI "8")
 			       (VNx8SI  "8")  (VNx4DI  "8")
+			       (VNx16BF "8")
 			       (VNx16HF "8")  (VNx8SF  "8")  (VNx4DF "8")
 			       (VNx48QI "12") (VNx24HI "12")
 			       (VNx12SI "12") (VNx6DI  "12")
+			       (VNx24BF "12")
 			       (VNx24HF "12") (VNx12SF "12") (VNx6DF "12")
 			       (VNx64QI "16") (VNx32HI "16")
 			       (VNx16SI "16") (VNx8DI  "16")
+			       (VNx32BF "16")
 			       (VNx32HF "16") (VNx16SF "16") (VNx8DF "16")])
 
 ;; The type of a subvector in an SVE_STRUCT.
 (define_mode_attr VSINGLE [(VNx32QI "VNx16QI")
 			   (VNx16HI "VNx8HI") (VNx16HF "VNx8HF")
+			   (VNx16BF "VNx8BF")
 			   (VNx8SI "VNx4SI") (VNx8SF "VNx4SF")
 			   (VNx4DI "VNx2DI") (VNx4DF "VNx2DF")
 			   (VNx48QI "VNx16QI")
 			   (VNx24HI "VNx8HI") (VNx24HF "VNx8HF")
+			   (VNx24BF "VNx8BF")
 			   (VNx12SI "VNx4SI") (VNx12SF "VNx4SF")
 			   (VNx6DI "VNx2DI") (VNx6DF "VNx2DF")
 			   (VNx64QI "VNx16QI")
 			   (VNx32HI "VNx8HI") (VNx32HF "VNx8HF")
+			   (VNx32BF "VNx8BF")
 			   (VNx16SI "VNx4SI") (VNx16SF "VNx4SF")
 			   (VNx8DI "VNx2DI") (VNx8DF "VNx2DF")])
 
 ;; ...and again in lower case.
 (define_mode_attr vsingle [(VNx32QI "vnx16qi")
 			   (VNx16HI "vnx8hi") (VNx16HF "vnx8hf")
+			   (VNx16BF "vnx8bf")
 			   (VNx8SI "vnx4si") (VNx8SF "vnx4sf")
 			   (VNx4DI "vnx2di") (VNx4DF "vnx2df")
 			   (VNx48QI "vnx16qi")
 			   (VNx24HI "vnx8hi") (VNx24HF "vnx8hf")
+			   (VNx24BF "vnx8bf")
 			   (VNx12SI "vnx4si") (VNx12SF "vnx4sf")
 			   (VNx6DI "vnx2di") (VNx6DF "vnx2df")
 			   (VNx64QI "vnx16qi")
 			   (VNx32HI "vnx8hi") (VNx32HF "vnx8hf")
+			   (VNx32BF "vnx8bf")
 			   (VNx16SI "vnx4si") (VNx16SF "vnx4sf")
 			   (VNx8DI "vnx2di") (VNx8DF "vnx2df")])
 
@@ -1562,20 +1583,24 @@
 			 (VNx4QI "VNx4BI") (VNx2QI "VNx2BI")
 			 (VNx8HI "VNx8BI") (VNx4HI "VNx4BI") (VNx2HI "VNx2BI")
 			 (VNx8HF "VNx8BI") (VNx4HF "VNx4BI") (VNx2HF "VNx2BI")
+			 (VNx8BF "VNx8BI")
 			 (VNx4SI "VNx4BI") (VNx2SI "VNx2BI")
 			 (VNx4SF "VNx4BI") (VNx2SF "VNx2BI")
 			 (VNx2DI "VNx2BI")
 			 (VNx2DF "VNx2BI")
 			 (VNx32QI "VNx16BI")
 			 (VNx16HI "VNx8BI") (VNx16HF "VNx8BI")
+			 (VNx16BF "VNx8BI")
 			 (VNx8SI "VNx4BI") (VNx8SF "VNx4BI")
 			 (VNx4DI "VNx2BI") (VNx4DF "VNx2BI")
 			 (VNx48QI "VNx16BI")
 			 (VNx24HI "VNx8BI") (VNx24HF "VNx8BI")
+			 (VNx24BF "VNx8BI")
 			 (VNx12SI "VNx4BI") (VNx12SF "VNx4BI")
 			 (VNx6DI "VNx2BI") (VNx6DF "VNx2BI")
 			 (VNx64QI "VNx16BI")
 			 (VNx32HI "VNx8BI") (VNx32HF "VNx8BI")
+			 (VNx32BF "VNx8BI")
 			 (VNx16SI "VNx4BI") (VNx16SF "VNx4BI")
 			 (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")])
 
@@ -1584,25 +1609,30 @@
 			 (VNx4QI "vnx4bi") (VNx2QI "vnx2bi")
 			 (VNx8HI "vnx8bi") (VNx4HI "vnx4bi") (VNx2HI "vnx2bi")
 			 (VNx8HF "vnx8bi") (VNx4HF "vnx4bi") (VNx2HF "vnx2bi")
+			 (VNx8BF "vnx8bi")
 			 (VNx4SI "vnx4bi") (VNx2SI "vnx2bi")
 			 (VNx4SF "vnx4bi") (VNx2SF "vnx2bi")
 			 (VNx2DI "vnx2bi")
 			 (VNx2DF "vnx2bi")
 			 (VNx32QI "vnx16bi")
 			 (VNx16HI "vnx8bi") (VNx16HF "vnx8bi")
+			 (VNx16BF "vnx8bi")
 			 (VNx8SI "vnx4bi") (VNx8SF "vnx4bi")
 			 (VNx4DI "vnx2bi") (VNx4DF "vnx2bi")
 			 (VNx48QI "vnx16bi")
 			 (VNx24HI "vnx8bi") (VNx24HF "vnx8bi")
+			 (VNx24BF "vnx8bi")
 			 (VNx12SI "vnx4bi") (VNx12SF "vnx4bi")
 			 (VNx6DI "vnx2bi") (VNx6DF "vnx2bi")
 			 (VNx64QI "vnx16bi")
 			 (VNx32HI "vnx8bi") (VNx32HF "vnx4bi")
+			 (VNx32BF "vnx8bi")
 			 (VNx16SI "vnx4bi") (VNx16SF "vnx4bi")
 			 (VNx8DI "vnx2bi") (VNx8DF "vnx2bi")])
 
 (define_mode_attr VDOUBLE [(VNx16QI "VNx32QI")
 			   (VNx8HI "VNx16HI") (VNx8HF "VNx16HF")
+			   (VNx8BF "VNx16BF")
 			   (VNx4SI "VNx8SI") (VNx4SF "VNx8SF")
 			   (VNx2DI "VNx4DI") (VNx2DF "VNx4DF")])
 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index d72468c..5d002d9 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,157 @@
+2020-01-31  Richard Sandiford  <richard.sandiford@arm.com>
+
+	* g++.target/aarch64/sve/acle/general-c++/mangle_1.C: Test mangling
+	of svbfloat16_t.
+	* g++.target/aarch64/sve/acle/general-c++/mangle_2.C: Likewise for
+	__SVBfloat16_t.
+	* gcc.target/aarch64/sve/acle/asm/clasta_bf16.c: New test.
+	* gcc.target/aarch64/sve/acle/asm/clastb_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/cnt_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/create2_1.c (create_bf16): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/create3_1.c (create_bf16): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/create4_1.c (create_bf16): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/dup_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/dup_lane_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/dupq_lane_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/ext_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/get2_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/get3_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/get4_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/insr_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/lasta_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/lastb_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/ld1_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/ld1rq_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/ld2_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/ld3_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/ld4_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/ldnt1_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/len_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c
+	(reinterpret_f16_bf16_tied1, reinterpret_f16_bf16_untied): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c
+	(reinterpret_f32_bf16_tied1, reinterpret_f32_bf16_untied): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c
+	(reinterpret_f64_bf16_tied1, reinterpret_f64_bf16_untied): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c
+	(reinterpret_s16_bf16_tied1, reinterpret_s16_bf16_untied): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c
+	(reinterpret_s32_bf16_tied1, reinterpret_s32_bf16_untied): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c
+	(reinterpret_s64_bf16_tied1, reinterpret_s64_bf16_untied): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c
+	(reinterpret_s8_bf16_tied1, reinterpret_s8_bf16_untied): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c
+	(reinterpret_u16_bf16_tied1, reinterpret_u16_bf16_untied): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c
+	(reinterpret_u32_bf16_tied1, reinterpret_u32_bf16_untied): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c
+	(reinterpret_u64_bf16_tied1, reinterpret_u64_bf16_untied): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c
+	(reinterpret_u8_bf16_tied1, reinterpret_u8_bf16_untied): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/rev_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/sel_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/set2_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/set3_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/set4_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/splice_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/st1_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/st2_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/st3_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/st4_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/stnt1_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/tbl_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/trn1_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/trn1q_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/trn2_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/trn2q_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/undef2_1.c (bfloat16_t): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/undef3_1.c (bfloat16_t): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/undef4_1.c (bfloat16_t): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/undef_1.c (bfloat16_t): Likewise.
+	* gcc.target/aarch64/sve/acle/asm/uzp1_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/uzp1q_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/uzp2_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/uzp2q_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/zip1_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/zip1q_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/zip2_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/acle/asm/zip2q_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/pcs/annotate_1.c (ret_bf16, ret_bf16x2)
+	(ret_bf16x3, ret_bf16x4): Likewise.
+	* gcc.target/aarch64/sve/pcs/annotate_2.c (fn_bf16, fn_bf16x2)
+	(fn_bf16x3, fn_bf16x4): Likewise.
+	* gcc.target/aarch64/sve/pcs/annotate_3.c (fn_bf16, fn_bf16x2)
+	(fn_bf16x3, fn_bf16x4): Likewise.
+	* gcc.target/aarch64/sve/pcs/annotate_4.c (fn_bf16, fn_bf16x2)
+	(fn_bf16x3, fn_bf16x4): Likewise.
+	* gcc.target/aarch64/sve/pcs/annotate_5.c (fn_bf16, fn_bf16x2)
+	(fn_bf16x3, fn_bf16x4): Likewise.
+	* gcc.target/aarch64/sve/pcs/annotate_6.c (fn_bf16, fn_bf16x2)
+	(fn_bf16x3, fn_bf16x4): Likewise.
+	* gcc.target/aarch64/sve/pcs/annotate_7.c (fn_bf16, fn_bf16x2)
+	(fn_bf16x3, fn_bf16x4): Likewise.
+	* gcc.target/aarch64/sve/pcs/args_5_be_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/pcs/args_5_le_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/pcs/args_6_be_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/pcs/args_6_le_bf16.c: Likewise.
+	* gcc.target/aarch64/sve/pcs/gnu_vectors_1.c (bfloat16x16_t): New
+	typedef.
+	(bfloat16_callee, bfloat16_caller): New tests.
+	* gcc.target/aarch64/sve/pcs/gnu_vectors_2.c (bfloat16x16_t): New
+	typedef.
+	(bfloat16_callee, bfloat16_caller): New tests.
+	* gcc.target/aarch64/sve/pcs/return_4.c (CALLER_BF16): New macro.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_4_128.c (CALLER_BF16): New macro.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_4_256.c (CALLER_BF16): New macro.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_4_512.c (CALLER_BF16): New macro.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_4_1024.c (CALLER_BF16): New macro.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_4_2048.c (CALLER_BF16): New macro.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_5.c (CALLER_BF16): New macro.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_5_128.c (CALLER_BF16): New macro.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_5_256.c (CALLER_BF16): New macro.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_5_512.c (CALLER_BF16): New macro.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_5_1024.c (CALLER_BF16): New macro.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_5_2048.c (CALLER_BF16): New macro.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_6.c (bfloat16_t): New typedef.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_6_128.c (bfloat16_t): New typedef.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_6_256.c (bfloat16_t): New typedef.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_6_512.c (bfloat16_t): New typedef.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_6_1024.c (bfloat16_t): New typedef.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_6_2048.c (bfloat16_t): New typedef.
+	(callee_bf16, caller_bf16): New tests.
+	* gcc.target/aarch64/sve/pcs/return_7.c (callee_bf16): Likewise
+	(caller_bf16): Likewise.
+	* gcc.target/aarch64/sve/pcs/return_8.c (callee_bf16): Likewise
+	(caller_bf16): Likewise.
+	* gcc.target/aarch64/sve/pcs/return_9.c (callee_bf16): Likewise
+	(caller_bf16): Likewise.
+	* gcc.target/aarch64/sve2/acle/asm/tbl2_bf16.c: Likewise.
+	* gcc.target/aarch64/sve2/acle/asm/tbx_bf16.c: Likewise.
+	* gcc.target/aarch64/sve2/acle/asm/whilerw_bf16.c: Likewise.
+	* gcc.target/aarch64/sve2/acle/asm/whilewr_bf16.c: Likewise.
+
 2020-01-31  Dennis Zhang  <dennis.zhang@arm.com>
 	    Matthew Malcomson  <matthew.malcomson@arm.com>
 	    Richard Sandiford  <richard.sandiford@arm.com>
diff --git a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/mangle_1.C b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/mangle_1.C
index 1138f2e..1a17124 100644
--- a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/mangle_1.C
+++ b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/mangle_1.C
@@ -14,6 +14,7 @@ void f9(svuint64_t) {}
 void f10(svfloat16_t) {}
 void f11(svfloat32_t) {}
 void f12(svfloat64_t) {}
+void f13(svbfloat16_t) {}
 
 /* { dg-final { scan-assembler "_Z2f110__SVBool_t:" } } */
 /* { dg-final { scan-assembler "_Z2f210__SVInt8_t:" } } */
@@ -27,3 +28,4 @@ void f12(svfloat64_t) {}
 /* { dg-final { scan-assembler "_Z3f1013__SVFloat16_t:" } } */
 /* { dg-final { scan-assembler "_Z3f1113__SVFloat32_t:" } } */
 /* { dg-final { scan-assembler "_Z3f1213__SVFloat64_t:" } } */
+/* { dg-final { scan-assembler "_Z3f1314__SVBfloat16_t:" } } */
diff --git a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/mangle_2.C b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/mangle_2.C
index 575b262..6792b8a 100644
--- a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/mangle_2.C
+++ b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/mangle_2.C
@@ -12,6 +12,7 @@ void f9(__SVUint64_t) {}
 void f10(__SVFloat16_t) {}
 void f11(__SVFloat32_t) {}
 void f12(__SVFloat64_t) {}
+void f13(__SVBfloat16_t) {}
 
 /* { dg-final { scan-assembler "_Z2f110__SVBool_t:" } } */
 /* { dg-final { scan-assembler "_Z2f210__SVInt8_t:" } } */
@@ -25,3 +26,4 @@ void f12(__SVFloat64_t) {}
 /* { dg-final { scan-assembler "_Z3f1013__SVFloat16_t:" } } */
 /* { dg-final { scan-assembler "_Z3f1113__SVFloat32_t:" } } */
 /* { dg-final { scan-assembler "_Z3f1213__SVFloat64_t:" } } */
+/* { dg-final { scan-assembler "_Z3f1314__SVBfloat16_t:" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/clasta_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/clasta_bf16.c
new file mode 100644
index 0000000..a15e344
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/clasta_bf16.c
@@ -0,0 +1,52 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** clasta_bf16_tied1:
+**	clasta	z0\.h, p0, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_Z (clasta_bf16_tied1, svbfloat16_t,
+		z0 = svclasta_bf16 (p0, z0, z1),
+		z0 = svclasta (p0, z0, z1))
+
+/*
+** clasta_bf16_tied2:
+**	mov	(z[0-9]+)\.d, z0\.d
+**	movprfx	z0, z1
+**	clasta	z0\.h, p0, z0\.h, \1\.h
+**	ret
+*/
+TEST_UNIFORM_Z (clasta_bf16_tied2, svbfloat16_t,
+		z0 = svclasta_bf16 (p0, z1, z0),
+		z0 = svclasta (p0, z1, z0))
+
+/*
+** clasta_bf16_untied:
+**	movprfx	z0, z1
+**	clasta	z0\.h, p0, z0\.h, z2\.h
+**	ret
+*/
+TEST_UNIFORM_Z (clasta_bf16_untied, svbfloat16_t,
+		z0 = svclasta_bf16 (p0, z1, z2),
+		z0 = svclasta (p0, z1, z2))
+
+/*
+** clasta_d0_bf16:
+**	clasta	h0, p0, h0, z2\.h
+**	ret
+*/
+TEST_FOLD_LEFT_D (clasta_d0_bf16, bfloat16_t, svbfloat16_t,
+		  d0 = svclasta_n_bf16 (p0, d0, z2),
+		  d0 = svclasta (p0, d0, z2))
+
+/*
+** clasta_d1_bf16:
+**	mov	v0\.h\[0\], v1\.h\[0\]
+**	clasta	h0, p0, h0, z2\.h
+**	ret
+*/
+TEST_FOLD_LEFT_D (clasta_d1_bf16, bfloat16_t, svbfloat16_t,
+		  d0 = svclasta_n_bf16 (p0, d1, z2),
+		  d0 = svclasta (p0, d1, z2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/clastb_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/clastb_bf16.c
new file mode 100644
index 0000000..235fd1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/clastb_bf16.c
@@ -0,0 +1,52 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** clastb_bf16_tied1:
+**	clastb	z0\.h, p0, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_Z (clastb_bf16_tied1, svbfloat16_t,
+		z0 = svclastb_bf16 (p0, z0, z1),
+		z0 = svclastb (p0, z0, z1))
+
+/*
+** clastb_bf16_tied2:
+**	mov	(z[0-9]+)\.d, z0\.d
+**	movprfx	z0, z1
+**	clastb	z0\.h, p0, z0\.h, \1\.h
+**	ret
+*/
+TEST_UNIFORM_Z (clastb_bf16_tied2, svbfloat16_t,
+		z0 = svclastb_bf16 (p0, z1, z0),
+		z0 = svclastb (p0, z1, z0))
+
+/*
+** clastb_bf16_untied:
+**	movprfx	z0, z1
+**	clastb	z0\.h, p0, z0\.h, z2\.h
+**	ret
+*/
+TEST_UNIFORM_Z (clastb_bf16_untied, svbfloat16_t,
+		z0 = svclastb_bf16 (p0, z1, z2),
+		z0 = svclastb (p0, z1, z2))
+
+/*
+** clastb_d0_bf16:
+**	clastb	h0, p0, h0, z2\.h
+**	ret
+*/
+TEST_FOLD_LEFT_D (clastb_d0_bf16, bfloat16_t, svbfloat16_t,
+		  d0 = svclastb_n_bf16 (p0, d0, z2),
+		  d0 = svclastb (p0, d0, z2))
+
+/*
+** clastb_d1_bf16:
+**	mov	v0\.h\[0\], v1\.h\[0\]
+**	clastb	h0, p0, h0, z2\.h
+**	ret
+*/
+TEST_FOLD_LEFT_D (clastb_d1_bf16, bfloat16_t, svbfloat16_t,
+		  d0 = svclastb_n_bf16 (p0, d1, z2),
+		  d0 = svclastb (p0, d1, z2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cnt_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cnt_bf16.c
new file mode 100644
index 0000000..d92fbc1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cnt_bf16.c
@@ -0,0 +1,52 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** cnt_bf16_m_tied1:
+**	cnt	z0\.h, p0/m, z4\.h
+**	ret
+*/
+TEST_DUAL_Z (cnt_bf16_m_tied1, svuint16_t, svbfloat16_t,
+	     z0 = svcnt_bf16_m (z0, p0, z4),
+	     z0 = svcnt_m (z0, p0, z4))
+
+/*
+** cnt_bf16_m_untied:
+**	movprfx	z0, z1
+**	cnt	z0\.h, p0/m, z4\.h
+**	ret
+*/
+TEST_DUAL_Z (cnt_bf16_m_untied, svuint16_t, svbfloat16_t,
+	     z0 = svcnt_bf16_m (z1, p0, z4),
+	     z0 = svcnt_m (z1, p0, z4))
+
+/*
+** cnt_bf16_z:
+**	movprfx	z0\.h, p0/z, z4\.h
+**	cnt	z0\.h, p0/m, z4\.h
+**	ret
+*/
+TEST_DUAL_Z (cnt_bf16_z, svuint16_t, svbfloat16_t,
+	     z0 = svcnt_bf16_z (p0, z4),
+	     z0 = svcnt_z (p0, z4))
+
+/*
+** cnt_bf16_x:
+**	cnt	z0\.h, p0/m, z4\.h
+**	ret
+*/
+TEST_DUAL_Z (cnt_bf16_x, svuint16_t, svbfloat16_t,
+	     z0 = svcnt_bf16_x (p0, z4),
+	     z0 = svcnt_x (p0, z4))
+
+/*
+** ptrue_cnt_bf16_x:
+**	...
+**	ptrue	p[0-9]+\.b[^\n]*
+**	...
+**	ret
+*/
+TEST_DUAL_Z (ptrue_cnt_bf16_x, svuint16_t, svbfloat16_t,
+	     z0 = svcnt_bf16_x (svptrue_b16 (), z4),
+	     z0 = svcnt_x (svptrue_b16 (), z4))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create2_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create2_1.c
index b09e6ab..e9158ed 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create2_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create2_1.c
@@ -43,6 +43,16 @@ TEST_CREATE (create2_u16, svuint16x2_t, svuint16_t,
 	     z0 = svcreate2 (z6, z5))
 
 /*
+** create2_bf16:
+**	mov	z0\.d, z4\.d
+**	mov	z1\.d, z5\.d
+**	ret
+*/
+TEST_CREATE (create2_bf16, svbfloat16x2_t, svbfloat16_t,
+	     z0 = svcreate2_bf16 (z4, z5),
+	     z0 = svcreate2 (z4, z5))
+
+/*
 ** create2_f16:
 **	mov	z0\.d, z4\.d
 **	mov	z1\.d, z5\.d
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create3_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create3_1.c
index 6b71bf3..6f1afb7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create3_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create3_1.c
@@ -47,6 +47,17 @@ TEST_CREATE (create3_u16, svuint16x3_t, svuint16_t,
 	     z0 = svcreate3 (z6, z5, z4))
 
 /*
+** create3_bf16:
+**	mov	z0\.d, z4\.d
+**	mov	z1\.d, z5\.d
+**	mov	z2\.d, z6\.d
+**	ret
+*/
+TEST_CREATE (create3_bf16, svbfloat16x3_t, svbfloat16_t,
+	     z0 = svcreate3_bf16 (z4, z5, z6),
+	     z0 = svcreate3 (z4, z5, z6))
+
+/*
 ** create3_f16:
 **	mov	z0\.d, z4\.d
 **	mov	z1\.d, z5\.d
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create4_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create4_1.c
index 03b22d3..a386628 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create4_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create4_1.c
@@ -51,6 +51,18 @@ TEST_CREATE (create4_u16, svuint16x4_t, svuint16_t,
 	     z0 = svcreate4 (z6, z5, z4, z7))
 
 /*
+** create4_bf16:
+**	mov	z0\.d, z4\.d
+**	mov	z1\.d, z5\.d
+**	mov	z2\.d, z6\.d
+**	mov	z3\.d, z7\.d
+**	ret
+*/
+TEST_CREATE (create4_bf16, svbfloat16x4_t, svbfloat16_t,
+	     z0 = svcreate4_bf16 (z4, z5, z6, z7),
+	     z0 = svcreate4 (z4, z5, z6, z7))
+
+/*
 ** create4_f16:
 **	mov	z0\.d, z4\.d
 **	mov	z1\.d, z5\.d
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dup_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dup_bf16.c
new file mode 100644
index 0000000..db47d84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dup_bf16.c
@@ -0,0 +1,41 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** dup_h4_bf16:
+**	mov	z0\.h, h4
+**	ret
+*/
+TEST_UNIFORM_ZD (dup_h4_bf16, svbfloat16_t, __bf16,
+		z0 = svdup_n_bf16 (d4),
+		z0 = svdup_bf16 (d4))
+
+/*
+** dup_h4_bf16_m:
+**	movprfx	z0, z1
+**	mov	z0\.h, p0/m, h4
+**	ret
+*/
+TEST_UNIFORM_ZD (dup_h4_bf16_m, svbfloat16_t, __bf16,
+		z0 = svdup_n_bf16_m (z1, p0, d4),
+		z0 = svdup_bf16_m (z1, p0, d4))
+
+/*
+** dup_h4_bf16_z:
+**	movprfx	z0\.h, p0/z, z0\.h
+**	mov	z0\.h, p0/m, h4
+**	ret
+*/
+TEST_UNIFORM_ZD (dup_h4_bf16_z, svbfloat16_t, __bf16,
+		z0 = svdup_n_bf16_z (p0, d4),
+		z0 = svdup_bf16_z (p0, d4))
+
+/*
+** dup_h4_bf16_x:
+**	mov	z0\.h, h4
+**	ret
+*/
+TEST_UNIFORM_ZD (dup_h4_bf16_x, svbfloat16_t, __bf16,
+		z0 = svdup_n_bf16_x (p0, d4),
+		z0 = svdup_bf16_x (p0, d4))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dup_lane_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dup_lane_bf16.c
new file mode 100644
index 0000000..d05ad5a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dup_lane_bf16.c
@@ -0,0 +1,108 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** dup_lane_w0_bf16_tied1:
+**	mov	(z[0-9]+\.h), w0
+**	tbl	z0\.h, z0\.h, \1
+**	ret
+*/
+TEST_UNIFORM_ZX (dup_lane_w0_bf16_tied1, svbfloat16_t, uint16_t,
+		 z0 = svdup_lane_bf16 (z0, x0),
+		 z0 = svdup_lane (z0, x0))
+
+/*
+** dup_lane_w0_bf16_untied:
+**	mov	(z[0-9]+\.h), w0
+**	tbl	z0\.h, z1\.h, \1
+**	ret
+*/
+TEST_UNIFORM_ZX (dup_lane_w0_bf16_untied, svbfloat16_t, uint16_t,
+		 z0 = svdup_lane_bf16 (z1, x0),
+		 z0 = svdup_lane (z1, x0))
+
+/*
+** dup_lane_0_bf16_tied1:
+**	dup	z0\.h, z0\.h\[0\]
+**	ret
+*/
+TEST_UNIFORM_Z (dup_lane_0_bf16_tied1, svbfloat16_t,
+		z0 = svdup_lane_bf16 (z0, 0),
+		z0 = svdup_lane (z0, 0))
+
+/*
+** dup_lane_0_bf16_untied:
+**	dup	z0\.h, z1\.h\[0\]
+**	ret
+*/
+TEST_UNIFORM_Z (dup_lane_0_bf16_untied, svbfloat16_t,
+		z0 = svdup_lane_bf16 (z1, 0),
+		z0 = svdup_lane (z1, 0))
+
+/*
+** dup_lane_15_bf16:
+**	dup	z0\.h, z0\.h\[15\]
+**	ret
+*/
+TEST_UNIFORM_Z (dup_lane_15_bf16, svbfloat16_t,
+		z0 = svdup_lane_bf16 (z0, 15),
+		z0 = svdup_lane (z0, 15))
+
+/*
+** dup_lane_16_bf16:
+**	dup	z0\.h, z0\.h\[16\]
+**	ret
+*/
+TEST_UNIFORM_Z (dup_lane_16_bf16, svbfloat16_t,
+		z0 = svdup_lane_bf16 (z0, 16),
+		z0 = svdup_lane (z0, 16))
+
+/*
+** dup_lane_31_bf16:
+**	dup	z0\.h, z0\.h\[31\]
+**	ret
+*/
+TEST_UNIFORM_Z (dup_lane_31_bf16, svbfloat16_t,
+		z0 = svdup_lane_bf16 (z0, 31),
+		z0 = svdup_lane (z0, 31))
+
+/*
+** dup_lane_32_bf16:
+**	mov	(z[0-9]+\.h), #32
+**	tbl	z0\.h, z0\.h, \1
+**	ret
+*/
+TEST_UNIFORM_Z (dup_lane_32_bf16, svbfloat16_t,
+		z0 = svdup_lane_bf16 (z0, 32),
+		z0 = svdup_lane (z0, 32))
+
+/*
+** dup_lane_63_bf16:
+**	mov	(z[0-9]+\.h), #63
+**	tbl	z0\.h, z0\.h, \1
+**	ret
+*/
+TEST_UNIFORM_Z (dup_lane_63_bf16, svbfloat16_t,
+		z0 = svdup_lane_bf16 (z0, 63),
+		z0 = svdup_lane (z0, 63))
+
+/*
+** dup_lane_64_bf16:
+**	mov	(z[0-9]+\.h), #64
+**	tbl	z0\.h, z0\.h, \1
+**	ret
+*/
+TEST_UNIFORM_Z (dup_lane_64_bf16, svbfloat16_t,
+		z0 = svdup_lane_bf16 (z0, 64),
+		z0 = svdup_lane (z0, 64))
+
+/*
+** dup_lane_255_bf16:
+**	mov	(z[0-9]+\.h), #255
+**	tbl	z0\.h, z0\.h, \1
+**	ret
+*/
+TEST_UNIFORM_Z (dup_lane_255_bf16, svbfloat16_t,
+		z0 = svdup_lane_bf16 (z0, 255),
+		z0 = svdup_lane (z0, 255))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dupq_lane_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dupq_lane_bf16.c
new file mode 100644
index 0000000..89ae4a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dupq_lane_bf16.c
@@ -0,0 +1,48 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** dupq_lane_0_bf16_tied:
+**	dup	z0\.q, z0\.q\[0\]
+**	ret
+*/
+TEST_UNIFORM_Z (dupq_lane_0_bf16_tied, svbfloat16_t,
+		z0 = svdupq_lane_bf16 (z0, 0),
+		z0 = svdupq_lane (z0, 0))
+
+/*
+** dupq_lane_0_bf16_untied:
+**	dup	z0\.q, z1\.q\[0\]
+**	ret
+*/
+TEST_UNIFORM_Z (dupq_lane_0_bf16_untied, svbfloat16_t,
+		z0 = svdupq_lane_bf16 (z1, 0),
+		z0 = svdupq_lane (z1, 0))
+
+/*
+** dupq_lane_1_bf16:
+**	dup	z0\.q, z0\.q\[1\]
+**	ret
+*/
+TEST_UNIFORM_Z (dupq_lane_1_bf16, svbfloat16_t,
+		z0 = svdupq_lane_bf16 (z0, 1),
+		z0 = svdupq_lane (z0, 1))
+
+/*
+** dupq_lane_2_bf16:
+**	dup	z0\.q, z0\.q\[2\]
+**	ret
+*/
+TEST_UNIFORM_Z (dupq_lane_2_bf16, svbfloat16_t,
+		z0 = svdupq_lane_bf16 (z0, 2),
+		z0 = svdupq_lane (z0, 2))
+
+/*
+** dupq_lane_3_bf16:
+**	dup	z0\.q, z0\.q\[3\]
+**	ret
+*/
+TEST_UNIFORM_Z (dupq_lane_3_bf16, svbfloat16_t,
+		z0 = svdupq_lane_bf16 (z0, 3),
+		z0 = svdupq_lane (z0, 3))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ext_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ext_bf16.c
new file mode 100644
index 0000000..f982873
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ext_bf16.c
@@ -0,0 +1,73 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** ext_0_bf16_tied1:
+**	ext	z0\.b, z0\.b, z1\.b, #0
+**	ret
+*/
+TEST_UNIFORM_Z (ext_0_bf16_tied1, svbfloat16_t,
+		z0 = svext_bf16 (z0, z1, 0),
+		z0 = svext (z0, z1, 0))
+
+/*
+** ext_0_bf16_tied2:
+**	mov	(z[0-9]+)\.d, z0\.d
+**	movprfx	z0, z1
+**	ext	z0\.b, z0\.b, \1\.b, #0
+**	ret
+*/
+TEST_UNIFORM_Z (ext_0_bf16_tied2, svbfloat16_t,
+		z0 = svext_bf16 (z1, z0, 0),
+		z0 = svext (z1, z0, 0))
+
+/*
+** ext_0_bf16_untied:
+**	movprfx	z0, z1
+**	ext	z0\.b, z0\.b, z2\.b, #0
+**	ret
+*/
+TEST_UNIFORM_Z (ext_0_bf16_untied, svbfloat16_t,
+		z0 = svext_bf16 (z1, z2, 0),
+		z0 = svext (z1, z2, 0))
+
+/*
+** ext_1_bf16:
+**	movprfx	z0, z1
+**	ext	z0\.b, z0\.b, z2\.b, #2
+**	ret
+*/
+TEST_UNIFORM_Z (ext_1_bf16, svbfloat16_t,
+		z0 = svext_bf16 (z1, z2, 1),
+		z0 = svext (z1, z2, 1))
+
+/*
+** ext_2_bf16:
+**	movprfx	z0, z1
+**	ext	z0\.b, z0\.b, z2\.b, #4
+**	ret
+*/
+TEST_UNIFORM_Z (ext_2_bf16, svbfloat16_t,
+		z0 = svext_bf16 (z1, z2, 2),
+		z0 = svext (z1, z2, 2))
+
+/*
+** ext_3_bf16:
+**	movprfx	z0, z1
+**	ext	z0\.b, z0\.b, z2\.b, #6
+**	ret
+*/
+TEST_UNIFORM_Z (ext_3_bf16, svbfloat16_t,
+		z0 = svext_bf16 (z1, z2, 3),
+		z0 = svext (z1, z2, 3))
+
+/*
+** ext_127_bf16:
+**	movprfx	z0, z1
+**	ext	z0\.b, z0\.b, z2\.b, #254
+**	ret
+*/
+TEST_UNIFORM_Z (ext_127_bf16, svbfloat16_t,
+		z0 = svext_bf16 (z1, z2, 127),
+		z0 = svext (z1, z2, 127))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/get2_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/get2_bf16.c
new file mode 100644
index 0000000..6e5c773
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/get2_bf16.c
@@ -0,0 +1,55 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** get2_bf16_z0_0:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_GET (get2_bf16_z0_0, svbfloat16x2_t, svbfloat16_t,
+	  z0 = svget2_bf16 (z4, 0),
+	  z0 = svget2 (z4, 0))
+
+/*
+** get2_bf16_z0_1:
+**	mov	z0\.d, z5\.d
+**	ret
+*/
+TEST_GET (get2_bf16_z0_1, svbfloat16x2_t, svbfloat16_t,
+	  z0 = svget2_bf16 (z4, 1),
+	  z0 = svget2 (z4, 1))
+
+/*
+** get2_bf16_z4_0:
+**	ret
+*/
+TEST_GET (get2_bf16_z4_0, svbfloat16x2_t, svbfloat16_t,
+	  z4_res = svget2_bf16 (z4, 0),
+	  z4_res = svget2 (z4, 0))
+
+/*
+** get2_bf16_z4_1:
+**	mov	z4\.d, z5\.d
+**	ret
+*/
+TEST_GET (get2_bf16_z4_1, svbfloat16x2_t, svbfloat16_t,
+	  z4_res = svget2_bf16 (z4, 1),
+	  z4_res = svget2 (z4, 1))
+
+/*
+** get2_bf16_z5_0:
+**	mov	z5\.d, z4\.d
+**	ret
+*/
+TEST_GET (get2_bf16_z5_0, svbfloat16x2_t, svbfloat16_t,
+	  z5_res = svget2_bf16 (z4, 0),
+	  z5_res = svget2 (z4, 0))
+
+/*
+** get2_bf16_z5_1:
+**	ret
+*/
+TEST_GET (get2_bf16_z5_1, svbfloat16x2_t, svbfloat16_t,
+	  z5_res = svget2_bf16 (z4, 1),
+	  z5_res = svget2 (z4, 1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/get3_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/get3_bf16.c
new file mode 100644
index 0000000..292f02a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/get3_bf16.c
@@ -0,0 +1,108 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** get3_bf16_z0_0:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_GET (get3_bf16_z0_0, svbfloat16x3_t, svbfloat16_t,
+	  z0 = svget3_bf16 (z4, 0),
+	  z0 = svget3 (z4, 0))
+
+/*
+** get3_bf16_z0_1:
+**	mov	z0\.d, z5\.d
+**	ret
+*/
+TEST_GET (get3_bf16_z0_1, svbfloat16x3_t, svbfloat16_t,
+	  z0 = svget3_bf16 (z4, 1),
+	  z0 = svget3 (z4, 1))
+
+/*
+** get3_bf16_z0_2:
+**	mov	z0\.d, z6\.d
+**	ret
+*/
+TEST_GET (get3_bf16_z0_2, svbfloat16x3_t, svbfloat16_t,
+	  z0 = svget3_bf16 (z4, 2),
+	  z0 = svget3 (z4, 2))
+
+/*
+** get3_bf16_z4_0:
+**	ret
+*/
+TEST_GET (get3_bf16_z4_0, svbfloat16x3_t, svbfloat16_t,
+	  z4_res = svget3_bf16 (z4, 0),
+	  z4_res = svget3 (z4, 0))
+
+/*
+** get3_bf16_z4_1:
+**	mov	z4\.d, z5\.d
+**	ret
+*/
+TEST_GET (get3_bf16_z4_1, svbfloat16x3_t, svbfloat16_t,
+	  z4_res = svget3_bf16 (z4, 1),
+	  z4_res = svget3 (z4, 1))
+
+/*
+** get3_bf16_z4_2:
+**	mov	z4\.d, z6\.d
+**	ret
+*/
+TEST_GET (get3_bf16_z4_2, svbfloat16x3_t, svbfloat16_t,
+	  z4_res = svget3_bf16 (z4, 2),
+	  z4_res = svget3 (z4, 2))
+
+/*
+** get3_bf16_z5_0:
+**	mov	z5\.d, z4\.d
+**	ret
+*/
+TEST_GET (get3_bf16_z5_0, svbfloat16x3_t, svbfloat16_t,
+	  z5_res = svget3_bf16 (z4, 0),
+	  z5_res = svget3 (z4, 0))
+
+/*
+** get3_bf16_z5_1:
+**	ret
+*/
+TEST_GET (get3_bf16_z5_1, svbfloat16x3_t, svbfloat16_t,
+	  z5_res = svget3_bf16 (z4, 1),
+	  z5_res = svget3 (z4, 1))
+
+/*
+** get3_bf16_z5_2:
+**	mov	z5\.d, z6\.d
+**	ret
+*/
+TEST_GET (get3_bf16_z5_2, svbfloat16x3_t, svbfloat16_t,
+	  z5_res = svget3_bf16 (z4, 2),
+	  z5_res = svget3 (z4, 2))
+
+/*
+** get3_bf16_z6_0:
+**	mov	z6\.d, z4\.d
+**	ret
+*/
+TEST_GET (get3_bf16_z6_0, svbfloat16x3_t, svbfloat16_t,
+	  z6_res = svget3_bf16 (z4, 0),
+	  z6_res = svget3 (z4, 0))
+
+/*
+** get3_bf16_z6_1:
+**	mov	z6\.d, z5\.d
+**	ret
+*/
+TEST_GET (get3_bf16_z6_1, svbfloat16x3_t, svbfloat16_t,
+	  z6_res = svget3_bf16 (z4, 1),
+	  z6_res = svget3 (z4, 1))
+
+/*
+** get3_bf16_z6_2:
+**	ret
+*/
+TEST_GET (get3_bf16_z6_2, svbfloat16x3_t, svbfloat16_t,
+	  z6_res = svget3_bf16 (z4, 2),
+	  z6_res = svget3 (z4, 2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/get4_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/get4_bf16.c
new file mode 100644
index 0000000..f751fc1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/get4_bf16.c
@@ -0,0 +1,179 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** get4_bf16_z0_0:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_GET (get4_bf16_z0_0, svbfloat16x4_t, svbfloat16_t,
+	  z0 = svget4_bf16 (z4, 0),
+	  z0 = svget4 (z4, 0))
+
+/*
+** get4_bf16_z0_1:
+**	mov	z0\.d, z5\.d
+**	ret
+*/
+TEST_GET (get4_bf16_z0_1, svbfloat16x4_t, svbfloat16_t,
+	  z0 = svget4_bf16 (z4, 1),
+	  z0 = svget4 (z4, 1))
+
+/*
+** get4_bf16_z0_2:
+**	mov	z0\.d, z6\.d
+**	ret
+*/
+TEST_GET (get4_bf16_z0_2, svbfloat16x4_t, svbfloat16_t,
+	  z0 = svget4_bf16 (z4, 2),
+	  z0 = svget4 (z4, 2))
+
+/*
+** get4_bf16_z0_3:
+**	mov	z0\.d, z7\.d
+**	ret
+*/
+TEST_GET (get4_bf16_z0_3, svbfloat16x4_t, svbfloat16_t,
+	  z0 = svget4_bf16 (z4, 3),
+	  z0 = svget4 (z4, 3))
+
+/*
+** get4_bf16_z4_0:
+**	ret
+*/
+TEST_GET (get4_bf16_z4_0, svbfloat16x4_t, svbfloat16_t,
+	  z4_res = svget4_bf16 (z4, 0),
+	  z4_res = svget4 (z4, 0))
+
+/*
+** get4_bf16_z4_1:
+**	mov	z4\.d, z5\.d
+**	ret
+*/
+TEST_GET (get4_bf16_z4_1, svbfloat16x4_t, svbfloat16_t,
+	  z4_res = svget4_bf16 (z4, 1),
+	  z4_res = svget4 (z4, 1))
+
+/*
+** get4_bf16_z4_2:
+**	mov	z4\.d, z6\.d
+**	ret
+*/
+TEST_GET (get4_bf16_z4_2, svbfloat16x4_t, svbfloat16_t,
+	  z4_res = svget4_bf16 (z4, 2),
+	  z4_res = svget4 (z4, 2))
+
+/*
+** get4_bf16_z4_3:
+**	mov	z4\.d, z7\.d
+**	ret
+*/
+TEST_GET (get4_bf16_z4_3, svbfloat16x4_t, svbfloat16_t,
+	  z4_res = svget4_bf16 (z4, 3),
+	  z4_res = svget4 (z4, 3))
+
+/*
+** get4_bf16_z5_0:
+**	mov	z5\.d, z4\.d
+**	ret
+*/
+TEST_GET (get4_bf16_z5_0, svbfloat16x4_t, svbfloat16_t,
+	  z5_res = svget4_bf16 (z4, 0),
+	  z5_res = svget4 (z4, 0))
+
+/*
+** get4_bf16_z5_1:
+**	ret
+*/
+TEST_GET (get4_bf16_z5_1, svbfloat16x4_t, svbfloat16_t,
+	  z5_res = svget4_bf16 (z4, 1),
+	  z5_res = svget4 (z4, 1))
+
+/*
+** get4_bf16_z5_2:
+**	mov	z5\.d, z6\.d
+**	ret
+*/
+TEST_GET (get4_bf16_z5_2, svbfloat16x4_t, svbfloat16_t,
+	  z5_res = svget4_bf16 (z4, 2),
+	  z5_res = svget4 (z4, 2))
+
+/*
+** get4_bf16_z5_3:
+**	mov	z5\.d, z7\.d
+**	ret
+*/
+TEST_GET (get4_bf16_z5_3, svbfloat16x4_t, svbfloat16_t,
+	  z5_res = svget4_bf16 (z4, 3),
+	  z5_res = svget4 (z4, 3))
+
+/*
+** get4_bf16_z6_0:
+**	mov	z6\.d, z4\.d
+**	ret
+*/
+TEST_GET (get4_bf16_z6_0, svbfloat16x4_t, svbfloat16_t,
+	  z6_res = svget4_bf16 (z4, 0),
+	  z6_res = svget4 (z4, 0))
+
+/*
+** get4_bf16_z6_1:
+**	mov	z6\.d, z5\.d
+**	ret
+*/
+TEST_GET (get4_bf16_z6_1, svbfloat16x4_t, svbfloat16_t,
+	  z6_res = svget4_bf16 (z4, 1),
+	  z6_res = svget4 (z4, 1))
+
+/*
+** get4_bf16_z6_2:
+**	ret
+*/
+TEST_GET (get4_bf16_z6_2, svbfloat16x4_t, svbfloat16_t,
+	  z6_res = svget4_bf16 (z4, 2),
+	  z6_res = svget4 (z4, 2))
+
+/*
+** get4_bf16_z6_3:
+**	mov	z6\.d, z7\.d
+**	ret
+*/
+TEST_GET (get4_bf16_z6_3, svbfloat16x4_t, svbfloat16_t,
+	  z6_res = svget4_bf16 (z4, 3),
+	  z6_res = svget4 (z4, 3))
+
+/*
+** get4_bf16_z7_0:
+**	mov	z7\.d, z4\.d
+**	ret
+*/
+TEST_GET (get4_bf16_z7_0, svbfloat16x4_t, svbfloat16_t,
+	  z7_res = svget4_bf16 (z4, 0),
+	  z7_res = svget4 (z4, 0))
+
+/*
+** get4_bf16_z7_1:
+**	mov	z7\.d, z5\.d
+**	ret
+*/
+TEST_GET (get4_bf16_z7_1, svbfloat16x4_t, svbfloat16_t,
+	  z7_res = svget4_bf16 (z4, 1),
+	  z7_res = svget4 (z4, 1))
+
+/*
+** get4_bf16_z7_2:
+**	mov	z7\.d, z6\.d
+**	ret
+*/
+TEST_GET (get4_bf16_z7_2, svbfloat16x4_t, svbfloat16_t,
+	  z7_res = svget4_bf16 (z4, 2),
+	  z7_res = svget4 (z4, 2))
+
+/*
+** get4_bf16_z7_3:
+**	ret
+*/
+TEST_GET (get4_bf16_z7_3, svbfloat16x4_t, svbfloat16_t,
+	  z7_res = svget4_bf16 (z4, 3),
+	  z7_res = svget4 (z4, 3))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/insr_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/insr_bf16.c
new file mode 100644
index 0000000..55afdba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/insr_bf16.c
@@ -0,0 +1,22 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** insr_h4_bf16_tied1:
+**	insr	z0\.h, h4
+**	ret
+*/
+TEST_UNIFORM_ZD (insr_h4_bf16_tied1, svbfloat16_t, bfloat16_t,
+		 z0 = svinsr_n_bf16 (z0, d4),
+		 z0 = svinsr (z0, d4))
+
+/*
+** insr_h4_bf16_untied:
+**	movprfx	z0, z1
+**	insr	z0\.h, h4
+**	ret
+*/
+TEST_UNIFORM_ZD (insr_h4_bf16_untied, svbfloat16_t, bfloat16_t,
+		 z0 = svinsr_n_bf16 (z1, d4),
+		 z0 = svinsr (z1, d4))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lasta_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lasta_bf16.c
new file mode 100644
index 0000000..da30e05
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lasta_bf16.c
@@ -0,0 +1,21 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** lasta_d0_bf16_tied:
+**	lasta	h0, p0, z0\.h
+**	ret
+*/
+TEST_REDUCTION_D (lasta_d0_bf16_tied, bfloat16_t, svbfloat16_t,
+		  d0 = svlasta_bf16 (p0, z0),
+		  d0 = svlasta (p0, z0))
+
+/*
+** lasta_d0_bf16_untied:
+**	lasta	h0, p0, z1\.h
+**	ret
+*/
+TEST_REDUCTION_D (lasta_d0_bf16_untied, bfloat16_t, svbfloat16_t,
+		  d0 = svlasta_bf16 (p0, z1),
+		  d0 = svlasta (p0, z1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lastb_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lastb_bf16.c
new file mode 100644
index 0000000..01ba39a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lastb_bf16.c
@@ -0,0 +1,21 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** lastb_d0_bf16_tied:
+**	lastb	h0, p0, z0\.h
+**	ret
+*/
+TEST_REDUCTION_D (lastb_d0_bf16_tied, bfloat16_t, svbfloat16_t,
+		  d0 = svlastb_bf16 (p0, z0),
+		  d0 = svlastb (p0, z0))
+
+/*
+** lastb_d0_bf16_untied:
+**	lastb	h0, p0, z1\.h
+**	ret
+*/
+TEST_REDUCTION_D (lastb_d0_bf16_untied, bfloat16_t, svbfloat16_t,
+		  d0 = svlastb_bf16 (p0, z1),
+		  d0 = svlastb (p0, z1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_bf16.c
new file mode 100644
index 0000000..07891de
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_bf16.c
@@ -0,0 +1,158 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
+
+#include "test_sve_acle.h"
+
+/*
+** ld1_bf16_base:
+**	ld1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld1_bf16_base, svbfloat16_t, bfloat16_t,
+	   z0 = svld1_bf16 (p0, x0),
+	   z0 = svld1 (p0, x0))
+
+/*
+** ld1_bf16_index:
+**	ld1h	z0\.h, p0/z, \[x0, x1, lsl 1\]
+**	ret
+*/
+TEST_LOAD (ld1_bf16_index, svbfloat16_t, bfloat16_t,
+	   z0 = svld1_bf16 (p0, x0 + x1),
+	   z0 = svld1 (p0, x0 + x1))
+
+/*
+** ld1_bf16_1:
+**	ld1h	z0\.h, p0/z, \[x0, #1, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld1_bf16_1, svbfloat16_t, bfloat16_t,
+	   z0 = svld1_bf16 (p0, x0 + svcnth ()),
+	   z0 = svld1 (p0, x0 + svcnth ()))
+
+/*
+** ld1_bf16_7:
+**	ld1h	z0\.h, p0/z, \[x0, #7, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld1_bf16_7, svbfloat16_t, bfloat16_t,
+	   z0 = svld1_bf16 (p0, x0 + svcnth () * 7),
+	   z0 = svld1 (p0, x0 + svcnth () * 7))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld1_bf16_8:
+**	incb	x0, all, mul #8
+**	ld1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld1_bf16_8, svbfloat16_t, bfloat16_t,
+	   z0 = svld1_bf16 (p0, x0 + svcnth () * 8),
+	   z0 = svld1 (p0, x0 + svcnth () * 8))
+
+/*
+** ld1_bf16_m1:
+**	ld1h	z0\.h, p0/z, \[x0, #-1, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld1_bf16_m1, svbfloat16_t, bfloat16_t,
+	   z0 = svld1_bf16 (p0, x0 - svcnth ()),
+	   z0 = svld1 (p0, x0 - svcnth ()))
+
+/*
+** ld1_bf16_m8:
+**	ld1h	z0\.h, p0/z, \[x0, #-8, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld1_bf16_m8, svbfloat16_t, bfloat16_t,
+	   z0 = svld1_bf16 (p0, x0 - svcnth () * 8),
+	   z0 = svld1 (p0, x0 - svcnth () * 8))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld1_bf16_m9:
+**	decb	x0, all, mul #9
+**	ld1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld1_bf16_m9, svbfloat16_t, bfloat16_t,
+	   z0 = svld1_bf16 (p0, x0 - svcnth () * 9),
+	   z0 = svld1 (p0, x0 - svcnth () * 9))
+
+/*
+** ld1_vnum_bf16_0:
+**	ld1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld1_vnum_bf16_0, svbfloat16_t, bfloat16_t,
+	   z0 = svld1_vnum_bf16 (p0, x0, 0),
+	   z0 = svld1_vnum (p0, x0, 0))
+
+/*
+** ld1_vnum_bf16_1:
+**	ld1h	z0\.h, p0/z, \[x0, #1, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld1_vnum_bf16_1, svbfloat16_t, bfloat16_t,
+	   z0 = svld1_vnum_bf16 (p0, x0, 1),
+	   z0 = svld1_vnum (p0, x0, 1))
+
+/*
+** ld1_vnum_bf16_7:
+**	ld1h	z0\.h, p0/z, \[x0, #7, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld1_vnum_bf16_7, svbfloat16_t, bfloat16_t,
+	   z0 = svld1_vnum_bf16 (p0, x0, 7),
+	   z0 = svld1_vnum (p0, x0, 7))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld1_vnum_bf16_8:
+**	incb	x0, all, mul #8
+**	ld1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld1_vnum_bf16_8, svbfloat16_t, bfloat16_t,
+	   z0 = svld1_vnum_bf16 (p0, x0, 8),
+	   z0 = svld1_vnum (p0, x0, 8))
+
+/*
+** ld1_vnum_bf16_m1:
+**	ld1h	z0\.h, p0/z, \[x0, #-1, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld1_vnum_bf16_m1, svbfloat16_t, bfloat16_t,
+	   z0 = svld1_vnum_bf16 (p0, x0, -1),
+	   z0 = svld1_vnum (p0, x0, -1))
+
+/*
+** ld1_vnum_bf16_m8:
+**	ld1h	z0\.h, p0/z, \[x0, #-8, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld1_vnum_bf16_m8, svbfloat16_t, bfloat16_t,
+	   z0 = svld1_vnum_bf16 (p0, x0, -8),
+	   z0 = svld1_vnum (p0, x0, -8))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld1_vnum_bf16_m9:
+**	decb	x0, all, mul #9
+**	ld1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld1_vnum_bf16_m9, svbfloat16_t, bfloat16_t,
+	   z0 = svld1_vnum_bf16 (p0, x0, -9),
+	   z0 = svld1_vnum (p0, x0, -9))
+
+/* Using MUL to calculate an index would also be OK.  */
+/*
+** ld1_vnum_bf16_x1:
+**	cntb	(x[0-9]+)
+**	madd	(x[0-9]+), (x1, \1|\1, x1), x0
+**	ld1h	z0\.h, p0/z, \[\2\]
+**	ret
+*/
+TEST_LOAD (ld1_vnum_bf16_x1, svbfloat16_t, bfloat16_t,
+	   z0 = svld1_vnum_bf16 (p0, x0, x1),
+	   z0 = svld1_vnum (p0, x0, x1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c
new file mode 100644
index 0000000..cb18017
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c
@@ -0,0 +1,120 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
+/* { dg-additional-options "-march=armv8.6-a+f64mm" } */
+/* { dg-require-effective-target aarch64_asm_f64mm_ok }  */
+
+#include "test_sve_acle.h"
+
+/*
+** ld1ro_bf16_base:
+**	ld1roh	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld1ro_bf16_base, svbfloat16_t, bfloat16_t,
+	   z0 = svld1ro_bf16 (p0, x0),
+	   z0 = svld1ro (p0, x0))
+
+/*
+** ld1ro_bf16_index:
+**	ld1roh	z0\.h, p0/z, \[x0, x1, lsl 1\]
+**	ret
+*/
+TEST_LOAD (ld1ro_bf16_index, svbfloat16_t, bfloat16_t,
+	   z0 = svld1ro_bf16 (p0, x0 + x1),
+	   z0 = svld1ro (p0, x0 + x1))
+
+/*
+** ld1ro_bf16_1:
+**	add	(x[0-9]+), x0, #?2
+**	ld1roh	z0\.h, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld1ro_bf16_1, svbfloat16_t, bfloat16_t,
+	   z0 = svld1ro_bf16 (p0, x0 + 1),
+	   z0 = svld1ro (p0, x0 + 1))
+
+/*
+** ld1ro_bf16_8:
+**	add	(x[0-9]+), x0, #?16
+**	ld1roh	z0\.h, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld1ro_bf16_8, svbfloat16_t, bfloat16_t,
+	   z0 = svld1ro_bf16 (p0, x0 + 8),
+	   z0 = svld1ro (p0, x0 + 8))
+
+/*
+** ld1ro_bf16_128:
+**	add	(x[0-9]+), x0, #?256
+**	ld1roh	z0\.h, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld1ro_bf16_128, svbfloat16_t, bfloat16_t,
+	   z0 = svld1ro_bf16 (p0, x0 + 128),
+	   z0 = svld1ro (p0, x0 + 128))
+
+/*
+** ld1ro_bf16_m1:
+**	sub	(x[0-9]+), x0, #?2
+**	ld1roh	z0\.h, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld1ro_bf16_m1, svbfloat16_t, bfloat16_t,
+	   z0 = svld1ro_bf16 (p0, x0 - 1),
+	   z0 = svld1ro (p0, x0 - 1))
+
+/*
+** ld1ro_bf16_m8:
+**	sub	(x[0-9]+), x0, #?16
+**	ld1roh	z0\.h, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld1ro_bf16_m8, svbfloat16_t, bfloat16_t,
+	   z0 = svld1ro_bf16 (p0, x0 - 8),
+	   z0 = svld1ro (p0, x0 - 8))
+
+/*
+** ld1ro_bf16_m144:
+**	sub	(x[0-9]+), x0, #?288
+**	ld1roh	z0\.h, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld1ro_bf16_m144, svbfloat16_t, bfloat16_t,
+	   z0 = svld1ro_bf16 (p0, x0 - 144),
+	   z0 = svld1ro (p0, x0 - 144))
+
+/*
+** ld1ro_bf16_16:
+**	ld1roh	z0\.h, p0/z, \[x0, #?32\]
+**	ret
+*/
+TEST_LOAD (ld1ro_bf16_16, svbfloat16_t, bfloat16_t,
+	   z0 = svld1ro_bf16 (p0, x0 + 16),
+	   z0 = svld1ro (p0, x0 + 16))
+
+/*
+** ld1ro_bf16_112:
+**	ld1roh	z0\.h, p0/z, \[x0, #?224\]
+**	ret
+*/
+TEST_LOAD (ld1ro_bf16_112, svbfloat16_t, bfloat16_t,
+	   z0 = svld1ro_bf16 (p0, x0 + 112),
+	   z0 = svld1ro (p0, x0 + 112))
+
+/*
+** ld1ro_bf16_m16:
+**	ld1roh	z0\.h, p0/z, \[x0, #?-32\]
+**	ret
+*/
+TEST_LOAD (ld1ro_bf16_m16, svbfloat16_t, bfloat16_t,
+	   z0 = svld1ro_bf16 (p0, x0 - 16),
+	   z0 = svld1ro (p0, x0 - 16))
+
+/*
+** ld1ro_bf16_m128:
+**	ld1roh	z0\.h, p0/z, \[x0, #?-256\]
+**	ret
+*/
+TEST_LOAD (ld1ro_bf16_m128, svbfloat16_t, bfloat16_t,
+	   z0 = svld1ro_bf16 (p0, x0 - 128),
+	   z0 = svld1ro (p0, x0 - 128))
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1rq_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1rq_bf16.c
new file mode 100644
index 0000000..54c69a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1rq_bf16.c
@@ -0,0 +1,137 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
+
+#include "test_sve_acle.h"
+
+/*
+** ld1rq_bf16_base:
+**	ld1rqh	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld1rq_bf16_base, svbfloat16_t, bfloat16_t,
+	   z0 = svld1rq_bf16 (p0, x0),
+	   z0 = svld1rq (p0, x0))
+
+/*
+** ld1rq_bf16_index:
+**	ld1rqh	z0\.h, p0/z, \[x0, x1, lsl 1\]
+**	ret
+*/
+TEST_LOAD (ld1rq_bf16_index, svbfloat16_t, bfloat16_t,
+	   z0 = svld1rq_bf16 (p0, x0 + x1),
+	   z0 = svld1rq (p0, x0 + x1))
+
+/*
+** ld1rq_bf16_1:
+**	add	(x[0-9]+), x0, #?2
+**	ld1rqh	z0\.h, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld1rq_bf16_1, svbfloat16_t, bfloat16_t,
+	   z0 = svld1rq_bf16 (p0, x0 + 1),
+	   z0 = svld1rq (p0, x0 + 1))
+
+/*
+** ld1rq_bf16_4:
+**	add	(x[0-9]+), x0, #?8
+**	ld1rqh	z0\.h, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld1rq_bf16_4, svbfloat16_t, bfloat16_t,
+	   z0 = svld1rq_bf16 (p0, x0 + 4),
+	   z0 = svld1rq (p0, x0 + 4))
+
+/*
+** ld1rq_bf16_7:
+**	add	(x[0-9]+), x0, #?14
+**	ld1rqh	z0\.h, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld1rq_bf16_7, svbfloat16_t, bfloat16_t,
+	   z0 = svld1rq_bf16 (p0, x0 + 7),
+	   z0 = svld1rq (p0, x0 + 7))
+
+/*
+** ld1rq_bf16_8:
+**	ld1rqh	z0\.h, p0/z, \[x0, #?16\]
+**	ret
+*/
+TEST_LOAD (ld1rq_bf16_8, svbfloat16_t, bfloat16_t,
+	   z0 = svld1rq_bf16 (p0, x0 + 8),
+	   z0 = svld1rq (p0, x0 + 8))
+
+/*
+** ld1rq_bf16_56:
+**	ld1rqh	z0\.h, p0/z, \[x0, #?112\]
+**	ret
+*/
+TEST_LOAD (ld1rq_bf16_56, svbfloat16_t, bfloat16_t,
+	   z0 = svld1rq_bf16 (p0, x0 + 56),
+	   z0 = svld1rq (p0, x0 + 56))
+
+/*
+** ld1rq_bf16_64:
+**	add	(x[0-9]+), x0, #?128
+**	ld1rqh	z0\.h, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld1rq_bf16_64, svbfloat16_t, bfloat16_t,
+	   z0 = svld1rq_bf16 (p0, x0 + 64),
+	   z0 = svld1rq (p0, x0 + 64))
+
+/*
+** ld1rq_bf16_m1:
+**	sub	(x[0-9]+), x0, #?2
+**	ld1rqh	z0\.h, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld1rq_bf16_m1, svbfloat16_t, bfloat16_t,
+	   z0 = svld1rq_bf16 (p0, x0 - 1),
+	   z0 = svld1rq (p0, x0 - 1))
+
+/*
+** ld1rq_bf16_m4:
+**	sub	(x[0-9]+), x0, #?8
+**	ld1rqh	z0\.h, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld1rq_bf16_m4, svbfloat16_t, bfloat16_t,
+	   z0 = svld1rq_bf16 (p0, x0 - 4),
+	   z0 = svld1rq (p0, x0 - 4))
+
+/*
+** ld1rq_bf16_m7:
+**	sub	(x[0-9]+), x0, #?14
+**	ld1rqh	z0\.h, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld1rq_bf16_m7, svbfloat16_t, bfloat16_t,
+	   z0 = svld1rq_bf16 (p0, x0 - 7),
+	   z0 = svld1rq (p0, x0 - 7))
+
+/*
+** ld1rq_bf16_m8:
+**	ld1rqh	z0\.h, p0/z, \[x0, #?-16\]
+**	ret
+*/
+TEST_LOAD (ld1rq_bf16_m8, svbfloat16_t, bfloat16_t,
+	   z0 = svld1rq_bf16 (p0, x0 - 8),
+	   z0 = svld1rq (p0, x0 - 8))
+
+/*
+** ld1rq_bf16_m64:
+**	ld1rqh	z0\.h, p0/z, \[x0, #?-128\]
+**	ret
+*/
+TEST_LOAD (ld1rq_bf16_m64, svbfloat16_t, bfloat16_t,
+	   z0 = svld1rq_bf16 (p0, x0 - 64),
+	   z0 = svld1rq (p0, x0 - 64))
+
+/*
+** ld1rq_bf16_m72:
+**	sub	(x[0-9]+), x0, #?144
+**	ld1rqh	z0\.h, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld1rq_bf16_m72, svbfloat16_t, bfloat16_t,
+	   z0 = svld1rq_bf16 (p0, x0 - 72),
+	   z0 = svld1rq (p0, x0 - 72))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld2_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld2_bf16.c
new file mode 100644
index 0000000..5d08c1e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld2_bf16.c
@@ -0,0 +1,200 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
+
+#include "test_sve_acle.h"
+
+/*
+** ld2_bf16_base:
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld2_bf16_base, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_bf16 (p0, x0),
+	   z0 = svld2 (p0, x0))
+
+/*
+** ld2_bf16_index:
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0, x1, lsl 1\]
+**	ret
+*/
+TEST_LOAD (ld2_bf16_index, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_bf16 (p0, x0 + x1),
+	   z0 = svld2 (p0, x0 + x1))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld2_bf16_1:
+**	incb	x0
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld2_bf16_1, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_bf16 (p0, x0 + svcnth ()),
+	   z0 = svld2 (p0, x0 + svcnth ()))
+
+/*
+** ld2_bf16_2:
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0, #2, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld2_bf16_2, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_bf16 (p0, x0 + svcnth () * 2),
+	   z0 = svld2 (p0, x0 + svcnth () * 2))
+
+/*
+** ld2_bf16_14:
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0, #14, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld2_bf16_14, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_bf16 (p0, x0 + svcnth () * 14),
+	   z0 = svld2 (p0, x0 + svcnth () * 14))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld2_bf16_16:
+**	incb	x0, all, mul #16
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld2_bf16_16, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_bf16 (p0, x0 + svcnth () * 16),
+	   z0 = svld2 (p0, x0 + svcnth () * 16))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld2_bf16_m1:
+**	decb	x0
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld2_bf16_m1, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_bf16 (p0, x0 - svcnth ()),
+	   z0 = svld2 (p0, x0 - svcnth ()))
+
+/*
+** ld2_bf16_m2:
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0, #-2, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld2_bf16_m2, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_bf16 (p0, x0 - svcnth () * 2),
+	   z0 = svld2 (p0, x0 - svcnth () * 2))
+
+/*
+** ld2_bf16_m16:
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0, #-16, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld2_bf16_m16, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_bf16 (p0, x0 - svcnth () * 16),
+	   z0 = svld2 (p0, x0 - svcnth () * 16))
+
+/*
+** ld2_bf16_m18:
+**	addvl	(x[0-9]+), x0, #-18
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld2_bf16_m18, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_bf16 (p0, x0 - svcnth () * 18),
+	   z0 = svld2 (p0, x0 - svcnth () * 18))
+
+/*
+** ld2_vnum_bf16_0:
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld2_vnum_bf16_0, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_vnum_bf16 (p0, x0, 0),
+	   z0 = svld2_vnum (p0, x0, 0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld2_vnum_bf16_1:
+**	incb	x0
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld2_vnum_bf16_1, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_vnum_bf16 (p0, x0, 1),
+	   z0 = svld2_vnum (p0, x0, 1))
+
+/*
+** ld2_vnum_bf16_2:
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0, #2, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld2_vnum_bf16_2, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_vnum_bf16 (p0, x0, 2),
+	   z0 = svld2_vnum (p0, x0, 2))
+
+/*
+** ld2_vnum_bf16_14:
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0, #14, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld2_vnum_bf16_14, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_vnum_bf16 (p0, x0, 14),
+	   z0 = svld2_vnum (p0, x0, 14))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld2_vnum_bf16_16:
+**	incb	x0, all, mul #16
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld2_vnum_bf16_16, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_vnum_bf16 (p0, x0, 16),
+	   z0 = svld2_vnum (p0, x0, 16))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld2_vnum_bf16_m1:
+**	decb	x0
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld2_vnum_bf16_m1, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_vnum_bf16 (p0, x0, -1),
+	   z0 = svld2_vnum (p0, x0, -1))
+
+/*
+** ld2_vnum_bf16_m2:
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0, #-2, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld2_vnum_bf16_m2, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_vnum_bf16 (p0, x0, -2),
+	   z0 = svld2_vnum (p0, x0, -2))
+
+/*
+** ld2_vnum_bf16_m16:
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[x0, #-16, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld2_vnum_bf16_m16, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_vnum_bf16 (p0, x0, -16),
+	   z0 = svld2_vnum (p0, x0, -16))
+
+/*
+** ld2_vnum_bf16_m18:
+**	addvl	(x[0-9]+), x0, #-18
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld2_vnum_bf16_m18, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_vnum_bf16 (p0, x0, -18),
+	   z0 = svld2_vnum (p0, x0, -18))
+
+/* Using MUL to calculate an index would also be OK.  */
+/*
+** ld2_vnum_bf16_x1:
+**	cntb	(x[0-9]+)
+**	madd	(x[0-9]+), (x1, \1|\1, x1), x0
+**	ld2h	{z0\.h(?: - |, )z1\.h}, p0/z, \[\2\]
+**	ret
+*/
+TEST_LOAD (ld2_vnum_bf16_x1, svbfloat16x2_t, bfloat16_t,
+	   z0 = svld2_vnum_bf16 (p0, x0, x1),
+	   z0 = svld2_vnum (p0, x0, x1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld3_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld3_bf16.c
new file mode 100644
index 0000000..e0b4fb1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld3_bf16.c
@@ -0,0 +1,242 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
+
+#include "test_sve_acle.h"
+
+/*
+** ld3_bf16_base:
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld3_bf16_base, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_bf16 (p0, x0),
+	   z0 = svld3 (p0, x0))
+
+/*
+** ld3_bf16_index:
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0, x1, lsl 1\]
+**	ret
+*/
+TEST_LOAD (ld3_bf16_index, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_bf16 (p0, x0 + x1),
+	   z0 = svld3 (p0, x0 + x1))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld3_bf16_1:
+**	incb	x0
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld3_bf16_1, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_bf16 (p0, x0 + svcnth ()),
+	   z0 = svld3 (p0, x0 + svcnth ()))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld3_bf16_2:
+**	incb	x0, all, mul #2
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld3_bf16_2, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_bf16 (p0, x0 + svcnth () * 2),
+	   z0 = svld3 (p0, x0 + svcnth () * 2))
+
+/*
+** ld3_bf16_3:
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0, #3, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld3_bf16_3, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_bf16 (p0, x0 + svcnth () * 3),
+	   z0 = svld3 (p0, x0 + svcnth () * 3))
+
+/*
+** ld3_bf16_21:
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0, #21, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld3_bf16_21, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_bf16 (p0, x0 + svcnth () * 21),
+	   z0 = svld3 (p0, x0 + svcnth () * 21))
+
+/*
+** ld3_bf16_24:
+**	addvl	(x[0-9]+), x0, #24
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld3_bf16_24, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_bf16 (p0, x0 + svcnth () * 24),
+	   z0 = svld3 (p0, x0 + svcnth () * 24))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld3_bf16_m1:
+**	decb	x0
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld3_bf16_m1, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_bf16 (p0, x0 - svcnth ()),
+	   z0 = svld3 (p0, x0 - svcnth ()))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld3_bf16_m2:
+**	decb	x0, all, mul #2
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld3_bf16_m2, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_bf16 (p0, x0 - svcnth () * 2),
+	   z0 = svld3 (p0, x0 - svcnth () * 2))
+
+/*
+** ld3_bf16_m3:
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0, #-3, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld3_bf16_m3, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_bf16 (p0, x0 - svcnth () * 3),
+	   z0 = svld3 (p0, x0 - svcnth () * 3))
+
+/*
+** ld3_bf16_m24:
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0, #-24, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld3_bf16_m24, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_bf16 (p0, x0 - svcnth () * 24),
+	   z0 = svld3 (p0, x0 - svcnth () * 24))
+
+/*
+** ld3_bf16_m27:
+**	addvl	(x[0-9]+), x0, #-27
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld3_bf16_m27, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_bf16 (p0, x0 - svcnth () * 27),
+	   z0 = svld3 (p0, x0 - svcnth () * 27))
+
+/*
+** ld3_vnum_bf16_0:
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld3_vnum_bf16_0, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_vnum_bf16 (p0, x0, 0),
+	   z0 = svld3_vnum (p0, x0, 0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld3_vnum_bf16_1:
+**	incb	x0
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld3_vnum_bf16_1, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_vnum_bf16 (p0, x0, 1),
+	   z0 = svld3_vnum (p0, x0, 1))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld3_vnum_bf16_2:
+**	incb	x0, all, mul #2
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld3_vnum_bf16_2, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_vnum_bf16 (p0, x0, 2),
+	   z0 = svld3_vnum (p0, x0, 2))
+
+/*
+** ld3_vnum_bf16_3:
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0, #3, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld3_vnum_bf16_3, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_vnum_bf16 (p0, x0, 3),
+	   z0 = svld3_vnum (p0, x0, 3))
+
+/*
+** ld3_vnum_bf16_21:
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0, #21, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld3_vnum_bf16_21, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_vnum_bf16 (p0, x0, 21),
+	   z0 = svld3_vnum (p0, x0, 21))
+
+/*
+** ld3_vnum_bf16_24:
+**	addvl	(x[0-9]+), x0, #24
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld3_vnum_bf16_24, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_vnum_bf16 (p0, x0, 24),
+	   z0 = svld3_vnum (p0, x0, 24))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld3_vnum_bf16_m1:
+**	decb	x0
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld3_vnum_bf16_m1, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_vnum_bf16 (p0, x0, -1),
+	   z0 = svld3_vnum (p0, x0, -1))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld3_vnum_bf16_m2:
+**	decb	x0, all, mul #2
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld3_vnum_bf16_m2, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_vnum_bf16 (p0, x0, -2),
+	   z0 = svld3_vnum (p0, x0, -2))
+
+/*
+** ld3_vnum_bf16_m3:
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0, #-3, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld3_vnum_bf16_m3, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_vnum_bf16 (p0, x0, -3),
+	   z0 = svld3_vnum (p0, x0, -3))
+
+/*
+** ld3_vnum_bf16_m24:
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[x0, #-24, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld3_vnum_bf16_m24, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_vnum_bf16 (p0, x0, -24),
+	   z0 = svld3_vnum (p0, x0, -24))
+
+/*
+** ld3_vnum_bf16_m27:
+**	addvl	(x[0-9]+), x0, #-27
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ld3_vnum_bf16_m27, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_vnum_bf16 (p0, x0, -27),
+	   z0 = svld3_vnum (p0, x0, -27))
+
+/* Using MUL to calculate an index would also be OK.  */
+/*
+** ld3_vnum_bf16_x1:
+**	cntb	(x[0-9]+)
+**	madd	(x[0-9]+), (x1, \1|\1, x1), x0
+**	ld3h	{z0\.h - z2\.h}, p0/z, \[\2\]
+**	ret
+*/
+TEST_LOAD (ld3_vnum_bf16_x1, svbfloat16x3_t, bfloat16_t,
+	   z0 = svld3_vnum_bf16 (p0, x0, x1),
+	   z0 = svld3_vnum (p0, x0, x1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld4_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld4_bf16.c
new file mode 100644
index 0000000..123ff63
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld4_bf16.c
@@ -0,0 +1,286 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
+
+#include "test_sve_acle.h"
+
+/*
+** ld4_bf16_base:
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld4_bf16_base, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_bf16 (p0, x0),
+	   z0 = svld4 (p0, x0))
+
+/*
+** ld4_bf16_index:
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0, x1, lsl 1\]
+**	ret
+*/
+TEST_LOAD (ld4_bf16_index, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_bf16 (p0, x0 + x1),
+	   z0 = svld4 (p0, x0 + x1))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld4_bf16_1:
+**	incb	x0
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld4_bf16_1, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_bf16 (p0, x0 + svcnth ()),
+	   z0 = svld4 (p0, x0 + svcnth ()))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld4_bf16_2:
+**	incb	x0, all, mul #2
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld4_bf16_2, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_bf16 (p0, x0 + svcnth () * 2),
+	   z0 = svld4 (p0, x0 + svcnth () * 2))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld4_bf16_3:
+**	incb	x0, all, mul #3
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld4_bf16_3, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_bf16 (p0, x0 + svcnth () * 3),
+	   z0 = svld4 (p0, x0 + svcnth () * 3))
+
+/*
+** ld4_bf16_4:
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0, #4, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld4_bf16_4, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_bf16 (p0, x0 + svcnth () * 4),
+	   z0 = svld4 (p0, x0 + svcnth () * 4))
+
+/*
+** ld4_bf16_28:
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0, #28, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld4_bf16_28, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_bf16 (p0, x0 + svcnth () * 28),
+	   z0 = svld4 (p0, x0 + svcnth () * 28))
+
+/*
+** ld4_bf16_32:
+**	[^{]*
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x[0-9]+\]
+**	ret
+*/
+TEST_LOAD (ld4_bf16_32, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_bf16 (p0, x0 + svcnth () * 32),
+	   z0 = svld4 (p0, x0 + svcnth () * 32))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld4_bf16_m1:
+**	decb	x0
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld4_bf16_m1, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_bf16 (p0, x0 - svcnth ()),
+	   z0 = svld4 (p0, x0 - svcnth ()))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld4_bf16_m2:
+**	decb	x0, all, mul #2
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld4_bf16_m2, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_bf16 (p0, x0 - svcnth () * 2),
+	   z0 = svld4 (p0, x0 - svcnth () * 2))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld4_bf16_m3:
+**	decb	x0, all, mul #3
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld4_bf16_m3, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_bf16 (p0, x0 - svcnth () * 3),
+	   z0 = svld4 (p0, x0 - svcnth () * 3))
+
+/*
+** ld4_bf16_m4:
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0, #-4, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld4_bf16_m4, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_bf16 (p0, x0 - svcnth () * 4),
+	   z0 = svld4 (p0, x0 - svcnth () * 4))
+
+/*
+** ld4_bf16_m32:
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0, #-32, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld4_bf16_m32, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_bf16 (p0, x0 - svcnth () * 32),
+	   z0 = svld4 (p0, x0 - svcnth () * 32))
+
+/*
+** ld4_bf16_m36:
+**	[^{]*
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x[0-9]+\]
+**	ret
+*/
+TEST_LOAD (ld4_bf16_m36, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_bf16 (p0, x0 - svcnth () * 36),
+	   z0 = svld4 (p0, x0 - svcnth () * 36))
+
+/*
+** ld4_vnum_bf16_0:
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld4_vnum_bf16_0, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_vnum_bf16 (p0, x0, 0),
+	   z0 = svld4_vnum (p0, x0, 0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld4_vnum_bf16_1:
+**	incb	x0
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld4_vnum_bf16_1, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_vnum_bf16 (p0, x0, 1),
+	   z0 = svld4_vnum (p0, x0, 1))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld4_vnum_bf16_2:
+**	incb	x0, all, mul #2
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld4_vnum_bf16_2, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_vnum_bf16 (p0, x0, 2),
+	   z0 = svld4_vnum (p0, x0, 2))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld4_vnum_bf16_3:
+**	incb	x0, all, mul #3
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld4_vnum_bf16_3, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_vnum_bf16 (p0, x0, 3),
+	   z0 = svld4_vnum (p0, x0, 3))
+
+/*
+** ld4_vnum_bf16_4:
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0, #4, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld4_vnum_bf16_4, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_vnum_bf16 (p0, x0, 4),
+	   z0 = svld4_vnum (p0, x0, 4))
+
+/*
+** ld4_vnum_bf16_28:
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0, #28, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld4_vnum_bf16_28, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_vnum_bf16 (p0, x0, 28),
+	   z0 = svld4_vnum (p0, x0, 28))
+
+/*
+** ld4_vnum_bf16_32:
+**	[^{]*
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x[0-9]+\]
+**	ret
+*/
+TEST_LOAD (ld4_vnum_bf16_32, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_vnum_bf16 (p0, x0, 32),
+	   z0 = svld4_vnum (p0, x0, 32))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld4_vnum_bf16_m1:
+**	decb	x0
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld4_vnum_bf16_m1, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_vnum_bf16 (p0, x0, -1),
+	   z0 = svld4_vnum (p0, x0, -1))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld4_vnum_bf16_m2:
+**	decb	x0, all, mul #2
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld4_vnum_bf16_m2, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_vnum_bf16 (p0, x0, -2),
+	   z0 = svld4_vnum (p0, x0, -2))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ld4_vnum_bf16_m3:
+**	decb	x0, all, mul #3
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ld4_vnum_bf16_m3, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_vnum_bf16 (p0, x0, -3),
+	   z0 = svld4_vnum (p0, x0, -3))
+
+/*
+** ld4_vnum_bf16_m4:
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0, #-4, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld4_vnum_bf16_m4, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_vnum_bf16 (p0, x0, -4),
+	   z0 = svld4_vnum (p0, x0, -4))
+
+/*
+** ld4_vnum_bf16_m32:
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x0, #-32, mul vl\]
+**	ret
+*/
+TEST_LOAD (ld4_vnum_bf16_m32, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_vnum_bf16 (p0, x0, -32),
+	   z0 = svld4_vnum (p0, x0, -32))
+
+/*
+** ld4_vnum_bf16_m36:
+**	[^{]*
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[x[0-9]+\]
+**	ret
+*/
+TEST_LOAD (ld4_vnum_bf16_m36, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_vnum_bf16 (p0, x0, -36),
+	   z0 = svld4_vnum (p0, x0, -36))
+
+/* Using MUL to calculate an index would also be OK.  */
+/*
+** ld4_vnum_bf16_x1:
+**	cntb	(x[0-9]+)
+**	madd	(x[0-9]+), (x1, \1|\1, x1), x0
+**	ld4h	{z0\.h - z3\.h}, p0/z, \[\2\]
+**	ret
+*/
+TEST_LOAD (ld4_vnum_bf16_x1, svbfloat16x4_t, bfloat16_t,
+	   z0 = svld4_vnum_bf16 (p0, x0, x1),
+	   z0 = svld4_vnum (p0, x0, x1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c
new file mode 100644
index 0000000..80f6468
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c
@@ -0,0 +1,86 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
+
+#include "test_sve_acle.h"
+
+/*
+** ldff1_bf16_base:
+**	ldff1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldff1_bf16_base, svbfloat16_t, bfloat16_t,
+	   z0 = svldff1_bf16 (p0, x0),
+	   z0 = svldff1 (p0, x0))
+
+/*
+** ldff1_bf16_index:
+**	ldff1h	z0\.h, p0/z, \[x0, x1, lsl 1\]
+**	ret
+*/
+TEST_LOAD (ldff1_bf16_index, svbfloat16_t, bfloat16_t,
+	   z0 = svldff1_bf16 (p0, x0 + x1),
+	   z0 = svldff1 (p0, x0 + x1))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ldff1_bf16_1:
+**	incb	x0
+**	ldff1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldff1_bf16_1, svbfloat16_t, bfloat16_t,
+	   z0 = svldff1_bf16 (p0, x0 + svcnth ()),
+	   z0 = svldff1 (p0, x0 + svcnth ()))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ldff1_bf16_m1:
+**	decb	x0
+**	ldff1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldff1_bf16_m1, svbfloat16_t, bfloat16_t,
+	   z0 = svldff1_bf16 (p0, x0 - svcnth ()),
+	   z0 = svldff1 (p0, x0 - svcnth ()))
+
+/*
+** ldff1_vnum_bf16_0:
+**	ldff1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldff1_vnum_bf16_0, svbfloat16_t, bfloat16_t,
+	   z0 = svldff1_vnum_bf16 (p0, x0, 0),
+	   z0 = svldff1_vnum (p0, x0, 0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ldff1_vnum_bf16_1:
+**	incb	x0
+**	ldff1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldff1_vnum_bf16_1, svbfloat16_t, bfloat16_t,
+	   z0 = svldff1_vnum_bf16 (p0, x0, 1),
+	   z0 = svldff1_vnum (p0, x0, 1))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ldff1_vnum_bf16_m1:
+**	decb	x0
+**	ldff1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldff1_vnum_bf16_m1, svbfloat16_t, bfloat16_t,
+	   z0 = svldff1_vnum_bf16 (p0, x0, -1),
+	   z0 = svldff1_vnum (p0, x0, -1))
+
+/* Using MUL to calculate an index would also be OK.  */
+/*
+** ldff1_vnum_bf16_x1:
+**	cntb	(x[0-9]+)
+**	madd	(x[0-9]+), (x1, \1|\1, x1), x0
+**	ldff1h	z0\.h, p0/z, \[\2\]
+**	ret
+*/
+TEST_LOAD (ldff1_vnum_bf16_x1, svbfloat16_t, bfloat16_t,
+	   z0 = svldff1_vnum_bf16 (p0, x0, x1),
+	   z0 = svldff1_vnum (p0, x0, x1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c
new file mode 100644
index 0000000..947a896
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c
@@ -0,0 +1,154 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
+
+#include "test_sve_acle.h"
+
+/*
+** ldnf1_bf16_base:
+**	ldnf1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldnf1_bf16_base, svbfloat16_t, bfloat16_t,
+	   z0 = svldnf1_bf16 (p0, x0),
+	   z0 = svldnf1 (p0, x0))
+
+/*
+** ldnf1_bf16_index:
+**	add	(x[0-9]+), x0, x1, lsl 1
+**	ldnf1h	z0\.h, p0/z, \[\1\]
+**	ret
+*/
+TEST_LOAD (ldnf1_bf16_index, svbfloat16_t, bfloat16_t,
+	   z0 = svldnf1_bf16 (p0, x0 + x1),
+	   z0 = svldnf1 (p0, x0 + x1))
+
+/*
+** ldnf1_bf16_1:
+**	ldnf1h	z0\.h, p0/z, \[x0, #1, mul vl\]
+**	ret
+*/
+TEST_LOAD (ldnf1_bf16_1, svbfloat16_t, bfloat16_t,
+	   z0 = svldnf1_bf16 (p0, x0 + svcnth ()),
+	   z0 = svldnf1 (p0, x0 + svcnth ()))
+
+/*
+** ldnf1_bf16_7:
+**	ldnf1h	z0\.h, p0/z, \[x0, #7, mul vl\]
+**	ret
+*/
+TEST_LOAD (ldnf1_bf16_7, svbfloat16_t, bfloat16_t,
+	   z0 = svldnf1_bf16 (p0, x0 + svcnth () * 7),
+	   z0 = svldnf1 (p0, x0 + svcnth () * 7))
+
+/*
+** ldnf1_bf16_8:
+**	incb	x0, all, mul #8
+**	ldnf1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldnf1_bf16_8, svbfloat16_t, bfloat16_t,
+	   z0 = svldnf1_bf16 (p0, x0 + svcnth () * 8),
+	   z0 = svldnf1 (p0, x0 + svcnth () * 8))
+
+/*
+** ldnf1_bf16_m1:
+**	ldnf1h	z0\.h, p0/z, \[x0, #-1, mul vl\]
+**	ret
+*/
+TEST_LOAD (ldnf1_bf16_m1, svbfloat16_t, bfloat16_t,
+	   z0 = svldnf1_bf16 (p0, x0 - svcnth ()),
+	   z0 = svldnf1 (p0, x0 - svcnth ()))
+
+/*
+** ldnf1_bf16_m8:
+**	ldnf1h	z0\.h, p0/z, \[x0, #-8, mul vl\]
+**	ret
+*/
+TEST_LOAD (ldnf1_bf16_m8, svbfloat16_t, bfloat16_t,
+	   z0 = svldnf1_bf16 (p0, x0 - svcnth () * 8),
+	   z0 = svldnf1 (p0, x0 - svcnth () * 8))
+
+/*
+** ldnf1_bf16_m9:
+**	decb	x0, all, mul #9
+**	ldnf1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldnf1_bf16_m9, svbfloat16_t, bfloat16_t,
+	   z0 = svldnf1_bf16 (p0, x0 - svcnth () * 9),
+	   z0 = svldnf1 (p0, x0 - svcnth () * 9))
+
+/*
+** ldnf1_vnum_bf16_0:
+**	ldnf1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldnf1_vnum_bf16_0, svbfloat16_t, bfloat16_t,
+	   z0 = svldnf1_vnum_bf16 (p0, x0, 0),
+	   z0 = svldnf1_vnum (p0, x0, 0))
+
+/*
+** ldnf1_vnum_bf16_1:
+**	ldnf1h	z0\.h, p0/z, \[x0, #1, mul vl\]
+**	ret
+*/
+TEST_LOAD (ldnf1_vnum_bf16_1, svbfloat16_t, bfloat16_t,
+	   z0 = svldnf1_vnum_bf16 (p0, x0, 1),
+	   z0 = svldnf1_vnum (p0, x0, 1))
+
+/*
+** ldnf1_vnum_bf16_7:
+**	ldnf1h	z0\.h, p0/z, \[x0, #7, mul vl\]
+**	ret
+*/
+TEST_LOAD (ldnf1_vnum_bf16_7, svbfloat16_t, bfloat16_t,
+	   z0 = svldnf1_vnum_bf16 (p0, x0, 7),
+	   z0 = svldnf1_vnum (p0, x0, 7))
+
+/*
+** ldnf1_vnum_bf16_8:
+**	incb	x0, all, mul #8
+**	ldnf1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldnf1_vnum_bf16_8, svbfloat16_t, bfloat16_t,
+	   z0 = svldnf1_vnum_bf16 (p0, x0, 8),
+	   z0 = svldnf1_vnum (p0, x0, 8))
+
+/*
+** ldnf1_vnum_bf16_m1:
+**	ldnf1h	z0\.h, p0/z, \[x0, #-1, mul vl\]
+**	ret
+*/
+TEST_LOAD (ldnf1_vnum_bf16_m1, svbfloat16_t, bfloat16_t,
+	   z0 = svldnf1_vnum_bf16 (p0, x0, -1),
+	   z0 = svldnf1_vnum (p0, x0, -1))
+
+/*
+** ldnf1_vnum_bf16_m8:
+**	ldnf1h	z0\.h, p0/z, \[x0, #-8, mul vl\]
+**	ret
+*/
+TEST_LOAD (ldnf1_vnum_bf16_m8, svbfloat16_t, bfloat16_t,
+	   z0 = svldnf1_vnum_bf16 (p0, x0, -8),
+	   z0 = svldnf1_vnum (p0, x0, -8))
+
+/*
+** ldnf1_vnum_bf16_m9:
+**	decb	x0, all, mul #9
+**	ldnf1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldnf1_vnum_bf16_m9, svbfloat16_t, bfloat16_t,
+	   z0 = svldnf1_vnum_bf16 (p0, x0, -9),
+	   z0 = svldnf1_vnum (p0, x0, -9))
+
+/*
+** ldnf1_vnum_bf16_x1:
+**	cntb	(x[0-9]+)
+**	madd	(x[0-9]+), (?:x1, \1|\1, x1), x0
+**	ldnf1h	z0\.h, p0/z, \[\2\]
+**	ret
+*/
+TEST_LOAD (ldnf1_vnum_bf16_x1, svbfloat16_t, bfloat16_t,
+	   z0 = svldnf1_vnum_bf16 (p0, x0, x1),
+	   z0 = svldnf1_vnum (p0, x0, x1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnt1_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnt1_bf16.c
new file mode 100644
index 0000000..b083901
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnt1_bf16.c
@@ -0,0 +1,158 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
+
+#include "test_sve_acle.h"
+
+/*
+** ldnt1_bf16_base:
+**	ldnt1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldnt1_bf16_base, svbfloat16_t, bfloat16_t,
+	   z0 = svldnt1_bf16 (p0, x0),
+	   z0 = svldnt1 (p0, x0))
+
+/*
+** ldnt1_bf16_index:
+**	ldnt1h	z0\.h, p0/z, \[x0, x1, lsl 1\]
+**	ret
+*/
+TEST_LOAD (ldnt1_bf16_index, svbfloat16_t, bfloat16_t,
+	   z0 = svldnt1_bf16 (p0, x0 + x1),
+	   z0 = svldnt1 (p0, x0 + x1))
+
+/*
+** ldnt1_bf16_1:
+**	ldnt1h	z0\.h, p0/z, \[x0, #1, mul vl\]
+**	ret
+*/
+TEST_LOAD (ldnt1_bf16_1, svbfloat16_t, bfloat16_t,
+	   z0 = svldnt1_bf16 (p0, x0 + svcnth ()),
+	   z0 = svldnt1 (p0, x0 + svcnth ()))
+
+/*
+** ldnt1_bf16_7:
+**	ldnt1h	z0\.h, p0/z, \[x0, #7, mul vl\]
+**	ret
+*/
+TEST_LOAD (ldnt1_bf16_7, svbfloat16_t, bfloat16_t,
+	   z0 = svldnt1_bf16 (p0, x0 + svcnth () * 7),
+	   z0 = svldnt1 (p0, x0 + svcnth () * 7))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ldnt1_bf16_8:
+**	incb	x0, all, mul #8
+**	ldnt1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldnt1_bf16_8, svbfloat16_t, bfloat16_t,
+	   z0 = svldnt1_bf16 (p0, x0 + svcnth () * 8),
+	   z0 = svldnt1 (p0, x0 + svcnth () * 8))
+
+/*
+** ldnt1_bf16_m1:
+**	ldnt1h	z0\.h, p0/z, \[x0, #-1, mul vl\]
+**	ret
+*/
+TEST_LOAD (ldnt1_bf16_m1, svbfloat16_t, bfloat16_t,
+	   z0 = svldnt1_bf16 (p0, x0 - svcnth ()),
+	   z0 = svldnt1 (p0, x0 - svcnth ()))
+
+/*
+** ldnt1_bf16_m8:
+**	ldnt1h	z0\.h, p0/z, \[x0, #-8, mul vl\]
+**	ret
+*/
+TEST_LOAD (ldnt1_bf16_m8, svbfloat16_t, bfloat16_t,
+	   z0 = svldnt1_bf16 (p0, x0 - svcnth () * 8),
+	   z0 = svldnt1 (p0, x0 - svcnth () * 8))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ldnt1_bf16_m9:
+**	decb	x0, all, mul #9
+**	ldnt1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldnt1_bf16_m9, svbfloat16_t, bfloat16_t,
+	   z0 = svldnt1_bf16 (p0, x0 - svcnth () * 9),
+	   z0 = svldnt1 (p0, x0 - svcnth () * 9))
+
+/*
+** ldnt1_vnum_bf16_0:
+**	ldnt1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldnt1_vnum_bf16_0, svbfloat16_t, bfloat16_t,
+	   z0 = svldnt1_vnum_bf16 (p0, x0, 0),
+	   z0 = svldnt1_vnum (p0, x0, 0))
+
+/*
+** ldnt1_vnum_bf16_1:
+**	ldnt1h	z0\.h, p0/z, \[x0, #1, mul vl\]
+**	ret
+*/
+TEST_LOAD (ldnt1_vnum_bf16_1, svbfloat16_t, bfloat16_t,
+	   z0 = svldnt1_vnum_bf16 (p0, x0, 1),
+	   z0 = svldnt1_vnum (p0, x0, 1))
+
+/*
+** ldnt1_vnum_bf16_7:
+**	ldnt1h	z0\.h, p0/z, \[x0, #7, mul vl\]
+**	ret
+*/
+TEST_LOAD (ldnt1_vnum_bf16_7, svbfloat16_t, bfloat16_t,
+	   z0 = svldnt1_vnum_bf16 (p0, x0, 7),
+	   z0 = svldnt1_vnum (p0, x0, 7))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ldnt1_vnum_bf16_8:
+**	incb	x0, all, mul #8
+**	ldnt1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldnt1_vnum_bf16_8, svbfloat16_t, bfloat16_t,
+	   z0 = svldnt1_vnum_bf16 (p0, x0, 8),
+	   z0 = svldnt1_vnum (p0, x0, 8))
+
+/*
+** ldnt1_vnum_bf16_m1:
+**	ldnt1h	z0\.h, p0/z, \[x0, #-1, mul vl\]
+**	ret
+*/
+TEST_LOAD (ldnt1_vnum_bf16_m1, svbfloat16_t, bfloat16_t,
+	   z0 = svldnt1_vnum_bf16 (p0, x0, -1),
+	   z0 = svldnt1_vnum (p0, x0, -1))
+
+/*
+** ldnt1_vnum_bf16_m8:
+**	ldnt1h	z0\.h, p0/z, \[x0, #-8, mul vl\]
+**	ret
+*/
+TEST_LOAD (ldnt1_vnum_bf16_m8, svbfloat16_t, bfloat16_t,
+	   z0 = svldnt1_vnum_bf16 (p0, x0, -8),
+	   z0 = svldnt1_vnum (p0, x0, -8))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** ldnt1_vnum_bf16_m9:
+**	decb	x0, all, mul #9
+**	ldnt1h	z0\.h, p0/z, \[x0\]
+**	ret
+*/
+TEST_LOAD (ldnt1_vnum_bf16_m9, svbfloat16_t, bfloat16_t,
+	   z0 = svldnt1_vnum_bf16 (p0, x0, -9),
+	   z0 = svldnt1_vnum (p0, x0, -9))
+
+/* Using MUL to calculate an index would also be OK.  */
+/*
+** ldnt1_vnum_bf16_x1:
+**	cntb	(x[0-9]+)
+**	madd	(x[0-9]+), (x1, \1|\1, x1), x0
+**	ldnt1h	z0\.h, p0/z, \[\2\]
+**	ret
+*/
+TEST_LOAD (ldnt1_vnum_bf16_x1, svbfloat16_t, bfloat16_t,
+	   z0 = svldnt1_vnum_bf16 (p0, x0, x1),
+	   z0 = svldnt1_vnum (p0, x0, x1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/len_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/len_bf16.c
new file mode 100644
index 0000000..cd91ff4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/len_bf16.c
@@ -0,0 +1,12 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** len_x0_bf16:
+**	cnth	x0
+**	ret
+*/
+TEST_REDUCTION_X (len_x0_bf16, uint64_t, svbfloat16_t,
+		  x0 = svlen_bf16 (z0),
+		  x0 = svlen (z0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c
new file mode 100644
index 0000000..2d2c2a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c
@@ -0,0 +1,207 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** reinterpret_bf16_bf16_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_bf16_bf16_tied1, svbfloat16_t, svbfloat16_t,
+		 z0_res = svreinterpret_bf16_bf16 (z0),
+		 z0_res = svreinterpret_bf16 (z0))
+
+/*
+** reinterpret_bf16_bf16_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_bf16_bf16_untied, svbfloat16_t, svbfloat16_t,
+	     z0 = svreinterpret_bf16_bf16 (z4),
+	     z0 = svreinterpret_bf16 (z4))
+
+/*
+** reinterpret_bf16_f16_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_bf16_f16_tied1, svbfloat16_t, svfloat16_t,
+		 z0_res = svreinterpret_bf16_f16 (z0),
+		 z0_res = svreinterpret_bf16 (z0))
+
+/*
+** reinterpret_bf16_f16_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_bf16_f16_untied, svbfloat16_t, svfloat16_t,
+	     z0 = svreinterpret_bf16_f16 (z4),
+	     z0 = svreinterpret_bf16 (z4))
+
+/*
+** reinterpret_bf16_f32_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_bf16_f32_tied1, svbfloat16_t, svfloat32_t,
+		 z0_res = svreinterpret_bf16_f32 (z0),
+		 z0_res = svreinterpret_bf16 (z0))
+
+/*
+** reinterpret_bf16_f32_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_bf16_f32_untied, svbfloat16_t, svfloat32_t,
+	     z0 = svreinterpret_bf16_f32 (z4),
+	     z0 = svreinterpret_bf16 (z4))
+
+/*
+** reinterpret_bf16_f64_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_bf16_f64_tied1, svbfloat16_t, svfloat64_t,
+		 z0_res = svreinterpret_bf16_f64 (z0),
+		 z0_res = svreinterpret_bf16 (z0))
+
+/*
+** reinterpret_bf16_f64_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_bf16_f64_untied, svbfloat16_t, svfloat64_t,
+	     z0 = svreinterpret_bf16_f64 (z4),
+	     z0 = svreinterpret_bf16 (z4))
+
+/*
+** reinterpret_bf16_s8_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_bf16_s8_tied1, svbfloat16_t, svint8_t,
+		 z0_res = svreinterpret_bf16_s8 (z0),
+		 z0_res = svreinterpret_bf16 (z0))
+
+/*
+** reinterpret_bf16_s8_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_bf16_s8_untied, svbfloat16_t, svint8_t,
+	     z0 = svreinterpret_bf16_s8 (z4),
+	     z0 = svreinterpret_bf16 (z4))
+
+/*
+** reinterpret_bf16_s16_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_bf16_s16_tied1, svbfloat16_t, svint16_t,
+		 z0_res = svreinterpret_bf16_s16 (z0),
+		 z0_res = svreinterpret_bf16 (z0))
+
+/*
+** reinterpret_bf16_s16_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_bf16_s16_untied, svbfloat16_t, svint16_t,
+	     z0 = svreinterpret_bf16_s16 (z4),
+	     z0 = svreinterpret_bf16 (z4))
+
+/*
+** reinterpret_bf16_s32_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_bf16_s32_tied1, svbfloat16_t, svint32_t,
+		 z0_res = svreinterpret_bf16_s32 (z0),
+		 z0_res = svreinterpret_bf16 (z0))
+
+/*
+** reinterpret_bf16_s32_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_bf16_s32_untied, svbfloat16_t, svint32_t,
+	     z0 = svreinterpret_bf16_s32 (z4),
+	     z0 = svreinterpret_bf16 (z4))
+
+/*
+** reinterpret_bf16_s64_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_bf16_s64_tied1, svbfloat16_t, svint64_t,
+		 z0_res = svreinterpret_bf16_s64 (z0),
+		 z0_res = svreinterpret_bf16 (z0))
+
+/*
+** reinterpret_bf16_s64_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_bf16_s64_untied, svbfloat16_t, svint64_t,
+	     z0 = svreinterpret_bf16_s64 (z4),
+	     z0 = svreinterpret_bf16 (z4))
+
+/*
+** reinterpret_bf16_u8_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_bf16_u8_tied1, svbfloat16_t, svuint8_t,
+		 z0_res = svreinterpret_bf16_u8 (z0),
+		 z0_res = svreinterpret_bf16 (z0))
+
+/*
+** reinterpret_bf16_u8_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_bf16_u8_untied, svbfloat16_t, svuint8_t,
+	     z0 = svreinterpret_bf16_u8 (z4),
+	     z0 = svreinterpret_bf16 (z4))
+
+/*
+** reinterpret_bf16_u16_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_bf16_u16_tied1, svbfloat16_t, svuint16_t,
+		 z0_res = svreinterpret_bf16_u16 (z0),
+		 z0_res = svreinterpret_bf16 (z0))
+
+/*
+** reinterpret_bf16_u16_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_bf16_u16_untied, svbfloat16_t, svuint16_t,
+	     z0 = svreinterpret_bf16_u16 (z4),
+	     z0 = svreinterpret_bf16 (z4))
+
+/*
+** reinterpret_bf16_u32_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_bf16_u32_tied1, svbfloat16_t, svuint32_t,
+		 z0_res = svreinterpret_bf16_u32 (z0),
+		 z0_res = svreinterpret_bf16 (z0))
+
+/*
+** reinterpret_bf16_u32_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_bf16_u32_untied, svbfloat16_t, svuint32_t,
+	     z0 = svreinterpret_bf16_u32 (z4),
+	     z0 = svreinterpret_bf16 (z4))
+
+/*
+** reinterpret_bf16_u64_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_bf16_u64_tied1, svbfloat16_t, svuint64_t,
+		 z0_res = svreinterpret_bf16_u64 (z0),
+		 z0_res = svreinterpret_bf16 (z0))
+
+/*
+** reinterpret_bf16_u64_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_bf16_u64_untied, svbfloat16_t, svuint64_t,
+	     z0 = svreinterpret_bf16_u64 (z4),
+	     z0 = svreinterpret_bf16 (z4))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c
index 0890700..60705e6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c
@@ -3,6 +3,23 @@
 #include "test_sve_acle.h"
 
 /*
+** reinterpret_f16_bf16_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_f16_bf16_tied1, svfloat16_t, svbfloat16_t,
+		 z0_res = svreinterpret_f16_bf16 (z0),
+		 z0_res = svreinterpret_f16 (z0))
+
+/*
+** reinterpret_f16_bf16_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_f16_bf16_untied, svfloat16_t, svbfloat16_t,
+	     z0 = svreinterpret_f16_bf16 (z4),
+	     z0 = svreinterpret_f16 (z4))
+
+/*
 ** reinterpret_f16_f16_tied1:
 **	ret
 */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c
index aed31c8..06fc46f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c
@@ -3,6 +3,23 @@
 #include "test_sve_acle.h"
 
 /*
+** reinterpret_f32_bf16_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_f32_bf16_tied1, svfloat32_t, svbfloat16_t,
+		 z0_res = svreinterpret_f32_bf16 (z0),
+		 z0_res = svreinterpret_f32 (z0))
+
+/*
+** reinterpret_f32_bf16_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_f32_bf16_untied, svfloat32_t, svbfloat16_t,
+	     z0 = svreinterpret_f32_bf16 (z4),
+	     z0 = svreinterpret_f32 (z4))
+
+/*
 ** reinterpret_f32_f16_tied1:
 **	ret
 */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c
index 92c68ee..003ee3f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c
@@ -3,6 +3,23 @@
 #include "test_sve_acle.h"
 
 /*
+** reinterpret_f64_bf16_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_f64_bf16_tied1, svfloat64_t, svbfloat16_t,
+		 z0_res = svreinterpret_f64_bf16 (z0),
+		 z0_res = svreinterpret_f64 (z0))
+
+/*
+** reinterpret_f64_bf16_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_f64_bf16_untied, svfloat64_t, svbfloat16_t,
+	     z0 = svreinterpret_f64_bf16 (z4),
+	     z0 = svreinterpret_f64 (z4))
+
+/*
 ** reinterpret_f64_f16_tied1:
 **	ret
 */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c
index e5d9178..d62817c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c
@@ -3,6 +3,23 @@
 #include "test_sve_acle.h"
 
 /*
+** reinterpret_s16_bf16_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_s16_bf16_tied1, svint16_t, svbfloat16_t,
+		 z0_res = svreinterpret_s16_bf16 (z0),
+		 z0_res = svreinterpret_s16 (z0))
+
+/*
+** reinterpret_s16_bf16_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_s16_bf16_untied, svint16_t, svbfloat16_t,
+	     z0 = svreinterpret_s16_bf16 (z4),
+	     z0 = svreinterpret_s16 (z4))
+
+/*
 ** reinterpret_s16_f16_tied1:
 **	ret
 */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c
index f188104..e1068f2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c
@@ -3,6 +3,23 @@
 #include "test_sve_acle.h"
 
 /*
+** reinterpret_s32_bf16_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_s32_bf16_tied1, svint32_t, svbfloat16_t,
+		 z0_res = svreinterpret_s32_bf16 (z0),
+		 z0_res = svreinterpret_s32 (z0))
+
+/*
+** reinterpret_s32_bf16_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_s32_bf16_untied, svint32_t, svbfloat16_t,
+	     z0 = svreinterpret_s32_bf16 (z4),
+	     z0 = svreinterpret_s32 (z4))
+
+/*
 ** reinterpret_s32_f16_tied1:
 **	ret
 */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c
index f8fbb33d..cada753 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c
@@ -3,6 +3,23 @@
 #include "test_sve_acle.h"
 
 /*
+** reinterpret_s64_bf16_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_s64_bf16_tied1, svint64_t, svbfloat16_t,
+		 z0_res = svreinterpret_s64_bf16 (z0),
+		 z0_res = svreinterpret_s64 (z0))
+
+/*
+** reinterpret_s64_bf16_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_s64_bf16_untied, svint64_t, svbfloat16_t,
+	     z0 = svreinterpret_s64_bf16 (z4),
+	     z0 = svreinterpret_s64 (z4))
+
+/*
 ** reinterpret_s64_f16_tied1:
 **	ret
 */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c
index cfa591c..23a40d0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c
@@ -3,6 +3,23 @@
 #include "test_sve_acle.h"
 
 /*
+** reinterpret_s8_bf16_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_s8_bf16_tied1, svint8_t, svbfloat16_t,
+		 z0_res = svreinterpret_s8_bf16 (z0),
+		 z0_res = svreinterpret_s8 (z0))
+
+/*
+** reinterpret_s8_bf16_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_s8_bf16_untied, svint8_t, svbfloat16_t,
+	     z0 = svreinterpret_s8_bf16 (z4),
+	     z0 = svreinterpret_s8 (z4))
+
+/*
 ** reinterpret_s8_f16_tied1:
 **	ret
 */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c
index 0980e73..48e8eca 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c
@@ -3,6 +3,23 @@
 #include "test_sve_acle.h"
 
 /*
+** reinterpret_u16_bf16_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_u16_bf16_tied1, svuint16_t, svbfloat16_t,
+		 z0_res = svreinterpret_u16_bf16 (z0),
+		 z0_res = svreinterpret_u16 (z0))
+
+/*
+** reinterpret_u16_bf16_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_u16_bf16_untied, svuint16_t, svbfloat16_t,
+	     z0 = svreinterpret_u16_bf16 (z4),
+	     z0 = svreinterpret_u16 (z4))
+
+/*
 ** reinterpret_u16_f16_tied1:
 **	ret
 */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c
index 92e3f5b..1d4e857 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c
@@ -3,6 +3,23 @@
 #include "test_sve_acle.h"
 
 /*
+** reinterpret_u32_bf16_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_u32_bf16_tied1, svuint32_t, svbfloat16_t,
+		 z0_res = svreinterpret_u32_bf16 (z0),
+		 z0_res = svreinterpret_u32 (z0))
+
+/*
+** reinterpret_u32_bf16_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_u32_bf16_untied, svuint32_t, svbfloat16_t,
+	     z0 = svreinterpret_u32_bf16 (z4),
+	     z0 = svreinterpret_u32 (z4))
+
+/*
 ** reinterpret_u32_f16_tied1:
 **	ret
 */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c
index bcfa336..07af69d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c
@@ -3,6 +3,23 @@
 #include "test_sve_acle.h"
 
 /*
+** reinterpret_u64_bf16_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_u64_bf16_tied1, svuint64_t, svbfloat16_t,
+		 z0_res = svreinterpret_u64_bf16 (z0),
+		 z0_res = svreinterpret_u64 (z0))
+
+/*
+** reinterpret_u64_bf16_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_u64_bf16_untied, svuint64_t, svbfloat16_t,
+	     z0 = svreinterpret_u64_bf16 (z4),
+	     z0 = svreinterpret_u64 (z4))
+
+/*
 ** reinterpret_u64_f16_tied1:
 **	ret
 */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c
index dd1286b..a4c7f4c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c
@@ -3,6 +3,23 @@
 #include "test_sve_acle.h"
 
 /*
+** reinterpret_u8_bf16_tied1:
+**	ret
+*/
+TEST_DUAL_Z_REV (reinterpret_u8_bf16_tied1, svuint8_t, svbfloat16_t,
+		 z0_res = svreinterpret_u8_bf16 (z0),
+		 z0_res = svreinterpret_u8 (z0))
+
+/*
+** reinterpret_u8_bf16_untied:
+**	mov	z0\.d, z4\.d
+**	ret
+*/
+TEST_DUAL_Z (reinterpret_u8_bf16_untied, svuint8_t, svbfloat16_t,
+	     z0 = svreinterpret_u8_bf16 (z4),
+	     z0 = svreinterpret_u8 (z4))
+
+/*
 ** reinterpret_u8_f16_tied1:
 **	ret
 */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/rev_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/rev_bf16.c
new file mode 100644
index 0000000..fe587d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/rev_bf16.c
@@ -0,0 +1,21 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** rev_bf16_tied1:
+**	rev	z0\.h, z0\.h
+**	ret
+*/
+TEST_UNIFORM_Z (rev_bf16_tied1, svbfloat16_t,
+		z0 = svrev_bf16 (z0),
+		z0 = svrev (z0))
+
+/*
+** rev_bf16_untied:
+**	rev	z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_Z (rev_bf16_untied, svbfloat16_t,
+		z0 = svrev_bf16 (z1),
+		z0 = svrev (z1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sel_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sel_bf16.c
new file mode 100644
index 0000000..44636d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sel_bf16.c
@@ -0,0 +1,30 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** sel_bf16_tied1:
+**	sel	z0\.h, p0, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_Z (sel_bf16_tied1, svbfloat16_t,
+		z0 = svsel_bf16 (p0, z0, z1),
+		z0 = svsel (p0, z0, z1))
+
+/*
+** sel_bf16_tied2:
+**	sel	z0\.h, p0, z1\.h, z0\.h
+**	ret
+*/
+TEST_UNIFORM_Z (sel_bf16_tied2, svbfloat16_t,
+		z0 = svsel_bf16 (p0, z1, z0),
+		z0 = svsel (p0, z1, z0))
+
+/*
+** sel_bf16_untied:
+**	sel	z0\.h, p0, z1\.h, z2\.h
+**	ret
+*/
+TEST_UNIFORM_Z (sel_bf16_untied, svbfloat16_t,
+		z0 = svsel_bf16 (p0, z1, z2),
+		z0 = svsel (p0, z1, z2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set2_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set2_bf16.c
new file mode 100644
index 0000000..b160a25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set2_bf16.c
@@ -0,0 +1,41 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** set2_bf16_z24_0:
+**	mov	z25\.d, z5\.d
+**	mov	z24\.d, z0\.d
+**	ret
+*/
+TEST_SET (set2_bf16_z24_0, svbfloat16x2_t, svbfloat16_t,
+	  z24 = svset2_bf16 (z4, 0, z0),
+	  z24 = svset2 (z4, 0, z0))
+
+/*
+** set2_bf16_z24_1:
+**	mov	z24\.d, z4\.d
+**	mov	z25\.d, z0\.d
+**	ret
+*/
+TEST_SET (set2_bf16_z24_1, svbfloat16x2_t, svbfloat16_t,
+	  z24 = svset2_bf16 (z4, 1, z0),
+	  z24 = svset2 (z4, 1, z0))
+
+/*
+** set2_bf16_z4_0:
+**	mov	z4\.d, z0\.d
+**	ret
+*/
+TEST_SET (set2_bf16_z4_0, svbfloat16x2_t, svbfloat16_t,
+	  z4 = svset2_bf16 (z4, 0, z0),
+	  z4 = svset2 (z4, 0, z0))
+
+/*
+** set2_bf16_z4_1:
+**	mov	z5\.d, z0\.d
+**	ret
+*/
+TEST_SET (set2_bf16_z4_1, svbfloat16x2_t, svbfloat16_t,
+	  z4 = svset2_bf16 (z4, 1, z0),
+	  z4 = svset2 (z4, 1, z0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set3_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set3_bf16.c
new file mode 100644
index 0000000..4e0707d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set3_bf16.c
@@ -0,0 +1,63 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** set3_bf16_z24_0:
+**	mov	z25\.d, z5\.d
+**	mov	z26\.d, z6\.d
+**	mov	z24\.d, z0\.d
+**	ret
+*/
+TEST_SET (set3_bf16_z24_0, svbfloat16x3_t, svbfloat16_t,
+	  z24 = svset3_bf16 (z4, 0, z0),
+	  z24 = svset3 (z4, 0, z0))
+
+/*
+** set3_bf16_z24_1:
+**	mov	z24\.d, z4\.d
+**	mov	z26\.d, z6\.d
+**	mov	z25\.d, z0\.d
+**	ret
+*/
+TEST_SET (set3_bf16_z24_1, svbfloat16x3_t, svbfloat16_t,
+	  z24 = svset3_bf16 (z4, 1, z0),
+	  z24 = svset3 (z4, 1, z0))
+
+/*
+** set3_bf16_z24_2:
+**	mov	z24\.d, z4\.d
+**	mov	z25\.d, z5\.d
+**	mov	z26\.d, z0\.d
+**	ret
+*/
+TEST_SET (set3_bf16_z24_2, svbfloat16x3_t, svbfloat16_t,
+	  z24 = svset3_bf16 (z4, 2, z0),
+	  z24 = svset3 (z4, 2, z0))
+
+/*
+** set3_bf16_z4_0:
+**	mov	z4\.d, z0\.d
+**	ret
+*/
+TEST_SET (set3_bf16_z4_0, svbfloat16x3_t, svbfloat16_t,
+	  z4 = svset3_bf16 (z4, 0, z0),
+	  z4 = svset3 (z4, 0, z0))
+
+/*
+** set3_bf16_z4_1:
+**	mov	z5\.d, z0\.d
+**	ret
+*/
+TEST_SET (set3_bf16_z4_1, svbfloat16x3_t, svbfloat16_t,
+	  z4 = svset3_bf16 (z4, 1, z0),
+	  z4 = svset3 (z4, 1, z0))
+
+/*
+** set3_bf16_z4_2:
+**	mov	z6\.d, z0\.d
+**	ret
+*/
+TEST_SET (set3_bf16_z4_2, svbfloat16x3_t, svbfloat16_t,
+	  z4 = svset3_bf16 (z4, 2, z0),
+	  z4 = svset3 (z4, 2, z0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set4_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set4_bf16.c
new file mode 100644
index 0000000..4e26c11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set4_bf16.c
@@ -0,0 +1,87 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** set4_bf16_z24_0:
+**	mov	z25\.d, z5\.d
+**	mov	z26\.d, z6\.d
+**	mov	z27\.d, z7\.d
+**	mov	z24\.d, z0\.d
+**	ret
+*/
+TEST_SET (set4_bf16_z24_0, svbfloat16x4_t, svbfloat16_t,
+	  z24 = svset4_bf16 (z4, 0, z0),
+	  z24 = svset4 (z4, 0, z0))
+
+/*
+** set4_bf16_z24_1:
+**	mov	z24\.d, z4\.d
+**	mov	z26\.d, z6\.d
+**	mov	z27\.d, z7\.d
+**	mov	z25\.d, z0\.d
+**	ret
+*/
+TEST_SET (set4_bf16_z24_1, svbfloat16x4_t, svbfloat16_t,
+	  z24 = svset4_bf16 (z4, 1, z0),
+	  z24 = svset4 (z4, 1, z0))
+
+/*
+** set4_bf16_z24_2:
+**	mov	z24\.d, z4\.d
+**	mov	z25\.d, z5\.d
+**	mov	z27\.d, z7\.d
+**	mov	z26\.d, z0\.d
+**	ret
+*/
+TEST_SET (set4_bf16_z24_2, svbfloat16x4_t, svbfloat16_t,
+	  z24 = svset4_bf16 (z4, 2, z0),
+	  z24 = svset4 (z4, 2, z0))
+
+/*
+** set4_bf16_z24_3:
+**	mov	z24\.d, z4\.d
+**	mov	z25\.d, z5\.d
+**	mov	z26\.d, z6\.d
+**	mov	z27\.d, z0\.d
+**	ret
+*/
+TEST_SET (set4_bf16_z24_3, svbfloat16x4_t, svbfloat16_t,
+	  z24 = svset4_bf16 (z4, 3, z0),
+	  z24 = svset4 (z4, 3, z0))
+
+/*
+** set4_bf16_z4_0:
+**	mov	z4\.d, z0\.d
+**	ret
+*/
+TEST_SET (set4_bf16_z4_0, svbfloat16x4_t, svbfloat16_t,
+	  z4 = svset4_bf16 (z4, 0, z0),
+	  z4 = svset4 (z4, 0, z0))
+
+/*
+** set4_bf16_z4_1:
+**	mov	z5\.d, z0\.d
+**	ret
+*/
+TEST_SET (set4_bf16_z4_1, svbfloat16x4_t, svbfloat16_t,
+	  z4 = svset4_bf16 (z4, 1, z0),
+	  z4 = svset4 (z4, 1, z0))
+
+/*
+** set4_bf16_z4_2:
+**	mov	z6\.d, z0\.d
+**	ret
+*/
+TEST_SET (set4_bf16_z4_2, svbfloat16x4_t, svbfloat16_t,
+	  z4 = svset4_bf16 (z4, 2, z0),
+	  z4 = svset4 (z4, 2, z0))
+
+/*
+** set4_bf16_z4_3:
+**	mov	z7\.d, z0\.d
+**	ret
+*/
+TEST_SET (set4_bf16_z4_3, svbfloat16x4_t, svbfloat16_t,
+	  z4 = svset4_bf16 (z4, 3, z0),
+	  z4 = svset4 (z4, 3, z0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/splice_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/splice_bf16.c
new file mode 100644
index 0000000..3d2dbf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/splice_bf16.c
@@ -0,0 +1,33 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** splice_bf16_tied1:
+**	splice	z0\.h, p0, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_Z (splice_bf16_tied1, svbfloat16_t,
+		z0 = svsplice_bf16 (p0, z0, z1),
+		z0 = svsplice (p0, z0, z1))
+
+/*
+** splice_bf16_tied2:
+**	mov	(z[0-9]+)\.d, z0\.d
+**	movprfx	z0, z1
+**	splice	z0\.h, p0, z0\.h, \1\.h
+**	ret
+*/
+TEST_UNIFORM_Z (splice_bf16_tied2, svbfloat16_t,
+		z0 = svsplice_bf16 (p0, z1, z0),
+		z0 = svsplice (p0, z1, z0))
+
+/*
+** splice_bf16_untied:
+**	movprfx	z0, z1
+**	splice	z0\.h, p0, z0\.h, z2\.h
+**	ret
+*/
+TEST_UNIFORM_Z (splice_bf16_untied, svbfloat16_t,
+		z0 = svsplice_bf16 (p0, z1, z2),
+		z0 = svsplice (p0, z1, z2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_bf16.c
new file mode 100644
index 0000000..ec3dbe3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_bf16.c
@@ -0,0 +1,158 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
+
+#include "test_sve_acle.h"
+
+/*
+** st1_bf16_base:
+**	st1h	z0\.h, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st1_bf16_base, svbfloat16_t, bfloat16_t,
+	    svst1_bf16 (p0, x0, z0),
+	    svst1 (p0, x0, z0))
+
+/*
+** st1_bf16_index:
+**	st1h	z0\.h, p0, \[x0, x1, lsl 1\]
+**	ret
+*/
+TEST_STORE (st1_bf16_index, svbfloat16_t, bfloat16_t,
+	    svst1_bf16 (p0, x0 + x1, z0),
+	    svst1 (p0, x0 + x1, z0))
+
+/*
+** st1_bf16_1:
+**	st1h	z0\.h, p0, \[x0, #1, mul vl\]
+**	ret
+*/
+TEST_STORE (st1_bf16_1, svbfloat16_t, bfloat16_t,
+	    svst1_bf16 (p0, x0 + svcnth (), z0),
+	    svst1 (p0, x0 + svcnth (), z0))
+
+/*
+** st1_bf16_7:
+**	st1h	z0\.h, p0, \[x0, #7, mul vl\]
+**	ret
+*/
+TEST_STORE (st1_bf16_7, svbfloat16_t, bfloat16_t,
+	    svst1_bf16 (p0, x0 + svcnth () * 7, z0),
+	    svst1 (p0, x0 + svcnth () * 7, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st1_bf16_8:
+**	incb	x0, all, mul #8
+**	st1h	z0\.h, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st1_bf16_8, svbfloat16_t, bfloat16_t,
+	    svst1_bf16 (p0, x0 + svcnth () * 8, z0),
+	    svst1 (p0, x0 + svcnth () * 8, z0))
+
+/*
+** st1_bf16_m1:
+**	st1h	z0\.h, p0, \[x0, #-1, mul vl\]
+**	ret
+*/
+TEST_STORE (st1_bf16_m1, svbfloat16_t, bfloat16_t,
+	    svst1_bf16 (p0, x0 - svcnth (), z0),
+	    svst1 (p0, x0 - svcnth (), z0))
+
+/*
+** st1_bf16_m8:
+**	st1h	z0\.h, p0, \[x0, #-8, mul vl\]
+**	ret
+*/
+TEST_STORE (st1_bf16_m8, svbfloat16_t, bfloat16_t,
+	    svst1_bf16 (p0, x0 - svcnth () * 8, z0),
+	    svst1 (p0, x0 - svcnth () * 8, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st1_bf16_m9:
+**	decb	x0, all, mul #9
+**	st1h	z0\.h, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st1_bf16_m9, svbfloat16_t, bfloat16_t,
+	    svst1_bf16 (p0, x0 - svcnth () * 9, z0),
+	    svst1 (p0, x0 - svcnth () * 9, z0))
+
+/*
+** st1_vnum_bf16_0:
+**	st1h	z0\.h, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st1_vnum_bf16_0, svbfloat16_t, bfloat16_t,
+	    svst1_vnum_bf16 (p0, x0, 0, z0),
+	    svst1_vnum (p0, x0, 0, z0))
+
+/*
+** st1_vnum_bf16_1:
+**	st1h	z0\.h, p0, \[x0, #1, mul vl\]
+**	ret
+*/
+TEST_STORE (st1_vnum_bf16_1, svbfloat16_t, bfloat16_t,
+	    svst1_vnum_bf16 (p0, x0, 1, z0),
+	    svst1_vnum (p0, x0, 1, z0))
+
+/*
+** st1_vnum_bf16_7:
+**	st1h	z0\.h, p0, \[x0, #7, mul vl\]
+**	ret
+*/
+TEST_STORE (st1_vnum_bf16_7, svbfloat16_t, bfloat16_t,
+	    svst1_vnum_bf16 (p0, x0, 7, z0),
+	    svst1_vnum (p0, x0, 7, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st1_vnum_bf16_8:
+**	incb	x0, all, mul #8
+**	st1h	z0\.h, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st1_vnum_bf16_8, svbfloat16_t, bfloat16_t,
+	    svst1_vnum_bf16 (p0, x0, 8, z0),
+	    svst1_vnum (p0, x0, 8, z0))
+
+/*
+** st1_vnum_bf16_m1:
+**	st1h	z0\.h, p0, \[x0, #-1, mul vl\]
+**	ret
+*/
+TEST_STORE (st1_vnum_bf16_m1, svbfloat16_t, bfloat16_t,
+	    svst1_vnum_bf16 (p0, x0, -1, z0),
+	    svst1_vnum (p0, x0, -1, z0))
+
+/*
+** st1_vnum_bf16_m8:
+**	st1h	z0\.h, p0, \[x0, #-8, mul vl\]
+**	ret
+*/
+TEST_STORE (st1_vnum_bf16_m8, svbfloat16_t, bfloat16_t,
+	    svst1_vnum_bf16 (p0, x0, -8, z0),
+	    svst1_vnum (p0, x0, -8, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st1_vnum_bf16_m9:
+**	decb	x0, all, mul #9
+**	st1h	z0\.h, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st1_vnum_bf16_m9, svbfloat16_t, bfloat16_t,
+	    svst1_vnum_bf16 (p0, x0, -9, z0),
+	    svst1_vnum (p0, x0, -9, z0))
+
+/* Using MUL to calculate an index would also be OK.  */
+/*
+** st1_vnum_bf16_x1:
+**	cntb	(x[0-9]+)
+**	madd	(x[0-9]+), (x1, \1|\1, x1), x0
+**	st1h	z0\.h, p0, \[\2\]
+**	ret
+*/
+TEST_STORE (st1_vnum_bf16_x1, svbfloat16_t, bfloat16_t,
+	    svst1_vnum_bf16 (p0, x0, x1, z0),
+	    svst1_vnum (p0, x0, x1, z0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st2_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st2_bf16.c
new file mode 100644
index 0000000..a4a57af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st2_bf16.c
@@ -0,0 +1,200 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
+
+#include "test_sve_acle.h"
+
+/*
+** st2_bf16_base:
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st2_bf16_base, svbfloat16x2_t, bfloat16_t,
+	    svst2_bf16 (p0, x0, z0),
+	    svst2 (p0, x0, z0))
+
+/*
+** st2_bf16_index:
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0, x1, lsl 1\]
+**	ret
+*/
+TEST_STORE (st2_bf16_index, svbfloat16x2_t, bfloat16_t,
+	    svst2_bf16 (p0, x0 + x1, z0),
+	    svst2 (p0, x0 + x1, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st2_bf16_1:
+**	incb	x0
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st2_bf16_1, svbfloat16x2_t, bfloat16_t,
+	    svst2_bf16 (p0, x0 + svcnth (), z0),
+	    svst2 (p0, x0 + svcnth (), z0))
+
+/*
+** st2_bf16_2:
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0, #2, mul vl\]
+**	ret
+*/
+TEST_STORE (st2_bf16_2, svbfloat16x2_t, bfloat16_t,
+	    svst2_bf16 (p0, x0 + svcnth () * 2, z0),
+	    svst2 (p0, x0 + svcnth () * 2, z0))
+
+/*
+** st2_bf16_14:
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0, #14, mul vl\]
+**	ret
+*/
+TEST_STORE (st2_bf16_14, svbfloat16x2_t, bfloat16_t,
+	    svst2_bf16 (p0, x0 + svcnth () * 14, z0),
+	    svst2 (p0, x0 + svcnth () * 14, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st2_bf16_16:
+**	incb	x0, all, mul #16
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st2_bf16_16, svbfloat16x2_t, bfloat16_t,
+	    svst2_bf16 (p0, x0 + svcnth () * 16, z0),
+	    svst2 (p0, x0 + svcnth () * 16, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st2_bf16_m1:
+**	decb	x0
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st2_bf16_m1, svbfloat16x2_t, bfloat16_t,
+	    svst2_bf16 (p0, x0 - svcnth (), z0),
+	    svst2 (p0, x0 - svcnth (), z0))
+
+/*
+** st2_bf16_m2:
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0, #-2, mul vl\]
+**	ret
+*/
+TEST_STORE (st2_bf16_m2, svbfloat16x2_t, bfloat16_t,
+	    svst2_bf16 (p0, x0 - svcnth () * 2, z0),
+	    svst2 (p0, x0 - svcnth () * 2, z0))
+
+/*
+** st2_bf16_m16:
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0, #-16, mul vl\]
+**	ret
+*/
+TEST_STORE (st2_bf16_m16, svbfloat16x2_t, bfloat16_t,
+	    svst2_bf16 (p0, x0 - svcnth () * 16, z0),
+	    svst2 (p0, x0 - svcnth () * 16, z0))
+
+/*
+** st2_bf16_m18:
+**	addvl	(x[0-9]+), x0, #-18
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[\1\]
+**	ret
+*/
+TEST_STORE (st2_bf16_m18, svbfloat16x2_t, bfloat16_t,
+	    svst2_bf16 (p0, x0 - svcnth () * 18, z0),
+	    svst2 (p0, x0 - svcnth () * 18, z0))
+
+/*
+** st2_vnum_bf16_0:
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st2_vnum_bf16_0, svbfloat16x2_t, bfloat16_t,
+	    svst2_vnum_bf16 (p0, x0, 0, z0),
+	    svst2_vnum (p0, x0, 0, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st2_vnum_bf16_1:
+**	incb	x0
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st2_vnum_bf16_1, svbfloat16x2_t, bfloat16_t,
+	    svst2_vnum_bf16 (p0, x0, 1, z0),
+	    svst2_vnum (p0, x0, 1, z0))
+
+/*
+** st2_vnum_bf16_2:
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0, #2, mul vl\]
+**	ret
+*/
+TEST_STORE (st2_vnum_bf16_2, svbfloat16x2_t, bfloat16_t,
+	    svst2_vnum_bf16 (p0, x0, 2, z0),
+	    svst2_vnum (p0, x0, 2, z0))
+
+/*
+** st2_vnum_bf16_14:
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0, #14, mul vl\]
+**	ret
+*/
+TEST_STORE (st2_vnum_bf16_14, svbfloat16x2_t, bfloat16_t,
+	    svst2_vnum_bf16 (p0, x0, 14, z0),
+	    svst2_vnum (p0, x0, 14, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st2_vnum_bf16_16:
+**	incb	x0, all, mul #16
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st2_vnum_bf16_16, svbfloat16x2_t, bfloat16_t,
+	    svst2_vnum_bf16 (p0, x0, 16, z0),
+	    svst2_vnum (p0, x0, 16, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st2_vnum_bf16_m1:
+**	decb	x0
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st2_vnum_bf16_m1, svbfloat16x2_t, bfloat16_t,
+	    svst2_vnum_bf16 (p0, x0, -1, z0),
+	    svst2_vnum (p0, x0, -1, z0))
+
+/*
+** st2_vnum_bf16_m2:
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0, #-2, mul vl\]
+**	ret
+*/
+TEST_STORE (st2_vnum_bf16_m2, svbfloat16x2_t, bfloat16_t,
+	    svst2_vnum_bf16 (p0, x0, -2, z0),
+	    svst2_vnum (p0, x0, -2, z0))
+
+/*
+** st2_vnum_bf16_m16:
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[x0, #-16, mul vl\]
+**	ret
+*/
+TEST_STORE (st2_vnum_bf16_m16, svbfloat16x2_t, bfloat16_t,
+	    svst2_vnum_bf16 (p0, x0, -16, z0),
+	    svst2_vnum (p0, x0, -16, z0))
+
+/*
+** st2_vnum_bf16_m18:
+**	addvl	(x[0-9]+), x0, #-18
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[\1\]
+**	ret
+*/
+TEST_STORE (st2_vnum_bf16_m18, svbfloat16x2_t, bfloat16_t,
+	    svst2_vnum_bf16 (p0, x0, -18, z0),
+	    svst2_vnum (p0, x0, -18, z0))
+
+/* Using MUL to calculate an index would also be OK.  */
+/*
+** st2_vnum_bf16_x1:
+**	cntb	(x[0-9]+)
+**	madd	(x[0-9]+), (x1, \1|\1, x1), x0
+**	st2h	{z0\.h(?: - |, )z1\.h}, p0, \[\2\]
+**	ret
+*/
+TEST_STORE (st2_vnum_bf16_x1, svbfloat16x2_t, bfloat16_t,
+	    svst2_vnum_bf16 (p0, x0, x1, z0),
+	    svst2_vnum (p0, x0, x1, z0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st3_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st3_bf16.c
new file mode 100644
index 0000000..2f92168
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st3_bf16.c
@@ -0,0 +1,242 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
+
+#include "test_sve_acle.h"
+
+/*
+** st3_bf16_base:
+**	st3h	{z0\.h - z2\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st3_bf16_base, svbfloat16x3_t, bfloat16_t,
+	    svst3_bf16 (p0, x0, z0),
+	    svst3 (p0, x0, z0))
+
+/*
+** st3_bf16_index:
+**	st3h	{z0\.h - z2\.h}, p0, \[x0, x1, lsl 1\]
+**	ret
+*/
+TEST_STORE (st3_bf16_index, svbfloat16x3_t, bfloat16_t,
+	    svst3_bf16 (p0, x0 + x1, z0),
+	    svst3 (p0, x0 + x1, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st3_bf16_1:
+**	incb	x0
+**	st3h	{z0\.h - z2\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st3_bf16_1, svbfloat16x3_t, bfloat16_t,
+	    svst3_bf16 (p0, x0 + svcnth (), z0),
+	    svst3 (p0, x0 + svcnth (), z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st3_bf16_2:
+**	incb	x0, all, mul #2
+**	st3h	{z0\.h - z2\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st3_bf16_2, svbfloat16x3_t, bfloat16_t,
+	    svst3_bf16 (p0, x0 + svcnth () * 2, z0),
+	    svst3 (p0, x0 + svcnth () * 2, z0))
+
+/*
+** st3_bf16_3:
+**	st3h	{z0\.h - z2\.h}, p0, \[x0, #3, mul vl\]
+**	ret
+*/
+TEST_STORE (st3_bf16_3, svbfloat16x3_t, bfloat16_t,
+	    svst3_bf16 (p0, x0 + svcnth () * 3, z0),
+	    svst3 (p0, x0 + svcnth () * 3, z0))
+
+/*
+** st3_bf16_21:
+**	st3h	{z0\.h - z2\.h}, p0, \[x0, #21, mul vl\]
+**	ret
+*/
+TEST_STORE (st3_bf16_21, svbfloat16x3_t, bfloat16_t,
+	    svst3_bf16 (p0, x0 + svcnth () * 21, z0),
+	    svst3 (p0, x0 + svcnth () * 21, z0))
+
+/*
+** st3_bf16_24:
+**	addvl	(x[0-9]+), x0, #24
+**	st3h	{z0\.h - z2\.h}, p0, \[\1\]
+**	ret
+*/
+TEST_STORE (st3_bf16_24, svbfloat16x3_t, bfloat16_t,
+	    svst3_bf16 (p0, x0 + svcnth () * 24, z0),
+	    svst3 (p0, x0 + svcnth () * 24, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st3_bf16_m1:
+**	decb	x0
+**	st3h	{z0\.h - z2\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st3_bf16_m1, svbfloat16x3_t, bfloat16_t,
+	    svst3_bf16 (p0, x0 - svcnth (), z0),
+	    svst3 (p0, x0 - svcnth (), z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st3_bf16_m2:
+**	decb	x0, all, mul #2
+**	st3h	{z0\.h - z2\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st3_bf16_m2, svbfloat16x3_t, bfloat16_t,
+	    svst3_bf16 (p0, x0 - svcnth () * 2, z0),
+	    svst3 (p0, x0 - svcnth () * 2, z0))
+
+/*
+** st3_bf16_m3:
+**	st3h	{z0\.h - z2\.h}, p0, \[x0, #-3, mul vl\]
+**	ret
+*/
+TEST_STORE (st3_bf16_m3, svbfloat16x3_t, bfloat16_t,
+	    svst3_bf16 (p0, x0 - svcnth () * 3, z0),
+	    svst3 (p0, x0 - svcnth () * 3, z0))
+
+/*
+** st3_bf16_m24:
+**	st3h	{z0\.h - z2\.h}, p0, \[x0, #-24, mul vl\]
+**	ret
+*/
+TEST_STORE (st3_bf16_m24, svbfloat16x3_t, bfloat16_t,
+	    svst3_bf16 (p0, x0 - svcnth () * 24, z0),
+	    svst3 (p0, x0 - svcnth () * 24, z0))
+
+/*
+** st3_bf16_m27:
+**	addvl	(x[0-9]+), x0, #-27
+**	st3h	{z0\.h - z2\.h}, p0, \[\1\]
+**	ret
+*/
+TEST_STORE (st3_bf16_m27, svbfloat16x3_t, bfloat16_t,
+	    svst3_bf16 (p0, x0 - svcnth () * 27, z0),
+	    svst3 (p0, x0 - svcnth () * 27, z0))
+
+/*
+** st3_vnum_bf16_0:
+**	st3h	{z0\.h - z2\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st3_vnum_bf16_0, svbfloat16x3_t, bfloat16_t,
+	    svst3_vnum_bf16 (p0, x0, 0, z0),
+	    svst3_vnum (p0, x0, 0, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st3_vnum_bf16_1:
+**	incb	x0
+**	st3h	{z0\.h - z2\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st3_vnum_bf16_1, svbfloat16x3_t, bfloat16_t,
+	    svst3_vnum_bf16 (p0, x0, 1, z0),
+	    svst3_vnum (p0, x0, 1, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st3_vnum_bf16_2:
+**	incb	x0, all, mul #2
+**	st3h	{z0\.h - z2\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st3_vnum_bf16_2, svbfloat16x3_t, bfloat16_t,
+	    svst3_vnum_bf16 (p0, x0, 2, z0),
+	    svst3_vnum (p0, x0, 2, z0))
+
+/*
+** st3_vnum_bf16_3:
+**	st3h	{z0\.h - z2\.h}, p0, \[x0, #3, mul vl\]
+**	ret
+*/
+TEST_STORE (st3_vnum_bf16_3, svbfloat16x3_t, bfloat16_t,
+	    svst3_vnum_bf16 (p0, x0, 3, z0),
+	    svst3_vnum (p0, x0, 3, z0))
+
+/*
+** st3_vnum_bf16_21:
+**	st3h	{z0\.h - z2\.h}, p0, \[x0, #21, mul vl\]
+**	ret
+*/
+TEST_STORE (st3_vnum_bf16_21, svbfloat16x3_t, bfloat16_t,
+	    svst3_vnum_bf16 (p0, x0, 21, z0),
+	    svst3_vnum (p0, x0, 21, z0))
+
+/*
+** st3_vnum_bf16_24:
+**	addvl	(x[0-9]+), x0, #24
+**	st3h	{z0\.h - z2\.h}, p0, \[\1\]
+**	ret
+*/
+TEST_STORE (st3_vnum_bf16_24, svbfloat16x3_t, bfloat16_t,
+	    svst3_vnum_bf16 (p0, x0, 24, z0),
+	    svst3_vnum (p0, x0, 24, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st3_vnum_bf16_m1:
+**	decb	x0
+**	st3h	{z0\.h - z2\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st3_vnum_bf16_m1, svbfloat16x3_t, bfloat16_t,
+	    svst3_vnum_bf16 (p0, x0, -1, z0),
+	    svst3_vnum (p0, x0, -1, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st3_vnum_bf16_m2:
+**	decb	x0, all, mul #2
+**	st3h	{z0\.h - z2\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st3_vnum_bf16_m2, svbfloat16x3_t, bfloat16_t,
+	    svst3_vnum_bf16 (p0, x0, -2, z0),
+	    svst3_vnum (p0, x0, -2, z0))
+
+/*
+** st3_vnum_bf16_m3:
+**	st3h	{z0\.h - z2\.h}, p0, \[x0, #-3, mul vl\]
+**	ret
+*/
+TEST_STORE (st3_vnum_bf16_m3, svbfloat16x3_t, bfloat16_t,
+	    svst3_vnum_bf16 (p0, x0, -3, z0),
+	    svst3_vnum (p0, x0, -3, z0))
+
+/*
+** st3_vnum_bf16_m24:
+**	st3h	{z0\.h - z2\.h}, p0, \[x0, #-24, mul vl\]
+**	ret
+*/
+TEST_STORE (st3_vnum_bf16_m24, svbfloat16x3_t, bfloat16_t,
+	    svst3_vnum_bf16 (p0, x0, -24, z0),
+	    svst3_vnum (p0, x0, -24, z0))
+
+/*
+** st3_vnum_bf16_m27:
+**	addvl	(x[0-9]+), x0, #-27
+**	st3h	{z0\.h - z2\.h}, p0, \[\1\]
+**	ret
+*/
+TEST_STORE (st3_vnum_bf16_m27, svbfloat16x3_t, bfloat16_t,
+	    svst3_vnum_bf16 (p0, x0, -27, z0),
+	    svst3_vnum (p0, x0, -27, z0))
+
+/* Using MUL to calculate an index would also be OK.  */
+/*
+** st3_vnum_bf16_x1:
+**	cntb	(x[0-9]+)
+**	madd	(x[0-9]+), (x1, \1|\1, x1), x0
+**	st3h	{z0\.h - z2\.h}, p0, \[\2\]
+**	ret
+*/
+TEST_STORE (st3_vnum_bf16_x1, svbfloat16x3_t, bfloat16_t,
+	    svst3_vnum_bf16 (p0, x0, x1, z0),
+	    svst3_vnum (p0, x0, x1, z0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st4_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st4_bf16.c
new file mode 100644
index 0000000..b8d9f4a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st4_bf16.c
@@ -0,0 +1,286 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
+
+#include "test_sve_acle.h"
+
+/*
+** st4_bf16_base:
+**	st4h	{z0\.h - z3\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st4_bf16_base, svbfloat16x4_t, bfloat16_t,
+	    svst4_bf16 (p0, x0, z0),
+	    svst4 (p0, x0, z0))
+
+/*
+** st4_bf16_index:
+**	st4h	{z0\.h - z3\.h}, p0, \[x0, x1, lsl 1\]
+**	ret
+*/
+TEST_STORE (st4_bf16_index, svbfloat16x4_t, bfloat16_t,
+	    svst4_bf16 (p0, x0 + x1, z0),
+	    svst4 (p0, x0 + x1, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st4_bf16_1:
+**	incb	x0
+**	st4h	{z0\.h - z3\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st4_bf16_1, svbfloat16x4_t, bfloat16_t,
+	    svst4_bf16 (p0, x0 + svcnth (), z0),
+	    svst4 (p0, x0 + svcnth (), z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st4_bf16_2:
+**	incb	x0, all, mul #2
+**	st4h	{z0\.h - z3\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st4_bf16_2, svbfloat16x4_t, bfloat16_t,
+	    svst4_bf16 (p0, x0 + svcnth () * 2, z0),
+	    svst4 (p0, x0 + svcnth () * 2, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st4_bf16_3:
+**	incb	x0, all, mul #3
+**	st4h	{z0\.h - z3\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st4_bf16_3, svbfloat16x4_t, bfloat16_t,
+	    svst4_bf16 (p0, x0 + svcnth () * 3, z0),
+	    svst4 (p0, x0 + svcnth () * 3, z0))
+
+/*
+** st4_bf16_4:
+**	st4h	{z0\.h - z3\.h}, p0, \[x0, #4, mul vl\]
+**	ret
+*/
+TEST_STORE (st4_bf16_4, svbfloat16x4_t, bfloat16_t,
+	    svst4_bf16 (p0, x0 + svcnth () * 4, z0),
+	    svst4 (p0, x0 + svcnth () * 4, z0))
+
+/*
+** st4_bf16_28:
+**	st4h	{z0\.h - z3\.h}, p0, \[x0, #28, mul vl\]
+**	ret
+*/
+TEST_STORE (st4_bf16_28, svbfloat16x4_t, bfloat16_t,
+	    svst4_bf16 (p0, x0 + svcnth () * 28, z0),
+	    svst4 (p0, x0 + svcnth () * 28, z0))
+
+/*
+** st4_bf16_32:
+**	[^{]*
+**	st4h	{z0\.h - z3\.h}, p0, \[x[0-9]+\]
+**	ret
+*/
+TEST_STORE (st4_bf16_32, svbfloat16x4_t, bfloat16_t,
+	    svst4_bf16 (p0, x0 + svcnth () * 32, z0),
+	    svst4 (p0, x0 + svcnth () * 32, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st4_bf16_m1:
+**	decb	x0
+**	st4h	{z0\.h - z3\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st4_bf16_m1, svbfloat16x4_t, bfloat16_t,
+	    svst4_bf16 (p0, x0 - svcnth (), z0),
+	    svst4 (p0, x0 - svcnth (), z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st4_bf16_m2:
+**	decb	x0, all, mul #2
+**	st4h	{z0\.h - z3\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st4_bf16_m2, svbfloat16x4_t, bfloat16_t,
+	    svst4_bf16 (p0, x0 - svcnth () * 2, z0),
+	    svst4 (p0, x0 - svcnth () * 2, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st4_bf16_m3:
+**	decb	x0, all, mul #3
+**	st4h	{z0\.h - z3\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st4_bf16_m3, svbfloat16x4_t, bfloat16_t,
+	    svst4_bf16 (p0, x0 - svcnth () * 3, z0),
+	    svst4 (p0, x0 - svcnth () * 3, z0))
+
+/*
+** st4_bf16_m4:
+**	st4h	{z0\.h - z3\.h}, p0, \[x0, #-4, mul vl\]
+**	ret
+*/
+TEST_STORE (st4_bf16_m4, svbfloat16x4_t, bfloat16_t,
+	    svst4_bf16 (p0, x0 - svcnth () * 4, z0),
+	    svst4 (p0, x0 - svcnth () * 4, z0))
+
+/*
+** st4_bf16_m32:
+**	st4h	{z0\.h - z3\.h}, p0, \[x0, #-32, mul vl\]
+**	ret
+*/
+TEST_STORE (st4_bf16_m32, svbfloat16x4_t, bfloat16_t,
+	    svst4_bf16 (p0, x0 - svcnth () * 32, z0),
+	    svst4 (p0, x0 - svcnth () * 32, z0))
+
+/*
+** st4_bf16_m36:
+**	[^{]*
+**	st4h	{z0\.h - z3\.h}, p0, \[x[0-9]+\]
+**	ret
+*/
+TEST_STORE (st4_bf16_m36, svbfloat16x4_t, bfloat16_t,
+	    svst4_bf16 (p0, x0 - svcnth () * 36, z0),
+	    svst4 (p0, x0 - svcnth () * 36, z0))
+
+/*
+** st4_vnum_bf16_0:
+**	st4h	{z0\.h - z3\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st4_vnum_bf16_0, svbfloat16x4_t, bfloat16_t,
+	    svst4_vnum_bf16 (p0, x0, 0, z0),
+	    svst4_vnum (p0, x0, 0, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st4_vnum_bf16_1:
+**	incb	x0
+**	st4h	{z0\.h - z3\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st4_vnum_bf16_1, svbfloat16x4_t, bfloat16_t,
+	    svst4_vnum_bf16 (p0, x0, 1, z0),
+	    svst4_vnum (p0, x0, 1, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st4_vnum_bf16_2:
+**	incb	x0, all, mul #2
+**	st4h	{z0\.h - z3\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st4_vnum_bf16_2, svbfloat16x4_t, bfloat16_t,
+	    svst4_vnum_bf16 (p0, x0, 2, z0),
+	    svst4_vnum (p0, x0, 2, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st4_vnum_bf16_3:
+**	incb	x0, all, mul #3
+**	st4h	{z0\.h - z3\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st4_vnum_bf16_3, svbfloat16x4_t, bfloat16_t,
+	    svst4_vnum_bf16 (p0, x0, 3, z0),
+	    svst4_vnum (p0, x0, 3, z0))
+
+/*
+** st4_vnum_bf16_4:
+**	st4h	{z0\.h - z3\.h}, p0, \[x0, #4, mul vl\]
+**	ret
+*/
+TEST_STORE (st4_vnum_bf16_4, svbfloat16x4_t, bfloat16_t,
+	    svst4_vnum_bf16 (p0, x0, 4, z0),
+	    svst4_vnum (p0, x0, 4, z0))
+
+/*
+** st4_vnum_bf16_28:
+**	st4h	{z0\.h - z3\.h}, p0, \[x0, #28, mul vl\]
+**	ret
+*/
+TEST_STORE (st4_vnum_bf16_28, svbfloat16x4_t, bfloat16_t,
+	    svst4_vnum_bf16 (p0, x0, 28, z0),
+	    svst4_vnum (p0, x0, 28, z0))
+
+/*
+** st4_vnum_bf16_32:
+**	[^{]*
+**	st4h	{z0\.h - z3\.h}, p0, \[x[0-9]+\]
+**	ret
+*/
+TEST_STORE (st4_vnum_bf16_32, svbfloat16x4_t, bfloat16_t,
+	    svst4_vnum_bf16 (p0, x0, 32, z0),
+	    svst4_vnum (p0, x0, 32, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st4_vnum_bf16_m1:
+**	decb	x0
+**	st4h	{z0\.h - z3\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st4_vnum_bf16_m1, svbfloat16x4_t, bfloat16_t,
+	    svst4_vnum_bf16 (p0, x0, -1, z0),
+	    svst4_vnum (p0, x0, -1, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st4_vnum_bf16_m2:
+**	decb	x0, all, mul #2
+**	st4h	{z0\.h - z3\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st4_vnum_bf16_m2, svbfloat16x4_t, bfloat16_t,
+	    svst4_vnum_bf16 (p0, x0, -2, z0),
+	    svst4_vnum (p0, x0, -2, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** st4_vnum_bf16_m3:
+**	decb	x0, all, mul #3
+**	st4h	{z0\.h - z3\.h}, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (st4_vnum_bf16_m3, svbfloat16x4_t, bfloat16_t,
+	    svst4_vnum_bf16 (p0, x0, -3, z0),
+	    svst4_vnum (p0, x0, -3, z0))
+
+/*
+** st4_vnum_bf16_m4:
+**	st4h	{z0\.h - z3\.h}, p0, \[x0, #-4, mul vl\]
+**	ret
+*/
+TEST_STORE (st4_vnum_bf16_m4, svbfloat16x4_t, bfloat16_t,
+	    svst4_vnum_bf16 (p0, x0, -4, z0),
+	    svst4_vnum (p0, x0, -4, z0))
+
+/*
+** st4_vnum_bf16_m32:
+**	st4h	{z0\.h - z3\.h}, p0, \[x0, #-32, mul vl\]
+**	ret
+*/
+TEST_STORE (st4_vnum_bf16_m32, svbfloat16x4_t, bfloat16_t,
+	    svst4_vnum_bf16 (p0, x0, -32, z0),
+	    svst4_vnum (p0, x0, -32, z0))
+
+/*
+** st4_vnum_bf16_m36:
+**	[^{]*
+**	st4h	{z0\.h - z3\.h}, p0, \[x[0-9]+\]
+**	ret
+*/
+TEST_STORE (st4_vnum_bf16_m36, svbfloat16x4_t, bfloat16_t,
+	    svst4_vnum_bf16 (p0, x0, -36, z0),
+	    svst4_vnum (p0, x0, -36, z0))
+
+/* Using MUL to calculate an index would also be OK.  */
+/*
+** st4_vnum_bf16_x1:
+**	cntb	(x[0-9]+)
+**	madd	(x[0-9]+), (x1, \1|\1, x1), x0
+**	st4h	{z0\.h - z3\.h}, p0, \[\2\]
+**	ret
+*/
+TEST_STORE (st4_vnum_bf16_x1, svbfloat16x4_t, bfloat16_t,
+	    svst4_vnum_bf16 (p0, x0, x1, z0),
+	    svst4_vnum (p0, x0, x1, z0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/stnt1_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/stnt1_bf16.c
new file mode 100644
index 0000000..3c4d21f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/stnt1_bf16.c
@@ -0,0 +1,158 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
+
+#include "test_sve_acle.h"
+
+/*
+** stnt1_bf16_base:
+**	stnt1h	z0\.h, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (stnt1_bf16_base, svbfloat16_t, bfloat16_t,
+	    svstnt1_bf16 (p0, x0, z0),
+	    svstnt1 (p0, x0, z0))
+
+/*
+** stnt1_bf16_index:
+**	stnt1h	z0\.h, p0, \[x0, x1, lsl 1\]
+**	ret
+*/
+TEST_STORE (stnt1_bf16_index, svbfloat16_t, bfloat16_t,
+	    svstnt1_bf16 (p0, x0 + x1, z0),
+	    svstnt1 (p0, x0 + x1, z0))
+
+/*
+** stnt1_bf16_1:
+**	stnt1h	z0\.h, p0, \[x0, #1, mul vl\]
+**	ret
+*/
+TEST_STORE (stnt1_bf16_1, svbfloat16_t, bfloat16_t,
+	    svstnt1_bf16 (p0, x0 + svcnth (), z0),
+	    svstnt1 (p0, x0 + svcnth (), z0))
+
+/*
+** stnt1_bf16_7:
+**	stnt1h	z0\.h, p0, \[x0, #7, mul vl\]
+**	ret
+*/
+TEST_STORE (stnt1_bf16_7, svbfloat16_t, bfloat16_t,
+	    svstnt1_bf16 (p0, x0 + svcnth () * 7, z0),
+	    svstnt1 (p0, x0 + svcnth () * 7, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** stnt1_bf16_8:
+**	incb	x0, all, mul #8
+**	stnt1h	z0\.h, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (stnt1_bf16_8, svbfloat16_t, bfloat16_t,
+	    svstnt1_bf16 (p0, x0 + svcnth () * 8, z0),
+	    svstnt1 (p0, x0 + svcnth () * 8, z0))
+
+/*
+** stnt1_bf16_m1:
+**	stnt1h	z0\.h, p0, \[x0, #-1, mul vl\]
+**	ret
+*/
+TEST_STORE (stnt1_bf16_m1, svbfloat16_t, bfloat16_t,
+	    svstnt1_bf16 (p0, x0 - svcnth (), z0),
+	    svstnt1 (p0, x0 - svcnth (), z0))
+
+/*
+** stnt1_bf16_m8:
+**	stnt1h	z0\.h, p0, \[x0, #-8, mul vl\]
+**	ret
+*/
+TEST_STORE (stnt1_bf16_m8, svbfloat16_t, bfloat16_t,
+	    svstnt1_bf16 (p0, x0 - svcnth () * 8, z0),
+	    svstnt1 (p0, x0 - svcnth () * 8, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** stnt1_bf16_m9:
+**	decb	x0, all, mul #9
+**	stnt1h	z0\.h, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (stnt1_bf16_m9, svbfloat16_t, bfloat16_t,
+	    svstnt1_bf16 (p0, x0 - svcnth () * 9, z0),
+	    svstnt1 (p0, x0 - svcnth () * 9, z0))
+
+/*
+** stnt1_vnum_bf16_0:
+**	stnt1h	z0\.h, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (stnt1_vnum_bf16_0, svbfloat16_t, bfloat16_t,
+	    svstnt1_vnum_bf16 (p0, x0, 0, z0),
+	    svstnt1_vnum (p0, x0, 0, z0))
+
+/*
+** stnt1_vnum_bf16_1:
+**	stnt1h	z0\.h, p0, \[x0, #1, mul vl\]
+**	ret
+*/
+TEST_STORE (stnt1_vnum_bf16_1, svbfloat16_t, bfloat16_t,
+	    svstnt1_vnum_bf16 (p0, x0, 1, z0),
+	    svstnt1_vnum (p0, x0, 1, z0))
+
+/*
+** stnt1_vnum_bf16_7:
+**	stnt1h	z0\.h, p0, \[x0, #7, mul vl\]
+**	ret
+*/
+TEST_STORE (stnt1_vnum_bf16_7, svbfloat16_t, bfloat16_t,
+	    svstnt1_vnum_bf16 (p0, x0, 7, z0),
+	    svstnt1_vnum (p0, x0, 7, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** stnt1_vnum_bf16_8:
+**	incb	x0, all, mul #8
+**	stnt1h	z0\.h, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (stnt1_vnum_bf16_8, svbfloat16_t, bfloat16_t,
+	    svstnt1_vnum_bf16 (p0, x0, 8, z0),
+	    svstnt1_vnum (p0, x0, 8, z0))
+
+/*
+** stnt1_vnum_bf16_m1:
+**	stnt1h	z0\.h, p0, \[x0, #-1, mul vl\]
+**	ret
+*/
+TEST_STORE (stnt1_vnum_bf16_m1, svbfloat16_t, bfloat16_t,
+	    svstnt1_vnum_bf16 (p0, x0, -1, z0),
+	    svstnt1_vnum (p0, x0, -1, z0))
+
+/*
+** stnt1_vnum_bf16_m8:
+**	stnt1h	z0\.h, p0, \[x0, #-8, mul vl\]
+**	ret
+*/
+TEST_STORE (stnt1_vnum_bf16_m8, svbfloat16_t, bfloat16_t,
+	    svstnt1_vnum_bf16 (p0, x0, -8, z0),
+	    svstnt1_vnum (p0, x0, -8, z0))
+
+/* Moving the constant into a register would also be OK.  */
+/*
+** stnt1_vnum_bf16_m9:
+**	decb	x0, all, mul #9
+**	stnt1h	z0\.h, p0, \[x0\]
+**	ret
+*/
+TEST_STORE (stnt1_vnum_bf16_m9, svbfloat16_t, bfloat16_t,
+	    svstnt1_vnum_bf16 (p0, x0, -9, z0),
+	    svstnt1_vnum (p0, x0, -9, z0))
+
+/* Using MUL to calculate an index would also be OK.  */
+/*
+** stnt1_vnum_bf16_x1:
+**	cntb	(x[0-9]+)
+**	madd	(x[0-9]+), (x1, \1|\1, x1), x0
+**	stnt1h	z0\.h, p0, \[\2\]
+**	ret
+*/
+TEST_STORE (stnt1_vnum_bf16_x1, svbfloat16_t, bfloat16_t,
+	    svstnt1_vnum_bf16 (p0, x0, x1, z0),
+	    svstnt1_vnum (p0, x0, x1, z0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tbl_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tbl_bf16.c
new file mode 100644
index 0000000..8c077d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tbl_bf16.c
@@ -0,0 +1,30 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** tbl_bf16_tied1:
+**	tbl	z0\.h, z0\.h, z4\.h
+**	ret
+*/
+TEST_DUAL_Z (tbl_bf16_tied1, svbfloat16_t, svuint16_t,
+	     z0 = svtbl_bf16 (z0, z4),
+	     z0 = svtbl (z0, z4))
+
+/*
+** tbl_bf16_tied2:
+**	tbl	z0\.h, z4\.h, z0\.h
+**	ret
+*/
+TEST_DUAL_Z_REV (tbl_bf16_tied2, svbfloat16_t, svuint16_t,
+		 z0_res = svtbl_bf16 (z4, z0),
+		 z0_res = svtbl (z4, z0))
+
+/*
+** tbl_bf16_untied:
+**	tbl	z0\.h, z1\.h, z4\.h
+**	ret
+*/
+TEST_DUAL_Z (tbl_bf16_untied, svbfloat16_t, svuint16_t,
+	     z0 = svtbl_bf16 (z1, z4),
+	     z0 = svtbl (z1, z4))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/trn1_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/trn1_bf16.c
new file mode 100644
index 0000000..b04c7da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/trn1_bf16.c
@@ -0,0 +1,30 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** trn1_bf16_tied1:
+**	trn1	z0\.h, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_Z (trn1_bf16_tied1, svbfloat16_t,
+		z0 = svtrn1_bf16 (z0, z1),
+		z0 = svtrn1 (z0, z1))
+
+/*
+** trn1_bf16_tied2:
+**	trn1	z0\.h, z1\.h, z0\.h
+**	ret
+*/
+TEST_UNIFORM_Z (trn1_bf16_tied2, svbfloat16_t,
+		z0 = svtrn1_bf16 (z1, z0),
+		z0 = svtrn1 (z1, z0))
+
+/*
+** trn1_bf16_untied:
+**	trn1	z0\.h, z1\.h, z2\.h
+**	ret
+*/
+TEST_UNIFORM_Z (trn1_bf16_untied, svbfloat16_t,
+		z0 = svtrn1_bf16 (z1, z2),
+		z0 = svtrn1 (z1, z2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/trn1q_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/trn1q_bf16.c
new file mode 100644
index 0000000..f1810da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/trn1q_bf16.c
@@ -0,0 +1,32 @@
+/* { dg-require-effective-target aarch64_asm_f64mm_ok } */
+/* { dg-additional-options "-march=armv8.2-a+f64mm" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** trn1q_bf16_tied1:
+**	trn1	z0\.q, z0\.q, z1\.q
+**	ret
+*/
+TEST_UNIFORM_Z (trn1q_bf16_tied1, svbfloat16_t,
+		z0 = svtrn1q_bf16 (z0, z1),
+		z0 = svtrn1q (z0, z1))
+
+/*
+** trn1q_bf16_tied2:
+**	trn1	z0\.q, z1\.q, z0\.q
+**	ret
+*/
+TEST_UNIFORM_Z (trn1q_bf16_tied2, svbfloat16_t,
+		z0 = svtrn1q_bf16 (z1, z0),
+		z0 = svtrn1q (z1, z0))
+
+/*
+** trn1q_bf16_untied:
+**	trn1	z0\.q, z1\.q, z2\.q
+**	ret
+*/
+TEST_UNIFORM_Z (trn1q_bf16_untied, svbfloat16_t,
+		z0 = svtrn1q_bf16 (z1, z2),
+		z0 = svtrn1q (z1, z2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/trn2_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/trn2_bf16.c
new file mode 100644
index 0000000..12028b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/trn2_bf16.c
@@ -0,0 +1,30 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** trn2_bf16_tied1:
+**	trn2	z0\.h, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_Z (trn2_bf16_tied1, svbfloat16_t,
+		z0 = svtrn2_bf16 (z0, z1),
+		z0 = svtrn2 (z0, z1))
+
+/*
+** trn2_bf16_tied2:
+**	trn2	z0\.h, z1\.h, z0\.h
+**	ret
+*/
+TEST_UNIFORM_Z (trn2_bf16_tied2, svbfloat16_t,
+		z0 = svtrn2_bf16 (z1, z0),
+		z0 = svtrn2 (z1, z0))
+
+/*
+** trn2_bf16_untied:
+**	trn2	z0\.h, z1\.h, z2\.h
+**	ret
+*/
+TEST_UNIFORM_Z (trn2_bf16_untied, svbfloat16_t,
+		z0 = svtrn2_bf16 (z1, z2),
+		z0 = svtrn2 (z1, z2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/trn2q_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/trn2q_bf16.c
new file mode 100644
index 0000000..5623b54
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/trn2q_bf16.c
@@ -0,0 +1,32 @@
+/* { dg-require-effective-target aarch64_asm_f64mm_ok } */
+/* { dg-additional-options "-march=armv8.2-a+f64mm" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** trn2q_bf16_tied1:
+**	trn2	z0\.q, z0\.q, z1\.q
+**	ret
+*/
+TEST_UNIFORM_Z (trn2q_bf16_tied1, svbfloat16_t,
+		z0 = svtrn2q_bf16 (z0, z1),
+		z0 = svtrn2q (z0, z1))
+
+/*
+** trn2q_bf16_tied2:
+**	trn2	z0\.q, z1\.q, z0\.q
+**	ret
+*/
+TEST_UNIFORM_Z (trn2q_bf16_tied2, svbfloat16_t,
+		z0 = svtrn2q_bf16 (z1, z0),
+		z0 = svtrn2q (z1, z0))
+
+/*
+** trn2q_bf16_untied:
+**	trn2	z0\.q, z1\.q, z2\.q
+**	ret
+*/
+TEST_UNIFORM_Z (trn2q_bf16_untied, svbfloat16_t,
+		z0 = svtrn2q_bf16 (z1, z2),
+		z0 = svtrn2q (z1, z2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/undef2_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/undef2_1.c
index c1439bf..fe6c4c7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/undef2_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/undef2_1.c
@@ -38,6 +38,13 @@ TEST_UNDEF (float16, svfloat16x2_t,
 	    z0 = svundef2_f16 ())
 
 /*
+** bfloat16:
+**	ret
+*/
+TEST_UNDEF (bfloat16, svbfloat16x2_t,
+	    z0 = svundef2_bf16 ())
+
+/*
 ** int32:
 **	ret
 */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/undef3_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/undef3_1.c
index 1944d5c..5c18c63 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/undef3_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/undef3_1.c
@@ -38,6 +38,13 @@ TEST_UNDEF (float16, svfloat16x3_t,
 	    z0 = svundef3_f16 ())
 
 /*
+** bfloat16:
+**	ret
+*/
+TEST_UNDEF (bfloat16, svbfloat16x3_t,
+	    z0 = svundef3_bf16 ())
+
+/*
 ** int32:
 **	ret
 */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/undef4_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/undef4_1.c
index b745e13..4d6b86b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/undef4_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/undef4_1.c
@@ -38,6 +38,13 @@ TEST_UNDEF (float16, svfloat16x4_t,
 	    z0 = svundef4_f16 ())
 
 /*
+** bfloat16:
+**	ret
+*/
+TEST_UNDEF (bfloat16, svbfloat16x4_t,
+	    z0 = svundef4_bf16 ())
+
+/*
 ** int32:
 **	ret
 */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/undef_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/undef_1.c
index 9c80791..62873b6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/undef_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/undef_1.c
@@ -38,6 +38,13 @@ TEST_UNDEF (float16, svfloat16_t,
 	    z0 = svundef_f16 ())
 
 /*
+** bfloat16:
+**	ret
+*/
+TEST_UNDEF (bfloat16, svbfloat16_t,
+	    z0 = svundef_bf16 ())
+
+/*
 ** int32:
 **	ret
 */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/uzp1_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/uzp1_bf16.c
new file mode 100644
index 0000000..19d43ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/uzp1_bf16.c
@@ -0,0 +1,30 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** uzp1_bf16_tied1:
+**	uzp1	z0\.h, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_Z (uzp1_bf16_tied1, svbfloat16_t,
+		z0 = svuzp1_bf16 (z0, z1),
+		z0 = svuzp1 (z0, z1))
+
+/*
+** uzp1_bf16_tied2:
+**	uzp1	z0\.h, z1\.h, z0\.h
+**	ret
+*/
+TEST_UNIFORM_Z (uzp1_bf16_tied2, svbfloat16_t,
+		z0 = svuzp1_bf16 (z1, z0),
+		z0 = svuzp1 (z1, z0))
+
+/*
+** uzp1_bf16_untied:
+**	uzp1	z0\.h, z1\.h, z2\.h
+**	ret
+*/
+TEST_UNIFORM_Z (uzp1_bf16_untied, svbfloat16_t,
+		z0 = svuzp1_bf16 (z1, z2),
+		z0 = svuzp1 (z1, z2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/uzp1q_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/uzp1q_bf16.c
new file mode 100644
index 0000000..30a1992
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/uzp1q_bf16.c
@@ -0,0 +1,32 @@
+/* { dg-require-effective-target aarch64_asm_f64mm_ok } */
+/* { dg-additional-options "-march=armv8.2-a+f64mm" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** uzp1q_bf16_tied1:
+**	uzp1	z0\.q, z0\.q, z1\.q
+**	ret
+*/
+TEST_UNIFORM_Z (uzp1q_bf16_tied1, svbfloat16_t,
+		z0 = svuzp1q_bf16 (z0, z1),
+		z0 = svuzp1q (z0, z1))
+
+/*
+** uzp1q_bf16_tied2:
+**	uzp1	z0\.q, z1\.q, z0\.q
+**	ret
+*/
+TEST_UNIFORM_Z (uzp1q_bf16_tied2, svbfloat16_t,
+		z0 = svuzp1q_bf16 (z1, z0),
+		z0 = svuzp1q (z1, z0))
+
+/*
+** uzp1q_bf16_untied:
+**	uzp1	z0\.q, z1\.q, z2\.q
+**	ret
+*/
+TEST_UNIFORM_Z (uzp1q_bf16_untied, svbfloat16_t,
+		z0 = svuzp1q_bf16 (z1, z2),
+		z0 = svuzp1q (z1, z2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/uzp2_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/uzp2_bf16.c
new file mode 100644
index 0000000..b5566bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/uzp2_bf16.c
@@ -0,0 +1,30 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** uzp2_bf16_tied1:
+**	uzp2	z0\.h, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_Z (uzp2_bf16_tied1, svbfloat16_t,
+		z0 = svuzp2_bf16 (z0, z1),
+		z0 = svuzp2 (z0, z1))
+
+/*
+** uzp2_bf16_tied2:
+**	uzp2	z0\.h, z1\.h, z0\.h
+**	ret
+*/
+TEST_UNIFORM_Z (uzp2_bf16_tied2, svbfloat16_t,
+		z0 = svuzp2_bf16 (z1, z0),
+		z0 = svuzp2 (z1, z0))
+
+/*
+** uzp2_bf16_untied:
+**	uzp2	z0\.h, z1\.h, z2\.h
+**	ret
+*/
+TEST_UNIFORM_Z (uzp2_bf16_untied, svbfloat16_t,
+		z0 = svuzp2_bf16 (z1, z2),
+		z0 = svuzp2 (z1, z2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/uzp2q_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/uzp2q_bf16.c
new file mode 100644
index 0000000..bbac53a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/uzp2q_bf16.c
@@ -0,0 +1,32 @@
+/* { dg-require-effective-target aarch64_asm_f64mm_ok } */
+/* { dg-additional-options "-march=armv8.2-a+f64mm" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** uzp2q_bf16_tied1:
+**	uzp2	z0\.q, z0\.q, z1\.q
+**	ret
+*/
+TEST_UNIFORM_Z (uzp2q_bf16_tied1, svbfloat16_t,
+		z0 = svuzp2q_bf16 (z0, z1),
+		z0 = svuzp2q (z0, z1))
+
+/*
+** uzp2q_bf16_tied2:
+**	uzp2	z0\.q, z1\.q, z0\.q
+**	ret
+*/
+TEST_UNIFORM_Z (uzp2q_bf16_tied2, svbfloat16_t,
+		z0 = svuzp2q_bf16 (z1, z0),
+		z0 = svuzp2q (z1, z0))
+
+/*
+** uzp2q_bf16_untied:
+**	uzp2	z0\.q, z1\.q, z2\.q
+**	ret
+*/
+TEST_UNIFORM_Z (uzp2q_bf16_untied, svbfloat16_t,
+		z0 = svuzp2q_bf16 (z1, z2),
+		z0 = svuzp2q (z1, z2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/zip1_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/zip1_bf16.c
new file mode 100644
index 0000000..6017cde
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/zip1_bf16.c
@@ -0,0 +1,30 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** zip1_bf16_tied1:
+**	zip1	z0\.h, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_Z (zip1_bf16_tied1, svbfloat16_t,
+		z0 = svzip1_bf16 (z0, z1),
+		z0 = svzip1 (z0, z1))
+
+/*
+** zip1_bf16_tied2:
+**	zip1	z0\.h, z1\.h, z0\.h
+**	ret
+*/
+TEST_UNIFORM_Z (zip1_bf16_tied2, svbfloat16_t,
+		z0 = svzip1_bf16 (z1, z0),
+		z0 = svzip1 (z1, z0))
+
+/*
+** zip1_bf16_untied:
+**	zip1	z0\.h, z1\.h, z2\.h
+**	ret
+*/
+TEST_UNIFORM_Z (zip1_bf16_untied, svbfloat16_t,
+		z0 = svzip1_bf16 (z1, z2),
+		z0 = svzip1 (z1, z2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/zip1q_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/zip1q_bf16.c
new file mode 100644
index 0000000..aabf7c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/zip1q_bf16.c
@@ -0,0 +1,32 @@
+/* { dg-require-effective-target aarch64_asm_f64mm_ok } */
+/* { dg-additional-options "-march=armv8.2-a+f64mm" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** zip1q_bf16_tied1:
+**	zip1	z0\.q, z0\.q, z1\.q
+**	ret
+*/
+TEST_UNIFORM_Z (zip1q_bf16_tied1, svbfloat16_t,
+		z0 = svzip1q_bf16 (z0, z1),
+		z0 = svzip1q (z0, z1))
+
+/*
+** zip1q_bf16_tied2:
+**	zip1	z0\.q, z1\.q, z0\.q
+**	ret
+*/
+TEST_UNIFORM_Z (zip1q_bf16_tied2, svbfloat16_t,
+		z0 = svzip1q_bf16 (z1, z0),
+		z0 = svzip1q (z1, z0))
+
+/*
+** zip1q_bf16_untied:
+**	zip1	z0\.q, z1\.q, z2\.q
+**	ret
+*/
+TEST_UNIFORM_Z (zip1q_bf16_untied, svbfloat16_t,
+		z0 = svzip1q_bf16 (z1, z2),
+		z0 = svzip1q (z1, z2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/zip2_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/zip2_bf16.c
new file mode 100644
index 0000000..a9e0cfc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/zip2_bf16.c
@@ -0,0 +1,30 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** zip2_bf16_tied1:
+**	zip2	z0\.h, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_Z (zip2_bf16_tied1, svbfloat16_t,
+		z0 = svzip2_bf16 (z0, z1),
+		z0 = svzip2 (z0, z1))
+
+/*
+** zip2_bf16_tied2:
+**	zip2	z0\.h, z1\.h, z0\.h
+**	ret
+*/
+TEST_UNIFORM_Z (zip2_bf16_tied2, svbfloat16_t,
+		z0 = svzip2_bf16 (z1, z0),
+		z0 = svzip2 (z1, z0))
+
+/*
+** zip2_bf16_untied:
+**	zip2	z0\.h, z1\.h, z2\.h
+**	ret
+*/
+TEST_UNIFORM_Z (zip2_bf16_untied, svbfloat16_t,
+		z0 = svzip2_bf16 (z1, z2),
+		z0 = svzip2 (z1, z2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/zip2q_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/zip2q_bf16.c
new file mode 100644
index 0000000..6d79136
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/zip2q_bf16.c
@@ -0,0 +1,32 @@
+/* { dg-require-effective-target aarch64_asm_f64mm_ok } */
+/* { dg-additional-options "-march=armv8.2-a+f64mm" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** zip2q_bf16_tied1:
+**	zip2	z0\.q, z0\.q, z1\.q
+**	ret
+*/
+TEST_UNIFORM_Z (zip2q_bf16_tied1, svbfloat16_t,
+		z0 = svzip2q_bf16 (z0, z1),
+		z0 = svzip2q (z0, z1))
+
+/*
+** zip2q_bf16_tied2:
+**	zip2	z0\.q, z1\.q, z0\.q
+**	ret
+*/
+TEST_UNIFORM_Z (zip2q_bf16_tied2, svbfloat16_t,
+		z0 = svzip2q_bf16 (z1, z0),
+		z0 = svzip2q (z1, z0))
+
+/*
+** zip2q_bf16_untied:
+**	zip2	z0\.q, z1\.q, z2\.q
+**	ret
+*/
+TEST_UNIFORM_Z (zip2q_bf16_untied, svbfloat16_t,
+		z0 = svzip2q_bf16 (z1, z2),
+		z0 = svzip2q (z1, z2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_1.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_1.c
index 1172be5..12ae767 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_1.c
@@ -12,6 +12,7 @@ svuint8_t ret_u8 (void) { return svdup_u8 (0); }
 svuint16_t ret_u16 (void) { return svdup_u16 (0); }
 svuint32_t ret_u32 (void) { return svdup_u32 (0); }
 svuint64_t ret_u64 (void) { return svdup_u64 (0); }
+svbfloat16_t ret_bf16 (void) { return svundef_bf16 (); }
 svfloat16_t ret_f16 (void) { return svdup_f16 (0); }
 svfloat32_t ret_f32 (void) { return svdup_f32 (0); }
 svfloat64_t ret_f64 (void) { return svdup_f64 (0); }
@@ -24,6 +25,7 @@ svuint8x2_t ret_u8x2 (void) { return svundef2_u8 (); }
 svuint16x2_t ret_u16x2 (void) { return svundef2_u16 (); }
 svuint32x2_t ret_u32x2 (void) { return svundef2_u32 (); }
 svuint64x2_t ret_u64x2 (void) { return svundef2_u64 (); }
+svbfloat16x2_t ret_bf16x2 (void) { return svundef2_bf16 (); }
 svfloat16x2_t ret_f16x2 (void) { return svundef2_f16 (); }
 svfloat32x2_t ret_f32x2 (void) { return svundef2_f32 (); }
 svfloat64x2_t ret_f64x2 (void) { return svundef2_f64 (); }
@@ -36,6 +38,7 @@ svuint8x3_t ret_u8x3 (void) { return svundef3_u8 (); }
 svuint16x3_t ret_u16x3 (void) { return svundef3_u16 (); }
 svuint32x3_t ret_u32x3 (void) { return svundef3_u32 (); }
 svuint64x3_t ret_u64x3 (void) { return svundef3_u64 (); }
+svbfloat16x3_t ret_bf16x3 (void) { return svundef3_bf16 (); }
 svfloat16x3_t ret_f16x3 (void) { return svundef3_f16 (); }
 svfloat32x3_t ret_f32x3 (void) { return svundef3_f32 (); }
 svfloat64x3_t ret_f64x3 (void) { return svundef3_f64 (); }
@@ -48,6 +51,7 @@ svuint8x4_t ret_u8x4 (void) { return svundef4_u8 (); }
 svuint16x4_t ret_u16x4 (void) { return svundef4_u16 (); }
 svuint32x4_t ret_u32x4 (void) { return svundef4_u32 (); }
 svuint64x4_t ret_u64x4 (void) { return svundef4_u64 (); }
+svbfloat16x4_t ret_bf16x4 (void) { return svundef4_bf16 (); }
 svfloat16x4_t ret_f16x4 (void) { return svundef4_f16 (); }
 svfloat32x4_t ret_f32x4 (void) { return svundef4_f32 (); }
 svfloat64x4_t ret_f64x4 (void) { return svundef4_f64 (); }
@@ -62,6 +66,7 @@ svfloat64x4_t ret_f64x4 (void) { return svundef4_f64 (); }
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_u16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_u32\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_u64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_bf16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_f16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_f32\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_f64\n} } } */
@@ -74,6 +79,7 @@ svfloat64x4_t ret_f64x4 (void) { return svundef4_f64 (); }
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_u16x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_u32x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_u64x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_bf16x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_f16x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_f32x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_f64x2\n} } } */
@@ -87,6 +93,7 @@ svfloat64x4_t ret_f64x4 (void) { return svundef4_f64 (); }
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_u16x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_u32x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_u64x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_bf16x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_f16x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_f32x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_f64x3\n} } } */
@@ -99,6 +106,7 @@ svfloat64x4_t ret_f64x4 (void) { return svundef4_f64 (); }
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_u16x4\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_u32x4\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_u64x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_bf16x4\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_f16x4\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_f32x4\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tret_f64x4\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_2.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_2.c
index 6f10f90..9f0741e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_2.c
@@ -12,6 +12,7 @@ void fn_u8 (svuint8_t x) {}
 void fn_u16 (svuint16_t x) {}
 void fn_u32 (svuint32_t x) {}
 void fn_u64 (svuint64_t x) {}
+void fn_bf16 (svbfloat16_t x) {}
 void fn_f16 (svfloat16_t x) {}
 void fn_f32 (svfloat32_t x) {}
 void fn_f64 (svfloat64_t x) {}
@@ -24,6 +25,7 @@ void fn_u8x2 (svuint8x2_t x) {}
 void fn_u16x2 (svuint16x2_t x) {}
 void fn_u32x2 (svuint32x2_t x) {}
 void fn_u64x2 (svuint64x2_t x) {}
+void fn_bf16x2 (svbfloat16x2_t x) {}
 void fn_f16x2 (svfloat16x2_t x) {}
 void fn_f32x2 (svfloat32x2_t x) {}
 void fn_f64x2 (svfloat64x2_t x) {}
@@ -36,6 +38,7 @@ void fn_u8x3 (svuint8x3_t x) {}
 void fn_u16x3 (svuint16x3_t x) {}
 void fn_u32x3 (svuint32x3_t x) {}
 void fn_u64x3 (svuint64x3_t x) {}
+void fn_bf16x3 (svbfloat16x3_t x) {}
 void fn_f16x3 (svfloat16x3_t x) {}
 void fn_f32x3 (svfloat32x3_t x) {}
 void fn_f64x3 (svfloat64x3_t x) {}
@@ -48,6 +51,7 @@ void fn_u8x4 (svuint8x4_t x) {}
 void fn_u16x4 (svuint16x4_t x) {}
 void fn_u32x4 (svuint32x4_t x) {}
 void fn_u64x4 (svuint64x4_t x) {}
+void fn_bf16x4 (svbfloat16x4_t x) {}
 void fn_f16x4 (svfloat16x4_t x) {}
 void fn_f32x4 (svfloat32x4_t x) {}
 void fn_f64x4 (svfloat64x4_t x) {}
@@ -62,6 +66,7 @@ void fn_f64x4 (svfloat64x4_t x) {}
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_bf16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64\n} } } */
@@ -74,6 +79,7 @@ void fn_f64x4 (svfloat64x4_t x) {}
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_bf16x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x2\n} } } */
@@ -86,6 +92,7 @@ void fn_f64x4 (svfloat64x4_t x) {}
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_bf16x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x3\n} } } */
@@ -98,6 +105,7 @@ void fn_f64x4 (svfloat64x4_t x) {}
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x4\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x4\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_bf16x4\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x4\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x4\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x4\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_3.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_3.c
index d922a8a..42e7860 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_3.c
@@ -10,6 +10,7 @@ void fn_u8 (float d0, float d1, float d2, float d3, svuint8_t x) {}
 void fn_u16 (float d0, float d1, float d2, float d3, svuint16_t x) {}
 void fn_u32 (float d0, float d1, float d2, float d3, svuint32_t x) {}
 void fn_u64 (float d0, float d1, float d2, float d3, svuint64_t x) {}
+void fn_bf16 (float d0, float d1, float d2, float d3, svbfloat16_t x) {}
 void fn_f16 (float d0, float d1, float d2, float d3, svfloat16_t x) {}
 void fn_f32 (float d0, float d1, float d2, float d3, svfloat32_t x) {}
 void fn_f64 (float d0, float d1, float d2, float d3, svfloat64_t x) {}
@@ -22,6 +23,7 @@ void fn_u8x2 (float d0, float d1, float d2, float d3, svuint8x2_t x) {}
 void fn_u16x2 (float d0, float d1, float d2, float d3, svuint16x2_t x) {}
 void fn_u32x2 (float d0, float d1, float d2, float d3, svuint32x2_t x) {}
 void fn_u64x2 (float d0, float d1, float d2, float d3, svuint64x2_t x) {}
+void fn_bf16x2 (float d0, float d1, float d2, float d3, svbfloat16x2_t x) {}
 void fn_f16x2 (float d0, float d1, float d2, float d3, svfloat16x2_t x) {}
 void fn_f32x2 (float d0, float d1, float d2, float d3, svfloat32x2_t x) {}
 void fn_f64x2 (float d0, float d1, float d2, float d3, svfloat64x2_t x) {}
@@ -34,6 +36,7 @@ void fn_u8x3 (float d0, float d1, float d2, float d3, svuint8x3_t x) {}
 void fn_u16x3 (float d0, float d1, float d2, float d3, svuint16x3_t x) {}
 void fn_u32x3 (float d0, float d1, float d2, float d3, svuint32x3_t x) {}
 void fn_u64x3 (float d0, float d1, float d2, float d3, svuint64x3_t x) {}
+void fn_bf16x3 (float d0, float d1, float d2, float d3, svbfloat16x3_t x) {}
 void fn_f16x3 (float d0, float d1, float d2, float d3, svfloat16x3_t x) {}
 void fn_f32x3 (float d0, float d1, float d2, float d3, svfloat32x3_t x) {}
 void fn_f64x3 (float d0, float d1, float d2, float d3, svfloat64x3_t x) {}
@@ -46,6 +49,7 @@ void fn_u8x4 (float d0, float d1, float d2, float d3, svuint8x4_t x) {}
 void fn_u16x4 (float d0, float d1, float d2, float d3, svuint16x4_t x) {}
 void fn_u32x4 (float d0, float d1, float d2, float d3, svuint32x4_t x) {}
 void fn_u64x4 (float d0, float d1, float d2, float d3, svuint64x4_t x) {}
+void fn_bf16x4 (float d0, float d1, float d2, float d3, svbfloat16x4_t x) {}
 void fn_f16x4 (float d0, float d1, float d2, float d3, svfloat16x4_t x) {}
 void fn_f32x4 (float d0, float d1, float d2, float d3, svfloat32x4_t x) {}
 void fn_f64x4 (float d0, float d1, float d2, float d3, svfloat64x4_t x) {}
@@ -58,6 +62,7 @@ void fn_f64x4 (float d0, float d1, float d2, float d3, svfloat64x4_t x) {}
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_bf16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64\n} } } */
@@ -70,6 +75,7 @@ void fn_f64x4 (float d0, float d1, float d2, float d3, svfloat64x4_t x) {}
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_bf16x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x2\n} } } */
@@ -82,6 +88,7 @@ void fn_f64x4 (float d0, float d1, float d2, float d3, svfloat64x4_t x) {}
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_bf16x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x3\n} } } */
@@ -94,6 +101,7 @@ void fn_f64x4 (float d0, float d1, float d2, float d3, svfloat64x4_t x) {}
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x4\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x4\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_bf16x4\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x4\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x4\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x4\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_4.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_4.c
index d057158..7e4438e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_4.c
@@ -18,6 +18,8 @@ void fn_u32 (float d0, float d1, float d2, float d3,
 	     float d4, svuint32_t x) {}
 void fn_u64 (float d0, float d1, float d2, float d3,
 	     float d4, svuint64_t x) {}
+void fn_bf16 (float d0, float d1, float d2, float d3,
+	      float d4, svbfloat16_t x) {}
 void fn_f16 (float d0, float d1, float d2, float d3,
 	     float d4, svfloat16_t x) {}
 void fn_f32 (float d0, float d1, float d2, float d3,
@@ -41,6 +43,8 @@ void fn_u32x2 (float d0, float d1, float d2, float d3,
 	       float d4, svuint32x2_t x) {}
 void fn_u64x2 (float d0, float d1, float d2, float d3,
 	       float d4, svuint64x2_t x) {}
+void fn_bf16x2 (float d0, float d1, float d2, float d3,
+		float d4, svbfloat16x2_t x) {}
 void fn_f16x2 (float d0, float d1, float d2, float d3,
 	       float d4, svfloat16x2_t x) {}
 void fn_f32x2 (float d0, float d1, float d2, float d3,
@@ -64,6 +68,8 @@ void fn_u32x3 (float d0, float d1, float d2, float d3,
 	       float d4, svuint32x3_t x) {}
 void fn_u64x3 (float d0, float d1, float d2, float d3,
 	       float d4, svuint64x3_t x) {}
+void fn_bf16x3 (float d0, float d1, float d2, float d3,
+		float d4, svbfloat16x3_t x) {}
 void fn_f16x3 (float d0, float d1, float d2, float d3,
 	       float d4, svfloat16x3_t x) {}
 void fn_f32x3 (float d0, float d1, float d2, float d3,
@@ -87,6 +93,8 @@ void fn_u32x4 (float d0, float d1, float d2, float d3,
 	       float d4, svuint32x4_t x) {}
 void fn_u64x4 (float d0, float d1, float d2, float d3,
 	       float d4, svuint64x4_t x) {}
+void fn_bf16x4 (float d0, float d1, float d2, float d3,
+		float d4, svbfloat16x4_t x) {}
 void fn_f16x4 (float d0, float d1, float d2, float d3,
 	       float d4, svfloat16x4_t x) {}
 void fn_f32x4 (float d0, float d1, float d2, float d3,
@@ -102,6 +110,7 @@ void fn_f64x4 (float d0, float d1, float d2, float d3,
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_bf16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64\n} } } */
@@ -114,6 +123,7 @@ void fn_f64x4 (float d0, float d1, float d2, float d3,
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_bf16x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x2\n} } } */
@@ -126,6 +136,7 @@ void fn_f64x4 (float d0, float d1, float d2, float d3,
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_bf16x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x3\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x3\n} } } */
@@ -138,6 +149,7 @@ void fn_f64x4 (float d0, float d1, float d2, float d3,
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x4\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x4\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_bf16x4\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x4\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x4\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x4\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_5.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_5.c
index 3523528..6dadc04 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_5.c
@@ -18,6 +18,8 @@ void fn_u32 (float d0, float d1, float d2, float d3,
 	     float d4, float d5, svuint32_t x) {}
 void fn_u64 (float d0, float d1, float d2, float d3,
 	     float d4, float d5, svuint64_t x) {}
+void fn_bf16 (float d0, float d1, float d2, float d3,
+	      float d4, float d5, svbfloat16_t x) {}
 void fn_f16 (float d0, float d1, float d2, float d3,
 	     float d4, float d5, svfloat16_t x) {}
 void fn_f32 (float d0, float d1, float d2, float d3,
@@ -41,6 +43,8 @@ void fn_u32x2 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, svuint32x2_t x) {}
 void fn_u64x2 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, svuint64x2_t x) {}
+void fn_bf16x2 (float d0, float d1, float d2, float d3,
+		float d4, float d5, svbfloat16x2_t x) {}
 void fn_f16x2 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, svfloat16x2_t x) {}
 void fn_f32x2 (float d0, float d1, float d2, float d3,
@@ -64,6 +68,8 @@ void fn_u32x3 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, svuint32x3_t x) {}
 void fn_u64x3 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, svuint64x3_t x) {}
+void fn_bf16x3 (float d0, float d1, float d2, float d3,
+		float d4, float d5, svbfloat16x3_t x) {}
 void fn_f16x3 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, svfloat16x3_t x) {}
 void fn_f32x3 (float d0, float d1, float d2, float d3,
@@ -87,6 +93,8 @@ void fn_u32x4 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, svuint32x4_t x) {}
 void fn_u64x4 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, svuint64x4_t x) {}
+void fn_bf16x4 (float d0, float d1, float d2, float d3,
+		float d4, float d5, svbfloat16x4_t x) {}
 void fn_f16x4 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, svfloat16x4_t x) {}
 void fn_f32x4 (float d0, float d1, float d2, float d3,
@@ -102,6 +110,7 @@ void fn_f64x4 (float d0, float d1, float d2, float d3,
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_bf16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64\n} } } */
@@ -114,6 +123,7 @@ void fn_f64x4 (float d0, float d1, float d2, float d3,
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_bf16x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x2\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x2\n} } } */
@@ -126,6 +136,7 @@ void fn_f64x4 (float d0, float d1, float d2, float d3,
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x3\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x3\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_bf16x3\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x3\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x3\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x3\n} } } */
@@ -138,6 +149,7 @@ void fn_f64x4 (float d0, float d1, float d2, float d3,
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x4\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x4\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_bf16x4\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x4\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x4\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x4\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_6.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_6.c
index 1f89dce..0ff73e2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_6.c
@@ -18,6 +18,8 @@ void fn_u32 (float d0, float d1, float d2, float d3,
 	     float d4, float d5, float d6, svuint32_t x) {}
 void fn_u64 (float d0, float d1, float d2, float d3,
 	     float d4, float d5, float d6, svuint64_t x) {}
+void fn_bf16 (float d0, float d1, float d2, float d3,
+	      float d4, float d5, float d6, svbfloat16_t x) {}
 void fn_f16 (float d0, float d1, float d2, float d3,
 	     float d4, float d5, float d6, svfloat16_t x) {}
 void fn_f32 (float d0, float d1, float d2, float d3,
@@ -41,6 +43,8 @@ void fn_u32x2 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, svuint32x2_t x) {}
 void fn_u64x2 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, svuint64x2_t x) {}
+void fn_bf16x2 (float d0, float d1, float d2, float d3,
+		float d4, float d5, float d6, svbfloat16x2_t x) {}
 void fn_f16x2 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, svfloat16x2_t x) {}
 void fn_f32x2 (float d0, float d1, float d2, float d3,
@@ -64,6 +68,8 @@ void fn_u32x3 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, svuint32x3_t x) {}
 void fn_u64x3 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, svuint64x3_t x) {}
+void fn_bf16x3 (float d0, float d1, float d2, float d3,
+		float d4, float d5, float d6, svbfloat16x3_t x) {}
 void fn_f16x3 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, svfloat16x3_t x) {}
 void fn_f32x3 (float d0, float d1, float d2, float d3,
@@ -87,6 +93,8 @@ void fn_u32x4 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, svuint32x4_t x) {}
 void fn_u64x4 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, svuint64x4_t x) {}
+void fn_bf16x4 (float d0, float d1, float d2, float d3,
+		float d4, float d5, float d6, svbfloat16x4_t x) {}
 void fn_f16x4 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, svfloat16x4_t x) {}
 void fn_f32x4 (float d0, float d1, float d2, float d3,
@@ -102,6 +110,7 @@ void fn_f64x4 (float d0, float d1, float d2, float d3,
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_bf16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32\n} } } */
 /* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64\n} } } */
@@ -114,6 +123,7 @@ void fn_f64x4 (float d0, float d1, float d2, float d3,
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x2\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x2\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x2\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_bf16x2\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x2\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x2\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x2\n} } } */
@@ -126,6 +136,7 @@ void fn_f64x4 (float d0, float d1, float d2, float d3,
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x3\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x3\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_bf16x3\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x3\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x3\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x3\n} } } */
@@ -138,6 +149,7 @@ void fn_f64x4 (float d0, float d1, float d2, float d3,
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x4\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x4\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_bf16x4\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x4\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x4\n} } } */
 /* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x4\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_7.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_7.c
index e67d180..4f3ff81 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_7.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/annotate_7.c
@@ -18,6 +18,8 @@ void fn_u32 (float d0, float d1, float d2, float d3,
 	     float d4, float d5, float d6, float d7, svuint32_t x) {}
 void fn_u64 (float d0, float d1, float d2, float d3,
 	     float d4, float d5, float d6, float d7, svuint64_t x) {}
+void fn_bf16 (float d0, float d1, float d2, float d3,
+	      float d4, float d5, float d6, float d7, svbfloat16_t x) {}
 void fn_f16 (float d0, float d1, float d2, float d3,
 	     float d4, float d5, float d6, float d7, svfloat16_t x) {}
 void fn_f32 (float d0, float d1, float d2, float d3,
@@ -41,6 +43,8 @@ void fn_u32x2 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, float d7, svuint32x2_t x) {}
 void fn_u64x2 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, float d7, svuint64x2_t x) {}
+void fn_bf16x2 (float d0, float d1, float d2, float d3,
+		float d4, float d5, float d6, float d7, svbfloat16x2_t x) {}
 void fn_f16x2 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, float d7, svfloat16x2_t x) {}
 void fn_f32x2 (float d0, float d1, float d2, float d3,
@@ -64,6 +68,8 @@ void fn_u32x3 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, float d7, svuint32x3_t x) {}
 void fn_u64x3 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, float d7, svuint64x3_t x) {}
+void fn_bf16x3 (float d0, float d1, float d2, float d3,
+		float d4, float d5, float d6, float d7, svbfloat16x3_t x) {}
 void fn_f16x3 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, float d7, svfloat16x3_t x) {}
 void fn_f32x3 (float d0, float d1, float d2, float d3,
@@ -87,6 +93,8 @@ void fn_u32x4 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, float d7, svuint32x4_t x) {}
 void fn_u64x4 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, float d7, svuint64x4_t x) {}
+void fn_bf16x4 (float d0, float d1, float d2, float d3,
+		float d4, float d5, float d6, float d7, svbfloat16x4_t x) {}
 void fn_f16x4 (float d0, float d1, float d2, float d3,
 	       float d4, float d5, float d6, float d7, svfloat16x4_t x) {}
 void fn_f32x4 (float d0, float d1, float d2, float d3,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_5_be_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_5_be_bf16.c
new file mode 100644
index 0000000..e9b63a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_5_be_bf16.c
@@ -0,0 +1,63 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+**	addvl	sp, sp, #-1
+**	str	p4, \[sp\]
+**	ptrue	p4\.b, all
+** (
+**	ld1h	(z[0-9]+\.h), p4/z, \[x1, #1, mul vl\]
+**	ld1h	(z[0-9]+\.h), p4/z, \[x1\]
+**	st2h	{\2 - \1}, p0, \[x0\]
+** |
+**	ld1h	(z[0-9]+\.h), p4/z, \[x1\]
+**	ld1h	(z[0-9]+\.h), p4/z, \[x1, #1, mul vl\]
+**	st2h	{\3 - \4}, p0, \[x0\]
+** )
+**	st4h	{z0\.h - z3\.h}, p1, \[x0\]
+**	st3h	{z4\.h - z6\.h}, p2, \[x0\]
+**	st1h	z7\.h, p3, \[x0\]
+**	ldr	p4, \[sp\]
+**	addvl	sp, sp, #1
+**	ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svbfloat16x4_t z0, svbfloat16x3_t z4, svbfloat16x2_t stack,
+	svbfloat16_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+  svst2 (p0, x0, stack);
+  svst4 (p1, x0, z0);
+  svst3 (p2, x0, z4);
+  svst1_bf16 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+  svbool_t pg;
+  pg = svptrue_b8 ();
+  callee (x0,
+	  svld4_vnum_bf16 (pg, x0, -8),
+	  svld3_vnum_bf16 (pg, x0, -3),
+	  svld2_vnum_bf16 (pg, x0, 0),
+	  svld1_vnum_bf16 (pg, x0, 2),
+	  svptrue_pat_b8 (SV_VL1),
+	  svptrue_pat_b16 (SV_VL2),
+	  svptrue_pat_b32 (SV_VL3),
+	  svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4h\t{z0\.h - z3\.h}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z4\.h - z6\.h}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz7\.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{(z[0-9]+\.h) - z[0-9]+\.h}.*\tst1h\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+\.h - (z[0-9]+\.h)}.*\tst1h\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_5_le_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_5_le_bf16.c
new file mode 100644
index 0000000..94d84df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_5_le_bf16.c
@@ -0,0 +1,58 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** (
+**	ldr	(z[0-9]+), \[x1, #1, mul vl\]
+**	ldr	(z[0-9]+), \[x1\]
+**	st2h	{\2\.h - \1\.h}, p0, \[x0\]
+** |
+**	ldr	(z[0-9]+), \[x1\]
+**	ldr	(z[0-9]+), \[x1, #1, mul vl\]
+**	st2h	{\3\.h - \4\.h}, p0, \[x0\]
+** )
+**	st4h	{z0\.h - z3\.h}, p1, \[x0\]
+**	st3h	{z4\.h - z6\.h}, p2, \[x0\]
+**	st1h	z7\.h, p3, \[x0\]
+**	ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svbfloat16x4_t z0, svbfloat16x3_t z4, svbfloat16x2_t stack,
+	svbfloat16_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+  svst2 (p0, x0, stack);
+  svst4 (p1, x0, z0);
+  svst3 (p2, x0, z4);
+  svst1_bf16 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+  svbool_t pg;
+  pg = svptrue_b8 ();
+  callee (x0,
+	  svld4_vnum_bf16 (pg, x0, -8),
+	  svld3_vnum_bf16 (pg, x0, -3),
+	  svld2_vnum_bf16 (pg, x0, 0),
+	  svld1_vnum_bf16 (pg, x0, 2),
+	  svptrue_pat_b8 (SV_VL1),
+	  svptrue_pat_b16 (SV_VL2),
+	  svptrue_pat_b32 (SV_VL3),
+	  svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4h\t{z0\.h - z3\.h}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z4\.h - z6\.h}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz7\.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{(z[0-9]+)\.h - z[0-9]+\.h}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+\.h - (z[0-9]+)\.h}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_6_be_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_6_be_bf16.c
new file mode 100644
index 0000000..84d2c40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_6_be_bf16.c
@@ -0,0 +1,71 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+**	ptrue	p3\.b, all
+**	...
+**	ld1h	(z[0-9]+\.h), p3/z, \[x1, #3, mul vl\]
+**	...
+**	st4h	{z[0-9]+\.h - \1}, p0, \[x0\]
+**	st2h	{z3\.h - z4\.h}, p1, \[x0\]
+**	st3h	{z5\.h - z7\.h}, p2, \[x0\]
+**	ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svbfloat16x3_t z0, svbfloat16x2_t z3, svbfloat16x3_t z5,
+	 svbfloat16x4_t stack1, svbfloat16_t stack2, svbool_t p0,
+	 svbool_t p1, svbool_t p2)
+{
+  svst4_bf16 (p0, x0, stack1);
+  svst2_bf16 (p1, x0, z3);
+  svst3_bf16 (p2, x0, z5);
+}
+
+/*
+** callee2:
+**	ptrue	p3\.b, all
+**	ld1h	(z[0-9]+\.h), p3/z, \[x2\]
+**	st1h	\1, p0, \[x0\]
+**	st2h	{z3\.h - z4\.h}, p1, \[x0\]
+**	st3h	{z0\.h - z2\.h}, p2, \[x0\]
+**	ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svbfloat16x3_t z0, svbfloat16x2_t z3, svbfloat16x3_t z5,
+	 svbfloat16x4_t stack1, svbfloat16_t stack2, svbool_t p0,
+	 svbool_t p1, svbool_t p2)
+{
+  svst1_bf16 (p0, x0, stack2);
+  svst2_bf16 (p1, x0, z3);
+  svst3_bf16 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+  svbool_t pg;
+  pg = svptrue_b8 ();
+  callee1 (x0,
+	   svld3_vnum_bf16 (pg, x0, -9),
+	   svld2_vnum_bf16 (pg, x0, -2),
+	   svld3_vnum_bf16 (pg, x0, 0),
+	   svld4_vnum_bf16 (pg, x0, 8),
+	   svld1_vnum_bf16 (pg, x0, 5),
+	   svptrue_pat_b8 (SV_VL1),
+	   svptrue_pat_b16 (SV_VL2),
+	   svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3h\t{z0\.h - z2\.h}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{z3\.h - z4\.h}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z5\.h - z7\.h}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{(z[0-9]+\.h) - z[0-9]+\.h}.*\tst1h\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+\.h - (z[0-9]+\.h)}.*\tst1h\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\t(z[0-9]+\.h), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1h\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_6_le_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_6_le_bf16.c
new file mode 100644
index 0000000..3dc9e42
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_6_le_bf16.c
@@ -0,0 +1,70 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+**	...
+**	ldr	(z[0-9]+), \[x1, #3, mul vl\]
+**	...
+**	st4h	{z[0-9]+\.h - \1\.h}, p0, \[x0\]
+**	st2h	{z3\.h - z4\.h}, p1, \[x0\]
+**	st3h	{z5\.h - z7\.h}, p2, \[x0\]
+**	ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svbfloat16x3_t z0, svbfloat16x2_t z3, svbfloat16x3_t z5,
+	 svbfloat16x4_t stack1, svbfloat16_t stack2, svbool_t p0,
+	 svbool_t p1, svbool_t p2)
+{
+  svst4_bf16 (p0, x0, stack1);
+  svst2_bf16 (p1, x0, z3);
+  svst3_bf16 (p2, x0, z5);
+}
+
+/*
+** callee2:
+**	ptrue	p3\.b, all
+**	ld1h	(z[0-9]+\.h), p3/z, \[x2\]
+**	st1h	\1, p0, \[x0\]
+**	st2h	{z3\.h - z4\.h}, p1, \[x0\]
+**	st3h	{z0\.h - z2\.h}, p2, \[x0\]
+**	ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svbfloat16x3_t z0, svbfloat16x2_t z3, svbfloat16x3_t z5,
+	 svbfloat16x4_t stack1, svbfloat16_t stack2, svbool_t p0,
+	 svbool_t p1, svbool_t p2)
+{
+  svst1_bf16 (p0, x0, stack2);
+  svst2_bf16 (p1, x0, z3);
+  svst3_bf16 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+  svbool_t pg;
+  pg = svptrue_b8 ();
+  callee1 (x0,
+	   svld3_vnum_bf16 (pg, x0, -9),
+	   svld2_vnum_bf16 (pg, x0, -2),
+	   svld3_vnum_bf16 (pg, x0, 0),
+	   svld4_vnum_bf16 (pg, x0, 8),
+	   svld1_vnum_bf16 (pg, x0, 5),
+	   svptrue_pat_b8 (SV_VL1),
+	   svptrue_pat_b16 (SV_VL2),
+	   svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3h\t{z0\.h - z2\.h}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{z3\.h - z4\.h}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z5\.h - z7\.h}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{(z[0-9]+)\.h - z[0-9]+\.h}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+\.h - (z[0-9]+)\.h}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\t(z[0-9]+\.h), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1h\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/gnu_vectors_1.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/gnu_vectors_1.c
index 6bf9e77..e5fceb1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/gnu_vectors_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/gnu_vectors_1.c
@@ -2,6 +2,7 @@
 
 #include <arm_sve.h>
 
+typedef bfloat16_t bfloat16x16_t __attribute__((vector_size (32)));
 typedef float16_t float16x16_t __attribute__((vector_size (32)));
 typedef float32_t float32x8_t __attribute__((vector_size (32)));
 typedef float64_t float64x4_t __attribute__((vector_size (32)));
@@ -14,6 +15,7 @@ typedef uint16_t uint16x16_t __attribute__((vector_size (32)));
 typedef uint32_t uint32x8_t __attribute__((vector_size (32)));
 typedef uint64_t uint64x4_t __attribute__((vector_size (32)));
 
+void bfloat16_callee (bfloat16x16_t);
 void float16_callee (float16x16_t);
 void float32_callee (float32x8_t);
 void float64_callee (float64x4_t);
@@ -27,6 +29,12 @@ void uint32_callee (uint32x8_t);
 void uint64_callee (uint64x4_t);
 
 void
+bfloat16_caller (bfloat16_t val)
+{
+  bfloat16_callee (svdup_bf16 (val));
+}
+
+void
 float16_caller (void)
 {
   float16_callee (svdup_f16 (1.0));
@@ -93,7 +101,7 @@ uint64_caller (void)
 }
 
 /* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7], \[x0\]} 2 } } */
-/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h, p[0-7], \[x0\]} 3 } } */
+/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h, p[0-7], \[x0\]} 4 } } */
 /* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x0\]} 3 } } */
 /* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x0\]} 3 } } */
-/* { dg-final { scan-assembler-times {\tadd\tx0, sp, #?16\n} 11 } } */
+/* { dg-final { scan-assembler-times {\tadd\tx0, sp, #?16\n} 12 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/gnu_vectors_2.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/gnu_vectors_2.c
index dc2d000..875567f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/gnu_vectors_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/gnu_vectors_2.c
@@ -2,6 +2,7 @@
 
 #include <arm_sve.h>
 
+typedef bfloat16_t bfloat16x16_t __attribute__((vector_size (32)));
 typedef float16_t float16x16_t __attribute__((vector_size (32)));
 typedef float32_t float32x8_t __attribute__((vector_size (32)));
 typedef float64_t float64x4_t __attribute__((vector_size (32)));
@@ -14,6 +15,7 @@ typedef uint16_t uint16x16_t __attribute__((vector_size (32)));
 typedef uint32_t uint32x8_t __attribute__((vector_size (32)));
 typedef uint64_t uint64x4_t __attribute__((vector_size (32)));
 
+void bfloat16_callee (svbfloat16_t);
 void float16_callee (svfloat16_t);
 void float32_callee (svfloat32_t);
 void float64_callee (svfloat64_t);
@@ -27,6 +29,12 @@ void uint32_callee (svuint32_t);
 void uint64_callee (svuint64_t);
 
 void
+bfloat16_caller (bfloat16x16_t arg)
+{
+  bfloat16_callee (arg);
+}
+
+void
 float16_caller (float16x16_t arg)
 {
   float16_callee (arg);
@@ -93,7 +101,7 @@ uint64_caller (uint64x4_t arg)
 }
 
 /* { dg-final { scan-assembler-times {\tld1b\tz0\.b, p[0-7]/z, \[x0\]} 2 } } */
-/* { dg-final { scan-assembler-times {\tld1h\tz0\.h, p[0-7]/z, \[x0\]} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz0\.h, p[0-7]/z, \[x0\]} 4 } } */
 /* { dg-final { scan-assembler-times {\tld1w\tz0\.s, p[0-7]/z, \[x0\]} 3 } } */
 /* { dg-final { scan-assembler-times {\tld1d\tz0\.d, p[0-7]/z, \[x0\]} 3 } } */
 /* { dg-final { scan-assembler-not {\tst1[bhwd]\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4.c
index 8c111ae..00eb2cb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4.c
@@ -50,6 +50,14 @@ CALLEE (u16, __SVUint16_t)
 CALLEE (f16, __SVFloat16_t)
 
 /*
+** callee_bf16:
+**	ptrue	(p[0-7])\.b, all
+**	ld1h	z0\.h, \1/z, \[x0\]
+**	ret
+*/
+CALLEE (bf16, __SVBfloat16_t)
+
+/*
 ** callee_s32:
 **	ptrue	(p[0-7])\.b, all
 **	ld1w	z0\.s, \1/z, \[x0\]
@@ -107,6 +115,14 @@ CALLEE (f64, __SVFloat64_t)
     return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
   }
 
+#define CALLER_BF16(SUFFIX, TYPE)				\
+  typeof (svlasta (svptrue_b8 (), *(TYPE *) 0))			\
+  __attribute__((noipa))					\
+  caller_##SUFFIX (TYPE *ptr1)					\
+  {								\
+    return svlasta (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
+  }
+
 /*
 ** caller_s8:
 **	...
@@ -167,6 +183,17 @@ CALLER (u16, __SVUint16_t)
 CALLER (f16, __SVFloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	ptrue	(p[0-7])\.b, all
+**	lasta	h0, \1, z0\.h
+**	ldp	x29, x30, \[sp\], 16
+**	ret
+*/
+CALLER_BF16 (bf16, __SVBfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_1024.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_1024.c
index c9c2fa9..4351963 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_1024.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_1024.c
@@ -50,6 +50,14 @@ CALLEE (u16, __SVUint16_t)
 CALLEE (f16, __SVFloat16_t)
 
 /*
+** callee_bf16:
+**	ptrue	(p[0-7])\.b, vl128
+**	ld1h	z0\.h, \1/z, \[x0\]
+**	ret
+*/
+CALLEE (bf16, __SVBfloat16_t)
+
+/*
 ** callee_s32:
 **	ptrue	(p[0-7])\.b, vl128
 **	ld1w	z0\.s, \1/z, \[x0\]
@@ -107,6 +115,14 @@ CALLEE (f64, __SVFloat64_t)
     return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
   }
 
+#define CALLER_BF16(SUFFIX, TYPE)				\
+  typeof (svlasta (svptrue_b8 (), *(TYPE *) 0))			\
+  __attribute__((noipa))					\
+  caller_##SUFFIX (TYPE *ptr1)					\
+  {								\
+    return svlasta (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
+  }
+
 /*
 ** caller_s8:
 **	...
@@ -167,6 +183,17 @@ CALLER (u16, __SVUint16_t)
 CALLER (f16, __SVFloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	ptrue	(p[0-7])\.b, vl128
+**	lasta	h0, \1, z0\.h
+**	ldp	x29, x30, \[sp\], 16
+**	ret
+*/
+CALLER_BF16 (bf16, __SVBfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_128.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_128.c
index 964c257..6b49022 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_128.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_128.c
@@ -50,6 +50,14 @@ CALLEE (u16, __SVUint16_t)
 CALLEE (f16, __SVFloat16_t)
 
 /*
+** callee_bf16:
+**	ptrue	(p[0-7])\.b, vl16
+**	ld1h	z0\.h, \1/z, \[x0\]
+**	ret
+*/
+CALLEE (bf16, __SVBfloat16_t)
+
+/*
 ** callee_s32:
 **	ptrue	(p[0-7])\.b, vl16
 **	ld1w	z0\.s, \1/z, \[x0\]
@@ -107,6 +115,14 @@ CALLEE (f64, __SVFloat64_t)
     return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
   }
 
+#define CALLER_BF16(SUFFIX, TYPE)				\
+  typeof (svlasta (svptrue_b8 (), *(TYPE *) 0))			\
+  __attribute__((noipa))					\
+  caller_##SUFFIX (TYPE *ptr1)					\
+  {								\
+    return svlasta (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
+  }
+
 /*
 ** caller_s8:
 **	...
@@ -167,6 +183,17 @@ CALLER (u16, __SVUint16_t)
 CALLER (f16, __SVFloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	ptrue	(p[0-7])\.b, vl16
+**	lasta	h0, \1, z0\.h
+**	ldp	x29, x30, \[sp\], 16
+**	ret
+*/
+CALLER_BF16 (bf16, __SVBfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_2048.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_2048.c
index 475ac8f..8256645 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_2048.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_2048.c
@@ -50,6 +50,14 @@ CALLEE (u16, __SVUint16_t)
 CALLEE (f16, __SVFloat16_t)
 
 /*
+** callee_bf16:
+**	ptrue	(p[0-7])\.b, vl256
+**	ld1h	z0\.h, \1/z, \[x0\]
+**	ret
+*/
+CALLEE (bf16, __SVBfloat16_t)
+
+/*
 ** callee_s32:
 **	ptrue	(p[0-7])\.b, vl256
 **	ld1w	z0\.s, \1/z, \[x0\]
@@ -107,6 +115,14 @@ CALLEE (f64, __SVFloat64_t)
     return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
   }
 
+#define CALLER_BF16(SUFFIX, TYPE)				\
+  typeof (svlasta (svptrue_b8 (), *(TYPE *) 0))			\
+  __attribute__((noipa))					\
+  caller_##SUFFIX (TYPE *ptr1)					\
+  {								\
+    return svlasta (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
+  }
+
 /*
 ** caller_s8:
 **	...
@@ -167,6 +183,17 @@ CALLER (u16, __SVUint16_t)
 CALLER (f16, __SVFloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	ptrue	(p[0-7])\.b, vl256
+**	lasta	h0, \1, z0\.h
+**	ldp	x29, x30, \[sp\], 16
+**	ret
+*/
+CALLER_BF16 (bf16, __SVBfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_256.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_256.c
index dd01831..1e0f6bb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_256.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_256.c
@@ -50,6 +50,14 @@ CALLEE (u16, __SVUint16_t)
 CALLEE (f16, __SVFloat16_t)
 
 /*
+** callee_bf16:
+**	ptrue	(p[0-7])\.b, vl32
+**	ld1h	z0\.h, \1/z, \[x0\]
+**	ret
+*/
+CALLEE (bf16, __SVBfloat16_t)
+
+/*
 ** callee_s32:
 **	ptrue	(p[0-7])\.b, vl32
 **	ld1w	z0\.s, \1/z, \[x0\]
@@ -107,6 +115,14 @@ CALLEE (f64, __SVFloat64_t)
     return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
   }
 
+#define CALLER_BF16(SUFFIX, TYPE)				\
+  typeof (svlasta (svptrue_b8 (), *(TYPE *) 0))			\
+  __attribute__((noipa))					\
+  caller_##SUFFIX (TYPE *ptr1)					\
+  {								\
+    return svlasta (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
+  }
+
 /*
 ** caller_s8:
 **	...
@@ -167,6 +183,17 @@ CALLER (u16, __SVUint16_t)
 CALLER (f16, __SVFloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	ptrue	(p[0-7])\.b, vl32
+**	lasta	h0, \1, z0\.h
+**	ldp	x29, x30, \[sp\], 16
+**	ret
+*/
+CALLER_BF16 (bf16, __SVBfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_512.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_512.c
index 04cdc9e..5b58ed7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_512.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_512.c
@@ -50,6 +50,14 @@ CALLEE (u16, __SVUint16_t)
 CALLEE (f16, __SVFloat16_t)
 
 /*
+** callee_bf16:
+**	ptrue	(p[0-7])\.b, vl64
+**	ld1h	z0\.h, \1/z, \[x0\]
+**	ret
+*/
+CALLEE (bf16, __SVBfloat16_t)
+
+/*
 ** callee_s32:
 **	ptrue	(p[0-7])\.b, vl64
 **	ld1w	z0\.s, \1/z, \[x0\]
@@ -107,6 +115,14 @@ CALLEE (f64, __SVFloat64_t)
     return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
   }
 
+#define CALLER_BF16(SUFFIX, TYPE)				\
+  typeof (svlasta (svptrue_b8 (), *(TYPE *) 0))			\
+  __attribute__((noipa))					\
+  caller_##SUFFIX (TYPE *ptr1)					\
+  {								\
+    return svlasta (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
+  }
+
 /*
 ** caller_s8:
 **	...
@@ -167,6 +183,17 @@ CALLER (u16, __SVUint16_t)
 CALLER (f16, __SVFloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	ptrue	(p[0-7])\.b, vl64
+**	lasta	h0, \1, z0\.h
+**	ldp	x29, x30, \[sp\], 16
+**	ret
+*/
+CALLER_BF16 (bf16, __SVBfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5.c
index 17365a6..55c78e1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5.c
@@ -52,6 +52,14 @@ CALLEE (u16, svuint16_t)
 CALLEE (f16, svfloat16_t)
 
 /*
+** callee_bf16:
+**	ptrue	(p[0-7])\.b, all
+**	ld1h	z0\.h, \1/z, \[x0\]
+**	ret
+*/
+CALLEE (bf16, svbfloat16_t)
+
+/*
 ** callee_s32:
 **	ptrue	(p[0-7])\.b, all
 **	ld1w	z0\.s, \1/z, \[x0\]
@@ -107,6 +115,14 @@ CALLEE (f64, svfloat64_t)
     return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
   }
 
+#define CALLER_BF16(SUFFIX, TYPE)				\
+  typeof (svlasta (svptrue_b8 (), *(TYPE *) 0))			\
+  __attribute__((noipa))					\
+  caller_##SUFFIX (TYPE *ptr1)					\
+  {								\
+    return svlasta (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
+  }
+
 /*
 ** caller_s8:
 **	...
@@ -167,6 +183,17 @@ CALLER (u16, svuint16_t)
 CALLER (f16, svfloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	ptrue	(p[0-7])\.b, all
+**	lasta	h0, \1, z0\.h
+**	ldp	x29, x30, \[sp\], 16
+**	ret
+*/
+CALLER_BF16 (bf16, svbfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_1024.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_1024.c
index 2af5fbc..52e9916 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_1024.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_1024.c
@@ -52,6 +52,14 @@ CALLEE (u16, svuint16_t)
 CALLEE (f16, svfloat16_t)
 
 /*
+** callee_bf16:
+**	ptrue	(p[0-7])\.b, vl128
+**	ld1h	z0\.h, \1/z, \[x0\]
+**	ret
+*/
+CALLEE (bf16, svbfloat16_t)
+
+/*
 ** callee_s32:
 **	ptrue	(p[0-7])\.b, vl128
 **	ld1w	z0\.s, \1/z, \[x0\]
@@ -107,6 +115,14 @@ CALLEE (f64, svfloat64_t)
     return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
   }
 
+#define CALLER_BF16(SUFFIX, TYPE)				\
+  typeof (svlasta (svptrue_b8 (), *(TYPE *) 0))			\
+  __attribute__((noipa))					\
+  caller_##SUFFIX (TYPE *ptr1)					\
+  {								\
+    return svlasta (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
+  }
+
 /*
 ** caller_s8:
 **	...
@@ -167,6 +183,17 @@ CALLER (u16, svuint16_t)
 CALLER (f16, svfloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	ptrue	(p[0-7])\.b, vl128
+**	lasta	h0, \1, z0\.h
+**	ldp	x29, x30, \[sp\], 16
+**	ret
+*/
+CALLER_BF16 (bf16, svbfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_128.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_128.c
index df61b63..cfb2f38 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_128.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_128.c
@@ -52,6 +52,14 @@ CALLEE (u16, svuint16_t)
 CALLEE (f16, svfloat16_t)
 
 /*
+** callee_bf16:
+**	ptrue	(p[0-7])\.b, vl16
+**	ld1h	z0\.h, \1/z, \[x0\]
+**	ret
+*/
+CALLEE (bf16, svbfloat16_t)
+
+/*
 ** callee_s32:
 **	ptrue	(p[0-7])\.b, vl16
 **	ld1w	z0\.s, \1/z, \[x0\]
@@ -107,6 +115,14 @@ CALLEE (f64, svfloat64_t)
     return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
   }
 
+#define CALLER_BF16(SUFFIX, TYPE)				\
+  typeof (svlasta (svptrue_b8 (), *(TYPE *) 0))			\
+  __attribute__((noipa))					\
+  caller_##SUFFIX (TYPE *ptr1)					\
+  {								\
+    return svlasta (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
+  }
+
 /*
 ** caller_s8:
 **	...
@@ -167,6 +183,17 @@ CALLER (u16, svuint16_t)
 CALLER (f16, svfloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	ptrue	(p[0-7])\.b, vl16
+**	lasta	h0, \1, z0\.h
+**	ldp	x29, x30, \[sp\], 16
+**	ret
+*/
+CALLER_BF16 (bf16, svbfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_2048.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_2048.c
index a8ae430..6f37d9d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_2048.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_2048.c
@@ -52,6 +52,14 @@ CALLEE (u16, svuint16_t)
 CALLEE (f16, svfloat16_t)
 
 /*
+** callee_bf16:
+**	ptrue	(p[0-7])\.b, vl256
+**	ld1h	z0\.h, \1/z, \[x0\]
+**	ret
+*/
+CALLEE (bf16, svbfloat16_t)
+
+/*
 ** callee_s32:
 **	ptrue	(p[0-7])\.b, vl256
 **	ld1w	z0\.s, \1/z, \[x0\]
@@ -107,6 +115,14 @@ CALLEE (f64, svfloat64_t)
     return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
   }
 
+#define CALLER_BF16(SUFFIX, TYPE)				\
+  typeof (svlasta (svptrue_b8 (), *(TYPE *) 0))			\
+  __attribute__((noipa))					\
+  caller_##SUFFIX (TYPE *ptr1)					\
+  {								\
+    return svlasta (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
+  }
+
 /*
 ** caller_s8:
 **	...
@@ -167,6 +183,17 @@ CALLER (u16, svuint16_t)
 CALLER (f16, svfloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	ptrue	(p[0-7])\.b, vl256
+**	lasta	h0, \1, z0\.h
+**	ldp	x29, x30, \[sp\], 16
+**	ret
+*/
+CALLER_BF16 (bf16, svbfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_256.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_256.c
index 52db4e5..7ba094e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_256.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_256.c
@@ -52,6 +52,14 @@ CALLEE (u16, svuint16_t)
 CALLEE (f16, svfloat16_t)
 
 /*
+** callee_bf16:
+**	ptrue	(p[0-7])\.b, vl32
+**	ld1h	z0\.h, \1/z, \[x0\]
+**	ret
+*/
+CALLEE (bf16, svbfloat16_t)
+
+/*
 ** callee_s32:
 **	ptrue	(p[0-7])\.b, vl32
 **	ld1w	z0\.s, \1/z, \[x0\]
@@ -107,6 +115,14 @@ CALLEE (f64, svfloat64_t)
     return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
   }
 
+#define CALLER_BF16(SUFFIX, TYPE)				\
+  typeof (svlasta (svptrue_b8 (), *(TYPE *) 0))			\
+  __attribute__((noipa))					\
+  caller_##SUFFIX (TYPE *ptr1)					\
+  {								\
+    return svlasta (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
+  }
+
 /*
 ** caller_s8:
 **	...
@@ -167,6 +183,17 @@ CALLER (u16, svuint16_t)
 CALLER (f16, svfloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	ptrue	(p[0-7])\.b, vl32
+**	lasta	h0, \1, z0\.h
+**	ldp	x29, x30, \[sp\], 16
+**	ret
+*/
+CALLER_BF16 (bf16, svbfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_512.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_512.c
index 9294081..36b14d4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_512.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_512.c
@@ -52,6 +52,14 @@ CALLEE (u16, svuint16_t)
 CALLEE (f16, svfloat16_t)
 
 /*
+** callee_bf16:
+**	ptrue	(p[0-7])\.b, vl64
+**	ld1h	z0\.h, \1/z, \[x0\]
+**	ret
+*/
+CALLEE (bf16, svbfloat16_t)
+
+/*
 ** callee_s32:
 **	ptrue	(p[0-7])\.b, vl64
 **	ld1w	z0\.s, \1/z, \[x0\]
@@ -107,6 +115,14 @@ CALLEE (f64, svfloat64_t)
     return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
   }
 
+#define CALLER_BF16(SUFFIX, TYPE)				\
+  typeof (svlasta (svptrue_b8 (), *(TYPE *) 0))			\
+  __attribute__((noipa))					\
+  caller_##SUFFIX (TYPE *ptr1)					\
+  {								\
+    return svlasta (svptrue_b8 (), callee_##SUFFIX (ptr1));	\
+  }
+
 /*
 ** caller_s8:
 **	...
@@ -167,6 +183,17 @@ CALLER (u16, svuint16_t)
 CALLER (f16, svfloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	ptrue	(p[0-7])\.b, vl64
+**	lasta	h0, \1, z0\.h
+**	ldp	x29, x30, \[sp\], 16
+**	ret
+*/
+CALLER_BF16 (bf16, svbfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6.c
index 93dac99..72468ea 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6.c
@@ -10,6 +10,7 @@ typedef uint8_t svuint8_t __attribute__ ((vector_size (32)));
 typedef int16_t svint16_t __attribute__ ((vector_size (32)));
 typedef uint16_t svuint16_t __attribute__ ((vector_size (32)));
 typedef __fp16 svfloat16_t __attribute__ ((vector_size (32)));
+typedef __bf16 svbfloat16_t __attribute__ ((vector_size (32)));
 
 typedef int32_t svint32_t __attribute__ ((vector_size (32)));
 typedef uint32_t svuint32_t __attribute__ ((vector_size (32)));
@@ -81,6 +82,9 @@ CALLEE (u16, svuint16_t)
 /* Currently we scalarize this.  */
 CALLEE (f16, svfloat16_t)
 
+/* Currently we scalarize this.  */
+CALLEE (bf16, svbfloat16_t)
+
 /*
 ** callee_s32:
 ** (
@@ -198,6 +202,16 @@ CALLER (u16, svuint16_t)
 CALLER (f16, svfloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	ldr	h0, \[sp, 16\]
+**	ldp	x29, x30, \[sp\], 48
+**	ret
+*/
+CALLER (bf16, svbfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_1024.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_1024.c
index b019080..b6f267e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_1024.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_1024.c
@@ -10,6 +10,7 @@ typedef uint8_t svuint8_t __attribute__ ((vector_size (128)));
 typedef int16_t svint16_t __attribute__ ((vector_size (128)));
 typedef uint16_t svuint16_t __attribute__ ((vector_size (128)));
 typedef __fp16 svfloat16_t __attribute__ ((vector_size (128)));
+typedef __bf16 svbfloat16_t __attribute__ ((vector_size (128)));
 
 typedef int32_t svint32_t __attribute__ ((vector_size (128)));
 typedef uint32_t svuint32_t __attribute__ ((vector_size (128)));
@@ -72,6 +73,15 @@ CALLEE (u16, svuint16_t)
 CALLEE (f16, svfloat16_t)
 
 /*
+** callee_bf16:
+**	ptrue	(p[0-7])\.b, vl128
+**	ld1h	z0\.h, \1/z, \[x0\]
+**	st1h	z0\.h, \1, \[x8\]
+**	ret
+*/
+CALLEE (bf16, svbfloat16_t)
+
+/*
 ** callee_s32:
 **	ptrue	(p[0-7])\.b, vl128
 **	ld1w	z0\.s, \1/z, \[x0\]
@@ -193,6 +203,18 @@ CALLER (u16, svuint16_t)
 CALLER (f16, svfloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	...
+**	ld1h	(z[0-9]+\.h), (p[0-7])/z, \[[^]]*\]
+**	st1h	\1, \2, \[[^]]*\]
+**	...
+**	ret
+*/
+CALLER (bf16, svbfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_128.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_128.c
index cbb89d4..fd83845 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_128.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_128.c
@@ -10,6 +10,7 @@ typedef uint8_t svuint8_t __attribute__ ((vector_size (16)));
 typedef int16_t svint16_t __attribute__ ((vector_size (16)));
 typedef uint16_t svuint16_t __attribute__ ((vector_size (16)));
 typedef __fp16 svfloat16_t __attribute__ ((vector_size (16)));
+typedef __bf16 svbfloat16_t __attribute__ ((vector_size (16)));
 
 typedef int32_t svint32_t __attribute__ ((vector_size (16)));
 typedef uint32_t svuint32_t __attribute__ ((vector_size (16)));
@@ -62,6 +63,13 @@ CALLEE (u16, svuint16_t)
 CALLEE (f16, svfloat16_t)
 
 /*
+** callee_bf16:
+**	ldr	q0, \[x0\]
+**	ret
+*/
+CALLEE (bf16, svbfloat16_t)
+
+/*
 ** callee_s32:
 **	ldr	q0, \[x0\]
 **	ret
@@ -166,6 +174,17 @@ CALLER (u16, svuint16_t)
 CALLER (f16, svfloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	...
+**	str	q0, \[[^]]*\]
+**	...
+**	ret
+*/
+CALLER (bf16, svbfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_2048.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_2048.c
index 21a3d47..46b7d68 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_2048.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_2048.c
@@ -10,6 +10,7 @@ typedef uint8_t svuint8_t __attribute__ ((vector_size (256)));
 typedef int16_t svint16_t __attribute__ ((vector_size (256)));
 typedef uint16_t svuint16_t __attribute__ ((vector_size (256)));
 typedef __fp16 svfloat16_t __attribute__ ((vector_size (256)));
+typedef __bf16 svbfloat16_t __attribute__ ((vector_size (256)));
 
 typedef int32_t svint32_t __attribute__ ((vector_size (256)));
 typedef uint32_t svuint32_t __attribute__ ((vector_size (256)));
@@ -72,6 +73,15 @@ CALLEE (u16, svuint16_t)
 CALLEE (f16, svfloat16_t)
 
 /*
+** callee_bf16:
+**	ptrue	(p[0-7])\.b, vl256
+**	ld1h	z0\.h, \1/z, \[x0\]
+**	st1h	z0\.h, \1, \[x8\]
+**	ret
+*/
+CALLEE (bf16, svbfloat16_t)
+
+/*
 ** callee_s32:
 **	ptrue	(p[0-7])\.b, vl256
 **	ld1w	z0\.s, \1/z, \[x0\]
@@ -193,6 +203,18 @@ CALLER (u16, svuint16_t)
 CALLER (f16, svfloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	...
+**	ld1h	(z[0-9]+\.h), (p[0-7])/z, \[[^]]*\]
+**	st1h	\1, \2, \[[^]]*\]
+**	...
+**	ret
+*/
+CALLER (bf16, svbfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_256.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_256.c
index d495cfb..0487249 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_256.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_256.c
@@ -10,6 +10,7 @@ typedef uint8_t svuint8_t __attribute__ ((vector_size (32)));
 typedef int16_t svint16_t __attribute__ ((vector_size (32)));
 typedef uint16_t svuint16_t __attribute__ ((vector_size (32)));
 typedef __fp16 svfloat16_t __attribute__ ((vector_size (32)));
+typedef __bf16 svbfloat16_t __attribute__ ((vector_size (32)));
 
 typedef int32_t svint32_t __attribute__ ((vector_size (32)));
 typedef uint32_t svuint32_t __attribute__ ((vector_size (32)));
@@ -72,6 +73,15 @@ CALLEE (u16, svuint16_t)
 CALLEE (f16, svfloat16_t)
 
 /*
+** callee_bf16:
+**	ptrue	(p[0-7])\.b, vl32
+**	ld1h	z0\.h, \1/z, \[x0\]
+**	st1h	z0\.h, \1, \[x8\]
+**	ret
+*/
+CALLEE (bf16, svbfloat16_t)
+
+/*
 ** callee_s32:
 **	ptrue	(p[0-7])\.b, vl32
 **	ld1w	z0\.s, \1/z, \[x0\]
@@ -193,6 +203,18 @@ CALLER (u16, svuint16_t)
 CALLER (f16, svfloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	...
+**	ld1h	(z[0-9]+\.h), (p[0-7])/z, \[[^]]*\]
+**	st1h	\1, \2, \[[^]]*\]
+**	...
+**	ret
+*/
+CALLER (bf16, svbfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_512.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_512.c
index be572f2..9817d85 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_512.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_6_512.c
@@ -10,6 +10,7 @@ typedef uint8_t svuint8_t __attribute__ ((vector_size (64)));
 typedef int16_t svint16_t __attribute__ ((vector_size (64)));
 typedef uint16_t svuint16_t __attribute__ ((vector_size (64)));
 typedef __fp16 svfloat16_t __attribute__ ((vector_size (64)));
+typedef __bf16 svbfloat16_t __attribute__ ((vector_size (64)));
 
 typedef int32_t svint32_t __attribute__ ((vector_size (64)));
 typedef uint32_t svuint32_t __attribute__ ((vector_size (64)));
@@ -72,6 +73,15 @@ CALLEE (u16, svuint16_t)
 CALLEE (f16, svfloat16_t)
 
 /*
+** callee_bf16:
+**	ptrue	(p[0-7])\.b, vl64
+**	ld1h	z0\.h, \1/z, \[x0\]
+**	st1h	z0\.h, \1, \[x8\]
+**	ret
+*/
+CALLEE (bf16, svbfloat16_t)
+
+/*
 ** callee_s32:
 **	ptrue	(p[0-7])\.b, vl64
 **	ld1w	z0\.s, \1/z, \[x0\]
@@ -193,6 +203,18 @@ CALLER (u16, svuint16_t)
 CALLER (f16, svfloat16_t)
 
 /*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	...
+**	ld1h	(z[0-9]+\.h), (p[0-7])/z, \[[^]]*\]
+**	st1h	\1, \2, \[[^]]*\]
+**	...
+**	ret
+*/
+CALLER (bf16, svbfloat16_t)
+
+/*
 ** caller_s32:
 **	...
 **	bl	callee_s32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_7.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_7.c
index d03ef69..55456a3 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_7.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_7.c
@@ -145,6 +145,34 @@ caller_f16 (void)
 }
 
 /*
+** callee_bf16:
+**	mov	z0\.h, h2
+**	mov	z1\.h, h3
+**	ret
+*/
+svbfloat16x2_t __attribute__((noipa))
+callee_bf16 (bfloat16_t h0, bfloat16_t h1, bfloat16_t h2, bfloat16_t h3)
+{
+  return svcreate2 (svdup_bf16 (h2), svdup_bf16 (h3));
+}
+
+/*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	zip2	z0\.h, z1\.h, z0\.h
+**	ldp	x29, x30, \[sp\], 16
+**	ret
+*/
+svbfloat16_t __attribute__((noipa))
+caller_bf16 (bfloat16_t h0, bfloat16_t h1, bfloat16_t h2, bfloat16_t h3)
+{
+  svbfloat16x2_t res;
+  res = callee_bf16 (h0, h1, h2, h3);
+  return svzip2 (svget2 (res, 1), svget2 (res, 0));
+}
+
+/*
 ** callee_s32:
 **	mov	z0\.s, #1
 **	mov	z1\.s, #2
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_8.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_8.c
index 6a094bb..9581811 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_8.c
@@ -160,6 +160,35 @@ caller_f16 (void)
 }
 
 /*
+** callee_bf16:
+**	mov	z0\.h, h0
+**	mov	z1\.h, h1
+**	mov	z2\.h, h2
+**	ret
+*/
+svbfloat16x3_t __attribute__((noipa))
+callee_bf16 (bfloat16_t h0, bfloat16_t h1, bfloat16_t h2)
+{
+  return svcreate3 (svdup_bf16 (h0), svdup_bf16 (h1), svdup_bf16 (h2));
+}
+
+/*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	trn2	z0\.h, z0\.h, z2\.h
+**	ldp	x29, x30, \[sp\], 16
+**	ret
+*/
+svbfloat16_t __attribute__((noipa))
+caller_bf16 (bfloat16_t h0, bfloat16_t h1, bfloat16_t h2)
+{
+  svbfloat16x3_t res;
+  res = callee_bf16 (h0, h1, h2);
+  return svtrn2 (svget3 (res, 0), svget3 (res, 2));
+}
+
+/*
 ** callee_s32:
 **	mov	z0\.s, #1
 **	mov	z1\.s, #2
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_9.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_9.c
index caadbb9..ad32e1f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_9.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_9.c
@@ -185,6 +185,39 @@ caller_f16 (void)
 }
 
 /*
+** callee_bf16:
+**	mov	z0\.h, h4
+**	mov	z1\.h, h5
+**	mov	z2\.h, h6
+**	mov	z3\.h, h7
+**	ret
+*/
+svbfloat16x4_t __attribute__((noipa))
+callee_bf16 (bfloat16_t h0, bfloat16_t h1, bfloat16_t h2, bfloat16_t h3,
+	     bfloat16_t h4, bfloat16_t h5, bfloat16_t h6, bfloat16_t h7)
+{
+  return svcreate4 (svdup_bf16 (h4), svdup_bf16 (h5),
+		    svdup_bf16 (h6), svdup_bf16 (h7));
+}
+
+/*
+** caller_bf16:
+**	...
+**	bl	callee_bf16
+**	trn2	z0\.h, z0\.h, z3\.h
+**	ldp	x29, x30, \[sp\], 16
+**	ret
+*/
+svbfloat16_t __attribute__((noipa))
+caller_bf16 (bfloat16_t h0, bfloat16_t h1, bfloat16_t h2, bfloat16_t h3,
+	     bfloat16_t h4, bfloat16_t h5, bfloat16_t h6, bfloat16_t h7)
+{
+  svbfloat16x4_t res;
+  res = callee_bf16 (h0, h1, h2, h3, h4, h5, h6, h7);
+  return svtrn2 (svget4 (res, 0), svget4 (res, 3));
+}
+
+/*
 ** callee_s32:
 **	mov	z0\.s, #1
 **	mov	z1\.s, #2
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/tbl2_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/tbl2_bf16.c
new file mode 100644
index 0000000..4912c9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/tbl2_bf16.c
@@ -0,0 +1,30 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** tbl2_bf16_tied1:
+**	tbl	z0\.h, {z0\.h(?:, | - )z1\.h}, z4\.h
+**	ret
+*/
+TEST_TBL2 (tbl2_bf16_tied1, svbfloat16x2_t, svbfloat16_t, svuint16_t,
+	   z0_res = svtbl2_bf16 (z0, z4),
+	   z0_res = svtbl2 (z0, z4))
+
+/*
+** tbl2_bf16_tied2:
+**	tbl	z0\.h, {z1\.h(?:, | - )z2\.h}, z0\.h
+**	ret
+*/
+TEST_TBL2_REV (tbl2_bf16_tied2, svbfloat16x2_t, svbfloat16_t, svuint16_t,
+	       z0_res = svtbl2_bf16 (z1, z0),
+	       z0_res = svtbl2 (z1, z0))
+
+/*
+** tbl2_bf16_untied:
+**	tbl	z0\.h, {z2\.h(?:, | - )z3\.h}, z4\.h
+**	ret
+*/
+TEST_TBL2 (tbl2_bf16_untied, svbfloat16x2_t, svbfloat16_t, svuint16_t,
+	   z0_res = svtbl2_bf16 (z2, z4),
+	   z0_res = svtbl2 (z2, z4))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/tbx_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/tbx_bf16.c
new file mode 100644
index 0000000..1908573
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/tbx_bf16.c
@@ -0,0 +1,37 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+
+/*
+** tbx_bf16_tied1:
+**	tbx	z0\.h, z1\.h, z4\.h
+**	ret
+*/
+TEST_DUAL_Z (tbx_bf16_tied1, svbfloat16_t, svuint16_t,
+	     z0 = svtbx_bf16 (z0, z1, z4),
+	     z0 = svtbx (z0, z1, z4))
+
+/* Bad RA choice: no preferred output sequence.  */
+TEST_DUAL_Z (tbx_bf16_tied2, svbfloat16_t, svuint16_t,
+	     z0 = svtbx_bf16 (z1, z0, z4),
+	     z0 = svtbx (z1, z0, z4))
+
+/* Bad RA choice: no preferred output sequence.  */
+TEST_DUAL_Z_REV (tbx_bf16_tied3, svbfloat16_t, svuint16_t,
+		 z0_res = svtbx_bf16 (z4, z5, z0),
+		 z0_res = svtbx (z4, z5, z0))
+
+/*
+** tbx_bf16_untied:
+** (
+**	mov	z0\.d, z1\.d
+**	tbx	z0\.h, z2\.h, z4\.h
+** |
+**	tbx	z1\.h, z2\.h, z4\.h
+**	mov	z0\.d, z1\.d
+** )
+**	ret
+*/
+TEST_DUAL_Z (tbx_bf16_untied, svbfloat16_t, svuint16_t,
+	     z0 = svtbx_bf16 (z1, z2, z4),
+	     z0 = svtbx (z1, z2, z4))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/whilerw_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/whilerw_bf16.c
new file mode 100644
index 0000000..a0e101d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/whilerw_bf16.c
@@ -0,0 +1,50 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
+
+#include "test_sve_acle.h"
+
+/*
+** whilerw_rr_bf16:
+**	whilerw	p0\.h, x0, x1
+**	ret
+*/
+TEST_COMPARE_S (whilerw_rr_bf16, const bfloat16_t *,
+		p0 = svwhilerw_bf16 (x0, x1),
+		p0 = svwhilerw (x0, x1))
+
+/*
+** whilerw_0r_bf16:
+**	whilerw	p0\.h, xzr, x1
+**	ret
+*/
+TEST_COMPARE_S (whilerw_0r_bf16, const bfloat16_t *,
+		p0 = svwhilerw_bf16 ((const bfloat16_t *) 0, x1),
+		p0 = svwhilerw ((const bfloat16_t *) 0, x1))
+
+/*
+** whilerw_cr_bf16:
+**	mov	(x[0-9]+), #?1073741824
+**	whilerw	p0\.h, \1, x1
+**	ret
+*/
+TEST_COMPARE_S (whilerw_cr_bf16, const bfloat16_t *,
+		p0 = svwhilerw_bf16 ((const bfloat16_t *) 1073741824, x1),
+		p0 = svwhilerw ((const bfloat16_t *) 1073741824, x1))
+
+/*
+** whilerw_r0_bf16:
+**	whilerw	p0\.h, x0, xzr
+**	ret
+*/
+TEST_COMPARE_S (whilerw_r0_bf16, const bfloat16_t *,
+		p0 = svwhilerw_bf16 (x0, (const bfloat16_t *) 0),
+		p0 = svwhilerw (x0, (const bfloat16_t *) 0))
+
+/*
+** whilerw_rc_bf16:
+**	mov	(x[0-9]+), #?1073741824
+**	whilerw	p0\.h, x0, \1
+**	ret
+*/
+TEST_COMPARE_S (whilerw_rc_bf16, const bfloat16_t *,
+		p0 = svwhilerw_bf16 (x0, (const bfloat16_t *) 1073741824),
+		p0 = svwhilerw (x0, (const bfloat16_t *) 1073741824))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/whilewr_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/whilewr_bf16.c
new file mode 100644
index 0000000..895e376
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/whilewr_bf16.c
@@ -0,0 +1,50 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
+
+#include "test_sve_acle.h"
+
+/*
+** whilewr_rr_bf16:
+**	whilewr	p0\.h, x0, x1
+**	ret
+*/
+TEST_COMPARE_S (whilewr_rr_bf16, const bfloat16_t *,
+		p0 = svwhilewr_bf16 (x0, x1),
+		p0 = svwhilewr (x0, x1))
+
+/*
+** whilewr_0r_bf16:
+**	whilewr	p0\.h, xzr, x1
+**	ret
+*/
+TEST_COMPARE_S (whilewr_0r_bf16, const bfloat16_t *,
+		p0 = svwhilewr_bf16 ((const bfloat16_t *) 0, x1),
+		p0 = svwhilewr ((const bfloat16_t *) 0, x1))
+
+/*
+** whilewr_cr_bf16:
+**	mov	(x[0-9]+), #?1073741824
+**	whilewr	p0\.h, \1, x1
+**	ret
+*/
+TEST_COMPARE_S (whilewr_cr_bf16, const bfloat16_t *,
+		p0 = svwhilewr_bf16 ((const bfloat16_t *) 1073741824, x1),
+		p0 = svwhilewr ((const bfloat16_t *) 1073741824, x1))
+
+/*
+** whilewr_r0_bf16:
+**	whilewr	p0\.h, x0, xzr
+**	ret
+*/
+TEST_COMPARE_S (whilewr_r0_bf16, const bfloat16_t *,
+		p0 = svwhilewr_bf16 (x0, (const bfloat16_t *) 0),
+		p0 = svwhilewr (x0, (const bfloat16_t *) 0))
+
+/*
+** whilewr_rc_bf16:
+**	mov	(x[0-9]+), #?1073741824
+**	whilewr	p0\.h, x0, \1
+**	ret
+*/
+TEST_COMPARE_S (whilewr_rc_bf16, const bfloat16_t *,
+		p0 = svwhilewr_bf16 (x0, (const bfloat16_t *) 1073741824),
+		p0 = svwhilewr (x0, (const bfloat16_t *) 1073741824))