aarch64: Use Q-reg loads/stores in movmem expansion
author Sudakshina Das <sudi.das@arm.com>
Tue, 4 Aug 2020 11:01:21 +0000 (12:01 +0100)
committer Sudakshina Das <sudi.das@arm.com>
Tue, 4 Aug 2020 11:01:53 +0000 (12:01 +0100)
commit 7cda9e0878da44dcaf025d3d146534dfaf0b9986
tree 55d3496d7ffbd9f70eef4063b9fafbdd6c980423
parent d2b86e14c14020f3e119ab8f462e2a91bd7d46e5

This is my attempt at reviving the old patch
https://gcc.gnu.org/pipermail/gcc-patches/2019-January/514632.html

I have followed up on Kyrill's upstream comment at the link above and
am using option iii, which he recommended:
"1) Adjust the copy_limit to 256 bits after checking
    AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS in the tuning.
 2) Adjust aarch64_copy_one_block_and_progress_pointers to handle
    256-bit moves, using option iii:
    iii) Emit explicit V4SI (or any other 128-bit vector mode) pairs
         of ldp/stp. This wouldn't need any adjustments to MD patterns,
         but would make aarch64_copy_one_block_and_progress_pointers
         more complex, as it would now have two paths, one of which
         handles two adjacent memory addresses in one call."
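The two paths described in option iii can be illustrated with a simplified C model. This is a hypothetical sketch, not GCC's implementation: `copy_one_block_model` is an invented name, byte buffers stand in for memory, and the returned count is only the number of load/store instructions the model assumes. A 32-byte (256-bit) chunk takes the second path, moving two adjacent 16-byte (V4SI-sized) halves at once, standing in for one Q-register `ldp`/`stp` pair; every other chunk uses a single load and store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: copy one chunk of `bytes` from *src to *dst and
   advance both pointers.  Returns the number of load/store instructions
   this model assumes were emitted (not a measurement of real GCC output). */
static int copy_one_block_model(const uint8_t **src, uint8_t **dst,
                                size_t bytes)
{
    assert(bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8 ||
           bytes == 16 || bytes == 32);

    if (bytes == 32) {
        /* Pair path: load two adjacent 16-byte halves (like one
           "ldp q, q"), then store both (like one "stp q, q").  */
        uint8_t lo[16], hi[16];
        memcpy(lo, *src, 16);
        memcpy(hi, *src + 16, 16);
        memcpy(*dst, lo, 16);
        memcpy(*dst + 16, hi, 16);
        *src += 32;
        *dst += 32;
        return 2; /* one load pair + one store pair */
    }

    /* Single-register path: one load and one store.  */
    uint8_t tmp[16];
    memcpy(tmp, *src, bytes);
    memcpy(*dst, tmp, bytes);
    *src += bytes;
    *dst += bytes;
    return 2;
}
```

In this model a 32-byte chunk costs the same two instructions as a 16-byte one, which is the motivation for allowing 256-bit chunks.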

gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_gen_store_pair): Add case
for E_V4SImode.
(aarch64_gen_load_pair): Likewise.
(aarch64_copy_one_block_and_progress_pointers): Handle 256-bit copy.
(aarch64_expand_cpymem): Expand copy_limit to 256 bits where
appropriate.
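The copy_limit change can be sketched as a small C model of how a copy is split into chunks. This is a hypothetical illustration under stated assumptions: the boolean `no_ldp_stp_qregs` stands in for the AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning check, and `copy_limit_bits` and `plan_chunks` are invented names. The split is a plain greedy one (largest power-of-two chunk that fits), which ignores any further refinements the real expander may apply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: 256-bit chunks unless the tuning forbids Q-register
   LDP/STP, in which case fall back to 128-bit chunks.  */
static unsigned copy_limit_bits(bool no_ldp_stp_qregs)
{
    return no_ldp_stp_qregs ? 128u : 256u;
}

/* Greedy split of an n-byte copy into power-of-two chunks no larger
   than limit_bits / 8 bytes.  Stores up to `cap` chunk sizes in `out`
   (if non-NULL) and returns the total number of chunks.  */
static size_t plan_chunks(size_t n, unsigned limit_bits,
                          size_t *out, size_t cap)
{
    assert(limit_bits == 128 || limit_bits == 256);
    size_t limit = limit_bits / 8;
    size_t count = 0;

    while (n > 0) {
        /* Halve the chunk until it no longer over-reads/over-writes.  */
        size_t chunk = limit;
        while (chunk > n)
            chunk /= 2;
        if (out != NULL && count < cap)
            out[count] = chunk;
        count++;
        n -= chunk;
    }
    return count;
}
```

Under this model, a 64-byte copy needs two 32-byte chunks when the 256-bit limit applies, instead of four 16-byte chunks with the 128-bit limit.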

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpymem-q-reg_1.c: New test.
* gcc.target/aarch64/large_struct_copy_2.c: Update for ldp q regs.
gcc/config/aarch64/aarch64.c
gcc/testsuite/gcc.target/aarch64/cpymem-q-reg_1.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/large_struct_copy_2.c