LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method when frame->mask or frame->fmask is zero.

On LoongArch, when the stack frame is too large to be allocated with a
single adjustment, the stack is dropped in two steps.
step1: Drop part of the stack, then save the callee-saved registers on the stack.
step2: Drop the rest of the stack.

The stack drop is optimized when frame->total_size minus
frame->fp_sp_offset is an integer multiple of 4096, which reduces the
number of instructions required to drop the stack. However, the original
calculation of fp_sp_offset and gp_sp_offset prevents this optimization
from taking effect.

Consider the following case:
#include <stdio.h>

int main()
{
     char buf[1024 * 12];
     printf ("%p\n", buf);
     return 0;
}

As the generated assembly shows, the old GCC emits two more instructions
than the new GCC: lines 14 and 24 of the old output.

   new                         | old
10 main:                       | 11 main:
11   addi.d  $r3,$r3,-16       | 12   lu12i.w $r13,-12288>>12
12   lu12i.w $r13,-12288>>12   | 13   addi.d  $r3,$r3,-2032
13   lu12i.w $r5,-12288>>12    | 14   ori $r13,$r13,2016
14   lu12i.w $r12,12288>>12    | 15   lu12i.w $r5,-12288>>12
15   st.d  $r1,$r3,8           | 16   lu12i.w $r12,12288>>12
16   add.d $r12,$r12,$r5       | 17   st.d  $r1,$r3,2024
17   add.d $r3,$r3,$r13        | 18   add.d $r12,$r12,$r5
18   add.d $r5,$r12,$r3        | 19   add.d $r3,$r3,$r13
19   la.local  $r4,.LC0        | 20   add.d $r5,$r12,$r3
20   bl  %plt(printf)          | 21   la.local  $r4,.LC0
21   lu12i.w $r13,12288>>12    | 22   bl  %plt(printf)
22   add.d $r3,$r3,$r13        | 23   lu12i.w $r13,8192>>12
23   ld.d  $r1,$r3,8           | 24   ori $r13,$r13,2080
24   or  $r4,$r0,$r0           | 25   add.d $r3,$r3,$r13
25   addi.d  $r3,$r3,16        | 26   ld.d  $r1,$r3,2024
26   jr  $r1                   | 27   or  $r4,$r0,$r0
                               | 28   addi.d  $r3,$r3,2032
                               | 29   jr  $r1

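
The first adjustments seen above (-16 in the new output, -2032 in the old one) can be reproduced with a small model of how the first step might be chosen. This is a simplified sketch, not the GCC source: the function and macro names are hypothetical, and the 16-byte stack alignment and the 4096-byte immediate reach are assumptions for the lp64d ABI.

```c
/* Simplified, hypothetical model of choosing the first stack adjustment.
   Not the GCC implementation; constants are assumptions for lp64d.  */

#define IMM_REACH      4096L  /* assumed span of a signed 12-bit immediate */
#define STACK_ALIGN_TO 16L    /* assumed stack alignment in bytes */

/* Round x up to the assumed stack alignment.  */
static long stack_align(long x)
{
    return (x + STACK_ALIGN_TO - 1) & ~(STACK_ALIGN_TO - 1);
}

/* Nonzero if x fits a signed 12-bit immediate (as used by addi.d).  */
static int fits_imm12(long x)
{
    return x >= -2048 && x <= 2047;
}

/* Return the size of the first stack drop.  The first step must be large
   enough to cover the callee-saved area (total_size - fp_sp_offset).  */
static long first_stack_step(long total_size, long fp_sp_offset)
{
    if (fits_imm12(total_size))
        return total_size;                 /* one addi.d is enough */

    long min_first_step = stack_align(total_size - fp_sp_offset);
    long max_first_step = IMM_REACH / 2 - STACK_ALIGN_TO;
    long low_bits = total_size % IMM_REACH;

    /* Preferred case: drop only the low 12 bits first, so the remainder
       is a multiple of 4096 and needs no ori to materialize.  */
    if (low_bits < IMM_REACH / 2 && low_bits >= min_first_step)
        return low_bits;

    return max_first_step;                 /* fallback: largest addi.d step */
}
```

Assuming a total frame of 12304 bytes for the example (12288 for buf plus 16 for the saved $r1), an fp_sp_offset of 12280 gives a first step of 2032 and leaves 10272 (8192 + 2080, hence lu12i.w plus ori), while an fp_sp_offset of 12288 gives a first step of 16 and leaves 12288, which a single lu12i.w can load.
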
gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_compute_frame_info):
Modify the calculation of fp_sp_offset and gp_sp_offset: when
frame->mask or frame->fmask is zero, do not subtract UNITS_PER_WORD
or UNITS_PER_FP_REG.
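
The change described in this entry can be illustrated for the FPR case with a minimal sketch. It is a hypothetical model rather than the code in loongarch.cc: the helper names, the 16-byte alignment, and the 8-byte UNITS_PER_FP_REG value are assumptions, and the GPR case (gp_sp_offset with UNITS_PER_WORD) would follow the same pattern.

```c
/* Hypothetical model of the FPR part of the frame layout, before and
   after the change.  Not the GCC source; values are assumptions.  */

#define UNITS_PER_FP_REG 8L   /* assumed: 64-bit FP registers */

/* Round x up to an assumed 16-byte stack alignment.  */
static long align16(long x)
{
    return (x + 15) & ~15L;
}

/* Old behaviour: subtract UNITS_PER_FP_REG even when no FPR is saved.  */
static long fp_sp_offset_old(long *offset, unsigned fmask, int num_f_saved)
{
    if (fmask)
        *offset += align16(num_f_saved * UNITS_PER_FP_REG);
    return *offset - UNITS_PER_FP_REG;
}

/* New behaviour: subtract only when there is an FPR save area.  */
static long fp_sp_offset_new(long *offset, unsigned fmask, int num_f_saved)
{
    if (fmask) {
        *offset += align16(num_f_saved * UNITS_PER_FP_REG);
        return *offset - UNITS_PER_FP_REG;
    }
    return *offset;   /* no FPRs saved: offset is left unchanged */
}
```

With a running offset of 12288 and fmask equal to zero, the old form yields 12280 and the new form yields 12288, so the difference from the total frame size is no longer inflated by one register slot.
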

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/prolog-opt.c: New test.

(cherry picked from commit aa8fd7f65683ef9c3b6d2e9306bea2f28b5cadf7)

author	Lulu Cheng <chenglulu@loongson.cn>
	Thu, 7 Jul 2022 10:07:28 +0000 (18:07 +0800)
committer	Lulu Cheng <chenglulu@loongson.cn>
	Fri, 8 Jul 2022 03:24:34 +0000 (11:24 +0800)
commit	e623829c18ec2949f8b43a5a13775659e0cd1cbf
tree	bce922b4293730d056cb6dbd46ab867149e632f9
parent	e02edb338f530ba86ad944327f540e52bb709959

gcc/config/loongarch/loongarch.cc
gcc/testsuite/gcc.target/loongarch/prolog-opt.c	[new file with mode: 0644]