review.tizen.org Git - platform/upstream/pixman.git/commit

ARM: better NEON instructions scheduling for over_n_8_0565

Code rearranged to get better instruction scheduling on ARM Cortex-A8/A9.
It is now ~30% faster when the pixel data is in L1 cache and makes better use
of memory bandwidth when running at lower clock frequencies (e.g. 500MHz).
Also, register d24 (pixels from the mask image) is no longer clobbered by
the supplementary macros, which allows them to be reused for other variants
of compositing operations later.
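
For readers unfamiliar with the name, over_n_8_0565 follows pixman's operator_source_mask_destination pattern: the OVER operator, a solid ("n") a8r8g8b8 source colour, an 8-bit alpha mask, and an r5g6b5 destination. The scalar C sketch below illustrates the per-pixel arithmetic only. It is not the NEON code touched by this commit; the function names are hypothetical, the source colour is assumed to be premultiplied, and the rounding may differ in the last bit from the assembly path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Expand an r5g6b5 pixel to 8-bit channels by replicating the top bits. */
static void unpack_0565(uint16_t p, uint8_t *r, uint8_t *g, uint8_t *b)
{
    uint8_t r5 = (p >> 11) & 0x1f, g6 = (p >> 5) & 0x3f, b5 = p & 0x1f;
    *r = (uint8_t)((r5 << 3) | (r5 >> 2));
    *g = (uint8_t)((g6 << 2) | (g6 >> 4));
    *b = (uint8_t)((b5 << 3) | (b5 >> 2));
}

/* Truncate 8-bit channels back to r5g6b5. */
static uint16_t pack_0565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* One channel of (src IN mask) OVER dst, clamped to 255 for safety. */
static uint8_t blend(uint8_t s, uint8_t d, uint8_t m, uint8_t inv_alpha)
{
    unsigned v = (unsigned)mul_un8(s, m) + mul_un8(d, inv_alpha);
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Hypothetical scalar reference: composite a solid premultiplied
 * a8r8g8b8 colour through an a8 mask onto n r5g6b5 pixels. */
void over_n_8_0565_ref(uint16_t *dst, const uint8_t *mask,
                       uint32_t src, size_t n)
{
    uint8_t sa = (uint8_t)(src >> 24);
    uint8_t sr = (uint8_t)(src >> 16);
    uint8_t sg = (uint8_t)(src >> 8);
    uint8_t sb = (uint8_t)src;

    assert(n == 0 || (dst != NULL && mask != NULL));

    for (size_t i = 0; i < n; i++) {
        uint8_t m = mask[i];
        uint8_t dr, dg, db, inv_alpha;

        if (m == 0)
            continue;               /* fully masked out: dst is untouched */

        unpack_0565(dst[i], &dr, &dg, &db);
        inv_alpha = (uint8_t)(255 - mul_un8(sa, m));
        dst[i] = pack_0565(blend(sr, dr, m, inv_alpha),
                           blend(sg, dg, m, inv_alpha),
                           blend(sb, db, m, inv_alpha));
    }
}
```

The NEON implementation processes several pixels per iteration in vector registers (the message above mentions d24 holding the mask pixels), so instruction ordering matters there in a way this scalar loop does not show.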

Benchmark from ARM Cortex-A8 @500MHz:

== before ==

    over_n_8_0565 =  L1:  63.90  L2:  63.15  M: 60.97 ( 73.53%)
                     HT:  28.89  VT:  24.14  R: 21.33  RT:  6.78 (  67Kops/s)

== after ==

    over_n_8_0565 =  L1:  82.64  L2:  75.19  M: 71.52 ( 84.14%)
                     HT:  30.49  VT:  25.56  R: 22.36  RT:  6.89 (  68Kops/s)
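
The "~30% faster" claim can be checked against the figures above. The small helper below is only an illustration (its name is hypothetical) and treats the L1, L2 and M columns simply as throughput numbers where larger is better.

```c
/* Relative gain of `after` over `before`, in percent. */
double gain_percent(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}
```

Applied to the numbers listed, gain_percent(63.90, 82.64) is roughly 29% for L1, gain_percent(63.15, 75.19) roughly 19% for L2, and gain_percent(60.97, 71.52) roughly 17% for M.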

author	Siarhei Siamashka <siarhei.siamashka@nokia.com>
	Fri, 26 Nov 2010 15:06:58 +0000 (17:06 +0200)
committer	Siarhei Siamashka <siarhei.siamashka@nokia.com>
	Fri, 3 Dec 2010 13:37:11 +0000 (15:37 +0200)
commit	e6814837a6ccd3e4db329e0131eaf2055d2c864b
tree	1c9ec6da116e3698386f0ae78bd4bdc6c6a97604
parent	3be86a92ccab240859062a541cdb871d81c9501a