levytamar82 [Wed, 29 Apr 2015 18:54:11 +0000 (11:54 -0700)]
VP9_LPF_VERTICAL_16_DUAL_SSE2 optimization
Optimized the vp9_lpf_vertical_16_dual function for the x86 32-bit target. The hot code in that function was the call to transpose8x16.
The assembly generated by gcc contained unneeded fills and spills to the stack. By interleaving 2 loads and unpack instructions, and by hoisting the consumer
instruction closer to the producer instructions, we eliminated most of the fills and spills and improved function-level performance by 17%.
Credit for writing the function, as well as for finding the root cause, goes to Erik Niemeyer (erik.a.niemeyer@intel.com).
Change-Id: I6173cf53956d52918a047d1c53d9a673f952ec46
Jingning Han [Thu, 2 Jul 2015 04:24:53 +0000 (04:24 +0000)]
Merge "Use vpx prefix for codec independent threading functions"
Jingning Han [Thu, 2 Jul 2015 04:24:35 +0000 (04:24 +0000)]
Merge "Move multi-threading module functions into vpx_thread folder"
James Zern [Thu, 2 Jul 2015 01:52:39 +0000 (01:52 +0000)]
Merge "vp9_pred_common: inline vp9_get_tx_size_context"
James Zern [Thu, 2 Jul 2015 01:52:19 +0000 (01:52 +0000)]
Merge "vp9_pred_common: inline vp9_get_segment_id"
James Zern [Thu, 2 Jul 2015 01:51:24 +0000 (01:51 +0000)]
Merge "vp9_dsubexp: replace some divides with shifts"
James Zern [Thu, 2 Jul 2015 01:51:05 +0000 (01:51 +0000)]
Merge "vp9/inv_remap_prob: simplify inv_map_table[]"
James Zern [Thu, 2 Jul 2015 01:50:44 +0000 (01:50 +0000)]
Merge "vp9_dsubexp: remove clamp in inv_remap_prob()"
Jingning Han [Wed, 1 Jul 2015 23:32:48 +0000 (16:32 -0700)]
Use vpx prefix for codec independent threading functions
Replace vp9_ prefix with vpx_ for common multi-threading functions.
Change-Id: I941a5ead9bfe8213fdad345511d2061b07797b55
Jingning Han [Wed, 1 Jul 2015 21:58:13 +0000 (14:58 -0700)]
Move multi-threading module functions into vpx_thread folder
This commit moves the primitive multi-threading files from the vp9
folder to vpx_thread, where they will be accessible to all vpx codecs.
Change-Id: Ib51e66e9c69801c10631fab56d35a0c0aaed5883
Johann [Wed, 1 Jul 2015 21:14:41 +0000 (21:14 +0000)]
Merge "Fix --disable-use-x86inc when used with --enable-vp9-highbitdepth"
Johann [Wed, 1 Jul 2015 21:14:16 +0000 (21:14 +0000)]
Merge "Fix --disable-use-x86inc"
Johann [Tue, 30 Jun 2015 00:04:58 +0000 (17:04 -0700)]
Fix --disable-use-x86inc when used with --enable-vp9-highbitdepth
Change-Id: I0ed6de72dc0bb99fc9c5b1f6500399b16754ffb3
Johann [Mon, 29 Jun 2015 23:44:30 +0000 (16:44 -0700)]
Fix --disable-use-x86inc
Change-Id: I374fcd8fb45a6893dcdeac6896671be142a99f06
James Zern [Wed, 1 Jul 2015 20:05:49 +0000 (20:05 +0000)]
Merge "mips msa vp9 avg subpel variance optimization"
Scott LaVarnway [Wed, 1 Jul 2015 20:01:51 +0000 (20:01 +0000)]
Merge "Move inter_predictor to vp9_reconinter.h"
Parag Salasakar [Wed, 1 Jul 2015 18:23:08 +0000 (18:23 +0000)]
Merge "mips msa vpx_dsp sad sad4d avgsad optimization"
James Zern [Wed, 1 Jul 2015 17:57:52 +0000 (17:57 +0000)]
Merge "loopfiltersimpleverticaledge_neon: quiet uninit var warnings"
Scott LaVarnway [Wed, 1 Jul 2015 12:52:20 +0000 (12:52 +0000)]
Merge "VP9: Move ref_mvs[][] and mode_context[] from MB_MODE_INFO"
Parag Salasakar [Wed, 1 Jul 2015 08:16:41 +0000 (13:46 +0530)]
mips msa vp9 avg subpel variance optimization
average improvement ~3x-5x
Change-Id: Iefbcafc05daab77b38a4e63b551e427867a501a4
James Zern [Wed, 1 Jul 2015 06:23:59 +0000 (23:23 -0700)]
loopfiltersimpleverticaledge_neon: quiet uninit var warnings
take 2. localize the function parameter to actually remove the warning
Change-Id: I23c02061b5e21b0b75bd33c26062d1e531df7b92
Parag Salasakar [Wed, 1 Jul 2015 05:49:42 +0000 (11:19 +0530)]
mips msa vpx_dsp sad sad4d avgsad optimization
average improvement ~3x-5x
Change-Id: Ie30748cfbedebbd544b7ef4f286055ccb7f60306
James Zern [Wed, 1 Jul 2015 03:09:00 +0000 (20:09 -0700)]
vp9_dsubexp: replace some divides with shifts
Change-Id: I24e10c37ea8f06600cd04b43512efa6170e23e5c
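As a generic illustration of this kind of change (not the code from the commit), dividing a non-negative integer by a power of two gives the same result as a right shift. The helper names below are hypothetical.

```c
#include <assert.h>

/* Hypothetical helpers, not taken from vp9_dsubexp.c. */

/* Halve a value with the division operator. */
static int half_div(int v) { return v / 2; }

/* Halve a value with a shift. Only equivalent for non-negative v:
 * division truncates toward zero, while shifting a negative value
 * is implementation-defined and typically rounds down. */
static int half_shift(int v) {
  assert(v >= 0);
  return v >> 1;
}
```

The precondition matters: the substitution is only safe where the operand is known to be non-negative.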
James Zern [Wed, 1 Jul 2015 02:58:08 +0000 (19:58 -0700)]
vp9/inv_remap_prob: simplify inv_map_table[]
add one to each entry to remove the universal 'value + 1'.
Change-Id: I8919b1d7fde8155d1728196c4d577db3064e2c1e
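A minimal sketch of the idea, assuming every lookup previously added 1 to the table value. The table contents and function names here are hypothetical and much shorter than the real inv_map_table[].

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_TABLE_LEN 4

/* Hypothetical table as it might look before the change. */
static const unsigned char table_before[SKETCH_TABLE_LEN] = { 6, 19, 32, 45 };
/* The same table with 1 folded into every entry. */
static const unsigned char table_after[SKETCH_TABLE_LEN] = { 7, 20, 33, 46 };

/* Before: every caller pays for the extra addition. */
static int lookup_before(size_t i) {
  assert(i < SKETCH_TABLE_LEN);
  return table_before[i] + 1;
}

/* After: the addition is baked into the table. */
static int lookup_after(size_t i) {
  assert(i < SKETCH_TABLE_LEN);
  return table_after[i];
}
```

Both lookups return the same values; the second simply moves the constant into the data.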
Parag Salasakar [Wed, 1 Jul 2015 02:02:44 +0000 (07:32 +0530)]
mips msa vp9 subpel variance optimization
average improvement ~3x-5x
Change-Id: I4cbba2711467b0e205904769ebbb4a1fcbb1a311
Parag Salasakar [Wed, 1 Jul 2015 01:40:24 +0000 (01:40 +0000)]
Merge "mips msa vpx_dsp variance optimization"
James Zern [Tue, 30 Jun 2015 22:09:48 +0000 (15:09 -0700)]
vp9_dsubexp: remove clamp in inv_remap_prob()
the max value of the lookup in expanded form is:
(((1 << 7) - 1) << 1) - 65 + 1 + 64 = 254
remove the clamp [0, 253] and add one table entry
Change-Id: I0b5d0c66702fdb0b8f1cc9ab9b0dac66326e85a6
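The arithmetic above can be checked step by step. The sketch below only evaluates the expression quoted in the message; the function names are hypothetical, and it assumes the result is used as a zero-based table index, which is why one more entry is needed once the clamp to 253 is gone.

```c
#include <assert.h>

/* Evaluates (((1 << 7) - 1) << 1) - 65 + 1 + 64 in stages. */
static int max_lookup_value(void) {
  const int max_symbol = (1 << 7) - 1; /* 127 */
  const int doubled = max_symbol << 1; /* 254 */
  return doubled - 65 + 1 + 64;        /* 254 */
}

/* Entries needed so that indices 0..max_index are all valid. */
static int entries_needed(int max_index) {
  assert(max_index >= 0);
  return max_index + 1;
}
```

With a maximum index of 254, a table covering 0..253 is one entry short, which matches "add one table entry".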
James Zern [Tue, 30 Jun 2015 21:12:50 +0000 (21:12 +0000)]
Merge "vp9_common_data: right-size tables"
Yaowu Xu [Tue, 30 Jun 2015 19:48:32 +0000 (19:48 +0000)]
Merge "Fixed a variance calculation"
Parag Salasakar [Fri, 26 Jun 2015 08:32:16 +0000 (14:02 +0530)]
mips msa vpx_dsp variance optimization
average improvement ~2x-4x
Change-Id: Ia3eef3f390148c2eb5cdc580a94cb26369737f82
Parag Salasakar [Tue, 30 Jun 2015 06:25:30 +0000 (06:25 +0000)]
Merge "mips msa vp9 common macro comments updated"
James Zern [Tue, 30 Jun 2015 06:21:58 +0000 (06:21 +0000)]
Merge changes Idce95354,I6b791088
* changes:
loopfiltersimpleverticaledge_neon: quiet uninit var warnings
idct_dequant_0_2x_neon: quiet uninit var warnings
Scott LaVarnway [Mon, 29 Jun 2015 16:27:11 +0000 (09:27 -0700)]
VP9: Move ref_mvs[][] and mode_context[] from MB_MODE_INFO
to MB_MODE_INFO_EXT. This saves 36 bytes per 8x8 area for
both the decoder and encoder. (encoder has two MODE_INFO
buffers)
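One way to arrive at 36 bytes is sketched below, assuming 4 reference-frame slots, 2 motion vector candidates per slot, and a 4-byte motion vector. These dimensions and all type names are assumptions for illustration, not quoted from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed dimensions for this sketch. */
#define SKETCH_REF_FRAMES 4
#define SKETCH_MV_CANDIDATES 2

/* A motion vector stored as two 16-bit components (4 bytes). */
typedef struct {
  int16_t row;
  int16_t col;
} SketchMv;

/* The two arrays named in the commit, grouped in their own struct. */
typedef struct {
  SketchMv ref_mvs[SKETCH_REF_FRAMES][SKETCH_MV_CANDIDATES]; /* 4*2*4 = 32 */
  uint8_t mode_context[SKETCH_REF_FRAMES];                   /* 4 */
} SketchModeInfoExt;

/* Bytes no longer carried in each per-8x8 mode info entry. */
static size_t ext_bytes_per_8x8(void) {
  assert(sizeof(SketchMv) == 4); /* the sketch relies on a packed 4-byte MV */
  return sizeof(SketchModeInfoExt);
}
```

Under these assumptions the two arrays account for 32 + 4 = 36 bytes per entry.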
Change-Id: If006abb2224acaf326df3c2be09e77e967662107
Scott LaVarnway [Mon, 29 Jun 2015 18:04:54 +0000 (18:04 +0000)]
Merge "Remove tile param"
Parag Salasakar [Fri, 26 Jun 2015 12:11:00 +0000 (17:41 +0530)]
mips msa vp9 common macro comments updated
Cosmetic/Grammatical corrections in vp9 macro comments
Change-Id: I774b983aff854feb69c7e4442e8731ce4c995645
Parag Salasakar [Sat, 27 Jun 2015 01:29:02 +0000 (01:29 +0000)]
Merge "mips msa vp9 temporal filter optimization"
Yaowu Xu [Fri, 26 Jun 2015 22:54:43 +0000 (15:54 -0700)]
Fixed a variance calculation
This commit fixed a mistake in variance calculation.
Thanks to Xintong for spotting the error.
Change-Id: Ia285fc0128c00f0234a73b0a7eba6adc88b8a7de
Tom Finegan [Fri, 26 Jun 2015 16:12:33 +0000 (16:12 +0000)]
Merge "vpxenc.sh: Add basic multithreaded frame parallel encode test."
Parag Salasakar [Fri, 26 Jun 2015 06:30:24 +0000 (12:00 +0530)]
mips msa vp9 temporal filter optimization
average improvement ~4x-5x
Change-Id: Iad9c0a296dbc2ea96d000bd009077999ed58a3c5
Parag Salasakar [Fri, 26 Jun 2015 03:53:56 +0000 (09:23 +0530)]
mips msa vp9 subtract block optimization
average improvement ~3x-4x
Change-Id: Idbe4d13a00d05ff8be6559b116f416e42c3b4097
Parag Salasakar [Fri, 26 Jun 2015 03:42:30 +0000 (03:42 +0000)]
Merge "mips msa vp9 block error optimization"
James Zern [Fri, 26 Jun 2015 03:27:50 +0000 (20:27 -0700)]
loopfiltersimpleverticaledge_neon: quiet uninit var warnings
the vector used in vld*_lane_* should be initialized before use
Change-Id: Idce95354737915f6fb4e6b5e8980a050e953036d
Parag Salasakar [Tue, 23 Jun 2015 07:18:50 +0000 (12:48 +0530)]
mips msa vp9 block error optimization
average improvement ~3x-4x
Change-Id: If0fdcc34b17437a7e3e7fb4caaf1067bc175f291
James Zern [Fri, 26 Jun 2015 03:27:50 +0000 (20:27 -0700)]
idct_dequant_0_2x_neon: quiet uninit var warnings
the vector used in vld*_lane_* should be initialized before use
Change-Id: I6b791088479fec3bc021ca75cc2af5adcc39d954
James Zern [Mon, 20 Apr 2015 22:56:10 +0000 (15:56 -0700)]
vp9_common_data: right-size tables
Change-Id: I2206ee148a46b234df58f2b623e9f32f26033e04
Tom Finegan [Wed, 24 Jun 2015 21:43:55 +0000 (14:43 -0700)]
vpxenc.sh: Add basic multithreaded frame parallel encode test.
Change-Id: Id526783fa2e3e9bb31229931b6548ac7a9b2b7e6
Marco [Thu, 25 Jun 2015 19:22:19 +0000 (12:22 -0700)]
Update to dynamic resize logic for 1pass CBR.
Only do the check for resizing if the feature is selected
(i.e., resize_mode = RESIZE_DYNAMIC).
Also change the check condition to resize_count >= window
(since the frame rate can change).
Change-Id: Idceb4e50956bb965a1492b4993b0dcb393c9be4d
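The gating described above can be sketched as follows. The struct, enum, and function names are hypothetical stand-ins modeled on the message, and the comment on why >= is used is an interpretation of "since framerate can change", not a statement about the actual encoder code.

```c
#include <assert.h>

/* Hypothetical names modeled on the commit message. */
typedef enum {
  SKETCH_RESIZE_NONE,
  SKETCH_RESIZE_FIXED,
  SKETCH_RESIZE_DYNAMIC
} SketchResizeMode;

typedef struct {
  SketchResizeMode resize_mode;
  int resize_count; /* frames counted since the last resize check */
} SketchRateControl;

/* Returns 1 when the resize decision should be evaluated for this frame. */
static int should_check_resize(const SketchRateControl *rc, int window) {
  assert(window > 0);
  /* Skip all resize work unless dynamic resizing was selected. */
  if (rc->resize_mode != SKETCH_RESIZE_DYNAMIC) return 0;
  /* Use >= rather than ==: if the window shrinks (for example after a
   * frame rate change), the count may already be past the new window. */
  return rc->resize_count >= window;
}
```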
James Zern [Thu, 25 Jun 2015 02:14:18 +0000 (02:14 +0000)]
Merge "vp8_subpixelvariance_neon: right size coeff table"
Marco [Wed, 24 Jun 2015 23:12:12 +0000 (16:12 -0700)]
Fix to unstable build from commit 517a66.
Change-Id: I123db2d20ae65a10e2dec95eec61150e2f69546d
James Zern [Wed, 24 Jun 2015 21:23:15 +0000 (21:23 +0000)]
Merge "vp9_reconintra_neon: add d45 16x16"
Tom Finegan [Wed, 24 Jun 2015 16:36:08 +0000 (16:36 +0000)]
Merge "vpxenc.sh: Add basic vp9 multithread encode test."
James Zern [Sat, 6 Jun 2015 16:20:12 +0000 (09:20 -0700)]
vp8_subpixelvariance_neon: right size coeff table
only uint8 is required; each use only loads one value as a uint8
quiets a few type conversion warnings
Change-Id: I03dc0dc0eb01ac23a6e8673daa2b77c6c57bf1b0
James Zern [Wed, 24 Jun 2015 06:20:36 +0000 (06:20 +0000)]
Merge "build: add *test-no-data-check targets"
Tom Finegan [Wed, 24 Jun 2015 01:30:52 +0000 (18:30 -0700)]
vpxenc.sh: Add basic vp9 multithread encode test.
- Change default real time speed to -6.
- Add vpxenc_vp9_webm_rt_multithread, which encodes
niklas_1280_720_30.y4m with 2 to 4 threads using 2 to 4
tile columns.
Change-Id: I4d86c3360aec67ae5d1ba82eb6e0f0be8068b5af
Marco [Tue, 23 Jun 2015 23:55:15 +0000 (23:55 +0000)]
Merge "aq-mode=3: Reduce boost for segment#2 at low bitrates/low res."
Marco [Tue, 23 Jun 2015 00:50:01 +0000 (17:50 -0700)]
aq-mode=3: Reduce boost for segment#2 at low bitrates/low res.
Reduce boost for segment#2 for low bitrates and low-res.
This change is to reduce the rate overshoot at low bitrates.
No change in behavior, except at the very low bitrates.
Change-Id: I0dbd9d3b6356da5804de94adf10fca6a7a8f8948
Tom Finegan [Tue, 23 Jun 2015 22:57:03 +0000 (22:57 +0000)]
Merge "Fix building with iOS 9 beta SDK"
James Zern [Tue, 23 Jun 2015 03:57:14 +0000 (20:57 -0700)]
vp9_reconintra_neon: add d45 16x16
~90% faster over 20M pixels
Change-Id: I92d80f66e91e0a870a672cfb5dd29bf1a17cb11a
Parag Salasakar [Tue, 23 Jun 2015 02:02:25 +0000 (07:32 +0530)]
mips msa vp9 avg optimization
average improvement ~2x-3x
Change-Id: I76f7fc00c0ffdf2b4ba41bf3819f3b6044bcdeff
Parag Salasakar [Tue, 23 Jun 2015 01:46:52 +0000 (01:46 +0000)]
Merge "mips msa vp9 fdct 4x4 optimization"
Marco [Mon, 22 Jun 2015 22:20:28 +0000 (15:20 -0700)]
Fixes for key frame coding at speed 5.
Keep the same transform cutoff and partition selection
for speed 5 as in speeds >=6 (non-rd speed settings).
Existing setting for key frame at speed 5 allowed transform size
up to 32x32 on key frames, and did not allow for 4x4 block partition size.
This created more visual artifacts on the first few frames.
avgPSNR/overallPSNR/SSIM gains of 0.2/0.7/0.8 for rtc_derf(low-res) set,
and 0/0.7/1.1 gains for rtc set.
Change-Id: I8c139ec6c9bb74e14b4ffbad5f12e94f18a59c0b
James Zern [Mon, 22 Jun 2015 22:27:56 +0000 (22:27 +0000)]
Merge "vp9_reconintra_neon: add d45 8x8"
Brion Vibber [Sat, 20 Jun 2015 19:08:35 +0000 (12:08 -0700)]
Fix building with iOS 9 beta SDK
configure.sh was setting some Mac OS X options for iOS targets, which
confuses the iOS 9 beta SDK in Xcode 7 when linking libraries.
Additionally, old armv6 media extensions were being enabled on iOS
when they're not needed (we always have Neon since iOS 6). These
broke on iOS 9 SDK which no longer assembles those instructions.
Change-Id: I4e4d2722392ead3382ce96289c03ef1e489799d6
Marco [Mon, 22 Jun 2015 16:59:47 +0000 (16:59 +0000)]
Merge "Reduce max_partition_size for low resolutions at speed 5."
Scott LaVarnway [Tue, 16 Jun 2015 13:38:34 +0000 (06:38 -0700)]
Remove tile param
and add it to MACROBLOCKD.
Change-Id: I0e60aaa9f84bcc9f2376d71bd934f251baee38db
Parag Salasakar [Mon, 22 Jun 2015 09:00:24 +0000 (14:30 +0530)]
mips msa vp9 fdct 4x4 optimization
average improvement ~2x-3x
Change-Id: Idf8be780b8b4228fc91f110a94e4ee1fd9af0163
Frank Galligan [Fri, 19 Jun 2015 15:59:42 +0000 (08:59 -0700)]
Add assembly tests for int projections.
BUG=https://code.google.com/p/webm/issues/detail?id=1022
Change-Id: I5ae4acac39fd75c56d3feff0716cb52133de3b22
Parag Salasakar [Sat, 20 Jun 2015 02:58:08 +0000 (02:58 +0000)]
Merge "mips msa vp9 fdct 8x8 optimization"
James Zern [Sat, 20 Jun 2015 02:19:22 +0000 (19:19 -0700)]
vp9_reconintra_neon: add d45 8x8
based on ssse3 implementation
~91% faster over 20M pixels
Change-Id: I6d743a53352c2d6de0efe7899d7996e8b0f7fa29
Parag Salasakar [Thu, 18 Jun 2015 06:33:30 +0000 (12:03 +0530)]
mips msa vp9 fdct 8x8 optimization
average improvement ~4x-5x
Change-Id: I37582efc2622bc20b2bf99617a76110ab24e9f6a
James Zern [Sat, 20 Jun 2015 01:43:53 +0000 (01:43 +0000)]
Merge "Add dynamic range comment to vp9_int_pro_row"
Jingning Han [Tue, 16 Jun 2015 21:43:21 +0000 (14:43 -0700)]
Add dynamic range comment to vp9_int_pro_row
Change-Id: Icaa643568159c4e2db24eef42090b002ae02a45e
Jingning Han [Sat, 20 Jun 2015 00:35:05 +0000 (00:35 +0000)]
Merge "Add dynamic range comment to vp9_int_pro_col"
James Zern [Fri, 19 Jun 2015 23:02:28 +0000 (16:02 -0700)]
build: add *test-no-data-check targets
skips testdata verification; useful with slow media or if the data was
retrieved via a separate call to testdata
Change-Id: Ifd97892cee6c04b0111874cc8071675e90ec852b
Marco [Fri, 19 Jun 2015 23:40:01 +0000 (16:40 -0700)]
Reduce max_partition_size for low resolutions at speed 5.
For speed 5 real-time mode, the selection of the partition size for
superblocks on the segment (aq-mode=3) uses the non-rd recursive
pick partition search, and can sometimes select 64x64.
For low resolutions, it is visually better to limit this to 32x32.
Change-Id: I69657a7ed8899f8b3cf8c9c318a2509c5c72c565
Alex Converse [Thu, 18 Jun 2015 23:05:56 +0000 (16:05 -0700)]
Limit cyclic refresh revisiting blocks at the same quantizer.
For screen content don't refresh a block at a quantizer higher than
it was last coded at. Previously at realtime speeds the encoder had a
tendency to recode a block from GOLDEN with a higher Q than it was last
coded at.
Change-Id: Iacd561806c769dcce1a81b9827ffc70090f5ba18
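The rule stated above can be sketched as a simple comparison. The struct and function names are hypothetical, and the sketch assumes a single stored quantizer per block; the encoder's actual bookkeeping may differ.

```c
#include <assert.h>

/* Hypothetical per-block record of the quantizer used when last coded. */
typedef struct {
  int last_coded_q;
} SketchBlockState;

/* Returns 1 if refreshing the block at candidate_q is allowed, i.e. the
 * candidate quantizer is not higher than the one last used for the block. */
static int refresh_allowed(const SketchBlockState *block, int candidate_q) {
  assert(candidate_q >= 0);
  return candidate_q <= block->last_coded_q;
}
```

A block last coded at Q 40 could be refreshed at 40 or below, but not at 41.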
Yaowu Xu [Fri, 19 Jun 2015 19:02:09 +0000 (19:02 +0000)]
Merge "Fix a msvc compiler warning"
Jingning Han [Tue, 16 Jun 2015 21:45:58 +0000 (14:45 -0700)]
Add dynamic range comment to vp9_int_pro_col
Change-Id: If14d9f874bd0bf2c5a455982088fd70591f5ea5a
Johann Koenig [Fri, 19 Jun 2015 16:27:24 +0000 (16:27 +0000)]
Merge "Move vp8 variance files"
Yaowu Xu [Fri, 19 Jun 2015 16:04:29 +0000 (09:04 -0700)]
Fix a msvc compiler warning
Change-Id: Ida8a04370895ed14bd118324ec2577da926e4648
James Zern [Fri, 19 Jun 2015 03:32:24 +0000 (03:32 +0000)]
Merge "vp9_filter: make all filter tables static"
James Zern [Fri, 19 Jun 2015 03:31:37 +0000 (03:31 +0000)]
Merge changes I2552d810,I51952c0a,Ib82e4247,I9c8d16cb
* changes:
vp9_mcomp: make search_step_table static
vp9_encodeframe: delete auto_partition_range()
vp9_mcomp: don't mark setup_center_error() inline
vp9_encoder: hide adjust_image_stat()
James Zern [Fri, 19 Jun 2015 03:27:22 +0000 (03:27 +0000)]
Merge "vp9_reconintra_neon: add d45 4x4"
James Zern [Fri, 19 Jun 2015 03:24:57 +0000 (03:24 +0000)]
Merge changes from topic 'vp9-intra-pred'
* changes:
vp9_reconintra_neon: add d135 4x4
vp9_reconintra: correct d135 4x4 signature
Marco [Fri, 19 Jun 2015 00:56:05 +0000 (00:56 +0000)]
Merge "Add dynamic resize logic for 1 pass CBR."
Marco [Mon, 8 Jun 2015 17:03:51 +0000 (10:03 -0700)]
Add dynamic resize logic for 1 pass CBR.
Decision to scale down/up is based on buffer state and average QP
over previous time window. Limit the total amount of down-scaling
to be at most one scale down for now.
Reset certain quantities after resize (buffer level, cyclic refresh,
rate correction factor).
Feature is enabled via the setting rc_resize_allowed = 1.
Change-Id: I9b1a53024e1e1e953fb8a1e1f75d21d160280dc7
Johann [Tue, 2 Jun 2015 21:17:24 +0000 (14:17 -0700)]
Move vp8 variance files
There is a naming conflict in the chromium build system.
The rest of the variance functions will move to vpx_dsp soon.
Change-Id: Iff78da2aafb0d7380eda73e38d7dac72110a1e47
James Zern [Thu, 18 Jun 2015 03:52:13 +0000 (20:52 -0700)]
vp9_reconintra_neon: add d45 4x4
based on webp's LD4()
~59% faster over 20M pixels
Change-Id: I371eaed9ce8f470451046997e130b0ba1a2f7a9c
James Zern [Thu, 18 Jun 2015 01:23:19 +0000 (18:23 -0700)]
vp9_reconintra_neon: add d135 4x4
based on webp's RD4()
~50% faster over 20M pixels
Change-Id: Ifcb7bf7f7fc8eabf79d9e3b219ce1be67abc524a
James Zern [Thu, 18 Jun 2015 00:24:45 +0000 (17:24 -0700)]
vp9_reconintra: correct d135 4x4 signature
add missing '_c' suffix
Change-Id: I928d6cf8f90db0b8ca0b1f3bbf10b3d792062cec
James Zern [Thu, 18 Jun 2015 22:24:54 +0000 (22:24 +0000)]
Merge "vp9_reconintra_neon: add DC 4x4 predictors"
James Zern [Wed, 17 Jun 2015 23:34:14 +0000 (16:34 -0700)]
vp9_reconintra_neon: add DC 4x4 predictors
~85-89% faster over 20M pixels
Change-Id: I3812e8adfffe5255034da88dfe6546e12f4d10ee
James Zern [Thu, 18 Jun 2015 22:17:51 +0000 (22:17 +0000)]
Merge "vp9_reconintra_neon: add DC 32x32 predictors"
Jingning Han [Thu, 18 Jun 2015 19:36:52 +0000 (19:36 +0000)]
Merge "Add dynamic range comment to vp9_satd"
Jingning Han [Tue, 16 Jun 2015 21:35:00 +0000 (14:35 -0700)]
Add dynamic range comment to vp9_satd
Change-Id: I75873846e6fdafbe7597a1bd0192115d2d1e9987
Parag Salasakar [Thu, 18 Jun 2015 04:30:52 +0000 (04:30 +0000)]
Merge "mips msa vp9 fdct 32x32 optimization"
Jingning Han [Wed, 17 Jun 2015 15:49:02 +0000 (08:49 -0700)]
Take out assertion for block_yrd in rtc coding flow
The internal behavior of block_yrd in high bit depth settings
differs from the 8-bit one, so the assertion condition does not
hold for high bit depth.
Change-Id: I15dc02e7162d27cabe78c451941d769d488b1174
James Zern [Wed, 17 Jun 2015 05:29:30 +0000 (05:29 +0000)]
Merge "Fix integer overflow issue in rtc coding flow intra mode search"
Jingning Han [Tue, 16 Jun 2015 19:00:50 +0000 (12:00 -0700)]
Fix integer overflow issue in rtc coding flow intra mode search
The overflow issue affects a variable that is only used in inter
mode. This commit fixes the ioc warning triggered in the intra
mode. It does not affect the compression performance.
Change-Id: I593d1b5650599de07f3e68176dd1442c6cb7bdbc
Parag Salasakar [Wed, 17 Jun 2015 02:23:06 +0000 (07:53 +0530)]
mips msa vp9 fdct 32x32 optimization
average improvement ~4x-6x
Change-Id: Ibcac3ef8ed5e207cf8c121e696570e6b63d3c0f4