platform/upstream/libvpx.git
22 months ago[NEON] Optimize highbd 32x32 DCT
Konstantinos Margaritis [Wed, 26 Oct 2022 22:09:32 +0000 (22:09 +0000)]
[NEON] Optimize highbd 32x32 DCT

For --best quality, the resulting function
vpx_highbd_fdct32x32_rd_neon takes 0.27% of CPU time in
profiling, vs 6.27% for the sum of the scalar functions
vpx_fdct32, vpx_fdct32.constprop.0 and vpx_fdct32x32_rd_c for rd.
For --rt quality, the function takes 0.19% vs 4.57% for the scalar
version.
Overall, this improves highbd encoding time by ~6% for --best
and ~9% for --rt.

Change-Id: I1ce4bbef6e364bbadc76264056aa3f86b1a8edc5

23 months agoMerge "[NEON] Optimize and homogenize Butterfly DCT functions" into main
James Zern [Wed, 2 Nov 2022 02:21:18 +0000 (02:21 +0000)]
Merge "[NEON] Optimize and homogenize Butterfly DCT functions" into main

23 months ago[NEON] Optimize and homogenize Butterfly DCT functions
Konstantinos Margaritis [Wed, 26 Oct 2022 21:37:31 +0000 (21:37 +0000)]
[NEON] Optimize and homogenize Butterfly DCT functions

Provide a set of commonly used Butterfly DCT functions for use in
the DCT 4x4, 8x8, 16x16 and 32x32 functions. These are provided in
various forms, with vqrdmulh_s16/vqrdmulh_s32 used for the _fast
variants. Unfortunately, the _fast variants are only usable in pass1
of most DCTs, as they do not provide the necessary precision in pass2.
This gave a performance gain ranging from 5% to 15% in the 16x16 case.
For 32x32, the loads were also rearranged; together with the butterfly
optimizations, this gave a 10% gain in the 32x32_rd function.
This refactoring was necessary to allow easier porting of the highbd
32x32 functions, which follows this patchset.

Change-Id: I6282e640b95a95938faff76c3b2bace3dc298bc3
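
The trade-off described above can be sketched with a scalar model of a single vqrdmulh_s16 lane. This is only an illustration, not the code from the patch: the helper names, the 14-bit constant precision (DCT_CONST_BITS) and the doubled-constant trick are assumptions made for the example. It shows that one rounding doubling multiply matches the widened multiply-and-round-shift as long as the 16-bit add/sub result and the doubled constant both fit in an int16_t lane, which is one reason a 16-bit path has less headroom than a 32-bit one.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point precision of the cosine constants. */
#define DCT_CONST_BITS 14

/* Scalar model of one lane of vqrdmulh_s16: saturating, rounding, doubling
 * multiply that returns the high 16 bits. Assumes arithmetic right shift of
 * negative values, as GCC and Clang provide. */
static int16_t model_vqrdmulh_s16(int16_t a, int16_t b) {
  int32_t doubled;
  /* INT16_MIN * INT16_MIN is the only input pair whose result saturates. */
  if (a == INT16_MIN && b == INT16_MIN) return INT16_MAX;
  doubled = 2 * (int32_t)a * (int32_t)b;
  return (int16_t)((doubled + (1 << 15)) >> 16);
}

/* Reference path: widen to 32 bits, multiply, round-shift by DCT_CONST_BITS. */
static int32_t butterfly_one_coeff_exact(int16_t a, int16_t b, int16_t c,
                                         int add) {
  const int32_t t = add ? (int32_t)a + b : (int32_t)a - b;
  return (t * c + (1 << (DCT_CONST_BITS - 1))) >> DCT_CONST_BITS;
}

/* Hypothetical "fast" path: 16-bit add/sub, then one vqrdmulh with 2 * c. */
static int16_t butterfly_one_coeff_fast(int16_t a, int16_t b, int16_t c,
                                        int add) {
  const int32_t t = add ? (int32_t)a + b : (int32_t)a - b;
  /* Both operands must fit in a 16-bit lane for the model to apply. */
  assert(t >= INT16_MIN && t <= INT16_MAX);
  assert(2 * (int32_t)c >= INT16_MIN && 2 * (int32_t)c <= INT16_MAX);
  return model_vqrdmulh_s16((int16_t)t, (int16_t)(2 * c));
}
```

With c = 11585 (a 14-bit approximation of cos(pi/4)), butterfly_one_coeff_fast(100, 23, 11585, 1) and butterfly_one_coeff_exact(100, 23, 11585, 1) agree; inputs whose sum or difference exceeds the int16_t range are outside what this model covers.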

23 months agoMerge "MacOS 13 is darwin22" into main
Johann Koenig [Thu, 27 Oct 2022 08:38:48 +0000 (08:38 +0000)]
Merge "MacOS 13 is darwin22" into main

23 months agoMerge "rtcd: allow disabling neon on armv8" into main
Johann Koenig [Thu, 27 Oct 2022 08:38:18 +0000 (08:38 +0000)]
Merge "rtcd: allow disabling neon on armv8" into main

23 months agoMacOS 13 is darwin22
Johann [Thu, 27 Oct 2022 02:40:19 +0000 (11:40 +0900)]
MacOS 13 is darwin22

Bug: webm:1783
Change-Id: I97d94ab8c8aebe13aedb58e280dc37474814ad5d

23 months agortcd: allow disabling neon on armv8
Johann [Wed, 26 Oct 2022 23:49:37 +0000 (08:49 +0900)]
rtcd: allow disabling neon on armv8

Change-Id: Idef943775456eb95b46be5c92c114c1d215f38d7

23 months agomailmap: add johann@duck.com
Johann [Wed, 26 Oct 2022 08:14:21 +0000 (17:14 +0900)]
mailmap: add johann@duck.com

Change-Id: I3b48951e69ba1f4a9fafdbb81fac48f79587a342

23 months agoMerge changes I36545ff4,Id1aa29da into main
James Zern [Tue, 25 Oct 2022 19:16:46 +0000 (19:16 +0000)]
Merge changes I36545ff4,Id1aa29da into main

* changes:
  vp9_highbd_quantize_fp*_neon: normalize fn param name
  highbd_sad_avx2: normalize function param names

23 months agoMerge "SAD*Test: mark virtual Run() as overridden" into main
James Zern [Tue, 25 Oct 2022 19:16:08 +0000 (19:16 +0000)]
Merge "SAD*Test: mark virtual Run() as overridden" into main

23 months agoMerge "quantize: consolidate sse2 conditionals" into main
Johann Koenig [Tue, 25 Oct 2022 13:26:37 +0000 (13:26 +0000)]
Merge "quantize: consolidate sse2 conditionals" into main

23 months agoMerge "vp9 quantize: rewrite ssse3 in intrinsics" into main
Johann Koenig [Tue, 25 Oct 2022 13:26:22 +0000 (13:26 +0000)]
Merge "vp9 quantize: rewrite ssse3 in intrinsics" into main

23 months agoSAD*Test: mark virtual Run() as overridden
James Zern [Mon, 24 Oct 2022 22:37:26 +0000 (15:37 -0700)]
SAD*Test: mark virtual Run() as overridden

this comes from AbstractBench

Change-Id: Ie0b5a26a68bfbffd80f132125d15a1bdfc990c22

23 months agovp9_highbd_quantize_fp*_neon: normalize fn param name
James Zern [Mon, 24 Oct 2022 22:28:47 +0000 (15:28 -0700)]
vp9_highbd_quantize_fp*_neon: normalize fn param name

count -> n_coeffs. aligns the name with the rtcd header; clears a
clang-tidy warning

Change-Id: I36545ff479df92b117c95e494f16002e6990f433

23 months agohighbd_sad_avx2: normalize function param names
James Zern [Mon, 24 Oct 2022 22:24:51 +0000 (15:24 -0700)]
highbd_sad_avx2: normalize function param names

(src|ref)8_ptr -> (src|ref)_ptr. aligns the names with the rtcd header;
clears some clang-tidy warnings

Change-Id: Id1aa29da8c0fa5860b46ac902f5b2620c0d3ff54

23 months agoFix to VP8 external RC for buffer levels
Marco Paniconi [Tue, 18 Oct 2022 05:36:25 +0000 (22:36 -0700)]
Fix to VP8 external RC for buffer levels

On a dynamic change of temporal layers:
starting/maximum/optimal were being set twice,
causing incorrectly large values.

Bug: b/253927937
Change-Id: I204e885cff92530336a9ed9a4363c486c5bf80ae

23 months agoquantize: consolidate sse2 conditionals
Johann [Mon, 17 Oct 2022 07:22:23 +0000 (16:22 +0900)]
quantize: consolidate sse2 conditionals

Change-Id: I43de579e30f2967b97064063e29676e0af1a498f

23 months agovp9 quantize: rewrite ssse3 in intrinsics
Johann [Sat, 1 Oct 2022 02:47:05 +0000 (11:47 +0900)]
vp9 quantize: rewrite ssse3 in intrinsics

Change-Id: I3177251a5935453a23a23c39ea5f6fd41254775e

23 months agoMerge "Fix to VP8 external RC for dynamic update of layers" into main
Marco Paniconi [Sat, 15 Oct 2022 01:56:46 +0000 (01:56 +0000)]
Merge "Fix to VP8 external RC for dynamic update of layers" into main

23 months agoFix to VP8 external RC for dynamic update of layers
Marco Paniconi [Wed, 12 Oct 2022 07:10:47 +0000 (00:10 -0700)]
Fix to VP8 external RC for dynamic update of layers

On change/update of rc_cfg: when the number of temporal
layers changes, call vp8_reset_temporal_layer_change(),
which in turn calls vp8_init_temporal_layer_context()
only for the new layers.

Bug: b/249644737

Change-Id: Ib20d746c7eacd10b78806ca6a5362c750d9ca0b3

23 months ago[NEON] fix clang compile warnings
Konstantinos Margaritis [Thu, 13 Oct 2022 15:19:46 +0000 (15:19 +0000)]
[NEON] fix clang compile warnings

Change-Id: Ib7ce7a774ec89ba51169ea64d24c878109ef07d1

23 months agoMerge "Add vpx_highbd_sad64x{64,32}_avg_avx2." into main
Scott LaVarnway [Thu, 13 Oct 2022 11:31:51 +0000 (11:31 +0000)]
Merge "Add vpx_highbd_sad64x{64,32}_avg_avx2." into main

23 months ago[NEON] Add highbd FDCT 16x16 function
Konstantinos Margaritis [Fri, 7 Oct 2022 15:13:29 +0000 (15:13 +0000)]
[NEON] Add highbd FDCT 16x16 function

90-95% faster than C version in best/rt profiles

Change-Id: I41d5e9acdc348b57153637ec736498a25ed84c25

23 months agoMerge "[NEON] Add highbd FDCT 8x8 function" into main
James Zern [Wed, 12 Oct 2022 20:07:51 +0000 (20:07 +0000)]
Merge "[NEON] Add highbd FDCT 8x8 function" into main

23 months agoMerge "Add vpx_highbd_sad32x{64,32,16}_avg_avx2." into main
Scott LaVarnway [Wed, 12 Oct 2022 19:50:55 +0000 (19:50 +0000)]
Merge "Add vpx_highbd_sad32x{64,32,16}_avg_avx2." into main

23 months agoMerge "Add vpx_highbd_sad16x{32,16,8}_avg_avx2." into main
Scott LaVarnway [Wed, 12 Oct 2022 19:44:44 +0000 (19:44 +0000)]
Merge "Add vpx_highbd_sad16x{32,16,8}_avg_avx2." into main

23 months ago[NEON] Add highbd FDCT 8x8 function
Konstantinos Margaritis [Thu, 6 Oct 2022 16:00:43 +0000 (16:00 +0000)]
[NEON] Add highbd FDCT 8x8 function

50% faster than C version in best/rt profiles

Change-Id: I0f9504ed52b5d5f7722407e91108ed4056d66bc2

23 months agoAdd vpx_highbd_sad64x{64,32}_avg_avx2.
Scott LaVarnway [Wed, 12 Oct 2022 17:26:43 +0000 (10:26 -0700)]
Add vpx_highbd_sad64x{64,32}_avg_avx2.

~2.8x faster than the sse2 version.

Bug: b/245917257

Change-Id: Ib727ba8a8c8fa4df450bafdde30ed99fd283f06d
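
For readers unfamiliar with the _avg SAD variants, the following scalar sketch shows the quantity they are understood to compute: the sum of absolute differences between the source block and the rounded average of the reference block and a second predictor. The function name and the direct use of uint16_t buffers are assumptions for illustration; it is not the AVX2 code from the patch, and second_pred is assumed to be packed with a stride equal to the block width.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference for a high-bitdepth "SAD avg" kernel. */
static unsigned int highbd_sad_avg_ref(const uint16_t *src, int src_stride,
                                       const uint16_t *ref, int ref_stride,
                                       const uint16_t *second_pred, int width,
                                       int height) {
  unsigned int sad = 0;
  assert(width > 0 && height > 0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      /* Rounded average of the reference and the second predictor. */
      const int avg = (ref[x] + second_pred[x] + 1) >> 1;
      sad += (unsigned int)abs(src[x] - avg);
    }
    src += src_stride;
    ref += ref_stride;
    second_pred += width; /* assumed packed: stride == width */
  }
  return sad;
}
```

A scalar routine of this shape is mainly useful as a correctness oracle when checking a SIMD version against it.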

23 months ago[NEON] Add highbd FDCT 4x4 function
Konstantinos Margaritis [Thu, 6 Oct 2022 14:53:56 +0000 (14:53 +0000)]
[NEON] Add highbd FDCT 4x4 function

~80% faster than C version for both best/rt profiles.

Change-Id: Ibb3c8e1862131d2a020922420d53c66b31d5c2c3

23 months agoAdd vpx_highbd_sad32x{64,32,16}_avg_avx2.
Scott LaVarnway [Wed, 12 Oct 2022 13:05:46 +0000 (06:05 -0700)]
Add vpx_highbd_sad32x{64,32,16}_avg_avx2.

2.1x to 2.8x faster than the sse2 version.

Bug: b/245917257

Change-Id: I1aaffa4a1debbe5559784e854b8fc6fba07e5000

23 months agoAdd vpx_highbd_sad16x{32,16,8}_avg_avx2.
Scott LaVarnway [Mon, 10 Oct 2022 15:38:44 +0000 (08:38 -0700)]
Add vpx_highbd_sad16x{32,16,8}_avg_avx2.

1.6x to 2.1x faster than the sse2 version.

Bug: b/245917257

Change-Id: I56c467a850297ae3abcca4b4843302bb8d5d0ac1

23 months ago[NEON] Move helper functions for reuse
Konstantinos Margaritis [Thu, 6 Oct 2022 13:05:01 +0000 (13:05 +0000)]
[NEON] Move helper functions for reuse

Move all butterfly functions to fdct_neon.h
Slightly optimize load/scale/cross functions
in fdct 16x16.
These will be reused in highbd variants.

Change-Id: I28b6e0cc240304bab6b94d9c3f33cca77b8cb073

23 months agoMerge "SADavgTest: Add speed test." into main
Scott LaVarnway [Mon, 10 Oct 2022 20:34:02 +0000 (20:34 +0000)]
Merge "SADavgTest: Add speed test." into main

23 months agoSADavgTest: Add speed test.
Scott LaVarnway [Mon, 10 Oct 2022 19:20:37 +0000 (12:20 -0700)]
SADavgTest: Add speed test.

Change-Id: Ie14c0f6d15f410adf749f7ab74cf9f2bf35f3d5f

23 months ago[NEON] move transpose_8x8 to reuse
Konstantinos Margaritis [Thu, 6 Oct 2022 10:58:27 +0000 (10:58 +0000)]
[NEON] move transpose_8x8 to reuse

Change-Id: I3915b6c9971aedaac9c23f21fdb88bc271216208

23 months agoMerge "[NEON] highbd partial DCT functions" into main
James Zern [Mon, 10 Oct 2022 18:37:05 +0000 (18:37 +0000)]
Merge "[NEON] highbd partial DCT functions" into main

23 months ago[NEON] highbd partial DCT functions
Konstantinos Margaritis [Thu, 6 Oct 2022 10:26:05 +0000 (10:26 +0000)]
[NEON] highbd partial DCT functions

Change-Id: I7dd4e698469562f5b1f948cc36f8403b490dcb6a

23 months agoAdd vpx_highbd_sad64x{64,32}_avx2.
Scott LaVarnway [Fri, 7 Oct 2022 12:53:50 +0000 (05:53 -0700)]
Add vpx_highbd_sad64x{64,32}_avx2.

~2.8x faster than the sse2 version.

Bug: b/245917257

Change-Id: Ibc8e5d030ec145c9a9b742fff98fbd9131c9ede4

23 months agoMerge "vp9 quantize: change index" into main
Johann Koenig [Fri, 7 Oct 2022 08:17:03 +0000 (08:17 +0000)]
Merge "vp9 quantize: change index" into main

23 months agoAdd vpx_highbd_sad32x{64,32,16}_avx2.
Scott LaVarnway [Wed, 5 Oct 2022 21:03:55 +0000 (14:03 -0700)]
Add vpx_highbd_sad32x{64,32,16}_avx2.

2.7x to 3.1x faster than the sse2 version.

Bug: b/245917257

Change-Id: Idff3284932f7ee89d036f38893205bf622a159a3

23 months agoAdd vpx_highbd_sad16x{32,16,8}_avx2.
Scott LaVarnway [Wed, 5 Oct 2022 14:04:27 +0000 (07:04 -0700)]
Add vpx_highbd_sad16x{32,16,8}_avx2.

1.9x to 2.4x faster than the sse2 version.

Bug: b/245917257

Change-Id: I686452772f9b72233930de2207af36a0cd72e0bb

23 months agoMerge "L2E: Rework recode decisions for external max frame size and q" into main
Cheng Chen [Tue, 4 Oct 2022 16:15:49 +0000 (16:15 +0000)]
Merge "L2E: Rework recode decisions for external max frame size and q" into main

2 years agovp9 quantize: change index
Johann [Sat, 1 Oct 2022 02:18:09 +0000 (11:18 +0900)]
vp9 quantize: change index

In assembly it made sense to iterate using n_coeffs.
In intrinsics it's just as fast to use index and
easier to read.

Change-Id: I403c959709309dad68123d0a3d0efe183874543d
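
The two iteration styles mentioned above can be compared in a small sketch. The function names and the summing body are placeholders chosen for the example, not the quantizer code: the first loop mimics the assembly idiom of pointing past the end of the buffer and walking a negative offset up to zero, the second uses a plain index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assembly-style iteration: advance the pointer to the end of the buffer,
 * then count a negative offset up towards zero. */
static int64_t sum_negative_offset(const int16_t *coeff, ptrdiff_t n_coeffs) {
  int64_t sum = 0;
  assert(n_coeffs >= 0);
  coeff += n_coeffs;
  for (ptrdiff_t i = -n_coeffs; i < 0; ++i) sum += coeff[i];
  return sum;
}

/* Index-style iteration: same traversal order, easier to read. */
static int64_t sum_index(const int16_t *coeff, ptrdiff_t n_coeffs) {
  int64_t sum = 0;
  assert(n_coeffs >= 0);
  for (ptrdiff_t index = 0; index < n_coeffs; ++index) sum += coeff[index];
  return sum;
}
```

Both functions visit the elements in the same order, so switching styles should not change results.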

2 years agovpx_subpixel_8t_intrin_avx2.c: quiet -Wuninitialized
Scott LaVarnway [Mon, 19 Sep 2022 12:09:23 +0000 (05:09 -0700)]
vpx_subpixel_8t_intrin_avx2.c: quiet -Wuninitialized

warning: ‘s2[3]’ may be used uninitialized
and
warning: ‘s1[3]’ may be used uninitialized

The warnings exposed unused code.

Change-Id: I75cf1f9db75e811cb42e2f143be1ad76f3e4dee9

2 years agoMerge "vp9_rd.c quiet -Wstringop-overflow" into main
Scott LaVarnway [Mon, 26 Sep 2022 23:18:04 +0000 (23:18 +0000)]
Merge "vp9_rd.c quiet -Wstringop-overflow" into main

2 years agoquantize: standardize vp9_quantize_fp_sse2
Johann [Sat, 24 Sep 2022 01:53:05 +0000 (10:53 +0900)]
quantize: standardize vp9_quantize_fp_sse2

Match style for vpx_quantize_b_sse2 and prepare to rewrite
ssse3 version in intrinsics.

Need to evaluate the value of threshold breakout before
going further.

Change-Id: I9cfceb1bb0dc237cd6b73fc8d41d78bba444a15b

2 years agovp9_rd.c quiet -Wstringop-overflow
Scott LaVarnway [Fri, 23 Sep 2022 16:17:18 +0000 (09:17 -0700)]
vp9_rd.c quiet -Wstringop-overflow

../libvpx/vp9/encoder/vp9_rd.c:594:20: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
  594 |         t_above[i] = !!*(const uint32_t *)&above[i];
      |         ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../libvpx/vp9/encoder/vp9_rd.c:572:47: note: at offset [64, 254] into destination object ‘t_above’ of size [0, 16]
  572 |                               ENTROPY_CONTEXT t_above[16],
      |                               ~~~~~~~~~~~~~~~~^~~~~~~~~~~

Change-Id: Ie9ef24e685af417cdd35f6aa7284805e422b6ae2

2 years agoquantize: add untested function
Johann [Sat, 24 Sep 2022 01:55:52 +0000 (10:55 +0900)]
quantize: add untested function

vp9_quantize_fp_sse2 was only tested in non-hbd
configuration. Missed when fixing this for
vpx_quantize_b_sse2.

Change-Id: Ide346e5727d74281c774f605c90d280050e0bf62

2 years agoquantize: increase iscan by 1
Johann [Fri, 16 Sep 2022 23:47:28 +0000 (08:47 +0900)]
quantize: increase iscan by 1

All of the assembly adds 1 to iscan to convert from
a 0 based array to the EOB value.

Add 1 to all iscan values and remove the extra
instructions from the assembly.

Change-Id: I219dd7f2bd10533ab24b206289565703176dc5e9
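
A scalar sketch of the idea, with hypothetical function and table names: the EOB (end of block) is one past the highest scan position that holds a nonzero quantized coefficient. With a 0-based iscan table every candidate needs a "+ 1"; with a table that already stores position + 1, the addition disappears from the loop.

```c
#include <assert.h>
#include <stdint.h>

/* 0-based table: iscan[i] is the scan position of raster index i. */
static int eob_zero_based(const int16_t *qcoeff, const int16_t *iscan, int n) {
  int eob = 0;
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    if (qcoeff[i] != 0 && iscan[i] + 1 > eob) eob = iscan[i] + 1;
  }
  return eob;
}

/* Pre-incremented table: iscan_plus1[i] already holds scan position + 1. */
static int eob_one_based(const int16_t *qcoeff, const int16_t *iscan_plus1,
                         int n) {
  int eob = 0;
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    if (qcoeff[i] != 0 && iscan_plus1[i] > eob) eob = iscan_plus1[i];
  }
  return eob;
}
```

An all-zero block yields an EOB of 0 in both versions, because no table entry is consulted.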

2 years agoMerge "resize_test.cc: quiet -Wmaybe-uninitialized" into main
Scott LaVarnway [Wed, 21 Sep 2022 23:41:42 +0000 (23:41 +0000)]
Merge "resize_test.cc: quiet -Wmaybe-uninitialized" into main

2 years agoresize_test.cc: quiet -Wmaybe-uninitialized
Scott LaVarnway [Wed, 21 Sep 2022 19:15:16 +0000 (12:15 -0700)]
resize_test.cc: quiet -Wmaybe-uninitialized

warning: ‘expected_w’ may be used uninitialized
Change-Id: I915efd82d3263250cea90391345f7683c1330fc8

2 years agoMerge "post_proc_sse2.c: quiet -Wuninitialized" into main
Scott LaVarnway [Wed, 21 Sep 2022 20:53:07 +0000 (20:53 +0000)]
Merge "post_proc_sse2.c: quiet -Wuninitialized" into main

2 years agopost_proc_sse2.c: quiet -Wuninitialized
Scott LaVarnway [Wed, 21 Sep 2022 18:37:04 +0000 (11:37 -0700)]
post_proc_sse2.c: quiet -Wuninitialized

In file included from ../libvpx/vpx_dsp/x86/post_proc_sse2.c:12:
In function ‘_mm_add_epi16’,
    inlined from ‘vpx_mbpost_proc_down_sse2’ at ../libvpx/vpx_dsp/x86/post_proc_sse2.c:88:13:
/usr/lib/gcc/x86_64-linux-gnu/12/include/emmintrin.h:1060:35: warning: ‘below_context’ may be used uninitialized [-Wmaybe-uninitialized]
 1060 |   return (__m128i) ((__v8hu)__A + (__v8hu)__B);
      |                                   ^~~~~~~~~~~
../libvpx/vpx_dsp/x86/post_proc_sse2.c: In function ‘vpx_mbpost_proc_down_sse2’:
../libvpx/vpx_dsp/x86/post_proc_sse2.c:39:13: note: ‘below_context’ was declared here
   39 |     __m128i below_context;

Change-Id: I2fc592f121c4e85d0aff1640014c3444f5eb09fd

2 years agoMerge "CHECK_MEM_ERROR: add an assert for a valid jmp target" into main
James Zern [Tue, 20 Sep 2022 23:24:44 +0000 (23:24 +0000)]
Merge "CHECK_MEM_ERROR: add an assert for a valid jmp target" into main

2 years agoMerge "quantize: test lowbd in highbd builds" into main
Johann Koenig [Tue, 20 Sep 2022 00:12:13 +0000 (00:12 +0000)]
Merge "quantize: test lowbd in highbd builds" into main

2 years agoquantize: test lowbd in highbd builds
Johann [Sun, 18 Sep 2022 01:26:00 +0000 (10:26 +0900)]
quantize: test lowbd in highbd builds

Change-Id: I7af273e979415a8b8cafb7494728d2736862f4a5

2 years agofwd_txfm: remove avx2 file from non-hbd
Johann [Fri, 16 Sep 2022 22:54:40 +0000 (07:54 +0900)]
fwd_txfm: remove avx2 file from non-hbd

Resolves warning on OS X:
file: libvpx_g.a(fwd_txfm_avx2.c.o) has no symbols

Change-Id: Ie8b290bb3ed329656beb883d552c98353f1ed5e5

2 years agoL2E: Rework recode decisions for external max frame size and q
Cheng Chen [Wed, 14 Sep 2022 18:40:50 +0000 (11:40 -0700)]
L2E: Rework recode decisions for external max frame size and q

Allow external q and external max frame size to be handled separately.
Rely on libvpx's decision to catch overshoot/undershoot and recode frames.

Previously, when external max frame size is set, we didn't handle
undershoot cases, and now we fall back to libvpx's decision to
recode a frame if overshoot/undershoot is seen.

Change-Id: Ic3eee042cfe104b528c5f2c6c82b98dd5d8fa8ca

2 years agoAdd vpx_highbd_sad64x{64,32}x4d_avx2.
Scott LaVarnway [Wed, 14 Sep 2022 10:36:46 +0000 (03:36 -0700)]
Add vpx_highbd_sad64x{64,32}x4d_avx2.

~2x faster than the sse2 version.

Bug: b/245917257

Change-Id: I4742950ab7b90d7f09e8d4687e1e967138acee39
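
As a point of reference for the x4d kernels, the sketch below shows the operation in scalar form: one source block is compared against four reference blocks in a single call, producing four SAD values. The names and the direct use of uint16_t buffers are assumptions for illustration and do not reproduce the AVX2 implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar SAD for one high-bitdepth block. */
static uint32_t highbd_sad_ref(const uint16_t *src, int src_stride,
                               const uint16_t *ref, int ref_stride, int width,
                               int height) {
  uint32_t sad = 0;
  assert(width > 0 && height > 0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) sad += (uint32_t)abs(src[x] - ref[x]);
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}

/* "x4d" form: one source block against four reference blocks. */
static void highbd_sad_x4d_ref(const uint16_t *src, int src_stride,
                               const uint16_t *const ref[4], int ref_stride,
                               int width, int height, uint32_t sad_array[4]) {
  for (int i = 0; i < 4; ++i) {
    sad_array[i] =
        highbd_sad_ref(src, src_stride, ref[i], ref_stride, width, height);
  }
}
```

A SIMD version can load each source row once and reuse it for all four references, which is the usual motivation for batching the comparisons.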

2 years agoAdd vpx_highbd_sad32x{64,32,16}x4d_avx2.
Scott LaVarnway [Mon, 12 Sep 2022 14:40:39 +0000 (07:40 -0700)]
Add vpx_highbd_sad32x{64,32,16}x4d_avx2.

~2.4x faster than the sse2 version.

Bug: b/245917257

Change-Id: I6df2bd62b46e5e175c8ad80daa6de3a1c313db0f

2 years agoCHECK_MEM_ERROR: add an assert for a valid jmp target
James Zern [Sat, 28 May 2022 04:53:49 +0000 (21:53 -0700)]
CHECK_MEM_ERROR: add an assert for a valid jmp target

callers of CHECK_MEM_ERROR() expect failures to not return

tested with:
configure --enable-debug --enable-vp9-postproc --enable-postproc \
  --enable-multi-res-encoding --enable-vp9-temporal-denoising \
  --enable-error-concealment

--enable-internal-stats has unrelated assertion failures currently

Change-Id: Ic12073b1ae80a6f434f14d24f652e64d30f63eea
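
The contract described above (a failed check must not return to the caller) can be sketched with setjmp/longjmp. Everything here is hypothetical: the struct, the macro name CHECK_MEM and the setjmp_armed flag are stand-ins for illustration and are not libvpx's definitions. The assert documents the assumption that a jump target was armed before the error path runs.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical error context; not libvpx's struct. */
struct error_info {
  int setjmp_armed; /* nonzero once setjmp() has stored a valid target */
  jmp_buf jmp;
};

/* Error path: never returns to its caller. */
static void internal_error(struct error_info *e) {
  assert(e->setjmp_armed); /* a valid jump target must exist */
  longjmp(e->jmp, 1);
}

/* Hypothetical allocation check in the spirit of CHECK_MEM_ERROR(). */
#define CHECK_MEM(e, lval, expr)        \
  do {                                  \
    (lval) = (expr);                    \
    if (!(lval)) internal_error((e));   \
  } while (0)

/* Returns 0 on success, -1 if the (simulated) allocation failed. */
static int alloc_or_fail(size_t n, int simulate_failure) {
  struct error_info e = { 0 };
  void *volatile buf = NULL;
  if (setjmp(e.jmp)) {
    /* Reached only through longjmp() from internal_error(). */
    return -1;
  }
  e.setjmp_armed = 1;
  CHECK_MEM(&e, buf, simulate_failure ? NULL : malloc(n));
  free(buf);
  return 0;
}
```

Calling alloc_or_fail(16, 1) takes the error path and comes back through the setjmp() branch instead of continuing after CHECK_MEM.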

2 years agoMerge "Add vpx_highbd_sad16x{32,16,8}x4d_avx2." into main
Scott LaVarnway [Mon, 12 Sep 2022 12:18:19 +0000 (12:18 +0000)]
Merge "Add vpx_highbd_sad16x{32,16,8}x4d_avx2." into main

2 years agoUpdate third_party/googletest to v1.12.1
Wan-Teh Chang [Thu, 8 Sep 2022 22:35:13 +0000 (15:35 -0700)]
Update third_party/googletest to v1.12.1

See https://github.com/google/googletest/releases/tag/release-1.12.1.

Modeled after https://aomedia-review.googlesource.com/c/aom/+/162601.

Change-Id: If0ced3097b4c8490985e3381aaac9b3266d52ae7

2 years agoAdd vpx_highbd_sad16x{32,16,8}x4d_avx2.
Scott LaVarnway [Thu, 8 Sep 2022 20:05:55 +0000 (13:05 -0700)]
Add vpx_highbd_sad16x{32,16,8}x4d_avx2.

1.98x to 2.3x faster than the sse2 version.

Bug: b/245917257

Change-Id: Ie4f9bb942ffaf4af7d395fb5a5978b41aabfc93c

2 years agovp8_decode: declare 2 variables volatile
James Zern [Thu, 8 Sep 2022 01:41:13 +0000 (18:41 -0700)]
vp8_decode: declare 2 variables volatile

fixes -Wclobbered warnings with gcc 12.1.0:
vp8/vp8_dx_iface.c|278 col 16| warning: variable 'w' might be clobbered
by 'longjmp' or 'vfork' [-Wclobbered]
vp8/vp8_dx_iface.c|278 col 19| warning: variable 'h' might be clobbered
by 'longjmp' or 'vfork' [-Wclobbered]

Change-Id: Ib2c606a3450188d7869c066cacaf5615d9746181
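
The warning concerns a general C rule: a non-volatile automatic variable that is modified between setjmp() and longjmp() has an indeterminate value after the jump. A minimal sketch with hypothetical names (not the code in vp8_dx_iface.c) shows why declaring such variables volatile makes the post-jump reads well defined.

```c
#include <setjmp.h>

static jmp_buf env;

/* Simulated failure deep inside decoding. */
static void fail(void) { longjmp(env, 1); }

/* Returns w * 1000 + h as observed when the function exits. */
static int decode_dims(int fail_midway) {
  /* volatile: both are written after setjmp() and read after longjmp(). */
  volatile int w = 0;
  volatile int h = 0;
  if (setjmp(env)) {
    /* Reached through longjmp(); w and h hold their last stored values. */
    return w * 1000 + h;
  }
  w = 640;
  if (fail_midway) fail();
  h = 480;
  return w * 1000 + h;
}
```

Without volatile the compiler may keep w and h in registers that longjmp() restores to their setjmp()-time contents, which is what -Wclobbered warns about.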

2 years agoMerge "x86,cosmetics: prefer _mm_setzero_si128/_mm256_setzero_si256" into main
James Zern [Tue, 6 Sep 2022 22:23:30 +0000 (22:23 +0000)]
Merge "x86,cosmetics: prefer _mm_setzero_si128/_mm256_setzero_si256" into main

2 years agosad_neon: enable UDOT implementation w/aarch32
James Zern [Fri, 2 Sep 2022 23:55:43 +0000 (16:55 -0700)]
sad_neon: enable UDOT implementation w/aarch32

Change-Id: Ia28305ec5c61518b732cbacbd102acd2cb7f9d82

2 years agovariance_neon.cc: simplify __ARM_FEATURE_DOTPROD check
James Zern [Fri, 2 Sep 2022 23:44:14 +0000 (16:44 -0700)]
variance_neon.cc: simplify __ARM_FEATURE_DOTPROD check

missed in
447e27588 vpx_dsp,neon: simplify __ARM_FEATURE_DOTPROD check

+ fix #if comments

only check that the macro is defined, the value doesn't have any effect.

from https://arm-software.github.io/acle/main/acle.html:

5.5.7.7.  Dot Product extension
  __ARM_FEATURE_DOTPROD is defined if the dot product data manipulation
  instructions are supported and the vector intrinsics are available.
  Note that this implies:
    - __ARM_NEON == 1

Change-Id: I098b96421b7de5928bb3b11612ca1f32e7b6cbc4

2 years agox86,cosmetics: prefer _mm_setzero_si128/_mm256_setzero_si256
James Zern [Fri, 2 Sep 2022 23:17:52 +0000 (16:17 -0700)]
x86,cosmetics: prefer _mm_setzero_si128/_mm256_setzero_si256

over *_set1_*(0)

Change-Id: I136e1798a2ce286480ebb9418db67a2f1e92b9a2

2 years agovpx_dsp,neon: simplify __ARM_FEATURE_DOTPROD check
James Zern [Fri, 2 Sep 2022 19:17:20 +0000 (12:17 -0700)]
vpx_dsp,neon: simplify __ARM_FEATURE_DOTPROD check

only check that the macro is defined, the value doesn't have any effect.

from https://arm-software.github.io/acle/main/acle.html:

5.5.7.7.  Dot Product extension
  __ARM_FEATURE_DOTPROD is defined if the dot product data manipulation
  instructions are supported and the vector intrinsics are available.
  Note that this implies:
    - __ARM_NEON == 1

Change-Id: I164fe121ccefda99050a9b6a99738a2b518520f3
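
A sketch of the simplified check, using a hypothetical helper: the dot-product path is selected with defined(__ARM_FEATURE_DOTPROD) alone, with a scalar fallback so the example also builds on targets without the extension. The intrinsics used (vdotq_u32, vld1q_u8, vpadd_u32 and friends) are standard arm_neon.h calls, but the function itself is illustrative and not taken from libvpx.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#endif

/* Dot product of two 16-byte vectors of unsigned 8-bit values. */
static uint32_t dot_u8x16(const uint8_t *a, const uint8_t *b) {
#if defined(__ARM_FEATURE_DOTPROD)
  /* UDOT: four 32-bit lanes, each the sum of four 8-bit products. */
  const uint32x4_t acc = vdotq_u32(vdupq_n_u32(0), vld1q_u8(a), vld1q_u8(b));
  /* Horizontal add written with intrinsics available on AArch32 as well. */
  const uint32x2_t half = vadd_u32(vget_low_u32(acc), vget_high_u32(acc));
  return vget_lane_u32(vpadd_u32(half, half), 0);
#else
  /* Scalar fallback for targets without the dot-product extension. */
  uint32_t sum = 0;
  for (int i = 0; i < 16; ++i) sum += (uint32_t)a[i] * b[i];
  return sum;
#endif
}
```

Testing only whether the macro is defined keeps the preprocessor condition short and matches the ACLE wording quoted above.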

2 years agoneon,load_unaligned_*: use dup for lane 0
James Zern [Fri, 2 Sep 2022 01:47:50 +0000 (18:47 -0700)]
neon,load_unaligned_*: use dup for lane 0

this produces better assembly with gcc (11.3.0-3); no change in assembly
using clang from the r24 android sdk (Android (8075178, based on
r437112b) clang version 14.0.1
(https://android.googlesource.com/toolchain/llvm-project
8671348b81b95fc603505dfc881b45103bee1731))

Change-Id: Ifec252d4f499f23be1cd94aa8516caf6b3fbbc11

2 years agotest/*,cosmetics: normalize void parameter lists
James Zern [Wed, 31 Aug 2022 23:35:08 +0000 (16:35 -0700)]
test/*,cosmetics: normalize void parameter lists

replace (void) with (); use of this synonym is more common in C++ code.

Change-Id: I9813e82234dc9caa7115918a0491b0040f6afaf4

2 years agoRemove const for pass-by-value parameters
Yaowu Xu [Tue, 30 Aug 2022 16:04:58 +0000 (09:04 -0700)]
Remove const for pass-by-value parameters

This also fixes MSVC compiler warnings.

Change-Id: I20dc9ac821275ba95598f3016fc6b23e884e13b7
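
Some background, as a sketch with a hypothetical function: in C a top-level const on a by-value parameter only affects the callee's own copy and is not part of the function type, so a prototype without const and a definition with const are compatible. Some compilers still warn when the two spellings differ, which is one plausible reason to drop the qualifier; the exact MSVC warnings fixed by this change are not stated here.

```c
/* Prototype as a header might declare it: no const on the by-value argument. */
int clamp_to_byte(int value);

/* The definition may still qualify its local copy. The types are compatible
 * because top-level parameter qualifiers are ignored when comparing function
 * types, although some compilers warn about the mismatch. */
int clamp_to_byte(const int value) {
  if (value < 0) return 0;
  if (value > 255) return 255;
  return value;
}

/* A pointer to the unqualified function type accepts the function as is. */
static int (*const clamp_fn)(int) = clamp_to_byte;
```

Callers see no difference either way: the argument is copied, so the qualifier cannot protect or expose anything on their side.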

2 years agoMerge "L2E: Add gop size and ARF existence to frame info" into main
Cheng Chen [Tue, 30 Aug 2022 04:25:33 +0000 (04:25 +0000)]
Merge "L2E: Add gop size and ARF existence to frame info" into main

2 years agohighbd_variance_neon,cosmetics: reorder a few lines
James Zern [Sat, 27 Aug 2022 05:12:44 +0000 (22:12 -0700)]
highbd_variance_neon,cosmetics: reorder a few lines

Change-Id: Ia6fa54652d7f94687e64108482bb0f28ca06cf49

2 years agoL2E: Add gop size and ARF existence to frame info
Cheng Chen [Fri, 26 Aug 2022 21:29:32 +0000 (14:29 -0700)]
L2E: Add gop size and ARF existence to frame info

Pass the encode frame info to external ml model, with the information
of gop size and whether alt ref is used.

Change-Id: I55be2d3de83d7182c1a1a174e44ead7e19045c9d

2 years agoMerge "[NEON] Add highbd *variance* functions" into main
James Zern [Fri, 26 Aug 2022 02:07:34 +0000 (02:07 +0000)]
Merge "[NEON] Add highbd *variance* functions" into main

2 years agoMerge "vpx_encoder.h: note VPX_ERROR_RESILIENT_PARTITIONS is VP8-only" into main
James Zern [Fri, 26 Aug 2022 02:01:55 +0000 (02:01 +0000)]
Merge "vpx_encoder.h: note VPX_ERROR_RESILIENT_PARTITIONS is VP8-only" into main

2 years ago[NEON] Add highbd *variance* functions
Konstantinos Margaritis [Wed, 24 Aug 2022 12:28:43 +0000 (12:28 +0000)]
[NEON] Add highbd *variance* functions

        Total gain for 12-bit encoding:
        * ~7.2% for best profile
        * ~5.8% for rt profile

Change-Id: I5b70415fb89d1bbb02a0c139eb317ba6b08adede

2 years agoMerge "vp9: fix ubsan sub-overflows" into main
James Zern [Thu, 25 Aug 2022 20:44:41 +0000 (20:44 +0000)]
Merge "vp9: fix ubsan sub-overflows" into main

2 years agovpx_encoder.h: note VPX_ERROR_RESILIENT_PARTITIONS is VP8-only
James Zern [Thu, 25 Aug 2022 17:50:16 +0000 (10:50 -0700)]
vpx_encoder.h: note VPX_ERROR_RESILIENT_PARTITIONS is VP8-only

Change-Id: If71b2ec766f9f41253ce5a34987ffd208f9c8381

2 years agoMerge "vp8_ratectrl_rtc_test.cc: ensure frame_type is initialized" into main
James Zern [Thu, 25 Aug 2022 16:55:42 +0000 (16:55 +0000)]
Merge "vp8_ratectrl_rtc_test.cc: ensure frame_type is initialized" into main

2 years agolibs.doxy_template: remove obsolete CLASS_DIAGRAMS
James Zern [Thu, 25 Aug 2022 01:50:10 +0000 (18:50 -0700)]
libs.doxy_template: remove obsolete CLASS_DIAGRAMS

This was reported with doxygen 1.9.4.

Also update the comment for CLASS_GRAPH by running "doxygen -u" because
the original comment for CLASS_GRAPH mentions the obsolete tag
'CLASS_DIAGRAMS'.

Change-Id: I3bca547201f794d363bd814b7c7f7c9d7088797a

2 years agovp8_ratectrl_rtc_test.cc: ensure frame_type is initialized
James Zern [Wed, 24 Aug 2022 22:48:24 +0000 (15:48 -0700)]
vp8_ratectrl_rtc_test.cc: ensure frame_type is initialized

this fixes a valgrind failure:
==1095597== Conditional jump or move depends on uninitialised value(s)
==1095597==    at 0x12E0CC: (anonymous namespace)::Vp8RcInterfaceTest::PreEncodeFrameHook(libvpx_test::VideoSource*, libvpx_test::Encoder*) (vp8_ratectrl_rtc_test.cc:131)
==1095597==    by 0x1255A9: libvpx_test::EncoderTest::RunLoop(libvpx_test::VideoSource*) (encode_test_driver.cc:205)

Bug: webm:1776
Change-Id: Id3b40f62573ee513e79c74b6315c71b6ecd22c9a
Fixed: webm:1776

2 years agoMerge "[NEON] Improve vpx_quantize_b* functions" into main
James Zern [Wed, 24 Aug 2022 19:18:25 +0000 (19:18 +0000)]
Merge "[NEON] Improve vpx_quantize_b* functions" into main

2 years ago.clang-format: update to clang-format-11
clang-format [Sat, 13 Aug 2022 17:33:56 +0000 (10:33 -0700)]
.clang-format: update to clang-format-11

only store the deltas from --style Google in the file and reapply using
Debian clang-format version 11.1.0-6+build1

Bug: b/229626362
Change-Id: I3e18a2e7c17a90a48405b3cf1b37ebc652aba0db

2 years ago[NEON] Improve vpx_quantize_b* functions
Konstantinos Margaritis [Sat, 20 Aug 2022 19:02:15 +0000 (19:02 +0000)]
[NEON] Improve vpx_quantize_b* functions

Slight optimization; prefetch gives a 1% improvement in the 1st pass

Change-Id: Iba4664964664234666406ab53893e02d481fbe61

2 years agovp9_ratectrl_rtc_test: initialize loopfilter_ctrl[]
James Zern [Tue, 23 Aug 2022 02:33:26 +0000 (19:33 -0700)]
vp9_ratectrl_rtc_test: initialize loopfilter_ctrl[]

this was added in:
  7beafefd1 vp9: Allow for disabling loopfilter per spatial layer
but the test doesn't zero initialize its svc_params_ member.

fixes the use of an uninitialized value, reported by valgrind and
integer sanitizer:
[ RUN      ] VP9/RcInterfaceSvcTest.Svc/0
==1064682== Conditional jump or move depends on uninitialised value(s)
==1064682==    at 0x1C5624: loopfilter_frame (vp9_encoder.c:3285)
==1064682==    by 0x1C9B54: encode_frame_to_data_rate (vp9_encoder.c:5595)
==1064682==    by 0x1CA2EE: SvcEncode (vp9_encoder.c:5789)
==1064682==    by 0x1CEA01: vp9_get_compressed_data (vp9_encoder.c:7891)
==1064682==    by 0x185F0E: encoder_encode (vp9_cx_iface.c:1437)
==1064682==    by 0x1503BB: vpx_codec_encode (vpx_encoder.c:208)

vp9/encoder/vp9_svc_layercontext.c:362:26: runtime error: implicit
conversion from type 'int' of value -1 (32-bit, signed) to type
'LOOPFILTER_CONTROL' changed the value to 4294967295 (32-bit, unsigned)
    #0 0x558925f45377 in vp9_restore_layer_context vp9/encoder/vp9_svc_layercontext.c:362:26
    #1 0x558925ef89fd in vp9_get_compressed_data vp9/encoder/vp9_encoder.c:7781:5
    #2 0x558925e3ef3e in encoder_encode vp9/vp9_cx_iface.c:1437:20

Bug: b/229626362
Change-Id: I33d244be7752c68b71efa9c62ca45d6b202ec761
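
The class of bug described above can be illustrated with a hypothetical parameter struct (the type, field names and layer count are invented for the example): when a new array field is added to a struct and an existing caller only fills in the fields it knows about, the new field keeps whatever bytes were in memory. Zeroing the whole struct before populating it avoids that.

```c
#include <assert.h>
#include <string.h>

#define MAX_SPATIAL_LAYERS 5 /* hypothetical layer count */

/* Hypothetical parameter block; loopfilter_ctrl stands for a field added
 * after the initialization code was written. */
struct svc_params {
  int scaling_factor_num[MAX_SPATIAL_LAYERS];
  int loopfilter_ctrl[MAX_SPATIAL_LAYERS];
};

/* Zero every field first, then set only the fields the caller cares about. */
static void init_svc_params(struct svc_params *p) {
  assert(p != NULL);
  memset(p, 0, sizeof(*p));
  p->scaling_factor_num[0] = 1;
}
```

After init_svc_params(), every loopfilter_ctrl entry is 0 regardless of what the memory held before, so later reads no longer depend on uninitialized data.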

2 years agoMerge "vp9.read_inter_block_mode_info: return on corruption" into main
James Zern [Mon, 22 Aug 2022 22:36:09 +0000 (22:36 +0000)]
Merge "vp9.read_inter_block_mode_info: return on corruption" into main

2 years agoMerge "highbd_quantize_neon.c: remove unneeded assert.h" into main
James Zern [Mon, 22 Aug 2022 22:21:46 +0000 (22:21 +0000)]
Merge "highbd_quantize_neon.c: remove unneeded assert.h" into main

2 years agoMerge "vp9,search_new_mv: descale rather than scale sse" into main
James Zern [Mon, 22 Aug 2022 22:21:28 +0000 (22:21 +0000)]
Merge "vp9,search_new_mv: descale rather than scale sse" into main

2 years agoMerge changes Iabed118b,I60a384b2 into main
James Zern [Mon, 22 Aug 2022 22:21:00 +0000 (22:21 +0000)]
Merge changes Iabed118b,I60a384b2 into main

* changes:
  use VPX_NO_UNSIGNED_SHIFT_CHECK with entropy functions
  compiler_attributes.h: add VPX_NO_UNSIGNED_SHIFT_CHECK

2 years ago[NEON] Add vpx_highbd_subtract_block function
Konstantinos Margaritis [Mon, 22 Aug 2022 19:46:50 +0000 (19:46 +0000)]
[NEON] Add vpx_highbd_subtract_block function

    Total gain for 12-bit encoding:
    * ~1% for best and rt profile

Change-Id: I4039120dc570baab1ae519a5e38b1acff38d81f0

2 years ago[NEON] Added vpx_highbd_sad* functions
Konstantinos Margaritis [Fri, 19 Aug 2022 22:00:42 +0000 (22:00 +0000)]
[NEON] Added vpx_highbd_sad* functions

    Total gain for 12-bit encoding:
    * ~7.8% for best profile
    * ~10% for rt profile

Change-Id: I89eda5c4372a5b628c9df84cdeb4c8486fc44789

2 years agohighbd_quantize_neon.c: remove unneeded assert.h
James Zern [Mon, 22 Aug 2022 17:48:40 +0000 (10:48 -0700)]
highbd_quantize_neon.c: remove unneeded assert.h

Change-Id: I041f5fb23b856a2b519669b5bf8a40d3772b4a6e

2 years agoMerge "[NEON] Added vpx_highbd_quantize_b* functions" into main
James Zern [Mon, 22 Aug 2022 17:45:52 +0000 (17:45 +0000)]
Merge "[NEON] Added vpx_highbd_quantize_b* functions" into main

2 years agoMerge "Fix TEST_P(SADx4Test, DISABLED_Speed)" into main
Scott LaVarnway [Mon, 22 Aug 2022 10:36:09 +0000 (10:36 +0000)]
Merge "Fix TEST_P(SADx4Test, DISABLED_Speed)" into main

2 years ago[NEON] Added vpx_highbd_quantize_b* functions
Konstantinos Margaritis [Fri, 12 Aug 2022 17:41:11 +0000 (17:41 +0000)]
[NEON] Added vpx_highbd_quantize_b* functions

    Total gain for 12-bit encoding:
    * ~4.8% for best profile
    * ~6.2% for rt profile

Change-Id: I61e646ab7aedf06a25db1365d6d1cf7b05101c21

2 years agoMerge "loopfilter.c: normalize flat func param type" into main
James Zern [Sat, 20 Aug 2022 00:00:06 +0000 (00:00 +0000)]
Merge "loopfilter.c: normalize flat func param type" into main

2 years agovp9.read_inter_block_mode_info: return on corruption
James Zern [Fri, 19 Aug 2022 00:51:19 +0000 (17:51 -0700)]
vp9.read_inter_block_mode_info: return on corruption

with block sizes < 8x8 previously only the inner loop was aborted. this
could cause propagation of invalid motion vectors to scale_mv().

this quiets integer sanitizer warnings of the form:
vp9/common/vp9_mvref_common.h:239:18: runtime error: implicit conversion
from type 'int' of value 32768 (32-bit, signed) to type 'int16_t' (aka
'short') changed the value to -32768 (16-bit, signed)

Bug: b/229626362
Change-Id: I58b5a425adf21542cbf4cc4dd5ab3cc5ed008264
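
The control-flow issue can be sketched with a hypothetical 2x2 sub-block loop (the names and the validity bound are invented for illustration): a break inside the inner loop only leaves that loop, so the outer loop keeps processing later sub-blocks after corruption was detected, whereas a return stops all further work.

```c
#include <assert.h>
#include <stddef.h>

#define MV_LIMIT 16384 /* hypothetical validity bound, not libvpx's */

static int mv_is_valid(int v) { return v > -MV_LIMIT && v < MV_LIMIT; }

/* Old behaviour: break only leaves the inner loop, so the next row of
 * sub-blocks is still processed. Returns the number of blocks processed. */
static int read_sub8x8_break_only(const int mv[2][2], int *corrupted) {
  int processed = 0;
  assert(corrupted != NULL);
  for (int idy = 0; idy < 2; ++idy) {
    for (int idx = 0; idx < 2; ++idx) {
      if (!mv_is_valid(mv[idy][idx])) {
        *corrupted = 1;
        break; /* inner loop only */
      }
      ++processed;
    }
  }
  return processed;
}

/* Fixed behaviour: return as soon as corruption is detected. */
static int read_sub8x8_return(const int mv[2][2], int *corrupted) {
  int processed = 0;
  assert(corrupted != NULL);
  for (int idy = 0; idy < 2; ++idy) {
    for (int idx = 0; idx < 2; ++idx) {
      if (!mv_is_valid(mv[idy][idx])) {
        *corrupted = 1;
        return processed;
      }
      ++processed;
    }
  }
  return processed;
}
```

With an invalid vector in the first row, the break-only version still processes the second row, while the returning version stops after the first valid entry.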