Yunqing Wang [Mon, 7 Aug 2023 20:10:41 +0000 (13:10 -0700)]
Disable vpx_int_pro_row/col neon SIMD functions
The Neon SIMD versions of vpx_int_pro_row/col caused a mismatch between
Neon encoding and C encoding. Disable them for now to ensure the
correctness of VP9 encoding on the Arm platform. Since these 2
functions are not used much, this won't affect the overall
encoder speed much.
BUG=webm:1800
BUG=webm:1809
Change-Id: Id1a7d542fc03d4cf9fa1039a49832abf35fb722f
Jerome Jiang [Mon, 7 Aug 2023 14:28:46 +0000 (10:28 -0400)]
Fix more clang-tidy warnings
- Include vpx/vpx_ext_ratectrl.h in vp9_ext_ratectrl.c
- Include vpx/internal/vpx_codec_internal.h
- Include <stddef.h> for NULL
Bug: b/294049605
Change-Id: Iedd8b3864da27fde1678bfa6606e6fc5630a7a09
Jerome Jiang [Fri, 4 Aug 2023 20:12:29 +0000 (16:12 -0400)]
Fix some clang-tidy warnings
- Use zero initializer instead of memset to avoid including <cstring>
- Include vpx_codec.h for vpx_codec_err_t and error codes
- Include vpx_tpl.h for VpxTplGopStats
Change-Id: Iac5837ce2173bd945bfe8eeb401ff4dfd04fd2e1
Jerome Jiang [Fri, 4 Aug 2023 18:18:33 +0000 (14:18 -0400)]
Fix include path for vpx_tpl.h, vpx_ext_ratectrl.h
Bug: b/294049605
Change-Id: I6422fc4250c2192f985cce2e296a19a05934969b
Jerome Jiang [Thu, 3 Aug 2023 19:28:44 +0000 (19:28 +0000)]
Merge "vp9_quantize_fp_neon: Same params name as in decl" into main
Jerome Jiang [Thu, 3 Aug 2023 18:33:32 +0000 (18:33 +0000)]
Merge "vp9 ext rc: Add callback for tpl stats" into main
Jerome Jiang [Thu, 3 Aug 2023 18:07:55 +0000 (14:07 -0400)]
vp9_quantize_fp_neon: Same params name as in decl
Clear some clang-tidy warnings
Change-Id: Iea4c4e77b3d515ec6384bd34875a0002ab13c14c
Jerome Jiang [Tue, 1 Aug 2023 15:00:20 +0000 (11:00 -0400)]
vp9 ext rc: Add callback for tpl stats
Added test
Bug: b/294049605
Change-Id: I3967a0f915e1a6e7a0d34d04732c33e1ca6f35e7
Anupam Pandey [Tue, 13 Jun 2023 10:32:58 +0000 (16:02 +0530)]
Add test to check bit exactness of C and SIMD in VP9 encoder
This CL adds a shell script to test bit exactness of the C and SIMD
VP9 encoders on the x86 platform.
As C vs NEON encoding outputs are not bit-exact (BUG=webm:1809),
ARM tests are currently disabled.
BUG=webm:1800
Change-Id: Iffcc70863e8cf83ccb5bc5be73e8866165697358
Yunqing Wang [Wed, 2 Aug 2023 03:58:18 +0000 (20:58 -0700)]
Add a 10-bit test file
Added a 10-bit test file for the VP9 end-to-end C vs SIMD
bit-exactness test.
BUG=webm:1800
Change-Id: I4a864f1a740abee27049d68231adf2ec308f9a96
Johann [Fri, 28 Jul 2023 20:44:56 +0000 (05:44 +0900)]
normalize *const in rtcd
Change-Id: Iece50143b43263c0c8f90299bedd7d2a5b9aa56b
Johann [Fri, 28 Jul 2023 11:21:31 +0000 (20:21 +0900)]
remove incorrect (void)
n_coeffs is used in this function
Change-Id: I5f5d2933304bb636a33e0fa294b4526edb65a08d
Johann [Fri, 28 Jul 2023 10:37:48 +0000 (19:37 +0900)]
quantize_fp: reduce parameters
apply similar steps as to the other quantize functions to switch to
macroblock_plane and ScanOrder
Change-Id: I486d653326aaf52ffd3beafd2e891ba6a5d57ef3
Johann [Mon, 14 Nov 2022 07:47:33 +0000 (16:47 +0900)]
quantize: reduce parameters
Pass macroblock_plane and ScanOrder instead of looking up the values
beforehand. Avoids pushing arguments to the stack.
Change-Id: I22df6f645eb1a1d89ba5a4d9bc58acb77af51aa9
James Zern [Thu, 27 Jul 2023 02:03:04 +0000 (19:03 -0700)]
resize_test: prefer 'override' to 'virtual'
Update functions in WRITE_COMPRESSED_STREAM blocks, which are disabled
by default. This caused them to be missed in:
84e6b7ab0 test/*.cc: prefer 'override' to 'virtual'
Change-Id: I0e462263f19c15eb0a30d0c0f4e145062f789489
James Zern [Wed, 26 Jul 2023 22:29:40 +0000 (15:29 -0700)]
test/*.h: prefer 'override' to 'virtual'
created with clang-tidy --fix --checks=-*,modernize-use-override
Change-Id: I53412f35590799574edb573ae417a4a004cccd1e
James Zern [Wed, 26 Jul 2023 22:38:36 +0000 (15:38 -0700)]
encode_test_driver.h: use bool literal
Change-Id: If47be9ca0daa18d92cb849484f9e139e65e3560e
James Zern [Tue, 25 Jul 2023 19:18:03 +0000 (12:18 -0700)]
test/**.cc: use bool literals
created with clang-tidy --fix --checks=-*,modernize-use-bool-literals
Change-Id: Ifaed8ca824676555acaf1053b2a5a52c51a70638
James Zern [Tue, 25 Jul 2023 19:10:21 +0000 (12:10 -0700)]
test/decode_perf_test.cc: use nullptr
created with clang-tidy --fix --checks=-*,modernize-use-nullptr
Change-Id: Ibf4a80fa00e9b59d471c92788ec4c7c72e4662e5
James Zern [Tue, 25 Jul 2023 19:04:57 +0000 (12:04 -0700)]
test/*.cc: use '= default'
created with clang-tidy --fix --checks=-*,modernize-use-equals-default
Change-Id: Ie373fb5501491fce53479d20f3a6d908c4b7c535
James Zern [Tue, 25 Jul 2023 18:27:34 +0000 (18:27 +0000)]
Merge changes I71e1b442,Ibbfb949b into main
* changes:
test/*.cc: prefer 'override' to 'virtual'
test,AbstractBench: fix -Wnon-virtual-dtor
James Zern [Tue, 25 Jul 2023 00:23:23 +0000 (17:23 -0700)]
test/*.cc: prefer 'override' to 'virtual'
created with clang-tidy --fix --checks=-*,modernize-use-override
Change-Id: I71e1b4423c143b3e47fe90929ee110b307cdb565
James Zern [Sat, 8 Jul 2023 02:14:59 +0000 (19:14 -0700)]
test,AbstractBench: fix -Wnon-virtual-dtor
In file included from ../test/bench.cc:14:
../test/bench.h:17:7: warning: 'AbstractBench' has virtual functions but
non-virtual destructor [-Wnon-virtual-dtor]
class AbstractBench {
Change-Id: Ibbfb949b63c8dff936c7ed4f2d056dea0343377b
Jerome Jiang [Mon, 24 Jul 2023 22:04:58 +0000 (18:04 -0400)]
Add new_mv_count to ext rate control interface
Bug: b/290385227
Change-Id: Ia87c4bf1e9315bf1134c998f88e9d5548c497777
Jerome Jiang [Mon, 24 Jul 2023 17:08:05 +0000 (13:08 -0400)]
cleanup: _pt -> _ptr in vp9 external RC interface
Change-Id: Ic483488f8f6273e8977cfc324466bda41f1e47a7
James Zern [Thu, 13 Jul 2023 16:49:30 +0000 (09:49 -0700)]
vp9_rdopt,handle_inter_mode: fix -Wmaybe-uninitialized warning
With gcc 13.1.1
In function ‘handle_inter_mode’,
inlined from ‘vp9_rd_pick_inter_mode_sb’ at
../vp9/encoder/vp9_rdopt.c:3872:17:
../vp9/encoder/vp9_rdopt.c:3142:8: warning: ‘tmp_rd’ may be used
uninitialized [-Wmaybe-uninitialized]
3142 | rd = tmp_rd + RDCOST(x->rdmult, x->rddiv, rs, 0);
../vp9/encoder/vp9_rdopt.c: In function ‘vp9_rd_pick_inter_mode_sb’:
../vp9/encoder/vp9_rdopt.c:2846:15: note: ‘tmp_rd’ was declared here
2846 | int64_t rd, tmp_rd, best_rd = INT64_MAX;
Change-Id: I8608957cc8bbeb1ae525f3c3dad6fe9785b2a9b4
James Zern [Tue, 11 Jul 2023 00:55:30 +0000 (00:55 +0000)]
Merge "vp8: remove missing prototypes from the rtcd header" into main
L. E. Segovia [Sat, 8 Jul 2023 23:30:49 +0000 (20:30 -0300)]
vp8: remove missing prototypes from the rtcd header
These were removed in If7a49e920e12f7fca0541190b87e6dae510df05c but
the leftovers can cause a build to fail if the code isn't optimized out.
I just found this out in the Meson port of libvpx for GStreamer.
BUG=webm:1584
Change-Id: I1c953720a2cbec3796200d4ec4020dca0b672bfb
James Zern [Mon, 10 Jul 2023 17:06:13 +0000 (10:06 -0700)]
vpx_free_tpl_gop_stats: normalize param name
this fixes a clang-tidy warning
Change-Id: I13f4750c15b7d6a395494c8dbcb896bde125b3c4
James Zern [Thu, 6 Jul 2023 17:10:37 +0000 (17:10 +0000)]
Merge "delete some dead code" into main
James Zern [Thu, 29 Jun 2023 16:52:26 +0000 (09:52 -0700)]
mfqe_partition: fix -Wunreachable-code
vp9/common/vp9_mfqe.c|240 col 16| warning: code will never be executed
[-Wunreachable-code]
BLOCK_SIZE mfqe_bs, bs_tmp;
^~~~~~~
Change-Id: I566b20d8c294e19bc4b90b57b730f933048e71a5
Wan-Teh Chang [Wed, 28 Jun 2023 23:09:36 +0000 (16:09 -0700)]
Fix a bug in vpx_highbd_hadamard_32x32_neon().
This CL is the highbd version of
https://chromium-review.googlesource.com/c/webm/libvpx/+/4646573.
The bug is caused by the incorrect assumption that
(a / 2) + (b / 2) == (a + b) / 2 and (a / 2) - (b / 2) == (a - b) / 2.
Also fix the Rand() inputs to Hadamard functions in unit tests.
This CL ports the following libaom CLs to libvpx:
https://aomedia-review.googlesource.com/c/aom/+/177101
https://aomedia-review.googlesource.com/c/aom/+/177241
Change-Id: Ic20e7684eab5d6507417fa2b75e572064d37ad2c
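The incorrect assumption named in the commit above can be checked with plain integer arithmetic. The following is a minimal sketch, not the libvpx code: the helper names are hypothetical, and C's truncating integer division stands in for the halving steps. Each early halving drops its own remainder, so halving before the add or subtract can lose a unit that halving afterwards keeps.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not part of libvpx. */

/* Halve each operand first, then add: two remainders can be dropped. */
static int halve_then_add(int a, int b) { return (a / 2) + (b / 2); }

/* Add first, then halve once: at most one remainder is dropped. */
static int add_then_halve(int a, int b) { return (a + b) / 2; }

/* The same comparison for subtraction. */
static int halve_then_sub(int a, int b) { return (a / 2) - (b / 2); }
static int sub_then_halve(int a, int b) { return (a - b) / 2; }

/* Returns 1 when both orderings agree for the given inputs, else 0. */
static int halving_orders_agree(int a, int b) {
  /* Keep inputs small so the sums cannot overflow an int. */
  assert(a > -100000 && a < 100000 && b > -100000 && b < 100000);
  return halve_then_add(a, b) == add_then_halve(a, b) &&
         halve_then_sub(a, b) == sub_then_halve(a, b);
}
```

With a = 1 and b = 1 the two additions already disagree, while inputs such as a = 4 and b = 2 happen to agree, which is why a test driven by unsuitable random inputs may not expose the problem.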
James Zern [Wed, 28 Jun 2023 19:26:32 +0000 (12:26 -0700)]
delete some dead code
follow-up to:
3ecba3980 Fix Clang -Wunreachable-code-aggressive warnings
Change-Id: I364312987bc838c69c010cce024bd3d62a918417
James Zern [Wed, 28 Jun 2023 19:21:40 +0000 (19:21 +0000)]
Merge "Fix Clang -Wunreachable-code-aggressive warnings" into main
James Zern [Sat, 3 Jun 2023 01:49:00 +0000 (18:49 -0700)]
Fix Clang -Wunreachable-code-aggressive warnings
Based on the change in libaom:
fe36011455 Fix Clang -Wunreachable-code-aggressive warnings
Clang's -Wunreachable-code-aggressive flag enables several warning flags
such as -Wunreachable-code-break and -Wunreachable-code-return. Chrome's
build system enables -Wunreachable-code-aggressive (in
build/config/compiler/BUILD.gn), so it would be good if libvpx could be
compiled without -Wunreachable-code-aggressive warnings.
This requires the VPX_NO_RETURN macro be defined correctly for all the
compilers we support, otherwise some compilers may warn about missing
return statements after a die() or fatal() call (which does not return).
Change-Id: I0c069133af45a7a61759538b6d74c681ea087dcd
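The role of a no-return annotation described above can be sketched as follows. This is an illustration under stated assumptions, not the libvpx definition of VPX_NO_RETURN: the macro EXAMPLE_NO_RETURN and the functions example_die() and parse_positive() are hypothetical names, and only the GCC/Clang and MSVC spellings of the attribute are covered.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macro for illustration; libvpx's VPX_NO_RETURN may differ. */
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_NO_RETURN __attribute__((noreturn))
#elif defined(_MSC_VER)
#define EXAMPLE_NO_RETURN __declspec(noreturn)
#else
#define EXAMPLE_NO_RETURN /* unknown compiler: no annotation available */
#endif

/* Declared as never returning, so callers need no code after the call. */
EXAMPLE_NO_RETURN void example_die(const char *msg);

void example_die(const char *msg) {
  fprintf(stderr, "%s\n", msg);
  exit(EXIT_FAILURE);
}

/* Returns value when it is positive; otherwise terminates the program. */
int parse_positive(int value) {
  if (value > 0) return value;
  example_die("value must be positive");
  /* No 'return' here: with the annotation the compiler knows this point is
   * unreachable. With the empty fallback definition a compiler may instead
   * warn about a missing return statement. */
}
```

A dummy return after the example_die() call would silence the missing-return warning, but it is exactly the kind of statement that -Wunreachable-code-return flags once the annotation is in place.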
Jerome Jiang [Wed, 28 Jun 2023 14:04:21 +0000 (10:04 -0400)]
vp9 firstpass stats in a separate header
Change-Id: If91c5c74c71affc48eb858beb314a6c194b14023
James Zern [Wed, 28 Jun 2023 00:02:47 +0000 (00:02 +0000)]
Merge changes I1c17302f,Ic084894b,I9867f5fc,Ie3faf7b3,If5dc96b7, ... into main
* changes:
vp8_decode: fix keyframe resync after decode error
vp8_decode: only remove threads on thread create failure
vp8_decode: clear stream info on decoder create failure
vp9_decodeframe,init_mt: free tile_workers on alloc failure
vp9_alloccommon: clear allocation sizes on free
vp9_dx_iface: fix leaks on init_decoder() failure
James Zern [Tue, 27 Jun 2023 02:25:56 +0000 (19:25 -0700)]
vp8_decode: fix keyframe resync after decode error
This fixes a crash if the application continues to call
vpx_codec_decode(). Previously a non-keyframe could cause a crash if the
decoder failed before fully initializing due to an allocation failure.
The stream info and frame resolution would be 0, skipping an allocation.
Found with vpx_dec_fuzzer_vp8 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: I1c17302f4d3a488ba3b4eefe0bf53853dc558bc1
James Zern [Tue, 27 Jun 2023 02:22:00 +0000 (19:22 -0700)]
vp8_decode: only remove threads on thread create failure
This fixes a crash if the application continues to call
vpx_codec_decode(). Previously the decoder instance would be freed,
causing a crash when attempting to access it with restart_threads=1.
Found with vpx_dec_fuzzer_vp8 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: Ic084894b776729bb1572f747082cef002f0832a8
James Zern [Tue, 27 Jun 2023 02:18:55 +0000 (19:18 -0700)]
vp8_decode: clear stream info on decoder create failure
This fixes a crash if the application continues to call
vpx_codec_decode().
Found with vpx_dec_fuzzer_vp8 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: I9867f5fc3d1163026f521a9609d3cbbc00568d1d
James Zern [Tue, 27 Jun 2023 02:09:24 +0000 (19:09 -0700)]
vp9_decodeframe,init_mt: free tile_workers on alloc failure
This avoids a crash if any of the thread allocations fail and the
application continues to call vpx_codec_decode(). Previously
num_tile_workers would be non-zero, but not equal to num_threads, which
would cause a crash during later thread management.
Found with vpx_dec_fuzzer_vp9 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: Ie3faf7b36764aebedac0924acb6e4cb7545aec7d
James Zern [Tue, 27 Jun 2023 02:06:51 +0000 (19:06 -0700)]
vp9_alloccommon: clear allocation sizes on free
This fixes reallocations (and avoids potential crashes) if any
allocation fails and the application continues to call
vpx_codec_decode().
Found with vpx_dec_fuzzer_vp9 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: If5dc96b73c02efc94ec84c25eb50d10ad6b645a6
James Zern [Sat, 24 Jun 2023 02:27:26 +0000 (19:27 -0700)]
vp9_dx_iface: fix leaks on init_decoder() failure
If any allocations fail in init_decoder() and the application continues
to call vpx_codec_decode() some of the allocations would be orphaned or
the decoder would be left in a partially initialized state.
Found with vpx_dec_fuzzer_vp9 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: I44f662526d715ecaeac6180070af40672cd42611
Wan-Teh Chang [Mon, 26 Jun 2023 21:57:53 +0000 (14:57 -0700)]
Fix a bug in vpx_hadamard_32x32_neon()
A right shift by 2 is equivalent to two halving operations if there is
no addition or subtraction between the two halving operations.
Note: Since vhaddq_s16() and vhsubq_s16() have 17-bit intermediate
precision, the Neon code doesn't need to go to int32_t as was done in
https://chromium-review.googlesource.com/c/webm/libvpx/+/4604169.
Change-Id: Ibe0691cde0fd3b94ee7c497845ba459d30d503b0
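Both points in the commit above can be illustrated with a scalar model. This is a sketch under assumptions, not the Neon code: hadd_s16() is a hypothetical single-lane stand-in for a halving add such as vhaddq_s16(), assumed here to form the sum in a wider type and round toward negative infinity, and the other helper names are likewise invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by 2, written without right-shifting a negative value. */
static int32_t floor_half(int32_t x) {
  return (x >= 0) ? x / 2 : -((-x + 1) / 2);
}

/* Hypothetical scalar model of one lane of a halving add: the sum is formed
 * in a wider type, so it cannot overflow, and the halved result always fits
 * back into int16_t. */
static int16_t hadd_s16(int16_t a, int16_t b) {
  return (int16_t)floor_half((int32_t)a + (int32_t)b);
}

/* Sum everything, then halve twice with nothing in between. This equals a
 * single flooring right shift by 2. */
static int32_t sum_then_shift2(int16_t a, int16_t b, int16_t c, int16_t d) {
  const int32_t sum = (int32_t)a + b + c + d;
  return floor_half(floor_half(sum));
}

/* Halve two partial sums, add them, and halve again: the addition between
 * the two halvings lets rounding losses accumulate. */
static int32_t halve_add_halve(int16_t a, int16_t b, int16_t c, int16_t d) {
  return hadd_s16(hadd_s16(a, b), hadd_s16(c, d));
}
```

For the inputs (1, 0, 3, 0) the single shift gives 1 but the interleaved form gives 0, while hadd_s16(32767, 32767) still returns 32767 because the intermediate sum never has to fit in 16 bits.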
James Zern [Tue, 20 Jun 2023 20:06:32 +0000 (20:06 +0000)]
Merge "configure.sh: Improve a comment." into main
Yunqing Wang [Tue, 20 Jun 2023 16:34:58 +0000 (16:34 +0000)]
Merge "Remove vp9_diamond_search_sad_avx function" into main
Anupam Pandey [Wed, 14 Jun 2023 04:57:49 +0000 (10:27 +0530)]
Remove vp9_diamond_search_sad_avx function
This CL removes the AVX version of the vp9_diamond_search_sad function
as no speedup is seen relative to C.
Change-Id: Ife6005d8e444ea2c8d07ac0f686c840344b9e0ea
Chen Wang [Fri, 16 Jun 2023 08:19:02 +0000 (16:19 +0800)]
configure.sh: Improve a comment.
The corresponding case block is not only for ARM.
The original comment text was confusing to readers.
Test: N/A, just comment text changes.
Change-Id: I3154d18d3b3d237c1eecfe07dc7ec237c98194cf
Signed-off-by: Chen Wang <wangchen20@iscas.ac.cn>
Jerome Jiang [Wed, 14 Jun 2023 20:20:30 +0000 (16:20 -0400)]
Add new_mv_count to firstpass stats
Mostly follows the logic of how it's calculated in libaom.
Bug: b/287283080
Change-Id: I9ee67d844ef9db7cca63339b5304459eaa28d324
Yunqing Wang [Mon, 12 Jun 2023 16:44:21 +0000 (16:44 +0000)]
Merge "Fix c vs intrinsic mismatch of vpx_hadamard_32x32() function" into main
Jerome Jiang [Fri, 9 Jun 2023 19:33:39 +0000 (15:33 -0400)]
RTC RC: clean up unnecessary headers
Change-Id: I77c407be59f4eb0c70a89a5fffd88c648e634123
Anupam Pandey [Tue, 6 Jun 2023 06:57:34 +0000 (12:27 +0530)]
Fix c vs intrinsic mismatch of vpx_hadamard_32x32() function
This CL resolves the mismatch between C and intrinsic implementation
of vpx_hadamard_32x32 function. The mismatch was due to integer
overflow during the addition operation in the intrinsic functions.
Specifically, the addition in the intrinsic function was performed
at the 16-bit level, while the calculation of a0 + a1 resulted in
a 17-bit value.
This code change addresses the problem by performing
the addition at the 32-bit level (with sign extension) in both SSE2
and AVX2, and then converting the results back to the 16-bit level
after a right shift.
STATS_CHANGED
Change-Id: I576ca64e3b9ebb31d143fcd2da64322790bc5853
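The overflow described in the commit above can be reproduced in scalar C. This is a minimal sketch, not the SSE2/AVX2 code: all function names are hypothetical, the 16-bit lane addition is modeled as wrapping modulo 2^16, and the shift amount is passed in as a parameter rather than taken from the source.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 16-bit pattern as a signed value without relying on
 * implementation-defined conversions. */
static int32_t as_signed16(uint16_t v) {
  return (v >= 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Flooring right shift that is portable for negative inputs. */
static int32_t floor_shift(int32_t x, int shift) {
  const int32_t step = (int32_t)1 << shift;
  return (x >= 0) ? x / step : -((-x + step - 1) / step);
}

/* Model of adding in a 16-bit lane: the 17-bit sum wraps before the shift. */
static int16_t add_shift_16bit(int16_t a0, int16_t a1, int shift) {
  const uint16_t wrapped = (uint16_t)((uint32_t)a0 + (uint32_t)a1);
  return (int16_t)floor_shift(as_signed16(wrapped), shift);
}

/* Model of the fix: sign-extend to 32 bits, add, shift, then narrow. */
static int16_t add_shift_32bit(int16_t a0, int16_t a1, int shift) {
  assert(shift >= 1 && shift < 16); /* after the shift the result fits */
  const int32_t sum = (int32_t)a0 + (int32_t)a1;
  return (int16_t)floor_shift(sum, shift);
}
```

With a0 = a1 = 30000 and a shift of 2, the widened path yields 15000 while the 16-bit path yields -1384; for small operands such as 100 and 200 the two paths agree, so the mismatch only appears when the sum needs the 17th bit.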
Jerome Jiang [Thu, 8 Jun 2023 14:52:45 +0000 (10:52 -0400)]
Replace NONE with NO_REF_FRAME
NONE is a common name and it has conflicts with symbols defined in
Chromium.
Bug: b/286163500
Change-Id: I3d935a786f771a4d90b258fabc6fd6c2ecbf1c59
Jerome Jiang [Thu, 8 Jun 2023 14:11:24 +0000 (14:11 +0000)]
Merge "Fix more typos (n/n)" into main
Jerome Jiang [Wed, 7 Jun 2023 21:10:04 +0000 (21:10 +0000)]
Merge "Fix more typos (3/n)" into main
Jerome Jiang [Wed, 7 Jun 2023 20:35:19 +0000 (16:35 -0400)]
Fix more typos (n/n)
impace -> impact
taget -> target
prediciton -> prediction
addtion -> addition
the the -> the
Bug: webm:1803
Change-Id: I759c9d930a037ca69662164fcd6be160ed707d77
Jerome Jiang [Wed, 7 Jun 2023 16:36:31 +0000 (12:36 -0400)]
Fix more typos (3/n)
Propogation -> Propagation
propogate -> propagate
cant -> can't
upto -> up to
canddiates -> candidates
refernce -> reference
USEAGE -> USAGE
Change-Id: Iadaf2dffd86b54e04411910f667e8c2dfc6c4c77
Jerome Jiang [Wed, 7 Jun 2023 19:10:43 +0000 (19:10 +0000)]
Merge "Fix more typos (2/n)" into main
Jerome Jiang [Wed, 7 Jun 2023 19:10:36 +0000 (19:10 +0000)]
Merge "Fix more typos (1/n)" into main
Jerome Jiang [Wed, 7 Jun 2023 18:19:08 +0000 (18:19 +0000)]
Merge "Fix a few typos" into main
Jerome Jiang [Wed, 7 Jun 2023 16:31:38 +0000 (12:31 -0400)]
Fix more typos (2/n)
kernal -> kernel
e.g -> e.g.
paritioning -> partitioning
partioning -> partitioning
coefficents -> coefficients
i.e, -> i.e.,
equivalend -> equivalent
recive -> receive
resoultions -> resolutions
Bug: webm:1803
Change-Id: I1d6176202ee5daee7a64bf59114e8b304aeb4db7
Jerome Jiang [Wed, 7 Jun 2023 16:26:55 +0000 (12:26 -0400)]
Fix more typos (1/n)
Dont -> Don't
setings -> settings
thresold -> thresh
thresold -> threshold
becasue -> because
itterations -> iterations
its a -> it's a
an constant -> a constant
Bug: webm:1803
Change-Id: I1e019393939ed25c59c898c88d4941ec360b026d
Jerome Jiang [Wed, 7 Jun 2023 16:21:38 +0000 (12:21 -0400)]
Fix a few typos
segement -> segment
dont -> don't
useage -> usage
devide -> divide
Bug: webm:1803
Change-Id: I0153380b0003825c4b62cf323d4f2bc837c8a264
Deepa K G [Tue, 6 Jun 2023 06:08:09 +0000 (11:38 +0530)]
Add comments in vp9_diamond_search_sad_avx()
Added comments related to re-arranging the
elements of the SAD vector to find the
minimum.
Change-Id: I58b702d304a6cdd32f04775fba603e39c19a8947
Deepa K G [Mon, 24 Apr 2023 10:26:18 +0000 (15:56 +0530)]
Fix c vs avx mismatch of diamond_search_sad()
In the function vp9_diamond_search_sad_avx(), the cost vector is
now arranged in a specific order. This ensures that the motion
vector with the least index is selected when more than one
candidate motion vector has the minimum cost, thus resolving the
C vs AVX mismatch.
STATS_CHANGED
Change-Id: I4f8864f464f9ea2aae6250db3d8ad91cb08b26e2
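The tie-breaking rule described in the commit above can be sketched in scalar C. This is an illustration only, not the AVX code: both function names are hypothetical, and the second function shows one assumed way a different reduction order could pick another candidate with the same minimum cost.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the minimum cost; on ties the lowest index wins
 * because a later candidate must be strictly smaller to replace the best. */
static size_t argmin_lowest_index(const unsigned int *cost, size_t n) {
  assert(cost != NULL && n > 0);
  size_t best = 0;
  for (size_t i = 1; i < n; ++i) {
    if (cost[i] < cost[best]) best = i;
  }
  return best;
}

/* Same minimum, different tie-break: an equal cost replaces the current
 * best, so the highest index wins. Both results are valid minima, but the
 * selected motion vector differs, which is enough to cause a mismatch. */
static size_t argmin_highest_index(const unsigned int *cost, size_t n) {
  assert(cost != NULL && n > 0);
  size_t best = 0;
  for (size_t i = 1; i < n; ++i) {
    if (cost[i] <= cost[best]) best = i;
  }
  return best;
}
```

For the costs { 5, 3, 7, 3 } the first function selects index 1 and the second selects index 3, even though both report the same minimum cost of 3.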
Jerome Jiang [Wed, 31 May 2023 19:31:04 +0000 (19:31 +0000)]
Merge "Trim tpl stats by 2 extra frames" into main
Jerome Jiang [Fri, 26 May 2023 16:02:36 +0000 (12:02 -0400)]
Trim tpl stats by 2 extra frames
Not applicable to the last GOP.
Bug: b/284162396
Change-Id: I55b7e04e9fc4b68a08ce3e00b10743823c828954
James Zern [Wed, 31 May 2023 17:44:00 +0000 (17:44 +0000)]
Merge changes I6a906803,I0307a3b6 into main
* changes:
Optimize Neon implementation of vpx_int_pro_row
Optimize Neon implementation of vpx_int_pro_col
Jonathan Wright [Tue, 30 May 2023 16:31:18 +0000 (17:31 +0100)]
Optimize Neon implementation of vpx_int_pro_row
Double the number of accumulator registers to remove the bottleneck.
Also peel the first loop iteration.
Change-Id: I6a90680369f9c33cdfe14ea547ac1569ec3f50de
Jonathan Wright [Tue, 30 May 2023 14:22:04 +0000 (15:22 +0100)]
Optimize Neon implementation of vpx_int_pro_col
Use widening pairwise addition instructions to halve the number of
additions required.
Change-Id: I0307a3b65e50d2b1ae582938bc5df9c2b21df734
James Zern [Thu, 25 May 2023 04:54:09 +0000 (04:54 +0000)]
Merge changes Ia3647698,I55caf34e,Id2c60f39 into main
* changes:
vpx_dsp_common.h,clip_pixel: work around VS2022 Arm64 issue
fdct_partial_neon.c: work around VS2022 Arm64 issue
fdct8x8_test.cc: work around VS2022 Arm64 issue
James Zern [Wed, 24 May 2023 17:43:20 +0000 (17:43 +0000)]
Merge "examples.mk,vpxdec: rm libwebm muxer dependency" into main
Jerome Jiang [Wed, 24 May 2023 16:27:20 +0000 (16:27 +0000)]
Merge "Add IO for TPL stats" into main
James Zern [Tue, 23 May 2023 22:50:10 +0000 (15:50 -0700)]
vpx_dsp_common.h,clip_pixel: work around VS2022 Arm64 issue
cl.exe targeting AArch64 with optimizations enabled
produces invalid code for clip_pixel() when the return type is uint8_t.
See:
https://developercommunity.visualstudio.com/t/Misoptimization-for-ARM64-in-VS-2022-17/10363361
Bug: b/277255076
Bug: webm:1788
Change-Id: Ia3647698effd34f1cf196cd33fa4a8cab9fa53d6
James Zern [Tue, 23 May 2023 22:49:29 +0000 (15:49 -0700)]
fdct_partial_neon.c: work around VS2022 Arm64 issue
cl.exe targeting AArch64 with optimizations enabled
will fail with an internal compiler error.
See:
https://developercommunity.visualstudio.com/t/Compiler-crash-C1001-when-building-a-for/10346110
Bug: b/277255076
Bug: webm:1788
Change-Id: I55caf34e910dab47a7775f07280677cdfe606f5b
James Zern [Tue, 23 May 2023 22:48:10 +0000 (15:48 -0700)]
fdct8x8_test.cc: work around VS2022 Arm64 issue
cl.exe targeting AArch64 with optimizations enabled
produces invalid code in RunExtremalCheck() and RunInvAccuracyCheck().
See:
https://developercommunity.visualstudio.com/t/1770-preview-1:-Misoptimization-for-AR/10369786
Bug: b/277255076
Bug: webm:1788
Change-Id: Id2c60f3948d8f788c78602aea1b5232133415dea
Jerome Jiang [Thu, 4 May 2023 14:48:25 +0000 (10:48 -0400)]
Add IO for TPL stats
Overload TempOutFile constructor to allow IO mode.
Bug: b/281563704
Change-Id: I1f4f5b29db0e331941b6795e478eeeab51f625ad
Jerome Jiang [Thu, 18 May 2023 17:20:03 +0000 (17:20 +0000)]
Merge "Add new vpx_tpl.h API file" into main
Yunqing Wang [Thu, 18 May 2023 15:48:49 +0000 (15:48 +0000)]
Merge "Improve convolve AVX2 intrinsic for speed" into main
Jerome Jiang [Tue, 16 May 2023 18:57:05 +0000 (14:57 -0400)]
Add new vpx_tpl.h API file
New file (vpx_tpl.c) in the following CLs will add new APIs dealing with
TPL stats from VP9 encoder.
Change-Id: I5102ef64214cba1ca6ecea9582a19049666c6ca4
Anupam Pandey [Fri, 12 May 2023 05:26:45 +0000 (10:56 +0530)]
Improve convolve AVX2 intrinsic for speed
This CL refactors the code related to the convolve function.
Furthermore, it improves the AVX2 intrinsics for vertical
convolve in the w = 4 case and horizontal convolve in the
w = 16 case.
Please note the module level scaling w.r.t C function
(timer based) for existing (AVX2) and new AVX2 intrinsics:
Block   Scaling
Size    AVX2         AVX2
        (existing)   (New)
4x4     5.34x        5.91x
4x8     7.10x        7.79x
16x8    23.52x       25.63x
16x16   29.47x       30.22x
16x32   33.42x       33.44x
This is a bit exact change.
Change-Id: If130183bc12faab9ca2bcec0ceeaa8d0af05e413
James Zern [Tue, 16 May 2023 00:05:05 +0000 (00:05 +0000)]
Merge changes Ie77ad184,Idfcac43c into main
* changes:
Add 2D-specific Neon horizontal convolution functions
Refactor standard bitdepth Neon convolution functions
Jonathan Wright [Thu, 4 May 2023 15:33:38 +0000 (16:33 +0100)]
Add 2D-specific Neon horizontal convolution functions
2D 8-tap convolution filtering is performed in two passes -
horizontal and vertical. The horizontal pass must produce enough
input data for the subsequent vertical pass - 3 rows above and 4 rows
below, in addition to the actual block height.
At present, all Neon horizontal convolution algorithms process 4 rows
at a time, but this means we end up doing at least 1 row too much
work in the 2D first pass case where we need h + 7, not h + 8 rows of
output.
This patch adds additional dot-product (SDOT and USDOT) Neon paths
that process h + 7 rows of data exactly, saving the work of the
unnecessary extra row. It is impractical to take a similar approach
for the Armv8.0 MLA paths since we have to transpose the data block
both before and after calling the convolution helper functions.
vpx_convolve_neon performance impact: we observe a speedup of ~9% for
smaller (and wider) blocks, and a speedup of 0-3% for larger blocks.
This is to be expected since the proportion of redundant work
decreases as the block height increases.
Change-Id: Ie77ad1848707d2d48bb8851345a469aae9d097e1
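The row counts in the commit above follow from simple arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and it assumes the horizontal pass rounds its work up to whole batches of 4 rows.

```c
#include <assert.h>

enum { kFilterTaps = 8, kRowsPerBatch = 4 };

/* Rows the vertical pass needs from the horizontal pass: the block height
 * plus 3 rows above and 4 rows below, i.e. h + (taps - 1). */
static int rows_needed(int h) {
  assert(h > 0);
  return h + kFilterTaps - 1;
}

/* Rows actually produced when the horizontal pass works in batches of 4. */
static int rows_processed_in_batches(int h) {
  const int needed = rows_needed(h);
  return (needed + kRowsPerBatch - 1) / kRowsPerBatch * kRowsPerBatch;
}

/* Rows computed but never used by the vertical pass. */
static int redundant_rows(int h) {
  return rows_processed_in_batches(h) - rows_needed(h);
}
```

For h = 4 the pass needs 11 rows but computes 12, so 1 row in 12 is wasted; for h = 64 it needs 71 and computes 72, so the wasted share is much smaller, consistent with the larger speedup reported for smaller blocks.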
James Zern [Fri, 12 May 2023 19:23:47 +0000 (19:23 +0000)]
Merge "Don't use -Wl,-z,defs with Clang's sanitizers" into main
James Zern [Mon, 8 May 2023 23:58:59 +0000 (16:58 -0700)]
Don't use -Wl,-z,defs with Clang's sanitizers
This avoids link errors related to the sanitizers:
https://clang.llvm.org/docs/AddressSanitizer.html#usage
"When linking shared libraries, the AddressSanitizer run-time is not
linked, so -Wl,-z,defs may cause link errors ..."
See also:
https://crbug.com/aomedia/3438
Bug: webm:1801
Fixed: webm:1801
Change-Id: Ie212318005a5f7222e5486775175534025306367
Jonathan Wright [Mon, 8 May 2023 16:41:26 +0000 (17:41 +0100)]
Refactor standard bitdepth Neon convolution functions
1) Use #define constant instead of magic numbers for right shifts.
2) Move saturating narrow into helper functions that return 4-element
result vectors.
3) Use mem_neon.h helpers for load/store sequences in Armv8.0 paths.
4) Tidy up: assert conditions and some longer variable names.
5) Prefer != 0 to > 0 where possible for loop termination conditions.
Change-Id: Idfcac43ca38faf729dca07b8cc8f7f45ad264d24
James Zern [Mon, 17 Apr 2023 20:42:11 +0000 (13:42 -0700)]
configure: add -Wshadow
libraries under third_party/ are out of scope for this change.
Bug: webm:1793
Change-Id: I562065a3c0ea9fdfc9615d1a6b1ae47da79b8ce0
James Zern [Tue, 9 May 2023 21:03:31 +0000 (21:03 +0000)]
Merge "vp8_macros_msa.h: clear -Wshadow warnings" into main
James Zern [Tue, 9 May 2023 20:55:55 +0000 (20:55 +0000)]
Merge changes Iac020280,I8ca8660a into main
* changes:
gen_msvs_vcxproj: add ARM64EC w/VS >= 2022
configure: add clang-cl vs1[67] arm64 targets
Yunqing Wang [Tue, 9 May 2023 15:57:09 +0000 (15:57 +0000)]
Merge "Add AVX2 intrinsic for vpx_comp_avg_pred() function" into main
Anupam Pandey [Mon, 8 May 2023 06:40:09 +0000 (12:10 +0530)]
Add AVX2 intrinsic for vpx_comp_avg_pred() function
The module level scaling w.r.t C function (timer based) for
existing (SSE2) and new AVX2 intrinsics:
If ref_padding = 0
Block   Scaling
size    SSE2    AVX2
8x4     3.24x   3.24x
8x8     4.22x   4.90x
8x16    5.91x   5.93x
16x8    1.63x   3.52x
16x16   1.53x   4.19x
16x32   1.38x   4.82x
32x16   1.28x   3.08x
32x32   1.45x   3.13x
32x64   1.38x   3.04x
64x32   1.39x   2.12x
64x64   1.46x   2.24x
If ref_padding = 8
Block   Scaling
size    SSE2    AVX2
8x4     3.20x   3.21x
8x8     4.61x   4.83x
8x16    5.50x   6.45x
16x8    1.56x   3.35x
16x16   1.53x   4.19x
16x32   1.37x   4.83x
32x16   1.28x   3.07x
32x32   1.46x   3.29x
32x64   1.38x   3.22x
64x32   1.38x   2.14x
64x64   1.38x   2.12x
This is a bit-exact change.
Change-Id: I72c5d155f64d0c630bc8c3aef21dc8bbd045d9e6
James Zern [Mon, 8 May 2023 18:48:15 +0000 (11:48 -0700)]
vp8_macros_msa.h: clear -Wshadow warnings
Bug: webm:1793
Change-Id: Ia940b06bd23a915a050432e03bb630567e891d8d
James Zern [Mon, 8 May 2023 20:52:52 +0000 (20:52 +0000)]
Merge "README: update target list" into main
James Zern [Mon, 8 May 2023 20:52:31 +0000 (20:52 +0000)]
Merge changes Ie165d410,I6d9bb8da,I6858e574 into main
* changes:
vp8_[cd]x_iface: clear setjmp flag on function exit
vp9_decodeframe,tile_worker_hook: relocate setjmp=1
vp9,encoder_set_config: set setjmp flag after setjmp()
Jerome Jiang [Mon, 8 May 2023 19:47:30 +0000 (19:47 +0000)]
Merge "Add VpxTplGopStats" into main
Jerome Jiang [Mon, 8 May 2023 19:47:21 +0000 (19:47 +0000)]
Merge "Unify implementation of CHECK_MEM_ERROR" into main
Jerome Jiang [Mon, 8 May 2023 19:46:44 +0000 (19:46 +0000)]
Merge "CHECK_MEM_ERROR to return in vp9_set_roi_map" into main
James Zern [Sat, 6 May 2023 02:00:08 +0000 (19:00 -0700)]
gen_msvs_vcxproj: add ARM64EC w/VS >= 2022
rather than define new targets, add a platform to the arm64 list as they
share the same configuration.
Bug: webm:1788
Change-Id: Iac020280b1103fb12b559f21439aeff26568fba4
James Zern [Fri, 5 May 2023 23:56:51 +0000 (16:56 -0700)]
configure: add clang-cl vs1[67] arm64 targets
x86 and armv7 are skipped for now as the intrinsics will need different
flags than cl.exe (/arch:... -> -m...).
Bug: webm:1788
Change-Id: I8ca8660a8644cdd84c51cb1f75005e371ba8207d
Jerome Jiang [Thu, 4 May 2023 18:28:29 +0000 (14:28 -0400)]
Add VpxTplGopStats
Contains the size of GOP - also the size of the list of TPL stats for
each frame in this GOP.
VpxTplGopStats will be the unit for VP9E_GET_TPL_STATS control to return
TPL stats from the encoder.
Bug: b/273736974
Change-Id: I1682242fc6db4aafcd6314af023aa0d704976585