platform/upstream/libvpx.git
Salome Thirot [Tue, 7 Feb 2023 14:08:33 +0000 (14:08 +0000)]
Optimize Neon high bitdepth subpel variance functions

Use the same general code style as in the standard bitdepth Neon
implementation. Additionally, do not unnecessarily widen to 32-bit data
types when doing bilinear filtering - allowing us to process twice as
many elements per instruction.
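
As a rough check of why 16-bit lanes are enough, here is a scalar sketch. It assumes the usual 2-tap bilinear taps (128 - 16k, 16k) for sub-pixel offset k in [0, 7] and 12-bit input. The function names are hypothetical, and this is not the Neon code from the commit.

```c
#include <stdint.h>

/* Reference form: 7-bit taps (128 - 16 * k, 16 * k) for sub-pel offset
   k in [0, 7], computed in 32-bit arithmetic. */
static uint16_t bilinear_ref(uint16_t a, uint16_t b, int k) {
  const uint32_t f0 = (uint32_t)(128 - 16 * k);
  const uint32_t f1 = (uint32_t)(16 * k);
  return (uint16_t)((a * f0 + b * f1 + 64) >> 7);
}

/* Reduced form: every tap is a multiple of 16, so dividing the taps, the
   rounding constant and the divisor by 16 gives taps (8 - k, k), rounding
   constant 4 and a shift of 3, with exactly the same result. For 12-bit
   input the largest intermediate is 4095 * 8 + 4 = 32764, which fits in a
   16-bit lane. */
static uint16_t bilinear_u16(uint16_t a, uint16_t b, int k) {
  const uint16_t f0 = (uint16_t)(8 - k);
  const uint16_t f1 = (uint16_t)k;
  const uint16_t sum = (uint16_t)(a * f0 + b * f1 + 4);
  return (uint16_t)(sum >> 3);
}
```

Because both forms agree exactly, a vector implementation can keep the whole filter in 16-bit lanes: eight per 128-bit register instead of four 32-bit lanes.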

Change-Id: I1e178991d2aa71f5f77a376e145d19257481e90f

Chi Yo Tsai [Fri, 10 Feb 2023 22:13:50 +0000 (22:13 +0000)]
Merge "Remove CONFIG_CONSISTENT_RECODE flag" into main

chiyotsai [Wed, 8 Feb 2023 21:54:46 +0000 (13:54 -0800)]
Remove CONFIG_CONSISTENT_RECODE flag

Currently, libvpx does not properly clear and re-initialize its memory
when it re-encodes a frame. As a result, out-of-date values are used in
the encoding process, and re-encoding a frame with the same parameters
will give different outputs.

This commit enables the code under CONFIG_CONSISTENT_RECODE to correct
this behavior. This change has a minor effect on coding performance,
but it ensures that valid values are used in the encoding process.

Furthermore, the flag is removed as it is now always turned on.

Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR |  SSIM   | ENC_T |
|---------|---------|----------|----------|---------|-------|
|    0    | hdres2  | -0.012%  | -0.021%  | -0.030% | +0.1% |
|    0    | lowres2 | +0.029%  | +0.019%  | +0.047% | +0.1% |
|    0    | midres2 | -0.004%  | +0.009%  | +0.026% | +0.1% |
|---------|---------|----------|----------|---------|-------|
|    1    | hdres2  | +0.032%  | +0.032%  | -0.000% | -0.0% |
|    1    | lowres2 | -0.005%  | -0.011%  | -0.014% | +0.0% |
|    1    | midres2 | +0.004%  | +0.020%  | +0.027% | +0.2% |
|---------|---------|----------|----------|---------|-------|
|    2    | hdres2  | +0.048%  | +0.056%  | +0.057% | +0.1% |
|    2    | lowres2 | +0.007%  | +0.002%  | -0.016% | -0.0% |
|    2    | midres2 | -0.015%  | -0.008%  | -0.002% | +0.1% |
|---------|---------|----------|----------|---------|-------|
|    3    | hdres2  | +0.010%  | +0.014%  | +0.004% | -0.0% |
|    3    | lowres2 | +0.000%  | -0.021%  | -0.001% | +0.0% |
|    3    | midres2 | +0.007%  | -0.038%  | +0.012% | -0.2% |
|---------|---------|----------|----------|---------|-------|
|    4    | hdres2  | +0.107%  | +0.136%  | +0.124% | -0.0% |
|    4    | lowres2 | -0.012%  | -0.024%  | -0.020% | -0.0% |
|    4    | midres2 | +0.055%  | -0.004%  | +0.048% | -0.1% |
|---------|---------|----------|----------|---------|-------|
|    5    | hdres2  | +0.026%  | +0.027%  | +0.020% | -0.0% |
|    5    | lowres2 | +0.009%  | -0.008%  | +0.028% | +0.1% |
|    5    | midres2 | -0.025%  | +0.021%  | -0.020% | -0.1% |

STATS_CHANGED

Change-Id: I3967aee8c8e4d0608a492e07f99ab8de9744ba57

James Zern [Fri, 10 Feb 2023 03:35:22 +0000 (03:35 +0000)]
Merge "Optimize Neon high bitdepth convolve copy" into main

Jerome Jiang [Thu, 9 Feb 2023 22:07:28 +0000 (22:07 +0000)]
Merge "Merge tag 'v1.13.0'" into main

Jerome Jiang [Thu, 9 Feb 2023 21:27:59 +0000 (21:27 +0000)]
Merge "Remove onyx_int.h from vp8 rc header" into main

Jerome Jiang [Tue, 7 Feb 2023 22:22:12 +0000 (17:22 -0500)]
Remove onyx_int.h from vp8 rc header

Also move the FRAME_TYPE declaration to common.h

Bug: webm:1766

Change-Id: Ic3016bd16548a5d2e0ae828a7fd7ad8adda8b8f6

Jerome Jiang [Thu, 9 Feb 2023 19:37:33 +0000 (14:37 -0500)]
Merge tag 'v1.13.0'

Release v1.13.0 Ugly Duckling

2023-01-31 v1.13.0 "Ugly Duckling"

  This release includes more Neon and AVX2 optimizations, adds a new codec
  control to set per frame QP, upgrades GoogleTest to v1.12.1, and includes
  numerous bug fixes.

- Upgrading:
    This release is ABI incompatible with the previous release.

    New codec control VP9E_SET_QUANTIZER_ONE_PASS to set per frame QP.

    GoogleTest is upgraded to v1.12.1.

    .clang-format is upgraded to clang-format-11.

    VPX_EXT_RATECTRL_ABI_VERSION was bumped due to incompatible changes to the
    feature of using external rate control models for vp9.

- Enhancement:
    Numerous improvements on Neon optimizations.
    Numerous improvements on AVX2 optimizations.
    Additional ARM targets added for Visual Studio.

- Bug fixes:
    Fix to calculating internal stats when frame dropped.
    Fix to segfault for external resize test in vp9.
    Fix to build system with replacing egrep with grep -E.
    Fix to a few bugs with external RTC rate control library.
    Fix to make SVC work with VBR.
    Fix to key frame setting in VP9 external RC.
    Fix to -Wimplicit-int (Clang 16).
    Fix to VP8 external RC for buffer levels.
    Fix to VP8 external RC for dynamic update of layers.
    Fix to VP9 auto level.
    Fix to off-by-one error of max w/h in validate_config.
    Fix to make SVC work for Profile 1.

Bug: webm:1780

Change-Id: I371fc1444ead56f8d7fc510e05582b6415c3ddb1

Jonathan Wright [Thu, 9 Feb 2023 11:57:10 +0000 (11:57 +0000)]
Optimize Neon high bitdepth convolve copy

Use standard loads and stores instead of the significantly slower
interleaving/de-interleaving variants. Also move all loads in loop
bodies above all stores as a mitigation against the compiler thinking
that the src and dst pointers alias (since we can't use restrict in
C89.)
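
A scalar sketch of the "all loads before all stores" ordering, assuming an 8-wide block of 16-bit samples and an even height; highbd_copy_w8 is a hypothetical name. In a Neon version, the memcpy calls would correspond to plain vector loads and stores such as vld1q_u16 and vst1q_u16.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar analogue of an 8-wide high-bitdepth copy.
   Assumes h is even. Each iteration loads two rows into locals before
   storing either of them, so a possible src/dst alias cannot force the
   second load to wait for the first store. */
static void highbd_copy_w8(const uint16_t *src, ptrdiff_t src_stride,
                           uint16_t *dst, ptrdiff_t dst_stride, int h) {
  int i;
  for (i = 0; i < h; i += 2) {
    uint16_t row0[8], row1[8];
    /* All loads first... */
    memcpy(row0, src, sizeof(row0));
    memcpy(row1, src + src_stride, sizeof(row1));
    /* ...then all stores. */
    memcpy(dst, row0, sizeof(row0));
    memcpy(dst + dst_stride, row1, sizeof(row1));
    src += 2 * src_stride;
    dst += 2 * dst_stride;
  }
}
```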

Change-Id: Idd59dca51387f553f8db27144a2b8f2377c937d3

Chi Yo Tsai [Wed, 8 Feb 2023 23:16:48 +0000 (23:16 +0000)]
Merge "Copy BLOCK_8X8's mi to PICK_MODE_CONTEXT::mic" into main

chiyotsai [Wed, 8 Feb 2023 22:01:19 +0000 (14:01 -0800)]
Copy BLOCK_8X8's mi to PICK_MODE_CONTEXT::mic

STATS_CHANGED

BUG=webm:1789

Change-Id: I74efe28bdf90a179c59fe3d1f5a15d497f57080d

Salome Thirot [Wed, 8 Feb 2023 17:05:25 +0000 (17:05 +0000)]
Add missing high bitdepth Neon subpel variance tests

Add missing 4x4 and 4x8 tests for both high bitdepth sub-pixel variance
and high bitdepth averaging sub-pixel variance.

Change-Id: I042752c5b7ccc14f58075694d0bb1d36f144ad06

Cheng Chen [Mon, 30 Jan 2023 19:51:58 +0000 (11:51 -0800)]
Fix unsigned integer overflow in sse computation

Basically port the fix from libaom:
https://aomedia-review.googlesource.com/c/aom/+/169361
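
For context, here is a generic illustration of this class of bug. It is not the libaom patch itself, which is not reproduced here. With 12-bit input a single squared difference can reach 4095 * 4095 = 16769025, so the sum over a 64x64 block can exceed UINT32_MAX. The minimal sketch below keeps the running sum in 64 bits; highbd_sse is a hypothetical name.

```c
#include <stdint.h>

/* Sum of squared errors over a w x h block of 16-bit samples. The total
   is accumulated in 64 bits because, for 12-bit input, a 64x64 block can
   reach 4096 * 16769025 = 68685926400, well beyond UINT32_MAX. */
static uint64_t highbd_sse(const uint16_t *a, int a_stride,
                           const uint16_t *b, int b_stride, int w, int h) {
  uint64_t sse = 0;
  int r, c;
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      const int64_t d = (int64_t)a[r * a_stride + c] - b[r * b_stride + c];
      sse += (uint64_t)(d * d);
    }
  }
  return sse;
}
```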

Change-Id: Id06a5db91372037832399200ded75d514e096726
(cherry picked from commit a94cdd57ffd95ee7beb48d2794dae538f25da46c)

Chi Yo Tsai [Wed, 8 Feb 2023 00:44:46 +0000 (00:44 +0000)]
Merge "Enable some speed features on speed 0" into main

chiyotsai [Tue, 7 Feb 2023 19:11:35 +0000 (11:11 -0800)]
Enable some speed features on speed 0

Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR |  SSIM   | ENC_T |
|---------|---------|----------|----------|---------|-------|
|    0    | hdres2  | +0.069%  | +0.067%  | +0.100% | -8.6% |
|    0    | midres2 | +0.116%  | +0.103%  | +0.062% | -9.6% |
|    0    | lowres2 | +0.276%  | +0.283%  | +0.214% |-11.9% |

STATS_CHANGED

Change-Id: I8b26c0be2312fcd0f8c9e889367682e80ea8de4b

Salome Thirot [Tue, 7 Feb 2023 11:28:15 +0000 (11:28 +0000)]
Use 4D reduction Neon helper for standard bitdepth SAD4D

Move the 4D reduction helper function to sum_neon.h and use this for
both standard and high bitdepth SAD4D paths. This also removes the
AArch64 requirement for using the UDOT Neon SAD4D paths.

Change-Id: I207f76b3d42aa541809b0672c3b3d86e54d133ff

Yunqing Wang [Tue, 7 Feb 2023 04:22:40 +0000 (04:22 +0000)]
Merge "Move TPL to a new file" into main

James Zern [Tue, 7 Feb 2023 01:32:03 +0000 (01:32 +0000)]
Merge changes Ica45c44f,I75c5f099,I9e626d7f into main

* changes:
  Optimize Neon implementation of high bitdepth SAD4D functions
  Optimize Neon implementation of high bitdepth avg SAD functions
  Optimize Neon implementation of high bitdepth SAD functions

Yunqing Wang [Mon, 6 Feb 2023 22:48:34 +0000 (14:48 -0800)]
Move TPL to a new file

This is a refactoring CL.

Change-Id: Ic8c1575601d27f14ecd1b1bf0a038e447eaae458

Jerome Jiang [Mon, 6 Feb 2023 22:16:41 +0000 (22:16 +0000)]
Merge "Remove duplicated VPX_SCALING declaration" into main

Salome Thirot [Thu, 2 Feb 2023 16:06:38 +0000 (16:06 +0000)]
Optimize Neon implementation of high bitdepth SAD4D functions

Optimizations take a similar form to those implemented for Armv8.0
standard bitdepth SAD4D:

- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
  modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
  resources on Arm CPUs that have four Neon pipes.
- Compute the four SAD sums in parallel so that we only load the source
  block once - instead of four times.
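
The last point can be sketched in scalar C. Each source sample is read once and compared against all four references, and the four running sums are reduced at the end. The name and signature are hypothetical, loosely modeled on a SAD4D-style interface that takes an array of four reference pointers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar sketch of a four-way SAD: each source sample is loaded once and
   compared against the co-located sample in all four reference blocks. */
static void highbd_sad4d(const uint16_t *src, int src_stride,
                         const uint16_t *const ref[4], int ref_stride,
                         int w, int h, uint32_t res[4]) {
  uint32_t sum[4] = { 0, 0, 0, 0 };
  int r, c, i;
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      const int s = src[r * src_stride + c]; /* single source load */
      for (i = 0; i < 4; ++i) {
        sum[i] += (uint32_t)abs(s - ref[i][r * ref_stride + c]);
      }
    }
  }
  /* Final reduction: one result per reference block. */
  for (i = 0; i < 4; ++i) res[i] = sum[i];
}
```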

Change-Id: Ica45c44fd167e5fcc83871d8c138fc72ed3a9723

Jerome Jiang [Mon, 6 Feb 2023 18:29:58 +0000 (13:29 -0500)]
Remove duplicated VPX_SCALING declaration

Use VPX_SCALING_MODE instead

Change-Id: Iab9d29f20838703e00bd9f7641035d8ebd69af53

Salome Thirot [Fri, 3 Feb 2023 11:00:19 +0000 (11:00 +0000)]
Optimize Neon implementation of high bitdepth avg SAD functions

Optimizations take a similar form to those implemented for standard
bitdepth averaging SAD:

- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
  modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
  resources on Arm CPUs that have four Neon pipes.
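
For reference, here is a scalar sketch of what an averaging SAD computes. It assumes the second predictor is packed with a stride equal to the block width and that the average rounds up on ties; highbd_sad_avg is a hypothetical name. The Neon optimizations above change how the absolute differences are accumulated, not this result.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar sketch of an averaging SAD: the reference is first averaged with
   a second predictor (rounding up on ties), and the SAD is then taken
   against that average. second_pred is assumed to have stride w. */
static uint32_t highbd_sad_avg(const uint16_t *src, int src_stride,
                               const uint16_t *ref, int ref_stride,
                               const uint16_t *second_pred, int w, int h) {
  uint32_t sad = 0;
  int r, c;
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      const int avg =
          (ref[r * ref_stride + c] + second_pred[r * w + c] + 1) >> 1;
      sad += (uint32_t)abs(src[r * src_stride + c] - avg);
    }
  }
  return sad;
}
```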

Change-Id: I75c5f09948f6bf17200f82e00e7a827a80451108

Salome Thirot [Wed, 1 Feb 2023 16:37:24 +0000 (16:37 +0000)]
Optimize Neon implementation of high bitdepth SAD functions

Optimizations take a similar form to those implemented for standard
bitdepth SAD:

- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
  modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
  resources on Arm CPUs that have four Neon pipes.
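
A scalar model of the accumulation pattern listed above, for an 8-wide block. It computes per-lane absolute differences (the ABD step), adds adjacent lanes pairwise and widens them into accumulator lanes (the UADALP step), and uses two independent accumulators to stand in for "more accumulator registers". The function is hypothetical and only illustrates the data flow; scalar C gains nothing from the split accumulators.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar model of an 8-wide SAD loop in the ABD + UADALP style:
   - abd[] holds the per-lane absolute differences (ABD);
   - each accumulator lane adds a pair of adjacent abd[] lanes
     (UADALP: pairwise add, widen, accumulate);
   - even and odd rows use separate accumulators, standing in for extra
     registers that break the dependency chain. */
static uint32_t highbd_sad8xh(const uint16_t *src, int src_stride,
                              const uint16_t *ref, int ref_stride, int h) {
  uint32_t acc[2][4] = { { 0, 0, 0, 0 }, { 0, 0, 0, 0 } };
  uint32_t total = 0;
  int r, i;
  for (r = 0; r < h; ++r) {
    uint16_t abd[8];
    for (i = 0; i < 8; ++i) {
      abd[i] = (uint16_t)abs(src[r * src_stride + i] -
                             ref[r * ref_stride + i]);
    }
    for (i = 0; i < 4; ++i) {
      acc[r & 1][i] += (uint32_t)abd[2 * i] + abd[2 * i + 1];
    }
  }
  /* Final reduction of both accumulators. */
  for (i = 0; i < 4; ++i) total += acc[0][i] + acc[1][i];
  return total;
}
```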

Change-Id: I9e626d7fa0e271908dc43448405a7985b80e6230

Yunqing Wang [Fri, 3 Feb 2023 23:22:58 +0000 (23:22 +0000)]
Merge "Fix uninitialized mesh feature for BEST mode" into main

Wan-Teh Chang [Fri, 3 Feb 2023 22:07:09 +0000 (14:07 -0800)]
Set _img->bit_depth in y4m_input_fetch_frame()

This is a port of
https://aomedia-review.googlesource.com/c/aom/+/169961.

Change-Id: I2aa0d12cafde0c73448bf8c57eab0cd92e846468

Yunqing Wang [Fri, 3 Feb 2023 00:30:09 +0000 (16:30 -0800)]
Fix uninitialized mesh feature for BEST mode

In BEST encoding mode, the mesh search range wasn't initialized for
non-FC_GRAPHICS_ANIMATION content types, so speed 0's setting was
mistakenly used. Fixed it by adding the initialization.

There were two ways to fix this. Patchset 1 used speed 0's setting
for non-FC_GRAPHICS_ANIMATION content. This didn't change BEST mode's
encoding results much; only a couple of clips' results changed.

Borg result for BEST mode:
         avg_psnr:  ovr_psnr:  ssim:  encoding_spdup:
lowres2:  -0.004     -0.003   -0.000    0.030
midres2:  -0.006     -0.009   -0.012    0.033
hdres2:    0.002      0.002    0.004    0.015

Patchset 2 used BEST's setting for non-FC_GRAPHICS_ANIMATION content.
However, the BD-rate of the majority of test clips changed by up to
~0.5% (gain or loss), and overall it didn't perform better than
patchset 1. So, we chose patchset 1.

Change-Id: Ibbf578dad04420e6ba22cb9a3ddec137a7e4deef

James Zern [Wed, 1 Feb 2023 21:27:06 +0000 (13:27 -0800)]
vp9_diamond_search_sad_neon: use DECLARE_ALIGNED

rather than the GCC-specific __attribute__((aligned())); fixes the build
targeting ARM64 Windows.

Bug: webm:1788
Change-Id: I2210fc215f44d90c1ce9dee9b54888eb1b78c99e

Jerome Jiang [Wed, 1 Feb 2023 16:38:42 +0000 (11:38 -0500)]
Update AUTHORS .mailmap and version

Bug: webm:1780
Change-Id: I75a24bdd076dc1746b23bababfaafccbce3b4214

Jerome Jiang [Thu, 26 Jan 2023 00:25:12 +0000 (19:25 -0500)]
Fix per frame qp for temporal layers

Also add tests with fixed temporal layering mode.

Change-Id: If516fe94e3fb7f5a745821d1788bfe6cf90edaac
(cherry picked from commit db69ce6aea278bee88668fd9cc2af2e544516fdb)

Jerome Jiang [Tue, 31 Jan 2023 17:16:38 +0000 (12:16 -0500)]
Update CHANGELOG

Bug: webm:1780
Change-Id: I3ab4729bff1d27ef7127ef26e780a469e9278c21

James Zern [Tue, 31 Jan 2023 21:20:16 +0000 (21:20 +0000)]
Merge "Use load_unaligned mem_neon.h helpers in SAD and SAD4D" into main

Jonathan Wright [Tue, 31 Jan 2023 13:32:33 +0000 (13:32 +0000)]
Use load_unaligned mem_neon.h helpers in SAD and SAD4D

Use the load_unaligned helper functions in mem_neon.h to load strided
sequences of 4 bytes where alignment is not guaranteed in the Neon
SAD and SAD4D paths.
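
A portable scalar sketch of such a helper, assuming two 4-byte rows are gathered into one 8-byte buffer; load_unaligned_u8x2 is a hypothetical name, not the mem_neon.h API. Copying through memcpy avoids the undefined behaviour of dereferencing a misaligned uint32_t pointer, and compilers typically lower it to a single load.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Gather two 4-byte rows, stride bytes apart, into one 8-byte buffer.
   memcpy through a uint32_t is the portable way to express an unaligned
   32-bit load. */
static void load_unaligned_u8x2(const uint8_t *buf, ptrdiff_t stride,
                                uint8_t out[8]) {
  uint32_t a, b;
  memcpy(&a, buf, 4);
  memcpy(&b, buf + stride, 4);
  memcpy(out, &a, 4);
  memcpy(out + 4, &b, 4);
}
```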

Change-Id: I941d226ef94fd7a633b09fc92165a00ba68a1501

Cheng Chen [Mon, 30 Jan 2023 19:51:58 +0000 (11:51 -0800)]
Fix unsigned integer overflow in sse computation

Basically port the fix from libaom:
https://aomedia-review.googlesource.com/c/aom/+/169361

Change-Id: Id06a5db91372037832399200ded75d514e096726

James Zern [Mon, 30 Jan 2023 19:30:45 +0000 (19:30 +0000)]
Merge "Refactor 8x8 16-bit Neon transpose functions" into main

Salome Thirot [Fri, 27 Jan 2023 16:16:16 +0000 (16:16 +0000)]
Refactor Neon implementation of SAD4D functions

Refactor and optimize the Neon implementation of SAD4D functions -
effectively backporting these libaom changes[1,2].

[1] https://aomedia-review.googlesource.com/c/aom/+/162181
[2] https://aomedia-review.googlesource.com/c/aom/+/162183

Change-Id: Icb04bd841d86f2d0e2596aa7ba86b74f8d2d360b

Yunqing Wang [Sat, 28 Jan 2023 00:27:57 +0000 (00:27 +0000)]
Merge "Add encoder component timing information" into main

Yunqing Wang [Fri, 27 Jan 2023 01:20:54 +0000 (17:20 -0800)]
Add encoder component timing information

Change-Id: Iaa5b73a9593ecfd74b6426ed47d2b529ec7ae2b5

Gerda Zsejke More [Thu, 26 Jan 2023 15:12:55 +0000 (16:12 +0100)]
Refactor 8x8 16-bit Neon transpose functions

Refactor the Neon implementation of transpose_s16_8x8(q) and
transpose_u16_8x8 so that the final step compiles to 8 ZIP1/ZIP2
instructions as opposed to 8 EXT, MOV pairs. This change removes 8
instructions per call to transpose_s16_8x8(q), transpose_u16_8x8
where the result stays in registers for further processing - rather
than being stored to memory - like in vpx_hadamard_8x8_neon, for
example.
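
A scalar model of a three-stage 8x8 transpose whose last stage is one ZIP1 or ZIP2 (on 64-bit elements) per output row. The TRN-based first two stages and all helper names are assumptions made for illustration; the exact intrinsic sequence in libvpx may differ.

```c
#include <stdint.h>
#include <string.h>

/* TRN1 (hi = 0) / TRN2 (hi = 1) on an 8 x int16 vector pair, where e is
   the element size in 16-bit units (1: 16-bit, 2: 32-bit elements). */
static void trn_model(const int16_t *a, const int16_t *b, int e, int hi,
                      int16_t *out) {
  int p, k;
  for (p = 0; p < 8 / (2 * e); ++p) {
    for (k = 0; k < e; ++k) {
      out[2 * p * e + k] = a[(2 * p + hi) * e + k];
      out[(2 * p + 1) * e + k] = b[(2 * p + hi) * e + k];
    }
  }
}

/* ZIP1 (hi = 0) / ZIP2 (hi = 1) on 64-bit elements: concatenate the low
   or the high halves of a and b. */
static void zip64_model(const int16_t *a, const int16_t *b, int hi,
                        int16_t *out) {
  memcpy(out, a + 4 * hi, 4 * sizeof(*a));
  memcpy(out + 4, b + 4 * hi, 4 * sizeof(*b));
}

/* In-place 8x8 transpose: 16-bit TRN, then 32-bit TRN, then a final
   64-bit ZIP stage that writes each output row (an input column). */
static void transpose_8x8_model(int16_t m[8][8]) {
  int16_t b[8][8], c[8][8];
  int i, j, base;
  /* Stage 1: row pairs (0,1), (2,3), (4,5), (6,7). */
  for (i = 0; i < 8; i += 2) {
    trn_model(m[i], m[i + 1], 1, 0, b[i]);
    trn_model(m[i], m[i + 1], 1, 1, b[i + 1]);
  }
  /* Stage 2: pairs (b0,b2), (b1,b3), (b4,b6), (b5,b7). */
  for (base = 0; base < 8; base += 4) {
    for (j = 0; j < 2; ++j) {
      trn_model(b[base + j], b[base + j + 2], 2, 0, c[base + j]);
      trn_model(b[base + j], b[base + j + 2], 2, 1, c[base + j + 2]);
    }
  }
  /* Stage 3: one ZIP1/ZIP2 per output row, 8 in total. */
  for (j = 0; j < 4; ++j) {
    zip64_model(c[j], c[j + 4], 0, m[j]);
    zip64_model(c[j], c[j + 4], 1, m[j + 4]);
  }
}
```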

This is a backport of this libaom patch[1].
[1] https://aomedia-review.googlesource.com/c/aom/+/169426

Change-Id: Icef3e51d40efeca7008e1c4fc701bf39bd319c88

Jerome Jiang [Thu, 26 Jan 2023 21:31:14 +0000 (21:31 +0000)]
Merge "Fix per frame qp for temporal layers" into main

Jerome Jiang [Thu, 26 Jan 2023 00:25:12 +0000 (19:25 -0500)]
Fix per frame qp for temporal layers

Also add tests with fixed temporal layering mode.

Change-Id: If516fe94e3fb7f5a745821d1788bfe6cf90edaac

James Zern [Thu, 26 Jan 2023 03:26:38 +0000 (03:26 +0000)]
Merge "Refactor Neon implementation of SAD functions" into main

James Zern [Thu, 26 Jan 2023 03:23:31 +0000 (03:23 +0000)]
Merge "[NEON] Add Highbd FHT 8x8/16x16 functions" into main

Salome Thirot [Tue, 24 Jan 2023 14:27:14 +0000 (14:27 +0000)]
Refactor Neon implementation of SAD functions

Refactor and optimize the Neon implementation of SAD functions -
effectively backporting these libaom changes[1,2,3].

[1] https://aomedia-review.googlesource.com/c/aom/+/161921
[2] https://aomedia-review.googlesource.com/c/aom/+/161923
[3] https://aomedia-review.googlesource.com/c/aom/+/166963

Change-Id: I2d72fd0f27d61a3e31a78acd33172e2afb044cb8

Konstantinos Margaritis [Tue, 24 Jan 2023 20:48:06 +0000 (20:48 +0000)]
[NEON] Add Highbd FHT 8x8/16x16 functions

In total this gives about 9% extra performance for both the rt and best
profiles. This change also adds a transpose_s32 16x16 function.

Change-Id: Ib6f368bbb9af7f03c9ce0deba1664cef77632fe2

Jerome Jiang [Tue, 24 Jan 2023 19:08:17 +0000 (14:08 -0500)]
Skip calculating internal stats when frame dropped

Bug: webm:1771
Change-Id: I30cd5b7ec0945b521a1cc03999d39ec6a25f1696

Salome Thirot [Fri, 20 Jan 2023 11:42:06 +0000 (11:42 +0000)]
Specialize Neon averaging subpel variance by filter value

Use the same specialization for averaging subpel variance functions
as used for the non-averaging variants. The rationale for the
specialization is as follows:

The optimal implementation of the bilinear interpolation depends on
the filter values being used. For both horizontal and vertical
interpolation this can simplify to just taking the source values, or
averaging the source and reference values - which can be computed
more easily than a bilinear interpolation with arbitrary filter
values.

This patch introduces checks to select the optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes.
This is a backport of this libaom change[1].

After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].

[1] https://aomedia-review.googlesource.com/c/aom/+/166962

Change-Id: I7860c852db94a7c9c3d72ae4411316685f3800a4

Salome Thirot [Fri, 20 Jan 2023 11:21:02 +0000 (11:21 +0000)]
Refactor Neon averaging subpel variance functions

Merge the computation of vpx_comp_avg_pred into the second pass of the
bilinear filter - avoiding the overhead of loading and storing the
entire block again.
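
A scalar sketch of the fused second pass. It assumes 8-bit samples, the 2-tap bilinear taps (128 - 16k, 16k), a packed second predictor with stride w, and a source with at least w + 1 columns and h + 1 rows; all names are hypothetical. The vertical filter output is averaged with second_pred while it is still in registers, so the filtered block is never written out and read back.

```c
#include <stdint.h>

/* One bilinear tap pair for sub-pel offset k in [0, 7]: (128 - 16k, 16k),
   rounded and shifted by 7 bits. */
static uint8_t bil(int a, int b, int k) {
  return (uint8_t)((a * (128 - 16 * k) + b * 16 * k + 64) >> 7);
}

/* Two-pass bilinear prediction of a w x h block (w, h <= 64). The source
   must provide w + 1 columns and h + 1 rows. The second (vertical) pass
   averages with second_pred before the single store. */
static void bil_pred_avg(const uint8_t *src, int src_stride, int w, int h,
                         int xoff, int yoff, const uint8_t *second_pred,
                         uint8_t *dst) {
  uint8_t tmp[(64 + 1) * 64];
  int r, c;
  /* First pass: horizontal filter over h + 1 rows. */
  for (r = 0; r < h + 1; ++r) {
    for (c = 0; c < w; ++c) {
      tmp[r * w + c] = bil(src[r * src_stride + c],
                           src[r * src_stride + c + 1], xoff);
    }
  }
  /* Second pass: vertical filter fused with the rounding average. */
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      const int v = bil(tmp[r * w + c], tmp[(r + 1) * w + c], yoff);
      dst[r * w + c] = (uint8_t)((v + second_pred[r * w + c] + 1) >> 1);
    }
  }
}
```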

This is a backport of this libaom change[1].

[1] https://aomedia-review.googlesource.com/c/aom/+/166961

Change-Id: I9327ff7382a46d50c42a5213a11379b957146372

Salome Thirot [Fri, 20 Jan 2023 10:35:34 +0000 (10:35 +0000)]
Specialize Neon subpel variance by filter value for large blocks

The optimal implementation of the bilinear interpolation depends on
the filter values being used. For both horizontal and vertical
interpolation this can simplify to just taking the source values, or
averaging the source and reference values - which can be computed
more easily than a bilinear interpolation with arbitrary filter
values.

This patch introduces checks to select the optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes
(>= 16x16) as we need to be doing enough work to make the cost of
finding the optimal implementation worth it.
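
The specialization can be sketched as a three-way branch on the sub-pixel offset k, assuming the 2-tap bilinear taps (128 - 16k, 16k). For k = 0 the taps are (128, 0), which is a plain copy. For k = 4 the taps are (64, 64), for which (64a + 64b + 64) >> 7 equals (a + b + 1) >> 1 exactly. filter_row is a hypothetical name; b is the neighbouring sample (right for a horizontal pass, below for a vertical pass).

```c
#include <stdint.h>

/* Hypothetical dispatch for one filter pass with sub-pel offset k in
   [0, 7], using the bilinear taps (128 - 16k, 16k):
   - k == 0: taps (128, 0), so the pass is a plain copy;
   - k == 4: taps (64, 64), so the pass is a rounding average;
   - otherwise: the general bilinear filter. */
static void filter_row(const uint8_t *a, const uint8_t *b, int w, int k,
                       uint8_t *out) {
  int i;
  if (k == 0) {
    for (i = 0; i < w; ++i) out[i] = a[i];
  } else if (k == 4) {
    for (i = 0; i < w; ++i) out[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
  } else {
    for (i = 0; i < w; ++i) {
      out[i] = (uint8_t)((a[i] * (128 - 16 * k) + b[i] * 16 * k + 64) >> 7);
    }
  }
}
```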

This is a backport of this libaom change[1].

After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].

[1] https://aomedia-review.googlesource.com/c/aom/+/162463

Change-Id: Ia818e148f6fd126656e8411d59c184b55dd43094

Salome Thirot [Thu, 19 Jan 2023 18:02:52 +0000 (18:02 +0000)]
Refactor Neon subpel variance functions

Refactor the Neon implementation of the sub-pixel variance bilinear
filter helper functions - effectively backporting this libaom patch[1].

[1] https://aomedia-review.googlesource.com/c/aom/+/162462

Change-Id: I3dee32e8125250bbeffeb63d1fef5da559bacbf1

Jerome Jiang [Fri, 20 Jan 2023 17:14:04 +0000 (17:14 +0000)]
Merge "Add codec control to set per frame QP" into main

Jerome Jiang [Thu, 12 Jan 2023 20:58:00 +0000 (15:58 -0500)]
Add codec control to set per frame QP

The use case is 1-pass encoding.
Forces max_quantizer = min_quantizer and aq-mode = 0.
Applicable to spatial layers, where the user may set
the QP per spatial layer.

Change-Id: Idfcb7daefde94c475ed1bc0eb8af47c9f309110b

James Zern [Thu, 19 Jan 2023 19:44:43 +0000 (19:44 +0000)]
Merge "Refactor Neon implementation of variance functions" into main

James Zern [Thu, 19 Jan 2023 03:19:01 +0000 (19:19 -0800)]
*/Android.mk: add a check for NDK_ROOT

This simplifies integration with the Android platform and prevents the
files from being used when a non-NDK build is performed. In that case
Android.bp is preferred.

Change-Id: I803912146dac788b7f0af27199c7613cabbc9fa0

Salome Thirot [Mon, 16 Jan 2023 16:44:04 +0000 (16:44 +0000)]
Refactor Neon implementation of variance functions

Refactor and optimize the Neon implementation of variance functions -
effectively backporting these libaom changes[1,2].

After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].

[1] https://aomedia-review.googlesource.com/c/aom/+/162241
[2] https://aomedia-review.googlesource.com/c/aom/+/162262

Change-Id: Ia4e8fff4d53297511d1a1e43bca8053bf811e551

Marco Paniconi [Wed, 18 Jan 2023 02:04:18 +0000 (02:04 +0000)]
Merge "Fix to segfault for external resize test in vp9" into main

Marco Paniconi [Sat, 14 Jan 2023 03:46:10 +0000 (19:46 -0800)]
Fix to segfault for external resize test in vp9

The failure occurs in 1-pass non-realtime mode at speed 0. It is due to
the speed feature rd_ml_partition.var_pruning, which doesn't check for a
scaled reference in simple_motion_search().

Bug: webm:1768

Change-Id: Iddcb56033bac042faebb5196eed788317590b23f

Scott LaVarnway [Fri, 13 Jan 2023 15:30:07 +0000 (07:30 -0800)]
variance_test.cc: Enable HBDMse speed test.

Change-Id: If0226307a6efd704f8a35cb986f570304d698b95

Scott LaVarnway [Fri, 13 Jan 2023 13:36:15 +0000 (13:36 +0000)]
Merge "variance_test.cc: Enable VpxHBDMseTest for C and SSE2." into main

Scott LaVarnway [Thu, 12 Jan 2023 19:03:28 +0000 (11:03 -0800)]
variance_test.cc: Enable VpxHBDMseTest for C and SSE2.

Change-Id: I66c0db6c605876d6757684fd715614881ca261e7

James Zern [Thu, 12 Jan 2023 18:41:27 +0000 (18:41 +0000)]
Merge changes Ifbf46768,If19f5872 into main

* changes:
  Implement vertical convolutions using Neon USDOT instruction
  Implement horizontal convolutions using Neon USDOT instruction

Jonathan Wright [Wed, 18 May 2022 15:58:50 +0000 (16:58 +0100)]
Implement vertical convolutions using Neon USDOT instruction

Add additional AArch64 paths for vpx_convolve8_vert_neon and
vpx_convolve8_avg_vert_neon that use the Armv8.6-A USDOT (mixed-sign
dot-product) instruction. The USDOT instruction takes an 8-bit
unsigned operand vector and a signed 8-bit operand vector to produce
a signed 32-bit result. This is helpful because convolution filters
often have both positive and negative values, while the 8-bit pixel
channel data being filtered is all unsigned. As a result, the USDOT
convolution paths added here do not have to do the "transform the
pixel channel data to [-128, 128) and correct for it later" dance
that we have to do with the SDOT paths.
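
A scalar model of the difference between the two paths for one 32-bit lane (four taps); both function names are hypothetical. The SDOT variant shifts the pixels into [-128, 128) and then adds back 128 times the sum of the taps. The USDOT variant multiplies the unsigned pixels directly. In real code the correction term would typically be precomputed once per filter rather than recomputed per lane as here.

```c
#include <stdint.h>

/* Scalar model of one USDOT lane: four unsigned 8-bit pixels times four
   signed 8-bit taps, accumulated into a signed 32-bit value. */
static int32_t usdot_lane(int32_t acc, const uint8_t px[4],
                          const int8_t f[4]) {
  int i;
  for (i = 0; i < 4; ++i) acc += (int32_t)px[i] * f[i];
  return acc;
}

/* Scalar model of the SDOT workaround: both operands must be signed, so
   pixels are shifted to [-128, 128) first, and the result is corrected
   by adding 128 * (sum of taps). */
static int32_t sdot_lane_corrected(int32_t acc, const uint8_t px[4],
                                   const int8_t f[4]) {
  int32_t tap_sum = 0;
  int i;
  for (i = 0; i < 4; ++i) {
    const int8_t s = (int8_t)(px[i] - 128);
    acc += (int32_t)s * f[i];
    tap_sum += f[i];
  }
  return acc + 128 * tap_sum;
}
```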

The USDOT instruction is optional from Armv8.2 to Armv8.5 but
mandatory from Armv8.6 onwards. The availability of the USDOT
instruction is indicated by the feature macro
__ARM_FEATURE_MATMUL_INT8. The SDOT paths are retained for use on
target CPUs that do not implement the USDOT instructions.

Change-Id: Ifbf467681dd53bb1d26e22359885e6edde3c5c72

Jonathan Wright [Wed, 18 May 2022 13:14:56 +0000 (14:14 +0100)]
Implement horizontal convolutions using Neon USDOT instruction

Add additional AArch64 paths for vpx_convolve8_horiz_neon and
vpx_convolve8_avg_horiz_neon that use the Armv8.6-A USDOT (mixed-sign
dot-product) instruction. The USDOT instruction takes an 8-bit
unsigned operand vector and a signed 8-bit operand vector to produce
a signed 32-bit result. This is helpful because convolution filters
often have both positive and negative values, while the 8-bit pixel
channel data being filtered is all unsigned. As a result, the USDOT
convolution paths added here do not have to do the "transform the
pixel channel data to [-128, 128) and correct for it later" dance
that we have to do with the SDOT paths.

The USDOT instruction is optional from Armv8.2 to Armv8.5 but
mandatory from Armv8.6 onwards. The availability of the USDOT
instruction is indicated by the feature macro
__ARM_FEATURE_MATMUL_INT8. The SDOT paths are retained for use on
target CPUs that do not implement the USDOT instructions.

Change-Id: If19f5872c3453458a8cfb7c7d2be82a2c0eab46a

James Zern [Tue, 10 Jan 2023 21:49:15 +0000 (13:49 -0800)]
build: replace egrep with grep -E

avoids a warning on some platforms:
egrep: warning: egrep is obsolescent; using grep -E

Bug: webm:1786
Change-Id: Ia434297731303aacb0b02cf3dcbfd8e03936485d
Fixed: webm:1786

Jonathan Wright [Thu, 5 Jan 2023 15:04:53 +0000 (15:04 +0000)]
Use Neon load/store helper functions consistently

Define all Neon load/store helper functions in mem_neon.h and use
them consistently in Neon convolution functions.

Change-Id: I57905bc0a3574c77999cf4f4a73442c3420fa2be

Jonathan Wright [Thu, 5 Jan 2023 12:20:03 +0000 (12:20 +0000)]
Use lane-referencing intrinsics in Neon convolution kernels

The Neon convolution helper functions take a pointer to a filter and
load the 8 values into a single Neon register. For some reason,
filter values 3 and 4 are then duplicated into their own separate
registers.

This patch modifies these helper functions so that they access filter
values 3 and 4 via the lane-referencing versions of the various Neon
multiply instructions. This reduces register pressure and tidies up
the source code quite a bit.

Change-Id: Ia4aeee8b46fe218658fb8577dc07ff04a9324b3e

Jerome Jiang [Wed, 21 Dec 2022 16:13:40 +0000 (11:13 -0500)]
Remove references to deprecated NumPy type aliases

This change replaces references to a number of deprecated NumPy type
aliases (np.bool, np.int, np.float, np.complex, np.object, np.str)
with their recommended replacement
(bool, int, float, complex, object, str).

NumPy 1.24 drops the deprecated aliases,
so we must remove uses before updating NumPy.

Change-Id: I9f5dfcbb11fe6534fce358054f210c7653f278c3

Scott LaVarnway [Tue, 20 Dec 2022 23:43:44 +0000 (15:43 -0800)]
[x86]: Add vpx_highbd_comp_avg_pred_sse2().

C vs SSE2

4x4: 3.38x
8x8: 3.45x
16x16: 2.06x
32x32: 2.19x
64x64: 1.39x

Change-Id: I46638fe187b49a78fee554114fac51c485d74474

Scott LaVarnway [Fri, 16 Dec 2022 18:21:00 +0000 (10:21 -0800)]
Add vpx_highbd_comp_avg_pred_c() test.

Change-Id: I6b2c3379c49a62e56e5ac56fd4782a50b3c4e12a

Marco Paniconi [Wed, 14 Dec 2022 17:08:21 +0000 (17:08 +0000)]
Merge "rc-svc: Add tests for dynamic svc in external RC" into main

Marco Paniconi [Wed, 7 Dec 2022 08:17:22 +0000 (00:17 -0800)]
rc-svc: Add tests for dynamic svc in external RC

Test to verify RC when going down and back up in
spatial layers. Going back up has an issue, so a
TODO was added.

Make the test more flexible to handle dynamic layers.
A test for dynamic change in temporal layers will follow.

Change-Id: Ic5542f7b274135277429e116f56ba54e682e96a0

Anton Venema [Tue, 13 Dec 2022 18:27:37 +0000 (10:27 -0800)]
Add additional ARM targets for Visual Studio.

configure: Add an armv7-win32-vs16 target
configure: Add an armv7-win32-vs17 target
configure: Add an arm64-win64-vs16 target
configure: Add an arm64-win64-vs17 target

Change-Id: I11d6cd6e51f7703939d6fd3fc6a7469591e3b09d

Cheng Chen [Tue, 13 Dec 2022 01:24:00 +0000 (01:24 +0000)]
Merge "L2E: Add a new interface to control rdmult" into main

Scott LaVarnway [Tue, 6 Dec 2022 21:13:30 +0000 (13:13 -0800)]
[x86]: Add vpx_highbd_subtract_block_avx2().

Up to 4x faster than "sse2 vectorized C".

Change-Id: Ie9b3c12a437c5cddf92c4d5349c4f659ca6b82ea

Scott LaVarnway [Tue, 6 Dec 2022 22:18:03 +0000 (14:18 -0800)]
Add vpx highbd subtract test.

Change-Id: I069ae0fe22bfc82ad5083df85a7fdf9058a285eb

Cheng Chen [Sat, 3 Dec 2022 02:04:32 +0000 (18:04 -0800)]
L2E: Add a new interface to control rdmult

Allow an external model to control the frame rdmult.

A function is called per frame to get the value of rdmult from
the external model.

The external rdmult will overwrite libvpx's default rdmult unless
a reserved value is selected.

A unit test is added for the case where the default rdmult value is set.

Change-Id: I2f17a036c188de66dc00709beef4bf2ed86a919a
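The per-frame override rule described above can be sketched as follows. The reserved value, macro name and function name are hypothetical placeholders; the real external RC interface may define its own name and value for "use the default".

```c
/* Hypothetical reserved value meaning "keep libvpx's own rdmult".
 * The actual interface may use a different name and value. */
#define SKETCH_RESERVED_RDMULT (-1)

/* Called once per frame with libvpx's computed rdmult and the value
 * returned by the external model; returns the rdmult to encode with. */
static int sketch_frame_rdmult(int libvpx_rdmult, int external_rdmult) {
  if (external_rdmult == SKETCH_RESERVED_RDMULT) {
    return libvpx_rdmult; /* reserved value: fall back to the default */
  }
  return external_rdmult; /* otherwise the external model overrides it */
}
```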

Marco Paniconi [Mon, 5 Dec 2022 22:30:40 +0000 (14:30 -0800)]
rc-rtc: Test for periodic key in SVC external RC

This test covers the fix merged here:
https://chromium-review.googlesource.com/c/webm/libvpx/+/4022904

Change-Id: Ib68fbcba694b5d465a9faf3ca7d6880bfe8eabb3

Marco Paniconi [Mon, 5 Dec 2022 19:54:33 +0000 (11:54 -0800)]
rc-rtc: Remove frame_flags_ change in svc ratectrl rtc test

The SVC test runs only in CBR mode, and the frame_flags are
set by the SVC pattern, so they should not be undone
in SVC mode.

Change-Id: I5ffa65dd58a7b47f287d124d9e71ba1dc7c5a549

Marco Paniconi [Fri, 18 Nov 2022 04:16:26 +0000 (04:16 +0000)]
Merge "vp9/rate_ctrl_rtc: Improve get cyclic refresh data" into main

Hirokazu Honda [Thu, 17 Nov 2022 07:05:28 +0000 (16:05 +0900)]
vp9/rate_ctrl_rtc: Improve get cyclic refresh data

A client of the vp9 rate controller needs to know whether
segmentation is enabled and the size of delta_q. It is also useful to
know the size of the map. This CL changes the interface to provide these.

Bug: b:259487065
Test: Build

Change-Id: If05854530f97e1430a7b97788910f277ab673a87

Marco Paniconi [Tue, 15 Nov 2022 21:45:07 +0000 (21:45 +0000)]
Merge "vp9-svc: Fixes to make SVC work with VBR" into main

Marco Paniconi [Tue, 15 Nov 2022 06:11:19 +0000 (22:11 -0800)]
vp9-svc: Fixes to make SVC work with VBR

Prior to this CL, SVC with VBR mode was broken.
This CL fixes VBR rate control so that it works for SVC.
Rename is_one_pass_cbr_svc() --> is_one_pass_svc(),
as it can now be used for both CBR and VBR.

Added a rate targeting unittest for (2SL, 3TL).

Bug: chromium:1375111
Change-Id: I5a62ffe7fbea29dc5949c88a284768386b1907a9

James Zern [Tue, 15 Nov 2022 19:19:43 +0000 (19:19 +0000)]
Merge "[NEON] Optimize FHT functions, add highbd FHT 4x4" into main

Johann [Mon, 14 Nov 2022 08:59:45 +0000 (17:59 +0900)]
quantize: remove vp9_regular_quantize_b_4x4

This was just a helper function which called vpx_quantize_b or
vpx_highbd_quantize_b. It also checked for skip_block, which was
necessary when webm:1439 was filed but does not appear to be
necessary now.

Removes a quantize variant and makes subsequent cleanups easier.

Change-Id: Ibe545eccd19370f07ff26c8e151f290c642efd2a

Konstantinos Margaritis [Wed, 9 Nov 2022 09:30:58 +0000 (09:30 +0000)]
[NEON] Optimize FHT functions, add highbd FHT 4x4

Refactor and further optimize the FHT functions using the new butterfly
functions. 4x4 is 5% faster, and 8x8 and 16x16 are 10% faster than the
previous versions. The highbd 4x4 FHT is 2.27x faster than the C version for --rt.

Change-Id: I3ebcd26010f6c5c067026aa9353cde46669c5d94

Marco Paniconi [Fri, 11 Nov 2022 02:50:19 +0000 (18:50 -0800)]
vp9-rc: Fix key frame setting in external RC

Bug: b/257368998

Change-Id: I03e35915ac99b50cb6bdf7bce8b8f9ec5aef75b7

James Zern [Mon, 7 Nov 2022 21:48:50 +0000 (21:48 +0000)]
Merge "Add Neon implementation of vpx_hadamard_32x32" into main

Sam James [Sun, 6 Nov 2022 04:11:59 +0000 (04:11 +0000)]
build: fix -Wimplicit-int (Clang 16)

Clang 16 will make -Wimplicit-int an error by default, which can, among
other things, cause some configure tests to fail silently or return the wrong result.

Fixes this error:
```
+/var/tmp/portage/media-libs/libvpx-1.12.0/temp/vpx-conf-1802-30624.c:1:15: error: type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int [-Wimplicit-int]
```
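A minimal illustration of what the error rejects, using a hypothetical probe function rather than the actual configure snippet: the commented-out C89 form omits the return type, and the compiled form declares it explicitly.

```c
/* Rejected by Clang 16 by default (implicit int, removed in C99):
 *
 *   static inline probe(void) { return 0; }
 *
 * Because the compile fails for this reason rather than for the feature
 * being tested, a configure check built on such a snippet can report
 * the wrong result. Spelling out the return type keeps it valid C99. */
static inline int probe(void) { return 0; }
```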

For more information, see LWN.net [0] or LLVM's Discourse [1], gentoo-dev@ [2],
or the (new) c-std-porting mailing list [3].

[0] https://lwn.net/Articles/913505/
[1] https://discourse.llvm.org/t/configure-script-breakage-with-the-new-werror-implicit-function-declaration/65213
[2] https://archives.gentoo.org/gentoo-dev/message/dd9f2d3082b8b6f8dfbccb0639e6e240
[3] hosted at lists.linux.dev.

Bug: https://bugs.gentoo.org/879705
Change-Id: Id73a98944ab3c99a368b9da7a5e902ddff9d937f
Signed-off-by: Sam James <sam@gentoo.org>

Andrew Salkeld [Thu, 13 Oct 2022 15:28:41 +0000 (16:28 +0100)]
Add Neon implementation of vpx_hadamard_32x32

Add an Arm Neon implementation of vpx_hadamard_32x32 and use it
instead of the scalar C implementation.

Also add test coverage for the new Neon implementation.

Change-Id: Iccc018eec4dbbe629fb0c6f8ad6ea8554e7a0b13
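As a rough illustration of the transform itself, below is an unnormalized fast Walsh-Hadamard transform on a 1-D array whose length is a power of two (32 here). This is a generic sketch with a hypothetical function name, not libvpx's code: vpx_hadamard_32x32 operates on 2-D blocks and may use a different coefficient ordering and intermediate scaling.

```c
#include <stdint.h>

/* Unnormalized in-place fast Walsh-Hadamard transform (natural ordering).
 * n must be a power of two. Each stage applies a sum/difference butterfly
 * to pairs of elements `len` apart; there are log2(n) stages, so 5 for
 * n = 32. Output magnitudes can grow by up to a factor of n. */
static void fwht_sketch(int32_t *a, int n) {
  for (int len = 1; len < n; len <<= 1) {
    for (int i = 0; i < n; i += len << 1) {
      for (int j = i; j < i + len; ++j) {
        const int32_t u = a[j];
        const int32_t v = a[j + len];
        a[j] = u + v;
        a[j + len] = u - v;
      }
    }
  }
}
```

A constant input concentrates all energy in the first coefficient, while a single impulse spreads evenly across all outputs.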

Konstantinos Margaritis [Wed, 26 Oct 2022 22:09:32 +0000 (22:09 +0000)]
[NEON] Optimize highbd 32x32 DCT

For --best quality, the resulting function
vpx_highbd_fdct32x32_rd_neon takes 0.27% of CPU time in
profiling, vs 6.27% for the sum of the scalar functions
vpx_fdct32, vpx_fdct32.constprop.0 and vpx_fdct32x32_rd_c for rd.
For --rt quality, the function takes 0.19% vs 4.57% for the scalar
version.
Overall, this improves highbd encoding time by ~6% for --best
and ~9% for --rt.

Change-Id: I1ce4bbef6e364bbadc76264056aa3f86b1a8edc5

James Zern [Wed, 2 Nov 2022 02:21:18 +0000 (02:21 +0000)]
Merge "[NEON] Optimize and homogenize Butterfly DCT functions" into main

Konstantinos Margaritis [Wed, 26 Oct 2022 21:37:31 +0000 (21:37 +0000)]
[NEON] Optimize and homogenize Butterfly DCT functions

Provide a set of commonly used Butterfly DCT functions for use in the
4x4, 8x8, 16x16 and 32x32 DCT functions. These are provided in various
forms, using vqrdmulh_s16/vqrdmulh_s32 for the _fast variants, which
unfortunately are only usable in pass1 of most DCTs, as they do not
provide the necessary precision in pass2.
This gave a performance gain of 5% to 15% in the 16x16 case.
Also, for 32x32, the loads were rearranged; together with the butterfly
optimizations, this gave a 10% gain in the 32x32_rd function.
This refactoring was necessary to allow easier porting of the highbd
32x32 functions, which follow this patchset.

Change-Id: I6282e640b95a95938faff76c3b2bace3dc298bc3
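To illustrate the precision remark, the sketch below models vqrdmulh_s16 in scalar C and compares a butterfly term computed with a single rounding after a widened 32-bit multiply-accumulate against one where each product is rounded separately by vqrdmulh. It assumes Q14 cosine constants below 16384 (doubled so that vqrdmulh's Q15 scaling yields a Q14 product). The helper names are hypothetical, and this shows one plausible way precision is lost, not the actual libvpx code.

```c
#include <stdint.h>

#define SKETCH_CONST_BITS 14 /* assumed Q14 cosine constants */

/* Scalar model of one vqrdmulh_s16 lane: saturating rounding doubling
 * multiply returning the high 16 bits. Assumes arithmetic right shift
 * of negative values, as on mainstream compilers. */
static int16_t qrdmulh_s16(int16_t a, int16_t b) {
  int64_t r = (2 * (int64_t)a * b + (1 << 15)) >> 16;
  if (r > INT16_MAX) r = INT16_MAX; /* only for a == b == INT16_MIN */
  return (int16_t)r;
}

/* One rounding after accumulating both products at 32-bit precision. */
static int16_t butterfly_full(int16_t a, int16_t b, int16_t c0, int16_t c1) {
  const int32_t sum = (int32_t)a * c0 + (int32_t)b * c1;
  return (int16_t)((sum + (1 << (SKETCH_CONST_BITS - 1))) >> SKETCH_CONST_BITS);
}

/* Each product rounded to 16 bits on its own, then added. Doubling a Q14
 * constant c (< 16384) makes qrdmulh(x, 2c) equal to a Q14 rounding
 * shift of x * c, so each term alone is exact to the nearest integer. */
static int16_t butterfly_fast(int16_t a, int16_t b, int16_t c0, int16_t c1) {
  return (int16_t)(qrdmulh_s16(a, (int16_t)(2 * c0)) +
                   qrdmulh_s16(b, (int16_t)(2 * c1)));
}
```

With a = b = 1 and c0 = c1 = 11585 (about cos(pi/4) in Q14), the exact value is about 1.41: butterfly_full returns 1, while butterfly_fast returns 2 because each 0.71 is rounded up separately.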

Johann Koenig [Thu, 27 Oct 2022 08:38:48 +0000 (08:38 +0000)]
Merge "MacOS 13 is darwin22" into main

Johann Koenig [Thu, 27 Oct 2022 08:38:18 +0000 (08:38 +0000)]
Merge "rtcd: allow disabling neon on armv8" into main

Johann [Thu, 27 Oct 2022 02:40:19 +0000 (11:40 +0900)]
MacOS 13 is darwin22

Bug: webm:1783
Change-Id: I97d94ab8c8aebe13aedb58e280dc37474814ad5d

Johann [Wed, 26 Oct 2022 23:49:37 +0000 (08:49 +0900)]
rtcd: allow disabling neon on armv8

Change-Id: Idef943775456eb95b46be5c92c114c1d215f38d7

Johann [Wed, 26 Oct 2022 08:14:21 +0000 (17:14 +0900)]
mailmap: add johann@duck.com

Change-Id: I3b48951e69ba1f4a9fafdbb81fac48f79587a342

James Zern [Tue, 25 Oct 2022 19:16:46 +0000 (19:16 +0000)]
Merge changes I36545ff4,Id1aa29da into main

* changes:
  vp9_highbd_quantize_fp*_neon: normalize fn param name
  highbd_sad_avx2: normalize function param names

James Zern [Tue, 25 Oct 2022 19:16:08 +0000 (19:16 +0000)]
Merge "SAD*Test: mark virtual Run() as overridden" into main

Johann Koenig [Tue, 25 Oct 2022 13:26:37 +0000 (13:26 +0000)]
Merge "quantize: consolidate sse2 conditionals" into main