James Zern [Tue, 28 Feb 2023 21:50:11 +0000 (21:50 +0000)]
Merge changes I892fbd2c,Ic59df16c,I7228327b,Ib4a1a2cb into main
* changes:
Implement highbd_d117_predictor using Neon
Implement highbd_d63_predictor using Neon
Implement d117_predictor using Neon
Implement d63_predictor using Neon
James Zern [Tue, 28 Feb 2023 21:40:26 +0000 (21:40 +0000)]
Merge "quantize: simplify 32x32_b args" into main
George Steed [Tue, 21 Feb 2023 11:17:10 +0000 (11:17 +0000)]
Implement highbd_d117_predictor using Neon
Add Neon implementations of the highbd d117 predictor for 4x4, 8x8,
16x16 and 32x32 block sizes. Also update tests to add new corresponding
cases.
An explanation of the general implementation strategy is given in the
8x8 implementation body, and is mostly identical to the non-highbd
version.
Speedups over the C code (higher is better):
Microarch. | Compiler | Block | Speedup
Neoverse N1 | LLVM 15 | 4x4 | 1.99
Neoverse N1 | LLVM 15 | 8x8 | 4.37
Neoverse N1 | LLVM 15 | 16x16 | 6.81
Neoverse N1 | LLVM 15 | 32x32 | 6.49
Neoverse N1 | GCC 12 | 4x4 | 2.49
Neoverse N1 | GCC 12 | 8x8 | 4.10
Neoverse N1 | GCC 12 | 16x16 | 5.58
Neoverse N1 | GCC 12 | 32x32 | 2.16
Neoverse V1 | LLVM 15 | 4x4 | 1.99
Neoverse V1 | LLVM 15 | 8x8 | 5.03
Neoverse V1 | LLVM 15 | 16x16 | 6.61
Neoverse V1 | LLVM 15 | 32x32 | 6.01
Neoverse V1 | GCC 12 | 4x4 | 2.09
Neoverse V1 | GCC 12 | 8x8 | 4.52
Neoverse V1 | GCC 12 | 16x16 | 4.23
Neoverse V1 | GCC 12 | 32x32 | 2.70
Change-Id: I892fbd2c17ac527ddc22b91acca907ffc84c5cd2
George Steed [Mon, 20 Feb 2023 11:41:40 +0000 (11:41 +0000)]
Implement highbd_d63_predictor using Neon
Add Neon implementations of the highbd d63 predictor for 4x4, 8x8, 16x16
and 32x32 block sizes. Also update tests to add new corresponding cases.
Speedups over the C code (higher is better):
Microarch. | Compiler | Block | Speedup
Neoverse N1 | LLVM 15 | 4x4 | 2.43
Neoverse N1 | LLVM 15 | 8x8 | 4.03
Neoverse N1 | LLVM 15 | 16x16 | 3.07
Neoverse N1 | LLVM 15 | 32x32 | 4.11
Neoverse N1 | GCC 12 | 4x4 | 2.92
Neoverse N1 | GCC 12 | 8x8 | 7.20
Neoverse N1 | GCC 12 | 16x16 | 4.43
Neoverse N1 | GCC 12 | 32x32 | 3.18
Neoverse V1 | LLVM 15 | 4x4 | 1.99
Neoverse V1 | LLVM 15 | 8x8 | 3.66
Neoverse V1 | LLVM 15 | 16x16 | 3.60
Neoverse V1 | LLVM 15 | 32x32 | 3.29
Neoverse V1 | GCC 12 | 4x4 | 2.39
Neoverse V1 | GCC 12 | 8x8 | 4.76
Neoverse V1 | GCC 12 | 16x16 | 3.29
Neoverse V1 | GCC 12 | 32x32 | 2.43
Change-Id: Ic59df16ceeb468003754b4374be2f4d9af6589e4
George Steed [Tue, 7 Feb 2023 12:16:00 +0000 (12:16 +0000)]
Implement d117_predictor using Neon
Add Neon implementations of the d117 predictor for 4x4, 8x8, 16x16 and
32x32 block sizes. Also update tests to add new corresponding cases.
An explanation of the general implementation strategy is given in the
8x8 implementation body.
Speedups over the C code (higher is better):
Microarch. | Compiler | Block | Speedup
Neoverse N1 | LLVM 15 | 4x4 | 1.73
Neoverse N1 | LLVM 15 | 8x8 | 5.24
Neoverse N1 | LLVM 15 | 16x16 | 9.77
Neoverse N1 | LLVM 15 | 32x32 | 14.13
Neoverse N1 | GCC 12 | 4x4 | 2.04
Neoverse N1 | GCC 12 | 8x8 | 4.70
Neoverse N1 | GCC 12 | 16x16 | 8.64
Neoverse N1 | GCC 12 | 32x32 | 4.57
Neoverse V1 | LLVM 15 | 4x4 | 1.75
Neoverse V1 | LLVM 15 | 8x8 | 6.79
Neoverse V1 | LLVM 15 | 16x16 | 9.16
Neoverse V1 | LLVM 15 | 32x32 | 14.47
Neoverse V1 | GCC 12 | 4x4 | 1.75
Neoverse V1 | GCC 12 | 8x8 | 6.00
Neoverse V1 | GCC 12 | 16x16 | 7.63
Neoverse V1 | GCC 12 | 32x32 | 4.32
Change-Id: I7228327b5be27ee7a68deecafa05be0bd2a40ff4
George Steed [Fri, 3 Feb 2023 17:12:46 +0000 (17:12 +0000)]
Implement d63_predictor using Neon
Add Neon implementations of the d63 predictor for 4x4, 8x8, 16x16 and
32x32 block sizes. Also update tests to add new corresponding cases.
Speedups over the C code (higher is better):
Microarch. | Compiler | Block | Speedup
Neoverse N1 | LLVM 15 | 4x4 | 2.10
Neoverse N1 | LLVM 15 | 8x8 | 4.45
Neoverse N1 | LLVM 15 | 16x16 | 4.74
Neoverse N1 | LLVM 15 | 32x32 | 2.27
Neoverse N1 | GCC 12 | 4x4 | 2.46
Neoverse N1 | GCC 12 | 8x8 | 10.37
Neoverse N1 | GCC 12 | 16x16 | 11.46
Neoverse N1 | GCC 12 | 32x32 | 6.57
Neoverse V1 | LLVM 15 | 4x4 | 2.24
Neoverse V1 | LLVM 15 | 8x8 | 3.53
Neoverse V1 | LLVM 15 | 16x16 | 4.44
Neoverse V1 | LLVM 15 | 32x32 | 2.17
Neoverse V1 | GCC 12 | 4x4 | 2.25
Neoverse V1 | GCC 12 | 8x8 | 7.67
Neoverse V1 | GCC 12 | 16x16 | 8.97
Neoverse V1 | GCC 12 | 32x32 | 4.77
Change-Id: Ib4a1a2cb5a5c4495ae329529f8847664cbd0dfe0
Johann [Sat, 5 Nov 2022 00:53:07 +0000 (09:53 +0900)]
quantize: simplify 32x32_b args
Now that all the implementations of the 32x32 quantize are in
intrinsics we can reference struct members directly. Saves
pushing them to the stack.
n_coeffs is not used at all for this function.
Change-Id: I2104fea3fa20c455087e21b347d6abd7ea1f3e1e
James Zern [Tue, 28 Feb 2023 02:44:28 +0000 (02:44 +0000)]
Merge "Add Neon implementations of standard bitdepth MSE functions" into main
James Zern [Tue, 28 Feb 2023 02:36:41 +0000 (02:36 +0000)]
Merge "Optimize transpose_neon.h helper functions" into main
James Zern [Mon, 27 Feb 2023 21:48:47 +0000 (13:48 -0800)]
tools_common,VpxInterface: remove unneeded const
Change-Id: Ic309aab2ff1750bdbcc36e8aafe05d52930ba694
James Zern [Mon, 27 Feb 2023 19:52:18 +0000 (19:52 +0000)]
Merge "tools_common,VpxInterface: fix interface fn ptr proto" into main
Salome Thirot [Fri, 24 Feb 2023 18:05:43 +0000 (18:05 +0000)]
Add Neon implementations of standard bitdepth MSE functions
Currently only vpx_mse16x16 has a Neon implementation. This patch adds
optimized Armv8.0 and Armv8.4 dot-product paths for all block sizes:
8x8, 8x16, 16x8 and 16x16.
Add the corresponding tests as well.
Change-Id: Ib0357fdcdeb05860385fec89633386e34395e260
Jonathan Wright [Sat, 25 Feb 2023 00:43:46 +0000 (00:43 +0000)]
Optimize transpose_neon.h helper functions
1) Use vtrn[12]q_[su]64 in vpx_vtrnq_[su]64* helpers on AArch64
targets. This produces half as many TRN1/2 instructions compared to
the number of MOVs that result from vcombine.
2) Use vpx_vtrnq_[su]64* helpers wherever applicable.
3) Refactor transpose_4x8_s16 to operate on 128-bit vectors.
Change-Id: I9a8b1c1fe2a98a429e0c5f39def5eb2f65759127
James Zern [Sat, 25 Feb 2023 03:25:39 +0000 (19:25 -0800)]
tools_common,VpxInterface: fix interface fn ptr proto
Use (void) to indicate an empty parameter list and match the declaration
of vpx_codec_vp[89]_[cd]x. This fixes a cfi sanitizer error.
Change-Id: I190f432eea4d1765afffd84c7458ec44d863f90c
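For illustration only, the prototype fix described above can be sketched as follows. Every name here (example_iface_t, example_codec_cx, example_iface_fn, example_name) is hypothetical and is not taken from tools_common; the sketch only shows a function pointer type that spells out (void) so that it matches the declaration of the function it points to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a codec interface type. */
typedef struct example_iface {
  const char *name;
} example_iface_t;

/* Declared with (void): the function takes no parameters. */
static example_iface_t *example_codec_cx(void) {
  static example_iface_t iface = { "example" };
  return &iface;
}

/* The pointer type also uses (void), so it matches the declaration above.
 * An empty () list would leave the parameters unspecified, and indirect-call
 * checkers that compare function types can report the mismatch. */
typedef example_iface_t *(*example_iface_fn)(void);

/* Calls through the pointer and returns the interface name. */
static const char *example_name(example_iface_fn fn) {
  assert(fn != NULL);
  return fn()->name;
}
```

A call such as example_name(example_codec_cx) goes through a pointer whose type agrees exactly with the callee's declaration.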
James Zern [Fri, 24 Feb 2023 17:58:15 +0000 (17:58 +0000)]
Merge changes I65d86038,If3299fe5,I3ef1ff19 into main
* changes:
Add Neon implementation of high bitdepth 32x32 hadamard transform
Add Neon implementation of high bitdepth 16x16 hadamard transform
Add Neon implementation of high bitdepth 8x8 hadamard transform
James Zern [Fri, 24 Feb 2023 17:49:25 +0000 (17:49 +0000)]
Merge changes Ia64d175a,Ie4ea8f0a into main
* changes:
vp9_loop_filter_alloc: clear -Wshadow warnings
vp9_adapt_mode_probs: clear -Wshadow warning
Salome Thirot [Thu, 23 Feb 2023 12:05:30 +0000 (12:05 +0000)]
Add Neon implementation of high bitdepth 32x32 hadamard transform
Add Neon implementation of vpx_highbd_hadamard_32x32 as well as the
corresponding tests.
Change-Id: I65d8603896649de1996b353aa79eee54824b4708
Salome Thirot [Wed, 22 Feb 2023 17:27:56 +0000 (17:27 +0000)]
Add Neon implementation of high bitdepth 16x16 hadamard transform
Add Neon implementation of vpx_highbd_hadamard_16x16 as well as the
corresponding tests.
Change-Id: If3299fe556351dfe3db994ac171d83a95ea1504b
Jerome Jiang [Fri, 24 Feb 2023 01:45:54 +0000 (01:45 +0000)]
Merge "vp9 rc test: change param type to bool" into main
Jerome Jiang [Thu, 23 Feb 2023 19:28:30 +0000 (14:28 -0500)]
vp9 rc test: change param type to bool
Change-Id: Ib45522e32d9137678da9062830044e9dd87537e5
Chi Yo Tsai [Thu, 23 Feb 2023 18:01:05 +0000 (18:01 +0000)]
Merge "Disable some intra modes for TX_32X32" into main
Salome Thirot [Tue, 21 Feb 2023 17:40:20 +0000 (17:40 +0000)]
Add Neon implementation of high bitdepth 8x8 hadamard transform
Add Neon implementation of vpx_highbd_hadamard_8x8 as well as the
corresponding tests.
Change-Id: I3ef1ff199d76b6b010591ef15a81b0f36c9ded03
James Zern [Wed, 22 Feb 2023 21:25:29 +0000 (13:25 -0800)]
vp9_loop_filter_alloc: clear -Wshadow warnings
Bug: webm:1793
Change-Id: Ia64d175aa69dc2ecde2babf64bde04f02b32795b
James Zern [Wed, 22 Feb 2023 21:21:27 +0000 (13:21 -0800)]
vp9_adapt_mode_probs: clear -Wshadow warning
Bug: webm:1793
Change-Id: Ie4ea8f0a3295e6f58dc6f7d5c61d46700c539d40
James Zern [Thu, 23 Feb 2023 06:07:25 +0000 (06:07 +0000)]
Merge "vp9_block.h: rename diff struct to Diff" into main
chiyotsai [Wed, 22 Feb 2023 20:44:47 +0000 (12:44 -0800)]
Disable some intra modes for TX_32X32
Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR | SSIM | ENC_T |
|---------|---------|----------|----------|---------|-------|
| 0 | hdres2 | +0.036% | +0.032% | +0.014% | -3.9% |
| 0 | lowres2 | -0.002% | -0.011% | +0.020% | -3.6% |
| 0 | midres2 | +0.045% | +0.025% | -0.007% | -4.0% |
STATS_CHANGED
Change-Id: I75a927333d26f2a37f0dda57a641b455b845f5b9
James Zern [Wed, 22 Feb 2023 20:54:21 +0000 (12:54 -0800)]
vpx_subpixel_8t_intrin_avx2: clear -Wshadow warnings
no changes to assembly
Bug: webm:1793
Change-Id: I6a82290cafee7f4a7909d497ccfdefd5a78fb8ed
James Zern [Wed, 22 Feb 2023 19:34:30 +0000 (11:34 -0800)]
vp9_block.h: rename diff struct to Diff
This matches the style guide and fixes some -Wshadow warnings related to
variables with the same name. Something similar was done in libaom in:
863b04994b Fix warnings reported by -Wshadow: Part2: av1 directory
Bug: webm:1793
Change-Id: I4df1bbc8d079a3174d75f0d35d54c200ffdbb677
Yunqing Wang [Wed, 22 Feb 2023 19:28:17 +0000 (19:28 +0000)]
Merge "Skip redundant iterations in joint motion search " into main
Jerome Jiang [Wed, 22 Feb 2023 14:59:49 +0000 (14:59 +0000)]
Merge "vp9 rc: Make it work for SVC parallel encoding" into main
Salome Thirot [Mon, 13 Feb 2023 16:11:31 +0000 (16:11 +0000)]
Optimize Neon implementation of high bitdepth variance functions
Specialize implementation of high bitdepth variance functions such that
we only widen data processing element types when absolutely necessary.
Change-Id: If4cc3fea7b5ab0821e3129ebd79ff63706a512bf
Deepa K G [Thu, 16 Feb 2023 16:17:24 +0000 (21:47 +0530)]
Skip redundant iterations in joint motion search
In joint_motion_search, there are four iterations.
Even iterations search in the first reference frame
and odd iterations search in the second. The last two
iterations use the search result of the first two
iterations as the start point. If the search result does
not change, the last two iterations are not necessary and can
be skipped.
Instruction Count
cpu-used Reduction(%)
0 1.411
Change-Id: Ie583c9f75dd0a22bbdfb432ccdd62eea6ec4fce8
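The early exit described above might look roughly like the sketch below. It is a simplified model under stated assumptions, not the actual joint_motion_search code: example_mv, example_search_fn and the two sample callbacks are hypothetical, and the real search also depends on the other reference's prediction.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  int row, col;
} example_mv;

/* Hypothetical per-reference search: returns the refined vector. */
typedef example_mv (*example_search_fn)(int ref, example_mv start);

/* Sample callback that never moves the vector. */
static example_mv example_search_identity(int ref, example_mv start) {
  (void)ref;
  return start;
}

/* Sample callback that moves one row closer to zero per call. */
static example_mv example_search_step(int ref, example_mv start) {
  (void)ref;
  if (start.row > 0) --start.row;
  return start;
}

/* Runs up to four iterations and returns how many were executed.
 * Even iterations refine mv[0], odd iterations refine mv[1]. */
static int example_joint_search(example_search_fn search, example_mv mv[2]) {
  int changed = 0;
  int runs = 0;
  int ite;
  assert(search != NULL);
  for (ite = 0; ite < 4; ++ite) {
    const int id = ite % 2;
    example_mv best;
    /* The first two iterations left both vectors untouched, so the last
     * two would start from the same point and can be skipped. */
    if (ite == 2 && !changed) break;
    best = search(id, mv[id]);
    ++runs;
    if (best.row != mv[id].row || best.col != mv[id].col) {
      mv[id] = best;
      changed = 1;
    }
  }
  return runs;
}
```

With example_search_identity only two iterations run; with example_search_step all four run.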
Jerome Jiang [Fri, 11 Nov 2022 19:21:27 +0000 (14:21 -0500)]
vp9 rc: Make it work for SVC parallel encoding
Added unit test.
Keep track of the spatial layer id and frame type in the case where spatial
layers are encoded in parallel by the hardware encoder.
ComputeQP() / PostEncodeUpdate() don't need to be called sequentially
when there is no inter-layer prediction.
Bug: b/257368998
Change-Id: I50beaefcfc205d3f9a9d3dbe11fead5bfdc71489
Jerome Jiang [Fri, 17 Feb 2023 02:11:31 +0000 (02:11 +0000)]
Merge "vp9 rc: Verify QP for all spatial layers" into main
Jerome Jiang [Thu, 16 Feb 2023 22:48:49 +0000 (17:48 -0500)]
vp9 rc: Verify QP for all spatial layers
Change-Id: Ic669c96d25d7c039d370e9acd00dc45e09054552
chiyotsai [Tue, 14 Feb 2023 22:29:29 +0000 (14:29 -0800)]
Relax frame recode tolerance on speed 0 to 1 above 480p
Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR | SSIM | ENC_T |
|---------|---------|----------|----------|---------|-------|
| 0 | hdres2 | -0.028% | +0.030% | -0.408% | -2.0% |
| 0 | lowres2 | +0.000% | +0.000% | +0.000% | +0.0% |
| 0 | midres2 | -0.138% | +0.042% | -0.427% | -2.5% |
|---------|---------|----------|----------|---------|-------|
| 1 | hdres2 | -0.032% | +0.018% | -0.342% | -1.1% |
| 1 | lowres2 | +0.000% | +0.000% | +0.000% | +0.0% |
| 1 | midres2 | +0.050% | +0.060% | -0.257% | -1.6% |
Rate Error:
| | | AVG_RC_ERROR | MAX_RC_ERROR |
| | |---------------------|---------------------|
| SPD_SET | TESTSET | BASE | TEST | BASE | TEST |
|---------|---------|----------|----------|----------|----------|
| 0 | hdres2 | 33.044% | 33.065% | 149.903% | 149.903% |
| 0 | midres2 | 59.632% | 59.566% | 79.091% | 79.249% |
|---------|---------|----------|----------|----------|----------|
| 1 | hdres2 | 33.050% | 33.057% | 151.278% | 151.278% |
| 1 | midres2 | 59.640% | 59.614% | 78.707% | 78.842% |
STATS_CHANGED
Change-Id: I5d09601fede3912d5173717ce9dd070df3a97ec8
chiyotsai [Tue, 14 Feb 2023 01:57:26 +0000 (17:57 -0800)]
Enable some more speed features on speed 0 to 2
Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR | SSIM | ENC_T |
|---------|---------|----------|----------|---------|-------|
| 0 | hdres2 | +0.034% | +0.030% | +0.033% | -3.7% |
| 0 | lowres2 | +0.012% | +0.017% | +0.044% | -2.1% |
| 0 | midres2 | +0.030% | +0.035% | +0.060% | -1.9% |
|---------|---------|----------|----------|---------|-------|
| 1 | hdres2 | +0.027% | +0.036% | +0.030% | -2.7% |
| 1 | lowres2 | -0.006% | -0.002% | +0.006% | -1.0% |
| 1 | midres2 | -0.006% | -0.012% | -0.010% | -1.0% |
|---------|---------|----------|----------|---------|-------|
| 2 | hdres2 | -0.006% | -0.001% | -0.020% | -2.4% |
| 2 | lowres2 | -0.010% | -0.015% | -0.001% | -0.9% |
| 2 | midres2 | +0.006% | -0.005% | +0.009% | -1.0% |
STATS_CHANGED
Change-Id: I1431ac07215bb844739a410697387b9aead82792
James Zern [Tue, 14 Feb 2023 02:46:51 +0000 (02:46 +0000)]
Merge changes Id74a6d9c,I5c31e0e9,Id5a2b2d9,I73182c97,I2f5916d5, ... into main
* changes:
Optimize vpx_highbd_comp_avg_pred_neon
Add Neon AvgPredTestHBD test suite
Specialize Neon high bitdepth avg subpel variance by filter value
Specialize Neon high bitdepth subpel variance by filter value
Refactor Neon high bitdepth avg subpel variance functions
Optimize Neon high bitdepth subpel variance functions
Salome Thirot [Fri, 10 Feb 2023 10:50:47 +0000 (10:50 +0000)]
Optimize vpx_highbd_comp_avg_pred_neon
Optimize the implementation of vpx_highbd_comp_avg_pred_neon by making
use of the URHADD instruction to compute the average.
Change-Id: Id74a6d9c33e89bc548c3c7ecace59af69051b4a7
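As a scalar model of what the commit above relies on: URHADD is an unsigned rounding halving add, so each lane computes (a + b + 1) >> 1 without overflowing the lane width. The helper names below are hypothetical and the loop is plain C rather than Neon intrinsics; it is a sketch of the arithmetic, not the libvpx implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one URHADD lane: rounded average computed in a wider
 * type so the intermediate sum cannot overflow 16 bits. */
static uint16_t example_rounding_avg_u16(uint16_t a, uint16_t b) {
  return (uint16_t)(((uint32_t)a + b + 1) >> 1);
}

/* Averages two high bitdepth predictions element by element. */
static void example_comp_avg_pred(uint16_t *comp, const uint16_t *pred,
                                  const uint16_t *ref, size_t n) {
  size_t i;
  assert(comp != NULL && pred != NULL && ref != NULL);
  for (i = 0; i < n; ++i) {
    comp[i] = example_rounding_avg_u16(pred[i], ref[i]);
  }
}
```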
Salome Thirot [Fri, 10 Feb 2023 10:29:24 +0000 (10:29 +0000)]
Add Neon AvgPredTestHBD test suite
Add test suite for vpx_highbd_comp_avg_pred_neon.
Change-Id: I5c31e0e990661ee3b8030bb517829c088fceae4d
Salome Thirot [Thu, 9 Feb 2023 16:45:01 +0000 (16:45 +0000)]
Specialize Neon high bitdepth avg subpel variance by filter value
Use the same specialization as for standard bitdepth. The rationale for
the specialization is as follows:
The optimal implementation of the bilinear interpolation depends on the
filter values being used. For both horizontal and vertical interpolation
this can simplify to just taking the source values, or averaging the
source and reference values - which can be computed more easily than a
bilinear interpolation with arbitrary filter values.
This patch introduces tests to find the optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes.
Change-Id: Id5a2b2d9fac6f878795a6ed9de2bc27d9e62d661
Salome Thirot [Thu, 9 Feb 2023 14:16:30 +0000 (14:16 +0000)]
Specialize Neon high bitdepth subpel variance by filter value
Use the same specialization as for standard bitdepth. The rationale for
the specialization is as follows:
The optimal implementation of the bilinear interpolation depends on the
filter values being used. For both horizontal and vertical interpolation
this can simplify to just taking the source values, or averaging the
source and reference values - which can be computed more easily than a
bilinear interpolation with arbitrary filter values.
This patch introduces tests to find the optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes.
Change-Id: I73182c979255f0332a274f2e5907df7f38c9eeb3
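The rationale above can be illustrated with a small scalar sketch. It assumes 1/8-pel bilinear taps that sum to 128 (7-bit precision), i.e. {128 - 16 * offset, 16 * offset}, and it models the "average" case on two adjacent samples; the function name and parameters are hypothetical, and the real code works on Neon vectors.

```c
#include <assert.h>
#include <stdint.h>

/* Interpolates between src[0] and src[step] for a sub-pixel offset 0..7.
 * Offsets 0 and 4 take cheaper paths that give the same result as the
 * general filter would. */
static uint16_t example_bilinear(const uint16_t *src, int step, int offset) {
  assert(offset >= 0 && offset < 8);
  if (offset == 0) {
    /* Taps {128, 0}: the output is just the source value. */
    return src[0];
  }
  if (offset == 4) {
    /* Taps {64, 64}: a rounded average of the two samples. */
    return (uint16_t)(((uint32_t)src[0] + src[step] + 1) >> 1);
  }
  {
    /* General case: arbitrary taps with rounding and a 7-bit shift. */
    const uint32_t f1 = (uint32_t)(16 * offset);
    const uint32_t f0 = 128 - f1;
    return (uint16_t)((src[0] * f0 + src[step] * f1 + 64) >> 7);
  }
}
```

Choosing the path once per block, based on the offset, is what lets the specialized variants skip the multiplies entirely.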
Salome Thirot [Wed, 8 Feb 2023 16:50:59 +0000 (16:50 +0000)]
Refactor Neon high bitdepth avg subpel variance functions
Use the same general code style as in the standard bitdepth Neon
implementation - merging the computation of vpx_highbd_comp_avg_pred
with the second pass of the bilinear filter to avoid storing and loading
the block again.
Also move vpx_highbd_comp_avg_pred_neon to its own file (like the
standard bitdepth implementation) since we're no longer using it for
averaging sub-pixel variance.
Change-Id: I2f5916d5b397db44b3247b478ef57046797dae6c
Salome Thirot [Tue, 7 Feb 2023 14:08:33 +0000 (14:08 +0000)]
Optimize Neon high bitdepth subpel variance functions
Use the same general code style as in the standard bitdepth Neon
implementation. Additionally, do not unnecessarily widen to 32-bit data
types when doing bilinear filtering - allowing us to process twice as
many elements per instruction.
Change-Id: I1e178991d2aa71f5f77a376e145d19257481e90f
James Zern [Sat, 11 Feb 2023 03:04:41 +0000 (19:04 -0800)]
README: update release version to 1.13.0
this was missed in the v1.13.0 tag
Bug: webm:1780
Change-Id: I3044534123bf67861174970e6241f6586055358e
Chi Yo Tsai [Fri, 10 Feb 2023 22:13:50 +0000 (22:13 +0000)]
Merge "Remove CONFIG_CONSISTENT_RECODE flag" into main
chiyotsai [Wed, 8 Feb 2023 21:54:46 +0000 (13:54 -0800)]
Remove CONFIG_CONSISTENT_RECODE flag
Currently, libvpx does not properly clear and re-initialize the memories
when it re-encodes a frame. As a result, out-of-date values are used in
the encoding process, and re-encoding a frame with the same parameters
will give different outputs.
This commit enables the code under CONFIG_CONSISTENT_RECODE to correct
this behavior. This change has a minor effect on the coding performance,
but it ensures valid values are used in the encoding process.
Furthermore, the flag is removed as it is now always turned on.
Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR | SSIM | ENC_T |
|---------|---------|----------|----------|---------|-------|
| 0 | hdres2 | -0.012% | -0.021% | -0.030% | +0.1% |
| 0 | lowres2 | +0.029% | +0.019% | +0.047% | +0.1% |
| 0 | midres2 | -0.004% | +0.009% | +0.026% | +0.1% |
|---------|---------|----------|----------|---------|-------|
| 1 | hdres2 | +0.032% | +0.032% | -0.000% | -0.0% |
| 1 | lowres2 | -0.005% | -0.011% | -0.014% | +0.0% |
| 1 | midres2 | +0.004% | +0.020% | +0.027% | +0.2% |
|---------|---------|----------|----------|---------|-------|
| 2 | hdres2 | +0.048% | +0.056% | +0.057% | +0.1% |
| 2 | lowres2 | +0.007% | +0.002% | -0.016% | -0.0% |
| 2 | midres2 | -0.015% | -0.008% | -0.002% | +0.1% |
|---------|---------|----------|----------|---------|-------|
| 3 | hdres2 | +0.010% | +0.014% | +0.004% | -0.0% |
| 3 | lowres2 | +0.000% | -0.021% | -0.001% | +0.0% |
| 3 | midres2 | +0.007% | -0.038% | +0.012% | -0.2% |
|---------|---------|----------|----------|---------|-------|
| 4 | hdres2 | +0.107% | +0.136% | +0.124% | -0.0% |
| 4 | lowres2 | -0.012% | -0.024% | -0.020% | -0.0% |
| 4 | midres2 | +0.055% | -0.004% | +0.048% | -0.1% |
|---------|---------|----------|----------|---------|-------|
| 5 | hdres2 | +0.026% | +0.027% | +0.020% | -0.0% |
| 5 | lowres2 | +0.009% | -0.008% | +0.028% | +0.1% |
| 5 | midres2 | -0.025% | +0.021% | -0.020% | -0.1% |
STATS_CHANGED
Change-Id: I3967aee8c8e4d0608a492e07f99ab8de9744ba57
James Zern [Fri, 10 Feb 2023 03:35:22 +0000 (03:35 +0000)]
Merge "Optimize Neon high bitdepth convolve copy" into main
Jerome Jiang [Thu, 9 Feb 2023 22:07:28 +0000 (22:07 +0000)]
Merge "Merge tag 'v1.13.0'" into main
Jerome Jiang [Thu, 9 Feb 2023 21:27:59 +0000 (21:27 +0000)]
Merge "Remove onyx_int.h from vp8 rc header" into main
Jerome Jiang [Tue, 7 Feb 2023 22:22:12 +0000 (17:22 -0500)]
Remove onyx_int.h from vp8 rc header
Also move the FRAME_TYPE declaration to common.h
Bug: webm:1766
Change-Id: Ic3016bd16548a5d2e0ae828a7fd7ad8adda8b8f6
Jerome Jiang [Thu, 9 Feb 2023 19:37:33 +0000 (14:37 -0500)]
Merge tag 'v1.13.0'
Release v1.13.0 Ugly Duckling
2023-01-31 v1.13.0 "Ugly Duckling"
This release includes more Neon and AVX2 optimizations, adds a new codec
control to set per frame QP, upgrades GoogleTest to v1.12.1, and includes
numerous bug fixes.
- Upgrading:
This release is ABI incompatible with the previous release.
New codec control VP9E_SET_QUANTIZER_ONE_PASS to set per frame QP.
GoogleTest is upgraded to v1.12.1.
.clang-format is upgraded to clang-format-11.
VPX_EXT_RATECTRL_ABI_VERSION was bumped due to incompatible changes to the
feature of using external rate control models for vp9.
- Enhancement:
Numerous improvements on Neon optimizations.
Numerous improvements on AVX2 optimizations.
Additional ARM targets added for Visual Studio.
- Bug fixes:
Fix to calculating internal stats when frame dropped.
Fix to segfault for external resize test in vp9.
Fix to build system with replacing egrep with grep -E.
Fix to a few bugs with external RTC rate control library.
Fix to make SVC work with VBR.
Fix to key frame setting in VP9 external RC.
Fix to -Wimplicit-int (Clang 16).
Fix to VP8 external RC for buffer levels.
Fix to VP8 external RC for dynamic update of layers.
Fix to VP9 auto level.
Fix to off-by-one error of max w/h in validate_config.
Fix to make SVC work for Profile 1.
Bug: webm:1780
Change-Id: I371fc1444ead56f8d7fc510e05582b6415c3ddb1
Jonathan Wright [Thu, 9 Feb 2023 11:57:10 +0000 (11:57 +0000)]
Optimize Neon high bitdepth convolve copy
Use standard loads and stores instead of the significantly slower
interleaving/de-interleaving variants. Also move all loads in loop
bodies above all stores as a mitigation against the compiler thinking
that the src and dst pointers alias (since we can't use restrict in
C89.)
Change-Id: Idd59dca51387f553f8db27144a2b8f2377c937d3
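The load/store ordering described above can be sketched in plain C. The function below is hypothetical and scalar (the real code uses Neon loads and stores); it only shows the pattern of reading everything for one loop iteration into locals before writing anything, so that no store sits between two loads.

```c
#include <assert.h>
#include <stdint.h>

/* Copies n elements (n must be a multiple of 4), four per iteration. */
static void example_copy_row(const uint16_t *src, uint16_t *dst, int n) {
  int i;
  assert(n >= 0 && n % 4 == 0);
  for (i = 0; i < n; i += 4) {
    /* All loads first: without restrict the compiler has to assume that a
     * store to dst might change a later load from src. */
    const uint16_t s0 = src[i + 0];
    const uint16_t s1 = src[i + 1];
    const uint16_t s2 = src[i + 2];
    const uint16_t s3 = src[i + 3];
    /* Then all stores. */
    dst[i + 0] = s0;
    dst[i + 1] = s1;
    dst[i + 2] = s2;
    dst[i + 3] = s3;
  }
}
```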
Chi Yo Tsai [Wed, 8 Feb 2023 23:16:48 +0000 (23:16 +0000)]
Merge "Copy BLOCK_8X8's mi to PICK_MODE_CONTEXT::mic" into main
chiyotsai [Wed, 8 Feb 2023 22:01:19 +0000 (14:01 -0800)]
Copy BLOCK_8X8's mi to PICK_MODE_CONTEXT::mic
STATS_CHANGED
BUG=webm:1789
Change-Id: I74efe28bdf90a179c59fe3d1f5a15d497f57080d
Salome Thirot [Wed, 8 Feb 2023 17:05:25 +0000 (17:05 +0000)]
Add missing high bitdepth Neon subpel variance tests
Add missing 4x4 and 4x8 tests for both high bitdepth sub-pixel variance
and high bitdepth averaging sub-pixel variance.
Change-Id: I042752c5b7ccc14f58075694d0bb1d36f144ad06
Cheng Chen [Mon, 30 Jan 2023 19:51:58 +0000 (11:51 -0800)]
Fix unsigned integer overflow in sse computation
Basically port the fix from libaom:
https://aomedia-review.googlesource.com/c/aom/+/169361
Change-Id: Id06a5db91372037832399200ded75d514e096726
(cherry picked from commit a94cdd57ffd95ee7beb48d2794dae538f25da46c)
Chi Yo Tsai [Wed, 8 Feb 2023 00:44:46 +0000 (00:44 +0000)]
Merge "Enable some speed features on speed 0" into main
chiyotsai [Tue, 7 Feb 2023 19:11:35 +0000 (11:11 -0800)]
Enable some speed features on speed 0
Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR | SSIM | ENC_T |
|---------|---------|----------|----------|---------|-------|
| 0 | hdres2 | +0.069% | +0.067% | +0.100% | -8.6% |
| 0 | midres2 | +0.116% | +0.103% | +0.062% | -9.6% |
| 0 | lowres2 | +0.276% | +0.283% | +0.214% |-11.9% |
STATS_CHANGED
Change-Id: I8b26c0be2312fcd0f8c9e889367682e80ea8de4b
Salome Thirot [Tue, 7 Feb 2023 11:28:15 +0000 (11:28 +0000)]
Use 4D reduction Neon helper for standard bitdepth SAD4D
Move the 4D reduction helper function to sum_neon.h and use this for
both standard and high bitdepth SAD4D paths. This also removes the
AArch64 requirement for using the UDOT Neon SAD4D paths.
Change-Id: I207f76b3d42aa541809b0672c3b3d86e54d133ff
Yunqing Wang [Tue, 7 Feb 2023 04:22:40 +0000 (04:22 +0000)]
Merge "Move TPL to a new file" into main
James Zern [Tue, 7 Feb 2023 01:32:03 +0000 (01:32 +0000)]
Merge changes Ica45c44f,I75c5f099,I9e626d7f into main
* changes:
Optimize Neon implementation of high bitdepth SAD4D functions
Optimize Neon implementation of high bitdepth avg SAD functions
Optimize Neon implementation of high bitdepth SAD functions
Yunqing Wang [Mon, 6 Feb 2023 22:48:34 +0000 (14:48 -0800)]
Move TPL to a new file
This is a refactoring CL.
Change-Id: Ic8c1575601d27f14ecd1b1bf0a038e447eaae458
Jerome Jiang [Mon, 6 Feb 2023 22:16:41 +0000 (22:16 +0000)]
Merge "Remove duplicated VPX_SCALING declaration" into main
Salome Thirot [Thu, 2 Feb 2023 16:06:38 +0000 (16:06 +0000)]
Optimize Neon implementation of high bitdepth SAD4D functions
Optimizations take a similar form to those implemented for Armv8.0
standard bitdepth SAD4D:
- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
resources on Arm CPUs that have four Neon pipes.
- Compute the four SAD sums in parallel so that we only load the source
block once - instead of four times.
Change-Id: Ica45c44fd167e5fcc83871d8c138fc72ed3a9723
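The last point in the list above, computing the four sums in parallel so the source block is loaded once, can be modeled in scalar C as below. The function name and signature are hypothetical; the ABD/UADALP instruction choice and the extra accumulator registers are Neon details that this sketch does not attempt to show.

```c
#include <assert.h>
#include <stdint.h>

/* Computes the SAD of src against four reference rows in a single pass. */
static void example_sad4d(const uint16_t *src, const uint16_t *const ref[4],
                          int n, uint32_t sad[4]) {
  uint32_t acc[4] = { 0, 0, 0, 0 };
  int i, j;
  assert(n >= 0);
  for (i = 0; i < n; ++i) {
    /* Loaded once and reused for all four references. */
    const int s = src[i];
    for (j = 0; j < 4; ++j) {
      const int d = s - ref[j][i];
      acc[j] += (uint32_t)(d < 0 ? -d : d);
    }
  }
  for (j = 0; j < 4; ++j) sad[j] = acc[j];
}
```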
Jerome Jiang [Mon, 6 Feb 2023 18:29:58 +0000 (13:29 -0500)]
Remove duplicated VPX_SCALING declaration
Use VPX_SCALING_MODE instead
Change-Id: Iab9d29f20838703e00bd9f7641035d8ebd69af53
Salome Thirot [Fri, 3 Feb 2023 11:00:19 +0000 (11:00 +0000)]
Optimize Neon implementation of high bitdepth avg SAD functions
Optimizations take a similar form to those implemented for standard
bitdepth averaging SAD:
- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
resources on Arm CPUs that have four Neon pipes.
Change-Id: I75c5f09948f6bf17200f82e00e7a827a80451108
Salome Thirot [Wed, 1 Feb 2023 16:37:24 +0000 (16:37 +0000)]
Optimize Neon implementation of high bitdepth SAD functions
Optimizations take a similar form to those implemented for standard
bitdepth SAD:
- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
resources on Arm CPUs that have four Neon pipes.
Change-Id: I9e626d7fa0e271908dc43448405a7985b80e6230
Yunqing Wang [Fri, 3 Feb 2023 23:22:58 +0000 (23:22 +0000)]
Merge "Fix uninitialized mesh feature for BEST mode" into main
Wan-Teh Chang [Fri, 3 Feb 2023 22:07:09 +0000 (14:07 -0800)]
Set _img->bit_depth in y4m_input_fetch_frame()
This is a port of
https://aomedia-review.googlesource.com/c/aom/+/169961.
Change-Id: I2aa0d12cafde0c73448bf8c57eab0cd92e846468
Yunqing Wang [Fri, 3 Feb 2023 00:30:09 +0000 (16:30 -0800)]
Fix uninitialized mesh feature for BEST mode
In BEST encoding mode, the mesh search range wasn't initialized for
non-FC_GRAPHICS_ANIMATION content types, so the encoder mistakenly
used speed 0's setting. Fixed it by adding the initialization.
There were 2 ways to fix this. Patchset 1 uses speed 0's setting
for non-FC_GRAPHICS_ANIMATION types. This didn't change BEST mode's
encoding results much; only a couple of clips' results changed.
Borg result for BEST mode:
avg_psnr: ovr_psnr: ssim: encoding_spdup:
lowres2: -0.004 -0.003 -0.000 0.030
midres2: -0.006 -0.009 -0.012 0.033
hdres2: 0.002 0.002 0.004 0.015
Patchset 2 uses BEST's setting for non-FC_GRAPHICS_ANIMATION types.
However, the BD-rate of the majority of test clips changed by up to
~0.5% (gain or loss), and overall it didn't give better performance
than patchset 1. So, we chose to use patchset 1.
Change-Id: Ibbf578dad04420e6ba22cb9a3ddec137a7e4deef
James Zern [Wed, 1 Feb 2023 21:27:06 +0000 (13:27 -0800)]
vp9_diamond_search_sad_neon: use DECLARE_ALIGNED
rather than the gcc specific __attribute__((aligned())); fixes build
targeting ARM64 windows.
Bug: webm:1788
Change-Id: I2210fc215f44d90c1ce9dee9b54888eb1b78c99e
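A portable alignment macro of the kind this commit switches to might look like the sketch below. EXAMPLE_DECLARE_ALIGNED and example_table_is_aligned are hypothetical names used for illustration; the sketch assumes that MSVC-style compilers define _MSC_VER and accept __declspec(align(n)), and that other compilers accept the GCC attribute syntax.

```c
#include <assert.h>
#include <stdint.h>

/* Pick the alignment spelling that the compiler understands. */
#if defined(_MSC_VER)
#define EXAMPLE_DECLARE_ALIGNED(n, typ, val) __declspec(align(n)) typ val
#else
#define EXAMPLE_DECLARE_ALIGNED(n, typ, val) typ val __attribute__((aligned(n)))
#endif

/* Declares a 16-byte aligned table and reports whether its address
 * really is a multiple of 16. */
static int example_table_is_aligned(void) {
  EXAMPLE_DECLARE_ALIGNED(16, static const int32_t,
                          example_table[4]) = { 0, 1, 2, 3 };
  assert(sizeof(example_table) == 16);
  return ((uintptr_t)example_table % 16) == 0;
}
```

Hiding the compiler-specific spelling behind one macro keeps the call sites identical across toolchains.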
Jerome Jiang [Wed, 1 Feb 2023 16:38:42 +0000 (11:38 -0500)]
Update AUTHORS .mailmap and version
Bug: webm:1780
Change-Id: I75a24bdd076dc1746b23bababfaafccbce3b4214
Jerome Jiang [Thu, 26 Jan 2023 00:25:12 +0000 (19:25 -0500)]
Fix per frame qp for temporal layers
Also add tests with fixed temporal layering mode.
Change-Id: If516fe94e3fb7f5a745821d1788bfe6cf90edaac
(cherry picked from commit db69ce6aea278bee88668fd9cc2af2e544516fdb)
Jerome Jiang [Tue, 31 Jan 2023 17:16:38 +0000 (12:16 -0500)]
Update CHANGELOG
Bug: webm:1780
Change-Id: I3ab4729bff1d27ef7127ef26e780a469e9278c21
James Zern [Tue, 31 Jan 2023 21:20:16 +0000 (21:20 +0000)]
Merge "Use load_unaligned mem_neon.h helpers in SAD and SAD4D" into main
Jonathan Wright [Tue, 31 Jan 2023 13:32:33 +0000 (13:32 +0000)]
Use load_unaligned mem_neon.h helpers in SAD and SAD4D
Use the load_unaligned helper functions in mem_neon.h to load strided
sequences of 4 bytes where alignment is not guaranteed in the Neon
SAD and SAD4D paths.
Change-Id: I941d226ef94fd7a633b09fc92165a00ba68a1501
Cheng Chen [Mon, 30 Jan 2023 19:51:58 +0000 (11:51 -0800)]
Fix unsigned integer overflow in sse computation
Basically port the fix from libaom:
https://aomedia-review.googlesource.com/c/aom/+/169361
Change-Id: Id06a5db91372037832399200ded75d514e096726
James Zern [Mon, 30 Jan 2023 19:30:45 +0000 (19:30 +0000)]
Merge "Refactor 8x8 16-bit Neon transpose functions" into main
Salome Thirot [Fri, 27 Jan 2023 16:16:16 +0000 (16:16 +0000)]
Refactor Neon implementation of SAD4D functions
Refactor and optimize the Neon implementation of SAD4D functions -
effectively backporting these libaom changes[1,2].
[1] https://aomedia-review.googlesource.com/c/aom/+/162181
[2] https://aomedia-review.googlesource.com/c/aom/+/162183
Change-Id: Icb04bd841d86f2d0e2596aa7ba86b74f8d2d360b
Yunqing Wang [Sat, 28 Jan 2023 00:27:57 +0000 (00:27 +0000)]
Merge "Add encoder component timing information" into main
Yunqing Wang [Fri, 27 Jan 2023 01:20:54 +0000 (17:20 -0800)]
Add encoder component timing information
Change-Id: Iaa5b73a9593ecfd74b6426ed47d2b529ec7ae2b5
Gerda Zsejke More [Thu, 26 Jan 2023 15:12:55 +0000 (16:12 +0100)]
Refactor 8x8 16-bit Neon transpose functions
Refactor the Neon implementation of transpose_s16_8x8(q) and
transpose_u16_8x8 so that the final step compiles to 8 ZIP1/ZIP2
instructions as opposed to 8 EXT, MOV pairs. This change removes 8
instructions per call to transpose_s16_8x8(q), transpose_u16_8x8
where the result stays in registers for further processing - rather
than being stored to memory - like in vpx_hadamard_8x8_neon, for
example.
This is a backport of this libaom patch[1].
[1] https://aomedia-review.googlesource.com/c/aom/+/169426
Change-Id: Icef3e51d40efeca7008e1c4fc701bf39bd319c88
Jerome Jiang [Thu, 26 Jan 2023 21:31:14 +0000 (21:31 +0000)]
Merge "Fix per frame qp for temporal layers" into main
Jerome Jiang [Thu, 26 Jan 2023 00:25:12 +0000 (19:25 -0500)]
Fix per frame qp for temporal layers
Also add tests with fixed temporal layering mode.
Change-Id: If516fe94e3fb7f5a745821d1788bfe6cf90edaac
James Zern [Thu, 26 Jan 2023 03:26:38 +0000 (03:26 +0000)]
Merge "Refactor Neon implementation of SAD functions" into main
James Zern [Thu, 26 Jan 2023 03:23:31 +0000 (03:23 +0000)]
Merge "[NEON] Add Highbd FHT 8x8/16x16 functions" into main
Salome Thirot [Tue, 24 Jan 2023 14:27:14 +0000 (14:27 +0000)]
Refactor Neon implementation of SAD functions
Refactor and optimize the Neon implementation of SAD functions -
effectively backporting these libaom changes[1,2,3].
[1] https://aomedia-review.googlesource.com/c/aom/+/161921
[2] https://aomedia-review.googlesource.com/c/aom/+/161923
[3] https://aomedia-review.googlesource.com/c/aom/+/166963
Change-Id: I2d72fd0f27d61a3e31a78acd33172e2afb044cb8
Konstantinos Margaritis [Tue, 24 Jan 2023 20:48:06 +0000 (20:48 +0000)]
[NEON] Add Highbd FHT 8x8/16x16 functions
In total this gives about 9% extra performance for both rt/best
profiles.
Furthermore, add transpose_s32 16x16 function
Change-Id: Ib6f368bbb9af7f03c9ce0deba1664cef77632fe2
Jerome Jiang [Tue, 24 Jan 2023 19:08:17 +0000 (14:08 -0500)]
Skip calculating internal stats when frame dropped
Bug: webm:1771
Change-Id: I30cd5b7ec0945b521a1cc03999d39ec6a25f1696
Salome Thirot [Fri, 20 Jan 2023 11:42:06 +0000 (11:42 +0000)]
Specialize Neon averaging subpel variance by filter value
Use the same specialization for averaging subpel variance functions
as used for the non-averaging variants. The rationale for the
specialization is as follows:
The optimal implementation of the bilinear interpolation depends on
the filter values being used. For both horizontal and vertical
interpolation this can simplify to just taking the source values, or
averaging the source and reference values - which can be computed
more easily than a bilinear interpolation with arbitrary filter
values.
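This patch introduces tests to find the optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes
(>= 16x16) as we need to be doing enough work to make the cost of
finding the optimal implementation worth it.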
This is a backport of this libaom change[1].
After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].
[1] https://aomedia-review.googlesource.com/c/aom/+/166962
Change-Id: I7860c852db94a7c9c3d72ae4411316685f3800a4
Salome Thirot [Fri, 20 Jan 2023 11:21:02 +0000 (11:21 +0000)]
Refactor Neon averaging subpel variance functions
Merge the computation of vpx_comp_avg_pred into the second pass of the
bilinear filter - avoiding the overhead of loading and storing the
entire block again.
This is a backport of this libaom change[1].
[1] https://aomedia-review.googlesource.com/c/aom/+/166961
Change-Id: I9327ff7382a46d50c42a5213a11379b957146372
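The fusion described above can be sketched in scalar C as follows. This
is an illustration only, not the Neon code from the patch. The function
name bilinear_vert_avg, the packed w-wide layout of second_pred and dst,
and the convention that offset is a subpel position in eighths (0..7)
are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Vertical bilinear pass fused with the compound average.
 * src must provide h + 1 rows of at least w pixels, `stride` apart.
 * second_pred and dst are packed, w pixels per row (assumed layout). */
void bilinear_vert_avg(const uint8_t *src, int stride, int offset,
                       const uint8_t *second_pred, uint8_t *dst, int w,
                       int h) {
  int r, c;
  assert(offset >= 0 && offset < 8);
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      const int a = src[r * stride + c];
      const int b = src[(r + 1) * stride + c];
      /* Bilinear blend of the two vertical taps, rounded. */
      const int filtered = (a * (8 - offset) + b * offset + 4) >> 3;
      /* Rounding average with the second predictor, done while the
       * filtered value is still at hand instead of in a separate pass. */
      dst[r * w + c] = (uint8_t)((filtered + second_pred[r * w + c] + 1) >> 1);
    }
  }
}
```

Doing the average inside the second pass means the filtered block is
never written out and read back just to be averaged.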
Salome Thirot [Fri, 20 Jan 2023 10:35:34 +0000 (10:35 +0000)]
Specialize Neon subpel variance by filter value for large blocks
The optimal implementation of the bilinear interpolation depends on
the filter values being used. For both horizontal and vertical
interpolation this can simplify to just taking the source values, or
averaging the source and reference values - which can be computed
more easily than a bilinear interpolation with arbitrary filter
values.
This patch introduces tests to find the optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes
(>= 16x16) as we need to be doing enough work to make the cost of
finding the optimal implementation worth it.
This is a backport of this libaom change[1].
After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].
[1] https://aomedia-review.googlesource.com/c/aom/+/162463
Change-Id: Ia818e148f6fd126656e8411d59c184b55dd43094
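A minimal scalar sketch of the specialization idea described above,
assuming the subpel offset is expressed in eighths (0..7) so that the two
taps are (8 - offset) and offset. The function name bilinear_pass and its
signature are hypothetical and do not come from the patch. The point is
that offsets 0 and 4 reduce to a copy and a rounding average of the two
taps, and both give exactly the same result as the general formula.

```c
#include <assert.h>
#include <stdint.h>

/* One bilinear pass over n pixels. `step` is the distance between the two
 * taps (1 for a horizontal pass, the row stride for a vertical pass).
 * src must be readable up to index n - 1 + step. */
void bilinear_pass(const uint8_t *src, int step, int offset, uint8_t *dst,
                   int n) {
  int i;
  assert(offset >= 0 && offset < 8);
  if (offset == 0) {
    /* Taps are (8, 0): the output is just the source. */
    for (i = 0; i < n; ++i) dst[i] = src[i];
  } else if (offset == 4) {
    /* Taps are (4, 4): a rounding average of the two taps. */
    for (i = 0; i < n; ++i) {
      dst[i] = (uint8_t)((src[i] + src[i + step] + 1) >> 1);
    }
  } else {
    /* General case: weighted blend with rounding. */
    for (i = 0; i < n; ++i) {
      dst[i] = (uint8_t)(
          (src[i] * (8 - offset) + src[i + step] * offset + 4) >> 3);
    }
  }
}
```

The branch on offset is taken once per call, which is why this kind of
dispatch only pays off when each call processes enough pixels.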
Salome Thirot [Thu, 19 Jan 2023 18:02:52 +0000 (18:02 +0000)]
Refactor Neon subpel variance functions
Refactor the Neon implementation of the sub-pixel variance bilinear
filter helper functions - effectively backporting this libaom patch[1].
[1] https://aomedia-review.googlesource.com/c/aom/+/162462
Change-Id: I3dee32e8125250bbeffeb63d1fef5da559bacbf1
Jerome Jiang [Fri, 20 Jan 2023 17:14:04 +0000 (17:14 +0000)]
Merge "Add codec control to set per frame QP" into main
Jerome Jiang [Thu, 12 Jan 2023 20:58:00 +0000 (15:58 -0500)]
Add codec control to set per frame QP
The use case is 1-pass encoding.
Forces max_quantizer = min_quantizer and aq-mode = 0.
Applicable to spatial layers, where the user may set
the QP per spatial layer.
Change-Id: Idfcb7daefde94c475ed1bc0eb8af47c9f309110b
James Zern [Thu, 19 Jan 2023 19:44:43 +0000 (19:44 +0000)]
Merge "Refactor Neon implementation of variance functions" into main
James Zern [Thu, 19 Jan 2023 03:19:01 +0000 (19:19 -0800)]
*/Android.mk: add a check for NDK_ROOT
This simplifies integration with the Android platform and prevents the
files from being used when a non-NDK build is performed. In that case
Android.bp is preferred.
Change-Id: I803912146dac788b7f0af27199c7613cabbc9fa0
Salome Thirot [Mon, 16 Jan 2023 16:44:04 +0000 (16:44 +0000)]
Refactor Neon implementation of variance functions
Refactor and optimize the Neon implementation of variance functions -
effectively backporting these libaom changes[1,2].
After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].
[1] https://aomedia-review.googlesource.com/c/aom/+/162241
[2] https://aomedia-review.googlesource.com/c/aom/+/162262
Change-Id: Ia4e8fff4d53297511d1a1e43bca8053bf811e551
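To illustrate the ISO C90 constraint mentioned above, here is a small
hypothetical helper (sse_c90 is not a libvpx function) written so that it
would not trigger -Wdeclaration-after-statement: every declaration sits
at the start of its enclosing block, before the first statement.

```c
#include <assert.h>

/* Sum of squared differences over n bytes, in C90-compatible style. */
unsigned int sse_c90(const unsigned char *a, const unsigned char *b, int n) {
  /* All declarations for this block come first. */
  unsigned int sse = 0;
  int i;
  /* First statement; declaring a new variable after this line in the same
   * block would be rejected under C90. */
  assert(n >= 0);
  for (i = 0; i < n; ++i) {
    /* Allowed: this declaration opens a new nested block. */
    const int diff = a[i] - b[i];
    sse += (unsigned int)(diff * diff);
  }
  return sse;
}
```

Code that declares variables at the point of first use has to be
reshuffled along these lines when built as C90, without changing what it
computes.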
Marco Paniconi [Wed, 18 Jan 2023 02:04:18 +0000 (02:04 +0000)]
Merge "Fix to segfault for external resize test in vp9" into main