Jan Zielinski [Fri, 2 Aug 2019 09:59:03 +0000 (11:59 +0200)]
swr/rasterizer: Fix GS attributes processing
Input to GS is just a set of attributes, so remove explicit setup of
'position' which is meaningless for GS input processing.
Reviewed-by: Alok Hota <alok.hota@intel.com>
Samuel Pitoiset [Wed, 28 Aug 2019 15:08:29 +0000 (17:08 +0200)]
radv: keep a pointer to a NIR shader into radv_shader_context
This avoids multiple copies for nothing and it's more elegant.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Wed, 28 Aug 2019 14:52:30 +0000 (16:52 +0200)]
radv: move setting can_discard to ac_fill_shader_info()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Thu, 29 Aug 2019 11:32:10 +0000 (13:32 +0200)]
radv: replace ac_nir_build_if by ac_build_ifcc
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Thu, 29 Aug 2019 09:49:03 +0000 (11:49 +0200)]
radv: remove radv_init_llvm_target() helper
RADV no longer uses specific LLVM options compared to the common code.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Thu, 29 Aug 2019 09:46:46 +0000 (11:46 +0200)]
radv: remove useless ac_llvm_util.h include from the WSI code
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Fri, 26 Jul 2019 12:48:23 +0000 (14:48 +0200)]
radv: remove unused shader_info parameter in ac_compile_llvm_module()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Wed, 28 Aug 2019 14:46:15 +0000 (16:46 +0200)]
radv: remove some unused fields from radv_shader_context
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Thu, 29 Aug 2019 09:16:44 +0000 (11:16 +0200)]
radv: move lowering PS inputs/outputs at the right place
At shaders creation, just after NIR linking.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Thu, 29 Aug 2019 09:12:25 +0000 (11:12 +0200)]
radv: gather info about PS inputs in the shader info pass
It's the right place to do that.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Wed, 31 Jul 2019 07:57:47 +0000 (09:57 +0200)]
ac: drop now useless lookup_interp_param from ABI
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Wed, 31 Jul 2019 07:54:48 +0000 (09:54 +0200)]
ac: import linear/perspective PS input parameters from radv/radeonsi
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Krzysztof Raszkowski [Fri, 30 Aug 2019 05:50:21 +0000 (05:50 +0000)]
util: Add unreachable() definition for clang compiler.
Without unreachable() definition clang throw return-type error
in many places in mesa code.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Nataraj Deshpande [Wed, 28 Aug 2019 21:18:43 +0000 (14:18 -0700)]
egl/android: Enable HAL_PIXEL_FORMAT_RGBA_FP16 format
The patch adds support for 64 bit HAL_PIXEL_FORMAT_RGBA_FP16
for android platform.
Fixes android.graphics.cts.BitmapColorSpaceTest#test16bitHardware
which failed in egl due to "Unsupported native buffer format 0x16"
on chromebooks.
Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Dave Airlie [Thu, 29 Aug 2019 19:50:26 +0000 (05:50 +1000)]
gallivm: disable accurate cube corner for integer textures.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111511
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Pierre-Eric Pelloux-Prayer [Wed, 28 Aug 2019 08:56:52 +0000 (10:56 +0200)]
glsl: replace 'x + (-x)' with constant 0
This fixes a hang in shadertoy for radeonsi where a buffer was initialized with:
value -= value
with value being undefined.
In this case LLVM replace the operation with an assignment to NaN.
Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111241
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Thong Thai [Mon, 19 Aug 2019 18:31:08 +0000 (14:31 -0400)]
radeonsi: add JPEG decode support for VCN 2.0 devices
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Thong Thai [Wed, 28 Aug 2019 21:02:26 +0000 (17:02 -0400)]
Revert "radeonsi: don't emit PKT3_CONTEXT_CONTROL on amdgpu"
This reverts commit
5a2e65be89d97ed5d7263f0296ea69ae8517187b.
Even though CONTEXT_CONTROL is emitted by the kernel, CONTEXT_CONTROL
still needs to be emitted by the UMD, or else the driver will hang
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Ian Romanick [Mon, 12 Aug 2019 22:40:20 +0000 (15:40 -0700)]
nir/range-analysis: Add a lot more assertions about the contents of tables
v2: Update several of the comments. Drop some redundant uses of
ASSERT_UNION_OF_OTHERS_MATCHES_UNKNOWN_*_SOURCE source. Suggested by
Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Ian Romanick [Fri, 9 Aug 2019 19:48:27 +0000 (12:48 -0700)]
nir/range-analysis: Range tracking for fpow
One shader from Metro Last Light and the rest from Rochard. In the
Rochard cases, something like:
min(1.0, max(pow(saturate(x), y), z))
was transformed to
saturate(max(pow(saturate(x), y), z))
because the result of the pow must be >= 0.
The Metro Last Light case was similar. An instance of
min(pow(abs(x), y), 1.0)
became
saturate(pow(abs(x), y))
v2: Fix some comments. Suggested by Caio.
v3: Fix setting is_intgral when the exponent might be negative. See
also Mesa MR !1778.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs:
16280670 ->
16280659 (<.01%)
instructions in affected programs: 1130 -> 1119 (-0.97%)
helped: 11
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.72% max: 1.43% x̄: 1.03% x̃: 0.97%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.19% -0.86%
Instructions are helped.
total cycles in shared programs:
367168430 ->
367168270 (<.01%)
cycles in affected programs: 10281 -> 10121 (-1.56%)
helped: 10
HURT: 1
helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
helped stats (rel) min: 1.31% max: 2.43% x̄: 1.79% x̃: 1.70%
HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel) min: 3.10% max: 3.10% x̄: 3.10% x̃: 3.10%
95% mean confidence interval for cycles value: -20.06 -9.04
95% mean confidence interval for cycles %-change: -2.36% -0.32%
Cycles are helped.
Ian Romanick [Tue, 13 Aug 2019 01:44:56 +0000 (18:44 -0700)]
nir/range-analysis: Handle constants in nir_op_mov just like nir_op_bcsel
I discovered this while looking at a shader that was hurt by some other
work I'm doing. When I examined the changes, I was confused that one
instance of a comparison that was used in a discard_if was (incorrectly)
eliminated, while another instance used by a bcsel was (correctly) not
eliminated. I had to use NIR_PRINT=true to see exactly where things
when wrong.
A bunch of shaders in Goat Simulator, Dungeon Defenders, Sanctum 2, and
Strike Suit Zero were impacted.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes:
405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs:
16280659 ->
16281075 (<.01%)
instructions in affected programs: 21042 -> 21458 (1.98%)
helped: 0
HURT: 136
HURT stats (abs) min: 1 max: 9 x̄: 3.06 x̃: 3
HURT stats (rel) min: 1.16% max: 6.12% x̄: 2.23% x̃: 2.03%
95% mean confidence interval for instructions value: 2.93 3.19
95% mean confidence interval for instructions %-change: 2.08% 2.37%
Instructions are HURT.
total cycles in shared programs:
367168270 ->
367170313 (<.01%)
cycles in affected programs: 172020 -> 174063 (1.19%)
helped: 14
HURT: 111
helped stats (abs) min: 2 max: 80 x̄: 21.21 x̃: 9
helped stats (rel) min: 0.10% max: 4.47% x̄: 1.35% x̃: 0.79%
HURT stats (abs) min: 2 max: 584 x̄: 21.08 x̃: 5
HURT stats (rel) min: 0.12% max: 17.28% x̄: 1.55% x̃: 0.40%
95% mean confidence interval for cycles value: 5.41 27.28
95% mean confidence interval for cycles %-change: 0.64% 1.81%
Cycles are HURT.
Ian Romanick [Mon, 12 Aug 2019 19:08:40 +0000 (12:08 -0700)]
nir/range-analysis: Fix incorrect fadd range result for (ne_zero, ne_zero)
Found by inspection. I tried really, really hard to make a test case
that would trigger this problem, but I was unsuccesful. It's very hard
to get an instruction to produce a ne_zero result without ne_zero
sources. The most plausible way is using bcsel. That proves
problematic because bcsel interprets its sources as integers, so it
cannot currently be used to "clean" values for floating point
instructions.
No shader-db changes on any Intel platform.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes:
405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
Ian Romanick [Fri, 9 Aug 2019 17:55:49 +0000 (10:55 -0700)]
nir/range-analysis: Adjust result range of multiplication to account for flush-to-zero
Fixes piglit tests (new in piglit!110):
- fs-underflow-fma-compare-zero.shader_test
- fs-underflow-mul-compare-zero.shader_test
v2: Add back part of comment accidentally deleted. Noticed by
Caio. Remove is_not_zero function as it is no longer used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
Fixes:
fa116ce357b ("nir/range-analysis: Range tracking for ffma and flrp")
Fixes:
405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
All Gen7+ platforms** had similar results. (Ice Lake shown)
total instructions in shared programs:
16278465 ->
16279492 (<.01%)
instructions in affected programs: 16765 -> 17792 (6.13%)
helped: 0
HURT: 23
HURT stats (abs) min: 7 max: 275 x̄: 44.65 x̃: 8
HURT stats (rel) min: 1.15% max: 17.51% x̄: 4.23% x̃: 1.62%
95% mean confidence interval for instructions value: 9.57 79.74
95% mean confidence interval for instructions %-change: 1.85% 6.61%
Instructions are HURT.
total cycles in shared programs:
367135159 ->
367154270 (<.01%)
cycles in affected programs: 279306 -> 298417 (6.84%)
helped: 0
HURT: 23
HURT stats (abs) min: 13 max: 6029 x̄: 830.91 x̃: 54
HURT stats (rel) min: 0.17% max: 45.67% x̄: 7.33% x̃: 0.49%
95% mean confidence interval for cycles value: 100.89 1560.94
95% mean confidence interval for cycles %-change: 0.94% 13.71%
Cycles are HURT.
total spills in shared programs: 8870 -> 8869 (-0.01%)
spills in affected programs: 19 -> 18 (-5.26%)
helped: 1
HURT: 0
total fills in shared programs: 21904 -> 21901 (-0.01%)
fills in affected programs: 81 -> 78 (-3.70%)
helped: 1
HURT: 0
LOST: 0
GAINED: 1
** On Broadwell, a shader was hurt for spills / fills instead of
helped.
No changes on any earlier platforms.
Ian Romanick [Wed, 7 Aug 2019 15:56:22 +0000 (08:56 -0700)]
nir/range-analysis: Adjust result range of exp2 to account for flush-to-zero
Fixes piglit tests (new in piglit!110):
- fs-underflow-exp2-compare-zero.shader_test
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
Fixes:
405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Most of the shaders affected are, unsurprisingly, in Unigine Heaven.
All Gen6+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs:
16278207 ->
16278465 (<.01%)
instructions in affected programs: 11374 -> 11632 (2.27%)
helped: 0
HURT: 58
HURT stats (abs) min: 2 max: 13 x̄: 4.45 x̃: 4
HURT stats (rel) min: 0.54% max: 4.11% x̄: 2.42% x̃: 2.82%
95% mean confidence interval for instructions value: 3.77 5.13
95% mean confidence interval for instructions %-change: 2.19% 2.64%
Instructions are HURT.
total cycles in shared programs:
367134284 ->
367135159 (<.01%)
cycles in affected programs: 81207 -> 82082 (1.08%)
helped: 17
HURT: 36
helped stats (abs) min: 6 max: 356 x̄: 90.35 x̃: 6
helped stats (rel) min: 0.69% max: 21.45% x̄: 5.71% x̃: 0.78%
HURT stats (abs) min: 4 max: 235 x̄: 66.97 x̃: 16
HURT stats (rel) min: 0.35% max: 27.58% x̄: 5.34% x̃: 1.09%
95% mean confidence interval for cycles value: -20.36 53.38
95% mean confidence interval for cycles %-change: -1.08% 4.67%
Inconclusive result (value mean confidence interval includes 0).
No changes on any earlier platforms.
Ian Romanick [Thu, 8 Aug 2019 23:48:14 +0000 (16:48 -0700)]
nir/algebraic: Clean up value range analysis-based optimizations
Fix the a / b ordering in some compares. Delete duplicate patterns.
Add a table explaining things. While I was cleaning this up, I managed
to confuse myself. The table helped sort that out.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Ian Romanick [Wed, 7 Aug 2019 15:54:04 +0000 (08:54 -0700)]
nir/algebraic: Mark some value range analysis-based optimizations imprecise
This didn't fix bug #111308, but it was found will trying to find the
actual cause of that bug.
Fixes piglit tests (new in piglit!110):
- fs-fract-of-NaN.shader_test
- fs-lt-nan-tautology.shader_test
- fs-ge-nan-tautology.shader_test
No shader-db changes on any Intel platform.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
Fixes:
b77070e293c ("nir/algebraic: Use value range analysis to eliminate tautological compares")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Kenneth Graunke [Thu, 29 Aug 2019 01:05:57 +0000 (18:05 -0700)]
iris: Fix partial fast clear checks to account for miplevel.
We enabled fast clears at level > 0, but didn't minify the dimensions
when comparing the box size, so we always thought it was a partial
clear and as a result never actually enabled any.
This eliminates some slow clears in Civilization VI, but they are mostly
during initialization and not the main rendering.
Thanks to Dan Walsh for noticing we had too many slow clears.
Fixes:
393f659ed83 ("iris: Enable fast clears on other miplevels and layers than 0.")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Rohan Garg [Thu, 29 Aug 2019 12:53:10 +0000 (14:53 +0200)]
panfrost: Remove unused argument from panfrost_drm_submit_vs_fs_job()
is_scanout is not used anywhere and can be inferred within
panfrost_drm_submit_vs_fs_job() if required.
Signed-off-by: Rohan Garg <rohan.garg@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Kenneth Graunke [Thu, 29 Aug 2019 16:39:46 +0000 (09:39 -0700)]
iris: Actually describe bo_reuse driconf option
Otherwise it doesn't exist and can't be parsed, so everything dies at
screen init time.
Fixes:
6dc4ddc5f81 ("iris: use driconf for 'bo_reuse' parameter")
Tomeu Vizoso [Thu, 29 Aug 2019 12:44:17 +0000 (14:44 +0200)]
panfrost/ci: Print only regressions
Some functionality has been added to deqp-volt to only print
regressions, so update our version of it and use the new options.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Roland Scheidegger [Wed, 28 Aug 2019 19:35:45 +0000 (21:35 +0200)]
gallivm: use fallback code for mul_hi with llvm >= 7.0
LLVM 7.0 ditched the pmulu intrinsics.
This is only a trivial patch to use the fallback code instead.
It'll likely produce atrocious code since the pattern doesn't match what
llvm itself uses in its autoupgrade paths, hence the pattern won't be
recognized.
Should fix https://bugs.freedesktop.org/show_bug.cgi?id=111496
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Thu, 29 Aug 2019 07:18:54 +0000 (09:18 +0200)]
radv/gfx10: compute the LDS size for exporting PrimID for VS
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Jan Zielinski [Fri, 2 Aug 2019 10:28:13 +0000 (12:28 +0200)]
swr/rasterizer: Enable ARB_fragment_layer_viewport
Added loading gl_Layer and gl_ViewportIndex variables
to Pixel Shader context.
Reviewed-by: Alok Hota <alok.hota@intel.com>
Tapani Pälli [Wed, 28 Aug 2019 11:46:16 +0000 (14:46 +0300)]
iris: use driconf for 'bo_reuse' parameter
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tapani Pälli [Wed, 28 Aug 2019 11:29:53 +0000 (14:29 +0300)]
i965: initialize bo_reuse when creating brw_bufmgr
Fixes a possible data race spotted while debugging on other EGL
related failures where glFinish and eglCreateContext are going on at
the same time:
==11558== Possible data race during read of size 1 at 0x5E78CD0 by thread #23
==11558== Locks held: 1, at address 0x5E77CA8
==11558== at 0x61B71D4: bo_alloc_internal (brw_bufmgr.c:639)
==11558== by 0x61B7328: brw_bo_alloc (brw_bufmgr.c:669)
==11558== by 0x61EF975: recreate_growing_buffer (intel_batchbuffer.c:231)
==11558== by 0x61EFAAE: intel_batchbuffer_reset (intel_batchbuffer.c:255)
==11558== by 0x61EFB85: intel_batchbuffer_reset_and_clear_render_cache (intel_batchbuffer.c:280)
==11558== by 0x61F0507: brw_new_batch (intel_batchbuffer.c:551)
==11558== by 0x61F12C1: _intel_batchbuffer_flush_fence (intel_batchbuffer.c:888)
==11558== by 0x61BDD6B: intel_glFlush (brw_context.c:296)
==11558== by 0x61BDDB9: intel_finish (brw_context.c:307)
==11558== by 0x623831B: _mesa_Finish (context.c:1906)
==11558== by 0x46D556: deqp::egl::GLES2ThreadTest::Operation::execute(tcu::ThreadUtil::Thread&)
==11558== by 0x721502: tcu::ThreadUtil::Thread::run()
==11558==
==11558== This conflicts with a previous write of size 1 by thread #26
==11558== Locks held: 1, at address 0x5D09878
==11558== at 0x61B98A9: brw_bufmgr_enable_reuse (brw_bufmgr.c:1541)
==11558== by 0x61BF09D: brw_process_driconf_options (brw_context.c:854)
==11558== by 0x61BF6CA: brwCreateContext (brw_context.c:993)
==11558== by 0x621181F: driCreateContextAttribs (dri_util.c:473)
==11558== by 0x53FE87B: dri2_create_context (egl_dri2.c:1388)
==11558== by 0x53EE7BE: eglCreateContext (eglapi.c:807)
==11558== by 0x5C8AB9: eglw::FuncPtrLibrary::createContext(void*, void*, void*, int const*) const
==11558== by 0x46E027: deqp::egl::GLES2ThreadTest::CreateContext::exec(tcu::ThreadUtil::Thread&)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Thu, 29 Aug 2019 00:50:13 +0000 (17:50 -0700)]
iris: Don't auto-flush/dirty on transfer unmap for coherent buffers
When u_upload_mgr fills up a buffer, it unmaps and destroys it. Our
unmap function was automatically performing the equivalent of a
FlushMappedBufferRange call in this case. Because the buffer mapping
is persistent and coherent, we don't actually do any flushing when we
do the rest of the writes to the buffer - we were just doing one final
one at the end. But we would be using the uploaded contents on the
GPU the whole time.
This certainly shouldn't be necessary for streaming buffers, and if
such flushing and dirtying is necessary for coherent buffers, this is
wildly insufficient.
Drops a small number of constant packets and PIPE_CONTROL flushes from
most benchmarks that I've looked at. Doesn't seem to make much of an
impact on performance, however.
Thanks to Felix Degrood for noticing that we were emitting more
3DSTATE_CONSTANT_* packets than we needed to.
Timur Kristóf [Wed, 28 Aug 2019 23:23:37 +0000 (01:23 +0200)]
st/nine: Properly initialize GLSL types for NIR shaders.
NIR shaders use GLSL types (note: these live outside libglsl), and
nine needs to properly initialize these just like the other state
trackers. This fixes an assertion failure when TTN is used.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Rob Clark [Sat, 29 Jun 2019 11:54:23 +0000 (04:54 -0700)]
freedreno/ir3: do better job of marking convergence points
Fixes:
dEQP-GLES3.functional.shaders.switch.switch_in_do_while_loop_dynamic_vertex
dEQP-GLES3.functional.shaders.switch.switch_in_do_while_loop_dynamic_fragment
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rob Clark [Sat, 29 Jun 2019 11:52:47 +0000 (04:52 -0700)]
freedreno/ir3: maintain predecessors/successors
While resolving jumps to skip intermediate jumps from the structured
CFG, maintain the successors and predecessors correctly.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rob Clark [Fri, 28 Jun 2019 14:30:35 +0000 (07:30 -0700)]
freedreno/ir3: convert block->predecessors to set
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Jordan Justen [Sat, 9 Mar 2019 00:59:44 +0000 (16:59 -0800)]
pci_id_driver_map: Support preferring iris over i965
This adds the ability for intel devices that:
* Only load on i965
* Only load on iris
* First attempt i965, and try iris next
* First attempt iris, and try i965 next
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jordan Justen [Tue, 20 Aug 2019 01:55:40 +0000 (18:55 -0700)]
i965: Exit with error if gen12+ is detected
For OpenGL support on gen12, the iris driver should be used.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tapani Pälli [Fri, 9 Aug 2019 09:28:44 +0000 (12:28 +0300)]
anv: build libanv for gen12 in android build
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jordan Justen [Wed, 13 Dec 2017 09:18:07 +0000 (01:18 -0800)]
anv: Build for gen12
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tapani Pälli [Fri, 9 Aug 2019 09:30:05 +0000 (12:30 +0300)]
iris: build android libmesa_iris for gen12
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jordan Justen [Mon, 11 Feb 2019 04:14:07 +0000 (20:14 -0800)]
iris: Build for gen12
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jordan Justen [Thu, 14 Sep 2017 20:39:46 +0000 (13:39 -0700)]
intel/l3: Don't assert on gen12 (use gen11 config temporarily)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jordan Justen [Tue, 8 Aug 2017 22:28:23 +0000 (15:28 -0700)]
intel/compiler: Disable compaction on gen12 for now
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tapani Pälli [Fri, 9 Aug 2019 09:31:42 +0000 (12:31 +0300)]
intel/isl: build android libmesa_isl for gen12
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jordan Justen [Wed, 16 Aug 2017 23:45:47 +0000 (16:45 -0700)]
intel/isl: Build gen12 using gen11 code paths
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tapani Pälli [Fri, 9 Aug 2019 09:27:06 +0000 (12:27 +0300)]
intel/genxml: generate pack files for gen12 on android builds
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jordan Justen [Fri, 12 Oct 2018 23:18:37 +0000 (16:18 -0700)]
intel/genxml: Build gen12 genxml
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jordan Justen [Wed, 6 Sep 2017 21:40:57 +0000 (14:40 -0700)]
intel/genxml: Add gen12.xml as a copy of gen11.xml
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jordan Justen [Mon, 19 Aug 2019 09:17:57 +0000 (02:17 -0700)]
intel/genxml: Run sort_xml.sh to tidy gen9.xml and gen11.xml
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jordan Justen [Sat, 17 Aug 2019 08:45:59 +0000 (01:45 -0700)]
intel/genxml/gen11: Add spaces in EnableUnormPathInColorPipe
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jordan Justen [Thu, 17 Aug 2017 01:19:39 +0000 (18:19 -0700)]
intel/genxml: Handle field names with different spacing/hyphen
If a field name differs slightly between two generations then this
change will still add the fields into the same group.
For example, these will be treated as equal:
* "Software Exception" and "Software Exception"
* "Per Thread" and "Per-Thread"
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Eric Anholt [Wed, 28 Aug 2019 17:13:29 +0000 (10:13 -0700)]
freedreno/a6xx: Fix non-mipmap filtering selection.
We were clamping the LOD to force non-mipmap filtering, but that means
that the HW doesn't get to select between the min and mag filters.
Setting MIPFILTER_LINEAR_FAR appears to force non-mipmap filtering.
Fixes all failures in dEQP-GLES2.functional.texture.filtering.2d.*
Reviewed-by: Rob Clark <robdclark@chromium.org>
Ian Romanick [Mon, 26 Aug 2019 20:33:06 +0000 (13:33 -0700)]
intel/compiler: Request bitfield_reverse lowering on pre-Gen7 hardware
See the previous commit for the explanation of the Fixes tag.
Hurts 21 shaders in shader-db. All of the hurt shaders are in Unreal
Engine 4 tech demos.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes:
7afa26d4e39 ("nir: Add lowering for nir_op_bitfield_reverse.")
Ian Romanick [Mon, 26 Aug 2019 20:28:09 +0000 (13:28 -0700)]
nir/algrbraic: Don't optimize open-coded bitfield reverse when lowering is enabled
This caused a problem on Sandybridge where an open-coded
bitfieldReverse() function could be optimized to a
nir_op_bitfield_reverse that would generate an unsupported BFREV
instruction in the backend. This was encountered in some Unreal4 tech
demos in shader-db. The bug was not previously noticed because we don't
actually try to run those demos on Sandybridge.
The fixes tag is a bit a lie. The actual bug was introduced about
26,000 commits earlier in
371c4b3c48f ("nir: Recognize open-coded
bitfield_reverse."). Without the NIR lowering pass, the flag needed to
avoid the optimization does not exist. Hopefully nobody will care to
fix this on an earlier Mesa release.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes:
7afa26d4e39 ("nir: Add lowering for nir_op_bitfield_reverse.")
Eric Anholt [Wed, 14 Aug 2019 21:40:03 +0000 (14:40 -0700)]
gallium: Don't emit identical endian-dependent pack/unpack code.
Reduces the size of the u_format_table.c file by 140k (out of 1.64M)
and makes me less confused about endianness in gallium.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Thu, 15 Aug 2019 22:38:00 +0000 (15:38 -0700)]
gallium: Fix big-endian addressing of non-bitmask array formats.
The formats affected are:
- LA x (16_FLOAT, 32_FLOAT, 32_UINT, 32_SINT)
- R8G8B8 x (UNORM, SNORM, SRGB, USCALED, SSCALED, UINT, SINT)
- RG/RGB/RGBA x (64_FLOAT, 32_FLOAT, 16_FLOAT, 32_UNORM, 32_SNORM,
32_USCALED, 32_SSCALED, 32_FIXED, 32_UINT, 32_SINT)
- RGB/RGBA x (16_UNORM, 16_SNORM, 16_USCALED, 16_SSCALED,
16_UINT, 16_SINT)
- RGBx16 x (UNORM, SNORM, FLOAT, UINT, SINT)
- RGBx32 x (FLOAT, UINT, SINT)
- RA x (16_FLOAT, 32_FLOAT, 32_UINT, 32_SINT)
The updated st_formats.c unit test checks that the formats affected by
this change are all array formats in the equivalent Mesa format (if
any). Mesa's array format definition is clear: the value stored is an
array (increasing memory address) of values of the channel's type.
It's also the only thing that makes sense for the RGB types, or very
large types like RGBA64_FLOAT (A should not move to the low address
because the cpu is BE).
Acked-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Tested-by: Matt Turner <mattst88@gmail.com> (unit tests on BE)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Thu, 15 Aug 2019 16:55:39 +0000 (09:55 -0700)]
gallium: Drop a bit of dead code from the pack/unpack python.
Nothing used this var.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Thu, 15 Aug 2019 16:48:53 +0000 (09:48 -0700)]
gallium: Drop the useless union wrapper on pack/unpack.
Nothing accessed the .value field, just the .chan. Unwrap all the
code from the union, for clarity (and 13k less generated code).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Wed, 14 Aug 2019 21:53:35 +0000 (14:53 -0700)]
gallium: Skip generating the pack/unpack union if we don't use it.
Shaves 30k off of the 1.6M .c file, and makes for less noise for me
trying to understand how gallium formats actually work.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Thu, 15 Aug 2019 22:00:10 +0000 (15:00 -0700)]
gallium: Fix mesa format name in unit test failure path.
We clearly wanted the mesa format here.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Boris Brezillon [Wed, 28 Aug 2019 14:52:37 +0000 (16:52 +0200)]
panfrost: Reset the damage area on imported resources
Reset the damage area in the resource_from_handle() path (as done in
panfrost_resource_create()).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Boris Brezillon [Wed, 28 Aug 2019 07:17:21 +0000 (09:17 +0200)]
panfrost: Use ralloc() to allocate instructions to avoid leaking those objs
Instructions attached to blocks are never explicitly freed. Let's
use ralloc() to attach those objects to the compiler context so that
they are automatically freed when the ctx object is freed.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Jose Fonseca [Tue, 27 Aug 2019 10:54:31 +0000 (11:54 +0100)]
scons: Make GCC builds stricter.
Uses some of the same -Werror options used by Meson, as suggested by
Michel Dänzer.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Jose Fonseca [Tue, 27 Aug 2019 10:53:00 +0000 (11:53 +0100)]
util: Prevent strcasecmp macro redefinion.
MinGW headers already define it.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Jose Fonseca [Tue, 27 Aug 2019 10:51:00 +0000 (11:51 +0100)]
util: Prevent implicit declaration of function getenv.
With MinGW cross compilation.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Jose Fonseca [Tue, 27 Aug 2019 10:48:30 +0000 (11:48 +0100)]
glx: Fix incompatible function pointer types.
I don't know how Meson didn't hit this issue, when it too already uses
-Werror=incompatible-pointer-types
Fixes:
3dd299c3d5b88114894e ("glx: Sync <GL/glxext.h> with Khronos")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Vasily Khoruzhick [Sun, 25 Aug 2019 19:54:14 +0000 (12:54 -0700)]
lima: fix texture descriptor issues
Looks like initial RE was wrong and some fields have different purpose.
I.e. there's no "disable_mipmap" field, it's actually part of another field
that selects mipmap filtering.
Also fix layout position.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Kenneth Graunke [Wed, 24 Apr 2019 05:18:11 +0000 (22:18 -0700)]
iris: Drop swizzling parameter from s8_offset.
This is always false on Gen8+, no need for dead code and parameters.
Kenneth Graunke [Fri, 23 Aug 2019 18:10:30 +0000 (11:10 -0700)]
mesa: Fix _mesa_float_to_unorm() on 32-bit systems.
This fixes the following CTS test on 32-bit systems:
GTF-GL46.gtf30.GL3Tests.packed_depth_stencil.packed_depth_stencil_init
It does glGetTexImage of a 16-bit SNORM image, requesting 32-bit UNORM
data. In get_tex_rgba_uncompressed, we round trip through float to
handle image transfer ops for clamping. _mesa_format_convert does:
_mesa_float_to_unorm(0.
571428597f, 32)
which translated to:
_mesa_lroundevenf(0.
571428597f * 0xffffffffu)
which produced different results on 64-bit and 32-bit systems:
64-bit: result = 0x92492500
32-bit: result = 0x80000000
This is because the size of "long" varies between the two systems, and
0x92492500 is too large to fit in a signed 32-bit integer. To fix this,
we switch to the new _mesa_i64roundevenf function which always does the
64-bit operation.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104395
Fixes:
594fc0f8595 ("mesa: Replace F_TO_I() with _mesa_lroundevenf().")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Fri, 23 Aug 2019 18:08:48 +0000 (11:08 -0700)]
util: Add a _mesa_i64roundevenf() helper.
This always returns a int64_t, translating to _mesa_lroundevenf on
systems where long is 64-bit, and llrintf where "long long" is needed.
Fixes:
594fc0f8595 ("mesa: Replace F_TO_I() with _mesa_lroundevenf().")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Adam Jackson [Thu, 22 Aug 2019 19:15:28 +0000 (15:15 -0400)]
glx: Unset the direct_support bit for GLX_EXT_import_context
GLX_EXT_import_context operates only on indirect contexts, a direct
context cannot possibly support it. Without this change the extension
will appear in the combined GLX extension string even if it is missing
from the server string, indicating a lack of required server support.
Daniel Kolesa [Tue, 27 Aug 2019 19:47:48 +0000 (21:47 +0200)]
util: add auxv based PowerPC AltiVec/VSX detection
At least on Linux, we can use the ELF auxiliary vector to
detect the presence of AltiVec, VSX and other CPU features
without having to go through handling SIGILL, which has
various problems of its own.
A similar thing is already being done for ARM to detect NEON.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Kenneth Graunke [Sat, 24 Aug 2019 01:23:32 +0000 (18:23 -0700)]
intel/compiler: Use new Gen11 headerless RT writes for MRT cases
Gen11 adds support for specifying the render target index and src0
alpha present bits in the extended message descriptor. Previously,
we had to use a message header for this, requiring extra instructions
to write the fields, and two registers of extra payload.
Improves performance on my ICL 8x8 frequency locked to 700Mhz, on iris:
GfxBench5 Manhattan 3.0: 2.13635% +/- 0.159859% (n=5)
GfxBench5 Aztec Ruins: 1.57173% +/- 0.128749% (n=5)
Synmark2 OglDeferred: 2.86914% +/- 0.191211% (n=10)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 26 Aug 2019 07:05:21 +0000 (00:05 -0700)]
intel/compiler: Use generic SEND for Gen7+ FB writes
This takes care of generate_fb_write/fire_fb_write/brw_fb_WRITE's stuff
earlier in the visitor. It will also make it easier to generate SENDSC
messages with indirect extended descriptors in a few patches.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 26 Aug 2019 06:59:25 +0000 (23:59 -0700)]
intel/compiler: Refactor FB write message control setup into a helper.
This will be used by visitor code to convert directly to SEND in a bit.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 26 Aug 2019 06:43:29 +0000 (23:43 -0700)]
intel/compiler: Handle bits 15:12 in brw_send_indirect_split_message()
Annoyingly, these bits exist in some extended message descriptors
(in particular render target writes), but they don't have any
corresponding bits in the ISA encoding. So we can't use an immediate
and have to fall back to an indirect extended descriptor.
Thanks to Jason Ekstrand for reminding me that you can still set these
bits via an indirect descriptor, even if they don't exist in the ISA.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 26 Aug 2019 22:21:40 +0000 (15:21 -0700)]
intel/compiler: Fix src0/desc setter ordering
src0 vstride and type overlap with bits of the extended descriptor.
brw_set_desc() also sets the extended descriptor to 0. So by setting
the descriptor, then setting src0, we were accidentally setting a bunch
of extended descriptor bits unintentionally.
When using this infrastructure for framebuffer writes (in a future
patch), this ended up setting the extended descriptor bit 20, which is
"Null Render Target" on Icelake, causing nothing to be written to the
framebuffer.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Marek Olšák [Mon, 19 Aug 2019 17:15:54 +0000 (13:15 -0400)]
radeonsi: fix scratch buffer WAVESIZE setting leading to corruption
Cc: 19.2 19.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Mon, 19 Aug 2019 18:37:33 +0000 (14:37 -0400)]
radeonsi: unbind blend/DSA/rasterizer state correctly in delete functions
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111414
Fixes:
b758eed9c37 ("radeonsi: make sure that blend state != NULL and remove all NULL checking")
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Mon, 19 Aug 2019 17:06:47 +0000 (13:06 -0400)]
radeonsi: align scratch and ring buffer allocations for faster memory access
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 22:03:50 +0000 (18:03 -0400)]
radeonsi: consolidate determining VGPR_COMP_CNT for API VS
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 04:18:17 +0000 (00:18 -0400)]
radeonsi/gfx10: set PA_CL_VS_OUT_CNTL with CONTEXT_REG_RMW to fix edge flags
We need two different values of the register, one for NGG and one for
legacy, in order to fix edge flags for the legacy pipeline.
Passing the ngg flag to emit_clip_regs would be too complicated,
so CONTEXT_REG_RMW is used for partial register updates.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 04:28:23 +0000 (00:28 -0400)]
radeonsi/gfx10: remove incorrect ngg/pos_writes_edgeflag variables
It varies depending on si_shader_key::as_ngg.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 04:13:17 +0000 (00:13 -0400)]
radeonsi: add PKT3_CONTEXT_REG_RMW
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 02:44:16 +0000 (22:44 -0400)]
winsys/amdgpu+radeon: process AMD_DEBUG in addition to R600_DEBUG
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 02:37:52 +0000 (22:37 -0400)]
radeonsi/gfx10: add AMD_DEBUG=nongg
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Tue, 20 Aug 2019 22:57:36 +0000 (18:57 -0400)]
radeonsi/gfx10: finish up Navi14, add PCI ID
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Tue, 20 Aug 2019 22:57:01 +0000 (18:57 -0400)]
radeonsi/gfx10: always use the legacy pipeline for streamout
The best way to prevent GDS hangs is not to use GDS.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 00:58:18 +0000 (20:58 -0400)]
radeonsi/gfx10: don't initialize VGT_INSTANCE_STEP_RATE_0
Only gfx9 and older use it to get InstanceID in VGPR1.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 00:56:22 +0000 (20:56 -0400)]
radeonsi/gfx10: fix InstanceID for legacy VS+GS
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 00:39:08 +0000 (20:39 -0400)]
radeonsi/gfx10: add as_ngg variant for VS as ES to select Wave32/64
Legacy GS only works with Wave64.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 00:08:38 +0000 (20:08 -0400)]
radeonsi/gfx10: create the GS copy shader if using legacy streamout
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 00:07:26 +0000 (20:07 -0400)]
radeonsi/gfx10: fix the PRIMITIVES_GENERATED query if using legacy streamout
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Tue, 20 Aug 2019 22:43:14 +0000 (18:43 -0400)]
radeonsi/gfx10: fix tessellation for the legacy pipeline
ported from PAL
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Tue, 20 Aug 2019 17:36:45 +0000 (13:36 -0400)]
radeonsi: move some global shader cache flags to per-binary flags
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>