platform/upstream/mesa.git
Jiang, Sonny [Tue, 3 Sep 2019 22:33:57 +0000 (22:33 +0000)]
loader: always map the "amdgpu" kernel driver name to radeonsi (v2)

v2: cleanup

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 18 Sep 2019 21:07:31 +0000 (17:07 -0400)]
ac: stop using PCI IDs for chip identification

PCI IDs for amdgpu will be removed from Mesa.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 18 Sep 2019 21:05:09 +0000 (17:05 -0400)]
ac/addrlib: fix chip identification for Vega10, Arcturus, Raven2, Renoir

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Mon, 23 Sep 2019 19:08:38 +0000 (15:08 -0400)]
amd: add more PCI IDs for Navi14

trivial and urgent

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Eric Engestrom [Mon, 23 Sep 2019 16:20:32 +0000 (17:20 +0100)]
meson: split compiler warnings one per line

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Jason Ekstrand [Mon, 9 Sep 2019 18:38:37 +0000 (13:38 -0500)]
nir/repair_ssa: Replace the unreachable check with the phi builder

In a3268599f3c9, I attempted to fix nir_repair_ssa for unreachable
blocks.  However, that commit missed the possibility that the use is in
a block which, itself, is unreachable.  In this case, we can end up in
an infinite loop trying to replace a def with itself.  Even though a
no-op replacement is a fine operation, it keeps extending the end of the
uses list as we're walking it.  Instead of explicitly checking for the
group of conditions, just check whether the phi builder gives us a
different def.  That check is guaranteed to be reliable, even though it
lacks symmetry with the is_valid checks.

Fixes: a3268599 "nir/repair_ssa: Repair dominance for unreachable..."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
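
The failure mode described in this message can be illustrated with a toy model. This is a minimal sketch, not NIR code: `Def`, `Use`, `rewrite_use` and `repair` are hypothetical names, and the only behaviour borrowed from the commit message is that rewriting a use re-queues it at the end of the target def's use list. The `max_steps` guard exists only so the naive policy fails loudly instead of hanging.

```python
class Def:
    """Toy SSA definition that tracks the uses pointing at it."""
    def __init__(self, name):
        self.name = name
        self.uses = []


class Use:
    """Toy use; compared by identity so list.remove() finds the right one."""
    def __init__(self, definition):
        self.definition = definition
        definition.uses.append(self)


def rewrite_use(use, new_def):
    """Point `use` at `new_def`; it lands at the END of new_def's use list."""
    use.definition.uses.remove(use)
    new_def.uses.append(use)
    use.definition = new_def


def repair(old_def, pick_replacement, skip_noop=True, max_steps=1000):
    """Walk old_def's uses and rewrite each one to its replacement def.

    With skip_noop=True a replacement equal to old_def is skipped, which
    mirrors the "did we get a different def?" check. With skip_noop=False
    a no-op rewrite keeps re-appending the use to the list being walked.
    """
    rewrites = 0
    index = 0
    for _ in range(max_steps):
        if index >= len(old_def.uses):
            return rewrites
        use = old_def.uses[index]
        new_def = pick_replacement(use)
        if skip_noop and new_def is old_def:
            index += 1              # nothing to do for this use, move on
            continue
        rewrite_use(use, new_def)   # element at `index` changed, do not advance
        rewrites += 1
    raise RuntimeError("use list kept growing while being walked")
```

Calling `repair` with `skip_noop=False` and a policy that always returns the original def raises the `RuntimeError`, which stands in for the infinite loop.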
Daniel Schürmann [Thu, 19 Sep 2019 16:48:01 +0000 (18:48 +0200)]
aco: only emit waitcnt on loop continues if there was some load or export

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Karol Herbst [Fri, 20 Sep 2019 17:47:14 +0000 (19:47 +0200)]
nv50/ir/nir: comparison of integer expressions of different signedness warning

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
Karol Herbst [Fri, 20 Sep 2019 17:45:22 +0000 (19:45 +0200)]
nv50/ir: fix unnecessary parentheses warning

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
Erico Nunes [Thu, 19 Sep 2019 19:08:05 +0000 (21:08 +0200)]
lima: remove partial clear support from pipe->clear()

pipe->clear() is not called for partial clears, which mesa emulates by
drawing a quad.
Furthermore, drivers should not use rasterizer state information for
scissor information (which was being used to handle the partial clears).
So, remove the partial clear support since it was not supposed to be
handled by pipe->clear() anyway.
This fixes issues with clearing after switching to different sized
framebuffers.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Boris Brezillon [Wed, 18 Sep 2019 13:23:09 +0000 (15:23 +0200)]
dEQP-GLES2.functional.buffer.write.use.index_array.* are passing now.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Boris Brezillon [Wed, 18 Sep 2019 13:22:24 +0000 (15:22 +0200)]
panfrost: Fix indexed draws

->padded_count should be large enough to cover all vertices pointed to by
the index array. Use the local vertex_count variable, which contains the
updated vertex_count value for the indexed draw case.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
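
Why the draw's index count is the wrong input for the padded count can be shown with a small sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the vertex range is taken as `max_index - min_index + 1`, and the padding is a plain round-up to a multiple, which is not necessarily how the hardware-specific padding is computed.

```python
def indexed_vertex_count(indices):
    """Number of vertices spanned by an index array (min..max inclusive)."""
    return max(indices) - min(indices) + 1


def padded_count(vertex_count, alignment=4):
    """Round vertex_count up to a multiple of `alignment` (illustrative only)."""
    return -(-vertex_count // alignment) * alignment


# Three indices are drawn, but they span ten distinct vertex slots.
indices = [10, 11, 19]
from_index_count = padded_count(len(indices))                    # too small
from_vertex_range = padded_count(indexed_vertex_count(indices))  # covers all
```

With these inputs the padded count derived from the number of indices is smaller than the vertex range the index array actually touches.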
Karol Herbst [Sun, 22 Sep 2019 13:27:33 +0000 (15:27 +0200)]
clover/nir: fix compilation with g++-5.5 and maybe earlier

fixes "sorry, unimplemented: non-trivial designated initializers not supported"

Fixes: deb04adf2ae ("clover: add support for passing kernels as nir to the driver")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Kenneth Graunke [Fri, 20 Sep 2019 21:33:51 +0000 (14:33 -0700)]
st/mesa: Bail on incomplete attachments in discard_framebuffer

Incomplete attachments don't have an associated pipe_surface, so
this would crash.

Fixes a WebGL conformance test that uses incomplete attachments:
https://www.khronos.org/registry/webgl/sdk/tests/conformance2/renderbuffers/invalidate-framebuffer.html?webglVersion=2&quiet=0&quick=1

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111756
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
Vasily Khoruzhick [Sun, 8 Sep 2019 02:33:07 +0000 (19:33 -0700)]
lima: implement BO cache

Allocating BOs is expensive, so we should avoid doing that by caching
freed BOs.

The BO cache is modelled after the one in the v3d driver and works as follows:

- in lima_bo_create(): check if we have a matching BO in the cache and
  return it if there is one; allocate a new BO otherwise.
- in lima_bo_unreference() (renamed from lima_bo_free()): put the BO in
  the cache instead of freeing it, and remove all stale BOs from the cache.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
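
The two steps listed in this message can be sketched as a small cache. This is a minimal illustration under stated assumptions, not the driver's data structure: `Bo` and `BoCache` are hypothetical names, a "matching" BO is taken to mean an exact size match, and "stale" is modelled as older than a `max_age` threshold measured on a caller-supplied clock.

```python
class Bo:
    """Toy buffer object."""
    def __init__(self, size):
        self.size = size
        self.refcount = 1
        self.freed_at = None


class BoCache:
    def __init__(self, max_age=2.0):
        self.max_age = max_age      # seconds a freed BO may stay cached
        self.free_list = []         # freed BOs, oldest first
        self.allocations = 0        # counts real (expensive) allocations

    def create(self, size):
        """Reuse a cached BO of the same size, else allocate a new one."""
        for i, bo in enumerate(self.free_list):
            if bo.size == size:
                del self.free_list[i]
                bo.refcount = 1
                bo.freed_at = None
                return bo
        self.allocations += 1
        return Bo(size)

    def unreference(self, bo, now):
        """Drop a reference; cache the BO when unused and evict stale ones."""
        bo.refcount -= 1
        if bo.refcount > 0:
            return
        bo.freed_at = now
        self.free_list.append(bo)
        self.free_list = [b for b in self.free_list
                          if now - b.freed_at <= self.max_age]
```

Passing the current time into `unreference` keeps the sketch deterministic; a real implementation would read a clock itself.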
Vasily Khoruzhick [Sun, 8 Sep 2019 02:30:39 +0000 (19:30 -0700)]
lima: use 0 to poll if BO is busy in lima_bo_wait()

os_time_get_absolute_timeout(0) returns the current time, while the kernel
driver expects 0 as the value to poll the BO status and return immediately.
Fix it by setting abs_timeout to 0 if timeout_ns is 0.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
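
The special case described here is small enough to show directly. The sketch below is an illustration, not the driver code: `absolute_timeout_ns` is a hypothetical helper, and the optional `now_ns` parameter exists only to make the behaviour easy to check.

```python
import time


def absolute_timeout_ns(timeout_ns, now_ns=None):
    """Convert a relative timeout to the absolute value passed to the kernel.

    A relative timeout of 0 means "poll": it must stay 0 rather than be
    turned into the current time.
    """
    if timeout_ns == 0:
        return 0
    if now_ns is None:
        now_ns = time.monotonic_ns()
    return now_ns + timeout_ns
```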
Qiang Yu [Sun, 25 Aug 2019 11:04:01 +0000 (19:04 +0800)]
lima: move damage bound build to resource

Reviewed-and-Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Qiang Yu [Sun, 25 Aug 2019 09:24:26 +0000 (17:24 +0800)]
lima: don't use damage system when full damage

Sometimes weston sets a full damage region. It is
more efficient to use the cached PP stream instead
of dynamically creating one.

Reviewed-and-Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Qiang Yu [Sun, 30 Jun 2019 13:44:12 +0000 (21:44 +0800)]
lima: implement EGL_KHR_partial_update

This extension sets a damage region for each
buffer swap, which can be used to reduce buffer
reload cost by feeding only the damage region's
tile buffer addresses to the PP.

Reviewed-and-Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Icenowy Zheng [Sun, 22 Sep 2019 01:37:38 +0000 (09:37 +0800)]
lima: fix PLBU viewport configuration

The PLBU expects the viewport's 4 borders' coordinates, however
currently we're feeding the coordinate of the left-bottom point and the
size to it, which leads to misrendering when the left-bottom point is
not (0,0).

Change the macros for the viewport PLBU command, and the data fed to
it. The code to calculate the 4 borders is ported from Panfrost.

Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
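
The difference between "origin plus size" and "four borders" is easy to see in a sketch. This is an illustration only: `viewport_borders` is a hypothetical helper working on a plain (x, y, width, height) rectangle, whereas the code referred to in the message may derive the borders from other viewport state.

```python
def viewport_borders(x, y, width, height):
    """Return (left, right, bottom, top) for a viewport given by its
    left-bottom corner and its size."""
    left = x
    right = x + width
    bottom = y
    top = y + height
    return left, right, bottom, top
```

When the corner is (0, 0), the right and top borders equal the width and height, which is why feeding the size in place of the borders only misrenders for a non-zero origin.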
Bas Nieuwenhuizen [Fri, 20 Sep 2019 20:22:13 +0000 (22:22 +0200)]
amd: Build aco only if radv is enabled

ACO depends on C++14, but radeonsi/radv with LLVM 8,9 do not. Let us
only require it for RADV, since that is the only user.

Fixes: a70a9987181 "radv/aco: Setup alternate path in RADV to support the experimental ACO compiler"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Karol Herbst [Fri, 10 May 2019 07:28:15 +0000 (09:28 +0200)]
nvc0: expose spirv support

required for OpenCL

v2: adjust to changes in previous commits
v3: properly convert to NIR in nvc0_cp_state_create

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr> (v1)
Karol Herbst [Tue, 6 Aug 2019 18:35:48 +0000 (20:35 +0200)]
clover: add support for passing kernels as nir to the driver

v2: minor formatting fixes
v3: call glsl_type_singleton_init_or_ref and glsl_type_singleton_decref
v4: capitalize and punctuate comments
    fix text_executable -> text_intermediate in TODO
    make glsl_type_singleton wrapper static
v5: rewrite how we run the nir passes
v6: fix unhandled case switch warning in st/mesa

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net> (v4)
Karol Herbst [Fri, 10 May 2019 07:27:06 +0000 (09:27 +0200)]
clover: prepare supporting multiple IRs

v2: rework arguments to compiler::compile_program
    add assert to device::ir_format
v3: remove PIPE_SHADER_IR_SPIRV
    change title

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net> (v2)
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Karol Herbst [Fri, 10 May 2019 07:24:42 +0000 (09:24 +0200)]
clover: add support for drivers having no proper binary format

Most drivers have actually no binary format and just store the IR directly
as a single entry point blob.

v2: add a cap to switch between single or multi entry point binaries
v3: remove the entry_point field
v4: remove PIPE_CAP_MULTI_ENTRY_POINT_BINARIES
v5: remove supports_multiple_entry_points

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Karol Herbst [Tue, 30 Jul 2019 11:36:37 +0000 (13:36 +0200)]
clover/functional: add id_equals helper

v2: pass argument by value

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Karol Herbst [Sat, 11 May 2019 12:26:06 +0000 (14:26 +0200)]
rename pipe_llvm_program_header to pipe_binary_program_header

We want to use it for other formats as well, so give it a more generic name

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Karol Herbst [Fri, 10 May 2019 07:22:25 +0000 (09:22 +0200)]
gallium: add blob field to pipe_llvm_program_header

makes it easier to consume an IR_NATIVE binary

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Pierre Moreau [Sat, 10 Feb 2018 20:44:45 +0000 (21:44 +0100)]
clover/llvm: Add functions for compiling from source to SPIR-V

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Pierre Moreau [Sat, 10 Feb 2018 20:41:19 +0000 (21:41 +0100)]
clover/llvm: Add options for dumping SPIR-V binaries

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Pierre Moreau [Tue, 2 Apr 2019 21:32:23 +0000 (23:32 +0200)]
clover/spirv: Add functions for parsing arguments, linking programs, etc.

v2 (Karol Herbst):
  silence warnings about unhandled enum values
v3 (Karol Herbst):
  added back array size parsing (needed for structs passed by value)

Acked-by: Francisco Jerez <currojerez@riseup.net> (v2)
Pierre Moreau [Sat, 10 Feb 2018 20:40:10 +0000 (21:40 +0100)]
clover/spirv: Add functions for validating SPIR-V binaries

Changes since:
* v12:
  - remove autotools (Karol Herbst)
  - Remove the callback in format_validation_msg. (Francisco Jerez)
  - Removed is_binary_spirv. (Francisco Jerez)
  - Pass a string reference to is_valid_spirv instead of the
    notification callback. (Francisco Jerez)
* v11: Fix compilation error introduced in v11.
* v10:
  - Reuse format_validation_msg in is_valid_spirv.
  - Remove LVL2STR macro in format_validation_msg.
* v9: Add `clover_cpp_std` to the overrides of the `libclspirv` target
      in Meson.
* v7: Add DEFINES to libclspirv and libclover, in autotools, as they
      would otherwise never know whether CLOVER_ALLOW_SPIRV has been
      defined (Dave Airlie)
* v6: Update the dependency name (meson) and the libs variable
      (Makefile) due to the replacement of llvm-spirv by the new
      official SPIRV-LLVM-Translator.
* v5: Changed to match the updated “clover/llvm: Allow translating from
      SPIR-V to LLVM IR” in the v6.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Pierre Moreau [Sun, 21 Jan 2018 18:10:58 +0000 (19:10 +0100)]
meson: Check for SPIRV-Tools and llvm-spirv

Changes since:
* v12 (Karol Herbst):
  - rename CLOVER_ALLOW_SPIRV to HAVE_CLOVER_SPIRV
* v11 (Karol Herbst):
  - only set new defines for clover to speed up recompilation
  - remove autotools
* v10:
  - Add a new flag (`--enable-opencl-spirv` for autotools, and
    `-Dopencl-spirv=true` for meson) for enabling SPIR-V support in
    clover, and never automagically enable it without that flag. (Dylan Baker)
  - When enabling the SPIR-V support, the SPIRV-Tools and
    SPIRV-LLVM-Translator libraries are now required dependencies.
* v7:
  - Properly align LLVMSPIRVLib comment (Dylan Baker)
  - Only define CLOVER_ALLOW_SPIRV when **both** dependencies are found:
    autotools was only requiring one or the other.
* v6: Replace the llvm-spirv repository by the new official
      SPIRV-LLVM-Translator.
* v4: Add a comment saying where to find llvm-spirv (Karol Herbst).
* v3:
  - make SPIRV-Tools and llvm-spirv optional (Francisco Jerez);
  - bump requirement for llvm-spirv to version 0.2
* v2:
  - Bump the required version of SPIRV-Tools to the latest release;
  - Add a dependency on llvm-spirv.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com> (v10)
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Kenneth Graunke [Tue, 17 Sep 2019 06:29:48 +0000 (23:29 -0700)]
isl: Drop WaDisableSamplerL2BypassForTextureCompressedFormats on Gen11

Gen11 doesn't require us to bypass the L2 cache for BC* images anymore.

The documentation is a bit hard to follow on this point, but the Windows
driver clearly only applies this workaround on Gen9, and their commit
history indicates that this was an intentional change to drop the
workaround for Gen11+.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Hal Gentz [Sun, 15 Sep 2019 21:29:50 +0000 (15:29 -0600)]
gallium/osmesa: Fix the inability to set no context as current.

Currently there is no way to make no context current w/gallium + osmesa.
The non-gallium version of osmesa does this if the context and buffer
passed to `OSMesaMakeCurrent` are both null. This small change makes it
so that this is also the case with the gallium version.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Hal Gentz <zegentzy@protonmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Adam Jackson [Tue, 17 Sep 2019 18:23:28 +0000 (14:23 -0400)]
libgbm: Wire up getCapability for the image loader

Adam Jackson [Tue, 10 Sep 2019 15:44:24 +0000 (11:44 -0400)]
egl/surfaceless: Add FP16 format support

Reviewed-by: Kevin Strasser <kevin.strasser@intel.com>
Adam Jackson [Tue, 10 Sep 2019 15:53:11 +0000 (11:53 -0400)]
egl/wayland: Implement getCapability for the dri2 and image loaders

Reviewed-by: Kevin Strasser <kevin.strasser@intel.com>
Adam Jackson [Fri, 30 Aug 2019 19:35:22 +0000 (15:35 -0400)]
egl/wayland: Add FP16 format support

Reviewed-by: Kevin Strasser <kevin.strasser@intel.com>
Adam Jackson [Fri, 30 Aug 2019 19:27:23 +0000 (15:27 -0400)]
egl/wayland: Reindent the format table

No idea how these ended up with 3-then-2-space indents.

Reviewed-by: Kevin Strasser <kevin.strasser@intel.com>
Jason Ekstrand [Thu, 18 Apr 2019 19:17:50 +0000 (14:17 -0500)]
anv: Advertise VK_KHR_shader_subgroup_extended_types

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Jason Ekstrand [Tue, 4 Jun 2019 16:39:25 +0000 (11:39 -0500)]
intel/fs: Do 8-bit subgroup scan operations in 16 bits

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Jason Ekstrand [Tue, 4 Jun 2019 16:45:50 +0000 (11:45 -0500)]
intel/fs: Allow CLUSTER_BROADCAST to do type conversion

We can't really handle it in the little-core 64-bit case but it's not
really needed there.  Where we really want this is for when we need to
do 16 -> 8-bit conversions.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Jason Ekstrand [Sat, 27 Apr 2019 09:31:31 +0000 (04:31 -0500)]
intel/fs: Allow UB, B, and HF types in brw_nir_reduction_op_identity

Because byte immediates aren't a thing on GEN hardware, we return a
signed or unsigned word immediate in the byte case.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Paulo Zanoni [Tue, 17 Sep 2019 23:46:33 +0000 (16:46 -0700)]
intel/fs: don't forget the stride at generate_shuffle

During generate_shuffle(), when we use byte sized registers we end up
with a destination stride of 2. We don't take the stride into
consideration when selecting the group offset for the last MOV
operation, which means we end up moving things to the wrong place,
leaving the last few channels untouched. Take the destination stride
into consideration so we don't miss the last channels.

v2: Assert this is not necessary for the IVB special case (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Jason Ekstrand [Thu, 19 Sep 2019 20:17:24 +0000 (15:17 -0500)]
util/rb_tree: Reverse the order of comparison functions

The new order matches that of the comparison functions accepted by the C
standard library qsort() function.  Being consistent with qsort will
hopefully help avoid developer confusion.

The only current user of the red-black tree is aub_mem.c which is pretty
easy to fix up.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
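
The qsort() convention referred to here is that a comparison function returns a negative value when its first argument sorts before the second, zero when they are equal, and a positive value otherwise. A minimal sketch of why sharing that convention helps: one comparator can drive both a sort and an ordered insert. `compare_int` and `insert_sorted` are hypothetical helpers, and the sorted list merely stands in for a tree.

```python
from functools import cmp_to_key


def compare_int(a, b):
    """qsort()-style comparison: negative, zero or positive."""
    return (a > b) - (a < b)


def insert_sorted(items, value, cmp):
    """Insert `value` into the sorted list `items`, ordering by `cmp`."""
    pos = 0
    while pos < len(items) and cmp(items[pos], value) <= 0:
        pos += 1
    items.insert(pos, value)


# The same comparator works for sorting and for ordered insertion.
by_sort = sorted([3, 1, 2], key=cmp_to_key(compare_int))
by_insert = []
for v in [3, 1, 2]:
    insert_sorted(by_insert, v, compare_int)
```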
Jason Ekstrand [Thu, 19 Sep 2019 20:05:51 +0000 (15:05 -0500)]
util/rb_tree: Add the unit tests

When I wrote the red-black tree implementation, I wrote tests for it but
they never got imported into mesa.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Eric Engestrom [Sat, 7 Sep 2019 21:11:01 +0000 (00:11 +0300)]
anv: implement ICD interface v4

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Eric Engestrom [Tue, 3 Sep 2019 10:54:35 +0000 (13:54 +0300)]
anv: split instance dispatch table

This effectively breaks the instance dispatch table in 2 with entry
points using a physical device as first argument getting their own
dispatch table.

As a result, we now have to check both the instance and the physical
device dispatch tables, instead of just the instance dispatch table as
before.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
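
The two-table lookup this implies can be sketched as follows. The structure is an assumption made for illustration: the tables are plain dictionaries, `lookup_entrypoint` is a hypothetical helper, and the Vulkan entry point names are used only as sample keys.

```python
def lookup_entrypoint(name, instance_table, physical_device_table):
    """Return the handler for `name`, checking both dispatch tables."""
    for table in (instance_table, physical_device_table):
        if name in table:
            return table[name]
    return None


# Entry points taking a VkInstance first stay in the instance table;
# those taking a VkPhysicalDevice first get their own table.
instance_table = {"vkDestroyInstance": "instance handler"}
physical_device_table = {"vkCreateDevice": "physical device handler"}
```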
Adam Jackson [Thu, 19 Sep 2019 17:27:08 +0000 (13:27 -0400)]
glx: Fix drawable lookup bugs in glXUseXFont

We were using the current drawable of the context to name the
appropriate screen for creating the bitmaps. But one, the current
drawable can be None, and two, it can be a GLXDrawable. Passing either
one as the second argument to XCreatePixmap will throw BadDrawable. Use
the root window of the context's screen instead.

Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/89
LOLed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Adam Jackson [Thu, 19 Sep 2019 17:50:12 +0000 (13:50 -0400)]
glx: Avoid atof() when computing the server's GLX version

atof() is locale-dependent (sigh), which means 1.3 becomes 1.0 if the
locale's decimal separator isn't a full-stop. Just use the protocol
major/minor instead. This would be slightly broken if the server
generically implements 1.3+ but a particular screen is only capable of
less, but in practice no such servers exist.

Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/74
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
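
The locale problem can be reproduced without touching the process locale. The sketch below is a simplified model written for this illustration: `atof_like` is a hypothetical stand-in that only handles unsigned decimal digits and takes the decimal separator as a parameter, and `server_supports` shows one way to compare integer major/minor numbers instead of parsing a string.

```python
def atof_like(text, decimal_point):
    """Parse the leading number in `text` using the given decimal separator.

    Parsing stops at the first character that is neither a digit nor the
    separator, which is how "1.3" turns into 1.0 under a comma locale.
    """
    digits = "0123456789"
    i = 0
    while i < len(text) and text[i] in digits:
        i += 1
    int_part = text[:i]
    frac_part = ""
    if i < len(text) and text[i] == decimal_point:
        j = i + 1
        while j < len(text) and text[j] in digits:
            j += 1
        frac_part = text[i + 1:j]
    return float((int_part or "0") + "." + (frac_part or "0"))


def server_supports(major, minor, wanted=(1, 3)):
    """Compare integer protocol versions; no string parsing involved."""
    return (major, minor) >= wanted
```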
Ian Romanick [Mon, 9 Sep 2019 22:47:48 +0000 (15:47 -0700)]
nir/algebraic: Additional D3D Boolean optimization

I observed this pattern in several shaders in Hand of Fate 2 while
investigating bugzilla #111490.  This also led to the related
bugzilla #111578.  The shaders from HoF2 are *not* in shader-db.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Skylake and Ice Lake had similar results. (Ice Lake shown)
total instructions in shared programs: 16222621 -> 16205419 (-0.11%)
instructions in affected programs: 798418 -> 781216 (-2.15%)
helped: 548
HURT: 0
helped stats (abs) min: 2 max: 158 x̄: 31.39 x̃: 35
helped stats (rel) min: 0.45% max: 28.64% x̄: 2.83% x̃: 2.09%
95% mean confidence interval for instructions value: -33.22 -29.56
95% mean confidence interval for instructions %-change: -3.11% -2.56%
Instructions are helped.

total cycles in shared programs: 364676209 -> 363345763 (-0.36%)
cycles in affected programs: 112810504 -> 111480058 (-1.18%)
helped: 546
HURT: 7
helped stats (abs) min: 2 max: 118913 x̄: 2439.77 x̃: 2340
helped stats (rel) min: 0.08% max: 37.56% x̄: 1.46% x̃: 1.08%
HURT stats (abs)   min: 2 max: 770 x̄: 238.00 x̃: 43
HURT stats (rel)   min: 0.02% max: 11.24% x̄: 3.71% x̃: 0.35%
95% mean confidence interval for cycles value: -2884.33 -1927.41
95% mean confidence interval for cycles %-change: -1.59% -1.21%
Cycles are helped.

total spills in shared programs: 8870 -> 8514 (-4.01%)
spills in affected programs: 1230 -> 874 (-28.94%)
helped: 161
HURT: 0

total fills in shared programs: 21901 -> 21348 (-2.52%)
fills in affected programs: 2120 -> 1567 (-26.08%)
helped: 155
HURT: 5

Broadwell and Haswell had similar results. (Broadwell shown)
total instructions in shared programs: 14994910 -> 14975495 (-0.13%)
instructions in affected programs: 839033 -> 819618 (-2.31%)
helped: 548
HURT: 0
helped stats (abs) min: 2 max: 299 x̄: 35.43 x̃: 49
helped stats (rel) min: 0.39% max: 19.89% x̄: 2.91% x̃: 2.22%
95% mean confidence interval for instructions value: -37.46 -33.40
95% mean confidence interval for instructions %-change: -3.12% -2.70%
Instructions are helped.

total cycles in shared programs: 386032453 -> 384450722 (-0.41%)
cycles in affected programs: 117807357 -> 116225626 (-1.34%)
helped: 547
HURT: 6
helped stats (abs) min: 2 max: 22096 x̄: 2892.01 x̃: 3926
helped stats (rel) min: 0.17% max: 10.34% x̄: 1.56% x̃: 1.31%
HURT stats (abs)   min: 4 max: 60 x̄: 32.83 x̃: 29
HURT stats (rel)   min: 0.38% max: 12.79% x̄: 5.86% x̃: 4.65%
95% mean confidence interval for cycles value: -3060.28 -2660.27
95% mean confidence interval for cycles %-change: -1.59% -1.37%
Cycles are helped.

total spills in shared programs: 23372 -> 21869 (-6.43%)
spills in affected programs: 11730 -> 10227 (-12.81%)
helped: 352
HURT: 0

total fills in shared programs: 34747 -> 35351 (1.74%)
fills in affected programs: 11013 -> 11617 (5.48%)
helped: 3
HURT: 347

Ivy Bridge and Sandybridge had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11956420 -> 11956126 (<.01%)
instructions in affected programs: 14898 -> 14604 (-1.97%)
helped: 98
HURT: 0
helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
helped stats (rel) min: 1.30% max: 3.57% x̄: 2.08% x̃: 2.00%
95% mean confidence interval for instructions value: -3.00 -3.00
95% mean confidence interval for instructions %-change: -2.18% -1.98%
Instructions are helped.

total cycles in shared programs: 178791217 -> 178790792 (<.01%)
cycles in affected programs: 149763 -> 149338 (-0.28%)
helped: 91
HURT: 7
helped stats (abs) min: 3 max: 107 x̄: 20.63 x̃: 16
helped stats (rel) min: 0.13% max: 6.91% x̄: 1.40% x̃: 1.18%
HURT stats (abs)   min: 3 max: 322 x̄: 207.43 x̃: 322
HURT stats (rel)   min: 0.14% max: 19.85% x̄: 12.73% x̃: 17.41%
95% mean confidence interval for cycles value: -18.94 10.27
95% mean confidence interval for cycles %-change: -1.28% 0.49%
Inconclusive result (value mean confidence interval includes 0).

Ian Romanick [Sat, 31 Aug 2019 18:40:32 +0000 (11:40 -0700)]
nir/algebraic: Do not apply late DPH optimization in vertex processing stages

Some shaders do not use 'invariant' in vertex and (possibly) geometry
shader stages on some outputs that are intended to be invariant.  For
various reasons, this optimization may not be fully applied in all
shaders used for different rendering passes of the same geometry.  This
can result in Z-fighting artifacts (at best).  For now, disable this
optimization in these stages.

In tessellation stages applications seem to use 'precise' when
necessary, so allow the optimization in those stages.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111490
Fixes: 09705747d72 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern")

All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16194726 -> 16344745 (0.93%)
instructions in affected programs: 2855172 -> 3005191 (5.25%)
helped: 6
HURT: 20279
helped stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1
helped stats (rel) min: 0.44% max: 1.00% x̄: 0.54% x̃: 0.44%
HURT stats (abs)   min: 1 max: 32 x̄: 7.40 x̃: 7
HURT stats (rel)   min: 0.14% max: 42.86% x̄: 8.58% x̃: 6.56%
95% mean confidence interval for instructions value: 7.34 7.45
95% mean confidence interval for instructions %-change: 8.48% 8.67%
Instructions are HURT.

total cycles in shared programs: 364471296 -> 365014683 (0.15%)
cycles in affected programs: 32421530 -> 32964917 (1.68%)
helped: 2925
HURT: 16144
helped stats (abs) min: 1 max: 403 x̄: 18.39 x̃: 5
helped stats (rel) min: <.01% max: 22.61% x̄: 1.97% x̃: 1.15%
HURT stats (abs)   min: 1 max: 18471 x̄: 36.99 x̃: 15
HURT stats (rel)   min: 0.02% max: 52.58% x̄: 5.60% x̃: 3.87%
95% mean confidence interval for cycles value: 21.58 35.41
95% mean confidence interval for cycles %-change: 4.36% 4.52%
Cycles are HURT.

Andres Gomez [Wed, 18 Sep 2019 12:34:33 +0000 (15:34 +0300)]
docs/features: Update VK_KHR_display_swapchain status

It was set as done by mistake.

Fixes: bc15d74529e ("docs/features: Mark some Vulkan extensions as done")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Andres Gomez [Wed, 18 Sep 2019 12:25:16 +0000 (15:25 +0300)]
docs/features: Update status list of Vulkan extensions

To get the extension list:

$ git grep -hE "extension name=\"VK_KHR" src/vulkan/registry/vk.xml | \
grep -v disabled | awk '{print $2}' | sed -E 's/(name=)?"//g' | sort

To find anv(il) and radv supported extensions:

$ git grep -hE "'VK_([A-Z]+)_[a-z,0-9]" src/intel/

$ git grep -hE "'VK_([A-Z]+)_[a-z,0-9]" src/amd/

v2:
  - Keep VK_KHR_device_group and VK_KHR_device_group_creation as not
    started (Jason).

Signed-off-by: Andres Gomez <agomez@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Jason Ekstrand [Wed, 18 Sep 2019 19:32:00 +0000 (14:32 -0500)]
Move blob from compiler/ to util/

There's nothing whatsoever compiler-specific about it other than that's
currently where it's used.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Boris Brezillon [Thu, 19 Sep 2019 18:59:46 +0000 (20:59 +0200)]
Revert "panfrost: Rework midgard_pair_load_store() to kill the nested foreach loop"

There's a missing prev_ldst = NULL; assignment in the new logic,
but even with this fixed it seems to regress some applications,
so let's revert the change until we find the real problem.

This reverts commit c9bebae2877e55cdcd94f9f9f3f6805238caeb28.

Caio Marcelo de Oliveira Filho [Wed, 18 Sep 2019 16:04:39 +0000 (09:04 -0700)]
intel/fs: Add Fall-through comment

Reviewed-by: Andres Gomez <agomez@igalia.com>
Samuel Iglesias Gonsálvez [Thu, 19 Sep 2019 10:10:27 +0000 (12:10 +0200)]
nir/algebraic: refactor inexact opcode restrictions

Refactor the code to avoid calling auxiliary functions many times
when it is not really needed.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agodocs: Update bug report URLs for the gitlab migration
Adam Jackson [Wed, 18 Sep 2019 19:59:41 +0000 (15:59 -0400)]
docs: Update bug report URLs for the gitlab migration

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoglx: Remove redundant null check.
Bas Nieuwenhuizen [Thu, 19 Sep 2019 14:50:08 +0000 (16:50 +0200)]
glx: Remove redundant null check.

Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/64
Reviewed-by: Adam Jackson <ajax@redhat.com>
5 years agoiris: Skip double-disabling TCS/TES/GS after BLORP operations
Kenneth Graunke [Thu, 19 Sep 2019 07:33:28 +0000 (00:33 -0700)]
iris: Skip double-disabling TCS/TES/GS after BLORP operations

BLORP always turns off TCS/TES/GS.  If regular drawing also has them
disabled (the overwhelmingly common case), then leaving them disabled
is just fine by us and we can skip dirtying them, as that would just
re-disable them a second time on the next draw.

If they are actually enabled, however, we do need to flag them.
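
As a rough illustration of the rule above (a minimal sketch, not the
driver's code): the stage bits and the helper name below are
hypothetical, and the real driver uses its own dirty flags.

```c
#include <stdint.h>

/* Hypothetical stage bits; the real driver uses its own dirty flags. */
enum {
   SKETCH_STAGE_TCS = 1u << 0,
   SKETCH_STAGE_TES = 1u << 1,
   SKETCH_STAGE_GS  = 1u << 2,
};

/*
 * BLORP leaves TCS/TES/GS disabled.  Only stages that regular drawing
 * has enabled need to be flagged afterwards; stages that are already
 * disabled match what BLORP left behind and can be skipped.
 */
static inline uint32_t
sketch_stages_to_dirty_after_blorp(uint32_t enabled_stages)
{
   const uint32_t blorp_disabled =
      SKETCH_STAGE_TCS | SKETCH_STAGE_TES | SKETCH_STAGE_GS;

   return enabled_stages & blorp_disabled;
}
```

With no tessellation or geometry shaders bound (the common case) the
helper returns 0, so nothing is re-flagged.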

Cuts 52% of the 3DSTATE_HS packets in an Aztec Ruins trace.

5 years ago.mailmap: add an alias for Frank Binns
Erik Faye-Lund [Mon, 16 Sep 2019 17:17:00 +0000 (19:17 +0200)]
.mailmap: add an alias for Frank Binns

Reviewed-by: Frank Binns <frank.binns@imgtec.com>
5 years ago.mailmap: add an alias for Bas Nieuwenhuizen
Erik Faye-Lund [Mon, 16 Sep 2019 16:39:22 +0000 (18:39 +0200)]
.mailmap: add an alias for Bas Nieuwenhuizen

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoanv: fix descriptor limits on gen8
Arcady Goldmints-Orlov [Thu, 12 Sep 2019 19:20:22 +0000 (14:20 -0500)]
anv: fix descriptor limits on gen8

Later generations support bindless for samplers, images, and buffers and
thus per-stage descriptors are not limited by the binding table size.
However, gen8 doesn't support bindless images and thus needs to report a
lower per-stage limit so that all combinations of descriptors that fit
within the advertised limits are reported as supported by
vkGetDescriptorSetLayoutSupport.

Fixes test dEQP-VK.api.maintenance3_check.descriptor_set
Fixes: 79fb0d27f3 ("anv: Implement SSBOs bindings with GPU addresses in the descriptor BO")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agoradv: remove dead shared variables
Daniel Schürmann [Tue, 17 Sep 2019 16:24:06 +0000 (18:24 +0200)]
radv: remove dead shared variables

LLVM does this anyway, but for ACO we need to do it in NIR.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv/aco: enable VK_EXT_shader_demote_to_helper_invocation
Daniel Schürmann [Tue, 17 Sep 2019 15:09:52 +0000 (17:09 +0200)]
radv/aco: enable VK_EXT_shader_demote_to_helper_invocation

For now, this extension will only be enabled for ACO.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: enable clustered reductions
Daniel Schürmann [Tue, 17 Sep 2019 15:07:51 +0000 (17:07 +0200)]
radv: enable clustered reductions

These work with both LLVM and ACO.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv/aco: Setup alternate path in RADV to support the experimental ACO compiler
Daniel Schürmann [Tue, 17 Sep 2019 12:35:22 +0000 (14:35 +0200)]
radv/aco: Setup alternate path in RADV to support the experimental ACO compiler

LLVM remains default and ACO can be enabled with RADV_PERFTEST=aco.

Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Co-authored-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoaco: Initial commit of independent AMD compiler
Daniel Schürmann [Tue, 17 Sep 2019 11:22:17 +0000 (13:22 +0200)]
aco: Initial commit of independent AMD compiler

ACO (short for AMD Compiler) is a new compiler backend with the goal to replace
LLVM for Radeon hardware for the RADV driver.

ACO currently supports only VS, PS and CS on VI and Vega.
There are some optimizations missing because of unmerged NIR changes
which may decrease performance.

Full commit history can be found at
https://github.com/daniel-schuermann/mesa/commits/backend

Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Co-authored-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Co-authored-by: Connor Abbott <cwabbott0@gmail.com>
Co-authored-by: Michael Schellenberger Costa <mschellenbergercosta@googlemail.com>
Co-authored-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoegl: check for NULL value like eglGetSyncAttribKHR does
Tapani Pälli [Mon, 10 Jun 2019 10:06:05 +0000 (13:06 +0300)]
egl: check for NULL value like eglGetSyncAttribKHR does

Commit d1e1563bb63 added a NULL check for eglGetSyncAttribKHR,
but eglGetSyncAttrib does not do this. This patch adds the same
check to eglGetSyncAttrib.

Fixes crashes in (when exposing EGL 1.5):
   dEQP-EGL.functional.fence_sync.invalid.get_invalid_value

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cc: mesa-stable@lists.freedesktop.org
5 years agoiris: Rework iris_update_draw_parameters to be more efficient
Kenneth Graunke [Thu, 19 Sep 2019 03:32:36 +0000 (20:32 -0700)]
iris: Rework iris_update_draw_parameters to be more efficient

This improves a couple of things:

1. We now only update anything if the shader actually cares.

   Previously, is_indexed_draw was causing us to flag dirty vertex
   buffers, elements, and SGVs every time the shader switched between
   indexed and non-indexed draws.  This is a very common situation,
   but we only need that information if the shader uses gl_BaseVertex.

   We were also flagging things when switching between indirect and
   direct draws; now we only bother if it matters.

2. We upload new draw parameters only when necessary.

   When we detect that the draw parameters have changed, we upload a
   new copy, and use that.  Previously we were uploading it every time
   the vertex buffers were dirty (for possibly unrelated reasons) and
   the shader needed that info.  Tying these together also makes the
   code a bit easier to follow.

In Civilization VI's benchmark, this code was flagging dirty state
many times per frame (49 average, 16 median, 614 maximum).  Now it
occurs exactly once for the entire run.

5 years agoiris: Use state_refs for draw parameters.
Kenneth Graunke [Thu, 19 Sep 2019 03:12:33 +0000 (20:12 -0700)]
iris: Use state_refs for draw parameters.

iris_state_ref is a <resource, offset> tuple, which is exactly what we
need here.

5 years agoutil/disk_cache: make use of the total job size limiting feature
Timothy Arceri [Tue, 3 Sep 2019 04:22:50 +0000 (14:22 +1000)]
util/disk_cache: make use of the total job size limiting feature

This makes use of the total job size limiting feature added in the
previous patch.

The idea is to avoid an excessive build up in memory use due to the
use of both the UTIL_QUEUE_INIT_RESIZE_IF_FULL and
UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY flags.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoutil/u_queue: track job size and limit the size of queue growth
Timothy Arceri [Tue, 3 Sep 2019 03:05:08 +0000 (13:05 +1000)]
util/u_queue: track job size and limit the size of queue growth

When both UTIL_QUEUE_INIT_RESIZE_IF_FULL and
UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY are set, we can get into a
situation where the queue never executes and grows to a huge size
due to all other threads being busy.

This is the case with the shader cache when attempting to compile a
huge number of shaders up front. If all threads are busy compiling
shaders, the cache queue's memory use can climb into the many GBs
very fast.

The use of these two flags with the shader cache is intended to
allow shaders compiled at runtime to be compiled as fast as possible.
To avoid huge memory use but still allow the queue to perform
optimally in the run time compilation case, we now add the ability
to track memory consumed by the jobs in the queue and limit it to
a hardcoded 256MB which should be more than enough.
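
The limiting idea can be sketched roughly as follows. This is only an
illustration under assumptions: the struct, field and function names are
hypothetical and are not the util_queue API; only the 256MB figure comes
from the text above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; only the 256MB cap is taken from the commit text. */
#define SKETCH_MAX_TOTAL_JOB_BYTES ((size_t)256 * 1024 * 1024)

struct sketch_queue {
   unsigned num_queued;      /* jobs currently waiting */
   unsigned max_jobs;        /* current capacity, assumed >= 1 */
   size_t total_job_bytes;   /* memory held by waiting jobs */
};

/* Try to enqueue a job.  A full queue may only grow while the tracked
 * job memory stays within the cap; otherwise the caller has to wait. */
static inline bool
sketch_queue_add(struct sketch_queue *q, size_t job_bytes)
{
   if (q->num_queued == q->max_jobs) {
      if (q->total_job_bytes >= SKETCH_MAX_TOTAL_JOB_BYTES ||
          job_bytes > SKETCH_MAX_TOTAL_JOB_BYTES - q->total_job_bytes)
         return false;
      q->max_jobs *= 2;
   }

   q->num_queued++;
   q->total_job_bytes += job_bytes;
   return true;
}

/* Called when a job has been executed and its memory released. */
static inline void
sketch_queue_job_done(struct sketch_queue *q, size_t job_bytes)
{
   q->num_queued--;
   q->total_job_bytes -= job_bytes;
}
```

The size check only applies when the queue would otherwise have to
grow, so the run time compilation case is not slowed down.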

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoutil/disk_cache: bump thread count assigned to disk cache queue
Timothy Arceri [Tue, 3 Sep 2019 04:13:05 +0000 (14:13 +1000)]
util/disk_cache: bump thread count assigned to disk cache queue

Since we set the UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY flag this should
have little impact on low core systems. However, just about all modern
CPUs currently available that run Mesa have *at least* 4 cores. For
these CPUs allowing more threads can result in the queue being
processed faster and avoid excessive memory use due to a backlog of
cache entries building up in the queue.

This change helps avoid a huge build up of cache entries in the queue
due to using both the UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY and
UTIL_QUEUE_INIT_RESIZE_IF_FULL flags.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agointel/fs: fix SHADER_OPCODE_CLUSTER_BROADCAST for SIMD32
Paulo Zanoni [Wed, 4 Sep 2019 22:07:20 +0000 (15:07 -0700)]
intel/fs: fix SHADER_OPCODE_CLUSTER_BROADCAST for SIMD32

The current code can create functions with a width of 32, which is not
supported by our hardware. Add some code to simplify how we express
what we want and prevent such cases.

For some unknown reason, all the tests I could run seem to work even
with these unsupported MOVs.

Fixes: b0858c1cc6 "intel/fs: Add a couple of simple helper opcodes"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
5 years agointel/fs: the maximum supported stride width is 16
Paulo Zanoni [Sat, 31 Aug 2019 00:16:28 +0000 (17:16 -0700)]
intel/fs: the maximum supported stride width is 16

There are cases where we try to generate registers with a stride of
32, while the hardware maximum is just 16. This happens, for example,
when using 8 bit integers on SIMD32. This results in a crash because
the variable 'width' has a value of 32:

../../src/intel/compiler/brw_reg.h:550: brw_reg brw_vecn_reg(unsigned
int, brw_reg_file, unsigned int, unsigned int): Assertion `!"Invalid
register width"' failed.

This change prevents the crash and makes the tests pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
5 years agointel/fs: roll the loop with the <0,1,0> additions in emit_scan()
Paulo Zanoni [Sat, 24 Aug 2019 00:15:27 +0000 (17:15 -0700)]
intel/fs: roll the loop with the <0,1,0> additions in emit_scan()

IMHO the code is easier to understand this way, being explicit that
we're doing exactly the same thing every time.

No functional changes.

v2: Adjust the loop breaking condition (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
5 years agointel/fs: make scan/reduce work with SIMD32 when it fits 2 registers
Paulo Zanoni [Fri, 9 Aug 2019 22:40:33 +0000 (15:40 -0700)]
intel/fs: make scan/reduce work with SIMD32 when it fits 2 registers

When dealing with uint16_t and uint8_t on SIMD32 we can do all the
operations using just 2 registers, so we don't hit the recursion at
the beginning of emit_scan(). Because of that, we need to actually
compute scan/reduce for channels 31:16.

v2: Still missed instructions (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
5 years agofreedreno/regs: A couple of tess updates
Kristian H. Kristensen [Wed, 18 Sep 2019 20:09:50 +0000 (13:09 -0700)]
freedreno/regs: A couple of tess updates

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/regs: Fix CP_DRAW_INDX_OFFSET command
Kristian H. Kristensen [Wed, 18 Sep 2019 20:08:55 +0000 (13:08 -0700)]
freedreno/regs: Fix CP_DRAW_INDX_OFFSET command

On A5xx+ the INDX_BASE pointer is 64 bit.

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/a6xx: Write multiple regs for SP_VS_OUT_REG and SP_VS_VPC_DST_REG
Kristian H. Kristensen [Mon, 10 Jun 2019 19:04:21 +0000 (12:04 -0700)]
freedreno/a6xx: Write multiple regs for SP_VS_OUT_REG and SP_VS_VPC_DST_REG

Compute the number of writes up front.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/a6xx: Turn on vectorize_io
Kristian H. Kristensen [Fri, 12 Jul 2019 19:36:45 +0000 (12:36 -0700)]
freedreno/a6xx: Turn on vectorize_io

We want this for tessellation eventually, but we can turn it on now.

Shader-db results:

total instructions in shared programs: 8612905 -> 8611387 (-0.02%)
instructions in affected programs: 164952 -> 163434 (-0.92%)

total dwords in shared programs: 11952000 -> 11950560 (-0.01%)
dwords in affected programs: 68096 -> 66656 (-2.11%)

total full in shared programs: 315019 -> 315009 (<.01%)
full in affected programs: 1642 -> 1632 (-0.61%)

total constlen in shared programs: 2463654 -> 2463654 (0.00%)
constlen in affected programs: 0 -> 0

total (ss) in shared programs: 152379 -> 152409 (0.02%)
(ss) in affected programs: 1503 -> 1533 (2.00%)

total (sy) in shared programs: 96473 -> 96525 (0.05%)
(sy) in affected programs: 654 -> 706 (7.95%)

total max_sun in shared programs: 1172454 -> 1172472 (<.01%)
max_sun in affected programs: 104 -> 122 (17.31%)

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/a6xx: Share shader state constructor and destructor
Kristian H. Kristensen [Mon, 10 Jun 2019 19:04:21 +0000 (12:04 -0700)]
freedreno/a6xx: Share shader state constructor and destructor

Also, swap the vs and fs constructors so fs comes first.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/a6xx: Track location of gl_Position out as we link it
Kristian H. Kristensen [Fri, 13 Sep 2019 22:20:05 +0000 (15:20 -0700)]
freedreno/a6xx: Track location of gl_Position out as we link it

When using xfb and rasterizing, the fragment shader may have fewer
inputs than the vertex shader outputs. We can't rely on gl_Position to
be placed at fs->total_in, but have to instead remember where we add
it in the link map and use that location.

Fixes 100+ tessellation dEQPs under

  dEQP-GLES31.functional.tessellation.primitive_discard.*
  dEQP-GLES31.functional.tessellation.user_defined_io.*

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agospirv: Add missing break for capability handling
Caio Marcelo de Oliveira Filho [Wed, 18 Sep 2019 15:57:15 +0000 (08:57 -0700)]
spirv: Add missing break for capability handling

Newly added cases "stole" the previous break.
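
The shape of the bug, as a minimal sketch; the capability names and
flag bits here are made up for illustration and are not the SPIR-V
enums or the vtn code.

```c
/* Hypothetical capability names and flag bits, for illustration only. */
enum sketch_cap { SKETCH_CAP_OLD, SKETCH_CAP_NEW };

#define SKETCH_CHECKED_OLD 0x1
#define SKETCH_CHECKED_NEW 0x2

/* Buggy shape: the new case was inserted between the old case's body
 * and its break, so SKETCH_CAP_OLD also runs the new case's checks. */
static inline int
sketch_checks_buggy(enum sketch_cap cap)
{
   int checks = 0;

   switch (cap) {
   case SKETCH_CAP_OLD:
      checks |= SKETCH_CHECKED_OLD;
      /* falls through - the break that belonged here was "stolen" */
   case SKETCH_CAP_NEW:
      checks |= SKETCH_CHECKED_NEW;
      break;
   }
   return checks;
}

/* Fixed shape: every case ends with its own break. */
static inline int
sketch_checks_fixed(enum sketch_cap cap)
{
   int checks = 0;

   switch (cap) {
   case SKETCH_CAP_OLD:
      checks |= SKETCH_CHECKED_OLD;
      break;
   case SKETCH_CAP_NEW:
      checks |= SKETCH_CHECKED_NEW;
      break;
   }
   return checks;
}
```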

Fixes: 420ad0a1a3d ("spirv: check support for SPV_KHR_float_controls capabilities")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoiris: Avoid uploading SURFACE_STATE descriptors for UBOs if possible
Kenneth Graunke [Sun, 15 Sep 2019 06:18:20 +0000 (23:18 -0700)]
iris: Avoid uploading SURFACE_STATE descriptors for UBOs if possible

If we can entirely push uniform data, we don't need a SURFACE_STATE
descriptor for pulling data.  Since constant uploads are a very common
operation, and being able to push all data is also very common, we would
like to avoid the overhead in this case.

This patch defers uploading new descriptors.  Instead of handling that
at iris_set_constant_buffer, we do it at iris_update_compiled_shaders,
where we can see the currently bound shader variants.  If any need pull
descriptors, and descriptors are missing, we update them and flag that
the binding table also needs to be refreshed.
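
Roughly, the deferred check could look like the sketch below. All
struct, field and function names are hypothetical stand-ins (including
the per-variant "has pull loads" flag), and the actual upload is
reduced to a comment.

```c
#include <stdbool.h>

#define SKETCH_NUM_STAGES 6

/* Assumed per-variant flag: does the compiled shader pull any UBO data? */
struct sketch_shader {
   bool has_ubo_pull;
};

struct sketch_stage_state {
   const struct sketch_shader *shader; /* NULL if nothing is bound */
   bool ubo_descriptors_valid;         /* pull descriptors already uploaded? */
};

/* Upload pull descriptors only for stages whose bound variant needs
 * them and does not have them yet.  Returns true if the binding table
 * needs to be refreshed as a result. */
static inline bool
sketch_update_pull_descriptors(struct sketch_stage_state stages[SKETCH_NUM_STAGES])
{
   bool bindings_dirty = false;

   for (int i = 0; i < SKETCH_NUM_STAGES; i++) {
      struct sketch_stage_state *s = &stages[i];

      if (!s->shader || !s->shader->has_ubo_pull)
         continue; /* everything is pushed: no descriptor needed */
      if (s->ubo_descriptors_valid)
         continue; /* already uploaded earlier */

      /* the SURFACE_STATE upload would happen here */
      s->ubo_descriptors_valid = true;
      bindings_dirty = true;
   }
   return bindings_dirty;
}
```

In the common "everything pushed" case the loop does no uploads and
reports that the binding table is unchanged.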

Improves performance in GFXBench5 gl_driver2 on an i7-6770HQ by
31.9774% +/- 1.12947% (n=15).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agointel/compiler: Record whether any pull constant loads occur
Kenneth Graunke [Tue, 10 Sep 2019 05:21:17 +0000 (22:21 -0700)]
intel/compiler: Record whether any pull constant loads occur

I would like for iris to be able to avoid setting up SURFACE_STATE
for UBOs in the common case where all constants are pushed.

Unfortunately, we don't know up front whether everything will be
pushed: the backend is allowed to demote pushed UBOs to pull loads
fairly late in the process.  This is probably desirable though, as
we'd like the backend to be able to re-pull pushed data to break up
long live ranges in response to register pressure.

Here we simply add a "are there any pull loads at all" boolean to
prog_data, which is a bit crude but at least allows us to skip work
in the common "everything pushed" case.  We could skip more work by
tracking exactly which UBO surfaces are pulled in a bitmask, but I
wanted to avoid bringing back the old mark_surface_used() mechanism.

Finer-grained tracking could allow us to skip a bit more work when
multiple UBOs are in use and /some/ are 100% pushed, but others are
accessed via pulls.  However, I'm not sure how common this is and
it would save at most 4 pull descriptors, so we defer that for now.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Track per-stage bind history, reduce work accordingly
Kenneth Graunke [Tue, 10 Sep 2019 18:14:57 +0000 (11:14 -0700)]
iris: Track per-stage bind history, reduce work accordingly

We now track per-stage bind history for constant and shader buffers,
shader images, and sampler views by adding an extra res->bind_stages
field to go with res->bind_history.

This lets us flag IRIS_DIRTY_CONSTANTS for only the specific stages
involved, and also skip some CPU overhead in iris_rebind_buffer.
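
A minimal sketch of the per-stage flagging, assuming one dirty bit per
shader stage laid out contiguously. The stage count, the shift value
and the helper name are hypothetical; only the idea of a bind_stages
mask comes from the text above.

```c
#include <stdint.h>

#define SKETCH_NUM_STAGES 6
/* Hypothetical position of the first per-stage "constants dirty" bit. */
#define SKETCH_DIRTY_CONSTANTS_SHIFT 8

/* Turn a resource's per-stage bind history (one bit per stage) into
 * the matching per-stage "constants dirty" bits, leaving stages that
 * never bound the resource untouched. */
static inline uint64_t
sketch_dirty_constants_for(uint32_t bind_stages)
{
   const uint32_t stage_mask = (1u << SKETCH_NUM_STAGES) - 1;

   return (uint64_t)(bind_stages & stage_mask) << SKETCH_DIRTY_CONSTANTS_SHIFT;
}
```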

Cuts 4% of 3DSTATE_CONSTANT_XS packets in a Shadow of Mordor trace
on Icelake.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Don't flag IRIS_DIRTY_BINDINGS for constant usage history
Kenneth Graunke [Tue, 10 Sep 2019 10:28:59 +0000 (03:28 -0700)]
iris: Don't flag IRIS_DIRTY_BINDINGS for constant usage history

The underlying buffer isn't changing - so we don't need to update any
SURFACE_STATE descriptors - we just might have new constants, meaning
we need to re-emit 3DSTATE_CONSTANT_XS.  On Gen9, this means we need
to update 3DSTATE_BINDING_TABLE_POINTERS_XS too, but that's now handled
by the explicit check in the previous patch.

On Gen9, this should cause us to re-emit the binding table /pointer/ on
writing to a buffer with PIPE_BIND_CONSTANT_BUFFER, rather than emitting
a whole new /table/.

On Gen8 and Gen11, this avoids binding table churn altogether.

Cuts 61% of 3DSTATE_BINDING_TABLE_POINTERS_XS packets in a Shadow of
Mordor trace on Icelake.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Explicitly emit 3DSTATE_BTP_XS on Gen9 with DIRTY_CONSTANTS_XS
Kenneth Graunke [Tue, 10 Sep 2019 10:08:46 +0000 (03:08 -0700)]
iris: Explicitly emit 3DSTATE_BTP_XS on Gen9 with DIRTY_CONSTANTS_XS

Right now, we usually flag both IRIS_DIRTY_{CONSTANTS,BINDINGS}_XS,
because we have SURFACE_STATE for constant buffers in case the shaders
access them via pull mode.

But this flagging is overkill in many cases.  Gen8 and Gen11 don't need
it at all.  Gen9 doesn't need that large of a hammer in all cases.

Just handle it explicitly so the right thing happens.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Flag IRIS_DIRTY_BINDINGS_XS on constant buffer rebinds
Kenneth Graunke [Tue, 10 Sep 2019 19:10:26 +0000 (12:10 -0700)]
iris: Flag IRIS_DIRTY_BINDINGS_XS on constant buffer rebinds

We upload a new SURFACE_STATE for the UBO/SSBO in question, which
means that we need new binding tables as well.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoradv: Add DFSM support.
Bas Nieuwenhuizen [Sun, 15 Sep 2019 12:39:42 +0000 (14:39 +0200)]
radv: Add DFSM support.

Apparently we already enabled it without having support ...

Not sure if we also need to set disable_start_of_prim when the PS
has memory writes, but this mirrors radeonsi.

Doubles fillrate in my dual_quad_bench from ~16 pixels/cycle to
~32 pixels/cycle on a Raven.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradv: Disable dfsm by default even on Raven.
Bas Nieuwenhuizen [Sun, 15 Sep 2019 13:57:52 +0000 (15:57 +0200)]
radv: Disable dfsm by default even on Raven.

When actually implementing it, Talos on low is still 3% slower.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradv: Only break batch on framebuffer change with dfsm.
Bas Nieuwenhuizen [Sun, 15 Sep 2019 11:36:58 +0000 (13:36 +0200)]
radv: Only break batch on framebuffer change with dfsm.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agonir/opt_if: Fix undef handling in opt_split_alu_of_phi()
Connor Abbott [Wed, 28 Aug 2019 14:56:57 +0000 (16:56 +0200)]
nir/opt_if: Fix undef handling in opt_split_alu_of_phi()

The pass assumed that "Most ALU ops produce an undefined result if any
source is undef" which is completely untrue. Due to how we lower if
statements to selects and then optimize on those selects later, we
simply cannot make that assumption. In particular this pass tried to
replace an ior of undef and true, which had been generated by
optimizing a select which itself came from flattening an if statement,
to undef causing a miscompilation for a CTS test with radeonsi NIR.
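
To see why the assumption fails for ior, here is a small sketch that
models an undef boolean as "any value" and checks both possibilities.
The helper names are made up; this is not NIR code.

```c
#include <stdbool.h>

/* ior(undef, true): whatever value the undef source takes, the result
 * is true, so folding the whole expression to undef is not valid. */
static inline bool
sketch_ior_with_true_is_always_true(void)
{
   for (int u = 0; u <= 1; u++) {
      bool undef_value = (u != 0);

      if ((undef_value || true) != true)
         return false;
   }
   return true;
}

/* iand(undef, true), by contrast, does depend on the undef source:
 * the two possible values give two different results. */
static inline bool
sketch_iand_with_true_depends_on_undef(void)
{
   bool with_false = (false && true);
   bool with_true = (true && true);

   return with_false != with_true;
}
```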

We fix this by always doing what the non-undef path did, i.e. duplicate
the instruction twice. If there are cases where the instruction before
the loop can be folded away due to having an undef source, we should add
these to opt_undef instead.

The comment above the pass says that if the phi source from before the
loop is undef, and we can fold the instruction before the loop to undef,
then we can ignore sources of the original instruction that don't
dominate the block before the loop because we don't need them to create
the instruction before the loop. This is incorrect, because the
instruction at the bottom of the loop would get those sources from the
wrong loop iteration. The code never actually did what the comment said,
so we only have to update the comment to match what the pass actually
does. We also update the example to more closely match what most actual
loops look like after vtn and peephole_select.

There are no shader-db changes with i965, radeonsi NIR, or radv. With
anv and my vkpipeline-db there's only one change:

total instructions in shared programs: 14125290 -> 14125300 (<.01%)
instructions in affected programs: 2598 -> 2608 (0.38%)
helped: 0
HURT: 1

total cycles in shared programs: 2051473437 -> 2051473397 (<.01%)
cycles in affected programs: 36697 -> 36657 (-0.11%)
helped: 1
HURT: 0

Fixes
KHR-GL45.shader_subroutine.control_flow_and_returned_subroutine_values_used_as_subroutine_input
with radeonsi NIR.

5 years agogl: drop incorrect pkg-config file for glvnd
Eric Engestrom [Wed, 18 Sep 2019 20:48:49 +0000 (21:48 +0100)]
gl: drop incorrect pkg-config file for glvnd

Akin to 1a25980c469b38d2c645 ("egl: drop incorrect pkg-config file for
glvnd") and b01524fff05eef66e8cd ("meson: don't build libGLES*.so with
GLVND"), removes a pkg-config file that shouldn't have been there in
the first place, but was needed because of that GLVND bug.

Now that the glvnd bug has been fixed, it became apparent that this
gl.pc pkg-config file was never removed, so let's do just that :)

Suggested-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agodocs: Add the maximum implemented Vulkan API version in 19.3 rel notes
Andres Gomez [Wed, 18 Sep 2019 09:44:47 +0000 (12:44 +0300)]
docs: Add the maximum implemented Vulkan API version in 19.3 rel notes

Currently, Vulkan 1.1.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
5 years agodocs: Add the maximum implemented Vulkan API version in 19.2 rel notes
Andres Gomez [Wed, 18 Sep 2019 09:44:13 +0000 (12:44 +0300)]
docs: Add the maximum implemented Vulkan API version in 19.2 rel notes

Currently, Vulkan 1.1.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>