platform/upstream/mesa.git
5 years agoradeonsi: move color clamping to si_llvm_export_vs to unify the code
Marek Olšák [Mon, 3 Jun 2019 18:51:08 +0000 (14:51 -0400)]
radeonsi: move color clamping to si_llvm_export_vs to unify the code

5 years agoradeonsi: use the ac helper for index buffer stores in the culling shader
Marek Olšák [Mon, 3 Jun 2019 23:43:44 +0000 (19:43 -0400)]
radeonsi: use the ac helper for index buffer stores in the culling shader

5 years agoradeonsi: use the ac helper for image stores
Marek Olšák [Mon, 3 Jun 2019 20:35:37 +0000 (16:35 -0400)]
radeonsi: use the ac helper for image stores

5 years agoradeonsi: use the ac helper for SSBO stores
Marek Olšák [Fri, 24 May 2019 22:56:05 +0000 (18:56 -0400)]
radeonsi: use the ac helper for SSBO stores

5 years agoradeonsi: fixes for vec3 buffer stores in LLVM 9
Marek Olšák [Mon, 3 Jun 2019 22:11:27 +0000 (18:11 -0400)]
radeonsi: fixes for vec3 buffer stores in LLVM 9

5 years agoiris: Enable PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED
Caio Marcelo de Oliveira Filho [Mon, 10 Jun 2019 03:58:08 +0000 (20:58 -0700)]
iris: Enable PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED

This avoids lowering of CS system values by GLSL (configured by state
tracker).  In i965 we don't use that lowering, and we also shouldn't
need that in Iris.

Using it cause some unnecessary round trip between values, e.g.:
shader uses gl_LocalInvocationIndex, GLSL rewrites it in terms of
gl_LocalInvocationID, then driver rewrites those in terms of
gl_LocalInvocationIndex again.  Copy propagation can make some of
those go away, but not all as seen below.

Intel SKL shader-db results:

    total instructions in shared programs: 15595189 -> 15594556 (<.01%)
    instructions in affected programs: 74880 -> 74247 (-0.85%)
    helped: 81
    HURT: 4
    helped stats (abs) min: 2 max: 172 x̄: 7.88 x̃: 4
    helped stats (rel) min: 0.19% max: 5.66% x̄: 1.71% x̃: 1.23%
    HURT stats (abs)   min: 1 max: 2 x̄: 1.25 x̃: 1
    HURT stats (rel)   min: 0.45% max: 1.65% x̄: 0.76% x̃: 0.46%
    95% mean confidence interval for instructions value: -11.56 -3.34
    95% mean confidence interval for instructions %-change: -1.91% -1.28%
    Instructions are helped.

    total loops in shared programs: 4831 -> 4831 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 372136618 -> 372145628 (<.01%)
    cycles in affected programs: 9218230 -> 9227240 (0.10%)
    helped: 131
    HURT: 86
    helped stats (abs) min: 1 max: 798 x̄: 39.79 x̃: 12
    helped stats (rel) min: <.01% max: 6.75% x̄: 0.42% x̃: 0.13%
    HURT stats (abs)   min: 2 max: 2442 x̄: 165.38 x̃: 6
    HURT stats (rel)   min: <.01% max: 20.83% x̄: 0.74% x̃: 0.12%
    95% mean confidence interval for cycles value: -2.07 85.11
    95% mean confidence interval for cycles %-change: -0.22% 0.30%
    Inconclusive result (value mean confidence interval includes 0).

    total spills in shared programs: 11956 -> 11950 (-0.05%)
    spills in affected programs: 77 -> 71 (-7.79%)
    helped: 3
    HURT: 0

    total fills in shared programs: 25619 -> 25549 (-0.27%)
    fills in affected programs: 593 -> 523 (-11.80%)
    helped: 4
    HURT: 0

    LOST:   0
    GAINED: 0

    Total CPU time (seconds): 1695.69 -> 1706.03 (0.61%)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agogallium: Add PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED
Caio Marcelo de Oliveira Filho [Mon, 10 Jun 2019 03:56:09 +0000 (20:56 -0700)]
gallium: Add PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED

Tells whether or not the driver can handle gl_LocalInvocationIndex and
gl_GlobalInvocationID.  If not supported (the default), state tracker
will lower those on behalf of the driver.

v2: Add case to u_screen.c.  (Anholt)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agost/glsl: Perform some var optimizations
Caio Marcelo de Oliveira Filho [Sun, 9 Jun 2019 17:25:07 +0000 (10:25 -0700)]
st/glsl: Perform some var optimizations

Perform those before some derefs are gone when we lower the buffers
after the st_nir_opts() call.

Intel SKL shader-db results:

    total instructions in shared programs: 15593685 -> 15590708 (-0.02%)
    instructions in affected programs: 378078 -> 375101 (-0.79%)
    helped: 777
    HURT: 44
    helped stats (abs) min: 1 max: 68 x̄: 4.07 x̃: 4
    helped stats (rel) min: 0.04% max: 31.58% x̄: 2.88% x̃: 1.37%
    HURT stats (abs)   min: 1 max: 24 x̄: 4.20 x̃: 2
    HURT stats (rel)   min: 0.17% max: 8.00% x̄: 1.60% x̃: 1.27%
    95% mean confidence interval for instructions value: -4.02 -3.23
    95% mean confidence interval for instructions %-change: -2.93% -2.35%
    Instructions are helped.

    total loops in shared programs: 4815 -> 4815 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 371965528 -> 371788566 (-0.05%)
    cycles in affected programs: 184190307 -> 184013345 (-0.10%)
    helped: 3650
    HURT: 2855
    helped stats (abs) min: 1 max: 59400 x̄: 99.45 x̃: 15
    helped stats (rel) min: <.01% max: 43.18% x̄: 2.60% x̃: 1.02%
    HURT stats (abs)   min: 1 max: 16362 x̄: 65.16 x̃: 10
    HURT stats (rel)   min: <.01% max: 66.22% x̄: 2.78% x̃: 0.81%
    95% mean confidence interval for cycles value: -53.73 -0.68
    95% mean confidence interval for cycles %-change: -0.39% -0.08%
    Cycles are helped.

    total spills in shared programs: 11936 -> 11956 (0.17%)
    spills in affected programs: 443 -> 463 (4.51%)
    helped: 0
    HURT: 8

    total fills in shared programs: 25644 -> 25619 (-0.10%)
    fills in affected programs: 2306 -> 2281 (-1.08%)
    helped: 24
    HURT: 2

    LOST:   7
    GAINED: 16

    Total CPU time (seconds): 1679.04 -> 1695.69 (0.99%)

shader-db results radeonsi (VEGA64):

    Totals from affected shaders:
    SGPRS: 180160 -> 179552 (-0.34 %)
    VGPRS: 115368 -> 114544 (-0.71 %)
    Spilled SGPRs: 5627 -> 5603 (-0.43 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 7808364 -> 7803268 (-0.07 %) bytes
    LDS: 192 -> 192 (0.00 %) blocks
    Max Waves: 19202 -> 19340 (0.72 %)
    Wait states: 0 -> 0 (0.00 %)

Radeonsi results provided by Timothy.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoanv/cmd_buffer: Reuse gen8 Cmd{Set, Reset}Event on gen7
Ville Syrjälä [Mon, 10 Jun 2019 11:21:05 +0000 (14:21 +0300)]
anv/cmd_buffer: Reuse gen8 Cmd{Set, Reset}Event on gen7

Modern DXVK requires event support [1], but looks like it only
uses vkCmdSetEvent() + vkGetEventStatus(). So we can just
borrow the relevant code from gen8, leaving CmdWaitEvents still
unimplemented.

[1] https://github.com/doitsujin/dxvk/commit/8c3900c533d83d12c970b905183d17a1d3e8df1f

v2: Also move CmdWaitEvents into genX_cmd_buffer.c (Jason)

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agointel/fs: Mark source 0 of bcsel as needing Boolean resolve
Ian Romanick [Fri, 7 Jun 2019 22:45:03 +0000 (15:45 -0700)]
intel/fs: Mark source 0 of bcsel as needing Boolean resolve

The other sources of the bcsel behave like the sources of an and or
other logical operation.  However, source zero behaves differently.
It is evaluated as a Boolean, so it needs to be resolved.

No shader-db changes, but the tests mentioned in the bug get a couple
instructions added back.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110857
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agofreedreno/a5xx: enable a540
Rob Clark [Thu, 6 Jun 2019 14:43:19 +0000 (07:43 -0700)]
freedreno/a5xx: enable a540

Tested-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/a6xx: enable UBWC by default
Rob Clark [Mon, 10 Jun 2019 23:12:12 +0000 (16:12 -0700)]
freedreno/a6xx: enable UBWC by default

Flip the FD_MESA_DEBUG flag to a disable rather than enable, drop the
obsolete comment (and bonus, drop unused softpin debug flag)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/a6xx: disallow UBWC for z24s8
Rob Clark [Mon, 10 Jun 2019 22:57:43 +0000 (15:57 -0700)]
freedreno/a6xx: disallow UBWC for z24s8

This is slightly annoying because it *mostly* works.. but we have some
issues to sort out about how to blit z24s8/x24s8/z24x8 with UBWC before
we can enable UBWC by default.  For now it is a step forward to at least
enable it for non-z/s while we figure out how to blit z24s8+UBWC.

(The basic issue is that pretending z24s8 is an equivalently sized rgba
format for the purpose of blitting falls apart when UBWC is in the
picture.)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/a6xx: use correct UBWC reg builders
Rob Clark [Mon, 10 Jun 2019 13:43:52 +0000 (06:43 -0700)]
freedreno/a6xx: use correct UBWC reg builders

No functional change, the registers have the same layout as MRT flags
pitch reg.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno: update generated headers
Rob Clark [Mon, 10 Jun 2019 13:36:31 +0000 (06:36 -0700)]
freedreno: update generated headers

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/a6xx: disable UBWC for some formats
Rob Clark [Fri, 7 Jun 2019 19:14:10 +0000 (12:14 -0700)]
freedreno/a6xx: disable UBWC for some formats

An older blob claims to support UBWC w/ r32ui an r32i, but not r32f.
Results from deqp indicate that it doesn't work with r32ui and r32i.

This *could* also just mean that use as "IBO" (image) is more limited
than as texture, although blob also doesn't seem to bother to try to use
UBWC with images at all, so hard to know for sure.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/a6xx: handle non-UWC-compatible image views
Rob Clark [Fri, 7 Jun 2019 18:00:56 +0000 (11:00 -0700)]
freedreno/a6xx: handle non-UWC-compatible image views

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/a6xx: handle non-UBWC-compatible texture views
Rob Clark [Fri, 7 Jun 2019 17:31:59 +0000 (10:31 -0700)]
freedreno/a6xx: handle non-UBWC-compatible texture views

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno: add helper to uncompress UBWC resource
Rob Clark [Fri, 7 Jun 2019 17:14:12 +0000 (10:14 -0700)]
freedreno: add helper to uncompress UBWC resource

We'll need this for a few edge cases, like image/sampler view that uses
a format that UBWC does not support with a resource originally created
in a format that UBWC does support.

NOTE we *could* in some cases do an in-place uncompress.  But that has
a couple potential sharp edges:

 1) the uncompressed buffer could have different layout, ie. a5xx
    with meta and pixel data of layers/levels interleaved.

 2) if it comes mid-batch, it would force flush, or somehow fixing
    up cmdstream for draws already emitted.  But with the resource
    shadowing approach we can rely on batch re-ordering to avoid
    splitting things.. older draws see the older compressed version,
    newer draws see the new uncompressed version of the rsc.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno: handle images in rebind_resource()
Rob Clark [Fri, 7 Jun 2019 18:20:11 +0000 (11:20 -0700)]
freedreno: handle images in rebind_resource()

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno: allow null discard box in shadow path
Rob Clark [Fri, 7 Jun 2019 16:39:30 +0000 (09:39 -0700)]
freedreno: allow null discard box in shadow path

When uncompressing a UBWC buffer, we don't want to discard anything.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno: swap UBWC state in shadow path
Rob Clark [Fri, 7 Jun 2019 16:29:53 +0000 (09:29 -0700)]
freedreno: swap UBWC state in shadow path

It doesn't come up yet, as so far we only hit this path with linear
buffers.  But it will when we start re-using the shadow path for
uncompressing UBWC buffers.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno: add modifier param to fd_try_shadow_resource()
Rob Clark [Fri, 7 Jun 2019 16:23:16 +0000 (09:23 -0700)]
freedreno: add modifier param to fd_try_shadow_resource()

To uncompress UBWC, I want to re-use the shadow path, but we'll need a
way to request that the new buffer is not compressed.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno: correct modifier for UBWC buffers
Rob Clark [Fri, 7 Jun 2019 16:12:52 +0000 (09:12 -0700)]
freedreno: correct modifier for UBWC buffers

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agovirgl: consider newly created resources idle
Chia-I Wu [Thu, 6 Jun 2019 17:55:59 +0000 (10:55 -0700)]
virgl: consider newly created resources idle

A newly created resource can be regarded as idle.  We don't care if
the RESOURCE_CREATE command has been retired, unless it is used for
fencing.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
5 years agovirgl: make resource_wait/resource_is_busy cheaper
Chia-I Wu [Fri, 10 May 2019 18:56:46 +0000 (11:56 -0700)]
virgl: make resource_wait/resource_is_busy cheaper

The round trip to the kernel is expensive.  Add a local cache to
avoid it when possible.

There is a race condition when two contexts access the same resource
at the same time (e.g., ctx1 submits a cmdbuf that accesses a
resource while ctx2 maps the resource).  But that is probably an app
bug in the first place.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
5 years agovirgl: add virgl_drm_{alloc,free,clear}_res_list
Chia-I Wu [Mon, 10 Jun 2019 23:05:48 +0000 (16:05 -0700)]
virgl: add virgl_drm_{alloc,free,clear}_res_list

Helpers to work with resource list.  virgl_drm_release_all_res is
removed.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
5 years agovirgl: do not cache external resources
Chia-I Wu [Thu, 6 Jun 2019 21:58:39 +0000 (14:58 -0700)]
virgl: do not cache external resources

We should not reuse a resource for other purposes when it can still
be accessed by another process or device.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
5 years agopanfrost: Enable AFBC on depth/stencil
Alyssa Rosenzweig [Mon, 10 Jun 2019 15:04:10 +0000 (08:04 -0700)]
panfrost: Enable AFBC on depth/stencil

This seems to be a performance win, but more rigorous testing is
necessary to figure out the exact circumstances when this is good/bad.
Incidentally, this fixes non-aligned ZS.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Linear depth/stencil should be aligned
Alyssa Rosenzweig [Mon, 10 Jun 2019 14:43:41 +0000 (07:43 -0700)]
panfrost: Linear depth/stencil should be aligned

We might render to it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard: Decode LOD/bias registers
Alyssa Rosenzweig [Tue, 11 Jun 2019 14:41:09 +0000 (07:41 -0700)]
panfrost/midgard: Decode LOD/bias registers

For constant LODs/biases, we can use an immediate embedded in the
texture (already decoded); for non-constant, we have to use a register
squeezed into the usual immediate field, which is decoded here.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard: Decode texture offset register swizzle
Alyssa Rosenzweig [Mon, 10 Jun 2019 21:56:54 +0000 (14:56 -0700)]
panfrost/midgard: Decode texture offset register swizzle

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard/disasm: include textureGather()
Alyssa Rosenzweig [Mon, 10 Jun 2019 21:56:32 +0000 (14:56 -0700)]
panfrost/midgard/disasm: include textureGather()

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard: Support negative immediate offsets
Alyssa Rosenzweig [Mon, 10 Jun 2019 20:27:10 +0000 (13:27 -0700)]
panfrost/midgard: Support negative immediate offsets

It's not at all clear why this work for texelFetch but not texture.
Maybe the top bits are dual-purpose on other texturing ops...?

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard: Fix redunant mask redundancy
Alyssa Rosenzweig [Mon, 10 Jun 2019 20:19:15 +0000 (13:19 -0700)]
panfrost/midgard: Fix redunant mask redundancy

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard/disasm: Print LOD for texelFetch
Alyssa Rosenzweig [Mon, 10 Jun 2019 20:13:51 +0000 (13:13 -0700)]
panfrost/midgard/disasm: Print LOD for texelFetch

Its encoding differs slightly from the LOD used in normal texture calls.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard: Identify the in_reg_full field
Alyssa Rosenzweig [Mon, 10 Jun 2019 20:09:39 +0000 (13:09 -0700)]
panfrost/midgard: Identify the in_reg_full field

This is clear for texelFetch, hence the confusion with Bifrost's filter
field, but it's much more general in reality.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard/disasm: Correctly dump bias/LOD
Alyssa Rosenzweig [Mon, 10 Jun 2019 19:12:49 +0000 (12:12 -0700)]
panfrost/midgard/disasm: Correctly dump bias/LOD

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard/disasm: Cleanup texture op code
Alyssa Rosenzweig [Mon, 10 Jun 2019 19:12:27 +0000 (12:12 -0700)]
panfrost/midgard/disasm: Cleanup texture op code

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard/disasm: Add missing space
Alyssa Rosenzweig [Mon, 10 Jun 2019 18:52:32 +0000 (11:52 -0700)]
panfrost/midgard/disasm: Add missing space

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard/disasm: LOD immediate/register select
Alyssa Rosenzweig [Mon, 10 Jun 2019 18:51:54 +0000 (11:51 -0700)]
panfrost/midgard/disasm: LOD immediate/register select

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard/disasm: Use texture op name bare
Alyssa Rosenzweig [Mon, 10 Jun 2019 18:51:16 +0000 (11:51 -0700)]
panfrost/midgard/disasm: Use texture op name bare

This allows us to show a call to textureLod in a reasonable way.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard/disasm: Varying perspective divides
Alyssa Rosenzweig [Mon, 10 Jun 2019 18:18:41 +0000 (11:18 -0700)]
panfrost/midgard/disasm: Varying perspective divides

With an extra flag, we're able to do a perspective division "for free"
while loading a varying.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard: Add perspective division opcodes
Alyssa Rosenzweig [Mon, 10 Jun 2019 18:05:40 +0000 (11:05 -0700)]
panfrost/midgard: Add perspective division opcodes

...on the load/store unit, not the ALUs. Looks goofy but hey.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard: Print texture offsets
Alyssa Rosenzweig [Mon, 10 Jun 2019 17:15:28 +0000 (10:15 -0700)]
panfrost/midgard: Print texture offsets

This patch identifies the two modes of offsets in a texture instruction
(immediate and register, disambiguated by the bit-once-known-as
"has_offset") and implements disassembly for both.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard: Expand texture to 4-channel swizzle
Alyssa Rosenzweig [Mon, 10 Jun 2019 16:39:17 +0000 (09:39 -0700)]
panfrost/midgard: Expand texture to 4-channel swizzle

This eliminates some unknowns, clarifies 3D textures, and will maybe
help with array/shadow textures?

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agodocs: update calendar, add news item and link release notes for 19.1.0
Juan A. Suarez Romero [Tue, 11 Jun 2019 15:38:22 +0000 (17:38 +0200)]
docs: update calendar, add news item and link release notes for 19.1.0

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
5 years agodocs: Add SHA256 sums for 19.1.0
Juan A. Suarez Romero [Tue, 11 Jun 2019 15:25:40 +0000 (15:25 +0000)]
docs: Add SHA256 sums for 19.1.0

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 2a5b4e2b9ffc07f32a7ff5f89176cb892b179c5f)

5 years agodocs: Add release notes for 19.1.0
Juan A. Suarez Romero [Tue, 11 Jun 2019 15:07:39 +0000 (17:07 +0200)]
docs: Add release notes for 19.1.0

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 1517811f4f75cd628dd7122d63092f3954a81a7d)

5 years agoradv: assert on inline uniform blocks in radv_CmdPushDescriptorSetKHR()
Samuel Iglesias Gonsálvez [Tue, 11 Jun 2019 09:00:28 +0000 (11:00 +0200)]
radv: assert on inline uniform blocks in radv_CmdPushDescriptorSetKHR()

According to the Vulkan spec, inline uniform blocks are not allowed
to be updated through vkCmdPushDescriptorSetKHR().

These are the spec quotes from "13.2.1. Descriptor Set Layout"
that are relevant for this case:

"VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR specifies
 that descriptor sets must not be allocated using this layout, and
 descriptors are instead pushed by vkCmdPushDescriptorSetKHR."

"If flags contains
 VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR, then all
 elements of pBindings must not have a descriptorType of
 VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK_EXT".

There is no explicit mention in vkCmdPushDescriptorSetKHR() to forbid
this case but it is implied in the creation of the descriptor set
layout as aforementioned.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoanv: ignore inline uniform blocks in anv_CmdPushDescriptorSetKHR()
Samuel Iglesias Gonsálvez [Tue, 11 Jun 2019 08:44:47 +0000 (10:44 +0200)]
anv: ignore inline uniform blocks in anv_CmdPushDescriptorSetKHR()

According to the Vulkan spec, inline uniform blocks are not allowed
to be updated through vkCmdPushDescriptorSetKHR().

These are the spec quotes from "13.2.1. Descriptor Set Layout"
that are relevant for this case:

"VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR specifies
that descriptor sets must not be allocated using this layout, and
descriptors are instead pushed by vkCmdPushDescriptorSetKHR."

"If flags contains
VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR, then all
elements of pBindings must not have a descriptorType of
VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK_EXT".

There is no explicit mention in vkCmdPushDescriptorSetKHR() to forbid
this case but it is implied in the creation of the descriptor set
layout as aforementioned.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoegl: compare the whole list of attributes
Eric Engestrom [Thu, 6 Jun 2019 20:30:49 +0000 (21:30 +0100)]
egl: compare the whole list of attributes

`memcmp()` compares a given number of bytes, but `EGLAttrib` is larger than a byte.

Fixes: 8e991ce5397598ceb422 "egl: handle the full attrib list in display::options"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
5 years agofreedreno/a5xx: Fix indirect draw max_indices calculation
Eduardo Lima Mitev [Mon, 10 Jun 2019 21:27:22 +0000 (23:27 +0200)]
freedreno/a5xx: Fix indirect draw max_indices calculation

The number of elements to draw should not be affected by the offset.

A similar fix was submitted for a6xx at 79180a05.

Fixes these dEQP tests on a5xx:

dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_separate_grid_500x500_drawcount_8
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_separate_grid_500x500_drawcount_2500
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawarrays_separate_grid_500x500_drawcount_2500
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawarrays_combined_grid_500x500_drawcount_2500
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_500x500_drawcount_8
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_500x500_drawcount_2500

Reviewed-by: Rob Clark <robdclark@gmail.com>
5 years agoradv: remove extra assignment in radv_decompress_resolve_subpass_src()
Samuel Pitoiset [Tue, 11 Jun 2019 06:17:22 +0000 (08:17 +0200)]
radv: remove extra assignment in radv_decompress_resolve_subpass_src()

baseArrayLayer is defined twice, trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradv: add radv_get_resolve_pipeline() helper in the graphics path
Samuel Pitoiset [Mon, 10 Jun 2019 15:45:33 +0000 (17:45 +0200)]
radv: add radv_get_resolve_pipeline() helper in the graphics path

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: do not decompress all image layers before resolving inside a subpass
Samuel Pitoiset [Mon, 10 Jun 2019 15:45:32 +0000 (17:45 +0200)]
radv: do not decompress all image layers before resolving inside a subpass

When decompressing resolve source images, we should rely on the
framebuffer layer count instead of resolving all images layers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: initialize the aspect mask when decompressing resolve source images
Samuel Pitoiset [Mon, 10 Jun 2019 15:45:31 +0000 (17:45 +0200)]
radv: initialize the aspect mask when decompressing resolve source images

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: perform proper layout transitions before resolving
Samuel Pitoiset [Mon, 10 Jun 2019 15:45:30 +0000 (17:45 +0200)]
radv: perform proper layout transitions before resolving

Use an explicit pipeline barrier for doing layout transitions
instead of duplicating some code.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: do not resolve all image layers with compute inside a subpass
Samuel Pitoiset [Mon, 10 Jun 2019 15:45:29 +0000 (17:45 +0200)]
radv: do not resolve all image layers with compute inside a subpass

When resolving inside a subpass, we should rely on the framebuffer
layer count instead of resolving all images layers. This should
improve performance of layered resolves a bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoiris: Bypass half-float pack/unpack lowering.
Kenneth Graunke [Mon, 10 Jun 2019 21:03:03 +0000 (14:03 -0700)]
iris: Bypass half-float pack/unpack lowering.

This skips GLSL IR lowering of pack/unpackHalf operations, allowing
the NIR optimizer to see them

Improves performance in Synmark2's OglCSDof by about 2x, by cutting
about 90% of the cycles from one of the compute shaders.

shader-db statistics on Skylake:

4 compute shaders went from SIMD8 to SIMD16.

total instructions in shared programs: 15598871 -> 15542568 (-0.36%)
instructions in affected programs: 143016 -> 86713 (-39.37%)
helped: 144
HURT: 0
helped stats (abs) min: 17 max: 4669 x̄: 390.99 x̃: 164
helped stats (rel) min: 7.48% max: 85.28% x̄: 30.17% x̃: 24.22%
95% mean confidence interval for instructions value: -510.50 -271.49
95% mean confidence interval for instructions %-change: -32.70% -27.65%
Instructions are helped.

total cycles in shared programs: 371973958 -> 368902103 (-0.83%)
cycles in affected programs: 5557722 -> 2485867 (-55.27%)
helped: 144
HURT: 0
helped stats (abs) min: 106 max: 1026600 x̄: 21332.33 x̃: 1697
helped stats (rel) min: 0.53% max: 88.98% x̄: 36.12% x̃: 34.67%
95% mean confidence interval for cycles value: -41570.02 -1094.64
95% mean confidence interval for cycles %-change: -38.44% -33.80%
Cycles are helped.

total spills in shared programs: 11936 -> 11903 (-0.28%)
spills in affected programs: 110 -> 77 (-30.00%)
helped: 3
HURT: 2

total fills in shared programs: 25644 -> 25178 (-1.82%)
fills in affected programs: 677 -> 211 (-68.83%)
helped: 5
HURT: 0

total loops in shared programs: 4830 -> 4829 (-0.02%)
loops in affected programs: 1 -> 0
helped: 1
HURT: 0

5 years agoradv: Handle UNDEFINED format in image format list.
Bas Nieuwenhuizen [Sat, 8 Jun 2019 21:51:16 +0000 (23:51 +0200)]
radv: Handle UNDEFINED format in image format list.

Was watching a presentation on YT where this was used and it turns
out it is not invalid.

The only case it is actually valid as format in the creation of an
image or image view is with Android Hardware Buffers which have
their format specified externally.

So we can just ignore all entries with VK_FORMAT_UNDEFINED.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradv: Prevent out of bound shift on 32-bit builds.
Bas Nieuwenhuizen [Mon, 10 Jun 2019 14:17:46 +0000 (16:17 +0200)]
radv: Prevent out of bound shift on 32-bit builds.

uintptr_t is 32-bits then and shifting it by 32 bits results in undefined
behavior IIRC.

Fixes: b3c8de1c55c "radv: save all descriptor pointers into the trace BO"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoglsl: Check order and uniqueness of interlock functions
Caio Marcelo de Oliveira Filho [Wed, 5 Jun 2019 08:25:24 +0000 (01:25 -0700)]
glsl: Check order and uniqueness of interlock functions

With this commit all remaining compilation tests in Piglit for
ARB_fragment_shader_interlock will pass.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
5 years agoglsl: Make interlock builtins follow same compiler rules as barriers
Caio Marcelo de Oliveira Filho [Wed, 5 Jun 2019 07:59:11 +0000 (00:59 -0700)]
glsl: Make interlock builtins follow same compiler rules as barriers

Generalize the barrier code to provide correct error messages for
other builtins.

Fixes most of piglit compilation tests for
ARB_fragment_shader_interlock.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
5 years agonir/opt_algebraic: Fix rules for imadsh_mix16
Eduardo Lima Mitev [Mon, 10 Jun 2019 19:38:39 +0000 (21:38 +0200)]
nir/opt_algebraic: Fix rules for imadsh_mix16

The rules added in patch 3addd7c are inverted:

It should be:

(al * bh) << 16 + c

instead of:

(ah * bl) << 16 + c

Fixes a number of regressions under
dEQP-GLES31.functional.draw_indirect.compute_interop.large.*
on Freedreno.

Reviewed-by: Rob Clark <robdclark@gmail.com>
5 years agopanfrost: Ignore discards in dead branch analysis
Alyssa Rosenzweig [Mon, 10 Jun 2019 15:21:24 +0000 (08:21 -0700)]
panfrost: Ignore discards in dead branch analysis

Fixes regressions in
dEQP-GLES2.functional.shaders.discard.dynamic_loop_*

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agoradv: fix setting CB_SHADER_MASK for dual source blending
Samuel Pitoiset [Thu, 6 Jun 2019 12:46:47 +0000 (14:46 +0200)]
radv: fix setting CB_SHADER_MASK for dual source blending

CB_SHADER_MASK was computed without the second color buffer
format which looks totally wrong to me.

While we are at it, copy a comment from RadeonSI.

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agopanfrost/midgard: Disambiguate register mode
Alyssa Rosenzweig [Wed, 5 Jun 2019 22:43:52 +0000 (15:43 -0700)]
panfrost/midgard: Disambiguate register mode

We postfix instructions by their size if a destination override is in
place (a la AT&T assembly), disambiguating instruction sizes.
Previously, "16-bit instruction, 16-bit dest, 16-bit sources"
disassembled identically to "32-bit instruction, 16-bit dest, 16-bit
sources", which is semantically distinct due to the lessened opportunity
for parallelism but (potentially) greater precision. Adding a postfix
removes the ambiguity and relieves mental gymnastics reading weird
disassemblies even in some cases that are not ambiguous.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard: Expose vec8/vec16 modes
Alyssa Rosenzweig [Wed, 5 Jun 2019 22:41:03 +0000 (15:41 -0700)]
panfrost/midgard: Expose vec8/vec16 modes

Midgard ALUs can operate in one of four modes: vec2 64-bit, vec4 32-bit,
vec8 16-bit, or vec16 8-bit. Our compiler (and indeed, any OpenGL ES
shader) only uses 32-bit (and eventually vec4 16-bit) modes in normal
circumstances. Nevertheless, the other modes do exist and are easily
accessible through OpenCL; they also come up in cases like blend
shaders.

While we have had minimal support for decoding 8-bit/64-bit modes, we
did so pretending they were vec4 in each case; 16-bit registers had a
synthetically duplicated register file to separate lo/hi halves, etc.
This works for GL, but it doesn't map to what the hardware is -actually-
doing, which can cause some headscratchingly bizarre disassemblies from
OpenCL. So, we dive in the deep end and support these other modes
natively in the disassembler, using absurdly long masks/swizzles, since
the hardware is considerably more flexible than what was exposed before.

Outside of some fixed routines for blending, none of the above is
supported in the compiler yet. But it's better to have it in the ISA
definitions and disassembler than not, for future use if nothing else.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard: Add shifting int modifiers
Alyssa Rosenzweig [Wed, 5 Jun 2019 22:20:26 +0000 (15:20 -0700)]
panfrost/midgard: Add shifting int modifiers

As a source modifier, shift allows shifting a value left by the bit
size, useful in conjunction with a greater register mode, for instance
to implement `upsample`. As a concrete example, the following OpenCL:

   ushort hr0 = /* ... */, uint r1 = /* ... */;
   uint r2 = (convert_uint(hr0) << 16) ^ b;

compiles to the following Midgard assembly:

   ixor r, (hr0) << 16, b

In reverse, the ".hi" output modifier shifts the value right by the bit
size, leaving just the carry/overflow at the bottom. To implement *_hi
functions in OpenCL (for <64-bit), we do arithmetic in the 2x higher
mode with the .hi modifier. (For 64-bit, things are hairier, since there
is not an 128-bit int mode).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard: Add integer outmods
Alyssa Rosenzweig [Wed, 5 Jun 2019 22:17:45 +0000 (15:17 -0700)]
panfrost/midgard: Add integer outmods

For floats, output modifiers determine clamping behaviour. For integers,
they determine wrapping/saturation behaviour (or shifting -- see next
commit). These are very different; they are conceptually two unrelated
enums union'ed together; the distinction is responsible for many-a-bug.
While clamping behaviour for floats was clear from GL, the int behaviour
is only known From OpenCL contortion with convert_*_sat() functions.

With the underlying functions known, clean up the codebase, likely
fixing outmod type related bugs in the process.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard: Note floating compares type convert
Alyssa Rosenzweig [Wed, 5 Jun 2019 22:02:20 +0000 (15:02 -0700)]
panfrost/midgard: Note floating compares type convert

OP_TYPE_CONVERTS denotes an opcode that returns a different type than is
source (going from int-domain to float-domain or vice versa), named
after the f2i/i2f family of opcodes it covers. We care because source
mods are determined by the source type (i/f) but output modifiers are
determined by the output type (equals the source type, unless the op
type converts, in which case it's the opposite).

The upshot is that floating-point compares (feq/fne/etc) actually do
type-convert.  That is, that take in floating-points and output in
integer space (a boolean), so we mark them off this way to ensure the
correct output modifiers are used.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Align linear renderable resources
Alyssa Rosenzweig [Thu, 6 Jun 2019 21:36:41 +0000 (14:36 -0700)]
panfrost: Align linear renderable resources

It's just -easier- to render to aligned framebuffers. For winsys
targets, we already align, but even for an internal linear FBO we ought
to align everything nicely.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Fix stride check when mipmapping
Alyssa Rosenzweig [Sat, 8 Jun 2019 00:07:13 +0000 (17:07 -0700)]
panfrost: Fix stride check when mipmapping

Now that we support custom strides on mipmapped textures (theoretically,
at least), extend the stride check to support mipmaps.  Fixes incorrect
strides of linear windows in Weston.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
5 years agopanfrost: Refactor texture/sampler upload
Alyssa Rosenzweig [Fri, 7 Jun 2019 21:25:28 +0000 (14:25 -0700)]
panfrost: Refactor texture/sampler upload

We move some coding packing the texture/sampler descriptors into
dedicated functions (out of the terrifyingly long emit_for_draw
monolith), cleaning them up as we go.

The discovery triggering the cleanup is the format for including manual
strides in the presence of mipmaps/cubemaps. Rather than placed at the
end like previously assumed, they are interleaved after each address.
This difference is relevant when handling NPOT linear mipmaps.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Refactor blitting code
Alyssa Rosenzweig [Fri, 7 Jun 2019 17:32:17 +0000 (10:32 -0700)]
panfrost: Refactor blitting code

We refactor the wallpaper rendering code to separate the
wallpaper-specific bits from the general blitting capabilities. In the
(hopefully near) future, we'll turn this on to implement real Gallium
blits, e.g. for automatic mipmap generation.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Refactor AFBC code
Alyssa Rosenzweig [Fri, 7 Jun 2019 16:39:31 +0000 (09:39 -0700)]
panfrost: Refactor AFBC code

This patch does a substantial cleanup of the code for handling AFBC,
moving various disparate misplaced functions into a new central
pan_afbc.c file.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Move pan_screen() to pan_screen.h
Alyssa Rosenzweig [Fri, 7 Jun 2019 16:49:36 +0000 (09:49 -0700)]
panfrost: Move pan_screen() to pan_screen.h

Trivial.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Always align strides to cache line (64)
Alyssa Rosenzweig [Fri, 7 Jun 2019 15:58:16 +0000 (08:58 -0700)]
panfrost: Always align strides to cache line (64)

(Performance tweak.)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agodocs: fixup 19.0.5 <> 19.0.6 confusion
Emil Velikov [Fri, 7 Jun 2019 16:11:23 +0000 (17:11 +0100)]
docs: fixup 19.0.5 <> 19.0.6 confusion

The title of the release notes says 19.0.5 while the rest of the file
(correctly) says 19.0.6

Fixes: fe79d75ccf9 ("docs: Add relnotes for 19.0.6")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan at pnwbakers.com>
5 years agomapi: correctly handle the full offset table
Emil Velikov [Wed, 5 Jun 2019 15:45:03 +0000 (16:45 +0100)]
mapi: correctly handle the full offset table

Earlier commit converted ES1 and ES2 to a new, much simpler, dispatch
generator. At the same time, GL/glapi and the driver side are still
using the old code.

There is a hidden ABI between GL*.so and glapi.so, former referencing
entry-points by offset in the _glapi_table. Hence earlier commit added
the full table of entry-points, alongside a marker for other cases like
indirect GL(X) and driver-size remapping.

Yet the patches did not handle things fully, thus it was possible to
get different interpretations of the dispatch table after the marker.

This commit fixes that adding an indicative error message to catch
future bugs.

While here correct the marker (MAX_OFFSETS) comment.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110302
Fixes: cf317bf0937 ("mapi: add all _glapi_table entrypoints tostatic_data.py")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agomapi: add static_date offset to EXT_dsa
Emil Velikov [Fri, 7 Jun 2019 15:19:59 +0000 (16:19 +0100)]
mapi: add static_date offset to EXT_dsa

As elaborated in the next patch, there is some hidden ABI that
effectively require most entrypoints to be listed in the file.

Cc: Marek Olšák <marek.olsak@amd.com>
Fixes: d2906293c43 ("mesa: EXT_dsa add selectorless matrix stackfunctions")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agomapi: add static_date offset to MaxShaderCompilerThreadsKHR
Emil Velikov [Wed, 5 Jun 2019 16:14:20 +0000 (17:14 +0100)]
mapi: add static_date offset to MaxShaderCompilerThreadsKHR

As elaborated in the next patch, there is some hidden ABI that
effectively require most entrypoints to be listed in the file.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110302
Cc: Marek Olšák <maraeo@gmail.com>
Fixes: c5c38e831ee ("mesa: implement ARB/KHR_parallel_shader_compile")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoegl: Let the caller of dri2_create_drawable decide about loaderPrivate.
Mathias Fröhlich [Fri, 7 Jun 2019 05:12:42 +0000 (07:12 +0200)]
egl: Let the caller of dri2_create_drawable decide about loaderPrivate.

In the call arguments to dri2_create_drawable decouple loaderPrivate
from dri2_surf. For all callers of dri2_create_drawable the two
pointers are the same with the exception of the gbm backed platform.
Let the calling code of dri2_create_drawable decide what
loaderPrivate shall be.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
5 years agoradv: fix alpha-to-coverage when there is unused color attachments
Samuel Pitoiset [Thu, 6 Jun 2019 14:31:01 +0000 (16:31 +0200)]
radv: fix alpha-to-coverage when there is unused color attachments

When alphaToCoverage is enabled, we should always write the alpha
channel of MRT0 if it's unused. This now matches RadeonSI.

This fixes the new CTS:
dEQP-VK.pipeline.multisample.alpha_to_coverage_unused_attachment.samples_*.alpha_invisible

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl
5 years agopanfrost: ci: Switch from direct Docker use to buildah
Tomeu Vizoso [Fri, 7 Jun 2019 08:20:28 +0000 (10:20 +0200)]
panfrost: ci: Switch from direct Docker use to buildah

Use the infrastructure in wayland/ci-templates to build the container
images.

This prevents from getting into some situations in which the images
wouldn't be rebuilt, and allows us to share some infrastructure with
other projects in freedesktop.org.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Suggested-by: Michel Dänzer <michel@daenzer.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agogallium/u_transfer_helper: Free the staging buffer on unmap.
Kenneth Graunke [Fri, 7 Jun 2019 08:16:16 +0000 (01:16 -0700)]
gallium/u_transfer_helper: Free the staging buffer on unmap.

u_transfer_helper sometimes mallocs a staging buffer, and leaked it.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agointel/gpu_dump: fix argument passing
Lionel Landwerlin [Sat, 8 Jun 2019 20:48:02 +0000 (23:48 +0300)]
intel/gpu_dump: fix argument passing

We were dropping "/' around arguments grouped together.
This was triggering failures with :

   $ ./framemetrics -g "Memory Writes Distribution Gen9" -o /tmp/output.csv -f ./my.trace 10 11

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
5 years agoutil/os_file: suppress sign comparison warning
Eric Engestrom [Thu, 16 May 2019 14:37:28 +0000 (15:37 +0100)]
util/os_file: suppress sign comparison warning

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoutil/os_file: fix error being sign-cast back and forth
Eric Engestrom [Thu, 16 May 2019 12:08:53 +0000 (13:08 +0100)]
util/os_file: fix error being sign-cast back and forth

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoutil/os_file: avoid shadowing read() with a local variable
Eric Engestrom [Thu, 16 May 2019 14:02:45 +0000 (15:02 +0100)]
util/os_file: avoid shadowing read() with a local variable

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoutil/os_file: actually return the error read() gave us
Eric Engestrom [Thu, 16 May 2019 13:57:07 +0000 (14:57 +0100)]
util/os_file: actually return the error read() gave us

Fixes: 316964709e21286c2af5 "util: add os_read_file() helper"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agovirgl: Work around possible memory exhaustion
Alexandros Frantzis [Wed, 5 Jun 2019 13:50:11 +0000 (16:50 +0300)]
virgl: Work around possible memory exhaustion

Since we don't normally flush before performing copy transfers, it's
possible in some scenarios to use too much memory for staging resources
and start failing. This can happen either because we exhaust the total
available memory (including system memory virtio-gpu swaps out to), or,
more commonly, because the total size of resources in a command buffer
doesn't fit in virtio-gpu video memory.

To reduce the chances of this happening, force a flush before a copy
transfer if the total size of queued staging resources exceeds a certain
limit. Since after a flush any queued staging resources will be
eventually released, this ensures both that each command buffer doesn't
require too much video memory, and that we don't end up consuming too
much memory for staging resources in total.

Fixes kernel errors reported when running texture_upload tests in glbench.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
5 years agovirgl: Remove incorrect resource wait condition
Alexandros Frantzis [Mon, 27 May 2019 21:06:03 +0000 (00:06 +0300)]
virgl: Remove incorrect resource wait condition

Now that we have copy transfers in place, we can remove the incorrect
resource wait condition. Copy transfers and other optimizations minimize
the performance impact of this removal, while providing the correct
behavior.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
5 years agovirgl: Use copy transfers for textures
Alexandros Frantzis [Fri, 24 May 2019 11:03:28 +0000 (14:03 +0300)]
virgl: Use copy transfers for textures

Extend copy transfers to also be used for busy textures.

Performance results:
Unigine Valley, qemu before: 22.7 FPS after: 23.1 FPS

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
5 years agovirgl: Use buffer copy transfers to avoid waiting when mapping
Alexandros Frantzis [Wed, 8 May 2019 09:10:21 +0000 (12:10 +0300)]
virgl: Use buffer copy transfers to avoid waiting when mapping

We typically need to wait for a buffer to become ready before mapping,
so that we don't write new contents while the host is still using the
old contents. However, if we are allowed to discard the contents of the
mapped buffer range, then we can avoid waiting by using a staging buffer
range which we guarantee to never be busy, copying from the staging
buffer range to the target buffer in the host.

This commit implements this optimization by utilizing a dedicated
u_upload_mgr for the staging buffer.

Performance results:
Twilight Struggle (Steam/Proton), qemu before: 7 FPS after: 25 FPS
glmark2 ubo, qemu before: 38 FPS after: 331 FPS

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Suggested-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
5 years agovirgl: Support copy transfers
Alexandros Frantzis [Wed, 8 May 2019 13:17:53 +0000 (16:17 +0300)]
virgl: Support copy transfers

Support transfers that use a different resource as the source of data to
transfer. This will be used in upcoming commits to send data to host
buffers through a transfer upload buffer, in order to avoid waiting
when the buffer resource is busy.

Note that we don't support queueing copy transfers in the transfer
queue. Copy transfers should be emitted directly in the command queue,
allowing us to avoid flushes before them and leads to better
performance.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
5 years agovirgl: Add copy_transfer3d definitions
Alexandros Frantzis [Tue, 14 May 2019 09:38:24 +0000 (12:38 +0300)]
virgl: Add copy_transfer3d definitions

Introduce definitions for the copy_transfer3d protocol command and virgl
capability. This command transfers data to the host by copying through
another resource, and will be used in upcoming commits to avoid waiting
when transferring data for busy resources.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
5 years agovirgl: Make VIRGL_BIND_STAGING resources cacheable
Alexandros Frantzis [Tue, 4 Jun 2019 13:43:31 +0000 (16:43 +0300)]
virgl: Make VIRGL_BIND_STAGING resources cacheable

This could help performance when trying to recreate such resources for
copy transfers.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
5 years agovirgl: Support VIRGL_BIND_STAGING
Alexandros Frantzis [Mon, 20 May 2019 10:00:38 +0000 (13:00 +0300)]
virgl: Support VIRGL_BIND_STAGING

Support a new virgl bind type for staging buffers which don't require
dedicated host-side storage. These will be used to implement copy
transfers.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>