platform/upstream/mesa.git
5 years agoamd/common/gfx10: print gfx10 registers in debug dumps
Nicolai Hähnle [Fri, 24 May 2019 12:34:45 +0000 (14:34 +0200)]
amd/common/gfx10: print gfx10 registers in debug dumps

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoamd/common/gfx10: CMASK is only used for FMASK
Nicolai Hähnle [Sun, 19 Nov 2017 16:26:23 +0000 (17:26 +0100)]
amd/common/gfx10: CMASK is only used for FMASK

All regular color compression is done via DCC.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoamd/common/gfx10: support new tbuffer encoding
Nicolai Hähnle [Mon, 25 Mar 2019 17:12:07 +0000 (18:12 +0100)]
amd/common/gfx10: support new tbuffer encoding

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoamd/common/gfx10: pad shader buffers for instruction prefetch
Nicolai Hähnle [Thu, 29 Nov 2018 23:37:07 +0000 (00:37 +0100)]
amd/common/gfx10: pad shader buffers for instruction prefetch

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoamd/common/gfx10: implement scan & reduce operations
Nicolai Hähnle [Wed, 23 May 2018 20:08:22 +0000 (22:08 +0200)]
amd/common/gfx10: implement scan & reduce operations

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoamd/common/gfx10: add GS_ALLOC_REQ message define
Nicolai Hähnle [Tue, 7 May 2019 20:34:50 +0000 (22:34 +0200)]
amd/common/gfx10: add GS_ALLOC_REQ message define

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoamd/common/gfx10: print out GCR_CNTL as part of {ACQUIRE,RELEASE}_MEM
Nicolai Hähnle [Sun, 19 Nov 2017 14:23:44 +0000 (15:23 +0100)]
amd/common/gfx10: print out GCR_CNTL as part of {ACQUIRE,RELEASE}_MEM

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoamd/common/gfx10: add register JSON
Nicolai Hähnle [Tue, 7 May 2019 00:37:19 +0000 (02:37 +0200)]
amd/common/gfx10: add register JSON

A small number of fields now need new disambiguation.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoamd/common: add GFX10 chips
Nicolai Hähnle [Tue, 24 Oct 2017 11:42:31 +0000 (11:42 +0000)]
amd/common: add GFX10 chips

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agomeson: require libdrm_amdgpu 2.4.99 for Navi
Marek Olšák [Tue, 2 Jul 2019 18:47:01 +0000 (14:47 -0400)]
meson: require libdrm_amdgpu 2.4.99 for Navi

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: gfx10 is not supported
Nicolai Hähnle [Mon, 13 May 2019 19:57:47 +0000 (21:57 +0200)]
radv: gfx10 is not supported

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoamd/addrlib: add gfx10 support
Marek Olšák [Thu, 20 Jun 2019 00:42:18 +0000 (20:42 -0400)]
amd/addrlib: add gfx10 support

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradeonsi: make emit_streamout_output externally accessible
Nicolai Hähnle [Fri, 21 Sep 2018 20:07:01 +0000 (22:07 +0200)]
radeonsi: make emit_streamout_output externally accessible

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradeonsi: pass the context to query destroy functions
Nicolai Hähnle [Thu, 20 Sep 2018 08:19:30 +0000 (10:19 +0200)]
radeonsi: pass the context to query destroy functions

We'll need this in the future.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradeonsi: make si_restore_qbo_state externally available
Nicolai Hähnle [Wed, 24 Apr 2019 13:52:35 +0000 (15:52 +0200)]
radeonsi: make si_restore_qbo_state externally available

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradeonsi: make get_primitive_id externally visible
Nicolai Hähnle [Tue, 7 May 2019 20:52:27 +0000 (22:52 +0200)]
radeonsi: make get_primitive_id externally visible

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradeonsi: make si_llvm_export_vs externally available
Nicolai Hähnle [Thu, 16 Nov 2017 15:49:06 +0000 (16:49 +0100)]
radeonsi: make si_llvm_export_vs externally available

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradeonsi: various si_translate_*format functions only apply to pre-gfx10
Nicolai Hähnle [Tue, 7 May 2019 20:38:20 +0000 (22:38 +0200)]
radeonsi: various si_translate_*format functions only apply to pre-gfx10

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradeonsi: use a fragment shader blit instead of DB->CB copy for ZS CPU mappings
Marek Olšák [Thu, 20 Jun 2019 22:32:57 +0000 (18:32 -0400)]
radeonsi: use a fragment shader blit instead of DB->CB copy for ZS CPU mappings

This mainly removes and simplifies code that is no longer needed.

There were some issues with the DB->CB stencil copy on gfx10, so let's
just use a fragment shader blit for all ZS mappings. It's more reliable.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
5 years agogallium/u_blitter: implement copying from ZS to color and vice versa
Marek Olšák [Thu, 20 Jun 2019 22:24:19 +0000 (18:24 -0400)]
gallium/u_blitter: implement copying from ZS to color and vice versa

This is for drivers that can't map depth and stencil and need to blit
them to a color texture for CPU access.

This is also useful for drivers using separate depth and stencil.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
5 years agogallium/util: rewrite depth-stencil blit shaders
Marek Olšák [Thu, 20 Jun 2019 23:52:23 +0000 (19:52 -0400)]
gallium/util: rewrite depth-stencil blit shaders

- merge all 3 functions (Z, S, ZS)
- don't write the color output
- read the value from texel.x, then write it to position.z or stencil.y
  (don't use the value from texel.y or texel.z)

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
5 years agost/mesa: accelerate glCopyPixels(STENCIL)
Marek Olšák [Sat, 15 Jun 2019 04:09:56 +0000 (00:09 -0400)]
st/mesa: accelerate glCopyPixels(STENCIL)

Tested-by: Dieter Nützel
5 years agoglsl/standalone: meson test for --dump-builder
Yevhenii Kolesnikov [Wed, 20 Feb 2019 13:42:27 +0000 (15:42 +0200)]
glsl/standalone: meson test for --dump-builder

Added meson test for standalone compiler with --dump-builder option
on builtin texture* functions.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107767
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
5 years agoglsl/standalone: exit on unsupported texture functions
Sergii Romantsov [Thu, 30 Aug 2018 12:04:35 +0000 (15:04 +0300)]
glsl/standalone: exit on unsupported texture functions

glsl/standalone with --dump-builder will exit when unsupported texture
functions are encountered.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107767
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
5 years agoradeonsi: make gl_SampleMaskIn = 0x1 when MSAA is disabled
Pierre-Eric Pelloux-Prayer [Wed, 3 Jul 2019 17:27:12 +0000 (19:27 +0200)]
radeonsi: make gl_SampleMaskIn = 0x1 when MSAA is disabled

gl_SampleMaskIn is 1 when R_028BE0_PA_SC_AA_CONFIG is 0, so this commit reworks the conditions
controlling this register.

Before it was set if the sctx->framebuffer had a sample count > 1.

Now we still require this condition, but we also need either:
  - GL_MULTISAMPLE to be enabled
  - to be executing an operation that doesn't depend on GL state using u_blitter.

This fixes the arb_sample_shading/sample_mask piglit tests on radeonsi.
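
The condition described above can be sketched as a small predicate. This is
only an illustration in Python: the function and parameter names are
hypothetical and are not radeonsi identifiers.

```python
def msaa_config_enabled(fb_samples, gl_multisample_enabled, in_blitter_op):
    """Return True when the AA config register would be set for MSAA.

    fb_samples: sample count of the bound framebuffer (hypothetical name).
    gl_multisample_enabled: whether GL_MULTISAMPLE is enabled.
    in_blitter_op: whether a u_blitter operation, which does not depend on
                   GL state, is being executed.
    """
    # The old condition: the framebuffer must be multisampled.
    if fb_samples <= 1:
        return False
    # The new, additional requirement: one of the two cases must hold.
    return gl_multisample_enabled or in_blitter_op
```

With GL_MULTISAMPLE disabled and no blitter operation in flight, the predicate
is false even for a multisampled framebuffer, which is the case where
gl_SampleMaskIn is expected to read 0x1.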

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
5 years agogallium/u_blitter: enable MSAA when blitting to MSAA surfaces
Brian Paul [Thu, 7 Dec 2017 16:09:13 +0000 (09:09 -0700)]
gallium/u_blitter: enable MSAA when blitting to MSAA surfaces

If we're doing a Z -> Z MSAA blit (for example) we need to enable
msaa rasterization when drawing the quads so that we can properly
write the per-sample values.

This fixes a number of Piglit ext_framebuffer_multisample blit tests
such as ext_framebuffer_multisample/no-color 2 depth combined with
the VMware driver.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
5 years agovirgl: Clear the valid buffer range when possible
Alexandros Frantzis [Fri, 21 Jun 2019 22:18:27 +0000 (01:18 +0300)]
virgl: Clear the valid buffer range when possible

If we are discarding the whole resource, we don't care about previous contents,
and the resource storage is now unused, either because we have created new
resource storage, or because we have waited for the existing resource storage
to become unused, or because the transfer is unsynchronized.

In the last two cases this commit marks the storage as uninitialized, but only
if the resource is not host writable (in which case we can't clear the valid
range, since that would result in missed readbacks in future transfers).

In the first case, when the whole resource discard involves a reallocation, the
reallocation and subsequent rebinding already update the valid buffer range
appropriately.
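
A rough sketch of the decision described above, in Python for illustration
only. The enum and function names are hypothetical and do not appear in the
virgl driver.

```python
from enum import Enum


class DiscardPath(Enum):
    """Why the storage is unused after a whole-resource discard."""
    REALLOCATED = 1     # new resource storage was created
    WAITED_IDLE = 2     # we waited for the existing storage to become unused
    UNSYNCHRONIZED = 3  # the transfer is unsynchronized


def should_clear_valid_range(path, host_writable):
    """Return True if this commit's code path marks the storage uninitialized."""
    if path is DiscardPath.REALLOCATED:
        # Reallocation and rebinding already update the valid range.
        return False
    # Clearing the range of a host-writable resource would cause missed
    # readbacks in future transfers, so it is skipped in that case.
    return not host_writable
```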

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
5 years agoswr/swr: Enable ARB_viewport_array
Jan Zielinski [Tue, 2 Jul 2019 14:44:34 +0000 (16:44 +0200)]
swr/swr: Enable ARB_viewport_array

The rasterizer core supported ARB_viewport_array,
but the swr layer connecting core to Gallium state
tracker only allowed one viewport.

We add support for multiple viewports to the swr layer.

Reviewed-by: Alok Hota <alok.hota@intel.com>
5 years agoradv: Support VK_EXT_queue_family_foreign.
Bas Nieuwenhuizen [Wed, 3 Jul 2019 00:25:19 +0000 (02:25 +0200)]
radv: Support VK_EXT_queue_family_foreign.

Basically the same as external for now.

The only case we might need to handle differently in the near future
is Raven's case of displayable DCC, which is not renderable. But
we don't support that yet.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

5 years agoradv: Fix interactions between variable descriptor count and inline uniform blocks.
Bas Nieuwenhuizen [Tue, 2 Jul 2019 09:32:44 +0000 (11:32 +0200)]
radv: Fix interactions between variable descriptor count and inline uniform blocks.

Fixes: d7e6541cc72 "radv: Only allocate supplied number of descriptors when variable."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agowinsys/amdgpu: Make KMS handles valid for original DRM file descriptor
Michel Dänzer [Fri, 28 Jun 2019 16:35:56 +0000 (18:35 +0200)]
winsys/amdgpu: Make KMS handles valid for original DRM file descriptor

Getting a DMA-buf fd and converting that to a handle using our duplicate
of that file descriptor (getting at which requires passing a
radeon_winsys pointer to the buffer_get_handle hook) makes sure of this,
since duplicated file descriptors reference the same file description
and therefore the same GEM handle namespace.

This is necessary because libdrm_amdgpu may use a different DRM file
descriptor with a separate handle namespace internally, e.g. because it
always reuses any existing amdgpu_device_handle for the same device.
amdgpu_bo_export returns a handle which is valid for that internal
file descriptor.
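
The property relied on here, that duplicated file descriptors reference the
same file description, can be observed with an ordinary file. The Python
sketch below is only an analogy: it uses the shared file offset rather than a
GEM handle namespace, and it does not touch DRM at all.

```python
import os
import tempfile


def dup_shares_offset():
    """Seek through one descriptor and read the offset through its duplicate."""
    with tempfile.TemporaryFile(buffering=0) as f:
        f.write(b"0123456789")
        fd = f.fileno()
        dup_fd = os.dup(fd)  # new descriptor, same open file description
        try:
            os.lseek(fd, 4, os.SEEK_SET)
            # The offset lives in the file description, so the duplicate
            # observes the seek done through the original descriptor.
            return os.lseek(dup_fd, 0, os.SEEK_CUR)
        finally:
            os.close(dup_fd)
```

A descriptor obtained by opening the device a second time would instead get
its own file description, which is the situation the message describes for
libdrm_amdgpu's internal descriptor.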

Bugzilla: https://bugs.freedesktop.org/110903
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
5 years agowinsys/amdgpu: Add amdgpu_screen_winsys
Michel Dänzer [Fri, 28 Jun 2019 14:06:23 +0000 (16:06 +0200)]
winsys/amdgpu: Add amdgpu_screen_winsys

It extends pipe_screen / radeon_winsys and references amdgpu_winsys.
Multiple amdgpu_screen_winsys instances may reference the same
amdgpu_winsys instance, which corresponds to an amdgpu_device_handle.

The purpose of amdgpu_screen_winsys is to keep a duplicate of the DRM
file descriptor passed to amdgpu_winsys_create, which will be needed
in the next change.

v2:
* Add comment in amdgpu_winsys_unref explaining why it always returns
  true (Marek Olšák)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
5 years agowinsys/amdgpu: Use amdgpu_winsys helper instead of open-coded casts
Michel Dänzer [Mon, 1 Jul 2019 07:20:11 +0000 (09:20 +0200)]
winsys/amdgpu: Use amdgpu_winsys helper instead of open-coded casts

Cleanup to prevent breakage with the next change, no functional change
intended in this one.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
5 years agointel: fix wrong format usage
Juan A. Suarez Romero [Tue, 2 Jul 2019 17:36:56 +0000 (19:36 +0200)]
intel: fix wrong format usage

Do not use the view format when filling the surface state.

Fixes dEQP-VK.image.texel_view_compatible.compute.extended.texture.*

Fixes: fb1350c76f1 ("intel: Add and use helpers for level0 extent")

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agoradv: only allocate a 32-bit value for the TC-compat range metadata
Samuel Pitoiset [Tue, 2 Jul 2019 12:50:28 +0000 (14:50 +0200)]
radv: only allocate a 32-bit value for the TC-compat range metadata

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: remove unused code in radv_update_tc_compat_zrange_metadata()
Samuel Pitoiset [Tue, 2 Jul 2019 12:50:27 +0000 (14:50 +0200)]
radv: remove unused code in radv_update_tc_compat_zrange_metadata()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: add radv_get_depth_pipeline() helper
Samuel Pitoiset [Tue, 2 Jul 2019 12:50:24 +0000 (14:50 +0200)]
radv: add radv_get_depth_pipeline() helper

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoiris: assert isl_surf_init success in resource_from_handle
Mike Blumenkrantz [Wed, 29 May 2019 20:27:39 +0000 (16:27 -0400)]
iris: assert isl_surf_init success in resource_from_handle

this can fail unexpectedly due to bugs, so it's good to provide feedback
when this occurs

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
5 years agoanv: Advertise a more accurate minTexelBufferOffsetAlignment
Jason Ekstrand [Thu, 6 Jun 2019 20:45:57 +0000 (15:45 -0500)]
anv: Advertise a more accurate minTexelBufferOffsetAlignment

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Implement VK_EXT_texel_buffer_alignment
Jason Ekstrand [Thu, 6 Jun 2019 20:31:17 +0000 (15:31 -0500)]
anv: Implement VK_EXT_texel_buffer_alignment

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agovulkan: Update the XML and headers to 1.1.113
Jason Ekstrand [Thu, 6 Jun 2019 20:22:02 +0000 (15:22 -0500)]
vulkan: Update the XML and headers to 1.1.113

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agospirv: Ignore ArrayStride in OpPtrAccessChain for Workgroup
Caio Marcelo de Oliveira Filho [Mon, 1 Jul 2019 23:06:13 +0000 (16:06 -0700)]
spirv: Ignore ArrayStride in OpPtrAccessChain for Workgroup

From OpPtrAccessChain description in the SPIR-V spec (1.4 rev 1):

    For objects in the Uniform, StorageBuffer, or PushConstant storage
    classes, the element’s address or location is calculated using a
    stride, which will be the Base-type’s Array Stride when the Base
    type is decorated with ArrayStride. For all other objects, the
    implementation will calculate the element’s address or location.

For non-CL shaders the driver should lay out the Workgroup storage
class, so override any explicitly set ArrayStride in the shader.  This
currently fixes only the lower_workgroup_access_to_offsets case, which
is used by anv.
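
The rule quoted from the spec can be sketched as below. This is an
illustration in Python, not the NIR/SPIR-V code: storage classes are plain
strings, and driver_stride is a hypothetical stand-in for whatever layout the
implementation chooses.

```python
# Storage classes where an ArrayStride decoration determines the layout.
EXPLICIT_STRIDE_CLASSES = {"Uniform", "StorageBuffer", "PushConstant"}


def element_stride(storage_class, decorated_stride, driver_stride):
    """Pick the stride used to step a pointer by one element."""
    if storage_class in EXPLICIT_STRIDE_CLASSES and decorated_stride is not None:
        return decorated_stride
    # For all other objects (e.g. Workgroup in non-CL shaders) the
    # implementation calculates the location, ignoring any decoration.
    return driver_stride


def element_offset(storage_class, index, decorated_stride, driver_stride):
    """Byte offset of element `index` relative to the base pointer."""
    return index * element_stride(storage_class, decorated_stride, driver_stride)
```

For example, with a decorated stride of 16 and a driver-chosen stride of 4,
element 3 lands at offset 48 in StorageBuffer but at offset 12 in Workgroup.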

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
5 years agonouveau: handle new CAPS
Karol Herbst [Mon, 1 Jul 2019 10:17:39 +0000 (12:17 +0200)]
nouveau: handle new CAPS

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
5 years agointel/fs: Use nir_lower_interpolation on gen11+
Jason Ekstrand [Thu, 11 Apr 2019 19:57:12 +0000 (14:57 -0500)]
intel/fs: Use nir_lower_interpolation on gen11+

On gen11, they removed the PLN instruction so we have to emit a pile of
MAD to emulate it.  We may as well do that in NIR so we can optimize and
later schedule it.
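
As a simplified illustration of the idea, a plane equation can be evaluated
with two multiply-adds. The Python sketch below assumes the attribute is
described by three coefficients; the names are hypothetical and the actual
operand layout of PLN and of the NIR lowering is not reproduced here.

```python
def mad(a, b, c):
    """Multiply-add: a * b + c."""
    return a * b + c


def plane_eval(x, y, coeff_x, coeff_y, coeff_base):
    """Evaluate coeff_x * x + coeff_y * y + coeff_base using two MADs."""
    tmp = mad(coeff_x, x, coeff_base)
    return mad(coeff_y, y, tmp)
```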

Shader-db results on Ice Lake:

    total instructions in shared programs: 17145644 -> 16556440 (-3.44%)
    instructions in affected programs: 11507454 -> 10918250 (-5.12%)
    helped: 35763
    HURT: 42085
    helped stats (abs) min: 1 max: 140 x̄: 19.09 x̃: 18
    helped stats (rel) min: 0.04% max: 37.93% x̄: 15.40% x̃: 14.49%
    HURT stats (abs)   min: 1 max: 248 x̄: 2.22 x̃: 2
    HURT stats (rel)   min: 0.05% max: 50.00% x̄: 5.00% x̃: 2.47%
    95% mean confidence interval for instructions value: -7.67 -7.47
    95% mean confidence interval for instructions %-change: -4.46% -4.29%
    Instructions are helped.

    total loops in shared programs: 4370 -> 4370 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 360624645 -> 368220857 (2.11%)
    cycles in affected programs: 269631244 -> 277227456 (2.82%)
    helped: 15583
    HURT: 65874
    helped stats (abs) min: 1 max: 28561 x̄: 78.45 x̃: 32
    helped stats (rel) min: <.01% max: 67.81% x̄: 5.38% x̃: 2.44%
    HURT stats (abs)   min: 1 max: 238638 x̄: 133.87 x̃: 20
    HURT stats (rel)   min: <.01% max: 306.25% x̄: 5.81% x̃: 3.97%
    95% mean confidence interval for cycles value: 67.42 119.09
    95% mean confidence interval for cycles %-change: 3.61% 3.73%
    Cycles are HURT.

    total spills in shared programs: 8943 -> 8981 (0.42%)
    spills in affected programs: 1925 -> 1963 (1.97%)
    helped: 44
    HURT: 14

    total fills in shared programs: 21815 -> 21925 (0.50%)
    fills in affected programs: 3511 -> 3621 (3.13%)
    helped: 41
    HURT: 18

    LOST:   70
    GAINED: 14

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agointel/fs: Implement nir_intrinsic_load_fs_input_interp_deltas
Jason Ekstrand [Thu, 11 Apr 2019 19:55:40 +0000 (14:55 -0500)]
intel/fs: Implement nir_intrinsic_load_fs_input_interp_deltas

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agointel/fs: Actually implement the load_barycentric intrinsics
Jason Ekstrand [Thu, 11 Apr 2019 19:12:58 +0000 (14:12 -0500)]
intel/fs: Actually implement the load_barycentric intrinsics

If they never get used, dead code should clean them up.  Also, we rework
the at_offset and at_sample intrinsics so they return a proper vec2
instead of returning things in PLN layout.  Fortunately, copy-prop is
pretty good at cleaning this up and it doesn't result in any actual
extra MOVs.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir: add pass to lower load_interpolated_input
Rob Clark [Wed, 3 Apr 2019 23:29:36 +0000 (19:29 -0400)]
nir: add pass to lower load_interpolated_input

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agopanfrost: Pass referenced BOs to the SUBMIT ioctls
Boris Brezillon [Tue, 2 Jul 2019 12:26:02 +0000 (14:26 +0200)]
panfrost: Pass referenced BOs to the SUBMIT ioctls

Instead of manually adding the BOs from the various SLAB pools plus
the one backing the color FB, we insert them in the BO set attached
to the job and let panfrost_drm_submit_job() pass all BOs from this set
to the SUBMIT ioctl.
This means we are now passing all referenced BOs and let the scheduler
wait on referenced BO fences if needed.
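
The approach can be sketched as follows. This is a Python illustration with
hypothetical names; it models BO handles as integers and does not issue any
ioctl.

```python
class PanJob:
    """A job that tracks every BO it references in a set."""

    def __init__(self):
        self.bos = set()

    def add_bo(self, handle):
        # Adding the same BO twice is harmless: the set deduplicates it.
        self.bos.add(handle)


def submit_handles(job):
    """Build the handle array that would be passed to the SUBMIT ioctl."""
    return sorted(job.bos)
```

Because every referenced BO goes through the same set, the submit path no
longer needs to know about the individual pools or the color FB.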

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
5 years agopanfrost: Make SLAB pool creation rely on BO helpers
Boris Brezillon [Tue, 2 Jul 2019 11:50:44 +0000 (13:50 +0200)]
panfrost: Make SLAB pool creation rely on BO helpers

There's no point duplicating the code, and it will help us simplify
the bo_handles[] filling logic in panfrost_drm_submit_job().

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
5 years agopanfrost: Add the panfrost_drm_{create,release}_bo() helpers
Boris Brezillon [Tue, 2 Jul 2019 11:21:55 +0000 (13:21 +0200)]
panfrost: Add the panfrost_drm_{create,release}_bo() helpers

To avoid the panfrost_memory <-> panfrost_bo dance done in
panfrost_resource_create_bo() and panfrost_bo_unreference().

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
5 years agopanfrost: Move the mmap BO logic out of panfrost_drm_import_bo()
Boris Brezillon [Tue, 2 Jul 2019 11:15:58 +0000 (13:15 +0200)]
panfrost: Move the mmap BO logic out of panfrost_drm_import_bo()

So we can re-use it for the panfrost_drm_create_bo() function we are
about to introduce.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
5 years agopanfrost: Avoid passing winsys handles to import/export BO funcs
Boris Brezillon [Tue, 2 Jul 2019 10:53:17 +0000 (12:53 +0200)]
panfrost: Avoid passing winsys handles to import/export BO funcs

Let's keep a clear split between ioctl wrappers and the rest of the
driver. All the import BO function needs is a dmabuf FD and the screen
object, and the export one should only take care of generating a dmabuf
FD out of a BO object. Winsys handle manipulation should stay in the
resource.c file.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
5 years agopanfrost: Move BO meta-data out of panfrost_bo
Boris Brezillon [Tue, 2 Jul 2019 09:37:40 +0000 (11:37 +0200)]
panfrost: Move BO meta-data out of panfrost_bo

That's what most (all?) implementations seem to do, and my understanding
is that a BO is just a bunch of memory that can be used for anything GPU
related, not only texture/FB resources.

Let's move this metadata into panfrost_resource so we can use
panfrost_bo for all kinds of memory allocation and make BO allocation
more consistent.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
5 years agopanfrost: Stop exposing internal panfrost_drm_*() functions
Boris Brezillon [Tue, 2 Jul 2019 12:20:07 +0000 (14:20 +0200)]
panfrost: Stop exposing internal panfrost_drm_*() functions

panfrost_drm_submit_job() and panfrost_fence_create() are not used
outside of pan_drm.c.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
5 years agopanfrost: Get rid of the "free imported BO" logic
Boris Brezillon [Tue, 2 Jul 2019 08:30:45 +0000 (10:30 +0200)]
panfrost: Get rid of the "free imported BO" logic

bo->imported was never set to true which means this path was never taken.
Moreover, panfrost_drm_free_imported_bo() is missing the munmap()
call which seems wrong because the import BO function calls mmap().

Let's just kill this function along with the ->imported field.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
5 years agopanfrost: Get rid of the panfrost_driver abstraction leftovers
Boris Brezillon [Tue, 2 Jul 2019 08:25:26 +0000 (10:25 +0200)]
panfrost: Get rid of the panfrost_driver abstraction leftovers

Commit 5f81669d880b ("panfrost: Remove the panfrost_driver abstraction")
left a few things behind, remove them now.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
5 years agopanfrost: Move scanout res creation out of panfrost_resource_create()
Boris Brezillon [Tue, 2 Jul 2019 08:12:10 +0000 (10:12 +0200)]
panfrost: Move scanout res creation out of panfrost_resource_create()

Which improves readability and helps us avoid a memory leak.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
5 years agopanfrost: Add the sampled texture BO to the job
Boris Brezillon [Mon, 1 Jul 2019 15:22:26 +0000 (17:22 +0200)]
panfrost: Add the sampled texture BO to the job

Otherwise we get random use-after-{free,unmap} errors.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
Changes in v2:
- Move the panfrost_job_add_bo() call out of the loop

5 years agoradv: enable DCC for layers on GFX8
Samuel Pitoiset [Mon, 1 Jul 2019 14:31:01 +0000 (16:31 +0200)]
radv: enable DCC for layers on GFX8

It's currently only enabled if dcc_slice_size is equal to
dcc_slice_fast_clear_size because the driver assumes that
portions of multiple layers are contiguous but it's not
always true.

Still not supported on GFX9.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: do not enable DCC for mipmapped arrays because performance is worse
Samuel Pitoiset [Mon, 1 Jul 2019 14:31:00 +0000 (16:31 +0200)]
radv: do not enable DCC for mipmapped arrays because performance is worse

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: implement clearing DCC layers on GFX8
Samuel Pitoiset [Mon, 1 Jul 2019 14:30:59 +0000 (16:30 +0200)]
radv: implement clearing DCC layers on GFX8

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: merge radv_dcc_clear_level() into radv_clear_dcc()
Samuel Pitoiset [Mon, 1 Jul 2019 14:30:58 +0000 (16:30 +0200)]
radv: merge radv_dcc_clear_level() into radv_clear_dcc()

This will help for clearing DCC arrays because we need to know
the subresource range.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: add support for decompressing DCC layers with compute
Samuel Pitoiset [Mon, 1 Jul 2019 14:30:57 +0000 (16:30 +0200)]
radv: add support for decompressing DCC layers with compute

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoac: compute the DCC fast clear size per slice on GFX8
Samuel Pitoiset [Mon, 1 Jul 2019 14:30:56 +0000 (16:30 +0200)]
ac: compute the DCC fast clear size per slice on GFX8

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoac: compute the size of one DCC slice on GFX8
Samuel Pitoiset [Mon, 1 Jul 2019 14:30:55 +0000 (16:30 +0200)]
ac: compute the size of one DCC slice on GFX8

Addrlib doesn't provide this info. Because DCC is linear, at least
on GFX8, it's easy to compute the size of one slice.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoiris: Defer closing and freeing VMA until buffers are idle.
Kenneth Graunke [Sat, 4 May 2019 03:27:40 +0000 (20:27 -0700)]
iris: Defer closing and freeing VMA until buffers are idle.

There will unfortunately be circumstances where we cannot re-use a
virtual memory address until it's no longer active on the GPU.  To
facilitate this, we instead move BOs to a "dead" list, and defer
closing them and returning their VMA until they are idle.  We
periodically sweep these away in cleanup_bo_cache, which triggers
every time a new object's refcount hits zero.
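
A minimal sketch of the deferral scheme, in Python for illustration. The class
and field names are hypothetical, BOs are modeled as dicts, and the "busy"
flag stands in for a real GPU idleness query.

```python
class BoCache:
    def __init__(self):
        self.dead = []      # BOs whose refcount hit zero but may still be busy
        self.free_vma = []  # addresses that are safe to hand out again

    def release(self, bo):
        """Called when a BO's refcount hits zero."""
        self.dead.append(bo)
        self.cleanup()

    def cleanup(self):
        """Sweep the dead list, returning the VMA of every idle BO."""
        still_busy = []
        for bo in self.dead:
            if bo["busy"]:
                still_busy.append(bo)
            else:
                self.free_vma.append(bo["address"])
        self.dead = still_busy
```

An address only reaches free_vma once its BO is idle, so it cannot be reused
while the GPU may still be accessing it.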

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
5 years agoiris: Add an explicit alignment parameter to iris_bo_alloc_tiled().
Kenneth Graunke [Sat, 23 Mar 2019 17:04:16 +0000 (10:04 -0700)]
iris: Add an explicit alignment parameter to iris_bo_alloc_tiled().

In the future, some images will need to be aligned to a larger value
than 4096.  Most buffers, however, don't have any such requirement,
so for now we only add the parameter to iris_bo_alloc_tiled() and
leave the others with the simpler interface.
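
For reference, rounding an address up to a power-of-two alignment is usually
done as below. This is a generic Python sketch, not the iris allocator; the
64 KiB value in the usage note is only an example of "larger than 4096".

```python
def align_up(value, alignment):
    """Round value up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (value + alignment - 1) & ~(alignment - 1)
```

For instance, align_up(5000, 4096) gives 8192, while align_up(5000, 65536)
gives 65536.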

v2: Fix missing alignment in vma_alloc, caught by Caio!

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
5 years agov3d: do not flush jobs that are synced with 'Wait for transform feedback'
Iago Toral Quiroga [Thu, 20 Jun 2019 11:46:02 +0000 (13:46 +0200)]
v3d: do not flush jobs that are synced with 'Wait for transform feedback'

Generally, we achieve this by skipping the flush on calls to
v3d_flush_jobs_writing_resource() when we detect that the resource is written
in the current job from a transform feedback write.

The exception to this is the case where the caller is about to map the
resource, in which case we need to flush immediately since we can only emit
'Wait for transform feedback' commands on rendering jobs. We add a parameter
to the function so the caller can identify that scenario.

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agov3d: emit 'Wait for transform feedback' commands when needed
Iago Toral Quiroga [Thu, 20 Jun 2019 10:14:17 +0000 (12:14 +0200)]
v3d: emit 'Wait for transform feedback' commands when needed

The hardware can flush transform feedback writes before reads in the same
job by inserting this command.

This patch detects when the rendering state for the current draw call reads
resources that had been previously written by transform feedback in the
same job and inserts the 'Wait for transform feedback' command before
emitting the new draw.

v2 (Eric):
  - this was intended to look at job->tf_write_prscs for TF jobs.
  - clear job->tf_write_prscs after we emit the TF flush.
  - can skip flushes for fragment shader reads from TF.

v3 (Eric):
  - all resources in job->tf_write_prscs are resources written by TF so
   we don't need to check if they are bound to PIPE_BIND_STREAM_OUTPUT.
  - documented optimization opportunity for geometry stages.
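
The detection logic can be sketched as follows. This is a Python illustration
with hypothetical class and method names; only tf_write_prscs is taken from
the message above, and the fragment shader exception from v2 is not modeled.

```python
class TfJob:
    def __init__(self):
        self.tf_write_prscs = set()  # resources written by TF in this job
        self.commands = []

    def draw(self, reads, tf_writes=()):
        """Emit a draw that reads `reads` and writes `tf_writes` via TF."""
        if self.tf_write_prscs.intersection(reads):
            # The draw reads something TF wrote earlier in the same job.
            self.commands.append("WAIT_FOR_TRANSFORM_FEEDBACK")
            # After the wait, earlier TF writes are no longer pending.
            self.tf_write_prscs.clear()
        self.commands.append("DRAW")
        self.tf_write_prscs.update(tf_writes)
```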

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agov3d: keep track of resources written by transform feedback
Iago Toral Quiroga [Thu, 20 Jun 2019 11:38:56 +0000 (13:38 +0200)]
v3d: keep track of resources written by transform feedback

The hardware provides a feature to sync reads from previous transform feedback
writes in the same job so if we use this mechanism we no longer have to flush
the job.

In order to identify this scenario we need a mechanism to identify resources
that are written by transform feedback.

v2: use _mesa_pointer_set_create (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agost/dri: fix typo in format table for GR1616 format
Mike Blumenkrantz [Thu, 30 May 2019 12:47:59 +0000 (08:47 -0400)]
st/dri: fix typo in format table for GR1616 format

the dri image format here should match the fourcc format

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agost/dri: pass dri2_format_mapping directly to dri2_create_image_from_winsys
Mike Blumenkrantz [Wed, 29 May 2019 20:56:30 +0000 (16:56 -0400)]
st/dri: pass dri2_format_mapping directly to dri2_create_image_from_winsys

this makes the entire struct available for use here

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agomesa/st: simplify format usage in st_bind_egl_image
Mike Blumenkrantz [Wed, 29 May 2019 20:41:59 +0000 (16:41 -0400)]
mesa/st: simplify format usage in st_bind_egl_image

the formats handled in the switch statement will always return an
unknown mesa format, so process them directly and leave the default
case for other/unknown formats

no functional changes

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoiris: Use MI_COPY_MEM_MEM for tiny resource_copy_region calls.
Kenneth Graunke [Wed, 26 Jun 2019 07:05:06 +0000 (00:05 -0700)]
iris: Use MI_COPY_MEM_MEM for tiny resource_copy_region calls.

If our resource_copy_region size is a small number of DWords, then
instead of firing up BLORP, we can simply use MI_COPY_MEM_MEM (after
a CS stall).  We also try and select the optimal batch.

Improves performance in Shadow of Mordor on Low settings at 1920x1080
on Skylake GT4e by 0.689096% +/- 0.473968% (n=4).  It tries to copy
4 bytes of data to a buffer which was most recently used as a writable
compute shader SSBO.  Previously we were switching from compute to the
render pipeline, then firing up all of blorp_buffer_copy...for 4 bytes.

I arbitrarily decided to support 4/8/12/16 bytes.  Jason thinks this
is about the right threshold where it's cheaper to use MI_COPY_MEM_MEM.
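
The size check described above can be sketched as follows. This is only
an illustration under assumptions: the names is_tiny_copy, copy_dwords
and TINY_COPY_MAX_BYTES are hypothetical and do not come from the iris
source, and the loop is a CPU stand-in for emitting one GPU copy command
per DWord.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: the message above mentions 4/8/12/16 bytes. */
#define TINY_COPY_MAX_BYTES 16

/* True when the copy is a non-zero multiple of 4 bytes and at most
 * TINY_COPY_MAX_BYTES, i.e. one to four DWords. */
static inline bool
is_tiny_copy(uint32_t size_bytes)
{
   return size_bytes != 0 &&
          size_bytes % 4 == 0 &&
          size_bytes <= TINY_COPY_MAX_BYTES;
}

/* CPU stand-in for the per-DWord path: copies size_bytes / 4 DWords.
 * A driver would emit one memory-to-memory command per iteration. */
static inline void
copy_dwords(uint32_t *dst, const uint32_t *src, uint32_t size_bytes)
{
   for (uint32_t i = 0; i < size_bytes / 4; i++)
      dst[i] = src[i];
}
```

A caller would test is_tiny_copy() first and fall back to the general
copy path for every other size.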

5 years agoradv: Only allocate supplied number of descriptors when variable.
Bas Nieuwenhuizen [Sat, 29 Jun 2019 01:07:03 +0000 (03:07 +0200)]
radv: Only allocate supplied number of descriptors when variable.

Fixes: b5e04e9217b "radv: Support allocating variable size descriptor sets."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111019
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoegl: simplify loop
Eric Engestrom [Thu, 13 Dec 2018 17:35:28 +0000 (17:35 +0000)]
egl: simplify loop

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
5 years agosparc: Reuse m_vector_asm.h.
Eric Anholt [Thu, 20 Jun 2019 17:35:32 +0000 (10:35 -0700)]
sparc: Reuse m_vector_asm.h.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agomesa: Enable asm unconditionally, now that gen_matypes is gone.
Eric Anholt [Thu, 20 Jun 2019 17:27:28 +0000 (10:27 -0700)]
mesa: Enable asm unconditionally, now that gen_matypes is gone.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agomesa: Replace gen_matypes with a simple header for V4F/mat layout.
Eric Anholt [Thu, 20 Jun 2019 17:18:41 +0000 (10:18 -0700)]
mesa: Replace gen_matypes with a simple header for V4F/mat layout.

We can greatly simplify our builds by just hardcoding GLvector4f and
GLmatrix's layouts.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agomatypes: Drop some unused defines.
Eric Anholt [Thu, 20 Jun 2019 16:18:20 +0000 (09:18 -0700)]
matypes: Drop some unused defines.

Most of these haven't been used since the conversion from checked-in
matypes to generation.  By cutting down the generated contents, this
should clarify why the file is generated: we need
architecture-specific offsets to the V4F fields in the asm that uses
it.

v2: Keep matrix offsets to prevent x86 build breakage.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agomeson: drop duplicate source & inc_dir
Eric Engestrom [Sat, 29 Jun 2019 23:14:47 +0000 (00:14 +0100)]
meson: drop duplicate source & inc_dir

These two are already pulled from `idep_vulkan_util_headers`.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
5 years agoswrast: simplify function pointer calls
Eric Engestrom [Sat, 29 Jun 2019 23:01:15 +0000 (00:01 +0100)]
swrast: simplify function pointer calls

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
5 years agoegl/wayland: use bitset.h for `formats` bit set
Eric Engestrom [Tue, 27 Nov 2018 12:27:45 +0000 (12:27 +0000)]
egl/wayland: use bitset.h for `formats` bit set

Currently only 7 formats are supported, but we don't want the 32 limit
(it's an `unsigned`) to hit us by surprise :]

Let's use bitset.h's BITSET magic to allow us to have any number of
formats, with a static assert to make sure we don't forget to update it.
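
A minimal sketch of the idea, assuming nothing about the real code: the
macros and helpers below are local stand-ins written for this example,
not the definitions from bitset.h, and the format names and table are
hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Local stand-ins for a word-array bit set (not the bitset.h macros). */
#define WORD_BITS       (sizeof(unsigned) * CHAR_BIT)
#define BITSET_WORDS(n) (((n) + WORD_BITS - 1) / WORD_BITS)

/* Hypothetical format list; the last entry counts the formats. */
enum example_format {
   EXAMPLE_FMT_A,
   EXAMPLE_FMT_B,
   EXAMPLE_FMT_C,
   EXAMPLE_FMT_COUNT
};

/* Hypothetical per-format table that must stay in sync with the enum. */
static const char *const example_format_names[] = { "A", "B", "C" };

/* Fails the build if a format is added to one place but not the other. */
static_assert(sizeof(example_format_names) / sizeof(example_format_names[0])
              == EXAMPLE_FMT_COUNT,
              "update example_format_names when adding a format");

struct example_display {
   /* Grows automatically with EXAMPLE_FMT_COUNT. */
   unsigned formats[BITSET_WORDS(EXAMPLE_FMT_COUNT)];
};

static inline void
format_set(unsigned *set, unsigned bit)
{
   set[bit / WORD_BITS] |= 1u << (bit % WORD_BITS);
}

static inline bool
format_test(const unsigned *set, unsigned bit)
{
   return (set[bit / WORD_BITS] >> (bit % WORD_BITS)) & 1u;
}
```

Because the storage is an array of words sized from the format count,
adding formats past the width of a single `unsigned` needs no change to
the set or test helpers.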

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agointel/tools: Add assembler unit tests for ROL/ROR instructions
Sagar Ghuge [Tue, 4 Jun 2019 20:05:20 +0000 (13:05 -0700)]
intel/tools: Add assembler unit tests for ROL/ROR instructions

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agointel/tools: Add ROL/ROR support in assembler
Sagar Ghuge [Tue, 4 Jun 2019 20:04:49 +0000 (13:04 -0700)]
intel/tools: Add ROL/ROR support in assembler

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir: Add lower_rotate flag and set to true in all drivers
Sagar Ghuge [Tue, 4 Jun 2019 00:11:57 +0000 (17:11 -0700)]
nir: Add lower_rotate flag and set to true in all drivers

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agointel/compiler: Emit ROR and ROL instruction
Sagar Ghuge [Thu, 30 May 2019 21:14:52 +0000 (14:14 -0700)]
intel/compiler: Emit ROR and ROL instruction

v2: Reorder patch (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir: Add optimization to use ROR/ROL instructions
Sagar Ghuge [Thu, 30 May 2019 21:15:51 +0000 (14:15 -0700)]
nir: Add optimization to use ROR/ROL instructions

v2: 1) Add more optimization rules for ROL/ROR (Matt Turner)
    2) Add lowering rules for ROL/ROR (Matt Turner)
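
For reference, the shift/or identity that a rotate instruction replaces,
and that a lowering rule expands back into, can be written in C as
below. This is a sketch of the arithmetic only; the actual NIR rules are
not shown in this message, and the helper names rotl32 and rotr32 are
hypothetical.

```c
#include <stdint.h>

/* Rotate left: bits shifted out at the top re-enter at the bottom.
 * Masking the counts keeps both shifts in the range 0..31, so a
 * rotate by 0 is well defined. */
static inline uint32_t
rotl32(uint32_t x, uint32_t n)
{
   n &= 31;
   return (x << n) | (x >> ((32 - n) & 31));
}

/* Rotate right: the mirror image of rotl32. */
static inline uint32_t
rotr32(uint32_t x, uint32_t n)
{
   n &= 31;
   return (x >> n) | (x << ((32 - n) & 31));
}
```

An optimizer can match the right-hand expressions and emit a single
rotate; hardware without the instruction gets the expanded form instead.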

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir: Add urol and uror opcodes
Sagar Ghuge [Thu, 30 May 2019 21:11:58 +0000 (14:11 -0700)]
nir: Add urol and uror opcodes

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agointel/compiler: Enable the emission of ROR/ROL instructions
Sagar Ghuge [Wed, 29 May 2019 18:43:30 +0000 (11:43 -0700)]
intel/compiler: Enable the emission of ROR/ROL instructions

v2: 1) Drop changes for vec4 backend as on Gen11+ we don't support
       align16 mode (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agopanfrost: Implement instanced rendering
Alyssa Rosenzweig [Thu, 27 Jun 2019 21:13:10 +0000 (14:13 -0700)]
panfrost: Implement instanced rendering

We implement GLES3.0 instanced rendering with full support for instanced
arrays (via instance divisors). To do so, we use the new invocation
helpers to invoke a triplet of (1, vertex_count, instance_count), rather
than simply (1, vertex_count, 1). We rewrite the attribute handling code
into a new pan_instancing.c file which handles both the simple LINEAR
case for non-instanced as well as each of the new instancing cases:
MODULO (for per-vertex attributes), POT and NPOT divisors.
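
A rough sketch of the index arithmetic behind these cases follows. It
assumes, and this is an assumption rather than something stated above,
that each invocation gets a linear ID of the form
instance * padded_count + vertex, and that a per-instance attribute
divides that ID by padded_count * instance_divisor. All function names
are hypothetical and the hardware descriptor encoding is not modelled.

```c
#include <stdint.h>

/* Assumed linear invocation ID for (vertex, instance). */
static inline uint32_t
linear_id(uint32_t vertex, uint32_t instance, uint32_t padded_count)
{
   return instance * padded_count + vertex;
}

/* MODULO: a per-vertex attribute in an instanced draw repeats for
 * every instance, so only the vertex part of the ID is kept. */
static inline uint32_t
attr_index_modulo(uint32_t id, uint32_t padded_count)
{
   return id % padded_count;
}

/* POT divisor: dividing by a power of two is a plain right shift. */
static inline uint32_t
attr_index_pot(uint32_t id, uint32_t shift)
{
   return id >> shift;
}

/* NPOT divisor: written as an ordinary division here; hardware would
 * more likely use a precomputed multiply-and-shift. */
static inline uint32_t
attr_index_npot(uint32_t id, uint32_t divisor)
{
   return id / divisor;
}
```

With padded_count = 4, vertex 2 of instance 3 has ID 14: the MODULO case
yields vertex 2, a shift of 2 (divisor 4) yields instance 3, and an
instance divisor of 3 (divisor 12) yields element 1.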

As a side effect, we rework how vertex buffers are handled, duplicating
them to be 1:1 with vertex descriptors to simplify instancing code paths
dramatically. This might be a performance regression, but this remains
to be seen; if so, we can always deduplicate later with some added logic
in pan_instancing.c.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/decode: Compute padded_num_vertices for MODULO
Alyssa Rosenzweig [Thu, 27 Jun 2019 17:18:22 +0000 (10:18 -0700)]
panfrost/decode: Compute padded_num_vertices for MODULO

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard: Emit type appropriate ld_vary
Alyssa Rosenzweig [Fri, 28 Jun 2019 16:30:59 +0000 (09:30 -0700)]
panfrost/midgard: Emit type appropriate ld_vary

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard: Add unsigned ld/st ops
Alyssa Rosenzweig [Fri, 28 Jun 2019 16:07:30 +0000 (09:07 -0700)]
panfrost/midgard: Add unsigned ld/st ops

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/midgard: Use the appropriate ld_attr type
Alyssa Rosenzweig [Thu, 27 Jun 2019 22:33:07 +0000 (15:33 -0700)]
panfrost/midgard: Use the appropriate ld_attr type

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Implement dispatch helpers
Alyssa Rosenzweig [Thu, 27 Jun 2019 15:29:06 +0000 (08:29 -0700)]
panfrost: Implement dispatch helpers

Rather than open-coding workgroups_shift_* type fields, we include a
general routine for packing the vertex/tiler/compute descriptor based on
the provided dispatch parameters.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Remove ancient comment
Alyssa Rosenzweig [Thu, 27 Jun 2019 14:43:33 +0000 (07:43 -0700)]
panfrost: Remove ancient comment

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Extend software tiling to larger bpp
Alyssa Rosenzweig [Mon, 1 Jul 2019 14:39:22 +0000 (07:39 -0700)]
panfrost: Extend software tiling to larger bpp

Should not affect lima.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Rewrite u-interleaving code
Alyssa Rosenzweig [Tue, 25 Jun 2019 15:09:58 +0000 (08:09 -0700)]
panfrost: Rewrite u-interleaving code

Rather than using a magic lookup table with no explanations, let's add
liberal comments to the code to explain what this tiling scheme is and
how to encode/decode it efficiently.

It's not so mysterious after all -- just reordering bits with some XORs
thrown in.
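
To make "reordering bits with some XORs" concrete, here is a minimal
sketch of such an index computation. The 16x16 tile size and the exact
bit order (y bits in the odd positions, x ^ y bits in the even
positions) are assumptions made for this example and are not stated in
this message; the function u_interleave_index is hypothetical.

```c
#include <stdint.h>

/* Spread the low 4 bits of v so that bit i lands at bit 2*i,
 * leaving a gap after every bit. */
static inline uint32_t
space_4(uint32_t v)
{
   return (v & 1) | ((v & 2) << 1) | ((v & 4) << 2) | ((v & 8) << 3);
}

/* Index of pixel (x, y) inside an assumed 16x16 tile:
 * odd bits carry y, even bits carry x XOR y. */
static inline uint32_t
u_interleave_index(uint32_t x, uint32_t y)
{
   x &= 15;
   y &= 15;
   return (space_4(y) << 1) | space_4(x ^ y);
}
```

Decoding reverses the steps: extract the odd bits to recover y, extract
the even bits, and XOR them with y to recover x.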

v2: Correct copyright identifier. Fix spelling error. Switch space_4 to
a LUT. Fix comment typo. Use LUT instead of space_x tricks. Fall back on
generic rather than split up unaligned writes.

v3: Correct stride order (fixes crash loading). Correct coordinate
system mishap.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
5 years agofreedreno: update generated registers
Rob Clark [Mon, 1 Jul 2019 13:14:41 +0000 (06:14 -0700)]
freedreno: update generated registers

Corrects the a3xx texconst state for TILE_MODE.

Signed-off-by: Rob Clark <robdclark@chromium.org>