Lionel Landwerlin [Wed, 7 Mar 2018 14:28:41 +0000 (14:28 +0000)]
i965: perf: add support for raw queries
The INTEL_performance_query extension provides a list of queries that
a user can select to monitor a particular workload. Each query reports
different sets of counters (roughly looking at different parts of the
hardware, i.e. caches/fixed functions/etc...).
Each query has an associated configuration that we need to program
into the hardware before using the query. Up to now, we provided
predefined queries. This change allows users to build their own query
(and associated configuration) externally, and have the i965 driver
use that configuration through a new query named:
Intel_Raw_Hardware_Counters_Set_0_Query
When this query is selected, the i965 driver will report raw counter
deltas (meaning their values need to be interpreted by the user, as
opposed to existing queries that provide human readable values).
This change is also useful for debug purposes for building new
pre-defined queries and verifying the underlying numbers make sense
before writing equations for user readable output.
This change's purpose is also to enable GPA. GPA uses a library called
MDAPI that processes raw counter data. MDAPI expects raw data to have
a certain layout (per generation which is a bit unfortunate...). This
change also embeds the expected data layouts.
v2: Enable raw queries on gen 7->11, v1 had 7->9 (Lionel)
v3: Don't assert on cherryview for gen7... (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Wed, 7 Mar 2018 16:02:40 +0000 (16:02 +0000)]
i965: perf: read slice/unslice frequencies from OA reports
v2: Add comment breaking down where the frequency values come from (Ken)
v3: More documentation (Ken/Lionel)
Adjust clock ratio multiplier to reflect the divider's behavior (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Wed, 7 Mar 2018 10:46:58 +0000 (10:46 +0000)]
i965: perf: snapshot RPSTAT register
This register contains the current/previous frequency of the GT; it's
one of the values GPA would like to have as part of their queries.
v2: Don't use this register on baytrail/cherryview (Ken)
Use GET_FIELD() macro (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Tue, 6 Mar 2018 17:09:21 +0000 (17:09 +0000)]
i965: perf: extract utility functions
We would like to reuse a number of the functions and structures in
another file in a future commit.
We also move the previous content of brw_performance_query.h into
brw_performance_query_metrics.h to be included by generated metrics
files.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Samuel Pitoiset [Mon, 23 Apr 2018 14:55:39 +0000 (16:55 +0200)]
ac: teach get_ac_sampler_dim() about subpass attachments
Suggested by Nicolai.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Samuel Pitoiset [Mon, 23 Apr 2018 12:46:26 +0000 (14:46 +0200)]
ac/nir: add missing round_slice for 1D arrays
This fixes a bunch of CTS fails with 1D arrays:
dEQP-VK.glsl.texture_functions.texture*.sampler1darray_*
Fixes: 625dcbbc456 ("amd/common: pass address components individually to ac_build_image_intrinsic")
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Dylan Baker [Mon, 9 Apr 2018 20:59:55 +0000 (13:59 -0700)]
bin/install_megadrivers: rename a few variables to make things clearer
Originally the "each" variable was just a part of the "drivers"
variable. It's not anymore so it's a bit ambiguous.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Dylan Baker [Mon, 9 Apr 2018 20:53:09 +0000 (13:53 -0700)]
bin/install_megadrivers: fix DESTDIR and -D*-path
This fixes -Ddri-drivers-path, -Dvdpau-libs-path, etc. with DESTDIR when
those paths are absolute. Currently due to the way python's os.path.join
handles absolute paths these will ignore DESTDIR, which is bad. This
fixes them to be relative to DESTDIR if that is set.
Fixes: 3218056e0eb375eeda470058d06add1532acd6d4 ("meson: Build i965 and dri stack")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
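The os.path.join behaviour described in the DESTDIR fix above can be reproduced with a minimal sketch. The helper name destdir_join and the example paths are hypothetical and are not taken from bin/install_megadrivers; the sketch assumes a POSIX-style path layout.

```python
import os


def destdir_join(destdir, path):
    """Place `path` under `destdir`, even when `path` is absolute.

    Hypothetical helper for illustration only.
    """
    if not destdir:
        return path
    if os.path.isabs(path):
        # Drop the drive (if any) and the leading separator so that
        # os.path.join() does not discard `destdir`.
        path = os.path.splitdrive(path)[1].lstrip(os.sep)
    return os.path.join(destdir, path)


# os.path.join() restarts at an absolute component, so DESTDIR is ignored:
naive = os.path.join('/tmp/stage', '/usr/lib/dri')

# Making the path relative first keeps it under DESTDIR:
fixed = destdir_join('/tmp/stage', '/usr/lib/dri')

print(naive)
print(fixed)
```

Relative paths and an empty DESTDIR pass through unchanged, so the helper only alters the absolute-path case the commit message describes.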
Dylan Baker [Thu, 19 Apr 2018 18:02:32 +0000 (11:02 -0700)]
compiler/glsl: close fd's in glcpp_test.py
I would have thought falling out of scope would allow the gc to collect
these, but apparently it doesn't, and this hits an fd limit on macos.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106133
Fixes: db8cd8e36771eed98eb638fd0593c978c3da52a9 ("glcpp/tests: Convert shell scripts to a python script")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
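One way to avoid relying on the garbage collector for descriptors, as the glcpp_test.py commit above discusses, is to close them explicitly with a context manager. This is a sketch of that general pattern, not the actual change; the function name write_case and the file contents are assumptions made for illustration.

```python
import os
import tempfile


def write_case(text):
    """Write `text` to a temporary file and return its path.

    tempfile.mkstemp() returns a raw integer descriptor; wrapping it with
    os.fdopen() inside a `with` block closes it deterministically instead
    of waiting for the object to be collected.
    """
    fd, path = tempfile.mkstemp(suffix='.c')
    with os.fdopen(fd, 'w') as f:
        f.write(text)
    return path


# Creating many files no longer accumulates open descriptors.
paths = [write_case('#define FOO 1\n') for _ in range(100)]
for p in paths:
    with open(p) as f:
        assert f.read() == '#define FOO 1\n'
    os.remove(p)
```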
Bas Nieuwenhuizen [Sun, 22 Apr 2018 17:05:19 +0000 (19:05 +0200)]
nir: Do not use progress for unreachable code in return lowering.
We seem to use progress for two cases:
1) When we lowered some returns.
2) When we remove unreachable code.
If just case 2 happens we assert as state->return_flag has not
been allocated yet, but we are still trying to insert all
predicates based on it.
This splits the concerns. We only use progress internally for case 1
and then keep track of 2 in a separate variable to indicate progress
in the return value of the pass.
This is slightly better than transforming the assert into
if (!state->return_flag) return, as the solution in this patch avoids
inserting predicates even if some other part of the might need them.
Fixes: 6e22ad6edc ("nir: return early when lowering a return at the end of a function")
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106174
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Józef Kucia [Tue, 10 Apr 2018 22:11:57 +0000 (00:11 +0200)]
radv: advertise 8 bits of subpixel precision for viewports
This is what radeonsi does.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Johan Klokkhammer Helsing [Fri, 20 Apr 2018 10:29:16 +0000 (12:29 +0200)]
st/dri: Fix dangling pointer to a destroyed dri_drawable
If an EGLSurface is created, made current and destroyed, and then a second
EGLSurface is created, the second malloc in driCreateNewDrawable may
return the same pointer address the first surface's drawable had.
Consequently, when dri_make_current later tries to determine if it should
update the texture_stamp it compares the surface's drawable pointer against
the drawable in the last call to dri_make_current and assumes it's the same
surface (which it isn't).
When texture_stamp is left unset, dri_st_framebuffer_validate thinks
it has already called update_drawable_info for that drawable, leaving it
unvalidated, and this is when bad things start to happen. In my case it
manifested itself by the width and height of the surface being unset.
This is fixed by setting the pointer to NULL before freeing the
surface.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106126
Signed-off-by: Johan Klokkhammer Helsing <johan.helsing@qt.io>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Tue, 10 Apr 2018 02:19:35 +0000 (22:19 -0400)]
nv50/ir: make a copy of tex src if it's referenced multiple times
For nv50 we coalesce the srcs and defs into a single node. As such, we
can end up with impossible constraints if the source is referenced
after the tex operation (which, due to the coalescing of values, will
have overwritten it).
This logic already exists for inserting moves for MERGE/UNION sources.
It's the exact same idea here, so leverage that code, which also
includes a few optimizations around not extending live ranges
unnecessarily.
Fixes tests/spec/glsl-1.30/execution/fs-textureSize-components.shader_test
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Lepton Wu [Thu, 5 Apr 2018 19:38:48 +0000 (12:38 -0700)]
virgl: disable virgl when no 3D for virtio gpu.
If users are running mesa under an old version of qemu or have turned off
GL at runtime, the virtio gpu driver doesn't actually work. Add a detection
here so mesa can fall back to software rendering.
v2:
- move detection from loader to virgl (Ilia, Emil)
Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 13 Apr 2018 02:40:55 +0000 (12:40 +1000)]
radv: mark const structs as extern in header file to avoid lto damage
The copr repo from che was using LTO and he reported radv broke
recently with it. When testing with lto builds here I noticed
that we weren't seeing any instance extensions reported.
It appears LTO was treating the const without extern as an empty
struct, this is possibly a gcc bug, but we can work around it
just by marking these with extern.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dylan Baker [Sat, 21 Apr 2018 03:31:26 +0000 (20:31 -0700)]
Bump version after 18.1
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Ilia Mirkin [Thu, 1 Mar 2018 00:40:48 +0000 (19:40 -0500)]
gallium/tests/trivial: fix viewport depth transform
These were getting mapped off into outer space, which would cause nv50
and nvc0 to clip the primitives (as depth_clip was enabled).
These drivers are configured to clip everything outside the [0, 1]
range, even though the hardware supports other view settings.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ilia Mirkin [Tue, 27 Feb 2018 00:26:36 +0000 (19:26 -0500)]
trace: allow image resource to be null
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Karol Herbst [Tue, 27 Mar 2018 17:10:34 +0000 (19:10 +0200)]
nv50/ir/ra: prefer def == src2 for fma with immediates on nvc0
This helps with the PostRALoadPropagation pass moving long immediates into
FMA/MAD instructions.
changes in shader-db:
total instructions in shared programs : 5894114 -> 5886074 (-0.14%)
total gprs used in shared programs : 666558 -> 666563 (0.00%)
total shared used in shared programs : 520416 -> 520416 (0.00%)
total local used in shared programs : 53524 -> 53524 (0.00%)
total bytes used in shared programs : 54006744 -> 53932472 (-0.14%)
          local  shared  gpr  inst  bytes
helped        0       0    2  4192   4192
hurt          0       0    7     9      9
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
[imirkin: minor edits to separate nv50 and nvc0+ cases]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
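The percentages in the shader-db summary above follow from the before/after totals. A small sketch that recomputes them from the numbers quoted in the commit message (the dictionary keys are shortened labels chosen here, not shader-db output):

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Totals copied from the commit message above.
stats = {
    'instructions': (5894114, 5886074),
    'gprs': (666558, 666563),
    'bytes': (54006744, 53932472),
}

for name, (before, after) in stats.items():
    print('{:<13}: {} -> {} ({:.2f}%)'.format(
        name, before, after, percent_change(before, after)))
```

Rounded to two decimals, the five extra GPRs out of 666558 show up as 0.00%, which matches the table.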
Rhys Perry [Sat, 21 Apr 2018 10:43:16 +0000 (11:43 +0100)]
docs/features: mark GL_ARB_post_depth_coverage as DONE for nvc0
This was done a while ago but never marked on features.txt. Note that
this is only supported on GM200+.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Dylan Baker [Sat, 21 Apr 2018 01:52:55 +0000 (18:52 -0700)]
autotools: Include new meson files
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Dylan Baker [Sat, 21 Apr 2018 02:04:01 +0000 (19:04 -0700)]
autotools: Add passes.h to sources so it will be included in the tarball
This was introduced in commit 8f848ada8a42d9aaa8136afa1bafe32281a0fb48
but not added to the sources list, which is necessary for it to be
included in release tarballs.
Fixes: 8f848ada8a42d9aaa8136afa1bafe32281a0fb48 ("swr/rast: Start refactoring of builder/packetizer.")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Dylan Baker [Sat, 21 Apr 2018 01:28:57 +0000 (18:28 -0700)]
autotools: include include/vulkan headers
This is needed to provide vk_android_native_buffer.h for vk_enum_to_str.
v2: - remove accidentally included changes
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Rhys Perry [Fri, 20 Apr 2018 22:32:05 +0000 (23:32 +0100)]
nvc0: fix line width on GM20x+
This has the side-effect of fixing polygon-offset piglit test failures.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Nanley Chery [Mon, 9 Apr 2018 18:20:27 +0000 (11:20 -0700)]
i965/miptree: Delete an unused function
We're going to combine ::mcs_buf and ::hiz_buf in later commits. Once
that happens, this function no longer makes sense.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery [Mon, 9 Apr 2018 18:27:08 +0000 (11:27 -0700)]
i965/miptree: Don't leak the clear_color_bo
Free the clear_color_bo in addition to freeing the
intel_miptree_aux_buffer which holds the reference to it.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Tue, 17 Apr 2018 22:07:13 +0000 (15:07 -0700)]
i965/blorp: Do the gen11 BTI flush
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Tue, 17 Apr 2018 22:06:46 +0000 (15:06 -0700)]
anv/blorp: Do the gen11 BTI flush
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Lucas Stach [Fri, 20 Apr 2018 12:34:45 +0000 (14:34 +0200)]
etnaviv: fix texture_format_needs_swiz
memcmp returns 0 when both swizzles are the same, which means we don't
need any hardware swizzling. texture_format_needs_swiz should return
true when the return value of the memcmp is non-zero.
Fixes: 751ae6afbefd ("etnaviv: add support for swizzled texture formats")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
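The inverted memcmp test described in the etnaviv fix above can be modelled in a few lines. This is a Python model of the C logic, not the driver code: the swizzle encoding, the default swizzle and the helper names are assumptions made for illustration.

```python
# Assumed default (identity) swizzle; the real driver uses its own encoding.
DEFAULT_SWIZZLE = ('x', 'y', 'z', 'w')


def memcmp_like(a, b):
    """Mimic the C memcmp() contract: 0 when equal, non-zero otherwise."""
    if a == b:
        return 0
    return -1 if a < b else 1


def needs_swiz_buggy(swizzle):
    # Bug: memcmp() == 0 means "identical", so this is true exactly
    # when no hardware swizzling is needed.
    return memcmp_like(swizzle, DEFAULT_SWIZZLE) == 0


def needs_swiz_fixed(swizzle):
    # Swizzling is needed only when the two swizzles differ.
    return memcmp_like(swizzle, DEFAULT_SWIZZLE) != 0


print(needs_swiz_fixed(('x', 'y', 'z', 'w')))
print(needs_swiz_fixed(('z', 'y', 'x', 'w')))
```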
Samuel Pitoiset [Fri, 20 Apr 2018 16:06:43 +0000 (18:06 +0200)]
ac/nir: fix image dimension for subpass attachments
For subpass attachments we need one more coordinate with
the layer, so make them array types.
This fixes a bunch of CTS fails with RADV.
Fixes: 24fb3e6aa1 ("ac/nir: use ac_build_image_opcode for image intrinsics")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bas Nieuwenhuizen [Fri, 20 Apr 2018 16:16:02 +0000 (18:16 +0200)]
radv: Mark GTT memory as device local for APUs.
Otherwise a lot of games complain about not having enough memory,
and it is sort of local so this seems reasonable to me.
CC: 18.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Samuel Pitoiset [Fri, 20 Apr 2018 11:42:36 +0000 (13:42 +0200)]
radv/winsys: allow to submit up to 4 IBs for chips without chaining
The SI family doesn't support chaining which means the maximum
size in dwords per CS is limited. When that limit was reached
we failed to submit the CS and the application crashed.
This patch allows submitting up to 4 IBs, which is currently the
limit, but recent amdgpu supports more than that.
Please note that we can reach the limit of 4 IBs per submit
but currently we can't improve that. The only solution is to
upgrade libdrm. That will be improved later but for now this
should fix crashes on SI or when using RADV_DEBUG=noibs.
Fixes: 36cb5508e89 ("radv/winsys: Fail early on overgrown cs.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105775
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
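A rough model of the behaviour described above: without chaining, a command stream that exceeds the per-IB size has to be split, and the submission still fails once more than 4 IBs would be needed. The function name and the per-IB dword limit used in the example are hypothetical; only the limit of 4 IBs comes from the commit message.

```python
MAX_IBS_PER_SUBMIT = 4  # limit stated in the commit message


def split_into_ibs(num_dwords, max_ib_dwords):
    """Return per-IB sizes for a CS, or None if it cannot be submitted.

    `max_ib_dwords` stands in for the hardware limit on IB size; its
    value here is an assumption for illustration.
    """
    sizes = []
    remaining = num_dwords
    while remaining > 0:
        size = min(remaining, max_ib_dwords)
        sizes.append(size)
        remaining -= size
    if len(sizes) > MAX_IBS_PER_SUBMIT:
        return None  # still too large, even with 4 IBs
    return sizes


print(split_into_ibs(10, 4))
print(split_into_ibs(17, 4))
```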
Stefan Schake [Sun, 15 Apr 2018 22:45:17 +0000 (00:45 +0200)]
gallium/util: Android backtrace support
We can't use any of the existing implementations in u_debug_stack.
Android technically has libunwind, but it's been modified to the point
where it no longer compiles with the Mesa usage. The library is also
not meant to be referenced by vendor libraries. The officially sanctioned
way of obtaining backtraces is through Android's own libbacktrace, a
C++ library. Access it through a separate C++ source file on Android only.
Signed-off-by: Stefan Schake <stschake@gmail.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Stefan Schake [Sun, 15 Apr 2018 22:45:16 +0000 (00:45 +0200)]
gallium/util: Don't stub u_debug_stack on Android
The fallback path for no libunwind ends up being stubs for Android.
Don't compile them in so we can provide our own implementation.
Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Samuel Pitoiset [Fri, 20 Apr 2018 14:58:24 +0000 (16:58 +0200)]
ac/nir: handle nir_intrinsic_load_first_vertex like base_vertex
This fixes a ton of CTS crashes.
Fixes: c366f422f0 ("nir: Offset vertex_id by first_vertex instead of base_vertex")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 20 Apr 2018 13:11:24 +0000 (15:11 +0200)]
radv/winsys: allow local BOs on APUs
Ported from RadeonSI.
Local BOs ignore BO priorities, and we don't need those on APUs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 19 Apr 2018 11:48:33 +0000 (13:48 +0200)]
radv: use a global BO list only for VK_EXT_descriptor_indexing
Maintaining two different paths is annoying but this gets
rid of the performance regression introduced by the global
BO list.
We might find a better solution in the future, but for now
just keep two paths.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 19 Apr 2018 11:39:17 +0000 (13:39 +0200)]
Revert "radv: Don't store buffer references in the descriptor set."
In order to reduce a performance regression introduced by
4b13fe55a4 ("radv: Keep a global BO list for VkMemory."),
we are going to maintain two different paths.
One when VK_EXT_descriptor_indexing is enabled by the
application because we need to have a global BO list, and
one (the old one) when it's not enabled.
With Talos on Polaris, the global BO list reduces performance
by 10% which is too much for me.
This reverts commit ab6cadd3ecc7fbdd9079808b407674e0b19c52f0.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Jose Maria Casanova Crespo [Mon, 19 Mar 2018 14:03:17 +0000 (15:03 +0100)]
i965/fs: retype offset_reg to UD at load_ssbo
All operations with offset_reg at do_vector_read are done
with UD type. So copy propagation was not working through
the generated MOVs:
mov(8) vgrf9:UD, vgrf7:D
This change allows removing the MOV generated for reading the
first components for 16-bit and 64-bit ssbo reads with
non-constant offsets.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Nicolai Hähnle [Fri, 20 Apr 2018 07:30:07 +0000 (09:30 +0200)]
ac/nir: use ac_build_image_opcode for image intrinsics
So that we'll use the dimension-aware intrinsics in the future.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 20 Apr 2018 07:29:57 +0000 (09:29 +0200)]
radeonsi: generate image load/store/atomic ops using ac_build_image_opcode
In preparation of dimension-aware LLVM image intrinsics.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 23 Mar 2018 10:20:24 +0000 (11:20 +0100)]
amd/common: pass address components individually to ac_build_image_intrinsic
This is in preparation for the new image intrinsics.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 16 Feb 2018 13:21:56 +0000 (14:21 +0100)]
amd/common: pass new enum ac_image_dim to ac_build_image_opcode
This is in preparation for the new, dimension-aware LLVM image
intrinsics.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 4 Apr 2018 19:14:13 +0000 (21:14 +0200)]
radeonsi/nir: fix crash in test involving the sample mask
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Nicolai Hähnle [Mon, 2 Apr 2018 11:20:02 +0000 (13:20 +0200)]
radeonsi/nir: set FS properties only when scanning a fragment shader
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Nicolai Hähnle [Mon, 2 Apr 2018 12:12:50 +0000 (14:12 +0200)]
ac/nir: fix atomic compare-and-swap
The LLVM instruction returns { i32, i1 }, where the i1 indicates success.
We're only interested in the first part, which is the loaded value.
Fixes dEQP-GLES31.functional.compute.shared_var.atomic.compswap.*
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
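The compare-and-swap fix above comes down to keeping only the first element of the returned pair. A minimal Python model, assuming a list stands in for shared memory and that the helper names are purely illustrative:

```python
def cmpxchg(memory, index, expected, new):
    """Model of a compare-exchange that returns (loaded_value, success)."""
    old = memory[index]
    success = old == expected
    if success:
        memory[index] = new
    return old, success


def atomic_comp_swap(memory, index, expected, new):
    # Only the first element, the loaded value, is the result wanted here;
    # the success flag is dropped.
    old, _success = cmpxchg(memory, index, expected, new)
    return old


mem = [5]
print(atomic_comp_swap(mem, 0, 5, 9), mem)
print(atomic_comp_swap(mem, 0, 5, 1), mem)
```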
Nicolai Hähnle [Tue, 16 Jan 2018 13:38:00 +0000 (14:38 +0100)]
radeonsi: fix error paths of si_texture_transfer_map
trans is zero-initialized, but trans->resource is set up immediately, so
it needs to be dereferenced.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Nicolai Hähnle [Fri, 23 Mar 2018 14:43:58 +0000 (15:43 +0100)]
glsl: prevent spurious Valgrind errors when serializing NIR
It looks as if the structure fields array is fully initialized below,
but in fact at least gcc in debug builds will not actually overwrite
the unused bits of bit fields.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Aaron Watry [Sat, 7 Apr 2018 18:44:53 +0000 (13:44 -0500)]
clover: Fix host access validation for sub-buffer creation
From CL 1.2 Section 5.2.1:
CL_INVALID_VALUE if buffer was created with CL_MEM_HOST_WRITE_ONLY and
flags specify CL_MEM_HOST_READ_ONLY, or if buffer was created with
CL_MEM_HOST_READ_ONLY and flags specify CL_MEM_HOST_WRITE_ONLY, or if
buffer was created with CL_MEM_HOST_NO_ACCESS and flags specify
CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_WRITE_ONLY.
Fixes CL 1.2 CTS test/api get_buffer_info
v2: Correct host_access_flags check (Francisco)
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
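The rule quoted from the CL 1.2 specification above can be written as a small predicate. This is a sketch of the rule itself, not of clover's C++ implementation; the function name is hypothetical, and the bit values are believed to match the Khronos cl.h header but should be treated as illustrative.

```python
# Host-access flag bits (assumed to match cl.h; illustrative here).
CL_MEM_HOST_WRITE_ONLY = 1 << 7
CL_MEM_HOST_READ_ONLY = 1 << 8
CL_MEM_HOST_NO_ACCESS = 1 << 9


def sub_buffer_host_flags_valid(parent_flags, sub_flags):
    """Return False where the quoted rule requires CL_INVALID_VALUE."""
    if (parent_flags & CL_MEM_HOST_WRITE_ONLY) and \
            (sub_flags & CL_MEM_HOST_READ_ONLY):
        return False
    if (parent_flags & CL_MEM_HOST_READ_ONLY) and \
            (sub_flags & CL_MEM_HOST_WRITE_ONLY):
        return False
    if (parent_flags & CL_MEM_HOST_NO_ACCESS) and \
            (sub_flags & (CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_WRITE_ONLY)):
        return False
    return True


print(sub_buffer_host_flags_valid(CL_MEM_HOST_WRITE_ONLY,
                                  CL_MEM_HOST_READ_ONLY))
print(sub_buffer_host_flags_valid(CL_MEM_HOST_READ_ONLY,
                                  CL_MEM_HOST_READ_ONLY))
```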
Neil Roberts [Thu, 25 Jan 2018 18:15:43 +0000 (19:15 +0100)]
nir: Offset vertex_id by first_vertex instead of base_vertex
base_vertex will be zero for non-indexed calls and in that case we
need vertex_id to be offset by the ‘first’ parameter instead. That is
what we get with first_vertex. This is true for both GL and Vulkan.
The freedreno driver is also setting vertex_id_zero_based on
nir_options. In order to avoid breakage this patch switches the
relevant code to handle SYSTEM_VALUE_FIRST_VERTEX so that it can
retain the same behavior.
v2: change a3xx/fd3_emit.c and a4xx/fd4_emit.c from
SYSTEM_VALUE_BASE_VERTEX to SYSTEM_VALUE_FIRST_VERTEX (Kenneth).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Neil Roberts [Thu, 25 Jan 2018 18:15:41 +0000 (19:15 +0100)]
spirv: Lower BaseVertex to FIRST_VERTEX instead of BASE_VERTEX
The base vertex in Vulkan is different from GL in that for non-indexed
primitives the value is taken from the firstVertex parameter instead
of being set to zero. This coincides with the new SYSTEM_VALUE_FIRST_VERTEX
instead of BASE_VERTEX.
v2 (idr): Add comment describing why SYSTEM_VALUE_FIRST_VERTEX is used
for SpvBuiltInBaseVertex. Suggested by Jason.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Antia Puentes [Thu, 25 Jan 2018 18:15:40 +0000 (19:15 +0100)]
intel: Handle firstvertex in an identical way to BaseVertex
Until we set gl_BaseVertex to zero for non-indexed draw calls
both have an identical value.
The Vertex Elements are kept like that:
* VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <Draw ID, 0, 0, 0>
v2 (idr): Mark nir_intrinsic_load_first_vertex as "unreachable" in
emit_system_values_block and fs_visitor::nir_emit_vs_intrinsic.
Neil Roberts [Thu, 25 Jan 2018 18:15:39 +0000 (19:15 +0100)]
intel/compiler: Add a uses_firstvertex flag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Antia Puentes [Thu, 25 Jan 2018 18:15:38 +0000 (19:15 +0100)]
compiler: Add SYSTEM_VALUE_FIRST_VERTEX and intrinsics
This VS system value will contain the value passed as <basevertex> for
indexed draw calls or the value passed as <first> for non-indexed draw
calls. It can be used to calculate the gl_VertexID as
SYSTEM_VALUE_VERTEX_ID_ZERO_BASE plus SYSTEM_VALUE_FIRST_VERTEX.
From the OpenGL 4.6 spec, 10.4 "Drawing Commands Using Vertex Arrays":
- Page 352:
"The index of any element transferred to the GL by DrawArraysOneInstance
is referred to as its vertex ID, and may be read by a vertex shader as
gl_VertexID. The vertex ID of the ith element transferred is first +
i."
- Page 355:
"The index of any element transferred to the GL by
DrawElementsOneInstance is referred to as its vertex ID, and may be read
by a vertex shader as gl_VertexID. The vertex ID of the ith element
transferred is the sum of basevertex and the value stored in the
currently bound element array buffer at offset indices + i."
Currently the gl_VertexID calculation uses SYSTEM_VALUE_BASE_VERTEX but
this will have to change when the value of gl_BaseVertex is
fixed. Currently its value is broken for non-indexed draw calls because
it must be zero but we are setting it to <first>.
v2: use SYSTEM_VALUE_FIRST_VERTEX as name for the value, instead of
SYSTEM_VALUE_BASE_VERTEX_ID (Kenneth).
v3 (idr): Rebase on Rob Clark converting nir_intrinsics.h to be
generated. Reformat commit message to 72 columns.
Reviewed-by: Neil Roberts <nroberts@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
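The relationship between the values discussed above can be written out as a short sketch. The function names are hypothetical and model only the arithmetic quoted from the OpenGL 4.6 spec and the commit message, not any driver code.

```python
def vertex_ids_draw_arrays(first, count):
    """Non-indexed draw: the vertex ID of the ith element is first + i."""
    return [first + i for i in range(count)]


def vertex_ids_draw_elements(indices, basevertex):
    """Indexed draw: vertex ID is basevertex plus the fetched index."""
    return [basevertex + index for index in indices]


def first_vertex(indexed, first=0, basevertex=0):
    """Value described for SYSTEM_VALUE_FIRST_VERTEX."""
    return basevertex if indexed else first


def base_vertex(indexed, basevertex=0):
    """gl_BaseVertex, which must be zero for non-indexed draws."""
    return basevertex if indexed else 0


# gl_VertexID = zero-based vertex ID + first_vertex holds in both cases.
print(vertex_ids_draw_arrays(10, 3))
print(vertex_ids_draw_elements([2, 0, 1], 100))
print(first_vertex(False, first=10), base_vertex(False))
```

For a non-indexed draw the two values diverge (first versus zero), which is why the vertex ID calculation cannot keep using the base vertex once gl_BaseVertex is fixed.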
Mike Lothian [Thu, 19 Apr 2018 09:02:39 +0000 (10:02 +0100)]
meson: Build st_tests_common with gtest
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106131
Fixes: 34cb4d0ebc1 ("meson: build tests for gallium mesa state tracker")
Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Bas Nieuwenhuizen [Thu, 19 Apr 2018 04:35:08 +0000 (06:35 +0200)]
radv: Add Vega M support.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Thu, 19 Apr 2018 05:29:03 +0000 (07:29 +0200)]
radv: Add bound checking workaround for dynamic buffers.
I have seen a few applications and games do the dynamic buffer bounds
incorrectly; this makes it easier to work around, e.g. for debugging.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Thomas Hellstrom [Thu, 12 Apr 2018 12:41:47 +0000 (14:41 +0200)]
svga: Fix incorrect advertizing of EGL_KHR_gl_colorspace
When advertizing this extension, egl_dri2 uses the DRI2_RENDERER_QUERY
extension to query whether an sRGB format is supported. That extension will
query our driver with the BIND flag PIPE_BIND_RENDER_TARGET rather than
PIPE_BIND_DISPLAY_TARGET which is used when building the configs.
We only return the correct value for PIPE_BIND_DISPLAY_TARGET.
The inconsistency causes EGL to crash at surface initialization if sRGB is
not supported. Fix this by supporting both bind flags.
Testing done:
piglit egl_gl_colorspace srgb
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Mike Lothian [Wed, 4 Apr 2018 08:22:54 +0000 (09:22 +0100)]
swr: Fix include for createPromoteMemoryToRegisterPass
Include llvm/Transforms/Utils.h with the newest LLVM 7
v2: Include with " " rather than < > (Vinson Lee)
v3: Use LLVM_VERSION_MAJOR rather than HAVE_LLVM (George Kyriazis)
Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
Samuel Pitoiset [Tue, 17 Apr 2018 14:05:18 +0000 (16:05 +0200)]
radv: enable DCC for MSAA 2x textures on VI under an option
This can be enabled with RADV_PERFTEST=dccmsaa.
DCC for MSAA textures is actually not as easy to implement. It
looks like there are some corner cases. I will improve support
incrementally.
Vega support, as well as Polaris improvements, will be added later.
No CTS changes on Polaris using RADV_DEBUG=zerovram and
RADV_PERFTEST=dccmsaa.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 17 Apr 2018 14:05:17 +0000 (16:05 +0200)]
radv: decompress DCC for multisampled source images before resolving
Multisampled source images (i.e. color attachments) can now be
DCC compressed, so the driver needs to perform a DCC decompression
pass before resolving.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 17 Apr 2018 14:05:16 +0000 (16:05 +0200)]
radv: add a workaround for fast clears with DCC and MSAA textures
This should be fixed at some point in order to improve
performance.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 17 Apr 2018 14:05:15 +0000 (16:05 +0200)]
radv: allocate CMASK for DCC fast clear with MSAA
CMASK is required because it should be cleared to
0xCCCCCCCC for MSAA textures.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 17 Apr 2018 14:05:14 +0000 (16:05 +0200)]
radv: implement fast color clear for DCC with MSAA
When DCC is enabled with MSAA textures, CMASK should be
cleared to 0xCCCCCCCC.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 17 Apr 2018 13:08:11 +0000 (15:08 +0200)]
radv: make sure to sync after resolving using the compute path
This fixes some random CTS failures:
dEQP-VK.renderpass.multisample.*.
Performing a fast-clear eliminate is still useless, but it
seems that we need to sync.
Found while running CTS with RADV_DEBUG=zerovram.
Fixes: 56a171a499c ("radv: don't fast-clear eliminate after resolving a subpass with compute")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 18 Apr 2018 16:53:44 +0000 (18:53 +0200)]
radv: dump the SHA1 of SPIRV in the hang report
Might be useful for debugging purposes, especially when we
want to replace a shader on the fly.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bas Nieuwenhuizen [Wed, 11 Apr 2018 17:08:30 +0000 (19:08 +0200)]
radv: Enable VK_EXT_descriptor_indexing.
This adds everything except non-uniform indexing, which needs a bit
more work and testing.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Wed, 11 Apr 2018 23:36:22 +0000 (01:36 +0200)]
spirv: Add support for runtime descriptor array cap.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Wed, 11 Apr 2018 23:34:29 +0000 (01:34 +0200)]
spirv: Add support for VK_EXT_descriptor_indexing uniform indexing caps.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Mon, 9 Apr 2018 23:06:47 +0000 (01:06 +0200)]
radv: Support allocating variable size descriptor sets.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Mon, 9 Apr 2018 22:00:22 +0000 (00:00 +0200)]
radv: Add support for variable descriptor set layouts.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Mon, 9 Apr 2018 21:36:19 +0000 (23:36 +0200)]
radv: Fix GetDescriptorSetLayoutSupport.
The continue means we do alignment differently than during creation,
making the buffer smaller than expected.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Mon, 9 Apr 2018 21:16:55 +0000 (23:16 +0200)]
radv: Use sorted bindings for set layout creation.
Previously we did not care about having the set storage in order,
but for variable descriptor count we want the highest binding
at the end of the storage.
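
A minimal sketch of the ordering idea, written in Python rather than the driver's C. The function name, the (binding, size) representation and the sizes are all hypothetical; it only illustrates that assigning offsets in ascending binding order leaves the highest binding last in the storage.

```python
def assign_offsets(bindings):
    """Lay out bindings in ascending binding-number order.

    bindings: iterable of (binding_number, size_in_bytes) pairs, in any order.
    Returns (offsets, total_size), where offsets maps binding number to
    its byte offset in the set storage.
    """
    offsets = {}
    offset = 0
    # Sorting by binding number guarantees the highest binding is placed last.
    for number, size in sorted(bindings):
        offsets[number] = offset
        offset += size
    return offsets, offset


# Hypothetical layout, declared out of order.
offsets, total = assign_offsets([(2, 64), (0, 16), (1, 32)])
print(offsets, total)
```

With the highest binding at the end, a variable-sized last binding would presumably only change the total size and not the offsets of the other bindings.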
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Mon, 9 Apr 2018 11:02:14 +0000 (13:02 +0200)]
radv: Don't store buffer references in the descriptor set.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Mon, 9 Apr 2018 10:46:49 +0000 (12:46 +0200)]
radv: Keep a global BO list for VkMemory.
With update after bind we can't attach BOs to the command buffer
from the descriptor set anymore, so we have to have a global BO
list.
I am somewhat surprised this works really well even though we have
implicit synchronization in the WSI based on the bo list associations
and with the new behavior every command buffer is associated with
every swapchain image. But I could not find slowdowns in games because
of it.
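
A rough model of the global list, in Python with hypothetical names (Device, allocate_memory, submission_bo_list); it is not the radv code. It assumes each memory allocation registers its BO with the device and every submission references the whole list.

```python
class Device(object):
    """Hypothetical device object that owns the global BO list."""

    def __init__(self):
        self.global_bos = []

    def allocate_memory(self, bo):
        # Every memory allocation registers its BO with the device.
        self.global_bos.append(bo)
        return bo

    def free_memory(self, bo):
        # Freed memory drops off the global list.
        self.global_bos.remove(bo)

    def submission_bo_list(self, cmd_buffer_bos):
        """BOs referenced by one submission: the command buffer's own BOs
        plus every BO on the global list, without duplicates."""
        seen = []
        for bo in list(cmd_buffer_bos) + self.global_bos:
            if bo not in seen:
                seen.append(bo)
        return seen
```

In this model descriptor sets never contribute BOs themselves, which is why updating a descriptor after it is bound cannot leave a submission with a stale list.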
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sun, 8 Apr 2018 11:03:06 +0000 (13:03 +0200)]
spirv: Update spirv.h to 12f8de9f04327336b699b1b80aa390ae7f9ddbf4
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Kenneth Graunke [Fri, 13 Apr 2018 18:48:06 +0000 (11:48 -0700)]
i965: Fix shadow batches to be the same size as the real BO.
brw_bo_alloc may round up our allocation size to the next bucket size.
In this case, we would malloc a shadow buffer that was the original
intended size, but use bo->size (the larger size) for all of our checks.
This could cause us to run off the end of the shadow buffer.
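
The mismatch can be modeled in a few lines of Python. The bucket sizes and function names below are hypothetical and stand in for the allocator's behavior; the point is only that the shadow buffer has to be sized from the rounded-up BO size, not from the original request.

```python
BUCKET_SIZES = [4096, 8192, 16384, 32768]  # hypothetical bucket sizes


def bo_alloc_size(requested):
    """Round a request up to the next bucket, as a bucketing allocator might."""
    for bucket in BUCKET_SIZES:
        if requested <= bucket:
            return bucket
    return requested


def make_shadow(requested, use_bo_size):
    """Return (bo_size, shadow) for a batch of the requested size.

    use_bo_size=False models the bug: the shadow keeps the requested size
    while later bounds checks use the larger bo_size.
    """
    bo_size = bo_alloc_size(requested)
    shadow_size = bo_size if use_bo_size else requested
    return bo_size, bytearray(shadow_size)


# Buggy sizing: the two lengths disagree.
bo_size, shadow = make_shadow(5000, use_bo_size=False)
print(bo_size, len(shadow))

# Fixed sizing: the shadow matches the real BO.
bo_size, shadow = make_shadow(5000, use_bo_size=True)
print(bo_size, len(shadow))
```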
v2: Actually use the new BO size (caught by Lionel)
Reported-by: James Xiong <james.xiong@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c7dcee58b5fe183e1653c13bff6a212f0d157b29 (i965: Avoid problems from referencing orphaned BOs after growing.)
Marek Olšák [Fri, 13 Apr 2018 19:18:26 +0000 (15:18 -0400)]
glsl_to_tgsi: try harder to lower unsupported ir_binop_vector_extract
This fixes some piglits.
Cc: 18.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Leo Liu [Wed, 22 Nov 2017 18:31:53 +0000 (13:31 -0500)]
radeon/vce: disable vce dual pipe on VegaM
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 27 Feb 2017 22:28:07 +0000 (23:28 +0100)]
radeonsi: add support for VegaM
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 7 Nov 2017 01:00:03 +0000 (02:00 +0100)]
amd/addrlib: add support for VegaM
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 17 Apr 2018 19:28:04 +0000 (15:28 -0400)]
radeonsi/gfx9: fix a hang with an empty first IB
This packet causes the no-op IB detection to fail, so the IB is always
submitted. Also fix the no-op IB detection by moving the begin call.
Cc: 18.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Dylan Baker [Thu, 5 Apr 2018 21:39:13 +0000 (14:39 -0700)]
meson: build graw tests
This only enables the null and xlib targets, so no Windows support yet.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Dylan Baker [Tue, 6 Feb 2018 23:46:25 +0000 (15:46 -0800)]
meson: build tests for gallium mesa state tracker
v2: - Fix typo
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Dylan Baker [Thu, 11 Jan 2018 00:13:52 +0000 (16:13 -0800)]
meson: build gallium unit tests
v2: - gate unit tests on swrast being enabled (Eric A)
v3: - rebase on libtrace being merged with gallium auxiliary
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net> (v2)
Dylan Baker [Thu, 11 Jan 2018 00:07:11 +0000 (16:07 -0800)]
meson: Build gallium trivial tests
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Dylan Baker [Wed, 10 Jan 2018 23:18:54 +0000 (15:18 -0800)]
meson: Remove TODO about mesa/main tests
They're already done.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Dylan Baker [Thu, 11 Jan 2018 22:41:42 +0000 (14:41 -0800)]
meson: enable glcpp test
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Dylan Baker [Tue, 9 Jan 2018 23:26:39 +0000 (15:26 -0800)]
glcpp/tests: Convert shell scripts to a python script
This ports glcpp-test.sh and glcpp-test-cr-lf.sh to a python script that
accepts arguments for each line ending type. This should allow for
better reporting to users.
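
As an illustration of one flag per line ending type, here is a small argparse sketch. The flag names and helper functions are assumptions made for this example and are not taken from the actual script.

```python
import argparse

# (flag name, line ending) pairs; the flag names are assumptions.
LINE_ENDINGS = [('unix', '\n'), ('windows', '\r\n'), ('oldmac', '\r')]


def build_parser():
    """Create a parser with one boolean flag per line ending type."""
    parser = argparse.ArgumentParser(description='Run tests per line ending.')
    for name, _ in LINE_ENDINGS:
        parser.add_argument(
            '--' + name, action='store_true',
            help='run the tests with {} line endings'.format(name))
    return parser


def selected_endings(args):
    """Return the (name, ending) pairs whose flag was passed."""
    return [(n, e) for n, e in LINE_ENDINGS if getattr(args, n)]


def convert(text, ending):
    """Rewrite Unix newlines in a test input to the requested ending."""
    return text.replace('\n', ending)
```

Because each ending is a separate flag, a failure can be reported against the specific line ending that triggered it.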
v2: - Use $PYTHON2 to be consistent with other tests in mesa
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Dylan Baker [Thu, 11 Jan 2018 22:32:53 +0000 (14:32 -0800)]
glsl/tests: Remove unused compare_ir.py script
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Dylan Baker [Thu, 11 Jan 2018 22:32:40 +0000 (14:32 -0800)]
meson: enable optimization-test
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Dylan Baker [Sat, 9 Dec 2017 01:45:03 +0000 (17:45 -0800)]
glsl/tests: Convert optimization-test.sh to pure python
This patch converts optimization-test.sh to python. In the process it
removes external shell dependencies, including diff. It replaces the
python script that generates shell scripts with a python library that
generates test cases and runs them using subprocess.
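
A hedged sketch of the general pattern: run a command with subprocess and compare its output in Python instead of shelling out to diff. The helper name and the test case are hypothetical, and the Python interpreter stands in for the program under test.

```python
import difflib
import subprocess
import sys


def run_case(command, expected):
    """Run command and compare its stdout with the expected text.

    Returns (passed, diff_text); diff_text is empty when the output matches.
    """
    actual = subprocess.check_output(command, universal_newlines=True)
    # difflib replaces the external diff binary for readable failures.
    diff = ''.join(difflib.unified_diff(
        expected.splitlines(True), actual.splitlines(True),
        'expected', 'actual'))
    return actual == expected, diff


# Stand-in test case: the interpreter prints a fixed string.
cases = [
    ([sys.executable, '-c', 'print("(a b)")'], '(a b)\n'),
]
for command, expected in cases:
    passed, diff = run_case(command, expected)
    print('PASS' if passed else 'FAIL\n' + diff)
```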
v2: - use $PYTHON2 to be consistent with other tests in mesa
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Dylan Baker [Sat, 9 Dec 2017 01:45:03 +0000 (17:45 -0800)]
meson: run glsl compiler warnings test
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Dylan Baker [Sat, 9 Dec 2017 01:25:50 +0000 (17:25 -0800)]
glsl/tests: reimplement warnings-test in python
This reimplements the test in python with a shell script wrapper that
allows autotools to continue to run the test without realizing that
anything has changed.
Using python has two advantages. First, it's portable, so this test can
be run on Windows as well as Linux since it requires only python: no more
diff, pwd or sh. Second, it's no longer tied to autotools implementation
details like the environment variables $srcdir and $abs_builddir (though
the autotools shell wrapper still uses those), which makes it possible
to run the test in meson.
v2: - Use $PYTHON2 in script to be consistent with other scripts in mesa
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
George Kyriazis [Tue, 10 Apr 2018 22:49:19 +0000 (17:49 -0500)]
swr/rast: Fix VGATHERPD lowering
Also implement VHSUBPS in x86 lowering pass.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
George Kyriazis [Tue, 10 Apr 2018 17:03:41 +0000 (12:03 -0500)]
swr/rast: Replace x86 VMOVMSK with llvm-only implementation
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
George Kyriazis [Tue, 10 Apr 2018 06:05:19 +0000 (01:05 -0500)]
swr/rast: Optimize late/bindless JIT of samplers
Add per-worker thread private data to all shader calls
Add per-worker sampler cache and jit context
Add late LoadTexel JIT support
Add per-worker-thread Sampler / LoadTexel JIT
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
George Kyriazis [Mon, 9 Apr 2018 22:21:46 +0000 (17:21 -0500)]
swr/rast: Implement VROUND intrinsic in x86 lowering pass
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
George Kyriazis [Mon, 9 Apr 2018 18:35:43 +0000 (13:35 -0500)]
swr/rast: Refactor to improve code sharing.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
George Kyriazis [Mon, 9 Apr 2018 17:51:14 +0000 (12:51 -0500)]
swr/rast: minimize codegen redundant work
Move filtering of redundant codegen operations into gen scripts themselves
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>