Marek Olšák [Fri, 9 Jun 2017 16:46:07 +0000 (18:46 +0200)]
radeonsi: move instance divisors into a constant buffer
Shader key size: 107 -> 47
Divisors of 0 and 1 are encoded in the shader key. Greater instance divisors
are loaded from a constant buffer.
The shader code doing the division is huge. Is it something we need to
worry about? Does any app use instance divisors >= 2?
VS prolog disassembly:
s_load_dwordx4 s[12:15], s[0:1], 0x80 ;
C00A0300 00000080
s_nop 0 ;
BF800000
s_waitcnt lgkmcnt(0) ;
BF8C007F
s_buffer_load_dword s14, s[12:15], 0x4 ;
C0220386 00000004
s_waitcnt lgkmcnt(0) ;
BF8C007F
v_cvt_f32_u32_e32 v4, s14 ;
7E080C0E
v_rcp_iflag_f32_e32 v4, v4 ;
7E084704
v_mul_f32_e32 v4, 0x4f800000, v4 ;
0A0808FF 4F800000
v_cvt_u32_f32_e32 v4, v4 ;
7E080F04
v_mul_hi_u32 v5, v4, s14 ;
D2860005 00001D04
v_mul_lo_i32 v6, v4, s14 ;
D2850006 00001D04
v_cmp_eq_u32_e64 s[12:13], 0, v5 ;
D0CA000C 00020A80
v_sub_i32_e32 v5, vcc, 0, v6 ;
340A0C80
v_cndmask_b32_e64 v5, v6, v5, s[12:13] ;
D1000005 00320B06
v_mul_hi_u32 v5, v5, v4 ;
D2860005 00020905
v_add_i32_e32 v6, vcc, v5, v4 ;
320C0905
v_subrev_i32_e32 v4, vcc, v5, v4 ;
36080905
v_cndmask_b32_e64 v4, v4, v6, s[12:13] ;
D1000004 00320D04
v_mul_hi_u32 v5, v4, v1 ;
D2860005 00020304
v_add_i32_e32 v4, vcc, s8, v0 ;
32080008
v_mul_lo_i32 v6, v5, s14 ;
D2850006 00001D05
v_add_i32_e32 v7, vcc, 1, v5 ;
320E0A81
v_cmp_ge_u32_e64 s[12:13], v1, v6 ;
D0CE000C 00020D01
v_sub_i32_e32 v6, vcc, v1, v6 ;
340C0D01
v_cmp_le_u32_e32 vcc, s14, v6 ;
7D960C0E
v_cndmask_b32_e64 v8, 0, -1, s[12:13] ;
D1000008 00318280
v_cndmask_b32_e64 v6, 0, -1, vcc ;
D1000006 01A98280
v_and_b32_e32 v6, v8, v6 ;
260C0D08
v_cmp_eq_u32_e32 vcc, 0, v6 ;
7D940C80
v_cndmask_b32_e32 v6, v7, v5, vcc ;
000C0B07
v_add_i32_e32 v5, vcc, -1, v5 ;
320A0AC1
v_cmp_eq_u32_e32 vcc, 0, v8 ;
7D941080
v_cndmask_b32_e32 v5, v6, v5, vcc ;
000A0B06
v_add_i32_e32 v5, vcc, s9, v5 ;
320A0A09
v2: set prefer_mono for fetched instance divisors
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 26 Jun 2017 22:32:47 +0000 (00:32 +0200)]
radeonsi: check nr_cbufs in other places before flushing CB
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 19 Jun 2017 23:21:19 +0000 (01:21 +0200)]
radeonsi: use #pragma pack to pack si_shader_key
sizeof(struct si_shader_key):
Before reverting the 2 commits: 120 bytes
After reverting the 2 commits: 128 bytes
With #pragma pack: 107 bytes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 19 Jun 2017 23:12:47 +0000 (01:12 +0200)]
Revert "radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs"
This reverts commit
7b2240ac9ce3ba9bd86f4ae8aac53af8878c0b10.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 19 Jun 2017 23:12:38 +0000 (01:12 +0200)]
Revert "radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy"
This reverts commit
6b6fed3a3c81c2b0d319ef121df20a0dc914705f.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 22 Jun 2017 15:16:14 +0000 (17:16 +0200)]
mesa: optimize GL_PRIMITIVE_RESTART_NV more
And other client state changes don't have to call
update_derived_primitive_restart_state.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 24 Jun 2017 01:16:06 +0000 (03:16 +0200)]
mesa: fix clip plane enable breakage
Broken by:
commit
00173d91b70ae4dcea7c6324ee4858c498cae14b
Author: Marek Olšák <marek.olsak@amd.com>
Date: Sat Jun 10 12:09:43 2017 +0200
mesa: don't flag _NEW_TRANSFORM for st/mesa if possible
It also optimizes the case slightly for GL core.
It doesn't try to fix that glEnable might be a bad place to do the
clip plane transformation.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Leo Liu [Fri, 23 Jun 2017 17:21:09 +0000 (13:21 -0400)]
radeon/vcn: enable h264 decode entension support
It's enabled through message buffer for UVD
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Charmaine Lee [Mon, 26 Jun 2017 20:18:33 +0000 (14:18 -0600)]
svga: clean up format_cap_table
Per Jose's suggestion, this patch cleans up format_cap_table to remove
the unnecessary default cap value for vgpu10 formats since those devcap values
can be retrieved from the device.
Tested with MTT conform, glretrace, piglit in HWv13 and HWv8.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Charmaine Lee [Mon, 12 Jun 2017 22:56:17 +0000 (15:56 -0700)]
svga: fix the default devcap for SVGA3D_Z_D24S8_INT
The default devcap for format SVGA3D_Z_D24S8_INT in HWv8 when its devcap is
not explicitly advertised should be set to zero to match the default value
in the device.
Tested with MTT piglit in HW version 8.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Charmaine Lee [Wed, 16 Nov 2016 23:31:00 +0000 (15:31 -0800)]
svga: create buffer surfaces for incompatible bind flags
In cases where certain bind flags cannot be enabled together,
such as CONSTANT_BUFFER cannot be combined with any other flags,
a separate host surface will be created.
For example, if a stream output buffer is reused as a constant buffer,
two host surfaces will be created, one for stream output,
and another one for constant buffer. Data will be copied from the
stream output surface to the constant buffer surface.
Fixes piglit test ext_transform_feedback-immediate-reuse-index-buffer,
ext_transform_feedback-immediate-reuse-uniform-buffer
Tested with MTT piglit, MTT glretrace, Nature, NobelClinician Viewer, Tropics.
v2: Fix bind flags compatibility check as suggested by Brian.
v3: Use the list utility to maintain the buffer surface list.
v4: Use the SAFE rev of LIST_FOR_EACH_ENTRY
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Charmaine Lee [Tue, 15 Nov 2016 18:15:46 +0000 (10:15 -0800)]
svga: do not unconditionally enable streamout bind flag
Currently we unconditionally enable streamout bind flag at
buffer resource creation time. This is not necessary if the buffer
is never used as a streamout buffer. With this patch, we enable
streamout bind flag as indicated by the state tracker. If the buffer
is later bound to streamout and does not already has streamout bind
flag enabled, we will recreate the buffer with
the new set of bind flags. Buffer content will be copied
from the old buffer to the new one.
Tested with MTT piglit, Nature, Tropics, Lightsmark.
v2: Fix bind flags check as suggested by Brian.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Charmaine Lee [Mon, 26 Jun 2017 23:24:15 +0000 (17:24 -0600)]
svga: pass tobind_flags to svga_buffer_handle
This is to prepare for more bind_flags optimization
in subsequent patches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Charmaine Lee [Fri, 11 Nov 2016 22:40:57 +0000 (14:40 -0800)]
svga: pass bind_flags to surface create functions
This is to prepare for other bind_flags optimization
in subsequent patches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Mon, 26 Jun 2017 20:40:58 +0000 (14:40 -0600)]
pipe_loader_sw: fix compilation warning
Add the new 'flags' parameter to pipe_loader_sw_create_screen().
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Engestrom [Tue, 27 Jun 2017 10:47:14 +0000 (11:47 +0100)]
mesa: add missing include
src/mesa/drivers/x11/xm_dd.c:688:7: warning: implicit declaration of function ‘_mesa_update_draw_buffer_bounds’; did you mean ‘_mesa_has_ARB_draw_buffers_blend’? [-Wimplicit-function-declaration]
_mesa_update_draw_buffer_bounds(ctx, ctx->DrawBuffer);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cc: Marek Olšák <marek.olsak@amd.com>
Fixes:
585c5cf8a514783d9ed3 ("mesa: don't update draw buffer bounds in
_mesa_update_state")
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Lionel Landwerlin [Mon, 5 Jun 2017 10:24:25 +0000 (11:24 +0100)]
i965: perf: add support for Geminilake
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Mon, 24 Apr 2017 01:38:36 +0000 (18:38 -0700)]
i965: perf: add support for Kabylake
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Mon, 24 Apr 2017 02:12:00 +0000 (19:12 -0700)]
i965: perf: use gen_device_info rather then brw_context
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Robert Bragg [Thu, 13 Apr 2017 18:50:37 +0000 (19:50 +0100)]
i965: perf: ensure isolated timer reports while idle don't confuse filtering
From experimentation in IGT, we found that the OA unit might label
some report as "idle" (using an invalid context ID), right after a
report for a given context. Deltas generated by those reports actually
belong to the previous context, even though they're not labelled as
such.
This change makes ensure that while reading OA reports, we only
consider the GPU actually idle after 2 reports with an invalid context
ID.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 30 Mar 2017 14:46:40 +0000 (15:46 +0100)]
i965: perf: keep on reading reports until delimiting timestamp
Due to an underlying hardware race condition, we have no guarantee
that all the reports coming from the OA buffer related to the workload
we're trying to measure have landed to memory by the time all the work
submitted has completed. That means we need to keep on reading the OA
stream until we read a report with a timestamp more recent than the
timestamp recored by the MI_REPORT_PERF_COUNT at the end of the
performance query.
v2: fix uninitialized offset variable to 0 (Lionel)
v3: rework the reading to avoid blocking the user of the API unless
requested (Rob)
v4: fix a bug that makes the i965 driver reading the perf stream when
not necessary, leading to very long counter accumulation times
(Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Robert Bragg [Wed, 25 Nov 2015 16:41:04 +0000 (16:41 +0000)]
i965: Add Gen8+ INTEL_performance_query support
Enables access to OA unit metrics on Gen8+ via INTEL_performance_query.
v2: make use of new parameters coming from gen_device_info (Lionel)
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Robert Bragg [Wed, 25 Nov 2015 16:41:04 +0000 (16:41 +0000)]
i965: Add XML OA metric sets for Gen8+
Also updates Makefile.am to generate corresponding normalization code.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Robert Bragg [Wed, 22 Feb 2017 22:50:35 +0000 (22:50 +0000)]
i965: Add Gen8+ sys_vars for generated OA code
In preparation for adding XML OA metric set descriptions for Gen 8 and 9
which will result in auto generated code that depends on a number of new
system variables ($EuSubslicesTotalCount, $EuThreadsCount and
$SliceMask) this adds corresponding members to brw->perf.sys_vars.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 15 Jun 2017 11:28:32 +0000 (12:28 +0100)]
anv/i965: drop libdrm_intel dependency completely
With Ken's work to drop the library dependency on libdrm_intel, we now
only depend on libdrm for the kernel uapi headers it provides. It
seems like we're better off just embeddeding those headers ourselves,
making the lives of people developping news features tightly
integrated with the kernel a tiny bit easier.
This change also makes it a bit more obvious what cflags/libs are
required by the i915 drivers vs i965, by renaming INTEL_CFLAGS/LIBS
into I915_CFLAGS/LIBS.
Headers were generated from drm-tip on the following commit :
commit
6d61e70ccc21606ffb8a0a03bd3aba24f659502b
Merge:
338ffbf7cb5e c0bc126f97fb
Author: Dave Airlie <airlied@redhat.com>
Date: Tue Jun 27 07:24:49 2017 +1000
Backmerge tag 'v4.12-rc7' into drm-next
v2: Use installed files from the kernel (Daniel Vetter)
v3: Use headers from drm-next rather than drm-tip (Dave/Daniel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 15 Jun 2017 11:28:07 +0000 (12:28 +0100)]
i915: use different CFLAGS/LIBS variables than i965/anv
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Lionel Landwerlin [Tue, 25 Apr 2017 21:41:52 +0000 (14:41 -0700)]
aubinator: import intel_aub.h from libdrm
This enables us to compile aubinator without the libdrm dependency.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 22 Jun 2017 01:15:50 +0000 (02:15 +0100)]
i965: perf: minimize the chances to spread queries across batchbuffers
Counter related to timings will be sensitive to any delay introduced
by the software. In particular if our begin & end of performance
queries end up in different batches, time related counters will
exhibit biffer values caused by the time it takes for the kernel
driver to load new requests into the hardware.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Juan A. Suarez Romero [Thu, 8 Jun 2017 11:50:55 +0000 (11:50 +0000)]
nir: implement GLSL.std.450 NMax, NMIn and NClamp operations
v2: NIR fmax/fmin already handles NaN (Connor).
Reviewed by: Elie Tournier <elie.tournier@collabora.com>
Juan A. Suarez Romero [Thu, 8 Jun 2017 11:03:42 +0000 (11:03 +0000)]
nir: add support for 64-bit in SmoothStep function
According to GLSL.std.450 spec, SmoothStep expects input to be a
floating-point type, but it does not restrict the bitsize.
Current implementation relies on inputs to be 32-bit.
This commit extends the support to 64-bit size inputs.
Reviewed by: Elie Tournier <elie.tournier@collabora.com>
Juan A. Suarez Romero [Thu, 8 Jun 2017 10:06:48 +0000 (10:06 +0000)]
nir: sge operation is defined for floating-point types
According to GLSL.std.450 spec, the operand for step() function must be
a floating-point. It does not restrict the value to 32-bit floats.
Reviewed by: Elie Tournier <elie.tournier@collabora.com>
Topi Pohjolainen [Mon, 26 Jun 2017 07:43:15 +0000 (10:43 +0300)]
i965: Separate gen < 8 and gen >= 8 paths explicitly in wrap_mode()
Makes coverity happier.
Fix indentation in gen >= 8 block while at it.
CID: 1413020
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Topi Pohjolainen [Mon, 26 Jun 2017 07:36:50 +0000 (10:36 +0300)]
intel/anv: Add missing break in anv_CreateDevice()
CID: 1413018
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Nicolai Hähnle [Thu, 18 May 2017 20:04:37 +0000 (22:04 +0200)]
ac/nir: convert emit helpers to ac_llvm_context
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nicolai Hähnle [Sat, 24 Jun 2017 19:06:34 +0000 (21:06 +0200)]
ac/nir: remove unused nir_to_llvm_context::has_ddxy
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nicolai Hähnle [Sun, 25 Jun 2017 10:57:02 +0000 (12:57 +0200)]
ac/nir: implement nir_op_f2b
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nicolai Hähnle [Sat, 24 Jun 2017 18:39:39 +0000 (20:39 +0200)]
ac/nir: implement nir_op_{b2i,i2b}
Booleans in NIR are ~0 for true, b2i returns 0/1.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nicolai Hähnle [Thu, 18 May 2017 16:01:50 +0000 (18:01 +0200)]
ac/nir: convert type helpers to ac_llvm_context
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nicolai Hähnle [Sat, 24 Jun 2017 15:56:38 +0000 (17:56 +0200)]
ac/llvm: fix type of second llvm.cttz.* parameter
LLVM has required an i1 here for a long time. llvm.ctlz.* was fixed in
commit
edd23e06067 ("ac/llvm: fix various findMSB bugs").
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nicolai Hähnle [Mon, 15 May 2017 22:04:10 +0000 (00:04 +0200)]
ac/shader_info: fix a comment
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nicolai Hähnle [Thu, 8 Jun 2017 18:04:28 +0000 (20:04 +0200)]
ac: add ac_llvm_context::v8i32
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nicolai Hähnle [Thu, 18 May 2017 20:02:48 +0000 (22:02 +0200)]
ac: add ac_llvm_context::{i,f}32_{0,1}
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nicolai Hähnle [Thu, 30 Mar 2017 12:10:26 +0000 (14:10 +0200)]
ac: add ac_llvm_context::{i16, i64, f16, f64}
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Ilia Mirkin [Sat, 24 Jun 2017 22:35:29 +0000 (18:35 -0400)]
nv50/ir: fix combineLd/St to update existing records as necessary
Previously the logic would decide that the record is kept, which
translates into keep = false in the caller, which meant that these
passes did not run.
While it's right that keep = false which means that a new record does
not need to be added, we do still have to perform the usual list
maintenance. It's easiest to do this pre-merge rather than post.
The lowering that clip/cull distance passes produce triggers this bug in
TCS (since reading outputs is done differently in other stages), but it
should be possible to achieve it with the right sequence of regular
reads/writes.
Fixes: KHR-GL45.cull_distance.functional
Fixes: generated_tests/spec/arb_tessellation_shader/execution/tes-input/tes-input-gl_ClipDistance.shader_test
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Ilia Mirkin [Sat, 24 Jun 2017 21:09:20 +0000 (17:09 -0400)]
nv50/ir: adjust overlapping logic to take fileIndex-relative offsets
If the fileIndex is different, that means they are in logically
different spaces. However if there's also a relative offset, then they
could end up pointing at the same spot again.
Also add a note about potential for multiple buffers to overlap even if
they're at different file indexes. However that's potentially lowered
away by the point that this logic hits.
Not known to fix any specific application or test.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Ilia Mirkin [Sat, 24 Jun 2017 21:08:11 +0000 (17:08 -0400)]
nv50/ir: VFETCH is also considered a load for MemoryOpt
This has no effect since in practice this will only play for
memory-backed files, for which VFETCH will never happen.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Ilia Mirkin [Sat, 24 Jun 2017 17:17:08 +0000 (13:17 -0400)]
nv50,nvc0: remove IDX from bufctx immediately, to avoid conflicts with clear
The idxbuf could linger, and when a clear happened, which also uses the
3d bufctx, we could get an error trying to access it.
This fixes spurious crashes/errors in CTS tests.
Fixes:
61d8f3387d ("nv50,nvc0: clear index buffer bufctx bin unconditionally")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Ilia Mirkin [Sat, 24 Jun 2017 16:08:52 +0000 (12:08 -0400)]
nv50/ir: fetch indirect sources BEFORE the op that uses them
All the BuildUtil helpers just insert the operation into the current BB.
So we have to take care that any fetchSrc() operations happen before the
operation whose setIndirect() it goes into.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Timothy Arceri [Thu, 22 Jun 2017 22:56:40 +0000 (08:56 +1000)]
mesa: skip FLUSH_VERTICES() if no samplers were changed
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Thu, 22 Jun 2017 22:44:25 +0000 (08:44 +1000)]
mesa: don't set _NEW_PROGRAM_CONSTANTS for non-bindless opaque uniforms
v2: rebase on new _mesa_flush_vertices_for_uniforms() helper
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Rob Herring [Mon, 26 Jun 2017 21:23:00 +0000 (16:23 -0500)]
Android: add renderonly files to libmesa_gallium
vc4 now depends on renderonly functions, but these weren't added to the
Android build resulting in the following errors:
src/gallium/drivers/vc4/vc4_resource.c:380: error: undefined reference to 'renderonly_scanout_destroy'
src/gallium/drivers/vc4/vc4_resource.c:681: error: undefined reference to 'renderonly_create_gpu_import_for_resource'
src/gallium/drivers/vc4/vc4_screen.c:625: error: undefined reference to 'renderonly_dup'
src/gallium/winsys/pl111/drm/pl111_drm_winsys.c:37: error: undefined reference to 'renderonly_create_gpu_import_for_resource'
src/gallium/winsys/pl111/drm/pl111_drm_winsys.c:37: error: undefined reference to 'renderonly_create_gpu_import_for_resource'
Fixes:
7029ec05e2c7 ("gallium: Add renderonly-based support for pl111+vc4.")
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Timothy Arceri [Mon, 26 Jun 2017 03:27:17 +0000 (13:27 +1000)]
mesa: add KHR_no_error support for glCopyTexImage*D()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Timothy Arceri [Mon, 26 Jun 2017 03:20:45 +0000 (13:20 +1000)]
mesa: add no error support to copyteximage()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Timothy Arceri [Mon, 26 Jun 2017 03:14:03 +0000 (13:14 +1000)]
mesa: create copyteximage_err() helper and always inline copyteximage()
This will be useful in the following patch when we add KHR_no_error
support.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Timothy Arceri [Mon, 26 Jun 2017 02:58:34 +0000 (12:58 +1000)]
mesa: tidy up copyteximage()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Ian Romanick [Sat, 3 Jun 2017 02:08:15 +0000 (19:08 -0700)]
i915: On Gen <= 3 there are no array textures
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ian Romanick [Sat, 3 Jun 2017 00:29:53 +0000 (17:29 -0700)]
i915: On Gen <= 3 there is no W-tiling
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ian Romanick [Fri, 2 Jun 2017 23:57:45 +0000 (16:57 -0700)]
i915: Remove unused fields intel_mipmap_tree::logical_(width|height|depth)0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ian Romanick [Fri, 2 Jun 2017 23:56:19 +0000 (16:56 -0700)]
i915: Remove unused field intel_mipmap_tree::array_spacing_lod0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ian Romanick [Fri, 2 Jun 2017 23:42:58 +0000 (16:42 -0700)]
i915: On Gen <= 3 there is no multisampling
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ian Romanick [Fri, 2 Jun 2017 23:40:30 +0000 (16:40 -0700)]
i915: Trivial code reformatting
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ian Romanick [Fri, 2 Jun 2017 23:33:35 +0000 (16:33 -0700)]
i915,i965: Don't condition use of GLSL clear on the current API
Meta always sets the API to API_OPENGL_COMPAT, so the current API
setting is irrelevant.
text data bss dec hex filename
7154994 256860 37332 7449186 71aa62 32-bit i965_dri.so before
7154978 256860 37332 7449170 71aa52 32-bit i965_dri.so after
6788451 328056 50704 7167211 6d5ceb 64-bit i965_dri.so before
6788419 328056 50704 7167179 6d5ccb 64-bit i965_dri.so after
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Timothy Arceri [Mon, 26 Jun 2017 02:38:24 +0000 (12:38 +1000)]
mesa: add KHR_no_error support for glCopyTex{ture}SubImage*D()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Timothy Arceri [Mon, 26 Jun 2017 02:38:23 +0000 (12:38 +1000)]
mesa: add copy_texture_sub_image_no_error() helper
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Timothy Arceri [Mon, 26 Jun 2017 02:38:22 +0000 (12:38 +1000)]
mesa: remove redundant NULL check
This can never be NULL in any of the entry paths.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Timothy Arceri [Mon, 26 Jun 2017 02:38:21 +0000 (12:38 +1000)]
mesa: create copy_texture_sub_image_err() helper
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Timothy Arceri [Mon, 26 Jun 2017 02:38:20 +0000 (12:38 +1000)]
mesa: make _mesa_copy_texture_sub_image() static
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Timothy Arceri [Mon, 26 Jun 2017 00:49:17 +0000 (10:49 +1000)]
mesa: add KHR_no_error support for gl{Compressed}TexImage*D()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Timothy Arceri [Mon, 26 Jun 2017 00:49:16 +0000 (10:49 +1000)]
mesa: add no error support to teximage()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Timothy Arceri [Mon, 26 Jun 2017 00:49:15 +0000 (10:49 +1000)]
mesa: create wrapper around teximage()
This is used to inline KHR_no_error logic without inlining
the function into all its callers.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Timothy Arceri [Sun, 25 Jun 2017 23:25:07 +0000 (09:25 +1000)]
mesa: fix unused variable warning in release builds
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Sat, 24 Jun 2017 20:39:01 +0000 (22:39 +0200)]
radeonsi: don't flush and wait for CB after depth-only rendering
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Ian Romanick [Tue, 6 Jun 2017 00:34:38 +0000 (17:34 -0700)]
blorp: Use normalized coordinates on Gen6
Apparently, the sampler has some sort of precision issues for
non-normalized texture coordinates with linear filtering. This caused
some small precision issues in scaled blits. Work around this by using
normalized coordinates. There is some extra work necessary because Gen6
uses TEX (instead of TXF) for some multisample resolve blits.
Fixes piglit.spec.arb_framebuffer_object.fbo-blit-stretch on SNB.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68365
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Marek Olšák [Mon, 26 Jun 2017 20:23:15 +0000 (22:23 +0200)]
mesa/glthread: don't include pthread.h
Not needed. This fixes the Windows build.
Nanley Chery [Thu, 11 May 2017 16:37:33 +0000 (09:37 -0700)]
anv/gpu_memcpy: Rename the gpu_memcpy function
A GPU memcpy function could alternatively be implemented using MI_*
commands. Provide more detail into how this one operates in case another
memcpy function is created.
v2:
- Update the commit message.
v3:
- Use 'memcpy' instead of 'cpy' (Jason Ekstrand)
- Shorten 'streamout' to 'so'
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery [Mon, 12 Jun 2017 17:12:41 +0000 (10:12 -0700)]
anv/blorp: Provide surface states for CCS resolves
In the future, we plan on using this method to resolve images whose
surface state fast-clear value is dynamically updated during command
buffer execution. Start using it now for testing and to reduce churn
later on.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery [Sat, 11 Mar 2017 00:31:16 +0000 (16:31 -0800)]
anv/blorp: Add a surface-state-based CCS resolve function
This will be used in the next patch.
v2:
- Omit BLORP_BATCH_NO_EMIT_DEPTH_STENCIL (Jason Ekstrand)
- Update commit message.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery [Tue, 7 Mar 2017 01:37:49 +0000 (17:37 -0800)]
blorp/clear: Add a binding-table-based CCS resolve function
v2:
- Do layered resolves.
(Jason Ekstrand):
- Replace "bt" suffix with "attachment".
- Rename helper function to prepare_ccs_resolve.
- Move blorp_params_init() into helper function.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery [Mon, 12 Jun 2017 19:58:32 +0000 (12:58 -0700)]
anv: Adjust params of color buffer transitioning functions
Splitting out these fields will make the color buffer transitioning
function simpler when it gains more features.
v2: Remove unintended blank line (Iago Toral)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery [Mon, 12 Jun 2017 19:58:32 +0000 (12:58 -0700)]
anv/blorp: Remove 3D subresource transition workaround
For 3D image subresources undergoing a layout transition via
PipelineBarrier, we increase the number of fast-cleared layers to match
the intended behaviour of KHR_maintenance1. When such subresources
undergo layout transitions between subpasses, we don't do this to avoid
failing incorrect CTS tests. Instead, unify the behaviour in both
scenarios, and wait for the CTS tests to catch up. See CL 1111 for the
test fix and Vulkan issue #849 for more information.
On SKL+, this causes 3 test failures under:
dEQP-VK.pipeline.render_to_image.3d.*
v2: Add a reference to the Vulkan issue (Iago Toral).
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery [Sat, 11 Mar 2017 01:24:23 +0000 (17:24 -0800)]
anv/cmd_buffer: Adjust the image view reloc function
Make the function take in an image instead of an image view. This
enables us to record relocations for surfaces states created outside of
the anv_CreateImageView path.
v2 (Jason Ekstrand):
- Use image->offset instead of surf_offset in aux_offset calculation.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery [Mon, 12 Jun 2017 18:26:47 +0000 (11:26 -0700)]
anv/cmd_buffer: Adjust layout transition aspect checking
Reflect the fact that an image view or subresource range with the color
aspect cannot have any other aspect.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery [Tue, 4 Apr 2017 16:56:16 +0000 (09:56 -0700)]
anv: Add and use color auxiliary buffer helpers
v2:
- Check for aux levels in layer helper (Jason Ekstrand)
- Don't assert aux is present, return 0 if it isn't.
- Use the helpers.
v3:
- Make the helpers aspect-agnostic (Jason Ekstrand)
- Drop anv_image_has_color_aux()
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery [Mon, 6 Mar 2017 22:27:44 +0000 (14:27 -0800)]
intel/isl: Only create a CCS buffer if the image supports rendering
v2: Omit the commit message.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery [Mon, 24 Apr 2017 17:20:27 +0000 (10:20 -0700)]
intel/isl: Limit CCS to one level and layer on gen7
v2 (Jason Ekstrand):
- Remove Vulkan-specific terminology from the commit title.
- Replace '== 7' with '<= 7' to hint that this is a new feature on BDW+.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery [Thu, 11 May 2017 17:51:25 +0000 (10:51 -0700)]
intel/blorp: Check for layer fast-clear restriction
v2: Update commit title (Jason Ekstrand)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery [Thu, 11 May 2017 17:58:18 +0000 (10:58 -0700)]
intel/blorp: Assert levels and layers are in range
v2 (Jason Ekstrand):
- Update commit title.
- Check aux level and layer as well.
v3 (Jason Ekstrand):
- Move the non-aux layer check.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lucas Stach [Mon, 26 Jun 2017 10:25:08 +0000 (12:25 +0200)]
etnaviv: only flush resource to self if no scanout buffer exists
Currently a resource flush may trigger a self resolve, even if a scanout buffer
exists, but is up to date. If a scanout buffer exists we only ever want to
flush the resource to the scanout buffer. This fixes a performance regression.
Fixes:
dda956340ce9 (etnaviv: resolve tile status when flushing resource)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Christian Gmeiner [Wed, 21 Jun 2017 20:36:48 +0000 (22:36 +0200)]
etnaviv: add support for snorm textures
Based on a patch from Wladimir J. van der Laan and untested due
to lack of hardware. Binary blob emits those formats if GPU supports
HALTI1 (faked with ibvivhook).
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Christian Gmeiner [Wed, 21 Jun 2017 20:36:47 +0000 (22:36 +0200)]
etnaviv: add R8G8 texture support
Passes texwrap GL_ARB_texture_rg piglit (with faked full texture rg support).
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Christian Gmeiner [Fri, 16 Jun 2017 15:02:29 +0000 (17:02 +0200)]
etnaviv: add support for swizzled texture formats
Passes all ext_texture_swizzle piglits.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
Christian Gmeiner [Wed, 21 Jun 2017 20:36:45 +0000 (22:36 +0200)]
etnaviv: add support for extended texture formats
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Chad Versace [Thu, 22 Jun 2017 22:12:29 +0000 (15:12 -0700)]
glapi: Fix -Wduplicate-decl-specifier due to double-const
Fix all lines in src/mesa/main/marshal_generated.c that declare
double-const variables. Below is all such lines, with duplicates
removed:
$ grep 'const const' marshal_generated.c | sort -u
const const GLboolean * pointer = cmd->pointer;
const const GLvoid * indices = cmd->indices;
const const GLvoid * pointer = cmd->pointer;
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Engestrom [Mon, 26 Jun 2017 11:14:49 +0000 (12:14 +0100)]
anv: use Mesa's u_atomic.h header
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Engestrom [Mon, 26 Jun 2017 11:14:37 +0000 (12:14 +0100)]
radv: use Mesa's u_atomic.h header
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bruce Cherniak [Mon, 26 Jun 2017 15:26:26 +0000 (10:26 -0500)]
swr: set an explicit clear_rect if scissor is not enabled.
Fix regression of "no rendering" on simple apps like glxgears by
setting an explicit full surface clear_rect when scissor is not
enabled.
This regressed with commit
00173d91 "st/mesa: don't set 16
scissors and 16 viewports if they're unused" due to an assumption
that a default scissor rect is always set, which was the case prior
to this optimization.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Tim Rowley [Mon, 26 Jun 2017 13:57:38 +0000 (08:57 -0500)]
swr/rast: adjust std::string usage to fix build
Some combinations of c++ compilers and standard libraries had problems
with the string::replace code we were using previously.
This should fix the travis-ci system.
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Eric Engestrom [Thu, 22 Jun 2017 18:14:20 +0000 (19:14 +0100)]
travis: add missing libs: xdamage + xfixes
> configure: error: Package requirements (x11 xext xdamage >= 1.1 xfixes
> x11-xcb xcb xcb-glx >= 1.8.1 xcb-dri2 >= 1.8) were not met:
> No package 'xdamage' found
> No package 'xfixes' found
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Nicolai Hähnle [Fri, 16 Jun 2017 18:50:56 +0000 (20:50 +0200)]
radeonsi: support indirect indexing in INTERP_* opcodes
The hardware doesn't support it, so we just interpolate all array elements
and then use indirect indexing on the resulting vector.
Clearly, this is not very efficient. There is an argument to be had for
adding if/else, or perhaps even pulling the data out of LDS directly.
Both don't really seem worth the effort, considering that it seems nobody
actually uses this feature.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Ben Crocker [Thu, 22 Jun 2017 19:14:51 +0000 (15:14 -0400)]
egl_dri2: swrastGetDrawableInfo: set *x, *y [v2]
In swrastGetDrawableInfo, set *x and *y, not just *w and *h;
this fixes a crash later in drisw_update_tex_buffer when the
(formerly) uninitialized x and y values are used to construct
an address in a call to llvmpipe_transfer_map.
Fixes crash in Piglit test
"spec@egl 1.4@eglcreatepbuffersurface and then glclear"
(<piglit dir>/bin/egl-create-pbuffer-surface -auto)
that occurred intermittently, e.g. when the uninitialized x and y in
drisw_update_tex_buffer just happened to contain absurd non-zero values.
v2: Initialize in case if function succeeds or fails, just like *w/*h.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>