Tobias Klausmann [Tue, 8 Jul 2014 02:19:13 +0000 (04:19 +0200)]
nv50/ir: use unordered_set instead of list to keep track of var uses
The set of variable uses does not need to be ordered in any way, and
removing/adding elements is a fairly common operation in various
optimization passes.
This shortens runtime of piglit test fp-long-alu to ~22s from ~4h
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Kenneth Graunke [Sun, 6 Jul 2014 05:21:40 +0000 (22:21 -0700)]
i965/disasm: Fix disassembly of the any16h/all16h predicates.
BRW_PREDICATE_ALIGN1_ANY16H was incorrectly being disassembled as
"all16h", and ALL16H would probably print as "(null)".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Sun, 6 Jul 2014 05:04:45 +0000 (22:04 -0700)]
glsl: Fix the foreach_in_list_reverse macro.
We clearly don't want to start at the head and walk backwards; we want
to start at the last real element before the tail sentinel. If the list
is empty, tail_pred will be the head sentinel, and we'll stop.
Nothing uses this function, so I guess nobody noticed it was broken.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Marek Olšák [Tue, 8 Jul 2014 00:02:40 +0000 (02:02 +0200)]
radeonsi: mark MSAA config state as dirty at the beginning of CS
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81020
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Thu, 19 Jun 2014 21:34:27 +0000 (23:34 +0200)]
gallium: fix u_default_transfer_inline_write for textures
This doesn't fix any known issue. In fact, radeon drivers ignore all
the discard flags for textures and implicitly do "discard range"
for any write transfer.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Matt Turner [Mon, 7 Jul 2014 04:31:28 +0000 (21:31 -0700)]
i965: Remove artificial dependency between math instructions.
... on Gen6+. I'm not actually sure which class Gen6 fits into.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Wed, 9 Oct 2013 05:54:46 +0000 (22:54 -0700)]
i965/fs: Track dependencies in instruction scheduling per reg offset.
Previously instruction scheduling tracked dependencies on a per-register
basis. This meant that there was an artificial dependency between
interpolation instructions writing into the same virtual register.
Instruction scheduling would insert a number of instructions between the
two instructions in this example, when they are actually independent.
linterp vgrf8+0.0:F, hw_reg2:F, hw_reg3:F, hw_reg6:F
linterp vgrf8+1.0:F, hw_reg2:F, hw_reg3:F, hw_reg6+16:F
This lead to cases where the first texture coordinate is interpolated at
the beginning of the shader, but the second is done immediately before
the texture operation that uses it as a source.
After this change, the artificial dependency is removed and the
interpolation instructions are scheduled together.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jon TURNEY [Mon, 2 Jun 2014 16:40:38 +0000 (17:40 +0100)]
configure: Don't special case Cygwin to use gnu99, define _XOPEN_SOURCE instead
Revert "build: Build on Cygwin with gnu99 instead of c99." and define
_XOPEN_SOURCE appropriately.
This reverts commit
53e36d333c9b619c1a5fe9a8d2d08665654b0234.
Since Cygwin 1.7.18 (April 2013), it's headers correctly prototype strtoll()
when using -std=c99, and correctly prototype strdup() when _XOPEN_SOURCE is
defined appropriately, so this workaround is no longer needed.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Cc: Vinson Lee <vlee@freedesktop.org>
Chia-I Wu [Tue, 8 Jul 2014 06:03:17 +0000 (14:03 +0800)]
ilo: fix fence reference counting
The old code was complicated, and was wrong when *ptr is NULL.
Kristian Høgsberg [Tue, 8 Jul 2014 06:32:35 +0000 (23:32 -0700)]
i965: Extend compute-to-mrf pass to understand blocks of MOVs
The current compute-to-mrf pass doesn't handle blocks of MOVs. Shaders
that end with a texture fetch follwed by an fb write are left like this:
0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000010: send(8) g2<1>UW g6<8,8,1>F
sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
0x00000020: mov(8) g113<1>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000028: mov(8) g114<1>F g3<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000030: mov(8) g115<1>F g4<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000038: mov(8) g116<1>F g5<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000040: sendc(8) null g113<8,8,1>F
render ( RT write, 0, 4, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT };
This patch lets compute-to-mrf recognize blocks of MOVs and match them to
instructions (typically SEND) that writes multiple registers. With this,
the above shader becomes:
0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000010: send(8) g113<1>UW g6<8,8,1>F
sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
0x00000020: sendc(8) null g113<8,8,1>F
render ( RT write, 0, 20, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT };
which is the bulk of the shader db results:
total instructions in shared programs: 987040 -> 986720 (-0.03%)
instructions in affected programs: 844 -> 524 (-37.91%)
GAINED: 0
LOST: 0
The optimization also applies to MRT shaders that write the same
color value to multiple RTs, in which case we can eliminate four MOVs in
a similar fashion. See fbo-drawbuffers2-blend in piglit for an example.
No measurable performance impact. No piglit regressions.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Ilia Mirkin [Sat, 5 Jul 2014 05:24:38 +0000 (01:24 -0400)]
nvc0/ir: fill offset in properly for TXD
Apparently TXD wants its offset differently than TEX, accepting it in
the upper bits of the layer index. Unclear what happens when this is
combined with indirect sampler indexing.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 5 Jul 2014 04:52:15 +0000 (00:52 -0400)]
nvc0/ir: use manual TXD when offsets are involved
Something about how we're implementing offsets for TXD is wrong, just
flip to the generic quadop-based implementation in that case.
This is the minimal fix appropriate for backporting.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Sat, 5 Jul 2014 04:30:45 +0000 (00:30 -0400)]
nvc0/ir: do quadops on the right texture coordinates for TXD
handleTEX moves the layer as the first argument. This makes sure that
the quadops deal with the texture coordinates.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Fri, 4 Jul 2014 22:38:28 +0000 (18:38 -0400)]
nv50/ir: ignore bias for samplerCubeShadow on nv50
Unfortunately there's no good way to do this on the nv50 shader isa.
Dropping the bias seems preferable to doing the compare post-filtering.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Fri, 4 Jul 2014 21:09:43 +0000 (17:09 -0400)]
nv50/ir: retrieve shadow compare from first arg
This can only happen with texture(samplerCubeShadow, bias), where the
compare will be in the first argument.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
Carl Worth [Mon, 7 Jul 2014 23:28:37 +0000 (16:28 -0700)]
docs: Import 10.2.3 release notes
And add a news item.
Matt Turner [Sun, 29 Jun 2014 02:18:44 +0000 (19:18 -0700)]
i965/fs: Disable unlit_centroid_workaround on Haswell.
Although the HSW PRM shows it, the BSpec lists this workaround as being
for Ivybridge only.
total instructions in shared programs: 1994951 -> 1993675 (-0.06%)
instructions in affected programs: 27325 -> 26049 (-4.67%)
Matt Turner [Wed, 11 Jun 2014 20:49:34 +0000 (13:49 -0700)]
i965/vec4: Perform CSE on CMP(N) instructions.
Port of commit
b16b3c87 to the vec4 code.
No shader-db improvements, but might as well. The fs backend saw an
improvement because it's scalar and multiple identical CMP instructions
were generated by the SEL peepholes.
Matt Turner [Wed, 11 Jun 2014 22:24:52 +0000 (15:24 -0700)]
i965/vec4: Don't emit null MOVs in CSE.
Port of commit
219b43c6 to the vec4 code.
Matt Turner [Wed, 11 Jun 2014 20:43:15 +0000 (13:43 -0700)]
i965/vec4: Improve CSE performance by expiring some available expressions.
Port of commit
5daf867f to the vec4 code.
Kenneth Graunke [Wed, 6 Mar 2013 18:48:55 +0000 (10:48 -0800)]
i965/vec4: Add basic common subexpression elimination.
[mattst88]: Modified to perform CSE on instructions with
the same writemask. Offered no improvement before.
total instructions in shared programs: 1995633 -> 1995185 (-0.02%)
instructions in affected programs: 14410 -> 13962 (-3.11%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Sun, 6 Jul 2014 06:24:15 +0000 (23:24 -0700)]
i965: Fix warnings introduced in commit
e24ef5ab.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Christian König [Sun, 6 Jul 2014 10:29:00 +0000 (12:29 +0200)]
gallium/radeon: use PRIX64 instead of PRIu64
We want hex values here, not decimals.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Matt Turner [Mon, 30 Jun 2014 01:11:29 +0000 (18:11 -0700)]
i965: Move assembly annotation functions to intel_asm_annotation.c.
It's C. Compile it as such.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Matt Turner [Mon, 30 Jun 2014 01:09:35 +0000 (18:09 -0700)]
i965: Rename intel_asm_printer -> intel_asm_annotation.
The #ifndef include guards already said the right thing :)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Matt Turner [Mon, 30 Jun 2014 01:21:30 +0000 (18:21 -0700)]
i965: Make backend_instruction usable from C.
With a hack to place an exec_node in the struct in C to be at the same
location as the inherited exec_node in C++.
Acked-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Matt Turner [Mon, 30 Jun 2014 01:18:53 +0000 (18:18 -0700)]
i965/cfg: Make cfg_t usable from C.
Acked-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Matt Turner [Mon, 30 Jun 2014 01:02:49 +0000 (18:02 -0700)]
i965: Repack backend_instruction struct.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Matt Turner [Mon, 30 Jun 2014 00:58:59 +0000 (17:58 -0700)]
i965: Make a brw_predicate enum.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Matt Turner [Mon, 30 Jun 2014 00:50:20 +0000 (17:50 -0700)]
i965: Make a brw_conditional_mod enum.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Matt Turner [Mon, 30 Jun 2014 00:32:14 +0000 (17:32 -0700)]
i965: Move common fields into backend_instruction.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Matt Turner [Sun, 29 Jun 2014 23:02:59 +0000 (16:02 -0700)]
i965: Use enum brw_reg_type for register types.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Matt Turner [Sun, 29 Jun 2014 22:35:58 +0000 (15:35 -0700)]
i965: Move is_zero/one/null/accumulator into backend_reg.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Matt Turner [Sun, 29 Jun 2014 22:27:07 +0000 (15:27 -0700)]
i965: Make a common backend_reg class.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Matt Turner [Sun, 29 Jun 2014 22:14:35 +0000 (15:14 -0700)]
i965: Drop imm union from visitor register classes.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Matt Turner [Sun, 29 Jun 2014 22:13:24 +0000 (15:13 -0700)]
i965: Use immediate storage in brw_reg for visitor regs.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Andreas Boll [Sat, 5 Jul 2014 09:32:54 +0000 (11:32 +0200)]
docs: add news item for mesa-demos 8.2.0 release
Chris Forbes [Fri, 27 Jun 2014 09:09:43 +0000 (21:09 +1200)]
glsl: Fix merging of layout(invocations) with other qualifiers
If another layout qualifier appeared to the left of `invocations` in the
GS input layout declaration, the invocation count would be dropped on
the floor.
Fixes the piglit tests:
spec/ARB_transform_feedback3/arb_transform_feedback3-ext_interleaved_two_bufs_gs_max
spec/ARB_gpu_shader5/arb_gpu_shader5-invocation-id
spec/ARB_gpu_shader5/compiler/correct-multiple-layout-qualifier-invocations.geom
spec/ARB_gpu_shader5/execution/invocations-conflicting
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Ilia Mirkin [Tue, 1 Jul 2014 04:49:34 +0000 (00:49 -0400)]
nvc0: add a memory barrier when there are persistent UBOs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Tue, 1 Jul 2014 03:49:46 +0000 (23:49 -0400)]
nv50: do an explicit flush on draw when there are persistent buffers
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Tue, 1 Jul 2014 02:43:39 +0000 (22:43 -0400)]
nv50: disable dedicated ubo upload method
The hardware allows multiple simultaneous renders with the same
memory-backed constbufs but with each invocation having different
values. However in order for that to work, the data has to be streamed
in via the right constbuf slot. We weren't doing that for UBOs.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.1" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Thu, 3 Jul 2014 15:15:18 +0000 (11:15 -0400)]
gallium: rename PIPE_CAP_TGSI_VS_LAYER to also have _VIEWPORT
Now that this cap is used to determine the availability of both, adjust
its name to reflect the new reality.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Ilia Mirkin [Wed, 2 Jul 2014 16:17:59 +0000 (12:17 -0400)]
mesa/st: enable AMD_vertex_shader_viewport_index
The assumption is that any driver capable of emitting layer from the
vertex shader and supporting viewports should be able to also handle
emitting viewport index from the vertex shader.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tobias Droste <tdroste@gmx.de>
Ilia Mirkin [Wed, 2 Jul 2014 16:16:01 +0000 (12:16 -0400)]
r600g: allow vs to write to gl_ViewportIndex
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tobias Droste <tdroste@gmx.de>
Thomas Hellstrom [Fri, 13 Jun 2014 07:46:54 +0000 (09:46 +0200)]
svga: Don't unnecessarily reemit BindGBShader commands v2
The Linux winsys can no longer relocate shader code, so avoid
reemitting BindGBShader commands. They are costly.
v2: Correctly handle errors from SVGA3D_BindGBShader()
Reported-by: Michael Banack <banackm@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Aaron Watry [Wed, 2 Jul 2014 21:27:31 +0000 (16:27 -0500)]
radeon/llvm: Allocate space for kernel metadata operands
Previously, we were assuming that kernel metadata nodes only had 1 operand.
Kernels which have attributes can have more than 1, e.g.:
!0 = metadata !{void (i32 addrspace(1)*)* @testKernel, metadata !1}
!1 = metadata !{metadata !"work_group_size_hint", i32 4, i32 1, i32 1}
Attempting to get the kernel without the correct number of attributes led
to memory corruption and luxrays crashing out.
Fixes the cl/program/execute/attributes.cl piglit test.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76223
CC: "10.2" <mesa-stable@lists.freedesktop.org>
Samuel Iglesias Gonsalvez [Wed, 2 Jul 2014 07:38:43 +0000 (09:38 +0200)]
glsl: fix duplicated layout qualifier detection for GS
This patch fixes the duplicated layout qualifier detection
for geometry shader's layout qualifiers.
Also it makes the detection code more legible by defining
allowed_duplicates_mask variable.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80778
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Brian Paul [Thu, 3 Jul 2014 14:25:48 +0000 (08:25 -0600)]
svga: add switch cases for PIPE_SHADER_CAP_DOUBLES
Signed-off-by: Brian Paul <brianp@vmware.com>
Thomas Hellstrom [Thu, 3 Jul 2014 09:07:36 +0000 (02:07 -0700)]
st/xa: Don't close the drm fd on failure v2
If XA fails to initialize with pipe_loader enabled, the pipe_loader's
cleanup function will close the drm file descriptor. That's pretty bad
because the file descriptor will probably be the X server driver's only
connection to drm. Temporarily solve this by dup()'ing the file descriptor
before handing it over to the pipe loader.
This fixes freedesktop.org bugzilla bug #80645.
v2: Fix CC addresses.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Michel Dänzer [Thu, 3 Jul 2014 01:58:57 +0000 (10:58 +0900)]
Revert "radeonsi: Use dma_copy when possible for si_blit."
This reverts commit
5d5c20920e0e570742a497aa047e99a2fa3c04f2.
Caused visual corruption, see e.g.
https://bugs.freedesktop.org/show_bug.cgi?id=80827#c1
Ilia Mirkin [Wed, 2 Jul 2014 19:49:39 +0000 (15:49 -0400)]
i965: expose AMD_vertex_shader_viewport_index on gen7+
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Ilia Mirkin [Wed, 2 Jul 2014 16:12:51 +0000 (12:12 -0400)]
glsl: add support for AMD_vertex_shader_viewport_index
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Tested-by: Tobias Droste <tdroste@gmx.de>
Ilia Mirkin [Wed, 2 Jul 2014 16:12:28 +0000 (12:12 -0400)]
mesa: add support for AMD_vertex_shader_viewport_index
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Tested-by: Tobias Droste <tdroste@gmx.de>
Ilia Mirkin [Mon, 23 Jun 2014 13:32:59 +0000 (09:32 -0400)]
mesa/st: enable ARB_fragment_layer_viewport
If multiple viewports are supported, that implies the presence of a GS
and layered rendering, so we can enable ARB_fragment_layer_viewport as
well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Eric Anholt [Wed, 21 May 2014 21:31:31 +0000 (14:31 -0700)]
i965/gen6: Add a spec citation about push constant packet requirements.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Wed, 21 May 2014 21:09:25 +0000 (14:09 -0700)]
i965: Add a comment about null renderbuffer surfaces and why they exist.
I noticed this when trying to find comments about pull constant buffers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Mon, 19 May 2014 15:51:12 +0000 (08:51 -0700)]
i965: Update a ton of comments about constant buffers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Wed, 21 May 2014 21:22:47 +0000 (14:22 -0700)]
i965: Merge VS/GS and WM pull constant buffer upload paths.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Mon, 19 May 2014 17:06:15 +0000 (10:06 -0700)]
i965/gen6+: Merge VS/GS and WM push constant buffer upload paths.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Mon, 19 May 2014 17:10:01 +0000 (10:10 -0700)]
i965: Move dispatch_grf_start_reg and first_curbe_grf into stage_prog_data.
I wanted to access this value from stage-generic code, so stop storing it
under two different names.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Mon, 19 May 2014 16:37:00 +0000 (09:37 -0700)]
i965: Fix state flags for gen4/5 CURBE.
If we had some NOS affecting VS compilation that resulted in optimization
changing the set of constants to be uploaded, we might not have reuploaded
the constants.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Mon, 19 May 2014 04:47:29 +0000 (21:47 -0700)]
i965: Remove a dead define.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Mon, 19 May 2014 04:24:47 +0000 (21:24 -0700)]
i965: Reuse libdrm's header for AUB definitions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Mon, 19 May 2014 03:48:16 +0000 (20:48 -0700)]
i965: Fix stale comments about the state cache.
This changed in the state streaming work years ago.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Mon, 19 May 2014 03:44:02 +0000 (20:44 -0700)]
i965: Fix stale binding table comment.
I recently moved the code from the mentioned location right into this
file.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Tue, 20 May 2014 18:54:26 +0000 (11:54 -0700)]
i965: Drop the memcmp for finding duplicated CURBE uploads.
At this point, the extra copy of the data and memcmp are as expensive as
just re-uploading.
Note: now that we'll always upload, and brw_constant_buffer watches
BRW_NEW_BATCH anyway, we don't need to explicitly unref the old curbe_bo
at batch reset time.
No significant performance difference on glamor copywinwin10 (n=55),
despite that test having a 98% hit rate on the cache.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Tue, 20 May 2014 18:09:57 +0000 (11:09 -0700)]
i965: Reuse intel_upload.c for gen4/5 constant buffers.
No performance difference on glamor with copywinwin10 (n=40) on my gm45.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tom Stellard [Tue, 17 Jun 2014 15:52:34 +0000 (08:52 -0700)]
gallium: Add PIPE_SHADER_CAP_DOUBLES
This is for reporting whether or not double precision floating-point
operations are supported.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Matt Arsenault [Tue, 10 Jun 2014 05:21:52 +0000 (22:21 -0700)]
clover: Fix not setting build log if the build succeeds v2
If there were only warnings, they would not be added to the log.
v2:
- Use compat::string.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Francisco Jerez [Sat, 21 Jun 2014 16:06:07 +0000 (18:06 +0200)]
clover: Have compat::string allocate its own memory.
Tom Stellard [Wed, 18 Jun 2014 20:58:33 +0000 (16:58 -0400)]
gallium/radeon: Only print a message for LLVM diagnostic errors
We were printing messages for all diagnostic types, which was
spamming the console for some OpenCL programs.
Tom Stellard [Tue, 24 Jun 2014 23:35:08 +0000 (19:35 -0400)]
radeon/llvm: Use the llvm.rsq.clamped intrinsic for RSQ
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Laurent Carlier <lordheavym@gmail.com>
https://bugs.freedesktop.org/show_bug.cgi?id=80015
CC: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Tue, 24 Jun 2014 23:23:20 +0000 (19:23 -0400)]
r600g: allow viewport index/layer to be sent to ps
In order to support ARB_fragment_layer_viewport, we need to explicitly
send these along to the pixel shader, since it has no other way to
retrieve them.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Emil Velikov [Wed, 2 Jul 2014 11:06:41 +0000 (12:06 +0100)]
targets/dri: allow duplicated symbols
With the inclusion of xmlconfig in the loader we're providing dri* symbols
which are already available in libdricommon.la. This leads to a build
break due to the multiple definitions.
Temporary allow multiple definitions, until we come with a better solution.
Reported-by: Laurent Carlier <lordheavym@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Emil Velikov [Mon, 16 Jun 2014 14:17:40 +0000 (15:17 +0100)]
st/dri: Remove the old libdridrm library
With all the hw drivers converted, we can go back to having
a single libdridrm provider.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Emil Velikov [Mon, 16 Jun 2014 12:50:01 +0000 (13:50 +0100)]
targets/dri-vmwgfx: Convert to static/shared pipe-drivers
Convert the final hardware driver to a single dri provider which
includes all the pipe-drivers.
Update the scons build and drop the unused vmw_powf.c.
Cc: José Fonseca <jfonseca@vmware.com>
Cc: Brian Paul <brianp@vmware.com>
Cc: Jakob Bornecrantz <jakob@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Emil Velikov [Mon, 16 Jun 2014 12:31:10 +0000 (13:31 +0100)]
targets/dri-ilo: Convert to static/shared pipe-driver
Cc: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Emil Velikov [Sat, 21 Jun 2014 11:45:54 +0000 (12:45 +0100)]
targets/dri-i915: Convert to static/shared pipe-drivers
v2:
- Drop inclusion of the winsys wrapper and softpipe/llvmpipe.
- Remove old Makefile.am, target.c.
- Correctly append i915 to the megadrivers list.
Cc: Stephane Marchesin <stephane.marchesin@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Emil Velikov [Mon, 16 Jun 2014 01:54:56 +0000 (02:54 +0100)]
targets/dri-freedreno: Convert to static/shared pipe-drivers
Now we don't need a second dri module when using kgsl :)
Cc: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Emil Velikov [Mon, 16 Jun 2014 13:25:50 +0000 (14:25 +0100)]
targets/(r300|r600|radeonsi)/dri: Convert to static/shared pipe-drivers
Related to previous commit, merge the separate dri targets to a single
one.
This is essentially all the buildsystem mayhem required for megaradeon.
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Emil Velikov [Mon, 16 Jun 2014 13:23:50 +0000 (14:23 +0100)]
targets/dri-nouveau: Convert to static/shared pipe-drivers
Similiar to other targets, we'd like to convert all the separate
targets into a single one, thus we'll minimize the duplication and
overall size of mesa. The conversion per API basis, with the drivers
available either statically or shared. Currently the former is the
default.
v2: Correctly append the version script to the linker flags.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Emil Velikov [Mon, 16 Jun 2014 01:05:05 +0000 (02:05 +0100)]
st/dri/drm: Add a second libdridrm library
Will be used to create the single dri target library, on our
way to convert all the dri targets during the conversion to
to static/shared pipe-drivers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Emil Velikov [Sun, 15 Jun 2014 23:20:07 +0000 (00:20 +0100)]
st/dri: Allow separate dri-targets
With this commit we add a couple of DEFINES making the ST code
conditional, in a way that we can use it to gradually convert
the dri-targets from separate libraries into a single one.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Emil Velikov [Mon, 16 Jun 2014 13:21:13 +0000 (14:21 +0100)]
targets/dri-swrast: use drm aware dricommon when building more than swrast
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Ilia Mirkin [Tue, 1 Jul 2014 14:27:38 +0000 (10:27 -0400)]
docs: update hw-dependent bits of ARB_gpu_shader5
Some of the features are completely implemented by core, while others
have hardware dependencies. Create a list of drivers supporting each
sub-feature that must have hw support.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Ilia Mirkin [Wed, 2 Jul 2014 00:05:09 +0000 (20:05 -0400)]
nvc0: add missed PIPE_CAP_DRAW_INDIRECT
Real support will be forthcoming. For now, avoid the unknown cap error
and compiler warning.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Roland Scheidegger [Tue, 1 Jul 2014 13:37:56 +0000 (15:37 +0200)]
llvmpipe: get rid of llvmpipe_get_texture_tile_linear
Because the layout is always linear this didn't really do much any longer -
at some point this triggered per-tile swizzled->linear conversion. The x/y
coords were ignored too.
Apart from triggering conversion, this also invoked alloc_image_data(), which
could only actually trigger mapping of display target resources. So, instead
just call resource_map in the callers (which also gives the ability to unmap
again). Note that mapping/unmapping of display target resources still isn't
really all that clean (map/unmap may be unmatched, and all such mappings use
the same pointer thus usage flags are a lie).
Reviewed-by: Brian Paul <brianp@vmware.com>
Roland Scheidegger [Tue, 1 Jul 2014 01:38:41 +0000 (03:38 +0200)]
llvmpipe: get rid of llvmpipe_get_texture_image
The only caller left used it only for non display target textures,
hence it was really the same as llvmpipe_get_texture_image_address - it
also had a usage flag but this was ignored anyway.
Reviewed-by: Brian Paul <brianp@vmware.com>
Roland Scheidegger [Tue, 1 Jul 2014 01:09:44 +0000 (03:09 +0200)]
llvmpipe: get rid of llvmpipe_get_texture_image_all
Once used for invoking swizzled->linear conversion for all needed images.
But we now have a single allocation for all images in a resource, thus looping
through all slices is rather pointless, conversion doesn't happen neither.
Also simplify the sampling setup code to use the mip_offsets array in the
resource directly - if the (non display target) resource exists its memory
will already be allocated as well.
Reviewed-by: Brian Paul <brianp@vmware.com>
Roland Scheidegger [Tue, 1 Jul 2014 15:06:48 +0000 (17:06 +0200)]
llvmpipe: allocate regular texture memory upfront
The deferred allocation doesn't really make much sense anymore, since we no
longer allocate swizzled/linear memory in chunks and not per level / slice
neither.
This means we could fail resource creation a bit more (could already fail in
theory anyway) but should not fail maps later (right now, callers can't deal
with neither really).
Reviewed-by: Brian Paul <brianp@vmware.com>
Roland Scheidegger [Tue, 1 Jul 2014 00:18:56 +0000 (02:18 +0200)]
llvmpipe: get rid of linear_img struct
Just use a tex_data pointer directly - the description was no longer correct
neither.
Reviewed-by: Brian Paul <brianp@vmware.com>
Roland Scheidegger [Mon, 30 Jun 2014 23:54:30 +0000 (01:54 +0200)]
llvmpipe: (trivial) rename linear_mip_offsets to mip_offsets
Since switching to non-swizzled rendering we only have "normal", aka linear,
offsets.
Reviewed-by: Brian Paul <brianp@vmware.com>
Roland Scheidegger [Tue, 1 Jul 2014 19:16:00 +0000 (21:16 +0200)]
target-helpers: don't use designated initializers
it looks since
ce1a1372280d737a1b85279995529206586ae480 they are now included
in more places, in particular even for things buildable with msvc, and hence
those break the build.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Christoph Bumiller [Fri, 5 Apr 2013 12:29:37 +0000 (14:29 +0200)]
st/mesa: add support for indirect drawing
Marek Olšák [Thu, 24 Apr 2014 23:27:34 +0000 (01:27 +0200)]
gallium/u_vbuf: get draw info from an indirect buffer if there's any
This is required for fallbacks to work with ARB_draw_indirect.
Christoph Bumiller [Fri, 5 Apr 2013 12:29:36 +0000 (14:29 +0200)]
gallium: add facilities for indirect drawing
v2:
Added comments to util_draw_indirect, clarified and fixed map size.
Removed unlikely().
Christoph Bumiller [Fri, 5 Apr 2013 12:29:35 +0000 (14:29 +0200)]
gallium: add PIPE_BIND_COMMAND_ARGS_BUFFER
Intended for use with GL_ARB_draw_indirect's DRAW_INDIRECT_BUFFER
target or for D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS.
Dave Airlie [Tue, 1 Jul 2014 22:24:05 +0000 (08:24 +1000)]
xmlconfig/dri: bool -> unsigned char
Drop stdbool, due to the X server being a pain and having
struct members called bool, although I've sent a patch to fix
that we should retain stupidity here. Use unsigned char
which is what GLboolean is anyways.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cody Northrop [Tue, 1 Jul 2014 18:43:54 +0000 (12:43 -0600)]
i965/fs: Update discard jump to preserve uniform loads via sampler.
Commit
17c7ead7 exposed a bug in how uniform loading happens in the
presence of discard. It manifested itself in an application as
randomly incorrect pixels on the borders of conditional areas.
This is due to how discards jump to the end of the shader incorrectly
for some channels. The current implementation checks each 2x2
subspan to preserve derivatives. When uniform loading via samplers
was turned on, it uses a full execution mask, as stated in
lower_uniform_pull_constant_loads(), and only populates four channels
of the destination (see generate_uniform_pull_constant_load_gen7()).
It happens incorrectly when the first subspan has been jumped over.
The series that implemented this optimization was done before the
changes to use samplers for uniform loads. Uniform sampler loads
use special execution masks and only populate four channels, so we
can't jump over those or corruption ensues.
This fix only jumps to the end of the shader if all relevant channels
are disabled, i.e. all 8 or 16, depending on dispatch. This
preserves the original GLbenchmark 2.7 speedup noted in commit
beafced2.
It changes the shader assembly accordingly:
before : (-f0.1.any4h) halt(8) 17 2 null { align1 WE_all 1Q };
after(8) : (-f0.1.any8h) halt(8) 17 2 null { align1 WE_all 1Q };
after(16): (-f0.1.any16h) halt(16) 17 2 null { align1 WE_all 1H };
v2: Cleaned up comments and conditional ordering.
v3: Fix typo.
Signed-off-by: Cody Northrop <cody@lunarg.com>
Reviewed-by: Mike Stroyan <mike@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79948
Matt Turner [Mon, 30 Jun 2014 01:24:18 +0000 (18:24 -0700)]
i965/fs: Mark case unreachable to silence warning.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>