Vinson Lee [Fri, 2 Aug 2013 06:04:27 +0000 (23:04 -0700)]
i915,i965: Fix memory leak in try_pbo_upload (v2)
Fixes "Resource leak" defect reported by Coverity.
Tested on Haswell, no Piglit regressions.
v2: Apply to i965, not just i915. (chadv)
CC: "9.2, 9.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Roland Scheidegger [Thu, 15 Aug 2013 17:26:39 +0000 (19:26 +0200)]
gallivm: revert accidentally committed hunk
That magic wasn't meant to be committed, need to work on some proper fix.
Roland Scheidegger [Thu, 15 Aug 2013 16:40:32 +0000 (18:40 +0200)]
gallivm: do per-sample depth comparison instead of doing it post-filter
Doing the comparisons pre-filter is highly recommended by OpenGL (and d3d9)
and definitely required by d3d10.
This actually doesn't do it pre-filter but more "in-filter", since otherwise
we'd need to push the comparisons even further down into the fetch code, and
this also trivially allows using a somewhat cheaper lerp.
Doing it pre-filter would actually have some performance advantage for UNORM
formats (because the comparisons should be done in texture format, we'd only
need to convert the shadow ref coord to texture format once, but in turn would
save converting the per-sample texture values to floats) but this gets a bit
messy as this has implications for border color handling as well (which needs
to be done prior to depth comparisons, hence would also need to convert border
color to texture format too or use some other tricks like doing separate border
color / shadow ref comparison and simply using that result directly when doing
border replacement).
Should make no difference for nearest filtering, and performance for linear
filtering should be mostly the same too (essentially have one more comparison
instruction per sample, and replace the sub/mul/add lerp with a sub/and/and/add
special "lerp" which all in all shouldn't be much of a difference).
v2: get rid of old code completely
Reviewed-by: Zack Rusin <zackr@vmware.com>
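A minimal sketch of the difference described in the commit above, reduced to a 1D linear filter between two depth texels. The function names and the LESS-OR-EQUAL compare are assumptions made for illustration, not gallivm code, and the plain lerp here stands in for the cheaper special "lerp" the commit mentions.

```c
/* Hypothetical helper: 1.0 if the shadow reference passes a
 * LESS-OR-EQUAL test against one depth texel, else 0.0. */
static float compare_lequal(float ref, float texel)
{
    return ref <= texel ? 1.0f : 0.0f;
}

/* Post-filter (old behaviour): blend the depth values, compare once. */
float shadow_post_filter(float ref, float d0, float d1, float w)
{
    float filtered = d0 + w * (d1 - d0);
    return compare_lequal(ref, filtered);
}

/* Per-sample: compare every texel, then blend the 0/1 results. */
float shadow_per_sample(float ref, float d0, float d1, float w)
{
    float c0 = compare_lequal(ref, d0);
    float c1 = compare_lequal(ref, d1);
    return c0 + w * (c1 - c0);
}
```

With ref = 0.5, texels 0.2 and 1.0 and weight 0.25, the post-filter path yields a hard 0 while the per-sample path yields a fractional 0.25, which is the soft edge the per-sample ordering is meant to give.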
Michel Dänzer [Fri, 9 Aug 2013 16:36:31 +0000 (18:36 +0200)]
radeonsi: Pixel shaders pre-load one more SGPR
Acked-by: Marek Olšák <maraeo@gmail.com>
Michel Dänzer [Wed, 7 Aug 2013 09:30:50 +0000 (11:30 +0200)]
radeonsi: TGSI_SEMANTIC_CLIPVERTEX doesn't use any parameters
Michel Dänzer [Thu, 8 Aug 2013 14:58:00 +0000 (16:58 +0200)]
radeonsi: Don't export unused clip distance vectors from vertex shader
E.g. the Source engine seems to always write to gl_ClipVertex, but normally
doesn't enable any GL_CLIP_DISTANCEn states. This change removes some
irrelevant parts from the generated vertex shader code in such cases.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Michel Dänzer [Wed, 7 Aug 2013 16:14:16 +0000 (18:14 +0200)]
radeonsi: Don't leave gaps between position exports from vertex shader
If the vertex shader exports clip distances but not point size, use
position exports 1/2 instead of 2/3 for the clip distances. Fixes
geometry corruption in that case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66974
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
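The slot assignment described in the commit above can be sketched as follows. This is an illustration only: first_clip_dist_export() is a hypothetical name, and the sketch assumes export 0 always carries the position and export 1 is used for point size only when the shader writes it.

```c
#include <stdbool.h>

/* Returns the index of the first position export used for clip
 * distances; the second clip distance vector uses the next index. */
unsigned first_clip_dist_export(bool writes_point_size)
{
    unsigned next = 1;      /* export 0 holds the position */

    if (writes_point_size)
        next++;             /* point size occupies export 1 */

    return next;            /* no gap is left when point size is absent */
}
```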
Roland Scheidegger [Thu, 15 Aug 2013 14:50:27 +0000 (16:50 +0200)]
llvmpipe: fix stencil bug if we have both stencil and depth tests
This is a very well hidden bug found by accident (only the fixed glean
tstencil2 test so far seems to hit it).
We must use new mask with combined s_pass values and orig_mask values
for zpass/zfail stencil ops, otherwise both the sfail op and one of
zpass/zfail op are applied (probably not hit in most tests because
some of the ops tend to be KEEP usually).
Note: this is a candidate for the 9.2 branch.
Reviewed-by: Zack Rusin <zackr@vmware.com>
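A sketch of the mask logic the commit above describes, using one bit per pixel. The struct and function names are hypothetical and this is not the llvmpipe code; it only shows why zpass/zfail must be restricted to pixels that are both live and passed the stencil test.

```c
#include <stdint.h>

/* One bit per pixel; a set bit means the op applies to that pixel. */
struct stencil_op_masks {
    uint8_t sfail, zfail, zpass;
};

struct stencil_op_masks
select_stencil_ops(uint8_t orig_mask, uint8_t s_pass, uint8_t z_pass)
{
    /* Combined mask: live pixels that passed the stencil test. */
    uint8_t live_pass = orig_mask & s_pass;
    struct stencil_op_masks m;

    m.sfail = orig_mask & (uint8_t)~s_pass;
    /* Using orig_mask here instead of live_pass would apply a z op
     * on top of sfail for pixels that failed the stencil test. */
    m.zfail = live_pass & (uint8_t)~z_pass;
    m.zpass = live_pass & z_pass;
    return m;
}
```

The three masks are disjoint by construction, so each live pixel gets exactly one stencil op.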
Roland Scheidegger [Wed, 14 Aug 2013 23:05:03 +0000 (01:05 +0200)]
st/mesa: use new float comparison opcodes if native integers are supported
Should get rid of some float-to-int conversions (with negation).
No piglit regressions (with llvmpipe).
v2: fix bogus formatting spotted by Brian.
Reviewed-by: Brian Paul <brianp@vmware.com>
Ilia Mirkin [Sat, 10 Aug 2013 22:02:49 +0000 (18:02 -0400)]
nvc0: move video param and format support functions to nouveau
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 10 Aug 2013 21:51:01 +0000 (17:51 -0400)]
nvc0: move firmware loading functions to nouveau
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 10 Aug 2013 21:10:26 +0000 (17:10 -0400)]
nvc0: move some of the simpler decoder functions into nouveau
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 10 Aug 2013 20:43:06 +0000 (16:43 -0400)]
nvc0: move vp param filling logic into nouveau
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 10 Aug 2013 20:07:17 +0000 (16:07 -0400)]
nvc0: move bsp param-filling logic into nouveau
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 10 Aug 2013 19:42:19 +0000 (15:42 -0400)]
nvc0: move nvc0_decoder into nouveau, rename to nouveau_vp3_decoder
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 10 Aug 2013 19:27:49 +0000 (15:27 -0400)]
nvc0: standardize on using #if for NVC0_DEBUG_FENCE
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 10 Aug 2013 17:27:47 +0000 (13:27 -0400)]
nvc0: refactor video buffer management logic into nouveau_vp3
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Mon, 29 Jul 2013 23:28:45 +0000 (19:28 -0400)]
nv50: allow forcing PMPEG use, for ease of testing
This also allows people who don't want to install the binary blobs
required for VP2 to still get MPEG decoding.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Tue, 2 Jul 2013 21:33:41 +0000 (17:33 -0400)]
nv30: hook up PMPEG support via nouveau_video, enables XvMC to work
Force the format to be the reasonable format that doesn't require an
inverse z-scan.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 10 Aug 2013 08:10:28 +0000 (04:10 -0400)]
nouveau: set buffer format of video buffer
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 10 Aug 2013 07:49:21 +0000 (03:49 -0400)]
nouveau: fix number of surfaces in video buffer, use defines
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Wed, 14 Aug 2013 05:08:38 +0000 (01:08 -0400)]
nv30: U8_USCALED only works for size 4
See https://bugs.freedesktop.org/show_bug.cgi?id=61635 for a sample
program. Changing it to use a vec4 makes it work. Remove the unsupported
formats.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "9.2 and 9.1" <mesa-stable@lists.freedesktop.org>
Chris Forbes [Sun, 7 Jul 2013 10:51:02 +0000 (22:51 +1200)]
i965: allow 8 user clip planes on CTG+
There's no need to use a clip flag for NEGW on these gens, so
no reason we can't just enable 8 planes.
V2: - Bump (and document!) MAX_VERTS in the clip code.
- Fix clip flag masks in the clip unit state and in the shader
prolog
- Move this to the end of the series for less breakage.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Chris Forbes [Sun, 7 Jul 2013 15:46:55 +0000 (03:46 +1200)]
i965: get rid of clip plane compaction
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Chris Forbes [Sun, 4 Aug 2013 16:18:22 +0000 (04:18 +1200)]
i965/clip: Support clip distances for line clipping
This does the same thing as we do for triangle clipping -- select the
appropriate source (either dot(hpos,fixed plane) or a clipdistance
slot).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Chris Forbes [Sun, 4 Aug 2013 06:32:48 +0000 (18:32 +1200)]
i965/clip: remove spurious clipvertex param
Nothing in the clipper uses gl_ClipVertex any more, so we don't care
where it is.
V2: Don't bother fishing out the clipvertex offset either.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Chris Forbes [Sun, 4 Aug 2013 06:31:56 +0000 (18:31 +1200)]
i965/clip: Use clip distances for all user clipping
V2: Adjust explanation of load_clip_distance()
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Chris Forbes [Sat, 3 Aug 2013 17:47:34 +0000 (05:47 +1200)]
i965/clip: push dp4 into load_clip_distance
Soon the dp4 is only going to be used for fixed clip planes.
V2: Remove old inaccurate comment about the behavior of this function;
add a better explanation above.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Chris Forbes [Sat, 3 Aug 2013 17:20:17 +0000 (05:20 +1200)]
i965/clip: Track offset into the vertex for clipdistance
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Chris Forbes [Sat, 3 Aug 2013 15:32:34 +0000 (03:32 +1200)]
i965/Gen4-5: Set clip flags from clip distances
V2: - Use the new VS_OPCODE_UNPACK_FLAGS_SIMD4X2 to correctly split the
flags for the two vertices being processed together.
- Don't apply bogus masking of clip flags. The set of plane enables
aren't included in the shader key, and we wouldn't want the
recompiles anyway.
V3: - Tidy up spurious instructions, name temps properly.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
[V2] Reviewed-by: Paul Berry <stereotype441@gmail.com>
Chris Forbes [Wed, 7 Aug 2013 18:31:33 +0000 (06:31 +1200)]
i965: add new VS_OPCODE_UNPACK_FLAGS_SIMD4X2
Splits the bottom 8 bits of f0.0 for further wrangling
in a SIMD4x2 program. The 4 bits corresponding to the channels in each
program flow are copied to the LSBs of dst.x visible to each flow.
This is useful for working with clipping flags in the VS.
V3: - Fixup immediate types
- Teach scheduler about the hidden dep on flags
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
V2: Reviewed-by: Paul Berry <stereotype441@gmail.com>
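A scalar emulation of what the commit above says the opcode does. The function name is hypothetical, and the mapping of the low nibble to flow 0 and the high nibble to flow 1 is an assumption; the commit only states that each flow sees its own 4 bits in the LSBs of dst.x.

```c
#include <stdint.h>

/* Returns the value a given program flow (0 or 1) would see in dst.x
 * for a flag register value f0. Only the bottom 8 bits of f0 are used. */
uint32_t unpack_flags_for_flow(uint32_t f0, unsigned flow)
{
    return (f0 >> (4u * flow)) & 0xFu;
}
```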
Chris Forbes [Thu, 15 Aug 2013 18:54:30 +0000 (06:54 +1200)]
i965/vs: add vec4_instruction::depends_on_flags
We're about to have an instruction that depends on the flags but isn't
predicated. This lays the groundwork.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Chris Forbes [Sun, 7 Jul 2013 16:48:52 +0000 (04:48 +1200)]
i965/clip: Enable interpolation of clip distances
Previously we had disabled interpolation of the clip distances as a
special case, since they were unused.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Chris Forbes [Sun, 7 Jul 2013 16:21:08 +0000 (04:21 +1200)]
i965/vs: Do legacy clip lowering earlier
We need to produce clip flags for the vertex header on Gen4/5, so
clip plane lowering has to be done before we try to emit the flags/psiz
attribute.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Chris Forbes [Sun, 7 Jul 2013 15:44:58 +0000 (03:44 +1200)]
i965/Gen4-5: ensure VUE slots for clipdistance are valid if user clipping is enabled.
V2: We don't particularly care where they fall in the VUE map, as long
as they are allocated somewhere, and occupy two contiguous slots. Don't
fiddle with the SF layout at all -- there's no need.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Chia-I Wu [Thu, 15 Aug 2013 03:14:05 +0000 (11:14 +0800)]
ilo: fix fragment shaders that use PCB on GEN7+
Missed this commit when preparing PCB changes for upstreaming.
Vinson Lee [Thu, 15 Aug 2013 00:27:53 +0000 (17:27 -0700)]
nouveau: Fix variable name.
Fixes build error introduced with commit
d1ba1055d98c246d1ee9d9c14706bb9fba6a98c7.
CC nouveau_video.lo
nouveau_video.c: In function 'nouveau_screen_get_video_param':
nouveau_video.c:866:33: error: 'screen' undeclared (first use in this function)
nouveau_video.c:866:33: note: each undeclared identifier is reported only once for each function it appear
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Matt Turner [Wed, 7 Aug 2013 20:00:48 +0000 (13:00 -0700)]
glsl: Add i2b() and b2i() to ir_builder.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Matt Turner [Sun, 4 Aug 2013 21:09:35 +0000 (14:09 -0700)]
glsl: Add nequal() to ir_builder.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Matt Turner [Sun, 4 Aug 2013 21:09:09 +0000 (14:09 -0700)]
glsl: Add abs() to ir_builder.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Matt Turner [Sat, 3 Aug 2013 18:33:49 +0000 (11:33 -0700)]
glsl: Add bitcast_i2f() to ir_builder.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Marek Olšák [Tue, 6 Aug 2013 04:33:22 +0000 (06:33 +0200)]
radeonsi: unduplicate code in create_context
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Tue, 6 Aug 2013 04:23:52 +0000 (06:23 +0200)]
radeonsi: initialize the radeon_surface structure
This fixes valgrind warnings.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Tue, 6 Aug 2013 04:31:17 +0000 (06:31 +0200)]
radeonsi: correct sampler function names
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Mon, 5 Aug 2013 12:40:43 +0000 (14:40 +0200)]
radeonsi: rename r600_texture::dirty_db_mask to dirty_level_mask
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Mon, 5 Aug 2013 01:42:11 +0000 (03:42 +0200)]
radeonsi: rename r600_resource_texture to r600_texture
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Tue, 6 Aug 2013 04:35:23 +0000 (06:35 +0200)]
tgsi: add info about MSAA samplers to tgsi_shader_info
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Tue, 6 Aug 2013 04:21:11 +0000 (06:21 +0200)]
tgsi: fix the location of sample index
The sample index is always in W.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Roland Scheidegger [Tue, 13 Aug 2013 16:59:35 +0000 (18:59 +0200)]
r600/radeonsi: implement new float comparison instructions
Also use ordered comparisons for old cmp instructions.
Tested-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Tom Stellard <tom@stellard.net>
Roland Scheidegger [Tue, 13 Aug 2013 16:54:15 +0000 (18:54 +0200)]
nv50: implement new float comparison instructions
untested.
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Roland Scheidegger [Tue, 13 Aug 2013 16:53:49 +0000 (18:53 +0200)]
ilo: implement new float comparison instructions
untested.
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Roland Scheidegger [Wed, 14 Aug 2013 16:35:00 +0000 (18:35 +0200)]
gallivm: already pass coords in the right place in the sampler interface
This makes things a bit nicer, and more importantly it fixes an issue
where a "downgraded" array texture (due to view reduced to 1 layer and
addressed with (non-array) samplec instruction) would use the wrong
coord as shadow reference value. (This could also be fixed by passing
target through the sampler interface much the same way as is done for
size queries, might do this eventually anyway.)
And if we'd ever want to support (shadow) cube map arrays, we'd need
5 coords in any case.
v2: fix bugs (texel fetch using wrong layer coord for 1d, shadow tex
using wrong shadow coord for 2d...). Plus need to project the shadow
coord, and just for fun keep projecting the layer coord too.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Roland Scheidegger [Wed, 14 Aug 2013 22:18:20 +0000 (00:18 +0200)]
gallivm: change coordinate handling throughout functions
Instead of passing s,t,r coordinates pass a coord array - the reason is that
I need to pass more coords (in particular for shadow "coord", future will also
need another one for cube map arrays) so just pass them as an array.
Also, to simplify things, use a fixed location for the shadow reference value;
I want to get rid of the silly "where is the right coord value" game.
Keep old-style however for aos sampling (which is not going to need shadow
coord, though for cube map arrays it still would need fixing).
(Next patch will pass those through using the new arrangement directly from
sampler interface.)
v2: fix up soa split path (unreachable currently but still...)
Reviewed-by: Zack Rusin <zackr@vmware.com>
Roland Scheidegger [Wed, 14 Aug 2013 00:13:18 +0000 (02:13 +0200)]
gallivm: fix border color with normalized texture formats
We need to put border color into texture format color space which
essentially means clamping for non-float, normalized formats (not entirely
sure if we're also meant to quantize the float but it's probably ok not to
do it thankfully).
For OpenGL we could do this easily outside generated code due to the
1:1 sampler/texture correspondence but not for d3d10 which is terrible
(as we recalculate a constant over and over again per shader invocation).
Fortunately border color should be rare enough that we don't care THAT much.
Reviewed-by: Zack Rusin <zackr@vmware.com>
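A sketch of the clamping the commit above refers to, for a single border color channel. The enum and function are hypothetical and not gallivm code; the ranges are the usual ones for normalized formats ([0, 1] for UNORM, [-1, 1] for SNORM), and float formats are passed through untouched.

```c
/* Hypothetical format classification, for illustration only. */
enum fmt_kind { FMT_FLOAT, FMT_UNORM, FMT_SNORM };

float clamp_border_channel(float c, enum fmt_kind kind)
{
    float lo, hi;

    switch (kind) {
    case FMT_UNORM: lo = 0.0f;  hi = 1.0f; break;
    case FMT_SNORM: lo = -1.0f; hi = 1.0f; break;
    default:        return c;   /* float formats: no clamping */
    }

    if (c < lo)
        return lo;
    if (c > hi)
        return hi;
    return c;
}
```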
Zack Rusin [Tue, 13 Aug 2013 05:42:37 +0000 (01:42 -0400)]
llvmpipe: fix pipeline statistics with a null ps
If the fragment shader is null then pixel shader invocations have
to be equal to zero. And if we're running a null ps then clipper
invocations and primitives should be equal to zero but only
if both stencil and depth testing are disabled.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Zack Rusin [Fri, 9 Aug 2013 14:11:31 +0000 (10:11 -0400)]
draw: make sure that the stages setup outputs
Calling the prepare outputs cleans up the slot assignments
for outputs, unfortunately aapoint and aaline didn't have
code to reset their slots after the initial setup, this
was messing up our slot assignments. The unfilled stage
was just missing the initial assignment of the face slot.
This fixes all of the reported piglit failures.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Paul Berry [Fri, 9 Aug 2013 14:58:43 +0000 (07:58 -0700)]
glsl: Fix incorrect pattern matching in ir_set_program_inouts
In commit 8fc41df (glsl: Modify ir_set_program_inouts to handle
geometry shaders), when attempting to pattern match the "foo" part of
expressions such as:
foo[i][j]
foo[i]
I incorrectly called as_dereference_variable() on the subexpression
foo[i] instead of foo. As a result, the pattern never matched, so
ir_set_program_inouts would fall back on marking the entire variable
as used, rather than just the portion indexed by the array.
This didn't result in incorrect behaviour, but it could have resulted
in inefficiency by causing the back-end to allocate resources for
unused parts of an input or output array.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rico Schüller [Wed, 14 Aug 2013 11:17:22 +0000 (13:17 +0200)]
vl: Add support for max level query v2
This patch adds the level query support to the video decoders
and uses some more reasonable defaults.
v2: (ck) add commit message
Reviewed-by: Christian König <christian.koenig@amd.com>
Ian Romanick [Fri, 9 Aug 2013 20:32:40 +0000 (13:32 -0700)]
glsl: Emit better warnings for things that look like default precision statements
Previously we would emit a warning for empty declarations like
float;
We would also emit the same warning for things like
highp float;
However, this second case is most likely the application trying to set
the default precision. This makes the compiler generate a stronger
warning with some suggestion of a fix.
It really seems like this should be an error. I'll bet that 100% of the
time someone writes 'highp float;' they actually meant 'precision highp
float;'. Alas, both AMD and NVIDIA accept this syntax, and the spec
doesn't explicitly forbid it.
This makes piglit's precision-05.vert generate the following warnings:
0:12(11): warning: empty declaration with precision qualifier, to set the default precision, use `precision lowp float;'
0:13(12): warning: empty declaration with precision qualifier, to set the default precision, use `precision mediump int;'
v2: Add { } around a one-line if body and fix a comment. Suggested by
Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Paul Berry [Mon, 12 Aug 2013 13:39:23 +0000 (06:39 -0700)]
glsl/ast: Don't perform GS input array checks on non-inputs.
Previously, we were accidentally calling
handle_geometry_shader_input_decl() on non-input interface block
declarations, resulting in bogus error checking.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Mon, 12 Aug 2013 13:39:23 +0000 (06:39 -0700)]
glsl/ast: Fix assertion failure when GS input declared as non-array.
Previously, if a geometry shader input was declared as a non-array, we
would flag the proper compiler error, but then before we got a chance
to report it to the client, handle_geometry_shader_input_decl() would
assertion fail.
With this patch, handle_geometry_shader_input_decl() ignores
non-arrays.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Mon, 12 Aug 2013 13:39:23 +0000 (06:39 -0700)]
glsl/ast: Check that geometry shader interface block inputs are arrays.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Wed, 14 Aug 2013 02:29:59 +0000 (19:29 -0700)]
i965/gen7+: Fix build error introduced by renaming upload_3dstate_so_decl_list.
Commit
9f9ccf707c54156b4559a4b1206022c2ca2d45cd renamed
upload_3dstate_so_decl_list to gen7_upload_3dstate_so_decl_list but
forgot to update the caller.
Jon Severinsson [Sun, 11 Aug 2013 17:37:01 +0000 (19:37 +0200)]
radeon/llvm: Add missing "%s" format string to fprintf.
This fixes a compilation warning with -Wformat-security.
CC: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
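The general shape of this kind of fix, as a sketch. log_message() is a hypothetical helper, not the radeon/llvm code; it only shows why passing a non-literal string as the format argument triggers -Wformat-security.

```c
#include <stdio.h>

/* Writes msg verbatim and returns the number of characters written. */
int log_message(FILE *out, const char *msg)
{
    /* Unsafe variant that -Wformat-security warns about:
     *     fprintf(out, msg);
     * Any '%' in msg would be parsed as a conversion specifier. */
    return fprintf(out, "%s", msg);
}
```

With the explicit "%s", a message such as "100%d done" is printed as-is instead of being interpreted as a format string.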
Chad Versace [Tue, 6 Aug 2013 22:52:11 +0000 (15:52 -0700)]
i965: Move arrays brw_multisample_positions* to new header
Move the arrays to the new header brw_multisample_state.h, which will be
shared with Broadwell code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Chad Versace [Tue, 6 Aug 2013 22:53:38 +0000 (15:53 -0700)]
i965: Refactor names of sample_positions_8/4x arrays
Place each array in the brw namespace by renaming it:
sample_positions_4x -> brw_multisample_positions_4x
sample_positions_8x -> brw_multisample_positions_8x
This prepares for moving the arrays to a header shared by gen6 and gen8.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Kenneth Graunke [Tue, 4 Dec 2012 21:31:57 +0000 (13:31 -0800)]
i965/gen7+: Mark upload_3dstate_so_decl_list as non-static (v2)
We will reuse this for Broadwell.
v2: Prefix function name with 'gen7'. (chadv)
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Kenneth Graunke [Tue, 4 Dec 2012 02:18:37 +0000 (18:18 -0800)]
i965: Mark a few brw_draw_upload.c functions as non-static
We will reuse these for Broadwell.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Ian Romanick [Fri, 9 Aug 2013 00:40:38 +0000 (17:40 -0700)]
glsl: Require function return type arrays be explicitly sized
Fixes piglit array-function-return-unsized.vert.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Ian Romanick [Fri, 9 Aug 2013 00:23:01 +0000 (17:23 -0700)]
glsl: Move and refine test for unsized arrays in GLSL ES
GLSL ES does not allow unsized arrays, and GLSL ES 1.00 does not allow
array initializers. However, GLSL ES 3.00 allows array initializers,
and the initializer can explicitly size the array. The specification
even includes some examples of this:
float x[] = float[2] (1.0, 2.0); // declares an array of size 2
float y[] = float[] (1.0, 2.0, 3.0); // declares an array of size 3
float a[5];
float b[] = a;
Move the unsized array check to after the initializer has been
processed. If the array is still unsized, generate the error. This
should have no effect in GLSL ES 1.00 because, as previously mentioned,
array initializers are not allowed.
Fixes piglit "glsl-es-3.00 compiler array-sized-by-initializer.vert".
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "9.1 9.2" <mesa-stable@lists.freedesktop.org>
Ian Romanick [Fri, 9 Aug 2013 01:17:24 +0000 (18:17 -0700)]
glx: Generate GLXBadDrawable when drawable is zero
Fixes piglit glx-query-drawable-GLXBadDrawable.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Ian Romanick [Thu, 8 Aug 2013 22:41:36 +0000 (15:41 -0700)]
mesa: Use _mesa_detach_renderbuffer when deleting a texture
The functional change is that now invalidate_framebuffer is called if
the texture is actually detached from one of the currently bound FBOs.
Previously this was only done for renderbuffers.
The remaining changes make the texture delete path look more similar to
the renderbuffer delete path. This includes adding relevant spec
quotations to justify the behavior.
Fixes piglit fbo-incomplete "delete texture of bound FBO" test.
v2: Move 'fb->Attachment[i].Texture == att' check from previous patch to
this patch... where it was intended to be in the first place. Noticed
by Chad.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Ian Romanick [Thu, 8 Aug 2013 22:26:36 +0000 (15:26 -0700)]
mesa: Make detach_renderbuffer available outside fbobject.c
Also add a return value indicating whether any work was done.
This will be used by the next patch.
v2: Move 'fb->Attachment[i].Texture == att' check to the next
patch... where it was intended to be in the first place. Noticed by
Chad.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Ian Romanick [Thu, 8 Aug 2013 19:33:04 +0000 (12:33 -0700)]
meta: Don't call _mesa_Ortho with width or height of 0
Fixes failures in oglconform fbo mipmap.manual.color,
mipmap.manual.colorAndDepth, mipmap.automatic, and
mipmap.manualIterateTexTargets subtests.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Vadim Girlin [Sat, 10 Aug 2013 22:52:34 +0000 (02:52 +0400)]
r600g/sb: use MULADD workaround on R7xx for MULADD_IEEE
Looks like the same issue that was seen with MULADD in the trans slot on
R7xx also affects MULADD_IEEE (maybe all OP3 instructions, and MULADD is
just the most frequently used one?). So the workaround is to not allow affected
instructions to be placed into the trans slot.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=67927
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Roland Scheidegger [Mon, 12 Aug 2013 23:16:42 +0000 (01:16 +0200)]
gallivm: implement new float comparison instructions returning integer masks
FSEQ/FSGE/FSLT/FSNE work just the same as SEQ/SGE/SLT/SNE except skip the
select.
And just for consistency use the same appropriate ordered/unordered comparisons
for the old opcodes as well.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Roland Scheidegger [Mon, 12 Aug 2013 23:10:59 +0000 (01:10 +0200)]
tgsi: implement new float comparison instructions returning integer masks
Also while here add a bunch of other forgotten (integer) instructions to
tgsi_util_get_inst_usage_mask() (which isn't used for much except optimizing
away unused input components), though it may still be incomplete.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Roland Scheidegger [Mon, 12 Aug 2013 23:07:51 +0000 (01:07 +0200)]
gallium: add new float comparison instructions returning integer masks
Newer graphics languages don't want messy float mask results but instead true
"boolean" mask results for float comparisons; otherwise the floats just need
to be converted back to integers. Need to keep the old opcodes however due to both
legacy (gl and d3d9) needing them and because older hw can't really deal with
integers. These new FSEQ/FSGE/FSLT/FSNE opcodes are part of integer API and
hence must be supported if a driver claims to support glsl 1.30 (or
PIPE_SHADER_CAP_INTEGERS).
Reviewed-by: Zack Rusin <zackr@vmware.com>
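A scalar sketch of the difference between the old and new comparison opcodes described above. The op_* names are hypothetical, and the sketch assumes an integer "true" is encoded as all bits set, so the result can be used directly as a mask.

```c
#include <stdint.h>

/* Old-style SEQ: the result is a float, 1.0 or 0.0. */
float op_seq(float a, float b)
{
    return a == b ? 1.0f : 0.0f;
}

/* New-style FSEQ: the result is an integer mask, no select to float. */
uint32_t op_fseq(float a, float b)
{
    return a == b ? 0xFFFFFFFFu : 0u;
}

/* New-style FSLT, same idea for less-than. */
uint32_t op_fslt(float a, float b)
{
    return a < b ? 0xFFFFFFFFu : 0u;
}
```

An integer consumer can AND the FSEQ/FSLT result with a value directly, whereas the SEQ result would first have to be converted back from float.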
Chia-I Wu [Wed, 17 Jul 2013 21:42:21 +0000 (05:42 +0800)]
ilo: enable dumping of WM PCB
It was disabled because it wasn't supported.
Chia-I Wu [Tue, 13 Aug 2013 08:11:27 +0000 (16:11 +0800)]
ilo: no binding table change when constants are pushed
When constants can be pushed, and nothing else requires new SURFACE_STATEs,
there is no need to emit BINDING_TABLE_STATE.
Chia-I Wu [Wed, 17 Jul 2013 21:58:45 +0000 (05:58 +0800)]
ilo: support push constant model in shaders
Source constants from URB constant data when the constant data can fit in the
PCB.
Chia-I Wu [Wed, 17 Jul 2013 21:58:42 +0000 (05:58 +0800)]
ilo: support copying constant buffer 0 to PCB
Add ILO_KERNEL_PCB_CBUF0_SIZE so that a kernel can specify how many bytes of
constant buffer 0 need to be copied to PCB.
Chia-I Wu [Wed, 17 Jul 2013 21:43:00 +0000 (05:43 +0800)]
ilo: make constant buffer 0 upload optional
Add ILO_KERNEL_SKIP_CBUF0_UPLOAD so that we can skip constant buffer 0 upload
when the kernel does not need it.
Chia-I Wu [Tue, 13 Aug 2013 07:23:41 +0000 (15:23 +0800)]
Revert "ilo: initialize constant buffer SURFACE_STATE early"
This reverts commit
a9b800aa81cffdcaef2490ff49986099feae2663. With push
constant support, the constructed SURFACE_STATE is unused and wasted. The
change only slows things down.
Armin K [Sun, 11 Aug 2013 15:27:23 +0000 (17:27 +0200)]
gbm: Link to libwayland-drm if Wayland EGL platform is enabled
We were relying on libEGL to pull in libwayland-client symbols, but commit
2c2e64edaba0f6aeb181ca5b51eb8dea8e9b39f9 cleaned up the
symbol leak.
https://bugs.freedesktop.org/show_bug.cgi?id=67962
Roland Scheidegger [Mon, 12 Aug 2013 19:18:58 +0000 (21:18 +0200)]
gallivm: fix exec_mask interaction with geometry shader after end of main
Because we must maintain an exec_mask even if there's currently nothing
on the mask stack, we can still have an exec_mask at the end of the program.
Effectively, this mask should be set back to default when returning from main.
Without relying on the END/RET opcode (I think it's valid to have neither) it is
actually difficult to do this, as there doesn't seem to be any reasonable place
to do it, so instead let's just say the exec_mask is invalid outside main (which
it really is effectively).
The problem is that geometry shader called end_primitive outside the shader
(in the epilogue), and as a result used a bogus mask, leading to bugs if we
had to set the (somewhat misnamed) ret_in_main bit anywhere. So just avoid
the mask combining function when called from outside the shader.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Roland Scheidegger [Mon, 12 Aug 2013 16:01:18 +0000 (18:01 +0200)]
draw: simplify prim mask construction
The code was quite weird: the second comparison was in fact a complete no-op,
and we can also do the comparison with the vector directly instead of scalar,
which should not only be faster but also makes it far more obvious what that
mask is actually going to look like.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Roland Scheidegger [Mon, 12 Aug 2013 15:58:39 +0000 (17:58 +0200)]
gallivm: simplify geometry shader mask handling a bit
Instead of reducing masks to 0/1 simply use the mask directly as -1.
Also use some signed comparison instead of unsigned (as far as I understand
these values have to be (very) small and signed means llvm doesn't have to
apply additional logic to do the unsigned comparisons the cpu can't do).
Saves a couple of instructions in some test geometry shader here.
v2: that was a bit too much optimization, don't skip combining the masks...
Reviewed-by: Zack Rusin <zackr@vmware.com>
Roland Scheidegger [Sat, 10 Aug 2013 01:26:54 +0000 (03:26 +0200)]
draw: (trivial) dump tgsi for geometry shaders with GALLIVM_DEBUG_TGSI
And dump the variant key too (same as vs does).
Just so I can stop wondering why I see the tgsi dump for fs and vs but not
gs...
Roland Scheidegger [Fri, 9 Aug 2013 19:51:37 +0000 (21:51 +0200)]
gallivm: (trivial) fix typo in argument declaration of lp_build_size_query_soa
Was meant to match the name used elsewhere, spotted by Anthony.
Kenneth Graunke [Mon, 5 Aug 2013 05:37:34 +0000 (22:37 -0700)]
i965/fs: Add dump_instruction() support for ARF destinations.
CMP instructions use BRW_ARF_NULL as a destination. Prior to this
patch, dump_instruction() decoded the destination as "???".
Now it decodes BRW_ARF_NULL as "(null)" and other ARFs numerically.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Mon, 5 Aug 2013 05:35:01 +0000 (22:35 -0700)]
i965/fs: Remove extraneous newline in dump_instruction() for CMP.
This resulted in printouts like:
246: cmp.cmod.f0.0
???, vgrf152, 0.000000f, (null),
With this patch, CMP is properly printed on one line.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Sun, 4 Aug 2013 09:05:43 +0000 (02:05 -0700)]
i965/fs: Optimize IF/MOV/ELSE/MOV/ENDIF to SEL when possible.
Many GLSL shaders contain code of the form:
x = condition ? foo : bar
The compiler emits an ir_if tree for this, since each subexpression
might be a complex tree that could have side-effects and short-circuit
logic operations.
However, the common case is to simply pick one of two constants or
variables' values---which is exactly what SEL is for. Replacing IF/ELSE
with SEL also simplifies the control flow graph, making optimization
passes which work on basic blocks more effective.
The shader-db statistics:
total instructions in shared programs: 1655247 -> 1503234 (-9.18%)
instructions in affected programs: 949188 -> 797175 (-16.02%)
2,970 shaders were helped, none hurt. Gained 181 SIMD16 programs.
This helps Valve's Source Engine games (max -41.33%), The Cave
(max -33.33%), Serious Sam 3 (max -18.64%), Yo Frankie! (max -30.19%),
Zen Bound (max -22.22%), GStreamer (max -6.12%), and GLBenchmark 2.7
(max -1.94%).
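For illustration only, the pattern match can be sketched on a toy
instruction array. Everything named toy_* below is hypothetical and is not
the i965 backend's real IR; the predicate that SEL would inherit from the IF
is left implicit, and removed instructions are simply turned into NOPs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy opcodes; hypothetical, not the real backend opcodes. */
enum toy_op { TOY_IF, TOY_ELSE, TOY_ENDIF, TOY_MOV, TOY_SEL, TOY_NOP };

struct toy_inst {
    enum toy_op op;
    int dst;   /* destination register number, -1 if unused */
    int src0;  /* first source register, -1 if unused */
    int src1;  /* second source register, -1 if unused */
};

/*
 * Rewrite each IF / MOV d,a / ELSE / MOV d,b / ENDIF run into
 * SEL d,a,b followed by four NOPs. Both MOVs must write the same
 * destination. Returns the number of rewrites performed.
 */
size_t toy_opt_sel(struct toy_inst *insts, size_t count)
{
    size_t rewrites = 0;

    for (size_t i = 0; i + 4 < count; i++) {
        struct toy_inst *p = &insts[i];
        bool match = p[0].op == TOY_IF &&
                     p[1].op == TOY_MOV &&
                     p[2].op == TOY_ELSE &&
                     p[3].op == TOY_MOV &&
                     p[4].op == TOY_ENDIF &&
                     p[1].dst == p[3].dst;
        if (!match)
            continue;

        /* The "then" value becomes src0, the "else" value becomes src1. */
        struct toy_inst sel = { TOY_SEL, p[1].dst, p[1].src0, p[3].src0 };
        p[0] = sel;
        for (int k = 1; k <= 4; k++)
            p[k] = (struct toy_inst){ TOY_NOP, -1, -1, -1 };
        rewrites++;
    }
    return rewrites;
}
```

Leaving NOPs in place keeps the sketch short; a real pass would remove the
dead instructions and the now-empty basic blocks.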
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Mon, 5 Aug 2013 23:24:43 +0000 (16:24 -0700)]
i965/fs: Consider predicated SEL instructions as whole variable writes.
The instruction
(+f0.0) SEL dst, src0, src1
will write either src0 or src1 to dst, depending on the predicate.
Unlike most predicated instructions, it always writes to dst.
fs_inst::is_partial_write() is supposed to return false if the whole
register is guaranteed to be written. The !inst->predicated check makes
sense for most instructions, which might not write the whole register,
but SEL is a special case.
This caused live interval analysis to ignore the destination of
predicated SEL instructions when computing "def" information.
Requires the previous commit to avoid regressions.
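A minimal sketch of the intended rule, using hypothetical toy_* types rather
than the real fs_inst class, and ignoring every other condition the real
function may check:

```c
#include <stdbool.h>

/* Toy opcodes; hypothetical stand-ins for the backend's opcodes. */
enum toy_opcode { TOY_OP_MOV, TOY_OP_ADD, TOY_OP_SEL };

struct toy_fs_inst {
    enum toy_opcode opcode;
    bool predicated;
};

/*
 * True when the instruction might leave part of its destination
 * unwritten. A predicated SEL always writes dst (either src0 or
 * src1), so it is not treated as a partial write.
 */
bool toy_is_partial_write(const struct toy_fs_inst *inst)
{
    return inst->predicated && inst->opcode != TOY_OP_SEL;
}
```

With this rule a predicated MOV is still a partial write, while a predicated
SEL counts as a full definition for liveness purposes.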
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Tue, 6 Aug 2013 00:12:12 +0000 (17:12 -0700)]
i965/fs: Explicitly disallow CSE on predicated instructions.
The existing inst->is_partial_write() already disallows predicated
instructions, so this has no functional change. However, it's worth
doing explicitly since the CSE pass does not consider the flag register.
This means it could blindly factor out operations that use the same
sources, but which have different condition codes set.
This prevents a regression in the next commit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Sun, 4 Aug 2013 00:31:53 +0000 (17:31 -0700)]
i965/fs: Log a performance warning if skipping 16-wide due to pulls.
Usually, the driver creates both 8-wide and 16-wide variants of every
fragment shader. When 16-wide compilation fails, it logs a performance
warning explaining why only an 8-wide program exists.
However, when there are pull parameters, the driver won't even bother
trying the 16-wide compile (since it would fail). In this case, it
failed to emit a performance warning, leaving no explanation for the
missing 16-wide program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Chia-I Wu [Sun, 11 Aug 2013 14:44:44 +0000 (22:44 +0800)]
ilo: initialize constant buffer SURFACE_STATE early
Fix ilo_gpe_init_view_surface_for_buffer to allow buffer to be NULL, and add
ilo_gpe_set_view_surface_bo to set it later. This allows us to set up
SURFACE_STATE early for constant buffers backed by user buffers.
Chia-I Wu [Fri, 9 Aug 2013 16:48:28 +0000 (00:48 +0800)]
ilo: 3DSTATE_INDEX_BUFFER may be wrongly skipped
In finalize_index_buffer(), when the current index buffer was destroyed due to
u_upload_data(), it may happen that the new index buffer is at the same
address as the old one. Comparing the pointers to the two buffers could fail
to work, and 3DSTATE_INDEX_BUFFER would be incorrectly skipped.
Holding a reference to the current index buffer before calling u_upload_data()
should fix the problem.
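The idea can be sketched with a toy reference-counted buffer. All toy_*
names are hypothetical and only loosely modelled on the usual
reference-helper pattern; toy_upload() stands in for u_upload_data()
replacing the bound buffer. While the extra reference is held, the old
buffer cannot be freed, so the allocator cannot hand its address to the new
buffer and the pointer comparison stays meaningful.

```c
#include <stdbool.h>
#include <stdlib.h>

struct toy_buffer {
    int refcount;
};

/* Allocate a buffer holding one reference; NULL on failure. */
struct toy_buffer *toy_buffer_create(void)
{
    struct toy_buffer *buf = malloc(sizeof(*buf));
    if (buf)
        buf->refcount = 1;
    return buf;
}

/* Point *dst at src, adjusting reference counts; frees at zero. */
void toy_buffer_reference(struct toy_buffer **dst, struct toy_buffer *src)
{
    struct toy_buffer *old = *dst;

    if (src)
        src->refcount++;
    if (old && --old->refcount == 0)
        free(old);
    *dst = src;
}

/* Stand-in for an upload that replaces the bound buffer with a new one. */
void toy_upload(struct toy_buffer **bound)
{
    struct toy_buffer *fresh = toy_buffer_create();

    toy_buffer_reference(bound, fresh);  /* drops the binding's old ref */
    toy_buffer_reference(&fresh, NULL);  /* drop the creation reference */
}

/* Returns true when the bound buffer changed and state must be re-emitted. */
bool toy_finalize_index_buffer(struct toy_buffer **bound)
{
    struct toy_buffer *current = NULL;
    bool changed;

    toy_buffer_reference(&current, *bound);  /* keep the old buffer alive */
    toy_upload(bound);
    changed = (*bound != current);           /* old address still in use */
    toy_buffer_reference(&current, NULL);    /* now the old one may be freed */
    return changed;
}
```

Without the temporary reference, the old buffer could be freed inside the
upload and its address reused for the new one, making the two pointers
compare equal.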
Chris Forbes [Sun, 4 Aug 2013 09:29:49 +0000 (21:29 +1200)]
i965: add missing BRW_NEW_INTERPOLATION_MAP to state dump
Makes this flag appear in the output for INTEL_DEBUG=state
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Chris Forbes [Sun, 4 Aug 2013 07:38:37 +0000 (19:38 +1200)]
i965: Add a new debug mode for the VUE map
INTEL_DEBUG=vue now emits a listing of each slot in the VUE map,
and the corresponding interpolation mode.
V2: Fix whitespace issues.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>