platform/upstream/mesa.git
11 years agoi965: Remove the old brw_optimize() code.
Eric Anholt [Sat, 1 Dec 2012 02:30:40 +0000 (18:30 -0800)]
i965: Remove the old brw_optimize() code.

This is now done in the VS backend before instruction emit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/vs: Add a pass to set dependency control fields on instructions.
Eric Anholt [Sat, 1 Dec 2012 02:29:34 +0000 (18:29 -0800)]
i965/vs: Add a pass to set dependency control fields on instructions.

This is a more aggressive version of the old brw_optimize() path.  Reduces
cycles spent in the vertex shader on minecraft by 18.6% +/- 10.0% (n=15).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965: Dump shader source for linked shader programs.
Eric Anholt [Fri, 22 Mar 2013 23:50:58 +0000 (16:50 -0700)]
i965: Dump shader source for linked shader programs.

We dump shader source in ir_to_mesa.cpp, and we dump linked programs here,
but we had no reference from the linked programs to their source.  This
was preventing improvement of shader-db to use linked shader programs
instead of individual shader files (which is bogus, because it means we
optimize out VS outputs, and don't interpolate FS inputs!)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoclover: Fix build with LLVM 3.3
Mike Lothian [Mon, 1 Apr 2013 17:50:23 +0000 (10:50 -0700)]
clover: Fix build with LLVM 3.3

11 years agollvmpipe: use triangle subdivision to avoid fixed-point overflow issues
Brian Paul [Tue, 26 Mar 2013 04:02:47 +0000 (22:02 -0600)]
llvmpipe: use triangle subdivision to avoid fixed-point overflow issues

If we're drawing to a surface that's 2048 x 2048 pixels or larger there's
danger of fixed-point overflow in the triangle rasterization code.  That
leads to various rendering glitches.

Rather than implement some intricate changes to the rasterization code,
simply subdivide triangles into smaller subtriangles to avoid the issue.
Only do this when the drawing surface is larger than 2048 by 2048.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years agomesa: remove platform checks around __builtin_ffs, __builtin_ffsll
Brian Paul [Thu, 28 Mar 2013 23:06:35 +0000 (17:06 -0600)]
mesa: remove platform checks around __builtin_ffs, __builtin_ffsll

Use the __builtin_ffs, __builtin_ffsll functions whenever we have GCC,
not just for specific platforms.  Fixes Solaris build.

Note: This is a candidate for the stable branches.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62868
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
11 years agodocs: add a new page documenting known application issues
Brian Paul [Mon, 25 Mar 2013 19:15:37 +0000 (13:15 -0600)]
docs: add a new page documenting known application issues

Let's try to update this when we find other broken applications...

Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years agodrirc: set always_have_depth_buffer for Topogon
Brian Paul [Sat, 30 Mar 2013 00:29:52 +0000 (18:29 -0600)]
drirc: set always_have_depth_buffer for Topogon

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
11 years agogallivm: Minor comment cleanup
Adam Jackson [Mon, 1 Apr 2013 13:45:38 +0000 (09:45 -0400)]
gallivm: Minor comment cleanup

Signed-off-by: Adam Jackson <ajax@redhat.com>
11 years agomesa: fix texture storage multisample prototypes harder.
Dave Airlie [Mon, 1 Apr 2013 09:53:55 +0000 (19:53 +1000)]
mesa: fix texture storage multisample prototypes harder.

I just noticed the warnings since I fixed the other bit.

Signed-off-by: Dave Airlie <airlied@redhat.com>
11 years agor600g/llvm: Update LLVM_REVISION
Vincent Lejeune [Sun, 31 Mar 2013 19:37:20 +0000 (21:37 +0200)]
r600g/llvm: Update LLVM_REVISION

11 years agor600g/llvm: use native encode for tex
Vincent Lejeune [Mon, 25 Mar 2013 23:47:08 +0000 (00:47 +0100)]
r600g/llvm: use native encode for tex

11 years agoglapi: fix storage multisample build errors
Dave Airlie [Sun, 31 Mar 2013 10:41:28 +0000 (20:41 +1000)]
glapi: fix storage multisample build errors

Reported on #radeon by udovdh

Signed-off-by: Dave Airlie <airlied@redhat.com>
11 years agodocs: mark ARB_texture_storage_multisample done
Chris Forbes [Sat, 16 Mar 2013 04:02:58 +0000 (17:02 +1300)]
docs: mark ARB_texture_storage_multisample done

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agoi965: enable ARB_texture_storage_multisample on Gen6+
Chris Forbes [Sat, 16 Feb 2013 09:09:38 +0000 (22:09 +1300)]
i965: enable ARB_texture_storage_multisample on Gen6+

This can be enabled everywhere that ARB_texture_multisample is
supported -- ARB_texture_storage is supported on everything.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agomesa: allow multisample texture targets in [Get]TexParameter*
Chris Forbes [Fri, 15 Mar 2013 09:52:12 +0000 (22:52 +1300)]
mesa: allow multisample texture targets in [Get]TexParameter*

ARB_texture_storage_multisample allows texture parameters to be
queried for TEXTURE_2D_MULTISAMPLE and TEXTURE_2D_MULTISAMPLE_ARRAY
targets.

Some parameters may also be set, with the following exceptions:

- TEXTURE_BASE_LEVEL may not be set to a nonzero value; generates
   INVALID_OPERATION

- any state which appears in the `per-sampler` state table may not
  be set; generates INVALID_OPERATION

V2: Don't introduce bogus handling of TEXTURE_MAX_LEVEL

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agomesa: improve reported function name in Tex*Multisample
Chris Forbes [Sat, 16 Mar 2013 04:00:05 +0000 (17:00 +1300)]
mesa: improve reported function name in Tex*Multisample

Now that there are 4 variants, just pass the function name into
teximagemultisample rather than reconstructing it.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agomesa: add enable bit for ARB_texture_storage_multisample
Chris Forbes [Sat, 16 Feb 2013 09:34:22 +0000 (22:34 +1300)]
mesa: add enable bit for ARB_texture_storage_multisample

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agoglapi: add definition of ARB_texture_storage_multisample
Chris Forbes [Sat, 16 Feb 2013 09:02:00 +0000 (22:02 +1300)]
glapi: add definition of ARB_texture_storage_multisample

Adds XML for the extension, dispatch_sanity enabling, and the two new
entrypoints. These are both implemented by calling the shared
teximagemultisample() with immutable=GL_TRUE.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agomesa: add support for immutable textures to teximagemultisample()
Chris Forbes [Sat, 16 Feb 2013 08:38:20 +0000 (21:38 +1300)]
mesa: add support for immutable textures to teximagemultisample()

The new entrypoints will come later, but this adds the actual logic for
supporting immutable multisample textures:

- The immutability flag is set as desired.
- Attempting to modify an immutable multisample texture produces
  INVALID_OPERATION.

Note: The extension spec does not mention adding this behavior to
TexImage*Multisample, but it seems like the reasonable thing to do.

V2: - Cover missing error cases (unsized formats; texture object zero)

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
[V1] Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agomesa: extract _mesa_is_legal_tex_storage_format helper
Chris Forbes [Fri, 22 Mar 2013 06:58:03 +0000 (19:58 +1300)]
mesa: extract _mesa_is_legal_tex_storage_format helper

This is about to be used in teximagemultisample() when immutable=true.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agomesa: Delete VERT_ATTRIB_GENERIC_NV and VERT_BIT_GENERIC_NV macros.
Kenneth Graunke [Thu, 21 Mar 2013 17:57:08 +0000 (10:57 -0700)]
mesa: Delete VERT_ATTRIB_GENERIC_NV and VERT_BIT_GENERIC_NV macros.

These haven't been used since we deleted NV_vertex_program support.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoi965: Fix an inconsistency inb the VUE map with gl_ClipVertex on gen4/5.
Eric Anholt [Fri, 29 Mar 2013 07:26:07 +0000 (00:26 -0700)]
i965: Fix an inconsistency inb the VUE map with gl_ClipVertex on gen4/5.

We are intentionally not allocating a slot for gl_ClipVertex.  But by
leaving the bit set in the slots_valid, the fragment shader's computation
of where varyings are in urb entry coming out of the SF would be off by
one.  Fixes rendering in Freespace 2 SCP, and improves rendering in TF2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62830
Tested-by: Joaquín Ignacio Aramendía <samsagax@gmail.com>
NOTE: This is a candidate for the 9.1 branch.
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years agointel: Remove a never-taken debug print path.
Eric Anholt [Thu, 28 Mar 2013 22:58:25 +0000 (15:58 -0700)]
intel: Remove a never-taken debug print path.

Alessandro Pignotti noted when I added this code in commit
0e723b135bfd59868c92c3ae243f1adaedaec3a5 that it's in the else block for
"if (busy)", so this debug print couldn't happen.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agost/mesa: add ir_lod case in GLSL->TGSI code to silence warning
Brian Paul [Fri, 29 Mar 2013 23:21:33 +0000 (17:21 -0600)]
st/mesa: add ir_lod case in GLSL->TGSI code to silence warning

11 years agoglsl: Generated masked write instead of vector array index for UBO lowering
Ian Romanick [Sat, 23 Mar 2013 01:55:49 +0000 (18:55 -0700)]
glsl: Generated masked write instead of vector array index for UBO lowering

When reading a column from a row-major matrix, we would slot the single
value read into the vector using an ir_dereference_array of the vector
with a constant index.  This will (eventually) get optimized to a
masked-write, so just generate the masked write in the first place.

v2: Remove unused variable 'chan'.  Suggested by Ken.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Eric Anholt <eric@anholt.net>
11 years agoglsl: Replace open-coded dot-product with dot
Ian Romanick [Mon, 25 Mar 2013 21:40:53 +0000 (14:40 -0700)]
glsl: Replace open-coded dot-product with dot

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Paul Berry <stereotype441@gmail.com>
11 years agoglsl: Replace constant-index vector array accesses with swizzles
Ian Romanick [Sat, 16 Mar 2013 01:05:55 +0000 (18:05 -0700)]
glsl: Replace constant-index vector array accesses with swizzles

Search and replace:

    ][0] -> ].x
    ][1] -> ].y
    ][2] -> ].z
    ][3] -> ].w

Fixes piglit tests inverse-mat[234].{vert,frag}.  These tests call the
inverse function with constant parameters and expect proper constant
folding to happen.  My suspicion is that this patch papers over some bug
in constant propagation involving array accesses.

Either way, all of these accesses eventually get lowered to swizzles.
This cuts out the middle man (saving a trivial amount of CPU).

NOTE: This is a candidate for the 9.1 branch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Paul Berry <stereotype441@gmail.com>
11 years agoglsl: Add missing bool case in glsl_type::get_scalar_type
Ian Romanick [Fri, 15 Mar 2013 23:47:46 +0000 (16:47 -0700)]
glsl: Add missing bool case in glsl_type::get_scalar_type

Since the case was missing bec4->get_scalar_type() would return bvec4,
but vec4->get_scalar_type() would return float.

NOTE: This is a candidate for stable branches.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
11 years agoi965: Fix INTEL_DEBUG=shader_time for fragment shaders with discards.
Kenneth Graunke [Thu, 28 Mar 2013 06:19:39 +0000 (23:19 -0700)]
i965: Fix INTEL_DEBUG=shader_time for fragment shaders with discards.

"discard" instructions generate HALT instructions which jump to a final
HALT near the end of the shader.  Previously, fs_generator created this
final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it
to jump right before the FB write epilogue.  This is normally good.

However, INTEL_DEBUG=shader_time also has an epilogue section which
records the final timestamp.  The frontend emits IR for this just before
FS_OPCODE_FB_WRITE.  Unfortunately, this led to the following ordering:

1. Shader Time Epilogue
2. Final HALT (where discards jump)
3. Framebuffer Write Epilogue

This meant that discarded pixels completely skipped the shader time
epilogue, causing no ending timestamp to be written.  This obviously
led to inaccurate results.

This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just
before any epilogue sections.  This is where the final HALT should be
generated, and makes it easy to ensure the correct ordering:

1. Final HALT
2. Shader Time Epilogue
3. Framebuffer Write Epilogue

For shaders that don't discard, this opcode compiles away to nothing.
The scheduler adds barrier dependencies to make sure that it doesn't
get moved above any FS_OPCODE_DISCARD_JUMP instructions.

One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to
a mere 5.13 Gcycles.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoi965: Add names for all instructions to dump_instruction() in FS and VS.
Eric Anholt [Tue, 12 Mar 2013 00:36:54 +0000 (17:36 -0700)]
i965: Add names for all instructions to dump_instruction() in FS and VS.

I'd previously added the minimum names to understand my dumps, but this
makes dumps in general much easier to read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965: Enable ARB_texture_query_lod.
Matt Turner [Wed, 6 Mar 2013 22:54:27 +0000 (14:54 -0800)]
i965: Enable ARB_texture_query_lod.

v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Generate LOD sampler message from ir_lod.
Matt Turner [Wed, 6 Mar 2013 22:47:01 +0000 (14:47 -0800)]
i965/fs: Generate LOD sampler message from ir_lod.

v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoglsl: Implement ARB_texture_query_lod
Dave Airlie [Sun, 23 Sep 2012 09:50:41 +0000 (19:50 +1000)]
glsl: Implement ARB_texture_query_lod

v2 [mattst88]:
   - Rebase.
   - #define GL_ARB_texture_query_lod to 1.
   - Remove comma after ir_lod in ir.h for MSVC.
   - Handled ir_lod in ir_hv_accept.cpp, ir_rvalue_visitor.cpp,
     opt_tree_grafting.cpp.
   - Rename textureQueryLOD to textureQueryLod, see
     https://www.khronos.org/bugzilla/show_bug.cgi?id=821
   - Fix ir_reader of (lod ...).
v3 [mattst88]:
   - Rename textureQueryLod to textureQueryLOD, pending resolution of
     Khronos 821.
   - Add ir_lod case to ir_to_mesa.cpp.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Use measured Gen7 instruction timings on Gen6.
Matt Turner [Thu, 28 Mar 2013 18:38:57 +0000 (11:38 -0700)]
i965/fs: Use measured Gen7 instruction timings on Gen6.

x before
+ after
+------------------------------------------------------------------------------+
|   x                                   x   +                                  |
|   xx  ++                              x   +                                  |
|   xx  ++ +                           xx   ++                                 |
|x xxx x+++++          +           xxx x*x+*+++ +         x                   +|
|   |_____|____________A______A____M____M_|_______|                            |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
    x  23       8083.78       8287.83       8205.55     8162.7461     68.307951
    +  23       8107.56       8358.74       8224.33     8186.1765     71.506301
    No difference proven at 95.0% confidence

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoi965/fs: Increase and document MAD latency on Gen7.
Matt Turner [Thu, 28 Mar 2013 18:15:20 +0000 (11:15 -0700)]
i965/fs: Increase and document MAD latency on Gen7.

58% of mad(8) generated in shader-db are reading registers from the same
bank.

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoi965/fs: Add LRP instruction latency.
Matt Turner [Thu, 28 Mar 2013 17:57:34 +0000 (10:57 -0700)]
i965/fs: Add LRP instruction latency.

Set its latency to what happens to be the default floating-point
instruction latency. One day we may want to handle latency based on
register bank information.

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoi965/fs: Add Haswell cycle timings
Matt Turner [Fri, 1 Mar 2013 00:42:51 +0000 (16:42 -0800)]
i965/fs: Add Haswell cycle timings

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoi965: Note that write-after-write dependencies are blocking.
Matt Turner [Thu, 28 Mar 2013 17:46:17 +0000 (10:46 -0700)]
i965: Note that write-after-write dependencies are blocking.

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoi965: Reword comment about the shared mathbox.
Matt Turner [Thu, 28 Mar 2013 17:45:34 +0000 (10:45 -0700)]
i965: Reword comment about the shared mathbox.

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agogallivm: consolidate some half-to-float and r11g11b10-to-float code
Roland Scheidegger [Fri, 29 Mar 2013 05:16:33 +0000 (06:16 +0100)]
gallivm: consolidate some half-to-float and r11g11b10-to-float code

Similar enough that we can try to use shared code.
v2: fix a stupid bug using wrong variable causing mayhem with Inf and NaNs.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com
11 years agomesa: provide default implementation of QuerySamplesForFormat
Chris Forbes [Fri, 29 Mar 2013 03:22:09 +0000 (16:22 +1300)]
mesa: provide default implementation of QuerySamplesForFormat

Previously at least i915 failed to provide an implementation, but
exposed ARB_internalformat_query anyway, leading to crashes when
QueryInternalformativ was called.

Default implementation just returns 1 for everything, so is suitable for
any driver which does not support multisampling.

V2: - Move from intel to core mesa.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agonvc0: implement MP performance counters
Christoph Bumiller [Wed, 27 Mar 2013 22:39:06 +0000 (23:39 +0100)]
nvc0: implement MP performance counters

There's more, but this only adds (most) of the counters that are
handled directly by the shader processors.
The other counter domains are not handled on the multiprocessor and
there are no FIFO object methods for configuring them.
Instead, they have to be programmed by the kernel via PCOUNTER, and
the interface for this isn't in place yet.

11 years agonvc0: enable compression when supported
Christoph Bumiller [Thu, 21 Mar 2013 18:26:01 +0000 (19:26 +0100)]
nvc0: enable compression when supported

11 years agonvc0: use NOUVEAU_GETPARAM_GRAPH_UNITS to get MP count
Christoph Bumiller [Wed, 27 Mar 2013 22:38:29 +0000 (23:38 +0100)]
nvc0: use NOUVEAU_GETPARAM_GRAPH_UNITS to get MP count

11 years agonv50,nvc0: fix 3d blits, restore viewport after blit
Christoph Bumiller [Fri, 22 Mar 2013 12:49:40 +0000 (13:49 +0100)]
nv50,nvc0: fix 3d blits, restore viewport after blit

11 years agonv50: fix 3D render target setup
Christoph Bumiller [Mon, 25 Mar 2013 18:41:18 +0000 (19:41 +0100)]
nv50: fix 3D render target setup

11 years agollvmpipe: put .bmp extension on dumped image files
Brian Paul [Thu, 28 Mar 2013 23:17:26 +0000 (17:17 -0600)]
llvmpipe: put .bmp extension on dumped image files

11 years agollvmpipe: add 'f' suffix to 1.0 in fixed_to_float()
Brian Paul [Thu, 28 Mar 2013 23:17:26 +0000 (17:17 -0600)]
llvmpipe: add 'f' suffix to 1.0 in fixed_to_float()

11 years agodraw: fix some build breakage when LLVM is not used
Brian Paul [Thu, 28 Mar 2013 23:03:57 +0000 (17:03 -0600)]
draw: fix some build breakage when LLVM is not used

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62883
Tested-by: Vinson Lee <vlee@freedesktop.org>
11 years agomesa: handle STATE_CURRENT_ATTRIB_MAYBE_VP_CLAMPED for parameter printing
Marek Olšák [Thu, 28 Mar 2013 13:50:01 +0000 (14:50 +0100)]
mesa: handle STATE_CURRENT_ATTRIB_MAYBE_VP_CLAMPED for parameter printing

Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agoi965: Tidy shader time printing code by using printf's field widths.
Kenneth Graunke [Thu, 28 Mar 2013 07:18:46 +0000 (00:18 -0700)]
i965: Tidy shader time printing code by using printf's field widths.

We can use %-6s%-6s rather than manually counting characters, resulting
in much more readable code.

This necessitates a small secondary change: using "total fs16" and ""
now causes the "" string to be padded out to 6 characters, resulting in
too much whitespace.  Splitting it into "total" and "fs16" produces the
same output as before.

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/vs: Include URB payload setup in shader_time.
Eric Anholt [Tue, 19 Mar 2013 23:28:54 +0000 (16:28 -0700)]
i965/vs: Include URB payload setup in shader_time.

This much more accurately reflects the cost of the vertex shader, since
the payload setup is often a significant fraction of the instructions in
the VS.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/vs: Use a send from a 2-register VGRF for shader time writes.
Eric Anholt [Tue, 18 Dec 2012 01:11:21 +0000 (17:11 -0800)]
i965/vs: Use a send from a 2-register VGRF for shader time writes.

This will let us emit it later, after we're setting up MRFs for the
URB write.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/vs: Teach copy propagation about sends from GRFs.
Eric Anholt [Tue, 18 Dec 2012 01:03:02 +0000 (17:03 -0800)]
i965/vs: Teach copy propagation about sends from GRFs.

This incidentally also teaches it a bit about gen6 math -- we now allow
unswizzled, unmodified GRF temps as the sources for math.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/vs: Prepare split_virtual_grfs() for the presence of SENDs from GRFs.
Eric Anholt [Tue, 18 Dec 2012 00:48:20 +0000 (16:48 -0800)]
i965/vs: Prepare split_virtual_grfs() for the presence of SENDs from GRFs.

v2: Fix silly bool handling, and don't add new tabs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Include everything but the final FB write in shader_time.
Eric Anholt [Tue, 19 Mar 2013 22:14:20 +0000 (15:14 -0700)]
i965/fs: Include everything but the final FB write in shader_time.

Previously, if you just wrote a constant color to the render target, no
time got noted at all.  This is convenient for doing single-instruction
timings, but not so much for actual program analysis.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Switch shader_time writes to using GRFs.
Eric Anholt [Tue, 19 Mar 2013 22:28:11 +0000 (15:28 -0700)]
i965/fs: Switch shader_time writes to using GRFs.

This avoids conflicts between shader_time and FB writes, so we can include
more of the program under our profiling.  This does mean hiding more of
the message setup from the optimizer, which doesn't have a way to handle
multi-reg sends from GRFs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965: Provide more detailed information to match shader_time to programs.
Eric Anholt [Tue, 19 Mar 2013 21:28:29 +0000 (14:28 -0700)]
i965: Provide more detailed information to match shader_time to programs.

Ken asked me the other day what -1 vs 0 vs 3 vs other meant in our shader
names, and I realized that it was really unclear.  I'd like to do even
better, like noting which one is the clear shader, but that would require
exposing the metaops struct to the driver.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965: Track ARB program state along with GLSL state for shader_time.
Eric Anholt [Tue, 19 Mar 2013 21:27:42 +0000 (14:27 -0700)]
i965: Track ARB program state along with GLSL state for shader_time.

This will let us do much better printouts for non-GLSL programs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agost/dri: fix crash with HUD and single buffering
Marek Olšák [Wed, 27 Mar 2013 00:56:25 +0000 (01:56 +0100)]
st/dri: fix crash with HUD and single buffering

11 years agost/mesa: remove leftover printfs from ReadPixels
Marek Olšák [Thu, 28 Mar 2013 13:51:28 +0000 (14:51 +0100)]
st/mesa: remove leftover printfs from ReadPixels

Oops, I thought I had removed all debugging code.

11 years agoi965/fs: Improve performance of copy propagation dataflow using bitsets.
Eric Anholt [Fri, 22 Mar 2013 21:11:25 +0000 (14:11 -0700)]
i965/fs: Improve performance of copy propagation dataflow using bitsets.

Reduces compile time of l4d2's slowest shader by 17.8% +/- 1.3% (n=10).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agollvmpipe/draw: Fix texture sampling in geometry shaders
Zack Rusin [Wed, 27 Mar 2013 00:53:27 +0000 (17:53 -0700)]
llvmpipe/draw: Fix texture sampling in geometry shaders

We weren't correctly propagating the samplers and sampler views
when they were related to geometry shaders.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years agodraw/llvm: Cleanup the store debugging code
Zack Rusin [Tue, 26 Mar 2013 19:35:45 +0000 (12:35 -0700)]
draw/llvm: Cleanup the store debugging code

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years agodraw: Allocate the output buffer for output primitives
Zack Rusin [Tue, 26 Mar 2013 19:32:30 +0000 (12:32 -0700)]
draw: Allocate the output buffer for output primitives

We were allocating the output buffer but using the input
primitives. We need to allocate that buffer using the
maximum number of output, not input, primitives.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years agogallivm: Implement the breakc instruction
Zack Rusin [Wed, 27 Mar 2013 09:38:32 +0000 (02:38 -0700)]
gallivm: Implement the breakc instruction

Required by more modern examples. Like BRK but with a condition.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years agogallivm: implement implicit primitive flushing
Zack Rusin [Wed, 27 Mar 2013 09:30:38 +0000 (02:30 -0700)]
gallivm: implement implicit primitive flushing

TGSI semantics currently require an implicit endprim at the end
of GS if an ending primitive hasn't been emitted.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years agogallium/llvm: implement geometry shaders in the llvm paths
Zack Rusin [Mon, 18 Feb 2013 12:00:19 +0000 (04:00 -0800)]
gallium/llvm: implement geometry shaders in the llvm paths

This commits implements code generation of the geometry shaders in
the SOA paths. All the code is there but bugs are likely present.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years agodraw/gs: Fetch more than one primitive per invocation
Zack Rusin [Thu, 14 Mar 2013 08:07:28 +0000 (01:07 -0700)]
draw/gs: Fetch more than one primitive per invocation

Allows executing gs on up to 4 primitives at a time. Will also be
required by the llvm code because there we definitely don't want
to flush with just a single primitive.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years agodraw/gs: Abstract the portions of GS that are tgsi specific
Zack Rusin [Thu, 14 Mar 2013 07:42:06 +0000 (00:42 -0700)]
draw/gs: Abstract the portions of GS that are tgsi specific

To be able to add llvm paths later on we need to have some common
interface for them.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years agodraw/llvm: Remove unused gs_constants from jit_context
Zack Rusin [Thu, 14 Mar 2013 00:13:21 +0000 (17:13 -0700)]
draw/llvm: Remove unused gs_constants from jit_context

The member was never used and we'll need to handle it differently
because gs will also need samplers/textures setup.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years agograw/gs: add missing max output vertices to all tests
Zack Rusin [Wed, 13 Mar 2013 23:46:24 +0000 (16:46 -0700)]
graw/gs: add missing max output vertices to all tests

A few tests were missing this crucial property.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years agoradeonsi: add cs tracing v3
Jerome Glisse [Mon, 25 Mar 2013 15:46:38 +0000 (11:46 -0400)]
radeonsi: add cs tracing v3

Same as on r600, trace cs execution by writting cs offset after each
states, this allow to pin point lockup inside command stream and
narrow down the scope of lockup investigation.

v2: Use WRITE_DATA packet instead of WRITE_MEM
v3: Remove useless nop packet

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
11 years agomesa: only check sample count if we actually wanted multisampling
Chris Forbes [Mon, 25 Mar 2013 10:19:07 +0000 (23:19 +1300)]
mesa: only check sample count if we actually wanted multisampling

Fixes various test fallout from 90b5a2425a on Pineview, which claims to
support ARB_internalformat_query but doesn't actually provide the
driverfunc.

That driver is still broken [GetInternalformativ will still segfault!]
but it was silly to be going through the sample count logic in the
nonmultisampling case at all.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoradeon/llvm: document LLVM commit
Christian König [Tue, 26 Mar 2013 14:08:00 +0000 (15:08 +0100)]
radeon/llvm: document LLVM commit

We need at least that revision to work correctly now.

Signed-off-by: Christian König <christian.koenig@amd.com>
11 years agoradeonsi: add preloading for all samplers
Christian König [Sun, 17 Mar 2013 15:02:42 +0000 (16:02 +0100)]
radeonsi: add preloading for all samplers

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
11 years agoradeonsi: add preloading of all constants
Christian König [Fri, 15 Mar 2013 14:53:25 +0000 (15:53 +0100)]
radeonsi: add preloading of all constants

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
11 years agoradeonsi: mark most intrinsics as readnone/nounwind
Christian König [Wed, 20 Mar 2013 11:10:35 +0000 (12:10 +0100)]
radeonsi: mark most intrinsics as readnone/nounwind

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
11 years agoradeonsi: mark all loads as constant
Christian König [Wed, 20 Mar 2013 13:37:21 +0000 (14:37 +0100)]
radeonsi: mark all loads as constant

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
11 years agoradeonsi: remove wqm intrinsic
Christian König [Tue, 19 Mar 2013 12:36:26 +0000 (13:36 +0100)]
radeonsi: remove wqm intrinsic

Now the backend handles that itself.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
11 years agoradeon/llvm: remove uneeded inclusion
Christian König [Tue, 26 Mar 2013 10:37:45 +0000 (11:37 +0100)]
radeon/llvm: remove uneeded inclusion

The include isn't needed and the file has moved with LLVM master.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
11 years agoglsl_to_tgsi: avoid creating arrays if driver doesn't support them
Christian König [Sun, 24 Mar 2013 15:24:52 +0000 (16:24 +0100)]
glsl_to_tgsi: avoid creating arrays if driver doesn't support them

Avoid creating arrays if we replace indirect addressing anyway.

Signed-off-by: Christian König <christian.koenig@amd.com>
11 years agoglsl_to_tgsi: make simplify_cmp work with arrays
Christian König [Mon, 25 Mar 2013 09:57:48 +0000 (10:57 +0100)]
glsl_to_tgsi: make simplify_cmp work with arrays

Even when we have arrays it is possible for simplify_cmp
to work on temps, just not on arrays.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=62696

Signed-off-by: Christian König <christian.koenig@amd.com>
11 years agogallium/docs: document get_driver_query_info
Marek Olšák [Tue, 26 Mar 2013 00:37:40 +0000 (01:37 +0100)]
gallium/docs: document get_driver_query_info

11 years agor600g: add a driver query returning the amount of requested VRAM and GTT memory
Marek Olšák [Fri, 22 Mar 2013 01:39:42 +0000 (02:39 +0100)]
r600g: add a driver query returning the amount of requested VRAM and GTT memory

11 years agor600g: add a driver query returning the number of draw_vbo calls
Marek Olšák [Thu, 21 Mar 2013 18:44:18 +0000 (19:44 +0100)]
r600g: add a driver query returning the number of draw_vbo calls

between begin_query and end_query

11 years agost/dri: integrate the HUD
Marek Olšák [Thu, 21 Mar 2013 18:51:30 +0000 (19:51 +0100)]
st/dri: integrate the HUD

Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agogallium: implement a heads-up display module
Marek Olšák [Thu, 21 Mar 2013 18:47:06 +0000 (19:47 +0100)]
gallium: implement a heads-up display module

Reviewed-by: Brian Paul <brianp@vmware.com>
v2: lots of cosmetic changes

11 years agogallium: add interface for driver queries like performance counters, etc.
Marek Olšák [Thu, 21 Mar 2013 18:32:24 +0000 (19:32 +0100)]
gallium: add interface for driver queries like performance counters, etc.

The pipe query interface is reused. The list of available queries can be
obtained using pipe_screen::get_driver_query_info.

Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agogallium/tgsi: fix valgrind warning
Marek Olšák [Fri, 22 Mar 2013 16:04:15 +0000 (17:04 +0100)]
gallium/tgsi: fix valgrind warning

"Conditional jump or move depends on uninitialised value(s)"

Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agost/mesa: fix crash with blit-based GetTexImage
Marek Olšák [Thu, 21 Mar 2013 23:31:47 +0000 (00:31 +0100)]
st/mesa: fix crash with blit-based GetTexImage

https://bugs.freedesktop.org/show_bug.cgi?id=62573

Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
11 years agocso: add constant buffer save/restore feature for postprocessing
Marek Olšák [Mon, 18 Mar 2013 21:36:21 +0000 (22:36 +0100)]
cso: add constant buffer save/restore feature for postprocessing

Postprocessing is an internal meta op and should restore the states
it changes.

11 years agoradeonsi: fix crash while binding a NULL constant buffer
Marek Olšák [Thu, 21 Mar 2013 18:29:29 +0000 (19:29 +0100)]
radeonsi: fix crash while binding a NULL constant buffer

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
11 years agor600g: fix crash while binding a NULL constant buffer
Marek Olšák [Thu, 21 Mar 2013 18:29:29 +0000 (19:29 +0100)]
r600g: fix crash while binding a NULL constant buffer

11 years agor300g: fix crash while binding a NULL constant buffer
Marek Olšák [Thu, 21 Mar 2013 18:29:29 +0000 (19:29 +0100)]
r300g: fix crash while binding a NULL constant buffer

11 years agor600g: Use virtual address for PIPE_QUERY_SO* in r600_emit_query_end
Martin Andersson [Mon, 25 Mar 2013 22:11:34 +0000 (23:11 +0100)]
r600g: Use virtual address for PIPE_QUERY_SO* in r600_emit_query_end

Virtual address is used for PIPE_QUERY_SO* queries in
r600_emit_query_begin, but not in r600_emit_query_end.

This will trigger a GPU fault when one of those queries is
made and virtual address is enabled.

Note: this is a candidate for the 9.1 branch

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 years agofreedreno: use u_debug for debug env vars
Rob Clark [Mon, 25 Mar 2013 18:57:24 +0000 (14:57 -0400)]
freedreno: use u_debug for debug env vars

Signed-off-by: Rob Clark <robdclark@gmail.com>
11 years agoglsl ir: add as_dereference_record
Jordan Justen [Sun, 10 Mar 2013 01:12:09 +0000 (17:12 -0800)]
glsl ir: add as_dereference_record

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agogallium: undef PACKAGE_* macros to silence warnings
Brian Paul [Mon, 25 Mar 2013 16:24:01 +0000 (10:24 -0600)]
gallium: undef PACKAGE_* macros to silence warnings

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>