platform/upstream/mesa.git
10 years agoi965: Improve dead control flow elimination.
Matt Turner [Tue, 15 Jul 2014 22:29:29 +0000 (15:29 -0700)]
i965: Improve dead control flow elimination.

... to eliminate an ELSE instruction followed immediately by an ENDIF.

instructions in affected programs:     704 -> 700 (-0.57%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
10 years agonvc0/ir: support 2d constbuf indexing
Ilia Mirkin [Sun, 6 Jul 2014 06:06:04 +0000 (02:06 -0400)]
nvc0/ir: support 2d constbuf indexing

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
10 years agogm107/ir: emit LDC subops
Ilia Mirkin [Tue, 15 Jul 2014 00:29:04 +0000 (20:29 -0400)]
gm107/ir: emit LDC subops

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
10 years agogk110/ir: emit load constant subop
Ilia Mirkin [Tue, 15 Jul 2014 00:20:03 +0000 (20:20 -0400)]
gk110/ir: emit load constant subop

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
10 years agomesa/st: add support for interpolate_at_* ops
Ilia Mirkin [Sun, 6 Jul 2014 03:32:06 +0000 (23:32 -0400)]
mesa/st: add support for interpolate_at_* ops

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
10 years agonv50/ir: fix phi/union sources when their def has been merged
Ilia Mirkin [Thu, 17 Jul 2014 04:30:40 +0000 (00:30 -0400)]
nv50/ir: fix phi/union sources when their def has been merged

In a situation where double-register values are used, the phi nodes can
still end up being u32 values. They all get merged into one RA node
though. When fixing up the merge (which comes after the phi node), the
phi node's def would get fixed, but not its sources which would remain
at the low register value.

This maintains the invariant that a phi node's defs and sources are
allocated the same register.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
10 years agonv50/ir: fix hard-coded TYPE_U32 sized register
Ilia Mirkin [Thu, 17 Jul 2014 03:20:57 +0000 (23:20 -0400)]
nv50/ir: fix hard-coded TYPE_U32 sized register

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
10 years agonvc0: mark shader header if fp64 is used
Ilia Mirkin [Fri, 18 Jul 2014 02:31:11 +0000 (22:31 -0400)]
nvc0: mark shader header if fp64 is used

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
10 years agonv50/ir: keep track of whether the program uses fp64
Ilia Mirkin [Fri, 18 Jul 2014 02:30:00 +0000 (22:30 -0400)]
nv50/ir: keep track of whether the program uses fp64

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
10 years agonvc0: make sure that the local memory allocation is aligned to 0x10
Ilia Mirkin [Fri, 18 Jul 2014 02:11:56 +0000 (22:11 -0400)]
nvc0: make sure that the local memory allocation is aligned to 0x10

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
10 years agomesa: add ARB_clear_texture.xml to file list, remove duplicate decls
Ilia Mirkin [Thu, 24 Jul 2014 01:10:51 +0000 (21:10 -0400)]
mesa: add ARB_clear_texture.xml to file list, remove duplicate decls

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
10 years agoilo: check the tilings of imported handles
Chia-I Wu [Thu, 24 Jul 2014 05:21:41 +0000 (13:21 +0800)]
ilo: check the tilings of imported handles

Just to be cautious.

10 years agoilo: clean up resource bo renaming
Chia-I Wu [Thu, 24 Jul 2014 03:10:48 +0000 (11:10 +0800)]
ilo: clean up resource bo renaming

s/alloc_bo/rename_bo/ as that is what the functions do.  Simplify bo
allocation and move the complexity to bo renaming.

10 years agoilo: share some code between {tex,buf}_create_bo
Chia-I Wu [Thu, 24 Jul 2014 02:32:31 +0000 (10:32 +0800)]
ilo: share some code between {tex,buf}_create_bo

Add resource_get_bo_name() and resource_get_bo_initial_domain() for use by
both functions.

10 years agoilo: use native 3-component vertex formats on GEN7.5+
Chia-I Wu [Thu, 24 Jul 2014 01:39:37 +0000 (09:39 +0800)]
ilo: use native 3-component vertex formats on GEN7.5+

GEN7.5 gains support for those formats natively.

10 years agoilo: allow for device-dependent format translation
Chia-I Wu [Thu, 24 Jul 2014 01:32:34 +0000 (09:32 +0800)]
ilo: allow for device-dependent format translation

Pass ilo_dev_info to all format translation functions.

10 years agoi965: Accelerate uploads of RGBA and BGRA GL_UNSIGNED_INT_8_8_8_8_REV textures
Jason Ekstrand [Sat, 19 Jul 2014 01:23:30 +0000 (18:23 -0700)]
i965: Accelerate uploads of RGBA and BGRA GL_UNSIGNED_INT_8_8_8_8_REV textures

Since intel is always going to be little-endian,
GL_UNSIGNED_INT_8_8_8_8_REV is the same as GL_UNSIGNED_BYTE for RGBA and
BGRA textures, so the same acceleration code will work.  We might as well
use it.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
10 years agomesa: Fix the name in the error message
Ian Romanick [Wed, 16 Jul 2014 17:52:32 +0000 (10:52 -0700)]
mesa: Fix the name in the error message

Obvious copy-and-paste bug.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
10 years agoglsl: Fix some bad indentation
Ian Romanick [Wed, 16 Jul 2014 20:02:26 +0000 (13:02 -0700)]
glsl: Fix some bad indentation

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
10 years agoi965/fs: Set LastRT on the final FB write on Broadwell.
Kenneth Graunke [Mon, 21 Jul 2014 23:17:46 +0000 (16:17 -0700)]
i965/fs: Set LastRT on the final FB write on Broadwell.

In Piglit's EXT_framebuffer_multisample/alpha-to-coverage-dual-src-blend
test, key->nr_color_regions == 2, but the dual source blend FB write has
ir->target set to 0.  So we failed to set "Last Render Target Select" on
any FB write message.

We only emit one FB write per render target, so my comment about setting
LastRT on every FB write directed at the last color region is a bit...
misinformed.  According to the documentation, depth buffer writes and
scoreboard updates happen on the FB write with LastRT set, so I believe
we want to set it only once.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
10 years agoi965: Port INTEL_DEBUG=optimizer to the vec4 backend.
Kenneth Graunke [Tue, 22 Jul 2014 03:06:23 +0000 (20:06 -0700)]
i965: Port INTEL_DEBUG=optimizer to the vec4 backend.

Largely via copy and paste.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
10 years agoi965: Save the gl_shader_stage enum in backend_visitor.
Kenneth Graunke [Tue, 22 Jul 2014 03:05:21 +0000 (20:05 -0700)]
i965: Save the gl_shader_stage enum in backend_visitor.

This will be useful for INTEL_DEBUG=optimizer in the vec4 backend, which
needs to know whether it's currently processing a VS or GS.  It isn't
worth adding virtual methods for this case.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
10 years agoi965: Don't print WE_normal in disassembly.
Kenneth Graunke [Thu, 17 Jul 2014 23:41:44 +0000 (16:41 -0700)]
i965: Don't print WE_normal in disassembly.

Dropping this helps most lines fit in an 80 column terminal.  The
absence of WE_normal also helps call attention to WE_all, where
something unusual is going on.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
10 years agofreedreno/a3xx/compiler: fix p0 (kill, etc)
Rob Clark [Wed, 23 Jul 2014 19:08:40 +0000 (15:08 -0400)]
freedreno/a3xx/compiler: fix p0 (kill, etc)

Don't assert (debug builds) or assign random uninitialized value for
predicate register (p0).. that screws up kill, etc.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
10 years agoRevert "r600g/compute: Fix warnings"
Tom Stellard [Wed, 23 Jul 2014 15:52:05 +0000 (11:52 -0400)]
Revert "r600g/compute: Fix warnings"

This reverts commit 467f1585e28adba0e94ef593de131bc327f098bb.

This breaks the build on some systems.

10 years agoradeon/llvm: fix formatting
Grigori Goronzy [Thu, 17 Jul 2014 16:44:26 +0000 (18:44 +0200)]
radeon/llvm: fix formatting

Use K&R and same indent as most other code. No functional change
intended.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
10 years agoradeon/llvm: enable unsafe math for graphics shaders
Grigori Goronzy [Thu, 17 Jul 2014 16:44:25 +0000 (18:44 +0200)]
radeon/llvm: enable unsafe math for graphics shaders

Accuracy of some operations was recently improved in the R600 backend,
at the cost of slower code. This is required for compute shaders,
but not for graphics shaders. Add unsafe-fp-math hint to make LLVM
generate faster but possibly less accurate code.

Piglit didn't indicate any regressions.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
10 years agor600g/compute: Fix warnings
Tom Stellard [Wed, 23 Jul 2014 14:26:16 +0000 (10:26 -0400)]
r600g/compute: Fix warnings

10 years agor600g: Use hardware sqrt instruction
Glenn Kennard [Fri, 18 Jul 2014 07:54:37 +0000 (09:54 +0200)]
r600g: Use hardware sqrt instruction

Piglit quick tests including sqrt pass, no other regressions,
tested on radeon 6670.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
10 years agor600g/compute: Remove unneeded code from compute_memory_promote_item
Bruno Jiménez [Wed, 16 Jul 2014 21:12:47 +0000 (23:12 +0200)]
r600g/compute: Remove unneeded code from compute_memory_promote_item

Now that we know that the pool is defragmented, we positively know
that allocated + unallocated will be the total size of the
current pool plus all the items that will be promoted. So we only
need to grow the pool once.

This will allow us to just add the new items to the end of the
item_list without the need of looking for a place to the new item.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
10 years agor600g/compute: Quick exit if there's nothing to add to the pool
Bruno Jiménez [Wed, 16 Jul 2014 21:12:46 +0000 (23:12 +0200)]
r600g/compute: Quick exit if there's nothing to add to the pool

This way we can avoid defragmenting the pool, even if it is needed
to defragment it, and looping again through the list of unallocated
items.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
10 years agor600g/compute: Defrag the pool if it's necesary
Bruno Jiménez [Wed, 16 Jul 2014 21:12:45 +0000 (23:12 +0200)]
r600g/compute: Defrag the pool if it's necesary

This patch adds a new member to the pool to track its status.
For now it is used only for the 'fragmented' status, but if
needed it could be used for more statuses.

The pool will be considered fragmented if: An item that isn't
the last is freed or demoted.

This 'strategy' has a problem, although it shouldn't cause any bug.
If for example we have two items, A and B. We choose to free A first,
now the pool will have the 'fragmented' status. If we now free B,
the pool will retain its 'fragmented' status even if it isn't
fragmented.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
10 years agor600g/compute: Add a function for defragmenting the pool
Bruno Jiménez [Wed, 16 Jul 2014 21:12:44 +0000 (23:12 +0200)]
r600g/compute: Add a function for defragmenting the pool

This new function will move items forward in the pool, so that
there's no gap between them, effectively defragmenting the pool.

For now this function is a bit dumb as it just moves items
forward without trying to see if other items in the pool could
fit in the gaps.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
10 years agor600g/compute: Add a function for moving items in the pool
Bruno Jiménez [Wed, 16 Jul 2014 21:12:43 +0000 (23:12 +0200)]
r600g/compute: Add a function for moving items in the pool

This function will be used in the future by compute_memory_defrag
to move items forward in the pool.

It does so by first checking for overlaping ranges, if the ranges
don't overlap it will copy the contents directly. If they overlap
it will try first to make a temporary buffer, if this buffer fails
to allocate, it will finally fall back to a mapping.

Note that it will only be needed to move items forward, it only
checks for overlapping ranges in that case. If needed, it can
easily be added by changing the first if.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
10 years agofreedreno/a3xx: more vtx formats
Rob Clark [Mon, 21 Jul 2014 14:41:49 +0000 (10:41 -0400)]
freedreno/a3xx: more vtx formats

Actually what we currently handle is just the SCALED versions, and not
the int versions.  The difference probably matters more when we actually
support integer in the compiler.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
10 years agofreedreno/a3xx/compiler: const file relative addressing
Rob Clark [Mon, 21 Jul 2014 19:24:30 +0000 (15:24 -0400)]
freedreno/a3xx/compiler: const file relative addressing

Teach new compiler scheduling and register assignment how to deal with
relative addressing.  This gets us what we need to avoid falling back to
old compiler for CONST[ADDR[0].x+n].  It is also a prerequisite for temp
file relative addressing, although that is going to also need some
cleverness in register assignment to keep arrays grouped together.

NOTE: doing address calculation in full precision and then narrowing to
s16 in the mov to addr reg seems to sometimes cause lockups (and
sometimes work?!).  It seems more reliable to do the address calculation
in s16, like the blob does.  Which means teaching RA how to deal with
mixed half and full precision allocation.  Fortunately that didn't turn
out to be too hard, so that is a nice bonus which we could probably take
better advantage of elsewhere.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
10 years agofreedreno/a3xx/compiler: move function
Rob Clark [Mon, 21 Jul 2014 18:16:44 +0000 (14:16 -0400)]
freedreno/a3xx/compiler: move function

Signed-off-by: Rob Clark <robclark@freedesktop.org>
10 years agofreedreno/a3xx: add back a few stalls
Rob Clark [Sun, 20 Jul 2014 15:26:56 +0000 (11:26 -0400)]
freedreno/a3xx: add back a few stalls

Technically we should not need these.  CP_LOAD_STATE can be pipelined.
But removing them broke a few piglit tests, like fbo-depth-
GL_DEPTH_COMPONENT24-readpixels.  I expect these are just masking a
problem elsewhere, or perhaps they are only needed under some more
specific circumstances.  But until that is understood properly, give
back a bit of the perf boost we got from c63450e8.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
10 years agotargets/dri: fix freedreno targets
Rob Clark [Mon, 21 Jul 2014 14:43:30 +0000 (10:43 -0400)]
targets/dri: fix freedreno targets

The kernel driver name is either "kgsl" (downstream/android) or "msm"
(upstream).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
10 years agofreedreno: update generated headers
Rob Clark [Sat, 19 Jul 2014 17:22:10 +0000 (13:22 -0400)]
freedreno: update generated headers

Signed-off-by: Rob Clark <robclark@freedesktop.org>
10 years agodocs: Update GL3.txt and relnotes for GL_ARB_clear_texture
Neil Roberts [Wed, 23 Jul 2014 11:10:37 +0000 (12:10 +0100)]
docs: Update GL3.txt and relnotes for GL_ARB_clear_texture

10 years agometa: Add a meta implementation of GL_ARB_clear_texture
Neil Roberts [Tue, 10 Jun 2014 15:21:21 +0000 (16:21 +0100)]
meta: Add a meta implementation of GL_ARB_clear_texture

Adds an implementation of the ClearTexSubImage driver entry point that tries
to set up an FBO to render to the texture and then calls glClearBuffer with a
scissor to perform the actual clear. If an FBO can't be created for the
texture then it will fall back to using _mesa_store_ClearTexSubImage.

When used in combination with _mesa_store_ClearTexSubImage this should provide
an implementation that works for all DRI-based drivers. However as this has
only been tested with the i965 driver it is currently only enabled there.

v2: Only enable the extension for the i965 driver instead of all DRI drivers.
    Remove an unnecessary goto. Don't require GL_ARB_framebuffer_object. Add
    some more comments.

v3: Use glClearBuffer* to avoid having to modify glClearColor and friends.
    Handle sRGB textures. Explicitly disable dithering.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
10 years agometa: Add a state flag for the GL_DITHER
Neil Roberts [Fri, 4 Jul 2014 14:37:28 +0000 (15:37 +0100)]
meta: Add a state flag for the GL_DITHER

The Meta implementation of glClearTexSubImage is going to want to ensure that
dithering is disabled so that it can get a consistent color across the whole
texture when clearing. This adds a state flag to easily save it and set it to
the default value when performing meta operations.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
10 years agotexstore: Add a generic implementation of GL_ARB_clear_texture
Neil Roberts [Tue, 10 Jun 2014 15:19:58 +0000 (16:19 +0100)]
texstore: Add a generic implementation of GL_ARB_clear_texture

Adds an implmentation of the ClearTexSubImage driver entry point that just
maps the texture and writes the values in. The extension is not yet enabled by
default because it doesn't work with multisample textures as they don't have a
simple linear layout.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
10 years agomesa/main: Add generic bits of ARB_clear_texture implementation
Neil Roberts [Tue, 10 Jun 2014 15:11:00 +0000 (16:11 +0100)]
mesa/main: Add generic bits of ARB_clear_texture implementation

This adds the driver entry point for glClearTexSubImage and fills in the
_mesa_ClearTexImage and _mesa_ClearTexSubImage functions that call it.

v2: Don't clear some of the images if only one of them makes an error

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
10 years agoteximage: Add utility func for format/internalFormat compatibility check
Neil Roberts [Fri, 13 Jun 2014 16:28:48 +0000 (17:28 +0100)]
teximage: Add utility func for format/internalFormat compatibility check

In texture_error_check() there was a snippet of code to check whether the
given format and internal format are basically compatible. This has been split
out into its own static helper function so that it can be used by an
implementation of glClearTexImage too.

10 years agomesa/main: add ARB_clear_texture entrypoints
Ilia Mirkin [Sat, 1 Mar 2014 21:46:53 +0000 (16:46 -0500)]
mesa/main: add ARB_clear_texture entrypoints

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
10 years agor600g/radeonsi: Use write-combined CPU mappings of some BOs in GTT
Michel Dänzer [Thu, 19 Jun 2014 01:40:38 +0000 (10:40 +0900)]
r600g/radeonsi: Use write-combined CPU mappings of some BOs in GTT

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
10 years agowinsys/radeon: Use separate caching buffer managers for VRAM and GTT
Michel Dänzer [Fri, 13 Jun 2014 08:48:57 +0000 (17:48 +0900)]
winsys/radeon: Use separate caching buffer managers for VRAM and GTT

Should reduce overhead because the caching buffer manager doesn't need to
consider buffers of the wrong type.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
10 years agodocs/GL3.txt: update status for ARB_compute_shader
Dave Airlie [Wed, 23 Jul 2014 01:06:15 +0000 (11:06 +1000)]
docs/GL3.txt: update status for ARB_compute_shader

since some bits are done in tree, but nobody is working on it anymore.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
10 years agomesa: Don't use memcpy() in _mesa_texstore() for float depth texture data
Anuj Phogat [Mon, 21 Jul 2014 23:58:42 +0000 (16:58 -0700)]
mesa: Don't use memcpy() in _mesa_texstore() for float depth texture data

because float depth texture data needs clamping to [0.0, 1.0]. Let the
_mesa_texstore() fallback to slower path.

Fixes Khronos GLES3 CTS tests:
shadow_execution_vert
shadow_execution_frag

V2: Move the check to _mesa_texstore_can_use_memcpy() function.
    Add check for floating point data types.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
10 years agoi965/fs: Fix gl_SampleMask handling for SIMD16 on Gen8+.
Kenneth Graunke [Fri, 18 Jul 2014 20:19:46 +0000 (13:19 -0700)]
i965/fs: Fix gl_SampleMask handling for SIMD16 on Gen8+.

We actually want to use mov(16), not mov(8).

Fixes 7 Piglit tests: ARB_sample_shading/builtin-gl-sample-mask [2468]
and ARB_sample_shading/builtin-gl-sample-mask-simple [468].

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
10 years agoi965/fs: Fix gl_SampleID for 2x MSAA and SIMD16 mode.
Kenneth Graunke [Fri, 18 Jul 2014 20:19:45 +0000 (13:19 -0700)]
i965/fs: Fix gl_SampleID for 2x MSAA and SIMD16 mode.

We might be able to do this without an extra program key field, but this
is non-invasive and fixes the bug, for now.

This fixes the following Piglit tests on Broadwell:
- ARB_sample_shading/builtin-gl-sample-id 2
- ARB_sample_shading/builtin-gl-sample-position 2
- EXT_framebuffer_multisample/multisample-blit 2 color
- EXT_framebuffer_multisample/multisample-blit 2 color linear
- EXT_framebuffer_multisample/multisample-blit 2 depth
- EXT_framebuffer_multisample/no-color 2 depth combined
- EXT_framebuffer_multisample/no-color 2 depth separate
- EXT_framebuffer_multisample/no-color 2 depth single
- EXT_framebuffer_multisample/no-color 2 depth-computed combined
- EXT_framebuffer_multisample/no-color 2 depth-computed separate
- EXT_framebuffer_multisample/no-color 2 depth-computed single
- EXT_framebuffer_multisample/unaligned-blit 2 color msaa
- EXT_framebuffer_multisample/unaligned-blit 2 depth msaa

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
10 years agoi965: Add missing persample_shading field to brw_wm_debug_recompile.
Kenneth Graunke [Thu, 17 Jul 2014 18:18:35 +0000 (11:18 -0700)]
i965: Add missing persample_shading field to brw_wm_debug_recompile.

Otherwise, the performance warning for shader recompiles will just say
"something else".

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
10 years agoi965/disasm: Don't disassemble the URB complete field on Broadwell.
Kenneth Graunke [Thu, 17 Jul 2014 22:55:05 +0000 (15:55 -0700)]
i965/disasm: Don't disassemble the URB complete field on Broadwell.

It doesn't exist, so attempting to read it will trigger generation
assertions in the brw_inst API.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
10 years agoi965: Disable hex offset printing in disassembly.
Kenneth Graunke [Thu, 17 Jul 2014 23:29:41 +0000 (16:29 -0700)]
i965: Disable hex offset printing in disassembly.

Printing the hex offsets makes it basically impossible to diff assembly:
if you add even a single instruction, the entire shader shows up as a
difference.  So, every time I want to compare assembly, I have to strip
this out.

The hex offsets might be useful when debugging compaction, or when
inspecting the program cache buffer.  Since it's occasionally useful,
but uncommon, this patch disables it by default, but makes it easy to
re-enable it temporarily when the need arises.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
10 years agoi965/vec4: Use foreach_inst_in_block a couple more places.
Matt Turner [Sat, 12 Jul 2014 18:21:21 +0000 (11:21 -0700)]
i965/vec4: Use foreach_inst_in_block a couple more places.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
10 years agoi965: Replace cfg instances with calls to calculate_cfg().
Matt Turner [Sat, 12 Jul 2014 04:24:02 +0000 (21:24 -0700)]
i965: Replace cfg instances with calls to calculate_cfg().

Avoids regenerating it unnecessarily.

Every program in shader-db improved, none by an amount less than a 1/3
reduction. One Dota2 shader decreased from 62 -> 24.

cfg calculations:     429492 -> 193197 (-55.02%)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
10 years agoi965/cfg: Add a foreach_block_and_inst macro.
Matt Turner [Sat, 12 Jul 2014 04:17:01 +0000 (21:17 -0700)]
i965/cfg: Add a foreach_block_and_inst macro.

Will let us abstract how the instructions are stored.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
10 years agoi965: Add cfg to backend_visitor.
Matt Turner [Sat, 12 Jul 2014 03:54:52 +0000 (20:54 -0700)]
i965: Add cfg to backend_visitor.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
10 years agoradeonsi/compute: Add support scratch buffer support v2
Tom Stellard [Fri, 18 Jul 2014 18:45:18 +0000 (14:45 -0400)]
radeonsi/compute: Add support scratch buffer support v2

The scratch buffer will be used for private memory and also register
spilling.

v2:
  - Code cleanups

10 years agoradeonsi/compute: Bump number of user sgprs for LLVM 3.5
Tom Stellard [Fri, 18 Jul 2014 18:40:50 +0000 (14:40 -0400)]
radeonsi/compute: Bump number of user sgprs for LLVM 3.5

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
10 years agowinsys/radeon: Query the kernel for the number of SEs and SHs per SE
Tom Stellard [Fri, 18 Jul 2014 17:19:43 +0000 (13:19 -0400)]
winsys/radeon:  Query the kernel for the number of SEs and SHs per SE

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
10 years agoradeonsi/compute: Share COMPUTE_DBG macro with r600g
Tom Stellard [Fri, 18 Jul 2014 16:25:29 +0000 (12:25 -0400)]
radeonsi/compute: Share COMPUTE_DBG macro with r600g

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
10 years agoradeonsi: Read rodata from ELF and append it to the end of shaders
Tom Stellard [Mon, 14 Jul 2014 20:49:08 +0000 (16:49 -0400)]
radeonsi: Read rodata from ELF and append it to the end of shaders

The is used for programs that have arrays of constants that
are accessed using dynamic indices.  The shader will compute
the base address of the constants and then access them using
SMRD instructions.

10 years agoglsl: Fix bad indentation
Ian Romanick [Thu, 26 Jun 2014 00:24:13 +0000 (17:24 -0700)]
glsl: Fix bad indentation

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
10 years agoi965: Silence unused parameter warning
Ian Romanick [Wed, 2 Jul 2014 18:05:05 +0000 (11:05 -0700)]
i965: Silence unused parameter warning

brw_fs_visitor.cpp:2400:1: warning: unused parameter 'ir' [-Wunused-parameter]

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
10 years agoi965: Silence 'comparison is always true' warning
Ian Romanick [Wed, 2 Jul 2014 17:47:54 +0000 (10:47 -0700)]
i965: Silence 'comparison is always true' warning

The parameter is an int16_t, and we're check that it's value will fit in
16-bits.  Yes, the value that is stored in 16-bits will surely fit in
16-bits.

brw_inst.h: In function 'brw_inst_set_gen6_jump_count':
brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits]
brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits]

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
10 years agoi965: Silence many unused parameter warnings
Ian Romanick [Wed, 2 Jul 2014 17:30:19 +0000 (10:30 -0700)]
i965: Silence many unused parameter warnings

brw_inst.h: In function 'brw_inst_set_src1_vstride':
brw_inst.h:118:76: warning: unused parameter 'brw' [-Wunused-parameter]

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
10 years agoconfigure.ac: Add LLVM patch version to error message.
Vinson Lee [Fri, 18 Jul 2014 19:12:37 +0000 (12:12 -0700)]
configure.ac: Add LLVM patch version to error message.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
10 years agomain/format_pack: Fix a wrong datatype in pack_ubyte_R8G8_UNORM
Jason Ekstrand [Thu, 17 Jul 2014 05:19:49 +0000 (22:19 -0700)]
main/format_pack: Fix a wrong datatype in pack_ubyte_R8G8_UNORM

Before it was only storing one of the color components due to truncation.
With this patch it now properly stores all of them.

Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
10 years agodocs: Import 10.2.4 release notes
Carl Worth [Fri, 18 Jul 2014 23:50:05 +0000 (16:50 -0700)]
docs: Import 10.2.4 release notes

And add a news item.

10 years agoAdd support for RGBA8 and RGBX8 textures in intel_texsubimage_tiled_memcpy
Jason Ekstrand [Thu, 17 Jul 2014 21:40:23 +0000 (14:40 -0700)]
Add support for RGBA8 and RGBX8 textures in intel_texsubimage_tiled_memcpy

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
10 years agoi965: Improve debug output in intelTexImage and intelTexSubimage
Jason Ekstrand [Thu, 17 Jul 2014 21:38:57 +0000 (14:38 -0700)]
i965: Improve debug output in intelTexImage and intelTexSubimage

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
10 years agoradeonsi: only update vertex buffers when they need updating
Marek Olšák [Fri, 11 Jul 2014 21:17:07 +0000 (23:17 +0200)]
radeonsi: only update vertex buffers when they need updating

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agoradeonsi: remove nr_vertex_buffers
Marek Olšák [Wed, 9 Jul 2014 12:57:18 +0000 (14:57 +0200)]
radeonsi: remove nr_vertex_buffers

Unused.

Also inline util_set_vertex_buffers_count and simplify it.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agoradeonsi: move vertex buffer descriptors from IB to memory
Marek Olšák [Wed, 9 Jul 2014 02:00:53 +0000 (04:00 +0200)]
radeonsi: move vertex buffer descriptors from IB to memory

This removes the intermediate storage (pm4 state) and generates descriptors
directly in a staging buffer.

It also reduces the number of flushes, because the descriptors no longer
take CS space.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agoradeonsi: add support for fine-grained sampler view updates
Marek Olšák [Wed, 18 Jun 2014 01:23:46 +0000 (03:23 +0200)]
radeonsi: add support for fine-grained sampler view updates

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agoradeonsi: move si_set_sampler_views to si_descriptors.c
Marek Olšák [Wed, 18 Jun 2014 01:08:01 +0000 (03:08 +0200)]
radeonsi: move si_set_sampler_views to si_descriptors.c

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agoradeonsi: move sampler descriptors from IB to memory
Marek Olšák [Wed, 18 Jun 2014 00:46:49 +0000 (02:46 +0200)]
radeonsi: move sampler descriptors from IB to memory

Sampler descriptors are now represented by si_descriptors.
This also adds support for fine-grained sampler state updates and
the border color update is now isolated in a separate function.

Border colors have been broken if texturing from multiple shader stages is
used. This patch doesn't change that.

BTW, blitting already makes use of fine-grained state updates.
u_blitter uses 2 textures at most, so we only have to save 2.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agoradeonsi: implement ARB_draw_indirect
Marek Olšák [Thu, 24 Apr 2014 01:03:43 +0000 (03:03 +0200)]
radeonsi: implement ARB_draw_indirect

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agoradeonsi: don't add info->start to the index buffer offset
Marek Olšák [Thu, 24 Apr 2014 14:13:54 +0000 (16:13 +0200)]
radeonsi: don't add info->start to the index buffer offset

info->start will be invalid once info->indirect isn't NULL, so it shouldn't
be added to ib.offset.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agoradeonsi: use an SGPR instead of VGT_INDX_OFFSET
Marek Olšák [Wed, 23 Apr 2014 14:15:36 +0000 (16:15 +0200)]
radeonsi: use an SGPR instead of VGT_INDX_OFFSET

The draw indirect packets cannot set VGT_INDX_OFFSET, they can only set user
data SGPRs. This is the only way to support start/index_bias with indirect
drawing.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agoradeonsi: assume LLVM 3.4.2 is always present
Marek Olšák [Tue, 8 Jul 2014 00:50:57 +0000 (02:50 +0200)]
radeonsi: assume LLVM 3.4.2 is always present

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agoconfigure.ac: require LLVM 3.4.2 for radeon
Marek Olšák [Tue, 8 Jul 2014 00:41:13 +0000 (02:41 +0200)]
configure.ac: require LLVM 3.4.2 for radeon

Needed by ARB_draw_indirect.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agost/mesa,gallium: add a workaround for Unigine Heaven 4.0 and Valley 1.0
Marek Olšák [Tue, 8 Jul 2014 18:24:55 +0000 (20:24 +0200)]
st/mesa,gallium: add a workaround for Unigine Heaven 4.0 and Valley 1.0

Most (all?) Unigine shaders fail to compile without this if sample shading
is advertised. This is, of course, Unigine developers' fault.

Reviewed-by: Brian Paul <brianp@vmware.com>
10 years agoglsl: add a mechanism to allow #extension directives in the middle of shaders
Marek Olšák [Tue, 8 Jul 2014 18:20:22 +0000 (20:20 +0200)]
glsl: add a mechanism to allow #extension directives in the middle of shaders

This is needed to make Unigine Heaven 4.0 and Unigine Valley 1.0 work
with sample shading.

Also, if this is disabled, the error message at least makes sense now.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
10 years agor600g: Implement GL_ARB_texture_gather
Glenn Kennard [Wed, 16 Jul 2014 14:31:26 +0000 (16:31 +0200)]
r600g: Implement GL_ARB_texture_gather

Only supported on evergreen and later. Currently limited
to single component textures as the hardware GATHER4
instruction ignores texture swizzles.

Piglit quick run passes on radeon 6670 with all
applicable textureGather tests, no regressions.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
10 years agoi965: Fix z_offset computation in intel_miptree_unmap_depthstencil()
Anuj Phogat [Mon, 14 Jul 2014 23:16:47 +0000 (16:16 -0700)]
i965: Fix z_offset computation in intel_miptree_unmap_depthstencil()

The bug is triggered by using glTexSubImage2d() with GL_DEPTH_STENCIL
as base internal format and non-zero x, y offsets. Currently x, y
offsets are ignored while updating the texture image.

Fixes Khronos GLES3 CTS tests:
npot_tex_sub_image_2d
npot_tex_sub_image_3d
npot_pbo_tex_sub_image_2d
npot_pbo_tex_sub_image_2d

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
10 years agoRevert "i965: Extend compute-to-mrf pass to understand blocks of MOVs"
Anuj Phogat [Tue, 15 Jul 2014 19:56:37 +0000 (12:56 -0700)]
Revert "i965: Extend compute-to-mrf pass to understand blocks of MOVs"

This reverts commit bbefb15e01e1c16af69646898918982ae00f8c92.
Fixes the 11 regressions caused in framebuffer_blit tests in
Khronos GLES3 CTS tests:

Original patch reduced the instruction count but had no performance
benefits. So, it's safe to revert it without causing any performance
regressions.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
10 years agoi915: Fix up intelInitScreen2 for DRI3
Adel Gadllah [Thu, 3 Jul 2014 20:13:53 +0000 (22:13 +0200)]
i915: Fix up intelInitScreen2 for DRI3

Commit 442442026eb updated both i915 and i965 for DRI3 support,
but one check in intelInitScreen2 was missed for i915 causing crashes
when trying to use i915 with DRI3.

So fix that up.

Reported-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
References: https://bugzilla.redhat.com/show_bug.cgi?id=1115323
References: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=754297
Tested-by: František Zatloukal <Zatloukal.Frantisek@gmail.com>
Tested-by: Dirk Griesbach <spamthis@freenet.de>
Signed-off-by: Adel Gadllah <adel.gadllah@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
10 years agomesa: Fix regression introduced by commit "mesa: fix packing of float texels to GL_SH...
Pavel Popov [Mon, 30 Jun 2014 15:21:56 +0000 (22:21 +0700)]
mesa: Fix regression introduced by commit "mesa: fix packing of float texels to GL_SHORT/GL_BYTE".

This commit "mesa: fix packing of float texels to GL_SHORT/GL_BYTE" replaced *_TO_BYTE to *_TO_BYTE_TEX because *_TO_FLOAT_TEX are used to unpack the texels to floats.
In this case *_TO_FLOATZ in function extract_float_rgba also should be replaced to *_TO_FLOAT_TEX. Underline that these macros automatically preserve zero when converting.

The regression was observed on 3 oglconform tests:
    snorm-textures basic.getTexImage
    snorm-textures advanced.mipmap.manual.getTex
    snorm-textures advanced.mipmap.upload.getTex

Signed-off-by: Pavel Popov <pavel.e.popov@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
10 years agonv50: fix build failure on m68k due to invalid struct alignment assumptions
Thorsten Glaser [Wed, 30 Oct 2013 17:04:07 +0000 (18:04 +0100)]
nv50: fix build failure on m68k due to invalid struct alignment assumptions

Make alignment assumptions explicit by inserting correct padding with
unknown struct members.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
10 years agoclover: Call end_query before getting timestamp result v2
Tom Stellard [Wed, 16 Jul 2014 20:14:07 +0000 (16:14 -0400)]
clover: Call end_query before getting timestamp result v2

v2:
  - Move the end_query() call into the timestamp constructor.
  - Still pass false as the wait parameter to get_query_result().

Reviewed-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
CC: "10.2" <mesa-stable@lists.freedesktop.org>
10 years agoglsl: handle a switch where default is in the middle of cases
Tapani Pälli [Mon, 14 Jul 2014 06:45:46 +0000 (09:45 +0300)]
glsl: handle a switch where default is in the middle of cases

This fixes following tests in es3conform:

   shaders.switch.default_not_last_dynamic_vertex
   shaders.switch.default_not_last_dynamic_fragment

and makes following tests in Piglit pass:

   glsl-1.30/execution/switch/fs-default-notlast-fallthrough
   glsl-1.30/execution/switch/fs-default_notlast

No Piglit regressions.

v2: take away unnecessary ir_if, just use conditional assignment
v3: use foreach_in_list instead of foreach_list

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v2)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v3)
10 years agoglsl: Make the tree rebalancer use vector_elements, not components().
Kenneth Graunke [Tue, 15 Jul 2014 23:36:32 +0000 (16:36 -0700)]
glsl: Make the tree rebalancer use vector_elements, not components().

components() includes matrix columns, so if this code encountered a
matrix, it would ask for something like a vec9 or vec16.  This is
clearly not what you want.

Earlier code now prevents this from seeing matrices, but we should still
use vector_elements, for clarity.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
10 years agoglsl: Guard against error_type in the tree rebalancer.
Kenneth Graunke [Tue, 15 Jul 2014 23:35:55 +0000 (16:35 -0700)]
glsl: Guard against error_type in the tree rebalancer.

This helped me track down the bug fixed in the previous commit.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
10 years agoglsl: Make the tree rebalancer bail on matrix operands.
Kenneth Graunke [Tue, 15 Jul 2014 23:34:56 +0000 (16:34 -0700)]
glsl: Make the tree rebalancer bail on matrix operands.

It doesn't handle things like (vector * matrix) correctly, and
apparently Matt's intention was to bail.

Fixes shader compilation in Natural Selection 2.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
10 years agoRevert "i965: Implement GL_PRIMITIVES_GENERATED with non-zero streams."
Kenneth Graunke [Wed, 16 Jul 2014 18:35:59 +0000 (11:35 -0700)]
Revert "i965: Implement GL_PRIMITIVES_GENERATED with non-zero streams."

This reverts commit 3178d2474ae5bdd1102fb3d76a60d1d63c961ff5.

This caused GPU hangs on Ivybridge for some users and huge (80%)
performance regressions across the board on multiple platforms.

We need to find a better solution.  I've made several attempts, but none
of them have worked yet.  In the meantime, we should revert this.

Reverting it breaks GL_PRIMITIVES_GENERATED for non-zero streams, but
that's okay, since we don't expose GL_ARB_gpu_shader5 yet.

Fixes Piglit's EXT_transform_feedback/generatemipmap prims_generated
test case on Haswell.

10 years agoilo: add some missing formats
Chia-I Wu [Wed, 16 Jul 2014 05:51:49 +0000 (13:51 +0800)]
ilo: add some missing formats

Map more pipe formats to hardware formats.  Enable more VB formats on Haswell.