Jon Turney [Thu, 1 Feb 2018 15:19:08 +0000 (15:19 +0000)]
travis: conditionalize building of prerequisites on if OS=linux
Use a '|' YAML literal block to avoid the convoluted syntax needed to put
the entire conditional on a single line.
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Jon Turney [Thu, 25 Jan 2018 17:34:54 +0000 (17:34 +0000)]
glx/test: fix building for osx
An additional stub for applegl_create_context() is needed
Cannot test indirect API as it's not built on osx, currently
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Andres Gomez [Thu, 1 Feb 2018 15:15:14 +0000 (17:15 +0200)]
i965: check if upload is 0 explicitely, when downsizing a format
downsize_format_if_needed takes an integer as number of uploads
parameter. Hence, let's do an integer comparation instead of a boolean
check, since that is confusing.
Since we are at it, fix a couple of wrongly tabbed indents.
Cc: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Marek Olšák [Sun, 7 Jan 2018 17:27:40 +0000 (18:27 +0100)]
mesa: don't flag _NEW_COLOR for KHR adv.blend if prog constant doesn't change
This only affects drivers that set DriverFlags.NewBlend.
v2: - fix typo advanded -> advanced
- return "enum gl_advanced_blend_mode" from
_mesa_get_advanced_blend_sh_constant
- don't call FLUSH_VERTICES twice
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Samuel Pitoiset [Thu, 1 Feb 2018 15:37:15 +0000 (16:37 +0100)]
ac/nir: replace SI.buffer.load.dword with amdgcn.buffer.load
The old one generates useless instructions in there, found while
comparing geometry shaders between RadeonSI and RADV.
This improves all Vulkan demos that use geometry shaders, +4%
for deferredshadows, +9% for viewportarray, +7% for
geometryshader on Polaris10.
This seems to also improve DOW3 a little bit (+1%).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Dave Airlie [Tue, 30 Jan 2018 02:21:59 +0000 (12:21 +1000)]
r600/eg: add crap indirect compute support.
I think the cp packets can be made work, but I think it might
need a kernel change, so for now just do the worst thing.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Thu, 1 Feb 2018 01:31:39 +0000 (17:31 -0800)]
i965: Call prepare_external after implicit window-system MSAA resolves
This fixes some rendering corruption in a couple of Android apps that
use window-system MSAA.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104741
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Roland Scheidegger [Tue, 30 Jan 2018 04:48:27 +0000 (05:48 +0100)]
r600: don't do stack workarounds for hemlock
By the looks of it it seems hemlock is treated separately to cypress, but
certainly it won't need the stack workarounds cedar/redwood (and
seemingly every other eg chip except cypress/juniper) need.
(Discovered by accident.)
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Dave Airlie [Wed, 31 Jan 2018 04:28:26 +0000 (14:28 +1000)]
r600: initial attempt at gl_HelperInvocation (v3)
This passes the CTS and piglit tests.
This also disable sb for helper invocations until it doesn't
mess up the VPM flags.
Thanks to Ilia and Glenn for advice, and Roland for working
out the working evergreen path.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Wed, 31 Jan 2018 11:31:30 +0000 (12:31 +0100)]
radv: Don't expose VK_KHX_multiview on android.
deqp does not allow any KHX extensions, and since deqp is included
in android-cts, android does not allow any khx extensions.
So disable VK_KHX_multiview on android.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
CC: 18.0 <mesa-stable@lists.freedesktop.org>
Mathias Fröhlich [Sat, 27 Jan 2018 15:07:22 +0000 (16:07 +0100)]
vbo: Simplify input array distribution for dlist type draws.
Using the newly introduced VAO array maps, we can
simplify vbo_bind_vertex_list.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 27 Jan 2018 15:07:22 +0000 (16:07 +0100)]
vbo: Simplify input array distribution for imm type draws.
Using the newly introduced VAO array maps, we can
simplify vbo_exec_bind_arrays.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 27 Jan 2018 15:07:22 +0000 (16:07 +0100)]
vbo: Simplify input array distribution for array type draws.
Using the newly introduced VAO state variable, we can
simplify recalculate_input_bindings.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 27 Jan 2018 15:07:22 +0000 (16:07 +0100)]
vbo: Use static const VERT_ATTRIB->VBO_ATTRIB maps.
Instead of each context having its own map instance for
this purpose, use a global static const map.
v2: s,unsigned char,GLubyte,g
s,_VP_MODE_MAX,VP_MODE_MAX,g
Change comment style.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 27 Jan 2018 15:07:22 +0000 (16:07 +0100)]
mesa: Track position/generic0 aliasing in the VAO.
Since the first material attribute no longer aliases with
the generic0 attribute, only aliasing between generic0 and
position is left and entirely dependent on the enabled
state of the VAO. So introduce a gl_attribute_map_mode
in the VAO that is used to track how the position
and the generic 0 attribute alias.
Provide a static const array that can be used to
map from vertex program input indices to VERT_ATTRIB_*
indices. The outer dimension of the array is meant to
be indexed directly by the new VAO member variable.
Also provide methods on the VAO to convert bitmasks of
VERT_BIT's from the VAO numbering to the vertex processing
inputs numbering.
v2: s,unsigned char,GLubyte,g
s,_ATTRIBUTE_MAP_MODE_MAX,ATTRIBUTE_MAP_MODE_MAX,g
Change comment style, add comments.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 27 Jan 2018 15:07:22 +0000 (16:07 +0100)]
mesa: Put materials at the end of the generic block.
The materials are now moved to the end of the
generic attributes block to the range 4-15.
Before, the way the position and generic 0 attribute
is handled was dependent on the presence and kind of
the currently attached vertex program. With this
change the way the position attribute and the generic 0
attribute is treated only depends on the enabled
flag of those two arrays.
This will later help to untangle the update dependencies
between enabled arrays and shader inputs.
v2: s,VERT_ATTRIB_MAT_OFFSET,VERT_ATTRIB_MAT0,g
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 27 Jan 2018 15:07:22 +0000 (16:07 +0100)]
mesa: Use defines for the aliased material array attributes.
Instead of just assuming that the material attributes
just overlap with the generic attributes 0-12, give
them symbolic defines so that we can easier move them
to an other range.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 27 Jan 2018 15:07:22 +0000 (16:07 +0100)]
vbo: Correctly handle attribute offsets in dlist draw.
When executing a display list draw, for the offset
list to be correct, the offset computation needs to
accumulate all attribute size values in order.
Specifically, if we are shuffling around the position
and generic0 attributes, we may violate the order or
if we do not walk the generic vbo attributes we may
skip some of the attributes.
Even if this is an unlikely usecase we can fix this use
case by precomputing the offsets on the full attribute list
and store the full offset list in the display list node.
v2: Formatting fix
v3: Rebase
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Brian Paul [Thu, 1 Feb 2018 20:17:08 +0000 (13:17 -0700)]
gallivm/llvmpipe: add const qualifiers on sampler variables
Once a lp_build_sampler_soa or lp_build_sampler_aos object is created,
it should never be modified. Found by inspection.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Brian Paul [Wed, 31 Jan 2018 23:19:07 +0000 (16:19 -0700)]
vbo: change an argument in vbo_draw_indirect_prims()
In vbo_draw_indirect_prims() pass the 'indirect_data' argument to
vbo->draw_prims(). All the callers are passing ctx->DrawIndirectBuffer
so this should be no functional change. Add a (temporary) assertion to
be sure.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Brian Paul [Wed, 31 Jan 2018 23:15:53 +0000 (16:15 -0700)]
vbo: add comments on the VBO draw function typedefs
And rename indirect_params -> indirect_draw_count_buffer and
indirect_params_offset -> indirect_draw_count_offset to be more
specific.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Brian Paul [Wed, 31 Jan 2018 23:11:12 +0000 (16:11 -0700)]
vbo: s/drawcount/drawcount_offset
This parameter (from the glMultiDrawArraysIndirectCountARB function)
is poorly named. It's an offset into the buffer which contains the
number of primitives to draw.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Brian Paul [Wed, 31 Jan 2018 21:05:46 +0000 (14:05 -0700)]
vbo: use vbo local var for draw call in vbo_save_playback_vertex_list()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Brian Paul [Thu, 1 Feb 2018 03:36:35 +0000 (20:36 -0700)]
svga: remove unneeded #includes in svga_pipe_draw.c
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Brian Paul [Thu, 1 Feb 2018 03:33:29 +0000 (20:33 -0700)]
svga: whitespace/formatting fixes in svga_pipe_draw.c
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Brian Paul [Thu, 1 Feb 2018 03:18:52 +0000 (20:18 -0700)]
svga: clean up retry_draw_range_elements(), retry_draw_arrays()
Get rid of a bunch of goto spaghetti. Remove unneeded do_retry parameter.
No Piglit changes. Also tested w/ Google Earth and other apps.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Brian Paul [Wed, 31 Jan 2018 23:10:39 +0000 (16:10 -0700)]
svga: remove unused min/max_index params to draw_vgpu10()
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Eric Anholt [Wed, 24 Jan 2018 01:13:53 +0000 (12:13 +1100)]
broadcom/vc5: Fix image_h setup for both loads and stores.
The image_h for the tiling algorithm needs to be the padded-to-a-uifblock
height of the level, not the unpadded height or the height of level 0.
Fixes some cases of KHR-GLES3.texture_repeat_mode.* and
depthstencil-render-miplevels.
Eric Anholt [Sun, 21 Jan 2018 02:25:10 +0000 (10:25 +0800)]
broadcom/vc5: Add appropriate height padding for bank conflicts.
I thought I didn't need this because I was doing level-0-always-UIF and
that the pad there would propagate down, but it turns out that for level 1
the padding ends up being chosen by the HW. This brings us closer to
being able to turn on UIF XOR for increased performance, as well.
Eric Anholt [Sun, 21 Jan 2018 00:44:53 +0000 (16:44 -0800)]
broadcom/vc5: Simplify separate stencil surface setup.
If we just make another gallium surface for the separate stencil, it's a
lot easier to keep track of which set of fields we're using in RCL setup.
This also incidentally fixes a little bug in setting up the surface's
padded height for separate stencil when the UIF-ness changes at different
levels of Z versus stencil.
Eric Anholt [Sat, 20 Jan 2018 23:57:08 +0000 (15:57 -0800)]
broadcom/vc5: Rename the UIFCFG register in the UAPI.
This matches the naming of the other hub regs we get, and I don't know for
sure if UIFCFG will be the same register between the hub and the cores on
all versions.
Eric Anholt [Fri, 19 Jan 2018 17:06:26 +0000 (09:06 -0800)]
broadcom/vc5: Fix a segfault on mix of booleans.
We don't have a src1 to look up if the compare instruction is "i2b".
Eric Anholt [Sat, 20 Jan 2018 18:19:20 +0000 (10:19 -0800)]
broadcom/vc5: Skip over missing color buffers for a couple of checks.
Fixes crashes in piglit alpha-to-coverage-no-draw-buffer-zero 2
Eric Anholt [Thu, 1 Feb 2018 01:38:22 +0000 (17:38 -0800)]
broadcom/vc5: Add the missing PIPE_CAP_FENCE_SIGNAL.
Baldur Karlsson [Thu, 1 Feb 2018 18:31:56 +0000 (11:31 -0700)]
mesa: fix query of GL_TEXTURE_COMPRESSION_HINT_ARB
Fixes:
f96a69f916a ("mesa: replace GLenum with GLenum16 in common
structures (v4)")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104908
Reviewed-by: Brian Paul <brianp@vmware.com>
Lucas Stach [Tue, 30 Jan 2018 14:11:35 +0000 (15:11 +0100)]
renderonly: fix dumb BO allocation for non 32bpp formats
Take into account the resource format, instead of applying a hardcoded
32bpp. This not only over-allocates 16bpp formats, but also results in
a wrong stride being filled into the handle.
Fixes:
848b49b288f ("gallium: add renderonly library")
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Kenneth Graunke [Thu, 1 Feb 2018 17:43:20 +0000 (09:43 -0800)]
intel/decoder: Fix control / evaluation label mixup.
Trivial. DS is TES, HS is TCS.
Kenneth Graunke [Wed, 31 Jan 2018 15:03:17 +0000 (07:03 -0800)]
i965: Bump official kernel requirement to Linux v3.9.
In commit
3f353342a6b6744773c26ed66b12afed42bd57af (present in 17.3.0)
we started unconditionally using I915_EXEC_NO_RELOC, which was
introduced in Linux v3.9. ChromeOS kernel 3.8 has backported this,
so it should work too.
Running on older kernels would likely result in every single batch
being rejected by the kernel, which is pretty catastrophic. Yet, it
appears that nobody noticed. So, let's just bump the official
requirement and move forward ever so slowly.
Fixes:
3f353342a6b ("i965: Use I915_EXEC_NO_RELOC")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Marc Dietrich [Thu, 1 Feb 2018 12:27:28 +0000 (13:27 +0100)]
meson: don't install windows headers on non-windows platforms
Only dive into the windows subdir if windows platform is selected.
Signed-off-by: Marc Dietrich <marvin24@gmx.de>
Fixes:
5ef75cb02b2b4db5506b8 "meson: build src/glx/windows"
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Marek Olšák [Tue, 30 Jan 2018 18:40:43 +0000 (19:40 +0100)]
radeonsi: use ac_build_buffer_load_format for image buffer loads
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Tue, 30 Jan 2018 18:40:43 +0000 (19:40 +0100)]
ac/nir: use ac_build_buffer_load_format for image buffer loads
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Tue, 30 Jan 2018 18:24:07 +0000 (19:24 +0100)]
ac: add glc parameter to ac_build_buffer_load_format
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Tue, 30 Jan 2018 17:34:25 +0000 (18:34 +0100)]
radeonsi: load the right number of components for VS inputs and TBOs
The supported counts are 1, 2, 4. (3=4)
The following snippet loads float, vec2, vec3, and vec4:
Before:
buffer_load_format_x v9, v4, s[0:3], 0 idxen ;
E0002000 80000904
buffer_load_format_xyzw v[0:3], v5, s[8:11], 0 idxen ;
E00C2000 80020005
s_waitcnt vmcnt(0) ;
BF8C0F70
buffer_load_format_xyzw v[2:5], v6, s[12:15], 0 idxen ;
E00C2000 80030206
s_waitcnt vmcnt(0) ;
BF8C0F70
buffer_load_format_xyzw v[5:8], v7, s[4:7], 0 idxen ;
E00C2000 80010507
After:
buffer_load_format_x v10, v4, s[0:3], 0 idxen ;
E0002000 80000A04
buffer_load_format_xy v[8:9], v5, s[8:11], 0 idxen ;
E0042000 80020805
buffer_load_format_xyzw v[0:3], v6, s[12:15], 0 idxen ;
E00C2000 80030006
s_waitcnt vmcnt(0) ;
BF8C0F70
buffer_load_format_xyzw v[3:6], v7, s[4:7], 0 idxen ;
E00C2000 80010307
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Tue, 30 Jan 2018 16:58:14 +0000 (17:58 +0100)]
radeonsi: remove unused si_shader_context members
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Jon Turney [Tue, 16 Jan 2018 16:26:57 +0000 (16:26 +0000)]
glx/apple: locate dispatch table functions to wrap by name
Avoid reaching into the dispatch table internals (and thus having to deal
with the complexities of remap etc.) by identifying functions to wrap by
name.
See:
https://lists.freedesktop.org/archives/mesa-dev/2015-June/086721.html et seq.
https://bugs.freedesktop.org/show_bug.cgi?id=90311
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Jon Turney [Sat, 2 Dec 2017 17:05:43 +0000 (17:05 +0000)]
glx/apple: include util/debug.h for env_var_as_boolean prototype
mesa/src/glx/glxcmds.c:1295:21: error: implicit declaration of function 'env_var_as_boolean' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
mesa/src/glx/apple/apple_visual.c:85:28: error: implicit declaration of function 'env_var_as_boolean' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Jon Turney [Sun, 3 Dec 2017 21:58:12 +0000 (21:58 +0000)]
osx: ld doesn't support --build-id
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Jon Turney [Tue, 16 Jan 2018 23:27:43 +0000 (23:27 +0000)]
configure: Default to gbm=no on osx
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Andres Rodriguez [Wed, 31 Jan 2018 17:22:41 +0000 (12:22 -0500)]
mesa: remove usage of alloca in externalobjects.c v4
Don't want an overly large numBufferBarriers/numTextureBarriers to blow
up the stack.
v2: handle malloc errors
v3: fix patch
v4: initialize texObjs/bufObjs
Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Samuel Pitoiset [Wed, 31 Jan 2018 14:53:37 +0000 (15:53 +0100)]
radv: do not insert shaders in cache when it's disabled
When the application doesn't provide its own pipeline cache,
the driver uses a in-memory cache but it shouldn't insert any
entries when the cache is explicitely disabled by the user.
Found while running my experimental pipeline-db tool with a
ton of shaders, the memory footprint was just huge, and sometimes
the process was even killed...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 23 Jan 2018 11:10:44 +0000 (12:10 +0100)]
radv: use separate bindings for graphics and compute descriptors
The Vulkan spec says:
"pipelineBindPoint is a VkPipelineBindPoint indicating whether
the descriptors will be used by graphics pipelines or compute
pipelines. There is a separate set of bind points for each of
graphics and compute, so binding one does not disturb the other."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104732
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 23 Jan 2018 11:20:32 +0000 (12:20 +0100)]
radv: store the bind point when creating descriptors with templates
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Dave Airlie [Thu, 1 Feb 2018 02:00:39 +0000 (12:00 +1000)]
r600/eg: make sure we allow vpm bit on other CF ops.
the vpm bit wasn't being applied to the push/pop instructions.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Timothy Arceri [Thu, 1 Feb 2018 02:52:55 +0000 (13:52 +1100)]
gallium/st/clover: remove unused PIPE_SHADER_IR_LLVM
This has been unused since
100796c15c3a.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Dave Airlie [Thu, 1 Feb 2018 02:06:40 +0000 (12:06 +1000)]
r600/sb: just add some missing debug bits
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 1 Feb 2018 00:32:10 +0000 (10:32 +1000)]
r600: fix buffer resinfo opcode translation.
The vtx operations never got translated, so things worked by
0 being equal to 0, translate them so we can use the proper buffer
resinfo code.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Timothy Arceri [Wed, 31 Jan 2018 01:58:48 +0000 (12:58 +1100)]
st/glsl_to_nir: add more nir opts to st_nir_opts()
All of the current gallium nir driver use these optimisations but
they do so in their backends. Having these called in the backend
only can cause a number of problems:
- Shader compile times are greater because the opts need to do
significant passes over all shader variants.
- The shader cache is partially defeated due to the significant
optimisation passes over variants.
- We might miss out on nir linking optimisation opportunities.
Adding these passes to st_nir_opts() alleviates these problems.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Andres Gomez [Mon, 29 Jan 2018 16:25:30 +0000 (18:25 +0200)]
i965: perform 2 uploads with dual slot *64*PASSTHRU formats on gen<8
The emission of vertex attributes corresponding to dvec3 and dvec4
vertex shader input variables was not correct when the <size> passed
to the VertexAttribL* commands was <= 2.
In
61a8a55f557 ("i965/gen8: Fix vertex attrib upload for dvec3/4
shader inputs"), for gen8+ we needed to determine if the attrib was
dual slot to emit 128 or 256-bit, independently of the VAO size.
Similarly, for gen < 8 we also need to determine whether the attrib is
dual slot to force the emission of 256-bits through 2 uploads.
Additionally, we make use of the ISL_FORMAT_R32_FLOAT format in this
second upload to fill these unspecified components with zeros, as we
also do for gen8+.
Fixes the following test on Haswell:
KHR-GL46.vertex_attrib_binding.basic-inputL-case1
v2: Added more inline comments to explain why we are using
ISL_FORMAT_R32_FLOAT and its consequences, as requested by
Alejandro and Antía.
Fixes:
75968a668e4 ("i965/gen7: expose OpenGL 4.2 on Haswell when
supported")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103006
Cc: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: Antia Puentes <apuentes@igalia.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Sun, 19 Jun 2016 06:32:08 +0000 (23:32 -0700)]
i965: Make texture validation code use texture objects, not units.
This requires moving the _MaxLevel handling up to the callers. Another
user of intel_finalize_mipmap_tree will be added later that depends on
_MaxLevel not being modified.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Kenneth Graunke [Sun, 19 Jun 2016 06:44:44 +0000 (23:44 -0700)]
i965: Pass tObj into intel_update_max_level instead of intel_obj.
We want both anyway, but this will simplify things a tiny bit in an
upcoming patch.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Kenneth Graunke [Wed, 31 Jan 2018 14:47:02 +0000 (06:47 -0800)]
i965: Delete more misleading comments.
brw_bo_wait_rendering used to take a brw_context pointer for perf_debug
messages about stalls. Chris eliminated that in
833108ac14ade91f54cc6e.
This message about passing NULL to avoid those warnings is no longer
relevant, and just adds confusion. So, drop it.
Andres Rodriguez [Wed, 31 Jan 2018 02:42:14 +0000 (21:42 -0500)]
docs/features: mark EXT_semaphore(_fd) as DONE v2
Support for these extensions is available in radeonsi.
v2: also updated relnotes
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Brian Paul [Wed, 31 Jan 2018 03:08:34 +0000 (20:08 -0700)]
st/mesa: whitespace, formatting fixes in st_glsl_to_tgsi.cpp
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Wed, 31 Jan 2018 02:57:33 +0000 (19:57 -0700)]
st/mesa: s/int/GLenum/ in st_glsl_to_tgsi.cpp
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Tue, 30 Jan 2018 23:50:37 +0000 (16:50 -0700)]
svga: use opcode local var to simplify some code
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Tue, 30 Jan 2018 23:49:00 +0000 (16:49 -0700)]
svga: s/unsigned/VGPU10_OPCODE_TYPE/
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Samuel Pitoiset [Wed, 31 Jan 2018 10:40:24 +0000 (11:40 +0100)]
radv: do not dump meta shader stats
That's quite useless and that pollutes the output.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 31 Jan 2018 10:23:58 +0000 (11:23 +0100)]
ac/nir: fix emission of ffract for 64-bit
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Eric Engestrom [Thu, 7 Dec 2017 16:03:40 +0000 (16:03 +0000)]
meson: dedup gallium-xa logic
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Eric Engestrom [Thu, 7 Dec 2017 16:03:22 +0000 (16:03 +0000)]
meson: dedup gallium-va logic
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Eric Engestrom [Thu, 7 Dec 2017 16:03:04 +0000 (16:03 +0000)]
meson: dedup gallium-omx logic
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Eric Engestrom [Thu, 7 Dec 2017 16:02:29 +0000 (16:02 +0000)]
meson: dedup gallium-xvmc logic
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Eric Engestrom [Thu, 7 Dec 2017 16:02:02 +0000 (16:02 +0000)]
meson: dedup gallium-vdpau logic
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Antia Puentes [Fri, 26 Jan 2018 11:10:30 +0000 (12:10 +0100)]
Revert "mesa: add missing RGB9_E5 format in _mesa_base_fbo_format"
This reverts commit
513c2263cbff45edb105c7b46e58f316e06746ab.
_mesa_base_fbo_format_ is used to validate the internalformat
passed to RenderbufferStorage, which in the OpenGL 4.6 is said:
"An INVALID_ENUM error is generated if internalformat is not one of the
color-renderable, depth-renderable, or stencil-renderable formats defined
in section 9.4."
RGB9_E5 format is not renderable, as stated in the same specification
(Bug 9338).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104794
Cc: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Michel Dänzer [Fri, 26 Jan 2018 17:32:32 +0000 (18:32 +0100)]
winsys/radeon: Compute is_displayable in surf_drm_to_winsys
It was always 0, breaking (at least) DRI3 with Xwayland.
Bugzilla: https://bugs.freedesktop.org/104306
Fixes:
5f2073be3282 ("ac/surface: add ac_surface::is_displayable")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Matthew Nicholls [Mon, 29 Jan 2018 16:26:18 +0000 (16:26 +0000)]
radv: remove predication on cache flushes
This can lead to a situation where cache flushes could get conditionally
disabled while still clearing the flush_bits, and thus flushes due to
application pipeline barriers may never get executed.
Fixes:
a6c2001ace (radv: add support for cmd predication.)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Brian Paul [Wed, 31 Jan 2018 02:32:37 +0000 (19:32 -0700)]
mesa: fix broken glGet*(GL_POLYGON_MODE) query
This reverts part of the patch which introduced the GLenum16 change.
Fixes a conform regression found by Roland.
Fixes:
f96a69f916aed405 ("mesa: replace GLenum with GLenum16 in
common structures (v4)")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Mon, 13 Nov 2017 20:52:06 +0000 (06:52 +1000)]
virgl: also remove dimension on indirect.
This fixes some dEQP tests that generated bad shaders.
Fixes:
b6f6ead19 (virgl: drop const dimensions on first block.)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Tested-by: Gurchetan Singh <gurchetansingh@chromium.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 10 Jan 2018 23:21:44 +0000 (00:21 +0100)]
radeonsi: remove DBG_PRECOMPILE
it's useless and shader-db stats only report the main shader part.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 10 Jan 2018 22:25:37 +0000 (23:25 +0100)]
radeonsi: print shader-db stats for main parts, not final binaries
This is needed to get shader-db stats for LS,HS,ES,GS stages on gfx9.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 10 Jan 2018 22:09:58 +0000 (23:09 +0100)]
radeonsi: move max_simd_waves computation into a separate function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 30 Jan 2018 21:24:12 +0000 (22:24 +0100)]
mesa: fix glGet MAX_VERTEX_ATTRIB queries
Broken by
f96a69f916aed40519e755d0460a83940a587
Reviewed-by: Brian Paul <brianp@vmware.com>
Jason Ekstrand [Sat, 27 Jan 2018 00:22:27 +0000 (16:22 -0800)]
anv/cmd_buffer: Re-emit the pipeline at every subpass
If we ever hit this edge-case, it can theoretically cause problem for
CNL because we could end up changing render targets without re-emitting
3DSTATE_MULTISAMPLE which is part of the pipeline. Just get rid of the
edge case.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Ian Romanick [Wed, 2 Mar 2016 23:39:09 +0000 (15:39 -0800)]
nir: Distribute binary operations with constants into bcsel
This was specifically designed to simplify 1+mix(0, a-1, condition) to
mix(1, a, condition) by pushing the 1+ inside.
Skylake, Broadwell, and Haswell had similar results. Skylake shown.
total instructions in shared programs:
14521753 ->
14521716 (<.01%)
instructions in affected programs: 10619 -> 10582 (-0.35%)
helped: 51
HURT: 14
helped stats (abs) min: 1 max: 12 x̄: 1.43 x̃: 1
helped stats (rel) min: 0.20% max: 3.58% x̄: 1.01% x̃: 0.95%
HURT stats (abs) min: 1 max: 11 x̄: 2.57 x̃: 1
HURT stats (rel) min: 0.22% max: 1.75% x̄: 1.20% x̃: 1.32%
95% mean confidence interval for instructions value: -1.31 0.17
95% mean confidence interval for instructions %-change: -0.80% -0.27%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs:
533000205 ->
533003533 (<.01%)
cycles in affected programs: 110610 -> 113938 (3.01%)
helped: 43
HURT: 28
helped stats (abs) min: 6 max: 440 x̄: 27.12 x̃: 16
helped stats (rel) min: 0.39% max: 4.84% x̄: 1.60% x̃: 1.67%
HURT stats (abs) min: 2 max: 3066 x̄: 160.50 x̃: 14
HURT stats (rel) min: 0.08% max: 77.78% x̄: 5.16% x̃: 0.62%
95% mean confidence interval for cycles value: -43.81 137.56
95% mean confidence interval for cycles %-change: -1.47% 3.60%
Inconclusive result (value mean confidence interval includes 0).
Ivy Bridge
total instructions in shared programs:
10018840 ->
10018713 (<.01%)
instructions in affected programs: 9431 -> 9304 (-1.35%)
helped: 51
HURT: 3
helped stats (abs) min: 1 max: 80 x̄: 2.76 x̃: 1
helped stats (rel) min: 0.20% max: 16.43% x̄: 1.16% x̃: 0.81%
HURT stats (abs) min: 1 max: 12 x̄: 4.67 x̃: 1
HURT stats (rel) min: 0.22% max: 1.33% x̄: 0.59% x̃: 0.22%
95% mean confidence interval for instructions value: -5.36 0.66
95% mean confidence interval for instructions %-change: -1.66% -0.46%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs:
87571944 ->
87572785 (<.01%)
cycles in affected programs: 117234 -> 118075 (0.72%)
helped: 42
HURT: 23
helped stats (abs) min: 2 max: 114 x̄: 51.90 x̃: 30
helped stats (rel) min: 0.11% max: 11.01% x̄: 4.45% x̃: 2.74%
HURT stats (abs) min: 1 max: 2341 x̄: 131.35 x̃: 10
HURT stats (rel) min: 0.06% max: 37.11% x̄: 2.75% x̃: 0.61%
95% mean confidence interval for cycles value: -61.05 86.93
95% mean confidence interval for cycles %-change: -3.47% -0.33%
Inconclusive result (value mean confidence interval includes 0).
Sandy Bridge
total instructions in shared programs:
10542933 ->
10542844 (<.01%)
instructions in affected programs: 11487 -> 11398 (-0.77%)
helped: 52
HURT: 3
helped stats (abs) min: 1 max: 40 x̄: 1.96 x̃: 1
helped stats (rel) min: 0.08% max: 8.16% x̄: 0.90% x̃: 0.72%
HURT stats (abs) min: 1 max: 11 x̄: 4.33 x̃: 1
HURT stats (rel) min: 0.22% max: 1.22% x̄: 0.55% x̃: 0.22%
95% mean confidence interval for instructions value: -3.17 -0.07
95% mean confidence interval for instructions %-change: -1.13% -0.52%
Instructions are helped.
total cycles in shared programs:
146098397 ->
146097094 (<.01%)
cycles in affected programs: 128140 -> 126837 (-1.02%)
helped: 47
HURT: 8
helped stats (abs) min: 2 max: 333 x̄: 29.21 x̃: 18
helped stats (rel) min: 0.13% max: 5.04% x̄: 1.18% x̃: 0.95%
HURT stats (abs) min: 1 max: 16 x̄: 8.75 x̃: 9
HURT stats (rel) min: 0.08% max: 0.43% x̄: 0.30% x̃: 0.34%
95% mean confidence interval for cycles value: -37.49 -9.90
95% mean confidence interval for cycles %-change: -1.22% -0.71%
Cycles are helped.
Iron Lake
total instructions in shared programs: 7886711 -> 7886509 (<.01%)
instructions in affected programs: 10425 -> 10223 (-1.94%)
helped: 50
HURT: 2
helped stats (abs) min: 1 max: 78 x̄: 4.08 x̃: 1
helped stats (rel) min: 0.34% max: 15.38% x̄: 1.12% x̃: 0.54%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.86% max: 0.91% x̄: 0.89% x̃: 0.89%
95% mean confidence interval for instructions value: -8.05 0.28
95% mean confidence interval for instructions %-change: -1.83% -0.26%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs:
178115324 ->
178114612 (<.01%)
cycles in affected programs: 765726 -> 765014 (-0.09%)
helped: 39
HURT: 1
helped stats (abs) min: 2 max: 276 x̄: 18.31 x̃: 8
helped stats (rel) min: <.01% max: 8.47% x̄: 0.39% x̃: 0.04%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -32.07 -3.53
95% mean confidence interval for cycles %-change: -0.86% 0.10%
Inconclusive result (%-change mean confidence interval includes 0).
GM45
total instructions in shared programs: 4857762 -> 4857661 (<.01%)
instructions in affected programs: 5523 -> 5422 (-1.83%)
helped: 25
HURT: 1
helped stats (abs) min: 1 max: 78 x̄: 4.08 x̃: 1
helped stats (rel) min: 0.34% max: 13.61% x̄: 1.04% x̃: 0.52%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.86% max: 0.86% x̄: 0.86% x̃: 0.86%
95% mean confidence interval for instructions value: -9.99 2.22
95% mean confidence interval for instructions %-change: -2.01% 0.08%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs:
122179674 ->
122179194 (<.01%)
cycles in affected programs: 530162 -> 529682 (-0.09%)
helped: 22
HURT: 1
helped stats (abs) min: 2 max: 292 x̄: 21.91 x̃: 7
helped stats (rel) min: <.01% max: 8.65% x̄: 0.44% x̃: 0.04%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -46.56 4.82
95% mean confidence interval for cycles %-change: -1.20% 0.36%
Inconclusive result (value mean confidence interval includes 0).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Ian Romanick [Fri, 5 Jan 2018 21:20:46 +0000 (13:20 -0800)]
nir: Rearrange logic op-compounded integer compares
Skylake and Broadwell had similar results. Skylake shown.
total instructions in shared programs:
14521769 ->
14521753 (<.01%)
instructions in affected programs: 8782 -> 8766 (-0.18%)
helped: 16
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.12% max: 0.40% x̄: 0.20% x̃: 0.18%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.23% -0.16%
Instructions are helped.
total cycles in shared programs:
533000376 ->
533000205 (<.01%)
cycles in affected programs: 447035 -> 446864 (-0.04%)
helped: 9
HURT: 9
helped stats (abs) min: 2 max: 40 x̄: 35.78 x̃: 40
helped stats (rel) min: 0.02% max: 0.18% x̄: 0.10% x̃: 0.09%
HURT stats (abs) min: 1 max: 52 x̄: 16.78 x̃: 10
HURT stats (rel) min: <.01% max: 1.11% x̄: 0.29% x̃: 0.12%
95% mean confidence interval for cycles value: -25.07 6.07
95% mean confidence interval for cycles %-change: -0.08% 0.27%
Inconclusive result (value mean confidence interval includes 0).
No changes on GM45, Iron Lake, Sandy Bridge, Ivy Bridge, or Haswell.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Ian Romanick [Thu, 4 Jan 2018 23:21:30 +0000 (15:21 -0800)]
nir: Rearrange and-compounded float compares
If both comparisons are used as sources for instructions other than the
iand, this transformation is detrimental. If the non-identical value in
both compares is constant, the fmin or fmax will be constant-folded
away, so the transformation is always a win.
It is interesting to me that on Iron Lake only 81 shaders have
instruction counts changed, but 726 shaders have cycle counts changed.
shader-db results:
Skylake
total instructions in shared programs:
14525728 ->
14521017 (-0.03%)
instructions in affected programs: 1164726 -> 1160015 (-0.40%)
helped: 1692
HURT: 5
helped stats (abs) min: 1 max: 637 x̄: 2.79 x̃: 2
helped stats (rel) min: 0.07% max: 16.36% x̄: 0.81% x̃: 0.33%
HURT stats (abs) min: 1 max: 12 x̄: 3.20 x̃: 1
HURT stats (rel) min: 0.38% max: 2.86% x̄: 2.36% x̃: 2.86%
95% mean confidence interval for instructions value: -3.52 -2.03
95% mean confidence interval for instructions %-change: -0.86% -0.74%
Instructions are helped.
total cycles in shared programs:
533115449 ->
532991404 (-0.02%)
cycles in affected programs:
119401803 ->
119277758 (-0.10%)
helped: 1145
HURT: 467
helped stats (abs) min: 1 max: 34644 x̄: 145.92 x̃: 18
helped stats (rel) min: <.01% max: 45.33% x̄: 1.58% x̃: 0.42%
HURT stats (abs) min: 1 max: 1590 x̄: 92.15 x̃: 15
HURT stats (rel) min: <.01% max: 13.48% x̄: 1.26% x̃: 0.39%
95% mean confidence interval for cycles value: -122.16 -31.74
95% mean confidence interval for cycles %-change: -0.94% -0.57%
Cycles are helped.
total spills in shared programs: 9597 -> 9534 (-0.66%)
spills in affected programs: 403 -> 340 (-15.63%)
helped: 1
HURT: 1
total fills in shared programs: 13904 -> 13790 (-0.82%)
fills in affected programs: 1627 -> 1513 (-7.01%)
helped: 2
HURT: 1
LOST: 0
GAINED: 2
Broadwell
total instructions in shared programs:
14816966 ->
14812590 (-0.03%)
instructions in affected programs: 1499885 -> 1495509 (-0.29%)
helped: 1672
HURT: 15
helped stats (abs) min: 1 max: 455 x̄: 2.70 x̃: 2
helped stats (rel) min: 0.05% max: 16.36% x̄: 0.81% x̃: 0.33%
HURT stats (abs) min: 1 max: 21 x̄: 9.20 x̃: 8
HURT stats (rel) min: 0.08% max: 2.86% x̄: 1.06% x̃: 0.53%
95% mean confidence interval for instructions value: -3.14 -2.05
95% mean confidence interval for instructions %-change: -0.85% -0.73%
Instructions are helped.
total cycles in shared programs:
559353622 ->
559345595 (<.01%)
cycles in affected programs:
139893703 ->
139885676 (<.01%)
helped: 921
HURT: 697
helped stats (abs) min: 1 max: 42424 x̄: 143.45 x̃: 18
helped stats (rel) min: <.01% max: 36.23% x̄: 2.02% x̃: 0.87%
HURT stats (abs) min: 1 max: 2370 x̄: 178.03 x̃: 38
HURT stats (rel) min: <.01% max: 17.35% x̄: 0.71% x̃: 0.14%
95% mean confidence interval for cycles value: -59.64 49.72
95% mean confidence interval for cycles %-change: -1.02% -0.66%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 78902 -> 78861 (-0.05%)
spills in affected programs: 2418 -> 2377 (-1.70%)
helped: 1
HURT: 11
total fills in shared programs: 83782 -> 83678 (-0.12%)
fills in affected programs: 3515 -> 3411 (-2.96%)
helped: 2
HURT: 11
LOST: 0
GAINED: 5
Haswell and Ivy Bridge had similar results. Haswell shown.
total instructions in shared programs: 9033898 -> 9032010 (-0.02%)
instructions in affected programs: 308064 -> 306176 (-0.61%)
helped: 921
HURT: 4
helped stats (abs) min: 1 max: 20 x̄: 2.05 x̃: 1
helped stats (rel) min: 0.17% max: 17.54% x̄: 0.80% x̃: 0.35%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 3.23% max: 3.23% x̄: 3.23% x̃: 3.23%
95% mean confidence interval for instructions value: -2.21 -1.87
95% mean confidence interval for instructions %-change: -0.88% -0.68%
Instructions are helped.
total cycles in shared programs:
84628949 ->
84620520 (<.01%)
cycles in affected programs: 2164913 -> 2156484 (-0.39%)
helped: 518
HURT: 359
helped stats (abs) min: 1 max: 440 x̄: 41.52 x̃: 20
helped stats (rel) min: <.01% max: 17.17% x̄: 1.95% x̃: 1.01%
HURT stats (abs) min: 1 max: 586 x̄: 36.43 x̃: 8
HURT stats (rel) min: 0.04% max: 18.65% x̄: 1.47% x̃: 0.40%
95% mean confidence interval for cycles value: -15.17 -4.05
95% mean confidence interval for cycles %-change: -0.77% -0.32%
Cycles are helped.
LOST: 0
GAINED: 4
Sandy Bridge
total instructions in shared programs:
10544860 ->
10542933 (-0.02%)
instructions in affected programs: 360019 -> 358092 (-0.54%)
helped: 931
HURT: 4
helped stats (abs) min: 1 max: 20 x̄: 2.07 x̃: 1
helped stats (rel) min: 0.11% max: 15.52% x̄: 0.68% x̃: 0.30%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 3.33% max: 3.33% x̄: 3.33% x̃: 3.33%
95% mean confidence interval for instructions value: -2.23 -1.89
95% mean confidence interval for instructions %-change: -0.76% -0.58%
Instructions are helped.
total cycles in shared programs:
146106820 ->
146098397 (<.01%)
cycles in affected programs: 3435047 -> 3426624 (-0.25%)
helped: 572
HURT: 329
helped stats (abs) min: 1 max: 1289 x̄: 32.52 x̃: 15
helped stats (rel) min: <.01% max: 26.29% x̄: 0.97% x̃: 0.33%
HURT stats (abs) min: 1 max: 1714 x̄: 30.93 x̃: 6
HURT stats (rel) min: 0.02% max: 41.31% x̄: 1.13% x̃: 0.19%
95% mean confidence interval for cycles value: -16.85 -1.85
95% mean confidence interval for cycles %-change: -0.39% -0.01%
Cycles are helped.
LOST: 1
GAINED: 0
Iron Lake
total instructions in shared programs: 7886925 -> 7886711 (<.01%)
instructions in affected programs: 25763 -> 25549 (-0.83%)
helped: 75
HURT: 6
helped stats (abs) min: 1 max: 13 x̄: 3.33 x̃: 1
helped stats (rel) min: 0.35% max: 17.57% x̄: 1.96% x̃: 0.53%
HURT stats (abs) min: 1 max: 16 x̄: 6.00 x̃: 1
HURT stats (rel) min: 2.86% max: 4.79% x̄: 3.49% x̃: 2.86%
95% mean confidence interval for instructions value: -3.69 -1.60
95% mean confidence interval for instructions %-change: -2.54% -0.57%
Instructions are helped.
total cycles in shared programs:
178116888 ->
178115324 (<.01%)
cycles in affected programs: 5858790 -> 5857226 (-0.03%)
helped: 484
HURT: 242
helped stats (abs) min: 2 max: 76 x̄: 5.27 x̃: 6
helped stats (rel) min: 0.01% max: 10.70% x̄: 0.18% x̃: 0.06%
HURT stats (abs) min: 2 max: 76 x̄: 4.07 x̃: 2
HURT stats (rel) min: 0.01% max: 3.99% x̄: 0.19% x̃: 0.03%
95% mean confidence interval for cycles value: -2.76 -1.55
95% mean confidence interval for cycles %-change: -0.12% 0.01%
Inconclusive result (%-change mean confidence interval includes 0).
GM45
total instructions in shared programs: 4857870 -> 4857762 (<.01%)
instructions in affected programs: 13994 -> 13886 (-0.77%)
helped: 39
HURT: 5
helped stats (abs) min: 1 max: 13 x̄: 3.28 x̃: 2
helped stats (rel) min: 0.33% max: 17.11% x̄: 1.86% x̃: 0.48%
HURT stats (abs) min: 1 max: 16 x̄: 4.00 x̃: 1
HURT stats (rel) min: 2.86% max: 4.71% x̄: 3.23% x̃: 2.86%
95% mean confidence interval for instructions value: -3.86 -1.05
95% mean confidence interval for instructions %-change: -2.61% 0.04%
Inconclusive result (%-change mean confidence interval includes 0).
total cycles in shared programs:
122180744 ->
122179674 (<.01%)
cycles in affected programs: 3686646 -> 3685576 (-0.03%)
helped: 273
HURT: 141
helped stats (abs) min: 2 max: 76 x̄: 5.81 x̃: 6
helped stats (rel) min: 0.01% max: 10.70% x̄: 0.18% x̃: 0.06%
HURT stats (abs) min: 2 max: 76 x̄: 3.66 x̃: 2
HURT stats (rel) min: 0.01% max: 3.99% x̄: 0.16% x̃: 0.02%
95% mean confidence interval for cycles value: -3.42 -1.75
95% mean confidence interval for cycles %-change: -0.15% 0.03%
Inconclusive result (%-change mean confidence interval includes 0).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Ian Romanick [Thu, 11 Jan 2018 22:14:25 +0000 (14:14 -0800)]
nir: Separate a weird compare with zero to two compares with zero
min(a+b, c+d) >= 0 becomes (a+b >= 0 && c+d >= 0).
No shader-db changes, but it does prevent 6 to 12 instruction
regressions in the next patch on all measured Intel platforms.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Ian Romanick [Tue, 8 Mar 2016 19:11:00 +0000 (11:11 -0800)]
nir: Simplify min and max of b2f
v2: Rebase on almost 2 years. Require that one of the arguments to fmin
or fmax be used only once. This prevents some regressions.
shader-db results:
Skylake and Broadwell had similar results. Skylake shown.
total instructions in shared programs:
14526021 ->
14525913 (<.01%)
instructions in affected programs: 4613 -> 4505 (-2.34%)
helped: 31
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 3.48 x̃: 4
helped stats (rel) min: 0.62% max: 6.67% x̄: 3.31% x̃: 2.42%
total cycles in shared programs:
533118710 ->
533118403 (<.01%)
cycles in affected programs: 34334 -> 34027 (-0.89%)
helped: 24
HURT: 0
helped stats (abs) min: 4 max: 24 x̄: 12.79 x̃: 14
helped stats (rel) min: 0.25% max: 2.40% x̄: 1.08% x̃: 1.03%
No changes on GM45, Iron Lake, Sandy Bridge, Ivy Bridge, or Haswell.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Ian Romanick [Fri, 5 Jan 2018 21:29:26 +0000 (13:29 -0800)]
nir: Undo possible damage caused by rearranging or-compounded float compares
shader-db results:
Skylake and Broadwell had similar results (Skylake shown)
total instructions in shared programs:
14525898 ->
14525836 (<.01%)
instructions in affected programs: 1964 -> 1902 (-3.16%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 25 x̄: 4.43 x̃: 1
helped stats (rel) min: 0.68% max: 9.77% x̄: 2.10% x̃: 0.86%
95% mean confidence interval for instructions value: -9.46 0.60
95% mean confidence interval for instructions %-change: -3.97% -0.24%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs:
533119892 ->
533115756 (<.01%)
cycles in affected programs: 96061 -> 91925 (-4.31%)
helped: 13
HURT: 1
helped stats (abs) min: 60 max: 596 x̄: 318.77 x̃: 300
helped stats (rel) min: 1.15% max: 5.49% x̄: 4.27% x̃: 4.42%
HURT stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8
HURT stats (rel) min: 0.46% max: 0.46% x̄: 0.46% x̃: 0.46%
95% mean confidence interval for cycles value: -379.43 -211.43
95% mean confidence interval for cycles %-change: -4.84% -3.01%
Cycles are helped.
Haswell, Ivy Bridge and Sandy Bridge had similar results (Haswell shown).
total instructions in shared programs: 9033948 -> 9033898 (<.01%)
instructions in affected programs: 535 -> 485 (-9.35%)
helped: 2
HURT: 0
total cycles in shared programs:
84631402 ->
84628949 (<.01%)
cycles in affected programs: 63197 -> 60744 (-3.88%)
helped: 13
HURT: 2
helped stats (abs) min: 1 max: 594 x̄: 189.62 x̃: 140
helped stats (rel) min: 0.07% max: 5.04% x̄: 3.79% x̃: 4.01%
HURT stats (abs) min: 4 max: 8 x̄: 6.00 x̃: 6
HURT stats (rel) min: 0.17% max: 0.45% x̄: 0.31% x̃: 0.31%
95% mean confidence interval for cycles value: -253.40 -73.67
95% mean confidence interval for cycles %-change: -4.24% -2.25%
Cycles are helped.
No changes on GM45 or Iron Lake.
v2: Add a couple more tautological compares. Suggested by Elie.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Ian Romanick [Thu, 4 Jan 2018 21:30:49 +0000 (13:30 -0800)]
nir: Be more conservative about rearranging or-compounded compares
If both comparisons are used as sources for instructions other than the
ior, this transformation is detrimental. If the non-identical value in
both compares is constant, the fmin or fmax will be constant-folded
away, so the transformation is always a win.
shader-db results:
Skylake
total instructions in shared programs:
14526147 ->
14525898 (<.01%)
instructions in affected programs: 70239 -> 69990 (-0.35%)
helped: 102
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.44 x̃: 1
helped stats (rel) min: 0.07% max: 2.30% x̄: 0.38% x̃: 0.20%
95% mean confidence interval for instructions value: -2.86 -2.02
95% mean confidence interval for instructions %-change: -0.46% -0.31%
Instructions are helped.
total cycles in shared programs:
533120531 ->
533119892 (<.01%)
cycles in affected programs: 994875 -> 994236 (-0.06%)
helped: 76
HURT: 26
helped stats (abs) min: 1 max: 324 x̄: 27.09 x̃: 13
helped stats (rel) min: <.01% max: 4.21% x̄: 0.45% x̃: 0.18%
HURT stats (abs) min: 1 max: 167 x̄: 54.62 x̃: 26
HURT stats (rel) min: <.01% max: 4.36% x̄: 1.01% x̃: 0.39%
95% mean confidence interval for cycles value: -19.44 6.91
95% mean confidence interval for cycles %-change: -0.30% 0.15%
Inconclusive result (value mean confidence interval includes 0).
Broadwell
total instructions in shared programs:
14816005 ->
14815787 (<.01%)
instructions in affected programs: 64658 -> 64440 (-0.34%)
helped: 97
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.25 x̃: 1
helped stats (rel) min: 0.07% max: 2.30% x̄: 0.38% x̃: 0.20%
95% mean confidence interval for instructions value: -2.62 -1.87
95% mean confidence interval for instructions %-change: -0.45% -0.30%
Instructions are helped.
total cycles in shared programs:
559340386 ->
559339907 (<.01%)
cycles in affected programs: 1090491 -> 1090012 (-0.04%)
helped: 66
HURT: 28
helped stats (abs) min: 2 max: 198 x̄: 23.83 x̃: 16
helped stats (rel) min: 0.01% max: 4.21% x̄: 0.47% x̃: 0.27%
HURT stats (abs) min: 2 max: 226 x̄: 39.07 x̃: 11
HURT stats (rel) min: <.01% max: 4.61% x̄: 0.64% x̃: 0.20%
95% mean confidence interval for cycles value: -15.94 5.75
95% mean confidence interval for cycles %-change: -0.35% 0.07%
Inconclusive result (value mean confidence interval includes 0).
LOST: 0
GAINED: 1
Haswell
total instructions in shared programs: 9034106 -> 9033948 (<.01%)
instructions in affected programs: 24096 -> 23938 (-0.66%)
helped: 38
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 4.16 x̃: 4
helped stats (rel) min: 0.42% max: 2.29% x̄: 0.71% x̃: 0.64%
95% mean confidence interval for instructions value: -4.71 -3.60
95% mean confidence interval for instructions %-change: -0.84% -0.58%
Instructions are helped.
total cycles in shared programs:
84631628 ->
84631402 (<.01%)
cycles in affected programs: 148674 -> 148448 (-0.15%)
helped: 14
HURT: 14
helped stats (abs) min: 1 max: 114 x̄: 22.14 x̃: 12
helped stats (rel) min: 0.02% max: 2.98% x̄: 0.66% x̃: 0.21%
HURT stats (abs) min: 1 max: 10 x̄: 6.00 x̃: 5
HURT stats (rel) min: 0.01% max: 0.20% x̄: 0.12% x̃: 0.11%
95% mean confidence interval for cycles value: -19.42 3.28
95% mean confidence interval for cycles %-change: -0.59% 0.05%
Inconclusive result (value mean confidence interval includes 0).
Ivy Bridge
total instructions in shared programs:
10015456 ->
10015293 (<.01%)
instructions in affected programs: 27701 -> 27538 (-0.59%)
helped: 38
HURT: 0
helped stats (abs) min: 1 max: 9 x̄: 4.29 x̃: 4
helped stats (rel) min: 0.33% max: 2.79% x̄: 0.66% x̃: 0.52%
95% mean confidence interval for instructions value: -4.87 -3.71
95% mean confidence interval for instructions %-change: -0.82% -0.51%
Instructions are helped.
total cycles in shared programs:
87524771 ->
87524569 (<.01%)
cycles in affected programs: 112324 -> 112122 (-0.18%)
helped: 6
HURT: 12
helped stats (abs) min: 2 max: 111 x̄: 44.67 x̃: 20
helped stats (rel) min: 0.02% max: 2.94% x̄: 1.45% x̃: 1.26%
HURT stats (abs) min: 1 max: 16 x̄: 5.50 x̃: 5
HURT stats (rel) min: <.01% max: 0.16% x̄: 0.08% x̃: 0.08%
95% mean confidence interval for cycles value: -29.14 6.69
95% mean confidence interval for cycles %-change: -0.93% 0.08%
Inconclusive result (value mean confidence interval includes 0).
LOST: 0
GAINED: 2
Sandy Bridge
total instructions in shared programs:
10545655 ->
10545465 (<.01%)
instructions in affected programs: 37198 -> 37008 (-0.51%)
helped: 42
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 4.52 x̃: 4
helped stats (rel) min: 0.31% max: 2.15% x̄: 0.58% x̃: 0.49%
95% mean confidence interval for instructions value: -5.14 -3.91
95% mean confidence interval for instructions %-change: -0.68% -0.47%
Instructions are helped.
total cycles in shared programs:
146113059 ->
146112427 (<.01%)
cycles in affected programs: 423514 -> 422882 (-0.15%)
helped: 32
HURT: 10
helped stats (abs) min: 4 max: 162 x̄: 24.34 x̃: 12
helped stats (rel) min: 0.06% max: 2.74% x̄: 0.37% x̃: 0.11%
HURT stats (abs) min: 12 max: 19 x̄: 14.70 x̃: 14
HURT stats (rel) min: 0.10% max: 0.18% x̄: 0.16% x̃: 0.14%
95% mean confidence interval for cycles value: -26.03 -4.07
95% mean confidence interval for cycles %-change: -0.43% -0.05%
Cycles are helped.
Iron Lake
total instructions in shared programs: 7886959 -> 7886925 (<.01%)
instructions in affected programs: 1340 -> 1306 (-2.54%)
helped: 4
HURT: 0
helped stats (abs) min: 2 max: 15 x̄: 8.50 x̃: 8
helped stats (rel) min: 0.63% max: 4.30% x̄: 2.45% x̃: 2.43%
95% mean confidence interval for instructions value: -20.44 3.44
95% mean confidence interval for instructions %-change: -5.78% 0.89%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs:
178116996 ->
178116888 (<.01%)
cycles in affected programs: 6262 -> 6154 (-1.72%)
helped: 2
HURT: 2
helped stats (abs) min: 44 max: 78 x̄: 61.00 x̃: 61
helped stats (rel) min: 3.31% max: 3.94% x̄: 3.62% x̃: 3.62%
HURT stats (abs) min: 6 max: 8 x̄: 7.00 x̃: 7
HURT stats (rel) min: 0.34% max: 0.68% x̄: 0.51% x̃: 0.51%
95% mean confidence interval for cycles value: -93.27 39.27
95% mean confidence interval for cycles %-change: -5.38% 2.27%
Inconclusive result (value mean confidence interval includes 0).
GM45
total instructions in shared programs: 4857887 -> 4857870 (<.01%)
instructions in affected programs: 674 -> 657 (-2.52%)
helped: 2
HURT: 0
total cycles in shared programs:
122180816 ->
122180744 (<.01%)
cycles in affected programs: 3764 -> 3692 (-1.91%)
helped: 1
HURT: 1
helped stats (abs) min: 78 max: 78 x̄: 78.00 x̃: 78
helped stats (rel) min: 3.94% max: 3.94% x̄: 3.94% x̃: 3.94%
HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel) min: 0.34% max: 0.34% x̄: 0.34% x̃: 0.34%
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Ian Romanick [Tue, 9 Jan 2018 23:32:47 +0000 (15:32 -0800)]
nir: See through an fneg to apply existing optimizations
Doing the same for the existing feq and fne transformations didn't help
anything in shader-db.
shader-db results:
Broadwell and Skylake (Skylake shown)
total instructions in shared programs:
14529463 ->
14526147 (-0.02%)
instructions in affected programs: 402420 -> 399104 (-0.82%)
helped: 2136
HURT: 131
helped stats (abs) min: 1 max: 10 x̄: 1.61 x̃: 1
helped stats (rel) min: 0.03% max: 16.22% x̄: 3.14% x̃: 1.12%
HURT stats (abs) min: 1 max: 2 x̄: 1.01 x̃: 1
HURT stats (rel) min: 0.13% max: 7.69% x̄: 0.75% x̃: 0.57%
95% mean confidence interval for instructions value: -1.51 -1.41
95% mean confidence interval for instructions %-change: -3.06% -2.78%
Instructions are helped.
total cycles in shared programs:
533146915 ->
533120531 (<.01%)
cycles in affected programs:
10356261 ->
10329877 (-0.25%)
helped: 1933
HURT: 844
helped stats (abs) min: 1 max: 490 x̄: 29.44 x̃: 16
helped stats (rel) min: <.01% max: 28.57% x̄: 3.43% x̃: 1.88%
HURT stats (abs) min: 1 max: 423 x̄: 36.17 x̃: 12
HURT stats (rel) min: <.01% max: 23.75% x̄: 1.90% x̃: 0.59%
95% mean confidence interval for cycles value: -11.78 -7.22
95% mean confidence interval for cycles %-change: -1.98% -1.65%
Cycles are helped.
Haswell
total instructions in shared programs: 9037416 -> 9034106 (-0.04%)
instructions in affected programs: 389831 -> 386521 (-0.85%)
helped: 2184
HURT: 120
helped stats (abs) min: 1 max: 11 x̄: 1.57 x̃: 1
helped stats (rel) min: 0.03% max: 25.00% x̄: 2.73% x̃: 1.02%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.19% max: 7.69% x̄: 0.81% x̃: 0.57%
95% mean confidence interval for instructions value: -1.49 -1.39
95% mean confidence interval for instructions %-change: -2.68% -2.41%
Instructions are helped.
total cycles in shared programs:
84636243 ->
84631628 (<.01%)
cycles in affected programs: 4745058 -> 4740443 (-0.10%)
helped: 1904
HURT: 960
helped stats (abs) min: 1 max: 466 x̄: 30.21 x̃: 18
helped stats (rel) min: 0.02% max: 36.36% x̄: 3.57% x̃: 2.38%
HURT stats (abs) min: 1 max: 1080 x̄: 55.11 x̃: 14
HURT stats (rel) min: 0.02% max: 51.33% x̄: 2.77% x̃: 0.81%
95% mean confidence interval for cycles value: -4.51 1.29
95% mean confidence interval for cycles %-change: -1.64% -1.25%
Inconclusive result (value mean confidence interval includes 0).
LOST: 1
GAINED: 0
Sandy Bridge and Ivy Bridge (Ivy Bridge shown)
total instructions in shared programs:
10018873 ->
10015456 (-0.03%)
instructions in affected programs: 512820 -> 509403 (-0.67%)
helped: 2268
HURT: 162
helped stats (abs) min: 1 max: 11 x̄: 1.62 x̃: 1
helped stats (rel) min: 0.03% max: 25.00% x̄: 2.47% x̃: 0.88%
HURT stats (abs) min: 1 max: 4 x̄: 1.59 x̃: 1
HURT stats (rel) min: 0.09% max: 7.69% x̄: 0.86% x̃: 0.50%
95% mean confidence interval for instructions value: -1.46 -1.35
95% mean confidence interval for instructions %-change: -2.38% -2.12%
Instructions are helped.
total cycles in shared programs:
87538223 ->
87524771 (-0.02%)
cycles in affected programs: 5435520 -> 5422068 (-0.25%)
helped: 1916
HURT: 946
helped stats (abs) min: 1 max: 1392 x̄: 29.44 x̃: 18
helped stats (rel) min: <.01% max: 34.51% x̄: 3.34% x̃: 1.97%
HURT stats (abs) min: 1 max: 633 x̄: 45.41 x̃: 11
HURT stats (rel) min: 0.02% max: 25.95% x̄: 2.41% x̃: 0.62%
95% mean confidence interval for cycles value: -7.34 -2.06
95% mean confidence interval for cycles %-change: -1.62% -1.26%
Cycles are helped.
LOST: 1
GAINED: 0
Iron Lake
total instructions in shared programs: 7888446 -> 7886959 (-0.02%)
instructions in affected programs: 331581 -> 330094 (-0.45%)
helped: 1160
HURT: 97
helped stats (abs) min: 1 max: 10 x̄: 1.37 x̃: 1
helped stats (rel) min: 0.02% max: 9.68% x̄: 0.93% x̃: 0.43%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.17% max: 4.17% x̄: 0.37% x̃: 0.25%
95% mean confidence interval for instructions value: -1.25 -1.12
95% mean confidence interval for instructions %-change: -0.91% -0.75%
Instructions are helped.
total cycles in shared programs:
178130766 ->
178116996 (<.01%)
cycles in affected programs:
12534564 ->
12520794 (-0.11%)
helped: 1856
HURT: 187
helped stats (abs) min: 2 max: 202 x̄: 7.78 x̃: 4
helped stats (rel) min: <.01% max: 6.47% x̄: 0.28% x̃: 0.11%
HURT stats (abs) min: 2 max: 26 x̄: 3.55 x̃: 2
HURT stats (rel) min: 0.01% max: 2.14% x̄: 0.08% x̃: 0.02%
95% mean confidence interval for cycles value: -7.41 -6.07
95% mean confidence interval for cycles %-change: -0.28% -0.22%
Cycles are helped.
GM45
total instructions in shared programs: 4858912 -> 4857887 (-0.02%)
instructions in affected programs: 237565 -> 236540 (-0.43%)
helped: 867
HURT: 57
helped stats (abs) min: 1 max: 10 x̄: 1.25 x̃: 1
helped stats (rel) min: 0.02% max: 9.38% x̄: 0.87% x̃: 0.43%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.16% max: 3.85% x̄: 0.34% x̃: 0.22%
95% mean confidence interval for instructions value: -1.18 -1.04
95% mean confidence interval for instructions %-change: -0.88% -0.71%
Instructions are helped.
total cycles in shared programs:
122189118 ->
122180816 (<.01%)
cycles in affected programs: 8776418 -> 8768116 (-0.09%)
helped: 1213
HURT: 166
helped stats (abs) min: 2 max: 202 x̄: 7.30 x̃: 4
helped stats (rel) min: <.01% max: 6.43% x̄: 0.25% x̃: 0.11%
HURT stats (abs) min: 2 max: 26 x̄: 3.35 x̃: 2
HURT stats (rel) min: 0.01% max: 2.14% x̄: 0.06% x̃: 0.02%
95% mean confidence interval for cycles value: -6.78 -5.26
95% mean confidence interval for cycles %-change: -0.24% -0.18%
Cycles are helped.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Timothy Arceri [Mon, 15 Jan 2018 00:48:16 +0000 (11:48 +1100)]
st/glsl_to_nir: disable io lowering and array splitting of fs inputs
We need this to be able to support the interpolateAt builtins in a
sane way. It also leads to the generation of more optimal code.
The lowering and splitting is made conditional on lower_all_io_to_temps
because vc4 and freedreno both expect these passes to be enabled and
niether support glsl 400 so don't need to deal with the interpolateAt
builtins.
We leave the other stages for now as to avoid regressions. Ideally we
could remove the stage checks and just set the nir options correctly
for each stage. However all gallium drivers currently just use return
the same nir compiler options for all stages, and it's probably more
trouble than its worth to change this.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Mon, 29 Jan 2018 23:55:19 +0000 (10:55 +1100)]
nir: add lower_all_io_to_temps flag
This will be used for freedreno and vc4 which require all inputs
and outputs to be copied to temps.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Fri, 19 Jan 2018 02:05:35 +0000 (13:05 +1100)]
nir/st_glsl_to_nir: add param to disable splitting of inputs
We need this because we will always copy fs outputs to temps and
split the arrays, but do not want to do either of these with fs
inputs as it is unnessisary and makes handling interpolateAt
builtins difficult.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Tue, 30 Jan 2018 00:51:31 +0000 (11:51 +1100)]
st/glsl_to_nir: copy nir compiler options to context
Various nir passes may expect this to be here as does the nir
serialisation pass.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Mon, 15 Jan 2018 00:45:37 +0000 (11:45 +1100)]
radeonsi/nir: add input support for arrays that have not been copied to temps and split
We need this to be able to support the interpolateAt builtins in a
sane way. It also leads to the generation of more optimal code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Sun, 14 Jan 2018 09:54:20 +0000 (20:54 +1100)]
ac/radeonsi: add lookup_interp_param and load_sample_position to the abi
This will enable the interpolateAt builtins to work on the radeonsi
nir backend.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Sun, 14 Jan 2018 09:51:35 +0000 (20:51 +1100)]
radeonsi/nir: add prim_mask to the abi
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Sun, 14 Jan 2018 09:49:40 +0000 (20:49 +1100)]
radeonsi/nir: adjust load_sample_position() to be shared between backends
With this interface change it can be shared between the tgsi and
nir backends.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Tue, 30 Jan 2018 03:54:13 +0000 (14:54 +1100)]
radeonsi/nir: add si_nir_lookup_interp_param() helper
Reviewed-by: Marek Olšák <marek.olsak@amd.com>