Brian Paul [Wed, 6 Jan 2016 01:11:14 +0000 (18:11 -0700)]
st/mesa: be more careful about state validation in st_Bitmap()
If the only dirty state is mesa's _NEW_PROGRAM_CONSTANTS flag, we can
skip state validation before drawing a bitmap since that state doesn't
affect bitmap rendering.
This further increases the performance of the ipers demo on llvmpipe
to about what it was before commit 36c93a6fae27561.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Brian Paul [Wed, 6 Jan 2016 01:28:57 +0000 (18:28 -0700)]
st/mesa: move bitmap cache flushing out of state validation
Just do it where needed (before drawing, clearing, etc).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Brian Paul [Wed, 6 Jan 2016 00:10:12 +0000 (17:10 -0700)]
st/mesa: check state->mesa in early return check in st_validate_state()
We were checking the dirty->st flags but not the dirty->mesa flags.
When we took the early return, we didn't clear the dirty->mesa flags
so the next time we called st_validate_state() we'd often flush the
glBitmap cache. And since st_validate_state() is called from
st_Bitmap(), it meant we flushed the bitmap cache for every glBitmap()
call.
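A minimal sketch of the early-return check described above, assuming a simplified pair of dirty-flag words. The struct dirty_flags type and the validate_state() helper are hypothetical stand-ins, not the actual state tracker code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the state tracker's dirty-flag pair. */
struct dirty_flags {
   uint32_t mesa;  /* core Mesa _NEW_* style bits */
   uint64_t st;    /* state tracker ST_NEW_* style bits */
};

/* Returns true if validation work was done, false on the early return. */
static bool
validate_state(struct dirty_flags *dirty)
{
   assert(dirty);

   /* Test both words. Testing only dirty->st would take the early return
    * while stale dirty->mesa bits are still set, so the next call would
    * see them again. */
   if (dirty->st == 0 && dirty->mesa == 0)
      return false;

   /* ... derived state would be updated here ... */

   dirty->mesa = 0;
   dirty->st = 0;
   return true;
}
```

With this shape, a second call made right after a full validation takes the early return and leaves nothing dirty behind.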
This change seems to recover most of the performance loss observed
with the ipers demo on llvmpipe since commit 36c93a6fae27561.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Brian Paul [Wed, 6 Jan 2016 00:26:29 +0000 (17:26 -0700)]
st/mesa: protect debug printf() with a conditional instead of comment
Brian Paul [Wed, 6 Jan 2016 00:38:00 +0000 (17:38 -0700)]
st/mesa: fix comment indentation in st_flush_bitmap_cache()
Timothy Arceri [Wed, 6 Jan 2016 09:22:46 +0000 (20:22 +1100)]
glsl: fix varying slot allocation for blocks and structs with explicit locations
Previously each member was being counted as using a single slot;
using count_attribute_slots() fixes the count for array and struct members.
Also don't assign a negative value to the unsigned expl_location variable.
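To illustrate why counting one slot per member is wrong, here is a rough sketch of a recursive slot count. The struct type layout and count_slots() are hypothetical simplifications and are not the real count_attribute_slots() implementation; every non-aggregate type is assumed to take exactly one slot:

```c
#include <assert.h>

/* Hypothetical, much simplified type description. */
enum kind { KIND_VEC4, KIND_ARRAY, KIND_STRUCT };

struct type {
   enum kind kind;
   unsigned length;                    /* array length or member count */
   const struct type *element;         /* element type for arrays */
   const struct type *const *members;  /* member types for structs */
};

static unsigned
count_slots(const struct type *t)
{
   assert(t);

   switch (t->kind) {
   case KIND_ARRAY:
      /* An array needs one run of slots per element. */
      return t->length * count_slots(t->element);
   case KIND_STRUCT: {
      /* A struct needs the sum of its members' slots. */
      unsigned n = 0;
      for (unsigned i = 0; i < t->length; i++)
         n += count_slots(t->members[i]);
      return n;
   }
   case KIND_VEC4:
   default:
      return 1;
   }
}
```

Under these assumptions a struct holding a vec4 and a vec4[3] needs four slots, where the old one-slot-per-member count would have reserved only two.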
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Timothy Arceri [Tue, 15 Dec 2015 05:23:29 +0000 (16:23 +1100)]
glsl: don't try adding built-ins to explicit locations bitmask
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Timothy Arceri [Tue, 15 Dec 2015 05:40:26 +0000 (16:40 +1100)]
glsl: fix overlapping of varying locations for arrays and structs
Previously we were only reserving a single location for arrays and
structs.
We also didn't take into account implicit locations clashing with
explicit locations when assigning locations for their arrays or
structs.
This patch fixes both issues.
V5: fix regression for patch inputs/outputs in tessellation shaders
V4: just use count_attribute_slots() to get the number of slots,
also calculate the correct number of slots to reserve for gs and
tess stages by making use of the new get_varying_type() helper.
V3: handle arrays of structs
V2: also fix for arrays of arrays and structs.
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Timothy Arceri [Fri, 18 Dec 2015 02:53:27 +0000 (13:53 +1100)]
glsl: create helper to remove outer vertex index array used by some stages
This will be used in the following patch for calculating array sizes correctly
when reserving explicit varying locations.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Timothy Arceri [Mon, 21 Dec 2015 23:14:45 +0000 (10:14 +1100)]
glsl: remove unused varyings before packing them
Previously we would pack varyings before trying to remove them. This
relied on the packing pass not packing varyings with a location of -1
to avoid packing varyings that should be removed.
However this meant unused varyings with an explicit location would be
packed before they could be removed when we enable packing of them in a
later patch.
V2: fix regression in V1 removing unused varyings in multi-stage SSO,
fix regression with single stage programs.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Krzysztof Sobiecki [Tue, 29 Dec 2015 19:27:44 +0000 (20:27 +0100)]
gallium/r600: Replace ALIGN_DIVUP with DIV_ROUND_UP
ALIGN_DIVUP is a driver-specific (r600g) macro that duplicates DIV_ROUND_UP functionality.
Replacing it with DIV_ROUND_UP eliminates this duplication.
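For reference, a round-up division macro of this kind is usually written as below. This is a sketch that assumes positive divisors and no overflow in A + B - 1; tiles_for_width() is a hypothetical caller added only for illustration:

```c
#include <assert.h>

/* Integer division rounding up. Assumes B > 0 and that A + B - 1
 * does not overflow. */
#define DIV_ROUND_UP(A, B) (((A) + (B) - 1) / (B))

/* Hypothetical helper: number of tiles needed to cover a given width. */
static unsigned
tiles_for_width(unsigned width, unsigned tile_width)
{
   assert(tile_width > 0);
   return DIV_ROUND_UP(width, tile_width);
}
```

A width of 65 with 64-wide tiles therefore needs two tiles, while a width of exactly 64 needs one.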
Signed-off-by: Krzysztof A. Sobiecki <sobkas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Eric Anholt [Wed, 6 Jan 2016 20:48:19 +0000 (12:48 -0800)]
vc4: Fix driver build from last minute rebase fix.
I had the driver all tested for the last series, and in my last build I
noticed that get_swizzled_channel was unused now, and removed
it... apparently without testing to find that I removed the wrong channel
swizzle function.
Eric Anholt [Wed, 6 Jan 2016 01:18:09 +0000 (17:18 -0800)]
vc4: Optimize out a comparison for bcsel based on an ALU comparison
We routinely have code like:
vec1 ssa_220 = fge ssa_104, ssa_61
vec1 ssa_199 = bcsel ssa_220, ssa_106, ssa_105
and we would compare fge's args and choose between ~0 and 0 to generate
ssa_220, then compare ssa_220 to 0 and choose between bcsel's args.
Instead, try to notice the pattern and compare between fge's args to
select between bcsel's args.
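The two forms can be compared in plain C. This is only a scalar illustration of the pattern, not the QIR code the driver emits, and both function names are made up for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Two-step form: materialize the boolean as ~0 or 0 (fge), then compare
 * it against 0 to pick an argument (bcsel). */
static float
select_two_step(float a, float b, float x, float y)
{
   uint32_t cond = (a >= b) ? ~0u : 0u;
   return (cond != 0) ? x : y;
}

/* Fused form: compare fge's arguments directly and select between
 * bcsel's arguments with that single comparison. */
static float
select_fused(float a, float b, float x, float y)
{
   return (a >= b) ? x : y;
}

/* Small self-check that both forms agree on a few inputs. */
static void
check_select_forms(void)
{
   assert(select_two_step(2.0f, 1.0f, 10.0f, 20.0f) ==
          select_fused(2.0f, 1.0f, 10.0f, 20.0f));
   assert(select_two_step(1.0f, 2.0f, 10.0f, 20.0f) ==
          select_fused(1.0f, 2.0f, 10.0f, 20.0f));
}
```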
total instructions in shared programs: 88019 -> 87574 (-0.51%)
instructions in affected programs: 9985 -> 9540 (-4.46%)
total estimated cycles in shared programs: 245752 -> 245237 (-0.21%)
estimated cycles in affected programs: 17232 -> 16717 (-2.99%)
Eric Anholt [Wed, 6 Jan 2016 00:36:28 +0000 (16:36 -0800)]
vc4: Add missing sRGB decode to texel fetches.
We only see txf on MSAA textures, currently, and apparently this didn't
impact any of our piglit tests.
Eric Anholt [Wed, 6 Jan 2016 00:25:07 +0000 (16:25 -0800)]
vc4: Add support for GL_ARB_texture_swizzle.
We already had the code supporting it, since it's needed for the depth
mode when doing shadow comparisons.
Eric Anholt [Sat, 19 Dec 2015 03:15:03 +0000 (19:15 -0800)]
vc4: Use NIR texture lowering for texture swizzling.
We can't use its other features currently (mostly because we don't want
Newton-Raphson on rcps for texture coordinates), but it gets us started.
This eliminates some comparisons with constants in GLB2.7 and ETQW traces
at the QIR level by moving the comparisons into NIR, where they get
constant-folded out.
instructions in affected programs: 165 -> 156 (-5.45%)
total uniforms in shared programs: 32087 -> 32085 (-0.01%)
total estimated cycles in shared programs: 245762 -> 245752 (-0.00%)
estimated cycles in affected programs: 461 -> 451 (-2.17%)
Eric Anholt [Tue, 22 Dec 2015 21:37:36 +0000 (13:37 -0800)]
vc4: Replace the SSA-style SEL operators with conditional MOVs.
I'm moving away from QIR being SSA (since NIR is doing lots of SSA
optimization for us now) and instead having QIR just be QPU operations
with virtual registers. By making our SELs be composed of two MOVs, we
could potentially coalesce the registers for the MOV's src and dst and
eliminate the MOV.
total instructions in shared programs: 88448 -> 88028 (-0.47%)
instructions in affected programs: 39845 -> 39425 (-1.05%)
total estimated cycles in shared programs: 246306 -> 245762 (-0.22%)
estimated cycles in affected programs: 162887 -> 162343 (-0.33%)
Eric Anholt [Mon, 4 Jan 2016 21:56:39 +0000 (13:56 -0800)]
vc4: Don't try the SF coalescing unless it's on a def.
If you want the SF of the value of a register produced from a series of
packing MOVs or conditional MOVs, we can't just SF on the last MOV into
the register.
Edward O'Callaghan [Tue, 5 Jan 2016 10:07:23 +0000 (21:07 +1100)]
gallium/drivers/svga: Use unsigned for loop index
Fix a 's/unsigned int/unsigned/' consistency case while here.
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Edward O'Callaghan [Tue, 5 Jan 2016 10:07:22 +0000 (21:07 +1100)]
gallium/drivers/r600: Use unsigned for loop index
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Edward O'Callaghan [Tue, 5 Jan 2016 10:07:21 +0000 (21:07 +1100)]
gallium/drivers/ilo: Use unsigned for loop index
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Edward O'Callaghan [Tue, 5 Jan 2016 10:07:20 +0000 (21:07 +1100)]
gallium: Use unsigned for loop index
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Edward O'Callaghan [Tue, 5 Jan 2016 10:07:19 +0000 (21:07 +1100)]
gallium/drivers: Remove unnecessary semicolons
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Edward O'Callaghan [Tue, 5 Jan 2016 10:07:18 +0000 (21:07 +1100)]
gallium: Remove unnecessary semicolons
Fix a silly issue with MSVC case fall-through support needing
an extra 'break;'
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Oded Gabbay [Tue, 29 Dec 2015 16:12:35 +0000 (18:12 +0200)]
llvmpipe: Optimize lp_rast_triangle_32_3_16 for POWER8
This patch converts the SSE-optimized lp_rast_triangle_32_3_16()
to VMX/VSX.
I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.
                   FPS/Score
Name               Before   After   Delta
------------------------------------------------
openarena          16.35    16.7    2.14%
xonotic            4.707    4.97    5.57%
glmark2 didn't show a significant (more than 1%) difference.
v2: Make sure code is built only on POWER8 LE machine
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Oded Gabbay [Tue, 29 Dec 2015 16:12:34 +0000 (18:12 +0200)]
llvmpipe: Optimize BUILD_MASK(_LINEAR) for POWER8
This patch converts the SSE-optimized build_mask_32() and
build_mask_linear_32() to VMX/VSX.
I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.
                   FPS/Score
Name               Before   After   Delta
------------------------------------------------
glmark2 (score)    139.8    142.7   2.07%
openarena and xonotic didn't show a significant (more than 1%)
difference.
v2: Make sure code is built only on POWER8 LE machine
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Oded Gabbay [Sun, 13 Dec 2015 15:49:32 +0000 (17:49 +0200)]
llvmpipe: Optimize do_triangle_ccw for POWER8
This patch converts the SSE optimization done in do_triangle_ccw to
VMX/VSX.
I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.
                   FPS/Score
Name               Before   After   Delta
------------------------------------------------
glmark2 (score)    136.6    139.8   2.34%
openarena          16.14    16.35   1.30%
xonotic            4.655    4.707   1.11%
v2:
- Convert loads to use aligned loads
- Make sure code is built only on POWER8 LE machine
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Oded Gabbay [Thu, 3 Dec 2015 07:11:13 +0000 (09:11 +0200)]
llvmpipe: add POWER8 portability file - u_pwr8.h
This file provides a portability layer that will make it easier to convert
SSE-based functions to VMX/VSX-based functions.
All the functions implemented in this file are prefixed using "vec_".
Therefore, when converting from SSE-based function, one needs to simply
replace the "_mm_" prefix of the SSE function being called to "vec_".
Having said that, not all functions could be converted as such, due to the
differences between the architectures. So, where such a direct
conversion hurt performance, I preferred to implement a more ad-hoc
solution. For example, converting _mm_shuffle_epi32 needed to be done
using ad-hoc masks instead of a generic function.
All the functions in this file support both little-endian and big-endian,
but currently the file is built only on POWER8 LE machines.
All of the functions are implemented using the Altivec/VMX intrinsics,
except one where I needed to use inline assembly (due to missing
intrinsic).
v2:
- Use vec_vgbbd instead of __builtin_vec_vgbbd
- Add an aligned load function
- Don't use typeof()
- Make file build only on POWER8 LE machine
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Oded Gabbay [Thu, 3 Dec 2015 07:11:04 +0000 (09:11 +0200)]
configure.ac: Detect if running on POWER8 arch
To determine if we could use special POWER8 assembly directives, we first
need to detect whether we are running on POWER8 architecture. This patch
adds this detection to configure.ac and adds the necessary compilation
flags accordingly.
v2:
- Add option to disable POWER8 instructions generation
- Detect whether building on BE or LE machine and build with
-mpower8-vector only on LE machine
- Make the printed messages more standard
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Kenneth Graunke [Tue, 5 Jan 2016 13:09:46 +0000 (05:09 -0800)]
nir: Add a lower_fdiv option, turn fdiv into fmul/frcp.
The nir_opt_algebraic rule
(('fadd', ('flog2', a), ('fneg', ('flog2', b))), ('flog2', ('fdiv', a, b))),
can produce new fdiv operations, which need to be lowered on i965,
as we don't actually implement fdiv. (Normally, we handle this in
GLSL IR's lower_instructions pass, but in the above case we introduce
an fdiv after that point. So, make NIR do it for us.)
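In scalar C terms the lowering amounts to the following. This is a sketch of the arithmetic only, with a hypothetical function name; the real pass rewrites NIR instructions, and a hardware reciprocal may round differently from a true division:

```c
#include <assert.h>

/* a / b rewritten as a * (1 / b), i.e. fmul(a, frcp(b)). */
static float
lowered_fdiv(float a, float b)
{
   assert(b != 0.0f);      /* division by zero is not modelled in this sketch */

   float rcp = 1.0f / b;   /* stands in for the frcp instruction */
   return a * rcp;         /* fmul */
}
```

For inputs such as 6.0 / 2.0 the result is exact; in general the two-step form can differ from a / b in the last bit.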
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Kenneth Graunke [Tue, 5 Jan 2016 10:54:50 +0000 (02:54 -0800)]
i965: Only turn on ARB_compute_shader if we can write registers.
Compute shaders require reconfiguring the L3 for shared local memory
support. We have to be able to write the L3 registers to do that.
This effectively turns off compute shaders prior to Kernel 4.2.
(Previously, the extension enable was in an API_OPENGL_CORE conditional.
However, that isn't necessary - core Mesa extension handling already
restricts it properly. I've moved it out in this patch.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Kenneth Graunke [Tue, 5 Jan 2016 12:46:33 +0000 (04:46 -0800)]
i965: Use rcp in brw_lower_texture_gradients rather than 1.0 / x.
That's what it's for. Plus, we actually implement rcp.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Timothy Arceri [Wed, 6 Jan 2016 00:27:05 +0000 (11:27 +1100)]
mesa: fix GL_MAX_NAME_LENGTH query for tessellation shaders
This fixes some piglit subtests for ARB_program_interface_query.
V3: remove some of the unnecessary parentheses
V2: fix alignment
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Wed, 23 Dec 2015 03:26:49 +0000 (14:26 +1100)]
glsl: don't change the varying type in validation code
There is a function dedicated to demoting unused varyings; let's
trust it to do its job.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Timothy Arceri [Wed, 23 Dec 2015 03:11:04 +0000 (14:11 +1100)]
glsl: move lowering after matching validation
After lowering the matching flag is_unmatched_generic_inout is lost so
we need to move this validation before lowering.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Timothy Arceri [Wed, 23 Dec 2015 22:50:59 +0000 (09:50 +1100)]
glsl: only add outward facing varyings to resource list for SSO
An SSO program can have multiple stages and we only want to add the externally
facing varyings. The current code was adding both the packed inputs and outputs
for the first and last stage of each program.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Anuj Phogat [Tue, 24 Mar 2015 23:07:40 +0000 (16:07 -0700)]
i965/gen9: Modify the conditions to use blitter on skl+
Conditions modified allow skl+ to use blitter:
- for all tiling formats
- to write data to YF/YS tiled surfaces
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Anuj Phogat [Tue, 10 Nov 2015 23:33:53 +0000 (15:33 -0800)]
i965/gen9: Return false in place of assert in intelEmitCopyBlit()
This allows the fallback paths to handle it correctly.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Anuj Phogat [Tue, 3 Nov 2015 18:31:45 +0000 (10:31 -0800)]
i965/gen9: Remove regions overlap check in fast copy blit
Overlapping blits are undefined in OpenGL anyway, so there is no need
for an overlap check here.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Anuj Phogat [Tue, 28 Jul 2015 17:47:35 +0000 (10:47 -0700)]
i965/gen9: Don't use fast copy blit in case of non power of 2 cpp
Fast copy blit is currently enabled for use only with Yf/Ys tiling.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Ian Romanick [Fri, 18 Dec 2015 01:50:34 +0000 (17:50 -0800)]
i915/i965: Fix typo in perf_debug message
Trivial
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Brian Paul [Tue, 5 Jan 2016 20:04:46 +0000 (13:04 -0700)]
st/mesa: minor indentation fixes
Brian Paul [Tue, 5 Jan 2016 20:03:05 +0000 (13:03 -0700)]
draw: minor indentation fix
Brian Paul [Tue, 5 Jan 2016 20:03:05 +0000 (13:03 -0700)]
mesa: minor clean-up of some memcpy/sizeof() calls in m_matrix.c
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Tue, 5 Jan 2016 20:03:04 +0000 (13:03 -0700)]
util: add debug_dump_ubyte_rgba_bmp()
Like debug_dump_float_rgba_bmp() but takes ubyte values.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Tue, 5 Jan 2016 20:03:04 +0000 (13:03 -0700)]
mesa: check for z=0 in _mesa_Vertex3dv()
It's very rare that a GL app calls glVertex3dv(), but one in particular
calls it a lot, always with Z = 0. Check for that condition and convert
the call into glVertex2f. This reduces VBO memory used and reduces
the number of times we have to switch between float[2] and float[3]
vertex formats in the svga driver. This results in a small but
measurable performance improvement.
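The check can be sketched as follows. The struct vertex_stats counters are hypothetical stand-ins for the real two-component and three-component vertex paths, used here so the dispatch is observable:

```c
#include <assert.h>

/* Hypothetical counters standing in for the 2-float and 3-float
 * vertex entry points. */
struct vertex_stats {
   unsigned emitted2;
   unsigned emitted3;
};

static void
vertex3dv(struct vertex_stats *stats, const double v[3])
{
   assert(stats && v);

   if (v[2] == 0.0)
      stats->emitted2++;  /* would forward (x, y) to the 2-float path */
   else
      stats->emitted3++;  /* would forward (x, y, z) to the 3-float path */
}
```

A two-component vertex implies z = 0, so the two paths describe the same position when the check passes.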
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Tue, 5 Jan 2016 20:03:04 +0000 (13:03 -0700)]
svga: fix test for SVGA_NEW_STIPPLE
We only want to set the SVGA_NEW_STIPPLE dirty flag when the polygon
stipple state changes. Before, we only set the flag when we were
enabling stipple, but not disabling.
We don't really have to add SVGA_NEW_STIPPLE to the dirty FS state
set since it's a subset of SVGA_NEW_RAST, but let's be explicit.
This doesn't fix any known bugs.
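A minimal sketch of the corrected test, assuming a simplified state struct. NEW_STIPPLE, struct raster_state and set_stipple_enable() are illustrative names, not the svga driver's own:

```c
#include <assert.h>
#include <stdbool.h>

#define NEW_STIPPLE (1u << 0)  /* hypothetical dirty bit */

struct raster_state {
   bool stipple_enable;
   unsigned dirty;
};

static void
set_stipple_enable(struct raster_state *rs, bool enable)
{
   assert(rs);

   /* Flag any change in either direction, not only off -> on. */
   if (rs->stipple_enable != enable) {
      rs->stipple_enable = enable;
      rs->dirty |= NEW_STIPPLE;
   }
}
```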
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Tue, 5 Jan 2016 20:03:04 +0000 (13:03 -0700)]
svga: add some comments in svga_state_vs.c
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Tue, 5 Jan 2016 20:03:04 +0000 (13:03 -0700)]
svga: change svga_hw_view_state::dirty to boolean
Since it's a true/false value.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Tue, 5 Jan 2016 20:03:04 +0000 (13:03 -0700)]
svga: avoid emitting redundant SetVertexBuffers() commands
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Tue, 5 Jan 2016 20:03:04 +0000 (13:03 -0700)]
svga: check for no-ops in svga_bind_sampler_states()
and svga_set_sampler_views(). If there's no change, return early
and don't set a SVGA_NEW_x dirty state flag.
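The no-op check might look roughly like this. It is a sketch with a hypothetical struct context and NEW_SAMPLER bit; the real functions track more state than a flat pointer array:

```c
#include <assert.h>
#include <string.h>

#define MAX_SAMPLERS 16
#define NEW_SAMPLER (1u << 0)  /* hypothetical dirty bit */

struct context {
   const void *samplers[MAX_SAMPLERS];
   unsigned dirty;
};

static void
bind_sampler_states(struct context *ctx, unsigned count,
                    const void *const *states)
{
   assert(count <= MAX_SAMPLERS);

   /* Nothing changed: return early without touching the dirty flags. */
   if (memcmp(ctx->samplers, states, count * sizeof(states[0])) == 0)
      return;

   memcpy(ctx->samplers, states, count * sizeof(states[0]));
   ctx->dirty |= NEW_SAMPLER;
}
```

Binding the same set of states twice in a row sets the dirty bit only on the first call.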
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Ilia Mirkin [Tue, 5 Jan 2016 04:28:52 +0000 (23:28 -0500)]
i965: quieten compiler warning about out-of-bounds access
gcc 4.9.3 shows the following warning:
brw_vue_map.c:260:20: warning: array subscript is above array bounds
[-Warray-bounds]
return brw_names[slot - VARYING_SLOT_MAX];
This is because BRW_VARYING_SLOT_COUNT is a valid value for the enum
type. Adding an assert will generate no additional code but will teach
the compiler to not complain.
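The pattern looks roughly like the sketch below. The enum and names are invented for the example and are much smaller than the real varying slot enums:

```c
#include <assert.h>

/* Hypothetical enum: extra driver slots are numbered after SLOT_MAX,
 * and EXTRA_COUNT itself is a valid value of the enum type. */
enum slot {
   SLOT_MAX = 4,
   EXTRA_A = SLOT_MAX,
   EXTRA_B,
   EXTRA_COUNT
};

static const char *
extra_slot_name(enum slot slot)
{
   static const char *const names[EXTRA_COUNT - SLOT_MAX] = {
      "extra_a",
      "extra_b",
   };

   /* Documents the valid range; compiles away when NDEBUG is defined. */
   assert(slot >= SLOT_MAX && slot < EXTRA_COUNT);
   return names[slot - SLOT_MAX];
}
```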
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Julien Isorce [Thu, 9 Apr 2015 12:45:17 +0000 (13:45 +0100)]
build: enable st/va with nouveau driver
vainfo fails in vaDriverInit because "dd_create_screen"
does not reach strcmp(driver_name, "nouveau") code.
Indeed when compiling the va target.c, the macro GALLIUM_NOUVEAU
is not defined.
This patch defines the macro the same way it is done for the dri and
vdpau targets.
Tested with:
./autogen.sh --enable-glx --enable-gles2 --enable-egl --enable-vdpau --enable-glx-tls=yes --enable-va
--with-gallium-drivers=swrast,nouveau --with-dri-drivers=swrast,nouveau --with-egl-platforms=x11
LIBVA_DRIVER_NAME=gallium vainfo
Output:
vainfo: Driver version: mesa gallium vaapi
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileMPEG4Simple : VAEntrypointVLD
VAProfileMPEG4AdvancedSimple : VAEntrypointVLD
VAProfileVC1Simple : VAEntrypointVLD
VAProfileVC1Main : VAEntrypointVLD
VAProfileVC1Advanced : VAEntrypointVLD
VAProfileH264Baseline : VAEntrypointVLD
VAProfileH264Main : VAEntrypointVLD
VAProfileH264High : VAEntrypointVLD
VAProfileNone : VAEntrypointVideoProc
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Julien Isorce [Wed, 23 Dec 2015 09:25:53 +0000 (09:25 +0000)]
nvc0: add support for st/va
- split nvc0_decoder_bsp in begin/next/end
- preserve content buffer when calling nvc0_decoder_bsp_next
- implement pipe_video_codec::begin_frame/end_frame
https://bugs.freedesktop.org/show_bug.cgi?id=89969
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Julien Isorce [Wed, 23 Dec 2015 09:25:52 +0000 (09:25 +0000)]
nouveau: split nouveau_vp3_bsp in begin/next/end
This allows nouveau_vp3_bsp_next to be called multiple times
between one begin/end pair.
It is required to support st/va.
https://bugs.freedesktop.org/show_bug.cgi?id=89969
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
[imirkin: create strparm_bsp function, simplified w0 calculation]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Julien Isorce [Tue, 5 Jan 2016 15:02:47 +0000 (15:02 +0000)]
st/va: count number of slices
The counter was not set but used by the nouveau driver.
It is required otherwise visual output is garbage.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
Ilia Mirkin [Tue, 5 Jan 2016 00:57:11 +0000 (19:57 -0500)]
i965/wm: use binding size for ubo/ssbo when automatic size is unset
This fixes the same tests that commit 8cf2e892f was attempting to fix:
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeOffset
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeSize
as confirmed by Samuel.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Ilia Mirkin [Tue, 5 Jan 2016 00:48:08 +0000 (19:48 -0500)]
Revert "i965/wm: use proper API buffer size for the surfaces."
This reverts commit 8cf2e892fca20c4776b4a07c39918343cb2d4e0e. It's
entirely bogus to attempt to store anything about the binding in the
buffer object itself, which might be bound any number of times.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Nicolai Hähnle [Mon, 4 Jan 2016 22:31:05 +0000 (17:31 -0500)]
st/mesa: make KHR_debug output independent of context creation flags (v2)
Instead, keep track of GL_DEBUG_OUTPUT and (un)install the pipe_debug_callback
accordingly. Hardware drivers can still use the absence of the callback to
skip more expensive operations in the normal case, and users can no longer be
surprised by the need to set the debug flag at context creation time.
v2:
- re-add the proper initialization of debug contexts (Ilia Mirkin)
- silence a potential warning (Ilia Mirkin)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Mon, 4 Jan 2016 16:26:27 +0000 (11:26 -0500)]
nvc0: scale up inter_bo size so that it's 16M for a 4K video
Experimentally, 4M causes corruption and slowness, try to ramp it up
with size instead.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Mon, 4 Jan 2016 16:16:45 +0000 (11:16 -0500)]
nv50,nvc0: fix crash when increasing bsp bo size for h264
H264 doesn't have a bitplane bo. We just need a device reference, so use
the one from the client.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Samuel Iglesias Gonsálvez [Tue, 15 Dec 2015 11:51:48 +0000 (12:51 +0100)]
i965/wm: use proper API buffer size for the surfaces.
Commit 5bb5eeea fixes a bug indicating that the surfaces should have the
API buffer size. However, it picked the wrong value.
This patch adds a new variable, which takes into account
glBindBufferRange() values. This patch fixes the following CTS
regressions:
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeOffset
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeSize
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Marek Olšák [Mon, 28 Dec 2015 00:39:20 +0000 (01:39 +0100)]
radeonsi: remove unused parameter from si_shader_binary_read_config
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 27 Dec 2015 22:22:14 +0000 (23:22 +0100)]
radeonsi: move si_shader_binary_upload out of si_shader_binary_read
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 27 Dec 2015 21:16:05 +0000 (22:16 +0100)]
gallium/radeon: dump LLVM module outside of radeon_llvm_compile
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 27 Dec 2015 20:57:40 +0000 (21:57 +0100)]
gallium/radeon: always add +DumpCode to the LLVM target machine for LLVM <= 3.5
It's the same behavior that we use for later LLVM.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 27 Dec 2015 20:24:47 +0000 (21:24 +0100)]
gallium/radeon: r600_can_dump_shader should get TGSI processor type directly
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 27 Dec 2015 21:24:41 +0000 (22:24 +0100)]
radeonsi: pass TGSI processor type to si_shader_binary_read for dumping
the parameter will be used later
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 27 Dec 2015 21:22:24 +0000 (22:22 +0100)]
radeonsi: pass TGSI processor type to si_compile_llvm for dumping
the parameter will be used later
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 30 Dec 2015 14:04:26 +0000 (15:04 +0100)]
radeonsi: rename shader parameter definitions and variables for more clarity
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Ilia Mirkin [Thu, 29 Oct 2015 06:52:56 +0000 (02:52 -0400)]
nvc0/ir: add support for PK2H/UP2H
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Thu, 29 Oct 2015 06:52:57 +0000 (02:52 -0400)]
st/mesa: use PK2H/UP2H when supported
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ilia Mirkin [Sat, 2 Jan 2016 23:55:48 +0000 (18:55 -0500)]
gallium: add PIPE_CAP_TGSI_PACK_HALF_FLOAT to indicate UP2H/PK2H support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ilia Mirkin [Thu, 29 Oct 2015 21:52:46 +0000 (17:52 -0400)]
tgsi: update PK2H/UP2H channel behavior info
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ilia Mirkin [Thu, 29 Oct 2015 06:52:55 +0000 (02:52 -0400)]
gallium: document PK2H/UP2H
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Samuel Pitoiset [Sun, 3 Jan 2016 17:40:39 +0000 (18:40 +0100)]
st/mesa: fix parameter names for tesseval/tessctrl prototypes
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sun, 3 Jan 2016 16:29:09 +0000 (11:29 -0500)]
nouveau: fix double-const qualifier
Reported by Tom^ on IRC. The original intent was to mark the pointer
constant as well as the data being pointed to, so move the *.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Rob Clark [Sat, 24 Oct 2015 18:54:56 +0000 (14:54 -0400)]
freedreno/ir3: use NIR_PASS helper macros
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 18 Nov 2015 21:33:41 +0000 (16:33 -0500)]
nir: extract out helper macros for running passes
Note these are a bit uglier, due to avoidance of GNU C extensions. But
drivers which do not need to be built with compilers that don't support
the extension can wrap these macros with their own.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rob Clark [Mon, 26 Oct 2015 14:50:35 +0000 (10:50 -0400)]
freedreno/ir3: we require block_index metadata
Found during NIR_TEST_CLONE=1 piglit run. We were using block->index
but forgetting to require it. Causing things to not work with a cloned
shader which didn't preserve block_index.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sat, 24 Oct 2015 18:30:31 +0000 (14:30 -0400)]
freedreno/ir3: refactor NIR IR handling
Immediately convert into NIR and do an initial key-agnostic lowering/
optimization pass. This should let us share most of the per-variant
transformations between each variant, and hopefully minimize the draw-
time variant creation part of the compilation process.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 21 Dec 2015 15:21:29 +0000 (10:21 -0500)]
freedreno/ir3: drop unnecessary unreachable() case
It will still hit a compile_assert() in emit_tex, which has the
advantage of dumping out the offending shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Samuel Pitoiset [Wed, 25 Nov 2015 00:19:16 +0000 (01:19 +0100)]
gallium/tests: fix build with clang compiler
Nested functions are supported as an extension in GNU C, but Clang
doesn't support them.
This fixes compilation errors when (manually) building compute.c,
or by setting --enable-gallium-tests to the configure script.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75165
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Samuel Pitoiset [Wed, 9 Dec 2015 18:53:18 +0000 (19:53 +0100)]
nv50,nvc0: optimize coherent buffer checking at draw time
Instead of iterating over all the buffer resources looking for coherent
buffers, we keep track of a context-wide count. This will save some
iterations (and CPU cycles) in 99.99% of cases, because coherent
buffers are rarely used.
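A minimal sketch of the counting idea, assuming hypothetical types and function names (sketch_context, sketch_buffer, and the sketch_* helpers are illustrative only and are not the nv50/nvc0 code):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's context and buffer types. */
struct sketch_context {
   unsigned num_coherent_bufs; /* context-wide count of bound coherent buffers */
};

struct sketch_buffer {
   bool coherent;
};

/* Call when a buffer gets bound to the context. */
static void
sketch_bind_buffer(struct sketch_context *ctx, const struct sketch_buffer *buf)
{
   if (buf->coherent)
      ctx->num_coherent_bufs++;
}

/* Call when a buffer gets unbound from the context. */
static void
sketch_unbind_buffer(struct sketch_context *ctx, const struct sketch_buffer *buf)
{
   if (buf->coherent) {
      assert(ctx->num_coherent_bufs > 0);
      ctx->num_coherent_bufs--;
   }
}

/* Draw-time check: one comparison instead of a walk over every buffer. */
static bool
sketch_has_coherent_bufs(const struct sketch_context *ctx)
{
   return ctx->num_coherent_bufs > 0;
}
```

The cost moves from draw time to bind/unbind time, where it is a single increment or decrement.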
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Kenneth Graunke [Sat, 2 Jan 2016 06:27:22 +0000 (22:27 -0800)]
i965: Make TCS precompile use the TES primitive mode when available.
If there's a linked TES program, we should just use the actual
primitive mode. If not, just guess triangles (as we did before).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Kenneth Graunke [Mon, 28 Dec 2015 01:26:30 +0000 (17:26 -0800)]
i965: Push most TES inputs in SIMD8 mode.
Using the push model for inputs is much more efficient than pulling
inputs - the hardware can simply copy a large chunk into URB registers
at thread creation time, rather than having the thread send messages to
request data from the L3 cache. Unfortunately, it's possible to have
more TES inputs than fit in registers, so we have to fall back to the
pull model in some cases.
However, it turns out that most tessellation evaluation shaders are
fairly simple, and don't use many inputs. An arbitrary cut-off of
32 vec4 slots (16 registers) is more than sufficient to ensure that
100% of TES inputs are pushed for Shadow of Mordor, Unigine Heaven,
GPUTest/TessMark, and SynMark.
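The "32 vec4 slots (16 registers)" arithmetic can be sketched as follows. This assumes a 32-byte register (eight 32-bit channels) holding two packed vec4s; the constant and function names are hypothetical, and the real driver logic may differ:

```c
#include <assert.h>
#include <stdbool.h>

enum {
   SKETCH_VEC4_BYTES = 16,            /* 4 x 32-bit components */
   SKETCH_REG_BYTES = 32,             /* assumed register size: 8 x 32-bit */
   SKETCH_MAX_PUSHED_VEC4_SLOTS = 32  /* cut-off named in the commit message */
};

/* Number of registers needed to hold the given count of packed vec4 slots. */
static unsigned
sketch_regs_for_vec4_slots(unsigned slots)
{
   unsigned bytes = slots * SKETCH_VEC4_BYTES;

   return (bytes + SKETCH_REG_BYTES - 1) / SKETCH_REG_BYTES;
}

/* True if every input fits under the cut-off and nothing has to be pulled. */
static bool
sketch_all_inputs_pushed(unsigned slots)
{
   return slots <= SKETCH_MAX_PUSHED_VEC4_SLOTS;
}
```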
Note that unlike most SIMD8 stages, this actually reads packed vec4
data, since that is what our vec4 TCS programs write.
Improves performance in GPUTest's tessmark_x64 microbenchmark
by 93.4426% +/- 5.35541% (n = 25) on my Lenovo X250 at 1024x768.
Improves performance in Synmark's Gl40TerrainFlyTess microbenchmark
by 22.74% +/- 0.309394% (n = 5).
Improves performance in Shadow of Mordor at low settings with
tessellation enabled at 1280x720 by 2.12197% +/- 0.478553% (n = 4).
shader-db statistics for files containing tessellation shaders:
total instructions in shared programs: 184358 -> 181181 (-1.72%)
instructions in affected programs: 27971 -> 24794 (-11.36%)
helped: 226
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Mon, 28 Dec 2015 00:14:11 +0000 (16:14 -0800)]
i965: Use LOAD_PAYLOAD for SIMD8 TES input loads, not MOV.
We need a MOV to replicate g0.0<0,1,0> to all 8 channels. Since the
message payload is a single register, MOV seemed more sensible than
LOAD_PAYLOAD. However, MOV cannot be CSE'd, while LOAD_PAYLOAD can.
All input loads can use the same header - we don't need to re-expand
g0 every time. CSE accomplishes this, saving instructions.
shader-db statistics for files containing tessellation shaders:
total instructions in shared programs: 186923 -> 184358 (-1.37%)
instructions in affected programs: 30536 -> 27971 (-8.40%)
helped: 226
HURT: 0
total cycles in shared programs: 1009850 -> 1005356 (-0.45%)
cycles in affected programs: 168206 -> 163712 (-2.67%)
helped: 226
HURT: 0
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Thu, 31 Dec 2015 20:47:19 +0000 (12:47 -0800)]
i965: Move 3-src subnr swizzle handling into the vec4 backend.
While most align16 instructions only support a SubRegNum of 0 or 4
(using swizzling to control the other channels), 3-src instructions
actually support arbitrary SubRegNums. When the RepCtrl bit is set,
we believe it ignores the swizzle and uses the equivalent of a <0,1,0>
region from the subnr.
In the past, we adopted a vec4-centric approach of specifying subnr of
0 or 4 and a swizzle, then having brw_eu_emit.c convert that to a proper
SubRegNum. This isn't a great fit for the scalar backend, where we
don't set swizzles at all, and happily set subnrs in the range [0, 7].
This patch changes brw_eu_emit.c to use subnr and swizzle directly,
relying on the higher levels to set them sensibly.
This should fix problems where scalar sources get copy propagated into
3-src instructions in the FS backend. I've only observed this with
TES push model inputs, but I suppose it could happen in other cases.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Eric Anholt [Sun, 3 Jan 2016 01:33:19 +0000 (17:33 -0800)]
vc4: Fix build from upload changes.
Nicolai Hähnle [Sat, 2 Jan 2016 21:40:47 +0000 (16:40 -0500)]
gallium/radeon: send LLVM diagnostics as debug messages
Diagnostics sent during code generation and the error message reported
by LLVMTargetMachineEmitToMemoryBuffer are disjoint reporting mechanisms. We
take care of both and also send an explicit message indicating failure at the
end, so that log parsers can more easily tell the boundary between shader
compiles.
Removed an fprintf that could never be triggered.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 30 Dec 2015 21:00:56 +0000 (16:00 -0500)]
gallium/radeon: pass pipe_debug_callback into radeon_llvm_compile (v2)
This will allow us to send shader debug info via the context's debug callback.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sat, 2 Jan 2016 21:30:57 +0000 (16:30 -0500)]
radeonsi: send shader info as debug messages in addition to stderr output
The output via stderr is very helpful for ad-hoc debugging tasks, so that remains
unchanged, but having the information available via debug messages as well
will allow the use of parallel shader-db runs.
Shader stats are always provided (if the context is a debug context, that is),
but you still have to enable the appropriate R600_DEBUG flags to get
disassembly (since it is rather spammy and is only generated by LLVM when we
explicitly ask for it).
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 30 Dec 2015 20:02:57 +0000 (15:02 -0500)]
radeonsi: pass pipe_debug_callback down into si_shader_binary_read (v2)
This will allow us to send shader debug info.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 30 Dec 2015 19:55:34 +0000 (14:55 -0500)]
gallium/radeon: implement set_debug_callback
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Marek Olšák [Sat, 19 Dec 2015 16:54:31 +0000 (17:54 +0100)]
u_upload_mgr: allow specifying PIPE_USAGE_* for the upload buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 19 Dec 2015 16:43:48 +0000 (17:43 +0100)]
u_upload_mgr: remove alignment parameter from u_upload_create
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 19 Dec 2015 16:15:02 +0000 (17:15 +0100)]
u_upload_mgr: pass alignment to u_upload_buffer manually
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 19 Dec 2015 16:15:02 +0000 (17:15 +0100)]
u_upload_mgr: pass alignment to u_upload_data manually
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 19 Dec 2015 16:15:02 +0000 (17:15 +0100)]
u_upload_mgr: pass alignment to u_upload_alloc manually
The fixed alignment of u_upload_mgr will go away.
This is the first step.
The motivation is that one u_upload_mgr can have multiple users,
each allocating from the same buffer, but requiring a different alignment.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 19 Dec 2015 15:44:52 +0000 (16:44 +0100)]
u_upload_mgr: rework the application of alignment
The function only aligned the size, but not the offset.
The offset was aligned only when the previous suballocation was aligned.
That yielded the correct offset alignment if the alignment was constant
for all suballocations.
Instead, directly align the offset, but allow an unaligned size.
There is no change in behavior, because the alignment is constant
at the moment.
This is a prerequisite for allowing a variable alignment for suballocations.
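A minimal sketch of the difference between aligning the size and aligning the offset. The sketch_uploader struct and the sketch_* functions are hypothetical and are not the u_upload_mgr API:

```c
#include <assert.h>

/* Round value up to the next multiple of alignment (a power of two). */
static unsigned
sketch_align(unsigned value, unsigned alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

struct sketch_uploader {
   unsigned offset; /* next free byte in the upload buffer */
};

/* Old scheme: pad the size. The returned offset is only aligned if every
 * earlier suballocation used the same alignment. */
static unsigned
sketch_alloc_align_size(struct sketch_uploader *up, unsigned size,
                        unsigned alignment)
{
   unsigned start = up->offset;

   up->offset += sketch_align(size, alignment);
   return start;
}

/* New scheme: align the offset itself and leave the size unaligned. */
static unsigned
sketch_alloc_align_offset(struct sketch_uploader *up, unsigned size,
                          unsigned alignment)
{
   unsigned start = sketch_align(up->offset, alignment);

   up->offset = start + size;
   return start;
}
```

With a constant alignment both schemes return the same offsets. With mixed alignments (for example 4 bytes followed by 16 bytes) only the second one returns a correctly aligned start.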
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>