Marek Olšák [Tue, 28 Nov 2017 19:57:10 +0000 (20:57 +0100)]
radeonsi: fix layered DCC fast clear
Cc: 17.2 17.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Jon Turney [Mon, 27 Nov 2017 13:32:53 +0000 (13:32 +0000)]
util: Also include endian.h on cygwin
If u_endian.h can't determine the endianness, the default behaviour in sha1.c
is to build for big-endian.
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Juan A. Suarez Romero [Wed, 29 Nov 2017 11:09:47 +0000 (12:09 +0100)]
mesa: deal with vs_inputs as 64-bit unsigned integer
Commit 78942e ("mesa: shrink VERT_ATTRIB bitfields to 32 bits") uses
vs_prog_data->vs_inputs as if it were a 32-bit unsigned integer.
But actually it is a 64-bit integer, and as such it is used in other
parts of Mesa code. It is worth noting that bits from the entire range
are used, not only the low 32 bits. This is due to our implementation of
64-bit dual-slot input attributes, which requires a larger bitfield to
manage them.
This commit reverts the changes done in brw_draw_upload.c, keeping the
rest of the changes.
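As a rough illustration of why the width matters, here is a minimal sketch. It is not the code in brw_draw_upload.c; the helper names and the bit position used in the usage note are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Set one bit in a 64-bit input mask. The constant has to be 64 bits wide
 * before the shift; a plain "1 << bit" is an int shift and is undefined
 * once bit reaches the width of int. */
uint64_t mark_input(uint64_t inputs, unsigned bit)
{
   assert(bit < 64);
   return inputs | (UINT64_C(1) << bit);
}

/* What is left of the mask after it passes through a 32-bit variable. */
uint32_t truncate_inputs(uint64_t inputs)
{
   return (uint32_t)inputs;
}
```

With these helpers, truncate_inputs(mark_input(0, 35)) is 0: any attribute tracked in the upper half of the mask silently disappears.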
This fixes the following tests:
- KHR-GL45.enhanced_layouts.varying_array_locations
- KHR-GL45.enhanced_layouts.varying_locations
Fixes: 78942e ("mesa: shrink VERT_ATTRIB bitfields to 32 bits")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103942
CC: Marek Olšák <marek.olsak@amd.com>
CC: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Timothy Arceri [Thu, 15 Jun 2017 23:56:56 +0000 (09:56 +1000)]
mesa: rework _mesa_add_parameter() to only add a single param
This is more in line with what the function's name suggests it should
do, and makes the code much easier to follow.
This will also make adding uniform packing support much simpler.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Dave Airlie [Wed, 29 Nov 2017 03:13:17 +0000 (13:13 +1000)]
r600: lds load cleanups.
These are just some cleanups on top of the last patch from my compute branch.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Gert Wollny [Wed, 15 Nov 2017 09:29:12 +0000 (10:29 +0100)]
r600_shader: only load from LDS what is really used
Use the destination write mask to determine which values are really to be
read from LDS and load only these.
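The idea can be sketched roughly as follows. This is an illustration only, not the r600 shader code; components_to_load and its output array are hypothetical, and the mask is assumed to use bit 0 for x through bit 3 for w.

```c
#include <assert.h>

/* Collect the components (0 = x ... 3 = w) that the destination write mask
 * selects; only these would need an LDS read. Returns how many there are. */
unsigned components_to_load(unsigned write_mask, unsigned out[4])
{
   assert((write_mask & ~0xFu) == 0);   /* only four components exist */

   unsigned count = 0;
   for (unsigned i = 0; i < 4; i++) {
      if (write_mask & (1u << i))
         out[count++] = i;
   }
   return count;
}
```

A write mask of 0x5 (x and z) yields two reads instead of four.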
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Dave Airlie [Sun, 26 Nov 2017 23:36:39 +0000 (23:36 +0000)]
r600/sb: handle jump after target to end of program. (v2)
This fixes hangs on cayman with
tests/spec/arb_tessellation_shader/execution/trivial-tess-gs_no-gs-inputs.shader_test
This has a single if/else in it, and when this peephole activated,
it would set the jump target to NULL if there was no instruction
after the final POP. This adds a NOP if we get a jump in this case,
and seems to fix the hangs, so we have a valid target for the ELSE
instruction to go to, instead of 0 (which causes infinite loops).
v2: update last_cf correctly. (I had some other patches hide this)
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Kenneth Graunke [Tue, 28 Nov 2017 16:12:45 +0000 (08:12 -0800)]
i965: Change a ret == -1 check to ret != 0.
For consistency with most other ret checks. Suggested by Chris.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Kenneth Graunke [Sun, 26 Nov 2017 09:14:26 +0000 (01:14 -0800)]
i965: Use C99 struct initializers in brw_bufmgr.c.
This is cleaner than using a non-standard memclear macro (which does a
memset to 0) and then initializing fields after the fact. We move the
declarations to where we initialized the fields. While we're at it, we
move the declaration of 'ret' that goes with the ioctl, eliminating the
declaration section altogether.
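The pattern looks roughly like this. The struct and the ioctl stand-in below are hypothetical and only mirror the shape of the change; they are not the real DRM structures or the code in brw_bufmgr.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ioctl argument struct, standing in for the real ones. */
struct example_gem_create {
   uint64_t size;
   uint32_t handle;
   uint32_t pad;
};

/* Stand-in for the ioctl: hands back a handle and reports success (0). */
int example_ioctl(struct example_gem_create *arg)
{
   assert(arg->pad == 0);   /* members not named by the caller arrive zeroed */
   arg->handle = 1;
   return 0;
}

uint32_t create_buffer(uint64_t size)
{
   /* Declared and initialized at the point of use. Every member that is
    * not named here (handle, pad) is zero-initialized by the language, so
    * no separate memset is needed. */
   struct example_gem_create create = { .size = size };

   int ret = example_ioctl(&create);
   return ret != 0 ? 0 : create.handle;
}
```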
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Kenneth Graunke [Sun, 26 Nov 2017 09:42:11 +0000 (01:42 -0800)]
i965: Move perf_debug and WARN_ONCE back to brw_context.h.
These were moved to src/intel/common/gen_debug.h, but they are not
common code. They assume that brw_context or gl_context variables
exist, named brw or ctx. That isn't remotely true outside of i965.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Eric Engestrom [Mon, 27 Nov 2017 13:46:43 +0000 (13:46 +0000)]
i965: const a few structs and vars to avoid writing to them by accident
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Sun, 26 Nov 2017 00:59:27 +0000 (16:59 -0800)]
i965: Fix Smooth Point Enables.
We want to program the 3DSTATE_RASTER field to the gl_context value,
not the other way around.
Fixes: 13ac46557ab1 (i965: Port Gen8+ 3DSTATE_RASTER state to genxml.)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Dylan Baker [Thu, 26 Oct 2017 22:45:40 +0000 (15:45 -0700)]
meson: build virgl driver
Build tested only.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Dylan Baker [Thu, 26 Oct 2017 21:19:19 +0000 (14:19 -0700)]
meson: build svga driver on linux
Build tested only.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Dylan Baker [Thu, 26 Oct 2017 01:55:38 +0000 (18:55 -0700)]
meson: build r600 driver
v4: - Ensure inc_amd_common defined when radeonsi is disabled (needed by
r600)
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Tested-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Dylan Baker [Thu, 26 Oct 2017 00:59:11 +0000 (17:59 -0700)]
meson: build r300 driver
This is build tested only
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Dylan Baker [Wed, 25 Oct 2017 23:54:53 +0000 (16:54 -0700)]
meson: build i915g driver
Build tested only.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Brian Paul [Tue, 21 Nov 2017 14:31:57 +0000 (07:31 -0700)]
svga: move svga_is_format_supported() to svga_format.c
where the other format-related functions live.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Tue, 21 Nov 2017 14:27:06 +0000 (07:27 -0700)]
svga: s/unsigned/SVGA3dDevCapIndex/
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Lionel Landwerlin [Thu, 9 Nov 2017 16:40:55 +0000 (16:40 +0000)]
i965: perf: add support for CoffeeLake GT3
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 31 Aug 2017 10:28:30 +0000 (11:28 +0100)]
i965: perf: add support for CoffeeLake GT2
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 9 Nov 2017 16:51:26 +0000 (16:51 +0000)]
i965: perf: add busyness metric sets on gen8/9 platforms
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 9 Nov 2017 16:48:45 +0000 (16:48 +0000)]
i965: fix time elapsed counter equations in VME/Media configs
There was a mistake just in those metric sets. We probably didn't
notice because they're not really interesting for 3D workloads.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 9 Nov 2017 16:46:47 +0000 (16:46 +0000)]
i965: perf: update counter names on gen8/9 platforms
Just fixing names.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Tue, 29 Aug 2017 09:41:27 +0000 (10:41 +0100)]
i965: add a debug option to disable oa config loading
This provides a good way to verify we haven't broken using the perf
driver on older kernels (which don't have the oa config loading
mechanism).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Tue, 25 Jul 2017 16:22:58 +0000 (17:22 +0100)]
i965: perf: add support for userspace configurations
This allows us to deploy new configurations without touching the
kernel.
v2: Detect loadable configs without creating one (Chris)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 31 Aug 2017 10:04:28 +0000 (11:04 +0100)]
i965: perf: update configs for loading from userspace
When making configs loadable from userspace in the kernel, we left
userspace more responsibility for programming some registers. In
particular, one register we used to set directly in the driver has now
been moved into the configs.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Engestrom [Mon, 27 Nov 2017 11:33:48 +0000 (11:33 +0000)]
util: add mesa-sha1 test to meson
Fixes: 513d7ffa23d42e96f831 "util: Add a SHA1 unit test program"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Eric Engestrom [Fri, 24 Nov 2017 18:00:57 +0000 (18:00 +0000)]
compiler: fix typo
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Eric Engestrom [Thu, 23 Nov 2017 13:16:43 +0000 (13:16 +0000)]
compiler: use NDEBUG to guard asserts
nir_validate.c's #endif already had the correct NDEBUG comment
Fixes: dcb1acdea00a8f2c29777 "nir/validate: Only build in debug mode"
Fixes: 9ff71b649b4b3808a9e17 "i965/nir: Validate that NIR passes call nir_metadata_preserve()"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Eric Engestrom [Fri, 24 Nov 2017 17:59:23 +0000 (17:59 +0000)]
broadcom: use NDEBUG to guard asserts
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Engestrom [Fri, 24 Nov 2017 16:58:43 +0000 (16:58 +0000)]
vc4: check preprocessor token existence using #ifdef instead of #if
(other uses of USE_VC4_SIMULATOR are already correct)
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ben Crocker [Mon, 27 Nov 2017 19:44:59 +0000 (14:44 -0500)]
docs/llvmpipe.html: Minor edits
Language and spelling fixups in three places.
Cc: "17.2" "17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
[Eric: move two fixes from the other patch to this one.]
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Eric Engestrom [Fri, 24 Nov 2017 10:49:25 +0000 (10:49 +0000)]
st/dri: replace hard-coded array size with ARRAY_SIZE()
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 16 Nov 2017 16:23:43 +0000 (17:23 +0100)]
radeonsi/gfx9: simplify condition for on-chip ESGS
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sat, 18 Nov 2017 13:33:34 +0000 (14:33 +0100)]
radeonsi: clarify that si_shader_selector::esgs_itemsize is set for the ES part
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 16 Nov 2017 15:56:21 +0000 (16:56 +0100)]
radeonsi: use si_shader_context instead of lp_build_context in more places
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 16 Nov 2017 06:33:34 +0000 (07:33 +0100)]
radeonsi: cleanup si_initialize_color_surface
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 19 Nov 2017 16:26:45 +0000 (17:26 +0100)]
radeonsi: avoid attempting to create CMASK if the tiling mode doesn't have it
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 14 Nov 2017 08:37:38 +0000 (09:37 +0100)]
radeonsi: check that we don't leak fine.buf references
Just as an added precaution.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 19 Nov 2017 15:09:28 +0000 (16:09 +0100)]
ac/surface: fix indentation
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 9 Nov 2017 09:59:22 +0000 (10:59 +0100)]
amd/common: sid.h cleanups
Fix a bunch of labels indicating when registers were added/removed
and normalize the SI-class GRBM_GFX_INDEX.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 17 Nov 2017 19:01:50 +0000 (20:01 +0100)]
st_glsl_to_tgsi: check for the tail sentinel in merge_two_dsts
This fixes yet another case where DFRACEXP has only one destination. Found
by address sanitizer.
Fixes tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-frexp-dvec4-only-mantissa.shader_test
Fixes: 3b666aa74795 ("st/glsl_to_tgsi: fix DFRACEXP with only one destination")
Acked-by: Marek Olšák <marek.olsak@amd.com>
Tapani Pälli [Mon, 20 Nov 2017 13:00:19 +0000 (15:00 +0200)]
mesa/gles: adjust internal format in glTexSubImage2D error checks
When floating point textures are created on OpenGL ES 2.0, the driver
is free to choose the internal format it uses. Mesa makes this decision in
adjust_for_oes_float_texture. Error checking for glTexImage2D properly
checks that sized formats are not used. We use the same error checking
path for glTexSubImage2D (since there is a lot of overlap); however, since
those checks include internalFormat checks, we need to pass the original
internalFormat passed by the client. This patch adds
oes_float_internal_format, which reverses adjust_for_oes_float_texture to
get that format.
Fixes the following test failure:
ES2-CTS.gtf.GL2ExtensionTests.texture_float.texture_float
(when running test with MESA_GLES_VERSION_OVERRIDE=2.0)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103227
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Tue, 28 Nov 2017 02:28:51 +0000 (18:28 -0800)]
radv: Use the suffixed versions of VK_QUEUE_GLOBAL_PRIORITY_*
Acked-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Tue, 28 Nov 2017 02:26:21 +0000 (18:26 -0800)]
vulkan: Update the XML and headers to 1.0.66
Acked-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Sat, 11 Nov 2017 20:31:54 +0000 (12:31 -0800)]
intel/blorp: Drop blorp_resolve_ccs_attachment
The only reason why we needed that version was because the Vulkan driver
needed to be able to create the surface states so it could handle
indirect clear colors. Now that blorp handles them natively, there's no
need for the extra entrypoint.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Sat, 11 Nov 2017 20:22:45 +0000 (12:22 -0800)]
anv: Let blorp handle indirect clear colors for CCS resolves
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Sat, 11 Nov 2017 20:12:57 +0000 (12:12 -0800)]
anv: Move get_fast_clear_state_address into anv_private.h
While we're at it, we break it into two nicely named functions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Sat, 11 Nov 2017 19:26:23 +0000 (11:26 -0800)]
intel/blorp: Take a range of layers in blorp_ccs_resolve
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Sat, 11 Nov 2017 19:10:59 +0000 (11:10 -0800)]
intel/blorp: Add initial support for indirect clear colors
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Mon, 13 Nov 2017 02:31:56 +0000 (18:31 -0800)]
i965/blorp: Use a designated initializer for blorp_surf
This way uninitialized fields get automatically zeroed and it's safe to
add more fields to blorp_surf.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Sat, 11 Nov 2017 23:44:23 +0000 (15:44 -0800)]
intel/blorp: Add fast-clear to the special case in MSAA resolves
This doesn't go all the way to avoiding the txf_ms if it's fast-cleared,
however it does at least make us only do it once. This should improve
performance of MSAA resolves in the presence of lots of clear color.
Without the patch, enabling fast-clears in the multisampling Sascha demo
drops the framerate by about 10%. With this patch, enabling fast-clears
increases the demo's framerate by 25%.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Sat, 11 Nov 2017 23:42:51 +0000 (15:42 -0800)]
intel/blorp/blit: Rename blorp_nir_txf_ms_mcs
That name is already taken by one of the helpers in blorp_nir_builder.h
and, while we haven't moved the guts of blorp_blit.c there yet, we'd
like to start using some things from that header.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Rob Herring [Mon, 27 Nov 2017 19:32:19 +0000 (13:32 -0600)]
Android: disable warnings causing errors
AOSP master has changed the build default to -Werror making all the
warnings errors. Override that with -Wno-error.
Signed-off-by: Rob Herring <robh@kernel.org>
Timothy Arceri [Mon, 27 Nov 2017 05:25:11 +0000 (16:25 +1100)]
st/glsl_to_tgsi: make use of driver_cache_blob with the disk cache
driver_cache_blob was introduced with the i965 disk cache; it allows
us to simplify the cache a little and possibly offers some minor
speed improvements since we load the GLSL metadata and TGSI from
disk in one pass.
Using driver_cache_blob should also make it straightforward to
implement binary support for ARB_get_program_binary in gallium.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Gwan-gyeong Mun [Sat, 25 Nov 2017 14:08:23 +0000 (23:08 +0900)]
glsl: Fix typo nagivation -> navigation
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Emil Velikov [Thu, 23 Nov 2017 18:51:14 +0000 (18:51 +0000)]
gl_table.py: add extern C guard for the generated glapitable.h
The header can be included from C++, hence contents should have
appropriate notation.
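For reference, the usual shape of such a guard is sketched below. The include-guard name and the single table entry are placeholders, not the real contents of the generated glapitable.h.

```c
/* Shape of a generated header that may be pulled into C++ sources. */
#ifndef EXAMPLE_GLAPITABLE_H
#define EXAMPLE_GLAPITABLE_H

#ifdef __cplusplus
extern "C" {
#endif

/* Placeholder for the dispatch table; the real one has many entries. */
struct example_glapi_table {
   void (*NewList)(unsigned list, unsigned mode);
};

int example_table_entries(void);

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_GLAPITABLE_H */

/* Definition that would normally live in a separate .c file. */
int example_table_entries(void)
{
   return (int)(sizeof(struct example_glapi_table) / sizeof(void (*)(void)));
}
```

When compiled as C the guard lines vanish; when compiled as C++ the declarations keep C linkage, so the names are not mangled.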
Cc: mesa-stable@lists.freedesktop.org
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Marek Olšák [Tue, 14 Nov 2017 18:44:33 +0000 (19:44 +0100)]
ac: pack legacy_surf_level better
r600_texture: 1488 -> 1248 bytes
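One common way to shrink a struct is to reorder members so that less padding is needed. The sketch below shows only that general technique with made-up fields; whether this commit relied on reordering, bitfields, or narrower types is not stated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up fields, ordered so that small members sit in front of larger
 * ones and the compiler has to insert padding. */
struct level_loose {
   uint8_t  mode;
   uint64_t offset;
   uint16_t nblk_x;
   uint32_t slice_size_dw;
   uint16_t nblk_y;
};

/* The same members sorted from largest to smallest alignment. */
struct level_packed {
   uint64_t offset;
   uint32_t slice_size_dw;
   uint16_t nblk_x;
   uint16_t nblk_y;
   uint8_t  mode;
};

/* Bytes saved per element; the exact figure depends on the ABI. */
size_t bytes_saved(void)
{
   assert(sizeof(struct level_packed) <= sizeof(struct level_loose));
   return sizeof(struct level_loose) - sizeof(struct level_packed);
}
```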
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 14 Nov 2017 18:31:39 +0000 (19:31 +0100)]
ac: change legacy_surf_level::slice_size to dword units
The next commit will reduce the size even more.
v2: typecast to uint64_t manually
v3: add more typecasts, add asserts
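A minimal sketch of why the typecasts and asserts are needed once the size is kept in dwords. The struct and helper names are hypothetical and do not reproduce the real legacy_surf_level.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-level record: the slice size is kept in dwords (4-byte
 * units) so that it fits in 32 bits. */
struct example_surf_level {
   uint32_t slice_size_dw;
};

void set_slice_size(struct example_surf_level *level, uint64_t bytes)
{
   assert(bytes % 4 == 0);            /* must be dword aligned */
   assert(bytes / 4 <= UINT32_MAX);   /* must fit after scaling */
   level->slice_size_dw = (uint32_t)(bytes / 4);
}

uint64_t slice_size_bytes(const struct example_surf_level *level)
{
   /* Widen before multiplying; a 32-bit multiply would wrap for slices
    * of 4 GiB and above. */
   return (uint64_t)level->slice_size_dw * 4;
}
```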
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 14 Nov 2017 18:22:15 +0000 (19:22 +0100)]
ac: pack ac_surface better
r600_texture: 1736 -> 1488 bytes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 24 Nov 2017 21:08:03 +0000 (22:08 +0100)]
radeonsi: always initialize max_forced_staging_uploads
r600_resource is malloc'd.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103808
Fixes: 4b0dc098b256 ("gallium/u_threaded: don't map big VRAM buffers for the first upload directly")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 23 Nov 2017 19:29:27 +0000 (20:29 +0100)]
radeonsi: remove an old hack for evergreen
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 23 Nov 2017 19:22:25 +0000 (20:22 +0100)]
radeonsi: set COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST when profitable
ported from Vulkan
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Dave Airlie [Tue, 14 Nov 2017 05:11:39 +0000 (15:11 +1000)]
ac/nir: don't write tcs outputs to LDS that aren't read back.
If the TCS doesn't read back the outputs, no need to store them
to LDS in the first place. (except for tess factors).
This seems to give about 50fps (3290->3330) with tessellation demo.
I haven't tested if it impacts DoW3 at all.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 14 Nov 2017 05:10:44 +0000 (15:10 +1000)]
nir: fill outputs_read field and add patch outputs read (v2)
This is to be used for TCS optimisations on radv.
v2: don't set written on reads (nha)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Dave Airlie [Thu, 25 May 2017 21:57:52 +0000 (07:57 +1000)]
r600/eg: dump event type in dumps
This just makes it easier to debug some things.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Tobias Klausmann [Sun, 12 Nov 2017 01:51:55 +0000 (02:51 +0100)]
nouveau/compiler: Allow to omit line numbers when printing instructions
This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff!
V2:
- Use environmental variable (Karol Herbst)
V3:
- Use the already populated nv50_ir_prog_info to forward information to the
print pass (Pierre Moreau)
V4:
- get rid of default value in PrintPass constructor
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Nicolai Hähnle [Wed, 22 Nov 2017 16:52:43 +0000 (17:52 +0100)]
radeonsi: try flushing unflushed fences in si_fence_finish even when timeout == 0
Under certain conditions, waiting on a GL sync object should act like
a flush, regardless of the timeout.
Portal 2, CS:GO, and presumably other Source engine games rely on this
behavior and hang during loading without this fix.
Fixes: bc65dcab3bc4 ("radeonsi: avoid syncing the driver thread in si_fence_finish")
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103902
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103904
Ilia Mirkin [Thu, 16 Nov 2017 06:48:20 +0000 (01:48 -0500)]
nv50/ir: move LateAlgebraicOpt to the very end
Memory loads can take offsets, but the SHLADD will often attempt to
consume the offsets too. As there may be multiple memory loads with the
same base but different offsets, those would end up in a SHLADD instead
of the offset of the memory operation.
This moves the pass after we've had a chance to attempt to propagate
immediate adds into the indirect offset.
total instructions in shared programs : 6580681 -> 6567716 (-0.20%)
total gprs used in shared programs    : 944261 -> 943375 (-0.09%)
total shared used in shared programs  : 0 -> 0 (0.00%)
total local used in shared programs   : 15328 -> 15328 (0.00%)
total bytes used in shared programs   : 60339896 -> 60221504 (-0.20%)

          local  shared     gpr    inst   bytes
helped        0       0     555    2698    2698
hurt          0       0     138     336     336
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Thu, 16 Nov 2017 04:32:16 +0000 (23:32 -0500)]
nv50/ir: when merging immediates/consts, load directly
When a MERGE operation gets its constraint moves added, it
substantially extends live ranges to be reusing an immediate from
earlier in the program (not to mention the silliness of loading an
immediate into a register, and then moving into another register).
We detect these scenarios and insert moves that take the immediate or
constbuf load directly into the register. If it's the last use, then we
can just move that operation to the closer location.
With SM35 (255 regs) we get these results:
total instructions in shared programs : 6583670 -> 6580681 (-0.05%)
total gprs used in shared programs    : 950818 -> 944261 (-0.69%)
total shared used in shared programs  : 0 -> 0 (0.00%)
total local used in shared programs   : 15328 -> 15328 (0.00%)
total bytes used in shared programs   : 60367456 -> 60339896 (-0.05%)

          local  shared     gpr    inst   bytes
helped        0       0    4584    3186    3186
hurt          0       0      55     968     968
I suspect they will be better for SM20 and SM30.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 11 Nov 2017 03:10:46 +0000 (22:10 -0500)]
nv50/ir: add optimization for modulo by a non-power-of-2 value
We can still use the optimized division methods which make use of
multiplication with overflow.
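To make the idea concrete, here is a sketch for the divisor 3. It is written in plain C rather than nv50 IR, and the 64-bit product stands in for a multiply-high instruction; the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned x / 3 without a divide: multiply by ceil(2^33 / 3) and keep the
 * bits above position 33. */
uint32_t div3(uint32_t x)
{
   return (uint32_t)(((uint64_t)x * UINT64_C(0xAAAAAAAB)) >> 33);
}

/* x % 3 reuses the quotient: remainder = x - (x / 3) * 3. */
uint32_t mod3(uint32_t x)
{
   uint32_t r = x - div3(x) * 3u;
   assert(r < 3u);
   return r;
}
```

The modulo costs one extra multiply and subtract on top of the optimized division, which is still far cheaper than a generic divide.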
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Ilia Mirkin [Sat, 11 Nov 2017 02:47:59 +0000 (21:47 -0500)]
nv50/ir: optimize signed integer modulo by pow-of-2
It's common to use signed int modulo in GLSL. As it happens, the GLSL
specs allow the result to be undefined, but that seems fairly
surprising. It's not that much more effort to get it right, at least for
positive modulo operators.
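A sketch of the general trick in plain C, assuming the result should take the sign of the dividend as C's % does. Whether the nv50 pass uses exactly this sequence or sign convention is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* x % d for signed x and d a positive power of two. Work on the magnitude,
 * mask off the low bits, then restore the sign of x. */
int32_t smod_pow2(int32_t x, int32_t d)
{
   assert(d > 0 && (d & (d - 1)) == 0);

   uint32_t ux  = (uint32_t)x;
   uint32_t mag = x < 0 ? 0u - ux : ux;       /* |x| without signed overflow */
   uint32_t rem = mag & ((uint32_t)d - 1u);   /* |x| mod d */

   return x < 0 ? -(int32_t)rem : (int32_t)rem;
}
```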
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Matt Turner [Sun, 26 Nov 2017 00:45:27 +0000 (16:45 -0800)]
util: Just give up and define PIPE_ARCH_LITTLE_ENDIAN on MSVC
MSVC doesn't support #warning?! Getting really tired of this.
Andres Gomez [Sun, 26 Nov 2017 00:15:43 +0000 (02:15 +0200)]
docs: remove bug 103626 from fix list as per 17.2.6
Bug https://bugs.freedesktop.org/show_bug.cgi?id=103626 was
incorrectly listed as fixed.
Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit b9b60dbf55a1307a60a333c70c3add3643243c36)
Matt Turner [Sat, 25 Nov 2017 23:56:43 +0000 (15:56 -0800)]
util: Use preprocessor correctly
Fixes: 6a353479a757 ("util: Assume little endian in the absence of
platform-specific handling")
Andres Gomez [Sat, 25 Nov 2017 23:46:25 +0000 (01:46 +0200)]
docs: update calendar, add news item and link release notes for 17.2.6
Signed-off-by: Andres Gomez <agomez@igalia.com>
Andres Gomez [Sat, 25 Nov 2017 23:40:36 +0000 (01:40 +0200)]
docs: add sha256 checksums for 17.2.6
Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 93c2beafc0a7fa2f210b006d22aba61caa71f773)
Andres Gomez [Sat, 25 Nov 2017 23:32:53 +0000 (01:32 +0200)]
docs: add release notes for 17.2.6
Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 00b52f8e99653316a090826914509a138a1c78f7)
Ilia Mirkin [Sun, 19 Nov 2017 21:36:08 +0000 (16:36 -0500)]
freedreno/a4xx: add ARB_framebuffer_no_attachments support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Ilia Mirkin [Sun, 19 Nov 2017 21:32:12 +0000 (16:32 -0500)]
freedreno/a4xx: add indirect draw support
This is a copy of the a5xx logic. Fails a few tests, but basic
functionality is there.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Ilia Mirkin [Sun, 19 Nov 2017 21:31:02 +0000 (16:31 -0500)]
freedreno: regenerate pm4 header, adjust code for new names
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Ilia Mirkin [Sun, 19 Nov 2017 20:13:41 +0000 (15:13 -0500)]
freedreno/a4xx: add stencil texturing support
Copied from a5xx, should be identical.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Ilia Mirkin [Sun, 19 Nov 2017 17:28:53 +0000 (12:28 -0500)]
freedreno/ir3: add a pass to lower tg4 to txl, enable gather on a4xx
Unfortunately Adreno A4xx hardware returns incorrect results with the
GATHER4 opcodes. As a result, we have to lower to 4 individual texture
calls (txl since we have to force lod to 0). We achieve this using
offsets, including on cube maps which normally never have offsets.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Ilia Mirkin [Sun, 19 Nov 2017 17:27:12 +0000 (12:27 -0500)]
nir: allow texture offsets with cube maps
GL doesn't have this, but some hardware supports it. This is convenient
for lowering tg4 to plain texture calls, which is necessary on Adreno
A4xx hardware.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Matt Turner [Thu, 23 Nov 2017 18:41:34 +0000 (10:41 -0800)]
util: Fix disk_cache index calculation on big endian
The cache-test test program attempts to create a collision (using key_a
and key_a_collide) by making the first two bytes identical. The idea is
fine -- the shader cache wants to use the first four characters of a
SHA1 hex digest as the index.
The following program
    unsigned char array[4] = {1, 2, 3, 4};
    int *ptr = (int *)array;
    for (int i = 0; i < 4; i++) {
        printf("%02x", array[i]);
    }
    printf("\n");
    printf("%08x\n", *ptr);
prints
    01020304
    04030201
on little endian, and
    01020304
    01020304
on big endian.
On big endian platforms reading the character array back as an int (as
is done in disk_cache.c) does not yield the same results as reading the
byte array.
To get the first four characters of the SHA1 hex digest when we mask
with CACHE_INDEX_KEY_MASK, we need to byte swap the int on big endian
platforms.
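For comparison, an endian-independent alternative is sketched below. This is a different technique from the byte swap described above: it assembles the value from individual bytes so no swap is needed. The mask width of 16 bits and the names are assumptions made for the example.

```c
#include <stdint.h>

/* Assumed to be 16 bits wide for this sketch, i.e. four hex digits. */
#define EXAMPLE_INDEX_KEY_MASK 0xffffu

/* Build the index from individual key bytes instead of reading the key
 * through an int pointer, so the result does not depend on host byte
 * order. The value equals a little-endian load followed by the mask. */
uint32_t cache_index(const unsigned char *key)
{
   uint32_t le = (uint32_t)key[0]
               | (uint32_t)key[1] << 8
               | (uint32_t)key[2] << 16
               | (uint32_t)key[3] << 24;
   return le & EXAMPLE_INDEX_KEY_MASK;
}
```

Two keys that share their first two bytes land on the same index on any host, which is the collision the cache test relies on.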
Bugzilla: https://bugs.freedesktop.org/103668
Bugzilla: https://bugs.gentoo.org/637060
Bugzilla: https://bugs.gentoo.org/636326
Fixes: 87ab26b2ab35 ("glsl: Add initial functions to implement an
on-disk cache")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Matt Turner [Wed, 22 Nov 2017 23:10:47 +0000 (15:10 -0800)]
util: Add a SHA1 unit test program
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Matt Turner [Thu, 23 Nov 2017 06:39:51 +0000 (22:39 -0800)]
util: Fix SHA1 implementation on big endian
The code defines a macro blk0(i) based on the preprocessor condition
BYTE_ORDER == LITTLE_ENDIAN. If true, blk0(i) is defined as a byte swap
operation. Unfortunately, if the preprocessor macros used in the test
are not defined, then the comparison becomes 0 == 0 and it evaluates as
true.
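The pitfall can be reproduced in a few lines. The macro names below are made up for this sketch and are deliberately left undefined; the guarded variant is one possible defence and not necessarily what this commit does.

```c
/* Neither EXAMPLE_BYTE_ORDER nor EXAMPLE_LITTLE_ENDIAN is defined anywhere.
 * Inside #if, an identifier that is not a macro is replaced by 0, so the
 * test below reads "0 == 0" and the swap path is selected. */
#if EXAMPLE_BYTE_ORDER == EXAMPLE_LITTLE_ENDIAN
#define EXAMPLE_TAKES_SWAP_PATH 1
#else
#define EXAMPLE_TAKES_SWAP_PATH 0
#endif

/* Guarded variant: refuse to compare unless both macros exist. */
#if defined(EXAMPLE_BYTE_ORDER) && defined(EXAMPLE_LITTLE_ENDIAN) && \
    EXAMPLE_BYTE_ORDER == EXAMPLE_LITTLE_ENDIAN
#define EXAMPLE_GUARDED_SWAP_PATH 1
#else
#define EXAMPLE_GUARDED_SWAP_PATH 0
#endif

int takes_swap_path(void)   { return EXAMPLE_TAKES_SWAP_PATH; }
int guarded_swap_path(void) { return EXAMPLE_GUARDED_SWAP_PATH; }
```

Building with -Wundef makes compilers such as GCC and Clang warn about the first form.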
Fixes: d1efa09d342b ("util: import sha1 implementation from OpenBSD")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Matt Turner [Sat, 25 Nov 2017 04:25:04 +0000 (20:25 -0800)]
util: Assume little endian in the absence of platform-specific handling
Marek Olšák [Wed, 15 Nov 2017 22:53:04 +0000 (23:53 +0100)]
mesa: shrink VERT_ATTRIB bitfields to 32 bits
There are only 32 vertex attribs now.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Marek Olšák [Wed, 15 Nov 2017 22:24:56 +0000 (23:24 +0100)]
mesa: remove unused vertex attrib WEIGHT
We don't support ARB_vertex_blend.
Note that the attribute aliasing check for ARB_vertex_program had to be
rewritten.
vbo_context: 20344 -> 20008 bytes
gl_context: 74672 -> 74616 bytes
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Marek Olšák [Wed, 15 Nov 2017 21:58:58 +0000 (22:58 +0100)]
mesa: don't assign numbers to vertex attrib enums manually
I plan to remove one of them.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Marek Olšák [Sun, 19 Nov 2017 20:29:46 +0000 (21:29 +0100)]
gallium/hud: add HUD sharing within a context share group
This is needed for profiling multi-context applications like Chrome.
One context can record queries and another context can draw the HUD.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 19 Nov 2017 20:04:07 +0000 (21:04 +0100)]
gallium/hud: update the HUD interface for multiple contexts
This is the boring subset of the following commit.
All new parameters are optional.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 19 Nov 2017 03:36:38 +0000 (04:36 +0100)]
gallium/hud: prevent a crash if the recording context is inactive
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 18 Nov 2017 17:07:40 +0000 (18:07 +0100)]
gallium/hud: separate code for record context init/release
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 18 Nov 2017 17:07:40 +0000 (18:07 +0100)]
gallium/hud: separate code for draw context init/release
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 18 Nov 2017 16:53:34 +0000 (17:53 +0100)]
gallium/hud: don't use hud->pipe in hud_parse_env_var
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 18 Nov 2017 16:46:51 +0000 (17:46 +0100)]
gallium/hud: use cso_get_pipe_context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 18 Nov 2017 16:43:42 +0000 (17:43 +0100)]
cso: add cso_get_pipe_context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>