From: Francisco Jerez
Date: Tue, 17 May 2016 23:27:09 +0000 (-0700)
Subject: i965/fs: Lower DDY instructions to SIMD8 during SIMD lowering time
X-Git-Tag: upstream/17.1.0~9267
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=9eea3df29f21eb7507354c3b1d85d238b671a211;p=platform%2Fupstream%2Fmesa.git

i965/fs: Lower DDY instructions to SIMD8 during SIMD lowering time

...on hardware lacking compressed Align16 support. Will allow
simplifying the generator code and fixing it for SIMD32 codegen.

Reviewed-by: Jason Ekstrand
---

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 0006581..c2dd9da 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -4825,6 +4825,35 @@ get_lowered_simd_width(const struct brw_device_info *devinfo,
        */
       return (devinfo->gen == 4 ? 16 : MIN2(16, inst->exec_size));
 
+   case FS_OPCODE_DDY_FINE:
+      /* The implementation of this virtual opcode may require emitting
+       * compressed Align16 instructions, which are severely limited on some
+       * generations.
+       *
+       * From the Ivy Bridge PRM, volume 4 part 3, section 3.3.9 (Register
+       * Region Restrictions):
+       *
+       *    "In Align16 access mode, SIMD16 is not allowed for DW operations
+       *     and SIMD8 is not allowed for DF operations."
+       *
+       * In this context, "DW operations" means "operations acting on 32-bit
+       * values", so it includes operations on floats.
+       *
+       * Gen4 has a similar restriction. From the i965 PRM, section 11.5.3
+       * (Instruction Compression -> Rules and Restrictions):
+       *
+       *    "A compressed instruction must be in Align1 access mode. Align16
+       *     mode instructions cannot be compressed."
+       *
+       * Similar text exists in the g45 PRM.
+       *
+       * Empirically, compressed align16 instructions using odd register
+       * numbers don't appear to work on Sandybridge either.
+       */
+      return (devinfo->gen == 4 || devinfo->gen == 6 ||
+              (devinfo->gen == 7 && !devinfo->is_haswell) ?
+              MIN2(8, inst->exec_size) : MIN2(16, inst->exec_size));
+
    case SHADER_OPCODE_MULH:
       /* MULH is lowered to the MUL/MACH sequence using the accumulator, which
        * is 8-wide on Gen7+.
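The width rule added for FS_OPCODE_DDY_FINE can be modeled outside the driver as a standalone sketch. It is not Mesa code: `device_info`, `lacks_compressed_align16()`, `ddy_fine_lowered_width()` and `lowered_instruction_count()` are hypothetical names, `std::min` stands in for the driver's `MIN2` macro, and the split count assumes the SIMD lowering pass divides an instruction into `exec_size / width` equally sized pieces.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified stand-in for the driver's device info struct.
// Only the two fields read by the DDY_FINE case are modeled.
struct device_info {
   int gen;
   bool is_haswell;
};

// True on the generations the patch treats as lacking usable compressed
// Align16 instructions: Gen4 (per the PRM), Gen6 (empirically), and Gen7
// except Haswell (Ivy Bridge, per the register region restriction).
bool lacks_compressed_align16(const device_info &devinfo)
{
   return devinfo.gen == 4 || devinfo.gen == 6 ||
          (devinfo.gen == 7 && !devinfo.is_haswell);
}

// Mirrors the return expression of the FS_OPCODE_DDY_FINE case:
// clamp to SIMD8 where compressed Align16 is unusable, else to SIMD16.
unsigned ddy_fine_lowered_width(const device_info &devinfo, unsigned exec_size)
{
   return lacks_compressed_align16(devinfo) ? std::min(8u, exec_size)
                                            : std::min(16u, exec_size);
}

// Assumed number of instructions emitted when an instruction of
// exec_size channels is split into pieces of the given width.
unsigned lowered_instruction_count(unsigned exec_size, unsigned width)
{
   assert(width != 0 && exec_size % width == 0);
   return exec_size / width;
}
```

For example, under these assumptions a SIMD16 DDY on Ivy Bridge (`{7, false}`) gets width 8 and is split into two SIMD8 instructions, while on Haswell (`{7, true}`) it stays SIMD16, and a SIMD32 DDY on Gen8 is clamped to SIMD16.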