Commit message | Author | Age | Files | Lines
* i965/fs: Skip remove_duplicate_mrf_writes() during SIMD32 runs. | Francisco Jerez | 2016-05-27 | 1 | -1/+1
  The pass is disabled in SIMD16 dispatch mode for the same reason: it cannot handle instructions that write multiple MRF registers at once.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Use SIMD8 SSBO GET_BUFFER_SIZE message regardless of the dispatch width. | Francisco Jerez | 2016-05-27 | 1 | -22/+18
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Don't emit duplicated SSBO GET_BUFFER_SIZE instruction unnecessarily. | Francisco Jerez | 2016-05-27 | 1 | -1/+0
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Emit fixed width memory fence opcode regardless of the dispatch width. | Francisco Jerez | 2016-05-27 | 1 | -2/+3
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Return 32 bit mask from fs_builder::sample_mask(). | Francisco Jerez | 2016-05-27 | 1 | -1/+3
  This doesn't actually handle the FS case; it just adds an assertion for the moment so I don't forget to update it later on for SIMD32 fragment shader dispatch.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Emit fixed-width null register regardless of the dispatch width. | Francisco Jerez | 2016-05-27 | 1 | -8/+4
  brw_null_vec() cannot handle widths over 16, but it doesn't really matter what width we specify for null registers because destination regions have no width field at the hardware level.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Fix half() to handle more exotic register files. | Francisco Jerez | 2016-05-27 | 1 | -21/+4
  horiz_offset() is able to deal with a superset of the register files currently special-cased in half(). Just call horiz_offset() in all cases.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Fix horiz_offset() to handle ARF and HW GRF register files. | Francisco Jerez | 2016-05-27 | 1 | -4/+10
  We'll hit these in some cases during SIMD lowering in 32-wide programs.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Clean up remaining uses of fs_inst::reads_flag and ::writes_flag. | Francisco Jerez | 2016-05-27 | 5 | -24/+12
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Keep track of flag dependencies with byte granularity during scheduling. | Francisco Jerez | 2016-05-27 | 1 | -10/+31
  This prevents false dependencies from being created between instructions that write disjoint 8-bit portions of the flag register. On the other hand, it should make sure that the scheduler considers dependencies between instructions that write or read multiple flag subregisters at once (e.g. 32-wide predication or conditional mods).
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Track flag register liveness with byte granularity. | Francisco Jerez | 2016-05-27 | 2 | -25/+9
  This is required for correctness in the presence of multiple 8-wide flag writes (e.g. 8-wide instructions with a conditional mod set) which update different portions of the same 16-bit flag subregister. Right now we keep track of flag dataflow with 16-bit granularity and consider flag writes to have killed any previous definition of the same subregister even if the write was less than 16 channels wide, which can cause live flag register updates to be dead code-eliminated incorrectly.
  Additionally, this makes sure that we handle 32-wide flag writes and reads, which may span multiple flag subregisters, so the current approach of just setting/testing a single bit from the live set wouldn't have worked.
  Reviewed-by: Jason Ekstrand <[email protected]>
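The byte-granular tracking described in the commit above can be illustrated with a small sketch. This is not the code from the patch: `flag_byte_mask` and its parameters are hypothetical names, and the layout (16 channel bits per flag subregister, at most 64 flag bits in total) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not taken from the patch): returns a bitmask with one
 * bit per byte of flag register state touched by an instruction.
 * Assumption: each flag subregister holds 16 channel bits (2 bytes) and there
 * are at most 64 flag bits in total. */
static uint8_t
flag_byte_mask(unsigned flag_subreg, unsigned group, unsigned exec_size)
{
   const unsigned start = flag_subreg * 16 + group;  /* first flag bit used */
   const unsigned end = start + exec_size;           /* one past the last bit */
   assert(exec_size > 0 && end <= 64);

   const unsigned first_byte = start / 8;
   const unsigned last_byte = (end + 7) / 8;         /* exclusive bound */

   /* Set bits [first_byte, last_byte). */
   return (uint8_t)(((1u << last_byte) - 1) & ~((1u << first_byte) - 1));
}
```

Under these assumptions two 8-wide writes with channel groups 0 and 8 produce disjoint masks, so neither would be treated as killing the other, while a 32-wide access covers four bytes spanning two subregisters.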
* i965/fs: Define methods to calculate the flag subset read or written by an fs_inst. | Francisco Jerez | 2016-05-27 | 2 | -11/+67
  v2: Codestyle fixes (Jason).
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Expose arbitrary channel execution groups to the IR. | Francisco Jerez | 2016-05-27 | 6 | -32/+35
  This generalizes the current fs_inst::force_sechalf flag to allow specifying channel enable groups other than 0 or 8.
  At some point it will likely make sense to fix the vec4 generator to support arbitrary execution groups and then move the definition of fs_inst::group into backend_instruction (e.g. so we can do FP64 in the VEC4 back-end).
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/ir: Make BROADCAST emit an unmasked single-channel move. | Francisco Jerez | 2016-05-27 | 4 | -3/+17
  Alternatively we could have extended the current semantics to 32-wide mode by changing brw_broadcast() to emit multiple indexed MOV instructions in the generator copying the selected value to all destination registers, but it seemed rather silly to waste EU cycles unnecessarily copying the exact same value 32 times in the GRF.
  The vstride change in the Align16 path is required to avoid assertions in validate_reg() since the change causes the execution size of the MOV and SEL instructions to be equal to the source region width.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Allow specifying arbitrary quarter control to FIND_LIVE_CHANNEL. | Francisco Jerez | 2016-05-27 | 1 | -7/+12
  This makes FIND_LIVE_CHANNEL behave like a normal instruction for non-zero quarter control. On Gen8+ we just leave the quarter control field of the emitted FBL instruction set to the default value so the hardware applies the expected shift to the execution mask signals. On Gen7 we apply the offset manually by specifying a non-zero subregister offset in the source region of the FBL instruction.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Allow specifying arbitrary execution sizes up to 32 to FIND_LIVE_CHANNEL. | Francisco Jerez | 2016-05-27 | 1 | -8/+17
  Due to a Gen7-specific hardware bug, native 32-wide instructions get the lower 16 bits of the execution mask applied incorrectly to both halves of the instruction, so the MOV trick we currently use wouldn't work. Instead, emit multiple 16-wide MOV instructions in 32-wide mode in order to cover the whole execution mask.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Lower 32-wide scratch writes in the generator. | Francisco Jerez | 2016-05-27 | 1 | -6/+24
  The hardware has messages that can write 32 32-bit components at once, but the channel enable mask gets messed up. We need to split them into several 16-wide scratch writes for the channel enables to be applied correctly.
  The SIMD lowering pass cannot be used for this because scratch writes are emitted rather late, during register allocation, long after SIMD lowering has been done.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Implement scratch reads and writes of 4 GRFs at a time. | Francisco Jerez | 2016-05-27 | 3 | -21/+18
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/eu: Fix Gen7+ DP scratch message size calculation on Gen7. | Francisco Jerez | 2016-05-27 | 1 | -1/+4
  Gen7 hardware expects the block size field in the message descriptor to be the number of registers minus one instead of the log2 of the number of registers.
  Reviewed-by: Jason Ekstrand <[email protected]>
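A minimal sketch of the two encodings mentioned in the commit above. The helper name `scratch_block_size_field` is hypothetical. The `gen >= 8` cutoff for the log2 encoding and the restriction to 1, 2 or 4 registers are assumptions of this sketch, not something the commit message states.

```c
#include <assert.h>

/* Hypothetical helper: encode the block size field of a scratch message
 * descriptor for a transfer of num_regs registers.
 * Assumptions: num_regs is 1, 2 or 4, and generations after Gen7 use the
 * log2 encoding. */
static unsigned
scratch_block_size_field(unsigned gen, unsigned num_regs)
{
   assert(num_regs == 1 || num_regs == 2 || num_regs == 4);

   if (gen >= 8) {
      /* log2 of the number of registers */
      unsigned log2_regs = 0;
      while (num_regs > 1) {
         num_regs >>= 1;
         log2_regs++;
      }
      return log2_regs;
   } else {
      /* Gen7: number of registers minus one */
      return num_regs - 1;
   }
}
```

The two encodings agree for one and two registers and only diverge at four (3 versus 2), which would explain why such a bug could go unnoticed until 4-register transfers are used.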
* i965/eu: Set execution size explicitly for memory fence send message. | Francisco Jerez | 2016-05-27 | 1 | -4/+7
  We don't want to emit a 32-wide send message in 32-wide programs. The memory fence message should have the same effect regardless of the execution size (as long as it's valid) so just set it to one.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/eu: Consider QtrCtrl 3Q-4Q in typed surface message descriptor setup. | Francisco Jerez | 2016-05-27 | 1 | -6/+6
  In SIMD32 programs the compiler is responsible for providing the appropriate half of the sample mask in the message header, so the first and third quarters both map to the first slot group of the provided 16-bit half, while the second and fourth quarters map to the second slot group. In other words, they should be equivalent to 1Q and 2Q modulo two.
  Reviewed-by: Jason Ekstrand <[email protected]>
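The "modulo two" mapping in the commit above amounts to the following sketch. `slot_group_for_quarter` is a hypothetical name, and numbering the quarters 0 to 3 for 1Q to 4Q is an assumption of this illustration.

```c
#include <assert.h>

/* Hypothetical helper: map a quarter control value (0..3 for 1Q..4Q) to the
 * slot group (0 or 1) within the 16-bit half of the sample mask that the
 * compiler placed in the message header. */
static unsigned
slot_group_for_quarter(unsigned quarter)
{
   assert(quarter < 4);
   return quarter % 2;   /* 1Q and 3Q -> group 0, 2Q and 4Q -> group 1 */
}
```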
* i965/fs: Clean up remaining uses of dispatch_width in the generator. | Francisco Jerez | 2016-05-27 | 3 | -9/+8
  Most of these are bugs because the intended execution size of an instruction and the dispatch width of the shader aren't necessarily the same (especially in SIMD32 programs).
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/eu: Remove brw_codegen::compressed and ::compressed_stack. | Francisco Jerez | 2016-05-27 | 3 | -11/+5
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/eu: Use current exec size instead of p->compressed in surface message generation. | Francisco Jerez | 2016-05-27 | 1 | -6/+8
  This was kind of an abuse of p->compressed: dataport send message instructions are always uncompressed. Use the current execution size instead since p->compressed is on its way out.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: No need to reset predicate control after emitting some instructions. | Francisco Jerez | 2016-05-27 | 1 | -2/+0
  Trivial clean-up.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Pass current execution size to brw_IF() and brw_DO(). | Francisco Jerez | 2016-05-27 | 1 | -2/+2
  This gets IF and DO instructions working in SIMD32 programs.
  brw_IF() and brw_DO() should probably behave in the same way as other generator functions that emit control flow instructions and just figure out the right execution size by themselves from the current execution controls specified through the brw_codegen argument. Changing that will require updating lots of Gen4-5 clipper code though, so for the moment just pass the current value redundantly from the FS generator.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/eu: Stop using p->compressed to specify the exec size of control flow instructions. | Francisco Jerez | 2016-05-27 | 1 | -13/+11
  p->compressed won't work for SIMD32; we should just be using the execution size value specified via p->current instead.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Extend region width calculation to allow arbitrary execution sizes. | Francisco Jerez | 2016-05-27 | 1 | -16/+23
  Instead of just halving the execution size when the instruction is compressed and hoping that it will give a legal source region width, we can calculate the maximum legal width value in closed form from the component size and stride. This makes sure that brw_reg_from_fs_reg() always returns a valid hardware region even for virtual 32-wide instructions (e.g. send-like instructions) that would seem to exceed the hardware region width limit after halving.
  Reviewed-by: Jason Ekstrand <[email protected]>
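One plausible closed form for the width limit described in the commit above is sketched here. It is an illustration, not the code from the patch: `max_region_width` is a hypothetical name, and the rule that a row of the region must fit in a single 32-byte register is an assumption of this sketch.

```c
#include <assert.h>

#define REG_SIZE_BYTES 32u   /* assumed size of one GRF in bytes */

/* Hypothetical helper: largest region width (in components) such that one
 * row of the region still fits in a single register, capped by the
 * execution size.  type_size is the component size in bytes and stride is
 * the horizontal stride in components (assumed non-zero here). */
static unsigned
max_region_width(unsigned exec_size, unsigned type_size, unsigned stride)
{
   assert(exec_size > 0 && type_size > 0 && stride > 0);
   assert(stride * type_size <= REG_SIZE_BYTES);

   const unsigned reg_width = REG_SIZE_BYTES / (stride * type_size);
   return exec_size < reg_width ? exec_size : reg_width;
}
```

With these assumptions a 32-wide instruction on packed 4-byte components gets a width of 8, where simply halving the execution size would have produced 16.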
* i965/fs: Pass the compression mode to brw_reg_from_fs_reg(). | Kenneth Graunke | 2016-05-27 | 1 | -5/+6
  Curro is planning to eliminate p->compressed, so let's avoid using it here and just pass in the value directly.
  Signed-off-by: Kenneth Graunke <[email protected]>
  [ Francisco Jerez: Pass boolean flag instead of brw_compression enum. ]
  Reviewed-by: Francisco Jerez <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Simplify per-instruction compression control setup in generator. | Francisco Jerez | 2016-05-27 | 1 | -27/+17
  By using the new compression/group control interface. This will allow easier extension to support arbitrary channel enable groups at the IR level.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: No need to set compression control at the top of generate_code(). | Francisco Jerez | 2016-05-27 | 1 | -2/+0
  The right value is dependent on the specific IR instruction being generated so it has to be reset in every iteration of the loop anyway.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/eu: Fix a bunch of compression control bugs in the generator. | Francisco Jerez | 2016-05-27 | 2 | -10/+9
  Most of these were resetting quarter control to zero incorrectly even though all they needed to do was disable instruction compression. The brw_SAMPLE() case was doing the right thing but it can be simplified slightly by using the new compression control interface.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/eu: Define alternative interface for setting compression and group controls. | Francisco Jerez | 2016-05-27 | 2 | -0/+75
  This implements some simple helper functions that can be used to specify the group of channel enable signals and compression enable that apply to a brw_inst instruction.
  It's intended to replace brw_set_default_compression_control eventually because the current interface has a number of shortcomings inherited from the Gen-4-5-centric representation of compression and group controls as a single non-orthogonal enum:
  On the one hand, it doesn't work for specifying arbitrary group controls other than 1Q and 2Q, which are frequently useful in SIMD32 and FP64 programs.
  On the other hand, the current interface forces you to update the compression *and* group controls simultaneously, which has been the source of a number of generator bugs (a bunch of them fixed in this series), because in many cases we would end up resetting the group controls to zero inadvertently even though all we wanted to do was disable instruction compression. The latter seems especially unfortunate on Gen6+ hardware, which has no explicit compression control, so we would end up bashing the quarter control field of the instruction for no benefit.
  Instead of a single function that updates both at the same time, introduce separate interfaces to update one or the other independently, preserving the current value of the other (which typically comes from the back-end IR so it has to be respected).
  Reviewed-by: Jason Ekstrand <[email protected]>
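The design point of the commit above, two independent setters instead of one combined enum, can be sketched as follows. All names here (`struct insn_controls`, `set_compression`, `set_group`) are hypothetical stand-ins rather than the functions added by the patch, and the sketch only models quarter-granularity groups.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the default execution controls of an instruction. */
struct insn_controls {
   bool compressed;   /* instruction compression enable */
   unsigned group;    /* first channel enable signal applied (0, 8, 16, 24) */
};

/* Update only the compression enable, leaving the channel group untouched. */
static void
set_compression(struct insn_controls *c, bool compressed)
{
   c->compressed = compressed;
}

/* Update only the channel group, leaving the compression enable untouched.
 * Assumption of this sketch: groups are multiples of 8 below 32. */
static void
set_group(struct insn_controls *c, unsigned group)
{
   assert(group % 8 == 0 && group < 32);
   c->group = group;
}
```

With this split, disabling compression for an instruction that runs in the third quarter leaves `group` at 16, which is the behaviour the combined interface could not guarantee.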
* i965/fs: Remove FS_OPCODE_PACK_STENCIL_REF virtual instruction. | Francisco Jerez | 2016-05-27 | 5 | -52/+2
  It's just a byte MOV with strided source.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Remove extract virtual opcodes. | Francisco Jerez | 2016-05-27 | 5 | -53/+9
  These can be easily represented in the IR as a MOV instruction with strided source so they seem rather redundant.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Define brw_int_type() helper. | Francisco Jerez | 2016-05-27 | 1 | -0/+20
  Intended as a (partial) inverse of type_sz(). Will be useful in the next commit and some other SIMD32 generator changes I have queued up.
  Reviewed-by: Jason Ekstrand <[email protected]>
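To illustrate what a "partial inverse of type_sz()" could look like, here is a self-contained sketch. The enum and both functions are hypothetical stand-ins defined only for this example; they are not the driver's actual register type definitions or the body of brw_int_type().

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in register types for this sketch only. */
enum reg_type {
   TYPE_UB, TYPE_B, TYPE_UW, TYPE_W, TYPE_UD, TYPE_D, TYPE_UQ, TYPE_Q, TYPE_F
};

/* Size in bytes of a component of the given type. */
static unsigned
type_size(enum reg_type t)
{
   switch (t) {
   case TYPE_UB: case TYPE_B:              return 1;
   case TYPE_UW: case TYPE_W:              return 2;
   case TYPE_UD: case TYPE_D: case TYPE_F: return 4;
   case TYPE_UQ: case TYPE_Q:              return 8;
   }
   assert(!"unknown type");
   return 0;
}

/* Integer type of the given byte size and signedness.  It is only a partial
 * inverse of type_size() because non-integer types such as TYPE_F are never
 * returned. */
static enum reg_type
int_type_for_size(unsigned size, bool is_signed)
{
   switch (size) {
   case 1: return is_signed ? TYPE_B : TYPE_UB;
   case 2: return is_signed ? TYPE_W : TYPE_UW;
   case 4: return is_signed ? TYPE_D : TYPE_UD;
   case 8: return is_signed ? TYPE_Q : TYPE_UQ;
   }
   assert(!"unsupported integer size");
   return TYPE_UD;
}
```

For every supported size, `type_size(int_type_for_size(size, s))` gives back `size`, while the reverse composition maps `TYPE_F` to `TYPE_UD` or `TYPE_D` rather than to itself.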
* i965/fs: Remove manual splitting of DDY ops in the generator. | Francisco Jerez | 2016-05-27 | 1 | -37/+1
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Remove manual unrolling of BFI instructions from the generator. | Francisco Jerez | 2016-05-27 | 1 | -34/+2
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Drop Gen7 CMP SIMD unrolling workaround from the generator. | Francisco Jerez | 2016-05-27 | 1 | -36/+10
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Drop lowering code for a few three-source instructions from the generator. | Francisco Jerez | 2016-05-27 | 1 | -47/+4
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Set default access mode to Align1 for all instructions in the generator. | Francisco Jerez | 2016-05-27 | 1 | -0/+1
  Currently the generator code for most opcodes honours the default access mode (which should typically be Align1 in the scalar back-end), but generate_code() doesn't set it explicitly, which means that the access mode from a previous instruction could leak into the following ones if you did something special and weren't careful enough to save and restore the previous access mode.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Remove handcrafted math SIMD lowering from the generator. | Francisco Jerez | 2016-05-27 | 2 | -101/+21
  Most of this wouldn't have worked for SIMD32 and had various dispatch_width and compression control bugs. It's mostly dead now with SIMD lowering of math instructions turned on in the compiler.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Limit SIMD width of various virtual opcodes to the maximum supported value. | Francisco Jerez | 2016-05-27 | 1 | -5/+40
  Which is 16 or 8 in most cases. This will make sure that 32-wide virtual instructions get chopped up into chunks of their maximum execution size.
  Reviewed-by: Jason Ekstrand <[email protected]>
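The chopping described in the commit above can be pictured with a small sketch. `split_exec_groups` is a hypothetical helper, not part of the SIMD lowering pass; it only computes the first channel of each chunk and does not rewrite any instruction.

```c
#include <assert.h>

/* Hypothetical helper: split an instruction of width exec_size into chunks
 * no wider than max_width and record the first channel of each chunk in
 * groups[].  Returns the number of chunks.  capacity is the number of
 * entries available in groups[]. */
static unsigned
split_exec_groups(unsigned exec_size, unsigned max_width,
                  unsigned groups[], unsigned capacity)
{
   assert(exec_size > 0 && max_width > 0);

   /* An instruction already within the limit stays in one piece. */
   const unsigned width = exec_size < max_width ? exec_size : max_width;
   assert(exec_size % width == 0);

   const unsigned count = exec_size / width;
   assert(count <= capacity);

   for (unsigned i = 0; i < count; i++)
      groups[i] = i * width;

   return count;
}
```

For a 32-wide instruction with a maximum supported width of 8 this yields four chunks starting at channels 0, 8, 16 and 24.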
* i965/fs: Lower LOAD_PAYLOAD instructions of unsupported width. | Francisco Jerez | 2016-05-27 | 1 | -0/+19
  Only per-channel LOAD_PAYLOAD instructions can be lowered, which should cover everything that comes in from the front-end. LOAD_PAYLOAD instructions used to construct actual message payloads cannot be easily lowered because they contain headers and vectors of variable type that aren't necessarily channel-aligned. We shouldn't find any of them in the program at SIMD lowering time though, because they're introduced during logical send lowering.
  An alternative that may be worth considering would be to re-run the SIMD lowering pass after LOAD_PAYLOAD lowering instead of this patch.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Lower DDY instructions to SIMD8 during SIMD lowering time | Francisco Jerez | 2016-05-27 | 1 | -0/+29
  ...on hardware lacking compressed Align16 support. Will allow simplifying the generator code and fixing it for SIMD32 codegen.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Apply usual FPU-like execution size restrictions to MULH. | Francisco Jerez | 2016-05-27 | 1 | -1/+2
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Calculate maximum execution size of MOV_INDIRECT correctly. | Francisco Jerez | 2016-05-27 | 1 | -9/+3
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Assert that IF instruction with embedded compare has legal exec_size. | Francisco Jerez | 2016-05-27 | 1 | -0/+4
  We shouldn't encounter these right now, but if we did it wouldn't be possible for the SIMD lowering pass to split them into multiple instructions because of their side effects on control flow, so just assert in order to kill the program.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Implement HSW BFI exec size workarounds in the SIMD lowering pass. | Francisco Jerez | 2016-05-27 | 1 | -2/+8
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Implement workaround for IVB CMP dependency race in the SIMD lowering pass. | Francisco Jerez | 2016-05-27 | 1 | -1/+17
  Reviewed-by: Jason Ekstrand <[email protected]>