summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* i965/eu: Define alternative interface for setting compression and group ↵Francisco Jerez2016-05-272-0/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | controls. This implements some simple helper functions that can be used to specify the group of channel enable signals and compression enable that apply to a brw_inst instruction. It's intended to replace brw_set_default_compression_control eventually because the current interface has a number of shortcomings inherited from the Gen-4-5-centric representation of compression and group controls as a single non-orthogonal enum: On the one hand it doesn't work for specifying arbitrary group controls other than 1Q and 2Q, which are frequently useful in SIMD32 and FP64 programs. On the other hand the current interface forces you to update the compression *and* group controls simultaneously, which has been the source of a number of generator bugs (a bunch of them fixed in this series), because in many cases we would end up resetting the group controls to zero inadvertently even though everything we wanted to do was disable instruction compression -- The latter seems especially unfortunate on Gen6+ hardware which have no explicit compression control, so we would end up bashing the quarter control field of the instruction for no benefit. Instead of a single function that updates both at the same time introduce separate interfaces to update one or the other independently preserving the current value of the other (which typically comes from the back-end IR so it has to be respected). Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Remove FS_OPCODE_PACK_STENCIL_REF virtual instruction.Francisco Jerez2016-05-275-52/+2
| | | | | | It's just a byte MOV with strided source. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Remove extract virtual opcodes.Francisco Jerez2016-05-275-53/+9
| | | | | | | These can be easily represented in the IR as a MOV instruction with strided source so they seem rather redundant. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Define brw_int_type() helper.Francisco Jerez2016-05-271-0/+20
| | | | | | | Intended as a (partial) inverse of type_sz(). Will be useful in the next commit and some other SIMD32 generator changes I have queued up. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Remove manual splitting of DDY ops in the generator.Francisco Jerez2016-05-271-37/+1
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Remove manual unrolling of BFI instructions from the generator.Francisco Jerez2016-05-271-34/+2
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Drop Gen7 CMP SIMD unrolling workaround from the generator.Francisco Jerez2016-05-271-36/+10
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Drop lowering code for a few three-source instructions from the ↵Francisco Jerez2016-05-271-47/+4
| | | | | | generator. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Set default access mode to Align1 for all instructions in the ↵Francisco Jerez2016-05-271-0/+1
| | | | | | | | | | | | | generator. Currently the generator code for most opcodes honours the default access mode (which should typically be Align1 in the scalar back-end), but generate_code() doesn't set it explicitly which means that the access mode from a previous instruction could leak into the following ones if you did something special and weren't careful enough to save and restore the previous access mode. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Remove handcrafted math SIMD lowering from the generator.Francisco Jerez2016-05-272-101/+21
| | | | | | | | Most of this wouldn't have worked for SIMD32 and had various dispatch_width and compression control bugs. It's mostly dead now with SIMD lowering of math instructions turned on in the compiler. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Limit SIMD width of various virtual opcodes to the maximum ↵Francisco Jerez2016-05-271-5/+40
| | | | | | | | | | supported value. Which is 16 or 8 in most cases. This will make sure that 32-wide virtual instructions get chopped up into chunks of their maximum execution size. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Lower LOAD_PAYLOAD instructions of unsupported width.Francisco Jerez2016-05-271-0/+19
| | | | | | | | | | | | | | | | Only per-channel LOAD_PAYLOAD instructions can be lowered, which should cover everything that comes in from the front-end. LOAD_PAYLOAD instructions used to construct actual message payloads cannot be easily lowered because they contain headers and vectors of variable type that aren't necessarily channel-aligned -- We shouldn't find any of them in the program at SIMD lowering time though because they're introduced during logical send lowering. An alternative that may be worth considering would be to re-run the SIMD lowering pass after LOAD_PAYLOAD lowering instead of this patch. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Lower DDY instructions to SIMD8 during SIMD lowering timeFrancisco Jerez2016-05-271-0/+29
| | | | | | | ...on hardware lacking compressed Align16 support. Will allow simplifying the generator code and fixing it for SIMD32 codegen. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Apply usual FPU-like execution size restrictions to MULH.Francisco Jerez2016-05-271-1/+2
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Calculate maximum execution size of MOV_INDIRECT correctly.Francisco Jerez2016-05-271-9/+3
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Assert that IF instruction with embedded compare has legal exec_size.Francisco Jerez2016-05-271-0/+4
| | | | | | | | | We shouldn't encounter these right now but if we did it wouldn't be possible for the SIMD lowering pass to split it into multiple instructions because of its side effects on control flow, so just assert in order to kill the program. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Implement HSW BFI exec size workarounds in the SIMD lowering pass.Francisco Jerez2016-05-271-2/+8
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Implement workaround for IVB CMP dependency race in the SIMD ↵Francisco Jerez2016-05-271-1/+17
| | | | | | lowering pass. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Enforce common regioning restrictions by SIMD splitting.Francisco Jerez2016-05-271-20/+104
| | | | | | | | | | | | | | | | | This change addresses a number of hardware restrictions on the source and destination regions and other execution controls of regular FPU-like instructions that in some cases can be avoided by reducing the execution size of the instruction. Some of these restrictions (e.g. the one about 3src instructions not supporting compression on some hardware) are currently being worked around case by case in the generator with ad-hoc splitting code that is buggy in several ways (e.g. doesn't handle non-trivial execution controls which would break SIMD32 code), but it seems cleaner to implement as many restrictions as we can in a single lowering pass since that will allow us to simplify some of the surrounding code considerably and also make sure that we don't forget applying them in the future. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Enforce extended math exec size limits during SIMD lowering.Francisco Jerez2016-05-271-10/+24
| | | | | | | | | | This teaches the SIMD lowering pass about the hardware limits on the execution size of math instructions, which will allow simplifying the generator code and at the same time get rid of a number of bugs in the manual SIMD unrolling done currently that prevent SIMD32 codegen from working. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Handle SAMPLEINFO consistently like other texturing instructions.Francisco Jerez2016-05-274-17/+15
| | | | | | | | | | | Seems like this texturing opcode was missing its logical counterpart which would prevent it from taking advantage of the SIMD lowering infrastructure, define it and plumb it through the back-end. At some point we'll likely want to emit a single SAMPLEINFO message shared among all channels irrespective of this change, but for the moment this should be enough to get the intrinsic working in SIMD32 mode. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Lower math into Gen4-5 send-like instructions in lower_logical_sends.Francisco Jerez2016-05-272-42/+60
| | | | | | | | The benefit is we will be able to use the SIMD lowering pass to unroll math instructions of unsupported width and then remove some cruft from the generator. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Add missing get_latency_gen7() cases for the Gen7 pull constant ↵Francisco Jerez2016-05-271-0/+2
| | | | | | | | | | | | opcodes. This was causing the scheduler to be rather optimistic about the latency of pull constant opcodes on Gen7+. This might seem to increase the cycle count estimate calculated by the scheduler itself for some shaders, even though the actual cycle count should actually be decreased. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Rename Gen4 physical varying pull constant load opcode.Francisco Jerez2016-05-276-14/+14
| | | | | | | | | For consistency with the Gen7 variant. I'm not doing the same to the uniform pull constant message at this point because the non-GEN7 one is still overloaded to be either an expression-like logical instruction or a Gen4-specific physical send message. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Implement promotion of varying pull loads on Gen4 during SIMD lowering.Francisco Jerez2016-05-271-14/+13
| | | | | | | | | Varying pull constant loads inherit the same limitation of pre-ILK hardware that requires expanding SIMD8 texel fetch instructions to SIMD16, we can deal with pull constant loads in the same way it's done for texturing during SIMD lowering. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Hide varying pull constant load message setup behind logical opcode.Francisco Jerez2016-05-276-31/+39
| | | | | | | | This will allow the SIMD lowering pass to split 32-wide varying pull constant loads (not natively supported by the hardware) into 16-wide instructions. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Avoid constant propagation when the type sizes don't match.Francisco Jerez2016-05-271-0/+8
| | | | | | | | | | | | The case where the source type of the instruction is smaller than the immediate type could be handled by calculating the portion of the immediate read by the instruction (assuming that the source channels are aligned with the destination channels of the copy) and then representing the same value as an immediate of the source type (assuming such an immediate type exists), but the code below doesn't do that, so just bail for the moment. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Fix CSE temporary copy for some LOAD_PAYLOAD corner cases.Francisco Jerez2016-05-271-1/+2
| | | | | | | | | | | | | If the LOAD_PAYLOAD instruction only has header sources it's possible for the number of registers written to be less than or equal to the SIMD component size, in which case it would take the single-MOV path at the bottom which would cause the channel enable masks to be applied incorrectly to the header contents and/or cause it to write past the end of the allocated temporary. If the instruction is either LOAD_PAYLOAD or doesn't write exactly one component the MOV path is going to mess up the program so just don't use it. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Handle instruction predication in SIMD lowering pass.Francisco Jerez2016-05-271-1/+11
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: No need to unzip SIMD-periodic sources during SIMD lowering.Francisco Jerez2016-05-271-1/+1
| | | | | | | | | | | | If the source value is going to the same for all SIMD-lowered chunks of the instruction there should be no need to unzip the value into multiple temporary registers one for each lowered chunk. As a side effect this fixes SIMD lowering of instructions with a vector immediate source. In the long term it *might* still be worth fixing offset() to handle vector immediates correctly though, this should be good enough for the moment. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Generalize is_uniform() to is_periodic().Francisco Jerez2016-05-271-1/+30
| | | | | | This will be useful in the SIMD lowering pass. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Fix byte_offset() for MRF/ARF/FIXED_GRF regs.Francisco Jerez2016-05-271-11/+17
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Fix off-by-one region overlap comparison in copy propagation.Francisco Jerez2016-05-271-2/+2
| | | | | | | | | | | | | | This was introduced in cf375a3333e54a01462f192202d609436e5fbec8 but the blame is mine because the pseudocode I sent in my review comment for the original patch suggesting to do things this way already had the off-by-one error. This may have caused copy propagation to be unnecessarily strict while checking whether VGRF writes interfere with any ACP entries and possibly miss valid optimization opportunities in cases where multiple copy instructions write sequential locations of the same VGRF. Cc: Iago Toral Quiroga <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv/cmd_buffer: Don't delete command buffers in ResetCommandPool()Ronie Salgado2016-05-271-19/+18
| | | | | | | v2 (Jason Ekstrand): Destroy command buffers in DestroyCommandPool(). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95034 Reviewed-by: Jason Ekstrand <[email protected]>
* gallium/util: another s/unsigned/enum pipe_prim_type/ for clangBrian Paul2016-05-271-1/+1
| | | | Trivial.
* anv: Try the first 8 render nodes instead of just renderD128Jason Ekstrand2016-05-271-4/+10
| | | | | | | | This way, if you have other cards installed, the Vulkan driver will still work. No guarantees about WSI working correctly but offscreen should at least work. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95537
* anv: strdup the device path into the physical deviceJason Ekstrand2016-05-272-2/+4
| | | | | This way we don't have to assume that the string coming in is a piece of constant data that exists forever.
* anv/formats: Exit early for unsupported formatsJason Ekstrand2016-05-271-2/+3
|
* anv/formats: Map VK_FORMAT_UNDEFINED to ISL_FORMAT_UNSUPPORTEDJason Ekstrand2016-05-271-1/+1
| | | | | | At one point in time, we may have used the mapping to ISL_FORMAT_RAW for certain buffer surfaces but that time has long since passed. This fixes a bug where doing format queries on VK_FORMAT_UNDEFINED would assert-fail.
* anv/clear: Remove an unused variableJason Ekstrand2016-05-271-1/+0
|
* gallium/util: another unsigned -> enum pipe_prim_type changeBrian Paul2016-05-271-1/+1
| | | | | | | gcc didn't warn about the unsigned / enum pipe_prim_type mismatch between the .c and .h file. Reviewed-by: Roland Scheidegger <[email protected]>
* i965/compute: Fix uniform init issue when SIMD8 is skippedJordan Justen2016-05-271-1/+1
| | | | | | | | | | | | | In d8347f12ead89c5a58f69ce9283a54ac8487159c, we added support for skipping SIMD8 generation when the program local size is too large for SIMD8 to be usable. This change was missed in that commit. This bug would impact gen7 platforms when the compute shader local size is greater than 512, and gen8 platforms when the local size is greater than 448. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv: Emit DRAWING_RECTANGLE once at driver initializationJason Ekstrand2016-05-272-13/+9
| | | | | | | | | | Also, we don't actually need it for clipping because meta always colors inside the lines and, for all other operations, the user is required to set a scissor. Since DRAWING_RECTANGLE stalls the GPU, we want to emit it as little as possible. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* anv/cmd_buffer: Only emit PIPE_CONTROL on-demandJason Ekstrand2016-05-276-70/+140
| | | | | | | | | | | This is in contrast to emitting it directly in vkCmdPipelineBarrier. This has a couple of advantages. First, it means that no matter how many vkCmdPipelineBarrier calls the application strings together it gets one or two PIPE_CONTROLs. Second, it allow us to better track when we need to do stalls because we can flag when a flush has happened and we need a stall. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* genxml: Make PIPE_CONTROL::CommandStreamerStallEnable a booleanJason Ekstrand2016-05-275-5/+5
| | | | | | | This has been declared as a uint since SNB but it's only one bit. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* anv/clear: Only clear the render area when doing subpass clearsJason Ekstrand2016-05-273-4/+3
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* anv: Move push constant allocation to the command bufferJason Ekstrand2016-05-275-44/+71
| | | | | | | | | | Instead of blasting it out as part of the pipeline, we put it in the command buffer and only blast it out when it's really needed. Since the PUSH_CONSTANT_ALLOC commands aren't pipelined, they immediately cause a stall which we would like to avoid. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* radeonsi: enable OpenGL 4.3Bas Nieuwenhuizen2016-05-271-0/+4
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nouveau: enable GL 4.3 on kepler/fermiDave Airlie2016-05-281-1/+1
| | | | Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: always reserve output space for tess factorsMarek Olšák2016-05-271-1/+6
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Tested-by: Dave Airlie <[email protected]>