path: root/src
Commit message | Author | Age | Files | Lines
* i965/fs: Assert that IF instruction with embedded compare has legal exec_size. | Francisco Jerez | 2016-05-27 | 1 | -0/+4
    We shouldn't encounter these right now, but if we did it wouldn't be possible for the SIMD lowering pass to split it into multiple instructions because of its side effects on control flow, so just assert in order to kill the program.

    Reviewed-by: Jason Ekstrand
* i965/fs: Implement HSW BFI exec size workarounds in the SIMD lowering pass. | Francisco Jerez | 2016-05-27 | 1 | -2/+8
    Reviewed-by: Jason Ekstrand
* i965/fs: Implement workaround for IVB CMP dependency race in the SIMD lowering pass. | Francisco Jerez | 2016-05-27 | 1 | -1/+17
    Reviewed-by: Jason Ekstrand
* i965/fs: Enforce common regioning restrictions by SIMD splitting. | Francisco Jerez | 2016-05-27 | 1 | -20/+104
    This change addresses a number of hardware restrictions on the source and destination regions and other execution controls of regular FPU-like instructions that in some cases can be avoided by reducing the execution size of the instruction. Some of these restrictions (e.g. the one about 3src instructions not supporting compression on some hardware) are currently being worked around case by case in the generator with ad-hoc splitting code that is buggy in several ways (e.g. doesn't handle non-trivial execution controls which would break SIMD32 code), but it seems cleaner to implement as many restrictions as we can in a single lowering pass since that will allow us to simplify some of the surrounding code considerably and also make sure that we don't forget applying them in the future.

    Reviewed-by: Jason Ekstrand
* i965/fs: Enforce extended math exec size limits during SIMD lowering. | Francisco Jerez | 2016-05-27 | 1 | -10/+24
    This teaches the SIMD lowering pass about the hardware limits on the execution size of math instructions, which will allow simplifying the generator code and at the same time get rid of a number of bugs in the manual SIMD unrolling done currently that prevent SIMD32 codegen from working.

    Reviewed-by: Jason Ekstrand
* i965/fs: Handle SAMPLEINFO consistently like other texturing instructions. | Francisco Jerez | 2016-05-27 | 4 | -17/+15
    Seems like this texturing opcode was missing its logical counterpart, which would prevent it from taking advantage of the SIMD lowering infrastructure; define it and plumb it through the back-end. At some point we'll likely want to emit a single SAMPLEINFO message shared among all channels irrespective of this change, but for the moment this should be enough to get the intrinsic working in SIMD32 mode.

    Reviewed-by: Jason Ekstrand
* i965/fs: Lower math into Gen4-5 send-like instructions in lower_logical_sends. | Francisco Jerez | 2016-05-27 | 2 | -42/+60
    The benefit is we will be able to use the SIMD lowering pass to unroll math instructions of unsupported width and then remove some cruft from the generator.

    Reviewed-by: Jason Ekstrand
* i965/fs: Add missing get_latency_gen7() cases for the Gen7 pull constant opcodes. | Francisco Jerez | 2016-05-27 | 1 | -0/+2
    This was causing the scheduler to be rather optimistic about the latency of pull constant opcodes on Gen7+. This might seem to increase the cycle count estimate calculated by the scheduler itself for some shaders, even though the actual cycle count should decrease.

    Reviewed-by: Jason Ekstrand
* i965/fs: Rename Gen4 physical varying pull constant load opcode. | Francisco Jerez | 2016-05-27 | 6 | -14/+14
    For consistency with the Gen7 variant. I'm not doing the same to the uniform pull constant message at this point because the non-GEN7 one is still overloaded to be either an expression-like logical instruction or a Gen4-specific physical send message.

    Reviewed-by: Jason Ekstrand
* i965/fs: Implement promotion of varying pull loads on Gen4 during SIMD lowering. | Francisco Jerez | 2016-05-27 | 1 | -14/+13
    Varying pull constant loads inherit the same limitation of pre-ILK hardware that requires expanding SIMD8 texel fetch instructions to SIMD16, so we can deal with pull constant loads in the same way it's done for texturing during SIMD lowering.

    Reviewed-by: Jason Ekstrand
* i965/fs: Hide varying pull constant load message setup behind logical opcode. | Francisco Jerez | 2016-05-27 | 6 | -31/+39
    This will allow the SIMD lowering pass to split 32-wide varying pull constant loads (not natively supported by the hardware) into 16-wide instructions.

    Reviewed-by: Jason Ekstrand
* i965/fs: Avoid constant propagation when the type sizes don't match. | Francisco Jerez | 2016-05-27 | 1 | -0/+8
    The case where the source type of the instruction is smaller than the immediate type could be handled by calculating the portion of the immediate read by the instruction (assuming that the source channels are aligned with the destination channels of the copy) and then representing the same value as an immediate of the source type (assuming such an immediate type exists), but the code below doesn't do that, so just bail for the moment.

    Reviewed-by: Jason Ekstrand
* i965/fs: Fix CSE temporary copy for some LOAD_PAYLOAD corner cases. | Francisco Jerez | 2016-05-27 | 1 | -1/+2
    If the LOAD_PAYLOAD instruction only has header sources, it's possible for the number of registers written to be less than or equal to the SIMD component size, in which case it would take the single-MOV path at the bottom, which would cause the channel enable masks to be applied incorrectly to the header contents and/or cause it to write past the end of the allocated temporary. If the instruction is either LOAD_PAYLOAD or doesn't write exactly one component, the MOV path is going to mess up the program, so just don't use it.

    Reviewed-by: Jason Ekstrand
* i965/fs: Handle instruction predication in SIMD lowering pass. | Francisco Jerez | 2016-05-27 | 1 | -1/+11
    Reviewed-by: Jason Ekstrand
* i965/fs: No need to unzip SIMD-periodic sources during SIMD lowering. | Francisco Jerez | 2016-05-27 | 1 | -1/+1
    If the source value is going to be the same for all SIMD-lowered chunks of the instruction, there should be no need to unzip the value into multiple temporary registers, one for each lowered chunk. As a side effect this fixes SIMD lowering of instructions with a vector immediate source. In the long term it *might* still be worth fixing offset() to handle vector immediates correctly, though this should be good enough for the moment.

    Reviewed-by: Jason Ekstrand
* i965/fs: Generalize is_uniform() to is_periodic(). | Francisco Jerez | 2016-05-27 | 1 | -1/+30
    This will be useful in the SIMD lowering pass.

    Reviewed-by: Jason Ekstrand
* i965/fs: Fix byte_offset() for MRF/ARF/FIXED_GRF regs. | Francisco Jerez | 2016-05-27 | 1 | -11/+17
    Reviewed-by: Jason Ekstrand
* i965/fs: Fix off-by-one region overlap comparison in copy propagation. | Francisco Jerez | 2016-05-27 | 1 | -2/+2
    This was introduced in cf375a3333e54a01462f192202d609436e5fbec8 but the blame is mine because the pseudocode I sent in my review comment for the original patch suggesting to do things this way already had the off-by-one error. This may have caused copy propagation to be unnecessarily strict while checking whether VGRF writes interfere with any ACP entries and possibly miss valid optimization opportunities in cases where multiple copy instructions write sequential locations of the same VGRF.

    Cc: Iago Toral Quiroga
    Reviewed-by: Samuel Iglesias Gonsálvez
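To illustrate the kind of off-by-one described above, here is a minimal sketch of an overlap test on half-open ranges. It is not the Mesa code: `regions_overlap` and `regions_overlap_off_by_one` are hypothetical helpers, and the assumption that regions are expressed as a start offset plus a size is made only for this example.

```c
#include <stdbool.h>

/* Hypothetical helper (not from the Mesa tree): each region is the
 * half-open range [start, start + size). Two such ranges overlap only
 * if each one starts strictly before the other one ends. */
static bool
regions_overlap(unsigned start_a, unsigned size_a,
                unsigned start_b, unsigned size_b)
{
   return start_a < start_b + size_b && start_b < start_a + size_a;
}

/* Same test with non-strict comparisons. This variant also reports
 * regions that merely touch end-to-start, which is the overly strict
 * behaviour an off-by-one comparison would produce. */
static bool
regions_overlap_off_by_one(unsigned start_a, unsigned size_a,
                           unsigned start_b, unsigned size_b)
{
   return start_a <= start_b + size_b && start_b <= start_a + size_a;
}
```

With the strict version, two writes to sequential locations (say offsets 0..7 and 8..15) are not treated as interfering, while the non-strict version flags them.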
* anv/cmd_buffer: Don't delete command buffers in ResetCommandPool() | Ronie Salgado | 2016-05-27 | 1 | -19/+18
    v2 (Jason Ekstrand): Destroy command buffers in DestroyCommandPool().

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95034
    Reviewed-by: Jason Ekstrand
* gallium/util: another s/unsigned/enum pipe_prim_type/ for clang | Brian Paul | 2016-05-27 | 1 | -1/+1
    Trivial.
* anv: Try the first 8 render nodes instead of just renderD128 | Jason Ekstrand | 2016-05-27 | 1 | -4/+10
    This way, if you have other cards installed, the Vulkan driver will still work. No guarantees about WSI working correctly but offscreen should at least work.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95537
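A rough sketch of what probing several render nodes could look like. The helper names `render_node_path` and `open_first_render_node` are hypothetical, and the `/dev/dri/renderD<N>` path layout starting at 128 is assumed from the `renderD128` name in the commit title. A real driver would also check that the opened device is one it supports, rather than accepting the first node that opens.

```c
#include <fcntl.h>
#include <stdio.h>

#define FIRST_RENDER_NODE 128   /* renderD128, as named in the commit title */
#define MAX_RENDER_NODES  8     /* "the first 8 render nodes" */

/* Hypothetical helper: format the path of the index-th render node. */
static int
render_node_path(char *buf, size_t len, int index)
{
   return snprintf(buf, len, "/dev/dri/renderD%d", FIRST_RENDER_NODE + index);
}

/* Hypothetical helper: return a file descriptor for the first render
 * node that can be opened, or -1 if none of the candidates can. */
static int
open_first_render_node(void)
{
   for (int i = 0; i < MAX_RENDER_NODES; i++) {
      char path[32];
      render_node_path(path, sizeof(path), i);

      int fd = open(path, O_RDWR);
      if (fd >= 0)
         return fd;   /* caller is responsible for close(fd) */
   }
   return -1;
}
```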
* anv: strdup the device path into the physical device | Jason Ekstrand | 2016-05-27 | 2 | -2/+4
    This way we don't have to assume that the string coming in is a piece of constant data that exists forever.
* anv/formats: Exit early for unsupported formats | Jason Ekstrand | 2016-05-27 | 1 | -2/+3
* anv/formats: Map VK_FORMAT_UNDEFINED to ISL_FORMAT_UNSUPPORTED | Jason Ekstrand | 2016-05-27 | 1 | -1/+1
    At one point in time, we may have used the mapping to ISL_FORMAT_RAW for certain buffer surfaces but that time has long since passed. This fixes a bug where doing format queries on VK_FORMAT_UNDEFINED would assert-fail.
* anv/clear: Remove an unused variable | Jason Ekstrand | 2016-05-27 | 1 | -1/+0
* gallium/util: another unsigned -> enum pipe_prim_type change | Brian Paul | 2016-05-27 | 1 | -1/+1
    gcc didn't warn about the unsigned / enum pipe_prim_type mismatch between the .c and .h file.

    Reviewed-by: Roland Scheidegger
* i965/compute: Fix uniform init issue when SIMD8 is skipped | Jordan Justen | 2016-05-27 | 1 | -1/+1
    In d8347f12ead89c5a58f69ce9283a54ac8487159c, we added support for skipping SIMD8 generation when the program local size is too large for SIMD8 to be usable. This change was missed in that commit.

    This bug would impact gen7 platforms when the compute shader local size is greater than 512, and gen8 platforms when the local size is greater than 448.

    Signed-off-by: Jordan Justen
    Reviewed-by: Kenneth Graunke
* anv: Emit DRAWING_RECTANGLE once at driver initialization | Jason Ekstrand | 2016-05-27 | 2 | -13/+9
    Also, we don't actually need it for clipping because meta always colors inside the lines and, for all other operations, the user is required to set a scissor. Since DRAWING_RECTANGLE stalls the GPU, we want to emit it as little as possible.

    Signed-off-by: Jason Ekstrand
    Reviewed-by: Jordan Justen
* anv/cmd_buffer: Only emit PIPE_CONTROL on-demand | Jason Ekstrand | 2016-05-27 | 6 | -70/+140
    This is in contrast to emitting it directly in vkCmdPipelineBarrier. This has a couple of advantages. First, it means that no matter how many vkCmdPipelineBarrier calls the application strings together, it gets one or two PIPE_CONTROLs. Second, it allows us to better track when we need to do stalls because we can flag when a flush has happened and we need a stall.

    Signed-off-by: Jason Ekstrand
    Reviewed-by: Jordan Justen
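The on-demand idea can be sketched as accumulating pending bits and flushing them lazily. Everything below is an assumption made for illustration: the struct, the bit names and the helper functions are hypothetical and do not reflect the actual anv data structures. The sketch also always emits a single command, whereas the commit mentions one or two.

```c
#include <stdint.h>

/* Hypothetical flag bits; the real driver tracks its own set. */
enum {
   SKETCH_FLUSH_BIT      = 1u << 0,
   SKETCH_INVALIDATE_BIT = 1u << 1,
   SKETCH_CS_STALL_BIT   = 1u << 2,
};

/* Hypothetical stand-in for the command buffer state. */
struct cmd_buffer_sketch {
   uint32_t pending_pipe_bits;      /* work requested but not yet emitted */
   unsigned pipe_controls_emitted;  /* counter standing in for real emission */
};

/* A barrier only records what is needed; nothing is emitted yet. */
static void
record_barrier(struct cmd_buffer_sketch *cmd, uint32_t bits)
{
   cmd->pending_pipe_bits |= bits;
}

/* Called right before work that depends on the barrier (e.g. a draw):
 * all accumulated bits collapse into a single emitted command. */
static void
apply_pending_pipe_bits(struct cmd_buffer_sketch *cmd)
{
   if (cmd->pending_pipe_bits == 0)
      return;

   cmd->pipe_controls_emitted++;   /* a real driver would emit PIPE_CONTROL here */
   cmd->pending_pipe_bits = 0;
}
```

In this sketch, three consecutive `record_barrier` calls followed by one `apply_pending_pipe_bits` call bump the counter once, and a second `apply_pending_pipe_bits` with nothing pending is a no-op.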
* genxml: Make PIPE_CONTROL::CommandStreamerStallEnable a boolean | Jason Ekstrand | 2016-05-27 | 5 | -5/+5
    This has been declared as a uint since SNB but it's only one bit.

    Signed-off-by: Jason Ekstrand
    Reviewed-by: Jordan Justen
* anv/clear: Only clear the render area when doing subpass clears | Jason Ekstrand | 2016-05-27 | 3 | -4/+3
    Signed-off-by: Jason Ekstrand
    Reviewed-by: Jordan Justen
* anv: Move push constant allocation to the command buffer | Jason Ekstrand | 2016-05-27 | 5 | -44/+71
    Instead of blasting it out as part of the pipeline, we put it in the command buffer and only blast it out when it's really needed. Since the PUSH_CONSTANT_ALLOC commands aren't pipelined, they immediately cause a stall which we would like to avoid.

    Signed-off-by: Jason Ekstrand
    Reviewed-by: Jordan Justen
* radeonsi: enable OpenGL 4.3 | Bas Nieuwenhuizen | 2016-05-27 | 1 | -0/+4
    Signed-off-by: Bas Nieuwenhuizen
    Reviewed-by: Marek Olšák
* nouveau: enable GL 4.3 on kepler/fermi | Dave Airlie | 2016-05-28 | 1 | -1/+1
    Signed-off-by: Dave Airlie
* radeonsi: always reserve output space for tess factors | Marek Olšák | 2016-05-27 | 1 | -1/+6
    Reviewed-by: Bas Nieuwenhuizen
    Tested-by: Dave Airlie
* glsl/linker: call link_uniform blocks on linked shader. | Dave Airlie | 2016-05-28 | 1 | -1/+1
    The old code called this on the prelinked shader list, but at this point we have the linked shader, so we should call the interface on that alone.

    This fixes a regression in:
    dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.13
    introduced in 5b2675093e863a52b610f112884ae12d42513770
    glsl: handle implicit sized arrays in ssbo

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96228
    Reviewed-by: Timothy Arceri
    Reported-by: Mark James
    Signed-off-by: Dave Airlie
* mesa/get: drop unused extension checks. | Dave Airlie | 2016-05-28 | 1 | -3/+0
    These all show up as unused warnings here, so drop them for now.

    Reviewed-by: Emil Velikov
    Signed-off-by: Dave Airlie
* gallium/ddebug: Add passthrough for query_memory_info. | Bas Nieuwenhuizen | 2016-05-27 | 1 | -0/+9
    Signed-off-by: Bas Nieuwenhuizen
    Reviewed-by: Marek Olšák
* nir/inline: Also rewrite param derefs for texture instructions | Jason Ekstrand | 2016-05-27 | 1 | -6/+20
    Without this, samplers get left hanging as derefs to variables that don't actually exist.

    Reviewed-by: Connor Abbott
* nir/inline: Break the guts of rewrite_param-derefs into a helper | Jason Ekstrand | 2016-05-27 | 1 | -19/+30
    Reviewed-by: Connor Abbott
* nir/inline: Make the rewrite_param_derefs helper work on instructions | Jason Ekstrand | 2016-05-27 | 1 | -28/+25
    Now that we have the better nir_foreach_block macro, there's no reason to use the archaic block version for everything.

    Reviewed-by: Connor Abbott
* nir/inline: Don't use foreach_instr_safe unless we need to | Jason Ekstrand | 2016-05-27 | 1 | -2/+2
    Suggested-by: Connor Abbott
* gallivm: eliminate a unnecessary AND with unorm lerps | Roland Scheidegger | 2016-05-27 | 1 | -10/+35
    Instead of doing an add and then masking out the upper bits, we can simply do an add with a half-wide type (this, of course, assumes the hw can actually do it...), so we'll get the required zero in the upper bits automatically.

    Reviewed-by: Jose Fonseca
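The arithmetic behind this can be checked with a scalar sketch. It assumes 8-bit values held in 16-bit lanes, which is an assumption made for this example; the gallivm code works on vector types, and `add_then_mask` and `add_half_width` are hypothetical names.

```c
#include <stdint.h>

/* Wide add followed by an AND that clears the upper byte. */
static uint16_t
add_then_mask(uint16_t a, uint16_t b)
{
   return (uint16_t)((a + b) & 0xff);
}

/* Add performed in the half-wide (8-bit) type: the result wraps
 * modulo 256, so the upper byte is zero once it is widened again
 * and no separate AND is needed. */
static uint16_t
add_half_width(uint16_t a, uint16_t b)
{
   uint8_t narrow = (uint8_t)((uint8_t)a + (uint8_t)b);
   return narrow;
}
```

Both functions return the same value for every pair of inputs, because the low 8 bits of a sum depend only on the low 8 bits of the operands.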
* gallium/util: use enum pipe_prim_type instead of unsigned some more | Roland Scheidegger | 2016-05-27 | 1 | -5/+16
    There were complaints from a mingw build:
    u_draw.h:134:14: error: invalid conversion from ‘uint {aka unsigned int}’ to ‘pipe_prim_type’ [-fpermissive]

    Reviewed-by: Brian Paul
* svga: remove unneeded casts in get_query_result_vgpu9() calls | Brian Paul | 2016-05-27 | 1 | -2/+2
    Reviewed-by: Charmaine Lee
* svga: use MAYBE_UNUSED to silence release-build warnings | Brian Paul | 2016-05-27 | 1 | -7/+4
    Signed-off-by: Brian Paul
* isl: Fix some tautological-compare warnings | Ben Widawsky | 2016-05-26 | 2 | -2/+10
    Fixes:
    isl.c:62:22: warning: self-comparison always evaluates to true [-Wtautological-compare]
       assert(ISL_DEV_GEN(dev) == dev->info->gen);
                               ^~
    isl.c:63:33: warning: self-comparison always evaluates to true [-Wtautological-compare]
       assert(ISL_DEV_USE_SEPARATE_STENCIL(dev) == dev->use_separate_stencil);

    Signed-off-by: Ben Widawsky
    Reviewed-by: Anuj Phogat
* mesa: add support for GLSL ES 3.20 version string | Ilia Mirkin | 2016-05-26 | 1 | -0/+2
    Signed-off-by: Ilia Mirkin
    Reviewed-by: Anuj Phogat
* mapi: expose new functions in GL ES 3.2 | Ilia Mirkin | 2016-05-26 | 9 | -38/+38
    Signed-off-by: Ilia Mirkin
    Reviewed-by: Anuj Phogat
* nvc0/ir: handle a load's reg result not being used for locked variants | Ilia Mirkin | 2016-05-26 | 3 | -11/+45
    For a load locked, we might not use the first result, but the second result is the predicate result of the locking. In that case the load splitting logic (which is designed for splitting 128-bit loads) doesn't apply. Instead we take the predicate and move it into the first position (as having a dead result in the first def's position upsets all sorts of things including RA). Update the emitters to deal with this as well.

    Signed-off-by: Ilia Mirkin
    Tested-by: Dave Airlie
    Reviewed-by: Samuel Pitoiset