path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* nir/opt_large_constants: Handle store writemasksConnor Abbott2019-09-241-20/+24
| | | | | | | | | | | | | | | This fixes some piglit tests on radeonsi NIR where a varying is initialized to a constant array in the vertex shader. Varying packing after nir_lower_io_to_temporaries creates writemasked stores which persist after pulling the constant initialization down into the fragment shader. While we're here, rewrite handle_constant_store() to do the loop over components outside the switch, so that we don't have to duplicate the writemask checking for every bitsize. Fixes: 1235850522c ("nir: Add a large constants optimization pass") Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
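The restructuring described above (one loop over components, with the writemask checked once per component rather than once per bit size) can be sketched roughly as follows. This is a minimal illustration, not the code from the pass: `store_masked` and its parameters are hypothetical, and it copies raw bytes with `memcpy` instead of switching on the bit size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: copy only the components enabled in `writemask`
 * from `src` into `dst`. Each component is `bit_size / 8` bytes wide.
 * The writemask test sits in the per-component loop, so it is written
 * once no matter how many bit sizes are supported. */
static void store_masked(uint8_t *dst, const uint8_t *src,
                         unsigned num_components, unsigned bit_size,
                         unsigned writemask)
{
    assert(bit_size >= 8 && bit_size % 8 == 0);
    assert(num_components <= 16);

    const unsigned bytes = bit_size / 8;

    for (unsigned i = 0; i < num_components; i++) {
        /* Skip components the store does not write. */
        if (!(writemask & (1u << i)))
            continue;

        memcpy(dst + i * bytes, src + i * bytes, bytes);
    }
}
```

With a writemask of 0x5, only components 0 and 2 are written and the others keep their previous contents.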
* radv: fix s/load/store/ copy-paste typoEric Engestrom2019-09-241-1/+1
| | | | | | Fixes: cdc6efddf918bc07d30d ("radv: implement all depth/stencil resolve modes using graphics") Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nouveau: add idep_nir_headers as dep for libnouveauStephen Barber2019-09-241-2/+2
| Fixes a compilation error when building libnouveau:
|
| In file included from ../src/gallium/drivers/nouveau/nv50/nv50_program.c:25:
| ../src/compiler/nir/nir.h:1115:10: fatal error: nir_intrinsics.h: No such file or directory
|  #include "nir_intrinsics.h"
|           ^~~~~~~~~~~~~~~~~~
| compilation terminated.
|
| Fixes: f014ae3c7cce504afe5d ("nouveau: add support for nir")
| Signed-off-by: Stephen Barber <[email protected]>
| Reviewed-by: Eric Engestrom <[email protected]>
| Reviewed-by: Karol Herbst <[email protected]>
* radv: Add workaround for hang in The Surge 2.Bas Nieuwenhuizen2019-09-241-0/+8
| | | | | | | | | | | Released today and hangs on RADV. We don't have the root cause yet, but this should unblock people playing the game. No drirc because the radv debugflags are not usable from drirc and I want this backported. CC: <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* i965/fs: set rounding mode when emitting the flrp instructionAndres Gomez2019-09-241-0/+7
| | | | | | | | | | | flrp was missed when the rounding mode was added for other instructions. Fixes: ba1e25e1aa6 ("i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions") Suggested-by: Ian Romanick <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* i965/fs: add a comment about how the rounding mode in fmul is setAndres Gomez2019-09-241-0/+4
| | | | | | | | | | | | | | | | | After 1711bf6cf2d ("intel/fs: Generate better code for fsign multiplied by a value"), the conflict resolution for setting the rounding mode after the fused fmul and fsign optimization is non-obvious. Basically, the optimization doesn't really result in a MUL, or any other operation which would need to have the rounding mode set. Hence, we set it just before the actual MUL in the treatment of fmul. Fixes: ba1e25e1aa6 ("i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions") Suggested-by: Caio Marcelo de Oliveira Filho <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* lima/gpir: Fix 64-bit shift in scheduler spillingConnor Abbott2019-09-241-2/+2
| | | | | | There are 64 physical registers so the shift must be 64 bits. Reviewed-by: Vasily Khoruzhick <[email protected]>
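The class of bug fixed here is easy to reproduce in isolation: a per-register bit in a 64-entry mask needs a 64-bit shift operand. The sketch below is illustrative only; `physreg_bit` is a hypothetical helper, not a function from the lima scheduler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return a mask with only the bit for physical
 * register `index` set. With 64 registers the value being shifted must
 * be 64 bits wide. Writing `1 << index` would shift a plain int, which
 * is undefined behaviour once index reaches the width of int. */
static uint64_t physreg_bit(unsigned index)
{
    assert(index < 64);
    return UINT64_C(1) << index;
}
```

Using `UINT64_C(1)` (or `1ull`) keeps the whole expression in 64 bits, so registers 32 to 63 get distinct bits.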
* lima/gpir: Don't emit movs when translating from NIRConnor Abbott2019-09-241-36/+50
| | | | | | | | | | | The scheduler doesn't expect them. To do this, I had to refactor the registration part of gpir_node_create_dest() to be separate from creating and inserting the node, since the last two now aren't done when handling moves. This adds more code but creates the possibility of automatically inserting input dependencies when inserting nodes, similar to what's done in NIR with the use-def lists (this isn't done yet). Reviewed-by: Vasily Khoruzhick <[email protected]>
* lima/gpir: Fix postlog2 fixup handlingConnor Abbott2019-09-241-11/+12
| | | | | | | | | | | | We guarantee that a complex1 op is always used by postlog2 directly by rewriting the postlog2 op to be a move when there would be a move inserted between them. But we weren't doing this in all circumstances where there might be a move. Move the logic to place_move() so that it always happens. Fixes a few log tests that happened to start failing due to changes in the register allocator leading to a different scheduling order. Reviewed-by: Vasily Khoruzhick <[email protected]>
* lima/gpir: Use registers for values live in multiple blocksConnor Abbott2019-09-247-156/+648
| This commit adds the framework for cross-basic-block register allocation.
| Like ARM's compiler, we assume that the value registers aren't usable
| across branches, which means we have to use physical registers to store
| any value that crosses a basic block. There are three parts to this:
|
| 1. When translating from NIR, we rely on the NIR out-of-ssa pass to
|    coalesce values into registers. We insert store_reg instructions for
|    values used in more than one basic block, and load_reg instructions
|    for values not defined in the same basic block (or defined after
|    their use, for loops). So by the time we've translated out of NIR
|    we've already split things into values (which are only used in the
|    same basic block) and registers (which are only used in different
|    basic blocks than where they're defined).
|
| 2. We allocate the registers at the same time that we allocate the
|    values, before the final scheduler. Unlike the values, where the
|    assigned color is fake, we assign the actual physical index &
|    component to physregs at this stage. load_reg and store_reg are
|    treated as moves in the allocator and when creating write-after-read
|    dependencies.
|
| 3. Finally, in the main scheduler we have to avoid overwriting existing
|    live physregs when spilling. First, we have to tell the scheduler
|    which physical registers are live at the end of each block, to avoid
|    overwriting those. If a register is only live at the beginning, we
|    can reuse it for spilling after the last original use in the final
|    program happens, i.e. before any original use is scheduled, but we
|    have to be careful to add the proper dependencies so that the spill
|    write is scheduled before the original reads. To handle this we
|    repurpose reg_link for uses to be used by the scheduler.
|
| A few register-related things copied over from NIR or from other drivers
| can be dropped.
|
| Reviewed-by: Vasily Khoruzhick <[email protected]>
* lima/gpir: Support branch instructionsConnor Abbott2019-09-246-78/+102
| | | | | | | | | | | | | | | | Because branch conditions have to be in the pass slot, there is no unconditional branch, and realistically the pass slot has to contain a move when branching (there's nothing it does that would be useful for operating on booleans, so we can't use it for anything when computing the branch condition), we put the branch instruction in the pass slot and at codegen time turn it into a move of the branch condition. This means that it doesn't have to be special-cased like store instructions are in the scheduler. Because of this decision we can remove the half-implemented BRANCH codegen slot. Finally, we (ab)use the existing schedule_first mechanism to make sure that branches are always last in the basic block. Reviewed-by: Vasily Khoruzhick <[email protected]>
* lima/gpir: Only try to place actual childrenConnor Abbott2019-09-241-1/+1
| | | | | | | | | | When picking a node to be scheduled, we try to schedule its children as well. But we shouldn't try to schedule nodes which only have a fake dependency on the original node, since this isn't the point of scheduling children at the same time and can break some expectations of the rest of the code. Reviewed-by: Vasily Khoruzhick <[email protected]>
* lima/gpir: Fix compiler warningConnor Abbott2019-09-241-1/+1
| | | | Reviewed-by: Vasily Khoruzhick <[email protected]>
* glx: Implement GLX_EXT_no_config_contextAdam Jackson2019-09-2312-26/+65
| | | | | | | | | | This is the GLX counterpart to EGL_KHR_no_config_context. Contexts may now be created without reference to an fbconfig, in which case it is treated as compatible with any fbconfig (and thus any GLX drawable). Khronos: https://github.com/KhronosGroup/OpenGL-Registry/pull/102 Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* glx: Lift sending the MakeCurrent request to top-level codeAdam Jackson2019-09-232-167/+187
| Somewhat terrifyingly, we never sent this for direct contexts, which
| means the server never knew the context/drawable bindings.
|
| To handle this sanely, pull the request code up out of the indirect
| backend, and rewrite the context switch path to call it as appropriate.
| This attempts to preserve the existing behavior of not calling unbind()
| on the context if its refcount would not drop to zero.
|
| Of course, you can't just do this indiscriminately, because this is GLX
| and extant X servers have bugs and everything is terrible. To wit:
|
| - For 1.20.x prior to 1.20.6, you can bind a direct context once, but
|   the second time you try to modify the context's binding you will get
|   GLXBadContextTag. This includes unbinding the context. And "deleting"
|   the context will leak memory, because it will still appear to be
|   current.
|
| - For 1.19 and earlier, glXMakeCurrent(dpy, None, ctx) should be legal
|   for GL 3.0+ contexts, but the server will throw BadMatch.
|
| To guard against this, we only send the request for indirect contexts
| unless the server is known good, and only mention one context at a time
| in such a request; if switching between contexts, we first unbind the
| old, and then bind the new.
|
| Note that the second VendorRelease() version is to catch XFree86 4.x and
| Xorg [67].x, which almost certainly have the above bugs. Other servers
| might report different version numbers here, but we can't do direct
| rendering against them, so this should be safe.
|
| Fixes glx-make-context, glx-multi-window-single-context and
| glx-query-drawable-glx_fbconfig_id-window. Sufficiently old piglit will
| regress on glx-make-glxdrawable-current (throwing BadMatch), which is
| fixed by mesa/piglit!116.
* glx: Move vertex array protocol state into the indirect backendAdam Jackson2019-09-232-16/+22
| | | | | Only relevant for indirect contexts, so let's get that code out of the common path.
* intel: Increase Gen11 compute shader scratch IDs to 64.Kenneth Graunke2019-09-233-2/+41
| | | | | | | | | | | | | | | | | | | | | | | From the MEDIA_VFE_STATE docs: "Starting with this configuration, the Maximum Number of Threads must be set to (#EU * 8) for GPGPU dispatches. Although there are only 7 threads per EU in the configuration, the FFTID is calculated as if there are 8 threads per EU, which in turn requires a larger amount of Scratch Space to be allocated by the driver." It's pretty clear that we need to increase this for scratch address calculations, because the FFTID has a certain bit-pattern. The quote above seems to indicate that we should increase the actual thread count programmed in MEDIA_VFE_STATE as well, but we think the intention is to only bump the scratch space. Fixes GPU hangs in Bioshock Infinite and Synmark's CSDof on Icelake 8x8. Fixes: 5ac804bd9ac ("intel: Add a preliminary device for Ice Lake") Reviewed-by: Matt Turner <[email protected]>
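The arithmetic behind the quoted documentation can be written out as a small sketch. It assumes the scratch buffer is sized as one per-thread slot for every possible FFTID, and that FFTIDs are numbered as if each EU had 8 threads; `cs_scratch_size` and its parameters are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stdint.h>

/* Per the MEDIA_VFE_STATE note, FFTIDs are computed as if every EU had
 * 8 threads, even though only 7 exist in this configuration. */
enum { FFTID_THREADS_PER_EU = 8 };

/* Hypothetical helper: total scratch bytes needed so that every FFTID
 * the hardware can generate maps to its own slot. */
static uint64_t cs_scratch_size(unsigned eu_count, uint32_t per_thread_scratch)
{
    assert(eu_count > 0);
    return (uint64_t)eu_count * FFTID_THREADS_PER_EU * per_thread_scratch;
}
```

Sizing by the 7 real threads instead would leave the highest FFTIDs pointing past the end of the allocation, which is consistent with the hangs the commit describes.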
* Revert "intel/gen11+: Enable Hardware filtering of Semi-Pipelined State in WM"Kenneth Graunke2019-09-234-29/+0
| | | | | | | | | | | | | | | This reverts commit 729de1488f49033bc181b8123af5658228a51bf1. It turns out that, although the register is in the logical context, it isn't whitelisted, so we can't actually write it from userspace batch buffers. The write just becomes a noop, which is why we saw no performance changes. I manually whitelisted it, and still observed no performance gains, but it did regress KHR-GL46.texture_cube_map_array.color_depth_attachments on the iris driver. So we might need to fix something before enabling this. To prevent it randomly getting turned on should the kernel ever whitelist this register, we revert the patch for now.
* util/rb_tree: Replace useless ifs with assertsJason Ekstrand2019-09-231-2/+2
| | | | Reviewed-by: Ian Romanick <[email protected]>
* broadcom/genxml: Stop manually scrubbing 'α' -> "alpha"Kenneth Graunke2019-09-231-1/+0
| | | | | | | 'α' has never appeared in any genxml files, so there's no need to replace it with the word "alpha". Reviewed-by: Eric Anholt <[email protected]>
* intel/genxml: Stop manually scrubbing 'α' -> "alpha"Kenneth Graunke2019-09-232-2/+1
| | | | | | | 'α' has never appeared in any genxml files, so there's no need to replace it with the word "alpha". Reviewed-by: Jordan Justen <[email protected]>
* freedreno/a6xx: do streamout only in binning passRob Clark2019-09-232-13/+16
| | | | | | | | Use VPC_SO_OVERRIDE to control whether we do streamout in the binning or draw pass. Normally we want to do streamout in the binning pass, except when there is a single tile and the binning pass is skipped. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: fix binning pass vs. xfbRob Clark2019-09-231-3/+7
| | | | | | | | We could be doing streamout from the binning pass. In this case we want to use the full VS, which doesn't have (potentially streamed out) varyings stripped out. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: un-open-code PC_PRIMITIVE_CNTL_1.PSIZERob Clark2019-09-231-1/+1
| | | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* ac/nir: force unnormalized coordinates for RECTMarek Olšák2019-09-231-1/+3
| | | | | | This fixes VAAPI. Reviewed-by: Connor Abbott <[email protected]>
* ac/nir: port Z compare value clamping from radeonsiMarek Olšák2019-09-231-9/+25
| | | | | | This fixes some dEQP tests. Reviewed-by: Connor Abbott <[email protected]>
* tgsi_to_nir: fix 2-component system values like tess_level_inner_defaultMarek Olšák2019-09-231-1/+3
| | | | | Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* tgsi_to_nir: fix masked out image loadsMarek Olšák2019-09-231-2/+1
| | | | | | | This caused a failure in NIR validation. Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: define 8-byte size and alignment for bindless variablesMarek Olšák2019-09-231-1/+6
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: don't add bindless variables to num_textures and num_imagesMarek Olšák2019-09-231-0/+4
| | | | | | It confuses radeonsi. Reviewed-by: Connor Abbott <[email protected]>
* loader: always map the "amdgpu" kernel driver name to radeonsi (v2)Jiang, Sonny2019-09-231-0/+9
| | | | | | | | v2: cleanup Signed-off-by: Sonny Jiang <[email protected]> Signed-off-by: Marek Olšák <[email protected]> Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* ac: stop using PCI IDs for chip identificationMarek Olšák2019-09-231-15/+58
| | | | | | PCI IDs for amdgpu will be removed from Mesa. Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* ac/addrlib: fix chip identification for Vega10, Arcturus, Raven2, RenoirMarek Olšák2019-09-231-10/+5
| | | | | Cc: 19.2 <[email protected]> Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* nir/repair_ssa: Replace the unreachable check with the phi builderJason Ekstrand2019-09-231-35/+44
| | | | | | | | | | | | | | | In a3268599f3c9, I attempted to fix nir_repair_ssa for unreachable blocks. However, that commit missed the possibility that the use is in a block which, itself, is unreachable. In this case, we can end up in an infinite loop trying to replace a def with itself. Even though a no-op replacement is a fine operation, it keeps extending the end of the uses list as we're walking it. Instead of explicitly checking for the group of conditions, just check if the phi builder gives us a different def. That's guaranteed to be 100% reliable and, while it lacks symmetry with the is_valid checks, should be more reliable. Fixes: a3268599 "nir/repair_ssa: Repair dominance for unreachable..." Reviewed-by: Ian Romanick <[email protected]>
* aco: only emit waitcnt on loop continues if there was some load or exportDaniel Schürmann2019-09-231-1/+1
| | | | Reviewed-by: Rhys Perry <[email protected]>
* nv50/ir/nir: comparison of integer expressions of different signedness warningKarol Herbst2019-09-231-1/+1
| | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Rhys Kidd <[email protected]>
* nv50/ir: fix unnecessary parentheses warningKarol Herbst2019-09-231-1/+1
| | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Rhys Kidd <[email protected]>
* lima: remove partial clear support from pipe->clear()Erico Nunes2019-09-231-93/+5
| | | | | | | | | | | | | | | pipe->clear() is not called for partial clears, which mesa emulates by drawing a quad. Furthermore, drivers should not use rasterizer state information for scissor information (which was being used to handle the partial clears). So, remove the partial clear support since it was not supposed to be handled by pipe->clear() anyway. This fixes issues with clearing after switching to different sized framebuffers. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]> Reviewed-by: Qiang Yu <[email protected]>
* dEQP-GLES2.functional.buffer.write.use.index_array.* are passing now.Boris Brezillon2019-09-231-2/+0
| | | | Signed-off-by: Boris Brezillon <[email protected]>
* panfrost: Fix indexed drawsBoris Brezillon2019-09-231-1/+1
| | | | | | | ->padded_count should be large enough to cover all vertices pointed to by the index array. Use the local vertex_count variable that contains the updated vertex_count value for the indexed draw case. Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
* clover/nir: fix compilation with g++-5.5 and maybe earlierKarol Herbst2019-09-231-10/+7
| | | | | | | | fixes "sorry, unimplemented: non-trivial designated initializers not supported" Fixes: deb04adf2ae ("clover: add support for passing kernels as nir to the driver") Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* st/mesa: Bail on incomplete attachments in discard_framebufferKenneth Graunke2019-09-221-1/+1
| | | | | | | | | | | Incomplete attachments don't have an associated pipe_surface, so this would crash. Fixes a WebGL conformance test that uses incomplete attachments: https://www.khronos.org/registry/webgl/sdk/tests/conformance2/renderbuffers/invalidate-framebuffer.html?webglVersion=2&quiet=0&quick=1 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111756 Reviewed-By: Tapani Pälli <[email protected]>
* lima: implement BO cacheVasily Khoruzhick2019-09-228-30/+212
| Allocating BOs is expensive, so we should avoid doing that by caching
| freed BOs.
|
| The BO cache is modelled after the one in the v3d driver and works as
| follows:
|
| - in lima_bo_create(): check if we have a matching BO in the cache and
|   return it if there is one; allocate a new BO otherwise.
| - in lima_bo_unreference() (renamed from lima_bo_free()): put the BO in
|   the cache instead of freeing it, and remove all stale BOs from the
|   cache.
|
| Reviewed-by: Qiang Yu <[email protected]>
| Signed-off-by: Vasily Khoruzhick <[email protected]>
* lima: use 0 to poll if BO is busy in lima_bo_wait()Vasily Khoruzhick2019-09-221-1/+7
| | | | | | | os_time_get_absolute_timeout(0) returns the current time, while the kernel driver expects a value of 0 to poll the BO status and return immediately. Fix it by setting abs_timeout to 0 if timeout_ns is 0. Reviewed-by: Qiang Yu <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
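The special case can be sketched as a pure function. This is an assumption-laden illustration rather than the driver code: `wait_abs_timeout` is a hypothetical name, the current time is passed in as a parameter instead of being read from the clock, and clamping to INT64_MAX on overflow is this sketch's own choice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: turn a relative timeout into the absolute
 * deadline handed to the kernel. A relative timeout of 0 means "poll",
 * and the kernel expects a literal 0 for that, not the current time. */
static int64_t wait_abs_timeout(int64_t now_ns, uint64_t timeout_ns)
{
    assert(now_ns >= 0);

    if (timeout_ns == 0)
        return 0;

    /* Clamp instead of overflowing when the deadline is out of range. */
    if (timeout_ns > (uint64_t)(INT64_MAX - now_ns))
        return INT64_MAX;

    return now_ns + (int64_t)timeout_ns;
}
```

Without the early return, a zero timeout would produce a deadline equal to the current time, which the kernel would treat as a real deadline rather than as a request to poll.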
* lima: move damage bound build to resourceQiang Yu2019-09-233-22/+41
| | | | | Reviewed-and-Tested-by: Vasily Khoruzhick <[email protected]> Signed-off-by: Qiang Yu <[email protected]>
* lima: don't use damage system when full damageQiang Yu2019-09-231-0/+14
| | | | | | | Sometimes Weston sets a full damage region. In that case it is more efficient to use the cached PP stream instead of dynamically creating one. Reviewed-and-Tested-by: Vasily Khoruzhick <[email protected]> Signed-off-by: Qiang Yu <[email protected]>
* lima: implement EGL_KHR_partial_updateQiang Yu2019-09-235-65/+86
| | | | | | | | This extension sets a damage region for each buffer swap, which can be used to reduce buffer reload cost by feeding only the damage region's tile buffer addresses to the PP. Reviewed-and-Tested-by: Vasily Khoruzhick <[email protected]> Signed-off-by: Qiang Yu <[email protected]>
* lima: fix PLBU viewport configurationIcenowy Zheng2019-09-223-21/+21
| | | | | | | | | | | The PLBU expects the coordinates of the viewport's 4 borders, but currently we're feeding it the coordinates of the left-bottom point and the size, which leads to misrendering when the left-bottom point is not (0,0). Change the macros for the viewport PLBU command, and the data fed to it. The code to calculate the 4 borders is ported from Panfrost. Signed-off-by: Icenowy Zheng <[email protected]> Reviewed-by: Qiang Yu <[email protected]>
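The difference between the two representations is small but easy to get wrong. The sketch below converts an origin-plus-size viewport into four border coordinates; the struct and function names are hypothetical, and the real code ported from Panfrost may derive the borders from other viewport state.

```c
#include <assert.h>

/* Hypothetical representation of what the PLBU viewport command wants:
 * the coordinate of each of the four borders. */
struct viewport_borders {
    float left, right, bottom, top;
};

/* Convert a viewport given as left-bottom corner plus size into its
 * four borders. Passing (x, y, width, height) straight through is only
 * correct when x and y are both 0. */
static struct viewport_borders
viewport_to_borders(float x, float y, float width, float height)
{
    assert(width >= 0.0f && height >= 0.0f);

    struct viewport_borders b = {
        .left = x,
        .right = x + width,
        .bottom = y,
        .top = y + height,
    };
    return b;
}
```

For a 640x480 viewport at (16, 32), the right and top borders are 656 and 512, not 640 and 480, which is the misrendering case the commit describes.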
* amd: Build aco only if radv is enabledBas Nieuwenhuizen2019-09-211-1/+1
| | | | | | | | ACO depends on C++14, but radeonsi/radv with LLVM 8,9 do not. Let us only require it for RADV, since that is the only user. Fixes: a70a9987181 "radv/aco: Setup alternate path in RADV to support the experimental ACO compiler" Reviewed-by: Marek Olšák <[email protected]>
* nvc0: expose spirv supportKarol Herbst2019-09-214-3/+26
| required for OpenCL
|
| v2: adjust to changes in previous commits
| v3: properly convert to NIR in nvc0_cp_state_create
|
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Pierre Moreau <[email protected]> (v1)