Commit message | Author | Age | Files | Lines
* r600: add cull distance support | Dave Airlie | 2017-11-21 | 8 | -7/+26
| | | | | | This passes all the tests in piglit. Signed-off-by: Dave Airlie <[email protected]>
* i965: Optimize bucket index calculation | Aravindan Muthukumar | 2017-11-20 | 1 | -8/+39
| Reducing Bucket index calculation to O(1).
|
| This algorithm calculates the index using a matrix method. Assuming PAGE_SIZE is 4096, the matrix arrangement is as below:
|
|  1*4096   2*4096   3*4096   4*4096
|  5*4096   6*4096   7*4096   8*4096
| 10*4096  12*4096  14*4096  16*4096
| 20*4096  24*4096  28*4096  32*4096
|   ...      ...      ...      ...
|   ...      ...      ...      ...
|   ...      ...      ...   max_cache_size
|
| From this matrix it's clearly seen that every row follows the pattern below:
|
|    ...       ...       ...        n
| n+(1/4)n  n+(1/2)n  n+(3/4)n     2n
|
| Row is calculated as log2(size/PAGE_SIZE).
| Column is calculated by converting the difference between the elements to fit into a power-of-two step size and indexing it.
|
| Final Index is (row*4)+(col-1)
|
| Tested with Intel Mesa CI.
|
| Improves performance of 3DMark on BXT by 0.705966% +/- 0.229767% (n=20)
|
| v4: Review comments on style and code comments implemented (Ian).
| v3: Review comments implemented (Ian).
| v2: Review comments implemented (Jason).
|
| Signed-off-by: Aravindan Muthukumar <[email protected]>
| Signed-off-by: Kedar Karanje <[email protected]>
| Reviewed-by: Yogesh Marathe <[email protected]>
| Signed-off-by: Ian Romanick <[email protected]>
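The row/column scheme described in this commit message can be sketched in C as follows. This is an illustration under stated assumptions, not the code from the commit: the names `bucket_index` and `bucket_pages` are hypothetical, the row is derived with the GCC/Clang builtin `__builtin_clz` rather than a literal log2, `unsigned` is assumed to be 32 bits wide, and `size` is assumed to be at least 1.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical helper: size in pages of the bucket at a given index,
 * following the matrix 1 2 3 4 / 5 6 7 8 / 10 12 14 16 / 20 24 28 32 ... */
static unsigned bucket_pages(unsigned index)
{
    const unsigned row = index / 4;
    const unsigned col = index % 4 + 1;

    if (row == 0)
        return col;

    /* n is the last element of the previous row; this row steps by n/4. */
    const unsigned n = 4u << (row - 1);
    return n + col * (n / 4);
}

/* Hypothetical helper: O(1) index of the smallest bucket that can hold
 * `size` bytes. Assumes size >= 1 and a 32-bit unsigned int. */
static unsigned bucket_index(uint64_t size)
{
    /* Round the request up to whole pages. */
    const unsigned pages = (unsigned)((size + PAGE_SIZE - 1) / PAGE_SIZE);

    /* Row 0 covers 1..4 pages; row r > 0 covers (4 << (r - 1), 4 << r]. */
    const unsigned row = 30 - (unsigned)__builtin_clz((pages - 1) | 3);

    /* Largest bucket of the previous row (none before row 0). */
    const unsigned prev_row_max = (row == 0) ? 0 : (4u << (row - 1));

    /* log2 of the step between columns: 1 page in rows 0 and 1,
     * then doubling with every further row. */
    const unsigned col_log2 = (row == 0) ? 0 : row - 1;

    /* 1-based column, rounding the offset up to the next step. */
    const unsigned col =
        (pages - prev_row_max + (1u << col_log2) - 1) >> col_log2;

    return row * 4 + (col - 1);
}
```

For example, a 9-page request (36864 bytes) lands in row 2, column 1, so `bucket_index` returns 8, which `bucket_pages` maps back to the 10-page bucket.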
* meson: Guard the gallium dri component | Dylan Baker | 2017-11-20 | 1 | -2/+4
| | | | | | | | Currently the target has a redundant guard, and the state tracker isn't properly guarded. Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* meson: don't build gallium subdir unless we're building gallium | Dylan Baker | 2017-11-20 | 1 | -1/+3
| | | | | | | This will allow us to simplify some guards within the gallium directory. Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* broadcom/vc5: Align 1D texture miplevels to 64b. | Eric Anholt | 2017-11-20 | 1 | -0/+2
| | | | Fixes tex-miplevel-selection GL2:texture() 1D
* broadcom/vc5: Clamp min lod to the last level. | Eric Anholt | 2017-11-20 | 1 | -2/+3
| | | | | | Otherwise, the simulator would complain in tex-miplevel-selection that the min/max clamp was out of order. The actual HW seems to have clamped to the max anyway.
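The clamp this commit describes can be illustrated with a one-line helper. This is only a sketch of the idea, not the driver code: `clamp_min_lod` and its parameters are hypothetical names, and it assumes the max LOD is already limited to the last mip level, so bounding the min LOD the same way keeps the pair in order.

```c
#include <assert.h>

/* Hypothetical helper: keep the minimum LOD from exceeding the last
 * mip level, so that min <= max holds once max is clamped as well. */
static float clamp_min_lod(float min_lod, unsigned last_level)
{
    const float limit = (float)last_level;

    return min_lod > limit ? limit : min_lod;
}
```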
* broadcom/vc5: Increase simulator memory for tex-miplevel-selection. | Eric Anholt | 2017-11-20 | 1 | -1/+1
| | | | | We were overflowing, because of all the little 4k allocations for CLs that were getting expanded to 128kb in the simulator due to the GMP alignment.
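The arithmetic behind the overflow can be sketched with a generic round-up helper. This assumes "4k" and "128kb" mean 4 KiB and 128 KiB and that the simulator rounds every allocation up to the alignment; `align_up` is a hypothetical name, not a function from the driver or the simulator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round size up to a multiple of align.
 * align must be a power of two. */
static size_t align_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}
```

Under those assumptions, `align_up(4096, 128 * 1024)` yields 131072, so each small CL allocation occupies 32 times its nominal size in simulator memory.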
* swr/rast: Repair simd8 frontend code rot | Tim Rowley | 2017-11-20 | 1 | -1/+1
| | | | | | Keep non-default simd8 frontend code running for comparison purposes. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Implement AVX-512 GATHERPS in SIMD16 fetch shader | Tim Rowley | 2017-11-20 | 4 | -29/+220
| | | | | | Disabled for now. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Simplify GATHER* jit builder api | Tim Rowley | 2017-11-20 | 4 | -48/+48
| | | | | | | General cleanup, and prep work for possibly moving to llvm masked gather intrinsic. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add alignment to transpose targets | Tim Rowley | 2017-11-20 | 1 | -8/+8
| | | | | | | | Needed to ensure alignment for avx512. Fixes address sanitizer crash. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Cache eventmanager | Tim Rowley | 2017-11-20 | 3 | -0/+9
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Enable AVX-512 targets in the jitter | Tim Rowley | 2017-11-20 | 2 | -10/+0
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Points with clipdistance can't go through simplepoints path | Tim Rowley | 2017-11-20 | 1 | -1/+2
| | | | | | | Fixes piglit glsl-1.20:vs-clip-vertex-primitives and glsl-1.30:vs-clip-distance-primitives. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Code style change (NFC) | Tim Rowley | 2017-11-20 | 1 | -2/+7
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Widen fetch shader to SIMD16 | Tim Rowley | 2017-11-20 | 5 | -3/+151
| | | | | | | Widen fetch shader to SIMD16, enable SIMD16 types in the jitter, and provide EXTRACT/INSERT SIMD8 <-> SIMD16 utility functions. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Support flexible vertex layout for DS output | Tim Rowley | 2017-11-20 | 2 | -0/+3
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* gallium/u_threaded: avoid syncing in threaded_context_flush | Nicolai Hähnle | 2017-11-20 | 3 | -5/+17
| | | | | | | | We could always do the flush asynchronously, but if we're going to wait for a fence anyway and the driver thread is currently idle, the additional communication overhead isn't worth it. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: avoid syncing the driver thread in si_fence_finish | Nicolai Hähnle | 2017-11-20 | 3 | -37/+49
| | | | | | It is really only required when we need to flush for deferred fences. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: recompute the relative timeout after waiting for ready fence | Nicolai Hähnle | 2017-11-20 | 1 | -0/+5
| | | | Reviewed-by: Marek Olšák <[email protected]>
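The commit body gives no detail, so the following is only a generic sketch of what recomputing a relative timeout usually involves, not the radeonsi change itself. It assumes the caller fixed an absolute deadline before the first wait; `remaining_timeout_ns` and its parameters are hypothetical names, and an infinite timeout would need separate handling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: given an absolute deadline and the current time
 * (both in nanoseconds on the same clock), return the relative timeout
 * that is still left, or 0 if the deadline has already passed. */
static uint64_t remaining_timeout_ns(uint64_t abs_deadline_ns, uint64_t now_ns)
{
    return abs_deadline_ns > now_ns ? abs_deadline_ns - now_ns : 0;
}
```

Passing the result to the second wait, instead of the original relative timeout, keeps the total time spent across both waits within the caller's budget.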
* ddebug: fix the hang detection timeout calculation | Nicolai Hähnle | 2017-11-20 | 1 | -2/+2
| | | | | Fixes: c9fefa062b36 ("ddebug: rewrite to always use a threaded approach") Reviewed-by: Marek Olšák <[email protected]>
* ddebug: fix use-after-free of streamout targets | Nicolai Hähnle | 2017-11-20 | 1 | -1/+1
| | | | | Fixes: b47727a83ad6 ("ddebug: implement pipelined hang detection mode") Reviewed-by: Marek Olšák <[email protected]>
* gallium/u_threaded: properly initialize fence unflushed tokens | Nicolai Hähnle | 2017-11-20 | 1 | -2/+1
| | | | | | | This got lost in a rebase but never hurt anything because we happened to always sync in fence_finish anyway... Reviewed-by: Marek Olšák <[email protected]>
* util/u_queue: really use futex-based fences | Nicolai Hähnle | 2017-11-20 | 1 | -1/+1
| | | | | | | The relevant define changed in the final revision of the simple mutex patch. Reviewed-by: Marek Olšák <[email protected]>
* util/u_queue: fix timeout handling in util_queue_fence_wait_timeout | Nicolai Hähnle | 2017-11-20 | 1 | -1/+1
| | | | | Fixes: e3a8013de8ca ("util/u_queue: add util_queue_fence_wait_timeout") Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: use asynchronous flushes in st_finish | Nicolai Hähnle | 2017-11-20 | 1 | -1/+1
| | | | | | | | | With threaded gallium, the driver may currently be running in another thread. In that case, we will execute all remaining commands in that thread instead of syncing, which should be better for cache locality. Reviewed-by: Andres Rodriguez <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: implement st_server_wait_sync properly | Nicolai Hähnle | 2017-11-20 | 1 | -2/+24
| Asynchronous flushes require a proper implementation of st_server_wait_sync, because we could have the following with threaded Gallium:
|
| Context 1 app      Context 1 driver       Context 2
| -------------      ----------------       ---------
| f = glFenceSync
| glFlush
| <-- app sync -->                          <-- app sync -->
|                                           glWaitSync(f)
|                                           .. draw calls ..
|                    pipe_context::flush
|                      for glFenceSync
|                    pipe_context::flush
|                      for glFlush
|
| Reviewed-by: Andres Rodriguez <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
* u_threaded_gallium: remove synchronization in fence_server_sync | Nicolai Hähnle | 2017-11-20 | 3 | -3/+13
| | | | | | | | The whole point of fence_server_sync is that it can be used to avoid waiting in the application thread. Reviewed-by: Andres Rodriguez <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* amd: build addrlib with C++11 | Nicolai Hähnle | 2017-11-20 | 1 | -1/+1
| | | | | | | | | It is required for LLVM anyway. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103658 Fixes: 7f33e94e43a6 ("amd/addrlib: update to latest version") Tested-by: Vinson Lee <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/gfx9: fix VM fault with fetched instance divisors | Nicolai Hähnle | 2017-11-20 | 2 | -5/+12
| | | | | | | | | We need to account for SGPR locations in merged shaders. This case is exercised by KHR-GL45.enhanced_layouts.vertex_attrib_locations Fixes: 79c2e7388c7f ("radeonsi/gfx9: use SPI_SHADER_USER_DATA_COMMON") Reviewed-by: Marek Olšák <[email protected]>
* radv: use a 16 bytes array for the sampled/storage image descriptors | Samuel Pitoiset | 2017-11-20 | 3 | -12/+8
| | | | | | | This allows updating them with a single memcpy(). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: do not add the query pool BO to the list in vkCmdEndQuery() | Samuel Pitoiset | 2017-11-20 | 1 | -1/+3
| | | | | | | | | As per the spec, the query identified by queryPool and query must currently be active. Applications have to call vkCmdBeginQuery() before, and thus the query pool BO will already be in the list. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: only load needed depth clear regs for fast depth clears | Samuel Pitoiset | 2017-11-20 | 1 | -2/+12
| | | | | | | | | Similar to how the driver sets the depth clear regs after a fast depth clear. Most of the time, this will copy a 32-bit reg instead of a 64-bit reg. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: do not add the image BO in radv_set_depth_clear_regs() | Samuel Pitoiset | 2017-11-20 | 1 | -2/+0
| | | | | | | | | | For the fast path, radv_fill_buffer() ensures that the BO is already in the list. For the slow path, the depth surface is part of the framebuffer which means the BO is added to the list when the framebuffer is emitted. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove useless assertion in emit_depthstencil_clear() | Samuel Pitoiset | 2017-11-20 | 1 | -4/+0
| | | | | | | Already checked in emit_clear(). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove useless check in radv_set_depth_clear_regs() | Samuel Pitoiset | 2017-11-20 | 1 | -1/+1
| | | | | | | | aspects can't be zero and there is an assertion that ensures it's not in emit_clear(). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* docs/features: mark some r600 extensions supported | Dave Airlie | 2017-11-20 | 1 | -2/+2
| | | | | | These just looked to be missed when this file was updated. Signed-off-by: Dave Airlie <[email protected]>
* glsl: Catch subscripted calls to undeclared subroutines | George Barrett | 2017-11-20 | 1 | -2/+7
| | | | | | | | | | generate_array_index fails to check whether the target of a subroutine call exists in the AST, potentially passing around null ir_rvalue pointers eventuating in abort/segfault. Fixes: fd01840c0bd3 ("glsl: add AoA support to subroutines") Reviewed-by: Timothy Arceri <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100438
* broadcom/vc5: Fix up integer texture handling. | Eric Anholt | 2017-11-19 | 2 | -27/+51
| | | | | | | | The original spec I had didn't expose integer textures and suggested that you use unfiltered floats. Now there are proper formats for them. Fixes 16- and 32-bit texwrap integer tests in piglit, and dEQP-GLES3.functional.fbo.completeness.renderable.renderbuffer.color0.rgb10_a2ui.
* broadcom/vc5: Fix simulator assertion failures about color RT clears. | Eric Anholt | 2017-11-19 | 1 | -2/+19
| | | | | | When we tried to clear color while storing depth, the simulator hit an assertion failure because it did not have enough information to decide which color RT to clear. It turns out that STORE_GENERAL picks the buffer according to the color buffer being stored, or all of them if NONE. If you're storing depth, it doesn't know which to pick.
* freedreno/ir3: add texture gather support | Rob Clark | 2017-11-18 | 3 | -3/+18
| | | | Signed-off-by: Rob Clark <[email protected]>
* etnaviv: enable full overwrite when no color buffer is present | Lucas Stach | 2017-11-18 | 2 | -3/+3
| | | | | | | | The OVERWRITE bit disables destination fetches, which is exactly what we want when there is no valid color buffer bound. Signed-off-by: Lucas Stach <[email protected]> Reviewed-by: Wladimir J. van der Laan <[email protected]>
* i965: Stop including brw_cfg.h in brw_disasm_info.h | Jason Ekstrand | 2017-11-17 | 1 | -1/+5
| | | | | | | | | | | | The brw_disasm_info header is included by certain tools in order to get shader assembly from binaries so it's a semi-external header. Including brw_cfg.h also pulls in brw_shader.h so you end up getting quite a bit of our back-end compiler internals. Instead, make the couple of forward declarations we need and make the header more stand-alone. This fixes the meson build. Reviewed-by: Matt Turner <[email protected]> Fixes: 4f82b17287194ca7d10816f6cfe4712a3e0a03fc
* i965: Mark BOs as external when we export their handle | Jason Ekstrand | 2017-11-17 | 3 | -1/+11
| Almost all of our BO export paths were already properly marking the BO as external and adding it to the handle table. Most export use-cases go through a prime fd or flink where we have a brw_bo export helper that does the right thing. The one missing case happens when you call queryImage and ask for __DRI_IMAGE_ATTRIB_HANDLE. We just grabbed the gem handle out of the BO (because it's really easy to do that) and handed it off to the client; what could go wrong?
|
| As it turns out, this path is used by basically every compositor that wants to turn around and call drmModeAddFB2 on it so it can hand it off to display. The result, as of 4b1e70cc57d7ff5f465544644b2180dee1490cee, is that we no longer set MOCS_PTE on those surfaces, the kernel's attempts to disable caching fail, and scanout gets corrupted.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103759
| Fixes: 4b1e70cc57d7ff5f465544644b2180dee1490cee
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: [email protected]
* i965/bufmgr: Add a helper to mark a BO as external | Jason Ekstrand | 2017-11-17 | 1 | -6/+11
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Cc: [email protected]
* i965: Correct disasm_info usage in eu_validate test | Andres Gomez | 2017-11-18 | 1 | -6/+6
| | | | | | | | Fixes: 4f82b1728719 ("i965: Rewrite disassembly annotation code") Cc: Matt Turner <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* broadcom/vc5: Set up the padded height at surface creation time. | Eric Anholt | 2017-11-17 | 3 | -16/+15
| | | | | This centralizes the calculation in the surface, instead of in each load/store.
* broadcom/vc5: Ensure that there is always a TLB write. | Eric Anholt | 2017-11-17 | 1 | -1/+17
| | | | | This should fix some GPU hangs in our (currently always single-threaded) fragment shaders, and definitely fixes assertion failures in simulation.
* broadcom/vc5: Fix clear color for swap_color_rb render targets. | Eric Anholt | 2017-11-17 | 1 | -0/+9
| | | | Fixes dEQP-GLES3.functional.depth_stencil_clear.depth.*
* broadcom/vc5: Fix pasteo in front stencil ref value setup. | Eric Anholt | 2017-11-17 | 1 | -1/+1
| | | | Fixes piglit masked-clear.