summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: use compute shaders for clear_buffer & copy_bufferMarek Olšák2018-10-168-203/+350
| | | | | Fast color clears should be much faster. Also, fast color clears on evicted buffers should be 200x faster on GFX8 and older.
* radeonsi: use copy_buffer in buffer_do_flush_region directlyMarek Olšák2018-10-161-11/+4
|
* radeonsi: use faster integer division for instance divisorsMarek Olšák2018-10-163-36/+83
| | | | | | | | | | We know the divisors when we upload them, so instead we can precompute and upload division factors derived from each divisor. This fast division consists of add, mul_hi, and two shifts, and we have to load 4 dwords intead of 1. This probably won't affect any apps.
* ac: add helpers for fast integer division by a constantMarek Olšák2018-10-162-0/+78
|
* radeonsi: use higher subpixel precision (QUANT_MODE) for smaller viewportsMarek Olšák2018-10-163-9/+53
|
* radeonsi: move emission of PA_SU_VTX_CNTL into emit_guardbandMarek Olšák2018-10-164-6/+11
| | | | | We'll modify the quant mode there, which also affects the guarband computation.
* radeonsi: don't re-upload the sample position constant buffer repeatedlyMarek Olšák2018-10-164-16/+33
|
* radeonsi: set PA_SU_PRIM_FILTER_CNTL optimallyMarek Olšák2018-10-163-4/+13
|
* radeonsi: center viewport to improve guardband clipping for high resolutionsMarek Olšák2018-10-164-14/+62
| | | | | | This will be more useful when we change the quant mode to increase subpixel precision and decrease the viewport range (which might not be possible if the viewport is not centered in the viewport range).
* radeonsi: save raster config in screen, add se_tile_repeatMarek Olšák2018-10-166-11/+31
|
* radeonsi: switch back to standard DX sample positionsMarek Olšák2018-10-161-17/+26
| | | | Apps may rely on them.
* radeonsi: add GDS support to CP DMAMarek Olšák2018-10-163-21/+89
|
* radeonsi: rename si_gfx_* functions to si_cp_*Marek Olšák2018-10-166-59/+60
| | | | and write_event_eop -> release_mem
* radeonsi: make si_gfx_write_event_eop more configurableMarek Olšák2018-10-166-15/+34
|
* anv/skylake: disable ForceThreadDispatchEnableSergii Romantsov2018-10-161-7/+35
| | | | | | | | | | | | | | | | On Skylake enabling of ForceThreadDispatchEnable causes gpu-hang. -v2: enabling of ForceThreadDispatchEnable is only for gen8, for gen9 and higher reverted enabling of PixelShaderHasUAV. -v3 (Jason Ekstrand): Rework the comments a bit. CC: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760 Fixes: 79270d2140ec (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV) Signed-off-by: Sergii Romantsov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Implement VK_EXT_pci_bus_infoLionel Landwerlin2018-10-163-5/+26
| | | | | | | | Even though the Intel GPU are always at the same PCI location, all the info we need is already provided by libdrm. Let's be future proof. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* radv: disable VK_SUBGROUP_FEATURE_VOTE_BITSamuel Pitoiset2018-10-161-2/+4
| | | | | | | | | | | This feature isn't used for now, so disable it until wwm is fixed in LLVM. Fixes dEQP-VK.subgroups.vote.graphics.subgroupallequal* https://bugs.freedesktop.org/show_bug.cgi?id=108115 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: implement buffer to image operations for R32G32B32Samuel Pitoiset2018-10-163-2/+353
| | | | | | | | | | This should fix rendering issues with Batman Arkham City. We will probably need to implement itob and itoi at some point, but currently nothing hits these paths. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107765 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: Use context-specific LLVM typesAlex Smith2018-10-161-2/+2
| | | | | | | | | | | | | | | LLVMInt*Type() return types from the global context and therefore are not safe for use in other contexts. Use types from our own context instead. Fixes frequent crashes seen when doing multithreaded pipeline creation. Fixes: 4d0b02bb5a "ac: add support for 16bit load_push_constant" Fixes: 7e7ee82698 "ac: add support for 16bit buffer loads" Cc: "18.2" <[email protected]> Signed-off-by: Alex Smith <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* glsl: Check the subroutine associated functions namesVadym Shovkoplias2018-10-161-0/+36
| | | | | | | | | | | | | | | | | | | Adding compile time check for subroutine functions with the same names. Similar check for intrastage linking was already landed in commit 5f0567a4f60. From Section 6.1.2 (Subroutines) of the GLSL 4.00 specification "A program will fail to compile or link if any shader or stage contains two or more functions with the same name if the name is associated with a subroutine type." Fixes: * no-overloads.vert Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108109 Signed-off-by: Vadym Shovkoplias <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* glsl/linker: Change the format of spec quotationVadym Shovkoplias2018-10-161-6/+5
| | | | | | | | Also there is no "OpenGL ES Shading Language 4.00" spec, so change it to GLSL 4.00 spec. Signed-off-by: Vadym Shovkoplias <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* nir: fix clip cull lowering to not assert if GLSL already lowered.Dave Airlie2018-10-151-0/+6
| | | | | | If GLSL has already done the lowering, we'd rather not crash in this pass. Reviewed-by: Kenneth Graunke <[email protected]>
* intel: disable FS IR validation in release mode.Kenneth Graunke2018-10-151-0/+2
| | | | | | We probably don't need to iterate, fprintf, and abort in release mode. Reviewed-by: Matt Turner <[email protected]>
* nir: Copy propagation between blocksCaio Marcelo de Oliveira Filho2018-10-152-82/+351
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the pass to propagate the copies information along the control flow graph. It performs two walks, first it collects the vars that were written inside each node. Then it walks applying the copy propagation using a list of copies previously available. At each node the list is invalidated according to results from the first walk. This approach is simpler than a full data-flow analysis, but covers various cases. If derefs are used for operating on more memory resources (e.g. SSBOs), the difference from a regular pass is expected to be more visible -- as the SSA copy propagation pass won't apply to those. A full data-flow analysis would handle more scenarios: conditional breaks in the control flow and merge equivalent effects from multiple branches (e.g. using a phi node to merge the source for writes to the same deref). However, as previous commentary in the code stated, its complexity 'rapidly get out of hand'. The current patch is a good intermediate step towards more complex analysis. The 'copies' linked list was modified to use util_dynarray to make it more convenient to clone it (to handle ifs/loops). Annotated shader-db results for Skylake: total instructions in shared programs: 15105796 -> 15105451 (<.01%) instructions in affected programs: 152293 -> 151948 (-0.23%) helped: 96 HURT: 17 All the HURTs and many HELPs are one instruction. Looking at pass by pass outputs, the copy prop kicks in removing a bunch of loads correctly, which ends up altering what other other optimizations kick. In those cases the copies would be propagated after lowering to SSA. In few HELPs we are actually helping doing more than was possible previously, e.g. consolidating load_uniforms from different blocks. Most of those are from shaders/dolphin/ubershaders/. total cycles in shared programs: 566048861 -> 565954876 (-0.02%) cycles in affected programs: 151461830 -> 151367845 (-0.06%) helped: 2933 HURT: 2950 A lot of noise on both sides. total loops in shared programs: 4603 -> 4603 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 11085 -> 11073 (-0.11%) spills in affected programs: 23 -> 11 (-52.17%) helped: 1 HURT: 0 The shaders/dolphin/ubershaders/12.shader_test was able to pull a couple of loads from inside if statements and reuse them. total fills in shared programs: 23143 -> 23089 (-0.23%) fills in affected programs: 2718 -> 2664 (-1.99%) helped: 27 HURT: 0 All from shaders/dolphin/ubershaders/. LOST: 0 GAINED: 0 The other generations follow the same overall shape. The spills and fills HURTs are all from the same game. shader-db results for Broadwell. total instructions in shared programs: 15402037 -> 15401841 (<.01%) instructions in affected programs: 144386 -> 144190 (-0.14%) helped: 86 HURT: 9 total cycles in shared programs: 600912755 -> 600902486 (<.01%) cycles in affected programs: 185662820 -> 185652551 (<.01%) helped: 2598 HURT: 3053 total loops in shared programs: 4579 -> 4579 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 80929 -> 80924 (<.01%) spills in affected programs: 720 -> 715 (-0.69%) helped: 1 HURT: 5 total fills in shared programs: 93057 -> 93013 (-0.05%) fills in affected programs: 3398 -> 3354 (-1.29%) helped: 27 HURT: 5 LOST: 0 GAINED: 2 shader-db results for Haswell: total instructions in shared programs: 9231975 -> 9230357 (-0.02%) instructions in affected programs: 44992 -> 43374 (-3.60%) helped: 27 HURT: 69 total cycles in shared programs: 87760587 -> 87727502 (-0.04%) cycles in affected programs: 7720673 -> 7687588 (-0.43%) helped: 1609 HURT: 1416 total loops in shared programs: 1830 -> 1830 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 1988 -> 1692 (-14.89%) spills in affected programs: 296 -> 0 helped: 1 HURT: 0 total fills in shared programs: 2103 -> 1668 (-20.68%) fills in affected programs: 438 -> 3 (-99.32%) helped: 4 HURT: 0 LOST: 0 GAINED: 1 v2: Remove the DISABLE prefix from tests we now pass. v3: Add comments about missing write_mask handling. (Caio) Add unreachable when switching on cf_node type. (Jason) Properly merge the component information in written map instead of replacing. (Jason) Explain how removal from written arrays works. (Jason) Use mode directly from deref instead of getting the var. (Jason) v4: Register the local written mode for calls. (Jason) Prefer cf_node instead of node. (Jason) Clarify that remove inside iteration only works in backward iterations. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Take call instruction into account in copy_prop_varsCaio Marcelo de Oliveira Filho2018-10-151-6/+12
| | | | | | | | | | | | | | | | | Calls are not used yet (functions are inlined), but since new code is already taking them into account, do it here too. The convention here and in other places is that no writable memory is assumed to remain unchanged, as well as global variables. Also, explicitly state the modes affected (instead of using the reverse logic) in one of the apply_for_barrier_modes calls. Suggested by Jason. v2: Consider local vars used by a call to be conservative, SPIR-V has such cases. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add tests for copy propagation of derefsCaio Marcelo de Oliveira Filho2018-10-151-0/+300
| | | | | | | | | | | | Also tests for removal of redundant loads, that we currently handle as part of the copy propagation. Note some tests involve multiple blocks and are currently DISABLED because they (expectedly) fail. v2: Add missing DISABLED prefix to "multi block" tests. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Remove handling of dead writes from copy_prop_varsCaio Marcelo de Oliveira Filho2018-10-151-76/+8
| | | | | | These are covered by another pass now. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/nir, freedreno/ir3: Use the separated dead write vars passCaio Marcelo de Oliveira Filho2018-10-152-0/+2
| | | | | | | No changes to shader-db for intel. No changes to shader-db expected for freedreno. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Separate dead write removal into its own passCaio Marcelo de Oliveira Filho2018-10-155-3/+227
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of doing this as part of the existing copy_prop_vars pass. Separation makes easier to expand the scope of both passes to be more than per-block. For copy propagation, the information about valid copies comes from previous instructions; while the dead write removal depends on information from later instructions ("have any instruction used this deref before overwrite it?"). Also change the tests to use this pass (instead of copy prop vars). Note that the disabled tests continue to fail, since the standalone pass is still per-block. v2: Remove entries from dynarray instead of marking items as deleted. Use foreach_reverse. (Caio) (all from Jason) Do not cache nir_deref_path. Not worthy for this patch. Clear unused writes when hitting a call instruction. Clean up enumeration of modes for barriers. Move metadata calls to the inner function. v3: For copies, use the vector length to calculate the mask. (all from Jason) Use nir_component_mask_t when applicable. Rename functions for clarity. Consider local vars used by a call to be conservative (SPIR-V has such cases). Comment and assert the assumption that stores and copies are always to a deref that ends with a vector or scalar. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add tests for dead write eliminationCaio Marcelo de Oliveira Filho2018-10-151-0/+241
| | | | | | | | | | | Note at the moment the pass called is nir_opt_copy_prop_vars, because dead write elimination is implemented there. Also added tests that involve identifying dead writes in multiple blocks (e.g. the overwrite happens in another block). Those currently fail as expected, so are marked to be skipped. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add test file for vars related passesCaio Marcelo de Oliveira Filho2018-10-153-11/+224
| | | | | | | | | | | | Add basic helpers for doing tests on the vars related optimization passes. The main goal is to lower the barrier to create tests during development and debugging of the passes. Full coverage is not a requirement. v2: Make find_next_intrinsic() skip blocks before 'after'. (Jason) Move nir_imm_ivec2() to nir_builder.h. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add nir_imm_ivec2 helperCaio Marcelo de Oliveira Filho2018-10-151-0/+12
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* util: Add foreach_reverse for dynarrayCaio Marcelo de Oliveira Filho2018-10-151-0/+6
| | | | | | | | | Useful to walk the array removing elements by swapping them with the last element. v2: Change iteration to make sure we never underflow. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* v3d: Add support for hardware pack/unpack of half floats.Eric Anholt2018-10-152-0/+17
| | | | | Cuts the formerly 7-minute simulation time of fs-packHalf2x16.shader_test in half.
* nir: Expose nir_remove_unused_io_vars().Eric Anholt2018-10-152-8/+27
| | | | | | | | | | | | | | For gallium drivers where you want to do some linking at variant compile time, you don't have the other producer/consumer shader on hand to modify. By exposing the inner function, the driver can have the used varyings in the compiled shader cache key and still do linking. This is also useful for V3D, where the binning shader wants to only output position and TF varyings. We've been removing those after nir_lower_io, but this will be less driver-specific code and let more of the shader get DCEed early in NIR. Reviewed-by: Timothy Arceri <[email protected]>
* nir: Be sure to fix deref modes after demoting shader i/o vars to global.Eric Anholt2018-10-151-0/+3
| | | | | | | Fixes assertion failures when calling nir_remove_unused_varyings() or nir_remove_unused_io_vars(). Reviewed-by: Timothy Arceri <[email protected]>
* gallium/ttn: Convert inputs and outputs to derefs of variables.Eric Anholt2018-10-154-69/+64
| | | | | | | | | | | This means that TTN shaders more closely resemble GTN shaders: they have inputs and outputs as variable derefs, with the variables having their .driver_location already set up for you. This will be useful for v3d to do input variable DCE in NIR, which we can't do when the TTN shaders never have a pre-nir_lower_io stage. Acked-by: Rob Clark <[email protected]>
* gallium/ttn: Fix the type of gl_FragDepth.Eric Anholt2018-10-151-0/+1
| | | | | | | | In TGSI we have a vec4 of which only .z is used, but for NIR we should be using a float the same as other NIR IR. We were already moving TGSI's .z to the .x channel. Acked-by: Rob Clark <[email protected]>
* freedreno/a6xx: Enable blitterKristian H. Kristensen2018-10-155-0/+623
| | | | | Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/a6xx: Update headersKristian H. Kristensen2018-10-151-16/+30
| | | | | Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/a6xx: Remove unnecessary GRAS_2D_BLIT_INFO writeKristian H. Kristensen2018-10-151-2/+0
| | | | | Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* anv: Don't advertise ASTC support on BSWJason Ekstrand2018-10-151-0/+8
| | | | | Tested-by: Mark Janes <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* radv: do not force the flat qualifier for clip/cull distancesSamuel Pitoiset2018-10-151-2/+2
| | | | | | | | | | This fixes some new CTS that reads clip/cull distances from the fragment shader stage: dEQP-VK.clipping.user_defined.clip_* Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: bump discreteQueuePriorities to 2Samuel Pitoiset2018-10-151-1/+1
| | | | | | | | | It's the minimum value required by the spec. This fixes dEQP-VK.api.info.device.properties. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* anv: Split dispatch tables into device and instanceJason Ekstrand2018-10-153-91/+230
| | | | | | | | | | | | | | | | | | | | There's no reason why we need generate trampoline functions for instance functions or carry N copies of the instance dispatch table around for every hardware generation. Splitting the tables and being more conservative shaves about 34K off .text and about 4K off .data when built with clang. Before splitting dispatch tables: text data bss dec hex filename 3224305 286216 8960 3519481 35b3f9 _install/lib64/libvulkan_intel.so After splitting dispatch tables: text data bss dec hex filename 3190325 282232 8960 3481517 351fad _install/lib64/libvulkan_intel.so Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: Drop assert about number of uniforms in ARB handling.Kenneth Graunke2018-10-151-3/+2
| | | | | | | | | | | My recent prog_to_nir patch started making new sampler uniforms, which apparently increased the number of parameters. We used to poke at the one parameter directly, making it important that there was only one, but we haven't done that in a while. It should be safe to just delete the assertion. Fixes: 1c0f92d8a8c "nir: Create sampler variables in prog_to_nir." Reviewed-by: Jason Ekstrand <[email protected]>
* radv: Implement VK_EXT_pci_bus_info.Bas Nieuwenhuizen2018-10-153-0/+13
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* gallium/u_transfer_helper: Add support for separate Z24/S8 as well.Kenneth Graunke2018-10-145-22/+60
| | | | | | | | | | | | | | | | u_transfer_helper already had code to handle treating packed Z32_S8 as separate Z32_FLOAT and S8_UINT resources, since some drivers can't handle that interleaved format natively. Other hardware needs depth and stencil as separate resources for all formats. For example, V3D3 needs this for 24-bit depth as well. This patch adds a new flag to lower all depth/stencils formats, and implements support for Z24_UNORM_S8_UINT. (S8_UINT_Z24_UNORM is left as an exercise to the reader, preferably someone who has access to a machine that uses that format.) Reviewed-by: Eric Anholt <[email protected]>
* gallium/format: Add a helper to combine separate Z24 and S8 stencil.Kenneth Graunke2018-10-142-0/+22
| | | | | | | This new function takes separate Z24 depth and S8 stencil sources, and packs them into a single combined Z24S8 buffer. Reviewed-by: Eric Anholt <[email protected]>
* gallium/auxiliary: Add util_format_get_depth_only() helper.Kenneth Graunke2018-10-141-0/+21
| | | | | | | This will be used by u_transfer_helper.c shortly, in order to split packed depth-stencil into separate resources. Reviewed-by: Eric Anholt <[email protected]>