Commit message | Author | Age | Files | Lines
* freedreno: allocate batches from the cache in launch_grid | Hyunjun Ko | 2018-10-17 | 1 | -1/+2
| | | | | | | | | | launch_grid needs to allocate its batch from the cache so that the batch gets a valid index and resource dependency tracking works correctly. In addition, this fixes an assertion on debug builds that has triggered since commit 1a40faa8 landed. Signed-off-by: Rob Clark <[email protected]>
* freedreno: adds nondraw param to fd_bc_alloc_batch | Hyunjun Ko | 2018-10-17 | 4 | -6/+6
| | | | | | | | Callers need to be able to specify nondraw when creating a batch through fd_bc_alloc_batch, since it is better to create batches through it than through fd_batch_create. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: remove fd6_emit_render_cntl() | Rob Clark | 2018-10-17 | 2 | -34/+0
| | | | | | It was dead code carried over from a5xx Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix broken texcoord inputs | Rob Clark | 2018-10-17 | 1 | -21/+1
| | | | | | | | TODO not sure if this is best solution, but current logic is broken for texcoord inputs. It is definitely the simplest solution. Fixes: 1a24f519663 freedreno/ir3: ignore unused inputs Signed-off-by: Rob Clark <[email protected]>
* freedreno: fix off-by-one error in BEGIN_RING() | Rob Clark | 2018-10-17 | 1 | -1/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* util: document a limitation of util_fast_udiv32 | Marek Olšák | 2018-10-17 | 1 | -1/+7
| | | | trivial
* i965/fs: Add 64-bit int immediate support to dump_instructions() | Matt Turner | 2018-10-16 | 2 | -0/+8
| | | | | Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* radeonsi: track context rolls better for the Vega scissor bug workaround | Marek Olšák | 2018-10-16 | 7 | -34/+80
| | | | | | We should get fewer context rolls with the SET_CONTEXT_REG optimization, but it would have been for nothing if the scissor state rolled the context anyway. Don't emit the scissor state if there is no context roll.
* radeonsi: emit sample locations for 1xAA only when the hw bug is present | Marek Olšák | 2018-10-16 | 1 | -4/+2
|
* radeonsi: use compute shaders for clear_buffer & copy_buffer | Marek Olšák | 2018-10-16 | 8 | -203/+350
| | | | | Fast color clears should be much faster. Also, fast color clears on evicted buffers should be 200x faster on GFX8 and older.
* radeonsi: use copy_buffer in buffer_do_flush_region directly | Marek Olšák | 2018-10-16 | 1 | -11/+4
|
* radeonsi: use faster integer division for instance divisors | Marek Olšák | 2018-10-16 | 3 | -36/+83
| | | | | | | | | | We know the divisors when we upload them, so instead we can precompute and upload division factors derived from each divisor. This fast division consists of add, mul_hi, and two shifts, and we have to load 4 dwords instead of 1. This probably won't affect any apps.
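Illustration for the commit above: the idea of trading a division for a precomputed multiply and shift can be sketched in a deliberately simplified form. This is not the code from the commit. The names `div_factor`, `compute_div_factor` and `fast_udiv16` are hypothetical, and the sketch assumes dividends and divisors below 2^16 so that one 64-bit multiply is enough, whereas the commit describes an add, a mul_hi and two shifts for full 32-bit values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical precomputed factor for one divisor (not Mesa's layout). */
struct div_factor {
    uint64_t multiplier; /* floor(2^32 / d) + 1 */
};

/* Precompute once, when the divisor becomes known (e.g. at upload time). */
static struct div_factor compute_div_factor(uint16_t d)
{
    assert(d != 0);
    struct div_factor f = { (UINT64_C(1) << 32) / d + 1 };
    return f;
}

/* n / d for 16-bit n and d, using only a multiply and a shift.
 * The rounding error of the multiplier is at most d / 2^32 per unit of n,
 * which stays too small to change the quotient while n < 2^16. */
static uint32_t fast_udiv16(uint16_t n, struct div_factor f)
{
    return (uint32_t)(((uint64_t)n * f.multiplier) >> 32);
}
```

The restriction to 16-bit operands is what keeps the sketch short; covering all 32-bit dividends is the reason the real scheme needs the extra add and shifts.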
* ac: add helpers for fast integer division by a constant | Marek Olšák | 2018-10-16 | 2 | -0/+78
|
* radeonsi: use higher subpixel precision (QUANT_MODE) for smaller viewports | Marek Olšák | 2018-10-16 | 3 | -9/+53
|
* radeonsi: move emission of PA_SU_VTX_CNTL into emit_guardband | Marek Olšák | 2018-10-16 | 4 | -6/+11
| | | | | We'll modify the quant mode there, which also affects the guardband computation.
* radeonsi: don't re-upload the sample position constant buffer repeatedly | Marek Olšák | 2018-10-16 | 4 | -16/+33
|
* radeonsi: set PA_SU_PRIM_FILTER_CNTL optimally | Marek Olšák | 2018-10-16 | 3 | -4/+13
|
* radeonsi: center viewport to improve guardband clipping for high resolutions | Marek Olšák | 2018-10-16 | 4 | -14/+62
| | | | | | This will be more useful when we change the quant mode to increase subpixel precision and decrease the viewport range (which might not be possible if the viewport is not centered in the viewport range).
* radeonsi: save raster config in screen, add se_tile_repeat | Marek Olšák | 2018-10-16 | 6 | -11/+31
|
* radeonsi: switch back to standard DX sample positions | Marek Olšák | 2018-10-16 | 1 | -17/+26
| | | | Apps may rely on them.
* radeonsi: add GDS support to CP DMA | Marek Olšák | 2018-10-16 | 3 | -21/+89
|
* radeonsi: rename si_gfx_* functions to si_cp_* | Marek Olšák | 2018-10-16 | 6 | -59/+60
| | | | and write_event_eop -> release_mem
* radeonsi: make si_gfx_write_event_eop more configurable | Marek Olšák | 2018-10-16 | 6 | -15/+34
|
* anv/skylake: disable ForceThreadDispatchEnable | Sergii Romantsov | 2018-10-16 | 1 | -7/+35
| | | | | | | | | | | | | | | | On Skylake enabling of ForceThreadDispatchEnable causes gpu-hang. -v2: enabling of ForceThreadDispatchEnable is only for gen8, for gen9 and higher reverted enabling of PixelShaderHasUAV. -v3 (Jason Ekstrand): Rework the comments a bit. CC: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760 Fixes: 79270d2140ec (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV) Signed-off-by: Sergii Romantsov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Implement VK_EXT_pci_bus_info | Lionel Landwerlin | 2018-10-16 | 3 | -5/+26
| | | | | | | | Even though Intel GPUs are always at the same PCI location, all the info we need is already provided by libdrm. Let's be future proof. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* appveyor: Cache pip's cache files. | Jose Fonseca | 2018-10-16 | 1 | -0/+2
| | | | | | It should speed up the Python packages installation. Reviewed-by: Roland Scheidegger <[email protected]>
* appveyor: Update to newer Mako/winflexbison versions. | Jose Fonseca | 2018-10-16 | 1 | -4/+5
| | | | | | As that's what most people are bound to use. Reviewed-by: Roland Scheidegger <[email protected]>
* appveyor: Update to MSVC 2017. | Jose Fonseca | 2018-10-16 | 1 | -6/+6
| | | | | | That's what we (and I suppose most people out there) are using now. Reviewed-by: Roland Scheidegger <[email protected]>
* radv: disable VK_SUBGROUP_FEATURE_VOTE_BIT | Samuel Pitoiset | 2018-10-16 | 1 | -2/+4
| | | | | | | | | | | This feature isn't used for now, so disable it until wwm is fixed in LLVM. Fixes dEQP-VK.subgroups.vote.graphics.subgroupallequal* https://bugs.freedesktop.org/show_bug.cgi?id=108115 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: implement buffer to image operations for R32G32B32 | Samuel Pitoiset | 2018-10-16 | 3 | -2/+353
| | | | | | | | | | This should fix rendering issues with Batman Arkham City. We will probably need to implement itob and itoi at some point, but currently nothing hits these paths. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107765 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: Use context-specific LLVM types | Alex Smith | 2018-10-16 | 1 | -2/+2
| | | | | | | | | | | | | | | LLVMInt*Type() return types from the global context and therefore are not safe for use in other contexts. Use types from our own context instead. Fixes frequent crashes seen when doing multithreaded pipeline creation. Fixes: 4d0b02bb5a "ac: add support for 16bit load_push_constant" Fixes: 7e7ee82698 "ac: add support for 16bit buffer loads" Cc: "18.2" <[email protected]> Signed-off-by: Alex Smith <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* glsl: Check the subroutine associated functions names | Vadym Shovkoplias | 2018-10-16 | 1 | -0/+36
| | | | | | | | | | | | | | | | | | | Add a compile-time check for subroutine functions with the same names. A similar check for intrastage linking already landed in commit 5f0567a4f60. From Section 6.1.2 (Subroutines) of the GLSL 4.00 specification: "A program will fail to compile or link if any shader or stage contains two or more functions with the same name if the name is associated with a subroutine type." Fixes: * no-overloads.vert Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108109 Signed-off-by: Vadym Shovkoplias <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* glsl/linker: Change the format of spec quotation | Vadym Shovkoplias | 2018-10-16 | 1 | -6/+5
| | | | | | | | Also there is no "OpenGL ES Shading Language 4.00" spec, so change it to GLSL 4.00 spec. Signed-off-by: Vadym Shovkoplias <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* nir: fix clip cull lowering to not assert if GLSL already lowered. | Dave Airlie | 2018-10-15 | 1 | -0/+6
| | | | | | If GLSL has already done the lowering, we'd rather not crash in this pass. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add PCI IDs for new Amberlake parts that are Coffeelake based | Kenneth Graunke | 2018-10-15 | 1 | -2/+3
| | | | | | | | See commit c0c46ca461f136a0ae1ed69da6c874e850aeeb53 in the Linux kernel, where José Roberto de Souza added this new PCI ID there. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Rodrigo Vivi <[email protected]>
* intel: disable FS IR validation in release mode. | Kenneth Graunke | 2018-10-15 | 1 | -0/+2
| | | | | | We probably don't need to iterate, fprintf, and abort in release mode. Reviewed-by: Matt Turner <[email protected]>
* nir: Copy propagation between blocks | Caio Marcelo de Oliveira Filho | 2018-10-15 | 2 | -82/+351
| Extend the pass to propagate the copies information along the control flow graph. It performs two walks: first it collects the vars that were written inside each node, then it walks applying the copy propagation using a list of copies previously available. At each node the list is invalidated according to results from the first walk.
|
| This approach is simpler than a full data-flow analysis, but covers various cases. If derefs are used for operating on more memory resources (e.g. SSBOs), the difference from a regular pass is expected to be more visible, as the SSA copy propagation pass won't apply to those.
|
| A full data-flow analysis would handle more scenarios: conditional breaks in the control flow and merging equivalent effects from multiple branches (e.g. using a phi node to merge the source for writes to the same deref). However, as previous commentary in the code stated, its complexity 'rapidly get out of hand'. The current patch is a good intermediate step towards more complex analysis.
|
| The 'copies' linked list was modified to use util_dynarray to make it more convenient to clone it (to handle ifs/loops).
|
| Annotated shader-db results for Skylake:
|
|   total instructions in shared programs: 15105796 -> 15105451 (<.01%)
|   instructions in affected programs: 152293 -> 151948 (-0.23%)
|   helped: 96 HURT: 17
|
|     All the HURTs and many HELPs are one instruction. Looking at pass by pass outputs, the copy prop kicks in removing a bunch of loads correctly, which ends up altering which other optimizations kick in. In those cases the copies would be propagated after lowering to SSA. In a few HELPs we are actually doing more than was possible previously, e.g. consolidating load_uniforms from different blocks. Most of those are from shaders/dolphin/ubershaders/.
|
|   total cycles in shared programs: 566048861 -> 565954876 (-0.02%)
|   cycles in affected programs: 151461830 -> 151367845 (-0.06%)
|   helped: 2933 HURT: 2950
|
|     A lot of noise on both sides.
|
|   total loops in shared programs: 4603 -> 4603 (0.00%)
|   loops in affected programs: 0 -> 0
|   helped: 0 HURT: 0
|
|   total spills in shared programs: 11085 -> 11073 (-0.11%)
|   spills in affected programs: 23 -> 11 (-52.17%)
|   helped: 1 HURT: 0
|
|     The shaders/dolphin/ubershaders/12.shader_test was able to pull a couple of loads from inside if statements and reuse them.
|
|   total fills in shared programs: 23143 -> 23089 (-0.23%)
|   fills in affected programs: 2718 -> 2664 (-1.99%)
|   helped: 27 HURT: 0
|
|     All from shaders/dolphin/ubershaders/.
|
|   LOST: 0 GAINED: 0
|
| The other generations follow the same overall shape. The spills and fills HURTs are all from the same game.
|
| shader-db results for Broadwell:
|
|   total instructions in shared programs: 15402037 -> 15401841 (<.01%)
|   instructions in affected programs: 144386 -> 144190 (-0.14%)
|   helped: 86 HURT: 9
|
|   total cycles in shared programs: 600912755 -> 600902486 (<.01%)
|   cycles in affected programs: 185662820 -> 185652551 (<.01%)
|   helped: 2598 HURT: 3053
|
|   total loops in shared programs: 4579 -> 4579 (0.00%)
|   loops in affected programs: 0 -> 0
|   helped: 0 HURT: 0
|
|   total spills in shared programs: 80929 -> 80924 (<.01%)
|   spills in affected programs: 720 -> 715 (-0.69%)
|   helped: 1 HURT: 5
|
|   total fills in shared programs: 93057 -> 93013 (-0.05%)
|   fills in affected programs: 3398 -> 3354 (-1.29%)
|   helped: 27 HURT: 5
|
|   LOST: 0 GAINED: 2
|
| shader-db results for Haswell:
|
|   total instructions in shared programs: 9231975 -> 9230357 (-0.02%)
|   instructions in affected programs: 44992 -> 43374 (-3.60%)
|   helped: 27 HURT: 69
|
|   total cycles in shared programs: 87760587 -> 87727502 (-0.04%)
|   cycles in affected programs: 7720673 -> 7687588 (-0.43%)
|   helped: 1609 HURT: 1416
|
|   total loops in shared programs: 1830 -> 1830 (0.00%)
|   loops in affected programs: 0 -> 0
|   helped: 0 HURT: 0
|
|   total spills in shared programs: 1988 -> 1692 (-14.89%)
|   spills in affected programs: 296 -> 0
|   helped: 1 HURT: 0
|
|   total fills in shared programs: 2103 -> 1668 (-20.68%)
|   fills in affected programs: 438 -> 3 (-99.32%)
|   helped: 4 HURT: 0
|
|   LOST: 0 GAINED: 1
|
| v2: Remove the DISABLE prefix from tests we now pass.
|
| v3: Add comments about missing write_mask handling. (Caio)
|     Add unreachable when switching on cf_node type. (Jason)
|     Properly merge the component information in written map instead of replacing. (Jason)
|     Explain how removal from written arrays works. (Jason)
|     Use mode directly from deref instead of getting the var. (Jason)
|
| v4: Register the local written mode for calls. (Jason)
|     Prefer cf_node instead of node. (Jason)
|     Clarify that remove inside iteration only works in backward iterations. (Jason)
|
| Reviewed-by: Jason Ekstrand <[email protected]>
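Illustration for the commit above: the two walks it describes can be sketched on a toy IR. This is not the NIR implementation. The `instr` and `node` types, the `copy_of` table and all function names are hypothetical, variables are plain indices below `MAX_VARS`, and loops, write masks and derefs are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_VARS 8

struct instr { int dst, src; };          /* dst = src; src < 0: unknown value */

struct node {
    struct instr *instrs; size_t count;  /* block contents (unused by if-nodes) */
    struct node *then_list, *else_list;  /* non-NULL only for an if-node */
    struct node *next;                   /* next node in the same list */
    uint32_t written;                    /* filled in by the first walk */
};

/* First walk: record which variables each node may write. */
static uint32_t gather_written(struct node *n)
{
    uint32_t all = 0;
    for (; n; n = n->next) {
        n->written = gather_written(n->then_list) | gather_written(n->else_list);
        for (size_t i = 0; i < n->count; i++) {
            assert(n->instrs[i].dst >= 0 && n->instrs[i].dst < MAX_VARS);
            n->written |= 1u << n->instrs[i].dst;
        }
        all |= n->written;
    }
    return all;
}

/* Drop every known copy that reads or writes a variable in `mask`. */
static void invalidate(int *copy_of, uint32_t mask)
{
    for (int v = 0; v < MAX_VARS; v++)
        if (copy_of[v] >= 0 && (mask & ((1u << v) | (1u << copy_of[v]))))
            copy_of[v] = -1;
}

/* Second walk: rewrite sources using the copies known to be valid. */
static void propagate(struct node *n, int *copy_of)
{
    for (; n; n = n->next) {
        if (n->then_list || n->else_list) {
            /* Each branch works on its own clone of the incoming copies. */
            int branch[MAX_VARS];
            memcpy(branch, copy_of, sizeof(branch));
            propagate(n->then_list, branch);
            memcpy(branch, copy_of, sizeof(branch));
            propagate(n->else_list, branch);
            /* After the if, trust nothing that either branch wrote. */
            invalidate(copy_of, n->written);
            continue;
        }
        for (size_t i = 0; i < n->count; i++) {
            struct instr *in = &n->instrs[i];
            assert(in->src < MAX_VARS);
            if (in->src >= 0 && copy_of[in->src] >= 0)
                in->src = copy_of[in->src];
            invalidate(copy_of, 1u << in->dst);
            if (in->src >= 0 && in->src != in->dst)
                copy_of[in->dst] = in->src;
        }
    }
}
```

A caller would initialize every `copy_of` entry to -1, run `gather_written` on the top-level list, then run `propagate` on it. Discarding the branch-local copies after an if is the conservative choice that a full data-flow analysis would refine.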
* nir: Take call instruction into account in copy_prop_vars | Caio Marcelo de Oliveira Filho | 2018-10-15 | 1 | -6/+12
| | | | | | | | | | | | | | | | | Calls are not used yet (functions are inlined), but since new code is already taking them into account, do it here too. The convention here and in other places is that neither writable memory nor global variables are assumed to remain unchanged across a call. Also, explicitly state the modes affected (instead of using the reverse logic) in one of the apply_for_barrier_modes calls. Suggested by Jason. v2: Consider local vars used by a call to be conservative, SPIR-V has such cases. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add tests for copy propagation of derefs | Caio Marcelo de Oliveira Filho | 2018-10-15 | 1 | -0/+300
| | | | | | | | | | | | Also add tests for removal of redundant loads, which we currently handle as part of the copy propagation. Note some tests involve multiple blocks and are currently DISABLED because they (expectedly) fail. v2: Add missing DISABLED prefix to "multi block" tests. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Remove handling of dead writes from copy_prop_vars | Caio Marcelo de Oliveira Filho | 2018-10-15 | 1 | -76/+8
| | | | | | These are covered by another pass now. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/nir, freedreno/ir3: Use the separated dead write vars pass | Caio Marcelo de Oliveira Filho | 2018-10-15 | 2 | -0/+2
| | | | | | | No changes to shader-db for intel. No changes to shader-db expected for freedreno. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Separate dead write removal into its own pass | Caio Marcelo de Oliveira Filho | 2018-10-15 | 5 | -3/+227
| Instead of doing this as part of the existing copy_prop_vars pass. Separation makes it easier to expand the scope of both passes to be more than per-block. For copy propagation, the information about valid copies comes from previous instructions, while the dead write removal depends on information from later instructions ("has any instruction used this deref before it is overwritten?").
|
| Also change the tests to use this pass (instead of copy prop vars). Note that the disabled tests continue to fail, since the standalone pass is still per-block.
|
| v2: Remove entries from dynarray instead of marking items as deleted. Use foreach_reverse. (Caio)
|
|     (all from Jason)
|     Do not cache nir_deref_path. Not worth it for this patch.
|     Clear unused writes when hitting a call instruction.
|     Clean up enumeration of modes for barriers.
|     Move metadata calls to the inner function.
|
| v3: For copies, use the vector length to calculate the mask.
|
|     (all from Jason)
|     Use nir_component_mask_t when applicable.
|     Rename functions for clarity.
|     Consider local vars used by a call to be conservative (SPIR-V has such cases).
|     Comment and assert the assumption that stores and copies are always to a deref that ends with a vector or scalar.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
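Illustration for the commit above: a per-block dead write pass can be sketched on a toy instruction list. This is not the NIR pass. The `op` enum, the `instr` struct and `mark_dead_writes` are hypothetical names, variables are plain indices below `MAX_VARS`, and components and write masks are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS 8

enum op { OP_WRITE, OP_READ, OP_CALL };

struct instr {
    enum op op;
    int var;   /* variable index, ignored for OP_CALL */
    bool dead; /* set by the pass when the write is never observed */
};

/* Marks writes that are overwritten before any read of the same variable
 * and returns how many were marked. Writes still pending at the end of
 * the block are kept, since a later block might read them. */
static int mark_dead_writes(struct instr *instrs, size_t count)
{
    int pending[MAX_VARS]; /* index of the last unread write, or -1 */
    int removed = 0;

    for (int v = 0; v < MAX_VARS; v++)
        pending[v] = -1;

    for (size_t i = 0; i < count; i++) {
        struct instr *in = &instrs[i];
        switch (in->op) {
        case OP_WRITE:
            assert(in->var >= 0 && in->var < MAX_VARS);
            if (pending[in->var] >= 0) {
                instrs[pending[in->var]].dead = true;
                removed++;
            }
            pending[in->var] = (int)i;
            break;
        case OP_READ:
            assert(in->var >= 0 && in->var < MAX_VARS);
            pending[in->var] = -1;
            break;
        case OP_CALL:
            /* Conservatively assume the callee may read everything. */
            for (int v = 0; v < MAX_VARS; v++)
                pending[v] = -1;
            break;
        }
    }
    return removed;
}
```

The decision about a write is only made when a later instruction overwrites the variable, which is the "information from later instructions" the message refers to.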
* nir: Add tests for dead write elimination | Caio Marcelo de Oliveira Filho | 2018-10-15 | 1 | -0/+241
| | | | | | | | | | | Note at the moment the pass called is nir_opt_copy_prop_vars, because dead write elimination is implemented there. Also added tests that involve identifying dead writes in multiple blocks (e.g. the overwrite happens in another block). Those currently fail as expected, so are marked to be skipped. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add test file for vars related passes | Caio Marcelo de Oliveira Filho | 2018-10-15 | 3 | -11/+224
| | | | | | | | | | | | Add basic helpers for doing tests on the vars related optimization passes. The main goal is to lower the barrier to create tests during development and debugging of the passes. Full coverage is not a requirement. v2: Make find_next_intrinsic() skip blocks before 'after'. (Jason) Move nir_imm_ivec2() to nir_builder.h. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add nir_imm_ivec2 helper | Caio Marcelo de Oliveira Filho | 2018-10-15 | 1 | -0/+12
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* util: Add foreach_reverse for dynarray | Caio Marcelo de Oliveira Filho | 2018-10-15 | 1 | -0/+6
| | | | | | | | | Useful to walk the array removing elements by swapping them with the last element. v2: Change iteration to make sure we never underflow. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
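Illustration for the commit above: the removal pattern it mentions can be sketched with a plain C array standing in for util_dynarray. The helper name `remove_matching` and the use of a raw `int` array are assumptions made for the example. Only the idea of walking backward and overwriting the removed slot with the last element comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Removes every element equal to `value` from data[0..count) by moving
 * the last element into the freed slot, and returns the new count.
 * Walking backward guarantees that the element moved into slot i has
 * already been examined. The `i-- > 0` test ends the loop before an
 * underflowed index could ever be used. */
static size_t remove_matching(int *data, size_t count, int value)
{
    assert(data != NULL || count == 0);
    for (size_t i = count; i-- > 0; ) {
        if (data[i] == value) {
            data[i] = data[count - 1];
            count--;
        }
    }
    return count;
}
```

Swapping with the last element does not preserve order, so the pattern only fits arrays whose order carries no meaning.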
* v3d: Add support for hardware pack/unpack of half floats. | Eric Anholt | 2018-10-15 | 2 | -0/+17
| | | | | Cuts the formerly 7-minute simulation time of fs-packHalf2x16.shader_test in half.
* nir: Expose nir_remove_unused_io_vars(). | Eric Anholt | 2018-10-15 | 2 | -8/+27
| | | | | | | | | | | | | | For gallium drivers where you want to do some linking at variant compile time, you don't have the other producer/consumer shader on hand to modify. By exposing the inner function, the driver can have the used varyings in the compiled shader cache key and still do linking. This is also useful for V3D, where the binning shader wants to only output position and TF varyings. We've been removing those after nir_lower_io, but this will be less driver-specific code and let more of the shader get DCEed early in NIR. Reviewed-by: Timothy Arceri <[email protected]>
* nir: Be sure to fix deref modes after demoting shader i/o vars to global. | Eric Anholt | 2018-10-15 | 1 | -0/+3
| | | | | | | Fixes assertion failures when calling nir_remove_unused_varyings() or nir_remove_unused_io_vars(). Reviewed-by: Timothy Arceri <[email protected]>
* gallium/ttn: Convert inputs and outputs to derefs of variables. | Eric Anholt | 2018-10-15 | 4 | -69/+64
| | | | | | | | | | | This means that TTN shaders more closely resemble GTN shaders: they have inputs and outputs as variable derefs, with the variables having their .driver_location already set up for you. This will be useful for v3d to do input variable DCE in NIR, which we can't do when the TTN shaders never have a pre-nir_lower_io stage. Acked-by: Rob Clark <[email protected]>