Commit message | Author | Age | Files | Lines
* anv: Allow fast-clearing the first slice of a multi-slice image | Jason Ekstrand | 2018-02-08 | 2 | -12/+23
  Now that we're tracking aux properly per-slice, we can enable this for applications which actually care.

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* anv/cmd_buffer: Rework aux tracking | Jason Ekstrand | 2018-02-08 | 4 | -171/+360
  This commit completely reworks aux tracking. This includes a number of somewhat distinct changes:

  1) Since we are no longer fast-clearing multiple slices, we only need to track one fast clear color and one fast clear type.

  2) We store two bits for fast clear instead of one to let us distinguish between zero and non-zero fast clear colors. This is needed so that we can do full resolves when transitioning to PRESENT_SRC_KHR with gen9 CCS images where we allow zero clear values in all sorts of places we wouldn't normally.

  3) We now track compression state as a boolean separate from fast clear type and this is tracked on a per-slice granularity.

  The previous scheme had some issues when it came to individual slices of multi-LOD images. In particular, we only tracked "needs resolve" per-LOD but you could do a vkCmdPipelineBarrier that would only resolve a portion of the image and would set "needs resolve" to false anyway. Also, any transition from an undefined layout would reset the clear color for the entire LOD regardless of whether or not there was some clear color on some other slice.

  As far as full/partial resolves go, the assumptions of the previous scheme held because the one case where we do need a full resolve when CCS_E is enabled is for window-system images. Since we only ever allowed X-tiled window-system images, CCS was entirely disabled on gen9+ and we never got CCS_E. With the advent of Y-tiled window-system buffers, we now need to properly support doing a full resolve of images marked CCS_E.

  v2 (Jason Ekstrand):
   - Fix a bug in the compressed flag offset calculation
   - Treat 3D images as multi-slice for the purposes of resolve tracking

  v3 (Jason Ekstrand):
   - Set the compressed flag whenever we fast-clear
   - Simplify the resolve predicate computation logic

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
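The tracked state described in "Rework aux tracking" can be modeled roughly as follows. There is one fast-clear type with three values, which fit in two bits, plus one clear color and a per-slice compressed flag. All `sketch_*` names are hypothetical. The real driver computes its resolve predicate differently (the log mentions an mi_alu helper), and this CPU-side model does not try to reproduce that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding of the two fast-clear bits. */
enum sketch_fast_clear_type {
   SKETCH_FAST_CLEAR_NONE = 0,    /* no outstanding fast clear */
   SKETCH_FAST_CLEAR_ZERO = 1,    /* fast-cleared to an all-zero color */
   SKETCH_FAST_CLEAR_NONZERO = 2, /* fast-cleared to any other color */
};

#define SKETCH_MAX_SLICES 8

struct sketch_aux_state {
   /* Only the first slice is fast-cleared, so one type and one color suffice. */
   enum sketch_fast_clear_type fast_clear_type;
   uint32_t clear_color[4];
   /* Compression is tracked separately from the clear type, per slice. */
   bool compressed[SKETCH_MAX_SLICES];
};

static void sketch_fast_clear_slice0(struct sketch_aux_state *s,
                                     const uint32_t color[4])
{
   bool zero = color[0] == 0 && color[1] == 0 && color[2] == 0 && color[3] == 0;
   s->fast_clear_type = zero ? SKETCH_FAST_CLEAR_ZERO : SKETCH_FAST_CLEAR_NONZERO;
   memcpy(s->clear_color, color, sizeof(s->clear_color));
   s->compressed[0] = true; /* mirrors v3: set the compressed flag on fast clear */
}

/* A slice needs a full resolve if it holds compressed data or, for slice 0,
 * an outstanding fast clear. */
static bool sketch_needs_full_resolve(const struct sketch_aux_state *s,
                                      unsigned slice)
{
   assert(slice < SKETCH_MAX_SLICES);
   return s->compressed[slice] ||
          (slice == 0 && s->fast_clear_type != SKETCH_FAST_CLEAR_NONE);
}

/* Resolving one slice clears only that slice's state. */
static void sketch_full_resolve(struct sketch_aux_state *s, unsigned slice)
{
   assert(slice < SKETCH_MAX_SLICES);
   s->compressed[slice] = false;
   if (slice == 0)
      s->fast_clear_type = SKETCH_FAST_CLEAR_NONE;
}
```

Resolving slice 0 in this model leaves another slice's compressed flag untouched. That is what a single per-LOD "needs resolve" flag could not express.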
* anv/cmd_buffer: Move the mi_alu helper higher up | Jason Ekstrand | 2018-02-08 | 1 | -17/+19
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* anv/image: Simplify some verbose comments | Jason Ekstrand | 2018-02-08 | 1 | -10/+3
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* anv: Use blorp_ccs_ambiguate instead of fast-clears | Jason Ekstrand | 2018-02-08 | 2 | -50/+40
  Even though the blorp pass looks a bit on the sketchy side, the end result in the Vulkan driver is very nice. Instead of having this weird case where you do a fast clear and then maybe have to resolve, we just do the ambiguate and are done with it. The ambiguate does exactly what we want: it sets all the CCS values to 0, which puts the CCS into the pass-through state.

  This should also improve performance a bit in certain cases. For instance, if we did a transition from UNDEFINED to GENERAL for a surface that doesn't have CCS enabled all the time, we would end up doing a fast-clear and then a full resolve, which touches every byte in the main surface as well as the CCS. With the ambiguate pass, that transition only touches the CCS.

  Reviewed-by: Nanley Chery <[email protected]>
* anv/cmd_buffer: Re-arrange the logic around UNDEFINED fast-clears | Jason Ekstrand | 2018-02-08 | 1 | -17/+14
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* anv/cmd_buffer: Pull the undefined layout condition into the if | Jason Ekstrand | 2018-02-08 | 1 | -9/+4
  Now that this isn't a multi-case if and it's just the one case, it's a bit clearer if the condition is just part of the if instead of being pulled out into a boolean variable.

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* intel/blorp: Add a CCS ambiguation pass | Jason Ekstrand | 2018-02-08 | 2 | -0/+158
  This pass performs an "ambiguate" operation on a CCS-compressed surface by manually writing zeros into the CCS. On gen8+, ISL gives us a fairly detailed notion of how the CCS is laid out, so this is fairly simple to do. On gen7, the CCS tiling is quite crazy, but that isn't an issue because we can only do CCS on single-slice images, so we can just blast over the entire CCS buffer if we want to.

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
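The ambiguate idea comes down to writing zeros over the part of the CCS that belongs to the target slice. Purely for illustration, this sketch assumes each slice owns one contiguous, equally sized byte range of a CPU-visible CCS buffer; `struct sketch_ccs` is hypothetical. Real CCS layouts are tiled and, on gen8+, described by ISL. The gen7 branch follows the commit message: CCS is only used on single-slice images there, so the whole buffer is cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified CCS: each slice owns one contiguous, equally sized
 * byte range. Real CCS layouts are tiled. */
struct sketch_ccs {
   uint8_t *data;
   size_t total_size; /* bytes in the whole CCS buffer */
   size_t slice_size; /* bytes of CCS per slice */
   unsigned num_slices;
};

/* Zero the CCS for one slice; an all-zero CCS means pass-through. */
static void sketch_ccs_ambiguate(struct sketch_ccs *ccs, unsigned slice, int gen)
{
   if (gen <= 7) {
      /* gen7: CCS is only used on single-slice images, so clear everything. */
      assert(ccs->num_slices == 1 && slice == 0);
      memset(ccs->data, 0, ccs->total_size);
      return;
   }
   assert(slice < ccs->num_slices);
   assert((size_t)(slice + 1) * ccs->slice_size <= ccs->total_size);
   memset(ccs->data + (size_t)slice * ccs->slice_size, 0, ccs->slice_size);
}
```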
* anv: Only fast clear single-slice images | Jason Ekstrand | 2018-02-08 | 1 | -17/+17
  The current strategy we use for managing resolves has an issue: we track clear colors and the need for resolves per-LOD, but we still allow resolves of only a subset of the slices in any given LOD, and doing so sets the "needs resolve" flag for that LOD to false while leaving the remaining layers unresolved. This patch is only the first step and does not, by itself, fix anything. However, it's fairly self-contained, and splitting it out means any performance regressions should bisect to this nice obvious commit rather than to the giant "rework aux tracking" commit.

  Nanley and I did some testing, and none of the applications we tested even tried to fast-clear anything other than the first slice of an image. The test was done by adding a printf right before we call blorp_fast_clear if we were ever going to touch any slice other than the first with a fast-clear. Due to the way the original code was structured, this would not have included applications which only cleared a subset of layers. The applications tested were:

   - All Sascha Willems demos
   - Aztec Ruins
   - Dota 2
   - The Talos Principle
   - Mad Max
   - Warhammer 40,000: Dawn of War III
   - Serious Sam Fusion 2017: BFE

  While not the full list of shipping applications, it's a pretty good spread and covers most of the engines we've seen running on our driver. If this is ever shown to be a performance problem in the future, we can reconsider our strategy.

  Reviewed-by: Nanley Chery <[email protected]>
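This small hypothetical model makes the failure mode concrete. It compares a single per-LOD "needs resolve" flag with per-layer flags when only a subset of layers is resolved. Neither struct is taken from anv; both exist only to show how a partial resolve loses information under the old scheme.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_LAYERS 4

/* Old-style model: one "needs resolve" flag for the whole LOD. */
struct sketch_lod_flag {
   bool needs_resolve;
};

/* Per-layer model: one flag per array layer. */
struct sketch_layer_flags {
   bool needs_resolve[SKETCH_LAYERS];
};

/* Resolving any range clears the single flag, even if the range is partial. */
static void sketch_resolve_lod_flag(struct sketch_lod_flag *f,
                                    unsigned base, unsigned count)
{
   (void)base;
   (void)count;
   f->needs_resolve = false;
}

/* Resolving a range clears only the layers it covers. */
static void sketch_resolve_layer_flags(struct sketch_layer_flags *f,
                                       unsigned base, unsigned count)
{
   assert(base + count <= SKETCH_LAYERS);
   for (unsigned i = base; i < base + count; i++)
      f->needs_resolve[i] = false;
}
```

After resolving only layer 0, the per-LOD flag reports "clean" even though layers 1 to 3 still hold unresolved data. The per-layer flags still record them.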
* anv/cmd_buffer: Add a mark_image_written helper | Jason Ekstrand | 2018-02-08 | 5 | -0/+119
  Currently, this helper does nothing, but we call it every place where an image is written through the render pipeline. This will allow us to properly mark the aux state so that we can handle resolves correctly.

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* anv/blorp: Add src/dst_level helper variables in CmdCopyImage | Jason Ekstrand | 2018-02-08 | 1 | -8/+6
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* anv/cmd_buffer: Add an anv_genX_call macro | Jason Ekstrand | 2018-02-08 | 1 | -15/+25
  This is copied and pasted from the similar macro we added to ISL.

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
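This is a minimal sketch of a per-generation dispatch macro of the kind "Add an anv_genX_call macro" describes. It switches on the hardware generation and token-pastes a generation prefix onto the function name. The device struct, the `gen_x10` encoding (Haswell as 75), and the `sketch_gen*_emit` functions are assumptions for illustration. They are not anv's or ISL's actual definitions.

```c
#include <assert.h>

/* Hypothetical device description: generation times ten, so Haswell is 75. */
struct sketch_device {
   int gen_x10;
};

/* Hypothetical per-generation implementations of one function. */
static void sketch_gen7_emit(int *out)  { *out = 70; }
static void sketch_gen75_emit(int *out) { *out = 75; }
static void sketch_gen8_emit(int *out)  { *out = 80; }
static void sketch_gen9_emit(int *out)  { *out = 90; }

/* Dispatch to sketch_genN_<func> based on the device generation. */
#define sketch_genX_call(dev, func, ...)                      \
   do {                                                       \
      switch ((dev)->gen_x10) {                               \
      case 70: sketch_gen7_##func(__VA_ARGS__); break;        \
      case 75: sketch_gen75_##func(__VA_ARGS__); break;       \
      case 80: sketch_gen8_##func(__VA_ARGS__); break;        \
      case 90: sketch_gen9_##func(__VA_ARGS__); break;        \
      default: assert(!"unknown hardware generation"); break; \
      }                                                       \
   } while (0)
```

The do/while(0) wrapper lets the macro be used as a single statement. As a consequence, this form suits calls whose return value is not needed.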
* anv/cmd_buffer: Generalize transition_color_buffer | Jason Ekstrand | 2018-02-08 | 1 | -12/+47
  This moves it to being based on layout_to_aux_usage instead of being hard-coded based on bits of a priori knowledge of how transitions interact with layouts. This conceptually simplifies things because we're now using layout_to_aux_usage and layout_supports_fast_clear to make resolve decisions, so changes to those functions will do what one expects.

  There is a potential bug with window system integration on gen9+ where we wouldn't do a resolve when transitioning to the PRESENT_SRC layout because we just assume that everything that handles CCS_E can handle it all the time. When handing a CCS_E image off to the window system, we may need to do a full resolve if the window system does not support the CCS_E modifier. The only reason this hasn't been a problem yet is that we don't support modifiers in Vulkan WSI, so we always get X tiling, which implies no CCS on gen9+.

  This patch doesn't actually fix that bug yet, but it takes the first step in that direction by making us actually pick the correct resolve op. In order to handle all of the cases, we need more detailed aux tracking.

  v2 (Jason Ekstrand):
   - Make a few more things const
   - Use the anv_fast_clear_support enum

  v3 (Jason Ekstrand):
   - Move an assert and add a better comment

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
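The resolve decision described in "Generalize transition_color_buffer" can be sketched as a pure function of four inputs. Two are the aux usages of the initial and final layouts. The other two say whether each layout tolerates fast-cleared data. The enums and rules below are a simplified, hypothetical reading of the message, not the driver's code. In particular, treating a CCS_D fast-clear resolve as a full resolve is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_aux_usage { SKETCH_AUX_NONE, SKETCH_AUX_CCS_D, SKETCH_AUX_CCS_E };
enum sketch_resolve_op { SKETCH_RESOLVE_NONE, SKETCH_RESOLVE_PARTIAL, SKETCH_RESOLVE_FULL };

static enum sketch_resolve_op
sketch_pick_resolve(enum sketch_aux_usage initial_usage,
                    enum sketch_aux_usage final_usage,
                    bool initial_fast_clear, bool final_fast_clear)
{
   if (initial_usage == SKETCH_AUX_NONE)
      return SKETCH_RESOLVE_NONE;  /* nothing compressed to begin with */
   if (final_usage == SKETCH_AUX_NONE)
      return SKETCH_RESOLVE_FULL;  /* consumer cannot read any aux data */
   if (initial_usage == SKETCH_AUX_CCS_E && final_usage == SKETCH_AUX_CCS_D)
      return SKETCH_RESOLVE_FULL;  /* consumer cannot read compressed blocks */
   if (initial_fast_clear && !final_fast_clear) {
      /* CCS_E: drop only the clear state; CCS_D: clears are all there is. */
      return initial_usage == SKETCH_AUX_CCS_E ? SKETCH_RESOLVE_PARTIAL
                                               : SKETCH_RESOLVE_FULL;
   }
   return SKETCH_RESOLVE_NONE;
}
```

Consider the window-system case from the message. Suppose the window system cannot take the CCS_E modifier, so the final layout's aux usage comes back as none. That transition lands in the full-resolve branch.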
* anv/cmd_buffer: Recurse in transition_color_buffer instead of falling through | Jason Ekstrand | 2018-02-08 | 1 | -9/+9
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* anv/image: Support color aspects in layout_to_aux_usage | Jason Ekstrand | 2018-02-08 | 1 | -19/+29
  Reviewed-by: Nanley Chery <[email protected]>
* anv/image: Add a helper for determining when fast clears are supported | Jason Ekstrand | 2018-02-08 | 2 | -0/+83
  v2 (Jason Ekstrand):
   - Return an enum instead of a boolean

  v3 (Jason Ekstrand):
   - Return ANV_FAST_CLEAR_NONE instead of false (Topi)
   - Rename ANV_FAST_CLEAR_ANY to ANV_FAST_CLEAR_DEFAULT_VALUE
   - Add documentation for the enum values

  v4 (Jason Ekstrand):
   - Remove a dead comment

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
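One benefit of returning an enum rather than a boolean is that callers can compare capabilities. This sketch assumes the values are ordered from least to most permissive. Under that assumption, an outstanding fast clear can stay unresolved whenever its kind does not exceed what the target layout supports. The `sketch_*` names are hypothetical stand-ins for the ANV_FAST_CLEAR_* values named in the log, and the ordering is this sketch's own assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the ANV_FAST_CLEAR_* values, ordered from least
 * to most permissive (the ordering is an assumption of this sketch). */
enum sketch_fast_clear_support {
   SKETCH_SUPPORT_NONE = 0,          /* no fast-cleared data allowed */
   SKETCH_SUPPORT_DEFAULT_VALUE = 1, /* only the default clear value */
   SKETCH_SUPPORT_ANY = 2,           /* any clear value */
};

/* `state` describes what the image currently holds, on the same scale (NONE
 * means no outstanding fast clear). It can stay unresolved iff it does not
 * exceed what the layout supports. */
static bool sketch_fast_clear_ok(enum sketch_fast_clear_support state,
                                 enum sketch_fast_clear_support layout_support)
{
   return state <= layout_support;
}
```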
* anv/image: Update a comment | Jason Ekstrand | 2018-02-08 | 1 | -1/+1
  This got lost in all of the aspect vs. plane rebasing of YCBCR.

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* anv/blorp: Rework HiZ ops to look like MCS and CCS | Jason Ekstrand | 2018-02-08 | 3 | -26/+34
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* anv/blorp: Support ISL_AUX_USAGE_HIZ in surf_for_anv_image | Jason Ekstrand | 2018-02-08 | 1 | -16/+6
  If the function gets passed ANV_AUX_USAGE_DEFAULT, it still has the old behavior of setting ISL_AUX_USAGE_NONE for depth/stencil, which is what we want for blits/copies.

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* anv/blorp: Rework image clear/resolve helpers | Jason Ekstrand | 2018-02-08 | 3 | -104/+166
  This replaces image_fast_clear and ccs_resolve with two new helpers that simply perform an isl_aux_op, whatever that may be, on CCS or MCS. This is a bit cleaner as it separates performing the aux operation from deciding which blorp helper we have to call to do it.

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* intel/isl: Codify AUX operations in an enum | Jason Ekstrand | 2018-02-08 | 1 | -25/+49
  Right now, we have different entrypoints and enums in blorp for these different operations. This provides us a central enum which we can begin to transition to.

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
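A central enum lets one helper switch on the requested operation, instead of each caller picking a blorp entry point directly. This is a minimal sketch of that shape. The enum values, and the strings standing in for lower-level routines, are illustrative assumptions rather than ISL's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical central enum of aux operations (names are illustrative). */
enum sketch_aux_op {
   SKETCH_AUX_OP_NONE,
   SKETCH_AUX_OP_FAST_CLEAR,
   SKETCH_AUX_OP_FULL_RESOLVE,
   SKETCH_AUX_OP_PARTIAL_RESOLVE,
   SKETCH_AUX_OP_AMBIGUATE,
};

/* One entry point switches on the op; each string stands in for whichever
 * lower-level routine would actually be called. */
static const char *sketch_ccs_op(enum sketch_aux_op op)
{
   switch (op) {
   case SKETCH_AUX_OP_FAST_CLEAR:      return "fast clear";
   case SKETCH_AUX_OP_FULL_RESOLVE:    return "full resolve";
   case SKETCH_AUX_OP_PARTIAL_RESOLVE: return "partial resolve";
   case SKETCH_AUX_OP_AMBIGUATE:       return "ambiguate";
   case SKETCH_AUX_OP_NONE:            break;
   }
   return NULL; /* nothing to do */
}
```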
* r600/sb: Check whether optimizations would result in reladdr conflict | Gert Wollny | 2018-02-09 | 3 | -4/+55
  v2:
   - Check whether the node src and dst registers are NULL before using them.
   - Fix a typo in the commit message.

  Two cases are handled with this patch:

  1. If copy propagation tries to eliminate a move from a relative array access, then it could optimize

       MOV R1, ARRAY[RELADDR_1]
       MOV R2, ARRAY[RELADDR_2]
       OP2 R3, R1, R2

     into

       OP2 R3, ARRAY[RELADDR_1], ARRAY[RELADDR_2]

     which is forbidden, because there is only one address register available.

  2. When MULADD(x,a,MUL(x,c)) is handled

       MUL TMP, R1, ARRAY[RELADDR_1]
       MULADD R3, R1, ARRAY[RELADDR_2], TMP

     by folding this into

       ADD TMP, ARRAY[RELADDR_2], ARRAY[RELADDR_1]
       MUL R3, R1, TMP

     which is also forbidden.

  Test for these cases and reject the optimization if a forbidden combination of relative access would be created.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103142
  Signed-off-by: Gert Wollny <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
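The check in "Check whether optimizations would result in reladdr conflict" comes down to this: collect the relatively addressed operands of the would-be instruction, and reject the rewrite if they need more than one address value. Below is a minimal sketch with a hypothetical operand representation. Allowing two operands that share the same address value is this sketch's assumption; the commit does not state it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operand: a plain register, or an indirect array access whose
 * index comes from relative address value `addr_id` (the n in RELADDR_n). */
struct sketch_operand {
   bool rel;
   int addr_id;
};

/* True if folding these operands into one instruction would need more than
 * one distinct relative address, i.e. the optimization must be rejected. */
static bool sketch_reladdr_conflict(const struct sketch_operand *ops, size_t n)
{
   bool seen = false;
   int addr = 0;
   for (size_t i = 0; i < n; i++) {
      if (!ops[i].rel)
         continue;
      if (!seen) {
         seen = true;
         addr = ops[i].addr_id;
      } else if (ops[i].addr_id != addr) {
         return true;
      }
   }
   return false;
}
```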
* r600g: Implement spilling of temp arrays (v2) | Glenn Kennard | 2018-02-09 | 3 | -8/+292
  Pessimistically spills arrays if the GPR limit is exceeded.

  v2: fix r600 support [airlied]

  Signed-off-by: Glenn Kennard <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* r600/sb: handle scratch mem reads on r600 | Dave Airlie | 2018-02-09 | 2 | -5/+23
  On r600 we use the scratch mem with read/read_ind; in that case sb should track the rw_gpr as a dst instead of a src. This stops the whole shader being optimised out.

  Signed-off-by: Dave Airlie <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* r600g/sb: Add dependency tracking for scratch ops | Glenn Kennard | 2018-02-09 | 8 | -4/+22
  Signed-off-by: Glenn Kennard <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* r600g/sb: Support scratch ops | Glenn Kennard | 2018-02-09 | 5 | -1/+153
  Signed-off-by: Glenn Kennard <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* r600g: Implement scratch buffer state management (v2) | Glenn Kennard | 2018-02-09 | 6 | -1/+152
  v2: add Glenn's fixes

  Signed-off-by: Glenn Kennard <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* r600g: Add pending output function | Glenn Kennard | 2018-02-09 | 2 | -0/+22
  Spills have to happen after the VLIW bundle currently being processed, so defer emitting the spill op.

  Signed-off-by: Glenn Kennard <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
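Deferring spills until the current VLIW bundle closes can be sketched as a small pending queue. The queue is flushed after the bundle's last ALU op. Everything here is hypothetical: `struct sketch_emitter`, the text log, and the `SPILL` mnemonic. The sketch only models the ordering constraint from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_PENDING 8

/* Hypothetical emitter that records emitted ops as text. */
struct sketch_emitter {
   char log[256];
   int pending[SKETCH_MAX_PENDING]; /* registers whose spill is deferred */
   unsigned num_pending;
};

static void sketch_append(struct sketch_emitter *e, const char *s)
{
   strncat(e->log, s, sizeof(e->log) - strlen(e->log) - 1);
}

/* Queue a spill instead of emitting it inside the current bundle. */
static void sketch_defer_spill(struct sketch_emitter *e, int reg)
{
   assert(e->num_pending < SKETCH_MAX_PENDING);
   e->pending[e->num_pending++] = reg;
}

/* Emit one ALU op; when it closes a VLIW bundle, flush deferred spills. */
static void sketch_emit_alu(struct sketch_emitter *e, const char *op,
                            bool last_in_bundle)
{
   char buf[32];
   sketch_append(e, op);
   sketch_append(e, " ");
   if (!last_in_bundle)
      return;
   for (unsigned i = 0; i < e->num_pending; i++) {
      snprintf(buf, sizeof(buf), "SPILL R%d ", e->pending[i]);
      sketch_append(e, buf);
   }
   e->num_pending = 0;
}
```

Suppose a spill is requested between the MUL and ADD of one bundle. It then appears only after that bundle's last op, before the next bundle starts.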
* r600g: Support emitting scratch ops | Glenn Kennard | 2018-02-09 | 4 | -1/+77
  Signed-off-by: Glenn Kennard <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* r600: fix texture gather swizzling. | Dave Airlie | 2018-02-09 | 1 | -7/+7
  This fixes KHR-GL45.texture_gather.swizzle on cayman and redwood.

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* ac: add 64bit support to ac_find_lsb() | Timothy Arceri | 2018-02-09 | 1 | -2/+20
  v2: use LLVMBuildTrunc()

  Reviewed-by: Marek Olšák <[email protected]>
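For a 64-bit source, a count of trailing zeros is naturally a 64-bit value, while the GLSL-style result is a 32-bit int. The v2 note's LLVMBuildTrunc() presumably does that narrowing. Here is a portable C sketch of the semantics, using GLSL `findLSB`'s convention of returning -1 for a zero input. The function name is hypothetical, and this is not the LLVM IR the helper emits.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the least significant set bit, or -1 if v is zero (GLSL findLSB
 * convention). The count is kept in 64 bits and narrowed to 32 at the end. */
static int32_t sketch_find_lsb64(uint64_t v)
{
   if (v == 0)
      return -1;
   uint64_t count = 0;
   while ((v & 1) == 0) {
      v >>= 1;
      count++;
   }
   return (int32_t)count;
}
```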
* ac: move get_elem_bits() to ac_llvm_build.c | Timothy Arceri | 2018-02-09 | 3 | -26/+30
  Reviewed-by: Marek Olšák <[email protected]>
* ac: add 64bit bitCount support | Timothy Arceri | 2018-02-09 | 1 | -1/+6
  v2: use LLVMBuildTrunc()

  Reviewed-by: Marek Olšák <[email protected]>
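The same pattern applies to bitCount: count the set bits in 64-bit arithmetic, then narrow the result to 32 bits. The count is at most 64, so the narrowing loses nothing. This is a hypothetical C sketch of the semantics, not the generated LLVM IR.

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits, counted in 64 bits and narrowed to 32 at the end. */
static int32_t sketch_bit_count64(uint64_t v)
{
   uint64_t count = 0;
   while (v != 0) {
      v &= v - 1; /* clear the lowest set bit */
      count++;
   }
   return (int32_t)count;
}
```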
* ac/nir: clean up handle_fs_outputs_post() | Samuel Pitoiset | 2018-02-08 | 1 | -26/+38
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: add radv_load_output() helper | Samuel Pitoiset | 2018-02-08 | 1 | -20/+20
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/shader: scan info about output PS declarations | Samuel Pitoiset | 2018-02-08 | 5 | -16/+52
  NIR->LLVM should only be a translation pass, and all scan stuff should be done before.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: add radv_export_param() helper | Samuel Pitoiset | 2018-02-08 | 1 | -21/+22
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: remove set but unused export_mask | Samuel Pitoiset | 2018-02-08 | 2 | -2/+0
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: remove dead code in handle_vs_outputs_post() | Samuel Pitoiset | 2018-02-08 | 1 | -8/+1
  The memcpy can't be reached because the condition is always false.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: remove useless check in si_llvm_init_export_args() | Samuel Pitoiset | 2018-02-08 | 1 | -3/+0
  values can't be NULL because we use ac_build_export_null() now.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: use ac_build_export_null() | Samuel Pitoiset | 2018-02-08 | 1 | -2/+1
  The number of enabled channels should be 0 when exporting null.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add ac_build_export_null() helper | Samuel Pitoiset | 2018-02-08 | 3 | -20/+20
  Imported from RadeonSI.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* meson: Add build option for tools | Scott D Phillips | 2018-02-08 | 7 | -7/+24
  Add a build option to control building some of the misc tools we have. Also set the executables to install; presumably you want that if you're asking for the build.

  v2:
   - set 'install:' to the with_tools value, not true (Jordan)
   - handle 'all' in the comma list (Dylan)
   - add freedreno's tools (Dylan)

  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Dylan Baker <[email protected]>
* intel: Add Coffee Lake brand strings | Anuj Phogat | 2018-02-08 | 1 | -3/+3
  Signed-off-by: Anuj Phogat <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* gallium/util: silence clang warning in blitter code | Brian Paul | 2018-02-08 | 1 | -1/+1
  Silence "warning: comparison of constant 4294967295 with expression of type 'ubyte'".

  Reviewed-by: Jose Fonseca <[email protected]>
* tgsi: s/unsigned/enum tgsi_semantic/ in ureg_DECL_output() | Brian Paul | 2018-02-08 | 1 | -1/+1
  So the function matches the prototype. Found with clang.

  v2: fix copy&paste error

  Reviewed-by: Jose Fonseca <[email protected]>
* tgsi: use TGSI_INTERPOLATE_x arguments instead of zeros in ureg code | Brian Paul | 2018-02-08 | 1 | -2/+5
  TGSI_INTERPOLATE_CONSTANT and TGSI_INTERPOLATE_LOC_CENTER have the value zero, so there's no change in behavior.

  It seems funny to declare these fs input registers with constant interpolation. But it looks like ureg_DECL_input_layout() is not called anywhere and ureg_DECL_input() is only called from util_make_geometry_passthrough_shader().

  Reviewed-by: Mathias Fröhlich <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallium/util: s/uint/enum tgsi_semantic/ in simple shader code | Brian Paul | 2018-02-08 | 5 | -11/+11
  Reviewed-by: Mathias Fröhlich <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
* tgsi: s/unsigned/enum pipe_shader_type/ in ureg code | Brian Paul | 2018-02-08 | 2 | -5/+9
  And add a default switch case to silence a compiler warning.

  Reviewed-by: Mathias Fröhlich <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallium/util: s/uint/enum tgsi_semantic/ in u_blitter.c | Brian Paul | 2018-02-08 | 1 | -4/+6
  And put a static qualifier on const arrays.

  Reviewed-by: Mathias Fröhlich <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>