path: root/src
Commit message | Author | Age | Files | Lines
* radv: always load 3 channels for formats that need to be shuffled | Samuel Pitoiset | 2019-03-15 | 1 | -9/+14
    This fixes a rendering issue with Hellblade and DXVK.

    Fixes: a66b186bebf ("radv: use typed buffer loads for vertex input fetches")
    Reported-by: Philip Rebohle <[email protected]>
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* mesa: Add assert to _mesa_primitive_restart_index. | Mathias Fröhlich | 2019-03-15 | 1 | -0/+3
    Make sure the index_size parameter is given in bytes.

    Reviewed-by: Brian Paul <[email protected]>
    Signed-off-by: Mathias Fröhlich <[email protected]>
* vbo: Fix GL_PRIMITIVE_RESTART_FIXED_INDEX in display list compiles. | Mathias Fröhlich | 2019-03-15 | 1 | -5/+9
    The fixed primitive restart index is the maximum value of the index data type, so it is different for each type. Use the appropriate fixed restart index value.

    Reviewed-by: Brian Paul <[email protected]>
    Signed-off-by: Mathias Fröhlich <[email protected]>
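A minimal sketch of the per-type fixed restart value described above, assuming the index size is passed in bytes. The helper name `fixed_restart_index` is hypothetical and is not Mesa's actual function.

```c
#include <stdint.h>

/* Hypothetical helper, not Mesa code: return the fixed primitive
 * restart index (all bits set) for an index size given in bytes. */
static uint32_t
fixed_restart_index(unsigned index_size_bytes)
{
   switch (index_size_bytes) {
   case 1:
      return 0xFFu;       /* GL_UNSIGNED_BYTE indices */
   case 2:
      return 0xFFFFu;     /* GL_UNSIGNED_SHORT indices */
   case 4:
      return 0xFFFFFFFFu; /* GL_UNSIGNED_INT indices */
   default:
      return 0;           /* not a valid index size in this sketch */
   }
}
```

Comparing a 16-bit index against `0xFFFFFFFF` would never match, which is why a single hard-coded value cannot serve all three index types.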
* vbo: Fix basevertex handling in display list compiles. | Mathias Fröhlich | 2019-03-15 | 1 | -5/+12
    The standard requires that the primitive restart comparison happens before the basevertex value is added. Do this now, and add a reference to the standard explaining why the comparison happens at this place.

    Reviewed-by: Brian Paul <[email protected]>
    Signed-off-by: Mathias Fröhlich <[email protected]>
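The ordering can be sketched as follows. This is an illustration under assumed names (`resolve_index` and its parameters are hypothetical), not the code from the patch: the raw index is tested against the restart index first, and basevertex is added only afterwards.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: returns false if the raw index requests a
 * primitive restart; otherwise writes the vertex to fetch and returns true. */
static bool
resolve_index(uint32_t raw_index, uint32_t restart_index,
              bool restart_enabled, int32_t basevertex,
              uint32_t *vertex_out)
{
   /* Compare the raw index first; basevertex plays no part in this test. */
   if (restart_enabled && raw_index == restart_index)
      return false;

   /* Only now apply basevertex to get the vertex number. */
   *vertex_out = raw_index + (uint32_t)basevertex;
   return true;
}
```

With a restart index of `0xFFFF` and a basevertex of 1, a raw index of `0xFFFE` resolves to vertex `0xFFFF` and is drawn normally, while a raw index of `0xFFFF` restarts the primitive.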
* mesa: Use mapping tools in debug prints. | Mathias Fröhlich | 2019-03-15 | 1 | -45/+12
    Reviewed-by: Brian Paul <[email protected]>
    Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa: Remove _ae_{,un}map_vbos and dependencies. | Mathias Fröhlich | 2019-03-15 | 2 | -100/+0
    Since mapping and unmapping the buffer objects in a VAO is handled directly from the VAO, this part of the _NEW_ARRAY state is no longer used. So remove this part of array element state.

    Reviewed-by: Brian Paul <[email protected]>
    Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa: Replace _ae_{,un}map_vbos with _mesa_vao_{,un}map_arrays | Mathias Fröhlich | 2019-03-15 | 2 | -13/+11
    Due to the use of bitmaps, the _mesa_vao_{,un}map_arrays functions should provide runtime efficiency comparable to the currently used _ae_{,un}map_vbos functions. So use these functions and enable further cleanup.

    Reviewed-by: Brian Paul <[email protected]>
    Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa: Use _mesa_array_element in dlist save. | Mathias Fröhlich | 2019-03-15 | 1 | -4/+19
    Make use of the newly factored out _mesa_array_element function in display list compilation. For now that duplicates the primitive restart logic. But that turns out to need a fix in display list handling anyhow.

    Reviewed-by: Brian Paul <[email protected]>
    Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa: Factor out _mesa_array_element. | Mathias Fröhlich | 2019-03-15 | 2 | -19/+32
    The factored out function handles emitting the vertex attributes at the given index. The now publicly accessible function gets used in the following patches.

    Reviewed-by: Brian Paul <[email protected]>
    Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa: Implement helper functions to map and unmap a VAO. | Mathias Fröhlich | 2019-03-15 | 2 | -0/+102
    Provide a set of functions that maps or unmaps all VBOs held in a VAO. The functions will be used in the following patches.

    v2: Update comments.

    Reviewed-by: Brian Paul <[email protected]>
    Signed-off-by: Mathias Fröhlich <[email protected]>
* st/mesa: Let NIR lower UBO and SSBO access when we have it | Jason Ekstrand | 2019-03-15 | 2 | -1/+11
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965: Stop setting LowerBufferInterfaceBlocks | Jason Ekstrand | 2019-03-15 | 2 | -1/+4
    Instead, we do UBO and SSBO deref lowering in NIR after we've given it a chance to optimize SSBO access.

    Shader-db results on Kaby Lake:

        total instructions in shared programs: 15235775 -> 15235484 (<.01%)
        instructions in affected programs: 14992 -> 14701 (-1.94%)
        helped: 19
        HURT: 20

        total cycles in shared programs: 339220331 -> 339027307 (-0.06%)
        cycles in affected programs: 79831981 -> 79638957 (-0.24%)
        helped: 540
        HURT: 602

        total loops in shared programs: 4402 -> 4348 (-1.23%)
        loops in affected programs: 186 -> 132 (-29.03%)
        helped: 27
        HURT: 0

        total spills in shared programs: 23261 -> 23234 (-0.12%)
        spills in affected programs: 38 -> 11 (-71.05%)
        helped: 1
        HURT: 0

        total fills in shared programs: 31442 -> 31371 (-0.23%)
        fills in affected programs: 98 -> 27 (-72.45%)
        helped: 1
        HURT: 0

        LOST: 12
        GAINED: 12

    Most of the help and hurt in instruction counts was just churn caused by re-ordering of optimizations and the fact that the NIR deref lowering code is emitting slightly different instructions. Nothing was hurt by more than three instructions and most things weren't helped by more than four. The primary exception to this is one Car Chase shader:

        shaders/non-free/gfxbench4/carchase/341.shader_test CS SIMD32: 1144 -> 821 (-28.23%)

    There is also one compute shader in Manhattan 3.1 and a fragment shader in the UE4 Shooter Game demo that now get a loop partially unrolled. Those showed up in the results as hurt instructions but were manually removed to get the results above.

    The lost/gained was a dozen Car Chase shaders that went from SIMD8 to SIMD16 thanks to improved register pressure:

        shaders/non-free/gfxbench4/carchase/366.shader_test CS
        shaders/non-free/gfxbench4/carchase/368.shader_test CS
        shaders/non-free/gfxbench4/carchase/370.shader_test CS
        shaders/non-free/gfxbench4/carchase/372.shader_test CS
        shaders/non-free/gfxbench4/carchase/376.shader_test CS
        shaders/non-free/gfxbench4/carchase/378.shader_test CS
        shaders/non-free/gfxbench4/carchase/380.shader_test CS
        shaders/non-free/gfxbench4/carchase/382.shader_test CS
        shaders/non-free/gfxbench4/carchase/384.shader_test CS
        shaders/non-free/gfxbench4/carchase/388.shader_test CS
        shaders/non-free/gfxbench4/carchase/4.shader_test CS
        shaders/non-free/gfxbench4/carchase/6.shader_test CS

    Given how much it appeared to be improved, I ran Car Chase on my laptop. Unfortunately, I wasn't able to see any measurable improvement. It might be helped by 1-2% but it's in the noise. It does render correctly as far as I can tell so the improvement is legitimate.

    All of the loops that got deleted were in dolphin uber shaders. I've had no opportunity to test them for correctness or performance.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* glsl/nir: Add a pass to lower UBO and SSBO access | Jason Ekstrand | 2019-03-15 | 4 | -0/+305
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* glsl/nir: Handle unlowered SSBO atomic and array_length intrinsics | Jason Ekstrand | 2019-03-15 | 1 | -0/+112
    We didn't have any of these before because all NIR consumers always called lower_ubo_references. Soon, we want to pass the derefs straight through to NIR so we need to handle these intrinsics directly.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* glsl/nir: Set explicit types on UBO/SSBO variables | Jason Ekstrand | 2019-03-15 | 1 | -15/+67
    We want to be able to use variables and derefs for UBO/SSBO access in NIR. In order to do this, the rest of NIR needs to know the type layout information.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* glsl: Don't lower vector derefs for SSBOs, UBOs, and shared | Jason Ekstrand | 2019-03-15 | 1 | -0/+21
    All of these are backed by some sort of memory so if you have multiple threads writing to different components of the same vector at the same time, the load-vec-store pattern that GLSL IR emits won't work. This shouldn't affect any drivers today as they all call GLSL IR lowering which lowers access to these variables to index+offset intrinsics before we get to this point. However, NIR will start handling the derefs itself and won't want the lowering.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/lower_io: Add a new buffer_array_length intrinsic and lowering | Jason Ekstrand | 2019-03-15 | 2 | -0/+45
    Reviewed-by: Kristian H. Kristensen <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Rename nir_address_format_vk_index_offset to not be vk | Jason Ekstrand | 2019-03-15 | 4 | -10/+10
    It's just a 32-bit index and offset. We're going to want to use it in GL as well so stop talking about Vulkan.

    Reviewed-by: Kristian H. Kristensen <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/deref: Consider COHERENT decorated var derefs as aliasing | Jason Ekstrand | 2019-03-15 | 1 | -4/+47
    If we get to two deref_var paths with different variables, we usually know they don't alias. However, if both of the paths are marked coherent, we do have to worry about it and must treat them as possibly aliasing.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
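The rule in this message can be sketched roughly as below. The struct and function names are hypothetical, and the real NIR deref comparison walks full deref paths and returns a richer result; this only illustrates the "different variables, both coherent" case.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a deref path. */
struct sketch_path {
   int var_id;    /* which variable the path starts from */
   bool coherent; /* true if anything on the path is marked coherent */
};

/* Returns true if the two paths must be treated as possibly aliasing. */
static bool
paths_may_alias(const struct sketch_path *a, const struct sketch_path *b)
{
   /* Same variable: stay conservative in this sketch. */
   if (a->var_id == b->var_id)
      return true;

   /* Different variables normally don't alias, unless both paths are
    * coherent, in which case aliasing has to be assumed. */
   return a->coherent && b->coherent;
}
```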
* compiler/types: Add helpers to get explicit types for standard layouts | Jason Ekstrand | 2019-03-15 | 2 | -16/+191
    We also need to modify the current size/align helpers to not blow up when they encounter an explicitly laid out type. Previously we considered using the size/align helpers mutually exclusive with standard layouts but now we just assert that they match.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* compiler/types: Add a C wrapper to get full struct field data | Jason Ekstrand | 2019-03-15 | 2 | -0/+11
    Reviewed-by: Kristian H. Kristensen <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* compiler/types: Add a new is_interface C wrapper | Jason Ekstrand | 2019-03-15 | 2 | -0/+7
    Reviewed-by: Alejandro Piñeiro <[email protected]>
    Reviewed-by: Kristian H. Kristensen <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/validate: Allow 32-bit boolean load/store intrinsics | Jason Ekstrand | 2019-03-15 | 1 | -0/+6
    With UBOs and SSBOs we have boolean types but they're actually 32-bit values. Make the validator a little less strict so that we can do a 32-bit load/store on boolean types. We're about to add a lowering pass called gl_nir_lower_buffers which will lower boolean load/store operations to 32-bit and insert i2b and b2i instructions to convert to/from 1-bit booleans. We want that to be legal.

    Reviewed-by: Kristian H. Kristensen <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
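As a rough illustration of the two conversions named above, here is what i2b and b2i amount to when written as plain C. The function names mirror the NIR opcode names but are hypothetical C helpers, and the buffer-side encoding (0 for false, 1 written for true, any nonzero value read as true) is the assumption made in this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of i2b: a 32-bit value loaded from a buffer becomes a boolean.
 * Any nonzero value is treated as true. */
static bool
i2b(uint32_t stored)
{
   return stored != 0;
}

/* Sketch of b2i: a boolean becomes the 32-bit value that is stored. */
static uint32_t
b2i(bool value)
{
   return value ? 1u : 0u;
}
```

A lowered boolean store then writes `b2i(value)` as a 32-bit word, and a lowered boolean load applies `i2b` to the 32-bit word it reads.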
* nir/validate: Only require bare types to match for copy_deref | Jason Ekstrand | 2019-03-15 | 3 | -3/+6
    If we want to be able to use copy_deref instructions on explicitly laid out types, we have to be a little more flexible about what types we allow. Instead of requiring the types to exactly match, only require the bare types to match.

    Reviewed-by: Kristian H. Kristensen <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/algebraic: Add a couple optimizations for iabs and ishr | Jason Ekstrand | 2019-03-15 | 1 | -0/+6
    Shader-db results on Kaby Lake:

        total instructions in shared programs: 15225213 -> 15222365 (-0.02%)
        instructions in affected programs: 43524 -> 40676 (-6.54%)
        helped: 203
        HURT: 0

    Lots of shaders in Shadow Warrior had this pattern along with Deus Ex, Civ, Shadow of Mordor, and several others.

    Reviewed-by: Kristian H. Kristensen <[email protected]>
* mesa/st: Fix leaks of TGSI tokens in VP variants. | Eric Anholt | 2019-03-14 | 1 | -14/+20
    Starting a glxgears and closing it, I was seeing a lot of leaked TGSI for the fixed function VPs.

    v2: drop unused delete_ir() arg.

    Fixes: 3b4929ec6e64 ("st/mesa: Copy VP TGSI tokens if they exist, even for NIR shaders.")
    Reviewed-by: Kenneth Graunke <[email protected]>
* mesa/st: Make sure that prog_to_nir NIR gets freed. | Eric Anholt | 2019-03-14 | 1 | -0/+6
    GLSL NIR gets freed on relink by _mesa_delete_program(), but for ARB programs we need to free the old NIR when PSN is used to set up new NIR in the same gl_program. Additionally, set the base .nir field so that it will get freed by _mesa_delete_program().

    Fixes: 3d7611e9a6c6 ("st/nir: use NIR for asm programs")
    Reviewed-by: Kenneth Graunke <[email protected]>
* panfrost/midgard: Implement fpow | Alyssa Rosenzweig | 2019-03-14 | 4 | -1/+4
    We have a native op for this, which was just found in a disassembly -- so instead of lowering, use it!

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Compute viewport state on the fly | Alyssa Rosenzweig | 2019-03-14 | 2 | -71/+38
    Previously, we were caching this incorrectly; there's no real reason to given how variable it is (sensitive to changes in viewport, framebuffer dimensions, and scissors) and how cheap it is to recompute. So, just do it on the fly each draw.

    Fixes glmark-es2 -bshadow and -brefract.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Disable AFBC for depth buffers | Alyssa Rosenzweig | 2019-03-14 | 2 | -4/+7
    For inexplicable reasons, the depth buffer is faster if kept as linear, whereas the colour buffers are faster if AFBC. Given both code paths are available, we'll choose the faster one of each (which also helps with testing coverage).

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Allocate extra data for depth buffer | Alyssa Rosenzweig | 2019-03-14 | 1 | -0/+5
    It's not clear why the hardware "spills" a little bit, but if we don't do this, we get MMU faults with linear depth buffers.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Comment spelling fix | Alyssa Rosenzweig | 2019-03-14 | 1 | -1/+1
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/mfbd: Respect per-job depth write flag | Alyssa Rosenzweig | 2019-03-14 | 4 | -20/+42
    While a depth buffer may be supplied, it only needs to be written to if the depth writemask is set for any draw AND if the depth buffer is not immediately invalidated (as is the case for scanout). This refactors panfrost_job to provide a depth write requirement, which is now implemented for MFBD depth buffers.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/mfbd: Implement linear depth buffers | Alyssa Rosenzweig | 2019-03-14 | 1 | -10/+9
    This removes a clunky hack where the depth buffer was enabled during the *clear*, instead of during depth buffer linking. That said, this does not yet support writeback like AFBC depth buffers.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Minor comment cleanup (version detection) | Alyssa Rosenzweig | 2019-03-14 | 1 | -2/+3
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Remove staging MFBD | Alyssa Rosenzweig | 2019-03-14 | 2 | -109/+98
    Same idea as the previous commit, but for the MFBD this time instead of the SFBD.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Remove staging SFBD for pan_context | Alyssa Rosenzweig | 2019-03-14 | 4 | -39/+30
    The fragment framebuffer descriptor should not be a context entry; rather, it should be constructed only at fragment time to keep analysis tractable.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Break out fragment to SFBD/MFBD files | Alyssa Rosenzweig | 2019-03-14 | 7 | -378/+520
    This substantially cleans up the corresponding logic at the expense of a bit of code duplication; nevertheless, it's a net win since otherwise incompatible hardware code is mixed confusingly.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* freedreno: Use shared drm_find_modifier util | Alyssa Rosenzweig | 2019-03-14 | 1 | -16/+4
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
* vc4: Use shared drm_find_modifier util | Alyssa Rosenzweig | 2019-03-14 | 1 | -15/+3
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* v3d: Use shared drm_find_modifier util | Alyssa Rosenzweig | 2019-03-14 | 1 | -15/+3
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* util: Add a drm_find_modifier helper | Alyssa Rosenzweig | 2019-03-14 | 1 | -0/+55
    This function is replicated across vc4/v3d/freedreno and is needed in Panfrost; let's make this shared code.

    v2: Supply generic util_array_contains_u64 version (Eric Engestrom). Add missing stdbool.h include (Eric Anholt). Mark inline (Christian Gmeiner).

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
    Reviewed-by: Eric Engestrom <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
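A sketch of what such a helper pair might look like. The two function names come from the commit message; the signatures and bodies here are assumptions for illustration and may differ from the code that actually landed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed shape of the generic helper: linear search of a u64 array. */
static inline bool
util_array_contains_u64(uint64_t needle, const uint64_t *haystack,
                        unsigned count)
{
   for (unsigned i = 0; i < count; i++) {
      if (haystack[i] == needle)
         return true;
   }
   return false;
}

/* Assumed shape of the DRM-specific wrapper: is `modifier` among the
 * `count` format modifiers in `modifiers`? */
static inline bool
drm_find_modifier(uint64_t modifier, const uint64_t *modifiers,
                  unsigned count)
{
   return util_array_contains_u64(modifier, modifiers, count);
}
```

A driver would call the wrapper with the modifier list supplied at resource creation to decide whether a given tiling layout is allowed.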
* mesa: add logging function for formatted string | Mark Janes | 2019-03-14 | 2 | -0/+35
    Reviewed-by: Erik Faye-Lund <[email protected]>
* mesa: rename logging functions to reflect that they format strings | Mark Janes | 2019-03-14 | 12 | -92/+92
    In preparation for the definition of a function to log a formatted string.

    Reviewed-by: Erik Faye-Lund <[email protected]>
* mesa: properly report the length of truncated log messages | Mark Janes | 2019-03-14 | 1 | -0/+3
    _mesa_log_msg must provide the length of the string passed into the KHR_debug api. When the string formatted by _mesa_gl_vdebugf exceeds MAX_DEBUG_MESSAGE_LENGTH, the length is incorrectly set to the number of characters that would have been written if enough space had been available.

    Fixes: 30256805784450b8bb9d4dabfb56226271ca9d24 ("mesa: Add support for GL_ARB_debug_output with dynamic ID allocation.")
    Reviewed-by: Erik Faye-Lund <[email protected]>
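The underlying C behaviour is that `vsnprintf` returns the length the full string would have had, not the number of characters actually stored. A minimal sketch of clamping that value follows; `format_clamped` is a hypothetical helper written for this illustration, not the Mesa function.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: format into buf and return the number of
 * characters actually stored (excluding the terminating NUL). */
static int
format_clamped(char *buf, size_t size, const char *fmt, ...)
{
   va_list args;
   int len;

   if (size == 0)
      return 0;

   va_start(args, fmt);
   len = vsnprintf(buf, size, fmt, args);
   va_end(args);

   if (len < 0) {
      /* Formatting failed: report an empty message. */
      buf[0] = '\0';
      return 0;
   }

   /* On truncation vsnprintf reports the untruncated length, so clamp
    * it to what fits in the buffer. */
   if ((size_t)len >= size)
      len = (int)(size - 1);

   return len;
}
```

Passing the unclamped return value on as the message length would tell the consumer to read past the end of the stored string.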
* anv: Only set 3DSTATE_PS::VectorMaskEnable on gen8+ | Jason Ekstrand | 2019-03-14 | 1 | -1/+1
    We don't set it on HSW and earlier in i965 and disabling it appears to make derivatives somewhat more reliable.

    Acked-by: Kenneth Graunke <[email protected]>
* radv: always initialize HTILE when the src layout is UNDEFINED | Samuel Pitoiset | 2019-03-14 | 1 | -2/+1
    HTILE should always be initialized when transitioning from VK_IMAGE_LAYOUT_UNDEFINED to other image layouts. Otherwise, if an app does a transition from UNDEFINED to GENERAL, the driver doesn't initialize HTILE and it tries to decompress the depth surface. For some reason, this results in VM faults.

    Cc: [email protected]
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107563
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* i965: Disable ARB_fragment_shader_interlock for platforms prior to GEN9 | Plamena Manolova | 2019-03-14 | 2 | -1/+25
    ARB_fragment_shader_interlock depends on memory fences to ensure fragment ordering and this ordering guarantee is only supported from GEN9 onwards.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109980
    Fixes: 939312702e35 "i965: Add ARB_fragment_shader_interlock support."
    Signed-off-by: Plamena Manolova <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* iris: Don't mutate box in transfer map code | Kenneth Graunke | 2019-03-13 | 1 | -37/+28
    Not mutating the boxes is arguably cleaner. Split from a patch by Chris Wilson but reworked to use a pointer to the original box rather than making a copy at all.
* i965: remove scaling factors from P010, P012 | Tapani Pälli | 2019-03-14 | 1 | -2/+2
    Patch removes the scaling factors introduced in 2a2e69f975b but leaves the option to use scaling in place as it could be useful with other upcoming YUV formats.

    We did this scaling because ffmpeg was shifting channel bits down; however, it seems this is not the right place, as the compositor wants to flip the same buffers directly to the display as well, and therefore bitshifting needs to be done by the client when receiving a frame from ffmpeg.

    Now P0x formats are treated the same, e.g. P010 is the same as P016 but with the lower 6 bits set to zero.

    Fixes: 2a2e69f975b "i965: add P0x formats and propagate required scaling factors"
    Signed-off-by: Tapani Pälli <[email protected]>
    Reviewed-by: Lionel Landwerlin <[email protected]>
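The "P010 is P016 with the lower 6 bits zero" layout can be illustrated with a small sketch. The helper names are hypothetical; the only assumption taken from the message is that a 10-bit sample occupies the upper bits of a 16-bit word.

```c
#include <stdint.h>

/* Hypothetical helper: place a 10-bit sample in the top bits of a
 * 16-bit word, leaving the lower 6 bits zero. */
static uint16_t
p010_pack(uint16_t sample10)
{
   return (uint16_t)((sample10 & 0x3FFu) << 6);
}

/* Hypothetical helper: recover the 10-bit sample from the 16-bit word. */
static uint16_t
p010_unpack(uint16_t word)
{
   return (uint16_t)(word >> 6);
}
```

Read as a 16-bit normalized value, the packed word already lands on the intended intensity to within the dropped low bits, which is consistent with sampling it like P016 without an extra scaling factor.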