Commit message (Author, Age, Files, Lines)
* softpipe: Enable PIPE_CAP_TEXTURE_BUFFER_OFFSET_ALIGNMENT (Gert Wollny, 2019-04-09, 2 files, -2/+2)
| | | | | | | | | | | | | The offset alignment must be set to 16 because the tile cache is implemented to require this. This enables ARB_texture_buffer_range and OES_texture_buffer for softpipe. The corresponding deqp-gles31 tests pass. Also update the feature table. Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* softpipe: Add an extra code path for the buffer texel lookup (Gert Wollny, 2019-04-09, 1 file, -1/+16)
| | | | | | | | | | With buffers the addressing is done on a per-byte basis, so the code path for normal textures doesn't work properly. Also add an assert to make sure that the bit count for storing the X coordinate is large enough. Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* softpipe: raise number of bits used for X coordinate texture lookup (Gert Wollny, 2019-04-09, 2 files, -7/+6)
| | | | | | | | | With buffers the addressing is done on a per-byte basis, and with a maximal block size of 16 bytes we have to take four more bits into account. For simplicity just remove the TEX_TILE_SIZE_LOG2, which is 5 bits. Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
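The "four more bits" figure follows from the block size: addressing every byte inside a 16-byte block takes log2(16) = 4 bits. A minimal sketch of that arithmetic, assuming a power-of-two block size; the helper name is hypothetical and is not part of softpipe.

```python
def extra_address_bits(max_block_size_bytes):
    """Bits needed to address each byte inside one block.

    Illustrative helper only (not a softpipe function); assumes the
    block size is a positive power of two.
    """
    if max_block_size_bytes <= 0 or max_block_size_bytes & (max_block_size_bytes - 1):
        raise ValueError("block size must be a positive power of two")
    return max_block_size_bytes.bit_length() - 1


# A 16-byte block needs four additional bits for per-byte addressing.
print(extra_address_bits(16))
```

Dropping the 5-bit TEX_TILE_SIZE_LOG2 reduction therefore frees more bits than the four that per-byte addressing requires.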
* softpipe: Don't use mag filter for gather op (Gert Wollny, 2019-04-09, 1 file, -3/+3)
| | | | | | | | | | | For the gather op no magnification filter is provided, so always use the filter given for minification (which is the linear filter). Fixes: 0dff1533f25951adda3c36be6d9efa944741befb softpipe: Use mag texture filter also for clamped lod == 0 Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* nir: Get rid of global registers (Jason Ekstrand, 2019-04-09, 18 files, -209/+10)
| | | | | | | | | We have a pass to lower global registers to locals and many drivers dutifully call it. However, no one ever creates a global register, so it's all dead code. It's time we bury it. Acked-by: Karol Herbst <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Get rid of nir_register::is_packed (Jason Ekstrand, 2019-04-09, 7 files, -34/+11)
| | | | | | | | All we ever do is initialize it to zero, clone it, print it, and validate it. No one ever sets or uses it. Acked-by: Karol Herbst <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* virgl: add support for ARB_indirect_parameters (Dave Airlie, 2019-04-09, 3 files, -2/+4)
| | | | | | The protocol changes are already in place for it. Reviewed-By: Gert Wollny <[email protected]>
* virgl: add support for ARB_multi_draw_indirect (Dave Airlie, 2019-04-09, 3 files, -5/+10)
| | | | | | | This will pass the multi draw through to the host if it has support for it instead of using the st to emulate it Reviewed-By: Gert Wollny <[email protected]>
* virgl: add support for missing command buffer binding. (Dave Airlie, 2019-04-09, 3 files, -3/+9)
| | | | | | | When I added indirect support I forgot this, however to use it now we need to check for a new enough capability on the host side. Reviewed-By: Gert Wollny <[email protected]>
* docs: Add NV_compute_shader_derivatives to 19.1.0 relnotes (Caio Marcelo de Oliveira Filho, 2019-04-08, 1 file, -0/+2)
|
* anv: Implement VK_NV_compute_shader_derivatives (Caio Marcelo de Oliveira Filho, 2019-04-08, 3 files, -0/+10)
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* spirv: Add support for DerivativeGroup capabilities (Caio Marcelo de Oliveira Filho, 2019-04-08, 2 files, -0/+16)
| | | | | | | | | | | | As defined in SPV_NV_compute_shader_derivatives. These control how the invocations are arranged in a CS when doing derivative and related operations (which are also enabled by the extension). Since we expect valid SPIR-V, we don't need to do more work at SPIR-V level to enable the derivative and related operations to be called. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* iris: Enable NV_compute_shader_derivatives (Caio Marcelo de Oliveira Filho, 2019-04-08, 1 file, -0/+1)
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* gallium: Add PIPE_CAP_COMPUTE_SHADER_DERIVATIVES (Caio Marcelo de Oliveira Filho, 2019-04-08, 4 files, -0/+6)
| | | | | | | | To enable NV_compute_shader_derivatives, which allows derivatives (and texture lookups with implicit derivatives) in compute shaders. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Advertise NV_compute_shader_derivatives (Caio Marcelo de Oliveira Filho, 2019-04-08, 1 file, -0/+1)
| | | | Reviewed-by: Ian Romanick <[email protected]>
* intel/fs: Use NIR_PASS_V when lowering CS intrinsics (Caio Marcelo de Oliveira Filho, 2019-04-08, 1 file, -3/+4)
| | | | | | | | | This will make that step visible in NIR_PRINT=1. v2: Also use the macro for the cleanup passes. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Don't loop when lowering CS intrinsics (Caio Marcelo de Oliveira Filho, 2019-04-08, 1 file, -15/+10)
| | | | | | | | | This was needed when certain intrinsics were lowered to other ones that were defined by the same pass. After 060817b2 "intel,nir: Move gl_LocalInvocationID lowering to nir_lower_system_values" we don't need the loop anymore. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Add support for CS to group invocations in quads (Caio Marcelo de Oliveira Filho, 2019-04-08, 3 files, -16/+103)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using quads, instead of mapping the elements to the next 4 local invocation indices, we map the two next in the "current" row and two next in the "next row". A side effect is that a thread will execute the indices in a different order. We now perform the lowering of both local invocation ID and index together -- and don't rely anymore on lowering done by nir_lower_system_values. That is convenient when doing the math for quads, because we need X and Y to get the right invocation index. When the pass progresses, fold the constants and clean up to reduce the noise from the indexing math. This implements the derivative_group_quadsNV semantics from NV_compute_shader_derivatives. v2: Take subgroup_id into account, otherwise only values in the first subgroup would be used. (Jason) v3: Calculate invocation index and ID together, to avoid duplicating some math in the quads case when both index and ID are used. (Jason) v4: Don't call cleanup passes as part of the lowering, leave that to the call site. (Jason) Change calculation to use fewer instructions. (Jason) Reviewed-by: Ian Romanick <[email protected]> (v3) Reviewed-by: Jason Ekstrand <[email protected]>
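The quad arrangement described above can be pictured with a small sketch that maps a linear element index to an (x, y) position so that every run of four consecutive indices covers a 2x2 block. This is an illustration of the idea only: the function names are hypothetical, the workgroup width is assumed to be even, and the real lowering in the Intel backend also accounts for subgroup_id and emits different math.

```python
def linear_layout(index, width):
    """Row-major mapping: consecutive indices walk along a single row."""
    return index % width, index // width


def quad_layout(index, width):
    """Map each run of four consecutive indices onto a 2x2 quad.

    Lanes 0 and 1 land in the "current" row, lanes 2 and 3 in the
    "next" row. Hypothetical helper; assumes an even width.
    """
    if width <= 0 or width % 2:
        raise ValueError("width must be a positive even number")
    quad, lane = divmod(index, 4)
    quads_per_row = width // 2
    x = (quad % quads_per_row) * 2 + (lane & 1)
    y = (quad // quads_per_row) * 2 + (lane >> 1)
    return x, y


# Compare both layouts for the first eight elements of a 4-wide group.
for i in range(8):
    print(i, linear_layout(i, 4), quad_layout(i, 4))
```

Both X and Y are needed to compute the quad position, which matches the commit's remark that lowering the invocation ID and index together is convenient.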
* intel/fs: Use TEX_LOGICAL whenever implicit lod is supported (Caio Marcelo de Oliveira Filho, 2019-04-08, 1 file, -2/+6)
| | | | | | | Make sure we include compute shaders that have a derivative group defined. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Don't set LOD=0 for compute shader that has derivative group (Caio Marcelo de Oliveira Filho, 2019-04-08, 1 file, -2/+6)
| | | | | | | | | | | When using NV_compute_shader_derivatives to set a derivative group, a compute shader supports texture with implicit LOD calculation, so don't set an explicit LOD. Note if the extension is used but the derivative group is not specified, it will default to LOD=0 as before. Reviewed-by: Jason Ekstrand <[email protected]>
* nir/algebraic: Lower CS derivatives to zero when no group defined (Caio Marcelo de Oliveira Filho, 2019-04-08, 2 files, -0/+14)
| | | | | | | | | | | In compute shaders if no derivative group is defined, the derivatives will always be zero. Specified in NV_compute_shader_derivatives. To make the check more convenient, add a "info" local variable to the generated code so we can refer to it in the Python rules. (Jason) Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* glsl: Parse and propagate derivative_group to shader_info (Caio Marcelo de Oliveira Filho, 2019-04-08, 9 files, -4/+193)
| | | | | | | | | | | | NV_compute_shader_derivatives allows selecting between two possible arrangements (quads and linear) when calculating derivatives and certain subgroup operations in case of Vulkan. So parse and propagate those up to shader_info.h. v2: Do not fail when ARB_compute_variable_group_size is being used, since we are still clarifying what is the right thing to do here. Reviewed-by: Ian Romanick <[email protected]>
* glsl: Enable texture builtins for NV_compute_shader_derivatives (Caio Marcelo de Oliveira Filho, 2019-04-08, 1 file, -140/+153)
| | | | | | | Renamed a few predicates from "fs_only" to be "derivative_only" (or similar pairs). Reviewed-by: Ian Romanick <[email protected]>
* glsl: Enable derivative builtins for NV_compute_shader_derivatives (Caio Marcelo de Oliveira Filho, 2019-04-08, 1 file, -9/+25)
| | | | Reviewed-by: Ian Romanick <[email protected]>
* glsl: Remove redundant conditions when asserting in_qualifier (Caio Marcelo de Oliveira Filho, 2019-04-08, 1 file, -5/+2)
| | | | | | | As the code evolved, we ended up with redundant conditions. Clean this up. Reviewed-by: Ian Romanick <[email protected]>
* mesa: Extension boilerplate for NV_compute_shader_derivatives (Caio Marcelo de Oliveira Filho, 2019-04-08, 4 files, -0/+5)
| | | | Reviewed-by: Ian Romanick <[email protected]>
* nir/radv: remove restrictions on opt_if_loop_last_continue() (Timothy Arceri, 2019-04-09, 9 files, -42/+61)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I implemented opt_if_loop_last_continue() I had restricted this pass from moving other if-statements inside the branch opposite the continue. At the time it was causing a bunch of spilling in shader-db for i965. However Samuel Pitoiset noticed that making this pass more aggressive significantly improved the performance of Doom on RADV. Below are the statistics he gathered. 28717 shaders in 14931 tests Totals: SGPRS: 1267317 -> 1267549 (0.02 %) VGPRS: 896876 -> 895920 (-0.11 %) Spilled SGPRs: 24701 -> 26367 (6.74 %) Code Size: 48379452 -> 48507880 (0.27 %) bytes Max Waves: 241159 -> 241190 (0.01 %) Totals from affected shaders: SGPRS: 23584 -> 23816 (0.98 %) VGPRS: 25908 -> 24952 (-3.69 %) Spilled SGPRs: 503 -> 2169 (331.21 %) Code Size: 2471392 -> 2599820 (5.20 %) bytes Max Waves: 586 -> 617 (5.29 %) The code size increase is related to Wolfenstein II; it seems largely due to an increase in phis rather than the existing jumps. This gives +10% FPS with Doom on my Vega56. Rhys Perry also benchmarked Doom on his VEGA64: Before: 72.53 FPS After: 80.77 FPS v2: disable pass on non-AMD drivers Reviewed-by: Ian Romanick <[email protected]> (v1) Acked-by: Samuel Pitoiset <[email protected]>
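The percentages in the statistics above are relative changes from the "before" value to the "after" value. A minimal sketch that recomputes a few of them from the quoted numbers; the helper name is hypothetical and this is not the script that produced the report.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Values copied from the commit message.
print(f"Spilled SGPRs (total):    {percent_change(24701, 26367):.2f} %")
print(f"Spilled SGPRs (affected): {percent_change(503, 2169):.2f} %")
print(f"VGPRS (total):            {percent_change(896876, 895920):.2f} %")
print(f"Doom FPS on VEGA64:       {percent_change(72.53, 80.77):.2f} %")
```

The same formula applied to the VEGA64 frame rates gives a gain of a little over 11 %, in line with the roughly +10 % reported for the Vega56.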
* softpipe: add support for vertex streams (v2) (Dave Airlie, 2019-04-09, 2 files, -2/+6)
| | | | | | | | | This enables the ARB_gpu_shader5 vertex streams on softpipe. v2: only enable when not using llvm. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* draw: add support to tgsi paths for geometry streams. (v2) (Dave Airlie, 2019-04-09, 7 files, -124/+194)
| | | | | | | | | | | | | This hooks up the geometry shader processing to the TGSI support added in the previous commits. It doesn't change the llvm interface other than to keep things building. v2: fix some regressions caused by primitiveoffsets Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* softpipe: add support for indexed queries. (Dave Airlie, 2019-04-09, 4 files, -14/+16)
| | | | | | | We need indexed queries to retrieve the geom shader info. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* tgsi: add support for geometry shader streams. (Dave Airlie, 2019-04-09, 3 files, -18/+62)
| | | | | | | | | | | | This adds support to retrieve the primitive counts for each stream, along with the offset for each primitive into the output array. It also adds support for parsing the stream argument to the emit and end instructions. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* draw: add stream member to stats callback (Dave Airlie, 2019-04-09, 4 files, -3/+4)
| | | | | | | | This just adds space for the member to the callback, doesn't change anything else. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* vulkan/wsi: make wl_drm optional (Chia-I Wu, 2019-04-09, 1 file, -19/+32)
| | | | | | | | | | | | When wl_drm is missing and the driver supports modifiers, use zwp_linux_dmabuf_v1 for the list of supported formats and for buffer creation. Limit the supported formats to those with modifiers, which are WL_DRM_FORMAT_{ARGB8888,XRGB8888} currently. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* vulkan/wsi: add wsi_wl_display_dmabuf (Chia-I Wu, 2019-04-09, 1 file, -22/+28)
| | | | | | | Add wsi_wl_display_dmabuf for zwp_linux_dmabuf_v1-related states. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* vulkan/wsi: add wsi_wl_display_drm (Chia-I Wu, 2019-04-09, 1 file, -14/+18)
| | | | | | | | | | Add wsi_wl_display_drm for wl_drm-related states. We will move formats into the struct in a later commit. Remove the unnecessary check for wl_registry_bind failures. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* vulkan/wsi: refactor drm_handle_format (Chia-I Wu, 2019-04-09, 1 file, -53/+75)
| | | | | | | | Refactor the switch statement in drm_handle_format out to wsi_wl_display_add_wl_format. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* vulkan/wsi: create wl_drm wrapper as needed (Chia-I Wu, 2019-04-09, 1 file, -7/+20)
| | | | | | | | When modifiers are specified, we have to use dmabuf rather than wl_drm. We don't need the wrapper in that case. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* vulkan/wsi: move modifier array into wsi_wl_swapchain (Chia-I Wu, 2019-04-09, 1 file, -20/+32)
| | | | | | | This avoids repeated checks for each wsi_wl_image. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* drisw: Try harder to probe whether MIT-SHM works (Adam Jackson, 2019-04-09, 1 file, -4/+21)
| | | | | | | | | | | | | | | | XQueryExtension merely tells you whether the extension exists, it doesn't tell you whether you're local enough for it to work. XShmQueryVersion is not enough to discover this either, you need to provoke the server to do actual work, and if it thinks you're remote it will throw BadRequest at you. So send an invalid ShmDetach and use the error code to distinguish local from remote. [airlied: fixed bug not resetting xshm_error to 0 on success, which made later stuff fail completely.] Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Signed-off-by: Adam Jackson <[email protected]>
* nir/search: Search for all combinations of commutative ops (Jason Ekstrand, 2019-04-08, 3 files, -29/+64)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consider the following search expression and NIR sequence: ('iadd', ('imul', a, b), b) ssa_2 = imul ssa_0, ssa_1 ssa_3 = iadd ssa_2, ssa_0 The current algorithm is greedy and, the moment the imul finds a match, it commits those variable names and returns success. In the above example, it maps a -> ssa_0 and b -> ssa_1. When we then try to match the iadd, it sees that ssa_0 is not b and fails to match. The iadd match will attempt to flip itself and try again (which won't work) but it cannot ask the imul to try a flipped match. This commit instead counts the number of commutative ops in each expression and assigns an index to each. It then does a loop and loops over the full combinatorial matrix of commutative operations. In order to keep things sane, we limit it to at most 4 commutative operations (16 combinations). There is only one optimization in opt_algebraic that goes over this limit and it's the bitfieldReverse detection for some UE4 demo. Shader-db results on Kaby Lake: total instructions in shared programs: 15310125 -> 15302469 (-0.05%) instructions in affected programs: 1797123 -> 1789467 (-0.43%) helped: 6751 HURT: 2264 total cycles in shared programs: 357346617 -> 357202526 (-0.04%) cycles in affected programs: 15931005 -> 15786914 (-0.90%) helped: 6024 HURT: 3436 total loops in shared programs: 4360 -> 4360 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 23675 -> 23666 (-0.04%) spills in affected programs: 235 -> 226 (-3.83%) helped: 5 HURT: 1 total fills in shared programs: 32040 -> 32032 (-0.02%) fills in affected programs: 190 -> 182 (-4.21%) helped: 6 HURT: 2 LOST: 18 GAINED: 5 Reviewed-by: Thomas Helland <[email protected]>
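The combinatorial search described above can be sketched with a toy matcher: each commutative op in the pattern gets an index, and the search retries the whole match once per bitmask of "flip this op's sources" decisions. This is a simplified illustration under stated assumptions, not the nir_search implementation; the tuple-based pattern encoding, the `defs` dictionary, and all function names are hypothetical.

```python
COMMUTATIVE = {"iadd", "imul"}  # ops whose two sources may be swapped
MAX_COMMUTATIVE = 4             # cap from the commit text: at most 16 combinations


def count_commutative(pattern):
    """Count commutative ops in a pattern; variables are plain strings."""
    if isinstance(pattern, str):
        return 0
    op, *srcs = pattern
    return int(op in COMMUTATIVE) + sum(count_commutative(s) for s in srcs)


def match(pattern, value, defs, bindings, flips, next_index):
    """Match pattern against an SSA value under one fixed flip bitmask.

    next_index is a one-element list holding the index that the next
    commutative op encountered (in pre-order) will receive.
    """
    if isinstance(pattern, str):
        if pattern in bindings:
            return bindings[pattern] == value
        bindings[pattern] = value
        return True
    op, *srcs = pattern
    instr = defs.get(value)
    if instr is None or instr[0] != op or len(instr) - 1 != len(srcs):
        return False
    operands = list(instr[1:])
    if op in COMMUTATIVE:
        index = next_index[0]
        next_index[0] += 1
        if index < MAX_COMMUTATIVE and (flips >> index) & 1:
            operands.reverse()
    return all(match(s, v, defs, bindings, flips, next_index)
               for s, v in zip(srcs, operands))


def search(pattern, value, defs):
    """Try every flip combination; return variable bindings or None."""
    n = min(count_commutative(pattern), MAX_COMMUTATIVE)
    for flips in range(1 << n):
        bindings = {}
        if match(pattern, value, defs, bindings, flips, [0]):
            return bindings
    return None


# The example from the commit message.
pattern = ("iadd", ("imul", "a", "b"), "b")
defs = {
    "ssa_2": ("imul", "ssa_0", "ssa_1"),
    "ssa_3": ("iadd", "ssa_2", "ssa_0"),
}
print(search(pattern, "ssa_3", defs))
```

With no flips, the sketch reproduces the greedy failure from the commit message (a binds to ssa_0, b to ssa_1, and the iadd then rejects ssa_0 as b). The match succeeds only on the combination that flips the imul, which binds a to ssa_1 and b to ssa_0.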
* intel: add dependency on genxml generated files (Lionel Landwerlin, 2019-04-08, 7 files, -6/+8)
| | | | | | | | | | Drivers using genxml will start compilation before generated files are created, so add a dependency to it. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]> Cc: [email protected]
* radeonsi: fix a crash when unbinding sampler states (Marek Olšák, 2019-04-08, 1 file, -1/+1)
| | | | Acked-by: James Zhu <[email protected]>
* radv: fix getting the vertex strides if the bindings aren't contiguous (Samuel Pitoiset, 2019-04-08, 1 file, -1/+15)
| | | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110349 Fixes: a66b186bebf ("radv: use typed buffer loads for vertex input fetches") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* anv: implement VK_KHR_swapchain revision 70 (Lionel Landwerlin, 2019-04-08, 4 files, -3/+115)
| | | | | | | | | | | | | | | | | | This revision allows for images to be : - created by reusing image parameters from swapchain - bound to memory from a swapchain v2: Add color attachment flag Use same implicit WSI parameters (tiling, samples, usage) v3: Fix missing break in vk_foreach_struct_const() switch (Lionel) v4: Fix accessing image aspects before android resolve (Tapani) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* vk/util: remove unneeded array index (Eric Engestrom, 2019-04-08, 1 file, -1/+1)
| | | | | | | | | This is an array of 1, so [0] is the only content, and meson already flattens the list so this is unnecessary. Also, all the other uses of vk_api_xml don't do that. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* ac/nir: fix intrinsic names for atomic operations with LLVM 9+ (Samuel Pitoiset, 2019-04-08, 1 file, -11/+21)
| | | | | | | | | | | | This fixes the following LLVM error when using RADV_DEBUG=checkir: Intrinsic name not mangled correctly for type arguments! Should be: llvm.amdgcn.buffer.atomic.add.i32 i32 (i32, <4 x i32>, i32, i32, i1)* @llvm.amdgcn.buffer.atomic.add The cmpswap operation still uses the old intrinsic. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Erik Faye-Lund <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* panfrost: Remove "mali_unknown6" nonsense (Alyssa Rosenzweig, 2019-04-07, 1 file, -8/+0)
| | | | | | | | This structure was used maaaany moons ago as a placeholder for the varying meta (now unified with mali_attr_meta and essentially fully decoded). I don't know why it's still in the file. Let's wack it. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Enable lower_find_lsb (Alyssa Rosenzweig, 2019-04-07, 1 file, -0/+1)
| | | | | | This is exactly what the blob does. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Add ibitcount8 op (Alyssa Rosenzweig, 2019-04-07, 2 files, -0/+3)
| | | | | | | | | | | The mechanics of this opcode are a little opaque, but essentially, it's used in 8-bit mode to do a bit count of a uint in parallel, followed by a ton of clever iadd/imov ops to recombine. v2: Correct opcode. Thank you to jernej on IRC for noticing this awkward typo! Signed-off-by: Alyssa Rosenzweig <[email protected]>
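The "count per 8-bit lane, then recombine" idea can be illustrated in a few lines. This is a sketch of the general technique only: the function names are hypothetical, and the actual Midgard instruction sequence (the iadd/imov recombination) is not documented in the commit and is not reproduced here.

```python
def popcount8(byte):
    """Number of set bits in one 8-bit lane."""
    return bin(byte & 0xFF).count("1")


def bitcount32_via_bytes(value):
    """Bit count of a 32-bit uint built from four per-byte counts.

    The per-lane counts stand in for what an 8-bit-mode op could compute
    in parallel; the sum stands in for the recombination step.
    """
    value &= 0xFFFFFFFF
    lanes = [(value >> shift) & 0xFF for shift in (0, 8, 16, 24)]
    return sum(popcount8(lane) for lane in lanes)


print(bitcount32_via_bytes(0x80000001))
```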
* panfrost/midgard: Add ilzcnt op (Alyssa Rosenzweig, 2019-04-07, 2 files, -0/+3)
| | | | | | Used for implementing findLSB/MSB Signed-off-by: Alyssa Rosenzweig <[email protected]>
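A leading-zero count is enough to build both lookups through well-known identities: for an unsigned 32-bit value, findMSB(x) = 31 - lzcnt(x), and findLSB(x) = 31 - lzcnt(x & -x), since x & -x isolates the lowest set bit. A minimal sketch of those identities; the function names are hypothetical and this is not necessarily the sequence the Midgard compiler emits.

```python
def lzcnt32(x):
    """Leading zeros of a 32-bit value; returns 32 for zero."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


def find_msb_unsigned(x):
    """Index of the highest set bit, or -1 when x is zero."""
    return 31 - lzcnt32(x)


def find_lsb(x):
    """Index of the lowest set bit, or -1 when x is zero."""
    x &= 0xFFFFFFFF
    return 31 - lzcnt32(x & -x)


print(find_msb_unsigned(12), find_lsb(12))
```

Both helpers return -1 for a zero input because lzcnt32(0) is 32, which matches the GLSL convention for findMSB on unsigned values and for findLSB.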