summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* nir/search: Search for all combinations of commutative opsJason Ekstrand2019-04-083-29/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consider the following search expression and NIR sequence: ('iadd', ('imul', a, b), b) ssa_2 = imul ssa_0, ssa_1 ssa_3 = iadd ssa_2, ssa_0 The current algorithm is greedy and, the moment the imul finds a match, it commits those variable names and returns success. In the above example, it maps a -> ssa_0 and b -> ssa_1. When we then try to match the iadd, it sees that ssa_0 is not b and fails to match. The iadd match will attempt to flip itself and try again (which won't work) but it cannot ask the imul to try a flipped match. This commit instead counts the number of commutative ops in each expression and assigns an index to each. It then does a loop and loops over the full combinatorial matrix of commutative operations. In order to keep things sane, we limit it to at most 4 commutative operations (16 combinations). There is only one optimization in opt_algebraic that goes over this limit and it's the bitfieldReverse detection for some UE4 demo. Shader-db results on Kaby Lake: total instructions in shared programs: 15310125 -> 15302469 (-0.05%) instructions in affected programs: 1797123 -> 1789467 (-0.43%) helped: 6751 HURT: 2264 total cycles in shared programs: 357346617 -> 357202526 (-0.04%) cycles in affected programs: 15931005 -> 15786914 (-0.90%) helped: 6024 HURT: 3436 total loops in shared programs: 4360 -> 4360 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 23675 -> 23666 (-0.04%) spills in affected programs: 235 -> 226 (-3.83%) helped: 5 HURT: 1 total fills in shared programs: 32040 -> 32032 (-0.02%) fills in affected programs: 190 -> 182 (-4.21%) helped: 6 HURT: 2 LOST: 18 GAINED: 5 Reviewed-by: Thomas Helland <[email protected]>
* intel: add dependency on genxml generated filesLionel Landwerlin2019-04-087-6/+8
| | | | | | | | | | Drivers using genxml will start compilation before generated files are created, so add a dependency to it. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]> Cc: [email protected]
* radeonsi: fix a crash when unbinding sampler statesMarek Olšák2019-04-081-1/+1
| | | | Acked-by: James Zhu <[email protected]>
* radv: fix getting the vertex strides if the bindings aren't contiguousSamuel Pitoiset2019-04-081-1/+15
| | | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110349 Fixes: a66b186bebf ("radv: use typed buffer loads for vertex input fetches") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* anv: implement VK_KHR_swapchain revision 70Lionel Landwerlin2019-04-084-3/+115
| | | | | | | | | | | | | | | | | | This revision allows for images to be : - created by reusing image parameters from swapchain - bound to memory from a swapchain v2: Add color attachment flag Use same implicit WSI parameters (tiling, samples, usage) v3: Fix missing break in vk_foreach_struct_const() switch (Lionel) v4: Fix accessing image aspects before android resolve (Tapani) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* vk/util: remove unneeded array indexEric Engestrom2019-04-081-1/+1
| | | | | | | | | This is an array of 1, so [0] is the only content, and meson already flattens the list so this is unnecessary. Also, all the other uses of vk_api_xml don't do that. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* ac/nir: fix intrinsic names for atomic operations with LLVM 9+Samuel Pitoiset2019-04-081-11/+21
| | | | | | | | | | | | This fixes the following LLVM error when using RADV_DEBUG=checkir: Intrinsic name not mangled correctly for type arguments! Should be: llvm.amdgcn.buffer.atomic.add.i32 i32 (i32, <4 x i32>, i32, i32, i1)* @llvm.amdgcn.buffer.atomic.add The cmpswap operation still uses the old intrinsic. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Erik Faye-Lund <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* panfrost: Remove "mali_unknown6" nonsenseAlyssa Rosenzweig2019-04-071-8/+0
| | | | | | | | This structure was used maaaany moons ago as a placeholder for the varying meta (now unified with mali_attr_meta and essentially fully decoded). I don't know why it's still in the file. Let's wack it. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Enable lower_find_lsbAlyssa Rosenzweig2019-04-071-0/+1
| | | | | | This is exactly what the blob does. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Add ibitcount8 opAlyssa Rosenzweig2019-04-072-0/+3
| | | | | | | | | | | The mechanics of this opcode are a little opaque, but essentially, it's used in 8-bit mode to do a bit count in parallel of a uint and then doing a ton of clever iadd/imov ops to recombine. v2: Correct opcode. Thank you to jernej on IRC for noticing this awkward typo! Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Add ilzcnt opAlyssa Rosenzweig2019-04-072-0/+3
| | | | | | Used for implementing findLSB/MSB Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Add umin/umax opcodesAlyssa Rosenzweig2019-04-073-0/+8
| | | | Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Add tilebuffer load? branchAlyssa Rosenzweig2019-04-072-1/+37
| | | | | | Also document branches better. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/decode: Add flags for tilebuffer readbackAlyssa Rosenzweig2019-04-072-3/+21
| | | | | | | | These flags are set when reading back the tilebuffer from a fragment shader via various mechanisms (including ARM_shader_framebuffer_fetch and EXT_pixel_local_storage). Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: use nir_src_is_const and nir_src_as_uintKarol Herbst2019-04-071-7/+4
| | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
* vc4: Prefer nir_src_comp_as_uint over nir_src_as_const_valueJason Ekstrand2019-04-072-18/+16
| | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* v3d: prefer using nir_src_comp_as_int over nir_src_as_const_valueKarol Herbst2019-04-072-11/+8
| | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* gallium/util: Add const to u_range_intersectKenneth Graunke2019-04-071-1/+2
| | | | | | This doesn't modify the range, so it can accept a const pointer. Reviewed-by: Eric Engestrom <[email protected]>
* gallium/hud: add CPU usage support for FreeBSDGreg V2019-04-071-0/+45
| | | | Reviewed-by: Eric Engestrom <[email protected]>
* iris: Silence unused variable warnings in release modeKenneth Graunke2019-04-061-3/+2
|
* nir/algebraic: Add some logical OR and AND patternsJason Ekstrand2019-04-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new OR pattern has been seen in the wild and can end up being generated by GLSLang. Not sure about the other two new patterns but we may as well throw them in for completeness. While we're here, we can drop the '@bool' specifier from the one pattern because specifying True already implies 1-bit which basically implies boolean. Shader-db results on Kaby Lake: total instructions in shared programs: 15321227 -> 15321129 (<.01%) instructions in affected programs: 3594 -> 3496 (-2.73%) helped: 6 HURT: 0 total cycles in shared programs: 357481321 -> 357479725 (<.01%) cycles in affected programs: 44109 -> 42513 (-3.62%) helped: 6 HURT: 0 VkPipeline-DB results on Kaby Lake: total instructions in shared programs: 3770504 -> 3769734 (-0.02%) instructions in affected programs: 19058 -> 18288 (-4.04%) helped: 163 HURT: 0 total cycles in shared programs: 1417583701 -> 1417569727 (<.01%) cycles in affected programs: 750958 -> 736984 (-1.86%) helped: 158 HURT: 1 Reviewed-by: Timothy Arceri <[email protected]>
* nir/algebraic: Drop some @bool specifiersJason Ekstrand2019-04-051-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have one-bit booleans, we don't need to rely on looking at parent instructions in order to figure out if a value is a Boolean most of the time. We can drop these specifiers and now the optimizations will apply more generally. Shader-DB results on Kaby Lake: total instructions in shared programs: 15321168 -> 15321227 (<.01%) instructions in affected programs: 8836 -> 8895 (0.67%) helped: 1 HURT: 31 total cycles in shared programs: 357481781 -> 357481321 (<.01%) cycles in affected programs: 146524 -> 146064 (-0.31%) helped: 22 HURT: 10 total spills in shared programs: 23675 -> 23673 (<.01%) spills in affected programs: 11 -> 9 (-18.18%) helped: 1 HURT: 0 total fills in shared programs: 32040 -> 32036 (-0.01%) fills in affected programs: 27 -> 23 (-14.81%) helped: 1 HURT: 0 No change in VkPipeline-DB Looking at the instructions hurt, a bunch of them seem to be a case where doing exactly the right thing in NIR ends up doing the wrong-ish thing in the back-end because flags are dumb. In particular, there's a case where we have a MUL followed by a CMP followed by a SEL and when we turn that SEL into an OR, it uses the GRF result of the CMP rather than the flag result so the CMP can't be merged with the MUL. Those shaders appear to schedule better according to the cycle estimates so I guess it's a win? Also it helps spilling in one Car Chase compute shader. Reviewed-by: Timothy Arceri <[email protected]>
* util: clean the 24-bit unused field to avoid an issuesAndrii Simiklit2019-04-051-3/+3
| | | | | | | | | | | | | | | | This is a field of FLOAT_32_UNSIGNED_INT_24_8_REV texture pixel. OpenGL spec "8.4.4.2 Special Interpretations" is saying: "the second word contains a packed 24-bit unused field, followed by an 8-bit index" The spec doesn't require us to clear this unused field however it make sense to do it to avoid some undefined behavior in some apps. Suggested-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110305 Signed-off-by: Andrii Simiklit <[email protected]>
* nir: Take if_uses into account when repairing SSACaio Marcelo de Oliveira Filho2019-04-051-0/+18
| | | | | | | | | | | | | | | If a def is used as an condition before its definition, we should also consider this a case to repair. When repairing, make sure we rewrite any if conditions too. Found in while inspecting a SPIR-V conversion from a 'continue block' that contains a conditional branch. We pull the continue block up to the beggining of the loop, and the condition in the branch ends up defined afterwards. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Fixes: 364212f1ede4b "nir: Add a pass to repair SSA form"
* tegra: fix the build after the set_shader_buffers changeMarek Olšák2019-04-051-1/+1
|
* gallium/auxiliary/vl: Add barrier/unbind after compute shader launch.James Zhu2019-04-051-0/+13
| | | | | | | | Add memory barrier sync for multiple launch cases, and unbind completed resources after launch. Signed-off-by: James Zhu <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/auxiliary/vl: Fixed blank issue with compute shaderJames Zhu2019-04-051-6/+1
| | | | | | | | | Multiple init buffer within one open instance will cause blank issue. Updating viewport per frame will fix this issue. Signed-off-by: James Zhu <[email protected]> Tested-by: Bruno Milreu <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/auxiliary/vl: Fixed blur issue with weave compute shaderJames Zhu2019-04-051-1/+1
| | | | | | | | Correct wrong interpolatation with top/bottom row which caused blur issue. Signed-off-by: James Zhu <[email protected]> Tested-by: Bruno Milreu <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* docs: update calendar, add news item and link release notes for 18.3.6Emil Velikov2019-04-053-7/+11
| | | | Signed-off-by: Emil Velikov <[email protected]>
* docs: add sha256 checksums for 18.3.6Emil Velikov2019-04-051-1/+2
| | | | | Signed-off-by: Emil Velikov <[email protected]> (cherry picked from commit eb9da68cbf23aafb1192beed084b2f05df65dd04)
* docs: add release notes for 18.3.6Emil Velikov2019-04-051-0/+168
| | | | | Signed-off-by: Emil Velikov <[email protected]> (cherry picked from commit b03f51c4b4dfa54775e866b75f68a41862c062c2)
* nir: do not pack varying with different typesSamuel Pitoiset2019-04-051-0/+13
| | | | | | | | | The current algorithm only supports packing 32-bit types. If a shader uses both 16-bit and 32-bit varyings, we shouldn't compact them together. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* softpipe: Use mag texture filter also for clamped lod == 0Gert Wollny2019-04-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | Follow the spec when selecting the magnification filter (OpenGL 4.5, section 8.14): If λ(x, y) is less than or equal to the constant c (see section 8.15) the texture is said to be magnified; While we're here also silence a potential warning about implicit float to double conversion. v2: Update commit message to contain a reference to the spec as pointed out by Eric. Fixes a number of dEQP GLES2 and GLES3 test out of: dEQP-GLES2.functional.texture.filtering.* dEQP-GLES2.functional.texture.vertex.2d.filtering.* dEQP-GLES3.functional.texture.vertex.*.filtering.* dEQP-GLES3.functional.texture.filtering.* dEQP-GLES3.functional.texture.shadow.2d.* Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* iris: handle aux properly in iris_resource_get_handleTapani Pälli2019-04-041-0/+8
| | | | | | | | | Disable aux when resource seen the first time and EXPLICIT_FLUSH not being set. This fixes issues seen when launching Xorg and CCS_E getting utilized. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* v3d: Add some more new packets for V3D 4.x.Eric Anholt2019-04-041-0/+131
| | | | The T/G shader references and common state will be needed for GLES 3.2.
* v3d: Don't try to use the TFU blit path if a scissor is enabled.Eric Anholt2019-04-041-1/+2
| | | | | | | | We'll need to do a render-based blit for scissors, since the TFU (as seen in this conditional) can only update a whole surface. Fixes: 976ea90bdca2 ("v3d: Add support for using the TFU to do some blits.") Fixes piglit fbo-scissor-blit.
* v3d: Bump the maximum texture size to 4k for V3D 4.x.Eric Anholt2019-04-045-5/+33
| | | | | | | 4.1 and 4.2 both have the same 16k limit, but it I'm seeing GPU hangs in the CTS at 8k and 16k. 4k at least lets us get one 4k display working. Cc: [email protected]
* v3d: Add support for handling OOM signals from the simulator.Eric Anholt2019-04-043-14/+78
| | | | | | I have v3d allocating enough initial allocation memory that we've been passing tests without it, but to match kernel behavior more it would be good to actually exercise the OOM path.
* mesa/main: Fix multisample texture initializeIllia Iorin2019-04-051-13/+25
| | | | | | | | | | | | | | | Sampler of Multisample textures wasn't initialized correct. So when texture object created as multisample its sampler is initialized in a individual case. We change the initial state of TEXTURE_MIN_FILTER and TEXTURE_MAG_FILTER to NEAREST. These changes are approved by KhronosGroup. https://github.com/KhronosGroup/OpenGL-API/issues/45 Signed-off-by: Sergii Romantsov <[email protected]> Signed-off-by: Illia Iorin <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109057
* glsl: Fix input/output structure matching across shader stagesSergii Romantsov2019-04-054-26/+56
| | | | | | | | | | | | | | | | | | Section 7.4.1 (Shader Interface Matching) of the OpenGL 4.30 spec says: "Variables or block members declared as structures are considered to match in type if and only if structure members match in name, type, qualification, and declaration order." Fixes: * layout-location-struct.shader_test v2: rebased against master and small fixes Signed-off-by: Vadym Shovkoplias <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108250
* ddebug: add compute functions to help hang detectionDave Airlie2019-04-051-2/+21
| | | | Reviewed-by: Marek Olšák <[email protected]>
* iris: avoid use after free in shader destructionDave Airlie2019-04-051-7/+49
| | | | | | | | | | | While playing with compute shaders, I was getting a random crash, noticed that bind_state was using the old shader info for comparision, but gallium allows the shader to be deleted while bound, so this could lead to a use after free. This can't happen using the cso cache. As it tracks all of this. Reviewed-by: Kenneth Graunke <[email protected]>
* radeonsi: set exact shader buffer read/write usage in CSMarek Olšák2019-04-046-24/+41
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* glsl: remember which SSBOs are not read-only and pass it to galliumMarek Olšák2019-04-044-1/+20
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* gallium: add writable_bitmask parameter into set_shader_buffersMarek Olšák2019-04-0419-28/+51
| | | | | | | to indicate write usage per buffer. This is just a hint (it will be used by radeonsi). Reviewed-by: Timothy Arceri <[email protected]>
* iris: Fix assert when using vertex attrib without buffer bindingDanylo Piliaiev2019-04-041-1/+2
| | | | | | | | | | | | | The GL 4.5 spec says: "If any enabled array’s buffer binding is zero when DrawArrays or one of the other drawing commands defined in section 10.4 is called, the result is undefined." The result is undefined but it should not crash. Fixes: gl-3.1-vao-broken-attrib Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: move iris_flush_resource so we can call it from get_handleTapani Pälli2019-04-041-15/+15
| | | | | Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Save/restore MI_PREDICATE_RESULT, not MI_PREDICATE_DATA.Kenneth Graunke2019-04-043-8/+8
| | | | | | | | MI_PREDICATE_DATA is an intermediate storage for the MI_PREDICATE command's calculations - it holds the result of the subtraction when the compare operation is SRCS_EQUAL or DELTAS_EQUAL. But the actual result of the predication is MI_PREDICATE_RESULT, which is what we want to copy from the render context to the compute context.
* util/process: document memory leakEric Engestrom2019-04-041-0/+4
| | | | | | | | We consider it acceptable, but let's still document it in case people notice it and are not sure why it's there. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* simplify LLVM version string printingEric Engestrom2019-04-047-47/+21
| | | | | | | Figure it out once in the build system, then just use that all over the place. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Marek Olšák <[email protected]>