summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: make si_cp_wait_mem more configurableMarek Olšák2019-01-025-8/+8
| | | | Tested-by: Dieter Nützel <[email protected]>
* radeonsi: call si_fix_resource_usage for the GS copy shader as wellMarek Olšák2019-01-021-0/+4
| | | | Tested-by: Dieter Nützel <[email protected]>
* radeonsi: don't emit redundant PKT3_NUM_INSTANCES packetsMarek Olšák2019-01-022-2/+10
| | | | Tested-by: Dieter Nützel <[email protected]>
* nir: add a way to print the deref chainCaio Marcelo de Oliveira Filho2019-01-022-4/+14
| | | | | | | | Makes debugging easier when we care about the deref chain and not the deref instruction itself. To make it take a const pointer, constify some of the static functions in nir_print.c. Reviewed-by: Eric Anholt <[email protected]>
* meson: Error out if building nouveau and using LLVM without rttiDylan Baker2019-01-021-0/+3
| | | | | | | | | | | Nouveau requires rtti. Often LLVM is configured without rtti, and code with and without cannot be linked safely. Lets just error out if nouveau is requested and llvm is built without rtti. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109202 Fixes: c5a97d658ec19cc02719d7f86c1b0715e3d9ffc4 ("meson: fix builds against LLVM built without rtti") Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* egl/haiku: Fix reference to disp vs dpyAlexander von Gluck IV2019-01-021-1/+2
| | | | | Reviewed-by: Eric Engestrom <[email protected]> Fixes: 00992700c9a812a54563 "egl: set the EGLDevice when creating a display"
* compiler/spirv: use 32-bit polynomial approximation for 16-bit asin()Iago Toral Quiroga2019-01-021-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | The 16-bit polynomial execution doesn't meet Khronos precision requirements. Also, the half-float denorm range starts at 2^(-14) and with asin taking input values in the range [0, 1], polynomial approximations can lead to flushing relatively easy. An alternative is to use the atan2 formula to compute asin, which is the reference taken by Khronos to determine precision requirements, but that ends up generating too many additional instructions when compared to the polynomial approximation. Specifically, for the Intel case, doing this adds +41 instructions to the program for each asin/acos call, which looks like an undesirable trade off. So for now we take the easy way out and fallback to using the 32-bit polynomial approximation, which is better (faster) than the 16-bit atan2 implementation and gives us better precision that matches Khronos requirements. v2: - Fallback to 32-bit using recursion (Jason). Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/spirv: implement 16-bit frexpIago Toral Quiroga2019-01-021-2/+46
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/spirv: implement 16-bit hyperbolic trigonometric functionsIago Toral Quiroga2019-01-021-18/+26
| | | | | | | | | | v2: - use nir_fadd_imm and nir_fmul_imm helpers (Jason) v3: - since we need to define one for fsub use it for fdiv too (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/spirv: implement 16-bit exp and logIago Toral Quiroga2019-01-021-2/+2
| | | | | | | v2 - use nir_fmul_imm helper (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/spirv: implement 16-bit atan2Iago Toral Quiroga2019-01-021-7/+11
| | | | | | | | | | | | v2: - fix huge_val for 16-bit, it was mean't to be 2^14 not 10^14. v3: - rebase on top of new bool sized opcodes - use nir_b2f helper - use nir_fmul_imm helper Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/spirv: implement 16-bit atanIago Toral Quiroga2019-01-021-12/+11
| | | | | | | | | v2: - use nir_fadd_imm and nir_fmul_imm helpers (Jason) - rebased on top of new sized boolean opcodes - use nir_b2f helper Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/spirv: implement 16-bit acosIago Toral Quiroga2019-01-021-2/+3
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/spirv: implement 16-bit asinIago Toral Quiroga2019-01-021-9/+14
| | | | | | | | | | | v2: - use nir_fmul_imm and nir_fadd_imm helpers (Jason) v3: - missed one case where we need to replace nir_imm_float with nir_imm_floatN_t (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/spirv: handle 16-bit float in radians() and degrees()Iago Toral Quiroga2019-01-021-2/+2
| | | | | | | v2: - use nir_imm_fmul helper (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/nir: add nir_fadd_imm() and nir_fmul_imm() helpersIago Toral Quiroga2019-01-021-0/+12
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/nir: add a nir_b2f() helperIago Toral Quiroga2019-01-021-0/+12
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* nir: link time opt duplicate varyingsTimothy Arceri2019-01-021-0/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we are outputting the same value to more than one output component rewrite the inputs to read from a single component. This will allow the duplicate varying components to be optimised away by the existing opts. shader-db results i965 (SKL): total instructions in shared programs: 12869230 -> 12860886 (-0.06%) instructions in affected programs: 322601 -> 314257 (-2.59%) helped: 3080 HURT: 8 total cycles in shared programs: 317792574 -> 317730593 (-0.02%) cycles in affected programs: 2584925 -> 2522944 (-2.40%) helped: 2975 HURT: 477 shader-db results radeonsi (VEGA): SGPRS: 31576 -> 31664 (0.28 %) VGPRS: 17484 -> 17064 (-2.40 %) Spilled SGPRs: 184 -> 167 (-9.24 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 583340 -> 569368 (-2.40 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 6162 -> 6270 (1.75 %) Wait states: 0 -> 0 (0.00 %) vkpipeline-db results RADV (VEGA): Totals from affected shaders: SGPRS: 14880 -> 15080 (1.34 %) VGPRS: 10872 -> 10888 (0.15 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 674016 -> 668396 (-0.83 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 2708 -> 2704 (-0.15 %) Wait states: 0 -> 0 (0.00 % V2: bunch of tidy ups suggested by Jason Reviewed-by: Eric Anholt <[email protected]>
* nir: rework nir_link_opt_varyings()Timothy Arceri2019-01-021-16/+12
| | | | | | | | This just cleans things up a little and make things more safe for derefs. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: add can_replace_varying() helperTimothy Arceri2019-01-021-2/+14
| | | | | | | | This will be reused by the following patch. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: rename nir_link_constant_varyings() nir_link_opt_varyings()Timothy Arceri2019-01-025-6/+6
| | | | | | | | | | The following patches will add support for an additional optimisation so this function will no longer just optimise varying constants. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* st/glsl_to_nir: call nir_lower_load_const_to_scalar() in the stTimothy Arceri2019-01-022-3/+3
| | | | | | | | | This will help the new opt introduced in the following patches allowing us to remove extra duplicate varyings. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* radeonsi: make use of ac_are_tessfactors_def_in_all_invocs()Timothy Arceri2019-01-021-8/+2
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac/nir_to_llvm: add ac_are_tessfactors_def_in_all_invocs()Timothy Arceri2019-01-022-0/+163
| | | | | | | | | | | The following patch will use this with the radeonsi NIR backend but I've added it to ac so we can use it with RADV in future. This is a NIR implementation of the tgsi function tgsi_scan_tess_ctrl(). Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: remove unrequired param in si_nir_scan_tess_ctrl()Timothy Arceri2019-01-023-3/+1
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* tgsi/scan: correctly walk instructions in tgsi_scan_tess_ctrl()Timothy Arceri2019-01-021-29/+43
| | | | | | | | | | | | The previous code used a do while loop and continues after walking a nested loop/if-statement. This means we end up evaluating the last instruction from the nested block against the while condition and potentially exit early if it matches the exit condition of the outer block. Fixes: 386d165d8d09 ("tgsi/scan: add a new pass that analyzes tess factor writes") Reviewed-by: Marek Olšák <[email protected]>
* tgsi/scan: fix loop exit point in tgsi_scan_tess_ctrl()Timothy Arceri2019-01-021-1/+1
| | | | | | | | | | | | | This just happened not to crash/assert because all loops have at least 1 if-statement and due to a second bug we end up matching the same ENDIF to exit both the iteration over the if-statment and the loop. The second bug is fixed in the following patch. Fixes: 386d165d8d09 ("tgsi/scan: add a new pass that analyzes tess factor writes") Reviewed-by: Marek Olšák <[email protected]>
* nv30: disable rendering to 3D texturesIlia Mirkin2019-01-011-0/+6
| | | | | | | | | | | There's no way to tell the 3D engine about swizzling on such textures. While rendering to NPOT ones may be possible, there's no great way to expose that in gallium, nor would there be any practical benefit. Fixes the non-compressed-format "copyteximage 3D" failures. Something odd going on with the compressed formats. Signed-off-by: Ilia Mirkin <[email protected]>
* radv: Do a cache flush if needed before reading predicates.Bas Nieuwenhuizen2018-12-311-0/+2
| | | | | | | | | | | | | | This caused random failures for two conditional rendering tests: dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_discard dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_no_discard These wrote the predicate with the vertex shader, did a barrier and then started the conditional rendering. However the cache flushes for the barrier only happen on first draw, so after the predicate has been read. Fixes: e45ba51ea45 "radv: add support for VK_EXT_conditional_rendering" Reviewed-by: Dave Airlie <[email protected]>
* anv/autotools: make sure tests link with -msse2Erik Faye-Lund2018-12-311-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without this, I get the following error when building the tests with autotools on i686: ---8<--- src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’: src/intel/common/gen_clflush.h:37:7: warning: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Wimplicit-function-declaration] __builtin_ia32_clflush(p); ^~~~~~~~~~~~~~~~~~~~~~ __builtin_ia32_pause src/intel/common/gen_clflush.h: In function ‘gen_flush_range’: src/intel/common/gen_clflush.h:45:4: warning: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Wimplicit-function-declaration] __builtin_ia32_mfence(); ^~~~~~~~~~~~~~~~~~~~~ __builtin_ia32_fnclex ---8<--- The erros are generated for each of these files: - mesa/src/intel/vulkan/tests/state_pool_no_free.c - mesa/src/intel/vulkan/tests/state_pool.c - mesa/src/intel/vulkan/tests/block_pool_no_free.c - mesa/src/intel/vulkan/tests/state_pool_free_list_only.c This is obviously because gen_clflush.h contains code that uses intrinsics that are only available with SSE3. Since the driver already uses SSE3, it seems reasonable to add this to the tests as well. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Acked-by: Eric Engeström <[email protected]>
* anv/meson: make sure tests link with -msse2Erik Faye-Lund2018-12-311-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without this, I get the following error when building the tests using meson on i686: ---8<--- In file included from ../../../mesa/src/intel/vulkan/anv_private.h:46, from ../../../mesa/src/intel/vulkan/tests/state_pool_no_free.c:26: ../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’: ../../../mesa/src/intel/common/gen_clflush.h:37:7: error: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Werror=implicit-function-declaration] __builtin_ia32_clflush(p); ^~~~~~~~~~~~~~~~~~~~~~ __builtin_ia32_pause ../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_flush_range’: ../../../mesa/src/intel/common/gen_clflush.h:45:4: error: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Werror=implicit-function-declaration] __builtin_ia32_mfence(); ^~~~~~~~~~~~~~~~~~~~~ __builtin_ia32_fnclex ---8<--- The errors are generated for each of these files: - mesa/src/intel/vulkan/tests/state_pool_no_free.c - mesa/src/intel/vulkan/tests/state_pool.c - mesa/src/intel/vulkan/tests/block_pool_no_free.c - mesa/src/intel/vulkan/tests/state_pool_free_list_only.c This is obviously because gen_clflush.h contains code that uses intrinsics that are only available with SSE3. Since the driver already uses SSE3, it seems reasonable to add this to the tests as well. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Eric Engeström <[email protected]>
* nv30: fix some s3tc layout issuesIlia Mirkin2018-12-302-7/+26
| | | | | | | | | | | | | | s3tc layouts are a bit finicky - they're packed, but not swizzled. Adjust logic to allow for that case: - Don't set a uniform pitch for POT-sized compressed textures - Adjust define_rect API to be less confused about block sizes - Only mark a texture as linear if it has a uniform pitch set This has been tested to fix xonotic (as well as the s3tc-* piglits) on nv3x and keeps it working on nv4x. Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: use correct helper to get blocks in y directionIlia Mirkin2018-12-301-1/+1
| | | | | | | This doesn't matter since all compressed formats supported by this hardware use square blocks, but best to use the correct helper. Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: add support for multi-layer transfersIlia Mirkin2018-12-301-4/+35
| | | | | | | | This logic mirrors what we do on nv50. The relatively new texture_subdata callback can cause this to happen with 3D textures, which is triggered at least by xonotic, and probably many piglits. Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: fix rare issue with fp unbinding not finding the bufctxIlia Mirkin2018-12-301-1/+1
| | | | | | | | | | | | | If the last-active context gets deleted, the pushbuf doesn't have a bufctx to reference. Then there could be a sequence of binds which would trigger a reset on that bin before validation was done. Instead we just pass in the bufctx in question directly. All other instances of PUSH_RESET happen strictly after a validation is run. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349 Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: avoid setting user_priv without setting cur_ctxIlia Mirkin2018-12-301-3/+1
| | | | | | | | | | | | The whole user_priv thing is a mess, but as long as it's there, it basically has to map 1:1 to the cur_ctx. Unfortunately we were setting user_priv to some context, then that context could get deleted without any draws/validations in it, leading user_priv to become NULL, with cur_ctx still pointing at some old context. Then we wouldn't run the switch logic, which in turn led to a NULL bufctx being dereferenced. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349 Signed-off-by: Ilia Mirkin <[email protected]>
* v3d: Add support for gl_HelperInvocation.Eric Anholt2018-12-301-0/+8
| | | | | | We can just look at the MSF flags -- if they're unset, then we're definitely in a helper invocation. Fixes dEQP-GLES31.functional.shaders.helper_invocation.* with GLES3.1 enabled.
* v3d: Add support for textureSize() on MSAA textures.Eric Anholt2018-12-301-0/+1
| | | | | | Fixes failures in dEQP-GLES31.functional.shaders.builtin_functions.texture_size.samples_1_texture_2d in the GLES3.1 suite.
* v3d: Add support for requesting the sample offsets.Eric Anholt2018-12-301-0/+22
|
* v3d: Add support for non-constant texture offsets.Eric Anholt2018-12-301-8/+24
| | | | | | Fixes dEQP-GLES31.functional.texture.gather.offset_dynamic.min_required_offset.2d.rgba8.size_pot.clamp_to_edge_repeat and others.
* v3d: Force sampling from base level for tg4.Eric Anholt2018-12-301-3/+3
| | | | | | This is what the GLSL ES 310 spec tells us to do, but apparently the "gather mode" flag doesn't imply it in the HW. Fixes dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.filter_mode.min_nearest_mipmap_linear_mag_linear
* v3d: Add a note for a potential performance win on multop/umul24.Eric Anholt2018-12-301-0/+4
| | | | Noticed while debugging a testcase.
* v3d: Dead-code eliminate unused flags updates.Eric Anholt2018-12-301-4/+42
| | | | | | | | | The greedy comparison folding in bcsel means that we may have left the original bool-generating NIR ALU instruction dead, but DCE wasn't eliminating the VIR code for it because of the flags updates. total instructions in shared programs: 5186024 -> 5100894 (-1.64%) instructions in affected programs: 1448695 -> 1363565 (-5.88%)
* v3d: Don't generate temps for comparisons.Eric Anholt2018-12-301-12/+14
| | | | | This was just generated work for vir_opt_dead_code and cluttered up the dumps.
* v3d: Move "does this instruction have flags" from sched to generic helpers.Eric Anholt2018-12-306-55/+48
| | | | I wanted to reuse it for DCE of flags updates.
* v3d: Drop incorrect dependency for flpop.Eric Anholt2018-12-301-4/+0
| | | | | It is just shifting probably-means-flags bits out of a value, it doesn't actually update the flags on its own.
* v3d: Drop unused count_nir_instrs() helper.Eric Anholt2018-12-301-18/+0
| | | | | This was for shader-db, but I haven't cared about NIR instruction counts in a long time.
* v3d: Hook up some shader-db output to GL_ARB_debug_output.Eric Anholt2018-12-304-2/+55
| | | | | | | This allows the original shader-db project's run.c runner to parse things easily, and is probably a good thing to have for GL_ARB_debug_output in general. I formatted it more like Intel's so I can mostly reuse their report script.
* v3d: Add a "precompile" debug flag for shader-db.Eric Anholt2018-12-293-0/+78
| | | | | | | | | I've been using my apitrace-based shader-db so far, but it's slow (apitrace decompression), intrusive (apitrace windows spamming the screen), and doesn't have much coverage. The original shader-db provides a lot more coverage and compiles faster, at the expense of not having the actual runtime variant key. As v3d has a lot less runtime variation than vc4 did, this tradeoff makes more sense.
* v3d: Fix uniform pretty printing assertion failure with branches.Eric Anholt2018-12-291-0/+3
| | | | Fixes: 248a7fb392ba ("v3d: Do uniform pretty-printing in the QPU dump.")