path: root/src
Commit message | Author | Age | Files | Lines
* radv: implement VK_AMD_shader_core_properties2Samuel Pitoiset2019-08-212-0/+10
| | | | | | | Trivial extension that matches PAL. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: force enable VK_AMD_shader_ballot for Wolfenstein YoungbloodSamuel Pitoiset2019-08-211-0/+8
| | | | | | | | | | | | | | This gives a nice boost, +20% at this time on my Vega 56. Shader ballot should be enabled by default at some point but it reduces performance a bit (-6%) with Wolfenstein II. Enable it only for Youngblood at the moment, like what we did for Talos in the past. As a bonus point, it gets rid of some minor artifacts that only happen when ballot is disabled for some reason. Cc: 19.2 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add a new debug option called RADV_DEBUG=noshaderballotSamuel Pitoiset2019-08-212-0/+2
| | | | | | | | | Shader ballot will be enabled by default for Wolfenstein Youngblood. This follows what we did for sisched. Cc: 19.2 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: allow to enable VK_AMD_shader_ballot only on GFX8+Samuel Pitoiset2019-08-212-2/+3
| | | | | | | | Scans aren't implemented on SI/CIK. Cc: 19.2 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
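Taken together, the three shader ballot commits above describe a small decision: the extension is only available on GFX8+, it is forced on for one application, and the new debug option can veto the forced enable. A minimal sketch of that logic follows. The function name, its parameters, and the exact application-name string are assumptions made for illustration and are not taken from the radv source.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: decide whether shader ballot ends up enabled.
 * - is_gfx8_plus:    scans are not implemented on SI/CIK
 * - user_opt_in:     the user explicitly asked for the extension
 * - debug_no_ballot: stand-in for RADV_DEBUG=noshaderballot
 * The application name string is assumed, not confirmed. */
static bool shader_ballot_enabled(const char *app_name, bool user_opt_in,
                                  bool debug_no_ballot, bool is_gfx8_plus)
{
    if (!is_gfx8_plus)
        return false;
    if (user_opt_in)
        return true;

    bool forced = app_name != NULL &&
                  strcmp(app_name, "Wolfenstein: Youngblood") == 0;

    /* The debug option only cancels the per-application force enable. */
    return forced && !debug_no_ballot;
}
```

In this sketch the debug flag does not override an explicit opt-in; whether the driver behaves that way is not stated in the commit messages.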
* nir/loop_analyze: Treat do{}while(false) loops as 0 iterationsDanylo Piliaiev2019-08-211-0/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Loops like:

    block block_0:
    vec1 32 ssa_2 = load_const (0x00000020)
    vec1 32 ssa_3 = load_const (0x00000001)
    loop {
        vec1 32 ssa_7 = phi block_0: ssa_3, block_4: ssa_9
        vec1 1 ssa_8 = ige ssa_2, ssa_7
        if ssa_8 {
            break
        } else {
        }
        vec1 32 ssa_9 = iadd ssa_7, ssa_1
    }

were treated as having more than 1 iteration and, after unrolling, produced wrong results. However, such a loop exits during the first iteration if it is not unrolled. So we check whether the loop will actually loop.

Fixes tests/shaders/glsl-fs-loop-while-false-02.shader_test

Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
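The idea behind this check can be sketched in C: evaluate the break condition once, using the induction variable's initial value, and treat the loop as not looping if the condition already holds. This is an illustration only, not the code in the loop analysis pass; the enum and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of comparisons a loop terminator might use. */
enum cmp_op { CMP_IGE, CMP_ILT, CMP_IEQ, CMP_INE };

/* Evaluate "lhs <op> rhs" on two known constants. */
static bool eval_cmp(enum cmp_op op, int32_t lhs, int32_t rhs)
{
    switch (op) {
    case CMP_IGE: return lhs >= rhs;
    case CMP_ILT: return lhs < rhs;
    case CMP_IEQ: return lhs == rhs;
    case CMP_INE: return lhs != rhs;
    }
    return false;
}

/* True when the break condition "limit <op> initial" already holds on
 * entry, i.e. the loop leaves during its first iteration. */
static bool loop_exits_on_first_iteration(enum cmp_op op, int32_t limit,
                                          int32_t initial)
{
    return eval_cmp(op, limit, initial);
}
```

For the loop in the commit message, the condition is `ige 0x20, 1`, which is true on entry, so `loop_exits_on_first_iteration(CMP_IGE, 32, 1)` reports an immediate exit.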
* nir/loop_unroll: Prepare loop for unrolling in wrapper_unrollDanylo Piliaiev2019-08-211-25/+1
| | | | | | | | | Without loop_prepare_for_unroll loops are losing phis. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111411 Fixes: 5db98195 "nir: add loop unroll support for wrapper loops" Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* nir/loop_unroll: Update the comments for loop_prepare_for_unrollDanylo Piliaiev2019-08-211-2/+2
| | | | | | | | The comments say that we should remove a continue if it is the last instruction in a loop; however, we remove any kind of jump. Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* radv: Emit VGT_GS_ONCHIP_CNTL for tess on GFX10.Bas Nieuwenhuizen2019-08-211-0/+8
| | | | | | | | Otherwise hangs are possible. This register was already set for GS and NGG. Fixes: 5eaed7ecfce "radv/gfx10: enable support for NAVI10, NAVI12 and NAVI14" Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Use correct vgpr_comp_cnt for VS if both prim_id and instance_id are needed.Bas Nieuwenhuizen2019-08-211-2/+4
| | | | | | | | | Should take the max of the 2. Fixes: ea337c8b7e9 "radv/gfx10: fix VS input VGPRs with the legacy path" Reviewed-by: Samuel Pitoiset <[email protected]>
* nir/algebraic: some subtraction optimizationsDaniel Schürmann2019-08-211-0/+3
| | | | | | | | | | | | | | | | | | Changes with RADV/ACO: Totals from affected shaders: SGPRS: 444087 -> 455543 (2.58 %) VGPRS: 436468 -> 436768 (0.07 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 13448928 -> 13353520 (-0.71 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 68060 -> 67979 (-0.12 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* radeonsi: take reference glsl types for compile threadsLionel Landwerlin2019-08-211-0/+8
| | | | | | | | | | | | | An application quitting before destroying its GL context and binding a NULL context might still have a radeonsi compiler thread running and potentially still accessing the types. Therefore take a reference for the duration of the threads' lifetime. v2: Only ref the glsl types, the builtins should be used by the time shader data gets to a gallium driver. Signed-off-by: Lionel Landwerlin <[email protected]>
* mesa/compiler: rework tear down of builtin/typesLionel Landwerlin2019-08-2111-79/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | The issue we're running into when running CTS is that glsl types are deleted while builtins depending on them are not. This happens because on one hand we have glsl types ref counted, but builtins are not. Instead builtins are destroyed when unloading libGL or explicitly calling glReleaseShaderCompiler(). This change removes almost entirely any dealing with glsl types ref/unref by letting the builtins deal with it instead. In turn we introduce a builtin ref count mechanism. Each GL context takes a reference on the builtins when compiling a shader for the first time. It releases the reference when the context is destroyed. It can also explicitly release those when glReleaseShaderCompiler() is called. Finally we also take a reference on the glsl types when loading libGL to avoid recreating glsl types too often. v2: Ensure we take a reference if we don't have one in link step (Lionel) Signed-off-by: Lionel Landwerlin <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110796 Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
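The builtin reference counting described above can be sketched as follows. This is a simplified, single-threaded illustration under assumed names (`builtins_acquire`, `builtins_release`, `struct context`), not the code from this commit; a real implementation would need locking around the counter.

```c
#include <stdbool.h>

/* Shared state: how many contexts currently hold the builtins. */
static unsigned builtin_refcount;
static bool builtins_live;

static void builtins_acquire(void)
{
    if (builtin_refcount++ == 0)
        builtins_live = true;   /* stand-in for creating the builtins */
}

static void builtins_release(void)
{
    if (builtin_refcount > 0 && --builtin_refcount == 0)
        builtins_live = false;  /* stand-in for destroying them */
}

/* Hypothetical per-context bookkeeping. */
struct context {
    bool holds_builtins;
};

/* A context takes its reference the first time it compiles a shader. */
static void context_compile_shader(struct context *ctx)
{
    if (!ctx->holds_builtins) {
        builtins_acquire();
        ctx->holds_builtins = true;
    }
}

/* Called on context destruction or an explicit compiler release. */
static void context_release_compiler(struct context *ctx)
{
    if (ctx->holds_builtins) {
        builtins_release();
        ctx->holds_builtins = false;
    }
}
```

With two contexts, the builtins stay alive until the second one releases its reference, which is the property the commit relies on to keep the glsl types valid.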
* compiler: ensure glsl types are not created without a referenceLionel Landwerlin2019-08-211-1/+6
| | | | | | | | | We want to detect invalid refcounting so assert we have at least one use before creating types. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* nir/tests: take reference on glsl typesLionel Landwerlin2019-08-214-1/+16
| | | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* glsl/tests: take refs on glsl typesLionel Landwerlin2019-08-219-18/+64
| | | | | | | | | Much like each driver, tests as standalone entities must take references on the glsl types. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* radv/gfx10: hardcode some depth+stencil formats in the format tableSamuel Pitoiset2019-08-211-0/+5
| | | | | | | | | | | The script doesn't handle them correctly and D16_UNORM_S8_UINT isn't supported by the hardware, so mark it as invalid. This fixes a warning when generating gfx10_format_table.h. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111393 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: tidy up gfx10_format_table.pySamuel Pitoiset2019-08-211-11/+9
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* gallium/vl: use compute preference for all multimedia, not just blitIlia Mirkin2019-08-206-7/+7
| | | | | | | | | | | | The compute paths in vl are a bit AMD-specific. For example, on nouveau they try to use a BGRX8 image format, which is not supported. Fixing all this is probably possible, but since the compute paths aren't in any way better, it's difficult to care. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213 Fixes: 9364d66cb7 (gallium/auxiliary/vl: Add video compositor compute shader render) Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* lima/ppir: use ra_get_best_spill_node to select spill node [19.2-branchpoint]Erico Nunes2019-08-201-7/+22
| | | | | | | | | | | | ra_get_best_spill_node is what other users of the mesa register allocator use. Switching to it now also fixes an infinite loop issue with ppir regalloc with the ppir control flow patchset, and also provides a small gain over the previous heuristic on number of spilled nodes when testing with shader-db. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]>
* tgsi: Remove unused tgsi_check_soa_dependencies().Eric Anholt2019-08-202-59/+0
| | | | | Acked-by: Eric Engestrom <[email protected]> Reviewed-By: Gert Wollny <[email protected]>
* tgsi: Drop the SSE2 constants setup that's been dead code since 2011.Eric Anholt2019-08-202-53/+9
| | | | | | | The SSE2 executor was removed in 4eb3225b38ce ("Remove tgsi_sse2.") Acked-by: Eric Engestrom <[email protected]> Reviewed-By: Gert Wollny <[email protected]>
* tgsi: drop a stale commentEric Anholt2019-08-201-3/+0
| | | | | | | | This was fixed in 912ed84f8338 ("tgsi: move to using vector for system values.") Acked-by: Eric Engestrom <[email protected]> Reviewed-By: Gert Wollny <[email protected]>
* mesa: reverse no_error on compressed_tex_sub_image for TEX_MODE_CURRENTJose Maria Casanova Crespo2019-08-201-2/+2
| | | | | | | | | This fixes the regression introduced on "mesa: refactor compressed_tex_sub_image function" that started to crash KHR-GLES2.texture_3d.compressed_texture.negative_compressed_tex_sub_image Fixes: 7df233d68dc ("mesa: refactor compressed_tex_sub_image function") Reviewed-by: Eric Anholt <[email protected]>
* glx: Eliminate glx_config::{rgb,float,colorIndex}ModeAdam Jackson2019-08-204-37/+9
| | | | | These are redundant with glx_config::renderType, let's just use that consistently.
* glx: Remove unused glx_config::pixmapModeAdam Jackson2019-08-201-2/+0
| | | | Reviewed-by: Eric Engestrom <[email protected]>
* glx: convert glx_config_create_list to one big callocAdam Jackson2019-08-201-37/+26
| | | | | | Simpler, less failure prone, less malloc overhead, what's not to like. Reviewed-by: Eric Engestrom <[email protected]>
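The "one big calloc" pattern can be sketched like this: allocate all list nodes as a single zeroed array and link each element to the next, so there is one allocation to check and one to free. The struct and function names here are hypothetical stand-ins, not the actual glx_config definitions.

```c
#include <stdlib.h>

/* Hypothetical list node; the real config struct has many more fields. */
struct config_node {
    struct config_node *next;
    int id;
};

/* Allocate `count` nodes in one zero-initialized block and chain them.
 * Returns NULL for an empty list or on allocation failure. */
static struct config_node *config_list_create(unsigned count)
{
    if (count == 0)
        return NULL;

    struct config_node *nodes = calloc(count, sizeof(*nodes));
    if (nodes == NULL)
        return NULL;

    /* The last node keeps the NULL `next` that calloc gave it. */
    for (unsigned i = 0; i + 1 < count; i++)
        nodes[i].next = &nodes[i + 1];

    return nodes;
}

/* Since every node lives in one block, freeing the head frees the list. */
static void config_list_destroy(struct config_node *head)
{
    free(head);
}
```

Compared with allocating each node separately, there is no partially built list to unwind when an allocation fails midway.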
* glx: convert a malloc+memset to callocAdam Jackson2019-08-201-2/+1
| | | | Reviewed-by: Eric Engestrom <[email protected]>
* glx: Fix parameter documentation of glx_config_create_listAdam Jackson2019-08-201-4/+0
| | | | | | 'minimum_size' is not, in fact, an argument to this function. Reviewed-by: Eric Engestrom <[email protected]>
* anv: inline uniforms blocks don't count toward descriptor set limitsArcady Goldmints-Orlov2019-08-201-0/+23
| | | | | | | | In a descriptor set, inline uniform blocks don't use up any bindings. However, the presence of any inline uniform blocks does require the use of the descriptor buffer, which takes up one binding. Reviewed-by: Jason Ekstrand <[email protected]>
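The counting rule in this commit message can be illustrated with a short sketch: ordinary descriptors contribute their count, inline uniform blocks contribute nothing individually, and a single extra binding is charged if any inline uniform block is present. The types and the function name are hypothetical and much simpler than what the driver uses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of descriptor types. */
enum desc_type {
    DESC_UNIFORM_BUFFER,
    DESC_SAMPLED_IMAGE,
    DESC_INLINE_UNIFORM_BLOCK,
};

struct binding {
    enum desc_type type;
    unsigned count; /* for inline uniform blocks this is a size in bytes */
};

/* Count bindings consumed by a set layout under the rule above. */
static unsigned count_used_bindings(const struct binding *bindings, size_t n)
{
    unsigned total = 0;
    bool needs_descriptor_buffer = false;

    for (size_t i = 0; i < n; i++) {
        if (bindings[i].type == DESC_INLINE_UNIFORM_BLOCK)
            needs_descriptor_buffer = true; /* no per-block cost */
        else
            total += bindings[i].count;
    }

    /* Any number of inline uniform blocks shares one descriptor buffer. */
    if (needs_descriptor_buffer)
        total += 1;

    return total;
}
```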
* nir: add divergence analysis pass.Daniel Schürmann2019-08-203-0/+799
| | | | | | | | | | This pass expects the shader to be in LCSSA form. The algorithm is based on 'The Simple Divergence Analysis' from Diogo Sampaio, Rafael De Souza, Sylvain Collange, Fernando Magno Quintão Pereira. Divergence Analysis. ACM Transactions on Programming Languages and Systems (TOPLAS) Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
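As a rough illustration of the data-dependence part of a divergence analysis, the sketch below marks a value as divergent if it is a divergent source (for example, a per-invocation id) or if any of its operands is divergent. It deliberately ignores control flow and loops, which the real pass has to handle, and every name in it is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define NO_SRC (-1)

/* Hypothetical straight-line SSA instruction with up to two operands. */
struct instr {
    bool divergent_source; /* e.g. reads a per-invocation id */
    int src[2];            /* indices of earlier instructions, or NO_SRC */
};

/* Single forward pass: instructions are assumed to be in SSA order, so
 * every operand has already been classified when it is looked up. */
static void mark_divergence(const struct instr *instrs, size_t n,
                            bool *divergent)
{
    for (size_t i = 0; i < n; i++) {
        bool d = instrs[i].divergent_source;

        for (int s = 0; s < 2 && !d; s++) {
            int src = instrs[i].src[s];
            if (src != NO_SRC)
                d = divergent[src];
        }
        divergent[i] = d;
    }
}
```

Values that stay unmarked are uniform across the subgroup in this simplified model.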
* nir/subgroups: Lower clustered reductions with cluster_size >= subgroup_size into reductionsRhys Perry2019-08-201-1/+12
| | | | | | | | The behavior for reductions with cluster_size >= subgroup_size is implementation defined. Reviewed-by: Jason Ekstrand <[email protected]>
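The lowering rule can be sketched as a clamp: once the requested cluster covers the whole subgroup (or more), treat the operation as a plain reduction. In this sketch a cluster size of 0 stands for "reduce over the whole subgroup"; that encoding and the function name are assumptions of the example.

```c
/* Hypothetical helper: pick the cluster size to lower with.
 * Returns 0 to request a full-subgroup reduction. */
static unsigned effective_cluster_size(unsigned cluster_size,
                                       unsigned subgroup_size)
{
    if (cluster_size == 0 || cluster_size >= subgroup_size)
        return 0;
    return cluster_size;
}
```

For a subgroup of 64 invocations, a cluster size of 64 or 128 becomes a full reduction, while 16 is left as a clustered reduction.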
* nir/lcssa: allow to create LCSSA phis for loop-invariant booleansRhys Perry2019-08-202-3/+7
| | | | | | | ACO depends on LCSSA phis for divergent booleans to work correctly. Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/lcssa: Skip loop invariant variables when converting to LCSSA.Daniel Schürmann2019-08-202-14/+162
| | | | | | Co-authored-by: Rhys Perry <[email protected]> Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: make nir_to_lcssa() a general NIR pass.Rhys Perry2019-08-202-3/+42
| | | | | Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/lcssa: handle deref instructions properlyDaniel Schürmann2019-08-202-14/+26
| | | | | Reviewed-by: Jason Ekstrand <[email protected]> Fixes: 414148cdc124 "nir: Support deref instructions in loop_analyze"
* tgsi_to_nir: only update TGSI properties of the current shader stageJose Maria Casanova Crespo2019-08-201-9/+18
| | | | | | | | | | | | | | | | | | The implementation introduced in "tgsi_to_nir: be careful about not losing any TGSI properties silently (v2)" updates all the TGSI properties, but it didn't take into account that the shader_info structure uses a union to store the different attributes for each shader stage. Now we only update the attributes if they affect the current shader stage, which avoids overwriting members of the union that should not be overwritten. This has created hundreds of regressions in v3d. For example, TGSI_PROPERTY_VS_BLIT_SGPRS_AMD was overwriting the same position used by TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH. Fixes: e3003651978 ("tgsi_to_nir: be careful about not losing any TGSI properties silently (v2)") Reviewed-by: Marek Olšák <[email protected]>
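The hazard described here comes from per-stage fields sharing storage in a union: writing a vertex shader field while translating a compute shader clobbers the compute data. A minimal sketch of the guarded update follows. The struct layout, the enums, and `apply_property` are simplified stand-ins, not the real shader_info or TGSI definitions.

```c
#include <stdbool.h>

enum stage { STAGE_VERTEX, STAGE_COMPUTE };

/* Hypothetical, reduced version of a per-stage info struct. */
struct info {
    enum stage stage;
    union {
        struct { unsigned blit_sgprs; } vs;
        struct { unsigned block_depth; } cs;
    } u; /* vs and cs occupy the same bytes */
};

enum prop { PROP_VS_BLIT_SGPRS, PROP_CS_FIXED_BLOCK_DEPTH };

/* Apply a property only if it belongs to the shader's own stage.
 * Returns false (and writes nothing) otherwise. */
static bool apply_property(struct info *info, enum prop p, unsigned value)
{
    switch (p) {
    case PROP_VS_BLIT_SGPRS:
        if (info->stage != STAGE_VERTEX)
            return false;
        info->u.vs.blit_sgprs = value;
        return true;
    case PROP_CS_FIXED_BLOCK_DEPTH:
        if (info->stage != STAGE_COMPUTE)
            return false;
        info->u.cs.block_depth = value;
        return true;
    }
    return false;
}
```

Without the stage checks, applying `PROP_VS_BLIT_SGPRS` with value 0 to a compute shader would silently reset its block depth.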
* radv/gfx10: do not emit PA_SC_TILE_STEERING_OVERRIDE twiceSamuel Pitoiset2019-08-201-2/+0
| | | | | | | CLEAR_STATE emits it for us. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: do not emit PKT3_CONTEXT_CONTROL with AMDGPU 3.6.0+Samuel Pitoiset2019-08-202-6/+12
| | | | | | | It's emitted by the kernel. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* mesa/program: Take ARB_framebuffer_no_attachments into account in wpos correctionGert Wollny2019-08-201-2/+2
| | | | | | | | | | | | | If a drawbuffer is an fbo without an attachment then its 'Height' will be zero, and we have to take its 'DefaultGeometry.Height' into account. Fixes on softpipe (with the exception of tests that use multisample): dEQP-GLES31.functional.fbo.no_attachments.* Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
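The height selection described in this commit message can be sketched as a simple fallback: use the framebuffer's own height when it has one, and its default geometry height when the height is zero because nothing is attached. The struct below is a hypothetical reduction of a framebuffer object, and `effective_fb_height` is not a Mesa function.

```c
/* Hypothetical, reduced framebuffer description. */
struct default_geometry {
    unsigned width;
    unsigned height;
};

struct framebuffer {
    unsigned height; /* 0 when the fbo has no attachments */
    struct default_geometry default_geometry;
};

/* Height to use when flipping window-space y coordinates. */
static unsigned effective_fb_height(const struct framebuffer *fb)
{
    return fb->height != 0 ? fb->height : fb->default_geometry.height;
}
```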
* iris: Enable non coherent framebuffer fetch on broadwellSagar Ghuge2019-08-203-4/+3
| | | | | | | | v2: Use GEN_GEN in iris_state (Kenneth Graunke) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Free resource if failed to allocate surface stateSagar Ghuge2019-08-201-1/+3
| | | | | Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Pass isl_surf to fill_surface_stateSagar Ghuge2019-08-201-16/+19
| | | | | | Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Suggested-by: Kenneth Graunke <[email protected]>
* iris: Add infrastructure to support non coherent framebuffer fetchSagar Ghuge2019-08-204-13/+172
| | | | | | | | | | | | | | | | Create separate SURFACE_STATE for render target read in order to support non coherent framebuffer fetch on broadwell. Also we need to resolve framebuffer in order to support CCS_D. v2: Add outputs_read check (Kenneth Graunke) v3: 1) Import Curro's comment from get_isl_surf 2) Rename get_isl_surf method 3) Clean up allocation in case of failure Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Add helper functions to get tile offsetSagar Ghuge2019-08-202-0/+106
| | | | | | | All helper functions are ported from i965 driver. Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Add helper function to get isl dim layoutSagar Ghuge2019-08-202-0/+32
| | | | | | | v2: Add missing space (Caio) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Add render target read entry in binding tableSagar Ghuge2019-08-202-7/+44
| | | | | | | | | | | | | | This will be used in next patches for supporting non coherent framebuffer fetch on Broadwell. v2: Fix comment (Kenneth Graunke) v3: 1) Fix a few nits (Caio) 2) Add comment (Caio) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* build: Bump C++ standard requirement to C++14 to fix FTBFS with LLVM 10Kai Wasserbäch2019-08-202-2/+2
| | | | | | | | | | | When building Mesa against a recent LLVM 10 with C++11, the build fails if the AMD common code is built as well due to "std::index_sequence" being undeclared. LLVM requires a minimum of C++14. Signed-off-by: Kai Wasserbäch <[email protected]> Acked-by: Eric Engestrom <[email protected]>
* panfrost: Add madvise support to BO cacheRob Herring2019-08-192-2/+23
| | | | | | | | | | | | | | | The kernel now supports madvise ioctl to indicate which BOs can be freed when there is memory pressure. Mark BOs purgeable when they are in the BO cache. The BOs must also be munmapped when they are in the cache or they cannot be purged. We could optimize avoiding the madvise ioctl on older kernels once the driver version bump lands, but probably not worth it given the other driver features also being added. Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]> Signed-off-by: Rob Herring <[email protected]>
* mesa: add ext_dsa GetMultiTexLevelParameterEXTPierre-Eric Pelloux-Prayer2019-08-195-2/+75
| | | | Reviewed-by: Marek Olšák <[email protected]>
* mesa: add EXT_dsa glCompressedMultiTex* functions display list supportPierre-Eric Pelloux-Prayer2019-08-191-0/+276
| | | | Reviewed-by: Marek Olšák <[email protected]>