path: root/src
Commit message | Author | Age | Files | Lines
* v3d: Use the right size for v3d 4.x TEXTURE_SHADER_STATE BO. | Eric Anholt | 2018-06-14 | 1 | -2/+2
| This doesn't really matter, since they both get rounded up to 4096.
* v3d: Add static asserts for other packed packet sizes. | Eric Anholt | 2018-06-14 | 2 | -0/+7
|
* v3d: Fix the size of the packed attribute state. | Eric Anholt | 2018-06-14 | 1 | -1/+1
| Fixes segfaults in dEQP-GLES3.functional.vertex_array_objects.all_attributes.
* v3d: Remove some unused context fields from vc4. | Eric Anholt | 2018-06-14 | 1 | -11/+0
|
* v3d: Remove unused QUNIFORM_STENCIL left over from vc4. | Eric Anholt | 2018-06-14 | 2 | -11/+0
|
* v3d: Use our #define for max attributes in shader caps. | Eric Anholt | 2018-06-14 | 1 | -1/+1
|
* v3d: Fix undefined results for a swap_color_rb RT from a float shader output. | Eric Anholt | 2018-06-14 | 1 | -1/+4
| Fixes segfaults and undefined behavior in
| dEQP-GLES3.functional.fragment_out.basic.fixed.srgb8_alpha8_lowp_float
* radv: remove multisample bit from shader key. | Dave Airlie | 2018-06-15 | 3 | -4/+0
| This wasn't being used anywhere inside the shader from what I can see.
|
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* intel/compiler: Properly consider UBO loads that cross 32B boundaries. | Kenneth Graunke | 2018-06-14 | 1 | -2/+14
| The UBO push analysis pass incorrectly assumed that all values would fit
| within a 32B chunk, and only recorded a bit for the 32B chunk containing
| the starting offset.
|
| For example, if a UBO contained the following, tightly packed:
|
|    vec4 a;  // [0, 16)
|    float b; // [16, 20)
|    vec4 c;  // [20, 36)
|
| then, c would start at offset 20 / 32 = 0 and end at 36 / 32 = 1,
| which means that we ought to record two 32B chunks in the bitfield.
|
| Similarly, dvec4s would suffer from the same problem.
|
| v2: Rewrite the accounting, my calculations were wrong.
| v3: Write a comment about partial values (requested by Jason).
|
| Reviewed-by: Rafael Antognolli <[email protected]> [v1]
| Reviewed-by: Jason Ekstrand <[email protected]> [v3]
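The chunk accounting described in this commit message can be sketched in a few lines of C. This is only an illustration of the arithmetic, not the code from the patch: `chunks_touched` is a hypothetical helper name, and the 64-bit bitfield width is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not the Mesa implementation): returns a bitfield
 * with one bit set for every 32-byte chunk touched by the byte range
 * [offset, offset + size). Assumes the range fits in 64 chunks. */
static uint64_t
chunks_touched(unsigned offset, unsigned size)
{
   assert(size > 0);

   const unsigned first = offset / 32;              /* chunk holding the first byte */
   const unsigned last = (offset + size - 1) / 32;  /* chunk holding the last byte */
   assert(last < 64);

   const unsigned count = last - first + 1;
   /* Build a run of `count` one-bits, avoiding an undefined 64-bit shift. */
   const uint64_t run = count == 64 ? UINT64_MAX
                                    : (UINT64_C(1) << count) - 1;
   return run << first;
}
```

With the layout from the message, `chunks_touched(20, 16)` covers chunks 0 and 1 for `vec4 c`, whereas recording only `offset / 32` would mark chunk 0 alone.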
* glsl: Don't copy propagate elements from SSBO or shared variables either | Ian Romanick | 2018-06-14 | 1 | -0/+8
| Since SSBOs can be written by a different GPU thread, copy propagating a
| read can cause the value to magically change. SSBO reads are also very
| expensive, so doing it twice will be slower.
|
| The same shader was helped by this patch and the previous.
|
| Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
| total instructions in shared programs: 14399119 -> 14399113 (<.01%)
| instructions in affected programs: 683 -> 677 (-0.88%)
| helped: 1
| HURT: 0
|
| total cycles in shared programs: 532973113 -> 532971865 (<.01%)
| cycles in affected programs: 524666 -> 523418 (-0.24%)
| helped: 1
| HURT: 0
|
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
| Cc: [email protected]
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106774
* glsl: Don't copy propagate from SSBO or shared variables either | Ian Romanick | 2018-06-14 | 1 | -0/+2
| Since SSBOs can be written by other GPU threads, copy propagating a read
| can cause the value to magically change. SSBO reads are also very
| expensive, so doing it twice will be slower.
|
| Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
| total instructions in shared programs: 14399120 -> 14399119 (<.01%)
| instructions in affected programs: 684 -> 683 (-0.15%)
| helped: 1
| HURT: 0
|
| total cycles in shared programs: 532978931 -> 532973113 (<.01%)
| cycles in affected programs: 530484 -> 524666 (-1.10%)
| helped: 1
| HURT: 0
|
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
| Cc: [email protected]
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106774
* meson: only build vl_winsys_dri.c when x11 platform is used | Lukas Rusak | 2018-06-14 | 1 | -1/+1
| This seems to have been missed in the move from autotools
|
| This fixes the following build issue:
|
| ../src/gallium/auxiliary/vl/vl_winsys_dri.c:34:10: fatal error: X11/Xlib-xcb.h: No such file or directory
|  #include <X11/Xlib-xcb.h>
|           ^~~~~~~~~~~~~~~~
|
| Fixes: b1b65397d0c4978e36a84c0a1c98a4bd6cb9588e ("meson: Build gallium auxiliary")
| Reviewed-by: Dylan Baker <[email protected]>
* st/mesa: add missing switch cases in glsl_to_tgsi_visitor::visit() | Brian Paul | 2018-06-14 | 1 | -0/+2
| To silence compiler warning about unhandled switch cases.
|
| Reviewed-by: Charmaine Lee <[email protected]>
* radv: Fix output for sparse MRTs. | Bas Nieuwenhuizen | 2018-06-14 | 1 | -9/+10
| We need to init the cb_shader_format correctly with the changed
| col_format, so this moves the col_format adjustment to before the
| cb_shader_mask gets generated.
|
| Fixes: 06d3c650980 "radv: fix a GPU hang when MRTs are sparse"
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106903
| CC: 18.1 <[email protected]>
| Reviewed-by: Dave Airlie <[email protected]>
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: update the ZRANGE_PRECISION value for the TC-compat bug | Samuel Pitoiset | 2018-06-14 | 1 | -0/+108
| On GFX8+, there is a bug that affects TC-compatible depth surfaces when
| the ZRange is not reset after LateZ kills pixels.
|
| The workaround is to always set DB_Z_INFO.ZRANGE_PRECISION to match the
| last fast clear value. Because the value is set to 1 by default, we only
| need to update it when clearing Z to 0.0.
|
| We also need to set the depth clear regs and to update ZRANGE_PRECISION
| when initializing a TC-compat depth image to 0.
|
| Original patch from James Legg.
|
| This fixes random CTS fails with
| dEQP-VK.renderpass.suballocation.formats.d32_sfloat_s8_uint.input.*
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105396
| CC: <[email protected]>
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* anv: reduce maxFragmentInputComponents | Samuel Iglesias Gonsálvez | 2018-06-14 | 1 | -1/+1
| If the application asks for the maximum number of fragment input
| components (128) and uses all of them plus some builtins that are passed
| in the VUE, then we exceed the maximum number of used VUE slots (32) and
| trip the assert that checks this limit.
|
| Also, with separate shader objects, we add CLIP_DIST0, CLIP_DIST1
| builtins in brw_compute_vue_map() because we don't know if
| gl_ClipDistance is going to be read or written by an adjacent stage.
|
| Fixes VK-GL-CTS CL#2569.
|
| Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
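The slot arithmetic behind this limit can be sketched as follows. This is a rough illustration only: the names are hypothetical, and it assumes each VUE slot holds one four-component vector, which is consistent with 128 components filling 32 slots but is not stated in the message.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VUE_SLOTS 32        /* limit quoted in the commit message */
#define COMPONENTS_PER_SLOT 4   /* assumption: one vec4 per VUE slot */

/* Hypothetical helper: slots used by user inputs plus builtin slots. */
static unsigned
vue_slots_needed(unsigned input_components, unsigned builtin_slots)
{
   /* Round the component count up to whole slots. */
   const unsigned input_slots =
      (input_components + COMPONENTS_PER_SLOT - 1) / COMPONENTS_PER_SLOT;
   return input_slots + builtin_slots;
}

/* Hypothetical helper: true when the layout stays within the slot limit. */
static bool
fits_in_vue(unsigned input_components, unsigned builtin_slots)
{
   return vue_slots_needed(input_components, builtin_slots) <= MAX_VUE_SLOTS;
}
```

Under these assumptions, 128 components already consume all 32 slots, so any builtin (for example the two clip-distance slots mentioned above) pushes the total past the limit.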
* radeonsi/gfx9: fix si_get_buffer_from_descriptors for 48-bit pointers | Marek Olšák | 2018-06-13 | 1 | -2/+2
| This fixes:
| GL45-CTS.pipeline_statistics_query_tests_ARB.functional_compute_shader_invocations
|
| Cc: 18.0 18.1 <[email protected]>
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi/gfx9: update & clean up a DPBB heuristic | Marek Olšák | 2018-06-13 | 1 | -9/+5
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi/gfx9: set POPS_DRAIN_PS_ON_OVERLAP due to a hw bug | Marek Olšák | 2018-06-13 | 1 | -2/+4
| This may not be needed yet, but let's set it now.
|
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi/gfx9: remove UINT_MAX array terminators in bin size tables | Marek Olšák | 2018-06-13 | 1 | -19/+1
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi/gfx9: update bin sizes | Marek Olšák | 2018-06-13 | 1 | -35/+38
| This is based on our docs (recently updated), not amdvlk.
|
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi/gfx9: update primitive binning code for EQAA | Marek Olšák | 2018-06-13 | 1 | -4/+9
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi: assume that rasterizer state is non-NULL in draw_vbo | Marek Olšák | 2018-06-13 | 4 | -75/+61
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi: micro-optimize prim checking and fix guardband with lines+adjacency | Marek Olšák | 2018-06-13 | 4 | -13/+23
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi: move the guardband registers into a separate state atom | Marek Olšák | 2018-06-13 | 5 | -19/+35
| They have a different frequency of updates and don't change when
| scissors change.
|
| I think this even fixes something in si_update_vs_viewport_state.
|
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi/gfx9: implement the scissor bug workaround without performance drop | Marek Olšák | 2018-06-13 | 2 | -29/+81
| This might improve performance on Vega10 and Raven.
|
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi: don't set VGT_LS_HS_CONFIG if it doesn't change | Marek Olšák | 2018-06-13 | 3 | -6/+12
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi: move VGT_GS_OUT_PRIM_TYPE into si_shader_gs | Marek Olšák | 2018-06-13 | 4 | -33/+26
| same as amdvlk.
|
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi: record CLIPVERTEX output usage properly for compatibility profiles | Marek Olšák | 2018-06-13 | 1 | -1/+0
| This was missed when adding CLIPVERTEX support into GS & tess.
|
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi: fix FBFETCH with 2D MSAA arrays | Marek Olšák | 2018-06-13 | 1 | -1/+2
| Tested-by: Dieter Nützel <[email protected]>
* ac: handle undefined EQAA samples in ac_apply_fmask_to_sample | Marek Olšák | 2018-06-13 | 1 | -2/+4
| RADV might wanna use this helper too.
|
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi: return real memory usage instead of per-process usage | Marek Olšák | 2018-06-13 | 1 | -2/+2
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/gpu_info: report real total memory sizes | Marek Olšák | 2018-06-13 | 1 | -28/+54
| The change from MIN2 to MAX2 is intentional.
|
| Cc: 18.1 <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* virgl: add ARB_tessellation_shader support. (v2) | Dave Airlie | 2018-06-14 | 7 | -8/+107
| This should add all the pieces to enable tess shaders on virgl.
|
| v2: fixup transform to handle tess and strip out precise.
|     set default for max patch varyings to work around issue
|     when tess gets enabled from v1 caps but v2 caps aren't in place. (Elie)
|
| Reviewed-by: Elie Tournier <[email protected]>
* glsl: allow standalone semicolons outside main() | Dave Airlie | 2018-06-14 | 1 | -0/+1
| GLSL 4.60 officially added this, but games and older CTS suites actually
| had shaders that did this, so we may as well enable it everywhere.
|
| Adding stable because it appears apps in the wild do this.
|
| Acked-by: Timothy Arceri <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
| Cc: <[email protected]>
* radv: don't fast clear HTILE for 16-bit depth surfaces on GFX8 | Samuel Pitoiset | 2018-06-13 | 1 | -0/+8
| This causes rendering issues in Shadow Warrior 2 with DXVK.
|
| Cc: [email protected]
| Fixes: ccc64f3133 ("radv: enable TC-compat HTILE for 16-bit depth surfaces on GFX8")
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106912
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* meson: Remove various completed todos | Dylan Baker | 2018-06-13 | 2 | -4/+0
| v3: - Remove "won't do" todos, so only completed todo's are now removed.
|
| Signed-off-by: Dylan Baker <[email protected]>
| Reviewed-by: Eric Engestrom <[email protected]>
| Reviewed-by: Matt Turner <[email protected]> (v2)
* meson: Add support for SPARC assembly | Dylan Baker | 2018-06-13 | 2 | -2/+9
| This was blindly copied from autotools and tested by a helpful gentoo
| user.
|
| Signed-off-by: Dylan Baker <[email protected]>
| Reviewed-by: Eric Engestrom <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* meson: Set include dirs for asm | Dylan Baker | 2018-06-13 | 1 | -2/+6
| v2: - split this from the next patch
|     - Only include x86-64 and not x86 when building x86_64
|
| Signed-off-by: Dylan Baker <[email protected]>
| Reviewed-by: Eric Engestrom <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* Revert "intel/compiler: Properly consider UBO loads that cross 32B boundaries." | Jason Ekstrand | 2018-06-13 | 1 | -7/+1
| This reverts commit b8fa847c2ed9c7c743f31e57560a09fae3992f46.
|
| This broke about 30k Vulkan CTS tests.
* intel/compiler: Properly consider UBO loads that cross 32B boundaries. | Kenneth Graunke | 2018-06-13 | 1 | -1/+7
| The UBO push analysis pass incorrectly assumed that all values would fit
| within a 32B chunk, and only recorded a bit for the 32B chunk containing
| the starting offset.
|
| For example, if a UBO contained the following, tightly packed:
|
|    vec4 a;  // [0, 16)
|    float b; // [16, 20)
|    vec4 c;  // [20, 36)
|
| then, c would start at offset 20 / 32 = 0 and end at 36 / 32 = 1,
| which means that we ought to record two 32B chunks in the bitfield.
|
| Similarly, dvec4s would suffer from the same problem.
|
| Reviewed-by: Rafael Antognolli <[email protected]>
* drivers/dri/i965: add missing #include | Ross Burton | 2018-06-12 | 1 | -0/+2
| brw_bufmgr.h uses time_t without including time.h, so the build fails
| under musl.
|
| Reviewed-by: Eric Engestrom <[email protected]>
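The general pattern behind this kind of fix is that a header should include what it uses instead of relying on another header to pull it in. A minimal sketch, assuming a made-up struct and helper (`example_bo` and `example_bo_is_stale` are hypothetical and not taken from brw_bufmgr.h):

```c
#include <assert.h>
#include <time.h>   /* declares time_t and difftime(); included explicitly */

/* Hypothetical struct: a buffer object that remembers when it was freed. */
struct example_bo {
   time_t free_time;
};

/* Hypothetical helper: nonzero when the buffer has been idle for longer
 * than max_age_seconds at time `now`. */
static int
example_bo_is_stale(const struct example_bo *bo, time_t now,
                    double max_age_seconds)
{
   assert(bo != NULL);
   return difftime(now, bo->free_time) > max_age_seconds;
}
```

Some C libraries happen to expose `time_t` through other headers, which hides the missing include until the code is built against a stricter one.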
* anv/android: Use an address for each anv_image plane | Mauro Rossi | 2018-06-12 | 1 | -2/+2
| Fixes to avoid building error after change in image->planes[] structure,
| {bo,bo_offset} has to be replaced by address.{bo,offset} and update is
| needed also in the assert() for debug builds.
|
| external/mesa/src/intel/vulkan/anv_android.c:188:21:
| error: no member named 'bo' in 'struct anv_image::(anonymous at external/mesa/src/intel/vulkan/anv_private.h:2647:4)'
|    image->planes[0].bo = bo;
|    ~~~~~~~~~~~~~~~~ ^
| 1 error generated.
|
| Fixes: bf34ef16ac ("anv: Use an address for each anv_image plane")
| Signed-off-by: Mauro Rossi <[email protected]>
| Reviewed-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
* anv/android: Set the BO flags in bo_cache_import (v2) | Mauro Rossi | 2018-06-12 | 1 | -1/+7
| Changes to avoid building error:
|
| external/mesa/src/intel/vulkan/anv_android.c:131:72:
| error: too few arguments to function call, expected 5, have 4
|    result = anv_bo_cache_import(device, &device->bo_cache, dma_buf, &bo);
|             ~~~~~~~~~~~~~~~~~~~                                        ^
| 1 error generated.
|
| (v2) Set the correct bo_flags based on support of 48bit addresses and
|      soft-pin
|
| Fixes: b0d50247a7 ("anv/allocator: Set the BO flags in bo_cache_alloc/import")
| Fixes: e7d0378bd9 ("anv: Soft-pin client-allocated memory")
| Signed-off-by: Mauro Rossi <[email protected]>
| Reviewed-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Disable __gen_validate_value if NDEBUG is set. | Kenneth Graunke | 2018-06-11 | 1 | -0/+2
| We were enabling undefined memory checking for genxml values based on
| Valgrind being installed at build time, even for release builds. This
| generates piles and piles of assembly whenever you touch genxml.
|
| With gcc 7.3.1 and -O3 and -march=native on a Kabylake with Valgrind
| installed at build time:
|
|    text    data     bss     dec     hex filename
| 5978385  262884   13488 6254757  5f70a5 libvulkan_intel.so
| 3799377  262884   13488 4075749  3e30e5 libvulkan_intel.so
|
| That's a 36% reduction in text size.
|
| Fixes: 047ed02723071d7eccbed3210b5be6ae73603a53 (vk/emit: Use valgrind to validate every packed field)
| Reviewed-by: Jason Ekstrand <[email protected]>
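The quoted percentage can be checked from the two `text` column values above. The helper below is only a worked calculation; `percent_reduction` is a hypothetical name, not part of Mesa.

```c
#include <assert.h>

/* Hypothetical helper: relative size reduction from `before` to `after`,
 * expressed as a percentage of `before`. */
static double
percent_reduction(unsigned long before, unsigned long after)
{
   assert(before > 0 && after <= before);
   return 100.0 * (double)(before - after) / (double)before;
}
```

For the figures in the message, `percent_reduction(5978385UL, 3799377UL)` comes out a little above 36, consistent with the "36% reduction" stated there.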
* i965: fix resource leak | Eric Engestrom | 2018-06-11 | 1 | -1/+3
| v2: intel_miptree_release() already takes care of the planes, no need to
|     hand-code the loop (Lionel)
|
| Coverity ID: 1436909
| Fixes: 3352f2d746d3959b22ca4 "i965: Create multiple miptrees for planar YUV images"
| Reviewed-by: Lionel Landwerlin <[email protected]>
| Signed-off-by: Eric Engestrom <[email protected]>
* freedreno/ir3: use pipe_image_view's cpp | Rob Clark | 2018-06-11 | 1 | -1/+6
| At least for PIPE_BUFFER, we could get the resource used as (for
| example) R32F imageBuffer. So using cpp=1 from the rsc is wrong.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix image dimensions offset | Rob Clark | 2018-06-11 | 1 | -1/+1
| copy-pasta fail from how SSBO sizes are handled.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: correct image/ssbo offset | Rob Clark | 2018-06-11 | 1 | -1/+1
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: use saml always if we have lod | Rob Clark | 2018-06-11 | 1 | -1/+1
| In some cases we get plain tex opcodes (but w/ a lod argument).. in this
| case always use the saml instruction.
|
| Signed-off-by: Rob Clark <[email protected]>