aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
* i965/fs: Rearrange code to remove most of the gotosIan Romanick2018-06-151-11/+3
| | | | Signed-off-by: Ian Romanick <[email protected]>
* i965/fs: Refactor propagation of conditional modifiers from compares to addsIan Romanick2018-06-151-57/+80
| | | | Signed-off-by: Ian Romanick <[email protected]>
* i965/vec4: Optimize OR with 0 into a MOVIan Romanick2018-06-151-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All of the affected shaders are geometry shaders... the same ones from the similar fs changes. The "No changes on any other platforms" comment below is not quite right. Without the previous change to register coalescing, this optimization caused quite a few regressions in tests that either used gl_ClipVertex or used different interpolation modes. I observed that with both patches applied, glsl-1.10/execution/interpolation/interpolation-none-gl_BackSecondaryColor-smooth-vertex.shader_test was one instruction shorter. I suspect other shaders would be similarly affected. Since this is all based on NOS, shader-db does not reflect it. Haswell total instructions in shared programs: 12954955 -> 12954918 (<.01%) instructions in affected programs: 3603 -> 3566 (-1.03%) helped: 37 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.21% max: 2.50% x̄: 1.99% x̃: 2.50% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -2.30% -1.69% Instructions are helped. total cycles in shared programs: 410012108 -> 410012098 (<.01%) cycles in affected programs: 3540 -> 3530 (-0.28%) helped: 5 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.28% max: 0.28% x̄: 0.28% x̃: 0.28% 95% mean confidence interval for cycles value: -2.00 -2.00 95% mean confidence interval for cycles %-change: -0.28% -0.28% Cycles are helped. Ivy Bridge total instructions in shared programs: 11679387 -> 11679351 (<.01%) instructions in affected programs: 3292 -> 3256 (-1.09%) helped: 36 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.21% max: 2.50% x̄: 2.04% x̃: 2.50% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -2.34% -1.74% Instructions are helped. No changes on any other platforms. Signed-off-by: Ian Romanick <[email protected]>
* i965/vec4: Don't register coalesce into source of VS_OPCODE_UNPACK_FLAGS_SIMD4X2Ian Romanick2018-06-151-0/+9
| | | | | | | This prevents regressions in a bunch of clipping and interpolation tests caused by the next patch (i965/vec4: Optimize OR with 0 into a MOV). Signed-off-by: Ian Romanick <[email protected]>
* i965/fs: Optimize OR with 0 into a MOVIan Romanick2018-06-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fs_visitor::set_gs_stream_control_data_bits generates some code like "control_data_bits | stream_id << ((2 * (vertex_count - 1)) % 32)" as part of EmitVertex. The first time this (dynamically) occurs in the shader, control_data_bits is zero. Many times we can determine this statically and various optimizations will collaborate to make one of the OR operands literal zero. Converting the OR to a MOV usually allows it to be copy-propagated away. However, this does not happen in at least some shaders (in the assembly output of shaders/closed/UnrealEngine4/EffectsCaveDemo/301.shader_test, search for shl). All of the affected shaders are geometry shaders. Broadwell and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14375452 -> 14375413 (<.01%) instructions in affected programs: 6422 -> 6383 (-0.61%) helped: 39 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.14% max: 2.56% x̄: 1.91% x̃: 2.56% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -2.26% -1.57% Instructions are helped. total cycles in shared programs: 531981179 -> 531980555 (<.01%) cycles in affected programs: 27493 -> 26869 (-2.27%) helped: 39 HURT: 0 helped stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16 helped stats (rel) min: 0.60% max: 7.92% x̄: 5.94% x̃: 7.92% 95% mean confidence interval for cycles value: -16.00 -16.00 95% mean confidence interval for cycles %-change: -6.98% -4.90% Cycles are helped. No changes on earlier platforms. Signed-off-by: Ian Romanick <[email protected]>
* v3d: Handle a no-intersection scissor even if it's outside of the VP.Eric Anholt2018-06-151-10/+8
| | | | | | The min/maxes ended up producing a negative clip width/height for dEQP-GLES3.functional.fragment_ops.scissor.outside_render_line. Just make sure they stay at 0 (or v3d 3.x's workaround) if that happens.
* v3d: Use the proper depth texture type for sampling.Eric Anholt2018-06-151-3/+3
| | | | Fixes failing tests in dEQP-GLES3.functional.texture.shadow
* v3d: Limit shader threading according to our maximum TMU fifo usage.Eric Anholt2018-06-151-10/+24
| | | | | | Fixes simulator assertion failures in dEQP-GLES3.functional.shaders.texture_functions.texture.samplercubeshadow_bias_fragment and similar complicated cases.
* v3d: Fix shaders using pixel center W but no varyings.Eric Anholt2018-06-154-16/+9
| | | | | | | | The docs called this field "uses both center W and centroid W", but actually it's "do you need center W even if varyings don't obviously call for it?" Fixes dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_w
* docs: Update release-notes and calendarDylan Baker2018-06-152-7/+1
|
* docs: Add release notes for 18.1.2Dylan Baker2018-06-151-0/+170
|
* intel/aubinator: Use int to store getopt_long flags.Rafael Antognolli2018-06-151-2/+2
| | | | | | | | getopt_long flag parameter is an int pointer, so if we use bool to store those values, when getopt_long writes to one of them, it might end up overwriting the next one. Reviewed-by: Ian Romanick <[email protected]>
* Revert "radv: always set/load both depth and stencil clear values"Samuel Pitoiset2018-06-151-5/+28
| | | | | | | | | This fixes a rendering regression with RoTR. This reverts commit 4bdad9faddc82a4560603936ce5ade5707ecb254. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: don't check for linear images in emit_fast_color_clear()Samuel Pitoiset2018-06-151-2/+0
| | | | | | | | We don't enable CMASK for linear surfaces and addrlib only enables DCC for tiling surfaces. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: allow RADV_PERFTEST=dccmsaa on GFX9Samuel Pitoiset2018-06-151-2/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add RADV_DEBUG=checkirSamuel Pitoiset2018-06-155-3/+11
| | | | | | | This allows to run the LLVM verifier pass. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: update ZRANGE_PRECISION in radv_update_bound_fast_clear_ds()Samuel Pitoiset2018-06-151-31/+15
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: clean up radv_{set,load}_depth_clear_regs() helpersSamuel Pitoiset2018-06-153-32/+44
| | | | | | | And replace _regs by _metadata because it makes more sense. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: always set/load both depth and stencil clear valuesSamuel Pitoiset2018-06-151-28/+5
| | | | | | | | I don't think that matter much to emit both values and that makes the code a bit simpler. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: update the fast ds clear values only if the image is boundSamuel Pitoiset2018-06-151-5/+32
| | | | | | | | It's unnecessary to update the fast depth/stencil clear values if the fast cleared depth/stencil image isn't currently bound. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: clean up radv_{set,load}_color_clear_regs() helpersSamuel Pitoiset2018-06-153-33/+47
| | | | | | | And replace _regs by _metadata because it makes more sense. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: update the fast color clear values only if the image is boundSamuel Pitoiset2018-06-151-3/+32
| | | | | | | | It's unnecessary to update the fast color clear values if the fast cleared color image isn't currently bound. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* util/bitset: include util/macro.hChristian Gmeiner2018-06-151-0/+1
| | | | | | | | | | BITSET_FFS(x) macro makes use of ARRAY_SIZE(x) macro which is defined in util/macro.h. Include it directy to make usage more straightforward. Fixes: 692bd4a1ab9 ("util: replace Elements() with ARRAY_SIZE()") Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* meson: fix private libs when building without glxLukas Rusak2018-06-151-6/+12
| | | | | | | | | | | | | | I noticed that the generated pkg-config files will include glx and x11 dependencies even when x11 isn't a selected platform. This fixes the private libs and was tested by building kmscube V2: - check if gallium-xlib is being used for glx Fixes: 108d257a16859898f5ce0 "meson: build libEGL" Reviewed-by: Dylan Baker <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* docs: document addition of GL_ARB_sample_locations for nvc0Rhys Perry2018-06-142-2/+2
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Brian Paul <[email protected]> (v2)
* nvc0: add support for programmable sample locationsRhys Perry2018-06-1410-46/+299
| | | | Signed-off-by: Rhys Perry <[email protected]>
* st/mesa: add support for ARB_sample_locationsRhys Perry2018-06-148-7/+129
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Brian Paul <[email protected]> (v2) Reviewed-by: Marek Olšák <[email protected]> (v2)
* gallium: add support for programmable sample locationsRhys Perry2018-06-1424-2/+120
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Brian Paul <[email protected]> (v2) Reviewed-by: Marek Olšák <[email protected]> (v2)
* mesa: add support for ARB_sample_locationsRhys Perry2018-06-1412-28/+455
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Brian Paul <[email protected]> (v2) Reviewed-by: Marek Olšák <[email protected]> (v2)
* v3d: Fix polygon offset for Z16 buffers.Eric Anholt2018-06-143-2/+14
| | | | | | Fixes: dEQP-GLES3.functional.polygon_offset.fixed16_displacement_with_units dEQP-GLES3.functional.polygon_offset.fixed16_render_with_units
* v3d: Fix configuration setup of mixed f32 and f16 render targets.Eric Anholt2018-06-141-1/+1
| | | | Fixes dEQP-GLES3.functional.fragment_out.random.26 and 6 others.
* v3d: Don't set the first_ez_state to DISABLED if after only UNDECIDED draws.Eric Anholt2018-06-141-1/+2
| | | | | | | | | We need to have the RCL start with EZ enabled, since those undecided draws had EZ enabled. But we do need to update from UNDECIDED to LT or GT as necessary still. Fixes many simulator assertion fails in deqp fragment_ops/interaction/basic_shader/*
* v3d: Use the right size for v3d 4.x TEXTURE_SHADER_STATE BO.Eric Anholt2018-06-141-2/+2
| | | | This doesn't really matter, since they both get rounded up to 4096.
* v3d: Add static asserts for other packed packet sizes.Eric Anholt2018-06-142-0/+7
|
* v3d: Fix the size of the packed attribute state.Eric Anholt2018-06-141-1/+1
| | | | Fixes segfaults in dEQP-GLES3.functional.vertex_array_objects.all_attributes.
* v3d: Remove some unused context fields from vc4.Eric Anholt2018-06-141-11/+0
|
* v3d: Remove unused QUNIFORM_STENCIL left over from vc4.Eric Anholt2018-06-142-11/+0
|
* v3d: Use our #define for max attributes in shader caps.Eric Anholt2018-06-141-1/+1
|
* v3d: Fix undefined results for a swap_color_rb RT from a float shader output.Eric Anholt2018-06-141-1/+4
| | | | | Fixes segfaults and undefined behavior in dEQP-GLES3.functional.fragment_out.basic.fixed.srgb8_alpha8_lowp_float
* radv: remove multisample bit from shader key.Dave Airlie2018-06-153-4/+0
| | | | | | This wasn't being used anywhere inside the shader from what I can see. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* intel/compiler: Properly consider UBO loads that cross 32B boundaries.Kenneth Graunke2018-06-141-2/+14
| | | | | | | | | | | | | | | | | | | | | | | The UBO push analysis pass incorrectly assumed that all values would fit within a 32B chunk, and only recorded a bit for the 32B chunk containing the starting offset. For example, if a UBO contained the following, tightly packed: vec4 a; // [0, 16) float b; // [16, 20) vec4 c; // [20, 36) then, c would start at offset 20 / 32 = 0 and end at 36 / 32 = 1, which means that we ought to record two 32B chunks in the bitfield. Similarly, dvec4s would suffer from the same problem. v2: Rewrite the accounting, my calculations were wrong. v3: Write a comment about partial values (requested by Jason). Reviewed-by: Rafael Antognolli <[email protected]> [v1] Reviewed-by: Jason Ekstrand <[email protected]> [v3]
* glsl: Don't copy propagate elements from SSBO or shared variables eitherIan Romanick2018-06-141-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | Since SSBOs can be written by a different GPU thread, copy propagating a read can cause the value to magically change. SSBO reads are also very expensive, so doing it twice will be slower. The same shader was helped by this patch and the previous. Haswell, Broadwell, and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14399119 -> 14399113 (<.01%) instructions in affected programs: 683 -> 677 (-0.88%) helped: 1 HURT: 0 total cycles in shared programs: 532973113 -> 532971865 (<.01%) cycles in affected programs: 524666 -> 523418 (-0.24%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Cc: [email protected] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106774
* glsl: Don't copy propagate from SSBO or shared variables eitherIan Romanick2018-06-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | Since SSBOs can be written by other GPU threads, copy propagating a read can cause the value to magically change. SSBO reads are also very expensive, so doing it twice will be slower. Haswell, Broadwell, and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14399120 -> 14399119 (<.01%) instructions in affected programs: 684 -> 683 (-0.15%) helped: 1 HURT: 0 total cycles in shared programs: 532978931 -> 532973113 (<.01%) cycles in affected programs: 530484 -> 524666 (-1.10%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Cc: [email protected] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106774
* meson: only build vl_winsys_dri.c when x11 platform is usedLukas Rusak2018-06-141-1/+1
| | | | | | | | | | | | | | This seems to have been missed in the move from autotools This fixes the following build issue: ../src/gallium/auxiliary/vl/vl_winsys_dri.c:34:10: fatal error: X11/Xlib-xcb.h: No such file or directory #include <X11/Xlib-xcb.h> ^~~~~~~~~~~~~~~~ Fixes: b1b65397d0c4978e36a84c0a1c98a4bd6cb9588e ("meson: Build gallium auxiliary") Reviewed-by: Dylan Baker <[email protected]>
* st/mesa: add missing switch cases in glsl_to_tgsi_visitor::visit()Brian Paul2018-06-141-0/+2
| | | | | | To silence compiler warning about unhandled switch cases. Reviewed-by: Charmaine Lee <[email protected]>
* radv: Fix output for sparse MRTs.Bas Nieuwenhuizen2018-06-141-9/+10
| | | | | | | | | | | | We need to init the cb_shader_format correctly with the changed col_format, so this moves the col_format adjustment to before the adjustment to before the cb_shader_mask gets generated. Fixes: 06d3c650980 "radv: fix a GPU hang when MRTs are sparse" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106903 CC: 18.1 <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: update the ZRANGE_PRECISION value for the TC-compat bugSamuel Pitoiset2018-06-141-0/+108
| | | | | | | | | | | | | | | | | | | | | | On GFX8+, there is a bug that affects TC-compatible depth surfaces when the ZRange is not reset after LateZ kills pixels. The workaround is to always set DB_Z_INFO.ZRANGE_PRECISION to match the last fast clear value. Because the value is set to 1 by default, we only need to update it when clearing Z to 0.0. We also need to set the depth clear regs and to update ZRANGE_PRECISION when initializing a TC-compat depth image to 0. Original patch from James Legg. This fixes random CTS fails with dEQP-VK.renderpass.suballocation.formats.d32_sfloat_s8_uint.input.* Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105396 CC: <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* anv: reduce maxFragmentInputComponentsSamuel Iglesias Gonsálvez2018-06-141-1/+1
| | | | | | | | | | | | | | | | If the application asks for the maximum number of fragment input components (128), use all of them plus some builtins that are passed in the VUE, then we exceed the maximum number of used VUE slots (32) and we break one assert that checks this limit. Also, with separate shader objects, we add CLIP_DIST0, CLIP_DIST1 builtins in brw_compute_vue_map() because we don't know if gl_ClipDistance is going to be read/write by an adjacent stage. Fixes VK-GL-CTS CL#2569. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* radeonsi/gfx9: fix si_get_buffer_from_descriptors for 48-bit pointersMarek Olšák2018-06-131-2/+2
| | | | | | | | This fixes: GL45-CTS.pipeline_statistics_query_tests_ARB.functional_compute_shader_invocations Cc: 18.0 18.1 <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi/gfx9: update & clean up a DPBB heuristicMarek Olšák2018-06-131-9/+5
| | | | Tested-by: Dieter Nützel <[email protected]>