path: root/src
Commit message (Author, Age, Files, Lines)
* main: Support getting GL_PROGRAM_BINARY_LENGTH (Jordan Justen, 2017-12-08, 1 file, -1/+6)
| | | | | | | | | | | V2: call generic _mesa_get_program_binary_length() helper rather than driver function directly to allow greater code sharing. Signed-off-by: Timothy Arceri <[email protected]> Signed-off-by: Jordan Justen <[email protected]> (v1) Reviewed-by: Nicolai Hähnle <[email protected]> (v1) Reviewed-by: Tapani Pälli <[email protected]>
* mesa: Add Mesa ARB_get_program_binary helper functions (Jordan Justen, 2017-12-08, 4 files, -0/+351)
| | | | | | | | | | | | | | | V2 (Timothy Arceri): - add extra code comment - stop passing around void *binary and just pass program_binary_header *hdr instead. - move to src/mesa/main rather than src/util V3 (Timothy Arceri): - Move more code out of the backend and into the common helpers. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* mesa: add driver callbacks for serialising ProgramBinary blobs (Timothy Arceri, 2017-12-08, 1 file, -0/+17)
| | | | Reviewed-by: Jordan Justen <[email protected]>
* main: Support 1 Mesa format with get for GL_PROGRAM_BINARY_FORMATS (Jordan Justen, 2017-12-08, 2 files, -1/+10)
| | | | | | | | | Mesa supports either 0 or 1 formats. If 1 format is supported, it is GL_PROGRAM_BINARY_FORMAT_MESA as defined in the GL_MESA_program_binary_formats extension spec. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* main: Allow non-zero NUM_PROGRAM_BINARY_FORMATS (Jordan Justen, 2017-12-08, 2 files, -1/+4)
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* i965: Fix memory leak when serializing nir (Jordan Justen, 2017-12-08, 1 file, -0/+1)
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Add brw_program_serialize_nir (Jordan Justen, 2017-12-08, 3 files, -6/+14)
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Free serialized nir after deserializing (Jordan Justen, 2017-12-08, 1 file, -0/+6)
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Add brw_program_deserialize_nir (Jordan Justen, 2017-12-08, 3 files, -23/+28)
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* main, glsl: Add UniformDataDefaults which stores uniform defaults (Jordan Justen, 2017-12-08, 4 files, -2/+30)
| | | | | | | | | | | | | | | | | | | The ARB_get_program_binary extension requires that uniform values in a program be restored to their initial value just after linking. This patch saves off the initial values just after linking. When the program is restored by glProgramBinary, we can use this to copy the initial value of uniforms into UniformDataSlots. V2 (Timothy Arceri): - Store UniformDataDefaults only when serializing GLSL as this is what we want for both disk cache and ARB_get_program_binary. This saves us having to come back later and reset the Uniforms on program binary restores. Signed-off-by: Timothy Arceri <[email protected]> Signed-off-by: Jordan Justen <[email protected]> (v1) Reviewed-by: Tapani Pälli <[email protected]>
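The save-and-restore idea described in this message can be sketched as follows. This is a minimal illustration only, not Mesa's code: the `Program` class and its `snapshot_defaults` and `restore_defaults` methods are hypothetical names, and the flat list of slots merely stands in for UniformDataSlots and UniformDataDefaults.

```python
class Program:
    """Hypothetical stand-in for a linked shader program."""

    def __init__(self, initial_uniforms):
        # Current uniform values (analogous to UniformDataSlots).
        self.uniform_slots = list(initial_uniforms)
        # Saved initial values (analogous to UniformDataDefaults).
        self.uniform_defaults = None

    def snapshot_defaults(self):
        # Taken while the uniforms still hold their post-link values.
        self.uniform_defaults = list(self.uniform_slots)

    def restore_defaults(self):
        # Used when the program is restored from a binary.
        if self.uniform_defaults is None:
            raise RuntimeError("no defaults were saved")
        self.uniform_slots = list(self.uniform_defaults)
```

Copying the list in both directions keeps later uniform updates from aliasing the saved defaults.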
* glsl: Split out shader program serialization (Jordan Justen, 2017-12-08, 6 files, -1181/+1297)
| | | | | | | | This will allow us to use the program serialization to implement ARB_get_program_binary. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* mesa: add GL_PROGRAM_BINARY_FORMAT_MESA enum (Jordan Justen, 2017-12-08, 1 file, -1/+6)
| | | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution. (Francisco Jerez, 2017-12-07, 1 file, -6/+69)
| | | | | | | | | | | | | | | | | | | | | | | | | | This addresses a long-standing back-end compiler bug that could lead to cross-channel data corruption in loops executed non-uniformly. In some cases live variables extending through a loop divergence point (e.g. a non-uniform break) into a convergence point (e.g. the end of the loop) wouldn't be considered live along all physical control flow paths the SIMD thread could possibly have taken in between due to some channels remaining in the loop for additional iterations. This patch fixes the problem by extending the CFG with physical edges that don't exist in the idealized non-vectorized program, but represent valid control flow paths the SIMD EU may take due to the divergence of logical threads. This makes sense because the i965 IR is explicitly SIMD, and it's not uncommon for instructions to have an influence on neighboring channels (e.g. a force_writemask_all header setup), so the behavior of the SIMD thread as a whole needs to be considered. No changes in shader-db. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Don't let undefined values prevent copy propagation. (Francisco Jerez, 2017-12-07, 1 file, -3/+47)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes the dataflow propagation logic of the copy propagation pass more intelligent in cases where the destination of a copy is known to be undefined for some incoming CFG edges, building upon the definedness information provided by the last patch. Helps a few programs, and avoids a handful of shader-db regressions from the next patch. shader-db results on ILK: total instructions in shared programs: 6541547 -> 6541523 (-0.00%) instructions in affected programs: 360 -> 336 (-6.67%) helped: 8 HURT: 0 LOST: 0 GAINED: 10 shader-db results on BDW: total instructions in shared programs: 8174323 -> 8173882 (-0.01%) instructions in affected programs: 7730 -> 7289 (-5.71%) helped: 5 HURT: 2 LOST: 0 GAINED: 4 shader-db results on SKL: total instructions in shared programs: 8185669 -> 8184598 (-0.01%) instructions in affected programs: 10364 -> 9293 (-10.33%) helped: 5 HURT: 2 LOST: 0 GAINED: 2 Reviewed-by: Jason Ekstrand <[email protected]>
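One way to picture the dataflow rule described in this message: at a block entry, a copy stays available if every incoming edge either carries the copy or leaves its destination undefined. The sketch below is a generic illustration under that assumption, not the i965 pass itself; `merge_available_copies` and its set-based inputs are hypothetical.

```python
def merge_available_copies(preds_out, preds_undef):
    """Compute the copies available at the entry of a block.

    preds_out:   one set per predecessor of (dst, src) copies available
                 at the end of that predecessor.
    preds_undef: one set per predecessor of variables known to be
                 undefined at the end of that predecessor.
    """
    # Only copies seen on at least one incoming edge are candidates.
    candidates = set().union(*preds_out)
    available = set()
    for copy in candidates:
        dst, _src = copy
        # An edge does not block the copy if it carries the copy or if
        # the destination holds no defined value along that edge.
        if all(copy in out or dst in undef
               for out, undef in zip(preds_out, preds_undef)):
            available.add(copy)
    return available
```

A plain intersection over predecessors would drop the copy as soon as one edge lacks it. Treating an undefined destination as compatible with any value is what keeps the copy alive in this sketch.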
* intel/fs: Restrict live intervals to the subset possibly reachable from any definition. (Francisco Jerez, 2017-12-07, 2 files, -4/+42)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the liveness analysis pass would extend a live interval up to the top of the program when no unconditional and complete definition of the variable is found that dominates all of its uses. This can lead to a serious performance problem in shaders containing many partial writes, like scalar arithmetic, FP64 and soon FP16 operations. The number of oversize live intervals in such workloads can cause the compilation time of the shader to explode because of the worse than quadratic behavior of the register allocator and scheduler when running out of registers, and it can also cause the running time of the shader to explode due to the amount of spilling it leads to, which is orders of magnitude slower than GRF memory. This patch fixes it by computing the intersection of our current live intervals with the subset of the program that can possibly be reached from any definition of the variable. Extending the storage allocation of the variable beyond that is pretty useless because its value is guaranteed to be undefined at a point that cannot be reached from any definition. According to Jason, this improves performance of the subgroup Vulkan CTS tests significantly (e.g. the runtime of the dvec4 broadcast test improves by nearly 50x). No significant change in the running time of shader-db (with 5% statistical significance).
shader-db results on IVB: total cycles in shared programs: 61108780 -> 60932856 (-0.29%) cycles in affected programs: 16335482 -> 16159558 (-1.08%) helped: 5121 HURT: 4347 total spills in shared programs: 1309 -> 1288 (-1.60%) spills in affected programs: 249 -> 228 (-8.43%) helped: 3 HURT: 0 total fills in shared programs: 1652 -> 1597 (-3.33%) fills in affected programs: 262 -> 207 (-20.99%) helped: 4 HURT: 0 LOST: 2 GAINED: 209 shader-db results on BDW: total cycles in shared programs: 67617262 -> 67361220 (-0.38%) cycles in affected programs: 23397142 -> 23141100 (-1.09%) helped: 8045 HURT: 6488 total spills in shared programs: 1456 -> 1252 (-14.01%) spills in affected programs: 465 -> 261 (-43.87%) helped: 3 HURT: 0 total fills in shared programs: 1720 -> 1465 (-14.83%) fills in affected programs: 471 -> 216 (-54.14%) helped: 4 HURT: 0 LOST: 2 GAINED: 162 shader-db results on SKL: total cycles in shared programs: 65436248 -> 65245186 (-0.29%) cycles in affected programs: 22560936 -> 22369874 (-0.85%) helped: 8457 HURT: 6247 total spills in shared programs: 437 -> 437 (0.00%) spills in affected programs: 0 -> 0 helped: 0 HURT: 0 total fills in shared programs: 870 -> 854 (-1.84%) fills in affected programs: 16 -> 0 helped: 1 HURT: 0 LOST: 0 GAINED: 107 Reviewed-by: Jason Ekstrand <[email protected]>
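The intersection described in this message can be sketched at basic-block granularity. This is a simplified illustration that assumes a CFG given as a successor map. The real pass works on finer-grained live intervals, and the names `reachable_from_defs` and `restrict_live_blocks` are hypothetical.

```python
from collections import deque


def reachable_from_defs(succs, def_blocks):
    """Blocks reachable along CFG edges from any block that defines
    the variable (the defining blocks themselves are included)."""
    seen = set(def_blocks)
    work = deque(def_blocks)
    while work:
        block = work.popleft()
        for succ in succs.get(block, ()):
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return seen


def restrict_live_blocks(live_blocks, succs, def_blocks):
    """Keep only the live blocks that some definition can reach."""
    return set(live_blocks) & reachable_from_defs(succs, def_blocks)
```

For example, with edges 0 -> 1, 1 -> 2, 1 -> 3 and 2 -> 3, a variable defined only in block 2 and conservatively marked live in all four blocks ends up live in blocks 2 and 3 only.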
* intel/fs: Teach instruction scheduler about GRF bank conflict cycles. (Francisco Jerez, 2017-12-07, 3 files, -2/+23)
| | | | | | | | | | | This should allow the post-RA scheduler to do a slightly better job at hiding latency in the presence of instructions incurring bank conflicts. The main purpose of this patch is not to improve performance though, but to get conflict cycles to show up in shader-db statistics in order to make sure that regressions in the bank conflict mitigation pass don't go unnoticed. Acked-by: Matt Turner <[email protected]>
* intel/fs: Implement GRF bank conflict mitigation pass. (Francisco Jerez, 2017-12-07, 5 files, -0/+898)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unnecessary GRF bank conflicts increase the issue time of ternary instructions (the overwhelmingly most common of which is MAD) by roughly 50%, leading to reduced ALU throughput. This pass attempts to minimize the number of bank conflicts by rearranging the layout of the GRF space post-register allocation. It's in general not possible to eliminate all of them without introducing extra copies, which are typically more expensive than the bank conflict itself. In a shader-db run on SKL this helps roughly 46k shaders: total conflicts in shared programs: 1008981 -> 600461 (-40.49%) conflicts in affected programs: 816222 -> 407702 (-50.05%) helped: 46234 HURT: 72 The running time of shader-db itself on SKL seems to be increased by roughly 2.52%±1.13% with n=20 due to the additional work done by the compiler back-end. On earlier generations the pass is somewhat less effective in relative terms because the hardware incurs a bank conflict anytime the last two sources of the instruction are duplicate (e.g. while trying to square a value using MAD), which is impossible to avoid without introducing copies. E.g. for a shader-db run on SNB: total conflicts in shared programs: 944636 -> 623185 (-34.03%) conflicts in affected programs: 853258 -> 531807 (-37.67%) helped: 31052 HURT: 19 And on BDW: total conflicts in shared programs: 1418393 -> 987539 (-30.38%) conflicts in affected programs: 1179787 -> 748933 (-36.52%) helped: 47592 HURT: 70 On SKL GT4e this improves performance of GpuTest Volplosion by 3.64% ±0.33% with n=16. NOTE: This patch intentionally disregards some i965 coding conventions for the sake of reviewability. This is addressed by the next squash patch which introduces an amount of (for the most part boring) boilerplate that might distract reviewers from the non-trivial algorithmic details of the pass. 
The following patch is squashed in: SQUASH: intel/fs/bank_conflicts: Roll back to the nineties. Acked-by: Matt Turner <[email protected]>
* meson: Add lmsensors to gallium libgl-xlib target. (Dylan Baker, 2017-12-07, 1 file, -1/+3)
| | | | | | Fixes: 5e71efef44b992b5d70b ("meson: Add lmsensors support") Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* meson: add dep_thread to every lib that includes threads.h (Eric Engestrom, 2017-12-07, 9 files, -8/+9)
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104141 Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* meson: fix pl111 dependency on vc4 (Eric Engestrom, 2017-12-07, 2 files, -5/+6)
| | | | | | | | src/gallium/winsys/pl111/drm/libpl111winsys.a(pl111_drm_winsys.c.o): In function `pl111_drm_screen_create': pl111_drm_winsys.c:(.text+0x33): undefined reference to `vc4_drm_screen_create_renderonly' Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* radv: use a faster version for nir_op_pack_half_2x16 (Samuel Pitoiset, 2017-12-07, 1 file, -11/+1)
| | | | | | | | | | | | | | | | | | | | | This patch is ported from RadeonSI and it has two effects. It fixes a rendering issue which affects F1 2017 and Dawn of War 3 (Vega only) because LLVM was ending up by generating the new v_mad_mix_{hi,lo} instructions which appear to be buggy in some way. Not sure if Mesa is generating something wrong or if the issue is in LLVM only. Anyway, that explains why the DOW3 issue can't be reproduced with GL on Vega. It also improves performance because v_cvt_pkrtz_f16 is faster, and because I guess the rounding mode behaviour is similar between GL and VK, we can use it. About performance, it improves Talos by +3/4% but I don't see any other impacts. No CTS regressions on Polaris. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* mesa/spirv: move and rename nir_spirv_supported_capabilities (Alejandro Piñeiro, 2017-12-07, 3 files, -15/+15)
| | | | | | | | | | To avoid having any Vulkan driver include the GL mtypes.h. Renamed as eventually this could be used by drivers not using nir. v2: remove compiler/spirv/spirv.h from mtypes (Alejandro) v3: added the definition at compiler/shader_info.h (Jason Ekstrand) Reviewed-by: Jason Ekstrand <[email protected]>
* util/disk_cache: Remove unneeded free() on always null string (Vadym Shovkoplias, 2017-12-07, 1 file, -1/+0)
| | | | | | | | | At this point dc_job->cache_item_metadata.keys always equals NULL, so call to free() is useless Fixes: b86ecea3446 ("util/disk_cache: write cache item metadata to disk") Signed-off-by: Vadym Shovkoplias <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* spirv: fix bug when OpSpecConstantOp calls a conversion (Samuel Iglesias Gonsálvez, 2017-12-07, 1 file, -6/+21)
| | | | | | | | | | | | | | | | | In that case, nir_eval_const_opcode() will evaluate the conversion but as it was using destination's bit_size, the resulting value was just a cast of the source constant value. By passing the source's bit size, it does the conversion properly. Fixes: dEQP-VK.spirv_assembly.instruction.*.opspecconstantop.*convert* v2: - Remove invalid conversion op cases. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
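Why the source bit size matters for a constant conversion can be shown with a small sketch. This is an illustration of the general issue only, assuming two's-complement signed integers. `sign_extend` and `convert_signed` are hypothetical helpers, not NIR functions.

```python
def sign_extend(raw, bits):
    """Interpret the low `bits` bits of `raw` as a signed integer."""
    raw &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (raw ^ sign) - sign


def convert_signed(raw, src_bits, dst_bits):
    """Convert a signed constant stored as a raw bit pattern from
    src_bits to dst_bits, returning the destination bit pattern."""
    return sign_extend(raw, src_bits) & ((1 << dst_bits) - 1)
```

Reading the 16-bit pattern 0xFFFF with its true source size gives -1, so `convert_signed(0xFFFF, 16, 32)` yields 0xFFFFFFFF. Reading the same pattern as if it were already 32 bits wide, `convert_signed(0xFFFF, 32, 32)`, leaves it as 0x0000FFFF, which is the "just a cast" behaviour the message describes.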
* spirv: allow specialization constants with bitsize different than 32 bits (Samuel Iglesias Gonsálvez, 2017-12-07, 1 file, -1/+0)
| | | | | Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir/opcodes: Fix constant-folding of bitfield_insert (James Legg, 2017-12-07, 1 file, -2/+2)
| | | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104119 CC: <[email protected]> CC: Samuel Pitoiset <[email protected]> Reviewed-by: Matt Turner <[email protected]>
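The message does not say what was wrong with the folding. For reference, the operation being folded can be sketched as below, assuming the semantics of GLSL's bitfieldInsert(): bits [offset, offset + bits - 1] of the result come from the low bits of `insert`, and all other bits come from `base`. The function is a hypothetical reference model, not the NIR opcode definition, and it raises an error where GLSL leaves the result undefined.

```python
def bitfield_insert(base, insert, offset, bits, width=32):
    """Reference model of a bitfield insert on `width`-bit values."""
    full = (1 << width) - 1
    if bits == 0:
        # Nothing is inserted; the base value passes through.
        return base & full
    if offset < 0 or bits < 0 or offset + bits > width:
        # GLSL leaves this case undefined; this sketch rejects it.
        raise ValueError("offset/bits outside the value width")
    mask = ((1 << bits) - 1) << offset
    # Shift the inserted field by `offset`, then keep only the field.
    return ((base & ~mask) | ((insert << offset) & mask)) & full
```

For instance, `bitfield_insert(0xFFFFFFFF, 0, 8, 8)` clears bits 8 to 15 and returns 0xFFFF00FF.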
* radv: Add LLVM version to the device name string (Alex Smith, 2017-12-07, 2 files, -26/+37)
| | | | | | | | | | Allows apps to determine the LLVM version so that they can decide whether or not to enable workarounds for LLVM issues. Signed-off-by: Alex Smith <[email protected]> Cc: "17.2 17.3" <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* mesa: remove set_entry from forward type declarations (Alejandro Piñeiro, 2017-12-07, 1 file, -1/+0)
| | | | | | This type was used at gl_sync_object, but it is not used anymore. Reviewed-by: Timothy Arceri <[email protected]>
* meta: Fix ClearTexture with GL_DEPTH_COMPONENT. (Kenneth Graunke, 2017-12-06, 1 file, -9/+14)
| | | | | | | | | | | | | We only handled unpacking for GL_DEPTH_STENCIL formats. Cemu was hitting _mesa_problem() for an unsupported format in _mesa_unpack_float_32_uint_24_8_depth_stencil_row(), because the format was depth-only, rather than depth-stencil. Cc: "13.0 12.0" <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94739 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103966 Reviewed-by: Tapani Pälli <[email protected]>
* meta: Initialize depth/clear values on declaration. (Kenneth Graunke, 2017-12-06, 1 file, -5/+2)
| | | | | | | This helps avoid compiler warnings in the next commit - everything was initialized, but it wasn't obvious to static analysis. Suggested-by: Tapani Pälli <[email protected]>
* glsl: get correct member type when processing xfb ifc arrays (Timothy Arceri, 2017-12-07, 1 file, -2/+4)
| | | | | | | | | This fixes a crash in: KHR-GL45.enhanced_layouts.xfb_block_stride Fixes: 0822517936d4 "glsl: add helper to process xfb qualifiers during linking" Reviewed-by: Kenneth Graunke <[email protected]>
* r600/sb: do not convert if-blocks that contain indirect array access (Gert Wollny, 2017-12-07, 3 files, -2/+5)
| | | | | | | | | | | | | | | | | | | | If an array is accessed within an if block, then currently it is not known whether the value in the address register is involved in the evaluation of the if condition, and converting the if condition may actually result in out-of-bounds array access. Consequently, if blocks that contain indirect array access should not be converted. Fixes piglits on r600/BARTS: spec/glsl-1.10/execution/variable-indexing/ vs-output-array-float-index-wr vs-output-array-vec3-index-wr vs-output-array-vec4-index-wr Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104143 Signed-off-by: Gert Wollny <[email protected]> Cc: <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: add support for compute grid/block sizes. (v2) (Dave Airlie, 2017-12-06, 4 files, -3/+100)
| | | | | | | | | | We just pass these in from outside in a constant buffer. The shader side stores them once they are accessed once. v2: fix to not use a temp_reg. Signed-off-by: Dave Airlie <[email protected]>
* r600: handle image/buffer sizes correctly. (Dave Airlie, 2017-12-06, 3 files, -4/+21)
| | | | | | This adds support to compute for the resq workarounds (buffer/cube sizes) Signed-off-by: Dave Airlie <[email protected]>
* r600/compute: add support for emitting compute image/buffer atoms (Dave Airlie, 2017-12-06, 1 file, -1/+9)
| | | | Signed-off-by: Dave Airlie <[email protected]>
* r600/compute: handle atomic counters in compute state. (Dave Airlie, 2017-12-06, 1 file, -0/+9)
| | | | Signed-off-by: Dave Airlie <[email protected]>
* r600/compute: add support for TGSI compute shaders. (v1.1) (Dave Airlie, 2017-12-06, 2 files, -28/+103)
| | | | | | | | | | | This adds paths to handle TGSI compute shaders and shader selection. It also avoids emitting certain things on tgsi paths, CBs, vertex buffers, config reg init (not required). v1.1: fix rat mask calc Signed-off-by: Dave Airlie <[email protected]>
* r600/shader: add compute support to shader assembler (Dave Airlie, 2017-12-06, 1 file, -0/+14)
| | | | Signed-off-by: Dave Airlie <[email protected]>
* r600/texture: drop lowering 1d/2d images to linear. (Dave Airlie, 2017-12-06, 1 file, -8/+0)
| | | | | | | This appears to cause hangs with compute images. Unless we can find more specifics, just don't do this for now. Signed-off-by: Dave Airlie <[email protected]>
* mesa: define nir_spirv_supported_capabilities (Alejandro Piñeiro, 2017-12-06, 2 files, -13/+15)
| | | | | | | | | Until now it was part of spirv_to_nir_options. But it will be used on the implementation of ARB_gl_spirv and ARB_spirv_extensions, and added to the OpenGL context, as a way to save what SPIR-V capabilities the current OpenGL implementation supports. Reviewed-by: Ian Romanick <[email protected]>
* anv: fix a case statement in GetMemoryFdPropertiesKHR (Fredrik Höglund, 2017-12-06, 1 file, -1/+1)
| | | | | | | | | The handle type in the case statement is supposed to be VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT. Fixes: ab18e8e59b6 ("anv: Implement VK_EXT_external_memory_dma_buf") Signed-off-by: Fredrik Höglund <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* radv: fix a case statement in GetMemoryFdPropertiesKHR (Fredrik Höglund, 2017-12-06, 1 file, -1/+1)
| | | | | | | | | | The handle type in the case statement is supposed to be VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT. Fixes: 546e747867c ("radv: Implement VK_EXT_external_memory_dma_buf") Signed-off-by: Fredrik Höglund <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* meson: fix keyword argument in declare_dependency() (Eric Engestrom, 2017-12-06, 2 files, -2/+2)
| | | | | | | | | `declare_dependency()` takes `compile_args`, not `c_args`. It was correct in all the other `declare_dependency()` from that commit. Fixes: 0bbecc5a8548883f76a71 "meson: define driver dependencies" Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* i965: include brw_pipe_control.h in the tarball (Emil Velikov, 2017-12-06, 1 file, -0/+1)
| | | | | | Fixes: bfe0f3a7027 ("i965: Move PIPE_CONTROL defines and prototypes to brw_pipe_control.h.") Signed-off-by: Emil Velikov <[email protected]>
* mesa: document _mesa_extension_override_* variables (Emil Velikov, 2017-12-06, 1 file, -0/+9)
| | | | | | | | | Currently there are no users of these outside of extensions.c. Provide some information why they exist and how to use them. Cc: Jordan Justen <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Andres Gomez <[email protected]>
* swr/scons: Fix another intermittent build failure (George Kyriazis, 2017-12-06, 1 file, -0/+1)
| | | | | | | gen_BackendPixelRate*.cpp depends on gen_ar_eventhandler.hpp. Fix missing dependency. Reviewed-by: Bruce Cherniak <[email protected]>
* radeonsi: make const and stream uploaders allocate read-only memory (Marek Olšák, 2017-12-06, 1 file, -2/+5)
| | | | | | | and anything that clones these uploaders, like u_threaded_context. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use a separate allocator for fine fences (Marek Olšák, 2017-12-06, 3 files, -1/+9)
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: make shader binaries use read-only memory (Marek Olšák, 2017-12-06, 5 files, -3/+13)
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: make IBs use read-only memory (Marek Olšák, 2017-12-06, 1 file, -0/+1)
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>