summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers
Commit message (Collapse)AuthorAgeFilesLines
* intel: Increase Gen11 compute shader scratch IDs to 64.Kenneth Graunke2019-09-231-1/+13
| | | | | | | | | | | | | | | | | | | | | | | From the MEDIA_VFE_STATE docs: "Starting with this configuration, the Maximum Number of Threads must be set to (#EU * 8) for GPGPU dispatches. Although there are only 7 threads per EU in the configuration, the FFTID is calculated as if there are 8 threads per EU, which in turn requires a larger amount of Scratch Space to be allocated by the driver." It's pretty clear that we need to increase this for scratch address calculations, because the FFTID has a certain bit-pattern. The quote above seems to indicate that we should increase the actual thread count programmed in MEDIA_VFE_STATE as well, but we think the intention is to only bump the scratch space. Fixes GPU hangs in Bioshock Infinite and Synmark's CSDof on Icelake 8x8. Fixes: 5ac804bd9ac ("intel: Add a preliminary device for Ice Lake") Reviewed-by: Matt Turner <[email protected]>
* Revert "intel/gen11+: Enable Hardware filtering of Semi-Pipelined State in WM"Kenneth Graunke2019-09-232-9/+0
| | | | | | | | | | | | | | | This reverts commit 729de1488f49033bc181b8123af5658228a51bf1. It turns out that, although the register is in the logical context, it isn't whitelisted, so we can't actually write it from userspace batch buffers. The write just becomes a noop, which is why we saw no performance changes. I manually whitelisted it, and still observed no performance gains, but it did regress KHR-GL46.texture_cube_map_array.color_depth_attachments on the iris driver. So we might need to fix something before enabling this. To prevent it randomly getting turned on should the kernel ever whitelist this register, we revert the patch for now.
* Move blob from compiler/ to util/Jason Ekstrand2019-09-191-1/+1
| | | | | | | | There's nothing whatsoever compiler-specific about it other than that's currently where it's used. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: support AYUV/XYUV for external import onlyHaihao Xiang2019-09-181-0/+2
| | | | | | | | | Fixes: 89785e2d56e7fa ("i965: add support for sampling from AYUV") Fixes: 7cab8d3661f243 ("i965: Add support for sampling from XYUV images") Cc: Vivek Kasireddy <[email protected]> Cc: Lionel Landwerlin <[email protected]> Signed-off-by: Haihao Xiang <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* scons: Make scons and meson agree about path to glapi generated headersDylan Baker2019-09-163-1/+3
| | | | | | | | Currently scons puts them in src/mapi/glapi, meosn puts them in src/mapi/glapi/gen. This results in some things being compilable only by one or the other, put them in the same places so that everyone is happy. Reviewed-by: Eric Engestrom <[email protected]>
* driconfig: add a new engine name/version parameterLionel Landwerlin2019-09-156-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vulkan applications can register with the following structure : typedef struct VkApplicationInfo { VkStructureType sType; const void* pNext; const char* pApplicationName; uint32_t applicationVersion; const char* pEngineName; uint32_t engineVersion; uint32_t apiVersion; } VkApplicationInfo; This enables the Vulkan implementations to apply workarounds based off matching this description. Here we add a new parameter for matching the driconfig options with the following : <device driver="anv"> <application engine_name_match="MyOwnEngine.*" engine_versions="10:12,40:42"> <option name="blaaah" value="true" /> </application> </device> v2: switch engine name match to use regexps v3: Verify that the regexec returns REG_NOMATCH for match failure (Eric) v4: Add missing bit that went to the following commit (Eric) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Cc: 19.2 <[email protected]>
* dri: Use DRM_FORMAT_* instead of defining our own copy.Eric Anholt2019-09-112-46/+47
| | | | | | | | | | | | We have only two defines that aren't from DRM_FORMAT_*: SARGB and SABGR. Keep only those as __DRI_IMAGE_FOURCC and garbage collect the rest. While this header is also used from the X server, the X server doesn't use any __DRI_IMAGE enums. Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* intel/gen11+: Enable Hardware filtering of Semi-Pipelined State in WMAnuj Phogat2019-09-112-0/+9
| | | | | | | Initial benchmarking didn't show any performance benefits. But it might eventually. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl/nir: Add and use a gl_nir_link() functionCaio Marcelo de Oliveira Filho2019-09-101-8/+4
| | | | | | | | | Perform all the NIR linking steps in order. Change iris and i965 to use it. Suggested by Alejandro. v2: Add gl_nir_linker_options struct. Reviewed-by: Alejandro Piñeiro <[email protected]> [v1]
* glsl/nir: Fill in the Parameters in NIR linkerCaio Marcelo de Oliveira Filho2019-09-101-1/+1
| | | | | | | | | | | | | | | | | | | | | The parameter lists were not being created nor filled since i965 doesn't use them. In Gallium they are used for uniform handling, so add a way to fill them. The gl_uniform_storage struct got two new fields that let us go - from a Parameter to the matching UniformStorage and, - from the variable to the *first* UniformStorage without relying on names -- since they are optional for ARB_gl_spirv. Later patches will make use of them. v2: Do not fill parameters for i965. (Timothy) Use uint32_t for the new attributes. (Marek) v3: Serialize the new fields. (Timothy) Reviewed-by: Timothy Arceri <[email protected]>
* mesa: Eliminate gl_config::rgbModeAdam Jackson2019-09-092-16/+3
| | | | | Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* mesa: Eliminate gl_config::have{Accum,Depth,Stencil}BufferAdam Jackson2019-09-095-20/+12
| | | | | Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* mesa: Remove unused gl_config::indexBitsAdam Jackson2019-09-091-1/+0
| | | | | Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* intel: Stop redirecting state cache to command streamer cache sectionKenneth Graunke2019-09-061-5/+0
| | | | | | | | | | | | | | | | | | This bit redirects the state cache from the unified/RO sections of the L3 cache to the "CS command buffer" section of the cache, which would be set up via TCCNTLREG. The documentation says: "Additionaly, this redirection should be enabled only if there is a non-zero allocation for the CS command buffer section." We don't allocate any cache to the CS command buffer section, so enabling this redirection effectively disabled the state cache. The Windows driver only sets up that section when using POSH, which we do not currently use. So, leave it unallocated and disable the redirection to get a functional state cache again. Improves performance in Civilization VI by 18%, Manhattan 3.0 by 6%, and Car Chase by 2%.
* intel/dri: finish proper glthreadSergii Romantsov2019-09-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | KWin was able to get NULL-context in the call intelUnbindContext. But a call _mesa_glthread_finish is not resistent to such case. Case can be catched with steps: 1. Create both glx and egl contexts 2. Make glx as current 3. Make egl as current 4. Reset glx context 5. Make egl as current Solution adds proper finishing of glthread-context (context will be taken from the requested dri-context for unbinding, but not from the saved current context). Piglit-test: https://gitlab.freedesktop.org/mesa/piglit/merge_requests/87 Cc: 19.1 19.2 <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110814 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111271 Fixes: dca36d5516d0 (i965: Implement threaded GL support) Signed-off-by: Sergii Romantsov <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: initialize bo_reuse when creating brw_bufmgrTapani Pälli2019-08-294-26/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes a possible data race spotted while debugging on other EGL related failures where glFinish and eglCreateContext are going on at the same time: ==11558== Possible data race during read of size 1 at 0x5E78CD0 by thread #23 ==11558== Locks held: 1, at address 0x5E77CA8 ==11558== at 0x61B71D4: bo_alloc_internal (brw_bufmgr.c:639) ==11558== by 0x61B7328: brw_bo_alloc (brw_bufmgr.c:669) ==11558== by 0x61EF975: recreate_growing_buffer (intel_batchbuffer.c:231) ==11558== by 0x61EFAAE: intel_batchbuffer_reset (intel_batchbuffer.c:255) ==11558== by 0x61EFB85: intel_batchbuffer_reset_and_clear_render_cache (intel_batchbuffer.c:280) ==11558== by 0x61F0507: brw_new_batch (intel_batchbuffer.c:551) ==11558== by 0x61F12C1: _intel_batchbuffer_flush_fence (intel_batchbuffer.c:888) ==11558== by 0x61BDD6B: intel_glFlush (brw_context.c:296) ==11558== by 0x61BDDB9: intel_finish (brw_context.c:307) ==11558== by 0x623831B: _mesa_Finish (context.c:1906) ==11558== by 0x46D556: deqp::egl::GLES2ThreadTest::Operation::execute(tcu::ThreadUtil::Thread&) ==11558== by 0x721502: tcu::ThreadUtil::Thread::run() ==11558== ==11558== This conflicts with a previous write of size 1 by thread #26 ==11558== Locks held: 1, at address 0x5D09878 ==11558== at 0x61B98A9: brw_bufmgr_enable_reuse (brw_bufmgr.c:1541) ==11558== by 0x61BF09D: brw_process_driconf_options (brw_context.c:854) ==11558== by 0x61BF6CA: brwCreateContext (brw_context.c:993) ==11558== by 0x621181F: driCreateContextAttribs (dri_util.c:473) ==11558== by 0x53FE87B: dri2_create_context (egl_dri2.c:1388) ==11558== by 0x53EE7BE: eglCreateContext (eglapi.c:807) ==11558== by 0x5C8AB9: eglw::FuncPtrLibrary::createContext(void*, void*, void*, int const*) const ==11558== by 0x46E027: deqp::egl::GLES2ThreadTest::CreateContext::exec(tcu::ThreadUtil::Thread&) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Exit with error if gen12+ is detectedJordan Justen2019-08-281-0/+5
| | | | | | | For OpenGL support on gen12, the iris driver should be used. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glx: Fix incompatible function pointer types.Jose Fonseca2019-08-281-1/+1
| | | | | | | | | | | I don't know how Meson didn't hit this issue, when it too already uses -Werror=incompatible-pointer-types Fixes: 3dd299c3d5b88114894e ("glx: Sync <GL/glxext.h> with Khronos") Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Adam Jackson <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* intel/fs: Drop the gl_program from fs_visitorJason Ekstrand2019-08-252-2/+2
| | | | | | | | | It's not used by anything anymore now that so much lowering has been moved into NIR. Sadly, we still need on in brw_compile_gs() for geometry shaders on Sandy Bridge. Short of a lot of pointless work, that one's probably not going away. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Silence brw_blorp uninitialized warningCaio Marcelo de Oliveira Filho2019-08-231-1/+1
| | | | | | | | | | The variables level and start_layer are not initialized, then initialized if we have a BUFFER_BIT_DEPTH set. We assert on them later using the same check. This should be enough but GCC 9.1.1 is not convinced, so let's initialize the variables. Acked-by: Eric Engestrom <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* glx: Sync <GL/glxext.h> with KhronosAdam Jackson2019-08-223-9/+8
| | | | | | | | Minor fixups required to keep the prototypes matching and to remove mention of retired enums. Acked-by: Eric Engestrom <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: honor scanout requirement from DRILionel Landwerlin2019-08-211-1/+3
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* i965: Add handling for fp16 configsKevin Strasser2019-08-211-1/+24
| | | | | | | | | | | | Expose configs when allow_fp16_configs has been enabled and DRI_LOADER_CAP_FP16 is set in the loader. Also, define a new dri configuration option so users can disable exposure of fp16 formats. Make fp16 opt-in for i965. Signed-off-by: Kevin Strasser <[email protected]> Reviewed-by: Adam Jackson <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* dri: Add fp16 formatsKevin Strasser2019-08-212-0/+22
| | | | | | | | | | | Add dri formats for RGBA ordered 64 bpp IEEE 754 half precision floating point. Leverage existing offscreen render support for MESA_FORMAT_RGBA_FLOAT16 and MESA_FORMAT_RGBX_FLOAT16. Signed-off-by: Kevin Strasser <[email protected]> Reviewed-by: Adam Jackson <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* dri: Handle configs with floating point pixel dataKevin Strasser2019-08-211-0/+5
| | | | | | | | | | | In order to handle pixel formats that consist of floating point data, enable floatMode field in the dri config, and set __DRI_ATTRIB_FLOAT_BIT in the render type attribute. Signed-off-by: Kevin Strasser <[email protected]> Reviewed-by: Adam Jackson <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* dri: Add config attributes for color channel shiftKevin Strasser2019-08-211-19/+49
| | | | | | | | | | | | | | | The existing mask attributes can only support up to 32 bpp. Introduce per-channel SHIFT attributes that indicate how many bits, from lsb towards msb, the bit field is offset. A shift of -1 will indicate that there is no bit field set for the channel. As old loaders will still be looking for masks, we set the masks to 0 for any formats wider than 32 bpp. Signed-off-by: Kevin Strasser <[email protected]> Reviewed-by: Adam Jackson <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* i965: Add helper function for allowed config formatsKevin Strasser2019-08-211-31/+34
| | | | | | | | | The driver checks dri config options and loader caps to filter out certain formats during config creation. Fold 4 call sites under a single helper function. Signed-off-by: Kevin Strasser <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* nir: Add explicit signs to image min/max intrinsicsJason Ekstrand2019-08-211-2/+4
| | | | | | | | | | | This better matches all the other atomic intrinsics such as those for SSBOs and shared variables where the sign is part of the intrinsic opcode. Both generators (GLSL and SPIR-V) know the sign from the type of the image variable or handle. In SPIR-V, signed min/max are separate opcodes from unsigned. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Enable OpenGL 4.6 for Gen8+Alejandro Piñeiro2019-08-212-2/+2
| | | | | | | | | The last remaining stuff was ARB_gl_spirv and ARB_spirv_extensions. Note that it is really likely that we can enable it for some Gen7 (as 4.5 was), but it was not tested yet. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965: enable ARB_gl_spirv extension and ARB_spirv_extensions for gen7+Alejandro Piñeiro2019-08-211-0/+3
| | | | | | v2: squashed the two enable patches with the docs one (Jason) Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/gen11: fix genX_bits.h include pathMauro Rossi2019-08-131-1/+1
| | | | | | | | | | | Instead of "genX_bits.h" use "genxml/genX_bits.h" as already done in other similar cases Besides being more correct, it also fixes building error in Android. Fixes: f0d2923 ("i965/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced.") Signed-off-by: Mauro Rossi <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* i965/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced.Rafael Antognolli2019-08-122-0/+88
| | | | | | | If the pixel pipes have a different number of subslices, emit a slice hashing table that will ensure proper workload distribution. v2: Set Mask field to 0xffff for workaround (Ken).
* intel/compiler: Fill a compiler statistics structJason Ekstrand2019-08-126-6/+7
| | | | | | | | | This commit is all annoying plumbing work which just adds support for a new brw_compile_stats struct. This struct provides a binary driver readable form of the same statistics we dump out to stderr when we INTEL_DEBUG is set with a shader stage. Reviewed-by: Lionel Landwerlin <[email protected]>
* i965/gen9: Optimize slice and subslice load balancing behavior.Francisco Jerez2019-08-125-6/+109
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The default pixel hashing mode settings used for slice and subslice load balancing are far from optimal under certain conditions (see the comments below for the gory details). The top-of-the-line GT4 parts suffer from a particularly severe performance problem currently due to a subslice load balancing issue. Fixing this seems to improve graphics performance across the board for most of the benchmarks in my test set, up to ~20% in some cases, e.g. from SKL GT4: unigine/valley: 3.44% ±0.11% gfxbench/gl_manhattan31: 3.99% ±0.13% gputest/pixmark_piano: 7.95% ±0.33% synmark/OglTexFilterAniso: 15.22% ±0.07% synmark/OglTexMem128: 22.26% ±0.06% Lower-end platforms are also affected by some subslice load imbalance to a lesser degree, especially during CCS resolve and fast clear operations, which are handled specially here due to rasterization ocurring in reduced CCS coordinates, which changes the semantics of the pixel hashing mode settings. No regressions seen during my tests on some SKL, KBL and BXT configurations. Additional benchmark reports welcome on any Gen9 platforms (that includes anything with Skylake, Broxton, Kabylake, Geminilake, Coffeelake, Whiskey Lake, Comet Lake or Amber Lake in your renderer string). P.S.: A similar problem is likely to be present on other non-Gen9 platforms, especially for CCS resolve and fast clear operations. Will follow-up with additional patches fixing the hashing mode for those once I have enough performance data to justify it. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/spirv: Lower shared memory laterCaio Marcelo de Oliveira Filho2019-08-101-0/+20
| | | | | | | | | | | Instead of asking spirv_to_nir to lower the workgroup (shared memory) to offsets, keep them as derefs longer, then lower it later on. Because Workgroup memory doesn't have explicit offsets, we need to set those using nir_lower_vars_to_explicit_types before calling the I/O lowering pass. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Use force_compat_profile driconf optionDanylo Piliaiev2019-08-102-2/+8
| | | | | Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: fix mem leak in error pathEric Engestrom2019-08-101-1/+3
| | | | | | Fixes: 8ae6667992ccca41d088 ("intel/perf: move query_object into perf") Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* i965: don't use p_compiler.h typesLionel Landwerlin2019-08-091-1/+1
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Eric Engestrom <[email protected]>
* meson: define ETIME to ETIMEDOUT if not presentGreg V2019-08-081-3/+0
| | | | Reviewed-by: Eric Engestrom <[email protected]>
* anv,i965,iris: deduplicate setting of total_sharedRhys Perry2019-08-081-2/+0
| | | | | | | | v5: add patch Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Emit a dummy MEDIA_VFE_STATE before switching from GPGPU to 3DDanylo Piliaiev2019-08-081-0/+21
| | | | | | | | | | | | | There is an object-level preemption workaround which requires this. However, even without object-level preemption, we seem to have issues with geometry flickering when 3D and compute are combined in the same batch and this appears to fix it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110395 Suggested-by: Jason Ekstrand <[email protected]> Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: [email protected]
* intel/perf: make perf context privateMark Janes2019-08-073-39/+27
| | | | | | Encapsulate the details of this data structure. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: print debug informationMark Janes2019-08-071-25/+8
| | | | | | | | | INTEL_DEBUG=perfmon will iterate over the perf queries, printing information about the state of each query. Some of this information will be private to intel/perf, and needs to a dump routine that can be called from i965. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: expose method to create queryMark Janes2019-08-071-10/+3
| | | | | | | By encapsulating this implementation within perf, we can eventually make struct gen_perf_ctx private. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move initialization of pipeline statistics metrics to gen_perfMark Janes2019-08-071-93/+3
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move get_query_data into gen_perfMark Janes2019-08-072-399/+2
| | | | | | | | | | | | | | | This refactor moves several helper functions for get_query_data as well: - accumulate_oa_reports - read_gt_frequency - get_pipeline_stats_data - get_oa_counter_data Functions which are no longer referenced in brw_performance_query.c have been removed. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move delete_query to gen_perfMark Janes2019-08-071-39/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move is_query_ready to gen_perfMark Janes2019-08-071-133/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move wait_query to perfMark Janes2019-08-071-38/+1
| | | | | | | | | | | | | The following methods have duplicate implementation of read_oa_samples_until in brw_performance_query.c: - read_oa_samples_for_query - read_oa_samples_until They ar still referenced by other methods in the file and will be removed on the subsequent commit. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: create a vtable entry for bo_busyMark Janes2019-08-071-4/+5
| | | | | | | Iris and i965 variants of this method need to be called by perf routines. Reviewed-by: Kenneth Graunke <[email protected]>