mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message	Author	Age	Files	Lines
*	meson: Force '.so' extension for DRI drivers	Jon Turney	2019-04-25	1	-0/+1
	DRI driver loadable modules are always installed with install_megadriver.py with names ending with '.so', irrespective of platform. Force the name the loadable module is built with to match, so install_megadriver.py doesn't spin trying to remove non-existent symlinks. Fixes: c77acc3c "meson: remove meson-created megadrivers symlinks"
*	st/mesa/radeonsi: fix race between destruction of types and shader compilation	Timothy Arceri	2019-04-24	6	-10/+10
	Commit 624789e3708c moved the destruction of types out of atexit() and made use of a ref count instead. This is useful for avoiding a crash where drivers such as radeonsi are still compiling in a thread when the app exits and has not called MakeCurrent to change from the current context. While the above scenario is technically an app bug, we shouldn't crash. However, that change caused another race condition between the shader compilation thread in radeonsi and context teardown functions. This patch makes two changes to fix this new problem: First, we explicitly call _mesa_destroy_shader_compiler_types() when destroying the st context rather than calling it indirectly via _mesa_free_context_data(). We do this as we must call it after st_destroy_context_priv() so that we don't destroy the glsl types before the compilation threads finish. Next, we wait for the shader threads to finish in si_destroy_context(). This also means we need to call context destroy before destroying the queues in si_destroy_screen(). Fixes: 624789e3708c ("compiler/glsl: handle case where we have multiple users for types") Reviewed-by: Marek Olšák <[email protected]>
*	i965: Tidy bogus indentation left by previous commit	Kenneth Graunke	2019-04-22	1	-26/+24
	I left code indented one level too far in the previous commit to make the diff easier to review. Drop that extra level now. Fixes: 6981069fc80 i965: Ignore uniform storage for samplers or images, use binding info Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Ignore uniform storage for samplers or images, use binding info	Kenneth Graunke	2019-04-22	3	-18/+28
	gl_nir_lower_samplers_as_deref creates new top level sampler and image uniforms which have been split from structure uniforms. i965 assumed that it could walk through gl_uniform_storage slots by starting at var->data.location and walking forward based on a simple slot count. This assumed that structure types were walked in a particular order. With samplers and images split out of structures, it becomes impossible to assign meaningful locations. Consider: struct S { sampler2D a; sampler2D b; } s[2]; The gl_uniform_storage locations for these follow this map: 0 => a[0], 1 => b[0], 2 => a[1], 3 => b[1]. But the new split variables look like: sampler2D lowered_a[2]; sampler2D lowered_b[2]; and there is no way to know that there's effectively a stride to get to the location for successive elements of a[] or b[]. So, working with location becomes effectively impossible. Ultimately, the point of looking at uniform storage was to pull out the bindings from the opaque index fields. gl_nir_lower_samplers_as_deref can obtain this information while doing the splitting, however, and sets up var->data.binding to have the desired values. We move gl_nir_lower_samplers before brw_nir_lower_image_load_store so gl_nir_lower_samplers_as_deref has the opportunity to set proper image bindings. Then, we make the uniform handling code skip sampler(-array) variables, and handle image param setup based on var->data.binding. Fixes Piglit tests/spec/glsl-1.10/execution/samplers/uniform-struct, this time without regressing dEQP-GLES2.functional.uniform_api.random.3. Fixes: f003859f97c nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref Reviewed-by: Jason Ekstrand <[email protected]>
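The stride problem described in this message can be sketched in a few lines. This is an illustration only, not Mesa code: `storage_location` and `MEMBERS` are hypothetical names, and the sketch assumes slots are assigned element by element for `struct S { sampler2D a; sampler2D b; } s[2];`.

```python
# Hypothetical illustration, not Mesa code: models how storage slots for
# `struct S { sampler2D a; sampler2D b; } s[2];` interleave per array element.
MEMBERS = ["a", "b"]  # assumed member order inside struct S


def storage_location(member: str, element: int) -> int:
    """Slot of s[element].member when slots are laid out element by element."""
    return element * len(MEMBERS) + MEMBERS.index(member)


# Slots as seen through the split variables lowered_a[2] and lowered_b[2].
lowered_a = [storage_location("a", i) for i in range(2)]
lowered_b = [storage_location("b", i) for i in range(2)]

# Successive elements are len(MEMBERS) slots apart, so a walk of
# "start location + element index" lands on the wrong slot.
naive_a1 = lowered_a[0] + 1

print("lowered_a slots:", lowered_a)
print("lowered_b slots:", lowered_b)
print("naive slot for lowered_a[1]:", naive_a1)
```

Nothing in the split variable `lowered_a` records that its elements are two slots apart, which is why the commit switches to var->data.binding instead of walking locations.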
*	i965: implement WaEnableStateCacheRedirectToCS	Lionel Landwerlin	2019-04-18	2	-0/+6
	This 3D performance workaround was initially put in the kernel but the media driver requires different settings so the register has been whitelisted in i915 [1] and userspace drivers are left initializing it as they wish. [1] : https://patchwork.freedesktop.org/series/59494/ Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
*	intel/perf: drop counter size field	Lionel Landwerlin	2019-04-17	2	-5/+6
	We can deduce the size from another field, let's just save some space. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
*	i965: perf: add mdapi pipeline statistics queries on gen10/11	Lionel Landwerlin	2019-04-17	1	-1/+9
	The Gen10+ expected format adds an additional counter which we can't disclose yet. We can still make the size of the expected query result match. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
*	i965: move mdapi guid into intel/perf	Lionel Landwerlin	2019-04-17	1	-2/+1
	One more thing we want to share between the different APIs. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
*	i965: move mdapi result data format to intel/perf	Lionel Landwerlin	2019-04-17	3	-96/+10
	We want to reuse this in Anv. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
*	i965: move brw_timebase_scale to device info	Lionel Landwerlin	2019-04-17	5	-19/+15
	Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
*	i965: move OA accumulation code to intel/perf	Lionel Landwerlin	2019-04-17	3	-167/+45
	We'll want to reuse this in our Vulkan extension. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
*	i965: move mdapi data structure to intel/perf	Lionel Landwerlin	2019-04-17	1	-96/+7
	We'll want to reuse those structures later on. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
*	i965: extract performance query metrics	Lionel Landwerlin	2019-04-17	23	-148117/+206
	We would like to reuse performance query metrics in other APIs. Let's make the query code dealing with the processing of raw counters into human-readable values API agnostic. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: store device revision in gen_device_info	Lionel Landwerlin	2019-04-17	3	-6/+4
	Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Move program key debugging to the compiler.	Kenneth Graunke	2019-04-16	9	-283/+36
	The i965 driver has a bunch of code to compare two sets of program keys and print out the differences. This can be useful for debugging why a shader needed to be recompiled on the fly due to non-orthogonal state dependencies. anv doesn't do recompiles, so we didn't need to share this in the past - but I'd like to use it in iris. This moves the bulk of the code to the compiler where it can be reused. To make that possible, we need to decouple it from i965 - we can't get at the brw program cache directly, nor use brw_context to print things. Instead, we use compiler->shader_perf_log(), and simply pass in keys. We put all of this debugging code in brw_debug_recompile.c, and only export a single function, for simplicity. I also tidied the code a bit while moving it, now that it all lives in one file. Reviewed-by: Jordan Justen <[email protected]>
*	Delete autotools	Dylan Baker	2019-04-15	10	-655/+0
	Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Acked-by: Marek Olšák <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Matt Turner <[email protected]>
*	intel: Emit 3DSTATE_VF_STATISTICS dynamically	Kenneth Graunke	2019-04-14	2	-6/+24
	Pipeline statistics queries should not count BLORP's rectangles. (23) How do operations like Clear, TexSubImage, etc. affect the results of the newly introduced queries? DISCUSSION: Implementations might require "helper" rendering commands be issued to implement certain operations like Clear, TexSubImage, etc. RESOLVED: They don't. Only application submitted rendering commands should have an effect on the results of the queries. Piglit's arb_pipeline_statistics_query-vert_adj exposes this bug when the driver is hacked to always perform glBufferData via a GPU staging copy (for debugging purposes). Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	glsl/nir: add support for lowering bindless images_derefs	Karol Herbst	2019-04-12	1	-1/+1
	v2: handle atomics as well; make use of nir_rewrite_image_intrinsic. v3: remove call to nir_remove_dead_derefs. v4: (Timothy Arceri) don't actually call lowering yet. Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (v3) Reviewed-by: Marek Olšák <[email protected]>
*	nir: move brw_nir_rewrite_image_intrinsic into common code	Karol Herbst	2019-04-12	1	-1/+1
	Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	intel/common: move gen_debug to intel/dev	Mark Janes	2019-04-10	4	-4/+4
	libintel_common depends on libintel_compiler, but it contains debug functionality that is needed by libintel_compiler. Break the circular dependency by moving gen_debug files to libintel_dev. Suggested-by: Kenneth Graunke <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Advertise NV_compute_shader_derivatives	Caio Marcelo de Oliveira Filho	2019-04-08	1	-0/+1
	Reviewed-by: Ian Romanick <[email protected]>
*	intel: add dependency on genxml generated files	Lionel Landwerlin	2019-04-08	1	-1/+1
	Drivers using genxml will start compilation before generated files are created, so add a dependency on them. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]> Cc: [email protected]
*	meson: strip rpath from megadrivers	Eric Engestrom	2019-04-01	1	-0/+3
	More specifically, use the library file that has been post-processed by Meson when creating the hardlinks. Bugs: https://bugs.freedesktop.org/show_bug.cgi?id=108766 Fixes: 3218056e0eb375eeda47 "meson: Build i965 and dri stack" Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
*	i965: perf: update render basic configs for big core gen9/gen10	Lionel Landwerlin	2019-04-01	8	-23/+24
	This update allows an MI_LRI to trigger an OA report write in the global OA buffer. This isn't really useful for us, we just keep close to the internal public configs. Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
*	i965: perf: add ring busyness metric for cfl gt2	Lionel Landwerlin	2019-04-01	1	-1/+165
	Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
*	i965: perf: enable Icelake metrics	Lionel Landwerlin	2019-03-31	3	-3/+11
	Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
*	i965: perf: add Icelake metrics	Lionel Landwerlin	2019-03-31	1	-0/+11899
	Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
*	i965: perf: sklgt2: drop programming of an unused NOA register	Lionel Landwerlin	2019-03-31	1	-11/+6
	Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
*	i965: perf: hsw: drop register programming not needed on HSW	Lionel Landwerlin	2019-03-31	1	-2/+1
	This register is flagged as IVB only in the documentation. Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
*	i965: perf: chv: fixup counters names	Lionel Landwerlin	2019-03-31	1	-25/+25
	Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
*	i965: perf: add PMA stall metrics	Lionel Landwerlin	2019-03-31	10	-10/+1140
	These are new metrics for Gen8/9 to measure the effect of the PMA stall workaround fix. Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
*	i965: perf: sklgt2: update memory write config	Lionel Landwerlin	2019-03-31	1	-7/+49
	This reworks the programming between older pre-production steppings & new ones. Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
*	i965: perf: sklgt2: update compute metrics config	Lionel Landwerlin	2019-03-31	1	-8/+2
	This unifies some of the programming between pre-production steppings and production ones. Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
*	i965: perf: sklgt2: update a priority for register programming	Lionel Landwerlin	2019-03-31	1	-2/+2
	This makes no difference in terms of programming, it's just a cleanup. Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
*	i965,iris/blorp: do not blit 0-sizes	Sergii Romantsov	2019-03-30	1	-1/+9
	There seems to be no sense in blitting 0-sized sources or destinations. Additionally, it may cause segfaults for i965. v2: Function call replaced with inline check. v3: Added check to avoid division by zero (L. Landwerlin). v4: Added similar check for Iris (L. Landwerlin). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110239 Signed-off-by: Sergii Romantsov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	i965/blorp: Remove unused parameter from blorp_surf_for_miptree.	Rafael Antognolli	2019-03-28	1	-24/+12
	It seems pretty useless nowadays. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965,iris,anv: Make alpha to coverage work with sample mask	Danylo Piliaiev	2019-03-25	1	-6/+12
	From "Alpha Coverage" section of SKL PRM Volume 7: "If Pixel Shader outputs oMask, AlphaToCoverage is disabled in hardware, regardless of the state setting for this feature." From OpenGL spec 4.6, "15.2 Shader Execution": "The built-in integer array gl_SampleMask can be used to change the sample coverage for a fragment from within the shader." From OpenGL spec 4.6, "17.3.1 Alpha To Coverage": "If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value is generated where each bit is determined by the alpha value at the corresponding sample location. The temporary coverage value is then ANDed with the fragment coverage value to generate a new fragment coverage value." Similar wording can be found in Vulkan spec 1.1.100 "25.6. Multisample Coverage". Thus we need to compute alpha to coverage dithering manually in the shader and replace the sample mask store with the bitwise-AND of the sample mask and the alpha to coverage dithering. The following formula is used to compute the final sample mask: m = int(16.0 * clamp(src0_alpha, 0.0, 1.0)); dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) | 0x0808 * (m & 2) | 0x0100 * (m & 1); sample_mask = sample_mask & dither_mask. Credits to Francisco Jerez <[email protected]> for creating it. It gives a number of ones proportional to the alpha for 2, 4, 8 or 16 least significant bits of the result. GEN6 hardware does not have an issue with simultaneous usage of sample mask and alpha to coverage; however, due to the wrong sending order of oMask and src0_alpha, it is still affected by it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743 Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
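The dither formula quoted in this message can be checked outside a shader. The sketch below transcribes it into Python for illustration; the function names `dither_mask`, `popcount` and `apply_alpha_to_coverage` are hypothetical and do not appear in the driver, which emits the equivalent operations as shader instructions.

```python
def dither_mask(src0_alpha: float) -> int:
    """Alpha-to-coverage dither mask, transcribed from the commit message formula."""
    clamped = min(max(src0_alpha, 0.0), 1.0)
    m = int(16.0 * clamped)  # m is in the range 0..16
    return (0x1111 * ((0xFEA80 >> (m & ~3)) & 0xF)
            | 0x0808 * (m & 2)
            | 0x0100 * (m & 1))


def popcount(value: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


def apply_alpha_to_coverage(sample_mask: int, src0_alpha: float) -> int:
    """Final sample mask: the shader's mask ANDed with the dither mask."""
    return sample_mask & dither_mask(src0_alpha)


# Print the mask for every representable alpha step.
for m in range(17):
    mask = dither_mask(m / 16.0)
    print(f"m={m:2d} mask=0x{mask:04x} ones={popcount(mask)}")
```

Over the full 16 bits the number of set bits equals m for every step, which is the proportionality the message describes; with fewer samples only the low bits of the mask are relevant.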
*	android: static link with libexpat with Android O+	Kishore Kadiyala	2019-03-25	1	-1/+9
	In Android O, Mesa needs to statically link libexpat so that it's in the same VNDK namespace. v2: apply change also to anv driver (Tapani) v3: use += in anv change (Eric Engestrom) Change-Id: I82b0be5c817c21e734dfdf5bfb6a9aa1d414ab33 Signed-off-by: Kishore Kadiyala <[email protected]> Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
*	i965/icl: Add WA_2204188704 to disable pixel shader panic dispatch	Anuj Phogat	2019-03-19	2	-0/+10
	Signed-off-by: Anuj Phogat <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	i965: Stop setting LowerBuferInterfaceBlocks	Jason Ekstrand	2019-03-15	1	-0/+4
	Instead, we do UBO and SSBO deref lowering in NIR after we've given it a chance to optimize SSBO access. Shader-db results on Kaby Lake: total instructions in shared programs: 15235775 -> 15235484 (<.01%); instructions in affected programs: 14992 -> 14701 (-1.94%); helped: 19; HURT: 20. total cycles in shared programs: 339220331 -> 339027307 (-0.06%); cycles in affected programs: 79831981 -> 79638957 (-0.24%); helped: 540; HURT: 602. total loops in shared programs: 4402 -> 4348 (-1.23%); loops in affected programs: 186 -> 132 (-29.03%); helped: 27; HURT: 0. total spills in shared programs: 23261 -> 23234 (-0.12%); spills in affected programs: 38 -> 11 (-71.05%); helped: 1; HURT: 0. total fills in shared programs: 31442 -> 31371 (-0.23%); fills in affected programs: 98 -> 27 (-72.45%); helped: 1; HURT: 0. LOST: 12; GAINED: 12. Most of the help and hurt in instruction counts was just churn caused by re-ordering of optimizations and the fact that the NIR deref lowering code is emitting slightly different instructions. Nothing was hurt by more than three instructions and most things weren't helped by more than four. The primary exception to this is one Car Chase shader: shaders/non-free/gfxbench4/carchase/341.shader_test CS SIMD32: 1144 -> 821 (-28.23%). There is also one compute shader in Manhattan 3.1 and a fragment shader in the UE4 Shooter Game demo that now get a loop partially unrolled. Those showed up in the results as hurt instructions but were manually removed to get the results above. The lost/gained was a dozen Car Chase shaders that went from SIMD8 to SIMD16 thanks to improved register pressure: shaders/non-free/gfxbench4/carchase/366.shader_test CS, shaders/non-free/gfxbench4/carchase/368.shader_test CS, shaders/non-free/gfxbench4/carchase/370.shader_test CS, shaders/non-free/gfxbench4/carchase/372.shader_test CS, shaders/non-free/gfxbench4/carchase/376.shader_test CS, shaders/non-free/gfxbench4/carchase/378.shader_test CS, shaders/non-free/gfxbench4/carchase/380.shader_test CS, shaders/non-free/gfxbench4/carchase/382.shader_test CS, shaders/non-free/gfxbench4/carchase/384.shader_test CS, shaders/non-free/gfxbench4/carchase/388.shader_test CS, shaders/non-free/gfxbench4/carchase/4.shader_test CS, shaders/non-free/gfxbench4/carchase/6.shader_test CS. Given how much it appeared to be improved, I ran Car Chase on my laptop. Unfortunately, I wasn't able to see any measurable improvement. It might be helped by 1-2% but it's in the noise. It does render correctly as far as I can tell so the improvement is legitimate. All of the loops that got deleted were in Dolphin uber shaders. I've had no opportunity to test them for correctness or performance. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
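The percentages in the shader-db summary are relative changes from the "before" count to the "after" count. The sketch below recomputes a few of them from the numbers quoted in the message; `percent_change` is a hypothetical helper, not part of shader-db.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent, rounded to 2 places."""
    return round((after - before) / before * 100.0, 2)


# (before, after) pairs copied from the commit message.
results = {
    "instructions in affected programs": (14992, 14701),
    "loops in affected programs": (186, 132),
    "spills in affected programs": (38, 11),
    "fills in affected programs": (98, 27),
    "carchase/341 CS SIMD32": (1144, 821),
}

for name, (before, after) in results.items():
    change = percent_change(before, after)
    print(f"{name}: {before} -> {after} ({change:+.2f}%)")
```

A negative value means the count went down, which is an improvement for instructions, cycles, loops, spills and fills.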
*	mesa: rename logging functions to reflect that they format strings	Mark Janes	2019-03-14	6	-43/+43
	In preparation for the definition of a function to log a formatted string. Reviewed-by: Erik Faye-Lund <[email protected]>
*	i965: Disable ARB_fragment_shader_interlock for platforms prior to GEN9	Plamena Manolova	2019-03-14	1	-1/+24
	ARB_fragment_shader_interlock depends on memory fences to ensure fragment ordering and this ordering guarantee is only supported from GEN9 onwards. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109980 Fixes: 939312702e35 "i965: Add ARB_fragment_shader_interlock support." Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: remove scaling factors from P010, P012	Tapani Pälli	2019-03-14	1	-2/+2
	This patch removes the scaling factors introduced in 2a2e69f975b but leaves the option to use scaling in place, as it could be useful with other upcoming YUV formats. We did this scaling because ffmpeg was shifting channel bits down; however, it seems this is not the right place, as the compositor wants to flip the same buffers directly to the display as well, and therefore bitshifting needs to be done by the client when receiving a frame from ffmpeg. Now P0x formats are treated the same, e.g. P010 is the same as P016 but with the lower 6 bits set to zeros. Fixes: 2a2e69f975b "i965: add P0x formats and propagate required scaling factors" Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
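The relationship between P010 and P016 stated in this message can be shown with a small sketch. It assumes the usual P010 layout, in which each 10-bit sample sits in the most significant bits of a 16-bit word; `p010_word` and `as_unorm16` are hypothetical helper names, not driver functions.

```python
def p010_word(sample10: int) -> int:
    """Place a 10-bit sample in the upper bits of a 16-bit word (assumed P010 layout)."""
    if not 0 <= sample10 < (1 << 10):
        raise ValueError("sample must fit in 10 bits")
    return sample10 << 6  # the lower 6 bits stay zero


def as_unorm16(word: int) -> float:
    """Read a 16-bit word as a normalized value, the way a P016 sample is read."""
    return word / 0xFFFF


word = p010_word(1023)  # brightest 10-bit sample
print(hex(word), word & 0x3F, as_unorm16(word))
```

Read this way, a P010 buffer needs no per-format scaling factor in the driver, at the cost of the brightest sample landing slightly below 1.0.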
*	i965: Reimplement all the PIPE_CONTROL rules.	Kenneth Graunke	2019-03-11	1	-136/+403
	This implements virtually all documented PIPE_CONTROL restrictions in a centralized helper. You now simply ask for the operations you want, and the pipe control "brain" will figure out exactly what pipe controls to emit to make that happen without tanking your system. The hope is that this will fix some intermittent flushing issues as well as GPU hangs. However, it also has a high risk of causing GPU hangs and other regressions, as this is a particularly sensitive area and poking the bear isn't always advisable. Mark Janes noted that this patch helps with some GPU hangs on Icelake. This does re-enable the VF Invalidate => Write Immediate workaround on Gen8, which had been disabled (bug 103787) due to GPU hangs. The old code did this workaround after another which would have added CS stall bits, so it missed a workaround. The new code orders them properly and appears to work. v4: Don't pass "bo, offset, imm" to a recursive CS stall (caught by Topi Pohjolainen), drop Gen10 workarounds that are unnecessary for production hardware. Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965: Use genxml for emitting PIPE_CONTROL.	Kenneth Graunke	2019-03-11	7	-230/+362
	While this does add a bunch of boilerplate, it also protects us against the hardware moving bits, or changing their meaning. For something as finicky as PIPE_CONTROL, the extra safety seems worth it. We turn PIPE_CONTROL_* into a bitfield of arbitrary flags, and then pack them appropriately. Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965: Rename ISP_DIS to INDIRECT_STATE_POINTERS_DISABLE.	Kenneth Graunke	2019-03-11	2	-2/+2
	Clearer name. Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965: Move some genX infrastructure to genX_boilerplate.h.	Kenneth Graunke	2019-03-11	4	-128/+174
	This will let us make multiple genX_*.c files, without copy and pasting all this boilerplate. Reviewed-by: Topi Pohjolainen <[email protected]>
*	isl: Add a swizzle parameter to isl_buffer_fill_state()	Kenneth Graunke	2019-03-07	1	-0/+1
	This is necessary for legacy texture buffer object formats, where we'll need to use a swizzle to fake e.g. luminance. Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/decoders: handle decoding MI_BBS from ring	Lionel Landwerlin	2019-03-07	1	-1/+1
	An MI_BATCH_BUFFER_START in the ring buffer acts as a second level batchbuffer (i.e. execution jumps back to the ring buffer when running into an MI_BATCH_BUFFER_END). Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
*	intel/decoders: add address space indicator to get BOs	Lionel Landwerlin	2019-03-07	1	-1/+1
	Some commands like MI_BATCH_BUFFER_START have this indicator. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>