mesa.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	i965/CFL: Add PCI Ids for Coffee Lake.	Anusha Srivatsa	2017-06-22	2	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Coffee Lake has a gen9 graphics following KBL. From 3D perspective, CFL is a clone of KBL/SKL features. v2: Change commit message, correct alignment <Anuj Phogat> v3: Update IDs. v4: Initialize l3_banks, correct nomenclature <Anuj> Cc: Rodrigo Vivi <[email protected]> Signed-off-by: Anusha Srivatsa <[email protected]> Acked-by: Benjamin Widawsky <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
*	intel: compiler/i965: fix is_broxton checks	Lionel Landwerlin	2017-06-20	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	In 5f2fe9302c, is_geminilake was introduced to differentiate Broxton from Geminilake. Unfortunately I failed to verify that is_broxton is used throughout the code base to mean Gen9lp. Fixes: 5f2fe9302c ("intel: common: add flag to identify platforms by name") Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/cnl: Add l3 configuration for Cannonlake	Ben Widawsky	2017-06-20	1	-1/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	V2 (Anuj): Squash the changes in one patch rebase on master. Address the review comments made by Francisco Jerez. Do the URB allocation per slice (not per bank). V3 (Anuj): Update the comment. Format the table as other l3 config tables. Signed-off-by: Ben Widawsky <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> --- V1 was sent out with the heading: "i965/cnl: Properly handle l3 configuration"
*	i965: Add a variable for way size per bank in get_l3_way_size()	Anuj Phogat	2017-06-20	1	-5/+4
\| \| \| \| \| \| \| \| \| \|	Adding this variable better explains the computation of L3 way size in the function. V2: Use const variable for way_size_per_bank. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965: Fix broxton 2x6 l3 config	Anuj Phogat	2017-06-20	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \|	The new table added in this patch matches with the table in gfxspecs. We were programming the wrong values earlier. V2: Update the comment. Cc: "17.1" <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	intel: common: add number of thread per eu	Lionel Landwerlin	2017-06-19	2	-2/+28
\| \| \| \| \| \| \|	This will be used to normalize OA counters. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	intel: common: express timestamps units in frequency	Lionel Landwerlin	2017-06-19	2	-11/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Store the frequency rather than the period, which as a double loses some precision. Also fixes the Gen9LP timestamp frequency, which is not 19200123 but 19200000, as pointed out by Ville: https://lists.freedesktop.org/archives/intel-gfx/2017-April/125126.html Finally, add the Cannonlake timestamp frequency. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
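To illustrate the commit above, here is a minimal sketch of converting raw timestamp ticks to nanoseconds from an integer frequency. The function name `ticks_to_ns` is hypothetical and is not the helper Mesa uses; only the 19200000 Hz figure comes from the message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not Mesa's API: convert raw GPU timestamp ticks to
 * nanoseconds using an integer frequency in Hz. Splitting the value into
 * whole seconds and a remainder keeps the intermediate product small. */
static uint64_t
ticks_to_ns(uint64_t ticks, uint64_t frequency_hz)
{
   assert(frequency_hz != 0);
   const uint64_t seconds = ticks / frequency_hz;
   const uint64_t remainder = ticks % frequency_hz;
   return seconds * UINT64_C(1000000000) +
          remainder * UINT64_C(1000000000) / frequency_hz;
}
```

Because the frequency is an exact integer, one full second of ticks at 19200000 Hz maps to exactly 1000000000 ns, with no rounding carried in a stored period.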
*	intel: common: add flag to identify platforms by name	Lionel Landwerlin	2017-06-19	2	-6/+24
\| \| \| \| \| \| \| \|	The perf infrastructure needs to identify specific platforms, not just generations. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/cnl: Add a preliminary device for Cannonlake	Ben Widawsky	2017-06-09	1	-0/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	v2 (Anuj): Rebased on master and updated pci ids Remove redundant initialization of max_wm_threads to 64 * 12. For gen9+ max_wm_threads are initialized in gen_get_device_info(). v3 (Anuj): Move the patch to end of series. Remove unused gt1, gt2, gt3 functions. Remove l3_banks variable. Variable is now available on master. Signed-off-by: Anuj Phogat <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
*	i965/cnl: Handle gen10 in switch cases across the driver	Anuj Phogat	2017-06-09	1	-0/+1
\| \| \| \| \| \| \| \| \|	V2: Start using gen10 functions isl_gen10*(), gen10_blorp_exec() gen10_init_atoms() (Jason) Remove Vulkan changes. Do them later in a separate patch. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Make feature macros gen8 based	Ben Widawsky	2017-06-09	1	-8/+5
\| \| \| \| \| \| \| \| \| \|	All the "features" of the hardware are similar starting with GEN8, so remove as much of the GEN9 uniqueness as possible. This makes implementing future gen platforms a bit easier. Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	intel: Fix broxton 2x6 way size computation	Anuj Phogat	2017-06-06	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch is undoing the changes to way size computation in broxton 2x6, made by below commit: Commit: 0d576fbfbe912cf3fb9ab594bb31eb58bccf2138 Author: Anuj Phogat <[email protected]> i965: Simplify l3 way size computations By making use of l3_banks field in gen_device_info struct l3_way_size for gen7+ = 2 * l3_banks. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101306 Signed-off-by: Anuj Phogat <[email protected]> Tested-by: Mark Janes <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
*	intel: gen-decoder: rework how we handle groups	Lionel Landwerlin	2017-06-06	2	-86/+161
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current way of handling groups doesn't seem to be able to handle MI_LOAD_REGISTER_* with more than one register. This change reworks the way we handle groups by building a traversal list on loading the GENXML files. Let's say you have:
	Instruction {
	   Field0
	   Field1
	   Field2
	   Group0 (count=2) {
	      Field0-0
	      Field0-1
	   }
	   Group1 (count=4) {
	      Field1-0
	      Field1-1
	   }
	}
	We build a linked list on load that goes: Instruction -> Group0 -> Group1. All of those are gen_group structures, making the traversal trivial. We just need to iterate groups the right number of times (count field in genxml). The fancier case is when you have only a single group of unknown size (count=0). In that case we keep on reading that group for as long as we're within the DWordLength of that instruction. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
*	i965: Change INTEL_DEBUG=vec4 to INTEL_SCALAR_VS for consistency.	Kenneth Graunke	2017-06-05	2	-2/+1
\| \| \| \| \| \| \| \| \|	We moved to INTEL_SCALAR_* when we added more than a single stage, but never went back and converted the VS to work that way. Be consistent. Also update the documentation to actually mention these debug variables. Acked-by: Jason Ekstrand <[email protected]>
*	i965: Simplify l3 way size computations	Anuj Phogat	2017-06-02	1	-10/+2
\| \| \| \| \| \| \| \| \| \| \|	By making use of l3_banks field in gen_device_info struct l3_way_size for gen7+ = 2 * l3_banks. V2: Keep the get_l3_way_size() function. Suggested-by: Francisco Jerez <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965: Add and initialize l3_banks field for gen7+	Anuj Phogat	2017-06-02	2	-3/+27
\| \| \| \| \| \| \| \| \| \| \|	This new field helps simplify l3 way size computations in next patch. V2: Initialize the l3_banks to 0 in macros. Suggested-by: Francisco Jerez <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	genxml: Fix decoder to print the array element on field members.	Kenneth Graunke	2017-06-01	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously we'd print things like: 0xfffbb568: 0x00010000 : Dword 1 ReadLength: 0 ReadLength: 1 0xfffbb568: 0x00000001 : Dword 1 ReadLength: 1 ReadLength: 0 instead of the more obvious: 0xfffbb568: 0x00010000 : Dword 1 ReadLength[0]: 0 ReadLength[1]: 1 0xfffbb568: 0x00000001 : Dword 1 ReadLength[2]: 1 ReadLength[3]: 0 (Yes, the ralloc context here is bogus - the decoder leaks just about everything. We need to use proper ralloc contexts someday...) Acked-by: Lionel Landwerlin <[email protected]>
*	genxml: Fix decoding of array groups.	Kenneth Graunke	2017-06-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If you had a group as the first element of a struct, i.e. <struct name="3DSTATE_CONSTANT_BODY" length="10"> <group count="4" start="0" size="16"> <field name="ReadLength" start="0" end="15" type="uint"/> </group> ... </struct> we would get a group_offset of 0, causing create_field() to think the field wasn't in a group, and fail to offset forward for successive array elements. So we'd mark all the array elements as offset 0. Using ctx->group->elem_size is a better check for "are we in a group?". Acked-by: Lionel Landwerlin <[email protected]>
*	genxml: Fix decoder for groups with multiple fields.	Kenneth Graunke	2017-06-01	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If you have something like: <group count="0" start="96" size="32"> <field name="Entry_0" start="0" end="15" type="GATHER_CONSTANT_ENTRY"/> <field name="Entry_1" start="16" end="31" type="GATHER_CONSTANT_ENTRY"/> </group> We would reset ctx->group_count to 0 after processing the first field, so the second would not have a group count. This is largely untested, as the only groups with multiple fields are packets we don't emit in Mesa. Found by inspection. Acked-by: Lionel Landwerlin <[email protected]>
*	intel/decoder: Handle the BLT ring in gen_group_get_length	Jason Ekstrand	2017-05-26	1	-0/+4
\| \| \| \|	Reviewed-by: Jordan Justen <[email protected]>
*	intel/decoder: Handle gen4 VF_STATISTICS and PIPELINE_SELECT	Jason Ekstrand	2017-05-26	1	-2/+7
\| \| \| \| \| \| \|	These need special handling because they have no "DWord Length" parameter and they have an unusual bias of 1. Reviewed-by: Jordan Justen <[email protected]>
*	intel/decoder: Fix indentation	Matt Turner	2017-05-15	1	-4/+4
\| \| \| \|	Reviewed-by: Iago Toral Quiroga <[email protected]>
*	intel: gen-decoder: fix xml parser leak	Lionel Landwerlin	2017-05-15	1	-6/+7
\| \| \| \| \| \| \| \|	In the unlikely case the parsing of genxml files fails, we were leaking an xml parser object. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
*	i965: Drop INTEL_DEBUG=stats.	Kenneth Graunke	2017-05-10	2	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For whatever reason, we had an INTEL_DEBUG=stats option that enabled various statistics counters on Gen4-5 systems. It's been around forever, though I can't think of a single time that it's been useful. On Gen6+, we enable statistics all the time because they're necessary to support various query object targets. Turning them off would break those queries. Gen4-5 don't support those queries, so the statistics counters generally aren't useful; we disabled them by default. This patch disables them altogether. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
*	intel: gen decoder: don't check for size_t negative values	Lionel Landwerlin	2017-05-09	1	-1/+1
\| \| \| \| \| \| \| \| \|	We should get either 0 or 1 here. CID: 1373562 (Control flow issues) Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Matt Turner <[email protected]>
*	intel/aubinator: Correctly read variable length structs.	Rafael Antognolli	2017-04-24	2	-6/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before this commit, when a group with count="0" is found, only one field is added to the struct representing the instruction. This causes only one entry to be printed by aubinator, for variable length groups. With this commit we "detect" that there's a variable length group (count="0") and store the offset of the last entry added to the struct when reading the xml. When finally reading the aubdump file, we check the size of the group and whether we have variable number of elements, and in that case, reuse the last field to add the remaining elements. Signed-off-by: Rafael Antognolli <[email protected]> Tested-by: Jason Ekstrand <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
*	intel/decoder: Fix is_header_field starting condition.	Kenneth Graunke	2017-04-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Starting positions >= 32 are not part of the header, rather than >. Caught by Coverity, which found that "bits <<= field->start" may shift by 32, which has undefined behavior. CID: 1404968 Reviewed-by: Lionel Landwerlin <[email protected]>
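The undefined shift described above can be sketched as follows. The struct `field_range` and the function `overlaps_header` are hypothetical stand-ins written for illustration, not the decoder's real types; the point is only that a start position of 32 must be rejected before any 32-bit shift happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the decoder's field description. */
struct field_range {
   int start; /* first bit, counted from the start of the packet */
   int end;   /* last bit, inclusive */
};

/* Returns true when the field overlaps header_mask in dword 0. */
static bool
overlaps_header(const struct field_range *f, uint32_t header_mask)
{
   assert(f->start >= 0 && f->end >= f->start);

   /* Bit 32 already belongs to dword 1, and shifting a 32-bit value by 32
    * is undefined, so the comparison has to be >= rather than >. */
   if (f->start >= 32)
      return false;

   const int last = f->end < 31 ? f->end : 31;
   const int width = last - f->start + 1;
   /* Build the mask in 64 bits so a full 32-bit wide field stays defined. */
   const uint32_t bits =
      (uint32_t)(((UINT64_C(1) << width) - 1) << f->start);
   return (bits & header_mask) != 0;
}
```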
*	intel/gen_decoder: return -1 for unknown command formats	Jordan Justen	2017-04-06	1	-7/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Decoding with aubinator encountered a command of 0xffffffff. With the previous code, it caused aubinator to jump 255 + 2 dwords to start decoding again. Instead we can attempt to detect the known instruction formats. If the format is not recognized, then we can advance just 1 dword. v2: * Update aubinator_error_decode * Actually convert the length variable returned into a signed integer in aubinator.c, intel_batchbuffer.c and aubinator_error_decode.c. Signed-off-by: Jordan Justen <[email protected]> Acked-by: Lionel Landwerlin <[email protected]>
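A minimal sketch of the resynchronization described above: a length function that returns -1 for unrecognized formats, and a walker that then advances by a single dword. `toy_length` and its 0x7a marker are invented for this example and do not reflect real command encodings; only the bias of 2 and the -1 convention come from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy format for illustration only: a top byte of 0x7a marks a known
 * command whose low byte holds (length - 2). Anything else is unknown. */
static int
toy_length(uint32_t header)
{
   if ((header >> 24) != 0x7a)
      return -1;
   return (int)(header & 0xff) + 2;
}

/* Walks a batch and returns how many packets were recognized. */
static size_t
count_known_packets(const uint32_t *batch, size_t dwords)
{
   assert(batch != NULL || dwords == 0);

   size_t known = 0;
   size_t i = 0;
   while (i < dwords) {
      const int length = toy_length(batch[i]); /* signed on purpose */
      if (length <= 0) {
         i += 1; /* unknown format: retry one dword later */
         continue;
      }
      known++;
      i += (size_t)length;
   }
   return known;
}
```

Keeping the length signed is what lets the caller tell "unknown" apart from a real length, instead of treating 0xffffffff as a 255 + 2 dword packet.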
*	intel/gen_decoder: Fix length for Media State/Object commands	Jordan Justen	2017-04-06	1	-2/+10
\| \| \| \| \| \| \| \|	From BDW PRM, Volume 6: Command Stream Programming, 'Render Command Header Format'. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	intel: tools: add aubinator_error_decode tool	Lionel Landwerlin	2017-04-04	2	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is pretty much the same tool as what i-g-t has, only with a more fancy decoding of the instructions/registers. It also doesn't support anything before gen4. v2 (from Matt): Drop authors Remove undefined automake variable v3: Fix incorrect offsets for dword > 1 (Jordan) v4: Fix decompression error with large blobs (Jordan) Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Matt Turner <[email protected]>
*	aubinator/gen_decoder/i965: decode instructions from dword 0	Lionel Landwerlin	2017-04-03	2	-5/+18
\| \| \| \| \| \| \| \| \|	Some packets like 3DSTATE_VF_STATISTICS, 3DSTATE_DRAWING_RECTANGLE, 3DPRIMITIVE, PIPELINE_SELECT, etc... have configurable fields in dword0, we probably want to print those. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	intel: gen_decoder: store pointer to current decoded field in iterator	Lionel Landwerlin	2017-04-03	2	-25/+26
\| \| \| \| \|	Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	intel: genxml: compress all gen files into one	Lionel Landwerlin	2017-03-31	1	-40/+22
\| \| \| \| \| \| \| \| \| \| \| \| \|	Combining all the files into a single string didn't make any difference in the size of the aubinator binary. With this change we now also embed gen4/4.5/5 descriptions, which increases the aubinator size by ~16Kb. v2 (Lionel): rebase makefiles Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	intel/common: consistently use ifndef guards over pragma once	Emil Velikov	2017-03-22	1	-1/+5
\| \| \| \| \| \| \| \|	Signed-off-by: Emil Velikov <[email protected]> Acked-by: Lionel Landwerlin <[email protected]> Acked-by: Vedran Miletić <[email protected]> Acked-by: Juha-Pekka Heikkila <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
*	intel: Move tools/decoder.[ch] to common/gen_decoder.[ch].	Kenneth Graunke	2017-03-21	2	-0/+1000
\| \| \| \| \| \| \|	This way they become part of libintel_common.la so I can use them in the i965 driver. Reviewed-by: Emil Velikov <[email protected]>
*	intel: Add a INTEL_DEBUG=color option.	Kenneth Graunke	2017-03-21	2	-0/+2
\| \| \| \| \| \| \|	This will be used for color output in debug messages. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
*	i965: Allow a per gen timebase scale factor	Robert Bragg	2017-03-17	2	-2/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Prior to Skylake the Gen HW timestamps were driven by a 12.5MHz clock with the convenient property of being able to scale by an integer (80) to nanosecond units. For Skylake the frequency is 12MHz, or a scale factor of 83.333333. This updates gen_device_info to track a floating point timebase_scale factor and makes corresponding _queryobj.c changes to no longer assume a scale factor of 80 works across all gens. Although the gen6_ code could have been left alone, the changes keep the code more comparable, and it now shares a few utility functions for scaling raw timestamps and calculating deltas. The utility for calculating deltas takes into account 32 or 36bit overflow depending on the current kernel version. Note: this leaves the timestamp handling of ARB_query_buffer_object untouched, which continues to use an incorrect scale of 80 on Skylake for now. This is more awkward to solve since the scaling is currently done using a very limited uint64 ALU available to the command parser that doesn't support multiply or divide, where it's already taking a large number of instructions just to effectively multiply by 80. This fixes piglit arb_timer_query-timestamp-get on Skylake. v2: (Ken) Update timebase_scale for platforms past Skylake/Broxton too. Signed-off-by: Robert Bragg <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
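The two utilities mentioned above (scaling raw timestamps and computing deltas across a 32 or 36 bit wrap) might look roughly like this. The function names are hypothetical and this is a sketch under those assumptions, not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scale raw ticks to nanoseconds with a floating
 * point factor (80.0 for 12.5 MHz, 1e9 / 12e6 = 83.333... for 12 MHz). */
static uint64_t
raw_timestamp_to_ns(uint64_t raw, double timebase_scale)
{
   return (uint64_t)((double)raw * timebase_scale);
}

/* Hypothetical helper: difference between two raw timestamps read from a
 * counter that wraps after `bits` bits (32 or 36 in the message above). */
static uint64_t
raw_timestamp_delta(uint64_t start, uint64_t end, unsigned bits)
{
   assert(bits > 0 && bits < 64);
   const uint64_t mask = (UINT64_C(1) << bits) - 1;
   /* Unsigned subtraction wraps modulo 2^64; masking reduces it to the
    * counter's own width, which handles a single overflow. */
   return (end - start) & mask;
}
```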
*	intel/debug: Add a common INTEL_DEBUG=nohiz option	Jason Ekstrand	2017-03-14	2	-0/+2
\| \| \| \| \| \| \| \|	The GL driver had a driconf option (which doesn't make much sense) and the Vulkan driver had a hand-rolled environment variable. Instead, let's tie both into the INTEL_DEBUG mechanism and unify things. Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965: Remove use of deprecated drm_intel_aub routines	Chris Wilson	2017-03-07	2	-20/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	With mesa/drm commit cd2f91e18db087edf93fed828e568ee53b887860 Author: Kristian Høgsberg Kristensen <[email protected]> Date: Fri Jul 31 10:47:50 2015 -0700 intel: Drop aub dumping functionality the drm_intel_aub routines are mere stubs and do nothing. Likewise remove our invocations. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: don't require 64bit cmpxchg	Grazvydas Ignotas	2017-03-06	1	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \|	There are still some distributions trying to support unfortunate people with old or exotic CPUs that don't have 64bit atomic operations. The only thing preventing compile of the Intel driver for them seems to be initialization of a debug variable. v2: use call_once() instead of unsafe code, as suggested by Matt Turner Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93089 Signed-off-by: Grazvydas Ignotas <[email protected]>
*	i965: Move intel_debug.h to intel/common/gen_debug.h	Jason Ekstrand	2017-03-01	2	-0/+243
\| \| \| \| \| \| \| \| \| \|	This is shared between the Vulkan and GL drivers as it's a requirement of the back-end compiler. However, it doesn't really belong in the compiler. We rename the file to match the prefix of the other stuff in common and because libdrm defines an intel_debug.h and this avoids a pile of possible name conflicts. Reviewed-by: Anuj Phogat <[email protected]>
*	i965: Fix a mistake from porting the URB allocation code to arrays.	Kenneth Graunke	2016-11-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 6d416bcd846a49414f210cd761789156c37a7b3e (i965: Use arrays in Gen7+ URB code.) introduced a regression which caused us to fail to allocate all of our URB space.
	- total_wants -= ds_wants;
	+ total_wants -= additional;
	The new line should have been total_wants -= wants[i]. Fixes a large performance regression in TessMark. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98815 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
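Why the subtracted term matters can be seen in a simplified sketch of a proportional hand-out loop. This is an integer-only illustration with hypothetical names (`distribute`, `NUM_STAGES`), assumed for the example; it is not the driver's URB code.

```c
#include <assert.h>

#define NUM_STAGES 4 /* illustrative stage count */

/* Hands out `remaining` chunks in proportion to each stage's request and
 * returns whatever could not be assigned. */
static unsigned
distribute(unsigned chunks[NUM_STAGES], const unsigned wants[NUM_STAGES],
           unsigned remaining)
{
   unsigned total_wants = 0;
   for (int i = 0; i < NUM_STAGES; i++)
      total_wants += wants[i];

   for (int i = 0; i < NUM_STAGES && total_wants > 0; i++) {
      /* Share of what is left, relative to the requests not yet served. */
      const unsigned additional = remaining * wants[i] / total_wants;
      chunks[i] += additional;
      remaining -= additional;
      /* Subtract the request, not the grant, so the ratio stays correct
       * for the stages that follow. */
      total_wants -= wants[i];
   }
   return remaining;
}
```

With requests of {4, 0, 2, 2} and 16 chunks to hand out, this assigns {8, 0, 4, 4} and leaves nothing over. Subtracting `additional` instead would drop total_wants to 0 after the first stage and leave 8 chunks unassigned.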
*	intel/common: Add an is_kabylake field to gen_device_info	Jason Ekstrand	2016-11-22	2	-0/+6
\| \| \| \| \| \| \| \|	Most of the 3-D engine in Kaby Lake is identical to Sky Lake. However, there are a few small differences that we need to be able to detect. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	intel: Share URB configuration code between GL and Vulkan.	Kenneth Graunke	2016-11-19	2	-0/+207
\| \| \| \| \| \| \| \| \|	This code is far too complicated to cut and paste. v2: Update the newly added genX_gpu_memcpy.c; const a few things. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
*	intel: Convert devinfo->urb.min_*_entries into an array.	Kenneth Graunke	2016-11-19	2	-30/+63
\| \| \| \| \|	Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
*	intel: Convert devinfo->urb.max_*_entries into an array.	Kenneth Graunke	2016-11-19	2	-60/+92
\| \| \| \| \|	Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965: Consolidate GEN9 LP definition	Ben Widawsky	2016-11-15	1	-80/+42
\| \| \| \| \| \|	Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/glk: Add basic Geminilake support	Ben Widawsky	2016-11-15	1	-0/+46
\| \| \| \| \| \| \| \| \| \|	v2: s/bdw/gen; Add the 2x6 config v3: Add min_ds_entries Cc: "13.0" <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	intel: Set min_ds_entries on Broxton.	Kenneth Graunke	2016-11-15	1	-0/+2
\| \| \| \| \| \| \| \|	This was missing. Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
*	mesa: Fix pixel shader scratch space allocation on Gen9+ platforms.	Kenneth Graunke	2016-11-09	1	-14/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We had missed a bit of errata - PS scratch needs to be computed as if there were 4 subslices per slice, rather than 3.
	                        Skylake             Broxton    Kabylake
	                        GT1  GT2  GT3  GT4  2x6  3x6   GT1  GT1.5  GT2  GT3  GT4
	Actual Slices             1    1    2    3    1    1     1      1    1    2    3
	Total Subslices           3    3    6    9    2    3     2      3    3    6    9
	Subsl. for PS Scratch     4    4    8   12    4    4     4      4    4    8   12
	Note that Skylake GT1-3 already worked because we allocated 64 * 9 (trying to use a value that would work on GT4, with 9 subslices), and the actual required values were 64 * 4 or 64 * 8. However, all others (Skylake GT4, Broxton, and Kabylake GT1-4) underallocated, which can lead to scratch writes trashing random process memory, and rendering corruption or GPU hangs. Fixes GPU hangs and rendering corruption on Skylake GT4 in shaders that spill. Particularly, dEQP-GLES31.functional.ubo.all_per_block_buffers.* now runs successfully with no hangs and renders correctly. This may fix problems on Broxton and Kabylake as well. Cc: "13.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
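The rule in the message above reduces to a small calculation, sketched here. The function name is hypothetical, and the factor of 64 is taken from the "64 * 4" and "64 * 8" figures in the message rather than from the driver source.

```c
#include <assert.h>

/* Hypothetical helper: size PS scratch as if every slice had 4 subslices,
 * regardless of how many subslices the part really has. */
static unsigned
ps_scratch_thread_count(unsigned slices)
{
   assert(slices > 0);
   const unsigned subslices_for_scratch = 4 * slices;
   return 64 * subslices_for_scratch;
}
```

For a three-slice part such as Skylake GT4 this gives 64 * 12, more than the 64 * 9 that was allocated before the fix.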