mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	i965: Add a brw_hw_type_to_reg_type() function	Matt Turner	2017-08-21	2	-0/+29
\| \| \| \| \| \|	Will be used in later commits. Reviewed-by: Scott D Phillips <[email protected]>
*	i965: Use a common table to translate logical to hardware types	Matt Turner	2017-08-21	1	-36/+29
\| \| \| \|	Reviewed-by: Scott D Phillips <[email protected]>
*	i965: Extract functions dealing with register types to separate file	Matt Turner	2017-08-21	4	-140/+207
\| \| \| \| \| \| \| \| \| \|	I'm going to encapsulate all of the logic dealing with register types in this file. Rename the parameters for the hardware encodings from type -> hw_type at the same time. Reviewed-by: Scott D Phillips <[email protected]>
*	i965: Reverse file/type arguments to register type functions	Matt Turner	2017-08-21	4	-13/+15
\| \| \| \| \| \| \|	I think of the initial arguments as "state" and the last as the actual subject. Reviewed-by: Scott D Phillips <[email protected]>
*	i965: Add support for disassembling 64-bit integer immediates	Matt Turner	2017-08-21	2	-0/+13
\| \| \| \| \| \| \|	After the last patch converted things into enums, I helpfully got a compiler warning about these missing from the switch statement. Reviewed-by: Scott D Phillips <[email protected]>
*	i965: Use separate enums for register vs immediate types	Matt Turner	2017-08-21	6	-129/+144
\| \| \| \| \| \| \|	The hardware encodings often mean different things depending on whether the source is an immediate. Reviewed-by: Scott D Phillips <[email protected]>
*	i965: Reorder brw_reg_type enum values	Matt Turner	2017-08-21	5	-26/+21
\| \| \| \| \| \| \| \| \| \| \|	These vaguely corresponded to the hardware encodings, but that is purely historical at this point. Reorder them so we stop making things "almost work" when mixing enums. The ordering has been chosen so that no enum value is the same as a compatible hardware encoding. Reviewed-by: Scott D Phillips <[email protected]>
*	i965: Validate destination restrictions with vector immediates	Matt Turner	2017-08-21	3	-12/+141
\| \| \| \|	Reviewed-by: Scott D Phillips <[email protected]>
*	i965: Don't let raw-move check be tricked by immediate vector types	Matt Turner	2017-08-21	1	-3/+10
\| \| \| \| \| \| \|	UB and B type encodings are the same as UV and VF. Noticed when writing the following patch. Reviewed-by: Scott D Phillips <[email protected]>
*	i965: Only change type of 0.0f to VF if destination stride == 1	Matt Turner	2017-08-21	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	The destination stride must be equivalent to a dword if VF is used. Also, since the only compaction table entries with "i:vf" have the destination as "r:f", specifically check that the destination is of type float. Reviewed-by: Scott D Phillips <[email protected]>
*	i965: Remove CONT/BREAK from instruction compaction test	Matt Turner	2017-08-21	1	-4/+0
\| \| \| \| \| \| \|	These cannot be compacted. A similar mistake was fixed in commit 90eaf01616a8. Reviewed-by: Scott D Phillips <[email protected]>
*	i965: Test instruction compaction on all supported Gens	Matt Turner	2017-08-21	1	-8/+42
\| \| \| \| \| \| \| \|	Note that there's no point in testing on G45, since its compaction is the same as Gen5. Same logic applies to Gen7 variants and low-power parts. Reviewed-by: Scott D Phillips <[email protected]>
*	i965: Silence signed/unsigned comparison warning	Matt Turner	2017-08-21	1	-1/+1
\| \| \| \|	Reviewed-by: Scott D Phillips <[email protected]>
*	i965: Move compaction "prepass" into brw_eu_compact.c	Matt Turner	2017-08-21	2	-72/+82
\| \| \| \|	Reviewed-by: Scott D Phillips <[email protected]>
*	i965: Mark src inst pointer const in compaction code	Matt Turner	2017-08-21	2	-12/+13
\| \| \| \|	Reviewed-by: Scott D Phillips <[email protected]>
*	intel/compiler: properly size attribute wa_flags array for Vulkan	Iago Toral Quiroga	2017-08-11	1	-1/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Mesa will map user defined vertex input attributes to slots starting at VERT_ATTRIB_GENERIC0 which gives us room for only 16 slots (up to GL_VERT_ATTRIB_MAX). This is sufficient for GL, where we expose exactly 16 vertex attributes for user defined inputs, but in Vulkan we can expose up to 28 (which are also mapped from VERT_ATTRIB_GENERIC0 onwards) so we need to account for this when we scope the size of the array of attribute workaround flags that is used during the brw_vertex_workarounds NIR pass. This prevents out-of-bounds accesses in that array for NIR shaders that use more than 16 vertex input attributes. Fixes: dEQP-VK.pipeline.vertex_input.max_attributes.* Acked-by: Lionel Landwerlin <[email protected]>
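The sizing rule described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: the two limit macros and the helper name `wa_flags_array_size` are hypothetical, with values taken from the numbers quoted in the message (16 generic attributes for GL, up to 28 for Vulkan), and it is not the code from the actual patch.

```c
#include <stddef.h>

/* Hypothetical limits mirroring the commit message, not Mesa headers. */
#define SKETCH_GL_MAX_GENERIC_ATTRIBS 16u
#define SKETCH_VK_MAX_GENERIC_ATTRIBS 28u

/* Hypothetical helper: number of workaround-flag entries needed so that
 * indexing by (first_generic_slot + attribute) stays in bounds for
 * whichever API exposes more user-defined attributes. */
size_t
wa_flags_array_size(size_t first_generic_slot)
{
   size_t max_attribs =
      SKETCH_GL_MAX_GENERIC_ATTRIBS > SKETCH_VK_MAX_GENERIC_ATTRIBS
         ? SKETCH_GL_MAX_GENERIC_ATTRIBS
         : SKETCH_VK_MAX_GENERIC_ATTRIBS;

   return first_generic_slot + max_attribs;
}
```

Sizing for the larger of the two limits means a shader that uses attribute 17 or higher under Vulkan still indexes inside the array.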
*	intel/vec4/gs: reset nr_pull_param if DUAL_INSTANCED compile failed.	Dave Airlie	2017-08-03	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If dual object compile fails (as seems to happen with virgl a fair bit, and does piglit even have any tests for it?), we end up not restarting the pull params, so we call vec4_visitor::move_uniform_array_access_to_pull_constant a second time and it runs over the ends of the alloc. Fixes: tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test running inside virgl on ivybridge. Reviewed-by: Kenneth Graunke <[email protected]> Cc: <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
*	i965: Fix indentation	Matt Turner	2017-08-02	2	-8/+8
\|
*	i965: Set lower_vote_trivial in vector_nir_options_gen6 too.	Kenneth Graunke	2017-07-21	1	-0/+1
\| \| \| \| \| \|	There's a second struct for Gen6+. Reviewed-by: Matt Turner <[email protected]>
*	i965/fs: Match destination type to size for ballot	Matt Turner	2017-07-20	2	-2/+6
\| \| \| \|	No use in taking a 64-bit value when we know the high 32-bits are zero.
*	nir: Reduce destination size of ballot intrinsic when possible	Matt Turner	2017-07-20	1	-0/+1
\| \| \| \| \| \| \| \| \|	Some hardware, like i965, doesn't support group sizes greater than 32. In that case, we can reduce the destination size of the ballot intrinsic, which will simplify our code generation. Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
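The reasoning behind the narrower destination can be illustrated with a small sketch. The function names `ballot64` and `ballot_fits_in_32_bits` are hypothetical and exist only for this example; the actual NIR and i965 changes operate on intrinsics, not on arrays of booleans.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a ballot bitmask: bit i is set when invocation i voted true.
 * Invocations beyond the 64th are ignored in this sketch. */
uint64_t
ballot64(const bool *votes, unsigned subgroup_size)
{
   uint64_t mask = 0;

   for (unsigned i = 0; i < subgroup_size && i < 64; i++) {
      if (votes[i])
         mask |= UINT64_C(1) << i;
   }
   return mask;
}

/* With a subgroup size of 32 or less, no bit above bit 31 can be set,
 * so the result can be held in a 32-bit destination without loss. */
bool
ballot_fits_in_32_bits(uint64_t mask)
{
   return (mask >> 32) == 0;
}
```

For any subgroup size up to 32, `ballot_fits_in_32_bits(ballot64(votes, size))` holds regardless of the votes, which is why the destination size can be reduced on such hardware.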
*	i965/fs: Implement ARB_shader_ballot operations	Matt Turner	2017-07-20	3	-0/+48
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Do not move MOVs writing the flag outside of control flow	Matt Turner	2017-07-20	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The implementation of ballotARB() will start by zeroing the flags register. So, doing something like if (gl_SubGroupInvocationARB % 2u == 0u) { ... = ballotARB(true); [...] } else { ... = ballotARB(true); [...] } (like fs-ballot-if-else.shader_test does) would generate identical MOVs to the same destination (the flag register!), and we definitely do not want to pull that out of the control flow. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Handle explicit flag sources in flags_read()	Francisco Jerez	2017-07-20	1	-4/+5
\| \| \| \| \| \| \|	The implementations of the ARB_shader_ballot intrinsics will explicitly read the flag as a source register. Reviewed-by: Matt Turner <[email protected]>
*	nir: Add system values from ARB_shader_ballot	Matt Turner	2017-07-20	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	We already had a channel_num system value, which I'm renaming to subgroup_invocation to match the rest of the new system values. Note that while ballotARB(true) will return zeros in the high 32-bits on systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB variables do not consider whether channels are enabled. See issue (1) of ARB_shader_ballot. Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Implement ARB_shader_group_vote operations	Matt Turner	2017-07-20	1	-0/+50
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Handle explicit flag destinations in flags_written()	Francisco Jerez	2017-07-20	1	-4/+19
\| \| \| \| \| \| \|	The implementations of the ARB_shader_group_vote intrinsics will explicitly write the flag as the destination register. Reviewed-by: Matt Turner <[email protected]>
*	i965/vec4: Lower ARB_shader_group_vote intrinsics	Matt Turner	2017-07-20	1	-0/+1
\| \| \| \| \| \| \| \|	I don't expect anyone is going to care about using this in vec4 programs (vertex/tessellation/geometry on Gen6/7), and no one has come up with a good way to implement it, much less test it. Reviewed-by: Kenneth Graunke <[email protected]>
*	nir: Add pass to optimize intrinsics	Matt Turner	2017-07-20	1	-0/+1
\| \| \| \| \| \| \|	Specifically, constant fold intrinsics from ARB_shader_group_vote, but I suspect it'll be useful for other things in the future. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Use pushed UBO data in the scalar backend.	Kenneth Graunke	2017-07-13	3	-1/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This actually takes advantage of the newly pushed UBO data, avoiding pull loads. Improves performance in GLBenchmark Manhattan 3.1 by: HSW: ~1%, BDW/SKL/KBL GT2: 3-4%, SKL GT4: 7-8%, APL: 4-5%. (thanks to Eero Tamminen for these numbers) shader-db results on Skylake, ignoring programs with spill/fill changes: total instructions in shared programs: 13963994 -> 13651893 (-2.24%); instructions in affected programs: 4250328 -> 3938227 (-7.34%); helped: 28527; HURT: 0. total cycles in shared programs: 179808608 -> 172535170 (-4.05%); cycles in affected programs: 79720410 -> 72446972 (-9.12%); helped: 26951; HURT: 1248. LOST: 46; GAINED: 21. Many "Deus Ex: Mankind Divided" shaders which already spilled end up spilling a lot more (about 240 programs hurt, 9 helped). The cycle estimator suggests this is still overall a win (-0.23% in cycle counts), presumably because we trade pull loads for fills. v2: Drop "PULL" environment variable left in for initial debugging (caught by Matt). Reviewed-by: Matt Turner <[email protected]>
*	i965: Factor out push locations.	Kenneth Graunke	2017-07-13	2	-16/+25
\| \| \| \| \| \| \| \|	With UBOs, the answer of "have we decided to push this uniform" gets a bit more complicated - for one, we have multiple surfaces. This patch refactors things so we can add the new code in a single place. Reviewed-by: Matt Turner <[email protected]>
*	i965: Push UBO data, but don't use it just yet.	Kenneth Graunke	2017-07-13	2	-1/+11
\| \| \| \| \| \| \| \| \| \| \|	This patch starts uploading UBO data via 3DSTATE_CONSTANT_* packets, and updates the compiler to know that there's extra payload data, so things continue working. However, it still issues pull loads for all data. I wanted to separate the two aspects for greater bisectability. v2: Update for new intel_bufferobj_buffer parameter. Reviewed-by: Matt Turner <[email protected]>
*	i965: Select ranges of UBO data to be uploaded as push constants.	Kenneth Graunke	2017-07-13	3	-0/+311
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a NIR pass that decides which portions of UBOs we should upload as push constants, rather than pull constants. v2: Switch to uint16_t for the UBO block number, because we may have a lot of them in Vulkan (suggested by Jason). Add more comments about bitfield trickery (requested by Matt). v3: Skip vec4 stages for now...I haven't finished wiring up support in the vec4 backend, and so pushing the data but not using it will just be wasteful. Reviewed-by: Matt Turner <[email protected]>
*	i965: Switch to absolute addressing for constant buffer 0.	Kenneth Graunke	2017-07-13	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic state base address. This makes it unusable for pushing UBOs. I'd like to be able to use all four push buffers. There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake) which controls whether buffer 0 is relative to dynamic state base address, or simply a normal pointer. Setting that gives us full flexibility. We can't currently write this on Haswell and earlier, and will need to update the kernel command parser, and then do the whole version checking song and dance. Reviewed-by: Matt Turner <[email protected]>
*	intel/compiler: no need to check unsigned is >= 0	Lionel Landwerlin	2017-07-13	1	-1/+1
\| \| \| \| \| \|	CID: 1338342 Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
*	intel/compiler: don't check unsigned is >= 0	Lionel Landwerlin	2017-07-13	1	-1/+1
\| \| \| \| \| \|	CID: 1224468 Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
*	intel/compiler: remove check unsigned is >= 0	Lionel Landwerlin	2017-07-13	1	-1/+1
\| \| \| \| \| \| \| \|	By definition unsigned are always >= 0. CID: 742212 Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
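The three Coverity fixes above share one idea, which a minimal sketch can show. The helper name `in_range` is hypothetical and is not taken from the patched files; it only illustrates why a lower-bound test on an unsigned value adds nothing.

```c
#include <stdbool.h>

/* A bounds check on an unsigned index. A test such as "index >= 0"
 * would always be true here, so only the upper bound is checked.
 * A negative value converted to unsigned wraps to a very large number
 * and is rejected by the upper-bound comparison. */
bool
in_range(unsigned index, unsigned count)
{
   return index < count;
}
```

For instance, `in_range((unsigned)-1, 4u)` is false, because the converted index is far larger than the count.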
*	intel/compiler: Don't use opt_sampler_eot() optimization on gen10+	Anuj Phogat	2017-07-12	1	-1/+1
\| \| \| \| \| \| \|	This optimization has been removed on gen10+. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	i965/i915: Add UYVY as the supported format	Johnson Lin	2017-06-30	2	-0/+2
\| \| \| \| \| \|	Trigger the correct sampler options for it. Similar to YUYV. Reviewed-by: Kristian H. Kristensen <[email protected]>
*	intel: compiler/i965: fix is_broxton checks	Lionel Landwerlin	2017-06-20	3	-4/+4
\| \| \| \| \| \| \| \| \| \|	In 5f2fe9302c is_geminilake was introduced to differentiate broxton from geminilake. Unfortunately I failed at verifying that is_broxton is used throughout the code base to mean Gen9lp. Fixes: 5f2fe9302c ("intel: common: add flag to identify platforms by name") Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/cnl: Make URB {VS, GS, HS, DS} sizes non multiple of 3	Anuj Phogat	2017-06-09	4	-4/+33
\| \| \| \| \| \| \| \| \| \|	v1: By Ben Widawsky <[email protected]> v2: v1 had an assert only for VS. Add the restriction for GS, HS and DS as well and make sure the allocated sizes are not a multiple of 3. v3: Move the entry_size checks into compiler code (Ken) Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
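One way to satisfy the "not a multiple of 3" restriction is to bump an offending size by one unit. The sketch below assumes that approach; the helper name `adjust_urb_entry_size` is hypothetical, the unit of the size is left abstract, and the actual patch may apply the restriction differently.

```c
/* Hypothetical helper: return an entry size that is never a multiple
 * of 3, by rounding an offending value up to the next integer.
 * Note that 0 is also a multiple of 3 and becomes 1. */
unsigned
adjust_urb_entry_size(unsigned entry_size)
{
   if (entry_size % 3 == 0)
      entry_size++;

   return entry_size;
}
```

Rounding up rather than down keeps the allocation at least as large as what the shader stage asked for.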
*	i965/cnl: Handle gen10 in switch cases across the driver	Anuj Phogat	2017-06-09	2	-0/+3
\| \| \| \| \| \| \| \| \|	V2: Start using gen10 functions isl_gen10*(), gen10_blorp_exec() gen10_init_atoms() (Jason) Remove Vulkan changes. Do them later in a separate patch. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/cnl: Update few assertions	Anuj Phogat	2017-06-09	1	-1/+1
\| \| \| \| \|	Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	tree-wide: remove trailing backslash	Eric Engestrom	2017-06-07	1	-1/+1
\| \| \| \| \| \| \| \| \|	Simple search for a backslash followed by two newlines. If one of the newlines were to be removed, this would cause issues, so let's just remove these trailing backslashes. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965: Change INTEL_DEBUG=vec4 to INTEL_SCALAR_VS for consistency.	Kenneth Graunke	2017-06-05	1	-1/+1
\| \| \| \| \| \| \| \| \|	We moved to INTEL_SCALAR_* when we added more than a single stage, but never went back and converted the VS to work that way. Be consistent. Also update the documentation to actually mention these debug variables. Acked-by: Jason Ekstrand <[email protected]>
*	i965: Drop duplicate shadow variable.	Kenneth Graunke	2017-06-01	1	-1/+0
\| \| \| \| \| \|	We already initialized this at the top of the function. Trivial.
*	i965: Move SOL PSIZ hacks from draw time to link time.	Kenneth Graunke	2017-06-01	1	-12/+1
\| \| \| \| \| \| \| \| \|	We can just update the gl_transform_feedback_info fields at link time to make the VUE header fields have the right location and component. Then we don't need to handle them specially at draw time, which is expensive. Reviewed-by: Rafael Antognolli <[email protected]>
*	i965: Ignore INTEL_SCALAR_* debug variables on Gen10+.	Kenneth Graunke	2017-05-29	1	-10/+16
\| \| \| \| \| \| \| \| \| \| \|	Scalar mode has been default since Broadwell, and vector mode is getting increasingly unmaintained. There are a few things that don't even fully work in vector mode on Skylake, but we've never cared because nobody uses it. There's no point in porting it forward to new platforms. So, just ignore the debug options to force it on. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Move clip program compilation to the compiler	Jason Ekstrand	2017-05-26	8	-0/+2340
\| \| \| \|	Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965: Move SF compilation to the compiler	Jason Ekstrand	2017-05-26	3	-0/+931
\| \| \| \|	Reviewed-by: Topi Pohjolainen <[email protected]>