mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	i965/vec4: check swizzle before discarding a uniform on a 3src operand	Alejandro Piñeiro	2015-09-24	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Without this commit, copy propagation is discarded if it involves a uniform with an instruction that has 3 sources. But 3 sourced instructions can access scalar values. For example, this is what vec4_visitor::fix_3src_operand() is already doing: if (src.file == UNIFORM && brw_is_single_value_swizzle(src.swizzle)) return src; Shader-db results (unfiltered) on NIR: total instructions in shared programs: 6259650 -> 6241985 (-0.28%) instructions in affected programs: 812755 -> 795090 (-2.17%) helped: 7930 HURT: 0 Shader-db results (unfiltered) on IR: total instructions in shared programs: 6445822 -> 6441788 (-0.06%) instructions in affected programs: 296630 -> 292596 (-1.36%) helped: 2533 HURT: 0 v2: - Updated commit message, using Matt Turner suggestions - Move the check after we've created the final value, as Jason Ekstrand suggested - Clean up the condition v3: - Move the check back to the original place, to keep things tidy, as suggested by Jason Ekstrand v4: - Fixed missing is_single_value_swizzle() as pointed by Jason Ekstrand Reviewed-by: Matt Turner <[email protected]>
*	i965: Respect stride and subreg_offset for ATTR registers	Kristian Høgsberg Kristensen	2015-09-24	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \|	When we assign hw regs to attributes, we don't incorporate the stride and subreg_offset from the fs_reg. It's rarely used, but the integer multiplication lowering uses unusual stride and subreg_offset combination breaks when one source is an attribute. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91970 Cc: "10.6 11.0" <[email protected]> Signed-off-by: Kristian Høgsberg Kristensen <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	mesa: rework Driver.CopyImageSubData() and related code	Brian Paul	2015-09-24	3	-31/+154
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, core Mesa's _mesa_CopyImageSubData() created temporary textures to wrap renderbuffer sources/destinations. This caused a bit of a mess in the Mesa/gallium state tracker because we had to basically undo that wrapping. Instead, change ctx->Driver.CopyImageSubData() to take both gl_renderbuffer and gl_texture_image src/dst pointers (one being null, the other non-null) so the driver can handle renderbuffer vs. texture as needed. For the i965 driver, we basically moved the code that wrapped textures around renderbuffers from copyimage.c down into the met and driver code. The old code in copyimage.c also made some questionable calls to _mesa_BindTexture(), etc. which weren't undone at the end. v2 (Jason Ekstrand): Rework the intel bits v3 (Brian Paul): Update the temporary st_CopyImageSubData() function. Reviewed-by: Topi Pohjolainen <[email protected]> Tested-by: Kai Wasserbäch <[email protected]> Tested-by: Nick Sarnie <[email protected]>
*	i965: add ARB_texture_barrier support	Ilia Mirkin	2015-09-23	2	-0/+10
\| \| \| \| \|	Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/gs: Fix extra level of indentation left by the previous commit.	Kenneth Graunke	2015-09-23	2	-115/+111
\| \| \| \| \| \| \| \|	I left a bunch of code indented a level in the previous patch to make the diff easier to read. But now we should fix that. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/gs: Use new NIR intrinsics.	Kenneth Graunke	2015-09-23	4	-26/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By performing the vertex counting in NIR, we're able to elide a ton of useless safety checks around every EmitVertex() call: total instructions in shared programs: 3952 -> 3720 (-5.87%) instructions in affected programs: 3491 -> 3259 (-6.65%) helped: 11 HURT: 0 Improves performance in Gl32GSCloth by 0.671742% +/- 0.142202% (n=621) on Haswell GT3e at 1024x768. This should also make it easier to implement Broadwell's "Static Vertex Count" feature someday. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i915: Make hw_prim[] const	Ville Syrjälä	2015-09-23	1	-1/+1
\| \| \| \| \| \| \| \|	The table used to map the GL primitive to the hw primitive never changes so make it const. Signed-off-by: Ville Syrjälä <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	mesa: Remove unused HAVE_TRI_STRIP_1 defines	Ian Romanick	2015-09-23	5	-5/+0
\| \| \| \| \| \| \|	Defined to 0 in a few places, but it's not used anywhere. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]>
*	t_dd_dmatmp: Remove HAVE_QUADS support	Ian Romanick	2015-09-23	2	-2/+0
\| \| \| \| \| \| \| \| \|	Two drivers use this file, and neither supports quads. No piglit regressions on i915 (G33) or radeon (Radeon 7500). Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]>
*	t_dd_dmatmp: Remove HAVE_QUAD_STRIPS support	Ian Romanick	2015-09-23	2	-2/+0
\| \| \| \| \| \| \| \| \|	Two drivers use this file, and neither supports quad strips. No piglit regressions on i915 (G33) or radeon (Radeon 7500). Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]>
*	t_dd_dmatmp: Make "count" actually be the count	Ian Romanick	2015-09-23	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The value passed in count previously was "vertex after the last vertex to be processed." Calling that "count" was misleading and kind of mean. Looking at the code, many functions immediately do "count-start" to get back the true count. That's just silly. If it is better for the loops to be 'for (j = start; j < (start + count); j++)', GCC will do that transformation. NOTE: There is some strange formatting left by this patch. That was done to make it more obvious that the before and after code is equivalent. These will be fixed in the next patch. No piglit regressions on i915 (G33) or radeon (Radeon 7500). v2: Fix a remaining (count-start) in render_quad_strip_verts. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]> [v1] Cc: "10.6 11.0" <[email protected]>
*	i965/vec4: Don't coalesce regs in Gen6 MATH ops if reswizzle/writemask needed	Antia Puentes	2015-09-23	2	-3/+12
\| \| \| \| \| \| \| \|	Gen6 MATH instructions can not execute in align16 mode, so swizzles or writemasking are not allowed. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92033 Reviewed-by: Matt Turner <[email protected]>
*	i965/vec4: Detect and delete useless MOVs.	Matt Turner	2015-09-22	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With NIR: instructions in affected programs: 111508 -> 109193 (-2.08%) helped: 507 Without NIR: instructions in affected programs: 28763 -> 28474 (-1.00%) helped: 186 Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/vec4: Add support for fdph_replicated	Jason Ekstrand	2015-09-22	1	-0/+5
\| \| \| \| \|	Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Add defines for tessellation stages	Chris Forbes	2015-09-22	1	-0/+72
\| \| \| \| \| \| \| \| \| \| \| \| \|	v2 (Ken): - Squash together commits for HS, DS, and TE, as well as fixes. - Add INTEL_MASK variants so we can use SET_FIELD if we want. - Rename GEN7_HS_INSTANCE_CONTROL to GEN7_HS_INSTANCE_COUNT to match the documentation. - Add some more fields from the PRMs. - Add Broadwell variants. Signed-off-by: Chris Forbes <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
*	i965/vec4: refactor brw_vec4_copy_propagation.	Alejandro Piñeiro	2015-09-22	1	-14/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now it is more similar to brw_fs_copy_propagation, with three clear stages: 1) Build up the value we are propagating as if it were the source of a single MOV: 2) Check that we can propagate that value 3) Build the final value Previously everything was somewhat messed up, making the implementation on some specific cases, like knowing if you can propagate from a previous instruction even with type mismatches, even messier (for example, with the need of maintaining more of one has_source_modifiers). The refactoring clears stuff, and gives support to this mentioned use case without doing anything extra (for example, only one has_source_modifiers is used). Shader-db results for vec4 programs on Haswell: total instructions in shared programs: 1683842 -> 1669037 (-0.88%) instructions in affected programs: 739837 -> 725032 (-2.00%) helped: 6237 HURT: 0 v2: using 'arg' index to get the from inst was wrong v3: rebased against last change on the previous patch of the series v4: don't need to track instructions on struct copy_entry, as we only set the source on a direct copy v5: change the approach for a refactoring v6: tweaked comments Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: fix textureGrad for cubemaps	Tapani Pälli	2015-09-22	1	-19/+182
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes bugs exposed by commit 2b1cdb0eddb73f62e4848d4b64840067f1f70865 in: ES3-CTS.gtf.GL3Tests.shadow.shadow_execution_frag No regressions observed in deqp, CTS or Piglit. v2: address review feedback from Iago Toral: - move rho calculation to else branch - optimize dx and dy calculation - fix documentation inconsistensies Signed-off-by: Tapani Pälli <[email protected]> Signed-off-by: Kevin Rogovin <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91114 Cc: "10.6 11.0" <[email protected]>
*	i965: Clean up GLSL compiler option setup	Jason Ekstrand	2015-09-21	1	-26/+20
\| \| \| \| \| \| \| \| \|	The only functional change here is that we now set EmitNoIndirectOutput and EmitNoIndirectTemp for compute shaders. Compute shaders don't have outputs per-se and we should have been setting EmitNoIndirectTemp all along. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/skl: Use larger URB size where available.	Ben Widawsky	2015-09-21	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All SKL SKUs except the lowest one which has half the L3 size actually have 384K of URB per slice. For once, I can explain how this mistake was made and how it was missed in review... Historically when we enable a platform and put the production sizes, you can simply look at the "smallest" SKU and see what its URB size is (and we assumed it was the 1 slice variant). Since on newer platforms the URB sizes are scaled automatically by HW, this was sufficient. On SKL, this is a bit different as the lowest SKU actually has half of the L3 fused off. GT2 is the 1 slice (not GT1) variant and it has 384K. There are no Jenkins tests fixed (or regressions) and we don't expect any fixes here because you can always run with less URB size. Thanks to Sarah for bringing this to my attention. Cc: Sarah Sharp <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965: Fix MRF register number assertions for compr4.	Kenneth Graunke	2015-09-21	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	compr4 is represented by setting the high bit on the MRF number. We need to mask it out before sanity checking the register number. Fixes ~8000 assert fails on Ironlake and G45. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92066 Signed-off-by: Kenneth Graunke <[email protected]>
*	i965/vec4: Use MRF registers 21-23 for spilling in gen6	Iago Toral Quiroga	2015-09-21	1	-4/+6
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Use MRF registers 21-23 for spilling in gen6	Iago Toral Quiroga	2015-09-21	1	-4/+7
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Turn BRW_MAX_MRF into a macro that accepts a hardware generation	Iago Toral Quiroga	2015-09-21	8	-28/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are some bug reports about shaders failing to compile in gen6 because MRF 14 is used when we need to spill. For example: https://bugs.freedesktop.org/show_bug.cgi?id=86469 https://bugs.freedesktop.org/show_bug.cgi?id=90631 Discussion in bugzilla pointed to the fact that gen6 might actually have 24 MRF registers available instead of 16, so we could use other MRF registers and avoid these conflicts (we still need to investigate why some shaders need up to MRF 14 anyway, since this is not expected). Notice that the hardware docs are not clear about this fact: SNB PRM Vol4 Part2's "Table 5-4. MRF Registers Available in Device Hardware" says "Number per Thread" - "24 registers" However, SNB PRM Vol4 Part1, 1.6.1 Message Register File (MRF) says: "Normal threads should construct their messages in m1..m15. (...) Regardless of actual hardware implementation, the thread should not assume th at MRF addresses above m15 wrap to legal MRF registers." Therefore experimentation was necessary to evaluate if we had these extra MRF registers available or not. This was tested in gen6 using MRF registers 21..23 for spilling and doing a full piglit run (all.py) forcing spilling of everything on the FS backend. It was also tested by doing spilling of everything on both the FS and the VS backends with a piglit run of shader.py. In both cases no regressions were observed. In fact, many of these tests where helped in the cases where we forced spilling, since that triggered the same underlying problem described in the bug reports. Here are some results using INTEL_DEBUG=spill_fs,spill_vec4 for a shader.py run on gen6 hardware: Using MRFs 13..15 for spilling: crash: 2, fail: 113, pass: 6621, skip: 5461 Using MRFs 21..23 for spilling: crash: 2, fail: 12, pass: 6722, skip: 5461 This patch sets the ground for later patches to implement spilling using MRF registers 21..23 in gen6. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Move MRF register asserts out of brw_reg.h	Iago Toral Quiroga	2015-09-21	4	-7/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In a later patch we will make BRW_MAX_MRF return a different value depending on the hardware generation, but it is inconvenient to add a gen parameter to the brw_reg functions only for the assertions, so move these to places where we have the hardware generation available. Ken suggested to add the asserts to brw_set_src0 and brw_set_dest since that would make sure that we catch all uses of MRF registers, even those coming from modules that generate native code directly, like blorp. Unfortunately, this is very late in the process which can make things harder to debug, so add asserts to the generator as well. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Maximum allowed size of SEND messages is 15 (4 bits)	Iago Toral Quiroga	2015-09-21	4	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Until now we only used MRFs 1..15 for regular SEND messages, so the message length could not possibly exceed the maximum size. Soon we'll allow to use MRF registers 1..23 in gen6, so we need to be careful not to build messages that can go beyond the limit. That could occur, specifically, when building URB write messages, which we may need to split in chunks due to their size. Previously we would simply go and create a new message when we reached MRF 13 (since 13..15 were reserved for spilling), now we also want to check the size of the message explicitly. Besides adding that condition to split URB write messages properly, this patch also adds asserts in the generator. Notice that brw_inst_set_mlen already asserts for this, but asserting in the generators is easy and can make debugging easier in some cases. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/vec4/nir: Remove all "this->" snippets	Eduardo Lima Mitev	2015-09-20	1	-16/+15
\| \| \| \| \| \| \| \|	For consistency, either we have all class members dereferenced, or none. In this case, very few are so lets get rid of them all. Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	dri/common: fix gbm-symbols-check regression	Marcin Ślusarz	2015-09-20	1	-1/+1
\| \| \| \| \| \| \| \|	Broken by commit c228514c72cb2fd5fb9e510808e29204fc9e7ae1 "dri/common: use sysconfdir when looking for drirc". Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92054 Signed-off-by: Marcin Ślusarz <[email protected]>
*	dri/common: use sysconfdir when looking for drirc	Marcin Ślusarz	2015-09-19	2	-1/+6
\| \| \| \| \| \| \| \|	Useful when locally installed mesa has more quirks than the system one. Signed-off-by: Marcin Ślusarz <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
*	nir/lower_tex: support projector lowering per sampler type	Rob Clark	2015-09-18	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Some hardware, such as adreno a3xx, supports txp on some but not all sampler types. In this case we want more fine grained control over which texture projectors get lowered. v2: split out nir_lower_tex_options struct to make it easier to add the additional parameters coming in the following patches Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	nir: rename nir_lower_tex_projector	Rob Clark	2015-09-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Since the following patches will add additional tex-lowering related functionality, which doesn't make sense to split out into a separate pass (as they would require duplication of the projector lowering logic), let's give this pass a more generic name. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/vec4: Change types as needed to propagate source modifiers using ↵	Alejandro Piñeiro	2015-09-19	1	-2/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	current instruction SEL and MOV instructions, as long as they don't have source modifiers, are just copying bits around. So those kind of instruction could be propagated even if there are type mismatches. This is needed because NIR generates integer SEL and MOV instructions whenever it doesn't know what else to generate. This commit adds support for copy propagation using current instruction as reference. Equivalent to commit 472ef9 but for vec4. v2: include check for saturate, as Jason Ekstrand suggested v3: check that the dst.type and the src type are the same, in order to solve (among others) the following deqp regression with v2: dEQP-GLES3.functional.shaders.operator.unary_operator.minus.lowp_uint_vertex Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Fix comparison between signed and unsigned integer expressions	Iago Toral Quiroga	2015-09-18	1	-2/+2
\| \| \| \| \| \| \|	brw_fs_visitor.cpp: In member function 'void fs_visitor::emit_urb_writes()': brw_fs_visitor.cpp:977:58: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] Reviewed-by: Tapani Pälli <[email protected]>
*	i965/vec4: Use nir_move_vec_src_uses_to_dest	Jason Ekstrand	2015-09-17	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The idea here is not that it gives register coalescing a little bit of a helping hand. It doesn't actually fix the coalescing problems, but it seems to help a good bit. Shader-db results for vec4 programs on Haswell: total instructions in shared programs: 1746280 -> 1683959 (-3.57%) instructions in affected programs: 1259166 -> 1196845 (-4.95%) helped: 11363 HURT: 148 v2 (Jason Ekstrand): - Run nir_move_vec_src_uses_to_dest after going out of SSA - New shader-db numbers Reviewed-by: Eduardo Lima Mitev <[email protected]>
*	i965/fs: The barrier send uses only 1 payload register	Jordan Justen	2015-09-15	2	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When preparing the barrier payload, the instructions should operate in simd8 mode since we only use 1 payload register. fs_inst::regs_read is also updated to indicate that it only reads one register for SHADER_OPCODE_BARRIER. These issues were flagged by: commit cadd7dd384b33a779d46bd664f456bed4a21a5b7 Author: Jason Ekstrand <[email protected]> Date: Thu Jul 2 15:41:02 2015 -0700 i965/fs: Add a very basic validation pass Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/vec4: Use the replicated fdot instruction in NIR	Jason Ekstrand	2015-09-15	2	-3/+11
\| \| \| \| \|	Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
*	i965/vec4_nir: Use partial SSA form rather than full non-SSA	Jason Ekstrand	2015-09-15	3	-4/+20
\| \| \| \| \| \| \| \| \|	We made this switch in the FS backend some time ago and it seems to make a number of things a bit easier. In particular, supporting SSA values takes very little work in the backend and allows us to take advantage of the majority of the SSA information even after we've gotten rid of Phi nodes. Reviewed-by: Eduardo Lima Mitev <[email protected]>
*	i965/fs: Add a very basic validation pass	Jason Ekstrand	2015-09-15	4	-0/+69
\| \| \| \| \| \| \| \|	Currently the validation pass only validates that regs_read and regs_written are consistent with the sizes of VGRF's. We can add more as we find it to be useful. Reviewed-by: Matt Turner <[email protected]>
*	i965/fs_surface_builder: Only apply predicate to components that exist	Jason Ekstrand	2015-09-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	In certain conditions, we have to do bounds-checking in the shader for image_load_store. The way this works for image loads is that we do a predicated load and then emit a series of selects, one per component, that gives us 0 or the loaded value depending on whether or not you're in bounds. However, we were hard-coding 4 components which may not be correct. Instead, we should be using size which is the number of components read. Reviewed-by: Francisco Jerez <[email protected]>
*	i965/fs: Only read output_components many components when writing an output	Jason Ekstrand	2015-09-15	1	-1/+3
\| \| \| \|	Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/fs: Set output_components for lowered clip distance outputs	Jason Ekstrand	2015-09-15	1	-0/+2
\| \| \| \|	Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965: Move perf_debug code to brw_codegen_*_prog()	Kristian Høgsberg Kristensen	2015-09-14	5	-76/+75
\| \| \| \| \| \| \| \| \| \|	We're trying to avoid a libdrm dependency in the core compiler, so let's move the perf_debug code one level up from the brw__emit() helpers to the brw_codegen__prog() helpers. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
*	i965: Move brw_fs_precompile() to brw_wm.c	Kristian Høgsberg Kristensen	2015-09-14	2	-58/+59
\| \| \| \| \| \| \| \| \|	All other precompile functions live in the brw_<stage>.c files, make fs follow the convention. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
*	i965: Move compute shader code around	Kristian Høgsberg Kristensen	2015-09-14	5	-333/+362
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This moves the compute shader code around in order to make the way the code is split up more consistent. There should be no functional changes. Typically we have a few files per stage: brw_vs.c, brw_wm.c brw_gs.c: code to drive code generation and implement precompiling and cache search. genX_<stage>_state.c gen specific implementation of the state emission for the shader stage. The brw_*_emit() functions are all in the same files as the visitor classes they use (with the exception of VS, which may use either vec4 or fs). To make compute follow this convention, we move the brw_cs_emit() function into brw_fs.cpp. We can then rename brw_cs.cpp to brw_cs.c and do this in C like the other similar files. Finally, move state setup and atoms to gen7_cs_state.c. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
*	meta: Abort meta pbo path if TexSubImage need signed unsigned conversion	Anuj Phogat	2015-09-14	1	-18/+25
\| \| \| \| \| \| \| \| \|	See similar fix for Readpixels in mesa commit 0d20790. Jason suggested we need that for TexSubImage as well. Cc: <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/vec4_nir: Load constants as integers	Antia Puentes	2015-09-14	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	Loads constants using integer as their register type, like it is done in FS backend. No shader-db changes in HSW. Cc: "10.6 11.0" <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91716 Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/vec4: Fix saturation errors when coalescing registers	Antia Puentes	2015-09-14	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the register types do not match and the instruction that contains the final destination is saturated, register coalescing generated non-equivalent code. This did not happen when using IR because types usually matched, but it is visible in nir-vec4. For example, mov vgrf7:D vgrf2:D mov.sat m4:F vgrf7:F is coalesced to: mov.sat m4:D vgrf2:D The patch prevents coalescing in such scenario, unless the instruction we want to coalesce into is a MOV (without type conversion implied). In that case, the patch sets the register types to the type of the final destination. Shader-db results in HSW (only vec4 instructions shown): total instructions in shared programs: 1754415 -> 1754416 (0.00%) instructions in affected programs: 74 -> 75 (1.35%) helped: 0 HURT: 1 GAINED: 0 LOST: 0 Only one extra instruction in one of the shaders, that comes from eliminating a saturation error by preventing register coalesce. Cc: "10.6 11.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/nir: Support gl_WorkGroupID variable	Jordan Justen	2015-09-13	1	-1/+9
\| \| \| \| \|	Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/cs: Initialize gl_WorkGroupID variable from payload	Jordan Justen	2015-09-13	2	-0/+20
\| \| \| \| \|	Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/nir: Support gl_LocalInvocationID variable	Jordan Justen	2015-09-13	1	-0/+17
\| \| \| \| \|	Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/cs: Initialize gl_LocalInvocationID from payload	Jordan Justen	2015-09-13	2	-2/+24
\| \| \| \| \|	Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>