mesa.git

	Commit message	Author	Age	Files	Lines
...
*	i965: Mark functions called from C as extern "C".	Matt Turner	2015-11-24	2	-3/+3
\| \| \| \| \| \| \| \|	These functions' prototypes are marked with extern "C", which apparently overrides a lack of extern "C" at the definition site if the prototype has been seen first. Reviewed-by: Ian Romanick <[email protected]>
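The linkage behavior described in this message can be illustrated with a minimal sketch. The function name `add_one` is hypothetical and not taken from the i965 sources. Once a prototype has been declared with `extern "C"`, a later definition without the specifier keeps C linkage.

```cpp
#include <cassert>

// Prototype as it might appear in a header shared with C code.
extern "C" int add_one(int x);

// No extern "C" here, yet the definition still gets C linkage because the
// earlier declaration already fixed the language linkage.
int add_one(int x)
{
   return x + 1;
}
```

Marking the definition as well, as this commit does, mostly helps readers who see the definition without the header in view.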
*	i965: Push down inclusion of vbo/vbo.h.	Matt Turner	2015-11-24	2	-1/+1
\| \| \| \|	Reviewed-by: Ian Romanick <[email protected]>
*	i965: Remove duplicate #includes.	Matt Turner	2015-11-24	1	-2/+0
\| \| \| \| \| \| \|	Added in commits 36fd65381 and 337dad8ce even though the existing include was in view. Reviewed-by: Ian Romanick <[email protected]>
*	i965: Remove unneeded forward declarations.	Matt Turner	2015-11-24	4	-8/+0
\| \| \| \|	Reviewed-by: Ian Romanick <[email protected]>
*	i965: Mark count_trailing_one_bits() static.	Matt Turner	2015-11-24	1	-1/+1
\| \| \| \|	Reviewed-by: Ian Romanick <[email protected]>
*	i965: Remove useless gen6_blorp.h/gen7_blorp.h headers.	Matt Turner	2015-11-24	7	-88/+7
\| \| \| \|	Reviewed-by: Ian Romanick <[email protected]>
*	i965: Prevent implicit upcasts to brw_reg.	Matt Turner	2015-11-24	7	-14/+49
\| \| \| \| \| \| \|	Now that backend_reg inherits from brw_reg, we have to be careful to avoid the object slicing problem. Reviewed-by: Francisco Jerez <[email protected]>
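A minimal sketch of the object slicing problem mentioned above. The types `base_reg` and `derived_reg` are hypothetical stand-ins and not the real brw_reg and backend_reg definitions.

```cpp
#include <cassert>

// Hypothetical stand-ins for a base register type and a type derived from it.
struct base_reg {
   unsigned nr = 0;
};

struct derived_reg : base_reg {
   unsigned reg_offset = 0;   // extra state that a by-value upcast discards
};

// Converting to the base type by value copies only the base_reg subobject;
// reg_offset is silently dropped. This is the slicing the message refers to.
inline base_reg slice(const derived_reg &d)
{
   return d;
}

// Passing by reference keeps the full derived object reachable.
inline unsigned offset_via_ref(const base_reg &r)
{
   return static_cast<const derived_reg &>(r).reg_offset;
}
```

The `static_cast` in `offset_via_ref` is only valid when the reference really refers to a `derived_reg`, which is why implicit upcasts are worth restricting.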
*	i965: Use scope operator to ensure brw_reg is interpreted as a type.	Matt Turner	2015-11-24	4	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the next patch, I make backend_reg's inheritance from brw_reg private, which confuses clang when it sees the type "struct brw_reg" in the derived class constructors, thinking it is referring to the privately inherited brw_reg: brw_fs.cpp:366:23: error: 'brw_reg' is a private member of 'brw_reg' fs_reg::fs_reg(struct brw_reg reg) : ^ brw_shader.h:39:22: note: constrained by private inheritance here struct backend_reg : private brw_reg ^~~~~~~~~~~~~~~ brw_reg.h:232:8: note: member is declared here struct brw_reg { ^ Avoid this by marking brw_reg with the scope resolution operator.
*	i965: Use implicit backend_reg copy-constructor.	Matt Turner	2015-11-24	2	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to do this, we have to change the signature of the backend_reg(brw_reg) constructor to take a reference to a brw_reg in order to avoid unresolvable ambiguity about which constructor is actually being called in the other modifications in this patch. As far as I understand it, the rule in C++ is that if multiple constructors are available for parent classes, the one closest to you in the class hierarchy is chosen, but if one of them didn't take a reference, that screws things up. Reviewed-by: Francisco Jerez <[email protected]>
*	i965: Add and use backend_reg::equals().	Matt Turner	2015-11-24	4	-6/+12
\| \| \| \|	Reviewed-by: Francisco Jerez <[email protected]>
*	util: move brw_env_var_as_boolean() to util	Rob Clark	2015-11-24	5	-31/+8
\| \| \| \| \| \| \| \|	Kind of a handy function. And I'll want it available outside of i965 for common nir-pass helpers. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	i965: Use NIR for lowering texture swizzle	Jason Ekstrand	2015-11-23	6	-196/+44
\| \| \| \| \| \| \| \| \| \|	Now that nir_lower_tex can do texture swizzle lowering, we can use that instead of repeating more-or-less the same code in both backends. This both allows us to share code and means that things like the tg4 work-arounds are somewhat simpler because they don't have to take the swizzle into account. Reviewed-by: Connor Abbott <[email protected]>
*	i965: Use nir_lower_tex for texture coordinate lowering	Jason Ekstrand	2015-11-23	8	-131/+42
\| \| \| \| \| \| \| \| \| \|	Previously, we had a rescale_texcoords helper in the FS backend for handling rescaling of texture coordinates. Now that we can do variants in NIR, we can use nir_lower_tex to do the rescaling for us. This allows us to delete the i965-specific code and gives us proper TEXTURE_RECTANGLE and GL_CLAMP handling in vertex and geometry shaders. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Stomp the texture return type to UINT32 for resinfo messages	Jason Ekstrand	2015-11-23	1	-0/+11
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	nir/lower_tex: Report progress	Jason Ekstrand	2015-11-23	1	-1/+1
\| \| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
*	i965: Move postprocess_nir to codegen time	Jason Ekstrand	2015-11-23	4	-5/+20
\| \| \| \| \| \| \| \| \|	This allows us to insert NIR passes between initial NIR compilation and optimization (link time) and actual backend code-gen. In particular, it will allow us to do shader variants in NIR and share some of that shader variant code between backends. Reviewed-by: Iago Toral Quiroga <[email protected]>
*	i965/nir: Split shader optimization and lowering into three stages	Jason Ekstrand	2015-11-23	2	-38/+104
\| \| \| \| \| \| \| \| \|	At the moment, brw_create_nir just calls the three stages in sequence so there's not much difference. Soon, however, we will want to start doing variants in NIR at which point the postprocessing step will have to move from shader create time to codegen time. Reviewed-by: Iago Toral Quiroga <[email protected]>
*	i965: Use ull immediates in brw_inst_bits	Jason Ekstrand	2015-11-23	1	-2/+2
\| \| \| \| \| \| \| \|	This fixes a regression introduced in b1a83b5d1 that caused basically all shaders to fail to compile on 32-bit platforms. Reported-by: Mark Janes <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: Handle lum, intensity and missing components in the fast clear	Neil Roberts	2015-11-23	1	-2/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It looks like the sampler hardware doesn't take into account the surface format when sampling a cleared color after a fast clear has been done. So for example if you clear a GL_RED surface to 1,1,1,1 then the sampling instructions will return 1,1,1,1 instead of 1,0,0,1. This patch makes it override the color that is programmed in the surface state in order to swizzle for luminance and intensity as well as overriding the missing components. Fixes the ext_framebuffer_multisample-fast-clear Piglit test. v2: Handle luminance and intensity formats Reviewed-by: Ben Widawsky <[email protected]>
*	nir: s/nir_type_unsigned/nir_type_uint	Jason Ekstrand	2015-11-23	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	v2: do the same in tgsi_to_nir (Samuel) v3: added missing cases after rebase (Iago) v4: Add a blank space after '#' in one of the comments (Matt) Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/fs: add stride restrictions for copy propagation	Connor Abbott	2015-11-23	1	-1/+55
\| \| \| \| \| \| \| \| \| \|	There are various restrictions on what the hstride can be that depend on the Gen, and now that we're using hstride == 2 for packing/unpacking doubles, we're going to run into these restrictions a lot more often. Pull them out into a separate function, and move the one restriction we checked previously into it. Reviewed-by: Matt Turner <[email protected]>
*	i965/fs: don't propagate cmod when the exec sizes differ	Connor Abbott	2015-11-23	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	This can happen when the source of the compare was split by the SIMD lowering pass. Potentially, we could allow the case where the exec size of scan_inst is larger, and scan_inst has the right quarter selected, but doing that seems a little more risky. v2: Merge the bail condition into the previous if/break block (Matt) Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/fs: respect force_sechalf/force_writemask_all in CSE	Connor Abbott	2015-11-23	1	-0/+2
\| \| \| \| \| \|	Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: fix 64-bit immediates in brw_inst(_set)_bits	Connor Abbott	2015-11-23	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we tried to get/set something that was exactly 64 bits, we would try to do (1 << 64) - 1 to calculate the mask which doesn't give us all 1's like we want. v2 (Iago) - Replace ~0 by ~0ull - Removed unnecessary parenthesis v3 (Kristian) - Avoid the conditional Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
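The mask problem described above can be sketched as follows. `low_bits_mask` is a hypothetical helper and not the actual brw_inst_bits() code. It shows one way to build the mask with an unsigned 64-bit all-ones constant and no conditional.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns a mask with the low `width` bits set,
// for 1 <= width <= 64.
inline uint64_t low_bits_mask(unsigned width)
{
   assert(width >= 1 && width <= 64);
   // (1ull << 64) - 1 would shift by the full width of the type, which is
   // undefined behavior. Shifting all-ones right shifts by at most 63 here.
   return ~0ull >> (64 - width);
}
```

Using `~0ull` rather than `~0` matters because `~0` is a 32-bit `int` on common platforms and would not provide 64 set bits before the shift.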
*	i965/fs: print non-1 strides when dumping instructions	Connor Abbott	2015-11-23	1	-0/+12
\| \| \| \| \| \| \| \|	v2: - Simplify code (Iago) Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: Fix num_uniforms count for scalar GS.	Kenneth Graunke	2015-11-22	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	I noticed that brw_vs.c does this. I believe the point is that nir->num_uniforms is either counted in scalar components (in scalar mode), or vec4 slots (in vector mode). But we want param_count to be in scalar components regardless, so we have to scale up in vector mode. We don't have to scale up in scalar mode, though. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
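The unit conversion reasoned about above can be sketched like this. The helper name and parameters are hypothetical, and the sketch assumes a vec4 slot holds four scalar components.

```cpp
#include <cassert>

// Hypothetical helper; names are illustrative and not taken from Mesa.
// num_uniforms is assumed to be counted in scalar components in scalar mode
// and in vec4 slots in vector mode.
inline unsigned param_count_in_scalars(unsigned num_uniforms, bool is_scalar)
{
   // Only vector mode needs scaling, since each vec4 slot is four scalars.
   return is_scalar ? num_uniforms : num_uniforms * 4;
}
```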
*	i965: Test that nonrepresentable floats cannot be converted to VF.	Matt Turner	2015-11-20	1	-0/+15
\| \| \| \|	Reviewed-by: Iago Toral Quiroga <[email protected]>
*	i965: Use ldexpf() in VF float test set up.	Matt Turner	2015-11-20	1	-8/+3
\| \| \| \|	Reviewed-by: Iago Toral Quiroga <[email protected]>
*	i965/vec4: Initialize nir_inputs with src_reg().	Matt Turner	2015-11-20	1	-1/+1
\| \| \| \| \| \| \| \|	nir_locals, nir_ssa_values, and nir_system_values are all dst_reg (not that that makes a whole lot of sense to me), and only nir_inputs is a src_reg. Reviewed-by: Francisco Jerez <[email protected]>
*	i965/fs: Add support for gl_HelperInvocation system value.	Matt Turner	2015-11-20	1	-0/+52
\| \| \| \| \| \| \| \|	In most cases (when the negate is copy propagated and the MOV removed), this is two instructions on Gen >= 8 and only two instructions on earlier platforms -- and it doesn't use the flag register. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Add brw_imm_uv().	Matt Turner	2015-11-20	1	-0/+9
\|
*	i965: Don't bother setting regioning on immediates.	Matt Turner	2015-11-20	1	-6/+0
\| \| \| \|	The region fields are unioned with the immediate storage.
*	i965/gen9: Support fast clears for 32b float	Ben Widawsky	2015-11-20	2	-10/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	SKL supports the ability to do fast clears and resolves of 32b RGBA as both integer and floats. This patch only enables float color clears because we haven't yet enabled integer color clears (HW support for that was added in BDW). v2: Remove LUMINANCE16F and INTENSITY16F special cases since they are now handled by Neil's patch to disable MSAA fast clears. Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Neil Roberts <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	Revert "i965/gen9: Enable rep clears on gen9"	Ben Widawsky	2015-11-20	1	-5/+0
\| \| \| \| \| \| \| \| \|	This reverts commit 8a0c85b25853decb4a110b6d36d79c4f095d437b. It's not a strict revert because I don't want to bring back the gen < 9 check at this point in time. Reviewed-by: Neil Roberts <[email protected]>
*	Revert "i965/gen9: Disable MCS for 1x color surfaces"	Ben Widawsky	2015-11-20	1	-8/+0
\| \| \| \| \| \|	This reverts commit dcd59a9e322edeea74187bcad65a8e56c0bfaaa2. Reviewed-by: Neil Roberts <[email protected]>
*	i965/meta/gen9: Individually fast clear color attachments	Ben Widawsky	2015-11-20	1	-13/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The impetus for this patch comes from a seemingly benign statement within the spec (quoted within the patch). It is very important for clearing multiple color buffer attachments and can be observed in the following piglit tests: spec/arb_framebuffer_object/fbo-drawbuffers-none glclear spec/ext_framebuffer_multisample/blit-multiple-render-targets 0 v2: Doing the framebuffer binding only once (Chad) Directly use the renderbuffers from the mt (Chad) v3: Patch from Neil whose feedback I originally missed. Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Neil Roberts <[email protected]>
*	i965/skl: skip fast clears for certain surface formats	Ben Widawsky	2015-11-20	2	-27/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some of the information originally in this commit message is now in the patch before this. SKL adds compressible render targets and as a result mutates some of the programming for fast clears and resolves. There is a new internal surface type called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "Auxiliary Surfaces For Sampled Tiled Resource". The formats which are supported are defined in the table titled "Render Target Surface Types [SKL+]". There is no PRM yet to reference. The previously implemented helper function already does the right thing provided the table is correct. v2: Use better English in commit message (Matt) s/compressable/compressible/ (Matt) Don't compare bools to true (Matt) Use the helper function and don't increase the context size - this is mostly implemented in the patch just before this (Chad, Neil) Remove an "invalid" assert (Chad) Fix assertion to check num_samples > 1, instead of num_samples (Chad) v3: Use Matt's code as Requested-by: Chad. I didn't even look at it since Chad said he was fine with that, and presumably Matt is fine with it. v4: Use better quote from spec (Topi) Cc: Chad Versace <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965: Add lossless compression to surface format table	Ben Widawsky	2015-11-20	3	-252/+282
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Background: Prior to Skylake and since Ivybridge Intel hardware has had the ability to use a MCS (Multisample Control Surface) as auxiliary data in "compression" operations on the surface. This reduces memory bandwidth. This hardware was either used for MSAA compression, or fast clear operations. On Gen8, a similar mechanism exists to allow the hiz buffer to be sampled from, and therefore this feature is sometimes referred to more generally as "AUX buffers". Skylake adds the ability to have the display engine directly source compressed surfaces on top of the ability to sample from them. Inference dictates that enabling this display feature adds a restriction to the formats which could actually be compressed. This is backed up by a blurb in the AUX_CCS_D section from the RENDER_SURFACE_STATE: "In addition, if the surface is bound to the sampling engine, Surface Format must be supported for Render Target Compression for surfaces bound to the sampling engine." The current set of surfaces seems to be a subset as compared to previous gens (see the next patch). Also, if I had to guess I would guess that future gens add support for more surface formats. To make handling this a bit easier to read, and more future proof, the support for this is moved into the surface formats table. Along with the modifications to the table, a helper function is also provided to determine if a surface is CCS_E compatible. Because fast clears are currently disabled on SKL, we can plumb the helper all the way through here, and not actually have anything break. v2: - rename ccs to ccs_e; Requested-by: Chad - rename lossless_compression to lossless_compression Requested-by: Chad - change meaning of brw_losslessly_compressible_format Requested-by: Chad - related changes to the code to reflect this. - remove excess ccs (Chad) v3: - Commit message changes (Topi) - Const some things which could be const (Topi) Requested-by: Chad Versace <[email protected]> Requested-by: Neil Roberts <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	i965/skl: Add fast color clear infrastructure	Ben Widawsky	2015-11-20	4	-20/+81
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch was originally called: i965/skl: Enable fast color clears on SKL Skylake introduces some differences in the way that fast clears are programmed and in the restrictions for using fast clears. Since some of these are non-obvious, and fast clears are currently disabled globally, we can enable the simple stuff here and leave the weirder stuff as separately reviewable work. Based on a patch originally from Kristian. Note that within this patch the change in scaling factors could be achieved with this hunk instead. I've opted to keep things more like how the docs describe it however. --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -150,9 +150,13 @@ intel_get_non_msrt_mcs_alignment(struct brw_context *brw, /* In release builds, fall through */ case I915_TILING_Y: *width_px = 32 / mt->cpp; - *height = 4; + if (brw->gen >= 9) + *height = 2; + else + *height = 4; v2: Add braces for the multiline (Matt + Chad) Comment updates (requested by Chad) Modified commit message Commit message from Chad explaining the MCS height change (Chad) Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Neil Roberts <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	xmlconfig: Add support for DragonFly	François Tigeot	2015-11-20	1	-0/+3
\| \| \| \|	Signed-off-by: Emil Velikov <[email protected]>
*	i965: Enable EXT_shader_samples_identical	Ian Romanick	2015-11-19	5	-2/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On the vec4 backend, textureSamplesIdentical() will always return false. There are currently no test cases for the vec4 backend, so we don't have much confidence in any implementation. We also don't think anyone is likely to miss it. v2: Handle immediate value for MCS smarter. Rebase on changes to nir_texop_samples_identical (missing second parameter). Suggested by Jason. v3: Add Neil's code to handle 16x MSAA in the FS. Also rebase on top of f9a9ba5e. Stub out the vec4 implementation. Signed-off-by: Ian Romanick <[email protected]> Signed-off-by: Neil Roberts <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> [v2] Reviewed-by: Chris Forbes <[email protected]> [v2]
*	i965/vec4: Handle nir_tex_src_ms_index more like the scalar	Ian Romanick	2015-11-19	1	-8/+10
\| \| \| \| \| \| \| \|	v2: Rebase on top of f9a9ba5e. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
*	nir: Add nir_texop_samples_identical opcode	Ian Romanick	2015-11-19	2	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	This is the NIR analog to GLSL IR ir_samples_identical. v2: Don't add the second nir_tex_src_ms_index parameter. Suggested by Ken and Jason. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
*	nir: add nir_var_all enum	Rob Clark	2015-11-19	1	-1/+1
\| \| \| \| \| \| \| \| \|	Otherwise, passing -1 gets you: error: invalid conversion from 'int' to 'nir_variable_mode' [-fpermissive] Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
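A minimal sketch of why a named "all" enumerator helps when the header is compiled as C++. The enum and function names here are hypothetical and not the NIR definitions. C++ does not implicitly convert an `int` such as -1 to an enum type, so callers need an enumerator to pass.

```cpp
#include <cassert>

// Hypothetical enum; mode_all gives C++ callers a typed value for "match any".
enum variable_mode {
   mode_shader_in,
   mode_shader_out,
   mode_all = -1
};

// Returns true when `mode` passes the filter. Calling matches(-1, ...) would
// fail to compile in C++, which is the error quoted in the message above.
inline bool matches(variable_mode filter, variable_mode mode)
{
   return filter == mode_all || filter == mode;
}
```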
*	i965: Drop IMM fs_reg/src_reg -> brw_reg conversions.	Matt Turner	2015-11-19	2	-36/+2
\| \| \| \| \| \| \|	The previous two commits make this unnecessary. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/vec4: Replace src_reg(imm) constructors with brw_imm_*().	Matt Turner	2015-11-19	12	-239/+195
\| \| \| \| \| \| \|	Cuts 1.5k of .text. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Use brw_imm_uw().	Matt Turner	2015-11-19	2	-8/+3
\| \| \| \| \| \| \| \| \| \| \| \|	W/UW immediates are 16-bits, but those 16-bits must be replicated in the high 16-bits of the 32-bit field. Remove the useless W/UW immediate saturating code, since we'll now be using the appropriate immediate (and W/UW immediates in the IR can now no longer be larger than 16-bits). Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
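The replication requirement for W/UW immediates can be sketched as below. `replicate_uw` is a hypothetical helper and not the body of brw_imm_uw(). It copies the 16-bit value into both halves of the 32-bit immediate field.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: place a 16-bit immediate in the low half of a 32-bit
// field and replicate it into the high half.
inline uint32_t replicate_uw(uint16_t w)
{
   return (static_cast<uint32_t>(w) << 16) | w;
}
```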
*	i965/fs: Replace fs_reg(imm) constructors with brw_imm_*().	Matt Turner	2015-11-19	9	-217/+167
\| \| \| \| \| \| \| \| \| \| \| \|	Cuts 10k of .text, of which only 776 bytes are the fs_reg constructor implementations themselves. text data bss dec hex filename 5204535 214112 27784 5446431 531b1f i965_dri.so before 5193977 214112 27784 5435873 52f1e1 i965_dri.so after Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Make brw_imm_vf4() take 8-bit restricted floats.	Matt Turner	2015-11-19	2	-32/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This partially reverts commit bbf8239f92ecd79431dfa41402e1c85318e7267f. I didn't like that commit to begin with -- computing things at compile time is fine -- but for purposes of verifying that the resulting values are correct, looking up 0x00 and 0x30 in a table is a lot better than evaluating a recursive function. Anyway, by making brw_imm_vf4() take the actual 8-bit restricted floats directly (instead of only integral values that would be converted to restricted float), we can use this function as a replacement for the vector float src_reg/fs_reg constructors. brw_float_to_vf() is not currently an inline function, so it will not be evaluated at compile time. I'll address that in a follow-up patch. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
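For readers unfamiliar with the 8-bit restricted float format, here is a minimal decoding sketch. It assumes the layout is 1 sign bit, 3 exponent bits with a bias of 3, and 4 mantissa bits, with 0x00 and 0x80 reserved for positive and negative zero. `vf_to_float` is a hypothetical helper written for illustration, not the Mesa function.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical decoder for an 8-bit restricted float.
// Assumed layout: bit 7 sign, bits 6..4 exponent (bias 3), bits 3..0 mantissa.
inline float vf_to_float(uint8_t vf)
{
   const float sign = (vf & 0x80) ? -1.0f : 1.0f;
   if ((vf & 0x7f) == 0)
      return sign * 0.0f;              // 0x00 and 0x80 encode +0.0 and -0.0
   const int exponent = (vf >> 4) & 0x7;
   const int mantissa = vf & 0xf;
   // Implicit leading one, then scale by 2^(exponent - bias).
   return sign * std::ldexp(1.0f + mantissa / 16.0f, exponent - 3);
}
```

Under these assumptions, the two values named in the message, 0x00 and 0x30, decode to 0.0 and 1.0.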
*	i965: Allow indirect GS input indexing in the scalar backend.	Kenneth Graunke	2015-11-18	4	-46/+107
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows arbitrary non-constant indices on GS input arrays, both for the vertex index, and any array offsets beyond that. All indirects are handled via the pull model. We could potentially handle indirect addressing of pushed data as well, but it would add additional code complexity, and we usually have to pull inputs anyway due to the sheer volume of input data. Plus, marking pushed inputs as live due to indirect addressing could exacerbate register pressure problems pretty badly. We'd need to be careful. v2: Use updated MOV_INDIRECT opcode. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Abdiel Janulgue <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>