path: root/src/mesa/drivers
Commit message | Author | Age | Files | Lines
* i965/vec4: Init mlen for several send from GRF instructions. | Francisco Jerez | 2015-02-10 | 3 | -5/+11
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Don't infer MRF dependencies for send from GRF instructions. | Francisco Jerez | 2015-02-10 | 1 | -14/+18
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Fix the scheduler to take into account reads and writes of multiple registers. | Francisco Jerez | 2015-02-10 | 3 | -5/+29
| v2: Avoid nested ternary operators in vec4_instruction::regs_read(). (Matt)
| Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Make vec4_visitor::implied_mrf_writes() return zero for sends from GRF. | Francisco Jerez | 2015-02-10 | 1 | -1/+1
| Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Pass dst register to the vec4_instruction constructor. | Francisco Jerez | 2015-02-10 | 1 | -7/+5
| | | | | | So regs_written gets initialized with a sensible value. Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Initialize vec4_instruction::predicate and ::predicate_inverse. | Francisco Jerez | 2015-02-10 | 1 | -0/+2
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Implement equals() method for dst_reg too. | Francisco Jerez | 2015-02-10 | 2 | -0/+18
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Fix fs_inst::regs_written calculation for instructions with scalar dst. | Francisco Jerez | 2015-02-10 | 1 | -1/+2
| | | | | | | | | | | Scalar registers are required to have zero stride, fix the regs_written calculation not to assume that the instruction writes zero registers in that case. v2: Rename CEILING() to DIV_ROUND_UP(). (Matt, Ken) Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
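The zero-stride problem described in this entry can be illustrated with a small C sketch. The function names, the byte-based formula and the 32-byte register size are assumptions made for illustration; only DIV_ROUND_UP() is named in the commit message, and this is not the code from the patch.

```c
/* Hypothetical sketch: helper names and the formula are assumptions. */
#define REG_SIZE 32u /* assumed size of one hardware register in bytes */
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Naive count: a scalar destination (stride 0) yields zero registers. */
unsigned
regs_written_naive(unsigned width, unsigned stride, unsigned type_size)
{
   return DIV_ROUND_UP(width * stride * type_size, REG_SIZE);
}

/* Treat a zero-stride destination as writing at least one element. */
unsigned
regs_written_scalar_aware(unsigned width, unsigned stride, unsigned type_size)
{
   unsigned elements = width * stride;

   if (elements == 0)
      elements = 1;

   return DIV_ROUND_UP(elements * type_size, REG_SIZE);
}
```

With a 4-byte type, a scalar destination (width 1, stride 0) counts as zero registers in the naive version and as one register in the scalar-aware version.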
* i965/fs: Fix stack allocation of fs_inst and stop stealing src array provided on construction. | Francisco Jerez | 2015-02-10 | 2 | -37/+39
| Using 'ralloc*(this, ...)' is wrong if the object has automatic storage or was allocated through any other means. Use normal dynamic memory instead.
| Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Remove duplicate include of brw_shader.h | Francisco Jerez | 2015-02-10 | 1 | -1/+0
| | | | | | | The second one was inside an extern "C" block, luckily it was being discarded by the preprocessor. Reviewed-by: Matt Turner <[email protected]>
* i965: Move up fs_inst::flag_subreg to backend_instruction. | Francisco Jerez | 2015-02-10 | 5 | -7/+16
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965: Move up fs_inst::regs_written to backend_instruction. | Francisco Jerez | 2015-02-10 | 3 | -1/+2
| | | | | | It will also be useful in the VEC4 back-end. Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Remove dependency of vec4_instruction on the visitor class. | Francisco Jerez | 2015-02-10 | 3 | -36/+32
| | | | | | | | | The only reason why you need a vec4_visitor to construct a vec4_instruction is to initialize vec4_instruction::ir and ::annotation. Instead set them from vec4_visitor::emit() just like fs_visitor does. Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Remove dependency of fs_inst on the visitor class. | Francisco Jerez | 2015-02-10 | 7 | -13/+12
| | | | | | The fs_visitor argument of fs_inst::regs_read() wasn't used at all. Reviewed-by: Matt Turner <[email protected]>
* i965: Move IR object definitions to separate header files. | Francisco Jerez | 2015-02-10 | 4 | -381/+450
| | | | | | | | | | | | One should be able to manipulate i965 IR without pulling the whole FS/VEC4 visitor classes -- Optimization passes and other transformations would ideally be visitor-agnostic. Among other issues this avoids a circular dependency between the header file where such visitor-agnostic code will be defined and the main FS/VEC4 header where both IR (layer below) and visitor (layer above) happen to be defined. Reviewed-by: Matt Turner <[email protected]>
* i965: Factor out virtual GRF allocation to a separate object. | Francisco Jerez | 2015-02-10 | 18 | -201/+235
| | | | | | | | | | | | | Right now virtual GRF book-keeping and allocation is performed in each visitor class separately (among other hundred different things), leading to duplicated logic in each visitor and preventing layering as it forces any code that manipulates i965 IR and needs to allocate virtual registers to depend on the specific visitor that happens to be used to translate from GLSL IR. v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor). Reviewed-by: Matt Turner <[email protected]>
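A visitor-independent allocator of the kind this entry describes could look roughly like the C sketch below. The struct and function names are hypothetical and the growth policy is an assumption; the only detail taken from the commit message is that the book-keeping array is managed with realloc()/free().

```c
#include <stdlib.h>

/* Hypothetical allocator object; names are illustrative, not Mesa's. */
struct vgrf_allocator {
   unsigned *sizes;    /* size in registers of each virtual GRF */
   unsigned count;     /* number of virtual GRFs handed out so far */
   unsigned capacity;  /* allocated length of the sizes array */
};

void
vgrf_allocator_init(struct vgrf_allocator *alloc)
{
   alloc->sizes = NULL;
   alloc->count = 0;
   alloc->capacity = 0;
}

/* Returns the index of the new virtual GRF, or -1 if out of memory. */
int
vgrf_allocator_allocate(struct vgrf_allocator *alloc, unsigned size)
{
   if (alloc->count == alloc->capacity) {
      /* Assumed policy: start at 16 entries and double when full. */
      unsigned new_capacity = alloc->capacity ? alloc->capacity * 2 : 16;
      unsigned *grown = realloc(alloc->sizes, new_capacity * sizeof(*grown));

      if (grown == NULL)
         return -1;

      alloc->sizes = grown;
      alloc->capacity = new_capacity;
   }

   alloc->sizes[alloc->count] = size;
   return (int)alloc->count++;
}

void
vgrf_allocator_finish(struct vgrf_allocator *alloc)
{
   free(alloc->sizes);
   alloc->sizes = NULL;
   alloc->count = 0;
   alloc->capacity = 0;
}
```

Because the object holds no reference to a visitor, any pass that manipulates the IR could in principle allocate registers through it, which is the layering benefit the message argues for.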
* i965: Fix integer border color on Haswell. | Kenneth Graunke | 2015-02-09 | 3 | -0/+66
| | | | | | | | | +82 Piglits - 100% of border color tests now pass on Haswell. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: [email protected]
* i965: Use a gl_color_union for sampler border color. | Kenneth Graunke | 2015-02-09 | 1 | -53/+52
| | | | | | | | | | | This should have no effect, but will make it easier to implement other bug fixes. v2: Eliminate "unsigned one" local; just use the value where necessary. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Cc: [email protected]
* i965: Override swizzles for integer luminance formats. | Kenneth Graunke | 2015-02-09 | 1 | -0/+8
| | | | | | | | | | | | | | | The hardware's integer luminance formats are completely unusable; currently we fall back to RGBA. This means we need to override the texture swizzle to obtain the XXX1 values expected for luminance formats. Fixes spec/EXT_texture_integer/texwrap formats bordercolor [swizzled] on Broadwell - 100% of border color tests now pass on Broadwell. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: [email protected]
* i965: Add more stringent blitter assertions | Ben Widawsky | 2015-02-07 | 1 | -0/+3
| | | | | | | | | | | | Blits to or from a y-tiled surface must always be a multiple of the tile size. From page 16 of the HSW PRM (https://01.org/linuxgraphics/sites/default/files/documentation/intel-gfx-prm-osrc-hsw-memory-views.pdf#16) "The pitch of a tiled enclosing region must be an integral number of tile widths" Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
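The PRM rule quoted in this entry amounts to a divisibility check on the pitch. The sketch below is an illustration only: the helper name is hypothetical, and the tile widths (512 bytes for X tiling, 128 bytes for Y tiling) are stated here as assumptions rather than taken from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed tile widths in bytes; the constant names are hypothetical. */
enum {
   X_TILE_WIDTH_BYTES = 512,
   Y_TILE_WIDTH_BYTES = 128
};

/* True when the pitch covers an integral number of tile widths. */
bool
pitch_is_whole_tiles(uint32_t pitch_bytes, uint32_t tile_width_bytes)
{
   return tile_width_bytes != 0 && pitch_bytes % tile_width_bytes == 0;
}
```

A driver would typically wrap such a check in an assert() on the blit path, so that a misaligned pitch is caught in debug builds instead of producing a corrupted blit.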
* i965: Consolidate some of the intel_blit logic | Ben Widawsky | 2015-02-07 | 1 | -20/+8
| | | | | | | | | | | An upcoming patch is going to introduce some code here, and having this code organized as the patch does makes it a bit easier to read later. There should be no functional change here. Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Correct MUL destination hazard | Ben Widawsky | 2015-02-06 | 1 | -4/+4
| As it turns out, we were over-thinking the cause of the hang on Cherryview. It's simply errata for Cherryview.
|
|   commit 88fea85f09e2252035bec66ab26c375b45b000f5
|   Author: Ben Widawsky <[email protected]>
|   Date: Fri Nov 21 10:47:41 2014 -0800
|
|     i965/vec4/gen8: Handle the MUL dest hazard exception
|
| This is an explanation to why we never saw the hang on BDW.
|
| NOTE: The problem the original patch was trying to fix does still exist. It will have to be fixed at some point.
|
| v2: Modify commit message, s/CHV/BDW
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84212
| Signed-off-by: Ben Widawsky <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965: Fix INTEL_DEBUG=shader_time for SIMD8 VS (and GS). | Kenneth Graunke | 2015-02-05 | 1 | -9/+25
| | | | | | | | | | | | We were incorrectly attributing VS time to FS8 on Gen8+, which now use fs_visitor for vertex shaders. We don't hit this for geometry shaders yet, but we may as well add support now - the fix is obvious, and we'll just forget later. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/fs: Use inst->eot rather than opcodes in register allocation. | Kenneth Graunke | 2015-02-05 | 1 | -11/+10
| | | | | | | | | | | | Previously, we special cased FB writes and URB writes in the register allocation code. What we really wanted was to handle any message with EOT set. This saves us from extending the list with new opcodes in the future. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965/fs: Delete is_last_send(); just check inst->eot. | Kenneth Graunke | 2015-02-05 | 1 | -14/+1
| | | | | | | | | | | | | | | This helper function basically just checks inst->eot, but also asserts that only opcodes we expect to terminate threads have EOT set. As far as I'm aware, we've never had such a bug. Removing it means that we don't have to extend the list for new opcodes. Cherryview and Skylake introduce an optimization where sampler messages can have EOT set; scalar GS/HS/DS will likely introduce new opcodes as well. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965: Remove now unnecessary Gen8 CMP destination type override. | Matt Turner | 2015-02-04 | 1 | -8/+0
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Set CMP's destination type to src0's type. | Matt Turner | 2015-02-04 | 2 | -18/+18
| | | | | | Allows CMP instructions with float sources to be compacted and coissued. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Implement the WaCMPInstFlagDepClearedEarly work-around. | Matt Turner | 2015-02-04 | 1 | -1/+36
| | | | | | Prevents piglit regressions from the next patch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Fix saturate on MAD and LRP with the NIR backend. | Kenneth Graunke | 2015-02-04 | 1 | -2/+4
| | | | | | | | | Fixes misrendering in "Witcher 2" with INTEL_USE_NIR=1, and probably many other programs. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/nir: use redundant phi optimization | Connor Abbott | 2015-02-03 | 1 | -0/+2
| | | | | | Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Jason Ekstrand <[email protected]> Signed-off-by: Connor Abbott <[email protected]>
* i965/fs_nir: Get rid of get_alu_src | Jason Ekstrand | 2015-02-03 | 2 | -59/+75
| Originally, get_alu_src was supposed to handle resolving swizzles and things like that. However, now that basically every instruction we have only takes scalar sources, we don't really need it anymore. The only case where it's still marginally useful is for the mov and vecN operations that are left over from SSA form. We can handle those cases as a special case easily enough.
|
| As a side-effect, we don't need the vec_to_movs pass anymore.
|
| v2 Jason Ekstrand <[email protected]>:
| - Rework the way we detect if we need an extra copy for swizzling. The old code involved a pile of confusing switch fall-throughs; we now use a loop.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Use NIR's scalarizing abilities and stop handling vectors | Jason Ekstrand | 2015-02-03 | 2 | -349/+161
| Now that we can scalarize with NIR, there's no need for all this code anymore. Let's get rid of it and just do scalar operations.
|
| v2: run copy prop before lowering phi nodes
| v3: Get rid of the "emit(...)->saturate = foo" pattern
| v4: Run alu_to_scalar as an optimization pass
|
| total instructions in shared programs: 5998321 -> 5974070 (-0.40%)
| instructions in affected programs: 732075 -> 707824 (-3.31%)
| helped: 3137
| HURT: 191
| GAINED: 18
| LOST: 0
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add support for constant propagating into sources with modifiers. | Matt Turner | 2015-02-03 | 1 | -6/+12
| All but 16 of the programs helped were ARB fp programs.
|
| total instructions in shared programs: 5949286 -> 5945470 (-0.06%)
| instructions in affected programs: 275162 -> 271346 (-1.39%)
| helped: 1197
| GAINED: 1
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Use abs/negate functions in const propagation. | Matt Turner | 2015-02-03 | 1 | -13/+5
| | | | | | No changes in shader-db. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Add function to take the abs of immediates. | Matt Turner | 2015-02-03 | 2 | -0/+40
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Add function to negate immediates. | Matt Turner | 2015-02-03 | 2 | -0/+40
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Mark UB/B immediates as unreachable. | Matt Turner | 2015-02-03 | 1 | -4/+1
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* glsl: Improve precision of mod(x,y) | Iago Toral Quiroga | 2015-02-03 | 3 | -3/+3
| Currently, Mesa uses the lowering pass MOD_TO_FRACT to implement mod(x,y) as y * fract(x/y). This implementation has a down side though: it introduces precision errors due to the fract() operation. Even worse, since the result of fract() is multiplied by y, the larger y gets the larger the precision error we produce, so for large enough numbers the precision loss is significant. Some examples on i965:
|
|   Operation                          Precision error
|   -----------------------------------------------------
|   mod(-1.951171875, 1.9980468750)    0.0000000447
|   mod(121.57, 13.29)                 0.0000023842
|   mod(3769.12, 321.99)               0.0000762939
|   mod(3769.12, 1321.99)              0.0001220703
|   mod(-987654.125, 123456.984375)    0.0160663128
|   mod( 987654.125, 123456.984375)    0.0312500000
|
| This patch replaces the current lowering pass with a different one (MOD_TO_FLOOR) that follows the recommended implementation in the GLSL man pages:
|
|   mod(x,y) = x - y * floor(x/y)
|
| This implementation eliminates the precision errors at the expense of an additional add instruction on some systems. On systems that can do negate with multiply-add in a single operation this new implementation would come at no additional cost.
|
| v2 (Ian Romanick)
| - Do not clone operands because when they are expressions we would be duplicating them and that can lead to suboptimal code.
|
| Fixes the following 16 dEQP tests:
| dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.mediump_*
| dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.highp_*
|
| Reviewed-by: Ian Romanick <[email protected]>
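The two formulations of mod(x,y) named in this entry can be compared on the CPU with a short C sketch. This is an illustration in single-precision C arithmetic, not the GLSL lowering pass itself; the function names are hypothetical, and floor_small() is an assumed stand-in for floorf() so that the sketch does not depend on the math library.

```c
/* Stand-in for floorf(); only valid for values that fit in a long. */
float
floor_small(float v)
{
   float truncated = (float)(long)v; /* conversion truncates toward zero */

   return (truncated > v) ? truncated - 1.0f : truncated;
}

/* MOD_TO_FRACT style: y * fract(x / y), with fract(v) = v - floor(v). */
float
mod_via_fract(float x, float y)
{
   float q = x / y;

   return y * (q - floor_small(q));
}

/* MOD_TO_FLOOR style: x - y * floor(x / y). */
float
mod_via_floor(float x, float y)
{
   return x - y * floor_small(x / y);
}
```

In the fract form the rounding error of x / y survives the subtraction and is then scaled by y, which matches the message's observation that the error grows with y. In the floor form the quotient is rounded to an integer before it is multiplied by y, so the fractional error of the division does not carry through as long as the floor lands on the right integer.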
* i965: Fix negate with unsigned integers | Iago Toral Quiroga | 2015-02-03 | 2 | -8/+10
| For code such as:
|
|   uint tmp1 = uint(in0);
|   uint tmp2 = -tmp1;
|   float out0 = float(tmp2);
|
| We produce code like:
|
|   mov(8) g5<1>.xF -g9<4,4,1>.xUD
|
| which does not produce correct results. This code produces the results we would expect if tmp1 and tmp2 were signed integers instead.
|
| It seems that a similar problem was detected and addressed when using negations with unsigned integers as part of conditionals, but it looks like the problem has a wider impact than that.
|
| This patch fixes the problem by preventing copy-propagation of negated UD registers in all scenarios, not only in conditionals.
|
| Fixes the following 24 dEQP tests:
| dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uint_*
| dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec2_*
| dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec3_*
| dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec4_*
|
| Reviewed-by: Anuj Phogat <[email protected]>
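The difference between the expected and the observed result in this entry can be reproduced in plain C. The function names are hypothetical; the sketch only models the two interpretations of the negated value and does not reproduce the generated i965 instruction.

```c
#include <stdint.h>

/* Expected GLSL result: negate in unsigned (modulo 2^32) arithmetic,
 * then convert the wrapped value to float. */
float
negate_uint_then_convert(uint32_t v)
{
   uint32_t negated = 0u - v;

   return (float)negated;
}

/* Result described in the entry: the operand behaves as if it were a
 * signed integer. Only meaningful here for v <= INT32_MAX. */
float
negate_as_signed_then_convert(uint32_t v)
{
   return (float)(-(int32_t)v);
}
```

For an input of 1 the unsigned path wraps to 4294967295, which converts to 4294967296.0f, while the signed interpretation gives -1.0f.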
* i965/gen6+: enable EXT_polygon_offset_clamp | Ilia Mirkin | 2015-02-02 | 4 | -3/+4
| | | | | | | Replace the hard-coded 0's with the context clamp value. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: add support for GL_EXT_polygon_offset_clamp | Ilia Mirkin | 2015-02-02 | 3 | -3/+3
| | | | | | | | | | Nothing enables the extension yet, but the values are now available. The spec calls for it to only be exposed for GL 3.3+, which is core-only in mesa. Instead we allow any driver to enable it, including in a compat context for any GL version. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Glenn Kennard <[email protected]>
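For readers unfamiliar with the extension, the clamp value bounds the computed depth offset. The C sketch below follows the clamping rule from the EXT_polygon_offset_clamp specification as an assumption to be checked against the spec text; the function and parameter names are illustrative and do not come from Mesa.

```c
/* m: maximum depth slope of the polygon
 * r: minimum resolvable depth difference
 * factor, units, clamp: the three polygon offset parameters
 * A clamp of zero leaves the offset unbounded, which is consistent with
 * the drivers previously hard-coding 0. */
float
polygon_offset(float m, float r, float factor, float units, float clamp)
{
   float offset = m * factor + r * units;

   if (clamp > 0.0f && offset > clamp)
      offset = clamp;
   else if (clamp < 0.0f && offset < clamp)
      offset = clamp;

   return offset;
}
```

A positive clamp acts as an upper bound and a negative clamp as a lower bound, so the sign of the clamp should match the direction of the intended offset.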
* i965: Add a better PRM citation for the IMS dimension mangling. | Kenneth Graunke | 2015-02-02 | 1 | -1/+22
| | | | | | | | | | | | | | | Paul originally had to reverse engineer these formulas based on the description about how the sampler works. The description here is not the easiest to follow - especially given that it's from the Sandybridge era, when the hardware only did 4x multisampling. Jordan and I recently found another part of the documentation where they simply state that IMS dimensions must be adjusted by a set of formulas. Quoting this section provides an easy to follow explanation for the code, including 2x/4x/8x/16x. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* DD: Refactor BlitFramebuffer. | Laura Ekstrand | 2015-02-02 | 12 | -46/+78
| | | | | | | | | In preparation for glBlitNamedFramebuffer, the DD table function BlitFramebuffer needs to accept two arbitrary framebuffer objects rather than assuming ctx->ReadBuffer and ctx->DrawBuffer. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* i965: Don't use tiled_memcpy to download from RGBX or BGRX surfaces | Jason Ekstrand | 2015-02-02 | 2 | -0/+14
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88841 Reviewed-by: Anuj Phogat <[email protected]>
* dir-locals.el: Don't set variables for non-programming modes | Neil Roberts | 2015-02-02 | 1 | -1/+1
| | | | | | | | | | | | | | This limits the style changes to modes inherited from prog-mode. The main reason to do this is to avoid setting fill-column for people using Emacs to edit commit messages because 78 characters is too many to make it wrap properly in git log. Note that makefile-mode also inherits from prog-mode so the fill column should continue to apply there. v2: Apply to all the .dir-locals.el files, not just the one in the root directory. Acked-by: Michel Dänzer <[email protected]>
* i965: Fix intel_miptree_copy_teximage for GL_TEXTURE_1D_ARRAY | Iago Toral Quiroga | 2015-02-02 | 1 | -1/+6
| For GL_TEXTURE_1D_ARRAY targets we store the depth of the array in the Height field and leave Depth=1 in the underlying texture object. When we call intel_miptree_copy_teximage in the process of re-creating a miptree (possibly because the number of miplevels has changed) we didn't account for this, so we were only copying texture images for the first slice.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
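The slice-count rule described in this entry can be expressed as a small helper. The enum, struct and function below are hypothetical stand-ins used only to show where the array length comes from for each target; they are not Mesa's types.

```c
/* Hypothetical stand-ins for the texture target and image dimensions. */
enum tex_target { TEX_1D_ARRAY, TEX_2D_ARRAY, TEX_3D };

struct tex_image {
   unsigned width;
   unsigned height;
   unsigned depth;
};

/* Number of slices to copy for one miplevel. */
unsigned
slices_to_copy(enum tex_target target, const struct tex_image *image)
{
   /* For 1D arrays the array length lives in Height and Depth stays 1. */
   if (target == TEX_1D_ARRAY)
      return image->height;

   return image->depth;
}
```

Looping over image->depth unconditionally would visit only one slice of a 1D array, which is the symptom the commit message reports.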
* i965/pixel_read: Don't try to do a tiled_memcpy from a multisampled buffer | Jason Ekstrand | 2015-01-31 | 1 | -0/+7
| | | | | | | | | | The GL spec guarantees that glGetTexImage will never get a multisampled texture, but this is not true for glReadPixels. If we get a multisampled buffer, we have to do a multisample resolve on it before we can pull the data down for the user. Since this isn't practical to handle in tiled_memcpy, we just fall back to the other paths that can handle this. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Enable L3 caching of buffer surfaces. | Francisco Jerez | 2015-01-31 | 4 | -9/+3
| | | | | | | | | | | | | | | And remove the mocs argument of the emit_buffer_surface_state vtbl hook. Its semantics vary greatly from one generation to another, so it kind of encourages the caller to pass 0 which is the only valid setting across generations. After this commit the hardware-specific code decides what the best cacheability settings are for buffer surfaces, just like we do for textures. This together with some additional changes coming is expected to improve performance of pull constants, buffer textures, atomic counters and image objects on Gen7 and up. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/pixel_read: Properly flip the results for window system buffers | Jason Ekstrand | 2015-01-30 | 1 | -0/+15
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88841 Reviewed-by: Chad Versace <[email protected]>
* i965/tiled_memcpy: Support a signed linear pitch | Jason Ekstrand | 2015-01-30 | 2 | -17/+17
| | | | Reviewed-by: Chad Versace <[email protected]>
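A signed pitch is useful because a negative value lets a copy walk the linear buffer from the last row to the first, which is one way a vertical flip can be expressed. The sketch below is a plain row-by-row memcpy with a hypothetical name and signature; it does not model tiling and is not the tiled_memcpy code.

```c
#include <stddef.h>
#include <string.h>

/* Copies `rows` rows of `row_bytes` bytes each. Either pitch may be
 * negative, in which case that buffer is walked from its last row
 * toward its first; the pointer must then address the last row. */
void
copy_rows(unsigned char *dst, ptrdiff_t dst_pitch,
          const unsigned char *src, ptrdiff_t src_pitch,
          size_t row_bytes, size_t rows)
{
   for (size_t y = 0; y < rows; y++) {
      memcpy(dst + (ptrdiff_t)y * dst_pitch,
             src + (ptrdiff_t)y * src_pitch,
             row_bytes);
   }
}
```

To flip a three-row image with two bytes per row, a caller would pass a destination pointer to the start of the last row together with a destination pitch of -2.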