summaryrefslogtreecommitdiffstats
path: root/src/mesa
Commit message (Collapse)AuthorAgeFilesLines
* gen7: Use logical, not physical, dims in 3DSTATE_DEPTH_BUFFER (v2)Chad Versace2013-10-072-4/+4
| | | | | | | | | | | | | | | | | In 3DSTATE_DEPTH_BUFFER, we set Width and Height to the miptree slice's physical dimensions. (Logical and physical dimensions may differ for multisample surfaces). However, in SURFACE_STATE, we always set Width and Height to the slice's logical dimensions. We should do the same for 3DSTATE_DEPTH_BUFFER, because the hw docs say so. No Piglit regressions (-x glx -x glean) on Ivybridge with Wayland. v2: No Piglit regressions, for real this time. Reviewed-by: Jordan Justen <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Remove the "ARF" register file.Matt Turner2013-10-075-16/+6
| | | | | | | | The registers in the architecture register file don't share much in common, so there's no point in grouping them together. Use the HW_REG class instead. The vec4 backend already does this. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fixup for don't dead-code eliminate instructions that write to the ↵Matt Turner2013-10-072-4/+2
| | | | | | | | | | accumulator. Accidentally pushed an old version of the patch. v2: Set destination register using brw_null_reg(). Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Generate code for ir_binop_imul_high.Matt Turner2013-10-073-0/+18
| | | | | | v2: Make accumulator's type match the type of the operation. Noticed by Ken. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use the multiplication result's type for the accumulator.Matt Turner2013-10-072-2/+2
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Disable CSE on instructions writing to HW_REG.Matt Turner2013-10-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | CSE would otherwise combine the two mul(8) emitted by [iu]mulExtended: mul(8) acc0 x y mach(8) null x y mov(8) lsb acc0 ... mul(8) acc0 x y mach(8) msb x y Into: mul(8) temp x y mov(8) acc0 temp mach(8) null x y mov(8) lsb acc0 ... mov(8) acc0 temp mach(8) msb x y But mul(8) into the accumulator produces more than 32-bits of precision, which is required and lost if multiplying into a general register and moving to the accumulator. Reviewed-by: Eric Anholt <[email protected]>
* glsl: Implement [iu]mulExtended() built-ins for ARB_gpu_shader5.Matt Turner2013-10-072-0/+2
| | | | | | | | | | These built-ins have two "out" parameters, which makes implementing them efficiently with our current compiler infrastructure difficult. Instead, implement them in terms of the existing ir_binop_mul IR (to return the low 32-bits) and a new ir_binop_mul64 which returns the high 32-bits. v2: Rename mul64 -> imul_high as suggested by Ken. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add Gen assertion checks for newer instructions.Matt Turner2013-10-072-0/+22
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Don't dead-code eliminate instructions that write to the accumulator.Matt Turner2013-10-072-2/+30
| | | | | Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Generate code for ir_binop_carry and ir_binop_borrow.Matt Turner2013-10-0714-0/+80
| | | | | | Using the ADDC and SUBB instructions on Gen7. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add UD null register helpers.Matt Turner2013-10-072-0/+6
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Add ir_binop_carry and ir_binop_borrow.Matt Turner2013-10-072-0/+4
| | | | | | | | | Calculates the carry out of the addition of two values and the borrow from subtraction respectively. Will be used in uaddCarry() and usubBorrow() built-in implementations. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* st/mesa: silence warning about unhandled ir_query_levels in switchBrian Paul2013-10-071-0/+3
|
* dri/nouveau: add AllocTextureImageBuffer implementationIlia Mirkin2013-10-061-0/+9
| | | | | | | | | This fixes issues where get_rt_format would see a 0 format because the nouveau_surface had not been properly initialized. Fixes crash on supertuxkart startup (which still fails due to out-of-vram issues). Signed-off-by: Ilia Mirkin <[email protected]> Signed-off-by: Francisco Jerez <[email protected]>
* i965/ivb: Flag RG32F quirk for texture gather regardless of swizzlesChris Forbes2013-10-061-1/+1
| | | | | | | | | | | | As of ARB_gpu_shader5, textureGather doesn't always read the post-swizzle RED channel -- so we can't just look at the red swizzle state. Theoretically we could only flag the quirk if *some* green swizzle is in use, but that's probably more trouble than it's worth. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Add support for textureGather(.., comp)Chris Forbes2013-10-061-7/+11
| | | | | | | | | | | - For HSW: Select the channel based on the component selected (swizzle is done in HW) - For IVB: Select the channel based on the swizzle state for the component selected. Only apply the RG32F w/a if we actually want green -- we're about to flag it regardless of swizzle state. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add support for textureGather(.., comp)Chris Forbes2013-10-061-7/+11
| | | | | | | | | | | - For HSW: Select the channel based on the component selected (swizzle is done in HW) - For IVB: Select the channel based on the swizzle state for the component selected. Only apply the RG32F w/a if we actually want green -- we're about to flag it regardless of swizzle state. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Enable ARB_conservative_depth for Gen7+.Chris Forbes2013-10-061-0/+1
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/wm: Program correct conservative depth modesChris Forbes2013-10-061-2/+14
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: add missing break between ir_query_levels and ir_tg4 casesChris Forbes2013-10-051-0/+1
| | | | Signed-off-by: Chris Forbes <[email protected]>
* i965: enable ARB_texture_query_levels on Gen6+Chris Forbes2013-10-051-0/+1
| | | | | | | | | | Theoretically would work on Gen5 as well but requires GLSL 1.30, which is not (yet) enabled by default there. V2: Enable for Gen5 conditionally on GLSL version. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vs: implement ir_query_levelsChris Forbes2013-10-051-1/+14
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: implement ir_query_levelsChris Forbes2013-10-051-1/+19
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: ignore all texturing opcodes without a coordinate, for cubemap normalizeChris Forbes2013-10-051-1/+1
| | | | | | | | Previously we special-cased textureSize() but this is the more correct condition. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: add plumbing for GL_ARB_texture_query_levelsChris Forbes2013-10-051-0/+3
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa: add plumbing for GL_ARB_texture_query_levelsChris Forbes2013-10-052-0/+2
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa: Don't return any data for GL_SHADER_BINARY_FORMATSIan Romanick2013-10-041-1/+1
| | | | | | | | | | | We return 0 for GL_NUM_SHADER_BINARY_FORMATS, so GL_SHADER_BINARY_FORMATS should not write any data to the application buffer. Fixes piglit test 'arb_get_program_binary-overrun shader'. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Improve accuracy of dFdy() to match dFdx().Paul Berry2013-10-032-20/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we computed dFdy() using the following instruction: add(8) dst<1>F src<4,4,0)F -src.2<4,4,0>F { align1 1Q } That had the disadvantage that it computed the same value for all 4 pixels of a 2x2 subspan, which meant that it was less accurate than dFdx(). This patch changes it to the following instruction when c->key.high_quality_derivatives is set: add(8) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1Q } This gives it comparable accuracy to dFdx(). Unfortunately, align16 instructions can't be compressed, so in SIMD16 shaders, instead of emitting this instruction: add(16) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1H } We need to unroll to two instructions: add(8) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1Q } add(8) (dst+1)<1>F (src+1)<4,4,1>.xyxyF -(src+1)<4,4,1>.zwzwF { align16 2Q } Fixes piglit test spec/glsl-1.10/execution/fs-dfdy-accuracy. Acked-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* st/mesa: silence warning about unhandled enum in switch statementBrian Paul2013-10-031-0/+3
|
* mesa: fix make check for ARB_texture_gatherChris Forbes2013-10-032-3/+3
| | | | | | | | | | | Clean up inconsistency in enum decoration: - Use the undecorated enums where possible. - MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB remains decorated, since it has no undecorated equivalent in GL4. Signed-off-by: Chris Forbes <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70054 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/hsw: Apply gather4 RG32F w/a using SCS instead of shader.Chris Forbes2013-10-032-8/+11
| | | | | | | | The new surface channel select bits allow us to avoid having to recompile the shader for this workaround. Signed-off-by: Chris Forbes <[email protected]> Reviewed-and-tested-by: Kenneth Graunke <[email protected]>
* i965: Enable ARB_texture_gather on Gen7Chris Forbes2013-10-032-0/+5
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: use gather slots in the binding table for gather4.Chris Forbes2013-10-032-4/+12
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Emit a second set of SURFACE_STATE for gather4 from textures.Chris Forbes2013-10-033-8/+39
| | | | | | | | | | | | | This allows us to use a different surface format for gather4, which is required for R32G32_FLOAT to work on Gen7. V4: - Only emit alternate surface state for shaders which will actually use it. - Pass a simple 'for_gather' flag rather than a function pointer. The callee can decide what w/a to apply. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: make room in the binding table for a full alternate set of surface_statesChris Forbes2013-10-031-2/+18
| | | | | | | | | | Worst-case is that *every* texunit uses a format that needs overriding. V4: Place the gather slots last, so shaders which don't use gather don't get penalized by having a huge binding table. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add BRW_SURFACEFORMAT_R32G32_FLOAT_LD, required for IVB gather4 w/aChris Forbes2013-10-032-0/+2
| | | | | | | | | | | gather4 GREEN channel against a surface with format R32G32_FLOAT doesn't work correctly on IVB. w/a from bspec: - use R32G32_FLOAT_LD = 0x97 instead, for gather4 only. - select BLUE channel to read GREEN Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: w/a for gather4 green RG32FChris Forbes2013-10-034-0/+22
| | | | | | | | | V4: Only flag quirks if there are any uses of gather in the shader, to avoid spurious recompiles just because someone happened to use RG32F. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: flag shaders which use gather4 at allChris Forbes2013-10-031-0/+2
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Add support for ir_tg4Chris Forbes2013-10-032-2/+45
| | | | | | | | | | Pretty much the same as the FS case. Channel select goes in the header, V2: Less mangling. V3: Avoid sampling at all, for degenerate swizzles. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add support for ir_tg4Chris Forbes2013-10-032-3/+60
| | | | | | | | | | | | | | | | | | | | Lowers ir_tg4 (from textureGather and textureGatherOffset builtins) to SHADER_OPCODE_TG4. The usual post-sampling swizzle workaround can't work for ir_tg4, so avoid doing that: * For R/G/B/A swizzles use the hardware channel select (lives in the same dword in the header as the texel offset), and then don't do anything afterward in the shader. * For 0/1 swizzles blast the appropriate constant over all the output channels instead of sampling. V2: Avoid duplicating header enabling block V3: Avoid sampling at all, for degenerate swizzles. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: add SHADER_OPCODE_TG4Chris Forbes2013-10-036-2/+17
| | | | | | | | | | Adds the Gen7 message IDs, a new SHADER_OPCODE_TG4 pseudo-op, and low-level support for emitting it via generate_tex(). V3: Updated for changes in master. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: add texture gather changesMaxence Le Dore2013-10-031-0/+3
| | | | | | | | | | V2 [Chris Forbes]: - Add new pattern, fixup parameter reading. V3: Rebase onto new builtins machinery Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: add texture gather changesMaxence Le Dore2013-10-036-0/+21
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: fix bogus swizzle in brw_cubemap_normalizeChris Forbes2013-10-031-4/+6
| | | | | | | | | | | | | | | | When used with a cube array in VS, failed assertion in ir_validate: Assignment count of LHS write mask channels enabled not matching RHS vector size (3 LHS, 4 RHS). To fix this, swizzle the RHS correctly for the writemask. This showed up in the ARB_texture_gather tests, which exercise cube arrays in the VS. Signed-off-by: Chris Forbes <[email protected]> Cc: "9.2" <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: compute DDX in a subspan based only on top rowChia-I Wu2013-10-027-8/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | Consider only the top-left and top-right pixels to approximate DDX in a 2x2 subspan, unless the application requests a more accurate approximation via GL_FRAGMENT_SHADER_DERIVATIVE_HINT or this optimization is disabled from the new driconf option disable_derivative_optimization. This results in a less accurate approximation. However, it improves the performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at 95.0% confidence) on Haswell. No noticeable image quality difference observed. The improvement comes from faster sample_d. It seems, on Haswell, some optimizations are introduced to allow faster sample_d when all pixels in a subspan have the same derivative. I considered SAMPLE_STATE too, which allows one to control the quality of sample_d on Haswell. But it gave much worse image quality without giving better performance comparing to this change. No piglit quick.tests regression on Haswell (tested with v1). v2: better guess for precompile program key Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/blorp: Use passed in framebuffer rather than ctx->DrawBufferChris Forbes2013-10-021-4/+4
| | | | | | | | We have the destination framebuffer object passed in; there's no need to go digging around in the context. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* st/mesa: Switch glsl_to_tgsi_instruction to the non-zeroing allocator.Francisco Jerez2013-10-011-1/+1
| | | | | | | | All member variables of glsl_to_tgsi_instruction are already being initialized from its implicitly defined constructor, it's not necessary to use rzalloc to allocate its memory. Reviewed-by: Kenneth Graunke <[email protected]>
* mesa/program: Switch ir_to_mesa_instruction to the non-zeroing allocator.Francisco Jerez2013-10-011-1/+1
| | | | | | | | All member variables of ir_to_mesa_instruction are already being initialized from its implicitly defined constructor, it's not necessary to use rzalloc to allocate its memory. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Switch vec4_live_variables to the non-zeroing allocator.Francisco Jerez2013-10-011-1/+1
| | | | | | | | | | | | | | | | All member variables of vec4_live_variables are already being initialized from its constructor, it's not necessary to use rzalloc to allocate its memory, and doing so makes it more likely that we will start relying on the allocator to zero out all memory if the class is ever extended with new member variables. That's bad because it ties objects to some specific allocation scheme, and gives unpredictable results when an object is created with a different allocator -- Stack allocation, array allocation, or aggregation inside a different object are some of the useful possibilities that come to my mind. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Switch fs_live_variables to the non-zeroing allocator.Francisco Jerez2013-10-011-1/+1
| | | | | | | | | | | | | | | | All member variables of fs_live_variables are already being initialized from its constructor, it's not necessary to use rzalloc to allocate its memory, and doing so makes it more likely that we will start relying on the allocator to zero out all memory if the class is ever extended with new member variables. That's bad because it ties objects to some specific allocation scheme, and gives unpredictable results when an object is created with a different allocator -- Stack allocation, array allocation, or aggregation inside a different object are some of the useful possibilities that come to my mind. Reviewed-by: Kenneth Graunke <[email protected]>