path: root/src/mesa
Commit message | Author | Age | Files | Lines
* mesa: add plumbing for GL_ARB_texture_query_levels | Chris Forbes | 2013-10-05 | 2 | -0/+2
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* mesa: Don't return any data for GL_SHADER_BINARY_FORMATS | Ian Romanick | 2013-10-04 | 1 | -1/+1
  We return 0 for GL_NUM_SHADER_BINARY_FORMATS, so GL_SHADER_BINARY_FORMATS should not write any data to the application buffer.

  Fixes piglit test 'arb_get_program_binary-overrun shader'.

  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
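A minimal sketch of the invariant this fix restores: the array query must write exactly as many values as the count query reports, which here is none. The names `kBinaryFormats`, `get_num_shader_binary_formats()` and `get_shader_binary_formats()` are hypothetical stand-ins, not Mesa's actual query code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical list of supported shader binary formats. It is empty,
// matching the 0 that GL_NUM_SHADER_BINARY_FORMATS reports.
static const std::vector<int> kBinaryFormats{};

// Stand-in for the GL_NUM_SHADER_BINARY_FORMATS query.
int get_num_shader_binary_formats()
{
   return static_cast<int>(kBinaryFormats.size());
}

// Stand-in for the GL_SHADER_BINARY_FORMATS query: writes exactly as
// many values as the count query reports (nothing at all here), so an
// application buffer sized from that count can never be overrun.
void get_shader_binary_formats(int *params)
{
   for (std::size_t i = 0; i < kBinaryFormats.size(); i++)
      params[i] = kBinaryFormats[i];
}
```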
* i965/fs: Improve accuracy of dFdy() to match dFdx(). | Paul Berry | 2013-10-03 | 2 | -20/+62
  Previously, we computed dFdy() using the following instruction:

    add(8) dst<1>F src<4,4,0>F -src.2<4,4,0>F { align1 1Q }

  That had the disadvantage that it computed the same value for all 4 pixels of a 2x2 subspan, which meant that it was less accurate than dFdx(). This patch changes it to the following instruction when c->key.high_quality_derivatives is set:

    add(8) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1Q }

  This gives it comparable accuracy to dFdx().

  Unfortunately, align16 instructions can't be compressed, so in SIMD16 shaders, instead of emitting this instruction:

    add(16) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1H }

  we need to unroll to two instructions:

    add(8) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1Q }
    add(8) (dst+1)<1>F (src+1)<4,4,1>.xyxyF -(src+1)<4,4,1>.zwzwF { align16 2Q }

  Fixes piglit test spec/glsl-1.10/execution/fs-dfdy-accuracy.

  Acked-by: Chris Forbes <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
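To make the region semantics concrete, here is a sketch that models one 2x2 subspan as four floats in the channel order the register regions assume (0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right). It only follows the operand order of the two `add` instructions; the driver's handling of the vertical sign convention (for example flipping for window-system rendering) is deliberately not modelled, and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>

// One 2x2 subspan: 0 = top-left, 1 = top-right,
// 2 = bottom-left, 3 = bottom-right.
using Subspan = std::array<float, 4>;

// Old path, modelled on src<4,4,0> - src.2<4,4,0>: element 0 minus
// element 2, replicated to all four pixels of the subspan.
Subspan dfdy_coarse(const Subspan &s)
{
   float d = s[0] - s[2];
   return {d, d, d, d};
}

// New path, modelled on src.xyxy - src.zwzw: channels read (0,1,0,1)
// minus (2,3,2,3), so each pixel gets the difference in its own column.
Subspan dfdy_fine(const Subspan &s)
{
   Subspan d;
   for (int i = 0; i < 4; i++)
      d[i] = s[i % 2] - s[2 + i % 2];
   return d;
}
```

With top-row values 1 and 2 and bottom-row values 4 and 8, the coarse version gives -3 everywhere, while the fine version gives -3 for the left column and -6 for the right column.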
* st/mesa: silence warning about unhandled enum in switch statement | Brian Paul | 2013-10-03 | 1 | -0/+3
* mesa: fix make check for ARB_texture_gather | Chris Forbes | 2013-10-03 | 2 | -3/+3
  Clean up inconsistency in enum decoration:
  - Use the undecorated enums where possible.
  - MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB remains decorated, since it has no undecorated equivalent in GL4.

  Signed-off-by: Chris Forbes <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70054
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/hsw: Apply gather4 RG32F w/a using SCS instead of shader. | Chris Forbes | 2013-10-03 | 2 | -8/+11
  The new surface channel select bits allow us to avoid having to recompile the shader for this workaround.

  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-and-tested-by: Kenneth Graunke <[email protected]>
* i965: Enable ARB_texture_gather on Gen7 | Chris Forbes | 2013-10-03 | 2 | -0/+5
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: use gather slots in the binding table for gather4. | Chris Forbes | 2013-10-03 | 2 | -4/+12
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Emit a second set of SURFACE_STATE for gather4 from textures. | Chris Forbes | 2013-10-03 | 3 | -8/+39
  This allows us to use a different surface format for gather4, which is required for R32G32_FLOAT to work on Gen7.

  V4:
  - Only emit alternate surface state for shaders which will actually use it.
  - Pass a simple 'for_gather' flag rather than a function pointer. The callee can decide what w/a to apply.

  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: make room in the binding table for a full alternate set of surface_states | Chris Forbes | 2013-10-03 | 1 | -2/+18
  Worst-case is that *every* texunit uses a format that needs overriding.

  V4: Place the gather slots last, so shaders which don't use gather don't get penalized by having a huge binding table.

  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
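The V4 layout choice can be illustrated with some simple index arithmetic. This is a sketch under assumed names (`BindingTableLayout`, `layout_binding_table()`); it is not Mesa's binding-table code, and it assumes one gather slot per texture unit placed after all other surfaces.

```cpp
#include <cassert>

// Hypothetical binding-table layout: other surfaces first, then one
// texture surface per unit, then (only if the shader uses gather) a
// second full set of per-unit surfaces for gather4.
struct BindingTableLayout {
   unsigned texture_start;
   unsigned gather_start;  // unit i's gather surface is gather_start + i
   unsigned size;          // number of entries actually needed
};

BindingTableLayout layout_binding_table(unsigned num_other_surfaces,
                                        unsigned num_texture_units,
                                        bool uses_gather)
{
   BindingTableLayout l;
   l.texture_start = num_other_surfaces;
   l.gather_start = l.texture_start + num_texture_units;
   // Worst case: every unit needs an alternate SURFACE_STATE, so reserve
   // one gather slot per unit. Because the slots come last, a shader
   // that never gathers simply stops before them and pays nothing.
   l.size = l.gather_start + (uses_gather ? num_texture_units : 0);
   return l;
}
```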
* i965: Add BRW_SURFACEFORMAT_R32G32_FLOAT_LD, required for IVB gather4 w/a | Chris Forbes | 2013-10-03 | 2 | -0/+2
  gather4 GREEN channel against a surface with format R32G32_FLOAT doesn't work correctly on IVB.

  w/a from bspec:
  - use R32G32_FLOAT_LD = 0x97 instead, for gather4 only.
  - select BLUE channel to read GREEN

  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
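A sketch of the two-part workaround as the message describes it. Only the 0x97 value for R32G32_FLOAT_LD comes from the commit; the R32G32_FLOAT encoding, the `Channel` enum and `setup_gather4()` are illustrative assumptions. The sketch also assumes that only GREEN needs remapping; the message does not say how other channels behave with the _LD format.

```cpp
#include <cassert>
#include <cstdint>

// Only the _LD value is taken from the commit message; the plain
// R32G32_FLOAT encoding here is an assumption for illustration.
constexpr uint32_t SURFACEFORMAT_R32G32_FLOAT = 0x85;
constexpr uint32_t SURFACEFORMAT_R32G32_FLOAT_LD = 0x97;

enum Channel { RED = 0, GREEN = 1, BLUE = 2, ALPHA = 3 };

struct GatherSetup {
   uint32_t surface_format;
   Channel channel;
};

// IVB workaround sketch: for gather4 on an R32G32_FLOAT surface, use the
// _LD format and gather BLUE when GREEN was requested.
GatherSetup setup_gather4(uint32_t format, Channel requested, bool is_ivb)
{
   if (is_ivb && format == SURFACEFORMAT_R32G32_FLOAT)
      return {SURFACEFORMAT_R32G32_FLOAT_LD,
              requested == GREEN ? BLUE : requested};
   return {format, requested};
}
```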
* i965: w/a for gather4 green RG32F | Chris Forbes | 2013-10-03 | 4 | -0/+22
  V4: Only flag quirks if there are any uses of gather in the shader, to avoid spurious recompiles just because someone happened to use RG32F.

  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: flag shaders which use gather4 at all | Chris Forbes | 2013-10-03 | 1 | -0/+2
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Add support for ir_tg4 | Chris Forbes | 2013-10-03 | 2 | -2/+45
  Pretty much the same as the FS case. Channel select goes in the header.

  V2: Less mangling.
  V3: Avoid sampling at all, for degenerate swizzles.

  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add support for ir_tg4 | Chris Forbes | 2013-10-03 | 2 | -3/+60
  Lowers ir_tg4 (from textureGather and textureGatherOffset builtins) to SHADER_OPCODE_TG4.

  The usual post-sampling swizzle workaround can't work for ir_tg4, so avoid doing that:
  * For R/G/B/A swizzles use the hardware channel select (lives in the same dword in the header as the texel offset), and then don't do anything afterward in the shader.
  * For 0/1 swizzles blast the appropriate constant over all the output channels instead of sampling.

  V2: Avoid duplicating header enabling block
  V3: Avoid sampling at all, for degenerate swizzles.

  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
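The two swizzle cases above can be expressed as a small planning function. This is a sketch with a hypothetical `Swizzle` encoding and `plan_tg4()` helper, not the i965 backend's types: real-channel swizzles become a channel-select value for the message header, and 0/1 swizzles skip the sample and produce a constant.

```cpp
#include <cassert>

// Hypothetical per-component swizzle encoding.
enum Swizzle { SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_W,
               SWIZZLE_ZERO, SWIZZLE_ONE };

struct Tg4Plan {
   bool sample;          // emit the gather4 message at all?
   int channel_select;   // 0..3, written into the message header
   float constant;       // broadcast to all outputs when !sample
};

// gather4 returns one component from each of four texels, so swizzling
// the result afterwards cannot work. R/G/B/A swizzles become the
// hardware channel select; 0/1 swizzles avoid sampling entirely.
Tg4Plan plan_tg4(Swizzle s)
{
   switch (s) {
   case SWIZZLE_ZERO: return {false, 0, 0.0f};
   case SWIZZLE_ONE:  return {false, 0, 1.0f};
   default:           return {true, static_cast<int>(s), 0.0f};
   }
}
```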
* i965: add SHADER_OPCODE_TG4 | Chris Forbes | 2013-10-03 | 6 | -2/+17
  Adds the Gen7 message IDs, a new SHADER_OPCODE_TG4 pseudo-op, and low-level support for emitting it via generate_tex().

  V3: Updated for changes in master.

  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: add texture gather changes | Maxence Le Dore | 2013-10-03 | 1 | -0/+3
  V2 [Chris Forbes]:
  - Add new pattern, fixup parameter reading.

  V3: Rebase onto new builtins machinery

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* mesa: add texture gather changes | Maxence Le Dore | 2013-10-03 | 6 | -0/+21
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965: fix bogus swizzle in brw_cubemap_normalize | Chris Forbes | 2013-10-03 | 1 | -4/+6
  When used with a cube array in the VS, this triggered a failed assertion in ir_validate:

    Assignment count of LHS write mask channels enabled not matching RHS vector size (3 LHS, 4 RHS).

  To fix this, swizzle the RHS correctly for the writemask.

  This showed up in the ARB_texture_gather tests, which exercise cube arrays in the VS.

  Signed-off-by: Chris Forbes <[email protected]>
  Cc: "9.2" <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: compute DDX in a subspan based only on top row | Chia-I Wu | 2013-10-02 | 7 | -8/+49
  Consider only the top-left and top-right pixels to approximate DDX in a 2x2 subspan, unless the application requests a more accurate approximation via GL_FRAGMENT_SHADER_DERIVATIVE_HINT or this optimization is disabled via the new driconf option disable_derivative_optimization.

  This results in a less accurate approximation. However, it improves the performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at 95.0% confidence) on Haswell. No noticeable image quality difference observed.

  The improvement comes from faster sample_d. It seems that Haswell has some optimizations that allow faster sample_d when all pixels in a subspan have the same derivative. I considered SAMPLE_STATE too, which allows one to control the quality of sample_d on Haswell. But it gave much worse image quality without giving better performance compared to this change.

  No piglit quick.tests regression on Haswell (tested with v1).

  v2: better guess for precompile program key

  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Chris Forbes <[email protected]>
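A sketch of the trade-off, using the same subspan order as the register regions elsewhere in this log (0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right). The selection function is hypothetical; it assumes that "requests a more accurate approximation" means the hint is set to GL_NICEST, and it reduces both conditions to booleans rather than reading real GL or driconf state.

```cpp
#include <array>
#include <cassert>

// Subspan order: 0 = top-left, 1 = top-right,
// 2 = bottom-left, 3 = bottom-right.
using Subspan = std::array<float, 4>;

// Fast path: only the top row is used, so all four pixels share one
// derivative (the case Haswell's sample_d appears to handle faster).
Subspan ddx_top_row(const Subspan &s)
{
   float d = s[1] - s[0];
   return {d, d, d, d};
}

// Accurate path: each row uses its own horizontal difference.
Subspan ddx_per_row(const Subspan &s)
{
   float top = s[1] - s[0];
   float bottom = s[3] - s[2];
   return {top, top, bottom, bottom};
}

// Hypothetical selection: accurate if the app asked for GL_NICEST via
// GL_FRAGMENT_SHADER_DERIVATIVE_HINT, or the driconf option
// disable_derivative_optimization is set.
Subspan ddx(const Subspan &s, bool hint_nicest,
            bool disable_derivative_optimization)
{
   if (hint_nicest || disable_derivative_optimization)
      return ddx_per_row(s);
   return ddx_top_row(s);
}
```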
* i965/blorp: Use passed in framebuffer rather than ctx->DrawBuffer | Chris Forbes | 2013-10-02 | 1 | -4/+4
  We have the destination framebuffer object passed in; there's no need to go digging around in the context.

  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
* st/mesa: Switch glsl_to_tgsi_instruction to the non-zeroing allocator. | Francisco Jerez | 2013-10-01 | 1 | -1/+1
  All member variables of glsl_to_tgsi_instruction are already being initialized from its implicitly defined constructor, so it's not necessary to use rzalloc to allocate its memory.

  Reviewed-by: Kenneth Graunke <[email protected]>
* mesa/program: Switch ir_to_mesa_instruction to the non-zeroing allocator. | Francisco Jerez | 2013-10-01 | 1 | -1/+1
  All member variables of ir_to_mesa_instruction are already being initialized from its implicitly defined constructor, so it's not necessary to use rzalloc to allocate its memory.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Switch vec4_live_variables to the non-zeroing allocator. | Francisco Jerez | 2013-10-01 | 1 | -1/+1
  All member variables of vec4_live_variables are already being initialized from its constructor, so it's not necessary to use rzalloc to allocate its memory, and doing so makes it more likely that we will start relying on the allocator to zero out all memory if the class is ever extended with new member variables. That's bad because it ties objects to some specific allocation scheme, and gives unpredictable results when an object is created with a different allocator -- stack allocation, array allocation, or aggregation inside a different object are some of the useful possibilities that come to my mind.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Switch fs_live_variables to the non-zeroing allocator. | Francisco Jerez | 2013-10-01 | 1 | -1/+1
  All member variables of fs_live_variables are already being initialized from its constructor, so it's not necessary to use rzalloc to allocate its memory, and doing so makes it more likely that we will start relying on the allocator to zero out all memory if the class is ever extended with new member variables. That's bad because it ties objects to some specific allocation scheme, and gives unpredictable results when an object is created with a different allocator -- stack allocation, array allocation, or aggregation inside a different object are some of the useful possibilities that come to my mind.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Switch fs_inst to the non-zeroing allocator. | Francisco Jerez | 2013-10-01 | 1 | -1/+1
  All member variables of fs_inst are already being initialized from its constructor, so it's not necessary to use rzalloc to allocate its memory, and doing so makes it more likely that we will start relying on the allocator to zero out all memory if the class is ever extended with new member variables. That's bad because it ties objects to some specific allocation scheme, and gives unpredictable results when an object is created with a different allocator -- stack allocation, array allocation, or aggregation inside a different object are some of the useful possibilities that come to my mind.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Switch ip_record to the non-zeroing allocator. | Francisco Jerez | 2013-10-01 | 1 | -1/+1
  All member variables of ip_record are already being initialized from its constructor, so it's not necessary to use rzalloc to allocate its memory, and doing so makes it more likely that we will start relying on the allocator to zero out all memory if the class is ever extended with new member variables. That's bad because it ties objects to some specific allocation scheme, and gives unpredictable results when an object is created with a different allocator -- stack allocation, array allocation, or aggregation inside a different object are some of the useful possibilities that come to my mind.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Initialize all member variables of cfg_t on construction. | Francisco Jerez | 2013-10-01 | 2 | -1/+2
  The cfg_t object relies on the memory allocator zeroing out its contents before it's initialized, which is quite an unusual practice in the C++ world because it ties objects to some specific allocation scheme, and gives unpredictable results when an object is created with a different allocator -- stack allocation, array allocation, or aggregation inside a different object are some of the useful possibilities that come to my mind. Initialize all fields from the constructor and stop using the zeroing allocator.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Initialize all member variables of bblock_t on construction. | Francisco Jerez | 2013-10-01 | 2 | -2/+3
  The bblock_t object relies on the memory allocator zeroing out its contents before it's initialized, which is quite an unusual practice in the C++ world because it ties objects to some specific allocation scheme, and gives unpredictable results when an object is created with a different allocator -- stack allocation, array allocation, or aggregation inside a different object are some of the useful possibilities that come to my mind. Initialize all fields from the constructor and stop using the zeroing allocator.

  v2: Use zero initialization for numeric types instead of default construction.

  Reviewed-by: Kenneth Graunke <[email protected]>
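A sketch of the pattern this series moves to: every member is set in the constructor, so the object is valid whether it lives on the stack, in an array, or in memory from a non-zeroing allocator such as ralloc (as opposed to rzalloc). `bblock_sketch` and `construct_in_dirty_memory()` are hypothetical; the 0xAB fill only simulates uninitialized heap memory.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical stand-in for a class like bblock_t. Numeric members are
// explicitly initialized to 0 and pointers to nullptr, so nothing
// depends on the allocator having zeroed the memory first.
struct bblock_sketch {
   bblock_sketch()
      : start_ip(0), end_ip(0), block_num(0), start(nullptr), end(nullptr)
   {
   }

   int start_ip;
   int end_ip;
   int block_num;
   void *start;
   void *end;
};

// Simulates a non-zeroing allocator: fill the storage with garbage, then
// construct in place. The constructor alone must produce a valid object.
bblock_sketch *construct_in_dirty_memory(void *mem)
{
   std::memset(mem, 0xAB, sizeof(bblock_sketch));
   return new (mem) bblock_sketch;
}
```

Because the class has a user-provided constructor, the stack and array cases in the checks below also go through that constructor, which is exactly the independence from the allocation scheme the message asks for.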
* i965: Initialize all member variables of vec4_instruction on construction. | Francisco Jerez | 2013-10-01 | 2 | -1/+16
  The vec4_instruction object relies on the memory allocator zeroing out its contents before it's initialized, which is quite an unusual practice in the C++ world because it ties objects to some specific allocation scheme, and gives unpredictable results when an object is created with a different allocator -- stack allocation, array allocation, or aggregation inside a different object are some of the useful possibilities that come to my mind. Initialize all fields from the constructor and stop using the zeroing allocator.

  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Fix misplaced includes of "main/uniforms.h". | Francisco Jerez | 2013-10-01 | 5 | -5/+4
  Several C++ source files include "main/uniforms.h" from an extern "C" block, which is both unnecessary, because "uniforms.h" already checks for a C++ compiler and sets the right linkage, and incorrect, because the header file includes other C++ headers ("glsl_types.h" and "ir_uniform.h") that are supposed to get C++ linkage.

  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
* i965/gs: Fix incorrect numbering of DWORDs in 3DSTATE_GS | Paul Berry | 2013-10-01 | 1 | -3/+4
  In commit 247f90c77e8f3894e963d796628246ba0bde27b5 (i965/gs: Set control data header size/format appropriately for EndPrimitive()), I incorrectly numbered the DWORDs in the 3DSTATE_GS command starting from 1 instead of starting from 0. This caused the control data format to be programmed into the wrong DWORD, resulting in corruption in some geometry shaders that used an output type of points.

  This patch numbers the DWORDs starting from 0, as we do for all other commands, which causes the control data format to be programmed into the correct DWORD.

  Reviewed-by: Chad Versace <[email protected]>
* mesa: check for bufSize > 0 in _mesa_GetSynciv() | Brian Paul | 2013-10-01 | 1 | -1/+1
  The spec doesn't say GL_INVALID_VALUE should be raised for bufSize <= 0. In any case, memcpy(len < 0) will lead to a crash, so don't allow it.

  CC: "9.2" <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
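A sketch of the guarded copy-out step: a non-positive bufSize writes nothing, so memcpy never receives a length computed from a negative count (which would convert to a huge size_t). `copy_sync_values()` is a hypothetical helper, not the body of `_mesa_GetSynciv()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

// Copy up to bufSize of the `count` query results into `out` and return
// how many were written. Non-positive sizes write nothing, so the
// memcpy length is never derived from a negative value.
int copy_sync_values(const int *values, int count, int bufSize, int *out)
{
   if (bufSize <= 0 || count <= 0)
      return 0;
   int n = std::min(count, bufSize);
   std::memcpy(out, values, n * sizeof(int));
   return n;
}
```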
* mesa: minor fix-ups for _mesa_validate_sync() | Brian Paul | 2013-10-01 | 2 | -4/+13
  Return bool instead of int. Const-qualify the syncObj. Add some comments.

  Reviewed-by: Ian Romanick <[email protected]>
* mesa: add missing error checks in _mesa_GetObject[Ptr]Label() | Brian Paul | 2013-10-01 | 1 | -0/+12
  Error checking bufSize isn't mentioned in the spec, but it is in the man pages. However, I believe the man page is incorrect. Typically, GL functions that take GLsizei parameters check that they're positive or non-negative. Negative values don't make sense here.

  A spec bug has been filed with Khronos/ARB.

  v2: check for negative values, not <= 0.
* mesa: use caller string in error message in get_label_pointer() | Brian Paul | 2013-10-01 | 1 | -1/+1
  Reviewed-by: Timothy Arceri <[email protected]>
* mesa: asst. clean-ups in copy_label() | Brian Paul | 2013-10-01 | 1 | -10/+27
  This incorporates Vinson's change to check for a null src pointer as detected by coverity.

  Also, rename the function params to be src/dst, const-qualify src, and use GL types to match the calling functions. And add some more comments.

  Reviewed-by: Timothy Arceri <[email protected]>
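A sketch of a copy_label()-style helper with the null-src check, assuming glGetObjectLabel semantics: copy at most bufSize - 1 characters, NUL-terminate whenever there is a destination and room, and return the full label length when no destination is given. `copy_label_sketch()` is hypothetical and uses plain C++ types instead of the GL types the commit mentions.

```cpp
#include <cassert>
#include <cstring>

// Returns the number of characters written to dst (excluding the NUL),
// or the full label length if dst is null. A null src is treated as an
// empty label instead of being dereferenced.
int copy_label_sketch(const char *src, char *dst, int bufSize)
{
   int labelLen = 0;
   if (src)
      labelLen = static_cast<int>(std::strlen(src));

   if (dst && bufSize > 0) {
      if (labelLen >= bufSize)
         labelLen = bufSize - 1;   // leave room for the terminator
      if (src)
         std::memcpy(dst, src, labelLen);
      dst[labelLen] = '\0';
   }
   return labelLen;
}
```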
* mesa/drivers: drop HAVE_*_DRI from individual makefiles | Emil Velikov | 2013-10-01 | 6 | -19/+0
  The mesa/drivers/dri/Makefile.am already guards the individual targets/subdirs with HAVE_*_DRI before including them, which makes the additional check within each Makefile.am unnecessary.

  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i915: Fix memory leak in do_blit_readpixels. | Vinson Lee | 2013-09-30 | 1 | -0/+1
  Fixes "Resource leak" defect reported by Coverity.

  Signed-off-by: Vinson Lee <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Reenable glBitmap() after the sRGB winsys enabling. | Eric Anholt | 2013-09-30 | 1 | -1/+2
  The format of the window system framebuffer changed from ARGB8888 to SARGB8, but we're still supposed to render to it the same as ARGB8888 unless the user flipped the GL_FRAMEBUFFER_SRGB switch.

  Reviewed-by: Kenneth Graunke <[email protected]>
  NOTE: This is a candidate for stable branches.
* mesa: Remove all traces of GL_OES_matrix_get | Ian Romanick | 2013-09-30 | 3 | -12/+0
  I believe this extension was enabled by accident. As far as I can tell, there has never been any code in Mesa to actually support it. Not only that, this extension is only useful in the common-lite profile, and Mesa does the common profile.

  This "fixes" the piglit test oes_matrix_get-api.

  Signed-off-by: Ian Romanick <[email protected]>
  Cc: "9.1 9.2" <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: retype destination register for texture SEND instruction to UW. | Paul Berry | 2013-09-30 | 1 | -1/+1
  From the bspec documentation of the SEND instruction:

    "destination region cannot cross the 256-bit register boundary."

  To avoid violating this restriction when executing SIMD16 texturing operations (such as those used by blorp), we need to ensure that the destination of the SEND instruction doesn't exceed 256 bits in size.

  An easy way to do this is to set the type of the destination register to UW (unsigned word), since 16 unsigned words can fit inside a 256-bit register. Fortunately, this has no effect on the sampling operation, since the sampler always infers the destination data type from the sampler message rather than from the type of the instruction operand.

  Previously, we did this for texturing operations issued by the vec4 and fs back-ends, but not for blorp. This patch makes blorp use the same trick.

  I haven't observed any behavioural difference on actual hardware due to this patch, but it avoids a warning from the simulator so it seems like the right thing to do.

  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Acked-by: Chad Versace <[email protected]>
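The size argument is simple arithmetic: a SIMD16 destination spans 16 channels times the element width. The compile-time checks below (with a hypothetical helper name) show why a 32-bit F destination region would cross the 256-bit boundary while a 16-bit UW one fits exactly.

```cpp
#include <cassert>

constexpr unsigned kRegisterBits = 256;

// Size in bits of a destination region of `channels` elements, each
// `type_bits` wide.
constexpr unsigned simd_region_bits(unsigned channels, unsigned type_bits)
{
   return channels * type_bits;
}

// SIMD16 with F (32-bit) elements: 16 * 32 = 512 bits, two registers.
static_assert(simd_region_bits(16, 32) > kRegisterBits,
              "an F destination region crosses the register boundary");

// SIMD16 with UW (16-bit) elements: 16 * 16 = 256 bits, one register.
static_assert(simd_region_bits(16, 16) == kRegisterBits,
              "a UW destination region fits in one register");
```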
* i965: Add a real native TexStorage path. | Eric Anholt | 2013-09-30 | 1 | -0/+63
  We originally had a path that just did the loop and called ctx->Driver.AllocTextureImageBuffer(), which I moved into Mesa core. But we can do better, avoiding incorrect miptree size guesses and later texture validations by just directly allocating the miptree and setting it to all the images.

  v2: drop debug printf.

  Reviewed-by: Chad Versace <[email protected]>
* i965: Add missing license to intel_tex_validate.c. | Eric Anholt | 2013-09-30 | 1 | -0/+23
  I've rewritten a lot of this file.

  Reviewed-by: Chad Versace <[email protected]>
* i965: Always allocate validated miptrees from level 0. | Eric Anholt | 2013-09-30 | 1 | -6/+5
  No change in copies during a piglit run, but it's one less first_level != 0 in our codebase.

  Reviewed-by: Chad Versace <[email protected]>
* i965: Don't relayout a texture just for baselevel changes. | Eric Anholt | 2013-09-30 | 2 | -24/+39
  As long as the baselevel and maxlevel still sit inside the range we had previously validated, there's no need to reallocate the texture.

  I also hope this makes our texture validation logic much more obvious. It's taken me enough tries to write this change, that's for sure.

  Reduces miptree copy count on a piglit run by 1.3%, though the change in amount of data moved is much smaller.

  Reviewed-by: Chad Versace <[email protected]>
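The core of the reuse test is a range-containment check. This sketch covers only that part, under hypothetical names (`MiptreeRange`, `miptree_covers()`); the driver's real validation also has to confirm that the format and level sizes still match, which is not modelled here.

```cpp
#include <cassert>

// Hypothetical, simplified view of the levels a miptree was allocated
// and validated for.
struct MiptreeRange {
   int first_level;
   int last_level;
};

// The existing miptree can be kept as long as the requested
// [base_level, max_level] range still sits inside the allocated range.
bool miptree_covers(const MiptreeRange &mt, int base_level, int max_level)
{
   return base_level >= mt.first_level && max_level <= mt.last_level;
}
```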
* i965: Don't allocate a 1-level texture when GL_GENERATE_MIPMAP is set. | Eric Anholt | 2013-09-30 | 1 | -1/+2
  Given that a teximage that calls us with this flag set will immediately proceed to allocate the other levels, we can probably just go ahead and allocate those levels now.

  Reduces miptree copies in piglit by about 0.05%.

  Reviewed-by: Chad Versace <[email protected]>
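Allocating "those levels now" means sizing for a complete mipmap chain. A sketch of the standard level count, floor(log2(max dimension)) + 1, follows; `full_mip_level_count()` is an illustrative helper, not the function Mesa uses.

```cpp
#include <algorithm>
#include <cassert>

// Number of levels in a complete mipmap chain for a base image of
// width x height x depth: halve the largest dimension until it reaches
// 1, counting the base level too.
int full_mip_level_count(int width, int height, int depth)
{
   int size = std::max(width, std::max(height, depth));
   int levels = 1;
   while (size > 1) {
      size >>= 1;
      levels++;
   }
   return levels;
}
```

For example, a 256x256 base image needs 9 levels (256 down to 1), and a 300x1 image also needs 9, since floor(log2(300)) = 8.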
* i965: Stop allocating miptrees with first_level != 0. | Eric Anholt | 2013-09-30 | 1 | -17/+6
  If the caller shows up with GL_BASE_LEVEL != 0, it doesn't mean that the texture will over the course of its lifetime have that nonzero baselevel; it means that the caller is filling the texture from the bottom up for some reason (one could imagine demand-loading detailed texture layers at runtime, for example). If we allocate from just the current baselevel, it means when they come along with the next level up, we'll have to allocate a new miptree and copy all of our bits out of the first miptree.

  Reviewed-by: Chad Versace <[email protected]>
* i965: Drop a special case for guessing small miptree levels. | Eric Anholt | 2013-09-30 | 1 | -43/+30
  Let's say you started allocating your 2D texture with level 2 of a tree as a 1x1 image. The driver doesn't know if this means that level 0 is 4x4 or 4x1 or 1x4, so we would just allocate a single 1x1 and let it get copied into the real location at texture validate time later.

  Since this is just a temporary allocation that *will* get copied, just taking the normal path (which will happen to produce a 4x1 level 0, 2x1 level 1, and 1x1 level 2) and accepting its extra space allocation is the right way to go, to reduce complexity in the normal case.

  No change in miptree copies over the course of a piglit run.

  Reviewed-by: Chad Versace <[email protected]>
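One way the "normal path" can arrive at the 4x1 / 2x1 / 1x1 result described above is to double the width at every level while doubling height and depth only when they are not already 1. The sketch below assumes that rule because it reproduces the numbers in the message; it is not copied from Mesa, and `guess_level0()` is a hypothetical name.

```cpp
#include <cassert>

struct Dims {
   int width, height, depth;
};

// Guess level-0 dimensions from an image specified at `level`. Width is
// always doubled; height and depth are doubled only when they are not 1,
// since a dimension of 1 may be genuine (e.g. a 1D texture).
Dims guess_level0(Dims d, int level)
{
   for (int i = level; i > 0; i--) {
      d.width <<= 1;
      if (d.height != 1)
         d.height <<= 1;
      if (d.depth != 1)
         d.depth <<= 1;
   }
   return d;
}
```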
* i965: Totally switch around how we handle nonzero baselevel-first_level. | Eric Anholt | 2013-09-30 | 4 | -19/+12
  This has no effect currently, because intel_finalize_mipmap_tree() always makes mt->first_level == tObj->BaseLevel.

  The change I made before to handle it (b1080cfbdb0a084122fcd662cd27b4748c5598fd) got very close to working, but after fixing some unrelated bugs in the series, it still left tex-miplevel-selection producing errors when testing textureLod().

  The problem is that for explicit LODs, the sampler's LOD clamping is ignored, and only the surface's MIP clamping is respected. So we need to use surface mip clamping, which applies on top of the sampler's mip clamping, so the sampler change gets backed out.

  Now actually tested with a non-regressing series producing a non-zero computed baselevel.

  Reviewed-by: Chad Versace <[email protected]>