path: root/src/mesa/drivers
Commit message [Author, Age, Files, Lines]
* mesa: remove ctx->Driver.Error() hook [Brian Paul, 2013-01-29, 1 file, -1/+0]
| | | | | | Not used by any driver anymore. Reviewed-by: Kenneth Graunke <[email protected]>
* osmesa: use _mesa_generate_mipmap() for mipmap generation, not meta [Brian Paul, 2013-01-29, 1 file, -0/+3]
| | | | | | | | See previous commit for more info. Note: This is a candidate for the 9.0 branch. Reviewed-by: José Fonseca <[email protected]>
* xlib: use _mesa_generate_mipmap() for mipmap generation, not meta [Brian Paul, 2013-01-29, 1 file, -0/+3]
| | | | | | | | | | | | | | | | | | The swrast fragment program interpreter has trouble computing the right texture LOD because it doesn't have easy access to input derivatives. This causes the GLSL-based meta generate mipmap code to fetch texels from the wrong mipmap level. One possible fix would be to set the GL_TEXTURE_MIN/MAX_LOD parameters to limit sampling to the right level. But let's just use the _mesa_generate_mipmap() fallback since it's a lot faster than using the fragment shader interpreter. Fixes http://bugs.freedesktop.org/show_bug.cgi?id=54240 Note: This is a candidate for the 9.0 branch. Reviewed-by: José Fonseca <[email protected]>
* xlib: stop using _mesa_enable_extension(), just set the boolean flags [Brian Paul, 2013-01-29, 1 file, -5/+4]
| | | | Reviewed-by: Ian Romanick <[email protected]>
* xlib: fix incorrect GL_ANGLE_texture_compression_dxt enable [Brian Paul, 2013-01-29, 1 file, -1/+2]
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965: Add chipset limits for Haswell GT1/GT2. [Kenneth Graunke, 2013-01-28, 1 file, -1/+17]
| | | | | | | | | | | The maximum number of URB entries comes from the 3DSTATE_URB_VS and 3DSTATE_URB_GS state packet documentation; the thread count information comes from the 3DSTATE_VS and 3DSTATE_PS state packet documentation. NOTE: This is a candidate for the 9.0 branch. Signed-off-by: Kenneth Graunke <[email protected]> Signed-off-by: Eugeni Dodonov <[email protected]>
* intel: Un-hardcode lengths from blitter commands. [Kenneth Graunke, 2013-01-28, 2 files, -7/+7]
| | | | | | | | | The packet length may change at some point in the future. Specifying it explicitly (rather than hardcoding it in the command #define) allows us to change it much more easily in the future. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* intel: Use a CPU map of the batch on LLC-sharing architectures. [Eric Anholt, 2013-01-29, 4 files, -9/+24]
| | | | | | | | | | | | | | | | Before, we were keeping a CPU-only buffer to accumulate the batchbuffer in, which was an improvement over mapping the batch through the GTT directly (since any readback or other failure to stream through write combining correctly would hurt). However, on LLC-sharing architectures we can do better by mapping the batch directly, which reduces the cache footprint of the application since we no longer have this extra copy of a batchbuffer around. Improves performance of GLBenchmark 2.1 offscreen on IVB by 3.5% +/- 0.4% (n=21). Improves Lightsmark performance by 1.1% +/- 0.1% (n=76). Improves cairo-gl performance by 1.9% +/- 1.4% (n=57). No statistically significant difference in GLB2.1 on SNB (n=37). Improves cairo-gl performance by 2.1% +/- 0.1% (n=278).
* i965: Fix assignment instead of comparison in asserts. [Vinson Lee, 2013-01-28, 1 file, -2/+2]
| | | | | | | | Fixes side effect in assertion defects reported by Coverity. Note: This is a candidate for the 9.1 branch. Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Chad Versace <[email protected]>
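The defect class named in this commit can be sketched in C as follows. This is a minimal illustration with hypothetical function names, not the actual i965 code: an `=` inside `assert()` assigns instead of comparing, so the check passes whenever the assigned value is nonzero and the variable is overwritten as a side effect.

```c
#include <assert.h>

/* Hypothetical helper, not taken from the i965 source. */
static int check_mode_buggy(int mode)
{
    /* BUG: '=' assigns 1 to mode, so the assertion can never fail
     * and the caller's value is silently replaced. */
    assert(mode = 1);
    return mode;
}

/* Same hypothetical helper with the comparison written correctly. */
static int check_mode_fixed(int mode)
{
    /* '==' compares and has no side effect. */
    assert(mode == 1);
    return mode;
}
```

Because `assert()` expands to nothing when `NDEBUG` is defined, the buggy version also behaves differently in debug and release builds, which is why static analyzers flag side effects inside assertions.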
* intel: Typo fix: "pitsh" -> "pitch" [Paul Berry, 2013-01-28, 1 file, -1/+1]
| | | | Comment change only.
* i965: Enable ARB_shading_language_packing [Matt Turner, 2013-01-25, 1 file, -0/+1]
| | | | | Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Assert that the 4x8 pack/unpack operations have been lowered [Matt Turner, 2013-01-25, 3 files, -0/+12]
| | | | | Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Lower the 4x8 pack/unpack operations [Matt Turner, 2013-01-25, 1 file, -1/+5]
| | | | | Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Pass in the glarray to get_surface_type. [Eric Anholt, 2013-01-25, 1 file, -29/+22]
| | | | | | | Dereffing all the values in the two callers was just pointless, and the function isn't inlined so there was actual code impact. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove nonsense comment. [Eric Anholt, 2013-01-25, 1 file, -2/+0]
| | | | | | vb.inputs_read has never been a thing, even in the initial import. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove NDEBUG undef that was snuck in. [Eric Anholt, 2013-01-25, 1 file, -2/+0]
| | | | | | If you want debug, set --enable-debug in your config flags. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: reuse _mesa_sizeof_type for index buffer types. [Eric Anholt, 2013-01-25, 1 file, -24/+2]
| | | | | | | The core Mesa code has just one more case than this (GL_BITMAP), so I don't see any cause to special-case it. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Reuse precalculated ib_type_size value. [Eric Anholt, 2013-01-25, 1 file, -1/+1]
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Drop debug check for knowing the size of a type. [Eric Anholt, 2013-01-25, 1 file, -2/+1]
| | | | | | | | This was added in b93684f5f311f89c965960ab42bfea71a397b180, but there's no need for it -- get_size has to succeed, and it has an assert for us in debug builds. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Stop worrying about alignment of vertex data. [Eric Anholt, 2013-01-25, 1 file, -7/+1]
| | | | | | | | | For our current types, the required alignment is actually just 1 byte. When we get doubles, we have to worry (those have to be aligned to the natural size), but we don't have doubles yet and they'll just be a special case. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use the glarray _ElementSize that Mesa tracks for us. [Eric Anholt, 2013-01-25, 2 files, -8/+4]
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Add ir_variable::is_in_uniform_block predicate [Ian Romanick, 2013-01-25, 2 files, -2/+2]
| | | | | | | | | The way a variable is tested for this property is about to change, and this makes the code easier to modify. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Carl Worth <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Add GLSL_TYPE_INTERFACE [Ian Romanick, 2013-01-25, 4 files, -0/+4]
| | | | | | | | | | | | | | | | | | | | | Interfaces are structurally identical to structures from the compiler's point of view. They have some additional restrictions, and generally GPUs use different instructions to access them. Using a different base type should make this a bit easier. This commit also adds the glsl_type::interface_packing fields. For GLSL_TYPE_INTERFACE types, this will track the specified packing mode. It is analogous to gl_uniform_buffer::_Packing. v2: Add several missing GLSL_TYPE_INTERFACE cases in switch-statements. v3: Add information about glsl_type::interface_packing. Move row_major checking in glsl_type::record_key_compare from this patch to the previous patch. Both suggested by Paul Berry. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Replace most default cases in switches on GLSL type [Ian Romanick, 2013-01-25, 4 files, -7/+17]
| | | | | | | | | | | | | | | This makes it easier to find switch-statements that need to be updated after a new GLSL_TYPE_* is added because the compiler will generate a warning. Switch-statements that only had a small number of cases (e.g., everything in ir_constant_expression.cpp) were not modified. I may regret that decision when we eventually add support for doubles. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Carl Worth <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
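The technique described above can be sketched in C. The enum and function below are hypothetical stand-ins, not the actual GLSL compiler types: when a switch over an enum has no `default` label, GCC and Clang can report any enumerator that is not handled (`-Wswitch`, enabled by `-Wall`), so adding a new type produces a warning at every switch that needs updating.

```c
/* Hypothetical stand-in for the GLSL base type enum; the real list is longer. */
enum base_type {
    TYPE_UINT,
    TYPE_INT,
    TYPE_FLOAT,
    TYPE_BOOL,
    TYPE_STRUCT,
    TYPE_INTERFACE
};

/* No default label: if a new enumerator is added to base_type and not
 * handled here, the compiler can warn about the missing case. */
static int is_aggregate(enum base_type t)
{
    switch (t) {
    case TYPE_UINT:
    case TYPE_INT:
    case TYPE_FLOAT:
    case TYPE_BOOL:
        return 0;
    case TYPE_STRUCT:
    case TYPE_INTERFACE:
        return 1;
    }
    return 0; /* only reached for values outside the enum */
}
```

With a `default:` label in place, the same omission would compile silently, which is the situation the commit is trying to avoid.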
* i965: Correct gen6+ guardband calculation. [Eric Anholt, 2013-01-25, 2 files, -9/+21]
| | | | | | | | | | | Too much attention was paid to the first paragraphs, and not enough to the last little note that "oh, by the way, the rendered things themselves still have to be clipped to just 8192 wide/high". Fixes GTF's clip.c test with 4096 or higher width on ivb, where one of the triangles got the upper half of its pixels dropped. Tested-by: Ian Romanick <[email protected]>
* i965: Use GL_RED for DEPTH_TEXTURE_MODE in ES 3.0 for sized formats. [Kenneth Graunke, 2013-01-25, 4 files, -7/+21]
| | | | | | | | | | | | | | | | | | | | | Khronos has apparently decided that depth textures with sized formats (allowed with ARB_internalformat_query or ES 3.0) should be treated as GL_RED, while unsized formats (an existing feature) should be treated as GL_INTENSITY for compatibility with ES 2.0. Ian is proposing changes to ARB_internalformat_query which will make this actually legal and consistent. A similar problem exists with GL 4.2, but we're going to ignore that for the time being. Tested on Ivybridge: no Piglit regressions; fixes 4 es3conform tests: - depth_texture_fbo - depth_texture_fbo_clear - depth_texture_teximage - depth_texture_texsubimage Reviewed-by: Ian Romanick <[email protected]>
* i965: Bump maximum supported ES2 context version to 3.0 [Chad Versace, 2013-01-25, 1 file, -1/+1]
| | | | | | | | Since patch "i965: Validate requested GLES context version in brwCreateContext", we have been unable to create ES 3.0 contexts due to the max version check. So...bump the max version. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965/Gen6+: Enable ARB_ES3_compatibility extension [Paul Berry, 2013-01-25, 1 file, -0/+1]
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965/fs/gen7: Fix fatal typo in unpackHalf2x16 [Chad Versace, 2013-01-24, 1 file, -1/+1]
| | | | | | | | | s/src/src_w/ That little typo, which sneaked into v4 of the previous patch, generates incorrect fs code. Signed-off-by: Chad Versace <[email protected]>
* i965/fs/gen7: Emit code for GLSL 3.00 pack/unpack operations (v4) [Chad Versace, 2013-01-24, 5 files, -3/+144]
| | | | | | | | | | | v2: Remove lewd comment. [for idr] v3: - Optimize away tmp register for packHalf2x16. [for anholt, paul] - Improve comments. [for anholt, paul] - Reduce near-duplicate code by removing vec4_visitor emit_pack/unpack methods. [for chadv] v4: Factor out UD/W register conversion into helper function. [for anholt] Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]> (v2) Signed-off-by: Chad Versace <[email protected]>
* i965/vs/gen7: Emit code for GLSL ES 3.00 pack/unpack operations (v3) [Chad Versace, 2013-01-24, 3 files, -0/+146]
| | | | | | | | | | | | | | FIXME: This patch emits VS code that violates documented hardware restrictions and then relies on undocumented behavior that results from that violation. This patch passes all tests, but should be fixed ASAP to conform to the hardware documentation. v2: Explain undocumented hardware behavior. Improve comments. v3: Use ALU1 helper methods F32TO16() and F16TO32(). [for anholt] Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]> (v1) Signed-off-by: Chad Versace <[email protected]>
* i965: Quote the PRM on a HorzStride subtlety [Chad Versace, 2013-01-24, 1 file, -1/+4]
| | | | | Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Add opcodes for F32TO16 and F16TO32 [Chad Versace, 2013-01-24, 4 files, -0/+8]
| | | | | | | | | | | | | The GLSL ES 3.00 operations packHalf2x16 and unpackHalf2x16 will emit these opcodes. - Define the opcodes BRW_OPCODE_{F32TO16,F16TO32}. - Add the opcodes to the brw_disasm table. - Define convenience functions brw_{F32TO16,F16TO32}. Reviewed-by: Ian Romanick <[email protected]> Acked-by: Paul Berry <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Lower the GLSL ES 3.00 pack/unpack operations (v2) [Chad Versace, 2013-01-24, 1 file, -0/+32]
| | | | | | | | | | | | | | | On gen < 7, we fully lower all operations to arithmetic and bitwise operations. On gen >= 7, we fully lower the Snorm2x16 and Unorm2x16 operations, and partially lower the Half2x16 operations. v2: - Comment that scalarization is needed only for SOA code [for idr]. - Replace switch-statement with if-statement [for idr]. - Remove misplaced hunk from previous patch [found by idr]. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Chad Versace <[email protected]>
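As a rough illustration of what lowering a pack operation to arithmetic and bitwise operations involves, here is a C sketch of the arithmetic behind `packUnorm2x16` as GLSL ES 3.00 defines it: clamp each component to [0, 1], scale by 65535, round, then place the first component in the low 16 bits and the second in the high 16 bits. The function names are hypothetical, the tie-breaking of the rounding step is an assumption, and this is not the IR that the lowering pass itself emits.

```c
#include <stdint.h>

/* Clamp to [0, 1], scale to 16 bits and round to nearest by adding 0.5
 * before truncation (valid here because the value is non-negative). */
static uint32_t unorm16_from_float(float v)
{
    if (v < 0.0f)
        v = 0.0f;
    if (v > 1.0f)
        v = 1.0f;
    return (uint32_t)(v * 65535.0f + 0.5f);
}

/* Sketch of packUnorm2x16: x goes to the low 16 bits, y to the high 16 bits. */
static uint32_t pack_unorm_2x16(float x, float y)
{
    return unorm16_from_float(x) | (unorm16_from_float(y) << 16);
}
```

Every step is a compare, multiply, add, conversion, shift, or OR, which is why hardware without a dedicated instruction can still implement the operation.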
* i965/disasm: Fix horizontal stride of dest registers [Chad Versace, 2013-01-24, 1 file, -3/+6]
| | | | | | | | | | The bug: The printed horizontal stride was the numerical value of the BRW_HORIZONTAL_$N enum. The fix: Translate the enum before printing. Note: This is a candidate for the stable releases. Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Chad Versace <[email protected]>
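The bug and fix can be illustrated with a small C sketch. It assumes the stride field uses encodings 0, 1, 2, 3 for strides of 0, 1, 2, and 4 elements, as the BRW_HORIZONTAL_$N naming suggests; the enum and function names below are hypothetical and are not the driver's own definitions.

```c
/* Hypothetical mirror of the encoded stride values (an assumption,
 * not copied from the driver headers). */
enum horiz_stride_enc {
    HSTRIDE_ENC_0 = 0, /* stride of 0 elements */
    HSTRIDE_ENC_1 = 1, /* stride of 1 element  */
    HSTRIDE_ENC_2 = 2, /* stride of 2 elements */
    HSTRIDE_ENC_4 = 3  /* stride of 4 elements */
};

/* Translate the encoded field into the stride a reader expects to see.
 * Printing 'enc' directly would show 3 for a stride of 4. */
static unsigned decode_horiz_stride(unsigned enc)
{
    return enc == 0 ? 0u : 1u << (enc - 1);
}
```

Under that assumption, only the last encoding prints differently once translated, which matches a bug that would be easy to overlook in disassembly output.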
* intel: Fix glCopyTexSubImage on buffers whose width >= 32kbytes [Paul Berry, 2013-01-24, 1 file, -0/+21]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When possible, glCopyTexSubImage calls are performed using the hardware blitter. However, according to the Ivy Bridge PRM, Vol1 Part4, section 1.2.1.2 (Graphics Data Size Limitations): The BLT engine is capable of transferring very large quantities of graphics data. Any graphics data read from and written to the destination is permitted to represent a number of pixels that occupies up to 65,536 scan lines and up to 32,768 bytes per scan line at the destination. The maximum number of pixels that may be represented per scan line’s worth of graphics data depends on the color depth. With an RGBA32F color buffer (which has 16 bytes per pixel) this imposes a maximum width of 2048 pixels. Other pixel formats have accordingly larger limits. To make matters worse, if the pitch of the buffer is 32k or greater, intel_copy_texsubimage's call to intelEmitCopyBlit will overflow intelEmitCopyBlit's src_pitch and dst_pitch parameters (which are 16-bit signed integers). We can conveniently avoid both problems by avoiding use of the blitter when the miptree's pitch is >= 32k. Fixes gles3conform "framebuffer_blit_functionality_magnifying_blit" tests when the buffer width is equal to 8192. Note: this is very similar to the recent patch "intel: Fix ReadPixels on buffers whose width >= 32kbytes" except that it applies to glCopyTexSubImage instead of glReadPixels. In a future patch it would be nice to refactor the code so that (a) overflow is avoided, and (b) intelEmitCopyBlit is responsible for checking whether the blitter can handle the width, so that all callers of intelEmitCopyBlit work properly, rather than just these two. Reviewed-by: Kenneth Graunke <[email protected]>
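The arithmetic in this message can be checked with a short C sketch. The helper names are hypothetical and the constant simply restates the 32,768-byte scan line limit quoted from the PRM; this is not the code added by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scan line limit quoted in the commit message (32,768 bytes). */
#define BLT_MAX_PITCH_BYTES 32768u

/* Hypothetical check: a pitch of 32768 or more neither fits in a signed
 * 16-bit parameter (INT16_MAX is 32767) nor is accepted by this rule. */
static bool pitch_fits_blitter(uint32_t pitch_bytes)
{
    return pitch_bytes < BLT_MAX_PITCH_BYTES;
}

/* Widest scan line, in pixels, that stays within the byte limit. */
static uint32_t max_blit_width_pixels(uint32_t bytes_per_pixel)
{
    if (bytes_per_pixel == 0)
        return 0;
    return BLT_MAX_PITCH_BYTES / bytes_per_pixel;
}
```

For RGBA32F at 16 bytes per pixel this gives 2048 pixels, and a 4-byte-per-pixel buffer that is 8192 pixels wide has a pitch of exactly 32768 bytes, which is the case the fixed conformance tests exercise.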
* glsl: Eliminate ambiguity between function ins/outs and shader ins/outs [Paul Berry, 2013-01-24, 4 files, -9/+11]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch replaces the three ir_variable_mode enums: - ir_var_in - ir_var_out - ir_var_inout with the following five: - ir_var_shader_in - ir_var_shader_out - ir_var_function_in - ir_var_function_out - ir_var_function_inout This eliminates a frustrating ambiguity: it used to be impossible to tell whether an ir_var_{in,out} variable was a shader in/out or a function in/out without seeing where the variable was declared in the IR. This complicated some optimization and lowering passes, and would have become a problem for implementing varying structs. In the lisp-style serialization of GLSL IR to strings performed by ir_print_visitor.cpp and ir_reader.cpp, I've retained the names "in", "out", and "inout" for function parameters, to avoid introducing code churn to the src/glsl/builtins/ir/ directory. Note: a couple of comments in the code seemed to indicate that we were planning for a possible future in which geometry shaders could have shader-scope inout variables. Our GLSL grammar rejects shader-scope inout variables, and I've been unable to find any evidence in the GLSL standards documents (or extensions) that this will ever be allowed, so I've eliminated these comments. Reviewed-by: Carl Worth <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
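A sketch of why the split helps, written in C for brevity. The enum lists only the five modes named above (the real `ir_variable_mode` has other entries that are omitted here, and the ordering is an assumption), and the helper is hypothetical: with distinct enumerators, a pass can classify a variable from its mode alone instead of looking up where it was declared.

```c
#include <stdbool.h>

/* Only the five modes named in the commit message; other entries of the
 * real enum are omitted and the ordering here is an assumption. */
enum ir_variable_mode {
    ir_var_shader_in,
    ir_var_shader_out,
    ir_var_function_in,
    ir_var_function_out,
    ir_var_function_inout
};

/* Hypothetical helper: true for shader-level inputs and outputs. */
static bool is_shader_interface(enum ir_variable_mode mode)
{
    return mode == ir_var_shader_in || mode == ir_var_shader_out;
}
```

With the old three-value enum, the same question could not be answered without inspecting the surrounding IR.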
* i965/vs: Do headerless texturing for texelFetchOffset(). [Kenneth Graunke, 2013-01-24, 1 file, -2/+4]
| | | | | | | | | For texelFetchOffset(), we just add the texel offsets to the coordinate rather than using the message header's offset fields. So we don't actually need a header on Gen5+. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* intel: Fix ReadPixels on buffers whose width >= 32kbytes [Paul Berry, 2013-01-24, 1 file, -4/+24]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When possible, glReadPixels calls are performed using the hardware blitter. However, according to the Ivy Bridge PRM, Vol1 Part4, section 1.2.1.2 (Graphics Data Size Limitations): The BLT engine is capable of transferring very large quantities of graphics data. Any graphics data read from and written to the destination is permitted to represent a number of pixels that occupies up to 65,536 scan lines and up to 32,768 bytes per scan line at the destination. The maximum number of pixels that may be represented per scan line’s worth of graphics data depends on the color depth. With an RGBA32F color buffer (which has 16 bytes per pixel) this imposes a maximum width of 2048 pixels. To make matters worse, if the pitch of the buffer is 32k or greater, intel_miptree_map_blit's call to intelEmitCopyBlit will overflow intelEmitCopyBlit's src_pitch and dst_pitch parameters (which are 16-bit signed integers). We can conveniently avoid both problems by avoiding the readpixels blit path when the miptree's pitch is >= 32k. Fixes gles3conform "half_float" tests when the buffer width is greater than 2048. Reviewed-by: Eric Anholt <[email protected]> Tested-by: Ian Romanick <[email protected]>
* intel: callocing a 32 byte temp is silly, so don't [Ian Romanick, 2013-01-24, 1 file, -3/+3]
| | | | | | | | | I believe that the size used to vary, so the dynamic allocation was necessary. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel: Enable S3TC extensions always [Ian Romanick, 2013-01-23, 1 file, -6/+4]
| | | | | | | | | | | | | | | | | | | | | Always enable the use of pre-compressed texture data. The ability to perform on-line compression still requires the presence of libtxc_dxtn or an explicit driconf over-ride. Applications that just want to submit precompressed data when an on-line compressor is not available can look for the GL_EXT_texture_compression_dxt1 and GL_ANGLE_texture_compression_dxt[35] extensions. v2: Only enable the extensions that do not require on-line compression by default. The previous statement "This should not impact many (if any) real applications." proved to be false for at least Sauerbraten. This application mostly submits pre-compressed data, but it also can submit uncompressed data that it asks the driver to compress. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]> [v1] Reviewed-by: Kenneth Graunke <[email protected]> [v1] Acked-by: Eric Anholt <[email protected]> [v1] Acked-by: Lee Salzman <[email protected]>
* mesa: Use a single flag for the S3TC extensions that don't require on-line compression [Ian Romanick, 2013-01-23, 6 files, -6/+7]
| | | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Acked-by: Lee Salzman <[email protected]>
* i965: Use swizzles to force R, G, and B to 0.0 for ALPHA textures. [Carl Worth, 2013-01-23, 1 file, -3/+10]
| | | | | | | | | | | | | | | | | Similar to the previous commit, we may be using a texture with actual RGBA storage for the GL_ALPHA format, so force the color values to 0.0. This commit fixes the following piglit (sub) tests: EXT_texture_snorm/fbo-blending-formats GL_ALPHA16_SNORM GL_ALPHA8_SNORM GL_ALPHA_SNORM Note: Haswell bypasses this swizzle code, so may require an independent fix for this bug. Reviewed-by: Eric Anholt <[email protected]>
* i965: Use swizzles to force alpha to 1.0 for RED, RG, or RGB textures. [Carl Worth, 2013-01-23, 1 file, -0/+12]
| | | | | | | | | | | | | | | | | | | | | | | | | | We may be using a texture with actual RGBA storage for these formats, so force the alpha value read to 1.0. This commit fixes the following piglit (sub) tests: ARB_texture_float/fbo-blending-formats GL_RGB16F_ARB EXT_framebuffer_object/fbo-blending-formats GL_RGB10 GL_RGB12 GL_RGB16 EXT_texture_snorm/fbo-blending-formats GL_RGB16_SNORM GL_RGB8_SNORM GL_RGB_SNORM These test improvements depend on the previous commit as well. That commit smashes alpha to 1.0 for the case of ReadPixels (so fixes "FBO testing" as reported by this test), while this commit smashes alpha to 1.0 for the case of texturing (fixes the "window testing" as reported by this test). Note: Haswell bypasses this swizzle code, so may require an independent fix for this bug. Reviewed-by: Eric Anholt <[email protected]>
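The idea of forcing a channel through a swizzle can be sketched in C. The selector names and the functions below are hypothetical and do not reflect how the driver programs the hardware: each output channel picks one of the four stored channels or a constant 0.0 or 1.0, so an RGB texture kept in RGBA storage can return alpha 1.0 regardless of what the fourth channel holds.

```c
/* Hypothetical swizzle selectors; names and values are illustrative only. */
enum swz { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_ZERO, SWZ_ONE };

/* Return the value one selector chooses from a stored RGBA texel. */
static float select_channel(const float texel[4], enum swz s)
{
    switch (s) {
    case SWZ_X:    return texel[0];
    case SWZ_Y:    return texel[1];
    case SWZ_Z:    return texel[2];
    case SWZ_W:    return texel[3];
    case SWZ_ZERO: return 0.0f;
    case SWZ_ONE:  return 1.0f;
    }
    return 0.0f; /* only reached for values outside the enum */
}

/* Apply a four-component swizzle to a texel. */
static void apply_swizzle(const float in[4], const enum swz swz[4], float out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = select_channel(in, swz[i]);
}
```

In this sketch an RGB format would use (X, Y, Z, ONE), while the GL_ALPHA case from the neighbouring commit would use (ZERO, ZERO, ZERO, W).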
* i965: Examine _BaseFormat when deciding to perform xRGB_alpha fixups [Carl Worth, 2013-01-23, 1 file, -1/+2]
| | | | | | | | | | | | | | | | | | | | | The renderbuffer's Format field may have an alpha channel even when the underlying _BaseFormat does not. This can happen when mesa chooses to use RGBA16 for an RGB16 format, for example. So look at _BaseFormat when deciding whether to fixup the blend factors. This commit improves the results of at least the following piglit tests: EXT_framebuffer_object/fbo-blending-formats {GL_RGB10, GL_RGB12, GL_RGB16} EXT_texture_snorm/fbo-blending-formats {GL_RGB16_SNORM, GL_RGB8_SNORM, GL_RGB_SNORM} But none of these actually change from FAIL to PASS yet. The R, G, and B probe values are fixed with this commit, but the tests still fail because the alpha values are still wrong. Reviewed-by: Eric Anholt <[email protected]>
* wmesa: include api_exec.h to fix compilation [Brian Paul, 2013-01-22, 1 file, -0/+1]
|
* i965: Implement the GL_ARB_base_instance extension. [Kenneth Graunke, 2013-01-22, 2 files, -2/+4]
| | | | | | | | | | | | | Thanks to Fredrik Höglund, all the hard work was already done. Tested using a modified oglconform (that actually runs these tests on our driver); it looks like there may be some bugs when using client arrays. All applicable non-compatibility tests passed. For now, only enable it in core profiles. Reviewed-by: Eric Anholt <[email protected]> Tested-by: Ian Romanick <[email protected]>
* mesa: Make the drivers call a non-code-generated dispatch table setup. [Eric Anholt, 2013-01-21, 10 files, -10/+10]
| | | | | | | I want to drive the Save dispatch table setup from this same function. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: Remove the dead PrepareExecBegin() driver hook. [Eric Anholt, 2013-01-21, 1 file, -1/+0]
| | | | | | | This was used in i965 for a while, but no more. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* scons: Fix dependencies of generated headers. [José Fonseca, 2013-01-21, 2 files, -5/+2]
| | | | | | | | | | | | | | It appears that scons implicit dependency scanners fail to chain dependencies of generated headers when these are outside the build tree. This patch ensures generated source files are _always_ put in the build tree. I'm not 100% sure this will fix all dependency issues, but from my experiments it does seem to fix this. NOTE: For this to be effective it is necessary to clean the source tree from generated header/source files. Reviewed-by: Brian Paul <[email protected]>