path: root/src/mesa
Commit message (Collapse)AuthorAgeFilesLines
* i965: Always reserve clip distance VUE slots in SSO mode.Kenneth Graunke2016-11-231-0/+13
| | | | | | | | | | | | | | This fixes rendering in Dolphin on Vulkan since we enabled clip distances. (Dolphin on GL has a similar bug because the linker fails to eliminate unused clip distance built-in arrays, but it isn't using SSO...so that needs more fixing.) Also fixes a Piglit test: spec/glsl-1.50/execution/geometry.clip-distance-vs-gs-out-sso Signed-off-by: Kenneth Graunke <[email protected]> Tested-by: Emmanuel Gil Peyrot <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Use 3DSTATE_CLIP's User Clip Distance Enable bitmask on Gen8+.Kenneth Graunke2016-11-235-18/+17
| | | | | | | | | | | | | | | | | | | | | | | | Gen6-7.5 specify the user clip distance enable bitmask in 3DSTATE_CLIP. Gen8+ normally uses the new internal signalling mechanism to select the one specified in the last enabled shader stage (3DSTATE_VS, DS, or GS). This is a pretty good fit for Vulkan, or even newer GL, where the bitmask comes entirely from the shader. But with glClipPlane(), this is dynamic state, and we have to listen to _NEW_TRANSFORM. Clip plane enables are the only reason the VS/DS/GS atoms need to listen to _NEW_TRANSFORM. 3DSTATE_CLIP already has to listen to it in order to support ARB_clip_control settings. Setting the "Use the 3DSTATE_CLIP bitmask" force enable bit allows us to drop _NEW_TRANSFORM from all the shader stage atoms, so we can re-emit them less often. Improves performance of OglBatch7 (version 6) by 2.70773% +/- 0.491257% (n = 38) at 1024x768 on Cherryview. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/gen7: Only advertise 4 samples for RGBA32F on GLESJordan Justen2016-11-231-3/+19
| | | | | | | | | | | We can't render to 8x MSAA if the width is greater than 64 bits. (see brw_render_target_supported) Fixes ES31-CTS.sample_variables.mask.rgba32f.samples_8.mask_* Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: Restructure fast clear eligibility decisionBen Widawsky2016-11-231-14/+37
| | | | | | | | | v2 (Jason): - Use PRM citation for SKL now that it is available - Also return false for gen < 8 mipmapped/arrayed Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Set initial msaa fast clear status explicitlyTopi Pohjolainen2016-11-231-1/+1
| | | | | | | | | | | instead of in intel_miptree_init_mcs(). For lossless compression the status is immediately overwritten in intel_miptree_alloc_non_msrt_mcs() while the status for non-compressed non-msaa miptrees is explicitly set in do_blorp_clear(). Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Declare read-only input to level/layer check constTopi Pohjolainen2016-11-231-1/+1
| | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fbo: Prepare layer multiplier for render buffer compressionTopi Pohjolainen2016-11-231-1/+1
| | | | | | | | This path is not yet taken for fast cleared or compressed buffers but later patches will enable it. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Add multi-slice getter for resolve mapsTopi Pohjolainen2016-11-232-7/+27
| | | | | | | This is useful when checking if any slice is in unresolved state. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/meta: Split conversion of color and setting itTopi Pohjolainen2016-11-233-19/+36
| | | | | | | | | And fix a mangled comment while at it. v2 (Ben): Return the converted color. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/miptree: Don't shrink textures when augmenting for more levelsTopi Pohjolainen2016-11-231-4/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was detected when examining CCS_E failures with the piglit test "fbo-generatemipmap-formats". The test creates a 2D texture with dimensions 293x277. It manually loops over all levels and calls glTexImage2D(). Level one triggers creation of the full miptree: intel_alloc_texture_image_buffer() realizes that there is only one level in the miptree and calls intel_miptree_create_for_teximage() to re-allocate the miptree with all 9 levels. However, the end result is a miptree with level zero dimensions of 292x276. Related, and possibly calling for treatment of its own, is mip-map generation: after calling glTexImage2D() against every level, the test continues by replacing the content of levels one to eight with data derived from level zero by calling glGenerateMipmapEXT(). This results in the miptree being allocated anew for every level: mip-map generation goes through meta, which ends up validating the texture (brw_validate_textures()->intel_finalize_mipmap_tree()-> intel_miptree_match_image()) where one finds a texture with base level size 292x276. This results in a new miptree being created for the NPOT size 293x277. Only here intel_finalize_mipmap_tree() is asked for only one level, and therefore only one is created. Generation for level one in turn finds the right base level size but only one level when two are needed. And the same goes on for all eight levels. This patch prevents the shrink, maintaining the NPOT size of 293x277. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
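The shrink described above is consistent with simple integer arithmetic on mip level sizes. The following is a minimal sketch, not the driver's code: `minify`, `naive_base_from_level` and `full_mip_levels` are hypothetical helpers written for illustration, and the assumption is that the base size was guessed from level one by doubling.

```c
#include <assert.h>

/* Size of one dimension at a given mip level: halve (rounding down) once
 * per level, never dropping below 1. */
static unsigned minify(unsigned base, unsigned level)
{
   unsigned size = base >> level;
   return size > 0 ? size : 1;
}

/* Hypothetical helper: guess the level-zero size from a level-N size by
 * doubling. The rounding done by minify() cannot be undone, so odd base
 * sizes come back one texel too small. */
static unsigned naive_base_from_level(unsigned level_size, unsigned level)
{
   return level_size << level;
}

/* Number of levels in a full mip chain for a width x height 2D texture. */
static unsigned full_mip_levels(unsigned width, unsigned height)
{
   unsigned largest = width > height ? width : height;
   unsigned levels = 1;

   assert(largest > 0);
   while (largest > 1) {
      largest >>= 1;
      levels++;
   }
   return levels;
}
```

With the 293x277 texture from the test, level one is 146x138, and doubling that gives 292x276, which matches the shrunken base size reported in the commit message. The full chain for 293x277 has 9 levels.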
* main/getteximage: Use the height argument to calculate memcpy copy sizeEduardo Lima Mitev2016-11-231-1/+1
| | | | | | | | | | | | | | | | | | In get_tex_memcpy, when copying texture data directly from source to destination (when row strides match for both src and dst), the copy size is currently calculated using the full texture height instead of the sub-region height parameter that was passed. This can cause a read past the end of the mapped buffer when y-offset is greater than zero, leading to a segfault. Fixes CTS test (from crash to pass): * GL45-CTS/get_texture_sub_image/functional_test v2: (Jason) Use the passed 'height' instead of copying until the end of the buffer (tex-height - yoffset). Reviewed-by: Jason Ekstrand <[email protected]>
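The size calculation can be illustrated with a small stand-alone sketch. This is not the Mesa function: `copy_sub_rows` and its parameters are hypothetical, and it assumes tightly packed rows with the same stride in source and destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the direct-copy path. Copies `height` rows
 * starting at row `yoffset` and returns the number of bytes copied. */
static size_t copy_sub_rows(unsigned char *dst, const unsigned char *src,
                            size_t row_stride, size_t tex_height,
                            size_t yoffset, size_t height)
{
   /* The requested sub-region must lie inside the texture. */
   assert(yoffset + height <= tex_height);

   /* Size the copy from the sub-region height. Using tex_height here
    * instead would read (yoffset * row_stride) bytes past the end of the
    * source whenever yoffset > 0. */
   size_t bytes = row_stride * height;

   memcpy(dst, src + yoffset * row_stride, bytes);
   return bytes;
}
```

For a 4-row texture with a 4-byte stride, requesting rows 2 and 3 copies 8 bytes starting at byte 8. Sizing the copy from the full height would ask for 16 bytes from the same starting point and run 8 bytes past the buffer.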
* i965/fs: Implement load_layer_id for fragment shadersJason Ekstrand2016-11-221-0/+5
| | | | Reviewed-by: Jordan Justen <[email protected]>
* compiler: Add the rest of the subpassInput typesJason Ekstrand2016-11-221-0/+1
| | | | | | | There are actually 6 of them according to the GL_KHR_vulkan_glsl spec. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* mesa: use special checksums for unset checksums and fixed-func shadersMarek Olšák2016-11-222-0/+6
| | | | | | for debugging Reviewed-by: Timothy Arceri <[email protected]>
* glsl: add gl_linked_shader::SourceChecksumMarek Olšák2016-11-223-1/+17
| | | | | | | | for debugging v2: wrap all checksums in #ifdef DEBUG Reviewed-by: Timothy Arceri <[email protected]>
* mesa: use util_hash_crc32 instead of _mesa_str_checksumMarek Olšák2016-11-223-26/+2
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* i965/compiler: Disable trig workarounds on KBL+Jason Ekstrand2016-11-222-4/+8
| | | | | | | | | | The precision of our trig instructions appears to have been fixed on Kaby Lake. Neither Ben nor I can find any documentation for this. However, the dEQP precision tests now pass with INTEL_PRECISE_TRIG=0 where they fail on Sky Lake. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa/glsl: remove unused uses_builtin_functions fieldTimothy Arceri2016-11-232-2/+0
| | | | | | This has been unused since 943b69cddd Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* i965: Use NIR-based clip/cull lowering for OpenGL as well.Kenneth Graunke2016-11-222-1/+2
| | | | | | | | | | | | The old approach works fine, and this approach isn't necessarily better. But it at least has the advantage that Vulkan and GL use the same approach. I originally wrote it to gain additional testing for the new paths. shader-db statistics show 0 instruction count changes. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Handle component qualifiers on non-generic varyings.Kenneth Graunke2016-11-225-73/+53
| | | | | | | | | | | | | ARB_enhanced_layouts only requires component qualifier support for generic varyings, so this is all the vec4 backend knew how to handle. This patch extends the backend to handle it for all varyings, so we can use store_output intrinsics with a component set for things like clip/cull distances. We may want to use that for other VUE header fields in the future as well. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965/fs: Handle compact outputs.Kenneth Graunke2016-11-221-1/+3
| | | | | | | We need to calculate the number of vec4 slots correctly. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/hsw: Set integer mode in sampling state for stencil texturingJordan Justen2016-11-212-18/+9
| | | | | | | | | | | | | | | Fixes: ES31-CTS.functional.texture.border_clamp.formats.depth24_stencil8_sample_stencil.nearest_size_pot ES31-CTS.functional.texture.border_clamp.formats.depth24_stencil8_sample_stencil.nearest_size_npot ES31-CTS.functional.texture.border_clamp.formats.depth32f_stencil8_sample_stencil.nearest_size_pot ES31-CTS.functional.texture.border_clamp.formats.depth32f_stencil8_sample_stencil.nearest_size_npot ES31-CTS.functional.texture.border_clamp.unused_channels.depth24_stencil8_sample_stencil ES31-CTS.functional.texture.border_clamp.unused_channels.depth32f_stencil8_sample_stencil Cc: "13.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* mesa: fold always true conditionalEmil Velikov2016-11-211-4/+2
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: drop unneeded assertEmil Velikov2016-11-211-1/+0
| | | | | | | | As seen a couple of lines above - there's no way for the assert to trigger. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* tnl: remove unneeded #include "util/simple_list.h"Brian Paul2016-11-202-2/+0
| | | | Reviewed-by: Vinson Lee <[email protected]>
* radeon: remove unneeded #include "util/simple_list.h"Brian Paul2016-11-205-5/+0
| | | | | | Compile tested only. Reviewed-by: Vinson Lee <[email protected]>
* r200: remove unneeded #include "util/simple_list.h"Brian Paul2016-11-205-5/+1
| | | | | | | And include "util/simple_list.h" where it is needed in r200_state.c Compile tested only. Reviewed-by: Vinson Lee <[email protected]>
* i915: remove unneeded #include "util/simple_list.h"Brian Paul2016-11-202-2/+0
| | | | | | Compile tested only. Reviewed-by: Vinson Lee <[email protected]>
* mesa: remove unneeded #includes in errors.cBrian Paul2016-11-201-6/+0
| | | | Reviewed-by: Vinson Lee <[email protected]>
* mesa: remove trailing whitespace in errors.cBrian Paul2016-11-201-6/+6
| | | | Reviewed-by: Vinson Lee <[email protected]>
* i965: Store a clip_distance_mask field similar to cull_distance_mask.Kenneth Graunke2016-11-194-0/+7
| | | | | | | This isn't useful for legacy GL, but will be used in Vulkan. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Use shader_info for brw_vue_prog_data::cull_distance_mask.Kenneth Graunke2016-11-196-12/+12
| | | | | | | | This also allows us to move it from a GL specific location to a part of the compiler shared by both GL and Vulkan. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* compiler: Store the clip/cull distance array sizes in shader_info.Kenneth Graunke2016-11-191-1/+2
| | | | | | | We switched from a boolean to array lengths in gl_program a while back. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Fix GS push inputs with enhanced layouts.Kenneth Graunke2016-11-191-1/+1
| | | | | | | | | | | | We weren't taking first_component into account when handling GS push inputs. We hardly ever push GS inputs, so this was not caught by existing tests. When I started using component qualifiers for the gl_ClipDistance arrays, glsl-1.50-transform-feedback-type-and-size started catching this. Cc: "13.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Delete unused variable.Kenneth Graunke2016-11-191-2/+0
| | | | | | I forgot to delete this in 9ef2b9277d3bead6dbfa47e95794ca61e8be4e84. Signed-off-by: Kenneth Graunke <[email protected]>
* intel: Share URB configuration code between GL and Vulkan.Kenneth Graunke2016-11-191-138/+4
| | | | | | | | | This code is far too complicated to cut and paste. v2: Update the newly added genX_gpu_memcpy.c; const a few things. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Use arrays in Gen7+ URB code.Kenneth Graunke2016-11-191-202/+134
| | | | | | | | | | | So much of this code was cut and pasted per stage. We can accomplish much of it by looping over shader stages. Improves performance of OglBatch7 (version 6) by 1.50783% +/- 0.287049% (n = 71) at 1024x768 on Cherryview. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
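The idea of replacing per-stage copies with a loop over stages can be sketched as follows. This is only an illustration under assumed names: the `stage` enum, the array parameters and `min_urb_space` are hypothetical and do not reproduce the driver's URB allocation logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stage indices, used to index per-stage arrays. */
enum stage { STAGE_VS, STAGE_HS, STAGE_DS, STAGE_GS, STAGE_COUNT };

/* Sum the minimum URB space over the active stages. With per-stage
 * fields this would be four nearly identical statements, one per stage;
 * with arrays it is one loop. */
static unsigned min_urb_space(const unsigned entry_size[STAGE_COUNT],
                              const unsigned min_entries[STAGE_COUNT],
                              const bool active[STAGE_COUNT])
{
   unsigned total = 0;

   assert(entry_size && min_entries && active);
   for (int i = 0; i < STAGE_COUNT; i++) {
      if (active[i])
         total += entry_size[i] * min_entries[i];
   }
   return total;
}
```

Keeping the per-stage values in arrays also means that adding a stage only requires a new enum entry rather than another pasted block.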
* i965: Drop brw->urb.{nr_*_entries,*_start} assignments from gen7_urb.c.Kenneth Graunke2016-11-191-17/+8
| | | | | | | | The context fields are for Gen4-5; setting them has always been useless. There's no point in spending the cost in the hottest path in the driver. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Switch to roundf in HS/DS URB code.Kenneth Graunke2016-11-191-2/+2
| | | | | | | | | | | Matt intentionally switched the VS calculation to be float-based in commit c1da15709a0c0c2775bd9e534f67c60f7dc95ce8. Tessellation support was written before this and rebased forward, and missed the change. Now it's consistent. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make URB code use prog_data for GS/tessellation enable checks.Kenneth Graunke2016-11-191-6/+4
| | | | | | | | If geometry/tessellation shaders are disabled, prog_data will be NULL (see brw_state_upload.c). This consolidates dirty bits a little. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* intel: Convert devinfo->urb.min_*_entries into an array.Kenneth Graunke2016-11-192-5/+7
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* intel: Convert devinfo->urb.max_*_entries into an array.Kenneth Graunke2016-11-192-14/+20
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* st/mesa/glsl/nir/i965: make use of new gl_shader_program_data in gl_shader_programTimothy Arceri2016-11-1923-151/+147
| | | | Reviewed-by: Emil Velikov <[email protected]>
* mesa: create new gl_shader_program_data structTimothy Arceri2016-11-193-0/+69
| | | | | | | | This will be used to share data between gl_program and gl_shader_program allowing for greater code simplification as we can remove a number of awkward uses of gl_shader_program. Reviewed-by: Emil Velikov <[email protected]>
* i965: Disable depth writes when depth test is GL_EQUAL.Kenneth Graunke2016-11-188-8/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no point in performing depth writes when the depth test comparison function is set to GL_EQUAL - it would just write out the same value that's already there (if it is written at all). While this is harmless from a functional perspective, it hurts performance. Obviously, writing to memory is not free, but there's another more subtle impact as well: it can prevent early depth optimizations. Depth writes aren't supposed to happen for pixels that are killed by fragment shader discard statements or the alpha test. So, with depth writes enabled and either of those, the pixel shader must be invoked to determine whether or not to perform the write. This is fairly stupid in the EQUAL case - we're running a shader to decide whether to replace the existing depth value with itself. By disabling these pointless writes, we allow early depth even with discards and alpha testing, allowing the hardware to skip the pixel shader altogether if the depth test fails. Improves performance of Unigine Valley: - Skylake GT2: +17.8% - Broadwell GT3e: +11.5% - Cherrytrail: +19.4% Huge thanks to Mark Janes for building frameretrace [1], the performance analysis tool that helped us find this issue, and to Robert Bragg for providing us performance metrics on Linux. Mark also spent the time to analyze Valley performance on Windows vs. Linux and discovered a discrepancy in early depth test metrics. Once he had isolated a draw call and drawn attention to the problem, fixing it was pretty simple. [1] https://github.com/janesma/apitrace/wiki/frameretrace-branch Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
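The decision described above reduces to a small predicate. The sketch below is an assumption-laden illustration, not the i965 state code: the `depth_func` enum and `depth_writes_enabled` are hypothetical names standing in for the GL depth function and the driver's state setup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of depth comparison functions. */
enum depth_func {
   DEPTH_FUNC_LESS,
   DEPTH_FUNC_EQUAL,
   DEPTH_FUNC_LEQUAL,
   DEPTH_FUNC_ALWAYS,
   DEPTH_FUNC_COUNT
};

/* Decide whether to program depth writes. With an EQUAL comparison, a
 * passing fragment has the same depth as the stored value, so the write
 * would be redundant and can be turned off. */
static bool depth_writes_enabled(bool depth_test_enabled, bool depth_mask,
                                 enum depth_func func)
{
   assert((unsigned)func < DEPTH_FUNC_COUNT);
   return depth_test_enabled && depth_mask && func != DEPTH_FUNC_EQUAL;
}
```

The API-level depth mask is left untouched in this sketch. Only the value derived for the hardware changes, which is what lets early depth testing stay enabled when discards or alpha testing are in use.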
* glsl/i965: move per stage AtomicBuffers list to gl_programTimothy Arceri2016-11-199-44/+21
| | | | Reviewed-by: Emil Velikov <[email protected]>
* glsl: create gl_program at the start of linking rather than the endTimothy Arceri2016-11-194-29/+4
| | | | | | | | | | | | | | | | | | This will allow us to directly store metadata we want to retain in gl_program. This metadata is currently stored in gl_linked_shader and will be lost if relinking fails, even though the program will remain in use and is still valid according to the spec. "If a program object that is active for any shader stage is re-linked unsuccessfully, the link status will be set to FALSE, but any existing executables and associated state will remain part of the current rendering state until a subsequent call to UseProgram, UseProgramStages, or BindProgramPipeline removes them from use." This change will also help avoid the double handling that happens in _mesa_copy_linked_program_data(). Reviewed-by: Emil Velikov <[email protected]>
* st/mesa/i965: simplify gl_program references and stop leakingTimothy Arceri2016-11-194-13/+11
| | | | | | | | | | | | | | In i965 we were calling _mesa_reference_program() after creating gl_program and then later calling it again with NULL as a param to get the refcount back down to 1. This changes things to not use _mesa_reference_program() at all and just have gl_linked_shader take ownership of gl_program, since the refcount starts at 1. The st and ir_to_mesa linkers were worse, as they were both getting into a state where the refcount would never get to 0 and we would leak the program. Reviewed-by: Emil Velikov <[email protected]>
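The ownership transfer described above can be sketched with a toy reference-counted object. None of this is Mesa code: `struct program`, `struct linked_shader`, the helper functions and the `live_programs` counter are hypothetical and exist only to show why no extra reference is needed when the creator hands its initial reference to the owner.

```c
#include <assert.h>
#include <stdlib.h>

/* Count of allocated programs, so that a leak is observable. */
static int live_programs = 0;

struct program { int refcount; };
struct linked_shader { struct program *prog; };

/* A new object starts with one reference, held by its creator. */
static struct program *program_create(void)
{
   struct program *p = calloc(1, sizeof(*p));
   assert(p != NULL);
   p->refcount = 1;
   live_programs++;
   return p;
}

/* Drop one reference and free the object when the last one goes away. */
static void program_unref(struct program *p)
{
   assert(p->refcount > 0);
   if (--p->refcount == 0) {
      free(p);
      live_programs--;
   }
}

/* Transfer the creator's reference to the shader: no extra ref is taken,
 * so there is nothing to drop again later. */
static void linked_shader_take_program(struct linked_shader *sh,
                                       struct program *p)
{
   sh->prog = p;
}

/* Release the shader's single reference to its program. */
static void linked_shader_release(struct linked_shader *sh)
{
   if (sh->prog) {
      program_unref(sh->prog);
      sh->prog = NULL;
   }
}
```

Taking an extra reference and forgetting to drop it would leave the count at 1 after the shader is released, which is the kind of leak the commit message describes for the other linkers.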
* mesa/fbobject: Update CubeMapFace when reusing texturesNanley Chery2016-11-181-0/+1
| | | | | | | | | | | | | | | | | | | | | Framebuffer attachments can be specified through FramebufferTexture* calls. Upon specifying a depth (or stencil) framebuffer attachment that internally reuses a texture, the cube map face of the new attachment would not be updated (defaulting to TEXTURE_CUBE_MAP_POSITIVE_X). Fix this issue by actually updating the CubeMapFace field. This bug manifested itself in BindFramebuffer calls performed on framebuffers whose stencil attachments internally reused a depth texture. When binding a framebuffer, we walk through the framebuffer's attachments and update each one's corresponding gl_renderbuffer. Since the framebuffer's depth and stencil attachments may share a gl_renderbuffer and the walk visits the stencil attachment after the depth attachment, the uninitialized CubeMapFace forced rendering to TEXTURE_CUBE_MAP_POSITIVE_X. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77662 Signed-off-by: Nanley Chery <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: add NV_image_formats extension supportLionel Landwerlin2016-11-182-6/+17
| | | | | | | | | | | | | | | This extension can be enabled automatically as it is a subset of ARB_shader_image_load_store. v2: Replace helper function by qualifier struct field (Ilia) Enable NV_image_formats using ARB_shader_image_load_store (Ilia) v3: Drop extension field from gl_extensions (Ilia) Release notes (Ilia) Signed-off-by: Lionel Landwerlin <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98480 Reviewed-by: Ilia Mirkin <[email protected]>