path: root/src/mesa
Commit message | Author | Age | Files | Lines
* main: don't error when enabling conservative rasterization on glesLionel Landwerlin2016-12-131-1/+1
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* main: use new driver flag for conservative rasterization stateLionel Landwerlin2016-12-136-7/+19
| | | | | | | | | | | Suggested by Marek. v2: Use new driver flag (Marek) v3: Fix i965 comments (Lionel) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* i965: remove brw_lower_texture_gradientsIago Toral Quiroga2016-12-135-358/+1
| | | | | | | This has been ported to NIR now so we don't need to keep the GLSL IR lowering any more. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/nir: enable lowering of texture gradient for shadow samplersIago Toral Quiroga2016-12-131-0/+3
| | | | | | | This gets the lowering on the Vulkan driver too, which is required for hardware that does not have the sample_l_d message (up to IvyBridge). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/nir: enable lowering of texture gradient for cube mapsIago Toral Quiroga2016-12-131-0/+1
| | | | | | | | | This gets the lowering on the Vulkan driver too. Fixes Vulkan CTS cube map texture gradient tests in: dEQP-VK.glsl.texture_functions.texturegrad.* Reviewed-by: Kenneth Graunke <[email protected]>
* treewide: s/comparitor/comparator/Ilia Mirkin2016-12-1211-40/+40
| | | | | | | | | | git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g' Just happened to notice this in a patch that was sent and included one of the tokens in question. Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* i965/fs: Reject copy propagation into SEL if not min/max.Matt Turner2016-12-122-1/+12
| | | | | | | | | | | | | | | | | | | | | We shouldn't ever see a SEL with conditional mod other than GE (for max) or L (for min), but we might see one with predication and no conditional mod. total instructions in shared programs: 8241806 -> 8241902 (0.00%) instructions in affected programs: 13284 -> 13380 (0.72%) HURT: 62 total cycles in shared programs: 84165104 -> 84166244 (0.00%) cycles in affected programs: 75364 -> 76504 (1.51%) helped: 10 HURT: 34 Fixes generated code in at least Sanctum 2, Borderlands 2, Goat Simulator, XCOM: Enemy Unknown, and Shogun 2. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92234 Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Add unit tests for copy propagation pass.Matt Turner2016-12-122-0/+211
| | | | | | Pretty basic, but it's a start. Acked-by: Jason Ekstrand <[email protected]>
* i965/fs: Rename opt_copy_propagate -> opt_copy_propagation.Matt Turner2016-12-123-15/+16
| | | | | | Matches the vec4 backend, cmod propagation, and saturate propagation. Reviewed-by: Jason Ekstrand <[email protected]>
* st/glsl_to_tgsi: plumb the GS output stream qualifier through to TGSINicolai Hähnle2016-12-121-0/+10
| | | | | | Allow drivers to emit GS outputs in a smarter way. Reviewed-by: Marek Olšák <[email protected]>
* i965/blorp: fix release build unused variable warningGrazvydas Ignotas2016-12-121-3/+1
| | | | | Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: Print out cycle estimates at the start of block annotations.Kenneth Graunke2016-12-111-1/+1
| | | | | | | | | | | | We now print START B15 <-B14 (42774 cycles) indicating that we estimate B15 will take 42,774 cycles. Printing this should make it easier to see where time is spent in the program. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa: Return LINEAR encoding for winsys FBO depth/stencil.Kenneth Graunke2016-12-111-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | GetFramebufferAttachmentParameteriv should return GL_LINEAR for the window system default framebuffer's GL_DEPTH or GL_STENCIL attachments when there are zero depth or stencil bits. The GL 4.5 spec's GetFramebufferAttachmentParameteriv section says: "If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is not NONE, these queries apply to all other framebuffer types: [...] If attachment is not a color attachment, or no data storage or texture image has been specified for the attachment, then params will contain the value LINEAR." Note that we already return LINEAR for the case where there is an actual depth or stencil renderbuffer attached. In the case modified by this patch, FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE returns FRAMEBUFFER_DEFAULT rather than NONE. Fixes a CTS test when run in a visual without depth / stencil buffers: GL45-CTS.gtf30.GL3Tests.framebuffer_srgb.framebuffer_srgb_default_encoding Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
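The rule quoted from the GL 4.5 spec can be sketched as a small decision function. This is only an illustration of the quoted text, not Mesa's code; the function name, its boolean parameters, and the string return values are hypothetical.

```python
def attachment_color_encoding(is_color_attachment: bool,
                              has_storage: bool,
                              is_srgb: bool) -> str:
    """Illustrative model of the FRAMEBUFFER_ATTACHMENT_COLOR_ENCODING query.

    Hypothetical helper: per the quoted spec text, anything that is not a
    color attachment, or that has no data storage, reports LINEAR.
    """
    if not is_color_attachment or not has_storage:
        return "LINEAR"
    return "SRGB" if is_srgb else "LINEAR"
```

Under this model, a default-framebuffer depth attachment with zero depth bits falls into the first branch and reports LINEAR, which is the case the patch addresses.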
* i965/mt: Disable HiZ when sharing depth buffer externally (v2)Chad Versace2016-12-101-7/+22
| | | | | | | | | | | | | | | | | intel_miptree_make_shareable() discarded and disabled CCS. Fix it so that it discards and disables HiZ too. Fixes dEQP-EGL.functional.image.render_multiple_contexts.gles2_renderbuffer_depth16_depth_buffer on Skylake. v2: Actually do what the commit message says. Discard the HiZ buffer. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98329 Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: Nanley Chery <[email protected]> Cc: Haixia Shi <[email protected]> Cc: [email protected]
* i965/mt: Disable aux surfaces after making miptree shareableChad Versace2016-12-101-0/+2
| | | | | | | | | | | | | | The entire goal of intel_miptree_make_shareable() is to permanently disable the miptree's aux surfaces. So set intel_mipmap_tree::disable_aux_buffers after the function is done discarding the aux surfaces. References: https://bugs.freedesktop.org/show_bug.cgi?id=98329 Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: Nanley Chery <[email protected]> Cc: Haixia Shi <[email protected]> Cc: [email protected]
* i965: delay adding built-in uniforms to Parameters listTimothy Arceri2016-12-091-23/+19
| | | | | | | | | | This is a step towards using NIR optimisations over GLSL IR optimisations. Delaying adding built-in uniforms until after we convert to NIR gives it a chance to optimise them away. V2: move the new code back to brw_link_shader() Reviewed-by: Kenneth Graunke <[email protected]>
* st/mesa: round lod_bias to a multiple of 1/256Marek Olšák2016-12-071-0/+6
| | | | | | | This reduces the number of sampler states 3.6x in Batman Arkham: Origins. (from ~7200 to ~2000) Reviewed-by: Nicolai Hähnle <[email protected]>
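Snapping the bias to a 1/256 grid means nearby values collapse into the same sampler state. A minimal sketch of that quantization follows; the function name is hypothetical, and rounding down (floor) is an assumption, since the commit message only says "round".

```python
import math


def quantize_lod_bias(lod_bias: float) -> float:
    """Snap lod_bias to a multiple of 1/256.

    Assumption: values are rounded down; the commit message does not
    state the rounding direction.
    """
    return math.floor(lod_bias * 256.0) / 256.0
```

With this sketch, biases such as 0.5, 0.5001 and 0.5002 all map to 0.5, so they would share one sampler state instead of three.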
* i965: Increase max texture to 16k for gen7+Jordan Justen2016-12-071-3/+10
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98297 Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: enable INTEL_conservative_rasterization on Gen9+Lionel Landwerlin2016-12-076-5/+18
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* mesa: add support for GL_INTEL_conservative_rasterizationLionel Landwerlin2016-12-075-0/+59
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Add i965 plumbing for ARB_post_depth_coverage for i965 (gen9+).Plamena Manolova2016-12-074-3/+13
| | | | | | | | | | This extension allows the fragment shader to control whether values in gl_SampleMaskIn[] reflect the coverage after application of the early depth and stencil tests. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* mesa: Add GL and GLSL plumbing for ARB_post_depth_coverage for i965 (gen9+).Plamena Manolova2016-12-073-0/+4
| | | | | | | | | This extension allows the fragment shader to control whether values in gl_SampleMaskIn[] reflect the coverage after application of the early depth and stencil tests. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Drop redundant key->outputs_written initialization.Kenneth Graunke2016-12-061-2/+0
| | | | | | | This was already set to the same value earlier. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Initialize "separate" flag in VUE maps.Kenneth Graunke2016-12-061-0/+3
| | | | | | | | | | | | This was uninitialized, which resulted in weird looking printouts where it appeared that the TCS output and TES input patch URB entries differed in SSO/non-SSO layout. There is no "separable" layout for both, as they're tied together. It should have no other actual effect. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Don't force SSO layout for VS->TCS.Kenneth Graunke2016-12-062-4/+3
| | | | | | | | | | | | | | | | | | This was a hack which worked around the VS and TCS disagreeing on their shared interface due to the lack of varying packing. In particular, it was needed by Piglit's tcs-input-read-array-interface test. However, that was just one case where things could go awry, so the previous commit forcibly made interfaces match. This hack is no longer necessary. It also seems to be broken, though I'm not sure why. It fixes Piglit regressions in spec/arb_shader_image_load_store/semantics from commit ec1f159ac81ed964415d102eed4a0a29be8e7937. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98893 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Unify shader interfaces explicitly.Kenneth Graunke2016-12-061-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A while ago, I made i965 start compiling shaders independently. The VUE map layouts were based entirely on each shader's input/output bitfields. Assuming the interfaces match, this works out well - both sides will compute the same layout, and outputs are correctly routed to inputs. At the time, I had assumed that the linker would guarantee that the interfaces match. While it usually succeeds, it unfortunately seems to fail in some cases. For example, Piglit's tcs-input-read-array-interface test has a VS output array with two elements, but the TCS only reads one. The linker isn't able to eliminate the unused element from the VS, which makes the interfaces not match. Another case is where a shader other than the last writes clip/cull distances. These should be demoted to ordinary varyings, but they currently aren't - so we think they still have some special meaning, and prevent them from being eliminated. Fixing the linker to guarantee this in all cases is complicated. It needs to be able to optimize out dead code. It's tied into varying packing and other messiness. While we can certainly improve it---and should---I'd rather not rely on it being correct in all cases. This patch ORs adjacent stages' input/output bitfields together, ensuring that their interface (and hence VUE map layout) will be compatible. This should safeguard us against linker insufficiencies. Fixes line rendering in Dolphin, and the Piglit test based on it: spec/glsl-1.50/execution/geometry/clip-distance-vs-gs-out. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97232 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
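The ORing of adjacent stages' bitfields described above can be sketched as follows. This is a simplified model, not the i965 implementation: the `Stage` class and `unify_interfaces` function are hypothetical, and any special handling the driver applies to particular varyings is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    """Hypothetical stand-in for a compiled shader stage."""
    name: str
    inputs_read: int = 0      # bitfield of varying slots the stage reads
    outputs_written: int = 0  # bitfield of varying slots the stage writes


def unify_interfaces(stages: List[Stage]) -> None:
    """Make each producer/consumer pair agree on one interface bitfield."""
    for producer, consumer in zip(stages, stages[1:]):
        shared = producer.outputs_written | consumer.inputs_read
        producer.outputs_written = shared
        consumer.inputs_read = shared
```

In the tcs-input-read-array-interface case, a VS writing slots `0b0110` feeding a TCS reading only `0b0010` would leave both sides at `0b0110`, so both compute the same layout.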
* i965: Emit proper NOPs.Matt Turner2016-12-061-4/+2
| | | | | | | | | | | The PRMs for HSW and newer say that other than the opcode and DebugCtrl bits of the instruction word, the rest must be zero. By zeroing the instruction word manually, we avoid using any of the state inherited through brw_codegen. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96959 Reviewed-by: Ian Romanick <[email protected]>
* i965: Allocate at least some URB space even when max_vertices = 0.Kenneth Graunke2016-12-051-1/+7
| | | | | | | | | | | | | | | | | | | | Allocating zero URB space is a really bad idea. The hardware has to give threads a handle to their URB space, and threads have to use that to terminate the thread. Having it be an empty region just breaks a lot of assumptions. Hence, why we asserted that it isn't possible. Unfortunately, it /is/ possible prior to Gen8, if max_vertices = 0. In theory a geometry shader could do SSBO/image access and maybe still accomplish something. In reality, this is tripped up by conformance tests. Gen8+ already avoids this problem by placing the vertex count DWord in the URB entry header. This fixes things on earlier generations. Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Tested-by: Ian Romanick <[email protected]>
* main: allow NEAREST_MIPMAP_NEAREST for stencil texturingRoland Scheidegger2016-12-061-15/+8
| | | | | | | | | | | | As per GL 4.5 rules, which fixed a spec mistake in GL_ARB_stencil_texturing. The extension spec wasn't updated, but just allow it with older GL versions as well, hoping there aren't any crazy tests which want to see an error there... (Compile tested only.) Reported by Józef Kucia <[email protected]> Acked-by: Józef Kucia <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* Revert "i965: use nir_lower_indirect_derefs() for GLSL"Jason Ekstrand2016-12-052-10/+13
| | | | | This reverts commit 9404439a754e5640ccd98df40fa694835c0d8759. I didn't intend to push it and it breaks clip and cull distance.
* i965: Delete the meta-based CopyImageSubData implementationJason Ekstrand2016-12-054-328/+0
| | | | | | | | | | | | | | | | | | | When I originally implemented the ARB_copy_image extension, the fast-path was written in meta using texture views. This path only worked if both images were uncompressed color images. All of the other cases fell back to the blitter or, in the worst case, mapping and memcpy on the CPU. Now that we have the blorp path, it handles all copies ever and the old meta, blitter, and CPU paths are only used on gen5 and below. The primary reason why we needed the meta path (apart from having a slow blitter on later hardware) was to handle multisampling which gen5 and earlier don't support anyway. Since the blitter is reasonably fast on gen5, we can just delete the meta path and get rid of all that terrible code. If we decide that we're ok with just disabling ARB_copy_image on gen5 and earlier (I personally am), then we could get rid of another 300 lines or so of semi-hairy code. Reviewed-by: Anuj Phogat <[email protected]>
* i965/copy_image: Re-implement the blitter path with emit_miptree_blitJason Ekstrand2016-12-053-97/+80
| | | | | | | | | | By using emit_miptree_blit which does chunking, this fixes the blitter path for the case where the image is too tall to blit normally. We also pull it into intel_blit as intel_miptree_copy. This matches the naming of the blorp blit and copy functions brw_blorp_blit and brw_blorp_copy. Reviewed-by: Anuj Phogat <[email protected]> Cc: "13.0" <[email protected]>
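The chunking idea, splitting a copy that is too tall into several blits that each fit the limit, can be sketched like this. The function name is hypothetical and the limit is a caller-supplied placeholder; the real blitter limit is not stated in the commit message.

```python
from typing import List, Tuple


def split_into_chunks(height: int, max_chunk_height: int) -> List[Tuple[int, int]]:
    """Return (y_offset, chunk_height) pairs that cover `height` rows.

    max_chunk_height is a placeholder for the hardware limit, which is
    not given in the commit message.
    """
    if max_chunk_height <= 0:
        raise ValueError("max_chunk_height must be positive")
    return [(y, min(max_chunk_height, height - y))
            for y in range(0, height, max_chunk_height)]
```

For example, a 10-row copy with a 4-row limit becomes three blits starting at rows 0, 4 and 8.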
* i965/blit: Break the guts of intel_miptree_blit into a helperJason Ekstrand2016-12-051-67/+84
| | | | | Reviewed-by: Anuj Phogat <[email protected]> Cc: "13.0" <[email protected]>
* i965: use nir_lower_indirect_derefs() for GLSLTimothy Arceri2016-12-052-13/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This moves the nir_lower_indirect_derefs() call into brw_preprocess_nir() so that it is called by both OpenGL and Vulkan, and removes the call to the old GLSL IR pass lower_variable_index_to_cond_assign(). We want to do this pass in NIR to be able to move loop unrolling to NIR. There is an increase of 1-3 instructions in a small number of shaders, and 2 Kerbal Space Program shaders that increase by 32 instructions. Shader-db results BDW: total instructions in shared programs: 8705873 -> 8706194 (0.00%) instructions in affected programs: 32515 -> 32836 (0.99%) helped: 3 HURT: 79 total cycles in shared programs: 74618120 -> 74583476 (-0.05%) cycles in affected programs: 528104 -> 493460 (-6.56%) helped: 47 HURT: 37 LOST: 2 GAINED: 0
* i965: Release aux buffer when disabling ccsTopi Pohjolainen2016-12-051-0/+3
| | | | | | | | Otherwise subsequent render cycles keep on using compression and/or fast clear. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* Revert "st/mesa: get Version from gl_program rather than gl_shader_program"Timothy Arceri2016-12-021-1/+4
| | | | | | | This reverts commit 6bf63b011992dbbc899a28bde5692070dbcf965a. A patch that adds a reference to gl_shader_program_data to gl_program needs to land before this one.
* st/mesa: get Version from gl_program rather than gl_shader_programTimothy Arceri2016-12-021-4/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa/glsl: move Version to gl_shader_program_dataTimothy Arceri2016-12-023-3/+4
| | | | | | | | | | | This is mostly just used during linking however the st uses it when updating textures. In order to store gl_program in the CurrentProgram array rather than gl_shader_program we need to move this field to the shared gl_shader_program_data struct. Reviewed-by: Nicolai Hähnle <[email protected]>
* mesa: only verify that enabled arrays have backing buffersIlia Mirkin2016-12-011-1/+1
| | | | | | | | | | | | We were previously also verifying that no backing buffers were available when an array wasn't enabled. This has no basis in the spec, and it causes GLupeN64 to fail as a result. Fixes: c2e146f487 ("mesa: error out in indirect draw when vertex bindings mismatch") Cc: [email protected] Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
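The relaxed check can be sketched as a one-directional test: an enabled array must have a buffer, while a disabled array is never inspected. The function name and the mask/list representation are hypothetical and are not Mesa's data structures.

```python
from typing import Sequence


def enabled_arrays_have_buffers(enabled_mask: int,
                                buffer_bound: Sequence[bool]) -> bool:
    """Return True if every enabled vertex array has a backing buffer.

    Hypothetical model: bit i of enabled_mask marks array i as enabled,
    and buffer_bound[i] says whether a buffer backs it. Disabled arrays
    are ignored, whether or not a buffer is bound.
    """
    for index, bound in enumerate(buffer_bound):
        if enabled_mask & (1 << index) and not bound:
            return False
    return True
```

A disabled array with a buffer still bound passes this check, which is the case the stricter validation used to reject.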
* mesa: reset linked_stages bitmask when re-linkingTimothy Arceri2016-12-011-0/+2
| | | | | | | | | | | | | | | 34953f8907fdd added this bitmask but it wasn't being reset when a program was relinked. If a stage was removed from the new program then it could cause a crash as we expect the linked shader for that stage to not be null. Fixes crashes in: ESEXT-CTS.tessellation_shader.single.xfb_captures_data_from_correct_stage ES31-CTS.core.tessellation_shader.single.xfb_captures_data_from_correct_stage Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98917
* st/mesa: skip lower_output_reads when possibleNicolai Hähnle2016-11-301-1/+2
| | | | Reviewed-by: Marek Olšák <[email protected]>
* st/glsl_to_tgsi: swizzle PROGRAM_OUTPUTs correctly in src_register translationNicolai Hähnle2016-11-301-1/+11
| | | | | | | This is required for reading directly from fragment shader stencil and depth outputs. Reviewed-by: Marek Olšák <[email protected]>
* mesa: optimise interleaved sso validationTimothy Arceri2016-11-301-11/+14
| | | | | | | | | | Now that we have a linked_stages bitfield we can use this to check if the program is used at a later stage. This change is also required to be able to use gl_program rather than gl_shader_program in the CurrentProgram array. Reviewed-by: Ian Romanick <[email protected]>
* mesa/glsl: add bitmask to track stages a program was linked againstTimothy Arceri2016-11-301-0/+3
| | | | | | | | | | | | | | | | This will be used to enable us to store the current gl_program rather than gl_shader_program in the gl_pipeline_object, allowing us to simplify handling of validation. Also we should not be depending on _LinkedShader for this information as it may contain shaders from a failed linking attempt rather than the current program still in use. We could also use this mask to iterate over the stages during linking with _mesa_bit_scan() rather than the current method of NULL checking each stage. Reviewed-by: Ian Romanick <[email protected]>
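Iterating over a stage bitmask instead of NULL-checking every stage can be sketched as below. The helper names are hypothetical; this only models the bit-scan idea and is not the Mesa helper mentioned in the message.

```python
from typing import Iterable, Iterator


def linked_stage_mask(stage_indices: Iterable[int]) -> int:
    """Build a bitmask with one bit set per successfully linked stage."""
    mask = 0
    for index in stage_indices:
        mask |= 1 << index
    return mask


def iter_linked_stages(mask: int) -> Iterator[int]:
    """Yield the index of each set bit, lowest first."""
    while mask:
        lowest = mask & -mask          # isolate the lowest set bit
        yield lowest.bit_length() - 1  # convert that bit to its index
        mask ^= lowest                 # clear it and continue
```

Only stages that are actually linked are visited, so a loop over `iter_linked_stages(mask)` never touches an empty slot.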
* i965/sched: Schedule trivial blocks.Matt Turner2016-11-291-3/+0
| | | | | | | | | | In commit 45cd76e342d1e8e schedule_instructions(bblock_t *) began setting bblock_t::cycle_count, but that function was not called on trivial blocks. Remove the code to skip trivial blocks so that cycle_count is set. Reviewed-by: Francisco Jerez <[email protected]>
* i965/sched: Make 'time' a local variable.Matt Turner2016-11-291-3/+1
| | | | Reviewed-by: Francisco Jerez <[email protected]>
* i965/cfg: Initialize bblock_t::cycle_count.Matt Turner2016-11-291-1/+1
| | | | | | | | | | | schedule_instructions(bblock_t *) isn't called on blocks with a single instruction, and since it is the only thing that set cycle_count, cycle_count would be uninitialized. A non-empty block with bblock_t::cycle_count == 0 is arguably a bug. That'll be fixed in the next commit. Reviewed-by: Francisco Jerez <[email protected]>
* i965/cfg: Initialize cfg_t::cycle_count.Matt Turner2016-11-292-1/+2
| | | | | | This reverts commit b4001af1744a02f472bd1204458662088307981b. Reviewed-by: Francisco Jerez <[email protected]>
* i965/gen7: expose larger gather offsetsIlia Mirkin2016-11-291-2/+7
| | | | | | | This matches the capabilities of the hardware. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: support constant gather offsets larger than 4 bitsIlia Mirkin2016-11-294-12/+24
| | | | | | | | Offsets that don't fit into 4 bits need to force gather_po to be selected. Adjust the logic so that this happens. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
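The selection rule can be sketched as a range check on the constant offsets. The function names are hypothetical, and treating the 4-bit field as signed (range -8 to 7) is an assumption not spelled out in the commit message.

```python
from typing import Iterable


def fits_in_signed_4_bits(value: int) -> bool:
    """Assumption: the 4-bit offset field is signed, so it covers -8..7."""
    return -8 <= value <= 7


def needs_gather_po(offsets: Iterable[int]) -> bool:
    """Return True if any constant offset is too large for the 4-bit field,
    in which case the gather_po path has to be selected."""
    return any(not fits_in_signed_4_bits(offset) for offset in offsets)
```

Under that assumption, offsets `(7, -8)` stay on the plain gather path, while `(8, 0)` or `(0, -9)` would force gather_po.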