summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* anv: fix enumeration of propertiesEmil Velikov2016-11-231-6/+8
| | | | | | | | | | | | | | | | | | | Driver should enumerate only up-to min2(num_available, num_requested) properties and return VK_INCOMPLETE if the # of requested props is smaller than the ones available. Presently we assert out in such cases. Inspired by a similar fix for RADV. v2: Use MIN2 + typed_memcpy (Jason). Should fix: dEQP-VK.api.info.device.extensions Cc: "13.0" <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> (v1) Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Restructure fast clear eligibility decisionBen Widawsky2016-11-231-14/+37
| | | | | | | | | v2 (Jason): - Use PRM citation for SKL now that it is available - Also return false for gen < 8 mipmapped/arrayed Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Set initial msaa fast clear status explicitlyTopi Pohjolainen2016-11-231-1/+1
| | | | | | | | | | | instead of in intel_miptree_init_mcs(). For lossless compression the status is immediately overwritten in intel_miptree_alloc_non_msrt_mcs() while the status for non-compressed non-msaa miptrees is explicitly set in do_blorp_clear(). Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Declare read-only input to level/layer check constTopi Pohjolainen2016-11-231-1/+1
| | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fbo: Prepare layer multiplier for render buffer compressionTopi Pohjolainen2016-11-231-1/+1
| | | | | | | | This path is not yet taken for fast cleared or compressed buffers but later patches will enable it. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Add multi-slice getter for resolve mapsTopi Pohjolainen2016-11-232-7/+27
| | | | | | | This is useful when checking if any slice is in unresolved state. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/meta: Split conversion of color and setting itTopi Pohjolainen2016-11-233-19/+36
| | | | | | | | | And fix a mangled comment while at it. v2 (Ben): Return the converted color. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/blorp: Fix rectangle size for level-not-zero resolvesTopi Pohjolainen2016-11-231-2/+2
| | | | | | | | Needed to prevent gpu hangs when mip-mapped compression gets enabled. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/miptree: Don't shrink textures when augmenting for more levelsTopi Pohjolainen2016-11-231-4/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was detected when examining CCS_E failures with piglit test: "fbo-generatemipmap-formats". Test creates a 2D texture with dimensions 293x277. It manually loops over all levels and calls glTexImage2D(). Level one triggers creation of full miptree: intel_alloc_texture_image_buffer() realizes that there is only one level in the miptree and calls intel_miptree_create_for_teximage() to re-allocate the miptree with all 9 levels. However, the end result is a miptree with level zero dimensions of 292x276. Related, and possibly calling for treatment of its own is mip-map generation: After calling glTexImage2D() against every level test continues by replacing content for levels one to eight with data derived from level zero by calling glGenerateMipmapEXT(). This results into the miptree being allocated anew for every level: Mip-map generation goes thru meta which ends up validating the texture (brw_validate_textures()->intel_finalize_mipmap_tree()-> intel_miptree_match_image()) where one finds texture with base level size 292:276. This results into new miptree being created for the npot size 293:277. Only here intel_finalize_mipmap_tree() is asked for only one level, and therefore such is created. Generation for level one in turn finds right base level size but only one level when two is needed. And the same goes on for all eight levels. This patch prevents the shrink maintaining the NPOT size of 293x277. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* main/getteximage: Use the height argument to calculate memcpy copy sizeEduardo Lima Mitev2016-11-231-1/+1
| | | | | | | | | | | | | | | | | | In get_tex_memcpy, when copying texture data directly from source to destination (when row strides match for both src and dst), the copy size is currently calculated using the full texture height instead of the sub-region height parameter that was passed. This can cause a read past the end of the mapped buffer when y-offset is greater than zero, leading to a segfault. Fixes CTS test (from crash to pass): * GL45-CTS/get_texture_sub_image/functional_test v2: (Jason) Use the passed 'height' instead of copying til the end of the buffer (tex-height - yoffset). Reviewed-by: Jason Ekstrand <[email protected]>
* nir/spirv: implement ordered / unordered floating point comparisons properlyIago Toral Quiroga2016-11-231-1/+52
| | | | | | | | | | | | | | | | | | | | | Besides the logical operation involved, these also require that we test if the operands are ordered / unordered. For ordered operations, both operands must be ordered (and they must pass the conditional test) while for unordered operations it is sufficient if only one of the operands is unordered (or they pass the logical test). Fixes the following Vulkan CTS tests: dEQP-VK.spirv_assembly.instruction.compute.opfunord.equal dEQP-VK.spirv_assembly.instruction.compute.opfunord.greater dEQP-VK.spirv_assembly.instruction.compute.opfunord.greaterequal dEQP-VK.spirv_assembly.instruction.compute.opfunord.less dEQP-VK.spirv_assembly.instruction.compute.opfunord.lessequal v2: Fixed typo: s/nir_eq/nir_feq Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: fix segfault in anv_BindImageMemoryDave Airlie2016-11-231-1/+1
| | | | | | | | | | | | Since bind image memory started memsetting surfaces, the device node can't be NULL, since we lookup device->info.has_llc. Not sure why it ever was NULL before. Fixes some things on my Ivybridge. Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* swr: [rasterizer core] fix cast for stencil clear valueTim Rowley2016-11-221-3/+2
| | | | | | | Bad type cast for stencil clear value was picking up structure padding bytes. Reviewed-by: Ilia Mirkin <[email protected]>
* swr: color interpolation is also supposed to get perspective divisionIlia Mirkin2016-11-221-2/+4
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: add sprite coord enable mask to fs keyIlia Mirkin2016-11-222-1/+3
| | | | | | | This fixes gl-coord-replace-doesnt-eliminate-frag-tex-coords Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: rework vert <-> frag shader linkage logicIlia Mirkin2016-11-221-43/+50
| | | | | | | | | | | | Fixes a few things: - sprite coords only apply to generic varyings, and are a bitmask - back color only applies in 2-sided lighting mode - handle some odd situations between only some front/back colors being there. This is only semi-legal in GL, but we shouldn't start crashing. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: flatshading makes color outputs flat, it doesn't affect othersIlia Mirkin2016-11-221-4/+2
| | | | | | | | We were previously not marking the "regular" flat outputs as flat when flatshading was enabled. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: only broadcast color0 value, not all color valuesIlia Mirkin2016-11-221-1/+2
| | | | | | | | | | | | | The way that dual-source blending is described for GLES2 is very odd, and we end up with a shader that both has this property set *and* has a color1 value to be used as the second source. While changing the state tracker is an option, it seems more reliable to verify that the broadcast is only done on color0. Fixes arb_blend_func_extended-fbo-extended-blend-pattern_gles2 Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: report a reasonable max lod biasIlia Mirkin2016-11-221-1/+1
| | | | | | | | | This is the same value that llvmpipe uses. Since swr uses the same sampler logic, makes sense for this value to also be the same. Most applications don't care. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: avoid using exceptions for expected condition handlingIlia Mirkin2016-11-221-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I was getting a weird segfault from GCC 4.9.3: 0x00007ffff54f27aa in strlen () from /lib64/libc.so.6 (gdb) bt #0 0x00007ffff54f27aa in strlen () from /lib64/libc.so.6 #1 0x00007ffff4f128e5 in get_cie_encoding (cie=cie@entry=0x7ffff6e09813) at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:272 #2 0x00007ffff4f1318e in classify_object_over_fdes (ob=ob@entry=0xd7bb90, this_fde=0x7ffff7f11010) at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:628 #3 0x00007ffff4f135ba in init_object (ob=0xd7bb90) at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:749 #4 search_object (ob=ob@entry=0xd7bb90, pc=pc@entry=0x7ffff4f11f4d <_Unwind_RaiseException+61>) at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:961 #5 0x00007ffff4f13e62 in _Unwind_Find_registered_FDE (bases=0x7fffffffd358, pc=0x7ffff4f11f4d <_Unwind_RaiseException+61>) at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:1025 #6 _Unwind_Find_FDE (pc=0x7ffff4f11f4d <_Unwind_RaiseException+61>, bases=bases@entry=0x7fffffffd358) at /gcc-4.9.3/libgcc/unwind-dw2-fde-dip.c:450 #7 0x00007ffff4f11197 in uw_frame_state_for (context=context@entry=0x7fffffffd2b0, fs=fs@entry=0x7fffffffd100) at /gcc-4.9.3/libgcc/unwind-dw2.c:1245 #8 0x00007ffff4f11b15 in uw_init_context_1 (context=context@entry=0x7fffffffd2b0, outer_cfa=outer_cfa@entry=0x7fffffffd660, outer_ra=0x7ffff518d23b <__cxa_throw+91>) at /gcc-4.9.3/libgcc/unwind-dw2.c:1566 #9 0x00007ffff4f11f4e in _Unwind_RaiseException (exc=0xd7c250) at /gcc-4.9.3/libgcc/unwind.inc:88 #10 0x00007ffff518d23b in __cxa_throw () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/libstdc++.so.6 #11 0x00007ffff51ed556 in std::__throw_out_of_range(char const*) () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/libstdc++.so.6 #12 0x00007fffea778be0 in std::map<pipe_format, SWR_FORMAT, std::less<pipe_format>, std::allocator<std::pair<pipe_format const, SWR_FORMAT> > >::at ( this=0x7fffebeb4c40 <mesa_to_swr_format(pipe_format)::mesa2swr>, __k=@0x7fffffffd73c: PIPE_FORMAT_RGTC1_UNORM) at /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/include/g++-v4/bits/stl_map.h:549 #13 0x00007fffea776aee in mesa_to_swr_format (format=PIPE_FORMAT_RGTC1_UNORM) at swr_screen.cpp:597 We can just void this whole issue by not using exceptions in the first place. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: remove formats from mapping table that don't have StoreTile implsIlia Mirkin2016-11-221-38/+48
| | | | | | | | | | | This table exists for the purpose of determining renderable formats. Without a StoreTile implementation, that can't happen. This basically removes rendering support to all L/LA/I formats. They can be re-added when/if StoreTile implementations are added. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: remove unnecessary -1 entries in format mapping tableIlia Mirkin2016-11-221-126/+0
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: rework resource layout and surface setupIlia Mirkin2016-11-226-160/+352
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a bit of a mega-commit, but unfortunately there's no great way to break this up since a lot of different pieces have to match up. Here we do the following: - change surface layout to match swr's Load/StoreTile expectations - fix sampler settings to respect all sampler view parameters - fix stencil sampling to read from secondary resource - respect pipe surface format, level, and layer settings - fix resource map/unmap based on the new layout logic - fix resource map/unmap to copy proper parts of stencil values in and out of the matching depth texture These fix a massive quantity of piglits, including all the tex-miplevel-selection ones. Note that the swr native miptree layout isn't extremely space-efficient, and we end up using it for all textures, not just the renderable ones. A back-of-the-envelope calculation suggests about 10%-25% increased memory usage for miptrees, depending on the number of LODs. Single-LOD textures should be unaffected. There are a handful of regressions as a result of this change: - Some textureGrad tests, these failures match llvmpipe. (There are debug settings allowing improved gallivm sampling accurancy.) - Some layered clearing tests as swr doesn't currently support that. It was getting lucky before because enough other things were broken. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* util: fix missing swizzle components in the SINT <-> UINT conversion stringCharmaine Lee2016-11-231-2/+2
| | | | | | | Fixes tgsi error introduced in commit 3817a7a. The error complains missing swizzle component in the conversion string "UMIN TEMP[0], TEMP[0], IMM[0].x". Reviewed-by: Roland Scheidegger <[email protected]>
* vc4: Don't conditionalize the src1 mov of qir_SEL().Eric Anholt2016-11-221-4/+2
| | | | | | | | | | | | | | | | | My thought in having both arguments conditionally moved was that it should theoretically save some power by not doing work in those channels. However, it ends up costing us instructions because we can't register-coalesce the first of the MOVs, and it also introduces extra scheduling dependencies. The instruction cost would swamp whatever power benefit I was hoping for. shader-db results: total instructions in shared programs: 100548 -> 99741 (-0.80%) instructions in affected programs: 42450 -> 41643 (-1.90%) With obvious outliers removed (I had an X11 emacs running over the network in the "after" case), 3DMMES Taiji showed 1.07231% +/- 0.488241% fps improvement (n=18, 30).
* vc4: Re-add R4 to the "any" register class.Eric Anholt2016-11-221-0/+2
| | | | | | | | | | I screwed this up in fdad4d24024ab7bc9b6b9cb6288f8b76ccac0d89 which was supposed to be making this code more maintainable. What's amazing is multithreaded FS showed the wins it did despite this bug. shader-db results: total instructions in shared programs: 103535 -> 100548 (-2.89%) instructions in affected programs: 83794 -> 80807 (-3.56%)
* vc4: Disable MSAA rasterization when the job binning is single-sampled.Eric Anholt2016-11-221-2/+13
| | | | | | | | Gallium core just changed to start setting MSAA enabled in the rasterizer state even with samples==1 buffers. This caused disagreements in our driver between binning and rasterization state, which the simulator threw assertion failures about. Keep the single-sampled samples==1 behavior for now.
* vc4: Make sure we don't overflow texture input/output FIFOs when threaded.Eric Anholt2016-11-221-2/+3
| | | | | | | | I dropped the first hunk of this change last minute when I decided it wasn't actually needed, and apparently failed to piglit it in simulation. The simulator threw an an assertion in gl-1.0-drawpixels-color-index, which queued up 5 coordinates (3 before a switch, two after) before loading the result.
* radv: move pipeline barrier image transitions after src flushingDave Airlie2016-11-231-8/+12
| | | | | | | | | This seems like it would conform better with the spec. noticed while digging into fast clears. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* anv: Enable fast clears on gen7-8Jason Ekstrand2016-11-222-13/+36
| | | | Reviewed-by: Jordan Justen <[email protected]>
* anv: Add support for fast clears on gen9Jason Ekstrand2016-11-223-29/+176
| | | | Reviewed-by: Jordan Justen <[email protected]>
* anv/blorp: Rework flushing around resolvesJason Ekstrand2016-11-221-14/+18
| | | | | | | | It turns out that the flushing required around resolves is a bit more extensive than I first thought. You actually need render cache flush and a CS stall both before *and* after the resolve. Reviewed-by: Jordan Justen <[email protected]>
* anv/cmd_buffer: Apply remaining flushes in EndCommandBufferJason Ekstrand2016-11-221-0/+2
| | | | | | | | Otherwise, some pipe flushes may just never happen. This is unlikely to cause problems depending on how the kernel schedules batches, but we shouldn't count on it. Reviewed-by: Jordan Justen <[email protected]>
* anv/blorp: Use regular blorp clears for subpass clearsJason Ekstrand2016-11-221-10/+19
| | | | | | | | At vkCmdNextSubpass time, we have the actual framebuffer so we can use regular blorp_clear for subpass clears. For fast clears, there is no attachment version, so this will make fast clears a bit easier. Reviewed-by: Jordan Justen <[email protected]>
* anv: Add a vk_to_isl_color helperJason Ekstrand2016-11-222-7/+16
| | | | Reviewed-by: Jordan Justen <[email protected]>
* anv/cmd_buffer: Make setup_attachments take a RenderPassBeginInfoJason Ekstrand2016-11-221-7/+6
| | | | Reviewed-by: Jordan Justen <[email protected]>
* anv: Set up binding tables and surface states for input attachmentsJason Ekstrand2016-11-225-6/+85
| | | | | | | | | | | | | This commit adds the last remaining bits to support input attachments in the Intel Vulkan driver. For color and depth attachments, we allocate an input attachment surface state during vkCmdBeginRenderPass like we do for the render target surface states. This is so that we can incorporate the clear color and aux information as used in rendering. For stencil, we just treat it like a regular texture because we don't there is no aux. Also, only having to worry about at most one input attachment surface for each attachment makes some of the vkCmdBeginRenderPass code simpler. Reviewed-by: Jordan Justen <[email protected]>
* anv/pipeline: Handle depth/stencil self-dependenciesJason Ekstrand2016-11-223-6/+31
| | | | Reviewed-by: Jordan Justen <[email protected]>
* anv: Use pass attachment information to insert flushesJason Ekstrand2016-11-221-0/+88
| | | | | | | | | | | | Input and resolve attachments can cause an implicit dependency in the pipeline. It's our job to insert the needed flushes. Fortunately, we can easily reuse the usage tracking that we use for CCS resolves. This fixes 159 Vulkan CTS tests on Haswell because we're now flushing in between drawing and MSAA resolves. I have no idea how they were passing before on newer hardware. Reviewed-by: Jordan Justen <[email protected]>
* anv/cmd_buffer: Fix pipeline barriers for input attachmentsJason Ekstrand2016-11-221-1/+1
| | | | | | | | We were using VK_IMAGE_ACCESS_COLOR_ATTACHMENT_READ_BIT to detect an input attachment read. We should use VK_IMAGE_ACCESS_INPUT_ATTACHMENT_READ_BIT instead. Reviewed-by: Jordan Justen <[email protected]>
* anv/pipeline: Add a input_attachment_index to the bindingsJason Ekstrand2016-11-222-0/+30
| | | | | | | This allows us to go from the binding to either the descriptor or the input attachment at will. Reviewed-by: Jordan Justen <[email protected]>
* anv/pass: Calculate the combined image usage of attachmentsJason Ekstrand2016-11-222-1/+12
| | | | Reviewed-by: Jordan Justen <[email protected]>
* anv: Add an input attachment lowering passJason Ekstrand2016-11-224-0/+147
| | | | Reviewed-by: Jordan Justen <[email protected]>
* i965/fs: Implement load_layer_id for fragment shadersJason Ekstrand2016-11-221-0/+5
| | | | Reviewed-by: Jordan Justen <[email protected]>
* nir: Add a layer_id system value intrinsicJason Ekstrand2016-11-221-0/+1
| | | | Reviewed-by: Jordan Justen <[email protected]>
* spirv: Stop warning about input attachmentsJason Ekstrand2016-11-221-1/+1
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* spirv: Handle the InputAttachmentIndex decorationJason Ekstrand2016-11-222-0/+5
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* compiler: Add the rest of the subpassInput typesJason Ekstrand2016-11-224-6/+23
| | | | | | | There are actually 6 of them according to the GL_KHR_vulkan_glsl spec. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* anv/cmd_buffer: Emit CS push constants after binding tablesJason Ekstrand2016-11-221-8/+8
| | | | | Emitting binding tables can cause push constants to be dirtied if the shader uses images so we need to handle push constants later.
* gallium: fix more occurences of u_hash.hMarek Olšák2016-11-225-5/+5
| | | | this fixes compile failures since 86514d84e0beec47c82da4888db12bf07f33cb83