summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: count and report temp arrays in scratch separatelyMarek Olšák2016-11-292-4/+40
| | | | | | v2: only do this if debug output of shader dumping is enabled Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
* radeonsi: don't try to eliminate trivial VS outputs for PS and CSMarek Olšák2016-11-291-1/+4
| | | | | | PS and CS don't have any param exports, so it's a no-op. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: disable RB+ blend optimizations for dual source blendingMarek Olšák2016-11-291-0/+11
| | | | | | | | This fixes dual source blending on Stoney. The fix was copied from Vulkan. The problem was discovered during internal testing. Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set CB_BLEND1_CONTROL.ENABLE for dual source blendingMarek Olšák2016-11-291-0/+4
| | | | | | | copied from Vulkan Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: always set all blend registersMarek Olšák2016-11-291-5/+5
| | | | | | | better safe than sorry Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set the smallest possible CB_TARGET_MASKMarek Olšák2016-11-291-5/+5
| | | | | | better safe than sorry; set_framebuffer_state always makes this dirty Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't print bodies of header-only packetsMarek Olšák2016-11-291-0/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: print unknown registers with correct formattingMarek Olšák2016-11-291-1/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: fix hang detection with deferred flushesMarek Olšák2016-11-291-1/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radv: set spi_baryc_cntl.pos_float_location to 0Dave Airlie2016-11-291-1/+1
| | | | | | | | | | | This fixes: dEQP-VK.pipeline.multisample_interpolation.offset_interpolate_at_sample_position.* This should probably be 2 when sample shading is enabled, but I'm not sure. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: force persample shading when required.Dave Airlie2016-11-294-6/+23
| | | | | | | | | | | | | | | | | | | We need to force persample shading when a) shader uses sample_id b) shader uses sample_position c) shader uses sample qualifier. Also since ps_iter_samples can now change independently of the rasterizer samples we need to move setting the regs more often. This fixes: dEQP-VK.pipeline.multisample_interpolation.centroid_interpolate_at_consistency.* dEQP-VK.pipeline.multisample_interpolation.centroid_qualifier_inside_primitive.137_191_1.* dEQP-VK.pipeline.multisample_interpolation.sample_interpolate_at_distinct_values.* dEQP-VK.pipeline.multisample_interpolation.sample_qualifier_distinct_values.128_128_1.* Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* nir: print var binding in dumps.Dave Airlie2016-11-291-1/+1
| | | | | | | | This only useful for spir-v shaders, but I keep finding myself having to add it. Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* docs: fix small typoEric Engestrom2016-11-291-1/+1
| | | | | | Fixes: ba28f2136febca32fe56 ("docs: add note about r-b/other tags when resending") Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* i965/sched: Schedule trivial blocks.Matt Turner2016-11-291-3/+0
| | | | | | | | | | In commit 45cd76e342d1e8e schedule_instructions(bblock_t *) began setting bblock_t::cycle_count, but that function was not called on trivial blocks. Remove the code to skip trivial blocks so that cycle_count is set. Reviewed-by: Francisco Jerez <[email protected]>
* i965/sched: Make 'time' a local variable.Matt Turner2016-11-291-3/+1
| | | | Reviewed-by: Francisco Jerez <[email protected]>
* i965/cfg: Initialize bblock_t::cycle_count.Matt Turner2016-11-291-1/+1
| | | | | | | | | | | schedule_instructions(bblock_t *) isn't called on blocks with a single instruction, and since it is the only thing that set cycle_count, cycle_count would be uninitialized. A non-empty block with bblock_t::cycle_count == 0 is arguably a bug. That'll be fixed in the next commit. Reviewed-by: Francisco Jerez <[email protected]>
* i965/cfg: Initialize cfg_t::cycle_count.Matt Turner2016-11-292-1/+2
| | | | | | This reverts commit b4001af1744a02f472bd1204458662088307981b. Reviewed-by: Francisco Jerez <[email protected]>
* ac/nir: Fix accessing an unitialized value.Bas Nieuwenhuizen2016-11-291-1/+2
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: Initialize the shader_stats_dump flag.Bas Nieuwenhuizen2016-11-291-0/+1
| | | | | | | | Meta was using it before it was set. I suspect we typically don't want to dump meta shaders, so just set it to false in the beginning. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* vc4: Add a note for the future about texture latency calculation.Eric Anholt2016-11-291-0/+20
| | | | | | | Debugging a shader-db reported cycle count regression from the tex coalescing, I eventually figured out that the texture latencies were totally bogus. Really fixing it will probably involve mirroring vc4_qir_schedule.c's texture fifo management here.
* vc4: Add support for coalescing ALU ops into tex_[srtb] MOVs.Eric Anholt2016-11-294-29/+37
| | | | | | | | | | | This isn't as complete as I would like (can't merge interpolation because of the implicit r5 dependency, doesn't work with control flow), but this was cheap and easy. Improves 3DMMES Taiji performance by 1.15353% +/- 0.299896% (n=29, 16) total instructions in shared programs: 99810 -> 99059 (-0.75%) instructions in affected programs: 10705 -> 9954 (-7.02%)
* vc4: Restructure VPM write optimization into two passes.Eric Anholt2016-11-291-18/+10
| | | | | For texturing, there won't be a fixed limit on how many writes there are, so we need to compute uses up front.
* vc4: Make qir_for_each_inst_inorder() safe against removal.Eric Anholt2016-11-291-1/+1
| | | | | The dead code elimination wants it to be safe, and I actually got segfaults due to it being unsafe with the new coalescing pass.
* vc4: Split optimizing VPM writes from VPM reads.Eric Anholt2016-11-295-51/+110
| | | | | | The VPM write logic will be basically the same as the texture coordinate write logic we need, and it's not really related to the VPM read logic other than the reuse of the use_count array.
* vc4: Restructure texture insts as ALU ops with tex_[strb] as the dst.Eric Anholt2016-11-299-89/+194
| | | | | For now we're still just generating MOVs, but this will let us fold into other ops in the future. No difference on shader-db.
* vc4: Refactor qir_get_op_nsrc(enum qop) to qir_get_nsrc(struct qinst *).Eric Anholt2016-11-2917-36/+34
| | | | | | Every caller was dereffing the qinst, and this will let us make the number of sources vary depending on the destination of the qinst so that we can have general ALU ops that store to tex_[strb] and get an implicit uniform.
* vc4: Replace the qinst src[] with a fixed-size array.Eric Anholt2016-11-293-4/+2
| | | | | | This may have made a tiny bit of sense when we had one 4-arg inst per shader, but if we only ever put 2 things in, having a pointer to 2 things almost every instruction is pointless indirection.
* vc4: Remove qir_inst4().Eric Anholt2016-11-292-25/+0
| | | | | This was used originally for unorm4x8 packs, but we now represent those as a series of packed movs.
* anv: bump the texture gather offset limitsIlia Mirkin2016-11-291-2/+2
| | | | | | | | This matches what NVIDIA and AMD hardware expose, as well as what Intel hardware supports. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/gen7: expose larger gather offsetsIlia Mirkin2016-11-291-2/+7
| | | | | | | This matches the capabilities of the hardware. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: support constant gather offsets larger than 4 bitsIlia Mirkin2016-11-294-12/+24
| | | | | | | | Offsets that don't fit into 4 bits need to force gather_po to be selected. Adjust the logic so that this happens. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Refactor handling of constant tg4 offsetsJason Ekstrand2016-11-293-34/+29
| | | | | | | | | | | | | | | | | | | | | Previously, we had an OFFSET_VALUE source for logical texture instructions that was intended to mean exactly what it says, "offset". In reality, we only fully used it for tg4 offsets. We used offset_value.file == IMM to mean, "you have a constant offset, go look in instr->offset" and didn't actually use the contents of the register at all in that case except for in nir_emit_texture where we used it as a temporary before we copy it into instr->offset. This commit renames OFFSET_VALUE to TG4_OFFSET and restricts its usage to indirect tg4 offsets only. The nir_emit_texture code is refactored so that we explicitly build a header_bits value which is placed in instr->offset and the constant offset values (both for tg4 and regular texture operations) are used to construct header_bits and don't go through the offset source at all. Finally, we stop passing offset_value in to lower_sampler_logical_send_gen5 because we can't do indirect offsets until gen7 anyway. Reviewed-by: Kenneth Graunke <[email protected]>
* radv: Use different intrinsic for ubo loads.Bas Nieuwenhuizen2016-11-291-1/+29
| | | | | | | | | | | Not sure about the deprecation path, but this intrinsic can be lowered to SMEM loads. This results in a significant Talos performance improvement. v2: Fix for LLVM attribute changes. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* mesa: fix active subroutine uniforms properlyTimothy Arceri2016-11-293-104/+10
| | | | | | | | | | | | | | | | | | | | | | | | 07fe2d565b introduced a big hack in order to return NumSubroutineUniforms when querying ACTIVE_RESOURCES for <shader>_SUBROUTINE_UNIFORM interfaces. However this is the wrong fix we are meant to be returning the number of active resources i.e. the count of subroutine uniforms in the resource list which is what the code was previously doing, anything else will cause trouble when trying to retrieve the resource properties based on the ACTIVE_RESOURCES count. The real problem is that NumSubroutineUniforms was counting array elements as separate uniforms but the innermost array is always considered a single uniform so we fix that count instead which was counted incorrectly in 7fa0250f9. Idealy we could probably completely remove NumSubroutineUniforms and just compute its value when needed from the resource list but this works for now. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Cc: 13.0 <[email protected]>
* anv/cmd_buffer: Remove the 1-D case from the HiZ QPitch calculationJason Ekstrand2016-11-281-3/+6
| | | | | | | | | | | The 1-D special case doesn't actually apply to depth or HiZ. I discovered this while converting BLORP over to genxml and ISL. The reason is that the 1-D special case only applies to the new Sky Lake 1-D layout which is only used for LINEAR 1-D images. For tiled 1-D images, such as depth buffers, the old gen4 2-D layout is used and the QPitch should be in rows. Reviewed-by: Nanley Chery <[email protected]> Cc: "13.0" <[email protected]>
* anv/cmd_buffer: Set the correct surface type for depth/stencilJason Ekstrand2016-11-281-2/+53
| | | | Reviewed-by: Nanley Chery <[email protected]>
* anv: enable drawIndirectFirstInstanceIlia Mirkin2016-11-281-1/+1
| | | | | | | | This was already piped through in the CmdDraw(Indexed)Indirect handling. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: expose depthBiasClamp, it is already setIlia Mirkin2016-11-281-1/+1
| | | | | | | | The gen7/8_cmd_buffer logic already sets the clamp, and it's piped through via the dynamic state. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv: bump maxFramebufferLayers to 2048Ilia Mirkin2016-11-281-1/+1
| | | | | | | | | This matches maxImageArrayLayers, as well as the same setting in the GL frontend. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv: enable storage image extended formatsIlia Mirkin2016-11-281-1/+1
| | | | | | | | These are all regularly available in desktop GL, so the backend fully supports them. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: expose imageCubeArray functionalityIlia Mirkin2016-11-281-1/+1
| | | | | | | | This appears to be fully supported already. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* radv: set maxFragmentDualSrcAttachments to 1Dave Airlie2016-11-291-1/+1
| | | | | | | Reported-by: Ilia Mirkin <[email protected]> Cc: "13.0" <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* anv: set maxFragmentDualSrcAttachments to 1Dave Airlie2016-11-291-1/+1
| | | | | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reported-by: Ilia Mirkin <[email protected]> Cc: "13.0" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* swr: [rasterizer memory] only clear up to the LOD sizeIlia Mirkin2016-11-281-2/+8
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer memory] hook up stencil clears for ClearTileIlia Mirkin2016-11-281-5/+8
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer memory] add support for clearing Z32F_X32 and Z16Ilia Mirkin2016-11-281-0/+2
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* intel/aubinator: Pull useful information from the AUB headerJason Ekstrand2016-11-281-2/+32
| | | | | | | | | | | This commit does two things. One is to pull useful and/or interesting information from the AUB file header and display it as a header above your decoded batches. Second, it is now capable of pulling the PCI ID from the AUB file comment left by intel_aubdump. This removes the need to use the --gen flag all the time. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* intel/aubinator: Wait to setup decoders until we parse the aub headerJason Ekstrand2016-11-281-23/+28
| | | | | | | This requires that a few more state bits become global. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* intel/aubinator: Rework handling of the --gen flagJason Ekstrand2016-11-281-20/+16
| | | | | | | This makes it just store the pci_id instead of a struct pointer Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* intel/aubinator: Trust the packet size in the header for SUBOPCODE_HEADERJason Ekstrand2016-11-281-14/+4
| | | | | | | | | | | We were reading from the "comment size" dword and incrementing by that amount. This never caused a problem because that field was always zero. However, experimenting with actual aub file comments indicates, the simulator seems to include the comment size in the packet size provided in the header. We should do the same. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jordan Justen <[email protected]>