summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* radv: Initialize the shader_stats_dump flag.Bas Nieuwenhuizen2016-11-291-0/+1
| | | | | | | | Meta was using it before it was set. I suspect we typically don't want to dump meta shaders, so just set it to false in the beginning. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* vc4: Add a note for the future about texture latency calculation.Eric Anholt2016-11-291-0/+20
| | | | | | | Debugging a shader-db reported cycle count regression from the tex coalescing, I eventually figured out that the texture latencies were totally bogus. Really fixing it will probably involve mirroring vc4_qir_schedule.c's texture fifo management here.
* vc4: Add support for coalescing ALU ops into tex_[srtb] MOVs.Eric Anholt2016-11-294-29/+37
| | | | | | | | | | | This isn't as complete as I would like (can't merge interpolation because of the implicit r5 dependency, doesn't work with control flow), but this was cheap and easy. Improves 3DMMES Taiji performance by 1.15353% +/- 0.299896% (n=29, 16) total instructions in shared programs: 99810 -> 99059 (-0.75%) instructions in affected programs: 10705 -> 9954 (-7.02%)
* vc4: Restructure VPM write optimization into two passes.Eric Anholt2016-11-291-18/+10
| | | | | For texturing, there won't be a fixed limit on how many writes there are, so we need to compute uses up front.
* vc4: Make qir_for_each_inst_inorder() safe against removal.Eric Anholt2016-11-291-1/+1
| | | | | The dead code elimination wants it to be safe, and I actually got segfaults due to it being unsafe with the new coalescing pass.
* vc4: Split optimizing VPM writes from VPM reads.Eric Anholt2016-11-295-51/+110
| | | | | | The VPM write logic will be basically the same as the texture coordinate write logic we need, and it's not really related to the VPM read logic other than the reuse of the use_count array.
* vc4: Restructure texture insts as ALU ops with tex_[strb] as the dst.Eric Anholt2016-11-299-89/+194
| | | | | For now we're still just generating MOVs, but this will let us fold into other ops in the future. No difference on shader-db.
* vc4: Refactor qir_get_op_nsrc(enum qop) to qir_get_nsrc(struct qinst *).Eric Anholt2016-11-2917-36/+34
| | | | | | Every caller was dereffing the qinst, and this will let us make the number of sources vary depending on the destination of the qinst so that we can have general ALU ops that store to tex_[strb] and get an implicit uniform.
* vc4: Replace the qinst src[] with a fixed-size array.Eric Anholt2016-11-293-4/+2
| | | | | | This may have made a tiny bit of sense when we had one 4-arg inst per shader, but if we only ever put 2 things in, having a pointer to 2 things almost every instruction is pointless indirection.
* vc4: Remove qir_inst4().Eric Anholt2016-11-292-25/+0
| | | | | This was used originally for unorm4x8 packs, but we now represent those as a series of packed movs.
* anv: bump the texture gather offset limitsIlia Mirkin2016-11-291-2/+2
| | | | | | | | This matches what NVIDIA and AMD hardware expose, as well as what Intel hardware supports. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/gen7: expose larger gather offsetsIlia Mirkin2016-11-291-2/+7
| | | | | | | This matches the capabilities of the hardware. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: support constant gather offsets larger than 4 bitsIlia Mirkin2016-11-294-12/+24
| | | | | | | | Offsets that don't fit into 4 bits need to force gather_po to be selected. Adjust the logic so that this happens. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Refactor handling of constant tg4 offsetsJason Ekstrand2016-11-293-34/+29
| | | | | | | | | | | | | | | | | | | | | Previously, we had an OFFSET_VALUE source for logical texture instructions that was intended to mean exactly what it says, "offset". In reality, we only fully used it for tg4 offsets. We used offset_value.file == IMM to mean, "you have a constant offset, go look in instr->offset" and didn't actually use the contents of the register at all in that case except for in nir_emit_texture where we used it as a temporary before we copy it into instr->offset. This commit renames OFFSET_VALUE to TG4_OFFSET and restricts its usage to indirect tg4 offsets only. The nir_emit_texture code is refactored so that we explicitly build a header_bits value which is placed in instr->offset and the constant offset values (both for tg4 and regular texture operations) are used to construct header_bits and don't go through the offset source at all. Finally, we stop passing offset_value in to lower_sampler_logical_send_gen5 because we can't do indirect offsets until gen7 anyway. Reviewed-by: Kenneth Graunke <[email protected]>
* radv: Use different intrinsic for ubo loads.Bas Nieuwenhuizen2016-11-291-1/+29
| | | | | | | | | | | Not sure about the deprecation path, but this intrinsic can be lowered to SMEM loads. This results in a significant Talos performance improvement. v2: Fix for LLVM attribute changes. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* mesa: fix active subroutine uniforms properlyTimothy Arceri2016-11-293-104/+10
| | | | | | | | | | | | | | | | | | | | | | | | 07fe2d565b introduced a big hack in order to return NumSubroutineUniforms when querying ACTIVE_RESOURCES for <shader>_SUBROUTINE_UNIFORM interfaces. However this is the wrong fix we are meant to be returning the number of active resources i.e. the count of subroutine uniforms in the resource list which is what the code was previously doing, anything else will cause trouble when trying to retrieve the resource properties based on the ACTIVE_RESOURCES count. The real problem is that NumSubroutineUniforms was counting array elements as separate uniforms but the innermost array is always considered a single uniform so we fix that count instead which was counted incorrectly in 7fa0250f9. Idealy we could probably completely remove NumSubroutineUniforms and just compute its value when needed from the resource list but this works for now. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Cc: 13.0 <[email protected]>
* anv/cmd_buffer: Remove the 1-D case from the HiZ QPitch calculationJason Ekstrand2016-11-281-3/+6
| | | | | | | | | | | The 1-D special case doesn't actually apply to depth or HiZ. I discovered this while converting BLORP over to genxml and ISL. The reason is that the 1-D special case only applies to the new Sky Lake 1-D layout which is only used for LINEAR 1-D images. For tiled 1-D images, such as depth buffers, the old gen4 2-D layout is used and the QPitch should be in rows. Reviewed-by: Nanley Chery <[email protected]> Cc: "13.0" <[email protected]>
* anv/cmd_buffer: Set the correct surface type for depth/stencilJason Ekstrand2016-11-281-2/+53
| | | | Reviewed-by: Nanley Chery <[email protected]>
* anv: enable drawIndirectFirstInstanceIlia Mirkin2016-11-281-1/+1
| | | | | | | | This was already piped through in the CmdDraw(Indexed)Indirect handling. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: expose depthBiasClamp, it is already setIlia Mirkin2016-11-281-1/+1
| | | | | | | | The gen7/8_cmd_buffer logic already sets the clamp, and it's piped through via the dynamic state. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv: bump maxFramebufferLayers to 2048Ilia Mirkin2016-11-281-1/+1
| | | | | | | | | This matches maxImageArrayLayers, as well as the same setting in the GL frontend. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv: enable storage image extended formatsIlia Mirkin2016-11-281-1/+1
| | | | | | | | These are all regularly available in desktop GL, so the backend fully supports them. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: expose imageCubeArray functionalityIlia Mirkin2016-11-281-1/+1
| | | | | | | | This appears to be fully supported already. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* radv: set maxFragmentDualSrcAttachments to 1Dave Airlie2016-11-291-1/+1
| | | | | | | Reported-by: Ilia Mirkin <[email protected]> Cc: "13.0" <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* anv: set maxFragmentDualSrcAttachments to 1Dave Airlie2016-11-291-1/+1
| | | | | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reported-by: Ilia Mirkin <[email protected]> Cc: "13.0" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* swr: [rasterizer memory] only clear up to the LOD sizeIlia Mirkin2016-11-281-2/+8
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer memory] hook up stencil clears for ClearTileIlia Mirkin2016-11-281-5/+8
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer memory] add support for clearing Z32F_X32 and Z16Ilia Mirkin2016-11-281-0/+2
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* intel/aubinator: Pull useful information from the AUB headerJason Ekstrand2016-11-281-2/+32
| | | | | | | | | | | This commit does two things. One is to pull useful and/or interesting information from the AUB file header and display it as a header above your decoded batches. Second, it is now capable of pulling the PCI ID from the AUB file comment left by intel_aubdump. This removes the need to use the --gen flag all the time. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* intel/aubinator: Wait to setup decoders until we parse the aub headerJason Ekstrand2016-11-281-23/+28
| | | | | | | This requires that a few more state bits become global. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* intel/aubinator: Rework handling of the --gen flagJason Ekstrand2016-11-281-20/+16
| | | | | | | This makes it just store the pci_id instead of a struct pointer Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* intel/aubinator: Trust the packet size in the header for SUBOPCODE_HEADERJason Ekstrand2016-11-281-14/+4
| | | | | | | | | | | We were reading from the "comment size" dword and incrementing by that amount. This never caused a problem because that field was always zero. However, experimenting with actual aub file comments indicates, the simulator seems to include the comment size in the packet size provided in the header. We should do the same. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* intel/aubinator: Add a get_offset helperJason Ekstrand2016-11-281-10/+19
| | | | | | | The helper automatically handles masking for us so we don't have to worry about whether or not something is in the bottom bits. Reviewed-by: Kristian H. Kristensen <[email protected]>
* intel/aubinator: Fix the kernel start pointer for 3DSTATE_HSJason Ekstrand2016-11-281-2/+2
| | | | Reviewed-by: Kristian H. Kristensen <[email protected]>
* intel/aubinator: Add a get_address helperJason Ekstrand2016-11-281-16/+31
| | | | | | | This new helper is automatically handles 32 vs. 48-bit GTT issues. It also handles 48-bit canonical addresses on Broadwell and above. Reviewed-by: Kristian H. Kristensen <[email protected]>
* intel/aubinator: Properly handle batch buffer chainingJason Ekstrand2016-11-281-1/+19
| | | | | | | | | | | | | The original aubinator that Kristian wrote had a bug in the handling of MI_BATCH_BUFFER_START that propagated into the version in upstream mesa. In particular, it ignored the "2nd level" bit which tells you whether this MI_BATCH_BUFFER_START is a subroutine call (2nd level) or a goto. Since the Vulkan driver uses batch chaining, this can lead to a very confusing interpretation of the batches. In some cases, depending on how things are laid out in the virtual GTT, you can even end up with infinite loops in batch processing. Reviewed-by: Kristian H. Kristensen <[email protected]>
* swr: don't clear all dirty bits when changing so targetsIlia Mirkin2016-11-281-1/+1
| | | | | | | | | | Among other things, blits would clear existing SO targets which would cause a bunch of updates from u_blitter to be missed. Fixes fbo-scissor-blit fbo, probably among many others. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] fix typo in scissor tile-alignment logicIlia Mirkin2016-11-281-1/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* anv: Fix cache UUID generation.Kenneth Graunke2016-11-281-2/+2
| | | | | | | | | | I asked Emil to switch from 0 (success) vs. -1 (fail) to use a boolean in my review comments. The "not" went missing. Easy mistake, but the result is that nothing runs at all :) Fix whitespace while we're here too. Signed-off-by: Kenneth Graunke <[email protected]>
* vulkan/wsi: Fix resource leak in success path of wsi_queue_init()Gwan-gyeong Mun2016-11-281-0/+1
| | | | | | | | | It fixes leakage of pthread_condattr resource on wsi_queue_init() Cc: "13.0" <[email protected]> Signed-off-by: Mun Gwan-gyeong <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* anv: Update the teardown in reverse order of the anv_CreateDeviceGwan-gyeong Mun2016-11-281-9/+14
| | | | | | | | | | | This updates releasing of resource in reverse order of the anv_CreateDevice to anv_DestroyDevice. And it fixes resource leak in pthread_mutex, pthread_cond, anv_gem_context. Cc: "13.0" <[email protected]> Signed-off-by: Mun Gwan-gyeong <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: drop the return type for anv_queue_init()Gwan-gyeong Mun2016-11-281-3/+1
| | | | | | | | | anv_queue_init() always returns VK_SUCCESS, so caller does not need to check return value of anv_queue_init(). Signed-off-by: Mun Gwan-gyeong <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Add missing error-checking to anv_block_pool_init (v2)Gwan-gyeong Mun2016-11-282-8/+23
| | | | | | | | | | | | | | | | | | | When the memfd_create() and u_vector_init() fail on anv_block_pool_init(), this patch makes to return VK_ERROR_INITIALIZATION_FAILED. All of initialization success on anv_block_pool_init(), it makes to return VK_SUCCESS. CID 1394319 v2: Fixes from Emil's review: a) Add the return type for propagating the return value to caller. b) Changed anv_block_pool_init() to return VK_ERROR_INITIALIZATION_FAILED on failure of initialization. Cc: "13.0" <[email protected]> Signed-off-by: Mun Gwan-gyeong <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* st/omx/dec/h264: consider POC as signed instead of unsignedChandu Babu Namburu2016-11-281-3/+3
| | | | | | picture order count can be a negative value Reviewed-by: Christian König <[email protected]>
* radv: don't return VK_SUCCESS if radv_device_get_cache_uuid() failsEmil Velikov2016-11-281-0/+2
| | | | | | | | | If radv_device_get_cache_uuid() fails result will be VK_SUCCESS as set by the radv_init_wsi() call above. Fixes: d943839 (radv: Use library mtime for cache UUID.) Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: don't leak the fd if radv_physical_device_init() succeedsEmil Velikov2016-11-281-0/+1
| | | | | | | | | | radv_amdgpu_winsys_create() does not take ownership of the fd, thus we end up leaking it as we return with VK_SUCCESS. Cc: Dave Airlie <[email protected]> Cc: "13.0" <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* anv: don't leak memory if anv_init_wsi() failsEmil Velikov2016-11-281-2/+4
| | | | | | | | brw_compiler_create() rzalloc-ates memory which we forgot to free. Cc: "13.0" <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: don't double-close the same fdEmil Velikov2016-11-281-2/+1
| | | | | Cc: "13.0" <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* configure.ac: remove no longer used TIMESTAMP_CMDEmil Velikov2016-11-281-2/+0
| | | | | | Good bye, you shall not be missed. Signed-off-by: Emil Velikov <[email protected]>
* anv: automake: don't generate anv_timestamp.hEmil Velikov2016-11-282-8/+1
| | | | | | No longer used as of last commit. Signed-off-by: Emil Velikov <[email protected]>