summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: atomize the scratch buffer stateMarek Olšák2017-01-306-29/+32
| | | | | | | | | The update frequency is very low. Difference: Only account for the size when allocating a new one and when starting a new IB, and check for NULL. (v3) Reviewed-by: Nicolai Hähnle <[email protected]>
* r600: Fix stack overflowBartosz Tomczyk2017-01-301-1/+1
| | | | | | | | | Commit 7b5878ee0491e7a93914389a8369cd6752b9757d increased number of outputs to 64, but left output array intact. This caused stack overflow when number of outputs is bigger then 32. Found by ASAN. Cc: "12.0 13.0 17.0" <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add new HUD queries for monitoring the CPSamuel Pitoiset2017-01-304-3/+80
| | | | | | | | | | There are even more counters in the CP_STAT register but I think these ones are enough for now. v2: only read (and expose) CP_STAT on VI+ Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add new GPU-sdma-busy HUD querySamuel Pitoiset2017-01-304-1/+21
| | | | | | | | | For simplicity, GPU-sdma-busy will return 0 on previous gens. v2: only read SRBM_STATUS2 on Evergreen+ Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: rename grbm to mmio in the gpu load pathSamuel Pitoiset2017-01-302-32/+33
| | | | | | | | We also want to monitor other MMIO counters like SRBM_STATUS2 in order to know if SDMA is busy. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: add a fast exit path into amdgpu_cs_add_bufferMarek Olšák2017-01-302-0/+21
| | | | | | The time spent in the function dropped by 37% for torcs. Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: do not iterate twice when adding fence dependenciesSamuel Pitoiset2017-01-301-31/+32
| | | | | | | | The perf difference is very small, 3.25->2.84% in amdgpu_cs_flush() in the DXMD benchmark. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: add one likely() call in amdgpu_cs_flush()Samuel Pitoiset2017-01-301-2/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* hud: fix compilation warnings in hud_nic_graph_install()Samuel Pitoiset2017-01-301-2/+2
| | | | | | | v2: use PRId64 instead of PRIx64 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: make st_texture_get_sampler_view() staticSamuel Pitoiset2017-01-302-5/+1
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: remove r600_common_context::max_dbMarek Olšák2017-01-303-20/+17
| | | | | | this cleanup is based on the vulkan driver, which seems to do the same thing Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: fix ADDR_REGISTER_VALUE::backendDisablesMarek Olšák2017-01-301-1/+1
| | | | | | This would be a fix if the value was used anywhere. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: clean up r600_query_init_backend_maskMarek Olšák2017-01-306-22/+21
| | | | | | | This just needs to be done for r600g in the screen. We don't need an IB submission for every new context created for GCN. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: precompute IA_MULTI_VGT_PARAM values into a tableMarek Olšák2017-01-306-72/+163
| | | | | | | The perf difference is very small: 0.99% -> 0.40% for the time spent in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move VGT_VERTEX_REUSE_BLOCK_CNTL into shader states for PolarisMarek Olšák2017-01-304-21/+43
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: state atom IDs don't have to be off by oneMarek Olšák2017-01-302-4/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use a bitmask for looping over dirty PM4 statesMarek Olšák2017-01-305-18/+20
| | | | | | also move it to draw_vbo, because it should be 0 in most cases Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: atomize L2 prefetchesMarek Olšák2017-01-307-36/+50
| | | | | | to move the big conditional statement out of draw_vbo Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: unbind disabled shader stages to prevent useless L2 prefetchesMarek Olšák2017-01-301-0/+6
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: also prefetch compute shadersMarek Olšák2017-01-301-0/+12
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: update dirty_level_mask only after the first draw after FB changeMarek Olšák2017-01-303-24/+31
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: allow VRAM-only placements again on APUs & recent amdgpuMarek Olšák2017-01-301-0/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't set +fp64-denormalsMarek Olšák2017-01-301-1/+1
| | | | | | it's the default and the name will change to +fp64-fp16-denormals. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove si_shader_context::param_tess_offchipMarek Olšák2017-01-302-8/+3
| | | | | | we don't use on-chip tess. Reviewed-by: Nicolai Hähnle <[email protected]>
* etnaviv: force vertex buffers through the MMULucas Stach2017-01-301-1/+4
| | | | | | | | | This fixes a vertex data corruption issue if some of the vertex streams go through the MMU and some don't. Signed-off-by: Lucas Stach <[email protected]> Tested-by: Philipp Zabel <[email protected]> Acked-by: Christian Gmeiner <[email protected]>
* radv: Expose VK_KHR_maintenance1Andres Rodriguez2017-01-301-0/+4
| | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Fix vkCmdCopyImage for 2d slices into 3d ImagesAndres Rodriguez2017-01-301-1/+4
| | | | | | | | | | | | | | | Previously the z offset of the destination image was being ignored. It should be taken into account when copying into a 3d target. Also, img_extent_el.depth was being incorrectly clamped to 1 due to the source image being VK_IMAGE_TYPE_2D. This would result in the blit failing to iterate over all the 3d slices. Instead we clamp to the destination image type. Fixes failures in CTS tests: dEQP-VK.api.copy_and_blit.image_to_image.3d_images.* Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Expose transfer format features.Bas Nieuwenhuizen2017-01-301-0/+11
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radv: Don't allow any operations on non-supported depth/stencil formats.Bas Nieuwenhuizen2017-01-301-4/+5
| | | | | | | | We really use the depth block for the blits. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: use new error codes for AllocateDescriptorSetsAndres Rodriguez2017-01-301-1/+1
| | | | | | | | | | There is a new error code in Maintenance1 that is more specific to the situation: VK_ERROR_OUT_OF_POOL_MEMORY_KHR Fixes CTS test case: dEQP-VK.api.descriptor_pool.out_of_pool_memory Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: vkAllocateCommandBuffers should NULL all output handlesAndres Rodriguez2017-01-301-0/+3
| | | | | | | This is part of the spec and fixes CTS tests: dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_* Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add trim command pool stubAndres Rodriguez2017-01-301-0/+7
| | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* i965: Support the force_glsl_version driconf option.Kenneth Graunke2017-01-292-0/+4
| | | | | | | | | Gallium drivers have had this for a while. It makes sense to support it consistently across drivers, so expose it in i965 as well. Cc: "17.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Fix check for negative pitch in can_do_fast_copy_blit().Kenneth Graunke2017-01-291-6/+4
| | | | | | | | | | | | | At this point, the pitch is in bytes. We haven't yet divided the pitch by 4 for tiled surfaces, so abs(pitch) may be larger than 32K. This means the bit 15 trick won't work. The caller now has signed integers anyway, so just pass those through and do the obvious check. Cc: "17.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* radv: Handle command buffers that need scratch memory.Bas Nieuwenhuizen2017-01-303-6/+199
| | | | | | v2: Create the descriptor BO with CPU access. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: Track scratch usage across pipelines & command buffers.Bas Nieuwenhuizen2017-01-304-8/+119
| | | | | | | Based on code written by Dave Airlie. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/ac: Add compiler support for spilling.Bas Nieuwenhuizen2017-01-307-23/+42
| | | | | | | Based on code written by Dave Airlie. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/amdgpu: Support a preamble CS.Bas Nieuwenhuizen2017-01-304-15/+56
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* i965: add assert to while_jumps_before_offset()Timothy Arceri2017-01-301-0/+1
| | | | | | | jip should always be negative here as its the result of do instruction - while instruction. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: fix up asserts in brw_inst_set_jip()Timothy Arceri2017-01-301-2/+2
| | | | | | | We are casting from a signed 32bit int to an unsigned 16bit int so shift 15 bits rather than 16. Reviewed-by: Kenneth Graunke <[email protected]>
* llvmpipe: Use LLVMDumpModule, not DumpModule.Bas Nieuwenhuizen2017-01-291-1/+1
| | | | | | | Forgot the prefix ... Fixes: 0fca80b3db64dc1d004f78e22b9de86a07e9de96 Signed-off-by: Bas Nieuwenhuizen <[email protected]>
* various: Fix missing DumpModule with recent LLVM.Bas Nieuwenhuizen2017-01-295-5/+22
| | | | | | | | | | | | | | Since LLVM revision 293359 DumpModule gets only implemented when either a debug build or LLVM_ENABLE_DUMP is set. This patch adds a direct replacement for the function for radv and radeonsi, However, as I don't know a good place to put common LLVM code for all three I inlined the implementation for LLVMPipe. v2: Use the new code for LLVM 3.4+ instead of LLVM 5+ & fixed indentation Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* r600g: use ieee variants of multiplication instructionsIlia Mirkin2017-01-292-18/+19
| | | | | | | | This matches the behavior of most other drivers, including nouveau, radeonsi, and i965. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* r600g: add support for optionally using non-IEEE mul opsIlia Mirkin2017-01-282-4/+18
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* vc4: Coalesce into TLB writes as well as VPM/tex.Eric Anholt2017-01-281-1/+5
| | | | | | | | This generally cuts an instruction when blending is enabled and we thus have a single instruction generating the color value. total instructions in shared programs: 91759 -> 91634 (-0.14%) instructions in affected programs: 5338 -> 5213 (-2.34%)
* vc4: Avoid an extra temporary and mov in ffloor/ffract/fceil.Eric Anholt2017-01-281-13/+18
| | | | | | | | | | shader-db results: total instructions in shared programs: 92611 -> 91764 (-0.91%) instructions in affected programs: 27417 -> 26570 (-3.09%) The star is one shader in glmark2's terrain (drops 16% of its instructions), but there are also wins in mupen64plus and glb2.7.
* vc4: Flip the switch to run the GLSL compiler optimization loop once.Eric Anholt2017-01-281-1/+1
| | | | | | | | | | | | | This has almost no effect on shader-db: total instructions in shared programs: 92572 -> 92611 (0.04%) instructions in affected programs: 4486 -> 4525 (0.87%) Looking at 2 of the 7 different shaders that were hurt (all of which were in mupen64), they all appear to be just differences in order of instructions at the NIR level. The advantage is that this should significantly reduce time in the compiler.
* i965: Unbind deleted shaders from brw_context, fixing malloc heisenbug.Kenneth Graunke2017-01-271-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Applications may delete a shader program, create a new one, and bind it before the next draw. With terrible luck, malloc may randomly return a chunk of memory for the new gl_program that happened to be the exact same pointer as our previously bound gl_program. In this case, our logic to detect new programs in brw_upload_pipeline_state() would break: if (brw->vertex_program != ctx->VertexProgram._Current) { brw->vertex_program = ctx->VertexProgram._Current; brw->ctx.NewDriverState |= BRW_NEW_VERTEX_PROGRAM; } Because the pointer is the same, we'd think it was the same program. But it could be wildly different - a different stage altogether, different sets of resources, and so on. This causes utter chaos. As unlikely as this seems, I believe I hit this when running a subset of the CTS in a loop, in a group of tests that churns through simple programs, deleting and rebuilding them. Presumably malloc uses a bucketing cache of sorts, and so freeing up a gl_program and allocating a new one fairly quickly causes it to reuse that memory. The result was that brw->vertex_program->info.num_ssbos claimed the program had SSBOs, while brw->vs.base.prog_data.binding_table claimed that there were none. This was crazy, because the binding table is calculated from info.num_ssbos - the shader info appeared to change between shader compile time and draw time. Careful use of watchpoints revealed that it was being clobbered by rzalloc's memset when building an entirely different program... Fortunately, our 0xd0d0d0d0 canary for unused binding table entries caused us to crash out of bounds when trying to upload SSBOs, or we may have never discovered this heisenbug. Fixes crashes in GL45-CTS.compute_shader.sso-case2 when using a hacked cts-runner that only runs GL45-CTS.compute_shader.s* in EGL config ID 5 at 64x64 in a loop with 100 iterations. Cc: "17.0 13.0 12.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* radv/ac: Use base in push constant loads.Bas Nieuwenhuizen2017-01-281-2/+5
| | | | | | | | | Apparently the source is not an address but an offset, so we actually need to use the base. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]> CC: <[email protected]>
* radv: drop support for VK_AMD_NEGATIVE_VIEWPORT_HEIGHTAndres Rodriguez2017-01-281-4/+0
| | | | | | | | This extension was not correctly supported, and it conflicts with the VK_KHR_MAINTENANCE1 spec. Reviewed-by: Fredrik Höglund <[email protected]> Signed-off-by: Dave Airlie <[email protected]>