summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* swr: Removed stalling SwrWaitForIdle from queries.Bruce Cherniak2016-10-034-119/+87
| | | | | | | | Previous fundamental change in stats gathering added a temporary SwrWaitForIdle to begin_query and end_query. Code has been reworked to remove stall. Reviewed-by: George Kyriazis <[email protected]>
* swr: [rasterizer core] refactor thread creationTim Rowley2016-10-033-9/+29
| | | | | | | | | | Create worker pool now computes number of worker threads based on things like topologies, etc. and creates the pool but doesn't actually launch the threads. Instead there is a separate start thread pool function. This allows thread resources to be constructed first before threads start. Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] canonicalize blend compile stateTim Rowley2016-10-032-0/+39
| | | | | | Canonicalize to prevent unnecessary JIT compiles. Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] archrast fixesTim Rowley2016-10-034-7/+14
| | | | | | | - Immediately sleep threads until thread data is initialized - Fix some compile bugs with AR enabled Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] fixes for icc in vs2015 compat modeTim Rowley2016-10-0312-1404/+1439
| | | | | | | - Move most jitter functionality into SwrJit namespace - Avoid global "using namespace llvm" in headers Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] generalize compute dispatch mechanismTim Rowley2016-10-033-4/+15
| | | | | | Generalize compute dispatch mechanism to support other types of dispatches. Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer common] os.h portability header changesTim Rowley2016-10-031-0/+6
| | | | | | | - Fix conflict between windows MemoryFence and llvm::sys::MemoryFence - Declare gettid() Signed-off-by: Tim Rowley <[email protected]>
* gallium/radeon: fix crash/regression in performance countersNicolai Hähnle2016-09-301-0/+9
| | | | | | | Regression introduced by "gallium/radeon: zero all query buffers". Cc: Michel Dänzer <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: update documentation of buffer_get_virtual_addressNicolai Hähnle2016-09-301-0/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: emit relocations for query fencesNicolai Hähnle2016-09-304-9/+15
| | | | | | | | | This is only needed for r600 which doesn't have ARB_query_buffer_object and therefore wouldn't really need the fences, but let's be optimistic about filling in this feature gap eventually. Cc: Dieter Nützel <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeon/uvd: adjust the buffer offset when relocation is usedNicolai Hähnle2016-09-301-0/+1
| | | | | | | We don't plan to use sub-allocated buffers with UVD, but just in case one slips through, this increases the chances of things working out anyway. Reviewed-by: Christian König <[email protected]>
* radeon/vce: adjust the buffer offset when relocation is usedNicolai Hähnle2016-09-301-0/+1
| | | | | | | We don't plan to use sub-allocated buffers with VCE, but just in case one slips through, this increases the chances of things working out anyway. Reviewed-by: Christian König <[email protected]>
* radeon/video: don't use sub-allocated buffersNicolai Hähnle2016-09-302-1/+10
| | | | | | | Cc: Christian König <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97976 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97969 Reviewed-by: Christian König <[email protected]>
* nv50/ir: teach insnCanLoad() about SHLADDSamuel Pitoiset2016-09-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | Commutativity is not allowed with SHLADD, but src2 can accept loads. To allow the load propagation pass to do its job, add a special case like for SUCLAMP because src1 is always an immediate. This IMAD to SHLADD optimization helps a bunch of shaders from Tomb Raider, Victor Vran, UE4 demos (+15% perf with Elemental) and Shadow Warrior. GF100/GK104: total instructions in shared programs :2838045 -> 2834712 (-0.12%) total gprs used in shared programs :396684 -> 396386 (-0.08%) total local used in shared programs :34416 -> 34416 (0.00%) local gpr inst bytes helped 0 326 1105 1105 hurt 0 55 3 3 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: optimize SHLADD(a, b, c) to MOV((a << b) + c)Samuel Pitoiset2016-09-291-0/+3
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: optimize SHLADD(a, b, 0x0) to SHL(a, b)Samuel Pitoiset2016-09-291-0/+8
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: optimize IMAD to SHLADD in presence of power of 2Samuel Pitoiset2016-09-291-0/+7
| | | | | | | Only and only if src1 is a power of 2 we can replace IMAD by SHLADD. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: add emission for SHLADDSamuel Pitoiset2016-09-293-0/+127
| | | | | | | | Unfortunately, we can't use the emit helpers for GF100/GK110 because src1 and src2 are swapped. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: add preliminary support for SHLADDSamuel Pitoiset2016-09-295-7/+17
| | | | | | | | | | This instruction is available since SM20 (Fermi) and allow to do (a << b) + c in one shot. In some situations, IMAD should be replaced by SHLADD when b is a power of 2, and ADD+SHL should be replaced by SHLADD as well. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: update GM107 sched control codes formatSamuel Pitoiset2016-09-292-23/+23
| | | | | | | | | envyas now uses a much better representation for those control codes and it displays the different flags instead of an unreadable hex number. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gallium/radeon: use smaller buffers for query resultsNicolai Hähnle2016-09-291-1/+1
| | | | | | | Most of the time, even the 512 bytes that we now get is more than sufficient (pipeline stats queries are the largest at 184 bytes per shot). Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon/winsyses: add radeon_winsys::min_alloc_sizeNicolai Hähnle2016-09-291-0/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: enable ARB_query_buffer_object (v2)Nicolai Hähnle2016-09-291-7/+14
| | | | | | | v2: enable only when compute is available Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: implement get_query_result_resource (v2)Nicolai Hähnle2016-09-294-1/+402
| | | | | | | v2: fix a comment (Gustaw Smolarczyk) Acked-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: zero all query buffersNicolai Hähnle2016-09-292-17/+11
| | | | | | | To ensure that fences are properly initialized. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: cleanup getting PIPE_QUERY_TIMESTAMP resultNicolai Hähnle2016-09-291-5/+1
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add query fences and r600_get_hw_query_paramsNicolai Hähnle2016-09-291-16/+91
| | | | | | | | | | | | | | | | | We will support the waiting option in ARB_query_buffer_object using WAIT_REG_MEM on an appropriate fence-like dword. Some queries conveniently write their results with the highest bit set, and we can just use that; for others, we have to write a fence explicitly. ZPASS_DONE for occlusion queries writes its results with the high bit set, but it writes up to 8 pairs of results (one for each DB). We have to wait for all of these results, so let's just add an explicit fence. The new function provides summary information to be used by subsequent patches. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add save_qbo_stateNicolai Hähnle2016-09-293-0/+22
| | | | | | | | Save compute shader state that will be used for the ARB_query_buffer_object implementation. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add si_get_shader_buffers/get_pipe_constant_buffers (v2)Nicolai Hähnle2016-09-292-0/+51
| | | | | | | | | | These functions extract the pipe state structure from the current descriptors, for state saving. v2: correctly dereference *buf (Bas) Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add r600_gfx_{write,wait}_fenceNicolai Hähnle2016-09-293-38/+60
| | | | | | | For bottom-of-pipe fences inside the gfx command stream. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add barrier_flags to r600_common_screenNicolai Hähnle2016-09-293-0/+23
| | | | | | | | | | | There are driver-specific context flags for barriers that are not covered by the Gallium barrier interfaces. The R600 settings of these flags may not be optimal, but we're not going to use them yet anyway. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* vc4: Emit perf debug when we fall back to quad clears.Eric Anholt2016-09-281-0/+2
|
* gallium/radeon: Initialize pipe_resource::next to NULLMichel Dänzer2016-09-282-0/+2
| | | | | | | Fixes lots of piglit tests crashing due to using uninitialized memory. Fixes: ecd6fce2611e ("mesa/st: support lowering multi-planar YUV") Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* swr: replace gallium->swr format enum conversionTim Rowley2016-09-271-51/+293
| | | | | | Replace old string comparison with a mapping table. Reviewed-by: Bruce Cherniak <[email protected]>
* winsys/amdgpu: add fence and buffer list logic for slab allocated buffersNicolai Hähnle2016-09-271-0/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add RADEON_FLAG_HANDLENicolai Hähnle2016-09-273-1/+4
| | | | | | | | | | | | | | | | When passed to winsys->buffer_create, this flag will indicate that we require a buffer that maps 1:1 with a kernel buffer handle. This is currently set for all textures, since textures can potentially be exported to other processes. This is not a huge loss, since the main purpose of this patch series is to deal with applications that allocate many small buffers. A hypothetical application with tons of tiny textures might still benefit from not setting this flag, but that's not a use case I'm worried about just now. Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add RADEON_USAGE_SYNCHRONIZEDNicolai Hähnle2016-09-275-13/+25
| | | | | | | | | | This is really the behavior we want most of the time, but having a SYNCHRONIZED flag instead of an UNSYNCHRONIZED one has the advantage that OR'ing different flags together always results in stronger guarantees. The parent BOs of sub-allocated buffers will be added unsynchronized. Reviewed-by: Marek Olšák <[email protected]>
* nv50/ir: fix comments about instructions infoSamuel Pitoiset2016-09-261-2/+3
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* nvc0: allow to force compiling programs in debug buildSamuel Pitoiset2016-09-261-9/+10
| | | | | | | | | This adds a new envvar called NV50_PROG_CHIPSET which allows to compile shaders with a different target, especially useful for shader-db. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: drop unused NVISA_XXX_CHIPSET constantsSamuel Pitoiset2016-09-261-2/+0
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* r600g: Add support for PK2H/UP2HGlenn Kennard2016-09-262-5/+104
| | | | | | | | Based off of Ilia's original patch, but with output values replicated so that it matches the TGSI semantics. Signed-off-by: Glenn Kennard <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* svga: set PIPE_BIND_DEPTH_STENCIL flag for new resources when possibleBrian Paul2016-09-231-1/+11
| | | | | | | | When we create a depth/stencil texture, also check if we can render to it and set the PIPE_BIND_DEPTH_STENCIL flag. We were previously doing this for color textures (PIPE_BIND_RENDER_TARGET). Reviewed-by: Charmaine Lee <[email protected]>
* svga: don't special case caps for SVGA3D_R32_FLOATBrian Paul2016-09-231-6/+2
| | | | | | | This may have been needed years ago during development, but not now. Prevents some regressions after introducing the next patch. Reviewed-by: Charmaine Lee <[email protected]>
* svga: use new adjust_z_layer() helper in svga_pipe_blit.cBrian Paul2016-09-231-44/+28
| | | | | | | | | To handle z/layer fix-ups for blitting and copying. Note that we weren't doing this properly in svga_blit() before. Also, remove redundant stex, dtex assignments. Reviewed-by: Charmaine Lee <[email protected]>
* svga: simplify/improve the format compatibility check for region copiesBrian Paul2016-09-231-5/+25
| | | | | | | | The util_is_format_compatible() function didn't quite do what we wanted for vgpu10. This check is more flexible and allows copies between formats such as R32G32B32A32_FLOAT and R32G32B32A32_INT. Reviewed-by: Charmaine Lee <[email protected]>
* svga: add const qualifier on svga_translate_format()Brian Paul2016-09-232-2/+2
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* svga: eliminate unneeded gotos in svga_validate_surface_view()Brian Paul2016-09-231-7/+4
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* svga: disable srgb format related code from svga_blit()Neha Bhende2016-09-231-12/+0
| | | | | | | | | With latest mesa and latest piglit tests srgb<->linear conversion is not required as per GL4.4 rules See commit b662c70aeab6a92751514f30719c13a6de253b40. Reviewed-by: Charmaine Lee <[email protected]>
* nir: Allow opt_peephole_sel to be more aggressive in flattening IFs.Eric Anholt2016-09-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VC4 was running into a major performance regression from enabling control flow in the glmark2 conditionals test, because of short if statements containing an ffract. This pass seems like it was was trying to ensure that we only flattened IFs that should be entirely a win by guaranteeing that there would be fewer bcsels than there were MOVs otherwise. However, if the number of ALU ops is small, we can avoid the overhead of branching (which itself costs cycles) and still get a win, even if it means moving real instructions out of the THEN/ELSE blocks. For now, just turn on aggressive flattening on vc4. i965 will need some tuning to avoid regressions. It does looks like this may be useful to replace freedreno code. Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47 fps to 95 fps on vc4. vc4 shader-db: total instructions in shared programs: 101282 -> 99543 (-1.72%) instructions in affected programs: 17365 -> 15626 (-10.01%) total uniforms in shared programs: 31295 -> 31172 (-0.39%) uniforms in affected programs: 3580 -> 3457 (-3.44%) total estimated cycles in shared programs: 225182 -> 223746 (-0.64%) estimated cycles in affected programs: 26085 -> 24649 (-5.51%) v2: Update shader-db output. Reviewed-by: Ian Romanick <[email protected]> (v1)
* svga: minor simplification in svga_validate_surface_view()Brian Paul2016-09-211-3/+2
| | | | | | Get rid of unneeded local var. Reviewed-by: Charmaine Lee <[email protected]>