summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* nv50/ir: teach insnCanLoad() about SHLADDSamuel Pitoiset2016-09-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | Commutativity is not allowed with SHLADD, but src2 can accept loads. To allow the load propagation pass to do its job, add a special case like for SUCLAMP because src1 is always an immediate. This IMAD to SHLADD optimization helps a bunch of shaders from Tomb Raider, Victor Vran, UE4 demos (+15% perf with Elemental) and Shadow Warrior. GF100/GK104: total instructions in shared programs :2838045 -> 2834712 (-0.12%) total gprs used in shared programs :396684 -> 396386 (-0.08%) total local used in shared programs :34416 -> 34416 (0.00%) local gpr inst bytes helped 0 326 1105 1105 hurt 0 55 3 3 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: optimize SHLADD(a, b, c) to MOV((a << b) + c)Samuel Pitoiset2016-09-291-0/+3
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: optimize SHLADD(a, b, 0x0) to SHL(a, b)Samuel Pitoiset2016-09-291-0/+8
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: optimize IMAD to SHLADD in presence of power of 2Samuel Pitoiset2016-09-291-0/+7
| | | | | | | Only and only if src1 is a power of 2 we can replace IMAD by SHLADD. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: add emission for SHLADDSamuel Pitoiset2016-09-293-0/+127
| | | | | | | | Unfortunately, we can't use the emit helpers for GF100/GK110 because src1 and src2 are swapped. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: add preliminary support for SHLADDSamuel Pitoiset2016-09-295-7/+17
| | | | | | | | | | This instruction is available since SM20 (Fermi) and allow to do (a << b) + c in one shot. In some situations, IMAD should be replaced by SHLADD when b is a power of 2, and ADD+SHL should be replaced by SHLADD as well. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: update GM107 sched control codes formatSamuel Pitoiset2016-09-292-23/+23
| | | | | | | | | envyas now uses a much better representation for those control codes and it displays the different flags instead of an unreadable hex number. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gallium/radeon: use smaller buffers for query resultsNicolai Hähnle2016-09-291-1/+1
| | | | | | | Most of the time, even the 512 bytes that we now get is more than sufficient (pipeline stats queries are the largest at 184 bytes per shot). Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon/winsyses: add radeon_winsys::min_alloc_sizeNicolai Hähnle2016-09-291-0/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: enable ARB_query_buffer_object (v2)Nicolai Hähnle2016-09-291-7/+14
| | | | | | | v2: enable only when compute is available Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: implement get_query_result_resource (v2)Nicolai Hähnle2016-09-294-1/+402
| | | | | | | v2: fix a comment (Gustaw Smolarczyk) Acked-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: zero all query buffersNicolai Hähnle2016-09-292-17/+11
| | | | | | | To ensure that fences are properly initialized. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: cleanup getting PIPE_QUERY_TIMESTAMP resultNicolai Hähnle2016-09-291-5/+1
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add query fences and r600_get_hw_query_paramsNicolai Hähnle2016-09-291-16/+91
| | | | | | | | | | | | | | | | | We will support the waiting option in ARB_query_buffer_object using WAIT_REG_MEM on an appropriate fence-like dword. Some queries conveniently write their results with the highest bit set, and we can just use that; for others, we have to write a fence explicitly. ZPASS_DONE for occlusion queries writes its results with the high bit set, but it writes up to 8 pairs of results (one for each DB). We have to wait for all of these results, so let's just add an explicit fence. The new function provides summary information to be used by subsequent patches. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add save_qbo_stateNicolai Hähnle2016-09-293-0/+22
| | | | | | | | Save compute shader state that will be used for the ARB_query_buffer_object implementation. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add si_get_shader_buffers/get_pipe_constant_buffers (v2)Nicolai Hähnle2016-09-292-0/+51
| | | | | | | | | | These functions extract the pipe state structure from the current descriptors, for state saving. v2: correctly dereference *buf (Bas) Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add r600_gfx_{write,wait}_fenceNicolai Hähnle2016-09-293-38/+60
| | | | | | | For bottom-of-pipe fences inside the gfx command stream. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add barrier_flags to r600_common_screenNicolai Hähnle2016-09-293-0/+23
| | | | | | | | | | | There are driver-specific context flags for barriers that are not covered by the Gallium barrier interfaces. The R600 settings of these flags may not be optimal, but we're not going to use them yet anyway. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* vc4: Emit perf debug when we fall back to quad clears.Eric Anholt2016-09-281-0/+2
|
* gallium/radeon: Initialize pipe_resource::next to NULLMichel Dänzer2016-09-282-0/+2
| | | | | | | Fixes lots of piglit tests crashing due to using uninitialized memory. Fixes: ecd6fce2611e ("mesa/st: support lowering multi-planar YUV") Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* swr: replace gallium->swr format enum conversionTim Rowley2016-09-271-51/+293
| | | | | | Replace old string comparison with a mapping table. Reviewed-by: Bruce Cherniak <[email protected]>
* winsys/amdgpu: add fence and buffer list logic for slab allocated buffersNicolai Hähnle2016-09-271-0/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add RADEON_FLAG_HANDLENicolai Hähnle2016-09-273-1/+4
| | | | | | | | | | | | | | | | When passed to winsys->buffer_create, this flag will indicate that we require a buffer that maps 1:1 with a kernel buffer handle. This is currently set for all textures, since textures can potentially be exported to other processes. This is not a huge loss, since the main purpose of this patch series is to deal with applications that allocate many small buffers. A hypothetical application with tons of tiny textures might still benefit from not setting this flag, but that's not a use case I'm worried about just now. Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add RADEON_USAGE_SYNCHRONIZEDNicolai Hähnle2016-09-275-13/+25
| | | | | | | | | | This is really the behavior we want most of the time, but having a SYNCHRONIZED flag instead of an UNSYNCHRONIZED one has the advantage that OR'ing different flags together always results in stronger guarantees. The parent BOs of sub-allocated buffers will be added unsynchronized. Reviewed-by: Marek Olšák <[email protected]>
* nv50/ir: fix comments about instructions infoSamuel Pitoiset2016-09-261-2/+3
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* nvc0: allow to force compiling programs in debug buildSamuel Pitoiset2016-09-261-9/+10
| | | | | | | | | This adds a new envvar called NV50_PROG_CHIPSET which allows to compile shaders with a different target, especially useful for shader-db. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: drop unused NVISA_XXX_CHIPSET constantsSamuel Pitoiset2016-09-261-2/+0
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* r600g: Add support for PK2H/UP2HGlenn Kennard2016-09-262-5/+104
| | | | | | | | Based off of Ilia's original patch, but with output values replicated so that it matches the TGSI semantics. Signed-off-by: Glenn Kennard <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* svga: set PIPE_BIND_DEPTH_STENCIL flag for new resources when possibleBrian Paul2016-09-231-1/+11
| | | | | | | | When we create a depth/stencil texture, also check if we can render to it and set the PIPE_BIND_DEPTH_STENCIL flag. We were previously doing this for color textures (PIPE_BIND_RENDER_TARGET). Reviewed-by: Charmaine Lee <[email protected]>
* svga: don't special case caps for SVGA3D_R32_FLOATBrian Paul2016-09-231-6/+2
| | | | | | | This may have been needed years ago during development, but not now. Prevents some regressions after introducing the next patch. Reviewed-by: Charmaine Lee <[email protected]>
* svga: use new adjust_z_layer() helper in svga_pipe_blit.cBrian Paul2016-09-231-44/+28
| | | | | | | | | To handle z/layer fix-ups for blitting and copying. Note that we weren't doing this properly in svga_blit() before. Also, remove redundant stex, dtex assignments. Reviewed-by: Charmaine Lee <[email protected]>
* svga: simplify/improve the format compatibility check for region copiesBrian Paul2016-09-231-5/+25
| | | | | | | | The util_is_format_compatible() function didn't quite do what we wanted for vgpu10. This check is more flexible and allows copies between formats such as R32G32B32A32_FLOAT and R32G32B32A32_INT. Reviewed-by: Charmaine Lee <[email protected]>
* svga: add const qualifier on svga_translate_format()Brian Paul2016-09-232-2/+2
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* svga: eliminate unneeded gotos in svga_validate_surface_view()Brian Paul2016-09-231-7/+4
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* svga: disable srgb format related code from svga_blit()Neha Bhende2016-09-231-12/+0
| | | | | | | | | With latest mesa and latest piglit tests srgb<->linear conversion is not required as per GL4.4 rules See commit b662c70aeab6a92751514f30719c13a6de253b40. Reviewed-by: Charmaine Lee <[email protected]>
* nir: Allow opt_peephole_sel to be more aggressive in flattening IFs.Eric Anholt2016-09-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VC4 was running into a major performance regression from enabling control flow in the glmark2 conditionals test, because of short if statements containing an ffract. This pass seems like it was was trying to ensure that we only flattened IFs that should be entirely a win by guaranteeing that there would be fewer bcsels than there were MOVs otherwise. However, if the number of ALU ops is small, we can avoid the overhead of branching (which itself costs cycles) and still get a win, even if it means moving real instructions out of the THEN/ELSE blocks. For now, just turn on aggressive flattening on vc4. i965 will need some tuning to avoid regressions. It does looks like this may be useful to replace freedreno code. Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47 fps to 95 fps on vc4. vc4 shader-db: total instructions in shared programs: 101282 -> 99543 (-1.72%) instructions in affected programs: 17365 -> 15626 (-10.01%) total uniforms in shared programs: 31295 -> 31172 (-0.39%) uniforms in affected programs: 3580 -> 3457 (-3.44%) total estimated cycles in shared programs: 225182 -> 223746 (-0.64%) estimated cycles in affected programs: 26085 -> 24649 (-5.51%) v2: Update shader-db output. Reviewed-by: Ian Romanick <[email protected]> (v1)
* svga: minor simplification in svga_validate_surface_view()Brian Paul2016-09-211-3/+2
| | | | | | Get rid of unneeded local var. Reviewed-by: Charmaine Lee <[email protected]>
* svga: remove disable_shader debug variableBrian Paul2016-09-213-10/+0
| | | | | | Never used, AFAIK. Reviewed-by: Charmaine Lee <[email protected]>
* radeonsi: prepare 64-bit integer support. (v2)Dave Airlie2016-09-211-7/+62
| | | | | | | | | | | v2: - no PIPE_CAP_INT64 yet - emit DIV/MOD without the divide-by-zero workaround Reviewed-by: Marek Olšák <[email protected]> (v1) Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Signed-off-by: Nicolai Hähnle <[email protected]>
* radeon/vce: add firmware support for version 52.8.3Leo Liu2016-09-201-0/+3
| | | | | Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* swr: [rasterizer core] Better thread destructionTim Rowley2016-09-198-69/+126
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] Fix missing end-of-file newlineTim Rowley2016-09-191-1/+2
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] Add macros for mapping ArchRast to bucketsTim Rowley2016-09-1911-200/+249
| | | | | | Switch all RDTSC_START/STOP macros to use AR_BEGIN/END macros. Signed-off-by: Tim Rowley <[email protected]>
* nvc0: get rid of nvc0_stage_sampler_states_bind_range()Samuel Pitoiset2016-09-191-74/+9
| | | | | | | Same thing as nvc0_stage_set_sampler_views_range(). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: get rid of nvc0_stage_set_sampler_views_range()Samuel Pitoiset2016-09-191-89/+15
| | | | | | | | This function was quite similar to nvc0_stage_set_sampler_views() and I don't see any reasons to not remove it. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: optimize SUB(a, b) to MOV(a - b)Samuel Pitoiset2016-09-181-0/+10
| | | | | | | | | | | | | | | | | | | This helps shaders in UE4 demos, especially with Elemental (+1% perf). This optimization reduces spilling usage in one shader which explains the little gain. GF100/GK104: total instructions in shared programs :2838551 -> 2838045 (-0.02%) total gprs used in shared programs :396706 -> 396684 (-0.01%) total local used in shared programs :34432 -> 34416 (-0.05%) local gpr inst bytes helped 1 19 112 112 hurt 0 0 0 0 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gk110/ir: fix wrong emission of OP_NOTSamuel Pitoiset2016-09-181-1/+1
| | | | | | | | | This should emit src0 instead of src1. Found by inspection. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* r600g/sb: fix struct/class declaration conflictsMartina Kollarova2016-09-181-5/+1
| | | | | | | | | A couple of forward-declarations were causing warnings in clang: 'value' defined as a class here but previously declared as a struct [-Wmismatched-tags] Signed-off-by: Martina Kollarova <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* svga: relax restriction of compressed formats for texture uploadCharmaine Lee2016-09-171-3/+22
| | | | | | | | | | | | | | | This patch relaxes the restriction of compressed formats for texture upload buffer. For now, 3D texture with compressed format is still not supported in the texture upload buffer path. As Brian noted, ETQW does many texture updates with glCompressedTexSubImage. This patch greatly improves the performance of the ETQW trace. Tested with ETQW, MTT piglit, glretrace, conform, viewperf v2: Per Brian's suggestion, removed the subregion boundary check. Reviewed-by: Brian Paul <[email protected]>
* svga: skip query flush if we already have the query resultBrian Paul2016-09-171-5/+5
| | | | | | | | | This reduces the number of times we flush in some situations (the arbocclude demo is one trivial example). Tested with Piglit, ETQW, Sauerbraten, arbocclude. Reviewed-by: Charmaine Lee <[email protected]>