summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: reload PS inputs with direct indexing at each use (v2)Marek Olšák2016-09-143-22/+41
| | | | | | | | | | | | | | | | | | | | | The LLVM compiler can CSE interp intrinsics thanks to LLVMReadNoneAttribute. 26011 shaders in 14651 tests Totals: SGPRS: 1146340 -> 1132676 (-1.19 %) VGPRS: 727371 -> 711730 (-2.15 %) Spilled SGPRs: 2218 -> 2078 (-6.31 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 35841268 -> 36009732 (0.47 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222559 -> 224779 (1.00 %) Wait states: 0 -> 0 (0.00 %) v2: don't call load_input for fragment shaders in emit_declaration Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: get rid of constant buffer preloadingMarek Olšák2016-09-141-24/+14
| | | | | | | | | | | | | | | | | 26011 shaders in 14651 tests Totals: SGPRS: 1152636 -> 1146340 (-0.55 %) VGPRS: 728198 -> 727371 (-0.11 %) Spilled SGPRs: 3776 -> 2218 (-41.26 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 35835152 -> 35841268 (0.02 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222372 -> 222559 (0.08 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: get rid of img/buf/sampler descriptor preloading (v2)Marek Olšák2016-09-141-132/+47
| | | | | | | | | | | | | | | | | | 26011 shaders in 14651 tests Totals: SGPRS: 1251920 -> 1152636 (-7.93 %) VGPRS: 728421 -> 728198 (-0.03 %) Spilled SGPRs: 16644 -> 3776 (-77.31 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 36001064 -> 35835152 (-0.46 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222221 -> 222372 (0.07 %) Wait states: 0 -> 0 (0.00 %) v2: merge codepaths where possible Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rename get_sampler_desc -> load_sampler_descMarek Olšák2016-09-141-11/+11
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: cosmetic changes in si_shader.cMarek Olšák2016-09-141-3/+5
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: load streamout buffer descriptors before use (v2)Marek Olšák2016-09-141-33/+14
| | | | | | v2: inline the code and remove the conditional that's a no-op now Reviewed-by: Nicolai Hähnle <[email protected]>
* vc4: Implement job shufflingEric Anholt2016-09-148-194/+333
| | | | | | | | | | | | | | | Track rendering to each FBO independently and flush rendering only when necessary. This lets us avoid the overhead of storing and loading the frame when an application momentarily switches to rendering to some other texture in order to continue rendering the main scene. Improves glmark -b desktop:effect=shadow:windows=4 by 27% Improves glmark -b desktop:blur-radius=5:effect=blur:passes=1:separable=true:windows=4 by 17% While I haven't tested other apps, this should help X rendering a lot, and I've heard GLBenchmark needed it too.
* vc4: Handle resolve skipping at job submit time.Eric Anholt2016-09-143-31/+37
| | | | | | This is done in vc4_flush currently, but I'm going to make the job always track the surfaces it might be rendering to instead of putting in the destinations at flush time.
* vc4: Move the render job state into a separate structure.Eric Anholt2016-09-1412-255/+287
| | | | | This is a preparation step for having multiple jobs being queued up at the same time.
* vc4: Always unref the current job surfaces at job reset time.Eric Anholt2016-09-143-36/+21
| | | | | Drops some tricky logic in vc4_flush() trying to update the pointers, and fixes a broken lack of unref for MSAA surfaces at context destroy time.
* vc4: Move job-submit skip cases to vc4_job_submit().Eric Anholt2016-09-142-12/+12
| | | | For calling job_submit() directly, I need the skipping here.
* vc4: Move bin CL trailer to job_submit() time.Eric Anholt2016-09-142-11/+14
| | | | | To implement job shuffling, I want to be able to call submit() on specific jobs, turning vc4_flush() into the context's flush-all-jobs hook.
* vc4: Simplify the DISCARD_RANGE handlingEric Anholt2016-09-141-12/+15
| | | | | | | It's really just an upgrade to attempting WHOLE_RESOURCE. Pulling the logic out caught two bugs in it: We would try to do so on cubemaps (even though we're only mapping 1 of the 6 slices), and we would break persistent coherent mappings by trying to reallocate when we shouldn't.
* vc4: Fix incorrect clearing of Z/stencil when cleared separately.Eric Anholt2016-09-143-15/+38
| | | | | | | | | | | | | | | | | The clear of Z or stencil will end up clearing the other as well, instead of masking. There's no way around this that I know of, so if we are clearing just one then we need to draw a quad. Fixes a regression in the job-shuffling code, where the clear values move to the job and don't just have the last clear's value laying around when you do glClear(DEPTH) and then glClear(STENCIL) separately (ext_framebuffer_multisample-clear 4 depth)). This causes regressions in ext_framebuffer_multisample/multisample-blit depth and ext_framebuffer_multisample/no-color depth, but these were formerly false positives due to the reference image also being black. Now the reference and test images are both being drawn, and it looks like there's an incorrect resolve of depth during blitting to an MSAA FBO.
* tgsi/scan: don't set interp flags for inputs only used by INTERP instructionsMarek Olšák2016-09-131-48/+57
| | | | | | | | | radeonsi depends on the interp flags a little bit too much. This fixes 9 randomly failing tests: GL45-CTS.shader_multisample_interpolation.render.interpolate_at_centroid.* Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix FP64 UBO loads with indirect uniform block indexingMarek Olšák2016-09-131-2/+1
| | | | | | | No known tests. Cc: [email protected] Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: don't assume GTT if the VRAM flag isn't setMarek Olšák2016-09-131-3/+3
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clean up CP DMA emit codeMarek Olšák2016-09-131-84/+60
| | | | | | | | Unify the clear and copy paths, clean up the definitions. It looks more like a rework. It's a preparation for GDS support, which might or might not come. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: print the IB and buffer list in VM fault reportsMarek Olšák2016-09-131-1/+2
| | | | | | This is a fallout from reworking the debug flags. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add sampler view BOs to the BO list lastMarek Olšák2016-09-131-7/+10
| | | | | | | | | If si_sampler_view_add_buffer ends up flushing, then the code in begin_new_cs would previously have added the buffer(s) for whatever was previously bound to that slot. Now it would add only the new buffer. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: export SampleMask from pixel shaders at full rateMarek Olšák2016-09-133-16/+56
| | | | | | | Heaven and Valley write gl_SampleMask and not Z. Use 16_ABGR instead of 32_ABGR if Z isn't written. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: set new r600_resource fields correctly in other places tooMarek Olšák2016-09-131-0/+11
| | | | | | | | | | | | | This was missed in: commit 0d2e43fcb1198a6e67c85feadb1ca8c360ddc284 Author: Marek Olšák <[email protected]> Date: Thu Aug 18 16:30:00 2016 +0200 gallium/radeon: derive buffer placement and flags only at initialization Tested-by: Michel Dänzer <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: dump shader buffers and imagesMarek Olšák2016-09-133-3/+49
| | | | | | this was unimplemented Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: fix a crash in resource_get_handleMarek Olšák2016-09-131-1/+1
| | | | | | broken recently Reviewed-by: Nicolai Hähnle <[email protected]>
* radeon: Don't check DCC on pipe buffersJan Vesely2016-09-131-3/+4
| | | | | | | | | Fixes segfaults in EG compute since: commit 21de3be8e62b2b093569a99550e6356ed2f106b4 radeonsi: fix texture format reinterpretation with DCC Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* vl/util: Fix YV12/I420 convert to NV12 U/V reversalAndy Furniss2016-09-131-1/+1
| | | | | | | | Fix VAAPI YV12/I420 convert to NV12 U/V reversal. Input order is YVU when this is called. Signed-off-by: Andy Furniss <[email protected]> Reviewed-by: Boyuan Zhang <[email protected]>
* android: add support for libmesa_amdgpu_addrlibMauro Rossi2016-09-134-6/+11
| | | | | | | | | | | Android porting of the following commits: f1f1ba3 "radeonsi: move sid.h/r600d_common.h to a common place." 69fca64 "amd/addrlib: move addrlib from amdgpu winsys to common code" This patch fixes android building errors Reviewed-by: Dave Airlie <[email protected]>
* st/va: also honors interlaced preference when providing a video formatJulien Isorce2016-09-121-17/+19
| | | | | | | | | | | | | | | This fixes a crash when using the prefered video format with vaapisink on Nvidia hardwares. Also caught by the following assert: nouveau_vp3_video.c:91: Assertion `templat->interlaced' failed. TEST= gst-launch-1.0 videotestsrc ! video/x-raw, format=NV12 ! vaapisink Cc: <[email protected]> Signed-off-by: Julien Isorce <[email protected]> Tested-by: Víctor Manuel Jáquez Leal <[email protected]> Tested-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
* tgsi: document semantics for compute shadersSamuel Pitoiset2016-09-121-0/+28
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't preload constants at the beginning of shadersMarek Olšák2016-09-121-20/+11
| | | | | | | | | | | | | | | | | | | | | | LLVM can CSE the loads, thus we can always re-load constants before each use. The decrease in SGPR spilling is huge. The best improvements are the dumbest ones. 26011 shaders in 14651 tests Totals: SGPRS: 1453346 -> 1251920 (-13.86 %) VGPRS: 742576 -> 728421 (-1.91 %) Spilled SGPRs: 52298 -> 16644 (-68.17 %) Spilled VGPRs: 397 -> 369 (-7.05 %) Scratch VGPRs: 1372 -> 1344 (-2.04 %) dwords per thread Code Size: 36136488 -> 36001064 (-0.37 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 219315 -> 222221 (1.33 %) Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* st/va: enable vbr rate control for vaapi encodeBoyuan Zhang2016-09-122-2/+2
| | | | | | | | This patch enables variable bit-rate for vaapi encoding. According to va.h, target bit-rate equals to maximum bit-rate multiplies by target_percentage. Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
* vl/rbsp: match initial escaped bits with valid in the bufferLeo Liu2016-09-121-2/+4
| | | | | | Otherwise the check for the three byte will not make sense. Signed-off-by: Leo Liu <[email protected]>
* winsys/radeon: rename nrelocs, crelocs to max_relocs, num_relocsNicolai Hähnle2016-09-122-27/+27
| | | | Reviewed-by: Marek Olšák <[email protected]>
* winsys/radeon: don't pre-allocate the relocations arrayNicolai Hähnle2016-09-121-14/+1
| | | | | | It's really not necessary. Switch to an exponential resizing strategy. Reviewed-by: Marek Olšák <[email protected]>
* winsys/radeon: remove unused radeon_cs_context::priority_usageNicolai Hähnle2016-09-121-1/+0
| | | | Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: remove amdgpu_cs_lookup_bufferNicolai Hähnle2016-09-122-9/+3
| | | | | | | The radeonsi driver doesn't and shouldn't care about the buffer index. Only the virtual addresses matter. Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: remove unused field domains from amdgpu_cs_bufferNicolai Hähnle2016-09-122-37/+17
| | | | Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: remove initial buffer list allocationNicolai Hähnle2016-09-121-20/+0
| | | | | | It's really not necessary. Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: extract adding a new buffer list entry into its own functionNicolai Hähnle2016-09-121-43/+70
| | | | | | | While at it, try to be a little more robust in the face of memory allocation failure. Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: use only one fence per BONicolai Hähnle2016-09-123-68/+56
| | | | | | | | | | | The fence that is added to the BO during flush is guaranteed to be signaled after all the fences that were in the fences array of the BO before the flush, because those fences are added as dependencies for the submission (and all this happens atomically under the bo_fence_lock). Therefore, keeping only the last fence around is sufficient. Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: add do_winsys_deinit functionNicolai Hähnle2016-09-121-2/+7
| | | | | | | The idea is to have matching init/deinit functions so that deinit can be re-used for cleanup in the error path of amdgpu_winsys_create. Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: clean up error paths in amdgpu_winsys_createNicolai Hähnle2016-09-121-7/+5
| | | | | | | No need to call pb_cache_deinit, because the cache hasn't been initialized at that point. Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: page alignment for buffers is unnecessaryNicolai Hähnle2016-09-121-4/+1
| | | | | | In some places (e.g. shader program pointers) we require 256 bytes alignment. Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon/winsyses: remove #includes of pb_bufmgr.hNicolai Hähnle2016-09-123-3/+0
| | | | Reviewed-by: Marek Olšák <[email protected]>
* freedreno/a3xx: disable filtering for texture buffers and int texturesIlia Mirkin2016-09-111-0/+2
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* st/clover: Define __OPENCL_VERSION__ on the device sideNiels Ole Salscheider2016-09-101-0/+3
| | | | | | | | This is required by the OpenCL standard. Signed-off-by: Niels Ole Salscheider <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Vedran Miletić <[email protected]>
* gm107/ir: allow indirect inputs to be loaded by frag shaderIlia Mirkin2016-09-102-5/+21
| | | | | | | | | Looks like the GM107 IPA op does not allow a separate offset when using an indirect register. Instead we must use AL2P like we do for indirect vertex operations on Kepler+. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* gm107/ir: AL2P writes to a predicate registerIlia Mirkin2016-09-101-0/+1
| | | | | | | | | | | We have to force it to write to predicate 7 (aka PT) in order for it not to mess up another predicate. Unclear what would be returned in the predicate, perhaps an error code for out-of-bounds requests. Blob doesn't seem to check it. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* radeonsi: flush TC L2 before using a compute indirect bufferMarek Olšák2016-09-091-2/+10
| | | | | | | There is no known test for this. Cc: 12.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix the VGT performance tweak for small instancesMarek Olšák2016-09-091-5/+6
| | | | | | | | Based on the VGT spec. The Vulkan driver doesn't do it optimally and they plan to fix it. Reviewed-by: Nicolai Hähnle <[email protected]>