summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* gallium: add missing zero-init for resource templatesRob Clark2016-10-0710-0/+17
| | | | | | | Mostly test code, plus one spot I noticed in r600. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* freedreno: don't try to shadow layered texturesRob Clark2016-10-071-0/+3
| | | | | | | | We will only hit this with multi-planar YUV external images, so we would probably never hit this code path in the first place. But if we did, it wouldn't do the right thing so just bail. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx+a4xx: fix clip-plane lowering stateRob Clark2016-10-072-0/+6
| | | | | | | If enabled clip-planes have changed, we need to mark program state dirty. Signed-off-by: Rob Clark <[email protected]>
* vc4: Don't worry about partial Z/S clear if the other is already cleared.Eric Anholt2016-10-061-3/+7
| | | | | | | | | We have to be careful to not smash the value they're clearing to, but other than that we're fine. Avoids quad clears in Processing, which likes to do glClear(Z|S); glClear(Z). Improves performance of Processing's QuadRendering demo at 5000 quads by 5.46507% +/- 1.35576% (n=15 before, 32 after)
* vc4: Try to fix the HW-2116 workaround.Eric Anholt2016-10-061-9/+10
| | | | | | | | | | | | | | | We were incrementing the count at the end of vc4_start_draw(), except that that function returns immediately if we've already started drawing on this batch. It also failed to count the statechanges from the GFXH-515 workaround. This incidentally allows repeated glClear() to be coalesced, because the fast clears aren't counted in draw_calls_queued any more. Fixes most of the extra flushes in Processing, which emits glClear(Z|S); glClear(Z); glClear(C) during its frame setup. Improves performance of Processing's QuadRendering demo at 5000 quads by 3.33538% +/- 2.05846% (n=21 before, 15 after)
* vc4: Drop dead argument from vc4_start_draw().Eric Anholt2016-10-061-3/+3
|
* vc4: Fix fallback to quad clears of depth in GLX.Eric Anholt2016-10-064-25/+64
| | | | | The fix in the vc4-jobs series ended up triggering the fallback path on GLX apps that use depth but not stencil.
* vc4: Add the format name in miptree_debug.Eric Anholt2016-10-061-2/+4
| | | | | I was curious if my Z/S buffer was actually ZS or ZX, and the vc4 format of "0" didn't tell me much.
* vc4: Fix perf debug formatting on partial Z/S clear.Eric Anholt2016-10-061-1/+1
|
* vc4: Drop destination register when it's unused.Eric Anholt2016-10-061-1/+22
| | | | | | | This slightly reduces instructions on shader-db, but I think it's just perturbing register allocation -- the allocator should have always trivially colored these nodes, before. This commit is just to make QIR code failing more intelligible when register allocation fails.
* vc4: Fix live intervals analysis for screening defs in if statements.Eric Anholt2016-10-063-5/+20
| | | | | | | | | If a conditional assignment is only conditioned on the exec mask, that's still screening off the value in the executed channels (and, since we're not storing to the unexcuted channels, we don't care what's in there). Fixes a bunch of extra register pressure on Processing's Ribbons demo, which is failing to allocate.
* vc4: Fix simulator when more than one vc4_screen is opened.Eric Anholt2016-10-063-3/+39
| | | | | | We would assertion fail in setting up the simulator the second time around. This at least postpones the assertion failure until we've closed all of the first set of screens and started opening a new set.
* vc4: Fix assertion fails from trying to cast non-ALU instrs to ALU.Eric Anholt2016-10-061-0/+2
| | | | | Fixes 100 piglit tests since the assertions were added to nir.h. What's amazing is that these tests used to pass, even when casting garbage.
* nv50/ir: fix wrong check when optimizing MAD to SHLADDSamuel Pitoiset2016-10-071-1/+1
| | | | | | | | | Checking if MAD is supported is definitely wrong, and it's more likely a typo I introduced few days ago which breaks NV50 because SHLADD is not supported there. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: dump program binary only when NV50_PROG_DEBUG is setSamuel Pitoiset2016-10-071-1/+1
| | | | | | | | When the chipset is forced with NV50_PROG_CHIPSET, we actually only want to output the binary if NV50_PROG_DEBUG is also enabled. Otherwise, this pollutes the shader-db output. Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: expose ARB_compute_variable_group_sizeSamuel Pitoiset2016-10-071-2/+6
| | | | | | | | | Only expose 512 threads/block on Fermi to not be limited by 32 GPRs/thread. v4: - use 512 threads on Fermi, 1024 on Kepler+ Signed-off-by: Samuel Pitoiset <[email protected]>
* nv50/ir: set number of threads/block for variable local sizeSamuel Pitoiset2016-10-071-0/+2
| | | | | | | | | | | | When a variable local size is defined as specified by ARB_compute_variable_group_size, the fixed local size is set to 0 and a SIGFPE occurs when we compute the maximum number of regs. This allows to use 64 GPRs/thread. v4: - use 512 threads on Fermi, 1024 on Kepler+ Signed-off-by: Samuel Pitoiset <[email protected]>
* gallium: add PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCKSamuel Pitoiset2016-10-077-1/+15
| | | | | | | | | v3: - use a new case statement in r600_pipe_common.c - fix compilation of softpipe... Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* nv50/ir: optimize sub(a, 0) to aKarol Herbst2016-10-061-0/+3
| | | | | | | | | | | | | | | | | helped some ue4 demos and divinity OS shaders total instructions in shared programs : 2818674 -> 2818606 (-0.00%) total gprs used in shared programs : 379273 -> 379273 (0.00%) total local used in shared programs : 9505 -> 9505 (0.00%) total bytes used in shared programs : 25837792 -> 25837192 (-0.00%) local gpr inst bytes helped 0 0 33 33 hurt 0 0 0 0 Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Pierre Moreau <[email protected]>
* nir: Make nir_foo_first/last_cf_node return a block insteadJason Ekstrand2016-10-062-11/+6
| | | | | | | | | | One of NIR's invariants is that control flow lists always start and end with blocks. There's no good reason why we should return a cf_node from these functions since we know that it's always a block. Making it a block lets us remove a bunch of code. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* gallium/hud: Remove superfluous debugSteven Toth2016-10-064-52/+0
| | | | | | | No longer required. Signed-off-by: Steven Toth <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* svga: add svga_mksstats.h to the sources listEmil Velikov2016-10-061-0/+1
| | | | | | | Otherwise it won't be picked in the tarball and the build will fail. Fixes: 0035f7f1365 ("svga: add guest statistic gathering interface") Signed-off-by: Emil Velikov <[email protected]>
* st/xvmc/tests: force enable assertionsEmil Velikov2016-10-061-0/+2
| | | | | | | | | | Similar to the other 'tests', enable assertions in xvmc_bench. This silences the GCC warnings about unused-variable(s), makes the program actually useful, as the XvMC API called. Atm the function calls are omitted, since they're called within the assert. Signed-off-by: Emil Velikov <[email protected]>
* nvc0: dump program binary when chipset has been forcedSamuel Pitoiset2016-10-051-0/+5
| | | | | | | | Currently, program binaries are only dumped at upload time, but when the chipset has been forced via NV50_PROG_CHIPSET we might want to show the generated code, especially with shaderdb. Signed-off-by: Samuel Pitoiset <[email protected]>
* radeonsi: fix texture border colors for compute shadersMarek Olšák2016-10-051-0/+12
| | | | | | | | There are VM faults without this. Cc: 12.0 <[email protected]> Acked-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon/winsyses: set reasonable max_alloc_sizeMarek Olšák2016-10-052-3/+5
| | | | | | | | | | which is returned for GL_MAX_TEXTURE_BUFFER_SIZE. It doesn't have any other use at the moment. Bigger allocations are not rejected. This fixes GL45-CTS.texture_buffer.texture_buffer_max_size on Bonaire. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix interpolateAt opcodes for .zw componentsMarek Olšák2016-10-051-1/+1
| | | | | | | | | | Not returning garbage in .zw seems pretty important. This fixes: GL45-CTS.shader_multisample_interpolation.render.interpolate_at_*_check.* Cc: 11.2 12.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add assertions to validate interpolation flagsMarek Olšák2016-10-051-0/+34
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: interpolate colors after interpolation weight shufflingMarek Olšák2016-10-051-48/+48
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: don't set interp flags for inputs only used by INTERP (v2)Marek Olšák2016-10-051-48/+57
| | | | | | | | | | | | (v1 pushed, then reverted) This fixes 9 randomly failing tests on radeonsi: GL45-CTS.shader_multisample_interpolation.render.interpolate_at_centroid.* v2: use input_interpolate[input] (correct) instead of input_interpolate[index] (incorrect) Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: dump most driver information with GALLIUM_DDEBUG=alwaysMarek Olšák2016-10-051-1/+5
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* nv50/ra: let simplify return an error and handle thatKarol Herbst2016-10-051-5/+7
| | | | | | | | fixes a crash in the case simplify reports an error Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* virgl: Fix build regression of commit 8a943564Nicolai Hähnle2016-10-051-1/+1
|
* gallium/radeon: implement set_device_reset_callbackNicolai Hähnle2016-10-054-0/+40
| | | | | | | | | Check for device reset on flush. It would be nicer if the kernel just reported this as an error on the submit ioctl (and similarly for fences), but this will do for now. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ddebug: add pass-through of set_device_reset_callbackNicolai Hähnle2016-10-051-0/+10
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: add pipe_context::set_device_reset_callbackNicolai Hähnle2016-10-053-0/+42
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* virgl: use the new parent/child pools for transfersNicolai Hähnle2016-10-056-8/+14
| | | | Reviewed-by: Marek Olšák <[email protected]>
* vc4: use the new parent/child pools for transfersNicolai Hähnle2016-10-055-6/+11
| | | | Reviewed-by: Marek Olšák <[email protected]>
* freedreno: use the new parent/child pools for transfersNicolai Hähnle2016-10-055-6/+12
| | | | Reviewed-by: Marek Olšák <[email protected]>
* r300: use the new parent/child pools for transfers (v2)Nicolai Hähnle2016-10-055-7/+11
| | | | | | v2: slab_alloc_st -> slab_alloc Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: use the new parent/child pools for transfersNicolai Hähnle2016-10-053-6/+11
| | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97894 Reviewed-by: Marek Olšák <[email protected]>
* gallivm: Use AVX2 gather instrinsics.Jose Fonseca2016-10-041-0/+95
| | | | | | v2: Use AVX2 gather for non aligned loads too. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Use 8 wide AoS sampling on AVX2.Roland Scheidegger2016-10-041-5/+6
| | | | | | | | | | v2: Make sure that with num_lods > 1 and min_filter != mag_filter we still enter the splitting path. So this case would still use 4-wide aos path (as a side note, the 4-wide aos sampling path could actually be improved quite a bit if we have avx2, by just doing the filtering with 256bit vectors). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Basic AVX2 support.José Fonseca2016-10-044-28/+98
| | | | | | v2: pblendb -> pblendvb Reviewed-by: Roland Scheidegger <[email protected]>
* st/omx/dec/h265: add scaling list dataLeo Liu2016-10-041-5/+97
| | | | | | | | | Specified by subclause 7.3.4 v2: get the loop optimized Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/omx/dec/h265: fix the skip for before and after listLeo Liu2016-10-041-3/+4
| | | | | | | | | | For reference picture sets, there are cases that rps will not always be used. Once detect the unused flag from encoded bitstream, we should not add this rps to any list, otherwise pass the incorrect reference and skip the correct rps. Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* st/omx/dec/h265: set the default reference picture set for referenceLeo Liu2016-10-041-2/+4
| | | | | | | | | | It will fix the corruption for frame, that only has one stort term ref picture set, we set NULL rps for this case previously, causing taking incorrect reference. Instead we should take that only short term set as reference Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* st/omx/dec/h265: decoder size should follow from spsLeo Liu2016-10-042-7/+8
| | | | | | | | | | | | | The video size from format container is not always compatible with the size from codec bitstream, the HW decoder should take the size information from bitstream, otherwise the corruption appears with clip that has different size info between bitstream and format container So we are passing width(height)_in_samples from sequence parameter set to video decoder. Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* st/omx/dec/h265: increase dpb max size to 32Leo Liu2016-10-041-1/+1
| | | | | | For clip with frame delta poc over 16 Signed-off-by: Leo Liu <[email protected]>
* radeonsi: optionally run the LLVM IR verifier passNicolai Hähnle2016-10-045-9/+38
| | | | | | | | This is enabled automatically if shader printing is enabled, or separately by R600_DEBUG=checkir. Catch mal-formed IR before it crashes in a later pass. Reviewed-by: Marek Olšák <[email protected]>