aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* nvc0: expose a group of performance metrics on FermiSamuel Pitoiset2015-10-293-3/+16
| | | | | | | This allows to monitor those performance metrics through GL_AMD_performance_monitor. Signed-off-by: Samuel Pitoiset <[email protected]>
* nv50/ir: adapt to new method for passing in cull/clip distance masksIlia Mirkin2015-10-294-14/+14
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: share shaders between contexts and build immediatelyIlia Mirkin2015-10-293-1/+7
| | | | | | | Avoid deferring building shaders until draw time, should hopefully reduce any stuttering, as well as enable shader-db style analysis. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: do upload-time fixups for interpolation parametersIlia Mirkin2015-10-2915-19/+239
| | | | | | | | | | | | | | | | | Unfortunately flatshading is an all-or-nothing proposition on nvc0, while GL 3.0 calls for the ability to selectively specify explicit interpolation parameters on gl_Color/gl_SecondaryColor which would override the flatshading setting. This allows us to fix up the interpolation settings after shader generation based on rasterizer settings. While we're at it, we can add support for dynamically forcing all (non-flat) shader inputs to be interpolated per-sample, which allows st/mesa to not generate variants for these. Fixes the remaining failing glsl-1.30/execution/interpolation piglits. Signed-off-by: Ilia Mirkin <[email protected]>
* clover: fix building fix clang-3.8Laurent Carlier2015-10-291-1/+5
| | | | | | | | | https://bugs.freedesktop.org/show_bug.cgi?id=92705 v2.1: use Linker::Flags::None instead of 0 and emplace_back() Signed-off-by: Laurent Carlier <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* nv50: add ARB_copy_image supportIlia Mirkin2015-10-282-7/+11
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: add ARB_copy_image supportIlia Mirkin2015-10-282-7/+11
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: fix crash when nv50_miptree_from_handle failsJulien Isorce2015-10-281-1/+2
| | | | | Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* gallium: add PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATSMarek Olšák2015-10-2815-1/+17
| | | | | | For ARB_copy_image. Reviewed-by: Brian Paul <[email protected]>
* radeonsi: allow copying between compatible compressed and uncompressed formatsMarek Olšák2015-10-281-1/+1
| | | | | | | | which is where a block in src maps to a pixel in dst and vice versa. e.g. DXT1 <-> R32G32_UINT DXT5 <-> R32G32B32A32_UINT Reviewed-by: Michel Dänzer <[email protected]>
* st/vdpau: disable RefPicList for Vdpau HEVCBoyuan Zhang2015-10-271-0/+1
| | | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Leo Liu <[email protected]>
* st/va: add VAAPI HEVC decode supportBoyuan Zhang2015-10-274-1/+208
| | | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Leo Liu <[email protected]>
* radeon/uvd: implement and add flag for VAAPI HEVC decodeBoyuan Zhang2015-10-272-0/+16
| | | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Leo Liu <[email protected]>
* vl: add RefPicList defines for VAAPI HEVC decodeBoyuan Zhang2015-10-271-0/+2
| | | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Leo Liu <[email protected]>
* winsys/amdgpu: remove the dcc_enable surface flagMarek Olšák2015-10-273-10/+7
| | | | | | dcc_size is sufficient and doesn't need a further comment in my opinion. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add debug flags that disable DCC and DCC fast clearMarek Olšák2015-10-273-0/+10
| | | | | | | For debugging, bug reports, etc. This is not in the radeonsi directory, but it is about radeonsi. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: properly check if DCC is enabled and allocatedMarek Olšák2015-10-275-8/+8
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: simplify DCC handling in si_initialize_color_surfaceMarek Olšák2015-10-271-7/+3
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* vc4: Add support for copy propagation with unpack flags present.Eric Anholt2015-10-262-36/+109
| | | | | total instructions in shared programs: 89251 -> 87862 (-1.56%) instructions in affected programs: 52971 -> 51582 (-2.62%)
* vc4: Rewrite the pack instructions as a MOV with a dst pack flagEric Anholt2015-10-263-37/+18
| | | | Another step in reducing the special-casing of instructions.
* vc4: Move dst pack setup out to a helper function with more asserts.Eric Anholt2015-10-261-10/+22
|
* vc4: Switch the unpack ops to being unpack flags on a mov.Eric Anholt2015-10-266-123/+42
| | | | | | | | | | | | This paves the way for copy propagating our unpacks. We end up with a small change on shader-db: total instructions in shared programs: 89390 -> 89251 (-0.16%) instructions in affected programs: 19041 -> 18902 (-0.73%) which appears to be because we no longer convert MOVs for an FMAX dst, r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and this ends up with a slightly better schedule.
* vc4: Drop some confused code about pack/unpack handling.Eric Anholt2015-10-261-23/+4
| | | | | | | | | At one point I thought packs and unpacks were in the same field of the instruction. They aren't. These instructions therefore never cause a pack. total instructions in shared programs: 89472 -> 89390 (-0.09%) instructions in affected programs: 15261 -> 15179 (-0.54%)
* vc4: Reduce MOV special-casing in QIR-to-QPU.Eric Anholt2015-10-261-8/+11
| | | | | I'm going to introduce some more types of MOV, which also want the elision of raw MOVs.
* vc4: Fix up the test for whether the unpack can be from r4.Eric Anholt2015-10-263-8/+27
| | | | We can do 16a/16b from float as well. No difference on shader-db.
* vc4: Don't try to follow MOVs across a pack.Eric Anholt2015-10-261-1/+2
|
* vc4: Only copy propagate raw MOVs.Eric Anholt2015-10-261-6/+1
| | | | No problems being fixed, but needed for the new unpack changes.
* vc4: If a QIR source has an unpack set, print it.Eric Anholt2015-10-263-3/+13
| | | | Not used yet, but will be.
* gallivm: disable f16c when not using AVXRoland Scheidegger2015-10-261-0/+3
| | | | | | | | | | | | | | | | | f16c intrinsic can only be emitted when AVX is used. So when we disable AVX due to forcing 128bit vectors we must not use this intrinsic (depending on llvm version, this worked previously because llvm used AVX even when we didn't tell it to, however I've seen this fail with llvm 3.3 since 718249843b915decf8fccec92e466ac1a6219934 which seems to have the side effect of disabling avx in llvm albeit it only touches sse flags really, but with ea421e919ae6e72e1319fb205c42a6fb53ca2f82 it's now really disabled). Albeit being able to use AVX with 128bit vectors also would have its uses, the code as is really was meant to emulate jit code creation for less capable cpus. v2: add some (ifdefed out) missing de-featuring options for simulating less capable cpus. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* st/va: pass picture desc to begin and decodeJulien Isorce2015-10-261-2/+2
| | | | | | | | | | At least vl_mpeg12_decoder uses the picture desc in begin_frame and decode_bitstream. https://bugs.freedesktop.org/show_bug.cgi?id=92634 Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Christian König <[email protected]>
* vc4: Fix names of the 16-bit unpacksEric Anholt2015-10-243-6/+6
| | | | | They're only f16-to-f32 on a float operation, otherwise they're i16-to-i32.
* vc4: Don't try to register coalesce into the VPM across non-raw MOVs.Eric Anholt2015-10-241-1/+1
| | | | | No known bugs, just something I noticed while updating optimization code for other changes.
* vc4: Take advantage of the 8888 pack function in pack_unorm_4x8.Eric Anholt2015-10-241-0/+14
| | | | | | | | | | One instruction instead of four, and it turns out you do this a lot for the Over operator. total uniforms in shared programs: 32168 -> 32087 (-0.25%) uniforms in affected programs: 318 -> 237 (-25.47%) total instructions in shared programs: 89830 -> 89472 (-0.40%) instructions in affected programs: 6434 -> 6076 (-5.56%)
* vc4: Fix the test for skipping raw MOVs.Eric Anholt2015-10-243-1/+10
| | | | | I don't know what previous test was trying to do, but it dates back to the first add of vc4_qpu_emit.c. No change to shader-db.
* freedreno: remove unnecessary null checksRob Clark2015-10-244-13/+13
| | | | | | | | According to piglit/xonotic/neverball/stc, blend/rasterize/zsa state will always be bound (never null). And the null checks were in- consistent anyways, so remove them. Signed-off-by: Rob Clark <[email protected]>
* radeonsi: Implement DCC fast clear.Bas Nieuwenhuizen2015-10-243-14/+100
| | | | | | | | | | | Uses the DCC buffer instead of the CMASK buffer. The ELIMINATE_FAST_CLEAR still works. Furthermore, with DCC compression we can directly clear to a limited set of colors such that we do not need a postprocessing step. v2 Marek: check dcc_buffer && dirty_level_mask in set_sampler_view Signed-off-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* gallivm: fix tex offsets with mirror repeat linearRoland Scheidegger2015-10-241-4/+5
| | | | | | | | Can't see why anyone would ever want to use this, but it was clearly broken. This fixes the piglit texwrap offset test using this combination. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: fix sampling with texture offsets in SoA pathRoland Scheidegger2015-10-241-3/+8
| | | | | | | | | | | | | When using nearest filtering and clamp / clamp to edge wrapping results could be wrong for negative offsets. Fix this by adding the offset before doing the conversion to int coords (could also use floor instead of trunc int conversion but probably more complex on "typical" cpu). This fixes the piglit texwrap offset failures with this filter/wrap combo (which only leaves the linear/mirror repeat combination broken). Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* softpipe: fix using non-zero layer in non-array view from array resourceRoland Scheidegger2015-10-242-9/+31
| | | | | | | | | | | | For vertex/geometry shader sampling, this is the same as for llvmpipe - just use the original resource target. For fragment shader sampling though (which does not use first-layer based mip offsets) adjust the sampling code to use first_layer in the non-array cases. While here also fix up some code which looked wrong wrt buffer texel fetch (no piglit change). Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: fix using non-zero layer in non-array view from array resourceRoland Scheidegger2015-10-242-8/+8
| | | | | | | | | | | Just need to use resource target not view target when calculating first-layer based mip offsets. (This is a gl specific problem since d3d10 does not distinguish between non-array and array resources neither at the resource nor view level, only at the shader level.) Fixes new piglit arb_texture_view sampling-2d-array-as-2d-layer test. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* radeonsi: add Stoney to si_init_gs_info()Alex Deucher2015-10-231-0/+1
| | | | | | | | This patch was originally written before stoney support was merged. Add stoney. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* radeonsi: Enable DCC.Bas Nieuwenhuizen2015-10-246-6/+50
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: Add FLUSH_AND_INV_CB_DATA_TS for DCC.Bas Nieuwenhuizen2015-10-241-0/+11
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: Disable operations that do not work with DCC.Bas Nieuwenhuizen2015-10-244-3/+11
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: Allocate buffers for DCC.Bas Nieuwenhuizen2015-10-245-5/+92
| | | | | | | | | | | As the alignment requirements can be 32 KiB or more, also adding an aligned buffer creation function. DCC is disabled for textures that can be shared as sharing the DCC buffers has not been implemented yet. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: only apply the SNORM blit workaround to *8_SNORMMarek Olšák2015-10-241-1/+1
| | | | | | | Like the comment says. This fixes DCC, which doesn't like blitting RG16 as RGBA8. Reviewed-by: Michel Dänzer <[email protected]>
* util/format: add helper util_format_is_snorm8Marek Olšák2015-10-242-0/+22
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: add another requirement for PARTIAL_ES_WAVEMarek Olšák2015-10-244-2/+35
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: merge two ifs setting WD_SWITCH_ON_EOPMarek Olšák2015-10-241-5/+2
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: make PARTIAL_ES_WAVE globally dependent on SWITCH_ON_EOIMarek Olšák2015-10-241-5/+6
| | | | | | This catches the other cases that enable SWITCH_ON_EOI. Reviewed-by: Michel Dänzer <[email protected]>