summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* android: fix llvm, elf dependencies for M, N releasesMauro Rossi2017-02-011-1/+1
| | | | | | | | These changes set the correct llvm version and elf include path which differ for Marshmallow and Nougat Cc: "17.0" <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* radeonsi/ac: move frag interp emission code to shared llvm code.Dave Airlie2017-02-021-87/+13
| | | | | | | | This code should be used in radv, so move it to a shared location in advance of doing that. Reviewed-by: Nicolai Hähnle <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* st/va: add h264 constrained baseline profileBoyuan Zhang2017-02-011-0/+1
| | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/vdpau: add h264 constrained baseline profileBoyuan Zhang2017-02-011-0/+4
| | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
* radeon/uvd: add h264 constrained baseline supportBoyuan Zhang2017-02-011-0/+1
| | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
* vl: add h264 constrained baseline profileBoyuan Zhang2017-02-012-0/+2
| | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
* winsys/radeon: Allow visible VRAM size > 256MB with kernel driver >= 2.49Michel Dänzer2017-02-011-1/+6
| | | | | | | | | The kernel driver reports correct values now. Reviewed-by: Christian König <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
* radeonsi: Fix build on LLVM < 3.9 v2Tom Stellard2017-02-011-2/+4
| | | | | | | | | This was broken by: e0cc0a614c96011958bc3a1b84da9168e0e1ccbb v2: - Use preprocessor macro Tested-by: Mark Janes <[email protected]>
* vc4: Enable Neon on arm android buildsRob Herring2017-01-311-0/+2
| | | | | Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: fix arm64 build with NeonRob Herring2017-01-311-1/+1
| | | | | | | | The addition of Neon assembly breaks on arm64 builds because the assembly syntax is different. For now, restrict Neon to ARMv7 builds. Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: Make Neon inline assembly clang compatibleRob Herring2017-01-311-35/+35
| | | | | | | | | | | | clang throws an error on "%r2" and similar. I couldn't find any documentation on what "%r?" is supposed to mean and I've never seen any use like that as far as I remember. The parameter is supposed to be cpu_stride and just %2/%3 should be sufficient. There's no need for trailing ";" either, so remove those, too. Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* radeonsi: Set datalayout on the llvm moduleTom Stellard2017-01-311-0/+6
| | | | | | | | | | This prevents LLVM from using sext instructions for local memory offsets and allows the backend to fold immediate offsets into the instruction. This also prevents some incorrect code generation for ptrtoint and inttoptr instructions. Reviewed-by: Nicolai Hähnle <[email protected]>
* etnaviv: Set SE.CLIP registers, add margins for scissor/clip registersWladimir J. van der Laan2017-01-313-20/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | This fixes rendering of full-screen quads (and other screen-filling geometry, e.g. ioquake3 walls up-close) on gc3000. It should be a no-op on other hardware. - It looks like SE_CLIP registers were not set at all. I'm amazed that rendering worked without them. Emit them to avoid issues on gc3000. - Define constants ETNA_SE_SCISSOR_MARGIN_RIGHT (0x1119) ETNA_SE_SCISSOR_MARGIN_BOTTOM (0x1111) ETNA_SE_CLIP_MARGIN_RIGHT (0xffff) ETNA_SE_CLIP_MARGIN_BOTTOM (0xffff) These demarcate the margin (fixp16) between the computed sizes and the value sent to the chip. I have set these to the numbers used by the Vivante driver for gc2000. I am not sure whether any old hardware was relying on the old numbers, or whether those were just a guess. But if so, these need to be moved to the _specs structure. CC: <[email protected]> Signed-off-by: Wladimir J. van der Laan <[email protected]> Acked-by: Christian Gmeiner <[email protected]>
* etnaviv: Generate new sin/cos instructions on GC3000Wladimir J. van der Laan2017-01-313-1/+40
| | | | | | | | | | | | | | | | | | | | | Shaders using sin/cos instructions were not working on GC3000. The reason for this turns out to be that these chips implement sin/cos in a different way (but using the same opcodes): - Need their input scaled by 1/pi instead of 2/pi. - Output an x and y component, which need to be multiplied to get the result. - tex_amode needs to be set to 1. Add a new bit to the compiler specs and generate these instructions as necessary. CC: <[email protected]> Signed-off-by: Wladimir J. van der Laan <[email protected]> Acked-by: Christian Gmeiner <[email protected]>
* etnaviv: Cannot render to rb-swapped formatsWladimir J. van der Laan2017-01-311-2/+5
| | | | | | | | | | | | | | Exposing rb swapped (or other swizzled) formats for rendering would involve swizzing in the pixel shader. This is not the case at the moment, so reject requests for creating such surfaces. (GPUs that need an extra resolve step anyway due to multiple pixel pipes, such as gc2000, might also do this swap in the resolve operation. But this would be tricky to keep track of) CC: <[email protected]> Signed-off-by: Wladimir J. van der Laan <[email protected]> Acked-by: Christian Gmeiner <[email protected]>
* etnaviv: Avoid infinite loop in find_frame()Christian Gmeiner2017-01-311-1/+1
| | | | | | | | | | | | | | | | | | | Use of unsigned loop control variable with '>= 0' would lead to infinite loop. Reported by clang: etnaviv_compiler.c:1024:39: warning: comparison of unsigned expression >= 0 is always true [-Wtautological-compare] for (unsigned sp = c->frame_sp; sp >= 0; sp--) ~~ ^ ~ v2: Simply use the same datatype as c->frame_sp is using. CC: <[email protected]> Reported-by: Rhys Kidd <[email protected]> Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Rhys Kidd <[email protected]>
* r600: fix a compilation warning in r600_screen_create()Samuel Pitoiset2017-01-301-1/+1
| | | | | | | | Should be r600_common_screen instead of r600_screen. Fixes: 80157a2c20 ("gallium/radeon: clean up r600_query_init_backend_mask") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counterMarek Olšák2017-01-304-41/+22
| | | | | | to simplify things in draw_vbo a little Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/radeon: clamp vram_vis_size to 256MBMarek Olšák2017-01-301-1/+1
| | | | | | the value from the kernel is wrong Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: handle count_from_stream_output in a few IA_MULTI_VGT_PARAM casesMarek Olšák2017-01-301-2/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't invoke DCC decompression in update_all_texture_descriptorsMarek Olšák2017-01-301-5/+6
| | | | | | | | | | | | | This fixes a bug uncovered by the 17-part patch series, specifically: "gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counter" If dirty_tex_counter has been updated and set_shader_image invokes DCC decompression, the DCC decompression itself checks the counter and updates descriptors, which in turn invokes the same DCC decompression. The blitter can't handle the recursion and the driver eventually crashes. Cc: 17.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fold info->indirect conditionals into the last one in draw_vboMarek Olšák2017-01-301-12/+13
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: atomize the scratch buffer stateMarek Olšák2017-01-306-29/+32
| | | | | | | | | The update frequency is very low. Difference: Only account for the size when allocating a new one and when starting a new IB, and check for NULL. (v3) Reviewed-by: Nicolai Hähnle <[email protected]>
* r600: Fix stack overflowBartosz Tomczyk2017-01-301-1/+1
| | | | | | | | | Commit 7b5878ee0491e7a93914389a8369cd6752b9757d increased number of outputs to 64, but left output array intact. This caused stack overflow when number of outputs is bigger then 32. Found by ASAN. Cc: "12.0 13.0 17.0" <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add new HUD queries for monitoring the CPSamuel Pitoiset2017-01-304-3/+80
| | | | | | | | | | There are even more counters in the CP_STAT register but I think these ones are enough for now. v2: only read (and expose) CP_STAT on VI+ Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add new GPU-sdma-busy HUD querySamuel Pitoiset2017-01-304-1/+21
| | | | | | | | | For simplicity, GPU-sdma-busy will return 0 on previous gens. v2: only read SRBM_STATUS2 on Evergreen+ Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: rename grbm to mmio in the gpu load pathSamuel Pitoiset2017-01-302-32/+33
| | | | | | | | We also want to monitor other MMIO counters like SRBM_STATUS2 in order to know if SDMA is busy. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: add a fast exit path into amdgpu_cs_add_bufferMarek Olšák2017-01-302-0/+21
| | | | | | The time spent in the function dropped by 37% for torcs. Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: do not iterate twice when adding fence dependenciesSamuel Pitoiset2017-01-301-31/+32
| | | | | | | | The perf difference is very small, 3.25->2.84% in amdgpu_cs_flush() in the DXMD benchmark. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: add one likely() call in amdgpu_cs_flush()Samuel Pitoiset2017-01-301-2/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* hud: fix compilation warnings in hud_nic_graph_install()Samuel Pitoiset2017-01-301-2/+2
| | | | | | | v2: use PRId64 instead of PRIx64 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: remove r600_common_context::max_dbMarek Olšák2017-01-303-20/+17
| | | | | | this cleanup is based on the vulkan driver, which seems to do the same thing Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: fix ADDR_REGISTER_VALUE::backendDisablesMarek Olšák2017-01-301-1/+1
| | | | | | This would be a fix if the value was used anywhere. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: clean up r600_query_init_backend_maskMarek Olšák2017-01-306-22/+21
| | | | | | | This just needs to be done for r600g in the screen. We don't need an IB submission for every new context created for GCN. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: precompute IA_MULTI_VGT_PARAM values into a tableMarek Olšák2017-01-306-72/+163
| | | | | | | The perf difference is very small: 0.99% -> 0.40% for the time spent in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move VGT_VERTEX_REUSE_BLOCK_CNTL into shader states for PolarisMarek Olšák2017-01-304-21/+43
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: state atom IDs don't have to be off by oneMarek Olšák2017-01-302-4/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use a bitmask for looping over dirty PM4 statesMarek Olšák2017-01-305-18/+20
| | | | | | also move it to draw_vbo, because it should be 0 in most cases Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: atomize L2 prefetchesMarek Olšák2017-01-307-36/+50
| | | | | | to move the big conditional statement out of draw_vbo Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: unbind disabled shader stages to prevent useless L2 prefetchesMarek Olšák2017-01-301-0/+6
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: also prefetch compute shadersMarek Olšák2017-01-301-0/+12
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: update dirty_level_mask only after the first draw after FB changeMarek Olšák2017-01-303-24/+31
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: allow VRAM-only placements again on APUs & recent amdgpuMarek Olšák2017-01-301-0/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't set +fp64-denormalsMarek Olšák2017-01-301-1/+1
| | | | | | it's the default and the name will change to +fp64-fp16-denormals. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove si_shader_context::param_tess_offchipMarek Olšák2017-01-302-8/+3
| | | | | | we don't use on-chip tess. Reviewed-by: Nicolai Hähnle <[email protected]>
* etnaviv: force vertex buffers through the MMULucas Stach2017-01-301-1/+4
| | | | | | | | | This fixes a vertex data corruption issue if some of the vertex streams go through the MMU and some don't. Signed-off-by: Lucas Stach <[email protected]> Tested-by: Philipp Zabel <[email protected]> Acked-by: Christian Gmeiner <[email protected]>
* llvmpipe: Use LLVMDumpModule, not DumpModule.Bas Nieuwenhuizen2017-01-291-1/+1
| | | | | | | Forgot the prefix ... Fixes: 0fca80b3db64dc1d004f78e22b9de86a07e9de96 Signed-off-by: Bas Nieuwenhuizen <[email protected]>
* various: Fix missing DumpModule with recent LLVM.Bas Nieuwenhuizen2017-01-292-4/+10
| | | | | | | | | | | | | | Since LLVM revision 293359 DumpModule gets only implemented when either a debug build or LLVM_ENABLE_DUMP is set. This patch adds a direct replacement for the function for radv and radeonsi, However, as I don't know a good place to put common LLVM code for all three I inlined the implementation for LLVMPipe. v2: Use the new code for LLVM 3.4+ instead of LLVM 5+ & fixed indentation Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* r600g: use ieee variants of multiplication instructionsIlia Mirkin2017-01-292-18/+19
| | | | | | | | This matches the behavior of most other drivers, including nouveau, radeonsi, and i965. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* r600g: add support for optionally using non-IEEE mul opsIlia Mirkin2017-01-282-4/+18
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>