path: root/src/gallium
Commit message | Author | Age | Files | Lines
* gallium/radeon: rename grbm to mmio in the gpu load path | Samuel Pitoiset | 2017-01-30 | 2 | -32/+33
  We also want to monitor other MMIO counters like SRBM_STATUS2 in order to know if SDMA is busy.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Marek Olšák
* winsys/amdgpu: add a fast exit path into amdgpu_cs_add_buffer | Marek Olšák | 2017-01-30 | 2 | -0/+21
  The time spent in the function dropped by 37% for torcs.
  Reviewed-by: Nicolai Hähnle
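The commit does not spell out how the fast path works. One common shape for it, sketched here under assumptions, is a one-entry cache of the most recently added buffer that is checked before the full lookup. `struct cs_buffers`, `cs_add_buffer()` and the fixed-size array are hypothetical simplifications, not the amdgpu winsys types.

```c
#include <stddef.h>

/* Hypothetical, simplified command-stream buffer list
 * (not the real amdgpu winsys structures). */
struct buffer;

struct cs_buffers {
   struct buffer *list[64];
   unsigned count;
   struct buffer *last_added;   /* one-entry cache used by the fast path */
   int last_added_index;
   unsigned slow_lookups;       /* how many times the full search ran */
};

/* Returns the index of bo in the list, appending it if needed (-1 if full). */
static int cs_add_buffer(struct cs_buffers *cs, struct buffer *bo)
{
   /* Fast exit: the same buffer is often added many times in a row. */
   if (bo == cs->last_added)
      return cs->last_added_index;

   cs->slow_lookups++;
   int index = -1;
   for (unsigned i = 0; i < cs->count; i++) {
      if (cs->list[i] == bo) {
         index = (int)i;
         break;
      }
   }
   if (index < 0) {
      if (cs->count == sizeof(cs->list) / sizeof(cs->list[0]))
         return -1;             /* list full */
      index = (int)cs->count;
      cs->list[cs->count++] = bo;
   }
   cs->last_added = bo;
   cs->last_added_index = index;
   return index;
}
```

Repeated additions of the same buffer then cost one pointer comparison instead of a search.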
* winsys/amdgpu: do not iterate twice when adding fence dependencies | Samuel Pitoiset | 2017-01-30 | 1 | -31/+32
  The perf difference is very small, 3.25->2.84% in amdgpu_cs_flush() in the DXMD benchmark.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Marek Olšák
* winsys/amdgpu: add one likely() call in amdgpu_cs_flush() | Samuel Pitoiset | 2017-01-30 | 1 | -2/+2
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Marek Olšák
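`likely()` is a branch-prediction hint. Mesa defines it in `util/macros.h` on top of the compiler's `__builtin_expect` when that builtin is available; the sketch below reproduces the idea with a GCC/Clang check in place of Mesa's configure-time test. `flush_cs()` and its condition are hypothetical.

```c
/* Branch hints: expand to __builtin_expect on GCC/Clang, to the plain
 * expression elsewhere. The !! normalizes any truthy value to 1. */
#if defined(__GNUC__) || defined(__clang__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Hypothetical flush: the common case is a non-empty command stream,
 * so the compiler is told to lay out that path as the fall-through. */
static int flush_cs(unsigned num_dw)
{
   if (likely(num_dw > 0))
      return 1;   /* submit */
   return 0;      /* nothing to do */
}
```

The hint changes only code layout, never the result of the condition.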
* hud: fix compilation warnings in hud_nic_graph_install() | Samuel Pitoiset | 2017-01-30 | 1 | -2/+2
  v2: use PRId64 instead of PRIx64
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Marek Olšák
* gallium/radeon: remove r600_common_context::max_db | Marek Olšák | 2017-01-30 | 3 | -20/+17
  this cleanup is based on the vulkan driver, which seems to do the same thing
  Reviewed-by: Nicolai Hähnle
* winsys/amdgpu: fix ADDR_REGISTER_VALUE::backendDisables | Marek Olšák | 2017-01-30 | 1 | -1/+1
  This would be a fix if the value was used anywhere.
  Reviewed-by: Nicolai Hähnle
* gallium/radeon: clean up r600_query_init_backend_mask | Marek Olšák | 2017-01-30 | 6 | -22/+21
  This just needs to be done for r600g in the screen. We don't need an IB submission for every new context created for GCN.
  Reviewed-by: Nicolai Hähnle
* radeonsi: precompute IA_MULTI_VGT_PARAM values into a table | Marek Olšák | 2017-01-30 | 6 | -72/+163
  The perf difference is very small: 0.99% -> 0.40% for the time spent in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing.
  Reviewed-by: Nicolai Hähnle
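A sketch of the general technique, assuming a hypothetical key of three draw-state bits and a made-up `compute_param()`; the real IA_MULTI_VGT_PARAM fields and key layout are not reproduced. Every key combination is computed once up front, so draw time reduces to packing the key and doing a single table load.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key: one bit per draw-state input that affects the value. */
#define KEY_INSTANCING   (1u << 0)
#define KEY_PRIM_RESTART (1u << 1)
#define KEY_TESS         (1u << 2)
#define NUM_KEYS         (1u << 3)

/* Hypothetical "expensive" computation with made-up bit positions. */
static uint32_t compute_param(unsigned key)
{
   uint32_t v = 0x10;                      /* made-up base value */
   if (key & KEY_INSTANCING)   v |= 1u << 16;
   if (key & KEY_PRIM_RESTART) v |= 1u << 17;
   if (key & KEY_TESS)         v |= 1u << 18;
   return v;
}

static uint32_t param_table[NUM_KEYS];

/* Run once, e.g. at context creation. */
static void init_param_table(void)
{
   for (unsigned key = 0; key < NUM_KEYS; key++)
      param_table[key] = compute_param(key);
}

/* Draw time: pack the key and load, instead of recomputing. */
static uint32_t get_param(bool instancing, bool prim_restart, bool tess)
{
   unsigned key = (instancing ? KEY_INSTANCING : 0) |
                  (prim_restart ? KEY_PRIM_RESTART : 0) |
                  (tess ? KEY_TESS : 0);
   return param_table[key];
}
```

The table grows as 2^(key bits), so this only pays off while the key stays small.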
* radeonsi: move VGT_VERTEX_REUSE_BLOCK_CNTL into shader states for Polaris | Marek Olšák | 2017-01-30 | 4 | -21/+43
  Reviewed-by: Nicolai Hähnle
* radeonsi: state atom IDs don't have to be off by one | Marek Olšák | 2017-01-30 | 2 | -4/+4
  Reviewed-by: Nicolai Hähnle
* radeonsi: use a bitmask for looping over dirty PM4 states | Marek Olšák | 2017-01-30 | 5 | -18/+20
  also move it to draw_vbo, because it should be 0 in most cases
  Reviewed-by: Nicolai Hähnle
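Looping over a dirty bitmask visits only the set bits and does no work at all when the mask is 0. The sketch assumes GCC/Clang's `__builtin_ctz`; `bit_scan()` follows the idea behind Mesa's `u_bit_scan()` but is written out here, and `emit_dirty_states()` is hypothetical.

```c
#include <stdint.h>

/* Return the index of the lowest set bit and clear it.
 * The mask must be non-zero (__builtin_ctz(0) is undefined). */
static inline int bit_scan(uint32_t *mask)
{
   int i = __builtin_ctz(*mask);
   *mask &= *mask - 1;   /* clear the lowest set bit */
   return i;
}

/* Hypothetical emit loop: records which state indices were emitted.
 * When dirty == 0 (the common case) the loop body never runs. */
static int emit_dirty_states(uint32_t dirty, int *emitted, int max)
{
   int n = 0;
   while (dirty && n < max)
      emitted[n++] = bit_scan(&dirty);
   return n;
}
```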
* radeonsi: atomize L2 prefetches | Marek Olšák | 2017-01-30 | 7 | -36/+50
  to move the big conditional statement out of draw_vbo
  Reviewed-by: Nicolai Hähnle
* radeonsi: unbind disabled shader stages to prevent useless L2 prefetches | Marek Olšák | 2017-01-30 | 1 | -0/+6
  Reviewed-by: Nicolai Hähnle
* radeonsi: also prefetch compute shaders | Marek Olšák | 2017-01-30 | 1 | -0/+12
  Reviewed-by: Nicolai Hähnle
* radeonsi: update dirty_level_mask only after the first draw after FB change | Marek Olšák | 2017-01-30 | 3 | -24/+31
  Reviewed-by: Nicolai Hähnle
* gallium/radeon: allow VRAM-only placements again on APUs & recent amdgpu | Marek Olšák | 2017-01-30 | 1 | -0/+4
  Reviewed-by: Nicolai Hähnle
* radeonsi: don't set +fp64-denormals | Marek Olšák | 2017-01-30 | 1 | -1/+1
  it's the default and the name will change to +fp64-fp16-denormals.
  Reviewed-by: Nicolai Hähnle
* radeonsi: remove si_shader_context::param_tess_offchip | Marek Olšák | 2017-01-30 | 2 | -8/+3
  we don't use on-chip tess.
  Reviewed-by: Nicolai Hähnle
* etnaviv: force vertex buffers through the MMU | Lucas Stach | 2017-01-30 | 1 | -1/+4
  This fixes a vertex data corruption issue if some of the vertex streams go through the MMU and some don't.
  Signed-off-by: Lucas Stach
  Tested-by: Philipp Zabel
  Acked-by: Christian Gmeiner
* llvmpipe: Use LLVMDumpModule, not DumpModule. | Bas Nieuwenhuizen | 2017-01-29 | 1 | -1/+1
  Forgot the prefix ...
  Fixes: 0fca80b3db64dc1d004f78e22b9de86a07e9de96
  Signed-off-by: Bas Nieuwenhuizen
* various: Fix missing DumpModule with recent LLVM. | Bas Nieuwenhuizen | 2017-01-29 | 2 | -4/+10
  Since LLVM revision 293359 DumpModule gets only implemented when either a debug build or LLVM_ENABLE_DUMP is set.
  This patch adds a direct replacement for the function for radv and radeonsi. However, as I don't know a good place to put common LLVM code for all three I inlined the implementation for LLVMPipe.
  v2: Use the new code for LLVM 3.4+ instead of LLVM 5+ & fixed indentation
  Signed-off-by: Bas Nieuwenhuizen
  Reviewed-by: Roland Scheidegger
* r600g: use ieee variants of multiplication instructions | Ilia Mirkin | 2017-01-29 | 2 | -18/+19
  This matches the behavior of most other drivers, including nouveau, radeonsi, and i965.
  Signed-off-by: Ilia Mirkin
  Reviewed-by: Nicolai Hähnle
* r600g: add support for optionally using non-IEEE mul ops | Ilia Mirkin | 2017-01-28 | 2 | -4/+18
  Signed-off-by: Ilia Mirkin
  Reviewed-by: Nicolai Hähnle
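The two multiply flavours differ mainly in how zero is treated. As we understand the R600-family ISA documentation, the non-IEEE (DX9-style) MUL returns 0 when either operand is 0, even if the other is infinity or NaN, whereas MUL_IEEE follows IEEE 754, where 0 x inf is NaN and NaN propagates. The C model below only illustrates those semantics; it is not driver or shader-compiler code.

```c
#include <math.h>

/* IEEE 754 multiply: 0 * inf = NaN, NaN operands propagate. */
static float mul_ieee(float a, float b)
{
   return a * b;
}

/* Model of the legacy (DX9-style) multiply: zero times anything is zero. */
static float mul_legacy(float a, float b)
{
   if (a == 0.0f || b == 0.0f)
      return 0.0f;
   return a * b;
}
```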
* vc4: Coalesce into TLB writes as well as VPM/tex. | Eric Anholt | 2017-01-28 | 1 | -1/+5
  This generally cuts an instruction when blending is enabled and we thus have a single instruction generating the color value.
  total instructions in shared programs: 91759 -> 91634 (-0.14%)
  instructions in affected programs: 5338 -> 5213 (-2.34%)
* vc4: Avoid an extra temporary and mov in ffloor/ffract/fceil. | Eric Anholt | 2017-01-28 | 1 | -13/+18
  shader-db results:
  total instructions in shared programs: 92611 -> 91764 (-0.91%)
  instructions in affected programs: 27417 -> 26570 (-3.09%)
  The star is one shader in glmark2's terrain (drops 16% of its instructions), but there are also wins in mupen64plus and glb2.7.
* vc4: Flip the switch to run the GLSL compiler optimization loop once. | Eric Anholt | 2017-01-28 | 1 | -1/+1
  This has almost no effect on shader-db:
  total instructions in shared programs: 92572 -> 92611 (0.04%)
  instructions in affected programs: 4486 -> 4525 (0.87%)
  Looking at 2 of the 7 different shaders that were hurt (all of which were in mupen64), they all appear to be just differences in order of instructions at the NIR level.
  The advantage is that this should significantly reduce time in the compiler.
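The shader-db percentages in these vc4 commits are relative changes, 100 x (after - before) / before. For example, (91634 - 91759) / 91759 = -125 / 91759, about -0.14%. A small helper that reproduces the figures:

```c
/* Relative change in percent, as printed in the shader-db summaries. */
static double percent_change(unsigned before, unsigned after)
{
   return 100.0 * ((double)after - (double)before) / (double)before;
}
```

Applied to the numbers above it gives about -0.136, -2.342, -0.915, -3.089, +0.042 and +0.869, which round to the reported values.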
* nouveau: remove explicit __STDC_FORMAT_MACROS define | Emil Velikov | 2017-01-27 | 1 | -1/+0
  Already handled by the build.
  Signed-off-by: Emil Velikov
  Reviewed-by: Eric Engestrom
  Reviewed-by: Jose Fonseca
* scons: swr: remove explicit __STDC_.*_MACROS defines | Emil Velikov | 2017-01-27 | 1 | -5/+0
  Analogous to previous commits.
  Cc: George Kyriazis
  Signed-off-by: Emil Velikov
  Reviewed-by: Eric Engestrom
  Reviewed-by: Jose Fonseca
* gallium: remove explicit __STDC_.*_MACROS defines | Emil Velikov | 2017-01-27 | 1 | -3/+0
  Analogous to previous commits.
  Signed-off-by: Emil Velikov
  Reviewed-by: Eric Engestrom
  Reviewed-by: Jose Fonseca
* gallivm: remove explicit __STDC_.*_MACROS defines | Emil Velikov | 2017-01-27 | 1 | -8/+0
  Correctly handled by the build systems.
  Cc: Roland Scheidegger
  Signed-off-by: Emil Velikov
  Reviewed-by: Roland Scheidegger
  Reviewed-by: Eric Engestrom
  Reviewed-by: Jose Fonseca
* st/xa: automake: remove duplicate -Wall | Emil Velikov | 2017-01-27 | 1 | -1/+1
  Already handled by configure.ac
  Signed-off-by: Emil Velikov
* svga: remove const qualifier from SVGA3D_vgpu10_GenMips() prototype | Emil Velikov | 2017-01-27 | 1 | -1/+1
  Does not match the function definition or how it's used. Triggers the following warning in AppVeyor
  svga_cmd_vgpu10.c(1301) : warning C4028: formal parameter 2 different from declaration
  Cc: Charmaine Lee
  Signed-off-by: Emil Velikov
* d3dadapter9: automake: include builddir prior to srcdir | Emil Velikov | 2017-01-27 | 1 | -1/+1
  Analogous to previous commit.
  Cc: "12.0 13.0"
  Cc: Axel Davy
  Signed-off-by: Emil Velikov
* st/dri: automake: include builddir prior to srcdir | Emil Velikov | 2017-01-27 | 1 | -1/+1
  Analogous to previous commit.
  Cc: "12.0 13.0"
  Signed-off-by: Emil Velikov
* clover: automake: remove -I$(srcdir) | Emil Velikov | 2017-01-27 | 1 | -2/+1
  Already implicitly handled by the build system.
  Signed-off-by: Emil Velikov
  Tested-by: Aaron Watry
* clover: automake: include builddir prior to srcdir | Emil Velikov | 2017-01-27 | 1 | -1/+1
  Analogous to previous commit.
  Cc: "12.0 13.0"
  Cc: Aaron Watry
  Cc: Francisco Jerez
  Signed-off-by: Emil Velikov
* freedreno: automake: correctly set MKDIR_GEN | Emil Velikov | 2017-01-27 | 1 | -0/+1
  Analogous to previous commit.
  Fixes: 4610e5ef28e "freedreno/ir3: fix sin/cos"
  Cc: "12.0 13.0"
  Cc: Rob Clark
  Cc: Nicolas Dechesne
  Reported-by: Nicolas Dechesne
  Signed-off-by: Emil Velikov
  Tested-by: Nicolas Dechesne
* gallium: enable int64 on radeonsi, llvmpipe, softpipe | Nicolai Hähnle | 2017-01-27 | 3 | -4/+4
  All of these have had support for the TGSI opcodes since before most of the glsl compiler work landed.
  Also update the docs accordingly, including the missing note about i965.
  Reviewed-by: Marek Olšák
* gallium: Add integer 64 capability | Dave Airlie | 2017-01-27 | 17 | -0/+17
  v1.1: move to using a normal CAP. (Marek)
  v2: fill in the cap everywhere
  Signed-off-by: Dave Airlie
  Reviewed-by: Marek Olšák
* clover: Fix build against clang SVN >= r293097 | Michel Dänzer | 2017-01-27 | 1 | -1/+8
  Reviewed-by: Francisco Jerez
* vc4: Use NEON to speed up utile stores on Pi2+. [refs: cros-mesa-17.1.0-r2-vanilla, cros-mesa-17.1.0-r1-vanilla, chadv/cros-mesa-17.1.0-r2-vanilla, chadv/cros-mesa-17.1.0-r1-vanilla] | Eric Anholt | 2017-01-26 | 1 | -5/+50
  Improves 1024x1024 TexSubImage2D by 41.2371% +/- 3.52799% (n=10).
* vc4: Use NEON to speed up utile loads on Pi2. | Eric Anholt | 2017-01-26 | 3 | -18/+115
  We had a lot of memcpy call overhead because gpu_stride wasn't being inlined. But if you split out the stride==8 and stride==16 cases like this code does while still using memcpy, you'd no longer have glibc's NEON memcpy applied at which point we'd be doing 16 uncached reads instead of 64/(NEON memcpy granularity), for about a 30% performance hit.
  By hand writing the assembly, we can get a whole cacheline loaded at a time.
  Unfortunately, NEON intrinsics turned out to be unusable -- they didn't have the vldm instruction available.
  Note that, for now, the NEON code is only enabled when building for ARMv7 (Pi 2+). We may want to do runtime detection for the Raspbian case, in the future.
  Improves 1024x1024 GetTexImage by 208.256% +/- 7.07029% (n=10).
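To make the data layout concrete: as we understand vc4, a utile is 64 bytes, stored as 8 rows of 8 bytes (gpu_stride 8, 8-bit formats) or 4 rows of 16 bytes (gpu_stride 16, wider formats). The portable sketch below splits out the two stride cases with fixed-size memcpy calls, which is exactly the approach the commit message says loses to hand-written NEON on the Pi; it is shown only as a readable reference. `load_utile()` is a hypothetical name and signature.

```c
#include <stdint.h>
#include <string.h>

/* Copy one 64-byte utile from tiled GPU memory into a linear CPU buffer
 * whose rows are cpu_stride bytes apart. */
static void load_utile(void *cpu, uint32_t cpu_stride,
                       const void *gpu, uint32_t gpu_stride)
{
   uint8_t *dst = cpu;
   const uint8_t *src = gpu;

   switch (gpu_stride) {
   case 8:   /* 8 rows of 8 bytes */
      for (int y = 0; y < 8; y++)
         memcpy(dst + y * cpu_stride, src + y * 8, 8);
      break;
   case 16:  /* 4 rows of 16 bytes */
      for (int y = 0; y < 4; y++)
         memcpy(dst + y * cpu_stride, src + y * 16, 16);
      break;
   default:  /* not a utile row size; nothing to copy */
      break;
   }
}
```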
* vc4: Move LT tiling code to a separate file. | Eric Anholt | 2017-01-26 | 4 | -80/+122
  This paves the way for building it twice, with NEON assembly or not.
* vc4: Use unreachable() in an unreachable codepath for tiling. | Eric Anholt | 2017-01-26 | 1 | -4/+2
* gallium/radeon: add VRAM-vis-usage HUD query | Samuel Pitoiset | 2017-01-26 | 5 | -0/+14
  This new query returns the current visible usage of VRAM accessed by the CPU. It will return 0 on radeon because it's unimplemented.
  Signed-off-by: Samuel Pitoiset
  Tested-by: Edmondo Tommasina
  Reviewed-by: Nicolai Hähnle
  Reviewed-by: Marek Olšák
* gallium/radeon: query the CPU accessible size of VRAM | Samuel Pitoiset | 2017-01-26 | 4 | -1/+13
  R600_DEBUG="info" can be used to display that size, as well as the total amount of VRAM/GTT.
  Signed-off-by: Samuel Pitoiset
  Tested-by: Edmondo Tommasina
  Reviewed-by: Nicolai Hähnle
  Reviewed-by: Marek Olšák
* swr: Update fs texture & sampler state logic | George Kyriazis | 2017-01-25 | 1 | -2/+5
  In swr_update_derived() update texture and sampler state on a new fragment shader. GALLIUM_HUD can update fs using a previously bound texture and sampler.
  Reviewed-by: Bruce Cherniak
* gallium/radeon: add a new HUD query for the number of mapped buffers | Samuel Pitoiset | 2017-01-25 | 9 | -0/+18
  Useful when debugging applications which map a ton of buffers and also because we used to run into Linux's limit on the number of simultaneous mmap() calls.
  v2:
  - update the commit message
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Nicolai Hähnle
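A query like this needs a counter that is raised when a buffer gets mapped and lowered when it is unmapped, readable at any time by the HUD. The sketch below is hypothetical: it uses C11 atomics rather than Mesa's own atomic helpers, and `struct screen_stats` and the hook names are made up, since maps can happen from several threads at once.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-screen statistics block. */
struct screen_stats {
   atomic_uint_least64_t num_mapped_buffers;
};

static void stats_init(struct screen_stats *s)
{
   atomic_init(&s->num_mapped_buffers, 0);
}

/* Called when a buffer is actually mmap()ed. */
static void on_buffer_map(struct screen_stats *s)
{
   atomic_fetch_add(&s->num_mapped_buffers, 1);
}

/* Called when the mapping is torn down. */
static void on_buffer_unmap(struct screen_stats *s)
{
   atomic_fetch_sub(&s->num_mapped_buffers, 1);
}

/* What a HUD query would read. */
static uint64_t query_num_mapped_buffers(struct screen_stats *s)
{
   return atomic_load(&s->num_mapped_buffers);
}
```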
* radeonsi: handle first_non_void correctly in si_create_vertex_elements | Marek Olšák | 2017-01-24 | 1 | -3/+3
  This fixes R11G11B10_FLOAT, because it's in the category of "OTHER", meaning that it doesn't have any channel description.
  Cc: 17.0
  Reviewed-by: Nicolai Hähnle
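In Gallium's format helpers, `util_format_get_first_non_void_channel()` returns -1 when a format has no non-void channel, which is the case for "OTHER"-layout formats such as R11G11B10_FLOAT. The trimmed-down model below is hypothetical (its types and names are not Mesa's) and only shows why that -1 must be checked before it is used as an array index.

```c
/* Hypothetical, trimmed-down format description. Packed "OTHER" formats
 * have no usable per-channel description, so every channel is VOID. */
enum channel_type { CHAN_VOID, CHAN_UNSIGNED, CHAN_FLOAT };

struct format_desc {
   enum channel_type channel[4];
};

/* Returns the index of the first non-void channel, or -1 if there is none. */
static int first_non_void_channel(const struct format_desc *d)
{
   for (int i = 0; i < 4; i++)
      if (d->channel[i] != CHAN_VOID)
         return i;
   return -1;
}

/* Check for -1 before indexing; an unchecked use would read channel[-1]. */
static enum channel_type first_channel_type(const struct format_desc *d)
{
   int i = first_non_void_channel(d);
   return i >= 0 ? d->channel[i] : CHAN_VOID;
}
```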