path: root/src/gallium
Commit message | Author | Age | Files | Lines
* vc4: Coalesce into TLB writes as well as VPM/tex. | Eric Anholt | 2017-01-28 | 1 | -1/+5
  This generally cuts an instruction when blending is enabled and we thus have a single instruction generating the color value.
  total instructions in shared programs: 91759 -> 91634 (-0.14%)
  instructions in affected programs: 5338 -> 5213 (-2.34%)
* vc4: Avoid an extra temporary and mov in ffloor/ffract/fceil. | Eric Anholt | 2017-01-28 | 1 | -13/+18
  shader-db results:
  total instructions in shared programs: 92611 -> 91764 (-0.91%)
  instructions in affected programs: 27417 -> 26570 (-3.09%)
  The star is one shader in glmark2's terrain (drops 16% of its instructions), but there are also wins in mupen64plus and glb2.7.
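The commit does not spell out how vc4 lowers these opcodes. As a rough scalar model only, the sketch below assumes a truncate-and-correct lowering: convert to an integer (truncating toward zero, as a float-to-int conversion does), convert back, and step by one when truncation moved the value in the wrong direction. The function names are hypothetical, the model is only valid for inputs that fit in int32_t, and it is not the driver's QIR code.

```c
#include <stdint.h>

/* Truncate toward zero via int conversion; only valid for |x| < 2^31
 * (an assumption of this model). */
static float trunc_model(float x)
{
	return (float)(int32_t)x;
}

/* floor: truncation rounds negative non-integers up, so step down by one. */
float model_ffloor(float x)
{
	float t = trunc_model(x);
	if (x - t < 0.0f)
		t -= 1.0f;
	return t;
}

/* ceil: truncation rounds positive non-integers down, so step up by one. */
float model_fceil(float x)
{
	float t = trunc_model(x);
	if (x - t > 0.0f)
		t += 1.0f;
	return t;
}

/* fract(x) = x - floor(x), computed from the truncation remainder. */
float model_ffract(float x)
{
	float d = x - trunc_model(x);
	if (d < 0.0f)
		d += 1.0f;
	return d;
}
```

Note that ffract reuses the same truncation as ffloor rather than computing floor first and subtracting, which is one way to avoid an extra temporary in this model.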
* vc4: Flip the switch to run the GLSL compiler optimization loop once. | Eric Anholt | 2017-01-28 | 1 | -1/+1
  This has almost no effect on shader-db:
  total instructions in shared programs: 92572 -> 92611 (0.04%)
  instructions in affected programs: 4486 -> 4525 (0.87%)
  Looking at 2 of the 7 different shaders that were hurt (all of which were in mupen64), they all appear to be just differences in order of instructions at the NIR level.
  The advantage is that this should significantly reduce time in the compiler.
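The percentages in the shader-db summaries above are relative changes, (new - old) / old * 100, rounded to two decimals. A small helper (the name is hypothetical) reproduces them:

```c
/* Relative change in percent, as printed in the shader-db summaries. */
double shader_db_pct(unsigned before, unsigned after)
{
	return ((double)after - (double)before) * 100.0 / (double)before;
}
```

For example, shader_db_pct(5338, 5213) is about -2.342, which rounds to the -2.34% reported for the TLB coalescing commit.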
* nouveau: remove explicit __STDC_FORMAT_MACROS define | Emil Velikov | 2017-01-27 | 1 | -1/+0
  Already handled by the build.
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
* scons: swr: remove explicit __STDC_.*_MACROS defines | Emil Velikov | 2017-01-27 | 1 | -5/+0
  Analogous to previous commits.
  Cc: George Kyriazis <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
* gallium: remove explicit __STDC_.*_MACROS defines | Emil Velikov | 2017-01-27 | 1 | -3/+0
  Analogous to previous commits.
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: remove explicit __STDC_.*_MACROS defines | Emil Velikov | 2017-01-27 | 1 | -8/+0
  Correctly handled by the build systems.
  Cc: Roland Scheidegger <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
* st/xa: automake: remove duplicate -Wall | Emil Velikov | 2017-01-27 | 1 | -1/+1
  Already handled by configure.ac.
  Signed-off-by: Emil Velikov <[email protected]>
* svga: remove const qualifier from SVGA3D_vgpu10_GenMips() prototype | Emil Velikov | 2017-01-27 | 1 | -1/+1
  Does not match the function definition or how it's used. Triggers the following warning in AppVeyor:
  svga_cmd_vgpu10.c(1301) : warning C4028: formal parameter 2 different from declaration
  Cc: Charmaine Lee <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
* d3dadapter9: automake: include builddir prior to srcdir | Emil Velikov | 2017-01-27 | 1 | -1/+1
  Analogous to previous commit.
  Cc: "12.0 13.0" <[email protected]>
  Cc: Axel Davy <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
* st/dri: automake: include builddir prior to srcdir | Emil Velikov | 2017-01-27 | 1 | -1/+1
  Analogous to previous commit.
  Cc: "12.0 13.0" <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
* clover: automake: remove -I$(srcdir) | Emil Velikov | 2017-01-27 | 1 | -2/+1
  Already implicitly handled by the build system.
  Signed-off-by: Emil Velikov <[email protected]>
  Tested-by: Aaron Watry <[email protected]>
* clover: automake: include builddir prior to srcdir | Emil Velikov | 2017-01-27 | 1 | -1/+1
  Analogous to previous commit.
  Cc: "12.0 13.0" <[email protected]>
  Cc: Aaron Watry <[email protected]>
  Cc: Francisco Jerez <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
* freedreno: automake: correctly set MKDIR_GEN | Emil Velikov | 2017-01-27 | 1 | -0/+1
  Analogous to previous commit.
  Fixes: 4610e5ef28e "freedreno/ir3: fix sin/cos"
  Cc: "12.0 13.0" <[email protected]>
  Cc: Rob Clark <[email protected]>
  Cc: Nicolas Dechesne <[email protected]>
  Reported-by: Nicolas Dechesne <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
  Tested-by: Nicolas Dechesne <[email protected]>
* gallium: enable int64 on radeonsi, llvmpipe, softpipe | Nicolai Hähnle | 2017-01-27 | 3 | -4/+4
  All of these have had support for the TGSI opcodes since before most of the GLSL compiler work landed.
  Also update the docs accordingly, including the missing note about i965.
  Reviewed-by: Marek Olšák <[email protected]>
* gallium: Add integer 64 capability | Dave Airlie | 2017-01-27 | 17 | -0/+17
  v1.1: move to using a normal CAP. (Marek)
  v2: fill in the cap everywhere
  Signed-off-by: Dave Airlie <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* clover: Fix build against clang SVN >= r293097 | Michel Dänzer | 2017-01-27 | 1 | -1/+8
  Reviewed-by: Francisco Jerez <[email protected]>
* vc4: Use NEON to speed up utile stores on Pi2+. [cros-mesa-17.1.0-r2-vanilla, cros-mesa-17.1.0-r1-vanilla, chadv/cros-mesa-17.1.0-r2-vanilla, chadv/cros-mesa-17.1.0-r1-vanilla] | Eric Anholt | 2017-01-26 | 1 | -5/+50
  Improves 1024x1024 TexSubImage2D by 41.2371% +/- 3.52799% (n=10).
* vc4: Use NEON to speed up utile loads on Pi2. | Eric Anholt | 2017-01-26 | 3 | -18/+115
  We had a lot of memcpy call overhead because gpu_stride wasn't being inlined. But if you split out the stride==8 and stride==16 cases like this code does while still using memcpy, you'd no longer have glibc's NEON memcpy applied, at which point we'd be doing 16 uncached reads instead of 64/(NEON memcpy granularity), for about a 30% performance hit. By hand-writing the assembly, we can get a whole cacheline loaded at a time.
  Unfortunately, NEON intrinsics turned out to be unusable -- they didn't have the vldm instruction available.
  Note that, for now, the NEON code is only enabled when building for ARMv7 (Pi 2+). We may want to do runtime detection for the Raspbian case in the future.
  Improves 1024x1024 GetTexImage by 208.256% +/- 7.07029% (n=10).
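As a portable-C reference for what the NEON paths replace, the sketch below copies one 64-byte utile out of GPU memory into a CPU-side image row by row. It assumes the 8-byte and 16-byte row strides mentioned in the commit (stride==8 and stride==16) and takes the row count as 64 / gpu_stride; the function name, the UTILE_BYTES constant, and the parameter layout are hypothetical, and the real driver code (and its vldm-based assembly) differs.

```c
#include <stdint.h>
#include <string.h>

#define UTILE_BYTES 64 /* bytes in one utile (assumption of this sketch) */

/* Copy a utile stored contiguously at `gpu` into `cpu`, whose rows are
 * `cpu_stride` bytes apart. `gpu_stride` is the bytes per utile row (8 or 16). */
void utile_load_model(uint8_t *cpu, uint32_t cpu_stride,
                      const uint8_t *gpu, uint32_t gpu_stride)
{
	uint32_t rows = UTILE_BYTES / gpu_stride;

	for (uint32_t y = 0; y < rows; y++)
		memcpy(cpu + y * cpu_stride, gpu + y * gpu_stride, gpu_stride);
}
```

Because gpu_stride is a runtime value here, each memcpy is a real call with a variable size, which is the kind of per-row overhead the commit describes; per the commit, specializing the strides while keeping memcpy would lose glibc's NEON memcpy, hence the hand-written assembly.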
* vc4: Move LT tiling code to a separate file. | Eric Anholt | 2017-01-26 | 4 | -80/+122
  This paves the way for building it twice, with NEON assembly or not.
* vc4: Use unreachable() in an unreachable codepath for tiling. | Eric Anholt | 2017-01-26 | 1 | -4/+2
* gallium/radeon: add VRAM-vis-usage HUD query | Samuel Pitoiset | 2017-01-26 | 5 | -0/+14
  This new query returns the current visible usage of VRAM accessed by the CPU. It will return 0 on radeon because it's unimplemented.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: query the CPU accessible size of VRAM | Samuel Pitoiset | 2017-01-26 | 4 | -1/+13
  R600_DEBUG="info" can be used to display that size, as well as the total amount of VRAM/GTT.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* swr: Update fs texture & sampler state logic | George Kyriazis | 2017-01-25 | 1 | -2/+5
  In swr_update_derived(), update texture and sampler state on a new fragment shader. GALLIUM_HUD can update fs using a previously bound texture and sampler.
  Reviewed-by: Bruce Cherniak <[email protected]>
* gallium/radeon: add a new HUD query for the number of mapped buffers | Samuel Pitoiset | 2017-01-25 | 9 | -0/+18
  Useful when debugging applications which map a ton of buffers and also because we used to run into Linux's limit on the number of simultaneous mmap() calls.
  v2:
  - update the commit message
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: handle first_non_void correctly in si_create_vertex_elements | Marek Olšák | 2017-01-24 | 1 | -3/+3
  This fixes R11G11B10_FLOAT, because it's in the category of "OTHER", meaning that it doesn't have any channel description.
  Cc: 17.0 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: don't try to use fast rcp for fdiv | Roland Scheidegger | 2017-01-24 | 1 | -1/+3
  The use of the fast rcp instruction is disabled, and it will always fall back to a division instead (1 / x). Hence, if we get a division opcode, it doesn't make much sense to try to split it into rcp/mul.
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: (trivial) fix ddiv cpu implementation | Roland Scheidegger | 2017-01-24 | 1 | -1/+0
  We can't use the cpu implementation of fdiv, as this one uses a different lp_build_context, which causes an assertion failure. Just use the default fdiv action (there is no fast rcp for doubles which we could potentially use anyway).
  Cc: 17.0 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
* tgsi: implement ddiv opcode | Roland Scheidegger | 2017-01-24 | 1 | -0/+14
  softpipe (along with llvmpipe) claims to support arb_gpu_shader_fp64, so we really need to support that opcode.
  Cc: 17.0 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
* gallium/radeon: undef the very specific UPDATE_COUNTER macro | Samuel Pitoiset | 2017-01-24 | 1 | -5/+9
  Also, wrap this into a do { ... } while (0).
  Suggested by Nicolai.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
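The do { ... } while (0) wrapper makes a multi-statement macro behave like a single statement (safe inside an unbraced if/else), and the #undef keeps a macro that only makes sense inside one function from leaking into the rest of the file. The sketch below shows that pattern; the struct layout, the macro body, and the status-bit positions (MODEL_GUI_ACTIVE, MODEL_TA_BUSY) are illustrative assumptions, not the radeon code.

```c
#include <stdint.h>

struct busy_idle { unsigned busy, idle; };
struct grbm_counters { struct busy_idle gui, ta; };

/* Bit positions are assumptions for illustration only. */
#define MODEL_GUI_ACTIVE (1u << 31)
#define MODEL_TA_BUSY    (1u << 14)

void sample_grbm(struct grbm_counters *c, uint32_t status)
{
	/* Local helper macro: it reads `c` and `status` from this scope,
	 * so it is undefined again at the end of the function. */
#define UPDATE_COUNTER(field, mask)              \
	do {                                     \
		if (status & (mask))             \
			c->field.busy++;         \
		else                             \
			c->field.idle++;         \
	} while (0)

	UPDATE_COUNTER(gui, MODEL_GUI_ACTIVE);
	UPDATE_COUNTER(ta, MODEL_TA_BUSY);
#undef UPDATE_COUNTER
}
```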
* nv50: add support for MUL_ZERO_WINS property | Ilia Mirkin | 2017-01-23 | 5 | -2/+11
  This is simply keyed off the vertex shader, as that's guaranteed to be present in any pipeline.
  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: add support for MUL_ZERO_WINS property | Ilia Mirkin | 2017-01-23 | 4 | -9/+25
  This sets the dnz flag on all the relevant multiplication operations. At emission time, this will only be supported by nvc0+, so nv50 will need a different solution.
  Signed-off-by: Ilia Mirkin <[email protected]>
* st/nine: set the MUL_ZERO_WINS flag when supported | Ilia Mirkin | 2017-01-23 | 1 | -0/+3
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Axel Davy <[email protected]>
* gallium: add PIPE_CAP_TGSI_MUL_ZERO_WINS | Ilia Mirkin | 2017-01-23 | 18 | -0/+19
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Axel Davy <[email protected]>
* gallium: add TGSI_PROPERTY_MUL_ZERO_WINS | Ilia Mirkin | 2017-01-23 | 3 | -3/+18
  This will be useful for proper D3D9 emulation, where this behavior is expected by some shaders.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Axel Davy <[email protected]>
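MUL_ZERO_WINS asks for a multiply in which a zero operand always yields zero, even when the other operand is infinity or NaN, whereas IEEE 754 gives NaN in those cases. The scalar C model below contrasts the two behaviours; the function names are hypothetical, the sign of the zero result is not modelled, and this is not how any driver implements the flag in hardware.

```c
#include <math.h>

/* Plain IEEE 754 multiply: 0 * inf and 0 * NaN give NaN. */
float mul_ieee(float a, float b)
{
	return a * b;
}

/* "Zero wins": any zero operand forces a zero result. */
float mul_zero_wins(float a, float b)
{
	if (a == 0.0f || b == 0.0f)
		return 0.0f;
	return a * b;
}
```

A shader written for D3D9 that multiplies a possibly infinite value by a zero weight expects the second behaviour; with plain IEEE semantics the NaN would propagate into later results.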
* radeonsi: always set the TCL1_ACTION_ENA when invalidating L2 | Marek Olšák | 2017-01-23 | 1 | -1/+2
  Some CIK-VI docs say this is the default behavior on SI. That doesn't answer whether it's also the default behavior on CIK-VI.
  Cc: 17.0 13.0 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't declare LDS in TES | Marek Olšák | 2017-01-23 | 1 | -2/+1
  Not used since we started using the offchip tess ring.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: preload PS inputs only if KILL is used | Marek Olšák | 2017-01-23 | 1 | -2/+6
  This way, most shaders can get lower VGPR usage thanks to lazy input loading.
  I think this is a more accurate constraint that prevents the black transitions in Witcher 2.
  Affected shaders (7758):
  Max Waves: 57437 -> 58231 (1.38 %)
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: adjust the rule for using the LINEAR_ALIGNED layout | Marek Olšák | 2017-01-23 | 1 | -1/+3
  Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: drop all IBs if at least one was rejected within the context | Marek Olšák | 2017-01-23 | 1 | -1/+7
  The corruption is inevitable and hangs are possible too.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: report a rejected IB as a lost context | Marek Olšák | 2017-01-23 | 3 | -0/+14
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add HUD queries for monitoring some hw blocks | Samuel Pitoiset | 2017-01-23 | 4 | -1/+110
  It's also possible to monitor them via performance counters, but the hardware can only use two counters simultaneously. It seems easier to re-use the existing code which reads from MMIO instead of writing a multi-pass approach.
  v2:
  - add new lines after ':'
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: refactor the GRBM counters path | Samuel Pitoiset | 2017-01-23 | 3 | -43/+47
  This will allow exposing more queries in order to know which blocks are busy/idle.
  v2:
  - add new lines after ':'
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* swr: Align query results allocation | George Kyriazis | 2017-01-23 | 2 | -4/+5
  Some query-result struct members are declared as cache-line aligned. Use an aligned malloc, and align the whole struct, to be safe.
  Fixes a crash when compiling with clang.
  CC: <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
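A minimal sketch of the idea, assuming C11: a struct with a cache-line-aligned member has a 64-byte natural alignment, but plain malloc only guarantees alignment suitable for max_align_t (typically 16 bytes), so the allocation itself must be aligned. The struct, its fields, and the 64-byte line size are hypothetical, and aligned_alloc stands in for whatever helper swr actually uses.

```c
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64 /* assumed cache-line size */

/* Hypothetical query-result layout with a cache-line-aligned member. */
struct query_result_model {
	alignas(CACHE_LINE) uint64_t counters[8];
	uint64_t flags;
};

/* Allocate with the struct's own alignment. sizeof is always a multiple
 * of the alignment, which C11 aligned_alloc requires of its size argument.
 * Release the memory with free(). */
struct query_result_model *alloc_query_result(void)
{
	return aligned_alloc(alignof(struct query_result_model),
	                     sizeof(struct query_result_model));
}
```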
* swr: Prune empty nodes in CalculateProcessorTopology. | Bruce Cherniak | 2017-01-23 | 1 | -0/+9
  CalculateProcessorTopology tries to figure out system topology by parsing /proc/cpuinfo to determine the number of threads, cores, and NUMA nodes. There are some architectures where the "physical id" begins with 1 rather than 0, which was creating an empty "0" node and causing a crash in CreateThreadPool.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97102
  Reviewed-By: George Kyriazis <[email protected]>
  CC: <[email protected]>
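swr's topology code is C++ and is not reproduced here. The C sketch below only models the pruning step the commit describes: nodes are indexed by "physical id", so a CPU whose ids start at 1 leaves node 0 with no cores, and dropping such entries (while keeping the order of the rest) avoids handing an empty node to the thread-pool setup. The types and the function are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-node record built while parsing /proc/cpuinfo. */
struct numa_node_model {
	unsigned physical_id;
	unsigned num_cores;
};

/* Compact the array in place, dropping nodes with no cores.
 * Returns the number of nodes kept. */
size_t prune_empty_nodes(struct numa_node_model *nodes, size_t count)
{
	size_t kept = 0;

	for (size_t i = 0; i < count; i++) {
		if (nodes[i].num_cores != 0)
			nodes[kept++] = nodes[i];
	}
	return kept;
}
```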
* st/va: make sure that we call begin_frame() only once v2 | Christian König | 2017-01-23 | 2 | -3/+9
  This fixes "st/va: delay calling begin_frame until we have all parameters".
  v2: call begin frame after decoder (re)creation as well.
  Signed-off-by: Christian König <[email protected]>
  Reviewed-by: Nayan Deshmukh <[email protected]>
  Tested-by: Andy Furniss <[email protected]>
* r600: implement DDIV | Nicolai Hähnle | 2017-01-23 | 1 | -0/+59
  Tested-by: Glenn Kennard <[email protected]>
  Tested-by: James Harvey <[email protected]>
  Cc: 17.0 <[email protected]>
* r600: factor out cayman_emit_unary_double_raw | Nicolai Hähnle | 2017-01-23 | 1 | -20/+42
  We will use it for DDIV.
  Tested-by: Glenn Kennard <[email protected]>
  Tested-by: James Harvey <[email protected]>
  Cc: 17.0 <[email protected]>
* r600: double multiply can handle only one multiply at a time | Nicolai Hähnle | 2017-01-23 | 1 | -17/+19
  It seems clear that trying to multiply two pairs of doubles would result in the temporary register getting overwritten by the second pair. So make the code more explicit.
  Tested-by: Glenn Kennard <[email protected]>
  Tested-by: James Harvey <[email protected]>
  Cc: 17.0 <[email protected]>
* freedreno/a5xx: set frag shader threadsize | Rob Clark | 2017-01-22 | 1 | -2/+7
  Signed-off-by: Rob Clark <[email protected]>
  Cc: "17.0" <[email protected]>