path: root/src/gallium
Commit log (each entry: commit message; author, date, files changed, lines removed/added)
* nvc0: support MP performance counters on Maxwell
  Samuel Pitoiset, 2016-11-10, 3 files, -3/+721

  This adds some performance counters/metrics for SM50/SM52.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Tested-by: Pierre Moreau <[email protected]>
* gallium: detect avx512 cpu features
  Tim Rowley, 2016-11-10, 2 files, -0/+36

  v3: fix check for xmm/ymm test
  v2: style code, add avx512 to cpu dump

  Reviewed-by: Roland Scheidegger <[email protected]>
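The v3 note refers to checking that the OS has actually enabled the xmm/ymm register state, not just that the CPU advertises the feature. As a hedged sketch of the usual checks (bit positions from the Intel SDM; this is not Mesa's u_cpu_detect code, and the function and macro names are hypothetical), deciding whether AVX-512F is usable from raw CPUID and XGETBV values could look like this:

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27) /* OS has enabled XSAVE/XGETBV */
#define CPUID7_EBX_AVX512F (1u << 16) /* AVX-512 Foundation */

/* XCR0 state components that must all be OS-enabled for AVX-512:
 * bit 1 = SSE (xmm), bit 2 = AVX (upper ymm), bit 5 = opmask (k0-k7),
 * bit 6 = ZMM_Hi256 (upper zmm0-15), bit 7 = Hi16_ZMM (zmm16-31). */
#define XCR0_AVX512_STATE 0xE6u

/* cpuid1_ecx: ECX of CPUID leaf 1
 * cpuid7_ebx: EBX of CPUID leaf 7, subleaf 0
 * xcr0:       result of XGETBV with ECX = 0 (only valid if OSXSAVE) */
static bool cpu_has_usable_avx512f(uint32_t cpuid1_ecx, uint32_t cpuid7_ebx,
                                   uint64_t xcr0)
{
    /* Without OSXSAVE, XGETBV cannot be executed and xcr0 is meaningless. */
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return false;
    /* The OS must save/restore xmm, ymm, opmask and all zmm state. */
    if ((xcr0 & XCR0_AVX512_STATE) != XCR0_AVX512_STATE)
        return false;
    return (cpuid7_ebx & CPUID7_EBX_AVX512F) != 0;
}
```

Keeping the decision in a pure function separates the bit logic from the inline assembly needed to read the registers, which makes it easy to test.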
* radeonsi: fix r600_texture::tc_compatible_htile
  Marek Olšák, 2016-11-10, 1 file, -3/+3

  htile_size is now always non-zero if HTILE is allocated.
  It seems to have caused no issues.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: accept is_store in image_fetch_rsrc instead of dcc_off
  Marek Olšák, 2016-11-10, 1 file, -4/+4

  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't rely on tgsi_scan::images_buffers
  Marek Olšák, 2016-11-10, 1 file, -8/+11

  the instruction knows the target

  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: re-order cases in si_get_shader_param
  Marek Olšák, 2016-11-10, 1 file, -28/+28

  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: increase MAX_CONTROL_FLOW_DEPTH AKA MaxIfDepth
  Marek Olšák, 2016-11-10, 1 file, -2/+1

  we don't want to lower deep IFs unconditionally

  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix/silence unused variable warnings in optimized builds
  Nicolai Hähnle, 2016-11-10, 2 files, -3/+3

  I'm leaving num_out_sgpr around since it's not in a fast path, and
  besides the compiler should be able to optimize it away easily. The
  alternative with #if/#endif would be extremely ugly.

  Reviewed-by: Marek Olšák <[email protected]>
* gallivm: fix [IU]MUL_HI regression harder
  Nicolai Hähnle, 2016-11-10, 1 file, -8/+12

  The fix in commit 88f791db75e9f065bac8134e0937e1b76600aa36 was
  insufficient for radeonsi because the vector case was not handled
  properly. It seems piglit only covers the scalar case, unfortunately.

  Fixes GL45-CTS.shader_bitfield_operation.[iu]mulExtended.*

  Reviewed-by: Roland Scheidegger <[email protected]>
* swr: correct setting of independentAlphaBlendEnable
  Ilia Mirkin, 2016-11-09, 1 file, -1/+6

  This setting is for whether color and alpha have different blend
  settings, not for whether blending is enabled on a per-RT basis.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
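To make the distinction concrete, here is a minimal sketch of deriving such a flag from per-render-target blend state. The struct is hypothetical: its field names mirror gallium's `pipe_rt_blend_state`, but the integer values stand in for real blend enums, and this is not the swr code itself.

```c
#include <stdbool.h>

/* Hypothetical per-render-target blend state (placeholder enum values). */
struct rt_blend {
    bool blend_enable;
    int rgb_func, rgb_src_factor, rgb_dst_factor;
    int alpha_func, alpha_src_factor, alpha_dst_factor;
};

/* "Independent alpha blend" means alpha uses a different equation or
 * different factors from the color channels. It says nothing about
 * whether blending is enabled on a per-render-target basis. */
static bool needs_independent_alpha_blend(const struct rt_blend *rt)
{
    return rt->rgb_func != rt->alpha_func ||
           rt->rgb_src_factor != rt->alpha_src_factor ||
           rt->rgb_dst_factor != rt->alpha_dst_factor;
}
```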
* swr: [rasterizer] add a .dir-locals.el to support 4-space indents
  Ilia Mirkin, 2016-11-09, 1 file, -0/+8

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
* swr: set halfz rasterizer setting
  Ilia Mirkin, 2016-11-09, 1 file, -0/+1

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] allow an OpenGL driver to specify halfz clipping
  Ilia Mirkin, 2016-11-09, 2 files, -7/+7

  With ARB_clip_control, GL may also do 0..1 depth clipping, not just
  -1..1. This removes clip's reliance on driver type. DX users will need
  to be updated to set the new clipHalfZ flag to get proper clipping
  functionality.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
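The two clip conventions differ only in the lower depth bound. A minimal sketch of the per-vertex depth check, where the hypothetical `clip_halfz` parameter plays the role of the clipHalfZ flag described above (this is not the swr clipper code):

```c
#include <stdbool.h>

/* Clip-space depth test for one vertex with clip coordinates (z, w).
 * Traditional GL clips to -w <= z <= w; with ARB_clip_control's
 * GL_ZERO_TO_ONE depth mode (and in D3D) the range is 0 <= z <= w. */
static bool depth_inside_clip_volume(float z, float w, bool clip_halfz)
{
    float z_min = clip_halfz ? 0.0f : -w;
    return z >= z_min && z <= w;
}
```

A vertex at z = -0.5, w = 1 is therefore inside the volume for the -1..1 convention but clipped in half-z mode.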
* swr: fix support for inverted depth scales
  Ilia Mirkin, 2016-11-09, 1 file, -7/+3

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] fix logic op to work with unorm/snorm
  Ilia Mirkin, 2016-11-09, 1 file, -17/+65

  Most logic op usage is probably going to end up with normalized
  textures. Scale the floating point values and convert to integer before
  performing the logic operations.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
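The scale-then-operate approach can be sketched for an 8-bit UNORM target as follows. This is a plain C illustration, not the JIT code; the function names are hypothetical and only XOR is shown. An SNORM target would instead scale by 127 after clamping to [-1, 1].

```c
#include <stdint.h>

/* Convert a float to 8-bit UNORM: clamp to [0, 1], scale, round to nearest. */
static uint8_t float_to_unorm8(float f)
{
    if (!(f > 0.0f))        /* also maps NaN to 0 */
        return 0;
    if (f >= 1.0f)
        return 255;
    return (uint8_t)(f * 255.0f + 0.5f);
}

static float unorm8_to_float(uint8_t v)
{
    return (float)v / 255.0f;
}

/* XOR two normalized colors: convert to the render target's integer
 * representation, operate on the bits, then convert back to float. */
static float logic_xor_unorm8(float src, float dst)
{
    uint8_t s = float_to_unorm8(src);
    uint8_t d = float_to_unorm8(dst);
    return unorm8_to_float((uint8_t)(s ^ d));
}
```

Operating directly on the float bit patterns would give meaningless results, which is why the conversion has to happen first.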
* vc4: Clamp the shadow comparison value.
  Eric Anholt, 2016-11-09, 1 file, -0/+9

  Fixes piglit glsl-fs-shadow2D-clamp-z.

  Cc: <[email protected]>
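For fixed-point depth textures, the GL spec clamps the reference value to [0, 1] before comparing. A minimal sketch of a GL_LEQUAL shadow compare with that clamp (hypothetical helper names, not vc4's generated QPU code):

```c
static float clamp01(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* GL_LEQUAL shadow comparison against a fixed-point depth texel.
 * Returns 1.0 when the comparison passes and 0.0 otherwise. Without the
 * clamp, a reference of 1.5 would fail against a stored depth of 1.0. */
static float shadow_compare_lequal(float ref, float texel)
{
    return clamp01(ref) <= texel ? 1.0f : 0.0f;
}
```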
* vc4: Don't pair up TLB scoreboard locking instructions early in QPU sched.
  Eric Anholt, 2016-11-09, 1 file, -0/+12

  Jonas Pfeil noticed that we were putting passthrough tlb_z writes early
  in the shader, despite QIR and QPU scheduling both trying to delay
  scoreboard locking for as long as possible. The problem was that when
  trying to pair up QPU instructions, at some point the passthrough tlb_z
  would be the last one available and it would get paired, even if the
  other half would open up other instructions to be scheduled and we could
  have paired tlb_z with something later in the program. Also, since
  passthrough z is just a mov, it pairs up really easily.

  The proper fix would probably be to flip the order of scheduling
  instructions so we went from bottom to top (also relevant for branch
  delay slot scheduling). However, we can do a quick fix here to just not
  schedule a TLB lock until there's nothing but TLB left in the program,
  at a slight instruction cost (est .61% cycle count in shader-db) but a
  major fragment shader parallelism win.

  glmark2 results:
  texture:texture-filter=linear: +1.24481% +/- 0.626117% (n=15)
  bump:bump-render=height: 1.24991% +/- 0.154793% (n=136,133 -- screensaver outliers removed)
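The quick fix amounts to a constraint on how the scheduler picks the second half of an instruction pair. The following is a heavily simplified, hypothetical model of that rule (the real vc4 scheduler also weighs latencies, register-file conflicts and priorities):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified view of a scheduler candidate. */
struct sched_candidate {
    bool locks_tlb;   /* e.g. a passthrough TLB Z write */
};

/* Pick an instruction from the ready list. When pairing with an
 * already-chosen instruction (pairing == true), refuse TLB-locking
 * candidates so the scoreboard lock is not pulled earlier just because
 * a cheap mov happened to be available. Returns -1 if none qualifies. */
static int choose_candidate(const struct sched_candidate *ready, size_t count,
                            bool pairing)
{
    for (size_t i = 0; i < count; i++) {
        if (pairing && ready[i].locks_tlb)
            continue;
        return (int)i;
    }
    return -1;
}
```

Returning -1 while pairing simply leaves the other half of the instruction empty, which is the small instruction-count cost the commit message mentions.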
* vc4: Print a reg pressure estimate in our reg allocation failure dump.
  Eric Anholt, 2016-11-09, 1 file, -0/+5
* vc4: Don't abort when a shader compile fails.
  Eric Anholt, 2016-11-09, 6 files, -8/+32

  It's much better to just skip the draw call entirely. Getting this
  information out of register allocation will also be useful for
  implementing threaded fragment shaders, which will need to retry
  non-threaded if RA fails.

  Cc: <[email protected]>
* llvmpipe: Fix build after removal of deprecated attribute API v2
  Aaron Watry, 2016-11-09, 2 files, -3/+2

  Applies on top of v3 of Tom's gallivm change.

  v2:
   - Tom Stellard: Use enums instead of strings.

  Reviewed-by: Nicolai Hähnle <[email protected]>
  Signed-off-by: Aaron Watry <[email protected]>
  CC: Tom Stellard <[email protected]>
  CC: Jan Vesely <[email protected]>
* gallivm: Fix build after removal of deprecated attribute API v3
  Tom Stellard, 2016-11-09, 6 files, -52/+138

  v2: Fix adding parameter attributes with LLVM < 4.0.

  v3: Fix typo.
      Fix parameter index.
      Add a gallivm enum for function attributes.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* Revert "draw: use vectorized calculations for fetch"
  Roland Scheidegger, 2016-11-09, 2 files, -282/+159

  Trivial. There's some regressions internally, related to overflow
  behavior. I'll have to look at it at another time, some interactions
  with vsplit/vcache are actually mind-blowing.

  This reverts commit 3fa10ffb496cc4e6d1003891cf0381bb5bec2a74.
* swr: disable logic op when the rt format is float or srgb
  Ilia Mirkin, 2016-11-08, 1 file, -0/+6

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
* swr: fix AND_INVERTED logic op conversion
  Ilia Mirkin, 2016-11-08, 1 file, -1/+1

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
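AND_INVERTED and AND_REVERSE are easy to mix up when translating between enum sets. As a reference sketch of the OpenGL definitions (enumerators listed in GL enum order, GL_CLEAR = 0x1500 through GL_SET = 0x150F; the enum and function names are hypothetical, not swr's or gallium's), where s is the incoming value and d the framebuffer value:

```c
#include <stdint.h>

enum logic_op {
    LOGIC_CLEAR, LOGIC_AND, LOGIC_AND_REVERSE, LOGIC_COPY,
    LOGIC_AND_INVERTED, LOGIC_NOOP, LOGIC_XOR, LOGIC_OR,
    LOGIC_NOR, LOGIC_EQUIV, LOGIC_INVERT, LOGIC_OR_REVERSE,
    LOGIC_COPY_INVERTED, LOGIC_OR_INVERTED, LOGIC_NAND, LOGIC_SET
};

static uint8_t apply_logic_op(enum logic_op op, uint8_t s, uint8_t d)
{
    uint8_t r;
    switch (op) {
    case LOGIC_CLEAR:         r = 0;        break;
    case LOGIC_AND:           r = s & d;    break;
    case LOGIC_AND_REVERSE:   r = s & ~d;   break; /* source AND NOT dest */
    case LOGIC_COPY:          r = s;        break;
    case LOGIC_AND_INVERTED:  r = ~s & d;   break; /* NOT source AND dest */
    case LOGIC_NOOP:          r = d;        break;
    case LOGIC_XOR:           r = s ^ d;    break;
    case LOGIC_OR:            r = s | d;    break;
    case LOGIC_NOR:           r = ~(s | d); break;
    case LOGIC_EQUIV:         r = ~(s ^ d); break;
    case LOGIC_INVERT:        r = ~d;       break;
    case LOGIC_OR_REVERSE:    r = s | ~d;   break;
    case LOGIC_COPY_INVERTED: r = ~s;       break;
    case LOGIC_OR_INVERTED:   r = ~s | d;   break;
    case LOGIC_NAND:          r = ~(s & d); break;
    default:                  r = 0xFF;     break; /* LOGIC_SET */
    }
    return r; /* assignment to uint8_t drops the bits set by promotion */
}
```

With s = 0xF0 and d = 0xCC, AND_INVERTED yields 0x0C while AND_REVERSE yields 0x30, so swapping the two gives visibly wrong results.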
* swr: add support for EXT_depth_bounds_test
  Ilia Mirkin, 2016-11-08, 2 files, -1/+7

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] set depth hottile when depth bounds test enabled
  Ilia Mirkin, 2016-11-08, 1 file, -1/+3

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
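EXT_depth_bounds_test compares the depth value already stored in the depth buffer at the fragment's position, not the fragment's own depth, against an inclusive [zmin, zmax] range. That is presumably why the depth hot tile must be available whenever the test is enabled. A minimal sketch of the per-fragment check (hypothetical helper, not swr code):

```c
#include <stdbool.h>

/* stored_depth is the value read from the depth buffer at this pixel;
 * the fragment is discarded when this returns false. */
static bool depth_bounds_pass(float stored_depth, float zmin, float zmax)
{
    return stored_depth >= zmin && stored_depth <= zmax;
}
```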
* swr: allow alphatest without blend or logicop
  Tim Rowley, 2016-11-08, 1 file, -1/+2

  We need to compile a blend function when alphatest is enabled.

  Reviewed-by: Bruce Cherniak <[email protected]>
* tgsi/scan: turn a huge if-else-if.. chain into a switch statement
  Marek Olšák, 2016-11-08, 1 file, -14/+30

  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: fix images_buffers regression
  Marek Olšák, 2016-11-08, 1 file, -3/+2

  The first IF statement disabled the second one.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98599
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: fix [IU]MUL_HI regression
  Nicolai Hähnle, 2016-11-08, 3 files, -28/+90

  This patch does two things:

  1. It separates the host-CPU code generation from the generic code
  generation. This guards against accidentally breaking things for
  radeonsi in the future.

  2. It makes sure we actually use both arguments and don't just compute
  a square :-p

  Fixes a regression introduced by commit
  29279f44b3172ef3b84d470e70fc7684695ced4b

  Cc: Roland Scheidegger <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
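For reference, UMUL_HI and IMUL_HI return the upper 32 bits of the full 64-bit product of both operands. A scalar C sketch of those semantics (hypothetical function names; gallivm emits LLVM IR, not this code):

```c
#include <stdint.h>

static uint32_t umul_hi(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

static int32_t imul_hi(int32_t a, int32_t b)
{
    /* Shift the unsigned bit pattern to avoid an implementation-defined
     * right shift of a negative value; the final narrowing conversion
     * assumes two's complement, as GCC and Clang define it. */
    uint64_t product = (uint64_t)((int64_t)a * (int64_t)b);
    return (int32_t)(uint32_t)(product >> 32);
}
```

A "square" bug is easy to spot with asymmetric inputs: umul_hi(0x80000000, 2) must be 1, whereas squaring the first operand would give 0x40000000.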
* draw: use vectorized calculations for fetch
  Roland Scheidegger, 2016-11-08, 2 files, -159/+282

  Instead of doing all the math with scalars, use vectors. This means the
  overflow math needs to be done manually, albeit that's only really
  problematic for the stride/index mul, the rest has been pretty much
  moved outside the shader loop (albeit the mul could actually be
  optimized away too), where things are still scalar. Because llvm is
  complete fail with the zero-extend widening mul, roll our own even...
  To eliminate control flow in the main shader loop fetch, provide fake
  buffers (so index 0 is always valid to fetch).
  Still uses aos fetch though in the end - mostly because some more code
  would be needed to handle unaligned fetches in that path, and because
  for most formats it won't make a difference anyway (we generate some
  truly horrendous code for things like R16G16_something for instance).

  Instanced fetch however stays roughly the same as before, except that
  no longer the same element is fetched multiple times (I've seen a
  reduction of ~3 times in main shader loop size due to apparently llvm
  not being able to deduce it's really all the same with a couple
  instanced elements).

  Also, for elts gathering, use vectorized code as well - provide a fake
  elt buffer if there's no valid one bound.

  The generated shaders are smaller and faster to compile (not entirely
  sure about execution speed, but generally unless there's just single
  vertices to handle I would expect it to be faster - there's more
  opportunities for future improvements by using soa fetch).

  No piglit change.

  Reviewed-by: Jose Fonseca <[email protected]>
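The branch-free fetch idea can be modeled in scalar C: every lane computes its byte offset and a validity flag, and invalid lanes are redirected to offset 0 instead of branching. This is a hypothetical model of the per-lane logic only, not the LLVM IR the draw module generates; the 64-bit intermediate stands in for the manual overflow checks a 32-bit vector implementation would need.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 4

/* A lane is valid only if the whole element [index*stride,
 * index*stride + elem_size) lies inside the buffer. Invalid lanes get
 * offset 0, which the fake-buffer scheme makes always safe to fetch. */
static void compute_fetch_offsets(const uint32_t index[LANES], uint32_t stride,
                                  uint32_t elem_size, uint32_t buffer_size,
                                  uint32_t offset[LANES], bool valid[LANES])
{
    for (int i = 0; i < LANES; i++) {
        uint64_t start = (uint64_t)index[i] * stride;
        uint64_t end = start + elem_size;
        valid[i] = end <= buffer_size;
        offset[i] = valid[i] ? (uint32_t)start : 0;
    }
}
```

For example, with a 48-byte buffer, stride 16 and 16-byte elements, indices 0, 1 and 2 are valid, while an index of 0xFFFFFFFF overflows 32 bits and is redirected to offset 0.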
* gallivm: introduce 32x32->64bit lp_build_mul_32_lohi function
  Roland Scheidegger, 2016-11-08, 3 files, -38/+172

  This is used by shader umul_hi/imul_hi functions (and soon by draw).
  It's actually useful separating this out on its own, however the real
  reason for doing it is because we're using an optimized sse2 version,
  since the code llvm generates is atrocious (since there's no widening
  mul in llvm, and it does not recognize the widening mul pattern, so it
  generates code for real 64x64->64bit mul, which the cpu can't do
  natively, in contrast to 32x32->64bit mul which it could do).

  Reviewed-by: Jose Fonseca <[email protected]>
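As the name suggests, the helper produces the low and high 32-bit halves of each lane's 64-bit product. The loop below is a hypothetical scalar reference for a 4-lane vector and only shows the expected result. SSE2's PMULUDQ multiplies only lanes 0 and 2, so an SSE2 version could multiply the even lanes, shift the odd lanes down, multiply those, and then shuffle the halves back into place.

```c
#include <stdint.h>

#define LANES 4

/* Unsigned 32x32->64-bit multiply per lane, split into lo/hi vectors. */
static void mul_32_lohi_u(const uint32_t a[LANES], const uint32_t b[LANES],
                          uint32_t lo[LANES], uint32_t hi[LANES])
{
    for (int i = 0; i < LANES; i++) {
        uint64_t p = (uint64_t)a[i] * b[i];
        lo[i] = (uint32_t)p;
        hi[i] = (uint32_t)(p >> 32);
    }
}
```

A reference like this is handy for checking every lane of a vectorized implementation, including the odd lanes, rather than only the scalar case.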
* nvc0: simplify draw parameters upload for vertex shaders
  Samuel Pitoiset, 2016-11-07, 1 file, -8/+6

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* gallium/hud: protect against and initialization race
  Steven Toth, 2016-11-07, 4 files, -8/+41

  In the event that multiple threads attempt to install a graph
  concurrently, protect the shared list.

  Signed-off-by: Steven Toth <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
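A minimal sketch of protecting a shared list with a mutex, using plain pthreads rather than Mesa's own threading wrappers. The node type and functions are hypothetical, not the HUD code. Nodes are never freed here, which matches the allocate-once approach taken for HUD objects later in this log.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical shared list of installed graphs. */
struct graph_node {
    int id;
    struct graph_node *next;
};

static struct graph_node *graph_list;
static pthread_mutex_t graph_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Insert under the lock so concurrent installers cannot interleave the
 * read of the head pointer with another thread's write. */
static int install_graph(int id)
{
    struct graph_node *node = malloc(sizeof(*node));
    if (!node)
        return -1;
    node->id = id;

    pthread_mutex_lock(&graph_list_lock);
    node->next = graph_list;
    graph_list = node;
    pthread_mutex_unlock(&graph_list_lock);
    return 0;
}

/* Readers take the same lock so they never see a half-updated list. */
static int count_graphs(void)
{
    int n = 0;
    pthread_mutex_lock(&graph_list_lock);
    for (const struct graph_node *g = graph_list; g; g = g->next)
        n++;
    pthread_mutex_unlock(&graph_list_lock);
    return n;
}
```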
* gallium/hud: close a previously opened handle
  Steven Toth, 2016-11-07, 3 files, -1/+6

  We're missing the closedir() to the matching opendir().

  Signed-off-by: Steven Toth <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
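The fix follows the usual POSIX pairing rule: every successful `opendir()` needs a `closedir()` on every exit path, otherwise the directory stream and its file descriptor leak. A generic sketch (a hypothetical helper, not the HUD sensor-scanning code):

```c
#include <dirent.h>
#include <string.h>

/* Count the non-dot entries of a directory.
 * Returns -1 if the directory cannot be opened. */
static int count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (!dir)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        count++;
    }

    closedir(dir); /* the kind of call the commit adds */
    return count;
}
```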
* gallium/hud: fix a problem where objects are free'd while in use.
  Steven Toth, 2016-11-07, 4 files, -55/+0

  Instead of trying to maintain a reference counted list of valid HUD
  objects, and freeing them accordingly, creating race conditions between
  unanticipated multiple threads, simply accept they're allocated once and
  never released until the process terminates.

  They're a shared resource between multiple threads, so accept they're
  always available for use.

  Signed-off-by: Steven Toth <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* clover: Add CL_PROGRAM_BINARY_TYPE support (CL1.2).
  Serge Martin, 2016-11-06, 10 files, -11/+35

  v3 [Francisco Jerez]: Loosely based on Serge's v1 of this patch in
  order to avoid CL-specific enums in the clover module binary format.
  In addition to other changes made in v2: Represent the CL program
  binary type as the section type instead of adding a CL API-specific
  enum, check that the binary types of the input objects are valid during
  clLinkProgram(), pass section type as argument to build_module_library()
  instead of using separate function.

  Reviewed-by: Francisco Jerez <[email protected]>
* clover: add missing clGetDeviceInfo CL1.2 queries
  Serge Martin, 2016-11-06, 3 files, -0/+35

  Reviewed-by: Francisco Jerez <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Vedran Miletić <[email protected]>
* nvc0: get rid of NVE4_COMPUTE_MP_PM_{A,B}_SIGSEL_XXX
  Samuel Pitoiset, 2016-11-05, 1 file, -56/+56

  Instead, hardcode group sigsel because there are a bunch of unknown
  groups, especially on SM50/SM52.

  Signed-off-by: Samuel Pitoiset <[email protected]>
* gm107/ir: emit RED instead of ATOM when no dst
  Samuel Pitoiset, 2016-11-05, 1 file, -1/+28

  This is similar to NVC0 and GK110 emitters where we emit reduction
  operations instead of atomic operations when the destination is not
  used.

  Found after writing some tests which check if performance counters
  return the expected value. In that case, gred_count returned 0 on gm107
  while at least gk106 returned the correct value.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* android: amd/common: add support for libmesa_amd_common
  Mauro Rossi, 2016-11-05, 1 file, -1/+1

  Fixes the following building error introduced with commit 7115e56 and
  related amd/common dependencies:

    external/mesa/src/gallium/drivers/radeonsi/si_shader.c:6861: error: undefined reference to 'ac_is_sgpr_param'
    external/mesa/src/gallium/drivers/radeonsi/si_shader.c:6951: error: undefined reference to 'ac_is_sgpr_param'
    clang++: error: linker command failed with exit code 1 (use -v to see invocation)
    ninja: build stopped: subcommand failed.
    build/core/ninja.mk:148: recipe for target 'ninja_wrapper' failed
    make: *** [ninja_wrapper] Error 1

  Signed-off-by: Marek Olšák <[email protected]>
* winsys/radeon: don't call surface_best for FMASK
  Marek Olšák, 2016-11-05, 1 file, -1/+1

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98518
  Acked-by: Edward O'Callaghan <[email protected]>
* vc4: Use Newton-Raphson on the 1/W write to fix glmark2 terrain.
  Eric Anholt, 2016-11-04, 1 file, -1/+1

  The 1/W was apparently not accurate enough, and we were getting
  sparklies in the distance. The closed driver also did a N-R step here.

  Cc: <[email protected]>
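One Newton-Raphson step on f(x) = 1/x - w gives x1 = x0 * (2 - w * x0). If x0 = (1 - e)/w, then x1 = (1 - e^2)/w, so the relative error roughly squares with each step. That is why a single step on top of a low-precision hardware reciprocal is usually enough. A plain C sketch of the step (hypothetical helper name; vc4 emits the equivalent QPU instructions):

```c
/* Refine an approximate reciprocal x0 of w with one Newton-Raphson step. */
static float recip_newton_step(float w, float x0)
{
    return x0 * (2.0f - w * x0);
}
```

Starting from 0.33 as an estimate of 1/3 (relative error about 1%), one step gives roughly 0.3333 (relative error about 0.01%).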
* vc4: Make sure that vertex shader texture2D() calls use LOD 0.
  Eric Anholt, 2016-11-04, 1 file, -0/+10

  I noticed this while trying to debug glmark2 terrain (which does vertex
  shader texturing, but no mipmaps on its textures sampled from the VS).
* radeonsi: fix vertex fetches for 2_10_10_10 formats
  Nicolai Hähnle, 2016-11-04, 5 files, -6/+78

  The hardware always treats the alpha channel as unsigned, so add a
  shader workaround. This is rare enough that we'll just build a
  monolithic vertex shader.

  The SINT case cannot actually happen in OpenGL, but I've included it for
  completeness since it's just a mix of the other cases.

  Reviewed-by: Marek Olšák <[email protected]>
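Because the hardware returns the 2-bit alpha as unsigned, the signed variants have to reinterpret the field themselves. A hedged C model of that fix-up, assuming the usual layout with alpha in bits 30..31 (not radeonsi's shader code; names are hypothetical):

```c
#include <stdint.h>

/* Assumes alpha occupies the top two bits of the 32-bit element. */
static uint32_t alpha_bits(uint32_t packed)
{
    return packed >> 30;
}

/* SINT/SSCALED: sign-extend the 2-bit field, giving a range of -2..1. */
static int32_t alpha_as_sint(uint32_t packed)
{
    int32_t a = (int32_t)alpha_bits(packed);
    return a >= 2 ? a - 4 : a;
}

/* SNORM: GL's signed-normalized rule max(c / (2^(b-1) - 1), -1) with
 * b = 2 divides by 1 and clamps -2 up to -1. */
static float alpha_as_snorm(uint32_t packed)
{
    int32_t a = alpha_as_sint(packed);
    return a < -1 ? -1.0f : (float)a;
}
```

Without the fix-up, a stored alpha of 0b11 would be read as 3 instead of -1.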
* Revert "st/vdpau: use linear layout for output surfaces"
  Dave Airlie, 2016-11-04, 1 file, -1/+1

  This reverts commit d180de35320eafa3df3d76f0e82b332656530126.

  This is a radeon specific hack that causes problems on nouveau when
  combined with the SHARED flag later. If radeonsi needs a fix for this,
  please fix it in the driver.

  [chk] Using linear surfaces for this makes sense because tiling isn't
  beneficial and the surfaces can potentially be shared with other GPUs
  using the VDPAU OpenGL interop.

  [airlied] I think we need a flag that isn't SHARED/LINEAR that is more
  SHARED_OTHER_GPU.

  [mareko] Does radeonsi need PIPE_BIND_VIDEO_DECODE_OUTPUT that it would
  translate into linear ?

  [mareko] My only concern is decoding performance. If the decoder works
  in 64x1 blocks, tiling will hurt. That's the theory. I don't know how
  the decoder works.

  Cc: 12.0 13.0 <[email protected]>
  Acked-by: Christian König <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
  Tested-by: Ilia Mirkin <[email protected]>
  Tested-by: Nayan Deshmukh <[email protected]> (I+A)
* radeonsi: fix an assertion failure in si_decompress_sampler_color_textures
  Marek Olšák, 2016-11-04, 1 file, -1/+3

  This fixes a crash in Deus Ex: Mankind Divided. Release builds were
  unaffected, so it's not too serious.

  Cc: 11.2 12.0 13.0 <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: enable GLSL 4.50
  Nicolai Hähnle, 2016-11-04, 1 file, -1/+1

  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* gallium/radeon: Multiply bpe by nsamples in surf_winsys_to_drm
  Michel Dänzer, 2016-11-04, 1 file, -2/+5

  For symmetry with surf_drm_to_winsys.

  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: Use flags parameter in radeon_winsys_surface_init
  Michel Dänzer, 2016-11-04, 1 file, -1/+1

  Fixes valgrind warnings about surf_ws->flags being uninitialized while
  starting X.

  Reviewed-by: Nicolai Hähnle <[email protected]>