summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* vc4: Add shader-db dumping of NIR instruction count.Eric Anholt2015-04-011-0/+29
| | | | | | | | I was previously using temporary disables of VC4 optimization to show the benefits of improved NIR optimization, but this can get me quick and dirty numbers for NIR-only improvements without having to add hacks to disable VC4's code (disabling of which might hide ways that the NIR changes would hurt actual VC4 codegen).
* vc4: Convert to consuming NIR.Eric Anholt2015-04-015-720/+707
| | | | | | | | | | | | | | | | | | | NIR brings us better optimization than I would have bothered to write within the driver, developers sharing future optimization work, and the ability to share device-specific lowering code that we and other GLES2-level drivers need. total uniforms in shared programs: 13421 -> 13422 (0.01%) uniforms in affected programs: 62 -> 63 (1.61%) total instructions in shared programs: 39961 -> 39707 (-0.64%) instructions in affected programs: 15494 -> 15240 (-1.64%) v2: Add missing imov support, and assert that there are no dest saturates. v3: Rebase on the target-specific algebraic series. v4: Rebase on gallium-includes-from-NIR changes in mater. v5: Rebase on variables being in lists instead of hash tables. v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which I'm not committing)
* gallium: Add tgsi_to_nir to get a nir_shader for a TGSI shader.Eric Anholt2015-04-013-0/+1454
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will be used by the VC4 driver for doing device-independent optimization, and hopefully eventually replacing its whole IR. It also may be useful to other drivers for the same reason. v2: Add all of the instructions I was relying on tgsi_lowering to remove, and more. v3: Rebase on SSA rework of the builder. v4: Use the NIR ineg operation instead of doing a src modifier. v5: Don't use ineg for fnegs. (infer_src_type on MOV doesn't do what I expect, again). v6: Fix handling of multi-channel KILL_IF sources. v7: Make ttn_get_f() return a swizzle of a scalar load_const, rather than a vector load_const. CSE doesn't recognize that srcs out of those channels are actually all the same. v8: Rebase on nir_builder auto-sizing, make the scalar arguments to non-ALU instructions actually be scalars. v9: Add support for if/loop instructions, additional texture targets, and untested support for indirect addressing on temps. v10: Rebase on master, drop bad comment about control flow and just choose the X channel, use int comparison opcodes in LIT for now, drop unused pipe_context argument.. v11: Fix translation of LRP (previously missed because I mis-translated back out), use nir_builder init helpers. v12: Rebase on master, adding explicit include of mtypes.h to get INTERP_QUALIFIER_* v13: Rebase on variables being in lists instead of hash tables, drop use of mtypes.h in favor of util/pipeline.h. Use Ken's nir_builder swizzle and fmov/imov_alu helpers, drop "struct" in front of nir_builder, use nir_builder directly as the function arg in a lot of cases, drop redundant members of ttn_compile that are also in nir_builder, drop some half-baked malloc failure handling. v14: The indirect uniform src0 should be scalar, not vector (noticed as odd by robclark, confirmed by cwabbott). Apply Ken's review to initialize s->num_uniforms and friends, skip ttn_channel for dot products, and use the simpler discard_if intrinsic. Reviewed-by: Kenneth Graunke <[email protected]> (v13) Acked-by: Rob Clark <[email protected]>
* vc4: Tell shader-db how big our UBOs are, if present.Eric Anholt2015-04-011-0/+6
| | | | I had regressed them for a while with the NIR work.
* gallivm: (trivial) fix the logic deciding if function call should be used...Roland Scheidegger2015-04-011-3/+1
| | | | | Copy and paste bug with the img filter decision. Since there's only 2 different filters anyway just drop this bit.
* egl: add initial EGL_MESA_image_dma_buf_export v2.4Dave Airlie2015-04-011-1/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | At the moment to get an EGL image to a dma-buf file descriptor, you have to use EGL_MESA_drm_image, and then use libdrm to convert this to a file descriptor. This extension just provides an API modelled on EGL_MESA_drm_image, to return a dma-buf file descriptor. v2: update spec for new API proposal add internal queries to get the fourcc back from intel driver. v2.1: add gallium pieces. v2.2: add offsets to spec and API, rename fd->fds, stride->strides in API. rewrite spec a bit more, add some q/a v2.3: add modifiers to query interface and 64-bit type for that (Daniel Stone) specifiy what happens to num fds vs num planes differences. (Chad Versace) v2.4: fix grammar (Daniel Stone) Signed-off-by: Dave Airlie <[email protected]>
* gallivm: do some hack heuristic to disable texture functionsRoland Scheidegger2015-04-011-0/+40
| | | | | | | | | | | | | | We've seen some cases where performance can hurt quite a bit. Technically, the more simple the function the more overhead there is for using a function for this (and the less benefits this provides). Hence don't do this if we expect the generated code to be simple. There's an even more important reason why this hurts performance, which is shaders reusing the same unit with some of the same inputs, as llvm cannot figure out the calculations are the same if they are performned in the function (even just reusing the same unit without any input being the same provides such optimization opportunities though not very much). This is something which would need to be handled by IPO passes however.
* nouveau: synchronize "scratch runout" destruction with the command streamMarcin Ślusarz2015-03-312-19/+37
| | | | | | | | | | | | | | | | | | | | | When nvc0_push_vbo calls nouveau_scratch_done it does not mean scratch buffers can be freed immediately. It means "when hardware advances to this place in the command stream the scratch buffers can be freed". To fix it, just postpone scratch runout destruction after current fence is signalled. The bug existed for a very long time. Nobody noticed, because "scratch runout" code path is rarely executed. Fixes hang at the very beginning of first mission in "Serious Sam 3" on nve7/gk107. It manifested as: nouveau E[ PFIFO][0000:01:00.0] read fault at 0x000a9e0000 [PTE] from GR/GPC0/PE_2 on channel 0x007f853000 [Sam3[17056]] Cc: "10.4 10.5" <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* clover: Return CL_BUILD_ERROR for CL_PROGRAM_BUILD_STATUS when compilation ↵Tom Stellard2015-03-311-0/+2
| | | | | | | | | | | fails v2 v2: - Don't use _errs map Cc: 10.5 10.4 <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* radeonsi/compute: Default to the same PIPE_SHADER_CAP values as other shader ↵Tom Stellard2015-03-311-1/+5
| | | | | | | | | types v2 v2: - Fix typo Reviewed-by: Marek Olšák <[email protected]>
* radeon/vce: implement video usability information supportLeo Liu2015-03-313-0/+59
| | | | | | | | | This will help encoding VUI into the bitstream v2: make backward compatible Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/omx/enc: export framerate to vce driverLeo Liu2015-03-311-4/+4
| | | | | | | The framerate will be used for video usability info support by VCE driver Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* llvmpipe: enable ARB_texture_gatherRoland Scheidegger2015-03-311-2/+3
| | | | | | | | | | | | Just announce support for 4 components. While here also increase the max/min texel offsets (the limit is completely artificial, was chosen because that's what other hardware did, however there's other drivers using larger limits). Over a thousand little piglits skip->pass. v2: update docs/GL3.txt Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: implement TG4 for ARB_texture_gatherRoland Scheidegger2015-03-312-40/+133
| | | | | | | | | | | | | | | | This is quite trivial, essentially just follow all the same code you'd use with linear min/mag (and no mip) filter, then just skip the filtering after looking up the texels in favor of direct assignment of the right channel to the result. (This is though not true for the multi-offset version if we'd want to support it - for this would probably need to do something along the lines of 4x nearest sampling due to the necessity of doing coord wrapping individually per texel.) Supports multi-channel formats. From the SM5 gather cap bit, should support non-constant offsets, plus shadow comparisons (the former untested), but not component selection (should be easy to implement but all this stuff is not really exposable anyway for now). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: add gather support to sampler interfaceRoland Scheidegger2015-03-313-21/+34
| | | | | | Luckily thanks to the revamped interface this is a lot less work now... Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: simplify sampler interfaceRoland Scheidegger2015-03-316-271/+218
| | | | | | | | | | | | | | | | | | This has got a bit out of control with more and more parameters added. Worse, whenever something in there changes all callees have to be updated for that, even though they don't really do much with any parameter in there except pass it on to the actual sampling function. Hence simply put almost everything into a struct. Also instead of relying on some arguments being NULL, be explicit and set this in a key (which is just reused for function generation for simplicity). (The code still relies on them being NULL in the end for now.) Technically there is a minimal functional change here for shadow sampling: if shadow sampling is done is now determined explicitly by the texture function (either sample_c or the gl-style tex func inherit this from target) instead of the static texture state. These two should always match, however. Otherwise, it should generate all the same code. Reviewed-by: Jose Fonseca <[email protected]>
* util/debug: Update MgwHelp link, drop BfdHelp link.Jose Fonseca2015-03-311-10/+2
|
* gallivm: Fix build against LLVM 3.7 SVN r233648Michel Dänzer2015-03-311-0/+5
| | | | Reviewed-by: Jose Fonseca <[email protected]>
* vc4: Drop integer multiplies with 0 to moves of 0.Eric Anholt2015-03-301-0/+8
| | | | | | | | This cleans up more instructions generated by uniform array indexing multiplies. total instructions in shared programs: 39989 -> 39961 (-0.07%) instructions in affected programs: 896 -> 868 (-3.12%)
* vc4: Add a constant folding pass.Eric Anholt2015-03-304-0/+113
| | | | | | | | | | | | This cleans up some pointless operations generated by the in-driver mul24 lowering (commonly generated by making a vec4 index for a matrix in a uniform array). I could fill in other operations, but pretty much anything else ought to be getting handled at the NIR level, I think. total uniforms in shared programs: 13423 -> 13421 (-0.01%) uniforms in affected programs: 346 -> 344 (-0.58%)
* vc4: Don't bother masking out the low 24 bits for integer multipliesEric Anholt2015-03-301-12/+8
| | | | | | | | | | The hardware just uses the low 24 lines, saving us an AND to drop the high bits. total uniforms in shared programs: 13433 -> 13423 (-0.07%) uniforms in affected programs: 356 -> 346 (-2.81%) total instructions in shared programs: 40003 -> 39989 (-0.03%) instructions in affected programs: 910 -> 896 (-1.54%)
* vc4: Make integer multiply use 24 bits for the low parts.Eric Anholt2015-03-301-5/+5
| | | | | The hardware uses the low 24 bits in integer multiplies, so we can have fewer high bits (and so probably drop them more frequently).
* radeonsi: Cache LLVMTargetMachineRef in context instead of in screenMichel Dänzer2015-03-306-30/+41
| | | | | | | | | | Fixes a crash in genymotion with several threads compiling shaders concurrently. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89746 Cc: 10.5 <[email protected]> Reviewed-by: Tom Stellard <[email protected]>
* freedreno/a3xx: add support for point sprite coordinate replacementIlia Mirkin2015-03-284-30/+28
| | | | | | | This does not (yet) support different coordinate origins, so the tests still fail due to fbo flipping. Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a3xx: make vs-set point size workIlia Mirkin2015-03-283-2/+10
| | | | | | | | | | This appears to need the A2XX version of the point list, so select it at draw time if necessary. Experimentally, always using the A2XX version causes hangs when PSIZE isn't actually emitted. Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a3xx: point size should not be divided by 2Ilia Mirkin2015-03-282-5/+5
| | | | | | | | | | | The division is probably a holdover from the days when the fixed point inline functions generated by headergen were broken. Also reduce the maximum point size to 4092 (vs 4096), which is what the blob does. Cc: "10.4 10.5" <[email protected]> Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a3xx: fix 3d texture layoutIlia Mirkin2015-03-282-7/+16
| | | | | | | | | | | | | The SZ2 field contains the layer size of a lower miplevel. It only contains 4 bits, which limits the maximum layer size it can describe. In situations where the next miplevel would be too big, the hardware appears to keep minifying the size until it hits one of that size. Unfortunately the hardware's ideas about sizes can differ from freedreno's which can still lead to issues. Minimize those by stopping to minify as soon as possible. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.4 10.5" <[email protected]>
* freedreno/a3xx: LAYERSZ2 appears to have no effect on arraysIlia Mirkin2015-03-281-2/+1
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* llvmpipe: simplify address calculation for 4x4 blocksRoland Scheidegger2015-03-284-76/+35
| | | | | | | | | | | | | | These functions looked quite complicated, even though what they actually did was trivial (ever since we dropped swizzled rendering). Also drop lookup of format block per bytes done for each block, and do it once per scene instead. This improves everybody's favorite "benchmark" by 3% or so, though lp_rast_shade_quads_all() which calls this shows up still quite high for a function which does little more than call the jit function. (This would most likely be much better handled by the jit function itself, the strides are passed through anyway already, though for being able to handle layers it would definitely add some complexity.) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: fix texture function name (key) when using txf/ldRoland Scheidegger2015-03-281-2/+5
| | | | | | | | | | When using the texel fetch functions rather than ordinary texturing, the arguments are all int vecs instead of float vecs, not to mention the actual function would look completely different. Hence this must be included in the texture function name (which serves as the key) otherwise things crash badly when a shader accesses the same texture and sampler unit with both txf/ld and ordinary texturing instructions with otherwise matching keys.
* nv50/ir/gk110: fix offset flag position for TXD opcodeIlia Mirkin2015-03-271-0/+1
| | | | | Cc: "10.4 10.5" <[email protected]> Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: take postFactor into account when doing peephole optimizationsIlia Mirkin2015-03-271-4/+8
| | | | | | | | | | Multiply operations can have a post-factor on them, which other ops don't support. Only perform the peephole optimizations when there is no post-factor involved. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89758 Cc: "10.4 10.5" <[email protected]> Signed-off-by: Ilia Mirkin <[email protected]>
* gallivm: Fix build since llvm r233411Jan Vesely2015-03-271-0/+4
| | | | | Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Tom Stellard <[email protected]>
* gallivm: use llvm function calls for texturing instead of inliningRoland Scheidegger2015-03-272-26/+438
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are issues with inlining everything, most notably llvm will use much more memory (and be slower) when compiling. Ideally we'd probably use functions for shader functions too but texture sampling usually is responsible for quite some IR (it can easily reach 80% of total IR instructions) so this seems like a good start. This still generates a different function for all different combinations just like before, however it is possible llvm is missing some optimization opportunities - it is believed though such opportunities should be somewhat rare, but at least for now it can still be switched off (at compile time only). It should probably make compiled code also smaller because the same function should be used for different variants in the same module (so for the opaque/partial or linear/elts variants). No piglit change (though it does indeed speed up unrealistic tests like fp-indirections2 by a factor of 30 or so). Has a small negative performance impact in openarena - I suspect this could be fixed by running some IPO passes (despite the private linkage, llvm right now does NO optimization at all wrt anything going past the call, even if there's just one caller - so things like values stored before the call and then always written by the function etc. will not be optimized away, nor will dead arguments (which we mostly shouldn't have) be eliminated, always constant arguments promoted etc.). v2: use proper return values instead of pointer function arguments. llvm supports aggregate return values, which do wonders here eliminating unnecessary stack variables - everything in fact will be returned in registers even without any IPO optimizations. It makes the code simpler too. With this I could not measure a peformance impact in openarena any longer (though since there's still no constant value propagation etc. into the tex functions this does not mean it couldn't have a negative impact elsewhere). v3: fix some minor issues suggested by Jose, and do disassembly (and the profiling) without hacks. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: pass jit_context pointer through to samplingRoland Scheidegger2015-03-2711-112/+171
| | | | | | | | | | | | | | The callbacks used for getting the dynamic texture/sampler state were using the jit_context from the generated jit function. This works just fine, however that way it's impossible to generate separate functions for texture sampling, as will be done in the next commit. Hence, pass this pointer through all interfaces so it can be passed to a separate function (technically, it would probably be possible to extract this pointer from the current function instead, but this feels hacky and would probably require some more hacks if we'd use real functions instead of inlining all shader functions at some point). There should be no difference in the generated code for now. Reviewed-by: Jose Fonseca <[email protected]>
* gallium/vl: partially revert "Use util_cpu_to_le{16,32} in many more places."Christian König2015-03-271-1/+5
| | | | | | | | The data in memory is in big endian format and needs to be converted into CPU byte order. So the patch actually reversed what needs to be done. Signed-off-by: Christian König <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* tgsi: fix out-of-bounds access for cube arraysIlia Mirkin2015-03-261-1/+1
| | | | | | | | | | The CUBE_ARRAY case uses r[4]. Make sure that the stack variable is there. Noticed by Coverity. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium/util: remove u_linkageIlia Mirkin2015-03-265-220/+0
| | | | | | | | Does not appear to be used in tree. Coverity spotted some errors in the bitmask stuff, but the whole thing appears to be unused. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium/hud: avoid overflowing hud graph name sizeIlia Mirkin2015-03-261-1/+2
| | | | | | | Spotted by Coverity. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/util: Use HAVE___BUILTIN_FFS* macros.Jonathan Gray2015-03-241-2/+16
| | | | | | | | | Make use of the builtin ffs macros and split out ffsll to a seperate block. Needed for at least OpenBSD which does not have ffsll in libc. Signed-off-by: Jonathan Gray <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* vc4: Add a dump-the-surface-contents routine.Eric Anholt2015-03-242-0/+101
| | | | | This has been useful once again while trying to debug stride issues between render targets and texturing.
* vc4: Allow DRI3 on simulation, as well.Eric Anholt2015-03-241-0/+5
| | | | The problem I'd seen before seems to be gone.
* vc4: Fix pitch alignment of linear textures.Eric Anholt2015-03-241-1/+1
| | | | | Fixes some non-power-of-two texture rendering when I force ARGB8888 to raster.
* vc4: Write the alignment of level width consistently in validation.Eric Anholt2015-03-241-2/+2
| | | | | | 16 / cpp happens to be the same as utile_w on the only raster format supported (4 bytes per pixel), but simulator/hw source code generally talks in terms of utiles.
* vc4: Fix use of a bool as an enum.Eric Anholt2015-03-241-1/+1
| | | | The enum compared to was 0, so it worked out, but it sure looked wrong.
* vc4: Decide the HW's format before laying out the miptree.Eric Anholt2015-03-241-3/+3
| | | | | | I'm experimenting with a workaround for raster texture misrendering on hardware, and this lets me look at the format chosen when computing strides.
* vc4: Use our device-specific ioctls for create/mmap.Eric Anholt2015-03-241-15/+36
| | | | | | They don't do anything special for us, but I've been told by kernel maintainers that relying on dumb for my acceleration-capable buffers is not OK.
* vc4: Make a new #define for making code conditional on the simulator.Eric Anholt2015-03-243-15/+25
| | | | | | I'd like to compile as much of the device-specific code as possible when building for simulator, and using if (using_simulator) instead of ifdefs helps.
* vc4: Add some useful debug printfs for miptrees.Eric Anholt2015-03-241-0/+37
| | | | I keep rewriting these.
* Revert "nv50,nvc0: remove bogus 64_FLOAT formats"Ilia Mirkin2015-03-231-0/+5
| | | | | | | | | | | | | | | This reverts commit 20346808cf4f1ee4f320afaf18f94043fb146f2e. The conversion is actually done since these are the *B macro variants and no vtx format is supplied, which makes them go through the translate module. This restores the following piglit tests to passing: draw-vertices user gl-2.0-vertexattribpointer Signed-off-by: Ilia Mirkin <[email protected]>