summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* gallivm: don't use control flow when doing indirect constant buffer lookupsRoland Scheidegger2015-04-091-2/+4
| | | | | | | | | | | | | | | | | | | | | | | llvm goes crazy when doing that, using way more memory and time, though there's probably more to it - this points to a very much similar issue as fixed in 8a9f5ecdb116d0449d63f7b94efbfa8b205d826f. In any case I've seen a quite plain looking vertex shader with just ~50 simple tgsi instructions (but with a dozen or so such indirect constant buffer lookups) go from a terribly high ~440ms compile time (consuming 25MB of memory in the process) down to a still awful ~230ms and 13MB with this fix (with llvm 3.3), so there's still obvious improvements possible (but I have no clue why it's so slow...). The resulting shader is most likely also faster (certainly seemed so though I don't have any hard numbers as it may have been influenced by compile times) since generally fetching constants outside the buffer range is most likely an app error (that is we expect all indices to be valid). It is possible this fixes some mysterious vertex shader slowdowns we've seen ever since we are conforming to newer apis at least partially (the main draw loop also has similar looking conditionals which we probably could do without - if not for the fetch at least for the additional elts condition.) v2: use static vars for the fake bufs, minor code cleanups Reviewed-by: Jose Fonseca <[email protected]>
* r600g/sb: Enable SB for geometry shadersGlenn Kennard2015-04-0811-16/+55
| | | | | | | | | | | | | | | | Add SV_GEOMETRY_EMIT special variable type to track the implicit dependencies between CUT/EMIT_VERTEX/MEM_RING instructions so GCM/scheduler doesn't reorder them. Mark emit instructions as unkillable so DCE doesn't eat them. Enable only for evergreen/cayman as there are a few unexplained GS piglit regressions on R6xx/R7xx with SB enabled otherwise. Signed-off-by: Glenn Kennard <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600g/sb: Update last_cf for loopsGlenn Kennard2015-04-081-0/+8
| | | | | | | | | | | CF_END could end up emitted in the middle of a shader on cayman when there was a loop at the very end. Fixes glsl-1.50-geometry-end-primitive and ext_transform_feedback-geometry-shaders-basic piglit tests. Signed-off-by: Glenn Kennard <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* nv50,nvc0: limit the y-tiling of 3d textures to the first level's tilingIlia Mirkin2015-04-063-10/+13
| | | | | | | | | | | | | | We limit y-tiling to 0x20 when depth is involved. However the function is run for each miplevel, and the hardware expects miplevel 0 to have the highest tiling settings. Perform the y-tiling limit on all levels of a 3d texture, not just the ones that have depth. Fixes: texelFetch fs sampler3D 98x129x1-98x129x9 Signed-off-by: Ilia Mirkin <[email protected]> Tested-by: Nick Tenney <[email protected]> # GT216 Cc: "10.4 10.5" <[email protected]>
* r600g: fix op3 abs issueDave Airlie2015-04-071-17/+34
| | | | | | | | | | | This code to handle absolute values on op3 srcs was a bit too simple, it really needs a temp reg per src, not one per channel, make it easier and let sb clean up the mess. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89831 Reviewed-by: Glenn Kennard <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* freedreno/ir3: add NIR compilerRob Clark2015-04-057-5/+1762
| | | | | | | | | | | | | | | | The NIR compiler frontend is an alternative to the TGSI f/e, producing the same ir3 IR and using the same backend passes for scheduling, etc. It is not enabled by default yet, as there are still some regressions. To enable, use 'FD_MESA_DEBUG=nir'. It is enough to use with, for example, xonotic or supertuxkart. With the NIR f/e, scalarizing and a number of other lowering steps happen in NIR, so we don't have to do them in ir3. Which simplifies the f/e and allows the lowered instructions to pass through other optimization stages. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: don't decode srgb on mem2gmemIlia Mirkin2015-04-051-6/+12
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a3xx: pass sprite coord mode through to program emitIlia Mirkin2015-04-053-1/+4
| | | | | | | | Use the correct sprite replacement depending on the flip of the coord mode, using either T or 1-T depending on whether we have an upper-left or lower-left coordinate origin. This fixes all the point sprite piglits. Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a3xx: add UBO supportIlia Mirkin2015-04-056-38/+132
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/ir3: insert nop between sfu/mem operationsIlia Mirkin2015-04-051-1/+6
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: dirty context when reallocating a bound boIlia Mirkin2015-04-051-0/+40
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: keep track of buffer valid rangesIlia Mirkin2015-04-052-0/+27
| | | | | | | | Copies nouveau_buffer and radeon_buffer. This allows a write to proceed to an uninitialized part of a buffer even when the GPU is using the previously-initialized portions. Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: mark resources as being read so that writes flush the queueIlia Mirkin2015-04-055-1/+59
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: don't bother setting resource timestampsIlia Mirkin2015-04-051-9/+0
| | | | | | | Waiting on a bo being ready is handled in fd_bo_cpu_prep. No need to keep separate timestamps around. Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: add a reading flag to indicate gpu is reading rscIlia Mirkin2015-04-052-2/+3
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: fix resource flushing confusionIlia Mirkin2015-04-051-14/+10
| | | | | | | | | A resource flush is an upload of a hypothetically-staging texture to the GPU. For a UMA system, this will largely be a no-op or cache-maintenance. Move the render flush logic into transfer_map where it belongs, and clear out the transfer_flush function. Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: remove tex_resourceIlia Mirkin2015-04-059-11/+3
| | | | | | | pipe_sampler_view already contains a texture, remove the redundant tex_resource member which pointed at the same thing. Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/ir3: handle FRAG IN's without interpolation specifiedRob Clark2015-04-051-7/+15
| | | | | | Fallback to picking based on semantic name. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3/cmdline: add @const headers for immediatesRob Clark2015-04-051-0/+9
| | | | | | | | | | Since NIR f/e currently encodes immediates in instructions (rather than passing via const), we need to ensure that when const's are used the get initialized to the proper values. Otherwise comparing NIR to TGSI compiler, it will use proper immediate values in one case, and randomly initialize values in the other. Which confuses ir3test. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3/cmdline: remove hack for old compilerRob Clark2015-04-051-23/+0
| | | | | | Since we dropped the old compiler, we don't need this hack anymore. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: handle const/immed/abs/neg in cpRob Clark2015-04-053-31/+314
| | | | | | | | | | | | | Be smarter about propagating copies from const or immed, or with abs/neg modifiers. Also, realize that absneg.s and absneg.f are really "fancy" mov instructions. This opens up the possibility to remove more copies. It helps the TGSI frontend a bit, but will be really needed for the NIR f/e which builds everything up in SSA form (ie. will *always* insert a mov from const or immediate). Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: split float/int abs/negRob Clark2015-04-055-64/+213
| | | | | | | | | | | | Even though in the end, they map to the same bits, the backend will need to be able to differentiate float abs/neg vs integer abs/neg. Rather than making the backend figure it out based on instruction opcode (which when combined with mov/absneg instructions, can be awkward), just split out different flags for each so the frontend can signal it's intentions more clearly. Also, since (neg) for bitwise op's is actually a bitwise- not, split it out into bnot flag. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: add ir3 builder helpersRob Clark2015-04-053-4/+162
| | | | | | | | Add helpers for constructing SSA forms of instructions. Only partial cat5/cat6 coverage.. but we can add stuff as needed. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix sam argument order commentRob Clark2015-04-051-1/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* xa: support for drivers which use NIRRob Clark2015-04-051-0/+4
| | | | | | | | | | We need to pull in libnir.la and it's dependency libglsl_util.la. Also, _mesa_error_no_memory() must be defined. Fortunately with libnir.la (vs pulling in all of libglsl.la) we don't also need libstdc++. Signed-off-by: Rob Clark <[email protected]>
* nv50: allocate more offset space for occlusion queriesIlia Mirkin2015-04-041-5/+5
| | | | | | | | | | | | | | | Commit 1a170980a09 started writing to q->data[4]/[5] but kept the per-query space at 16, which meant that in some cases we would write past the end of the buffer. Rotate by 32, like nvc0 does. This ensures that we always have 32 bytes in front of us, and the data writes will go within the allocated space. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89679 Signed-off-by: Ilia Mirkin <[email protected]> Tested-by: Nick Tenney <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Tobias Klausmann <[email protected]> Cc: "10.4 10.5" <[email protected]>
* nv50/ir: avoid folding immediates into imad operationsIlia Mirkin2015-04-021-1/+2
| | | | | | | | Commit 09ee907266 added logic to fold immediates into mad operations, but the emission code is only there for fmad. Only allow it on float types. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix imad emission when dst == src2Ilia Mirkin2015-04-021-1/+1
| | | | | | | Commit fb63df22151f added 4-byte mad support, but only supported emission for floats. Disable it for ints for now. Signed-off-by: Ilia Mirkin <[email protected]>
* vc4: Add support for nir_iabs.Eric Anholt2015-04-021-0/+5
| | | | | Tested using the GLSL 1.30 tests for integer abs(). Not currently used, but it was one of the new opcodes used by robclark's idiv lowering.
* freedreno/a3xx: add MRT supportIlia Mirkin2015-04-028-138/+219
| | | | | | | The hardware only supports 4 MRTs. It should be possible to emulate support for 8, but doesn't seem worth the trouble. Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: convert blit program to array for each number of rtsIlia Mirkin2015-04-0212-21/+45
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: add support for laying out MRTs in gmemIlia Mirkin2015-04-022-16/+43
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: add core infrastructure support for MRTsIlia Mirkin2015-04-024-8/+14
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/ir3: add support for FS_COLOR0_WRITES_ALL_CBUFS propertyIlia Mirkin2015-04-022-1/+10
| | | | | | | This will enable the driver to tell which regids to link up to which MRT outputs. Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a3xx: add independent blend function supportIlia Mirkin2015-04-022-8/+9
| | | | | | This is needed for MRT support Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: remove alpha key from ir3_shaderIlia Mirkin2015-04-029-42/+8
| | | | | | | This complication is unnecessary and makes MRTs more complicated and likely to generate tons of variants. Signed-off-by: Ilia Mirkin <[email protected]>
* i915g: Implement EGL_EXT_image_dma_buf_importStéphane Marchesin2015-04-012-1/+2
| | | | | | | This adds all the plumbing to get EGL_EXT_image_dma_buf_import in i915g. Signed-off-by: Stéphane Marchesin <[email protected]>
* vc4: Add shader-db dumping of NIR instruction count.Eric Anholt2015-04-011-0/+29
| | | | | | | | I was previously using temporary disables of VC4 optimization to show the benefits of improved NIR optimization, but this can get me quick and dirty numbers for NIR-only improvements without having to add hacks to disable VC4's code (disabling of which might hide ways that the NIR changes would hurt actual VC4 codegen).
* vc4: Convert to consuming NIR.Eric Anholt2015-04-015-720/+707
| | | | | | | | | | | | | | | | | | | NIR brings us better optimization than I would have bothered to write within the driver, developers sharing future optimization work, and the ability to share device-specific lowering code that we and other GLES2-level drivers need. total uniforms in shared programs: 13421 -> 13422 (0.01%) uniforms in affected programs: 62 -> 63 (1.61%) total instructions in shared programs: 39961 -> 39707 (-0.64%) instructions in affected programs: 15494 -> 15240 (-1.64%) v2: Add missing imov support, and assert that there are no dest saturates. v3: Rebase on the target-specific algebraic series. v4: Rebase on gallium-includes-from-NIR changes in mater. v5: Rebase on variables being in lists instead of hash tables. v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which I'm not committing)
* vc4: Tell shader-db how big our UBOs are, if present.Eric Anholt2015-04-011-0/+6
| | | | I had regressed them for a while with the NIR work.
* nouveau: synchronize "scratch runout" destruction with the command streamMarcin Ślusarz2015-03-312-19/+37
| | | | | | | | | | | | | | | | | | | | | When nvc0_push_vbo calls nouveau_scratch_done it does not mean scratch buffers can be freed immediately. It means "when hardware advances to this place in the command stream the scratch buffers can be freed". To fix it, just postpone scratch runout destruction after current fence is signalled. The bug existed for a very long time. Nobody noticed, because "scratch runout" code path is rarely executed. Fixes hang at the very beginning of first mission in "Serious Sam 3" on nve7/gk107. It manifested as: nouveau E[ PFIFO][0000:01:00.0] read fault at 0x000a9e0000 [PTE] from GR/GPC0/PE_2 on channel 0x007f853000 [Sam3[17056]] Cc: "10.4 10.5" <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* radeonsi/compute: Default to the same PIPE_SHADER_CAP values as other shader ↵Tom Stellard2015-03-311-1/+5
| | | | | | | | | types v2 v2: - Fix typo Reviewed-by: Marek Olšák <[email protected]>
* radeon/vce: implement video usability information supportLeo Liu2015-03-313-0/+59
| | | | | | | | | This will help encoding VUI into the bitstream v2: make backward compatible Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* llvmpipe: enable ARB_texture_gatherRoland Scheidegger2015-03-311-2/+3
| | | | | | | | | | | | Just announce support for 4 components. While here also increase the max/min texel offsets (the limit is completely artificial, was chosen because that's what other hardware did, however there's other drivers using larger limits). Over a thousand little piglits skip->pass. v2: update docs/GL3.txt Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: simplify sampler interfaceRoland Scheidegger2015-03-311-26/+7
| | | | | | | | | | | | | | | | | | This has got a bit out of control with more and more parameters added. Worse, whenever something in there changes all callees have to be updated for that, even though they don't really do much with any parameter in there except pass it on to the actual sampling function. Hence simply put almost everything into a struct. Also instead of relying on some arguments being NULL, be explicit and set this in a key (which is just reused for function generation for simplicity). (The code still relies on them being NULL in the end for now.) Technically there is a minimal functional change here for shadow sampling: if shadow sampling is done is now determined explicitly by the texture function (either sample_c or the gl-style tex func inherit this from target) instead of the static texture state. These two should always match, however. Otherwise, it should generate all the same code. Reviewed-by: Jose Fonseca <[email protected]>
* vc4: Drop integer multiplies with 0 to moves of 0.Eric Anholt2015-03-301-0/+8
| | | | | | | | This cleans up more instructions generated by uniform array indexing multiplies. total instructions in shared programs: 39989 -> 39961 (-0.07%) instructions in affected programs: 896 -> 868 (-3.12%)
* vc4: Add a constant folding pass.Eric Anholt2015-03-304-0/+113
| | | | | | | | | | | | This cleans up some pointless operations generated by the in-driver mul24 lowering (commonly generated by making a vec4 index for a matrix in a uniform array). I could fill in other operations, but pretty much anything else ought to be getting handled at the NIR level, I think. total uniforms in shared programs: 13423 -> 13421 (-0.01%) uniforms in affected programs: 346 -> 344 (-0.58%)
* vc4: Don't bother masking out the low 24 bits for integer multipliesEric Anholt2015-03-301-12/+8
| | | | | | | | | | The hardware just uses the low 24 lines, saving us an AND to drop the high bits. total uniforms in shared programs: 13433 -> 13423 (-0.07%) uniforms in affected programs: 356 -> 346 (-2.81%) total instructions in shared programs: 40003 -> 39989 (-0.03%) instructions in affected programs: 910 -> 896 (-1.54%)
* vc4: Make integer multiply use 24 bits for the low parts.Eric Anholt2015-03-301-5/+5
| | | | | The hardware uses the low 24 bits in integer multiplies, so we can have fewer high bits (and so probably drop them more frequently).
* radeonsi: Cache LLVMTargetMachineRef in context instead of in screenMichel Dänzer2015-03-306-30/+41
| | | | | | | | | | Fixes a crash in genymotion with several threads compiling shaders concurrently. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89746 Cc: 10.5 <[email protected]> Reviewed-by: Tom Stellard <[email protected]>