summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* tgsi/scan: scan texture offset operandsMarek Olšák2016-10-241-0/+16
| | | | | | This seems important considering how much we depend on some of the flags. Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: move src operand processing into a separate functionMarek Olšák2016-10-241-171/+183
| | | | | | the next commit will need this Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: get information about shader buffer usageMarek Olšák2016-10-242-0/+23
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: handle indirect image indexing correctlyMarek Olšák2016-10-242-8/+17
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: don't treat RESQ etc. as memory instructionsMarek Olšák2016-10-241-5/+13
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: get information about indirect 2D file accessMarek Olšák2016-10-242-0/+7
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: get information about indirect CONST accessMarek Olšák2016-10-242-0/+15
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* nv50/ir: display OP_BAR subops in debug modeSamuel Pitoiset2016-10-241-0/+9
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: it appears that OP_DISCARD can't take a join modifierIlia Mirkin2016-10-221-0/+1
| | | | | | | nvdisasm does not print a .S even though the bit is set. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50/ir: use levelZero for non-frag tex/txp opsIlia Mirkin2016-10-221-0/+5
| | | | | | | | | radeonsi also does the same thing. I suspect that this is likely to be a no-op in reality, but it brings nouveau code closer to what the blob produces. Plus it makes sense to not try to do auto-derivatives on this. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* gallium: add PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERSIlia Mirkin2016-10-2217-0/+19
| | | | | | | | | | | | | | This allows the driver to signal that it can't handle random interleaving of attributes across buffers. This is required for ARB_transform_feedback3, and it's initialized to whatever the previous value of PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME was except for nv50 where it is disabled. Note that the proprietary drivers never expose ARB_transform_feedback3 on any GT21x's (where nouveau previously did), and after some effort I was unable to get it to work. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nvc0/ir: remove outdated comment about SHLADDSamuel Pitoiset2016-10-222-2/+0
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* vc4: Avoid making temporaries for assignments to NIR registers.Eric Anholt2016-10-211-35/+79
| | | | | | | | | | | | | | | | | | | | | | | Getting stores to NIR regs to not generate new MOVs is tricky, since the result we're trying to store into the NIR reg may have been from a conditional update of a temp, or a series of packed writes. The easiest solution seems to be to require that nir_store_dest()'s arg comes from an SSA temp. This causes us to put in a few more temporary MOVs in the NIR SSA dest case, but copy propagation successfully cleans those up. The shader-db change is modest: total instructions in shared programs: 93774 -> 93598 (-0.19%) instructions in affected programs: 14760 -> 14584 (-1.19%) total estimated cycles in shared programs: 212135 -> 211946 (-0.09%) estimated cycles in affected programs: 27005 -> 26816 (-0.70%) but I was seeing patterns in some register-allocation failures in DEQP tests that looked like the extra MOVs would increase maximum register pressure in loops. Some debug code indicates that that's not the case, though I'm still a bit confused by that result.
* vc4: Add a comment with discussion of how simulation works.Eric Anholt2016-10-211-0/+25
|
* vc4: Move simulator winsys mapping and tracking to the simulator.Eric Anholt2016-10-213-20/+56
| | | | | One tiny hack is left in vc4_bufmgr.c for what kind of mapping we got so that we can free it.
* vc4: Move simulator memory management to a u_mm.h heap.Eric Anholt2016-10-215-41/+208
| | | | | | Now we aren't limited to 256MB total allocated across a driver instance, just 256MB at one time. We're still copying in and out, which should get fixed.
* vc4: Move simulator globals into a struct.Eric Anholt2016-10-212-34/+29
| | | | | I would like to put a couple more things in here, so it's time to package it up.
* vc4: Restructure the simulator mode.Eric Anholt2016-10-215-84/+182
| | | | | | | | | | | | | Rather than having simulator mode changes scattered around vc4_bufmgr.c and vc4_screen.c, make vc4_bufmgr.c just call a vc4_simulator_ioctl, which then dispatches to a corresponding implementation. This will give the simulator support a centralized place to do tricks like storing most BOs directly in simulator memory rather than copying in and out. This leaves special casing of mmaping BOs and execution, because of the winsys mapping.
* vc4: Fix termination of the initial scan for branch targets.Eric Anholt2016-10-211-11/+8
| | | | | | | | | | | | | The loop is scanning until the original max_ip (size of the BO), but we want to not examine any code after the PROG_END's delay slots. There was a block trying to do that, except that we had some early continue statements if the signal wasn't a PROG_END or a BRANCH. The failure mode would be that a valid shader is rejected because some undefined memory after the PROG_END slots is parsed as a branch and the rest of its setup is illegal. I haven't seen this in the wild, but valgrind was complaining and the new userland simulator code started triggering it.
* radeonsi: fix a regression in si_eliminate_const_outputNicolai Hähnle2016-10-211-4/+3
| | | | | | | | | | A constant value of float type is not necessarily a ConstantFP: it could also be a constant expression that for some reason hasn't been folded. This fixes a regression in GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 that was introduced by commit 3ec9975555d1cc5365413ad9062f412904f944a3. Reviewed-by: Marek Olšák <[email protected]>
* nv50,nvc0: don't keep track of whether fb rt0 is integer-onlyIlia Mirkin2016-10-216-44/+22
| | | | | | | | | | This reverts commits 1af0641db345209c076e9b1ba4dca7524541671a and a6ad49cbbd599aec054d0a3163fff5ad724f2b18. st/mesa adjusts the rasterizer state for us now. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nvc0: do not break 3D state by pushing MS coordinates on FermiSamuel Pitoiset2016-10-201-43/+44
| | | | | | | | | | | | | | | | | | Long story short, 3D and CP are aliased on Fermi and initializing compute after pushing the MS sample coordinate offsets seems to corrupt 3D state for weird reasons. I still don't have the faintest clue what is going on, but this seems to only affect Fermi generation. A possible fix could be to use two different channels, one for 3D and one for CP. This fixes a bunch of regressions pinpointed by piglit. Fixes: "nvc0: fix up image support for allowing multiple samples" Cc: "13.0" <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: translate compute shaders at program creationSamuel Pitoiset2016-10-201-0/+4
| | | | | | | This makes shader-db reports results for compute shaders. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gallivm: try to fix build with LLVM <= 3.4 due to missing CallSite.hMarek Olšák2016-10-201-1/+5
| | | | | Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* radeonsi: fix build of si_eliminate_const_vs_outputs on LLVM <= 3.8Marek Olšák2016-10-201-3/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: add wrappers for missing functions in LLVM <= 3.8Marek Olšák2016-10-202-0/+27
| | | | | | radeonsi needs these. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix 64-bit loads from LDSNicolai Hähnle2016-10-201-1/+1
| | | | | | | | | Fixes spec/arb_tessellation_shader/execution/dvec[23]-vs-tcs-tes, among others. Cc: "12.0 13.0" <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nv50/ir: process texture offset sources as regular sourcesIlia Mirkin2016-10-191-53/+94
| | | | | | | | | | | | | | | With ARB_gpu_shader5, texture offsets can be any source, including TEMPs and IN's. Make sure to process them as regular sources so that we pick up masks, etc. This should fix some CTS tests that feed offsets directly to textureGatherOffset, and we were not picking up the input use, thus not advertising it in the shader header. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Dave Airlie <[email protected]> Cc: 12.0 13.0 <[email protected]>
* nv50,nvc0: avoid reading out of bounds when getting bogus so infoIlia Mirkin2016-10-192-2/+8
| | | | | | | | | The state tracker tries to attach the info to the wrong shader. This is easy enough to protect against. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: 12.0 13.0 <[email protected]>
* nvc0/ir: simplify predicate logic for GK104 atomic operationsSamuel Pitoiset2016-10-191-14/+7
| | | | | | | | The predicate is always CC_NOT_P as defined in processSurfaceCoordsNVE4(), so we only want to emit OR. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: remove useless NVC0LoweringPass::gMemBaseSamuel Pitoiset2016-10-191-4/+1
| | | | Signed-off-by: Samuel Pitoiset <[email protected]>
* nv50/ir: print CCTL subops in debug modeSamuel Pitoiset2016-10-191-0/+9
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* radeonsi: eliminate trivial constant VS outputsMarek Olšák2016-10-193-2/+186
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These constant value VS PARAM exports: - 0,0,0,0 - 0,0,0,1 - 1,1,1,0 - 1,1,1,1 can be loaded into PS inputs using the DEFAULT_VAL field, and the VS exports can be removed from the IR to save export & parameter memory. After LLVM optimizations, analyze the IR to see which exports are equal to the ones listed above (or undef) and remove them if they are. Targeted use cases: - All DX9 eON ports always clear 10 VS outputs to 0.0 even if most of them are unused by PS (such as Witcher 2 below). - VS output arrays with unused elements that the GLSL compiler can't eliminate (such as Batman below). The shader-db deltas are quite interesting: (not from upstream si-report.py, it won't be upstreamed) PERCENTAGE DELTAS Shaders PARAM exports (affected only) batman_arkham_origins 589 -67.17 % bioshock-infinite 1769 -0.47 % dirt-showdown 548 -2.68 % dota2 1747 -3.36 % f1-2015 776 -4.94 % left_4_dead_2 1762 -0.07 % metro_2033_redux 2670 -0.43 % portal 474 -0.22 % talos_principle 324 -3.63 % warsow 176 -2.20 % witcher2 1040 -73.78 % ---------------------------------------- All affected 991 -65.37 % ... 9681 -> 3353 ---------------------------------------- Total 26725 -10.82 % ... 58490 -> 52162 v2: treat Undef as both 0 and 1 Reviewed-by: Nicolai Hähnle <[email protected]> (v1) Tested-by: Edmondo Tommasina <[email protected]> (v1)
* nv50/ir: silent TGSI_PROPERTY_FS_DEPTH_LAYOUTSamuel Pitoiset2016-10-191-0/+1
| | | | | | | | Found that information message while replaying a trace from Metro 2033 Redux. Mark that property as useless for now. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* radeonsi: remove cb0_is_integer handlingMarek Olšák2016-10-193-13/+3
| | | | | | st/mesa does this for us. Reviewed-by: Nicolai Hähnle <[email protected]>
* draw: improve vertex fetch (v2)Roland Scheidegger2016-10-193-86/+134
| | | | | | | | | | | | | | | | | | | | | | | The per-element fetch has quite some calculations which are constant, these can be moved outside both the per-element as well as the main shader loop (llvm can figure out it's constant mostly on its own, however this can have a significant compile time cost). Similarly, it looks easier swapping the fetch loops (outer loop per attrib, inner loop filling up the per vertex elements - this way the aos->soa conversion also can be done per attrib and not just at the end though again this doesn't really make much of a difference in the generated code). (This would also make it possible to vectorize the calculations leading to the fetches.) There's also some minimal change simplifying the overflow math slightly. All in all, the generated code seems to look slightly simpler (depending on the actual vs), but more importantly I've seen a significant reduction in compile times for some vs (albeit with old (3.3) llvm version, and the time reduction is only really for the optimizations run on the IR). v2: adapt to other draw change. No changes with piglit. Reviewed-by: Jose Fonseca <[email protected]>
* draw: improved handling of undefined inputsRoland Scheidegger2016-10-191-21/+32
| | | | | | | | | | | | | | | Previous attempts to zero initialize all inputs were not really optimal (though no performance impact was measurable). In fact this is not really necessary, since we know the max number of inputs used. Instead, just generate fetch for up to max inputs used by the shader, directly replacing inputs for which there was no vertex element by zero. This also cleans up key generation, which previously would have stored some garbage for these elements. And also drop the assertion which indicates such bogus usage by a debug_printf (the whole point of initializing the undefined inputs was to make this case safe to handle). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: print out time for jitting functions with GALLIVM_DEBUG=perfRoland Scheidegger2016-10-191-0/+11
| | | | | | | | Compilation to actual machine code can easily take as much time as the optimization passes on the IR if not more, so print this out too. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Use native packs and unpacks for the lerpsRoland Scheidegger2016-10-193-13/+156
| | | | | | | | | | | | | | | | | | | | | | | | For the texturing packs, things looked pretty terrible. For every lerp, we were repacking the values, and while those look sort of cheap with 128bit, with 256bit we end up with 2 of them instead of just 1 but worse, plus 2 extracts too (the unpack, however, works fine with a single instruction, albeit only with llvm 3.8 - the vpmovzxbw). Ideally we'd use more clever pack for llvmpipe backend conversion too since we actually use the "wrong" shuffle (which is more work) when doing the fs twiddle just so we end up with the wrong order for being able to do native pack when converting from 2x8f -> 1x16b. But this requires some refactoring, since the untwiddle is separate from conversion. This is only used for avx2 256bit pack/unpack for now. Improves openarena scores by 8% or so, though overall it's still pretty disappointing how much faster 256bit vectors are even with avx2 (or rather, aren't...). And, of course, eliminating the needless packs/unpacks in the first place would eliminate most of that advantage (not quite all) from this patch. Reviewed-by: Jose Fonseca <[email protected]>
* svga: minor code improvements in svga_validate_pipe_sampler_view()Brian Paul2016-10-181-8/+8
| | | | | | | Use the 'texture' local var in more places. Rename 'pFormat' to 'viewFormat'. Reviewed-by: Charmaine Lee <[email protected]>
* st/va: force to flush the last p frame in idr periodBoyuan Zhang2016-10-181-0/+3
| | | | | | | | | | | | | | During dual instance encoding submission, if the second encode task and first encode task have no reference dependency, e.g. p following with idr-frame, there is a chance the second task will use for its reconstructed picture buffer the same buffer used by first task for its reference/reconstructed picture. In this case, buffer corruption may occur depending on encoding speed. Fix is to force flush these two tasks separately to avoid race condition Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98005 Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
* radeonsi: rename prefixes from radeon to siMarek Olšák2016-10-184-157/+157
| | | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Acked-by: Edward O'Callaghan <[email protected]>
* radeonsi: merge radeon_llvm_context and si_shader_contextMarek Olšák2016-10-184-317/+290
| | | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Acked-by: Edward O'Callaghan <[email protected]>
* radeonsi: import all TGSI->LLVM code from gallium/radeonMarek Olšák2016-10-1811-462/+346
| | | | | | Acked-by: Nicolai Hähnle <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Acked-by: Edward O'Callaghan <[email protected]>
* gallium/radeon: simplify initialization of 64-bit gallivm buildersMarek Olšák2016-10-181-18/+4
| | | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Acked-by: Edward O'Callaghan <[email protected]>
* gallium/radeon: remove unused radeon_llvm_reg_index_soaMarek Olšák2016-10-182-7/+0
| | | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Acked-by: Edward O'Callaghan <[email protected]>
* radeonsi: move LLVM ALU codegen into radeonsiMarek Olšák2016-10-186-992/+1056
| | | | | | Acked-by: Nicolai Hähnle <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Acked-by: Edward O'Callaghan <[email protected]>
* loader: remove loader_get_driver_for_fd() driver_typeEmil Velikov2016-10-181-1/+1
| | | | | | | | | | | | | | Reminiscent from the pre-loader days, were we had multiple instances of the loader logic in separate places and one could build a "GALLIUM_ONLY" version. Since that is no longer the case and the loaders (glx/egl/gbm) do not (and should not) require to know any classic/gallium specific we can drop the argument and the related code. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Axel Davy <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gm107/ir: fix bit offset of tex lod setting for indirect texturingIlia Mirkin2016-10-181-1/+1
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* gm107/ir: fix texturing with indirect samplersIlia Mirkin2016-10-181-0/+10
| | | | | | | | | | The indirect handle has to come right after the coordinates, so if there was a sample/bias/depth compare/offset, everything would end up being shifted by one argument position. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]