path: root/src/gallium/auxiliary
Commit message | Author | Age | Files | Lines
* util: fix memory leak from the fragment shaders for SINT<->UINT blits | Charmaine Lee | 2016-11-23 | 1 | -1/+1
  This patch deletes those fragment shaders in util_blitter_destroy().
  Reviewed-by: Brian Paul <[email protected]>
* util: fix missing swizzle components in the SINT <-> UINT conversion string | Charmaine Lee | 2016-11-23 | 1 | -2/+2
  Fixes tgsi error introduced in commit 3817a7a. The error complains about a missing swizzle component in the conversion string "UMIN TEMP[0], TEMP[0], IMM[0].x".
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: fix more occurrences of u_hash.h | Marek Olšák | 2016-11-22 | 1 | -1/+1
  This fixes compile failures since 86514d84e0beec47c82da4888db12bf07f33cb83.
* util: import CRC32 implementation from gallium | Marek Olšák | 2016-11-22 | 3 | -178/+0
  Reviewed-by: Timothy Arceri <[email protected]>
* auxiliary/vl/dri: call get_xcb_screen() only once | Emil Velikov | 2016-11-22 | 1 | -2/+6
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
* tgsi/scan: record if a shader writes the position output | Marek Olšák | 2016-11-21 | 2 | -0/+3
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: use a big switch for scanning outputs | Marek Olšák | 2016-11-21 | 1 | -40/+28
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* draw: drop some overflow computations | Roland Scheidegger | 2016-11-21 | 1 | -65/+46
  It turns out that no one actually cares if the address computations overflow, be it the stride mul or the offset adds. Wrap around seems to be explicitly permitted even by some other API (which is a _very_ surprising result, as these overflow computations were added just for that and made some tests pass at that time - I suspect some later fixes fixed the actual root cause...). So the requirements in that other API were actually sane there all along after all... Still need to make sure the computed buffer size needed is valid, of course. This ditches the shiny new widening mul from these codepaths, ah well...
  And now that I really understand this, change the fishy min limiting indices to what it really should have done. Which is simply to prevent fetching more values than valid for the last loop iteration. (This makes the code path in the loop minimally more complex for the non-indexed case as we have to skip the optimization combining two adds. I think it should be safe to skip this actually there, but I don't care much about this especially since skipping that optimization actually makes the code easier to read elsewhere.)
  Reviewed-by: Jose Fonseca <[email protected]>
* draw: simplify fetch some more | Roland Scheidegger | 2016-11-21 | 1 | -63/+55
  Don't keep the ofbit. This is just a minor simplification, just adjust the buffer size so that there will always be an overflow if buffers aren't valid to fetch from.
  Also, get rid of control flow from the instanced path too. Not worried about performance, but it's simpler and keeps the code more similar to ordinary fetch.
  Reviewed-by: Jose Fonseca <[email protected]>
* draw: unify linear and elts draw jit functions | Roland Scheidegger | 2016-11-21 | 3 | -89/+70
  The code for elts and linear paths was nearly 100% identical by now - with the elts path simply having some additional gather for the elements in the main loop (with some additional small differences before the main loop). Hence nuke the separate functions and decide this at jit shader execution time (simply based on the presence of the elts pointer).
  Some analysis shows that the generated vs jit functions seem to be just very minimally more complex than the former elts functions, and almost none of the additional complexity is in the main loop (basically just the branch logic for the branch fetching the actual indices). Compared to linear, the codesize of the function is of course a bit larger, however the actual executed code in the main loop appears to be near 100% identical (the additional code looking up indices is skipped as expected).
  So, I would not expect a (meaningful) performance difference with the generated code, neither with elts nor linear; this does however roughly halve the compilation time (the compiled shaders should also use only half the memory of course).
  Reviewed-by: Jose Fonseca <[email protected]>
* draw: use same argument order for jit draw linear / elts functions | Roland Scheidegger | 2016-11-21 | 3 | -34/+30
  This is a bit simpler. Mostly to make it easier to unify the paths later...
  Reviewed-by: Jose Fonseca <[email protected]>
* draw: drop unnecessary index overflow handling from vsplit code | Roland Scheidegger | 2016-11-21 | 2 | -56/+28
  This was kind of strange, since it replaced indices which were only overflowing due to bias with MAX_UINT. This would cause an overflow later in the shader, except if stride was 0, however the vertex id would be essentially random then (-1 + eltBias). No test cared about it, though.
  So, drop this and just use ordinary int arithmetic wraparound as usual. This is much simpler to understand and the results are "more correct" or at least more consistent (vertex id as well as actual fetch results just correspond to wrapped around arithmetic).
  There's only one catch, it is now possible to hit the cache initialization value also with ushort and ubyte elts path (this wouldn't be an issue if we'd simply handle the eltBias itself later in the shader). Hence, we need to make sure the cache logic doesn't think this element has already been emitted when it has not (I believe some seriously bad things could happen otherwise). So, borrow the logic which handled this from the uint case, but not before fixing it up...
  Reviewed-by: Jose Fonseca <[email protected]>
* draw: simplify vsplit elts code a bit | Roland Scheidegger | 2016-11-21 | 3 | -40/+18
  vsplit_get_base_idx explicitly returned idx 0 and set the ofbit in case of overflow. We'd then check the ofbit and use idx 0 instead of looking it up. This was necessary because DRAW_GET_IDX used to return DRAW_MAX_FETCH_IDX and not 0 in case of overflows. However, this is all unnecessary, we can just let DRAW_GET_IDX return 0 in case of overflow. In fact before bbd1e60198548a12be3405fc32dd39a87e8968ab the code already did that, not sure why this particular bit was changed (might have been one half of an attempt to get these indices to actual draw shader execution - in fact I think this would make things less awkward, it would require moving the eltBias handling to the shader as well).
  Note there are other callers of DRAW_GET_IDX - those code paths however explicitly do not handle index buffer overflows, therefore the overflow value doesn't matter for them.
  Also do some trivial simplification - for (unsigned) a + b, checking res < a is sufficient for overflow detection, we don't need to check for res < b too (similar for signed). And an index buffer overflow check looked bogus - eltMax is the number of elements in the index buffer, not the maximum element which can be fetched. (Drop the start check against the idx buffer though, this is already covered by end check and end < start).
  Reviewed-by: Jose Fonseca <[email protected]>
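The unsigned overflow check mentioned in this message can be sketched in a few lines of C. This is an illustration only, not the code from the commit; the helper name `add_overflows_u32` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not Mesa code): adds two unsigned 32-bit values
 * and reports whether the addition wrapped around. */
static bool add_overflows_u32(uint32_t a, uint32_t b, uint32_t *res)
{
   *res = a + b;      /* unsigned addition wraps modulo 2^32 */
   return *res < a;   /* a wrapped sum is smaller than either operand,
                       * so comparing against one operand is enough */
}
```

If the sum wraps, the result equals a + b - 2^32, which is smaller than both a and b, so a second comparison against b adds no information.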
* draw: finally optimize bool clip mask generation | Roland Scheidegger | 2016-11-18 | 3 | -23/+26
  lp_build_any_true_range is just what we need, though it will only produce optimal code with sse41 (ptest + set) - but even without it on 64bit x86 the code is still better (1 unpack, 2 movq + or + set), on 32bit x86 it's going to be roughly the same as before.
  While here also make it a "real" 8bit boolean - cuts one instruction but more importantly similar to ordinary booleans.
  Reviewed-by: Jose Fonseca <[email protected]>
* draw: use vectorized calculations for fetch (v2) | Roland Scheidegger | 2016-11-18 | 1 | -131/+265
  Instead of doing all the math with scalars, use vectors. This means the overflow math needs to be done manually, albeit that's only really problematic for the stride/index mul, the rest has been pretty much moved outside the shader loop (albeit the mul could actually be optimized away too), where things are still scalar. To eliminate control flow in the main shader loop fetch, provide fake buffers (so index 0 is always valid to fetch).
  Still uses aos fetch though in the end - mostly because some more code would be needed to handle unaligned fetches in that path, and because for most formats it won't make a difference anyway (we generate some truly horrendous code for things like R16G16_something for instance).
  Instanced fetch however stays roughly the same as before, except that no longer the same element is fetched multiple times (I've seen a reduction of ~3 times in main shader loop size due to llvm not recognizing it's all the same fetch, since it would have been possible some of the fetches getting replaced with zeros in case vector size exceeds remaining fetch count - the values of such fetches don't matter at all though).
  Also, for elts gathering, use vectorized code as well.
  The generated shaders are smaller and faster to compile (not entirely sure about execution speed, but generally unless there's just single vertices to handle I would expect it to be faster - there's more opportunities for future improvements by using soa fetch).
  v3: skip the fake index buffer, not needed due to the jit code never seeing the real index buffer in the first place. Fix a bug with mask expansion (needs SExt, not ZExt). Also, be really really careful to keep the behavior the same, even in cases where it looks wrong, and add comments why the code is doing the seemingly wrong stuff... Fortunately it's not actually more complex in the end...
  Also change function order slightly just to make the diff more readable.
  No piglit change. Passes some internal testing with another api too...
  Reviewed-by: Jose Fonseca <[email protected]>
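The "needs SExt, not ZExt" remark concerns widening a boolean lane mask. The commit works on LLVM IR vectors; the following is only a scalar C analogy with hypothetical helper names, showing why sign extension keeps an all-ones mask all-ones while zero extension does not.

```c
#include <stdint.h>

/* Hypothetical helpers (not Mesa code): widen an 8-bit lane mask,
 * where "true" is all bits set (-1) and "false" is 0, to 32 bits. */

/* Sign extension replicates the top bit, so -1 stays all-ones. */
static uint32_t widen_mask_sext(int8_t m)
{
   return (uint32_t)(int32_t)m;
}

/* Zero extension pads with zeros, so -1 becomes 0x000000FF and
 * no longer works as a full-width select mask. */
static uint32_t widen_mask_zext(int8_t m)
{
   return (uint32_t)(uint8_t)m;
}
```

A mask widened with zero extension would only cover the low byte of each 32-bit lane when used in a bitwise select.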
* u_simple_shaders: try to un-break the Windows build | Nicolai Hähnle | 2016-11-16 | 1 | -2/+3
  Acked-by: Edward O'Callaghan <[email protected]>
* util/blitter: add clamping during SINT <-> UINT blits | Nicolai Hähnle | 2016-11-16 | 5 | -43/+124
  Even though glBlitFramebuffer cannot be used for SINT <-> UINT blits, we still need to handle this type of blit here because it can happen as part of texture uploads / downloads, e.g. uploading a GL_RGBA8I texture from GL_UNSIGNED_INT data.
  Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels.
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
* util/blitter: index texfetch_col shaders by type | Nicolai Hähnle | 2016-11-16 | 1 | -35/+19
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
* gallium: add PIPE_SHADER_CAP_LOWER_IF_THRESHOLD | Marek Olšák | 2016-11-15 | 2 | -0/+2
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: limit use of setFastMathFlags to LLVM 3.8 and later | Marek Olšák | 2016-11-15 | 1 | -0/+2
  Reviewed-by: Brian Paul <[email protected]>
* gallivm: add lp_create_builder with an unsafe_fpmath option | Marek Olšák | 2016-11-15 | 2 | -0/+17
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: detect avx512 cpu features | Tim Rowley | 2016-11-10 | 2 | -0/+36
  v3: fix check for xmm/ymm test
  v2: style code, add avx512 to cpu dump
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: fix [IU]MUL_HI regression harder | Nicolai Hähnle | 2016-11-10 | 1 | -8/+12
  The fix in commit 88f791db75e9f065bac8134e0937e1b76600aa36 was insufficient for radeonsi because the vector case was not handled properly. It seems piglit only covers the scalar case, unfortunately.
  Fixes GL45-CTS.shader_bitfield_operation.[iu]mulExtended.*
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Fix build after removal of deprecated attribute API v3 | Tom Stellard | 2016-11-09 | 4 | -8/+89
  v2: Fix adding parameter attributes with LLVM < 4.0.
  v3: Fix typo. Fix parameter index. Add a gallivm enum for function attributes.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* Revert "draw: use vectorized calculations for fetch" | Roland Scheidegger | 2016-11-09 | 2 | -282/+159
  Trivial. There are some regressions internally, related to overflow behavior. I'll have to look at it at another time, some interactions with vsplit/vcache are actually mind-blowing.
  This reverts commit 3fa10ffb496cc4e6d1003891cf0381bb5bec2a74.
* tgsi/scan: turn a huge if-else-if.. chain into a switch statement | Marek Olšák | 2016-11-08 | 1 | -14/+30
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: fix images_buffers regression | Marek Olšák | 2016-11-08 | 1 | -3/+2
  The first IF statement disabled the second one.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98599
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: fix [IU]MUL_HI regression | Nicolai Hähnle | 2016-11-08 | 3 | -28/+90
  This patch does two things:
  1. It separates the host-CPU code generation from the generic code generation. This guards against accidentally breaking things for radeonsi in the future.
  2. It makes sure we actually use both arguments and don't just compute a square :-p
  Fixes a regression introduced by commit 29279f44b3172ef3b84d470e70fc7684695ced4b
  Cc: Roland Scheidegger <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
* draw: use vectorized calculations for fetch | Roland Scheidegger | 2016-11-08 | 2 | -159/+282
  Instead of doing all the math with scalars, use vectors. This means the overflow math needs to be done manually, albeit that's only really problematic for the stride/index mul, the rest has been pretty much moved outside the shader loop (albeit the mul could actually be optimized away too), where things are still scalar. Because llvm is complete fail with the zero-extend widening mul, roll our own even...
  To eliminate control flow in the main shader loop fetch, provide fake buffers (so index 0 is always valid to fetch).
  Still uses aos fetch though in the end - mostly because some more code would be needed to handle unaligned fetches in that path, and because for most formats it won't make a difference anyway (we generate some truly horrendous code for things like R16G16_something for instance).
  Instanced fetch however stays roughly the same as before, except that no longer the same element is fetched multiple times (I've seen a reduction of ~3 times in main shader loop size due to apparently llvm not being able to deduce it's really all the same with a couple instanced elements).
  Also, for elts gathering, use vectorized code as well - provide a fake elt buffer if there's no valid one bound.
  The generated shaders are smaller and faster to compile (not entirely sure about execution speed, but generally unless there's just single vertices to handle I would expect it to be faster - there's more opportunities for future improvements by using soa fetch).
  No piglit change.
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: introduce 32x32->64bit lp_build_mul_32_lohi function | Roland Scheidegger | 2016-11-08 | 3 | -38/+172
  This is used by shader umul_hi/imul_hi functions (and soon by draw). It's actually useful separating this out on its own, however the real reason for doing it is because we're using an optimized sse2 version, since the code llvm generates is atrocious (since there's no widening mul in llvm, and it does not recognize the widening mul pattern, so it generates code for real 64x64->64bit mul, which the cpu can't do natively, in contrast to 32x32->64bit mul which it could do).
  Reviewed-by: Jose Fonseca <[email protected]>
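For readers unfamiliar with the operation, a 32x32->64bit widening multiply that returns the low and high halves separately can be sketched in scalar C as follows. This only illustrates the arithmetic; it is not lp_build_mul_32_lohi, which emits LLVM IR, and the name `umul_32_lohi` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar illustration (not Mesa code): multiply two
 * unsigned 32-bit values into a 64-bit product, return the low
 * 32 bits and store the high 32 bits through `hi`. */
static uint32_t umul_32_lohi(uint32_t a, uint32_t b, uint32_t *hi)
{
   /* Widen both operands first so the multiply itself is 64-bit. */
   uint64_t wide = (uint64_t)a * (uint64_t)b;

   *hi = (uint32_t)(wide >> 32);   /* what umul_hi would return */
   return (uint32_t)wide;          /* the ordinary 32-bit product */
}
```

For example, 0xFFFFFFFF * 0xFFFFFFFF is 0xFFFFFFFE00000001, so the low half is 1 and the high half is 0xFFFFFFFE.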
* gallium/hud: protect against an initialization race | Steven Toth | 2016-11-07 | 4 | -8/+41
  In the event that multiple threads attempt to install a graph concurrently, protect the shared list.
  Signed-off-by: Steven Toth <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
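Protecting a shared list against concurrent insertion generally looks like the sketch below. It assumes POSIX threads and uses hypothetical names (`graph_node`, `graph_list_add`, `graph_list_count`); the actual HUD code and its locking primitives are not reproduced here.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical list node; the real HUD structures differ. */
struct graph_node {
   const char *name;
   struct graph_node *next;
};

static struct graph_node *graph_list = NULL;
static pthread_mutex_t graph_list_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Push a node onto the shared list while holding the mutex, so two
 * threads installing graphs at once cannot corrupt the head pointer. */
static void graph_list_add(struct graph_node *node)
{
   pthread_mutex_lock(&graph_list_mutex);
   node->next = graph_list;
   graph_list = node;
   pthread_mutex_unlock(&graph_list_mutex);
}

/* Walk the list under the same mutex and count its entries. */
static size_t graph_list_count(void)
{
   size_t n = 0;
   pthread_mutex_lock(&graph_list_mutex);
   for (struct graph_node *it = graph_list; it != NULL; it = it->next)
      n++;
   pthread_mutex_unlock(&graph_list_mutex);
   return n;
}
```

Every reader and writer has to take the same mutex; locking only the insertion path would still leave traversals racing against updates.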
* gallium/hud: close a previously opened handle | Steven Toth | 2016-11-07 | 3 | -1/+6
  We're missing the closedir() to the matching opendir().
  Signed-off-by: Steven Toth <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
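The opendir()/closedir() pairing this fix restores can be illustrated with a small POSIX sketch. The function `count_dir_entries` is hypothetical and unrelated to the HUD code; it only shows that every successful opendir() is matched by a closedir() before returning.

```c
#include <dirent.h>
#include <stddef.h>

/* Hypothetical helper (not Mesa code): count the entries of a
 * directory, returning -1 if it cannot be opened. */
static int count_dir_entries(const char *path)
{
   DIR *dir = opendir(path);
   if (!dir)
      return -1;          /* nothing was opened, nothing to close */

   int count = 0;
   while (readdir(dir) != NULL)
      count++;

   closedir(dir);         /* release the handle on the success path */
   return count;
}
```

Without the closedir() call, each invocation would leak a directory stream and its underlying file descriptor.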
* gallium/hud: fix a problem where objects are free'd while in use. | Steven Toth | 2016-11-07 | 4 | -55/+0
  Instead of trying to maintain a reference counted list of valid HUD objects, and freeing them accordingly, creating race conditions between unanticipated multiple threads, simply accept they're allocated once and never released until the process terminates.
  They're a shared resource between multiple threads, so accept they're always available for use.
  Signed-off-by: Steven Toth <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* draw: fix undefined input handling some more... | Roland Scheidegger | 2016-11-04 | 1 | -50/+54
  Previous fixes were incomplete - some code still iterated through the number of elements provided by velem layout instead of the number stored in the key (which is the same as the number defined by the vs). And also actually accessed the elements from the layout directly instead of those in the key. This mismatch could still cause crashes. (Besides, it is a very good idea to only use data stored in the key anyway.)
  v2: move null format check, remove now unnecessary function parameter, some minor prettify
  Reviewed-by: Jose Fonseca <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: call fflush() after printing error messages | Brian Paul | 2016-11-03 | 1 | -1/+9
  For Windows. Otherwise, we don't see the message until the program exits.
  Reviewed-by: Charmaine Lee <[email protected]>
* nir/i965/anv/radv/gallium: make shader info a pointer | Timothy Arceri | 2016-10-26 | 1 | -5/+5
  When restoring something from shader cache we won't have and don't want to create a nir_shader; this change detaches the two.
  There are other advantages such as being able to reuse the shader info populated by GLSL IR.
  Reviewed-by: Jason Ekstrand <[email protected]>
* tgsi: trivial build fix for MSVC | Brian Paul | 2016-10-24 | 1 | -1/+1
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/util: Add align_calloc | Axel Davy | 2016-10-24 | 1 | -0/+8
  Add implementation for align_calloc, which is align_malloc + memset.
  v2: add if (ptr) before memset. Fix indentation.
  Signed-off-by: Axel Davy <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
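The "aligned allocate, then memset if non-NULL" pattern described here can be sketched as below. This is not Mesa's align_malloc/align_calloc: the `sketch_` names are hypothetical, and C11 `aligned_alloc` stands in for the aligned allocator, with the assumption that `alignment` is a power of two.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an aligned allocator, built on C11
 * aligned_alloc. Assumes `alignment` is a power of two. */
static void *sketch_align_malloc(size_t size, size_t alignment)
{
   /* C11 wants the size to be a multiple of the alignment. */
   size_t rounded = (size + alignment - 1) / alignment * alignment;
   return aligned_alloc(alignment, rounded);
}

/* Aligned allocation followed by zeroing, skipping the memset when
 * the allocation failed (the "if (ptr)" check from v2). */
static void *sketch_align_calloc(size_t size, size_t alignment)
{
   void *ptr = sketch_align_malloc(size, alignment);
   if (ptr)
      memset(ptr, 0, size);
   return ptr;
}
```

Memory obtained this way is released with free(); the NULL check matters because calling memset on a NULL pointer is undefined behavior.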
* tgsi/scan: scan texture offset operands | Marek Olšák | 2016-10-24 | 1 | -0/+16
  This seems important considering how much we depend on some of the flags.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: move src operand processing into a separate function | Marek Olšák | 2016-10-24 | 1 | -171/+183
  The next commit will need this.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: get information about shader buffer usage | Marek Olšák | 2016-10-24 | 2 | -0/+23
  Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: handle indirect image indexing correctly | Marek Olšák | 2016-10-24 | 2 | -8/+17
  Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: don't treat RESQ etc. as memory instructions | Marek Olšák | 2016-10-24 | 1 | -5/+13
  Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: get information about indirect 2D file access | Marek Olšák | 2016-10-24 | 2 | -0/+7
  Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: get information about indirect CONST access | Marek Olšák | 2016-10-24 | 2 | -0/+15
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: try to fix build with LLVM <= 3.4 due to missing CallSite.h | Marek Olšák | 2016-10-20 | 1 | -1/+5
  Reviewed-by: Brian Paul <[email protected]>
  Tested-by: Brian Paul <[email protected]>
* gallivm: add wrappers for missing functions in LLVM <= 3.8 | Marek Olšák | 2016-10-20 | 2 | -0/+27
  radeonsi needs these.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* draw: improve vertex fetch (v2) | Roland Scheidegger | 2016-10-19 | 3 | -86/+134
  The per-element fetch has quite some calculations which are constant, these can be moved outside both the per-element as well as the main shader loop (llvm can figure out it's constant mostly on its own, however this can have a significant compile time cost).
  Similarly, it looks easier swapping the fetch loops (outer loop per attrib, inner loop filling up the per vertex elements - this way the aos->soa conversion also can be done per attrib and not just at the end though again this doesn't really make much of a difference in the generated code). (This would also make it possible to vectorize the calculations leading to the fetches.)
  There's also some minimal change simplifying the overflow math slightly.
  All in all, the generated code seems to look slightly simpler (depending on the actual vs), but more importantly I've seen a significant reduction in compile times for some vs (albeit with old (3.3) llvm version, and the time reduction is only really for the optimizations run on the IR).
  v2: adapt to other draw change.
  No changes with piglit.
  Reviewed-by: Jose Fonseca <[email protected]>
* draw: improved handling of undefined inputs | Roland Scheidegger | 2016-10-19 | 1 | -21/+32
  Previous attempts to zero initialize all inputs were not really optimal (though no performance impact was measurable). In fact this is not really necessary, since we know the max number of inputs used. Instead, just generate fetch for up to max inputs used by the shader, directly replacing inputs for which there was no vertex element by zero. This also cleans up key generation, which previously would have stored some garbage for these elements.
  And also drop the assertion which indicates such bogus usage by a debug_printf (the whole point of initializing the undefined inputs was to make this case safe to handle).
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: print out time for jitting functions with GALLIVM_DEBUG=perf | Roland Scheidegger | 2016-10-19 | 1 | -0/+11
  Compilation to actual machine code can easily take as much time as the optimization passes on the IR if not more, so print this out too.
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>