summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: decrease the number of texture slots to 24Marek Olšák2016-11-211-1/+1
| | | | | | | | | Company Of Heroes 2 needs only 24. This saves 512 bytes of CE RAM per shader stage. Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fast exit si_emit_derived_tess_state earlyMarek Olšák2016-11-212-11/+15
| | | | | Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: set addrlib flag opt4SpaceMarek Olšák2016-11-211-0/+1
| | | | | Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: check for !is_linear in do_hardware_msaa_resolveMarek Olšák2016-11-211-2/+4
| | | | | | | We don't want opt4Space here. Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add RADEON_SURF_OPTIMIZE_FOR_SPACEMarek Olšák2016-11-213-1/+6
| | | | | | | | FORCE_TILING should disable it. It has no effect now, but that may change soon. Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: Add missing error-checking to si_create_compute_state (v2)Mun Gwan-gyeong2016-11-211-1/+5
| | | | | | | | | | | | | | | | When the uploading of shader fails on si_shader_binary_upload(), it returns -ENOMEM. We should handle si_shader_binary_upload() failure path on si_create_compute_state(). CID 1394027 v2: Fixes from Edward O'Callaghan's review a) Update explicitly return value check with "si_shader_binary_upload() < 0" b) Update commit message. Signed-off-by: Mun Gwan-gyeong <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* draw: drop some overflow computationsRoland Scheidegger2016-11-211-65/+46
| | | | | | | | | | | | | | | | | | | | | | | It turns out that noone actually cares if the address computations overflow, be it the stride mul or the offset adds. Wrap around seems to be explicitly permitted even by some other API (which is a _very_ surprising result, as these overflow computations were added just for that and made some tests pass at that time - I suspect some later fixes fixed the actual root cause...). So the requirements in that other api were actually sane there all along after all... Still need to make sure the computed buffer size needed is valid, of course. This ditches the shiny new widening mul from these codepaths, ah well... And now that I really understand this, change the fishy min limiting indices to what it really should have done. Which is simply to prevent fetching more values than valid for the last loop iteration. (This makes the code path in the loop minimally more complex for the non-indexed case as we have to skip the optimization combining two adds. I think it should be safe to skip this actually there, but I don't care much about this especially since skipping that optimization actually makes the code easier to read elsewhere.) Reviewed-by: Jose Fonseca <[email protected]>
* draw: simplify fetch some moreRoland Scheidegger2016-11-211-63/+55
| | | | | | | | | | | Don't keep the ofbit. This is just a minor simplification, just adjust the buffer size so that there will always be an overflow if buffers aren't valid to fetch from. Also, get rid of control flow from the instanced path too. Not worried about performance, but it's simpler and keeps the code more similar to ordinary fetch. Reviewed-by: Jose Fonseca <[email protected]>
* draw: unify linear and elts draw jit functionsRoland Scheidegger2016-11-213-89/+70
| | | | | | | | | | | | | | | | | | | | | | | | The code for elts and linear paths was nearly 100% identical by now - with the elts path simply having some additional gather for the elements in the main loop (with some additional small differences before the main loop). Hence nuke the separate functions and decide this at jit shader execution time (simply based on the presence of the elts pointer). Some analysis shows that the generated vs jit functions seem to be just very minimally more complex than the former elts functions, and almost none of the additional complexity is in the main loop (basically just the branch logic for the branch fetching the actual indices). Compared to linear, the codesize of the function is of course a bit larger, however the actual executed code in the main loop appears to be near 100% identical (the additional code looking up indices is skipped as expected). So, I would not expect a (meaningful) performance difference with the generated code, neither with elts nor linear, this does however roughly half the compilation time (the compiled shaders should also use only half the memory of course). Reviewed-by: Jose Fonseca <[email protected]>
* draw: use same argument order for jit draw linear / elts functionsRoland Scheidegger2016-11-213-34/+30
| | | | | | This is a bit simpler. Mostly to make it easier to unify the paths later... Reviewed-by: Jose Fonseca <[email protected]>
* draw: drop unnecessary index overflow handling from vsplit codeRoland Scheidegger2016-11-212-56/+28
| | | | | | | | | | | | | | | | | | | | This was kind of strange, since it replaced indices which were only overflowing due to bias with MAX_UINT. This would cause an overflow later in the shader, except if stride was 0, however the vertex id would be essentially random then (-1 + eltBias). No test cared about it, though. So, drop this and just use ordinary int arithmetic wraparound as usual. This is much simpler to understand and the results are "more correct" or at least more consistent (vertex id as well as actual fetch results just correspond to wrapped around arithmetic). There's only one catch, it is now possible to hit the cache initialization value also with ushort and ubyte elts path (this wouldn't be an issue if we'd simply handle the eltBias itself later in the shader). Hence, we need to make sure the cache logic doesn't think this element has already been emitted when it has not (I believe some seriously bad things could happen otherwise). So, borrow the logic which handled this from the uint case, but not before fixing it up... Reviewed-by: Jose Fonseca <[email protected]>
* draw: simplify vsplit elts code a bitRoland Scheidegger2016-11-213-40/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | vsplit_get_base_idx explicitly returned idx 0 and set the ofbit in case of overflow. We'd then check the ofbit and use idx 0 instead of looking it up. This was necessary because DRAW_GET_IDX used to return DRAW_MAX_FETCH_IDX and not 0 in case of overflows. However, this is all unnecessary, we can just let DRAW_GET_IDX return 0 in case of overflow. In fact before bbd1e60198548a12be3405fc32dd39a87e8968ab the code already did that, not sure why this particular bit was changed (might have been one half of an attempt to get these indices to actual draw shader execution - in fact I think this would make things less awkward, it would require moving the eltBias handling to the shader as well). Note there's other callers of DRAW_GET_IDX - those code paths however explicitly do not handle index buffer overflows, therefore the overflow value doesn't matter for them. Also do some trivial simplification - for (unsigned) a + b, checking res < a is sufficient for overflow detection, we don't need to check for res < b too (similar for signed). And an index buffer overflow check looked bogus - eltMax is the number of elements in the index buffer, not the maximum element which can be fetched. (Drop the start check against the idx buffer though, this is already covered by end check and end < start). Reviewed-by: Jose Fonseca <[email protected]>
* gallium: Add support for SWR compilationGeorge Kyriazis2016-11-213-0/+12
| | | | | | | | Include swr library and include -DHAVE_SWR in the compile line. v3: split to a separate commit Reviewed-by: Emil Velikov <[email protected]>
* gallium: swr: Added swr build for windowsGeorge Kyriazis2016-11-213-0/+218
| | | | | | | | v4: Add windows-specific gen_knobs.{cpp|h} changes v5: remove aggresive squashing of gen_knobs.py to this commit; added SConscript to EXTRA_DIST in Makefile.am Reviewed-by: Emil Velikov <[email protected]>
* swr: Modify gen_knobs.{cpp|h} creation scriptGeorge Kyriazis2016-11-212-26/+39
| | | | | | | | | | Modify gen_knobs.py so that each invocation creates a single generated file. This is more similar to how the other generators behave. v5: remove Scoscript edits from this commit; moved to commit that first adds SConscript Acked-by: Emil Velikov <[email protected]>
* swr: Windows-related changesGeorge Kyriazis2016-11-212-7/+29
| | | | | | | | | | | | | - Handle dynamic library loading for windows - Implement swap for gdi - fix prototypes - update include paths on configure-based build for swr_loader.cpp v2: split to multiple patches v3: split and reshuffle some more; renamed title v4: move Makefile.am changes to other commit. Modify header files Reviewed-by: Emil Velikov <[email protected]>
* swr: renamed duplicate swr_create_screen()George Kyriazis2016-11-213-2/+6
| | | | | | | | | | | There are 2 swr_create_screen() functions. One in swr_loader.cpp, which is used during driver init, and the other is hiding in swr_screen.cpp, which ends up in the arch-specific .dll/.so. Rename the second one to swr_create_screen_internal(), to avoid confusion in header files. Reviewed-by: Emil Velikov <[email protected]>
* swr: Handle windows.h and NOMINMAXGeorge Kyriazis2016-11-213-26/+17
| | | | | | | | | Reorder header files so that we have a chance to defined NOMINMAX before mesa include files include windows.h v3: split from bigger patch Reviewed-by: Emil Velikov <[email protected]>
* gallium: Added SWR support for gdiGeorge Kyriazis2016-11-211-5/+23
| | | | | | | | | | Added hooks for screen creation and swap. Still keep llvmpipe the default software renderer. v2: split from bigger patch v3: reword commit message Reviewed-by: Emil Velikov <[email protected]>
* radeonsi: Fix resource leak in gs_copy_shader allocation failure pathGwan-gyeong Mun2016-11-221-1/+7
| | | | | | | | | CID 1394028 Signed-off-by: Mun Gwan-gyeong <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: store group_size_variable in struct si_computeNicolai Hähnle2016-11-211-5/+8
| | | | | | | | | | | | | | For compute shaders, we free the selector after the shader has been compiled, so we need to save this bit somewhere else. Also, make sure that this type of bug cannot re-appear, by NULL-ing the selector pointer after we're done with it. This bug has been there since the feature was added, but was only exposed in piglit arb_compute_variable_group_size-local-size by commit 9bfee7047b70cb0aa026ca9536465762f96cb2b1 (which is totally unrelated). Cc: 13.0 <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nvc0/ir: use levelZero flag when the lod is set to 0Ilia Mirkin2016-11-202-6/+43
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* swr: mark streamout buffers as writtenIlia Mirkin2016-11-191-0/+7
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* radeonsi: emit sample locations also when nr_samples == 1Nicolai Hähnle2016-11-181-1/+4
| | | | | | | | | | | | Since the state tracker now enables MSAA in the hardware for the case nr_samples == 1 as well, we need to set sample locations correctly for this case. The Polaris override is still needed for the non-MSAA case (when nr_samples == 0). Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: allow sample mask export for single-sample framebuffersNicolai Hähnle2016-11-181-4/+5
| | | | | | | This fixes GL45-CTS.sample_variables.mask.*.samples_1.*. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* draw: finally optimize bool clip mask generationRoland Scheidegger2016-11-183-23/+26
| | | | | | | | | | | lp_build_any_true_range is just what we need, though it will only produce optimal code with sse41 (ptest + set) - but even without it on 64bit x86 the code is still better (1 unpack, 2 movq + or + set), on 32bit x86 it's going to be roughly the same as before. While here also make it a "real" 8bit boolean - cuts one instruction but more importantly similar to ordinary booleans. Reviewed-by: Jose Fonseca <[email protected]>
* draw: use vectorized calculations for fetch (v2)Roland Scheidegger2016-11-181-131/+265
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of doing all the math with scalars, use vectors. This means the overflow math needs to be done manually, albeit that's only really problematic for the stride/index mul, the rest has been pretty much moved outside the shader loop (albeit the mul could actually be optimized away too), where things are still scalar. To eliminate control flow in the main shader loop fetch, provide fake buffers (so index 0 is always valid to fetch). Still uses aos fetch though in the end - mostly because some more code would be needed to handle unaligned fetches in that path, and because for most formats it won't make a difference anyway (we generate some truly horrendous code for things like R16G16_something for instance). Instanced fetch however stays roughly the same as before, except that no longer the same element is fetched multiple times (I've seen a reduction of ~3 times in main shader loop size due to llvm not recognizing it's all the same fetch, since it would have been possible some of the fetches getting replaced with zeros in case vector size exceeds remaining fetch count - the values of such fetches don't matter at all though). Also, for elts gathering, use vectorized code as well. The generated shaders are smaller and faster to compile (not entirely sure about execution speed, but generally unless there's just single vertices to handle I would expect it to be faster - there's more opportunities for future improvements by using soa fetch). v3: skip the fake index buffer, not needed due to the jit code never seeing the real index buffer in the first place. Fix a bug with mask expansion (needs SExt, not ZExt). Also, be really really careful to keep the behavior the same, even in cases where it looks wrong, and add comments why the code is doing the seemingly wrong stuff... Fortunately it's not actually more complex in the end... Also change function order slightly just to make the diff more readable. No piglit change. Passes some internal testing with another api too... Reviewed-by: Jose Fonseca <[email protected]>
* vc4: Try compiling our FSes in multithreaded mode on new kernels.Eric Anholt2016-11-165-2/+20
| | | | | | Multithreaded fragment shaders let us hide texturing latency by a hyperthreading-style switch to another fragment shader. This gets us up to 20% framerate improvements on glmark2 tests.
* vc4: Add support for ETC1 textures if the kernel is new enough.Eric Anholt2016-11-164-5/+18
| | | | | The kernel changes for exposing the param have now been merged, so we can expose it here.
* vc4: Fix simulator mode missing-GETPARAM debug info.Eric Anholt2016-11-161-1/+1
| | | | The value is 0 since we didn't set it, we wanted to see the param.
* vc4: Fix resource leak in register allocation failure path.Mun Gwan-gyeong2016-11-161-0/+2
| | | | | | CID 1394322 Signed-off-by: Mun Gwan-gyeong <[email protected]>
* swr: [rasterizer core] fix clear with multiple color attachmentsTim Rowley2016-11-166-52/+40
| | | | | | | | Fixes fbo-mrt-alphatest v2: styling fixes Reviewed-by: Bruce Cherniak <[email protected]>
* u_simple_shaders: try to un-break the Windows buildNicolai Hähnle2016-11-161-2/+3
| | | | Acked-by: Edward O'Callaghan <[email protected]>
* radeonsi: fix a subtle bounds checking corner case with 3-component attributesNicolai Hähnle2016-11-163-2/+39
| | | | | | | | I'm also sending out a piglit test, gl-2.0/vertexattribpointer-size-3, which exposes this corner case. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: reject some 3-component formats as buffer texturesNicolai Hähnle2016-11-161-8/+35
| | | | | | | Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* util/blitter: add clamping during SINT <-> UINT blitsNicolai Hähnle2016-11-166-43/+125
| | | | | | | | | | | | Even though glBlitFramebuffer cannot be used for SINT <-> UINT blits, we still need to handle this type of blit here because it can happen as part of texture uploads / downloads, e.g. uploading a GL_RGBA8I texture from GL_UNSIGNED_INT data. Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* util/blitter: index texfetch_col shaders by typeNicolai Hähnle2016-11-161-35/+19
| | | | | Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* swr: mark color clamping as unsupportedIlia Mirkin2016-11-151-2/+3
| | | | | | | | | | | There is no functionality in swr to clamp either vertex or frag colors. This could be added in swr_shader, at which point these could be re-enabled. Fixes arb_color_buffer_float-render Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: always enable adding start/base vertex to gl_VertexIdIlia Mirkin2016-11-151-0/+1
| | | | | | | Fixes gl-3.2-basevertex-vertexid Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: add support for upper-left fragcoord positionIlia Mirkin2016-11-151-2/+8
| | | | | | | Fixes glsl-arb-fragment-coord-conventions. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: make sure that all rendering is finished on shader destroyIlia Mirkin2016-11-151-0/+8
| | | | | | | | | | | | Rendering could still be ongoing (or have yet to start) when the shader is deleted. There's no refcounting on the shader text, so insert a pipeline stall unconditionally when this happens. [Note, we should instead introduce a way to attach work to fences, so that the freeing can be done in the current fence.] Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: disable blending for integer formatsIlia Mirkin2016-11-151-0/+3
| | | | | | | | | | The EXT_texture_integer test says that blending and alphatest should all be disabled. st/mesa takes care of alphatest already. Fixes the ext_texture_integer-fbo-blending piglit test. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: mark rgb9_e5 as unrenderableIlia Mirkin2016-11-151-1/+1
| | | | | | | | | The support in swr requires shaders to output the components as UINTs. This is not how GL or Gallium work, and since this is not a required-renderable format, just leave it out. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: no support for shader stencil exportIlia Mirkin2016-11-151-1/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: mark both frag and vert textures read, don't forget about cbsIlia Mirkin2016-11-151-5/+15
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: fix texture layout for compressed formatsIlia Mirkin2016-11-152-4/+6
| | | | | | | Fixes the texsubimage piglit and lets the copyteximage one get further. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: add archrast generated files to gitignoreIlia Mirkin2016-11-151-0/+4
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer jitter] don't bother quantizing unused channelsIlia Mirkin2016-11-151-1/+1
| | | | | | | | In a BGR10X2 or BGR5X1 situation, there's no need to try to quantize the X channel - the default will have the proper quantization required. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer memory] fix store tile for 128-bit ymajor tilingIlia Mirkin2016-11-151-1/+1
| | | | | | | Noticed by inspection. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer memory] add support for R32_FLOAT_X8X24 formatsIlia Mirkin2016-11-152-0/+2
| | | | | | | | This is the format used for the primary surface of a PIPE_FORMAT_Z32_FLOAT_S8X24_UINT resource. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>