path: root/src/gallium/drivers/llvmpipe
Commit message [Author, Age, Files, Lines]
* gallium/util: switch over to new u_debug_image.[ch] code [Brian Paul, 2016-02-08, files: 1, lines: -0/+1]
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium: add interface for querying memory usage and sizes (v2) [Marek Olšák, 2016-02-05, files: 1, lines: -0/+1]
| | | | | | | | | | If you're worried about the duplication of some CAPs, we can remove them later. v2: add fields for memory eviction stats Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
* gallium: add PIPE_CAP_QUERY_BUFFER_OBJECT [Ilia Mirkin, 2016-02-04, files: 1, lines: -0/+1]
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: Add PIPE_CAP_SURFACE_REINTERPRET_BLOCKS [Nicolai Hähnle, 2016-02-03, files: 1, lines: -0/+1]
| | | | | | | | | | This cap indicates whether pipe->create_surface can reinterpret a texture as a surface with a format of different block width/height (but equal block size). v2: fix whitespace Reviewed-by: Edward O'Callaghan <[email protected]>
* gallium: Add PIPE_CAP_BUFFER_SAMPLER_VIEW_RGBA_ONLY [Nicolai Hähnle, 2016-02-03, files: 1, lines: -0/+1]
| | | | | | | | | This cap indicates that the driver only supports R, RG, RGB and RGBA formats for PIPE_BUFFER sampler views. v2: move into "unsupported features" section for nouveau (Ilia Mirkin) Reviewed-by: Edward O'Callaghan <[email protected]>
* llvmpipe: use scissor_planes_needed helper function [Roland Scheidegger, 2016-02-03, files: 3, lines: -18/+33]
| | | | So it doesn't get out of sync in multiple places.
* llvmpipe: drop scissor planes early if the tri is fully inside them [Roland Scheidegger, 2016-02-02, files: 2, lines: -69/+110]
| | | | | | | | | | | | | | | | | | | | | | | | | | If the tri is fully inside a scissor edge (or rather, we just use the bounding box of the tri for the comparison), then we can drop these additional scissor "planes" early. We do not even need to allocate space for them in the tri. The math actually appears to be slightly iffy due to bounding boxes being rounded, but it doesn't matter in the end. Those scissor rects are costly - the 4 planes from the scissor are already more expensive to calculate than the 3 planes from the tri itself, and it also prevents us from using the specialized raster code for small tris. This helps openarena performance by about 8% or so. Of course, it helps there that while openarena often enables scissoring (and even moves the scissor rect around) I have not seen a single tri actually hit the scissor rect, ever. v2: drop individual scissor edges, and do it earlier, not even allocating space for them. v3: help the compiler a bit with simpler code, suggested by Brian. Reviewed-by: Brian Paul <[email protected]>
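The early-drop test described above can be sketched in a few lines of C. This is only an illustration of the idea, not the driver's code: `struct rect`, `scissor_edges_needed()` and the inclusive-edge convention are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inclusive rectangle; not the driver's actual type. */
struct rect {
   int x0, x1;   /* left and right edges, inclusive */
   int y0, y1;   /* top and bottom edges, inclusive */
};

/*
 * For each scissor edge, report whether a plane is still required.
 * An edge is needed only if the triangle's bounding box crosses it.
 * Returns the number of planes that must be kept (0..4).
 */
static int
scissor_edges_needed(bool needed[4], const struct rect *bbox,
                     const struct rect *scissor)
{
   int i, count = 0;

   assert(bbox->x0 <= bbox->x1 && bbox->y0 <= bbox->y1);

   needed[0] = bbox->x0 < scissor->x0;   /* left */
   needed[1] = bbox->x1 > scissor->x1;   /* right */
   needed[2] = bbox->y0 < scissor->y0;   /* top */
   needed[3] = bbox->y1 > scissor->y1;   /* bottom */

   for (i = 0; i < 4; i++)
      count += needed[i];
   return count;
}
```

A triangle whose bounding box lies entirely inside the scissor rect gets a count of zero, so no extra plane storage has to be allocated for it at all.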
* llvmpipe: minor cleanup of sse2 for calc_fixed_position [Roland Scheidegger, 2016-02-02, files: 1, lines: -6/+5]
| | | | | | Just slightly simpler assembly. Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: use vector loads for (optimized) tri raster funcs [Roland Scheidegger, 2016-02-02, files: 2, lines: -37/+24]
| | | | When we switched to 64bit rasterization, we could no longer use straight aligned loads for loading the plane data. However, what the code actually does for loading 3 planes is 12 scalar loads + 9 unpacks, and then there's another 8 unpacks for the transpose we need (!). It would of course be possible to do the (scalar) loads already transposed (at least saving the additional unpacks), however instead just use (un)aligned vector loads, and recalculate the eo values, which takes far fewer instructions (note that in the triangle_32_3_4 case, the eo values are not even used, making the scalar loads + unpacks for them all the more pointless). This drops execution time of the triangle_32_3_4 function considerably, albeit it doesn't really make a measurable difference (for small tris we're essentially limited by vertex throughput in any case), for triangle_32_3_16 it's essentially noise (the loop is more costly than the initial code there). (I'm thinking about just ditching storing the eo values in the plane data, so could switch back to using aligned planes, however right now they are still used in the other raster functions dealing with planes with scalar code. Also not touching the ppc code, might not be that bad there in any case.) Reviewed-by: Brian Paul <[email protected]>
* gallium: add GREMEDY_string_marker [Rob Clark, 2016-01-21, files: 1, lines: -0/+1]
| | | | | | | | | | Since the GREMEDY extensions are normally only exposed by the gremedy debugger (and could possibly trigger debug paths in the app), we don't expose the extension by default, but instead only with ST_DEBUG=gremedy. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* llvmpipe: warn about illegal use of objects in different contexts [Roland Scheidegger, 2016-01-21, files: 3, lines: -1/+32]
| | | | | | | | | | | Doing that is clearly a bug. We can't quite assert as st/mesa may hit this, but increase at least visibility of it a bit. (For the non-refcounted objects it would be illegal too, but we can't detect that unless we'd store the context ourselves. Plus, those don't tend to cause random crashes at context or object destruction time... So just sampler views, surfaces and so targets for now.) Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe,i915: add back NEW_RASTERIZER dependency when computing vertex info [Roland Scheidegger, 2016-01-21, files: 1, lines: -2/+4]
| | | | | | | | | | | | | | | | | | | | I removed this mistakenly in 2dbc20e45689e09766552517a74e2270e49817b5. I actually thought it should not be necessary and a piglit run didn't show any differences, but this shouldn't have been in there. draw_prepare_shader_outputs() is in fact dependent on NEW_RASTERIZER. The new polygon-mode-facing test indeed shows why this is necessary, there's lots of invalid reads and writes with valgrind (also crashes without valgrind), because the pre-pipeline vertex size doesn't match the post-pipeline vertex size (note this won't help much with stages which don't have the prepare hook which can grow the vertex size, in particular the wide point stage, but this isn't used by llvmpipe). The test still won't pass, of course, but it is only usage of uninitialized values now, which is much less dangerous... (Albeit I'm pretty sure for i915 it really is not needed anymore as it doesn't care about the extra outputs and doesn't call draw_prepare_shader_outputs().) Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: turn depth clears into full depth/stencil clears for d24x8 formats [Roland Scheidegger, 2016-01-20, files: 1, lines: -11/+14]
| | | | | | | | | | | | If we have a d24x8 format, there is no stencil. Therefore, we can always clear these bits too, which means this will be some kind of memset rather than read-modify-write. This is good for some 7% increase or so in gears with huge window size - seems to have a bigger effect if things aren't in caches. Of course, any real app won't spend nearly as much time comparatively in clearing depth buffer in the first place, so the speedup will be much lower. Reviewed-by: Jose Fonseca <[email protected]>
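To make the difference between the two clear paths concrete, here is a minimal sketch. It assumes a 32-bit word with depth in the low 24 bits and 8 unused bits on top; the function names and `DEPTH_MASK` are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for this sketch: depth in the low 24 bits, 8 unused bits on top. */
#define DEPTH_MASK 0x00ffffffu

/* Depth-only clear that preserves the upper 8 bits: read-modify-write. */
static void
clear_depth_masked(uint32_t *buf, size_t count, uint32_t depth)
{
   size_t i;

   assert((depth & ~DEPTH_MASK) == 0);
   for (i = 0; i < count; i++)
      buf[i] = (buf[i] & ~DEPTH_MASK) | depth;
}

/* Full clear: every word is overwritten, so nothing has to be read back. */
static void
clear_full(uint32_t *buf, size_t count, uint32_t value)
{
   size_t i;

   for (i = 0; i < count; i++)
      buf[i] = value;
}
```

Since the upper 8 bits of a d24x8 word carry no stencil data, both functions leave the same depth values behind, and the second one is a plain store loop that behaves like a memset.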
* llvmpipe: fix arguments order given to vec_andc [Oded Gabbay, 2016-01-17, files: 1, lines: -1/+1]
| | | | | | | | | | | | | | | | This patch fixes a classic "confuse the enemy" bug. _mm_andnot_si128 (SSE) and vec_andc (VMX) do the same operation, but the arguments are opposite. _mm_andnot_si128 performs "r = (~a) & b" while vec_andc performs "r = a & (~b)" To make sure this error won't return in another place, I added a wrapper function, vec_andnot_si128, in u_pwr8.h, which makes the swap inside. Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
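The argument-order trap is easy to see with a scalar model. The sketch below does not use the real intrinsics; the three functions are hypothetical stand-ins that only mirror the semantics quoted in the commit message, with a wrapper that performs the swap the way the message describes for vec_andnot_si128.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the SSE2 _mm_andnot_si128 semantics: r = (~a) & b. */
static uint32_t
andnot_sse_order(uint32_t a, uint32_t b)
{
   return ~a & b;
}

/* Scalar model of the VMX vec_andc semantics: r = a & (~b). */
static uint32_t
andc_vmx_order(uint32_t a, uint32_t b)
{
   return a & ~b;
}

/* Wrapper with SSE argument order built on the VMX-style op; the swap is inside. */
static uint32_t
andnot_via_andc(uint32_t a, uint32_t b)
{
   return andc_vmx_order(b, a);
}
```

With a = 0xf0f0f0f0 and b = 0xff00ff00, the SSE-order result is 0x0f000f00 while calling the VMX-style op with the same argument order yields 0x00f000f0; the wrapper reproduces the SSE result.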
* llvmpipe: ditch additional ref counting for vertex/geometry sampler views [Roland Scheidegger, 2016-01-15, files: 4, lines: -46/+12]
| | | | | | | | | | | | | | | | | | | | | | | | | | The cleaning up was quite a performance hog (making pipe_resource_reference the number two in profilers on the vertex path, and 3rd overall, with its cousin pipe_reference_described not far behind) if there were lots of tiny draw calls (ipers). Now the reason was really that it was blindly calling this for all potential shader views (so 32 each for vs and gs) even though the app never touched a single one which could have been fixed, however I can't come up with a good reason why we refcount these. We've got references, of course, in the sampler views, which should be quite sufficient as we do all vertex and geometry shader execution fully synchronous. (Calling prepare_shader_sampling for all draw calls even if there were no changes looks quite suboptimal too, but generally we don't really expect vs/gs shader sampling to be used much with llvmpipe, and there's even an early exit if there aren't any views to avoid the "null loop" albeit it's now no longer always trying to loop through all 32 slots. Maybe improve another time...). Of course, if we manage to make vertex loads run asynchronously some day, we need references again, but adding that back would be the least of the problems... Also only set LP_NEW_SAMPLER_VIEW for fragment sampler views. Nothing on the vertex side depends on it (I suppose we'd really wanted a separate flag in any case). (Good for a 3% improvement or so in ipers under the right conditions.) Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: fix "leaking" textures [Roland Scheidegger, 2016-01-15, files: 2, lines: -2/+9]
| | | | | | | | | | | | | | | | | | | | | This was not really a leak per se, but we were referencing the textures for longer than intended. If textures were set via llvmpipe_set_sampler_views() (for fs) and then picked up by lp_setup_set_fragment_sampler_views(), they were referenced in the setup state. However, the only way to unreference them was by replacing them with another texture, and not when the texture slot was replaced with a NULL sampler view. (They were then further also referenced by the scene too which might have additional minor side effects as we limit the memory size which is allowed to be referenced by a scene in a rather crude way.) Only setup destruction (at context destruction time) then finally would get rid of the references. Fix this by noting the number of textures the last time, and unreference things if the new view is NULL (avoiding having to unreference things always up to PIPE_MAX_SHADER_SAMPLER_VIEWS which would also have worked). Found by code inspection, no test... v2: rename var Reviewed-by: Jose Fonseca <[email protected]>
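The fix can be pictured with a toy reference-counting model. Everything in this sketch is hypothetical (`struct texture`, `struct setup_state`, `set_fragment_views()` and `MAX_VIEWS` are illustration names, not the driver's); it only shows the idea of remembering how many slots were set last time and dropping references for slots that become NULL or fall off the end.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VIEWS 8   /* hypothetical limit for the sketch */

struct texture { int refcount; };

struct setup_state {
   struct texture *bound[MAX_VIEWS];
   unsigned num_bound;   /* number of slots set the last time */
};

/* Point *slot at tex, adjusting reference counts; tex may be NULL. */
static void
texture_reference(struct texture **slot, struct texture *tex)
{
   if (tex)
      tex->refcount++;
   if (*slot)
      (*slot)->refcount--;
   *slot = tex;
}

static void
set_fragment_views(struct setup_state *setup, unsigned num,
                   struct texture **views)
{
   unsigned i;
   unsigned max = num > setup->num_bound ? num : setup->num_bound;

   assert(max <= MAX_VIEWS);
   for (i = 0; i < max; i++) {
      /* Slots beyond num, or set to NULL, drop their old reference. */
      struct texture *tex = i < num ? views[i] : NULL;
      texture_reference(&setup->bound[i], tex);
   }
   setup->num_bound = num;
}
```

Only the slots that were in use last time are visited, so there is no need to walk all the way up to the maximum number of sampler views on every call.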
* gallium/st: add pipe_context::generate_mipmap() [Charmaine Lee, 2016-01-14, files: 1, lines: -0/+1]
| | | | | | | | | | | | | | | | This patch adds a new interface to support hardware mipmap generation. PIPE_CAP_GENERATE_MIPMAP is added to allow a driver to specify if this new interface is supported; if not supported, the state tracker will fallback to mipmap generation by rendering/texturing. v2: add PIPE_CAP_GENERATE_MIPMAP to the disabled section for all drivers v3: add format to the generate_mipmap interface to allow mipmap generation using a format other than the resource format v4: fix return type of trace_context_generate_mipmap() Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallium: add PIPE_CAP_INVALIDATE_BUFFER [Nicolai Hähnle, 2016-01-14, files: 1, lines: -0/+1]
| | | | | | | | | It makes sense to re-use pipe->invalidate_resource for the purpose of glInvalidateBufferData, but this function is already implemented in vc4 where it doesn't have the expected behavior. So add a capability flag to indicate that the driver supports the expected behavior. Reviewed-by: Marek Olšák <[email protected]>
* llvmpipe: (trivial) use cast wrapper for __m128d to __m128 casts [Roland Scheidegger, 2016-01-13, files: 1, lines: -2/+2]
| | | | some compiler was unhappy.
* llvmpipe: avoid most 64 bit math in rasterization [Roland Scheidegger, 2016-01-13, files: 2, lines: -65/+143]
| | | | The trick here is to recognize that in the c + n * dcdx calculations, not only can the lower FIXED_ORDER bits not change (as the dcdx values have those all zero) but that this means the sign bit of the calculations cannot be different as well, that is sign(c + n*dcdx) == sign((c >> FIXED_ORDER) + n*(dcdx >> FIXED_ORDER)). That shaves off more than enough bits to never require 64bit masks. A shifted plane c value could still easily exceed 32 bits, however since we throw out planes which are trivial accept even before binning (and similarly don't even get to see tris for which there was a trivial reject plane) this is never a problem. The idea isn't all that revolutionary, in fact something similar was tried ages ago (9773722c2b09d5f0615a47cecf4347859474dc56) back when the values were only 32 bit anyway. I believe now it didn't quite work then because the adjustment needed for testing trivial reject / partial masks wasn't handled correctly. This still keeps the separate 32/64 bit paths for now, as the 32 bit one still looks minimally simpler (and also because if we'd pass in dcdx/dcdy/eo unscaled from setup which would be a good reason to ditch the 32 bit path, we'd need to change the special-purpose rasterization functions for small tris). This passes piglit triangle-rasterization (-fbo -auto -max_size -subpixelbits 8) and triangle-rasterization-overdraw (with some hacks to make it work correctly with large sizes) easily (full piglit as well of course, but most tests wouldn't use triangles large enough to be affected, that is tris with a bounding box over 128x128). The profiler says indeed time spent in rast_tri functions is reduced substantially, BUT of course only if the tris are large. I measured a 3% improvement in mesa gloss demo when supersized to twice the screen size... Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
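The sign identity can be checked with a small scalar sketch. It assumes FIXED_ORDER is 8 and reads "sign" as "is the value negative"; the helper names are hypothetical, and the shift is written as a portable floor division by 2^FIXED_ORDER rather than relying on the compiler's handling of negative right shifts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIXED_ORDER 8   /* assumed number of subpixel bits for this sketch */

/* Portable floor(v / 2^FIXED_ORDER), i.e. an arithmetic right shift. */
static int64_t
shift_down(int64_t v)
{
   const int64_t one = (int64_t)1 << FIXED_ORDER;
   return v >= 0 ? v >> FIXED_ORDER : -((-v + one - 1) >> FIXED_ORDER);
}

/* Full-precision test: is c + n * dcdx negative? */
static bool
negative_full(int64_t c, int64_t dcdx, int n)
{
   return c + n * dcdx < 0;
}

/* Reduced-precision test on values shifted down by FIXED_ORDER bits. */
static bool
negative_shifted(int64_t c, int64_t dcdx, int n)
{
   /* The identity only holds when dcdx has no fractional bits set. */
   assert((dcdx & (((int64_t)1 << FIXED_ORDER) - 1)) == 0);
   return shift_down(c) + n * shift_down(dcdx) < 0;
}
```

The reason both tests agree: writing c = ch * 2^FIXED_ORDER + cl with 0 <= cl < 2^FIXED_ORDER, the full sum is (ch + n * d) * 2^FIXED_ORDER + cl, which is negative exactly when ch + n * d is negative.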
* llvmpipe: scale up bounding box planes to subpixel precision [Roland Scheidegger, 2016-01-13, files: 3, lines: -30/+30]
| | | | | | | | | Otherwise some planes we get in rasterization have subpixel precision, others not. Doesn't matter so far, but will soon. (OpenGL actually supports viewports with subpixel accuracy, so could even do bounding box calcs with that). Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: add sse code for fixed position calculation [Roland Scheidegger, 2016-01-13, files: 1, lines: -8/+50]
| | | | This is quite a few less instructions, albeit it still does the 2 64bit muls with scalar c code (they'd need way more shuffles, plus fixup for the signed mul so it totally doesn't seem worth it - x86 can do 32x32->64bit signed scalar muls natively just fine after all (even on 32bit)). (This still doesn't have a very measurable performance impact in reality, although profiler seems to say time spent in setup indeed has gone down by 10% or so overall. Maybe good for a 3% or so improvement in openarena.) Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallium: add PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT [Ilia Mirkin, 2016-01-08, files: 1, lines: -0/+1]
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: add caps for POSITION and FACE system values [Marek Olšák, 2016-01-08, files: 1, lines: -0/+2]
| | | | v2: document the integer behavior Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium: add caps to expose support for multi indirect draws [Ilia Mirkin, 2016-01-07, files: 1, lines: -0/+2]
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* llvmpipe: do 64bit plane calculations in the sse path [Roland Scheidegger, 2016-01-08, files: 2, lines: -50/+70]
| | | | | | | | | | | | The sse path was pretty much disabled for practical purposes because the largest allowed fb size was 128x128. So, adapt it for 64bit plane calculations. This is actually not that difficult, though a problem is that we can't do a signed 32x32->64bit mul, only unsigned, so need to fix that up. Overall, the code still looks reasonable, though it's not like changes there in setup really make much of a difference in the end... Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
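One well-known way to get a signed 32x32->64 bit product out of an unsigned multiply is shown below in scalar C. It is a sketch of the general fixup technique, not necessarily what the SSE code in this commit does; `mul_signed_via_unsigned()` is a hypothetical name, and the final conversion assumes the usual two's complement behaviour.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Signed 32x32->64 bit multiply built from an unsigned one.
 * Reinterpreting a negative a as unsigned adds 2^32 to it, which adds
 * b * 2^32 to the product (mod 2^64); the same holds for a negative b.
 * Subtracting those terms again recovers the signed result.
 */
static int64_t
mul_signed_via_unsigned(int32_t a, int32_t b)
{
   const uint32_t ua = (uint32_t)a;
   const uint32_t ub = (uint32_t)b;
   uint64_t prod = (uint64_t)ua * ub;   /* unsigned 32x32->64 */
   uint64_t fix = 0;

   if (a < 0)
      fix += ub;
   if (b < 0)
      fix += ua;
   prod -= fix << 32;                   /* wraps modulo 2^64 by design */

   return (int64_t)prod;                /* two's complement assumed */
}
```

In vector code the two conditional additions would typically become masked operations, which is why the fixup costs a few extra instructions compared to a native signed multiply.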
* llvmpipe: don't store eo as 64bit int [Roland Scheidegger, 2016-01-08, files: 4, lines: -11/+16]
| | | | | | | | | | | eo, just like dcdx and dcdy, cannot overflow 32bit. Store it as unsigned though just in case (it cannot be negative, but in theory twice as big as dcdx or dcdy so this gives it one more bit). This doesn't really change anything, albeit it might help minimally on 32bit archs. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: use aligned data for the assembly program in setup [Roland Scheidegger, 2016-01-08, files: 1, lines: -17/+21]
| | | | | | | | | Back in the day (before 24678700edaf5bb9da9be93a1367f1a24cfaa471) the values were not actually in a struct but even then I can't see why we didn't simply align the values. Especially since it's trivial to do so. (Not that it actually matters since the code is pretty much unused for now.) Reviewed-by: Oded Gabbay <[email protected]>
* llvmpipe: use ints not unsigned for slots [Roland Scheidegger, 2016-01-07, files: 6, lines: -67/+73]
| | | | | | | | | | | | | They can't actually be 0 (as position is there) but should avoid confusion. This was supposed to have been done by af7ba989fb5a39925a0a1261ed281fe7f48a16cf but I accidentally pushed an older version of the patch in the end... Also prettify slightly. And make some notes about the confusing and useless fs input "map". Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* draw: nuke the interp parameter from vertex_info [Roland Scheidegger, 2016-01-07, files: 1, lines: -13/+12]
| | | | | | | | | | | | | draw emit couldn't care less what the interpolation mode is... This somehow looked like it would matter, all drivers more or less dutifully filled that in correctly. But this is only used for emit, if draw needs to know about interpolation mode (for clipping for instance) it will get that information from the vs anyway. softpipe actually used to depend on that interpolation parameter, as it abused that structure quite a bit but no longer. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* llvmpipe: scratch some special handling of vp_index/layer [Roland Scheidegger, 2016-01-07, files: 4, lines: -38/+7]
| | | | | | | | | It was actually slightly buggy (missing initialization / setup not dependent on new vs albeit I didn't see issues), but the case of non-existing attributes is now handled by draw emit code so don't need that anymore. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* gallium/drivers: Remove unnecessary semicolons [Edward O'Callaghan, 2016-01-06, files: 2, lines: -2/+2]
| | | | | | Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: Optimize lp_rast_triangle_32_3_16 for POWER8 [Oded Gabbay, 2016-01-06, files: 1, lines: -1/+141]
| This patch converts the SSE-optimized lp_rast_triangle_32_3_16() to VMX/VSX.
|
| I measured the results on a POWER8 machine with 32 cores at 3.4GHz and 16GB of RAM.
|
|            FPS/Score
| Name       Before   After   Delta
| ---------------------------------
| openarena  16.35    16.7    2.14%
| xonotic    4.707    4.97    5.57%
|
| glmark2 didn't show a significant (more than 1%) difference.
|
| v2: Make sure code is built only on POWER8 LE machine
|
| Signed-off-by: Oded Gabbay <[email protected]>
| Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: Optimize BUILD_MASK(_LINEAR) for POWER8 [Oded Gabbay, 2016-01-06, files: 1, lines: -40/+110]
| This patch converts the SSE-optimized build_mask_32() and build_mask_linear_32() to VMX/VSX.
|
| I measured the results on a POWER8 machine with 32 cores at 3.4GHz and 16GB of RAM.
|
|                  FPS/Score
| Name             Before   After   Delta
| ---------------------------------------
| glmark2 (score)  139.8    142.7   2.07%
|
| openarena and xonotic didn't show a significant (more than 1%) difference.
|
| v2: Make sure code is built only on POWER8 LE machine
|
| Signed-off-by: Oded Gabbay <[email protected]>
| Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: Optimize do_triangle_ccw for POWER8 [Oded Gabbay, 2016-01-06, files: 1, lines: -0/+100]
| This patch converts the SSE optimization done in do_triangle_ccw to VMX/VSX.
|
| I measured the results on a POWER8 machine with 32 cores at 3.4GHz and 16GB of RAM.
|
|                  FPS/Score
| Name             Before   After   Delta
| ---------------------------------------
| glmark2 (score)  136.6    139.8   2.34%
| openarena        16.14    16.35   1.30%
| xonotic          4.655    4.707   1.11%
|
| v2:
| - Convert loads to use aligned loads
| - Make sure code is built only on POWER8 LE machine
|
| Signed-off-by: Oded Gabbay <[email protected]>
| Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: add PIPE_CAP_TGSI_PACK_HALF_FLOAT to indicate UP2H/PK2H support [Ilia Mirkin, 2016-01-03, files: 1, lines: -0/+1]
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: add PIPE_CAP_DRAW_PARAMETERS [Ilia Mirkin, 2015-12-30, files: 1, lines: -0/+1]
| | | | | | | | This allows the state tracker to know that the various draw parameters are available in vertex shaders. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: fix layer/vp input into fs when not written by prior stages [Roland Scheidegger, 2015-12-12, files: 8, lines: -53/+96]
| | | | | | | | | | | | | | | | | | | | | | | | | ARB_fragment_layer_viewport requires that if a fs reads layer or viewport index but it wasn't output by gs (or vs with other extensions), then it reads 0. This never worked for llvmpipe, and is surprisingly non-trivial to fix. The problem is the mechanism to handle non-existing outputs in draw is rather crude, it will simply redirect them to whatever is at output 0, thus later stages will just get garbage. So, rather than trying to fix this up (which looks non-trivial), fix this up in llvmpipe setup by detecting this case there and output a fixed zero directly. While here, also optimize the hw vertex layout a bit - previously if the gs outputted layer (or vp) and the fs read those inputs, we'd add them twice to the vertex layout, which is unnecessary. And do some minor cleanup, slots don't require that many bits, there was some bogus (but harmless) float/int mixup for psize slot too, make the slots all unsigned (we always put pos at pos zero thus everything else has to be positive if it exists), and make sure they are properly initialized (layer and vp index slot were not which looked fishy as they might not have got set back to zero when changing from a gs which outputs them to one which does not). This fixes the failures in piglit's arb_fragment_layer_viewport group (3 each for layer and vp). Reviewed-by: Jose Fonseca <[email protected]>
* gallium/drivers: Sanitize NULL checks into canonical form [Edward O'Callaghan, 2015-12-06, files: 7, lines: -7/+7]
| | | | Use NULL tests of the form `if (ptr)' or `if (!ptr)'. They do not depend on the definition of the symbol NULL. Further, they avoid the opportunity for accidental assignment, and are clear and succinct. Signed-off-by: Edward O'Callaghan <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
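A short illustration of the canonical form, using a hypothetical helper that is not part of the tree:

```c
#include <stddef.h>

/* Hypothetical helper: returns the length of s, or 0 for a NULL pointer. */
static size_t
safe_length(const char *s)
{
   size_t n = 0;

   if (!s)            /* canonical form, instead of `s == NULL' */
      return 0;
   while (s[n])
      n++;
   return n;
}
```

Compared with `if (s == NULL)', the short form leaves no room for mistyping the comparison as an assignment.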
* gallium/drivers: Trivial code-style cleanup [Edward O'Callaghan, 2015-12-06, files: 5, lines: -7/+7]
| | | | | Signed-off-by: Edward O'Callaghan <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* llvmpipe: Make use of ARRAY_SIZE macro [Edward O'Callaghan, 2015-12-06, files: 2, lines: -4/+4]
| | | | | Signed-off-by: Edward O'Callaghan <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* llvmpipe: use provoking vertex for layer/viewport [Roland Scheidegger, 2015-12-04, files: 2, lines: -17/+32]
| | | | d3d10 actually requires using provoking (first) vertex. GL is happy with any vertex (as long as we say it's undefined in the corresponding queries). Up to now we actually used vertex 0 for viewport index, and vertex 1 for layer (for tris), which really didn't make sense (probably a typo). Also, since we reorder vertices of clockwise triangles, that actually meant we used a different vertex depending on whether the triangle was cw or ccw (still ok by gl). However, it should be consistent with what draw (clip) does, and using provoking vertex seems like the sensible choice (draw clip will be fixed next as it is totally broken there). While here, also use the correct viewport always even when not needed in setup (we pass it down to the jit fragment shader, where it might be needed for getting correct near/far depth values). No piglit changes. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* softpipe/llvmpipe: don't advertize support for ASTC [Roland Scheidegger, 2015-11-24, files: 1, lines: -1/+2]
| | | | | | | | | 33339775565154040e0c4ea2e196217dccc08cdf added support for ASTC textures to gallium. They don't have any helpers hooked up for software decoding, however, so cannot support them in drivers relying on util code for decoding. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* llvmpipe: don't test for unsupported formats in lp_test_format [Roland Scheidegger, 2015-11-24, files: 1, lines: -0/+12]
| | | | | | | | | | | | | Removing the fake format helpers (1c7d0a6aa4f5cb38af7e281e1e5437cd1a20f781) caused this to fail. These formats were never supported, but previously they would have asserted in the generated jit functions (which, due to lack of test cases for these formats, were never called) whereas we now assert when trying to build the jit function. So, skip them completely. This fixes https://bugs.freedesktop.org/show_bug.cgi?id=93092 Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: add PIPE_CAP_CLEAR_TEXTURE and clear_texture prototype [Ilia Mirkin, 2015-11-11, files: 1, lines: -0/+1]
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* llvmpipe: disable front updates for now [Dave Airlie, 2015-11-08, files: 1, lines: -1/+1]
| | | | As pointed out by Emil, this sometimes hangs, apparently due to threading. We need to rethink how this stuff works for llvmpipe. Signed-off-by: Dave Airlie <[email protected]>
* llvmpipe: disable texture cache [Roland Scheidegger, 2015-11-05, files: 1, lines: -1/+1]
| | | | There are some weird problems with 8-wide vectors.
* llvmpipe: add cache for compressed textures [Roland Scheidegger, 2015-11-04, files: 7, lines: -10/+109]
| | | | Compressed textures are very slow because decoding is rather complex (and because there's no jit code to decode them too for non-technical reasons). Thus, add some texture cache which holds a couple of decoded blocks. Right now this handles only s3tc format albeit it could be extended to work with other formats rather trivially as long as the result of decode fits into 32bit per texel (ideally, rgtc actually would decode to more than 8 bits per channel, but even then making it work for it shouldn't be too difficult). This can improve performance noticeably but don't expect wonders (uncompressed is unsurprisingly still faster). It's also possible it might be slower in some cases (using nearest filtering for example or if there's otherwise not many cache hits, the cache is only direct mapped which isn't great). Also, actual decode of a block relies on util code, thus even though always full blocks are decoded it is done texel by texel - this could obviously benefit greatly from simd-optimized code decoding full blocks at once... Note the cache is per (raster) thread, and currently only used for fragment shaders. Reviewed-by: Jose Fonseca <[email protected]>
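A direct-mapped cache of decoded blocks can be sketched as follows. This is an illustration under stated assumptions, not the llvmpipe implementation: the cache size, the structure names, the block identifier and `fake_decode()` (a stand-in for a real s3tc decoder) are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_ENTRIES 64      /* hypothetical size, must be a power of two */
#define TEXELS_PER_BLOCK 16   /* a 4x4 block, as in s3tc */

struct cache_entry {
   bool valid;
   uint64_t tag;                        /* identifies the source block */
   uint32_t texels[TEXELS_PER_BLOCK];   /* decoded, 32 bits per texel */
};

struct block_cache {
   struct cache_entry entries[CACHE_ENTRIES];
   unsigned hits, misses;
};

/* Stand-in decoder so the sketch is self-contained; not a real s3tc decoder. */
static void
fake_decode(uint64_t block_id, uint32_t *texels)
{
   int i;

   for (i = 0; i < TEXELS_PER_BLOCK; i++)
      texels[i] = (uint32_t)(block_id * 16 + i);
}

/* Fetch one texel, decoding the whole block only on a miss. */
static uint32_t
cache_fetch(struct block_cache *cache, uint64_t block_id, unsigned texel)
{
   /* Direct mapped: each block can live in exactly one slot. */
   struct cache_entry *e = &cache->entries[block_id & (CACHE_ENTRIES - 1)];

   assert(texel < TEXELS_PER_BLOCK);
   if (!e->valid || e->tag != block_id) {
      fake_decode(block_id, e->texels);   /* evicts whatever was in the slot */
      e->tag = block_id;
      e->valid = true;
      cache->misses++;
   } else {
      cache->hits++;
   }
   return e->texels[texel];
}
```

Two blocks whose identifiers map to the same slot evict each other on every alternating access, which is the weakness of a direct-mapped design that the commit message alludes to.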
* llvmpipe: use simple coeffs calc for 128bit vectors [Oded Gabbay, 2015-11-04, files: 1, lines: -1/+6]
| There are currently two methods in llvmpipe code to calculate coeffs to be used as inputs for the fragment shader. The two methods use slightly different ways to do the floating point calculations and thus produce slightly different results.
|
| The decision which method to use is determined by the size of the vector that is used by the platform.
|
| For vectors with size of more than 128bit, a single-step method is used, in which coeffs_init_simple() + attribs_update_simple() are called.
|
| For vectors with size of 128bit or less, a two-step method is used, in which coeffs_init() + attribs_update() are called.
|
| This causes some piglit tests (clip-distance-bulk-copy, interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with 128bit vectors (such as ppc64le or x86-64 without AVX).
|
| This patch makes platforms with 128bit vectors use the single-step method (aka "simple" method) instead of the two-step method. This would make the resulting coeffs identical between more platforms, make the piglit tests pass, and make debugging and maintainability a bit easier as the generated LLVM IR will be the same for more platforms.
|
| The performance impact is negligible for x86-64 without AVX, and basically non-existent for ppc64le, as can be seen from the following benchmarking results:
|
| - glxspheres, on ppc64le:
|   - original code: 4.892745317 frames/sec 5.460303857 Mpixels/sec
|   - with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec
|   - Additional 0.8% performance boost
|
| - glxspheres, on x86-64 without AVX:
|   - original code: 20.16418809 frames/sec 22.50323395 Mpixels/sec
|   - with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec
|   - Additional 0.74% performance boost
|
| - glmark2, on ppc64le:
|   - original code: score of 58
|   - with the patch: score of 57
|
| - glmark2, on x86-64 without AVX:
|   - original code: score of 175
|   - with the patch: score of 167
|   - Impact of -4.5% on performance
|
| - OpenArena, on ppc64le:
|   - original code: 3398 frames 1719.0 seconds 2.0 fps 255.0/505.9/2773.0/0.0 ms
|   - with the patch: 3398 frames 1690.4 seconds 2.0 fps 241.0/497.5/2563.0/0.2 ms
|   - 29 seconds faster with the patch, which is about 2%
|
| - OpenArena, on x86-64 without AVX:
|   - original code: 3398 frames 239.6 seconds 14.2 fps 38.0/70.5/719.0/14.6 ms
|   - with the patch: 3398 frames 244.4 seconds 13.9 fps 38.0/71.9/697.0/14.3 ms
|   - 0.3 fps slower with the patch (about 2%)
|
| Additional details can be found at: http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html
|
| Signed-off-by: Oded Gabbay <[email protected]>
| Reviewed-by: Roland Scheidegger <[email protected]>
* gallium/swrast: fix front buffer blitting. (v2) [Dave Airlie, 2015-10-31, files: 1, lines: -6/+15]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So I've known this was broken before, cogl has a workaround for it from what I know, but with the gallium based swrast drivers BlitFramebuffer from back to front or vice-versa was pretty broken. The legacy swrast driver tracks when a front buffer is used and does the get/put images when it is mapped/unmapped, so this patch attempts to add the same functionality to the gallium drivers. It creates a new context interface to denote when a front buffer is being created, and passes a private pointer to it, this pointer is then used to decide on map/unmap if the contents should be updated from the real frontbuffer using get/put image. This is primarily to make gtk's gl code work, the only thing I've tested so far is the glarea test from https://github.com/ebassi/glarea-example.git v2: bump extension version, check extension version before calling get image. (Ian) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91930 Cc: <[email protected]> Signed-off-by: Dave Airlie <[email protected]>