summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/llvmpipe
Commit message (Collapse)AuthorAgeFilesLines
* llvmpipe: bump 3d and cube map limits to 2048 and 8192 respectivelyRoland Scheidegger2013-06-061-2/+2
| | | | | | | These should just work, required by d3d10. Too large resources will get thrown out separately anyway. Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: improve alignment calculation for fetching/storing pixelsRoland Scheidegger2013-06-051-12/+21
| | | | | | | | | | | | | | | | This was always doing per-pixel alignment which isn't necessary, except for the buffer case (due to the per-element offset). The disabled code for calculating it was incorrect because it assumed that always the full block would be fetched, which may not be the case, so fix this up. The original code failed for instance for r10g10b10a2 the alignment would have been calculated as 4 (block_width) * 4 (bytes) so 16, but the actual fetch may have only fetched 2 values at a time, hence only alignment 8 - it is unclear what exactly would happen in this case (alignment larger than size to fetch). So just use the (already calculated) fetch size instead and get alignment from that which should always work, no matter if fetching 1,2 or 4 pixels. Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: reduce alignment requirement for 1d resources from 4x4 to 4x1Roland Scheidegger2013-06-059-44/+158
| | | | | | | | | | | | | | | | | | | | | | | | | | | | For rendering to buffers, we cannot have any y alignment. So make sure that tile clear commands only clear up to the fb width/height, not more (do this for all resources actually as clearing more seems pointless for other resources too). For the jit fs function, skip execution of the lower half of the fragment shader for the 4x4 stamp completely, for depth/stencil only load/store the values from the first row (replace other row with undef). For the blend function, also only load half the values from fs output, replace the rest with undefs so that everything still operates on the full 4x4 block to keep code the same between 4x1 and 4x4 (except for load/store of course which also needs to skip (store) or replace these values with undefs (load))., at the cost of slightly less optimal code being produced in some cases. Also reduce 1d and 1d array alignment too, because they can be handled the same as buffers so don't need to waste memory. v2: don't try to run special blend code for 4x1, (very) slightly less complexity if we just use the same code as for 4x4 which may or may not make it easier to optimize in the future (as we care a lot more about 4x4 performance than 1d). v2: don't use undef values for unused fs src outputs with llvm 3.1 as it apparently can trigger a bug in llvm. Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: cleanup of generate_unswizzled_blendRoland Scheidegger2013-06-051-22/+37
| | | | | | | | | | | | | Some parameters were used inconsistently, for instance not using block_width/block_height/block_size for deferring number of pixels but rather relying on guesses from the number of fragment shaders etc, so fix this up (no actual change in behavior since the block size stays fixed). (Though most of the code would work with different block_height, with three exceptions, one being the hacked r11g11b10 conversions and twiddle code which only work with block_height 2 not 1, and the last one being blend vector type not being 128bit wide.) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: enhance special sse2 4x4f and 2x8f -> 1x16ub conversionRoland Scheidegger2013-06-051-0/+2
| | | | | | | | | | | | | There's no good reason why it can't handle 2x4f->1x8ub, 1x4f->1x4ub and 1x8f->1x8ub cases, there might be legitimate reasons why we don't have enough input vectors for a full destination vector, and using pack intrinsics should still be much better than using generic conversion (it looks like convert_alpha from the blend code might hit this though I suspect it could be avoided). v2: add another test vector format to lp_test_conv so this gets tested. Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: fix bogus assertions for buffer surfacesRoland Scheidegger2013-06-011-2/+2
| | | | | | | | | | One of the assertion made no sense for buffer rendertargets (due to the union), so drop it. (The same assertion is present already in the path for texture surfaces later.). v2: make assertion completely accurate (suggested by Jose). Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: reduce alignment requirement for resources from 64x64 to 4x4Roland Scheidegger2013-05-317-53/+83
| | | | | | | | | | | | | | | | | | | | | | | | The overallocation was very bad especially for things like 1d array textures which got blown up by a factor of 64. (Even ordinary smallish 2d textures benefit a lot from this, a mipmapped 64x64 rgba8 texture previously used 7*16kB = 112kB instead of now ~22kB.) 4x4 is chosen because this is the size the jit functions run on, so making it smaller is going to be a bit more complicated. It is actually not strictly 4x4 pixel, since we'd want to avoid situations where different threads are rendering to the same cacheline so we keep cacheline size alignment in x direction (often 64bytes). To make this work introduce new task width/height parameters and make sure clears don't clear the whole tile if it's a partial tile. Likewise, the rasterizer may produce fragments outside the 4x4 blocks present in a tile, so don't call the jit function for them. This does not yet fix rendering to buffers (which cannot have any y alignment at all), and 1d/1d array textures are still overallocated by a factor of 4. v2: replace magic number 4 with LP_RASTER_BLOCK_SIZE, fix size of buffers allocated (needed in case we render to them). Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: Remove x/y from cmd_binAdam Jackson2013-05-316-47/+30
| | | | | | | | | | These were mostly just a waste of memory and cache pressure, and were really only used for debugging. This change reduces instruction count (as measured by callgrind's Ir event) of gnome-shell-perf-tool on Ivybridge by 3.5% ± 0.015% (n=20). Signed-off-by: Adam Jackson <[email protected]>
* llvmpipe: clamp scissors to be between 0 and maxZack Rusin2013-05-255-3/+13
| | | | | | | | | | We need to clamp to make sure invalid shader doesn't crash our driver. The spec says to return 0-th index for everything that's out of bounds. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca<[email protected]> Reviewed-by: Brian Paul <[email protected]>
* draw: fixup draw_find_shader_outputZack Rusin2013-05-251-4/+4
| | | | | | | | | | | | | | | | | draw_find_shader_output like most of the code in draw used to depend on position always being at output slot 0. which meant that any other attribute being at 0 could signify an error. unfortunately position can be at any of the output slots, thus other attributes can occupy slot 0 and we need to mark the ones which were not found by something else. This commit changes draw_find_shader_output so that it returns -1 if it can't find the given attribute and adjust the code that depended on it returning >0 whenever it correctly found an attrib. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca<[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: implement support for multiple viewportsZack Rusin2013-05-2511-36/+79
| | | | | | | | | | Largely related to making sure the rasterizer can correctly pick out the correct scissor box for the current viewport. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca<[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: Add support for multiple viewportsZack Rusin2013-05-252-9/+16
| | | | | | | | | | | | Gallium supported only a single viewport/scissor combination. This commit changes the interface to allow us to add support for multiple viewports/scissors. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: José Fonseca<[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: get rid of tiled/linear layout remainsRoland Scheidegger2013-05-296-226/+47
| | | | | | | Eliminate the rest of the no longer needed layout logic. (It is possible some code could be simplified a bit further still.) Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: disable simple_shader optimizationRoland Scheidegger2013-05-221-1/+6
| | | | | | | | | | | | | | This optimization disabled mask checks if the shader is simple enough. While this should work correctly, the problem is that it can hide real issues because shaders in practice are usually complex enough (8 instructions or 1 texture is already enough) so this doesn't get used, whereas dumbed-down tests which should hit all the same code paths suddenly do something quite different. This was the reason that bug 41787 could not be easily tracked as stencil test not working correctly (piglit would in fact have failed some tests without that optimization). So disable it for now, it's unclear if it's much of a win in any case. Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: fix early depth test / late depth write stencil issuesRoland Scheidegger2013-05-221-5/+12
| | | | | | | | | | | | | | We actually did early depth/stencil test and late depth/stencil write even when the shader could kill the fragment (alpha test or discard). Since it matters for the new stencil value if the fragment is killed by depth/stencil test or by the shader (in which case it will not reach the depth/stencil test) this simply cannot work (we also would possibly skip writing the new stencil value due to mask checks but this is a secondary issue). So use late depth test / late depth write instead in this case. (No piglit changes as it doesn't seem to hit such bogus early depth test / late depth write path.) Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: fix issue with not writing new stencil valuesRoland Scheidegger2013-05-222-4/+7
| | | | | | | | | | | | | | | We did mask checks between depth/stencil testing and depth/stencil write. This meant that if the depth/stencil test killed off all fragments we never actually wrote the new stencil value. This issue affected all early/late test/write combinations. So move the mask check after depth/stencil write (for early depth test, could do the same for late depth test but might not be worth it at that point so just skip it there). This addresses https://bugs.freedesktop.org/show_bug.cgi?id=41787. Piglit does not hit this issue because of the simple_shader optimization in generate_fs_loop() which means we're skipping the mask checks. Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: (trivial) remove confusing code in stencil testRoland Scheidegger2013-05-221-16/+11
| | | | | | | | | This was meant to disable some code which isn't needed when depth/stencil isn't written. However, there's more code which wouldn't be needed in that case so having the condition there was just odd (llvm will drop all the code anyway). Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: fix bug in early depth test / late depth write handlingRoland Scheidegger2013-05-221-6/+7
| | | | | | | Using wrong type if the format was less than 32bits. No piglit changes as it doesn't hit that path. Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: enable z32s8x24 formatRoland Scheidegger2013-05-181-6/+0
| | | | | | | | Now that we can handle it both for sampling and as depth/stencil enable it. Passes nearly all additional piglit tests which are now performed, with two exceptions (one being a framebuffer blit which fails for all other formats including stencil too as we don't support stencil blits, the other reporting a unexpected GL error so doesn't look to be llvmpipe's fault).
* llvmpipe: handle z32s8x24 depth/stencil formatRoland Scheidegger2013-05-187-147/+252
| | | | | | | We need to split up the depth and stencil values in this case, and there's some new logic required to handle float depth and stencil simultaneously. Also make sure we get the 64bit zs clear values and masks propagated correctly.
* llvmpipe: get rid of unused tiled/linear logicRoland Scheidegger2013-05-187-713/+50
| | | | | | | | | We do rendering to linear color buffers for quite some time, and since switching to linear depth buffers all the tiled/linear logic was unused. So get rid of (most) of it - there's still some LAYOUT_NONE things and late allocation of resources which probably could be simplified. Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: fix bogus handling of first_layer when setting up texture samplingRoland Scheidegger2013-05-182-14/+18
| | | | | | | | | | | | The code avoided first_layer parameter in the sampler interface (and needing to do another calculation at runtime) by fixing up the base texture pointer instead. Unfortunately, this didn't actually work as we have mip-first texture layout so fixing up the base ptr by a fixed amount is very wrong if there are mipmaps present. The wrong offsets caused misrendering and crashes. Fix this by just adjusting the individual mip level offsets instead. Spotted by Jose. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Eliminate 8.8 fixed point intermediates from AoS sampling path.José Fonseca2013-05-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change was meant as a stepping stone to use PMADDUBSW SSSE3 instruction, but actually this refactoring by itself yields a 10% speedup on texture intensive shaders (e.g, Google Earth's ocean water w/o S3TC on a Ivy Bridge machine), while giving yielding exactly the same results, whereas PMADDUBSW only gave an extra 5%, at the expense of 2bits of precision in the interpolation. I belive that the speedup of this change comes from the reduced register pressure (as 8.8 fixed point intermediates take twice the space of 8bit unorm). Also, not dealing with 8.8 simplifies lp_bld_sample_aos.c code substantially -- it's no longer necessary to have code duplicated for low and high register halfs. Note about lp_build_sample_mipmap(): the path for num_quads > 1 is never executed (as it is faster on AVX to split the 256bit wide texture computation into two 128bit chunks, in order to leverage integer opcodes). This path might be useful in the future, so in order to verify this change did not break that path I had to apply this change: @@ -1662,11 +1662,11 @@ lp_build_sample_soa(struct gallivm_state *gallivm, /* * we only try 8-wide sampling with soa as it appears to * be a loss with aos with AVX (but it should work). * (It should be faster if we'd support avx2) */ - if (num_quads == 1 || !use_aos) { + if (/* num_quads == 1 || ! */ use_aos) { if (num_quads > 1) { if (mip_filter == PIPE_TEX_MIPFILTER_NONE) { LLVMValueRef index0 = lp_build_const_int32(gallivm, 0); /* and then run texfilt mesademo: LP_NATIVE_VECTOR_WIDTH=256 ./texfilt Ran whole piglit without regressions. Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: Temporary workaround to prevent segfault on array textures.José Fonseca2013-05-161-0/+3
|
* draw: try to prevent overflows on index buffersZack Rusin2013-05-141-4/+10
| | | | | | | | | | | | Pass in the size of the index buffer, when available, and use it to handle out of bounds conditions. The behavior in the case of an overflow needs to be the same as with other overflows in the vertex processing pipeline meaning that a vertex should still be generated but all attributes in it set to zero. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* draw: don't crash on vertex buffer overflowZack Rusin2013-05-141-2/+4
| | | | | | | | | | | | | | We would crash when stride was bigger than the size of the buffer. The correct behavior is to just fetch zero's in this case. Unfortunatly with user_buffer's there's no way to validate the size because currently we're just not getting it. Adjust the draw interface to pass the size along the mapped buffer, which works perfectly for buffer backed vertex_buffers and, in future, it will allow us to plumb user_buffer sizes through the same interface. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: add PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE for GLMarek Olšák2013-05-111-0/+2
| | | | | | v2: fix typo 65535 -> 65536 Reviewed-by: Brian Paul <[email protected]>
* gallium: fix type of flags in pipe_context::flush()Chia-I Wu2013-05-041-1/+1
| | | | | | | | | | | | | | | | It should be unsigned, not enum pipe_flush_flags. Fixed a build error: src/gallium/state_trackers/egl/android/native_android.cpp:426:29: error: invalid conversion from 'int' to 'pipe_flush_flags' [-fpermissive] v2: replace all occurrences of enum pipe_flush_flags by unsigned Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Marek Olšák <[email protected]> [olv: document the parameter now that the type is unsigned]
* llvmpipe: get rid of depth swizzling.Roland Scheidegger2013-05-037-273/+414
| | | | | | | | | | | | | | | Eliminating this we no longer need to copy between linear and swizzled layout. This is probably not quite ideal since it's a bit more work for now, could do some optimizations by moving depth testing outside the fragment shader loop (but tricky for early depth test as we don't have neither the mask nor the interpolated z in the right order handy). The large amount of tile/untile code is no longer needed will be deleted in next commit. No piglit regressions. v2: change a forgotten LAYOUT_NONE to LAYOUT_LINEAR. v3: fix (bogus) uninitialized variable warnings, add comments, fix a bad type Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: Fix queries when screen->num_threads == 0.José Fonseca2013-04-291-2/+3
| | | | | | | | | | That is, when llvmpipe is run in single-threaded mode. Trivial. Tested with LP_NUM_THREADS=0 glean --run results --overwrite --quick --tests occluQry
* llvmpipe: stop crashing when one of the so targets is nullZack Rusin2013-04-271-2/+5
| | | | | | | Fixes a crash when one of the so targets is null. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* llvmpipe: implement so_overflow queryZack Rusin2013-04-263-0/+15
| | | | | | Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: replace LP_MAX_THREADS with screen->num_threads in query codeBrian Paul2013-04-261-2/+4
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: bump LP_MAX_THREADS to 16Brian Paul2013-04-261-1/+1
| | | | | | | On the mesa-users list, Burlen Loring reported a speed-up with 16 cores and his test/app. Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: Replace gl_rasterization_rules with lower_left_origin and ↵José Fonseca2013-04-237-38/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | half_pixel_center. Squashed commit of the following: commit 04c5fa2cbb8e89d6f2fa5a75af1cca03b1f6b852 Author: José Fonseca <[email protected]> Date: Tue Apr 23 17:37:18 2013 +0100 gallium: s/lower_left_origin/bottom_edge_rule/ commit 4dff4f64fa83b9737def136fffd161d55e4f1722 Author: José Fonseca <[email protected]> Date: Tue Apr 23 17:35:04 2013 +0100 gallium: Move diagram to docs. commit 442a63012c8c3c3797f45e03f2ca20ad5f399832 Author: James Benton <[email protected]> Date: Fri May 11 17:50:55 2012 +0100 gallium: Replace gl_rasterization_rules with lower_left_origin and half_pixel_center. This change is necessary to achieve correct results when using OpenGL FBOs. Reviewed-by: Marek Olšák <[email protected]>
* llvmpipe: verify function on blend test.José Fonseca2013-04-211-0/+2
|
* llvmpipe: Don't support Z32_FLOAT_S8X24_UINT texture sampling support either.José Fonseca2013-04-201-4/+6
| | | | | | | Because we don't support, and the u_format fallback doesn't work for zs formats. Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: Ignore depth-stencil state if format has no depth/stencil.José Fonseca2013-04-201-4/+10
| | | | | | Prevents assertion failures inside the driver for such state combinations. Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: Take in consideration all current constant buffers when mapping.José Fonseca2013-04-181-3/+9
| | | | | Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Zack Rusin <[email protected]>
* st/mesa: optionally apply texture swizzle to border color v2Christoph Bumiller2013-04-181-0/+2
| | | | | | | | | | | | This is the only sane solution for nv50 and nvc0 (really, trust me), but since on other hardware the border colour is tightly coupled with texture state they'd have to undo the swizzle, so I've added a cap. The dependency of update_sampler on the texture updates was introduced to avoid doing the apply_depthmode to the swizzle twice. v2: Moved swizzling helper to u_format.c, extended the CAP to provide more accurate information.
* llvmpipe: Support half integer pixel center fs coord.José Fonseca2013-04-184-3/+28
| | | | | | | Tested with graw/fs-fragcoord 2/3, and piglit glsl-arb-fragment-coord-conventions. Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: Remove the static interpolation.José Fonseca2013-04-183-384/+19
| | | | | | | | No longer used. If we ever want the old behavior we can run a loop unroller pass. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Drop pos arg from lp_build_tgsi_soa.José Fonseca2013-04-181-2/+2
| | | | | | Never used. Reviewed-by: Roland Scheidegger <[email protected]>
* draw: implement pipeline statistics in the draw moduleZack Rusin2013-04-166-0/+69
| | | | | | | | | | | | | This is a basic implementation of the pipeline statistics in the draw module. The interface is similar to the stream output statistics and also requires that the callers explicitly enable it. Included is the implementation of the interface in llvmpipe and softpipe. Only softpipe enables the pipeline statistics capability though because llvmpipe is lacking gathering of the fragment shading and rasterization statistics. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: implement PIPE_QUERY_SO_STATISTICSZack Rusin2013-04-102-0/+21
| | | | | | | | | | We were missing the implementation of PIPE_QUERY_SO_STATISTICS query, this change implements it on top of the existing facilities. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: Work without sse2 if llvm is new enoughAdam Jackson2013-04-051-2/+3
| | | | | | | | At least on llvm 3.2 this appears to work fine. Tested on an Athlon XP 2600+, which has sse and 3dnow but not sse2. Reviewed-by: Jose Fonseca <[email protected]> Signed-off-by: Adam Jackson <[email protected]>
* draw/llvmpipe: allow independent so attachments to the vsZack Rusin2013-04-032-9/+27
| | | | | | | | | | | | | | When geometry shaders are present, one needs to be able to create an empty geometry shader with stream output that needs to be resolved later and attached to the currently bound vertex shader. Lets add support for it to llvmpipe and draw. draw allows attaching independent stream output info to any vertex shader and llvmpipe resolves at draw time which vertex shader the given empty geometry shader should be linked to. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* llvmpipe: reset so buffers when not appendingZack Rusin2013-04-031-0/+6
| | | | | | | | | We need to reset the internal state of the so buffers or we'll keep appending even though we're not supposed to. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* gallium: add PIPE_CAP_QUERY_PIPELINE_STATISTICSChristoph Bumiller2013-04-031-0/+2
| | | | Reviewed-by: Marek Olšák <[email protected]>
* llvmpipe: use triangle subdivision to avoid fixed-point overflow issuesBrian Paul2013-04-013-0/+186
| | | | | | | | | | | | If we're drawing to a surface that's 2048 x 2048 pixels or larger there's danger of fixed-point overflow in the triangle rasterization code. That leads to various rendering glitches. Rather than implement some intricate changes to the rasterization code, simply subdivide triangles into smaller subtriangles to avoid the issue. Only do this when the drawing surface is larger than 2048 by 2048. Reviewed-by: José Fonseca <[email protected]>