path: root/src/gallium/drivers
Commit message | Author | Age | Files | Lines
* nv50/ir: add saturate support on ex2 | Ilia Mirkin | 2016-01-16 | 2 | -0/+6
    Signed-off-by: Ilia Mirkin <[email protected]>
* llvmpipe: ditch additional ref counting for vertex/geometry sampler views | Roland Scheidegger | 2016-01-15 | 4 | -46/+12
    The cleaning up was quite a performance hog (making pipe_resource_reference the number two in profilers on the vertex path, and 3rd overall, with its cousin pipe_reference_described not far behind) if there were lots of tiny draw calls (ipers).

    Now the reason was really that it was blindly calling this for all potential shader views (so 32 each for vs and gs) even though the app never touched a single one, which could have been fixed. However, I can't come up with a good reason why we refcount these. We've got references, of course, in the sampler views, which should be quite sufficient as we do all vertex and geometry shader execution fully synchronously.

    (Calling prepare_shader_sampling for all draw calls even if there were no changes looks quite suboptimal too, but generally we don't really expect vs/gs shader sampling to be used much with llvmpipe, and there's even an early exit if there aren't any views to avoid the "null loop", albeit it's now no longer always trying to loop through all 32 slots. Maybe improve another time...)

    Of course, if we manage to make vertex loads run asynchronously some day, we need references again, but adding that back would be the least of the problems...

    Also only set LP_NEW_SAMPLER_VIEW for fragment sampler views. Nothing on the vertex side depends on it (I suppose we'd really wanted a separate flag in any case).

    (Good for a 3% improvement or so in ipers under the right conditions.)

    Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: fix "leaking" textures | Roland Scheidegger | 2016-01-15 | 2 | -2/+9
    This was not really a leak per se, but we were referencing the textures for longer than intended. If textures were set via llvmpipe_set_sampler_views() (for fs) and then picked up by lp_setup_set_fragment_sampler_views(), they were referenced in the setup state. However, the only way to unreference them was by replacing them with another texture, and not when the texture slot was replaced with a NULL sampler view. (They were then further also referenced by the scene too, which might have additional minor side effects as we limit the memory size which is allowed to be referenced by a scene in a rather crude way.) Only setup destruction (at context destruction time) then finally would get rid of the references.

    Fix this by noting the number of textures the last time, and unreference things if the new view is NULL (avoiding having to unreference things always up to PIPE_MAX_SHADER_SAMPLER_VIEWS, which would also have worked).

    Found by code inspection, no test...

    v2: rename var

    Reviewed-by: Jose Fonseca <[email protected]>
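The fix described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `struct tex`, `struct setup_state`, `tex_reference()` and `set_views()` are hypothetical stand-ins, not the actual llvmpipe types or functions, and `MAX_VIEWS` merely plays the role of PIPE_MAX_SHADER_SAMPLER_VIEWS.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VIEWS 32  /* stand-in for PIPE_MAX_SHADER_SAMPLER_VIEWS */

/* Hypothetical refcounted texture; not the real pipe_resource. */
struct tex { int refcount; };

struct setup_state {
   struct tex *current[MAX_VIEWS];
   unsigned num_current;   /* slot count seen on the previous call */
};

/* Point *slot at tex, adjusting reference counts on both sides. */
static void tex_reference(struct tex **slot, struct tex *tex)
{
   if (tex)
      tex->refcount++;
   if (*slot)
      (*slot)->refcount--;
   *slot = tex;
}

/* Bind num views. A NULL entry, or a slot past num that was in use
 * on the previous call, drops the reference held by the setup state. */
static void set_views(struct setup_state *s, unsigned num, struct tex **views)
{
   unsigned max = num > s->num_current ? num : s->num_current;
   assert(num <= MAX_VIEWS);
   for (unsigned i = 0; i < max; i++)
      tex_reference(&s->current[i], i < num ? views[i] : NULL);
   s->num_current = num;
}
```

Remembering the previous slot count keeps the loop short in the common case, instead of always walking all `MAX_VIEWS` slots.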
* nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem space | Ilia Mirkin | 2016-01-14 | 1 | -14/+44
    Reduces local memory usage in a lot of Metro 2033 Redux and a few KSP shaders:

    total local used in shared programs : 54116 -> 30372 (-43.88%)

    Probably modest advantage to execution, but it's an important prerequisite to dropping some of the TGSI optimizations done by the state tracker.

    Signed-off-by: Ilia Mirkin <[email protected]>
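A minimal sketch of the rebasing idea, assuming each temp array is described by a register range and a flag saying whether it is ever indexed indirectly. The `struct temp_array` layout and both helper functions are hypothetical and are not taken from the nv50/ir sources.

```c
#include <assert.h>

/* Hypothetical description of one temp array: a register range plus
 * whether any access to it uses an indirect address. */
struct temp_array {
   unsigned first, last;   /* inclusive register range */
   int indirect;
   unsigned lmem_base;     /* output: first lmem slot, if indirect */
};

/* Give each indirectly addressed array a compact base starting at 0.
 * Returns the total number of lmem slots needed. */
static unsigned rebase_arrays(struct temp_array *arrays, unsigned count)
{
   unsigned next = 0;
   for (unsigned i = 0; i < count; i++) {
      if (!arrays[i].indirect)
         continue;          /* stays in regular registers */
      arrays[i].lmem_base = next;
      next += arrays[i].last - arrays[i].first + 1;
   }
   return next;
}

/* Map a register of an indirect array to its rebased lmem slot. */
static unsigned lmem_slot(const struct temp_array *a, unsigned reg)
{
   assert(a->indirect && reg >= a->first && reg <= a->last);
   return a->lmem_base + (reg - a->first);
}
```

With arrays covering registers 0..9, 10..13 and 14..20, where only the middle one is indexed indirectly, this sketch needs 4 slots rather than the 14 that addressing by the original register index would require.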
* nv50/ir: only use FILE_LOCAL_MEMORY for temp arrays that use indirection | Ilia Mirkin | 2016-01-14 | 1 | -15/+50
    Previously we were treating any indirect temp array usage to mean that everything should end up in lmem. The MemoryOpt pass would clean a lot of that up later, but in the meantime we would lose a lot of opportunity for optimization.

    This helps a lot of Metro 2033 Redux and a handful of KSP shaders:

    total instructions in shared programs : 6288373 -> 6261517 (-0.43%)
    total gprs used in shared programs    : 944051 -> 945131 (0.11%)
    total local used in shared programs   : 54116 -> 54116 (0.00%)

    A typical case is for register usage to double and for instructions to halve. A future commit can also reduce local memory usage with better packing.

    Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: be careful about propagating very large offsets into const load | Ilia Mirkin | 2016-01-14 | 4 | -1/+19
    Indirect constbuf indexing works by using very large offsets. However if an indirect constbuf index load is const-propagated, it becomes a very large const offset. Take that into account when legalizing the SSA by moving the high parts of that offset into the file index.

    Also disallow very large (or small) indices on most other instructions.

    This fixes regressions in ubo_array_indexing/*-two-arrays piglit tests.

    Fixes: abd326e81b (nv50/ir: propagate indirect loads into instructions)
    Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: allow fragment shader inputs to use indirect indexing | Ilia Mirkin | 2016-01-14 | 1 | -1/+1
    Signed-off-by: Ilia Mirkin <[email protected]>
* radeonsi: don't miss changes to SPI_TMPRING_SIZE | Marek Olšák | 2016-01-14 | 1 | -2/+7
    I'm not sure about the consequences of this bug, but it's definitely dangerous. This applies to SI, CIK, VI.

    Cc: 11.0 11.1 <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* svga: add DXGenMips command support | Charmaine Lee | 2016-01-14 | 10 | -26/+144
    For those formats that support hw mipmap generation, use the DXGenMips command. Otherwise fall back to the mipmap generation utility.

    Tested with piglit, OpenGL apps (Heaven, Turbine, Cinebench)

    v2: make sure the texture surface was created with the render target bind flag
        set relocation flag to SVGA_RELOC_WRITE for the texture surface

    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Jose Fonseca <[email protected]>
* svga: add num-generate-mipmap HUD query | Charmaine Lee | 2016-01-14 | 3 | -1/+12
    The actual increment of the num-generate-mipmap counter will be done in a subsequent patch when hw generate mipmap is supported.

    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Jose Fonseca <[email protected]>
* gallium/st: add pipe_context::generate_mipmap() | Charmaine Lee | 2016-01-14 | 15 | -0/+51
    This patch adds a new interface to support hardware mipmap generation. PIPE_CAP_GENERATE_MIPMAP is added to allow a driver to specify if this new interface is supported; if not supported, the state tracker will fall back to mipmap generation by rendering/texturing.

    v2: add PIPE_CAP_GENERATE_MIPMAP to the disabled section for all drivers
    v3: add format to the generate_mipmap interface to allow mipmap generation using a format other than the resource format
    v4: fix return type of trace_context_generate_mipmap()

    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Roland Scheidegger <[email protected]>
    Reviewed-by: Jose Fonseca <[email protected]>
* gallium/radeon: do not reallocate user memory buffers | Nicolai Hähnle | 2016-01-14 | 2 | -8/+31
    The whole point of AMD_pinned_memory is that applications don't have to map buffers via OpenGL - but they're still allowed to, so make sure we don't break the link between buffer object and user memory unless explicitly instructed to.

    Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: implement PIPE_CAP_INVALIDATE_BUFFER | Nicolai Hähnle | 2016-01-14 | 5 | -9/+22
    Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: reset valid_buffer_range on PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE | Nicolai Hähnle | 2016-01-14 | 1 | -0/+3
    This accommodates a streaming pattern where the discard flag is set when the application wraps back to the beginning of the buffer.

    Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_CAP_INVALIDATE_BUFFER | Nicolai Hähnle | 2016-01-14 | 14 | -0/+14
    It makes sense to re-use pipe->invalidate_resource for the purpose of glInvalidateBufferData, but this function is already implemented in vc4 where it doesn't have the expected behavior. So add a capability flag to indicate that the driver supports the expected behavior.

    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: move POSITION and FACE fragment shader inputs to system values | Marek Olšák | 2016-01-13 | 3 | -45/+25
    And FACE becomes integer instead of float.

    Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: simplify gl_FragCoord behavior | Marek Olšák | 2016-01-13 | 1 | -23/+22
    It will become a system value, not an input.

    Reviewed-by: Edward O'Callaghan <[email protected]>
* llvmpipe: (trivial) use cast wrapper for __m128d to __m128 casts | Roland Scheidegger | 2016-01-13 | 1 | -2/+2
    Some compiler was unhappy.
* llvmpipe: avoid most 64 bit math in rasterization | Roland Scheidegger | 2016-01-13 | 2 | -65/+143
    The trick here is to recognize that in the c + n * dcdx calculations, not only can the lower FIXED_ORDER bits not change (as the dcdx values have those all zero) but that this means the sign bit of the calculations cannot be different as well, that is sign(c + n*dcdx) == sign((c >> FIXED_ORDER) + n*(dcdx >> FIXED_ORDER)). That shaves off more than enough bits to never require 64bit masks.

    A shifted plane c value could still easily exceed 32 bits, however since we throw out planes which are trivial accept even before binning (and similarly don't even get to see tris for which there was a trivial reject plane) this is never a problem.

    The idea isn't all that revolutionary, in fact something similar was tried ages ago (9773722c2b09d5f0615a47cecf4347859474dc56) back when the values were only 32 bit anyway. I believe now it didn't quite work then because the adjustment needed for testing trivial reject / partial masks wasn't handled correctly.

    This still keeps the separate 32/64 bit paths for now, as the 32 bit one still looks minimally simpler (and also because if we'd pass in dcdx/dcdy/eo unscaled from setup, which would be a good reason to ditch the 32 bit path, we'd need to change the special-purpose rasterization functions for small tris).

    This passes piglit triangle-rasterization (-fbo -auto -max_size -subpixelbits 8) and triangle-rasterization-overdraw (with some hacks to make it work correctly with large sizes) easily (full piglit as well of course, but most tests wouldn't use triangles large enough to be affected, that is tris with a bounding box over 128x128).

    The profiler says indeed time spent in rast_tri functions is reduced substantially, BUT of course only if the tris are large. I measured a 3% improvement in mesa gloss demo when supersized to twice the screen size...

    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Jose Fonseca <[email protected]>
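The sign identity quoted above can be checked with a small standalone sketch. It assumes FIXED_ORDER is 8 and that the shifted values fit in 32 bits; the function names are made up for illustration and do not correspond to the llvmpipe rasterizer code.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_ORDER 8   /* assumed number of subpixel bits */

/* Arithmetic (floor) shift right that does not rely on
 * implementation-defined behaviour for negative values. */
static int64_t asr64(int64_t x, unsigned s)
{
   return x >= 0 ? x >> s : ~((~x) >> s);
}

/* Reference: sign test on the full-precision 64-bit value. */
static int is_negative_64(int64_t c, int64_t dcdx, int n)
{
   return c + n * dcdx < 0;
}

/* Same test on pre-shifted values, assumed here to fit in 32 bits.
 * Only valid when the low FIXED_ORDER bits of dcdx are all zero. */
static int is_negative_32(int64_t c, int64_t dcdx, int n)
{
   int32_t cs = (int32_t)asr64(c, FIXED_ORDER);
   int32_t ds = (int32_t)asr64(dcdx, FIXED_ORDER);
   assert((dcdx & ((1 << FIXED_ORDER) - 1)) == 0);
   return cs + n * ds < 0;
}
```

The reasoning: write c as ch * 2^FIXED_ORDER + cl with 0 <= cl < 2^FIXED_ORDER. Since n * dcdx is a multiple of 2^FIXED_ORDER, the low part cl is carried through unchanged and is too small to flip the sign decided by the high parts.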
* llvmpipe: scale up bounding box planes to subpixel precision | Roland Scheidegger | 2016-01-13 | 3 | -30/+30
    Otherwise some planes we get in rasterization have subpixel precision, others not. Doesn't matter so far, but will soon. (OpenGL actually supports viewports with subpixel accuracy, so could even do bounding box calcs with that).

    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: add sse code for fixed position calculation | Roland Scheidegger | 2016-01-13 | 1 | -8/+50
    This is quite a few fewer instructions, although it still does the two 64bit muls with scalar C code (they'd need way more shuffles, plus fixup for the signed mul, so it totally doesn't seem worth it; x86 can do 32x32->64bit signed scalar muls natively just fine after all, even on 32bit).

    (This still doesn't have a very measurable performance impact in reality, although profiler seems to say time spent in setup indeed has gone down by 10% or so overall. Maybe good for a 3% or so improvement in openarena.)

    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Jose Fonseca <[email protected]>
* nvc0: do not force re-binding of compute constbufs on Fermi | Samuel Pitoiset | 2016-01-12 | 1 | -1/+1
    Re-binding compute constant buffers after launching a grid has no effect because they are not currently validated and because dirty_cp is not updated accordingly.

    This might also prevent weird future behaviours when UBOs are bound for compute.

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: remove useless goto in nvc0_launch_grid() | Samuel Pitoiset | 2016-01-12 | 1 | -6/+4
    Trivial.

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: the whole point of data array is to hand out regular registers | Ilia Mirkin | 2016-01-11 | 1 | -1/+1
    Fixes: 0d3051f75a (nv50/ir: Fix scratch allocation size and file)
    Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: Fix scratch allocation size and file | Pierre Moreau | 2016-01-09 | 2 | -3/+3
    Signed-off-by: Pierre Moreau <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: use a face sysval to avoid the useless back-and-forth conversion | Ilia Mirkin | 2016-01-08 | 5 | -9/+2
    Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: add ir3_compiler to gitignore | Ilia Mirkin | 2016-01-08 | 1 | -0/+1
    Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT | Ilia Mirkin | 2016-01-08 | 14 | -13/+27
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_SHADER_CAP_MAX_SHADER_BUFFERS | Ilia Mirkin | 2016-01-08 | 9 | -0/+16
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* tgsi: add ureg support for image decls | Ilia Mirkin | 2016-01-08 | 3 | -7/+15
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* util/pstipple: allow fragment shader POSITION to be a system value | Marek Olšák | 2016-01-08 | 4 | -4/+8
    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
* gallium: add caps for POSITION and FACE system values | Marek Olšák | 2016-01-08 | 14 | -0/+28
    v2: document the integer behavior

    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
* radeon, si: Use TGSI chan name defines in lp_build_emit_fetch() calls | Edward O'Callaghan | 2016-01-08 | 2 | -8/+8
    Signed-off-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* nvc0: add ARB_indirect_parameters support | Ilia Mirkin | 2016-01-07 | 5 | -6/+313
    I chose to make separate macros for this due to the additional complexity and extra scratch usage.

    Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: add support for real ARB_multi_draw_indirect | Ilia Mirkin | 2016-01-07 | 4 | -18/+47
    The draw groups are now split up into groups of 32 if there's a non-packed stride, or in groups of 400-500 if the draw data is packed.

    Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: adjust indirect draw macros to handle multiple draws at once | Ilia Mirkin | 2016-01-07 | 3 | -52/+101
    These are still invoked one at a time, but the underlying macro can handle multiple draws.

    Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add caps to expose support for multi indirect draws | Ilia Mirkin | 2016-01-07 | 14 | -0/+28
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* llvmpipe: do 64bit plane calculations in the sse path | Roland Scheidegger | 2016-01-08 | 2 | -50/+70
    The sse path was pretty much disabled for practical purposes because the largest allowed fb size was 128x128. So, adapt it for 64bit plane calculations. This is actually not that difficult, though a problem is that we can't do a signed 32x32->64bit mul, only unsigned, so need to fix that up. Overall, the code still looks reasonable, though it's not like changes there in setup really make much of a difference in the end...

    Reviewed-by: Jose Fonseca <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
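The kind of fixup mentioned above, deriving a signed 32x32->64bit product from an unsigned one, can be sketched in scalar C as below. This illustrates the arithmetic only and is not the SSE code from the patch; `mul_u32()` and `mul_s32_via_u32()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 32x32->64 multiply, standing in for what the SIMD
 * instruction provides. */
static uint64_t mul_u32(uint32_t a, uint32_t b)
{
   return (uint64_t)a * b;
}

/* Signed 32x32->64 product built from the unsigned one.
 * Reinterpreting a negative a as unsigned adds 2^32 to it, so the
 * unsigned product is too large by b * 2^32 (and likewise for b).
 * All arithmetic is modulo 2^64, so using the unsigned bit pattern
 * of the other operand in the correction is equivalent. */
static int64_t mul_s32_via_u32(int32_t a, int32_t b)
{
   uint64_t p = mul_u32((uint32_t)a, (uint32_t)b);
   if (a < 0)
      p -= (uint64_t)(uint32_t)b << 32;
   if (b < 0)
      p -= (uint64_t)(uint32_t)a << 32;
   /* Assumes the usual two's complement conversion to int64_t. */
   return (int64_t)p;
}
```

In a vectorized version the two conditional subtractions would presumably be done with sign masks rather than branches, which is where the extra instructions come from.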
* llvmpipe: don't store eo as 64bit int | Roland Scheidegger | 2016-01-08 | 4 | -11/+16
    eo, just like dcdx and dcdy, cannot overflow 32bit. Store it as unsigned though just in case (it cannot be negative, but in theory twice as big as dcdx or dcdy so this gives it one more bit). This doesn't really change anything, albeit it might help minimally on 32bit archs.

    Reviewed-by: Jose Fonseca <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: use aligned data for the assembly program in setup | Roland Scheidegger | 2016-01-08 | 1 | -17/+21
    Back in the day (before 24678700edaf5bb9da9be93a1367f1a24cfaa471) the values were not actually in a struct but even then I can't see why we didn't simply align the values. Especially since it's trivial to do so. (Not that it actually matters since the code is pretty much unused for now.)

    Reviewed-by: Oded Gabbay <[email protected]>
* radeonsi: adjust the parameters of si_shader_dump | Marek Olšák | 2016-01-07 | 3 | -20/+11
    The function will be extended to dump all binaries shaders will consist of, so si_shader* makes sense here.

    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move si_shader_dump call out of si_compile_llvm | Marek Olšák | 2016-01-07 | 2 | -2/+11
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: inline si_shader_binary_read | Marek Olšák | 2016-01-07 | 3 | -11/+3
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move si_shader_dump call out of si_shader_binary_read | Marek Olšák | 2016-01-07 | 3 | -20/+21
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: separate shader dumping code to si_shader_dump and *_dump_stats | Marek Olšák | 2016-01-07 | 1 | -12/+30
    Eventually, I'd like to dump stats for several combined binaries, which is why you don't see a binary parameter in si_shader_dump_stats.

    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add si_shader_destroy_binary | Marek Olšák | 2016-01-07 | 2 | -5/+10
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't pass si_shader to si_compile_llvm | Marek Olšák | 2016-01-07 | 3 | -18/+28
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move si_shader_binary_upload out of si_compile_llvm | Marek Olšák | 2016-01-07 | 2 | -4/+9
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: always keep shader code, rodata, and relocs in memory | Marek Olšák | 2016-01-07 | 1 | -7/+3
    We won't compile shaders in draw calls, but we will concatenate shader binaries according to states in draw calls, so keep the binaries.

    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't pass si_shader to si_shader_binary_read | Marek Olšák | 2016-01-07 | 3 | -14/+19
    Reviewed-by: Nicolai Hähnle <[email protected]>