path: root/src/gallium/drivers/nouveau
Commit message | Author | Age | Files | Lines
* nv50/ir: optimize mad/fma with third argument 0 to mul | Karol Herbst | 2016-01-28 | 1 | -0/+21
| Very modest effect, but it's clearly the right thing to do.
|
| total instructions in shared programs : 6131491 -> 6131398 (-0.00%)
| total gprs used in shared programs    : 910157 -> 910131 (-0.00%)
| total local used in shared programs   : 15328 -> 15328 (0.00%)
|
|           local    gpr   inst  bytes
| helped        0     55     85     85
| hurt          0     26     20     20
|
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
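The rewrite described by this commit can be illustrated with a minimal sketch. The `Op`, `Operand`, and `Instr` types and the `foldZeroAddend` function below are hypothetical stand-ins, not the real nv50 IR classes, and the sketch ignores the sign-of-zero corner case that strict IEEE semantics would raise.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified IR; not the real nv50_ir::Instruction class.
enum class Op { Mad, Fma, Mul };

struct Operand {
    bool isImmediate;  // true when the operand is a compile-time constant
    float value;       // only meaningful when isImmediate is true
};

struct Instr {
    Op op;
    std::vector<Operand> srcs;  // mad/fma: dst = srcs[0] * srcs[1] + srcs[2]
};

// Rewrites a mad/fma whose addend is the immediate 0 into a two-source mul.
// Returns true when the instruction was changed.
bool foldZeroAddend(Instr &insn) {
    if (insn.op != Op::Mad && insn.op != Op::Fma)
        return false;
    assert(insn.srcs.size() == 3);
    const Operand addend = insn.srcs[2];  // copy, since we pop it below
    if (!addend.isImmediate || addend.value != 0.0f)
        return false;
    insn.srcs.pop_back();  // drop the zero addend
    insn.op = Op::Mul;
    return true;
}
```

A mul has one source fewer than a mad/fma, so the value that held the zero no longer has to be kept live for this instruction, which may account for the small gpr reduction reported above.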
* nv50/ir: run DCE backwards | Karol Herbst | 2016-01-28 | 1 | -3/+3
| Reduces calls by up to 50%.
|
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
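The message does not say which calls are reduced. One plausible reading is that walking a block from the last instruction to the first lets a single sweep remove a whole chain of dead instructions, so the pass has to be repeated fewer times. The sketch below illustrates that effect on a hypothetical straight-line IR; `Instr`, `sweep`, and `runDce` are illustrative names, not the nv50 implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical straight-line IR: each instruction defines value `def`
// and reads the values listed in `uses`.
struct Instr {
    int def;
    std::vector<int> uses;
    bool hasSideEffect;  // e.g. a store or export; never removed
    bool dead;
};

// One sweep over the block. Marks instructions dead when their result has
// no remaining readers. Returns the number of instructions removed.
int sweep(std::vector<Instr> &block, std::vector<int> &refCount, bool backwards) {
    int removed = 0;
    const int n = static_cast<int>(block.size());
    for (int k = 0; k < n; ++k) {
        Instr &insn = block[backwards ? n - 1 - k : k];
        if (insn.dead || insn.hasSideEffect || refCount[insn.def] > 0)
            continue;
        insn.dead = true;
        for (int v : insn.uses)
            --refCount[v];  // may make an earlier instruction dead
        ++removed;
    }
    return removed;
}

// Repeats sweeps until one removes nothing; returns how many sweeps ran.
int runDce(std::vector<Instr> &block, int numValues, bool backwards) {
    std::vector<int> refCount(numValues, 0);
    for (const Instr &insn : block)
        for (int v : insn.uses)
            ++refCount[v];
    int sweeps = 0;
    do {
        ++sweeps;
    } while (sweep(block, refCount, backwards) > 0);
    return sweeps;
}
```

On a dependency chain whose final result is unused, the backward order kills each instruction before visiting the one that feeds it, while the forward order only removes the last link per sweep.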
* nv50/ir: optimize shl(shr(a, c), c) to and(a, ~((1 << c) - 1)) | Karol Herbst | 2016-01-28 | 1 | -0/+8
| Following shader-db results on GK110:
|
| total instructions in shared programs : 6141510 -> 6131491 (-0.16%)
| total gprs used in shared programs    : 910187 -> 910157 (-0.00%)
| total local used in shared programs   : 15328 -> 15328 (0.00%)
|
|           local    gpr   inst  bytes
| helped        0     18    821    821
| hurt          0      0      0      0
|
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
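The identity behind this rewrite can be checked with a small sketch. The function names `shiftPair` and `maskLowBits` are illustrative only, and the sketch assumes unsigned 32-bit values with a shift amount below 32.

```cpp
#include <cassert>
#include <cstdint>

// Original form: shift right, then left by the same amount.
uint32_t shiftPair(uint32_t a, unsigned c) {
    assert(c < 32);  // shifting a 32-bit value by >= 32 is undefined in C++
    return (a >> c) << c;
}

// Rewritten form: clear the low c bits with a single AND.
uint32_t maskLowBits(uint32_t a, unsigned c) {
    assert(c < 32);
    const uint32_t mask = ~((uint32_t{1} << c) - 1);
    return a & mask;
}
```

Both forms zero the low `c` bits and leave the rest untouched. With a constant `c` the mask is a constant too, so two shifts become one AND.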
* nv50/ir: fix memory corruption when spilling and redoing RA | Karol Herbst | 2016-01-26 | 1 | -0/+3
| When RA fails, and we spill, we have to clean everything up before
| doing RA again. We were forgetting to reset the hi/lo linked lists - at
| least the hi list is guaranteed to still have pointers to now-deleted
| RIG nodes.
|
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
| Cc: [email protected]
* gallium: add GREMEDY_string_marker | Rob Clark | 2016-01-21 | 3 | -0/+3
| Since the GREMEDY extensions are normally only exposed by the gremedy
| debugger (and could possibly trigger debug paths in the app), we don't
| expose the extension by default, but instead only with
| ST_DEBUG=gremedy.
|
| Signed-off-by: Rob Clark <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: 64-bit splitting fixes | Ilia Mirkin | 2016-01-20 | 1 | -1/+2
| Take reading shader outputs into account, and use setFlagsDef for the
| carry since we rely on i->flagsDef being set.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
* gk110/ir: allow carry to be set/read by imad | Ilia Mirkin | 2016-01-20 | 1 | -0/+4
| Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: add carry emission to LOP and IADD | Ilia Mirkin | 2016-01-20 | 1 | -0/+4
| Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: add ATOM and CCTL support | Ilia Mirkin | 2016-01-20 | 1 | -0/+52
| Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: set LD/ST address width bit | Ilia Mirkin | 2016-01-20 | 1 | -0/+2
| Signed-off-by: Ilia Mirkin <[email protected]>
* gk110/ir: fix double-wide vm address | Ilia Mirkin | 2016-01-20 | 1 | -0/+4
|
* gk110/ir: add OP_CCTL handling | Ilia Mirkin | 2016-01-20 | 1 | -0/+35
|
* gk110/ir: add atomic op emission, fix gmem loads | Ilia Mirkin | 2016-01-20 | 1 | -5/+65
| Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: don't flip SHL(ADD) into ADD(SHL) if ADD sources have modifiers | Ilia Mirkin | 2016-01-20 | 1 | -0/+2
| Fixes: 31fde8fa (nv50/ir: flip shl(add, imm) into add(shl, imm))
| Signed-off-by: Ilia Mirkin <[email protected]>
* gk110/ir: fix load from shared memory | Ilia Mirkin | 2016-01-20 | 1 | -1/+1
| It was accidentally using the store opcode.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
* gk110/ir: add partial BAR support | Ilia Mirkin | 2016-01-20 | 1 | -2/+18
| This is enough for the plain TGSI BARRIER implementation.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: swap the least-ref'd source into src1 when both const/imm | Ilia Mirkin | 2016-01-18 | 1 | -10/+15
| The whole point of inlining sources is to reduce loads. We can end up
| in a situation where one value is used a lot of times, and one value is
| used only once per instruction. The once-per-instruction one is the one
| that should get inlined, but with the previous algorithm, it was given
| no preference.
|
| This flips things around to prefer putting less-referenced values into
| src1, which increases the likelihood of them being inlined.
|
| While we're at it, adjust the heuristic to not treat 0 as an immediate,
| as well as (effectively) check for situations where LIMMs can't be
| loaded.
|
| All this yields improvements on nvc0:
|
| total instructions in shared programs : 6261157 -> 6255985 (-0.08%)
| total gprs used in shared programs    : 945082 -> 943417 (-0.18%)
| total local used in shared programs   : 30372 -> 30288 (-0.28%)
| total bytes used in shared programs   : 50089256 -> 50047880 (-0.08%)
|
|           local    gpr   inst  bytes
| helped       21    822   3332   3332
| hurt          0    278    565    565
|
| More importantly, it avoids generating really bad code with SSBOs,
| where we end up checking a lot of different values (usually immediates)
| against the length.
|
| On nv50 we get comparable results, and even improve packing (bytes went
| down more than instructions):
|
| total instructions in shared programs : 6346564 -> 6341277 (-0.08%)
| total gprs used in shared programs    : 728719 -> 725131 (-0.49%)
| total local used in shared programs   : 3552 -> 3552 (0.00%)
| total bytes used in shared programs   : 43995688 -> 43932928 (-0.14%)
|
|           local    gpr   inst  bytes
| helped        0   1380   3252   3774
| hurt          0    287   1710   1365
|
| Signed-off-by: Ilia Mirkin <[email protected]>
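The core of the heuristic described above can be sketched as follows. This is a simplified illustration, not the actual pass: the `Source` record and `preferLeastReferencedInSrc1` are hypothetical names, the instruction is assumed to be commutative, and the additional checks the message mentions (not treating 0 as an immediate, LIMM loadability) are left out.

```cpp
#include <cassert>
#include <utility>

// Hypothetical operand record; refCount is how many instructions read the value.
struct Source {
    bool isConstOrImm;
    int refCount;
};

// For a commutative instruction whose two sources are both const/imm,
// move the less-referenced value into src1. Returns true if swapped.
bool preferLeastReferencedInSrc1(Source &src0, Source &src1) {
    if (!src0.isConstOrImm || !src1.isConstOrImm)
        return false;
    if (src0.refCount >= src1.refCount)
        return false;  // src1 already holds the less-referenced value
    std::swap(src0, src1);
    return true;
}
```

A value that is read many times is worth loading into a register once, while a value read once is the better candidate for inlining into the instruction, which is why the less-referenced one goes into the slot that can be inlined.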
* gm107/ir: don't do indirect frag shader inputs on GM107 | Ilia Mirkin | 2016-01-17 | 1 | -0/+1
| Apparently the IPA op decided to stop working with offsets. Need to
| figure out if we need to do an AL2P situation or something similar. For
| now just turn it back off.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: bsp_bo can't be null | Ilia Mirkin | 2016-01-17 | 1 | -1/+1
| We already deref it earlier. And these are all allocated on load.
| Spotted by Coverity.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: add saturate support on ex2 | Ilia Mirkin | 2016-01-16 | 2 | -0/+6
| Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem space | Ilia Mirkin | 2016-01-14 | 1 | -14/+44
| Reduces local memory usage in a lot of Metro 2033 Redux and a few KSP
| shaders:
|
| total local used in shared programs : 54116 -> 30372 (-43.88%)
|
| Probably a modest advantage to execution, but it's an important
| prerequisite to dropping some of the TGSI optimizations done by the
| state tracker.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
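The rebasing idea can be sketched as a packing step. The `TempArray` record and `assignLocalBases` are hypothetical names, and the assumption that only indirectly accessed arrays are placed in local memory is taken from the adjacent FILE_LOCAL_MEMORY commit, not from the actual code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of a temporary array declaration.
struct TempArray {
    int first;      // first temp index covered by the array
    int last;       // last temp index covered (inclusive)
    bool indirect;  // true if any access uses a dynamic index
};

// Assigns each indirectly accessed array a base slot in local memory,
// packed from 0. Arrays without indirection get -1 (kept in registers).
// Returns the total number of local-memory slots needed.
int assignLocalBases(const std::vector<TempArray> &arrays, std::vector<int> &bases) {
    int next = 0;
    bases.clear();
    for (const TempArray &arr : arrays) {
        assert(arr.last >= arr.first);
        if (!arr.indirect) {
            bases.push_back(-1);
            continue;
        }
        bases.push_back(next);
        next += arr.last - arr.first + 1;
    }
    return next;
}
```

For arrays covering temps 10..13 (indirect), 20..21 (direct), and 40..47 (indirect), packing from 0 needs 12 slots, whereas addressing local memory by the original temp index would need room for indices up to 47.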
* nv50/ir: only use FILE_LOCAL_MEMORY for temp arrays that use indirection | Ilia Mirkin | 2016-01-14 | 1 | -15/+50
| Previously we were treating any indirect temp array usage to mean that
| everything should end up in lmem. The MemoryOpt pass would clean a lot
| of that up later, but in the meanwhile we would lose a lot of
| opportunity for optimization.
|
| This helps a lot of Metro 2033 Redux and a handful of KSP shaders:
|
| total instructions in shared programs : 6288373 -> 6261517 (-0.43%)
| total gprs used in shared programs    : 944051 -> 945131 (0.11%)
| total local used in shared programs   : 54116 -> 54116 (0.00%)
|
| A typical case is for register usage to double and for instructions to
| halve. A future commit can also reduce local memory usage with better
| packing.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: be careful about propagating very large offsets into const load | Ilia Mirkin | 2016-01-14 | 4 | -1/+19
| Indirect constbuf indexing works by using very large offsets. However
| if an indirect constbuf index load is const-propagated, it becomes a
| very large const offset. Take that into account when legalizing the SSA
| by moving the high parts of that offset into the file index.
|
| Also disallow very large (or small) indices on most other instructions.
| This fixes regressions in ubo_array_indexing/*-two-arrays piglit tests.
|
| Fixes: abd326e81b (nv50/ir: propagate indirect loads into instructions)
| Signed-off-by: Ilia Mirkin <[email protected]>
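Moving the high parts of the offset into the file index can be sketched as below. The 16-bit offset field width, the `ConstRef` record, and `legalizeConstOffset` are assumptions made for illustration; the message does not state the actual field width or function names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference to a constant buffer location.
struct ConstRef {
    uint32_t fileIndex;  // which constant buffer
    uint32_t offset;     // byte offset within that buffer
};

// Assumed width of the offset field the instruction can encode.
constexpr uint32_t kOffsetBits = 16;

// Folds the part of the offset that does not fit the encodable field
// into the buffer index, leaving only the low bits as the offset.
ConstRef legalizeConstOffset(ConstRef ref) {
    ref.fileIndex += ref.offset >> kOffsetBits;
    ref.offset &= (uint32_t{1} << kOffsetBits) - 1;
    return ref;
}
```

Under that assumption, a propagated offset of 0x30010 on buffer 0 becomes offset 0x10 on buffer 3, and an offset that already fits is left unchanged.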
* nvc0: allow fragment shader inputs to use indirect indexing | Ilia Mirkin | 2016-01-14 | 1 | -1/+1
| Signed-off-by: Ilia Mirkin <[email protected]>
* gallium/st: add pipe_context::generate_mipmap() | Charmaine Lee | 2016-01-14 | 3 | -0/+3
| This patch adds a new interface to support hardware mipmap generation.
| PIPE_CAP_GENERATE_MIPMAP is added to allow a driver to specify if this
| new interface is supported; if not supported, the state tracker will
| fall back to mipmap generation by rendering/texturing.
|
| v2: add PIPE_CAP_GENERATE_MIPMAP to the disabled section for all drivers
| v3: add format to the generate_mipmap interface to allow mipmap
|     generation using a format other than the resource format
| v4: fix return type of trace_context_generate_mipmap()
|
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Roland Scheidegger <[email protected]>
| Reviewed-by: Jose Fonseca <[email protected]>
* gallium: add PIPE_CAP_INVALIDATE_BUFFER | Nicolai Hähnle | 2016-01-14 | 3 | -0/+3
| It makes sense to re-use pipe->invalidate_resource for the purpose of
| glInvalidateBufferData, but this function is already implemented in vc4
| where it doesn't have the expected behavior. So add a capability flag
| to indicate that the driver supports the expected behavior.
|
| Reviewed-by: Marek Olšák <[email protected]>
* nvc0: do not force re-binding of compute constbufs on Fermi | Samuel Pitoiset | 2016-01-12 | 1 | -1/+1
| Re-binding compute constant buffers after launching a grid has no
| effect because they are not currently validated and because dirty_cp is
| not updated accordingly.
|
| This might also prevent weird behaviour in the future, when UBOs are
| bound for compute.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: remove useless goto in nvc0_launch_grid() | Samuel Pitoiset | 2016-01-12 | 1 | -6/+4
| Trivial.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: the whole point of data array is to hand out regular registers | Ilia Mirkin | 2016-01-11 | 1 | -1/+1
| Fixes: 0d3051f75a (nv50/ir: Fix scratch allocation size and file)
| Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: Fix scratch allocation size and file | Pierre Moreau | 2016-01-09 | 2 | -3/+3
| Signed-off-by: Pierre Moreau <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: use a face sysval to avoid the useless back-and-forth conversion | Ilia Mirkin | 2016-01-08 | 5 | -9/+2
| Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT | Ilia Mirkin | 2016-01-08 | 3 | -0/+3
| Signed-off-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_SHADER_CAP_MAX_SHADER_BUFFERS | Ilia Mirkin | 2016-01-08 | 3 | -0/+4
| Signed-off-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
* tgsi: add ureg support for image decls | Ilia Mirkin | 2016-01-08 | 1 | -3/+9
| Signed-off-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
* gallium: add caps for POSITION and FACE system values | Marek Olšák | 2016-01-08 | 3 | -0/+6
| v2: document the integer behavior
|
| Reviewed-by: Edward O'Callaghan <[email protected]>
| Reviewed-by: Brian Paul <[email protected]>
* nvc0: add ARB_indirect_parameters support | Ilia Mirkin | 2016-01-07 | 5 | -6/+313
| I chose to make separate macros for this due to the additional
| complexity and extra scratch usage.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: add support for real ARB_multi_draw_indirect | Ilia Mirkin | 2016-01-07 | 4 | -18/+47
| The draws are now split up into groups of 32 if there's a non-packed
| stride, or into groups of 400-500 if the draw data is packed.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: adjust indirect draw macros to handle multiple draws at once | Ilia Mirkin | 2016-01-07 | 3 | -52/+101
| These are still invoked one at a time, but the underlying macro can
| handle multiple draws.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add caps to expose support for multi indirect draws | Ilia Mirkin | 2016-01-07 | 3 | -0/+6
| Signed-off-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
* draw: nuke the interp parameter from vertex_info | Roland Scheidegger | 2016-01-07 | 1 | -8/+7
| draw emit couldn't care less what the interpolation mode is...
| This somehow looked like it would matter, and all drivers more or less
| dutifully filled that in correctly. But this is only used for emit; if
| draw needs to know about interpolation mode (for clipping, for
| instance) it will get that information from the vs anyway.
| softpipe actually used to depend on that interpolation parameter, as it
| abused that structure quite a bit, but no longer does.
|
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Edward O'Callaghan <[email protected]>
* nvc0: add support for st/va | Julien Isorce | 2016-01-05 | 3 | -50/+133
| - split nvc0_decoder_bsp in begin/next/end
| - preserve content buffer when calling nvc0_decoder_bsp_next
| - implement pipe_video_codec::begin_frame/end_frame
|
| https://bugs.freedesktop.org/show_bug.cgi?id=89969
|
| Signed-off-by: Julien Isorce <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
* nouveau: split nouveau_vp3_bsp in begin/next/end | Julien Isorce | 2016-01-05 | 4 | -41/+77
| This allows calling nouveau_vp3_bsp_next multiple times between one
| begin/end pair.
|
| It is required to support st/va.
|
| https://bugs.freedesktop.org/show_bug.cgi?id=89969
|
| Signed-off-by: Julien Isorce <[email protected]>
| [imirkin: create strparm_bsp function, simplified w0 calculation]
| Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: scale up inter_bo size so that it's 16M for a 4K video | Ilia Mirkin | 2016-01-04 | 1 | -2/+5
| Experimentally, 4M causes corruption and slowness; try to ramp it up
| with size instead.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
| Cc: "11.0 11.1" <[email protected]>
* nv50,nvc0: fix crash when increasing bsp bo size for h264 | Ilia Mirkin | 2016-01-04 | 2 | -4/+4
| H264 doesn't have a bitplane bo. We just need a device reference, so
| use the one from the client.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
| Cc: "11.0 11.1" <[email protected]>
* nvc0/ir: add support for PK2H/UP2H | Ilia Mirkin | 2016-01-03 | 4 | -2/+28
| Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add PIPE_CAP_TGSI_PACK_HALF_FLOAT to indicate UP2H/PK2H support | Ilia Mirkin | 2016-01-03 | 3 | -0/+3
| Signed-off-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Edward O'Callaghan <[email protected]>
| Reviewed-by: Roland Scheidegger <[email protected]>
* nv50,nvc0: optimize coherent buffer checking at draw time | Samuel Pitoiset | 2016-01-03 | 6 | -68/+82
| Instead of iterating over all the buffer resources looking for coherent
| buffers, we keep track of a context-wide count. This will save some
| iterations (and CPU cycles) in 99.99% of cases, because coherent
| buffers are rarely used.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
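The counting approach can be sketched in a few lines. The `Buffer` and `Context` types and their member names are hypothetical and only illustrate replacing a per-draw scan with a counter maintained at bind and unbind time.

```cpp
#include <cassert>

// Hypothetical buffer resource; `coherent` marks a coherently mapped buffer.
struct Buffer {
    bool coherent;
};

// Hypothetical context that tracks how many bound buffers are coherent.
class Context {
public:
    void bind(const Buffer &buf) {
        if (buf.coherent)
            ++coherentCount_;
    }

    void unbind(const Buffer &buf) {
        if (buf.coherent) {
            assert(coherentCount_ > 0);
            --coherentCount_;
        }
    }

    // Draw-time check: O(1) instead of a walk over every bound buffer.
    bool hasCoherentBuffers() const { return coherentCount_ > 0; }

private:
    int coherentCount_ = 0;
};
```

The cost moves from every draw call to the much rarer bind and unbind operations, and the common case of no coherent buffers reduces to a single integer comparison.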
* nv50,nvc0: make sure there's pushbuf space and that we ref the bo early | Ilia Mirkin | 2016-01-01 | 4 | -6/+5
| First off, we can't flush in the middle of a command. Secondly,
| requesting the extra push space might cause a flush to happen. If that
| flush happens, we'd have to do the PUSH_REFN again. So instead do
| PUSH_REFN after the push space request.
|
| This helps avoid rare crashes with supertuxkart in libdrm due to
| assertion failures.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
| Cc: "11.0 11.1" <[email protected]>
* nvc0: Set winding order regardless of domain. | Kenneth Graunke | 2015-12-30 | 1 | -2/+4
| Quads need to respect winding order, too - not just triangles.
|
| Fixes rendering in GFXBench 4.0's tessellation benchmark.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
| Cc: "11.0 11.1" <[email protected]>
* nvc0: add ARB_shader_draw_parameters support | Ilia Mirkin | 2015-12-30 | 13 | -15/+73
| Signed-off-by: Ilia Mirkin <[email protected]>