summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* freedreno: threaded batch flushRob Clark2016-07-309-26/+99
| | | | | | | | | | | | | | | With the state accessed from GMEM+submit factored out of fd_context and into fd_batch, now it is possible to punt this off to a helper thread. And more importantly, since there are cases where one context might force the batch-cache to flush another context's batches (ie. when there are too many in-flight batches), using a per-context helper thread keeps various different flushes for a given context serialized. TODO as with batch-cache, there are a few places where we'll need a mutex to protect critical sections, which is completely missing at the moment. Signed-off-by: Rob Clark <[email protected]>
* freedreno: track batch/blit typesRob Clark2016-07-305-24/+52
| | | | | | | | | | | | Add a bit of extra book-keeping about blits and back-blits (from resource shadowing). If the app uploads all mipmap levels, as opposed to uploading the first level and then glGenerateMipmap(), we can discard the back-blit (as opposed to being naive and shadowing the resource for each mipmap level). Also, after a normal blit, we might as well flush the batch immediately, since there is not likely to be further rendering to the surface. Signed-off-by: Rob Clark <[email protected]>
* freedreno: re-order support for hw queriesRob Clark2016-07-3019-264/+288
| | | | | | | | | | | Push query state down to batch, and use the resource tracking to figure out which batch(es) need to be flushed to get the query result. This means we actually need to allocate the prsc up front, before we know the size. So we have to add a special way to allocate an un- backed resource, and then later allocate the backing storage. Signed-off-by: Rob Clark <[email protected]>
* freedreno: use prsc for hw queriesRob Clark2016-07-303-35/+45
| | | | | | | | | Switch to using a pipe_resource (rather than an fd_bo directly) for hw query result buffers. This is first step towards making queries work properly with reordered batches, since we'll need the additional dependency tracking to know which batches to flush. Signed-off-by: Rob Clark <[email protected]>
* freedreno: support discarding previous rendering in special casesRob Clark2016-07-303-5/+32
| | | | | | | | | | Basically, to "DCE" blits triggered by resource shadowing, in cases where the levels are immediately completely overwritten. For example, mid-frame texture upload to level zero triggers shadowing and back-blits to the remaining levels, which are immediately overwritten by glGenerateMipmap(). Signed-off-by: Rob Clark <[email protected]>
* freedreno: shadow textures if possible to avoid stall/flushRob Clark2016-07-303-11/+211
| | | | | | | | | | | | | | To make batch re-ordering useful, we need to be able to create shadow resources to avoid a flush/stall in transfer_map(). For example, uploading new texture contents or updating a UBO mid-batch. In these cases, we want to clone the buffer, and update the new buffer, leaving the old buffer (whose reference is held by cmdstream) as a shadow. This is done by blitting the remaining other levels (and whatever part of current level that is not discarded) from the old/shadow buffer to the new one. Signed-off-by: Rob Clark <[email protected]>
* freedreno: spiff up some debug tracesRob Clark2016-07-306-6/+18
| | | | | | | Make it easier to track batches, to ensure things happen properly when they are reordered. Signed-off-by: Rob Clark <[email protected]>
* freedreno: add batch-cache and batch reorderingRob Clark2016-07-3015-111/+760
| | | | | | | | | | | | Note that I originally also had a entry-point that would construct a key and do lookup from a pipe_surface. I ended up not needing that (yet?) but it is easy-enough to re-introduce later if we need it for the blit path. For now, not enabled by default, but can be enabled (on a3xx/a4xx) with FD_MESA_DEBUG=reorder. Signed-off-by: Rob Clark <[email protected]>
* freedreno: move more batch related tracking to fd_batchRob Clark2016-07-3023-398/+420
| | | | | | | | | | | | | | | | To flush batches out of order, the gmem code needs to not depend on state from fd_context (since that may apply to a more recent batch). So this all moves into batch. The one exception is the gmem/pipe/tile state itself. But this is only used from gmem code (and batches are flushed serially). The alternative would be having to re-calculate GMEM layout on every batch, even if the dimensions of the render targets are the same. Note: This opens up the possibility of pushing gmem/submit into a helper thread. Signed-off-by: Rob Clark <[email protected]>
* freedreno: dynamically sized/growable cmd buffersRob Clark2016-07-302-23/+33
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: push resource tracking down into batchRob Clark2016-07-307-42/+51
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: introduce fd_batchRob Clark2016-07-3020-177/+252
| | | | | | | | | | | | | | | | | | | Introduce the batch object, to track a batch/submit's worth of ringbuffers and other bookkeeping. In this first step, just move the ringbuffers into batch, since that is mostly uninteresting churn. For now there is just a single batch at a time. Note that one outcome of this change is that rb's are allocated/freed on each use. But the expectation is that the bo pool in libdrm_freedreno will save us the GEM bo alloc/free which was the initial reason to implement a rb pool in gallium. The purpose of the batch is to eventually facilitate out-of-order rendering, with batches associated to framebuffer state, and tracking the dependencies on other batches. Signed-off-by: Rob Clark <[email protected]>
* freedreno: limit non-user constant buffers to a4xxRob Clark2016-07-291-1/+1
| | | | | | | Seems to mostly work on a3xx. Except when it doesn't and kills gpu quite badly. Signed-off-by: Rob Clark <[email protected]>
* vc4: automake: remove vc4_drm.h from the sources listsEmil Velikov2016-07-281-1/+0
| | | | | | | | | The file was removed with earlier commit breaking 'make dist'. Drop it from Makefile.sources since it's no longer around. Fixes: 16985eb308e ("vc4: Switch to using the libdrm-provided vc4_drm.h.") Signed-off-by: Emil Velikov <[email protected]>
* ddebug: use pclose to close a popen()'d FILENicolai Hähnle2016-07-281-1/+1
| | | | | | Found by Coverity. Reviewed-by: Marek Olšák <[email protected]>
* nvc0: enable ARB_tessellation_shader on GM107+Samuel Pitoiset2016-07-271-3/+0
| | | | | | | This exposes OpenGL 4.1 on Maxwell (tested on GM107 and GM206). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ir: add a legalize SSA pass for PFETCHSamuel Pitoiset2016-07-274-2/+43
| | | | | | | PFETCH, actually ISBERD on GM107+ ISA only accepts a GPR for src0. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix up TCP header on GM107+Samuel Pitoiset2016-07-271-0/+9
| | | | | | | | | | The number of outputs patch (limited to 255) has moved in the TCP header, but blob seems to also set the old position. Also, the high 8-bits are now located inbetween the min/max parallel output read address at position 20. Signed-off-by: Samuel Pitoiset <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* radeon/llvm: Use alloca instructions for larger arrays [revert a revert]Marek Olšák2016-07-262-25/+149
| | | | | | This reverts commit f84e9d749fbb6da73a60fb70e6725db773c9b8f8. Bioshock Infinite no longer hangs.
* r600g: add support for B5G6R5 PBO uploads via texture buffers (v2)Marek Olšák2016-07-261-0/+6
| | | | | | v2: set endian swap to 16 untested
* radeonsi: pre-generate shader logs for ddebugMarek Olšák2016-07-264-6/+34
| | | | | | | | This cuts down the overhead of si_dump_shader when ddebug is capturing shader logs, which is done for every draw call unconditionally (that's quite a lot of work for a draw call). Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add empty lines after shader statsMarek Olšák2016-07-261-1/+1
| | | | | | to separate individual shaders dumped consecutively. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move the shader key dumping to si_shader_dumpMarek Olšák2016-07-263-5/+9
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: implement pipelined hang detection modeMarek Olšák2016-07-264-4/+567
| | | | | | | | | | | | | | | | | | | | | | For good performance while being able to generate decent hang reports. The report doesn't contain the parsed IB and the buffer list, but it isolates the draw call and dumps shaders while not having to flush the context. This is for GPU hangs that are harder to reproduce and require interactive playing for minutes or even hours. dd_pipe.h explains some implementation details. Initializing, copying (recording) and clearing states is most of the code. The performance should be at least 50% of the normal performance depending on the circumstances. (i.e. 50% is expected to be the worst case scenario, not the best case) The majority of time is spent in dump_debug_state(PIPE_DUMP_CURRENT_SHADERS) and that's after all the optimizations in later patches. There is no obvious way to optimize that further. Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: don't save pointers to call parametersMarek Olšák2016-07-262-6/+6
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: move dd_call into dd_pipe.hMarek Olšák2016-07-262-66/+66
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: separate draw call dumping logicMarek Olšák2016-07-261-21/+26
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: move all states into a separate structureMarek Olšák2016-07-263-129/+140
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: write contents of dmesg into hang reportsMarek Olšák2016-07-261-4/+25
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: implement create_batch_queryMarek Olšák2016-07-261-0/+27
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: don't use abort()Marek Olšák2016-07-261-1/+1
| | | | | | We don't want a core dump. Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: make dd_get_file_stream accept the screen onlyMarek Olšák2016-07-261-7/+8
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: clean up ddebug_screen_createMarek Olšák2016-07-261-16/+23
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: rework flags for pipe_context::dump_debug_stateMarek Olšák2016-07-262-15/+29
| | | | | | | | The pipelined hang detection mode will not want to dump everything. (and it's also time consuming) It will only dump shaders after a draw call and then dump the status registers separately if a hang is detected. Reviewed-by: Nicolai Hähnle <[email protected]>
* vc4: add hash table look-up for exported dmabufsRob Herring2016-07-264-3/+56
| | | | | | | | | | | | | It is necessary to reuse existing BOs when dmabufs are imported. There are 2 cases that need to be handled. dmabufs can be created/exported and imported by the same process and can be imported multiple times. Copying other drivers, add a hash table to track exported BOs so the BOs get reused. v2: Whitespace fixup (by anholt) Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: Disable early Z with computed depth.Eric Anholt2016-07-263-2/+11
| | | | | We don't tell the hardware whether we're computing depth, so we need to manage early Z state manually. Fixes piglit early-z.
* nvc0: use nvc0_m2mf_push_linear() to reduce code duplicationSamuel Pitoiset2016-07-261-13/+3
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: use nve4_p2mf_push_linear() to reduce code duplicationSamuel Pitoiset2016-07-261-36/+9
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: upload sample locations on GM20xSamuel Pitoiset2016-07-243-5/+31
| | | | | | | | This fixes a bunch of multisample piglit tests on GM206, like bin/arb_texture_multisample-texelfetch 2 -auto -fbo Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: time-elapsed query should be active for clearsRob Clark2016-07-241-1/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* nvc0/ir: fix up an assertion in emitUADD()Samuel Pitoiset2016-07-241-4/+3
| | | | | | | | It's illegal to have neg modifiers on both sources for OP_ADD, and it's illegal to have OP_SUB with just src0 neg. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix wrong indentation in nvc0_validate_fb()Samuel Pitoiset2016-07-231-141/+141
| | | | | | Trivial. Signed-off-by: Samuel Pitoiset <[email protected]>
* freedreno/a4xx: timestamp queriesRob Clark2016-07-233-1/+34
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: hw timestamp supportRob Clark2016-07-232-2/+15
| | | | | | If the kernel supports it, use hw counter for timestamps. Signed-off-by: Rob Clark <[email protected]>
* freedreno: prep work for timestamp queriesRob Clark2016-07-233-6/+10
| | | | | | | | | We need "NULL" state to be a valid bit in the bitmask, because timestamp queries are not restricted to draw/etc stages (ie. the only commands to submit may just be to read the timestamp). And just because there are no draws, isn't a reason to skip the flush and return zero. Signed-off-by: Rob Clark <[email protected]>
* radeonsi: ensure sample locations are set for line and polygon smoothingNicolai Hähnle2016-07-231-2/+1
| | | | | | | Since commit d938b8c, the sample locations are no longer set unconditionally, so we need to set the atom to dirty on all chips, not just Polaris. Cc: 12.0 <[email protected]>
* radeonsi: fix Polaris MSAA regressionNicolai Hähnle2016-07-232-15/+20
| | | | | | | | | | | | | The regression was introduced by commit d938b8c. The problem here is that in order to use the small primitive filter, we need to explicitly set the sample locations to 0. But the DB doesn't properly process the change of sample locations without a flush, and so we can end up with incorrect Z values. Instead of doing a flush, just disable the small primitive filter when MSAA is force-disabled. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96908 Cc: 12.0 <[email protected]>
* freedreno/ir3: Add missing braces in initializer[email protected]2016-07-231-1/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a2xx: silence missing case 'SHADER_COMPUTE' warning (v2)[email protected]2016-07-231-0/+2
| | | | | | | v2: no need for break after an unreachable (Matt Turner) Signed-off-by: Francesco Ansanelli <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* radeonsi: implement buffer_subdata without indirect callsMarek Olšák2016-07-235-5/+41
| | | | | | There is less noise in CPU profile data now. Reviewed-by: Nicolai Hähnle <[email protected]>