path: root/src/gallium/drivers
Commit message | Author | Age | Files | Lines
* llvmpipe: reduce alignment requirement for resources from 64x64 to 4x4 | Roland Scheidegger | 2013-05-31 | 7 | -53/+83
  The overallocation was very bad especially for things like 1d array textures which got blown up by a factor of 64. (Even ordinary smallish 2d textures benefit a lot from this, a mipmapped 64x64 rgba8 texture previously used 7*16kB = 112kB instead of now ~22kB.)

  4x4 is chosen because this is the size the jit functions run on, so making it smaller is going to be a bit more complicated.

  It is actually not strictly 4x4 pixel, since we'd want to avoid situations where different threads are rendering to the same cacheline so we keep cacheline size alignment in x direction (often 64bytes).

  To make this work introduce new task width/height parameters and make sure clears don't clear the whole tile if it's a partial tile. Likewise, the rasterizer may produce fragments outside the 4x4 blocks present in a tile, so don't call the jit function for them.

  This does not yet fix rendering to buffers (which cannot have any y alignment at all), and 1d/1d array textures are still overallocated by a factor of 4.

  v2: replace magic number 4 with LP_RASTER_BLOCK_SIZE, fix size of buffers allocated (needed in case we render to them).

  Reviewed-by: Jose Fonseca <[email protected]>
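The 112kB versus ~22kB figures in the message above can be reproduced with a small calculation. The following is a simplified sketch, not llvmpipe's actual layout code: `align_up`, `minify` and `mip_chain_bytes` are hypothetical helpers, and the model assumes that each mip level is padded to the block alignment in x and y and that each row is additionally padded to a 64-byte cacheline.

```c
#include <assert.h>
#include <stddef.h>

/* Round v up to the next multiple of a (a must be > 0). */
static size_t align_up(size_t v, size_t a)
{
   assert(a > 0);
   return (v + a - 1) / a * a;
}

/* Halve a mip dimension, never going below 1. */
static size_t minify(size_t v)
{
   return v > 1 ? v / 2 : 1;
}

/*
 * Hypothetical model: bytes used by a full 2D mip chain when every level is
 * padded to block_align pixels in x and y, and every row is padded to
 * row_align_bytes.
 */
static size_t mip_chain_bytes(size_t width, size_t height,
                              size_t bytes_per_pixel,
                              size_t block_align, size_t row_align_bytes)
{
   size_t total = 0;

   for (;;) {
      size_t w = align_up(width, block_align);
      size_t h = align_up(height, block_align);
      size_t stride = align_up(w * bytes_per_pixel, row_align_bytes);

      total += stride * h;

      if (width == 1 && height == 1)
         break;               /* the 1x1 level was the last one */

      width = minify(width);
      height = minify(height);
   }
   return total;
}
```

Under these assumptions, `mip_chain_bytes(64, 64, 4, 64, 64)` gives 114688 bytes (7 levels of 16kB each) and `mip_chain_bytes(64, 64, 4, 4, 64)` gives 22784 bytes, in line with the figures quoted in the message.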
* llvmpipe: Remove x/y from cmd_bin | Adam Jackson | 2013-05-31 | 6 | -47/+30
  These were mostly just a waste of memory and cache pressure, and were really only used for debugging.

  This change reduces instruction count (as measured by callgrind's Ir event) of gnome-shell-perf-tool on Ivybridge by 3.5% ± 0.015% (n=20).

  Signed-off-by: Adam Jackson <[email protected]>
* r600g/sb: fix broken assert | Vadim Girlin | 2013-05-31 | 1 | -1/+1
  Signed-off-by: Vadim Girlin <[email protected]>
* ilo: simplify shader variant handling | Courtney Goeltzenleuchter | 2013-05-30 | 2 | -25/+2
  Remove hash function on shader variants. The nature of variants limits them to a small number, so it is more efficient to do a memory compare of the actual shader structures than to compute and compare hashes.
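The idea of comparing variant structures directly can be sketched as a linear search with `memcmp`. This is only an illustration: `struct variant_key` and `find_variant` are hypothetical names, not the real ilo structures, and the key deliberately contains only `unsigned` fields so that there is no padding for `memcmp` to trip over.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant key; not the real ilo structure. */
struct variant_key {
   unsigned flat_shade;
   unsigned num_samplers;
   unsigned rt_format;
};

/* Return the index of the entry equal to key, or -1 if none matches. */
static int find_variant(const struct variant_key *variants, size_t count,
                        const struct variant_key *key)
{
   assert(variants != NULL || count == 0);

   for (size_t i = 0; i < count; i++) {
      /* With only a handful of variants, a byte compare is cheap. */
      if (memcmp(&variants[i], key, sizeof(*key)) == 0)
         return (int)i;
   }
   return -1;
}
```

If a key structure did contain padding, it would have to be zero-initialized (for example with `memset`) before being filled in, otherwise two logically equal keys could compare as different.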
* svga: add PIPE_CAP_MAX_VIEWPORTS to switch to silence warning | Brian Paul | 2013-05-29 | 1 | -0/+2
* llvmpipe: clamp scissors to be between 0 and max | Zack Rusin | 2013-05-25 | 5 | -3/+13
  We need to clamp to make sure an invalid shader doesn't crash our driver. The spec says to return the 0-th index for everything that's out of bounds.

  Signed-off-by: Zack Rusin <[email protected]>
  Reviewed-by: José Fonseca <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
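The out-of-bounds rule described above can be sketched as follows. This is a minimal illustration, not the code from the commit: `sanitize_viewport_index` is a hypothetical helper, and the number of viewports is passed in as a parameter rather than taken from any driver constant.

```c
#include <assert.h>

/*
 * Hypothetical helper: map any index outside [0, num_viewports) to 0,
 * so that a shader writing a bogus index cannot cause an out-of-range
 * array access in the driver.
 */
static int sanitize_viewport_index(int index, int num_viewports)
{
   assert(num_viewports > 0);

   if (index < 0 || index >= num_viewports)
      return 0;
   return index;
}
```

Note that this falls back to index 0 rather than saturating at the nearest valid index, which is what the message says the spec requires.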
* draw: fixup draw_find_shader_output | Zack Rusin | 2013-05-25 | 2 | -5/+5
  draw_find_shader_output, like most of the code in draw, used to depend on position always being at output slot 0, which meant that any other attribute being at 0 could signify an error. Unfortunately position can be at any of the output slots, thus other attributes can occupy slot 0 and we need to mark the ones which were not found by something else.

  This commit changes draw_find_shader_output so that it returns -1 if it can't find the given attribute, and adjusts the code that depended on it returning >0 whenever it correctly found an attrib.

  Signed-off-by: Zack Rusin <[email protected]>
  Reviewed-by: José Fonseca <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
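The change in return convention can be illustrated with a small lookup function. This is a sketch under assumptions: `struct example_output` and `find_output` are hypothetical stand-ins and do not reproduce the signature of draw_find_shader_output; they only show why -1 is needed once slot 0 is a legitimate result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one shader output slot. */
struct example_output {
   unsigned semantic_name;
   unsigned semantic_index;
};

/*
 * Return the slot holding the requested semantic, or -1 if it is absent.
 * Slot 0 is a valid answer, so callers must test for ">= 0", not "> 0".
 */
static int find_output(const struct example_output *outputs, size_t count,
                       unsigned semantic_name, unsigned semantic_index)
{
   assert(outputs != NULL || count == 0);

   for (size_t i = 0; i < count; i++) {
      if (outputs[i].semantic_name == semantic_name &&
          outputs[i].semantic_index == semantic_index)
         return (int)i;
   }
   return -1;
}
```

A caller that previously treated 0 as "not found" would silently drop an attribute that happens to live in slot 0; with the -1 convention the two cases are distinguishable.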
* llvmpipe: implement support for multiple viewports | Zack Rusin | 2013-05-25 | 11 | -36/+79
  Largely related to making sure the rasterizer can correctly pick out the correct scissor box for the current viewport.

  Signed-off-by: Zack Rusin <[email protected]>
  Reviewed-by: José Fonseca <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: Add support for multiple viewports | Zack Rusin | 2013-05-25 | 24 | -128/+205
  Gallium supported only a single viewport/scissor combination. This commit changes the interface to allow us to add support for multiple viewports/scissors.

  Signed-off-by: Zack Rusin <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: José Fonseca <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: get rid of tiled/linear layout remains | Roland Scheidegger | 2013-05-29 | 6 | -226/+47
  Eliminate the rest of the no longer needed layout logic. (It is possible some code could be simplified a bit further still.)

  Reviewed-by: Jose Fonseca <[email protected]>
* radeonsi: Enable GLSL 1.30 | Michel Dänzer | 2013-05-28 | 1 | -1/+1
* radeonsi: Handle TGSI TXQ opcode | Michel Dänzer | 2013-05-28 | 2 | -3/+33
* radeonsi: Add support for TGSI TXF opcode | Michel Dänzer | 2013-05-28 | 2 | -14/+51
* radeonsi: Use tgsi_util_get_texture_coord_dim() | Michel Dänzer | 2013-05-28 | 1 | -25/+7
* radeonsi: Handle TGSI_SEMANTIC_CLIPDIST | Michel Dänzer | 2013-05-28 | 1 | -4/+17
* radeonsi: Make border colour state handling safe for integer textures | Michel Dänzer | 2013-05-28 | 2 | -20/+27
* radeonsi: Fix hardware state for dual source blending | Michel Dänzer | 2013-05-28 | 4 | -6/+17
  Set up CB_SHADER_MASK register according to pixel shader exports, and enable some minimal state for colour buffer 1 in case dual source blending is used.
* r600g/sb: handle more cases for folding in gvn pass | Vadim Girlin | 2013-05-28 | 2 | -28/+118
  Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: improve folding for SETcc | Vadim Girlin | 2013-05-27 | 1 | -8/+98
  Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: optimize CNDcc instructions | Vadim Girlin | 2013-05-27 | 3 | -1/+113
  Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: improve optimization of conditional instructions | Vadim Girlin | 2013-05-27 | 6 | -21/+96
  Signed-off-by: Vadim Girlin <[email protected]>
* ilo: enable multiple constant buffers | Chia-I Wu | 2013-05-27 | 1 | -1/+1
  This effectively enables uniform buffer object support.
* ilo: add support for indirect access of CONST in FS | Chia-I Wu | 2013-05-27 | 2 | -2/+99
  Unlike other register files, CONST is read with a message and indirect access is easier to implement.
* ilo: add support for TBOs on GEN6 | Chia-I Wu | 2013-05-27 | 1 | -8/+26
  This hunk was missing in the last commit.
* ilo: advertise supports for pure integer formats | Chia-I Wu | 2013-05-27 | 1 | -2/+3
  For pure integer formats, no filtering nor blending is needed.
* ilo: add support for texture buffer objects | Chia-I Wu | 2013-05-27 | 2 | -10/+32
  Take care of sampler views that have buffers as the underlying resources. Update caps related to TBOs.
* r600g/sb: improve handling of KILL instructions | Vadim Girlin | 2013-05-27 | 3 | -89/+139
  This patch improves handling of unconditional KILL instructions inside the conditional blocks, uncovering more opportunities for if-conversion.

  Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: fix peephole optimization for PRED_SETE | Vadim Girlin | 2013-05-27 | 1 | -1/+1
  Fixes incorrect condition that prevented optimization for PRED_SETE/PRED_SETE_INT.

  Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: fix scheduling of PRED_SET instructions | Vadim Girlin | 2013-05-27 | 2 | -2/+18
  PRED_SET instructions that update exec mask should be scheduled immediately prior to the "if-then-else" block, because any instruction that is inserted after alu clause with PRED_SET and before conditional block is also conditionally executed by hw (exec mask is already updated at that moment).

  Probably it's better to make PRED_SET a part of conditional "if-then-else" block in the IR to handle this more cleanly, but for now this temporary solution should prevent the problem.

  Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: fix handling of preloaded inputs for compute shaders | Vadim Girlin | 2013-05-25 | 1 | -0/+4
  For compute shaders we need to let the backend know that GPRs 0 and 1 are preloaded with some compute-specific input values, otherwise any use of these regs without previous definition is considered as undefined value and usually is simply replaced with 0.

  Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: fix incorrect assert | Vadim Girlin | 2013-05-24 | 1 | -1/+1
  Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: relax some restrictions for FETCH instructions | Vadim Girlin | 2013-05-24 | 1 | -9/+8
  This allows GVN rewrite pass to propagate non-const (register) values to FETCH source operands, helping to eliminate unnecessary copies in some cases.

  Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: relax register allocation for compute shaders | Vadim Girlin | 2013-05-24 | 2 | -2/+16
  We have to assume that all GPRs in compute shader can be indirectly addressed because LLVM backend doesn't provide any indirect array info. That's why for compute shaders GPR array is created that covers all used GPRs (0..r600_bytecode::ngpr-1), but this seriously restricts register allocation in sb.

  This patch checks for actual use of indirect access in the shader and if it's not used then GPR array is not created, so that regalloc is not unnecessarily restricted.

  Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: fix gpr array handling for compute shaders | Vadim Girlin | 2013-05-24 | 1 | -1/+1
  Fixes segfault with bfgminer and R600_DEBUG=sbcl.

  Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: fix buffer overflow in sb_ostream | Vadim Girlin | 2013-05-24 | 1 | -1/+1
  Fixes segfault during bytecode dump with bfgminer kernel

  Signed-off-by: Vadim Girlin <[email protected]>
* r600g/compute: Use common transfer_{map,unmap} functions for global resources | Tom Stellard | 2013-05-23 | 1 | -44/+24
  Reviewed-by: Marek Olšák <[email protected]>
* r600g/compute: Use common transfer_{map,unmap} functions for kernel inputs | Tom Stellard | 2013-05-23 | 1 | -4/+11
  Reviewed-by: Marek Olšák <[email protected]>
* freedreno: scissor fix | Rob Clark | 2013-05-23 | 1 | -0/+11
  Don't assume the state-tracker will set the scissor after the framebuffer state is changed.

  Signed-off-by: Rob Clark <[email protected]>
* freedreno: implement pipe->resource_copy_region() | Rob Clark | 2013-05-23 | 1 | -8/+43
  Signed-off-by: Rob Clark <[email protected]>
* ilo: Initialize need_flush in draw_vbo. | Vinson Lee | 2013-05-23 | 1 | -1/+1
  need_flush was uninitialized if hw3d->new_batch was true.

  Fixes "Uninitialized scalar variable" defect reported by Coverity.

  Signed-off-by: Vinson Lee <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* radeon: Initialize variables in radeon_llvm_context_init. | Vinson Lee | 2013-05-22 | 1 | -0/+2
  'type' was not fully initialized when calling lp_build_context_init.

  Fixes "Uninitialized scalar variable" defect reported by Coverity.

  NOTE: This is a candidate for the stable branches.

  Signed-off-by: Vinson Lee <[email protected]>
  Reviewed-by: Michel Dänzer <[email protected]>
* softpipe: change TEX_TILE_SIZE and NUM_TEX_TILE_ENTRIES | Roland Scheidegger | 2013-05-22 | 1 | -2/+8
  Initially we had NUM_TEX_TILE_ENTRIES of 50, however this was using too much memory (mostly because the tile cache is operating on fixed max current sampler views which could be fixed but that's another topic). So it was decreased to 4.

  However this is a ridiculously low number which can't actually really work (the number of tiles needed for as little as a single quad with linear_mipmap_linear is 2 to 8 for a 2d texture, and 4 to 16 for a 3d texture), as it just about guarantees there will be cache thrashing sometimes (just about always for 3d textures in fact, since while there are 4 entries the cache is direct mapped).

  So increase that number to 16 (which is still on the low side for direct mapped cache though I guess using something like 4-way associativity would be more effective than increasing this further) which has at least some good chance to avoid thrashing.

  Since we don't want to increase memory requirements however in turn decrease the tile size accordingly from 64 to 32 (as a bonus point this also decreases the cost of texture thrashing which might still happen sometimes).

  I've seen performance improvement in the order of factor ~200 (specifically, drawing the first frame from the replay from bug 41787 needs "only" ~10s instead of ~30min, meaning I can actually compare the output with other drivers...) with this.

  Reviewed-by: Jose Fonseca <[email protected]>
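The trade-off described above (four times as many entries, each a quarter of the size) can be checked with a little arithmetic. The sketch below is illustrative only: `cache_texels` and `tile_slot` are hypothetical helpers, and the slot function is a made-up direct-mapped hash, not the one softpipe uses.

```c
#include <assert.h>
#include <stddef.h>

/* Total texels held by a cache of square tiles. */
static size_t cache_texels(size_t num_entries, size_t tile_size)
{
   return num_entries * tile_size * tile_size;
}

/*
 * Hypothetical direct-mapped placement: each tile coordinate maps to
 * exactly one slot, so two tiles with the same slot evict each other.
 */
static size_t tile_slot(size_t tile_x, size_t tile_y, size_t num_entries)
{
   assert(num_entries > 0);
   return (tile_x + tile_y * 7) % num_entries;
}
```

With these definitions, `cache_texels(4, 64)` and `cache_texels(16, 32)` both come to 16384 texels, so the memory footprint is unchanged. Under the made-up hash, tiles (0,0) and (4,0) share a slot when there are 4 entries but not when there are 16, which is the kind of conflict that more entries make less likely.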
* softpipe: disambiguate TILE_SIZE / TEX_TILE_SIZE | Roland Scheidegger | 2013-05-22 | 3 | -38/+38
  These can be different (just like NUM_TEX_TILE_ENTRIES / NUM_ENTRIES), though currently they aren't.

  Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: disable simple_shader optimization | Roland Scheidegger | 2013-05-22 | 1 | -1/+6
  This optimization disabled mask checks if the shader is simple enough. While this should work correctly, the problem is that it can hide real issues because shaders in practice are usually complex enough (8 instructions or 1 texture is already enough) so this doesn't get used, whereas dumbed-down tests which should hit all the same code paths suddenly do something quite different.

  This was the reason that bug 41787 could not be easily tracked as stencil test not working correctly (piglit would in fact have failed some tests without that optimization).

  So disable it for now, it's unclear if it's much of a win in any case.

  Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: fix early depth test / late depth write stencil issues | Roland Scheidegger | 2013-05-22 | 1 | -5/+12
  We actually did early depth/stencil test and late depth/stencil write even when the shader could kill the fragment (alpha test or discard). Since it matters for the new stencil value if the fragment is killed by depth/stencil test or by the shader (in which case it will not reach the depth/stencil test) this simply cannot work (we also would possibly skip writing the new stencil value due to mask checks but this is a secondary issue).

  So use late depth test / late depth write instead in this case. (No piglit changes as it doesn't seem to hit such bogus early depth test / late depth write path.)

  Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: fix issue with not writing new stencil values | Roland Scheidegger | 2013-05-22 | 2 | -4/+7
  We did mask checks between depth/stencil testing and depth/stencil write. This meant that if the depth/stencil test killed off all fragments we never actually wrote the new stencil value. This issue affected all early/late test/write combinations.

  So move the mask check after depth/stencil write (for early depth test, could do the same for late depth test but might not be worth it at that point so just skip it there).

  This addresses https://bugs.freedesktop.org/show_bug.cgi?id=41787.

  Piglit does not hit this issue because of the simple_shader optimization in generate_fs_loop() which means we're skipping the mask checks.

  Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: (trivial) remove confusing code in stencil test | Roland Scheidegger | 2013-05-22 | 1 | -16/+11
  This was meant to disable some code which isn't needed when depth/stencil isn't written. However, there's more code which wouldn't be needed in that case so having the condition there was just odd (llvm will drop all the code anyway).

  Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: fix bug in early depth test / late depth write handling | Roland Scheidegger | 2013-05-22 | 1 | -6/+7
  Using wrong type if the format was less than 32bits. No piglit changes as it doesn't hit that path.

  Reviewed-by: Jose Fonseca <[email protected]>
* ilo: set more fields of 3DSTATE_DEPTH_BUFFER | Chia-I Wu | 2013-05-22 | 1 | -48/+120
  Set lod/layer related fields of 3DSTATE_DEPTH_BUFFER. Since we always point to a single level/layer, those fields are always zero and this commit effectively makes no change.

  While at it, make it easier to disable manual slice offset calculation.
* ilo: correctly set view extent in SURFACE_STATE | Chia-I Wu | 2013-05-22 | 2 | -85/+115
  The view extent was set to be the same as the depth while it should be set to the number of layers. It makes a difference for 3D textures.

  Also use this as a chance to clean up the code.