path: root/src/gallium/drivers
Commit message | Author | Age | Files | Lines
* radeonsi: Fix resource leak in gs_copy_shader allocation failure path | Gwan-gyeong Mun | 2016-11-22 | 1 | -1/+7
  CID 1394028

  Signed-off-by: Mun Gwan-gyeong <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
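A fix for a leak on an allocation failure path usually means releasing everything acquired before the failing step. The following is a minimal C sketch of that pattern; `struct shader`, `shader_create()` and `shader_destroy()` are hypothetical names for illustration, not the radeonsi code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shader object: a header plus a separately allocated binary. */
struct shader {
    void *binary;
    size_t binary_size;
};

/* Returns a new shader, or NULL on failure without leaking anything. */
static struct shader *shader_create(size_t binary_size)
{
    assert(binary_size > 0);

    struct shader *sh = calloc(1, sizeof(*sh));
    if (!sh)
        return NULL;

    sh->binary = malloc(binary_size);
    if (!sh->binary)
        goto fail;              /* without this cleanup, 'sh' would leak */

    sh->binary_size = binary_size;
    return sh;

fail:
    free(sh);
    return NULL;
}

static void shader_destroy(struct shader *sh)
{
    if (!sh)
        return;
    free(sh->binary);
    free(sh);
}
```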
* radeonsi: store group_size_variable in struct si_compute | Nicolai Hähnle | 2016-11-21 | 1 | -5/+8
  For compute shaders, we free the selector after the shader has been compiled, so we need to save this bit somewhere else. Also, make sure that this type of bug cannot reappear by NULL-ing the selector pointer after we're done with it.

  This bug has been there since the feature was added, but was only exposed in piglit arb_compute_variable_group_size-local-size by commit 9bfee7047b70cb0aa026ca9536465762f96cb2b1 (which is totally unrelated).

  Cc: 13.0 <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
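A minimal sketch of the two ideas in this commit: copy the needed bit out before the selector is freed, then NULL the pointer so any later use fails loudly. `struct selector`, `struct compute` and `compute_finish_compile()` are hypothetical stand-ins, not the actual si_compute layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical selector: data that is only needed while compiling. */
struct selector {
    bool uses_variable_group_size;
};

/* Hypothetical compute state that outlives the selector. */
struct compute {
    struct selector *sel;       /* valid only until compilation ends */
    bool variable_group_size;   /* saved copy used at dispatch time */
};

static void compute_finish_compile(struct compute *cs)
{
    assert(cs->sel != NULL);

    /* Save what is still needed later... */
    cs->variable_group_size = cs->sel->uses_variable_group_size;

    /* ...then free the selector and NULL the pointer, so a stale access
     * crashes immediately instead of silently reading freed memory. */
    free(cs->sel);
    cs->sel = NULL;
}
```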
* nvc0/ir: use levelZero flag when the lod is set to 0 | Ilia Mirkin | 2016-11-20 | 2 | -6/+43
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* swr: mark streamout buffers as written | Ilia Mirkin | 2016-11-19 | 1 | -0/+7
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* radeonsi: emit sample locations also when nr_samples == 1 | Nicolai Hähnle | 2016-11-18 | 1 | -1/+4
  Since the state tracker now enables MSAA in the hardware for the case nr_samples == 1 as well, we need to set sample locations correctly for this case.

  The Polaris override is still needed for the non-MSAA case (when nr_samples == 0).

  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: allow sample mask export for single-sample framebuffers | Nicolai Hähnle | 2016-11-18 | 1 | -4/+5
  This fixes GL45-CTS.sample_variables.mask.*.samples_1.*.

  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
* vc4: Try compiling our FSes in multithreaded mode on new kernels. | Eric Anholt | 2016-11-16 | 5 | -2/+20
  Multithreaded fragment shaders let us hide texturing latency by a hyperthreading-style switch to another fragment shader. This gets us up to 20% framerate improvements on glmark2 tests.
* vc4: Add support for ETC1 textures if the kernel is new enough. | Eric Anholt | 2016-11-16 | 4 | -5/+18
  The kernel changes for exposing the param have now been merged, so we can expose it here.
* vc4: Fix simulator mode missing-GETPARAM debug info. | Eric Anholt | 2016-11-16 | 1 | -1/+1
  The value is 0 since we didn't set it; we wanted to see the param.
* vc4: Fix resource leak in register allocation failure path. | Mun Gwan-gyeong | 2016-11-16 | 1 | -0/+2
  CID 1394322

  Signed-off-by: Mun Gwan-gyeong <[email protected]>
* swr: [rasterizer core] fix clear with multiple color attachments | Tim Rowley | 2016-11-16 | 6 | -52/+40
  Fixes fbo-mrt-alphatest.

  v2: styling fixes

  Reviewed-by: Bruce Cherniak <[email protected]>
* radeonsi: fix a subtle bounds checking corner case with 3-component attributes | Nicolai Hähnle | 2016-11-16 | 3 | -2/+39
  I'm also sending out a piglit test, gl-2.0/vertexattribpointer-size-3, which exposes this corner case.

  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
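The commit message does not describe the fix itself. As a generic illustration only, assuming a vertex is in bounds when its whole element fits in the buffer, this counts the vertices that can be fetched safely. For a 3-component 32-bit float attribute the element is 12 bytes, so checking only where the last element starts would accept one that runs past the end of the buffer. `num_valid_vertices()` is hypothetical, and a zero stride is not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Number of vertices whose whole attribute element lies inside the buffer.
 * Vertex i occupies bytes [offset + i*stride, offset + i*stride + elem_size). */
static uint32_t num_valid_vertices(uint32_t buffer_size, uint32_t offset,
                                   uint32_t stride, uint32_t elem_size)
{
    assert(stride > 0 && elem_size > 0);

    if (offset > buffer_size || buffer_size - offset < elem_size)
        return 0;   /* not even vertex 0 fits */

    /* Vertex 0 fits; each further vertex needs another 'stride' bytes. */
    return (buffer_size - offset - elem_size) / stride + 1;
}
```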
* radeonsi: reject some 3-component formats as buffer textures | Nicolai Hähnle | 2016-11-16 | 1 | -8/+35
  Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo.

  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
* swr: mark color clamping as unsupported | Ilia Mirkin | 2016-11-15 | 1 | -2/+3
  There is no functionality in swr to clamp either vertex or frag colors. This could be added in swr_shader, at which point these could be re-enabled.

  Fixes arb_color_buffer_float-render

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
* swr: always enable adding start/base vertex to gl_VertexId | Ilia Mirkin | 2016-11-15 | 1 | -0/+1
  Fixes gl-3.2-basevertex-vertexid

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
* swr: add support for upper-left fragcoord position | Ilia Mirkin | 2016-11-15 | 1 | -2/+8
  Fixes glsl-arb-fragment-coord-conventions.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
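With an upper-left origin, window-space y is measured from the top of the framebuffer instead of the bottom. A minimal sketch of the conversion, assuming the framebuffer height is known in pixels; `flip_fragcoord_y()` is a hypothetical helper and not necessarily how swr implements it. Because pixel centers sit at half-integers, `height - y` maps centers to centers: with a height of 4, the bottom row's center y = 0.5 becomes 3.5.

```c
#include <assert.h>

/* Convert window-space y between lower-left and upper-left origins for a
 * framebuffer of the given height. The same formula works in both
 * directions and keeps half-integer pixel centers half-integer. */
static float flip_fragcoord_y(float y, float fb_height)
{
    assert(fb_height > 0.0f);
    return fb_height - y;
}
```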
* swr: make sure that all rendering is finished on shader destroy | Ilia Mirkin | 2016-11-15 | 1 | -0/+8
  Rendering could still be ongoing (or have yet to start) when the shader is deleted. There's no refcounting on the shader text, so insert a pipeline stall unconditionally when this happens.

  [Note: we should instead introduce a way to attach work to fences, so that the freeing can be done in the current fence.]

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: disable blending for integer formats | Ilia Mirkin | 2016-11-15 | 1 | -0/+3
  The EXT_texture_integer test says that blending and alphatest should all be disabled. st/mesa takes care of alphatest already.

  Fixes the ext_texture_integer-fbo-blending piglit test.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: mark rgb9_e5 as unrenderable | Ilia Mirkin | 2016-11-15 | 1 | -1/+1
  The support in swr requires shaders to output the components as UINTs. This is not how GL or Gallium work, and since this is not a required-renderable format, just leave it out.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: no support for shader stencil export | Ilia Mirkin | 2016-11-15 | 1 | -1/+1
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: mark both frag and vert textures read, don't forget about cbs | Ilia Mirkin | 2016-11-15 | 1 | -5/+15
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: fix texture layout for compressed formats | Ilia Mirkin | 2016-11-15 | 2 | -4/+6
  Fixes the texsubimage piglit and lets the copyteximage one get further.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: add archrast generated files to gitignore | Ilia Mirkin | 2016-11-15 | 1 | -0/+4
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer jitter] don't bother quantizing unused channels | Ilia Mirkin | 2016-11-15 | 1 | -1/+1
  In a BGR10X2 or BGR5X1 situation, there's no need to try to quantize the X channel; the default will have the proper quantization required.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
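To illustrate why the X channel needs no work, here is a sketch of UNORM quantization for a 10/10/10/2 layout. The bit positions (B in bits 0-9, G in 10-19, R in 20-29, X in 30-31) and the helper names are assumptions for illustration, not swr's jitter code; the point is only that the unused X bits are never computed from shader output.

```c
#include <assert.h>
#include <stdint.h>

/* Quantize a float to an n-bit UNORM value, rounding to nearest. */
static uint32_t quantize_unorm(float x, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    const float max = (float)((1u << bits) - 1u);
    if (!(x > 0.0f))
        x = 0.0f;               /* clamps negatives, and maps NaN to 0 */
    else if (x > 1.0f)
        x = 1.0f;
    return (uint32_t)(x * max + 0.5f);
}

/* Pack B, G, R into the assumed BGR10X2 layout. The 2-bit X channel is
 * unused, so it is left as zero instead of being quantized. */
static uint32_t pack_bgr10x2(float b, float g, float r)
{
    return quantize_unorm(b, 10)
         | quantize_unorm(g, 10) << 10
         | quantize_unorm(r, 10) << 20;
}
```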
* swr: [rasterizer memory] fix store tile for 128-bit ymajor tiling | Ilia Mirkin | 2016-11-15 | 1 | -1/+1
  Noticed by inspection.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer memory] add support for R32_FLOAT_X8X24 formats | Ilia Mirkin | 2016-11-15 | 2 | -0/+2
  This is the format used for the primary surface of a PIPE_FORMAT_Z32_FLOAT_S8X24_UINT resource.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
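A minimal sketch of reading the depth part of such a 64-bit texel. It assumes the 32-bit float depth occupies the low four bytes and that the X8X24 half is simply ignored in this view; `struct r32f_x8x24_texel` and `texel_depth()` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* One 64-bit texel viewed as R32_FLOAT_X8X24 (assumed layout): bytes 0-3
 * hold the 32-bit float depth, bytes 4-7 are unused in this view. */
struct r32f_x8x24_texel {
    unsigned char bytes[8];
};

_Static_assert(sizeof(float) == 4, "expects a 32-bit float");

/* Read the depth value, ignoring the upper 32 bits entirely. */
static float texel_depth(const struct r32f_x8x24_texel *t)
{
    assert(t != NULL);
    float depth;
    memcpy(&depth, t->bytes, sizeof(depth));   /* avoids aliasing issues */
    return depth;
}
```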
* radeonsi: set IF_THRESHOLD to 3 | Marek Olšák | 2016-11-15 | 1 | -1/+2
  Piglit regressions (radeonsi or LLVM bugs, they pass on softpipe):
  - glsl-1.10/execution/variable-indexing/vs-output-array-vec3-index-wr
  - glsl-1.10/execution/variable-indexing/vs-output-array-vec4-index-wr
  - glsl-1.10/execution/variable-indexing/vs-temp-array-mat2-index-col-row-wr
  - glsl-1.10/execution/variable-indexing/vs-temp-array-mat2-index-row-wr

  Totals:
  SGPRS: 1132185 -> 1168801 (3.23 %)
  VGPRS: 907856 -> 906204 (-0.18 %)
  Spilled SGPRs: 2011 -> 2425 (20.59 %)
  Spilled VGPRs: 368 -> 96 (-73.91 %)
  Scratch VGPRs: 1344 -> 1060 (-21.13 %) dwords per thread
  Code Size: 35916164 -> 35705372 (-0.59 %) bytes
  LDS: 767 -> 767 (0.00 %) blocks
  Max Waves: 194010 -> 194921 (0.47 %)
  Wait states: 0 -> 0 (0.00 %)

  Before:
  VGPR SPILLING APPS      Shaders  SpillVGPR  ScratchVGPR
  alien_isolation            2938         38           40
  bioshock-infinite          1769        245          732
  dirt-showdown               548         85           72
  f1-2015                     776          0          320
  ue4_lightroom_inter..        74          0          180

  After:
  VGPR SPILLING APPS      Shaders  SpillVGPR  ScratchVGPR
  alien_isolation            2938         38           40
  bioshock-infinite          1769          0          480
  dirt-showdown               548         58           40
  f1-2015                     776          0          320
  ue4_lightroom_inter..        74          0          180

  Bioshock and DiRT benefit.

  If I set IF_THRESHOLD=4, tesseract starts spilling VGPRs.

  Reviewed-by: Nicolai Hähnle <[email protected]>
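The percentages in the totals are relative changes, (after - before) / before × 100. For example, spilled VGPRs go from 368 to 96: (96 - 368) / 368 ≈ -0.7391, i.e. -73.91 %. A small C check of that arithmetic, using a hypothetical `percent_change()` helper:

```c
#include <assert.h>

/* Relative change in percent, as reported in the totals. */
static double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```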
* gallium: add PIPE_SHADER_CAP_LOWER_IF_THRESHOLD | Marek Olšák | 2016-11-15 | 10 | -0/+14
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set unsafe fpmath on FP instructions when allowed by R600_DEBUG | Marek Olšák | 2016-11-15 | 1 | -1/+5
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fold some shader context initialization to si_llvm_context_init | Marek Olšák | 2016-11-15 | 3 | -29/+30
  Reviewed-by: Nicolai Hähnle <[email protected]>
* swr: [rasterizer core] remove driverType | Tim Rowley | 2016-11-14 | 5 | -49/+2
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer archrast] move to pass by value | Tim Rowley | 2016-11-14 | 2 | -2/+2
  Move to pass by value since most events are very small in size. We can look at pass by reference but will need to create multiple versions to handle temp objects.

  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] add mode for aux buffer in the SWR_SURFACE_STATE | Tim Rowley | 2016-11-14 | 1 | -0/+16
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer common] don't bleed NOMINMAX definition after <windows.h> | Tim Rowley | 2016-11-14 | 1 | -1/+4
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer archrast] add events | Tim Rowley | 2016-11-14 | 6 | -6/+541
  Added events for tracking early/late depth and stencil events, TE patch info, GS prim info, and FrontEnd/BackEnd DrawEnd events.

  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] fix culling issues | Tim Rowley | 2016-11-14 | 1 | -66/+119
  - Do proper culling of wireframe triangles (including non-culling of degenerates)
  - Fix degenerate culling of CCW front-facing triangles in wireframe and conservative rast

  Reviewed-by: Bruce Cherniak <[email protected]>
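A sketch of the wireframe part of this decision, assuming y-up window coordinates (so a positive signed area means counter-clockwise) and CCW front faces. Zero-area triangles cover no pixels when filled, so they can be dropped there, but their edges still produce lines in wireframe mode, so they are kept. The names are hypothetical, and conservative rasterization is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

struct vec2 { float x, y; };

enum cull_mode { CULL_NONE, CULL_FRONT, CULL_BACK };

/* Twice the signed area of triangle (a, b, c): positive for CCW winding in
 * a y-up system, zero when the vertices are collinear (degenerate). */
static float signed_area2(struct vec2 a, struct vec2 b, struct vec2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

/* Returns true if the triangle should be dropped. */
static bool should_cull(struct vec2 a, struct vec2 b, struct vec2 c,
                        enum cull_mode mode, bool wireframe)
{
    float area = signed_area2(a, b, c);
    if (area == 0.0f)
        return !wireframe;      /* degenerate: keep its edges in wireframe */

    bool front = area > 0.0f;   /* CCW front faces assumed */
    return (mode == CULL_FRONT && front) || (mode == CULL_BACK && !front);
}
```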
* swr: [rasterizer core/jitter] fix alpha test bug | Tim Rowley | 2016-11-14 | 3 | -3/+15
  Alpha from render target 0 should always be used for alpha test for all render targets, according to GL and DX9 specs. Previously we were using alpha from the current render target.

  Reviewed-by: Bruce Cherniak <[email protected]>
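A minimal sketch of the corrected behavior, assuming a GL_GREATER-style comparison and showing only alpha values: the pass/fail decision is taken once from render target 0 and applied to every target. `write_fragment()` is a hypothetical helper, not swr's jitted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* src_alpha[i] is the alpha the shader produced for render target i.
 * Returns the number of targets written (0 if the fragment is discarded). */
static size_t write_fragment(const float *src_alpha, float *dst_alpha,
                             size_t num_rts, float ref)
{
    assert(num_rts > 0);

    /* Test RT0's alpha only, even when writing other render targets. */
    bool pass = src_alpha[0] > ref;
    if (!pass)
        return 0;               /* discarded for all targets */

    for (size_t i = 0; i < num_rts; i++)
        dst_alpha[i] = src_alpha[i];
    return num_rts;
}
```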
* swr: [rasterizer core] various code style changes | Tim Rowley | 2016-11-14 | 6 | -5/+26
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer archrast] don't generate empty files | Tim Rowley | 2016-11-14 | 4 | -8/+39
  Don't generate files when no events have been generated outside the header events.

  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer archrast] fix open file handle limit issue | Tim Rowley | 2016-11-14 | 1 | -6/+44
  Buffer events ourselves and then, when the buffer is full or we're destroying the context, write the contents to file. Previously, we relied on ofstream to buffer for us.

  Reviewed-by: Bruce Cherniak <[email protected]>
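A minimal sketch of the buffering approach, assuming one writer per context. As one way to avoid holding a file handle open per context (an assumption, not necessarily what archrast does), the file is opened only while flushing. `struct event_writer`, the 4096-byte buffer and the function names are hypothetical; the caller would also call `writer_flush()` when the context is destroyed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EVENT_BUF_SIZE 4096     /* hypothetical buffer size */

struct event_writer {
    const char *path;           /* the file is opened only while flushing */
    size_t used;
    unsigned char buf[EVENT_BUF_SIZE];
};

/* Append the buffered bytes to the file, then close it again. */
static int writer_flush(struct event_writer *w)
{
    if (w->used == 0)
        return 0;
    FILE *f = fopen(w->path, "ab");
    if (!f)
        return -1;
    size_t written = fwrite(w->buf, 1, w->used, f);
    int close_err = fclose(f);
    if (written != w->used || close_err != 0)
        return -1;
    w->used = 0;
    return 0;
}

/* Buffer one event record, flushing first if it would not fit. */
static int writer_emit(struct event_writer *w, const void *data, size_t size)
{
    assert(size <= EVENT_BUF_SIZE);
    if (w->used + size > EVENT_BUF_SIZE && writer_flush(w) != 0)
        return -1;
    memcpy(w->buf + w->used, data, size);
    w->used += size;
    return 0;
}
```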
* swr: [rasterizer archrast] fix double free issue | Tim Rowley | 2016-11-14 | 9 | -24/+41
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] separate frontend/backend stats enables | Tim Rowley | 2016-11-14 | 6 | -26/+51
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] 16-wide tile store nearly completed | Tim Rowley | 2016-11-14 | 5 | -314/+917
  * All format combinations coded
  * Fully emulated on AVX2 and AVX
  * Known issue: the MSAA sample locations need to be adjusted for 8x2

  Set ENABLE_AVX512_SIMD16 and USE_8x2_TILE_BACKEND to 1 in knobs.h to enable.

  Reviewed-by: Bruce Cherniak <[email protected]>
* vc4: Add simulator kernel validation for multithreaded fragment shaders. | Jonas Pfeil | 2016-11-12 | 3 | -5/+76
  This is Jonas Pfeil's code from the kernel, brought back to Mesa by anholt.
* vc4: Mark threaded FSes as non-singlethread in the CL. | Eric Anholt | 2016-11-12 | 3 | -1/+6
* vc4: Flag the last thread switch in the program as the last. | Eric Anholt | 2016-11-12 | 3 | -0/+34
  We don't allow the last thread switch to be inside control flow, to be sure that we hit the last state exactly once. If the last texturing was in control flow, fall back to single threaded.
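A sketch of the check described above, on a hypothetical flattened instruction list where `cf_depth > 0` marks an instruction inside an if or a loop. It flags the final THRSW and returns false when that switch sits inside control flow; treating a shader with no THRSW at all as "nothing flagged" is also an assumption. A caller would compile single-threaded when it returns false. The types and names are illustrative, not vc4's QIR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op { OP_ALU, OP_TEX, OP_THRSW };

/* Hypothetical flattened instruction. */
struct inst {
    enum op op;
    unsigned cf_depth;          /* > 0: inside an if or a loop */
    bool last_thrsw;
};

/* Returns true if the final thread switch was found at top level and
 * flagged; false if there is none or it is inside control flow, where it
 * might execute zero or several times. */
static bool mark_last_thrsw(struct inst *insts, size_t n)
{
    assert(insts != NULL || n == 0);

    for (size_t i = n; i-- > 0;) {
        if (insts[i].op != OP_THRSW)
            continue;
        if (insts[i].cf_depth > 0)
            return false;
        insts[i].last_thrsw = true;
        return true;
    }
    return false;
}
```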
* vc4: Add THRSW nodes after each tex sample setup in multithreaded mode. | Eric Anholt | 2016-11-12 | 2 | -0/+49
  This is a suboptimal implementation, but Jonas Pfeil found that it was still a massive performance gain.
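The simple scheme can be sketched as a pass over a hypothetical instruction list that copies each op and appends a THRSW after every texture setup, so another thread can run while the fetch is in flight. `enum op` and `insert_thrsw()` are illustrative only; the real pass works on vc4's own IR.

```c
#include <assert.h>
#include <stddef.h>

enum op { OP_ALU, OP_TEX_SETUP, OP_THRSW };

/* Copy 'n' ops from 'in' to 'out', adding a thread switch after every
 * texture setup. 'out' must have room for up to 2 * n entries.
 * Returns the number of ops written. */
static size_t insert_thrsw(const enum op *in, size_t n, enum op *out)
{
    assert((in != NULL && out != NULL) || n == 0);

    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        out[m++] = in[i];
        if (in[i] == OP_TEX_SETUP)
            out[m++] = OP_THRSW;
    }
    return m;
}
```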
* vc4: Add some spec citations about texture fifo management. | Eric Anholt | 2016-11-12 | 1 | -5/+37
* vc4: Use ra14/rb14 as the spilling registers. | Eric Anholt | 2016-11-12 | 2 | -8/+8
  This makes the raddr fixups compatible with FS threading.
* vc4: Add support for register allocation for threaded shaders. | Eric Anholt | 2016-11-12 | 3 | -20/+85
  We have two major requirements: make sure that only the bottom half of the physical register space is used, and make sure that none of our values are live in an accumulator across a switch.
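A sketch of the two constraints as a per-register filter that an allocator could consult. The numbering (accumulators r0-r4 first, then ra0-ra31, then rb0-rb31) and `reg_allowed()` are hypothetical and not the vc4 allocator code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numbering: 0-4 are accumulators r0-r4, 5-36 are ra0-ra31,
 * 37-68 are rb0-rb31. */
enum { RA_BASE = 5, RB_BASE = 37, FILE_SIZE = 32 };

/* May a value be assigned to 'reg'? In threaded mode only the bottom half
 * of each physical register file is usable, and a value that is live
 * across a thread switch cannot stay in an accumulator. */
static bool reg_allowed(int reg, bool threaded, bool live_across_thrsw)
{
    assert(reg >= 0 && reg < RB_BASE + FILE_SIZE);

    if (reg < RA_BASE)                       /* an accumulator */
        return !(threaded && live_across_thrsw);

    /* Index within its register file (A or B). */
    int index = reg < RB_BASE ? reg - RA_BASE : reg - RB_BASE;
    return !threaded || index < FILE_SIZE / 2;
}
```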