path: root/src/gallium
Commit message | Author | Age | Files | Lines
* swr: fix texture layout for compressed formats | Ilia Mirkin | 2016-11-15 | 2 | -4/+6
  Fixes the texsubimage piglit and lets the copyteximage one get further.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: add archrast generated files to gitignore | Ilia Mirkin | 2016-11-15 | 1 | -0/+4
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer jitter] don't bother quantizing unused channels | Ilia Mirkin | 2016-11-15 | 1 | -1/+1
  In a BGR10X2 or BGR5X1 situation, there's no need to try to quantize the X channel; the default will have the proper quantization required.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer memory] fix store tile for 128-bit ymajor tiling | Ilia Mirkin | 2016-11-15 | 1 | -1/+1
  Noticed by inspection.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer memory] add support for R32_FLOAT_X8X24 formats | Ilia Mirkin | 2016-11-15 | 2 | -0/+2
  This is the format used for the primary surface of a PIPE_FORMAT_Z32_FLOAT_S8X24_UINT resource.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
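The name R32_FLOAT_X8X24 describes a 64-bit texel in which a 32-bit float is followed by 8 + 24 unused bits. As an illustration only, assuming the float occupies the first four bytes of each eight-byte texel (little-endian layout) and a linear, byte-addressed surface, reading and writing just the float part might look like this. The helper names are hypothetical; this is not SWR's load/store tile code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: 8 bytes per texel; bytes 0-3 hold the R32_FLOAT value,
 * bytes 4-7 (the X8X24 part) are ignored by this view. */
#define R32_FLOAT_X8X24_BYTES 8

_Static_assert(sizeof(float) == 4, "R32_FLOAT must be 4 bytes");

/* Hypothetical helper: read the float of texel `index`. */
static float load_r32_float_x8x24(const uint8_t *surface, size_t index)
{
    float value;
    /* memcpy sidesteps alignment and strict-aliasing issues. */
    memcpy(&value, surface + index * R32_FLOAT_X8X24_BYTES, sizeof value);
    return value;
}

/* Hypothetical helper: write only the float part; the X8X24 bytes
 * keep whatever they contained before. */
static void store_r32_float_x8x24(uint8_t *surface, size_t index, float value)
{
    memcpy(surface + index * R32_FLOAT_X8X24_BYTES, &value, sizeof value);
}
```

Leaving the trailing four bytes untouched is what makes this a view of the depth part only: whatever else shares those bytes is neither read nor clobbered.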
* radeonsi: set IF_THRESHOLD to 3 | Marek Olšák | 2016-11-15 | 1 | -1/+2
  Piglit regressions (radeonsi or LLVM bugs; they pass on softpipe):
  - glsl-1.10/execution/variable-indexing/vs-output-array-vec3-index-wr
  - glsl-1.10/execution/variable-indexing/vs-output-array-vec4-index-wr
  - glsl-1.10/execution/variable-indexing/vs-temp-array-mat2-index-col-row-wr
  - glsl-1.10/execution/variable-indexing/vs-temp-array-mat2-index-row-wr
  Totals:
  SGPRS: 1132185 -> 1168801 (3.23 %)
  VGPRS: 907856 -> 906204 (-0.18 %)
  Spilled SGPRs: 2011 -> 2425 (20.59 %)
  Spilled VGPRs: 368 -> 96 (-73.91 %)
  Scratch VGPRs: 1344 -> 1060 (-21.13 %) dwords per thread
  Code Size: 35916164 -> 35705372 (-0.59 %) bytes
  LDS: 767 -> 767 (0.00 %) blocks
  Max Waves: 194010 -> 194921 (0.47 %)
  Wait states: 0 -> 0 (0.00 %)
  Before:
  VGPR SPILLING APPS    Shaders  SpillVGPR  ScratchVGPR
  alien_isolation          2938         38           40
  bioshock-infinite        1769        245          732
  dirt-showdown             548         85           72
  f1-2015                   776          0          320
  ue4_lightroom_inter..      74          0          180
  After:
  VGPR SPILLING APPS    Shaders  SpillVGPR  ScratchVGPR
  alien_isolation          2938         38           40
  bioshock-infinite        1769          0          480
  dirt-showdown             548         58           40
  f1-2015                   776          0          320
  ue4_lightroom_inter..      74          0          180
  Bioshock and DiRT benefit. If I set IF_THRESHOLD=4, tesseract starts spilling VGPRs.
  Reviewed-by: Nicolai Hähnle <[email protected]>
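The percentages in the Totals block are relative changes, (after - before) / before * 100, and the per-application rows add up exactly to the Spilled VGPRs and Scratch VGPRs totals (38 + 245 + 85 = 368 before, 38 + 58 = 96 after). A small C sketch that reproduces this arithmetic; the function names are illustrative, not part of any shader-db tool.

```c
#include <assert.h>
#include <stddef.h>

/* Relative change in percent, as printed in the "Totals" lines.
 * Requires a non-zero baseline: a row such as "Wait states: 0 -> 0"
 * has to be special-cased by whatever prints it. */
static double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* Sum one column of a per-application table. */
static long column_sum(const long *values, size_t count)
{
    long sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += values[i];
    return sum;
}
```

Applied to the SpillVGPR columns, column_sum gives 368 and 96, and percent_change(368, 96) is about -73.91, matching the Spilled VGPRs line.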
* gallium: add PIPE_SHADER_CAP_LOWER_IF_THRESHOLD | Marek Olšák | 2016-11-15 | 14 | -0/+21
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: limit use of setFastMathFlags to LLVM 3.8 and later | Marek Olšák | 2016-11-15 | 1 | -0/+2
  Reviewed-by: Brian Paul <[email protected]>
* radeonsi: set unsafe fpmath on FP instructions when allowed by R600_DEBUG | Marek Olšák | 2016-11-15 | 1 | -1/+5
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: add lp_create_builder with an unsafe_fpmath option | Marek Olšák | 2016-11-15 | 2 | -0/+17
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fold some shader context initialization to si_llvm_context_init | Marek Olšák | 2016-11-15 | 3 | -29/+30
  Reviewed-by: Nicolai Hähnle <[email protected]>
* clover: adapt to new error API since LLVM r286752 | Vedran Miletić | 2016-11-14 | 1 | -2/+8
  Tested-by: Dieter Nützel <[email protected]>
* swr: [rasterizer core] remove driverType | Tim Rowley | 2016-11-14 | 5 | -49/+2
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer archrast] move to pass by value | Tim Rowley | 2016-11-14 | 2 | -2/+2
  Move to pass by value since most events are very small in size. We can look at pass by reference but will need to create multiple versions to handle temp objects.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] add mode for aux buffer in the SWR_SURFACE_STATE | Tim Rowley | 2016-11-14 | 1 | -0/+16
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer common] don't bleed NOMINMAX definition after <windows.h> | Tim Rowley | 2016-11-14 | 1 | -1/+4
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer archrast] add events | Tim Rowley | 2016-11-14 | 6 | -6/+541
  Added events for tracking early/late depth and stencil events, TE patch info, GS prim info, and FrontEnd/BackEnd DrawEnd events.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] fix culling issues | Tim Rowley | 2016-11-14 | 1 | -66/+119
  - Do proper culling of wireframe triangles (including non-culling of degenerates)
  - Fix degenerate culling of CCW front-facing triangles in wireframe and conservative rast
  Reviewed-by: Bruce Cherniak <[email protected]>
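A common way to make the culling decisions described above is the sign of the triangle's screen-space area: zero means degenerate, and the sign gives the winding. The C sketch below is not SWR's SIMD code; the types, enums, and the y-up orientation convention are assumptions. It keeps degenerate triangles in wireframe mode, in the spirit of the first bullet.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { float x, y; } vec2; /* hypothetical screen-space vertex */

enum cull_mode { CULL_NONE, CULL_FRONT, CULL_BACK };
enum fill_mode { FILL_SOLID, FILL_WIREFRAME };

/* Twice the signed area; positive means counter-clockwise when y points up. */
static float signed_area2(vec2 a, vec2 b, vec2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

static bool should_cull(vec2 a, vec2 b, vec2 c, enum cull_mode cull,
                        enum fill_mode fill, bool front_is_ccw)
{
    float area = signed_area2(a, b, c);

    if (area == 0.0f) {
        /* Degenerate: a solid triangle covers no pixels, but its edges can
         * still be drawn as lines, so it is kept in wireframe mode. */
        return fill == FILL_SOLID;
    }

    bool ccw = area > 0.0f;
    bool front = (ccw == front_is_ccw);

    switch (cull) {
    case CULL_FRONT: return front;
    case CULL_BACK:  return !front;
    default:         return false;
    }
}
```

The winding test only runs for non-degenerate triangles, because a zero-area triangle has no meaningful facing.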
* swr: [rasterizer core/jitter] fix alpha test bug | Tim Rowley | 2016-11-14 | 3 | -3/+15
  Alpha from render target 0 should always be used for alpha test for all render targets, according to GL and DX9 specs. Previously we were using alpha from the current render target.
  Reviewed-by: Bruce Cherniak <[email protected]>
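The rule in this commit (one alpha value, taken from render target 0, decides the alpha test for every render target) can be sketched in scalar C as follows. The types, the reduced set of comparison functions, and MAX_RT are hypothetical stand-ins, not SWR's jitted code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RT 8 /* hypothetical render-target limit */

typedef struct { float r, g, b, a; } color4;

/* A subset of the alpha comparison functions, for illustration. */
enum alpha_func { ALPHA_NEVER, ALPHA_LESS, ALPHA_GEQUAL, ALPHA_ALWAYS };

static bool alpha_test_passes(float alpha, enum alpha_func func, float ref)
{
    switch (func) {
    case ALPHA_NEVER:  return false;
    case ALPHA_LESS:   return alpha < ref;
    case ALPHA_GEQUAL: return alpha >= ref;
    default:           return true;
    }
}

/* Returns a bitmask of the render targets that receive the fragment.
 * A single decision, based on RT0's alpha, applies to every target;
 * the alpha written to RT1..RTn plays no part. */
static unsigned rt_write_mask(const color4 outputs[], int num_rt,
                              enum alpha_func func, float ref)
{
    assert(num_rt >= 1 && num_rt <= MAX_RT);
    bool pass = alpha_test_passes(outputs[0].a, func, ref);
    return pass ? (1u << num_rt) - 1u : 0u;
}
```

The buggy behavior described in the commit would correspond to evaluating alpha_test_passes(outputs[i].a, ...) separately for each target i.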
* swr: [rasterizer core] various code style changes | Tim Rowley | 2016-11-14 | 6 | -5/+26
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer archrast] don't generate empty files | Tim Rowley | 2016-11-14 | 4 | -8/+39
  Don't generate files when no events have been generated outside the header events.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer archrast] fix open file handle limit issue | Tim Rowley | 2016-11-14 | 1 | -6/+44
  Buffer events ourselves and write the contents to file when the buffer is full or the context is being destroyed. Previously, we relied on ofstream to buffer for us.
  Reviewed-by: Bruce Cherniak <[email protected]>
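A minimal sketch of the buffering strategy described above, in C rather than SWR's C++: events are appended to a fixed-size in-memory buffer that is written out when it fills or when the context is destroyed. The struct, the function names, the 4096-byte capacity, and the choice to open the file only while flushing are all assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EVENT_BUFFER_SIZE 4096 /* hypothetical capacity in bytes */

/* Hypothetical per-context event sink. */
typedef struct {
    const char *path;
    char data[EVENT_BUFFER_SIZE];
    size_t used;
} event_sink;

/* Write buffered bytes to disk. The file is opened only for the duration
 * of the write (an assumption), so an idle context holds no handle. */
static int sink_flush(event_sink *s)
{
    if (s->used == 0)
        return 0;
    FILE *f = fopen(s->path, "ab");
    if (!f)
        return -1;
    size_t written = fwrite(s->data, 1, s->used, f);
    int close_rc = fclose(f);
    if (written != s->used || close_rc != 0)
        return -1;
    s->used = 0;
    return 0;
}

/* Append one event, flushing first if it would not fit. */
static int sink_write(event_sink *s, const void *event, size_t size)
{
    assert(size <= EVENT_BUFFER_SIZE);
    if (s->used + size > EVENT_BUFFER_SIZE && sink_flush(s) != 0)
        return -1;
    memcpy(s->data + s->used, event, size);
    s->used += size;
    return 0;
}

/* Called when the context is destroyed: write out whatever is left. */
static int sink_close(event_sink *s)
{
    return sink_flush(s);
}
```

Because the file is opened in append mode, a stale file from an earlier run would be appended to; a fuller implementation would truncate it once when the context is created.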
* swr: [rasterizer archrast] fix double free issue | Tim Rowley | 2016-11-14 | 9 | -24/+41
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] separate frontend/backend stats enables | Tim Rowley | 2016-11-14 | 6 | -26/+51
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] 16-wide tile store nearly completed | Tim Rowley | 2016-11-14 | 5 | -314/+917
  - All format combinations coded
  - Fully emulated on AVX2 and AVX
  - Known issue: the MSAA sample locations need to be adjusted for 8x2
  Set ENABLE_AVX512_SIMD16 and USE_8x2_TILE_BACKEND to 1 in knobs.h to enable.
  Reviewed-by: Bruce Cherniak <[email protected]>
* vc4: Add simulator kernel validation for multithreaded fragment shaders. | Jonas Pfeil | 2016-11-12 | 3 | -5/+76
  This is Jonas Pfeil's code from the kernel, brought back to Mesa by anholt.
* vc4: Mark threaded FSes as non-singlethread in the CL. | Eric Anholt | 2016-11-12 | 3 | -1/+6
* vc4: Flag the last thread switch in the program as the last. | Eric Anholt | 2016-11-12 | 3 | -0/+34
  We don't allow the last thread switch to be inside control flow, to be sure that we hit the last state exactly once. If the last texturing was in control flow, fall back to single threaded.
* vc4: Add THRSW nodes after each tex sample setup in multithreaded mode. | Eric Anholt | 2016-11-12 | 2 | -0/+49
  This is a suboptimal implementation, but Jonas Pfeil found that it was still a massive performance gain.
* vc4: Add some spec citations about texture fifo management. | Eric Anholt | 2016-11-12 | 1 | -5/+37
* vc4: Use ra14/rb14 as the spilling registers. | Eric Anholt | 2016-11-12 | 2 | -8/+8
  This makes the raddr fixups compatible with FS threading.
* vc4: Add support for register allocation for threaded shaders. | Eric Anholt | 2016-11-12 | 3 | -20/+85
  We have two major requirements: make sure that only the bottom half of the physical reg space is used, and make sure that none of our values are live in an accumulator across a switch.
* vc4: Split register class setup for physical files from accumulators. | Eric Anholt | 2016-11-12 | 1 | -17/+19
* vc4: Use register allocator CLASS_BIT_R0_R3 to clean up CLASS_B. | Eric Anholt | 2016-11-12 | 1 | -4/+4
  We have had no reason to separate ability to store in an accumulator from ability to store in B, but with FS threading, we need to be able to force values to be stored only in the physical regfiles.
* vc4: Add support for QPU scheduling of thread switch instructions. | Eric Anholt | 2016-11-12 | 1 | -2/+27
  This is vaguely based off of Jonas Pfeil's thread switch support branch.
* vc4: Add a thread switch QIR instruction. | Eric Anholt | 2016-11-12 | 3 | -0/+18
  This will eventually be generated at the QIR level, so that vc4_qir_schedule.c can arrange the separation of tex_strb from tex_result correctly. It will also be important so that register allocation sets the register classes appropriately for values that are live across the switch.
* vc4: Add a bit of QPU validation for threaded shaders. | Eric Anholt | 2016-11-12 | 1 | -1/+102
  These are both bugs we've run into along the way writing multithreaded FS support.
* vc4: Fix register class handling of DDX/DDY arguments. | Eric Anholt | 2016-11-12 | 1 | -1/+1
  I had this exactly backwards, but apparently the piglit tests were all landing in r0-r3 anyway.
  Cc: "13.0" <[email protected]>
* freedreno/ir3: fixup ralloc fallout | Rob Clark | 2016-11-12 | 2 | -2/+2
  Fixes fallout from acc23b04 ("ralloc: remove memset from ralloc_size"). We were still depending on zero'd allocations in a couple of places.
  Signed-off-by: Rob Clark <[email protected]>
* clover: fix building since llvm r286566 | Laurent Carlier | 2016-11-11 | 1 | -0/+5
  Pretty trivial fix.
* nvc0: support MP performance counters on Maxwell | Samuel Pitoiset | 2016-11-10 | 3 | -3/+721
  This adds some performance counters/metrics for SM50/SM52.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Tested-by: Pierre Moreau <[email protected]>
* gallium: detect avx512 cpu features | Tim Rowley | 2016-11-10 | 2 | -0/+36
  v3: fix check for xmm/ymm test
  v2: style code, add avx512 to cpu dump
  Reviewed-by: Roland Scheidegger <[email protected]>
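AVX-512 detection generally needs two checks: the CPU must advertise AVX512F (CPUID leaf 7, subleaf 0, EBX bit 16), and the OS must have enabled the XMM, YMM, and opmask/ZMM register state in XCR0 (bits 1, 2, 5, 6, and 7), which is only readable when CPUID leaf 1 reports OSXSAVE (ECX bit 27). The v3 note mentions an xmm/ymm test; the corresponding XCR0 bits are 1 and 2. The sketch below takes raw register values as parameters so it can be tested on any machine; obtaining them requires the CPUID and XGETBV instructions. It is not gallium's util code, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX: OS uses XSAVE/XRSTOR, so XGETBV can read XCR0. */
#define CPUID1_ECX_OSXSAVE (1u << 27)
/* CPUID leaf 7 subleaf 0, EBX: AVX-512 Foundation. */
#define CPUID7_EBX_AVX512F (1u << 16)

/* XCR0 state components that must be OS-enabled. */
#define XCR0_SSE       (1u << 1) /* XMM registers */
#define XCR0_AVX       (1u << 2) /* upper halves of YMM registers */
#define XCR0_OPMASK    (1u << 5) /* k0-k7 */
#define XCR0_ZMM_HI256 (1u << 6) /* upper halves of ZMM0-15 */
#define XCR0_HI16_ZMM  (1u << 7) /* ZMM16-31 */

static bool has_avx512f(uint32_t cpuid1_ecx, uint32_t cpuid7_ebx, uint64_t xcr0)
{
    const uint64_t needed = XCR0_SSE | XCR0_AVX | XCR0_OPMASK |
                            XCR0_ZMM_HI256 | XCR0_HI16_ZMM; /* 0xE6 */

    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return false; /* XCR0 cannot be queried, so assume no OS support */
    if (!(cpuid7_ebx & CPUID7_EBX_AVX512F))
        return false; /* CPU lacks AVX-512F */
    return (xcr0 & needed) == needed;
}
```

Checking only the CPUID feature bit is not enough: an OS that does not save the ZMM state on context switches would leave those bits clear in XCR0, and AVX-512 code would then fault.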
* radeonsi: fix r600_texture::tc_compatible_htile | Marek Olšák | 2016-11-10 | 1 | -3/+3
  htile_size is now always non-zero if HTILE is allocated. It seems to have caused no issues.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: accept is_store in image_fetch_rsrc instead of dcc_off | Marek Olšák | 2016-11-10 | 1 | -4/+4
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't rely on tgsi_scan::images_buffers | Marek Olšák | 2016-11-10 | 1 | -8/+11
  The instruction knows the target.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: re-order cases in si_get_shader_param | Marek Olšák | 2016-11-10 | 1 | -28/+28
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: increase MAX_CONTROL_FLOW_DEPTH AKA MaxIfDepth | Marek Olšák | 2016-11-10 | 1 | -2/+1
  We don't want to lower deep IFs unconditionally.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix/silence unused variable warnings in optimized builds | Nicolai Hähnle | 2016-11-10 | 2 | -3/+3
  I'm leaving num_out_sgpr around since it's not in a fast path, and besides the compiler should be able to optimize it away easily. The alternative with #if/#endif would be extremely ugly.
  Reviewed-by: Marek Olšák <[email protected]>
* gallivm: fix [IU]MUL_HI regression harder | Nicolai Hähnle | 2016-11-10 | 1 | -8/+12
  The fix in commit 88f791db75e9f065bac8134e0937e1b76600aa36 was insufficient for radeonsi because the vector case was not handled properly. It seems piglit only covers the scalar case, unfortunately.
  Fixes GL45-CTS.shader_bitfield_operation.[iu]mulExtended.*
  Reviewed-by: Roland Scheidegger <[email protected]>
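For reference, [IU]MUL_HI yields the high 32 bits of the full 64-bit product, and a vector version must produce exactly that in every lane. The scalar C reference below (not gallivm's LLVM IR lowering) is the kind of per-lane oracle a vector implementation can be checked against; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned: high 32 bits of the 64-bit product. */
static uint32_t umul_hi(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Signed: widen before multiplying so the product cannot overflow.
 * Right-shifting a negative int64_t is implementation-defined in C;
 * GCC, Clang, and MSVC all shift arithmetically. */
static int32_t imul_hi(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * b;
    return (int32_t)(product >> 32);
}

/* Lane-by-lane application: the result a vector lowering must match. */
static void umul_hi_lanes(uint32_t *dst, const uint32_t *a,
                          const uint32_t *b, int lanes)
{
    for (int i = 0; i < lanes; i++)
        dst[i] = umul_hi(a[i], b[i]);
}
```

Comparing every lane, rather than only lane 0, is what catches the class of bug described in this commit, where scalar inputs passed but vectors did not.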
* swr: correct setting of independentAlphaBlendEnable | Ilia Mirkin | 2016-11-09 | 1 | -1/+6
  This setting is for whether color and alpha have different blend settings, not for whether blending is enabled on a per-RT basis.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
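The distinction drawn here can be expressed as a comparison of one render target's color and alpha blend settings. In the sketch below, the struct is a standalone stand-in whose field names mirror gallium's pipe_rt_blend_state; the function name is hypothetical, and how SWR actually derives the flag may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a per-render-target blend description. */
typedef struct {
    bool blend_enable; /* per-RT on/off switch: NOT what the flag means */
    int rgb_func, rgb_src_factor, rgb_dst_factor;
    int alpha_func, alpha_src_factor, alpha_dst_factor;
} rt_blend_state;

/* True when alpha is blended with different settings than color. */
static bool independent_alpha_blend(const rt_blend_state *rt)
{
    return rt->rgb_func != rt->alpha_func ||
           rt->rgb_src_factor != rt->alpha_src_factor ||
           rt->rgb_dst_factor != rt->alpha_dst_factor;
}
```

blend_enable deliberately does not feed into the result: whether a target blends at all is a separate question from whether its alpha channel follows the color equation.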