summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* swr/rast: Jit debug workGeorge Kyriazis2018-01-191-30/+81
| | | | | | Properly validate DLL matches OBJ for jitted function Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: silence generated file warningsGeorge Kyriazis2018-01-191-0/+1
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: jit shader lib debug workGeorge Kyriazis2018-01-192-0/+11
| | | | | | Create shader_lib during build, link with shaders at DLL generation time Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: AVX-512 changes to enable 16-wide VSGeorge Kyriazis2018-01-194-8/+29
| | | | | | | | | | | | | | Add a new define (USE_SIMD16_VS), to denote calling a 16-wide vertex shader. This is needed because the mesa driver can do 16-wide shaders, but rasty cannot yet, so we need to distinguish. Create a new VertexID entry (VertexID16) for the USE_SIMD16_VS case, since we need to format the vertex id in a way that is digestible by the 16-wide VS Disabled for now. To be enabled in a future checkin when driver work is complete. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: x86 autogenerated macro workGeorge Kyriazis2018-01-194-14/+15
| | | | | | | Add name argument to x86 autogenerated macros. Add useful variable names for DCL_inputVec implementation. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Shorten some filenamesGeorge Kyriazis2018-01-192-2/+2
| | | | | | in shader and fetch dump files Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: work supporting optimizations in Debug builds.George Kyriazis2018-01-192-9/+23
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add debugging type support for function types.George Kyriazis2018-01-192-0/+21
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Shader debugging workGeorge Kyriazis2018-01-191-0/+6
| | | | | | | - Move debug .ll files to JIT_CACHE_DIR - Don't link against jitter SRGBLut table, add global data to shader that needs it. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Debug Symbols workGeorge Kyriazis2018-01-194-19/+88
| | | | | | | Added support for Fetch / Sample / LD functions Added DLL link to JitCache implementation Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Initial work for debugging support.George Kyriazis2018-01-196-16/+191
| | | | | | | | | | Adds ability to step into jitted llvm IR in Visual Studio. - Updated llvm type generation script to also generate corresponding debug types. - New module pass inserts debug metadata into the IR for each function Disabled by default. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add private state parameter in fetcherGeorge Kyriazis2018-01-195-29/+41
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Added missing define for Linux/gccGeorge Kyriazis2018-01-191-0/+1
| | | | | | + ZeroMemory() macro definition for non win32-compilation in common/os.h Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Fix one more invalid object format for windows.George Kyriazis2018-01-191-1/+1
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: allow a single swr architecture to be builtinChuck Atkins2018-01-191-35/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | Part 2 of 2 (part 1 is autoconf changes, part 2 is C++ changes) When only a single SWR architecture is being used, this allows that architecture to be builtin rather than as a separate libswrARCH.so that gets loaded via dlopen. Since there are now several different code paths for each detected CPU architecture, the log output is also adjusted to convey where the backend is getting loaded from. This allows SWR to be used for static mesa builds which are still important for large HPC environments where shared libraries can impose unacceptable application startup times as hundreds of thousands of copies of the libs are loaded from a shared parallel filesystem. Based on an initial implementation by Tim Rowley. v2: Refactor repetitive preprocessor checks to reduce code duplication v3: Formatting changes per Bruce C. Also delay screen creation until end to avoid leaks when failure conditions are hit. Signed-off-by: Chuck Atkins <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]> CC: Tim Rowley <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: (autoconf) allow a single swr architecture to be builtinChuck Atkins2018-01-191-11/+39
| | | | | | | | | | | | | | | | | | | | | | | | Part 1 of 2 (part 1 is autoconf changes, part 2 is C++ changes) When only a single SWR architecture is being used, this allows that architecture to be builtin rather than as a separate libswrARCH.so that gets loaded via dlopen. Since there are now several different code paths for each detected CPU architecture, the log output is also adjusted to convey where the backend is getting loaded from. This allows SWR to be used for static mesa builds which are still important for large HPC environments where shared libraries can impose unacceptable application startup times as hundreds of thousands of copies of the libs are loaded from a shared parallel filesystem. Based on an initial implementation by Tim Rowley. v2: Fix comment placement pointed out by Bruce C. Signed-off-by: Chuck Atkins <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]> CC: Tim Rowley <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: fix clang 5 null cast warningGreg V2018-01-191-3/+3
| | | | | Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* r600: enable ARB_enhanced_layoutsDave Airlie2018-01-191-1/+1
| | | | | | | | | | | Only one piglit test fails, sso-vs-gs-fs-array-interleave There are 3 tests using ssbo without checking sizes failing also but those are test bugs. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* vc5: add missing files to the tarballEmil Velikov2018-01-181-0/+5
| | | | Signed-off-by: Emil Velikov <[email protected]>
* r600/sb: add lds related peepholes.Dave Airlie2018-01-181-1/+8
| | | | | | | | | if no destination: a) convert _RET instructions to non _RET variants if no dst b) set src0 to undefined if it's a READ, this should get DCE then. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: use different stacks for tracking lds and queue usage.Dave Airlie2018-01-182-3/+24
| | | | | | | | | | | | | | The normal ssa renumbering isn't sufficient for LDS queue access, this uses two stacks, one for the lds queue, and one for the lds r/w ordering. The LDS oq values are incremented in their use in a linear fashion. The LDS rw values are incremented in their definitions and used in the next lds operation to ensure reordering doesn't occur. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: schedule LDS ops in appropriate places.Dave Airlie2018-01-182-0/+7
| | | | | | | | | | So LDS ops have to be SLOT_X, and LDS OQ reads have read port restrictions so we try and force those into only having one per slot and avoiding bank swizzles. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: hit the scheduler with a big hammer to avoid lds splits.Dave Airlie2018-01-181-0/+3
| | | | | | | | | | | | | This tries to avoid an lds queue read getting scheduled separately from an lds ret read, the non-sb code uses the same style of hammer, this isn't foolproof. We can do better, but it's a bit tricky, as you have to scan ahead and either schedule more lds oq moves and more lds reads and that could lead to you running out of space anyways. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: adding lds oq tracking to the schedulerDave Airlie2018-01-182-3/+15
| | | | | | | | | | | This adds support for tracking the lds oq read/writes so can avoid scheduling other things in between. This patch just adds the tracking and assert to show problems. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add gcm support to avoid clause between lds read/queue readDave Airlie2018-01-182-2/+17
| | | | | | | | | | You have to schedule LDS_READ_RET _, x and MOV reg, LDS_OQ_A_POP in the same basic block/clause. This makes sure once we've issues and MOV we don't add another block until we balance it with an LDS read. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: handle lds special dest registers.Dave Airlie2018-01-182-2/+2
| | | | | | | This adds lds to the geom emit handling Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: handle LDS operations in folding.Dave Airlie2018-01-181-0/+11
| | | | | | | Don't try and fold LDS using expressions. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add finalising for lds output queue special values.Dave Airlie2018-01-181-0/+12
| | | | | | | We need to convert these to the hw special registers. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add initial support for parsing lds operations.Dave Airlie2018-01-181-2/+50
| | | | | | | This handles parsing the LDS ops and queue accessess. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: disable if conversion for hsDave Airlie2018-01-181-1/+1
| | | | | | | This fixes bad interactions with the LDS special values. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: lds ops have no dst register.Dave Airlie2018-01-181-1/+1
| | | | | | | Although these are op3s they don't have a dst reg. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: introduce special register values for lds support.Dave Airlie2018-01-183-1/+33
| | | | | | | | | | | | | For LDS read/write ordering we use the LDS_RW value, reads will wait on previous writes. For LDS read/read from LDS queue ordering we use the LDS_OQ values, we define two for now, though initially we'll just support OQA. Also add the check for the lds oq values Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: update last_cf if alu is the last clauseDave Airlie2018-01-181-0/+1
| | | | | | | | | | | It's rare to have a final alu clause on normal shaders (exports) but tess shaders write to LDS as their output, so we see some alu clauses, and the CF_END get put in the wrong place. This makes sure to update last_cf correctly. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: start adding GDS supportDave Airlie2018-01-1813-13/+123
| | | | | | | | | This adds support for GDS ops to sb backend. This seems to work for atomics and tess factor writes. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add tess/compute initial state registers.Dave Airlie2018-01-181-1/+4
| | | | | | | This stops them being optimised out. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: fix a bug emitting ar load from a constant.Dave Airlie2018-01-181-0/+3
| | | | | | | | | | | | Some tess shaders were doing MOVA_INT _, c0.x on cayman, and then hitting an assert in sb_bc_finalize.cpp:translate_kcache. This makes sure the toplevel kcache tracker gets updated, and the clause gets fixed up. Reviewed-by: Roland Scheidegger <[email protected]> Cc: <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/shader: only emit add instruction if param has a value.Dave Airlie2018-01-181-6/+8
| | | | | | | Just saves a pointless a = a + 0; Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: emit 0 gds_op for tf write.Dave Airlie2018-01-181-2/+3
| | | | | | | This field is ignored for tf writes so should be 0. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: add support for ARB_shader_clock.Dave Airlie2018-01-183-5/+29
| | | | | Reviewed-by: Gert Wollny <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* gallium: remove PIPE_CAP_USER_CONSTANT_BUFFERSMarek Olšák2018-01-1716-21/+0
| | | | | Reviewed-by: Roland Scheidegger <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* gallium: remove PIPE_CAP_TEXTURE_SHADOW_MAPMarek Olšák2018-01-1716-20/+0
| | | | | Reviewed-by: Roland Scheidegger <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* gallium: remove PIPE_CAP_TWO_SIDED_STENCILMarek Olšák2018-01-1716-20/+0
| | | | | Reviewed-by: Roland Scheidegger <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* svga: add num-commands-per-draw HUD queryBrian Paul2018-01-176-0/+24
| | | | | | | | | This query shows the ratio of total commands vs. drawing commands sent to the vgpu device. This gives some idea of how many state changes are sent per draw call. The closer the ratio is to 1.0, the better. Reviewed-by: Charmaine Lee <[email protected]> Reviewed-by: Neha Bhende <[email protected]>
* meson: add llvm dependency for swr buildGeorge Kyriazis2018-01-171-0/+1
| | | | Reviewed-by: Dylan Baker <[email protected]>
* radeonsi: bump glsl version to 450 for nir backendTimothy Arceri2018-01-181-6/+1
| | | | | | | | | | We still have more work to do but piglit results are looking pretty good. At GLSL 1.50 we have 30647/31118 piglit tests passing. At GLSL 4.50 we have 37927/38551 piglit tests passing. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/nir: add some missing tcs bits to the nir scan passTimothy Arceri2018-01-181-0/+14
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* ac/radeonsi: add tcs load outputs supportTimothy Arceri2018-01-182-16/+29
| | | | | | | | | The code to load outputs is essentially the same as load inputs so we make the interface more generic to maximise code sharing. We will make use of the new support in the following patch. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeon: remove unneeded semicolonsGrazvydas Ignotas2018-01-171-3/+3
| | | | | | Trivial. Found by Coccinelle. Reviewed-by: Eric Engestrom <[email protected]>
* ac: import lp_create_builder() from gallivmSamuel Pitoiset2018-01-161-4/+4
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeon/vcn: update quantiser matrices only when requestedIndrajit Das2018-01-161-6/+11
| | | | | | | Only update them when the pointers are valid. Signed-off-by: Indrajit Das <[email protected]> Reviewed-by: Christian König <[email protected]>