summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/swr
Commit message (Collapse)AuthorAgeFilesLines
* swr/rast: Fix invalid casting for calls to Interlocked* functionsTim Rowley2017-08-163-7/+7
| | | | | | CID: 1416243, 1416244, 1416255 CC: [email protected] Reviewed-by: Bruce Cherniak <[email protected]>
* gallium: introduce PIPE_CAP_MEMOBJTimothy Arceri2017-08-031-0/+1
| | | | | | | | | | | | | | This can be used to guard support for EXT_memory_object and related extensions. v2: update gallium docs v3 (Timothy Arceri): - add cap to nv50 Signed-off-by: Andres Rodriguez <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* swr/rast: fix core / knights split of AVX512 intrinsicsTim Rowley2017-08-024-55/+69
| | | | | | | | Move AVX512BW specific intrinics to be Core-only. Move some AVX512F intrinsics back to common implementation file. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: simplify knob default value setupTim Rowley2017-08-022-14/+11
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: split gen_knobs templates into .h/.cppTim Rowley2017-08-025-118/+166
| | | | | | | Switch to a 1:1 mapping template:generated for future maintenance. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: gen_knobs template code styleTim Rowley2017-08-021-2/+2
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: switch gen_knobs.cpp licenseTim Rowley2017-08-021-12/+17
| | | | | | | Unintentionally added with an apache2 license; relicense to match the rest of the tree. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: fix scons gen_knobs.h dependencyTim Rowley2017-08-021-1/+1
| | | | | | | | Copy/paste error was duplicating a gen_knobs.cpp rule. Fixes: 5079c277b57 ("swr: [scons] Fix windows build") Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: constify swr rasterizerTim Rowley2017-08-0218-323/+339
| | | | | | Add "const" as appropriate in method/function signatures. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: SIMD16 shaders - widen fetch and vertex shadersTim Rowley2017-08-026-5/+238
| | | | | | Work in progress, disabled by default. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: vmask() implementations for KNLTim Rowley2017-08-021-0/+14
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: rename frontend pVertexStoreTim Rowley2017-08-021-6/+9
| | | | | | Rename to reflect global nature. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: fix movemask_ps / movemask_pd on AVX512Tim Rowley2017-08-021-2/+7
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: stop using MSFT types in platform independent codeTim Rowley2017-08-0214-31/+35
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: enable USE_SIMD16_FRONTEND by defaultTim Rowley2017-08-021-1/+1
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: disable AVX512 optimization of SSE / AVX codeTim Rowley2017-08-021-0/+4
| | | | | | | | | | Disable an optimization which implemented sse/avx operations on avx512 using avx512 intrinsics (to avoid switching between lane widths). Compile with SIMD_OPT_128_AVX512 / SIMD_OPT_256_AVX512 defined to enable these optimizations. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: fix USE_SIMD16_FRONTEND issuesTim Rowley2017-08-0214-74/+49
| | | | | | | Fix problems found when enabling USE_SIMD16_FRONTEND, mostly related to vMask / movemask_ps(pd). Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: simdlib better separation of core vs knights avx512Tim Rowley2017-08-0215-245/+911
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: threadID via portable std::this_thread::get_id()Tim Rowley2017-08-021-9/+11
| | | | | | | Replace use of Win32 GetCurrentThreadId() with portable std::this_thread::get_id(). Reviewed-by: Bruce Cherniak <[email protected]>
* gallium: add PIPE_QUERY_SO_OVERFLOW_ANY_PREDICATE and corresponding capNicolai Hähnle2017-08-021-0/+1
| | | | | | | | v2: rename cap to PIPE_CAP_QUERY_SO_OVERFLOW and be a bit more explicit in the documentation Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_CAP_NIR_SAMPLERS_AS_DEREFNicolai Hähnle2017-07-311-0/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* swr: fix transform feedback logicGeorge Kyriazis2017-07-274-8/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The shader that is used to copy vertex data out of the vs/gs shaders to the user-specified buffer (streamout or SO shader) was not using the correct offsets. Adjust the offsets that are used just for the SO shader: - Make sure that position is handled in the same special way as in the vs/gs shaders - Use the correct offset to be passed in the core - consolidate register slot mapping logic into one function, since it's been calculated in 2 different places (one for calcuating the slot mask, and one for the register offsets themselves Also make room for all attibutes in the backend vertex area. Fixes: - all vtk GL2PS tests - 18 piglit tests (16 ext_transform_feedback tests, arb-quads-follow-provoking-vertex and primitive-type gl_points v2: - take care of more SGV slots in slot mapping logic - trim feState.vsVertexSize - fix GS interface and incorporate GS while calculating vsVertexSize Note that vsVertexSize is used in the core as the one parameter that controls vertex size between all stages, so it has to be adjusted appropriately for the whole vs/gs/fs pipeline. Also note that GS and SO is not fully implemented. This will be addressed later. fixes: - fixes total of 20 piglit tests CC: 17.2 <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: non-regex knob fallback code for gcc < 4.9Tim Rowley2017-07-271-0/+21
| | | | | | | | gcc prior to 4.9 didn't implement <regex>, causing a startup crash in the swr knob parameter reading code. CC: <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: use the correct variable for no undefined symbolsEmil Velikov2017-07-241-1/+1
| | | | | | | | | | | | | | | | | | | | | The variable name was missing a leading LD_, which resulted in a missing check for unresolved symbols in the backend binaries. With the link addressed with earlier patches, we can correct the typo. Thanks to Laurent for the help spotting this. v2: Split from a larger patch. Cc: [email protected] Cc: Bruce Cherniak <[email protected]> Cc: Tim Rowley <[email protected]> Cc: Laurent Carlier <[email protected]> Fixes: 9475251145174882b532 "swr: standardize linkage and check for unresolved symbols" Reviewed-by: Eric Engestrom <[email protected]> Reported-by: Laurent Carlier <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* swr: don't forget to link KNL/SKX against pthreadsEmil Velikov2017-07-241-0/+8
| | | | | | | | | | Analogous to previous commit but for the KNL/SKX backends. Cc: Bruce Cherniak <[email protected]> Cc: Tim Rowley <[email protected]> Cc: Laurent Carlier <[email protected]> Fixes: 1cb5a6061ce ("configure/swr: add KNL and SKX architecture targets") Signed-off-by: Emil Velikov <[email protected]>
* swr: don't forget to link AVX/AVX2 against pthreadsEmil Velikov2017-07-241-0/+8
| | | | | | | | | | | | | | | | | | Seems like the backends have been using pthreads since day one, yet we've been missing the link. With later commit we'll fix a typo, hence the libraries will be build with -Wl,no-undefined, aka failing the build on unresolved symbols. v2: Split from a larger patch. Cc: [email protected] Cc: Bruce Cherniak <[email protected]> Cc: Tim Rowley <[email protected]> Cc: Laurent Carlier <[email protected]> Fixes: c6e67f5a9373e916a8d2 "gallium/swr: add OpenSWR rasterizer" Reviewed-by: Eric Engestrom <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* swr/rast: quit using linux-specific gettid()Tim Rowley2017-07-212-4/+3
| | | | | | | | | | | | | Linux-specific gettid() syscall shouldn't be used in portable code. Fix does assume a 1:1 thread:LWP architecture, but works for our current target platforms and can be revisited later if needed. Fixes unresolved symbol in linux scons builds. v2: add comment in code about the 1:1 assumption. Cc: [email protected] Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: fix memory paths for avx512 optimized avx/sseTim Rowley2017-07-212-10/+10
| | | | | | | Source/destination will not be AVX512 aligned, use the unaligned load/store intrinsics. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: cache line align hottile buffersTim Rowley2017-07-211-3/+3
| | | | | | Prevents unalignment crashes with avx512 code on gcc/clang. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: simdlib changes for clang/gccTim Rowley2017-07-212-10/+35
| | | | | | Tested with clang-4.0 and gcc-6.3. Reviewed-by: Bruce Cherniak <[email protected]>
* configure/swr: add KNL and SKX architecture targetsTim Rowley2017-07-192-0/+58
| | | | | | | | | | Not built by default. Currently only builds with icc. v2: * document knl,skx possibilities for swr_archs * merge with changed loader lib selection code Reviewed-by: Emil Velikov <[email protected]>
* configure/swr: configurable swr architecturesTim Rowley2017-07-193-8/+40
| | | | | | | | | | | | | | | | Allow configuration of the SWR architecture depend libraries we build for with --with-swr-archs. Maintains current behavior by defaulting to avx,avx2. Scons changes made to make it still build and work, but without the changes for configuring which architectures. v2: * add missing comma for swr_archs default * check that at least one architecture is enabled * modify loader logic to make it clearer how to add archs Reviewed-by: Emil Velikov <[email protected]>
* swr: remove unneeded fallback strcasecmp defineEmil Velikov2017-07-191-5/+0
| | | | | | | | | The last user of the function was removed with earlier commit. Fixes: 50842e8a931 ("swr: replace gallium->swr format enum conversion") Cc: Tim Rowley <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: JitManager runtime determination of architectureTim Rowley2017-07-141-1/+2
| | | | | | | Fixes performance regression from f50aa21456d - was forcing internal code generation to target AVX (no gather, etc). Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Fix use of KNL-only intrinsics in SKX buildTim Rowley2017-07-133-6/+6
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Fix build warnings when using the Intel compilerTim Rowley2017-07-131-1/+1
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: SIMD16 Frontend - Fix USE_SIMD16_FRONTEND buildTim Rowley2017-07-134-12/+25
| | | | | | | Previous check-ins without testing with USE_SIMD16_FRONTEND have introduced regressions. This fixes the build, not the regressions. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Removing unneeded MSVC warning pragmaTim Rowley2017-07-131-3/+0
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add support for read-only render targetsTim Rowley2017-07-132-4/+10
| | | | | | | Core will ensure hot tiles are loaded for read and write render targets, and will skip all output merger for read-only render targets. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Support render target mask instead of render target countTim Rowley2017-07-137-49/+85
| | | | | | WIP to support read-only render targets. Reviewed-by: Bruce Cherniak <[email protected]>
* swr: Add path to draw directly from client memory without copy.Bruce Cherniak2017-07-125-11/+51
| | | | | | | | | | | | | | | | If size of client memory copy is too large, don't copy. The draw will access user-buffer directly and then block. This is faster and more efficient than queuing many large client draws. Applications that still use large client arrays benefit from this. VMD is an example. The threshold for this path defaults to 32KB. This value can be overridden by setting environment variable SWR_CLIENT_COPY_LIMIT. v2: Use #define for default value, rather than hard-coded constant. Reviewed-by: Tim Rowley <[email protected]>
* swr: Move environment config options into separate function.Bruce Cherniak2017-07-121-26/+34
| | | | | | | | Moved reading of environment config options out of swr_create_screen_internal, into a separate swr_validate_env_options. This is to keep from cluttering create_screen. Reviewed-by: Tim Rowley <[email protected]>
* swr: Remove hard-coded constant and "todo" comment.Bruce Cherniak2017-07-121-1/+2
| | | | | | | | Removed the hard-coded constant in favor of a #define. Also removed TODO comment. The constant value doesn't need an environment configurable option. Reviewed-by: Tim Rowley <[email protected]>
* swr: build driver proper separate from rasterizerTim Rowley2017-07-115-39/+36
| | | | | | | | | | | | | | | | | | swr used to build and link the rasterizer to the driver, and to support multiple architectures we needed to have multiple versions of the driver/rasterizer combination, which needed to link in much of mesa. Changing to having one instance of the driver and just building architecture specific versions of the rasterizer gives a large reduction in disk space. libGL.so 6464 Kb -> 7000 Kb libswrAVX.so 10068 Kb -> 5432 Kb libswrAVX2.so 9828 Kb -> 5200 Kb Total 26360 Kb -> 17632 Kb Reviewed-by: Emil Velikov <[email protected]>
* swr: switch to using SwrGetInterface api tableTim Rowley2017-07-1110-65/+72
| | | | | | | | | Use the SWR rasterizer API through the table returned from SwrGetInterface rather than referencing the functions directly. This will allow us to move to a model of having the driver dynamically load the appropriate swr architecture library. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: make SWR_VISIBLE attribute work for windowsGeorge Kyriazis2017-07-111-1/+1
| | | | | | Needed to expose SwrGetInterface Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Correctly allocate SWR_STATS memory as cacheline alignedTim Rowley2017-07-062-5/+5
| | | | | | | | | | | | Cacheline alignment of SWR_STATS to prevent sharing of cachelines between threads (performance). Gets rid of gcc-7.1 warning about using c++17's over-aligned new feature. Cc: [email protected] Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: remove unused variablesTim Rowley2017-07-062-4/+0
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: don't use _mm256_fmsub_ps in AVX codeTim Rowley2017-07-061-1/+5
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: _mm*_undefined_* implementations for gcc<4.9Tim Rowley2017-07-061-0/+6
| | | | | | | | | Define these in terms of setzero for ancient gcc versions which don't have the undefined intrinsics. Cc: [email protected] Reviewed-by: Bruce Cherniak <[email protected]>