| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Fix routines for 8-bit and 16-bit formats used by optimized tile store.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixed Transpose_16 methods of following formats:
Transpose8_8_8_8
Transpose8_8
Transpose32_32
Transpose16_16_16_16
Transpose16_16_16
Transpose16_16
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99214
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
v2: use fmul(1/65536) instead of fdiv(65535)
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
* All format combinations coded
* Fully emulated on AVX2 and AVX
* Known issue: the MSAA sample locations need to be adjusted for 8x2
Set ENABLE_AVX512_SIMD16 and USD_8x2_TILE_BACKEND to 1 in knobs.h to enable
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Used in abandoned all-or-nothing approach to converting to AVX512
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Work in progress (disabled).
USE_8x2_TILE_BACKEND define in knobs.h enables AVX512 code paths
(emulated on non-AVX512 HW).
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes: b3bd8bb611bb465d2e5e ("swr: [rasterizer core] add support
for "RAW" surface format")
CovID: 1373647
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Align and use streaming store instructions for BE fifo queues.
Provides slightly faster enqueue and doesn't pollute the caches.
Add appropriate memory fences to ensure streaming writes are
globally visible.
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
| |
- Fix conflict between windows MemoryFence and llvm::sys::MemoryFence
- Declare gettid()
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Refactoring to leave existing simd_* intrinsics in "simdintrin.h" unchanged,
adding corresponding simd16_* intrinsics in "simd16intrin.h" on the side,
with emulation, that we can use piecemeal, rather than the all-or-nothing
approach to bring up avx512.
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
| |
Enabling KNOB_SIMD_WIDTH = 16 for AVX512 pre-work and low level simd utils
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
| |
Currently, most code paths between AVX2 and AVX512 are identical
(see changes to knobs.h).
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix build error with icc.
CXX libswrAVX_la-swr_clear.lo
icpc: command line warning #10006: ignoring unknown option '-Wdelete-non-virtual-dtor'
In file included from ./rasterizer/jitter/jit_api.h(31),
from swr_context.h(30),
from swr_clear.cpp(24):
./rasterizer/common/os.h(135): error: expected an identifier
void _mm256_storeu2_m128i(__m128i *hi, __m128i *lo, __m256i a)
^
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Don't clear bucket descriptions to fix issues with sim level
buckets getting out of sync.
2. Close out threadviz file descriptors in ClearThreads().
3. Skip buckets for jitter based buckets when multithreaded. We need
thread local storage through llvm jit functions to be fixed before
we can enable this.
4. Fix buckets StopCapture to correctly detect capture complete.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Output with slashes instead of backslashes for unix/linux.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Fix static code analysis errors found by coverity on Linux
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
Need to do lazy eval of the threadviz knob since order of globals
is undefined.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
| |
BackendPixelRate should be easier to read/maintain now hopefully.
Small perf bump by moving some of the pfn's to inline functions
without template params.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
v2: use _mm_cmpunord_ps for vIsNaN
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Acked-by: Brian Paul <[email protected]>
|
|
|
|
| |
Acked-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
Reduce list traversal during Alloc and Free.
Add ability to have multiple lists based on alloc size (not used for now)
|
| |
|