| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
Small fetch performance optimization - use gather instruction
for odd format fetch instead of slow emulated code.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
Misplaced #endif preventing depth and stencil hot tile pointers
from incrementing in SIMD16 8x2 configuration of BackendPixelRate.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Frontend - reduce simdvertex/simd16vertex stack usage for VS output in
ProcessDraw, fixes stack overflow in some of the deeper call stacks under
SIMD16.
1. Move the vertex store out of PA_FACTORY, and off the stack
2. Allocate the vertex store out of the aligned heap (pointer is
temporarily stored in TLS, but will be migrated to thread pool
along with other frontend temporary buffers).
3. Grow the vertex store as necessary for the number of verts per
primitive, in chunks of 8/4 simdvertex/simd16vertex
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Fixes gcc error for SIMD16 FE.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
| |
Disabling buffer overrun warning for Assemble(uint32_t slot,
simdvector *verts) due to what looks like a MSVC compiler bug
when compiling the SIMD16 FE.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Fixes MSVC errors with SIMD16 FE.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Not used yet.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Ability to allocate space for an arbitrary number (at compile time)
of positions in the vertex layout.
Removes KNOB_NUM_ATTRIBUTES from knobs.h, replaces the VTX slot
number #defines with the SWR_VTX_SLOTS enum (which contains
replacement for NUM_ATTRIBUTES: SWR_VTX_NUM_SLOTS)
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
We already have BRW_NEW_BATCH, which completely covers all the cases
that BRW_NEW_CONTEXT would handle. Drop it.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
There's no reason for this as far as I can tell.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen4-5 and Gen8+ already set this, but Gen6-7.5 did not. We ought to
be consistent - the answer depends on the API, not the hardware generation.
The Sandybridge PRM says about RASTRULE_UPPER_RIGHT:
"To match OpenGL point rasterization rules (round to +infinity, where
this is the upper right direction wrt OpenGL screen origin of lower
left).
So this is likely the one we should use.
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
| |
We set this unconditionally on every other platform. Zero (Manhattan)
isn't even listed as an option in the Sandybridge docs - only "true".
Reviewed-by: Plamena Manolova <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original Broadwater and Crestline platforms computed antialiased
line distances using "manhattan" distance, aka a + b = c. Eaglelake
and Cantiga added "true" distance, which apparently does something
like max(a, b) + min(a, b) / 4. Not exactly "true", but at least
more accurate.
The G45 documentation indicates that the old manhattan distance setting
is "only for debug purposes" and should never be used. The Ironlake
documentation no longer mentions AALINEDISTANCE_MANHATTAN, though it
does still contain the narrative about the feature.
At any rate, we should use the more accurate mode.
Reviewed-by: Plamena Manolova <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
| |
Signed-off-by: Andres Gomez <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Andres Gomez <[email protected]>
(cherry picked from commit 6cb65ce2d3689ae7f692f8cf08559109037dd74e)
|
|
|
|
|
| |
Signed-off-by: Andres Gomez <[email protected]>
(cherry picked from commit 61b134a862ecc1877bbe2f2c14e493b5fb607e04)
|
|
|
|
|
|
|
|
|
| |
Basically, don't load GRID_SIZE or BLOCK_SIZE if they are unused, determine
whether to load BLOCK_ID for each component separately, and set the number
of THREAD_ID VGPRs to load. Now we should get the maximum CS launch wave
rate in most cases.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
v2: just do indexing with swizzle[i]
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
LLVM 5.0 removes s_barrier instructions if the max-work-group-size
attribute is not set. What a surprise.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
We need 4 more bits there. I don't know what is fixed by this.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes s_load_dword latency for tess rings.
We need just 1 SGPR for the address if we use 64K alignment. The final asm
for recreating the descriptor is:
// s2 is (address >> 16)
s_mov_b32 s3, 0
s_lshl_b64 s[4:5], s[2:3], 16
s_mov_b32 s6, -1
s_mov_b32 s7, 0x27fac
v2: bitcast the descriptor type from v2i64 to v4i32
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The use of PrimID in the pixel shader is too rare to deserve such
a sizable support code.
The initial idea of the VS epilog was to move the clipping code there and
remove it based on states, but optimized variants are now used to do that
and are easier to support, so the VS epilog has turned out to be not so
useful.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
VGPR1 = InstanceID / StepRate0; // StepRate0 can be set to 1
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
Tentatively enable it, expecting the scratch buffer support to be done before
the next Mesa release.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
The 2nd shader of merged shaders should take a reference of the 1st shader.
The next commit will do that.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
not implemented yet
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
In addition to the non-monolithic variant.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|