| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
(cherry picked from commit 088d3501786a2ff0833de45951b63acbe6560a0f)
|
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
(cherry picked from commit 52bd154980e306b8bc9b9d2edc0e728a9f8f3bf6)
|
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
(cherry picked from commit 9f1149876f2d010c871751a53d02d4d2b6aef1fe)
|
|
|
|
|
|
|
|
| |
Also fixes two sided lighting which was broken at least
on pre-evergreen by commit b1eb00.
Signed-off-by: Glenn Kennard <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is the last use tgsi_parse_token in radeonsi.
It looks ugly because the code was re-indented, but there is really no change
in behavior.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
| |
It's redundant now.
It led to a simplification in si_llvm_emit_streamout, because outidx == reg.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
That code was really ugly.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
| |
They were reinventing tgsi_shader_info. They are unused now.
radeon_llvm_context::load_input can be NULL if input fetching is implemented
in some other way.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Those are going away.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
tgsi_shader_info contains everything we need.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Both cases are equivalent.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Written CLIPDIST outputs are simply disabled in PA_CL_VS_OUT_CNTL.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
No code in Mesa sets the usage mask to any other value.
The final mask is AND'ed with enable bits from the rasterizer state anyway.
If somebody implements setting usage masks in st/mesa, we can use
tgsi_shader_info to get it more easily.
This is a prerequisite for the following commit.
Reviewed-by: Michel Dänzer <[email protected]>
|
| |
|
|
|
|
|
|
|
| |
[ Francisco Jerez: Split off from a larger patch, and take a slightly
different approach for passing the implicit arguments around. ]
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
| |
arguments.
Reviewed-by: Jan Vesely <[email protected]>
|
|
|
|
|
|
| |
passing.
Reviewed-by: Jan Vesely <[email protected]>
|
|
|
|
| |
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Vinson Lee <[email protected]>
Acked-by: Brian Paul <[email protected]>
|
|
|
|
| |
Signed-off-by: Chia-I Wu <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our current atan()-approximation is pretty inaccurate at 1.0, so
let's try to improve the situation by doing a direct approximation
without going through atan.
This new implementation uses an 11th degree polynomial to approximate
atan in the [-1..1] range, and the following identitiy to reduce the
entire range to [-1..1]:
atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x)
This range-reduction idea is taken from the paper "Fast computation
of Arctangent Functions for Embedded Applications: A Comparative
Analysis" (Ukil et al. 2011).
The polynomial that approximates atan(x) is:
x * 0.9999793128310355 - x^3 * 0.3326756418091246 +
x^5 * 0.1938924977115610 - x^7 * 0.1173503194786851 +
x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444
This polynomial was found with the following GNU Octave script:
x = linspace(0, 1);
y = atan(x);
n = [1, 3, 5, 7, 9, 11];
format long;
polyfitc(x, y, n)
The polyfitc function is not built-in, but too long to include here.
It can be downloaded from the following URL:
http://www.mathworks.com/matlabcentral/fileexchange/47851-constraint-polynomial-fit/content/polyfitc.m
This fixes the following piglit test:
shaders/glsl-const-folding-01
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
| |
Improves simulated norast performance on a little benchmark by 13.4012%
+/- 2.08459% (n=13).
|
|
|
|
|
| |
Improves simulated norast performance on a little benchmark by 38.0965%
+/- 3.27534% (n=11).
|
|
|
|
|
| |
I was trying to skip state updates when !dirty, and suspiciously
everything was always dirty.
|
|
|
|
| |
Cleans up some output to be more obvious in a piglit test I'm looking at.
|
|
|
|
|
| |
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Juha-Pekka Heikkila <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When mapping the buffer a second time, we need to use the new pointer,
not the one from the previous mapping. Otherwise, we will most likely
crash.
Apparently, we've just been getting lucky and getting the same
bo->virtual pointer in both cases. libdrm probably has a hand in that.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Cc: [email protected]
|
| |
|
|
|
|
|
| |
This was being generated frequently by matrix multiplies of 2 and
3-channel vertex attributes (which have the 0 or 1 loaded in the shader).
|
|
|
|
| |
This will be used more in the next commits.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
Copy propagating these might result in reading the r4 after some other
instruction has written r4. Just prevent all copy propagation of this for
now.
Fixes bad rendering with upcoming indirect register access support, where
the copy propagation was consistently happening across another read.
|
|
|
|
|
|
|
|
|
|
|
| |
Merging VS and CS into the same struct wasn't winning us anything except
for not allocating a separate BO (but if we want to pack programs into
BOs, we should pack not just those 2 programs together). What it was
getting us was a bunch of code duplication about hash table lookups and
propagating vc4_compile contents into a vc4_compiled_shader.
I was about to make the situation worse with indirect uniform buffer
access.
|
|
|
|
|
|
| |
I wanted to make another set of texture uploads for handling reladdr
constants, and duplicating all the bitshifting looked like a terrible
idea. In the process, this fixes a swap of the s/t texture wrap modes.
|
|
|
|
|
|
|
|
|
|
| |
Under the simulator, reading registers before writing them triggers an
assertion failure. c->undef gets treated as r0, which will usually be
written, but not if it's used in the first instruction. We should
definitely not be aborting in this case, and return some sort of undefined
value instead.
Fixes glsl-user-varying-ff.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The border color is only needed when using the GL_CLAMP_TO_BORDER or
(deprecated) GL_CLAMP wrap modes; all others ignore it, including the
common GL_CLAMP_TO_EDGE and GL_REPEAT wrap modes.
In those cases, we can skip uploading it entirely, saving a bit of space
in the batchbuffer. Instead, we just point it at the start of the
batch (offset 0); we have to program something, and that address is safe
to read.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Write-back caching cannot be used for buffers being scanned out by the
display engine; surfaces used for scan-out must be write-through or
uncached. I originally chose WT for render targets because it works in
all cases. However, we really want to use write-back caching where
possible, as it is more efficient.
Most renderbuffers are not used for scanout - off-screen FBOs certainly
are fine, and non-pageflipped backbuffers should be fine as well. So
in most cases WB will work. However, we don't know what will be used
for scan-out, so we instead simply use the PTE value specified by the
kernel, as it knows these things.
This matches our MOCS choice on Haswell.
Fixes performance regressions since commit ee4484be3dc827cf15bcf109f5
in a microbenchmark (spotted by Eero Tamminen). Improves performance
in GLBenchmark 2.7/EgyptHD by 7.44362% +/- 0.496939% (n=55) on a
Broadwell GT2. Improves performance in a bunch of other microbenchmarks
by ~15% or so.
Signed-off-by: Kenneth Graunke <[email protected]>
Reported-by: Eero Tamminen <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Like BDW_MOCS_WB and BDW_MOCS_WT, this specifies that we want to use all
three caches (L3, LLC, and eLLC where available), but leaves the LLC
caching mode up to the kernel's page table entry.
This allows the kernel to pick WB/WT/UC based on whether it's using a
buffer for scanout.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
| |
These days, most driver debug output happens via stderr, not stdout.
Some applications (such as Xephyr) also appear to close stdout which
makes these messages go nowhere.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Putting those in VRAM can cause long pauses due to buffers being moved
into / out of VRAM.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84662
Cc: [email protected]
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
| |
This is a common production of st_glsl_to_tgsi, because CMP takes a float
argument.
|
|
|
|
|
| |
This is a common production of st_glsl_to_tgsi, which uses negate flags on
source arguments to handle subtraction.
|