| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 3DSTATE_DEPTH_BUFFER, we set Width and Height to the miptree slice's
physical dimensions. (Logical and physical dimensions may differ for
multisample surfaces).
However, in SURFACE_STATE, we always set Width and Height to the slice's
logical dimensions. We should do the same for 3DSTATE_DEPTH_BUFFER,
because the hw docs say so.
No Piglit regressions (-x glx -x glean) on Ivybridge with Wayland.
v2: No Piglit regressions, for real this time.
Reviewed-by: Jordan Justen <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
The registers in the architecture register file don't share much in
common, so there's no point in grouping them together. Use the HW_REG
class instead. The vec4 backend already does this.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
accumulator.
Accidentally pushed an old version of the patch.
v2: Set destination register using brw_null_reg().
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
v2: Make accumulator's type match the type of the operation. Noticed by
Ken.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CSE would otherwise combine the two mul(8) emitted by [iu]mulExtended:
mul(8) acc0 x y
mach(8) null x y
mov(8) lsb acc0
...
mul(8) acc0 x y
mach(8) msb x y
Into:
mul(8) temp x y
mov(8) acc0 temp
mach(8) null x y
mov(8) lsb acc0
...
mov(8) acc0 temp
mach(8) msb x y
But mul(8) into the accumulator produces more than 32-bits of precision,
which is required and lost if multiplying into a general register and
moving to the accumulator.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
These built-ins have two "out" parameters, which makes implementing them
efficiently with our current compiler infrastructure difficult. Instead,
implement them in terms of the existing ir_binop_mul IR (to return the
low 32-bits) and a new ir_binop_mul64 which returns the high 32-bits.
v2: Rename mul64 -> imul_high as suggested by Ken.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Using the ADDC and SUBB instructions on Gen7.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Calculates the carry out of the addition of two values and the
borrow from subtraction respectively. Will be used in uaddCarry() and
usubBorrow() built-in implementations.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
| |
This fixes issues where get_rt_format would see a 0 format because the
nouveau_surface had not been properly initialized. Fixes crash on
supertuxkart startup (which still fails due to out-of-vram issues).
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
As of ARB_gpu_shader5, textureGather doesn't always read the
post-swizzle RED channel -- so we can't just look at the red swizzle
state.
Theoretically we could only flag the quirk if *some* green swizzle is in
use, but that's probably more trouble than it's worth.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
- For HSW: Select the channel based on the component selected (swizzle
is done in HW)
- For IVB: Select the channel based on the swizzle state for the
component selected. Only apply the RG32F w/a if we actually want
green -- we're about to flag it regardless of swizzle state.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
- For HSW: Select the channel based on the component selected (swizzle
is done in HW)
- For IVB: Select the channel based on the swizzle state for the
component selected. Only apply the RG32F w/a if we actually want
green -- we're about to flag it regardless of swizzle state.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Theoretically would work on Gen5 as well but requires GLSL 1.30, which
is not (yet) enabled by default there.
V2: Enable for Gen5 conditionally on GLSL version.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Previously we special-cased textureSize() but this is the more correct
condition.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We return 0 for GL_NUM_SHADER_BINARY_FORMATS, so
GL_SHADER_BINARY_FORMATS should not write any data to the application
buffer.
Fixes piglit test 'arb_get_program_binary-overrun shader'.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we computed dFdy() using the following instruction:
add(8) dst<1>F src<4,4,0)F -src.2<4,4,0>F { align1 1Q }
That had the disadvantage that it computed the same value for all 4
pixels of a 2x2 subspan, which meant that it was less accurate than
dFdx(). This patch changes it to the following instruction when
c->key.high_quality_derivatives is set:
add(8) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1Q }
This gives it comparable accuracy to dFdx().
Unfortunately, align16 instructions can't be compressed, so in SIMD16
shaders, instead of emitting this instruction:
add(16) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1H }
We need to unroll to two instructions:
add(8) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1Q }
add(8) (dst+1)<1>F (src+1)<4,4,1>.xyxyF -(src+1)<4,4,1>.zwzwF { align16 2Q }
Fixes piglit test spec/glsl-1.10/execution/fs-dfdy-accuracy.
Acked-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Clean up inconsistency in enum decoration:
- Use the undecorated enums where possible.
- MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB remains decorated, since it
has no undecorated equivalent in GL4.
Signed-off-by: Chris Forbes <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70054
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
The new surface channel select bits allow us to avoid having to
recompile the shader for this workaround.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-and-tested-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows us to use a different surface format for gather4, which is
required for R32G32_FLOAT to work on Gen7.
V4: - Only emit alternate surface state for shaders which will actually
use it.
- Pass a simple 'for_gather' flag rather than a function pointer.
The callee can decide what w/a to apply.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Worst-case is that *every* texunit uses a format that needs overriding.
V4: Place the gather slots last, so shaders which don't use gather don't
get penalized by having a huge binding table.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
gather4 GREEN channel against a surface with format R32G32_FLOAT doesn't work
correctly on IVB. w/a from bspec:
- use R32G32_FLOAT_LD = 0x97 instead, for gather4 only.
- select BLUE channel to read GREEN
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
V4: Only flag quirks if there are any uses of gather in the shader,
to avoid spurious recompiles just because someone happened to use
RG32F.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Pretty much the same as the FS case. Channel select goes in the header,
V2: Less mangling.
V3: Avoid sampling at all, for degenerate swizzles.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Lowers ir_tg4 (from textureGather and textureGatherOffset builtins) to
SHADER_OPCODE_TG4.
The usual post-sampling swizzle workaround can't work for ir_tg4,
so avoid doing that:
* For R/G/B/A swizzles use the hardware channel select (lives in the
same dword in the header as the texel offset), and then don't do
anything afterward in the shader.
* For 0/1 swizzles blast the appropriate constant over all the output
channels instead of sampling.
V2: Avoid duplicating header enabling block
V3: Avoid sampling at all, for degenerate swizzles.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Adds the Gen7 message IDs, a new SHADER_OPCODE_TG4 pseudo-op, and
low-level support for emitting it via generate_tex().
V3: Updated for changes in master.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
V2 [Chris Forbes]:
- Add new pattern, fixup parameter reading.
V3: Rebase onto new builtins machinery
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When used with a cube array in VS, failed assertion in ir_validate:
Assignment count of LHS write mask channels enabled not
matching RHS vector size (3 LHS, 4 RHS).
To fix this, swizzle the RHS correctly for the writemask.
This showed up in the ARB_texture_gather tests, which exercise cube
arrays in the VS.
Signed-off-by: Chris Forbes <[email protected]>
Cc: "9.2" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider only the top-left and top-right pixels to approximate DDX in a 2x2
subspan, unless the application requests a more accurate approximation via
GL_FRAGMENT_SHADER_DERIVATIVE_HINT or this optimization is disabled from the
new driconf option disable_derivative_optimization.
This results in a less accurate approximation. However, it improves the
performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at 95.0%
confidence) on Haswell. No noticeable image quality difference observed.
The improvement comes from faster sample_d. It seems, on Haswell, some
optimizations are introduced to allow faster sample_d when all pixels in a
subspan have the same derivative. I considered SAMPLE_STATE too, which allows
one to control the quality of sample_d on Haswell. But it gave much worse
image quality without giving better performance comparing to this change.
No piglit quick.tests regression on Haswell (tested with v1).
v2: better guess for precompile program key
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
| |
We have the destination framebuffer object passed in; there's no need to
go digging around in the context.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
All member variables of glsl_to_tgsi_instruction are already being
initialized from its implicitly defined constructor, it's not
necessary to use rzalloc to allocate its memory.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
All member variables of ir_to_mesa_instruction are already being
initialized from its implicitly defined constructor, it's not
necessary to use rzalloc to allocate its memory.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All member variables of vec4_live_variables are already being
initialized from its constructor, it's not necessary to use rzalloc to
allocate its memory, and doing so makes it more likely that we will
start relying on the allocator to zero out all memory if the class is
ever extended with new member variables.
That's bad because it ties objects to some specific allocation scheme,
and gives unpredictable results when an object is created with a
different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All member variables of fs_live_variables are already being
initialized from its constructor, it's not necessary to use rzalloc to
allocate its memory, and doing so makes it more likely that we will
start relying on the allocator to zero out all memory if the class is
ever extended with new member variables.
That's bad because it ties objects to some specific allocation scheme,
and gives unpredictable results when an object is created with a
different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind.
Reviewed-by: Kenneth Graunke <[email protected]>
|