| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
And fix these warning that appear at autoreconf time:
"`:='-style assignments are not portable"
v2: Fix the recently-converted-to-automake r600.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A little analysis shows that the worst-case value for "nr" is 17:
- base_mrf = 2 ... 2
- header present (say gen == 5) ... 4
- aa_dest_stencil_reg (stencil test) ... 5
- SIMD16 mode: += 4 * reg_width ... 13
- source_depth_to_render_target ... 15
- dest_depth_reg ... 17
This resulted in us setting base_mrf to 2 and mlen to 15. In other
words, we'd try to use m2..m16. But m16 doesn't exist pre-Gen6. Also,
the instruction scheduler data structures use arrays of size 16, so this
would cause us to access them out of bounds.
While the debugger system routine may need m0 and m1, we don't use it
today, so the simplest solution is just to move base_mrf back to 1.
That way, our worst case message fits in m1..m15, which is legal.
An alternative would be to fail on SIMD16 in this case, but that seems
a bit unfortunate if there's no real need to reserve m0 and m1.
Fixes new piglit test shaders/depth-test-and-write on Ironlake.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48218
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
It appears that when using 'ld' with the offset bits, address bounds
checking happens before the offset is applied, so parts of the drawing
in piglit texelFetchOffset() with a negative texcoord go black.
|
|
|
|
|
|
| |
It appears that when using 'ld' with the offset bits, address bounds
checking happens before the offset is applied, so parts of the drawing
in piglit texelFetchOffset() with a negative texcoord go black.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our previous live interval analysis just said that anything in a loop
was live for the whole loop. If you had to spill a reg in a loop,
then we would consider the unspilled value live across the loop too,
so you never made progress by spilling. Eventually it would consider
everything in the loop unspillable and fail out.
With the new analysis, things completely deffed and used inside the
loop won't be marked live across the loop, so even if you
spill/unspill something that used to be live across the loop, you
reduce register pressure. But you usually don't even have to spill
any more, since our intervals are smaller than before.
This fixes assertion failure trying to compile the shader for the
"glyphy" text rasterier and piglit glsl-fs-unroll-explosion.
Improves Unigine Tropics performance 1.3% +/- 0.2% (n=5), by allowing
more shaders to be compiled in 16-wide mode.
|
|
|
|
| |
I'm about to replace the insides of this using the new analysis.
|
| |
|
|
|
|
|
|
| |
This takes the fs_inst list generated by the visitor, and generates a
list of basic blocks with edges between them. This is a building
block for data-flow analysis.
|
| |
|
|
|
|
|
|
|
|
| |
Now that we use separate binding tables for WM, VS, and GS, and have
BRW_MAX_VS_SURFACES and BRW_MAX_GS_SURFACES macros, we really shouldn't
have an unqualified BRW_MAX_SURFACES macro. It's confusing.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
They had a number of issues:
- A paragraph states that we use a single binding table, but we don't.
- We labelled the WM binding table diagram as SOL/WM.
- The WM diagram had an "Only relevant to the WM" comment. Duh.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TEXTURED_TRIANGLE and MULTITEX_TRIANGLE are both a bit special in that if
you use any other graph object in the meantime they'll forget their state
and spew a lovely METHOD_CNT error at you when you try to draw.
The pre-newlib driver has a flush_notify() hook which does this state
re-emit, and a number of random workarounds like extra flushes and state
dirtying after various operations to solve this issue.
I'm taking a slightly different approach to things instead, which has the
nice side-effect of removing the divergent code-paths for ttri/mtri, the
flush/dirty workarounds and the need for flush_notify. Also gives a few
FPS boost in OA, yay.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Noticed by clang:
brw_wm_surface_state.c:330:30: warning: initializer overrides prior
initialization of this subobject [-Winitializer-overrides]
[MESA_FORMAT_Z24_S8] = 0,
^
brw_wm_surface_state.c:326:30: note: previous initialization is here
[MESA_FORMAT_Z24_S8] = 0,
^
No functionality change, since the array is declared static so
it was zero-initialized by default.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
| |
The param wasn't added until drm-intel-next for 3.4, so we were
missing our various LLC fast-paths.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By making a bool fs_reg only have a defined low bit (matching CMP
output), instead of being a full 0 or 1 value, we reduce the ANDs
generated in logic chains like:
if (v_texcoord.x < 0.0 || v_texcoord.x > texwidth ||
v_texcoord.y < 0.0 || v_texcoord.y > 1.0)
discard;
My concern originally when writing this code was that we would end up
generating unnecessary ANDs on bool uniforms, so I put the ANDs right
at the point of doing the CMPs that otherwise set only the low bit.
However, in order to use a bool, we're generating some instruction
anyway (e.g. moving it so as to produce a condition code update), and
those instructions can often be turned into an AND at that point. It
turns out in the shaders I have on hand, none of them regress in
instruction count:
Total instructions: 262649 -> 262545
39/2148 programs affected (1.8%)
14253 -> 14149 instructions in affected programs (0.7% reduction)
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change (before the previous two) produced a .23% +/- .11%
performance improvement in Unigine Tropics at 1024x768 on IVB.
Total instructions: 269270 -> 262649
614/2148 programs affected (28.6%)
179386 -> 172765 instructions in affected programs (3.7% reduction)
v2: Move some of the logic of finding the instruction that produced
the result of an expression tree to a helper.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using a separate stencil buffer, i965 requires that the pitch of
the buffer (in the 3DSTATE_STENCIL_BUFFER command) be specified as 2x
the actual pitch.
Previously this was accomplished by doubling the "cpp" and "pitch"
values stored in the intel_region data structure, and halving the
height. However, this was confusing, and it led to a subtle (but
benign) bug: since a stencil buffer is W-tiled, its true height must
be aligned to a multiple of 64; we were accidentally aligning its faux
height to a multiple of 64, causing memory to be wasted.
Note that for window system stencil buffers, the DDX also doubles the
cpp and pitch values. To facilitate fixing this DDX server bug in the
future, we fix the cpp and pitch values we receive from the X server
only if cpp has the "incorrect" value of 2.
Acked-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
v2: Clarify comments about the DDX.
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This was hacked in in one place for EGL image stuff, but the right
thing to do was just to provide the mapping from the mesa format to
the native hardware format, which includes render target support.
This turns out to be required for GL_ARB_texture_buffer_object, which
sees data in this layout.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
It turns out this field *is* used, and it's the stride between samples
from the buffer. Discovered during TBO debugging.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This never worked. brwProgramStringNotify also explicitly rejects
programs that use CAL and RET. So there's no need for this to exist.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When SPRITE_POINT_ENABLE bit is set, the texture coord would be
replaced, and this is only needed when we called something like
glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE).
And more, we currently handle varying inputs as texture coord,
we would be careful when setting this bit and set it just when
needed, or you will find the value of varying input is not right
and changed.
Thus we do set SPRITE_POINT_ENABLE bit only when all enabled tex
coord units need do CoordReplace. Or fallback is needed to make
sure the rendering is right.
With handling the bit setup at i915_update_sprite_point_enable(),
we don't need the relative code at i915Enable then.
This patch would _really_ fix the webglc point-size.html test case and
of course, not regress piglit point-sprite and glean-pointSprite
testcase.
NOTE: This is a candidate for stable release branches.
v2: fallback just when all enabled tex coord units need do
CoordReplace (Eric)
v3: move the sprite point validate code at I915InvalidateState (Eric)
v4: sprite point enable bit update based on _NEW_PROGRAM, too
add relative _NEW-state comments to show what state is being used(Eric)
Signed-off-by: Yuanhan Liu <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already program all the sampler state correctly, we just didn't give
the GPU a pointer to it for the VS stage. Thus, any texturing other
than texelFetch() wouldn't work.
Fixes piglit test vs-textureLod-miplevels and 99 of oglconform's
glsl-bif-tex subtests.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Variables have types, expression trees have types, but statements don't.
Rather than have a nonsensical field that stays NULL in the base class,
just move it to where it makes sense.
Fix up a few places that lazily used ir_instruction even though they
actually knew the particular subclass.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Certain applications don't call SwapBuffers before exiting. Yet, we'd
really like to see a bitmap containing the final rendered image even if
they choose never to present it.
In particular, Piglit tests (at least with -auto -fbo) fall into this
category. Many of them failed to dump any images at all.
Dumping one final image at context destruction time seems to work.
We may wish to pursue a more elegant solution later.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Those IDs are used by Bromolow.
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Eugeni Dodonov <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eugeni Dodonov <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
These can be used to implement EXT_texture_swizzle without baking
state-dependent swizzle instructions into the shader and forcing
recompiles.
For now, just set them to pass-through mode, so everything continues to
work as it did on Ivybridge. We can optimize this later.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
We only need one sample, since we don't support multisampling yet.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Apparently this needs to be the same as in 3DSTATE_WM.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
These now start at bit 23 instead of bit 24/25.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
Getting HiZ working means updating all the state packets for resolves
and clears. It's not worth doing until we get the basics working.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
For now, these all return 0, as I don't yet want to enable Haswell
support. Eventually they will be filled in with proper PCI IDs.
Also add an is_haswell field similar to is_g4x to make it easy to
distinguish Gen7 and Gen7.5.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the BSpec ISA volume's "Accumulator Register" section:
"[DevIVB] SIMD16 execution on dwords is not allowed when accumulator is
explicit source or destination operand."
Fixes piglit tests:
- fs-multiply-const-ivec4
- fs-multiply-const-uvec4
- fs-multiply-ivec4-const
- fs-multiply-uvec4-const
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a regression introduced by commit cdcfd5, which forget to
increase the map_refcount for successfully-mapped region. Thus caused a
wrong non-blanced map_refcount.
This would fix the regression found in the two following webglc testcase
on Pineview platform:
texture-npot.html
gl-max-texture-dimensions.html
Cc: Anuj Phogat <[email protected]>
Signed-off-by: Yuanhan Liu <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
| |
don't ask why I had to debug this.
tested to fix g-s and kwin at 16-bpp on Ironlake.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Or technically, a near-null dereference.
https://bugs.freedesktop.org/show_bug.cgi?id=46303
https://bugs.freedesktop.org/show_bug.cgi?id=46739
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
Improves Unigine Tropics performance at 1024x768 by 2.06236% +/-
0.50272% (n=11).
Reviewed-by: Chris Wilson <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
drm_intel_bo_unmap() supports both in the current libdrm version.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Unigine Tropics uses INVALIDATE_BUFFER and not UNSYNCHRONIZED to reset
the buffer object when its streaming wraps. Don't penalize it by
flushing the batch at the wrap point, just allocate a new BO and get
to using it.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Yuanhan Liu <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
It also asks for BMPs in the aub file at SwapBuffers time.
Reviewed-by: Yuanhan Liu <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
These are used for pretty presentation of the application name in the
UI.
Tested-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The force-enable option is dropped, now that the hardware we were
concerned about has HiZ on by default. Now, instead of doing
INTEL_HIZ=0 to test disabling hiz, you can set hiz=false.
v2: Disable separate stencil on gen6 when HIZ is turned off.
(previously, this had to be done manually in addition).
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
|
|
|
|
|
|
|
|
|
| |
This was a debug option during gen6 transform feedback bringup (and a
similar one existed during gen4 bringup). However, it looks like
we're done with that, and we don't anticipate it being used again,
either for geometry shaders or transform feedback.
Suggested by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This was added in the i915/i965 merge from the i915 driver, but I
don't recall it ever being used since then.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
If you want to test the graphics driver, you want to test it under the
conditions that users will see, not some set of additional fallbacks.
If you want to test swrast, run the swrast driver (or no_rast=true)
instead.
Reviewed-by: Kenneth Graunke <[email protected]>
|