| Commit message | Author | Age | Files | Lines |
| |
|
| |
|
| |
|
|
|
|
|
| |
Fixes "warning: cast from pointer to integer of different size" for
64-bit builds.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Optimizes
    cmp.ge.f0(8) null g45<8,8,1>F 0F
    (+f0) sel(8) g50<1>F g40<8,8,1>F g10<8,8,1>F
    cmp.ge.f0(8) null g45<8,8,1>F 0F
    (+f0) sel(8) g51<1>F g41<8,8,1>F g11<8,8,1>F
    cmp.ge.f0(8) null g45<8,8,1>F 0F
    (+f0) sel(8) g52<1>F g42<8,8,1>F g12<8,8,1>F
    cmp.ge.f0(8) null g45<8,8,1>F 0F
    (+f0) sel(8) g53<1>F g43<8,8,1>F g13<8,8,1>F
into
    cmp.ge.f0(8) null g45<8,8,1>F 0F
    (+f0) sel(8) g50<1>F g40<8,8,1>F g10<8,8,1>F
    (+f0) sel(8) g51<1>F g41<8,8,1>F g11<8,8,1>F
    (+f0) sel(8) g52<1>F g42<8,8,1>F g12<8,8,1>F
    (+f0) sel(8) g53<1>F g43<8,8,1>F g13<8,8,1>F
total instructions in shared programs: 1644938 -> 1638181 (-0.41%)
instructions in affected programs: 574955 -> 568198 (-1.18%)
Two more 16-wide programs (in L4D2). Some large (-9%) decreases in
instruction count in some of Valve's Source Engine games. No
regressions.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
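The transformation above can be modelled in a few lines. This is only a sketch under stated assumptions, not the i965 pass itself: the Inst tuple, its field names, and cse_null_dst_cmps() are hypothetical, the model tracks a single flag register, and it works on one straight-line block.

```python
from typing import NamedTuple, Optional


class Inst(NamedTuple):
    opcode: str                # e.g. "cmp.ge" or "sel" (hypothetical encoding)
    dst: Optional[str]         # None models the null register
    srcs: tuple                # source operands, as register names or immediates
    writes_flag: bool = False
    reads_flag: bool = False


def cse_null_dst_cmps(insts):
    """Drop a flag-writing CMP when an identical one already set the flag."""
    out = []
    live_cmp = None  # the null-destination CMP whose result is still in the flag
    for inst in insts:
        if inst.writes_flag:
            if inst.dst is None and inst == live_cmp:
                continue  # redundant: the flag already holds this result
            live_cmp = inst if inst.dst is None else None
        # Overwriting a register that the live CMP reads invalidates it.
        if (live_cmp is not None and inst.dst is not None
                and inst.dst in live_cmp.srcs):
            live_cmp = None
        out.append(inst)
    return out
```

Feeding it the four cmp/sel pairs from the message leaves one cmp followed by four predicated sel instructions. Because the redundant cmp has a null destination, nothing needs to be emitted in its place.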
|
|
|
|
|
|
|
|
| |
We'd like to CSE some instructions, like CMP, that often have null
destinations. Instead of replacing them with MOVs to null, just don't
emit the MOV.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This avoids a lot of message setup we had to do otherwise. Improves
GLB2.7 performance with register spilling force enabled by 1.6442% +/-
0.553218% (n=4).
v2: Use BRW_PREDICATE_NONE, improve a comment (by Paul).
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
| |
I'm going to be introducing gen7 variants, and the previous naming was
going to get confusing.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
| |
We were clearing the reg_offset before trying to use it. Oops. Fixes
glsl-fs-texture2drect with the reg spilling debug enabled.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Things blew up when I enabled the debug register spill code without
disabling 16-wide, so I decided to just fix 16-wide spilling.
We still don't generate 16-wide when register spilling happens as part of
allocation (since we expect it to be slower), but now we can experiment
with allowing it in some cases in the future.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I believe this will never happen in SIMD8 mode, but it could for SIMD16
when we fix it.
v2: Fix off-by-one in my register counting comment (caught by Paul).
Reviewed-by: Paul Berry <[email protected]> (v1)
|
|
|
|
|
|
|
|
|
|
|
| |
Now that reg spilling generates new vgrfs, we were looping forever if you
ever turned it on.
Instead, move the debug code into the register allocator right near where
we'd be doing spilling anyway, which should more accurately reflect how
register spilling occurs in the wild.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
| |
I'm going to need to reuse this for fixing register spilling on SIMD16.
Note that BRW_MAX_MRF is 16, which is the same as BRW_MAX_GRF -
GEN7_MRF_HACK_START.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
| |
This hasn't been true since SIMD16 mode was added.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When faced with a million instructions that all became candidates at the
same time (none of which individually reduce register pressure), the ones
on the critical path are more likely to be the ones that will free up some
candidates soon.
shader-db:
total instructions in shared programs: 1681070 -> 1681070 (0.00%)
instructions in affected programs: 0 -> 0
GAINED: 40
LOST: 74
Fixes indistinguishable-from-hanging behavior in GLES3conform's
uniform_buffer_object_max_uniform_block_size test, regressed by
c3c9a8c85758796a26b48e484286e6b6f5a5299a. Given that
93bd627d5a6c485948b94488e6cd53a06b7ebdcf was unlocked by that commit, the
net effect on 16-wide program count is still quite positive, and I think
this should give us more stable scheduling (less dependency on original
instruction emit order).
v2: Comment suggestions by Paul
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70943
Reviewed-by: Paul Berry <[email protected]>
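The tie-break described above might look roughly like this. It is a sketch with assumed names: choose_candidate(), pressure_benefit, and delay are hypothetical, and the real scheduler's heuristics are more involved.

```python
def choose_candidate(candidates, pressure_benefit, delay):
    """Pick the next instruction to schedule from the ready list.

    pressure_benefit[n]: how much scheduling n reduces register pressure
                         (hypothetical metric for this sketch).
    delay[n]: latency-weighted distance from n to the end of the block,
              i.e. how far n sits along the critical path.
    """
    # Prefer register-pressure benefit; among equals, prefer the candidate
    # furthest from the end, since it is the most likely to unblock others.
    return max(candidates, key=lambda n: (pressure_benefit[n], delay[n]))
```

When every candidate has the same pressure benefit, the choice depends only on delay rather than on the original instruction emit order.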
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a step in doing scheduling as described in Muchnick (p538). A
difference is that our latency function is only specific to one
instruction (it doesn't describe, for example, the different latency
between WAR of a send's arguments and RAW of a send's destination), but
that's changeable later. We also don't separately compute the postorder
traversal of the graph, since we can use the setting of the delay field as
the "visited" flag.
Reviewed-by: Paul Berry <[email protected]>
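A minimal sketch of the delay computation, assuming the dependency graph is given as a dict of children and one latency per instruction. The names compute_delay, latency, and children are illustrative, and a leaf's delay is taken to be its own latency here, which may differ from what the driver uses.

```python
def compute_delay(node, latency, children, delay):
    """Longest latency-weighted path from node to the end of the DAG.

    The delay dict doubles as the "visited" marker: a node that already
    has an entry is never recomputed, so no separate postorder pass is needed.
    """
    if delay.get(node) is not None:
        return delay[node]
    longest_child = 0
    for child in children.get(node, ()):
        longest_child = max(longest_child,
                            compute_delay(child, latency, children, delay))
    delay[node] = latency[node] + longest_child
    return delay[node]
```

Calling it on each root of the DAG with a shared, initially empty delay dict fills in every reachable node exactly once.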
|
|
|
|
|
|
|
|
|
|
|
|
| |
The return value has been unused since commit d348b0c. This was
originally included in another patch, but it was split out by Ian
Romanick.
v2: Drop unnecessary final return. Suggested by Paul.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Cc: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use PKG_CHECK_MODULES instead of requiring the user to set up the
option at configure time. Drop unused EXPAT_INCLUDE and
update all targets.
NOTE: This commit removes the --with-expat configure
option. One should ensure that the expat they wish to use
has an expat.pc file accessible by pkg-config.
v2:
* Add note about the removal of --with-expat
(per Tom Stellard)
* Drop EXPAT_CFLAGS for targets that do not build DRI_COMMON
(spotted by Matt Turner)
v3:
* Rebase on top of megadrivers (drop EXPAT_CFLAGS from swrast)
Acked-by: Matt Turner <[email protected]> (v2)
Reviewed-by: Tom Stellard <[email protected]> (v2)
Signed-off-by: Emil Velikov <[email protected]>
Conflicts:
configure.ac
src/mesa/drivers/dri/common/Makefile.am
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea of the original order was that you'd dead code eliminate accesses
to push constants. But I've never seen a case of that (nor has
shader-db), while we frequently see sparse accesses of large constant
arrays that would overflow into pull constants.
Cuts pull constant use on csgo, serious sam, planeshift, and the cave:
total instructions in shared programs: 1695103 -> 1688795 (-0.37%)
instructions in affected programs: 92024 -> 85716 (-6.85%)
GAINED: 339
LOST: 0
Reviewed-by: Kenneth Graunke <[email protected]>
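To see why the ordering matters, consider this toy model. It is a sketch only: assign_constants() and max_push are hypothetical and do not reflect the driver's actual limits or data structures.

```python
def assign_constants(uniform_count, used, max_push):
    """Split uniform slots into push and pull constants.

    Only slots in `used` get a location; the first max_push of them are
    pushed and the rest fall back to pull constants.
    """
    push, pull = [], []
    for slot in range(uniform_count):
        if slot not in used:
            continue  # never referenced, so it needs no location at all
        (push if len(push) < max_push else pull).append(slot)
    return push, pull
```

With a 10-element array, a limit of 4, and only elements 0, 7 and 9 accessed, assigning locations before the unused slots are known pushes 0 to 3 and pulls 4 to 9. Assigning them afterwards pushes all three live elements and pulls nothing.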
|
|
|
|
| |
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
| |
The MRF variant is going to be used extensively by the atomic counter
intrinsics to assemble untyped atomic and surface read messages
easily.
Reviewed-by: Paul Berry <[email protected]>
|
| |
|
|
|
|
| |
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The maximum number of atomic buffer objects is somewhat arbitrary; we
can easily change it in the future if it turns out not to be enough.
v2: Add comments with the relevant mesa dirty bits. Fix usage of
BRW_NEW_UNIFORM_BUFFER in the GS ABO state atom.
v3: Update binding table layout diagrams.
v4: Resolve conflicts with the recent dynamic surface index assignment changes.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
| |
And add Gen7 implementation.
v2: Fix off by one error in buffer size calculation.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: Fix GLSL version in which the type became available. Add
contains_atomic() convenience method. Split off atomic counter
comparison error checking to a separate patch that will handle all
opaque types. Include new ir_variable fields for atomic types.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements the common support code required for the
ARB_shader_atomic_counters extension. It defines the necessary data
structures for tracking atomic counter buffer objects (from now on
"ABOs") associated with some specific context or shader program, it
implements support for binding buffers to an ABO binding point and
querying the existing atomic counters and buffers declared by GLSL
shaders.
v2: Fix extension checks. Drop unused MAX_ATOMIC_BUFFERS constant.
Acked-by: Paul Berry <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
Add XML file for the dispatch code generator, update the
dispatch_sanity test and add stub definition for the new entry point.
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
These ralloc contexts belong to a specific object and are being
deallocated manually from the class destructor. Now that we've hooked
up destructors to ralloc there's no reason for them to be children of
any other context, and doing so might lead to double frees under
some circumstances. The class destructor has all the responsibility
of freeing class memory resources now.
|
|
|
|
|
|
|
|
|
|
| |
destructible.
Only implemented on GCC and Clang for now. Other compilers use a
dummy implementation that always returns false, which should be a safe
[but slightly inefficient] assumption in all cases.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
This will let us use strcasecmp() from anywhere inside Mesa without
having to worry about the fact that it doesn't exist in MSVC.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we only checked the st->vertex_array_out_of_memory flag
after updating array state. But if there are two consecutive glDrawArrays
calls and the first one is skipped because of OOM, the second one should
be skipped too.
Cc: 9.2 <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Orbital Explorer was generating a 4000 instruction geometry shader, which
was taking 275 trips through dead code elimination and register
coalescing, each of which updated live variables to get its work done, and
invalidated those live variables afterwards.
By using bitfields instead of bools (reducing the working set size by a
factor of 8) in live variables analysis, it drops from 88% of the profile
to 57%, and reduces overall runtime from I-got-bored-and-killed-it (Paul
says 3+ minutes) to 10.5 seconds.
Compare to f179f419d1d0a03fad36c2b0a58e8b853bae6118 on the FS side.
Reviewed-by: Paul Berry <[email protected]>
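The idea of packing one variable per bit can be illustrated with a small dataflow solver. This is a sketch, not the vec4 live-variables code: live_sets() and its arguments are hypothetical, and Python integers stand in for the packed bitfield words.

```python
def live_sets(blocks, successors, use, defs):
    """Iterate live-in/live-out to a fixed point.

    Every set is a single integer bitmask (bit i = variable i), so a
    union or difference handles many variables per machine operation
    instead of touching one bool per variable.
    """
    livein = {b: 0 for b in blocks}
    liveout = {b: 0 for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in reversed(blocks):
            out = 0
            for s in successors.get(b, ()):
                out |= livein[s]               # union over successors
            new_in = use[b] | (out & ~defs[b])  # used here, or live past b
            if out != liveout[b] or new_in != livein[b]:
                liveout[b], livein[b] = out, new_in
                changed = True
    return livein, liveout
```

The same loop over per-variable bools would need an inner loop across all variables for each union and difference.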
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is nothing in the OpenGL specification which prevents the user from
calling glGenQueries to generate a new query object while another object is
active. Neither is there anything in the Mesa implementation which prevents
this. So remove the INVALID_OPERATION errors in this case.
Similarly, it is explicitly allowed by the OpenGL specification to delete an
active query, so remove the assertion for that case, replacing it with the
necessary state updates to end the query (clear the bindpt pointer and call
into the driver's EndQuery hook).
CC: <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Tested-by: Brian Paul <[email protected]>
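The resulting behaviour can be modelled as follows. This is a toy sketch, not Mesa's query code: the QueryTracker class, its methods, and the end_query_hook callback are all hypothetical stand-ins.

```python
class QueryTracker:
    """Toy model of query-object bookkeeping."""

    def __init__(self, end_query_hook):
        self.end_query_hook = end_query_hook  # stands in for the driver's EndQuery
        self.objects = set()                  # generated query ids
        self.active = {}                      # target -> id of the active query
        self._next_id = 1

    def gen_queries(self, n):
        # Allowed even while another query is active: no INVALID_OPERATION.
        ids = list(range(self._next_id, self._next_id + n))
        self._next_id += n
        self.objects.update(ids)
        return ids

    def begin_query(self, target, qid):
        if qid not in self.objects or target in self.active:
            raise ValueError("invalid operation")
        self.active[target] = qid

    def delete_queries(self, ids):
        for qid in ids:
            if qid not in self.objects:
                continue
            # Deleting an active query ends it first.
            for target, active_id in list(self.active.items()):
                if active_id == qid:
                    del self.active[target]    # clear the binding point
                    self.end_query_hook(qid)   # let the driver end the query
            self.objects.discard(qid)
```

Generating new ids while a query is active simply succeeds, and deleting the active query leaves no dangling binding behind.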
|
|
|
|
|
|
|
|
|
|
|
| |
The normal drawing path does this, and it's necessary on Ivybridge,
so let's try it on Sandybridge too. It's not explicitly documented
as necessary, but might help with hangs.
Signed-off-by: Kenneth Graunke <[email protected]>
Tested-by: Xinkai Chen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Cc: "9.2" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the documentation:
"[DevIVB] 3DSTATE_DEPTH_BUFFER must always be programmed along with the
other Depth/Stencil state commands(i.e. 3DSTATE_CLEAR_PARAMS,
3DSTATE_STENCIL_BUFFER, or 3DSTATE_HIER_DEPTH_BUFFER)."
We normally do this, but BLORP was failing to do so in the case where it
disables depth.
Not observed to fix anything yet.
Signed-off-by: Kenneth Graunke <[email protected]>
Tested-by: Xinkai Chen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Cc: "9.2" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
For some reason, we put the flush in the caller, rather than just before
emitting the packet. This is more than a cosmetic problem: BLORP calls
gen6_emit_3dstate_multisample() directly, and so it missed the flush.
Signed-off-by: Kenneth Graunke <[email protected]>
Tested-by: Xinkai Chen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Cc: "9.2" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Non-pipelined commands need this flush.
Signed-off-by: Kenneth Graunke <[email protected]>
Tested-by: Xinkai Chen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Cc: "9.2" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is another non-pipelined command that needs a flush on Sandybridge.
Signed-off-by: Kenneth Graunke <[email protected]>
Tested-by: Xinkai Chen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Cc: "9.2" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the comments above intel_emit_post_sync_nonzero_flush:
"[DevSNB-C+{W/A}] Before any depth stall flush (including those
produced by non-pipelined state commands), software needs to first
send a PIPE_CONTROL with no bits set except Post-Sync Operation != 0."
This suggests that every non-pipelined (0x79xx) command needs a
post-sync non-zero flush before it.
Signed-off-by: Kenneth Graunke <[email protected]>
Tested-by: Xinkai Chen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Cc: "9.2" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise the gen6 w/a in the kernel won't kick in and the write will
land nowhere.
Inspired by a patch Ken pointed me at which had the same issue (but
isn't yet merged and is also for a gen7+ feature). An audit of the entire
driver didn't reveal any other case than the one in the write_reg
helper used by the gen6 queryobj code.
Acked-by: Kenneth Graunke <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Tested-by: Xinkai Chen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Cc: "9.2" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Setting the bilinear_filter flag for multisample blits with the
GL_LINEAR filter causes incorrect behavior in the translate_dst_to_src()
function. This broke Modern Warfare (1, 2 and 3) on SNB, IVB and HSW.
Tested on SNB and IVB, no Piglit regressions. Trace file of the game
(taken with apitrace) works fine with this patch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69078
Cc: [email protected]
Signed-off-by: Anuj Phogat <[email protected]>
Reported-by: Armin K <[email protected]>
Tested-by: Armin K <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rico Schüller <[email protected]>
Signed-off-by: Brian Paul <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a geometry shader is present, the fragment shader gl_PrimitiveID
input acts like an ordinary varying, receiving data from the gs
gl_PrimitiveID output. When there's no geometry shader, we have to
ask the fixed function SF hardware to provide the primitive ID to the
fragment shader instead.
Previously, the SF setup code would handle this situation by
recognizing that the FS gl_PrimitiveID input didn't match to any VS
output; since normally an FS input with no corresponding VS output
leads to undefined data, the SF setup code used to just arbitrarily
assign it to receive data from attribute 0.
This patch changes the SF setup code so that instead of arbitrarily
using attribute 0, it assigns the unmatched FS input to receive
gl_PrimitiveID. In the case where the FS input really is
gl_PrimitiveID, this produces the intended result. In all other
cases, no harm is done since GL specifies that the behaviour is
undefined.
Fixes piglit test primitive-id-no-gs.
v2: If an attribute is already being overridden with point
coordinates, don't try to also override it with gl_PrimitiveID. This
is necessary to avoid regressing piglit tests such as
shaders/glsl-fs-pointcoord.
Reviewed-by: Eric Anholt <[email protected]>
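The selection rule for the no-geometry-shader case, including the v2 exception, can be sketched like this. The function sf_attribute_sources() and the string tags it returns are hypothetical; the real SF setup code programs hardware attribute overrides rather than building a dict.

```python
def sf_attribute_sources(fs_inputs, vs_outputs, point_sprite_inputs=frozenset()):
    """Decide where each fragment shader input gets its data (no GS bound)."""
    sources = {}
    for name in fs_inputs:
        if name in point_sprite_inputs:
            # Already overridden with point coordinates: leave it alone (v2).
            sources[name] = "point_coord"
        elif name in vs_outputs:
            sources[name] = "vs:" + name
        else:
            # Unmatched input: previously attribute 0, now the primitive ID.
            # Correct for gl_PrimitiveID, harmless for undefined inputs.
            sources[name] = "primitive_id"
    return sources
```

An unmatched gl_PrimitiveID input therefore receives the primitive ID, while any other unmatched input still reads data that GL leaves undefined.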
|
|
|
|
|
|
|
|
| |
Fixes 'make check' failures introduced with commit
80964226e9b8a05c39157f9305c06c0b2861e080.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70900
Signed-off-by: Vinson Lee <[email protected]>
|
|
|
|
| |
Fixes SCons build.
|