| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
i965 will have more than 32 bits when BRW_STATE_COMPUTE_PROGRAM is added.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
The set of state atoms for compute shaders is currently empty; it will
be filled in by future patches.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware state for compute shaders is almost entirely orthogonal
to the hardware state for 3D rendering. To avoid sending unnecessary
state to the hardware, we'll need to have a separate set of state
atoms for the compute pipeline and the 3D pipeline. That means we
need to maintain two separate sets of dirty bits to determine which
state atoms need to be run.
But the dirty bits are not completely independent; for example, if
BRW_NEW_SURFACES is flagged while doing 3D rendering, then not only do
we need to re-run 3D state atoms that depend on BRW_NEW_SURFACES, but
we also need to re-run compute state atoms that depend on
BRW_NEW_SURFACES. But we'll also need to re-run those state atoms the
next time the compute pipeline is run.
To accomplish this, we record two sets of dirty bits, one for each
pipeline. When bits are dirtied (via SET_DIRTY_BIT() or
SET_DIRTY_ALL()) we set them to the dirty state in both pipelines.
When brw_state_upload() is run, we clear the dirty bits just for the
pipeline that was run.
Note that since the number of pipelines is known at compile time to be
2, the compiler should unroll the loops in SET_DIRTY_BIT() and
SET_DIRTY_ALL().
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
This will make it easier to extend dirty bit handling to support
compute shaders.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
This will make it easier to extend dirty bit handling to support
compute shaders.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
This will make it easier to extend dirty bit handling to support
compute shaders.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
coverity reported this, Matt said it look like missing parens,
not bad identing, so lets try that.
Cc: "10.2 10.3" <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
This one path doesn't goto fail, so it seems to leak dec.
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Reported-by: Coverity scanner.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
This fixes the code, but we never run it anyways, so silence coverity.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Recent code changes have caused these to no longer be used. Remove them.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The current code... makes no sense. Use nouveau_bo_ref to attach the bo
to the exposed resource so as to have the proper lifetime guarantees.
Tested-by: Emil Velikov <[email protected]>
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.2 10.3" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With VP2, nv50_miptree is faked because the underlying bo's have to be
laid out in a certain way. This is done by adjusting the address. Make
sure that blits (and everything else for consistency) use the mt address
rather than the bo address as a base.
This fixes retrieving chroma plane with VDPAU.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82255
Tested-by: Emil Velikov <[email protected]>
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.2 10.3" <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The mt address is about to be used more, make sure it's set
appropriately.
Reported-by: Emil Velikov <[email protected]>
Tested-by: Emil Velikov <[email protected]>
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.2 10.3" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
When constant folding a MAD operation, we first fold the multiply and
generate an ADD. However we do so without making sure that the immediate
can be handled in the saturate case. If it can't, load the immediate in
a separate instruction.
Reported-by: Tiziano Bacocco <[email protected]>
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.2 10.3" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Experimentally, the sampler doesn't appear to like these, neither as
buffer nor as rect textures. So remove 1D from the list of texture types
to make linear when used for staging.
This fixes the OSD in mplayer for VDPAU.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.2 10.3" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Samplers are only defined up to num_samplers, so set all samplers above
nr to NULL so that we don't try to read them again later.
Tested-by: Christian Ruppert <[email protected]>
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.2 10.3" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
In certain circumstances, findFirstUses could end up doubling back on
instructions it had already processed, resulting in an infinite
recursion. Avoid this by keeping track of already-visited instructions.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83079
Tested-by: Tobias Klausmann <[email protected]>
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.2 10.3" <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Once we've assembled the shader, no need to keep the intermediate
around.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
So that I can add fast depth clear.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
| |
We always left them enabled, which turned off HiZ in some cases.
This should improve performace with Hyper-Z.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should be as fast as no HTILE for stencil. I think we can still get full
performance with depth-only rendering even if stencil is present in the buffer
but not used, but I'm not 100% sure. This may be revisited when HiS and fast
stencil clear are implemented.
This fixes a hang in Brutal Legend.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64471
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
| |
This is a golden setting on RV740, but there is a hw bug which recommends
setting it on all R7xx chipsets.
Acked-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
already implemented
Acked-by: Michel Dänzer <[email protected]>
|
|
|
|
|
| |
Cc: [email protected]
Acked-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Acked-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
I have a piglit test that hits this.
Acked-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's almost the same.
This enables tiling for HTILE. It also enables Hyper-Z for other texture
targets (1D, 1D_ARRAY, 2D_ARRAY, CUBE, CUBE_ARRAY, 3D, RECT).
2D array depth textures are tested by Unigine Sanctuary and my new piglit
test.
Acked-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes rendering to non-zero layer/face/slice with HTILE.
v2: added the assertion
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This fixes rendering to a non-zero layer/face/slice with HTILE.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72685
v2: added the assertion
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Requires Evergreen or later hardware.
Signed-off-by: Glenn Kennard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes crashes if the number of temporaries is greater than 4096.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66184
v2: added fail paths for realloc failures
Cc: 10.2 10.3 [email protected]
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
This should make a machine which is running piglit more responsive at times.
e.g. streaming-texture-leak can easily eat 600 MB because of how fast it
creates new textures.
|
| |
|
|
|
|
| |
Required for multi-GPU configuration where each GPU has its own X screen.
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
We were only using it to get at its type, which we already know because
it's a builtin variable.
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We were only using it to get at its type, which we already know because
it's a builtin variable.
v2 (Ken): Rebase on Matt's optimized gl_FrontFacing calculations.
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
The replicated data clear shader needs to be SIMD16, or else the GPU
will hang. So, compile it even if INTEL_DEBUG=no16 is set.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current method of generating distribution tar-balls involves manually
invoking make + target name in the appropriate places. This temporary
solution is used until we get 'make dist' working.
Currently it does not work, as in order to have the target (which is
also a filename) available in the final Makefile we need to add a PHONY
target + use the correct target name.
Cc: "10.2 10.3" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that saturate is implemented natively as an instruction,
we can cut down on unneeded functionality.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Signed-off-by: Abdiel Janulgue <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
v3: Since the fs backend can emit saturate as a separate instruction, there is
no need to detect for min/max instructions and to rewrite the instruction tree
accordingly. On the other hand, we don't need to emit a separate saturated
mov either when the expression generating src can do saturate directly.
v4: Add can_do_saturate() check before enabling saturate modifer (Ken)
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Signed-off-by: Abdiel Janulgue <[email protected]>
|