| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
Compute shaders fetch data from vertex buffers via the texture cache, so
we need to make sure the texture cache is flushed.
v2:
- Fix rebase mistake
- Fix spelling in comment
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
LOOP_START_DX10 ignores the LOOP_CONFIG* registers, so it is not limited
to 4096 iterations like the other LOOP_* instructions. Compute shaders
need to use this instruction, and since we aren't optimizing loops with
the LOOP_CONFIG* registers for pixel and vertex shaders, it seems like
we should just use it for everything.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For buffers (which is what is being used for RATs), the
COLOR*_DIM.WIDTH_MASK field needs to be set to the low 16-bits of the
buffer size, and the COLOR*_DIM.HEIEGHT_MAX needs to be set to the
high bits.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
The kernel CS checker will fail if this register is not initialized.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
| |
|
|
|
|
| |
Signed-off-by: Tom Stellard <[email protected]>
|
|
|
|
|
| |
This is necessary upcoming encoding changes, since we will only be
using 9-bits for register encoding.
|
|
|
|
|
|
| |
This reverts commit 5b807400a87d5efefc481017eb420b772933e1da.
accidentally pushed.
|
|
|
|
|
|
| |
This reverts commit 5205db6a7ce623a7fca72e6dc6391bd12be3f6aa.
accidentally pushed
|
|
|
|
|
|
| |
This reverts commit 0c67fe5d2dc6d8066fc23c39184d9614abf63992.
accidentally pushed.
|
| |
|
| |
|
|
|
|
|
| |
This is the code that checks if a subtexure region is aligned to the
compressed format's block size.
|
|
|
|
|
|
|
|
|
|
|
| |
Don't cache pointers to elements of reallocatable array.
In some circumstances it caused false cache hits resulting in incorrect
command stream and gpu lockup.
Note: This is a candidate for the stable branches.
Signed-off-by: Vadim Girlin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
|
|
|
|
| |
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
|
|
|
|
|
|
| |
And define a SP_MAX_TEXTURE_SIZE value as we do in llvmpipe.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
If the gallium driver implements the can_create_resource() function, call
it to do proxy texture size checks.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Used to implement proxy textures. If a gallium driver doesn't implement
this function we'll just continue to use the core Mesa fallback code.
Without this hook we really have no good way to implement OpenGL proxy
textures with gallium drivers.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
| |
There will always be six cube faces so take that into consideration when
computing the texture size and comparing against the limit.
|
| |
|
|
|
|
|
|
|
|
| |
Before, the limit was 8K. For 32-bit RGBA that would be require 1.5 GB
of memory (w/out mipmaps). That's well beyond the LP_MAX_TEXTURE_SIZE
of 1GB.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
Fix copy&paste error and move min levels check closer to max levels check.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
Simplify the code and make it more like the other glTexImage commands.
Call _mesa_legal_texture_dimensions() to validate width, height, depth.
Call ctx->Driver.TestProxyTexImage() to make sure texture is not too large.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two aspects to texture image size checking:
1. Are the width, height, depth legal values (not negative, not larger
than the max size for the mipmap level, etc)?
2. Is the texture just too large to handle? For example, we might not be
able to really allocate memory for a 3D texture of maxSize x maxSize x
maxSize.
Previously, we did (1) via the ctx->Driver.TestProxyTextureImage() hook
but those tests are really device-independent. Now we do (2) via that
hook since the max texture memory and texture shape are device-dependent.
Also, (1) is now done outside the general texture parameter error checking
functions because of the special interaction with proxy textures. The
recently introduced PROXY_ERROR token is removed.
The teximage() and copyteximage() functions are bit simpler now (less
if-then nesting, etc.)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
Basically, move the body into a new _mesa_legal_texture_dimensions() function.
More refactoring to come.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
Move level checking out of _mesa_test_proxy_teximage() and into
the other error-checking functions.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
Fixes "warning: no return statement in function returning non-void"
|
|
|
|
|
|
|
| |
I can't see any reason this is global (unless for debugging)
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
This adds basic flow control support for If-Then-Else blocks using
predicates (stored in the EXEC register) and a predicate stack for
nested flow control.
|
|
|
|
|
|
|
| |
No regressions found in the tests of opencl-example/run_tests.sh.
Signed-off-by: Xinya Zhang <[email protected]>
Signed-off-by: Tom Stellard <[email protected]>
|
| |
|
| |
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As far as I can see, the intention of the requirement that we do so is to
prevent instruction prefetch from wandering out into either unmapped memory or
memory with a different caching type, and hanging the chip. The kernel makes
sure that the page after your BO has a valid page of the same caching type,
which meets this requirement, so there's no need to waste space between our
programs (and in instruction cache) on this.
Saves another 9kb instructions in l4d2 shaders.
Acked-by: Kenneth Graunke <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reduces l4d2 program size from 1195kb to 919kb. Improves performance by 0.22%
+/- 0.11% (n=70).
v2: Rebase on compaction v2, fix up flag reg handling (by anholt).
v3: Fix uncompaction of the flag register number.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reduces program size by using some smaller encodings for common bit
patterns in the Gen ISA, with the hope of making programs fit in the
instruction cache better.
v2: Use larger bitshifts for the uncompressed field setups, in line with the
way it's described in the spec. Consistently name a brw_compile "p" like
all other code. Add a couple more tests. Consistently call things
"compacted" not "compressed" (which is a different feature). Drop the
explicit check for not compacting SENDs, which is unjustified and already
implied by our lack of support for immediate values.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The first cut at instruction compaction won't compact things that
would change control flow jump distances, but we do need to still be
able to walk the instruction stream, which involves jumping by 8 or 16
bytes between instructions.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It's going to get more complicated when we do instruction compaction. This
also introduces putting the program offset in the output.
v2: Use next_insn_offset in brw_get_program(), too.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
| |
To do unit testing of i965, we want to be able to link against the
driver's symbols and prod them. If we don't have a separate lib from
our loadable module, libtool gets super whiny.
Acked-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
| |
This file is used to provide stubs for the link test in gallium dri drivers.
But the same stubs without the main can be used for making unit tests for code
in a dri driver.
Acked-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
| |
I noticed in valgrind that p->single_program_flow was used while
uninitialized. Everything else zeroed out brw_compile, but this is better
API.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Michel Dänzer <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Michel Dänzer <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Michel Dänzer <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Michel Dänzer <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
|