| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
For saving programs (shader cache; get program binary) it is useful to
set the id to 0, with the stage being a parameter.
For restoring programs it is useful to set the id to the id allocated
to the program at creation time.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
We will want to use these for both the disk shader cache, and for the
ARB_get_program_binary.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
brw_program_deserialize_driver_blob will be a more generic form of
brw_program_deserialize_nir. In addition to nir, it will also be able
to extract gen binaries and upload them to the program cache.
In this commit, it continues to only support nir.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
This will allow get_program_binary to add the gen program into its
serialization in addition to just the nir program.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The driver may prefer to have a different blob for
ARB_get_program_binary compared to the version saved out for the disk
shader cache.
Since they both use the driver_cache_blob field, we need to always
give the driver the opportunity to fill in the driver_cache_blob when
saving the program binary.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
This function is called just before the gl_program::driver_cache_blob
is saved out as part of the gl_program serialization.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Previously the mesa core code would not call to serialize the
driver_cache_blob if it existed. We will update it to always call to
serialize the driver_cache_blob meaning we should avoid re-serializing
it under mesa/state_tracker.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Until now we have assumed that we could skip emitting these barriers
in the general case based on empirical testing and a few assumptions
detailed in a comment in the driver code, however, recent CTS tests
have showed that we actually need them to produce correct behavior.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Looks like I broke intel CI compiles.
Fixes: 6f3aee40f9 (radv: using tls to store llvm related info and speed up compiles (v10))
Tested-by: Clayton Craft <[email protected]>
|
|
|
|
| |
Reviewed-by: Keith Packard <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Error states coming from actual Vulkan applications tend to have fairly
long command buffers and lots of chained batches. 30 total BOs isn't
nearly enough. This commit bumps it to 256, makes some things use the
actual number of sections instead of the #define, and adds asserts if we
ever go over 256 sections.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Our attempt to restart the loop with the second level batch worked at
one point but got broken at some point. It was too fragile anyway and
we're not likely to have enough secondaries to actually overflow the
stack so we may as well recurse in both cases.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The vtest protocol is pretty simple but also pretty dumb, and
the v1 caps query was fixed size, with no nice way to expand it,
however the server also ignores any command it doesn't understand.
So we can query v2 caps by sending a v2 followed by a v1, if the
v2 is ignored we know it's an old vtest server, and the we get
a v2 answer then we can just read the v1 answer and discard it.
Acked-by: Jakob Bornecrantz <[email protected]> (sounds good)
|
|
|
|
|
|
|
|
|
|
|
|
| |
CACHE_MODE_SS is not listed in gfxspecs table for user mode
non-privileged registers. So, making any changes from Mesa
will do nothing. Kernel is already setting this bit in
CACHE_MODE_SS register which is saved/restored to/from
the HW context image.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
CACHE_MODE_SS is not listed in gfxspecs table for user mode
non-privileged registers. So, making any changes from Mesa
will do nothing. Kernel is already setting this bit in
CACHE_MODE_SS register which is saved/restored to/from
the HW context image.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
| |
Enables SPV_KHR_8bit_storage and VK_KHR_8bit_storage on gen 8+
using the VK_KHR_get_physical_device_properties2 functionality
to expose if the extension is supported or not.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Updates headers and grammar to ff684ffc6a35d2a58f0f63108877d0064ea33feb
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
v2: Update comment according to this patch. (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
When the destination is a BYTE type allow raw movs
even if the stride is not exact multiple of destination
type and exec type, execution type is Word and its size is 2.
This restriction was only allowing stride==2 destinations
for 8-bit types.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since Gen8+ Intel PRM states that "r127 must not be used for return
address when there is a src and dest overlap in send instruction."
This patch implements this restriction creating new grf127_send_hack_node
at the register allocator. This node has a fixed assignation to grf127.
For vgrf that are used as destination of send messages we create node
interfereces with the grf127_send_hack_node. So the register allocator
will never assign to these vgrf a register that involves grf127.
If dispatch_width > 8 we don't create these interferences to the because
all instructions have node interferences between sources and destination.
That is enough to avoid the r127 restriction.
This fixes CTS tests that raised this issue as they were executed as SIMD8:
dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom
Shader-db results on Skylake:
total instructions in shared programs: 7686798 -> 7686797 (<.01%)
instructions in affected programs: 301 -> 300 (-0.33%)
helped: 1
HURT: 0
total cycles in shared programs: 337092322 -> 337091919 (<.01%)
cycles in affected programs: 22420415 -> 22420012 (<.01%)
helped: 712
HURT: 588
Shader-db results on Broadwell:
total instructions in shared programs: 7658574 -> 7658625 (<.01%)
instructions in affected programs: 19610 -> 19661 (0.26%)
helped: 3
HURT: 4
total cycles in shared programs: 340694553 -> 340676378 (<.01%)
cycles in affected programs: 24724915 -> 24706740 (-0.07%)
helped: 998
HURT: 916
total spills in shared programs: 4300 -> 4311 (0.26%)
spills in affected programs: 333 -> 344 (3.30%)
helped: 1
HURT: 3
total fills in shared programs: 5370 -> 5378 (0.15%)
fills in affected programs: 274 -> 282 (2.92%)
helped: 1
HURT: 3
v2: Avoid duplicating register classes without grf127. Let's use a node
with a fixed assignation to grf127 and create interferences to send
message vgrf destinations. (Eric Anholt)
v3: Update reference to CTS VK_KHR_8bit_storage failing tests.
(Jose Maria Casanova)
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: 18.1 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement at brw_eu_validate the restriction from Intel Broadwell PRM,
vol 07, section "Instruction Set Reference", subsection "EUISA
Instructions", Send Message (page 990):
"r127 must not be used for return address when there is a src and
dest overlap in send instruction."
v2: Style fixes (Matt Turner)
Reviewed-by: Matt Turner <[email protected]>
Cc: 18.1 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This uses the common compiler passes abstraction to help radv
avoid fixed cost compiler overheads. This uses a linked list per
thread stored in thread local storage, with an entry in the list
for each target machine.
This should remove all the fixed overheads setup costs of creating
the pass manager each time.
This takes a demo app time to compile the radv meta shaders on nocache
and exit from 1.7s to 1s. It also has been reported to take the startup
time of uncached shaders on RoTR from 12m24s to 11m35s (Alex)
v2: fix llvm6 build, inline emit function, handle multiple targets
in one thread
v3: rebase and port onto new structure
v4: rename some vars (Bas)
v5: drag all code into radv for now, we can refactor it out later
for radeonsi if we make it shareable
v6: use a bit more C++ in the wrapper
v7: logic bugs fixed so it actually runs again.
v8: rebase on top of radeonsi changes.
v9: drop some C++ headers, cleanup list entry
v10: use pop_back (didn't have enough caffeine)
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Fixes 14 piglits, mostly in egl_khr_create_context.
v2: Also short-circuit the same-context-no-drawables case (Eric Anholt)
Fixes: https://github.com/anholt/libepoxy/issues/177
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Adam Jackson <[email protected]>
|
|
|
|
|
|
|
|
| |
We were not properly writing page tables when the virtual address
range spans multiple subtrees of the tables.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
| |
Fixes a bunch of piglit interpolation tests, and reduces my concern about
some MSAA blit shaders with noperspective varyings.
|
|
|
|
|
|
| |
The logic was duplicated in a pretty gross way, when what we really need
is just a helper function for stuffing the values in the packet. This
will make implementing noperspective easier.
|
|
|
|
|
|
| |
We weren't using the field yet, so it didn't affect anything.
Fixes: c0476d964abb ("v3d: Express dithering mode in the same way that the CLIF parser does.")
|
|
|
|
|
|
|
|
|
|
| |
"sampler2DRect" and "sampler2DRectShadow" are specified as
reserved from GLSL 1.1 and GLSL ES 1.0
Signed-off-by: zhaowei yuan <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106906
Reviewed-by: Eric Anholt <[email protected]>
Fixes: 34f7e761bc61 ("glsl/parser: Track built-in types using the glsl_type directly")
|
|
|
|
|
|
|
|
|
| |
Java2d opengl pipeline passes NULL piAttribList to
wglCreatePbufferARB(). So skip parsing the attribute list
if it is NULL.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Neha Bhende <[email protected]>
|
|
|
|
|
|
|
|
| |
The implementation of CreateRenderPass2 uses the helpers we broke out in
previous commits. The implementations of the new vkCmd functions just
call the old versions.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
This makes certain checks a bit easier and means that we don't have
the attachment information duplicated in the attachment list and in
depth_stencil_attachment.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
This new helper takes a VkSubpassDependency2KHR for future-proofing.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
Drawable and readable need to either both be None or both be non-None.
Cc: <[email protected]>
Signed-off-by: Adam Jackson <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|