| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
The compiler will emit GLC=1.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Even if we don't use local buffers in general. Turns out that even
though the performance is not the best the kernel still does it
better than our own list.
We still have to keep the radv bo list for buffers that are shared
externally.
This improves Talos on lowest quality setting (so as CPU bound as
possible) by ~10% if the global bo list is enabled.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In radv we had a separate flag to actually use it + an env option
to experimentally use it.
The common code setting has_local_buffers to false of course broke
that experimental option.
Also the "enable on APU" did not make sense for RADV as it is still
disabled by default.
Fixes: b21a4efb553 "radv/winsys: allow local BOs on APUs"
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
To test global_bo_list performance.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Coverity: CID 1444664
Fixes: d62d434fe920 ("ac/nir_to_llvm: add image bindless support")
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
Seems it was missing the "/ ma + 0.5" and the order was swapped.
Fixes: a1a2a8dfda7b9cac7e ('nir: add AMD_gcn_shader extended instructions')
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
These were updated in version 1.1.106 of vulkan.h to make more sense
with the extension names. We may as well keep with the times.
Acked-by: Dave Airlie <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Acked-by: Dave Airlie <[email protected]>
Acked-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When gathering info for unmovable types we need to handle arrays.
While we dont support packing/moving arrays we do support packing
scalar components with these arrays.
Fixes piglit:
tests/spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-interleave-range.shader_test
Fixes: 5eb17506e159 ("nir: do not pack varying with different types")
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pipeline statistics queries should not count BLORP's rectangles.
(23) How do operations like Clear, TexSubImage, etc. affect the
results of the newly introduced queries?
DISCUSSION: Implementations might require "helper" rendering
commands be issued to implement certain operations like Clear,
TexSubImage, etc.
RESOLVED: They don't. Only application submitted rendering
commands should have an effect on the results of the queries.
Piglit's arb_pipeline_statistics_query-vert_adj exposes this bug when
the driver is hacked to always perform glBufferData via a GPU staging
copy (for debugging purposes).
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
| |
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order to pluck off components properly.
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
| |
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order copy components around properly.
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
| |
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order to swizzle them properly.
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: remove & operator in a couple of memsets
add some memsets
v3: fixup lima
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]> (v2)
|
|
|
|
|
|
|
|
| |
we already assert above that there are no more than 3 sources, so it
doesn't make sense to use an array of 4 sources
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
| |
v2: replace nir_zero_vec with nir_imm_zero (Karol Herbst)
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
| |
While we're here, fix a typo which caused it to actually return a vec4
with the third and fourth components zero.
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
|
|
|
|
|
|
|
|
| |
v2 (Jason Ekstrand):
- Add even more places
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
This makes things a bit simpler and it's also more robust because it no
longer has a hard dependency on the offset being a 32-bit value.
|
|
|
|
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
v2: Run before lowering I/O.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Mali hardware (supported by Panfrost and Lima), the fixed-function
transformation from world-space to screen-space coordinates is done in
the vertex shader prior to writing out the gl_Position varying, rather
than in dedicated hardware. This commit adds a shared NIR pass for
implementing coordinate transformation and lowering gl_Position writes
into screen-space gl_Position writes.
v2: Run directly on derefs before io/vars are lowered to cleanup the
code substantially. Thank you to Qiang for this suggestion!
v3: Bikeshed continues.
v4: Add to Makefile.sources (per Jason's comment). Bikeshed comment.
Ian and Qiang's reviews are from v3, but no real functional changes from
v4. Rob's review is from v4.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Suggested-by: Qiang Yu <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
As part of this cleanup, we use the newly-exposed
u_vbuf_get_minmax_index, deduplicating quite a bit of bookkeeping. We
also centralize the draw_flags tracking to make this code cleaner /
futureproofed; we have already had bugs regarding this field so we might
as well get it right now.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
This was used as a workaround for uniform sizing which was fixed in
771adffe ("st: Lower uniforms in st in the...")
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes the following building error happening with Android build system:
external/mesa/src/gallium/auxiliary/draw/draw_gs.c:740:79:
error: address of array 'draw->gs.tgsi.machine->PrimitiveOffsets' will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion]
if (!draw->gs.tgsi.machine->Primitives[i] || !draw->gs.tgsi.machine->PrimitiveOffsets)
~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
1 error generated.
Fixes: 7720ce3 ("draw: add support to tgsi paths for geometry streams. (v2)")
Signed-off-by: Mauro Rossi <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Fixes: 92d7ca4b1cd "gallium: add lima driver"
Signed-off-by: Qiang Yu <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
|
| |
Come from glmark2-es2 jellyfish test.
Fixes: 92d7ca4b1cd "gallium: add lima driver"
Signed-off-by: Qiang Yu <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Hardware supports writing back Z/S buffers and sampling from them,
so add support for that.
Signed-off-by: Vasily Khoruzhick <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
Tested-by: Icenowy Zheng <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Looks like it's somehow used by subsequent PP job, so we have to
preserve its contents until PP job is done.
Signed-off-by: Vasily Khoruzhick <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
Tested-by: Icenowy Zheng <[email protected]>
|
|
|
|
|
|
|
| |
Port TGSI TRUNC lowering to nir
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 628c9ca9089789 I forgot to apply the same -4Gb of the high address
of the high heap VMA. This was previously computed in the
HIGH_HEAP_MAX_ADDRESS.
Many thanks to James for pointing this out.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reported-by: Xiong, James <[email protected]>
Fixes: 628c9ca9089789 ("anv: store heap address bounds when initializing physical device")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can use the same register spilling infrastructure for our loads/stores
of indirect access of temp variables, instead of doing an if ladder.
Cuts 50% of instructions and max-temps from 2 KSP shaders in shader-db.
Also causes several other KSP shaders with large bodies and large loop
counts to not be force-unrolled.
The change was originally motivated by NOLTIS slightly modifying register
pressure in piglit temp mat4 array read/write tests, triggering register
allocation failures.
|
|
|
|
|
|
|
|
|
|
| |
This commit adds new nir_load/store_scratch opcodes which read and write
a virtual scratch space. It's up to the back-end to figure out what to
do with it and where to put the actual scratch data.
v2: Drop const_index comments (by anholt)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
We were missing a * 4 even if the particular hardware matched our
assumption.
|
| |
|
|
|
|
|
| |
This code is so touchy, trying to emit the minimum amount of address math.
Some day we'll move it all to NIR, I hope.
|
|
|
|
|
|
|
|
| |
While waiting for the CSD UABI to get reviewed, I keep having to rebase
the CS patch. Just land the compiler side for now to keep it from
diverging.
For now this covers just GLES 3.1 compute shaders, not CL kernels.
|
|
|
|
|
|
|
|
|
| |
We're using ARB_debug_output for the main shader-db, but I had this env
var left around from the shader-db-2 support (vc4 apitrace-based). Keep
the env var around since it's nice sometimes to get the stats on a shader
you're optimizing without having to do a shader-db run, but drop the old
formatting that's not useful and keeps tricking me when I go to add
another measurement to the shader-db output.
|
|
|
|
|
| |
This gives us finer-grained feedback on how we're doing on register
pressure than "did we trigger a new shader to spill or drop thread count?"
|
| |
|
|
|
|
|
| |
A shader invocation always executes 16 channels together, so we often end
up multiplying things by this magic 16 number. Give it a name.
|
|
|
|
|
|
| |
I was thinking about a refactor, and needed to read this first.
Reviewed-by: Jason Ekstrand <[email protected]>
|