v2: add comments on the limit for the maximum number of references,
and on what the calculation is based on
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
Hopefully this is the last one now (for texture X32_S8X24_UINT views).
+4 piglits.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90167
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Extends the syntax of GALLIUM_HUD environment variable to:
- Add options to set the size and exact location of each pane.
- Add an option to limit the maximum allowed value of the Y axis on a
pane, clamping the graph so that it does not go above this value.
- Add an option to auto-adjust the value of the Y axis down to the
highest value still visible on the graph.
v2:
- Make the patch simpler and smaller.
- With dynamic auto-adjusting on, adjust the Y axis once per pane
update instead of updating once every several seconds.
- No longer mishandle pane height when having more than one graph per
pane.
This code is only used when our memory debugging wrappers are enabled,
as we use the C runtime functions directly elsewhere.
Tested llvmpipe on Windows w/ memory debugging enabled.
VMware PR894263.
Reviewed-by: Roland Scheidegger <[email protected]>
We were resetting the prim id count for each run of the prim assembler,
hence this only worked when the draw calls were very small (the exact limit
depending on the vertex size), since larger draw calls get split up.
So, do the same as we do already if there's a gs, reset it to zero explicitly
for every new instance (this possibly could use the same variable but that
isn't doable without some heavy refactoring and I'm not sure it makes sense).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90130
Reviewed-by: Jose Fonseca <[email protected]>
CC: <[email protected]>
Android 5.0 allows modules to generate source into $OUT/gen, which will
then be copied into $OUT/obj and $OUT/obj_$(TARGET_2ND_ARCH) as necessary.
Modules will need to change calls to local-intermediates-dir into
local-generated-sources-dir.
The patch changes local-intermediates-dir into local-generated-sources-dir.
If the Android version is older than 5.0, fall back to local-intermediates-dir.
The patch also fixes the 64-bit build issue on Android 5.0.
v2 [Emil Velikov]
- Keep the LOCAL_UNSTRIPPED_PATH variable.
Signed-off-by: Chih-Wei Huang <[email protected]>
Many parts of mesa already have the include with others depending on it
but it's missing. Add it once at the top makefile and be done with it.
Cc: "10.4 10.5" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Chih-Wei Huang <[email protected]>
LLVM removed JITEmitDebugInfo from TargetOptions since they weren't used
v2: Be consistent with the LLVM version check (Aaron Watry)
Signed-off-by: Nick Sarnie <[email protected]>
Reviewed-and-Tested-by: Michel Dänzer <[email protected]>
This allows drivers to provide consistent flat shading for quads.
Otherwise a driver that only supported tris would have to force last
provoking vertex when drawing quads (and would have to say that quads
don't follow the provoking vertex convention).
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
This should match how drivers program the hardware. flatshade relates to
whether color inputs are interpolated, not the provoking vertex
convention.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
There is a level param stashed away in the .w component of the first
src.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
v2: move ishl into ttn (instead of driver backend) to keep the units
consistent between immediate and indirect offsets
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
v2: also use ttn_src_for_indirect() everywhere for addr access, rather
than open-coding it for INPUT/CONST srcs
v3: move ralloc out of ttn_src_for_indirect() into the one call site
that needs a ptr
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
So we can see the label associated with subroutines.
Reviewed-by: José Fonseca <[email protected]>
So far just the system values that freedreno supports, so we may add
more later.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
With TXD we also have the ddx/ddy sources (before the sampler).
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Split out from ttn_tex() since it is kind of a weird instruction that
maps to two NIR opcodes, and it was cleaner this way.
v2: query_levels doesn't take any args
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
We'll need this as well for TXQ. Split this out first to reduce noise
in the next patch.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Since the rest of NIR really would rather have these as variables rather
than registers, create a nir_variable per array. But rather than
completely re-arrange ttn to be variable based rather than register
based, keep the registers. In the cases where there is a matching var
for the reg, ttn_emit_instruction will append the appropriate intrinsic
to get things back from the shadow reg into the variable.
NOTE: this doesn't quite handle TEMP[ADDR[]] when the DCL doesn't give
an array id. But those just kinda suck, and should really go away.
AFAICT we don't get those from glsl. Might be an issue for some other
state tracker.
v2: rework to use load_var/store_var with deref chains
v3: create new "burner" reg for temporarily holding the (potentially
writemask'd) dest after each instruction; add load_var to initialize
temporary dest in case not all components are overwritten
v4: review comments: asserts and use ttn_src_for_indirect() in
ttn_array_deref() so we can drop later patch converting to use vec1 for
addr reg (since ttn_src_for_indirect() handles the imov to vec1 from
tgsi addr component that we want)
v5: rebase: new requirements about parent mem ctx for derefs
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Extract tgsi_dst->Index into a local. Split out from 'gallium/ttn: add
support for temp arrays' for noise reduction.
Signed-off-by: Rob Clark <[email protected]>
Revert 50e9fa2ed69cb5f76f66231976ea789c0091a64d as LLVM reverted their
change.
Signed-off-by: Nick Sarnie <[email protected]>
Reviewed-by: Jan Vesely <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89963
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
Neither the shader nor the key change when doing elts or linear variant, so
this was just annoying (probably mildly useful at some point when we printed
the IR per function too).
Reviewed-by: Jose Fonseca <[email protected]>
llvm goes crazy when doing that, using way more memory and time, though there's
probably more to it - this points to a very similar issue to the one fixed in
8a9f5ecdb116d0449d63f7b94efbfa8b205d826f. In any case I've seen a quite
plain looking vertex shader with just ~50 simple tgsi instructions (but with a
dozen or so such indirect constant buffer lookups) go from a terribly high
~440ms compile time (consuming 25MB of memory in the process) down to a still
awful ~230ms and 13MB with this fix (with llvm 3.3), so there's still obvious
improvements possible (but I have no clue why it's so slow...).
The resulting shader is most likely also faster (certainly seemed so though
I don't have any hard numbers as it may have been influenced by compile times)
since generally fetching constants outside the buffer range is most likely an
app error (that is we expect all indices to be valid).
It is possible this fixes some mysterious vertex shader slowdowns we've seen
ever since we started conforming, at least partially, to newer APIs (the main draw
loop also has similar looking conditionals which we probably could do without -
if not for the fetch at least for the additional elts condition.)
v2: use static vars for the fake bufs, minor code cleanups
Reviewed-by: Jose Fonseca <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
arb_stencil_texturing-draw failed under softpipe because we got a float
back from the texturing function and then tried to U2F it. Stencil
texturing returns ints, so fix the tiling code to retrieve the stencil
values as integers, not floats.
Signed-off-by: Dave Airlie <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
This adds all the plumbing to get EGL_EXT_image_dma_buf_import in
i915g.
Signed-off-by: Stéphane Marchesin <[email protected]>
Acked-by: Matt Turner <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
SCons does not build NIR yet.
Trivial.
Add include path for generated nir_opcodes.h.
Trivial.
This will be used by the VC4 driver for doing device-independent
optimization, and hopefully eventually replacing its whole IR. It also
may be useful to other drivers for the same reason.
v2: Add all of the instructions I was relying on tgsi_lowering to remove,
and more.
v3: Rebase on SSA rework of the builder.
v4: Use the NIR ineg operation instead of doing a src modifier.
v5: Don't use ineg for fnegs. (infer_src_type on MOV doesn't do what I
expect, again).
v6: Fix handling of multi-channel KILL_IF sources.
v7: Make ttn_get_f() return a swizzle of a scalar load_const, rather than
a vector load_const. CSE doesn't recognize that srcs out of those
channels are actually all the same.
v8: Rebase on nir_builder auto-sizing, make the scalar arguments to
non-ALU instructions actually be scalars.
v9: Add support for if/loop instructions, additional texture targets, and
untested support for indirect addressing on temps.
v10: Rebase on master, drop bad comment about control flow and just choose
the X channel, use int comparison opcodes in LIT for now, drop unused
pipe_context argument.
v11: Fix translation of LRP (previously missed because I mis-translated
back out), use nir_builder init helpers.
v12: Rebase on master, adding explicit include of mtypes.h to get
INTERP_QUALIFIER_*
v13: Rebase on variables being in lists instead of hash tables, drop use
of mtypes.h in favor of util/pipeline.h. Use Ken's nir_builder
swizzle and fmov/imov_alu helpers, drop "struct" in front of
nir_builder, use nir_builder directly as the function arg in a lot of
cases, drop redundant members of ttn_compile that are also in
nir_builder, drop some half-baked malloc failure handling.
v14: The indirect uniform src0 should be scalar, not vector (noticed as
odd by robclark, confirmed by cwabbott). Apply Ken's review to
initialize s->num_uniforms and friends, skip ttn_channel for dot
products, and use the simpler discard_if intrinsic.
Reviewed-by: Kenneth Graunke <[email protected]> (v13)
Acked-by: Rob Clark <[email protected]>
Copy-and-paste bug in the img filter decision. Since there are only two
different filters anyway, just drop this bit.
We've seen some cases where this can hurt performance quite a bit.
Technically, the simpler the function, the more overhead there is
for using a function for this (and the fewer benefits it provides).
Hence don't do this if we expect the generated code to be simple.
There's an even more important reason why this hurts performance,
which is shaders reusing the same unit with some of the same inputs,
as llvm cannot figure out the calculations are the same if they
are performed in the function (even just reusing the same unit without
any input being the same provides such optimization opportunities, though
not very many). This is something which would need to be handled by IPO
passes, however.
This is quite trivial, essentially just follow all the same code you'd
use with linear min/mag (and no mip) filter, then just skip the filtering
after looking up the texels in favor of direct assignment of the right channel
to the result. (This is though not true for the multi-offset version if we'd
want to support it - for this would probably need to do something along the
lines of 4x nearest sampling due to the necessity of doing coord wrapping
individually per texel.)
Supports multi-channel formats.
From the SM5 gather cap bit, should support non-constant offsets, plus shadow
comparisons (the former untested), but not component selection (should be
easy to implement but all this stuff is not really exposable anyway for now).
Reviewed-by: Jose Fonseca <[email protected]>
Luckily thanks to the revamped interface this is a lot less work now...
Reviewed-by: Jose Fonseca <[email protected]>
This has got a bit out of control with more and more parameters added.
Worse, whenever something in there changes all callees have to be updated
for that, even though they don't really do much with any parameter in there
except pass it on to the actual sampling function.
Hence simply put almost everything into a struct. Also instead of relying
on some arguments being NULL, be explicit and set this in a key (which is
just reused for function generation for simplicity). (The code still relies
on them being NULL in the end for now.)
Technically there is a minimal functional change here for shadow sampling:
whether shadow sampling is done is now determined explicitly by the texture
function (either sample_c or the gl-style tex func inherit this from target)
instead of the static texture state. These two should always match, however.
Otherwise, it should generate all the same code.
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
When using the texel fetch functions rather than ordinary texturing,
the arguments are all int vecs instead of float vecs, not to mention
the actual function would look completely different. Hence this must
be included in the texture function name (which serves as the key)
otherwise things crash badly when a shader accesses the same texture
and sampler unit with both txf/ld and ordinary texturing instructions
with otherwise matching keys.
Signed-off-by: Jan Vesely <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
There are issues with inlining everything, most notably llvm will use much
more memory (and be slower) when compiling. Ideally we'd probably use
functions for shader functions too but texture sampling usually is responsible
for quite some IR (it can easily reach 80% of total IR instructions) so this
seems like a good start.
This still generates a different function for all different combinations just
like before, however it is possible llvm is missing some optimization
opportunities - it is believed though such opportunities should be somewhat
rare, but at least for now it can still be switched off (at compile time only).
It should probably make compiled code also smaller because the same function
should be used for different variants in the same module (so for the
opaque/partial or linear/elts variants).
No piglit change (though it does indeed speed up unrealistic tests like
fp-indirections2 by a factor of 30 or so).
Has a small negative performance impact in openarena - I suspect this could
be fixed by running some IPO passes (despite the private linkage, llvm right
now does NO optimization at all wrt anything going past the call, even if
there's just one caller - so things like values stored before the call and then
always written by the function etc. will not be optimized away, nor will dead
arguments (which we mostly shouldn't have) be eliminated, always constant
arguments promoted etc.).
v2: use proper return values instead of pointer function arguments.
llvm supports aggregate return values, which do wonders here eliminating
unnecessary stack variables - everything in fact will be returned in registers
even without any IPO optimizations. It makes the code simpler too.
With this I could not measure a performance impact in openarena any longer
(though since there's still no constant value propagation etc. into the tex
functions this does not mean it couldn't have a negative impact elsewhere).
v3: fix some minor issues suggested by Jose, and do disassembly (and the
profiling) without hacks.
Reviewed-by: Jose Fonseca <[email protected]>
The callbacks used for getting the dynamic texture/sampler state were using
the jit_context from the generated jit function. This works just fine, however
that way it's impossible to generate separate functions for texture sampling,
as will be done in the next commit. Hence, pass this pointer through all
interfaces so it can be passed to a separate function (technically, it would
probably be possible to extract this pointer from the current function instead,
but this feels hacky and would probably require some more hacks if we'd use
real functions instead of inlining all shader functions at some point).
There should be no difference in the generated code for now.
Reviewed-by: Jose Fonseca <[email protected]>
The data in memory is in big endian format and needs to be converted
into CPU byte order. So the patch actually reversed what needs to be done.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Matt Turner <[email protected]>