| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gallium can leave const buffers bound above what is used by the current
shader. Which can have a couple bad effects:
1) write beyond const space assigned, which can trigger HLSQ lockup
2) double emit of immed consts, first with bound const buffer vals
followed by with actual immed vals. This seems to be a sort of
undefined condition.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Drop color/pos/psize_regid, plus a few compiler and IR cleanups.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently lowers the following instructions:
DST, XPD, SCS, LRP, FRC, POW, LIT, EXP, LOG, DP4,
DP3, DPH, DP2
translating these into equivalent simpler TGSI instructions.
This probably should be moved to util so other drivers can use
it, but just adding under freedreno for now so that I can clear
out a lot of the lowering code in a3xx compiler before beginning
to add new compiler.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
The ctx should hold ref to dev to avoid problems if screen is destroyed
before ctx. Doesn't really fix the egl/glx issues, but at least it
prevents things from getting much worse.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Allocations actually have page alignment, but 64 is still a reasonable
value.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
If any driver doesn't support this, it can use a blit after resolving
the samples.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add for now some simple/basic query support (ie. things not actually
requiring the GPU). Might change around a bit when I actually add
GPU queries, but for now this enables some useful performance info
in the GALLIUM_HUD. For example:
GALLIUM_HUD=fps+batches+batches-sysmem+batches-gmem+restores,draw-calls
The driver specific specific queries are:
+ draw-calls
+ batches - number of batches per second, sum of batches-sysmem
plus batches-gmem
+ batches-gmem - render a set of tiles in GMEM, for each tile
(optionally) system mem -> gmem (restore), plus N draws,
plus gmem -> system mem (resolve) per second
+ batches-sysmem - N draws to system memory (GMEM bypass) per
second
+ restores - number of GMEM batches that required restore per
second
Ideally for GMEM rendering, you want batches-gmem to equal fps. If
the app is doing something that triggers multiple passes (ie. requires
extra round trip gmem <-> system memory) then the # of batches per
second will go up relative to fps.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Since we now have the cmdstream patch mechanism needed for hw binning,
might as well also use it for RB_RENDER_CONTROL updates. This avoids
the need to use RMW (and associated WFI) to update RB_RENDER_CONTROL.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The binning pass sorts vertices into which bins/tiles they apply to.
The visibility information generated during the binning pass can be
used to speed up the rendering pass by filtering out vertices which
do not apply to the current tile. See:
https://github.com/freedreno/freedreno/wiki/Adreno-tiling#optimized-approach
This brings a significant fps boost. A rough assortment of tests
(supertuxkart, etracer, tremulous, glmark2 'build' test, etc) seems
to yield a ~35-45% fps improvement.
For now, to be conservative, the binning pass is not enabled yet by
default. To enable it use:
FD_MESA_DEBUG=binning
So far I haven't found anything that breaks with binning enabled,
but I'd like a bit more testing before I enable it as default.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Only need to leave room for depth/stencil if it is actually used, etc.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using RMW on banked context registers is not safe. The value read
could be the wrong one. So if there has been a DRAW_IDX launched,
the RMW must be preceded by a WAIT_FOR_IDLE to ensure the read part
of RMW sees the correct value.
To avoid unnecessary WFI's, keep track if there is a need for WFI,
and only emit one if needed. Furthermore, keep track if we even
need to update the register in the first place.
And to cut down on the amount of RMW to avoid excessive WFI's, at the
tiling/GMEM level we can always overwrite RB_RENDER_CONTROL, as the
state at beginning of draw/clear cmds (which we IB to) is always
undefined. In the draw/clear commands, we always still use RMW (with
WFI if needed), but only if the register value actually changes. (At
points where the current value cannot be known, the saved value is
reset to ~0, which includes bits outside of RBRC_DRAW_STATE, so there
never is chance for confusion.)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Actually assign VSC_PIPE's properly, which will be needed for tiling.
And introduce fd_tile for per-tile state (including the assignment of
tile to VSC_PIPE). This gives us the proper pipe setup that we'll
need for hw binning pass, and also cleans things up a bit by not having
to pass so many parameters around. And will also make it easier to
introduce different tiling patterns (since we may no longer render
tiles in a simple left-to-right top-to-bottom pattern).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Fixes gpu lockups in supertuxkart.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Christopher James Halse Rogers <[email protected]>
Reviewed-by: Thomas Hellstrom <[email protected]>
Signed-off-by: Maarten Lankhorst <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
Ever since introducing separate sampler and sampler view max this was really
missing.
Every driver but llvmpipe reports the same number as number of samplers for
now, so nothing should break.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
* minimise flags duplication
* distingush between VISIBILITY C and CXX flags
* set only required flags - C and/or CXX
v2: add LLVM_CFLAGS back to AM_CFLAGS (add missing backslash)
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Nearly everything within the three Makefile.am's is identical.
Let's simplify things a little.
v2: Rebase and rewrite the commit message (Emil Velikov)
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Actually link VS out / FS in based on semantic info, keeping in mind
that position/pointsize can also be an input to the FS. This fixes a
few fragment shaders which were using gl_Position.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes use of full-precision in fragment shader (ie. don't clobber r0.x
since that can be used by future bary instructions for varying fetch).
And makes use of full-precision the default in fragment shader (but can
be overriden via FD_MESA_DEBUG=fraghalf).
Seems like half precision is often not enough for texture coordinates.
The blob compiler is clever enough to keep texture coords in full
precision registers while using half precision for everything else. But
we aren't quite that clever yet, so better to default to full precision.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Handle some relative addressing constraints: cannot handle const or
relative in cat5 and src2 of cat3.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Use u_primconvert to convert unsupported primitives into supported
primitive plus index buffer.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
pull in some fixes to draw-initiator/prim-type.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This CAP will determine whether ARB_framebuffer_object can be enabled.
The nv30 driver does not allow mixing swizzled and linear zsbuf/cbuf
textures.
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The encoding of constant, relative, and relative-const src registers is
a bit more complex than originally thought, which gives an extra bit to
encode const reg # at expense of taking a bit from relative offset.
In most cases a3xx seems to actually use a scheme whereby it can encode
an extra bit for const register. You have three possible encodings in
thirteen bits:
register: (11 bits for N.c)
00........... rN.c
relative: (10 bits for N)
010.......... r<a0.x + N>
011.......... c<a0.x + N>
const: (12 bits for N.c)
1............ cN.c
Which means we can deal w/ more consts than previously thought.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Fail more gracefully when buffer allocation/import fails.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Small typo introduced in a3ed98f.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new function replaces four old functions: set_fragment/vertex/
geometry/compute_sampler_views().
Note: at this time, it's expected that the 'start' parameter will
always be zero.
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Emil Velikov <[email protected]>
|
| |
|
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
| |
r600g needs explicit flushing before DRI2 buffers are presented on the screen.
v2: add (stub) implementations for all drivers, fix frontbuffer flushing
v3: fix galahad
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
duh, we still need to flush if there are pending draws and it isn't an
unsynchronized case.
Signed-off-by: Rob Clark <[email protected]>
|