| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
I was previously using temporary disables of VC4 optimization to show the
benefits of improved NIR optimization, but this can get me quick and dirty
numbers for NIR-only improvements without having to add hacks to disable
VC4's code (disabling of which might hide ways that the NIR changes would
hurt actual VC4 codegen).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NIR brings us better optimization than I would have bothered to write
within the driver, developers sharing future optimization work, and the
ability to share device-specific lowering code that we and other
GLES2-level drivers need.
total uniforms in shared programs: 13421 -> 13422 (0.01%)
uniforms in affected programs: 62 -> 63 (1.61%)
total instructions in shared programs: 39961 -> 39707 (-0.64%)
instructions in affected programs: 15494 -> 15240 (-1.64%)
v2: Add missing imov support, and assert that there are no dest saturates.
v3: Rebase on the target-specific algebraic series.
v4: Rebase on gallium-includes-from-NIR changes in mater.
v5: Rebase on variables being in lists instead of hash tables.
v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which
I'm not committing)
|
|
|
|
| |
I had regressed them for a while with the NIR work.
|
|
|
|
|
|
|
|
| |
This cleans up more instructions generated by uniform array indexing
multiplies.
total instructions in shared programs: 39989 -> 39961 (-0.07%)
instructions in affected programs: 896 -> 868 (-3.12%)
|
|
|
|
|
|
|
|
|
|
|
|
| |
This cleans up some pointless operations generated by the in-driver mul24
lowering (commonly generated by making a vec4 index for a matrix in a
uniform array).
I could fill in other operations, but pretty much anything else ought to
be getting handled at the NIR level, I think.
total uniforms in shared programs: 13423 -> 13421 (-0.01%)
uniforms in affected programs: 346 -> 344 (-0.58%)
|
|
|
|
|
|
|
|
|
|
| |
The hardware just uses the low 24 lines, saving us an AND to drop the high
bits.
total uniforms in shared programs: 13433 -> 13423 (-0.07%)
uniforms in affected programs: 356 -> 346 (-2.81%)
total instructions in shared programs: 40003 -> 39989 (-0.03%)
instructions in affected programs: 910 -> 896 (-1.54%)
|
|
|
|
|
| |
The hardware uses the low 24 bits in integer multiplies, so we can have
fewer high bits (and so probably drop them more frequently).
|
|
|
|
|
| |
This has been useful once again while trying to debug stride issues
between render targets and texturing.
|
|
|
|
|
| |
Fixes some non-power-of-two texture rendering when I force ARGB8888 to
raster.
|
|
|
|
|
|
| |
16 / cpp happens to be the same as utile_w on the only raster format
supported (4 bytes per pixel), but simulator/hw source code generally
talks in terms of utiles.
|
|
|
|
| |
The enum compared to was 0, so it worked out, but it sure looked wrong.
|
|
|
|
|
|
| |
I'm experimenting with a workaround for raster texture misrendering on
hardware, and this lets me look at the format chosen when computing
strides.
|
|
|
|
|
|
| |
They don't do anything special for us, but I've been told by kernel
maintainers that relying on dumb for my acceleration-capable buffers
is not OK.
|
|
|
|
|
|
| |
I'd like to compile as much of the device-specific code as possible
when building for simulator, and using if (using_simulator) instead of
ifdefs helps.
|
|
|
|
| |
I keep rewriting these.
|
|
|
|
|
|
|
|
|
| |
The only hackish ones are llvmpipe and softpipe, which currently return
the same string as for get_vendor(), while ideally they should return
the CPU vendor.
Signed-off-by: Giuseppe Bilotta <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Needed by ARB_gpu_shader5.
v2: select DMAD for FMA with double precision
v3: add and select DFMA
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
New BO create and mmap ioctls are added. The submit ABI gains a flags
argument, and the pointers are fixed at 64-bit. Shaders are now fixed at
the start of their BOs.
|
|
|
|
|
| |
The optimization passes are always regenerating it and throwing it away,
but it's not hard to keep track of.
|
|
|
|
|
| |
I may want them in optimization passes, and they're not really particular
to the program translation stage.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This lets us more intelligently decide which uniform values should be put
into temporaries, by choosing the most reused values to push to temps
first.
total uniforms in shared programs: 13457 -> 13433 (-0.18%)
uniforms in affected programs: 1524 -> 1500 (-1.57%)
total instructions in shared programs: 40198 -> 40019 (-0.45%)
instructions in affected programs: 6027 -> 5848 (-2.97%)
I noticed this opportunity because with the NIR work, some programs were
happening to make different uniform copy propagation choices that
significantly increased instruction counts.
|
| |
|
| |
|
|
|
|
|
|
| |
v2: add alignment restrictions to docs, fix indentation in headers
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Right now the places that used to emit a mov.sf just put the SF on the
previous instruction when it generated the source of the SF value. Even
without optimization to push the sf up further (and kill thus potentially
kill more MOVs), this gets us:
total uniforms in shared programs: 13455 -> 13457 (0.01%)
uniforms in affected programs: 3 -> 5 (66.67%)
total instructions in shared programs: 40296 -> 40198 (-0.24%)
instructions in affected programs: 12595 -> 12497 (-0.78%)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Resolving a multisampled depth texture into
a single sampled texture is supported on >= SM4.1
hw. It is possible some previous hw support it.
The ability was tested on radeonsi and nvc0.
Apparently is is also supported for radeon >= r700.
This patch adds the MULTISAMPLE_Z_RESOLVE cap and
add it to the drivers. It is advertised for drivers
for which it is sure the ability is supported.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Glenn Kennard <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This limits the style changes to modes inherited from prog-mode. The
main reason to do this is to avoid setting fill-column for people
using Emacs to edit commit messages because 78 characters is too many
to make it wrap properly in git log. Note that makefile-mode also
inherits from prog-mode so the fill column should continue to apply
there.
v2: Apply to all the .dir-locals.el files, not just the one in the
root directory.
Acked-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I could have done this in the bit that generates the ANDs and ORs, but
it's probably generally useful. Sadly, I still need this even if I move
to NIR, because I can't yet express my read of the destination color in
NIR, which I would need to move my blend/logicop/colormask handling into
NIR.
total uniforms in shared programs: 13497 -> 13455 (-0.31%)
uniforms in affected programs: 101 -> 59 (-41.58%)
total instructions in shared programs: 40797 -> 40296 (-1.23%)
instructions in affected programs: 1639 -> 1138 (-30.57%)
|
|
|
|
|
| |
Since the VPM reads have to be in order, it's useful to see their indices
in the dump.
|
| |
|
|
|
|
|
|
|
| |
The code was exactly the same, except util/ has c++ guards and a struct
simple_node declaration.
Reviewed-by: Marek Olšák <[email protected]>
|
| |
|
| |
|
|
|
|
| |
I wanted to read it, so I wrote parsing.
|
|
|
|
|
| |
Execution will end at the cl->next, because that's what ct0ea/ct1ea get
programmed to.
|
|
|
|
|
| |
Everything from ETC1 to RGBA64 was getting its top bit dropped, but we
didn't use any of those formats.
|
|
|
|
|
| |
Theoretically it should apply after dithering as well, but ditehring for
565 happens in fixed function in the TLB store.
|
|
|
|
|
| |
Since unpack only happens on things read from the A register file, we have
to leave them as something that can be allocated to A (temp or uniform).
|
|
|
|
| |
I want it from another location.
|
|
|
|
|
| |
It would mean different unpacking behavior, since only the A file does
unpack (with PM==0).
|
|
|
|
|
| |
No difference on shader-db, but prevents definite regressions in the
blending changes.
|
|
|
|
|
| |
No difference on shader-db, but will become more important as I introduce
more use of pack flags with the blending changes.
|
|
|
|
|
|
| |
It turns out the simulator was not treating this bit the same as the RPi,
and I'd forgotten to remove it when turning on early Z. The result was
that you'd get big chunks of your rendering missing.
|
|
|
|
| |
Fixes the remaining ARB_color_buffer_float rendering tests.
|
| |
|
|
|
|
|
|
|
|
| |
No need to recheck the FS compile when the VS source has changed, but
there *is* a need to recheck the VS compile when the compiled VS has
changed (since the live inputs may change).
Fixes es3conform's blend test.
|
|
|
|
|
|
|
| |
The util_pack_color() thing only sets up the low bits of the union, so
only return them, too. Fixes intermittent failure on
fbo-alphatest-formats and es3conform's framebuffer-objects test under
simulation.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Turns out this was harmful in code quality:
total instructions in shared programs: 39487 -> 38845 (-1.63%)
instructions in affected programs: 22522 -> 21880 (-2.85%)
This costs us yet another register, which is painful since it means more
programs might fail to compile). However, the alternative was causing us
trouble where we'd save/restore r3 while it contained a MIN-ed direct
texture offset, causing the kernel to fail to validate our shaders (such
as in GLB2.7).
|
|
|
|
|
|
|
|
| |
This gets a bunch of dead reads out of the CSes, which don't read most
attributes generally.
total instructions in shared programs: 39753 -> 39487 (-0.67%)
instructions in affected programs: 4721 -> 4455 (-5.63%)
|