| Commit message (Collapse) | Author | Age | Files | Lines |
| |
We can do one per instruction, and we have to be careful not to overwrite
raddr_b, but this greatly reduces the pressure on uniform loads
(particularly around ldvpm/stvpm instructions).
total instructions in shared programs: 90768 -> 88220 (-2.81%)
instructions in affected programs: 82711 -> 80163 (-3.08%)
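The percentages in these shader-db summaries follow directly from the before/after instruction counts. A minimal sketch of that arithmetic, assuming a hypothetical helper named percent_change (it is not part of shader-db):

```c
/* Hypothetical helper (not from shader-db): relative change from
 * `before` to `after`, expressed in percent. Negative means fewer
 * instructions after the change. */
static double percent_change(unsigned before, unsigned after)
{
    return ((double)after - (double)before) * 100.0 / (double)before;
}
```

With the counts above, percent_change(90768, 88220) is about -2.81 and percent_change(82711, 80163) is about -3.08.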
| |
Sometimes when iterating over sources, we might want to check whether the
current one is the implicit one. We wouldn't want to match on a
non-implicit src using this function.
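A check of this kind might look roughly like the sketch below. The struct, the function name, and the assumption that the implicit source is stored after the explicit ones are all hypothetical, chosen only for illustration:

```c
#include <stdbool.h>

/* Hypothetical, simplified instruction. Assumption: the implicit
 * source, when present, is stored after the explicit sources. */
struct sketch_inst {
    int src_count;          /* total sources, including the implicit one */
    bool has_implicit_src;
};

/* True only when `src` is the index of the implicit source; an
 * explicit source never matches. */
static bool sketch_is_implicit_src(const struct sketch_inst *inst, int src)
{
    return inst->has_implicit_src && src == inst->src_count - 1;
}
```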
| |
shader-db:
total instructions in shared programs: 91275 -> 90768 (-0.56%)
instructions in affected programs: 20702 -> 20195 (-2.45%)
| |
These instructions let us write directly to the phys regfile, instead of
just R4. That removes the move out of R4 that we previously needed to keep
the result from conflicting with other SFU results and with thread switches.
There is still an extra instruction of latency, which is not represented
in the scheduler at the moment. If you use the result before it's ready,
the QPU will just stall, unlike the magic R4 mode where you'd read the
previous value. That means that the following shader-db results aren't
quite representative (since we now cause some stalls instead of emitting
nops), but they're impressive enough that I'm happy with the change.
total instructions in shared programs: 95669 -> 91275 (-4.59%)
instructions in affected programs: 82590 -> 78196 (-5.32%)
| |
These instructions allow writing the result to any register, instead of a
special writeback to r4.
| |
Noticed while trying to sort a new op into the appropriate place to match
the documentation.
| |
If we fail initial disassembly, it's good to know what instruction it was
that failed.
| |
We lower saturates in NIR.
| |
Similarly to VC4's implementation, by not picking r0 immediately upon
freeing it, we give the scheduler more of a chance to fit later writes in
earlier. I'm not clear on whether there's any real cost to picking phys
over accumulators, so keep that behavior for now.
shader-db:
total instructions in shared programs: 96831 -> 95669 (-1.20%)
instructions in affected programs: 77254 -> 76092 (-1.50%)
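The idea of not reusing a register immediately after it is freed can be sketched as a round-robin pick. This is a minimal illustration, not the driver's allocator: the names, the accumulator count, and the free-mask representation are assumptions.

```c
#include <stdint.h>

#define SKETCH_ACC_COUNT 5 /* assumed accumulator count, illustrative only */

/* Bit i of free_mask set means accumulator i is free. Starts the
 * search at *next instead of at 0, wraps around, and advances *next
 * past the chosen register. Returns -1 if nothing is free. */
static int sketch_pick_round_robin(uint32_t free_mask, int *next)
{
    for (int i = 0; i < SKETCH_ACC_COUNT; i++) {
        int reg = (*next + i) % SKETCH_ACC_COUNT;
        if (free_mask & (1u << reg)) {
            *next = (reg + 1) % SKETCH_ACC_COUNT;
            return reg;
        }
    }
    return -1;
}
```

Because the search resumes after the last pick, a register that was just freed is only chosen again once the candidates after it have been tried.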
| |
This restriction existed in V3D 2.x, but lifting it was a major change in
3.x.
shader-db results:
total instructions in shared programs: 98117 -> 96831 (-1.31%)
instructions in affected programs: 48520 -> 47234 (-2.65%)
| |
I keep having to ignore these shader-db changes since I don't trust them,
so just disable the reports entirely.
| |
total instructions in shared programs: 98578 -> 98119 (-0.47%)
instructions in affected programs: 27571 -> 27112 (-1.66%)
and it also eliminates most spills/fills on the CTS's randomized uniform
usage testcases.
| |
The docs had an update noting this restriction, so reflect it in the code.
| |
This doesn't affect us yet since we're not doing TMUWTs, but I think we
will for GLES 3.1.
| |
Fixes a bunch of piglit interpolation tests, and reduces my concern about
some MSAA blit shaders with noperspective varyings.
| |
We weren't using the field yet, so it didn't affect anything.
Fixes: c0476d964abb ("v3d: Express dithering mode in the same way that the CLIF parser does.")
| |
Fixes piglit ext_framebuffer_multisample-draw-buffers-alpha-to-one
| |
We don't actually set the two flags together, but I want to use the
r/g/b/a reordered fields in the next commit.
| |
This fixes GPU hangs on 7278 in transform feedback tests such as
GTF-GLES3.gtf.GL3Tests.transform_feedback2.transform_feedback2_basic
| |
It doesn't depend on V3D_VER, since it's just calling v3d_print_group.
| |
Noticed when trying to CLIF parse a transform feedback job that hangs on
HW.
| |
We were overlapping it with the threadable/nan flags, resulting in
incorrect relocations (threadable/nan included in the offset) and wrong
ordering in the CLIF files.
| |
It looks like we don't need this flag for anything (not that I'm clear on
what it does), but it makes our struct dumping line up with CLIF parsing.
| |
Noticed when trying to feed our dumps through the CLIF parser. Since this
is a "minus one" field, we were already filling in the value we wanted (0).
| |
The render_target_* fields gave us pretty(ish) printing, but meant we were
incompatible with CLIF, and had much more verbose code generating them.
| |
The XML ends up noisier if you're only looking at one version, but from
the diffstat there are obvious wins in terms of deduplication. This will
get even more significant if we ever support 3.2 or 4.0.
| |
The XML zipper wants one XML per version for filling out its tables, but
we want to do more than one GPU version per XML now. Assume that the
"gen" field will be the same as min_ver and look up our XML text assuming
that they're listed in increasing min_ver.
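One way to read the lookup described above: walk a table sorted by min_ver and keep the last entry that does not exceed the requested version. The sketch below assumes hypothetical names and a simplified table layout; the version numbers used with it are illustrative.

```c
#include <stddef.h>

/* Hypothetical table entry: the XML text for every version starting
 * at min_ver. */
struct sketch_xml {
    int min_ver;
    const char *text;
};

/* Relies on entries being sorted by increasing min_ver. Returns the
 * text of the last entry whose min_ver <= ver, or NULL if ver is
 * older than every entry. */
static const char *sketch_lookup_xml(const struct sketch_xml *table,
                                     size_t count, int ver)
{
    const char *match = NULL;

    for (size_t i = 0; i < count; i++) {
        if (table[i].min_ver > ver)
            break;
        match = table[i].text;
    }
    return match;
}
```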
| |
This will be used to merge together the V3D 3.3-4.1 XML with the variants
disabled based on the version.
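Disabling a variant based on the version amounts to a range check against the version being generated. A minimal sketch, assuming a hypothetical helper in which 0 stands for "no bound" (that convention is an assumption for illustration):

```c
#include <stdbool.h>

/* Hypothetical check: does a packet or field variant apply to `ver`?
 * A bound of 0 means "unbounded" in this sketch. */
static bool sketch_ver_in_range(int ver, int min_ver, int max_ver)
{
    if (min_ver != 0 && ver < min_ver)
        return false;
    if (max_ver != 0 && ver > max_ver)
        return false;
    return true;
}
```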
| |
It turns out that most V3D versions change very few packets, so keeping
separate copies of the XML per version makes changing the XML a pain as
you have to replicate your changes to each one. This is the start of
changing it so that one XML can generate headers for multiple versions.
| |
This fixes up their formatting for CLIF files and makes the code more
legible.
| |
Right now, we name these fields as "field name minus one" so that your C
code obviously states what the value should be. However, it's easy enough
to handle at the codegen level with another little XML attribute, meaning
less C code and easier-to-read values in CLIF dumping and gdb as well.
(The actual CLIF format for simulator and FPGA replay takes in
pre-minus-one values, so we need it there too).
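Handling the subtraction at the codegen level means the generated pack code, rather than the caller, applies the "minus one". A minimal sketch of what such generated code could do, with a hypothetical function name and bit layout:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pack helper: the caller passes the natural value (for
 * example a count of 4) and the helper stores value - 1 in bits
 * [start, end] of the packed word. */
static uint32_t sketch_pack_minus_one(uint32_t value, unsigned start,
                                      unsigned end)
{
    uint32_t width = end - start + 1;
    uint32_t max_encoded = width >= 32 ? UINT32_MAX : (1u << width) - 1;

    /* The natural value must be at least 1 and must fit once encoded. */
    assert(value >= 1 && value - 1 <= max_encoded);
    return (value - 1) << start;
}
```

A 4-bit field can then describe the values 1 through 16, and the C code setting it reads as the real value rather than the encoded one.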
| |
For a meson -Db_ndebug=true release build on x86_64, reduces text size of
libv3d.a from 53.0k to 51.6k. Inspired by 0d5329d626e3 ("anv: Disable
__gen_validate_value if NDEBUG is set.")
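The size saving comes from compiling the range check out of release builds. The sketch below shows the general pattern with hypothetical names; it is not the generated header's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical range check for a field occupying bits [start, end].
 * With NDEBUG defined the body is empty, so the check costs no code
 * in release builds. */
static inline void sketch_validate_value(uint32_t value, unsigned start,
                                         unsigned end)
{
#ifndef NDEBUG
    unsigned width = end - start + 1;

    if (width < 32)
        assert(value < (1u << width));
#else
    (void)value;
    (void)start;
    (void)end;
#endif
}

/* Packs `value` into bits [start, end], validating only in debug builds. */
static inline uint32_t sketch_pack_uint(uint32_t value, unsigned start,
                                        unsigned end)
{
    sketch_validate_value(value, start, end);
    return value << start;
}
```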
| |
There's a convenient "FTOC" instruction for generating the coverage now,
unlike vc4. This fixes
dEQP-GLES3.functional.multisample.fbo_4_samples.proportionality_alpha_to_coverage
|
| |
The #define existed and was checked in the driver.
| |
Fixes simulator assertion failures in
dEQP-GLES3.functional.shaders.texture_functions.texture.samplercubeshadow_bias_fragment
and similar complicated cases.
| |
The docs called this field "uses both center W and centroid W", but
actually it's "do you need center W even if varyings don't obviously call
for it?"
Fixes dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_w
| |
Fixes dEQP-GLES3.functional.fragment_out.random.26 and 6 others.
| |
Fixes segfaults and undefined behavior in
dEQP-GLES3.functional.fragment_out.basic.fixed.srgb8_alpha8_lowp_float
| |
These together get the GLSL 3.00 unorm/snorm pack functions and
MESA_shader_integer operations working.
v2: Fix commit message typo.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
| |
You'd need src/broadcom/cle/ in the -I previously, for srcdir != builddir.
nir was fine at that, but automake didn't have it.
Bugzilla: https://github.com/anholt/mesa/issues/104
| |
Fixes piglit vs-isnan-*.shader_test at the expense of gl-1.0-spot-light.
| |
This allows the driver to load against the merged kernel DRM driver. In
the process, rename most of the build system variables and gallium
plumbing functions.
| |
Android 8 (API version 26) emits a compile warning for including cutils/log.h:
warning: "Deprecated: don't include cutils/log.h, use either android/log.h or log/log.h" [-W#warnings]
Include "log/log.h" instead on Android 8 and later major versions to avoid this warning.
Signed-off-by: jenny.q.cao <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
| |
It would be nice to share the flags packet emit logic with flat shade
flags, but I couldn't come up with a good way while still using our pack
macros. We need to refactor this to shader record setup at compile time,
anyway.
Fixes ext_framebuffer_multisample-interpolation * centroid-*
|