| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
This commit moves the register allocator out of midgard_compile.c and
into its own midgard_ra.c file. In doing so, a number of dependencies
are identified and moved into their own files in turn. midgard_compile.c
is still fairly monolithic, but this should help.
Code churn, but no functional changes should be introduced by this
commit.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes a few miscellaneous issues with the fixed-function blending
programming, though it is far from complete. For cases known to be
buggy, we force a fallback to blend shaders.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
This implements blend shaders via nir_lower_blend, by creating dummy
fragment shaders simply passing through the source color and using the
new lowering pass to inject blendability.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
To prepare for the new nir_lower_blend pass, we wire up the intrinsics
for tilebuffer reads and constant colour loading.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new lowering pass implements the OpenGL ES blend pipeline in
shaders, applicable to hardware lacking full-featured blending hardware
(including Midgard/Bifrost and vc4). This pass is run on a fragment
shader, rewriting the store to a blended version, loading in the
framebuffer destination color and constant color via intrinsics as
necessary. This pass is sufficient for OpenGL ES 2.0 and is verified to
pass dEQP's blend tests. MIN/MAX modes are included and tested as well.
That said, at present it has the following limitations:
- MRT is not supported (ES3).
- sRGB support is missing (ES3).
- Extended blending is not yet ported from GLSL IR lowering (ES3.2)
- Dual-source blending is not supported. (N/A)
- Logic ops are not supported. (N/A)
v2: Fix code conventions (per Ian Romanick's feedback). Implement color
masks.
This pass should be in common nir/ space, but due to non-technical
reasons, for now it's in Panfrost space. In the future, depending if
other drivers need some of the functionality, we can move this back to
src/compiler/nir space.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a forgotten decode line on Midgard and adds the field of a
blend constant on Bifrost. The Bifrost encoding is fairly weird; whereas
Midgard is just a regular 32-bit float, Bifrost uses a fancy
fixed-point-esque encoding. The decode logic here is experimentally
correct. The encode logic is a sort of "guesstimate", assuming that the
high byte is just int(f / 255.0) and then solving algebraicly for the
low byte. This might be slightly off in some cases.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This eliminates one major source of #ifdef parity between Midgard and
Bifrost, better representing how the struct acts on Midgard and allowing
proper decodes on Bifrost.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We already have the Bifrost disassembler in-tree, so now that panwrap is
able to dump Bifrost command streams, hook up the disassembler to
pandecode.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
|
|
|
|
| |
Reported-by: Ryan Houdek <[email protected]>
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
Many of these are now patched; one of them we patch here. Regardless,
this is one less thing to worry about in the code, I suppose.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This commit adds a bunch of new load/store opcodes, largely related to
OpenCL, as well as adjusting the name of existing opcodes to be more
uniform. The immediate effect is compute shaders are substantially
easier to interpret now.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Midgard ALU features two types of constants: embedded constants (128-bit
chunk, zero/one per schedule bundle) and inline constants (16-bit
splattered into the op, second source if present). Inline constants are
much more efficient from a space and scheduling freedom standpoint, so
it's desirable to inline when possible. Now that integer ops are well
understood and in use, we enable inlining of integers constants in
addition to floats (which have been inlined since forever).
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
The previous commit fixes the issue this patched around.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By default, the "normal" output modifier is set on ALU ops. This is the
correct default for float outputs -- for floats, it preserves the semantic
value. Unfortunately, when used with integers, it does not preserve the
bitstream encoding, causing misbehaviour. (It's an open question what
happens when `normal` is used with integers -- does it apply some other
transformation? or does it do floating point normalization/etc on the
ints as if they were floats?).
Instead, we default to the "clamp to integer" output modifier for
ops writing integers. Semantically, this makes sense (clamping an
integer to the nearest integer is the identity function). In the
hardware with an integer opcode, this is the actual "normal".
This fixes numerous sporadic and sometimes bizarre bugs relating to
integers, especially integer moves. With this in place, we no longer
care about the types involved; it's just bits on the wire again.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From Gallium (and our) perspective, the stride of a BO is arbitrary. For
internal buffers, we can make it something nice, but for imported linear
buffers (e.g. EGL clients), we don't always have that luxury. To cope,
we calculate the expected stride of a texture, compare it to the BO's
actual reported stride, and if they differ, set the latter as a custom
stride.
Fixes rendering of windows not on tile boundaries (noticeable in Weston
with es2gears_wayland, for instance). Also, this should fix stride
issues with bufer reloading.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
With a special flag, texture descriptors can include custom stride(s).
We haven't seen a case of this used for mipmaps/cubemaps, so it's not
clear how that will be encoded, but this dumps correctly for single
one-level 2D textures.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
One field was not dumped for some reason. It's observed to be 0, but
it's still good to have it available.
Also, extra fields might be snuck in the bitmaps array (it's
variable-lengthed at the end), and we want to guard against that
possibility, so we dump a little more.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
As with the previous value of 5000 we seemed to be reaching OOM in some
circumstances.
Signed-off-by: Tomeu Vizoso <[email protected]>
Acked-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Since last Friday, these two tests have been fixed:
dEQP-GLES2.functional.shaders.functions.control_flow.return_in_nested_loop_fragment
dEQP-GLES2.functional.shaders.linkage.varying_7
Signed-off-by: Tomeu Vizoso <[email protected]>
Acked-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
The _LEVELS assumes that the max is always power of two. For V3D 4.2, we
can support up to 7680 non-power-of-two MSAA textures, which will let X11
support dual 4k displays on newer hardware.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We use an algebraic pass for the csel optimizations, and use proper
vectorized csel ops (i/fcsel_v) for mixed, rather lowering.
To avoid regressions along the way, we fix an issue with the copy
propagation pass (it should not attempt to propagate constants).
Similarly, we take care to break bundles when using csel to fix some
scheduler corner cases.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just do what everybody else but Nouveau does and return 0.0f.
This prevents the repeated logging of these messages on startup:
Unexpected PIPE_CAPF 6 query
Unexpected PIPE_CAPF 7 query
Unexpected PIPE_CAPF 8 query
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As the functions operate on 16-byte blocks.
Fixes this Valgrind error:
Invalid read of size 4
at 0x5857568: swizzle_bpp1_align16 (pan_swizzle.c:85)
by 0x585780F: panfrost_texture_swizzle (pan_swizzle.c:171)
by 0x584F587: panfrost_tile_texture (pan_resource.c:489)
by 0x584F641: panfrost_transfer_unmap (pan_resource.c:525)
by 0x587718D: u_transfer_helper_transfer_unmap (u_transfer_helper.c:516)
by 0x5875D85: pipe_transfer_unmap (u_inlines.h:515)
by 0x5875F13: u_default_texture_subdata (u_transfer.c:80)
by 0x53FFDC3: st_TexSubImage (st_cb_texture.c:1480)
by 0x54005BB: st_TexImage (st_cb_texture.c:1709)
by 0x5391353: teximage (teximage.c:3105)
by 0x5391353: teximage_err (teximage.c:3132)
by 0x5391B9B: _mesa_TexImage2D (teximage.c:3170)
by 0x5097A77: shared_dispatch_stub_183 (glapi_mapi_tmp.h:18833)
Address 0x1e94f1e8 is 0 bytes after a block of size 16 alloc'd
at 0x483F5C8: malloc (vg_replace_malloc.c:299)
by 0x584F47D: panfrost_transfer_map (pan_resource.c:467)
by 0x587694D: u_transfer_helper_transfer_map (u_transfer_helper.c:243)
by 0x5875EA7: u_default_texture_subdata (u_transfer.c:59)
by 0x53FFDC3: st_TexSubImage (st_cb_texture.c:1480)
by 0x54005BB: st_TexImage (st_cb_texture.c:1709)
by 0x5391353: teximage (teximage.c:3105)
by 0x5391353: teximage_err (teximage.c:3132)
by 0x5391B9B: _mesa_TexImage2D (teximage.c:3170)
by 0x5097A77: shared_dispatch_stub_183 (glapi_mapi_tmp.h:18833)
by 0x4DA8AB: glu::CallLogWrapper::glTexImage2D(unsigned int, int, int, int, int, int, unsigned int, unsigned int, void const*) (in /home/tomeu/deqp-build/modules/gles2/deqp-gles2)
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Cc: 19.1 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Valgrind was complaining of those.
NIR_PASS only sets progress to TRUE if there was progress.
nir_const_load_to_arr() only sets as many constants as components has
the instruction.
This was causing some dEQP tests to flip-flop, such as:
dEQP-GLES2.functional.fragment_ops.blend.equation_src_func_dst_func.add_src_color_constant_color
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Fixes: 14531d676b11 ("nir: make nir_const_value scalar")
|
|
|
|
|
|
|
|
| |
These tests add too much time to the total run time, and some of them
even hang the DUTs, even if I haven't been able to reproduce it locally.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
There doesn't seem to actually be any noticeably memory leaks on Weston
when running dEQP. We do seem to leak quiet a bit in the client, so we
still have to run the dEQP runner in batches.
This removes the risk of Weston not restarting properly and introducing
spurious failures.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
This matches the current state of things on both RK3288 and RK3399.
Hopefully, from now on we'll only remove stuff from this list.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
Make sure we have only test case names in the list, excluding names of
test groups.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
To improve robustness, check that we got the expected number of results.
Right now we hard-code the expected number of tests run, but with some
effort we may be able to infer it.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
These tests aren't giving reliable results. Mask them for now.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
Build artifacts for armhf and schedule them on a Veyron Chromebook with
RK3288.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I don't know why I thought NIR_PASS always set the progress variable.
Derp.
Fixes: d41cdef2a59 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic")
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Coverity CID: 1444996
Coverity CID: 1444995
Coverity CID: 1444994
Coverity CID: 1444993
Coverity CID: 1444991
Coverity CID: 1444989
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(
i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32
only for vertex shaders. For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc. On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.
No changes on any other Intel platforms.
v2: Add panfrost changes.
Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Driver which do not support native integers should use a lowering
pass to go from integers to floats.
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit does a fairly large cleanup of blend descriptors, although
there should not be any functional changes. In particular, we split
apart the Midgard and Bifrost blend descriptors, since they are
radically different. From there, we can identify that the Midgard
descriptor as previously written was really two render targets'
descriptors stuck together. From this observation, we split the Midgard
descriptor into what a single RT actually needs. This enables us to
correctly dump blending configuration for MRT samples on Midgard. It
also allows the Midgard and Bifrost blend code to peacefully coexist,
with runtime selection rather than a #ifdef. So, as a bonus, this will
help the future Bifrost effort, eliminating one major source of
compile-time architectural divergence.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Basically, when the conditions of a csel diverge, we scalarize to avoid
going into weird code paths during emit. We could be doing better, but
this case can't occur organically from GLSL as far as I can, though it
does fix lowered atan2.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
A previous commit by Tomeu aborted RA early, which solves the memory
corruption issue, but then generates an incorrect compile. This fixes
that.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This handles the usual case. 8-bit register access parallels 16-bit
access, but with one major caveat: in 8-bit mode, only half of the
register file is actually (directly) accessible as sources. In
particular, for each 16-bit integer register (hrN), we can only index a
*single* 8-bit integer (qrN), corresponding to the lower 8-bits. To get
the upper 8-bits, it is required to do an explicit shift. For example,
to add the bytes of a 16-bit integer hr0.x and get the result as an
8-bit qr0, you'd need to do something like:
ilsr hr1.x, hr0.x, #8
iadd qr0.x, qr0.x, qr1.x
This scheme diverges from 32-bit registers, in that both the upper and
lower halves of a 32-bit register are individually accessible as a pair
of half registers. For contrast, to add the lower and upper 16-bits of a
32-bit integer r0.x, you can just:
iadd hr0.x, hr0.x, hr1.x
Since hr1.x = upper 16-bit of r0.x.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
Meanwhile, we're forced to disable dest_override, since it's not yet
clear how this interacts with other bitnesses (it'll likely need to be
overhauled in any case).
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
Per OpenCL.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|