| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Supported on R700 and up.
Signed-off-by: Glenn Kennard <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
a const
When both fadd and fmul instructions have at least one operand that is a
constant and it is only used once, the total number of instructions can
be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd); because
the constants will be progagated as immediate operands of fmul and fadd.
This patch detects these situations and prevents fusing fmul+fadd into ffma.
Shader-db results on i965 Haswell:
total instructions in shared programs: 6235835 -> 6225895 (-0.16%)
instructions in affected programs: 1124094 -> 1114154 (-0.88%)
total loops in shared programs: 1979 -> 1979 (0.00%)
helped: 7612
HURT: 843
GAINED: 4
LOST: 0
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Returns whether the list has exactly one element.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Because the next patch will add an optimization that is specific to i965,
we want to move this loweing pass to that driver altogether.
This is safe because i965 is the only consumer.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've assumed that we could lower per-component vector access from
vec[i] = scalar
to
vec = ir_triop_vector_insert(vec, scalar, i)
but with SSBOs (and compute shader SLM and tesselation outputs) this is
no longer valid. If a vector is "externally visible", multiple threads
can write independent components simultaneously. With lowering to
ir_triop_vector_insert, each thread read the entire vector, changes one
component, then writes out the entire vector. This is racy.
Instead of generating a ir_binop_vector_extract when we see v[i], we
generate ir_dereference_array. We then add a lowering pass to lower the
ir_dereference_array to ir_binop_vector_extract for rvalues and for to
vector_insert for lvalues in a separate lowering pass.
The resulting IR is the same as before, but we now have a window between
ast->ir conversion and the lowering pass where v[i] appears in the IR as
an array deref. This lets us run lowering passes that lower the vector
access to I/O (eg for SSBO load/store) before we lower the per-component
access to full vector writes.
Reviewed-by: Jordan Justen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
All GLSL IR consumers run this lowering pass so we can move it to the
linker. This moves the pass up quite a bit, but that's the point: it
needs to run before we throw away information about per-component vector
access.
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We always pass in shader->ir and we already pass in the shader, so just
drop the exec_list. Most passes either take just a exec_list or a
shader, so this seems more consistent.
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
v2: use nir_bulder_cf_insert (Ken)
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Again, this matches what the builder will have to do.
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Its only user now returns a nir_ssa_def *, and we'll need this since the
builder returns a nir_ssa_def *.
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
A long time ago, before NIR was even merged to master, glsl_to_nir used
registers and these sources were actually register sources. But nowadays
everything in glsl_to_nir is an SSA value, so stop pretending that by
evaluating an rvalue we can get an arbitrary nir_src. Most importantly,
we need this since the builder takes nir_ssa_def * sources directly.
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Ideally we should have a _mesa_cleanup_buffer_object function in
src/mesa/bufferobj.c so that the destruction logic resided in a single
place.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
These tessellation shader related fields need plumbing through NIR.
v2: Use uint32_t instead of uint64_t to match the source type of
GLbitfield (caught by Iago Toral).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
Since X has undefined contents in new pixmaps, it will allocate new
textures for an FBO and draw to them without an explicit clear. For
VC4, it's much faster to emit a clear than the load of the actual
undefined memory contents, so just do that instead.
|
|
|
|
|
|
|
| |
I'm not sure what the caller does is appropriate (just have a NULL sampler
at this slot), but it fixes the immediate crash.
Cc: "11.0" <[email protected]>
|
|
|
|
|
|
|
| |
I was afraid our callers weren't prepared for this, but it looks like
at least for resource creation, mesa/st throws an error appropriately.
Cc: "11.0" <[email protected]>
|
| |
|
| |
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Shared variables are stored in a common pool accessible by all threads
in a compute shader local work group.
These variables are similar to OpenCL's local/__local variables.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
v2:
* Split from patch to add ir_var_shader_shared (tarceri)
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2:
* Move shared parsing under storage qualifiers (tarceri)
* Fail to compile if shared is used in non-compute shader (tarceri)
* Use separate shared_storage bit for shared variables (tarceri)
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Qualifiers on member variables are redundent all we need to do
if check if it matches the stream associated with the block and
throw an error if its not.
Reviewed-by: Samuel Iglesias Gonsalvez <[email protected]>
Cc: Emil Velikov <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
Fixes crash when using HUD with Nobel Clinician Viewer.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The stw_st_framebuffer_present_locked() function was getting called
twice per SwapBuffers. First, when st_context_iface::flush() was
called from DrvSwapBuffers() because the ST_FLUSH_FRONT flag was
given. Second, by stw_st_swap_framebuffer_locked() which does the
actual SwapBuffers.
Two code changes:
1. Pass ST_FLUSH_END_OF_FRAME, instead of ST_FLUSH_FRONT.
2. Move the implementation of stw_flush_current_locked() into
DrvSwapBuffers() since it's not called anywhere else.
Not much change in perf for benchmarks like Lightsmark, but some simple
Mesa demos are measurably faster.
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
And put 8-bit/channel formats before 5/6/5 formats.
The ChoosePixelFormat() function seems to be finicky about format
selection. Putting the MSAA formats after the non-MSAA formats
means most apps get a low-numbered format. Now we generally get
the same pixel format regardless of whether using vgpu9 or 10.
VMware bug 1455030
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
This allows to use apitrace's retracediff script on Windows to retrace and
compare two builds of a Mesa based opengl32.dll/ICD side-by-side.
See also https://github.com/apitrace/apitrace/commit/e4a4f15f5b92e0abbd24d7d053da25f8278c9f64
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes GPUVM conflicts with non-4K page size.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92738
v2: Replace sanitization of VM base address alignment with comment why
that's not necessary.
v3: Use unsigned instead of long as the type for the size_align member.
(Marek)
Cc: [email protected]
Reviewed-by: Christian König <[email protected]> (v1)
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This will allow dec/enc/transcode without X
v2: use env override even with X,
use loader_open_device instead of open
v3: clean up
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
| |
v2: move the dup to vl_wys_drm for pipe loader
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This will allow the state trackers to use render nodes
with screen creation
v2: dup fd for pipe loader
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
| |
There is no dev in drv, and dev should be from vl_screen here
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Altough the compute support is still not complete because textures and
surfaces need to be implemented, it allows to launch very simple compute
kernel like one which reads reading MP performance counters.
This turns on PIPE_CAP_COMPUTE and PIPE_SHADER_COMPUTE.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
There might only be a single arg (e.g. cvt), so use mode rather than
looking at the source directly. Also we don't want to rely on the type
of the value, which can be unreliable, but instead use the
instruction's. This works out well since mkSplit doesn't adjust the
type.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Not reachable from TGSI since it only has UMUL, no IMUL. However it's
surprising that setting argument types to s32 will cause sign to get
lost.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Force the fence to get kicked off, which won't actually wait for its
completion, but any additional work will be put onto a fresh list.
This fixes crashes in teximage-colors --benchmark with too many active
maps.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
As pointed out by Emil, this sometimes hangs, appears to be due to threading
need to rethink how this stuff works for llvmpipe.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Coverity reported that ret could only be 0 or 1, since it
was setting ret = fn() > 0, instead of doing (ret = fn()) > 0.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we were assuming that everything read/wrote exactly 1 logical
GRF (1 in SIMD8 and 2 in SIMD16). This isn't actually true. In
particular, the PLN instruction reads 2 logical registers in one of the
components. This commit changes post-RA scheduling to use regs_read and
regs_written instead so that we add enough dependencies.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92770
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
| |
There are a few non-stoney changes too.
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
| |
v2: set emit_scratch_reloc, add a NULL check
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
v2: don't call get_flush_flags twice per function
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
This should improve performance for big copies that need to be split.
Reviewed-by: Michel Dänzer <[email protected]>
|