This splits emit_fb_writes into two functions: emit_fb_writes and
emit_single_fb_write. This reduces the amount of duplicated code in
emit_fb_writes and makes the register number fiddling less arcane.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>

Sometimes these show up in LOAD_PAYLOAD instructions and it's nice to be
able to see them.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>

Previously we disabled compact_virtual_grfs when dumping optimizations.
The idea here was to make it easier to diff the dumped shaders, because
registers didn't suddenly get renamed partway through. However, sometimes a
bug is affected by compact_virtual_grfs and, when this happens, you want to
keep dumping instructions with compact_virtual_grfs enabled. By turning it
into an optimization pass and dumping it along with the others, we retain
the ability to diff, because you can just diff against the output of
compact_virtual_grfs.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>

Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>

We also set the register width equal to the dispatch width. Right now,
this is effectively a no-op since we don't do anything with it. However,
it will be important once we add an actual width field to fs_reg.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>

Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>

Previously, if an instruction wrote to more than one register, we
implicitly assumed that it filled the entire register. We never hit this
before because the only time we did multi-register writes was things like
texturing which always wrote to all of the registers. However, with the
upcoming ability to do 16-wide instructions in SIMD8 and things of that
nature, we can have multi-register writes at offsets and we'll hit this.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
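The underlying idea is to track exactly which registers of a virtual GRF an instruction writes, instead of assuming that any multi-register write fills the whole VGRF. A minimal sketch, assuming a hypothetical `struct write` with `reg_offset` and `regs_written` fields (an illustration, not Mesa's actual data structures), that builds a per-register mask and only reports a full overwrite when every register is covered:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical destination description: the instruction writes
 * regs_written consecutive registers starting at reg_offset in a VGRF. */
struct write {
   unsigned reg_offset;
   unsigned regs_written;
};

/* Bitmask of the VGRF registers touched by this write (VGRF size <= 32). */
static uint32_t written_mask(struct write w)
{
   assert(w.reg_offset + w.regs_written <= 32);
   if (w.regs_written == 0)
      return 0;
   uint32_t span = (w.regs_written == 32) ? UINT32_MAX
                                          : ((1u << w.regs_written) - 1u);
   return span << w.reg_offset;
}

/* True only if the write covers every register of a VGRF of vgrf_size.
 * Assuming "multi-register write == whole VGRF" would wrongly say yes
 * for a 2-register write at offset 2 of a 4-register VGRF. */
static bool fully_overwrites(struct write w, unsigned vgrf_size)
{
   assert(vgrf_size >= 1 && vgrf_size <= 32);
   uint32_t all = (vgrf_size == 32) ? UINT32_MAX : ((1u << vgrf_size) - 1u);
   return (written_mask(w) & all) == all;
}
```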
instructions
Using a floating-point type doesn't usually cause hangs on my HSW, but the
simulator complains about it quite a bit.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>

We have this wonderful offset() function for advancing registers, but we're
not using it. Using offset() allows us to do some sanity checking and
avoid manually touching fs_reg::reg_offset. In a few commits, we will make
offset() do even more nifty things for us.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
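Routing every register advance through one helper gives a single place for sanity checks. A simplified sketch, using a hypothetical stand-in for `fs_reg` that keeps only a file tag and `reg_offset` (the real class has many more fields, and Mesa's actual checks may differ):

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-in for fs_reg. */
enum reg_file { FILE_GRF, FILE_IMM };

struct fs_reg_sketch {
   enum reg_file file;
   unsigned reg_offset;   /* offset in registers within the virtual GRF */
};

/* Return a copy of reg advanced by delta registers. Centralizing this
 * lets us assert that nobody offsets an immediate, which has no
 * registers to step through. */
static struct fs_reg_sketch offset(struct fs_reg_sketch reg, unsigned delta)
{
   assert(delta == 0 || reg.file != FILE_IMM);
   reg.reg_offset += delta;
   return reg;
}
```

Because the helper takes and returns the register by value, callers get a new register and the original stays untouched.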
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>

The original vgrf splitting code was written with the assumption that vgrfs
came in two types: those that can be split into single registers and those
that can't be split at all. It was very conservative and bailed as soon as
more than one element of a register was read or written. This won't work
once we start allowing a regular MOV or ADD operation to operate on
multiple registers. This rewrite allows for the case where a vgrf of size
5 may appropriately be split into one register of size 1 and two registers
of size 2.
Signed-off-by: Jason Ekstrand <[email protected]>
Acked-by: Matt Turner <[email protected]>
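One way to get the finer-grained splitting described above is to record, for each boundary between adjacent registers of a VGRF, whether any instruction reads or writes across it, and then cut at every boundary nobody straddles. A minimal sketch, assuming each access is described by a hypothetical (offset, size) pair in registers; it is not the actual Mesa pass:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VGRF_SIZE 16

/* Hypothetical access record: an instruction touching `size` registers
 * starting at `offset` within the VGRF. */
struct access {
   unsigned offset;
   unsigned size;
};

/* Split a VGRF of vgrf_size registers given its accesses. Writes the
 * sizes of the resulting pieces to pieces[] and returns their count. */
static unsigned split_vgrf(unsigned vgrf_size, const struct access *acc,
                           unsigned n_acc, unsigned pieces[MAX_VGRF_SIZE])
{
   /* can_split[i]: may we cut between register i-1 and register i? */
   bool can_split[MAX_VGRF_SIZE];
   assert(vgrf_size >= 1 && vgrf_size <= MAX_VGRF_SIZE);
   for (unsigned i = 0; i < vgrf_size; i++)
      can_split[i] = true;

   /* An access spanning several registers pins its interior boundaries. */
   for (unsigned a = 0; a < n_acc; a++) {
      assert(acc[a].offset + acc[a].size <= vgrf_size);
      for (unsigned i = acc[a].offset + 1; i < acc[a].offset + acc[a].size; i++)
         can_split[i] = false;
   }

   /* Walk the registers, closing a piece at every allowed boundary. */
   unsigned n = 0, start = 0;
   for (unsigned i = 1; i <= vgrf_size; i++) {
      if (i == vgrf_size || can_split[i]) {
         pieces[n++] = i - start;
         start = i;
      }
   }
   return n;
}
```

With a size-5 VGRF accessed as (0, 1), (1, 2) and (3, 2), this yields pieces of size 1, 2 and 2, the case the message mentions.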
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>

Previously, we were generating the fast-clear shader from GLSL. The
problem is that fast clears require that we use a replicated write rather
than a regular write instruction. In order to get this we had a
complicated and somewhat fragile optimization pass that looked for places
where we could use a replicated write and used one there. Since replicated writes
have a lot of restrictions, we only ever use them for fast-clear
operations.
This commit replaces the optimization pass with a function that just
generates the shader we want. This is a) less code, b) less fragile than
the optimization pass, and c) generates a more efficient shader.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>

Otherwise some parts of tiled slices can be missed.
Reviewed-by: Marek Olšák <[email protected]>

Reviewed-by: Marek Olšák <[email protected]>

Fixes GPUVM faults when running the piglit test "getteximage-formats
init-by-rendering" with R600_DEBUG=forcedma on SI.
Reviewed-by: Marek Olšák <[email protected]>

We are currently only dealing with depth-only or stencil-only resources
here, not with resources having both depth and stencil[0]. In both cases,
the tiling mode index is in the tile_mode field, not in the
stencil_tile_mode field.
[0] Add an assertion for that.
Reviewed-by: Marek Olšák <[email protected]>

The VE format of edge flag pointers was changed in
780ce576bb1781f027797039693b98253ee4813e.
Signed-off-by: Chia-I Wu <[email protected]>

Add finalize_vertex_elements() to finalize ilo_ve_state. This fixes a
potential issue with URB entry allocation for VS and moves the complexity of
gen6_3DSTATE_VERTEX_ELEMENTS() to the new function.
Signed-off-by: Chia-I Wu <[email protected]>

To replace the hacky zs_align_surface().
Signed-off-by: Chia-I Wu <[email protected]>

The size is always 24 bytes. We can upload them to the dynamic buffer.
Signed-off-by: Chia-I Wu <[email protected]>

Commit "st/xa: scissor to help tilers" broke xa_yuv_planar_blit() and vmwgfx
textured video. Fix this by implementing scissors also in the yuv draw path.
Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Sinclair Yeh <[email protected]>
Cc: Rob Clark <[email protected]>
Cc: "10.2 10.3" <[email protected]>

Unused; it was replaced by include/pci_ids/i965_pci_ids.h long ago.
Acked-by: Matt Turner <[email protected]>

Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>

Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>

Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>

The code was mixed up about which buffers were getting stored in the case
that a resolve bit was unset (resolve bits are set based on the GL state at
draw time) and the buffer wasn't actually bound. In particular, depth-only
rendering would store the color buffer contents, which happen to be
pointing at the depth buffer.
Thanks to clearing out the resolve bits for things we really can't
resolve, now I can drop the safety checks for buffer presence around the
actual stores.
Fixes 42 piglit tests.
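The cleanup described above (clear resolve bits for buffers that aren't bound, so the store path can trust the bits without presence checks) can be sketched as follows. The bit names, `struct job_sketch`, and the printed "stores" are all hypothetical; only the masking idea comes from the message:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical resolve bits, set from GL state at draw time. */
#define RESOLVE_COLOR   (1u << 0)
#define RESOLVE_DEPTH   (1u << 1)
#define RESOLVE_STENCIL (1u << 2)

struct job_sketch {
   unsigned resolve;   /* buffers the draw wants stored */
   bool has_color;     /* is a color buffer actually bound? */
   bool has_zs;        /* is a depth/stencil buffer actually bound? */
};

/* Drop resolve bits for buffers we really can't resolve. */
static void mask_unresolvable(struct job_sketch *job)
{
   if (!job->has_color)
      job->resolve &= ~RESOLVE_COLOR;
   if (!job->has_zs)
      job->resolve &= ~(RESOLVE_DEPTH | RESOLVE_STENCIL);
}

/* Emit the stores (printed here) and return the bits actually stored. */
static unsigned emit_stores(struct job_sketch *job)
{
   mask_unresolvable(job);
   /* The invariant the store code now relies on instead of safety checks. */
   assert(job->has_color || !(job->resolve & RESOLVE_COLOR));
   if (job->resolve & RESOLVE_COLOR)
      printf("store color\n");
   if (job->resolve & (RESOLVE_DEPTH | RESOLVE_STENCIL))
      printf("store depth/stencil\n");
   return job->resolve;
}
```

For depth-only rendering with a stale color bit set, only the depth store survives.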
Notice the mistaken (but harmless) argument swapping in brw_math_invert().
Reviewed-by: Kenneth Graunke <[email protected]>

Tested-by: Mark Janes <[email protected]>

My attempts to clarify the code with _compacted/_uncompacted prefixed
variables apparently failed. Hopefully this is clearer.
In any case, the previous code wasn't clear enough to gcc to let it
optimize division by a power of two into a shift. No problems now.
Also, the previous code (in the ADD case) didn't work on 32-bit x86, due
to a complicated set of interactions best summed up as unsigned division
and compiler optimizations.
Tested-by: Mark Janes <[email protected]>
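One way a signed/unsigned division mix breaks only on 32-bit x86: if the divisor has an unsigned type as wide as the platform's `size_t` (a `sizeof` expression, say), a negative signed byte offset is converted to unsigned before dividing. The sketch below emulates both widths with fixed-width types; the function names and the 16-byte unit are hypothetical illustrations, not the code from this commit:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: convert a jump distance in bytes to 16-byte units. */

/* What `jump_bytes / sizeof(insn)` computes with a 32-bit size_t: the
 * signed offset is converted to unsigned first, so -32 becomes huge. */
static int32_t jump_units_size_t32(int32_t jump_bytes)
{
   return (int32_t)((uint32_t)jump_bytes / UINT32_C(16));
}

/* The same expression with a 64-bit size_t. Narrowing the quotient back
 * to int32_t is implementation-defined; GCC and Clang keep the low 32
 * bits, which happens to give the right answer and hides the bug. */
static int32_t jump_units_size_t64(int32_t jump_bytes)
{
   return (int32_t)((uint64_t)jump_bytes / UINT64_C(16));
}

/* Keeping the arithmetic signed is correct on every target, and division
 * by a constant power of two still compiles to a shift (plus a small
 * rounding fixup for negative values). */
static int32_t jump_units_signed(int32_t jump_bytes)
{
   return jump_bytes / 16;
}
```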
We need to keep track of whether a state change other than frag/vert shader
state will require a different shader variant, and if necessary mark the
appropriate shader state as dirty. Otherwise we will forget to re-emit the
shader state.
Signed-off-by: Rob Clark <[email protected]>
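A minimal sketch of that bookkeeping, assuming a hypothetical variant key derived from rasterizer state and hypothetical dirty-bit names: when the key computed from the new state differs from the one the current shaders were built for, the shader state is flagged for re-emission even though the shaders themselves didn't change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical dirty bits. */
#define DIRTY_RASTERIZER (1u << 0)
#define DIRTY_PROG       (1u << 1)

/* Hypothetical shader-variant key built from non-shader state. */
struct variant_key {
   bool color_two_side;   /* from rasterizer state */
   bool flat_shade;       /* from rasterizer state */
};

struct ctx_sketch {
   unsigned dirty;
   struct variant_key key;   /* key of the currently emitted variants */
};

/* Call after non-shader state changes: if the key changed, a different
 * variant is needed, so mark the shader state dirty. */
static void check_variant(struct ctx_sketch *ctx, struct variant_key new_key)
{
   if (new_key.color_two_side != ctx->key.color_two_side ||
       new_key.flat_shade != ctx->key.flat_shade) {
      ctx->key = new_key;
      ctx->dirty |= DIRTY_PROG;
   }
}
```

Comparing keys, rather than marking the shaders dirty on every rasterizer change, avoids re-emitting shader state when the change doesn't affect the variant.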
Signed-off-by: Rob Clark <[email protected]>

Signed-off-by: Rob Clark <[email protected]>

This is for hw that needs to emulate some texture wrap modes (like
CLAMP) with some help from the shader.
Signed-off-by: Rob Clark <[email protected]>

Keep the existing function as a common helper. But this lets us move an
a2xx-specific hack out of common code. And the PIPE_TEX_WRAP_CLAMP
emulation will require an a3xx-specific hack. So rather than piling on
hacks, split this out.
Signed-off-by: Rob Clark <[email protected]>

Signed-off-by: Rob Clark <[email protected]>

Signed-off-by: Rob Clark <[email protected]>

We just clamp the incoming texture coordinates. This breaks the lambda
calculation, but it gets the piglit tests to pass. This is the same
behavior as in i965.
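The per-component operation behind this GL_CLAMP emulation is just a saturate to [0, 1] applied to the coordinate before sampling. A C sketch of that arithmetic, with a hypothetical helper name (the driver would emit the equivalent min/max as shader instructions, not run C code):

```c
#include <assert.h>

/* Saturate a texture coordinate to [0, 1] before the sample. Where the
 * coordinate is clamped it stops varying across the primitive, so
 * derivatives taken from it shrink, which is presumably why the
 * lambda (LOD) calculation comes out wrong. */
static float clamp_tex_coord(float s)
{
   if (s < 0.0f)
      return 0.0f;
   if (s > 1.0f)
      return 1.0f;
   return s;
}
```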
One spot in the docs says that it's stored at a miplevel just beyond the
last miplevel, which was scary. But really, you just load it as the R
coordinate (which conflicts with cubemaps, but you don't do border
clamping on cubes).

We have to expose them for GL 2.0, but we just always return a value of 0.
We should be advertising 0 query bits instead of 64, but gallium doesn't
have plumbing for that yet. At least this stops the segfaults.

Drops instructions on vs-temp-array-mat4-index-col-row-wr.shader_test,
which I was looking at because it's failing to register allocate.

Definitely helps when trying to understand and optimize a program.

This may reduce register pressure and uniform counts. Drops a bunch of 0
uniform loads on vs-temp-array-mat4-index-col-row-wr.shader_test, which is
failing to register allocate.

It's not passing some of the piglit tests, because it looks like at small
miplevels some contents from surrounding faces are getting filtered in at
the corners. It does get 7 new tests passing.

In the other related fields, "0" refers to the size of the first miplevel,
while this is a field in a slice. The other implicit slices we have
(cubemap layers) don't vary in size compared to the first one.

Almost always, the MOV will get copy propagated out. Even if it doesn't,
it's probably better to just reload the uniform at next use (to reduce
register pressure) rather than try to save instruction count.
I was looking at this because in the presence of texturing (which calls
add_uniform() directly to get the uniform load forced into the
instruction) the c->uniform_contents indices don't match 1:1 with the
temporary qregs.

Commit 4ed23fd broke creation of pbuffer surfaces; this patch fixes
the failure, noticed when running chrome with '--use-gl=egl'.
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Juha-Pekka Heikkila <[email protected]>

An 'else' is missing in the disassembler.
Signed-off-by: Chia-I Wu <[email protected]>

According to the GLES spec (1.0 and above), textureCubeLod and
texture2DProjLod are built-in functions. We seem to disable support
for these functions with GLES. This patch enables the support.
Signed-off-by: Kalyan Kondapally <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84355