| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes a classic "confuse the enemy" bug.
_mm_andnot_si128 (SSE) and vec_andc (VMX) do the same operation, but the
arguments are opposite.
_mm_andnot_si128 performs "r = (~a) & b" while
vec_andc performs "r = a & (~b)"
To make sure this error won't return in another place, I added a wrapper
function, vec_andnot_si128, in u_pwr8.h, which makes the swap inside.
Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Only user is freedreno, and after array-rework it can cope. Avoids
generating loads for a store.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This adds code that is basically the same as the code in umod, udiv and idiv.
However, unlike idiv we return -1.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
| |
Discovered by accident, valgrind was complaining (could have possibly caused
us to create redundant geometry shader variants).
v2: convinced by Brian and Jose, just use memset for both gs and vs keys,
just as easy and less error prone.
|
|
|
|
|
| |
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
Handle other formats than YV12 as well.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
Useful for the state trackers as well.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This hasn't been in use since c476305 ("gallium/util: pregenerate
half float tables"), where the last bit of run-time init using this
was killed. So let's just get rid of the pointless header.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Each load/store on most hardware can specify what caching to do. Since
SSBO allows individual variables to also have separate caching modes,
allow loads/stores to have the qualifiers instead of attempting to
encode them in declarations.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
| |
It can be trivially derived from the number of already declared system
values. This allows ureg users not to worry about which index to choose.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The sse path was pretty much disabled for practical purposes because the
largest allowed fb size was 128x128. So, adapt it for 64bit plane calculations.
This is actually not that difficult, though a problem is that we can't do
a signed 32x32->64bit mul, only unsigned, so need to fix that up. Overall,
the code still looks reasonable, though it's not like changes there in
setup really make much of a difference in the end...
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Otherwise, clipped lines would have undefined stippling reset bit if line
stippling is enabled.
(Untested, and I just assume copying over the bits from the original line
is actually the right thing to do.)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The unfilled stage was not filling in the prim header, and the line stage
then decided to reset the stipple counter or not based on the uninitialized
data. This causes some failures in conform linestipple test (albeit quite
randomly happening depending on environment).
So fill in the prim header in the unfilled stage - I am not entirely sure
if anybody really needs determinant after that stage, but there's at least
later stages (wide line for instance) which copy over the determinant as well.
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In lp_build_conv() and lp_build_conv_auto(), there is a special case of
conversion when sse2 is present. That code path is suitable without any
changes to altivec, because all the functions that are called in that
code path already support altivec.
This patch increase the FPS in POWER arch across the board
between 10%-25%
I checked ipers, glxgears, glxspheres64, openarena, xonotic and glmark2.
Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
This will be used by radeonsi.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This has no users in Mesa.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This will be used by radeonsi.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
draw emit couldn't care less what the interpolation mode is...
This somehow looked like it would matter, all drivers more or less
dutifully filled that in correctly. But this is only used for emit,
if draw needs to know about interpolation mode (for clipping for instance)
it will get that information from the vs anyway.
softpipe actually used to depend on that interpolation parameter, as it
abused that structure quite a bit but no longer.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the code would just redirect requests for attributes which
don't exist to use output 0. Rework this to output all zeros instead which
seems more useful - in particular some extensions like
ARB_fragment_layer_viewport require 0 in the fs even if it wasn't output by
previous stages. That way, drivers don't have to special case this depending
if the vs/gs outputs some attribute or not.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
| |
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fix silly issue with MSVC case fall-though support to need
a extra 'break;'
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This file provides a portability layer that will make it easier to convert
SSE-based functions to VMX/VSX-based functions.
All the functions implemented in this file are prefixed using "vec_".
Therefore, when converting from SSE-based function, one needs to simply
replace the "_mm_" prefix of the SSE function being called to "vec_".
Having said that, not all functions could be converted as such, due to the
differences between the architectures. So, when doing such
conversion hurt the performance, I preferred to implement a more ad-hoc
solution. For example, converting the _mm_shuffle_epi32 needed to be done
using ad-hoc masks instead of a generic function.
All the functions in this file support both little-endian and big-endian
but currently the file is build only on POWER8 LE machine.
All of the functions are implemented using the Altivec/VMX intrinsics,
except one where I needed to use inline assembly (due to missing
intrinsic).
v2:
- Use vec_vgbbd instead of __builtin_vec_vgbbd
- Add an aligned load function
- Don't use typeof()
- Make file build only on POWER8 LE machine
Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
| |
|
|
|
|
|
|
| |
Like debug_dump_float_rgba_bmp() but takes ubyte values.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The fixed alignment of u_upload_mgr will go away.
This is the first step.
The motivation is that one u_upload_mgr can have multiple users,
each allocating from the same buffer, but requiring a different alignment.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The function only aligned the size, but not the offset.
The offset was aligned only when the previous suballocation was aligned.
That yielded the correct offset alignment if the alignment was constant
for all suballocations.
Instead, directly align the offset, but allow an unaligned size.
There is no change in behavior, because the alignment is constant
at the moment.
This a prerequisite for allowing a variable alignment for suballocations.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
A hugely common case when using nir_builder is to have a shader with a
single function called main. This adds a helper that gives you just that.
This commit also makes us use it in the NIR control-flow unit tests as well
as tgsi_to_nir and prog_to_nir.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This is analogous to the alreading existing macros for BOOL, NUM, and FLAGS.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When Connor originally drafted NIR, he copied the same function+overload
system that GLSL IR had with a few names changed. However, this
double-indirection is not really needed and has only served to confuse
people. Instead, let's just have functions which may not have unique names
and may or may not have an implementation. If someone wants to do overload
resolving, they can hav a hash table based function+overload system in the
overload resolving pass. There's no good reason to keep it in core NIR.
Reviewed-by: Connor Abbott <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
ir3 bits are
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NIR has never been built with MSVC2008, so we shouldn't add
MSVC2008_COMPAT_CFLAGS to anything that uses it. This allows us to get
rid of the pragma in tgsi_to_nir.c.
Build tested with freedreno.
v2: Use MSVC2013_COMPAT_CLFAGS instead.
Reviewed-by: Jose Fonseca <[email protected]>
Signed-off-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tessellation control shaders need to be careful when writing outputs.
Because multiple threads can concurrently write the same output
variables, we need to only write the exact components we were told.
Traditionally, for sub-vector writes, we've read the whole vector,
updated the temporary, and written the whole vector back. This breaks
down with concurrent access.
This patch prepares the way for a solution by adding a writemask field
to store_var intrinsics, as well as the other store intrinsics. It then
updates all produces to emit a writemask of "all channels enabled". It
updates nir_lower_io to copy the writemask to output store intrinsics.
Finally, it updates nir_lower_vars_to_ssa to handle partial writemasks
by doing a read-modify-write cycle (which is safe, because local
variables are specific to a single thread).
This should have no functional change, since no one actually emits
partial writemasks yet.
v2: Make nir_validate momentarily assert that writemasks cover the
complete value - we shouldn't have partial writemasks yet
(requested by Jason Ekstrand).
v3: Fix accidental SSBO change that arose from merge conflicts.
v4: Don't try to handle writemasks in ir3_compiler_nir - my code
for indirects was likely wrong, and TTN doesn't generate partial
writemasks today anyway. Change them to asserts as requested by
Rob Clark.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]> [v3]
|
|
|
|
|
|
|
|
|
|
|
| |
As in the previous patches, these can be implemented as
any(v) -> any_nequal(v, false)
all(v) -> all_equal(v, true)
and their removal simplifies the code in the next patch.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
as it uses definition from it (enum tgsi_return_type).
|