| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is done instead of copy propagating the VPM reads into the
instructions using them, because VPM reads have to stay in order.
shader-db results:
total instructions in shared programs: 78509 -> 78114 (-0.50%)
instructions in affected programs: 5203 -> 4808 (-7.59%)
total estimated cycles in shared programs: 234670 -> 234318 (-0.15%)
estimated cycles in affected programs: 5345 -> 4993 (-6.59%)
Signed-off-by: Varad Gautam <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Tested-by: Rhys Kidd <[email protected]>
|
|
|
|
|
|
|
|
| |
This file will contain optimization passes for both vpm reads
and writes.
Signed-off-by: Varad Gautam <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some rasterization code relies (for sse) on the first and third planes
(but not the second for now) being 128bit aligned, and we didn't get that
on 32bit - I mistakenly thought the 64bit number in the struct would get
the thing aligned to 64bit even on 32bit archs.
Stephane Marchesin really figured this out.
Reviewed-by: Jose Fonseca <[email protected]>
CC: <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The logic was comparing actual ints, not true/false values.
This meant that it was emitting always multiple line segments instead of just
one even if the stipple test had the same result, which looks inefficient, and
the segments also overlapped thus breaking line aa as well.
(In practice, with the no-op default line stipple pattern, for a 10-pixel
long line from 0-9 it was emitting 10 segments, with the individual segments
ranging from 0-1, 0-2, 0-3 and so on.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94193
Reviewed-by: Jose Fonseca <[email protected]>
CC: <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All these img filter loops iterate through NUM_CHANNELS, not QUAD_SIZE.
In practice both are of course the same unchangeable value (4), but it
makes the code look a bit confusing. Moreover, some of the functions were
actually given an array of 4 values according to the declaration, yet the
code was addressing values 0/4/8/12 out of it, so fix this by just saying
it's a pointer to floats like the other functions.
While here, also add comment about not quite correct filtering.
There's no actual code difference.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The filt_args->offset wasn't assigned but was always used later leading
to a crash (as far as I can tell, texel offsets don't actually make much
sense with anisotropic filtering, but because there's no explicit setting
if offsets are enabled there the array is always accessed).
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94481
Reviewed-by: Eduardo Lima Mitev <[email protected]>
CC: <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This allows drivers to make smarter decisions e.g. about whether the image
has to be decompressed.
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Frontends should have this information readily available, and it simplifies
image LOAD/STORE/ATOM* handling especially with indirect image access.
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After pipe_grid_info.indirect was introduced, clover was not modified
to set it causing it to pass uninitialized memory for it to launch_grid.
This commit fixes this by zero-ing the entire pipe_grid_info struct when
declaring it, to avoid similar problems popping-up in the future.
Cc: "11.2" <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
[ Francisco Jerez: Trivial codestyle fix. ]
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
| |
Better tracking of resource state and synchronization.
A follow on commit will clean up resource functions into a new
swr_resource.cpp file.
Reviewed-By: George Kyriazis <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Pierre Moreau <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following commits broke things by starting to feed us unhandled
extract_u16/extract_u8 opcodes:
commit 905ff861982450831a56d112036f68a751337441
Author: Matt Turner <[email protected]>
AuthorDate: Wed Feb 3 14:28:31 2016 -0800
Commit: Matt Turner <[email protected]>
CommitDate: Fri Mar 4 11:52:34 2016 -0800
nir: Recognize open-coded extract_u16.
commit 76289fbfa84a06ef4db8ad44ea0eb88ad0be8d5c
Author: Matt Turner <[email protected]>
AuthorDate: Thu Jan 21 09:09:48 2016 -0800
Commit: Matt Turner <[email protected]>
CommitDate: Fri Mar 4 11:52:34 2016 -0800
nir: Recognize open-coded extract_u8.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
First off, st/mesa lowers DSQRT incorrectly (it uses CMP to attempt to
find out whether the input is less than 0). Secondly the current
approach (x * rsq(x)) behaves poorly for x = inf - a NaN is produced
instead of inf.
Instead we switch to the less accurate rcp(rsq(x)) method - this behaves
nicely for all valid inputs. We still don't do this for DSQRT since the
RSQ/RCP ops are *really* inaccurate, and don't even have Newton-Raphson
steps right now. Eventually we should have a separate library function
for DSQRT that does it more precisely (and perhaps move this lowering to
the post-opt phase).
This fixes a number of dEQP precision tests that were expecting better
behavior for infinite inputs.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea is that a single triangle will cover the whole area being
drawn, allowing the blit shader to do its work. However the max fb size
is 16384x16384, which means that the triangle we draw needs to be twice
that in order to cover the whole area fully. Increase the size of the
triangle to 32768x32768.
This fixes a number of dEQP tests that were failing because a blit was
involved which would miss some of the resulting texture.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "11.1 11.2" <[email protected]>
|
|
|
|
|
|
| |
Make sure we use OUT_RELOCW() in cases where the buffer is written to.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
No need to open-code this.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Bitfields where shuffled around for the better on a4xx, so we don't need
any patching on this one. It appears to be something we set entirely in
the gmem code so no conflict between tiling and render state like we had
in a3xx.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Move where we pick dummy FS for binning pass, so the whole driver sees
the same dummy/no-op FS stage.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Most of the driver just needs read-only access, so constify..
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sampler states don't really make sense with buffer textures, but they
can be set anyway, so we need to be defensive here. This bug was lurking
for a while and was finally noticed due to PBO uploads setting sampler
states.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94284
Cc: [email protected]
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Laurent Carlier <[email protected]>
Tested-by: Shawn Starr <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Boyuan Zhang <[email protected]>
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Boyuan Zhang <[email protected]>
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Boyuan Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Clear DCC flags if necessary when binding a new sampler view.
v2: Do not reset DCC flags of bound sampler views.
v3: Check that we have a real texture (Nicolai)
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Found by inspection of the source based on a bisected bug report.
This bug has been in the code for a long time, but the more recent PBO upload
feature exposed it because it leads to more uses of buffer textures.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94388
Reviewed-by: Marek Olšák <[email protected]>
Cc: "11.0 11.1 11.2" <[email protected]>
|
|
|
|
|
|
|
|
| |
This will allow the nouveau backend to not try and split up ops that are
fused in GLSL.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Since it is all about calling into blitter functions, it makes more
sense here. This change also reduces the size of the interfaces between
.c files.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is an annoying corner case that I stumbled across while looking into
piglit's arb_shader_image_load_store/execution/load-from-cleared-image.shader_test
(which can be easily adapted to demonstrate the bug without the
ARB_shader_image_load_store extension)
When we bind a texture and then clear it using glClear (by attaching it
to the current framebuffer) for the first time, we allocate a separate
cmask for the texture to do fast clear, but the corresponding bit in
compressed_colortex_mask is not set. Subsequent rendering will use
incorrect data.
Conversely, when a currently bound texture with an existing cmask is
exported leading to that cmask being disabled, the compressed_colortex_mask
bit will remain set, leading to an assertion later on in debug builds.
Since iterating through all contexts and/or remembering where every
texture is bound would be costly, and cmask enable/disable should be
rare, we will maintain a global counter to signal contexts that they
must update their compressed_colortex_masks.
This patch introduces the global counter, and subsequent patches will
do the mask update.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Remove use of a win32-style type leaked from the swr rasterizer.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Because compute support is not enabled by default for these chipsets,
NVF0_COMPUTE=1 needs to be used, along with GALLIUM_HUD to enable
performance counters.
Signed-off-by: Samuel Pitoiset <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
This is really verbose but most of the configuration will be reused
for SM35 (GK110).
Signed-off-by: Samuel Pitoiset <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This follows the same design as MP perf counters.
Signed-off-by: Samuel Pitoiset <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This mainly improves how we define the different list of queries.
Signed-off-by: Samuel Pitoiset <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
CXX codegen/nv50_ir.lo
In file included from codegen/nv50_ir.cpp:28:
./nouveau_debug.h:19:30: error: invalid suffix on literal; C++11 requires a space between literal and identifier
[-Wreserved-user-defined-literal]
fprintf(stderr, "%s:%d - "fmt, __FUNCTION__, __LINE__, ##args)
^
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|