| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is done instead of copy propagating the VPM reads into the
instructions using them, because VPM reads have to stay in order.
shader-db results:
total instructions in shared programs: 78509 -> 78114 (-0.50%)
instructions in affected programs: 5203 -> 4808 (-7.59%)
total estimated cycles in shared programs: 234670 -> 234318 (-0.15%)
estimated cycles in affected programs: 5345 -> 4993 (-6.59%)
Signed-off-by: Varad Gautam <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Tested-by: Rhys Kidd <[email protected]>
|
|
|
|
|
|
|
|
| |
This file will contain optimization passes for both vpm reads
and writes.
Signed-off-by: Varad Gautam <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some rasterization code relies (for sse) on the first and third planes
(but not the second for now) being 128bit aligned, and we didn't get that
on 32bit - I mistakenly thought the 64bit number in the struct would get
the thing aligned to 64bit even on 32bit archs.
Stephane Marchesin really figured this out.
Reviewed-by: Jose Fonseca <[email protected]>
CC: <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The logic was comparing actual ints, not true/false values.
This meant that it was emitting always multiple line segments instead of just
one even if the stipple test had the same result, which looks inefficient, and
the segments also overlapped thus breaking line aa as well.
(In practice, with the no-op default line stipple pattern, for a 10-pixel
long line from 0-9 it was emitting 10 segments, with the individual segments
ranging from 0-1, 0-2, 0-3 and so on.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94193
Reviewed-by: Jose Fonseca <[email protected]>
CC: <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All these img filter loops iterate through NUM_CHANNELS, not QUAD_SIZE.
In practice both are of course the same unchangeable value (4), but it
makes the code look a bit confusing. Moreover, some of the functions were
actually given an array of 4 values according to the declaration, yet the
code was addressing values 0/4/8/12 out of it, so fix this by just saying
it's a pointer to floats like the other functions.
While here, also add comment about not quite correct filtering.
There's no actual code difference.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The filt_args->offset wasn't assigned but was always used later leading
to a crash (as far as I can tell, texel offsets don't actually make much
sense with anisotropic filtering, but because there's no explicit setting
if offsets are enabled there the array is always accessed).
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94481
Reviewed-by: Eduardo Lima Mitev <[email protected]>
CC: <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This allows drivers to make smarter decisions e.g. about whether the image
has to be decompressed.
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This is required to preserve the image variable's coherent/restrict/volatile
qualifiers in TGSI.
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Frontends should have this information readily available, and it simplifies
image LOAD/STORE/ATOM* handling especially with indirect image access.
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The enums MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS and
MAX_COMBINED_SHADER_OUTPUT_RESOURCES are equal and should therefore only
appear once.
Noticed while implementing ARB_shader_image_load_store without previously
implementing SSBO.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Should have no functional change. The IP value of an instruction that
reads src_var cannot possibly be after the end of the live interval of
the variable it's reading from, by the definition of live interval.
Might save future readers a momentary WTF while trying to understand
this code.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bug found by the liveness analysis validation pass that will be
introduced in a later commit. The no-op MOV check in
opt_register_coalesce() was removing instructions which makes the
cached liveness analysis calculation inconsistent with the shader IR.
We were failing to set progress to true in that case though, which
means that invalidate_live_intervals() wouldn't necessarily be called
at the end of the function.
Cc: [email protected]
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Bug found by the liveness analysis validation pass that will be
introduced in a later commit. fixup_3src_null_dest() was allocating
registers which makes the cached liveness analysis calculation
incomplete, so it must be invalidated.
Cc: [email protected]
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Bug found by the liveness analysis validation pass that will be
introduced in a later commit. opt_sampler_eot() was allocating
registers and inserting and removing instructions, which makes the
cached liveness analysis calculation inconsistent with the shader IR,
so it must be invalidated.
Cc: [email protected]
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After pipe_grid_info.indirect was introduced, clover was not modified
to set it causing it to pass uninitialized memory for it to launch_grid.
This commit fixes this by zero-ing the entire pipe_grid_info struct when
declaring it, to avoid similar problems popping-up in the future.
Cc: "11.2" <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
[ Francisco Jerez: Trivial codestyle fix. ]
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
| |
This should help the next person working on hardware enabling figure out
where in the Intel PRMs to find the magic platform hardware values.
Signed-off-by: Sarah Sharp <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When reading the source code, it's useful to indicate that a group of
fields in a struct are related in someway. There were several places
where people tried to group related structure members with the {@
syntax, without realizing they also needed to add the \name syntax in
order to generate correct doxygen html.
There are several files with groupings that look like this:
struct foo {
/**
* Related fields description
* @{
*/
int bar;
char baz;
/** @} */
long qux;
}
However, the doxygen syntax for grouping is:
struct foo {
/**
* \name Related fields description
* @{
*/
int bar;
char baz;
/** @} */
long qux;
}
https://www.stack.nl/~dimitri/doxygen/manual/grouping.html
Without the group name definition, the fields don't get properly
grouped. Instead, the group description is applied to the first field.
Fix the Intel hardware information structure, brw_device_info to
properly group the GPU hardware limitations and hardware quirks fields.
Once you've run `cd doxygen; make clean; make all`,
updated documentation can be found at
mesa/doxygen/i965/structbrw__device__info.html
Signed-off-by: Sarah Sharp <[email protected]>
|
|
|
|
|
|
|
|
| |
Better tracking of resource state and synchronization.
A follow on commit will clean up resource functions into a new
swr_resource.cpp file.
Reviewed-By: George Kyriazis <[email protected]>
|
|
|
|
|
|
| |
since 737b6ed13e8f813987b5566004f0f45e9c55f1e8
src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c no longer compiles:
error: unknown type name ‘drmDevicePtr’
|
|
|
|
|
|
|
|
|
|
|
| |
From the point it's constructed the CFG contains the only existing
copy of the program IR, and it never becomes invalid. Calling
backend_shader::invalidate_cfg would have destroyed the program
structure irrecoverably -- We weren't calling it at all for a good
reason.
Reviewed-by: Kenneth Graunke <[email protected]
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Pierre Moreau <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following commits broke things by starting to feed us unhandled
extract_u16/extract_u8 opcodes:
commit 905ff861982450831a56d112036f68a751337441
Author: Matt Turner <[email protected]>
AuthorDate: Wed Feb 3 14:28:31 2016 -0800
Commit: Matt Turner <[email protected]>
CommitDate: Fri Mar 4 11:52:34 2016 -0800
nir: Recognize open-coded extract_u16.
commit 76289fbfa84a06ef4db8ad44ea0eb88ad0be8d5c
Author: Matt Turner <[email protected]>
AuthorDate: Thu Jan 21 09:09:48 2016 -0800
Commit: Matt Turner <[email protected]>
CommitDate: Fri Mar 4 11:52:34 2016 -0800
nir: Recognize open-coded extract_u8.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
First off, st/mesa lowers DSQRT incorrectly (it uses CMP to attempt to
find out whether the input is less than 0). Secondly the current
approach (x * rsq(x)) behaves poorly for x = inf - a NaN is produced
instead of inf.
Instead we switch to the less accurate rcp(rsq(x)) method - this behaves
nicely for all valid inputs. We still don't do this for DSQRT since the
RSQ/RCP ops are *really* inaccurate, and don't even have Newton-Raphson
steps right now. Eventually we should have a separate library function
for DSQRT that does it more precisely (and perhaps move this lowering to
the post-opt phase).
This fixes a number of dEQP precision tests that were expecting better
behavior for infinite inputs.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea is that a single triangle will cover the whole area being
drawn, allowing the blit shader to do its work. However the max fb size
is 16384x16384, which means that the triangle we draw needs to be twice
that in order to cover the whole area fully. Increase the size of the
triangle to 32768x32768.
This fixes a number of dEQP tests that were failing because a blit was
involved which would miss some of the resulting texture.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "11.1 11.2" <[email protected]>
|
|
|
|
|
|
| |
Make sure we use OUT_RELOCW() in cases where the buffer is written to.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
No need to open-code this.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Bitfields where shuffled around for the better on a4xx, so we don't need
any patching on this one. It appears to be something we set entirely in
the gmem code so no conflict between tiling and render state like we had
in a3xx.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Move where we pick dummy FS for binning pass, so the whole driver sees
the same dummy/no-op FS stage.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Most of the driver just needs read-only access, so constify..
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sampler states don't really make sense with buffer textures, but they
can be set anyway, so we need to be defensive here. This bug was lurking
for a while and was finally noticed due to PBO uploads setting sampler
states.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94284
Cc: [email protected]
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Laurent Carlier <[email protected]>
Tested-by: Shawn Starr <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
Since we aren't going to put the function parameters or the return variable
in the list of locals, it won't get a proper declaration. This changes
nir_print to print the type along with each parameter or return variable.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
Otherwise, we have a problem when we go to print functions with arguments
because their names get added to the hash table during declaration which
happens after we print the prototype.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|