| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
This exposes OpenGL 4.1 on Maxwell (tested on GM107 and GM206).
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
PFETCH, actually ISBERD on GM107+ ISA only accepts a GPR for src0.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The number of outputs patch (limited to 255) has moved in the TCP
header, but blob seems to also set the old position. Also, the high
8-bits are now located inbetween the min/max parallel output read
address at position 20.
Signed-off-by: Samuel Pitoiset <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes a bunch of multisample piglit tests on GM206, like
bin/arb_texture_multisample-texelfetch 2 -auto -fbo
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
It's illegal to have neg modifiers on both sources for OP_ADD,
and it's illegal to have OP_SUB with just src0 neg.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
Trivial.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to reduce the call indirections with u_resource_vtbl.
The worst call tree you could get was:
- u_transfer_inline_write_vtbl
- u_default_transfer_inline_write
- u_transfer_map_vtbl
- driver_transfer_map
- u_transfer_unmap_vtbl
- driver_transfer_unmap
That's 6 indirect calls. Some drivers only had 5. The goal is to have
1 indirect call for drivers that care. The resource type can be determined
statically at most call sites.
The new interface is:
pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data)
pipe_context::texture_subdata(ctx, resource, level, usage, box, data,
stride, layer_stride)
v2: fix whitespace, correct ilo's behavior
Reviewed-by: Nicolai Hähnle <[email protected]>
Acked-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows the load-propagation pass to swap the sources in presence
of immediate values.
Maxwell (GM107):
total instructions in shared programs :1928187 -> 1927634 (-0.03%)
total gprs used in shared programs :330741 -> 330154 (-0.18%)
total local used in shared programs :28032 -> 28032 (0.00%)
local gpr inst bytes
helped 0 271 425 425
hurt 0 0 194 194
Fermi (GF114):
total instructions in shared programs :2334474 -> 2333829 (-0.03%)
total gprs used in shared programs :380934 -> 380215 (-0.19%)
total local used in shared programs :33304 -> 33264 (-0.12%)
local gpr inst bytes
helped 5 314 521 521
hurt 0 4 195 195
No regressions on GM107 and GF114 with full piglit.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This fixes a regression introduced in
1da704a94c57aa0b0cf8faaa3236fe47dfb8f88c because the offset has moved
from 0x180 to 0x1a0, and the macros have to be re-compiled.
Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver")
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a regression introduced in
1da704a94c57aa0b0cf8faaa3236fe47dfb8f88c because the offset has moved
from 0x600 to 0x620, and the kernels used for reading MP perf counters
have to be re-assembled.
This also fixes amd_performance_monitor_measure piglit.
Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver")
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows Gallium drivers to advertise the subpixel precision
for floating point viewports bounds.
v2:
- Set ViewportSubpixelBits in st_init_limits.
Signed-off-by: Józef Kucia <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
MS images have to be handled explicitly and I don't plan to implement
them for now.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
On Maxwell, images binding is slightly different (and much better)
regarding Fermi and Kepler because a texture view needs to be uploaded
for each image and this is going to simplify the thing a lot.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, we can store 32 tex handles of 32-bits integer each and
that fits perfectly with the underlying hardware except on GM107+
which requires to upload a texture view for each images.
This patch increases the number of storable texture handles in the
driver constant buffer from 32 to 40 because we expose 8 images.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
ADD only allows to emit 19-bits immediates.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
|
|
|
| |
Like FADD32I, the NEG modifier of src0 is at position 56.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Mark both L8_SRGB and L8A8_SRGB as non-renderable (the latter already
didn't have the bind flags). This makes the state tracker pick a
different format when rendering is required, or mark the fb as
incomplete. This fixes:
bin/getteximage-formats init-by-clear-and-render -auto -fbo
bin/getteximage-formats init-by-rendering -auto -fbo
which previously ran into srgb-encoding differences.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
| |
This is useful for pbo downloads, which are now accelerated with images.
BGRA8 is a moderately common format to do that in.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware can only do alphatest when using a blendable format. This
means that the various *16 norm formats didn't work with alphatest. It
appears that Talos Principle uses such formats, as well as alpha tests,
for some internal renders, which made them be incorrect. However this
does not appear to affect the final renders, but in a different game it
easily could.
The approach we take is that when alphatests are enabled and a suitable
format is used (which we anticipate is the vast minority of the time),
we insert code into the shader to perform the comparison and discard.
Once inserted, that code lives in the shader forever, and we re-upload
it each time the function changes with a fixed-up compare. To avoid
re-uploading too often, if we switch back to a blendable format, the
test is (effectively) disabled and the hw alphatest functionality is
used.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Fixes: 2aa1197 ("nouveau: Add support for SV_WORK_DIM")
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
|
|
|
|
| |
Signed-off-by: Ben Skeggs <[email protected]>
|
|
|
|
|
|
|
| |
This might avoid mistakes if the size is bumped in the future.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The size of the driver constant buffer for each stage should be 2048
and not 512 because it has been increased recently for buffers/images.
While we are at it, do the same change for indirect draws.
This fixes all ARB_shader_draw_parameters tests on GM107.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: 12.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes the following piglits:
arb_arrays_of_arrays-basic-imagestore-mixed-const-non-const-uniform-index
arb_arrays_of_arrays-basic-imagestore-mixed-const-non-const-uniform-index2
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: 12.0 <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
While we are at it, fix a typo inside the comment which describes
what those constants are for.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
In presence of an indirect image access, the base offset should be
zeroed because the stride will be computed twice. This is a pretty
rare situation but it can happen when tex.r > 0.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: "11.2 12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When emitting OP_SUB, the sign bit for FADD and FADD32I is not
at the same position. It's at position 45 for FADD but 51 for FADD32I.
This fixes the following piglit test:
tests/spec/arb_fragment_program/fdo30337b.shader_test
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
|
|
|
| |
gcc6 does not like the trick where we point to one entry before the
array start and then start a while with a pre-increment.
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These are all new false positives with gcc6.
In nouveau_compiler.c: gcc6 no longer assumes that passing a pointer
to a variable into a function initialises that variable.
In nv50_ir_from_tgsi.cpp op and mode are not set if there are 0
enabled dst channels, this never happens, but gcc cannot know this.
Signed-off-by: Hans de Goede <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
Add support for SV_WORK_DIM for nvc0 and nve4.
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This brings it inline with the other macros like NVC0_CB_AUX_UBO_INFO
and NVC0_CB_AUX_TEX_INFO.
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Basically we just have to scale up the coordinates and then add the
relevant sample offset. The code to handle this was already largely
present from Christoph's earlier attempts to pipe images through back in
the dark ages, this just hooks it all up.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The output of draw requires a null viewport transform, which the regular
code is ill-equiped to do. Reinstate the original settings in the render
path, and add setting of the viewport clip polygon based on fb
width/height (as that is all taken care of by draw).
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This fixes a ton of "*clip*" dEQP GLES2 tests, as well as
triangle-guardband-viewport in piglit.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Rely on the existence of a second destination when emitting a setcond
flag is dangerous, because this doesn't mean that the flag has been
correctly set. Instead rely on flagsDef like what emitX() does
for flagsSrc.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: <[email protected]>
|