| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
It doesn't seem like the exact number has too much effect on the
performaince in "teximage". However setting it to just about anything
prevents some OOMs from getting hit. These values are not well-tuned,
but don't seem too bad.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Since the max attrib stride is 2048, the max src offset makes sense as
2047.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All TXF operations implicitly use sampler 0, and fail if it's not bound
to anything. This does not happen in LINKED_TSC mode, but we don't
currently use this.
We ensure that TSC entry at id 0 has the SRGB conversion bit enabled
(and all samplers we normally generate will too). Then when the TSC at
*slot* 0 (not to be confused with entry 0 in the global TSC table) is
unbound, we bind it to entry 0. This way, TXF operations are not
dependent on there being a regular sampler bound there.
Fixes arb_texture_buffer_object-subdata-sync among others. (TBO's are
particularly susceptible to this as they don't bind a sampler.)
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new approach is that samplers don't get unbound even if they won't be used
in a draw and we should just leave them be as well.
Fixes a regression in multiple windows games using gallium nine and nouveau.
v2: adjust num_samplers to keep track of the highest sampler bound
v3: rework how to set the new value of num_samplers
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106577
Fixes: 4d6fab245eec3880e2a59424a579851f44857ce8
"cso: don't track the number of sampler states bound"
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
dnz flag only applies for multiplications (e.g. to make 0 * Infinity
becomes 0 instead of NaN). Once we optimize a MAD into an ADD, the dnz
flag no longer makes sense, and upsets the GM107 emitter (since it looks
at the ftz and dnz flags together).
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On nv50, certain operations must happen on regs below 64, due to
encoding requirements. First of all, we add infrastructure to enforce
this. Secondly we change the spill order to first spill RIG nodes that
are unconstrained, followed by ones that are.
This makes the gamecube logo shadertoy compile properly. Curiously, if
we adjust the spill order so that we first spill the constrained RIG
nodes instead, the RA also succeeds. However it seems more logical to
first spill the unconstrained ones.
While we're at it, drop the nv50 max register to reserve r127 as the
zero register of last resort (r63 is preferred).
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of the size restriction existing in two places, and potentially
being applied twice, we move this together. Ops with 16-bit register
addresses can only take a short reg, and ops with immediates can only
take a short reg.
Of course we leave the immediate 0 in place since we know that it will
be replaced by r63/r127 down the line, so don't treat zeroes as an
immediate.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We removed the op from the BB, but it was still listed in its sources'
uses. This could trip up some logic down the line which analyzes all the
uses of an l-value, e.g. spilling.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
In function 'uint8_t nv50_ir::getTEXSMask(uint8_t)':
warning: control reaches end of non-void function [-Wreturn-type]
Reported-by: Moiman@freenode
Fixes: f821e80213e38e93f96255b3deacb737a600ed40
"gm107/ir: use scalar tex instructions where possible"
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TEXS, TLD4 and TLD4S are variants of tex instructions which are more
scalar, which gives RA more freedom and is less likely to insert silly
MOVs to satisfy quad registers.
shader-db changes:
total instructions in shared programs : 7687265 -> 7614782 (-0.94%)
total gprs used in shared programs : 803620 -> 798045 (-0.69%)
total shared used in shared programs : 639636 -> 639636 (0.00%)
total local used in shared programs : 24648 -> 24648 (0.00%)
total bytes used in shared programs : 82103400 -> 81330696 (-0.94%)
local shared gpr inst bytes
helped 0 0 3648 10647 10647
hurt 0 0 464 205 205
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
v2: print the mask for TXG as well
make the mask to be printed more mask like
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
After discussion with Timothy Arceri. disk_cache_get_function_identifier
was using only the first byte of the sha1 build-id. Replace
disk_cache_get_function_identifier with implementation from
radv_get_build_id. Instead of writing a uint32_t it now writes to a
mesa_sha1. All drivers using disk_cache_get_function_identifier are
updated accordingly.
Reviewed-by: Timothy Arceri <[email protected]>
Fixes: 83ea8dd99bb1 ("util: add disk_cache_get_function_identifier()")
|
|
|
|
|
|
|
|
| |
Gives a +3.89% to +5.27% FPS improvement with Hitman and +2.73% to +2.82%
FPS improvement with Dirt Rally on my GTX 1060.
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes: 2f52925f5c60c72c9389bfdc122c3d5f8e15b25f
"nv50/ir: move a * b -> a << log2(b) code into createMul()"
Reviewed-by: Rhys Perry <[email protected]>
Signed-off-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For some reason the 2d engine can't handle this. Red formats get special
treatment there, so perhaps related.
Fixes dEQP-GLES3 tests of the form:
dEQP-GLES3.functional.fbo.blit.conversion.r{8,16f,32f}_to_srgb8_alpha8
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
| |
The current state tracker can generate these sometimes. Fixing this is
more involved, and due to some integer math we can generate
divisions-by-zero.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This helps st/mesa avoid some (apparently) buggy fallbacks. Specifically
the CopyTexSubImage fallback tries to read texture A as RGBA_FLOAT and
write back that data into the target format, which fails for integer
formats which have no appropriate logic to do the conversion.
Since integer formats don't blend, there's no harm in the fact that the
"A" component gets written anyways.
Fixes, among others:
https://www.khronos.org/registry/webgl/sdk/tests/conformance2/textures/canvas/tex-2d-rgb8ui-rgb_integer-unsigned_byte.html
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Seems this fixes linking problems that occur in some situations.
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NVC0_CB_AUX_BINDLESS_INFO isn't written to on Maxwell+ and it's too small
anyway.
With these changes, TXQ is used to determine the number of samples and
the coordinate adjustment information looked up in a small array in the
driver constant buffer.
v2: rework to use TXQ and a small array instead of a larger array with an
entry for each texture
v3: get rid of the small array and calculate the adjustments in the shader
Signed-off-by: Rhys Perry <[email protected]>
Fixes: c2ae9b40527 ('nvc0: implement multisampled images on Maxwell+')
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
Fixes: 66ca7e400b8 ('nvc0: add support for programmable sample locations')
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Those operations do not map to actual hardware instructions, therefore
those should always be lowered to 32-bit instructions.
Fixes: 009c54aa7af "nv50/ir: Split 64-bit integer MAD/MUL operations"
Signed-off-by: Pierre Moreau <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
Signed-off-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Not handling caps explicitly means that we're likely getting incorrect
values -- these need to be reviewed and set appropriately.
While we're at it, add in some missing caps, and set all the subpixel
stuff to 8 as that seems to be what the blob reports.
Signed-off-by: Ilia Mirkin <[email protected]>
|
| |
|
| |
|
|
|
|
| |
for AMD_depth_clamp_separate.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One of the pains of implementing a gallium driver is filling in a million
pipe caps you don't know about yet when you're just starting out. One of
the pains of working on gallium is copy-and-pasting your new PIPE_CAP into
each driver. We can fix both of these by having each driver call into the
default helper from their default case, so that both sides can ignore each
other until they need to.
v2: fix i915g build, revert swr change to avoid breaking scons build
(https://travis-ci.org/anholt/mesa/jobs/419739857)
v3: Rebase on 3 new gallium caps.
Reviewed-by: Marek Olšák <[email protected]> (v1)
Cc: Bruce Cherniak <[email protected]>
Cc: George Kyriazis <[email protected]>
Cc: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Passes the compat piglits. I'm sure that there will be odd issues that
aren't caught by them, but at least it should basically work.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
This passes the handful of tests in piglit.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move this now-unused function into the existing comment block, which was its only prior use.
../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp:2645:1: warning:
unused function 'partitionLoadStore' [-Wunused-function]
partitionLoadStore(uint8_t comp[2], uint8_t size[2], uint8_t mask)
Fixes: ("86e4440361 nouveau: codegen: Disable more old resource handling code")
Signed-off-by: Rhys Kidd <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gives a +7.79% increase in FPS with Hitman on lowest quality settings on
my GTX 1060.
total instructions in shared programs : 5787979 -> 5748677 (-0.68%)
total gprs used in shared programs : 669901 -> 669373 (-0.08%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21068 -> 21064 (-0.02%)
local shared gpr inst bytes
helped 1 0 152 274 274
hurt 0 0 0 0 0
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than the usual three that would be created.
total instructions in shared programs : 5796385 -> 5786560 (-0.17%)
total gprs used in shared programs : 670103 -> 669968 (-0.02%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21164 -> 21068 (-0.45%)
local shared gpr inst bytes
helped 1 0 64 1040 1040
hurt 0 0 27 0 0
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs : 5819319 -> 5796385 (-0.39%)
total gprs used in shared programs : 670571 -> 670103 (-0.07%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21164 -> 21164 (0.00%)
local shared gpr inst bytes
helped 0 0 318 1758 1758
hurt 0 0 63 0 0
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this commit, OP_MAD is handled on nv50 too. This commit is also
useful for later commits.
Also, instead of creating a shladd, it relies on LateAlgebraicOpt to
create one. This simplifies the code and helps shader-db slightly overall.
total instructions in shared programs : 5820882 -> 5819319 (-0.03%)
total gprs used in shared programs : 670595 -> 670571 (-0.00%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21164 -> 21164 (0.00%)
local shared gpr inst bytes
helped 0 0 18 230 230
hurt 0 0 8 263 263
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This hits the shader-db numbers a good bit, though a few xmads is way
faster than an imul or imad and the cost is mitigated by the next commit,
which optimizes many multiplications by immediates into shorter and less
register heavy instructions than the xmads.
total instructions in shared programs : 5768871 -> 5820882 (0.90%)
total gprs used in shared programs : 669919 -> 670595 (0.10%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21068 -> 21164 (0.46%)
local shared gpr inst bytes
helped 0 0 38 0 0
hurt 1 0 365 3076 3076
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
| |
v4: make the immediate field 16 bits
v5: don't ever emit h1 flags for immediates
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
| |
v4: remove uint16_t(...)
v4: don't allow immediates outside [0,65535] in insnCanLoad()
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some hardware can do PIPE_TEX_WRAP_MIRROR_REPEAT but not
PIPE_TEX_WRAP_MIRROR_CLAMP and PIPE_TEX_WRAP_MIRROR_CLAMP_TO_BORDER.
Drivers for such hardware would like to advertise support for
ARB_texture_mirror_clamp_to_edge but not EXT_texture_mirror_clamp.
This commit adds a new PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE bit,
changes the extension enable to be based on that, and enables it
in all upstream drivers which supported PIPE_CAP_TEXTURE_MIRROR_CLAMP
(so they continue supporting this mode).
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already guarded all OP_SULDP against out of bound accesses, but we
ended up just reusing whatever value was stored in the dest registers.
Fixes CTS test shader_image_load_store.incomplete_textures
v2: fix for loads not ending up with predicates (bindless_texture)
v3: fix replacing the def
Cc: <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mitigates hurt shaders after adding sqrt:
total instructions in shared programs : 5456166 -> 5454825 (-0.02%)
total gprs used in shared programs : 647522 -> 647551 (0.00%)
total shared used in shared programs : 389120 -> 389120 (0.00%)
total local used in shared programs : 21064 -> 21064 (0.00%)
total bytes used in shared programs : 58288696 -> 58274448 (-0.02%)
local shared gpr inst bytes
helped 0 0 0 516 516
hurt 0 0 27 2 2
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
./GpuTest /test=pixmark_piano 1024x640 30sec:
301 -> 327 points
shader-db:
total instructions in shared programs : 5472103 -> 5456166 (-0.29%)
total gprs used in shared programs : 647530 -> 647522 (-0.00%)
total shared used in shared programs : 389120 -> 389120 (0.00%)
total local used in shared programs : 21064 -> 21064 (0.00%)
total bytes used in shared programs : 58459304 -> 58288696 (-0.29%)
local shared gpr inst bytes
helped 0 0 27 8281 8281
hurt 0 0 21 431 431
v2: use NVISA_GM200_CHIPSET
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Karol Herbst <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To avoid serializing, this has the user constant buffer always be 65536
bytes and enabled unless it's required that something else is used for
constant buffer 0.
Fixes artifacts with at least XCOM: Enemy Within, 0 A.D. and Unigine
Valley, Heaven and Superposition.
v2: changed uniform_buffer_bound to be bool instead of a uint32_t
v3: remove magic constants
v3: remove pointless code in nvc0_validate_driverconst
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100177
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|