path: root/src/gallium/drivers/nouveau
Commit message | Author | Age | Files | Lines
* nvc0: use PascalB for most Pascal boards | Ben Skeggs | 2017-02-21 | 2 | -1/+9
  Signed-off-by: Ben Skeggs <[email protected]>
* gallium: remove TGSI_OPCODE_CLAMP | Marek Olšák | 2017-02-18 | 1 | -10/+0
  Not used and not widely supported. Use MIN+MAX instead.
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
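The MIN+MAX replacement mentioned above is a plain arithmetic identity. A minimal sketch in Python, purely for illustration (the function name is hypothetical and this is not TGSI or driver code):

```python
def clamp_via_min_max(x, lo, hi):
    """CLAMP(x, lo, hi) expressed as MIN(MAX(x, lo), hi)."""
    # MAX raises x to the lower bound, MIN then caps it at the upper bound.
    return min(max(x, lo), hi)
```

This assumes lo <= hi; with the bounds swapped, the nesting order decides which bound wins.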
* gallium: set pipe_context uploaders in drivers (v3) | Marek Olšák | 2017-02-14 | 3 | -0/+32
  Notes:
  - make sure the default size is large enough to handle all state trackers
  - pipe wrappers don't receive transfer calls from stream_uploader, because pipe_context::stream_uploader points directly to the underlying driver's stream_uploader (to keep it simple for now)
  v2: add error handling to nv50, nvc0, noop
  v3: set const_uploader
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Tested-by: Edmondo Tommasina <[email protected]> (v1)
  Tested-by: Charmaine Lee <[email protected]>
* nvc0: disable linked tsc mode in compute launch descriptor | Ilia Mirkin | 2017-02-13 | 2 | -2/+6
  Empirically, this makes things work. Presumably this was originally copied from the blob, which does make use of linked tsc mode.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99532
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Cc: [email protected]
* nv50,nvc0: use alternate samplers for stencil | Ilia Mirkin | 2017-02-12 | 1 | -3/+3
  The blob uses these, and it fixes a bunch of dEQP stencil sampling tests involving border colors. Probably the Z-based samplers handle border colors somewhat differently when using the stencil swizzle.
  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: set the render condition in the compute object | Ilia Mirkin | 2017-02-11 | 1 | -2/+10
  Fixes GL45-CTS.compute_shader.conditional-dispatching
  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
* gm107/ir: fix address offset bitfield for ATOMS | Ilia Mirkin | 2017-02-11 | 1 | -1/+1
  Fixes GL45-CTS.compute_shader.atomic-case1 on Maxwell
  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
* nv50/ir: convert an ATOM.EXCH without a destination into a store | Ilia Mirkin | 2017-02-11 | 1 | -0/+5
  On SM35 there does not appear to be a way to emit an ATOM.EXCH with a null destination. However, this should be functionally equivalent to a plain store, so just do that.
  Fixes GL45-CTS.compute_shader.atomic-case2 on SM35.
  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: fix 64-bit integer query buffer writes | Ilia Mirkin | 2017-02-11 | 3 | -20/+37
  The former logic just plain didn't work at all. We need to write the subsequent dword to the next buffer location.
  Signed-off-by: Ilia Mirkin <[email protected]>
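The idea of writing a 64-bit result as two consecutive dwords can be sketched as follows. This is an illustrative Python model only, assuming a little-endian layout (low dword first); the helper name and the byte buffer are hypothetical and do not reflect how the driver emits its query writes.

```python
import struct

def write_u64_as_dwords(buf, offset, value):
    """Store a 64-bit value as two 32-bit writes at offset and offset + 4."""
    lo = value & 0xFFFFFFFF          # first dword: low 32 bits
    hi = (value >> 32) & 0xFFFFFFFF  # second dword: high 32 bits
    struct.pack_into("<I", buf, offset, lo)
    # The second dword must land in the *next* buffer location.
    struct.pack_into("<I", buf, offset + 4, hi)

buf = bytearray(16)
write_u64_as_dwords(buf, 8, 0x1122334455667788)
```

Reading the eight bytes at offset 8 back as one little-endian 64-bit integer yields the original value, which is the property the two-write scheme has to preserve.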
* nv50/ir: return a register when retrieving thread id sysval | Ilia Mirkin | 2017-02-11 | 1 | -1/+1
  We have logic to short-circuit such retrievals to zero. However "zero" was an immediate, and some logic expected to get registers (to later be propagated). Fix this by using loadImm.
  Fixes GL45-CTS.gpu_shader5.images_array_indexing
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: add missing break after DSSG | Ilia Mirkin | 2017-02-11 | 1 | -0/+1
  Recently broken during int64 addition.
  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: fix ubo max clamp, reset file index | Ilia Mirkin | 2017-02-09 | 1 | -1/+3
  We just increased the max UBO, so we should also increase the clamp that we do for robustness. Similarly, as we're including the fileIndex in the new indirect value, we should reset fileIndex to 0 so that it is not added in a second time.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Cc: [email protected]
* nv50/ir: always return 0 when trying to read thread id along unit dim | Ilia Mirkin | 2017-02-09 | 4 | -5/+17
  Many many many compute shaders only define a 1- or 2-dimensional block, but then continue to use system values that take the full 3d into account (like gl_LocalInvocationIndex, etc). So for the special case that a dimension is exactly 1, we know that the thread id along that axis will always be 0, so return it as such and allow constant folding to fix things up.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Pierre Moreau <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
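To see why returning 0 along a unit dimension helps, consider the usual GLSL definition of gl_LocalInvocationIndex. The following Python sketch uses hypothetical helper names and models only the arithmetic, not the compiler pass:

```python
def thread_id_component(tid, block, axis):
    """Read one thread-id component; a unit dimension is known to be 0."""
    return 0 if block[axis] == 1 else tid[axis]

def local_invocation_index(tid, block):
    """z * size_x * size_y + y * size_x + x, as GLSL defines the index."""
    x = thread_id_component(tid, block, 0)
    y = thread_id_component(tid, block, 1)
    z = thread_id_component(tid, block, 2)
    return z * block[0] * block[1] + y * block[0] + x
```

For a block of (64, 1, 1), y and z become the constant 0, so both multiply terms vanish and the index reduces to x, which is the simplification constant folding is expected to find.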
* nvc0/ir: fix robustness guarantees for constbuf loads on kepler+ compute | Ilia Mirkin | 2017-02-09 | 1 | -25/+22
  Kepler and up unfortunately only support up to 8 constbufs. We work around this by loading from constbufs as if they were storage buffers. However we were not consistently applying limits to loads from these buffers. Make sure to do the same thing we do for storage buffers.
  Fixes GL45-CTS.robust_buffer_access_behavior.uniform_buffer
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Cc: [email protected]
* nvc0: increase number of ubo binding points | Ilia Mirkin | 2017-02-09 | 1 | -3/+2
  Apparently GL 4.5 requires 14 of these (there's a "*" in the spec, but it's unclear what it refers to). We need to expose an extra binding point for the "program parameters", which means this must be 15. Remove the last vestige of the "use c14 for immediates" idea.
  Fixes GL45-CTS.shading_language_420pack.binding_uniform_block_array
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Cc: [email protected]
* nvc0: expose int64 | Ilia Mirkin | 2017-02-09 | 1 | -1/+1
  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: make it possible to have the flags def in def0 | Ilia Mirkin | 2017-02-09 | 5 | -12/+15
  There's all kinds of logic that doesn't like there being holes in defs or srcs lists. Avoid them. This also fixes the sched logic for maxwell.
  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: add support for 64-bit shift lowering on SM20/SM30 | Ilia Mirkin | 2017-02-09 | 1 | -6/+62
  Unfortunately there is no SHF.L/SHF.R instruction pre-SM35. So we have to do a bit more work to get the job done.
  Signed-off-by: Ilia Mirkin <[email protected]>
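Without a funnel-shift instruction, a 64-bit shift has to be assembled from 32-bit pieces. The sketch below shows the generic textbook construction for a left shift in Python; it is an assumption-laden illustration (hypothetical function name, shift amount limited to 0..63), not the instruction sequence the commit emits for SM20/SM30.

```python
MASK32 = 0xFFFFFFFF

def shl64_from_halves(lo, hi, n):
    """Left-shift the 64-bit value hi:lo by n (0..63) using 32-bit pieces."""
    if n == 0:
        return lo, hi
    if n >= 32:
        # Everything in the low half moves into the high half.
        return 0, (lo << (n - 32)) & MASK32
    # Bits shifted out of the low half are carried into the high half.
    new_hi = ((hi << n) | (lo >> (32 - n))) & MASK32
    new_lo = (lo << n) & MASK32
    return new_lo, new_hi
```

The n == 0 case is handled separately because the carry term would otherwise need a shift by 32, which 32-bit hardware shifts typically do not express as "shift everything out".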
* nvc0/ir: add support for all the new int64 tgsi opcodes | Ilia Mirkin | 2017-02-09 | 6 | -5/+302
  A few thoughts:
  - Some of that LegalizeSSA logic should really live much earlier and be subject to the likes of DCE and other useful passes
  - Some of the "lowering" done in from_tgsi should be done later so that proper optimization might be done.
  However this all works and the above can be improved upon later.
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: Split 64-bit integer MAD/MUL operations | Pierre Moreau | 2017-02-09 | 1 | -0/+116
  Hardware does not support 64-bit integer MAD and MUL operations, so we need to transform them into 32-bit operations.
  Signed-off-by: Pierre Moreau <[email protected]>
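The split into 32-bit operations follows from schoolbook multiplication of the two halves. A minimal Python sketch of the arithmetic, with a hypothetical function name and no claim to match the pass's actual instruction selection:

```python
MASK32 = 0xFFFFFFFF

def mul64_from_halves(a, b):
    """Low 64 bits of a * b, built from products of 32-bit halves."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    low_product = a_lo * b_lo          # full 32x32 -> 64-bit product
    lo = low_product & MASK32
    carry = low_product >> 32          # high word of the low product
    # Cross terms only affect the high word; a_hi * b_hi overflows 64 bits.
    hi = (carry + a_lo * b_hi + a_hi * b_lo) & MASK32
    return (hi << 32) | lo
```

A 64-bit MAD can be modelled the same way by adding the third operand to this result modulo 2^64.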
* nvc0/ir: add a "high" subop for shifts, emit shf.l/shf.r for 64-bit | Ilia Mirkin | 2017-02-09 | 3 | -3/+74
  Note that this is not available for SM20/SM30.
  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: fix SET and SLCT emission | Ilia Mirkin | 2017-02-09 | 2 | -0/+6
  We were never emitting a .X flag for consuming condition code on SET, and weren't emitting a signed type for SLCT comparison. Discovered while working on int64 logic.
  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: add support for emitting partial min/max ops for int64 | Ilia Mirkin | 2017-02-09 | 4 | -1/+14
  These operations allow you to compute min/max on arbitrary-width integers, 32 bits at a time. Note that the low/med ops implicitly set the condition code, and the med/high ops implicitly consume it.
  Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add separate PIPE_CAP_INT64_DIVMOD | Ilia Mirkin | 2017-02-09 | 3 | -0/+3
  Nouveau does not currently have logic to implement this as a library function. Even though such a library could be written, there's no big advantage to do it that way for now given that int64 is a very uncommon use-case. Allow a driver to expose INT64 without supporting division and modulo operations.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: turn PIPE_SHADER_CAP_DOUBLES into a screen capability | Nicolai Hähnle | 2017-02-02 | 3 | -5/+3
  Make the cap consistent with PIPE_CAP_INT64. Aside from the hypothetical case of using draw for vertex shaders (and actually caring about doubles...), every implementation supports doubles either nowhere or everywhere. Also, st/mesa didn't even check the cap correctly in all supported shader stages.
  While at it, add a missing LLVM version check for 64-bit integers in radeonsi. This is conservative: judging by the log, LLVM 3.8 might be sufficient, but there are probably bugs that have been fixed since then.
  v2: fix clover (Marek)
  Reviewed-by: Marek Olšák <[email protected]>
* nouveau: remove explicit __STDC_FORMAT_MACROS define | Emil Velikov | 2017-01-27 | 1 | -1/+0
  Already handled by the build.
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
* gallium: Add integer 64 capability | Dave Airlie | 2017-01-27 | 3 | -0/+3
  v1.1: move to using a normal CAP. (Marek)
  v2: fill in the cap everywhere
  Signed-off-by: Dave Airlie <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* nv50: add support for MUL_ZERO_WINS property | Ilia Mirkin | 2017-01-23 | 5 | -2/+11
  This is simply keyed off the vertex shader, as that's guaranteed to be present in any pipeline.
  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: add support for MUL_ZERO_WINS property | Ilia Mirkin | 2017-01-23 | 4 | -9/+25
  This sets the dnz flag on all the relevant multiplication operations. At emission time, this will only be supported by nvc0+, so nv50 will need a different solution.
  Signed-off-by: Ilia Mirkin <[email protected]>
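As a rough model of what MUL_ZERO_WINS asks for, the sketch below contrasts IEEE multiplication with a multiply where a zero operand forces a zero result. It is written in Python with a hypothetical function name, and it assumes the property means "0 times anything, including Inf or NaN, is 0"; the sign of the returned zero is not modelled.

```python
import math

def mul_zero_wins(a, b):
    """Multiply where a zero operand always produces 0.0."""
    if a == 0.0 or b == 0.0:
        # Under plain IEEE rules 0 * inf and 0 * nan would give NaN.
        return 0.0
    return a * b
```

For example, `0.0 * math.inf` is NaN under IEEE rules, while `mul_zero_wins(0.0, math.inf)` is 0.0; non-zero finite operands behave identically in both.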
* gallium: add PIPE_CAP_TGSI_MUL_ZERO_WINS | Ilia Mirkin | 2017-01-23 | 3 | -0/+3
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Axel Davy <[email protected]>
* nouveau: remove always false argument in nouveau_fence_new() | Emil Velikov | 2017-01-18 | 5 | -11/+6
  No point in having the extra argument considering that it's effectively unused since the function was introduced.
  Cc: Ilia Mirkin <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: optimize shl + and | Ilia Mirkin | 2017-01-16 | 1 | -0/+11
  Address loading can often end up as shl + shr + shl combinations. The latter two are equal shifts, which get converted into an and mask. However if the previous shl is more than the mask is trying to remove (in terms of low bits), we can just remove the and entirely.
  This reduces some large shaders by as many as 3% of instructions (out of 2K).
  total instructions in shared programs : 6495509 -> 6491076 (-0.07%)
  total gprs used in shared programs    : 954621 -> 954623 (0.00%)
            local  gpr  inst  bytes
  helped        0    0  1014   1014
  hurt          0    2     0      0
  Signed-off-by: Ilia Mirkin <[email protected]>
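The redundancy condition described above can be checked with a few lines of bit arithmetic. This Python sketch uses a hypothetical helper and a 32-bit register width as assumptions; it models the condition, not the optimizer's actual code.

```python
def and_is_redundant(shift, mask, bits=32):
    """True if (x << shift) & mask equals x << shift for every x."""
    full = (1 << bits) - 1
    # Bits that can possibly be set after the left shift.
    possibly_set = (full << shift) & full
    # The AND is a no-op if the mask clears none of those bits.
    return (possibly_set & ~mask & full) == 0
```

A shr by 2 followed by a shl by 2 is the mask 0xFFFFFFFC. After a preceding shl by 4 the low four bits are already zero, so `and_is_redundant(4, 0xFFFFFFFC)` holds, while after a shl by 1 it does not.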
* nvc0: enable FBFETCH with a special slot for color buffer 0 | Ilia Mirkin | 2017-01-16 | 9 | -6/+172
  We don't need to support all the color buffers for advanced blend, just cb0. For Fermi, we use the special binding slots so that we don't overlap with user textures, while Kepler+ gets a dedicated position for the fb handle in the driver constbuf.
  This logic is only triggered when a FBFETCH is actually present so it should be a no-op most of the time.
  Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add flags parameter to texture barrier | Ilia Mirkin | 2017-01-16 | 2 | -2/+2
  This is so that we can differentiate between flushing any framebuffer reading caches from regular sampler caches.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_CAP_TGSI_FS_FBFETCH | Ilia Mirkin | 2017-01-16 | 3 | -0/+3
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* nvc0: true up exposing of the HW_METRIC_QUERY_GROUP for maxwell | Ilia Mirkin | 2017-01-16 | 1 | -2/+2
  This had been updated in one place but not the other.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50/ir: handle new DDIV op which will be used for double divisions | Ilia Mirkin | 2017-01-16 | 1 | -0/+3
  The existing lowering is in place to lower that to RCP + MUL, or fancier things down the line if necessary.
  Signed-off-by: Ilia Mirkin <[email protected]>
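Lowering a division to RCP + MUL rests on the identity a / b = a * (1 / b). A small Python sketch with a hypothetical function name; it uses an exact host reciprocal, whereas a hardware RCP is usually an approximation that may need refinement, which this model ignores.

```python
def ddiv_via_rcp_mul(a, b):
    """Model double division as a reciprocal followed by a multiply."""
    rcp = 1.0 / b   # stands in for the RCP instruction
    return a * rcp  # stands in for the MUL instruction
```

Because the reciprocal is rounded before the multiply, the result can differ from a correctly rounded a / b in the last bit, which is presumably one reason the message leaves room for "fancier things".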
* nvc0/ir: emit FMZ flag when requested on FFMA | Ilia Mirkin | 2017-01-15 | 1 | -0/+4
  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: only try to check for zero LOD if we aren't already forcing it | Ilia Mirkin | 2017-01-12 | 1 | -1/+1
  There's a levelZero flag which forces texturing to pick level zero (and not consume an explicit LOD argument). This is set for MS targets, but could also be set for any other incoming instruction. As that is what determines whether a LOD argument is present, check that rather than the more indirect isMS logic.
  Signed-off-by: Ilia Mirkin <[email protected]>
* nouveau: take extra push space into account for pushbuf_space calls | Ilia Mirkin | 2017-01-12 | 15 | -56/+26
  Ever since a long time ago when I messed around with fences, I ensure that after a PUSH_SPACE call there is enough space to write a fence out into the pushbuf. However the PUSH_SPACE macro is not all-knowing, and so sometimes we have to invoke nouveau_pushbuf_space manually with the relocs/pushes args set.
  If we don't take the extra allocation from PUSH_SPACE into account, then we will end up accidentally flushing when the code was not expecting a flush. This can lead to various runtime and rendering failures.
  The amount of extra allocation isn't that important - it has to be at least 8 based on the current nouveau_winsys.h setting, but even more won't hurt. I just rounded up to powers of 2.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99354
  Cc: "12.0 13.0" <[email protected]>
  Signed-off-by: Ilia Mirkin <[email protected]>
  Acked-by: Ben Skeggs <[email protected]>
* nvc0: enable GL 4.3 on gm107+ | Samuel Pitoiset | 2017-01-12 | 1 | -7/+4
  Although arb_shader_image_load_store-atomicity will most likely hang your box, I think it's now quite reasonable to enable GL 4.3 on Maxwell/Pascal GPUs. I suspect that test is wrong because it doesn't even work on the NVIDIA blob.
  I have tested a bunch of benchmarks (UE4 demos) and real games like Shadow of Mordor and they all work fine.
  Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: use sched control codes for gm107 MP counters code | Samuel Pitoiset | 2017-01-12 | 1 | -44/+44
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Pierre Moreau <[email protected]>
* nvc0: use sched control codes for gm107 blitter shader | Samuel Pitoiset | 2017-01-12 | 1 | -6/+14
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Pierre Moreau <[email protected]>
  Acked-by: Ilia Mirkin <[email protected]>
* nv50/ir: use sched control codes for gm107 builtins | Samuel Pitoiset | 2017-01-12 | 2 | -40/+40
  Yes, IMUL/IMAD require dependency barriers and we should definitely replace these instructions with XMAD, but the different flags need to be figured out. Note that XMAD only supports 16-bit integers.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Pierre Moreau <[email protected]>
* nv50/ir: improve instruction pipelining on gm107 | Samuel Pitoiset | 2017-01-12 | 3 | -4/+1027
  This makes use of scheduling control codes which are very useful for improving the instruction pipelining. This patch will increase performance on Maxwell GPUs by at least 1.5x, and up to 3.5x for some benchmarks.
  Although this has been fairly well tested, I would not be surprised if someone hits a corner case somewhere. For that reason, the scheduler is enabled by default but can be deactivated by using NV50_PROG_SCHED=0.
  Thanks to Scott Gray for the reverse engineering work available from https://github.com/NervanaSystems/maxas/wiki/Control-Codes.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Acked-by: Pierre Moreau <[email protected]>
  Tested-by: Alexandre Courbot <[email protected]>
  Tested-by: Jan Vesely <[email protected]>
* nv50/ir: do not insert texture barriers on gm107 | Samuel Pitoiset | 2017-01-12 | 1 | -1/+2
  It's actually useless to insert those texture barriers post RA because the current control code (i.e. st 0x0) will wait for all dependencies before issuing a new instruction.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Pierre Moreau <[email protected]>
* gallium: remove TGSI_OPCODE_SUB | Marek Olšák | 2017-01-05 | 3 | -8/+0
  It's redundant with the source modifier.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: remove TGSI_OPCODE_ABS | Marek Olšák | 2017-01-05 | 3 | -9/+0
  It's redundant with the source modifier.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY | Marek Olšák | 2017-01-05 | 3 | -0/+3
  Drivers with good compilers don't need aggressive optimizations before TGSI.
  Reviewed-by: Eric Anholt <[email protected]>
* gallium: support for native fence fd's | Rob Clark | 2016-12-01 | 3 | -0/+3
  This enables gallium support for EGL_ANDROID_native_fence_sync, for drivers which support PIPE_CAP_NATIVE_FENCE_FD.
  Signed-off-by: Rob Clark <[email protected]>