path: root/src/gallium/drivers/nouveau
Commit message | Author | Age | Files | Lines
* drm-uapi: use local files, not system libdrm | Eric Engestrom | 2019-02-14 | 6 | -6/+6
| | | | | | | | | There was an issue recently caused by the system header being included by mistake, so let's just get rid of this include path and always explicitly #include "drm-uapi/FOO.h" Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* nvc0: we have 16k-sized framebuffers, fix default scissors | Ilia Mirkin | 2019-02-10 | 1 | -2/+2
| | | | | | | | | For some reason we don't use view volume clipping by default, and use scissors instead. These scissors were set to an 8k max fb size, while the driver advertises 16k-sized framebuffers. Signed-off-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* nouveau: Silence unhandled cap warnings | Kenneth Graunke | 2019-02-08 | 2 | -0/+2
| | | | | | | | | | Nouveau apparently uses the u_screen helper but prints a warning in the default case, so running any GL program would start grumbling. Fixes: 8fa54bc5490 gallium: Add a PIPE_CAP_NIR_COMPACT_ARRAYS capability bit. Reviewed-by: Karol Herbst <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* gallium: add PIPE_CAP_MAX_VARYINGS | Karol Herbst | 2019-02-07 | 3 | -12/+12
| | | | | | | | | | | | | | | | | Some NVIDIA hardware can accept 128 fragment shader input components, but only have up to 124 varying-interpolated input components. We add a new cap to express this cleanly. For most drivers, this will have the same value as PIPE_SHADER_CAP_MAX_INPUTS for the fragment shader. Fixes KHR-GL45.limits.max_fragment_input_components Signed-off-by: Karol Herbst <[email protected]> [imirkin: rebased, improved docs/commit message] Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Cc: 19.0 <[email protected]>
* nvc0: add compute invocation counter | Rhys Perry | 2019-02-06 | 8 | -4/+207
| | | | | | | | | | | | | | | | | The strategy is to keep a CPU-side counter of the direct invocations, and a GPU-side counter of the indirect invocations, and then add them together for queries. The specific technique is a macro which multiplies a list of integers together and accumulates the product into SCRATCH registers held inside of the context. Another macro will read those values out and add them to the passed-in cpu-side counter to be stored in a query buffer the same way that all the other statistics are stored. Original implementation by Rhys Perry, redone by Ilia Mirkin to use the SCRATCH temporaries. Signed-off-by: Ilia Mirkin <[email protected]>
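The counting strategy described in this commit can be sketched in plain C. This is only an illustration under assumed names: `invocation_counter`, `count_direct`, `count_indirect` and `query_invocations` are hypothetical and do not come from the driver. The GPU-side accumulation, which the commit says is done by a macro writing SCRATCH registers, is modelled here as an ordinary 64-bit field, and the choice of block and grid dimensions as the "list of integers" is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two counters described in the commit. */
struct invocation_counter {
   uint64_t cpu_direct;   /* direct dispatches, counted on the CPU */
   uint64_t gpu_indirect; /* indirect dispatches; models the SCRATCH accumulator */
};

/* Multiply a list of integers together into a 64-bit product. */
static uint64_t
product(const uint32_t *vals, size_t n)
{
   uint64_t p = 1;
   for (size_t i = 0; i < n; i++)
      p *= vals[i];
   return p;
}

/* Direct dispatch: the CPU knows the dimensions and counts immediately. */
static void
count_direct(struct invocation_counter *c,
             const uint32_t block[3], const uint32_t grid[3])
{
   const uint32_t dims[6] = { block[0], block[1], block[2],
                              grid[0], grid[1], grid[2] };
   c->cpu_direct += product(dims, 6);
}

/* Indirect dispatch: in the driver this would run on the GPU; here it is
 * the same arithmetic applied to the second counter. */
static void
count_indirect(struct invocation_counter *c,
               const uint32_t block[3], const uint32_t grid[3])
{
   const uint32_t dims[6] = { block[0], block[1], block[2],
                              grid[0], grid[1], grid[2] };
   c->gpu_indirect += product(dims, 6);
}

/* Query time: add both counters together. */
static uint64_t
query_invocations(const struct invocation_counter *c)
{
   return c->cpu_direct + c->gpu_indirect;
}
```

Keeping the two counters separate means the CPU never has to read back indirect dispatch parameters; they are only combined when a query result is written.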
* gm107/ir: add fp64 rsq | Karol Herbst | 2019-02-06 | 3 | -3/+128
| | | | | Acked-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* gm107/ir: add fp64 rcp | Karol Herbst | 2019-02-06 | 3 | -4/+270
| | | | | Acked-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* gk104/ir: Use the new rcp/rsq in library | Karol Herbst | 2019-02-06 | 3 | -15/+334
| | | | | | [imirkin: add a few more "long" prefixes to safen things up] Acked-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* gk110/ir: Use the new rcp/rsq in library | Boyan Ding | 2019-02-06 | 5 | -0/+42
| | | | | | | | | | v2: (Karol Herbst <[email protected]>) * fix Value setup for the builtins Signed-off-by: Boyan Ding <[email protected]> [imirkin: track the fp64 flag when switching ops to calls] Signed-off-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* gk110/ir: Add rsq f64 implementation | Boyan Ding | 2019-02-06 | 2 | -2/+109
| | | | | | Signed-off-by: Boyan Ding <[email protected]> Acked-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* gk110/ir: Add rcp f64 implementation | Boyan Ding | 2019-02-06 | 2 | -4/+235
| | | | | | Signed-off-by: Boyan Ding <[email protected]> Acked-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* nvc0: stick zero values for the compute invocation counts | Ilia Mirkin | 2019-02-06 | 1 | -0/+2
| | | | | | | | | | Not quite perfect, but at least we don't end up with random values in the query buffer. Fixes KHR-GL45.pipeline_statistics_query_tests_ARB.functional_default_qo_values Signed-off-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* nv50,nvc0: use condition for occlusion queries when already complete | Ilia Mirkin | 2019-02-06 | 6 | -28/+25
| | | | | | | | | | | | | | | | | | For the NO_WAIT variants, we would jump into the ALWAYS case for both nested and inverted occlusion queries. However if the query had previously completed, the application could reasonably expect that the render condition would follow that result. To resolve this, we remove the nesting distinction which unnecessarily created an imbalance between the regular and inverted cases (since there's no "zero" condition mode). We also use the proper comparison if we know that the query has completed (which could happen as a result of an earlier get_query_result call). Fixes KHR-GL45.conditional_render_inverted.functional Signed-off-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
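One way to read the resulting decision is as a small selection function. The sketch below is an illustration under assumptions, not the driver code: the enum values and `pick_cond_mode` are hypothetical names, and mapping the inverted case to an "equal" comparison is a guess prompted by the remark that there is no "zero" condition mode.

```c
#include <stdbool.h>

/* Hypothetical condition modes; the hardware's real mode list may differ. */
enum cond_mode {
   COND_ALWAYS,       /* render unconditionally */
   COND_RES_NON_ZERO, /* render if the query result is non-zero */
   COND_EQUAL         /* assumed stand-in for the inverted test */
};

/*
 * wait       - the application asked for a waiting render condition
 * query_done - the query result is already known to be complete
 * inverted   - render when the query result is zero
 */
static enum cond_mode
pick_cond_mode(bool wait, bool query_done, bool inverted)
{
   /* NO_WAIT with a pending query: fall back to always rendering. */
   if (!wait && !query_done)
      return COND_ALWAYS;

   /* Otherwise use a real comparison; nesting plays no role. */
   return inverted ? COND_EQUAL : COND_RES_NON_ZERO;
}
```

The point of the sketch is that a completed query takes the comparison path even in the NO_WAIT variants, so the render condition follows the result the application may already have read.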
* nvc0: fix 3d images on kepler | Ilia Mirkin | 2019-02-06 | 2 | -35/+34
| | | | | | | | | | | | | Looks like SUBFM.3D and SUEAU are perfectly capable of dealing with 3d tiling, they just need the correct inputs. Supply them. We also have to deal with the case where a 2d "layer" of a 3d image is bound. In this case, we supply the z coordinate separately to the shader, which has to optionally treat every 2d case as if it could be a slice of a 3d texture. Signed-off-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* nvc0/ir: fix second tex argument after levelZero optimization | Ilia Mirkin | 2019-02-06 | 2 | -25/+24
| | | | | | | | | | | | | | | | | | | | We used to pre-set a bunch of extra arguments to a texture instruction in order to force the RA to allocate a register at the boundary of 4. However with the levelZero optimization, which removes a LOD argument when it's uniformly equal to zero, we undid that logic by removing an extra argument. As a result, we could end up with insufficient alignment on the second wide texture argument. Instead we switch to a different method of achieving the same result. The logic runs during the constraint analysis of the RA, and adds unset sources as necessary right before being merged into a wide argument. Fixes MISALIGNED_REG errors in Hitman when run with bindless textures enabled on a GK208. Fixes: 9145873b152 ("nvc0/ir: use levelZero flag when the lod is set to 0") Signed-off-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* nvc0/ir: always use CG mode for loads from atomic-only buffers | Ilia Mirkin | 2019-02-06 | 1 | -2/+12
| | | | | | | | | | | | | | | | Atomic operations don't update the local cache, which means that we would have to issue CCTL operations in order to get the updated values. When we know that a buffer is primarily used for atomic operations, it's easier to just avoid the caching at that level entirely. The same issue persists for non-atomic buffers, which will have to be fixed separately. Fixes the failing dEQP-GLES31.functional.atomic_counter.* tests. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Cc: 19.0 <[email protected]>
* nvc0: add support for handling indirect draws with attrib conversion | Ilia Mirkin | 2019-02-06 | 3 | -1/+82
| | | | | | | | | | | | | | | | The hardware does not natively support FIXED and DOUBLE formats. If those are used in an indirect draw, they have to be converted. Our conversion tries to be clever about only converting the data that's needed. However for indirect, that won't work. Given that DOUBLE or FIXED are highly unlikely to ever be used with indirect draws, read the indirect buffer on the CPU and issue draws directly. Fixes the failing dEQP-GLES31.functional.draw_indirect.random.* tests. Signed-off-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
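The fallback of reading the indirect buffer on the CPU and issuing direct draws might look roughly like the following. This is a minimal sketch, not the driver's implementation: `replay_indirect_on_cpu`, `direct_draw_fn` and `sum_vertices` are hypothetical names, only the non-indexed case is shown, and the record layout is the one OpenGL defines for DrawArraysIndirectCommand.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One non-indexed indirect draw record (OpenGL DrawArraysIndirectCommand). */
struct draw_arrays_indirect {
   uint32_t count;
   uint32_t instance_count;
   uint32_t first;
   uint32_t base_instance;
};

/* Hypothetical callback that issues one ordinary, direct draw. */
typedef void (*direct_draw_fn)(const struct draw_arrays_indirect *cmd,
                               void *data);

/* Walk a CPU mapping of the indirect buffer and issue one direct draw per
 * record. Returns the number of draws actually issued. */
static unsigned
replay_indirect_on_cpu(const uint8_t *map, size_t offset, size_t stride,
                       unsigned draw_count, direct_draw_fn draw, void *data)
{
   unsigned issued = 0;

   for (unsigned i = 0; i < draw_count; i++) {
      struct draw_arrays_indirect cmd;

      /* memcpy avoids alignment assumptions about the mapping. */
      memcpy(&cmd, map + offset + (size_t)i * stride, sizeof(cmd));
      if (cmd.count == 0 || cmd.instance_count == 0)
         continue; /* nothing to draw for this record */

      draw(&cmd, data);
      issued++;
   }
   return issued;
}

/* Example callback: totals the vertices that would be processed. */
static void
sum_vertices(const struct draw_arrays_indirect *cmd, void *data)
{
   *(uint64_t *)data += (uint64_t)cmd->count * cmd->instance_count;
}
```

Once the parameters are known on the CPU, the existing conversion path can see exactly which vertices are needed, which is what an indirect draw otherwise hides from it.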
* nvc0/ir: replace cvt instructions with add to improve shader performance | Karol Herbst | 2019-02-05 | 2 | -0/+64
| | | | | | | | | | | | | | | | | | | Gives me a performance boost of 0.2% in pixmark_piano on my gk106, gm204 and gp107. Reduces the amount of generated convert instructions by roughly 30% in shader-db. v2: only for 32 bit operations; move some common code out of the switch; handle OP_SAT with modifiers. v3: only for registers and const memory; rework if clauses; merge isCvt into this patch. v4: merge isCvt into its use. Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: add explicit settings for recent caps | Ilia Mirkin | 2019-02-04 | 2 | -0/+4
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* nvc0: don't put text segment into bufctx | Ilia Mirkin | 2019-01-27 | 6 | -13/+16
| | | | | | | | | The text segment is shared among multiple contexts, while each one has its own bufctx. So when reallocating the text segment, some contexts may end up with stale values in their bufctx's. Instead limit the exposure to the bufctx to within a single draw. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: mark textures dirty on fb update | Ilia Mirkin | 2019-01-22 | 2 | -2/+4
| | | | | | | | | | | | | | We may have to flush the cache if there are any textures presently bound that refer to the outgoing framebuffer. This is only checked at validation time. Fixes a number of dEQP-GLES3.functional.fbo.color.repeated_clear.sample.* tests, which would bind a texture, then clear it while the binding was in effect, and then render to a different texture. This seems legal under the "no feedback loops" rule. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* nv50,nvc0: add missing CAPs for unsupported features | Rhys Kidd | 2019-01-20 | 2 | -0/+4
| | | | | Signed-off-by: Rhys Kidd <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ir: disable TEXS for tex with derivAll set | Karol Herbst | 2019-01-18 | 2 | -1/+3
| | | | | | | | | | | | | | | | | | | | | | | fixes deqp tests: dEQP-GLES3.functional.shaders.texture_functions.texturegrad.samplercube_fixed_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegrad.samplercube_float_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegrad.isamplercube_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegrad.usamplercube_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler3d_fixed_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler3d_float_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegrad.isampler3d_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegrad.usampler3d_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler2dshadow_vertex dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler3d_fixed_vertex dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler3d_float_vertex dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.isampler3d_vertex dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.usampler3d_vertex dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler2dshadow_vertex Fixes: f821e80213e38e93f96255b3deacb737a600ed40 "gm107/ir: use scalar tex instructions where possible" Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: disable tryCollapseChainedMULs in ConstantFolding for precise instructions | Karol Herbst | 2019-01-18 | 1 | -1/+1
| | | | | | | | | | fixes dEQP-GLES2.functional.shaders.invariance.mediump.loop_3 CC: <[email protected]> Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv30: disable rendering to 3D textures | Ilia Mirkin | 2019-01-01 | 1 | -0/+6
| | | | | | | | | | | There's no way to tell the 3D engine about swizzling on such textures. While rendering to NPOT ones may be possible, there's no great way to expose that in gallium, nor would there be any practical benefit. Fixes the non-compressed-format "copyteximage 3D" failures. Something odd going on with the compressed formats. Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: fix some s3tc layout issues | Ilia Mirkin | 2018-12-30 | 2 | -7/+26
| | | | | | | | | | | | | | s3tc layouts are a bit finicky - they're packed, but not swizzled. Adjust logic to allow for that case: - Don't set a uniform pitch for POT-sized compressed textures - Adjust define_rect API to be less confused about block sizes - Only mark a texture as linear if it has a uniform pitch set This has been tested to fix xonotic (as well as the s3tc-* piglits) on nv3x and keeps it working on nv4x. Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: use correct helper to get blocks in y direction | Ilia Mirkin | 2018-12-30 | 1 | -1/+1
| | | | | | | This doesn't matter since all compressed formats supported by this hardware use square blocks, but best to use the correct helper. Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: add support for multi-layer transfers | Ilia Mirkin | 2018-12-30 | 1 | -4/+35
| | | | | | | | This logic mirrors what we do on nv50. The relatively new texture_subdata callback can cause this to happen with 3D textures, which is triggered at least by xonotic, and probably many piglits. Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: fix rare issue with fp unbinding not finding the bufctx | Ilia Mirkin | 2018-12-30 | 1 | -1/+1
| | | | | | | | | | | | | If the last-active context gets deleted, the pushbuf doesn't have a bufctx to reference. Then there could be a sequence of binds which would trigger a reset on that bin before validation was done. Instead we just pass in the bufctx in question directly. All other instances of PUSH_RESET happen strictly after a validation is run. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349 Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: avoid setting user_priv without setting cur_ctx | Ilia Mirkin | 2018-12-30 | 1 | -3/+1
| | | | | | | | | | | | The whole user_priv thing is a mess, but as long as it's there, it basically has to map 1:1 to the cur_ctx. Unfortunately we were setting user_priv to some context, then that context could get deleted without any draws/validations in it, leading user_priv to become NULL, with cur_ctx still pointing at some old context. Then we wouldn't run the switch logic, which in turn led to a NULL bufctx being dereferenced. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349 Signed-off-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: add missing CAPs for unsupported features | Ilia Mirkin | 2018-12-26 | 2 | -0/+3
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: enable GL_NV_shader_atomic_float on pre-Maxwell | Ilia Mirkin | 2018-12-26 | 1 | -0/+2
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: add support for converting ATOMFADD to proper ir | Ilia Mirkin | 2018-12-26 | 1 | -0/+4
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: always keep TSC slot 0 bound to fix TXF | Ilia Mirkin | 2018-12-14 | 2 | -0/+21
| | | | | | | | | | | | Same as on nv50, the TXF op always uses the TSC bound to slot 0, returning blank values if nothing is bound. An earlier change arranges for the TSC entries list to always have valid data at entry 0, so here we just make use of it. Fixes arb_texture_buffer_object-subdata-sync among others. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: replace use of explicit default_tsc with entry 0 | Ilia Mirkin | 2018-12-14 | 6 | -22/+25
| | | | | | | | | | | This was used for implementing FBFETCH. However that uses TXF, which doesn't do much with a TSC. The only important bit is that sRGB-decoding works as expected, which we can achieve since all samplers we ever generate enable sRGB-decoding. Always point to entry 0 in the TSC table, and ensure that even before it ever gets initialized, the sRGB-decoding enable bit is set. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix use-after-free in ConstantFolding::visit | Karol Herbst | 2018-12-09 | 1 | -33/+49
| | | | | | | | | | | opnd() might delete the passed in instruction, but it's used through i->srcExists() later in visit v2: use continue instead return v3: use brackets for the outer if/else chain Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
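The hazard described here, a callee that may delete the instruction the caller is still holding, can be shown with a small linked-list sketch. It is written in C with hypothetical names (`insn`, `fold`, `visit_all`, `push`); the real code is C++ and its folding rules are unrelated to the toy rule used below.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy instruction list node. */
struct insn {
   struct insn *next;
   int value;
};

/* Hypothetical folding step. It may unlink and free the instruction that
 * *link points at, and returns true when it did so. The toy rule is that
 * zero-valued instructions fold away. */
static bool
fold(struct insn **link)
{
   struct insn *i = *link;

   if (i->value == 0) {
      *link = i->next;
      free(i);
      return true;
   }
   return false;
}

/* Visit every instruction and return how many survive. After fold()
 * reports a deletion, the loop continues without reading the freed node. */
static int
visit_all(struct insn **head)
{
   struct insn **link = head;
   int kept = 0;

   while (*link) {
      if (fold(link))
         continue; /* *link already names the successor */
      kept++;
      link = &(*link)->next;
   }
   return kept;
}

/* Helper to build a list for experiments: prepend a node. */
static struct insn *
push(struct insn *head, int value)
{
   struct insn *i = malloc(sizeof(*i));

   if (!i)
      abort();
   i->next = head;
   i->value = value;
   return i;
}
```

The `continue` plays the same role as the one mentioned in the v2 note: once the instruction may be gone, control skips everything that would still dereference it.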
* nouveau: use atomic operations for driver statistics | Karol Herbst | 2018-12-09 | 1 | -3/+4
| | | | | | | multiple threads can write to those at the same time Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
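As a rough illustration of why plain `+=` is not enough when several threads update the same statistic, the sketch below uses C11 `<stdatomic.h>`. The struct, field and function names are hypothetical, and the driver itself may well use different atomic helpers; relaxed ordering is chosen here on the assumption that statistics only need a correct total, not ordering against other memory.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical statistics block shared by all threads. */
struct driver_stats {
   atomic_uint_fast64_t tex_upload_count;
};

/* Add to a counter without losing concurrent updates. */
static void
stat_add(atomic_uint_fast64_t *counter, uint64_t n)
{
   atomic_fetch_add_explicit(counter, n, memory_order_relaxed);
}

/* Read the current value of a counter. */
static uint64_t
stat_read(atomic_uint_fast64_t *counter)
{
   return atomic_load_explicit(counter, memory_order_relaxed);
}
```

A non-atomic increment is a read, an add and a write; two threads interleaving those steps can each write back a stale sum, which the fetch-and-add avoids.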
* nv50/ir: initialize relDegree staticly | Karol Herbst | 2018-12-09 | 1 | -7/+16
| | | | | | | this race condition is pretty harmless, but also pretty trivial to fix Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nouveau: set texture upload budget | Ilia Mirkin | 2018-12-03 | 3 | -3/+6
| | | | | | | | | It doesn't seem like the exact number has too much effect on the performance in "teximage". However setting it to just about anything prevents some OOMs from getting hit. These values are not well-tuned, but don't seem too bad. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: add explicit handling of PIPE_CAP_MAX_VERTEX_ELEMENT_SRC_OFFSET | Ilia Mirkin | 2018-12-03 | 2 | -0/+4
| | | | | | | Since the max attrib stride is 2048, the max src offset makes sense as 2047. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50: always keep TSC slot 0 bound | Ilia Mirkin | 2018-12-03 | 3 | -0/+31
| | | | | | | | | | | | | | | | | All TXF operations implicitly use sampler 0, and fail if it's not bound to anything. This does not happen in LINKED_TSC mode, but we don't currently use this. We ensure that TSC entry at id 0 has the SRGB conversion bit enabled (and all samplers we normally generate will too). Then when the TSC at *slot* 0 (not to be confused with entry 0 in the global TSC table) is unbound, we bind it to entry 0. This way, TXF operations are not dependent on there being a regular sampler bound there. Fixes arb_texture_buffer_object-subdata-sync among others. (TBO's are particularly susceptible to this as they don't bind a sampler.) Signed-off-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: Fix gallium nine regression regarding sampler bindings | Karol Herbst | 2018-12-02 | 2 | -16/+12
| | | | | | | | | | | | | | | | The new approach is that samplers don't get unbound even if they won't be used in a draw and we should just leave them be as well. Fixes a regression in multiple windows games using gallium nine and nouveau. v2: adjust num_samplers to keep track of the highest sampler bound v3: rework how to set the new value of num_samplers Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106577 Fixes: 4d6fab245eec3880e2a59424a579851f44857ce8 "cso: don't track the number of sampler states bound" Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: remove dnz flag when converting MAD to ADD due to optimizations | Ilia Mirkin | 2018-11-24 | 1 | -0/+3
| | | | | | | | | | dnz flag only applies for multiplications (e.g. to make 0 * Infinity become 0 instead of NaN). Once we optimize a MAD into an ADD, the dnz flag no longer makes sense, and upsets the GM107 emitter (since it looks at the ftz and dnz flags together). Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
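The effect of the flag can be mimicked on the CPU. The following is a sketch under assumptions: `mul_dnz`, `struct fop` and `mad_to_add` are hypothetical names, and the trigger used for turning a MAD into an ADD (a multiplicand equal to 1.0) is just one convenient example, not necessarily the optimization the commit refers to.

```c
#include <math.h>
#include <stdbool.h>

/* Multiply with an optional "dnz" behaviour: a zero operand forces a zero
 * result, so 0 * Infinity yields 0 instead of the IEEE 754 NaN. */
static float
mul_dnz(float a, float b, bool dnz)
{
   if (dnz && (a == 0.0f || b == 0.0f))
      return 0.0f;
   return a * b;
}

/* Toy instruction: either a * b + c (MAD) or a + c (ADD). */
struct fop {
   bool is_mad;
   bool dnz;
   float a, b, c;
};

/* Hypothetical optimization: a MAD whose multiplicand is 1.0 is just an
 * ADD. No multiplication is left afterwards, so dnz is cleared too. */
static void
mad_to_add(struct fop *op)
{
   if (op->is_mad && op->b == 1.0f) {
      op->is_mad = false;
      op->dnz = false;
   }
}
```

Leaving the flag set on the ADD would describe a multiplication that no longer exists, which is the stale state the commit removes.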
* nv50/ir/ra: enforce max register requirement, and change spill order | Ilia Mirkin | 2018-11-16 | 4 | -16/+26
| | | | | | | | | | | | | | | | | | On nv50, certain operations must happen on regs below 64, due to encoding requirements. First of all, we add infrastructure to enforce this. Secondly we change the spill order to first spill RIG nodes that are unconstrained, followed by ones that are. This makes the gamecube logo shadertoy compile properly. Curiously, if we adjust the spill order so that we first spill the constrained RIG nodes instead, the RA also succeeds. However it seems more logical to first spill the unconstrained ones. While we're at it, drop the nv50 max register to reserve r127 as the zero register of last resort (r63 is preferred). Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Karol Herbst <[email protected]>
* nv50/ir/ra: improve condition for short regs, unify with cond for 16-bit | Ilia Mirkin | 2018-11-16 | 1 | -7/+7
| | | | | | | | | | | | | | Instead of the size restriction existing in two places, and potentially being applied twice, we move this together. Ops with 16-bit register addresses can only take a short reg, and ops with immediates can only take a short reg. Of course we leave the immediate 0 in place since we know that it will be replaced by r63/r127 down the line, so don't treat zeroes as an immediate. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* nv50/ir: delete MINMAX instruction that is no longer in the BB | Ilia Mirkin | 2018-11-16 | 1 | -1/+1
| | | | | | | | | We removed the op from the BB, but it was still listed in its sources' uses. This could trip up some logic down the line which analyzes all the uses of an l-value, e.g. spilling. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* gm107/ir: fix compile time warning in getTEXSMask | Karol Herbst | 2018-11-07 | 1 | -0/+1
| | | | | | | | | | | In function 'uint8_t nv50_ir::getTEXSMask(uint8_t)': warning: control reaches end of non-void function [-Wreturn-type] Reported-by: Moiman@freenode Fixes: f821e80213e38e93f96255b3deacb737a600ed40 "gm107/ir: use scalar tex instructions where possible" Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ir: use scalar tex instructions where possible | Karol Herbst | 2018-11-06 | 2 | -3/+317
| | | | | | | | | | | | | | | | | TEXS, TLD4 and TLD4S are variants of tex instructions which are more scalar, which gives RA more freedom and is less likely to insert silly MOVs to satisfy quad registers. shader-db changes:
total instructions in shared programs : 7687265 -> 7614782 (-0.94%)
total gprs used in shared programs    : 803620 -> 798045 (-0.69%)
total shared used in shared programs  : 639636 -> 639636 (0.00%)
total local used in shared programs   : 24648 -> 24648 (0.00%)
total bytes used in shared programs   : 82103400 -> 81330696 (-0.94%)

         local  shared  gpr   inst   bytes
helped   0      0       3648  10647  10647
hurt     0      0       464   205    205

Reviewed-by: Ilia Mirkin <[email protected]>
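The percentages in the shader-db summary are relative changes, and can be rechecked with a one-line formula. The helper below is a sketch with a hypothetical name (`percent_change`); it is not part of shader-db.

```c
#include <stdint.h>

/* Relative change from `before` to `after`, in percent. Negative values
 * mean the total went down. `before` must be non-zero. */
static double
percent_change(uint64_t before, uint64_t after)
{
   return ((double)after - (double)before) / (double)before * 100.0;
}
```

For the instruction totals, 7687265 -> 7614782 is a drop of 72483, or roughly 0.94% of the original count, which agrees with the figure reported in the commit.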
* nv50/ir: add scalar field to TexInstructions | Karol Herbst | 2018-11-06 | 2 | -1/+6
| | | | Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ra: add condenseDef overloads for partial condenses | Karol Herbst | 2018-11-06 | 1 | -8/+21
| | | | Reviewed-by: Ilia Mirkin <[email protected]>