path: root/src/gallium/drivers/nouveau
Commit message | Author | Age | Files | Lines
* gallium: Add a pipe cap for whether primitive restart works for patches. | Kenneth Graunke | 2016-05-23 | 3 | -0/+3
  Some hardware supports primitive restart on patch primitives, and other hardware does not. Modern GL and ES include a query for this feature; adding a capability bit will allow us to answer it.

  As far as I know, AMD hardware does not support this feature, while NVIDIA and Intel hardware does. However, most Gallium drivers do not appear to support tessellation shaders yet. So, I've enabled it for nvc0 and disabled it everywhere else.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* nvc0: do not invalidate compute constbufs on Kepler | Samuel Pitoiset | 2016-05-23 | 1 | -4/+6
  Constbufs are only aliased on Fermi and this will reduce the number of flushes when we switch between 3d and compute.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nv30: don't assert when running out of registers | Ilia Mirkin | 2016-05-22 | 2 | -3/+1
  This happens with dEQP tests. The code doesn't at all protect against this condition, so while unhandled, this is an expected situation.

  Also avoid using more than the first 16 registers for nv3x vertex programs.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nouveau: allow allocating non-object-backed buffers | Ilia Mirkin | 2016-05-22 | 1 | -4/+1
  On nv30, for example, there is no hardware index buffer support. So all of those will be created entirely in user memory.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: fix indirect access for images | Samuel Pitoiset | 2016-05-22 | 1 | -8/+14
  When the array doesn't start at 0 we need to account for su->tex.r. While we are at it, make sure to avoid out of bounds access by masking the index.

  This fixes GL45-CTS.shading_language_420pack.binding_image_array.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reported-by: Dave Airlie <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
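The index fix in the commit above can be illustrated with a small C sketch. This is not the driver code: the helper name `image_slot`, the slot count of 8, and the add-then-mask order are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: a power-of-two number of image slots. */
#define NUM_IMAGE_SLOTS 8u

/* Hypothetical helper, not driver code. "base" is where the image array
 * starts (the role su->tex.r plays in the commit message) and "indirect"
 * is the index computed at run time. */
static uint32_t
image_slot(uint32_t base, uint32_t indirect)
{
   /* The mask trick below only works for power-of-two slot counts. */
   assert((NUM_IMAGE_SLOTS & (NUM_IMAGE_SLOTS - 1)) == 0);

   /* Add the array base first, then mask so the result stays in range. */
   return (base + indirect) & (NUM_IMAGE_SLOTS - 1);
}
```

With a base of 2, an indirect index of 3 selects slot 5, while an out-of-range index such as 9 wraps to slot 3 instead of reading past the last slot.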
* nv30: reset the stencil mask when fast-clearing | Ilia Mirkin | 2016-05-22 | 1 | -1/+6
  Apparently the stencil mask applies to clears on nv30/nv40. Reset it to 0xff before doing a stencil clear. This fixes gl-1.0-readpixsanity and a number of other piglit tests.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nv30,nv50: add PIPE_SHADER_CAP_PREFERRED_IR support | Ilia Mirkin | 2016-05-22 | 2 | -6/+12
  The mesa state tracker has recently started to query this.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: fix setting of tess_mode in various situations | Ilia Mirkin | 2016-05-22 | 1 | -4/+14
  This fixes a lot of INVALID_VALUE errors reported by the card when running dEQP tests.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: "11.1 11.2" <[email protected]>
* nv50/ir: fix prog info init | Ilia Mirkin | 2016-05-22 | 1 | -3/+1
  Left over from the pre-mainline tess support. Adapt to use the new defines.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Cc: "11.1 11.2" <[email protected]>
* nvc0/ir: return 0 for gl_TessCoord.z for non-triangles modes | Ilia Mirkin | 2016-05-22 | 1 | -0/+4
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Cc: "11.1 11.2" <[email protected]>
* nvc0: expose GLSL version 420 on GF100 | Samuel Pitoiset | 2016-05-21 | 1 | -1/+1
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: enable ARB_shader_image_load_store on GF100 | Samuel Pitoiset | 2016-05-21 | 1 | -0/+3
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: add a lowering pass for surfaces on Fermi | Samuel Pitoiset | 2016-05-21 | 2 | -0/+117
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: add emission for SULDB and SUSTx | Samuel Pitoiset | 2016-05-21 | 1 | -2/+44
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: add emission for OP_SULEA | Samuel Pitoiset | 2016-05-21 | 1 | -0/+58
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix tex constraints for surface coords on Fermi | Samuel Pitoiset | 2016-05-21 | 1 | -0/+6
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: use moveSources to condense sources | Ilia Mirkin | 2016-05-21 | 1 | -6/+1
  This makes sure that rIndirectSrc and other things stay updated.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* nvc0: bind images on fragment and compute shaders for Fermi | Samuel Pitoiset | 2016-05-21 | 4 | -7/+196
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: don't check the format for surface stores on Kepler | Samuel Pitoiset | 2016-05-21 | 1 | -8/+7
  Initially to make sure the format doesn't mismatch and won't produce out-of-bounds access, we checked that both formats have exactly the same number of bytes, but this should not be checked for type stores.

  This fixes serious rendering issues in the UE4 demos (tested with realistic and reflections).

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix a comment in canDualIssue() | Samuel Pitoiset | 2016-05-21 | 1 | -1/+1
  Trivial.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix SUSTx constraints on Kepler | Samuel Pitoiset | 2016-05-21 | 1 | -3/+1
  To prevent out-of-bounds access and format mismatch we add a predicate on sustp, but we have to account for it when the sources are condensed because a predicate is a source. Using the range 3:6 will only condense the input data and it's always the case.

  This also fixes constraints when an indirect access is used.

  This ensures that sources are correctly aligned.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: account for shader-allocated local memory needs | Ilia Mirkin | 2016-05-19 | 2 | -2/+2
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50/ir: treat addresses as local | Ilia Mirkin | 2016-05-19 | 1 | -1/+1
  Address registers are always loaded right before use. Don't treat them as "global", which will cause them to be put into the function's linkage, and will make the register allocator hold onto that register until the end of the function.

  Signed-off-by: Ilia Mirkin <[email protected]>
* Treewide: Remove Elements() macro | Jan Vesely | 2016-05-17 | 5 | -9/+9
  Signed-off-by: Jan Vesely <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
* nvc0/ir: fix shared atomic lowering to preserve shared memory location | Ilia Mirkin | 2016-05-17 | 1 | -10/+8
  We were always doing atomics on shared memory location 0 instead of the originally supplied location. Make sure to pass through the original symbol and any indirection.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Cc: [email protected] # note: expect minor conflict
* nvc0/ir: make sure out-of-bounds buffer loads/atomics get a 0 result | Ilia Mirkin | 2016-05-17 | 1 | -1/+26
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50/ir: avoid asserts when the state tracker feeds us bogus inputs | Ilia Mirkin | 2016-05-15 | 1 | -12/+48
  INTERP is defined (by me) to have to have a INPUT source. However the state tracker does not always obey this. This happens due to varying packing logic introducing additional mov's which can't always be undone.

  Instead of just giving up, we instead try harder to find the original input. This won't always be possible, for example with indirect accesses. There's not much we can (easily) do about that though.

  This fixes the remaining interpolateAt* failures in dEQP:

  dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at*

  some of which were asserting due to INTERP_* being passed a non-input.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* nvc0: don't try to go through the push path for indirect draws | Ilia Mirkin | 2016-05-15 | 1 | -1/+2
  This fixes

  dEQP-GLES31.functional.draw_indirect.draw_elements_indirect.*.default_attribute

  These tests were causing a const vbo to be set up, and were small enough draws that the logic was trying to go via the push path (which emits data directly into the cmd stream rather than uploading a user vbo).

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
  Reviewed-by: Samuel Pitoiset <[email protected]>
* nvc0/ir: make sure to align the second arg of TXD to 4, as we do for TEX | Ilia Mirkin | 2016-05-15 | 1 | -0/+14
  This was handled in handleTEX(), however the way the logic works, those extra arguments aren't added on by then, so it did nothing. Instead we must duplicate that bit here. GK110 appears to complain about MISALIGNED_GPR, however it's reasonable to believe that GK104 has the same requirements.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95403
  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
  Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50,nvc0: add support for cull distances | Tobias Klausmann | 2016-05-15 | 10 | -11/+33
  Cull distances are just a special case of clip distances as far as the hardware is concerned. Make sure that the relevant "planes" are enabled, and flip the clip mode to cull for those.

  Signed-off-by: Tobias Klausmann <[email protected]>
  [imirkin: add enables on nvc0, add nv50 support]
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tobias Klausmann <[email protected]>
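The idea of treating cull distances as clip planes with a different mode can be sketched with two bitmasks. This is an illustrative model only: the struct, the function name, and the assumption that cull distances are numbered directly after the clip distances (with at most 8 planes in total) are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state: one enable bit and one mode bit per plane. */
struct clip_cull_state {
   uint32_t enable;     /* planes that take part in clipping or culling */
   uint32_t cull_mode;  /* subset of "enable" that culls instead of clips */
};

/* Assumption: planes [0, num_clip) clip, and the next num_cull planes cull. */
static struct clip_cull_state
clip_cull_masks(unsigned num_clip, unsigned num_cull)
{
   struct clip_cull_state s;
   uint32_t clip_bits, cull_bits;

   assert(num_clip + num_cull <= 8);

   clip_bits = (UINT32_C(1) << num_clip) - 1;
   cull_bits = ((UINT32_C(1) << num_cull) - 1) << num_clip;

   s.enable = clip_bits | cull_bits;   /* every used plane is enabled */
   s.cull_mode = cull_bits;            /* only the cull planes flip mode */
   return s;
}
```

For two clip distances and three cull distances, this enables planes 0 to 4 and marks planes 2 to 4 as culling.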
* gallium: Add a pipe cap for arb_cull_distance | Tobias Klausmann | 2016-05-14 | 3 | -0/+3
  This lets us safely enable or disable the extension as needed

  Signed-off-by: Tobias Klausmann <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* nvc0: fix indentation in nvc0_invalidate_resource_storage() | Samuel Pitoiset | 2016-05-12 | 1 | -52/+52
  Trivial.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: save some CPU cycles in nvc0_context_unreference_resources() | Samuel Pitoiset | 2016-05-12 | 1 | -8/+6
  This reduces the number of loop iterations for invalidating buffers and images.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: invalidate texture buffers for compute | Samuel Pitoiset | 2016-05-12 | 1 | -3/+8
  This is a pretty rare situation, but it can happen.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix gl_SampleMaskIn computation | Ilia Mirkin | 2016-05-11 | 10 | -5/+94
  The SAMPLEMASK semantic should only return the bits set covered by the current invocation. However we were always retrieving the covmask, which returns the covered samples of the whole pixel.

  When not doing per-sample invocation, this is precisely what we want. However when doing per-sample invocation, we have to select the sampleid'th bit and only return that. Furthermore, this means that we have to have a 1:1 correlation for invocations and samples.

  This fixes most dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.* tests. A few failures remain due to disagreements about nr_samples==1 logic as well as what happens with MSAA x2 RTs when the shading fraction is 0.5.

  Signed-off-by: Ilia Mirkin <[email protected]>
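The bit selection described in the commit above can be sketched in C. This only illustrates the mask logic and is not the nvc0 codegen; the function name `sample_mask_in` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not driver code: models the value gl_SampleMaskIn
 * should hold for one fragment shader invocation.
 *
 * covmask    - coverage bits of the whole pixel
 * sample_id  - sample index of this invocation (only used per-sample)
 * per_sample - true when one invocation runs per sample
 */
static uint32_t
sample_mask_in(uint32_t covmask, unsigned sample_id, bool per_sample)
{
   assert(sample_id < 32);

   if (!per_sample)
      return covmask;   /* whole-pixel coverage is exactly what is wanted */

   /* Keep only the sample_id'th bit of the pixel's coverage. */
   return covmask & (UINT32_C(1) << sample_id);
}
```

With a pixel coverage of 0xB (samples 0, 1 and 3), a per-sample invocation for sample 1 sees 0x2, and one for the uncovered sample 2 sees 0.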
* nv50/ir: generalize interp fixups to be able to fixup anything | Ilia Mirkin | 2016-05-11 | 10 | -60/+71
  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: enable compute support by default on GK110+ | Samuel Pitoiset | 2016-05-10 | 1 | -15/+3
  Compute support seems to be pretty stable now, and according to piglit it doesn't seem to break 3D state.

  As a side effect, this will expose ARB_compute_shader on GK110/GK208.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: silence unsupported TGSI_PROPERTY_CS_FIXED_BLOCK_* | Samuel Pitoiset | 2016-05-09 | 1 | -0/+5
  We don't need them for compute shaders.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: unreference images when the context is destroyed | Samuel Pitoiset | 2016-05-06 | 1 | -0/+4
  Like other resources, we need to unreference all images.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nouveau/video: properly detect the decoder class for availability checks | Ilia Mirkin | 2016-05-04 | 1 | -8/+17
  The kernel is now more strict with the class ids it exposes, so we need to check the G98 and MCP89 classes as well as the GT215 class. This effectively caused us to decide there were no decoding capabilities on newer kernel for VP3 chips.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95251
  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: "11.2" <[email protected]>
* nvc0: compute a percentage for metric-achieved_occupancy | Samuel Pitoiset | 2016-05-03 | 1 | -4/+4
  metric-issue_slot_utilization and metric-branch_efficiency are already computed as percentages.

  Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: display some performance metrics with a percentage | Samuel Pitoiset | 2016-05-03 | 1 | -3/+3
  This makes more sense for them.

  Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: store the driver query type for performance metrics | Samuel Pitoiset | 2016-05-03 | 1 | -18/+22
  This will allow using percentages for some metrics, because the Gallium HUD cannot display floating point numbers and prints 0 instead.

  Signed-off-by: Samuel Pitoiset <[email protected]>
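The reason percentages help can be shown with a short C sketch: a ratio between 0 and 1 truncates to 0 as an integer, while the same ratio scaled to a percentage keeps its information. The helper name `metric_percent` is hypothetical and the code is an illustration, not the driver's metric code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: report numerator/denominator as an integer
 * percentage so that it survives an integer-only display. */
static uint64_t
metric_percent(uint64_t numerator, uint64_t denominator)
{
   if (denominator == 0)
      return 0;   /* nothing was counted yet */

   /* Guard the multiplication below against overflow. */
   assert(numerator <= UINT64_MAX / 100);

   /* Multiply before dividing so the fraction is not truncated to 0. */
   return numerator * 100 / denominator;
}
```

For example, 3 out of 4 is reported as 75, whereas the plain integer division 3 / 4 would give 0.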
* nvc0: fix exposing of metric-issue_slots for SM21/SM30 | Samuel Pitoiset | 2016-05-03 | 1 | -2/+22
  This is most likely a copy-paste error from when I reworked this area a few weeks ago. For SM20, metric-issue_slots is equal to inst_issued because there is only one pipeline, so the metric is not exposed there.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reported-by: Karol Herbst <[email protected]>
* nv50,nvc0: re-bind old compute state after reading MP perf counters | Samuel Pitoiset | 2016-05-02 | 2 | -0/+4
  This might be useful to avoid breaking the current compute state when monitoring MP perf counters because we use a compute kernel to read out those counters.

  This has been initially suggested by Ilia Mirkin.

  Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: stick compute kernel arguments into uniform_bo | Samuel Pitoiset | 2016-04-29 | 5 | -26/+10
  Having one buffer object for input kernel arguments coming from clover and another one for OpenGL user uniforms is unnecessary. Using the uniform_bo object for both GL/CL uniforms avoids declaring a new BO.

  This only affects compute programs but it should not hurt anything because the states are dirtied and data will get reuploaded.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Hans de Goede <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nouveau: codegen: LOAD: Take src swizzle into account | Hans de Goede | 2016-04-27 | 1 | -2/+6
  The llvm TGSI backend uses pointers in registers and does things like:

  LOAD TEMP[0].y, MEMORY[0], TEMP[0]

  Expecting the data at address TEMP[0].x to get loaded to TEMP[0].y. But this will cause the data at TEMP[0].x + 4 to be loaded instead.

  This commit adds support for a swizzle suffix for the 1st source operand, which allows using:

  LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0]

  And actually getting the desired behavior

  Signed-off-by: Hans de Goede <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
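The addressing behavior described in the commit above can be modelled in a few lines of C. This is a simplified model under stated assumptions, not the codegen itself: the function name `load_address`, the swizzle encoding (0 = x through 3 = w) and the fixed 4-byte component size are chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: byte address read for destination component
 * dst_comp (0 = x .. 3 = w), given the base address held in TEMP[0].x
 * and the swizzle applied to the resource operand. */
static uint32_t
load_address(uint32_t base, const unsigned swizzle[4], unsigned dst_comp)
{
   assert(dst_comp < 4);
   assert(swizzle[dst_comp] < 4);

   /* Each component is 4 bytes wide; the swizzle picks which one is read. */
   return base + 4 * swizzle[dst_comp];
}
```

With the identity swizzle, writing `.y` reads from the base address plus 4, which is the behavior the commit message complains about. With `.xxxx`, every destination component reads from the base address itself.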
* nouveau: codegen: LOAD: Do not call fetchSrc(1) if the address is immediate | Hans de Goede | 2016-04-27 | 1 | -2/+3
  "off" later gets set to NULL when the address is immediate, so move the fetchSrc(1) call to the non-immediate branch of the if-else.

  This brings handleLOAD's offset handling in line with how it is done in handleSTORE.

  Signed-off-by: Hans de Goede <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nouveau: codegen: LOAD: Always use component 0 when getting the address | Hans de Goede | 2016-04-27 | 1 | -1/+3
  LOAD loads up to 4 components from the specified resource, starting at the passed-in x value of the 2nd source operand. The y, z and w components of the address should not be used.

  Signed-off-by: Hans de Goede <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: Remove every double semi-colon | Jakob Sinclair | 2016-04-26 | 2 | -2/+2
  Signed-off-by: Jakob Sinclair <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>