summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/nouveau
Commit message (Collapse)AuthorAgeFilesLines
* nvc0: use nvc0_m2mf_push_linear() to reduce code duplicationSamuel Pitoiset2016-07-261-13/+3
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: use nve4_p2mf_push_linear() to reduce code duplicationSamuel Pitoiset2016-07-261-36/+9
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: upload sample locations on GM20xSamuel Pitoiset2016-07-243-5/+31
| | | | | | | | This fixes a bunch of multisample piglit tests on GM206, like bin/arb_texture_multisample-texelfetch 2 -auto -fbo Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: fix up an assertion in emitUADD()Samuel Pitoiset2016-07-241-4/+3
| | | | | | | | It's illegal to have neg modifiers on both sources for OP_ADD, and it's illegal to have OP_SUB with just src0 neg. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix wrong indentation in nvc0_validate_fb()Samuel Pitoiset2016-07-231-141/+141
| | | | | | Trivial. Signed-off-by: Samuel Pitoiset <[email protected]>
* gallium: split transfer_inline_write into buffer and texture callbacksMarek Olšák2016-07-237-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | to reduce the call indirections with u_resource_vtbl. The worst call tree you could get was: - u_transfer_inline_write_vtbl - u_default_transfer_inline_write - u_transfer_map_vtbl - driver_transfer_map - u_transfer_unmap_vtbl - driver_transfer_unmap That's 6 indirect calls. Some drivers only had 5. The goal is to have 1 indirect call for drivers that care. The resource type can be determined statically at most call sites. The new interface is: pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data) pipe_context::texture_subdata(ctx, resource, level, usage, box, data, stride, layer_stride) v2: fix whitespace, correct ilo's behavior Reviewed-by: Nicolai Hähnle <[email protected]> Acked-by: Roland Scheidegger <[email protected]>
* nv50/ir: allow to swap sources for OP_SUBSamuel Pitoiset2016-07-221-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows the load-propagation pass to swap the sources in presence of immediate values. Maxwell (GM107): total instructions in shared programs :1928187 -> 1927634 (-0.03%) total gprs used in shared programs :330741 -> 330154 (-0.18%) total local used in shared programs :28032 -> 28032 (0.00%) local gpr inst bytes helped 0 271 425 425 hurt 0 0 194 194 Fermi (GF114): total instructions in shared programs :2334474 -> 2333829 (-0.03%) total gprs used in shared programs :380934 -> 380215 (-0.19%) total local used in shared programs :33304 -> 33264 (-0.12%) local gpr inst bytes helped 5 314 521 521 hurt 0 4 195 195 No regressions on GM107 and GF114 with full piglit. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/mme: fix offsets used for indirect drawsSamuel Pitoiset2016-07-222-8/+8
| | | | | | | | | | This fixes a regression introduced in 1da704a94c57aa0b0cf8faaa3236fe47dfb8f88c because the offset has moved from 0x180 to 0x1a0, and the macros have to be re-compiled. Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix offsets of MP perf counters input parametersSamuel Pitoiset2016-07-221-15/+15
| | | | | | | | | | | | | This fixes a regression introduced in 1da704a94c57aa0b0cf8faaa3236fe47dfb8f88c because the offset has moved from 0x600 to 0x620, and the kernels used for reading MP perf counters have to be re-assembled. This also fixes amd_performance_monitor_measure piglit. Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: add a cap for VIEWPORT_SUBPIXEL_BITS (v2)Józef Kucia2016-07-203-0/+3
| | | | | | | | | | | | This allows Gallium drivers to advertise the subpixel precision for floating point viewports bounds. v2: - Set ViewportSubpixelBits in st_init_limits. Signed-off-by: Józef Kucia <[email protected]> Signed-off-by: Marek Olšák <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: disable MS images on GM107+Samuel Pitoiset2016-07-201-0/+7
| | | | | | | | MS images have to be handled explicitly and I don't plan to implement them for now. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: print OP_SUREDB subops in debug modeSamuel Pitoiset2016-07-201-0/+1
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ir: add emission for SUREDxSamuel Pitoiset2016-07-201-0/+50
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ir: add emission for SUSTx and SULDxSamuel Pitoiset2016-07-201-0/+104
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ra: fix constraints for surface operationsSamuel Pitoiset2016-07-201-2/+23
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ir: lower surface operationsSamuel Pitoiset2016-07-202-1/+77
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: bind images for 3d/cp shaders on GM107+Samuel Pitoiset2016-07-205-18/+207
| | | | | | | | | On Maxwell, images binding is slightly different (and much better) regarding Fermi and Kepler because a texture view needs to be uploaded for each image and this is going to simplify the thing a lot. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: increase the tex handles area size in the driver cbSamuel Pitoiset2016-07-201-11/+11
| | | | | | | | | | | | Currently, we can store 32 tex handles of 32-bits integer each and that fits perfectly with the underlying hardware except on GM107+ which requires to upload a texture view for each images. This patch increases the number of storable texture handles in the driver constant buffer from 32 to 40 because we expose 8 images. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ir: make use of ADD32I for all immediatesSamuel Pitoiset2016-07-191-1/+1
| | | | | | | | ADD only allows to emit 19-bits immediates. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* gm107/ir: add missing NEG modifier for IADD32ISamuel Pitoiset2016-07-191-0/+1
| | | | | | | | Like FADD32I, the NEG modifier of src0 is at position 56. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nv50,nvc0: srgb rendering is only available for rgba/bgraIlia Mirkin2016-07-181-2/+2
| | | | | | | | | | | | | | | | Mark both L8_SRGB and L8A8_SRGB as non-renderable (the latter already didn't have the bind flags). This makes the state tracker pick a different format when rendering is required, or mark the fb as incomplete. This fixes: bin/getteximage-formats init-by-clear-and-render -auto -fbo bin/getteximage-formats init-by-rendering -auto -fbo which previously ran into srgb-encoding differences. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* nvc0: add support for BGRA8 imagesIlia Mirkin2016-07-187-1/+16
| | | | | | | | This is useful for pbo downloads, which are now accelerated with images. BGRA8 is a moderately common format to do that in. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50: fix alphatest for non-blendable formatsIlia Mirkin2016-07-1614-11/+118
| | | | | | | | | | | | | | | | | | | | The hardware can only do alphatest when using a blendable format. This means that the various *16 norm formats didn't work with alphatest. It appears that Talos Principle uses such formats, as well as alpha tests, for some internal renders, which made them be incorrect. However this does not appear to affect the final renders, but in a different game it easily could. The approach we take is that when alphatests are enabled and a suitable format is used (which we anticipate is the vast minority of the time), we insert code into the shader to perform the comparison and discard. Once inserted, that code lives in the shader forever, and we re-upload it each time the function changes with a fixed-up compare. To avoid re-uploading too often, if we switch back to a blendable format, the test is (effectively) disabled and the hw alphatest functionality is used. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: add missing string for SV_WORK_DIMSamuel Pitoiset2016-07-141-0/+1
| | | | | | | Fixes: 2aa1197 ("nouveau: Add support for SV_WORK_DIM") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Hans de Goede <[email protected]>
* nvc0: initial support for GP100 GPUsBen Skeggs2016-07-124-5/+15
| | | | Signed-off-by: Ben Skeggs <[email protected]>
* nvc0: use a define for the driver constant buffer sizeSamuel Pitoiset2016-07-117-17/+17
| | | | | | | This might avoid mistakes if the size is bumped in the future. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix the driver cb size when draw parameters are usedSamuel Pitoiset2016-07-111-2/+2
| | | | | | | | | | | | The size of the driver constant buffer for each stage should be 2048 and not 512 because it has been increased recently for buffers/images. While we are at it, do the same change for indirect draws. This fixes all ARB_shader_draw_parameters tests on GM107. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: 12.0 <[email protected]>
* nvc0/ir: fix images indirect access on FermiSamuel Pitoiset2016-07-111-0/+7
| | | | | | | | | | | This fixes the following piglits: arb_arrays_of_arrays-basic-imagestore-mixed-const-non-const-uniform-index arb_arrays_of_arrays-basic-imagestore-mixed-const-non-const-uniform-index2 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: 12.0 <[email protected]>
* nvc0/ir: remove unused resource info loading helpersSamuel Pitoiset2016-07-082-28/+0
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: refactor the surfaces info loading logicSamuel Pitoiset2016-07-082-82/+44
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: move the shift left op inside loadTexHandle()Samuel Pitoiset2016-07-081-8/+6
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: rename NVE4_SU_INFO_XXX to NVC0_SU_INFO_XXXSamuel Pitoiset2016-07-051-49/+49
| | | | | | | | While we are at it, fix a typo inside the comment which describes what those constants are for. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: reset the base offset for indirect images accessesSamuel Pitoiset2016-07-051-2/+4
| | | | | | | | | | In presence of an indirect image access, the base offset should be zeroed because the stride will be computed twice. This is a pretty rare situation but it can happen when tex.r > 0. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: "11.2 12.0" <[email protected]>
* gm107/ir: fix sign bit emission for FADD32ISamuel Pitoiset2016-07-051-3/+6
| | | | | | | | | | | | When emitting OP_SUB, the sign bit for FADD and FADD32I is not at the same position. It's at position 45 for FADD but 51 for FADD32I. This fixes the following piglit test: tests/spec/arb_fragment_program/fdo30337b.shader_test Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* nv30: Fix "array subscript is below array bounds" compiler warningHans de Goede2016-07-021-2/+1
| | | | | | | | gcc6 does not like the trick where we point to one entry before the array start and then start a while with a pre-increment. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nouveau: Fix a couple of "foo may be used uninitialized' compiler warningsHans de Goede2016-07-022-3/+3
| | | | | | | | | | | | | These are all new false positives with gcc6. In nouveau_compiler.c: gcc6 no longer assumes that passing a pointer to a variable into a function initialises that variable. In nv50_ir_from_tgsi.cpp op and mode are not set if there are 0 enabled dst channels, this never happens, but gcc cannot know this. Signed-off-by: Hans de Goede <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* nouveau: Fix gcc6 / c++11 auto_ptr deprecation compiler warningsHans de Goede2016-07-021-0/+4
| | | | | Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nouveau: Add support for SV_WORK_DIMHans de Goede2016-07-028-12/+29
| | | | | | | | Add support for SV_WORK_DIM for nvc0 and nve4. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nvc0: Make NVC0_CB_AUX_GRID_INFO take an index argumentHans de Goede2016-07-023-4/+4
| | | | | | | | | This brings it inline with the other macros like NVC0_CB_AUX_UBO_INFO and NVC0_CB_AUX_TEX_INFO. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nvc0: fix up image support for allowing multiple samplesIlia Mirkin2016-07-017-49/+108
| | | | | | | | | Basically we just have to scale up the coordinates and then add the relevant sample offset. The code to handle this was already largely present from Christoph's earlier attempts to pipe images through back in the dark ages, this just hooks it all up. Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: go back to not using viewport validate function for swtnlIlia Mirkin2016-07-012-1/+16
| | | | | | | | | The output of draw requires a null viewport transform, which the regular code is ill-equiped to do. Reinstate the original settings in the render path, and add setting of the viewport clip polygon based on fb width/height (as that is all taken care of by draw). Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: fix viewport clipping settings to be based on viewport, not rtIlia Mirkin2016-07-012-17/+11
| | | | | | | This fixes a ton of "*clip*" dEQP GLES2 tests, as well as triangle-guardband-viewport in piglit. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: print EMIT subops in debug modeSamuel Pitoiset2016-06-291-0/+9
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: print RSQ/RCP subops in debug modeSamuel Pitoiset2016-06-291-0/+10
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: print PIXLD subops in debug modeSamuel Pitoiset2016-06-291-0/+9
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: print SHFL subops in debug modeSamuel Pitoiset2016-06-291-0/+9
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ir: make sure that flagsDef is set when emitting setcondSamuel Pitoiset2016-06-281-1/+1
| | | | | | | | | | | Rely on the existence of a second destination when emitting a setcond flag is dangerous, because this doesn't mean that the flag has been correctly set. Instead rely on flagsDef like what emitX() does for flagsSrc. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* gm107/ir: add missing setcond flags for LOP variantsSamuel Pitoiset2016-06-281-0/+2
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* gm107/ir: make use of LOP32I for all immediatesSamuel Pitoiset2016-06-281-1/+1
| | | | | | | | LOP only allows to emit 19-bits immediates. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* gm107/ir: make use of MOV32I for all immediatesSamuel Pitoiset2016-06-271-2/+1
| | | | | | | | | MOV only allows to emit 19-bits immediates. This is similar to the previous fix I did for IMUL. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>