summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/nouveau
Commit message (Collapse)AuthorAgeFilesLines
* gm107/ir: allow indirect inputs to be loaded by frag shaderIlia Mirkin2016-09-102-5/+21
| | | | | | | | | Looks like the GM107 IPA op does not allow a separate offset when using an indirect register. Instead we must use AL2P like we do for indirect vertex operations on Kepler+. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* gm107/ir: AL2P writes to a predicate registerIlia Mirkin2016-09-101-0/+1
| | | | | | | | | | | We have to force it to write to predicate 7 (aka PT) in order for it not to mess up another predicate. Unclear what would be returned in the predicate, perhaps an error code for out-of-bounds requests. Blob doesn't seem to check it. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* gallium: remove PIPE_BIND_TRANSFER_READ/WRITEMarek Olšák2016-09-083-12/+6
| | | | | | | | not used in any useful way Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gk110/ir: fix quadop dall emissionIlia Mirkin2016-09-041-2/+2
| | | | | | | | | We recently starting to always emit the NDV (== dall) bit for quadops. However it was folded into the wrong code word. Fixes: e0a067ed48 (nv50/ir: always emit the NDV bit for OP_QUADOP) Signed-off-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* nvc0/ir: allow min/max instructions to be dual-issued in pairsKarol Herbst2016-09-031-2/+12
| | | | | | | | | | | | | | changes for GpuTest /test=pixmark_piano /benchmark /no_scorebox /msaa=0 /benchmark_duration_ms=60000 /width=1024 /height=640: inst_executed: 1.03G inst_issued1: 614M -> 580M inst_issued2: 213M -> 230M score: 1021 -> 1030 Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: respect render condition enable flag when clearing rt/zsIlia Mirkin2016-09-032-12/+24
| | | | | | | | This is a newly added flag. We always pass false into it from nv50_clear_texture, but other callers may want to respect the render condition. (And the functions were originally spec'd to respect it.) Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: don't dual-issue ops that depend or interfere with each otherKarol Herbst2016-09-033-14/+23
| | | | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Tobias Klausmann <[email protected]> [imirkin: rewrite to split up the helpers and move more logic to target] Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: reduce the initial code segment size to 512KBSamuel Pitoiset2016-09-011-1/+1
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: allow to resize the code segment dynamicallySamuel Pitoiset2016-09-011-1/+24
| | | | | | | | | | | | | When an application uses a ton of shaders, we need to evict them when the code segment is full but this is not really a good solution if monster shaders are used because code eviction will happen a lot. To avoid this, it seems better to dynamically resize the code segment area after each eviction. The maximum size is arbitrary fixed to 8MB which should be enough. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: add a new bin for the code segmentSamuel Pitoiset2016-09-012-4/+6
| | | | | | | | | To avoid the bins list to grow up indefinitely when the code segment size will be bumped, we need to separate that bin from the SCREEN one because it contains other resources like the uniform bo. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: add nvc0_screen_resize_text_area() helperSamuel Pitoiset2016-09-013-10/+40
| | | | | | | | This function will be helpful for resizing the code segment area when we need to evict all shaders. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: re-upload currently bound shaders after code evictionSamuel Pitoiset2016-09-011-0/+27
| | | | | | | | | | | | | This fixes a very old issue which happens when the code segment size is full. A bunch of real applications like Tomb Raider, F1 2015, Elemental, hit that issue because they use a ton of shaders. In this case, all shaders are evicted (for freeing space) but all currently bound shaders also need to be re-uploaded and SP_START_ID have to be updated accordingly. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: refactor the program upload processSamuel Pitoiset2016-09-013-32/+59
| | | | | | | | | | This refactoring will help for fixing the "out of code space" eviction issue because we will need to reupload the code for all currently bound shaders but it's slightly different than uploading a new fresh code. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: remove an attempt at uploading all IMMD into a CBSamuel Pitoiset2016-08-313-40/+0
| | | | | | | | | | | This has never been used because info->immd.bufSize is always 0 and anyways this is an experimental code which has never been completed. This gets rid of some unused code in the program validation process. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50: remove unused nv50_program::immd_size fieldSamuel Pitoiset2016-08-311-1/+0
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv30: set usage to staging so that the buffer is allocated in GARTIlia Mirkin2016-08-311-1/+2
| | | | | | | | | | The code a few lines below expects to migrate the bo in question to VRAM. Since we're filling the initial data via CPU, it's more efficient to create the temporary buffer in GART. There is no "push" method implemented, otherwise we'd use that instead. Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nv30: only bail on color/depth bpp mismatch when surfaces are swizzledIlia Mirkin2016-08-311-2/+3
| | | | | | | | | | | The actual restriction is a little weaker than I originally thought. See https://bugs.freedesktop.org/show_bug.cgi?id=92306#c17 for the suggestion. This also explain why things weren't *always* failing before, only sometimes. We will allocate a non-swizzled depth buffer for NPOT winsys buffer sizes, which they almost always are. Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nvc0: fix indentation in nvc0_screen_init()Samuel Pitoiset2016-08-301-1/+1
| | | | | | | Trivial. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: check return value of nvc0_screen_resize_tls_area()Samuel Pitoiset2016-08-302-11/+8
| | | | | | | | While we are at it, make it static and change the return values policy to be consistent. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: make use of FAIL_SCREEN_INIT in nvc0_screen_create()Samuel Pitoiset2016-08-301-9/+7
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: always emit the NDV bit for OP_QUADOPSamuel Pitoiset2016-08-302-8/+2
| | | | | | | | | | | | | | This silences a divergent error found with F1 2015. Basically, the NDV bit has to be set when a FSWZ instruction is inside divergent code, but it's not needed otherwise. The correct fix should be to set it only in divergent code situations. GM107 emitter already sets that bit. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* nvc0: undo overzealous enum usageIlia Mirkin2016-08-301-2/+2
| | | | | | | | Commit 7413625ad3 flipped a few functions too many to use pipe_shader_type. These functions actually take an integer that does not correspond 1:1 with the enum. Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add cap to export device pointer sizeJan Vesely2016-08-292-0/+4
| | | | | | | | | v2: document the new cap v3: fix 80 char limit in screen.rst Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* gallium: Use enum pipe_shader_type in set_shader_images()Kai Wasserbäch2016-08-291-1/+2
| | | | | Signed-off-by: Kai Wasserbäch <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium: Use enum pipe_shader_type in set_shader_buffers()Kai Wasserbäch2016-08-291-1/+1
| | | | | Signed-off-by: Kai Wasserbäch <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium: Use enum pipe_shader_type in set_sampler_views()Kai Wasserbäch2016-08-293-3/+3
| | | | | Signed-off-by: Kai Wasserbäch <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium: Use enum pipe_shader_type in bind_sampler_states() (v2)Kai Wasserbäch2016-08-293-5/+16
| | | | | | | | | | | v1 → v2: - Fixed indentation (noted by Brian Paul) - Removed second assert from nouveau's switch statements (suggested by Brian Paul) Signed-off-by: Kai Wasserbäch <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* nvc0: invalidate textures/samplers on GK104+Samuel Pitoiset2016-08-242-12/+22
| | | | | | | | | | | | | | Like Fermi, textures and samplers are aliased between 3D and compute, especially the TIC_FLUSH/TSC_FLUSH methods and we have to re-validate these resources when switching between the two pipelines. This fixes a GPU hang with Elemental (and most likely with other UE4 demos). Tested on GK107 and GM107. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> CC: <[email protected]>
* gallium: add a cap to expose whether driver supports mixed color/zs bitsIlia Mirkin2016-08-233-0/+3
| | | | | | | | | | Some hardware can't render to color/depth buffers of mixed bitness. When that happens a fallback has to happen, but this allows the driver to express that this isn't an optimal scenario. The purpose of this is to remove such fbconfigs from the GLX/EGL config list. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nv50/ir: make sure cfg iterator always hits all blocksIlia Mirkin2016-08-231-4/+4
| | | | | | | | | | | | In some very specially-crafted cases, we could attempt to visit a node that has already been visited, and then run out of bb's to visit, while there were still cross blocks on the list. Make sure that those get moved over in that case. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96274 Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* gallium: change pipe_image_view::first_element/last_element -> offset/sizeMarek Olšák2016-08-172-19/+9
| | | | | | | | | This is required by OpenGL. Our hardware supports this. Example: Bind RGBA32F with offset = 4 bytes. Acked-by: Ilia Mirkin <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* gallium: change pipe_sampler_view::first_element/last_element -> offset/sizeMarek Olšák2016-08-172-12/+13
| | | | | | | | | | | This is required by OpenGL. Our hardware supports this. Example: Bind RGBA32F with offset = 4 bytes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97305 Acked-by: Ilia Mirkin <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* nv50/ir: fix bb positions after exit instructionsIlia Mirkin2016-08-161-3/+10
| | | | | | | | | | | It's fairly rare that the BB layout puts BBs after the exit block, which is likely the reason these issues lingered for so long. This fixes a fraction of issues with the giant pixmark piano shader. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: <[email protected]>
* nv50/ir: properly clear upper bits of a bitset fillIlia Mirkin2016-08-161-2/+2
| | | | | | | Found by inspection. In practice, val is always == 0, so this never got triggered. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: fix depth range when halfz is enabledIlia Mirkin2016-08-142-4/+14
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97231 Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.2 12.0" <[email protected]>
* gallium: add a pipe_context parameter to fence_finishMarek Olšák2016-08-101-0/+1
| | | | | | | | required by glClientWaitSync (GL 4.5 Core spec) that can optionally flush the context Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add render_condition_enable param to clear_render_target/depth_stencilMarek Olšák2016-08-104-11/+17
| | | | | Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* nvc0: enable ARB_tessellation_shader on GM107+Samuel Pitoiset2016-07-271-3/+0
| | | | | | | This exposes OpenGL 4.1 on Maxwell (tested on GM107 and GM206). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ir: add a legalize SSA pass for PFETCHSamuel Pitoiset2016-07-274-2/+43
| | | | | | | PFETCH, actually ISBERD on GM107+ ISA only accepts a GPR for src0. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix up TCP header on GM107+Samuel Pitoiset2016-07-271-0/+9
| | | | | | | | | | The number of outputs patch (limited to 255) has moved in the TCP header, but blob seems to also set the old position. Also, the high 8-bits are now located inbetween the min/max parallel output read address at position 20. Signed-off-by: Samuel Pitoiset <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* nvc0: use nvc0_m2mf_push_linear() to reduce code duplicationSamuel Pitoiset2016-07-261-13/+3
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: use nve4_p2mf_push_linear() to reduce code duplicationSamuel Pitoiset2016-07-261-36/+9
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: upload sample locations on GM20xSamuel Pitoiset2016-07-243-5/+31
| | | | | | | | This fixes a bunch of multisample piglit tests on GM206, like bin/arb_texture_multisample-texelfetch 2 -auto -fbo Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: fix up an assertion in emitUADD()Samuel Pitoiset2016-07-241-4/+3
| | | | | | | | It's illegal to have neg modifiers on both sources for OP_ADD, and it's illegal to have OP_SUB with just src0 neg. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix wrong indentation in nvc0_validate_fb()Samuel Pitoiset2016-07-231-141/+141
| | | | | | Trivial. Signed-off-by: Samuel Pitoiset <[email protected]>
* gallium: split transfer_inline_write into buffer and texture callbacksMarek Olšák2016-07-237-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | to reduce the call indirections with u_resource_vtbl. The worst call tree you could get was: - u_transfer_inline_write_vtbl - u_default_transfer_inline_write - u_transfer_map_vtbl - driver_transfer_map - u_transfer_unmap_vtbl - driver_transfer_unmap That's 6 indirect calls. Some drivers only had 5. The goal is to have 1 indirect call for drivers that care. The resource type can be determined statically at most call sites. The new interface is: pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data) pipe_context::texture_subdata(ctx, resource, level, usage, box, data, stride, layer_stride) v2: fix whitespace, correct ilo's behavior Reviewed-by: Nicolai Hähnle <[email protected]> Acked-by: Roland Scheidegger <[email protected]>
* nv50/ir: allow to swap sources for OP_SUBSamuel Pitoiset2016-07-221-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows the load-propagation pass to swap the sources in presence of immediate values. Maxwell (GM107): total instructions in shared programs :1928187 -> 1927634 (-0.03%) total gprs used in shared programs :330741 -> 330154 (-0.18%) total local used in shared programs :28032 -> 28032 (0.00%) local gpr inst bytes helped 0 271 425 425 hurt 0 0 194 194 Fermi (GF114): total instructions in shared programs :2334474 -> 2333829 (-0.03%) total gprs used in shared programs :380934 -> 380215 (-0.19%) total local used in shared programs :33304 -> 33264 (-0.12%) local gpr inst bytes helped 5 314 521 521 hurt 0 4 195 195 No regressions on GM107 and GF114 with full piglit. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/mme: fix offsets used for indirect drawsSamuel Pitoiset2016-07-222-8/+8
| | | | | | | | | | This fixes a regression introduced in 1da704a94c57aa0b0cf8faaa3236fe47dfb8f88c because the offset has moved from 0x180 to 0x1a0, and the macros have to be re-compiled. Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix offsets of MP perf counters input parametersSamuel Pitoiset2016-07-221-15/+15
| | | | | | | | | | | | | This fixes a regression introduced in 1da704a94c57aa0b0cf8faaa3236fe47dfb8f88c because the offset has moved from 0x600 to 0x620, and the kernels used for reading MP perf counters have to be re-assembled. This also fixes amd_performance_monitor_measure piglit. Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: add a cap for VIEWPORT_SUBPIXEL_BITS (v2)Józef Kucia2016-07-203-0/+3
| | | | | | | | | | | | This allows Gallium drivers to advertise the subpixel precision for floating point viewports bounds. v2: - Set ViewportSubpixelBits in st_init_limits. Signed-off-by: Józef Kucia <[email protected]> Signed-off-by: Marek Olšák <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>