path: root/src/gallium/drivers/nouveau
Commit message | Author | Age | Files | Lines
* nv50/ir: fix printing of pixld | Rhys Perry | 2018-05-03 | 1 | -1/+1
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nv50: Extract needed value bits without shifting them before calling bitcount | Vlad Golovkin | 2018-05-02 | 1 | -1/+1
  This can save one instruction since bitcount doesn't care about specific bits' positions.

  Reviewed-by: Karol Herbst <[email protected]>
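The reasoning in the commit above can be illustrated with a minimal C sketch. It is not the driver's code: `bitcount32`, `count_field_shifted` and `count_field_masked` are hypothetical names, and the portable loop stands in for whatever bitcount helper the driver actually calls.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count: number of set bits in v. */
static unsigned bitcount32(uint32_t v)
{
    unsigned n = 0;
    while (v) {
        v &= v - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Older style (hypothetical): mask the field, shift it down, then count. */
static unsigned count_field_shifted(uint32_t word, uint32_t mask, unsigned shift)
{
    /* The shift must not discard any bit that belongs to the field. */
    assert(shift < 32 && (mask & ((1u << shift) - 1u)) == 0);
    return bitcount32((word & mask) >> shift);
}

/* Newer style (hypothetical): mask only. Moving bits does not change how
 * many of them are set, so the shift is unnecessary. */
static unsigned count_field_masked(uint32_t word, uint32_t mask)
{
    return bitcount32(word & mask);
}
```

Both helpers return the same count for any field whose mask has no bits below `shift`, which is why dropping the shift is safe.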
* nvc0: add conservative rasterization support | Rhys Perry | 2018-04-30 | 7 | -8/+87
  Subpixel precision bias, dilation and the post-snap mode are supported on GM200 and newer. The pre-snap mode is supported for triangle primitives on GP100.

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: add initial support for conservative rasterization | Rhys Perry | 2018-04-30 | 3 | -0/+30
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* nvc0/ir: all short immediates are sign-extended, adjust LIMM test | Ilia Mirkin | 2018-04-24 | 3 | -19/+24
  Some analysis suggests that all short immediates are sign-extended. The insnCanLoad logic already accounted for this, but we could still pick the wrong form when emitting actual instructions that support both short and long immediates (with the long form usually having additional restrictions that insnCanLoad should be aware of).

  This also reverses a bunch of commits that had previously "worked around" this issue in various emitters:

    9c63224540ef: gm107/ir: make use of ADD32I for all immediates
    83a4f28dc27b: gm107/ir: make use of LOP32I for all immediates
    b84c97587b4a: gm107/ir: make use of IMUL32I for all immediates
    d30768025a22: gk110/ir: make use of IMUL32I for all immediates

  as well as the original import for UMUL in the nvc0 emitter.

  Reported-by: Karol Herbst <[email protected]>
  Signed-off-by: Ilia Mirkin <[email protected]>
  Tested-by: Karol Herbst <[email protected]>
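A rough C sketch of the kind of test this commit describes: a 32-bit immediate can use the short form only if sign-extending its low bits reproduces the full value. The helper names are hypothetical, and the 20-bit width used in the usage note is an assumption for illustration, not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of `field` to a 32-bit signed value. */
static int32_t sign_extend(uint32_t field, unsigned bits)
{
    assert(bits >= 1 && bits <= 31);
    uint32_t sign = 1u << (bits - 1);
    field &= (1u << bits) - 1u;
    /* 64-bit arithmetic keeps the subtraction free of overflow. */
    return (int32_t)((int64_t)(field ^ sign) - (int64_t)sign);
}

/* True if `imm` survives a round trip through a sign-extended short field
 * of `bits` bits; otherwise the long-immediate form would be needed. */
static bool fits_short_imm(uint32_t imm, unsigned bits)
{
    return (uint32_t)sign_extend(imm, bits) == imm;
}
```

With an assumed 20-bit field, `0x7ffff` and `0xffffffff` fit, while `0x80000` does not: its top field bit would be read as a sign bit and change the value.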
* gm107/ir/lib: fix sched in div u32 builtin | Karol Herbst | 2018-04-24 | 2 | -4/+4
  Imad needs to set a read barrier.

  With significantly large work groups I was getting wrong results for div u32. It turns out the issue was with the sched opcodes.

  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50/ir: make a copy of tex src if it's referenced multiple times | Ilia Mirkin | 2018-04-22 | 1 | -37/+49
  For nv50 we coalesce the srcs and defs into a single node. As such, we can end up with impossible constraints if the source is referenced after the tex operation (which, due to the coalescing of values, will have overwritten it).

  This logic already exists for inserting moves for MERGE/UNION sources. It's the exact same idea here, so leverage that code, which also includes a few optimizations around not extending live ranges unnecessarily.

  Fixes tests/spec/glsl-1.30/execution/fs-textureSize-components.shader_test

  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/ra: prefer def == src2 for fma with immediates on nvc0 | Karol Herbst | 2018-04-21 | 1 | -10/+29
  This helps with the PostRALoadPropagation pass moving long immediates into FMA/MAD instructions.

  changes in shader-db:
  total instructions in shared programs : 5894114 -> 5886074 (-0.14%)
  total gprs used in shared programs    : 666558 -> 666563 (0.00%)
  total shared used in shared programs  : 520416 -> 520416 (0.00%)
  total local used in shared programs   : 53524 -> 53524 (0.00%)
  total bytes used in shared programs   : 54006744 -> 53932472 (-0.14%)

             local   shared      gpr     inst    bytes
    helped       0        0        2     4192     4192
      hurt       0        0        7        9        9

  Signed-off-by: Karol Herbst <[email protected]>
  [imirkin: minor edits to separate nv50 and nvc0+ cases]
  Reviewed-by: Ilia Mirkin <[email protected]>
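The percentages in the shader-db summary above follow from a plain relative-change calculation. The sketch below uses a hypothetical helper, `percent_change`, which is not part of shader-db, to show the arithmetic.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
static double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

For the instruction count, `percent_change(5894114, 5886074)` is about -0.136, which is reported as -0.14%. The gpr change of +5 out of 666558 is under 0.001% and therefore shows up as 0.00%.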
* nvc0: fix line width on GM20x+ | Rhys Perry | 2018-04-20 | 1 | -1/+4
  This has the side-effect of fixing polygon-offset piglit test failures.

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: finish implementation of PIPE_QUERY_SO_OVERFLOW_PREDICATE | Rhys Perry | 2018-04-07 | 3 | -17/+30
  This also removes some useless code left over from old changes.

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: change ACQUIRE_EQUAL to ACQUIRE_GEQUAL in nvc0_hw_query_fifo_wait | Rhys Perry | 2018-04-07 | 1 | -1/+1
  If a fence is created in between nvc0_hw_end_query and nvc0_hw_query_fifo_wait, the sequence number in nvc0->screen->fence.bo can be larger than hq->fence->sequence before the semaphore is created, resulting in the semaphore never being triggered.

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
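The race described above can be modelled in a few lines of C. This is a simplified sketch, not the hardware semantics: the enum only mirrors the names in the commit title, `semaphore_released` is a hypothetical helper, and sequence-number wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the two comparison modes named in the commit. */
enum acquire_mode { ACQUIRE_EQUAL, ACQUIRE_GEQUAL };

/* Would a wait on `target` be satisfied when memory holds `current`? */
static bool semaphore_released(enum acquire_mode mode,
                               uint32_t current, uint32_t target)
{
    switch (mode) {
    case ACQUIRE_EQUAL:
        return current == target;
    case ACQUIRE_GEQUAL:
        return current >= target;
    }
    assert(!"unknown acquire mode");
    return false;
}
```

If the stored sequence number has already advanced to 6 when the wait for 5 is set up, the equality test never succeeds, while the greater-or-equal test is satisfied immediately.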
* nvc0: ensure the query's fence has been emitted in nvc0_hw_query_fifo_wait | Rhys Perry | 2018-04-07 | 1 | -0/+4
  If the fence has not been emitted, hq->fence->sequence would be zero. This would result in the semaphore never being triggered, blocking all later commands in the pushbuf.

  Signed-off-by: Rhys Perry <[email protected]>
  [imirkin: use nouveau_fence_emit instead]
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: restore image binding on RGB10A2, remove from BGR10A2 | Ilia Mirkin | 2018-04-07 | 1 | -2/+2
  Fixes a bunch of new CTS pbo tests that use those as an output format, which the state tracker converts into buffer image writes.

  No part of the driver is ready for BGR10A2. It could probably be enabled on Maxwell+, but seems unnecessary. This error was introduced when flipping the displayable bit on those formats, which accidentally also moved the image bit.

  Fixes: e1a70aed10d (nv50,nvc0: mark ABGR format as displayable instead of ARGB format)
  Signed-off-by: Ilia Mirkin <[email protected]>
* util: Move util_is_power_of_two to bitscan.h and rename to util_is_power_of_two_or_zero | Ian Romanick | 2018-03-29 | 4 | -7/+7
  The new name makes the zero-input behavior more obvious. The next patch adds a new function with different zero-input behavior.

  Signed-off-by: Ian Romanick <[email protected]>
  Suggested-by: Matt Turner <[email protected]>
  Reviewed-by: Alejandro Piñeiro <[email protected]>
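To make the zero-input distinction concrete, here is a small C sketch of the two behaviours. The function names and bodies are illustrative assumptions based on the common `v & (v - 1)` idiom; they are not copied from bitscan.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Classic bit trick: true for powers of two and also for zero. */
static bool is_power_of_two_or_zero(unsigned v)
{
    return (v & (v - 1)) == 0;
}

/* Variant that rejects zero (hypothetical counterpart). */
static bool is_power_of_two_nonzero(unsigned v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Usage sketch: the two helpers differ only for v == 0. */
static void demo_zero_input(void)
{
    assert(is_power_of_two_or_zero(0));
    assert(!is_power_of_two_nonzero(0));
}
```

Callers that use the result as an alignment or shift amount usually want the non-zero variant, which is why spelling the zero behaviour out in the name helps.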
* nvc0/ir: fix emitting NOTs with predicates | Karol Herbst | 2018-03-29 | 1 | -0/+2
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: fix INTERP_* with indirect inputs | Ilia Mirkin | 2018-03-27 | 1 | -3/+4
  There were two problems, both of which are fixed now:
  - The indirect address was not being shifted by 4
  - The indirect address was being placed as an argument in the offset case

  This fixes some of the new interpolateAt* piglits which now test for these situations.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Karol Herbst <[email protected]>
* gallium: add packed uniform CAP | Timothy Arceri | 2018-03-20 | 3 | -0/+3
  Reviewed-by: Marek Olšák <[email protected]>
* nv50,nvc0: Support BGRX1010102 and RGBX1010102 for sampling. | Mario Kleiner | 2018-03-14 | 1 | -0/+2
  Add them as usable for textures, so they can be used by Wayland drm in 10 bpc mode and for X11 compositing under GLX and EGL. We need these formats to be supported at least for sampling, otherwise GLX_texture_from_pixmap and the equivalent EGL image extension won't work with X11 drawables of depth 30 and just display an all black window.

  Do not expose these formats as renderable, and thereby not as a fbconfig/EGLConfig/Visual, as NVidia hw does not support 10 bpc unorm formats without alpha channel.

  Tested under X11 + GLX/EGL + DRI2/DRI3 for compositing, and under Wayland+Weston drm backend with a Tesla and Pascal gpu.

  Signed-off-by: Mario Kleiner <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nouveau: Add framebuffer modifier support | Thierry Reding | 2018-03-09 | 6 | -5/+146
  This adds support for framebuffer modifiers to Nouveau. This will be used by the Tegra driver to share metadata about the format of buffers (such as the tiling mode or compression).

  Changes in v2:
  - remove unused parameters to nouveau_buffer_create()
  - move format modifier query code to nvc0 backend
  - restrict format modifiers to 2D textures
  - implement ->query_dmabuf_modifiers()

  Changes in v4:
  - add UAPI include path on meson builds

  Changes in v5:
  - remove unnecessary includes

  Acked-by: Emil Velikov <[email protected]>
  Tested-by: Andre Heider <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Signed-off-by: Thierry Reding <[email protected]>
* nouveau/nvc0: Extract common tile mode macro | Thierry Reding | 2018-03-09 | 1 | -6/+9
  Add a new macro that can be used to extract the tiling mode from a tile_mode value. This will be used to determine the number of GOBs used in block linear mode.

  Acked-by: Emil Velikov <[email protected]>
  Tested-by: Andre Heider <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Signed-off-by: Thierry Reding <[email protected]>
* nvc0: collapse output slots to have adjacent registers | Ilia Mirkin | 2018-02-27 | 1 | -2/+12
  The hardware skips over unallocated slots, so we have to make sure those registers are packed together.

  Fixes KHR-GL45.enhanced_layouts.fragment_data_location_api

  Signed-off-by: Ilia Mirkin <[email protected]>
  Tested-by: Karol Herbst <[email protected]>
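The packing idea in this commit can be sketched as follows. This is an illustration under assumptions, not the driver's logic: `MAX_SLOTS`, `UNUSED_SLOT` and `pack_output_slots` are hypothetical, and each slot is treated as one register unit for simplicity.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SLOTS 8      /* hypothetical number of output slots */
#define UNUSED_SLOT (-1) /* marker for slots that get no register */

/* Give every used slot the next consecutive register index, so that gaps
 * in the slot mask do not leave gaps in the register assignment.
 * Returns the number of registers handed out. */
static unsigned pack_output_slots(uint8_t used_mask, int reg_for_slot[MAX_SLOTS])
{
    unsigned next = 0;
    assert(reg_for_slot);
    for (unsigned slot = 0; slot < MAX_SLOTS; slot++) {
        if (used_mask & (1u << slot))
            reg_for_slot[slot] = (int)next++;
        else
            reg_for_slot[slot] = UNUSED_SLOT;
    }
    return next;
}
```

With slots 0 and 2 in use (mask `0x05`), slot 2 receives register 1 rather than register 2, matching hardware that skips over the unallocated slot 1.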
* nvir/gm107: consider FILE_FLAGS dependencies in SchedDataCalculatorGM107 | Karol Herbst | 2018-02-26 | 1 | -1/+14
  Currently, while inserting barriers, writes and reads to FILE_FLAGS aren't considered. This can lead to WaR hazards in some situations.

  Together with the previous commit, this fixes shaders with instructions like this:

    mad u32 $r2 $r4 $r11 $r2
    mad u32 { $r5 $c0 } $r4 $r10 $r6
    mad (SUBOP:1) u32 $r3 $r4 $r10 $r2 $c0

  Affects OpenCL CTS tests on Maxwell+:

    basic/test_basic intmath_long
    basic/test_basic intmath_long2
    basic/test_basic intmath_long4

  v2: only put barriers on instructions which actually read flags

  Reviewed-by: Samuel Pitoiset <[email protected]>
  Signed-off-by: Karol Herbst <[email protected]>
* nvir/gm107: iterate over all defs in SchedDataCalculatorGM107::findFirstUse | Karol Herbst | 2018-02-26 | 1 | -16/+18
  In the sched data calculator we have to track first use of defs by iterating over all defs of an instruction, not just the first one.

  v2: fix minGRP and maxGRP values

  Reviewed-by: Samuel Pitoiset <[email protected]>
  Signed-off-by: Karol Herbst <[email protected]>
* nvir: don't optimize mad with subops to shladd | Karol Herbst | 2018-02-24 | 1 | -1/+2
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: fix integer MS resolves using 2d engine | Ilia Mirkin | 2018-02-22 | 1 | -1/+2
  We don't want filtering for integer textures, same as depth/stencil.

  Fixes: KHR-GL45.direct_state_access.renderbuffers_storage_multisample

  Signed-off-by: Ilia Mirkin <[email protected]>
  Tested-by: Karol Herbst <[email protected]>
* nvc0: fix writing query results into buffer | Ilia Mirkin | 2018-02-22 | 1 | -4/+10
  We need to mark the range as valid, and validate the resource using a helper to ensure that the buffer status is marked properly.

  Fixes some CTS pipeline stats query tests, and KHR-GL45.direct_state_access.queries_functional

  Signed-off-by: Ilia Mirkin <[email protected]>
  Tested-by: Karol Herbst <[email protected]>
* nv50,nvc0: fix clear buffer acceleration | Ilia Mirkin | 2018-02-22 | 2 | -28/+17
  Two things were off:
  - valid range was not updated, which could affect waiting for future maps
  - fencing was done manually instead of using the *_resource_validate helper, which resulted in a missed dirty buffer flag being set

  Fixes: KHR-GL45.direct_state_access.buffers_clear

  Signed-off-by: Ilia Mirkin <[email protected]>
  Tested-by: Karol Herbst <[email protected]>
* nvir/nvc0: fix legalizing of ld unlock c0[0x10000] | Karol Herbst | 2018-02-21 | 1 | -1/+1
  We have to increase the file index also for 0x10000 not just for values greater than 0x10000.

  Fixes: 37b67db6ae34fb6586d640a7a1b6232f091dd812
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
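The off-by-one described above is a boundary-comparison issue. The sketch below is a hypothetical reconstruction of the idea, not the legalization pass itself: `struct const_ref` and `legalize_const_offset` are made-up names, and it assumes each constant file covers 0x10000 bytes so that larger offsets roll over into the next file index.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant-buffer reference: file index plus byte offset. */
struct const_ref {
    unsigned file_index;
    uint32_t offset;
};

/* Fold offsets of 0x10000 and above into the file index. Using `>`
 * instead of `>=` would leave exactly 0x10000 unfolded, which is the
 * boundary case the commit fixes. */
static struct const_ref legalize_const_offset(unsigned file_index, uint32_t offset)
{
    struct const_ref r = { file_index, offset };
    if (offset >= 0x10000) {
        r.file_index += offset >> 16;
        r.offset = offset & 0xffff;
    }
    return r;
}

/* Usage sketch: c0[0x10000] becomes offset 0 of the next file index. */
static void demo_boundary(void)
{
    struct const_ref r = legalize_const_offset(0, 0x10000);
    assert(r.file_index == 1 && r.offset == 0);
}
```

Offsets up to `0xffff` pass through unchanged, so only the comparison operator decides how the single value `0x10000` is treated.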
* nv50,nvc0: mark ABGR format as displayable instead of ARGB format | Ilia Mirkin | 2018-02-19 | 1 | -2/+2
  This matches the hardware's capabilities.

  Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: avoid using kepler instruction capabilities | Ilia Mirkin | 2018-02-17 | 2 | -21/+45
  Split up the op properties table into generation-specific bits, and only use the kepler ones on kepler. Fixes some CTS images tests.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Karol Herbst <[email protected]>
* nvc0: add support for bindless on maxwell+ | Ilia Mirkin | 2018-02-17 | 3 | -14/+116
  Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: change how SUQ works in preparation for bindless | Ilia Mirkin | 2018-02-17 | 3 | -1/+61
  All this information can be retrieved from the TIC directly. Avoid having to dip into the constbuf information about the image.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: Use GP100_COMPUTE_CLASS on GP10B | Mikko Perttunen | 2018-02-17 | 1 | -1/+2
  GP10B requires the use of GP100_COMPUTE_CLASS instead of GP104_COMPUTE_CLASS as is used for other non-GP100 chips.

  Signed-off-by: Mikko Perttunen <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: allow drivers to impose BO flags restrictions on constant buffer 0 | Marek Olšák | 2018-02-17 | 3 | -0/+3
  Required by radeonsi for optimal behavior.
* nvc0: disable MS Images for sample_count == 1 on Maxwell | Karol Herbst | 2018-02-15 | 1 | -1/+1
  Fixes KHR-GL45.multi_bind.dispatch_bind_textures on Maxwell

  Suggested-by: Ilia Mirkin <[email protected]>
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: drop all the guard band float caps. | Dave Airlie | 2018-02-14 | 2 | -12/+0
  Nobody queries these and nobody sets them to anything useful, the docs say TODO. Drop them until a use appears.

  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* meson: Add build option for tools | Scott D Phillips | 2018-02-08 | 1 | -1/+2
  Add a build option to control building some of the misc tools we have. Also set the executables to install; presumably you want that if you're asking for the build.

  v2:
  - set 'install:' to the with_tools value, not true (Jordan)
  - handle 'all' in the comma list (Dylan)
  - Add freedreno's tools (Dylan)

  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Dylan Baker <[email protected]>
* gallium: introduce PIPE_CAP_FENCE_SIGNAL v2 | Andres Rodriguez | 2018-01-30 | 3 | -0/+3
  Protects semaphore signaling functionality required by GL_EXT_semaphore.

  v2: s/semaphore/fence

  Signed-off-by: Andres Rodriguez <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* autotools: include meson build files in tarball | Dylan Baker | 2018-01-19 | 1 | -1/+1
  This adds the meson.build, meson_options.txt, and a few scripts that are used exclusively by the meson build.

  v2:
  - Remove accidentally included changes needed to test make dist with LLVM > 3.9

  Signed-off-by: Dylan Baker <[email protected]>
  Acked-by: Eric Engestrom <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* gallium: remove PIPE_CAP_USER_CONSTANT_BUFFERS | Marek Olšák | 2018-01-17 | 3 | -3/+0
  Reviewed-by: Roland Scheidegger <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
* gallium: remove PIPE_CAP_TEXTURE_SHADOW_MAP | Marek Olšák | 2018-01-17 | 3 | -3/+0
  Reviewed-by: Roland Scheidegger <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
* gallium: remove PIPE_CAP_TWO_SIDED_STENCIL | Marek Olšák | 2018-01-17 | 3 | -3/+0
  Reviewed-by: Roland Scheidegger <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
* meson: move libsensors dependency to libgallium | Dylan Baker | 2018-01-11 | 1 | -1/+1
  This simplifies the build by removing the need to link targets against libsensors.

  Suggested-by: Emil Velikov <[email protected]>
  Signed-off-by: Dylan Baker <[email protected]>
  Acked-by: Eric Engestrom <[email protected]>
* nvc0: enable bindless on kepler | Ilia Mirkin | 2018-01-07 | 1 | -3/+3
  All the functionality is in. Maxwell will take a little bit more enablement work.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: add bindless image support for kepler | Ilia Mirkin | 2018-01-07 | 11 | -75/+272
  A part of the driver constbuf area is allocated for bindless images. Any update requires uploading to all driver constbufs. This also extends the driver constbuf to 64KB, up from 2KB.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: add support for bindless textures on kepler+ | Ilia Mirkin | 2018-01-07 | 10 | -5/+183
  This keeps a list of resident textures (per context), and dumps that list into the active buffer list when submitting. We also treat bindless texture fetches slightly differently, wrt the meaning of indirect, and not requiring the SAMPLER file to be used.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: use the image info in the instruction rather than decl | Ilia Mirkin | 2018-01-07 | 1 | -52/+24
  In preparation for bindless images, we have to retrieve the target/format info from the instruction directly, as there will be no declaration. Furthermore, for bound images, this information is still available in the instruction, so we can drop the declaration-based mechanism entirely.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: safen up lowering logic against overwriting reused values | Ilia Mirkin | 2018-01-07 | 1 | -2/+4
  I'm fairly sure both of the changed sites are OK as-is, but they're fragile, so this is just safening them up. Since this is happening pre-ssa, we don't want to be overwriting values that may potentially get used later on.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: update tic in-place when buffer address changes | Ilia Mirkin | 2018-01-07 | 2 | -14/+21
  This is helpful for bindless, where changing TIC id's is undesirable.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: ensure that pushbuf keeps ref to old text/tls bos | Ilia Mirkin | 2018-01-07 | 1 | -0/+13
  If we free the bo, then the PTE may get deallocated immediately. We have to make sure that the submission includes a ref to the old bo so that it remains mapped for the duration of the command execution.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>