path: root/src/gallium/drivers/nouveau
Commit message | Author | Age | Files | Lines
* nvc0/ir: return 0 in imageLoad on incomplete textures | Karol Herbst | 2018-08-04 | 2 files | -3/+31
    We already guarded all OP_SULDP against out of bound accesses, but we
    ended up just reusing whatever value was stored in the dest registers.

    Fixes CTS test shader_image_load_store.incomplete_textures

    v2: fix for loads not ending up with predicates (bindless_texture)
    v3: fix replacing the def

    Cc: <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
* gm200/ir: optimize rcp(sqrt) to rsq | Karol Herbst | 2018-08-04 | 1 file | -1/+10
    mitigates hurt shaders after adding sqrt:

    total instructions in shared programs : 5456166 -> 5454825 (-0.02%)
    total gprs used in shared programs    : 647522 -> 647551 (0.00%)
    total shared used in shared programs  : 389120 -> 389120 (0.00%)
    total local used in shared programs   : 21064 -> 21064 (0.00%)
    total bytes used in shared programs   : 58288696 -> 58274448 (-0.02%)

                local   shared      gpr     inst    bytes
    helped          0        0        0      516      516
    hurt            0        0       27        2        2

    Reviewed-by: Ilia Mirkin <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
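The rewrite named in the commit subject above can be illustrated on a toy expression tree. This is only a sketch: the `Op` class and `fold_rcp_sqrt` function are hypothetical and are not part of the nv50 IR, and a real pass would also have to check things this sketch ignores, such as whether the sqrt result has other users.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Op:
    """Hypothetical unary operation node: an opcode name and one source."""
    name: str
    src: Union["Op", str]


def fold_rcp_sqrt(node):
    """Rewrite rcp(sqrt(x)) into rsq(x), recursing from the leaves up."""
    if isinstance(node, str):  # a plain value name, nothing to fold
        return node
    src = fold_rcp_sqrt(node.src)
    if node.name == "rcp" and isinstance(src, Op) and src.name == "sqrt":
        return Op("rsq", src.src)
    return Op(node.name, src)


folded = fold_rcp_sqrt(Op("rcp", Op("sqrt", "x")))
print(folded)
```

The fold saves one instruction per matched pair, which is consistent in direction with the instruction-count drop reported in the commit message.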
* gm200/ir: add native OP_SQRT support | Karol Herbst | 2018-08-04 | 4 files | -2/+14
    ./GpuTest /test=pixmark_piano 1024x640 30sec: 301 -> 327 points

    shader-db:
    total instructions in shared programs : 5472103 -> 5456166 (-0.29%)
    total gprs used in shared programs    : 647530 -> 647522 (-0.00%)
    total shared used in shared programs  : 389120 -> 389120 (0.00%)
    total local used in shared programs   : 21064 -> 21064 (0.00%)
    total bytes used in shared programs   : 58459304 -> 58288696 (-0.29%)

                local   shared      gpr     inst    bytes
    helped          0        0       27     8281     8281
    hurt            0        0       21      431      431

    v2: use NVISA_GM200_CHIPSET

    Reviewed-by: Ilia Mirkin <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
* gallium: add storage_sample_count parameter into is_format_supported | Marek Olšák | 2018-07-31 | 4 files | -0/+13
    Tested-by: Dieter Nützel <[email protected]>
* gallium: add PIPE_CAP_FRAMEBUFFER_MSAA_CONSTRAINTS | Marek Olšák | 2018-07-31 | 3 files | -0/+3
    Tested-by: Dieter Nützel <[email protected]>
* nvc0: serialize before updating some constant buffer bindings on Maxwell+ | Rhys Perry | 2018-07-30 | 4 files | -47/+81
    To avoid serializing, this has the user constant buffer always be 65536
    bytes and enabled unless it's required that something else is used for
    constant buffer 0.

    Fixes artifacts with at least XCOM: Enemy Within, 0 A.D. and Unigine
    Valley, Heaven and Superposition.

    v2: changed uniform_buffer_bound to be bool instead of a uint32_t
    v3: remove magic constants
    v3: remove pointless code in nvc0_validate_driverconst

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100177
    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: move LateAlgebraicOpt back to right after ConstantFolding | Rhys Perry | 2018-07-19 | 1 file | -1/+1
    total instructions in shared programs : 5480808 -> 5472107 (-0.16%)
    total gprs used in shared programs    : 647530 -> 647532 (0.00%)
    total shared used in shared programs  : 389120 -> 389120 (0.00%)
    total local used in shared programs   : 21064 -> 21064 (0.00%)
    total bytes used in shared programs   : 58551648 -> 58459352 (-0.16%)

                local   shared      gpr     inst    bytes
    helped          0        0       73     2609     2609
    hurt            0        0       71       34       34
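The percentages in the shader-db totals above are consistent with a plain relative change between the before and after counts. The sketch below checks that arithmetic for this commit's numbers; `percent_change` is an illustrative helper, and the assumption that shader-db computes the figure exactly this way is mine, not stated in the log.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Totals quoted in the commit message above.
instructions = percent_change(5480808, 5472107)
code_bytes = percent_change(58551648, 58459352)

print(f"instructions: {instructions:.2f}%")
print(f"bytes:        {code_bytes:.2f}%")
```

Both values round to -0.16, matching the report. Small changes such as the gpr total (647530 -> 647532) round to 0.00 at two decimals, which is why several rows show no percentage movement despite a nonzero difference.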
* nv50/ir: handle SHLADD in IndirectPropagation | Rhys Perry | 2018-07-19 | 1 file | -0/+12
    An alternative solution to the problem fixed in 0bd83d0 ("nv50/ir: move
    LateAlgebraicOpt to the very end").

    total instructions in shared programs : 5481195 -> 5480808 (-0.01%)
    total gprs used in shared programs    : 647535 -> 647530 (-0.00%)
    total shared used in shared programs  : 389120 -> 389120 (0.00%)
    total local used in shared programs   : 21064 -> 21064 (0.00%)
    total bytes used in shared programs   : 58555784 -> 58551648 (-0.01%)

                local   shared      gpr     inst    bytes
    helped          0        0        2       34       34
    hurt            0        0        0        0        0
* gm107/ir: use CS2R for SV_CLOCK | Rhys Perry | 2018-07-19 | 3 files | -2/+25
    This instruction seems to be faster than S2R and requires no barrier,
    though the range of special registers it can read from is limited.

    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Karol Herbst <[email protected]>
* nouveau: fix 3D blitter for unsigned to signed integer conversions | Karol Herbst | 2018-07-15 | 2 files | -10/+22
    fixes a couple of packed_pixel CTS tests. No regressions inside a CTS run.

    v2: simplify the changes a bit

    Reviewed-by: Ilia Mirkin <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
* nv50/ir: fix Instruction::isActionEqual for PHI instructions | Karol Herbst | 2018-07-07 | 1 file | -0/+6
    phi instructions don't have the same results by simply having the same
    sources. They need to be inside the same BasicBlock or share an equal
    condition resulting in a path through the shader selecting equal sources
    as well.

    short example:

    cond = ...;
    const0 = 0;
    const1 = 1;

    if (cond) {
        ssa_1 = const0;
    } else {
        ssa_2 = const1;
    }
    ssa_3 = phi ssa_1 ssa_2;

    if (!cond) {
        ssa_4 = const0;
    } else {
        ssa_5 = const1;
    }
    ssa_6 = phi ssa_4 ssa_5;

    although both phis actually have sources with equal results, merging
    them would be wrong due to having a different condition selecting which
    source to take.

    For now we also stick an assert into GlobalCSE, because it should never
    end up having to merge phi instructions.

    Reviewed-by: Ilia Mirkin <[email protected]>
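The short example in the commit message above can be evaluated directly to see why the two phis must not be merged. The sketch below is a plain simulation of that pseudocode, with each phi modelled as the value arriving from whichever branch executed; the function name `run` is illustrative only.

```python
def run(cond):
    """Evaluate the commit message's example for one value of cond."""
    const0, const1 = 0, 1

    # if (cond) ssa_1 = const0; else ssa_2 = const1; ssa_3 = phi ssa_1 ssa_2
    ssa_3 = const0 if cond else const1

    # if (!cond) ssa_4 = const0; else ssa_5 = const1; ssa_6 = phi ssa_4 ssa_5
    ssa_6 = const0 if not cond else const1

    return ssa_3, ssa_6


for cond in (True, False):
    ssa_3, ssa_6 = run(cond)
    print(cond, ssa_3, ssa_6)
```

Both phis draw from the same pair of source values, yet for every value of `cond` they produce different results, so treating them as equal actions would change the program.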
* nvc0/ir: use the combined tid special register | Rhys Perry | 2018-07-07 | 9 files | -0/+61
    total instructions in shared programs : 5804448 -> 5804690 (0.00%)
    total gprs used in shared programs    : 670065 -> 670065 (0.00%)
    total shared used in shared programs  : 548832 -> 548832 (0.00%)
    total local used in shared programs   : 21068 -> 21068 (0.00%)

                local   shared      gpr     inst    bytes
    helped          0        0        0        5        5
    hurt            0        0        0      191      191

    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Karol Herbst <[email protected]>
* nvc0: implement multisampled images on Maxwell+ | Rhys Perry | 2018-07-04 | 6 files | -39/+48
    Changes in v2:
    - make loadSuInfo32() protected without making the rest protected
    - move NVC0_SU_INFO_* into nv50_ir_lowering_nvc0.h instead of duplicating
      NVC0_SU_INFO_MS

    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Karol Herbst <[email protected]>
* nv50/ir: handle clipvertex for geom and tess shaders as well | Karol Herbst | 2018-07-02 | 1 file | -1/+6
    this will be needed for compatibility profiles

    v2: handle tess shaders

    Reviewed-by: Ilia Mirkin <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
* gallium/util: remove dummy function util_format_is_supported | Marek Olšák | 2018-06-29 | 3 files | -10/+0
    Reviewed-by: Eric Engestrom <[email protected]>
* nv50/ir: improve maintainability of Target*::initOpInfo() | Rhys Perry | 2018-06-29 | 2 files | -23/+28
    This is mainly useful for when one needs to add new opcodes in a painless
    and reliable way.

    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Karol Herbst <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
* nv50/ir: fix image stores with indirect handles | Rhys Perry | 2018-06-29 | 1 file | -4/+5
    Having this if statement here prevented the next if statement from being
    reached in the case of image stores, which is needed for instructions with
    indirect bindless handles like "STORE TEMP[ADDR[2].x+1](1) ...".

    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Karol Herbst <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
* nvc0: remove magic values in nve4_set_tex_handles() | Rhys Perry | 2018-06-28 | 1 file | -1/+1
    With this commit, things no longer break if NVC0_CB_AUX_TEX_INFO is
    changed to anything other than 0x20.

    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Karol Herbst <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
* nvc0/ir: fix TargetNVC0::insnCanLoadOffset() | Rhys Perry | 2018-06-28 | 1 file | -0/+1
    Previously, TargetNVC0::insnCanLoadOffset() returned whether the offset
    could be set to a specific value. The IndirectPropagation pass expected it
    to return whether the offset could be increased by a specific value, which
    is what TargetNV50::insnCanLoadOffset() does.

    Fixes: 37b67db6ae34fb6586d640a7a1b6232f091dd812 ("nvc0/ir: be careful about propagating very large offsets into const load")
    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Karol Herbst <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
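The difference between the two meanings described above ("can the offset be set to this value" versus "can the offset be increased by this value") shows up as soon as the instruction already carries a nonzero offset. The sketch below uses a made-up limit and made-up function names; neither `MAX_OFFSET` nor the two helpers correspond to the actual driver code.

```python
MAX_OFFSET = 0xFFFF  # hypothetical encodable range, not a real hardware limit


def can_set_offset(value):
    """Old meaning: is `value` on its own an encodable offset?"""
    return 0 <= value <= MAX_OFFSET


def can_increase_offset(current, delta):
    """Expected meaning: is `current + delta` still an encodable offset?"""
    return 0 <= current + delta <= MAX_OFFSET


current, delta = 0xFFF0, 0x20
print(can_set_offset(delta))                # checks the delta alone
print(can_increase_offset(current, delta))  # checks the resulting offset
```

With these numbers the first check accepts the delta while the second rejects it, which is the kind of disagreement a caller that adds to an existing offset would trip over.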
* nv50/ir: only avoid spilling constrained def if a mov is added | Karol Herbst | 2018-06-23 | 1 file | -2/+2
    fix spilling regression introduced by 5428066f5e

    this is just a minor mistake done while moving the code out into a new
    function. The function contained a loop which might have been terminated
    earlier and skipped setting noSpill to 1. After the refactoring it was
    always set.

    Fixes: 5428066f5e1ef5ea6ae04c84019f270023cfc6aa ("nv50/ir: make a copy of tex src if it's referenced multiple times")
    Signed-off-by: Karol Herbst <[email protected]>
* gallium: add scalar isa shader cap | Christian Gmeiner | 2018-06-20 | 3 files | -0/+6
    v1 -> v2:
    - nv30 is _NOT_ scalar as suggested by Ilia Mirkin.
    - Change from a screen cap to a shader cap as suggested by Eric Anholt.
    - radeonsi is scalar as suggested by Marek Olšák.
    - Change missing ones to be scalar.

    v2 -> v3:
    - r600 prefers vec4 as suggested by Marek Olšák.

    Signed-off-by: Christian Gmeiner <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* nvc0: add support for programmable sample locations | Rhys Perry | 2018-06-14 | 10 files | -46/+299
    Signed-off-by: Rhys Perry <[email protected]>
* gallium: add support for programmable sample locations | Rhys Perry | 2018-06-14 | 3 files | -0/+3
    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Brian Paul <[email protected]> (v2)
    Reviewed-by: Marek Olšák <[email protected]> (v2)
* nv30: add a couple of missed shader caps | Ilia Mirkin | 2018-05-30 | 1 file | -0/+2
    Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: ensure that displayable formats are marked accordingly | Ilia Mirkin | 2018-05-30 | 1 file | -4/+6
    Fixes: f7604d8af52 ("st/dri: only expose config formats that are display targets")
    Cc: "18.1" <[email protected]>
    Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add PIPE_CAP_GLSL_FEATURE_LEVEL_COMPATIBILITY | Marek Olšák | 2018-05-29 | 3 files | -0/+5
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Reviewed-by: Timothy Arceri <[email protected]>
* gallium/winsys: rename DRM_API_HANDLE_* to WINSYS_HANDLE_* | Dave Airlie | 2018-05-30 | 1 file | -6/+6
    This just renames this as we want to add an shm handle which isn't really
    drm related.

    Originally by: Marc-André Lureau <[email protected]>
    (airlied: I used this sed script instead)

    This was generated with:
    git grep -l 'DRM_API_' | xargs sed -i 's/DRM_API_/WINSYS_/g'

    Reviewed-by: Marek Olšák <[email protected]>
* nv50/ir: Extend ImmediateValue::applyLog2 to 64-bit integers | Pierre Moreau | 2018-05-29 | 1 file | -1/+10
    Signed-off-by: Karol Herbst <[email protected]>
    Reviewed-by: Karol Herbst <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ir: prevent WaW hazards in instruction scheduling | Rhys Perry | 2018-05-28 | 1 file | -54/+57
    Previously, findFirstUse() only considered reads "uses". This fixes that
    by making it check both an instruction's sources and definitions. It also
    shortens both findFirstUse() and findFirstDef() along the way.

    Reviewed-by: Ilia Mirkin <[email protected]>
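A small model shows why counting only reads as uses can miss a write-after-write hazard. The sketch below is not the scheduler's code: the `Insn` class, the register names, and the `include_defs` switch are all hypothetical, and the search simply scans forward for the first instruction that touches a register.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    """Hypothetical instruction: registers it writes and registers it reads."""
    defs: set
    srcs: set


def find_first_use(insns, start, regs, include_defs=True):
    """Index of the first instruction at or after `start` touching `regs`."""
    for i in range(start, len(insns)):
        touched = set(insns[i].srcs)
        if include_defs:
            touched |= insns[i].defs
        if touched & regs:
            return i
    return None


program = [
    Insn(defs={"r0"}, srcs={"r1"}),  # 0: slow write to r0
    Insn(defs={"r2"}, srcs={"r3"}),  # 1: unrelated
    Insn(defs={"r0"}, srcs={"r4"}),  # 2: second write to r0 (WaW with 0)
    Insn(defs={"r5"}, srcs={"r0"}),  # 3: first read of r0
]

print(find_first_use(program, 1, {"r0"}, include_defs=False))
print(find_first_use(program, 1, {"r0"}, include_defs=True))
```

With reads only, the search skips instruction 2 and reports instruction 3, so nothing would make the second write wait for the first. Checking definitions as well reports instruction 2.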
* nvc0: fix setting of subpixel precision during conservative rasterization | Rhys Perry | 2018-05-13 | 2 files | -2/+2
    Fixes: 07dac3e040 ("nvc0: add conservative rasterization support")
    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix printing of pixld | Rhys Perry | 2018-05-03 | 1 file | -1/+1
    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nv50: Extract needed value bits without shifting them before calling bitcount | Vlad Golovkin | 2018-05-02 | 1 file | -1/+1
    This can save one instruction since bitcount doesn't care about specific
    bits' positions.

    Reviewed-by: Karol Herbst <[email protected]>
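The reasoning above rests on a simple identity: counting the set bits of a field gives the same answer whether the field is first shifted down to bit 0 or masked in place. The sketch below checks that identity on a few values; the helper names are illustrative and the inputs are assumed to be non-negative.

```python
def popcount(value):
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


def count_shifted(value, shift, mask):
    """Shift the field down to bit 0, mask it, then count."""
    return popcount((value >> shift) & mask)


def count_in_place(value, shift, mask):
    """Mask the field where it sits, then count; no shift of `value` needed."""
    return popcount(value & (mask << shift))


value, shift, mask = 0b1011_0110_0000, 4, 0xFF
print(count_shifted(value, shift, mask), count_in_place(value, shift, mask))
```

When `mask << shift` is a compile-time constant, the in-place form needs only the AND before the count, which is where the saved instruction comes from.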
* nvc0: add conservative rasterization support | Rhys Perry | 2018-04-30 | 7 files | -8/+87
    Subpixel precision bias, dilation and the post-snap mode are supported on
    GM200 and newer. The pre-snap mode is supported for triangle primitives on
    GP100.

    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: add initial support for conservative rasterization | Rhys Perry | 2018-04-30 | 3 files | -0/+30
    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* nvc0/ir: all short immediates are sign-extended, adjust LIMM test | Ilia Mirkin | 2018-04-24 | 3 files | -19/+24
    Some analysis suggests that all short immediates are sign-extended. The
    insnCanLoad logic already accounted for this, but we could still pick the
    wrong form when emitting actual instructions that support both short and
    long immediates (with the long form usually having additional restrictions
    that insnCanLoad should be aware of).

    This also reverses a bunch of commits that had previously "worked around"
    this issue in various emitters:

    9c63224540ef: gm107/ir: make use of ADD32I for all immediates
    83a4f28dc27b: gm107/ir: make use of LOP32I for all immediates
    b84c97587b4a: gm107/ir: make use of IMUL32I for all immediates
    d30768025a22: gk110/ir: make use of IMUL32I for all immediates

    as well as the original import for UMUL in the nvc0 emitter.

    Reported-by: Karol Herbst <[email protected]>
    Signed-off-by: Ilia Mirkin <[email protected]>
    Tested-by: Karol Herbst <[email protected]>
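If short immediates are sign-extended, as the analysis above suggests, then a 32-bit value fits the short form only when sign-extending its low bits reproduces the whole value. The sketch below expresses that test; the function name is hypothetical, and the 20-bit default width is an assumption made for illustration, since the log does not state the field width.

```python
def fits_short_imm(value, bits=20):
    """True if a 32-bit value survives truncation to `bits` bits followed by
    sign extension back to 32 bits. The default width is an assumption."""
    value &= 0xFFFFFFFF
    low = value & ((1 << bits) - 1)
    if low & (1 << (bits - 1)):       # top bit of the short field is set
        extended = low - (1 << bits)  # interpret the field as negative
    else:
        extended = low
    return (extended & 0xFFFFFFFF) == value


for candidate in (0x7FFFF, 0x80000, 0xFFF80000, 0xFFFFFFFF):
    print(hex(candidate), fits_short_imm(candidate))
```

The interesting case is 0x80000: it fits in 20 bits as an unsigned number, but sign extension would turn it into 0xFFF80000, so under this model it needs the long form.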
* gm107/ir/lib: fix sched in div u32 builtin | Karol Herbst | 2018-04-24 | 2 files | -4/+4
    Imad needs to set a read barrier. With sufficiently big work groups I was
    getting wrong results for div u32. Turns out the issue was with the sched
    opcodes.

    Signed-off-by: Karol Herbst <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50/ir: make a copy of tex src if it's referenced multiple times | Ilia Mirkin | 2018-04-22 | 1 file | -37/+49
    For nv50 we coalesce the srcs and defs into a single node. As such, we can
    end up with impossible constraints if the source is referenced after the
    tex operation (which, due to the coalescing of values, will have
    overwritten it).

    This logic already exists for inserting moves for MERGE/UNION sources.
    It's the exact same idea here, so leverage that code, which also includes
    a few optimizations around not extending live ranges unnecessarily.

    Fixes tests/spec/glsl-1.30/execution/fs-textureSize-components.shader_test

    Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/ra: prefer def == src2 for fma with immediates on nvc0 | Karol Herbst | 2018-04-21 | 1 file | -10/+29
    This helps with the PostRALoadPropagation pass moving long immediates into
    FMA/MAD instructions.

    changes in shader-db:
    total instructions in shared programs : 5894114 -> 5886074 (-0.14%)
    total gprs used in shared programs    : 666558 -> 666563 (0.00%)
    total shared used in shared programs  : 520416 -> 520416 (0.00%)
    total local used in shared programs   : 53524 -> 53524 (0.00%)
    total bytes used in shared programs   : 54006744 -> 53932472 (-0.14%)

                local   shared      gpr     inst    bytes
    helped          0        0        2     4192     4192
    hurt            0        0        7        9        9

    Signed-off-by: Karol Herbst <[email protected]>
    [imirkin: minor edits to separate nv50 and nvc0+ cases]
    Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix line width on GM20x+ | Rhys Perry | 2018-04-20 | 1 file | -1/+4
    This has the side-effect of fixing polygon-offset piglit test failures.

    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: finish implementation of PIPE_QUERY_SO_OVERFLOW_PREDICATE | Rhys Perry | 2018-04-07 | 3 files | -17/+30
    This also removes some useless code leftover from old changes.

    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: change ACQUIRE_EQUAL to ACQUIRE_GEQUAL in nvc0_hw_query_fifo_wait | Rhys Perry | 2018-04-07 | 1 file | -1/+1
    If a fence is created in between nvc0_hw_end_query and
    nvc0_hw_query_fifo_wait, the sequence number in nvc0->screen->fence.bo can
    be larger than hq->fence->sequence before the semaphore is created,
    resulting in the semaphore never being triggered.

    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
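The failure described above can be modelled with two comparison modes on a sequence number that only moves forward. The sketch below is a toy model, not the hardware semaphore interface: the function name, the mode strings, and the sample sequence values are illustrative, and counter wraparound is ignored.

```python
def semaphore_released(mode, current_sequence, awaited_sequence):
    """Would a wait on `awaited_sequence` be satisfied at `current_sequence`?"""
    if mode == "EQUAL":
        return current_sequence == awaited_sequence
    if mode == "GEQUAL":
        return current_sequence >= awaited_sequence
    raise ValueError(f"unknown mode: {mode}")


awaited = 5
# Another fence advanced the counter past 5 before the wait was set up,
# so these are the only values the wait will ever observe.
observed = [6, 7, 8]

equal_ever = any(semaphore_released("EQUAL", s, awaited) for s in observed)
gequal_ever = any(semaphore_released("GEQUAL", s, awaited) for s in observed)
print(equal_ever, gequal_ever)
```

An equality wait never fires once the counter has skipped past the awaited value, while a greater-or-equal wait is satisfied by any later value.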
* nvc0: ensure the query's fence has been emitted in nvc0_hw_query_fifo_wait | Rhys Perry | 2018-04-07 | 1 file | -0/+4
    If the fence has not been emitted, hq->fence->sequence would be zero. This
    would result in the semaphore never being triggered, blocking all later
    commands in the pushbuf.

    Signed-off-by: Rhys Perry <[email protected]>
    [imirkin: use nouveau_fence_emit instead]
    Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: restore image binding on RGB10A2, remove from BGR10A2 | Ilia Mirkin | 2018-04-07 | 1 file | -2/+2
    Fixes a bunch of new CTS pbo tests that use those as an output format,
    which the state tracker converts into buffer image writes.

    No part of the driver is ready for BGR10A2. It could probably be enabled
    on Maxwell+, but seems unnecessary. This error was introduced when
    flipping the displayable bit on those formats, which accidentally also
    moved the image bit.

    Fixes: e1a70aed10d (nv50,nvc0: mark ABGR format as displayable instead of ARGB format)
    Signed-off-by: Ilia Mirkin <[email protected]>
* util: Move util_is_power_of_two to bitscan.h and rename to util_is_power_of_two_or_zero | Ian Romanick | 2018-03-29 | 4 files | -7/+7
    The new name makes the zero-input behavior more obvious. The next patch
    adds a new function with different zero-input behavior.

    Signed-off-by: Ian Romanick <[email protected]>
    Suggested-by: Matt Turner <[email protected]>
    Reviewed-by: Alejandro Piñeiro <[email protected]>
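The zero-input behavior that the rename calls out comes from the usual bit trick for this test: `v & (v - 1)` clears the lowest set bit, so the result is zero both for powers of two and for zero itself. The sketch below shows that trick next to a variant that rejects zero. Both Python functions are illustrative stand-ins, and the second one is only a guess at what "different zero-input behavior" might mean; the log does not describe the follow-up function.

```python
def is_power_of_two_or_zero(v):
    """True for 0 and for exact powers of two (v assumed non-negative)."""
    return (v & (v - 1)) == 0


def is_power_of_two_nonzero(v):
    """Hypothetical stricter variant: same test, but 0 is rejected."""
    return v != 0 and (v & (v - 1)) == 0


for v in (0, 1, 2, 3, 64, 96):
    print(v, is_power_of_two_or_zero(v), is_power_of_two_nonzero(v))
```

The two functions agree on every input except 0, which is why a name that spells out the zero case helps callers pick the right one.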
* nvc0/ir: fix emitting NOTs with predicates | Karol Herbst | 2018-03-29 | 1 file | -0/+2
    Signed-off-by: Karol Herbst <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: fix INTERP_* with indirect inputs | Ilia Mirkin | 2018-03-27 | 1 file | -3/+4
    There were two problems, both of which are fixed now:
    - The indirect address was not being shifted by 4
    - The indirect address was being placed as an argument in the offset case

    This fixes some of the new interpolateAt* piglits which now test for these
    situations.

    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Karol Herbst <[email protected]>
* gallium: add packed uniform CAP | Timothy Arceri | 2018-03-20 | 3 files | -0/+3
    Reviewed-by: Marek Olšák <[email protected]>
* nv50,nvc0: Support BGRX1010102 and RGBX1010102 for sampling. | Mario Kleiner | 2018-03-14 | 1 file | -0/+2
    Add them as usable for textures, so they can be used by Wayland drm in
    10 bpc mode and for X11 compositing under GLX and EGL.

    We need these formats to be supported at least for sampling, otherwise
    GLX_texture_from_pixmap and the equivalent EGL image extension won't work
    with X11 drawables of depth 30 and just display an all black window.

    Do not expose these formats as renderable, and thereby not as a
    fbconfig/EGLConfig/Visual, as NVidia hw does not support 10 bpc unorm
    formats without alpha channel.

    Tested under X11 + GLX/EGL + DRI2/DRI3 for compositing, and under
    Wayland+Weston drm backend with a Tesla and Pascal gpu.

    Signed-off-by: Mario Kleiner <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nouveau: Add framebuffer modifier support | Thierry Reding | 2018-03-09 | 6 files | -5/+146
    This adds support for framebuffer modifiers to Nouveau. This will be used
    by the Tegra driver to share metadata about the format of buffers (such as
    the tiling mode or compression).

    Changes in v2:
    - remove unused parameters to nouveau_buffer_create()
    - move format modifier query code to nvc0 backend
    - restrict format modifiers to 2D textures
    - implement ->query_dmabuf_modifiers()

    Changes in v4:
    - add UAPI include path on meson builds

    Changes in v5:
    - remove unnecessary includes

    Acked-by: Emil Velikov <[email protected]>
    Tested-by: Andre Heider <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
    Signed-off-by: Thierry Reding <[email protected]>
* nouveau/nvc0: Extract common tile mode macro | Thierry Reding | 2018-03-09 | 1 file | -6/+9
    Add a new macro that can be used to extract the tiling mode from a
    tile_mode value. This will be used to determine the number of GOBs used in
    block linear mode.

    Acked-by: Emil Velikov <[email protected]>
    Tested-by: Andre Heider <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
    Signed-off-by: Thierry Reding <[email protected]>