summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/nouveau
Commit message (Collapse)AuthorAgeFilesLines
* nouveau: remove unused class memberEric Engestrom2018-10-301-1/+0
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* util: Change remaining uint32 cache ids to sha1David McFarland2018-10-261-14/+15
| | | | | | | | | | | | After discussion with Timothy Arceri. disk_cache_get_function_identifier was using only the first byte of the sha1 build-id. Replace disk_cache_get_function_identifier with implementation from radv_get_build_id. Instead of writing a uint32_t it now writes to a mesa_sha1. All drivers using disk_cache_get_function_identifier are updated accordingly. Reviewed-by: Timothy Arceri <[email protected]> Fixes: 83ea8dd99bb1 ("util: add disk_cache_get_function_identifier()")
* nvc0: increase NOUVEAU_TRANSFER_PUSHBUF_THRESHOLD to 1024 on Kepler+Rhys Perry2018-10-254-3/+11
| | | | | | | | Gives a +3.89% to +5.27% FPS improvement with Hitman and +2.73% to +2.82% FPS improvement with Dirt Rally on my GTX 1060. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix ConstantFolding::createMul for 64 bit mulsKarol Herbst2018-10-201-1/+1
| | | | | | | | Fixes: 2f52925f5c60c72c9389bfdc122c3d5f8e15b25f "nv50/ir: move a * b -> a << log2(b) code into createMul()" Reviewed-by: Rhys Perry <[email protected]> Signed-off-by: Karol Herbst <[email protected]>
* nvc0: fix blitting red to srgb8_alphaIlia Mirkin2018-10-091-0/+4
| | | | | | | | | | | | | For some reason the 2d engine can't handle this. Red formats get special treatment there, so perhaps related. Fixes dEQP-GLES3 tests of the form: dEQP-GLES3.functional.fbo.blit.conversion.r{8,16f,32f}_to_srgb8_alpha8 Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Cc: [email protected]
* nv50,nvc0: guard against zero-size blitsIlia Mirkin2018-10-092-0/+14
| | | | | | | | | | The current state tracker can generate these sometimes. Fixing this is more involved, and due to some integer math we can generate divisions-by-zero. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Cc: [email protected]
* nv50,nvc0: mark RGBX_UINT formats as renderableIlia Mirkin2018-10-091-4/+4
| | | | | | | | | | | | | | | | This helps st/mesa avoid some (apparently) buggy fallbacks. Specifically the CopyTexSubImage fallback tries to read texture A as RGBA_FLOAT and write back that data into the target format, which fails for integer formats which have no appropriate logic to do the conversion. Since integer formats don't blend, there's no harm in the fact that the "A" component gets written anyways. Fixes, among others: https://www.khronos.org/registry/webgl/sdk/tests/conformance2/textures/canvas/tex-2d-rgb8ui-rgb_integer-unsigned_byte.html Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nouveau: use build-id when available for disk cacheTimothy Arceri2018-10-031-7/+7
| | | | Reviewed-by: Marek Olšák <[email protected]>
* nv50/ir: fix link-time build failureRhys Perry2018-09-231-1/+1
| | | | | | | Seems this fixes linking problems that occur in some situations. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix bindless multisampled images on Maxwell+Rhys Perry2018-09-223-5/+45
| | | | | | | | | | | | | | | | | NVC0_CB_AUX_BINDLESS_INFO isn't written to on Maxwell+ and it's too small anyway. With these changes, TXQ is used to determine the number of samples and the coordinate adjustment information looked up in a small array in the driver constant buffer. v2: rework to use TXQ and a small array instead of a larger array with an entry for each texture v3: get rid of the small array and calculate the adjustments in the shader Signed-off-by: Rhys Perry <[email protected]> Fixes: c2ae9b40527 ('nvc0: implement multisampled images on Maxwell+') Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: warn about changing NVC0_CB_AUX_MP_INFO and NVC0_CB_AUX_DRAW_INFORhys Perry2018-09-221-2/+6
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: Update counter reading shaders to new NVC0_CB_AUX_MP_INFORhys Perry2018-09-221-18/+18
| | | | | | Fixes: 66ca7e400b8 ('nvc0: add support for programmable sample locations') Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvir: Always split 64-bit IMAD/IMUL operationsPierre Moreau2018-09-131-1/+1
| | | | | | | | | | | Those operations do not map to actual hardware instructions, therefore those should always be lowered to 32-bit instructions. Fixes: 009c54aa7af "nv50/ir: Split 64-bit integer MAD/MUL operations" Signed-off-by: Pierre Moreau <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Signed-off-by: Karol Herbst <[email protected]>
* nv50,nvc0: warn on not-explicitly-handled capsIlia Mirkin2018-09-112-14/+26
| | | | | | | | | | Not handling caps explicitly means that we're likely getting incorrect values -- these need to be reviewed and set appropriately. While we're at it, add in some missing caps, and set all the subpixel stuff to 8 as that seems to be what the blob reports. Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add PIPE_CAP_MAX_TEXTURE_UPLOAD_MEMORY_BUDGETMarek Olšák2018-09-073-0/+3
|
* gallium: enable GL_AMD_depth_clamp_separate on r600, radeonsiMarek Olšák2018-09-063-0/+3
|
* gallium: split depth_clip into depth_clip_near & depth_clip_farMarek Olšák2018-09-063-3/+3
| | | | for AMD_depth_clamp_separate.
* gallium: Add a helper for implementing PIPE_CAP_* default values.Eric Anholt2018-09-043-9/+9
| | | | | | | | | | | | | | | | | | One of the pains of implementing a gallium driver is filling in a million pipe caps you don't know about yet when you're just starting out. One of the pains of working on gallium is copy-and-pasting your new PIPE_CAP into each driver. We can fix both of these by having each driver call into the default helper from their default case, so that both sides can ignore each other until they need to. v2: fix i915g build, revert swr change to avoid breaking scons build (https://travis-ci.org/anholt/mesa/jobs/419739857) v3: Rebase on 3 new gallium caps. Reviewed-by: Marek Olšák <[email protected]> (v1) Cc: Bruce Cherniak <[email protected]> Cc: George Kyriazis <[email protected]> Cc: Kenneth Graunke <[email protected]>
* nv50: bump compat glsl level to same as coreIlia Mirkin2018-08-291-1/+1
| | | | | | | Passes the compat piglits. I'm sure that there will be odd issues that aren't caught by them, but at least it should basically work. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: bump compat GLSL version to match coreIlia Mirkin2018-08-291-1/+1
| | | | | | This passes the handful of tests in piglit. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: silence partitionLoadStore() unused function warningRhys Kidd2018-08-291-2/+2
| | | | | | | | | | | | Move this now-unused function into the existing comment block, which was its only prior use. ../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp:2645:1: warning: unused function 'partitionLoadStore' [-Wunused-function] partitionLoadStore(uint8_t comp[2], uint8_t size[2], uint8_t mask) Fixes: ("86e4440361 nouveau: codegen: Disable more old resource handling code") Signed-off-by: Rhys Kidd <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir,nvc0: use constant buffers for compute when possible on Kepler+Rhys Perry2018-08-272-10/+36
| | | | | | | | | | | | | | | | | Gives a +7.79% increase in FPS with Hitman on lowest quality settings on my GTX 1060. total instructions in shared programs : 5787979 -> 5748677 (-0.68%) total gprs used in shared programs : 669901 -> 669373 (-0.08%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21068 -> 21064 (-0.02%) local shared gpr inst bytes helped 1 0 152 274 274 hurt 0 0 0 0 0 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* nv50/ir: optimize multiplication by 16-bit immediates into two xmadsRhys Perry2018-08-271-0/+10
| | | | | | | | | | | | | | | | Rather than the usual three that would be created. total instructions in shared programs : 5796385 -> 5786560 (-0.17%) total gprs used in shared programs : 670103 -> 669968 (-0.02%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21164 -> 21068 (-0.45%) local shared gpr inst bytes helped 1 0 64 1040 1040 hurt 0 0 27 0 0 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* nv50/ir: optimize near power-of-twos into shladdRhys Perry2018-08-271-0/+27
| | | | | | | | | | | | | | total instructions in shared programs : 5819319 -> 5796385 (-0.39%) total gprs used in shared programs : 670571 -> 670103 (-0.07%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21164 -> 21164 (0.00%) local shared gpr inst bytes helped 0 0 318 1758 1758 hurt 0 0 63 0 0 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* nv50/ir: move a * b -> a << log2(b) code into createMul()Rhys Perry2018-08-271-15/+30
| | | | | | | | | | | | | | | | | | | | With this commit, OP_MAD is handled on nv50 too. This commit is also useful for later commits. Also, instead of creating a shladd, it relies on LateAlgebraicOpt to create one. This simplifies the code and helps shader-db slightly overall. total instructions in shared programs : 5820882 -> 5819319 (-0.03%) total gprs used in shared programs : 670595 -> 670571 (-0.00%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21164 -> 21164 (0.00%) local shared gpr inst bytes helped 0 0 18 230 230 hurt 0 0 8 263 263 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* nv50/ir: optimize imul/imad to xmadsRhys Perry2018-08-272-1/+56
| | | | | | | | | | | | | | | | | | | This hits the shader-db numbers a good bit, though a few xmads is way faster than an imul or imad and the cost is mitigated by the next commit, which optimizes many multiplications by immediates into shorter and less register heavy instructions than the xmads. total instructions in shared programs : 5768871 -> 5820882 (0.90%) total gprs used in shared programs : 669919 -> 670595 (0.10%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21068 -> 21164 (0.46%) local shared gpr inst bytes helped 0 0 38 0 0 hurt 1 0 365 3076 3076 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* gm107/ir: add support for OP_XMAD on GM107+Rhys Perry2018-08-273-1/+71
| | | | | | | | v4: make the immediate field 16 bits v5: don't ever emit h1 flags for immediates Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* nv50/ir: add preliminary support for OP_XMADRhys Perry2018-08-277-5/+85
| | | | | | | | v4: remove uint16_t(...) v4: don't allow immediates outside [0,65535] in insnCanLoad() Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* gallium: Split out PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE.Kenneth Graunke2018-08-243-0/+3
| | | | | | | | | | | | | Some hardware can do PIPE_TEX_WRAP_MIRROR_REPEAT but not PIPE_TEX_WRAP_MIRROR_CLAMP and PIPE_TEX_WRAP_MIRROR_CLAMP_TO_BORDER. Drivers for such hardware would like to advertise support for ARB_texture_mirror_clamp_to_edge but not EXT_texture_mirror_clamp. This commit adds a new PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE bit, changes the extension enable to be based on that, and enables it in all upstream drivers which supported PIPE_CAP_TEXTURE_MIRROR_CLAMP (so they continue supporting this mode).
* gallium: add PIPE_CAP_MAX_SHADER_BUFFER_SIZEMarek Olšák2018-08-233-0/+6
| | | | Tested-by: Dieter Nützel <[email protected]>
* gallium: add PIPE_CAP_MAX_GS_INVOCATIONSMarek Olšák2018-08-233-0/+6
| | | | Tested-by: Dieter Nützel <[email protected]>
* nvc0/ir: return 0 in imageLoad on incomplete texturesKarol Herbst2018-08-042-3/+31
| | | | | | | | | | | | | | We already guarded all OP_SULDP against out of bound accesses, but we ended up just reusing whatever value was stored in the dest registers. Fixes CTS test shader_image_load_store.incomplete_textures v2: fix for loads not ending up with predicates (bindless_texture) v3: fix replacing the def Cc: <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Signed-off-by: Karol Herbst <[email protected]>
* gm200/ir: optimize rcp(sqrt) to rsqKarol Herbst2018-08-041-1/+10
| | | | | | | | | | | | | | | | mitigates hurt shaders after adding sqrt: total instructions in shared programs : 5456166 -> 5454825 (-0.02%) total gprs used in shared programs : 647522 -> 647551 (0.00%) total shared used in shared programs : 389120 -> 389120 (0.00%) total local used in shared programs : 21064 -> 21064 (0.00%) total bytes used in shared programs : 58288696 -> 58274448 (-0.02%) local shared gpr inst bytes helped 0 0 0 516 516 hurt 0 0 27 2 2 Reviewed-by: Ilia Mirkin <[email protected]> Signed-off-by: Karol Herbst <[email protected]>
* gm200/ir: add native OP_SQRT supportKarol Herbst2018-08-044-2/+14
| | | | | | | | | | | | | | | | | | | | | ./GpuTest /test=pixmark_piano 1024x640 30sec: 301 -> 327 points shader-db: total instructions in shared programs : 5472103 -> 5456166 (-0.29%) total gprs used in shared programs : 647530 -> 647522 (-0.00%) total shared used in shared programs : 389120 -> 389120 (0.00%) total local used in shared programs : 21064 -> 21064 (0.00%) total bytes used in shared programs : 58459304 -> 58288696 (-0.29%) local shared gpr inst bytes helped 0 0 27 8281 8281 hurt 0 0 21 431 431 v2: use NVISA_GM200_CHIPSET Reviewed-by: Ilia Mirkin <[email protected]> Signed-off-by: Karol Herbst <[email protected]>
* gallium: add storage_sample_count parameter into is_format_supportedMarek Olšák2018-07-314-0/+13
| | | | Tested-by: Dieter Nützel <[email protected]>
* gallium: add PIPE_CAP_FRAMEBUFFER_MSAA_CONSTRAINTSMarek Olšák2018-07-313-0/+3
| | | | Tested-by: Dieter Nützel <[email protected]>
* nvc0: serialize before updating some constant buffer bindings on Maxwell+Rhys Perry2018-07-304-47/+81
| | | | | | | | | | | | | | | | | To avoid serializing, this has the user constant buffer always be 65536 bytes and enabled unless it's required that something else is used for constant buffer 0. Fixes artifacts with at least XCOM: Enemy Within, 0 A.D. and Unigine Valley, Heaven and Superposition. v2: changed uniform_buffer_bound to be bool instead of a uint32_t v3: remove magic constants v3: remove pointless code in nvc0_validate_driverconst Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100177 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: move LateAlgebraicOpt back to right after ConstantFoldingRhys Perry2018-07-191-1/+1
| | | | | | | | | | | | total instructions in shared programs : 5480808 -> 5472107 (-0.16%) total gprs used in shared programs : 647530 -> 647532 (0.00%) total shared used in shared programs : 389120 -> 389120 (0.00%) total local used in shared programs : 21064 -> 21064 (0.00%) total bytes used in shared programs : 58551648 -> 58459352 (-0.16%) local shared gpr inst bytes helped 0 0 73 2609 2609 hurt 0 0 71 34 34
* nv50/ir: handle SHLADD in IndirectPropagationRhys Perry2018-07-191-0/+12
| | | | | | | | | | | | | | | An alternative solution to the problem fixed in 0bd83d0 ("nv50/ir: move LateAlgebraicOpt to the very end"). total instructions in shared programs : 5481195 -> 5480808 (-0.01%) total gprs used in shared programs : 647535 -> 647530 (-0.00%) total shared used in shared programs : 389120 -> 389120 (0.00%) total local used in shared programs : 21064 -> 21064 (0.00%) total bytes used in shared programs : 58555784 -> 58551648 (-0.01%) local shared gpr inst bytes helped 0 0 2 34 34 hurt 0 0 0 0 0
* gm107/ir: use CS2R for SV_CLOCKRhys Perry2018-07-193-2/+25
| | | | | | | | This instruction seems to be faster than S2R and requires no barrier, though the range of special registers it can read from is limited. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* nouveau: fix 3D blitter for unsigned to signed integer conversionsKarol Herbst2018-07-152-10/+22
| | | | | | | | | fixes a couple of packed_pixel CTS tests. No regressions inside a CTS run. v2: simplify the changes a bit Reviewed-by: Ilia Mirkin <[email protected]> Signed-off-by: Karol Herbst <[email protected]>
* nv50/ir: fix Instruction::isActionEqual for PHI instructionsKarol Herbst2018-07-071-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | phi instructions don't have the same results by simply having the same sources. They need to be inside the same BasicBlock or share an equal condition resulting into a path through the shader selecting equal sources as well. short example: cond = ...; const0 = 0; const1 = 1; if (cond) { ssa_1 = const0; } else { ssa_2 = const1; } ssa_3 = phi ssa_1 ssa_2; if (!cond) { ssa_4 = const0; } else { ssa_5 = const1; } ssa_6 = phi ssa_4 ssa_5; allthough both phis actually have sources with equal results, merging them would be wrong due to having a different condition selecting which source to take. For now we also stick an assert into GlobalCSE, because it should never end up having to merge phi instructions. Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: use the combined tid special registerRhys Perry2018-07-079-0/+61
| | | | | | | | | | | | | | total instructions in shared programs : 5804448 -> 5804690 (0.00%) total gprs used in shared programs : 670065 -> 670065 (0.00%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21068 -> 21068 (0.00%) local shared gpr inst bytes helped 0 0 0 5 5 hurt 0 0 0 191 191 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* nvc0: implement multisampled images on Maxwell+Rhys Perry2018-07-046-39/+48
| | | | | | | | | | Changes in v2: - make loadSuInfo32() protected without making the rest protected - move NVC0_SU_INFO_* into nv50_ir_lowering_nvc0.h instead of duplicating NVC0_SU_INFO_MS Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* nv50/ir: handle clipvertex for geom and tess shaders as wellKarol Herbst2018-07-021-1/+6
| | | | | | | | | this will be needed for compatibility profiles v2: handle tess shaders Reviewed-by: Ilia Mirkin <[email protected]> Signed-off-by: Karol Herbst <[email protected]>
* gallium/util: remove dummy function util_format_is_supportedMarek Olšák2018-06-293-10/+0
| | | | Reviewed-by: Eric Engestrom <[email protected]>
* nv50/ir: improve maintainability of Target*::initOpInfo()Rhys Perry2018-06-292-23/+28
| | | | | | | | | | This is mainly useful for when one needs to add new opcodes in a painless and reliable way. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Signed-off-by: Karol Herbst <[email protected]>
* nv50/ir: fix image stores with indirect handlesRhys Perry2018-06-291-4/+5
| | | | | | | | | | Having this if statement here prevented the next if statement from being reached in the case of image stores, which is needed for instructions with indirect bindless handles like "STORE TEMP[ADDR[2].x+1](1) ...". Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Signed-off-by: Karol Herbst <[email protected]>
* nvc0: remove magic values in nve4_set_tex_handles()Rhys Perry2018-06-281-1/+1
| | | | | | | | | With this commit, things no longer break if NVC0_CB_AUX_TEX_INFO is changed to anything other than 0x20. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Signed-off-by: Karol Herbst <[email protected]>
* nvc0/ir: fix TargetNVC0::insnCanLoadOffset()Rhys Perry2018-06-281-0/+1
| | | | | | | | | | | | | | Previously, TargetNVC0::insnCanLoadOffset() returned whether the offset could be set to a specific value. The IndirectPropagation pass expected it to return whether the offset could be increased by a specific value, which is what TargetNV50::insnCanLoadOffset() does. Fixes: 37b67db6ae34fb6586d640a7a1b6232f091dd812 ("nvc0/ir: be careful about propagating very large offsets into const load") Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Signed-off-by: Karol Herbst <[email protected]>