summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/nouveau/codegen
Commit message (Collapse)AuthorAgeFilesLines
* nv50/ir: fix emission of s[] args in certain situationsIlia Mirkin2015-11-071-2/+2
| | | | | | | | | | There might only be a single arg (e.g. cvt), so use mode rather than looking at the source directly. Also we don't want to rely on the type of the value, which can be unreliable, but instead use the instruction's. This works out well since mkSplit doesn't adjust the type. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: only take abs value when computing high resultIlia Mirkin2015-11-071-1/+1
| | | | | | | | Not reachable from TGSI since it only has UMUL, no IMUL. However it's surprising that setting argument types to s32 will cause sign to get lost. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: allow emission of immediates in imul/imad opsIlia Mirkin2015-11-071-2/+8
| | | | | | | Nothing actually uses this yet (due to complications), but the emission logic is right. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: properly set the type of the constant folding resultIlia Mirkin2015-11-061-4/+4
| | | | | | | This removes the hack used for merge, which only covers a fraction of the cases. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: add support for const-folding OP_CVT with F64 source/destIlia Mirkin2015-11-063-0/+45
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: add fp64 opcode emission support for G200 (NVA0)Ilia Mirkin2015-11-061-10/+84
| | | | | | Need to emulate rcp/rsq before providing full fp64 support Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: Add support for 64bit immediates to checkSwapSrc01Hans de Goede2015-11-061-5/+6
| | | | | | | | Now that we support 64 bit immediates in insnCanLoad, we need to swap 64 bit immediate sources too for optimal effect. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: Teach insnCanLoad about double immediatesHans de Goede2015-11-061-6/+19
| | | | | | | | | | | | | | | | Teach insnCanLoad about double immediates, together with the "Add support for merge-s to the ConstantFolding pass" This turns the following (nvc0) code: 1: mov u32 $r2 0x00000000 (8) 2: mov u32 $r3 0x3fe00000 (8) 3: add f64 $r0d $r0d $r2d (8) Into: 1: add f64 $r0d $r0d 0.500000 (8) Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: Add support for merge-s to the ConstantFolding passHans de Goede2015-11-061-0/+15
| | | | | | | | | | | This allows later passes like LoadPropagation to properly deal with 64 bit immediates. If the new 64 bit load this introduces does not get optimized away then split64BitOpPostRA() will split this into 2 instructions again. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: disallow 64-bit immediates on nv50 targetsIlia Mirkin2015-11-061-1/+1
| | | | | | No instructions are able to load short immediates like nvc0 can. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: allow movs with TYPE_F64 destinations to be splitIlia Mirkin2015-11-061-0/+6
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: Add support for double immediatesHans de Goede2015-11-061-1/+4
| | | | | | | | Add support for encoding double immediates (up to 20 bits of precision) into the generated gm107 machine-code. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: Add support for double immediatesHans de Goede2015-11-061-0/+8
| | | | | | | | Add support for encoding double immediates (up to 20 bits of precision) into the generated nvc0 machine-code. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: provide debug messages with shader compilation statsIlia Mirkin2015-11-052-0/+3
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nouveau: get rid of tabsIlia Mirkin2015-10-313-4/+4
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nv50: allow per-sample interpolation to be forced via rastIlia Mirkin2015-10-293-3/+29
| | | | | | | | Uses the same technique as for nvc0 of fixups before upload, and evicting in case of state change. Removes one source of variants kept by st/mesa. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: adapt to new method for passing in cull/clip distance masksIlia Mirkin2015-10-293-8/+10
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: do upload-time fixups for interpolation parametersIlia Mirkin2015-10-297-9/+157
| | | | | | | | | | | | | | | | | Unfortunately flatshading is an all-or-nothing proposition on nvc0, while GL 3.0 calls for the ability to selectively specify explicit interpolation parameters on gl_Color/gl_SecondaryColor which would override the flatshading setting. This allows us to fix up the interpolation settings after shader generation based on rasterizer settings. While we're at it, we can add support for dynamically forcing all (non-flat) shader inputs to be interpolated per-sample, which allows st/mesa to not generate variants for these. Fixes the remaining failing glsl-1.30/execution/interpolation piglits. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: use C++11 standard std::unordered_map if possibleChih-Wei Huang2015-10-151-3/+17
| | | | | | | | Note Android version before Lollipop is not supported. Signed-off-by: Chih-Wei Huang <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nvc0/ir: start offset at texBindBase for txq, like regular texturingIlia Mirkin2015-09-141-1/+4
| | | | | | | | | Curiously this has no actual effect. I think it's because the first 8 textures are bound in multiple slots for some reason. However seems prudent to use these the same way as regular texturing, esp in the case where there are more than 8 textures bound. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: add support for TXQS tgsi opcodeIlia Mirkin2015-09-133-7/+39
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: don't fold immediate into mad if registers are too highIlia Mirkin2015-09-101-0/+4
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91551 Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0" <[email protected]>
* nv50/ir: fix emission of 8-byte wide interp instructionIlia Mirkin2015-09-101-5/+6
| | | | | | | | | This can come up if the target register number is > 63, which is fairly rare. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91551 Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0" <[email protected]>
* nv50/ir: r63 is only 0 if we are using less than 63 registersIlia Mirkin2015-09-101-1/+4
| | | | | | | | | It is advantageous to use r63 instead of r127 since r63 can fit into the shorter encoding. However if we've RA'd over 63 registers, we must use r127 as the replacement instead. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0" <[email protected]>
* nv50/ir: make edge splitting fix up phi node sourcesIlia Mirkin2015-09-101-13/+77
| | | | | | | | | | | | | | | | | | | | | Unfortunately nv50_ir phi nodes aren't directly connected to the CFG, so the mapping between source and the actual BB is by inbound edge order. So when manipulating edges one has to be extremely careful. We were insufficiently careful when splitting critical edges which resulted in the phi nodes being confused as to where their sources were coming from. This primarily manifests itself with the TXL-lowering logic on nv50, when it is inside of a conditional. I've been unable to trigger the issue anywhere else so far. This resolves rendering failures in a number of games like Two Worlds 2, Trine: Enchanted Edition, Trine 2, XCOM:Enemy Unknown, Stacking. It also improves the situation in Hearthstone, Sonic Generations, and The Raven: Legacy of a Master Thief. However more work needs to be done there (splitting a lot more edges solves it, so it's some other sort of RA-related issue). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90887 Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0" <[email protected]>
* nouveau: android: add space before PRIx64 macroMauro Rossi2015-09-091-1/+1
| | | | | | | | | | | | | | | | | | | | | Otherwise the android build fails with error : unable to find string literal operator ‘operator"" PRIx64’ There are several resources referring to the problem, which is related to c++11, in our case used when building mesa for lollipop. http://comments.gmane.org/gmane.comp.graphics.opensg.user/5883 I've not investigated all the semantics, some people even suggested a bug in the gcc compiler, I just saw the building error was solved with one little space for lollipop and no side effect when c+11 not used. v2: [Emil Velikov] add an alternative commit message from Mauro. Cc: 11.0 <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* nv50/ir: pre-compute BFE arg when both bits and offset are immIlia Mirkin2015-08-201-3/+9
| | | | | | | | | | Due to a quirk in how the nv50 opt passes run, the algebraic optimization that looks for these BFE's happens before the constant folding pass. Rearranging these passes isn't a great idea, but this is easy enough to fix. Allows a following cvt to eliminate the bfe in certain situations. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: Handle OP_CVT when folding constant expressionsTobias Klausmann2015-08-201-0/+78
| | | | | [imirkin: handle more type combinations, use macro] Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: undo more shifts still by allowing a pre-SHL to occurIlia Mirkin2015-08-201-15/+33
| | | | | | | | This happens with unpackSnorm lowering. There's yet another bitfield-extract behind it, but there's too much variation to be worth cutting through. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: don't require AND when the high byte is being addressedIlia Mirkin2015-08-201-0/+12
| | | | | | | unpackUnorm* lowering doesn't AND the high byte/word as it's unnecessary. Detect that situation as well. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: detect i2f/i2i which operate on specific bytes/wordsIlia Mirkin2015-08-204-4/+82
| | | | | | | | | | | Some Unigine shaders have been observed to unpack bytes out of 32-bit integers and convert them to floats. I2F/I2I can handle this sort of thing directly. Detect the handleable situations. This misses 16-bit word capabilities in nv50, but I haven't seen shaders that would actually make use of that. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: detect AND/SHR pairs and convert into EXTBFIlia Mirkin2015-08-201-20/+46
| | | | | | | Some shaders appear to extract bits using shift/and combos. Detect (some) of those and convert to EXTBF instead. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: support different unordered_set implementationsChih-Wei Huang2015-08-205-12/+57
| | | | | | | | | | | | If build with C++11 standard, use std::unordered_set. Otherwise if build on old Android version with stlport, use std::tr1::unordered_set with a wrapper class. Otherwise use std::tr1::unordered_set. Signed-off-by: Chih-Wei Huang <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gk110/ir: fix sched calculator to consider all registers in the ISAIlia Mirkin2015-08-171-7/+10
| | | | | | | GK110/GK208 have 256 registers, not 64. Find out the number of registers from the target to avoid unnecessary iteration for pre-GK110. Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: avoid letting the lowering pass get out of syncIlia Mirkin2015-08-172-88/+5
| | | | | | | | There's a lot of functionality duplicated in the gm107 lowering pass from the nvc0 pass. As that one gets updated, the gm107 one falls behind. Avoid this by sharing the code. Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: indirect handle goes first on maxwell alsoIlia Mirkin2015-08-141-8/+4
| | | | | | | Fixes fs-simple-texture-size.shader_test Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.6" <[email protected]>
* nvc0/ir: cache vertex out base so that we don't recompute againIlia Mirkin2015-07-291-8/+15
| | | | | | | | The global CSE pass stinks and is unable to pull this out. Easy enough to handle it here and avoid generating unnecessary special register loads (which can allegedly be quite slow). Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: output base for reading is based on laneidIlia Mirkin2015-07-291-0/+25
| | | | | | | | | | | PFETCH retrieves the address for incoming vertices, not output vertices in TCS. For output vertices, we must use the laneid as a base. Fixes barrier piglit test, which was failing for entirely non-barrier reasons, but rather that it was (a) trying to draw multiple patches and (b) the incoming patch size was not the same as the outgoing patch size. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: trim out barrier sync for non-compute shadersIlia Mirkin2015-07-281-0/+6
| | | | | | | It seems like they're never necessary, and actively cause harm. This fixes some of the barrier-related piglits. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: fix barrier emissionIlia Mirkin2015-07-281-0/+2
| | | | | | immediate arguments require a flag to be set for each one Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: per-patch vars are in a separate address spaceIlia Mirkin2015-07-241-0/+2
| | | | | | | | | | | | | There's no need to attempt to avoid overlapping generic i/o with patch i/o. By the same token, we can't merge patch and non-patch loads/stores. This fixes at least the tes-both-input-array-*-index-rd tessellation variable-indexing tests. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: kepler can't do indirect shader input/output loads directlyIlia Mirkin2015-07-238-6/+75
| | | | | | | | | | | | | | There's a special AL2P instruction (called AFETCH in nv50 ir) which computes a "physical" value to be used with indirect addressing with ALD. Fixes tcs-input-array-*-index-rd tcs-output-array-*-index-wr varying-indexing tessellation tests on Kepler. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: tess factors are now sysvals, adapt codegen to expect thatIlia Mirkin2015-07-236-11/+24
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* gk110/ir: fake BAR supportIlia Mirkin2015-07-231-0/+12
| | | | | | Makes things sorta work until we figure out the real way to do this. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: cleanup private enums that have graduated to galliumIlia Mirkin2015-07-231-5/+0
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: allow tess eval output loads to be CSE'dIlia Mirkin2015-07-231-0/+2
| | | | | | These only happen for gl_TessCoord which are constant. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: add hazard for 2nd dim of vfetch/load indirect argumentIlia Mirkin2015-07-231-0/+2
| | | | | | | Apparently a multi-word load can potentially overwrite the indirect sources, so make sure that RA picks different registers for those. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: patch vertex count is stored in the upper bitsIlia Mirkin2015-07-231-0/+4
|
* nvc0/ir: add support for reading outputs in tess control shadersIlia Mirkin2015-07-232-2/+18
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: set perPatch flag on load/stores to per-patch varyingsIlia Mirkin2015-07-231-2/+6
| | | | Signed-off-by: Ilia Mirkin <[email protected]>