path: root/src/gallium/drivers/nouveau/codegen
Commit message | Author | Age | Files | Lines
* nv50/ir: pre-compute BFE arg when both bits and offset are imm | Ilia Mirkin | 2015-08-20 | 1 | -3/+9

  Due to a quirk in how the nv50 opt passes run, the algebraic optimization that looks for these BFE's happens before the constant folding pass. Rearranging these passes isn't a great idea, but this is easy enough to fix.

  Allows a following cvt to eliminate the bfe in certain situations.

  Signed-off-by: Ilia Mirkin <[email protected]>
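The folding described above can be sketched as follows. This is not the Mesa code: `packBfeArg` and `bfeU32` are hypothetical helpers, and the operand layout (offset in bits 0..7, width in bits 8..15) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: fold an immediate offset and width into the single
// packed operand at compile time instead of emitting an instruction for it.
// Assumed layout: offset in bits 0..7, width in bits 8..15.
inline uint32_t packBfeArg(uint32_t offset, uint32_t bits)
{
    assert(offset < 32 && bits <= 32);
    return (bits << 8) | offset;
}

// Reference model of an unsigned bitfield extract driven by the packed operand.
inline uint32_t bfeU32(uint32_t value, uint32_t arg)
{
    const uint32_t offset = arg & 0xffu;
    const uint32_t bits = (arg >> 8) & 0xffu;
    if (bits == 0 || offset >= 32)
        return 0;
    const uint32_t shifted = value >> offset;
    return bits >= 32 ? shifted : (shifted & ((1u << bits) - 1u));
}
```

With both fields known up front, the packed operand is a plain constant, so a later pass sees `bfeU32(x, 0x1010)` rather than a value that still has to be computed.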
* nv50/ir: Handle OP_CVT when folding constant expressions | Tobias Klausmann | 2015-08-20 | 1 | -0/+78

  [imirkin: handle more type combinations, use macro]

  Signed-off-by: Ilia Mirkin <[email protected]>
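As a rough illustration of what folding a conversion of an immediate involves, the sketch below converts a float constant to an integer type at compile time. The commit message does not say how each type pair is handled; truncation toward zero and clamping to the destination range are assumptions here, and `DstType` and `foldCvtF32ToInt` are hypothetical names.

```cpp
#include <cmath>
#include <cstdint>

enum class DstType { U8, S8, U16, S16, U32, S32 };

// Hypothetical folder: float immediate -> integer immediate.
// Assumed semantics: truncate toward zero, clamp to the destination range,
// and map NaN to 0.
inline int64_t foldCvtF32ToInt(float value, DstType dst)
{
    int64_t lo = 0, hi = 0;
    switch (dst) {
    case DstType::U8:  lo = 0;              hi = 0xff;         break;
    case DstType::S8:  lo = -0x80;          hi = 0x7f;         break;
    case DstType::U16: lo = 0;              hi = 0xffff;       break;
    case DstType::S16: lo = -0x8000;        hi = 0x7fff;       break;
    case DstType::U32: lo = 0;              hi = 0xffffffffLL; break;
    case DstType::S32: lo = -0x80000000LL;  hi = 0x7fffffff;   break;
    }
    if (std::isnan(value))
        return 0;
    const double t = std::trunc(static_cast<double>(value));
    if (t < static_cast<double>(lo)) return lo;
    if (t > static_cast<double>(hi)) return hi;
    return static_cast<int64_t>(t);
}
```

A real folder would need one such rule per source/destination type pair, which is presumably why the commit mentions a macro.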
* nvc0/ir: undo more shifts still by allowing a pre-SHL to occur | Ilia Mirkin | 2015-08-20 | 1 | -15/+33

  This happens with unpackSnorm lowering. There's yet another bitfield-extract behind it, but there's too much variation to be worth cutting through.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: don't require AND when the high byte is being addressed | Ilia Mirkin | 2015-08-20 | 1 | -0/+12

  unpackUnorm* lowering doesn't AND the high byte/word as it's unnecessary. Detect that situation as well.

  Signed-off-by: Ilia Mirkin <[email protected]>
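The reason the AND is unnecessary for the top byte or word is that a logical right shift already zero-fills the upper bits. A minimal sketch of a matcher that accepts both the masked and the unmasked form is shown below; `Subword` and `matchSubword` are hypothetical names, not the identifiers used in the pass.

```cpp
#include <cstdint>

struct Subword {
    unsigned sizeBits; // 8 or 16
    unsigned index;    // which byte or word of the 32-bit source
};

// Hypothetical matcher: does "x >> shift" (optionally followed by "& mask")
// select exactly one byte or one 16-bit word of a 32-bit value?
inline bool matchSubword(unsigned shift, bool hasMask, uint32_t mask, Subword &out)
{
    if (hasMask) {
        if (mask == 0xffu && shift < 32 && shift % 8 == 0) {
            out = Subword{8, shift / 8};
            return true;
        }
        if (mask == 0xffffu && shift < 32 && shift % 16 == 0) {
            out = Subword{16, shift / 16};
            return true;
        }
        return false;
    }
    // No AND: only the highest byte or word is fully isolated by the shift.
    if (shift == 24) { out = Subword{8, 3};  return true; }
    if (shift == 16) { out = Subword{16, 1}; return true; }
    return false;
}
```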
* nvc0/ir: detect i2f/i2i which operate on specific bytes/words | Ilia Mirkin | 2015-08-20 | 4 | -4/+82

  Some Unigine shaders have been observed to unpack bytes out of 32-bit integers and convert them to floats. I2F/I2I can handle this sort of thing directly. Detect the handleable situations.

  This misses 16-bit word capabilities in nv50, but I haven't seen shaders that would actually make use of that.

  Signed-off-by: Ilia Mirkin <[email protected]>
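For reference, the shader-level pattern being matched computes the value below. This is only a model of the arithmetic, with a hypothetical helper name; the optimization replaces the separate shift, mask and convert with a single conversion that reads the selected byte.

```cpp
#include <cassert>
#include <cstdint>

// Model of "unpack byte `index` of a 32-bit integer and convert it to float".
inline float byteToFloat(uint32_t packed, unsigned index)
{
    assert(index < 4);
    const uint32_t byte = (packed >> (8u * index)) & 0xffu;
    return static_cast<float>(byte);
}
```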
* nvc0/ir: detect AND/SHR pairs and convert into EXTBF | Ilia Mirkin | 2015-08-20 | 1 | -20/+46

  Some shaders appear to extract bits using shift/and combos. Detect some of those and convert to EXTBF instead.

  Signed-off-by: Ilia Mirkin <[email protected]>
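The equivalence behind this conversion can be checked with a small model. The sketch assumes the handled case is a mask of contiguous low bits, `(1 << w) - 1`; the commit only says that some combinations are detected, so the exact set is not claimed here. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns w if mask == (1 << w) - 1 for some w in 1..32, otherwise 0.
inline unsigned lowMaskWidth(uint32_t mask)
{
    if (mask == 0 || (mask & (mask + 1u)) != 0)
        return 0;
    unsigned width = 0;
    while (mask) {
        ++width;
        mask >>= 1;
    }
    return width;
}

// The shift/and combo as it appears in the shader.
inline uint32_t shrThenAnd(uint32_t x, unsigned shift, uint32_t mask)
{
    assert(shift < 32);
    return (x >> shift) & mask;
}

// Reference model of an unsigned bitfield extract of `width` bits at `offset`.
inline uint32_t extractBits(uint32_t x, unsigned offset, unsigned width)
{
    assert(offset < 32 && width >= 1 && width <= 32);
    const uint32_t shifted = x >> offset;
    return width >= 32 ? shifted : (shifted & ((1u << width) - 1u));
}
```

Whenever `lowMaskWidth(mask)` is non-zero, `shrThenAnd(x, s, mask)` and `extractBits(x, s, lowMaskWidth(mask))` agree, so the pair can be replaced by one extract.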
* nv50/ir: support different unordered_set implementations | Chih-Wei Huang | 2015-08-20 | 5 | -12/+57

  If built with the C++11 standard, use std::unordered_set.

  Otherwise, if built on an old Android version with stlport, use std::tr1::unordered_set with a wrapper class.

  Otherwise, use std::tr1::unordered_set.

  Signed-off-by: Chih-Wei Huang <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
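A minimal sketch of this kind of selection is given below. It covers only the C++11 and plain TR1 branches; the stlport wrapper class is omitted because its details are not given in the message. `unordered_set_of` and `countUnique` are hypothetical names.

```cpp
#include <cstddef>

// Pick the hash-set implementation at compile time.
#if __cplusplus >= 201103L
#include <unordered_set>
template <typename T>
struct unordered_set_of { typedef std::unordered_set<T> type; };
#else
#include <tr1/unordered_set>
template <typename T>
struct unordered_set_of { typedef std::tr1::unordered_set<T> type; };
#endif

// Client code names the set only through the alias, so it builds either way.
inline std::size_t countUnique(const int *values, std::size_t n)
{
    unordered_set_of<int>::type seen;
    for (std::size_t i = 0; i < n; ++i)
        seen.insert(values[i]);
    return seen.size();
}
```

Routing every use through one alias keeps the preprocessor logic in a single header rather than at each use site.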
* gk110/ir: fix sched calculator to consider all registers in the ISA | Ilia Mirkin | 2015-08-17 | 1 | -7/+10

  GK110/GK208 have 256 registers, not 64. Find out the number of registers from the target to avoid unnecessary iteration for pre-GK110.

  Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: avoid letting the lowering pass get out of sync | Ilia Mirkin | 2015-08-17 | 2 | -88/+5

  There's a lot of functionality duplicated in the gm107 lowering pass from the nvc0 pass. As that one gets updated, the gm107 one falls behind. Avoid this by sharing the code.

  Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: indirect handle goes first on maxwell also | Ilia Mirkin | 2015-08-14 | 1 | -8/+4

  Fixes fs-simple-texture-size.shader_test

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: "10.6" <[email protected]>
* nvc0/ir: cache vertex out base so that we don't recompute again | Ilia Mirkin | 2015-07-29 | 1 | -8/+15

  The global CSE pass stinks and is unable to pull this out. Easy enough to handle it here and avoid generating unnecessary special register loads (which can allegedly be quite slow).

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: output base for reading is based on laneid | Ilia Mirkin | 2015-07-29 | 1 | -0/+25

  PFETCH retrieves the address for incoming vertices, not output vertices in TCS. For output vertices, we must use the laneid as a base.

  Fixes the barrier piglit test, which was failing not for barrier-related reasons but because (a) it was trying to draw multiple patches and (b) the incoming patch size was not the same as the outgoing patch size.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: trim out barrier sync for non-compute shaders | Ilia Mirkin | 2015-07-28 | 1 | -0/+6

  It seems like they're never necessary, and actively cause harm. This fixes some of the barrier-related piglits.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: fix barrier emission | Ilia Mirkin | 2015-07-28 | 1 | -0/+2

  immediate arguments require a flag to be set for each one

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: per-patch vars are in a separate address space | Ilia Mirkin | 2015-07-24 | 1 | -0/+2

  There's no need to attempt to avoid overlapping generic i/o with patch i/o. By the same token, we can't merge patch and non-patch loads/stores.

  This fixes at least the tes-both-input-array-*-index-rd tessellation variable-indexing tests.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: kepler can't do indirect shader input/output loads directly | Ilia Mirkin | 2015-07-23 | 8 | -6/+75

  There's a special AL2P instruction (called AFETCH in nv50 ir) which computes a "physical" value to be used with indirect addressing with ALD.

  Fixes

      tcs-input-array-*-index-rd
      tcs-output-array-*-index-wr

  varying-indexing tessellation tests on Kepler.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: tess factors are now sysvals, adapt codegen to expect that | Ilia Mirkin | 2015-07-23 | 6 | -11/+24

  Signed-off-by: Ilia Mirkin <[email protected]>
* gk110/ir: fake BAR support | Ilia Mirkin | 2015-07-23 | 1 | -0/+12

  Makes things sorta work until we figure out the real way to do this.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: cleanup private enums that have graduated to gallium | Ilia Mirkin | 2015-07-23 | 1 | -5/+0

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: allow tess eval output loads to be CSE'd | Ilia Mirkin | 2015-07-23 | 1 | -0/+2

  These only happen for gl_TessCoord which are constant.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: add hazard for 2nd dim of vfetch/load indirect argument | Ilia Mirkin | 2015-07-23 | 1 | -0/+2

  Apparently a multi-word load can potentially overwrite the indirect sources, so make sure that RA picks different registers for those.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: patch vertex count is stored in the upper bits | Ilia Mirkin | 2015-07-23 | 1 | -0/+4
* nvc0/ir: add support for reading outputs in tess control shaders | Ilia Mirkin | 2015-07-23 | 2 | -2/+18

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: set perPatch flag on load/stores to per-patch varyings | Ilia Mirkin | 2015-07-23 | 1 | -2/+6

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: populate info structure based on new tess properties | Ilia Mirkin | 2015-07-23 | 1 | -0/+18

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: mark varyings as per-patch based on semantic name | Ilia Mirkin | 2015-07-23 | 1 | -0/+14

  Also add proper handling for PATCH semantics

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: TESSCOORD comes in as a sysval, not an input | Ilia Mirkin | 2015-07-23 | 1 | -2/+0

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: preliminary tess support | Ilia Mirkin | 2015-07-23 | 3 | -7/+4

  Uncomment the various functionality that was already there and add in obvious missing bits that parallel vp/gp/fp functionality.

  Signed-off-by: Ilia Mirkin <[email protected]>
* nouveau: use bool instead of boolean | Samuel Pitoiset | 2015-07-21 | 3 | -12/+12

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Acked-by: Ilia Mirkin <[email protected]>
* gm107/ir: fix indirect txq emission | Ilia Mirkin | 2015-07-18 | 1 | -2/+8

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
* nvc0/ir: don't worry about sampler in txq handling | Ilia Mirkin | 2015-07-18 | 1 | -22/+8

  There's no need to deal with samplers for texture size queries. That code also was accidentally setting an invalid sIndirectSrc position, but it can now just be removed.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
* nvc0/ir: fix txq on indirect samplers | Ilia Mirkin | 2015-07-18 | 2 | -2/+56

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
* nv50/ir: UCMP arguments are float, so make sure modifiers are applied | Ilia Mirkin | 2015-07-03 | 1 | -1/+2

  The first argument to UCMP needs to be compared against 0, but the latter arguments are treated as float and need to be able to properly apply neg/abs arguments. Adjust the inferSrcType function accordingly.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: "10.5 10.6" <[email protected]>
* nv50/ir: don't emit src2 in immediate form | Ilia Mirkin | 2015-07-02 | 1 | -2/+2

  In the immediate form, src2 == dst, so it does not need to be emitted. Otherwise it overlaps with the immediate value's low bits.

  Fixes: 09ee907266 (nv50/ir: Fold IMM into MAD)
  Cc: "10.6" <[email protected]>
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: copy joinAt when splitting both before and after | Ilia Mirkin | 2015-07-01 | 3 | -0/+5

  The current implementation only moves the joinAt when splitting after the given instruction, not before it. So if you have a BB with

      foo
      instr
      bar
      joinat

  and thus with joinAt set, we end up first splitting before instr, at which point the instr's bb is updated to the new bb. Since that bb doesn't have a joinAt set (despite containing one), when splitting after the instr, there is nothing to copy over.

  Since the joinat will be in the "split" bb irrespective of whether we're splitting before or after the instruction, move it over in either case.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91124
  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: "10.5 10.6" <[email protected]>
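The scenario above can be replayed on a toy model. This is a sketch under simplifying assumptions, not the nv50 IR classes: `Block` stores instructions as strings, `hasJoinAt` stands in for the block's joinAt pointer, and the joinat is assumed to sit after the split point, as in the example from the message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy basic block: a list of instruction names plus a flag that stands in
// for the block's joinAt field.
struct Block {
    std::vector<std::string> insns;
    bool hasJoinAt = false;
};

// Move instructions [pos, end) into a new block and hand the joinAt marker
// over with them, regardless of which kind of split asked for it.
inline Block splitAt(Block &bb, std::size_t pos)
{
    assert(pos <= bb.insns.size());
    Block tail;
    const std::ptrdiff_t off = static_cast<std::ptrdiff_t>(pos);
    tail.insns.assign(bb.insns.begin() + off, bb.insns.end());
    bb.insns.erase(bb.insns.begin() + off, bb.insns.end());
    tail.hasJoinAt = bb.hasJoinAt;
    bb.hasJoinAt = false;
    return tail;
}

inline Block splitBefore(Block &bb, std::size_t i) { return splitAt(bb, i); }
inline Block splitAfter(Block &bb, std::size_t i)  { return splitAt(bb, i + 1); }
```

Splitting `{foo, instr, bar, joinat}` before `instr` and then after it leaves the marker on the final block `{bar, joinat}`, which is the outcome the fix is after.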
* nv50/ir: fix emission of address reg in 3rd source | Ilia Mirkin | 2015-06-30 | 1 | -2/+6

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91056
  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: "10.5 10.6" <[email protected]>
* nv50/ir: propagate modifier to right arg when const-folding mad | Ilia Mirkin | 2015-06-26 | 1 | -1/+4

  An immediate has to be the second arg of an ADD operation. However we were mistakenly propagating the modifier of the non-folded value to the folded immediate argument.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91117
  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: "10.5 10.6" <[email protected]>
* nvc0/ir: can't have a join on a load with an indirect source | Ilia Mirkin | 2015-06-17 | 1 | -1/+1

  Triggers an INVALID_OPCODE warning on GK208. Seems rare enough to not warrant verification on other chips.

  Fixes the new piglits:

      ubo_array_indexing/fs-nonuniform-control-flow.shader_test
      ubo_array_indexing/vs-nonuniform-control-flow.shader_test

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: "10.5 10.6" <[email protected]>
* nvc0/ir: fix collection of first uses for texture barrier insertion | Ilia Mirkin | 2015-06-15 | 1 | -5/+11

  One of the places we have to insert texbars is in situations where the result of the tex gets overwritten by a different instruction (e.g. in a conditional statement). However in some situations it can actually appear as though the original tex itself is an overwriting instruction. This can naturally never really happen, so just ignore the tex instruction when it comes up.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90347
  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: "10.5 10.6" <[email protected]>
* nv50/ir: OP_JOIN is a flow instruction | Jürgen Rühle | 2015-06-15 | 1 | -1/+1

  OP_JOIN instructions are assumed to be flow instructions and are unconditionally cast to FlowInstruction. This patch fixes an instance where an OP_JOIN is created as a plain instruction. This can cause crashes in the ir printer.

  [imirkin: add ->fixed = 1]

  Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: avoid messing up arg1 of PFETCH | Ilia Mirkin | 2015-05-23 | 1 | -2/+18

  There can be scenarios where the "indirect" arg of a PFETCH becomes known, and so the code will attempt to propagate it. Use this opportunity to just fold it into the first argument, and prevent the load propagation pass from touching PFETCH further.

  This fixes gs-input-array-vec4-index-rd.shader_test and vs-output-array-vec4-index-wr-before-gs.shader_test on nvc0 at least.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tobias Klausmann <[email protected]>
  Cc: "10.5 10.6" <[email protected]>
* nvc0/ir: LOAD's can't be used for shader inputs | Ilia Mirkin | 2015-05-22 | 2 | -0/+2

  We forgot to convert to VFETCH in case of indirect access. Fix that. This avoids crashes on the new gs-input-array-vec4-index-rd and vs-output-array-vec4-index-wr-before-gs but they still fail.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: "10.5 10.6" <[email protected]>
* nv50/ir: guess that the constant offset is the starting slot of array | Ilia Mirkin | 2015-05-22 | 1 | -2/+4

  When we get something like IN[ADDR[0].x+5], we will now guess that we should look at IN[5] for the "base" information.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: "10.5 10.6" <[email protected]>
* nvc0/ir: set ftz when sources are floats, not just destinations | Ilia Mirkin | 2015-05-22 | 1 | -3/+2

  In the case of a compare, the destination might be a predicate, but we still want to flush denorms.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: "10.5 10.6" <[email protected]>
* nv50/ir: allow OP_SET to merge with OP_SET_AND/etc as well as a neg | Ilia Mirkin | 2015-05-22 | 1 | -26/+55

  This covers the pattern where a KILL_IF is used, which triggers a comparison of -x to 0. This can usually be folded into the comparison whose result is being compared to 0, however it may, itself, have already been combined with another comparison. That shouldn't impact the logic of this pass however.

  With this and the & 1.0 change, code like

      00000020: 001c0001 80081df4 set b32 $r0 lt f32 $r0 0x3e800000
      00000028: 001c0000 201fc000 and b32 $r0 $r0 0x3f800000
      00000030: 7f9c001e dd885c00 set $p0 0x1 lt f32 neg $r0 0x0
      00000038: 0000003c 19800000 $p0 discard

  becomes

      00000020: 001c001d b5881df4 set $p0 0x1 lt f32 $r0 0x3e800000
      00000028: 0000003c 19800000 $p0 discard

  Signed-off-by: Ilia Mirkin <[email protected]>
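The two sequences in the listing can be modelled on the CPU to see why they decide the discard identically. This is only an arithmetic model with hypothetical function names; the constants are read off the listing (0x3e800000 is 0.25f and 0x3f800000 is 1.0f).

```cpp
#include <cstdint>
#include <cstring>

// "set b32 ... lt f32": all-ones when the comparison holds, zero otherwise.
inline uint32_t setLtB32(float a, float b) { return a < b ? 0xffffffffu : 0u; }

// Reinterpret a 32-bit pattern as a float without violating aliasing rules.
inline float bitsToFloat(uint32_t bits)
{
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Long form: compare, AND with the bit pattern of 1.0f, then test -x < 0.
inline bool discardLongForm(float r0)
{
    const uint32_t mask = setLtB32(r0, 0.25f);
    const float asFloat = bitsToFloat(mask & 0x3f800000u); // 1.0f or 0.0f
    return -asFloat < 0.0f;
}

// Short form: the original comparison writes the predicate directly.
inline bool discardShortForm(float r0) { return r0 < 0.25f; }
```

The AND turns the mask into 1.0f or 0.0f, and `-x < 0` holds only for 1.0f (since -0.0f is not less than 0.0f), so the predicate equals the first comparison.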
* nvc0/ir: optimize set & 1.0 to produce boolean-float sets | Ilia Mirkin | 2015-05-22 | 2 | -0/+29

  This has started to happen more now that the backend is producing KILL_IF more often.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tobias Klausmann <[email protected]>
* nvc0/ir: allow iset to produce a boolean float | Ilia Mirkin | 2015-05-22 | 3 | -5/+16

  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: avoid jumping to a sched instruction | Ilia Mirkin | 2015-05-22 | 3 | -2/+9

  Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: remove TGSI_SAT_MINUS_PLUS_ONE | Marek Olšák | 2015-05-20 | 1 | -12/+1

  It's a remnant of some old NV extension. Unused.

  I also have a patch that removes predicates if anyone is interested.

  Reviewed-by: Roland Scheidegger <[email protected]>
* gk110/ir: switch to gk104-style sched codes rather than all-in-one | Ilia Mirkin | 2015-05-18 | 1 | -9/+9

  Matches change to envydis/envyas tools.

  Signed-off-by: Ilia Mirkin <[email protected]>