path: root/src/gallium/auxiliary/gallivm
Commit message | Author | Age | Files | Lines
* gallivm: remove explicit __STDC_.*_MACROS defines | Emil Velikov | 2017-01-27 | 1 | -8/+0
  Correctly handled by the build systems.
  Cc: Roland Scheidegger <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: don't try to use fast rcp for fdiv | Roland Scheidegger | 2017-01-24 | 1 | -1/+3
  The fast rcp instruction is disabled, so rcp always falls back to a division (1 / x). Hence, if we get a division opcode, it doesn't make much sense to split it into rcp/mul.
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: (trivial) fix ddiv cpu implementation | Roland Scheidegger | 2017-01-24 | 1 | -1/+0
  We can't use the cpu implementation of fdiv, as it uses a different lp_build_context, which causes an assertion failure. Just use the default fdiv action (there is no fast rcp for doubles which we could potentially use anyway).
  Cc: 17.0 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: use #ifdef not #if for PIPE_ARCH_BIG_ENDIAN | Dave Airlie | 2017-01-19 | 1 | -1/+1
  This fixes the build on ppc/s390.
  Reviewed-by: Roland Scheidegger <[email protected]>
  Cc: "17.0" <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
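The difference between `#if` and `#ifdef` matters when a macro is defined without a value. The following is a minimal sketch, not the Mesa code: `EXAMPLE_ARCH_BIG_ENDIAN` and `example_lsb_byte_index` are hypothetical names, and the assumption that the real macro is defined without a value is not confirmed by the commit message.

```c
/* Hypothetical stand-in for an architecture macro that is defined WITHOUT
 * a value. With such a definition, "#if EXAMPLE_ARCH_BIG_ENDIAN" expands to
 * "#if" with no expression and fails to compile, while "#ifdef" only tests
 * whether the name is defined. */
#define EXAMPLE_ARCH_BIG_ENDIAN

/* Return the byte index of the least significant byte of a 32-bit word. */
static unsigned example_lsb_byte_index(void)
{
#ifdef EXAMPLE_ARCH_BIG_ENDIAN
   return 3; /* big endian: least significant byte is stored last */
#else
   return 0; /* little endian: least significant byte is stored first */
#endif
}
```

Removing the `#define` line switches the function to the little-endian branch without any other change.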
* gallivm: (trivial) fix copy/paste bug with big endian code | Roland Scheidegger | 2017-01-18 | 1 | -2/+4
  8bd67a35c50e68c21aed043de11e095c284d151a introduced the use of an undefined variable on big endian archs due to a copy/paste bug.
  (compile hack tested only)
* gallivm: Cleanup USE_MCJIT. | Jose Fonseca | 2017-01-18 | 1 | -10/+25
  Split the dual nature of the USE_MCJIT macro into a separate compile-time define and a run-time variable.
  Reviewed-by: Emil Velikov <[email protected]>
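One way to separate a compile-time decision from a run-time one can be sketched as follows. This is an illustration under stated assumptions, not the actual change: all `EXAMPLE_`/`example_` names, the version threshold, and the AVX condition are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical build-time LLVM version, standing in for the real one. */
#define EXAMPLE_LLVM_VERSION 0x0305

/* Compile-time define: only present when the choice is fixed at build time. */
#if EXAMPLE_LLVM_VERSION >= 0x0306
#  define EXAMPLE_USE_MCJIT 1
#endif

/* Run-time variable: constant if the define exists, mutable otherwise. */
#ifdef EXAMPLE_USE_MCJIT
static const bool example_use_mcjit = EXAMPLE_USE_MCJIT;
#else
static bool example_use_mcjit = false;
#endif

/* Decide which JIT to use; the AVX condition is an illustrative assumption. */
static bool example_select_jit(bool cpu_has_avx)
{
#ifndef EXAMPLE_USE_MCJIT
   if (cpu_has_avx)
      example_use_mcjit = true;
#else
   (void) cpu_has_avx; /* choice already fixed at compile time */
#endif
   return example_use_mcjit;
}
```

Code that must compile differently tests the define with `#ifdef`, while code that only needs the answer reads the variable.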
* tgsi: add DDIV instruction | Nicolai Hähnle | 2017-01-16 | 1 | -0/+2
  Double-precision division, to allow more precision than a DRCP + DMUL sequence.
  Reviewed-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallivm: generalize 4x4f->1x16ub special case conversion | Roland Scheidegger | 2017-01-06 | 1 | -56/+118
  This special packing path can easily be extended to handle not just float->unorm8 but also float->snorm8, uint32->uint8 and int32->int8 (i.e. all interesting cases for the llvmpipe fs backend code). The packing parts all stay the same (only the last packing step will be signed->signed instead of signed->unsigned, but luckily even sse2 can do both).
  While here also note some bugs with that (we keep the bugs identical to what we did before on x86, albeit other archs may differ). In particular, for float->unorm8 too large values will still get clamped to 0, not 255, and for float->snorm8 NaNs will end up as -1, not 0 (but we do the clamp against 1.0 there to prevent too large values ending up as -1.0 - this is inconsistent with the unorm8 handling but is what we ended up with before, I'm not sure we can get away without it). This is quite fishy in any case as we depend on arch-dependent behavior of the iround (my understanding is that with altivec the conversion would actually saturate, although I've no idea about NaNs, so it probably wouldn't need to do anything for snorm).
  (There are only minimal piglit tests for unorm clamping behavior AFAICT, in particular nothing seems to test values which are too large to be handled by the float->int conversion.)
  For uint32->uint8 we also do a min against MAX_INT, since the source for the packs is always signed (again, on x86 - should probably be able to express these arch-dependent bits better some day).
  Reviewed-by: Jose Fonseca <[email protected]>
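The "too large values get clamped to 0" behavior can be reproduced with a scalar model. This is a sketch only, assuming an x86-style float to int conversion that returns INT32_MIN for NaN and out-of-range inputs, followed by a signed 32->16 pack and an unsigned-saturating 16->8 pack. All function names are hypothetical, and the rounding here is round-half-away-from-zero for simplicity.

```c
#include <stdint.h>

/* Model of an x86-style float->int32 conversion: NaN and out-of-range
 * inputs yield INT32_MIN instead of saturating. */
static int32_t example_iround_x86(float x)
{
   if (x != x || x >= 2147483648.0f || x < -2147483648.0f)
      return INT32_MIN;
   return (int32_t) (x + (x >= 0.0f ? 0.5f : -0.5f));
}

/* int32 -> int16 with signed saturation (first pack step). */
static int16_t example_pack_s32_to_s16(int32_t v)
{
   if (v > INT16_MAX) return INT16_MAX;
   if (v < INT16_MIN) return INT16_MIN;
   return (int16_t) v;
}

/* int16 -> uint8 with unsigned saturation (last pack step). */
static uint8_t example_pack_s16_to_u8(int16_t v)
{
   if (v > 255) return 255;
   if (v < 0) return 0;
   return (uint8_t) v;
}

/* float -> unorm8 without any explicit clamp: the packs do the clamping. */
static uint8_t example_float_to_unorm8(float x)
{
   int32_t i = example_iround_x86(x * 255.0f);
   return example_pack_s16_to_u8(example_pack_s32_to_s16(i));
}
```

Moderately large inputs such as 2.0 saturate to 255 as expected, but an input like 1.0e10 overflows the integer conversion, becomes INT32_MIN, and saturates to 0 in the final unsigned pack.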
* gallivm: (trivial) fix typo bug with small AoS format unpacking | Roland Scheidegger | 2017-01-06 | 1 | -1/+1
  Fix typo using wrong (uninitialized) build context introduced by 4634cb5921b985f04f2daf00cda2d28036143bd3.
  (This only affects very rare small packed formats which have a PIPE_SWIZZLE_0 channel, such as r4a4, which is never used by mesa/st. Nevertheless it broke lp_test_format.)
* gallivm: implement aos unpack (to unorm8) for small unorm formats | Roland Scheidegger | 2017-01-05 | 1 | -12/+152
  Using bit replication. This path now resembles something which might make sense. (The logic was mostly copied from llvmpipe fs backend.)
  I am not convinced though it is actually faster than SoA sampling (actually I'm quite certain it's always a loss with AVX). With SoA it's just shift/mask/cvt/mul for getting the colors, whereas there's still roughly 3 shifts, 3 or/and per channel for AoS (i.e. for SoA it's exactly the same as it would be for a rgba8 format, whereas the extra effort for AoS is significant). The filtering might still be faster (albeit with FMA the instruction count gets down quite a bit there on the SoA float filtering path on new cpus). And those small unorm formats often don't have an alpha channel (which makes things worse relatively for AoS path).
  (This also fixes a trivial bug in the llvmpipe fs code this was derived from, albeit it was only relevant for 4-bit channels.)
  Reviewed-by: Jose Fonseca <[email protected]>
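Bit replication widens a small unorm channel to 8 bits by repeating its bit pattern into the low bits. A minimal scalar sketch of the general idea follows; `example_replicate_to_8` is a hypothetical name, and the real code operates on whole vectors with shifts and or/and operations rather than a loop.

```c
#include <assert.h>
#include <stdint.h>

/* Expand a 'bits'-wide unorm value (1 <= bits <= 8) to 8 bits by
 * replicating its bit pattern, so that all-ones maps to 255. */
static uint8_t example_replicate_to_8(uint32_t value, unsigned bits)
{
   uint32_t aligned, result;
   unsigned filled;

   assert(bits >= 1 && bits <= 8);
   assert(value < (1u << bits));

   /* Left-align the value within 8 bits. */
   aligned = value << (8 - bits);
   result = aligned;

   /* OR in right-shifted copies until all 8 bits are filled. */
   for (filled = bits; filled < 8; filled += bits)
      result |= aligned >> filled;

   return (uint8_t) result;
}
```

For a 5-bit channel this reduces to the familiar `(x << 3) | (x >> 2)`, and for a 4-bit channel to `(x << 4) | x`.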
* gallivm: optimize lp_build_unpack_arith_rgba_aos slightly | Roland Scheidegger | 2017-01-05 | 1 | -19/+97
  This code uses a vector shift which has to be emulated on x86 unless there's AVX2. Luckily in some cases we can actually avoid the shift altogether, so do that. Also make sure we hit the fast lp_build_conv() path when applicable, albeit that's quite the hack...
  That said, this path is taken for AoS sampling for small unorm (smaller than rgba8) formats, and it is completely hopeless even with those changes, with or without AVX. (Probably should have some code similar to the one in the llvmpipe fs backend code, using bit replication to extend to rgba8888 - rounding is not quite 100% accurate but if it's good enough there it should be here as well.)
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: use 2 srcs for 32->16bit conversions in lp_bld_conv_auto | Roland Scheidegger | 2017-01-05 | 1 | -2/+19
  If we only feed one source vector at a time, we cannot use pack intrinsics (as we only have a 64bit destination vector). lp_bld_conv_auto is specifically designed to alter the length and number of destination vectors, so this works just fine (if we use single source vectors at a time, afterwards we immediately reassemble the vectors).
  For AVX though this isn't really possible, since we expect 128bit output already for a single 256bit input. (One day we should handle AVX2 which again would need multiple inputs, however there's the problem that we get differently ordered output there and we don't want to reorder, so we would need to be able to tell build_conv to handle upper and lower halves independently.)
  A similar strategy would probably work for 32->8bit too (if it doesn't hit the special case) but I'm going to try something different for that...
  Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: (trivial) minimally simplify mask construction | Roland Scheidegger | 2017-01-05 | 1 | -0/+2
  simd instruction sets usually have comparisons for equal, not unequal. So use a different comparison against the mask itself - which also means we don't need an all-zero as well as an all-one (for the pxor) reg.
  Also add code to avoid scalar expansion of i1 values which we definitely shouldn't do. There are problems with this though with llvm select interaction, so it's disabled (basically using llvm select instead of intrinsics may still produce atrocious code, even in cases where we figured it should not, albeit I think this could probably be fixed with some better selection of optimization passes, but I have zero idea there really).
  Reviewed-by: Jose Fonseca <[email protected]>
* gallium: remove TGSI_OPCODE_SUB | Marek Olšák | 2017-01-05 | 2 | -39/+5
  It's redundant with the source modifier.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: remove TGSI_OPCODE_ABS | Marek Olšák | 2017-01-05 | 3 | -17/+3
  It's redundant with the source modifier.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* swr: fix windows build break | George Kyriazis | 2017-01-05 | 1 | -0/+7
  Wrap lp_bld_type.h in extern "C". Windows decorates global variables, so when they are used from .cpp files, an undecorated version is needed.
  Also, removed related and unneeded code from swr_screen.cpp
  Reviewed-by: Ilia Mirkin <[email protected]>
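The usual way to give declarations C linkage in a header shared between C and C++ is a guarded `extern "C"` block. The sketch below shows that common pattern only; `example_native_vector_width` is a hypothetical global, and whether the commit uses exactly this form is an assumption.

```c
/* Header part: shared between .c and .cpp files. */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical global variable. With C linkage its symbol name is not
 * decorated, so C and C++ translation units refer to the same symbol. */
extern unsigned example_native_vector_width;

#ifdef __cplusplus
}
#endif

/* Definition part: normally lives in exactly one .c file. */
unsigned example_native_vector_width = 128;
```

A C compiler never sees the `extern "C"` lines because `__cplusplus` is only defined by C++ compilers.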
* gallivm: generalize the compressed format soa fetch a bit | Roland Scheidegger | 2016-12-21 | 1 | -37/+49
  This can now handle rgtc (unorm) too - this path no longer handles plain formats, but that's unnecessary since they now all have their proper SoA unpack (this will still be dog-slow though due to the actual fetch being per-pixel util fallbacks).
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: provide soa fetch path handling formats with more than 32bit | Roland Scheidegger | 2016-12-21 | 1 | -154/+375
  This previously always fell back to AoS conversion. Even for 4-float formats (which is the optimal case by far for that fallback case) this was suboptimal, since it meant the conversion couldn't be done with 256bit vectors. While this may still only be partly possible for some formats, (unless there's AVX2 support) at least the transpose can be done with half the unpacks (and before using the transpose for AoS fallbacks, it was worse still). With less than 4 channels, things got way worse with the AoS fallback quickly even with 128bit vectors.
  The strategy is pretty much the same as the existing one for formats which fit into 32 bits, except there's now multiple vectors to be fetched (2 or 4 to be exact), which need to be shuffled first (if it's 4 vectors, this amounts to a transpose, for 2 it's a bit different), then the unpack is done the same (with the exception that the shift of the channels is now modulo 32, and we need to select the right vector). In fact the most complex part about it is to get the shuffles right for separating into lo/hi parts for AVX/AVX2...
  This also makes use of the new ability of gather to use provided type information, which we abuse to outsmart llvm so we get decent shuffles, and to fetch 3x32bit vectors without having to ZExt the scalar.
  And just because we can, we handle double formats too, albeit they are a bit different (draw sometimes needs to handle that).
  v2: fix typo float/int bug (generating inefficient code).
  Reviewed-by: Jose Fonseca <[email protected]>
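The "shift modulo 32, select the right vector" step can be illustrated per pixel in scalar code. This is a hedged sketch, not the gallivm implementation: `example_extract_channel` is a hypothetical helper, the pixel is assumed to be stored as consecutive 32-bit words with word 0 holding bits 0..31, and channels are assumed not to straddle a word boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Extract one channel from a pixel wider than 32 bits. */
static uint32_t example_extract_channel(const uint32_t *words,
                                        unsigned bit_offset,
                                        unsigned bit_size)
{
   unsigned word  = bit_offset / 32; /* which fetched 32-bit word to use */
   unsigned shift = bit_offset % 32; /* channel shift, now modulo 32 */
   uint32_t mask;

   assert(bit_size >= 1 && bit_size <= 32);
   assert(shift + bit_size <= 32);

   mask = (bit_size == 32) ? 0xffffffffu : ((1u << bit_size) - 1u);
   return (words[word] >> shift) & mask;
}
```

For a 4x16-bit format, the channels at bit offsets 0 and 16 come from word 0, while those at offsets 32 and 48 come from word 1 with shifts 0 and 16 respectively.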
* gallivm: optimize gather a bit, by using supplied destination type | Roland Scheidegger | 2016-12-21 | 7 | -78/+332
  By using a dst_type in the gather interface, gather has some more knowledge about how values should be fetched. E.g. if this is a 3x32bit fetch and dst_type is 4x32bit vector gather will no longer do a ZExt with a 96bit scalar value to 128bit, but just fetch the 96bit as 3x32bit vector (this is still going to be 2 loads of course, but the loads can be done directly to simd vector that way).
  Also, we can now try to use the right int/float type. This should make no difference really since there's typically no domain transition penalties for such simd loads, however it actually makes a difference since llvm will use different shuffle lowering afterwards so the caller can use this to trick llvm into using sane shuffle afterwards (and yes llvm is really stupid there - nothing against using the shuffle instruction from the correct domain, but not at the cost of doing 3 times more shuffles, the case which actually matters is refusal to use shufps for integer values).
  Also do some attempt to avoid things which look great on paper but llvm doesn't really handle (e.g. fetching 3-element 8 bit and 16 bit vectors which is simply disastrous - I suspect type legalizer is to blame trying to extend these vectors to 128bit types somehow, so fetching these with scalars like before which is suboptimal due to the ZExt).
  Remove the ability for truncation (no point, this is gather, not conversion) as it is complex enough already.
  While here also implement not just the float, but also the 64bit avx2 gathers (disabled though since based on the theoretical numbers the benefit just isn't there at all until Skylake at least).
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: optimize SoA AoS fallback fetch path a little | Roland Scheidegger | 2016-12-21 | 1 | -22/+46
  We should do transpose, not extract/insert, at least with "sufficient" amount of channels (for 4 channels, extract/insert shuffles generated otherwise look truly terrifying). Albeit we shouldn't fallback to that so often in any case.
  v2: ditch the extract/insert path, not worth keeping (we're going to avoid hitting the fallback that often with future patches).
  Reviewed-by: Jose Fonseca <[email protected]>
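An AoS to SoA conversion of four 4-channel pixels is a 4x4 transpose, which SIMD code can build from a handful of whole-vector shuffles instead of 16 element extracts and inserts. The scalar model below sketches that shuffle structure; the helper names are hypothetical and the two helpers merely imitate interleave-style and half-concatenating shuffles.

```c
/* Interleave the low (hi == 0) or high (hi != 0) halves of a and b:
 * { a[o], b[o], a[o+1], b[o+1] } with o = 0 or 2. */
static void example_interleave(const float *a, const float *b, int hi,
                               float *out)
{
   int o = hi ? 2 : 0;
   out[0] = a[o];     out[1] = b[o];
   out[2] = a[o + 1]; out[3] = b[o + 1];
}

/* Concatenate the low or high halves: { a[o], a[o+1], b[o], b[o+1] }. */
static void example_concat_halves(const float *a, const float *b, int hi,
                                  float *out)
{
   int o = hi ? 2 : 0;
   out[0] = a[o]; out[1] = a[o + 1];
   out[2] = b[o]; out[3] = b[o + 1];
}

/* Transpose 4 AoS pixels (xyzw each) into 4 SoA channel vectors using
 * 8 vector-wide shuffles. */
static void example_transpose_4x4(float aos[4][4], float soa[4][4])
{
   float t0[4], t1[4], t2[4], t3[4];

   example_interleave(aos[0], aos[1], 0, t0); /* x0 x1 y0 y1 */
   example_interleave(aos[2], aos[3], 0, t1); /* x2 x3 y2 y3 */
   example_interleave(aos[0], aos[1], 1, t2); /* z0 z1 w0 w1 */
   example_interleave(aos[2], aos[3], 1, t3); /* z2 z3 w2 w3 */

   example_concat_halves(t0, t1, 0, soa[0]);  /* x0 x1 x2 x3 */
   example_concat_halves(t0, t1, 1, soa[1]);  /* y0 y1 y2 y3 */
   example_concat_halves(t2, t3, 0, soa[2]);  /* z0 z1 z2 z3 */
   example_concat_halves(t2, t3, 1, soa[3]);  /* w0 w1 w2 w3 */
}
```

Each helper call corresponds to one vector shuffle, so the whole transpose costs eight shuffles regardless of which elements are moved.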
* gallivm: (trivial) handle non-aligned fetch for lp_build_fetch_rgba_soa | Roland Scheidegger | 2016-12-21 | 3 | -8/+12
  soa fetch so far always assumed that data was aligned. However, we want to use this for vertex fetch, and data might not be aligned there, so handle it in this path too (basically just pass alignment through to other functions).
  (It looks like it wouldn't work for cached s3tc but this is no different than with AoS fetch.)
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: optimize 16bit->32bit gather path a bit | Roland Scheidegger | 2016-12-06 | 1 | -3/+39
  LLVM can't really optimize anything which crosses scalar/vector boundaries, so help a bit with some particular gather operations when the width is expanded (only do it for 16->32bit expansion for now), by doing expansion after fetch. That is probably a better solution anyway even if llvm would recognize it, makes for cleaner IR...
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: handle 16bit float fetches in lp_build_fetch_rgba_soa | Roland Scheidegger | 2016-12-06 | 1 | -4/+18
  Note that we really want to _never_ reach the bottom of the function, which resorts to AoS fetch. Half floats can be handled just like other formats which fit into 32bit vectors (so, only 1x16 and 2x16 formats, albeit with more channels things are not THAT bad), with minimal plumbing. I've seen code size go down nearly by a factor of 3 for a complete texture sampling function (including bilinear filtering) using R16F.
  (What we should do for everything not special cased is to do AoS gather, shuffle/shift things into SoA vectors, and then do the conversion there. Otherwise it's particularly bad with 1 or 2 channel formats - that r16f format with either 4 or 8-wide vectors was still doing one element at a time, essentially doing exactly the same work as for rgba16f. Also replacing the channels with SWIZZLE0/1 (particularly the latter) adds even more work, as it has to be done per aos vector, and not just straightforward at the end with the SoA vector.)
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: use getHostCPUFeatures on x86/llvm-4.0+. | Tim Rowley | 2016-12-05 | 1 | -0/+15
  Use the llvm-provided API based on cpuid rather than our own manually maintained list of mattr enabling/disabling.
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: add PIPE_SHADER_CAP_LOWER_IF_THRESHOLD | Marek Olšák | 2016-11-15 | 1 | -0/+1
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: limit use of setFastMathFlags to LLVM 3.8 and later | Marek Olšák | 2016-11-15 | 1 | -0/+2
  Reviewed-by: Brian Paul <[email protected]>
* gallivm: add lp_create_builder with an unsafe_fpmath option | Marek Olšák | 2016-11-15 | 2 | -0/+17
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: fix [IU]MUL_HI regression harder | Nicolai Hähnle | 2016-11-10 | 1 | -8/+12
  The fix in commit 88f791db75e9f065bac8134e0937e1b76600aa36 was insufficient for radeonsi because the vector case was not handled properly.
  It seems piglit only covers the scalar case, unfortunately.
  Fixes GL45-CTS.shader_bitfield_operation.[iu]mulExtended.*
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Fix build after removal of deprecated attribute API v3 | Tom Stellard | 2016-11-09 | 3 | -4/+87
  v2: Fix adding parameter attributes with LLVM < 4.0.
  v3: Fix typo. Fix parameter index. Add a gallivm enum for function attributes.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: fix [IU]MUL_HI regression | Nicolai Hähnle | 2016-11-08 | 3 | -28/+90
  This patch does two things:
  1. It separates the host-CPU code generation from the generic code generation. This guards against accidentally breaking things for radeonsi in the future.
  2. It makes sure we actually use both arguments and don't just compute a square :-p
  Fixes a regression introduced by commit 29279f44b3172ef3b84d470e70fc7684695ced4b
  Cc: Roland Scheidegger <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: introduce 32x32->64bit lp_build_mul_32_lohi function | Roland Scheidegger | 2016-11-08 | 3 | -38/+172
  This is used by shader umul_hi/imul_hi functions (and soon by draw). It's actually useful separating this out on its own, however the real reason for doing it is because we're using an optimized sse2 version, since the code llvm generates is atrocious (since there's no widening mul in llvm, and it does not recognize the widening mul pattern, so it generates code for real 64x64->64bit mul, which the cpu can't do natively, in contrast to 32x32->64bit mul which it could do).
  Reviewed-by: Jose Fonseca <[email protected]>
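What a 32x32->64bit "lohi" multiply computes can be written in scalar C as follows. This is only a reference sketch of the arithmetic with hypothetical function names; it says nothing about how the sse2 path is implemented.

```c
#include <stdint.h>

/* Unsigned widening multiply: returns the low and high 32 bits of a * b. */
static void example_umul_32_lohi(uint32_t a, uint32_t b,
                                 uint32_t *lo, uint32_t *hi)
{
   uint64_t wide = (uint64_t) a * (uint64_t) b; /* uses BOTH operands */
   *lo = (uint32_t) wide;
   *hi = (uint32_t) (wide >> 32);
}

/* Signed widening multiply. The low half is the same bit pattern as in the
 * unsigned case; only the high half differs. */
static void example_imul_32_lohi(int32_t a, int32_t b,
                                 uint32_t *lo, int32_t *hi)
{
   int64_t wide = (int64_t) a * (int64_t) b;
   *lo = (uint32_t) wide;
   /* Right-shifting a negative value is implementation-defined in C;
    * this sketch assumes the usual arithmetic shift. */
   *hi = (int32_t) (wide >> 32);
}
```

Testing with two different operands, as in `example_umul_32_lohi(0xFFFFFFFF, 2, &lo, &hi)`, would expose the "compute a square" mistake mentioned in the follow-up fix, since squaring either operand gives a different result.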
* gallivm: try to fix build with LLVM <= 3.4 due to missing CallSite.h | Marek Olšák | 2016-10-20 | 1 | -1/+5
  Reviewed-by: Brian Paul <[email protected]>
  Tested-by: Brian Paul <[email protected]>
* gallivm: add wrappers for missing functions in LLVM <= 3.8 | Marek Olšák | 2016-10-20 | 2 | -0/+27
  radeonsi needs these.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* draw: improve vertex fetch (v2) | Roland Scheidegger | 2016-10-19 | 2 | -0/+30
  The per-element fetch has quite some calculations which are constant, these can be moved outside both the per-element as well as the main shader loop (llvm can figure out it's constant mostly on its own, however this can have a significant compile time cost).
  Similarly, it looks easier swapping the fetch loops (outer loop per attrib, inner loop filling up the per vertex elements - this way the aos->soa conversion also can be done per attrib and not just at the end though again this doesn't really make much of a difference in the generated code). (This would also make it possible to vectorize the calculations leading to the fetches.)
  There's also some minimal change simplifying the overflow math slightly.
  All in all, the generated code seems to look slightly simpler (depending on the actual vs), but more importantly I've seen a significant reduction in compile times for some vs (albeit with old (3.3) llvm version, and the time reduction is only really for the optimizations run on the IR).
  v2: adapt to other draw change.
  No changes with piglit.
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: print out time for jitting functions with GALLIVM_DEBUG=perf | Roland Scheidegger | 2016-10-19 | 1 | -0/+11
  Compilation to actual machine code can easily take as much time as the optimization passes on the IR if not more, so print this out too.
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Use native packs and unpacks for the lerps | Roland Scheidegger | 2016-10-19 | 3 | -13/+156
  For the texturing packs, things looked pretty terrible. For every lerp, we were repacking the values, and while those look sort of cheap with 128bit, with 256bit we end up with 2 of them instead of just 1 but worse, plus 2 extracts too (the unpack, however, works fine with a single instruction, albeit only with llvm 3.8 - the vpmovzxbw).
  Ideally we'd use more clever pack for llvmpipe backend conversion too since we actually use the "wrong" shuffle (which is more work) when doing the fs twiddle just so we end up with the wrong order for being able to do native pack when converting from 2x8f -> 1x16b. But this requires some refactoring, since the untwiddle is separate from conversion.
  This is only used for avx2 256bit pack/unpack for now.
  Improves openarena scores by 8% or so, though overall it's still pretty disappointing how much faster 256bit vectors are even with avx2 (or rather, aren't...). And, of course, eliminating the needless packs/unpacks in the first place would eliminate most of that advantage (not quite all) from this patch.
  Reviewed-by: Jose Fonseca <[email protected]>
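Why a lerp on 8-bit channels involves an unpack and a pack at all can be seen in a scalar model: the product of a channel and a weight does not fit in 8 bits, so the channels are widened first and narrowed again afterwards. The sketch below is illustrative only; `example_lerp_4x8` is a hypothetical name, and the weight range of 0..256 and the truncating shift are assumptions, not the formula gallivm uses.

```c
#include <stdint.h>

/* Lerp two packed 4x8-bit unorm pixels.
 * weight is assumed to be in [0, 256], where 256 selects b entirely. */
static uint32_t example_lerp_4x8(uint32_t a, uint32_t b, unsigned weight)
{
   uint32_t result = 0;
   unsigned i;

   for (i = 0; i < 4; i++) {
      /* Unpack: widen each 8-bit channel so the products fit. */
      uint32_t va = (a >> (8 * i)) & 0xffu;
      uint32_t vb = (b >> (8 * i)) & 0xffu;

      /* Lerp in the wider type; the sum is at most 255 * 256. */
      uint32_t v = (va * (256u - weight) + vb * weight) >> 8;

      /* Pack: narrow back to 8 bits and reinsert the channel. */
      result |= (v & 0xffu) << (8 * i);
   }
   return result;
}
```

When several lerps are chained, as in bilinear filtering, keeping intermediate values in the widened form avoids repeating the pack and unpack steps between them.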
* gallivm: Use AVX2 gather instrinsics. | Jose Fonseca | 2016-10-04 | 1 | -0/+95
  v2: Use AVX2 gather for non aligned loads too.
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Use 8 wide AoS sampling on AVX2. | Roland Scheidegger | 2016-10-04 | 1 | -5/+6
  v2: Make sure that with num_lods > 1 and min_filter != mag_filter we still enter the splitting path. So this case would still use 4-wide aos path (as a side note, the 4-wide aos sampling path could actually be improved quite a bit if we have avx2, by just doing the filtering with 256bit vectors).
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Basic AVX2 support. | José Fonseca | 2016-10-04 | 4 | -28/+98
  v2: pblendb -> pblendvb
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: support negation on 64-bit integers | Nicolai Hähnle | 2016-09-21 | 1 | -0/+4
  This should be analogous to 32-bit integers.
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Signed-off-by: Nicolai Hähnle <[email protected]>
* gallivm/llvmpipe: prepare support for ARB_gpu_shader_int64. | Dave Airlie | 2016-09-21 | 4 | -4/+498
  This enables 64-bit integer support in gallivm and llvmpipe.
  v2: add conversion opcodes.
  v3:
  - PIPE_CAP_INT64 is not there yet
  - restrict DIV/MOD defaults to the CPU, as for 32 bits
  - TGSI_OPCODE_I2U64 becomes TGSI_OPCODE_U2I64
  Reviewed-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
  Signed-off-by: Nicolai Hähnle <[email protected]>
* gallivm: add lp_build_alloca_undef | Nicolai Hähnle | 2016-08-17 | 2 | -0/+24
  Reviewed-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallivm: add create_builder_at_entry helper function | Nicolai Hähnle | 2016-08-17 | 1 | -23/+22
  Reduces code duplication.
  Reviewed-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* util: Move format_rgb9e5.h to src/util | Jason Ekstrand | 2016-08-05 | 1 | -1/+1
  It's used from both mesa main and gallium.
  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add helper lp_add_attr_dereferenceable | Marek Olšák | 2016-07-13 | 2 | -0/+14
  Not sure if this is the right way to do it, but it seems to work.
  v2: make it a no-op on LLVM <= 3.5
  Reviewed-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: set LLVMNoUnwindAttribute on all intrinsics | Marek Olšák | 2016-07-11 | 1 | -2/+4
  RadeonSI stats: Mostly 0% difference, but Valley shows a small improvement:
  Application     Files  SGPRs   VGPRs    SpillSGPR  SpillVGPR  Code Size  LDS     Max Waves  Waits
  unigine_valley  278    0.00 %  -0.29 %  0.00 %     0.00 %     0.01 %     0.00 %  0.17 %     0.00 %
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: don't use integer min/max sse intrinsics with llvm >= 3.9 | Roland Scheidegger | 2016-06-20 | 1 | -2/+4
  Apparently, these are deprecated. There's some AutoUpgrade feature which is supposed to promote these to cmp/select, which apparently doesn't work with jit code. It is possible it's not actually even meant to work (see the bug filed against llvm which couldn't provide an answer either) but in any case this is meant to be only temporary unless the intrinsics are really illegal.
  So, just use the fallback code (which should be cmp/select, we're actually doing cmp/sext/trunc/select, but in any case llvm 3.9 manages to optimize this back to pmin/pmax in the end).
  This addresses https://llvm.org/bugs/show_bug.cgi?id=28176
  CC: <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
  Tested-by: Vinson Lee <[email protected]>
  Tested-by: Aaron Watry <[email protected]>
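The cmp/select form of an integer minimum is simple enough to show per element. This is a scalar sketch of the generic pattern with a hypothetical function name, not the IR that gallivm emits.

```c
#include <stdint.h>

/* Per-element signed minimum of two 4 x int32 vectors as compare + select. */
static void example_min_4x32(const int32_t *a, const int32_t *b,
                             int32_t *out)
{
   unsigned i;
   for (i = 0; i < 4; i++) {
      int a_is_less = a[i] < b[i];       /* cmp: one boolean per element */
      out[i] = a_is_less ? a[i] : b[i];  /* select: pick by that boolean */
   }
}
```

A compiler that recognizes this pattern can map it back to a single packed-minimum instruction where the target has one, which is the outcome the commit message describes for llvm 3.9.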
* gallivm: Fix trivial sign warnings | Jan Vesely | 2016-06-13 | 8 | -21/+22
  v2: include whitespace fixes
  Signed-off-by: Jan Vesely <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: more 64-bit integer prep work. | Dave Airlie | 2016-06-11 | 1 | -8/+8
  This converts one other place to using the new helper.
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* gallivm: make non-float return code bitcast consistent. | Dave Airlie | 2016-06-11 | 1 | -12/+6
  This just uses the same form across the fetches.
  Reviewed-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>