aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/auxiliary
Commit message (Collapse)AuthorAgeFilesLines
* gallivm: generalize 4x4f->1x16ub special case conversionRoland Scheidegger2017-01-061-56/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | This special packing path can be easily extended to handle not just float->unorm8 but also float->snorm8 and uint32->uint8 and int32->int8 (i.e. all interesting cases for llvmpipe fs backend code). The packing parts all stay the same (only the last step packing will be signed->signed instead of signed->unsigned but luckily even sse2 can do both). While here also note some bugs with that (we keep the bugs identical to what we did before on x86, albeit other archs may differ). In particular float->unorm8 too large values will still get clamped to 0, not 255, and for float->snorm8 NaNs will end up as -1, not 0 (but we do the clamp against 1.0 there to prevent too large values ending up as -1.0 - this is inconsistent to unorm8 handling but is what we ended up before, I'm not sure we can get away without it). This is quite fishy in any case as we depend on arch-dependent behavior of the iround (my understanding is in fact with altivec the conversion would actually saturate although I've no idea about NaNs, so probably wouldn't need to do anything for snorm). (There are only minimal piglit tests for unorm clamping behavior AFAICT, in particular nothing seems to test values which are too large to be handled by the float->int conversion.) For uint32->uint8 we also do a min against MAX_INT, since the source for the packs is always signed (again, on x86 - should probably be able to express these arch-dependent bits better some day). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: (trivial) fix typo bug with small AoS format unpackingRoland Scheidegger2017-01-061-1/+1
| | | | | | | Fix typo using wrong (uninitialized) build context introduced by 4634cb5921b985f04f2daf00cda2d28036143bd3. (This only affects very rare small packed formats which have a PIPE_SWIZZLE_0 channel, such as r4a4, which is never used by mesa/st. Nevertheless it broke lp_test_format.)
* gallivm: implement aos unpack (to unorm8) for small unorm formatsRoland Scheidegger2017-01-051-12/+152
| | | | | | | | | | | | | | | | | | | Using bit replication. This path now resembles something which might make sense. (The logic was mostly copied from llvmpipe fs backend.) I am not convinced though it is actually faster than SoA sampling (actually I'm quite certain it's always a loss with AVX). With SoA it's just shift/mask/cvt/mul for getting the colors, whereas there's still roughly 3 shifts, 3 or/and per channel for AoS (i.e. for SoA it's exactly the same as it would be for a rgba8 format, whereas the extra effort for AoS is significant). The filtering might still be faster (albeit with FMA the instruction count gets down quite a bit there on the SoA float filtering path on new cpus). And those small unorm formats often don't have an alpha channel (which makes things worse relatively for AoS path). (This also fixes a trivial bug in the llvmpipe fs code this was derived from, albeit it was only relevant for 4-bit channels.) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: optimize lp_build_unpack_arith_rgba_aos slightlyRoland Scheidegger2017-01-051-19/+97
| | | | | | | | | | | | | | | | | This code uses a vector shift which has to be emulated on x86 unless there's AVX2. Luckily in some cases we can actually avoid the shift altogether, so do that. Also make sure we hit the fast lp_build_conv() path when applicable, albeit that's quite the hack... That said, this path is taken for AoS sampling for small unorm (smaller than rgba8) formats, and it is completely hopeless even with those changes, with or without AVX. (Probably should have some code similar to the one in the llvmpipe fs backend code, using bit replication to extend to rgba8888 - rounding is not quite 100% accurate but if it's good enough there it should be here as well.) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: use 2 srcs for 32->16bit conversions in lp_bld_conv_autoRoland Scheidegger2017-01-051-2/+19
| | | | | | | | | | | | | | | | | If we only feed one source vector at a time, we cannot use pack intrinsics (as we only have a 64bit destination dst vector). lp_bld_conv_auto is specifically designed to alter the length and number of destination vectors, so this works just fine (if we use single source vectors at a time, afterwards we immediately reassemble the vectors). For AVX though this isn't really possible, since we expect 128bit output already for a single 256bit input. (One day we should handle AVX2 which again would need multiple inputs, however there's the problem that we get different ordered output there and we don't want to reorder, so would need to be able to tell build_conv to handle upper and lower halfs independently.) A similar strategy would probably work for 32->8bit too (if it doesn't hit the special case) but I'm going to try something different for that... Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: (trivial) minimally simplify mask constructionRoland Scheidegger2017-01-051-0/+2
| | | | | | | | | | | | | | | | simd instruction sets usually have comparisons for equal, not unequal. So use a different comparison against the mask itself - which also means we don't need a all-zero as well as a all-one (for the pxor) reg. Also add code to avoid scalar expansion of i1 values which we definitely shouldn't do. There's problems with this though with llvm select interaction, so it's disabled (basically using llvm select instead of intrinsics may still produce atrocious code, even in cases where we figured it should not, albeit I think this could probably be fixed with some better selection of optimization passes, but I have zero idea there really). Reviewed-by: Jose Fonseca <[email protected]>
* gallium/hud: increase the vertex buffer size for textMarek Olšák2017-01-051-1/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: add an option to sort items below graphsMarek Olšák2017-01-052-5/+33
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: add an option to reset the color counterMarek Olšák2017-01-052-3/+19
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: allow more data sources per paneMarek Olšák2017-01-051-13/+15
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: add an option to rename each data sourceMarek Olšák2017-01-051-2/+35
| | | | | | | | useful for radeonsi performance counters v2: allow specifying both : and = Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: remove TGSI_OPCODE_SUBMarek Olšák2017-01-0517-94/+61
| | | | | | It's redundant with the source modifier. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: remove TGSI_OPCODE_ABSMarek Olšák2017-01-058-25/+4
| | | | | | It's redundant with the source modifier. Reviewed-by: Nicolai Hähnle <[email protected]>
* swr: fix windows build breakGeorge Kyriazis2017-01-051-0/+7
| | | | | | | | | | wrap lp_bld_type.h around extern "C". Windows decorates global variables, so when used from .cpp files, need to use an undecorated version. Also, removed related and unneeded code from swr_screen.cpp Reviewed-by: Ilia Mirkin <[email protected]>
* gallium/hud: add a path separator between dump directory and filenameEdmondo Tommasina2017-01-031-1/+2
| | | | | | | It's more user friendly and it avoids to write files in unexpected places. Signed-off-by: Marek Olšák <[email protected]>
* vl/zscan: fix "Fix trivial sign compare warnings"Christian König2017-01-031-1/+1
| | | | | | | | | | | The variable actually needs to be signed, otherwise converting it to a float doesn't work as expected. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98914 Signed-off-by: Christian König <[email protected]> Reviewed-by: Nayan Deshmukh <[email protected]> Cc: "13.0" <[email protected]> Fixes: 1fb4179f927 ("vl: Fix trivial sign compare warnings")
* vl/compositor: implement error handlingNayan Deshmukh2017-01-032-3/+12
| | | | | | | pipe_buffer_map and pipe_buffer_create may return NULL Signed-off-by: Nayan Deshmukh <[email protected]> Reviewed-by: Christian König <[email protected]>
* gallium/hud: fix the windows build by disabling file dumpingMarek Olšák2017-01-021-0/+2
|
* gallium/hud: set filedescriptor for fps graphEdmondo Tommasina2017-01-011-0/+2
| | | | Signed-off-by: Marek Olšák <[email protected]>
* gallium/hud: set filedescriptor for cpu graphEdmondo Tommasina2017-01-011-0/+2
| | | | Signed-off-by: Marek Olšák <[email protected]>
* gallium/hud: move file initialization to a functionEdmondo Tommasina2017-01-013-11/+20
| | | | | | | The function will be used later to create the filedescriptor for other metrics. Signed-off-by: Marek Olšák <[email protected]>
* gallium/hud: dump hud_driver_query values to filesEdmondo Tommasina2017-01-013-0/+19
| | | | | | | | | | | | | | Dump values for every selected data source in GALLIUM_HUD. Every data source has its own file and the filename is equal to the data source identifier. Set GALLIUM_HUD_DUMP_DIR to dump values to files in this directory. No values are dumped if the environment variable is not set, the directory doesn't exist or the user doesn't have write access. Signed-off-by: Marek Olšák <[email protected]>
* ttn: set ->info->num_ubosRob Clark2016-12-271-1/+4
| | | | | | | | | For dealing w/ 32b vs 64b gpu addresses, I need to rework how we pass UBO buffer addresses to shader, and knowing up front the # of UBOs is useful. But I noticed ttn wasn't setting this. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* ttn: handle GLSL_SAMPLER_DIM_SUBPASS_MS caseJuan A. Suarez Romero2016-12-211-0/+1
| | | | | | Fixes a warning. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* draw: use SoA fetch, not AoS oneRoland Scheidegger2016-12-211-48/+23
| | | | | | | | | | | | | | | | | | | Now that there's some SoA fetch which never falls back, we should always get results which are better or at least not worse (something like rgba32f will stay the same). For cases which get way better, think something like R16_UNORM with 8-wide vectors: this was 8 sign-extend fetches, 8 cvt, 8 muls, followed by a couple of shuffles to stitch things together (if it is smart enough, 6 unpacks) and then a (8-wide) transpose (not sure if llvm could even optimize the shuffles + transpose, since the 16bit values were actually sign-extended to 128bit before being cast to a float vec, so that would be another 8 unpacks). Now that is just 8 fetches (directly inserted into vector, albeit there's one 128bit insert needed), 1 cvt, 1 mul. v2: ditch the old AoS code instead of just disabling it. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: generalize the compressed format soa fetch a bitRoland Scheidegger2016-12-211-37/+49
| | | | | | | | | This can now handle rgtc (unorm) too - this path no longer handles plain formats, but that's unnecessary they now all have their proper SoA unpack (this will still be dog-slow though due to the actual fetch being per-pixel util fallbacks). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: provide soa fetch path handling formats with more than 32bitRoland Scheidegger2016-12-211-154/+375
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This previously always fell back to AoS conversion. Even for 4-float formats (which is the optimal case by far for that fallback case) this was suboptimal, since it meant the conversion couldn't be done with 256bit vectors. While this may still only be partly possible for some formats, (unless there's AVX2 support) at least the transpose can be done with half the unpacks (and before using the transpose for AoS fallbacks, it was worse still). With less than 4 channels, things got way worse with the AoS fallback quickly even with 128bit vectors. The strategy is pretty much the same as the existing one for formats which fit into 32 bits, except there's now multiple vectors to be fetched (2 or 4 to be exact), which need to be shuffled first (if it's 4 vectors, this amounts to a transpose, for 2 it's a bit different), then the unpack is done the same (with the exception that the shift of the channels is now modulo 32, and we need to select the right vector). In fact the most complex part about it is to get the shuffles right for separating into lo/hi parts for AVX/AVX2... This also makes use of the new ability of gather to use provided type information, which we abuse to outsmart llvm so we get decent shuffles, and to fetch 3x32bit vectors without having to ZExt the scalar. And just because we can, we handle double formats too, albeit they are a bit different (draw sometimes needs to handle that). v2: fix typo float/int bug (generating inefficient code). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: optimize gather a bit, by using supplied destination typeRoland Scheidegger2016-12-218-79/+333
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By using a dst_type in the the gather interface, gather has some more knowledge about how values should be fetched. E.g. if this is a 3x32bit fetch and dst_type is 4x32bit vector gather will no longer do a ZExt with a 96bit scalar value to 128bit, but just fetch the 96bit as 3x32bit vector (this is still going to be 2 loads of course, but the loads can be done directly to simd vector that way). Also, we can now do some try to use the right int/float type. This should make no difference really since there's typically no domain transition penalties for such simd loads, however it actually makes a difference since llvm will use different shuffle lowering afterwards so the caller can use this to trick llvm into using sane shuffle afterwards (and yes llvm is really stupid there - nothing against using the shuffle instruction from the correct domain, but not at the cost of doing 3 times more shuffles, the case which actually matters is refusal to use shufps for integer values). Also do some attempt to avoid things which look great on paper but llvm doesn't really handle (e.g. fetching 3-element 8 bit and 16 bit vectors which is simply disastrous - I suspect type legalizer is to blame trying to extend these vectors to 128bit types somehow, so fetching these with scalars like before which is suboptimal due to the ZExt). Remove the ability for truncation (no point, this is gather, not conversion) as it is complex enough already. While here also implement not just the float, but also the 64bit avx2 gathers (disabled though since based on the theoretical numbers the benefit just isn't there at all until Skylake at least). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: optimize SoA AoS fallback fetch path a littleRoland Scheidegger2016-12-211-22/+46
| | | | | | | | | | We should do transpose, not extract/insert, at least with "sufficient" amount of channels (for 4 channels, extract/insert shuffles generated otherwise look truly terrifying). Albeit we shouldn't fallback to that so often in any case. v2: ditch the extract/insert path, not worth keeping (we're going to avoid hitting the fallback that often with future patches). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: (trivial) handle non-aligned fetch for lp_build_fetch_rgba_soaRoland Scheidegger2016-12-213-8/+12
| | | | | | | | | | soa fetch so far always assumed that data was aligned. However, we want to use this for vertex fetch, and data might not be aligned there, so handle it in this path too (basically just pass through alignment through to other functions). (It looks like it wouldn't work for for cached s3tc but this is no different than with AoS fetch.) Reviewed-by: Jose Fonseca <[email protected]>
* st/nine: Implement gallium nine CSMTPatrick Rudolph2016-12-201-0/+11
| | | | | | | | Use an offloading thread for all nine_context functions. Macros are used to ease the reading of the code. Signed-off-by: Patrick Rudolph <[email protected]> Signed-off-by: Axel Davy <[email protected]>
* Revert "cso: don't release sampler states that are bound"Michel Dänzer2016-12-191-3/+1
| | | | | | | This reverts commit 6dc96de303290e8d1fc294da478c4f370be98dea. No longer necessary with the previous change. Reviewed-by: Nicolai Hähnle <[email protected]>
* cso: Make sanitize_hash safe for samplersMichel Dänzer2016-12-191-1/+43
| | | | | | | | Remove currently bound sampler states from the hash table before pruning entries from the hash table, so they cannot accidentally be deleted by the pruning. Reviewed-by: Nicolai Hähnle <[email protected]>
* cso: Store hash key in struct cso_samplerMichel Dänzer2016-12-192-0/+2
| | | | | | Preparation for following changes, no functional change intended. Reviewed-by: Nicolai Hähnle <[email protected]>
* cso: Optimize cso_save/restore_fragment_samplersMichel Dänzer2016-12-191-4/+17
| | | | | | | | | Only copy/memset the pointers that actually need to be. v2: * Cast info->nr_samplers to int for calculating delta (Nicolai) Reviewed-by: Nicolai Hähnle <[email protected]>
* cso: Store pointers to struct cso_sampler in struct sampler_infoMichel Dänzer2016-12-191-15/+18
| | | | | | Preparation for following changes, no functional change intended. Reviewed-by: Nicolai Hähnle <[email protected]>
* cso: Don't restore nr_samplers in cso_restore_fragment_samplersMichel Dänzer2016-12-191-1/+0
| | | | | | | | | | If info->nr_samplers > ctx->nr_fragment_samplers_saved, the assignment would prevent cso_single_sampler_done from unbinding the no longer used samplers from the driver, which could result in use-after-free. This is probably unlikely to happen in practice though. Cc: "12.0 13.0" <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* treewide: s/comparitor/comparator/Ilia Mirkin2016-12-121-1/+1
| | | | | | | | | | git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g' Just happened to notice this in a patch that was sent and included one of the tokens in question. Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* st/glsl_to_tgsi: plumb the GS output stream qualifier through to TGSINicolai Hähnle2016-12-122-1/+21
| | | | | | Allow drivers to emit GS outputs in a smarter way. Reviewed-by: Marek Olšák <[email protected]>
* tgsi/scan: collect information about output usagemasksNicolai Hähnle2016-12-122-0/+2
| | | | Reviewed-by: Marek Olšák <[email protected]>
* tgsi/scan: collect information about output vertex streamsNicolai Hähnle2016-12-122-0/+19
| | | | Reviewed-by: Marek Olšák <[email protected]>
* tgsi: add Stream{X,Y,Z,W} fields to tgsi_declaration_semanticNicolai Hähnle2016-12-123-2/+77
| | | | | | | | | | | This is for geometry shader outputs. Without it, drivers have no way of knowing which stream each output is intended for, and have to conservatively write all outputs to all streams. Separate stream numbers for each component are required due to output packing. Reviewed-by: Marek Olšák <[email protected]>
* tgsi: fix the src type of TGSI_OPCODE_MEMBARMarek Olšák2016-12-071-0/+1
| | | | | | | It's a literal integer. The next commit will need this. Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* cso: don't release sampler states that are boundMarek Olšák2016-12-071-1/+3
| | | | | | | | | | | | | | This fixes random radeonsi GPU hangs in Batman Arkham: Origins (Wine) and probably many other games too. cso_cache deletes sampler states when the cache size is too big and doesn't check which sampler states are bound, causing use-after-free in drivers. Because of that, radeonsi uploaded garbage sampler states and the hardware went bananas. Other drivers may have experienced similar issues. Cc: 12.0 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* gallivm: optimize 16bit->32bit gather path a bitRoland Scheidegger2016-12-061-3/+39
| | | | | | | | | | LLVM can't really optimize anything which crosses scalar/vector boundaries, so help a bit with some particular gather operations when the width is expanded (only do it for 16->32bit expansion for now), by doing expansion after fetch. That is probably a better solution anyway even if llvm would recognize it, makes for cleaner IR... Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: handle 16bit float fetches in lp_build_fetch_rgba_soaRoland Scheidegger2016-12-061-4/+18
| | | | | | | | | | | | | | | | | | | | Note that we really want to _never_ reach the bottom of the function, which resorts to AoS fetch. Half floats can be handled just like other formats which fit into 32bit vectors (so, only 1x16 and 2x16 formats, albeit with more channels things are not THAT bad), with minimal plumbing. I've seen code size go down nearly by a factor of 3 for a complete texture sampling function (including bilinear filtering) using R16F. (What we should do for everything not special cased is to do AoS gather, shuffle/shift things into SoA vectors, and then do the conversion there. Otherwise it's particularly bad with 1 or 2 channel formats - that r16f format with either 4 or 8-wide vectors was still doing one element at a time, essentially doing exactly the same work as for rgba16f. Also replacing the channels with SWIZZLE0/1 (particularly the latter) adds even more work, as it has to be done per aos vector, and not just straightforward at the end with the SoA vector.) Reviewed-by: Jose Fonseca <[email protected]>
* util: (trivial) ETC1 meets the criteria for fitting into unorm8Roland Scheidegger2016-12-061-0/+5
| | | | | | Just like other similar compressed formats. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: use getHostCPUFeatures on x86/llvm-4.0+.Tim Rowley2016-12-051-0/+15
| | | | | | | Use llvm provided API based on cpuid rather than our own manually mantained list of mattr enabling/disabling. Reviewed-by: Roland Scheidegger <[email protected]>
* configure.ac: Move llvm_set_environment_variables higher.Tobias Droste2016-12-051-1/+1
| | | | | | | | | | | | | | | | This moves the function to get the LLVM environment variables higher in the file. It still needs to be below the "--enable-opencl" because it uses $enable_opencl. It can be called without condition now as it only throws errors if openCL is enabled. v5: HAVE_MESA_LLVM is only used for gallium. Rename it to HAVE_GALLIUM_LLVM. In order to only link LLVM when it is needed, HAVE_GALLIUM_LLVM is only set if "$enable-gallium-llvm" is yes. Signed-off-by: Tobias Droste <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* tgsi: store writes_primid when scanning tgsiTim Rowley2016-12-012-0/+4
| | | | Reviewed-by: Marek Olšák <[email protected]>