summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* anv: do not open random render node(s)Emil Velikov2017-03-152-17/+39
| | | | | | | | | | | | | | | drmGetDevices2() provides us with enough flexibility to build heuristics upon. Opening a random node on the other hand will wake up the device, regardless if it's the one we're interested or not. v2: Rebase, explicitly require/check for libdrm v3: Return VK_ERROR_INCOMPATIBLE_DRIVER for no devices (Ilia) v4: Rebase Cc: Jason Ekstrand <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> (v1) Tested-by: Mike Lothian <[email protected]>
* radv: do not open random render node(s)Emil Velikov2017-03-151-12/+36
| | | | | | | | | | | | | | | | drmGetDevices2() provides us with enough flexibility to build heuristics upon. Opening a random node on the other hand will wake up the device, regardless if it's the one we're interested or not. v2: Rebase. v3: Return VK_ERROR_INCOMPATIBLE_DRIVER for no devices (Ilia) Cc: Michel Dänzer <[email protected]> Cc: Dave Airlie <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> (v1) Reviewed-by: Eric Engestrom <[email protected]> (v1) Tested-by: Mike Lothian <[email protected]>
* radv/winsys: use drmGetDevice2 APIEmil Velikov2017-03-152-2/+3
| | | | | | | | | | | | | Analogous to previous commit v2: Add explicit require_libdrm check. Cc: Dave Airlie <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> (v1) Reviewed-by: Bas Nieuwenhuizen <[email protected]> (v1) Reviewed-by: Eric Engestrom <[email protected]> (v1) Tested-by: Mike Lothian <[email protected]>
* winsys/amdgpu: use drmGetDevice2 APIEmil Velikov2017-03-151-2/+2
| | | | | | | | | | | Analogous to previous commit Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98502 Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Tested-by: Mike Lothian <[email protected]>
* loader: use drmGetDevice[s]2 APIEmil Velikov2017-03-151-3/+3
| | | | | | | | | | | | By this allows us to fetch the device list/info w/o the revision field. At the moment retrieving the latter wakes up the device. Note: kernel patch to resolve that should be in 4.10. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Tested-by: Mike Lothian <[email protected]>
* autoconf/scons: bump libdrm to 2.4.75Emil Velikov2017-03-152-2/+2
| | | | | | | | | | | We'll be using the drmGetDevice[s]2 API in src/loader with next patch. v2: Rebase. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> (v1) Reviewed-by: Eric Engestrom <[email protected]> (v1) Tested-by: Mike Lothian <[email protected]>
* util/sha1: drop _mesa_sha1_{update, format} return typeEmil Velikov2017-03-154-16/+14
| | | | | | | | | Unused/unchecked by any of the callers. v2: Fix the glsl cases that have crept in since v1 Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Grazvydas Ignotas <[email protected]>
* util/sha1: rework _mesa_sha1_{init,final}Emil Velikov2017-03-156-64/+44
| | | | | | | | | | | | Rather than having an extra memory allocation [that we currently do not and act accordingly] just make the API take an pointer to a stack allocated instance. This and follow-up steps will effectively make the _mesa_sha1_foo simple define/inlines around their SHA1 counterparts. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Grazvydas Ignotas <[email protected]>
* util/sha1: add non-typedef name for the SHA1_CTX structEmil Velikov2017-03-152-1/+4
| | | | | | | | | | Using typedef(s) is not always the answer and makes it harder for people to do clever (or one might call nasty) things with the code. Add a struct name which we will use with follow-up commit. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Grazvydas Ignotas <[email protected]>
* radv: Remove unused descriptor set field.Bas Nieuwenhuizen2017-03-151-1/+0
| | | | | | Trivial. Signed-off-by: Bas Nieuwenhuizen <[email protected]>
* r600: refactor binding code for attach buffer to CB.Dave Airlie2017-03-151-33/+78
| | | | | | | | | This refactors out the code and fixes it up to be used for images later. It uses the code in the current RAT binding for compute. Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: refactor out CB setup.Dave Airlie2017-03-151-104/+143
| | | | | | | | | This moves the code to create CB info out into a separate function so it can be reused in images code to create RATs. Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: refactor texture resource words setup code.Dave Airlie2017-03-151-88/+131
| | | | | | | | This refactors out the code to setup a texture resource so we can reuse it later from the images code. Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: factor out the code to initialise a buffer resource.Dave Airlie2017-03-151-29/+51
| | | | | | | | | | This takes the code required to initialise a buffer resource out of the texture buffer code, into it's own function. This is going to be used for the image support later. Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600g: make framebuffer atom rely on dual src blend state.Dave Airlie2017-03-154-2/+7
| | | | | | | | In order to make ARB_shader_image_load_store, we have to share the CB space with RATs, so we should only steal the dual src space if we have dual src enabled. Signed-off-by: Dave Airlie <[email protected]>
* intel/debug: Add a common INTEL_DEBUG=nohiz optionJason Ekstrand2017-03-145-6/+4
| | | | | | | | The GL driver had a driconf option (which doesn't make much sense) and the Vulkan driver had a hand-rolled environment variable. Instead, let's tie both into the INTEL_DEBUG mechanism and unify things. Reviewed-by: Topi Pohjolainen <[email protected]>
* anv/image: Move handling of INTEL_VK_HIZJason Ekstrand2017-03-141-2/+2
| | | | | | | This makes it so that you don't get an "Implement gen7 HiZ" perf warning when you manually disable HiZ on gen8. Reviewed-by: Topi Pohjolainen <[email protected]>
* radv: trivial tidy upsTimothy Arceri2017-03-152-5/+2
| | | | Reviewed-by: Edward O'Callaghan <[email protected]>
* util/disk_cache: scale cache according to filesystem sizeAlan Swanson2017-03-151-3/+8
| | | | | | | | Select higher of current 1G default or 10% of filesystem where cache is located. Acked-by: Timothy Arceri <[email protected]> Reviewed-by: Grazvydas Ignotas <[email protected]>
* util/disk_cache: actually enforce cache sizeAlan Swanson2017-03-152-4/+24
| | | | | | | | | | Currently only a one in one out eviction so if at max_size and cache files were to constantly increase in size then so would the cache. Restrict to limit of 8 evictions per new cache entry. V2: (Timothy Arceri) fix make check tests Reviewed-by: Grazvydas Ignotas <[email protected]>
* util/disk_cache: use LRU eviction rather than random evictionAlan Swanson2017-03-151-43/+34
| | | | | | | | | | | Still using fast random selection of two-character subdirectory in which to check cache files rather than scanning entire cache. v2: Factor out double strlen call v3: C99 declaration of variables where used Reviewed-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* util/disk_cache: don't fallback to an empty cache dir on evictTimothy Arceri2017-03-151-6/+27
| | | | | | | | | | If we fail to randomly select a two letter cache dir, don't select an empty dir on fallback. In real world use we should never hit the fallback path but it can be hit by tests when the cache is set to a very small max value. Reviewed-by: Grazvydas Ignotas <[email protected]>
* util/disk_cache: use a thread queue to write to shader cacheTimothy Arceri2017-03-152-13/+75
| | | | | | | | | | | | | | This should help reduce any overhead added by the shader cache when programs are not found in the cache. To avoid creating any special function just for the sake of the tests we add a one second delay whenever we call dick_cache_put() to give it time to finish. V2: poll for file when waiting for thread in test V3: fix poll delay to really be 100ms, and simplify the wait function Reviewed-by: Grazvydas Ignotas <[email protected]>
* util/disk_cache: add helpers for creating/destroying disk cache put jobsTimothy Arceri2017-03-151-0/+40
| | | | | | | V2: Make a copy of the data so we don't have to worry about it being freed before we are done compressing/writing. Reviewed-by: Grazvydas Ignotas <[email protected]>
* util/disk_cache: add thread queue to disk cacheTimothy Arceri2017-03-151-1/+15
| | | | | Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Grazvydas Ignotas <[email protected]>
* radv/ac: workaround regression in llvm 4.0 releaseDave Airlie2017-03-151-1/+12
| | | | | | | | | | | | | | | LLVM 4.0 released with a pretty messy regression, that hopefully get fixed in the future. This work around was proposed by Tom, and it fixes the CTS regressions here at least, I'm not sure if this will cause any major side effects, but correctness over speed and all that. radeonsi should possibly consider the same workaround until an llvm fix can be found. Acked-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv/ac: gather4 cube workaround integerDave Airlie2017-03-151-1/+71
| | | | | | | | | | | | | | | | | | | | This fix is extracted from amdgpu-pro shader traces. It appears the gather4 workaround for integer types doesn't work for cubes, so instead if forces a float scaled sample, then converts to integer. It modifies the descriptor before calling the gather. This also produces some ugly asm code for reasons specified in the patch, llvm could probably do better than dumping sgprs to vgprs. This fixes: dEQP-VK.glsl.texture_gather.basic.cube.rgba8* Acked-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: Set driver version to mesa version;Bas Nieuwenhuizen2017-03-151-1/+23
| | | | | | | | | | | | | I couldn't really find an encoding in the spec. I'm not sure it prescribes VK_MAKE_VERSION format, but vulkan.gpuinfo.org interprets it that way by default. vulkaninfo gives the raw number, so we could alternatively do something like 17001000, but that doesn't show up right on vulkan.gpuinfo.org again. Looking at that site, the -pro driver also uses VK_MAKE_VERSION, so keeping consistency is probably best. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Dave Airlie <[email protected]>
* radv: Increase api version to 1.0.42.Bas Nieuwenhuizen2017-03-151-1/+1
| | | | | | | | | I've skimmed to changes from 1.0.5 to 1.0.42 and I think we have all changes. We're still not conformant ofcourse, but this should not regress stuff, Signed-off-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Dave Airlie <[email protected]>
* util/vk: Add helpers for finding an extension structJason Ekstrand2017-03-151-0/+17
| | | | Reviewed-by: Dave Airlie <[email protected]>
* radv: Flush before copying with PKT3_WRITE_DATA in CmdUpdateBufferAlex Smith2017-03-141-0/+2
| | | | | | | | | | | | | | | Need to flush before updating the buffer to ensure that the copy is ordered after previous accesses (assuming the app has performed the appropriate barriers). This fixes potential issues due to draws prior to an update reading the new buffer content, despite having the necessary barriers between them. Signed-off-by: Alex Smith <[email protected]> Cc: 17.0 <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: Emit cache flushes before CP DMA.Bas Nieuwenhuizen2017-03-141-0/+3
| | | | | | | | The flushes could be due to TRANSFER barriers. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Cc: 17.0 <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* Convert sed(1) syntax to be compatible with FreeBSD and OpenBSDJan Beich2017-03-141-10/+10
| | | | | | | | | | | | BSD regex library doesn't support extended RE escapes (e.g. \+) and shorthand character classes (e.g. \s, \S) and SVR4-style word delimiters[1] (on DragonFly and NetBSD). Both GNU and BSD sed support -E and -r to enable extended RE but OS X still lacks -r. [1] https://www.illumos.org/issues/516 Reviewed-by: Eric Engestrom <[email protected]> Tested-by: Eric Engestrom <[email protected]> (GNU sed)
* anv: Properly enumerate physical devices when none are presentJason Ekstrand2017-03-141-2/+5
|
* nir/constant_expressions: Refactor helper functionsJason Ekstrand2017-03-141-24/+27
| | | | | | | | Apart from avoiding some unneeded size cases, this shouldn't have any actual functional impact. Reviewed-by: Dylan Baker <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* nir: Rework conversion opcodesJason Ekstrand2017-03-1422-308/+218
| | | | | | | | | | | | | | | | | | | | | | | | The NIR story on conversion opcodes is a mess. We've had way too many of them, naming is inconsistent, and which ones have explicit sizes was sort-of random. This commit re-organizes things and makes them all consistent: - All non-bool conversion opcodes now have the explicit size in the destination and are named <src_type>2<dst_type><size>. - Integer <-> integer conversion opcodes now only come in i2i and u2u forms (i2u and u2i have been removed) since the only difference between the different integer conversions is whether or not they sign-extend when up-converting. - Boolean conversion opcodes all have the explicit size on the bool and are named <src_type>2<dst_type>. Making things consistent also allows nir_type_conversion_op to be moved to nir_opcodes.c and auto-generated using mako. This will make adding int8, int16, and float16 versions much easier when the time comes. Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Re-arrange conversion operationsJason Ekstrand2017-03-141-36/+31
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/vec4: Get rid of the type parameter from to/from_doubleJason Ekstrand2017-03-142-24/+15
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* glsl/nir: Use nir_type_conversion_opJason Ekstrand2017-03-141-37/+32
| | | | | | Using the helper is way better than hand-coding the universe. Reviewed-by: Eric Anholt <[email protected]>
* nir: Rewrite nir_type_conversion_opJason Ekstrand2017-03-141-63/+92
| | | | | | | | | The original version was very convoluted and tried way too hard to not just have the nested switch statement that it needs. Let's just write the obvious code and then we know it's correct. This fixes a bunch of missing cases particularly with int64. Reviewed-by: Plamena Manolova <[email protected]>
* nir: Add a get_nir_type_for_glsl_base_type helperJason Ekstrand2017-03-141-2/+8
| | | | Reviewed-by: Eric Anholt <[email protected]>
* nir/validate: Rework ALU bit-size rule validationJason Ekstrand2017-03-141-32/+33
| | | | | | | | | | | The original bit-size validation wasn't capable of properly dealing with instructions with variable bit sizes. An attempt was made to handle it by looking at source and destinations but, because the validation was done in validate_alu_(src|dest), it didn't really have the needed information. The new validation code is much more straightforward and should be more correct. Reviewed-by: Eric Anholt <[email protected]>
* nir/validate: Validate that bit sizes and components always matchJason Ekstrand2017-03-141-38/+63
| | | | | | | | | | | | | We've always required bit sizes to match but the rules for number of components have been a bit loose. You've never been allowed to source from something with less components than you consume, but more has always been fine. This changes the validator to require that they match exactly. The fact that they don't always match has been a source of confusion in NIR for quite some time and it's time we got rid of it. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: Make image_size a variable-width intrinsicJason Ekstrand2017-03-143-11/+16
| | | | | | Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* i965/fs: Use num_components from the SSA def in image intrinsicsJason Ekstrand2017-03-141-2/+1
| | | | | | Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir/lower_tex: Use tex_instr_dest_size for txs destinationsJason Ekstrand2017-03-141-1/+2
| | | | | | | | | Using coord_components of the source texture is correct for everything except cube maps where it's off by one. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir/spirv: Restrict the number of channels in texture coordinatesJason Ekstrand2017-03-141-1/+2
| | | | | | | | | Some SPIR-V texturing instructions pack more than the texture coordinate into the coordinate source. We need to mask off the unused channels. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir/copy_prop: Respect the source's number of componentsJason Ekstrand2017-03-141-33/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the near future we are going to require that the num_components in a src dereference match the num_components of the SSA value being dereferenced. To do that, we need copy_prop to not remove our MOVs from a larger SSA value into an instruction that uses fewer channels. Because we suddenly have to know how many components each source has, this makes the pass a bit more complicated. Fortunately, copy propagation is the only pass that cares about the number of components are read by any given source so it's fairly contained. Shader-db results on Sky Lake: total instructions in shared programs: 13318947 -> 13320265 (0.01%) instructions in affected programs: 260633 -> 261951 (0.51%) helped: 324 HURT: 1027 Looking through the hurt programs, about a dozen are hurt by 3 instructions and the rest are all hurt by 2 instructions. From a spot-check of the shaders, the story is always the same: They get a vec4 from somewhere (frequently an input) and use the first two or three components as a texture coordinate. Because of the vector component mismatch, we have a mov or, more likely, a vecN sitting between the texture instruction and the input. This means that the back-end inserts a bunch of MOVs and split_virtual_grfs() goes to town. Because the texture coordinate is also used by some other calculation, register coalesce can't combine them back together and we end up with an extra 2 MOV instructions in our shader. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir/intrinsics: Make load_barycentric_input take a 2-component coorJason Ekstrand2017-03-141-1/+3
| | | | | | Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Connor Abbott <[email protected]> Cc: "17.0 13.0" <[email protected]>
* anv/blorp: Only set a clear color for resolves if fast-clearedJason Ekstrand2017-03-141-1/+2
| | | | | | | Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Connor Abbott <[email protected]> Cc: "17.0" <[email protected]>