Commit message | Author | Age | Files | Lines
* radv: Emit cache flushes before CP DMA.Bas Nieuwenhuizen2017-03-141-0/+3
| | | | | | | | The flushes could be due to TRANSFER barriers. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Cc: 17.0 <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* Convert sed(1) syntax to be compatible with FreeBSD and OpenBSDJan Beich2017-03-141-10/+10
| | | | | | | | | | | | BSD regex library doesn't support extended RE escapes (e.g. \+) and shorthand character classes (e.g. \s, \S) and SVR4-style word delimiters[1] (on DragonFly and NetBSD). Both GNU and BSD sed support -E and -r to enable extended RE but OS X still lacks -r. [1] https://www.illumos.org/issues/516 Reviewed-by: Eric Engestrom <[email protected]> Tested-by: Eric Engestrom <[email protected]> (GNU sed)
* anv: Properly enumerate physical devices when none are presentJason Ekstrand2017-03-141-2/+5
* nir/constant_expressions: Refactor helper functionsJason Ekstrand2017-03-141-24/+27
| | | | | | | | Apart from avoiding some unneeded size cases, this shouldn't have any actual functional impact. Reviewed-by: Dylan Baker <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* nir: Rework conversion opcodesJason Ekstrand2017-03-1422-308/+218
| | | | | | | | | | | | | | | | | | | | | | | | The NIR story on conversion opcodes is a mess. We've had way too many of them, naming is inconsistent, and which ones have explicit sizes was sort-of random. This commit re-organizes things and makes them all consistent: - All non-bool conversion opcodes now have the explicit size in the destination and are named <src_type>2<dst_type><size>. - Integer <-> integer conversion opcodes now only come in i2i and u2u forms (i2u and u2i have been removed) since the only difference between the different integer conversions is whether or not they sign-extend when up-converting. - Boolean conversion opcodes all have the explicit size on the bool and are named <src_type>2<dst_type>. Making things consistent also allows nir_type_conversion_op to be moved to nir_opcodes.c and auto-generated using mako. This will make adding int8, int16, and float16 versions much easier when the time comes. Reviewed-by: Eric Anholt <[email protected]>
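The naming rule described above can be sketched in a few lines of Python. This is only an illustration, not the real `nir_opcodes.py` generator: the function name `conversion_op_name` and the `TYPE_PREFIX` table are hypothetical, and the sketch assumes that the source type's signedness selects between `i2i` and `u2u`. It also does not check that a generated name actually exists in NIR.

```python
# Hypothetical sketch of the naming scheme; not the real nir_opcodes.py.
TYPE_PREFIX = {"float": "f", "int": "i", "uint": "u", "bool": "b"}


def conversion_op_name(src_type, dst_type, dst_bit_size):
    """Build a conversion opcode name of the form <src>2<dst>[<size>]."""
    src = TYPE_PREFIX[src_type]
    dst = TYPE_PREFIX[dst_type]
    if src == "b" and dst == "b":
        raise ValueError("bool -> bool is not a conversion")
    # i2u and u2i were removed: integer <-> integer conversions only differ
    # in whether they sign-extend, which this sketch keys on the source type.
    if src in "iu" and dst in "iu":
        dst = src
    name = src + "2" + dst
    # Boolean conversions carry no size suffix in the name.
    if "b" in (src, dst):
        return name
    return name + str(dst_bit_size)
```

Under these assumptions, `conversion_op_name("float", "int", 32)` gives `f2i32` and `conversion_op_name("int", "uint", 64)` gives `i2i64`.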
* i965/fs: Re-arrange conversion operationsJason Ekstrand2017-03-141-36/+31
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/vec4: Get rid of the type parameter from to/from_doubleJason Ekstrand2017-03-142-24/+15
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* glsl/nir: Use nir_type_conversion_opJason Ekstrand2017-03-141-37/+32
| | | | | | Using the helper is way better than hand-coding the universe. Reviewed-by: Eric Anholt <[email protected]>
* nir: Rewrite nir_type_conversion_opJason Ekstrand2017-03-141-63/+92
| | | | | | | | | The original version was very convoluted and tried way too hard to not just have the nested switch statement that it needs. Let's just write the obvious code and then we know it's correct. This fixes a bunch of missing cases particularly with int64. Reviewed-by: Plamena Manolova <[email protected]>
* nir: Add a get_nir_type_for_glsl_base_type helperJason Ekstrand2017-03-141-2/+8
| | | | Reviewed-by: Eric Anholt <[email protected]>
* nir/validate: Rework ALU bit-size rule validationJason Ekstrand2017-03-141-32/+33
| | | | | | | | | | | The original bit-size validation wasn't capable of properly dealing with instructions with variable bit sizes. An attempt was made to handle it by looking at source and destinations but, because the validation was done in validate_alu_(src|dest), it didn't really have the needed information. The new validation code is much more straightforward and should be more correct. Reviewed-by: Eric Anholt <[email protected]>
* nir/validate: Validate that bit sizes and components always matchJason Ekstrand2017-03-141-38/+63
| | | | | | | | | | | | | We've always required bit sizes to match but the rules for the number of components have been a bit loose. You've never been allowed to source from something with fewer components than you consume, but more has always been fine. This changes the validator to require that they match exactly. The fact that they don't always match has been a source of confusion in NIR for quite some time and it's time we got rid of it. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
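The difference between the old and new rule can be illustrated with a small Python model. It is a sketch only: `SSADef`, `validate_src_old` and `validate_src_new` are hypothetical names and do not correspond to functions in `nir_validate.c`.

```python
from dataclasses import dataclass


@dataclass
class SSADef:
    """Hypothetical stand-in for an SSA value: width in components and bits."""
    num_components: int
    bit_size: int


def validate_src_old(ssa, consumed_components, bit_size):
    """Old rule: bit sizes must match, extra components are tolerated."""
    return ssa.bit_size == bit_size and ssa.num_components >= consumed_components


def validate_src_new(ssa, consumed_components, bit_size):
    """New rule: both bit size and component count must match exactly."""
    return ssa.bit_size == bit_size and ssa.num_components == consumed_components
```

A source that reads two channels of a `vec4` passes the old check but fails the new one, which is why a MOV or vecN has to sit between the two.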
* nir: Make image_size a variable-width intrinsicJason Ekstrand2017-03-143-11/+16
| | | | | | Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* i965/fs: Use num_components from the SSA def in image intrinsicsJason Ekstrand2017-03-141-2/+1
| | | | | | Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir/lower_tex: Use tex_instr_dest_size for txs destinationsJason Ekstrand2017-03-141-1/+2
| | | | | | | | | Using coord_components of the source texture is correct for everything except cube maps where it's off by one. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir/spirv: Restrict the number of channels in texture coordinatesJason Ekstrand2017-03-141-1/+2
| | | | | | | | | Some SPIR-V texturing instructions pack more than the texture coordinate into the coordinate source. We need to mask off the unused channels. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir/copy_prop: Respect the source's number of componentsJason Ekstrand2017-03-141-33/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the near future we are going to require that the num_components in a src dereference match the num_components of the SSA value being dereferenced. To do that, we need copy_prop to not remove our MOVs from a larger SSA value into an instruction that uses fewer channels. Because we suddenly have to know how many components each source has, this makes the pass a bit more complicated. Fortunately, copy propagation is the only pass that cares about how many components are read by any given source, so it's fairly contained. Shader-db results on Sky Lake: total instructions in shared programs: 13318947 -> 13320265 (0.01%); instructions in affected programs: 260633 -> 261951 (0.51%); helped: 324; HURT: 1027. Looking through the hurt programs, about a dozen are hurt by 3 instructions and the rest are all hurt by 2 instructions. From a spot-check of the shaders, the story is always the same: They get a vec4 from somewhere (frequently an input) and use the first two or three components as a texture coordinate. Because of the vector component mismatch, we have a mov or, more likely, a vecN sitting between the texture instruction and the input. This means that the back-end inserts a bunch of MOVs and split_virtual_grfs() goes to town. Because the texture coordinate is also used by some other calculation, register coalesce can't combine them back together and we end up with an extra 2 MOV instructions in our shader. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir/intrinsics: Make load_barycentric_input take a 2-component coorJason Ekstrand2017-03-141-1/+3
| | | | | | Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Connor Abbott <[email protected]> Cc: "17.0 13.0" <[email protected]>
* anv/blorp: Only set a clear color for resolves if fast-clearedJason Ekstrand2017-03-141-1/+2
| | | | | | | Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Connor Abbott <[email protected]> Cc: "17.0" <[email protected]>
* anv/blorp: Turn off AUX after doing a CCS_D resolveJason Ekstrand2017-03-141-0/+2
| | | | | | | | | | For render passes with multiple subpasses on gen7, we only fast-clear at the top but an input attachment use can cause us to do a resolve in the middle of the render pass. Once we've done so, we no longer have a fast-cleared surface so we can just set aux_usage to NONE. Reviewed-by: Topi Pohjolainen <[email protected]> Cc: "17.0" <[email protected]>
* android: add '/vulkan' to libmesa_anv_entrypoints pathTapani Pälli2017-03-141-2/+2
| | | | | | | otherwise generated entrypoint headers are not found during build Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* android: add src/intel/compiler to libmesa_intel_compiler includesTapani Pälli2017-03-141-0/+1
| | | | | | | | fixes build error when brw_nir.h not found in the generated file brw_nir_trig_workarounds.c. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* anv: Add missing error-checking to anv_CreateDevice (v3)Gwan-gyeong Mun2017-03-131-9/+56
| | | | | | | | | | | | | | | | | This patch adds missing error-checking and fixes resource leak in allocation failure path on anv_CreateDevice() v2: Fixes from Jason Ekstrand's review a) Add missing destructors for all of the state pools on allocation failure path b) Add missing destructor for batch bo pools on allocation failure path v3: Fixes from Emil Velikov's review Add missing destructor for queue and scratch_pool on allocation failure path Signed-off-by: Mun Gwan-gyeong <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* radv: setup llvm target data layoutDave Airlie2017-03-141-0/+7
| | | | | | | | | | | | | | Ported from radeonsi, pointed out by Tom. "This prevents LLVM from using sext instructions for local memory offsets and allows the backend to fold immediate offsets into the instruction. This also prevents some incorrect code generation for ptrtoint and inttoptr instructions." Cc: "13.0 17.0" <[email protected]> Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: Reinitialise loaderMagic when allocating a cached command bufferAlex Smith2017-03-131-0/+1
| | | | | | | | | | | This must be set to ICD_LOADER_MAGIC by vkAllocateCommandBuffers, which was being done when allocating a new buffer but not when reusing an existing one in the cache. This would hit an assertion and crash in debug builds of the Vulkan loader. Fixes: 682248db451f ("radv: Cache command buffers in command pool.") Signed-off-by: Alex Smith <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* gallium/radeon: disable the shader cache if dumping shadersMarek Olšák2017-03-131-0/+5
| | | | | | otherwise, cached shaders aren't dumped. Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: mark all bound shader buffer ranges as initializedMarek Olšák2017-03-131-0/+3
| | | | | | | | This should prevent cases when a buffer was incorrectly mapped without synchronization just because this wasn't done. Cc: 13.0 17.0 <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* st/mesa: disable the shader cache if dumping shadersMarek Olšák2017-03-131-4/+4
| | | | | | otherwise, cached shaders aren't dumped. Reviewed-by: Timothy Arceri <[email protected]>
* anv: Use vk_outarray in vkGetPhysicalDeviceQueueFamilyPropertiesChad Versace2017-03-131-55/+18
| | | | | | No intended change in behavior. Just a refactor. Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Use vk_outarray in vkEnumeratePhysicalDevices (v2)Chad Versace2017-03-131-27/+4
| | | | | | | | | No intended change in behavior. Just a refactor. v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For Jason. Reviewed-by: Jason Ekstrand <[email protected]>
* util/vulkan: Add vk_outarray (v2)Chad Versace2017-03-131-0/+140
| | | | | | | | | | | This is a wrapper for a Vulkan output array. A Vulkan output array is one that follows the convention of the parameters to vkGetPhysicalDeviceQueueFamilyProperties(). v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For Jason. Reviewed-by: Jason Ekstrand <[email protected]>
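The output-array convention that vk_outarray wraps can be modelled roughly as follows. The real helper is C code in Mesa's util directory; this Python class is only a sketch of the semantics, and `OutArray`, `enumerate_items` and the use of `None` to stand in for a NULL output pointer are assumptions made for illustration. The two result codes are the values from the Vulkan headers.

```python
VK_SUCCESS = 0
VK_INCOMPLETE = 5


class OutArray:
    """Model of a Vulkan output array: a caller-provided capacity, or None
    when the caller only wants to know how many elements are available."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.wanted = 0  # how many elements the driver tried to write

    def append(self, item):
        self.wanted += 1
        if self.capacity is not None and len(self.items) < self.capacity:
            self.items.append(item)

    def count(self):
        # Value written back through the count pointer.
        return self.wanted if self.capacity is None else len(self.items)

    def status(self):
        if self.capacity is None or self.wanted <= self.capacity:
            return VK_SUCCESS
        return VK_INCOMPLETE


def enumerate_items(available, capacity):
    """Driver-side entry point written against the wrapper."""
    out = OutArray(capacity)
    for item in available:
        out.append(item)
    return out.status(), out.count(), out.items
```

With this shape, the driver entry point no longer branches on whether the caller passed an array, which is presumably where the line savings in the two anv refactors come from.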
* intel: genxml: prevent missing ; with address fields dwordsLionel Landwerlin2017-03-131-28/+26
| | | | | | | | | | | | | | | | Before this change, the generator could emit output like this: const uint32_t v0 = __gen_uint(values->ValidBit, 0, 0) | __gen_uint(values->FaultType, 1, 2) | __gen_uint(values->SRCIDofFault, 3, 10) | __gen_uint(values->GTTSEL, 11, 1) | dw[0] = __gen_combine_address(data, &dw[0], values->VirtualAddressofFault, v0); This change removes the trailing '|' so that the statement is properly terminated with ';'. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
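One common way to avoid this class of bug in a Python code generator is to join the terms with the separator instead of appending it after each one. The sketch below illustrates that idea only; `emit_address_dword` is a hypothetical function and is not taken from the genxml pack-header generator.

```python
def emit_address_dword(index, field_exprs, address):
    """Return C source for a dword that ORs bitfields into an address.

    The '|' separator is placed only between terms, so the expression
    always ends with ';' rather than a dangling operator.
    """
    joined = " |\n      ".join(field_exprs)
    return (
        f"   const uint32_t v{index} =\n"
        f"      {joined};\n"
        f"   dw[{index}] = __gen_combine_address("
        f"data, &dw[{index}], values->{address}, v{index});"
    )
```

Calling it with the four `__gen_uint(...)` terms from the commit message yields three `|` operators and a `;` before the `dw[0]` assignment.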
* gallium/hud: check NULL return from u_upload_allocJulien Isorce2017-03-131-0/+5
| | | | | | | | | | | | | | | | | | | | Fixes the following segmentation fault: signal SIGSEGV: invalid address (fault address: 0x0) frame #0: 0x00007fffe718e117 radeonsi_dri.so hud_draw_background_quad hud_context.c:170 167 168 assert(hud->bg.num_vertices + 4 <= hud->bg.max_num_vertices); 169 -> 170 vertices[num++] = (float) x1; 171 vertices[num++] = (float) y1; 172 173 vertices[num++] = (float) x1; (lldb) bt * frame #0: 0x00007fffe718e117 radeonsi_dri.so`hud_draw_background_quad frame #1: 0x00007fffe718f458 radeonsi_dri.so`hud_draw frame #2: 0x00007fffe712967f radeonsi_dri.so`dri_flush Signed-off-by: Marek Olšák <[email protected]>
* winsys/radeon: check null return from radeon_cs_create_fence in cs_flushJulien Isorce2017-03-131-11/+13
| | | | | | | | | | | | Follow-up of patch: "radeon_cs_create_fence: check null return from radeon_winsys_bo_create" radeon_drm_cs_flush radeon_cs_create_fence radeon_winsys_bo_create Signed-off-by: Julien Isorce <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* winsys/radeon: check null in radeon_cs_create_fenceJulien Isorce2017-03-131-0/+3
| | | | | | | | | | | | | | Fixes the following segmentation fault: radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c -> if (!bo->handle) (gdb) bt 0 radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c 1 0x00007fffe73575de in radeon_cs_create_fence radeon_drm_cs.c 2 0x00007fffe7358c48 in radeon_drm_cs_flush radeon_drm_cs.c Signed-off-by: Julien Isorce <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* vulkan/wsi: include builddir for generated headersJuan A. Suarez Romero2017-03-131-0/+1
| | | | | | | | wayland-drm-client-protocol.h is generated in builddir, so when builddir != srcdir the header is not found, and compilation of wsi_common_wayland.c will fail. Reviewed-by: Emil Velikov <[email protected]>
* anv: Use on-the-fly surface states for dynamic buffer descriptorsJason Ekstrand2017-03-137-245/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a performance problem with dynamic buffer descriptors. Because we are currently implementing them by pushing an offset into the shader and adding that offset onto the already existing offset for the UBO/SSBO operation, all UBO/SSBO operations on dynamic descriptors are indirect. The back-end compiler implements indirect pull constant loads using what basically amounts to a texelFetch instruction. For pull constant loads with constant offsets, however, we use an oword block read message which goes through the constant cache and reads a whole cache line at a time. Because of these two things, direct pull constant loads are much faster than indirect pull constant loads. Because all loads from dynamically bound buffers are indirect, the user takes a substantial performance penalty when using this "performance" feature. There are two potential solutions I have seen for this problem. The alternate solution is to continue pushing offsets into the shader but wire things up in the back-end compiler so that we use the oword block read messages anyway. The only reason we can do this is because we know a priori that the dynamic offsets are uniform and 16-byte aligned. Unfortunately, thanks to the 16-byte alignment requirement of the oword messages, we can't do some general "if the indirect offset is uniform, use an oword message" sort of thing. This solution, however, is recommended for a few reasons: 1. Surface states are relatively cheap. We've been using on-the-fly surface state setup for some time in GL and it works well. Also, dynamic offsets with on-the-fly surface state should still be cheaper than allocating new descriptor sets every time you want to change a buffer offset which is really the only requirement of the dynamic offsets feature. 2. This requires substantially less compiler plumbing. Not only can we delete the entire apply_dynamic_offsets pass but we can also avoid having to add architecture for passing dynamic offsets to the back-end compiler in such a way that it can continue using oword messages. 3. We get robust buffer access range-checking for free. Because the offset and range are baked into the surface state, we no longer need to pass ranges around and do bounds-checking in the shader. 4. Once we finally get UBO pushing implemented, it will be much easier to handle pushing chunks of dynamic descriptors if the compiler remains blissfully unaware of dynamic descriptors. This commit improves performance of The Talos Principle on ULTRA settings by around 50% and brings it nicely into line with OpenGL performance. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Stall before fast-clear operationsJason Ekstrand2017-03-131-6/+19
| | | | | | | | | | | | | | | During initial CCS bring-up, I discovered that you have to do a full CS stall prior to doing a CCS resolve as well as afterwards. It appears that the same is needed for fast-clears as well. This fixes rendering corruptions on The Talos Principle on Sky Lake GT4. The issue hasn't been demonstrated on any other hardware; however, given that this appears to be a "too many things in the pipe" problem, having it be easier to reproduce on a system with more EUs makes sense. The issue with resolves is demonstrable on a GT3 or GT2, so this is probably also a problem on all GTs. Reviewed-by: Topi Pohjolainen <[email protected]> Cc: "13.0 17.0" <[email protected]>
* anv: Accurately advertise dynamic descriptor limitsJason Ekstrand2017-03-131-2/+2
| | | | | | | | | | The number of dynamic descriptors is limited by both the number of descriptors and the total number of dynamic things. Because there isn't a single "maximum dynamic things" limit, we need to divide by two so that they can create the maximum of both UBOs and SSBOs. Reviewed-by: Eduardo Lima Mitev <[email protected]> Cc: "17.0 13.0" <[email protected]>
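The arithmetic behind the divide-by-two can be sketched as below. The total of 16 is a placeholder chosen for illustration, not the driver's actual constant, and `dynamic_descriptor_limits` is a hypothetical helper; only the two limit names come from `VkPhysicalDeviceLimits`.

```python
# Placeholder total; the real driver constant is not stated in the commit.
MAX_DYNAMIC_BUFFERS = 16


def dynamic_descriptor_limits(max_dynamic_buffers):
    """Split one shared budget so both limits can be reached at once."""
    per_type = max_dynamic_buffers // 2
    return {
        "maxDescriptorSetUniformBuffersDynamic": per_type,
        "maxDescriptorSetStorageBuffersDynamic": per_type,
    }
```

Advertising the full budget for each type would let an application legally request twice as many dynamic buffers as the driver can track, which is the inaccuracy the commit addresses.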
* anv: Add a helper for working with VK_WHOLE_SIZE for buffersJason Ekstrand2017-03-134-11/+28
| | | | Reviewed-by: Plamena Manolova <[email protected]>
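A helper of this kind typically resolves the `VK_WHOLE_SIZE` sentinel into a concrete byte count. The following Python sketch shows the idea under stated assumptions: `resolve_range` is a hypothetical name, and the bounds checks are added for illustration rather than copied from the driver.

```python
# VK_WHOLE_SIZE is defined as (~0ULL) in the Vulkan headers.
VK_WHOLE_SIZE = 2**64 - 1


def resolve_range(buffer_size, offset, vk_range):
    """Turn a (offset, range) pair into a concrete number of bytes."""
    if offset > buffer_size:
        raise ValueError("offset lies past the end of the buffer")
    if vk_range == VK_WHOLE_SIZE:
        # "Everything from offset to the end of the buffer."
        return buffer_size - offset
    if offset + vk_range > buffer_size:
        raise ValueError("range extends past the end of the buffer")
    return vk_range
```

For a 256-byte buffer, `resolve_range(256, 64, VK_WHOLE_SIZE)` returns 192, while an explicit range is passed through unchanged.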
* freedreno/ir3: fragz cannot be half precisionRob Clark2017-03-131-0/+6
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: optimize less in glslRob Clark2017-03-131-1/+1
| | | | | | | | | | | | | | | | | | | Rely on nir for optimization, to reduce compile times. Very minimal impact on shader-db: total instructions in shared programs: 104170 -> 104199 (0.03%); total dwords in shared programs: 209664 -> 209728 (0.03%); total full registers used in shared programs: 7156 -> 7161 (0.07%); total half registers used in shader programs: 109 -> 109 (0.00%); total const registers used in shared programs: 24222 -> 24224 (0.01%). Helped: half 12, full 107, const 103, instr 112, dwords 98. Hurt: half 11, full 104, const 105, instr 115, dwords 102. But shader db runtime dropped from ~29.3s user to ~20.4s user. Signed-off-by: Rob Clark <[email protected]>
* aubinator/genxml: use gzipped files to store embedded genxmlLionel Landwerlin2017-03-134-18/+66
| | | | | | | | | | This reduces the size of the aubinator binary from ~1.4Mb to ~700Kb. We can now drop the checks on xxd in configure. v2: Fix incorrect makefile dependency (Lionel) v3: use $(PYTHON2) (Emil) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* intel: genxml: add script to generate gzipped genxmlLionel Landwerlin2017-03-132-0/+48
| | | | | | | | | | | v2 (from Dylan): Add main function; add missing Copyright; use print_function. v3: Add actual license (Dylan). Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Dylan Baker <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
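The general technique of embedding a compressed XML file as a C byte array can be sketched as follows. This is not the script added by the commit: `xml_to_c_array`, the array layout and the use of Python 3's `gzip.compress` are all assumptions made for illustration.

```python
import gzip


def xml_to_c_array(name, xml_bytes, per_line=12):
    """Compress xml_bytes with gzip and render it as a C uint8_t array."""
    compressed = gzip.compress(xml_bytes, compresslevel=9)
    rows = []
    for i in range(0, len(compressed), per_line):
        chunk = compressed[i:i + per_line]
        # A trailing comma after the last element is valid C.
        rows.append("   " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    return "static const uint8_t {}[] = {{\n{}\n}};\n".format(
        name, "\n".join(rows)
    )
```

The consumer then has to inflate the array at run time, trading a little start-up work for a smaller binary and removing the need for xxd at build time.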
* util/u_thread.h: Include stdint.h for int64_t definition.Jose Fonseca2017-03-131-0/+2
| | | | Fixes MinGW build. Trivial.
* intel: fix compiler buildIago Toral Quiroga2017-03-132-8/+7
| | | | | | | | | compiler/brw_vec4_gs_visitor.cpp:744:39: error: ‘GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES’ was not declared in this scope output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES); Fixes: d0d4a5f43b4 ("i965: split EU defines to brw_eu_defines.h") Reviewed-by: Emil Velikov <[email protected]>
* svga: handle P016 format as wellChristian König2017-03-131-0/+1
| | | | | | Fixes: 62cff793785 ("gallium: add P016 format") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100180 Reviewed-by: Emil Velikov <[email protected]>
* configure.ac: require pthread-stubs only where availableEmil Velikov2017-03-131-2/+3
| | | | | | | | | | | | | | | | | | pthread-stubs is only relevant on BSD platforms. In other words, on any other platform building/installing pthread-stubs results only in a pthread-stubs.pc file. And even where it provides a DSO, there's a fundamental design issue with it; see the pthread-stubs mailing list for the specifics. v2: Update comment above the switch statement (Jon Turney). Reviewed-by: Jeremy Huddleston Sequoia <[email protected]> Acked-by: Gary Wong <[email protected]> Tested-by: Eric Engestrom <[email protected]> Acked-by: Randy Fishel <[email protected]> Cc: Niveditha Rau <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* configure.ac: do not require the i965 driver for ANVEmil Velikov2017-03-131-3/+2
| | | | | | | | | As of the last few commits the two are split, so we no longer require the i965 driver in order to build the ANV driver. Even though ANV links against neither libdrm nor libdrm_intel, we still require those as dependencies due to the headers they provide. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/vulkan: Get rid of recursive makeJason Ekstrand2017-03-139-307/+305
| | | | | | | | v2 [Emil Velikov] - Various fixes and initial stab at the Android build. - Keep the generation rules/EXTRA_DIST outside the conditional Reviewed-by: Jason Ekstrand <[email protected]>