Commit message (Author, Age, Files, Lines)
* meson: Be a bit more helpful when arch or OS is unknown (Guido Günther, 2018-08-27, 1 file, -7/+14)
  V2: Add one missing @0@
  Signed-off-by: Guido Günther <[email protected]>
  Reviewed-by: Dylan Baker <[email protected]>
* intel/eu: print bytes instead of 32 bit hex value (Sagar Ghuge, 2018-08-27, 1 file, -17/+30)
  INTEL_DEBUG=hex prints instructions as 32-bit hex values, and because of the CPU's endianness the byte order appears reversed. To make it possible to disassemble binary files, print each byte instead of a 32-bit hex value.
  v2: Print blank spaces so that the hex output of compacted instructions lines up vertically with that of uncompacted instructions. (Matt Turner)
  v3: Fix line wrap at correct length
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
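A minimal sketch of the idea, using a hypothetical helper `format_inst_bytes()` (not the actual Mesa code): reading the instruction through a byte pointer prints bytes in memory order, whereas printing a `uint32_t` with `%08x` on a little-endian host shows the bytes of each dword reversed. Compacted instructions (assumed here to be 8 bytes, versus 16 for full ones) are padded with blanks so the columns line up, as described in v2.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed sizes: full instructions are 16 bytes, compacted ones 8. */
#define FULL_INST_BYTES 16

/* Hypothetical helper: write the raw bytes of one instruction in memory
 * order ("01 02 03 ..."), padding short (compacted) instructions with
 * blanks so their columns line up with full-size ones. */
static void format_inst_bytes(const unsigned char *inst, size_t len,
                              char *out, size_t out_size)
{
    size_t pos = 0;

    if (out_size == 0)
        return;
    out[0] = '\0';
    /* Each column takes 3 characters; stop early if the buffer is too small. */
    for (size_t i = 0; i < FULL_INST_BYTES && pos + 3 < out_size; i++) {
        if (i < len)
            pos += (size_t)snprintf(out + pos, out_size - pos, "%02x ",
                                    (unsigned)inst[i]);
        else
            pos += (size_t)snprintf(out + pos, out_size - pos, "   ");
    }
}
```

Because the bytes are read through an `unsigned char` pointer, the output is the same on any host, which is what a disassembler reading a binary file expects.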
* intel: decoder: handle 0 sized structs (Lionel Landwerlin, 2018-08-27, 1 file, -4/+8)
  Gen7.5 has a BLEND_STATE of size 0 which includes a variable length group. We did not deal with that very well, leading to an endless loop.
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544
  Reviewed-by: Jason Ekstrand <[email protected]>
* nv50/ir,nvc0: use constant buffers for compute when possible on Kepler+ (Rhys Perry, 2018-08-27, 2 files, -10/+36)
  Gives a +7.79% increase in FPS with Hitman on lowest quality settings on my GTX 1060.
  total instructions in shared programs : 5787979 -> 5748677 (-0.68%)
  total gprs used in shared programs    : 669901 -> 669373 (-0.08%)
  total shared used in shared programs  : 548832 -> 548832 (0.00%)
  total local used in shared programs   : 21068 -> 21064 (-0.02%)
            local shared    gpr   inst  bytes
  helped        1      0    152    274    274
  hurt          0      0      0      0      0
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Karol Herbst <[email protected]>
* nv50/ir: optimize multiplication by 16-bit immediates into two xmads (Rhys Perry, 2018-08-27, 1 file, -0/+10)
  This creates two xmads rather than the usual three.
  total instructions in shared programs : 5796385 -> 5786560 (-0.17%)
  total gprs used in shared programs    : 670103 -> 669968 (-0.02%)
  total shared used in shared programs  : 548832 -> 548832 (0.00%)
  total local used in shared programs   : 21164 -> 21068 (-0.45%)
            local shared    gpr   inst  bytes
  helped        1      0     64   1040   1040
  hurt          0      0     27      0      0
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Karol Herbst <[email protected]>
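As a rough illustration of why two 16x16 multiply-adds suffice when one operand is a 16-bit immediate, the C sketch below models XMAD in a deliberately simplified way: `xmad16()` is hypothetical and ignores most of the real instruction's modifiers. Splitting `a` into 16-bit halves gives `a * imm = lo * imm + ((hi * imm) << 16)` modulo 2^32.

```c
#include <stdint.h>

/* Simplified, hypothetical model of a 16x16 -> 32 multiply-add:
 * returns ((a16 * b16) << (psl ? 16 : 0)) + c, modulo 2^32. */
static uint32_t xmad16(uint16_t a16, uint16_t b16, uint32_t c, int psl)
{
    uint32_t prod = (uint32_t)a16 * (uint32_t)b16; /* fits in 32 bits */
    if (psl)
        prod <<= 16;
    return prod + c;
}

/* a * imm for a 16-bit immediate using two multiply-adds:
 * a = hi * 2^16 + lo, so a * imm = lo * imm + ((hi * imm) << 16). */
static uint32_t mul_by_imm16(uint32_t a, uint16_t imm)
{
    uint16_t lo = (uint16_t)(a & 0xffffu);
    uint16_t hi = (uint16_t)(a >> 16);
    uint32_t t = xmad16(hi, imm, 0, 1); /* (hi * imm) << 16 */
    return xmad16(lo, imm, t, 0);       /* lo * imm + t */
}
```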
* nv50/ir: optimize near power-of-twos into shladd (Rhys Perry, 2018-08-27, 1 file, -0/+27)
  total instructions in shared programs : 5819319 -> 5796385 (-0.39%)
  total gprs used in shared programs    : 670571 -> 670103 (-0.07%)
  total shared used in shared programs  : 548832 -> 548832 (0.00%)
  total local used in shared programs   : 21164 -> 21164 (0.00%)
            local shared    gpr   inst  bytes
  helped        0      0    318   1758   1758
  hurt          0      0     63      0      0
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Karol Herbst <[email protected]>
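A hedged sketch of the arithmetic behind this kind of rewrite, assuming `shladd` computes `(a << s) + b`: a multiplier of the form 2^n + 1 becomes `(a << n) + a`, and 2^n - 1 becomes `(a << n) - a`. The helper names are hypothetical, and whether the real pass covers the 2^n - 1 case (here via a negated addend) is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of a shift-left-and-add: (a << s) + b, modulo 2^32. */
static uint32_t shladd(uint32_t a, unsigned s, uint32_t b)
{
    return (a << s) + b;
}

static bool is_pow2(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

static unsigned log2_u32(uint32_t x)
{
    unsigned n = 0;
    while (x >>= 1)
        n++;
    return n;
}

/* Try to rewrite a * imm as a single shladd when imm = 2^n + 1 or 2^n - 1.
 * Returns false if imm is not "near" a power of two. */
static bool mul_near_pow2(uint32_t a, uint32_t imm, uint32_t *result)
{
    if (imm > 1 && is_pow2(imm - 1)) {          /* imm = 2^n + 1 */
        *result = shladd(a, log2_u32(imm - 1), a);
        return true;
    }
    if (imm < UINT32_MAX && is_pow2(imm + 1)) { /* imm = 2^n - 1 */
        *result = shladd(a, log2_u32(imm + 1), 0u - a);
        return true;
    }
    return false;
}
```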
* nv50/ir: move a * b -> a << log2(b) code into createMul() (Rhys Perry, 2018-08-27, 1 file, -15/+30)
  With this commit, OP_MAD is handled on nv50 too. Later commits also build on this change.
  Also, instead of creating a shladd, it relies on LateAlgebraicOpt to create one. This simplifies the code and helps shader-db slightly overall.
  total instructions in shared programs : 5820882 -> 5819319 (-0.03%)
  total gprs used in shared programs    : 670595 -> 670571 (-0.00%)
  total shared used in shared programs  : 548832 -> 548832 (0.00%)
  total local used in shared programs   : 21164 -> 21164 (0.00%)
            local shared    gpr   inst  bytes
  helped        0      0     18    230    230
  hurt          0      0      8    263    263
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Karol Herbst <[email protected]>
* nv50/ir: optimize imul/imad to xmads (Rhys Perry, 2018-08-27, 2 files, -1/+56)
  This hurts the shader-db numbers a good bit, but a few xmads are much faster than an imul or imad, and the cost is mitigated by the next commit, which optimizes many multiplications by immediates into shorter, less register-heavy instructions than the xmads.
  total instructions in shared programs : 5768871 -> 5820882 (0.90%)
  total gprs used in shared programs    : 669919 -> 670595 (0.10%)
  total shared used in shared programs  : 548832 -> 548832 (0.00%)
  total local used in shared programs   : 21068 -> 21164 (0.46%)
            local shared    gpr   inst  bytes
  helped        0      0     38      0      0
  hurt          1      0    365   3076   3076
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Karol Herbst <[email protected]>
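The identity that lets a 32-bit multiply be built from 16-bit multiply-adds can be checked in plain C. This is a sketch of the arithmetic only, assuming 32-bit wrapping integers; it does not reproduce the exact XMAD sequence (or its modifiers) that the nv50/ir code emits, and the function names are hypothetical.

```c
#include <stdint.h>

/* 32x32 -> low 32 bits multiply built only from 16x16 -> 32 products.
 * With a = ah*2^16 + al and b = bh*2^16 + bl:
 *   a*b = al*bl + (ah*bl + al*bh)*2^16 + ah*bh*2^32
 * and the last term vanishes modulo 2^32. */
static uint32_t mul32_from_16x16(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xffffu, ah = a >> 16;
    uint32_t bl = b & 0xffffu, bh = b >> 16;
    /* The cross terms may wrap; only their low 16 bits survive the shift. */
    uint32_t cross = ah * bl + al * bh;
    return al * bl + (cross << 16);
}

/* imad: a * b + c, modulo 2^32. */
static uint32_t mad32_from_16x16(uint32_t a, uint32_t b, uint32_t c)
{
    return mul32_from_16x16(a, b) + c;
}
```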
* gm107/ir: add support for OP_XMAD on GM107+ (Rhys Perry, 2018-08-27, 3 files, -1/+71)
  v4: make the immediate field 16 bits
  v5: don't ever emit h1 flags for immediates
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Karol Herbst <[email protected]>
* nv50/ir: add preliminary support for OP_XMAD (Rhys Perry, 2018-08-27, 7 files, -5/+85)
  v4: remove uint16_t(...)
  v4: don't allow immediates outside [0,65535] in insnCanLoad()
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Karol Herbst <[email protected]>
* glsl/linker: Allow unused in blocks which are not declared on previous stage (vadym.shovkoplias, 2018-08-27, 2 files, -3/+9)
  From Section 4.3.4 (Inputs) of the GLSL 1.50 spec:
      "Only the input variables that are actually read need to be written by the previous stage; it is allowed to have superfluous declarations of input variables."
  Fixes:
      * interstage-multiple-shader-objects.shader_test
  v2: Update comment in ir.h since the usage of the "used" field has been extended.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101247
  Signed-off-by: Vadym Shovkoplias <[email protected]>
  Reviewed-by: Alejandro Piñeiro <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* nir: Pull block_ends_in_jump into nir.h (Jason Ekstrand, 2018-08-27, 3 files, -23/+13)
  We had two different implementations in different files. May as well have one and put it in nir.h.
  Reviewed-by: Timothy Arceri <[email protected]>
* anv: Add support for protected memory properties on anv_GetPhysicalDeviceProperties2() (Samuel Iglesias Gonsálvez, 2018-08-27, 1 file, -0/+7)
  The VkPhysicalDeviceProtectedMemoryProperties structure is new in Vulkan 1.1.
  Fixes Vulkan CTS CL#2849.
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/tools: Add 0x in front of a couple of hex values (Jason Ekstrand, 2018-08-25, 1 file, -2/+2)
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Fill holes in the VF VUE to zero (Jason Ekstrand, 2018-08-25, 1 file, -1/+28)
  This fixes a GPU hang in DOOM 2016 running under wine.
  Cc: [email protected]
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104809
  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel: tools: Fix aubinator_error's fprintf call (format-security) (Kai Wasserbäch, 2018-08-25, 1 file, -1/+1)
  The recent commit 4616639b49b4bbc91e503c1c27632dccc1c2b5be introduced the new function aubinator_error(), which is a trivial wrapper around fprintf() to STDERR. The call to fprintf(), however, is passed the message msg directly:
      fprintf(stderr, msg);
  This is a format-security violation and leads to an FTBFS with -Werror=format-security (GCC 8):
      ../../../src/intel/tools/aubinator.c: In function 'aubinator_error':
      ../../../src/intel/tools/aubinator.c:74:4: error: format not a string literal and no format arguments [-Werror=format-security]
          fprintf(stderr, msg);
          ^~~~~~~
  This patch fixes this trivially by introducing a catch-all "%s" format argument.
  Fixes: 4616639b49b ("intel: tools: split aub parsing from aubinator")
  Cc: Lionel Landwerlin <[email protected]>
  Signed-off-by: Kai Wasserbäch <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
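A minimal sketch of the fix pattern. `report_error()` is a hypothetical stand-in for `aubinator_error()`; it takes a `FILE *` only so it is easy to exercise.

```c
#include <stdio.h>

/* Passing msg as the format string, as in fprintf(stderr, msg), would
 * treat any '%' in msg as a conversion specifier (and trips
 * -Werror=format-security). A literal "%s" format prints msg verbatim. */
static int report_error(FILE *out, const char *msg)
{
    return fprintf(out, "%s", msg);
}
```

With the literal format, a message such as `"100% done\n"` is printed exactly as given instead of `% d` being parsed as a conversion.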
* intel/batch_decoder: Print blend states properly (Jason Ekstrand, 2018-08-25, 1 file, -1/+16)
  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/batch_decoder: Fix dynamic state printing (Jason Ekstrand, 2018-08-25, 1 file, -2/+2)
  Instead of printing addresses like everyone else, we were accidentally printing the offset from the state base address. Also, state_map is a void pointer, so we were incrementing in bytes instead of dwords, and every state other than the first was wrong.
  Reviewed-by: Lionel Landwerlin <[email protected]>
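Both bugs can be illustrated with a small sketch; the helper names below are hypothetical, not the batch decoder's. A printed address should be the state base address plus the offset, and stepping through states should go through a `uint32_t *` so that pointer arithmetic moves in dwords. Arithmetic on a `void *` is a GNU extension in C, so the sketch uses `char *` to show the byte-stepping behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* The address to print: state base address + offset, not the offset alone. */
static uint64_t state_address(uint64_t state_base, uint32_t offset)
{
    return state_base + offset;
}

/* Correct: uint32_t pointer arithmetic advances i dwords (4 * i bytes). */
static const uint32_t *state_dword(const void *state_map, size_t i)
{
    return (const uint32_t *)state_map + i;
}

/* The bug pattern: byte-sized pointer arithmetic advances only i bytes. */
static const void *state_byte_step(const void *state_map, size_t i)
{
    return (const char *)state_map + i;
}
```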
* intel/decoder: Print ISL formats for vertex elements (Jason Ekstrand, 2018-08-25, 1 file, -1/+2)
  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/decoder: Clean up field iteration and fix sub-dword fields (Jason Ekstrand, 2018-08-25, 1 file, -16/+16)
  First of all, setting iter->name in advance_field is unnecessary because it gets set by gen_decode_field, which gets called immediately after advance_field at the one call site.
  Second, we weren't properly initializing start_bit and end_bit in the initial condition of gen_field_iterator_next, so the first field of a struct would get printed wrong if it doesn't start on the first bit. This is fixed by adding an iter_start_field helper which sets the field and also sets up the other bits we need.
  This fixes decoding of 3DSTATE_SBE_SWIZ.
  Reviewed-by: Lionel Landwerlin <[email protected]>
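Why the start bit matters can be seen with a simplified field extractor. The sketch below is hypothetical (it is not `gen_field_iterator_next()`) and assumes a field never straddles a dword boundary: the value is read by shifting the containing dword right by the field's own start bit, so an iterator that left `start_bit` at 0 for a first field starting at, say, bit 4 would read the wrong bits.

```c
#include <stdint.h>

/* Extract bits [start, end] (inclusive, counted from bit 0 of dw[0]).
 * Simplification: the field is at most 32 bits wide and lies within a
 * single dword. */
static uint32_t extract_field(const uint32_t *dw, unsigned start, unsigned end)
{
    uint32_t word = dw[start / 32];
    unsigned lo = start % 32;
    unsigned width = end - start + 1;
    uint32_t mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);

    return (word >> lo) & mask;
}
```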
* gallium: Split out PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE. (Kenneth Graunke, 2018-08-24, 19 files, -3/+22)
  Some hardware can do PIPE_TEX_WRAP_MIRROR_REPEAT but not PIPE_TEX_WRAP_MIRROR_CLAMP and PIPE_TEX_WRAP_MIRROR_CLAMP_TO_BORDER. Drivers for such hardware would like to advertise support for ARB_texture_mirror_clamp_to_edge but not EXT_texture_mirror_clamp.
  This commit adds a new PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE bit, changes the extension enable to be based on that, and enables it in all upstream drivers which supported PIPE_CAP_TEXTURE_MIRROR_CLAMP (so they continue supporting this mode).
* intel: decoder: unify MI_BB_START field naming (Lionel Landwerlin, 2018-08-24, 3 files, -7/+7)
  The batch decoder looks for a field with a particular name to decide whether an MI_BB_START leads into a second batch buffer level. Because the names differ between Gen7.5/8 and newer generations, we fail that test and keep on reading (invalid) instructions.
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544
  Reviewed-by: Jason Ekstrand <[email protected]>
* docs: Update calendar, news, relnotes for 18.1.7 (Dylan Baker, 2018-08-24, 3 files, -7/+7)
* docs: Add mesa 18.1.7 notes (Dylan Baker, 2018-08-24, 1 file, -1/+2)
* docs: Add mesa 18.1.7 docs (Dylan Baker, 2018-08-24, 1 file, -0/+103)
* docs: update calendar 18.2.0-rc4 is out, extend to 18.2.0-rc5 (Andres Gomez, 2018-08-24, 1 file, -2/+2)
  Signed-off-by: Andres Gomez <[email protected]>
* docs/relnotes: Mark NV_fragment_shader_interlock support in i965 (Kevin Rogovin, 2018-08-24, 1 file, -0/+1)
  Acked-by: Jason Ekstrand <[email protected]>
* egl/drm: use gbm_dri_bo() wrapper (Emil Velikov, 2018-08-24, 1 file, -2/+2)
  Remove the explicit cast, using the appropriate wrapper instead.
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Acked-by: Daniel Stone <[email protected]>
* egl/drm: use gbm_dri_surface() wrapper (Emil Velikov, 2018-08-24, 1 file, -3/+3)
  Remove the explicit cast, using the appropriate wrapper instead.
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Acked-by: Daniel Stone <[email protected]>
* egl/drm: use gbm_dri_device() wrapper (Emil Velikov, 2018-08-24, 1 file, -1/+1)
  Remove the explicit cast, using the appropriate wrapper instead.
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Acked-by: Daniel Stone <[email protected]>
* egl/android: simplify device open/probe (Emil Velikov, 2018-08-24, 1 file, -34/+18)
  Currently droid_probe_device does not do any 'probing'; it only filters out a device if it doesn't match the given vendor string.
  Rename the function, straighten the return type and call it only when needed, i.e. when an actual vendor string is provided.
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Tomasz Figa <[email protected]>
* egl/android: remove drmVersion::name NULL check (Emil Velikov, 2018-08-24, 1 file, -5/+0)
  The name string is guaranteed to be non-NULL.
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Tomasz Figa <[email protected]>
* egl/android: remove droid_probe_driver() (Emil Velikov, 2018-08-24, 1 file, -18/+0)
  The function name is misleading: it effectively checks whether loader_get_driver_for_fd fails, which can happen only on strdup error, a close to impossible scenario.
  Drop the function; we call the loader API at a later stage.
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Tomasz Figa <[email protected]>
* egl/android: use strcmp with drmVersion::name (Emil Velikov, 2018-08-24, 1 file, -1/+1)
  The name string is guaranteed to be NULL terminated. Drop the explicit length check that comes with strncmp().
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Tomasz Figa <[email protected]>
* egl/android: use drmDevice instead of the manual /dev/dri iteration (Emil Velikov, 2018-08-24, 1 file, -16/+12)
  Replace the manual handling of /dev/dri in favor of the drmDevice API. The latter provides a consistent way of enumerating the devices and provides device details as needed.
  v2:
   - Use ARRAY_SIZE (Frank)
   - s/famour/favor/ typo (Frank)
   - Make MAX_DRM_DEVICES a macro to fix VLA errors (RobF)
   - Remove left-over dev_path instance (RobF)
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Robert Foss <[email protected]> (v1)
  Reviewed-by: Tomasz Figa <[email protected]>
* Revert "configure: allow building with python3" (Emil Velikov, 2018-08-24, 27 files, -37/+39)
  This reverts commit ae7898dfdbe5c8dab7d11c71862353f1ae43feb0.
  Turns out the python scripts are _not_ fully python 3 compatible. As Ilia reported, using get_xmlpool.py with LANG=C produces some weird output; see the link for details.
  Even though the issue was spotted with the autoconf build, it exposes a genuine problem with the script (and the lack of lang handling in the meson build).
  https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html
* Revert "travis: use python3 for the autoconf builds" (Emil Velikov, 2018-08-24, 1 file, -11/+1)
  This reverts commit 855af9a5a209f061355513b92f3ba4576f48d091.
  Turns out the python scripts are _not_ fully python 3 compatible. As Ilia reported, using get_xmlpool.py with LANG=C produces some weird output; see the link for details.
  Even though the issue was spotted with the autoconf build, it exposes a genuine problem with the script (and the lack of lang handling in the meson build).
  https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html
* Revert "mesa: bump GL_MAX_ELEMENTS_INDICES and GL_MAX_ELEMENTS_VERTICES" (Kenneth Graunke, 2018-08-24, 2 files, -5/+2)
  This reverts commit 095515e16ca3cb2c9f1813b6602ee57ae28325a8.
  This breaks KHR-GL46.map_buffer_alignment.functional on i965.
  This code was apparently not reviewed and I don't know why we would move from a driver configurable constant to a hardcoded value for all drivers. This really looks like an accidental hack push.
* Revert recent changes about not including compute in combined limits. (Kenneth Graunke, 2018-08-24, 3 files, -27/+26)
  As far as I can tell, no one reviewed these changes, they made i965 assert fail on driver load, and I am not certain they are correct. (Hopefully reverting these does not break radeonsi too badly...)
  The uniform related changes seem fine and reasonable, but the texture image units change is possibly incorrect. According to the OES_tessellation_shader spec issue 5:
      (5) How are aggregate shader limits computed?
      RESOLVED: Following the GL 4.4 model, but we restrict uniform buffer bindings to 12/stage instead of 14, this results in
          MAX_UNIFORM_BUFFER_BINDINGS = 72
              This is 12 bindings/stage * 6 shader stages, allowing a static partitioning of the bindings even though at most 5 stages can appear in a program object).
          MAX_COMBINED_UNIFORM_BLOCKS = 60
              This is 12 blocks/stage * 5 stages, since compute shaders can't be mixed with other stages.
          MAX_COMBINED_TEXTURE_IMAGE_UNITS = 96
              This is 16 textures/stage * 6 stages.
  which definitely is including compute shaders in that last limit.
  Not including compute shaders breaks the following test:
      dEQP-GLES31.functional.state_query.integer.max_combined_texture_image_units_getinteger
  There was enough breakage that I figured we should just send this back to the drawing board.
  Revert "i965: don't include compute resources in "Combined" limits"
  Revert "st/mesa: don't include compute resources in "Combined" limits"
  Revert "mesa: don't include compute resources in MAX_COMBINED_* limits"
  This reverts commit b03dcb1e5f507c5950d0de053a6f76e6306ee71f.
  This reverts commit cff290df4c09547cd2cb3b129ec59bdebdadba90.
  This reverts commit 45f87a48f94148b484961f18a4f1ccf86f066b1c.
* gallivm: don't use saturated unsigned add/sub intrinsics for llvm 8.0 (Roland Scheidegger, 2018-08-24, 1 file, -27/+60)
  These have been removed. Unfortunately auto-upgrade doesn't work for jit. (Worse, it seems we don't get a compilation error anymore when compiling the shader; rather, llvm will just do a call to a null function in the jitted shaders, making it difficult to detect when intrinsics vanish.)
  Luckily the signed ones are still there. I helped convince llvm that removing them is a bad idea for now: while the unsigned ones have sort of agreed-upon simplest patterns to replace them with, this is not the case for the signed ones, which require _significantly_ more complex patterns, to the point that the recognition is IMHO probably unlikely to ever work reliably in practice (due to other optimizations interfering). (Even for the relatively trivial unsigned patterns, llvm already added test cases where recognition doesn't work; an unsaturated add followed by a saturated add may produce atrocious code.) Nevertheless, it seems there's a serious quest to squash all cpu-specific intrinsics going on, so I'd expect patches to nuke them as well to resurface.
  Adapt the existing fallback code to match the simple patterns llvm uses and hope for the best. I've verified with lp_test_blend that it does produce the expected saturated assembly instructions. Our cmp/select build helpers don't use boolean masks, but that doesn't seem to interfere with llvm's ability to recognize the pattern.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106231
  Reviewed-by: Jose Fonseca <[email protected]>
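The scalar C sketch below shows compare/select formulations of unsigned saturating add and subtract, of the kind LLVM can pattern-match into saturating instructions (on x86, for example, `paddusb`/`psubusb`). It is an illustration only: gallivm builds the equivalent LLVM IR on vectors, the function names are hypothetical, and the exact patterns LLVM recognizes are not reproduced here.

```c
#include <stdint.h>

/* Unsigned saturating add: if the 8-bit sum wrapped, it is smaller than
 * an operand, so select the maximum value instead. */
static uint8_t uadd_sat_u8(uint8_t a, uint8_t b)
{
    uint8_t sum = (uint8_t)(a + b);
    return sum < a ? UINT8_MAX : sum;
}

/* Unsigned saturating subtract: clamp the result at zero. */
static uint8_t usub_sat_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}
```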
* st/mesa: expose KHR_texture_compression_astc_sliced_3d (Marek Olšák, 2018-08-24, 3 files, -3/+6)
  This is ASTC 2D LDR allowing texture arrays and 3D, compressing each slice as a separate 2D image.
  Tested by piglit. Trivial.
* st/mesa: expose EXT_disjoint_timer_query (Marek Olšák, 2018-08-24, 2 files, -0/+2)
  same cap as ARB_timer_query, no changes needed, tested by piglit
* mesa: expose EXT_vertex_attrib_64bit (Marek Olšák, 2018-08-24, 4 files, -0/+74)
  because the closed driver exposes it. It's the same as the ARB extension.
  Reviewed-by: Ian Romanick <[email protected]>
* mesa: expose AMD_query_buffer_object (Marek Olšák, 2018-08-24, 2 files, -0/+2)
  it's a subset of the ARB extension.
  Reviewed-by: Ian Romanick <[email protected]>
* mesa: expose AMD_multi_draw_indirect (Marek Olšák, 2018-08-24, 3 files, -0/+22)
  because the closed driver exposes it. This is equivalent to the ARB extension.
  Reviewed-by: Ian Romanick <[email protected]>
* mesa: expose AMD_gpu_shader_int64 (Marek Olšák, 2018-08-24, 9 files, -12/+261)
  because the closed driver exposes it. It's equivalent to ARB_gpu_shader_int64.
  In this patch, I did everything the same as we do for ARB_gpu_shader_int64.
  Reviewed-by: Ian Romanick <[email protected]>
* mesa: expose ARB_post_depth_coverage in the Compatibility profile (Marek Olšák, 2018-08-24, 2 files, -1/+2)
  It only contains GLSL changes.
  v2: allow the layout qualifier on GLSL <= 1.30
* intel/nir: Enable nir_opt_find_array_copies (Jason Ekstrand, 2018-08-23, 2 files, -13/+28)
  We have to be a bit careful with this one because we want it to run in the optimization loop but only in the first brw_nir_optimize call. Later calls assume that we've lowered away copy_deref instructions and we don't want to introduce any more.
  Shader-db results on Kaby Lake:
      total instructions in shared programs: 15176942 -> 15176942 (0.00%)
      instructions in affected programs: 0 -> 0
      helped: 0
      HURT: 0
  In spite of the lack of any shader-db improvement, this patch completely eliminates spilling in the Batman: Arkham City tessellation shaders. This is because we are now able to detect that the temporary array created by DXVK for storing TCS inputs is a copy of the input arrays and use indirect URB reads instead of making a copy of 4.5 KiB of input data and then indirecting on it with if-ladders.
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add an array copy optimization (Jason Ekstrand, 2018-08-23, 4 files, -0/+415)
  This peephole optimization looks for a series of load/store_deref or copy_deref instructions that copy an array from one variable to another and turns it into a copy_deref that copies the entire array. The pattern it looks for is extremely specific, but it's good enough to pick up on the input array copies in DXVK and should also be able to pick up the sequence generated by spirv_to_nir for an OpLoad of a large composite followed by OpStore. It can always be improved later if needed.
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
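A toy model of the kind of pattern such a pass looks for, under heavy simplification: each hypothetical `elem_copy` record stands for one element-wise copy `dst[i] = src[i]`, and the check succeeds only when the records copy every element of one array to the same index of another, in order. The real pass works on NIR deref chains and load/store_deref or copy_deref instructions, which this sketch does not model.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instruction: dst_var[dst_idx] = src_var[src_idx]. */
struct elem_copy {
    int dst_var, dst_idx;
    int src_var, src_idx;
};

/* True if seq copies elements 0..array_len-1 of a single source array to
 * the same indices of a single destination array, in order, so the whole
 * sequence could be replaced by one whole-array copy. */
static bool is_whole_array_copy(const struct elem_copy *seq, size_t n,
                                size_t array_len)
{
    if (n == 0 || n != array_len)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (seq[i].dst_var != seq[0].dst_var || seq[i].src_var != seq[0].src_var)
            return false;
        if (seq[i].dst_idx != (int)i || seq[i].src_idx != (int)i)
            return false;
    }
    return true;
}
```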
* intel/nir: Use nir_shrink_vec_array_vars (Jason Ekstrand, 2018-08-23, 1 file, -0/+1)
  Shader-db results on Kaby Lake:
      total instructions in shared programs: 15177605 -> 15176765 (<.01%)
      instructions in affected programs: 4259 -> 3419 (-19.72%)
      helped: 1
      HURT: 0
      total spills in shared programs: 10954 -> 10855 (-0.90%)
      spills in affected programs: 295 -> 196 (-33.56%)
      helped: 1
      HURT: 0
      total fills in shared programs: 22222 -> 22117 (-0.47%)
      fills in affected programs: 417 -> 312 (-25.18%)
      helped: 1
      HURT: 0
  The helped shader is from the OglCSDof synmark test. On my Kaby Lake laptop, the actual framerate of the benchmark didn't appear to improve beyond the noise.
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>