summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* gallivm: (trivial) do division by 1000 with int64Roland Scheidegger2018-04-241-1/+1
| | | | | | | Conversion to int can otherwise overflow if compile times are over ~71min. (Yes this can happen...) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: remove LICM passRoland Scheidegger2018-04-241-1/+9
| | | | | | | | | | | | | | LICM is simply too expensive, even though it presumably can help quite a bit in some cases. It was definitely cheaper in llvm 3.3, though as far as I can tell with llvm 3.3 it failed to do anything in most cases. early-cse also actually seems to cause licm to be able to move things when it previously couldn't, which causes noticeable compile time increases. There's more loop passes in llvm, but I'm not sure which ones are helpful, and I couldn't find anything which would roughly do what the old licm in llvm 3.3 did, so ditch it. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: add early cse passRoland Scheidegger2018-04-241-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This pass is quite cheap, and can simplify the IR quite a bit for our generated IR. In particular on a variety of shaders I've found the time saved by other passes due to the simplified IR more than makes up for the cost of this pass, and on top of that the end result is actually better. The only downside I've found is this enables the LICM pass to move some things out of the main shader loop (in the case I've seen, instanced vertex fetch (which is constant within the jit shader) plus the derived instructions in the shader) which it couldn't do before for some reason. This would actually be desirable but can increase compile time considerably (licm seems to have considerable cost when it actually can move things out of loops, due to alias analysis). But blaming early cse for this seems inappropriate. (Note that the first two sroa / earlycse passes are similar to what a standard llvm opt -O1/-O2 pipeline would do, albeit this has some more passes even before but I don't think they'd do much for us.) It also in particular helps some crazy shader used for driver verification (don't ask...) a lot (about factor of 6 faster in compile time) (due to simplfiying the ir before LICM is run). While here, also move licm behind simplifycfg. For some shaders there seems to be very significant compile time gains (we've seen a factor of 10000 albeit that was a really crazy shader you'd certainly never see in a real app), beause LICM is quite expensive and there's cases where running simplifycfg (along with sroa and early-cse) before licm reduces IR complexity significantly. (I'm not entirely sure if it would make sense to also run it afterwards.) Reviewed-by: Jose Fonseca <[email protected]>
* glsl/glcpp: Handle hex constants with 0X prefixVlad Golovkin2018-04-243-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GLSL 4.6 spec describes hex constant as: hexadecimal-constant: 0x hexadecimal-digit 0X hexadecimal-digit hexadecimal-constant hexadecimal-digit Right now if you have a shader with the following structure: #if 0X1 // or any hex number with the 0X prefix // some code #endif the code between #if and #endif gets removed because the checking is performed only for "0x" prefix which results in strtoll being called with the base 8 and after encountering the 'X' char the strtoll returns 0. Letting strtoll detect the base makes this limitation go away and also makes code easier to read. From the strtoll Linux man page: "If base is zero or 16, the string may then include a "0x" prefix, and the number will be read in base 16; otherwise, a zero base is taken as 10 (decimal) unless the next character is '0', in which case it is taken as 8 (octal)." This matches the behaviour in the GLSL spec. This patch also adds a test for uppercase hex prefix. Reviewed-by: Timothy Arceri <[email protected]>
* mesa: rename api_validate.{c,h} -> draw_validate.{c,h}Timothy Arceri2018-04-249-10/+10
| | | | | Reviewed-by: Mathias Fröhlich <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65422
* ac/radv/radeonsi: refactor harvest config register getters.Dave Airlie2018-04-244-206/+130
| | | | | | | | This refactors the code out to share it between radv and radeonsi. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Nicolai Hähnle <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: only set raster_config_1 outside the index registers.Dave Airlie2018-04-241-15/+16
| | | | | | | | | | This follows what radeonsi does. Ported from radeonsi: radeonsi: emit PA_SC_RASTER_CONFIG_1 only once Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/radv/radeonsi: refactor max simd waves into common code.Dave Airlie2018-04-243-22/+18
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/radv/radeonsi: refactor raster_config default values getters.Dave Airlie2018-04-244-165/+102
| | | | | | | This just makes this common code between the two drivers. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: use common gs_table_depth codeDave Airlie2018-04-241-31/+2
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: use common gs_table_depth code.Dave Airlie2018-04-241-30/+2
| | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/info: move gs table depth to common code.Dave Airlie2018-04-242-0/+34
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: don't runtime check gs table infoDave Airlie2018-04-241-7/+7
| | | | | | | | We can just unreachable here, this aligns with radv code, makes it easier to move to common code. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx9: don't use gs_table_depth on gfx9.Dave Airlie2018-04-242-5/+6
| | | | | | | | Missed this on initial radeonsi port, we shouldn't use this value on gfx9, but also in gfx8 only for when we have a geom shader. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* i965/fs: Return mlen * 8 for size_read() for INTERPOLATE_AT_*Jason Ekstrand2018-04-231-0/+2
| | | | | | | | | They are send messages and this makes size_read() and mlen agree. For both of these opcodes, the payload is just a dummy so mlen == 1 and this should decrease register pressure a bit. Reviewed-by: Francisco Jerez <[email protected]> Cc: [email protected]
* ac: fix the number of coordinates for ac_image_get_lod and arraysSamuel Pitoiset2018-04-231-0/+14
| | | | | | | | | | | | | This fixes crashes for the following CTS: dEQP-VK.glsl.texture_functions.query.texturequerylod.* Cubemaps are the same as 2D arrays. Fixes: 625dcbbc456 ("amd/common: pass address components individually to ac_build_image_intrinsic") Cc: 18.1 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* i965: perf: enable GPA query statisticsLionel Landwerlin2018-04-233-1/+68
| | | | | | | | | | | | The combinaison of GPA/MDAPI components expects a particular name & layout for their pipeline statistics query. v2: Limit the query GPA/MDAPI statistics to gen7->9 (Lionel) v3: Add curly braces (Ken) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: perf: add support for raw queriesLionel Landwerlin2018-04-236-27/+474
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The INTEL_performance_query extension provides a list of queries that a user can select to monitor a particular workload. Each query reports different sets of counters (roughly looking at different parts of the hardware, i.e. caches/fixed functions/etc...). Each query has an associated configuration that we need to program into the hardware before using the query. Up to now, we provided predefined queries. This change allows the user to build its own query (and associated configuration) externally, and have the i965 driver use that configuration through a new query named : Intel_Raw_Hardware_Counters_Set_0_Query When this query is selected, the i965 driver will report raw counters deltas (meaning their values need to be interpreted by the user, as opposed to existing queries that provide human readable values). This change is also useful for debug purposes for building new pre-defined queries and verifying the underlying numbers make sense before writing equations for user readable output. This change's purpose is also to enable GPA. GPA uses a library called MDAPI that processes raw counter data. MDAPI expects raw data to have a certain layout (per generation which is a bit unfortunate...). This change also embeds the expected data layouts. v2: Enable raw queries on gen 7->11, v1 had 7->9 (Lionel) v3: Don't assert on cherryview for gen7... (Ken) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: perf: read slice/unslice frequencies from OA reportsLionel Landwerlin2018-04-232-0/+71
| | | | | | | | | | v2: Add comment breaking down where the frequency values come from (Ken) v3: More documentation (Ken/Lionel) Adjust clock ratio multiplier to reflect the divider's behavior (Lionel) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: perf: snapshot RPSTAT registerLionel Landwerlin2018-04-233-0/+67
| | | | | | | | | | | This register contains the current/previous frequency of the GT, it's one of the value GPA would like to have as part of their queries. v2: Don't use this register on baytrail/cherryview (Ken) Use GET_FIELD() macro (Ken) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: perf: extract utility functionsLionel Landwerlin2018-04-236-252/+294
| | | | | | | | | | | | We would like to reuse a number of the functions and structures in another file in a future commit. We also move the previous content of brw_performance_query.h into brw_performance_query_metrics.h to be included by generated metrics files. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* ac: teach get_ac_sampler_dim() about subpass attachmentsSamuel Pitoiset2018-04-231-17/+7
| | | | | | | | Suggested by Nicolai. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* ac/nir: add missing round_slice for 1D arraysSamuel Pitoiset2018-04-231-0/+7
| | | | | | | | | | | | | This fixes a bunch of CTS fails with 1D arrays: dEQP-VK.glsl.texture_functions.texture*.sampler1darray_* Fixes: 625dcbbc456 ("amd/common: pass address components individually to ac_build_image_intrinsic") Cc: 18.1 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* compiler/glsl: close fd's in glcpp_test.pyDylan Baker2018-04-231-2/+4
| | | | | | | | | | | | I would have thought falling out of scope would allow the gc to collect these, but apparently it doesn't, and this hits an fd limit on macos. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106133 Fixes: db8cd8e36771eed98eb638fd0593c978c3da52a9 ("glcpp/tests: Convert shell scripts to a python script") Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Tested-by: Vinson Lee <[email protected]>
* nir: Do not use progress for unreachable code in return lowering.Bas Nieuwenhuizen2018-04-231-1/+6
| | | | | | | | | | | | | | | | | | | | | | | We seem to use progress for two cases: 1) When we lowered some returns. 2) When we remove unreachable code. If just case 2 happens we assert as state->return_flag has not been allocated yet, but we are still trying to do insert all predicates based on it. This splits the concerns. We only use progress internally for case 1 and then keep track of 2 in a separate variable to indicate progress in the return value of the pass. This is slightly better than transforming the assert into if (!state->return_flag) return, as the solution in this patch avoids inserting predicates even if some other part of the might need them. Fixes: 6e22ad6edc "nir: return early when lowering a return at the end of a function" CC: 18.1 <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106174 Reviewed-by: Timothy Arceri <[email protected]>
* radv: advertise 8 bits of subpixel precision for viewportsJózef Kucia2018-04-231-1/+1
| | | | | | This is what radeonsi does. Reviewed-by: Samuel Pitoiset <[email protected]>
* st/dri: Fix dangling pointer to a destroyed dri_drawableJohan Klokkhammer Helsing2018-04-231-0/+4
| | | | | | | | | | | | | | | | | | | | | | | If an EGLSurface is created, made current and destroyed, and then a second EGLSurface is created. Then the second malloc in driCreateNewDrawable may return the same pointer address the first surface's drawable had. Consequently, when dri_make_current later tries to determine if it should update the texture_stamp it compares the surface's drawable pointer against the drawable in the last call to dri_make_current and assumes it's the same surface (which it isn't). When texture_stamp is left unset, then dri_st_framebuffer_validate thinks it has already called update_drawable_info for that drawable, leaving it unvalidated and this is when bad things starts to happen. In my case it manifested itself by the width and height of the surface being unset. This is fixed this by setting the pointer to NULL before freeing the surface. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106126 Signed-off-by: Johan Klokkhammer Helsing <[email protected]> Signed-off-by: Marek Olšák <[email protected]> Cc: 18.0 18.1 <[email protected]>
* nv50/ir: make a copy of tex src if it's referenced multiple timesIlia Mirkin2018-04-221-37/+49
| | | | | | | | | | | | | | | | For nv50 we coalesce the srcs and defs into a single node. As such, we can end up with impossible constraints if the source is referenced after the tex operation (which, due to the coalescing of values, will have overwritten it). This logic already exists for inserting moves for MERGE/UNION sources. It's the exact same idea here, so leverage that code, which also includes a few optimizations around not extending live ranges unnecessarily. Fixes tests/spec/glsl-1.30/execution/fs-textureSize-components.shader_test Signed-off-by: Ilia Mirkin <[email protected]>
* virgl: disable virgl when no 3D for virtio gpu.Lepton Wu2018-04-231-0/+11
| | | | | | | | | | | | If users are running mesa under old version of qemu or have turned off GL at runtime, virtio gpu driver actually doesn't work. Adds a detection here so mesa can fall back to software rendering. v2: - move detection from loader to virgl (Ilia, Emil) Signed-off-by: Lepton Wu <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: mark const structs as extern in header file to avoid lto damageDave Airlie2018-04-231-3/+3
| | | | | | | | | | | | | The copr repo from che was using LTO and he reported radv broke recently with it. When testing with lto builds here I noticed that we weren't seeing any instance extensions reported. It appears LTO was treating the const without extern as an empty struct, this is possibly a gcc bug, but we can work around it just by marking these with extern. Acked-by: Samuel Pitoiset <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* gallium/tests/trivial: fix viewport depth transformIlia Mirkin2018-04-212-6/+7
| | | | | | | | | | | These were getting mapped off into outer space, which would cause nv50 and nvc0 to clip the primitives (as depth_clip was enabled). These drivers are configured to clip everything outside the [0, 1] range, even though the hardware supports other view settings. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* trace: allow image resource to be nullIlia Mirkin2018-04-211-1/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nv50/ir/ra: prefer def == src2 for fma with immediates on nvc0Karol Herbst2018-04-211-10/+29
| | | | | | | | | | | | | | | | | | | | This helps with the PostRALoadPropagation pass moving long immediates into FMA/MAD instructions. changes in shader-db: total instructions in shared programs : 5894114 -> 5886074 (-0.14%) total gprs used in shared programs : 666558 -> 666563 (0.00%) total shared used in shared programs : 520416 -> 520416 (0.00%) total local used in shared programs : 53524 -> 53524 (0.00%) total bytes used in shared programs : 54006744 -> 53932472 (-0.14%) local shared gpr inst bytes helped 0 0 2 4192 4192 hurt 0 0 7 9 9 Signed-off-by: Karol Herbst <[email protected]> [imirkin: minor edits to separate nv50 and nvc0+ cases] Reviewed-by: Ilia Mirkin <[email protected]>
* autotools: Include new meson files18.1-branchpointDylan Baker2018-04-205-1/+10
| | | | | Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* autotools: Add passes.h to sources so it will be included in the tarballDylan Baker2018-04-201-0/+1
| | | | | | | | | | | This was introduced in commit 8f848ada8a42d9aaa8136afa1bafe32281a0fb48 but not added to the sources list, which is necessary for it to be included in release tarballs. Fixes: 8f848ada8a42d9aaa8136afa1bafe32281a0fb48 ("swr/rast: Start refactoring of builder/packetizer.") Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nvc0: fix line width on GM20x+Rhys Perry2018-04-201-1/+4
| | | | | | | This has the side-effect of fixing polygon-offset piglit test failures. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* i965/miptree: Delete an unused functionNanley Chery2018-04-202-17/+0
| | | | | | | | We're going to combine ::mcs_buf and ::hiz_buf in later commits. Once that happens, this function no longer make sense. Reviewed-by: Rafael Antognolli <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/miptree: Don't leak the clear_color_boNanley Chery2018-04-201-2/+1
| | | | | | | | Free the clear_color_bo in addition to freeing the intel_miptree_aux_buffer which holds the reference to it. Reviewed-by: Rafael Antognolli <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/blorp: Do the gen11 BTI flushJason Ekstrand2018-04-201-0/+14
| | | | | Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* anv/blorp: Do the gen11 BTI flushJason Ekstrand2018-04-201-0/+14
| | | | | Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* etnaviv: fix texture_format_needs_swizLucas Stach2018-04-201-1/+1
| | | | | | | | | | | | | memcmp returns 0 when both swizzles are the same, which means we don't need any hardware swizzling. texture_format_needs_swiz should return true when the return value of the memcmp is non-zero. Fixes: 751ae6afbefd ("etnaviv: add support for swizzled texture formats") Cc: [email protected] Signed-off-by: Lucas Stach <[email protected]> Tested-by: Marek Vasut <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]> Reviewed-by: Wladimir J. van der Laan <[email protected]>
* ac/nir: fix image dimension for subpass attachmentsSamuel Pitoiset2018-04-201-3/+15
| | | | | | | | | | | For subpass attachments we need one more coordinate with the layer, so make them array types. This fixes a bunch of CTS fails with RADV. Fixes: 24fb3e6aa1 ("ac/nir: use ac_build_image_opcode for image intrinsics") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Mark GTT memory as device local for APUs.Bas Nieuwenhuizen2018-04-201-3/+5
| | | | | | | | Otherwise a lot of games complain about not having enough memory, and it is sort of local so this seems reasonable to me. CC: 18.0 <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv/winsys: allow to submit up to 4 IBs for chips without chainingSamuel Pitoiset2018-04-201-50/+168
| | | | | | | | | | | | | | | | | | | The SI family doesn't support chaining which means the maximum size in dwords per CS is limited. When that limit was reached we failed to submit the CS and the application crashed. This patch allows to submit up to 4 IBs which is currently the limit, but recent amdgpu supports more than that. Please note that we can reach the limit of 4 IBs per submit but currently we can't improve that. The only solution is to upgrade libdrm. That will be improved later but for now this should fix crashes on SI or when using RADV_DEBUG=noibs. Fixes: 36cb5508e89 ("radv/winsys: Fail early on overgrown cs.") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105775 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* gallium/util: Android backtrace supportStefan Schake2018-04-203-1/+114
| | | | | | | | | | | | | | We can't use any of the existing implementations in u_debug_stack. Android technically has libunwind, but it's been modified to the point where it no longer compiles with the Mesa usage. The library is also not meant to be referenced by vendor libraries. The officially sanctioned way of obtaining backtraces is through the Android own libbacktrace, a C++ library. Access it through a separate C++ source file on Android only. Signed-off-by: Stefan Schake <[email protected]> Acked-by: Eric Engestrom <[email protected]> Reviewed-by: Rob Herring <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* gallium/util: Don't stub u_debug_stack on AndroidStefan Schake2018-04-201-1/+2
| | | | | | | | | The fallback path for no libunwind ends up being stubs for Android. Don't compile them in so we can provide our own implementation. Signed-off-by: Stefan Schake <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* ac/nir: handle nir_intrinsic_load_first_vertex like base_vertexSamuel Pitoiset2018-04-201-2/+2
| | | | | | | | This fixes a ton of CTS crashes. Fixes: c366f422f0 ("nir: Offset vertex_id by first_vertex instead of base_vertex") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/winsys: allow local BOs on APUsSamuel Pitoiset2018-04-201-1/+2
| | | | | | | | | Ported from RadeonSI. Local BOs ignore BO priorities, and we don't need those on APUs. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: use a global BO list only for VK_EXT_descriptor_indexingSamuel Pitoiset2018-04-203-9/+34
| | | | | | | | | | | | Maintaining two different paths is annoying but this gets rid of the performance regression introduced by the global BO list. We might find a better solution in the future, but for now just keeps two paths. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* Revert "radv: Don't store buffer references in the descriptor set."Samuel Pitoiset2018-04-205-13/+82
| | | | | | | | | | | | | | | | In order to reduce a performance regression introduced by 4b13fe55a4 ("radv: Keep a global BO list for VkMemory."), we are going to maintain two different paths. One when VK_EXT_descriptor_indexing is enabled by the application because we need to have a global BO list, and one (the old one) when it's not enabled. With Talos on Polaris, the global BO list reduces performance by 10% which is too much for me. This reverts commit ab6cadd3ecc7fbdd9079808b407674e0b19c52f0. Reviewed-by: Bas Nieuwenhuizen <[email protected]>