path: root/src/mesa/drivers
Commit message (Author, Age, Files, Lines)
* intel: Move brw_prog_key_set_id from i965 to the compiler. (Kenneth Graunke, 2019-05-21, 2 files, -20/+0)
| | | | | | I want to use it in iris. Reviewed-by: Dylan Baker <[email protected]>
* meson: expose glapi through osmesa (Eric Engestrom, 2019-05-18, 1 file, -1/+2)
| | | | | | | | | | | Suggested-by: Pierre Guillou <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109659 Fixes: f121a669c7d94d2ff672 "meson: build gallium based osmesa" Fixes: cbbd5bb889a2c271a504 "meson: build classic osmesa" Cc: Brian Paul <[email protected]> Cc: Dylan Baker <[email protected]> Signed-off-by: Eric Engestrom <[email protected]> Tested-by: Chuck Atkins <[email protected]>
* i965/blorp: Set MOCS for gen11 in blorp_alloc_vertex_buffer (Jordan Justen, 2019-05-14, 1 file, -1/+5)
| | | | | | | | v2: * Add build error for gen > 6 if MOCS is not set. (Lionel) Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8 (Kenneth Graunke, 2019-05-14, 2 files, -2/+8)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our tessellation control shaders can be dispatched in several modes. - SINGLE_PATCH (Gen7+) processes a single patch per thread, with each channel corresponding to a different patch vertex. PATCHLIST_N will launch (N / 8) threads. If N is less than 8, some channels will be disabled, leaving some untapped hardware capabilities. Conditionals based on gl_InvocationID are non-uniform, which means that they'll often have to execute both paths. However, if there are fewer than 8 vertices, all invocations will happen within a single thread, so barriers can become no-ops, which is nice. We also burn a maximum of 4 registers for ICP handles, so we can compile without regard for the value of N. It also works in all cases. - DUAL_PATCH mode processes up to two patches at a time, where the first four channels come from patch 1, and the second group of four come from patch 2. This tries to provide better EU utilization for small patches (N <= 4). It cannot be used in all cases. - 8_PATCH mode processes 8 patches at a time, with a thread launched per vertex in the patch. Each channel corresponds to the same vertex, but in each of the 8 patches. This utilizes all channels even for small patches. It also makes conditions on gl_InvocationID uniform, leading to proper jumps. Barriers, unfortunately, become real. Worse, for PATCHLIST_N, the thread payload burns N registers for ICP handles. This can burn up to 32 registers, or 1/4 of our register file, for URB handles. For Vulkan (and DX), we know the number of vertices at compile time, so we can limit the amount of waste. In GL, the patch dimension is dynamic state, so we either would have to waste all 32 (not reasonable) or guess (badly) and recompile. This is unfortunate. Because we can only spawn 16 thread instances, we can only use this mode for PATCHLIST_16 and smaller. The rest must use SINGLE_PATCH. 
This patch implements the new 8_PATCH TCS mode, but leaves us using SINGLE_PATCH by default. A new INTEL_DEBUG=tcs8 flag will switch to using 8_PATCH mode for testing and benchmarking purposes. We may want to consider using 8_PATCH mode in Vulkan in some cases. The data I've seen shows that 8_PATCH mode can be more efficient in some cases, but SINGLE_PATCH mode (the one we use today) is faster in other cases. Ultimately, the TES matters much more than the TCS for performance, so the decision may not matter much. Reviewed-by: Jason Ekstrand <[email protected]>
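The thread and register accounting described in this message can be sketched roughly as follows. This is a hypothetical Python illustration, not driver code: the function names are invented, rounding N / 8 up is an assumption, and the 128-register file size is only inferred from the statement that 32 registers is 1/4 of the register file.

```python
import math

CHANNELS_PER_THREAD = 8     # channels per thread, per the description above
MAX_8_PATCH_VERTICES = 16   # "PATCHLIST_16 and smaller"
REGISTER_FILE_SIZE = 128    # assumption: 32 registers is stated to be 1/4 of the file


def single_patch_threads(n):
    """Threads launched for one PATCHLIST_N patch in SINGLE_PATCH mode.

    The message says (N / 8); rounding up is an assumption made here.
    """
    return math.ceil(n / CHANNELS_PER_THREAD)


def eight_patch_supported(n):
    """8_PATCH launches one thread per vertex, limited to 16 thread instances."""
    return n <= MAX_8_PATCH_VERTICES


def eight_patch_icp_registers(n):
    """8_PATCH burns N payload registers for ICP handles."""
    return n


def register_file_fraction(registers):
    """Share of the (assumed) register file used by the given register count."""
    return registers / REGISTER_FILE_SIZE
```

For example, a 3-vertex patch needs one SINGLE_PATCH thread with most channels idle, while 8_PATCH mode would spend three payload registers on ICP handles for it.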
* i965: Pass compiler to default key populators (Kenneth Graunke, 2019-05-14, 13 files, -27/+37)
| | | | | | This lets us get devinfo and other misc. compiler settings. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/miptree: Refactor intel_miptree_supports_ccs_e() (Nanley Chery, 2019-05-14, 1 file, -10/+5)
| | | | | | | Update and rename this function to format_supports_ccs_e() to better match its behavior. Reviewed-by: Rafael Antognolli <[email protected]>
* i965/miptree: Drop intel_*_supports_hiz() (Nanley Chery, 2019-05-14, 1 file, -35/+2)
| | | | | | | | intel_tiling_supports_hiz() and intel_miptree_supports_hiz() duplicate much of the work done by isl_surf_get_hiz_surf(). Replace them with simple expressions. Reviewed-by: Rafael Antognolli <[email protected]>
* i965/miptree: Drop intel_*_supports_ccs() (Nanley Chery, 2019-05-14, 1 file, -124/+6)
| | | | | | | | intel_tiling_supports_ccs() and intel_miptree_supports_ccs() duplicate much of the work done by isl_surf_get_ccs_surf(). Drop them both and index a boolean array to choose CCS_D in intel_miptree_choose_aux_usage(). Reviewed-by: Rafael Antognolli <[email protected]>
* i965/miptree: Drop intel_miptree_supports_mcs() (Nanley Chery, 2019-05-14, 1 file, -46/+1)
| | | | | | | This function duplicates much of the work done by isl_surf_get_mcs_surf(). Replace it with a simple expression. Reviewed-by: Rafael Antognolli <[email protected]>
* i965/miptree: Fall back to no aux if creation fails (Nanley Chery, 2019-05-14, 1 file, -5/+6)
| | | | | | | | | No surface requires an auxiliary surface to operate correctly. Fall back to an uncompressed surface if mesa fails to create and allocate an auxiliary surface. This enables adding more restrictions to ISL without having to update i965. Reviewed-by: Rafael Antognolli <[email protected]>
* mesa: Replace MaxTextureLevels with MaxTextureSize. (Eric Anholt, 2019-05-13, 9 files, -10/+10)
| | | | | | | | | | In most places (glGetInteger, max_legal_texture_dimensions), we wanted the number of pixels, not the number of levels. Number of levels is easily recovered with util_next_power_of_two() and ffs(). More importantly, for V3D we want to be able to expose a non-power-of-two maximum texture size to cover 2x4k displays on HW that can't quite do 8192 wide. Reviewed-by: Marek Olšák <[email protected]>
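A minimal sketch of recovering a level count from a maximum size, as the message describes. The two helpers below are Python stand-ins written for this illustration; they only model the behaviour of util_next_power_of_two() and ffs() for positive inputs and are not Mesa code.

```python
def next_power_of_two(x):
    """Smallest power of two >= x, for x >= 1 (stand-in for util_next_power_of_two())."""
    return 1 << (x - 1).bit_length()


def ffs(x):
    """1-based index of the lowest set bit, 0 if x == 0 (stand-in for ffs())."""
    return (x & -x).bit_length()


def levels_from_max_size(max_texture_size):
    """Number of levels implied by a maximum texture size in pixels."""
    return ffs(next_power_of_two(max_texture_size))
```

Note that a non-power-of-two size such as 7680 (two 3840-wide displays) is rounded up to 8192 before counting, so it maps to the same level count as 8192.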
* i965: Fix memory leaks in brw_upload_cs_work_groups_surface(). (Kenneth Graunke, 2019-05-10, 1 file, -0/+5)
| | | | | | | | | | | | | This was taking a reference to the 64kB upload buffer and never returning it, leaking a reference each time this atom triggered. This leaked lots of 64kB upload BOs, eventually running us out of VMA space. This would usually happen when using mpv to watch a movie, after 20-40 minutes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110134 Fixes: 63d7b33f516 i965/cs: Setup surface binding for gl_NumWorkGroups Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965: leave the top 4Gb of the high heap VMA unused (Kenneth Graunke, 2019-05-07, 1 file, -1/+5)
| | | | | | | | This ports commit 9e7b0988d6e98690eb8902e477b51713a6ef9cae from anv to i965. Thanks to Lionel for noticing that it was missing! Fixes: 01058a55229 i965: Add virtual memory allocator infrastructure to brw_bufmgr. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Force VMA alignment to be a multiple of the page size. (Kenneth Graunke, 2019-05-07, 1 file, -0/+2)
| | | | | | | This should happen regardless, but let's be paranoid. Fixes: 01058a55229 i965: Add virtual memory allocator infrastructure to brw_bufmgr. Reviewed-by: Jason Ekstrand <[email protected]>
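One way to force an alignment to a multiple of the page size is to round it up, sketched below. Both the rounding-up strategy and the 4096-byte page size are assumptions for illustration; the commit message does not say how the driver does it, and force_page_alignment is a hypothetical name.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def force_page_alignment(alignment):
    """Round a requested alignment up to the next multiple of PAGE_SIZE."""
    return (alignment + PAGE_SIZE - 1) // PAGE_SIZE * PAGE_SIZE
```

An alignment that is already page-aligned passes through unchanged, which matches the "this should happen regardless" remark.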
* i965: Fix BRW_MEMZONE_LOW_4G heap size. (Kenneth Graunke, 2019-05-07, 1 file, -1/+4)
| | | | | | | | | | The STATE_BASE_ADDRESS "Size" fields can only hold 0xfffff in pages, and 0xfffff * 4096 = 4294963200, which is 1 page shy of 4GB. So we can't use the top page. Fixes: 01058a55229 i965: Add virtual memory allocator infrastructure to brw_bufmgr. Reviewed-by: Jason Ekstrand <[email protected]>
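The arithmetic in this message can be checked directly. The constant names below are made up for the illustration; the values come from the message itself.

```python
PAGE_SIZE = 4096                # bytes per page, as used in the message
MAX_SIZE_FIELD_PAGES = 0xfffff  # largest value the "Size" field can hold
FOUR_GB = 1 << 32

# Largest heap size expressible in the field, in bytes.
max_heap_bytes = MAX_SIZE_FIELD_PAGES * PAGE_SIZE

# How far short of 4GB that is.
shortfall_bytes = FOUR_GB - max_heap_bytes
```

The shortfall is exactly one page, which is why the top page of the low 4GB zone cannot be used.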
* mesa: Remove the now unused _NEW_ARRAY state change flag. (Mathias Fröhlich, 2019-05-04, 1 file, -1/+0)
| | | | | | | It is no longer used, so we have fewer occasions where NewState is nonzero. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa: Rip out now unused gl_context::aelt_context. (Mathias Fröhlich, 2019-05-04, 2 files, -2/+0)
| | | | | | | Now this part of gl_context state is unused and can be removed. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* anv,i965: Stop warning about incomplete gen11 support (Jason Ekstrand, 2019-05-03, 1 file, -7/+0)
| | | | | | | | Both drivers are feature-complete and should be running more-or-less at perf at this point. Drop the warning. Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* meson: lift driver-collection out into parent build-file (Erik Faye-Lund, 2019-05-02, 7 files, -23/+17)
| | | | | | | | | | This way we can mark the dri_drivers and dri_link arrays as temporary, as all knowledge about them is contained in a single build-file with clearly visible limited life-span. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Acked-by: Dylan Baker <[email protected]>
* i965: Re-enable fast color clears for GEN11. (Plamena Manolova, 2019-04-29, 1 file, -15/+8)
| | | | | | | | | | This patch re-enables fast color clears for GEN11. It also ensures that we use linear color formats for sRGB surfaces during fast clears. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Nanley Chery <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* delete autotools input files (Eric Engestrom, 2019-04-29, 2 files, -23/+0)
| | | | | | | Leftovers from when autotools was deleted. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* delete autotools .gitignore files (Eric Engestrom, 2019-04-29, 1 file, -1/+0)
| | | | | | | | One special case: `src/util/xmlpool/.gitignore` is not entirely deleted, as `xmlpool.pot` still gets generated (e.g. by `ninja xmlpool-pot`). Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* meson: Force '.so' extension for DRI drivers (Jon Turney, 2019-04-25, 1 file, -0/+1)
| | | | | | | | | | | | DRI driver loadable modules are always installed with install_megadriver.py with names ending with '.so', irrespective of platform. Force the name the loadable module is built with to match, so install_megadriver.py doesn't spin trying to remove non-existent symlinks. Fixes: c77acc3c "meson: remove meson-created megadrivers symlinks"
* st/mesa/radeonsi: fix race between destruction of types and shader compilation (Timothy Arceri, 2019-04-24, 6 files, -10/+10)
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit 624789e3708c moved the destruction of types out of atexit() and made use of a ref count instead. This is useful for avoiding a crash where drivers such as radeonsi are still compiling in a thread when the app exits and has not called MakeCurrent to change from the current context. While the above scenario is technically an app bug, we shouldn't crash. However, that change caused another race condition between the shader compilation thread in radeonsi and context teardown functions. This patch makes two changes to fix this new problem: First, we explicitly call _mesa_destroy_shader_compiler_types() when destroying the st context rather than calling it indirectly via _mesa_free_context_data(). We do this as we must call it after st_destroy_context_priv() so that we don't destroy the glsl types before the compilation threads finish. Next, we wait for the shader threads to finish in si_destroy_context(). This also means we need to call context destroy before destroying the queues in si_destroy_screen(). Fixes: 624789e3708c ("compiler/glsl: handle case where we have multiple users for types") Reviewed-by: Marek Olšák <[email protected]>
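The ordering constraint in this message (finish compile threads first, drop the shared types afterwards) can be illustrated with a small Python model. Everything here is hypothetical: SharedTypes, compile_shader and destroy_context are invented names that only mimic the idea of a ref-counted type table used by worker threads, not the Mesa or radeonsi code.

```python
from concurrent.futures import ThreadPoolExecutor


class SharedTypes:
    """Hypothetical ref-counted type table shared by contexts and compile jobs."""

    def __init__(self):
        self.refcount = 0
        self.types = None

    def ref(self):
        if self.refcount == 0:
            self.types = {"float": 4, "vec4": 16}
        self.refcount += 1

    def unref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.types = None  # types are destroyed with the last user


def compile_shader(shared, type_name):
    # A compile job reads the shared types; it would fail if they were gone.
    return shared.types[type_name]


def destroy_context(shared, executor, pending):
    # 1. Wait for the compile threads to finish ...
    executor.shutdown(wait=True)
    results = [job.result() for job in pending]
    # 2. ... and only then drop the reference that keeps the types alive.
    shared.unref()
    return results


shared = SharedTypes()
shared.ref()
executor = ThreadPoolExecutor(max_workers=2)
pending = [executor.submit(compile_shader, shared, name) for name in ("float", "vec4")]
results = destroy_context(shared, executor, pending)
```

Swapping the two steps in destroy_context would let a job observe an empty type table, which is the kind of race the patch removes.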
* i965: Tidy bogus indentation left by previous commit (Kenneth Graunke, 2019-04-22, 1 file, -26/+24)
| | | | | | | | | I left code indented one level too far in the previous commit to make the diff easier to review. Drop that extra level now. Fixes: 6981069fc80 i965: Ignore uniform storage for samplers or images, use binding info Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Ignore uniform storage for samplers or images, use binding info (Kenneth Graunke, 2019-04-22, 3 files, -18/+28)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gl_nir_lower_samplers_as_deref creates new top level sampler and image uniforms which have been split from structure uniforms. i965 assumed that it could walk through gl_uniform_storage slots by starting at var->data.location and walking forward based on a simple slot count. This assumed that structure types were walked in a particular order. With samplers and images split out of structures, it becomes impossible to assign meaningful locations. Consider: struct S { sampler2D a; sampler2D b; } s[2]; The gl_uniform_storage locations for these follow this map: 0 => a[0], 1 => b[0], 2 => a[1], 3 => b[1]. But the new split variables look like: sampler2D lowered_a[2]; sampler2D lowered_b[2]; and there is no way to know that there's effectively a stride to get to the location for successive elements of a[] or b[]. So, working with location becomes effectively impossible. Ultimately, the point of looking at uniform storage was to pull out the bindings from the opaque index fields. gl_nir_lower_samplers_as_deref can obtain this information while doing the splitting, however, and sets up var->data.binding to have the desired values. We move gl_nir_lower_samplers before brw_nir_lower_image_load_store so gl_nir_lower_samplers_as_deref has the opportunity to set proper image bindings. Then, we make the uniform handling code skip sampler(-array) variables, and handle image param setup based on var->data.binding. Fixes Piglit tests/spec/glsl-1.10/execution/samplers/uniform-struct, this time without regressing dEQP-GLES2.functional.uniform_api.random.3. Fixes: f003859f97c nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref Reviewed-by: Jason Ekstrand <[email protected]>
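The stride problem can be made concrete with a short Python model. It assumes the struct array is walked element by element (s[0].a, s[0].b, s[1].a, s[1].b); the helper names are invented for this sketch and do not correspond to Mesa functions.

```python
FIELDS = ("a", "b")  # sampler members of struct S
ARRAY_LEN = 2        # s[2]


def storage_location_map():
    """Map location -> (field, element), walking the struct array in order."""
    locations = {}
    location = 0
    for element in range(ARRAY_LEN):
        for field in FIELDS:
            locations[location] = (field, element)
            location += 1
    return locations


def locations_of(field):
    """Locations occupied by successive elements of one split-out field."""
    mapping = storage_location_map()
    return [loc for loc in sorted(mapping) if mapping[loc][0] == field]
```

Under that assumption the elements of `a` sit at locations 0 and 2, not 0 and 1, so a split variable like lowered_a[2] cannot be walked with a simple slot count starting at its first location.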
* i965: implement WaEnableStateCacheRedirectToCS (Lionel Landwerlin, 2019-04-18, 2 files, -0/+6)
| | | | | | | | | | | | This 3d performance workaround was initially put in the kernel but the media driver requires different settings so the register has been whitelisted in i915 [1] and userspace drivers are left initializing it as they wish. [1] : https://patchwork.freedesktop.org/series/59494/ Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* intel/perf: drop counter size field (Lionel Landwerlin, 2019-04-17, 2 files, -5/+6)
| | | | | | | We can deduce the size from another field; let's just save some space. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
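A sketch of deriving a counter's size instead of storing it. The message does not say which field the size comes from; the version below assumes it is the counter's data type, and the type names and the counter_size helper are hypothetical.

```python
# Assumed data types and their sizes in bytes (illustrative, not from the source).
COUNTER_TYPE_SIZES = {
    "bool32": 4,
    "uint32": 4,
    "uint64": 8,
    "float": 4,
    "double": 8,
}


def counter_size(data_type):
    """Size in bytes of a counter value, looked up from its data type."""
    return COUNTER_TYPE_SIZES[data_type]
```

Because the size is a pure function of the type, keeping a separate size field per counter would only duplicate information.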
* i965: perf: add mdapi pipeline statistics queries on gen10/11 (Lionel Landwerlin, 2019-04-17, 1 file, -1/+9)
| | | | | | | | | The Gen10+ expected format adds an additional counter which we can't disclose yet. We can still make the size of the expected query result match. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* i965: move mdapi guid into intel/perf (Lionel Landwerlin, 2019-04-17, 1 file, -2/+1)
| | | | | | | One more thing we want to share between the different APIs. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* i965: move mdapi result data format to intel/perf (Lionel Landwerlin, 2019-04-17, 3 files, -96/+10)
| | | | | | | We want to reuse this in Anv. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* i965: move brw_timebase_scale to device info (Lionel Landwerlin, 2019-04-17, 5 files, -19/+15)
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* i965: move OA accumulation code to intel/perf (Lionel Landwerlin, 2019-04-17, 3 files, -167/+45)
| | | | | | | We'll want to reuse this in our Vulkan extension. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* i965: move mdapi data structure to intel/perf (Lionel Landwerlin, 2019-04-17, 1 file, -96/+7)
| | | | | | | We'll want to reuse those structures later on. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* i965: extract performance query metrics (Lionel Landwerlin, 2019-04-17, 23 files, -148117/+206)
| | | | | | | | | | We would like to reuse performance query metrics in other APIs. Let's make the query code dealing with the processing of raw counters into human readable values API agnostic. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: store device revision in gen_device_info (Lionel Landwerlin, 2019-04-17, 3 files, -6/+4)
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move program key debugging to the compiler. (Kenneth Graunke, 2019-04-16, 9 files, -283/+36)
| | | | | | | | | | | | | | | | | | | The i965 driver has a bunch of code to compare two sets of program keys and print out the differences. This can be useful for debugging why a shader needed to be recompiled on the fly due to non-orthogonal state dependencies. anv doesn't do recompiles, so we didn't need to share this in the past - but I'd like to use it in iris. This moves the bulk of the code to the compiler where it can be reused. To make that possible, we need to decouple it from i965 - we can't get at the brw program cache directly, nor use brw_context to print things. Instead, we use compiler->shader_perf_log(), and simply pass in keys. We put all of this debugging code in brw_debug_recompile.c, and only export a single function, for simplicity. I also tidied the code a bit while moving it, now that it all lives in one file. Reviewed-by: Jordan Justen <[email protected]>
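The idea of comparing two program keys and reporting the differences through a log callback can be sketched as below. This is a hypothetical Python model: ProgKey, its fields and debug_recompile are invented for illustration, and the log callback merely stands in for the role the message gives to compiler->shader_perf_log().

```python
from dataclasses import dataclass, fields


@dataclass
class ProgKey:
    """Hypothetical program key; the field names are illustrative only."""
    program_string_id: int
    nr_color_regions: int
    alpha_to_coverage: bool


def debug_recompile(log, old_key, new_key):
    """Report every field that differs between two keys via the log callback."""
    found = False
    for field in fields(old_key):
        old = getattr(old_key, field.name)
        new = getattr(new_key, field.name)
        if old != new:
            log(f"  {field.name} {old}->{new}")
            found = True
    if not found:
        log("  something else")
    return found


messages = []
changed = debug_recompile(messages.append,
                          ProgKey(1, 1, False),
                          ProgKey(1, 2, False))
```

Passing only the keys and a callback, rather than a driver context, is what lets such a helper live outside any one driver.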
* Delete autotools (Dylan Baker, 2019-04-15, 10 files, -655/+0)
| | | | | | | | | | Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Acked-by: Marek Olšák <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Matt Turner <[email protected]>
* intel: Emit 3DSTATE_VF_STATISTICS dynamically (Kenneth Graunke, 2019-04-14, 2 files, -6/+24)
| | | | | | | | | | | | | | | | | | | | | Pipeline statistics queries should not count BLORP's rectangles. (23) How do operations like Clear, TexSubImage, etc. affect the results of the newly introduced queries? DISCUSSION: Implementations might require "helper" rendering commands be issued to implement certain operations like Clear, TexSubImage, etc. RESOLVED: They don't. Only application submitted rendering commands should have an effect on the results of the queries. Piglit's arb_pipeline_statistics_query-vert_adj exposes this bug when the driver is hacked to always perform glBufferData via a GPU staging copy (for debugging purposes). Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* glsl/nir: add support for lowering bindless images_derefs (Karol Herbst, 2019-04-12, 1 file, -1/+1)
| | | | | | | | | | | v2: handle atomics as well; make use of nir_rewrite_image_intrinsic. v3: remove call to nir_remove_dead_derefs. v4: (Timothy Arceri) don't actually call lowering yet. Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (v3) Reviewed-by: Marek Olšák <[email protected]>
* nir: move brw_nir_rewrite_image_intrinsic into common code (Karol Herbst, 2019-04-12, 1 file, -1/+1)
| | | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* intel/common: move gen_debug to intel/dev (Mark Janes, 2019-04-10, 4 files, -4/+4)
| | | | | | | | | libintel_common depends on libintel_compiler, but it contains debug functionality that is needed by libintel_compiler. Break the circular dependency by moving gen_debug files to libintel_dev. Suggested-by: Kenneth Graunke <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Advertise NV_compute_shader_derivatives (Caio Marcelo de Oliveira Filho, 2019-04-08, 1 file, -0/+1)
| | | | Reviewed-by: Ian Romanick <[email protected]>
* intel: add dependency on genxml generated files (Lionel Landwerlin, 2019-04-08, 1 file, -1/+1)
| | | | | | | | | | Drivers using genxml will start compilation before generated files are created, so add a dependency on them. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]> Cc: [email protected]
* meson: strip rpath from megadrivers (Eric Engestrom, 2019-04-01, 1 file, -0/+3)
| | | | | | | | | | More specifically, use the library file that has been post-processed by Meson when creating the hardlinks. Bugs: https://bugs.freedesktop.org/show_bug.cgi?id=108766 Fixes: 3218056e0eb375eeda47 "meson: Build i965 and dri stack" Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* i965: perf: update render basic configs for big core gen9/gen10 (Lionel Landwerlin, 2019-04-01, 8 files, -23/+24)
| | | | | | | | | This update allows an MI_LRI to trigger an OA report write in the global OA buffer. This isn't really useful for us; we just keep close to the internal public configs. Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965: perf: add ring busyness metric for cfl gt2 (Lionel Landwerlin, 2019-04-01, 1 file, -1/+165)
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965: perf: enable Icelake metrics (Lionel Landwerlin, 2019-03-31, 3 files, -3/+11)
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965: perf: add Icelake metrics (Lionel Landwerlin, 2019-03-31, 1 file, -0/+11899)
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965: perf: sklgt2: drop programming of an unused NOA register (Lionel Landwerlin, 2019-03-31, 1 file, -11/+6)
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>