path: root/src
Commit message | Author | Age | Files | Lines
* swr/rast: x86 autogenerated macro work | George Kyriazis | 2018-01-19 | 4 | -14/+15
  Add name argument to x86 autogenerated macros.
  Add useful variable names for DCL_inputVec implementation.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Shorten some filenames | George Kyriazis | 2018-01-19 | 2 | -2/+2
  in shader and fetch dump files
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: work supporting optimizations in Debug builds. | George Kyriazis | 2018-01-19 | 2 | -9/+23
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add debugging type support for function types. | George Kyriazis | 2018-01-19 | 2 | -0/+21
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Shader debugging work | George Kyriazis | 2018-01-19 | 1 | -0/+6
  - Move debug .ll files to JIT_CACHE_DIR
  - Don't link against jitter SRGBLut table, add global data to shader that needs it.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Debug Symbols work | George Kyriazis | 2018-01-19 | 4 | -19/+88
  Added support for Fetch / Sample / LD functions
  Added DLL link to JitCache implementation
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Initial work for debugging support. | George Kyriazis | 2018-01-19 | 6 | -16/+191
  Adds ability to step into jitted llvm IR in Visual Studio.
  - Updated llvm type generation script to also generate corresponding debug types.
  - New module pass inserts debug metadata into the IR for each function
  Disabled by default.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add private state parameter in fetcher | George Kyriazis | 2018-01-19 | 5 | -29/+41
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Added missing define for Linux/gcc | George Kyriazis | 2018-01-19 | 1 | -0/+1
  + ZeroMemory() macro definition for non-win32 compilation in common/os.h
  Reviewed-by: Bruce Cherniak <[email protected]>
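The log does not show the definition that was added to common/os.h. A minimal sketch of such a fallback, assuming it maps onto `memset` (the struct and the `reset_state` helper are hypothetical and only there to exercise the macro):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed fallback: the Windows headers provide ZeroMemory(), so define it
 * only when nothing else has. The real definition in common/os.h is not
 * shown in this log. */
#ifndef ZeroMemory
#define ZeroMemory(dst, size) memset((dst), 0, (size))
#endif

/* Illustrative struct, not taken from the SWR sources. */
struct example_state {
    int   width;
    int   height;
    float depth;
};

/* Clears a state struct with the same call on every platform. */
static void reset_state(struct example_state *state)
{
    assert(state != NULL);
    ZeroMemory(state, sizeof(*state));
}
```

Guarding on the macro name rather than on the platform keeps the sketch harmless on Windows builds where the SDK definition is already visible.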
* swr/rast: Fix one more invalid object format for windows. | George Kyriazis | 2018-01-19 | 1 | -1/+1
  Reviewed-by: Bruce Cherniak <[email protected]>
* radv: Always re-emit the sample position offset user SGPR. | Bas Nieuwenhuizen | 2018-01-19 | 1 | -17/+17
  The user SGPR location can change between pipelines, so we need to emit it again to the potentially changed SGPR index.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: emit pa_sc_mode_cntl_0 with multisample state. | Bas Nieuwenhuizen | 2018-01-19 | 2 | -3/+4
  We don't have the meta kludge with 0 viewports anymore, so we can always enable them.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* i965: Avoid problems from referencing orphaned BOs after growing. | Kenneth Graunke | 2018-01-19 | 2 | -24/+105
  Growing the batch/state buffer is a lot more dangerous than I thought.
  A number of places emit multiple state buffer sections, and then write data to the returned pointer, or save a pointer to brw->batch.state.bo and then use it in relocations. If each call can grow, this can result in stale map references or stale BO pointers. Furthermore, fences refer to the old batch BO, and that reference needs to continue working.
  To avoid these woes, we avoid ever swapping the brw->batch.*.bo pointer, instead exchanging the brw_bo structures in place. That way, stale BO references are fine - the GEM handle changes, but the brw_bo pointer doesn't. We also defer the memcpy until a quiescent point, so callers can write to the returned pointer - which may be in either BO - and we'll sort it out and combine the two properly in the end.
  v2/v3:
  - Handle stale pointers in the shadow copy case, where realloc may or may not move our shadow copy to a new address.
  - Track the partial map explicitly, to avoid problems with buffer reuse where multiple map modes exist (caught by Chris Wilson).
  v4:
  - Don't use realloc in the CPU shadow case, it isn't safe.
  Fixes: 2dfc119f22f257082ab0 "i965: Grow the batch/state buffers if we need space and can't flush."
  Reviewed-by: Iago Toral Quiroga <[email protected]> [v3]
  Reviewed-by: Chris Wilson <[email protected]>
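The "exchange the structures in place" idea can be illustrated in isolation. This is only a sketch under simplifying assumptions: `fake_bo` is a hypothetical two-field stand-in for `brw_bo`, and the deferred memcpy and map tracking described in the commit are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer object; the real brw_bo has many
 * more fields. */
struct fake_bo {
    uint32_t gem_handle;
    uint64_t size;
};

/* Exchange the contents of two BO structs. A pointer to 'a' taken before
 * the call still points at a live struct afterwards, but it now sees the
 * handle and size that used to belong to 'b'. */
static void swap_bo_contents(struct fake_bo *a, struct fake_bo *b)
{
    assert(a != NULL && b != NULL);
    struct fake_bo tmp = *a;
    *a = *b;
    *b = tmp;
}
```

Because the address of the struct never changes, code that cached the pointer before the buffer grew keeps working, which is the property the commit message relies on.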
* i965: Rename 'aux' to 'prog_data' in program cache. | Kenneth Graunke | 2018-01-19 | 1 | -15/+16
  'aux' is a very generic name, suggesting it can be a bunch of things. However, it's always the brw_*_prog_data structure. So, call it that.
  Reviewed-by: Iago Toral Quiroga <[email protected]>
* swr: allow a single swr architecture to be builtin | Chuck Atkins | 2018-01-19 | 1 | -35/+49
  Part 2 of 2 (part 1 is autoconf changes, part 2 is C++ changes)
  When only a single SWR architecture is being used, this allows that architecture to be builtin rather than as a separate libswrARCH.so that gets loaded via dlopen. Since there are now several different code paths for each detected CPU architecture, the log output is also adjusted to convey where the backend is getting loaded from.
  This allows SWR to be used for static mesa builds which are still important for large HPC environments where shared libraries can impose unacceptable application startup times as hundreds of thousands of copies of the libs are loaded from a shared parallel filesystem.
  Based on an initial implementation by Tim Rowley.
  v2: Refactor repetitive preprocessor checks to reduce code duplication
  v3: Formatting changes per Bruce C. Also delay screen creation until end to avoid leaks when failure conditions are hit.
  Signed-off-by: Chuck Atkins <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
  CC: Tim Rowley <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: (autoconf) allow a single swr architecture to be builtin | Chuck Atkins | 2018-01-19 | 1 | -11/+39
  Part 1 of 2 (part 1 is autoconf changes, part 2 is C++ changes)
  When only a single SWR architecture is being used, this allows that architecture to be builtin rather than as a separate libswrARCH.so that gets loaded via dlopen. Since there are now several different code paths for each detected CPU architecture, the log output is also adjusted to convey where the backend is getting loaded from.
  This allows SWR to be used for static mesa builds which are still important for large HPC environments where shared libraries can impose unacceptable application startup times as hundreds of thousands of copies of the libs are loaded from a shared parallel filesystem.
  Based on an initial implementation by Tim Rowley.
  v2: Fix comment placement pointed out by Bruce C.
  Signed-off-by: Chuck Atkins <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
  CC: Tim Rowley <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: fix clang 5 null cast warning | Greg V | 2018-01-19 | 1 | -3/+3
  Reviewed-by: Emil Velikov <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* mesa/program: Fix -Wunused-param warning | Gert Wollny | 2018-01-19 | 4 | -6/+4
  v2: Don't annotate, but remove the unused ctx parameter
  Signed-off-by: Gert Wollny <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* mesa/program/prog_execute.c: Silence -Wunused-param | Gert Wollny | 2018-01-19 | 1 | -6/+3
  v2: Don't annotate, but remove the unused ctx parameter
  Signed-off-by: Gert Wollny <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* mesa: Make numSamples an unsigned int | Gert Wollny | 2018-01-19 | 6 | -8/+8
  As a followup to the previous patch, propagate the change of numSamples from int to unsigned to gl_config::samples and consequently fix some -Wsign-compare warnings.
  Signed-off-by: Gert Wollny <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* gallium: Make (num_)samples an unsigned int | Gert Wollny | 2018-01-19 | 2 | -2/+6
  According to ARB_multisample, num_samples is a non-negative integer. Consequently, define it as such and fail in glx/choose_visual if a negative number is given.
  v2: split patch into gallium and mesa part
  Signed-off-by: Gert Wollny <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* st/vdpau: release held lock in error path | Grazvydas Ignotas | 2018-01-19 | 1 | -1/+3
  Signed-off-by: Grazvydas Ignotas <[email protected]>
  Reviewed-by: Christian König <[email protected]>
  Cc: [email protected]
* anv: avoid segmentation fault due to vk_error() | Samuel Iglesias Gonsálvez | 2018-01-19 | 1 | -8/+10
  vk_error() is a macro that calls __vk_errorf() with instance == NULL. Then, __vk_errorf() passes a pointer to instance->debug_report_callbacks to vk_debug_error(), which segfaults as this pointer is invalid but not NULL.
  Fixes: e5b1bd6ab8 "vulkan: move anv VK_EXT_debug_report implementation to common code."
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Reviewed-by: Tapani Pälli <[email protected]>
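Why the pointer ends up "invalid but not NULL" can be shown without invoking undefined behaviour. The sketch below uses a hypothetical `fake_instance` layout and computes the member address with `offsetof`; the guarded accessor is one possible remedy and is not necessarily what the actual patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the real instance struct differs. */
struct fake_instance {
    int api_version;
    int debug_report_callbacks;   /* stand-in for the callback list */
};

/* Address that "&instance->debug_report_callbacks" would name if the
 * instance lived at 'base'. With base == 0 (a NULL instance) the result
 * is just the member offset: not NULL, yet not a usable address. */
static uintptr_t member_address(uintptr_t base)
{
    return base + offsetof(struct fake_instance, debug_report_callbacks);
}

/* Guarded accessor: test the instance before forming the member pointer,
 * so callers can rely on a plain NULL check. */
static int *get_callbacks(struct fake_instance *instance)
{
    return instance ? &instance->debug_report_callbacks : NULL;
}
```

A callee that only checks its argument against NULL would accept the offset-valued pointer and crash on the first dereference, which matches the failure described in the commit message.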
* ac/nir: Fix vector extraction if source vector has >4 elements. | Bas Nieuwenhuizen | 2018-01-19 | 1 | -16/+32
  v2: Add forgotten argument and start offset.
  Fixes: 91074bb11bda "radv/ac: Implement Float64 SSBO stores."
  Tested-by: Timothy Arceri <[email protected]>
  Acked-by: Timothy Arceri <[email protected]>
* ac/nir: Use correct 32-bit component writemask for 64-bit SSBO stores. | Bas Nieuwenhuizen | 2018-01-19 | 1 | -9/+13
  Fixes: 91074bb11bda "radv/ac: Implement Float64 SSBO stores."
  Tested-by: Timothy Arceri <[email protected]>
  Acked-by: Timothy Arceri <[email protected]>
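The log gives only the title of this fix, so the following is a generic illustration rather than the patch itself: when a store of 64-bit components is carried out as 32-bit writes, each enabled 64-bit component has to enable two adjacent 32-bit components. The function name is hypothetical.

```c
#include <assert.h>

/* Expand a writemask over up to four 64-bit components into the
 * equivalent mask over 32-bit components: bit i of the input becomes
 * bits 2*i and 2*i+1 of the output. */
static unsigned widen_writemask_64_to_32(unsigned mask64)
{
    unsigned mask32 = 0;

    assert(mask64 <= 0xFu);
    for (unsigned i = 0; i < 4; i++) {
        if (mask64 & (1u << i))
            mask32 |= 0x3u << (2 * i);
    }
    return mask32;
}
```

For example, a mask selecting 64-bit components 0 and 2 (`0x5`) widens to `0x33`.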
* ac/nir: Fix TCS output LDS offsets. | Bas Nieuwenhuizen | 2018-01-19 | 1 | -5/+6
  When a channel was not set we also did not increase the LDS address, while that obviously should happen.
  The output loading code was inadvertently fixed which resulted in a mismatch causing the SaschaWillems tessellation demo to result in corrupt rendering.
  Fixes: 7898eb9a60 "ac: rework load_tcs_{inputs,outputs}"
  Reviewed-by: Dave Airlie <[email protected]>
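The addressing rule in this fix (skip the store for a disabled channel, but still advance past its slot) can be sketched with a plain array standing in for LDS. All names here are hypothetical; the real code emits LLVM IR rather than writing memory directly.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CHANNELS 4

/* Store the enabled channels of a vec4 into a flat array that stands in
 * for LDS. The slot index advances for every channel, enabled or not, so
 * channel i always lands at base + i and matches what a loader expects. */
static void store_channels(uint32_t *lds, unsigned base,
                           const uint32_t values[NUM_CHANNELS],
                           unsigned writemask)
{
    assert(lds != 0);
    for (unsigned chan = 0; chan < NUM_CHANNELS; chan++) {
        unsigned slot = base + chan;      /* advances unconditionally */

        if (!(writemask & (1u << chan)))
            continue;                     /* skip the store, not the slot */
        lds[slot] = values[chan];
    }
}
```

If the slot were only advanced after an actual store, a writemask of `0x5` would pack channel 2 directly behind channel 0, and a loader reading at `base + 2` would see the wrong value.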
* radv: Use correct bindings for inputRate in key generation. | Bas Nieuwenhuizen | 2018-01-19 | 1 | -1/+7
  The bindings also have an index field.
  Fixes: 49d035122e "radv: Add single pipeline cache key."
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104677
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Implement VK_ANDROID_native_buffer. | Bas Nieuwenhuizen | 2018-01-19 | 7 | -4/+407
  Passes dEQP-VK.api.smoke.* and dEQP-VK.wsi.android.* with android-cts-7.1_r12.
  Unlike the initial anv implementation this does use syncobjs instead of waiting on the CPU.
  This is missing meson build coverage for now.
  One possible todo is that linux 4.15 now has a syscall that allows us to export amdgpu fence to a sync_file, which allows us not to force all fences and semaphores to use syncobjs. However, I had trouble with my kernel crashing regularly with NULL pointers, and I'm not sure how beneficial it is in the first place given that intel uses syncobjs for all fences if available.
  Reviewed-by: Dave Airlie <[email protected]>
* radv: Add create image flag to not use DCC/CMASK. | Bas Nieuwenhuizen | 2018-01-19 | 2 | -19/+25
  If we import an image, we might not have space in the buffer for CMASK, even though it is compatible.
  Reviewed-by: Dave Airlie <[email protected]>
* radv: Generate VK_ANDROID_native_buffer. | Bas Nieuwenhuizen | 2018-01-19 | 3 | -2/+9
  Reviewed-by: Dave Airlie <[email protected]>
* radv: Replace an assert with unreachable. | Bas Nieuwenhuizen | 2018-01-19 | 1 | -1/+1
  Otherwise we get uninitialized variable warnings for es_vgpr_comp_cnt.
  Reviewed-by: Samuel Pitoiset <[email protected]>
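The reason an assert is not enough here: with `NDEBUG` defined, `assert(0)` expands to nothing, so the compiler sees a path on which the variable is never assigned. A hedged sketch of the pattern follows; `EXAMPLE_UNREACHABLE`, the enum, and the returned values are all illustrative and not taken from radv.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative macro: assert in debug builds, and in all builds tell the
 * compiler that control never continues past this point. */
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_UNREACHABLE(msg) \
    do { assert(!msg); __builtin_unreachable(); } while (0)
#else
#define EXAMPLE_UNREACHABLE(msg) \
    do { assert(!msg); abort(); } while (0)
#endif

enum example_stage { STAGE_VERTEX, STAGE_TESS_EVAL };

/* The values returned are placeholders for the sake of the example. */
static unsigned component_count(enum example_stage stage)
{
    unsigned count;

    switch (stage) {
    case STAGE_VERTEX:
        count = 1;
        break;
    case STAGE_TESS_EVAL:
        count = 3;
        break;
    default:
        /* A bare assert(0) vanishes under NDEBUG and would leave a path
         * on which 'count' is returned uninitialized. */
        EXAMPLE_UNREACHABLE("unhandled shader stage");
    }
    return count;
}
```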
* radv: Remove DCC check on CS resolve dst image. | Bas Nieuwenhuizen | 2018-01-19 | 1 | -3/+0
  Gives a warning when the assert is disabled, and not even necessarily true.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* gallivm: support avx512 (16x32) in interleave2_half | George Kyriazis | 2018-01-18 | 1 | -2/+38
  lp_build_interleave2_half was not doing the right thing for avx512-style 16-wide loads.
  This path is hit in the swr driver with a 16-wide vertex shader. It is called from lp_build_transpose_aos, when doing texel fetches and the fetched data needs to be transposed to one component per output register.
  Special-case the post-load swizzle operations for avx512 16x32 (16-wide 32-bit values) so that we move the xyzw components correctly to the outputs.
  Reviewed-by: Roland Scheidegger <[email protected]>
* vbo: fix VBO optimization regression | Brian Paul | 2018-01-18 | 2 | -4/+7
  The optimization in change 8e4efdc895ea ("vbo: optimize some display list drawing") missed the loopback case. This is used when the glBegin/End primitive doesn't have a uniform set of vertex attributes. The new Piglit gl-1.0-dlist-materials test hits this.
  So check the aligned_vertex_buffer_offset(list) value and adjust the buffer offset accordingly. We also need to remove the 'start == 0' assertion in the loopback code since it no longer applies.
  Reviewed-by: Roland Scheidegger <[email protected]>
* meson: ensure that xmlpool_options.h is generated for targets that need it | Dylan Baker | 2018-01-18 | 3 | -12/+12
  Currently a couple of gallium targets race with xmlpool_options.h being generated, don't do that.
  Signed-off-by: Dylan Baker <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* ac: fix visit_ssa_undef() for doubles | Timothy Arceri | 2018-01-19 | 1 | -2/+3
  V2: use LLVMIntTypeInContext()
  Fixes: f4e499ec7914 "radv: add initial non-conformant radv vulkan driver"
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: account for view index in the user sgpr allocation. | Dave Airlie | 2018-01-18 | 1 | -8/+34
  The view index user sgpr wasn't being accounted for properly. This refactors out the code to decide if it's required and then uses that info to account for it.
  Fixes: 180c1b924e (ac/nir: Add shader support for multiviews.)
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: enable ARB_enhanced_layouts | Dave Airlie | 2018-01-19 | 1 | -1/+1
  Only one piglit test fails, sso-vs-gs-fs-array-interleave.
  There are also 3 failing tests that use ssbo without checking sizes, but those are test bugs.
  Reviewed-by: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* intel: Future-proof ring names for aubinator_error_decode | Chris Wilson | 2018-01-18 | 1 | -24/+98
  The kernel is moving to a $class$instance naming scheme in preparation for accommodating more rings in the future in a consistent manner. It is already using the naming scheme internally, and now we are looking at updating some soft-ABI such as the error state to use the new naming scheme. This of course means we need to teach aubinator_error_decode how to map both sets of ring names onto its register maps.
  Signed-off-by: Chris Wilson <[email protected]>
  Cc: Michel Thierry <[email protected]>
  Cc: Michal Wajdeczko <[email protected]>
  Cc: Tvrtko Ursulin <[email protected]>
  Cc: Lionel Landwerlin <[email protected]>
  Cc: Kenneth Graunke <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Michel Thierry <[email protected]>
* i965: Bind null render targets for shadow sampling + color. | Kenneth Graunke | 2018-01-18 | 1 | -1/+32
  Portal 2 appears to bind RGBA8888_UNORM textures to a sampler2DShadow, and calls shadow2D() on it. This causes undefined behavior in OpenGL. Unfortunately, our sampler appears to hang in this scenario, which is not acceptable. Just give them a null surface instead, which returns all zeroes.
  Fixes GPU hangs in Portal 2 on Kabylake.
  Huge thanks to Jason Ekstrand for noticing this crazy behavior while sifting through crash dumps.
  Cc: [email protected]
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104487
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv/query: implement multiview interactions | Iago Toral Quiroga | 2018-01-18 | 1 | -0/+54
  From the Vulkan spec with KHX extensions:
  "If queries are used while executing a render pass instance that has multiview enabled, the query uses N consecutive query indices in the query pool (starting at query) where N is the number of bits set in the view mask in the subpass the query is used in. How the numerical results of the query are distributed among the queries is implementation-dependent. For example, some implementations may write each view's results to a distinct query, while other implementations may write the total result to the first query and write zero to the other queries. However, the sum of the results in all the queries must accurately reflect the total result of the query summed over all views. Applications can sum the results from all the queries to compute the total result."
  In our case we only really emit a single query (in the first query index) that stores the aggregated result for all views, but we still need to manage availability for all the other query indices involved, even if we don't actually use them.
  This is relevant when clients call vkGetQueryPoolResults and pass all N queries to retrieve the results. In that scenario, without this patch, we will never see queries other than the first being available since we never emit them.
  v2: we need the same treatment for timestamp queries.
  v3 (Jason):
  - Better an if instead of an early return.
  - We can't write to this memory in the CPU, we should use MI_STORE_DATA_IMM and emit_query_availability (Jason).
  v4 (Jason):
  - No need to take the value to write as parameter, just hard code it to 0.
  Fixes test failures in some work-in-progress CTS multiview+query tests.
  Reviewed-by: Jason Ekstrand <[email protected]>
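The bookkeeping described in this commit (one aggregated result in the first slot, the remaining N-1 slots marked available with a zero result) can be modelled on the CPU. This is a sketch only: the driver does this with GPU commands, and `fake_query_slot` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical query slot: an availability flag plus one result. */
struct fake_query_slot {
    uint64_t available;
    uint64_t result;
};

/* N in the spec text: the number of bits set in the subpass view mask. */
static unsigned view_count(uint32_t view_mask)
{
    unsigned n = 0;

    while (view_mask) {
        view_mask &= view_mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Write the aggregated result to the first slot, then mark the other
 * N-1 slots available with a zero result so that the sum over all N
 * slots still equals the total. */
static void end_multiview_query(struct fake_query_slot *pool, unsigned first,
                                uint32_t view_mask, uint64_t total)
{
    unsigned n = view_count(view_mask);

    assert(pool != 0);
    pool[first].result = total;
    pool[first].available = 1;
    for (unsigned i = 1; i < n; i++) {
        pool[first + i].result = 0;
        pool[first + i].available = 1;
    }
}

/* What an application may do: sum the N consecutive results. */
static uint64_t sum_results(const struct fake_query_slot *pool,
                            unsigned first, unsigned count)
{
    uint64_t sum = 0;

    for (unsigned i = 0; i < count; i++)
        sum += pool[first + i].result;
    return sum;
}
```

With a view mask of `0x5` (two views), a query started at index 1 occupies slots 1 and 2; both become available and their results sum to the aggregated total.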
* vc5: add missing files to the tarball | Emil Velikov | 2018-01-18 | 1 | -0/+5
  Signed-off-by: Emil Velikov <[email protected]>
* broadcom: add missing headers to the tarball | Emil Velikov | 2018-01-18 | 1 | -2/+5
  Signed-off-by: Emil Velikov <[email protected]>
* i965/screen: Allow drirc to set 'allow_rgb10_configs' again. | Mario Kleiner | 2018-01-18 | 1 | -1/+6
  Since setup of ALLOW_RGB10_CONFIGS was moved to i965's own brw_config_options.xml, this was hard-coded to false and could not be overridden by drirc.
  Add some parsing into i965's private screen->optionCache to enable drirc again.
  Fixes: b391fb26df9f1b ("dri_util: remove ALLOW_RGB10_CONFIGS option (v2)")
  Signed-off-by: Mario Kleiner <[email protected]>
  Cc: Marek Olšák <[email protected]>
  Cc: Tapani Pälli <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Tapani Pälli <[email protected]>
* anv: return VK_ERROR_OUT_OF_DEVICE_MEMORY when surface size is out of HW limits | Samuel Iglesias Gonsálvez | 2018-01-18 | 1 | -4/+2
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* ac: tidy up array indexing logic | Timothy Arceri | 2018-01-18 | 1 | -5/+1
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* mesa/st: translate SO info in glsl_to_nir() case | Rob Clark | 2018-01-18 | 1 | -4/+43
  This was handled for VS, but not for GS.
  Fixes for gallium drivers using nir:
  spec@arb_gpu_shader5@arb_gpu_shader5-xfb-streams-without-invocations
  spec@arb_gpu_shader5@arb_gpu_shader5-xfb-streams*
  spec@arb_transform_feedback3@arb_transform_feedback3-ext_interleaved_two_bufs_gs*
  spec@ext_transform_feedback@geometry-shaders-basic
  spec@ext_transform_feedback@* use_gs
  [email protected]@execution@geometry@primitive-id*
  [email protected]@execution@geometry@tri-strip-ordering-with-prim-restart gl_triangle_strip *
  [email protected]@transform-feedback-builtins
  [email protected]@transform-feedback-type-and-size
  v2: don't call st_translate_program_stream_output() for TCS
  v3: drop scanning patch outputs as TCS can't output xfb
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
  Tested-by: Karol Herbst <[email protected]>
* r600/sb: add lds related peepholes. | Dave Airlie | 2018-01-18 | 1 | -1/+8
  if no destination:
  a) convert _RET instructions to non _RET variants if no dst
  b) set src0 to undefined if it's a READ, this should get DCE then.
  Acked-By: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: use different stacks for tracking lds and queue usage. | Dave Airlie | 2018-01-18 | 2 | -3/+24
  The normal ssa renumbering isn't sufficient for LDS queue access. This uses two stacks, one for the lds queue, and one for the lds r/w ordering.
  The LDS oq values are incremented in their use in a linear fashion.
  The LDS rw values are incremented in their definitions and used in the next lds operation to ensure reordering doesn't occur.
  Acked-By: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: schedule LDS ops in appropriate places. | Dave Airlie | 2018-01-18 | 2 | -0/+7
  LDS ops have to be SLOT_X, and LDS OQ reads have read port restrictions, so we try to force those into only having one per slot and avoiding bank swizzles.
  Acked-By: Roland Scheidegger <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>