summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* ac/nir: remove emission of nir_op_fpowSamuel Pitoiset2018-02-221-4/+0
| | | | | | | fpow is now lowered at NIR level. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: enable lowering of fpow to fexp2 and flog2Samuel Pitoiset2018-02-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | There is no fpow in hardware, so it's always lowered somewhere, but it appears that lowering at NIR level is better. Figured while comparing compute shaders between RadeonSI and RADV. Polaris10: Totals from affected shaders: SGPRS: 18936 -> 18904 (-0.17 %) VGPRS: 12240 -> 12220 (-0.16 %) Spilled SGPRs: 2809 -> 2809 (0.00 %) Code Size: 718116 -> 719848 (0.24 %) bytes Max Waves: 1409 -> 1410 (0.07 %) Vega10: Totals from affected shaders: SGPRS: 18392 -> 18392 (0.00 %) VGPRS: 12008 -> 11920 (-0.73 %) Spilled SGPRs: 3001 -> 2981 (-0.67 %) Code Size: 777444 -> 778788 (0.17 %) bytes Max Waves: 1503 -> 1504 (0.07 %) Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: lower fexp2(fmul(flog2(a), 2)) to fmul(a, a)Samuel Pitoiset2018-02-221-0/+2
| | | | | | | | | Similar for the 4 case. Suggested by Bas. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: add is_used_once for fmul(fexp2(a), fexp2(b)) to fexp2(fadd(a, b))Samuel Pitoiset2018-02-221-1/+1
| | | | | | | | Otherwise the code size increases because the original fexp2() instructions can't be deleted. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: set GLC=1 for load/store of coherent/volatile imagesSamuel Pitoiset2018-02-221-3/+4
| | | | | | | | | | This disables persistence accross wavefronts. F1 2017 and Wolfenstein 2 appear to use some coherent images but this patch doesn't seem to change anything. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* spirv: apply memory qualifiers to imagesSamuel Pitoiset2018-02-221-3/+9
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* glx: Properly handle cases where screen creation failsChuck Atkins2018-02-223-30/+33
| | | | | | | | | | | | | This fixes a segfault exposed by a29d63ecf7 which occurs when swr is used on an unsupported architecture. v2: re-work to place logic in xmesa_init_display Signed-off-by: Chuck Atkins <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Cc: [email protected] Cc: George Kyriazis <[email protected]> Cc: Bruce Cherniak <[email protected]>
* anv/blorp: multisample resolve all attachment layersIago Toral Quiroga2018-02-221-11/+20
| | | | | | | | | | | | | | | | | | | | | | | | | We were only resolving the first. v2: - Do not require that the number of layers on dst and src are an exact match, it is okay if the dst has more layers so long as it has at least the same that we are going to resolve. - Do not always resolve array_len layers, we should resolve only from base_array_layer to array_len. v3: - v2 was assuming that array_len represented the total number of layers in the image, but it represents the number of layers starting at the base array ayer. v4: - The number of layers to resolve should be taken from the framebuffer (Nanley). Fixes new CTS tests for multisampled layered rendering: dEQP-VK.renderpass.multisample_resolve.layers_* Reviewed-by: Nanley Chery <[email protected]>
* intel/isl: Improve the documentation on get_default_aux_stateJason Ekstrand2018-02-211-4/+20
| | | | Reviewed-by: Chad Versace <[email protected]>
* i965: Use finish_external instead of make_shareable in setTexBuffer2Jason Ekstrand2018-02-215-2/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The setTexBuffer2 hook from GLX is used to implement glxBindTexImageEXT which has tighter restrictions than just "it's shared". In particular, it says that any rendering to the image while it is bound causes the contents to become undefined. The GLX_EXT_texture_from_pixmap extension provides us with an acquire and release in the form of glXBindTexImageEXT and glXReleaseTexImageEXT. The extension spec says, "Rendering to the drawable while it is bound to a texture will leave the contents of the texture in an undefined state. However, no synchronization between rendering and texturing is done by GLX. It is the application's responsibility to implement any synchronization required." From the EGL 1.4 spec for eglBindTexImage: "After eglBindTexImage is called, the specified surface is no longer available for reading or writing. Any read operation, such as glReadPixels or eglCopyBuffers, which reads values from any of the surface’s color buffers or ancillary buffers will produce indeterminate results. In addition, draw operations that are done to the surface before its color buffer is released from the texture produce indeterminate results In other words, between the bind and release calls, we effectively own those pixels and can assume, so long as we don't crash, that no one else is reading from/writing to the surface. The GLX and EGL implementations call the setTexBuffer2 and releaseTexBuffer function pointers that the driver can hook. In theory, this means that, between BindTexImage and ReleaseTexImage, we own the pixels and it should be safe to track aux usage so we can avoid redundant resolves so long as we start off with the right assumption at the start of the bind/release pair. In practice, however, X11 has slightly different expectations. It's expected that the server may be drawing to the image at the same time as the compositor is texturing from it. In that case, the worst expected outcome should be tearing or partial rendering and not random corruption like we see when rendering races with scanout with CCS. Fortunately, the GEM rules about texture/render dependencies save us here. If X11 submits work to write to a pixmap after the compositor has submitted work to texture from it, GEM inserts a dependency between the compositor and X11. If X11 is using a high-priority context, this will cause the compositor to get a temporarily boosted priority while the batch from X11 is waiting on it. This means that we will never have an actual race between X11 and the compositor so no corruption can happen. Unfortunately, however, this means that X11 will likely be rendering to it between the compositor's BindTexImage and ReleaseTexImage calls. If we want to avoid strange issues, we need to be a bit careful about resolves because we can't really transition it away from the "default" aux usage. The only case where this would practically be a problem is with image_load_store where we have to do a full resolve in order to use the image via the data port. Even there it would only be a problem if batches were split such that X11's rendering happens between the resolve and the use of it as a storage image. However, the chances of this happening are very slim so we just emit a warning and hope for the best. This commit adds a new helper intel_miptree_finish_external which resets all aux state to whatever ISL says is the right worst-case "default" for the given modifier. It feels a little awkward to call it "finish" because it's actually an acquire from the perspective of the driver, but it matches the semantics of the other prepare/finish functions. This new helper gets called in intelSetTexBuffer2 instead of make_shareable. We also add an intelReleaseTexBuffer (we passed NULL to releaseTexBuffer before) and call intel_miptree_prepare_external in it. This probably does nothing most of the time but it means that the prepare/finish calls are properly matched. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965/tex_image: Reference the renderbuffer miptree in setTexBuffer2Jason Ekstrand2018-02-211-14/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | The old code made a new miptree that referenced the same BO as the renderbuffer and just trusted in the memory aliasing to work. There are only two ways in which the new miptree is liable to differ from the one in the renderbuffer and neither of them matter: 1) It may have a different target. The only targets that we can ever see in intelSetTexBuffer2 are GL_TEXTURE_2D and GL_TEXTURE_RECTANGLE and the difference between the two doesn't matter as far as the miptree is concerned; genX(update_sampler_state) only looks at the gl_texture_object and not the miptree when determining whether or not to use normalized coordinates. 2) It may have a very slightly different format. Again, this doesn't matter because we've supported texture views for quite some time so we always look at the gl_texture_object format instead of the miptree format for hardware setup anyway. On the other hand, because we were recreating the miptree, we were using intel_miptree_create_for_bo which doesn't understand modifiers. We really want this function to work without doing a resolve so long as you have modifiers so we need to fix that. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965/tex_image: Pull the tex format from the renderbuffer in intelSetTexBuffer2Jason Ekstrand2018-02-211-15/+19
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965/miptree: Loosen the format check in miptree_match_imageJason Ekstrand2018-02-214-6/+8
| | | | | | | | | | This function is used to determine when we need to re-allocate a miptree. Since we do nothing different in miptree allocation for sRGB vs. linear, loosening this should be safe and may lead to less copying and reallocating in some odd cases. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965/state: Ignore intel_obj->_Format for depth/stencil and ETC2Jason Ekstrand2018-02-211-1/+15
| | | | | | | | | | We're about to start letting the intel_obj->_Format be the "real" texture format. For depth/stencil textures, this may be a combined depth stencil format. For ETC2 on gen7 and earlier, this will be the actual ETC2 format. This makes a bit more GL sense but means we have to be careful in state upload. Reviewed-by: Chad Versace <[email protected]>
* glsl: Parse 'layout' as a token with advanced blending or bindlessKenneth Graunke2018-02-211-0/+2
| | | | | | | | | | | | | Both KHR_blend_equation_advanced and ARB_bindless_texture provide layout qualifiers, and are exposed in compatibility contexts. We need to parse the layout qualifier as a token in order for those to work, but forgot to extend this check. ARB_shader_image_load_store would need a similar treatment, but we don't expose that in legacy OpenGL contexts. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105161 Reviewed-by: Iago Toral Quiroga <[email protected]>
* vulkan/wsi/x11: Consistently update and return swapchain statusDaniel Stone2018-02-211-19/+63
| | | | | | | | | | | | | | | | | Use a helper function for updating the swapchain status. This will be used later to handle VK_SUBOPTIMAL_KHR, where we need to make a non-error status stick to the swapchain until recreation. Instead of direct comparisons to VK_SUCCESS to check for error, test for negative numbers meaning an error status, and positive numbers indicating non-error statuses. v2 (Jason Ekstrand): - Use a pattern of "return x11_swapchain_result(chain, VK_WHATEVER)" - Handle wsi_queue_pull returning VK_TIMEOUT - Call x11_swapchain_result in x11_present_to_x11 Signed-off-by: Daniel Stone <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* vulkan/wsi/x11: Set OUT_OF_DATE if wait_for_special_event failsJason Ekstrand2018-02-211-1/+3
| | | | | | | | | This most likely means we lost our connection to the X server so OUT_OF_DATE is reasonable. This was also the one case where we pushed a UINT32_MAX into the queue without setting an error condition. Cc: [email protected] Reviewed-by: Daniel Stone <[email protected]>
* vulkan/wsi/wayland: Add support for zwp_dmabufDaniel Stone2018-02-214-15/+142
| | | | | | | | zwp_linux_dmabuf_v1 lets us use multi-planar images and buffer modifiers. Signed-off-by: Daniel Stone <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv/image: Add support for modifiers for WSIJason Ekstrand2018-02-214-4/+104
| | | | | | This adds support for the modifiers portion of the WSI "extension". Reviewed-by: Daniel Stone <[email protected]>
* anv/image: Separate modifiers from legacy scanoutJason Ekstrand2018-02-213-38/+21
| | | | | | | | | | | | | | | | | | | For a bit there, we had a bug in i965 where it ignored the tiling of the modifier and used the one from the BO instead. At one point, we though this was best fixed by setting a tiling from Vulkan. However, we've decided that i965 was just doing the wrong thing and have fixed it as of 50485723523d2948a44570ba110f02f726f86a54. The old assumptions also affected the solution we used for legacy scanout in Vulkan. Instead of treating it specially, we just treated it like a modifier like we do in GL. This commit goes back to making it it's own thing so that it's clear in the driver when we're using modifiers and when we're using legacy paths. v2 (Jason Ekstrand): - Rename legacy_scanout to needs_set_tiling Reviewed-by: Daniel Stone <[email protected]>
* vulkan/wsi: Add modifiers support to wsi_create_native_imageJason Ekstrand2018-02-215-19/+177
| | | | | | | | | | This involves extending our fake extension a bit to allow for additional querying and passing of modifier information. The added bits are intended to look a lot like the draft of VK_EXT_image_drm_format_modifier. Once the extension gets finalized, we'll simply transition all of the structs used in wsi_common to the real extension structs. Reviewed-by: Daniel Stone <[email protected]>
* vulkan/wsi: Add drm_modifier member to wsi_imageDaniel Stone2018-02-214-1/+6
| | | | | | | Not yet used anywhere. Signed-off-by: Daniel Stone <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* vulkan/wsi: Add multiple planes to wsi_imageDaniel Stone2018-02-214-20/+31
| | | | | | | Not currently used. Signed-off-by: Daniel Stone <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: remove old assertTimothy Arceri2018-02-221-1/+0
| | | | | | | | | | | | This was originally intended to make sure the remap location was not -1. However the code has changed alot since then, the location is now never set to -1 and we also handle components meaning this old assert has been doing comparisions with the pointer to the array of component data. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105183
* radeonsi/nir: collect more accurate output_usagemaskTimothy Arceri2018-02-221-13/+43
| | | | | | | | | | | Fixes assert in the glsl-1.50-gs-max-output-components piglit test. Note that the double handling will only work for doubles that don't take up multiple slots i.e. double and dvec2. However dual slot double handling is an existing bug which is made no worse by this patch. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: disable GLSL IR loop unrollingTimothy Arceri2018-02-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | Delaying unrolling and allowing NIR to do it instead has been shown to result in better code in drivers such as i965. shader-db results appear to show the same is true for radeonsi. The other advantage is that using NIR unrolling improves compile times significantly. Totals from affected shaders: SGPRS: 9624 -> 10016 (4.07 %) VGPRS: 6800 -> 6464 (-4.94 %) Spilled SGPRs: 0 -> 2 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 359176 -> 332264 (-7.49 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 1355 -> 1432 (5.68 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: fix tess varying loads for doublesTimothy Arceri2018-02-221-2/+2
| | | | | | | | | Fixes the following piglit tests: tests/spec/arb_tessellation_shader/execution/double-array-vs-tcs-tes.shader_test tests/spec/arb_tessellation_shader/execution/double-vs-tcs-tes.shader_test Reviewed-by: Marek Olšák <[email protected]>
* ac/radeonsi: pass type to load_tess_varyings()Timothy Arceri2018-02-224-2/+17
| | | | | | We need this to be able to load 64bit varyings. Reviewed-by: Marek Olšák <[email protected]>
* x11/dri3: Store raw present completion modeDaniel Stone2018-02-212-10/+4
| | | | | | | | | | The DRI3 drawable info struct currently stores a boolean for whether the last completed operation was a flip or not. As we need to track the full completion mode for handling suboptimal returns, change the 'flipping' field to the raw present completion mode from the server. Signed-off-by: Daniel Stone <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* x11/dri3: Don't open-code ARRAY_SIZEDaniel Stone2018-02-212-3/+4
| | | | | | Signed-off-by: Daniel Stone <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* anv: Don't assert that stencil HiZ clears are single-sliceJason Ekstrand2018-02-211-3/+6
| | | | | | | | | It's true for depth HiZ clears because we only have HiZ on single-slice images right now. However, for stencil-only clears there is no such restriction. Tested-by: Rafael Antognolli <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* anv: Only copy clear dwords if we're rendering to the first sliceJason Ekstrand2018-02-211-1/+4
| | | | Reviewed-by: Rafael Antognolli <[email protected]>
* radeonsi: don't flush when si_eliminate_fast_color_clear is no-opMarek Olšák2018-02-211-1/+5
|
* radeonsi: make texture_discard_cmask/eliminate functions non-staticMarek Olšák2018-02-212-11/+13
|
* radeonsi: enable uvd encode for HEVC mainJames Zhu2018-02-211-1/+3
| | | | | | | Enable UVD encode for HEVC main profile Signed-off-by: James Zhu <[email protected]> Reviewed-by: Boyuan Zhang <[email protected]>
* radeonsi:create uvd hevc enc entryJames Zhu2018-02-211-3/+12
| | | | | | | Add UVD hevc encode pipe video codec creation entry Signed-off-by: James Zhu <[email protected]> Reviewed-by: Boyuan Zhang <[email protected]>
* radeon/uvd:add uvd hevc enc functionsJames Zhu2018-02-213-0/+383
| | | | | | | Implement UVD hevc encode functions Signed-off-by: James Zhu <[email protected]> Reviewed-by: Boyuan Zhang <[email protected]>
* radeon/uvd:add uvd hevc enc hw ib implementationJames Zhu2018-02-213-0/+1134
| | | | | | | Implement required IBs for UVD HEVC encode. Signed-off-by: James Zhu <[email protected]> Reviewed-by: Boyuan Zhang <[email protected]>
* radeon/uvd:add uvd hevc enc hw interface headerJames Zhu2018-02-213-0/+471
| | | | | | | Add hevc encode hardware interface for UVD Signed-off-by: James Zhu <[email protected]> Reviewed-by: Boyuan Zhang <[email protected]>
* winsys/amdgpu:add uvd hevc enc support in amdgpu csJames Zhu2018-02-211-0/+6
| | | | | | | Support UVD HEVC encode in amdgpu cs Signed-off-by: James Zhu <[email protected]> Reviewed-by: Boyuan Zhang <[email protected]>
* amd/common:add uvd hevc enc support check in hw queryJames Zhu2018-02-212-1/+12
| | | | | | | Based on amdgpu hardware query information to check if UVD hevc enc support Signed-off-by: James Zhu <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nvir/nvc0: fix legalizing of ld unlock c0[0x10000]Karol Herbst2018-02-211-1/+1
| | | | | | | | | We have to increase the file index also for 0x10000 not just for values greater than 0x10000. Fixes: 37b67db6ae34fb6586d640a7a1b6232f091dd812 Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* ac/nir: add glsl_is_array_image() helperSamuel Pitoiset2018-02-211-23/+18
| | | | | | | For consistency. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: set the DA field when performing atomics on 3D imagesSamuel Pitoiset2018-02-211-1/+2
| | | | | | | This doesn't fix anything known but it should definitely be set. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* i965: Fix compiler warning about write being undefined.Eric Anholt2018-02-201-1/+1
| | | | | | | This looks like it should be protected by the assume() about nr_color_regions, but my compiler warns anyway. Reviewed-by: Matt Turner <[email protected]>
* glsl/tests: Fix a compiler warning about signed/unsigned loop comparison.Eric Anholt2018-02-201-1/+1
| | | | | Fixes: d32956935edf ("glsl: Walk a list of ir_dereference_array to mark array elements as accessed") Reviewed-by: Ian Romanick <[email protected]>
* loader: Fix compiler warnings about truncating the PCI ID path.Eric Anholt2018-02-201-8/+7
| | | | | | | | | | | | | My build was producing: ../src/loader/loader.c:121:67: warning: ‘%1u’ directive output may be truncated writing between 1 and 3 bytes into a region of size 2 [-Wformat-truncation=] and we can avoid this careful calculation by just using asprintf (as we do elsewhere in the file). Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* glsl: Silence warnings in the uniform initializer test about 16-bit typesEric Anholt2018-02-201-0/+9
| | | | | | | | | They should probably get unit tests implemented, but this cleans up a bunch of warnings in my build for now. Fixes: 59f458cd8703 ("glsl: Add 16-bit types") Cc: Eduardo Lima Mitev <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Enable disk shader cache by defaultJordan Justen2018-02-202-3/+1
| | | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* radv: don't send num_tcs_input_cp to sgprs.Dave Airlie2018-02-211-4/+1
| | | | | | | We never use it in the shaders. Reviewed-by: Samuel Pitoiset <[email protected]> Signed-off-by: Dave Airlie <[email protected]>