summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* egl/android: fix segfault within swap_buffersTapani Pälli2017-05-191-1/+6
| | | | | | | | | | | | | | | | | | | | | | Function droid_swap_buffers may get called without dri2_surf->buffer set, in these cases we don't have a back buffer set either. Patch fixes segfault seen with 3DMark that uses android.opengl.GLSurfaceView for rendering it's UI. backtrace: #00 pc 00013f88 /system/lib/egl/libGLES_mesa.so (droid_swap_buffers+104) #01 pc 000117b2 /system/lib/egl/libGLES_mesa.so (dri2_swap_buffers+50) #02 pc 000058b2 /system/lib/egl/libGLES_mesa.so (eglSwapBuffers+386) #03 pc 00011329 /system/lib/libEGL.so (eglSwapBuffersWithDamageKHR+553) #04 pc 000118e7 /system/lib/libEGL.so (eglSwapBuffers+55) #05 pc 000754dc /system/lib/libandroid_runtime.so Note, this is v1 as v2 caused dEQP regressions. Fixes: 2acc69d ("EGL/Android: Add EGL_EXT_buffer_age extension") Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Acked-by: Emil Velikov <[email protected]> Cc: "17.1" <[email protected]>
* egl/wayland: Ensure we get a back bufferDaniel Stone2017-05-191-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 9ca6711faa03 changed the Wayland winsys to only block for the frame callback inside SwapBuffers, rather than get_back_bo. get_back_bo would perform a single non-blocking Wayland event dispatch, to try to find any release events which we had pulled off the wire but not actually processed. The blocking dispatch was moved to SwapBuffers. This removed a guarantee that we would've processed all events inside get_back_bo(), and introduced a failure whereby the server could've sent a buffer release event, but we wouldn't have read it. In clients unconstrained by SwapInterval (rendering ~as fast as possible), which were being displayed directly without composition (buffer release delayed), this could lead to get_back_bo() failing because there were no free buffers available to it. The drawing rightly failed, but this was papered over because of the path in eglSwapBuffers() which attempts to guarantee a BO, in order to support calling SwapBuffers twice in a row with no rendering actually having been performed. Since eglSwapBuffers will perform a blocking dispatch of Wayland events, a buffer release would have arrived by that point, and we could then choose a buffer to post to the server. The effect was that frames were displayed out-of-order, since we grabbed a frame with random past content to display to the compositor. Ideally get_back_bo() failing should store a failure flag inside the surface and cause the next SwapBuffers to fail, but for the meantime, restore the correct behaviour such that get_back_bo() no longer fails. Signed-off-by: Daniel Stone <[email protected]> Reported-by: Eero Tamminen <[email protected]> Acked-by: Pekka Paalanen <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98833 Fixes: 9ca6711faa03 ("Revert "wayland: Block for the frame callback in get_back_bo not dri2_swap_buffers"")
* egl/wayland: Use per-surface event queuesDaniel Stone2017-05-192-27/+68
| | | | | | | | | | | | | | | | | | | | During display initialisation, we need a separate event queue to handle the registry events, which is correctly handled. But we also need separate per-surface event queues to handle swapchain-related events, such as surface frame events and buffer release events. This avoids two surfaces from the same EGLDisplay, both current on separate threads, dispatching each other's events. Create separate per-surface event queues, create wl_surface and wl_drm proxy wrapper objects per surface, so we eliminate the race around sending events to the wrong queue. swrast buffers do not need a dedicated proxy wrapper, as the wl_shm_pool used to create the wl_buffers, being transient, can itself be assigned to a queue. Signed-off-by: Daniel Stone <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Fixes: 36b9976e1f99 ("egl/wayland: Avoid race conditions when on non-main thread") Cc: [email protected]
* egl/wayland: Don't open-code roundtripDaniel Stone2017-05-191-25/+1
| | | | | | | | | | wl_display_roundtrip_queue() exists and can replace roundtrip(). The API was introduced with wayland 1.6, while we currently require 1.11. Signed-off-by: Daniel Stone <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Cc: [email protected]
* vulkan/wsi/wayland: Use proxy wrappers for swapchainDaniel Stone2017-05-191-16/+36
| | | | | | | | | | | | | | Though most swapchain operations used a queue, they were racy in that the object was created with the queue only set later, meaning that its event could potentially be dispatched from the default queue in between these two steps. Use proxy wrappers to avoid this race, also assigning wl_buffers created for the swapchain to the event queue. Signed-off-by: Daniel Stone <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Cc: [email protected]
* vulkan/wsi/wayland: Use per-display event queueDaniel Stone2017-05-191-12/+32
| | | | | | | | | | Calling random callbacks on the display's event queue is hostile, as we may call into client code when it least expects it. Create our own event queue, one per wsi_wl_display, and use that for the registry. Signed-off-by: Daniel Stone <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Cc: [email protected]
* vulkan/wsi/wayland: Remove roundtrip when creating imageDaniel Stone2017-05-191-1/+0
| | | | | | | | | | | | | There's no need to call wl_display_roundtrip() after trying to create a buffer through wl_drm; if it succeeds then everything is fine, and if it fails, then we get a fatal protocol error so can't recover anyway. Additionally, doing a roundtrip on the default / main application queue, is destructive anyway, so would need to be its own queue. Signed-off-by: Daniel Stone <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Cc: [email protected]
* vulkan: Fix Wayland uninitialised registryDaniel Stone2017-05-191-4/+5
| | | | | | | | | Untangle the exit cleanup paths so we don't try to use the registry variable before it's been initialised. Signed-off-by: Daniel Stone <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Cc: [email protected]
* i965/formats: Update the three-channel DXT1 mappingsNanley Chery2017-05-182-14/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The procedure for decompressing an opaque DXT1 OpenGL format is dependant on the comparison of two colors stored in the first 32 bits of the compressed block. Here's the specified OpenGL behavior for reference: The RGB color for a texel at location (x,y) in the block is given by: RGB0, if color0 > color1 and code(x,y) == 0 RGB1, if color0 > color1 and code(x,y) == 1 (2*RGB0+RGB1)/3, if color0 > color1 and code(x,y) == 2 (RGB0+2*RGB1)/3, if color0 > color1 and code(x,y) == 3 RGB0, if color0 <= color1 and code(x,y) == 0 RGB1, if color0 <= color1 and code(x,y) == 1 (RGB0+RGB1)/2, if color0 <= color1 and code(x,y) == 2 BLACK, if color0 <= color1 and code(x,y) == 3 The sampling operation performed on an opaque DXT1 Intel format essentially hard-codes the comparison result of the two colors as color0 > color1. This means that the behavior is incompatible with OpenGL. This is stated in the SKL PRM, Vol 5: Memory Views: Opaque Textures (DXT1_RGB) Texture format DXT1_RGB is identical to DXT1, with the exception that the One-bit Alpha encoding is removed. Color 0 and Color 1 are not compared, and the resulting texel color is derived strictly from the Opaque Color Encoding. The alpha channel defaults to 1.0. Programming Note Context: Opaque Textures (DXT1_RGB) The behavior of this format is not compliant with the OGL spec. The opaque and non-opaque DXT1 OpenGL formats are specified to be decoded in exactly the same way except the BLACK value must have a transparent alpha channel in the latter. Use the four-channel BC1 Intel formats with the alpha set to 1 to provide the behavior required by the spec. Note that the alpha is already set to 1 for RGB formats in brw_get_texture_swizzle(). v2: Provide a more detailed commit message (Kenneth Graunke). v3: Ensure the alpha channel is set to 1 for DXT1 formats. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100925 Cc: <[email protected]> Acked-by: Tapani Pälli <[email protected]> (v1) Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Nanley Chery <[email protected]>
* anv/formats: Update the three-channel BC1 mappingsNanley Chery2017-05-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The procedure for decompressing an opaque BC1 Vulkan format is dependant on the comparison of two colors stored in the first 32 bits of the compressed block. Here's the specified OpenGL (and Vulkan) behavior for reference: The RGB color for a texel at location (x,y) in the block is given by: RGB0, if color0 > color1 and code(x,y) == 0 RGB1, if color0 > color1 and code(x,y) == 1 (2*RGB0+RGB1)/3, if color0 > color1 and code(x,y) == 2 (RGB0+2*RGB1)/3, if color0 > color1 and code(x,y) == 3 RGB0, if color0 <= color1 and code(x,y) == 0 RGB1, if color0 <= color1 and code(x,y) == 1 (RGB0+RGB1)/2, if color0 <= color1 and code(x,y) == 2 BLACK, if color0 <= color1 and code(x,y) == 3 The sampling operation performed on an opaque DXT1 Intel format essentially hard-codes the comparison result of the two colors as color0 > color1. This means that the behavior is incompatible with OpenGL and Vulkan. This is stated in the SKL PRM, Vol 5: Memory Views: Opaque Textures (DXT1_RGB) Texture format DXT1_RGB is identical to DXT1, with the exception that the One-bit Alpha encoding is removed. Color 0 and Color 1 are not compared, and the resulting texel color is derived strictly from the Opaque Color Encoding. The alpha channel defaults to 1.0. Programming Note Context: Opaque Textures (DXT1_RGB) The behavior of this format is not compliant with the OGL spec. The opaque and non-opaque BC1 Vulkan formats are specified to be decoded in exactly the same way except the BLACK value must have a transparent alpha channel in the latter. Use the four-channel BC1 Intel formats with the alpha set to 1 to provide the behavior required by the spec. v2 (Kenneth Graunke): - Provide a more detailed commit message. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100925 Cc: <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Nanley Chery <[email protected]>
* anv: Add an option to abort on device lossJason Ekstrand2017-05-181-0/+5
| | | | | | | | | | | | | | | | This is mostly for running in our CI system to prevent dEQP from continuing on to the next test if we get a GPU hang. As it currently stands, dEQP uses the same VkDevice for almost all tests and if one of the tests hangs, we set the anv_device::device_lost flag and report VK_ERROR_DEVICE_LOST for all queue operations from that point forward without sending anything to the GPU. dEQP will happily continue trying to run tests and reporting failures until it eventually gets crash that forces the test runner to start over. This circumvents the problem by just aborting the process if we ever get a GPU hang. Since this is not the recommended behavior most of the time, we hide it behind an environment variable. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Wrap the device lost error in vk_error in QueueSubmitJason Ekstrand2017-05-181-1/+1
| | | | | | | | We weren't wrapping this before because anv_cmd_buffer_execbuf may throw a more meaningful error message. However, we do change the error code into VK_ERROR_DEVICE_LOST, so we should print a new message. Reviewed-by: Lionel Landwerlin <[email protected]>
* radeonsi/gfx9: use CE RAM optimallyMarek Olšák2017-05-182-36/+134
| | | | | | | | | | | | | | On GFX9 with only 4K CE RAM, define the range of slots that will be allocated in CE RAM. All other slots will be uploaded directly. This will switch dynamically according to which slots are used by current shaders. GFX9 CE usage should now be similar to VI instead of being often disabled. Tested on VI by taking the GFX9 CE allocation codepath and setting num_ce_slots = 2 everywhere to get frequent switches between both modes. CE is still disabled on GFX9. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove CE offset alignment restrictionMarek Olšák2017-05-181-2/+1
| | | | | | | This was only needed by LOAD_CONST_RAM, which is now only used to load whole CE. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: only upload (dump to L2) those descriptors that are used by shadersMarek Olšák2017-05-184-24/+117
| | | | | | | This decreases the size of CE RAM dumps to L2, or the size of descriptor uploads without CE. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: record which descriptor slots are used by shadersMarek Olšák2017-05-185-0/+41
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: update si_ce_needed_cs_spaceMarek Olšák2017-05-181-8/+8
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: do only 1 big CE dump at end of IBs and one reload in the preambleMarek Olšák2017-05-185-37/+37
| | | | | | | | A later commit will only upload descriptors used by shaders, so we won't do full dumps anymore, so the only way to have a complete mirror of CE RAM in memory is to do a separate dump after the last draw call. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove early return in si_upload_descriptorsMarek Olšák2017-05-181-3/+0
| | | | | | | | All updates of descriptors_dirty also set dirty_mask, so the return is unnecessary. The next commit will want this function to be executed even if dirty_mask == 0. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clamp indirect index to the number of declared shader resourcesMarek Olšák2017-05-184-4/+15
| | | | | | | We'll do partial uploads of descriptor arrays, so we need to clamp against what shaders declare. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: merge sampler and image descriptor lists into oneMarek Olšák2017-05-186-112/+99
| | | | | | | | | | | | Sampler slots: slot[8], .. slot[39] (ascending) Image slots: slot[7], .. slot[0] (descending) Each image occupies 1/2 of each slot, so there are 16 images in total, therefore the layout is: slot[15], .. slot[0]. (in 1/2 slot increments) Updating image slot 2n+i (i <= 1) also dirties and re-uploads slot 2n+!i. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: merge constant and shader buffers descriptor lists into oneMarek Olšák2017-05-188-132/+152
| | | | | | | | | | Constant buffers: slot[16], .. slot[31] (ascending) Shader buffers: slot[15], .. slot[0] (descending) The idea is that if we have 4 constant buffers and 2 shader buffers, we only have to upload 6 slots. That optimization is left for a later commit. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/u_threaded: add a fast path for unbinding shader buffersMarek Olšák2017-05-181-3/+9
| | | | | Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/u_threaded: add a fast path for unbinding shader imagesMarek Olšák2017-05-181-4/+10
| | | | | Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: silence a valgrind warning in u_threaded_context due to st_draw_vboMarek Olšák2017-05-181-0/+1
| | | | | Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* glsl_to_tgsi: declare all SSBOs and atomics when indirect indexing is usedMarek Olšák2017-05-181-16/+14
| | | | | | | | Only the first array element was declared, so tgsi_shader_info:: shader_buffers_declared didn't match what the shader was using. Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: get the sampler view type from inst->Texture for TG4Samuel Pitoiset2017-05-181-7/+3
| | | | | | | | | This will also magically fix this special lowering for bindless samplers. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi: store the sampler view type directly in the instructionSamuel Pitoiset2017-05-188-23/+49
| | | | | | | | | | | RadeonSI needs to do a special lowering for Gather4 with integer formats, but with bindless samplers we just can't access the index. Instead, store the return type in the instruction like the target. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi: remove some unused OPCODE macrosSamuel Pitoiset2017-05-182-200/+0
| | | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: Make sure module has the correct data layout when pass manager runsTom Stellard2017-05-181-16/+18
| | | | | | | | | | | | | | | | | | | The datalayout for modules was purposely not being set in order to work around the fact that the ExecutionEngine requires that the module's datalayout matches the datalayout of the TargetMachine that the ExecutionEngine is using. When the pass manager runs on a module with no datalayout, it uses the default datalayout which is little-endian. This causes problems on big-endian targets, because some optimizations that are legal on little-endian or illegal on big-endian. To resolve this, we set the datalayout prior to running the pass manager, and then clear it before creating the ExectionEngine. This patch fixes a lot of piglit tests on big-endian ppc64. Cc: [email protected]
* egl: Partially revert 23c86c74, fix eglMakeCurrentChad Versace2017-05-181-19/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes regressions in Android CtsVerifier.apk on Intel Chrome OS devices due to incorrect error handling in eglMakeCurrent. See below on how to confirm the regression is fixed. This partially reverts commit 23c86c74cc450a23848b85cfe914376caede1cdf Author: Chad Versace <[email protected]> Subject: egl: Emit error when EGLSurface is lost The problem with commit 23c86c74 is that, once an EGLSurface became lost, the app could never unbind the bad surface. Each attempt to unbind the bad surface with eglMakeCurrent failed with EGL_BAD_CURRENT_SURFACE. Specificaly, the bad commit added the error handling below. #2 and #3 were right, but #1 was wrong. 1. eglMakeCurrent emits EGL_BAD_CURRENT_SURFACE if the calling thread has unflushed commands and either previous surface is no longer valid. 2. eglMakeCurrent emits EGL_BAD_NATIVE_WINDOW if either new surface is no longer valid. 3. eglSwapBuffers emits EGL_BAD_NATIVE_WINDOW if the swapped surface is no longer valid. Whe I wrote the bad commit, I misunderstood the EGL spec language for #1. The correct behavior is, if I understand correctly now, is below. This patch doesn't implement the correct behavior, though, it just reverts the broken behavior. - Assume a bound EGLSurface is no longer valid. - Assume the bound EGLContext has unflushed commands. - The app calls eglMakeCurrent. The spec requires eglMakeCurrent to implicitly flush. After flushing, eglMakeCurrent emits EGL_BAD_CURRENT_SURFACE and does *not* alter the thread's current bindings. - If the app calls eglMakeCurrent again, and the app inserts no commands into the GL command stream between the two eglMakeCurrent calls, then this second eglMakeCurrent succeeds without emitting an error. How to confirm this fixes the regression: Download android-cts-verifier-7.1_r5-linux_x86-x86.zip from source.android.com, unpack, and `adb install CtsVerifier.apk`. Run test "Projection Cube". Click the Pass button (a green checkmark). Then run test "Projection Widget". Confirm that widgets are visible and that logcat does not complain about eglMakeCurrent failure. Then confirm there are no regressions in the cts-traded module that commit 263243b1 fixed: cts-tf > run cts --skip-preconditions --skip-device-info \ -m CtsCameraTestCases \ -t android.hardware.camera2.cts.RobustnessTest Tested with Chrome OS board "reef". Fixes: 23c86c74 (egl: Emit error when EGLSurface is lost) Acked-by: Tapani Pälli <[email protected]> Cc: "17.1" <[email protected]> Cc: Tomasz Figa <[email protected]> Cc: Nicolas Boichat <[email protected]> Cc: Emil Velikov <[email protected]>
* anv: fix multiview for clear commandsIago Toral Quiroga2017-05-181-0/+41
| | | | | | | | | | | | | | | | | | | | According to the VK_KHX_multiview spec: "Multiview causes all drawing and clear commands in the subpass to behave as if they were broadcast to each view, where each view is represented by one layer of the framebuffer attachments." This adds support for multiview clears, which were missing in the initial implementation. v2 (Jason): - split multiview from regular case - Use for_each_bit() macro Fixes new CTS multiview tests: dEQP-VK.multiview.clear_attachments.* Reviewed-by: Jason Ekstrand <[email protected]>
* ac: add missing extern "C" guardsNicolai Hähnle2017-05-182-0/+16
| | | | Reviewed-by: Marek Olšák <[email protected]>
* ac: add radeon_info::num_{sdma,compute}_ringsNicolai Hähnle2017-05-184-7/+19
| | | | | | Vulkan needs them. Reviewed-by: Marek Olšák <[email protected]>
* ac: add radeon_surf::htile_slice_sizeNicolai Hähnle2017-05-182-0/+6
| | | | | | Vulkan needs it. Reviewed-by: Marek Olšák <[email protected]>
* ac_surface: use radeon_info from ac_gpu_infoNicolai Hähnle2017-05-184-35/+31
| | | | Reviewed-by: Marek Olšák <[email protected]>
* ac/radeonsi: move radeon_info initialization to amd/commonNicolai Hähnle2017-05-187-241/+293
| | | | | | v2: update Android.common.mk (Emil) Reviewed-by: Marek Olšák <[email protected]>
* ac/radeonsi: move struct radeon_info to ac_gpu_info.hNicolai Hähnle2017-05-182-61/+94
| | | | Reviewed-by: Marek Olšák <[email protected]>
* ac/radeonsi: move some aspects of sanity checking to ac_surfaceNicolai Hähnle2017-05-182-16/+33
| | | | Reviewed-by: Marek Olšák <[email protected]>
* ac/radeonsi: add ac_compute_surface to automatically switch gfx6 vs. gfx9Nicolai Hähnle2017-05-183-20/+24
| | | | Reviewed-by: Marek Olšák <[email protected]>
* ac/radeonsi: move the bulk of gfx9_surface_init to ac_surfaceNicolai Hähnle2017-05-183-415/+394
| | | | | | We can now merge the two *_surface_init functions. Reviewed-by: Marek Olšák <[email protected]>
* ac/radeonsi: move the bulk of gfx6_surface_init to ac_surfaceNicolai Hähnle2017-05-183-411/+470
| | | | Reviewed-by: Marek Olšák <[email protected]>
* ac/radeonsi: move amdgpu_addr_create to ac_surfaceNicolai Hähnle2017-05-188-165/+220
| | | | | | | | v2: - update Android.common.mk (Emil) - rebase on top of Raven support Reviewed-by: Marek Olšák <[email protected]> (v1)
* ac/radeonsi: move surface definitions to new header ac_surface.hNicolai Hähnle2017-05-182-147/+179
| | | | Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: remove an incorrect assertionNicolai Hähnle2017-05-181-2/+0
| | | | | | | | | | | There is really no reason why the current DrawBuffer needs to be complete at this point. In particular, the assertion gets hit on the X server side in libglx when running .../piglit/bin/glx-get-current-display-ext -auto (which uses indirect GLX rendering). Fixes: 19b61799e3d0 ("st/mesa: don't cast the incomplete framebufer to st_framebuffer") Reported-by: Michel Dänzer <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* i965/vec4: load dvec3/4 uniforms first in the push constant bufferSamuel Iglesias Gonsálvez2017-05-181-27/+80
| | | | | | | | | | | | | | | | | | | | | | | | Reorder the uniforms to load first the dvec4-aligned variables in the push constant buffer and then push the vec4-aligned ones. It takes into account that the relocated uniforms should be aligned to their channel size. This fixes a bug were the dvec3/4 might be loaded one part on a GRF and the rest in next GRF, so the region parameters to read that could break the HW rules. v2: - Fix broken logic. - Add a comment to explain what should be needed to optimise the usage of the push constant buffer slots, as this patch does not pack the uniforms. v3: - Implemented the push constant buffer usage optimization. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Acked-by: Francisco Jerez <[email protected]>
* i965/vec4: fix swizzle and writemask when loading an uniform with constant ↵Samuel Iglesias Gonsálvez2017-05-181-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | offset It was setting XYWZ swizzle and writemask to all uniforms, no matter if they were a vector or scalar, so this can lead to problems when loading them to the push constant buffer. Moreover, 'shift' calculation was designed to calculate the offset in DWORDS, but it doesn't take into account DFs, so the calculated swizzle for the later ones was wrong. The indirect case is not changed because MOV INDIRECT will write to all components. Added an assert to verify that these uniforms are aligned. v2: - Fix 'shift' calculation (Curro) - Set both swizzle and writemask. - Add assert(shift == 0) for the indirect case. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4/gs: restore the uniform values which was overwritten by failed ↵Samuel Iglesias Gonsálvez2017-05-181-0/+26
| | | | | | | | | | | | | | | | | vec4_gs_visitor execution We are going to add a packing feature to reduce the usage of the push constant buffer. One of the consequences is that 'nr_params' would be modified by vec4_visitor's run call, so we need to restore it if one of them failed before executing the fallback ones. Same thing happens to the uniforms values that would be reordered afterwards. Fixes GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 when the dvec4 alignment and packing patch is applied. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Acked-by: Francisco Jerez <[email protected]>
* vc4: Don't allocate new BOs to avoid synchronization when they're shared.Eric Anholt2017-05-171-1/+2
| | | | | | | If X11 did a software fallback to the entire screen, we would throw out the BO the screen is scanning out from and allocate a new one. Cc: [email protected]
* vc4: Drop pointless indirections around BO import/export.Eric Anholt2017-05-173-69/+49
| | | | | | I've since found them to be more confusing by adding indirections than clarifying by screening off resources from the handle/fd import/export process.