path: root/src
Commit message | Author | Age | Files | Lines
* i965/fs: Fix extract_i8/u8 to a 64-bit destination | Matt Turner | 2017-11-14 | 1 | -2/+23
  The MOV instruction can extract bytes to words/double words, and words/double words to quadwords, but not bytes to quadwords.

  For unsigned bytes to quadwords, we can read them as words, AND off the high byte, and extract to a quadword in one instruction. For signed bytes, we need to first sign extend to a word and then sign extend that word to a quadword.

  Fixes the following test on CHV, BXT, and GLK:
  KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628
  Reviewed-by: Jason Ekstrand <[email protected]>
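The two conversion paths described above can be modelled in plain C. This is a reference model of the data movement only, not EU code; taking a 32-bit source and a `byte` index from 0 to 3 is an assumption made for illustration.

```c
#include <stdint.h>

/* Unsigned path: read the 16-bit word that starts at the requested byte,
 * AND off its high byte, and zero-extend the result to 64 bits. */
static uint64_t extract_u8_to_q(uint32_t src, unsigned byte)
{
   uint16_t word = (uint16_t)(src >> (8 * byte));
   return (uint64_t)(word & 0xffu);
}

/* Signed path: there is no direct byte -> quadword conversion, so sign
 * extend the byte to a word first, then sign extend the word to 64 bits.
 * (The uint8_t -> int8_t cast assumes two's complement, as on GCC/Clang.) */
static int64_t extract_i8_to_q(uint32_t src, unsigned byte)
{
   int16_t word = (int8_t)(uint8_t)(src >> (8 * byte)); /* B -> W */
   return (int64_t)word;                                /* W -> Q */
}
```

For example, with `src = 0x80FF7F01`, byte 3 extracts to 0x80 unsigned but to -128 signed.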
* i965/fs: Split all 32->64-bit MOVs on CHV, BXT, GLK | Matt Turner | 2017-11-14 | 1 | -4/+4
  Fixes the following tests on CHV, BXT, and GLK:
  KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot
  dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115
* swr/rast: Faster emulated simd16 permute | Tim Rowley | 2017-11-14 | 1 | -23/+11
  Speed up the simd16 frontend (the default) on AVX/AVX2 platforms; fixes a performance regression caused by the switch to simdlib.

  Reviewed-by: Bruce Cherniak <[email protected]>
  Cc: [email protected]
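One common way to emulate a 16-lane permute on 8-lane AVX hardware is to permute each 8-wide source half with the low three index bits and then blend on index bit 3. The scalar C model below illustrates that strategy; the commit does not spell out swr's exact instruction sequence, so the `simd16f`/`simd16i` types and helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 16-wide types stored as two 8-wide halves. */
typedef struct { float v[2][8]; } simd16f;
typedef struct { int32_t v[2][8]; } simd16i;

/* Scalar model of an 8-lane variable permute (similar to AVX2 vpermps). */
static void permute8(const float in[8], const int32_t idx[8], float out[8])
{
   for (int i = 0; i < 8; i++)
      out[i] = in[idx[i] & 7];
}

/* out[i] = a[idx[i] & 15]: permute both source halves with the same
 * low-bit indices, then pick the half selected by index bit 3. */
static simd16f permute16(simd16f a, simd16i idx)
{
   simd16f r;
   for (int h = 0; h < 2; h++) {
      float from_lo[8], from_hi[8];
      permute8(a.v[0], idx.v[h], from_lo);
      permute8(a.v[1], idx.v[h], from_hi);
      for (int i = 0; i < 8; i++)
         r.v[h][i] = (idx.v[h][i] & 8) ? from_hi[i] : from_lo[i];
   }
   return r;
}
```

The design point is that each output half needs two 8-wide permutes and one blend, instead of a lane-by-lane scalar fallback.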
* swr/rast: Use gather instruction for i32gather_ps on simd16/avx512 | Tim Rowley | 2017-11-14 | 1 | -11/+1
  Speed up avx512 platforms; fixes a performance regression caused by the switch to simdlib.

  Reviewed-by: Bruce Cherniak <[email protected]>
  Cc: [email protected]
* egl/wayland: Add a fallback when fourcc query isn't supported | Derek Foreman | 2017-11-14 | 1 | -2/+30
  When queryImage doesn't support __DRI_IMAGE_ATTRIB_FOURCC, wayland clients will die with a NULL dereference in wl_proxy_add_listener.

  Attempt to provide a simple fallback to keep ancient systems working.

  Fixes: 6595c699511 ("egl/wayland: Remove more surface specifics from create_wl_buffer")
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103519
  Signed-off-by: Derek Foreman <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Acked-by: Daniel Stone <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
* radeonsi: remove has_cp_dma, has_streamout flags (v2) | Marek Olšák | 2017-11-14 | 3 | -20/+2
  v2: remove r600_can_dma_copy_buffer
* i965: implement (un)mapImage | Julien Isorce | 2017-11-14 | 1 | -2/+63
  Already implemented for Gallium drivers. Useful for gbm_bo_(un)map.

  Tests:
  By porting wayland/weston/clients/simple-dmabuf-drm.c to GBM.
  kmscube --mode=rgba
  kmscube --mode=nv12-1img
  kmscube --mode=nv12-2img
  piglit ext_image_dma_buf_import-refcount -auto
  piglit ext_image_dma_buf_import-transcode-nv12-as-r8-gr88 -auto
  piglit ext_image_dma_buf_import-sample_rgb -fmt=XR24 -alpha-one -auto
  piglit ext_image_dma_buf_import-sample_rgb -fmt=AR24 -auto
  piglit ext_image_dma_buf_import-sample_yuv -fmt=NV12 -auto
  piglit ext_image_dma_buf_import-sample_yuv -fmt=YU12 -auto
  piglit ext_image_dma_buf_import-sample_yuv -fmt=YV12 -auto

  v2: add early return if (flag & MAP_INTERNAL_MASK)
  v3: take input rect into account and test with kmscube and piglit.
  v4: handle wraparound and bo reference.
  v5: indent, exclude 0 width and height on the boundary, map bo independently of the image.

  Signed-off-by: Julien Isorce <[email protected]>
  Reviewed-by: Chris Wilson <[email protected]>
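Mapping a sub-rectangle comes down to validating the rectangle against the image and turning its origin into a byte offset. The C sketch below is a hypothetical helper, not the i965 code; its parameter names and the rule of rejecting zero-sized rectangles are assumptions loosely based on the v3 to v5 notes. Writing the bounds test as a subtraction keeps `x + w` from wrapping around.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: validate a map rectangle against the image size and
 * compute the byte offset of its top-left pixel. */
static bool map_rect_offset(uint32_t img_w, uint32_t img_h,
                            uint32_t x, uint32_t y, uint32_t w, uint32_t h,
                            uint32_t cpp, uint32_t stride, size_t *offset)
{
   if (w == 0 || h == 0)
      return false; /* nothing to map */
   /* Compare against the remaining space so x + w / y + h cannot wrap. */
   if (x >= img_w || w > img_w - x)
      return false;
   if (y >= img_h || h > img_h - y)
      return false;
   *offset = (size_t)y * stride + (size_t)x * cpp;
   return true;
}
```

A rectangle that ends exactly on the image edge (for example `x = 95, w = 5` on a 100-pixel-wide image) is accepted; one pixel further is rejected.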
* radv: force enable LLVM sisched for The Talos Principle | Samuel Pitoiset | 2017-11-14 | 1 | -0/+20
  It seems safe and it improves performance by +4% (73->76).

  A drirc-based solution is not what we want for now; keep it simple and improve later if it's really needed.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* radv: add nosisched debug option | Samuel Pitoiset | 2017-11-14 | 2 | -0/+10
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* spirv: fix typo on DO NOT EDIT header | Alejandro Piñeiro | 2017-11-14 | 1 | -1/+1
  Introduced in commit 157c9a13414b524ce98ea0ea07fce819efc1ba65

  Reviewed-by: Iago Toral Quiroga <[email protected]>
* radv: Free temporary syncobj after waiting on it. | Bas Nieuwenhuizen | 2017-11-14 | 1 | -4/+18
  Otherwise we leak it.

  Fixes: eaa56eab6da "radv: initial support for shared semaphores (v2)"
  Reviewed-by: Samuel Pitoiset <[email protected]>
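In Vulkan, a temporary semaphore import (`VK_SEMAPHORE_IMPORT_TEMPORARY_BIT`) is consumed by the next wait, after which the semaphore reverts to its permanent payload. A minimal sketch of the cleanup, using a hypothetical `semaphore` struct and `destroy_syncobj` stand-in rather than the radv or winsys API:

```c
#include <stdint.h>

/* Hypothetical model of a semaphore; 0 means "no syncobj". */
struct semaphore {
   uint32_t permanent;
   uint32_t temporary;
};

static unsigned destroyed_count; /* only so the example can be checked */

/* Stand-in for the real syncobj destroy call. */
static void destroy_syncobj(uint32_t handle)
{
   if (handle)
      destroyed_count++;
}

/* Called once a submission has waited on the semaphore: the temporary
 * payload is consumed, so release it instead of leaking it. */
static void semaphore_after_wait(struct semaphore *s)
{
   if (s->temporary) {
      destroy_syncobj(s->temporary);
      s->temporary = 0;
   }
}
```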
* radv: Free syncobj with multiple imports. | Bas Nieuwenhuizen | 2017-11-14 | 1 | -2/+8
  Otherwise we can leak the old syncobj.

  Fixes: eaa56eab6da "radv: initial support for shared semaphores (v2)"
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
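On the import side the fix amounts to releasing whatever an earlier import left in a slot before storing the new syncobj. Again a sketch with hypothetical names, not radv's code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a semaphore; 0 means "no syncobj". */
struct semaphore {
   uint32_t permanent;
   uint32_t temporary;
};

static unsigned destroyed_count; /* only so the example can be checked */

static void destroy_syncobj(uint32_t handle)
{
   if (handle)
      destroyed_count++;
}

/* Replace the permanent or temporary payload, freeing the old one so that
 * importing twice does not leak the first syncobj. */
static void semaphore_import(struct semaphore *s, uint32_t handle, bool temporary)
{
   uint32_t *slot = temporary ? &s->temporary : &s->permanent;
   destroy_syncobj(*slot);
   *slot = handle;
}
```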
* i965: Track the depth and render caches separately | Jason Ekstrand | 2017-11-13 | 5 | -22/+26
  Previously, we just had one hash set for tracking depth and render caches, called brw_context::render_cache. This is less than ideal because the depth and render caches are separate and we can't track moves between the depth and the render caches. This limitation led to some unnecessary flushing around the depth cache.

  There are cases (mostly with BLORP) where we can end up touching a depth or stencil buffer through the render cache. To guard against this, blorp would unconditionally do a render_cache_set_check_flush on its destination, which meant that if you did any rendering (including a BLORP operation) to a given surface and then used it as a blorp destination, you would end up flushing it out of the render cache before rendering into it.

  Things get worse when you dig into the depth/stencil state code for regular GL draw calls. Because we may end up rendering to a depth or stencil buffer via BLORP, we did a render_cache_set_check_flush on all depth and stencil buffers in brw_emit_depthbuffer to ensure that they got flushed out of the render cache prior to using them for depth or stencil testing. However, because we also need to track dirtiness for depth and stencil so that we can implement depth and stencil texturing correctly, we were adding all depth and stencil buffers to the render cache set in brw_postdraw_set_buffers_need_resolve. This meant that, if anything caused 3DSTATE_DEPTH_BUFFER to get re-emitted (currently _NEW_BUFFERS, BRW_NEW_BATCH, and BRW_NEW_BLORP), we would almost always do a full pipeline stall and render/depth cache flush.

  The root cause of both of these problems is that we can't tell the difference between the render and depth caches in our tracking. This commit splits our cache tracking into two sets, one for render and one for depth, and properly handles transitioning between the two. We still flush all the caches whenever anything needs to be flushed. The idea is that if we're going to take the hit of a flush and stall, we may as well flush everything in the hope that we can avoid a flush by something else later.

  Reviewed-by: Kenneth Graunke <[email protected]>
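The split tracking can be modelled with two dirty sets, one per cache. A buffer that is dirty only in the depth cache no longer forces a flush when it is used for depth testing again, while moving it from the render cache to the depth cache still does, and any flush that does happen clears everything, as the message describes. Fixed-size arrays stand in for the driver's hash sets; all names here are hypothetical.

```c
#include <stdbool.h>

#define MAX_BOS 64 /* hypothetical fixed BO id space for the sketch */

struct cache_tracker {
   bool render_dirty[MAX_BOS];
   bool depth_dirty[MAX_BOS];
   unsigned flushes;
};

/* Every flush flushes both caches, so both dirty sets are cleared. */
static void flush_all(struct cache_tracker *t)
{
   for (int i = 0; i < MAX_BOS; i++)
      t->render_dirty[i] = t->depth_dirty[i] = false;
   t->flushes++;
}

static void mark_render_write(struct cache_tracker *t, int bo) { t->render_dirty[bo] = true; }
static void mark_depth_write(struct cache_tracker *t, int bo) { t->depth_dirty[bo] = true; }

/* Before rendering to bo: only data still in the depth cache matters. */
static void flush_for_render(struct cache_tracker *t, int bo)
{
   if (t->depth_dirty[bo])
      flush_all(t);
}

/* Before depth/stencil testing against bo: only render-cache data matters. */
static void flush_for_depth(struct cache_tracker *t, int bo)
{
   if (t->render_dirty[bo])
      flush_all(t);
}
```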
* i965/blorp: Add more destination flushing | Jason Ekstrand | 2017-11-13 | 1 | -1/+6
  Right now we just always flush the destination for render and aren't particularly careful about depth or stencil. Soon, flush_for_render isn't going to do the same thing as flush_for_depth, and we may be doing a good deal less depth flushing, so we should be a bit more precise.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add more precise cache tracking helpers | Jason Ekstrand | 2017-11-13 | 6 | -13/+49
  In theory, this will let us track the depth and render caches separately. Right now, they're just wrappers around brw_render_cache_set_*.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add stencil buffers to cache set regardless of stencil texturing | Jason Ekstrand | 2017-11-13 | 1 | -3/+1
  We may access them as a texture using blorp regardless of whether or not stencil texturing is enabled.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Cc: [email protected]
* i965: Switch over to fully external-or-not MOCS scheme | Jason Ekstrand | 2017-11-13 | 3 | -29/+11
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use PTE MOCS for all external buffers | Jason Ekstrand | 2017-11-13 | 2 | -10/+18
  We were already using PTE for all render targets in case one happened to get scanned out. However, this still wasn't 100% correct because there may still be cases where we want to texture from an external buffer even though we don't know the caching mode. This can happen, for instance, on buffers imported from another GPU via prime.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101691
  Cc: "17.3" <[email protected]>
  Tested-by: Lyude Paul <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
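The selection rule amounts to: if the driver cannot know how a buffer will be cached (because it is shared with a display or another GPU), defer to the page-table entry, whether the buffer is rendered to or sampled from. A hypothetical C sketch; the enum names are placeholders, not real MOCS table indices:

```c
#include <stdbool.h>

/* Placeholder names; real MOCS values are per-generation table indices. */
enum mocs {
   MOCS_WB,  /* driver-chosen write-back caching for internal buffers */
   MOCS_PTE, /* take the caching mode from the page-table entry */
};

/* External buffers (scanout, prime imports) use PTE MOCS for every access. */
static enum mocs mocs_for_buffer(bool is_external)
{
   return is_external ? MOCS_PTE : MOCS_WB;
}
```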
* intel/blorp: Make the MOCS setting part of blorp_address | Jason Ekstrand | 2017-11-13 | 6 | -33/+44
  This makes our MOCS settings significantly more flexible.

  Cc: "17.3" <[email protected]>
  Tested-by: Lyude Paul <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* anv/blorp: Add a device parameter to blorp_surf_for_anv_image | Jason Ekstrand | 2017-11-13 | 1 | -22/+34
  Cc: "17.3" <[email protected]>
  Tested-by: Lyude Paul <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/blorp: Use mocs.tex for depth stencil | Jason Ekstrand | 2017-11-13 | 1 | -5/+1
  Cc: "17.3" <[email protected]>
  Tested-by: Lyude Paul <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/tools/error: Decode compute shaders. | Kenneth Graunke | 2017-11-13 | 1 | -7/+42
  This is a bit more annoying than your average shader: we need to look at MEDIA_INTERFACE_DESCRIPTOR_LOAD in the batch buffer, then hop over to the dynamic state buffer to read the INTERFACE_DESCRIPTOR_DATA, then hop over to the instruction buffer to decode the program.

  Now that we store all the buffers before decoding, we can actually do this fairly easily.

  Reviewed-by: Lionel Landwerlin <[email protected]>
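The three hops can be sketched as address translation over the captured buffers. Everything below is hypothetical: the `captured_buffer` struct, the helper names, and especially the assumption that Kernel Start Pointer is a 64-byte-aligned value in bits 31:6 of the descriptor's first dword, relative to the instruction base address. Check the genxml definitions before relying on the layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of one captured buffer: its GPU address and contents. */
struct captured_buffer {
   uint64_t gtt_offset;
   const uint8_t *data;
   size_t size;
};

/* Translate a GPU address into a pointer to `len` bytes inside buf. */
static const uint8_t *
resolve(const struct captured_buffer *buf, uint64_t addr, size_t len)
{
   if (addr < buf->gtt_offset)
      return NULL;
   uint64_t off = addr - buf->gtt_offset;
   if (off > buf->size || len > buf->size - off)
      return NULL;
   return buf->data + off;
}

/* Hop 1: the descriptor offset from MEDIA_INTERFACE_DESCRIPTOR_LOAD is
 *        relative to the dynamic state base address.
 * Hop 2: dword 0 of INTERFACE_DESCRIPTOR_DATA holds Kernel Start Pointer
 *        (assumed: 64-byte aligned, bits 31:6).
 * Hop 3: that pointer is relative to the instruction base address. */
static const uint8_t *
find_compute_kernel(const struct captured_buffer *dynamic_state,
                    uint64_t dynamic_state_base, uint32_t idd_offset,
                    const struct captured_buffer *instructions,
                    uint64_t instruction_base)
{
   const uint8_t *idd = resolve(dynamic_state, dynamic_state_base + idd_offset, 4);
   if (!idd)
      return NULL;
   uint32_t dw0;
   memcpy(&dw0, idd, sizeof(dw0)); /* assumes a little-endian host */
   uint64_t ksp = dw0 & ~UINT32_C(0x3f);
   return resolve(instructions, instruction_base + ksp, 1);
}
```

This only works because every buffer has been parsed and stored before the batch is decoded, which is the point of the message above.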
* intel/tools/error: Use do-while for field iterator loops. | Kenneth Graunke | 2017-11-13 | 1 | -6/+6
  while loops skip the first field of the instruction/structure, which is not what the code intended. It works out because the field we're looking for doesn't happen to be first, but we ought to do it right regardless.

  Found while writing the next patch, where Kernel Start Pointer is the first field of INTERFACE_DESCRIPTOR_DATA.

  Reviewed-by: Lionel Landwerlin <[email protected]>
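The off-by-one is easy to reproduce with any iterator whose initializer already positions it on the first field and whose `next` call advances before reporting. The iterator below is a hypothetical stand-in for the genxml field iterator and assumes at least one field.

```c
#include <stdbool.h>

/* Hypothetical iterator: init leaves it on the first field; next advances
 * and then reports whether it still points at a valid field. */
struct field_iter {
   int pos;
   int count;
};

static void iter_init(struct field_iter *it, int count) { it->pos = 0; it->count = count; }
static bool iter_next(struct field_iter *it) { return ++it->pos < it->count; }

/* Buggy shape: the first call to iter_next() steps past field 0. */
static int visit_with_while(int count, int *seen)
{
   struct field_iter it;
   int n = 0;
   iter_init(&it, count);
   while (iter_next(&it))
      seen[n++] = it.pos;
   return n;
}

/* Fixed shape: visit the current field before advancing (needs count >= 1). */
static int visit_with_do_while(int count, int *seen)
{
   struct field_iter it;
   int n = 0;
   iter_init(&it, count);
   do {
      seen[n++] = it.pos;
   } while (iter_next(&it));
   return n;
}
```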
* intel/tools/error: Decode shaders while decoding batch commands. | Kenneth Graunke | 2017-11-13 | 1 | -85/+49
  This makes aubinator_error_decode's shader dumping work like aubinator. Instead of printing them after the fact, it prints them right inside the 3DSTATE_VS/HS/DS/GS/PS packet that references them. This saves you the effort of cross-referencing things and jumping back and forth.

  It also reduces a bunch of book-keeping, and eliminates the limitation that we could only handle 4096 programs. That code was also broken and failed to print any shaders if there were under 4096 programs.

  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/tools/error: Save error state sections and decode them later. | Kenneth Graunke | 2017-11-13 | 1 | -37/+58
  This lets us complete parsing and storing of each buffer's data before we begin decoding the batchbuffer. This makes it possible to inspect the state buffer and program buffer, so we can properly decode any indirect state or shader programs.

  Reviewed-by: Chris Wilson <[email protected]>
* intel/tools/error: Fix null termination of ring name string. | Kenneth Graunke | 2017-11-13 | 1 | -0/+1
  Ported from intel_error_decode. We don't want to run off the end.

  Reviewed-by: Chris Wilson <[email protected]>
* intel/tools/error: Drop unused MAX_RINGS #define. | Kenneth Graunke | 2017-11-13 | 1 | -2/+0
  Dead code.

  Reviewed-by: Chris Wilson <[email protected]>
* intel/tools/error: Refactor buffer matching, add more buffers. | Kenneth Graunke | 2017-11-13 | 1 | -62/+30
  Based on a similar patch to intel_error_decode by Chris Wilson.

  While we're de-duplicating the gtt_offset calculation, we can simplify it to assume two hex digits are there: the kernel has done this since v4.6, and we already require error states from v4.10.

  Reviewed-by: Chris Wilson <[email protected]>
* intel/tools/error: Only decode a few sections of error states. | Kenneth Graunke | 2017-11-13 | 1 | -1/+3
  These three are the only ones we can reasonably decode with genxml.

  Reviewed-by: Chris Wilson <[email protected]>
* intel/tools/error: Drop unused parameters from decode() helper. | Kenneth Graunke | 2017-11-13 | 1 | -5/+3
  Also change count from a pointer into a value. We were supposed to be resetting it to 0 (and failed to), but that's gone since we dropped the pre-ascii85 handling.

  Reviewed-by: Chris Wilson <[email protected]>
* intel/tools/error: Drop support for non-ascii85 encoded error states. | Kenneth Graunke | 2017-11-13 | 1 | -35/+4
  Error state files used to look like:

      render ring --- gtt_offset = 0x0e8f6000
      00000000 : 69040000
      00000004 : 79090000
      ...
      00007ffc : 00000000
      --- ringbuffer = 0x00001000

  There were thousands of lines between sections.

  The file format changed with Kernel 4.10, and now has a single ascii85-encoded line following each section heading. This is much easier to parse.

  There are a bunch of bugs in our handling of the old-style format, where we'd decode the wrong data, at the wrong time. Fixing all of these is going to be a giant pain. It's also a lot of extra code complexity.

  In order to properly decode indirect state, or compute shaders, we'll also need to parse data in advance of decoding, which is going to be a giant pain with this ad-hoc "decode everywhere!" mentality.

  So, let's just drop support for the older file format. This unfortunately requires an error state generated by Kernel 4.10 or later. That's probably not the end of the world, as we encourage users to upgrade to the latest kernel when encountering GPU hangs anyway. It might be a giant pain for people with LTS kernels, though...

  Reviewed-by: Chris Wilson <[email protected]>
* intel/tools/error: Do ascii85 decode first. | Kenneth Graunke | 2017-11-13 | 1 | -31/+29
  The dashes "---" may occur within an ascii85 block, but only an ascii85 block starts with ':' or '~'.

  Ported from Chris Wilson's intel-gpu-tools commit:
  bceec7e1d8a160226b783c6344eae8cbf4ece144

  Reviewed-by: Chris Wilson <[email protected]>
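A minimal C decoder for such a line, assuming the standard ascii85 alphabet ('!' to 'u', most significant digit first) plus the 'z' shorthand for an all-zero word. It checks the ':'/'~' marker before anything else, as the message describes, but does not model any decompression of the payload; `decode_error_state_line` and `ascii85_decode` here are hypothetical names, not the tool's functions.

```c
#include <stdint.h>

/* Decode one ascii85 run into 32-bit words. Returns the word count, or -1
 * on malformed input or when more than max_words would be produced. */
static int ascii85_decode(const char *in, uint32_t *out, int max_words)
{
   int n = 0;
   while (*in && *in != '\n') {
      if (n == max_words)
         return -1;
      if (*in == 'z') { /* shorthand for a zero word */
         out[n++] = 0;
         in++;
         continue;
      }
      uint32_t v = 0;
      for (int i = 0; i < 5; i++) {
         if (in[i] < '!' || in[i] > 'u')
            return -1; /* also stops at a premature '\0' */
         v = v * 85 + (uint32_t)(in[i] - '!');
      }
      out[n++] = v;
      in += 5;
   }
   return n;
}

/* Classify by the first character before looking for "---": only an
 * ascii85 block starts with ':' or '~'. */
static int decode_error_state_line(const char *line, uint32_t *out, int max_words)
{
   if (line[0] != ':' && line[0] != '~')
      return -1; /* a section heading or other text */
   return ascii85_decode(line + 1, out, max_words);
}
```

For example, `":z!!!!\"s8W-!"` decodes to the three words 0, 1 and 0xFFFFFFFF.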
* egl/haiku: Correct invalid void* conversion in calloc | Alexander von Gluck IV | 2017-11-13 | 1 | -1/+2
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
* meson: Remove build_by_default from amd code | Dylan Baker | 2017-11-13 | 3 | -3/+3
  This is the same logic as the previous two patches.

  Signed-off-by: Dylan Baker <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* meson: Don't build intel shared components by default | Dylan Baker | 2017-11-13 | 4 | -6/+3
  It's a neat idea, and still useful in some cases, but the intel common code is used only by i965 and anvil, so this is a little clearer.

  Signed-off-by: Dylan Baker <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* meson: don't use build_by_default for specific gallium drivers | Dylan Baker | 2017-11-13 | 13 | -34/+25
  Using build_by_default : false is convenient for dependencies that can be pulled in by various diverse components of the build system; the gallium hardware/software drivers and state trackers do not fit that description. Instead, these should be guarded using the variable that tracks whether that driver should be enabled.

  This leaves a few helper libraries (trace, rbug, etc.) and the generic winsys bits as `build_by_default : false` because there are a large number of gallium components that pull them in.

  v2:
  - remove build_by_default from winsys convenience libs as well.
  v3:
  - Always put drivers before winsys for consistency.

  Signed-off-by: Dylan Baker <[email protected]>
  Tested-by: Lionel Landwerlin <[email protected]> (v1)
  Reviewed-by: Eric Anholt <[email protected]>
* r600/shader: handle bitfield extract semantics properly. | Dave Airlie | 2017-11-14 | 1 | -4/+53
  Fixes:
  tests/spec/arb_gpu_shader5/execution/built-in-functions/fs-bitfieldExtract.shader_test

  Signed-off-by: Dave Airlie <[email protected]>
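The commit does not list the corner cases, but the GLSL definition of bitfieldExtract is the target: `bits == 0` yields 0, `bits == 32` (with offset 0) yields the whole value, and the signed variant replicates the field's top bit. The C model below encodes those rules; it is a semantic reference, not the r600 shader code.

```c
#include <stdint.h>

/* Reference model of GLSL bitfieldExtract for uint. Shifting a 32-bit value
 * by 32 is undefined in C, so bits >= 32 is handled separately. */
static uint32_t bitfield_extract_u(uint32_t value, int offset, int bits)
{
   if (bits == 0)
      return 0;
   if (bits >= 32)
      return value; /* GLSL requires offset == 0 here */
   return (value >> offset) & ((1u << bits) - 1u);
}

/* Signed variant: the field's most significant bit is replicated upward. */
static int32_t bitfield_extract_i(int32_t value, int offset, int bits)
{
   if (bits == 0)
      return 0;
   uint32_t field = bitfield_extract_u((uint32_t)value, offset, bits);
   uint32_t sign = 1u << (bits - 1);
   /* (field ^ sign) - sign sign-extends modulo 2^32; the final conversion
    * assumes two's complement, as on GCC and Clang. */
   return (int32_t)((field ^ sign) - sign);
}
```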
* r600: handle bitfieldInsert corner case. | Dave Airlie | 2017-11-14 | 1 | -1/+39
  This handles the bits >= 32 corner case in bitfieldInsert.

  Fixes:
  tests/spec/arb_gpu_shader5/execution/built-in-functions/fs-bitfieldInsert.shader_test

  Signed-off-by: Dave Airlie <[email protected]>
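GLSL defines bitfieldInsert(base, insert, offset, bits) as replacing bits [offset, offset + bits - 1] of `base` with the low `bits` bits of `insert`, and `bits` may legitimately be 32. The C model below shows why that case needs special handling when the mask is built with a shift; it is a semantic reference, not the r600 shader code.

```c
#include <stdint.h>

/* Reference model of GLSL bitfieldInsert. */
static uint32_t bitfield_insert(uint32_t base, uint32_t insert, int offset, int bits)
{
   /* bits == 0 leaves base untouched; returning early also avoids an
    * undefined 32-bit shift when offset == 32. */
   if (bits == 0)
      return base;
   /* (1u << 32) - 1 is undefined in C, so a full-width field gets an
    * explicit all-ones mask (offset must be 0 in that case). */
   uint32_t mask = bits >= 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u) << offset;
   return (base & ~mask) | ((insert << offset) & mask);
}
```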
* r600: add gs tri strip adjacency fix. | Dave Airlie | 2017-11-14 | 4 | -5/+62
  Like "radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation", evergreen hardware suffers from the same problem, so rotate the geometry inputs to fix this.

  This fixes:
  ./bin/glsl-1.50-geometry-primitive-types GL_TRIANGLE_STRIP_ADJACENCY
  on evergreen.

  Signed-off-by: Dave Airlie <[email protected]>
* r600: fix isoline tess factor component swapping. | Dave Airlie | 2017-11-14 | 1 | -0/+7
  As per radeonsi, the tess factor components for isolines are reversed.

  Fixes:
  tests/spec/arb_tessellation_shader/execution/isoline.shader_test

  Cc: <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600/shader: reserve first register of vertex shader. | Dave Airlie | 2017-11-14 | 1 | -2/+4
  r0 on input to vertex shaders contains things like the vertex ID, so we need to reserve it even if we have no inputs.

  This fixes a bunch of tessellation piglits.

  Cc: <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* r600: don't emit atomic save if we have no atomic counters. | Dave Airlie | 2017-11-14 | 1 | -0/+3
  Otherwise we end up emitting the fence.

  Tested-By: Gert Wollny <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* glx/dri3: Fix passing renderType into glXCreateContext | Adam Jackson | 2017-11-13 | 1 | -1/+2
  Without this, trying to create a GLX_RGBA_FLOAT_TYPE_ARB context would fail, because GLX_RGBA_TYPE would be a mismatch with the fbconfig.

  Cc: [email protected]
  Reviewed-by: Tapani Pälli <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Signed-off-by: Adam Jackson <[email protected]>
* glx/drisw: Fix glXMakeCurrent(dpy, None, ctx) | Adam Jackson | 2017-11-13 | 1 | -4/+2
  This is perfectly legal in GL 3.0+.

  Fixes piglit/glx-create-context-current-no-framebuffer.

  Cc: [email protected]
  Reviewed-by: Tapani Pälli <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Signed-off-by: Adam Jackson <[email protected]>
* glx: Lower GLX opcode lookup into SendMakeCurrentRequest | Adam Jackson | 2017-11-13 | 1 | -9/+7
  Reviewed-by: Tapani Pälli <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Signed-off-by: Adam Jackson <[email protected]>
* aubinator: Don't skip the first field in each subgroup | Jason Ekstrand | 2017-11-13 | 1 | -2/+3
  The previous iteration algorithm would advance the field pointer right after we advance the group. This meant that you would end up skipping the first field of the group. In the common case, where the only field is a struct (e.g. 3DSTATE_VERTEX_BUFFERS), it would get skipped entirely.

  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Delete empty groups | Jason Ekstrand | 2017-11-13 | 4 | -8/+0
  They serve no purpose other than to just fill empty space in the packet so each dword has something. Just disallowing empty groups is a bit easier on some of the tools. This does not change the generated packing headers in any way.

  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Don't crash on invalid heap sizes when the PCI ID is overridden | Jason Ekstrand | 2017-11-13 | 1 | -0/+12
* nir/spirv: tg4 requires a sampler | Alex Smith | 2017-11-13 | 2 | -2/+1
  Gather operations in both GLSL and SPIR-V require a sampler. Fixes gathers returning garbage when using separate texture/samplers (on AMD, was using an invalid sampler descriptor).

  Signed-off-by: Alex Smith <[email protected]>
  Cc: "17.2 17.3" <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* spirv: Use correct type for sampled images | Alex Smith | 2017-11-13 | 3 | -6/+6
  We should use the result type of the OpSampledImage opcode, rather than the type of the underlying image/samplers.

  This resolves an issue when using separate images and shadow samplers with glslang. Example:

      layout (...) uniform samplerShadow s0;
      layout (...) uniform texture2D res0;
      ...
      float result = textureLod(sampler2DShadow(res0, s0), uv, 0);

  For the combined OpSampledImage here, the type of the base image was being used (which does not have the Depth flag set, whereas the result type does), so it was not being recognised as a shadow sampler. This led to the wrong LLVM intrinsics being emitted by RADV.

  Signed-off-by: Alex Smith <[email protected]>
  Cc: "17.2 17.3" <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>