path: root/src/gallium
Commit message (Author, Age, Files, Lines)
* radeonsi: move si_*_descriptors_idx functions into si_state.h (Marek Olšák, 2019-05-16, 2 files, -14/+14)
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: make si_initialize_compute reusable (Marek Olšák, 2019-05-16, 2 files, -7/+8)
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: extract COMPUTE_RESOURCE_LIMITS code into a helper (Marek Olšák, 2019-05-16, 2 files, -12/+23)
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: return the last part's return value from @wrapper (Marek Olšák, 2019-05-16, 1 file, -3/+26)
| | | | | | | The primitive discard compute shader will get the position output this way. Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: always set NO_CPU_ACCESS and NO_SUBALLOC on GDS resources (Marek Olšák, 2019-05-16, 1 file, -2/+5)
| | | | Acked-by: Nicolai Hähnle <[email protected]>
* swr: clean up supported OGL4.0/4.1 extensions list (Jan Zielinski, 2019-05-16, 1 file, -4/+5)
| | | | | | | | | | | | This commit adjusts the capabilities returned by the SWR driver and the documentation to correctly report the following extensions: GL_ARB_texture_query_lod, GL_ARB_texture_cube_map_array, GL_ARB_gpu_shader_fp64, GL_ARB_texture_gather, GL_ARB_vertex_attrib_64bit. Reviewed-by: Alok Hota <[email protected]>
* vl/dri3: set back buffer from output to NULL with front buffer case (Leo Liu, 2019-05-16, 1 file, -0/+1)
| | | | | | | The output optimization applies only to the back buffer case. Signed-off-by: Leo Liu <[email protected]> Acked-by: Alex Deucher <[email protected]>
* auxiliary/draw: fix crash with zero-stride draw auto (Roland Scheidegger, 2019-05-16, 1 file, -1/+2)
| | | | | | | | | | | | transform feedback draws get the number of vertices from the transform feedback object. In draw, we'll figure this out with the number of bytes written divided by the stride. However, it is apparently possible we end up with a stride of 0 there (not entirely sure it could happen with GL). Probably when nothing was actually ever written (so we don't actually have a stride set). Just avoid the division by zero by setting the count to 0. Reviewed-by: Jose Fonseca <[email protected]>
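The guard described in this commit can be sketched as follows. This is a minimal illustration only, not the actual draw module code; the function name `draw_auto_vertex_count` and its parameters are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: derive the vertex count for a "draw auto" call from
 * the number of bytes written by transform feedback and the vertex stride. */
static uint32_t
draw_auto_vertex_count(uint32_t bytes_written, uint32_t stride)
{
   /* A stride of 0 would divide by zero; treat it as "nothing to draw". */
   if (stride == 0)
      return 0;

   return bytes_written / stride;
}
```

With a stride of 12 and 48 bytes written this yields 4 vertices; with a stride of 0 it yields 0 instead of crashing.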
* iris: Dodge more GLSL IR lowering (Kenneth Graunke, 2019-05-15, 1 file, -2/+3)
| | | | This avoids some lower_instructions bits in st.
* panfrost/midgard: Add load/store opcodes (Alyssa Rosenzweig, 2019-05-16, 4 files, -52/+131)
| | | | | | | | | This commit adds a bunch of new load/store opcodes, largely related to OpenCL, as well as adjusting the name of existing opcodes to be more uniform. The immediate effect is compute shaders are substantially easier to interpret now. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Enable integer constant inlining (Alyssa Rosenzweig, 2019-05-16, 1 file, -4/+0)
| | | | | | | | | | | | Midgard ALU features two types of constants: embedded constants (128-bit chunk, zero/one per schedule bundle) and inline constants (16-bit splattered into the op, second source if present). Inline constants are much more efficient from a space and scheduling freedom standpoint, so it's desirable to inline when possible. Now that integer ops are well understood and in use, we enable inlining of integer constants in addition to floats (which have been inlined since forever). Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Remove imov workaround (Alyssa Rosenzweig, 2019-05-16, 1 file, -26/+0)
| | | | | | The previous commit fixes the issue this patched around. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Set int outmod for ops writing integers (Alyssa Rosenzweig, 2019-05-16, 2 files, -7/+23)
| | | | | | | | | | | | | | | | | | | | | By default, the "normal" output modifier is set on ALU ops. This is the correct default for float outputs -- for floats, it preserves the semantic value. Unfortunately, when used with integers, it does not preserve the bitstream encoding, causing misbehaviour. (It's an open question what happens when `normal` is used with integers -- does it apply some other transformation? or does it do floating point normalization/etc on the ints as if they were floats?). Instead, we default to the "clamp to integer" output modifier for ops writing integers. Semantically, this makes sense (clamping an integer to the nearest integer is the identity function). In the hardware with an integer opcode, this is the actual "normal". This fixes numerous sporadic and sometimes bizarre bugs relating to integers, especially integer moves. With this in place, we no longer care about the types involved; it's just bits on the wire again. Signed-off-by: Alyssa Rosenzweig <[email protected]>
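The default selection described in this commit can be sketched as below. The enum and function names are hypothetical and do not come from the Midgard compiler; only the rule itself (float results get "normal", integer results get "clamp to integer") is taken from the message.

```c
#include <stdbool.h>

/* Hypothetical names; the real compiler defines its own output modifiers. */
enum example_outmod {
   EXAMPLE_OUTMOD_NORMAL,       /* default for ops writing floats */
   EXAMPLE_OUTMOD_CLAMP_TO_INT, /* identity for ops writing integers */
};

/* Pick the default output modifier from the type of value the op writes. */
static enum example_outmod
example_default_outmod(bool writes_integer)
{
   return writes_integer ? EXAMPLE_OUTMOD_CLAMP_TO_INT
                         : EXAMPLE_OUTMOD_NORMAL;
}
```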
* panfrost: Set custom stride for textures when necessary (Alyssa Rosenzweig, 2019-05-16, 1 file, -0/+25)
| | | | | | | | | | | | | | | From Gallium's (and our) perspective, the stride of a BO is arbitrary. For internal buffers, we can make it something nice, but for imported linear buffers (e.g. EGL clients), we don't always have that luxury. To cope, we calculate the expected stride of a texture, compare it to the BO's actual reported stride, and if they differ, set the latter as a custom stride. Fixes rendering of windows not on tile boundaries (noticeable in Weston with es2gears_wayland, for instance). Also, this should fix stride issues with buffer reloading. Signed-off-by: Alyssa Rosenzweig <[email protected]>
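The comparison described in this commit can be sketched as follows. This is an illustration under a stated assumption: the expected stride of a linear texture is taken to be a tightly packed `width * bytes_per_pixel`, which may not match how the driver computes it. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a tightly packed linear layout, one row after another. */
static uint32_t
example_expected_stride(uint32_t width, uint32_t bytes_per_pixel)
{
   return width * bytes_per_pixel;
}

/* A custom stride is needed only when the BO's reported stride differs
 * from the stride the texture would be expected to have. */
static bool
example_needs_custom_stride(uint32_t width, uint32_t bytes_per_pixel,
                            uint32_t bo_stride)
{
   return bo_stride != example_expected_stride(width, bytes_per_pixel);
}
```

For example, a 300-pixel-wide RGBA8 window imported with a 1280-byte stride would need the custom stride, while a 320-pixel-wide one would not.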
* panfrost/decode: Stride decoding (Alyssa Rosenzweig, 2019-05-16, 2 files, -3/+33)
| | | | | | | | | With a special flag, texture descriptors can include custom stride(s). We haven't seen a case of this used for mipmaps/cubemaps, so it's not clear how that will be encoded, but this dumps correctly for single one-level 2D textures. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/decode: Futureproof texture dumping (Alyssa Rosenzweig, 2019-05-16, 1 file, -2/+13)
| | | | | | | | | | | One field was not dumped for some reason. It's observed to be 0, but it's still good to have it available. Also, extra fields might be snuck in the bitmaps array (it's variable-lengthed at the end), and we want to guard against that possibility, so we dump a little more. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* ac: rename SI-CIK-VI to GFX6-GFX7-GFX8 (Marek Olšák, 2019-05-15, 35 files, -309/+309)
| | | | | | | | | | | | Acked-by: Dave Airlie <[email protected]> We already use GFX9 and I don't want us to have confusing naming in the driver. GFXn naming is better from the driver perspective, because it's the real version of the gfx portion of the hw. Also, CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI. It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have nothing to do with GFXn and they have their own version numbers.
* st/dri: Minor style fixes (Kenneth Graunke, 2019-05-15, 1 file, -4/+6)
| | | | Trivial.
* virgl: handle DONT_BLOCK and MAP_DIRECTLY (Chia-I Wu, 2019-05-15, 4 files, -7/+45)
| | | | | | | | | | Handle PIPE_TRANSFER_DONT_BLOCK and PIPE_TRANSFER_MAP_DIRECTLY. Make virgl_resource_transfer_prepare return an enum instead of a bool for extensibility (e.g., instruct the callers to map differently). Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]>
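Why an enum return is more extensible than a bool can be sketched as below. Every identifier here is hypothetical and is not the virgl API; the sketch also assumes that "don't block" means failing the map when the resource is busy rather than waiting.

```c
#include <stdbool.h>

/* Hypothetical usage flag standing in for a PIPE_TRANSFER_* bit. */
#define EXAMPLE_TRANSFER_DONT_BLOCK (1u << 0)

/* Hypothetical result type: new outcomes (e.g. "map differently") can be
 * added later without changing the function signature. */
enum example_prepare_result {
   EXAMPLE_PREPARE_ERROR,  /* caller must fail the map */
   EXAMPLE_PREPARE_MAP_HW, /* caller may map the resource */
};

static enum example_prepare_result
example_transfer_prepare(unsigned usage, bool resource_busy)
{
   /* Never wait on a busy resource when the caller asked not to block. */
   if (resource_busy && (usage & EXAMPLE_TRANSFER_DONT_BLOCK))
      return EXAMPLE_PREPARE_ERROR;

   /* A real driver would flush, read back, and wait here as needed. */
   return EXAMPLE_PREPARE_MAP_HW;
}
```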
* virgl: add virgl_resource_transfer_prepare (Chia-I Wu, 2019-05-15, 4 files, -55/+49)
| | | | | | | | | | | | virgl_resource_transfer_prepare should be called before mapping to prepare the resource. It does flush, readback, and wait as needed. virgl_res_needs_flush and virgl_res_needs_readback become internal helpers to the new function. There should be no externally visible change. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]>
* virgl: honor DISCARD_WHOLE_RESOURCE in virgl_res_needs_readback (Chia-I Wu, 2019-05-15, 1 file, -1/+2)
| | | | | Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]>
* virgl: clean up virgl_res_needs_readback (Chia-I Wu, 2019-05-15, 1 file, -5/+16)
| | | | | | | Add comments and follow the coding style of virgl_res_needs_flush. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]>
* gallium: Add default check for PIPE_CAP_FRAGMENT_SHADER_INTERLOCK (Alyssa Rosenzweig, 2019-05-14, 1 file, -0/+1)
| | | | | | | Fixes: c704c0226 ("gallium: Add a PIPE_CAP_FRAGMENT_SHADER_INTERLOCK") Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Check if resource has stencil before returning it (Andrii Kryvytskyi, 2019-05-14, 1 file, -1/+5)
| | | | | | Signed-off-by: Andrii Kryvytskyi <[email protected]> Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Enable fragment shader interlock on Gen9+. (Kenneth Graunke, 2019-05-14, 1 file, -0/+1)
| | | | | | | | There's some debate about whether we should support this on older hardware as well. Currently i965 turns it off on Gen8- though, so we follow suit. If this changes, we can update this as well. Reviewed-by: Marek Olšák <[email protected]>
* gallium: Add a PIPE_CAP_FRAGMENT_SHADER_INTERLOCK. (Kenneth Graunke, 2019-05-14, 2 files, -0/+3)
| | | | | | | | | Corresponding to GL_ARB_fragment_shader_interlock and GL_NV_fragment_shader_interlock. Currently, only the NIR paths support this functionality, but someone could conceivably add it to TGSI too. Reviewed-by: Marek Olšák <[email protected]>
* intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8 (Kenneth Graunke, 2019-05-14, 2 files, -0/+14)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our tessellation control shaders can be dispatched in several modes. - SINGLE_PATCH (Gen7+) processes a single patch per thread, with each channel corresponding to a different patch vertex. PATCHLIST_N will launch (N / 8) threads. If N is less than 8, some channels will be disabled, leaving some untapped hardware capabilities. Conditionals based on gl_InvocationID are non-uniform, which means that they'll often have to execute both paths. However, if there are fewer than 8 vertices, all invocations will happen within a single thread, so barriers can become no-ops, which is nice. We also burn a maximum of 4 registers for ICP handles, so we can compile without regard for the value of N. It also works in all cases. - DUAL_PATCH mode processes up to two patches at a time, where the first four channels come from patch 1, and the second group of four come from patch 2. This tries to provide better EU utilization for small patches (N <= 4). It cannot be used in all cases. - 8_PATCH mode processes 8 patches at a time, with a thread launched per vertex in the patch. Each channel corresponds to the same vertex, but in each of the 8 patches. This utilizes all channels even for small patches. It also makes conditions on gl_InvocationID uniform, leading to proper jumps. Barriers, unfortunately, become real. Worse, for PATCHLIST_N, the thread payload burns N registers for ICP handles. This can burn up to 32 registers, or 1/4 of our register file, for URB handles. For Vulkan (and DX), we know the number of vertices at compile time, so we can limit the amount of waste. In GL, the patch dimension is dynamic state, so we either would have to waste all 32 (not reasonable) or guess (badly) and recompile. This is unfortunate. Because we can only spawn 16 thread instances, we can only use this mode for PATCHLIST_16 and smaller. The rest must use SINGLE_PATCH. 
This patch implements the new 8_PATCH TCS mode, but leaves us using SINGLE_PATCH by default. A new INTEL_DEBUG=tcs8 flag will switch to using 8_PATCH mode for testing and benchmarking purposes. We may want to consider using 8_PATCH mode in Vulkan in some cases. The data I've seen shows that 8_PATCH mode can be more efficient in some cases, but SINGLE_PATCH mode (the one we use today) is faster in other cases. Ultimately, the TES matters much more than the TCS for performance, so the decision may not matter much. Reviewed-by: Jason Ekstrand <[email protected]>
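The thread and register arithmetic from the mode descriptions above can be sketched as follows. The function names are hypothetical and this is not the compiler's code. Rounding the SINGLE_PATCH thread count up is an assumption, inferred from the statement that patches with fewer than 8 vertices still run with some channels disabled.

```c
#include <stdbool.h>

/* SINGLE_PATCH: one channel per patch vertex, 8 channels per thread.
 * Assumption: N / 8 is rounded up, so N < 8 still launches one thread. */
static unsigned
example_single_patch_threads(unsigned patch_vertices)
{
   return (patch_vertices + 7) / 8;
}

/* 8_PATCH: the thread payload uses N registers for ICP handles. */
static unsigned
example_8_patch_icp_regs(unsigned patch_vertices)
{
   return patch_vertices;
}

/* 8_PATCH: one thread per patch vertex, and at most 16 thread instances
 * can be spawned, so larger patches must use SINGLE_PATCH. */
static bool
example_8_patch_allowed(unsigned patch_vertices)
{
   return patch_vertices <= 16;
}
```

For PATCHLIST_3 this gives one SINGLE_PATCH thread with five idle channels, while 8_PATCH would run three threads with every channel busy.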
* virgl: clean up virgl_res_needs_flush (Chia-I Wu, 2019-05-14, 1 file, -2/+34)
| | | | | | | | | | | Add comments and some minor cleanups. v2: document the function Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]> (v1) Reviewed-by: Gurchetan Singh <[email protected]> Signed-off-by: Chia-I Wu <[email protected]>
* virgl: comment on a sync issue in transfers (Chia-I Wu, 2019-05-14, 2 files, -0/+20)
| | | | | | Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: PIPE_TRANSFER_READ does not imply flush (Chia-I Wu, 2019-05-14, 1 file, -4/+1)
| | | | | | | | virgl_res_needs_flush should suffice. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: do not skip readback because of explicit flush (Chia-I Wu, 2019-05-14, 1 file, -3/+0)
| | | | | | | | | | Both apps and we (see virgl_buffer_transfer_flush_region) might flush regions that are unmodified. We have to read back for those flushes. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: remove unused virgl_transfer_inline_write (Chia-I Wu, 2019-05-14, 2 files, -42/+0)
| | | | | | | | | | It currently has no user and is probably incorrect (resource_wait is required in some more cases). Remove it so that we can focus on transfers first. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* iris/resource: Drop redundant checks for aux support (Nanley Chery, 2019-05-14, 1 file, -15/+0)
| | | | | | Drop some checks that are already done by ISL. Reviewed-by: Rafael Antognolli <[email protected]>
* iris/resource: Fall back to no aux if creation fails (Nanley Chery, 2019-05-14, 1 file, -4/+6)
| | | | | | | | | No surface requires an auxiliary surface to operate correctly. Fall back to an uncompressed surface if mesa fails to create and allocate an auxiliary surface. This enables adding more restrictions to ISL without having to update iris. Reviewed-by: Rafael Antognolli <[email protected]>
* softpipe/buffer: load only as many components as the buffer resource type provides (Gert Wollny, 2019-05-14, 1 file, -2/+5)
| | | | | | | | | | | | | | | Otherwise we risk reading past the end of the buffer. In addition, change the loop counters to unsigned to be consistent with the types. Fixes: afa8707ba93a7d226a76319acda2a8dd89524db7 softpipe: add SSBO/shader atomics support. Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* panfrost: ci: Reduce batch size to 3000 (Tomeu Vizoso, 2019-05-14, 1 file, -1/+1)
| | | | | | | | With the previous value of 5000, we seemed to be reaching OOM in some circumstances. Signed-off-by: Tomeu Vizoso <[email protected]> Acked-by: Alyssa Rosenzweig <[email protected]>
* panfrost: ci: Update expectations (Tomeu Vizoso, 2019-05-14, 1 file, -2/+0)
| | | | | | | | | | Since last Friday, these two tests have been fixed: dEQP-GLES2.functional.shaders.functions.control_flow.return_in_nested_loop_fragment dEQP-GLES2.functional.shaders.linkage.varying_7 Signed-off-by: Tomeu Vizoso <[email protected]> Acked-by: Alyssa Rosenzweig <[email protected]>
* freedreno: Fix warning on printing a uint64_t using %llx. (Eric Anholt, 2019-05-13, 1 file, -1/+1)
| | | | Reviewed-by: Kristian H. Kristensen <[email protected]>
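The message does not say how the warning was fixed. One common portable way to print a `uint64_t` in C is the `PRIx64` macro from `<inttypes.h>`, sketched below with a hypothetical helper; this is not necessarily what the commit did.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a 64-bit value as hex without relying on
 * uint64_t being the same type as unsigned long long. */
static int
example_format_u64_hex(char *buf, size_t len, uint64_t value)
{
   return snprintf(buf, len, "%" PRIx64, value);
}
```

Casting the argument to `unsigned long long` and keeping `%llx` is another option that silences the same warning.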
* freedreno: Silence compiler warnings about "*" in boolean context. (Eric Anholt, 2019-05-13, 2 files, -2/+2)
| | | | | | | It sure looks like we just want both of them to be nonzero, and && is probably going to be cheaper than * anyway. Reviewed-by: Kristian H. Kristensen <[email protected]>
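The change from `*` to `&&` can be illustrated with a small sketch; the function name and operands are hypothetical, not the freedreno code. Besides silencing the warning, `&&` states the intent directly and cannot wrap around to zero the way a 32-bit unsigned product can.

```c
#include <stdbool.h>
#include <stdint.h>

/* True only when both values are nonzero. Written with && rather than
 * (a * b), whose unsigned result can wrap to 0 for large operands. */
static bool
example_both_nonzero(uint32_t a, uint32_t b)
{
   return a && b;
}
```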
* freedreno: Silence compiler warnings about uninit 'layers' (Eric Anholt, 2019-05-13, 3 files, -3/+3)
| | | | | | | My gcc can't see that the uninitialized value from the PIPE_BUFFER case isn't used from the !PIPE_BUFFER cases later. Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Make emacs indent the way robclark's eclipse does. (Eric Anholt, 2019-05-13, 1 file, -0/+3)
| | | | | | | | | The .editorconfig helps with the tabs, but we've got this two-tabs-from-previous-indentation line continuation style that requires whacking the c-file-offsets. This will throw emacs warnings when first opening a file in the directory; press '!' to shut it up for the future. Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Make .editorconfig match .dir-locals.el. (Eric Anholt, 2019-05-13, 1 file, -0/+3)
| | | | | | | | The editorconfig takes precedence over dir-locals in emacs26 with editorconfig enabled, so the /.editorconfig was affecting these directories. Reviewed-by: Kristian H. Kristensen <[email protected]>
* nv50/ir/nir: make use of SYSTEM_VALUE_MAX when iterating read sysvals (Karol Herbst, 2019-05-13, 1 file, -1/+1)
| | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Pierre Moreau <[email protected]>
* nv50/ir/nir: prefer to shift 1ull instead of 1ll (Karol Herbst, 2019-05-13, 1 file, -2/+2)
| | | | | | Signed-off-by: Karol Herbst <[email protected]> Suggested-by: Ilia Mirkin <[email protected]> Reviewed-by: Pierre Moreau <[email protected]>
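The two nv50/ir/nir commits above concern iterating a 64-bit mask of system values. A minimal sketch of that pattern follows; `EXAMPLE_SYSVAL_MAX` is a hypothetical stand-in for SYSTEM_VALUE_MAX and the functions are not the driver's code. In C, shifting a signed `1ll` left by 63 is undefined because the result is not representable, whereas the unsigned `1ull` is well defined for every bit index below 64.

```c
#include <stdint.h>

/* Hypothetical bound; must not exceed the 64 bits of the mask. */
#define EXAMPLE_SYSVAL_MAX 64

/* Bit for one system value; index must be below 64. */
static uint64_t
example_sysval_bit(unsigned index)
{
   return 1ull << index;
}

/* Count the system values marked as read in the mask. */
static unsigned
example_count_read_sysvals(uint64_t read_mask)
{
   unsigned count = 0;

   for (unsigned i = 0; i < EXAMPLE_SYSVAL_MAX; i++) {
      if (read_mask & example_sysval_bit(i))
         count++;
   }
   return count;
}
```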
* v3d: Use driconf to expose non-MSAA texture limits for Xorg. (Eric Anholt, 2019-05-13, 13 files, -22/+76)
| | | | | | The V3D 4.2 HW has a limit to MSAA texture sizes of 4096. With non-MSAA, we can go up to 7680 (actually probably 8138, but that hasn't been validated by the HW team). Exposing 7680 in X11 will allow dual 4k displays.
* gallium: Redefine the max texture 2d cap from _LEVELS to _SIZE. (Eric Anholt, 2019-05-13, 32 files, -81/+84)
| | | | | | | | The _LEVELS cap assumes that the max is always a power of two. For V3D 4.2, we can support non-MSAA textures up to a non-power-of-two size of 7680, which will let X11 support dual 4k displays on newer hardware. Reviewed-by: Marek Olšák <[email protected]>
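Why a level count cannot express a limit such as 7680 can be shown with a short sketch. The helper names are hypothetical and are not Gallium functions; the arithmetic assumes the usual mipmap convention that a texture with L levels has a base size of 2^(L-1).

```c
/* Largest size expressible by a level count; levels must be at least 1. */
static unsigned
example_max_size_from_levels(unsigned levels)
{
   return 1u << (levels - 1);
}

/* Number of mip levels for a given maximum size: floor(log2(size)) + 1. */
static unsigned
example_levels_from_max_size(unsigned size)
{
   unsigned levels = 0;

   while (size) {
      levels++;
      size >>= 1;
   }
   return levels;
}
```

A size of 7680 maps to 13 levels, but 13 levels map back to only 4096, and 14 levels would claim 8192. Reporting the size directly avoids that loss.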
* radeonsi: overhaul the vertex fetch fixup mechanism (Nicolai Hähnle, 2019-05-13, 8 files, -280/+301)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The overall goal is to support unaligned loads from vertex buffers natively on SI. In the unaligned case, we fall back to the general case implementation in ac_build_opencoded_load_format. Since this function is fully general, we will also use it going forward for cases requiring fully manual format conversions of dwords anyway. This requires a different encoding of the fix_fetch array, which will now contain the entire format information if a fixup is required. Having to check the alignment of vertex buffers is awkward. To keep the impact on the fast path minimal, the si_context will keep track of which vertex buffers are (not) at least dword-aligned, while the si_vertex_elements will note which vertex buffers have some (at most dword) alignment requirement. Vertex buffers should be dword-aligned most of the time, which allows a fast early-out in almost all cases. Add the radeonsi_vs_fetch_always_opencode configuration variable for testing purposes. Note that it can only be used reliably on LLVM >= 9, because support for byte and short load is required. v2: - add a missing check to si_bind_vertex_elements Reviewed-by: Marek Olšák <[email protected]>
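The fast early-out described in this commit can be sketched with two bitmasks. This is an illustration only: the names are hypothetical, and treating a buffer as dword-aligned when both its offset and stride are multiples of 4 is an assumption rather than the driver's exact rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Context side: keep one bit per vertex buffer slot (slot < 32) that is
 * set when the buffer is not dword-aligned. */
static uint32_t
example_update_unaligned_mask(uint32_t mask, unsigned slot,
                              uint32_t offset, uint32_t stride)
{
   bool dword_aligned = ((offset | stride) & 3) == 0;

   if (dword_aligned)
      mask &= ~(1u << slot);
   else
      mask |= 1u << slot;
   return mask;
}

/* Draw side: a fixup is needed only if some buffer that the vertex
 * elements require to be aligned is currently unaligned. */
static bool
example_needs_fetch_fixup(uint32_t vb_unaligned_mask,
                          uint32_t vb_alignment_required_mask)
{
   return (vb_unaligned_mask & vb_alignment_required_mask) != 0;
}
```

When every buffer is dword-aligned the first mask is zero, so the check reduces to a single AND and compare.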
* radeonsi: store sctx->vertex_elements in a local in si_shader_selector_key_vs (Nicolai Hähnle, 2019-05-13, 1 file, -7/+6)
| | | | | | Purely as a shorthand in the remainder of the function. Reviewed-by: Marek Olšák <[email protected]>
* lima: add Allwinner H5 support (Patrick Lerda, 2019-05-13, 1 file, -2/+20)
| | | | | | | | The H5 hardware variant requires a specific plb_max_blk number. This value can't be probed at the hardware level. Signed-off-by: Patrick Lerda <[email protected]> Reviewed-by: Qiang Yu <[email protected]>
* lima: refactor plb_max_blk (Patrick Lerda, 2019-05-13, 5 files, -11/+34)
| | | | | | | | Move plb_max_blk to lima_screen, and add a new debug option: LIMA_PLB_MAX_BLK Signed-off-by: Patrick Lerda <[email protected]> Reviewed-by: Qiang Yu <[email protected]>