path: root/src/gallium
Commit message | Author | Age | Files | Lines
* radeonsi: disable primitive restart for triangles for DiRT Rally | Marek Olšák | 2019-05-16 | 4 | -14/+25
| | | | | | It may decrease performance and it prevents compute-based primitive culling. Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: add primitive culling stats to the HUD | Marek Olšák | 2019-05-16 | 4 | -4/+44
| | | | Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: cull primitives with async compute for large draw calls | Marek Olšák | 2019-05-16 | 18 | -28/+2124
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: add REWIND emulation via INDIRECT_BUFFER into cs_check_space | Marek Olšák | 2019-05-16 | 9 | -15/+26
| | | | Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: add si_vs_prolog_bits::unpack_instance_id_from_vertex_id:1 | Marek Olšák | 2019-05-16 | 2 | -2/+24
| | | | | | | The prim discard compute shader bakes InstanceID into the output index buffer. Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: make some functions non-static | Marek Olšák | 2019-05-16 | 3 | -18/+25
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: allow si_shader_select_with_key to return an optimized shader or fail | Marek Olšák | 2019-05-16 | 2 | -12/+32
| | | | | | | | If a prim discard compute shader hasn't finished compilation, we don't want to return any shader. Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: use pipe_draw_info::instance_count indirectly | Marek Olšák | 2019-05-16 | 1 | -14/+22
| | | | | | | It will be modified by compute shader culling. Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: use pipe_draw_info::prim and primitive_restart indirectly | Marek Olšák | 2019-05-16 | 1 | -31/+40
| | | | | | | so that the fields can be changed by the driver. Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: make functions for creating LLVM functions non-static | Marek Olšák | 2019-05-16 | 2 | -23/+32
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: add a parallel compute IB coupled with a gfx IB | Marek Olšák | 2019-05-16 | 6 | -10/+195
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: add a cs parameter into si_cp_copy_data | Marek Olšák | 2019-05-16 | 5 | -9/+8
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: add a cs parameter into si_cp_release_mem | Marek Olšák | 2019-05-16 | 5 | -10/+9
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: add threadgroups_per_cu param into si_get_compute_resource_limits | Marek Olšák | 2019-05-16 | 2 | -4/+8
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: move si_*_descriptors_idx functions into si_state.h | Marek Olšák | 2019-05-16 | 2 | -14/+14
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: make si_initialize_compute reusable | Marek Olšák | 2019-05-16 | 2 | -7/+8
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: extract COMPUTE_RESOURCE_LIMITS code into a helper | Marek Olšák | 2019-05-16 | 2 | -12/+23
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: return the last part's return value from @wrapper | Marek Olšák | 2019-05-16 | 1 | -3/+26
| | | | | | | The primitive discard compute shader will get the position output this way. Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: always set NO_CPU_ACCESS and NO_SUBALLOC on GDS resources | Marek Olšák | 2019-05-16 | 1 | -2/+5
| | | | Acked-by: Nicolai Hähnle <[email protected]>
* swr: clean up supported OGL4.0/4.1 extensions list | Jan Zielinski | 2019-05-16 | 1 | -4/+5
| | | | | | | | | | | | This commit adjusts the capabilities returned by the SWR driver and the documentation to correctly report the following extensions: GL_ARB_texture_query_lod, GL_ARB_texture_cube_map_array, GL_ARB_gpu_shader_fp64, GL_ARB_texture_gather, GL_ARB_vertex_attrib_64bit. Reviewed-by: Alok Hota <[email protected]>
* vl/dri3: set back buffer from output to NULL with front buffer case | Leo Liu | 2019-05-16 | 1 | -0/+1
| | | | | | | The output optimization applies only to the back buffer case. Signed-off-by: Leo Liu <[email protected]> Acked-by: Alex Deucher <[email protected]>
* auxiliary/draw: fix crash with zero-stride draw auto | Roland Scheidegger | 2019-05-16 | 1 | -1/+2
| | | | | | | | | | | | Transform feedback draws get the number of vertices from the transform feedback object. In draw, we'll figure this out with the number of bytes written divided by the stride. However, it is apparently possible we end up with a stride of 0 there (not entirely sure it could happen with GL). Probably when nothing was actually ever written (so we don't actually have a stride set). Just avoid the division by zero by setting the count to 0. Reviewed-by: Jose Fonseca <[email protected]>
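The guard described in this commit message can be sketched as follows. This is only an illustration, not the actual draw-module code: the helper name draw_auto_vertex_count and its parameters are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper (not the real draw-module function): derive the
 * vertex count of a transform feedback ("draw auto") draw from the number
 * of bytes written and the vertex stride. */
static uint32_t
draw_auto_vertex_count(uint32_t bytes_written, uint32_t stride)
{
   /* A stride of 0 probably means nothing was ever written, so report
    * zero vertices instead of dividing by zero. */
   if (stride == 0)
      return 0;

   return bytes_written / stride;
}
```

With a 16-byte stride and 48 bytes written this yields 3 vertices; with a stride of 0 it yields 0 regardless of the byte count.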
* iris: Dodge more GLSL IR lowering | Kenneth Graunke | 2019-05-15 | 1 | -2/+3
| | | | This avoids some lower_instructions bits in st.
* panfrost/midgard: Add load/store opcodes | Alyssa Rosenzweig | 2019-05-16 | 4 | -52/+131
| | | | | | | | | This commit adds a bunch of new load/store opcodes, largely related to OpenCL, as well as adjusting the names of existing opcodes to be more uniform. The immediate effect is that compute shaders are substantially easier to interpret now. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Enable integer constant inlining | Alyssa Rosenzweig | 2019-05-16 | 1 | -4/+0
| | | | | | | | | | | | Midgard ALU features two types of constants: embedded constants (128-bit chunk, zero/one per schedule bundle) and inline constants (16-bit splattered into the op, second source if present). Inline constants are much more efficient from a space and scheduling freedom standpoint, so it's desirable to inline when possible. Now that integer ops are well understood and in use, we enable inlining of integer constants in addition to floats (which have been inlined since forever). Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Remove imov workaround | Alyssa Rosenzweig | 2019-05-16 | 1 | -26/+0
| | | | | | The previous commit fixes the issue this patched around. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Set int outmod for ops writing integers | Alyssa Rosenzweig | 2019-05-16 | 2 | -7/+23
| | | | | | | | | | | | | | | | | | | | | By default, the "normal" output modifier is set on ALU ops. This is the correct default for float outputs -- for floats, it preserves the semantic value. Unfortunately, when used with integers, it does not preserve the bitstream encoding, causing misbehaviour. (It's an open question what happens when `normal` is used with integers -- does it apply some other transformation? or does it do floating point normalization/etc on the ints as if they were floats?). Instead, we default to the "clamp to integer" output modifier for ops writing integers. Semantically, this makes sense (clamping an integer to the nearest integer is the identity function). In the hardware with an integer opcode, this is the actual "normal". This fixes numerous sporadic and sometimes bizarre bugs relating to integers, especially integer moves. With this in place, we no longer care about the types involved; it's just bits on the wire again. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Set custom stride for textures when necessary | Alyssa Rosenzweig | 2019-05-16 | 1 | -0/+25
| | | | | | | | | | | | | | | From Gallium's (and our) perspective, the stride of a BO is arbitrary. For internal buffers, we can make it something nice, but for imported linear buffers (e.g. EGL clients), we don't always have that luxury. To cope, we calculate the expected stride of a texture, compare it to the BO's actual reported stride, and if they differ, set the latter as a custom stride. Fixes rendering of windows not on tile boundaries (noticeable in Weston with es2gears_wayland, for instance). Also, this should fix stride issues with buffer reloading. Signed-off-by: Alyssa Rosenzweig <[email protected]>
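The stride comparison described in this commit message might look roughly like the sketch below. The function names are hypothetical, and the assumption that the expected stride is simply width times bytes per pixel (tightly packed rows, no alignment) is mine; the driver's own calculation may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the "expected" stride of a linear texture is its width times
 * the bytes per pixel. The real driver may apply alignment on top. */
static uint32_t
expected_stride(uint32_t width, uint32_t bytes_per_pixel)
{
   return width * bytes_per_pixel;
}

/* Hypothetical check: a custom stride is needed whenever the stride
 * reported by the BO differs from the expected one. */
static bool
needs_custom_stride(uint32_t width, uint32_t bytes_per_pixel,
                    uint32_t bo_stride)
{
   return bo_stride != expected_stride(width, bytes_per_pixel);
}
```

For example, a 60-pixel-wide RGBA8 window imported with a 256-byte stride would need a custom stride under these assumptions, while a 64-pixel-wide one would not.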
* panfrost/decode: Stride decoding | Alyssa Rosenzweig | 2019-05-16 | 2 | -3/+33
| | | | | | | | | With a special flag, texture descriptors can include custom stride(s). We haven't seen a case of this used for mipmaps/cubemaps, so it's not clear how that will be encoded, but this dumps correctly for single one-level 2D textures. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/decode: Futureproof texture dumping | Alyssa Rosenzweig | 2019-05-16 | 1 | -2/+13
| | | | | | | | | | | One field was not dumped for some reason. It's observed to be 0, but it's still good to have it available. Also, extra fields might be snuck in the bitmaps array (it's variable-length at the end), and we want to guard against that possibility, so we dump a little more. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* ac: rename SI-CIK-VI to GFX6-GFX7-GFX8 | Marek Olšák | 2019-05-15 | 35 | -309/+309
| | | | | | | | | | | | Acked-by: Dave Airlie <[email protected]> We already use GFX9 and I don't want us to have confusing naming in the driver. GFXn naming is better from the driver perspective, because it's the real version of the gfx portion of the hw. Also, CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI. It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have nothing to do with GFXn and they have their own version numbers.
* st/dri: Minor style fixes | Kenneth Graunke | 2019-05-15 | 1 | -4/+6
| | | | Trivial.
* virgl: handle DONT_BLOCK and MAP_DIRECTLY | Chia-I Wu | 2019-05-15 | 4 | -7/+45
| | | | | | | | | | Handle PIPE_TRANSFER_DONT_BLOCK and PIPE_TRANSFER_MAP_DIRECTLY. Make virgl_resource_transfer_prepare return an enum instead of a bool for extensibility (e.g., instruct the callers to map differently). Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]>
* virgl: add virgl_resource_transfer_prepare | Chia-I Wu | 2019-05-15 | 4 | -55/+49
| | | | | | | | | | | | virgl_resource_transfer_prepare should be called before mapping to prepare the resource. It does flush, readback, and wait as needed. virgl_res_needs_flush and virgl_res_needs_readback become internal helpers to the new function. There should be no externally visible change. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]>
* virgl: honor DISCARD_WHOLE_RESOURCE in virgl_res_needs_readback | Chia-I Wu | 2019-05-15 | 1 | -1/+2
| | | | | Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]>
* virgl: clean up virgl_res_needs_readback | Chia-I Wu | 2019-05-15 | 1 | -5/+16
| | | | | | | Add comments and follow the coding style of virgl_res_needs_flush. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]>
* gallium: Add default check for PIPE_CAP_FRAGMENT_SHADER_INTERLOCK | Alyssa Rosenzweig | 2019-05-14 | 1 | -0/+1
| | | | | | | Fixes: c704c0226 ("gallium: Add a PIPE_CAP_FRAGMENT_SHADER_INTERLOCK") Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Check if resource has stencil before returning it | Andrii Kryvytskyi | 2019-05-14 | 1 | -1/+5
| | | | | | Signed-off-by: Andrii Kryvytskyi <[email protected]> Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Enable fragment shader interlock on Gen9+. | Kenneth Graunke | 2019-05-14 | 1 | -0/+1
| | | | | | | | There's some debate about whether we should support this on older hardware as well. Currently i965 turns it off on Gen8- though, so we follow suit. If this changes, we can update this as well. Reviewed-by: Marek Olšák <[email protected]>
* gallium: Add a PIPE_CAP_FRAGMENT_SHADER_INTERLOCK. | Kenneth Graunke | 2019-05-14 | 2 | -0/+3
| | | | | | | | | Corresponding to GL_ARB_fragment_shader_interlock and GL_NV_fragment_shader_interlock. Currently, only the NIR paths support this functionality, but someone could conceivably add it to TGSI too. Reviewed-by: Marek Olšák <[email protected]>
* intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8 | Kenneth Graunke | 2019-05-14 | 2 | -0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our tessellation control shaders can be dispatched in several modes. - SINGLE_PATCH (Gen7+) processes a single patch per thread, with each channel corresponding to a different patch vertex. PATCHLIST_N will launch (N / 8) threads. If N is less than 8, some channels will be disabled, leaving some untapped hardware capabilities. Conditionals based on gl_InvocationID are non-uniform, which means that they'll often have to execute both paths. However, if there are fewer than 8 vertices, all invocations will happen within a single thread, so barriers can become no-ops, which is nice. We also burn a maximum of 4 registers for ICP handles, so we can compile without regard for the value of N. It also works in all cases. - DUAL_PATCH mode processes up to two patches at a time, where the first four channels come from patch 1, and the second group of four come from patch 2. This tries to provide better EU utilization for small patches (N <= 4). It cannot be used in all cases. - 8_PATCH mode processes 8 patches at a time, with a thread launched per vertex in the patch. Each channel corresponds to the same vertex, but in each of the 8 patches. This utilizes all channels even for small patches. It also makes conditions on gl_InvocationID uniform, leading to proper jumps. Barriers, unfortunately, become real. Worse, for PATCHLIST_N, the thread payload burns N registers for ICP handles. This can burn up to 32 registers, or 1/4 of our register file, for URB handles. For Vulkan (and DX), we know the number of vertices at compile time, so we can limit the amount of waste. In GL, the patch dimension is dynamic state, so we either would have to waste all 32 (not reasonable) or guess (badly) and recompile. This is unfortunate. Because we can only spawn 16 thread instances, we can only use this mode for PATCHLIST_16 and smaller. The rest must use SINGLE_PATCH. 
This patch implements the new 8_PATCH TCS mode, but leaves us using SINGLE_PATCH by default. A new INTEL_DEBUG=tcs8 flag will switch to using 8_PATCH mode for testing and benchmarking purposes. We may want to consider using 8_PATCH mode in Vulkan in some cases. The data I've seen shows that 8_PATCH mode can be more efficient in some cases, but SINGLE_PATCH mode (the one we use today) is faster in other cases. Ultimately, the TES matters much more than the TCS for performance, so the decision may not matter much. Reviewed-by: Jason Ekstrand <[email protected]>
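The thread and register arithmetic from this commit message can be sketched as below. The enum and function names are hypothetical and are not taken from the Intel compiler. Rounding N / 8 up to a whole thread is my assumption, and DUAL_PATCH is left out because the message does not say when it can be used.

```c
#include <stdbool.h>

/* Hypothetical names; only the two modes the patch chooses between. */
enum tcs_mode {
   TCS_SINGLE_PATCH,
   TCS_8_PATCH,
};

/* SINGLE_PATCH: PATCHLIST_N launches N / 8 threads (assumed to round up,
 * with the unused channels disabled). */
static unsigned
single_patch_threads(unsigned n)
{
   return (n + 7) / 8;
}

/* 8_PATCH: one thread per patch vertex, and the thread payload uses
 * N registers for ICP handles (up to 32 for PATCHLIST_32). */
static unsigned
eight_patch_icp_regs(unsigned n)
{
   return n;
}

/* SINGLE_PATCH stays the default. The debug flag opts into 8_PATCH, which
 * is only possible up to PATCHLIST_16 because at most 16 thread instances
 * can be spawned. */
static enum tcs_mode
choose_tcs_mode(unsigned n, bool debug_tcs8)
{
   if (debug_tcs8 && n <= 16)
      return TCS_8_PATCH;

   return TCS_SINGLE_PATCH;
}
```

Under these assumptions a 3-vertex patch runs in one SINGLE_PATCH thread with five idle channels, whereas 8_PATCH fills all eight channels at the cost of three ICP handle registers per thread.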
* virgl: clean up virgl_res_needs_flush | Chia-I Wu | 2019-05-14 | 1 | -2/+34
| | | | | | | | | | | Add comments and some minor cleanups. v2: document the function Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]> (v1) Reviewed-by: Gurchetan Singh <[email protected]> Signed-off-by: Chia-I Wu <[email protected]>
* virgl: comment on a sync issue in transfers | Chia-I Wu | 2019-05-14 | 2 | -0/+20
| | | | | | Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: PIPE_TRANSFER_READ does not imply flush | Chia-I Wu | 2019-05-14 | 1 | -4/+1
| | | | | | | | virgl_res_needs_flush should suffice. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: do not skip readback because of explicit flush | Chia-I Wu | 2019-05-14 | 1 | -3/+0
| | | | | | | | | | Both apps and we (see virgl_buffer_transfer_flush_region) might flush regions that are unmodified. We have to read back for those flushes. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: remove unused virgl_transfer_inline_write | Chia-I Wu | 2019-05-14 | 2 | -42/+0
| | | | | | | | | | It currently has no user and is probably incorrect (resource_wait is required in some more cases). Remove it so that we can focus on transfers first. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* iris/resource: Drop redundant checks for aux support | Nanley Chery | 2019-05-14 | 1 | -15/+0
| | | | | | Drop some checks that are already done by ISL. Reviewed-by: Rafael Antognolli <[email protected]>
* iris/resource: Fall back to no aux if creation fails | Nanley Chery | 2019-05-14 | 1 | -4/+6
| | | | | | | | | No surface requires an auxiliary surface to operate correctly. Fall back to an uncompressed surface if Mesa fails to create and allocate an auxiliary surface. This enables adding more restrictions to ISL without having to update iris. Reviewed-by: Rafael Antognolli <[email protected]>
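The fallback described in this commit message follows a common pattern, sketched below with entirely hypothetical types and functions (none of them are iris or ISL names). The simulate_failure parameter merely stands in for whatever makes real aux creation fail.

```c
#include <stdbool.h>

/* Hypothetical aux states; the real driver has more kinds of aux usage. */
enum aux_usage {
   AUX_NONE,
   AUX_COMPRESSED,
};

struct surface {
   enum aux_usage aux;
};

/* Stand-in for aux surface creation and allocation. Returns false on
 * failure, in which case the surface's aux state is left untouched. */
static bool
try_create_aux(struct surface *surf, bool simulate_failure)
{
   if (simulate_failure)
      return false;

   surf->aux = AUX_COMPRESSED;
   return true;
}

/* Aux is an optimization, never a requirement, so a creation failure
 * degrades to an uncompressed surface instead of failing the resource. */
static void
configure_aux(struct surface *surf, bool simulate_failure)
{
   if (!try_create_aux(surf, simulate_failure))
      surf->aux = AUX_NONE;
}
```

Because the caller never treats a missing aux surface as an error, new restrictions in the creation step only change which surfaces end up uncompressed.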
* softpipe/buffer: load only as many components as the buffer resource type provides | Gert Wollny | 2019-05-14 | 1 | -2/+5
| | | | | | | | | | | | | Otherwise we risk reading past the end of the buffer. In addition, change the loop counters to unsigned to be consistent with the types. Fixes: afa8707ba93a7d226a76319acda2a8dd89524db7 softpipe: add SSBO/shader atomics support. Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* panfrost: ci: Reduce batch size to 3000 | Tomeu Vizoso | 2019-05-14 | 1 | -1/+1
| | | | | | With the previous value of 5000, we seemed to be reaching OOM in some circumstances. Signed-off-by: Tomeu Vizoso <[email protected]> Acked-by: Alyssa Rosenzweig <[email protected]>