Commit message (Author, Age, Files, Lines)
* spirv: Set lengths on scalar and vector types (Jason Ekstrand, 2017-12-11, 1 file, -0/+4)
| | | | Reviewed-by: Ian Romanick <[email protected]>
* ac/nir: Support vulkan_resource_reindex. (Bas Nieuwenhuizen, 2017-12-12, 1 file, -0/+14)
| | | | | Fixes: 93b4cb61eb2 "spirv: Allow OpPtrAccessChain for block indices" Reviewed-by: Dave Airlie <[email protected]>
* ac/nir: Don't load the descriptor in vulkan_resource_index. (Bas Nieuwenhuizen, 2017-12-12, 1 file, -5/+13)
| | | | | | | | | | | To support the reindex intrinsic, we need the result to be something on which we can adjust the index/address. Since it is all within a basic block, the compiler should be able to merge any extra loads. v2: Change visit_get_buffer_size too. Reviewed-by: Dave Airlie <[email protected]>
* winsys/amdgpu: disable local BOs again due to worse performance (Marek Olšák, 2017-12-11, 1 file, -2/+3)
| | | | | Cc: 17.3 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* drirc: whitelist glthread for Mount and Blade Warband again (Marek Olšák, 2017-12-11, 1 file, -0/+3)
|
* radv: Don't use local BOs when allocating with export options. (Bas Nieuwenhuizen, 2017-12-10, 1 file, -1/+3)
| | | | | | | | | If the app does not plan to put a buffer or image in it (why? But it is allowed and CTS does it), they do not need to allocate it with the dedicated allocation struct. Fixes: a639d40f133 "radv: add support for local bos. (v3)" Reviewed-by: Dave Airlie <[email protected]>
* spirv: Fix loading an entire block at once. (Bas Nieuwenhuizen, 2017-12-10, 1 file, -30/+33)
| | | | | | | | | There is no chain, so checking the length ends with a SEGFAULT. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103579 Cc: <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Enable UBO pushing (Jason Ekstrand, 2017-12-08, 2 files, -0/+7)
| | | | | | | | | | | | | Push constants on Intel hardware are significantly more performant than pull constants. Since most Vulkan applications don't actively use push constants, or at least don't use them heavily, we're pulling way more than we should be. By enabling pushing chunks of UBOs we can get rid of a lot of those pulls. On my SKL GT4e, this improves the performance of Dota 2 and Talos by around 2.5% and improves Aztec Ruins by around 2%. Reviewed-by: Jordan Justen <[email protected]>
* i965/fs: Handle !supports_pull_constants and push UBOs properly (Jason Ekstrand, 2017-12-08, 1 file, -1/+1)
| | | | | | In Vulkan, we don't support classic pull constants and everything the client asks us to push, we push. However, for pushed UBOs, we still want to fall back to conventional pulls if we run out of space.
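The fallback described above can be pictured as a budgeted assignment: ranges are pushed while space remains, and whatever does not fit is left for a conventional pull. The following is only a minimal sketch under assumptions. The struct, the function name, and the greedy policy are hypothetical and are not taken from the i965 compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one UBO range; not the i965 data structure. */
struct ubo_range {
   uint32_t length;  /* size in 32-byte push chunks */
   bool pushed;      /* set by assign_push_ranges() */
};

/* Greedily mark ranges as pushed while they fit in the budget. Anything
 * that does not fit keeps pushed == false and would be read with a
 * conventional pull. Returns the number of chunks actually used. */
static uint32_t
assign_push_ranges(struct ubo_range *ranges, size_t count, uint32_t budget)
{
   assert(ranges != NULL || count == 0);

   uint32_t used = 0;
   for (size_t i = 0; i < count; i++) {
      /* used never exceeds budget, so the subtraction cannot wrap. */
      bool fits = ranges[i].length <= budget - used;
      ranges[i].pushed = fits;
      if (fits)
         used += ranges[i].length;
   }
   return used;
}
```

With a budget of 8 chunks and ranges of 4, 8 and 2 chunks, this sketch pushes the first and third range and leaves the second one to be pulled.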
* anv/device: Increase the UBO alignment requirement to 32 (Jason Ekstrand, 2017-12-08, 1 file, -2/+10)
| | | | | | | | Push constants work in terms of 32-byte chunks so if we want to be able to push UBOs, everything needs to be 32-byte aligned. Currently, we only require 16-byte alignment, which is too small. Reviewed-by: Jordan Justen <[email protected]>
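A small sketch of the alignment arithmetic behind this requirement, assuming the 32-byte chunk size stated in the message. The macro and helper names are hypothetical and do not come from anv.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed chunk size: the commit message says push constants work in
 * 32-byte chunks. The names below are hypothetical, not anv's. */
#define PUSH_CHUNK_SIZE 32u

/* Round value up to the next multiple of a power-of-two alignment. */
static inline uint32_t
align_up_u32(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* True if a UBO binding offset starts on a push-chunk boundary. */
static inline bool
ubo_offset_is_pushable(uint32_t offset)
{
   return (offset % PUSH_CHUNK_SIZE) == 0;
}
```

An offset of 16 satisfies the old 16-byte requirement but does not start on a chunk boundary, which is why the old limit is too small for pushing.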
* anv/cmd_buffer: Add support for pushing UBO ranges (Jason Ekstrand, 2017-12-08, 2 files, -33/+112)
| | | | | | | | In order to do this we have to modify push constant set up to handle ranges. We also have to tweak the way we handle dirty bits a bit so that we re-push whenever a descriptor set changes. Reviewed-by: Jordan Justen <[email protected]>
* anv/cmd_buffer: Add some stage asserts (Jason Ekstrand, 2017-12-08, 1 file, -0/+6)
| | | | | | | There are several places where we look up opcodes in an array of stages. Assert that we don't end up going out-of-bounds. Reviewed-by: Jordan Justen <[email protected]>
* anv/cmd_buffer: Add some helpers for working with descriptor sets (Jason Ekstrand, 2017-12-08, 1 file, -11/+34)
| | | | Reviewed-by: Jordan Justen <[email protected]>
* anv/pipeline: Translate vulkan_resource_index to a constant when possible (Jason Ekstrand, 2017-12-08, 1 file, -4/+13)
| | | | | | | | | | We want to call brw_nir_analyze_ubo_ranges immediately after anv_nir_apply_pipeline_layout and it badly wants constants. We could run an optimization step and let constant folding do it but that's way more expensive than needed. It's really easy to just handle constants in apply_pipeline_layout. Reviewed-by: Jordan Justen <[email protected]>
* i965/fs: Rewrite assign_constant_locations (Jason Ekstrand, 2017-12-08, 1 file, -133/+185)
| | | | | | | | | | | | | | | | | | | | This rewires the logic for assigning uniform locations to work in terms of "complex alignments". The basic idea is that, as we walk the list of instructions, we keep track of the alignment and continuity requirements of each slot and assert that the alignments all match up. We then use those alignments in the compaction stage to ensure that everything gets placed at a properly aligned register. The old mechanism handled alignments by special-casing each of the bit sizes and placing 64-bit values first followed by 32-bit values. The old scheme had the advantage of never leaving a hole since all the 64-bit values could be tightly packed and so could the 32-bit values. However, the new scheme has no type size special cases so it handles not only 32 and 64-bit types but should gracefully extend to 16 and 8-bit types as the need arises. Tested-by: Jose Maria Casanova Crespo <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
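The trade-off described above (no per-size special cases, at the cost of occasional holes) can be seen in a generic alignment-aware placement loop. This is a minimal sketch under assumptions: the struct and function are hypothetical, and the real pass also tracks contiguity, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uniform slot: a size and a power-of-two alignment in bytes. */
struct uniform_slot {
   uint32_t size;
   uint32_t align;
   uint32_t offset;  /* filled in by place_slots() */
};

/* Place slots in order, bumping the cursor up to each slot's alignment.
 * No bit size is special-cased, so 8, 16, 32 and 64-bit values all go
 * through the same path. Returns the total space consumed. */
static uint32_t
place_slots(struct uniform_slot *slots, size_t count)
{
   uint32_t cursor = 0;
   for (size_t i = 0; i < count; i++) {
      uint32_t a = slots[i].align;
      assert(a != 0 && (a & (a - 1)) == 0);
      cursor = (cursor + a - 1) & ~(a - 1);
      slots[i].offset = cursor;
      cursor += slots[i].size;
   }
   return cursor;
}
```

Placing a 32-bit, a 64-bit and another 32-bit value in that order leaves a 4-byte hole before the 64-bit value, which is the kind of gap the old 64-bit-first scheme avoided.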
* anv: Disable VK_KHR_16bit_storage (Jason Ekstrand, 2017-12-08, 2 files, -3/+3)
| | | | | | | | | | | | The testing for this extension is currently very poor. The CTS tests only test accessing UBOs and SSBOs at dynamic offsets so none of our constant-offset paths get triggered at all. Also, there's an assertion in our handling of nir_intrinsic_load_uniform that offset % 4 == 0 which is never triggered, indicating that nothing ever gets loaded from an offset that is not dword-aligned. Both push constants and the constant offset pull paths are complex enough that we really don't want to ship without tests. We'll turn the extension back on once we have decent tests.
* radeon/vce: move destroy command before feedback command (Leo Liu, 2017-12-08, 1 file, -1/+1)
| | | | | | | | | | | | | | VCE processing of IBs starts from the session and task info at the first level, with other commands processed subsequently. The task info for destroy is embedded in the destroy command, with the result that the feedback command is not properly processed. This causes kernel VM fault messages on Polaris and Vega10 cards when an encode application finishes running. The fix is also verified on a VCE physical mode card. Signed-off-by: Leo Liu <[email protected]> Cc: [email protected] Acked-by: Christian König <[email protected]>
* docs/llvmpipe: document ppc64le as alternative architecture to x86. (Ben Crocker, 2017-12-08, 1 file, -3/+9)
| | | | | | | | | | | | Power8, Power8NV, and Power9 are supported on an equal footing with X86. Cc: "17.2" "17.3" <[email protected]> Signed-off-by: Ben Crocker <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> [Eric: changed formatting, reworded a bit (with Ben's ack)] Signed-off-by: Eric Engestrom <[email protected]>
* docs/release-calendar: drop 17.3.0 from the table (Emil Velikov, 2017-12-08, 1 file, -7/+1)
| | | | Signed-off-by: Emil Velikov <[email protected]>
* docs: add news item and link release notes for 17.3.0 (Emil Velikov, 2017-12-08, 2 files, -0/+8)
| | | | Signed-off-by: Emil Velikov <[email protected]>
* docs: add sha256 checksums for 17.3.0 (Emil Velikov, 2017-12-08, 1 file, -1/+2)
| | | | | Signed-off-by: Emil Velikov <[email protected]> (cherry picked from commit 49a612d1580b3316392273a069d20d93967126a8)
* docs: Update 17.3.0 release notes (Emil Velikov, 2017-12-08, 1 file, -5/+178)
| | | | | Signed-off-by: Emil Velikov <[email protected]> (cherry picked from commit 8d55da9f579463038f4305ed7d505aa7fffa0f37)
* radv: do not print ASM to stderr when dumping shaders (Samuel Pitoiset, 2017-12-08, 1 file, -1/+1)
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/winsys: implement query_value() (Samuel Pitoiset, 2017-12-08, 2 files, -0/+72)
| | | | | | | | | Might be useful to know the VRAM/GTT usage, the number of VRAM CPU page faults, etc. Nothing is currently using that new interface, but it's a first step. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: remove useless check radv_set_dcc_need_cmask_elim_pred() (Samuel Pitoiset, 2017-12-08, 1 file, -2/+1)
| | | | | | | emit_fast_color_clear() already checks that. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: remove useless checks in radv_set_{color,depth}_clear_regs() (Samuel Pitoiset, 2017-12-08, 1 file, -4/+2)
| | | | | | | Already checked by the respective callers. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: only re-emit the index type when it changes (Samuel Pitoiset, 2017-12-08, 2 files, -10/+24)
| | | | | | | | | dota2 binds a ton of index buffers but the type is always 16-bit. Note that we have to invalidate the type when switching from indexed draws to normal draws. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
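The optimization above amounts to caching the last emitted index type and invalidating that cache on non-indexed draws. A minimal sketch under assumptions follows. All names are hypothetical, the index type is modelled as a plain int, and the emit counter stands in for the packets radv would actually write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sentinel meaning "no index type is known to be programmed". */
#define INDEX_TYPE_INVALID (-1)

/* Hypothetical slice of command buffer state; not radv's struct. */
struct draw_state {
   int last_index_type;
   unsigned emit_count;  /* how many times the type was (re)emitted */
};

static void
draw_state_init(struct draw_state *s)
{
   assert(s != NULL);
   s->last_index_type = INDEX_TYPE_INVALID;
   s->emit_count = 0;
}

/* Emit the index type only if it differs from the cached one.
 * Returns true when an emit happened. */
static bool
emit_index_type_if_changed(struct draw_state *s, int index_type)
{
   assert(s != NULL);
   if (s->last_index_type == index_type)
      return false;
   s->last_index_type = index_type;
   s->emit_count++;
   return true;
}

/* A non-indexed draw invalidates the cached type, per the note above. */
static void
notify_non_indexed_draw(struct draw_state *s)
{
   assert(s != NULL);
   s->last_index_type = INDEX_TYPE_INVALID;
}
```

Binding many index buffers of the same type therefore costs one emit, and a second emit is needed only after a non-indexed draw or a real type change.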
* radv: only reset command buffers that are not in the initial state (Samuel Pitoiset, 2017-12-08, 1 file, -4/+9)
| | | | | | | | dota2 always calls vkResetCommandBuffer() before vkBeginCommandBuffer() which is quite useless. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: track different status of a command buffer (Samuel Pitoiset, 2017-12-08, 3 files, -0/+17)
| | | | | | | | | RADV_CMD_BUFFER_STATUS_INVALID is not used for now, but I think it makes sense to declare it. Could be used later with better command buffer error handling. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
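Status tracking of this kind is what lets a begin call skip a redundant reset when the application has already reset the buffer. The sketch below is an assumption-laden illustration. The enum values, struct and functions are hypothetical and only loosely modelled on the states named in these messages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status values; only the INVALID name appears in the
 * commit message, the rest are assumed for illustration. */
enum cmd_status {
   CMD_STATUS_INVALID,
   CMD_STATUS_INITIAL,
   CMD_STATUS_RECORDING,
   CMD_STATUS_EXECUTABLE,
};

struct cmd_buffer {
   enum cmd_status status;
   unsigned reset_count;  /* counts the expensive resets actually done */
};

/* The expensive path: drop recorded state and return to INITIAL. */
static void
cmd_buffer_reset(struct cmd_buffer *cb)
{
   assert(cb != NULL);
   cb->reset_count++;
   cb->status = CMD_STATUS_INITIAL;
}

/* Begin recording; reset first only if the buffer is not already clean. */
static void
cmd_buffer_begin(struct cmd_buffer *cb)
{
   assert(cb != NULL);
   if (cb->status != CMD_STATUS_INITIAL)
      cmd_buffer_reset(cb);
   cb->status = CMD_STATUS_RECORDING;
}

static void
cmd_buffer_end(struct cmd_buffer *cb)
{
   assert(cb != NULL && cb->status == CMD_STATUS_RECORDING);
   cb->status = CMD_STATUS_EXECUTABLE;
}
```

An explicit reset followed by begin performs one reset in total, while begin on a buffer that was recorded and never reset still performs the reset itself.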
* radv: fix TC-compat HTILE with VK_FORMAT_D32_SFLOAT_S8_UINT on Vega (Samuel Pitoiset, 2017-12-08, 1 file, -0/+6)
| | | | | | | | | | | | Copied from RadeonSI. This fixes all CTS dEQP-VK.renderpass.dedicated_allocation.formats.d32_sfloat_s8_uint.clear.* And some other ones which use the same format. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* docs: Update GL_ARB_get_program_binary docs to support 1 format (Jordan Justen, 2017-12-08, 2 files, -1/+2)
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Acked-by: Tapani Pälli <[email protected]>
* i965: Add ARB_get_program_binary support using nir_serialization (Jordan Justen, 2017-12-08, 6 files, -6/+99)
| | | | | | | | | | | | | | This resolves an apparent game bug described in 85564. The game doesn't properly handle ARB_get_program_binary with 0 supported formats. V2 (Timothy Arceri): - less driver code as more has been moved into the common helpers. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85564 Signed-off-by: Timothy Arceri <[email protected]> Signed-off-by: Jordan Justen <[email protected]> (v1) Reviewed-by: Tapani Pälli <[email protected]>
* main: Clear shader program data whenever ProgramBinary is called (Jordan Justen, 2017-12-08, 1 file, -0/+3)
| | | | | | | | | | | | | | | The GL_ARB_get_program_binary extension spec says: "If ProgramBinary fails to load a binary, no error is generated, but any information about a previous link or load of that program object is lost." v2: * Re-initialize shProg->data after clear. (Jordan) (Required after 6a72eba755fea15a0d97abb913a6315d9d32e274) Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* main: add binary support to ProgramBinary (Jordan Justen, 2017-12-08, 1 file, -17/+19)
| | | | | | | | | | V2: call generic mesa_program_binary() helper rather than driver function directly to allow greater code sharing. Signed-off-by: Timothy Arceri <[email protected]> Signed-off-by: Jordan Justen <[email protected]> (v1) Reviewed-by: Nicolai Hähnle <[email protected]> (v1) Reviewed-by: Tapani Pälli <[email protected]>
* main: add binary support to GetProgramBinary (Jordan Justen, 2017-12-08, 1 file, -6/+9)
| | | | | | | | | | V2: call generic _mesa_get_program_binary() helper rather than driver function directly to allow greater code sharing. Signed-off-by: Timothy Arceri <[email protected]> Signed-off-by: Jordan Justen <[email protected]> (v1) Reviewed-by: Nicolai Hähnle <[email protected]> (v1) Reviewed-by: Tapani Pälli <[email protected]>
* main: Support getting GL_PROGRAM_BINARY_LENGTH (Jordan Justen, 2017-12-08, 1 file, -1/+6)
| | | | | | | | | | | V2: call generic _mesa_get_program_binary_length() helper rather than driver function directly to allow greater code sharing. Signed-off-by: Timothy Arceri <[email protected]> Signed-off-by: Jordan Justen <[email protected]> (v1) Reviewed-by: Nicolai Hähnle <[email protected]> (v1) Reviewed-by: Tapani Pälli <[email protected]>
* mesa: Add Mesa ARB_get_program_binary helper functions (Jordan Justen, 2017-12-08, 4 files, -0/+351)
| | | | | | | | | | | | | | | V2 (Timothy Arceri): - add extra code comment - stop passing around void *binary and just pass program_binary_header *hdr instead. - move to src/mesa/main rather than src/util V3 (Timothy Arceri): - Move more code out of the backend and into the common helpers. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* mesa: add driver callbacks for serialising ProgramBinary blobs (Timothy Arceri, 2017-12-08, 1 file, -0/+17)
| | | | Reviewed-by: Jordan Justen <[email protected]>
* main: Support 1 Mesa format with get for GL_PROGRAM_BINARY_FORMATS (Jordan Justen, 2017-12-08, 2 files, -1/+10)
| | | | | | | | | Mesa supports either 0 or 1 formats. If 1 format is supported, it is GL_PROGRAM_BINARY_FORMAT_MESA as defined in the GL_MESA_program_binary_formats extension spec. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* main: Allow non-zero NUM_PROGRAM_BINARY_FORMATS (Jordan Justen, 2017-12-08, 2 files, -1/+4)
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* i965: Fix memory leak when serializing nir (Jordan Justen, 2017-12-08, 1 file, -0/+1)
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Add brw_program_serialize_nir (Jordan Justen, 2017-12-08, 3 files, -6/+14)
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Free serialized nir after deserializing (Jordan Justen, 2017-12-08, 1 file, -0/+6)
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Add brw_program_deserialize_nir (Jordan Justen, 2017-12-08, 3 files, -23/+28)
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* main, glsl: Add UniformDataDefaults which stores uniform defaults (Jordan Justen, 2017-12-08, 4 files, -2/+30)
| | | | | | | | | | | | | | | | | | | The ARB_get_program_binary extension requires that uniform values in a program be restored to their initial value just after linking. This patch saves off the initial values just after linking. When the program is restored by glProgramBinary, we can use this to copy the initial value of uniforms into UniformDataSlots. V2 (Timothy Arceri): - Store UniformDataDefaults only when serializing GLSL as this is what we want for both disk cache and ARB_get_program_binary. This saves us having to come back later and reset the Uniforms on program binary restores. Signed-off-by: Timothy Arceri <[email protected]> Signed-off-by: Jordan Justen <[email protected]> (v1) Reviewed-by: Tapani Pälli <[email protected]>
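The save-and-restore idea in the first paragraph above can be sketched as a snapshot taken after linking and copied back on restore. This is a simplified illustration under assumptions: the struct and function names are hypothetical, storage is modelled as a flat array of 32-bit slots, and the V2 change to when defaults are stored is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a linked program's uniform storage. */
struct uniform_storage {
   size_t num_slots;
   uint32_t *slots;     /* live values, changed by the application */
   uint32_t *defaults;  /* snapshot taken right after linking */
};

/* Snapshot the current slot values; call once, just after linking.
 * Returns false if the snapshot could not be allocated. */
static bool
save_uniform_defaults(struct uniform_storage *u)
{
   assert(u != NULL && u->defaults == NULL);
   if (u->num_slots == 0)
      return true;
   u->defaults = malloc(u->num_slots * sizeof(*u->defaults));
   if (u->defaults == NULL)
      return false;
   memcpy(u->defaults, u->slots, u->num_slots * sizeof(*u->defaults));
   return true;
}

/* Reset every slot to its post-link value, as a binary restore would. */
static void
restore_uniform_defaults(struct uniform_storage *u)
{
   assert(u != NULL);
   if (u->num_slots == 0)
      return;
   assert(u->defaults != NULL);
   memcpy(u->slots, u->defaults, u->num_slots * sizeof(*u->slots));
}
```

The caller owns the snapshot and is expected to free `defaults` when the program is destroyed.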
* glsl: Split out shader program serialization (Jordan Justen, 2017-12-08, 6 files, -1181/+1297)
| | | | | | | | This will allow us to use the program serialization to implement ARB_get_program_binary. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* include: Add GL_MESA_program_binary_formats to GL/GLES2 ext.h files (Jordan Justen, 2017-12-08, 2 files, -0/+10)
| | | | | | | | | This was merged into the OpenGL Registry in version 667c5a253781834b40a6ae9eb19d05af4542cfe1. Ref: https://github.com/KhronosGroup/OpenGL-Registry/pull/127 Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* mesa: add GL_PROGRAM_BINARY_FORMAT_MESA enum (Jordan Justen, 2017-12-08, 2 files, -1/+9)
| | | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution. (Francisco Jerez, 2017-12-07, 1 file, -6/+69)
| | | | | | | | | | | | | | | | | | | | | | | | | | This addresses a long-standing back-end compiler bug that could lead to cross-channel data corruption in loops executed non-uniformly. In some cases live variables extending through a loop divergence point (e.g. a non-uniform break) into a convergence point (e.g. the end of the loop) wouldn't be considered live along all physical control flow paths the SIMD thread could possibly have taken in between due to some channels remaining in the loop for additional iterations. This patch fixes the problem by extending the CFG with physical edges that don't exist in the idealized non-vectorized program, but represent valid control flow paths the SIMD EU may take due to the divergence of logical threads. This makes sense because the i965 IR is explicitly SIMD, and it's not uncommon for instructions to have an influence on neighboring channels (e.g. a force_writemask_all header setup), so the behavior of the SIMD thread as a whole needs to be considered. No changes in shader-db. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
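The effect of extra physical edges on liveness can be shown with a tiny backward dataflow pass. This is only a sketch under assumptions: the data structures are hypothetical, variables are bits in a 32-bit mask, and the placement of the physical edge in the usage note (loop tail to loop exit) is an illustrative guess rather than a description of where the patch adds edges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SUCCS 4

enum edge_kind { EDGE_LOGICAL, EDGE_PHYSICAL };

struct edge {
   int target;
   enum edge_kind kind;
};

/* Hypothetical basic block: use/def sets plus computed liveness. */
struct block {
   uint32_t use, def;
   uint32_t live_in, live_out;
   struct edge succs[MAX_SUCCS];
   int num_succs;
};

/* Classic backward liveness, iterated to a fixed point. When
 * follow_physical is false, physical edges are ignored, which models
 * the idealized non-vectorized view of the program. */
static void
compute_liveness(struct block *blocks, int num_blocks, bool follow_physical)
{
   for (int b = 0; b < num_blocks; b++)
      blocks[b].live_in = blocks[b].live_out = 0;

   bool progress;
   do {
      progress = false;
      for (int b = num_blocks - 1; b >= 0; b--) {
         struct block *blk = &blocks[b];
         uint32_t out = 0;
         for (int s = 0; s < blk->num_succs; s++) {
            const struct edge *e = &blk->succs[s];
            assert(e->target >= 0 && e->target < num_blocks);
            if (e->kind == EDGE_PHYSICAL && !follow_physical)
               continue;
            out |= blocks[e->target].live_in;
         }
         uint32_t in = blk->use | (out & ~blk->def);
         if (in != blk->live_in || out != blk->live_out) {
            blk->live_in = in;
            blk->live_out = out;
            progress = true;
         }
      }
   } while (progress);
}
```

Take block 0 as a loop head that defines v and may break to block 2, block 1 as the loop tail that jumps back to block 0, and block 2 as the exit that reads v. With logical edges only, v is not live in block 1, so its register could be reused there. Adding a physical edge from block 1 to block 2 makes v live across the tail.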
* intel/fs: Don't let undefined values prevent copy propagation. (Francisco Jerez, 2017-12-07, 1 file, -3/+47)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes the dataflow propagation logic of the copy propagation pass more intelligent in cases where the destination of a copy is known to be undefined for some incoming CFG edges, building upon the definedness information provided by the last patch. Helps a few programs, and avoids a handful of shader-db regressions from the next patch. shader-db results on ILK: total instructions in shared programs: 6541547 -> 6541523 (-0.00%) instructions in affected programs: 360 -> 336 (-6.67%) helped: 8 HURT: 0 LOST: 0 GAINED: 10 shader-db results on BDW: total instructions in shared programs: 8174323 -> 8173882 (-0.01%) instructions in affected programs: 7730 -> 7289 (-5.71%) helped: 5 HURT: 2 LOST: 0 GAINED: 4 shader-db results on SKL: total instructions in shared programs: 8185669 -> 8184598 (-0.01%) instructions in affected programs: 10364 -> 9293 (-10.33%) helped: 5 HURT: 2 LOST: 0 GAINED: 2 Reviewed-by: Jason Ekstrand <[email protected]>