aboutsummaryrefslogtreecommitdiffstats
path: root/src/intel
Commit message (Collapse)AuthorAgeFilesLines
* Get rid of a bunch of KHR suffixesJason Ekstrand2018-03-0711-248/+248
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv: Add version 1.1.0 but leave it disabledJason Ekstrand2018-03-077-21/+22
| | | | | | | | This requires us to rename any Vulkan API entrypoints which became core in 1.1 to no longer have the KHR suffix. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv/entrypoints: Generate #ifdef guards from platform attributesJason Ekstrand2018-03-071-0/+8
| | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv/extensions: Add support for multiple API versionsJason Ekstrand2018-03-072-10/+40
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv/entrypoints_gen: Add support for aliases in the XMLJason Ekstrand2018-03-071-19/+46
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv/entrypoints: Allow an entrypoint to require multiple extensionsJason Ekstrand2018-03-071-9/+12
| | | | | | | | In this case, we say an entrypoint is supported if ANY of the extensions is supported. This is because, in the XML, entrypoints don't require extensions so much as extensions require entrypoints. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv/entrypoints: Add an is_device_entrypoint helperJason Ekstrand2018-03-071-2/+5
| | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv/entrypoints_gen: Allow the string map to growJason Ekstrand2018-03-071-4/+6
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv/entrypoints_gen: A bit of refactoringJason Ekstrand2018-03-071-15/+7
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv/entrypoints: Generalize the string map a bitJason Ekstrand2018-03-071-85/+103
| | | | | | | | | | | | | The original string map assumed that the mapping from strings to entrypoints was a bijection. This will not be true the moment we add entrypoint aliasing. This reworks things to be an arbitrary map from strings to non-negative signed integers. The old one also had a potential bug if we ever had a hash collision because it didn't do the strcmp inside the lookup loop. While we're at it, we break things out into a helpful class. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* vulkan: Rename multiview from KHX to KHRJason Ekstrand2018-03-074-10/+10
| | | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* android: anv: add libmesa_intel_dev static dependencyMauro Rossi2018-03-071-0/+1
| | | | | | | | | | | | Fixes the following building errors: external/mesa/src/intel/vulkan/anv_device.c:300: error: undefined reference to 'gen_get_pci_device_id_override' external/mesa/src/intel/vulkan/anv_device.c:312: error: undefined reference to 'gen_get_device_name' external/mesa/src/intel/vulkan/anv_device.c:313: error: undefined reference to 'gen_get_device_info' clang.real: error: linker command failed with exit code 1 (use -v to see invocation) Fixes: 272bef0601a "intel: Split gen_device_info out into libintel_dev" Reviewed-by: Tapani Pälli <[email protected]>
* intel: Add missing includes for building on AndroidClayton Craft2018-03-061-0/+1
| | | | | | | | | | | This adds a missing library to the i965/Android.mk file, and updates intel/Android.mk to include the new library. Without this, mesa does not build on Android. Fixes: 272bef0601a "intel: Split gen_device_info out into libintel_dev" Reviewed-by: Kenneth Graunke <[email protected]>
* vulkan: do not expose surface/swapchain extensions on AndroidTapani Pälli2018-03-061-1/+1
| | | | | | | | | On Android surface/swapchain extensions are implemented by the loader. Patch modifies both anv and radv extension scripts disabling currently exposed ones. See also earlier commit 9f763c1f9b. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Don't expose VK_KHX_multiview on android.Tapani Pälli2018-03-061-1/+1
| | | | | | | | | | | | | | Just like commit 2ffe395 does for radv. Fixes following dEQP test on i965: dEQP-VK.api.info.android.no_unknown_extensions v2: make it !ANDROID since this extension is not about surfaces/swapchain Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* intel: Drop SURFACE_FORMAT enum from genxml.Kenneth Graunke2018-03-0514-2265/+31
| | | | | | | | | | | We want people to be using ISL_FORMAT_*, rather than the genxml format enumerations. This patch drops 10 separate copies, and drops a bunch of ugly casting. Reviewed-by: Jordan Justen <[email protected]> [[email protected]: Minor changes for rebase] Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* intel/common: Use isl for decoder surface formatsJordan Justen2018-03-053-1/+10
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* intel/isl: Add isl_format_is_validJordan Justen2018-03-052-0/+10
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* intel: Split gen_device_info out into libintel_devJordan Justen2018-03-0528-25/+131
| | | | | | | | | | | | Split out the device info so isl doesn't depend on intel/common. Now it will depend on the new intel/dev device info lib. This will allow the decoder in intel/common to use isl, allowing us to apply Ken's patch that removes the genxml duplication of surface formats. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* isl: Silence unused parameter warnings in __gen_combine_address implementationsIan Romanick2018-03-022-2/+6
| | | | | | | | | | | | | | | | Reduces my build from 1808 warnings to 1772 warnings by silencing 36 instances of things like ../../SOURCE/master/src/intel/isl/isl_emit_depth_stencil.c: In function ‘__gen_combine_address’: ../../SOURCE/master/src/intel/isl/isl_emit_depth_stencil.c:30:29: warning: unused parameter ‘data’ [-Wunused-parameter] __gen_combine_address(void *data, void *loc, uint64_t addr, uint32_t delta) ^~~~ ../../SOURCE/master/src/intel/isl/isl_emit_depth_stencil.c:30:41: warning: unused parameter ‘loc’ [-Wunused-parameter] __gen_combine_address(void *data, void *loc, uint64_t addr, uint32_t delta) ^~~ Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* genxml: Silence unused parameter warnings in generated pack codeIan Romanick2018-03-021-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | Reduces my build from 1960 warnings to 1808 warnings by silencing 152 instances of things like In file included from ../../SOURCE/master/src/intel/genxml/genX_pack.h:32:0, from ../../SOURCE/master/src/intel/isl/isl_emit_depth_stencil.c:36: src/intel/genxml/gen4_pack.h: In function ‘__gen_uint’: src/intel/genxml/gen4_pack.h:58:49: warning: unused parameter ‘end’ [-Wunused-parameter] __gen_uint(uint64_t v, uint32_t start, uint32_t end) ^~~ src/intel/genxml/gen4_pack.h: In function ‘__gen_offset’: src/intel/genxml/gen4_pack.h:94:35: warning: unused parameter ‘start’ [-Wunused-parameter] __gen_offset(uint64_t v, uint32_t start, uint32_t end) ^~~~~ src/intel/genxml/gen4_pack.h:94:51: warning: unused parameter ‘end’ [-Wunused-parameter] __gen_offset(uint64_t v, uint32_t start, uint32_t end) ^~~ src/intel/genxml/gen4_pack.h: In function ‘__gen_ufixed’: src/intel/genxml/gen4_pack.h:133:48: warning: unused parameter ‘end’ [-Wunused-parameter] __gen_ufixed(float v, uint32_t start, uint32_t end, uint32_t fract_bits) ^~~ Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* i965: Silence unused parameter warnings in blorpIan Romanick2018-03-021-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reduces my build from 2023 warnings to 1960 warnings by silencing 63 instances of things like In file included from ../../SOURCE/master/src/mesa/drivers/dri/i965/genX_blorp_exec.c:33:0: ../../SOURCE/master/src/intel/blorp/blorp_genX_exec.h: In function ‘blorp_emit_cc_viewport’: ../../SOURCE/master/src/intel/blorp/blorp_genX_exec.h:500:51: warning: unused parameter ‘params’ [-Wunused-parameter] const struct blorp_params *params) ^~~~~~ ../../SOURCE/master/src/intel/blorp/blorp_genX_exec.h: In function ‘blorp_emit_sampler_state’: ../../SOURCE/master/src/intel/blorp/blorp_genX_exec.h:524:53: warning: unused parameter ‘params’ [-Wunused-parameter] const struct blorp_params *params) ^~~~~~ In file included from ../../SOURCE/master/src/mesa/drivers/dri/i965/genX_blorp_exec.c:36:0: ../../SOURCE/master/src/mesa/drivers/dri/i965/gen4_blorp_exec.h: In function ‘blorp_emit_vs_state’: ../../SOURCE/master/src/mesa/drivers/dri/i965/gen4_blorp_exec.h:50:48: warning: unused parameter ‘params’ [-Wunused-parameter] const struct blorp_params *params) ^~~~~~ ../../SOURCE/master/src/mesa/drivers/dri/i965/genX_blorp_exec.c: In function ‘blorp_flush_range’: ../../SOURCE/master/src/mesa/drivers/dri/i965/genX_blorp_exec.c:197:39: warning: unused parameter ‘batch’ [-Wunused-parameter] blorp_flush_range(struct blorp_batch *batch, void *start, size_t size) ^~~~~ ../../SOURCE/master/src/mesa/drivers/dri/i965/genX_blorp_exec.c:197:52: warning: unused parameter ‘start’ [-Wunused-parameter] blorp_flush_range(struct blorp_batch *batch, void *start, size_t size) ^~~~~ ../../SOURCE/master/src/mesa/drivers/dri/i965/genX_blorp_exec.c:197:66: warning: unused parameter ‘size’ [-Wunused-parameter] blorp_flush_range(struct blorp_batch *batch, void *start, size_t size) ^~~~ Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* i965: Silence warnings about mixing enum and non-enum in conditionalIan Romanick2018-03-021-1/+1
| | | | | | | | | | | | | | | | | | | | Reduces my build from 6451 warnings to 6301 warnings by silencing 150 instances of ../../SOURCE/master/src/intel/compiler/brw_inst.h: In function ‘brw_reg_type brw_inst_src1_type(const gen_device_info*, const brw_inst*)’: ../../SOURCE/master/src/intel/compiler/brw_inst.h:802:55: warning: enumeral and non-enumeral type in conditional expression [-Wextra] unsigned file = __builtin_strcmp("dst", #reg) == 0 ? \ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~ BRW_GENERAL_REGISTER_FILE : \ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ brw_inst_##reg##_reg_file(devinfo, inst); \ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../../SOURCE/master/src/intel/compiler/brw_inst.h:811:1: note: in expansion of macro ‘REG_TYPE’ REG_TYPE(src1) ^~~~~~~~ Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* intel/compiler: Silence unused parameter warnings in release buildsIan Romanick2018-03-022-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reduces my build from 7005 warnings to 6451 warnings by silencing 554 instances of In file included from ../../SOURCE/master/src/intel/compiler/brw_disasm.c:28:0: ../../SOURCE/master/src/intel/compiler/brw_inst.h: In function ‘brw_inst_3src_a1_src0_imm’: ../../SOURCE/master/src/intel/compiler/brw_inst.h:346:57: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_inst_3src_a1_src0_imm(const struct gen_device_info *devinfo, ^~~~~~~ ../../SOURCE/master/src/intel/compiler/brw_inst.h: In function ‘brw_inst_3src_a1_src2_imm’: ../../SOURCE/master/src/intel/compiler/brw_inst.h:354:57: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_inst_3src_a1_src2_imm(const struct gen_device_info *devinfo, ^~~~~~~ ../../SOURCE/master/src/intel/compiler/brw_inst.h: In function ‘brw_inst_set_3src_a1_src0_imm’: ../../SOURCE/master/src/intel/compiler/brw_inst.h:362:61: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_inst_set_3src_a1_src0_imm(const struct gen_device_info *devinfo, ^~~~~~~ ../../SOURCE/master/src/intel/compiler/brw_inst.h: In function ‘brw_inst_set_3src_a1_src2_imm’: ../../SOURCE/master/src/intel/compiler/brw_inst.h:370:61: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_inst_set_3src_a1_src2_imm(const struct gen_device_info *devinfo, ^~~~~~~ ../../SOURCE/master/src/intel/compiler/brw_inst.h: In function ‘brw_inst_imm_uq’: ../../SOURCE/master/src/intel/compiler/brw_inst.h:703:47: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_inst_imm_uq(const struct gen_device_info *devinfo, const brw_inst *insn) ^~~~~~~ In file included from ../../SOURCE/master/src/intel/compiler/brw_shader.h:29:0, from ../../SOURCE/master/src/intel/compiler/brw_disasm.c:29: ../../SOURCE/master/src/intel/compiler/brw_compiler.h: In function ‘brw_stage_has_packed_dispatch’: ../../SOURCE/master/src/intel/compiler/brw_compiler.h:1277:61: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_stage_has_packed_dispatch(const struct gen_device_info *devinfo, ^~~~~~~ ../../SOURCE/master/src/intel/compiler/brw_disasm.c: In function ‘src_ia1’: ../../SOURCE/master/src/intel/compiler/brw_disasm.c:849:18: warning: unused parameter ‘_reg_file’ [-Wunused-parameter] unsigned _reg_file, ^~~~~~~~~ Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* intel: Drop program size pointer from vec4/fs assembly getters.Kenneth Graunke2018-03-029-25/+17
| | | | | | | | | These days, we're just passing a pointer to a prog_data field, which we already have access to. We can just use it directly. (In the past, it was a pointer to a separate value.) Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/compiler: Memory fence commit must always be enabled for gen10+Anuj Phogat2018-03-021-1/+3
| | | | | | | | | | | | Commit bit in the message descriptor (Bit 13) must be always set to true in CNL+ for memory fence messages. It also fixes a piglit GPU hang on cnl+ in simulation environment. Piglit test: arb_shader_image_load_store-shader-mem-barrier See HSD ES # 1404612949 Signed-off-by: Anuj Phogat <[email protected]> Cc: [email protected] Reviewed-by: Francisco Jerez <[email protected]>
* Revert "i965/fs: Predicate byte scattered writes if needed"Francisco Jerez2018-03-021-14/+1
| | | | | | | | | | | | | This reverts commit a4031bdfa927fb4c3c5d0bdadc70634f3c1a5eac. It's redundant with the sample mask predication done at this point by the common logical send lowering infrastructure, and rather buggy because it wasn't applying the correct sample mask in shaders using discard, since the dispatch mask returned by FS_OPCODE_MOV_DISPATCH_TO_FLAGS doesn't reflect samples discarded by the shader, so it could have led to data corruption in fragment shader invocations that execute discard based on a non-dynamically uniform condition. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Handle surface opcode sample masks via predication.Francisco Jerez2018-03-021-1/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main motivation is to enable HDC surface opcodes on ICL which no longer allows the sample mask to be provided in a message header, but this is enabled all the way back to IVB when possible because it decreases the instruction count of some shaders using HDC messages significantly, e.g. one of the SynMark2 CSDof compute shaders decreases instruction count by about 40% due to the removal of header setup boilerplate which in turn makes a number of send message payloads more easily CSE-able. Shader-db results on SKL: total instructions in shared programs: 15325319 -> 15314384 (-0.07%) instructions in affected programs: 311532 -> 300597 (-3.51%) helped: 491 HURT: 1 Shader-db results on BDW where the optimization needs to be disabled in some cases due to hardware restrictions: total instructions in shared programs: 15604794 -> 15598028 (-0.04%) instructions in affected programs: 220863 -> 214097 (-3.06%) helped: 351 HURT: 0 The FPS of SynMark2 CSDof improves by 5.09% ±0.36% (n=10) on my SKL laptop with this change. According to Eero this improves performance of the same test by 9% on BYT and by 7-8% on BXT J4205 and on SKL GT2 desktop. Reviewed-by: Kenneth Graunke <[email protected]> Tested-By: Eero Tamminen <[email protected]>
* intel/eu: Plumb header present bit to codegen helpers for HDC messages.Francisco Jerez2018-03-024-29/+50
| | | | | | | | | | This makes sure that the header-present bit of the message descriptor is in sync with the IR instruction fields, which gives the optimizer more control to avoid the overhead of setting up a message header when it's possible to do so. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/ir: Allow arbitrary scratch flag registers for ↵Francisco Jerez2018-03-023-4/+6
| | | | | | | | | | | | | | SHADER_OPCODE_FIND_LIVE_CHANNEL. This shouldn't cause any functional change at this point, it changes SHADER_OPCODE_FIND_LIVE_CHANNEL to use the flag register specified at the IR level instead of the hard-coded f1.0, now that it can be represented in backend_instruction::flag_subreg. This will be necessary for scheduling to behave correctly once more things start making use of f1.0. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/ir: Allow representing additional flag subregisters in the IR.Francisco Jerez2018-03-027-14/+24
| | | | | | | | | This allows representing conditional mods and predicates on f1.0-f1.1 at the IR level by adding an extra bit to the flag_subreg backend_instruction field. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/l3: Don't allocate SLM partition on ICL+.Francisco Jerez2018-03-021-1/+1
| | | | | | | | SLM has a chunk of special-purpose memory separate from L3 on ICL+, we shouldn't allocate a partition for it on L3 anymore. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Set up sampler message headers in the visitor on gen7+Jason Ekstrand2018-03-012-22/+39
| | | | | | | | | | | | | | This gives the scheduler visibility into the headers which should improve scheduling. More importantly, however, it lets the scheduler know that the header gets written. As-is, the scheduler thinks that a texture instruction only reads it's payload and is unaware that it may write to the first register so it may reorder it with respect to a read from that register. This is causing issues in a couple of Dota 2 vertex shaders. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104923 Cc: [email protected] Reviewed-by: Francisco Jerez <[email protected]>
* anv: Enable MSAA fast-clearsJason Ekstrand2018-03-011-4/+7
| | | | | | | This speeds up the Sascha Willems multisampling demo by around 25% when using 8x or 16x MSAA. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv/cmd_buffer: Add support for MCS fast-clears and resolvesJason Ekstrand2018-03-011-5/+39
| | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv/cmd_buffer: Add helpers for computing resolve predicatesJason Ekstrand2018-03-011-10/+64
| | | | | | | We'll want to re-use the complex resolve predicate computations for MCS resolves so it's nice to have them as helper functions. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv/cmd_buffer: Handle MCS identical to CCS_E in compute_aux_usageJason Ekstrand2018-03-011-9/+5
| | | | | | | | This doesn't actually do anything because att_state->fast_clear is determined based on the return value of anv_layout_to_fast_clear_type which currently returns NONE for multisampled images. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv/blorp: Pass the clear address to blorp for subpass MSAA resolvesJason Ekstrand2018-03-011-0/+6
| | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv/blorp: Allow indirect clear colors on blorp sources on gen7Jason Ekstrand2018-03-011-2/+2
| | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv/blorp: Add partial clear support to anv_image_mcs_opJason Ekstrand2018-03-011-1/+14
| | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* intel/blorp: Add indirect clear color support to mcs_partial_resolveJason Ekstrand2018-03-013-10/+70
| | | | | | | | | | This is a bit complicated because we have to get the indirect clear color in there somehow. In order to not do any more work in the shader than needed, we set it up as it's own vertex binding which points directly at the clear color address specified by the client. Acked-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/blorp: Add a helper for filling out VERTEX_BUFFER_STATEJason Ekstrand2018-03-011-36/+33
| | | | | | | There are enough #ifs in there that it's kind-of pointless to duplicate it for each buffer. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv: Enable VK_KHR_16bit_storage for PushConstantJose Maria Casanova Crespo2018-02-281-1/+1
| | | | | | Enables storagePushConstant16 features of VK_KHR_16bit_storage for Gen8+. Reviewed-by: Jason Ekstrand <[email protected]>
* spirv/i965/anv: Relax push constant offset assertions being 32-bit alignedJose Maria Casanova Crespo2018-02-282-7/+10
| | | | | | | | | | | | | | | | The introduction of 16-bit types with VK_KHR_16bit_storages implies that push constant offsets could be multiple of 2-bytes. Some assertions are updated so offsets should be just multiple of size of the base type but in some cases we can not assume it as doubles aren't aligned to 8 bytes in some cases. For 16-bit types, the push constant offset takes into account the internal offset in the 32-bit uniform bucket adding 2-bytes when we access not 32-bit aligned elements. In all 32-bit aligned cases it just becomes 0. v2: Assert offsets to be aligned to the dest type size. (Jason Ekstrand) Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Enable VK_KHR_16bit_storage for SSBO and UBOJose Maria Casanova Crespo2018-02-282-3/+4
| | | | | | | Enables storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccesss features of VK_KHR_16bit_storage for Gen8+. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layoutJose Maria Casanova Crespo2018-02-281-7/+15
| | | | | | | | | | | | | | | | Restrict the use of untyped_surface_write with 16-bit pairs in ssbo to the cases where we can guarantee that offset is multiple of 4. Taking into account that VK_KHR_relaxed_block_layout is available in ANV we can only guarantee that when we have a constant offset that is multiple of 4. For non constant offsets we will always use byte_scattered_write. v2: (Jason Ekstrand) - Assert offset_reg to be multiple of 4 if it is immediate. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Support 16-bit do_read_vector with VK_KHR_relaxed_block_layoutJose Maria Casanova Crespo2018-02-281-14/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | 16-bit load_ubo/ssbo operations that call do_untyped_read_vector don't guarantee that offsets are multiple of 4-bytes as required by untyped_read message. This happens for example in the case of f16mat3x3 when then VK_KHR_relaxed_block_layout is enabled. Vectors reads when we have non-constant offsets are implemented with multiple byte_scattered_read messages that not require 32-bit aligned offsets. Now for all constant offsets we can use the untyped_read_surface message. In the case of constant offsets not aligned to 32-bits, we calculate a start offset 32-bit aligned and use the shuffle_32bit_load_result_to_16bit_data function and the first_component parameter to skip the copy of the unneeded component. v2: (Jason Ekstrand) Use untyped_read_surface messages always we have constant offsets. v3: (Jason Ekstrand) Simplify loop for reads with non constant offsets. Use end - start to calculate the number of 32-bit components to read with constant offsets. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: shuffle_32bit_load_result_to_16bit_data now skips componentsJose Maria Casanova Crespo2018-02-283-3/+6
| | | | | | | | | | | | | | | | This helper used to load 16bit components from 32-bits read now allows skipping components with the new parameter first_component. The semantics now skip components until we reach the first_component, and then reads the number of components passed to the function. All previous uses of the helper are updated to use 0 as first_component. This will allow read 16-bit components when the first one is not aligned 32-bit. Enabling more usages of untyped_reads with 16-bit types. v2: (Jason Ektrand) Change parameters order to first_component, num_components Reviewed-by: Jason Ekstrand <[email protected]>
* isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bitJose Maria Casanova Crespo2018-02-283-2/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The surfaces that backup the GPU buffers have a boundary check that considers that access to partial dwords are considered out-of-bounds. For example, buffers with 1,3 16-bit elements has size 2 or 6 and the last two bytes would always be read as 0 or its writting ignored. The introduction of 16-bit types implies that we need to align the size to 4-bytew multiples so that partial dwords could be read/written. Adding an inconditional +2 size to buffers not being multiple of 2 solves this issue for the general cases of UBO or SSBO. But, when unsized arrays of 16-bit elements are used it is not possible to know if the size was padded or not. To solve this issue the implementation calculates the needed size of the buffer surfaces, as suggested by Jason: surface_size = isl_align(buffer_size, 4) + (isl_align(buffer_size, 4) - buffer_size) So when we calculate backwards the buffer_size in the backend we update the resinfo return value with: buffer_size = (surface_size & ~3) - (surface_size & 3) It is also exposed this buffer requirements when robust buffer access is enabled so these buffer sizes recommend being multiple of 4. v2: (Jason Ekstrand) Move padding logic fron anv to isl_surface_state. Move calculus of original size from spirv to driver backend. v3: (Jason Ekstrand) Rename some variables and use a similar expresion when calculating. padding than when obtaining the original buffer size. Avoid use of unnecesary component call at brw_fs_nir. v4: (Jason Ekstrand) Complete comment with buffer size calculus explanation in brw_fs_nir. Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Always set has_context_priorityJason Ekstrand2018-02-281-3/+1
| | | | | | | | | | We don't zalloc the physical device so we need to unconditionally set everything. Crucible helpfully initializes all allocations to 139 so it was getting true regardless of whether or not the kernel actually supports context priorities. Fixes: 6d8ab53303331 "anv: implement VK_EXT_global_priority extension" Reviewed-by: Kenneth Graunke <[email protected]>