summaryrefslogtreecommitdiffstats
path: root/src/compiler
Commit message (Collapse)AuthorAgeFilesLines
* spirv/cl: support vload/vstoreKarol Herbst2019-05-041-0/+55
| | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add nir_op_vec helperKarol Herbst2019-05-043-22/+14
| | | | | | | | | with that we can simplify code where nir vectors are created v2: merge both lines in nir_vec Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add a nir_builder_alu variant which takes an array of componentsKarol Herbst2019-05-041-14/+36
| | | | | | | v2: rename to nir_build_alu_src_arr Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* vtn: handle bitcast with pointer src/destKarol Herbst2019-05-043-29/+45
| | | | | | | v2: use vtn_push_ssa and vtn_ssa_value Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add a SSA type gathering passJason Ekstrand2019-05-044-0/+223
| | | | | | | | | | | | | | This new pass (which isn't even compile-tested) attempts to determine the ALU type of all the SSA values in a function impl. It takes a greedy approach and assigns intness or floatness to everything it thinks can possibly contain an int or a float. Some values will be labled as both int and float and some will be labled as neither and it is up to the caller to decide what to do with this information. However, for a "nice" shader where the original source contained no bit-casts and no implicit bit-casts were introduced by optimizations, there shouldn't be any overlap in the two sets save for the odd CSEd zero constant. Reviewed-by: Vasily Khoruzhick <[email protected]>
* nir/algebraic: Don't emit empty initializers for MSVCConnor Abbott2019-05-041-0/+4
| | | | | | | | | Just don't emit the transform array at all if there are no transforms v2: - Don't use len(array) > 0 (Dylan) - Keep using ARRAY_SIZE to make the generated C code easier to read (Jason).
* meson: Don't build glsl cache_test when shader cache is disabledDylan Baker2019-05-031-12/+13
| | | | | | | v2: - Use new with_shader_cache variable instead of host_machine.system() == 'windows' Reviewed-by: Eric Anholt <[email protected]>
* glsl/tests: define ssize_t on windowsDylan Baker2019-05-031-0/+4
| | | | Reviewed-by: Eric Anholt <[email protected]>
* glsl: fix general_ir_test with mingwDylan Baker2019-05-031-7/+7
| | | | | | | | Somewhere down in the depths of the mingw headers 'interface' is defined, change it to iface like a similar patch did. Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: fix lower vars to ssa for larger vector sizes.Dave Airlie2019-05-031-4/+4
| | | | | | | This has a couple of hardcoded vec4 limits in it, change them to the proper sizing to avoid future issues. Reviewed-by: Jason Ekstrand <[email protected]>
* spirv: fix SpvOpBitSize return value.Dave Airlie2019-05-031-3/+1
| | | | | | The spir-v spec says this returns a bool. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: fix nir tex print harderRob Clark2019-05-021-6/+5
| | | | | | Fixes: 691d5a825a6 nir: rework tex instruction printing Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* glsl: fix and clean up NV_compute_shader_derivatives supportMarek Olšák2019-05-021-54/+24
| | | | | | | - make sure compute shader derivatives are exposed for all extensions - unify duplicated code Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: add pass to lower fb readsRob Clark2019-05-025-6/+141
| | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* nir: fix lower_wpos_ytransform in load_frag_coord caseRob Clark2019-05-021-10/+11
| | | | | | | | | | | | | | | Apparently we never hit this path. Or at least haven't for a rather long time. But in either case (load_deref or load_frag_coord), we can just directly use the intrinsic's ssa dest. So stop passing the nir_variable (which would be NULL in the load_frag_coord case) around and instead just use &intr->dest.ssa. (This ofc means we need to setup the cursor to insert *after* the instruction, which seems to be another bug of the original implementation.) Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* nir: rework tex instruction printingRob Clark2019-05-021-8/+10
| | | | | | | The extra comma at the end was annoying me. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* nir/search: Add debugging code to dump the pattern matchedConnor Abbott2019-05-021-0/+75
| | | | | | This was useful while debugging the previous commit. Reviewed-by: Jason Ekstrand <[email protected]>
* nir/search: Add automaton-based pre-searchingConnor Abbott2019-05-023-19/+425
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nir_opt_algebraic is currently one of the most expensive NIR passes, because of the many different patterns we've added over the years. Even though patterns are already sorted by opcode, there are still way too many patterns for common opcodes like bcsel and fadd, which means that many patterns are tried but only a few actually match. One way to fix this is to add a pre-pass over the code that scans it using an automaton constructed beforehand, similar to the automatons produced by lex and yacc for parsing source code. This automaton has to walk the SSA graph and recognize possible pattern matches. It turns out that the theory to do this is quite mature already, having been developed for instruction selection as well as other non-compiler things. I followed the presentation in the dissertation cited in the code, "Tree algorithms: Two Taxonomies and a Toolkit," trying to keep the naming similar. To create the automaton, we have to perform something like the classical NFA to DFA subset construction used by lex, but it turns out that actually computing the transition table for all possible states would be way too expensive, with the dissertation reporting times of almost half an hour for an example of size similar to nir_opt_algebraic. Instead, we adopt one of the "filter" approaches explained in the dissertation, which trade much faster table generation and table size for a few more table lookups per instruction at runtime. I chose the filter which resulted the fastest table generation time, with medium table size. Right now, the table generation takes around .5 seconds, despite being implemented in pure Python, which I think is good enough. Based on the numbers in the dissertation, the other choice might make table compilation time 25x slower to get 4x smaller table size, but I don't think that's worth it. As of now, we get the following binary size before and after this patch: text data bss dec hex filename 11979455 464720 730864 13175039 c908ff before i965_dri.so text data bss dec hex filename 12037835 616244 791792 13445871 cd2aef after i965_dri.so There are a number of places where I've simplified the automaton by getting rid of details in the LHS patterns rather than complicate things to deal with them. For example, right now the automaton doesn't distinguish between constants with different values. This means that it isn't as precise as it could be, but the decrease in compile time is still worth it -- these are the compilation time numbers for a shader-db run with my (admittedly old) database on Intel skylake: Difference at 95.0% confidence -42.3485 +/- 1.375 -7.20383% +/- 0.229926% (Student's t, pooled s = 1.69843) We can always experiment with making it more precise later. Reviewed-by: Jason Ekstrand <[email protected]>
* glsl: fix typo in #warning messageBrian Paul2019-05-021-1/+1
| | | | Trivial. Spotted by Eric Engestrom.
* glsl: work around MinGW 7.x compiler bugBrian Paul2019-05-011-0/+15
| | | | | | | | | | | I'm not sure what triggered this, but building with scons platform=windows toolchain=crossmingw machine=x86 build=profile with MinGW g++ 7.3 or 7.4 causes an internal compiler error. We can work around it by forcing -O1 optimization. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Neha Bhende <[email protected]>
* nir: Saturating integer arithmetic is not associativeIan Romanick2019-05-011-1/+1
| | | | | | | | | | | | | | | | | | In 8-bits, iadd_sat(iadd_sat(0x7f, 0x7f), -1) = iadd_sat(0x7f, -1) = 0x7e but, iadd_sat(0x7f, iadd_sat(0x7f, -1)) = iadd_sat(0x7f, 0x7e) = 0x7f Fixes: 272e927d0e9 ("nir/spirv: initial handling of OpenCL.std extension opcodes") Reviewed-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: improve convert_yuv_to_rgbJonathan Marek2019-05-011-15/+14
| | | | | | | | | | | | Use a different arrangement of constants to allow more ffma. A vec4 backend will now use 3 fma for yuv_to_rgb. On freedreno/ir3, it is down from 10 to 7 alu (4 fma, 3 mul, 3 add to 7 fma). Other backends shouldn't be hurt. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Tested-by: Ian Romanick <[email protected]>
* spirv: add missing SPV_EXT_descriptor_indexing capabilitiesJuan A. Suarez Romero2019-04-302-0/+16
| | | | | | | | | | | | | | Add ShaderNonUniformEXT, UniformBufferArrayNonUniformIndexingEXT, SampledImageArrayNonUniformIndexingEXT, StorageBufferArrayNonUniformIndexingEXT, StorageImageArrayNonUniformIndexingEXT, InputAttachmentArrayNonUniformIndexingEXT, UniformTexelBufferArrayNonUniformIndexingEXT and StorageTexelBufferArrayNonUniformIndexingEXT capabilities. Cc: [email protected] Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* spirv: Properly handle SpvOpAtomicCompareExchangeWeakCaio Marcelo de Oliveira Filho2019-04-291-75/+82
| | | | | | | | | | | | | | | | | The code was handling the Weak variant in some cases, but missing others, e.g. the get_deref_nir_atomic_op. Add all the missing cases with the same behavior of the non-Weak SpvOpAtomicCompareExchange. Note that the Weak variant is basically an alias, as SPIR-V 1.3, Revision 7 says "OpAtomicCompareExchangeWeak Deprecated (use OpAtomicCompareExchange). Has the same semantics as OpAtomicCompareExchange." Reviewed-by: Jason Ekstrand <[email protected]>
* delete autotools .gitignore filesEric Engestrom2019-04-299-45/+0
| | | | | | | | One special case, `src/util/xmlpool/.gitignore` is not entirely deleted, as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`). Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* glsl/linker: check for xfb_offset aliasingAndres Gomez2019-04-292-31/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From page 76 (page 80 of the PDF) of the GLSL 4.60 v.5 spec: " No aliasing in output buffers is allowed: It is a compile-time or link-time error to specify variables with overlapping transform feedback offsets." Currently, this is expected to fail, but it succeeds: " ... layout (xfb_offset = 0) out vec2 a; layout (xfb_offset = 0) out vec4 b; ... " Fixes the following piglit test: tests/spec/arb_enhanced_layouts/compiler/transform-feedback-layout-qualifiers/xfb_offset/invalid-overlap.vert Fixes the following test: KHR-GL44.enhanced_layouts.xfb_output_overlapping v2: - Use a data structure to track the used components instead of a nested loop (Ilia). v3: - Take the BITSET_WORD array out from the gl_transform_feedback_buffer struct and make it local to the validation process (Timothy). - Do not use a nested scope for the validation (Timothy). v4: - Add reference to the fixed piglit test in the commit log. - Add reference to the fixed VK-GL-CTS test in the commit log (Tapani). - Empty initialize the BITSET_WORD pointers array (Tapani). Cc: Timothy Arceri <[email protected]> Cc: Ilia Mirkin <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* nir: Add a new nir_cf_list_is_empty_block() helper.Kenneth Graunke2019-04-281-0/+15
| | | | | | | Helper and name suggested by Eric Anholt. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl/list: Add an exec_list_is_singular() helper.Kenneth Graunke2019-04-281-0/+7
| | | | | | | Similar to list_is_singular() in util/list.h. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* nir: add rcp(w) lowering for gl_FragCoordAndreas Baierl2019-04-294-0/+84
| | | | | | | | | | | | | | On some hardware (e.g. Mali400) the shader needs to apply some transformations for correct gl_FragCoord handling. The lowering actions look like the following in pseudocode: gl_FragCoord.xyz = gl_FragCoord_orig.xyz gl_FragCoord.w = 1.0 / gl_FragCoord_orig.w Add this lowering as a nir pass in preparation for using it in the driver. Signed-off-by: Andreas Baierl <[email protected]> Reviewed-by: Qiang Yu <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: use empty brace initializerTapani Pälli2019-04-261-2/+2
| | | | | | | | | fixes following warning with clang: warning: suggest braces around initialization of subobject Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* nir: use braces around subobject in initializerTapani Pälli2019-04-262-6/+6
| | | | | | | | | | | | | | Used same syntax as elsewhere with Mesa sources, verified result against MSVC with godbolt.org. fixes following warning with clang: warning: suggest braces around initialization of subobject v2: empty braces -> braces around subobject (Caio, Kristian) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* nir/algebraic: Optimize integer cast-of-castJason Ekstrand2019-04-261-0/+42
| | | | | | | These have been popping up more and more with the OpenCL work and other bits causing extra conversions to/from 64-bit. Reviewed-by: Karol Herbst <[email protected]>
* nir: fix bit_size in lower indirect derefs.Dave Airlie2019-04-261-1/+1
| | | | | | | | This fixes a case where we are expecting 64-bit but generate 32-bit consts and validate gets angry. Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* glsl: fix shader_storage_blocks_write_access for SSBO block arrays (v2)Marek Olšák2019-04-251-3/+19
| | | | | | | | | | This fixes KHR-GL45.compute_shader.resources-max on radeonsi. Fixes: 4e1e8f684bf "glsl: remember which SSBOs are not read-only and pass it to gallium" v2: use is_interface_array, protect again assertion failures in u_bit_consecutive Reviewed-by: Dave Airlie <[email protected]>
* freedreno/ir3: lower load_barycentric_at_offsetRob Clark2019-04-251-0/+3
| | | | | | | | | Calculates i,j at specified offset within a pixel. A new load_size_ir3 intrinsic is used in conjunction with fddx/fddy to translate the offset into primitive space and adjust the i,j from load_barycentric_pixel accordingly. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: lower load_barycentric_at_sampleRob Clark2019-04-251-0/+7
| | | | | | | This lowers load_barycentric_at_sample to load_sample_pos_from_id plus load_barycentric_at_offset. Signed-off-by: Rob Clark <[email protected]>
* compiler: rename SYSTEM_VALUE_VARYING_COORDRob Clark2019-04-252-3/+12
| | | | | | | And add corresponding enums for different sorts of varying interpolation. Signed-off-by: Rob Clark <[email protected]>
* nir: Add option to lower tex to txl when shader don't support implicit LODCaio Marcelo de Oliveira Filho2019-04-252-0/+8
| | | | | | | | | | | We already add the LOD src, so go ahead and update the texop as well when this option is set. v2: Make it an option. (Rob Clark) v3: Use a more concise name suggested by Jason. Reviewed-by: Rob Clark <[email protected]>
* nir: fix nir_remove_unused_varyings()Timothy Arceri2019-04-251-18/+33
| | | | | | | | | | | | | | | | | | | | | | | | We were only setting the used mask for the first component of a varying. Since the linking opts split vectors into scalars this has mostly worked ok. However this causes an issue where for example if we split a struct on one side of the interface but not the other, then we can possibly end up removing the first components on the side that was split and then incorrectly remove the whole struct on the other side of the varying. With this change we simply mark all 4 components for each slot used by a struct. We could possibly make this more fine gained but that would require a more complex change. This fixes a bug in Strange Brigade on RADV when tessellation is enabled, all credit goes to Samuel Pitoiset for tracking down the cause of the bug. Fixes: f1eb5e639997 ("nir: add component level support to remove_unused_io_vars()") Reviewed-by: Samuel Pitoiset <[email protected]>
* glsl: handle interactions between EXT_gpu_shader4 and texture extensionsMarek Olšák2019-04-243-323/+412
| | | | | | also, EXT_texture_buffer_object has to be enabled separately. Reviewed-by: Eric Anholt <[email protected]>
* glsl: allow "varying out" for fragment shader outputs with EXT_gpu_shader4Marek Olšák2019-04-243-2/+19
| | | | Reviewed-by: Eric Anholt <[email protected]>
* glsl: add texture builtin functions for EXT_gpu_shader4Marek Olšák2019-04-241-25/+667
| | | | | | | | | v2: some fixes to texture functions thanks to piglit tests Reviewed-by: Timothy Arceri <[email protected]> (v1) Reviewed-by: Ian Romanick <[email protected]> (v1) Tested-by: Dieter Nützel <[email protected]> (v1) Reviewed-by: Eric Anholt <[email protected]>
* glsl: add arithmetic builtin functions for EXT_gpu_shader4Marek Olšák2019-04-241-13/+35
| | | | | | | Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: add builtin variables for EXT_gpu_shader4Marek Olšák2019-04-241-3/+4
| | | | | | | Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: apply some 1.30 and other rules to EXT_gpu_shader4 as wellMarek Olšák2019-04-243-8/+12
| | | | Reviewed-by: Eric Anholt <[email protected]>
* glsl: enable types for EXT_gpu_shader4Chris Forbes2019-04-242-25/+57
| | | | | | | Reviewed-by: Timothy Arceri <[email protected]> Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: add `unsigned int` type for EXT_GPU_shader4Marek Olšák2019-04-242-2/+11
| | | | | | | Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: enable noperspective|flat|centroid for EXT_gpu_shader4Chris Forbes2019-04-241-3/+3
| | | | | | | Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: add scaffolding for EXT_gpu_shader4Chris Forbes2019-04-243-0/+4
| | | | | | | Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: Silence may unused parameter warnings in glsl/ir.hIan Romanick2019-04-231-1/+1
| | | | | | | | | | | | | Every file that included glsl/ir.h had a warning like: src/compiler/glsl/ir.h: In member function ‘virtual bool ir_rvalue::is_lvalue(const _mesa_glsl_parse_state*) const’: src/compiler/glsl/ir.h:236:64: warning: unused parameter ‘state’ [-Wunused-parameter] virtual bool is_lvalue(const struct _mesa_glsl_parse_state *state = NULL) const ^ Cc: Samuel Pitoiset <[email protected]> Fixes: fa4ebf6b8d9 ("glsl: add _mesa_glsl_parse_state object to is_lvalue()") Reviewed-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]>