summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* REVIEWERS: add Emil as EGL reviewerEric Engestrom2018-11-131-0/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Acked-by: Emil Velikov <[email protected]>
* REVIEWERS: add include path for EGLEric Engestrom2018-11-131-0/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Acked-by: Emil Velikov <[email protected]>
* intel/genxml: Add engine definition to render engine instructions (gen11)Toni Lönnberg2018-11-131-116/+116
| | | | | | | | | | | | | | Instructions meant for the render engine now have a definition specifying that so that can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. v4: Added missing engine definition to MI_TOPOLOGY_FILTER. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Add engine definition to render engine instructions (gen10)Toni Lönnberg2018-11-131-113/+113
| | | | | | | | | | | | | | Instructions meant for the render engine now have a definition specifying that so that can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. v4: Added missing engine definition to MI_TOPOLOGY_FILTER. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Add engine definition to render engine instructions (gen9)Toni Lönnberg2018-11-131-117/+117
| | | | | | | | | | | | | | Instructions meant for the render engine now have a definition specifying that so that can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. v4: Added more missing engine definitions. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Add engine definition to render engine instructions (gen8)Toni Lönnberg2018-11-131-116/+116
| | | | | | | | | | | | | | Instructions meant for the render engine now have a definition specifying that so that can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. v4: Added missing engine tag for MI_TOPOLOGY_FILTER and MI_LOAD_URB_MEM. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Add engine definition to render engine instructions (gen75)Toni Lönnberg2018-11-131-107/+107
| | | | | | | | | | | | Instructions meant for the render engine now have a definition specifying that so that can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Add engine definition to render engine instructions (gen7)Toni Lönnberg2018-11-131-83/+83
| | | | | | | | | | | | Instructions meant for the render engine now have a definition specifying that so that can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Add engine definition to render engine instructions (gen6)Toni Lönnberg2018-11-131-54/+54
| | | | | | | | | | | | | | Instructions meant for the render engine now have a definition specifying that so that can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions v4: Added missing engine to MEDIA_GATEWAY_STATE Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Add engine definition to render engine instructions (gen5)Toni Lönnberg2018-11-131-30/+30
| | | | | | | | | | | | Instructions meant for the render engine now have a definition specifying that so that can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Add engine definition to render engine instructions (gen45)Toni Lönnberg2018-11-131-27/+27
| | | | | | | | | | | | Instructions meant for the render engine now have a definition specifying that so that can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added addition engine definitions. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Add engine definition to render engine instructions (gen4)Toni Lönnberg2018-11-131-25/+25
| | | | | | | | | | | | Instructions meant for the render engine now have a definition specifying that so that can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/decoder: tools: Use engine for decoding batch instructionsToni Lönnberg2018-11-138-53/+69
| | | | | | | | | | | | | | | | The engine to which the batch was sent to is now set to the decoder context when decoding the batch. This is needed so that we can distinguish between instructions as the render and video pipe share some of the instruction opcodes. v2: The engine is now in the decoder context and the batch decoder uses a local function for finding the instruction for an engine. v3: Spec uses engine_mask now instead of engine, replaced engine class enums with the definitions from UAPI. v4: Fix up aubinator_viewer (Lionel) Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/decoder: tools: gen_engine to drm_i915_gem_engine_classToni Lönnberg2018-11-134-25/+19
| | | | | | | | | | Removed the gen_engine enum and changed the involved functions to use the drm_i915_gem_engine_class enum from UAPI instead. v3: Wrong engine was being used for blocks in video ring v4: Fixed aubinator_viewer.cpp Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/decoder: Engine parameter for instructionsToni Lönnberg2018-11-132-0/+31
| | | | | | | | | | | | | | | | | | | Preliminary work for adding handling of different pipes to gen_decoder. Each instruction needs to have a definition describing which engine it is meant for. If left undefined, by default, the instruction is defined for all engines. v2: Changed to use the engine class definitions from UAPI v3: Changed I915_ENGINE_CLASS_TO_MASK to use BITSET_BIT, change engine to engine_mask, added check for incorrect engine and added the possibility to define an instruction to multiple engines using the "|" as a delimiter in the engine attribute. v4: Fixed the memory leak. v5: Removed an unnecessary ralloc_free(). Reviewed-by: Lionel Landwerlin <[email protected]>
* virgl: Add command and flags to initiate debugging on the host (v2)Gert Wollny2018-11-135-0/+37
| | | | | | | | | | | | On the host VREND_DEBUG=guestallow must be set to let the guest override the debug flags. v2: Send flag string instead of flags, this avoids the need to keep the flags in sync. v3: Only request host logging if the host actually understands the command Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Erik Faye-Lund <[email protected]>
* mesa: Reference count shaders that are used by transform feedback objectsGert Wollny2018-11-131-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Transform feedback objects may hold a pointer to a shader program, and at least in Gallium, this must be a valid pointer until ctx->Driver.EndTransformFeedback in glEndTransformFeedback has been called - which is conform with the spec that any program that is part of a current rendering state should only be flagged for deletion by glDeleteProgram. This was not handled properly for the transform feedback objects so that a call sequence glUseProgram(x) glBeginTransformFreedback(...) glPauseTransformFeedback(...) glDeleteProgram(x) glEndTransformFeedback(...) would result in a use after free bug. With this patch the transform feedback object also updates the reference count to the used program thereby keeping the program valid as long as the transform feedback objects links to it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108713 Fixes: 654587696b4234d09a6b471b70e9629cf2887c27 mesa: add end_transform_feedback() helper Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* radv: set optimal OVERWRITE_COMBINER_WATERMARK on GFX9Samuel Pitoiset2018-11-132-3/+21
| | | | | | | Ported from RadeonSI. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: set PA.SC_CONSERVATIVE_RASTERIZATION.NULL_SQUAD_AA_MASK_ENABLESamuel Pitoiset2018-11-131-1/+1
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: binding streamout buffers doesn't change context regsSamuel Pitoiset2018-11-131-2/+7
| | | | | | Cc: 18.3 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: Don't lower the local work group size if it's variable.Plamena Manolova2018-11-131-6/+18
| | | | | | | | | | If the local work group size is variable it won't be available at compile time so we can't lower it in nir_lower_system_values(). Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
* util/ralloc: Make sizeof(linear_header) a multiple of 8Matt Turner2018-11-121-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to this patch sizeof(linear_header) was 20 bytes in a non-debug build on 32-bit platforms. We do some pointer arithmetic to calculate the next available location with ptr = (linear_size_chunk *)((char *)&latest[1] + latest->offset); in linear_alloc_child(). The &latest[1] adds 20 bytes, so an allocation would only be 4-byte aligned. On 32-bit SPARC a 'sttw' instruction (which stores a consecutive pair of 4-byte registers to memory) requires an 8-byte aligned address. Such an instruction is used to store to an 8-byte integer type, like intmax_t which is used in glcpp's expression_value_t struct. As a result of the 4-byte alignment returned by linear_alloc_child() we would generate a SIGBUS (unaligned exception) on SPARC. According to the GNU libc manual malloc() always returns memory that has at least an alignment of 8-bytes [1]. I think our allocator should do the same. So, simple fix with two parts: (1) Increase SUBALLOC_ALIGNMENT to 8 unconditionally. (2) Mark linear_header with an aligned attribute, which will cause its sizeof to be rounded up to that alignment. (We already do this for ralloc_header) With this done, all Mesa's unit tests now pass on SPARC. [1] https://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html Fixes: 47e17586924f ("glcpp: use the linear allocator for most objects") Bug: https://bugs.gentoo.org/636326 Reviewed-by: Eric Anholt <[email protected]>
* util/ralloc: Switch from DEBUG to NDEBUGMatt Turner2018-11-121-14/+4
| | | | | | | The debug code is all asserts, so protect it with the same thing that controls assert. Reviewed-by: Eric Anholt <[email protected]>
* nir: add support for removing redundant stores to copy prop varTimothy Arceri2018-11-131-10/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For example the following type of thing is seen in TCS from a number of Vulkan and DXVK games: vec1 32 ssa_557 = deref_var &oPatch (shader_out float) vec1 32 ssa_558 = intrinsic load_deref (ssa_557) () vec1 32 ssa_559 = deref_var &oPatch@42 (shader_out float) vec1 32 ssa_560 = intrinsic load_deref (ssa_559) () vec1 32 ssa_561 = deref_var &oPatch@43 (shader_out float) vec1 32 ssa_562 = intrinsic load_deref (ssa_561) () intrinsic store_deref (ssa_557, ssa_558) (1) /* wrmask=x */ intrinsic store_deref (ssa_559, ssa_560) (1) /* wrmask=x */ intrinsic store_deref (ssa_561, ssa_562) (1) /* wrmask=x */ No shader-db changes on i965 (SKL). vkpipeline-db results RADV (VEGA): Totals from affected shaders: SGPRS: 7832 -> 7728 (-1.33 %) VGPRS: 6476 -> 6740 (4.08 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 469572 -> 456596 (-2.76 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 989 -> 960 (-2.93 %) Wait states: 0 -> 0 (0.00 %) The Max Waves and VGPRS changes here are misleading. What is happening is a bunch of TCS outputs are being optimised away as they are now recognised as unused. This results in more varyings being compacted via nir_compact_varyings() which can result in more register pressure when they are not packed in an optimal way. This is an existing problem independent of this patch. I've run some benchmarks and haven't noticed any performance regressions in affected games. Reviewed-by: Jason Ekstrand <[email protected]>
* anv/i965: make use of nir_link_constant_varyings()Timothy Arceri2018-11-131-0/+3
| | | | | | | | | | | | | | | | | | | shader-db results for SLK: total instructions in shared programs: 13106498 -> 13091573 (-0.11%) instructions in affected programs: 1186244 -> 1171319 (-1.26%) helped: 6186 HURT: 0 total cycles in shared programs: 332062633 -> 331961653 (-0.03%) cycles in affected programs: 8537165 -> 8436185 (-1.18%) helped: 5371 HURT: 862 LOST: 6 GAINED: 14 Reviewed-by: Jason Ekstrand <[email protected]>
* egl: Improve the debugging of gbm format matching in DRI configs.Eric Anholt2018-11-121-2/+3
| | | | | | | | | | | | | | | | | Previously the debug would be: libEGL debug: No DRI config supports native format 0x20203852 libEGL debug: No DRI config supports native format 0x38385247 but libEGL debug: No DRI config supports native format R8 libEGL debug: No DRI config supports native format GR88 is a lot easier to understand. Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* gbm: Introduce a helper function for printing GBM format names.Eric Anholt2018-11-122-0/+26
| | | | | | | | | | This requires that the caller make a little (stack) allocation to store the string. v2: Use gbm_format_canonicalize (suggested by Daniel) Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* gbm: Move gbm_format_canonicalize() to the core.Eric Anholt2018-11-123-16/+19
| | | | | | | I want it for the format name debugging code. Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* meson: fix libatomic testsDylan Baker2018-11-121-1/+2
| | | | | | | | | | | There are two problems: 1) the extra underscore in MISSING_64BIT_ATOMICS 2) we should link with libatomic if the previous test decided we needed it Fixes: d1992255bb29054fa51763376d125183a9f602f3 ("meson: Add build Intel "anv" vulkan driver") Reviewed-and-Tested-by: Matt Turner <[email protected]>
* mesa: mark GL_SR8_EXT non-renderable on GLESMarek Olšák2018-11-121-0/+1
| | | | | | Fixes: dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.sr8_ext Reviewed-by: Ilia Mirkin <[email protected]>
* st/mesa: disable L3 thread pinningMarek Olšák2018-11-121-9/+0
| | | | | | | This implementation can have massive drawbacks. Cc: 18.3 <[email protected]> Reviewed-by: Edmondo Tommasina <[email protected]>
* nir: add lowering for ffloorChristian Gmeiner2018-11-122-0/+4
| | | | | Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* util: Fix warning in u_cpu_detect on non-x86Alyssa Rosenzweig2018-11-121-2/+2
| | | | | | | | | regs is only set and used on x86; on other platforms (like ARM), this code causes a trivial warning, solved by moving the regs declaration to the architecture-dependent usage. Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Alyssa Rosenzweig <[email protected]>
* meson: Don't set -WallDylan Baker2018-11-121-2/+2
| | | | | | | | | meson does this for you with its warn levels, so we don't need to set it ourselves. Fixes: d1992255bb29054fa51763376d125183a9f602f3 ("meson: Add build Intel "anv" vulkan driver") Reviewed-by: Eric Engestrom <[email protected]>
* freedreno/drm: fix unused 'entry' warningsRob Clark2018-11-121-2/+0
| | | | | | | Looks like importing libdrm_freedreno into mesa crossed paths with e27902a2613. Signed-off-by: Rob Clark <[email protected]>
* i965: add support for sampling from AYUVLionel Landwerlin2018-11-124-0/+11
| | | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* dri: add AYUV formatLionel Landwerlin2018-11-123-0/+4
| | | | | | | | v2: Add a AYUV entry android in the android backend (Tapani) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* nir/lower_tex: Add AYUV lowering supportLionel Landwerlin2018-11-122-0/+20
| | | | | | | | | | | | | | | Byte ordering is : 0: V 1: U 2: Y 3: A v2: Split refactoring of alpha channel (Lionel) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> (v1) Acked-by: Eric Engestrom <[email protected]> (v2)
* nir/lower_tex: add alpha channel parameter for yuv loweringLionel Landwerlin2018-11-121-6/+11
| | | | | | | | | We're about to introduce AYUV support which provides its own alpha channel. So give alpha as a parameter and set it to 1 on exising formats. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* radv: make use of num_good_cu_per_sh in si_emit_graphics() tooSamuel Pitoiset2018-11-121-2/+1
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: clean up setting partial_es_wave for distributed tess on VISamuel Pitoiset2018-11-121-7/+4
| | | | | | | | Only needed when the pipeline actually uses tessellation. I don't think that changes anything, except improving readability. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: cleanup and document a Hawaii bug with offchip buffersSamuel Pitoiset2018-11-121-9/+8
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* glsl/test: Fix use after free in test_optpass.Hanno Böck2018-11-121-1/+4
| | | | | | | | | | | The variable state is free'd and afterwards state->error is used as the return value, resulting in a use after free bug detected by memory safety tools like address sanitizer. Signed-off-by: Hanno Böck <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108636 Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* nir: don't pack varyings ints with floats unless flatTimothy Arceri2018-11-121-4/+7
| | | | | | Fixes: 1c9c42d16b4c ("nir: add varying component packing helpers") Reviewed-by: Jason Ekstrand <[email protected]>
* nir: add glsl_type_is_integer() helperTimothy Arceri2018-11-122-0/+6
| | | | | | Fixes: 1c9c42d16b4c ("nir: add varying component packing helpers") Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Prevent emission of IR instructions not aligned to their own ↵Francisco Jerez2018-11-091-3/+17
| | | | | | | | | | | | | | | | | execution size. This can occur during payload setup of SIMD-split send message instructions, which can lead to the emission of header setup instructions with a non-zero channel group and fixed SIMD width. Such instructions could end up using undefined channel enable signals except they don't care since they're always marked force_writemask_all. Not known to affect correctness of any workload at this point, but it would be trivial to back-port to stable if something comes up. Reported-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]> Tested-by: Sagar Ghuge <[email protected]>
* st/mesa: make use of nir_link_constant_varyings()Timothy Arceri2018-11-101-0/+3
| | | | | | | | | | | | | | | | | | Shader-db results radeonsi (VEGA): Totals from affected shaders: SGPRS: 161464 -> 161368 (-0.06 %) VGPRS: 86904 -> 86292 (-0.70 %) Spilled SGPRs: 296 -> 314 (6.08 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 3618596 -> 3573852 (-1.24 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 26189 -> 26276 (0.33 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Eric Anholt <[email protected]>
* nir: add new linking opt nir_link_constant_varyings()Timothy Arceri2018-11-102-0/+111
| | | | | | | This pass moves constant outputs to the consuming shader stage where possible. Reviewed-by: Eric Anholt <[email protected]>
* st/nine: clean up thead shutdown sequence a bitAndre Heider2018-11-091-4/+2
| | | | | | | Just break out of the loop instead, it does the same thing. Signed-off-by: Andre Heider <[email protected]> Reviewed-by: Axel Davy <[email protected]>
* st/nine: plug thread related leaksAndre Heider2018-11-092-0/+9
| | | | | Signed-off-by: Andre Heider <[email protected]> Reviewed-by: Axel Davy <[email protected]>