summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: move instance divisors into a constant bufferMarek Olšák2017-06-277-28/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shader key size: 107 -> 47 Divisors of 0 and 1 are encoded in the shader key. Greater instance divisors are loaded from a constant buffer. The shader code doing the division is huge. Is it something we need to worry about? Does any app use instance divisors >= 2? VS prolog disassembly: s_load_dwordx4 s[12:15], s[0:1], 0x80 ; C00A0300 00000080 s_nop 0 ; BF800000 s_waitcnt lgkmcnt(0) ; BF8C007F s_buffer_load_dword s14, s[12:15], 0x4 ; C0220386 00000004 s_waitcnt lgkmcnt(0) ; BF8C007F v_cvt_f32_u32_e32 v4, s14 ; 7E080C0E v_rcp_iflag_f32_e32 v4, v4 ; 7E084704 v_mul_f32_e32 v4, 0x4f800000, v4 ; 0A0808FF 4F800000 v_cvt_u32_f32_e32 v4, v4 ; 7E080F04 v_mul_hi_u32 v5, v4, s14 ; D2860005 00001D04 v_mul_lo_i32 v6, v4, s14 ; D2850006 00001D04 v_cmp_eq_u32_e64 s[12:13], 0, v5 ; D0CA000C 00020A80 v_sub_i32_e32 v5, vcc, 0, v6 ; 340A0C80 v_cndmask_b32_e64 v5, v6, v5, s[12:13] ; D1000005 00320B06 v_mul_hi_u32 v5, v5, v4 ; D2860005 00020905 v_add_i32_e32 v6, vcc, v5, v4 ; 320C0905 v_subrev_i32_e32 v4, vcc, v5, v4 ; 36080905 v_cndmask_b32_e64 v4, v4, v6, s[12:13] ; D1000004 00320D04 v_mul_hi_u32 v5, v4, v1 ; D2860005 00020304 v_add_i32_e32 v4, vcc, s8, v0 ; 32080008 v_mul_lo_i32 v6, v5, s14 ; D2850006 00001D05 v_add_i32_e32 v7, vcc, 1, v5 ; 320E0A81 v_cmp_ge_u32_e64 s[12:13], v1, v6 ; D0CE000C 00020D01 v_sub_i32_e32 v6, vcc, v1, v6 ; 340C0D01 v_cmp_le_u32_e32 vcc, s14, v6 ; 7D960C0E v_cndmask_b32_e64 v8, 0, -1, s[12:13] ; D1000008 00318280 v_cndmask_b32_e64 v6, 0, -1, vcc ; D1000006 01A98280 v_and_b32_e32 v6, v8, v6 ; 260C0D08 v_cmp_eq_u32_e32 vcc, 0, v6 ; 7D940C80 v_cndmask_b32_e32 v6, v7, v5, vcc ; 000C0B07 v_add_i32_e32 v5, vcc, -1, v5 ; 320A0AC1 v_cmp_eq_u32_e32 vcc, 0, v8 ; 7D941080 v_cndmask_b32_e32 v5, v6, v5, vcc ; 000A0B06 v_add_i32_e32 v5, vcc, s9, v5 ; 320A0A09 v2: set prefer_mono for fetched instance divisors Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: check nr_cbufs in other places before flushing CBMarek Olšák2017-06-271-2/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use #pragma pack to pack si_shader_keyMarek Olšák2017-06-271-0/+8
| | | | | | | | | sizeof(struct si_shader_key): Before reverting the 2 commits: 120 bytes After reverting the 2 commits: 128 bytes With #pragma pack: 107 bytes Reviewed-by: Nicolai Hähnle <[email protected]>
* Revert "radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs"Marek Olšák2017-06-273-10/+6
| | | | | | This reverts commit 7b2240ac9ce3ba9bd86f4ae8aac53af8878c0b10. Reviewed-by: Nicolai Hähnle <[email protected]>
* Revert "radeonsi: remove 8 bytes from si_shader_key with uint32_t ↵Marek Olšák2017-06-273-14/+5
| | | | | | | | ff_tcs_inputs_to_copy" This reverts commit 6b6fed3a3c81c2b0d319ef121df20a0dc914705f. Reviewed-by: Nicolai Hähnle <[email protected]>
* mesa: optimize GL_PRIMITIVE_RESTART_NV moreMarek Olšák2017-06-271-10/+9
| | | | | | | And other client state changes don't have to call update_derived_primitive_restart_state. Reviewed-by: Nicolai Hähnle <[email protected]>
* mesa: fix clip plane enable breakageMarek Olšák2017-06-271-1/+6
| | | | | | | | | | | | | | | | | | | Broken by: commit 00173d91b70ae4dcea7c6324ee4858c498cae14b Author: Marek Olšák <[email protected]> Date: Sat Jun 10 12:09:43 2017 +0200 mesa: don't flag _NEW_TRANSFORM for st/mesa if possible It also optimizes the case slightly for GL core. It doesn't try to fix that glEnable might be a bad place to do the clip plane transformation. Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Michel Dänzer <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* radeon/vcn: enable h264 decode entension supportLeo Liu2017-06-272-0/+3
| | | | | | | It's enabled through message buffer for UVD Signed-off-by: Leo Liu <[email protected]> Acked-by: Christian König <[email protected]>
* svga: clean up format_cap_tableCharmaine Lee2017-06-271-403/+92
| | | | | | | | | | | Per Jose's suggestion, this patch cleans up format_cap_table to remove the unnecessary default cap value for vgpu10 formats since those devcap values can be retrieved from the device. Tested with MTT conform, glretrace, piglit in HWv13 and HWv8. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* svga: fix the default devcap for SVGA3D_Z_D24S8_INTCharmaine Lee2017-06-271-4/+1
| | | | | | | | | | The default devcap for format SVGA3D_Z_D24S8_INT in HWv8 when its devcap is not explicitly advertised should be set to zero to match the default value in the device. Tested with MTT piglit in HW version 8. Reviewed-by: Neha Bhende <[email protected]>
* svga: create buffer surfaces for incompatible bind flagsCharmaine Lee2017-06-274-27/+217
| | | | | | | | | | | | | | | | | | | | | | In cases where certain bind flags cannot be enabled together, such as CONSTANT_BUFFER cannot be combined with any other flags, a separate host surface will be created. For example, if a stream output buffer is reused as a constant buffer, two host surfaces will be created, one for stream output, and another one for constant buffer. Data will be copied from the stream output surface to the constant buffer surface. Fixes piglit test ext_transform_feedback-immediate-reuse-index-buffer, ext_transform_feedback-immediate-reuse-uniform-buffer Tested with MTT piglit, MTT glretrace, Nature, NobelClinician Viewer, Tropics. v2: Fix bind flags compatibility check as suggested by Brian. v3: Use the list utility to maintain the buffer surface list. v4: Use the SAFE rev of LIST_FOR_EACH_ENTRY Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* svga: do not unconditionally enable streamout bind flagCharmaine Lee2017-06-274-11/+78
| | | | | | | | | | | | | | | | | | Currently we unconditionally enable streamout bind flag at buffer resource creation time. This is not necessary if the buffer is never used as a streamout buffer. With this patch, we enable streamout bind flag as indicated by the state tracker. If the buffer is later bound to streamout and does not already has streamout bind flag enabled, we will recreate the buffer with the new set of bind flags. Buffer content will be copied from the old buffer to the new one. Tested with MTT piglit, Nature, Tropics, Lightsmark. v2: Fix bind flags check as suggested by Brian. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* svga: pass tobind_flags to svga_buffer_handleCharmaine Lee2017-06-278-15/+26
| | | | | | | | This is to prepare for more bind_flags optimization in subsequent patches. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* svga: pass bind_flags to surface create functionsCharmaine Lee2017-06-273-26/+34
| | | | | | | | This is to prepare for other bind_flags optimization in subsequent patches. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* pipe_loader_sw: fix compilation warningBrian Paul2017-06-271-1/+2
| | | | | | Add the new 'flags' parameter to pipe_loader_sw_create_screen(). Reviewed-by: Marek Olšák <[email protected]>
* mesa: add missing includeEric Engestrom2017-06-271-0/+1
| | | | | | | | | | | src/mesa/drivers/x11/xm_dd.c:688:7: warning: implicit declaration of function ‘_mesa_update_draw_buffer_bounds’; did you mean ‘_mesa_has_ARB_draw_buffers_blend’? [-Wimplicit-function-declaration] _mesa_update_draw_buffer_bounds(ctx, ctx->DrawBuffer); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Cc: Marek Olšák <[email protected]> Fixes: 585c5cf8a514783d9ed3 ("mesa: don't update draw buffer bounds in _mesa_update_state") Signed-off-by: Eric Engestrom <[email protected]>
* i965: perf: add support for GeminilakeLionel Landwerlin2017-06-274-1/+9131
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965: perf: add support for KabylakeLionel Landwerlin2017-06-275-1/+20970
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965: perf: use gen_device_info rather then brw_contextLionel Landwerlin2017-06-271-2/+2
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: perf: ensure isolated timer reports while idle don't confuse filteringRobert Bragg2017-06-271-1/+17
| | | | | | | | | | | | | | | From experimentation in IGT, we found that the OA unit might label some report as "idle" (using an invalid context ID), right after a report for a given context. Deltas generated by those reports actually belong to the previous context, even though they're not labelled as such. This change makes ensure that while reading OA reports, we only consider the GPU actually idle after 2 reports with an invalid context ID. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: perf: keep on reading reports until delimiting timestampLionel Landwerlin2017-06-271-20/+113
| | | | | | | | | | | | | | | | | | | | | | Due to an underlying hardware race condition, we have no guarantee that all the reports coming from the OA buffer related to the workload we're trying to measure have landed to memory by the time all the work submitted has completed. That means we need to keep on reading the OA stream until we read a report with a timestamp more recent than the timestamp recored by the MI_REPORT_PERF_COUNT at the end of the performance query. v2: fix uninitialized offset variable to 0 (Lionel) v3: rework the reading to avoid blocking the user of the API unless requested (Rob) v4: fix a bug that makes the i965 driver reading the perf stream when not necessary, leading to very long counter accumulation times (Lionel) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add Gen8+ INTEL_performance_query supportRobert Bragg2017-06-273-33/+288
| | | | | | | | | | Enables access to OA unit metrics on Gen8+ via INTEL_performance_query. v2: make use of new parameters coming from gen_device_info (Lionel) Signed-off-by: Robert Bragg <[email protected]> Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add XML OA metric sets for Gen8+Robert Bragg2017-06-279-14/+65809
| | | | | | | | Also updates Makefile.am to generate corresponding normalization code. Signed-off-by: Robert Bragg <[email protected]> Acked-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add Gen8+ sys_vars for generated OA codeRobert Bragg2017-06-271-0/+3
| | | | | | | | | | | In preparation for adding XML OA metric set descriptions for Gen 8 and 9 which will result in auto generated code that depends on a number of new system variables ($EuSubslicesTotalCount, $EuThreadsCount and $SliceMask) this adds corresponding members to brw->perf.sys_vars. Signed-off-by: Robert Bragg <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv/i965: drop libdrm_intel dependency completelyLionel Landwerlin2017-06-2711-4/+3566
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | With Ken's work to drop the library dependency on libdrm_intel, we now only depend on libdrm for the kernel uapi headers it provides. It seems like we're better off just embeddeding those headers ourselves, making the lives of people developping news features tightly integrated with the kernel a tiny bit easier. This change also makes it a bit more obvious what cflags/libs are required by the i915 drivers vs i965, by renaming INTEL_CFLAGS/LIBS into I915_CFLAGS/LIBS. Headers were generated from drm-tip on the following commit : commit 6d61e70ccc21606ffb8a0a03bd3aba24f659502b Merge: 338ffbf7cb5e c0bc126f97fb Author: Dave Airlie <[email protected]> Date: Tue Jun 27 07:24:49 2017 +1000 Backmerge tag 'v4.12-rc7' into drm-next v2: Use installed files from the kernel (Daniel Vetter) v3: Use headers from drm-next rather than drm-tip (Dave/Daniel) Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i915: use different CFLAGS/LIBS variables than i965/anvLionel Landwerlin2017-06-275-7/+7
| | | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* aubinator: import intel_aub.h from libdrmLionel Landwerlin2017-06-271-0/+153
| | | | | | | This enables us to compile aubinator without the libdrm dependency. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: perf: minimize the chances to spread queries across batchbuffersLionel Landwerlin2017-06-271-0/+8
| | | | | | | | | | | Counter related to timings will be sensitive to any delay introduced by the software. In particular if our begin & end of performance queries end up in different batches, time related counters will exhibit biffer values caused by the time it takes for the kernel driver to load new requests into the hardware. Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* nir: implement GLSL.std.450 NMax, NMIn and NClamp operationsJuan A. Suarez Romero2017-06-271-0/+3
| | | | | | v2: NIR fmax/fmin already handles NaN (Connor). Reviewed by: Elie Tournier <[email protected]>
* nir: add support for 64-bit in SmoothStep functionJuan A. Suarez Romero2017-06-271-3/+5
| | | | | | | | | | | According to GLSL.std.450 spec, SmoothStep expects input to be a floating-point type, but it does not restrict the bitsize. Current implementation relies on inputs to be 32-bit. This commit extends the support to 64-bit size inputs. Reviewed by: Elie Tournier <[email protected]>
* nir: sge operation is defined for floating-point typesJuan A. Suarez Romero2017-06-271-1/+1
| | | | | | | According to GLSL.std.450 spec, the operand for step() function must be a floating-point. It does not restrict the value to 32-bit floats. Reviewed by: Elie Tournier <[email protected]>
* i965: Separate gen < 8 and gen >= 8 paths explicitly in wrap_mode()Topi Pohjolainen2017-06-271-3/+3
| | | | | | | | | | | Makes coverity happier. Fix indentation in gen >= 8 block while at it. CID: 1413020 Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
* intel/anv: Add missing break in anv_CreateDevice()Topi Pohjolainen2017-06-271-0/+1
| | | | | | | CID: 1413018 Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Nanley Chery <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
* ac/nir: convert emit helpers to ac_llvm_contextNicolai Hähnle2017-06-271-117/+118
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* ac/nir: remove unused nir_to_llvm_context::has_ddxyNicolai Hähnle2017-06-271-2/+0
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* ac/nir: implement nir_op_f2bNicolai Hähnle2017-06-271-0/+12
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* ac/nir: implement nir_op_{b2i,i2b}Nicolai Hähnle2017-06-271-0/+20
| | | | | | | Booleans in NIR are ~0 for true, b2i returns 0/1. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* ac/nir: convert type helpers to ac_llvm_contextNicolai Hähnle2017-06-271-95/+95
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* ac/llvm: fix type of second llvm.cttz.* parameterNicolai Hähnle2017-06-271-1/+1
| | | | | | | | LLVM has required an i1 here for a long time. llvm.ctlz.* was fixed in commit edd23e06067 ("ac/llvm: fix various findMSB bugs"). Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* ac/shader_info: fix a commentNicolai Hähnle2017-06-271-2/+6
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* ac: add ac_llvm_context::v8i32Nicolai Hähnle2017-06-272-0/+2
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* ac: add ac_llvm_context::{i,f}32_{0,1}Nicolai Hähnle2017-06-272-0/+10
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* ac: add ac_llvm_context::{i16, i64, f16, f64}Nicolai Hähnle2017-06-272-0/+8
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* nv50/ir: fix combineLd/St to update existing records as necessaryIlia Mirkin2017-06-261-0/+8
| | | | | | | | | | | | | | | | | | | | | Previously the logic would decide that the record is kept, which translates into keep = false in the caller, which meant that these passes did not run. While it's right that keep = false which means that a new record does not need to be added, we do still have to perform the usual list maintenance. It's easiest to do this pre-merge rather than post. The lowering that clip/cull distance passes produce triggers this bug in TCS (since reading outputs is done differently in other stages), but it should be possible to achieve it with the right sequence of regular reads/writes. Fixes: KHR-GL45.cull_distance.functional Fixes: generated_tests/spec/arb_tessellation_shader/execution/tes-input/tes-input-gl_ClipDistance.shader_test Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* nv50/ir: adjust overlapping logic to take fileIndex-relative offsetsIlia Mirkin2017-06-261-1/+5
| | | | | | | | | | | | | | | If the fileIndex is different, that means they are in logically different spaces. However if there's also a relative offset, then they could end up pointing at the same spot again. Also add a note about potential for multiple buffers to overlap even if they're at different file indexes. However that's potentially lowered away by the point that this logic hits. Not known to fix any specific application or test. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50/ir: VFETCH is also considered a load for MemoryOptIlia Mirkin2017-06-261-1/+1
| | | | | | | | This has no effect since in practice this will only play for memory-backed files, for which VFETCH will never happen. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50,nvc0: remove IDX from bufctx immediately, to avoid conflicts with clearIlia Mirkin2017-06-262-9/+10
| | | | | | | | | | | | The idxbuf could linger, and when a clear happened, which also uses the 3d bufctx, we could get an error trying to access it. This fixes spurious crashes/errors in CTS tests. Fixes: 61d8f3387d ("nv50,nvc0: clear index buffer bufctx bin unconditionally") Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* nv50/ir: fetch indirect sources BEFORE the op that uses themIlia Mirkin2017-06-261-19/+32
| | | | | | | | | | All the BuildUtil helpers just insert the operation into the current BB. So we have to take care that any fetchSrc() operations happen before the operation whose setIndirect() it goes into. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* mesa: skip FLUSH_VERTICES() if no samplers were changedTimothy Arceri2017-06-271-1/+6
| | | | Reviewed-by: Marek Olšák <[email protected]>
* mesa: don't set _NEW_PROGRAM_CONSTANTS for non-bindless opaque uniformsTimothy Arceri2017-06-271-0/+6
| | | | | | v2: rebase on new _mesa_flush_vertices_for_uniforms() helper Reviewed-by: Marek Olšák <[email protected]>