path: root/src/gallium
Commit message (Author, Age, Files, Lines)
* radeonsi: move instance divisors into a constant buffer (Marek Olšák, 2017-06-27, 7 files, -28/+93)
| Shader key size: 107 -> 47
|
| Divisors of 0 and 1 are encoded in the shader key. Greater instance
| divisors are loaded from a constant buffer.
|
| The shader code doing the division is huge. Is it something we need to
| worry about? Does any app use instance divisors >= 2?
|
| VS prolog disassembly:
|   s_load_dwordx4 s[12:15], s[0:1], 0x80 ; C00A0300 00000080
|   s_nop 0 ; BF800000
|   s_waitcnt lgkmcnt(0) ; BF8C007F
|   s_buffer_load_dword s14, s[12:15], 0x4 ; C0220386 00000004
|   s_waitcnt lgkmcnt(0) ; BF8C007F
|   v_cvt_f32_u32_e32 v4, s14 ; 7E080C0E
|   v_rcp_iflag_f32_e32 v4, v4 ; 7E084704
|   v_mul_f32_e32 v4, 0x4f800000, v4 ; 0A0808FF 4F800000
|   v_cvt_u32_f32_e32 v4, v4 ; 7E080F04
|   v_mul_hi_u32 v5, v4, s14 ; D2860005 00001D04
|   v_mul_lo_i32 v6, v4, s14 ; D2850006 00001D04
|   v_cmp_eq_u32_e64 s[12:13], 0, v5 ; D0CA000C 00020A80
|   v_sub_i32_e32 v5, vcc, 0, v6 ; 340A0C80
|   v_cndmask_b32_e64 v5, v6, v5, s[12:13] ; D1000005 00320B06
|   v_mul_hi_u32 v5, v5, v4 ; D2860005 00020905
|   v_add_i32_e32 v6, vcc, v5, v4 ; 320C0905
|   v_subrev_i32_e32 v4, vcc, v5, v4 ; 36080905
|   v_cndmask_b32_e64 v4, v4, v6, s[12:13] ; D1000004 00320D04
|   v_mul_hi_u32 v5, v4, v1 ; D2860005 00020304
|   v_add_i32_e32 v4, vcc, s8, v0 ; 32080008
|   v_mul_lo_i32 v6, v5, s14 ; D2850006 00001D05
|   v_add_i32_e32 v7, vcc, 1, v5 ; 320E0A81
|   v_cmp_ge_u32_e64 s[12:13], v1, v6 ; D0CE000C 00020D01
|   v_sub_i32_e32 v6, vcc, v1, v6 ; 340C0D01
|   v_cmp_le_u32_e32 vcc, s14, v6 ; 7D960C0E
|   v_cndmask_b32_e64 v8, 0, -1, s[12:13] ; D1000008 00318280
|   v_cndmask_b32_e64 v6, 0, -1, vcc ; D1000006 01A98280
|   v_and_b32_e32 v6, v8, v6 ; 260C0D08
|   v_cmp_eq_u32_e32 vcc, 0, v6 ; 7D940C80
|   v_cndmask_b32_e32 v6, v7, v5, vcc ; 000C0B07
|   v_add_i32_e32 v5, vcc, -1, v5 ; 320A0AC1
|   v_cmp_eq_u32_e32 vcc, 0, v8 ; 7D941080
|   v_cndmask_b32_e32 v5, v6, v5, vcc ; 000A0B06
|   v_add_i32_e32 v5, vcc, s9, v5 ; 320A0A09
|
| v2: set prefer_mono for fetched instance divisors
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
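For orientation, the arithmetic that the prolog appears to implement can be sketched in a few lines of C. This is an illustration only, not radeonsi code: the function name and parameters are hypothetical, and reading `s9` as the start instance and `v1` as the instance ID is an assumption based on the disassembly above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a Mesa function): index used to fetch an
 * instanced vertex attribute once the divisor has been loaded. */
static uint32_t instanced_fetch_index(uint32_t start_instance,
                                      uint32_t instance_id,
                                      uint32_t divisor)
{
    /* A divisor of 0 means "per-vertex" and is assumed to be handled
     * elsewhere (the commit encodes 0 and 1 in the shader key). */
    assert(divisor != 0);

    /* On the GPU this one division expands to the long reciprocal-based
     * sequence shown in the disassembly. */
    return start_instance + instance_id / divisor;
}
```

With a divisor of 3, instances 6, 7 and 8 all read the same attribute element, which is the behaviour instance divisors are meant to provide.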
* radeonsi: check nr_cbufs in other places before flushing CB (Marek Olšák, 2017-06-27, 1 file, -2/+4)
| Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use #pragma pack to pack si_shader_key (Marek Olšák, 2017-06-27, 1 file, -0/+8)
| sizeof(struct si_shader_key):
|   Before reverting the 2 commits: 120 bytes
|   After reverting the 2 commits: 128 bytes
|   With #pragma pack: 107 bytes
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
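The effect of `#pragma pack` on a structure's size can be seen with a small stand-in. The two structs below are hypothetical and unrelated to the real `si_shader_key` layout; they only show how removing alignment padding shrinks `sizeof`, assuming a compiler that supports `#pragma pack(push, 1)` (GCC, Clang and MSVC do).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key with natural alignment: padding is inserted after
 * 'flag' and after 'mode' so that 'mask' stays aligned. */
struct key_padded {
    uint8_t  flag;
    uint64_t mask;
    uint8_t  mode;
};

/* Same members, packed to byte alignment: no padding at all. */
#pragma pack(push, 1)
struct key_packed {
    uint8_t  flag;
    uint64_t mask;
    uint8_t  mode;
};
#pragma pack(pop)

/* Number of bytes saved by packing on the current ABI. */
static size_t bytes_saved_by_packing(void)
{
    assert(sizeof(struct key_packed) <= sizeof(struct key_padded));
    return sizeof(struct key_padded) - sizeof(struct key_packed);
}
```

The packed variant is exactly 1 + 8 + 1 = 10 bytes. The padded size depends on the ABI, so the sketch only reports the difference.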
* Revert "radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs" (Marek Olšák, 2017-06-27, 3 files, -10/+6)
| This reverts commit 7b2240ac9ce3ba9bd86f4ae8aac53af8878c0b10.
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
* Revert "radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy" (Marek Olšák, 2017-06-27, 3 files, -14/+5)
| This reverts commit 6b6fed3a3c81c2b0d319ef121df20a0dc914705f.
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
* radeon/vcn: enable h264 decode extension support (Leo Liu, 2017-06-27, 2 files, -0/+3)
| It's enabled through message buffer for UVD
|
| Signed-off-by: Leo Liu <[email protected]>
| Acked-by: Christian König <[email protected]>
* svga: clean up format_cap_table (Charmaine Lee, 2017-06-27, 1 file, -403/+92)
| Per Jose's suggestion, this patch cleans up format_cap_table to remove
| the unnecessary default cap value for vgpu10 formats since those devcap
| values can be retrieved from the device.
|
| Tested with MTT conform, glretrace, piglit in HWv13 and HWv8.
|
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Jose Fonseca <[email protected]>
* svga: fix the default devcap for SVGA3D_Z_D24S8_INT (Charmaine Lee, 2017-06-27, 1 file, -4/+1)
| The default devcap for format SVGA3D_Z_D24S8_INT in HWv8 when its devcap
| is not explicitly advertised should be set to zero to match the default
| value in the device.
|
| Tested with MTT piglit in HW version 8.
|
| Reviewed-by: Neha Bhende <[email protected]>
* svga: create buffer surfaces for incompatible bind flags (Charmaine Lee, 2017-06-27, 4 files, -27/+217)
| In cases where certain bind flags cannot be enabled together, such as
| CONSTANT_BUFFER cannot be combined with any other flags, a separate
| host surface will be created.
|
| For example, if a stream output buffer is reused as a constant buffer,
| two host surfaces will be created, one for stream output, and another
| one for constant buffer. Data will be copied from the stream output
| surface to the constant buffer surface.
|
| Fixes piglit test ext_transform_feedback-immediate-reuse-index-buffer,
| ext_transform_feedback-immediate-reuse-uniform-buffer
|
| Tested with MTT piglit, MTT glretrace, Nature, NobelClinician Viewer,
| Tropics.
|
| v2: Fix bind flags compatibility check as suggested by Brian.
| v3: Use the list utility to maintain the buffer surface list.
| v4: Use the SAFE rev of LIST_FOR_EACH_ENTRY
|
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Jose Fonseca <[email protected]>
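The rule described in this message (a constant buffer binding cannot share a host surface with any other binding) can be sketched as a small predicate. The flag names and values below are hypothetical stand-ins, not the svga driver's definitions, and the real compatibility check may cover more cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bind flags; the real svga/gallium values differ. */
enum {
    BIND_VERTEX_BUFFER   = 1 << 0,
    BIND_INDEX_BUFFER    = 1 << 1,
    BIND_CONSTANT_BUFFER = 1 << 2,
    BIND_STREAM_OUTPUT   = 1 << 3
};

/* Returns true when a surface created with 'current' flags may also be
 * used with 'requested' flags; false means a separate host surface
 * would be needed. */
static bool bind_flags_compatible(unsigned current, unsigned requested)
{
    unsigned combined = current | requested;

    assert(requested != 0);

    /* CONSTANT_BUFFER must be the only flag on its surface. */
    if (combined & BIND_CONSTANT_BUFFER)
        return combined == BIND_CONSTANT_BUFFER;

    return true;
}
```

In the stream output example from the message, the predicate returns false, which corresponds to the case where a second host surface is created and the data is copied across.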
* svga: do not unconditionally enable streamout bind flag (Charmaine Lee, 2017-06-27, 4 files, -11/+78)
| Currently we unconditionally enable streamout bind flag at buffer
| resource creation time. This is not necessary if the buffer is never
| used as a streamout buffer. With this patch, we enable streamout bind
| flag as indicated by the state tracker. If the buffer is later bound to
| streamout and does not already have streamout bind flag enabled, we
| will recreate the buffer with the new set of bind flags. Buffer content
| will be copied from the old buffer to the new one.
|
| Tested with MTT piglit, Nature, Tropics, Lightsmark.
|
| v2: Fix bind flags check as suggested by Brian.
|
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Jose Fonseca <[email protected]>
* svga: pass tobind_flags to svga_buffer_handle (Charmaine Lee, 2017-06-27, 8 files, -15/+26)
| This is to prepare for more bind_flags optimization in subsequent
| patches.
|
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Jose Fonseca <[email protected]>
* svga: pass bind_flags to surface create functions (Charmaine Lee, 2017-06-27, 3 files, -26/+34)
| This is to prepare for other bind_flags optimization in subsequent
| patches.
|
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Jose Fonseca <[email protected]>
* pipe_loader_sw: fix compilation warning (Brian Paul, 2017-06-27, 1 file, -1/+2)
| Add the new 'flags' parameter to pipe_loader_sw_create_screen().
|
| Reviewed-by: Marek Olšák <[email protected]>
* i915: use different CFLAGS/LIBS variables than i965/anv (Lionel Landwerlin, 2017-06-27, 3 files, -3/+3)
| Signed-off-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Emil Velikov <[email protected]>
* nv50/ir: fix combineLd/St to update existing records as necessary (Ilia Mirkin, 2017-06-26, 1 file, -0/+8)
| Previously the logic would decide that the record is kept, which
| translates into keep = false in the caller, which meant that these
| passes did not run. While it's right that keep = false which means that
| a new record does not need to be added, we do still have to perform the
| usual list maintenance. It's easiest to do this pre-merge rather than
| post.
|
| The lowering that clip/cull distance passes produce triggers this bug in
| TCS (since reading outputs is done differently in other stages), but it
| should be possible to achieve it with the right sequence of regular
| reads/writes.
|
| Fixes: KHR-GL45.cull_distance.functional
| Fixes: generated_tests/spec/arb_tessellation_shader/execution/tes-input/tes-input-gl_ClipDistance.shader_test
|
| Signed-off-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Samuel Pitoiset <[email protected]>
| Cc: [email protected]
* nv50/ir: adjust overlapping logic to take fileIndex-relative offsets (Ilia Mirkin, 2017-06-26, 1 file, -1/+5)
| If the fileIndex is different, that means they are in logically
| different spaces. However if there's also a relative offset, then they
| could end up pointing at the same spot again.
|
| Also add a note about potential for multiple buffers to overlap even if
| they're at different file indexes. However that's potentially lowered
| away by the point that this logic hits.
|
| Not known to fix any specific application or test.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50/ir: VFETCH is also considered a load for MemoryOpt (Ilia Mirkin, 2017-06-26, 1 file, -1/+1)
| This has no effect since in practice this will only play for
| memory-backed files, for which VFETCH will never happen.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50,nvc0: remove IDX from bufctx immediately, to avoid conflicts with clear (Ilia Mirkin, 2017-06-26, 2 files, -9/+10)
| The idxbuf could linger, and when a clear happened, which also uses the
| 3d bufctx, we could get an error trying to access it. This fixes
| spurious crashes/errors in CTS tests.
|
| Fixes: 61d8f3387d ("nv50,nvc0: clear index buffer bufctx bin unconditionally")
| Signed-off-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Samuel Pitoiset <[email protected]>
| Cc: [email protected]
* nv50/ir: fetch indirect sources BEFORE the op that uses them (Ilia Mirkin, 2017-06-26, 1 file, -19/+32)
| All the BuildUtil helpers just insert the operation into the current
| BB. So we have to take care that any fetchSrc() operations happen
| before the operation whose setIndirect() it goes into.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Samuel Pitoiset <[email protected]>
| Cc: [email protected]
* Android: add renderonly files to libmesa_gallium (Rob Herring, 2017-06-26, 1 file, -0/+1)
| vc4 now depends on renderonly functions, but these weren't added to the
| Android build resulting in the following errors:
|
| src/gallium/drivers/vc4/vc4_resource.c:380: error: undefined reference to 'renderonly_scanout_destroy'
| src/gallium/drivers/vc4/vc4_resource.c:681: error: undefined reference to 'renderonly_create_gpu_import_for_resource'
| src/gallium/drivers/vc4/vc4_screen.c:625: error: undefined reference to 'renderonly_dup'
| src/gallium/winsys/pl111/drm/pl111_drm_winsys.c:37: error: undefined reference to 'renderonly_create_gpu_import_for_resource'
| src/gallium/winsys/pl111/drm/pl111_drm_winsys.c:37: error: undefined reference to 'renderonly_create_gpu_import_for_resource'
|
| Fixes: 7029ec05e2c7 ("gallium: Add renderonly-based support for pl111+vc4.")
| Signed-off-by: Rob Herring <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* radeonsi: don't flush and wait for CB after depth-only rendering (Marek Olšák, 2017-06-26, 1 file, -1/+4)
| Reviewed-by: Nicolai Hähnle <[email protected]>
* etnaviv: only flush resource to self if no scanout buffer exists (Lucas Stach, 2017-06-26, 1 file, -4/+5)
| Currently a resource flush may trigger a self resolve, even if a scanout
| buffer exists, but is up to date. If a scanout buffer exists we only
| ever want to flush the resource to the scanout buffer. This fixes a
| performance regression.
|
| Fixes: dda956340ce9 (etnaviv: resolve tile status when flushing resource)
| Cc: [email protected]
| Signed-off-by: Lucas Stach <[email protected]>
| Reviewed-by: Philipp Zabel <[email protected]>
| Reviewed-by: Christian Gmeiner <[email protected]>
* etnaviv: add support for snorm textures (Christian Gmeiner, 2017-06-26, 2 files, -3/+7)
| Based on a patch from Wladimir J. van der Laan and untested due to lack
| of hardware. Binary blob emits those formats if GPU supports HALTI1
| (faked with ibvivhook).
|
| Signed-off-by: Christian Gmeiner <[email protected]>
| Reviewed-by: Wladimir J. van der Laan <[email protected]>
* etnaviv: add R8G8 texture support (Christian Gmeiner, 2017-06-26, 1 file, -1/+1)
| Passes texwrap GL_ARB_texture_rg piglit (with faked full texture rg
| support).
|
| Signed-off-by: Christian Gmeiner <[email protected]>
| Reviewed-by: Wladimir J. van der Laan <[email protected]>
* etnaviv: add support for swizzled texture formats (Christian Gmeiner, 2017-06-26, 4 files, -39/+99)
| Passes all ext_texture_swizzle piglits.
|
| Signed-off-by: Christian Gmeiner <[email protected]>
| Reviewed-By: Wladimir J. van der Laan <[email protected]>
* etnaviv: add support for extended texture formats (Christian Gmeiner, 2017-06-26, 4 files, -4/+10)
| Signed-off-by: Christian Gmeiner <[email protected]>
| Reviewed-by: Wladimir J. van der Laan <[email protected]>
* swr: set an explicit clear_rect if scissor is not enabled. (Bruce Cherniak, 2017-06-26, 1 file, -1/+9)
| Fix regression of "no rendering" on simple apps like glxgears by setting
| an explicit full surface clear_rect when scissor is not enabled.
|
| This regressed with commit 00173d91 "st/mesa: don't set 16 scissors and
| 16 viewports if they're unused" due to an assumption that a default
| scissor rect is always set, which was the case prior to this
| optimization.
|
| Reviewed-by: Tim Rowley <[email protected]>
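The fix described here amounts to choosing the clear rectangle explicitly rather than relying on a default scissor. A minimal sketch of that selection follows; the `struct rect` type and the function name are hypothetical and do not match the swr driver's own types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical rectangle type: [xmin, xmax) x [ymin, ymax). */
struct rect {
    int xmin, ymin, xmax, ymax;
};

/* Use the scissor rect when scissoring is enabled; otherwise fall back
 * to a rect covering the whole surface instead of trusting whatever
 * scissor state happens to be set. */
static struct rect choose_clear_rect(bool scissor_enabled,
                                     struct rect scissor,
                                     int fb_width, int fb_height)
{
    assert(fb_width >= 0 && fb_height >= 0);

    if (scissor_enabled)
        return scissor;

    struct rect full = { 0, 0, fb_width, fb_height };
    return full;
}
```

If the scissor state were left at zero and used unconditionally, the clear would cover an empty area, which is consistent with the "no rendering" symptom in the message.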
* swr/rast: adjust std::string usage to fix build (Tim Rowley, 2017-06-26, 1 file, -3/+9)
| Some combinations of c++ compilers and standard libraries had problems
| with the string::replace code we were using previously.
|
| This should fix the travis-ci system.
|
| Tested-by: Eric Engestrom <[email protected]>
| Reviewed-by: Bruce Cherniak <[email protected]>
* radeonsi: support indirect indexing in INTERP_* opcodes (Nicolai Hähnle, 2017-06-26, 1 file, -20/+58)
| The hardware doesn't support it, so we just interpolate all array
| elements and then use indirect indexing on the resulting vector.
|
| Clearly, this is not very efficient. There is an argument to be had for
| adding if/else, or perhaps even pulling the data out of LDS directly.
| Both don't really seem worth the effort, considering that it seems
| nobody actually uses this feature.
|
| Reviewed-by: Marek Olšák <[email protected]>
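The "interpolate everything, then index" strategy can be shown on the CPU with a toy example. Everything here is a stand-in: `interp_one` is a plain linear blend rather than the hardware's interpolation, and `MAX_ELEMS` is an arbitrary bound chosen for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ELEMS 8 /* hypothetical bound for this sketch */

/* Stand-in for a single fixed-index interpolation. */
static float interp_one(float a, float b, float t)
{
    return a + (b - a) * t;
}

/* Interpolate every array element, then select one with a dynamic index.
 * The work is proportional to 'count' even though one value is used. */
static float interp_indirect(const float *a, const float *b,
                             size_t count, float t, size_t index)
{
    float results[MAX_ELEMS];

    assert(count <= MAX_ELEMS && index < count);

    for (size_t i = 0; i < count; i++)
        results[i] = interp_one(a[i], b[i], t);

    return results[index];
}
```

The cost grows with the array size regardless of which element is read, which is the inefficiency the message acknowledges.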
* r600g: fix crash when file in R600_TRACE doesn't exist (Constantine Charlamov, 2017-06-26, 1 file, -4/+5)
| …and print error in such case. Which probably is not a rare event btw
| because fopen doesn't expand ~ to $HOME.
|
| Also get rid of unused "bool ret" variable.
|
| Signed-off-by: Constantine Kharlamov <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
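The point about `~` is that tilde expansion is done by the shell, not by the C library, so a path such as `~/trace.txt` taken from an environment variable is passed to `fopen` literally. A minimal sketch of a checked open follows; the helper name is hypothetical and this is not the r600g code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open a file and report failure instead of
 * handing a NULL FILE* to later code. */
static FILE *open_checked(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);

    FILE *f = fopen(path, mode);
    if (!f) {
        /* fopen() treats "~" as an ordinary directory name, so a path
         * like "~/trace.txt" normally fails here. */
        fprintf(stderr, "cannot open '%s': %s\n", path, strerror(errno));
    }
    return f;
}
```

Callers are expected to test the returned pointer before using it.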
* r600g: take into account offset to system inputs at tgsi_interp_egcm() (Constantine Charlamov, 2017-06-26, 2 files, -6/+7)
| Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=100785
|
| v2: I went back and forth on whether to initialize nsys_inputs at the
| beginning of shader initialization or when allocating system values,
| and by the time I decided to go with the first one, I forgot to change
| it back.
|
| Signed-off-by: Constantine Kharlamov <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600g: get rid of trailing whitespace (Constantine Charlamov, 2017-06-26, 1 file, -22/+22)
| Signed-off-by: Constantine Kharlamov <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600/asm: add support for other GDS operations. (Dave Airlie, 2017-06-26, 3 files, -4/+26)
| This adds support for the GDS operations needed to do atomic counters.
|
| Signed-off-by: Dave Airlie <[email protected]>
* r600: don't merge GDS into VTX (Dave Airlie, 2017-06-26, 1 file, -2/+3)
| We don't want vtx/tex instructions ending up in GDS sections.
|
| Signed-off-by: Dave Airlie <[email protected]>
* r600: for memory instructions dump index gpr for read indirects also. (Dave Airlie, 2017-06-26, 1 file, -1/+2)
| This just makes sure we can see the index gpr in the asm dumps.
|
| Signed-off-by: Dave Airlie <[email protected]>
* r600: add support for vertex fetches via texture cache (Dave Airlie, 2017-06-26, 2 files, -2/+20)
| On evergreen we can route vertex fetches via the texture cache, and
| this is required for some images support. So add support to the asm
| builder for it.
|
| Signed-off-by: Dave Airlie <[email protected]>
* r600: route indirect address register correctly for vtx fetches. (Dave Airlie, 2017-06-26, 1 file, -1/+1)
| This was found during writing the images code, we need to make sure we
| route the correct index register.
|
| Signed-off-by: Dave Airlie <[email protected]>
* gallium/hud: add glthread counters (Marek Olšák, 2017-06-26, 3 files, -0/+91)
| Reviewed-by: Timothy Arceri <[email protected]>
* gallium/hud: add API-thread-busy for monitoring the thread load (Marek Olšák, 2017-06-26, 3 files, -4/+22)
| Reviewed-by: Timothy Arceri <[email protected]>
* gallium/hud: add hud_pane::hud pointer (Marek Olšák, 2017-06-26, 2 files, -3/+6)
| for later use
|
| Reviewed-by: Timothy Arceri <[email protected]>
* mesa/glthread: add glthread "perf" counters and pass them to gallium HUD (Marek Olšák, 2017-06-26, 5 files, -2/+23)
| for HUD integration in following commits. This valuable profiling data
| will allow us to see on the HUD how well glthread is able to utilize
| parallelism. This is better than benchmarking, because you can see
| exactly what's happening and you don't have to be CPU-bound.
|
| u_threaded_context has the same counters.
|
| Reviewed-by: Timothy Arceri <[email protected]>
* gallium/hud: move struct hud_context to hud_private.h (Marek Olšák, 2017-06-26, 2 files, -46/+48)
| Reviewed-by: Timothy Arceri <[email protected]>
* gallium/hud: rename API-thread-busy to main-thread-busy (Marek Olšák, 2017-06-26, 3 files, -5/+5)
| Reviewed-by: Timothy Arceri <[email protected]>
* util: move pipe_thread_is_self from gallium to src/util (Marek Olšák, 2017-06-26, 2 files, -12/+1)
| Reviewed-by: Timothy Arceri <[email protected]>
* nv50/ir: Properly fold constants in SPLIT operation (Pierre Moreau, 2017-06-25, 1 file, -3/+4)
| Fixes: b7d9677d ("nv50/ir: constant fold OP_SPLIT")
| Cc: [email protected]
| Signed-off-by: Pierre Moreau <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
* radeonsi/gfx9: don't overallocate shader binaries (Marek Olšák, 2017-06-24, 1 file, -6/+0)
| It's not needed. The hw doesn't fetch ahead over page boundaries.
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
* st/dri2: implement image offset query (Lucas Stach, 2017-06-24, 1 file, -0/+6)
| This trivially adds support for the image offset query, which is needed
| for the zwp_linux_dmabuf based EGL platform wayland implementation.
|
| Signed-off-by: Lucas Stach <[email protected]>
| Reviewed-by: Daniel Stone <[email protected]>
* llvmpipe: initialize default fb correctly in setup (Roland Scheidegger, 2017-06-24, 1 file, -0/+4)
| If lp_setup_bind_framebuffer() is never called, then setup fb x1/y1 was
| not correctly initialized. This can happen if there's never a fb set -
| both cso and llvmpipe would consider setting this with no cbufs and no
| zsbuf a redundant change and therefore it would never get set.
|
| We rely on this setup fb rect being initialized correctly for the tri
| intersect tests, throwing away tris which don't intersect. Not
| initializing it meant we'd then say it intersected, and we'd try to bin
| that despite that we have no actual tiles to bin it to, leading to
| assertion failures (pretty harmless since tile 0/0 always exists
| nevertheless as tiles are statically allocated, albeit that should
| change at some point).
|
| (Note probably not an issue with gl state tracker)
|
| Reviewed-by: Jose Fonseca <[email protected]>
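Why an uninitialized framebuffer rect lets triangles through the intersection test can be illustrated with inclusive integer bounds. The types and names below are hypothetical, and representing "empty" as x1 = y1 = -1 is an assumption about the kind of initialization the commit adds, not a quote from llvmpipe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical rect with inclusive bounds: x0..x1 and y0..y1. */
struct irect {
    int x0, y0, x1, y1;
};

/* Standard overlap test for inclusive rectangles. */
static bool rects_intersect(struct irect a, struct irect b)
{
    return a.x0 <= b.x1 && b.x0 <= a.x1 &&
           a.y0 <= b.y1 && b.y0 <= a.y1;
}

/* Demonstrates the difference between a zeroed rect and an empty one. */
static void demo_default_fb(void)
{
    struct irect tri_bbox = { 0, 0, 5, 5 };
    struct irect zeroed   = { 0, 0, 0, 0 };   /* still covers pixel (0,0) */
    struct irect empty    = { 0, 0, -1, -1 }; /* x1 < x0: covers nothing */

    assert(rects_intersect(zeroed, tri_bbox));
    assert(!rects_intersect(empty, tri_bbox));
}
```

A zeroed rect with inclusive bounds still contains one pixel, so a triangle touching the origin passes the test, whereas a rect whose upper bound is below its lower bound rejects every triangle.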
* radeonsi: unreference vertex buffers when destroying the context (Marek Olšák, 2017-06-23, 1 file, -0/+2)
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi: implement the workaround for Rocket League - postponed TGSI kill (Marek Olšák, 2017-06-23, 5 files, -0/+37)
| Do KILL at the end of shaders so as not to break WQM.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100070
|
| Reviewed-by: Nicolai Hähnle <[email protected]>