path: root/src/gallium/drivers
Commit message | Author | Age | Files | Lines
* panfrost/midgard/disasm: Print 8-bit sources | Alyssa Rosenzweig | 2019-05-04 | 1 | -23/+43
    This handles the usual case. 8-bit register access parallels 16-bit
    access, but with one major caveat: in 8-bit mode, only half of the
    register file is actually (directly) accessible as sources. In
    particular, for each 16-bit integer register (hrN), we can only index
    a *single* 8-bit integer (qrN), corresponding to the lower 8-bits. To
    get the upper 8-bits, it is required to do an explicit shift.

    For example, to add the bytes of a 16-bit integer hr0.x and get the
    result as an 8-bit qr0, you'd need to do something like:

        ilsr hr1.x, hr0.x, #8
        iadd qr0.x, qr0.x, qr1.x

    This scheme diverges from 32-bit registers, in that both the upper and
    lower halves of a 32-bit register are individually accessible as a
    pair of half registers. For contrast, to add the lower and upper
    16-bits of a 32-bit integer r0.x, you can just:

        iadd hr0.x, hr0.x, hr1.x

    Since hr1.x = upper 16-bit of r0.x.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
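The two-instruction sequence in the message above can be modelled in a few lines of Python. This is only a sketch of the arithmetic, not of the disassembler: the helper names (`ilsr16`, `low_byte`, `add_bytes`) are made up for illustration, and wrapping the sum to 8 bits on overflow is an assumption, since the commit does not say how `iadd` behaves on overflow.

```python
MASK16 = 0xFFFF
MASK8 = 0xFF


def ilsr16(value, shift):
    """Logical shift right of a 16-bit value (models `ilsr` on an hrN register)."""
    return (value & MASK16) >> shift


def low_byte(half_reg):
    """Model a qrN read: only the lower 8 bits of hrN are directly visible."""
    return half_reg & MASK8


def add_bytes(hr0):
    """Add the upper and lower byte of a 16-bit value.

    Assumption: the 8-bit result wraps around on overflow.
    """
    hr1 = ilsr16(hr0, 8)                             # ilsr hr1.x, hr0.x, #8
    return (low_byte(hr0) + low_byte(hr1)) & MASK8   # iadd qr0.x, qr0.x, qr1.x
```

For instance, `add_bytes(0x1234)` adds `0x34` and `0x12`. The shift is needed because `low_byte` alone can never reach the upper byte, which mirrors the restriction described for qrN.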
* panfrost/midgard/disasm: Support 8-bit destination | Alyssa Rosenzweig | 2019-05-04 | 1 | -18/+21
    Meanwhile, we're forced to disable dest_override, since it's not yet
    clear how this interacts with other bitnesses (it'll likely need to be
    overhauled in any case).

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Rename ilzcnt8 -> iclz | Alyssa Rosenzweig | 2019-05-04 | 2 | -2/+2
    Per OpenCL.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Fix crash on unknown op | Alyssa Rosenzweig | 2019-05-04 | 1 | -2/+6
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard/disasm: Fill in .int mod | Alyssa Rosenzweig | 2019-05-04 | 1 | -1/+1
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard/disasm: Extend print_reg to 8-bit | Alyssa Rosenzweig | 2019-05-04 | 1 | -15/+34
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard/disasm: Catch mask errors | Alyssa Rosenzweig | 2019-05-04 | 1 | -0/+11
    We silently ignored certain bits of the mask, which causes issues when
    disassembling 8/64-bit ops.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: reg_mode_full -> reg_mode_32, etc | Alyssa Rosenzweig | 2019-05-04 | 3 | -16/+16
    In preparation for 8-bit and 64-bit operands, let's not reinforce the
    32-bit-centric biases in the ISA.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* freedreno/a6xx: deduplicate a few lines | Rob Clark | 2019-05-04 | 1 | -6/+0
    Signed-off-by: Rob Clark <[email protected]>
* freedreno: add ubwc_enabled helper | Rob Clark | 2019-05-04 | 6 | -26/+28
    Since it is dependent on the tile mode (ie. disabled for smaller
    mipmap levels), we should handle it a similar way to
    fd_resource_level_linear().

    The code previously mostly did the right thing because the old helper
    took the tile mode.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno: move UBWC color offset to fd_resource_offset() | Rob Clark | 2019-05-04 | 7 | -18/+42
    Best to keep it encapsulated in the helper which returns layer/level
    offset (and actually use that helper everywhere) rather than spreading
    the logic around the code.

    Also add a helper to find UBWC offset, to complete the encapsulation.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: buffer resources cannot be compressed | Rob Clark | 2019-05-04 | 1 | -26/+5
    Small cleanup. They are just an array of data and only ever
    linear/uncompressed.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno: mark imported resources as valid | Rob Clark | 2019-05-04 | 1 | -0/+2
    If someone is importing a buffer, we can't really know the state of
    its contents, so assume it is valid.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: UBWC support for images | Rob Clark | 2019-05-04 | 2 | -19/+57
    There are still some fallbacks we'll need to handle before we can
    enable UBWC by default. I think we may need to fall back to
    uncompressed if image atomic operations are used. And we still need to
    sort out how to handle image and sampler views of compressed resources
    if the image/sampler view is using a format that does not support
    compression. (I think the latter should hopefully be uncommon outside
    of deqp/piglit.)

    But at least this gets us to the point where supertuxkart works
    properly with UBWC enabled ;-)

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: UBWC fixes | Rob Clark | 2019-05-04 | 2 | -11/+78
    A few fixes that get UBWC working for the games/benchmarks where I
    noticed problems before (in particular manhattan and stk, modulo image
    support for UBWC when compute shaders are used for post-process
    effects):

    + fix the size of the UBWC meta buffer (ie, the offset to color pixel
      data) that is returned by ->fill_ubwc_buffer_sizes()
    + correct size/layout for 8 and 16 byte per pixel formats
    + limit the supported formats. Not all formats that can be tiled can
      be compressed.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headers | Rob Clark | 2019-05-04 | 1 | -2/+2
    Corrects tex state ubwc pitch/size

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: OUT_RELOC vs OUT_RELOCW fixes | Rob Clark | 2019-05-04 | 1 | -3/+3
    Signed-off-by: Rob Clark <[email protected]>
* iris: Delete bucketing allocators | Kenneth Graunke | 2019-05-03 | 1 | -167/+3
    These add a lot of complexity, and I currently can't measure any
    performance benefit from having them. In the past, I seem to recall
    seeing a benefit in drawoverhead scores, but currently it looks like
    dropping them is either a wash or 1-2% faster.

    Drop them to simplify allocations.
* iris: Force VMA alignment to be a multiple of the page size. | Kenneth Graunke | 2019-05-03 | 1 | -0/+3
    This should happen regardless, but let's be paranoid.
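Forcing an alignment to a multiple of the page size is typically a round-up with a bit mask. The sketch below shows that round-up only; the 4096-byte page size and the name `align_to_page` are assumptions for illustration and are not taken from the iris source.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def align_to_page(alignment, page_size=PAGE_SIZE):
    """Round `alignment` up to the next multiple of `page_size`.

    The mask trick only works when page_size is a power of two,
    so that precondition is checked first.
    """
    assert page_size > 0 and page_size & (page_size - 1) == 0
    return (alignment + page_size - 1) & ~(page_size - 1)
```

A request that is already page aligned comes back unchanged, which matches the commit's remark that this "should happen regardless".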
* iris: leave the top 4Gb of the high heap VMA unused | Kenneth Graunke | 2019-05-03 | 1 | -1/+5
    This ports commit 9e7b0988d6e98690eb8902e477b51713a6ef9cae from anv to
    iris. Thanks to Lionel for noticing that it was missing!
* iris: Fix 4GB memory zone heap sizes. | Kenneth Graunke | 2019-05-03 | 1 | -3/+6
    The STATE_BASE_ADDRESS "Size" fields can only hold 0xfffff in pages,
    and 0xfffff * 4096 = 4294963200, which is 1 page shy of 4GB. So we
    can't use the top page.
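The arithmetic in this message can be checked directly. The constant names below are illustrative only; the values are the ones quoted in the commit.

```python
PAGE_SIZE = 4096            # bytes per page, as used in the commit message
MAX_SIZE_FIELD = 0xFFFFF    # largest page count the "Size" field can hold
FOUR_GB = 1 << 32

# Largest zone size expressible through the field, in bytes.
max_zone_bytes = MAX_SIZE_FIELD * PAGE_SIZE

# Number of bytes at the top of a 4GB zone that cannot be covered.
shortfall = FOUR_GB - max_zone_bytes
```

Here `max_zone_bytes` is 4294963200 and `shortfall` is exactly one page, which is why the top page of each 4GB zone has to stay unused.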
* iris: Resolve textures used by the program, not merely bound textures | Kenneth Graunke | 2019-05-03 | 1 | -2/+5
    st/mesa's PBO upload path binds a vertex shader that doesn't use any
    textures, but leaves the existing sampler views bound in place. This
    was tricking us into thinking the PBO destination might be bound for
    texturing in some cases. In Civilization VI, this fixes a false
    self-dependency issue that was preventing CCS_E compression on upload.

    Fixing this slightly improves frame times.
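One way to picture the difference between "bound" and "used" textures is as two bitmasks over sampler slots, where only the intersection needs resolving. This is a sketch under that assumption; the function names and the bitmask representation are hypothetical and not claimed to match the iris code.

```python
def slots(mask):
    """List the indices of the set bits in `mask`, lowest first."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]


def textures_to_resolve(bound_mask, used_mask):
    """Return the slots that are both bound and referenced by the program.

    Slots that are bound but never sampled by the shader are skipped,
    so stale sampler views do not create false dependencies.
    """
    return slots(bound_mask & used_mask)
```

With a shader that samples nothing (`used_mask == 0`), as in the PBO upload case described above, the result is empty no matter how many views are still bound.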
* r600: implement resource_get_info | Julien Isorce | 2019-05-03 | 1 | -5/+29
    Factoring code with resource_get_handle.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110443
    Signed-off-by: Julien Isorce <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* iris: Disable dual source blending when shader doesn't handle it | Kenneth Graunke | 2019-05-02 | 1 | -4/+15
    This is a port of Danylo's eca4a6548d07bbbb02a7768edb397bad7b72cfc2
    which fixed the hang on i965. It fixes GPU hangs in his new Piglit
    test, arb_blend_func_extended-dual-src-blending-discard-without-src1.

    I avoided my own review feedback here, and decided to simply adjust
    3DSTATE_PS_BLEND rather than BLEND_STATE_ENTRY[0]. It has never been
    clear to me which the hardware uses in every case. However, whacking
    the enable in 3DSTATE_PS_BLEND seems to be sufficient to fix the hang,
    and that packet is already dynamic, so it's easy to handle. I'd rather
    avoid making BLEND_STATE_ENTRY[0] dynamic unless I have to.
* lima/ppir: support nir_op_ftrunc | Erico Nunes | 2019-05-02 | 3 | -0/+14
    Support nir_op_ftrunc by turning it into a mov with a round to integer
    output modifier.

    Signed-off-by: Erico Nunes <[email protected]>
    Reviewed-by: Qiang Yu <[email protected]>
* freedreno/a6xx: smaller hammer for fb barrier | Rob Clark | 2019-05-02 | 3 | -0/+48
    We just need to do a sequence of commands to flush the cache.

    Signed-off-by: Rob Clark <[email protected]>
    Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: KHR_blend_equation_advanced support | Rob Clark | 2019-05-02 | 7 | -5/+96
    Wire up support to sample from the fb (and force GMEM rendering when
    we have fb reads). The existing GLSL IR lowering for
    blend_equation_advanced does the rest.

    Signed-off-by: Rob Clark <[email protected]>
    Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: add some ubo range related asserts | Rob Clark | 2019-05-02 | 1 | -3/+6
    And a comment.. since we are mixing units of bytes/dwords/vec4,
    hopefully this will avoid some unit confusion.

    Signed-off-by: Rob Clark <[email protected]>
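The unit confusion mentioned here is easy to illustrate. The sketch below assumes the usual sizes (a dword is 4 bytes, a vec4 is 4 dwords) and uses made-up helper names; it shows the kind of alignment assert that catches a value passed in the wrong unit, not the asserts actually added to ir3.

```python
BYTES_PER_DWORD = 4
DWORDS_PER_VEC4 = 4
BYTES_PER_VEC4 = BYTES_PER_DWORD * DWORDS_PER_VEC4


def bytes_to_dwords(nbytes):
    """Convert a byte count to dwords, rejecting unaligned values."""
    assert nbytes % BYTES_PER_DWORD == 0, "byte count is not dword aligned"
    return nbytes // BYTES_PER_DWORD


def bytes_to_vec4(nbytes):
    """Convert a byte count to vec4 units, rejecting unaligned values."""
    assert nbytes % BYTES_PER_VEC4 == 0, "byte count is not vec4 aligned"
    return nbytes // BYTES_PER_VEC4


def vec4_to_dwords(nvec4):
    """Convert a vec4 count to dwords."""
    return nvec4 * DWORDS_PER_VEC4
```

Keeping each conversion in a named helper makes the unit visible at the call site, and the asserts fail early when, say, a dword offset is handed to a function expecting bytes.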
* panfrost/midgard: Skip liveness analysis for instructions without dest | Tomeu Vizoso | 2019-05-02 | 1 | -0/+7
    [Alyssa: Add comment explanation]

    Signed-off-by: Tomeu Vizoso <[email protected]>
    Reviewed-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Skip register allocation if there's no work to do | Tomeu Vizoso | 2019-05-02 | 1 | -0/+3
    Signed-off-by: Tomeu Vizoso <[email protected]>
    Reviewed-by: Alyssa Rosenzweig <[email protected]>
* svga: add SVGA_NO_LOGGING env var (v2) | Brian Paul | 2019-05-02 | 1 | -1/+15
    valgrind crashes when we try to initialize host logging. This env var
    can be used to disable logging.

    v2: rebase onto "svga: move host logging to winsys".

    Cc: [email protected]
    Reviewed-by: Neha Bhende <[email protected]>
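An environment-variable switch of this kind usually amounts to a single lookup before initialization. The sketch below is an assumption about the general pattern only: the commit does not say which values count as "set", so the accepted strings ("1", "true", "yes") and the function name are hypothetical.

```python
import os


def host_logging_enabled(environ=None):
    """Return False when SVGA_NO_LOGGING asks for logging to be disabled.

    Assumption: "1", "true" and "yes" (case-insensitive) disable logging;
    anything else, including an unset variable, leaves it enabled.
    """
    env = os.environ if environ is None else environ
    value = env.get("SVGA_NO_LOGGING", "")
    return value.strip().lower() not in ("1", "true", "yes")
```

Passing the environment as a parameter keeps the check testable without touching the real process environment.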
* svga: move host logging to winsys | Charmaine Lee | 2019-05-02 | 6 | -502/+11
    This patch adds a host_log interface to svga_winsys and moves the host
    logging code to the winsys layer.

    Cc: [email protected]
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Neha Bhende <[email protected]>
* svga: Avoid bouncing buffer data in malloced buffers | Thomas Hellstrom | 2019-05-02 | 3 | -13/+36
    Some constant- and texture upload buffer data may bounce in malloced
    buffers before being transferred to hardware buffers. In the case of
    texture upload buffers this seems to be an oversight. In the case of
    constant buffers, code comments indicate that we want to avoid mapping
    hardware buffers for reading when copying out of buffers that need
    modification before being passed to hardware. In this case we avoid
    data bouncing for upload manager buffers but make sure buffers that we
    read out from stay in malloced memory.

    Signed-off-by: Thomas Hellstrom <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
* radeonsi: set sampler state and view functions for compute-only contexts | Marek Olšák | 2019-05-01 | 3 | -9/+12
* radeonsi: use new atomic LLVM helpers | Marek Olšák | 2019-05-01 | 1 | -8/+4
    This depends on "ac,ac/nir: use a better sync scope for shared atomics"
* lima/gpir: add limit of max 512 instructions | Erico Nunes | 2019-05-02 | 1 | -0/+6
    It has been noted that the lima GP has a limit of 512 instructions,
    after which the shaders don't work and fail silently. This commit adds
    a check to make the shader compilation abort when the shader exceeds
    this limit, so that we get a clear reason for why the program will not
    work.

    Signed-off-by: Erico Nunes <[email protected]>
    Reviewed-by: Qiang Yu <[email protected]>
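The check described above boils down to comparing the instruction count against the limit and reporting a reason on failure. This is a minimal sketch of that idea; the function name, the return shape, and the message text are assumptions, and only the limit of 512 comes from the commit.

```python
MAX_GP_INSTRUCTIONS = 512  # limit quoted in the commit message


def check_gp_instruction_count(num_instructions):
    """Return (ok, reason); reason is empty when the shader fits.

    A shader with exactly MAX_GP_INSTRUCTIONS instructions is accepted;
    anything larger is rejected with an explanatory message.
    """
    if num_instructions > MAX_GP_INSTRUCTIONS:
        reason = "shader has %d instructions, limit is %d" % (
            num_instructions, MAX_GP_INSTRUCTIONS)
        return False, reason
    return True, ""
```

Returning a reason string rather than failing silently is the point of the change: the caller can abort compilation and say why.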
* panfrost: Fix blend shader upload | Alyssa Rosenzweig | 2019-05-01 | 2 | -7/+14
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/decode: Hit MRT blend shader enable bits | Alyssa Rosenzweig | 2019-05-01 | 2 | -3/+18
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Remove shader dump | Alyssa Rosenzweig | 2019-05-01 | 4 | -9/+0
    Redundant via the midgard shader dump.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* virgl: Re-use and extend queue transfers for intersecting buffer subdatas. | David Riley | 2019-05-01 | 1 | -0/+46
    Small buffer subdatas which are essentially doing a memcpy were
    getting bogged down by all the overhead of creating new transfers.

    Signed-off-by: David Riley <[email protected]>
    Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: Allow transfer queue entries to be found and extended. | David Riley | 2019-05-01 | 2 | -0/+58
    Intersecting transfer queue entries allow for the possibility of
    extending an existing transfer instead of creating a new one (and all
    the associated mapping/unmapping).

    Signed-off-by: David Riley <[email protected]>
    Reviewed-by: Gurchetan Singh <[email protected]>
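The idea of extending an intersecting queue entry instead of creating a new one can be sketched with plain byte ranges. This is an illustration only: the queue is modelled as a list of `[start, end)` pairs, the name `queue_or_extend` is hypothetical, and the real driver tracks far more state per transfer than this.

```python
def queue_or_extend(queue, offset, size):
    """Add the byte range [offset, offset + size) to `queue`.

    If the range intersects an existing entry, that entry is grown to
    cover both and True is returned. Otherwise a new entry is appended
    and False is returned. Entries are [start, end) lists.
    Simplification: only the first intersecting entry is extended, and
    entries that come to overlap afterwards are not merged.
    """
    start, end = offset, offset + size
    for entry in queue:
        if start < entry[1] and entry[0] < end:  # half-open ranges intersect
            entry[0] = min(entry[0], start)
            entry[1] = max(entry[1], end)
            return True
    queue.append([start, end])
    return False
```

A run of small overlapping writes therefore ends up as one queued range rather than many, which is the overhead the commit message is concerned with.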
* virgl: Store mapped hw resource with transfer object. | David Riley | 2019-05-01 | 3 | -7/+7
    Signed-off-by: David Riley <[email protected]>
    Reviewed-by: Gurchetan Singh <[email protected]>
* iris: Fix imageBuffer and PBO download. | Kenneth Graunke | 2019-05-01 | 1 | -2/+2
    Recently we added checks to try and deny multisampled shader images.
    Unfortunately, this messed up imageBuffers, which have sample_count =
    0, which are also used in PBO download, causing us to hit CPU map
    fallbacks.

    Fixes: b15f5cfd20c iris: Do not advertise multisampled image load/store.
    Reviewed-by: Rafael Antognolli <[email protected]>
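The pitfall described here is a multisample check that also rejects a sample count of 0. The sketch below shows one check that avoids it; it assumes that counts of 0 and 1 both mean "not multisampled", and the function name is hypothetical rather than taken from iris.

```python
def shader_image_allowed(sample_count):
    """Allow shader images unless the resource is actually multisampled.

    Assumption: buffers report sample_count == 0 and single-sampled
    textures report 0 or 1, so only counts above 1 are multisampled.
    """
    return sample_count <= 1
```

A stricter test such as `sample_count == 1` would wrongly refuse buffers, which matches the symptom in the commit message.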
* r600: reset tex array override even when no view bound | Dave Airlie | 2019-05-02 | 1 | -11/+10
    If no view is bound we still should reset the override to 0 and array
    mode.

    This should fix misrendering in firefox WebRender since the pbo
    sampler was removed.

    Fixes: 1250383e36 (st/mesa: remove sampler associated with buffer texture in pbo logic)
* swr/rast: Add general SWTag statistics | Alok Hota | 2019-05-01 | 3 | -161/+191
    Update Archrast parser to use stats, used with an internal tool

    Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add string handling to AR event framework | Alok Hota | 2019-05-01 | 5 | -31/+54
    For use by an internal tool

    Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add initial SWTag proto definitions | Alok Hota | 2019-05-01 | 2 | -39/+71
    Update gen_archrast.py to properly generate event IDs

    Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Cleanup and generalize gen_archrast | Alok Hota | 2019-05-01 | 3 | -123/+57
    - Update meson.build
      - Includes current_build_dir() fix

          meson/swr: replace hard-coded path with current_build_dir()

          Fixes: 93cd9905c8fbb98985ae "swr/rast: Cleanup and generalize gen_archrast"
          Signed-off-by: Eric Engestrom <[email protected]>
          Reviewed-by: Alok Hota <[email protected]>
          Reviewed-by: Dylan Baker <[email protected]>

      - Clean up meson.build (remove foreach loop, replace with single call)
    - Update SConscript
      - use `$SOURCES` to call `CodeGenerate` with multiple source files

    Reviewed-by: Bruce Cherniak <[email protected]>
* softpipe: setup pixel_offset for all primitive types | Erik Faye-Lund | 2019-05-01 | 1 | -11/+10
    If we don't update this for all primitive-types, we end up rendering
    slightly offset points and lines up until the point where the first
    triangle gets drawn. This is obviously not correct, and violates
    OpenGL's repeatability rule.

    Signed-off-by: Erik Faye-Lund <[email protected]>
    Fixes: ca9c413647b ("softpipe: Respect gl_rasterization_rules in primitive setup.")
    Reviewed-By: Gert Wollny <[email protected]>
* softpipe: Increase the GLSL feature level | Gert Wollny | 2019-05-01 | 1 | -1/+1
    This will enable calls to the interpolateAt* functions, but also a
    bunch of other features.

    Signed-off-by: Gert Wollny <[email protected]>
    Reviewed-by: Roland Scheidegger <[email protected]>