path: root/src/gallium
Commit message | Author | Age | Files | Lines
...
* radeon/vce: determine idr by pic type | Boyuan Zhang | 2017-12-15 | 1 | -1/+1
| The VAAPI encode interface provides IDR frame flags, whereas the OMX interface doesn't. Therefore, use the picture type to determine whether a frame is IDR, which works for both interfaces.
| Signed-off-by: Boyuan Zhang <[email protected]>
| Reviewed-by: Leo Liu <[email protected]>
* radeon/vcn: determine idr by pic type | Boyuan Zhang | 2017-12-15 | 1 | -1/+1
| The VAAPI encode interface provides IDR frame flags, whereas the OMX interface doesn't. Therefore, use the picture type to determine whether a frame is IDR, which works for both interfaces.
| Signed-off-by: Boyuan Zhang <[email protected]>
| Reviewed-by: Leo Liu <[email protected]>
| Reviewed-by: Christian König <[email protected]>
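The idea in the two "determine idr by pic type" commits can be sketched as follows. This is a minimal illustration, not the driver code: the enum, the struct, and the function name are all hypothetical, and only the principle (derive IDR status from the picture type instead of a frontend-specific flag) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical picture types; the real driver enums may differ. */
enum pic_type { PIC_TYPE_P, PIC_TYPE_B, PIC_TYPE_I, PIC_TYPE_IDR };

/* Hypothetical per-frame description filled in by a frontend. */
struct enc_pic {
   enum pic_type type;  /* assumed to be set by every frontend */
   bool is_idr_flag;    /* assumed to be set only by some frontends */
};

/* Derive IDR status from the picture type alone, so the result does not
 * depend on whether the frontend filled in is_idr_flag. */
static bool enc_pic_is_idr(const struct enc_pic *pic)
{
   assert(pic != NULL);
   return pic->type == PIC_TYPE_IDR;
}
```

With this shape, a frontend that never sets `is_idr_flag` still gets correct IDR detection as long as it reports the picture type.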
* swr/rast: Move more RTAI handling out of binner | Tim Rowley | 2017-12-15 | 2 | -12/+2
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: EXTRACT2 changed from vextract/vinsert to vshuffle | Tim Rowley | 2017-12-15 | 3 | -61/+32
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Fix cache of API thread event manager | Tim Rowley | 2017-12-15 | 1 | -1/+1
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Replace VPSRL with LSHR | Tim Rowley | 2017-12-15 | 4 | -41/+4
| Replace use of x86 intrinsic with general llvm IR instruction. Generates the same final assembly.
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Rework thread binding parameters for machine partitioning | Tim Rowley | 2017-12-15 | 7 | -88/+322
| Add BASE_NUMA_NODE, BASE_CORE, BASE_THREAD parameters to SwrCreateContext.
| Add optional SWR_API_THREADING_INFO parameter to SwrCreateContext to control reservation of API threads.
| Add SwrBindApiThread() function to allow binding of API threads to reserved HW threads.
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Pull of RTAI gather & offset out of clip/bin code | Tim Rowley | 2017-12-15 | 7 | -146/+203
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Remove no-op VBROADCAST of vID | Tim Rowley | 2017-12-15 | 1 | -2/+2
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: SIMD16 Fetch - Fully widen 32-bit integer vertex components | Tim Rowley | 2017-12-15 | 4 | -17/+109
| Also widen the 16-bit and 8-bit integer vertex component gathers to SIMD16.
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Replace INSERT2 vextract/vinsert with JOIN2 vshuffle | Tim Rowley | 2017-12-15 | 3 | -105/+30
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: SIMD16 Fetch - Fully widen 16-bit float vertex components | Tim Rowley | 2017-12-15 | 1 | -7/+48
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: SIMD16 Fetch - Fully widen 32-bit float vertex components | Tim Rowley | 2017-12-15 | 4 | -32/+194
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Pass prim to ClipSimd | Tim Rowley | 2017-12-15 | 1 | -5/+5
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Pull most of the VPAI manipulation out of the binner/clipper | Tim Rowley | 2017-12-15 | 7 | -158/+177
| Move out of binner/clipper; hand them down from the frontend code instead.
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Move GatherScissors to header | Tim Rowley | 2017-12-15 | 2 | -127/+127
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Rewrite Shuffle8bpcGatherd using shuffle | Tim Rowley | 2017-12-15 | 1 | -182/+62
| Ease future code maintenance, prepare for folding simd8 and simd16 versions.
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Convert gather masks to Nx1bit | Tim Rowley | 2017-12-15 | 2 | -40/+14
| Simplifies calling code, gets gather function interface closer to llvm's masked_gather.
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: WIP - Widen fetch shader to SIMD16 | Tim Rowley | 2017-12-15 | 1 | -27/+689
| Widen vertex gather/storage to SIMD16 for all component types.
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Corrections to multi-scissor handling | Tim Rowley | 2017-12-15 | 1 | -88/+88
| binner's GatherScissors() will be turned into a real gather in the not too distant future.
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Binner fixes for viewport index offset handling | Tim Rowley | 2017-12-15 | 2 | -2/+12
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Remove unneeded copy of gather mask | Tim Rowley | 2017-12-15 | 2 | -79/+23
| Reviewed-by: Bruce Cherniak <[email protected]>
* freedreno: use u_transfer_helper | Rob Clark | 2017-12-15 | 2 | -229/+44
| Signed-off-by: Rob Clark <[email protected]>
* gallium/util: add u_transfer_helper | Rob Clark | 2017-12-15 | 5 | -1/+649
| Add a new helper that drivers can use to emulate various things that need special handling, in particular in transfer_map:
| 1) z32_s8x24: gl/gallium treats this as a single buffer with depth and stencil interleaved, but hardware frequently treats it as separate z32 and s8 buffers. Special pack/unpack handling is needed in transfer_map/unmap to pack/unpack the exposed buffer.
| 2) fake RGTC: on GPUs designed with GLES in mind, but which can otherwise do GL3, RGTC can be emulated when it is not natively supported by converting to uncompressed internally; this needs pack/unpack in transfer_map/unmap.
| 3) MSAA resolves in the transfer_map() case.
| v2: add MSAA resolve based on Eric's "gallium: Add helpers for MSAA resolves in pipe_transfer_map()/unmap()." patch; avoid wrapping pipe_resource, to make it possible for drivers to use both this and threaded_context.
| Signed-off-by: Rob Clark <[email protected]>
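Case 1 above (z32_s8x24) amounts to interleaving and de-interleaving two planes. The sketch below is only an illustration of that pack/unpack step, not the u_transfer_helper implementation: the function names are hypothetical, and the texel layout (a 32-bit float depth followed by a 32-bit word with stencil in its low 8 bits) is an assumption about the interleaved format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Interleave separate depth and stencil planes into 64-bit texels:
 * word 0 holds the float depth bits, word 1 holds stencil in its low
 * 8 bits (assumed layout; the upper 24 bits are padding). */
static void pack_z32_s8x24(uint32_t *dst, const float *z,
                           const uint8_t *s, size_t count)
{
   assert(dst && z && s);
   for (size_t i = 0; i < count; i++) {
      memcpy(&dst[2 * i], &z[i], sizeof(float)); /* copy depth bits unchanged */
      dst[2 * i + 1] = s[i];
   }
}

/* Split interleaved texels back into separate depth and stencil planes. */
static void unpack_z32_s8x24(float *z, uint8_t *s,
                             const uint32_t *src, size_t count)
{
   assert(z && s && src);
   for (size_t i = 0; i < count; i++) {
      memcpy(&z[i], &src[2 * i], sizeof(float));
      s[i] = (uint8_t)(src[2 * i + 1] & 0xff);
   }
}
```

In a transfer_map-style flow, a pack like this would presumably run when mapping for read, and the unpack when unmapping after a write.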
* gallivm: implement accurate corner behavior for textureGather with cube maps | Roland Scheidegger | 2017-12-14 | 1 | -103/+201
| The spec says the missing texel (when we wrap around both x and y axis) should be synthesized as the average of the 3 other texels. For bilinear filtering however we instead adjusted the filter weights (because, while the complexity looks similar, there would be 4 times as many color values to fix up as weights). Obviously this could not work for gather (hence accurate corner filtering was disabled with gather).
| Implement this by just doing it as the spec implies - calculate the 4th texel as the average of the other 3. With gather of course there's only one color to worry about, so it's not all that many instructions either (albeit surely the whole cube map filtering is hilariously complex).
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Jose Fonseca <[email protected]>
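The corner rule described above can be written down in scalar form. This is a sketch under assumptions, not the gallivm code (which emits vectorized LLVM IR): the function name and the idea of passing the missing slot as an index are hypothetical.

```c
#include <assert.h>

/* texels[] holds the four gathered values of one channel. At a cube
 * corner one of them does not exist; synthesize it as the average of
 * the three that do. missing_idx (hypothetical parameter) names the
 * slot to fill. */
static void gather_fix_corner(float texels[4], int missing_idx)
{
   assert(missing_idx >= 0 && missing_idx < 4);
   float sum = 0.0f;
   for (int i = 0; i < 4; i++) {
      if (i != missing_idx)
         sum += texels[i];
   }
   texels[missing_idx] = sum / 3.0f;
}
```

Because gather returns a single channel, only one value per corner needs fixing, which matches the commit's remark that the instruction count stays modest.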
* gallivm: fix an issue with NaNs with seamless cube filtering | Roland Scheidegger | 2017-12-14 | 1 | -0/+11
| Cube texture wrapping is a bit special since the values (post face projection) always are within [0,1], so we took advantage of that and omitted some clamps. However, we can still get NaNs (either because the coords already had NaNs, or the face projection generated them), and in fact we didn't handle them quite safely. I've seen -INT_MAX + 1 being propagated through as the final int coord value, albeit I didn't observe a crash. (Not quite a coincidence, since any stride mul with -INT_MAX or -INT_MAX+1 will turn up as a small positive number - nevertheless, I'd rather not try my luck, I'm not entirely sure it can't really turn up negative either due to seamless coord swapping, plus ifloor of a NaN is not guaranteed to return -INT_MAX by any standard. And we kill off NaNs similarly with ordinary texture wrapping too.)
| So kill off the NaNs by using the common max against zero method.
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Jose Fonseca <[email protected]>
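The "max against zero" method relies on the fact that every ordered comparison with a NaN is false. A scalar sketch, assuming a select-style max; both function names are hypothetical and the real change operates on LLVM IR vectors:

```c
#include <assert.h>
#include <math.h> /* NAN, for the usage example */

/* max(x, 0) written as a compare-and-select: if x is NaN the comparison
 * is false and 0.0f is returned, so NaNs never reach the int conversion. */
static float kill_nan_max_zero(float x)
{
   return x > 0.0f ? x : 0.0f;
}

/* Convert a [0,1] coordinate to a texel index after removing NaNs.
 * The clamp is done in float so the int conversion is always in range. */
static int safe_texel_index(float coord, int size)
{
   assert(size > 0);
   float c = kill_nan_max_zero(coord) * (float)size;
   if (c > (float)(size - 1))
      c = (float)(size - 1);
   return (int)c;
}
```

For example, `safe_texel_index(NAN, 256)` yields index 0 instead of whatever a float-to-int conversion of NaN would produce on the target.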
* amd/common: add ac_build_waitcnt() | Samuel Pitoiset | 2017-12-14 | 3 | -15/+4
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: make use of ac_build_fdiv() | Samuel Pitoiset | 2017-12-14 | 1 | -7/+1
| And move the comment to amd/common.
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: make use of ac_get_spi_shader_z_format() | Samuel Pitoiset | 2017-12-14 | 3 | -23/+4
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* swr: Correct texture allocation and limit max size to 2GB | Bruce Cherniak | 2017-12-13 | 2 | -4/+10
| This patch fixes piglit tex3d-maxsize by correcting 4 things:
| - The total_size calculation was using 32-bit math, therefore a >4GB allocation request overflowed and was not returning false (unsupported).
| - Changed AlignedMalloc arguments from "unsigned int" to size_t, to handle >4GB allocations.
| - Added error checking on texture allocations to fail gracefully.
| - Finally, temporarily decreased supported max texture size from 4GB to 2GB. The gallivm texture-sampler needs some additional work to correctly handle larger than 2GB textures (offsets to LLVMBuildGEP are signed).
| I'm working on a follow-on patch to allow up to 4GB textures, as this is useful in HPC visualization applications.
| Fixes piglit tex3d-maxsize.
| v2: Updated patch description to clarify ">4GB".
| Reviewed-By: George Kyriazis <[email protected]>
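The first item (32-bit size math wrapping around) is easy to reproduce. The sketch below is not the swr code: the cap macro, the helper names, and the choice to treat the 2GB limit as inclusive are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap matching the 2GB limit named in the commit. */
#define MAX_TEXTURE_BYTES (UINT64_C(2) << 30)

/* Multiply with overflow detection so even absurd dimensions are rejected. */
static bool mul_u64(uint64_t a, uint64_t b, uint64_t *out)
{
   if (b != 0 && a > UINT64_MAX / b)
      return false;
   *out = a * b;
   return true;
}

/* Compute the allocation size in 64-bit math and reject anything over
 * the cap. With 32-bit math, 2048 x 2048 x 2048 x 4 bytes (2^35) would
 * wrap to 0 and look like a valid request. */
static bool texture_size_ok(uint32_t w, uint32_t h, uint32_t d,
                            uint32_t bytes_per_texel, uint64_t *total_size)
{
   assert(total_size != NULL);
   uint64_t size = w;
   if (!mul_u64(size, h, &size) ||
       !mul_u64(size, d, &size) ||
       !mul_u64(size, bytes_per_texel, &size))
      return false;
   if (size > MAX_TEXTURE_BYTES)
      return false;
   *total_size = size;
   return true;
}
```

A caller would check the return value before allocating, which corresponds to the "fail gracefully" item in the list above.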
* swr: Fix KNOB_MAX_WORKER_THREADS thread creation override. | Bruce Cherniak | 2017-12-13 | 1 | -2/+1
| Environment variable KNOB_MAX_WORKER_THREADS allows the user to override default thread creation and thread binding.
| A previous commit that adjusted Linux CPU topology handling caused setting this KNOB to bind all threads to a single core. This patch restores the correct behavior of the override.
| Cc: <[email protected]>
| Reviewed-by: Tim Rowley <[email protected]>
* gallium/docs: document behavior of set_sample_mask() | Brian Paul | 2017-12-13 | 1 | -1/+4
| The sample mask is used even if msaa is not explicitly enabled when we have a framebuffer with multisampled surfaces. That's DX behavior and what the Radeon drivers do. Not sure about other drivers at this point.
| Reviewed-by: Roland Scheidegger <[email protected]>
* radeonsi: create get_tcs_tes_buffer_address helper | Timothy Arceri | 2017-12-13 | 1 | -12/+32
| This will be shared between the NIR and TGSI backends.
| Reviewed-by: Nicolai Hähnle <[email protected]>
* cso: add point rasterization sanity check assertion | Brian Paul | 2017-12-12 | 1 | -0/+5
| Reviewed-by: Nicolai Hähnle <[email protected]>
| Reviewed-by: Roland Scheidegger <[email protected]>
* gallium/u_blitter: replace tabs with spaces | Brian Paul | 2017-12-12 | 1 | -18/+18
| Reviewed-by: Marek Olšák <[email protected]>
* gallium/util: don't pass a pipe_resource to util_resource_is_array_texture() | Brian Paul | 2017-12-12 | 2 | -4/+4
| No need to pass a pipe_resource when we can just pass the target. This makes the function potentially more usable. Rename it too.
| Reviewed-by: Roland Scheidegger <[email protected]>
| Reviewed-by: Charmaine Lee <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
* gallium/aux: include nr_samples in util_resource_size() computation | Brian Paul | 2017-12-12 | 1 | -1/+2
| This function is only used in two places:
| 1. VMware driver, but only for HUD reporting
| 2. st/nine state tracker, used for texture memory accounting
| Fixes: a69efa9482d ("util: add new util_resource_size() function in u_resource.[ch]")
| Reviewed-by: Roland Scheidegger <[email protected]>
| Reviewed-by: Charmaine Lee <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
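Why the sample count matters for a size estimate can be shown with a simplified model. This is not util_resource_size(): the function name and parameter list are hypothetical, mip levels and format block sizes are ignored, and the convention that a sample count of 0 or 1 means single-sampled is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified size estimate for one mip level: every texel is stored
 * once per sample, so a 4x MSAA surface takes four times the memory. */
static uint64_t resource_size_estimate(unsigned width, unsigned height,
                                       unsigned depth,
                                       unsigned bytes_per_texel,
                                       unsigned nr_samples)
{
   assert(bytes_per_texel > 0);
   /* Treat 0 and 1 alike as "not multisampled" (assumed convention). */
   unsigned samples = nr_samples > 1 ? nr_samples : 1;
   return (uint64_t)width * height * depth * bytes_per_texel * samples;
}
```

For a 64x64 RGBA8 surface this gives 16384 bytes single-sampled and 65536 bytes at 4 samples, which is the factor that memory accounting would otherwise miss.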
* svga: trivial whitespace/formatting fixes in svga_pipe_rasterizer.c | Brian Paul | 2017-12-12 | 1 | -9/+5
* gallivm: fix texture wrapping for texture gather for mirror modes | Roland Scheidegger | 2017-12-12 | 1 | -74/+171
| Care must be taken that all coords end up correct, the tests are very sensitive that everything is correctly rounded. This doesn't matter for bilinear filter (since picking a wrong texel with weight zero is ok), and we could also switch the per-sample coords mistakenly. While here, also optimize the coord_mirror helper a bit (we can do the mirroring directly by exploiting float rounding, no need for fixing up odd/even manually).
| I did not touch the mirror_clamp and mirror_clamp_to_border modes. In contrast to mirror_clamp_to_edge and mirror_repeat these are legacy modes. They are specified against old gl rules, which actually do the mirroring not per sample (so you get swapped order if the coord is in the mirrored section). I think the idea though is that they should follow the respecified mirror_clamp_to_edge rules so the order would be correct.
| Reviewed-by: Jose Fonseca <[email protected]>
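The commit text does not spell out the formula behind "exploiting float rounding". One well-known way to mirror a coordinate without an explicit odd/even test is |x - 2*round(x/2)|; the sketch below assumes that formulation and is not taken from the gallivm source. Both function names are hypothetical, and the rounding helper is a stand-in valid only for small magnitudes.

```c
#include <assert.h>

/* Round to the nearest integer, halves away from zero. Stand-in helper:
 * only valid while the result fits comfortably in an int. */
static float round_nearest(float x)
{
   assert(x > -1.0e6f && x < 1.0e6f);
   return x >= 0.0f ? (float)(int)(x + 0.5f) : (float)(int)(x - 0.5f);
}

/* Mirror-repeat a texture coordinate into [0,1]: subtracting the nearest
 * even integer folds every period of length 2 onto [-1,1], and the
 * absolute value reflects the negative half. No odd/even fixup needed. */
static float coord_mirror_repeat(float coord)
{
   float d = coord - 2.0f * round_nearest(coord * 0.5f);
   return d < 0.0f ? -d : d;
}
```

For instance, 1.25 maps to 0.75 and 2.25 maps back to 0.25, as mirror_repeat requires. NaN inputs are outside the scope of this sketch.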
* winsys/amdgpu: disable local BOs again due to worse performance | Marek Olšák | 2017-12-11 | 1 | -2/+3
| Cc: 17.3 <[email protected]>
| Reviewed-by: Nicolai Hähnle <[email protected]>
* radeon/vce: move destroy command before feedback command | Leo Liu | 2017-12-08 | 1 | -1/+1
| VCE processes IBs starting from the session and task info at the first level, with other commands processed subsequently. The task info for destroy is embedded in the destroy command, with the result that the feedback command is not properly processed.
| This causes kernel VM fault messages on Polaris and Vega10 cards when an encode application finishes running.
| The fix is also verified on a VCE physical mode card.
| Signed-off-by: Leo Liu <[email protected]>
| Cc: [email protected]
| Acked-by: Christian König <[email protected]>
* meson: Add lmsensors to gallium libgl-xlib target. | Dylan Baker | 2017-12-07 | 1 | -1/+3
| Fixes: 5e71efef44b992b5d70b ("meson: Add lmsensors support")
| Signed-off-by: Dylan Baker <[email protected]>
| Reviewed-by: Eric Engestrom <[email protected]>
* meson: add dep_thread to every lib that includes threads.h | Eric Engestrom | 2017-12-07 | 4 | -3/+4
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104141
| Signed-off-by: Eric Engestrom <[email protected]>
| Reviewed-by: Dylan Baker <[email protected]>
* meson: fix pl111 dependency on vc4 | Eric Engestrom | 2017-12-07 | 2 | -5/+6
| src/gallium/winsys/pl111/drm/libpl111winsys.a(pl111_drm_winsys.c.o): In function `pl111_drm_screen_create':
| pl111_drm_winsys.c:(.text+0x33): undefined reference to `vc4_drm_screen_create_renderonly'
| Signed-off-by: Eric Engestrom <[email protected]>
| Reviewed-by: Dylan Baker <[email protected]>
* r600/sb: do not convert if-blocks that contain indirect array access | Gert Wollny | 2017-12-07 | 3 | -2/+5
| If an array is accessed within an if block, then currently it is not known whether the value in the address register is involved in the evaluation of the if condition, and converting the if condition may actually result in out-of-bounds array access. Consequently, if blocks that contain indirect array access should not be converted.
| Fixes piglits on r600/BARTS:
| spec/glsl-1.10/execution/variable-indexing/
| vs-output-array-float-index-wr
| vs-output-array-vec3-index-wr
| vs-output-array-vec4-index-wr
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104143
| Signed-off-by: Gert Wollny <[email protected]>
| Cc: <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600: add support for compute grid/block sizes. (v2) | Dave Airlie | 2017-12-06 | 4 | -3/+100
| We just pass these in from outside in a constant buffer. The shader side stores them after they are first accessed.
| v2: fix to not use a temp_reg.
| Signed-off-by: Dave Airlie <[email protected]>
* r600: handle image/buffer sizes correctly. | Dave Airlie | 2017-12-06 | 3 | -4/+21
| This adds support to compute for the resq workarounds (buffer/cube sizes)
| Signed-off-by: Dave Airlie <[email protected]>
* r600/compute: add support for emitting compute image/buffer atoms | Dave Airlie | 2017-12-06 | 1 | -1/+9
| Signed-off-by: Dave Airlie <[email protected]>
* r600/compute: handle atomic counters in compute state. | Dave Airlie | 2017-12-06 | 1 | -0/+9
| Signed-off-by: Dave Airlie <[email protected]>
* r600/compute: add support for TGSI compute shaders. (v1.1) | Dave Airlie | 2017-12-06 | 2 | -28/+103
| This adds paths to handle TGSI compute shaders and shader selection.
| It also avoids emitting things that are not required on TGSI paths: CBs, vertex buffers, config reg init.
| v1.1: fix rat mask calc
| Signed-off-by: Dave Airlie <[email protected]>