path: root/src/gallium/drivers
Commit message | Author | Age | Files | Lines
* v3d: Don't forget to flush writes to UBOs. | Eric Anholt | 2018-12-07 | 2 | -5/+16
| | | | | If someone did TF into a UBO, we might have left the TF job un-flushed at the point of reading.
* v3d: Make an array for frag/vert texture state in the context. | Eric Anholt | 2018-12-07 | 7 | -42/+21
| | | | | This simplifies a bunch of our texture handling, while introducing the slots necessary for adding new shader stages.
* v3d: Put default vertex attribute values into the state uploader as well. | Eric Anholt | 2018-12-07 | 3 | -8/+12
| | | | | The default attributes are long-lived (the state struct is cached), and only 256 bytes each.
* v3d: Create a state uploader for packing our shaders together. | Eric Anholt | 2018-12-07 | 4 | -13/+35
| | | | | | Shaders are usually quite short, and are private to the context. We can save memory and reduce the work the kernel needs to do at exec time by packing them together in a stream uploader for long-lived state.
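The packing idea above can be sketched as a bump allocator over one shared buffer. This is a minimal illustration, not the v3d code: `struct packed_buffer`, `align_up`, `pack_blob`, the 4096-byte capacity, and the alignment values are all hypothetical, and a real uploader would presumably allocate a fresh buffer when the current one fills up instead of failing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one long-lived buffer shared by many shaders. */
struct packed_buffer {
    uint8_t data[4096];
    size_t used;
};

/* Round offset up to a power-of-two alignment. */
size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Copy one blob into the buffer and return its offset,
 * or (size_t)-1 when the blob does not fit. */
size_t pack_blob(struct packed_buffer *buf, const void *blob,
                 size_t size, size_t alignment)
{
    size_t offset = align_up(buf->used, alignment);

    if (offset > sizeof(buf->data) || size > sizeof(buf->data) - offset)
        return (size_t)-1;

    memcpy(buf->data + offset, blob, size);
    buf->used = offset + size;
    return offset;
}
```

Several small blobs end up sharing one allocation, so only one buffer has to be tracked for all of them.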
* v3d: Update simulator cache flushing code to match the kernel better. | Eric Anholt | 2018-12-07 | 1 | -13/+19
| | | | | We were missing the invalidate between bin and render (possibly relevant for SSBOs), and still trying to flush the nonexistent L2C on 3.3+.
* v3d: Use the TFU to do generatemipmap. | Eric Anholt | 2018-12-07 | 7 | -1/+175
| | | | | This is a separate, dedicated hardware unit for texture layout conversions and mipmap generation.
* v3d: Add the V3D TFU submit interface to the simulator. | Eric Anholt | 2018-12-07 | 3 | -20/+90
| | | | | | | | | The TFU lets us format raster and SAND images into formats that can be read by the texture engine, and do mipmap generation. The UAPI comes from drm-next e69aa5f9b97f ("Merge tag 'drm-misc-next-2018-12-06' of git://anongit.freedesktop.org/drm/drm-misc into drm-next")
* v3d: Use combined input/output segments. | Eric Anholt | 2018-12-07 | 1 | -4/+7
| | | | | | | The HW apparently has some issues (or at least a much more complicated VCM calculation) with non-combined segments, and the closed source driver also uses combined I/O. Until I get the last CTS failure resolved (which does look plausibly like some VPM stomping), let's use combined I/O too.
* v3d: Add missing OES_half_float_linear support. | Eric Anholt | 2018-12-07 | 1 | -0/+1
| | | | | | | | We were exposing ARB_texture_float, but apparently not the OES subset flag. Fixes regression from GLES3 support to GLES2. Fixes: fcf9fcee3c8a ("mesa/main: do not require float-texture filtering for es3")
* v3d: Add support for RGBA_SRGB along with BGRA_SRGB. | Eric Anholt | 2018-12-07 | 1 | -0/+2
| | | | | This is the actual native format for the hardware, without swizzling. Noticed while debugging why GLES3 disappeared.
* freedreno/ir3: track max flow control depth for a5xx/a6xx | Rob Clark | 2018-12-07 | 2 | -4/+4
| | | | | | Rather than just hard-coding BRANCHSTACK size. Signed-off-by: Rob Clark <[email protected]>
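Tracking a maximum depth instead of hard-coding a size can be sketched as below. The event encoding (`+1` on entering a nested branch or loop, `-1` on leaving it) and the name `max_nesting_depth` are hypothetical and are not taken from ir3.

```c
#include <assert.h>
#include <stddef.h>

/* Walk a hypothetical stream of nesting events and return the deepest
 * level reached: +1 enters a branch or loop, -1 leaves it. */
int max_nesting_depth(const int *events, size_t count)
{
    int depth = 0;
    int max_depth = 0;

    for (size_t i = 0; i < count; i++) {
        depth += events[i];
        assert(depth >= 0); /* more exits than entries would be a bug */
        if (depth > max_depth)
            max_depth = depth;
    }
    return max_depth;
}
```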
* freedreno/ir3: sync instr/disasm | Rob Clark | 2018-12-07 | 1 | -1/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: blitter fixes | Rob Clark | 2018-12-07 | 2 | -3/+80
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headers | Rob Clark | 2018-12-07 | 7 | -35/+56
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx+a6xx: remove unused fs/vs pvt mem | Rob Clark | 2018-12-07 | 4 | -20/+0
| | | | | | copy/pasta from older gens Signed-off-by: Rob Clark <[email protected]>
* freedreno: remove unused fd_surface fields | Rob Clark | 2018-12-07 | 1 | -5/+0
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: Add support for EXT_multisampled_render_to_texture | Kristian H. Kristensen | 2018-12-06 | 3 | -1/+7
| | | | | | | | | There is not much to do in freedreno - tile layout and multisample state for gmem renderings is programmed based on the pfb sample count, while resolve blits take the destination sample count from the resource. Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: MSAA | Rob Clark | 2018-12-06 | 10 | -24/+74
| | | | | Reviewed-by: Kristian H. Kristensen <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* amd: remove support for LLVM 6.0 | Samuel Pitoiset | 2018-12-06 | 7 | -195/+38
| | | | | Users are encouraged to switch to LLVM 7.0, released in September 2018. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: Android build fixes | Kristian H. Kristensen | 2018-12-05 | 1 | -0/+1
| | | | | | | A couple of simple fixes for building on Android with autotools. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* nir: Make boolean conversions sized just like the others | Jason Ekstrand | 2018-12-05 | 1 | -4/+4
| | | | | | | Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one of 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <[email protected]>
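The sized conversions can be illustrated with plain C helpers. This is a sketch under one stated assumption: a boolean is a 32-bit value with all bits set for true and zero for false. The function names mirror the opcode names from the commit message but are hypothetical C helpers, not NIR API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit boolean, all bits set for true, zero for false. */
#define BOOL32_TRUE  UINT32_C(0xffffffff)
#define BOOL32_FALSE UINT32_C(0)

/* Integer to 32-bit boolean: any non-zero value becomes true. */
uint32_t i2b32(uint32_t value)
{
    return value != 0 ? BOOL32_TRUE : BOOL32_FALSE;
}

/* Boolean to integer, one helper per destination bit size. */
uint8_t  b2i8(uint32_t b)  { return b ? 1 : 0; }
uint16_t b2i16(uint32_t b) { return b ? 1 : 0; }
uint32_t b2i32(uint32_t b) { return b ? 1 : 0; }
uint64_t b2i64(uint32_t b) { return b ? 1 : 0; }
```

Every conversion now states its bit size in its name, so the boolean conversions follow the same pattern as the other sized conversions.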
* nouveau: set texture upload budget | Ilia Mirkin | 2018-12-03 | 3 | -3/+6
| | | | | | | It doesn't seem like the exact number has too much effect on the performance in "teximage". However setting it to just about anything prevents some OOMs from getting hit. These values are not well-tuned, but don't seem too bad. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: add explicit handling of PIPE_CAP_MAX_VERTEX_ELEMENT_SRC_OFFSET | Ilia Mirkin | 2018-12-03 | 2 | -0/+4
| | | | | | | Since the max attrib stride is 2048, the max src offset makes sense as 2047. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50: always keep TSC slot 0 bound | Ilia Mirkin | 2018-12-03 | 3 | -0/+31
| | | | | | | | | | | | | | | | | All TXF operations implicitly use sampler 0, and fail if it's not bound to anything. This does not happen in LINKED_TSC mode, but we don't currently use this. We ensure that TSC entry at id 0 has the SRGB conversion bit enabled (and all samplers we normally generate will too). Then when the TSC at *slot* 0 (not to be confused with entry 0 in the global TSC table) is unbound, we bind it to entry 0. This way, TXF operations are not dependent on there being a regular sampler bound there. Fixes arb_texture_buffer_object-subdata-sync among others. (TBO's are particularly susceptible to this as they don't bind a sampler.) Signed-off-by: Ilia Mirkin <[email protected]>
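The fallback described above (an unbound slot 0 is pointed at entry 0 instead of being left empty) can be sketched as follows. All names here (`struct tsc_entry`, `struct sampler_slots`, `default_entry`, `bind_slot`, `NUM_SLOTS`) are hypothetical and only model the behavior in the commit message, not the nv50 driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SLOTS 16 /* hypothetical slot count */

struct tsc_entry {
    unsigned id;
    bool srgb_conversion;
};

struct sampler_slots {
    const struct tsc_entry *slot[NUM_SLOTS];
};

/* Stand-in for entry 0 of the global table, with SRGB conversion enabled. */
const struct tsc_entry default_entry = { .id = 0, .srgb_conversion = true };

/* Bind an entry to a slot. Slot 0 is never left empty: unbinding it
 * falls back to the default entry so implicit users still find a sampler. */
void bind_slot(struct sampler_slots *s, unsigned index,
               const struct tsc_entry *entry)
{
    assert(index < NUM_SLOTS);
    if (index == 0 && entry == NULL)
        entry = &default_entry;
    s->slot[index] = entry;
}
```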
* virgl: fix const warning on debug flags. | Dave Airlie | 2018-12-04 | 2 | -3/+3
| | | | Fixes: 8d4bb6e5c (virgl: Add command and flags to initiate debugging on the host (v2))
* nv50,nvc0: Fix gallium nine regression regarding sampler bindings | Karol Herbst | 2018-12-02 | 2 | -16/+12
| | | | | | | | | | | | | | | | The new approach is that samplers don't get unbound even if they won't be used in a draw and we should just leave them be as well. Fixes a regression in multiple windows games using gallium nine and nouveau. v2: adjust num_samplers to keep track of the highest sampler bound v3: rework how to set the new value of num_samplers Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106577 Fixes: 4d6fab245eec3880e2a59424a579851f44857ce8 "cso: don't track the number of sampler states bound" Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
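The v2 note about keeping `num_samplers` at the highest bound sampler can be sketched like this. `struct sampler_set`, `bind_samplers`, and `MAX_SAMPLERS` are hypothetical names for illustration, and the full rescan is a simplification rather than what the driver necessarily does.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAMPLERS 8 /* hypothetical limit */

struct sampler_set {
    void *sampler[MAX_SAMPLERS];
    unsigned num_samplers; /* index of highest bound sampler, plus one */
};

/* Bind `count` states starting at `start`; a NULL `states` array unbinds
 * that range. Slots outside the range are left untouched. */
void bind_samplers(struct sampler_set *set, unsigned start, unsigned count,
                   void *const *states)
{
    unsigned highest = 0;

    assert(start + count <= MAX_SAMPLERS);
    for (unsigned i = 0; i < count; i++)
        set->sampler[start + i] = states ? states[i] : NULL;

    /* Recompute the count from what is actually bound. */
    for (unsigned i = 0; i < MAX_SAMPLERS; i++) {
        if (set->sampler[i])
            highest = i + 1;
    }
    set->num_samplers = highest;
}
```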
* virgl: don't mark buffers as unclean after a write | Gurchetan Singh | 2018-11-30 | 2 | -1/+10
| | | | | | | | | | | | | | | | | We can mark the buffer unclean if it's ever bound as a TBO, SSBO, ABO, or image. This improves dEQP-GLES3.performance.buffer.data_upload.function_call.map_buffer_range.new_specified_buffer.flag_write_full.stream_draw from 9.58 MB/s to 451.17 MB/s. v2: Track buffer cleanliness as a function of bindings (Ilia). v3: virgl_modify_clean --> virgl_dirty_res (Erik) Tested-By: Gert Wollny <[email protected]> Reviewed-by: Erik Faye-Lund <[email protected]>
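Tracking cleanliness as a function of bindings, as in the v2 note, can be sketched as below. The enum values, `struct buffer`, `note_cpu_write`, and `note_binding` are hypothetical; only the list of binding kinds (TBO, SSBO, ABO, image) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical binding kinds. */
enum binding {
    BINDING_VERTEX_BUFFER,
    BINDING_TBO,
    BINDING_SSBO,
    BINDING_ABO,
    BINDING_IMAGE
};

struct buffer {
    bool clean;
};

/* A plain CPU write no longer changes the clean flag. */
void note_cpu_write(struct buffer *buf)
{
    (void)buf;
}

/* Only the binding kinds named in the commit message mark the buffer unclean. */
void note_binding(struct buffer *buf, enum binding kind)
{
    switch (kind) {
    case BINDING_TBO:
    case BINDING_SSBO:
    case BINDING_ABO:
    case BINDING_IMAGE:
        buf->clean = false;
        break;
    default:
        break;
    }
}
```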
* virgl: avoid large inline transfers | Gurchetan Singh | 2018-11-30 | 1 | -1/+5
| | | | | | | | | | | | | | | | | | | | We flush every time the command buffer (16 kB) is full, which is quite costly. This improves dEQP-GLES3.performance.buffer.data_upload.function_call.buffer_data.new_buffer.usage_stream_draw from 111.16 MB/s to 1930.36 MB/s. In addition, I made the benchmark produce buffers from 0 --> VIRGL_MAX_CMDBUF_DWORDS * 4, and tried ((VIRGL_MAX_CMDBUF_DWORDS * 4) / 2), ((VIRGL_MAX_CMDBUF_DWORDS * 4) / 4), etc. I didn't notice any clear differences, so let's just go with the most obvious heuristic. Tested-By: Gert Wollny <[email protected]> Reviewed-by: Erik Faye-Lund <[email protected]>
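One reading of "the most obvious heuristic" is a size threshold equal to the command buffer size in bytes. The sketch below assumes exactly that; the commit message does not spell the check out. `CMDBUF_DWORDS` is a placeholder chosen so that the byte size matches the 16 kB mentioned above, and `should_inline_write` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder value: 4096 dwords * 4 bytes = 16 kB. */
#define CMDBUF_DWORDS 4096u
#define CMDBUF_BYTES  (CMDBUF_DWORDS * 4u)

/* Assumed heuristic: writes that would fill the command buffer on their
 * own are not inlined, since inlining them forces a flush. */
bool should_inline_write(size_t size)
{
    return size < CMDBUF_BYTES;
}
```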
* virgl: quadruple command buffer size | Gurchetan Singh | 2018-11-30 | 1 | -1/+1
| | | | | | | | | | | | Tested running WebGL aquarium on Nvidia host (10,000 fishes) This moves us from 7 fps to 9 fps. After quadrupling, performance gains diminish. v2: Remove change ID (Erik) Tested-By: Gert Wollny <[email protected]> Reviewed-by: Erik Faye-Lund <[email protected]>
* radeonsi: add memory management stress tests for GDS | Marek Olšák | 2018-11-28 | 2 | -0/+48
| | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* winsys/amdgpu: add support for allocating GDS and OA resources | Marek Olšák | 2018-11-28 | 1 | -1/+3
| | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: allow si_cp_dma_clear_buffer to clear GDS from any IB | Marek Olšák | 2018-11-28 | 4 | -31/+33
| | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* winsys/amdgpu,radeon: pass vm_alignment to buffer_from_handle | Marek Olšák | 2018-11-28 | 4 | -3/+10
| | | | Acked-by: Christian König <[email protected]>
* radeonsi: fix is_oneway_access_only for bindless images | Marek Olšák | 2018-11-28 | 1 | -6/+23
* radeonsi/nir: parse more information about bindless usage | Marek Olšák | 2018-11-28 | 1 | -4/+32
| | | | fill more tgsi_shader_info fields.
* radeonsi: small cleanup for memory opcodes | Marek Olšák | 2018-11-28 | 1 | -9/+4
* radeonsi: fix is_oneway_access_only for image stores | Marek Olšák | 2018-11-28 | 1 | -12/+37
| | | | We need to look at the Dst for image stores.
* radeonsi: use structured buffer intrinsics for image views | Marek Olšák | 2018-11-28 | 2 | -10/+42
| | | | to stop using the workaround in si_make_buffer_descriptor.
* radeonsi: clean up primitive binning enablement | Marek Olšák | 2018-11-28 | 1 | -11/+16
| | | | | | no change in behavior. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* virgl: fix undefined shift to use unsigned. | Dave Airlie | 2018-11-29 | 1 | -1/+1
| | | | | | Ported from virglrenderer. Signed-off-by: Dave Airlie <[email protected]>
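The class of bug named in the subject line can be shown in isolation. With a 32-bit `int`, `1 << 31` shifts a signed value into the sign bit, which is undefined behavior in C, while the same shift on an unsigned operand is well defined. The helper name `bit` is hypothetical and is not taken from the virgl source.

```c
#include <assert.h>
#include <stdint.h>

/* Build a single-bit mask with an unsigned operand, so that shifting
 * into bit 31 is well defined. */
uint32_t bit(unsigned n)
{
    assert(n < 32);
    return UINT32_C(1) << n;
}
```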
* r600: make suballocator 256-bytes align | Dave Airlie | 2018-11-29 | 1 | -1/+1
| | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108311 Cc: <[email protected]>
* winsys/amdgpu: explicitly declare whether buffer_map is permanent or not | Nicolai Hähnle | 2018-11-28 | 14 | -28/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a new driver-private transfer flag RADEON_TRANSFER_TEMPORARY that specifies whether the caller will use buffer_unmap or not. The default behavior is set to permanent maps, because that's what drivers do for Gallium buffer maps. This should eliminate the need for hacks in libdrm. Assertions are added to catch when the buffer_unmap calls don't match the (temporary) buffer_map calls. I did my best to update r600 for consistency (r300 needs no changes because it never calls buffer_unmap), even though the radeon winsys ignores the new flag. As an added bonus, this should actually improve the performance of the normal fast path, because we no longer call into libdrm at all after the first map, and there's one less atomic in the winsys itself (there are now no atomics left in the UNSYNCHRONIZED fast path). Cc: Leo Liu <[email protected]> v2: - remove comment about visible VRAM (Marek) - don't rely on amdgpu_bo_cpu_map doing an atomic write Reviewed-by: Marek Olšák <[email protected]>
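The split between permanent and temporary maps can be modeled roughly as below. This is a sketch under assumptions, not the winsys implementation: `TRANSFER_TEMPORARY` stands in for `RADEON_TRANSFER_TEMPORARY`, and `struct bo`, `bo_map`, and `bo_unmap` are hypothetical. The assertion mirrors the idea of catching unmap calls that have no matching temporary map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TRANSFER_TEMPORARY 0x1u /* stand-in for RADEON_TRANSFER_TEMPORARY */

struct bo {
    unsigned char storage[64]; /* pretend backing memory */
    void *cpu_ptr;             /* non-NULL while mapped */
    unsigned temporary_maps;   /* outstanding temporary maps */
    bool permanently_mapped;
};

/* Map the buffer. Without the temporary flag the mapping is permanent
 * and the caller is not expected to call bo_unmap(). */
void *bo_map(struct bo *bo, unsigned flags)
{
    if (!bo->cpu_ptr)
        bo->cpu_ptr = bo->storage; /* only the first map does real work */

    if (flags & TRANSFER_TEMPORARY)
        bo->temporary_maps++;
    else
        bo->permanently_mapped = true;

    return bo->cpu_ptr;
}

/* Release one temporary map; must pair with a temporary bo_map(). */
void bo_unmap(struct bo *bo)
{
    assert(bo->temporary_maps > 0);
    bo->temporary_maps--;

    if (bo->temporary_maps == 0 && !bo->permanently_mapped)
        bo->cpu_ptr = NULL;
}
```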
* virgl: Don't try handling server fences when they are not supported | Gert Wollny | 2018-11-28 | 1 | -2/+4
| | | | | | | | | | | | | | | | | | | | | | | | vtest doesn't implement the according API and would segfault: Program received signal SIGSEGV, Segmentation fault. #0 0x0000000000000000 in ?? () #1 in virgl_fence_server_sync at src/gallium/drivers/virgl/virgl_context.c:1049 #2 in st_server_wait_sync at src/mesa/state_tracker/st_cb_syncobj.c:155 so just don't do the call when the function pointers are not set. Fixes dEQP: dEQP-GLES3.functional.fence_sync.wait_sync_smalldraw dEQP-GLES3.functional.fence_sync.wait_sync_largedraw Fixes: d1a1c21e7621b5177febf191fcd3d3b8ef69dc96 virgl: native fence fd support Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Robert Foss <[email protected]>
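The guard described above (skip the call when the function pointer is not set) looks roughly like this in isolation. `struct winsys`, its `sync_calls` counter, and both functions are hypothetical and only illustrate the pattern; they are not the virgl winsys interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct winsys {
    /* Optional hook: NULL when the backend has no server fence support. */
    void (*fence_server_sync)(struct winsys *ws, int fence_id);
    int sync_calls; /* bookkeeping for the example only */
};

/* Example backend implementation that just counts calls. */
void counting_fence_server_sync(struct winsys *ws, int fence_id)
{
    (void)fence_id;
    ws->sync_calls++;
}

/* Call the hook only when the backend provides it; calling through a
 * NULL pointer is what produced the segfault in the backtrace above. */
bool fence_server_sync(struct winsys *ws, int fence_id)
{
    if (!ws->fence_server_sync)
        return false;

    ws->fence_server_sync(ws, fence_id);
    return true;
}
```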
* v3d: Add renderonly support. | Eric Anholt | 2018-11-27 | 4 | -4/+68
| | | | | | I've been using this with the kmsro series to test v3d on VKMS without my old KMS hack in the v3d kernel driver. KMSRO still needs some cleanup, but v3d RO support seems reasonable.
* freedreno: implements get_sample_position | Hyunjun Ko | 2018-11-27 | 1 | -0/+45
| | | | | | | | Since 1285f71d3e landed, the driver needs to provide apps with proper sample positions for MSAA. There is currently no way to query these from the hw, so they are taken from the blob driver. Fixes: dEQP-GLES31.functional.texture.multisample.samples_#.sample_position Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: also set FSSUPERTHREADENABLE | Rob Clark | 2018-11-27 | 1 | -0/+1
| | | | | | | | | | We set equiv bit in SP_FS_CTRL_REG0. Somehow the hw doesn't hang with this mismatched config, but does run slower. It is faster with either neither bit set, or both bits set, but both is the fastest of the three configurations. Worth a bit over 10% gain in glmark2. Spotted-by: Jonathan Marek <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* freedreno: use MSM_BO_SCANOUT with scanout buffers | Jonathan Marek | 2018-11-27 | 1 | -1/+3
| | | | Signed-off-by: Jonathan Marek <[email protected]>
* freedreno: use GENERIC instead of TEXCOORD for blit program | Jonathan Marek | 2018-11-27 | 1 | -1/+1
| | | | | blit_fp uses GENERIC as input, so blit_vp should match for linking Signed-off-by: Jonathan Marek <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* freedreno: a2xx texture update | Jonathan Marek | 2018-11-27 | 9 | -20/+212
| | | | | | | | | | | Adds all missing texture related logic. For everything to work it also needs changes to ir2/fd2_program, which are part of the ir2 update patch. Note: it needs rnndb update Signed-off-by: Jonathan Marek <[email protected]> [remove stray patch] Signed-off-by: Rob Clark <[email protected]>
* freedreno/a2xx: Compute depth base in gmem correctly | Jonathan Marek | 2018-11-27 | 1 | -5/+7
| | | | | | | | Note: it needs rnndb update Signed-off-by: Marek Vasut <[email protected]> Signed-off-by: Jonathan Marek <[email protected]> Signed-off-by: Rob Clark <[email protected]>