path: root/src/gallium
Commit message | Author | Age | Files | Lines
* radeonsi: flush CB after MSAA only when transitioning from CB to textures | Marek Olšák | 2017-06-22 | 2 | -14/+60
  The main flush before texturing is done after the FMASK decompress pass. CB after MSAA rendering is not flushed in set_framebuffer_state and also not in memory_barrier if the current color buffer is MSAA. We fully rely on the FMASK decompress pass for the flushing.
  Some CB decompress and resolve passes need an explicit flush before and after.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: unify CB_RESOLVE blitter invocation code | Marek Olšák | 2017-06-22 | 1 | -17/+18
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: flush DB caches only when transitioning from DB to texturing | Marek Olšák | 2017-06-22 | 5 | -25/+56
  Use the mechanism of si_decompress_textures, but instead of doing the actual decompression, just flag the DB cache flush there. This removes a lot of unnecessary DB cache flushes.
  Reviewed-by: Nicolai Hähnle <[email protected]>
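The "flag the flush only on a DB-to-texturing transition" idea can be sketched as a per-texture dirty bit. This is a minimal illustration, not the radeonsi code: `struct depth_texture`, `note_depth_write`, and `prepare_for_sampling` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-texture state for the sketch. */
struct depth_texture {
   bool db_dirty;  /* written through DB since the last flagged flush */
};

/* Called when the texture is rendered to as a depth buffer. */
static void
note_depth_write(struct depth_texture *tex)
{
   assert(tex);
   tex->db_dirty = true;
}

/* Called before the texture is sampled. Returns true only when a DB cache
 * flush has to be flagged, i.e. on the first use after a depth write. */
static bool
prepare_for_sampling(struct depth_texture *tex)
{
   assert(tex);
   if (!tex->db_dirty)
      return false;
   tex->db_dirty = false;
   return true;
}
```

Repeated sampling without an intervening depth write flags no further flushes in this model, which is the saving the commit message describes.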
* radeonsi: add separate HUD counters for CB and DB cache flushes | Marek Olšák | 2017-06-22 | 4 | -10/+20
  Reviewed-by: Nicolai Hähnle <[email protected]>
* cso: inline a few frequently-used functions | Marek Olšák | 2017-06-22 | 2 | -31/+26
  Reviewed-by: Nicolai Hähnle <[email protected]>
* cso: don't return errors from sampler functions | Marek Olšák | 2017-06-22 | 2 | -18/+8
  No code checks the errors.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* cso: don't track the number of sampler states bound | Marek Olšák | 2017-06-22 | 1 | -36/+23
  This removes 2 loops from hot codepaths and adds 1 loop to a rare codepath (restore_sampler_states), and makes sanitize_hash() slightly worse.
  Sampler states, when bound, are not unbound for draw calls that don't need them. That's OK, because bound sampler states don't add any overhead.
  This results in lower CPU overhead in most cases.
  Reviewed-by: Nicolai Hähnle <[email protected]>
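The trade-off described above (no count maintained on the hot path, one scan on the rare path) might look roughly like this. It is a sketch only: `struct sampler_slots`, `MAX_SAMPLERS`, and both function names are hypothetical and do not come from cso_context.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAMPLERS 32  /* hypothetical limit, for illustration only */

struct sampler_slots {
   const void *states[MAX_SAMPLERS];  /* NULL means "nothing bound" */
};

/* Hot path: store the pointer; there is no running count to update. */
static void
bind_sampler(struct sampler_slots *slots, unsigned idx, const void *state)
{
   assert(idx < MAX_SAMPLERS);
   slots->states[idx] = state;
}

/* Rare path: derive the count by scanning for the highest bound slot. */
static unsigned
count_bound_samplers(const struct sampler_slots *slots)
{
   unsigned count = 0;

   for (unsigned i = 0; i < MAX_SAMPLERS; i++) {
      if (slots->states[i])
         count = i + 1;
   }
   return count;
}
```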
* egl: turn one more boolean `int` into a `bool` | Eric Engestrom | 2017-06-21 | 2 | -4/+4
  Same as the previous commit, but this one was split out because it's a bit more complicated: this field is given as a pointer to a function, so the function had to be changed as well, and the function was used in a bunch of places, which needed updating as well.
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* etnaviv: fix blend color for RB swapped rendertargets | Lucas Stach | 2017-06-21 | 4 | -14/+45
  Same as with the colormasks, the blend color needs to be swizzled according to the rendertarget format.
  Signed-off-by: Lucas Stach <[email protected]>
  Reviewed-by: Wladimir J. van der Laan <[email protected]>
  Reviewed-by: Christian Gmeiner <[email protected]>
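Swizzling a blend color for a render target with red and blue swapped amounts to exchanging two channels. The helper below is a hypothetical sketch of that idea, not the etnaviv implementation; the function name and the `rb_swap` flag are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: reorder an RGBA blend color for a render target
 * whose red and blue channels are swapped. */
static void
swizzle_blend_color(const float in[4], bool rb_swap, float out[4])
{
   assert(in != out);  /* keeps the sketch simple: no in-place swizzle */

   out[0] = rb_swap ? in[2] : in[0];
   out[1] = in[1];
   out[2] = rb_swap ? in[0] : in[2];
   out[3] = in[3];
}
```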
* st/xvmc: deal with drivers wanting different texture formats | Ilia Mirkin | 2017-06-20 | 1 | -36/+115
  Previously, texture formats were being used unconditionally without checking. However nv30 supports neither RGBX8 nor R4A4/A4R4 formats. Add sufficient fallbacks so that the nv30 driver can have working OSD.
  Tested on a NV44A/PCI.
  Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: fix transfer of larger rectangles with DmaCopy on gk104 and up | Ben Skeggs | 2017-06-20 | 1 | -9/+32
  By treating the rectangles as 1cpp, we can run up against some internal copy engine limits and trigger a MEM2MEM_RECT_OUT_OF_BOUNDS error check at launch time.
  This commit enables the REMAP hardware, which allows us to specify both the component size and number of components for a transfer. We're then able to pass in the real width/nblocksx values and not hit the limits.
  There's a couple of "supported" CPPs in the list that we can't actually hit, but are there simply because they're possible.
  Signed-off-by: Ben Skeggs <[email protected]>
  Acked-by: Ilia Mirkin <[email protected]>
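The arithmetic behind the problem can be illustrated as follows. This sketch only compares the row width the copy engine is asked to handle in each mode; the function name and the `use_remap` flag are hypothetical, and the actual hardware limit is not stated in the commit message, so it is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Row width, in copy-engine elements, for a transfer of nblocksx blocks of
 * cpp bytes each. Treated as 1cpp, every byte is one element; with the
 * real element size programmed, one block is one element. */
static unsigned
copy_row_elements(unsigned nblocksx, unsigned cpp, bool use_remap)
{
   assert(cpp > 0);
   return use_remap ? nblocksx : nblocksx * cpp;
}
```

For a 1024-block row at 16 bytes per block, the 1cpp view is 16 times wider than the real width, which is how large rectangles can run into a width limit.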
* nvc0: copy engine surface params are only relevant for tiled surfaces | Ben Skeggs | 2017-06-20 | 1 | -18/+19
  Aside from reducing pushbuf usage in some situations, this commit should have no other effect, and is just to make it somewhat obvious that those methods have zero effect on linear surfaces.
  Signed-off-by: Ben Skeggs <[email protected]>
  Acked-by: Ilia Mirkin <[email protected]>
* swr: Include definition of missing function | George Kyriazis | 2017-06-20 | 1 | -0/+1
  Inline function SWR_MULTISAMPLE_POS::PrecalcSampleData() was missing its definition. Include the definition in core/state_funcs.h.
  Fixes the Windows build.
  Reviewed-by: Tim Rowley <[email protected]>
* vc4: Clean up release build warnings using MAYBE_UNUSED. | Eric Anholt | 2017-06-20 | 2 | -6/+5
  These variables are all used in an assert(), so release builds see no usages.
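The pattern being cleaned up can be reproduced in a few lines. The sketch below uses a local stand-in macro, `SKETCH_MAYBE_UNUSED`; its GCC/Clang attribute spelling is an assumption about how such a macro is commonly defined, and `sum_checked` is a hypothetical function.

```c
#include <assert.h>

/* Local stand-in for the macro named in the commit (assumed definition). */
#if defined(__GNUC__)
#define SKETCH_MAYBE_UNUSED __attribute__((unused))
#else
#define SKETCH_MAYBE_UNUSED
#endif

static int
sum_checked(const int *values, int count)
{
   /* Read only inside assert(), so it has no uses when NDEBUG is defined
    * and would otherwise trigger an unused-variable warning. */
   SKETCH_MAYBE_UNUSED int original_count = count;
   int sum = 0;

   while (count > 0)
      sum += values[--count];

   assert(count == 0 && original_count >= 0);
   return sum;
}
```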
* vc4: Allow VBOs to be mapped during execution. | Eric Anholt | 2017-06-20 | 1 | -1/+1
  There's no reason we can't -- the mappings we expose are basically equivalent to persistent/coherent, already.
  Improves mesa-demos drawoverhead (no state change) performance by 5.21362% +/- 1.25078% (n=11).
* gallium/vbuf: avoid segfault when we get invalid glDrawRangeElements() | Brian Paul | 2017-06-20 | 1 | -1/+15
  A common user error is to call glDrawRangeElements() with the 'end' argument being one too large. If we use the vbuf module to translate some vertex attributes this error can cause us to read past the end of the mapped hardware buffer, resulting in a crash.
  This patch adjusts the vertex count to avoid that issue. Typically, the vertex_count gets decremented by one.
  This fixes crashes with the Unigine Tropics and Sanctuary demos with older VMware hardware versions. The issue isn't hit with VGPU10 because we don't hit this fallback.
  No piglit changes.
  CC: [email protected]
  Reviewed-by: Marek Olšák <[email protected]>
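One way to picture the adjustment is to compute how many vertices actually fit in the buffer and clamp the requested count to that. This is a sketch under assumptions, not the vbuf code: `clamp_vertex_count` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp vertex_count so that reading attrib_size bytes
 * per vertex, stride bytes apart, starting at offset, never goes past
 * buffer_size. */
static unsigned
clamp_vertex_count(unsigned vertex_count, uint64_t buffer_size,
                   uint64_t offset, unsigned stride, unsigned attrib_size)
{
   assert(stride > 0 && attrib_size > 0);

   if (offset + attrib_size > buffer_size)
      return 0;  /* not even one vertex fits */

   /* The last vertex needs only attrib_size bytes, not a full stride. */
   uint64_t max_count = (buffer_size - offset - attrib_size) / stride + 1;

   return vertex_count < max_count ? vertex_count : (unsigned)max_count;
}
```

With a 160-byte buffer and a 16-byte stride, a draw that asks for 11 vertices is reduced to 10, matching the "decremented by one" case in the message.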
* gallium/vbuf: add some const qualifiers | Brian Paul | 2017-06-20 | 1 | -12/+13
  Helps understandability a bit.
  Reviewed-by: Marek Olšák <[email protected]>
* translate: whitespace fixes in translate_generic.c | Brian Paul | 2017-06-20 | 1 | -199/+206
* softpipe: remove unused softpipe_context::line_stipple_counter | Brian Paul | 2017-06-20 | 1 | -2/+0
  Trivial.
* radeonsi: set correct usage flag according to image access type | Samuel Pitoiset | 2017-06-20 | 1 | -1/+3
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: fix a deadlock when waiting for submission_in_progress | Marek Olšák | 2017-06-20 | 2 | -16/+43
  First this happens:
  1) amdgpu_cs_flush (lock bo_fence_lock)
     -> amdgpu_add_fence_dependency
     -> os_wait_until_zero (wait for submission_in_progress) - WAITING
  2) amdgpu_bo_create
     -> pb_cache_reclaim_buffer (lock pb_cache::mutex)
     -> pb_cache_is_buffer_compat
     -> amdgpu_bo_wait (lock bo_fence_lock) - WAITING
  So both bo_fence_lock and pb_cache::mutex are held. amdgpu_bo_create can't continue. amdgpu_cs_flush is waiting for the CS ioctl to finish the job, but the CS ioctl is trying to release a buffer:
  3) amdgpu_cs_submit_ib (CS thread - job entrypoint)
     -> amdgpu_cs_context_cleanup
     -> pb_reference
     -> pb_destroy
     -> amdgpu_bo_destroy_or_cache
     -> pb_cache_add_buffer (lock pb_cache::mutex) - DEADLOCK
  The simple solution is not to wait for submission_in_progress, which we need in order to create the list of dependencies for the CS ioctl. Instead of building the list of dependencies as a direct input to the CS ioctl, build the list of dependencies as a list of fences, and make the final list of dependencies in the CS thread itself.
  Therefore, amdgpu_cs_flush doesn't have to wait and can continue. Then, amdgpu_bo_create can continue and return. And then amdgpu_cs_submit_ib can continue.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101294
  Cc: 17.1 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
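The shape of the fix, recording fences at flush time and turning them into the final dependency list later in the submission thread, can be modeled without any threading. Everything below is a simplified, single-threaded sketch: the structs, field names, `MAX_DEPS`, and both functions are hypothetical and are not the amdgpu winsys code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 16  /* hypothetical fixed capacity for the sketch */

struct fence {
   bool submission_in_progress;  /* true until a sequence number exists */
   unsigned seq_no;
};

struct cs {
   struct fence *fence_deps[MAX_DEPS];  /* recorded at flush time */
   unsigned num_fence_deps;
   unsigned final_deps[MAX_DEPS];       /* built later, in the CS thread */
   unsigned num_final_deps;
};

/* Flush path: remember the fence and return immediately. Nothing here
 * waits on submission_in_progress, so no lock is held across a wait. */
static void
cs_add_fence_dependency(struct cs *cs, struct fence *f)
{
   assert(cs->num_fence_deps < MAX_DEPS);
   cs->fence_deps[cs->num_fence_deps++] = f;
}

/* Submission-thread path: convert recorded fences into the final list.
 * In this sketch it just reports failure if a fence is not ready yet. */
static bool
cs_build_final_deps(struct cs *cs)
{
   cs->num_final_deps = 0;
   for (unsigned i = 0; i < cs->num_fence_deps; i++) {
      const struct fence *f = cs->fence_deps[i];

      if (f->submission_in_progress)
         return false;
      cs->final_deps[cs->num_final_deps++] = f->seq_no;
   }
   return true;
}
```

The point of the split is that the only function that depends on submission state runs where no winsys lock from the flush path is held.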
* radeonsi: update all resident texture descriptors when needed | Samuel Pitoiset | 2017-06-20 | 1 | -57/+104
  To avoid useless DCC fetches when DCC is disabled, descriptors have to be updated in order to reflect this change. This is quite similar to how we update descriptors of bound textures.
  As a side effect, this should also prevent VM faults when bindless textures are invalidated, because the VA in the descriptor has to be updated accordingly as well.
  I don't see any performance improvements with DOW3.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: keep track of the sampler state for texture handles | Samuel Pitoiset | 2017-06-20 | 2 | -0/+2
  Needed for updating all resident texture descriptors when dirty_tex_counter changes.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix dumping shader descriptors into ddebug logs | Marek Olšák | 2017-06-19 | 1 | -35/+41
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add a workaround for inexact SNORM8 blitting again | Marek Olšák | 2017-06-19 | 1 | -0/+37
  GFX9 is affected. We only have tests for GL_x_SNORM where x is R8, RG8, RGB8, and RGBA8.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: fix TC-compatible stencil compression | Marek Olšák | 2017-06-19 | 1 | -0/+6
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: fix TXF_LZ with 1D textures | Marek Olšák | 2017-06-19 | 1 | -1/+2
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: disable sparse buffers | Marek Olšák | 2017-06-19 | 1 | -0/+3
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon/gfx9: fix PBO texture uploads to compressed textures | Nicolai Hähnle | 2017-06-19 | 1 | -1/+6
  st/mesa creates a surface that reinterprets the compressed blocks as RGBA16UI or RGBA32UI. We have to adjust width0 & height0 accordingly to avoid out-of-bounds memory accesses by CB.
  Cc: 17.1 <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
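When compressed blocks are reinterpreted as single texels, the surface extent has to be expressed in blocks rather than in the original texels. The helper below sketches that rounding; `blocks_for_extent` is a hypothetical name, and the exact adjustment the driver applies to width0 and height0 is not shown in the commit message.

```c
#include <assert.h>

/* Hypothetical helper: number of blocks needed to cover `texels` texels
 * along one axis when each block spans `block_dim` texels. */
static unsigned
blocks_for_extent(unsigned texels, unsigned block_dim)
{
   assert(block_dim > 0);
   return (texels + block_dim - 1) / block_dim;  /* round up partial blocks */
}
```

For a format with 4x4 blocks, a 10-texel-wide level is 3 blocks wide once each block is viewed as one texel.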
* r600: fix off-by-one in egd_tables.py | Nicolai Hähnle | 2017-06-19 | 1 | -1/+1
  Port of the corresponding fix in sid_tables.py.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: reduce overhead for resident textures which need color decompression | Samuel Pitoiset | 2017-06-18 | 4 | -34/+58
  This is done by introducing a separate list.
  si_decompress_textures() is now 5x faster.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: reduce overhead for resident textures which need depth decompression | Samuel Pitoiset | 2017-06-18 | 4 | -8/+29
  This is done by introducing a separate list.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: use util_dynarray_foreach for bindless resources | Samuel Pitoiset | 2017-06-18 | 2 | -129/+46
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add a new HUD query for the number of resident handles | Samuel Pitoiset | 2017-06-18 | 4 | -0/+12
  Useful for debugging performance issues when ARB_bindless_texture is enabled. This query doesn't make a distinction between texture and image handles.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* r600: include libelf headers only as needed | Emil Velikov | 2017-06-17 | 1 | -0/+2
  The headers are required only when building with OpenCL. When building without it, libelf may be missing, so the build errors out as below:
  src/gallium/drivers/r600/evergreen_compute.c:27:10: fatal error: 'gelf.h' file not found
  1 error generated.
  Fixes: d96a210842 ("r600g,compute: provide local copy of functions from ac_binary.c")
  Reviewed-by: Jan Vesely <[email protected]>
  Reported-by: Mauro Rossi <[email protected]>
  Tested-by: Mauro Rossi <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
* radeonsi: include ac_binary.h for struct ac_shader_binary | Emil Velikov | 2017-06-17 | 1 | -2/+2
  The header embeds the struct so it needs the header inclusion instead of the dummy forward declaration.
  Cc: Nicolai Hähnle <[email protected]>
  Cc: Marek Olšák <[email protected]>
  Cc: Tom Stellard <[email protected]>
  Fixes: 32206c5e560 ("radeonsi: Add radeon_shader_binary member to struct si_shader")
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Tested-by: Bas Nieuwenhuizen <[email protected]>
* r600, radeon: move radeon_shader_binary_{init,clean} back to radeon | Emil Velikov | 2017-06-17 | 3 | -23/+28
  Those are used by r600 and radeonsi, so moving them within the former was a bad idea.
  Fixes: d96a210842b ("r600g,compute: provide local copy of functions from ac_binary.c")
  Cc: Jan Vesely <[email protected]>
  Cc: Aaron Watry <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Tested-by: Bas Nieuwenhuizen <[email protected]>
* svga: add new num-failed-allocations HUD query | Brian Paul | 2017-06-16 | 5 | -2/+26
  This counter is incremented if we fail to allocate memory for vertex/index/const buffers, textures, etc.
  Reviewed-by: Neha Bhende <[email protected]>
* gallium/hud: support GALLIUM_HUD_DUMP_DIR feature on Windows | Brian Paul | 2017-06-16 | 1 | -6/+30
  Use a dummy implementation of the access() function. Use the \ path separator. Add a few comments.
  Reviewed-by: Neha Bhende <[email protected]>
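Choosing the path separator per platform might be sketched as below. This is an illustration only: `SKETCH_PATH_SEP` and `build_dump_path` are hypothetical names, and the sketch does not reproduce the dummy access() replacement mentioned in the commit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical separator macro: backslash on Windows, slash elsewhere. */
#ifdef _WIN32
#define SKETCH_PATH_SEP "\\"
#else
#define SKETCH_PATH_SEP "/"
#endif

/* Join a dump directory and a file name. Returns 1 on success and 0 if
 * the result did not fit into the output buffer. */
static int
build_dump_path(char *out, size_t out_size, const char *dir, const char *name)
{
   assert(out && dir && name);

   int n = snprintf(out, out_size, "%s" SKETCH_PATH_SEP "%s", dir, name);
   return n >= 0 && (size_t)n < out_size;
}
```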
* svga: add a few minor comments | Brian Paul | 2017-06-16 | 2 | -1/+6
  Trivial.
* swr/rast: Fix read-back of viewport array index | Tim Rowley | 2017-06-16 | 10 | -117/+182
  Binner/clipper read viewport array index from the vertex header as needed. Move viewport state to BACKEND_STATE.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Refactor includes to limit simdintrin.h usage | Tim Rowley | 2017-06-16 | 16 | -1079/+1147
  Reduces the files rebuilt after modifying simdintrin.h from 84 to 64.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Fix read-back of render target array index | Tim Rowley | 2017-06-16 | 5 | -13/+18
  The last FE stage can emit render target array index. Currently we only check to see if GS is emitting it. Moved the state to BACKEND_STATE and plumbed the driver to set it.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Adjust cast for gcc warning | Tim Rowley | 2017-06-16 | 1 | -1/+1
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Don't transition hottile resolved->dirty during store tiles | Tim Rowley | 2017-06-16 | 1 | -1/+4
  Fixes crash when dumping render targets and RT surface has been deleted.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: gen_llvm_types.py support for SIMD256/SIMD512 | Tim Rowley | 2017-06-16 | 1 | -6/+6
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Properly size GS stage scratch space | Tim Rowley | 2017-06-16 | 1 | -1/+1
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Fix early z / query interaction | Tim Rowley | 2017-06-16 | 1 | -0/+4
  For certain cases, we perform early z for optimization. The GL_SAMPLES_PASSED query was providing erroneous results because we were counting the number of samples passed before the fragment shader, which did not work if the fragment shader contained a discard.
  Account properly for discard and early z by ANDing the zpass mask with the post-fragment-shader active mask, after the fragment shader.
  Fixes the following piglit tests:
  - occlusion-query-discard
  - occlusion_query_meta_fragments
  Reviewed-by: Bruce Cherniak <[email protected]>
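The corrected count can be expressed as a population count over the intersection of the two masks. This is a minimal sketch with a hypothetical function name and 32-bit masks chosen for illustration; it is not the SWR backend code.

```c
#include <assert.h>
#include <stdint.h>

/* Count samples that passed the early depth test AND are still active
 * after the fragment shader ran (i.e. were not discarded). */
static unsigned
count_samples_passed(uint32_t zpass_mask, uint32_t post_fs_active_mask)
{
   uint32_t passed = zpass_mask & post_fs_active_mask;
   unsigned count = 0;

   while (passed) {
      passed &= passed - 1;  /* clear the lowest set bit */
      count++;
   }

   assert(count <= 32);
   return count;
}
```

Counting from `zpass_mask` alone would report samples that the shader later discarded, which is the error the commit describes.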
* swr/rast: Share vertex memory between VS input/output | Tim Rowley | 2017-06-16 | 1 | -5/+2
  Removes large simdvertex stack allocation. Vertex shader must ensure reads happen before writes.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add support for dynamic vertex size for VS output | Tim Rowley | 2017-06-16 | 3 | -15/+23
  Add support for dynamic vertex size for the vertex shader output. Add new state in SWR_FRONTEND_STATE to specify the size.
  Reviewed-by: Bruce Cherniak <[email protected]>