Commit message | Author | Age | Files | Lines
* mesa: use pre_hashed version of search for the mesa hash table | Timothy Arceri | 2017-04-12 | 1 | -2/+6
| | | | | | | The key is just an unsigned int so there is never any real hashing done. Reviewed-by: Eric Anholt <[email protected]>
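The idea behind this change can be sketched with a toy table. This is not Mesa's hash table: every name below (`toy_table`, `toy_search_pre_hashed`, `toy_lookup_uint`) is hypothetical, and the open-addressing layout is an assumption made only for illustration. It shows that when the key is already an unsigned integer, the caller can hand the key over as the hash and skip the hash function entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TABLE_SIZE 64u /* hypothetical capacity; must be a power of two */

struct toy_entry {
    uint32_t key; /* 0 marks an empty slot in this toy table */
    void *data;
};

struct toy_table {
    struct toy_entry slots[TOY_TABLE_SIZE];
};

/* Insert using a hash the caller already has. Returns 1 on success, 0 if full. */
int toy_insert_pre_hashed(struct toy_table *t, uint32_t hash, uint32_t key, void *data)
{
    assert(key != 0);
    for (uint32_t i = 0; i < TOY_TABLE_SIZE; i++) {
        struct toy_entry *e = &t->slots[(hash + i) & (TOY_TABLE_SIZE - 1)];
        if (e->key == 0 || e->key == key) {
            e->key = key;
            e->data = data;
            return 1;
        }
    }
    return 0;
}

/* Look up using a hash the caller already has; no hash function runs here. */
void *toy_search_pre_hashed(const struct toy_table *t, uint32_t hash, uint32_t key)
{
    assert(key != 0);
    for (uint32_t i = 0; i < TOY_TABLE_SIZE; i++) {
        const struct toy_entry *e = &t->slots[(hash + i) & (TOY_TABLE_SIZE - 1)];
        if (e->key == 0)
            return NULL; /* hit an empty slot: key is not present */
        if (e->key == key)
            return e->data;
    }
    return NULL;
}

/* For integer keys, the key itself serves as the hash. */
void *toy_lookup_uint(const struct toy_table *t, uint32_t key)
{
    return toy_search_pre_hashed(t, key, key);
}
```

Linear probing handles the case where two keys map to the same slot, so passing the raw key as the hash stays correct even when keys collide modulo the table size.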
* swr: [rasterizer core] Disable 8x2 tile backend | Tim Rowley | 2017-04-11 | 1 | -1/+1
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer common] Add _simd_testz_si alias | Tim Rowley | 2017-04-11 | 1 | -0/+1
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer archrast] Fix archrast for MSVC 2017 compiler | Tim Rowley | 2017-04-11 | 5 | -6/+6
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer jitter] Remove unused function | Tim Rowley | 2017-04-11 | 2 | -35/+0
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer jitter] Remove HAVE_LLVM tests supporting llvm < 3.8 | Tim Rowley | 2017-04-11 | 4 | -52/+0
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer common/core] Fix 32-bit windows build | Tim Rowley | 2017-04-11 | 6 | -117/+123
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] Fix unused variable warnings | Tim Rowley | 2017-04-11 | 3 | -10/+1
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] Code formatting change | Tim Rowley | 2017-04-11 | 1 | -10/+10
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] SIMD16 Frontend WIP - PA | Tim Rowley | 2017-04-11 | 1 | -22/+22
| | | | | | Fix PA NextPrim for SIMD8 on SIMD16. Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] SIMD16 Frontend WIP - Clipper | Tim Rowley | 2017-04-11 | 5 | -124/+941
| | | | | | Implement widened clipper for SIMD16. Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] Multisample sample position setup change | Tim Rowley | 2017-04-11 | 3 | -75/+92
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] Reduce templates to speed compile | Tim Rowley | 2017-04-11 | 3 | -10/+71
| | | | | | | Quick patch to remove some unused template params to cut down rasterizer compile time. Reviewed-by: Bruce Cherniak <[email protected]>
* i965/fs: Take into account lower frequency of conditional blocks in spilling cost heuristic. | Francisco Jerez | 2017-04-11 | 1 | -5/+14
| | | | | | | | | | | | | | | | | | | | The individual branches of an if/else/endif construct will be executed some unknown number of times between 0 and 1 relative to the parent block. Use some factor in between as weight while approximating the cost of spill/fill instructions within a conditional if-else branch. This favors spilling registers used within conditional branches which are likely to be executed less frequently than registers used at the top level. Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on my SKL GT4e. Should have a comparable effect on other platforms. No significant regressions. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
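The weighting described in this message can be sketched as follows. This is a simplified illustration, not the i965 backend code: the instruction representation, the function name `toy_spill_costs`, and the factor 0.5 are all assumptions (the message only says "some factor in between" 0 and 1). Each register use adds the current block weight to that register's spill cost, and the weight shrinks inside an if/else/endif construct.

```c
#include <stddef.h>

/* Hypothetical, minimal instruction stream for illustration only. */
enum toy_op { TOY_OP_ALU, TOY_OP_IF, TOY_OP_ELSE, TOY_OP_ENDIF };

struct toy_inst {
    enum toy_op op;
    int reg; /* register touched by an ALU op; ignored for control flow */
};

/* Assumed weight for code inside a conditional branch (between 0 and 1). */
#define TOY_COND_WEIGHT 0.5f

/* Accumulate an approximate spill cost per register into cost[0..num_regs). */
void toy_spill_costs(const struct toy_inst *insts, size_t n,
                     float *cost, size_t num_regs)
{
    float scale = 1.0f; /* weight of the block currently being scanned */

    for (size_t r = 0; r < num_regs; r++)
        cost[r] = 0.0f;

    for (size_t i = 0; i < n; i++) {
        switch (insts[i].op) {
        case TOY_OP_IF:
            scale *= TOY_COND_WEIGHT; /* entering a conditional block */
            break;
        case TOY_OP_ENDIF:
            scale /= TOY_COND_WEIGHT; /* back to the parent block's weight */
            break;
        case TOY_OP_ELSE:
            break; /* the else branch shares the weight of the if branch */
        case TOY_OP_ALU:
            if (insts[i].reg >= 0 && (size_t)insts[i].reg < num_regs)
                cost[insts[i].reg] += scale;
            break;
        }
    }
}
```

With this weighting, a register used twice inside a branch ends up with a lower cost than one used twice at the top level, so a cost-driven spiller would prefer to spill the former.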
* swr: return true for PIPE_CAP_DOUBLES | Tim Rowley | 2017-04-11 | 1 | -0/+1
| | | | Reviewed-by: Ilia Mirkin <[email protected]>
* i965: Set kernel features before computing max GL version. | Kenneth Graunke | 2017-04-11 | 1 | -24/+24
| | | | | | | | | | | | | | | | We check these bitfields when computing the Haswell max GL version. We need to set them ahead of time, or they won't exist, and all our checks will fail. That sets the max core profile GL version to 4.2. This introduces the bizarre situation where asking for a GL context with version 4.3+ fails, but asking for a GL core profile context with version <= 4.2 actually promotes you to a 4.5 context. GLX_MESA_query_renderer also reported the bogus 4.2 value. Now it shows 4.5. Cc: "17.0" <[email protected]> Reported-and-tested-by: Rafael Ristovski <[email protected]>
* anv: remove needless VALGRIND_MAKE_MEM_DEFINED | Juan A. Suarez Romero | 2017-04-11 | 1 | -1/+0
| | | | | | This is already invoked in the following VG_NOACCESS_READ() call. Reviewed-by: Jason Ekstrand <[email protected]>
* etnaviv: enable TS, but disable autodisable | Lucas Stach | 2017-04-11 | 1 | -2/+2
| | | | | | | | Autodisable seems to cause missed rendering in some cases, but otherwise TS seems to work properly. Signed-off-by: Lucas Stach <[email protected]> Reviewed-by: Wladimir J. van der Laan <[email protected]>
* etnaviv: enable TS also on sampler resources | Lucas Stach | 2017-04-11 | 1 | -3/+0
| | | | | | | | | | | Fixes a performance issue with imported winsys buffers as those are marked with binding sampler view. This might require a TS flush on single pipe chips that directly sample from the rendered buffer, but otherwise seems to work fine. Signed-off-by: Lucas Stach <[email protected]> Reviewed-by: Wladimir J. van der Laan <[email protected]>
* etnaviv: align TS surface size to number of pixel pipes | Lucas Stach | 2017-04-11 | 1 | -1/+2
| | | | | | | | | | The TS surface gets cleared by a tiled RS fill. If the chip has more than 1 pixel pipe the size of the TS surface needs to be aligned so that each pipe address matches a tile start, otherwise the RS will hang. Signed-off-by: Lucas Stach <[email protected]> Reviewed-by: Wladimir J. van der Laan <[email protected]>
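The alignment requirement can be illustrated with a small helper. This is only a sketch: the names `toy_align_up` and `toy_ts_size` are hypothetical, and the per-pipe tile size is passed in as a parameter because the message does not state the actual value used by the driver. The point is that the total size is rounded up to a multiple of (tile bytes × pixel pipes), so that splitting the surface evenly across pipes puts each pipe's start address on a tile boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment (alignment must be > 0). */
uint32_t toy_align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0);
    return (value + alignment - 1) / alignment * alignment;
}

/*
 * Hypothetical TS size computation: tile_bytes is an assumed parameter,
 * not a value taken from the etnaviv driver.
 */
uint32_t toy_ts_size(uint32_t raw_size, uint32_t tile_bytes, uint32_t pixel_pipes)
{
    assert(pixel_pipes != 0);
    return toy_align_up(raw_size, tile_bytes * pixel_pipes);
}
```

For example, with an assumed 64-byte tile and two pixel pipes, a raw size of 100 bytes rounds up to 128, which gives each pipe a 64-byte, tile-aligned share.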
* etnaviv: avoid using invalid TS | Lucas Stach | 2017-04-11 | 3 | -1/+7
| | | | | | | | | | | The TS is only valid after it has been initialized by a fast clear, so it should not be taken into account when blitting resources that haven't been cleared. Also the blit itself invalidates the destination TS, as it's not updated and will retain data from the previous rendering after the blit. Signed-off-by: Lucas Stach <[email protected]> Reviewed-by: Wladimir J. van der Laan <[email protected]>
* glsl: use the BA1 macro for textureQueryLevels() | Samuel Pitoiset | 2017-04-11 | 1 | -32/+33
| | | | | | | For both consistency and new bindless sampler types. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* glsl: use the BA1 macro for textureSamples() | Samuel Pitoiset | 2017-04-11 | 1 | -9/+10
| | | | | | | For both consistency and new bindless sampler types. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* glsl: use the BA1 macro for textureCubeArrayShadow() | Samuel Pitoiset | 2017-04-11 | 1 | -5/+6
| | | | | | | For both consistency and new bindless sampler types. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* radv: Implement pipeline statistics queries. | Bas Nieuwenhuizen | 2017-04-11 | 3 | -27/+394
| | | | | | | | | | | The devil is in the shader again, otherwise this is fairly straightforward. The CTS contains no pipeline statistics copy to buffer testcases, so I did a basic smoketest. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: Let count be dynamic in radv_break_on_count. | Bas Nieuwenhuizen | 2017-04-11 | 1 | -3/+3
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: Rename query pipeline/set layout. | Bas Nieuwenhuizen | 2017-04-11 | 2 | -13/+13
| | | | | | | For using them with both occlusion and pipeline statistics queries. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: Use VK_WHOLE_SIZE for the query buffer bindings. | Bas Nieuwenhuizen | 2017-04-11 | 1 | -2/+2
| | | | | | | | The buffer sizes are specified just a few lines earlier, so don't repeat ourselves. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: Use a shader for occlusion CmdCopyQueryPoolResults. | Bas Nieuwenhuizen | 2017-04-11 | 1 | -74/+64
| | | | | | | | | | | | | | | | | | Use the new occlusion query copy shader. We don't use the shader for the waiting as a polling loop interacts badly with having caching enabled. I noticed on my GPU (Tonga) that the values are written out in order, so I just use a WAIT_REG_MEM on the last value. If it turns out other chips don't do that we may need to look a bit more into this. Having 8 WAIT_REG_MEM packets per query doesn't sound ideal. This also restricts the availability word in the pool to timestamp queries only, as occlusion queries don't use it, and pipeline statistic queries likely won't either. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: Add occlusion query shader. | Bas Nieuwenhuizen | 2017-04-11 | 4 | -0/+435
| | | | | | | | | Adds a shader for writing occlusion query results to a buffer, as the CP packet isn't supported on SI or secondary buffers, and doesn't handle the availability bit (or partial results) nor truncation to 32-bit. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* i965: Fix wonky indentation left by brw_bo_alloc_tiled rename. | Kenneth Graunke | 2017-04-10 | 2 | -18/+17
|
* nouveau: when mapping a persistent buffer, synchronize on former xfers | Ilia Mirkin | 2017-04-11 | 1 | -4/+2
| | | | | | | | | If the buffer is being used, we should wait for those uses to be complete before returning the map. Fixes: GL45-CTS.direct_state_access.buffers_functional Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nvc0: increase texture buffer object alignment to 256 for pre-GM107 | Ilia Mirkin | 2017-04-11 | 1 | -1/+1
| | | | | | | | | | | | | We currently don't pass the low byte of the address via the surface info, so in order to work with images, these have to implicitly be aligned to 256. The proprietary driver also doesn't go out of its way to provide lower alignment. Fixes GL45-CTS.texture_buffer.texture_buffer_texture_buffer_range Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected] Reviewed-by: Samuel Pitoiset <[email protected]>
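Why dropping the low byte of an address forces 256-byte alignment can be shown with a short round-trip. The packing below is an assumption made for illustration (a plain right shift by 8 bits); the message only says the low byte is not passed through the surface info, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed packing: the low 8 bits of the address are simply discarded. */
uint32_t toy_pack_address(uint64_t addr)
{
    return (uint32_t)(addr >> 8);
}

/* The consumer can only reconstruct an address with a zero low byte. */
uint64_t toy_unpack_address(uint32_t packed)
{
    return (uint64_t)packed << 8;
}

/* An address survives the round-trip only if it is a multiple of 256. */
int toy_survives_round_trip(uint64_t addr)
{
    return toy_unpack_address(toy_pack_address(addr)) == addr;
}
```

An offset such as 0x1200 survives unchanged, while 0x1210 comes back as 0x1200, which is why buffers used this way have to start on a 256-byte boundary.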
* mesa: fix typo and add assert() to _mesa_attach_renderbuffer_without_ref() | Timothy Arceri | 2017-04-11 | 1 | -1/+3
| | | | | This function should only be used with a "freshly created" renderbuffer so assert RefCount is 1.
* i965/drm: Add stall warnings when mapping or waiting on BOs. | Kenneth Graunke | 2017-04-10 | 17 | -55/+68
| | | | | | | | | | | | | | | | | | | This restores the performance warnings removed in: i965: Drop brw_bo_map[_gtt] wrappers which issue perf warnings. but adds them for nearly all BO mapping, and also for wait_rendering. Because we add this to the core bufmgr, we automatically get stall warnings in all callers, unlike before where only a few callsites used the wrappers that gave stall warnings. We also do it a bit differently: we simply measure how long set_domain takes (the part that stalls), and complain if it's more than 0.01 ms. We don't bother calling brw_bo_busy(), and we don't measure the mmap time (which doesn't stall). This should be more accurate. Reviewed-by: Daniel Vetter <[email protected]>
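The measurement approach described here (time only the part that can stall, and complain above 0.01 ms) can be sketched with POSIX timers. This is not the bufmgr code: `toy_timed_call`, `toy_slow_operation`, and the use of `fprintf` in place of the driver's performance-debug output are all assumptions. The slow operation is a stand-in for a blocking call such as set_domain.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>
#include <time.h>

#define TOY_STALL_THRESHOLD_MS 0.01 /* threshold quoted in the commit message */

/* Monotonic clock reading in seconds. */
double toy_now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Stand-in for a call that may block; here it just sleeps about 2 ms. */
void toy_slow_operation(void *arg)
{
    (void)arg;
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 2 * 1000 * 1000 };
    nanosleep(&delay, NULL);
}

/*
 * Run fn(arg), measuring only that call. Emits a warning and returns 1 if
 * it took longer than the threshold, otherwise returns 0.
 */
int toy_timed_call(void (*fn)(void *), void *arg, const char *what)
{
    double start = toy_now_seconds();
    fn(arg);
    double elapsed_ms = (toy_now_seconds() - start) * 1000.0;

    if (elapsed_ms > TOY_STALL_THRESHOLD_MS) {
        fprintf(stderr, "%s stalled for %.3f ms\n", what, elapsed_ms);
        return 1;
    }
    return 0;
}
```

Wrapping the timing around the single blocking call, rather than around the whole mapping path, keeps cheap work such as mmap out of the measurement, which matches the reasoning in the message.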
* i965/drm: Make a set_domain() helper function. | Kenneth Graunke | 2017-04-10 | 1 | -37/+20
| | | | | | Less boilerplate. Reviewed-by: Daniel Vetter <[email protected]>
* i965/batch: Ensure we use a consistent offset in relocs | Daniel Vetter | 2017-04-10 | 1 | -2/+6
| | | | | | | | | | | | | | | | | | | In theory gcc is free to re-load them, and if a concurrent execbuf races and updates bo->offset64 then we have a problem: execbuffer api requires that the ->presumed_offset and the one we used for the reloc matches. It does not require that the value is sensible, which means no locks needed, just a consistent load. Ken said his next series will nuke this, so just hand-roll the kernel's READ_ONCE idea inline. FIXME: Most callers of brw_emit_reloc recompute the relocation themselves, which means this doesn't really fix the race. But the long term plan is to move to per-context relocation handling, which will fix this all properly. So leave this for now as just a reminder. Signed-off-by: Daniel Vetter <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
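The "consistent load" idea can be sketched as follows. The structs and the function `toy_emit_reloc` are hypothetical simplifications, not the i965 batch code. The technique shown is the one the message names: read the shared field once through a volatile-qualified lvalue (the essence of the kernel's READ_ONCE), keep the result in a local, and use that local for both the presumed offset and the value written into the batch.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for a buffer object and a relocation. */
struct toy_bo {
    uint64_t offset64; /* may be updated concurrently by another thread */
};

struct toy_reloc {
    uint64_t presumed_offset; /* what we tell the kernel we assumed */
    uint64_t written_value;   /* what we actually wrote into the batch */
};

void toy_emit_reloc(struct toy_reloc *reloc, struct toy_bo *target, uint32_t delta)
{
    /*
     * Single load through a volatile-qualified lvalue: the compiler may not
     * re-read target->offset64 later, so both fields below derive from the
     * same observed value even if the field changes in between.
     */
    uint64_t offset64 = *(volatile uint64_t *)&target->offset64;

    reloc->presumed_offset = offset64;
    reloc->written_value = offset64 + delta;
}
```

Note that a volatile access only prevents the compiler from repeating or eliding the load; it does not make a 64-bit load atomic on every platform. That matches the message's point that the value need not be sensible, only that the two uses agree.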
* i965/bufmgr: Garbage-collect vma cache/pruning | Daniel Vetter | 2017-04-10 | 2 | -129/+5
| | | | | | | | | | | | | | | | | | | | | | | This was done because the kernel has 1 global address space, shared with all render clients, for gtt mmap offsets, and that address space was only 32bit on 32bit kernels. This was fixed in commit 440fd5283a87345cdd4237bdf45fb01130ea0056 Author: Thierry Reding <[email protected]> Date: Fri Jan 23 09:05:06 2015 +0100 drm/mm: Support 4 GiB and larger ranges which shipped in 4.0. Of course you still want to limit the bo cache to a reasonable size on 32bit apps to avoid ENOMEM, but that's better solved by tuning the cache a bit. On 64bit, this was never an issue. On top, mesa never set this, so it's all dead code. Collect and trash it. Signed-off-by: Daniel Vetter <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/bufmgr: Remove some reuse functions | Daniel Vetter | 2017-04-10 | 2 | -33/+0
| | | | | | | | | is_reusable was needed by uxa because it couldn't keep track of its scanout buffers and used this as a proxy. Disabling reuse is a silly idea, we set this once at start. Remove both. Signed-off-by: Daniel Vetter <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/bufmgr: remove start_gtt_access | Daniel Vetter | 2017-04-10 | 2 | -29/+14
| | | | | | | | | | | Iirc this was used by uxa for persistent mmaps of the frontbuffer. For mesa all the set_domain stuff needed before a synchronized mmap is handled within the bufmgr, so no reason ever to call this. Inline the implementation into its only internal user. Signed-off-by: Daniel Vetter <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/bufmgr: Delete set_tiling | Daniel Vetter | 2017-04-10 | 2 | -25/+0
| | | | | | | | | | | | | | Entirely unused, and really shouldn't be used. The alloc functions already take care of this. And even in a future where we're not going to h/v-align tiled buffers in the bufmgr, but only in isl, I think we still want to adjust the tiling mode in the bufmgr, since that ties in closely to mmaps and stuff like that. get_tiling is still needed for the import paths (until we have modifiers everywhere). Signed-off-by: Daniel Vetter <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/bufmgr: Delete alloc_for_render | Daniel Vetter | 2017-04-10 | 2 | -19/+0
| | | | | | | Entirely unused, mesa instead used the BO_ALLOC_FOR_RENDER flag. Signed-off-by: Daniel Vetter <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/drm: Use list_for_each_entry_safe in a couple of cases. | Kenneth Graunke | 2017-04-10 | 1 | -11/+3
| | | | | | Suggested by Chris Wilson. A tiny bit simpler. Reviewed-by: Daniel Vetter <[email protected]>
* i965/drm: Rename intel_bufmgr_gem.c to brw_bufmgr.c. | Kenneth Graunke | 2017-04-10 | 2 | -1/+1
| | | | | | Matches the class name and the header file name. Acked-by: Jason Ekstrand <[email protected]>
* i965/drm: Reindent intel_bufmgr_gem.c and brw_bufmgr.h. | Kenneth Graunke | 2017-04-10 | 2 | -1215/+1161
| | | | | | | indent -i3 -nut -br -brs -npcs -ce --no-tabs -Tuint32_t -Tuint64_t plus some manual fixes because those aren't quite the right settings. Acked-by: Jason Ekstrand <[email protected]>
* i965/drm: Rename drm_bacon_bo to brw_bo. | Kenneth Graunke | 2017-04-10 | 48 | -477/+475
| | | | | | | | | | The bacon is all gone. This renames both the class and the related functions. We're about to run indent on the bufmgr code, so no need to worry about fixing bad indentation. Acked-by: Jason Ekstrand <[email protected]>
* i965: Drop brw_bo_map[_gtt] wrappers which issue perf warnings. | Kenneth Graunke | 2017-04-10 | 7 | -57/+10
| | | | | | | | | | | | | | | | The stupid reason for eliminating these functions is that I'm about to rename drm_bacon_bo_map() to brw_bo_map(), which makes the real function have the short name, rather than the wrapper. I'm also planning on reworking our mapping code soon, so we use WC mappings and proper unsynchronized mappings on non-LLC platforms. It will be easier to do that without thinking about the stall warnings and wrappers. My eventual hope is to put the performance warnings in the BO map function itself, so all callers gain the warning. Acked-by: Jason Ekstrand <[email protected]>
* i965/drm: Rename drm_bacon_reg_read() to brw_reg_read(). | Kenneth Graunke | 2017-04-10 | 4 | -12/+8
| | | | | | Less bacon. Acked-by: Jason Ekstrand <[email protected]>
* i965/drm: Rename drm_bacon_bufmgr to struct brw_bufmgr. | Kenneth Graunke | 2017-04-10 | 8 | -72/+69
| | | | | | Also stop using typedefs, per Mesa coding style. Acked-by: Jason Ekstrand <[email protected]>
* i965: Just use a uint32_t context handle rather than a malloc'd wrapper. | Kenneth Graunke | 2017-04-10 | 7 | -70/+21
| | | | | | | | | | | | drm_bacon_context is a malloc'd struct containing a uint32_t context ID and a pointer back to the bufmgr. The bufmgr pointer is pretty useless, as everybody already has brw->bufmgr. At that point...we may as well just use the ctx_id handle directly. A number of places already had to call drm_bacon_gem_context_get_id() to extract the ID anyway. Now they just have it. Reviewed-by: Chris Wilson <[email protected]> Acked-by: Jason Ekstrand <[email protected]>