Commit message | Author | Age | Files | Lines
* intel/isl: Tighten up restrictions for CCS on gen7 | Jason Ekstrand | 2017-07-22 | 1 | -7/+23
| It may technically be possible to enable some sort of fast-clear support for at least the base slice of a 2D array texture on gen7. However, it's not documented to work, we've never tried to do it in GL, and we have no idea what the hardware does if you turn on CCS_D with arrayed rendering. Let's just play it safe and disallow it for now. If someone really cares that much about gen7 performance, they can come along and try to get it working later.
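The policy above can be sketched as a single check. This is a minimal illustration and not ISL's code: `struct surf_desc` and `ccs_allowed()` are hypothetical simplifications of `struct isl_surf` and ISL's CCS setup. Only the rule "gen7 plus an arrayed surface means no CCS" comes from the commit message. Other generations are simply left alone in this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified surface description (not ISL's struct isl_surf). */
struct surf_desc {
   int gen;            /* hardware generation, e.g. 7 for IVB/HSW */
   uint32_t array_len; /* number of array slices */
};

/* Policy from the commit message: on gen7, CCS_D with arrayed rendering is
 * undocumented, so refuse CCS for any surface with more than one slice. */
static bool ccs_allowed(const struct surf_desc *surf)
{
   if (surf->gen == 7 && surf->array_len > 1)
      return false;
   return true;
}
```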
* i965/bufmgr: Add comments about GTT coherency issues. | Chris Wilson | 2017-07-22 | 1 | -0/+22
| (Patch written by Ken, but the comments were written entirely by Chris.)
|
| Acked-by: Kenneth Graunke <[email protected]>
* i965: Drop non-LLC lunacy in the program cache code. | Kenneth Graunke | 2017-07-22 | 3 | -70/+21
| The non-LLC story was a horror show. We uploaded data via pwrite (drm_intel_bo_subdata), which would stall if the cache BO was in use (being read) by the GPU. Obviously, we wanted to avoid that.
|
| So, we tried to detect whether the buffer was busy, and if so, we'd allocate a new BO, map the old one read-only (hopefully not stalling), copy all shaders compiled since the dawn of time to the new buffer, upload our new one, toss the old BO, and let the state upload code know that our program cache BO changed. This was a lot of extra data copying, and flagging BRW_NEW_PROGRAM_CACHE would also cause a new STATE_BASE_ADDRESS to be emitted, stalling the entire pipeline.
|
| Not only that, but our rudimentary busy tracking consisted of a flag set at execbuf time, and not cleared until we threw out the program cache BO. So, the first shader upload after any drawing would hit this "abandon the cache and start over" copying path.
|
| This is largely unnecessary - it's just ancient and crufty code. We can use the same persistent mapping paths on all platforms. On non-ancient kernels, this will use a write-combining map, which should be reasonably fast.
|
| One aspect that is worse: we do occasionally grow the program cache BO, and copy the old contents to the newer BO. This will suffer from UC readback performance now. To mitigate this, we use the MOVNTDQA-based streaming memcpy on platforms with SSE 4.1 (all Gen7+ atoms). Gen4-5 are unfortunately going to be penalized.
|
| v2: Add MOVNTDQA path, rebase on other map flag changes.
| v3: Drop cache->bo_used_by_gpu too (caught by Chris Wilson).
|
| Reviewed-by: Matt Turner <[email protected]>
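The MOVNTDQA streaming copy mentioned above can be sketched as follows. This is an illustrative version and not Mesa's actual implementation. It assumes GCC or Clang on x86, which provide `__attribute__((target("sse4.1")))` and `__builtin_cpu_supports`. It requires a 16-byte-aligned source and falls back to plain `memcpy()` otherwise. MOVNTDQA (`_mm_stream_load_si128`) only gives a streaming benefit when reading write-combining memory. On ordinary cached memory it behaves like a normal load.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <smmintrin.h>
#define HAVE_STREAM_LOAD 1

/* Copy `len` bytes (a multiple of 16) using MOVNTDQA loads.
 * The source must be 16-byte aligned; the destination may be unaligned. */
__attribute__((target("sse4.1")))
static void stream_copy_sse41(void *dst, const void *src, size_t len)
{
   const char *s = src;
   char *d = dst;
   for (size_t off = 0; off < len; off += 16) {
      __m128i v = _mm_stream_load_si128((__m128i *)(s + off));
      _mm_storeu_si128((__m128i *)(d + off), v);
   }
}
#endif

/* Streaming-load memcpy sketch: MOVNTDQA for the aligned bulk when the CPU
 * supports SSE 4.1, plain memcpy() for the tail and for all other cases. */
static void streaming_memcpy(void *dst, const void *src, size_t len)
{
#ifdef HAVE_STREAM_LOAD
   if (((uintptr_t)src & 15) == 0 && __builtin_cpu_supports("sse4.1")) {
      size_t bulk = len & ~(size_t)15;
      stream_copy_sse41(dst, src, bulk);
      memcpy((char *)dst + bulk, (const char *)src + bulk, len - bulk);
      return;
   }
#endif
   memcpy(dst, src, len);
}
```

The runtime CPU check keeps the sketch safe on Gen4-5-era CPUs without SSE 4.1, which, as the commit notes, simply take the slower path.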
* i965: Set MAP_PERSISTENT on program cache buffers. | Kenneth Graunke | 2017-07-22 | 1 | -4/+8
| Chris Wilson pointed out that this mapping really is persistent. It shouldn't actually have any effect today, but it's best to set it anyway.
|
| Reviewed-by: Matt Turner <[email protected]>
* i965: Correctly set MAP_WRITE when creating the LLC program cache map. | Kenneth Graunke | 2017-07-22 | 1 | -1/+1
| Using a read-only mapping is completely bogus - we use this mapping to write all new shaders to the cache.
|
| Reviewed-by: Matt Turner <[email protected]>
* i965/bufmgr: Use write-combine mappings where available | Matt Turner | 2017-07-22 | 1 | -3/+88
| Write-combine mappings give much better performance on writes than uncached access through the GTT.
|
| Improves performance of GFXBench 4's gl_driver2 benchmark at 1024x768 on Apollolake by 3.6086% +/- 0.674193% (n=15).
|
| v2: (by Ken) Rebase on lockless mappings, map_count deletion, valgrind updates, potential for CPU/WC maps failing, and other changes.
|
| v3: (by Ken and Chris Wilson)
| (Ken): Rebase on set_domain -> gem_wait
| (Chris): Fix up a failed CPU/WC mmapping with a GTT mapping
|
| Not all objects will be mappable for direct access by the CPU (either using WC/CPU or WC paths), for example, a dmabuf wrapping an object on a foreign device or an object wrapping access to stolen memory. Since the physical pages are either not known or do not even exist, we need to use the mediated, indirect access via the GTT. (If one day the kernel does start providing mediated access via a regular WB/WC mmapping, we will no longer need the fallback.)
|
| v4: Avoid falling back for MAP_RAW (Chris).
|
| Reviewed-by: Kenneth Graunke <[email protected]>
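The v3 fallback, which tries a direct CPU/WC mapping and uses the GTT only when the object cannot be mapped directly, can be sketched like this. All names here (`struct bo`, `try_map_wc()`, `map_gtt()`, `bo_map_for_write()`) are hypothetical stand-ins and not the i965 bufmgr API. The two map functions are stubs that return static buffers instead of calling mmap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping; not i965's struct brw_bo. */
enum map_kind { MAP_KIND_NONE, MAP_KIND_WC, MAP_KIND_GTT };

struct bo {
   bool cpu_mappable;      /* false e.g. for a foreign dmabuf or stolen memory */
   enum map_kind mapped_as;
};

/* Stub for a direct write-combining mmap: fails when the CPU cannot
 * reach the object's pages. */
static void *try_map_wc(struct bo *bo)
{
   static char wc_pages[4096];
   return bo->cpu_mappable ? wc_pages : NULL;
}

/* Stub for the mediated, indirect mapping through the GTT aperture. */
static void *map_gtt(struct bo *bo)
{
   static char gtt_aperture[4096];
   (void)bo;
   return gtt_aperture;
}

static void *bo_map_for_write(struct bo *bo)
{
   void *ptr = try_map_wc(bo);   /* preferred: fast write-combined writes */
   if (ptr) {
      bo->mapped_as = MAP_KIND_WC;
      return ptr;
   }
   /* Pages unknown or nonexistent for the CPU: fall back to the GTT. */
   ptr = map_gtt(bo);
   bo->mapped_as = ptr ? MAP_KIND_GTT : MAP_KIND_NONE;
   return ptr;
}
```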
* i965/bufmgr: Skip wait ioctl when not busy. | Kenneth Graunke | 2017-07-22 | 1 | -0/+4
| If the buffer is idle, I915_GEM_WAIT will return immediately, so we may as well skip the ioctl altogether. We can't trust the "idle" flag for external buffers, but for most buffers it should be fine.
|
| Reviewed-by: Matt Turner <[email protected]>
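The fast path described above amounts to one early return before the ioctl. The following is a sketch with hypothetical names (`struct bo`, `gem_wait_ioctl()`, `bo_wait()`). The "ioctl" is a stub that only counts calls and stands in for `DRM_IOCTL_I915_GEM_WAIT`.

```c
#include <stdbool.h>
#include <stdint.h>

struct bo {
   bool idle;     /* driver's cached "known idle" flag */
   bool external; /* shared with another process/device: flag untrustworthy */
};

static int wait_ioctl_calls; /* counts simulated ioctls for illustration */

/* Stub for the I915_GEM_WAIT round trip; always reports success and
 * ignores the timeout. */
static int gem_wait_ioctl(struct bo *bo, int64_t timeout_ns)
{
   (void)timeout_ns;
   wait_ioctl_calls++;
   bo->idle = true;
   return 0;
}

static int bo_wait(struct bo *bo, int64_t timeout_ns)
{
   /* The kernel would return immediately for an idle buffer, so skip the
    * syscall, but only when our idle flag can be trusted. */
   if (!bo->external && bo->idle)
      return 0;
   return gem_wait_ioctl(bo, timeout_ns);
}
```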
* i965/bufmgr: Explicitly wait instead of using I915_GEM_SET_DOMAIN. | Kenneth Graunke | 2017-07-22 | 1 | -17/+6
| With the advent of asynchronous maps, domain tracking doesn't make a whole lot of sense. Buffers can be in use on both the CPU and GPU at the same time. In order to avoid blocking, we stopped using set_domain for asynchronous mappings, which means that the kernel's tracking contains lies. We can't properly track it in userspace either, as the kernel can change domains on us spontaneously (for example, when un-swapping).
|
| According to Chris Wilson, I915_GEM_SET_DOMAIN does the following:
|
| 1. pins the backing storage (acquiring pages outside of the struct_mutex)
| 2. waits either for read/write access, including inter-device waits
| 3. updates the domain, clflushing as required
| 4. marks the object as used (for swapping)
| 5. turns off FBC/PSR/fancy scanout caching
|
| Item (1) is not terribly important. Most BOs are recycled via the BO cache, so they already have pages. Regardless, we fixed this via an initial set_domain in the previous patch.
|
| We implement item (2) with I915_GEM_WAIT. This has one downside: we'll stall unnecessarily if we do a read-only mapping of a buffer that the GPU is reading. I believe this is pretty uncommon. We may want to extend the wait ioctl at some point.
|
| Mesa already does item (3) itself. For cache-coherent buffers (most on LLC systems), we don't need to do any clflushing - the CPU and GPU views are coherent. For non-coherent buffers (most on non-LLC systems), we currently only use the CPU for read-only maps, and we explicitly clflush when necessary.
|
| We don't care about item (4): swapping has already killed performance. Plus, with async maps, the kernel's domain tracking is already bogus, so it can't do this accurately regardless.
|
| Item (5) should be okay because we avoid cached maps of scanout buffers.
|
| Reviewed-by: Matt Turner <[email protected]>
* i965/bufmgr: Allocate BO pages outside of the kernel's locking. | Kenneth Graunke | 2017-07-22 | 1 | -0/+13
| Suggested by Chris Wilson.
|
| v2: Set the write domain to 0 (suggested by Chris).
|
| Reviewed-by: Matt Turner <[email protected]>
* glsl: rework misleading block layout code | Timothy Arceri | 2017-07-23 | 1 | -4/+4
| From the ARB_uniform_buffer_object spec:
|
|    "shared" uniform blocks, the default layout, ...
|
| This doesn't fix anything, as the default layout is already applied at this point, but it fixes the misleading code/comment.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
* glsl: remove placeholder comment | Timothy Arceri | 2017-07-23 | 1 | -4/+0
| This was added in 2d03f48a65a666 and seems like it was intended as a TODO comment in a function stub rather than a useful code comment.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
* st/mesa: use proper resource target type in st_AllocTextureStorage() | Brian Paul | 2017-07-22 | 1 | -1/+4
| When we validate the texture sample count, pass the correct pipe_texture_target for the texture, rather than PIPE_TEXTURE_2D.
|
| Also add more comments about MSAA.
|
| No piglit regressions with the VMware driver.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
* mesa: remove pointless assignments in init_teximage_fields_ms() | Brian Paul | 2017-07-22 | 1 | -3/+0
| The NumSamples and FixedSampleLocation fields are set again at the end of the function, so these earlier assignments aren't needed.
|
| Reviewed-by: Ian Romanick <[email protected]>
| Reviewed-by: Samuel Pitoiset <[email protected]>
* svga: Limit number of immediates in shader | Neha Bhende | 2017-07-22 | 1 | -3/+5
| The immediate {128.0, -128.0, 2.0, 3.0} is used for the LIT instruction, which is not used very frequently, so allocate it only if a LIT instruction is used.
|
| Tested with MTT piglit and MTT glretrace.
|
| v2: As per Charmaine's comment
|
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Charmaine Lee <[email protected]>
* svga: fix constant indices for texcoord scale factors and texture buffer size | Charmaine Lee | 2017-07-22 | 1 | -9/+6
| This patch fixes the ordering of the constant indices for the texcoord scale factor and texture buffer size to match the order in which they were added to the constant buffer in svga_get_extra_constants_common().
|
| Tested with MTT piglit, glretrace.
|
| Reviewed-by: Brian Paul <[email protected]>
* svga: fix unnormalized->normalized texture coordinate conversion | Neha Bhende | 2017-07-22 | 3 | -3/+35
| Sometimes, converting unnormalized coordinates to normalized coordinates requires an epsilon value to produce the right texels with nearest filtering. Adding 0.0001 to the coordinates when the min/mag filter is nearest fixes the issue.
|
| Fixes piglit test fbo-blit-scaled-linear.
| Tested with mtt-piglit, mtt-glretrace.
|
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Charmaine Lee <[email protected]>
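The following is a sketch of the conversion with the bias. `normalize_coord()` is hypothetical. The commit does not say where the epsilon is added, so applying the 0.0001 bias to the unnormalized coordinate before dividing by the texture size is an assumption. The presumable intent is that a coordinate sitting exactly on a texel boundary, which rounding could push into the previous texel, stays inside the intended texel under nearest filtering.

```c
#include <stdbool.h>

/* Convert an unnormalized texel coordinate (0..size) to a normalized one
 * (0..1), biasing by the commit's 0.0001 epsilon for nearest filtering. */
static float normalize_coord(float u, int size, bool nearest)
{
   if (nearest)
      u += 0.0001f;
   return u / (float)size;
}
```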
* svga: only support 4x, 8x, 16x msaa | Brian Paul | 2017-07-22 | 1 | -0/+5
| Skip 2x MSAA, for example, since it's seldom used and just bloats the list of pixel formats.
|
| Reviewed-by: Charmaine Lee <[email protected]>
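The following is a sketch of the sample-count filter. `sample_count_supported()` is a hypothetical helper. The commit does not state that 0 and 1 should be treated as "single-sampled, always allowed"; that is an assumption.

```c
#include <stdbool.h>

/* Advertise only 4x, 8x and 16x MSAA (plus single-sampled), skipping 2x
 * to keep the pixel-format list short. */
static bool sample_count_supported(unsigned samples)
{
   switch (samples) {
   case 0:
   case 1:
   case 4:
   case 8:
   case 16:
      return true;
   default:
      return false;
   }
}
```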
* mesa: include texture size in error messages | Brian Paul | 2017-07-22 | 1 | -4/+5
| Reviewed-by: Alejandro Piñeiro <[email protected]>
| Reviewed-by: Samuel Pitoiset <[email protected]>
* i965: Support the mesa_no_error driconf option. | Kenneth Graunke | 2017-07-22 | 2 | -0/+4
| This allows us to override contexts to use no_error functionality even if the applications themselves do not.
|
| Reviewed-by: Matt Turner <[email protected]>
* anv/blorp: Assert isl_surf_init success in do_buffer_copy | Jason Ekstrand | 2017-07-22 | 1 | -13/+15
| Reviewed-by: Topi Pohjolainen <[email protected]>
* anv/blorp: Explicitly set row_pitch in do_buffer_copy | Jason Ekstrand | 2017-07-22 | 1 | -1/+1
| We want a very specific row pitch, and we don't want ISL to change it on us, so just be explicit about it.
|
| Fixes: a40f0430347c07bf2d5794642fe02f5dd248a473
| Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Delete gen8_draw_upload.c | Kenneth Graunke | 2017-07-22 | 1 | -0/+0
| For some reason we left an empty file, rather than deleting it.
* nv50/ir: disable mul+add to mad for precise instructions | Karol Herbst | 2017-07-21 | 1 | -2/+3
| Fixes misrendering in Tomb Raider and the following tests:
|
|   KHR-GL44.gpu_shader5.precise_qualifier
|   KHR-GL45.gpu_shader5.precise_qualifier
|
| v4: disable the optimization only for MAD; it's fine for SAD
|
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Pierre Moreau <[email protected]>
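The guard this commit adds can be sketched as a check in a peephole pass. The mini-IR here (`enum op`, `struct insn`, `can_fuse_mul_add()`) is hypothetical and not nv50's `Instruction` class or its optimizer. The sketch only models the MUL+ADD case; per the v4 note, the SAD transformation stays enabled.

```c
#include <stdbool.h>

/* Hypothetical mini-IR; not nv50's Instruction class. */
enum op { OP_MUL, OP_ADD, OP_OTHER };

struct insn {
   enum op op;
   bool precise; /* set from the TGSI _PRECISE modifier */
};

/* May a MUL feeding an ADD be fused into a single MAD?  Fusing can change
 * how (or whether) the intermediate product is rounded, which a precise
 * expression forbids. */
static bool can_fuse_mul_add(const struct insn *mul, const struct insn *add)
{
   if (mul->op != OP_MUL || add->op != OP_ADD)
      return false;
   return !mul->precise && !add->precise;
}
```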
* nv50/ir/tgsi: handle precise for most ALU instructions | Karol Herbst | 2017-07-21 | 1 | -0/+2
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Pierre Moreau <[email protected]>
* nv50/ir: add precise field to Instruction | Karol Herbst | 2017-07-21 | 2 | -0/+3
| v4: initialize field with NULL
|
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Pierre Moreau <[email protected]>
* st/glsl_to_tgsi: don't optimize mul+add to mad if expression is precise | Karol Herbst | 2017-07-21 | 1 | -1/+1
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/docs: add precise instruction modifier | Karol Herbst | 2017-07-21 | 1 | -1/+10
| v4: add comment about intermediate rounding step to MAD
|
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Nicolai Hähnle <[email protected]>
| Reviewed-by: Roland Scheidegger <[email protected]>
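The intermediate rounding step matters because a MAD that does not round the product can give a different answer from a separate MUL and ADD. The following small C demonstration assumes IEEE-754 single and double precision. `mad_single_rounding()` emulates a single-rounding MAD through double precision. That is exact for these particular inputs, because a float product always fits in a double and here the sum is exact too, but it is not a general FMA replacement. With a = 1 + 2^-12 and c = -(1 + 2^-11), the exact product a*a is 1 + 2^-11 + 2^-24. When it is rounded to float, 2^-24 is exactly half an ulp and ties to even, which drops it.

```c
/* Single rounding at the end, as in a MAD without an intermediate rounding
 * step (exact emulation only for inputs like the ones below). */
static float mad_single_rounding(float a, float b, float c)
{
   return (float)((double)a * (double)b + (double)c);
}

/* Separate MUL then ADD: the product is rounded to float first.
 * `volatile` keeps the compiler from contracting this into an FMA. */
static float mul_then_add(float a, float b, float c)
{
   volatile float prod = a * b;
   return prod + c;
}
```

For these inputs, `mul_then_add()` yields 0 while `mad_single_rounding()` yields 2^-24 (about 5.96e-8). A `precise` expression must keep the former behaviour.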
* tgsi/text: parse _PRECISE modifier | Karol Herbst | 2017-07-21 | 1 | -3/+14
| v2: use str_match_no_case to fix _SAT_PRECISE detection
| v4: use is_digit_alpha_underscore to match end of mods
|
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Nicolai Hähnle <[email protected]>
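The following is a sketch of parsing a modifier list such as `_SAT_PRECISE` after an opcode name. The helper names `str_match_no_case` and `is_digit_alpha_underscore` come from the commit message, but their signatures and bodies here are assumptions, as are `match_mod()`, `struct mods` and `parse_mods()`. The real TGSI text parser differs. Two checks are the point: each modifier must not run into more letters (so `_SATURATE` is not read as `_SAT`), and the whole list must end at a non-identifier character.

```c
#include <ctype.h>
#include <stdbool.h>

static bool is_digit_alpha_underscore(char c)
{
   return isalnum((unsigned char)c) || c == '_';
}

/* If *pcur starts with `word` (case-insensitively), advance past it. */
static bool str_match_no_case(const char **pcur, const char *word)
{
   const char *cur = *pcur;
   for (; *word; cur++, word++) {
      if (tolower((unsigned char)*cur) != tolower((unsigned char)*word))
         return false;
   }
   *pcur = cur;
   return true;
}

/* Match one modifier; it must be followed by '_' or a non-alphanumeric. */
static bool match_mod(const char **pcur, const char *mod)
{
   const char *cur = *pcur;
   if (!str_match_no_case(&cur, mod) || isalnum((unsigned char)*cur))
      return false;
   *pcur = cur;
   return true;
}

struct mods {
   bool saturate;
   bool precise;
};

/* Parse modifiers in any order; false if the list ends in garbage. */
static bool parse_mods(const char *cur, struct mods *m)
{
   m->saturate = m->precise = false;
   for (;;) {
      if (match_mod(&cur, "_SAT"))
         m->saturate = true;
      else if (match_mod(&cur, "_PRECISE"))
         m->precise = true;
      else
         break;
   }
   /* End of mods: the next character must not continue an identifier. */
   return !is_digit_alpha_underscore(*cur);
}
```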
* tgsi: populate precise | Karol Herbst | 2017-07-21 | 8 | -30/+51
| Only implemented for glsl->tgsi. Other converters just set precise to 0.
|
| v2: remove precise parameter from ureg_tex_insn and ureg_memory_insn
|
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Nicolai Hähnle <[email protected]>
* st/glsl_to_tgsi: handle precise modifier | Karol Herbst | 2017-07-21 | 1 | -0/+13
| All subexpressions inside an ir_assignment need to be tagged as precise.
|
| v2: make precise handling more global inside the visitor
|
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Nicolai Hähnle <[email protected]>
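The tagging can be sketched as a recursive walk over an expression tree. `struct expr` and `mark_precise()` are hypothetical and not Mesa's `ir_expression` or visitor API. Every node needs the flag because later passes, such as the MUL+ADD to MAD fusion, inspect individual instructions and must see it on each one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical expression node; unused operand slots are NULL. */
struct expr {
   struct expr *operands[3];
   bool precise;
};

/* Tag an assignment's right-hand side and all of its subexpressions. */
static void mark_precise(struct expr *e)
{
   if (!e)
      return;
   e->precise = true;
   for (int i = 0; i < 3; i++)
      mark_precise(e->operands[i]);
}
```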
* tgsi/dump: print _PRECISE modifier on Instructions | Karol Herbst | 2017-07-21 | 1 | -0/+4
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi: add precise flag to tgsi_instruction | Karol Herbst | 2017-07-21 | 2 | -1/+3
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Nicolai Hähnle <[email protected]>
| Reviewed-by: Roland Scheidegger <[email protected]>
* i965: Set lower_vote_trivial in vector_nir_options_gen6 too. | Kenneth Graunke | 2017-07-21 | 1 | -0/+1
| There's a second struct for Gen6+.
|
| Reviewed-by: Matt Turner <[email protected]>
* radv: reset non-syncobj semaphore context after wait. | Dave Airlie | 2017-07-22 | 1 | -0/+2
| When I ported from libdrm, I forgot to add the line to reset the semaphore; we just need to reset the context.
|
| This fixes a regression in DOOM.
|
| Fixes: 9ac1432a571 ("radv: port to new libdrm API.")
| Reported-by: Grazvydas Ignotas <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* st/mesa: add destroy_drawable interface | Charmaine Lee | 2017-07-20 | 7 | -3/+123
| With this patch, the st manager will maintain a hash table for the active framebuffer interface objects. A destroy_drawable interface is added to allow the state tracker to notify the st manager to remove the associated framebuffer interface object from the hash table, so the associated framebuffer and its resources can be deleted at framebuffer purge time.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101829
| Fixes: 147d7fb772a ("st/mesa: add a winsys buffers list in st_context")
| Tested-by: Brad King <[email protected]>
| Tested-by: Gert Wollny <[email protected]>
| Reviewed-by: Brian Paul <[email protected]>
* radv: rebase radv_entrypoints_gen.py on anv_entrypoints_gen.py | Dylan Baker | 2017-07-21 | 2 | -275/+287
| The two generators forked from each other, and they remain basically the same. This rebases the radv version on the anv version, but with the radv changes ported over. The result is that we get rid of the "cat |" madness and gain mako, correct "generated by" attributions, and write files out directly.
|
| The only differences in the output are whitespace and comments.
|
| Signed-off-by: Dylan Baker <[email protected]>
| Acked-by: Dave Airlie <[email protected]>
* i965/miptree: Clean-up unused | Topi Pohjolainen | 2017-07-22 | 14 | -1646/+96
| Reviewed-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/miptree: Switch remaining surfaces to isl | Topi Pohjolainen | 2017-07-22 | 2 | -93/+41
| Reviewed-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/miptree: Drop miptree_array_layout in get_isl_dim_layout() | Topi Pohjolainen | 2017-07-22 | 3 | -11/+8
| This was only needed for checking gen6 stencil, which already uses isl. One could delete the GEN6_HIZ_STENCIL layout altogether, but that will be gone with the rest after a while anyway.
|
| The dim_layout converter is needed even after the transition to isl when setting up surface states; see brw_emit_surface_state(). Hence the unneeded argument is dropped separately.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/miptree: Relax size alignment for linear surfaces | Topi Pohjolainen | 2017-07-22 | 1 | -1/+6
| Reviewed-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/miptree: Store compression flag also for isl based | Topi Pohjolainen | 2017-07-22 | 1 | -0/+1
| Reviewed-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/miptree: Check tex image allocation failures | Topi Pohjolainen | 2017-07-22 | 1 | -0/+2
| allowing graceful failure instead of a crash on an assert later on.
|
| This can be hit, for example, on SNB when trying to allocate an 8kx8k CUBE_MAP against isl: the x-tiled buffer size becomes 2421161984, exceeding the maximum of 1 << 31 == 2147483648. Another way to hit this on SNB is multisampling with formats wider than 64 bits.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
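The arithmetic from the SNB example works out as follows. 2421161984 - 2147483648 = 273678336 bytes, which is exactly 261 MiB over the limit. The size also exceeds INT32_MAX, so it must be computed in 64 bits. The following is a minimal sketch of an up-front check that lets the caller fail gracefully. `tex_image_size_ok()` and `MAX_BUFFER_SIZE` are hypothetical names, and only the 1 << 31 limit comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limit quoted in the commit message for this SNB case: 1 << 31 bytes. */
#define MAX_BUFFER_SIZE (UINT64_C(1) << 31)

/* Validate the computed size before allocating, so the caller can report
 * failure instead of hitting an assert later. 64-bit arithmetic keeps
 * sizes above INT32_MAX from wrapping. */
static bool tex_image_size_ok(uint64_t size_bytes)
{
   return size_bytes <= MAX_BUFFER_SIZE;
}
```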
* main/teximage: Even on failure use valid format for init() | Topi Pohjolainen | 2017-07-22 | 1 | -1/+1
| Otherwise init_teximage_fields_ms() (called by _mesa_init_teximage_fields()) will always assert, as it can't find a valid base format.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* intel/isl/gen7: Don't allow multisampled surfaces with valign2 | Topi Pohjolainen | 2017-07-22 | 1 | -19/+23
| The same constraint is checked later on as an assert in isl_gen7_choose_image_alignment_el(), so catch it earlier in order to return an error instead of crashing.
|
| Needed to avoid crashes with piglit tests on IVB and HSW:
|
|   arb_internalformat_query2.image_format_compatibility_type pname checks
|   arb_internalformat_query2.all internalformat_<x>_type pname checks
|   arb_internalformat_query2.max dimensions related pname checks
|   arb_copy_image.arb_copy_image-formats --samples=2/4/6/8
|   arb_texture_float.multisample-fast-clear gl_arb_texture_float
|
| Reviewed-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* intel/isl/gen7: Allow msaa with signed integer formats | Topi Pohjolainen | 2017-07-22 | 1 | -2/+3
| These formats are already allowed by the i965 GL driver, and the feature seems to work just fine. There are tests for multisampled rendering in piglit:
|
|   tests/spec/ext_framebuffer_multisample
|
| which can be patched to try 16I/32I in addition to GL_RGBA8I. IvyBridge passed all tests with all sample numbers.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* intel/isl/gen7: Allow msaa with 128-bit formats | Topi Pohjolainen | 2017-07-22 | 1 | -4/+7
| These formats are already allowed by the i965 GL driver, and the feature seems to work just fine. There are tests for multisampled rendering in piglit:
|
|   tests/spec/ext_framebuffer_multisample
|
| which can be patched to try GL_RGBA16F/32F/16I/16UI/32I/32UI in addition to GL_RGBA/8I. IvyBridge passed all tests with all sample numbers and even with 128-bit formats.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* intel/isl: Allow 1D surfaces with compressed formats | Topi Pohjolainen | 2017-07-22 | 1 | -1/+1
| Reviewed-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* intel/isl: Align non-tiled horizontally by cache line | Topi Pohjolainen | 2017-07-22 | 1 | -1/+15
| in order to support the blit engine.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
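Aligning a linear surface's rows to a cache line amounts to rounding the row pitch up to a multiple of the line size. The following is a minimal sketch assuming a 64-byte line, which is the cache-line size on Intel CPUs. Whether ISL aligns exactly the row pitch, and by exactly this amount, for the blitter is an assumption here. `align_row_pitch()` is a hypothetical helper.

```c
#include <stdint.h>

#define CACHE_LINE_SIZE 64u /* assumed alignment, must be a power of two */

/* Round a linear surface's row pitch up to a whole number of cache lines. */
static uint32_t align_row_pitch(uint32_t width_px, uint32_t bytes_per_px)
{
   uint32_t pitch = width_px * bytes_per_px;
   return (pitch + CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE - 1);
}
```

For example, a 100-pixel-wide RGB8 row (300 bytes) is padded to 320 bytes.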
* i965/miptree/gen4: Prepare x-tiled fallback for isl based | Topi Pohjolainen | 2017-07-22 | 1 | -6/+20
| Reviewed-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/miptree: Prepare non-tiled fallback for isl based | Topi Pohjolainen | 2017-07-22 | 1 | -0/+36
| See brw_miptree_choose_tiling().
|
| Reviewed-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>