summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers
Commit message (Collapse)AuthorAgeFilesLines
* i965: Preserve EXEC_OBJECT_CAPTURE when growing the BO.Kenneth Graunke2017-11-291-0/+3
| | | | | | | | | The original state buffer was marked with EXEC_OBJECT_CAPTURE. When growing it, we want to preserve that flag so we continue to capture it in GPU hang reports. Fixes: 2dfc119f22f257082ab0 "i965: Grow the batch/state buffers if we need space and can't flush." Reviewed-by: Ian Romanick <[email protected]>
* i965: Use old_bo->align when growing batch/state buffer instead of 4096.Kenneth Graunke2017-11-291-1/+2
| | | | | | | | | | | | | | The intention here is make the new BO use the same alignment as the old BO. This isn't strictly necessary, but we would have to update the 'alignment' field in the validation list when swapping it out, and we don't bother today. The batch and state buffers use an alignment of 4096, so this should be equivalent - it's just clearer than cut and pasting a magic constant. Fixes: 2dfc119f22f257082ab0 "i965: Grow the batch/state buffers if we need space and can't flush." Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Program the dynamic state heap size to MAX_STATE_SIZE.Kenneth Graunke2017-11-293-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | STATE_BASE_ADDRESS specifies a maximum size of the dynamic state section, beyond which data supposedly reads back as 0. On Gen8+, we were programming it to the size of the buffer. This worked fine until we started growing the state buffer in commit 2dfc119f22f25708. When the state buffer grows, the value in STATE_BASE_ADDRESS becomes too small, and our state beyond STATE_SZ bytes would read back as 0. To avoid having to update the value, we program it to MAX_STATE_SIZE. We used to program the upper bound to the maximum on older hardware anyway, so programming it too large isn't a big deal. Bogus SURFACE_STATE can easily lead to GPU hangs and misrendering. DiRT Rally was hitting the statebuffer growth path, and suffered from bad texture corruption and GPU hangs (usually around the same time). This patch fixes both issues. Fixes: 2dfc119f22f257082ab0 "i965: Grow the batch/state buffers if we need space and can't flush." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103101 Tested-by: Jordan Justen <[email protected]> Reviewed-by: Chris Wilson <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* mesa: deal with vs_inputs as 64-bit unsigned integerJuan A. Suarez Romero2017-11-291-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 78942e ("mesa: shrink VERT_ATTRIB bitfields to 32 bits") uses vs_prog_data->vs_inputs as if it were a 32-bit unsigned integer. But actually it is a 64-bit integer, and as such it is used in other parts of Mesa code. It is worth to note that bits from the entire range are used, and not only 32-bits. This is due our implementation for handling 64-bit dual-slot input attributes, which requires to use a larger bitfield to manage them. This commit reverts the changes done in brw_draw_upload.c, keeping the rest of the changes. This fixes the following tests: - KHR-GL45.enhanced_layouts.varying_array_locations - KHR-GL45.enhanced_layouts.varying_locations Fixes: 78942e ("mesa: shrink VERT_ATTRIB bitfields to 32 bits") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103942 CC: Marek Olšák <[email protected]> CC: Ian Romanick <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Signed-off-by: Juan A. Suarez Romero <[email protected]>
* i965: Change a ret == -1 check to ret != 0.Kenneth Graunke2017-11-281-1/+1
| | | | | | For consistency with most other ret checks. Suggested by Chris. Reviewed-by: Chris Wilson <[email protected]>
* i965: Use C99 struct initializers in brw_bufmgr.c.Kenneth Graunke2017-11-281-91/+49
| | | | | | | | | | This is cleaner than using a non-standard memclear macro (which does a memset to 0) and then initializing fields after the fact. We move the declarations to where we initialized the fields. While we're at it, we move the declaration of 'ret' that goes with the ioctl, eliminating the declaration section altogether. Reviewed-by: Chris Wilson <[email protected]>
* i965: Move perf_debug and WARN_ONCE back to brw_context.h.Kenneth Graunke2017-11-281-0/+29
| | | | | | | | These were moved to src/intel/common/gen_debug.h, but they are not common code. They assume that brw_context or gl_context variables exist, named brw or ctx. That isn't remotely true outside of i965. Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: const a few structs and vars to avoid writing to them by accidentEric Engestrom2017-11-281-4/+4
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix Smooth Point Enables.Kenneth Graunke2017-11-281-1/+1
| | | | | | | | | | We want to program the 3DSTATE_RASTER field to the gl_context value, not the other way around. Fixes: 13ac46557ab1 (i965: Port Gen8+ 3DSTATE_RASTER state to genxml.) Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: perf: add support for CoffeeLake GT3Lionel Landwerlin2017-11-285-2/+10712
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965: perf: add support for CoffeeLake GT2Lionel Landwerlin2017-11-285-2/+10484
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965: perf: add busyness metric sets on gen8/9 platformsLionel Landwerlin2017-11-287-0/+1231
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965: fix time elapsed counter equations in VME/Media configsLionel Landwerlin2017-11-286-12/+12
| | | | | | | | There was a mistake just in those metric sets. We probably didn't noticed because they're not really interesting for 3D workloads. Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965: perf: update counter names on gen8/9 platformsLionel Landwerlin2017-11-288-116/+116
| | | | | | | Just fixing names. Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965: add a debug option to disable oa config loadingLionel Landwerlin2017-11-281-1/+2
| | | | | | | | | This provides a good way to verify we haven't broken using the perf driver on older kernels (which don't have the oa config loading mechanism). Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: perf: add support for userspace configurationsLionel Landwerlin2017-11-281-8/+101
| | | | | | | | | | This allows us to deploy new configurations without touching the kernel. v2: Detect loadable configs without creating one (Chris) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: perf: update configs for loading from userspaceLionel Landwerlin2017-11-2810-0/+243
| | | | | | | | | | When making configs loadable from userspace in the kernel, we left to userspace more responsability around programming some registers. In particular one register we use to set directly in the driver has now been moved into the configs. Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* intel/blorp: Take a range of layers in blorp_ccs_resolveJason Ekstrand2017-11-271-1/+1
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* intel/blorp: Add initial support for indirect clear colorsJason Ekstrand2017-11-271-0/+13
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* i965/blorp: Use a designated initializer for blorp_surfJason Ekstrand2017-11-271-8/+9
| | | | | | | | This way uninitialized fields get automatically zeroed and it's safe to add more fields to blorp_surf. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* mesa: shrink VERT_ATTRIB bitfields to 32 bitsMarek Olšák2017-11-251-4/+4
| | | | | | There are only 32 vertex attribs now. Reviewed-by: Ian Romanick <[email protected]>
* mesa: remove unused vertex attrib WEIGHTMarek Olšák2017-11-252-8/+1
| | | | | | | | | | | | We don't support ARB_vertex_blend. Note that the attribute aliasing check for ARB_vertex_program had to be rewritten. vbo_context: 20344 -> 20008 bytes gl_context: 74672 -> 74616 bytes Reviewed-by: Ian Romanick <[email protected]>
* i965: Support decoding INTERFACE_DESCRIPTOR_DATA with INTEL_DEBUG=batJordan Justen2017-11-211-0/+24
| | | | | | | | This will dump the INTERFACE_DESCRIPTOR_DATA along with the associated samplers & surfaces. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Scott D Phillips <[email protected]>
* i965: Optimize bucket index calculationAravindan Muthukumar2017-11-201-8/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reducing Bucket index calculation to O(1). This algorithm calculates the index using matrix method. Assuming PAGE_SIZE is 4096, matrix arrangement is as below: 1*4096 2*4096 3*4096 4*4096 5*4096 6*4096 7*4096 8*4096 10*4096 12*4096 14*4096 16*4096 20*4096 24*4096 28*4096 32*4096 ... ... ... ... ... ... ... ... ... ... ... max_cache_size From this matrix its clearly seen that every row follows the below way: ... ... ... n n+(1/4)n n+(1/2)n n+(3/4)n 2n Row is calculated as log2(size/PAGE_SIZE) Column is calculated as converting the difference between the elements to fit into power size of two and indexing it. Final Index is (row*4)+(col-1) Tested with Intel Mesa CI. Improves performance of 3DMark on BXT by 0.705966% +/- 0.229767% (n=20) v4: Review comments on style and code comments implemented (Ian). v3: Review comments implemented (Ian). v2: Review comments implemented (Jason). Signed-off-by: Aravindan Muthukumar <[email protected]> Signed-off-by: Kedar Karanje <[email protected]> Reviewed-by: Yogesh Marathe <[email protected]> Signed-off-by: Ian Romanick <[email protected]>
* i965: Mark BOs as external when we export their handleJason Ekstrand2017-11-173-1/+11
| | | | | | | | | | | | | | | | | | | | Almost all of our BO export paths were already properly marked the BO as external and added it to the handle table. Most export use-cases go through a prime fd or flink where we have a brw_bo export helper that does the right thing. The one missing one happens when you call queryImage and ask for __DRI_IMAGE_ATTRIB_HANDLE. We just grabbed the gem handle out of the BO (because it's really easy to do that) and handed it off to the client; what could go wrong? As it turns out, this path is used by basically every compositor that wants to turn around and call drmModeAddFB2 on it so it can hand it off to display. The result, as of 4b1e70cc57d7ff5f465544644b2180dee1490cee, is that we no longer set MOCS_PTE on those surfaces and the kernel's attempts to disable caching fail and we scanout gets corruption. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103759 Fixes: 4b1e70cc57d7ff5f465544644b2180dee1490cee Reviewed-by: Kenneth Graunke <[email protected]> Cc: [email protected]
* i965/bufmgr: Add a helper to mark a BO as externalJason Ekstrand2017-11-171-6/+11
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Cc: [email protected]
* i965: Revert Gen8 aspect of VF PIPE_CONTROL workaround.Kenneth Graunke2017-11-171-1/+5
| | | | | | | This apparently causes hangs on Broadwell, so let's back it out for now. I think there are other PIPE_CONTROL workarounds that we're missing. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103787
* i965: Rewrite disassembly annotation codeMatt Turner2017-11-171-1/+1
| | | | | | | | | | | | | | | The old code used an array to store each "instruction group" (the new, better name than the old overloaded "annotation"), and required a memmove() to shift elements over in the array when we needed to split a group so that we could add an error message. This was confusing and difficult to get right, not the least of which was because the array has a tail sentinel not included in .ann_count. Instead use a linked list, a data structure made for efficient insertion. Acked-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove DWord length from MI_FLUSH_DW definitionAnuj Phogat2017-11-171-1/+1
| | | | | | | | Fixes: 6165fda59b8 ("i965: Program DWord Length in MI_FLUSH_DW") Cc: <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Nanley Chery <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Upload invariant state once at the start of the batch on Gen4-5.Kenneth Graunke2017-11-164-13/+3
| | | | | | | | | | | | | | | | | | | | | | | | We want to emit invariant state at the start of a render batch. In the past, this more or less happened: a new batch flagged BRW_NEW_CONTEXT (because we don't have hardware contexts), which triggered the brw_invariant_state atom. So, it would be emitted before any 3D drawing. (Technically, there might be some BLT commands in the batch because Gen4-5 have a single combined render/BLT ring, but that should be harmless). With the advent of BLORP, this broke. The first item in a batch might be a BLORP operation, which bypasses the normal draw upload path. So, we need to ensure invariant state happens first. To do that, we just upload it when creating a new batch. On Gen6+ we'd need to worry about whether it's a RENDER or BLT batch, but because we have a combined ring, this approach should work fine on Gen4-5. Seems to fix GPU hangs when playing hardware accelerated video with mpv -hwdec=vaapi on Ironlake. Cc: [email protected] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103529 Reviewed-by: Jason Ekstrand <[email protected]>
* meson: Add dridriverdir variable to dri.pc.Rafael Antognolli2017-11-161-0/+1
| | | | | | | | | Xorg (and possibly other things) depend on this variable to find the path to DRI drivers. Signed-off-by: Rafael Antognolli <[email protected]> Cc: Dylan Baker <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* i915: add missing extensions.h includeEmil Velikov2017-11-162-0/+2
| | | | | | | | | | | Otherwise we'll bail with due to -Werror=implicit-function-declaration. It went unnoticed since the we had a bug which did consistently set the compiler flag. Fixes: ba8a347f932 ("mesa: split extensions overrides and glGetString(GL_EXTENSIONS)") Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Andres Gomez <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* mesa: split extensions overrides and glGetString(GL_EXTENSIONS)Emil Velikov2017-11-168-0/+8
| | | | | | | | | | | | | Currently we apply the extension overrides and construct the extensions string upon MakeCurrent. They are two distinct things, so let's slit the two while pushing the overrides management _before_ _mesa_compute_version(). This ensures that the version is updated to reflect the enabled/disabled extensions. Cc: Jordan Justen <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* i965: remove ARB_compute_shader extension overrideEmil Velikov2017-11-161-2/+1
| | | | | | | | | | | | Checking the override was useful in the early stages of developing the extension. Now that everything is wired, where possible, we can drop the check. Doing so allows us to simplify some of the related code. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: use _mesa_is_desktop_gl helperEmil Velikov2017-11-161-1/+1
| | | | | | | Use the helper over opencoding the check. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* i965: Implement another VF cache invalidate workaround on Gen8+.Kenneth Graunke2017-11-161-8/+33
| | | | | | | | | | | | ...and provide a better citation for the existing one. v2: - Apply the workaround to Gen8 too, as intended (caught by Topi). - Restructure to add bits instead of an extra flush (based on a similar patch by Rafael Antognolli). Cc: [email protected] Reviewed-by: Rafael Antognolli <[email protected]>
* i965: Drop some reserved space remnants.Kenneth Graunke2017-11-152-4/+1
| | | | | | BATCH_RESERVED was deleted in commit 2c46a67b4138631217 (i965: Delete BATCH_RESERVED handling.) The reserved_space field is dead code, and the comments aren't useful these days.
* i965: Fold ABO state upload code into the SSBO/UBO state upload code.Kenneth Graunke2017-11-1510-189/+16
| | | | | | | | | | | Having this separate could potentially make programs that rebind atomics but no other surfaces ever so slightly faster. But it's a tiny amount of code to add to the existing UBO/SSBO atom, and very related. The extra atoms have a cost on every draw call, and so dropping some of them would be nice. This also reclaims a dirty bit. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Use nir_lower_atomics_to_ssbos and delete ABO compiler code.Kenneth Graunke2017-11-153-11/+8
| | | | | | | | | | | | We use the same hardware mechanism for both atomic counters and SSBO atomics, so there's really no benefit to maintaining separate code to handle each case. Instead, we can just use Rob's shiny new NIR pass to convert atomic_uints to SSBOs, and delete piles of code. The ssbo_start section of the binding table becomes a combined ABO and SSBO section, with ABOs first, then SSBOs. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Make a better helper function for UBO/SSBO/ABO surface handling.Kenneth Graunke2017-11-153-94/+37
| | | | | | | | | This fixes the missing AutomaticSize handling in the ABO code, removes a bunch of duplicated code, and drops an extra layer of wrapping around brw_emit_buffer_surface_state(). Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Make use of brw_load_register_imm32() helper functionAnuj Phogat2017-11-145-40/+19
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Cc: Nanley Chery <[email protected]>
* i965/gen8+: Fix the number of dwords programmed in MI_FLUSH_DWAnuj Phogat2017-11-142-5/+19
| | | | | | | Number of dwords in MI_FLUSH_DW changed from 4 to 5 in gen8+. Signed-off-by: Anuj Phogat <[email protected]> Cc: <[email protected]>
* i965: Program DWord Length in MI_FLUSH_DWAnuj Phogat2017-11-142-2/+2
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Cc: <[email protected]>
* i965: implement (un)mapImageJulien Isorce2017-11-141-2/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Already implemented for Gallium drivers. Useful for gbm_bo_(un)map. Tests: By porting wayland/weston/clients/simple-dmabuf-drm.c to GBM. kmscube --mode=rgba kmscube --mode=nv12-1img kmscube --mode=nv12-2img piglit ext_image_dma_buf_import-refcount -auto piglit ext_image_dma_buf_import-transcode-nv12-as-r8-gr88 -auto piglit ext_image_dma_buf_import-sample_rgb -fmt=XR24 -alpha-one -auto piglit ext_image_dma_buf_import-sample_rgb -fmt=AR24 -auto piglit ext_image_dma_buf_import-sample_yuv -fmt=NV12 -auto piglit ext_image_dma_buf_import-sample_yuv -fmt=YU12 -auto piglit ext_image_dma_buf_import-sample_yuv -fmt=YV12 -auto v2: add early return if (flag & MAP_INTERNAL_MASK) v3: take input rect into account and test with kmscube and piglit. v4: handle wraparound and bo reference. v5: indent, exclude 0 width and height on the boundary, map bo independently of the image. Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
* i965: Track the depth and render caches separatelyJason Ekstrand2017-11-135-22/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we just had one hash set for tracking depth and render caches called brw_context::render_cache. This is less than ideal because the depth and render caches are separate and we can't track moves between the depth and the render caches. This limitation led to some unnecessary flushing around the depth cache. There are cases (mostly with BLORP) where we can end up touching a depth or stencil buffer through the render cache. To guard against this, blorp would unconditionally do a render_cache_set_check_flush on it's destination which meant that if you did any rendering (including a BLORP operation) to a given surface and then used it as a blorp destination, you would end up flushing it out of the render cache before rendering into it. Things get worse when you dig into the depth/stencil state code for regular GL draw calls. Because we may end up rendering to a depth or stencil buffer via BLORP, we did a render_cache_set_check_flush on all depth and stencil buffers in brw_emit_depthbuffer to ensure that they got flushed out of the render cache prior to using them for depth or stencil testing. However, because we also need to track dirtiness for depth and stencil so that we can implement depth and stencil texturing correctly, we were adding all depth and stencil buffers to the render cache set in brw_postdraw_set_buffers_need_resolve. This meant that, if anything caused 3DSTATE_DEPTH_BUFFER to get re-emitted (currently _NEW_BUFFERS, BRW_NEW_BATCH, and BRW_NEW_BLORP), we would almost always do a full pipeline stall and render/depth cache flush. The root cause of both of these problems is that we can't tell the difference between the render and depth caches in our tracking. This commit splits our cache tracking into two sets, one for render and one for depth, and properly handles transitioning between the two. We still flush all the caches whenever anything needs to be flushed. The idea is that if we're going to take the hit of a flush and stall, we may as well flush everything in the hopes that we can avoid a flush by something else later. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Add more destination flushingJason Ekstrand2017-11-131-1/+6
| | | | | | | | | Right now we just always flush the destination for render and aren't particularly careful about depth or stencil. Soon, flush_for_render isn't going to do the same thing as flush_for_depth and we may be doing a good deal less depth flushing so we should be a bit more precise. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add more precise cache tracking helpersJason Ekstrand2017-11-136-13/+49
| | | | | | | | In theory, this will let us track the depth and render caches separately. Right now, they're just wrappers around brw_render_cache_set_* Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add stencil buffers to cache set regardless of stencil texturingJason Ekstrand2017-11-131-3/+1
| | | | | | | | We may access them as a texture using blorp regardless of whether or not stencil texturing is enabled. Reviewed-by: Kenneth Graunke <[email protected]> Cc: [email protected]
* i965: Switch over to fully external-or-not MOCS schemeJason Ekstrand2017-11-133-29/+11
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use PTE MOCS for all external buffersJason Ekstrand2017-11-132-10/+18
| | | | | | | | | | | | | We were already using PTE for all render targets in case one happened to get scanned out. However, this still wasn't 100% correct because there are still possibly cases where we may want to texture from an external buffer even though we don't know the caching mode. This can happen, for instance, on buffers imported from another GPU via prime. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101691 Cc: "17.3" <[email protected]> Tested-by: Lyude Paul <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>