path: root/src/mesa/drivers/dri/i965/intel_blit.c
Commit message | Author | Age | Files | Lines
* i965: Make blt_pitch public | Nanley Chery | 2018-07-12 | 1 | -10/+2
  We'd like to reuse this helper.

  Cc: <[email protected]>
  Reviewed-by: Chris Wilson <[email protected]>
* i965: Remove ring switching entirely | Jason Ekstrand | 2018-05-22 | 1 | -3/+3
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blit: Delete intel_emit_linear_blit | Jason Ekstrand | 2018-05-22 | 1 | -56/+0
  This function is no longer used.

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make intelEmitCopyBlit static | Ian Romanick | 2018-01-26 | 1 | -199/+199
  And rename to emit_copy_blit.

  v2: sed --in-place -e 's/color_logic_ops/gl_logicop_mode/g' $(grep -lr color_logic_ops src/) suggested by Brian.

  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> [v1]
* i965: Use enum color_logic_ops for blits | Ian Romanick | 2018-01-26 | 1 | -27/+9
  v2: sed --in-place -e 's/color_logic_ops/gl_logicop_mode/g' $(grep -lr color_logic_ops src/) suggested by Brian.

  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> [v1]
* i965: Support accelerated blit for depth 30 formats. (v2) | Mario Kleiner | 2018-01-03 | 1 | -1/+19
  Extend intel_miptree_blit() to handle at least ARGB2101010 -> XRGB2101010, ARGB2101010 -> ARGB2101010, and XRGB2101010 -> XRGB2101010 via the BLT engine, but not XRGB2101010 -> ARGB2101010 yet.

  This works as tested under Compiz, KDE-5, Gnome-Shell.

  v2: Restrict BLT fast path to exclude XRGB2101010 -> ARGB2101010, as intel_miptree_set_alpha_to_one() isn't ready to set 2 bit alpha channels to 1.0 yet. However, couldn't find a test case where this specific blit would be needed, so maybe not much of a point to improve here.

  Signed-off-by: Mario Kleiner <[email protected]>
  Reviewed-by: Tapani Pälli <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
* i965/gen8+: Fix the number of dwords programmed in MI_FLUSH_DW | Anuj Phogat | 2017-11-14 | 1 | -3/+14
  Number of dwords in MI_FLUSH_DW changed from 4 to 5 in gen8+.

  Signed-off-by: Anuj Phogat <[email protected]>
  Cc: <[email protected]>
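The dword-count change described in the entry above can be sketched in a few lines of C. This is only an illustration: the helper names are hypothetical, and the "total dwords minus 2" encoding of the DWord Length field is an assumption about the usual MI command convention, not something the log itself states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: total size of an MI_FLUSH_DW packet in dwords.
 * Per the commit message, the packet grew from 4 to 5 dwords on gen8+. */
static inline uint32_t mi_flush_dw_num_dwords(int gen)
{
   return gen >= 8 ? 5 : 4;
}

/* Hypothetical helper: value for the DWord Length field.
 * Assumption: the field holds "total dwords - 2", as is usual for
 * MI commands; check the hardware documentation before relying on it. */
static inline uint32_t mi_flush_dw_length_field(int gen)
{
   uint32_t dwords = mi_flush_dw_num_dwords(gen);

   assert(dwords >= 2);
   return dwords - 2;
}
```

A caller would reserve `mi_flush_dw_num_dwords(gen)` dwords of batch space and OR `mi_flush_dw_length_field(gen)` into the command header, so the two values cannot drift apart.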
* i965: Program DWord Length in MI_FLUSH_DW | Anuj Phogat | 2017-11-14 | 1 | -1/+1
  Signed-off-by: Anuj Phogat <[email protected]>
  Cc: <[email protected]>
* i965: fix unused var warnings in release build | Timothy Arceri | 2017-10-25 | 1 | -3/+1
  Reviewed-by: Jordan Justen <[email protected]>
* i965: drop brw->gen in favor of devinfo->gen | Lionel Landwerlin | 2017-08-30 | 1 | -12/+19
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* i965: Reduce passing 2x32b of reloc_domains to 2 bits | Chris Wilson | 2017-08-04 | 1 | -24/+8
  The kernel only cares about whether the object is to be written to or not, and reduces (reloc.read_domains, reloc.write_domain) down to just !!reloc.write_domain. When we use NO_RELOC, the kernel doesn't even read those relocs and instead userspace has to pass that information in the execobject.flags. We can simplify our reloc api by also removing the unused read/write domains and only pass the resultant flags.

  The caveat to the above is when we need to make the kernel aware that certain objects need to take into account different workarounds. Previously, this was done using the magic (INSTRUCTION, INSTRUCTION) reloc domains. NO_RELOC requires this to be passed in the execobject flags as well, and now we push that up the callstack.

  The API is more compact, more expressive of what happens underneath, but unfortunately requires more knowledge of the system at the point of use. Conversely it also means that knowledge is specific and not generally applied and so not overused.

     text    data     bss     dec     hex filename
  8502991  356912  424944 9284847  8dacef lib/i965_dri.so (before)
  8500455  356912  424944 9282311  8da307 lib/i965_dri.so (after)

  v2: (by Ken) Rebase.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blit: Remember to include miptree buffer offset in relocs | Chris Wilson | 2017-08-02 | 1 | -2/+2
  Remember to add the offset to the start of the buffer in the relocation or else we write 0xff into random bytes elsewhere.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Cc: [email protected]
* i965: Delete pitch alignment assertion in get_blit_intratile_offset_el. | Kenneth Graunke | 2017-08-02 | 1 | -1/+0
  The cacheline alignment restriction is on the base address; the pitch can be anything.

  Fixes assertion failures when using primus (say, on glxgears, which creates a 300x300 linear BGRX surface with a pitch of 1200):

  intel_blit.c:190: get_blit_intratile_offset_el: Assertion `mt->surf.row_pitch % 64 == 0' failed.

  Cc: [email protected]
  Reviewed-by: Chris Wilson <[email protected]>
* i965/miptree: Clean-up unused | Topi Pohjolainen | 2017-07-22 | 1 | -22/+11
  Reviewed-by: Jason Ekstrand <[email protected]>
  Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/miptree: Prepare intel_miptree_copy() for isl based | Topi Pohjolainen | 2017-07-20 | 1 | -4/+16
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Signed-off-by: Topi Pohjolainen <[email protected]>
* i965: Prepare blit engine for isl based miptrees | Topi Pohjolainen | 2017-07-20 | 1 | -5/+11
  v2: Do not concern cpp, pitch and tiling which are already transitioned.

  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/miptree: Switch to isl_surf::row_pitch | Topi Pohjolainen | 2017-07-20 | 1 | -6/+7
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/miptree: Switch to isl_surf::tiling | Topi Pohjolainen | 2017-07-20 | 1 | -25/+29
  v2 (Daniel): Use isl tiling converters instead of introducing local.

  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/miptree: Switch to isl_surf::samples | Topi Pohjolainen | 2017-07-20 | 1 | -2/+2
  v2 (Jason):
  - Don't trigger miptree re-creation in vain later on with ISL based. Core GL uses zero to indicate single sampled while ISL uses one - this would cause intel_miptree_match_image() to always fail.
  - Now that native miptree is already using sample number of one, there is no need for MAX2() when converting to ISL.

  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/miptree: Use > 1 instead of > 0 to check for multisampling | Topi Pohjolainen | 2017-07-18 | 1 | -2/+2
  Checking against zero currently works as single sampling is represented with zero. Once one moves to isl, single sampling really has a sample number of one. This keeps later patches simpler.

  Reviewed-by: Emil Velikov <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Signed-off-by: Topi Pohjolainen <[email protected]>
* i965: Use the new resolve function for several simple cases | Jason Ekstrand | 2017-06-07 | 1 | -10/+4
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
* i965/miptree: Refactor intel_miptree_resolve_color | Jason Ekstrand | 2017-06-07 | 1 | -4/+4
  The new version now takes a range of levels as well as a range of layers. It should also be a tiny bit faster because it only walks the resolve_map list once instead of once per layer.

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
* i965: Mark depth surfaces as needing a HiZ resolve after blitting | Jason Ekstrand | 2017-06-07 | 1 | -0/+2
  Cc: "17.0 17.1" <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
* intel/isl: Make get_intratile_offset_el take the element size in bits | Jason Ekstrand | 2017-06-01 | 1 | -1/+1
  Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/isl: Remove the device parameter from isl_tiling_get_info | Jason Ekstrand | 2017-06-01 | 1 | -2/+1
  We were only using it for validating that we don't use Ys/Yf on gen8 and earlier. Removing it from isl_tiling_get_info lets us remove it from a bunch of other things that had no business needing a hardware generation.

  Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Round copy size to the nearest block in intel_miptree_copy | Jason Ekstrand | 2017-05-26 | 1 | -2/+2
  The width and height of the copy don't have to be aligned to the block size if they specify the right or bottom edges of the image. (See also the comment and asserts right above). We need to round them up when we do the division in order to get it 100% right.

  Reviewed-by: Ben Widawsky <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Cc: "17.0 17.1" <[email protected]>
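The rounding issue in the entry above comes down to using a round-up division when converting a pixel extent to a block count. A minimal sketch, with hypothetical helper names that are not taken from the Mesa source:

```c
#include <assert.h>
#include <stdint.h>

/* Integer division that rounds up instead of truncating. */
static inline uint32_t div_round_up(uint32_t n, uint32_t d)
{
   assert(d != 0);
   return (n + d - 1) / d;
}

/* Hypothetical helper: convert a copy extent in pixels to an extent in
 * compression blocks.  An extent that touches the right or bottom edge of
 * the image may cover only part of the last block, so a truncating
 * division would drop that block. */
static inline uint32_t copy_extent_in_blocks(uint32_t extent_px,
                                             uint32_t block_px)
{
   return div_round_up(extent_px, block_px);
}
```

For a 10-pixel-wide copy with 4-pixel blocks, a plain `10 / 4` gives 2 and loses the partial third block, while the rounded-up form gives 3.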
* i965/drm: Rename drm_bacon_bo to brw_bo. | Kenneth Graunke | 2017-04-10 | 1 | -5/+5
  The bacon is all gone.

  This renames both the class and the related functions. We're about to run indent on the bufmgr code, so no need to worry about fixing bad indentation.

  Acked-by: Jason Ekstrand <[email protected]>
* i965/drm: Rewrite relocation handling. | Kenneth Graunke | 2017-04-10 | 1 | -22/+8
  The execbuf2 kernel API requires us to construct two kinds of lists. First is a "validation list" (struct drm_i915_gem_exec_object2[]) containing each BO referenced by the batch. (The batch buffer itself must be the last entry in this list.) Each validation list entry contains a pointer to the second kind of list: a relocation list. The relocation list contains information about pointers to BOs that the kernel may need to patch up if it relocates objects within the VMA.

  This is a very general mechanism, allowing every BO to contain pointers to other BOs. libdrm_intel models this by giving each drm_intel_bo a list of relocations to other BOs. Together, these form "reloc trees".

  Processing relocations involves a depth-first-search of the relocation trees, starting from the batch buffer. Care has to be taken not to double-visit buffers. Creating the validation list has to be deferred until the last minute, after all relocations are emitted, so we have the full tree present. Calculating the amount of aperture space required to pin those BOs also involves tree walking, which is expensive, so libdrm has hacks to try and perform less expensive estimates.

  For some reason, it also stored the validation list in the global (per-screen) bufmgr structure, rather than as a local variable in the execbuffer function, requiring locking for no good reason.

  It also assumed that the batch would probably contain a relocation every 2 DWords - which is absurdly high - and simply aborted if there were more relocations than the max. This meant the first relocation from a BO would allocate 180kB of data structures!

  This is way too complicated for our needs. i965 only emits relocations from the batchbuffer - all GPU commands and state such as SURFACE_STATE live in the batch BO. No other buffer uses relocations. This means we can have a single relocation list for the batchbuffer. We can add a BO to the validation list (set) the first time we emit a relocation to it. We can easily keep a running tally of the aperture space required for that list by adding the BO size when we add it to the validation list.

  This patch overhauls the relocation system to do exactly that. There are many nice benefits:
  - We have a flat relocation list instead of trees.
  - We can produce the validation list up front.
  - We can allocate smaller arrays and dynamically grow them.
  - Aperture space checks are now (a + b <= c) instead of a tree walk.
  - brw_batch_references() is a trivial validation list walk. It should be straightforward to make it O(1) in the future.
  - We don't need to bloat each drm_bacon_bo with 32B of reloc data.
  - We don't need to lock in execbuffer, as the data structures are context-local, and not per-screen.
  - Significantly less code and a better match for what we're doing.
  - The simpler system should make it easier to take advantage of I915_EXEC_NO_RELOC in a future patch.

  Improves performance in Synmark 7.0's OglBatch7:
  - Skylake GT4e: 12.1499% +/- 2.29531% (n=130)
  - Apollolake: 3.89245% +/- 0.598945% (n=35)

  Improves performance in GFXBench4's gl_driver2 test:
  - Skylake GT4e: 3.18616% +/- 0.867791% (n=229)
  - Apollolake: 4.1776% +/- 0.240847% (n=120)

  v2: Feedback from Chris Wilson:
  - Omit explicit zero initializers for garbage execbuf fields.
  - Use .rsvd1 = ctx_id rather than i915_execbuffer2_set_context_id
  - Drop unnecessary fencing assertions.
  - Only use _WR variant of execbuf ioctl when necessary.
  - Shrink the arrays to be smaller by default.

  Reviewed-by: Jason Ekstrand <[email protected]>
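The flat validation list with a running aperture tally, as described in the entry above, can be sketched roughly as follows. Every name here (`sketch_bo`, `sketch_batch`, and the functions) is hypothetical and chosen for illustration; the real driver grows its arrays dynamically, whereas this sketch uses a fixed-size array to stay short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_BOS 64 /* fixed capacity, an assumption for brevity */

struct sketch_bo {
   uint64_t size; /* size of the buffer object in bytes */
};

struct sketch_batch {
   const struct sketch_bo *validation_list[SKETCH_MAX_BOS];
   size_t bo_count;
   uint64_t aperture_used; /* running tally of BO sizes in the list */
};

/* Linear walk of the flat validation list (the "trivial walk" in the log). */
static bool sketch_batch_references(const struct sketch_batch *batch,
                                    const struct sketch_bo *bo)
{
   for (size_t i = 0; i < batch->bo_count; i++) {
      if (batch->validation_list[i] == bo)
         return true;
   }
   return false;
}

/* Add a BO the first time a relocation targets it and update the tally.
 * Returns false only if the fixed-size list is full. */
static bool sketch_add_reloc_target(struct sketch_batch *batch,
                                    const struct sketch_bo *bo)
{
   if (sketch_batch_references(batch, bo))
      return true;
   if (batch->bo_count == SKETCH_MAX_BOS)
      return false;

   batch->validation_list[batch->bo_count++] = bo;
   batch->aperture_used += bo->size;
   return true;
}

/* The aperture check reduces to a + b <= c. */
static bool sketch_fits_in_aperture(const struct sketch_batch *batch,
                                    uint64_t extra_bytes,
                                    uint64_t aperture_size)
{
   return batch->aperture_used + extra_bytes <= aperture_size;
}
```

Because the tally is updated at insertion time, the space check never has to walk any list or tree, which is the simplification the commit message highlights.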
* i965/drm: Use our internal libdrm (drm_bacon) rather than the real one. | Kenneth Graunke | 2017-04-10 | 1 | -9/+9
  Now we can actually test our changes.

  Acked-by: Jason Ekstrand <[email protected]>
* i965: Stop using legacy dri_bufmgr_* and intel_* names. | Kenneth Graunke | 2017-03-30 | 1 | -1/+1
  Eric renamed these from dri_bufmgr_* and intel_bufmgr_* to drm_intel_* in libdrm commit 4b9826408f65976a1a13387beda748b65e03ec52, circa 2008, but we've been using the legacy names this whole time.

  Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Delete fast copy blit code | Anuj Phogat | 2017-03-27 | 1 | -183/+48
  Fast copy blit was primarily added to support Yf/Ys detiling. But, Yf/Ys tiling never got used in i965 due to not delivering the expected performance benefits. Also, replacing legacy blits with fast copy blit didn't help the benchmarking numbers. This is probably due to a h/w restriction that says "start pixel for Fast Copy blit should be on an OWord boundary". This restriction causes many blit operations to skip fast copy blit and use legacy blits. So, this patch is deleting this dead code in favor of adding it later when we actually find it useful.

  Signed-off-by: Anuj Phogat <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Fix check for negative pitch in can_do_fast_copy_blit(). | Kenneth Graunke | 2017-01-29 | 1 | -6/+4
  At this point, the pitch is in bytes. We haven't yet divided the pitch by 4 for tiled surfaces, so abs(pitch) may be larger than 32K. This means the bit 15 trick won't work.

  The caller now has signed integers anyway, so just pass those through and do the obvious check.

  Cc: "17.0" <[email protected]>
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Make intelEmitCopyBlit not truncate large strides. | Kenneth Graunke | 2017-01-26 | 1 | -9/+5
  When trying to blit larger tiled surfaces, the pitch can be larger than 32768 bytes, which means it won't fit in a GLshort. Passing it in will truncate the stride to 0, which has...surprising results.

  The pitch can be up to 32,768 DWords, or 128kB. We measure it in bytes, but divide by 4 when programming it. So we need to handle values up to 131,072. Switch from GLshort to int32_t to avoid the truncation.

  Fixes GL45-CTS.gtf30.GL3Tests.depth_texture.depth_texture_copyteximage at widths greater than 8192.

  v2: Use int32_t as negative values can be used (Jason).

  Cc: "17.0" <[email protected]>
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
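The stride truncation described in the entry above is easy to reproduce in isolation. The sketch below uses hypothetical function names and `int16_t` as a stand-in for GLshort; note that converting an out-of-range value to a signed 16-bit type is implementation-defined in C, and the wrap-around shown is what common compilers such as GCC and Clang do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: what a byte pitch looks like after being passed through a
 * 16-bit signed parameter (the old GLshort argument type).  On common
 * compilers the value wraps modulo 2^16. */
static inline int32_t pitch_through_16_bits(int32_t pitch_bytes)
{
   int16_t narrowed = (int16_t)pitch_bytes;
   return narrowed;
}

/* Hypothetical: tiled pitches are programmed in dwords, so the byte pitch
 * is divided by 4 just before it is written to the command. */
static inline int32_t tiled_pitch_in_dwords(int32_t pitch_bytes)
{
   return pitch_bytes / 4;
}
```

With an `int32_t` parameter, the maximum byte pitch of 131,072 survives intact and divides down to the 32,768 dwords mentioned in the commit message; negative pitches remain representable as well.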
* i965/blit: Fix the src dimension sanity check in miptree_copy | Jason Ekstrand | 2016-12-13 | 1 | -2/+10
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Cc: "13.0" <[email protected]>
* i965/copy_image: Re-implement the blitter path with emit_miptree_blit | Jason Ekstrand | 2016-12-05 | 1 | -0/+68
  By using emit_miptree_blit which does chunking, this fixes the blitter path for the case where the image is too tall to blit normally. We also pull it into intel_blit as intel_miptree_copy. This matches the naming of the blorp blit and copy functions brw_blorp_blit and brw_blorp_copy.

  Reviewed-by: Anuj Phogat <[email protected]>
  Cc: "13.0" <[email protected]>
* i965/blit: Break the guts of intel_miptree_blit into a helper | Jason Ekstrand | 2016-12-05 | 1 | -67/+84
  Reviewed-by: Anuj Phogat <[email protected]>
  Cc: "13.0" <[email protected]>
* i965: Provide slice details to color resolver | Topi Pohjolainen | 2016-11-25 | 1 | -2/+2
  v2: Make intel_miptree_resolve_color() take start layer and layer count.

  Signed-off-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/blit: Break blits into chunks in intel_miptree_blit | Jason Ekstrand | 2016-10-27 | 1 | -23/+41
  This allows us to blit much larger images than if we use the blitter directly. In particular, it gives us an almost infinite image height compared to the fairly limiting 32k. We do, however, still have a restriction on stride of the image because handling larger strides, while possible, is fairly difficult.

  v2: Properly handle linear blit alignment restrictions

  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
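The chunking idea in the entry above can be illustrated with a loop that splits a tall blit into pieces no taller than the blitter limit. This is a simplified sketch: the names are hypothetical, the 32768-row limit is read off the "32k" in the commit message, and it splits only along the height, ignoring the alignment restrictions the real code has to respect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed per-blit height limit, taken from "the fairly limiting 32k". */
#define SKETCH_MAX_CHUNK_ROWS 32768u

/* Hypothetical callback that would emit one blit for rows [y, y + rows). */
typedef void (*sketch_blit_fn)(uint32_t y, uint32_t rows, void *data);

/* Split a blit of `height` rows into chunks and return the chunk count.
 * `emit` may be NULL, in which case the chunks are only counted. */
static uint32_t sketch_blit_in_chunks(uint32_t height, sketch_blit_fn emit,
                                      void *data)
{
   uint32_t chunks = 0;
   uint32_t y = 0;

   while (y < height) {
      uint32_t rows = height - y;

      if (rows > SKETCH_MAX_CHUNK_ROWS)
         rows = SKETCH_MAX_CHUNK_ROWS;
      if (emit)
         emit(y, rows, data);

      y += rows; /* never exceeds height, so it cannot overflow */
      chunks++;
   }
   return chunks;
}
```

A 100,000-row image, for example, would be emitted as three full 32,768-row chunks plus one 1,696-row remainder.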
* i965/blit: Break blits into chunks in set_alpha_to_one | Jason Ekstrand | 2016-10-27 | 1 | -15/+73
  v2: Properly handle linear blit alignment restrictions

  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965/blit: Remove a bogus assertion | Jason Ekstrand | 2016-10-27 | 1 | -4/+0
  This assertion, while valid for linear buffers, doesn't work properly for tiled memory. It used to work most of the time because the offset provided was always to the left-hand edge of the image. However, if you use a byte offset to get to the inside of the image, the height * stride calculation may actually end up being too large.

  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Roll intel_reg.h into brw_defines.h | Jason Ekstrand | 2016-08-19 | 1 | -1/+0
  More than half of the stuff in intel_reg.h had nothing whatsoever to do with registers and really belongs in brw_defines.h anyway.

  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Bail on the BLT path if BlitFramebuffer requires sRGB conversion. | Kenneth Graunke | 2016-08-08 | 1 | -2/+2
  Modern OpenGL BlitFramebuffer requires sRGB encode/decode when GL_FRAMEBUFFER_SRGB is enabled. The blitter can't handle this, so we need to bail. On Gen4-5, this means falling back to Meta, which should handle it.

  We allow sRGB <-> sRGB blits, as decode then encode ought to be a noop (other than potential precision loss, which nobody wants anyway).

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965: add missing return in if statement | Thomas Hindoe Paaboel Andersen | 2016-05-28 | 1 | -0/+1
  Re-add the "return false" that was removed in 0c02d7002d6c005b4c1fe997b5ef5916978dd183

  It seems that something went wrong when merging the patch. The patch sent to the mailing list does not directly match what was committed.
  https://lists.freedesktop.org/archives/mesa-dev/2016-May/118198.html

  Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Don't use fast copy blit in case of logical operations other than GL_COPY | Anuj Phogat | 2016-05-26 | 1 | -2/+7
  XY_FAST_COPY_BLT command doesn't have a field for raster operation. So, fall back to using XY_SRC_COPY_BLT to handle those cases.

  Fixes piglit test gl-1.1-xor-copypixels when fast copy blit is enabled for all tiling formats.

  Signed-off-by: Anuj Phogat <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965/gen9: Remove the halign/valign field setup code in fast copy blit | Anuj Phogat | 2016-05-26 | 1 | -65/+0
  Experimentation with different values of src/dst horizontal/vertical alignment showed that these fields are not used on gen9 hardware. A recent update in graphics specs has removed these fields from XY_FAST_COPY_BLT command.

  Cc: Ben Widawsky <[email protected]>
  Cc: Chad Versace <[email protected]>
  Signed-off-by: Anuj Phogat <[email protected]>
  Reviewed-by: Ben Widawsky <[email protected]>
* i965: Add means for limiting color resolves | Topi Pohjolainen | 2016-02-13 | 1 | -2/+2
  Until now there has been only one type of color buffer that needs to be resolved - namely single sampled fast clear. As even the sampler engine in GPU doesn't understand the associated meta data, the color values need to be always resolved prior to reading them.

  From SKL onwards there is a new scheme supported called the lossless compression of single sampled color buffers. This is something that is understood by the sampling engine and therefore resolving of these types of buffers is not necessary before sampling.

  This patch adds means to make the distinction when considering if resolve is needed.

  Signed-off-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Ben Widawsky <[email protected]>
* i965/gen9: Return false in place of assert in intelEmitCopyBlit() | Anuj Phogat | 2016-01-05 | 1 | -3/+4
  This allows the fallback paths to handle it correctly.

  Signed-off-by: Anuj Phogat <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* i965/gen9: Remove regions overlap check in fast copy blit | Anuj Phogat | 2016-01-05 | 1 | -5/+0
  Overlapping blits are undefined in OpenGL anyway, so there is no need for an overlap check here.

  Signed-off-by: Anuj Phogat <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* i965/gen9: Don't use fast copy blit in case of non power of 2 cpp | Anuj Phogat | 2016-01-05 | 1 | -2/+4
  Fast copy blit is currently enabled for use only with Yf/Ys tiling.

  Signed-off-by: Anuj Phogat <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* i965: remove unneeded #include of colormac.h | Mark Janes | 2015-10-06 | 1 | -1/+0
  Reviewed-by: Matt Turner <[email protected]>