| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
We'd like to reuse this helper.
Cc: <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This function is no longer used.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
And rename to emit_copy_blit.
v2: sed --in-place -e 's/color_logic_ops/gl_logicop_mode/g' $(grep -lr
color_logic_ops src/) suggested by Brian.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> [v1]
|
|
|
|
|
|
|
|
| |
v2: sed --in-place -e 's/color_logic_ops/gl_logicop_mode/g' $(grep -lr
color_logic_ops src/) suggested by Brian.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> [v1]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Extend intel_miptree_blit() to handle at least
ARGB2101010 -> XRGB2101010, ARGB2101010 -> ARGB2101010,
and XRGB2101010 -> XRGB2101010 via the BLT engine,
but not XRGB2101010 -> ARGB2101010 yet.
This works as tested under Compiz, KDE-5, Gnome-Shell.
v2: Restrict BLT fast path to exclude XRGB2101010 -> ARGB2101010,
as intel_miptree_set_alpha_to_one() isn't ready to set 2 bit
alpha channels to 1.0 yet. However, couldn't find a test case
where this specific blit would be needed, so maybe not much
of a point to improve here.
Signed-off-by: Mario Kleiner <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Number of dwords in MI_FLUSH_DW changed from 4 to 5 in gen8+.
Signed-off-by: Anuj Phogat <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Cc: <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The kernel only cares about whether the object is to be written to or
not, only reduces (reloc.read_domains, reloc.write_domain) down to just
!!reloc.write_domain. When we use NO_RELOC, the kernel doesn't even read
those relocs and instead userspace has to pass that information in the
execobject.flags. We can simplify our reloc api by also removing the
unused read/write domains and only pass the resultant flags.
The caveat to the above are when we need to make the kernel aware that
certain objects need to take into account different work arounds.
Previously, this was done using the magic (INSTRUCTION, INSTRUCTION)
reloc domains. NO_RELOC requires this to be passed in the execobject
flags as well, and now we push that up the callstack.
The API is more compact, more expressive of what happens underneath, but
unfortunately requires more knowledge of the system at the point of use.
Conversely it also means that knowledge is specific and not generally
applied and so not overused.
text data bss dec hex filename
8502991 356912 424944 9284847 8dacef lib/i965_dri.so (before)
8500455 356912 424944 9282311 8da307 lib/i965_dri.so (after)
v2: (by Ken) Rebase.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Remember to add the offset to the start of the buffer in the relocation
or else we write 0xff into random bytes elsewhere.
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cacheline alignment restriction is on the base address; the pitch
can be anything.
Fixes assertion failures when using primus (say, on glxgears, which
creates a 300x300 linear BGRX surface with a pitch of 1200):
intel_blit.c:190: get_blit_intratile_offset_el: Assertion `mt->surf.row_pitch % 64 == 0' failed.
Cc: [email protected]
Reviewed-by: Chris Wilson <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: Do not concern cpp, pitch and tiling which are already
transitioned.
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
v2 (Daniel): Use isl tiling converters instead of introducing local.
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2 (Jason):
- Don't trigger miptree re-creation in vain later on with ISL
based. Core GL uses zero to indicate single sampled while
ISL uses one - this would cause intel_miptree_match_image()
to always fail.
- Now that native miptree is already using sample number of
one, there is no need for MAX2() when converting to ISL.
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Checking against zero currently works as single sampling is
represented with zero. Once one moves to isl single sampling
really has sample number of one.
This keeps later patches simpler.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The new version now takes a range of levels as well as a range of
layers. It should also be a tiny bit faster because it only walks the
resolve_map list once instead of once per layer.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
| |
Cc: "17.0 17.1" <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We were only using it for validating that we don't use Ys/Yf on gen8 and
earlier. Removing it from isl_tiling_get_info lets us remove it from a
bunch of other things that had no business needing a hardware
generation.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The width and height of the copy don't have to be aligned to the block
size if they specify the right or bottom edges of the image. (See also
the comment and asserts right above). We need to round them up when we
do the division in order to get it 100% right.
Reviewed-by: Ben Widawsky <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "17.0 17.1" <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The bacon is all gone.
This renames both the class and the related functions. We're about to
run indent on the bufmgr code, so no need to worry about fixing bad
indentation.
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The execbuf2 kernel API requires us to construct two kinds of lists.
First is a "validation list" (struct drm_i915_gem_exec_object2[])
containing each BO referenced by the batch. (The batch buffer itself
must be the last entry in this list.) Each validation list entry
contains a pointer to the second kind of list: a relocation list.
The relocation list contains information about pointers to BOs that
the kernel may need to patch up if it relocates objects within the VMA.
This is a very general mechanism, allowing every BO to contain pointers
to other BOs. libdrm_intel models this by giving each drm_intel_bo a
list of relocations to other BOs. Together, these form "reloc trees".
Processing relocations involves a depth-first-search of the relocation
trees, starting from the batch buffer. Care has to be taken not to
double-visit buffers. Creating the validation list has to be deferred
until the last minute, after all relocations are emitted, so we have the
full tree present. Calculating the amount of aperture space required to
pin those BOs also involves tree walking, which is expensive, so libdrm
has hacks to try and perform less expensive estimates.
For some reason, it also stored the validation list in the global
(per-screen) bufmgr structure, rather than as an local variable in the
execbuffer function, requiring locking for no good reason.
It also assumed that the batch would probably contain a relocation
every 2 DWords - which is absurdly high - and simply aborted if there
were more relocations than the max. This meant the first relocation
from a BO would allocate 180kB of data structures!
This is way too complicated for our needs. i965 only emits relocations
from the batchbuffer - all GPU commands and state such as SURFACE_STATE
live in the batch BO. No other buffer uses relocations. This means we
can have a single relocation list for the batchbuffer. We can add a BO
to the validation list (set) the first time we emit a relocation to it.
We can easily keep a running tally of the aperture space required for
that list by adding the BO size when we add it to the validation list.
This patch overhauls the relocation system to do exactly that. There
are many nice benefits:
- We have a flat relocation list instead of trees.
- We can produce the validation list up front.
- We can allocate smaller arrays and dynamically grow them.
- Aperture space checks are now (a + b <= c) instead of a tree walk.
- brw_batch_references() is a trivial validation list walk.
It should be straightforward to make it O(1) in the future.
- We don't need to bloat each drm_bacon_bo with 32B of reloc data.
- We don't need to lock in execbuffer, as the data structures are
context-local, and not per-screen.
- Significantly less code and a better match for what we're doing.
- The simpler system should make it easier to take advantage of
I915_EXEC_NO_RELOC in a future patch.
Improves performance in Synmark 7.0's OglBatch7:
- Skylake GT4e: 12.1499% +/- 2.29531% (n=130)
- Apollolake: 3.89245% +/- 0.598945% (n=35)
Improves performance in GFXBench4's gl_driver2 test:
- Skylake GT4e: 3.18616% +/- 0.867791% (n=229)
- Apollolake: 4.1776% +/- 0.240847% (n=120)
v2: Feedback from Chris Wilson:
- Omit explicit zero initializers for garbage execbuf fields.
- Use .rsvd1 = ctx_id rather than i915_execbuffer2_set_context_id
- Drop unnecessary fencing assertions.
- Only use _WR variant of execbuf ioctl when necessary.
- Shrink the arrays to be smaller by default.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Now we can actually test our changes.
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Eric renamed these from dri_bufmgr_* and intel_bufmgr_* to drm_intel_*
in libdrm commit 4b9826408f65976a1a13387beda748b65e03ec52, circa 2008,
but we've been using the legacy names this whole time.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fast copy blit was primarily added to support Yf/Ys detiling.
But, Yf/Ys tiling never got used in i965 due to not delivering
the expected performance benefits. Also, replacing legacy blits
with fast copy blit didn't help the benchmarking numbers. This
is probably due to a h/w restriction that says "start pixel for
Fast Copy blit should be on an OWord boundary". This restriction
causes many blit operations to skip fast copy blit and use legacy
blits. So, this patch is deleting this dead code in favor of
adding it later when we actually find it useful.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At this point, the pitch is in bytes. We haven't yet divided the pitch
by 4 for tiled surfaces, so abs(pitch) may be larger than 32K. This
means the bit 15 trick won't work.
The caller now has signed integers anyway, so just pass those through
and do the obvious check.
Cc: "17.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When trying to blit larger tiled surfaces, the pitch can be larger than
32768 bytes, which means it won't fit in a GLshort. Passing it in will
truncate the stride to 0, which has...surprising results.
The pitch can be up to 32,768 DWords, or 128kB. We measure it in bytes,
but divide by 4 when programming it. So we need to handle values up to
131,072. Switch from GLshort to int32_t to avoid the truncation.
Fixes GL45-CTS.gtf30.GL3Tests.depth_texture.depth_texture_copyteximage
at widths greater than 8192.
v2: Use int32_t as negative values can be used (Jason).
Cc: "17.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
By using emit_miptree_blit which does chunking, this fixes the blitter path
for the case where the image is too tall to blit normally. We also pull it
into intel_blit as intel_miptree_copy. This matches the naming of the
blorp blit and copy functions brw_blorp_blit and brw_blorp_copy.
Reviewed-by: Anuj Phogat <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Anuj Phogat <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: Make intel_miptree_resolve_color() take start layer and
layer count.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows us to blit much larger images than if we use the blitter
directly. In particular, it gives us an almost infinite image height
compared to the fairly limiting 32k. We do, however, still have a
restriction on stride of the image because handling larger strides, while
possible, is fairly difficult.
v2: Properly handle linear blit alignment restrictions
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: Properly handle linear blit alignment restrictions
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This assertion, while valid for linear buffers, doesn't work properly for
tiled memory. It used to work most of the time because the offset provided
was always to the left-hand edge of the image. However, if you use a byte
offset to get to the inside of the image, the height * stride calculation
may actually end up being too large.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
| |
More than half of the stuff in intel_reg.h had nothing whatsoever to do
with registers and really belongs in brw_defines.h anyway.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Modern OpenGL BlitFramebuffer require sRGB encode/decode when
GL_FRAMEBUFFER_SRGB is enabled. The blitter can't handle this,
so we need to bail. On Gen4-5, this means falling back to Meta,
which should handle it.
We allow sRGB <-> sRGB blits, as decode then encode ought to be a noop
(other than potential precision loss, which nobody wants anyway).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Re-add the "return false" that was removed in 0c02d7002d6c005b4c1fe997b5ef5916978dd183
It seems that something went wrong when merging the patch. The patch
sent to the mailing list does not directly match what was committed.
https://lists.freedesktop.org/archives/mesa-dev/2016-May/118198.html
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
XY_FAST_COPY_BLT command doesn't have a field for raster operation. So, fall
back to using XY_SRC_COPY_BLT to handle those cases.
Fixes piglit test gl-1.1-xor-copypixels when fast copy blit is enabled
for all tiling formats.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Experimentation with different values of src/dst horizontal/vertical
alignment showed that these fileds are not used on gen9 hardware.
A recent update in graphics specs has removed these fields from
XY_FAST_COPY_BLT command.
Cc: Ben Widawsky <[email protected]>
Cc: Chad Versace <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now there has been only one type of color buffer that needs
to resolved - namely single sampled fast clear. As even the
sampler engine in GPU doesn't understand the associated meta data,
the color values need to be always resolved prior to reading them.
From SKL onwards there is new scheme supported called the lossless
compression of single sampled color buffers. This is something that
is understood by the sampling engine and therefore resolving of
these types of buffers is not necessary before sampling.
This patch adds means to make the distinction when considering if
resolve is needed.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
| |
This allows the fallback paths to handle it correctly.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
Overlapping blits are anyway undefined in OpenGL. So no need
of overlap check here.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
Fast copy blit is currently enabled for use only with Yf/Ys tiling.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|