| Commit message (Collapse) | Author | Age | Files | Lines |
|
This was for some of the old spans-related code that is now gone.
Reviewed-by: Kenneth Graunke <[email protected]>
|
This patch reduces the time spent in glTexImage and glTexSubImage by
over 5x on Sandybridge for the workload described below.
It adds a new fast path for glTexImage2D and glTexSubImage2D,
intel_texsubimage_tiled_memcpy, which is optimized for Google Chrome's
paint rectangles. The fast path is implemented only for 2D GL_BGRA
textures for chipsets with a LLC.
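As a rough illustration of the address arithmetic a tiled memcpy has to perform, here is a minimal sketch. It is not the code from this patch. It assumes X-tiling with tiles 512 bytes wide and 8 rows high (4096 bytes per tile), a pitch that is a multiple of the tile width, and a bit-6 swizzle that XORs address bits 9 and 10 into bit 6. The names `xtile_offset` and `copy_row_to_xtiled` and their parameters are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumed X-tile geometry: 512 bytes wide, 8 rows, 4096 bytes per tile. */
enum { XTILE_WIDTH = 512, XTILE_HEIGHT = 8, XTILE_SIZE = 4096 };

/* Byte offset of (x, y) in an X-tiled buffer; x and pitch are in bytes.
 * Hypothetical helper, assuming pitch is a multiple of XTILE_WIDTH. */
static uint32_t
xtile_offset(uint32_t x, uint32_t y, uint32_t pitch, int swizzle)
{
   uint32_t tiles_per_row = pitch / XTILE_WIDTH;
   uint32_t tile = (y / XTILE_HEIGHT) * tiles_per_row + x / XTILE_WIDTH;
   uint32_t off = tile * XTILE_SIZE
                + (y % XTILE_HEIGHT) * XTILE_WIDTH
                + x % XTILE_WIDTH;

   /* Assumed swizzle: bit 6 is XORed with bits 9 and 10. Shifting by 3
    * and 4 moves those bits down to position 6. */
   if (swizzle)
      off ^= ((off >> 3) ^ (off >> 4)) & (1u << 6);
   return off;
}

/* Copy one linear row into the tiled buffer. The swizzle only touches
 * bit 6, so every 64-byte-aligned span stays contiguous and can be
 * copied with a single memcpy. */
static void
copy_row_to_xtiled(uint8_t *tiled, const uint8_t *linear,
                   uint32_t x0, uint32_t y, uint32_t len,
                   uint32_t pitch, int swizzle)
{
   uint32_t x = x0;
   while (len > 0) {
      uint32_t chunk = 64 - (x % 64);
      if (chunk > len)
         chunk = len;
      memcpy(tiled + xtile_offset(x, y, pitch, swizzle), linear, chunk);
      linear += chunk;
      x += chunk;
      len -= chunk;
   }
}
```

Copying in 64-byte spans rather than byte by byte is the point of the sketch: the per-span address computation is cheap compared with the memcpy it enables.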
=== Performance Analysis ===
Workload description:
Personalize your google.com page with a wallpaper. Start chromium
with flags "--ignore-gpu-blacklist --enable-accelerated-painting
--force-compositing-mode". Start recording with chrome://tracing. Visit
google.com and wait for page to finish rendering. Measure the time spent
by process CrGpuMain in GLES2DecoderImpl::HandleTexImage2D and
HandleTexSubImage2D.
System config:
cpu: Sandybridge Mobile GT2+ (0x0126)
kernel 3.4.9 x86_64
chromium 21.0.1180.89 (154005)
Statistics:
              | N   Median   Avg      Stddev
--------------|------------------------------
before (msec) | 8   472.5    463.75   72.6
after (msec)  | 8    78.0     79.6     5.7
Arithmetic difference at 95.0% confidence:
-384.1 +/- 55.2 msec
-82.8% +/- 11.9%
Ratio at 95.0% confidence:
5.81 +/- 0.119
v2:
- Replace check for `intel->gen >= 6` with `intel->has_llc`, per
danvet.
- Fix typo in comment, s/throuh/through/.
- Swap 'before' and 'after' rows in stat table.
v3:
- If the current batch references the bo, then flush batch before mapping
the bo. Found by Chris.
- Restrict supported texture images to level 0 of target
GL_TEXTURE_2D. This avoids an arithmetic bug in calculating image
offsets within the miptree, found by Paul. This restriction does not
diminish this patch's benefit to Chrome OS performance.
- Use fewer instructions for bit6 swizzling, suggested by Paul.
- Remove erroneous comment about Y-tiling, for Paul.
- Print perf_debug messages when flushing and stalling.
- Update stats in commit message; run workload under a release build
rather than a debug build.
Note: This is a candidate for the 9.0 branch.
Acked-by: Eric Anholt <[email protected]>
CC: Stéphane Marchesin <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|
Replace target, level parameters with gl_texture_image.
Add gl_renderbuffer parameter to indicate source buffer for the copy.
This removes some redundant code in the drivers to find the source
renderbuffer and the destination texture image (which we already had
in _mesa_CopyTexSubImage).
Signed-off-by: Brian Paul <[email protected]>
|
Before, we had an uncached read of S8 to untile, then a RMW (so
uncached penalty) of the packed S8Z24 to store the value, then the
consumer would uncached read that once per pixel. If data was written
to the map, we would then have to uncached read the written data back
out and do the scatter to the tiled S8 buffer (also uncached access
penalties, since WC couldn't actually combine). So 3 or 5 uncached
accesses per pixel in the ROI (and we were ignoring the ROI, so it
was the whole image).
Now we get an uncached read of S8 to untile, and an uncached read of
Z. The consumer gets to do cached accesses. Then if data was
written, we do streaming Z writes (WC success), and scattered S8
tiling writes (uncached penalty). So 2 or 3 uncached accesses per
pixel in the ROI.
This should be a performance win, to the extent that anybody is doing
software accesses of packed depth/stencil buffers.
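The gather and scatter between the separate Z and S8 buffers and a cached packed temporary can be sketched as below. This is only an illustration: it ignores tiling entirely and treats both buffers as linear arrays, and it assumes a packed layout with stencil in bits 31..24 and depth in bits 23..0. The names `gather_s8z24` and `scatter_s8z24` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Map time: read Z and S8 once each and build the packed values in a
 * cached temporary that the consumer can access freely.
 * Assumed layout: stencil in bits 31..24, depth in bits 23..0. */
static void
gather_s8z24(uint32_t *packed, const uint32_t *z, const uint8_t *s8, size_t n)
{
   for (size_t i = 0; i < n; i++)
      packed[i] = ((uint32_t)s8[i] << 24) | (z[i] & 0x00ffffffu);
}

/* Unmap time, only if the map was written: split the packed values
 * back out. The Z stores are sequential; in the real driver the S8
 * stores would be scattered by the stencil tiling layout. */
static void
scatter_s8z24(const uint32_t *packed, uint32_t *z, uint8_t *s8, size_t n)
{
   for (size_t i = 0; i < n; i++) {
      z[i] = packed[i] & 0x00ffffffu;
      s8[i] = (uint8_t)(packed[i] >> 24);
   }
}
```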
Reviewed-by: Chad Versace <[email protected]>
|
The 'mode' param is a bitset of GL_MAP_READ_BIT, GL_MAP_WRITE_BIT.
A future commit will perform buffer resolves in intel_region_map(). So,
even though the access mode is irrelevant to the GTT, the extra
information allows us to intelligently avoid unnecessary buffer resolves.
Signed-off-by: Chad Versace <[email protected]>
|
I initially produced the patch using this bash command:
    for file in {intel,i915,i965}/*.{c,cpp,h}; do
        [ ! -h $file ] &&
            sed -i 's/GLboolean/bool/g' $file &&
            sed -i 's/GL_TRUE/true/g' $file &&
            sed -i 's/GL_FALSE/false/g' $file
    done
Then I manually added #include <stdbool.h> to fix compilation errors,
and converted a few functions back to GLboolean that were used in core
Mesa's function pointer table to avoid "incompatible pointer" warnings.
Finally, I cleaned up some whitespace issues introduced by the change.
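The reason some functions had to stay GLboolean can be shown with a small sketch. GLboolean is a typedef for unsigned char, which is a different type from C99 bool, so a function returning bool does not match a function pointer declared to return GLboolean. The table `struct driver_funcs` and the functions below are hypothetical stand-ins, not Mesa's real interface.

```c
#include <stdbool.h>

typedef unsigned char GLboolean;   /* same underlying type as in the GL headers */

/* Hypothetical stand-in for a core function pointer table. */
struct driver_funcs {
   GLboolean (*is_ready)(int unit);
};

/* Driver-internal helper: free to use bool after the conversion. */
static bool
unit_in_range(int unit)
{
   return unit >= 0 && unit < 8;
}

/* Hook stored in the table: keeps the GLboolean return type. Declaring
 * it as returning bool would make the initializer below an
 * incompatible pointer type. */
static GLboolean
drv_is_ready(int unit)
{
   return unit_in_range(unit);
}

static const struct driver_funcs funcs = { .is_ready = drv_is_ready };
```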
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chad Versace <[email protected]>
Acked-by: Paul Berry <[email protected]>
|
Before, we were only allocating these from our TexImage, so if the
texture image was set up in any other way (non-accelerated
glGenerateMipmap()), they'd be missing or wrong.
|
Now we can rely on Mesa core for uploads of data without introducing
an extra copy at validate time.
|
The GLenum target parameter was not used in intel_copy_texsubimage, so
remove it. Also remove the GLenum internalFormat parameter. Each
caller just copied this out of the intel_texture_image that is already
passed to intel_copy_texsubimage.
Reviewed-by: Eric Anholt <[email protected]>
|
Reviewed-by: Brian Paul <[email protected]>
|
In the case where glBlitFramebuffer is being used to copy to a texture
without scaling it is faster if we can use the hardware to do a blit
rather than having to do a texture render. In most of the drivers
glCopyTexSubImage2D will use a blit so this patch makes it check for
when glBlitFramebuffer is doing a simple copy and then divert to
glCopyTexSubImage2D.
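The "simple copy" test can be sketched as follows. This is a minimal illustration of the kind of check involved, not the patch's actual condition: it only looks at the blit rectangles and the mask, and a real driver would also have to consider things such as clipping against the buffers and scissoring. The function name `blit_is_simple_copy` is hypothetical.

```c
#include <stdbool.h>

#ifndef GL_COLOR_BUFFER_BIT
#define GL_COLOR_BUFFER_BIT 0x00004000   /* value from the GL headers */
#endif

/* Returns true when a blit neither scales nor mirrors and includes the
 * color buffer, so it could be diverted to a plain copy. */
static bool
blit_is_simple_copy(int srcX0, int srcY0, int srcX1, int srcY1,
                    int dstX0, int dstY0, int dstX1, int dstY1,
                    unsigned mask)
{
   if (!(mask & GL_COLOR_BUFFER_BIT))
      return false;

   /* Reversed coordinates mean a mirrored blit, which is not a copy. */
   if (srcX1 < srcX0 || srcY1 < srcY0 || dstX1 < dstX0 || dstY1 < dstY0)
      return false;

   /* Equal extents on both axes means no scaling. */
   return (srcX1 - srcX0 == dstX1 - dstX0) &&
          (srcY1 - srcY0 == dstY1 - dstY0);
}
```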
This was originally proposed as an extension to the common meta-ops.
However, it was rejected as using the BLT is only advantageous for Intel
hardware.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33934
Signed-off-by: Chris Wilson <[email protected]>
|
This reverts commit 7ce6517f3ac41bf770ab39aba4509d4f535ef663.
This reverts commit d60145d06d999c5c76000499e6fa9351e11d17fa.
I was wrong about which generations supported baselevel adjustment --
it's just gen4, nothing earlier. This meant that i915 would have
never used the mag filter when baselevel != 0. Not a severe bug, but
not an intentional regression. I think we can fix the performance
issue another way.
|
BaseLevel/MaxLevel are mostly used for two things: clamping texture
access for FBO rendering, and limiting the used mipmap levels when
incrementally loading textures. By restricting our mipmap trees to
just the current BaseLevel/MaxLevel, we caused reallocation thrashing
in the common case, for a theoretical win if someone really did want
just levels 2..4 or whatever of their texture object.
Bug #30366
|
We now share the type/format -> MESA_FORMAT_* mappings with software
mesa, and the core supports most of the fallbacks hardware drivers
will want.
|
Conflicts:
src/mesa/drivers/dri/radeon/radeon_fbo.c
src/mesa/drivers/dri/s3v/s3v_tex.c
src/mesa/drivers/dri/s3v/s3v_xmesa.c
src/mesa/drivers/dri/trident/trident_context.c
src/mesa/main/debug.c
src/mesa/main/mipmap.c
src/mesa/main/texformat.c
src/mesa/main/texgetimage.c
|
Now gl_texture_image::TexFormat is a simple MESA_FORMAT_x enum.
ctx->Driver.ChooseTextureFormat also returns a MESA_FORMAT_x.
gl_texture_format will go away next.
|
This requires upgrading the interface so that the argument to
glXBindTexImageEXT isn't just dropped on the floor. Note that this only
fixes the accelerated path on Intel, as Mesa's texture format support is
missing x8r8g8b8 support (right now, GL_RGB textures get uploaded as a8r8g8b8,
but in this case we're not doing the upload so we can't really work around it
that way).
Fixes bugs with compositors trying to use shaders that use alpha channels, on
windows without a valid alpha channel. Bug #19910 and likely others as well.
Reviewed-by: Ian Romanick <[email protected]>
|
Makefile.template
|
Previously, the updated images would be ignored because the miptree in the
image matched the miptree in the object, even though Mesa core had just attached
updated contents in ->Data. Additionally, Mesa core could have tried to
free inside our miptree if it had already been validated.
Fixes bug #17077.
|
This fixes a problem where texturing from the same Pixmap more than
once per batchbuffer would hang the DRI driver. We just use the region
associated with the front left renderbuffer of the __DRIdrawable for
texturing, which avoids creating different regions for the same BO.
This change also makes GLX_EXT_texture_from_pixmap work for direct
rendering, since tracking the __DRIdrawable -> BO handle now uses
the standard DRI2 event buffer. Of course, DRI2 direct rendering
doesn't exist yet.
Finally, this commit bumps the DRI interface version again, accounting
for the change in the DRI_TEX_BUFFER extension and the change in
commit 0bba0e5be7a4a7275dad1edc34bdcc134ea1f424 to pass in the
event buffer head index on drawable creation.
|
They're changed by the intel driver implementation and thus not const.
Fixes compilation warning.
|
Currently only implemented for intel hw.
|
The core problem was that _mesa_generate_mipmap was not respecting RowStride
of the source image. Additionally, the intel private data associated with the
images (level and face) was not being initialized for the
_mesa_generate_mipmap-generated images.
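What "respecting RowStride" means can be sketched with a 2x2 box filter that steps between rows by the stride rather than by the image width. This is an illustration only, not Mesa's mipmap code. It assumes single-channel 8-bit texels, even source dimensions, and strides counted in texels (equal to bytes here). The function name `halve_2x2_u8` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Produce the next smaller mipmap level by averaging 2x2 blocks.
 * Rows are addressed through their strides, so padding at the end of
 * each row is skipped instead of being read as image data. */
static void
halve_2x2_u8(const uint8_t *src, size_t src_w, size_t src_h,
             size_t src_row_stride,
             uint8_t *dst, size_t dst_row_stride)
{
   for (size_t y = 0; y < src_h / 2; y++) {
      const uint8_t *r0 = src + (2 * y) * src_row_stride;
      const uint8_t *r1 = r0 + src_row_stride;
      uint8_t *d = dst + y * dst_row_stride;

      for (size_t x = 0; x < src_w / 2; x++) {
         unsigned sum = r0[2 * x] + r0[2 * x + 1]
                      + r1[2 * x] + r1[2 * x + 1];
         d[x] = (uint8_t)((sum + 2) / 4);   /* rounded average */
      }
   }
}
```

Using the width in place of the stride gives the same result only when rows are tightly packed, which is why a bug of this kind can go unnoticed until a padded image is used.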
|