| Commit message | Author | Age | Files | Lines |
| |
According to the docs for 3DSTATE_PS (Gen7+) and 3DSTATE_WM (Gen6),
there is a platform-dependent value for the minimum number of pixel
shader threads. It may also vary based on whether WIZ Hashing is on.
For example, Ivybridge requires at least 4 threads if WIZ hashing is
disabled, and 8 if it's enabled. Programming it to use fewer threads is
illegal. Sandybridge appears to have similar restrictions.
So on newer platforms, INTEL_DEBUG=sing will probably just hang the GPU.
Rather than try to patch it up for newer platforms and extend it to
support geometry shaders, just remove it as it isn't that useful anyway.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Add the following to the vtbl:
hiz_resolve_depthbuffer
hiz_resolve_hizbuffer
For all drivers for which HiZ is not enabled, the methods are set to be
no-ops. If HiZ is enabled, the methods are currently set to empty
stubs.
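Roughly, the shape of this is as follows (a sketch only; the real vtbl entry
signatures and setup code may differ):

    struct intel_context;

    struct intel_vtbl {
       void (*hiz_resolve_depthbuffer)(struct intel_context *intel);
       void (*hiz_resolve_hizbuffer)(struct intel_context *intel);
    };

    /* Used by drivers where HiZ is not enabled; drivers with HiZ currently
     * install equally empty stubs, to be filled in by later patches. */
    static void
    noop_hiz_resolve(struct intel_context *intel)
    {
       (void) intel;   /* nothing to resolve without HiZ */
    }

    static void
    init_vtbl(struct intel_vtbl *vtbl)
    {
       vtbl->hiz_resolve_depthbuffer = noop_hiz_resolve;
       vtbl->hiz_resolve_hizbuffer = noop_hiz_resolve;
    }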
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
| |
I initially produced the patch using this bash command:
for file in {intel,i915,i965}/*.{c,cpp,h}; do [ ! -h $file ] && sed -i
's/GLboolean/bool/g' $file && sed -i 's/GL_TRUE/true/g' $file && sed -i
's/GL_FALSE/false/g' $file; done
Then I manually added #include <stdbool.h> to fix compilation errors,
and converted a few functions back to GLboolean that were used in core
Mesa's function pointer table to avoid "incompatible pointer" warnings.
Finally, I cleaned up some whitespace issues introduced by the change.
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chad Versace <[email protected]>
Acked-by: Paul Berry <[email protected]>
| |
What I would prefer to assert is that, for each region that is currently
mapped, no batch is emitted that uses that region's bo. However, it's much
easier to implement this big hammer.
Observe that this requires that the batch flush in intel_region_map() be
moved to within the map_refcount guard.
v2: Add comments (borrowed from anholt's reply) explaining why the
assertion is a good idea.
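The sketch below shows roughly what the big hammer amounts to (the field and
function names are illustrative, not the driver's actual ones):

    #include <assert.h>

    struct fake_intel_context {
       int num_mapped_regions;   /* incremented in map, decremented in unmap */
    };

    static void
    batch_emit(struct fake_intel_context *intel)
    {
       /* Emitting a batch while any region is mapped risks the GPU using a
        * bo that the CPU still has mapped, so forbid it outright. */
       assert(intel->num_mapped_regions == 0);

       /* ... queue commands ... */
    }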
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
| |
The "slight optimization to avoid the GS program" in brw_set_prim() is not
used by Gen 6, since Gen 6 doesn't use a GS program. Also, Gen 6 doesn't use
reduced primitives.
Also, document that intel_context.reduced_primitive is only used for Gen < 6.
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
| |
It seems that GT1/GT2 sorts of variations are here to stay, and more
special cases will likely be required in the future. Checking by PCI ID
via the IS_xxx_GTx macros is cumbersome; introducing a new 'gt' field
analogous to intel->gen will make this easier.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Seeing as they were only used once (in the same function they were
defined), having them as context members seemed rather pointless.
Remove them entirely (rather than using local variables) since the
chipset generation checks are actually just as straightforward.
While we're at it, clean up the remainder of the if-tree that set them.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
The illusion of shared code here wasn't fooling anybody. It was
tempting to keep i830 and i915 still shared, but I think I actually
want to make them diverge shortly.
Reviewed-by: Chad Versace <[email protected]>
| |
The first rendering after context create didn't know of the color
buffer yet, triggering a sw fallback. The intel_prepare_render() from
intelSpanRenderStart then found the buffer and turned off fallbacks,
but intelSpanRenderFinish was never called and things were left
mapped. By checking buffers before making the call on whether to do
the fallback pipeline or not, we avoid the fallback change inside of
the rendering pipeline.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31561
Reviewed-by: Ian Romanick <[email protected]>
| |
We're about to call this function in a bunch of state emits, so let's
not spam the hardware with flushes too hard.
| |
This was spectacularly unsafe. On my system, address 0 happens to be
the hardware status page for the render ring, and the first quadword
of that happens to contain nothing we ever look at, but I sure didn't
look forward to having to debug some day when, for example, the kernel
happened to bind the ringbuffer before binding the hwsp.
| |
Before, we were waiting for (most of) the current framebuffer to be
done, which is not quite the same thing.
| |
Given a format, is_hiz_depth_format() indicates if HiZ can be enabled on
a depthbuffer of that format.
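A rough sketch of the kind of predicate this describes (the format enum here is
a hypothetical stand-in, not the driver's actual format list):

    #include <stdbool.h>

    /* Hypothetical format enum for illustration only. */
    enum fake_format {
       FAKE_FORMAT_Z16,
       FAKE_FORMAT_S8_Z24,
       FAKE_FORMAT_ARGB8888,
    };

    /* True if HiZ can be enabled on a depthbuffer of the given format. */
    static bool
    is_hiz_depth_format(enum fake_format format)
    {
       switch (format) {
       case FAKE_FORMAT_Z16:
       case FAKE_FORMAT_S8_Z24:
          return true;    /* depth formats HiZ can cover */
       default:
          return false;   /* color formats, or unsupported depth formats */
       }
    }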
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
| |
Add the following flags:
intel_context.has_separate_stencil
intel_context.must_use_separate_stencil
intel_context.has_hiz
The flags are currently set to false, and will be enabled for a given
chipset once the feature is completely implemented.
Since it may be some time before these features are completed, their
values can be overridden with environment variables INTEL_HIZ and
INTEL_SEPARATE_STENCIL. Valid values for these environment variables are
"0" and "1".
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
| |
This reverts commit 50ade6ea697953bb17e3ca7210515fbd0411cd1e.
Fixes jerky rendering again on apps that don't block on the GPU per
frame and are GPU bound (e.g. 3DMMES on Ironlake). The whole point of
this complicated throttle scheme is to wait on frame n-1 to have
started rendering before starting frame n's rendering. Otherwise, the
GPU-bound app will race ahead and call the GL to draw many
nearly-identical frames, then >0ms later get stuck waiting for them
(all dispatched at about the same time) to retire, then render a new
batch of nearly-identical frames.
| |
Previously the macro would compute ALIGN(value - alignment - 1, alignment).
At the very least, this was missing parentheses around "alignment -
1". As a result, if value was already aligned, it would be reduced by
alignment. Consider:
x = ROUND_DOWN_TO(256, 128);
This becomes:
x = ALIGN(256 - 128 - 1, 128);
Or:
x = ALIGN(127, 128);
Which becomes:
x = 128;
This macro is currently only used in brw_state_batch
(brw_state_batch.c). It looks like the original version of this macro
would just use too much space in the batch buffer. It's possible, but
not at all clear to me from the code, that the original behavior is
actually desired.
In any case, this patch does not cause any piglit regressions on my
Ironlake system.
I also think that ALIGN_FLOOR would be a better name for this macro,
but ROUND_DOWN_TO matches rounddown in the Linux kernel.
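For reference, a minimal sketch of the arithmetic, assuming the usual
Mesa-style ALIGN macro (the exact definitions in the tree may differ slightly):

    #include <assert.h>

    /* Round up to a power-of-two alignment. */
    #define ALIGN(value, alignment) \
        (((value) + (alignment) - 1) & ~((alignment) - 1))

    /* Broken form described above: without parentheses, a full extra
     * alignment is subtracted, so already-aligned values get reduced. */
    #define ROUND_DOWN_TO_BROKEN(value, alignment) \
        ALIGN((value) - (alignment) - 1, (alignment))

    /* Fixed form: step down by (alignment - 1), then round up, which
     * leaves already-aligned values unchanged. */
    #define ROUND_DOWN_TO(value, alignment) \
        ALIGN((value) - ((alignment) - 1), (alignment))

    int main(void)
    {
       assert(ROUND_DOWN_TO_BROKEN(256, 128) == 128); /* wrongly reduced */
       assert(ROUND_DOWN_TO(256, 128) == 256);        /* already aligned, untouched */
       assert(ROUND_DOWN_TO(300, 128) == 256);        /* rounded down as expected */
       return 0;
    }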
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Keith Whitwell <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
I was being overly miserly and gave the offset of the buffer into the bo
insufficient bits, distracted by the adjacency of the buffer[4096].
Ref: https://bugs.freedesktop.org/show_bug.cgi?id=34541
Signed-off-by: Chris Wilson <[email protected]>
| |
Rather than waiting on the first batch after the last swapbuffers to be
retired, call into the kernel to wait upon the retirement of any request
less than 20ms old. This has the twofold advantage of (a) not blocking
any other clients from utilizing the device whilst we wait and (b) we
attain higher throughput without overloading the system.
Signed-off-by: Chris Wilson <[email protected]>
| |
Move the tracking of the last emitted instructions into the core
batchbuffer routines and take advantage of the shadow batch copy to
avoid extra memory allocations and copies.
Signed-off-by: Chris Wilson <[email protected]>
| |
It's faster. Not only is the memcpy more efficiently performed in the
kernel (making up for the system call overhead), but by not using mmap
we remove the greater overhead of tracking the vma of every batch.
And it means we can read back from the batch buffer without incurring
the cost of an uncached read through the GTT.
Signed-off-by: Chris Wilson <[email protected]>
| |
Rather than performing lots of little writes to update the common bo
upon each update, write those into a static buffer and flush that when
full (or at the end of the batch). Doing so gives a dramatic performance
improvement over and above using mmapped access.
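The idea is essentially a small write-combining cache, along these lines (a
sketch only; the size, names, and upload hook are made up):

    #include <assert.h>
    #include <string.h>

    #define CACHE_SIZE 4096

    struct write_cache {
       char buf[CACHE_SIZE];
       unsigned used;
       /* Stands in for whatever actually copies bytes into the bo
        * (e.g. a pwrite-style upload). */
       void (*flush_to_bo)(const void *data, unsigned size);
    };

    static void
    cache_flush(struct write_cache *c)
    {
       if (c->used) {
          c->flush_to_bo(c->buf, c->used);
          c->used = 0;
       }
    }

    static void
    cache_write(struct write_cache *c, const void *data, unsigned size)
    {
       assert(size <= CACHE_SIZE);
       if (c->used + size > CACHE_SIZE)
          cache_flush(c);   /* full: push the accumulated writes in one go */
       memcpy(c->buf + c->used, data, size);
       c->used += size;
    }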
Signed-off-by: Chris Wilson <[email protected]>
| |
Dynamic arrays have the tendency to be small and so allocating a bo for
each one is overkill and we can exploit many efficiency gains by packing
them together.
Signed-off-by: Chris Wilson <[email protected]>
| |
By relying on just intel_span_supports_format, some formats that
aren't supported pre-gen4 were not reporting FBO incomplete. And we
also complained in stderr when it happened on i915 because draw_region
gets called before framebuffer completeness validation.
| |
This requires shuffling the driconf XML macros around, since they use
true and false tokens expecting them to not get expanded to anything.
| |
Sometimes I'm on the train and want to just read what's generated
under INTEL_DEBUG=vs,wm for some code on another generation. Or, for
the next gen enablement we'll want to dump aub files before we have
the actual hardware. This will let us do that.
| |
This provides the optimizer with hints about code hotness, which we're
quite certain about for debug printouts (or, rather, while we
developers often hit the checks for debug printouts, we don't care
about performance while doing so).
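As an illustration, this is the usual __builtin_expect pattern (the macro and
flag names here are generic, not necessarily the driver's exact ones):

    #include <stdio.h>

    /* Branch hints built on GCC's __builtin_expect. */
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    static unsigned debug_flags = 0;
    #define DEBUG_WM 0x1

    static void
    emit_wm_state(void)
    {
       /* Debug printouts are marked cold so the common path stays tight;
        * when a developer enables them, the extra cost doesn't matter. */
       if (unlikely(debug_flags & DEBUG_WM))
          fprintf(stderr, "emitting WM state\n");

       /* ... normal state emission ... */
    }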
| |
Bug #29855
| |
The slightly less mechanical change of converting the emit_reloc calls
will follow.
| |
The support for XRGB8888 appeared in the 855 and 865, and this format
is reserved on 830/845. This should fix a regression from
b4a6169412819cc3a027c6a118f0537911145a30 that caused hangs in etracer
on 845s.
Bug #26557.
| |
Before we would throttle in the flush callback prior to round-tripping
to the server to do copyregion or swapbuffer. Now, instead just note
that we need to throttle and do it in intel_prepare_render(), which
will be called after receiving the response from the server but before
we start rendering the next frame. Even if the server also throttles
us in swapbuffer, this just makes the throttling a no-op when we hit
intel_prepare_render(). With that we can drop the
using_dri2_swapbuffers hack and just always throttle.
| |
Now that intel_flush() doesn't use the needs_mi_flush argument, we can
finally drop one of the two flush functions.
| |
Rename the old IGDNG name to Ironlake, and set the 'gen' number for
Ironlake to 5, so features are tracked by generation number
instead of a special is_ironlake flag.
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>
| |
All these pointers are in the __DRIcontext struct, which we point to.
| |
I keep finding the desire to force this path to debug it instead of
cooking up goofy-looking testcases to do so.
| |
Shaves 60k off the driver from removing the broken spans code. This
means we now require 2.6.29, which seems fair given that it's a year
old and we've removed support for non-KMS already in the last release
of 2D.
| |
This uses a stamp mechanism to mark the DRI drawable as invalid.
Instead of immediately updating the buffers we just bump the drawable
stamp and call out to DRI2GetBuffers "later".
"Later" used to be at LOCK_HARDWARE time, and this patch brings back
callouts at the points where we used to call LOCK_HARDWARE. A new function,
intel_prepare_render(), is called where we used to call LOCK_HARDWARE,
and if the buffers are invalid, we call out to DRI2GetBuffers there.
This lets us invalidate buffers only when notified instead of on
every glViewport() call. If the loader calls the DRI invalidate
entrypoint, we disable viewport triggered buffer invalidation.
Additionally, we can clean up the old viewport mechanism a bit,
since we can just invalidate the buffers and not worry about
reentrancy and whatnot.
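In outline, the mechanism looks something like this (names are illustrative,
not the actual driver code):

    struct fake_drawable {
       unsigned stamp;            /* bumped by the DRI2 invalidate event */
       unsigned validated_stamp;  /* stamp we last fetched buffers for */
    };

    static void
    intel_prepare_render(struct fake_drawable *draw)
    {
       if (draw->validated_stamp != draw->stamp) {
          /* Buffers are stale: this is the "later" where we call out
           * to DRI2GetBuffers and update our renderbuffers. */
          /* dri2_get_buffers(draw); */
          draw->validated_stamp = draw->stamp;
       }
    }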