| Commit message | Author | Age | Files | Lines |
| |
Reviewed-by: Kenneth Graunke <[email protected]>
| |
It's going to be reused in a second place soon.
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
| |
Haswell moved the "Cut Index Enable" bit from the INDEX_BUFFER packet to
a new 3DSTATE_VF packet, so we need to emit that. Also, it requires us
to specify the cut index rather than assuming it's 0xffffffff.
This adds a new Haswell-specific tracked state atom to gen7_atoms.
Normally, we would create a new generation-specific atom list, but since
there's only one difference over Ivybridge so far, I chose to simply
make it return without doing any work on non-Haswell systems.
Fixes five piglit tests:
- general/primitive-restart-DISABLE_VBO
- general/primitive-restart-VBO_COMBINED_VERTEX_AND_INDEX
- general/primitive-restart-VBO_INDEX_ONLY
- general/primitive-restart-VBO_SEPARATE_VERTEX_AND_INDEX
- general/primitive-restart-VBO_VERTEX_ONLY
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
| |
There were no other cases that set it any more.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
| |
This represents the index into the sampler state table or sampler
default color table (the two are identical).
Right now, this is still the texture unit, but that will change shortly.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
v2: Comment fix, drop extraneous parens (review by Kenneth)
Reviewed-by: Kenneth Graunke <[email protected]>
| |
We'll use this for UBO surfaces.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
This patch modifies gen7_set_surface_num_multisamples() to set up the
SURFACE_STATE appropriately for texturing from IMS format MSAA
surfaces (which are only used on Gen7 for depth and stencil buffers).
Since the function now sets more than just the number of multisamples,
it's been renamed to gen7_set_surface_msaa().
This will make it possible to remove some kludginess from the blorp
engine.
Reviewed-by: Anuj Phogat <[email protected]>
| |
When a buffer using Gen7's CMS MSAA layout is bound to a texture or a
render target, the SURFACE_STATE structure needs to point to the MCS
buffer and to indicate its pitch. This patch updates the functions
that emit SURFACE_STATE to handle CMS layout properly.
Reviewed-by: Chad Versace <[email protected]>
| |
When a Gen7 SURFACE_STATE is configured for MSAA, a number of
additional constraints come into play. This patch adds a function
gen7_check_surface_setup() which verifies that all of those
constraints are met.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
This patch enables MSAA for Gen6, by modifying intel_mipmap_tree to
understand multisampled buffers, adapting the rendering pipeline setup
to enable multisampled rendering, and adding multisample resolve
operations to brw_blorp_blit.cpp. Some preparation work is also
included for Gen7, but it is not yet enabled.
MSAA support is still fairly preliminary. In particular, the
following are not yet supported:
- Fully general blits between MSAA and non-MSAA buffers.
- Formats other than RGBA8, DEPTH24, and STENCIL8.
- Centroid interpolation.
- Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE,
GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
GL_SAMPLE_COVERAGE_INVERT).
Fixes piglit tests "EXT_framebuffer_multisample/accuracy" on
i965/Gen6.
v2:
- In intel_alloc_renderbuffer_storage(), quantize the requested number
of samples to the next higher sample count supported by the
hardware. This ensures that a query of GL_SAMPLES will return the
correct value. It also ensures that MSAA is fully disabled on Gen7
for now (since Gen7 MSAA support doesn't work yet).
- When reading from a non-MSAA surface, ensure that s_is_zero is true
so that we won't try to read from a nonexistent sample.
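The v2 quantization step can be sketched as follows, assuming the supported sample counts are passed in as an ascending list (the function name and the clamp-to-maximum behavior for oversized requests are assumptions, not taken from intel_alloc_renderbuffer_storage()). An empty list models "MSAA disabled", which is how the commit keeps Gen7 single-sampled for now.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round a requested sample count up to the next count
 * the hardware supports, so that a GL_SAMPLES query reports the real value.
 * 'supported' must be strictly ascending. */
static unsigned sketch_quantize_samples(unsigned requested,
                                        const unsigned *supported,
                                        size_t n_supported)
{
   if (requested == 0 || n_supported == 0)
      return 0;   /* single-sampled request, or MSAA disabled entirely */

   for (size_t i = 0; i < n_supported; i++) {
      assert(i == 0 || supported[i] > supported[i - 1]);
      if (supported[i] >= requested)
         return supported[i];
   }
   /* Assumption: clamp requests above the maximum to the largest count. */
   return supported[n_supported - 1];
}
```

For example, with a supported list containing only 4, requests for 2 or 3 samples both become 4.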
| |
This patch exposes the functions brw_get_surface_tiling_bits and
gen7_set_surface_tiling, so that they can be re-used when setting up
surface states in gen6_blorp.cpp and gen7_blorp.cpp.
Reviewed-by: Chad Versace <[email protected]>
| |
These declarations are necessary to allow C++ code to call C code
without causing unresolved symbols (which would make the driver fail
to load).
Reviewed-by: Chad Versace <[email protected]>
| |
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Improves VS state change microbenchmark performance by 7.08729% +/-
1.22289% (n=10) on gen7, because we don't upload the 64 dwords of
unused binding table any more.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
This is a step toward making the samplers/binding tables reflect
sampler uniform mappings instead of embedding those in the programs.
No significant performance difference on the microbenchmark (n=10).
Reviewed-by: Kenneth Graunke <[email protected]>
| |
The gen7_urb atom depends on CACHE_NEW_VS_PROG and CACHE_NEW_GS_PROG,
causing gen7_upload_urb() to be called when switching to a new VS
program.
In addition to partitioning the URB space between the VS and GS,
gen7_upload_urb() also allocated space for VS and PS push constants.
Unfortunately, this meant that whenever CACHE_NEW_VS was flagged, we'd
reallocate the space for the PS push constants. According to the BSpec,
after sending 3DSTATE_PUSH_CONSTANT_ALLOC_PS, we must reprogram
3DSTATE_CONSTANT_PS prior to the next 3DPRIMITIVE.
Since our URB allocation for push constants is entirely static, it makes
sense to split it out into its own atom that only subscribes to
BRW_NEW_CONTEXT. This avoids reallocating the space and trashing
constants.
Fixes a rendering artifact in Extreme Tuxracer, where instead of a snow
trail, you'd get a bright red streak (affectionately known as the
"bloody penguin bug").
This also explains why adding VS-related dirty bits to gen7_ps_state
made the problem disappear: it made 3DSTATE_CONSTANT_PS be emitted after
every 3DSTATE_PUSH_CONSTANT_ALLOC_PS packet.
NOTE: This is a candidate for the 7.11 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38868
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
| |
$ dict invarient
No definitions found for "invarient", perhaps you mean:
gcide: Invariant
wn: invariant
Signed-off-by: Kenneth Graunke <[email protected]>
| |
We don't currently have kernel support for saving GPU registers on a
context switch, so if multiple processes are performing transform
feedback at the same time, their SVBI registers will interfere with
each other. To avoid this situation, we keep a software shadow of the
state of the SVBI 0 register (which is the only register we use), and
re-upload it on every new batch.
The function that updates the shadow state of SVBI 0 is called
brw_update_primitive_count, since it will also be used to update the
counters for the PRIMITIVES_GENERATED and
TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
This patch adds basic transform feedback capability for Gen6 hardware.
This consists of several related pieces of functionality:
(1) In gen6_sol.c, we set up binding table entries for use by
transform feedback. We use one binding table entry per transform
feedback varying (this allows us to avoid doing pointer arithmetic in
the shader, since we can set up the binding table entries with the
appropriate offsets and surface pitches to place each varying at the
correct address).
(2) In brw_context.c, we advertise the hardware capabilities, which
are as follows:
MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS 64
MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS 4
MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS 16
OpenGL 3.0 requires these values to be at least 64, 4, and 4,
respectively. The reason we advertise a larger value than required
for MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS is that we have already
set aside 64 binding table entries, so we might as well make them all
available in both separate attribs and interleaved modes.
(3) We set aside a single SVBI ("streamed vertex buffer index") for
use by transform feedback. The hardware supports four independent
SVBIs, but we only need one, since vertices are added to all
transform feedback buffers at the same rate. Note: at the moment this
index is reset to 0 only when the driver is initialized. It needs to
be reset to 0 whenever BeginTransformFeedback() is called, and
otherwise preserved.
(4) In brw_gs_emit.c and brw_gs.c, we modify the geometry shader
program to output transform feedback data as a side effect.
(5) In gen6_gs_state.c, we configure the geometry shader stage to
handle the SVBI pointer correctly.
Note: ordering of vertices is not yet correct for triangle strips
(alternate triangles are improperly oriented). This will be addressed
in a future patch.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
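Point (1) avoids pointer arithmetic in the shader by baking each varying's offset and pitch into its own binding table entry. A sketch for interleaved mode, assuming 32-bit (4-byte) components; the struct and function names are hypothetical, not those in gen6_sol.c. Each varying's surface starts at its byte offset within one vertex, and every surface uses the full vertex stride as its pitch, so a single SVBI value addresses the right slot in all of them.

```c
#include <assert.h>
#include <stdint.h>

#define SK_MAX_VARYINGS 64   /* one binding table entry per varying */

struct sk_xfb_surface {
   uint32_t offset_bytes;   /* where this varying of vertex 0 lands */
   uint32_t pitch_bytes;    /* distance between consecutive vertices */
};

/* Interleaved mode: all varyings share one buffer, packed back to back. */
static void sk_setup_interleaved(const unsigned *components, unsigned n,
                                 uint32_t buffer_offset,
                                 struct sk_xfb_surface *out)
{
   assert(n <= SK_MAX_VARYINGS);
   uint32_t stride = 0;
   for (unsigned i = 0; i < n; i++)
      stride += components[i] * 4u;   /* assumption: 4 bytes per component */

   uint32_t offset = buffer_offset;
   for (unsigned i = 0; i < n; i++) {
      out[i].offset_bytes = offset;
      out[i].pitch_bytes = stride;
      offset += components[i] * 4u;
   }
}

/* Byte address written for one varying, given the shared SVBI value. */
static uint32_t sk_varying_address(const struct sk_xfb_surface *s,
                                   uint32_t svbi)
{
   return s->offset_bytes + svbi * s->pitch_bytes;
}
```

For varyings of 4, 2, and 1 components, the vertex stride is 28 bytes and the per-varying offsets are 0, 16, and 24, so the second varying of vertex 2 lands at byte 16 + 2 * 28 = 72.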
| |
This will make handling new formats (like actually exposing Z32F)
easier and more reliable.
v2: Remove the check for hiz buffer -- the MESA_FORMAT should really
be giving us the value we want even for hiz.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
NEW_COLOR is only needed on Gen4-5 as brw_update_renderbuffer_surfaces
only uses ctx->Color when intel->gen < 6.
This should reduce unnecessary state updates.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
brw_wm_samplers actually enables any active samplers regardless of what
pipeline stage is using them, so it doesn't make much sense for it to be
WM-specific. So, rename it to "brw_samplers."
To properly generalize it, move sampler_count and sampler_offset from
brw_context::wm to a new brw_context::sampler that can be shared without
looking strange.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Like for the WM pull constants, we can merge the former prepare/emit
stages into one tracked state atom. Furthermore, the code that used to
handle the binding table was removed in the last commit, leaving some
rather silly looking short functions that can easily be folded in.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Although the hardware supports separate binding tables for each pipeline
stage, we don't see much advantage over a single shared table.
Consider the contents of the binding table:
- Textures (16)
- Draw buffers (8)
- Pull constant buffers (1 for VS, 1 for WM)
OpenGL's texture bindings are global: the same set of textures is
available to all shader targets. So our binding table entries for
textures would be exactly the same in every table.
There are only two pull constant buffers (not many), and although draw
buffers aren't interesting to the VS, it shouldn't hurt to have them in
the table. The hardware supports up to 254 binding table entries, and
we currently only use 26.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
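The entry counts above (8 draw buffers, 2 pull constant buffers, 16 textures, 26 total out of 254) can be captured as a compile-time layout. The ordering and names below are hypothetical; the commit message gives only the counts, not the actual index assignment.

```c
#include <assert.h>

/* Hypothetical shared binding table layout; only the counts come from the
 * commit message, the ordering is an assumption. */
enum {
   SK_MAX_DRAW_BUFFERS = 8,
   SK_MAX_TEXTURES = 16,

   SK_SURF_DRAW_BASE = 0,                                          /* 0..7   */
   SK_SURF_WM_PULL_CONSTANTS = SK_SURF_DRAW_BASE + SK_MAX_DRAW_BUFFERS, /* 8 */
   SK_SURF_VS_PULL_CONSTANTS,                                      /* 9      */
   SK_SURF_TEXTURE_BASE,                                           /* 10..25 */
   SK_BINDING_TABLE_SIZE = SK_SURF_TEXTURE_BASE + SK_MAX_TEXTURES, /* 26     */

   SK_HW_MAX_BINDING_TABLE_ENTRIES = 254,
};

_Static_assert(SK_BINDING_TABLE_SIZE <= SK_HW_MAX_BINDING_TABLE_ENTRIES,
               "shared binding table must fit in the hardware limit");

/* Texture bindings are global in GL, so every stage uses the same index. */
static unsigned sk_surf_index_texture(unsigned unit)
{
   assert(unit < SK_MAX_TEXTURES);
   return SK_SURF_TEXTURE_BASE + unit;
}

static unsigned sk_surf_index_draw(unsigned buffer)
{
   assert(buffer < SK_MAX_DRAW_BUFFERS);
   return SK_SURF_DRAW_BASE + buffer;
}
```

Since the VS simply never references the draw buffer entries, sharing one table costs nothing but a few unused slots.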
| |
First, the texturing setup code is relevant for all pipeline stages,
while renderbuffer surfaces are only used by the WM.
Second, renderbuffer and texture setup depends on a different set of
dirty bits. There's no reason to walk the array of textures when
changing draw buffers, or vice-versa.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
| |
These were only split for historical reasons: brw_wm_constants used to
be the "prepare" step, while brw_wm_constant_surface was "emit". Now
that both happen at emit time, it makes sense to combine them.
Call the newly combined state atom "brw_wm_pull_constants" to help
distinguish it from the Gen6+ atoms that handle push constants.
Finally, remove the BRW_NEW_WM_CONSTBUF dirty bit entirely now that it's
neither flagged nor used.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
| |
When reading the "brw_wm_constants" and "gen6_wm_constants" atoms
side-by-side, I initially failed to notice the crucial difference:
the Gen6 atoms are for Push Constants, while brw_wm_constants handles
Pull Constants. (Gen4/5 Push Constants are handled by "brw_curbe.")
Renaming these should clarify the code and save me from constant
confusion over the fact that "gen6_wm_constants" isn't just a newer
version of "brw_wm_constants."
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
| |
Gen7+ SURFACE_STATE is different from Gen4-6, so we need separate
per-generation functions for creating and updating it. However, the
usage is the same, and callers just want to utilize the appropriate
functions with minimal pain. So, put them in the vtable.
Since these take a brw_context pointer and are only used on Gen4, just
add a forward declaration. This is the simplest (if not cleanest)
solution. It would be nicer to have a i965-specific vtable, but that's
a refactor for another day.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
They were called back-to-back at this point.
Reviewed-by: Kenneth Graunke <[email protected]>
Acked-by: Paul Berry <[email protected]>
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Acked-by: Paul Berry <[email protected]>
| |
Previously, if the user enabled a non-consecutive set of clip planes
(e.g. 0, 1, and 3), the driver would compact them down to a
consecutive set starting at 0. This optimization was of dubious
value, and complicated the implementation of gl_ClipDistance.
This patch changes the driver so that with Gen6 and later chipsets, we
no longer compact the clip planes. However, we still discard any clip
planes beyond the highest number that is in use, so performance should
not be affected for applications that use clip planes consecutively
from 0.
With chipsets prior to Gen6, we still compact the clip planes,
since the pre-Gen6 clipper thread relies on this behavior.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
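A sketch of the two policies described above, for a user enable mask such as planes 0, 1, and 3 (0xB). The struct and function names are hypothetical. On Gen6+ the mask is kept as-is and only planes up to the highest enabled one are uploaded; before Gen6 the planes are compacted to 0..n-1.

```c
#include <assert.h>
#include <stdint.h>

struct sk_clip_setup {
   uint32_t enabled_planes;   /* which plane slots the clipper tests */
   unsigned num_uploaded;     /* plane equations uploaded as constants */
};

static unsigned sk_popcount(uint32_t x)
{
   unsigned n = 0;
   for (; x != 0; x &= x - 1)   /* clear the lowest set bit each pass */
      n++;
   return n;
}

static struct sk_clip_setup sk_setup_clip_planes(uint32_t user_enables,
                                                 int gen)
{
   struct sk_clip_setup r;
   if (gen >= 6) {
      /* No compaction; upload planes 0..highest, drop everything above. */
      unsigned n = 0;
      for (uint32_t m = user_enables; m != 0; m >>= 1)
         n++;   /* ends as (index of highest set bit) + 1 */
      r.enabled_planes = user_enables;
      r.num_uploaded = n;
   } else {
      /* Pre-Gen6 clip thread expects planes packed into 0..n-1. */
      unsigned n = sk_popcount(user_enables);
      assert(n < 32);
      r.enabled_planes = (1u << n) - 1;
      r.num_uploaded = n;
   }
   return r;
}
```

With 0xB, Gen6+ keeps the mask 0xB and uploads four planes (slot 2 is unused), while the pre-Gen6 path enables 0x7 and uploads three.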
| |
When using user-defined clipping planes, the i965 driver compacts the
array of clipping planes so that disabled clipping planes do not
appear in it; this saves precious push constant space and makes it
easier to generate the pre-GEN6 clip program. As a result, when
enabling clipping planes in GEN6+ hardware, we always enable clipping
planes 0 through n-1 (where n is the number of clipping planes
enabled), regardless of which clipping planes the user actually
requested.
However, we can't do this when using gl_ClipDistance, because it would
be prohibitively complex to compact the gl_ClipDistance array inside
the user-supplied vertex shader. So, when enabling clipping planes in
GEN6+ hardware, if gl_ClipDistance is in use, we need to pass the
user-supplied enable flags directly through to the hardware rather
than just enabling the first n planes.
Fixes Piglit test vs-clip-distance-enables.
Reviewed-by: Kenneth Graunke <[email protected]>
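The enable-flag choice described above reduces to a small function. This is a hypothetical sketch, not the driver's code: with compacted user clip planes the hardware gets the first n planes, but when the shader writes gl_ClipDistance the user's own mask is passed through unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper computing the GEN6+ clip plane enable bits. */
static uint32_t sk_gen6_clip_enables(uint32_t user_enables,
                                     bool uses_clip_distance)
{
   if (uses_clip_distance)
      return user_enables;   /* gl_ClipDistance[i] stays at its real index */

   unsigned n = 0;
   for (uint32_t m = user_enables; m != 0; m &= m - 1)
      n++;                   /* number of enabled planes */
   assert(n < 32);
   return (1u << n) - 1;     /* planes were compacted to 0..n-1 */
}
```

For planes 0, 1, and 3 (0xB), compacted user clip planes yield 0x7, while a gl_ClipDistance shader yields 0xB.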
| |
This patch changes get_attr_override() (which computes the
relationship between vertex shader outputs and fragment shader inputs)
to use the VUE map.
Reviewed-by: Eric Anholt <[email protected]>
| |
I want to make brw_state_dump.c handle more than just the last
state change, so I want to keep track of what's in the batch state. By
using AUB file numbering for most of these packets, this may be
reusable for aub dumping.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
There will be a little bit of thrashing of the program cache BO as the
cache warms up, but once the application is in steady state, this
reduces relocations on gen5 and later.
On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6%
+/- 1.3% (n=6). No statistically significant performance difference
on nexuiz (n=5).
| |
Fixes 17 piglit tests:
- glsl-vs-arrays-3
- glsl-vs-texturematrix-2
- glsl-vs-uniform-array-2
- arl
- nv-arl
- nv-init-zero-addr
- vp-address-01
- vp-arl-constant-array
- vp-arl-constant-array-huge
- vp-arl-constant-array-huge-offset
- vp-arl-constant-array-huge-offset-neg
- vp-arl-constant-array-huge-relative-offset
- vp-arl-constant-array-huge-varying
- vp-arl-env-array
- vp-arl-local-array
- vp-arl-neg-array
- vp-arl-neg-array-2
Fixes 4 glean tests:
- glsl1-constant array of vec4 with variable indexing, vertex shader
- glsl1-constant array with variable indexing, vertex shader
- glsl1-constant array with variable indexing, vertex shader (2)
- vp1-ARL test
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
This removes the stupid strict-conformance fallback code I broke when
adding ARB_sampler_objects.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36572
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
| |
Most of this code was copied from brw_wm_sampler_state.c.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
I'm still not happy with the amount of code duplication here, but it
will have to do for now.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
I need to reuse them.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
This will make it easier to share between files.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
This also disables the HiZ and separate stencil buffers. We still need
to implement stencil.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Since we currently only support sampling in the fragment shader, we only
bother to emit the PS variant. In the future we'll need to emit others.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
| |
This may not be strictly necessary, but seems wise.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Copied from gen6_vs_state.c; reuses create_vs_constant_bo from there.
The 3DSTATE_VS command is identical but 3DSTATE_CONSTANT_VS is not.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
SF and CLIP viewport state has been combined into SF_CLIP_VIEWPORT;
SF_CLIP and CC state pointers can now be uploaded independently.
Some portions of the hardware documentation refer to separate upload
commands for SF and CLIP; these are outdated and incorrect.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Copied from gen6_clip_state.c.
This enables early culling and sets the necessary fields. Otherwise, it
is entirely the same, so I doubt this patch is strictly necessary for a
functional driver.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
The state itself still seems to be the same; the only change is that
each part (CC, BLEND, DEPTH_STENCIL) can now be uploaded independently.
Thus, we still rely on the code in gen6_cc.c to set up the state.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>