| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
This will allow the generic parts to be re-used for geometry shaders.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In patches that follow, we'll be splitting structs brw_vs_prog_data
and brw_vs_compile into a vec4-generic base struct and a VS-specific
derived struct (this will allow the vec4-generic code to be re-used
for geometry shaders). Having brw_vs_compile point to
brw_vs_prog_data makes it difficult to do this cleanly.
Fortunately most of the functions that use brw_vs_compile (those in
the vec4_visitor class) already have access to brw_vs_prog_data
through a separate pointer (vec4_visitor::prog_data). So all we have
to do is use that pointer consistently, and plumb prog_data through
the few remaining functions that need access to it.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch modifies the arguments to brw_compute_vue_map() so that
they no longer bake in the assumption that we are generating a VUE map
for vertex shader outputs. It also makes the function non-static so
that we can re-use it for geometry shader outputs.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The vec4_visitor functions don't use any VS specific data from
vec4_visitor::vp. So rename it to "prog" and change its type from
struct gl_vertex_program * to struct gl_program *. This will allow
the code to be re-used for geometry shaders.
Reviewed-by: Jordan Justen <[email protected]>
v2: Use the name "prog" rather than "p".
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The next patch is going to change the type of vec4_visitor::vp from
struct gl_vertex_program * to struct gl_program *, and rename it. The
sensible name to change it to is vec4_visitor::prog. However, prog is
already used in backend_visitor (which vec4_visitor derives from).
Since backend_visitor::prog is of type struct gl_shader_program *, it
makes sense to rename it to shader_prog.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Reported and tested by degasus on #radeon.
Note: This is a candidate for the 9.1 branch
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a region is larger than the estimated aperture size, we map/unmap it
by copying with the BLT engine. Which means we can't use Y-tiling.
Fixes Piglit max-texture-size and tex3d-maxsize, which regressed in my
recent change to use Y-tiling by default on Gen6+. This was due to a
botched merge conflict resolution.
v2: Return a mask of valid tilings from intel_miptree_select_tiling.
This allows us to avoid the X-tiling fallback if Y-tiling is actually
mandatory.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This reduces the nesting level slightly, and in my opinion, makes it a
bit easier to follow.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
We need know this in order to decide what tiling mode to use.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
gen7_blorp_emit_depth_stencil_config() is only called when
params->depth.mt is non-null. Therefore, it's not necessary to do an
"if (params->depth.mt)" test inside it. The presence of this if test
was misleading static analysis tools (and briefly, me) into thinking
that gen7_blorp_emit_depth_stencil_config() might sometimes access
uninitialized data and dereference a null pointer.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable hiz by setting intel_context::has_hiz. However, to work around
a hardware bug, we selectively enable hiz for only nicely aligned miptree
slices.
No Piglit regressions on Haswell 0x0d26 rev07 when based atop
mesa-master-4ad3601.
Improves the performance of GLB27_TRex_C24Z16_FixedTimeStep by 18.52%
(hsw-0x0d26-rev07; kernel-3.9.0-rc1; GLBenchmark 2.7.0 Release a68901;
samples=3).
v2: Replace the check for IS_HASWELL(devid) in intel_miptree_slice_has_hiz()
with a conditional set of has_hiz. [for anholt]
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
After recent refactorings, the field is written but no longer read.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When appropriate, replace each check `hiz_mt != NULL` with either a call
to intel_miptree_slice_has_hiz() or intel_renderbuffer_has_hiz(). No
behavioral change.
This prepares for selectively enabling hiz on individual miptree slices
for Haswell.
This refactoring had several side effects.
1. To prevent new warnings about discarding the const qualifier,
I removed 'const' from some variable declarations in
intel_validate_framebuffer(). The alternative was to add const
qualifiers to multiple function signatures in the
intel_renderbuffer_has_hiz call graph. Since the dominant convention
in the Intel code is to not qualify function parameters as const,
I chose to remove rather than add const qualifiers.
2. I changed the signature of brw_emit_depth_stencil_hiz() by replacing
`struct intel_mipmap_tree *hiz_mt` with `bool hiz`. The function used
hiz_mt mostly as a boolean indicator of the presence of hiz, so the
signature change is consistent with the patch's goal.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Add new parameters `depth_level` and `depth_layer`, which specify depth
miptree's slice of interest. A following patch will pass the new
parameters through to intel_miptree_slice_has_hiz().
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The new fields define the 2D miptree slice to be used. A following patch
will pass the new fields through to intel_miptree_slice_has_hiz().
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Haswell, HiZ will selectively be enabled on individual miptree slices
to workaround a hardware bug. The new field 'has_hiz' indicates if HiZ is
enabled for a given slice.
Also add two new accessor functions for this field.
intel_miptree_slice_has_hiz
intel_renderbuffer_has_hiz
The new field and accessor functions are not yet used. Also, this patch
introduces no behavioral change because, in this patch,
intel_miptree_alloc_hiz() sets has_hiz for all slices.
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The hardware docs and the simulator require that the rectangle primitive
emitted during fast depth clears and hiz resolves must be aligned to 8x4
pixels.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows the computation of the offset to get written directly into the
message source.
shader-db results:
total instructions in shared programs: 3308390 -> 3283025 (-0.77%)
instructions in affected programs: 442998 -> 417633 (-5.73%)
No difference in GLB2.7 low res (n=9).
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have several places in our pull constant handling where we make a
temporary src_reg for an int, and then turn it into a dst. In doing so,
we were writing to the dst.xyzw, so we never register coalesced it with a
later mov from dst.x to real_dst.x.
These extra channels written would be removed if we had channel-wise DCE
in the backend, but we don't. Fix it for now by just not writing these
extra channels that won't get used.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The software-tracked transform feedback offsets (svbi_0_starting_index)
are incorrect in the presence of primitive restart, so we were actually
updating it with a bogus value if the batch wrapped and we emitted the
packet again during a single transform feedback. By reducing state
emission, we avoid the bug.
Fixes piglit OpenGL 3.1/primitive-restart-xfb flush
Reviewed-by: Paul Berry <[email protected]>
NOTE: This is a candidate for the 9.1 branch.
|
|
|
|
|
|
|
|
|
|
|
|
| |
The software-tracked transform feedback offsets (svbi_0_starting_index)
are incorrect in the presence of primitive restart, so we can't reliably
compute offsets for our buffer pointers after a batch flush. Thanks to HW
contexts, our transform feedback offsets are now saved, so we can just
keep using the ones from before the batch wrap.
Fixes piglit OpenGL 3.1/primitive-restart-xfb flush
Reviewed-by: Paul Berry <[email protected]>
NOTE: This is a candidate for the 9.1 branch.
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
| |
None of these were needed.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
| |
This makes sure that ctx->DrawBuffer->Visual.samples is up-to-date.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"ctx->DrawBuffer->Visual" might be invalid if (NewState &_NEW_BUFFERS) != 0.
v2: also fix:
- RGBA_INTEGER_MODE_EXT
- RGBA_FLOAT_MODE_ARB (also check API support)
- FRAMEBUFFER_SRGB_CAPABLE_EXT
NOTE: This is a candidate for stable branches.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen7.5 (Haswell) hardware supports primitive restart for all primitive
types. It also handles all possible primitive restart indices.
Rather than specialize both can_cut_index_handle_restart_index() and
the switch statement in can_cut_index_handle_prims() for Haswell, just
return early if the hardware is Haswell because we know it can handle
everything.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
brw_draw.c contains a trim() function which modifies the vertex count
for quads and quad strips in order to discard dangling vertices. In
principle this shouldn't be necessary, since hardware since Gen4 is
capable of discarding dangling vertices by itself. However, it's
necessary because as a hack to speed up rendering on Gen 4-5, we
sometimes convert quads to trifans and quad strips to tristrips. The
trim() function isn't necessary on Gen6 and up.
This patch documents why and when the trim() function is necessary,
and avoids calling it when it's not needed.
This will avoid creating problems when we enable hardware support for
primitive restart of quads and quad strips on Haswell.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The call to emit_shader_time_end() before the second URB write was
conditioned with "if (eot)", but eot is always false in this code
path, so emit_shader_time_end() was never being called for vertex
shaders that performed 2 URB writes.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch updates the interp[] array to match the enum
glsl_interp_qualifier.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
v2: Add a STATIC_ASSERT to make sure the array is the correct size.
This required adding INTERP_QUALIFIER_COUNT to the enum.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This option can force textures to be untiled. However, on Gen6+, depth
buffers must be Y-tiled. MSAA buffers also must be Y-tiled. So setting
this option on even a trivial application like glxgears causes assertion
failures in a debug build, and likely GPU hangs in a release build.
It's just giving users a license to shoot themselves in the foot.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the past, we preferred X-tiling for color buffers because our BLT
code couldn't handle Y-tiling. However, the BLT paths have been largely
replaced by BLORP on Gen6+, which can handle any kind of tiling.
We hadn't measured any performance improvement in the past, but that's
probably because compressed textures were all untiled anyway.
Improves performance in GLB27_TRex_C24Z16_FixedTime by 7.69231%.
v2: Rebase on top of Eric's untiled-for-larger-than-aperture changes.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code has no rationale for why we would force compressed textures to
be untiled, and it appears to work fine. Git archeology indicates that
it's been that way dating back to when we first started tiling.
Improves performance in GLB27_TRex_C24Z16_FixedTimeStep at 1280x720 by
10.0529% +/- 0.573075% (n=12). Improves performance in Xonotic by
4.56409% +/- 0.27965% (n=3).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch (1) extracts from intel_miptree_create() the spaghetti logic
that selects the tiling format, (2) rewrites that spaghetti into a lucid
form, and (3) moves it to a new function, intel_miptree_choose_tiling().
No behavioral change.
As a bonus, it is now evident that the force_y_tiling parameter to
intel_miptree_create() does not really force Y tiling.
v2 (Ken): Rebase on top of Eric's untiled-for-larger-than-aperture
changes. This required passing in the miptree.
Signed-off-by: Chad Versace <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When moving the renderbuffer to a new miptree, we neglected to allocate
the hiz buffer for the new miptree. Oops.
Fixes all Piglit depthstencil-render-miplevels tests from crash to pass on
Sandybridge.
Note: This is a candidate for the 9.1 branch.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
calim pointed out we were getting mipmap levels for array multisamples,
this didn't make sense. So then I noticed this function takes last_level
so we are passing in a too high value here.
I think this should fix the case he was seeing.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Doing so was breaking miptree mapping, which we really need to be able to
handle. With this change, intel_miptree_map_direct() falls through to
doing a CPU mapping on the buffer like we need.
With the previous 2 patches, all of these should be fixed:
piglit max-texture-size (all 3 patches required!)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37871
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44958
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53494
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This still fails, since 8192*4bpp == 32768, which is too big to use the
blitter on.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
|
|
|
|
|
|
|
| |
This will be used for handling updates of large textures.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we require 2.6.39, there's no need to also check for 2.6.29.
Calling drm_intel_bufmgr_gem_enable_fenced_relocs() without checking
should be safe, as it simply sets a flag.
This does remove the check for zero fences available, but that doesn't
seem worth checking.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Chris Wilson's relaxed relocation patch landed in March 2011. Anyone
running pre-3.0 kernels probably isn't going to get the latest Mesa
anyway.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
These were likely used for BRW_NEW_... dirty bit flags at one point, but
they're unused now.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Nobody uses this value, so there's no need to set it.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When I removed the proj_attrib_mask optimization, I also removed the
last consumer of this bit without realizing it.
Since nobody uses it, there's no point in flagging it.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It has 2 dependencies: glClampColor and the framebuffer, we might just as well
do the update where those two are changed.
v2: cosmetic changes from Brian's email
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
| |
This should reduce shader recompilations with drivers that emulate fragment
color clamping, because we want the clamping to be enabled only if there is
a signed normalized or floating-point colorbuffer.
Reviewed-by: Brian Paul <[email protected]>
|