| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
This was only necessary because our bounds checking was off by one, and
thus we read an extra pair of values.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
If we've written N pairs of values to the buffer, then last_index = N,
but the values are 0 .. N-1. Thus, we need to use <, not <=.
This worked anyway because we fill the buffer with zeroes, so we just
added an extra (0 - 0) to our results.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Mesa core is the place for encoding what format/type matches a mesa
format, so rely on that.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
We were allowing things like copying RG1616 to a user's ARGB8888
format, while we were denying anything that wasn't ARGB8888 or
RGB565.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is similar code to intel_miptree_copy_slice, but the knobs
are all set differently.
v2: fix whitespace
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
I'm trying to move us away from the region structure, and all the
callers are currently dereferencing a miptree to get the region.
In this change, the map_refcount is dropped. However, the bo->virtual is
itself map refcounted, so that's already dealt with.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
The point of tracking the value was removed in February 2012
(65b096aeddd9b45ca038f44cc9adfff86c8c48b2), and this should have
been removed at the same time.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
I don't see any reason for it -- it was introduced with the DRI2
invalidate work by krh in 2010 with no explanation. I suspect it was
something about wanting the same drm_intel_bo struct underneath multiple
openings of the BO within one process, but that's covered by libdrm at
this point. As far as the struct region goes, it is not threadsafe, so
multiple contexts sharing a region could have mixed up the map_count and
assertion failed or worse.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
I was testing the ARB_debug_output code and wrote an obvious sample that
should have hit this, and got confused that my ARB_debug_output was
broken.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
I tried to ensure that performance in the non-debug case doesn't change
(we still just check one condition up front), and I think the impact is
small enough in the debug context case to warrant including all of it.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
They're about to change to handle GL_ARB_debug_output, so just make one
function.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This doesn't provide detailed error type information, but it's important
to get these relatively severe but rare error messages out to the
developer through whatever mechanism they are using.
v2: Rebase on new WARN_ONCE additions.
Reviewed-by: Jordan Justen <[email protected]> (v1)
|
|
|
|
|
|
|
|
| |
Should fix https://bugs.freedesktop.org/show_bug.cgi?id=60510
Note: this is a candidate for the stable branches
Acked-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The second digit was off by one, which meant we accidentally treated
GTn as GT(n-1). This also meant no support for GT1 at all.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We'll want to reuse this for non-occlusion queries in the future.
Plus, it's a single logical task, so having it as a helper function
clarifies the code somewhat.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Again, eliminating a global variable in favor of a per-query object
variable will help in a future where we have more queries in hardware.
Personally, I find this clearer: there's just the query object's BO,
rather than two variables that usually shadow each other.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
The code a few lines above calls brw_emit_query_begin() if !query->bo,
and that creates query->bo. So it should always be non-NULL.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we haven't allocated a BO yet, we need to do that. Or, if there
isn't enough room to write another pair of values, we need to gather up
the existing results and start a new one. This is simple enough.
However, the old code was awkwardly split into two blocks, with a
write_depth_count() placed in the middle. The new depth count isn't
relevant to gathering the old BO's data, so that can go after the
reallocation is done. With the two blocks adjacent, we can merge them.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since we already have an index in the brw_query_object, there's no need
to also keep a global variable that shadows it.
Plus, if we ever add support for more types of queries that still need
the per-batch before/after treatment we do for occlusion queries, we
won't be able to use a single global variable. In contrast, per-query
object variables will work fine.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
brw->query.index is initialized to 0 just a few lines before it's
copied to first_index.
Presumably the idea here was to reuse the query BO for subsequent
queries of the same type, but since that doesn't happen, there's no need
to have the extra code complexity.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This code was really difficult to follow, for a number of reasons:
- Queries were handled in four different ways (TIMESTAMP writes a single
value, TIME_ELAPSED writes a single pair of values, occlusion queries
write pairs of values for the start and end of each batch, and other
queries are done entirely in software. It turns out that there are
very good reasons each query is handled the way it is, but
insufficient comments explaining the rationale.
- It wasn't immediately obvious which functions were driver hooks
and which were helper functions. For example, brw_query_begin() is
a driver hook that implements glBeginQuery() for all query types, but
the similarly named brw_emit_query_begin() is a helper function that's
only relevant for occlusion queries.
Extra explanatory comments should save me and others from constantly
having to ask how this code works and why various query types are
handled differently.
v2: Incorporate Eric's feedback: change "as soon as possible" to "the
results will be present when mapped."
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
For timestamp queries, we just write a single value to a BO. The
natural place to write that is element 0, so we should do that.
Previously, we wrote it into element 1 (the second slot) leaving
element 0 filled with garbage.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This moves the GL_TIMESTAMP handling out of EndQuery.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
V2: Works on Ivy Bridge now too, so this can be 6+.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Gen6, lower this to `ld` with lod=0 and an extra sample_index
parameter.
On Gen7, use `ld2dms`. We don't support CMS yet for multisample
textures, so we just hardcode MCS=0. This is ignored for IMS and UMS
surfaces.
Note: If we do end up emitting specialized shaders based on the MSAA
layout, we can emit a slightly shorter message here in the UMS case.
Note: According to the PRM, `ld2dms` takes one more parameter, lod.
However, it's always zero, and including it would make the message too
long for SIMD16, so we just omit it.
V2: Reworked completely, added support for Gen7.
V3: - Introduce sample_index parameter rather than reusing lod
- Removed spurious whitespace change
- Clarify commit message
V4: - Fix comment style
- Emit SHADER_OPCODE_TXF_MS on Gen6. This was benignly wrong since
it lowers to `ld` anyway on this gen, but still wrong.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Gen6, lower this to `ld` with lod=0 and an extra sample_index
parameter.
On Gen7, use `ld2dms`. This takes an additional MCS parameter to support
compressed multisample surfaces, but we're not enabling them for
multisample textures for now, so it's always ignored and can be safely
omitted.
V2: Reworked completely, added support for Gen7.
V3: - Use new sample_index, sample_index_type rather than reusing lod
- Clarify commit message.
V4: - Fix comment style
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is very similar to the TXF opcode, but lowers to `ld2dms` rather
than `ld` on Gen7.
V4: - add SHADER_OPCODE_TXF_MS to is_tex() functions, so regalloc thinks
it actually writes the correct number of registers. Otherwise in
nontrivial shaders some of the registers tend to get clobbered,
producing bad results.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen7 has an erratum affecting the ld_mcs message, making it unsafe to
use when the surface doesn't have an associated MCS.
From the Ivy Bridge PRM, Vol4 Part1 p77 ("MCS Enable"):
"If this field is disabled and the sampling engine <ld_mcs>
message is issued on this surface, the MCS surface may be
accessed. Software must ensure that the surface is defined
to avoid GTT errors."
To allow the shader to treat all surfaces uniformly, force UMS if the
surface is to be used as a multisample texture, even if CMS would have
been possible.
V3: - Quoted erratum text
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The surface_state setup for renderbuffers already worked; only the
texturing side needed work. BLORP does something similar, but does its
own surface_state setup.
On Gen6, we just need to set the correct sample count.
On Gen7: - set the correct sample count
- set the correct layout mode
- set GEN7_SURFACE_ARYSPC_LOD0 if it's set in the miptree.
V2: - Clarify commit message
- Rebased onto Paul's physical/logical dims cleanup
- Added Gen7 support
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
V2: - Fix for state moving from texobj to image
- Rebased onto Paul's logical/physical cleanup
- Fixed missing quantization of sample count
- Fold in IMS renderbuffer wrapper fixes from later in the series
- Use correct physical slice offset for UMS/CMS surfaces on Gen7
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Moves the definition of the sample positions out of
gen6_emit_3dstate_multisample, and unpacks them in
gen6_get_sample_position.
V2: Be consistent about `sample position` rather than `location`.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Acked-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
V2: For now, only expose a depth sample count of 1, since there are
possible unresolved interactions with HiZ.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-and-tested-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GLX extension lets you expose visuals that explicitly guarantee you
that the GL_FRAMEBUFFER_SRGB_CAPABLE flag will be set, but we can set
the flag even while the visual doesn't provide the guarantee. This
appears to be consistent with other implementations, as we've seen
several apps now that don't require an srgb visual and assume sRGB will
work without checking the GL_FRAMEBUFFER_SRGB_CAPABLE flag.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55783
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60633
Reviewed-and-tested-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we have W-tiled S8, we can't just region_map and poke at bits --
there has to be some swizzling. Rely on intel_miptree_map to get that job
done. This should also get the highest performance path we know of for the
mapping (interesting if I get around to finishing movntdqa some day).
v2: Fix stale name of the bit in a comment.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
I want to reuse intel_miptree_map() to replace some region mapping that's
broken for separate stencil, but doing so would result in new demands on
ETC transcode that we actually don't want to happen.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
| |
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without this set, dri_util.c:dri2CreateContextAttribs
will reject requests to create a context with
__DRI_API_OPENGL_CORE.
This prevents a 3.2 core profile context from being created
even when MESA_GL_OVERRIDE_VERSION=3.2 is used.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
If the override is version is >= 3.1, then update the
max_gl_core_version. Otherwise, update max_gl_compat_version.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Immediate operands can only be src2 in 2-source instructions. Fixes
piglit failures since 0a1d145e (oops!).
Spotted-by: Eric Anholt <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The field was equivalent to (etc_format != MESA_FORMAT_NONE), and
therefore duplicate information.
This patch removes field and replaces all references to it with
`etc_format != MESA_FORMAT_NONE`.
No Piglit ETC test regresses on Intel Sandybridge.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2 [mattst88]:
- Add BRW_OPCODE_LRP to list of CSE-able expressions.
- Fix op_var[] array size.
- Rename arguments to emit_lrp to (x, y, a) to clear confusion.
- Add LRP function to brw_fs.cpp/.h.
- Corrected comment about LRP instruction arguments in emit_lrp.
v3 [mattst88]:
- Duplicate MAD code for LRP instead of using a function pointer.
- Check for != GRF instead of == IMM in emit_lrp.
- Lower LRP on gen < 6.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
1
|
|
|
|
|
|
|
|
| |
Like MAD, this is another three-source instruction.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Many GPUs have an instruction to do linear interpolation which is more
efficient than simply performing the algebra necessary (two multiplies,
an add, and a subtract).
Pattern matching or peepholing this is more desirable, but can be
tricky. By using an opcode, we can at least make shaders which use the
mix() built-in get the more efficient behavior.
Currently, all consumers lower ir_triop_lrp. Subsequent patches will
actually generate different code.
v2 [mattst88]:
- Add LRP_TO_ARITH flag to ir_to_mesa.cpp. Will be removed in a
subsequent patch and ir_triop_lrp translated directly.
v3 [mattst88]:
- Move changes from the next patch to opt_algebraic.cpp to accept
3-src operations.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 346873 -> 346847 (-0.01%)
instructions in affected programs: 364 -> 338 (-7.14%)
(All affected shaders are from Lightsmark)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
total instructions in shared programs: 1376297 -> 1375626 (-0.05%)
instructions in affected programs: 35977 -> 35306 (-1.87%)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen6 has write-only MRF registers, and for ease of implementation we
paritition off 16 general purposes registers to act as MRFs on Gen7.
Knowing that our Gen7 MRFs are actually GRFs, we can do things we can't
do with real MRFs:
- read from them;
- return values directly to them from a send instruction; and
- compute directly to them with math instructions.
Reviewed-by: Eric Anholt <[email protected]>
|