| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The BLT engine has many limitations. Currently, it can only blit
X-tiled buffers (since we don't have a kernel API to whack the BLT
tiling mode register), which means all depth/stencil operations get
punted to meta code, which can be very CPU-intensive.
Even if we used the BLT engine, it can't blit between buffers with
different tiling modes, such as an X-tiled non-MSAA ARGB8888 texture
and a Y-tiled CMS ARGB8888 renderbuffer. This is a fundamental
limitation, and the only way around that is to use BLORP.
Previously, BLORP only handled BlitFramebuffer. This patch adds an
additional frontend for doing CopyTexSubImage. It also makes it the
default. This is partly to increase testing and avoid hiding bugs,
and partly because the BLORP path can already handle more cases. With
trivial extensions, it should be able to handle everything the BLT can.
This helps PlaneShift massively, which tries to CopyTexSubImage2D
between depth buffers whenever a player casts a spell. Since these
are Y-tiled, we hit meta and software ReadPixels paths, eating 99% CPU
while delivering ~1 FPS. This is particularly bad in an MMO setting
because people cast spells all the time.
It also helps Xonotic in 4X MSAA mode. At default power management
settings, I measured a 6.35138% +/- 0.672548% performance boost (n=5).
(This data is from v1 of the patch.)
No Piglit regressions on Ivybridge (v3) or Sandybridge (v2).
v2: Create a fake intel_renderbuffer to wrap the destination texture
image and then reuse do_blorp_blit rather than reimplementing most
of it. Remove unnecessary clipping code and conditional rendering
check.
v3: Reuse formats_match() to centralize checks; delete temporary
renderbuffers. Reorganize the code.
v4: Actually copy stencil when dealing with separate stencil buffers but
packed depth/stencil formats. Tested by a new Piglit test.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]> [v4]
Reviewed-by: Ian Romanick <[email protected]> [v3]
Reviewed-and-tested-by: Carl Worth <[email protected]> [v2]
Tested-by: Martin Steigerwald <[email protected]> [v3]
|
| |
|
|
|
|
| |
This was used by the old VS backend, but that's long gone.
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to bug #54524, I regressed oglconform's multicontext test
when I reenabled the fragment shader precompile.
However, these test cases only passed by miraculous coincedence. We
assign each fragment program a unique ID (brw_fragment_program::id which
becomes brw_wm_prog_key::program_string_id) which we obtain by storing a
per-context counter.
The test case uses GLX context sharing to access the same fragment
program from two different contexts. This means that we share a program
cache. Before the precompile, if both contexts happened to use the same
shaders in the same order, we'd obtain the same program_string_ids (by
virtue of doing the same computation twice). However, the more likely
scenario is that they completely disagree on program_string_id.
This meant that we'd have two completely different fragment shaders in
the cache with the same ID, tricking us to think they were the same
(aside from NOS), so we'd render using the wrong program.
This patch implements a simple fix suggested by Eric: it moves the
global counter out of brw_context and into intel_screen, which is shared
across all contexts. A mutex protects it from concurrent access.
This is also the first direct usage of pthreads in the i965 driver.
Fixes 10 subcases of oglconform's multicontext test.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54524
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Previously this macro existed in 3 separate places, some inside the
intel driver and some outside of it. It makes more sense to have it
in main/macros.h
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Some shaders experience resets more than others, which skews the numbers
reported. Attempt to correct for this by linearly scaling according to
the number of resets that happen.
Note that will not be accurate if invocations of shaders have varying
times and longer invocations are more likely to reset. However, this
should at least be better than the previous situation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This can be used for two purposes: Using hand-coded shaders to determine
per-instruction timings, or figuring out which shader to optimize in a
whole application.
Note that this doesn't cover the instructions that set up the message to
the URB/FB write -- we'd need to convert the MRF usage in these
instructions to GRFs so that our offsets/times don't overwrite our
shader outputs.
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
v2: Check the timestamp reset flag in the VS, which is apparently
getting set fairly regularly in the range we watch, resulting in
negative numbers getting added to our 32-bit counter, and thus large
values added to our uint64_t.
v3: Rebase on reladdr changes, removing a new safety check that proved
impossible to satisfy. Add a comment to the AOP defs from Ken's
review, and put them in a slightly more sensible spot.
v4: Check timestamp reset in the FS as well.
|
|
|
|
|
|
|
|
|
|
| |
Given that we have the mask information here (assuming the rebase is to
the same tiling, which is safe), we can just save a set of miptrees and
offsets and the global intra-tile offset in the context and cut out a
bunch of logic. This will also save emitting the next fix I need to do
twice.
Acked-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
brw_optimize.c's brw_opcodes table was a copy of brw_disasm.c's
opcode_descs table, but with an additional field: is_arith. Now that
I've deleted that, the two are identical. Keep the one in brw_disasm.c.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
Nobody uses it.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
At this point, it's just gl_shader_program. Nobody even uses it; even
the program that creates them only returns gl_shader_program pointers.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
With a name like that, it can't be used. Sure enough, it's not.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
We were successfully freeing our compile data at context destroy, but until
then we were allocating a new store every compile without freeing it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56019
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
If the index buffer is full of values like "0 1 2 3", but basevertex is 4, we
need to upload at least vertex data for elements 4 5 6 7. Whether we also
upload 0 1 2 3 is a question of whether there are VBOs present or not -- see
the code setting start_vertex_bias in brw_draw_upload.c.
Fixes piglit draw-elements*base-vertex user_varrays
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Only the old backend used it.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
It's no longer used for anything.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This essentially reverts the following:
commit c625aa19cb53ed27f91bfd16fea6ea727e9a5bbd
Author: Chris Wilson <[email protected]>
Date: Fri Feb 18 10:37:43 2011 +0000
intel: extend current vertex buffers
While working on optimizing an upcoming Steam title, I broke this code.
Eric expressed his doubts about this optimization, and noted that the
original commit offered no performance data.
I ran before and after benchmarks on Xonotic and Citybench, and found
that this code made no difference. So, remove it to reduce complexity
and make future work simpler.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
This is a leftover from when we had to split those two functions due to
the separate BO validation step.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
"Active" is an already-used term for the query being between
glBeginQuery() and glEndQuery(), while this is tracking whether the
start of the packet pair for emitting state has been inserted into the
current batchbuffer.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Now that we've replaced all the variable settings other than reg_width, it's
easy to hang on to this (the expensive part of setting up the allocator).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This is super basic, but it let me visualize a problem I had with
opt_compute_to_mrf().
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Fixes 51 piglit tests (fbo-clear-formats, and most of the remaining failures
in depthstencil).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This code is twisty, and the comment before most of the blocks was actually
giving me the opposite impression from its intention: We want to apply as much
of our offset as possible through coarse tile-aligned adjustment, since we can
do so independently per buffer, and apply the minimum we can through
fine-grained drawing offset x/y, since it has to agree between all buffers.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Saves 96MB of wasted memory in the l4d2 demo.
v2: Rebase on compare func change, change brace style.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Saves 26.5MB of wasted memory allocation in the l4d2 demo.
v2: Rebase on compare func change, fix comments.
Reviewed-by: Ian Romanick <[email protected]> (v1)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, this just avoids comparing all unused parts of param[] and
pull_param[], but it's a step toward getting rid of those giant statically
sized arrays.
v2: Actually use the new function instead of just looking at its
address. This required changing the args to const pointers.
(review by Kenneth)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
v2: Use base-10 for versions like gl_context::Version. Suggested by Ken.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
v2: Comment fix, drop extraneous parens (review by Kenneth)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Ever since ctx->NativeIntegers was set, the conversion flag has been
PARAM_NO_CONVERT.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Gen6, multisampled null render targets don't seem to work
properly--they cause the GPU to hang. So, as a workaround, we render
into a dummy color buffer.
Fortunately this situation (multisampled rendering without a color
buffer) is rare, and we don't have to waste too much memory, because
we can give the workaround buffer a very small pitch.
Fixes piglit test "EXT_framebuffer_multisample/no-color {2,4}
depth-computed *" on Gen6.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Intel hardware doesn't natively support textureGrad with shadow
comparisons. So we need to generate code to handle it somehow.
Based on the equations of page 205 of the OpenGL 3.0 specification,
it's possible to compute the LOD value that would be selected given the
gradient values. Then, we can simply convert the TXD to a TXL.
Currently, this passes 34/46 of oglconform's shadow-grad subtests;
four cubemap tests are regressed. We should investigate this in the
future.
v2: Apply abs() to the scalar case (thanks to Eric).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the Ivy Bridge PRM, Vol 2 Part 1 p280-281 (3DSTATE_WM:
Barycentric Interpolation Mode):
"Errata: When Centroid Barycentric mode is required, HW may
produce incorrect interpolation results when a 2X2 pixels have
unlit pixels."
To work around this problem, after doing centroid interpolation, we
replace the centroid-interpolated values for unlit pixels with
non-centroid-interpolated values (which are interpolated at pixel
centers). This produces correct rendering at the expense of a slight
increase in shader execution time.
I've conditioned the workaround with a runtime flag
(brw->needs_unlit_centroid_workaround) in the hopes that we won't need
it in future chip generations.
Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4}
{centroid-deriv,centroid-deriv-disabled}". All MSAA interpolation
tests pass now.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When querying GL_PRIMITIVES_GENERATED, if primitive restart
is also used, then take the software primitive restart
path so GL_PRIMITIVES_GENERATED is returned correctly.
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN is also updated
since it will also affected by the same issue.
As noted in brw_primitive_restart.c, with further work we
should be able to move this situation back to a hardware
handled path.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Set the step_rate value when drawing to implement
ARB_instanced_arrays for gen >= 4.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables glSampleCoverage() functionality, which allows the
client program to specify that only a portion of the samples be lit up
when performing multisampled rendering. i965 supports
glSampleCoverage() through the 3DSTATE_SAMPLE_MASK command packet,
which allows the driver to specify a bitfield indicating which samples
to light up.
Fixes piglit tests "EXT_framebuffer_multisample/sample-coverage {2,4}
{inverted,non-inverted}".
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Gen7, push constants for shader programs are stored in the URB, so
blorp code needs to set aside space for them. This was previously
unnecessary because blorp code was based on HiZ operations, which
don't require any shaders.
This patch adds a call from gen7_blorp_exec() to
gen7_allocate_push_constants(), to ensure that push constants are
assigned the correct location in the URB. It also extracts a new
function gen7_emit_urb_state() from gen7_upload_urb(), which is
re-used by gen7_blorp_emit_urb_config() to ensure that the URB regions
used by all the pipeline stages leave room for the push constants.
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
When brw->prim_restart.enable_cut_index is set, the cut index
will be enabled when uploading index_buffer commands.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For newer hardware we disable the VBO module's software handling
of primitive restart. We now handle primitive restarts in
brw_handle_primitive_restart.
The initial version of brw_handle_primitive_restart simply calls
vbo_sw_primitive_restart, and therefore still uses the VBO
module software primitive restart support.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
v2: Add support for gen6, and don't turn it on if blending is
disabled. (fixes GPU hang), and note it in docs/GL3.txt
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, when the environment variable INTEL_DEBUG=aub was set,
mesa would simply instruct DRM to start dumping data to an .aub file,
but we would not provide DRM with any information about the format of
the data in various buffers. As a result, a lot of the data in the
generate .aub file would be unannotated, making further data analysis
difficult.
This patch causes the entire contents of each batch buffer to be
annotated using the data in brw->state_batch_list (which was
previously used only to annotate the output of INTEL_DEBUG=bat). This
includes data that was allocated by brw_state_batch, such as binding
tables, surface and sampler states, depth/stencil state, and so on.
The new annotation mechanism requires DRM version 2.4.34.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables MSAA for Gen6, by modifying intel_mipmap_tree to
understand multisampled buffers, adapting the rendering pipeline setup
to enable multisampled rendering, and adding multisample resolve
operations to brw_blorp_blit.cpp. Some preparation work is also
included for Gen7, but it is not yet enabled.
MSAA support is still fairly preliminary. In particular, the
following are not yet supported:
- Fully general blits between MSAA and non-MSAA buffers.
- Formats other than RGBA8, DEPTH24, and STENCIL8.
- Centroid interpolation.
- Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE,
GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
GL_SAMPLE_COVERAGE_INVERT).
Fixes piglit tests "EXT_framebuffer_multisample/accuracy" on
i965/Gen6.
v2:
- In intel_alloc_renderbuffer_storage(), quantize the requested number
of samples to the next higher sample count supported by the
hardware. This ensures that a query of GL_SAMPLES will return the
correct value. It also ensures that MSAA is fully disabled on Gen7
for now (since Gen7 MSAA support doesn't work yet).
- When reading from a non-MSAA surface, ensure that s_is_zero is true
so that we won't try to read from a nonexistent sample.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch expands the "blorp" component to be able to perform blits
as well as HiZ resolves. The new blitting code is located in
brw_blorp_blit.cpp. This includes the necessary fragment shader code
to look up pixels in the source buffer (which is configured as a
texture) and output them to the destination buffer (which is
configured as the render target).
Most of the time the fragment shader code is simple and
straightforward, since it merely has to apply a coordinate offset,
read from the texture, and write to the render target. However, in
the case of blitting stencil buffers, things are more complicated,
since the GPU stores stencil data using W tiling, and W tiling is not
supported for textures or render targets. So, we set up the stencil
buffers as Y tiled, and emit fragment shader code that adjusts the
coordinates to account for the difference between W and Y tiling.
Furthermore, since a rectangular region in W tiling does not
necessarily correspond to a rectangular region in Y tiling, we widen
the rectangle primitive to the nearest tile boundary and have the
fragment shader "kill" any pixels that don't fall inside the actual
desired destination rectangle.
All of this is a necessary prerequisite for implementing MSAA, since
we'll need to be able to blit between multisample color, depth, and
stencil buffers and their non-multisampled counterparts, and none of
the existing blitting mechanisms support multisampling.
In addition, the new blitting code should speed up operations where we
previously fell back to software rasterization, such as blitting of
stencil buffers. The current fallback sequence is: first we try to do
a blit using the hardware blitting engine. If that fails we try to do
a blit using the render path. If that also fails then we do the blit
using a meta-op (which may or may not fall back to software
rasterization).
Note that blitting using the render path has some limitations at the
moment: it only supports a few formats, and it doesn't support
clipping or scissoring. These limitations will be addressed in future
patch series.
v2:
- Add the code that configures the WM program to
gen{6,7}_emit_wm_config() and gen7_emit_ps_config() rather than
creating separate ...enable() functions.
- Call intel_prepare_render before determining which miptrees we are
blitting from/to, because it may cause miptrees to be reallocated.
- Allow the blit to mirror X and/or Y coordinates.
- Disable blorp blits on Gen7 for now, since they aren't working yet.
|
|
|
|
|
|
|
|
| |
These declarations are necessary to allow C++ code to call C code
without causing unresolved symbols (which would make the driver fail
to load).
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
Now that we use separate binding tables for WM, VS, and GS, and have
BRW_MAX_VS_SURFACES and BRW_MAX_GS_SURFACES macros, we really shouldn't
have an unqualified BRW_MAX_SURFACES macro. It's confusing.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
They had a number of issues:
- A paragraph states that we use a single binding table, but we don't.
- We labelled the WM binding table diagram as SOL/WM.
- The WM diagram had an "Only relevant to the WM" comment. Duh.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch add the support of gl_PointCoord gl builtin variable for
platform gen4 and gen5(ILK).
Unlike gen6+, we don't have a hardware support of gl_PointCoord, means
hardware will not calculate the interpolation coefficient for you.
Instead, you should handle it yourself in sf shader stage.
But badly, gl_PointCoord is a FS instead of VS builtin variable, thus
it's not included in c.vue_map generated in VS stage. Thus the current
code doesn't aware of this attribute. And to handle it correctly, we
need add it to c.vue_map manually to let SF shader generate the needed
interpolation coefficient for FS shader. SF stage has it's own copy of
vue_map, thus I think it's safe to do it manually.
Since handling gl_PointCoord for gen4 and gen5 platforms is somehow a
little special, I added a lot of comments and hope I didn't overdo it ;)
v2: add a /* _NEW_BUFFERS */ comment to note the state flag dependency
and also add the _NEW_BUFFERS dirty mask (Eric).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45975
Piglit: glsl-fs-pointcoord and fbo-gl_pointcoord
NOTE: This is a candidate for stable release branches.
Signed-off-by: Yuanhan Liu <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|