| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Trivial cleanup.
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
| |
The old comment pinned this function to X11 windows. In reality, this
function serves more than X11 and more than just windows.
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
| |
See commit ece0e535a44c228dd994861592deb155c14740d8. This makes
Gen4-5 follow the behavior we use on Gen6+. It seems to have
worked out there.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
This brings the improved guardbanding we implemented on Gen6+
back to the older Gen4-5 code. It also deletes piles of code.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
Gen4-5 include a single SCISSOR_RECT in SF_VIEWPORT.
Making a helper function will allow us to reuse this code for Gen4-5.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
This is clearer and less likely to break in the future.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
These are fairly related. Gen4-5 combine the scissor rectangle and
SF_VIEWPORT. Co-locating them will allow me to avoid forward
declarations of helper functions in a few patches.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Gen7+ we emit 3DSTATE_VIEWPORT_STATE_POINTERS_{SF_CL,CC} when
emitting a new viewport.
This patch makes us take the same approach on Sandybridge - but because
we have a combined command, we just set the appropriate "change" bits.
This eliminates an atom, some dirty flagging, and some brw->*.vp_offset
writes. It does mean we'll emit two 3DSTATE_VIEWPORT_STATE_POINTERS
instead of one if both change, but that's probably fine.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
This was just an accidental typo in the refactoring. The intention was
to try the blitter on gen4-5, not just gen4.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
We keep the blit path because it's probably faster when it works.
However, now that we can use blorp, we can delete that nasty CPU
fall-back path.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The width and height of the copy don't have to be aligned to the block
size if they specify the right or bottom edges of the image. (See also
the comment and asserts right above). We need to round them up when we
do the division in order to get it 100% right.
Reviewed-by: Ben Widawsky <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "17.0 17.1" <[email protected]>
|
|
|
|
|
|
|
|
| |
We don't support replicated data clears yet. Those take a bit more work
and enabling replicated data clears in its own commit is probably better
for bisectibility anyway.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Due to complications with things such as URB setup on gen4-5, it's
easier to keep gen4 support in blorp completely internal to i965. This
makes things a bit awkward because that means there's a file in i965
that includes blorp_priv.h but it's either that or have a file in blorp
that includes brw_context.h.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
As part of enabling support for SF programs, we plumb the SF URB size
through to emit_urb_config. For now, it's always zero but, on gen4, it
may be something larger.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Having it be a pointer means that we end up caching clip programs based
on a pointer to wm_prog_data rather than the actual interpolation modes.
We've been caching one clip program per FS ever since 91d61fbf7cb61a44a
where Timothy rewrote brw_setup_vue_interpolation().
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Having it be a pointer means that we end up caching clip programs based
on a pointer to wm_prog_data rather than the actual interpolation modes.
We've been caching one clip program per FS ever since 91d61fbf7cb61a44a
where Timothy rewrote brw_setup_vue_interpolation().
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
The guts of blorp and ISL don't understand i965's partial miptrees.
Instead, we need to subtract off first_level before we hand anything off
to blorp.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
ISL doesn't have a concept of a partial miptree. Instead, we need to
subtract off first_level.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The blorp_copy entrypoint is designed for doing memcpy like operations
which is what we need to do here while blorp_blit is for handling format
conversion and scaling. Using blorp_copy is much simpler and prevents
us from getting formats wrong. While we're here, we get rid of the
layers_per_blit thing since stencil always uses interleaved MSAA.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've discovered in the Vulkan driver that simply doing the end-of-pipe
sync afterwards is insufficient. The specific requirement stated in the
PRM is that you have to do one every time you transition between the
tree modes of "clear", "render", and "resolve". This is GL, so we could
track it but any attempt to do so would most likely get it wrong. For
now, it's easier to just assume that every fast-clear op is an island
and do the sync both before and after.
This also removes the unneeded flush and stall after slow-clear
operations.
Reviewed-by: Topi Pohjolainen <[email protected]>
Cc: "17.0 17.1" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Simplify the handling of mmap for Android by using mmap64 instead. mmap64
may have not existed for Android when this was written, but it's been
around since 2013.
Reviewed-by: Emil Velikov <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
|
|
|
|
|
|
|
|
| |
This platform passes the following GLES3 tests:
ES3-CTS.functional.texture.compressed.astc.endpoint_value_hdr_cem_*
Reviewed-by: Anuj Phogat <[email protected]>
Signed-off-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
text data bss dec hex filename
7038459 235248 37280 7310987 6f8e8b 32-bit i965_dri.so before
7038227 235248 37280 7310755 6f8da3 32-bit i965_dri.so after
6681438 303400 50608 7035446 6b5a36 64-bit i965_dri.so before
6681254 303400 50608 7035262 6b597e 64-bit i965_dri.so after
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This option will allow GLSL builtins to be redeclared verbatim (e.g.
redeclaring "in int gl_VertexID" in a vertex shader). This is not strictly
valid and would normally fail to compile, but some applications (such as
newer Techland ports) do it and need more leniency.
v2 (Samuel Pitoiset):
- Rename allow_glsl_builtin_redeclaration ->
allow_glsl_builtin_variable_redeclaration
Signed-off-by: John Brooks <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can easily use the upload BO for push constants on Gen7.5/Gen8 too,
at the cost of a relocation when emitting 3DSTATE_CONSTANT_XS. We can
simply switch to using constant buffer pointer 2 instead of pointer 0,
like we do on Gen9+.
Ivybridge and Baytrail can't do this trick because they require the
constant buffers to be enabled in order, starting with 0. We'd have
to set the INSTPM bit to make the constant buffer pointer not relative
to dynamic state base address, which would need kernel command parser
support.
Improves performance in GLBenchmark 2.7/TRex Offscreen by:
- Broadwell GT2: 0.305608% +/- 0.19877% (n = 68)
- Braswell: No difference proven (n = 742)
- Haswell GT3e: 0.180755% +/- 0.0237505% (n = 30)
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shaders can use quite a bit of uniform data. Better to put it in the
upload buffers, like we do for client vertex data, rather than the
batch buffer state area, which is primarly used for indirect state.
This should free up batch space, allowing us to emit more commands in a
batch before flushing. Because BRW_NEW_BATCH also causes a lot of state
to be re-emitted, it may also reduce CPU overhead a little bit.
We took this approach on Gen4-5, but switched to using the batch area
on Gen6+ because buffer 0 is relative to Dynamic State Base Address by
default, which is set to the start of the batch.
On Gen9+, we already use a relocation due to a workaround, so this is
trivial to change and has basically no downside.
Unfortunately we can't change compute shader push constants because
MEDIA_CURBE_LOAD always uses an offset from dynamic state base address.
Improves performance in GLBenchmark 2.7/TRex Offscreen by:
- Skylake GT4e: 0.52821% +/- 0.113402% (n = 190)
- Apollolake: 0.510225% +/- 0.273064% (n = 70)
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
I don't think CS push constant uploading uses the section of L3
controlled by 3DSTATE_PUSH_CONSTANT_ALLOC_XS. So I don't think
it needs to be re-emitted when that space is reallocated.
The programming note in gen7_allocate_push_constants doesn't
indicate this is necessary, at least.
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The procedure for decompressing an opaque DXT1 OpenGL format is
dependant on the comparison of two colors stored in the first 32 bits of
the compressed block. Here's the specified OpenGL behavior for
reference:
The RGB color for a texel at location (x,y) in the block is given by:
RGB0, if color0 > color1 and code(x,y) == 0
RGB1, if color0 > color1 and code(x,y) == 1
(2*RGB0+RGB1)/3, if color0 > color1 and code(x,y) == 2
(RGB0+2*RGB1)/3, if color0 > color1 and code(x,y) == 3
RGB0, if color0 <= color1 and code(x,y) == 0
RGB1, if color0 <= color1 and code(x,y) == 1
(RGB0+RGB1)/2, if color0 <= color1 and code(x,y) == 2
BLACK, if color0 <= color1 and code(x,y) == 3
The sampling operation performed on an opaque DXT1 Intel format essentially
hard-codes the comparison result of the two colors as color0 > color1.
This means that the behavior is incompatible with OpenGL. This is stated
in the SKL PRM, Vol 5: Memory Views:
Opaque Textures (DXT1_RGB)
Texture format DXT1_RGB is identical to DXT1, with the exception that the
One-bit Alpha encoding is removed. Color 0 and Color 1 are not compared, and
the resulting texel color is derived strictly from the Opaque Color Encoding.
The alpha channel defaults to 1.0.
Programming Note
Context: Opaque Textures (DXT1_RGB)
The behavior of this format is not compliant with the OGL spec.
The opaque and non-opaque DXT1 OpenGL formats are specified to be
decoded in exactly the same way except the BLACK value must have a
transparent alpha channel in the latter. Use the four-channel BC1 Intel
formats with the alpha set to 1 to provide the behavior required by the
spec. Note that the alpha is already set to 1 for RGB formats in
brw_get_texture_swizzle().
v2: Provide a more detailed commit message (Kenneth Graunke).
v3: Ensure the alpha channel is set to 1 for DXT1 formats.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100925
Cc: <[email protected]>
Acked-by: Tapani Pälli <[email protected]> (v1)
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the GPU hangs, the kernel saves some state for us. Until now it has
not included the shader programs, which are very often the reason the
GPU hang occurred. With the programs saved in the error state, we should
be more capable of debugging hangs.
Thanks to Chris Wilson and Ben Widawsky who provided the kernel support
for this feature ("drm/i915: Copy user requested buffers into the error
state"), which will be in kernel v4.13.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: Just use cast to uintptr_t (Chris)
Reported-by: Mauro Rossi <[email protected]>
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
| |
With this last state ported, we can get rid of gen8_draw_upload.c.
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
Also make the brw_get_index_type() function not shift its return, since that
is genxml's job now.
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
Emit the respective commands using genxml code.
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Several issues were caught on review after the original patch landed.
This commit fixes them.
v2:
- Fix padding (Topi)
- Remove .DestinationElementOffset change from this patch (Topi)
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This was used by the meta fast clear code. Now that we've switched
back to BLORP, it's always true.
We might want it back when we add a RECTLIST extension to GL, but
that's someday in the future...
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
| |
It's actually not that much code.
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
| |
This whole code is surrounded in #if GEN_GEN >= 6, and this code only
applies on Sandybridge. So, use GEN_GEN == 6 to reduce the delta in
the next patch, when we add Gen4-5 support.
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was added in b527dd65c830a as a work around because fixed function
fragment shaders were tracked in ctx->FragmentProgram._Current as
a gl_program rather than gl_shader_program.
However after my refactoring of the program and shader structs
at the end of 2016 which culminated in c505d6d85222, we no longer
need gl_shader_program to track the current program making
_CurrentFragmentProgram obsolete.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is something the original decoder did, but I didn't bother with
until now. I recently had to debug an Ironlake issue, and wanted to
inspect VS_STATE. So, now it's back.
The other packets in the switch statement are all Gen6/7+, where we
use offsets from dynamic state base address, so we don't need the
gtt_offset subtraction introduced here. We might want to make a
helper for this hack at some point - perhaps when we introduce the
next occurance.
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The BRW_NEW_CURBE_OFFSETS dirty bit is signalled when changing the
partitioning of the Constant Buffer URB section between the various
shader stages, on Gen4-5.
BRW_NEW_PUSH_CONSTANT_ALLOCATION is basically the same thing on Gen7+.
So, save a bit, and use the new name.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
Gen6 doesn't have a configurable push constant region. This is only
used on Gen7+.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we guarded large swathes of code with #if GEN ... #endif
blocks. This made it difficult to see which generations include what.
This patch splits up the #if..#endif sections so they surround a small
section of code - usually a single function/atom, or sometimes a group
of related functions. It should make the code easier to work on.
Reviewed-by: Rafael Antognolli <[email protected]>
|