| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
| |
Constant combining won't promote non-floats, so this isn't safe.
Fixes regressions since commit 0087cf23e.
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
The INTEL_DEBUG variable is a uint64_t and if we want a enum value higer
than 32 bits, you need to use ull. We might as well use it for all of them.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The BLT engine on Gen8+ requires linear surfaces to be cacheline
aligned. This restriction was added as part of converting the BLT to
use 48-bit addressing.
The main user, intel_emit_linear_blit, now handles this properly.
But we might also have linear miptrees; just refuse to blit those.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88521
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The BLT engine on Gen8+ requires linear surfaces to be cacheline
aligned. This restriction was added as part of converting the BLT to
use 48-bit addressing.
intel_emit_linear_blit needs to handle blits that are not cacheline
aligned, as we use it for arbitrary glBufferSubData calls and subrange
mappings.
Since intel_emit_linear_blit uses 1 byte per pixel, we can use the src/dst
pixel X offset field to represent the unaligned portion, and subtract
that from the address so it's cacheline aligned.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88521
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Cc: [email protected]
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This name better matches what it's actually used for. The patch was
generated with the following command:
for file in *; do
sed -i -e s/brw_compile/brw_codegen/g $file
done
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Since commit 2881b123, we have used 0/~0 for representing booleans on all
gens. However, we still had a bunch of places in the visitor code where we
were still referring to ctx->Const.UniformBooleanTrue. Since this is
always ~0, we can just remove them.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This also involves moving revision checking to screen creation time and
passing that into brw_get_device_info so that we can get the right
device_info for early versions of SKL. Since the only place we used
revision was to check for SIMD16 3-src instruction support, it's safe to
remove the revision field from brw_context.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
It's basically just a copy of GEN7_FEATURES only with is_haswell set
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
In future tests, we will start relying on devinfo and not just brw in the
compiler. Changing this now keeps these tests from failing in the future.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
It wasn't really being used anyway. We used it to assert that gpu_shader5
is supported in the back-end but that should be caught by the front-end.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
- Do not attempt to create the save folder twice - both dir $@ and
PRIVATE_LOCALEDIR point to the same place.
- Use @ and $(hide), for mkdir and python, to avoid spamming the
output.
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2 [Emil Velikov]
- Keep the PRIVATE_LOCALEDIR variable.
- Do not use $(@D) but the more widespead $(dir $@)
Signed-off-by: Chih-Wei Huang <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
The modules need the headers can get the path automatically.
Signed-off-by: Chih-Wei Huang <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Android 5.0 allows modules to generate source into $OUT/gen, which will
then be copied into $OUT/obj and $OUT/obj_$(TARGET_2ND_ARCH) as necessary.
Modules will need to change calls to local-intermediates-dir into
local-generated-sources-dir.
The patch changes local-intermediates-dir into local-generated-sources-dir.
If the Android version is less than 5.0, fallback to local-intermediates-dir.
The patch also fixes the 64-bit building issue of Android 5.0.
v2 [Emil Velikov]
- Keep the LOCAL_UNSTRIPPED_PATH variable.
Signed-off-by: Chih-Wei Huang <[email protected]>
|
|
|
|
|
|
|
|
| |
The dri modules depend on symbols provided by it.
Cc: "10.5" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Chih-Wei Huang <[email protected]>
|
|
|
|
|
|
|
|
| |
Similar to e8c5cbfd921(mesa: Add gallium include dirs to more parts of
the tree.)
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Chih-Wei Huang <[email protected]>
|
|
|
|
|
|
|
| |
Otherwise we'll fail to find the drm.h header.
Cc: "10.4 10.5" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Many parts of mesa already have the include with others depending on it
but it's missing. Add it once at the top makefile and be done with it.
Cc: "10.4 10.5" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Chih-Wei Huang <[email protected]>
|
|
|
|
|
|
|
|
|
| |
... to manage the LIBDRM*_CFLAGS. The former is the recommended approach
by the Android build system developers while the latter has been
depreciated for quite some time.
Cc: "10.4 10.5" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Appears to fix shader compilation. Tested by starting the client,
dragging the "quality and speed" slider back and forth, and watching the
console output - instead of piles of "shader failed to compile", the CPU
seems to be busy compiling shaders. I haven't actually tried to play.
Signed-off-by: Kenneth Graunke <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69226
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71591
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The X and Y values come interleaved in g1 (.4-.11 inclusive), so we can
calculate them together with a single add(32) instruction on some
platforms like Broadwell and newer or in SIMD8 elsewhere.
Note that I also moved the PIXEL_X/PIXEL_Y virtual opcodes from before
LINTERP to after it. That's because the writes_accumulator_implicitly()
function in backend_instruction tests for <= LINTERP for determining
whether the instruction indeed writes the accumulator implicitly. The
old FS_OPCODE_PIXEL_X/Y emitted ADD instructions, which did, but the new
opcodes just emit MOVs, which don't. It doesn't matter, since we don't
use these opcodes on Gen4/5 anymore, but in the case that we do...
On Broadwell:
total instructions in shared programs: 7192355 -> 7186224 (-0.09%)
instructions in affected programs: 1190700 -> 1184569 (-0.51%)
helped: 6131
On Haswell:
total instructions in shared programs: 6155979 -> 6152800 (-0.05%)
instructions in affected programs: 652362 -> 649183 (-0.49%)
helped: 3179
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This lets SIMD16 programs on G45 and Gen5 use the PLN instruction.
On Ironlake:
total instructions in shared programs: 5634757 -> 5518055 (-2.07%)
instructions in affected programs: 1745837 -> 1629135 (-6.68%)
helped: 11439
HURT: 4
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
These were used only on Gen4 and 5. emit_interpolation_setup_gen6() emits
ADDs directly. The virtual opcodes weren't providing anything useful.
I'm going to repurpose these opcodes, so deleting and readding them makes
it simpler to see what's going on.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Like LINE (commit 92346db0), src0 must have a scalar region. Setting
src1's region to <8,8,1> lets us pass a properly sized combined delta_xy
argument in a few commits without getting a bogus <16,16,1> region.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
LINTERP's src0 is PLN's src1, and PLN's src1 reads exec_size / 4
registers.
Having that information lets us drop the delta_x/y special case code in
split_virtual_grfs().
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
We don't want to set compression control on a SIMD16 instruction
operating on words or smaller.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
In a few commits, we'll start emitting an add(32) instruction on some
platforms.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Avoids annoying warnings when comparing with sizeof(...).
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
guess_execution_size() does two things:
1. Cope with small destination registers.
2. Cope with SIMD8 vs SIMD16 mode.
This patch replaces the first with a simple if block in brw_set_dest: if
the destination register width is less than 8, you probably want the
execution size to match. (I didn't put this in the 3src block because
it doesn't seem to matter.)
Since only the FS compiler cares about SIMD16 mode, it's easy to just
set the default execution size there.
This pattern was already been proven in the Gen8+ generator, but we
didn't port it back to the existing generator when we combined the two.
This is based on a patch from Ken from about a year ago. I've rebased it
and and fixed a few bugs.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
The BSpec says this applies to Gen6 as well.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Consistently just use C99's __func__ everywhere.
No functional changes.
Signed-off-by: Marius Predut <[email protected]>
Acked-by: Michel Dänzer <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Skylake the qpitch value is uploaded as part of the surface state
so we don't need to add the extra rows that are done for other
generations. However for 3D textures it needs to be aligned to the
tile height and for depth/stencil textures it needs to be a multiple
of 8. Unlike previous generations the qpitch is measured as a multiple
of the block size for compressed surfaces. When the horizontal mipmap
layout is used for 1D textures then the qpitch is measured in pixels
instead of rows.
v2: Align the depth/stencil textures to a multiple of 8
v3: Add an assert that ALL_SLICES_AT_EACH_LOD is not used. Ignore the
vertical alignment when picking the qpitch for 1D_ARRAY textures.
Reviewed-by: Ben Widawsky <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The render surface state command for Skylake doesn't have the surface
array spacing bit so it's not possible to select this layout. I think
it was only used in order to make it pick a tightly-packed qpitch
value that doesn't include space for the mipmaps. However this won't
be necessary after the next patch because it will automatically pick a
packed qpitch value whenever first_level==last_level. It is better to
remove this layout entirely on Gen8+ because although it can
effectively be implemented with a small qpitch value when there are no
mipmaps it isn't possible to support the case where there are mipmaps
because in that case the layout is very different.
It could be good to make a similar change for Gen8 if we also change
the layouting code to pick the qpitch value in a similar way.
v2: Make the commit message and comments more convincing
Reviewed-by: Ben Widawsky <[email protected]>
Tested-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We haven't implemented proper unsynchronized map support on !LLC systems
(pre-SNB, Atom). MapBufferRange with GL_MAP_UNSYNCHRONIZE_BIT will
actually do a synchronized map, probably killing performance.
Also warn on BufferSubData, when we should be doing an unsynchronized
upload, but instead have to do a synchronous map.
v2: Only complain if the buffer is actually busy - we use unsynchronized
maps internally for vertex upload and such, but expect those to not
be busy.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
Tested-by: Ben Widawsky <[email protected]>
|