| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
This limits the number of emitted vertices to the shader's max output
vertices, and avoids writing them into memory that isn't big enough
to hold them.
Reviewed-by: Zack Rusin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add Intel driver hook for glGetTexImage to accelerate the case of reading
texture image into a PBO. This case gets huge performance gains by using
GPU BLIT directly to PBO rather than GPU BLIT to temporary texture followed
by memcpy.
No regressions on Piglit tests with Intel driver.
Performance gain (1280 x 800 FBO, Ivybridge):
glGetTexImage + glMapBufferRange with patch 1.45 msec
glGetTexImage + glMapBufferRange without patch 4.68 msec
v3: (by Kenneth Graunke)
- Fix compile after Eric's change to drop the tiling argument
to intel_miptree_create_for_bo.
- Add GL_TEXTURE_3D to blacklisted texture targets to prevent Piglit
regressions.
- Squash in several whitespace and coding style fixes.
|
|
|
|
|
|
|
|
|
| |
We need to invalidate the live intervals when inserting new
instructions.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
| |
When walking backwards, we want to stop at the head sentinel, which is
where scan_inst->prev->prev == NULL, not scan_inst->prev == NULL.
Fixes random crashes, as well as valgrind errors.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: [email protected]
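To illustrate the sentinel condition described in the commit above, here is a minimal sketch of a backwards walk over a doubly linked list with embedded sentinels. The `node` and `list` types and all function names are hypothetical and are not Mesa's actual list implementation; the sketch only assumes that the head sentinel is the one node whose `prev` is NULL, so the first real node is the one whose `prev->prev` is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type, for illustration only. */
struct node {
    struct node *next;
    struct node *prev;
    int value;
};

/* Hypothetical list with embedded sentinels:
 * head.prev == NULL and tail.next == NULL. */
struct list {
    struct node head;
    struct node tail;
};

static void list_init(struct list *l)
{
    l->head.prev = NULL;
    l->head.next = &l->tail;
    l->tail.prev = &l->head;
    l->tail.next = NULL;
}

static void list_push_tail(struct list *l, struct node *n)
{
    n->prev = l->tail.prev;
    n->next = &l->tail;
    l->tail.prev->next = n;
    l->tail.prev = n;
}

/* Count the real nodes that precede `start`, walking backwards.
 * The loop ends when `scan` is the head sentinel (scan->prev == NULL),
 * so the sentinel itself is never treated as a real node and we never
 * step past it to a NULL pointer. */
static int count_before(const struct node *start)
{
    int count = 0;

    assert(start != NULL);
    for (const struct node *scan = start->prev; scan->prev != NULL;
         scan = scan->prev)
        count++;
    return count;
}
```

Testing the walking pointer itself against NULL instead would let the loop body run once on the sentinel, which is the kind of off-by-one the commit message describes.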
|
|
|
|
|
|
|
|
|
| |
Giving the meta clear program a meaningful name makes it easier to find
in output such as INTEL_DEBUG=fs or INTEL_DEBUG=shader_time. We already
did so for integer programs, but neglected to label the primary program.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These used to call different math emitters (brw_math vs. brw_math2).
Now that they both call gen6_math, they're virtually identical.
When unrolling SIMD16 to multiple SIMD8 operations, we should take care
not to apply sechalf to brw_null_reg for src1. Otherwise, we'd end up
with BRW_ARF_NULL + 1 as the register number, and I'm not sure if that's
valid.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
These functions are basically identical, so we should combine them.
However, they're so trivial, we may as well just fold them into their
only call sites.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
These are trivial to combine: we should just avoid checking the second
operand if it's brw_null_reg.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It's now a single line of code, so we may as well fold it into the
caller.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Usually, I try to use "brw" for functions that apply to all generations,
and "gen4" for dead end/legacy code that is only used on Gen4-5.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our existing functions, brw_math and brw_math2, had unclear roles:
Gen4-5 used brw_math for both unary and binary math functions; it never
used brw_math2. Since operands are already in message registers, this
is reasonable.
Gen6+ used brw_math for unary math functions, and brw_math2 for binary
math functions, duplicating a lot of code. The only real difference was
that brw_math used brw_null_reg() for src1.
This patch improves brw_math2's assertions to allow both unary and
binary operations, renames it to gen6_math(), and drops the Gen6+ code
out of brw_math().
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
This is more typical C++ style.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Thread switching on control flow instructions is a documented workaround
for Gen4-5 errata. As far as I can tell, it hasn't been needed since
Sandybridge. Thread switching is not free, so in theory this may help
performance slightly.
Flow control instructions with the "switch" flag cannot be compacted, so
removing it will make these instructions compactable. (Of course, we
still have to implement compaction for flow control instructions...)
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 2081469 -> 2081248 (-0.01%)
instructions in affected programs: 22606 -> 22385 (-0.98%)
No programs were hurt by this patch.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
| |
Found with IWYU. Compile-tested on my Ivy-bridge system.
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Thomas Helland <[email protected]>
|
|
|
|
|
|
|
|
| |
Found with IWYU. Compile-tested on my Ivy-bridge system.
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Thomas Helland <[email protected]>
|
|
|
|
|
|
|
|
| |
Found with IWYU. Compile-tested on my Ivy-bridge system.
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Thomas Helland <[email protected]>
|
|
|
|
|
|
|
|
| |
Found with IWYU. Compile-tested on my Ivy-bridge system.
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Thomas Helland <[email protected]>
|
|
|
|
|
|
|
|
| |
Found with IWYU. Compile-tested on my Ivy-bridge system.
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Thomas Helland <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Only function-defs use glsl_type so forward declare instead.
Compile-tested on my Ivy-bridge system.
IWYU also suggests removing #include <new>, and this compiles fine.
I'm not familiar enough with memory management in C/C++ to feel
comfortable removing this. Insights would be appreciated.
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Thomas Helland <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Found with IWYU. Compile-tested on my Ivy-bridge system.
Added comment about core.h being used for MAX2.
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Thomas Helland <[email protected]>
|
|
|
|
|
|
|
|
| |
Found with IWYU. Compile-tested on my Ivy-bridge system.
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Thomas Helland <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Found with IWYU. Comment says it's for struct gl_extensions.
Grepping for gl_extensions shows no uses.
Tested by compiling on my Ivy-bridge system.
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Thomas Helland <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Found with IWYU, compile-tested on my Ivy-bridge system.
This is not used in the header, and is included in the source.
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Thomas Helland <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Found with IWYU, confirmed with grepping for "hash" and "symbol".
No negative effects on compilation.
IWYU also reported core.h and linker.h could be removed,
but I'm unsure if those are false positives.
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Thomas Helland <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
This fixes an issue when running the cl-program-bitcoin-phatk
piglit test, where some of the inputs have negative values.
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now, an item whose size is a multiple of 1024 dw won't leave
1024 dw between itself and the following item.
The remaining cases are left as they were.
Reviewed-by: Tom Stellard <[email protected]>
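The padding behaviour described above can be sketched as the difference between two rounding formulas. This is only an assumption about the shape of the change, not the driver's actual code: `pad_always` and `align_up_1024` are hypothetical names, and sizes are assumed to be non-negative dword counts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "old" rounding: always moves to the next multiple of
 * 1024, so an exact multiple gains a full extra 1024 dw of padding. */
static int64_t pad_always(int64_t size_in_dw)
{
    assert(size_in_dw >= 0);
    return size_in_dw + 1024 - (size_in_dw % 1024);
}

/* Hypothetical "new" rounding: a conventional align-up, which leaves
 * exact multiples of 1024 unchanged. */
static int64_t align_up_1024(int64_t size_in_dw)
{
    assert(size_in_dw >= 0);
    return (size_in_dw + 1023) / 1024 * 1024;
}
```

The two helpers agree for every size that is not a multiple of 1024 and differ by exactly 1024 dw when it is, which matches the "rest of the cases" remark in the message.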
|
|
|
|
|
|
|
|
|
|
|
|
| |
Removed compute_memory_defrag declaration because it seems
to be unimplemented.
I think that this function would have been the one that solves
the problem with fragmentation that compute_memory_finalize_pending has.
Also removed comments that are already at compute_memory_pool.c
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Explanation of the changes, as requested by Tom Stellard:
Let's take need after it is calculated as
item->size_in_dw+2048 - (pool->size_in_dw - allocated)
BEFORE:
If need is positive or 0:
we calculate need += 1024 - (need % 1024), which is like
rounding up to the next multiple of 1024, for example
0 goes to 1024, 512 goes to 1024 as well, 1025 goes
to 2048 and so on. So now need is always positive,
we do compute_memory_grow_pool, check its output
and continue.
If need is negative:
we calculate need += 1024 - (need % 1024), in this case
we will get a negative number, or 0 if need is in
[-1024:-1], so now we take the else, recalculate
need as need = pool->size_in_dw / 10 and
need += 1024 - (need % 1024), we do
compute_memory_grow_pool, check its output and continue.
AFTER:
If need is positive or 0:
we skip the if, calculate need += 1024 - (need % 1024), call
compute_memory_grow_pool, check its output and continue.
If need is negative:
we enter the if, and need is now pool->size_in_dw / 10.
Now we calculate need += 1024 - (need % 1024), call
compute_memory_grow_pool, check its output and continue.
Reviewed-by: Tom Stellard <[email protected]>
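The AFTER flow described above can be written as a small helper. This is a minimal sketch, not the driver's code: `grow_amount_after` is a hypothetical name, the call to compute_memory_grow_pool and its error check are left out, and the pool size is assumed to be non-negative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper that mirrors the AFTER steps: a negative need is
 * first replaced by a tenth of the pool size, then the result is moved
 * up to the next multiple of 1024 dw. */
static int64_t grow_amount_after(int64_t need, int64_t pool_size_in_dw)
{
    assert(pool_size_in_dw >= 0);

    if (need < 0)
        need = pool_size_in_dw / 10;

    /* need is non-negative here, so need % 1024 lies in [0, 1023]. */
    need += 1024 - (need % 1024);
    return need;
}
```

For example, with a pool of 20480 dw a negative need becomes 2048 and is then raised to 3072, while a need of 512 is raised to 1024 regardless of the pool size. Resetting the negative case before the rounding step means the remainder is only ever taken of a non-negative value.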
|
|
|
|
|
|
|
|
|
| |
In this case, NULL checks are added to compute_memory_grow_pool,
so it returns -1 when it fails. This makes it necessary
to handle such cases in compute_memory_finalize_pending
when the pool needs to grow.
Reviewed-by: Tom Stellard <[email protected]>
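The error-propagation pattern described above can be sketched as follows. Everything here is hypothetical: the `pool` struct, `pool_grow` and `pool_reserve` are illustrative stand-ins and do not reproduce the real compute memory pool, which manages GPU resources rather than a plain heap buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pool backed by a heap buffer, for illustration only. */
struct pool {
    uint32_t *data;
    int64_t size_in_dw;
};

/* Grow the pool; return 0 on success and -1 if allocation fails.
 * On failure the pool is left untouched. */
static int pool_grow(struct pool *p, int64_t new_size_in_dw)
{
    uint32_t *bigger;

    assert(new_size_in_dw > 0);
    bigger = realloc(p->data, (size_t)new_size_in_dw * sizeof *bigger);
    if (bigger == NULL)
        return -1;

    p->data = bigger;
    p->size_in_dw = new_size_in_dw;
    return 0;
}

/* Caller side: grow only when needed and pass the failure upwards
 * instead of continuing with a pool that is too small. */
static int pool_reserve(struct pool *p, int64_t needed_in_dw)
{
    if (needed_in_dw <= p->size_in_dw)
        return 0;
    if (pool_grow(p, needed_in_dw) == -1)
        return -1;
    return 0;
}
```

The point of the sketch is only the contract: once the grow step can fail, every caller that relies on the larger size has to check the return value before using the pool.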
|
|
|
|
| |
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
| |
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Cody Northrop <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
LLVM is enabled by default for some architectures, but the test was failing
before that.
Signed-off-by: Michel Dänzer <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
Cc: "10.1 10.2" <[email protected]>
|
|
|
|
|
|
|
| |
v2 Marek: set the query result correctly
Signed-off-by: David Heidelberger <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Always default to --enable-driglx-direct. That now builds driswrast, but
won't try to use dri[123] on platforms which don't have them.
Signed-off-by: Jon TURNEY <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some untangling to fix building in the dri_platform=none, --enable-driglx-direct
case, where only driswrast can be used.
Turn the test for including the glXGetScreenDriver()/glXGetDriverConfig()
interface used by xdriinfo from !GLX_USE_APPLEGL into a positive form, as it is
only useful when dri_platform=drm.
Add additional GLX_USE_DRM tests so DRI[123] renderers are only used when
dri_platform=drm.
Note that swrast and indirect must still be disabled in the APPLEGL case at the
moment, which makes things more complex than they need to be. More untangling
is needed to allow that.
Signed-off-by: Jon TURNEY <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
Nothing else uses GL-types here.
Signed-off-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
| |
It's not used.
Signed-off-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
| |
Untested.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 07af0ab changed fs_inst to have 0 sources for texture opcodes
in emit_texture_gen5 (Ironlake, Sandybridge) while fs_generator still
uses a single source from brw_reg struct. Patch sets src as reg_undef
which matches the behavior before the constructor got changed.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Signed-off-by: Tapani Pälli <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79534
|