| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
They can't actually be 0 (as position is there) but should avoid confusion.
This was supposed to have been done by af7ba989fb5a39925a0a1261ed281fe7f48a16cf
but I accidentally pushed an older version of the patch in the end...
Also prettify slightly. And make some notes about the confusing and useless
fs input "map".
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
draw emit couldn't care less what the interpolation mode is...
This somehow looked like it would matter, all drivers more or less
dutifully filled that in correctly. But this is only used for emit,
if draw needs to know about interpolation mode (for clipping for instance)
it will get that information from the vs anyway.
softpipe actually used to depend on that interpolation parameter, as it
abused that structure quite a bit but no longer.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
softpipe would calculate two "vertex layouts". The second one was however
just used for internal purposes, draw would know nothing about it even though
it looked exactly the same as the other one we tell draw about.
So, store that information separately as this was just confusing.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike llvmpipe, softpipe always tells draw to emit the vertices as-is.
The two vertex layouts it calculates are a bit confusing, one which is just
used to tell draw to emit vertices as-is, and the other which has draw written
all over it but draw is completely unaware of and is used only to look up the
correct interpolation info later in setup.
Thus, the slots used are different to what llvmpipe does (I'm going to clean
up the confusing two layout stuff).
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It was actually slightly buggy (missing initialization / setup not dependent
on new vs albeit I didn't see issues), but the case of non-existing attributes
is now handled by draw emit code so don't need that anymore.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the code would just redirect requests for attributes which
don't exist to use output 0. Rework this to output all zeros instead which
seems more useful - in particular some extensions like
ARB_fragment_layer_viewport require 0 in the fs even if it wasn't output by
previous stages. That way, drivers don't have to special case this depending
if the vs/gs outputs some attribute or not.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add PCI IDs for the Intel Kabylake platforms. The IDs are taken
directly from the Linux kernel patches, which are under review:
http://lists.freedesktop.org/archives/intel-gfx/2015-October/078967.html
http://cgit.freedesktop.org/~vivijim/drm-intel/log/?h=kbl-upstream-v2
The Kabylake PCI IDs taken from the kernel are rearranged to be in order
of GT type, then PCI ID.
Please note that if this patch is backported, the following fixes will
need to be added before this patch:
commit 28ed1e08e8ba98e "i965/skl: Remove early platform support"
commit c1e38ad37042b0e "i965/skl: Use larger URB size where available."
Thanks to Ben for fixing a bug around setting urb.size, and being
patient with my questions about what the various fields mean.
Signed-off-by: Sarah Sharp <[email protected]>
Suggested-by: Ben Widawsky <[email protected]>
Tested-by: Rodrigo Vivi <[email protected]> (KBL-GT2)
Cc: "11.1" <[email protected]>
|
|
|
|
|
|
|
|
| |
Rename SVGA_HINT_FLAG_DRAW_EMITTED to SVGA_HINT_FLAG_CAN_PRE_FLUSH
because preemptive flush can be unblocked by more commands than
draw.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The existing code effectively turns off preemptive flushing for all
but the regions used for draws. This turns out to be overly
restrictive as some memory regions, e.g. GMR, may never get a draw
when used as a DMA upload staging area, causing problems for apps
that upload a large amount of textures, e.g. Unigine Heaven.
This patch fixes the Unigine Heaven memory allocation error and
has been verified to not cause a regression in the previous extended
retina display issue.
Reviewed-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In emit_input_declarations(), we are skipping declarations for those
registers that are not being used. But in emit_vertex_attrib_instructions(),
we are still emitting instructions to tweak the vertex attributes even if
they are not being used. This causes an assert in the backend because an
input register is not declared in the shader. This patch fixes the problem
by skipping the instruction if the vertex attribute is not being used.
Changes in this patch is originated from the code snippet from Jose as
suggested in bug 1530161.
Tested with piglit, Heaven, Turbine, glretrace.
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
| |
Remove useless comment. Reformat code.
|
|
|
|
| |
The _mesa_Bitmap() caller already checks for zero-sized bitmaps.
|
|
|
|
|
|
|
| |
Some kinds of textures never have mipmaps. 3D textures seldom have
mipmaps.
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
| |
Better readability and easier to extend.
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
| |
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
| |
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
| |
Use GLbitfield instead of GLuint to be consistent with other variables.
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
| |
To match dd_function_table::UpdateState().
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
If the only dirty state is mesa's _NEW_PROGRAM_CONSTANTS flag, we can
skip state validation before drawing a bitmap since that state doesn't
effect bitmap rendering.
This further increases the performance of the ipers demo on llvmpipe
to about what it was before commit 36c93a6fae27561.
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
| |
Just do it where needed (before drawing, clearing, etc).
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were checking the dirty->st flags but not the dirty->mesa flags.
When we took the early return, we didn't clear the dirty->mesa flags
so the next time we called st_validate_state() we'd often flush the
glBitmap cache. And since st_validate_state() is called from
st_Bitmap(), it meant we flushed the bitmap cache for every glBitmap()
call.
This change seems to recover most of the performance loss observed
with the ipers demo on llvmpipe since commit commit 36c93a6fae27561.
Cc: [email protected]
Reviewed-by: José Fonseca <[email protected]>
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
Previously each member was being counted as using a single slot,
count_attribute_slots() fixes the count for array and struct members.
Also don't assign a negitive to the unsigned expl_location variable.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we were only reserving a single location for arrays and
structs.
We also didn't take into account implicit locations clashing with
explicit locations when assigning locations for their arrays or
structs.
This patch fixes both issues.
V5: fix regression for patch inputs/outputs in tessellation shaders
V4: just use count_attribute_slots() to get the number of slots,
also calculate the correct number of slots to reserve for gs and
tess stages by making use of the new get_varying_type() helper.
V3: handle arrays of structs
V2: also fix for arrays of arrays and structs.
Acked-by: Anuj Phogat <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
| |
This will be used in the following patch for calculating array sizes correctly
when reserving explicit varying locations.
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we would pack varyings before trying to remove them, this
relied on the packing pass not packing varyings with a location of -1
to avoid packing varyings that should be removed.
However this meant unused varyings with an explicit location would be
packed before they could be removed when we enable packing of them in a
later patch.
V2: fix regression in V1 removing unused varyings in multi-stage SSO,
fix regression with single stage programs.
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
| |
ALIGN_DIVUP is a driver specific(r600g) macro that duplicates DIV_ROUND_UP functionality.
Replacing it with DIV_ROUND_UP eliminates this problems.
Signed-off-by: Krzysztof A. Sobiecki <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
I had the driver all tested for the last series, and in my last build I
noticed that get_swizzled_channel was unused now, and removed
it... apparently without testing to find that I removed the wrong channel
swizzle function.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We routinely have code like:
vec1 ssa_220 = fge ssa_104, ssa_61
vec1 ssa_199 = bcsel ssa_220, ssa_106, ssa_105
and we would compare fge's args and choose between ~0 and 0 to generate
ssa_220, then compare ssa_220 to 0 and choose between bcsel's args.
Instead, try to notice the pattern and compare between fge's args to
select between bcsel's args.
total instructions in shared programs: 88019 -> 87574 (-0.51%)
instructions in affected programs: 9985 -> 9540 (-4.46%)
total estimated cycles in shared programs: 245752 -> 245237 (-0.21%)
estimated cycles in affected programs: 17232 -> 16717 (-2.99%)
|
|
|
|
|
| |
We only see txf on MSAA textures, currently, and apparently this didn't
impact any of our piglit tests.
|
|
|
|
|
| |
We already had the code supporting it, since it's needed for the depth
mode when doing shadow comparisons.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can't use its other features currently (mostly because we don't want
Newton-Raphson on rcps for texture coordinates), but it gets us started.
This eliminates some comparisons with constants in GLB2.7 and ETQW traces
at the QIR level by moving the comparisons into NIR, where they get
constant-folded out.
instructions in affected programs: 165 -> 156 (-5.45%)
total uniforms in shared programs: 32087 -> 32085 (-0.01%)
total estimated cycles in shared programs: 245762 -> 245752 (-0.00%)
estimated cycles in affected programs: 461 -> 451 (-2.17%)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I'm moving away from QIR being SSA (since NIR is doing lots of SSA
optimization for us now) and instead having QIR just be QPU operations
with virtual registers. By making our SELs be composed of two MOVs, we
could potentially coalesce the registers for the MOV's src and dst and
eliminate the MOV.
total instructions in shared programs: 88448 -> 88028 (-0.47%)
instructions in affected programs: 39845 -> 39425 (-1.05%)
total estimated cycles in shared programs: 246306 -> 245762 (-0.22%)
estimated cycles in affected programs: 162887 -> 162343 (-0.33%)
|
|
|
|
|
|
| |
If you want the SF of the value of a register produced from a series of
packing MOVs or conditional MOVs, we can't just SF on the last MOV into
the register.
|
|
|
|
|
|
|
|
| |
Fix a 's/unsigned int/unsigned/' consistency case while here.
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fix silly issue with MSVC case fall-though support to need
a extra 'break;'
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch converts the SSE-optimized lp_rast_triangle_32_3_16()
to VMX/VSX.
I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.
FPS/Score
Name Before After Delta
------------------------------------------------
openarena 16.35 16.7 2.14%
xonotic 4.707 4.97 5.57%
glmark2 didn't show a significant (more than 1%) difference.
v2: Make sure code is build only on POWER8 LE machine
Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch converts the SSE-optimized build_mask_32() and
build_mask_linear_32() to VMX/VSX.
I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.
FPS/Score
Name Before After Delta
------------------------------------------------
glmark2 (score) 139.8 142.7 2.07%
openarena and xonotic didn't show a significant (more than 1%)
difference.
v2: Make sure code is build only on POWER8 LE machine
Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch converts the SSE optimization done in do_triangle_ccw to
VMX/VSX.
I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.
FPS/Score
Name Before After Delta
------------------------------------------------
glmark2 (score) 136.6 139.8 2.34%
openarena 16.14 16.35 1.30%
xonotic 4.655 4.707 1.11%
v2:
- Convert loads to use aligned loads
- Make sure code is build only on POWER8 LE machine
Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This file provides a portability layer that will make it easier to convert
SSE-based functions to VMX/VSX-based functions.
All the functions implemented in this file are prefixed using "vec_".
Therefore, when converting from SSE-based function, one needs to simply
replace the "_mm_" prefix of the SSE function being called to "vec_".
Having said that, not all functions could be converted as such, due to the
differences between the architectures. So, when doing such
conversion hurt the performance, I preferred to implement a more ad-hoc
solution. For example, converting the _mm_shuffle_epi32 needed to be done
using ad-hoc masks instead of a generic function.
All the functions in this file support both little-endian and big-endian
but currently the file is build only on POWER8 LE machine.
All of the functions are implemented using the Altivec/VMX intrinsics,
except one where I needed to use inline assembly (due to missing
intrinsic).
v2:
- Use vec_vgbbd instead of __builtin_vec_vgbbd
- Add an aligned load function
- Don't use typeof()
- Make file build only on POWER8 LE machine
Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The nir_opt_algebraic rule
(('fadd', ('flog2', a), ('fneg', ('flog2', b))), ('flog2', ('fdiv', a, b))),
can produce new fdiv operations, which need to be lowered on i965,
as we don't actually implement fdiv. (Normally, we handle this in
GLSL IR's lower_instructions pass, but in the above case we introduce
an fdiv after that point. So, make NIR do it for us.)
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compute shaders require reconfiguring the L3 for shared local memory
support. We have to be able to write the L3 registers to do that.
This effectively turns off compute shaders prior to Kernel 4.2.
(Previously, the extension enable was in an API_OPENGL_CORE conditional.
However, that isn't necessary - core Mesa extension handling already
restricts it properly. I've moved it out in this patch.)
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
| |
That's what it's for. Plus, we actually implement rcp.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This fixes some piglit subtests for ARB_program_interface_query.
V3: remove some of the unnecessary parentheses
V2: fix alignment
Reviewed-by: Marek Olšák <[email protected]>
|