| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
This paves the way for copy propagating our unpacks. We end up with a
small change on shader-db:
total instructions in shared programs: 89390 -> 89251 (-0.16%)
instructions in affected programs: 19041 -> 18902 (-0.73%)
which appears to be because we no longer convert MOVs for an FMAX dst,
r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and
this ends up with a slightly better schedule.
|
|
|
|
|
|
|
|
|
| |
At one point I thought packs and unpacks were in the same field of the
instruction. They aren't. These instructions therefore never cause a
pack.
total instructions in shared programs: 89472 -> 89390 (-0.09%)
instructions in affected programs: 15261 -> 15179 (-0.54%)
|
|
|
|
|
| |
I'm going to introduce some more types of MOV, which also want the elision
of raw MOVs.
|
|
|
|
| |
We can do 16a/16b from float as well. No difference on shader-db.
|
| |
|
|
|
|
| |
No problems being fixed, but needed for the new unpack changes.
|
|
|
|
| |
Not used yet, but will be.
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a TCS is present, the TES input gl_PatchVerticesIn is actually a
constant - it's simply the # of output vertices specified by the TCS
layout qualifiers. So, we can replace the system value with a constant,
which may allow further optimization, and will likely be more efficient.
If the TCS is absent, we can't do this optimization.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
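A minimal sketch of the decision described above, assuming a simplified,
hypothetical view of a linked program (struct linked_program and
patch_vertices_in_as_constant are illustrative names, not Mesa's real data
structures or API). It only shows the rule: with a TCS, the TES input can be
replaced by the TCS output vertex count; without one, it cannot.

```c
#include <stdbool.h>

/* Hypothetical, simplified description of a linked program. */
struct linked_program {
    bool has_tcs;          /* is a tessellation control shader present? */
    int  tcs_vertices_out; /* value of layout(vertices = N) in the TCS */
};

/*
 * If a TCS is present, store the constant that can stand in for
 * gl_PatchVerticesIn in the TES and return true.
 * If there is no TCS, leave *value untouched and return false, because
 * the optimization does not apply.
 */
bool patch_vertices_in_as_constant(const struct linked_program *prog,
                                   int *value)
{
    if (!prog->has_tcs)
        return false;

    *value = prog->tcs_vertices_out;
    return true;
}
```

A caller would replace the system value with the returned constant only when
the function returns true, and keep the system value otherwise.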
|
|
|
|
|
|
| |
Trivial.
Signed-off-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we could create a renderbuffer with format
MESA_FORMAT_R8G8B8A8_UNORM, convert that renderbuffer to an EGLImage,
then FAIL to convert the EGLImage back to a renderbuffer because
reasons. Just use the same check in
intel_image_target_renderbuffer_storage that brw_render_target_supported
uses.
There are more checks in brw_render_target_supported, but I don't think
they are necessary here. A different approach would be to refactor
brw_render_target_supported to take rb->Format and rb->NumSamples as
parameters (instead of a gl_renderbuffer) and use the new function here.
Fixes:
ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Tested-by: Tapani Pälli <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92476
Cc: "10.3 10.4 10.5 10.6 11.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is more optimal as it means we no longer have to upload the same set
of ABO surfaces to all stages in the program.
This also fixes a bug: since commit c0cd5b, var->data.binding was being
used as a replacement for the atomic buffer index, but the two don't have
to be the same value; they just happened to end up the same when binding
is 0.
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Cc: Ilia Mirkin <[email protected]>
Cc: Alejandro Piñeiro <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90175
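To illustrate why a binding point and a buffer index are different things,
here is a small sketch. It assumes, for illustration only, that active atomic
buffers are stored in a dense table and that the index is the position in
that table; struct abo_table and abo_index_for_binding are hypothetical names,
not the driver's actual code.

```c
#define MAX_ABOS 8

/* Hypothetical dense table of active atomic buffers in a program. */
struct abo_table {
    unsigned count;             /* number of active atomic buffers */
    unsigned binding[MAX_ABOS]; /* binding point of each buffer, by index */
};

/*
 * Return the index of the buffer that uses the given binding point,
 * or -1 if no active buffer uses it.
 */
int abo_index_for_binding(const struct abo_table *t, unsigned binding)
{
    for (unsigned i = 0; i < t->count; i++) {
        if (t->binding[i] == binding)
            return (int)i;
    }
    return -1;
}
```

With buffers at bindings 0 and 5, binding 0 maps to index 0 (the values
coincide), while binding 5 maps to index 1 (they do not).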
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The f16c intrinsic can only be emitted when AVX is used, so when we disable
AVX due to forcing 128bit vectors we must not use this intrinsic. Depending
on the llvm version, this worked previously because llvm used AVX even when
we didn't tell it to. However, I've seen this fail with llvm 3.3 since
718249843b915decf8fccec92e466ac1a6219934, which seems to have the side effect
of disabling avx in llvm even though it really only touches sse flags; with
ea421e919ae6e72e1319fb205c42a6fb53ca2f82 it's now really disabled.
Although being able to use AVX with 128bit vectors would also have its uses,
the code as is was really meant to emulate jit code creation for less capable
cpus.
v2: add some (ifdefed out) missing de-featuring options for simulating
less capable cpus.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
At least vl_mpeg12_decoder uses the picture
desc in begin_frame and decode_bitstream.
https://bugs.freedesktop.org/show_bug.cgi?id=92634
Signed-off-by: Julien Isorce <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds an additional check to make sure we don't return locations
for structures or arrays of structures.
From page 79 of the OpenGL 4.2 spec:
"A valid name cannot be a structure, an array of structures, or any
portion of a single vector or a matrix."
v2: use without_array() to simplify code (Timothy)
No Piglit or CTS regressions observed.
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
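The check described above can be sketched as follows. The type description
here is a hypothetical, simplified stand-in (struct type_desc, strip_arrays
and may_have_location are illustrative names, not Mesa's GLSL type classes);
strip_arrays plays the role of the array-stripping helper mentioned in the v2
note.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified type description. */
enum base_kind { KIND_SCALAR, KIND_VECTOR, KIND_MATRIX, KIND_STRUCT, KIND_ARRAY };

struct type_desc {
    enum base_kind kind;
    const struct type_desc *element; /* element type, only for KIND_ARRAY */
};

/* Peel off every array level and return the innermost element type. */
static const struct type_desc *strip_arrays(const struct type_desc *t)
{
    while (t->kind == KIND_ARRAY)
        t = t->element;
    return t;
}

/*
 * A name may be given a location only if it is neither a structure nor an
 * array (of any depth) of structures.
 */
bool may_have_location(const struct type_desc *t)
{
    return strip_arrays(t)->kind != KIND_STRUCT;
}
```

Stripping the arrays first means a single comparison covers both the plain
structure case and the array-of-structures case.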
|
|
|
|
|
|
|
| |
It doesn't modify it.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
They're only f16-to-f32 on a float operation, otherwise they're
i16-to-i32.
|
|
|
|
|
| |
No known bugs, just something I noticed while updating optimization code
for other changes.
|
|
|
|
|
|
|
|
|
|
| |
One instruction instead of four, and it turns out you do this a lot for
the Over operator.
total uniforms in shared programs: 32168 -> 32087 (-0.25%)
uniforms in affected programs: 318 -> 237 (-25.47%)
total instructions in shared programs: 89830 -> 89472 (-0.40%)
instructions in affected programs: 6434 -> 6076 (-5.56%)
|
|
|
|
|
| |
I don't know what the previous test was trying to do, but it dates back to
the first add of vc4_qpu_emit.c. No change to shader-db.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I left the function to obtain the revision because it is, and will continue
to be, useful in the future. I'd rather not have to dig it up every time we
need it. Comments left at the implementation to say as much.
This was accidentally left here when I moved the early platform support:
commit 28ed1e08e8ba98ebd4ff0b56326372f0df9c73ad
Author: Ben Widawsky <[email protected]>
Date: Fri Aug 7 13:58:37 2015 -0700
i965/skl: Remove early platform support
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
According to piglit/xonotic/neverball/stc, blend/rasterize/zsa state
will always be bound (never null). And the null checks were
inconsistent anyway, so remove them.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Uses the DCC buffer instead of the CMASK buffer. The ELIMINATE_FAST_CLEAR
still works. Furthermore, with DCC compression we can directly clear
to a limited set of colors such that we do not need a postprocessing step.
v2 Marek: check dcc_buffer && dirty_level_mask in set_sampler_view
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Can't see why anyone would ever want to use this, but it was clearly broken.
This fixes the piglit texwrap offset test using this combination.
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using nearest filtering and clamp / clamp to edge wrapping results could
be wrong for negative offsets. Fix this by adding the offset before doing
the conversion to int coords (could also use floor instead of trunc int
conversion but probably more complex on "typical" cpu).
This fixes the piglit texwrap offset failures with this filter/wrap combo
(which only leaves the linear/mirror repeat combination broken).
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
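A small sketch of why the order of "add the offset" and "convert to int"
matters when truncation stands in for floor. This is a simplified scalar
model with hypothetical function names, not the actual sampler code. In this
model the two orderings disagree when the coordinate itself is negative; the
exact failing inputs in the driver may differ.

```c
/* Clamp v to the inclusive range [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Floor implemented via truncation plus a correction for negative values. */
static int floor_to_int(float v)
{
    int i = (int)v; /* C conversion truncates toward zero */
    return ((float)i > v) ? i - 1 : i;
}

/* Reference behaviour: floor, add the integer offset, then clamp. */
int texel_reference(float coord, int offset, int size)
{
    return clamp_int(floor_to_int(coord) + offset, 0, size - 1);
}

/* Truncate first, add the offset afterwards: can be off for coord < 0. */
int texel_trunc_then_offset(float coord, int offset, int size)
{
    return clamp_int((int)coord + offset, 0, size - 1);
}

/*
 * Add the offset in float first, then truncate. Any negative sum is
 * clamped to 0 anyway, so truncation and floor give the same result.
 */
int texel_offset_then_trunc(float coord, int offset, int size)
{
    return clamp_int((int)(coord + (float)offset), 0, size - 1);
}
```

For coord = -0.5, offset = 1 and size = 4, the reference and the
offset-then-truncate variant both select texel 0, while the
truncate-then-offset variant selects texel 1.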
|
|
|
|
|
|
|
|
|
|
|
|
| |
For vertex/geometry shader sampling, this is the same as for llvmpipe - just
use the original resource target.
For fragment shader sampling though (which does not use first-layer based mip
offsets) adjust the sampling code to use first_layer in the non-array cases.
While here also fix up some code which looked wrong wrt buffer texel fetch
(no piglit change).
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Just need to use resource target not view target when calculating
first-layer based mip offsets. (This is a gl specific problem since
d3d10 does not distinguish between non-array and array resources at
either the resource or the view level, only at the shader level.)
Fixes new piglit arb_texture_view sampling-2d-array-as-2d-layer test.
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
This patch was originally written before stoney support
was merged. Add stoney.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
As the alignment requirements can be 32 KiB or more, this also adds
an aligned buffer creation function.
DCC is disabled for textures that can be shared as sharing the
DCC buffers has not been implemented yet.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
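The aligned buffer creation function itself is not shown here. As a rough
illustration of the arithmetic such a function typically relies on, this
sketch rounds an offset up to a power-of-two alignment such as 32 KiB;
align_up is a hypothetical helper name, not the function added by the commit.

```c
#include <stdint.h>

/*
 * Round value up to the next multiple of alignment.
 * The alignment must be a non-zero power of two (for example 32768).
 */
uint64_t align_up(uint64_t value, uint64_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}
```

With a 32 KiB alignment, an offset of 1 is rounded up to 32768, and an offset
that is already a multiple of 32768 is left unchanged.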
|
|
|
|
|
|
|
| |
Like the comment says. This fixes DCC, which doesn't like blitting RG16
as RGBA8.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
This catches the other cases that enable SWITCH_ON_EOI.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
The VI condition depends on geometry shaders and MAX_PRIMGRP_IN_WAVE.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
The hardware does this automatically.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Draw calls without a vertex shader are skipped.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
This will allow removing the dummy PS.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Just to validate that radeonsi doesn't crash.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Out of 7063 shaders from my shader-db:
- 6564 (93%) shaders don't have any state parameters.
- 347 (5%) shaders have 1 state parameter for WPOS lowering.
- The remaining 2% have more state parameters, usually matrices.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2 (agd): rebase on mesa master, split pci ids to
separate commit
v3 (agd): use carrizo for llvm processor name for
llvm 3.7 and older
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Samuel Li <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
| |
The edgeflag comes in as ubyte with glEdgeFlagPointer but as float with
plain immediate glEdgeFlag. Avoid reading bytes that weren't meant for
the edgeflag in the pointer case.
Fixes intermittent failures with gl-2.0-edgeflag piglit (and valgrind
complaints about reading uninitialized memory).
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
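A minimal sketch of the idea, assuming a hypothetical enum that records how
the edge flag was supplied (enum edgeflag_type and read_edgeflag are
illustrative names, not the driver's code). The point is that the ubyte path
reads exactly one byte, so neighbouring bytes never influence the result.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical tag for how the edge flag data is stored. */
enum edgeflag_type {
    EDGEFLAG_UBYTE, /* pointer case: one unsigned byte per flag */
    EDGEFLAG_FLOAT  /* immediate case: one float per flag */
};

/* Return the edge flag stored at data, reading only as many bytes as the
 * storage type actually provides. */
bool read_edgeflag(const void *data, enum edgeflag_type type)
{
    if (type == EDGEFLAG_UBYTE) {
        /* Read a single byte; whatever follows is not part of the flag. */
        return *(const unsigned char *)data != 0;
    } else {
        float f;
        memcpy(&f, data, sizeof f); /* avoids alignment and aliasing issues */
        return f != 0.0f;
    }
}
```

Reading a ubyte flag as a float would pull in three extra bytes, which is the
kind of out-of-range read the commit message describes.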
|
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92214
CC: "10.6 11.0" <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can't do this all the time, because you want blending to be done in
linear space, and sRGB would lose too much precision being done in 4x8.
The win on instructions is pretty huge when you can, though.
total uniforms in shared programs: 32065 -> 32168 (0.32%)
uniforms in affected programs: 327 -> 430 (31.50%)
total instructions in shared programs: 92644 -> 89830 (-3.04%)
instructions in affected programs: 15580 -> 12766 (-18.06%)
Improves openarena performance at 1920x1080 from 10.7fps to 11.2fps.
|