| Commit message | Author | Age | Files | Lines |
... | |
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the compiler, we'd like to generate implicit uniforms for internal
use. These should not be visible via the GL uniform introspection API.
To support that, we add a new ir_variable::how_declared value of
ir_var_hidden, and plumb that through to gl_uniform_storage.
v2 (idr): Fix some memory management issues in
move_hidden_uniforms_to_end. The comment block on the function has more
details.
Signed-off-by: Kenneth Graunke <[email protected]>
Signed-off-by: Ian Romanick <[email protected]>
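
A minimal sketch of the reordering idea behind move_hidden_uniforms_to_end, assuming a simplified uniform record. The struct and function names below are hypothetical and are not the actual gl_uniform_storage layout. Visible uniforms keep their relative order at the front, so an introspection API can report only the first N entries:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a uniform storage record. */
struct sketch_uniform {
   const char *name;
   bool hidden; /* true for compiler-generated, internal uniforms */
};

/* Stable in-place partition: visible uniforms first, hidden ones last.
 * Returns the number of visible uniforms. */
size_t
sketch_move_hidden_to_end(struct sketch_uniform *u, size_t n)
{
   size_t visible = 0;

   for (size_t i = 0; i < n; i++) {
      if (u[i].hidden)
         continue;

      /* Everything in [visible, i) is hidden; shift that run right by
       * one slot and drop the visible entry in front of it. */
      struct sketch_uniform v = u[i];
      memmove(&u[visible + 1], &u[visible], (i - visible) * sizeof(*u));
      u[visible++] = v;
   }

   return visible;
}
```

The quadratic shifting is chosen only to keep the sketch allocation-free; the real function has its own memory management, described in its comment block.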
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Uses SSE 4.1 to speed up computing the min and max elements.
Callgrind cpu usage results from pts benchmarks:
Openarena 0.8.8: 3.67% -> 1.03%
UrbanTerror: 2.36% -> 0.81%
V5:
- actually make use of the optimisation in android (Emil Velikov)
- set a better array size limit for using SSE and added TODO
V4:
- fixed bugs with incrementing pointer and updating counters
V3:
- Removed sse_minmax.c from Makefile.sources
- handle the first few values without SSE until the pointer is aligned
and use _mm_load_si128 rather than _mm_loadu_si128
- guard the call to the SSE code better at build time
V2:
- removed GL* types
- use _mm_store_si128() rather than _mm_store_ps()
- add runtime check for SSE
- use aligned attribute for local min/max
- bunch of tidyups
Reviewed-by: Juha-Pekka Heikkila <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Timothy Arceri <[email protected]>
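
The approach described in the changelog above (scalar handling until the pointer is aligned, aligned loads, a runtime SSE check, aligned locals) can be sketched roughly as follows. This is not the code from the patch: the function names, the size threshold of 8, and the use of GCC/Clang builtins and attributes are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <smmintrin.h> /* SSE4.1: _mm_min_epu32 / _mm_max_epu32 */
#define SKETCH_HAVE_SSE41 1
#else
#define SKETCH_HAVE_SSE41 0
#endif

/* Scalar path: used for the unaligned head, the tail, and as fallback. */
static void
minmax_scalar(const uint32_t *v, size_t count, uint32_t *lo, uint32_t *hi)
{
   for (size_t i = 0; i < count; i++) {
      if (v[i] < *lo) *lo = v[i];
      if (v[i] > *hi) *hi = v[i];
   }
}

#if SKETCH_HAVE_SSE41
static __attribute__((target("sse4.1"))) void
minmax_sse41(const uint32_t *v, size_t count, uint32_t *lo, uint32_t *hi)
{
   /* Handle the first few values without SSE until 16-byte aligned. */
   size_t i = 0;
   while (i < count && ((uintptr_t)(v + i) & 15) != 0)
      i++;
   minmax_scalar(v, i, lo, hi);

   __m128i vmin = _mm_set1_epi32(-1);   /* every lane = UINT32_MAX */
   __m128i vmax = _mm_setzero_si128();  /* every lane = 0 */
   for (; i + 4 <= count; i += 4) {
      __m128i x = _mm_load_si128((const __m128i *)(v + i));
      vmin = _mm_min_epu32(vmin, x);
      vmax = _mm_max_epu32(vmax, x);
   }

   /* Spill the lanes to aligned locals and reduce them. */
   uint32_t lanes_min[4] __attribute__((aligned(16)));
   uint32_t lanes_max[4] __attribute__((aligned(16)));
   _mm_store_si128((__m128i *)lanes_min, vmin);
   _mm_store_si128((__m128i *)lanes_max, vmax);
   for (int k = 0; k < 4; k++) {
      if (lanes_min[k] < *lo) *lo = lanes_min[k];
      if (lanes_max[k] > *hi) *hi = lanes_max[k];
   }

   minmax_scalar(v + i, count - i, lo, hi); /* remaining tail */
}
#endif

void
sketch_minmax_u32(const uint32_t *v, size_t count,
                  uint32_t *min_out, uint32_t *max_out)
{
   uint32_t lo = UINT32_MAX, hi = 0;

#if SKETCH_HAVE_SSE41
   /* The threshold of 8 is an arbitrary placeholder, not the patch's. */
   if (count >= 8 && __builtin_cpu_supports("sse4.1"))
      minmax_sse41(v, count, &lo, &hi);
   else
#endif
      minmax_scalar(v, count, &lo, &hi);

   *min_out = lo;
   *max_out = hi;
}
```

For an empty array this sketch returns UINT32_MAX as the minimum and 0 as the maximum; a caller that cares would need to check the count first.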
|
|
|
|
|
|
| |
These never existed, as far as I can tell.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Last use was in shader_time.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
The ADDs depended on dispatch_width, which really isn't what we wanted.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
We only want fields 0-2.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
driUnbindContext() checks for valid drawables before calling the driver
unbind function. For surfaceless contexts, the drawables are always
NULL, so we never release the underlying DRI context. Moving the call to
the driver function before the drawable validity checks fixes this.
Steps to trigger this bug:
- create surfaceless context and make it current
- make some other context current
- {another thread} destroy surfaceless context
- make another context current
Signed-off-by: Alexandros Frantzis <[email protected]>
Signed-off-by: Kalyan Kondapally <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74563
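
The reordering can be illustrated with a simplified sketch. The types and function names here are hypothetical stand-ins, not the real DRI structures; the point is only that the driver unbind must run before the early return taken by surfaceless contexts:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DRI drawable and context types. */
struct sketch_drawable { int refcount; };
struct sketch_context {
   struct sketch_drawable *draw;
   struct sketch_drawable *read;
   bool driver_bound;
};

static void
sketch_driver_unbind(struct sketch_context *ctx)
{
   ctx->driver_bound = false;
}

/* Old ordering: a surfaceless context returns before the driver call. */
bool
sketch_unbind_before_fix(struct sketch_context *ctx)
{
   if (ctx == NULL)
      return false;
   if (ctx->draw == NULL || ctx->read == NULL)
      return false; /* surfaceless contexts always end up here */

   sketch_driver_unbind(ctx);
   ctx->draw->refcount--;
   ctx->read->refcount--;
   ctx->draw = ctx->read = NULL;
   return true;
}

/* New ordering: always call the driver, then look at the drawables. */
bool
sketch_unbind_after_fix(struct sketch_context *ctx)
{
   if (ctx == NULL)
      return false;

   sketch_driver_unbind(ctx);

   if (ctx->draw == NULL && ctx->read == NULL)
      return true; /* surfaceless: nothing else to release */

   if (ctx->draw != NULL)
      ctx->draw->refcount--;
   if (ctx->read != NULL)
      ctx->read->refcount--;
   ctx->draw = ctx->read = NULL;
   return true;
}
```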
|
|
|
|
|
|
|
|
|
|
| |
builds
../../src/mesa/main/context.c: In function 'check_context_limits':
../../src/mesa/main/context.c:733:41: warning: unused parameter 'ctx' [-Wunused-parameter]
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This started hitting an assertion recently. Only affects Haswell
(Ivybridge doesn't support this meddling with the sampler state pointer,
and ARB_gpu_shader5 is not enabled yet on Broadwell)
14 Piglits crash->pass.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Improves performance in GLBenchmark 2.7 TRex by 3.88889% +/- 0.336383%
(n=80) at 1280x720 on Broadwell GT3. Together with the previous patch,
it improves performance by 5.42738% +/- 0.541971% (n=10) at 1920x1080.
Note that without the PMA stall fix, this would instead decrease
performance by 22%.
v2: Update comment (noticed by Kristian Høgsberg).
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Certain non-promoted depth cases typically incur stalls. In very
specific cases, we can enable a workaround which improves performance.
Improves performance in GLBenchmark 2.7 TRex by 1.17762% +/- 0.448765%
(n=75) at 1280x720 on Broadwell GT3.
Haswell has this feature as well, but we can't currently write registers
from userspace batches (and we'd incur additional software batch
scanning overhead as well), so we haven't enabled it. Broadwell allows
us to write CACHE_MODE_1. Backporters beware: the formula and flushing
incantation differ between Haswell and Broadwell.
v2: Move pma_stall_bits from brw->state to brw itself (requested by
Kristian Høgsberg).
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This patch adds macros needed for the HiZ PMA stall optimization.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Matt requested this in review feedback on the original patch, which I
completely missed when pushing this series. Kristian also made this
change, but I grabbed the wrong version of the patch.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
Otherwise, calling glPopAttrib on drivers that don't support
ARB_clip_control gives you a GL error, which is surprising at best.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
We're not programming the clear values yet, so this won't work.
This patch should be (effectively) reverted eventually.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Skylake, the MOCS bits are an index into a table of 63 different,
configurable cache configurations. As for previous GENs, we only care about
WB and WT, which are available in the documented default set. Define
SKL_MOCS_WB and SKL_MOCS_WT to the indices for those configurations and use
those for the Skylake MOCS values.
Signed-off-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Skylake has separate controls for enabling the Z Clip Test for the near
and far planes. For now, maintain the legacy behavior by setting both.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Skylake's 3DSTATE_DS packet has a few more fields; we don't support
domain shaders yet though.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
| |
They are the same as for BDW, so just add a case for SKL to the init switch.
Signed-off-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
SKL updates the resolve rectangle scaling factors again.
Signed-off-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
On SKL, a 3DSTATE_CONSTANT_* command is not committed until we emit
the corresponding 3DSTATE_BINDING_TABLE_POINTERS_* command. If we
fail to do so, the constant buffers won't be read and push constants
will be wrong.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Otherwise they overlap and horrible things happen. All the new DWords
are for fast color clear values, which we don't do yet.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We will need to allocate more DWords on Skylake.
v2: Don't mark brw_context parameter const. It's modified.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Skylake introduces a new base address for a feature we don't yet expose.
Setting these to 0 should be safe.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Skylake uploads the stencil reference values in DW3 of the
3DSTATE_WM_DEPTH_STENCIL packet, rather than in COLOR_CALC_STATE.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Skylake has some extra bits in PIPELINE_SELECT, none of which are
interesting for a 3D driver. In order to selectively change them, it
also introduces new "mask bits" in 15:8. We care about the "Pipeline
Selection" bits (1:0), so set the mask to 0x3.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
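
As a rough illustration of the mask-bit scheme, here is how the DWord could be assembled. The helper name and the opcode_bits parameter are hypothetical; only the bit positions (mask in 15:8, Pipeline Selection in 1:0) come from the description above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Pipeline Selection lives in bits 1:0; on Skylake the matching
 * write-enable mask lives in bits 15:8, so enabling writes to bits
 * 1:0 means setting mask value 0x3. */
#define SKETCH_PIPELINE_SELECTION_MASK 0x3u
#define SKETCH_MASK_BITS_SHIFT 8

uint32_t
sketch_pipeline_select_dw(uint32_t opcode_bits, uint32_t pipeline,
                          bool has_mask_bits)
{
   uint32_t dw = opcode_bits | (pipeline & SKETCH_PIPELINE_SELECTION_MASK);

   if (has_mask_bits)
      dw |= SKETCH_PIPELINE_SELECTION_MASK << SKETCH_MASK_BITS_SHIFT;

   return dw;
}
```

Leaving the other mask bits clear means the remaining new fields are not modified by the command, which matches the stated intent of ignoring them.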
|
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This command has gained 2 dwords that specify which channels of which
attributes need to be forwarded to the fragment shader.
v2: Rebase forward a year (done by Ken).
Signed-off-by: Damien Lespiau <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
| |
No differences in shader-db on Haswell (Gen 7.5).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
AFAICT the number of threads is 80, not 70. I am not sure if Ken knows
something I do not.
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We are about to change mesa to spawn threads for deferred glCompileShader and
glLinkProgram, and we need to make sure those threads can send compiler
warnings/errors to the debug output safely.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Both core mesa and glsl have their own wrappers for strtof_l. Merge
and move them to util/. They are compiled with a C++ compiler so that
we can make them thread-safe in a following commit.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This removes the need for the gallium rasterizer state
to listen to viewport changes.
Thanks to Marek Olšák <[email protected]>.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Mathias Froehlich <[email protected]>
|
|
|
|
|
|
|
|
| |
This reverts commit cabc93c5adc9ea62be901621eff5ce4cb9574791.
Mark thinks the failures on the SNB GT2 in the lab are actually because
of faulty hardware, not instruction compaction. The GT1 didn't see any
problems after changes to the compaction code.
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Multiplication is commutative.
instructions in affected programs: 48314 -> 47954 (-0.75%)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
These source files support actual geometry shaders, so using "gs" for
the name makes a lot of sense. We're going to be adding SIMD8 geometry
shader support as well, at which point "vec4_gs" will be a misnomer.
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Matt Turner <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The brw_gs.[ch] and brw_gs_emit.c source files contain code for
emulating fixed-function unit functionality (VF primitive decomposition
or SOL) using the GS unit. They do not contain code to support proper
geometry shaders.
We've taken to calling that code "ff_gs" (see brw_ff_gs_prog_key,
brw_ff_gs_prog_data, brw_context::ff_gs, brw_ff_gs_compile,
brw_ff_gs_prog). So it makes sense to make the filenames match.
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Matt Turner <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GL functions and driver hooks use corresponding names---for example,
glMapBufferRange and Driver.MapBufferRange. But our implementation was
called "intel_bufferobj_map_range," which has the words "map" and
"buffer" swapped, as well as randomly adding "obj."
FlushMappedBufferRange was even trickier: it ordered the words
3, "obj", 1, 2, 4: intel_bufferobj_flush_mapped_range.
Even though the old names were consistent, I always had trouble
rearranging the jumble of words when searching for a function,
and it took a few tries to eventually land there.
The new names match the word order of GL and the driver hooks;
FlushMappedBufferRange is simply brw_flush_mapped_buffer_range.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The OpenGL 4.0 core profile specification, section 2.17.3
Transform Feedback Draw Operations says:
"The error INVALID_VALUE is generated if <stream> is greater
than or equal to the value of MAX_VERTEX_STREAMS.
...
The error INVALID_OPERATION
is generated if EndTransformFeedback has never been called
while the object named by id was bound."
Fixes the piglit test:
ARB_transform_feedback3/arb_transform_feedback3-draw_using_invalid_stream_index
(with the test itself fixed to eliminate an unrelated failure)
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
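
A minimal sketch of the two checks quoted from the specification. The struct, the function name, and the order in which the checks run are assumptions; the error values are the standard GL enums, redefined locally so the snippet is self-contained:

```c
#include <stdbool.h>

#define SKETCH_GL_NO_ERROR          0
#define SKETCH_GL_INVALID_VALUE     0x0501
#define SKETCH_GL_INVALID_OPERATION 0x0502

/* Hypothetical, reduced transform feedback object. */
struct sketch_xfb_object {
   bool end_ever_called; /* EndTransformFeedback called while bound */
};

unsigned
sketch_validate_draw_xfb_stream(const struct sketch_xfb_object *obj,
                                unsigned stream,
                                unsigned max_vertex_streams)
{
   /* <stream> must be less than MAX_VERTEX_STREAMS. */
   if (stream >= max_vertex_streams)
      return SKETCH_GL_INVALID_VALUE;

   /* The object must have completed at least one feedback pass. */
   if (!obj->end_ever_called)
      return SKETCH_GL_INVALID_OPERATION;

   return SKETCH_GL_NO_ERROR;
}
```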
|
|
|
|
|
|
|
| |
When we're checking if the framebuffer is sRGB capable, call
is_format_supported() with the PIPE_BIND_DISPLAY_TARGET flag.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 20836c81851e0df29a8ee9c86e5e5388738c840b.
255 is a huge number. If you have a loop with 255 iterations, unrolling it
will exceed the SM3 instruction limit. Let's use the default again.
The comment about an SM3 limit doesn't make sense. For SM3, we generally
want 32 (default) or a lower number due to the SM3 instruction limit, which
is 512 instructions. For SM4, we can try higher numbers if needed, but
some shaders can end up being pretty huge and shader compilation can take
more time.
This fixes a shader compile failure on R500/SM3. Reported on IRC.
Cc: 10.2 10.3 <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GL side of this extension just provides an accessor via glGetIntegerv for
the value of GL_CONTEXT_RELEASE_BEHAVIOR so it is trivial to implement. There
is a constant on the context for the value of the enum which is initialised to
GL_CONTEXT_RELEASE_BEHAVIOR_FLUSH. The extension is always enabled because it
doesn't need any driver interaction to retrieve the value.
If the value of the enum is anything but FLUSH then _mesa_make_current will
now refrain from calling _mesa_flush. This should only affect drivers that
explicitly change the enum to a non-default value.
Reviewed-by: Ian Romanick <[email protected]>
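
The flush decision can be sketched as below. The context struct and helper names are hypothetical and the condition is simplified compared to what _mesa_make_current really checks; the enum value is the one defined by KHR_context_flush_control, redefined locally:

```c
#include <stddef.h>

#define SKETCH_GL_NONE                           0
#define SKETCH_GL_CONTEXT_RELEASE_BEHAVIOR_FLUSH 0x82FC

/* Hypothetical, reduced context: just the constant and a flush counter. */
struct sketch_gl_context {
   unsigned release_behavior;
   int flush_count;
};

void
sketch_init_context(struct sketch_gl_context *ctx)
{
   /* Default value; a driver would have to opt out explicitly. */
   ctx->release_behavior = SKETCH_GL_CONTEXT_RELEASE_BEHAVIOR_FLUSH;
   ctx->flush_count = 0;
}

static void
sketch_flush(struct sketch_gl_context *ctx)
{
   ctx->flush_count++;
}

/* Flush the outgoing context only when its release behavior says so. */
void
sketch_make_current(struct sketch_gl_context *old_ctx,
                    struct sketch_gl_context *new_ctx)
{
   if (old_ctx != NULL && old_ctx != new_ctx &&
       old_ctx->release_behavior == SKETCH_GL_CONTEXT_RELEASE_BEHAVIOR_FLUSH)
      sketch_flush(old_ctx);
}
```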
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Different platforms require the offset to be in different units. However,
the generator fixes all of this up for us and only requires an offset in
bytes. Previously, we were getting this wrong all over the place. Some
computed/used it correctly as bytes while others treated the offset as
whole registers or computed it as bytes or bytes*2 in SIMD16 mode. This
commit cleans all this up and makes us properly treat it as bytes
everywhere.
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
| |
Reviewed-by: Kristian Høgsberg <[email protected]>
|