| Commit message (Collapse) | Author | Age | Files | Lines |
|
Signed-off-by: Emil Velikov <[email protected]>
(cherry picked from commit 1a9cc5f50db5d27530a3449743b43aac389d781f)
|
Use clamping constants that guarantee no integer overflows.
As spotted by Chris Forbes.
This causes the code to change as:
- value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967295.0f);
+ value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967040.0f);
- value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 2147483647.0f));
+ value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 2147483520.0f));
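The new bounds are the largest single-precision values strictly below 2^32 and 2^31. A minimal sketch of why the old bounds were unsafe follows; the CLAMP macro and the helper names are defined locally for illustration and are assumptions, not the generated code itself:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the clamping macro used in the diff above. */
#define CLAMP(X, MIN, MAX) ((X) < (MIN) ? (MIN) : ((X) > (MAX) ? (MAX) : (X)))

/* A float has a 24-bit significand, so 4294967295.0f cannot be represented
 * and rounds up to 2^32, which no longer fits in a uint32_t. */
static int old_u32_bound_rounds_up(void)
{
    return 4294967295.0f == 4294967296.0f;
}

/* Likewise 2147483647.0f rounds up to 2^31, which overflows int32_t. */
static int old_s32_bound_rounds_up(void)
{
    return 2147483647.0f == 2147483648.0f;
}

/* 4294967040 = 2^32 - 2^8 is exactly representable and fits in uint32_t. */
static uint32_t float_to_u32_clamped(float v)
{
    return (uint32_t)CLAMP(v, 0.0f, 4294967040.0f);
}

/* 2147483520 = 2^31 - 2^7 is exactly representable and fits in int32_t. */
static int32_t float_to_s32_clamped(float v)
{
    return (int32_t)CLAMP(v, -2147483648.0f, 2147483520.0f);
}
```

With these bounds the float-to-integer cast is always applied to a value that the destination type can hold.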
Reviewed-by: Roland Scheidegger <[email protected]>
|
This commit causes the generated C code to change as
union util_format_r32g32b32a32_sscaled pixel;
- pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648, 2147483647);
- pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648, 2147483647);
- pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648, 2147483647);
- pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648, 2147483647);
+ pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648.0f, 2147483647.0f);
+ pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648.0f, 2147483647.0f);
+ pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648.0f, 2147483647.0f);
+ pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648.0f, 2147483647.0f);
memcpy(dst, &pixel, sizeof pixel);
which surprisingly makes a difference for MSVC.
Thanks to Juraj Svec for diagnosing this and drafting a fix.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=29661
|
Signed-off-by: Vinson Lee <[email protected]>
|
Fixes MSVC build error.
shaderapi.c
src\glsl\list.h(535) : error C2143: syntax error : missing ';' before 'type'
src\glsl\list.h(535) : error C2143: syntax error : missing ')' before 'type'
src\glsl\list.h(536) : error C2065: 'node' : undeclared identifier
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86025
Signed-off-by: Vinson Lee <[email protected]>
|
This can be very useful for trying to debug list corruptions.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
On Windows, DllMain calls and thread creation/destruction are
serialized, so when llvmpipe is destroyed from DllMain, waiting for the
rasterizer threads to finish will deadlock.
So, instead of waiting for the rasterizer threads to have finished, simply wait for
them to notify that they are just about to finish.
Verified with this very simple program:
#include <windows.h>
int main() {
HMODULE hModule = LoadLibraryA("opengl32.dll");
FreeLibrary(hModule);
}
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=76252
Reviewed-by: Roland Scheidegger <[email protected]>
Cc: 10.2 10.3 <[email protected]>
|
No longer needed as the file was removed with
commit 8c229d306b3f312adbdfbaf79967ee43fbfc839e
Author: Kenneth Graunke <[email protected]>
Date: Mon Aug 11 10:07:07 2014 -0700
i965: Delete the Gen8 code generators.
We now use the brw_eu_emit.c code instead.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
During teardown we free the driver_configs list pointer, but we forget
to deallocate each config in that list.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-and-tested-by: Kenneth Graunke <[email protected]>
|
The function is not called by platform_drm. As such one needs to
pay special attention at teardown.
v2: Fix the comment block. Spotted by Ken.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-and-tested-by: Kenneth Graunke <[email protected]> (v1)
|
An earlier commit failed to account for the fact that on drm platforms one does not
call dri2_create_screen, so it does not create the screen and
driver_configs but inherits them from the "display" - gbm.
As such, wrap the cleanup in Platform != _EGL_PLATFORM_DRM to prevent
the issue and still clean up correctly for non-drm platforms.
v2:
- Drop the ifdef HAVE_DRM_PLATFORM, reindent the code and fix the
comment block. Suggested by Ken.
Reported-by: Kenneth Graunke <[email protected]>
Reported-by: Mark Janes <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-and-tested-by: Kenneth Graunke <[email protected]> (v1)
|
Move opcode to string mappings to functions of their own. Produce consistent
output for similar opcodes.
Signed-off-by: Chia-I Wu <[email protected]>
|
This is at least much better than decoding as blobs.
Signed-off-by: Chia-I Wu <[email protected]>
|
When the earlier block ended with control flow, we'd mistakenly remove
some of its links to its children. The same happened with the later
block.
Acked-by: Jason Ekstrand <[email protected]>
|
Reviewed-by: Jason Ekstrand <[email protected]>
|
A pattern in certain shaders is:
uniform vec4 colors[NUM_LIGHTS];
for (int i = 0; i < NUM_LIGHTS; i++) {
...use colors[i]...
}
In this case, the application author expects the shader compiler to
unroll the loop. By doing so, it replaces variable indexing of the
array with constant indexing, which is more efficient.
This patch extends the heuristic to see if arrays accessed within the
loop are indexed by an induction variable, and if the array size exactly
matches the number of loop iterations. If so, the application author
probably intended us to unroll it. If not, we rely on the existing
loop-too-large heuristic.
Improves performance in a phong shading microbenchmark by 2.88x, and a
shadow mapping microbenchmark by 1.63x. Without variable indexing, we
can upload the small uniform arrays as push constants instead of pull
constants, avoiding shader memory access. Affects several games, but
doesn't appear to impact their performance.
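The extended heuristic can be sketched roughly as below. The loop_info structure, its field names, and the cost threshold are hypothetical and only illustrate the decision described above; the compiler's actual data structures are not shown in this message:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a loop; not the compiler's real representation. */
struct loop_info {
    unsigned iterations;            /* known trip count */
    unsigned body_cost;             /* estimated instructions per iteration */
    bool indexes_by_induction_var;  /* an array is indexed by the loop counter */
    unsigned indexed_array_size;    /* size of that array, if any */
};

/* Decide whether to unroll: the array-size match wins, otherwise fall
 * back to a simple "loop too large" check against max_cost. */
static bool should_unroll(const struct loop_info *loop, unsigned max_cost)
{
    assert(loop != NULL);

    if (loop->indexes_by_induction_var &&
        loop->indexed_array_size == loop->iterations)
        return true;

    return loop->iterations * loop->body_cost <= max_cost;
}
```

A loop over an 8-element array with 8 iterations would be unrolled even when its body exceeds the size limit, while a mismatched array size falls back to the size check.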
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Acked-by: Kristian Høgsberg <[email protected]>
|
Consider GLSL code such as:
const ivec2 offsets[] =
ivec2[](ivec2(-1, -1), ivec2(-1, 0), ivec2(-1, 1),
ivec2(0, -1), ivec2(0, 0), ivec2(0, 1),
ivec2(1, -1), ivec2(1, 0), ivec2(1, 1));
ivec2 offset = offsets[<non-constant expression>];
Both i965 and nv50 currently handle this very poorly. On i965, this
becomes a pile of MOVs to load the immediate constants into registers,
a pile of scratch writes to move the whole array to memory, and one
scratch read to actually access the value - effectively the same as if
it were a non-constant array.
We'd much rather upload large blocks of constant data as uniform data,
so drivers can simply upload the data via constbufs, and not have to
populate it via shader instructions.
This is currently non-optional because both i965 and nouveau benefit
from it, and according to Marek radeonsi would benefit today as well.
(According to Tom, radeonsi may want to handle this itself in the long
term, but we can always add a flag when it becomes useful.)
Improves performance in a terrain rendering microbenchmark by about 2x,
and cuts the number of instructions in about half. Helps a lot of
"Natural Selection 2" shaders, as well as one "HOARD" shader.
total instructions in shared programs: 5473459 -> 5471765 (-0.03%)
instructions in affected programs: 5880 -> 4186 (-28.81%)
v2: Use ir_var_hidden to avoid exposing the new uniform via the GL
uniform introspection API.
v3: Alphabetize Makefile.sources properly.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77957
Signed-off-by: Kenneth Graunke <[email protected]>
|
In the compiler, we'd like to generate implicit uniforms for internal
use. These should not be visible via the GL uniform introspection API.
To support that, we add a new ir_variable::how_declared value of
ir_var_hidden, and plumb that through to gl_uniform_storage.
v2 (idr): Fix some memory management issues in
move_hidden_uniforms_to_end. The comment block on the function has more
details.
Signed-off-by: Kenneth Graunke <[email protected]>
Signed-off-by: Ian Romanick <[email protected]>
|
Makes use of SSE 4.1 to speed up computing the min and max elements.
Callgrind cpu usage results from pts benchmarks:
Openarena 0.8.8: 3.67% -> 1.03%
UrbanTerror: 2.36% -> 0.81%
V5:
- actually make use of the optimisation in android (Emil Velikov)
- set a better array size limit for using SSE and added TODO
V4:
- fixed bugs with incrementing pointer and updating counters
V3:
- Removed sse_minmax.c from Makefile.sources
- handle the first few values without SSE until the pointer is aligned
and use _mm_load_si128 rather than _mm_loadu_si128
- guard the call to the SSE code better at build time
V2:
- removed GL* types
- use _mm_store_si128() rather than _mm_store_ps()
- add runtime check for SSE
- use aligned attribute for local min/max
- bunch of tidyups
Reviewed-by: Juha-Pekka Heikkila <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Timothy Arceri <[email protected]>
|
These never existed, as far as I can tell.
Reviewed-by: Kenneth Graunke <[email protected]>
|
Reviewed-by: Kenneth Graunke <[email protected]>
|
Last use was in shader_time.
Reviewed-by: Jason Ekstrand <[email protected]>
|
The ADDs depended on dispatch_width, which really isn't what we wanted.
Reviewed-by: Jason Ekstrand <[email protected]>
|
We only want fields 0-2.
|
Reviewed-by: Tom Stellard <[email protected]>
Signed-off-by: Jan Vesely <[email protected]>
|
Walk through the list and free each config, and finally free the list
itself. Freeing approx 20KiB of memory, according to valgrind.
Inspired by a similar patch by enpeng xu.
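The teardown described above can be sketched as follows, assuming the list is a NULL-terminated array of individually allocated configs. The struct and function names are hypothetical stand-ins, not the driver's real types:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver config. */
struct config { int id; };

/* Free every config in the NULL-terminated list, then the list itself.
 * Returns the number of configs that were freed. */
static size_t destroy_configs(struct config **list)
{
    size_t freed = 0;

    if (list == NULL)
        return 0;

    for (size_t i = 0; list[i] != NULL; i++) {
        free(list[i]);
        freed++;
    }
    free(list);
    return freed;
}

/* Build a NULL-terminated list of count configs for demonstration. */
static struct config **create_configs(size_t count)
{
    /* calloc zeroes the array, so the terminator is already in place. */
    struct config **list = calloc(count + 1, sizeof *list);

    if (list == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        list[i] = malloc(sizeof **list);
        if (list[i] == NULL) {
            destroy_configs(list);
            return NULL;
        }
        list[i]->id = (int)i;
    }
    assert(list[count] == NULL);
    return list;
}
```

Freeing only the outer pointer would leak every element, which is the kind of leak valgrind reported here.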
Signed-off-by: Emil Velikov <[email protected]>
|
Signed-off-by: Emil Velikov <[email protected]>
|
driUnbindContext() checks for valid drawables before calling the driver
unbind function. In the case of surfaceless contexts, the drawables are always
NULL and we end up not releasing the underlying DRI context. Moving the
call to the driver function before the drawable validity checks fixes things.
Steps to trigger this bug are as follows:
- create surfaceless context and make it current
- make some other context current
- {another thread} destroy surfaceless context
- make another context current
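The reordering can be sketched as below. The structures and the driver_unbind() hook are simplified, hypothetical stand-ins and do not reproduce the real DRI code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the DRI structures. */
struct drawable { int refcount; };

struct context {
    struct drawable *draw;  /* NULL for a surfaceless context */
    struct drawable *read;  /* NULL for a surfaceless context */
    bool driver_bound;      /* true while the driver context is bound */
};

/* Stand-in for the driver's unbind hook. */
static void driver_unbind(struct context *ctx)
{
    ctx->driver_bound = false;
}

static void unbind_context(struct context *ctx)
{
    assert(ctx != NULL);

    /* Release the driver context first, so surfaceless contexts are
     * released as well. */
    driver_unbind(ctx);

    /* Only now check the drawables; a surfaceless context has none. */
    if (ctx->draw == NULL || ctx->read == NULL)
        return;

    ctx->draw->refcount--;
    if (ctx->read != ctx->draw)
        ctx->read->refcount--;
}
```

If the drawable check came first, the early return would skip driver_unbind() for every surfaceless context.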
Signed-off-by: Alexandros Frantzis <[email protected]>
Signed-off-by: Kalyan Kondapally <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74563
|
The dummy shader sends an EOT message to end itself. Much more work
needs to be done on the compiler side before we can advertise
PIPE_CAP_COMPUTE.
Signed-off-by: Chia-I Wu <[email protected]>
|
All we need to do is to upload the input data and call
ilo_render_emit_launch_grid() with space checking.
Signed-off-by: Chia-I Wu <[email protected]>
|
ilo_render_emit_launch_grid() emits all the hardware states needed for a
launch_grid() call.
Signed-off-by: Chia-I Wu <[email protected]>
|
They were written for Gen6 but mostly untested. Make them work for Gen7+.
Signed-off-by: Chia-I Wu <[email protected]>
|
Signed-off-by: Chia-I Wu <[email protected]>
|
Signed-off-by: Chia-I Wu <[email protected]>
|
Signed-off-by: Chia-I Wu <[email protected]>
|
It updates the handles of the global bindings.
Signed-off-by: Chia-I Wu <[email protected]>
|
Use util_dynarray in ilo_set_global_binding() to allow for an unlimited number of
global bindings. Add a comment for global bindings.
Signed-off-by: Chia-I Wu <[email protected]>
|
We need to know the local/input/private sizes and others. This is not
complete. We need many others for CURBE setup.
Signed-off-by: Chia-I Wu <[email protected]>
|
Based on beignet, hardware capabilities, and OpenCL requirements.
Signed-off-by: Chia-I Wu <[email protected]>
|
They will be used to report compute params or program compute states.
thread_count can also be used for 3DSTATE_VS.
Signed-off-by: Chia-I Wu <[email protected]>
|
drm_intel_gem_bo_wait() with negative timeout is broken on kernel 3.17.
Signed-off-by: Chia-I Wu <[email protected]>
|
builds
../../src/mesa/main/context.c: In function 'check_context_limits':
../../src/mesa/main/context.c:733:41: warning: unused parameter 'ctx' [-Wunused-parameter]
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
Based on the description of __assume at:
http://msdn.microsoft.com/en-us/library/1b3fsfxw.aspx
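For illustration only, a portable hint macro built on __assume might look like the sketch below. The ASSUME name, the GCC fallback, and the channel_shift() helper are assumptions made for this example; the macro this commit actually touches is not shown in the message:

```c
#include <assert.h>

/* Hypothetical wrapper: check the condition in debug builds, then pass it
 * to the optimizer as a hint where the compiler supports one. */
#if defined(_MSC_VER)
#  define ASSUME(expr) do { assert(expr); __assume(expr); } while (0)
#elif defined(__GNUC__)
#  define ASSUME(expr) \
      do { assert(expr); if (!(expr)) __builtin_unreachable(); } while (0)
#else
#  define ASSUME(expr) assert(expr)
#endif

/* Example use: tell the compiler that channel is always in range. */
static unsigned channel_shift(unsigned channel)
{
    ASSUME(channel < 4);
    return channel * 8;
}
```

The expression given to such a macro must be free of side effects, since it may be evaluated more than once or not at all.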
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
This started hitting an assertion recently. Only affects Haswell
(Ivybridge doesn't support this meddling with the sampler state pointer,
and ARB_gpu_shader5 is not enabled yet on Broadwell)
14 Piglits crash->pass.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
Allows the driver to advertise DMA-BUF and throttling.
|
Improves performance in GLBenchmark 2.7 TRex by 3.88889% +/- 0.336383%
(n=80) at 1280x720 on Broadwell GT3. Together with the previous patch,
it improves performance by 5.42738% +/- 0.541971% (n=10) at 1920x1080.
Note that without the PMA stall fix, this would instead decrease
performance by 22%.
v2: Update comment (noticed by Kristian Høgsberg).
Signed-off-by: Kenneth Graunke <[email protected]>
|
Certain non-promoted depth cases typically incur stalls. In very
specific cases, we can enable a workaround which improves performance.
Improves performance in GLBenchmark 2.7 TRex by 1.17762% +/- 0.448765%
(n=75) at 1280x720 on Broadwell GT3.
Haswell has this feature as well, but we can't currently write registers
from userspace batches (and we'd incur additional software batch
scanning overhead as well), so we haven't enabled it. Broadwell allows
us to write CACHE_MODE_1. Backporters beware: the formula and flushing
incantation differ between Haswell and Broadwell.
v2: Move pma_stall_bits from brw->state to brw itself (requested by
Kristian Høgsberg).
Signed-off-by: Kenneth Graunke <[email protected]>
|
This patch adds macros needed for the HiZ PMA stall optimization.
Signed-off-by: Kenneth Graunke <[email protected]>
|
Matt requested this in review feedback on the original patch, which I
completely missed when pushing this series. Kristian also made this
change, but I grabbed the wrong version of the patch.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|