| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
This prevents a build failure on some systems.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For seamless cube filtering it is necessary to determine new faces and new
coords per sample. The logic for this is _seriously_ complex (what needs
to happen is very "asymmetric" wrt face, x/y under/overflow), further
complicated by the fact that if the 4 samples are in a corner (meaning we
only have actually 3 samples, and all 3 are on different faces) then
falling off the edge is happening _both_ on x and y axis simultaneously.
There was a noticeable performance hit in mesa's cubemap demo when seamless
filtering was forced on (just below 10 percent or so in a debug build, when
disabling all filtering hacks, otherwise it would probably be a bit more) and
when always doing the logic, hence use a branch which it only does it if any
of the pixels in a quad (or in two quads) actually hit this. With that there
was no measurable performance hit in the cubemap demo (neither in a debug nor
release buidl), but this will vary (cubemap demo very rarely hits edges).
Might also be different on other cpus, as this forces SoA sampling path which
potentially can be quite a bit slower.
Note that as for corners, this code gets all the 3 samples which actually
exist right, and the 4th texel will simply be the same as one of the others,
meaning that filter weights will be a bit wrong. This however should be
enough for full OpenGL (but not d3d10) compliance.
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Using atomic function for ncs is superfluous since it is
protected by a mutex anyway. Also lock the mutex only once
while retrieving the next CS for submission.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
rc_find_free_temporary_list() returns signed integer
(in case of lack of free temporary registers returns -1),
so new_index in radeon_rename_regs() should be signed.
https://bugs.freedesktop.org/show_bug.cgi?id=54867
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Fixes "Uninitialized scalar field" defect reported by Coverity.
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
translate_sse.c contains code for msabi on x86_64, but it appears to be
untested.
Currently arguments 1 and 2 passed to the generated code are moved as 32-bit
quantities into the registers used by sysvabi, irrespective of the architecture.
Since these may be pointers, they must be moved as 64-bit quantities to avoid
truncation.
Commit f4dd0991719ef3e2606920c5100b372181c60899 disabled tranlate_sse.c on MinGW
x86_64, I don't know if was due to this issue, or a different one...
Signed-off-by: Jon TURNEY <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Cygwin also uses the msabi calling convention on x86_64, not the sysvabi calling
convention
Signed-off-by: Jon TURNEY <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
ignored, and an empty message aborts the commit.
|
|
|
|
|
|
|
|
|
|
|
| |
implementation which uses mmap()
The heap is NX on 64-bit Cygwin, so use the rtasm_exec_malloc() implementation
which uses mmap() to allocate an anonymous page with execute permission, rather
than the one which just uses malloc().
Signed-off-by: Jon TURNEY <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can't perform DCE using the liveness pass between GVN and GCM
because it relies on the correct schedule, but GVN doesn't care about
preserving correctness - it's rescheduled later by GCM.
This patch makes dce_cleanup pass perform simple DCE
between GVN and GCM instead of relying on liveness pass.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=70088
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 94d05bf87a21bd364e84f699a0064e5fba58a6f9 as it has a
few problems:
- it breaks windows builds becuase env[LLVM_CXXFLAGS] is never set there
- it is merging not only rtti, but the whole cxxflags (defines etc)
which has proven to be a source of troubles (breaks debugging etc.)
|
|
|
|
|
|
|
|
| |
LLVM 3.3 does not know about CIK processors, and the codes paths for SI
and CIK are the same.
Reviewed-by: Marek Olšák <[email protected]>
Cc: "9.2" <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is required in order for clang to correctly handle the OpenCL C
barrier() builtin which has the following restrictions acording to
the OpenCL 1.1 Specification:
If barrier is inside a conditional statement, then all work-items must
enter the conditional if any work-item enters the conditional statement
and executes the barrier.
If barrier is inside a loop, all work-items must execute the barrier for
each iteration of the loop before any are allowed to continue execution
beyond the barrier.
By linking before otimizations, we can replace calls to barrier() with
calls to a target specific intrinsic which has the noduplicate attribute
This attribute prevents clang from performing optimizations which could
violate the above rules.
This attribute must be applied to the call instruction that invokes
the function, so it is not enough to add this attribute the barrier()
declaration.
As a bonus this will probably speed up compile times since we will no
longer need to run link-time optimizations.
|
|
|
|
|
| |
Fix debug error message. Add switch case for PIPE_SHADER_COMPUTE.
Trivial.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During the recent bind_sampler_states() interface change in gallium
we changed the CSO single_sampler_done() function so that if we were
decreasing the number of sampler states bound in the driver, we'd
null-out the "extra/old" sampler states to unbind them. See commit
1e2fbf265.
However, we didn't make the corresponding fix for sampler views.
This caused an assertion to fail in the svga driver which checked
that the number of sampler views matched the number of sampler states.
This patch fixes cso_restore_sampler_views() so that it nulls-out
the extra/old sampler views if the number of new views is less than
the number of current/old views.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
* The rtti fix actually dug up a bug in the scons build scripts.
* Autotools took the LLVM cpp and cxx flags, while scons only took
the cpp flags.
* This grabs the cxx flags and applies them where needed. We may
want to make the same change for the llvm cpp flags in scons.
* The only linux platform I can find with LLVM no-rtti is Ubuntu.
* Fixes bug #70471
Tested-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
|
|
| |
Actually implemented by draw module.
Tested piglit ARB_depth_clamp tests, which pass 100%.
Trivial.
|
|
|
|
|
|
| |
Otherwise (vs_slot < 0) will never be true.
Trivial.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The xmlpool/options.h file was not accessible when building
out-of-tree leading to failure.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70378
Reported-by: Fabio Pedretti <[email protected]>
Tested-by: Fabio Pedretti <[email protected]>
Tested-by: Andre Heider <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Andreas Boll <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* As discussed on the mailing list,
forced no-rtti breaks C++ public
API's such as the Haiku C++ libGL.so
* -fno-rtti *can* be still set however
instead of blindly forcing -fno-rtti,
we can rely on the llvm-config
--cppflags output.
If the system llvm is built without
rtti (default), the no-rtti flag will be
present in llvm-config --cppflags
(which we pick up on)
If llvm is built with rtti
(REQUIRES_RTTI=1), then -fno-rtti is
removed from llvm-config --cppflags.
* We could selectively add / remove rtti
from various components, however mixing
rtti and non-rtti code is tricky and
could introduce missing symbols.
* This needs impact tested.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
| |
Add simple plain C routines for NV12<->YV12 and YUYV<->UYVY
conversions. The NV12->YV12 conversion is commonly used, for instance
by VLC.
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
| |
Textures that likely reside in VRAM, are mapped for reading and
don't require direct mapping should be staged into GTT, to avoid bad
performance. This fixes readback performance of VDPAU surfaces.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This new bind flag forces linear storage, but does not have other
side effects like R600_RESOURCE_FLAG_TRANSFER.
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
| |
This fixes a crash in Unigine Heaven 3.0, and probably in some
others apps.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently it's hardcoded in the shader, so every change requires
compilation of the shader variant, killing the performance
in Serious Sam 3 and probably other apps.
This patch passes alpha_ref in the user sgpr and removes it from
the shader key.
Signed-off-by: Vadim Girlin <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes the issue when dst and src is the same reg and operation on one
channel overwrites the source for other channels, e.g.:
UMUL TEMP[2].xyz, TEMP[0].xyzz, TEMP[2].xxxx
In this example the result of the operation on channel x is written in
TEMP[2].x and then used as a second source operand for channels y and z
instead of original value in TEMP[2].x.
This patch stores the results in temp reg and moves them to
dst after performing operation on all channels.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=70327
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
| |
Now that we support start, assert on start + num < max samplers
Reported by xexaxo
|
|
|
|
|
|
|
|
| |
With code dump enabled LLVM may generate disassembly during compilation.
Show this disassembly when available and prefer it to SI bytecode dump.
Reviewed-by: Tom Stellard <[email protected]>
Signed-off-by: Jay Cornwall <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix coord wrapping (and face selection too) in case of edges.
Unfortunately, the coord wrapping is way more complicated than what
the code did, as it depends on the face and the direction where the
texel falls off the face (the logic needed to get this right in fact
seems utterly ridiculous).
Also fix a bug in (y direction under/overflow) face selection.
And get rid of complicated cube corner handling. Just like edge case,
the coord wrapping was wrong and it seems very difficult to fix.
I'm near certain it can't always work anyway (though ordinary seamless
filtering on edge has actually a similar problem but not as severe)
because we don't have per-pixel face, hence could have multiple corner
texels which would make it very difficult to average the remaining texels
correctly. Hence simply pick a texel which would only have fallen off one
edge but not both instead, which is not quite accurate but actually I think
should be enough to meet OpenGL (but not d3d10) requirements.
v2: small fixes suggested by Brian, add some comments.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous limit of of 128*1024 was reported to cause frequent recompiles
in some apps due to shader variant thrashing on IRC in some apps leading
to noticeable lags.
Note that the LP_MAX_SHADER_VARIANTS limit (1024) was more or less impossible
to reach, since even simple fragment shaders without texturing (glxgears) used
more than twice than 128 instructions, hence the instruction limit would have
always been reached first (excluding things like trivial shaders not writing
color). Even with the new limit it is VERY likely the instruction limit is hit
first.
Should help with such lags due to recompiles (though other shader types have
their own limits, LP_MAX_SETUP_VARIANTS and DRAW_MAX_SHADER_VARIANTS, in
particular the latter seems a bit small (128)).
Reviewed-by: Brian Paul <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that libEGL has been fixed to not leak all kinds of symbols, gbm
links to its own copy of the libwayland-drm.a helper library. That means
we can't rely on comparing the addresses of a static vtable symbol in that
library to determine if a wl_buffer is a wl_drm_buffer. Instead, we
move the vtable into the wl_drm struct and use that for comparing.
https://bugs.freedesktop.org/show_bug.cgi?id=69437
Cc: 9.2 <[email protected]>
|
|
|
|
|
|
|
|
| |
We should be able to safely set the framebuffer state without a
fragment shader bound. bind_ps_state will take care of updating the
necessary state bits later.
v2: check in update_db_shader_control
|
|
|
|
|
|
|
| |
* Fix LLVM library and defines
* Only enable tracing when scons build=debug
Acked-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
* /boot/common no longer exists in Haiku as of
a few days ago (and this is undefined)
Acked-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
i965, i915, radeon, r200, swrast, and nouveau were mostly trying to do the
same logic, except where they failed to. Notably, swrast had code that
appeared to try to enable GLES1/2 but forgot to set api_mask (thus
preventing any gles context from being created), and the non-intel drivers
didn't support MESA_GL_VERSION_OVERRIDE.
nouveau still relies on _mesa_compute_version(), because I don't know what
its limits actually are, and gallium drivers don't declare limits up front
at all. I think I've heard talk about doing so, though.
v2: Compat max version should be 30 (noted by Ken)
Drop r100's custom max version check, too (noted by Emil Velikov)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The only important difference was not calling drmGetVersion, and making
the swrast extension vtable. That doesn't justify duplicating the other
330 lines of code.
v2: fix the scons build (code by Emil Velikov)
v3: fix scons build with swrast-only (code by Emil Velikov)
v4: Drop the new define I added, when we already have __NOT_HAVE_DRM_H.
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Calling radeon_drm_cs_flush from multiple threads might cause deadlocks,
fix this by immediately signaling the semaphore after waiting for it.
This is a candidate for the stable branch(es).
Partially fixes: https://bugs.freedesktop.org/show_bug.cgi?id=70123
v2: some fixes on commit message
Signed-off-by: Christian König <[email protected]>
|
|
|
|
|
| |
_GNU_SOURCE appears to not be used reliably. Use _MSC_VER instead so
that MSVC alone is affected.
|
|
|
|
|
|
|
|
| |
Unless the polygon fill mode is different from PIPE_POLYGON_MODE_FILL,
so checking the the polygon mode is sufficient.
Testing done: no regression in polygon-mode-offset
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Not used since ages, and it wouldn't work at all with explicit derivatives now
(not that it did before as it ignored them but now the code would just use
the derivs pre-projected which would be quite random numbers).
v2: also get rid of 3 helper functions no longer used.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
They need some special handling. Quite complicated.
Additionally, use the same code for implicit derivatives too if no_rho_approx
and no_quad_lod is set, because it seems while generally it should be ok
to use per quad lod for implicit derivatives there's at least some test which
insists that in case of cubemaps the shared lod value MUST come from a pixel
inside the primitive (due to the derivatives becoming different if a different
larger major axis is chosen).
v2: based on Brian's feedback, clean up code a bit.
And use sign bit of major axis instead of pre-select s/t/r sign for coord
mirroring (which should be the same in the end, saves 2 ands).
Also fix two bugs with select/mirror of derivatives, the minor axes need to
use major axis sign as well (instead of major derivative axis sign), and
don't mistakenly use absolute values of major derivative and inverse major
values.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's two reasons for this:
1) even when ignoring rho approximation for cube maps, the result is still
not correct, but it's better as the max error at edges is now sqrt(2) instead
of 2 (which was a full mip level), same as it is for ordinary 2d maps when
doing rho approximations (so the error actually goes from factor 2 at edges and
sqrt(2) completely inside a face to sqrt(2) at edges and 0 inside a face).
2) I want to repurpose rho_no_approx for cubemaps for fully correct cubemap
derivatives (so don't need yet another debug var).
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
GNU C++ compiler declares the C99 lrint, etc. when _GNU_SOURCE is
defined, but MSVC does not.
Trivial.
|
|
|
|
|
|
|
|
|
|
| |
As we're moving towards expanding the number of subpixel
bits and the width of the variables used in the computations
we need to make this code a bit more centralized.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Both the imul_hi and umul_hi are working with this patch.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code introduces two new 32bit integer multiplication opcodes which
can be used to produce correct 64 bit results. GLSL, OpenCL and D3D10+
require them. We use two seperate opcodes, because they match the
behavior of GLSL and OpenCL, are a lot easier to add than a single
opcode with multiple destinations and because there's not much (any)
difference wrt code-generation.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
only 8 and 32 bit integers were supported before.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The compiler cannot find the Xlib.h in the installed system headers.
All supplied include directives point to inside the mesa module.
The X11_CFLAGS variable is undefined (not defined in config.status).
It appears the intent was to use X11_INCLUDES defined in configure.ac.
The Xlib.h file is not installed on my workstation. It is supplied in
the libx11-dev package. This allows an X developer control over which
version of this file is used for X development.
Use to test: --enable-gallium-egl --enable-xlib-glx --disable-dri
Acked-by: Brian Paul <[email protected]>
Signed-off-by: Gaetan Nadon <[email protected]>
|