| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
| |
On gen5-6, SeparateStencilBufferEnable and HierarchicalDepthBufferEnable
come hand in hand and we have to set either both or neither.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
| |
This ensures that we get the correct layout for all stencil buffers, not
just those which are created as separate stencil for a depth buffer.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The CL CTS queries the max allocation size, and then attempts to
allocate buffers of that size. If not enough contiguous RAM/VRAM is
available, this causes errors in the radeon kernel module due to
inability to allocate the required memory.
It's a bit of a hack, but experimentally on my system, I can use ~3/4
of the card's VRAM for a single global/constant buffer allocation given
current GUI/compositor use.
For a 1GB Pitcairn (HD7850) this gets me from the reported clinfo values of:
Global memory size 2143076352 (1.996GiB)
Max memory allocation 1500153446 (1.397GiB)
Max constant buffer size 1500153446 (1.397GiB)
To:
Global memory size 2143076352 (1.996GiB)
Max memory allocation 751619276 (716MiB)
Max constant buffer size 751619276 (716MiB)
Fixes: OpenCL CTS test/conformance/api/min_max_mem_alloc_size,
OpenCL CTS test/conformance/api/min_max_constant_buffer_size
Signed-off-by: Aaron Watry <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This updates the Gen4-5 code to use a line end cap width of 0.5
for non-smooth lines, and 1.0 for smooth lines - which is what we
do on Gen6+.
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This unifies the Gen4-5 and Gen6+ line width calculations.
I believe it also fixes a bug - we weren't rounding the line width
to the nearest integer. The GL 4.5 (and GL 2.1) specs "Wide Lines"
section says:
"The actual width of non-antialiased lines is determined by rounding
the supplied width to the nearest integer, then clamping it to the
implementation-dependent maximum non-antialiased line width."
We don't need to care about _NEW_MULTISAMPLE here because multisampling
doesn't exist on Gen4-5, so the state shouldn't change.
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
| |
It's a U3.1. It became a U3.7 on Sandybridge.
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This effectively reverts Robert Ellison's 2009 commit
cc8afbd3862fedfe42e51c3774960d1c7078ec53.
I'm not seeing any GL spec text indicating that UPPER won't work.
On Gen6+, this bit moved to 3DSTATE_WM as a single bit, controlling
UPPER_LEFT vs. UPPER_RIGHT. There is no way to request LOWER_RIGHT,
so UPPER_RIGHT is the best you can do.
In the G45 docs, it's marked as "Reserved" as well, but we just
decided to use it anyway.
This patch unifies the behavior between Gen4-5 and Gen6+.
Note that this is separate from point sprite texcoord behavior.
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Modern GL specifications say that the point size should be 1.0 when
gl_PointSize is unwritten and the last enabled stage is a geometry
or tessellation shader. If it's a vertex shader, though, both the
GL specs and ES 3.0 spec say that it's undefined - so since Gen4-5
only support vertex shaders, there's no actual requirement to do this.
Since there is a cost associated (an extra dirty bit, which may cause
SF_STATE to be emitted more often), it may not be a good idea.
The real benefit is that it makes all generations behave identically.
And that seems somewhat nice...
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Apparently, Nanhai made the Gen4-5 point size calculations round to the
nearest integer in commit 8d5231a3582e4f2769ac0685cf0174e09750700e,
"according to spec". When Eric first ported the driver to Sandybridge,
he did not implement this rounding.
In the GL 2.1 and 3.0 specs "Basic Point Rasterization" section, it does
say "If antialiasing and point sprites are disabled, the actual width is
determined by rounding the supplied width to the nearest integer, then
clamping it to the implementation-dependent maximum non-antialised point
width."
In contrast, GL 3.1 and later do not appear to contain this rounding.
It might be reasonable to round, given that we only implement GL 2.1.
Of course, if we were to do that, we should actually implement the AA
vs. non-AA distinction. Brian added an XXX comment reminding us to fix
this 10 years ago, but it never happened.
I think a better plan is to follow the newer, unrounded behavior. This
is what we do on Gen6+ and it passes all the relevant conformance tests.
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
| |
According to the docs, a simple CS stall is insufficient to ensure that
the memory from the flush is visible and an end-of-pipe sync is needed.
Cc: "17.1" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Cc: "17.1" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Cc: "17.1" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2 (Jason Ekstrand):
- Take a flags parameter to control the flushes
- Refactoring
Cc: "17.1" <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
These two functions contain almost identical logic except for one SNB
workaround required for render target cache flushes. They may as well
call into the same code so we only have to handle the work-arounds in
one place.
Cc: "17.1" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
It's a 64-bit value. Splitting it up just makes the function arguments
awkward.
Cc: "17.1" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Cc: "17.1" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
I forgot to add this when introducing the new key field. It doesn't
happen often - just with the Unigine workarounds. But we may as well
have it, so we get an accurate picture of why recompiles happen.
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
KHR/khrplatform.h is required by the EGL, GLES and VG headers, but is
only installed if Mesa3d is compiled with EGL support.
This patch installs this header file unconditionally.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77240
Signed-off-by: Eric Le Bihan <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
wpos_tex used to be a GLuint so assigning -1 to it and
later comparing with -1 worked correctly, but commit
c349031c27b7 ("i915: Fix texcoord vs. varying collision in
fragment programs") changed wpos_tex to uint8_t and hence
broke the comparison. To fix this define a more explicit
invalid value for wpos_tex.
gcc warns us:
i915_fragprog.c:1255:57: warning: comparison is always true due to limited range of data type [-Wtype-limits]
if (inputsRead & VARYING_BITS_TEX_ANY || p->wpos_tex != -1) {
^
And clang says:
i915_fragprog.c:1255:57: warning: comparison of constant -1 with expression of type 'uint8_t' (aka 'unsigned char') is always true [-Wtautological-constant-out-of-range-compare]
if (inputsRead & VARYING_BITS_TEX_ANY || p->wpos_tex != -1) {
~~~~~~~~~~~ ^ ~~
Cc: Chih-Wei Huang <[email protected]>
Cc: Eric Anholt <[email protected]>
Cc: Ian Romanick <[email protected]>
Cc: [email protected]
Fixes: c349031c27b7 ("i915: Fix texcoord vs. varying collision in fragment programs")
Signed-off-by: Ville Syrjälä <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
This should fix compilation errors in some situations.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101418
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Testing with zlib versions 1.2.{3,4,5,6,7,8} showed no difference in
functionality, correctness, or zlib API usage and 1.2.3 is the oldest
version available in still actively deployed production Linux
distributions (RHEL/CentOS 6 and SuSE 11).
Build 17.1.1 against the system supplied zlib-devel packages for 1.2.3
in EL6 and 1.2.7 on EL7. I then swapped out the zlib version at runtime
via LD_LIBRARY_PATH with ones build from the release tarballs from
zlib.net
Testwise - I ran the piglit shader profile with --quick addded to the
tests since I figured that would exercise the shader cache, which would
in turn use zlib.
Signed-off-by: Chuck Atkins <[email protected]>
Cc: 17.1 <[email protected]>
Cc: Timothy Arceri <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
[Emil Velikov: add hunk about version/piglit testing]
Acked-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
This has only been tested on RX480.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
When a buffer becomes resident, check if it has been invalidated,
if so update the descriptor and the dirty flag.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When texture buffers are invalidated the addr in the resident
descriptor has to be updated but we can't create a new descriptor
because the resident handle has to be the same.
Instead, use the WRITE_DATA packet which allows to update memory
directly but graphics/compute have to be idle in case the GPU is
reading the descriptor.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
When the current bound shaders don't use any bindless textures
or images, it's useless to decompress the resident resources.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This adds some new helper functions to know if the current draw
call (or dispatch compute) is using bindless samplers/images,
based on TGSI analysis.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Similar to the existing decompression code path except that it
loops over the list of resident textures/images.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Analogous to bound textures/images. We should also update the
resident descriptors and disable COMPRESSION_EN for avoiding
useless DCC fetches, but I postpone this optimization for a
separate series.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This won't help much except for applications that use a ton
of resident handles. Though, this will reduce the winsys
overhead a little bit.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Resident buffers have to be added to every new command stream.
Though, this could be slightly improved when current shaders
don't use any bindless textures/images but usually applications
tend to use bindless for almost every draw call, and the winsys
thread might help when buffers are added early.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This implements the Gallium interface. Decompression of resident
textures/images will follow in the next patches.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For each texture/image handles, we need to allocate a new
buffer for the bindless descriptor. But when the number of
buffers added to the current CS becomes high, the overhead
in the winsys (and in the kernel) is important.
To reduce this bottleneck, the idea is to suballocate the
bindless descriptors using a slab similar to the one used
in the winsys.
Currently, a buffer can hold 1024 bindless descriptors but
this limit is arbitrary and could be changed in the future
for some reasons. Once a slab is allocated the "base" buffer
is added to a per-context list.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
To share some common code between bound and bindless images.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
To share some common code between bound and bindless textures.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This will be used in order to initialize resident descriptors
for bindless textures/images.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ARB_bindless_texture spec say:
"If ARB_seamless_cubemap (or OpenGL 4.0, which includes it) is
supported, the per-context seamless cubemap enable is ignored
and treated as disabled when using texture handles."
"If AMD_seamless_cubemap_per_texture is supported, the seamless
cube map texture parameter of the underlying texture does apply
when texture handles are used."
The per-context seamless cubemap flag should only be enabled for
bound textures/samplers.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|