| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Code is taken from a combination of radv (for the more basic functions,
to avoid gallivm dependencies) and radeonsi (for the new and improved
derivative calculations).
v2: add 0.5 offset to tex coords only after derivative calculation
v3:
- really only touch the first three coordinates
- rebase on the removal of the 1.5 --> 0.5 offset change
Reviewed-by: Bas Nieuwenhuizen <[email protected]> (v2)
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Sourcing coords_arg[4] is actually never correct, since bias is handled
differently in tex_fetch_args anyway.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As remarked by the comment in the original code, the old algorithm fails when
(tc + deriv) points at a different cube face. Instead, simply project the
derivative directly to the plane of the selected cube face.
The new code is based on exactly differentiating (using the chain rule)
the projection onto a plane corresponding to a fixed cube map face (which
is still selected in the usual way based on the texture coordinate itself).
The computations end up fairly involved, but we do save two reciprocal
computations.
Fixes GL45-CTS.texture_cube_map_array.sampling.
v2: add 0.5 offset to tex coords only after derivative calculation
v3: go back to 1.5 offset
Reviewed-by: Bas Nieuwenhuizen <[email protected]> (v2)
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
v2: fix compile error that snuck in during rebase
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
b838f642 "ac/debug: Move sid_tables.h generation to common code." moved
sid_tables.h but forgot the corresponding .gitignore.
Signed-off-by: Grazvydas Ignotas <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reported-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a bug in code motion that occurred when the best block is the
same as the schedule early block. In this case, because we're checking
(lca != def->parent_instr->block) at the top of the loop, we never get to
the check for loop depth so we wouldn't move it out of the loop. This
commit reworks the loop to be a simple for loop up the dominator chain and
we place the (lca != def->parent_instr->block) check at the end of the
loop.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Cc: [email protected]
Cc: Bruce Cherniak <[email protected]>
Signed-of-by: Chuck Atkins <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The generated file is correctly stored in the builddir as of earlier
commit. Yet the commit forgot to add the respective include flag thus
the compiler would error out failing to find sid_tables.h
Bugzila: https://bugs.freedesktop.org/show_bug.cgi?id=99389
Fixes: d1dc22eb466 "ac: automake: rework sid_tables.h generation"
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
Port of faa1edeeb7bbe9321c79587e592dce812e8caa78
"anv/pipeline: Call NIR passes using NIR_PASS_V"
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Port of c5d664f9dc2d281c74844cef36ecb9f5862a8f6a
"anv/pipeline: Call nir_lower_constant_initializers"
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Cc: <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
Port of 43e0b0d4b255d910616c10e3e01bfec5db469e0e
"anv/pipeline: Only call remove_dead_variables once"
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the flag D3DCREATE_MULTITHREAD is set, a global mutex is used
to protect nine calls.
However for performance reasons, AddRef and Release didn't hold the mutex,
and instead used atomics.
Unfortunately at item release, the item can be destroyed, and that
destruction path should be protected by a mutex (at least for
some objects).
Without this patch, it is possible an app thread is in a dtor
while another thread is making gallium nine calls. It is possible
that two threads are using the same gallium pipe, which is forbiden.
The problem has been made worse with csmt, because it can cause hang,
since nine_csmt_process is not threadsafe.
Fixes Hitman hang, and possibly others.
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
|
| |
Flush the queue to get refcounts right, and properly
release the items, instead of throwing away all pending
commands.
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
|
| |
Some nine_state_* and nine_context_* functions
used for Reset() require all pending commands are
flushed.
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
| |
nine_context uses NineSurface9 fields, thus we need to flush
pending commands using the surface before changing the fields.
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
| |
Create both surfaces in one call.
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
|
| |
There is no need to check on csmt_active before
calling nine_csmt_process, because the function
checks already.
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When dirty region is empty, u_box_union_* incorrectly expands
the new region.
This fixes broken font rendering issue in WOLF RPG Editor v2.10 games.
Signed-off-by: Masanori Kakura <[email protected]>
Reviewed-by: Axel Davy <[email protected]>
|
|
|
|
|
|
| |
... and list the public header within it.
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
Note: the currently mentioned etnaviv_utils.h is typo.
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Drop $(srcdir)/ prefix analogous to before the file (and rule) movement
and move it outside of the NEED_RADEON_LLVM conditional.
Otherwise the build may fail as below.
make[3]: *** No rule to make target 'common/sid_tables.h', needed by 'distdir'. Stop.
Fixes: b838f642371 "ac/debug: Move sid_tables.h generation to common
code."
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Changes from V1 -> V2:
- updated Copyright
- added $(top_srcdir)/src/gallium/winsys to include path (suggested by Emil)
- adapted driver to new renderonly API
Signed-off-by: Christian Gmeiner <[email protected]>
Acked-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This driver supports a wide range of Vivante IP cores like GC880,
GC1000, GC2000 and GC3000.
Changes from V1 -> V2:
- added missing files to actually integrate the driver into build system.
- adapted driver to new renderonly API
Signed-off-by: Christian Gmeiner <[email protected]>
Signed-off-by: Lucas Stach <[email protected]>
Signed-off-by: Philipp Zabel <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Acked-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This a very lightweight library to add basic support for renderonly
GPUs. A kms gallium driver must specify how a renderonly_scanout
objects gets created. Also it must provide file handles to the used
kms device and the used gpu device.
This could look like:
struct renderonly ro = {
.create_for_resource = renderonly_create_gpu_import_for_resource,
.kms_fd = fd,
.gpu_fd = open("/dev/dri/renderD128", O_RDWR | O_CLOEXEC)
};
The renderonly_scanout object exits for two reasons:
- Do any special treatment for a scanout resource like importing the
GPU resource into the scanout hw.
- Make it easier for a gallium driver to detect if anything special
needs to be done in flush_resource(..) like a resolve to linear.
A GPU gallium driver which gets used as renderonly GPU needs to be
aware of the renderonly library.
This library will likely break android support and hopefully will get
replaced with a better solution based on gbm2.
Changes from V1 -> V2:
- reworked the lifecycle of renderonly object (suggested by Nicolai Hähnle)
- killed the midlayer (suggested by Thierry Reding)
- made the API more explicit regarding gpu and kms fd's
- added some docs
Signed-off-by: Christian Gmeiner <[email protected]>
Acked-by: Emil Velikov <[email protected]>
Tested-by: Alexandre Courbot <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Once again, SPIR-V is insane... It allows you to place "patch"
decorations on structure members. Presumably, this is so that you can
do something such as
out struct S {
layout(location = 0) patch vec4 thing1;
layout(location = 0) vec4 thing2;
} str;
And have your I/O "nicely" organized. While this is a bit silly, it's
allowed and well-defined so whatever. Where it really gets interesting
is when you have an array of struct. SPIR-V says nothing about not
allowing you to have those qualifiers on the members of a struct that's
inside an array and GLSLang does this. Specifically, if you have
layout(location = 0) out patch struct S {
vec4 thing1;
vec4 thing2;
} str[2];
then GLSLang will place the "patch" decorations on the struct members.
This is ridiculous there is no way that having some of them be patch and
some not would be well-defined given that patch and non-patch outputs
are in effectively different storage classes. This commit moves around
the way we handle the "patch" decoration so that we can detect even the
crazy cases and handle them.
Fixes: dEQP-VK.tessellation.user_defined_io.per_patch_block_array.*
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements vk_icdNegotiateLoaderICDInterfaceVersion(), which
brings us to loader interface v3.
v2:
- Drop the pragmas. [emil]
- Advertise v3 instead of v2. Anvil supported more than I
thought. [jason]
- s/Surface/SurfaceKHR/ in comments. [emil]
Reviewed-by: Emil Velikov <[email protected]>
Cc: [email protected]
Cc: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can't import the latest vk_icd.h because the new header breaks the
Mesa build. This patch defines new casting macros,
ICD_DEFINE_NONDISP_HANDLE_CASTS() and ICD_FROM_HANDLE(), which can
handle both the old and new vk_icd.h, and will prevent the build from
breaking when we update the header.
In the old vk_icd.h, types were defined as:
typedef struct _VkIcdFoo {
...
} VkIcdFoo;
Commit 6ebba1f6 in the Vulkan loader changed the above to
typedef {
...
} VkIcdFoo;
because the old definitions violated the C and C++ specs. According to
the specs, identifiers that begins with an underscore followed by an
uppercase letter are reserved. (It's pedantic, I know), See the Github
issue referenced below.
References: https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers/issues/7
References: https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers/commit/6ebba1f630015af7a78767a15c1e74ba9b23601c
Reviewed-by: Emil Velikov <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
| |
Defer delete on regular resources. This ensures that any work being done
on the resource is completed before freeing up the resource's memory.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
This fixes a defect detected by Coverity Scan.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Although, arb_shader_image_load_store-atomicity will most likely
hang your box, I think it's now quite reasonable to enable GL 4.3
on Maxwell/Pascal GPUs. I suspect that test to be wrong because
it doesn't even work on the NVIDIA blob.
I have tested a bunch of benchmarks (UE4 demos) and real games
like Shadow of Mordor and they all work fine.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Yes, IMUL/IMAD require dependency barriers and we should
definitely replace these instructions by XMAD but the
different flags need to be figured out. Note that XMAD only
supports 16-bits integers.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes use of scheduling control codes which are very useful
for improving the instruction pipelining.
This patch will increase performance on Maxwell GPUs by, at least,
x1.5 up to x3.5 for some benchmarks.
Although this has been fairly well tested, I would not be suprised
if someone hit a corner case somewhere. That way, the scheduler
is enabled by default but it can be deactivated by using
NV50_PROG_SCHED=0.
Thanks to Scott Gray for the reverse engineering work available from
https://github.com/NervanaSystems/maxas/wiki/Control-Codes.
Signed-off-by: Samuel Pitoiset <[email protected]>
Acked-by: Pierre Moreau <[email protected]>
Tested-by: Alexandre Courbot <[email protected]>
Tested-by: Jan Vesely <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
It's actually useless to insert those texture barriers post RA
because the current control code (ie. st 0x0) will wait for all
dependencies before issuing a new instruction.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]>
|
|
|
|
|
|
|
|
|
| |
GL_ARB_vertex_attrib_64bit was the last piece missing.
v2: update docs (Jordan)
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: update docs (Jordan)
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
ARB_vertex_attrib_64bit for HSW+
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: update docs (Jordan)
Signed-off-by: Alejandro Piñeiro <[email protected]>
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
Those not supporting 64 bit input vertex attributes will have the
dual_slot value as false.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For dvec3 and dvec4 types, a single GRF do not have enough space to
allocate two inputs from two different vertices (SIMD4x2).
So the GRF only contains first two components for the two vertices, and
the next GRF has the remaining components.
We want to put all the components for the same vertex in the same
register. Thus, we do a shuffle to reorder the data.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Doubles needs more that one slot per attribute. So when filling the
attribute_map we check if it is a double in order to allocate one
extra register.
Signed-off-by: Alejandro Piñeiro <[email protected]>
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Doubles need extra space, so we would need to do a remapping for vec4
too in order to take that into account. We reuse the already
existing remap_vs_attrs, but passing is_scalar, so they could
remap accordingly.
v2: code-format remap_vs_attrs_params initialization (Matt)
Signed-off-by: Alejandro Piñeiro <[email protected]>
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As part of the payload setup, setup_attributes is called with the first
GRF that can be used for the attributes (first ones are used for
uniforms for example) and returns the first GRF that is not part of the
payload. Before this patch, it adds directly the number of attributes.
But as with 64-bit attributes can consume more than one slot, that is
not valid anymore. This patch change the addition to use the number of
slots consumed.
gen >= 8 would not be affected, as they use the scalar mode. For that
case, the vs configuration is done at fs_visitor::assign_vs_urb_setup.
v2: add explanation in commit log (Jordan)
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
gen < 8 doesn't support *64*PASSTHRU formats when emitting
vertices. So in order to provide the equivalent functionality, we need
to downsize the format to equivalent *32*FLOAT, and in some cases
(R64G64B64 and R64G64B64A64) submit two 3DSTATE_VERTEX_ELEMENTS for
each vertex element.
Signed-off-by: Alejandro Piñeiro <[email protected]>
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
Although gen7 doesn't include surface types as a valid conversion format,
we return it, as it reflects what we want to achieve, even if we need
to workaround it on gen < 8.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From OpenGL 3.1 spec, section 4.3.1 "Reading Pixels", page 190 (203 PDF)
"When READ FRAMEBUFFER BINDING is zero, i.e. the default
framebuffer, src must be one of the values listed in table 4.4,
including NONE . FRONT_AND_BACK , FRONT , and LEFT refer to the
front left buffer."
There is an equivalent text on OpenGL 4.5 spec, section 18.2.1
"Selecting Buffers for Reading", page 502 (524 PDF), so the behaviour
is still the same.
Part of the fix for:
GL45-CTS.direct_state_access.framebuffers_draw_read_buffers_errors
Reviewed-by: Anuj Phogat <[email protected]>
|