summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: remove unused si_prepare_cube_coordsNicolai Hähnle2017-01-132-200/+0
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* amd/common: unify cube map coordinate handling between radeonsi and radvNicolai Hähnle2017-01-136-197/+440
| | | | | | | | | | | | | | | Code is taken from a combination of radv (for the more basic functions, to avoid gallivm dependencies) and radeonsi (for the new and improved derivative calculations). v2: add 0.5 offset to tex coords only after derivative calculation v3: - really only touch the first three coordinates - rebase on the removal of the 1.5 --> 0.5 offset change Reviewed-by: Bas Nieuwenhuizen <[email protected]> (v2) Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: only touch first three coordinates in si_prepare_cube_coordsNicolai Hähnle2017-01-131-12/+1
| | | | | | | | Sourcing coords_arg[4] is actually never correct, since bias is handled differently in tex_fetch_args anyway. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: remove unused si_llvm_cube_to_2d_coordsNicolai Hähnle2017-01-131-28/+0
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: restrict cube map derivative computations to the correct planeNicolai Hähnle2017-01-131-23/+107
| | | | | | | | | | | | | | | | | | | | As remarked by the comment in the original code, the old algorithm fails when (tc + deriv) points at a different cube face. Instead, simply project the derivative directly to the plane of the selected cube face. The new code is based on exactly differentiating (using the chain rule) the projection onto a plane corresponding to a fixed cube map face (which is still selected in the usual way based on the texture coordinate itself). The computations end up fairly involved, but we do save two reciprocal computations. Fixes GL45-CTS.texture_cube_map_array.sampling. v2: add 0.5 offset to tex coords only after derivative calculation v3: go back to 1.5 offset Reviewed-by: Bas Nieuwenhuizen <[email protected]> (v2) Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: communicate cube map coordinates more explicitlyNicolai Hähnle2017-01-131-33/+43
| | | | | | | v2: fix compile error that snuck in during rebase Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac/debug: move .gitignore for sid_tables.h tooGrazvydas Ignotas2017-01-131-0/+0
| | | | | | | | b838f642 "ac/debug: Move sid_tables.h generation to common code." moved sid_tables.h but forgot the corresponding .gitignore. Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* nir/gcm: Fix a typo in a commentJason Ekstrand2017-01-121-1/+1
| | | | Reported-by: Matt Turner <[email protected]>
* nir/gcm: Rework the schedule late loopJason Ekstrand2017-01-121-5/+6
| | | | | | | | | | | | This fixes a bug in code motion that occurred when the best block is the same as the schedule early block. In this case, because we're checking (lca != def->parent_instr->block) at the top of the loop, we never get to the check for loop depth so we wouldn't move it out of the loop. This commit reworks the loop to be a simple for loop up the dominator chain and we place the (lca != def->parent_instr->block) check at the end of the loop. Reviewed-by: Matt Turner <[email protected]>
* glx: Add missing glproto dependency for gallium-xlib glxChuck Atkins2017-01-121-0/+1
| | | | | | | | Cc: [email protected] Cc: Bruce Cherniak <[email protected]> Signed-of-by: Chuck Atkins <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* ac, radeonsi: automake: add missing builddir includeEmil Velikov2017-01-122-0/+2
| | | | | | | | | | The generated file is correctly stored in the builddir as of earlier commit. Yet the commit forgot to add the respective include flag thus the compiler would error out failing to find sid_tables.h Bugzila: https://bugs.freedesktop.org/show_bug.cgi?id=99389 Fixes: d1dc22eb466 "ac: automake: rework sid_tables.h generation" Signed-off-by: Emil Velikov <[email protected]>
* radv: Call NIR passes using NIR_PASS_V.Bas Nieuwenhuizen2017-01-121-17/+7
| | | | | | | | Port of faa1edeeb7bbe9321c79587e592dce812e8caa78 "anv/pipeline: Call NIR passes using NIR_PASS_V" Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* radv: Call nir_lower_constant_initializers.Bas Nieuwenhuizen2017-01-121-0/+13
| | | | | | | | | Port of c5d664f9dc2d281c74844cef36ecb9f5862a8f6a "anv/pipeline: Call nir_lower_constant_initializers" Signed-off-by: Bas Nieuwenhuizen <[email protected]> Cc: <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* radv: Only call remove_dead_variables once.Bas Nieuwenhuizen2017-01-121-3/+3
| | | | | | | | Port of 43e0b0d4b255d910616c10e3e01bfec5db469e0e "anv/pipeline: Only call remove_dead_variables once" Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* st/nine: Protect dtors with mutexAxel Davy2017-01-124-19/+64
| | | | | | | | | | | | | | | | | | | | | When the flag D3DCREATE_MULTITHREAD is set, a global mutex is used to protect nine calls. However for performance reasons, AddRef and Release didn't hold the mutex, and instead used atomics. Unfortunately at item release, the item can be destroyed, and that destruction path should be protected by a mutex (at least for some objects). Without this patch, it is possible an app thread is in a dtor while another thread is making gallium nine calls. It is possible that two threads are using the same gallium pipe, which is forbiden. The problem has been made worse with csmt, because it can cause hang, since nine_csmt_process is not threadsafe. Fixes Hitman hang, and possibly others. Signed-off-by: Axel Davy <[email protected]>
* st/nine: Flush the queue at device dtorAxel Davy2017-01-121-1/+6
| | | | | | | | Flush the queue to get refcounts right, and properly release the items, instead of throwing away all pending commands. Signed-off-by: Axel Davy <[email protected]>
* st/nine: Process pending commands on ResetAxel Davy2017-01-123-0/+5
| | | | | | | | Some nine_state_* and nine_context_* functions used for Reset() require all pending commands are flushed. Signed-off-by: Axel Davy <[email protected]>
* st/nine: Flush pending commands if needed for surface9 changesAxel Davy2017-01-122-13/+32
| | | | | | | nine_context uses NineSurface9 fields, thus we need to flush pending commands using the surface before changing the fields. Signed-off-by: Axel Davy <[email protected]>
* st/nine: Rework CreatePipeSurfaceAxel Davy2017-01-122-22/+30
| | | | | | Create both surfaces in one call. Signed-off-by: Axel Davy <[email protected]>
* st/nine: Remove duplicated checksAxel Davy2017-01-122-10/+7
| | | | | | | | There is no need to check on csmt_active before calling nine_csmt_process, because the function checks already. Signed-off-by: Axel Davy <[email protected]>
* st/nine: Don't call u_box_union_* when dirty region is emptyMasanori Kakura2017-01-123-10/+22
| | | | | | | | | | When dirty region is empty, u_box_union_* incorrectly expands the new region. This fixes broken font rendering issue in WOLF RPG Editor v2.10 games. Signed-off-by: Masanori Kakura <[email protected]> Reviewed-by: Axel Davy <[email protected]>
* winsys/etnaviv: automake: introduce Makefile.sourcesEmil Velikov2017-01-122-1/+5
| | | | | | ... and list the public header within it. Signed-off-by: Emil Velikov <[email protected]>
* etnaviv: automake: include all files in the sources listsEmil Velikov2017-01-121-1/+9
| | | | | | Note: the currently mentioned etnaviv_utils.h is typo. Signed-off-by: Emil Velikov <[email protected]>
* ac: automake: rework sid_tables.h generationEmil Velikov2017-01-122-3/+3
| | | | | | | | | | | | | Drop $(srcdir)/ prefix analogous to before the file (and rule) movement and move it outside of the NEED_RADEON_LLVM conditional. Otherwise the build may fail as below. make[3]: *** No rule to make target 'common/sid_tables.h', needed by 'distdir'. Stop. Fixes: b838f642371 "ac/debug: Move sid_tables.h generation to common code." Signed-off-by: Emil Velikov <[email protected]>
* imx: gallium driver for imx-drm scanout driverChristian Gmeiner2017-01-1212-0/+182
| | | | | | | | | | Changes from V1 -> V2: - updated Copyright - added $(top_srcdir)/src/gallium/winsys to include path (suggested by Emil) - adapted driver to new renderonly API Signed-off-by: Christian Gmeiner <[email protected]> Acked-by: Emil Velikov <[email protected]>
* etnaviv: gallium driver for Vivante GPUsThe etnaviv authors2017-01-1270-0/+14952
| | | | | | | | | | | | | | | | | This driver supports a wide range of Vivante IP cores like GC880, GC1000, GC2000 and GC3000. Changes from V1 -> V2: - added missing files to actually integrate the driver into build system. - adapted driver to new renderonly API Signed-off-by: Christian Gmeiner <[email protected]> Signed-off-by: Lucas Stach <[email protected]> Signed-off-by: Philipp Zabel <[email protected]> Signed-off-by: Rob Herring <[email protected]> Signed-off-by: Russell King <[email protected]> Signed-off-by: Wladimir J. van der Laan <[email protected]> Acked-by: Emil Velikov <[email protected]>
* gallium: add renderonly libraryChristian Gmeiner2017-01-125-0/+303
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This a very lightweight library to add basic support for renderonly GPUs. A kms gallium driver must specify how a renderonly_scanout objects gets created. Also it must provide file handles to the used kms device and the used gpu device. This could look like: struct renderonly ro = { .create_for_resource = renderonly_create_gpu_import_for_resource, .kms_fd = fd, .gpu_fd = open("/dev/dri/renderD128", O_RDWR | O_CLOEXEC) }; The renderonly_scanout object exits for two reasons: - Do any special treatment for a scanout resource like importing the GPU resource into the scanout hw. - Make it easier for a gallium driver to detect if anything special needs to be done in flush_resource(..) like a resolve to linear. A GPU gallium driver which gets used as renderonly GPU needs to be aware of the renderonly library. This library will likely break android support and hopefully will get replaced with a better solution based on gbm2. Changes from V1 -> V2: - reworked the lifecycle of renderonly object (suggested by Nicolai Hähnle) - killed the midlayer (suggested by Thierry Reding) - made the API more explicit regarding gpu and kms fd's - added some docs Signed-off-by: Christian Gmeiner <[email protected]> Acked-by: Emil Velikov <[email protected]> Tested-by: Alexandre Courbot <[email protected]>
* spirv: Handle patch decorations up-frontJason Ekstrand2017-01-121-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once again, SPIR-V is insane... It allows you to place "patch" decorations on structure members. Presumably, this is so that you can do something such as out struct S { layout(location = 0) patch vec4 thing1; layout(location = 0) vec4 thing2; } str; And have your I/O "nicely" organized. While this is a bit silly, it's allowed and well-defined so whatever. Where it really gets interesting is when you have an array of struct. SPIR-V says nothing about not allowing you to have those qualifiers on the members of a struct that's inside an array and GLSLang does this. Specifically, if you have layout(location = 0) out patch struct S { vec4 thing1; vec4 thing2; } str[2]; then GLSLang will place the "patch" decorations on the struct members. This is ridiculous there is no way that having some of them be patch and some not would be well-defined given that patch and non-patch outputs are in effectively different storage classes. This commit moves around the way we handle the "patch" decoration so that we can detect even the crazy cases and handle them. Fixes: dEQP-VK.tessellation.user_defined_io.per_patch_block_array.* Reviewed-by: Kenneth Graunke <[email protected]>
* anv: Support loader interface version 3 (patch v2)Chad Versace2017-01-121-0/+44
| | | | | | | | | | | | | | | This patch implements vk_icdNegotiateLoaderICDInterfaceVersion(), which brings us to loader interface v3. v2: - Drop the pragmas. [emil] - Advertise v3 instead of v2. Anvil supported more than I thought. [jason] - s/Surface/SurfaceKHR/ in comments. [emil] Reviewed-by: Emil Velikov <[email protected]> Cc: [email protected] Cc: Jason Ekstrand <[email protected]>
* vulkan: Add new cast macros for VkIcd typesChad Versace2017-01-125-16/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can't import the latest vk_icd.h because the new header breaks the Mesa build. This patch defines new casting macros, ICD_DEFINE_NONDISP_HANDLE_CASTS() and ICD_FROM_HANDLE(), which can handle both the old and new vk_icd.h, and will prevent the build from breaking when we update the header. In the old vk_icd.h, types were defined as: typedef struct _VkIcdFoo { ... } VkIcdFoo; Commit 6ebba1f6 in the Vulkan loader changed the above to typedef { ... } VkIcdFoo; because the old definitions violated the C and C++ specs. According to the specs, identifiers that begins with an underscore followed by an uppercase letter are reserved. (It's pedantic, I know), See the Github issue referenced below. References: https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers/issues/7 References: https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers/commit/6ebba1f630015af7a78767a15c1e74ba9b23601c Reviewed-by: Emil Velikov <[email protected]> Cc: [email protected]
* Always defer memory free in swr_resource_destroyGeorge Kyriazis2017-01-121-12/+5
| | | | | | | Defer delete on regular resources. This ensures that any work being done on the resource is completed before freeing up the resource's memory. Reviewed-by: Bruce Cherniak <[email protected]>
* nir/i965: assert first is always less than 64Juan A. Suarez Romero2017-01-121-0/+1
| | | | | | This fixes a defect detected by Coverity Scan. Reviewed-by: Iago Toral Quiroga <[email protected]>
* nvc0: enable GL 4.3 on gm107+Samuel Pitoiset2017-01-121-7/+4
| | | | | | | | | | | | Although, arb_shader_image_load_store-atomicity will most likely hang your box, I think it's now quite reasonable to enable GL 4.3 on Maxwell/Pascal GPUs. I suspect that test to be wrong because it doesn't even work on the NVIDIA blob. I have tested a bunch of benchmarks (UE4 demos) and real games like Shadow of Mordor and they all work fine. Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: use sched control codes for gm107 MP counters codeSamuel Pitoiset2017-01-121-44/+44
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Pierre Moreau <[email protected]>
* nvc0: use sched control codes for gm107 blitter shaderSamuel Pitoiset2017-01-121-6/+14
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Pierre Moreau <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* nv50/ir: use sched control codes for gm107 builtinsSamuel Pitoiset2017-01-122-40/+40
| | | | | | | | | | Yes, IMUL/IMAD require dependency barriers and we should definitely replace these instructions by XMAD but the different flags need to be figured out. Note that XMAD only supports 16-bits integers. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Pierre Moreau <[email protected]>
* nv50/ir: improve instruction pipelining on gm107Samuel Pitoiset2017-01-123-4/+1027
| | | | | | | | | | | | | | | | | | | | | This makes use of scheduling control codes which are very useful for improving the instruction pipelining. This patch will increase performance on Maxwell GPUs by, at least, x1.5 up to x3.5 for some benchmarks. Although this has been fairly well tested, I would not be suprised if someone hit a corner case somewhere. That way, the scheduler is enabled by default but it can be deactivated by using NV50_PROG_SCHED=0. Thanks to Scott Gray for the reverse engineering work available from https://github.com/NervanaSystems/maxas/wiki/Control-Codes. Signed-off-by: Samuel Pitoiset <[email protected]> Acked-by: Pierre Moreau <[email protected]> Tested-by: Alexandre Courbot <[email protected]> Tested-by: Jan Vesely <[email protected]>
* nv50/ir: do not insert texture barriers on gm107Samuel Pitoiset2017-01-121-1/+2
| | | | | | | | | | It's actually useless to insert those texture barriers post RA because the current control code (ie. st 0x0) will wait for all dependencies before issuing a new instruction. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Pierre Moreau <[email protected]>
* i965/gen7: expose OpenGL 4.2 on Haswell when supportedJuan A. Suarez Romero2017-01-122-2/+2
| | | | | | | | | GL_ARB_vertex_attrib_64bit was the last piece missing. v2: update docs (Jordan) Signed-off-by: Juan A. Suarez Romero <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: enable ARB_shader_precision to HSW+Samuel Iglesias Gonsálvez2017-01-121-1/+1
| | | | | | | | v2: update docs (Jordan) Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Juan A. Suarez Romero <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: unify the code to enable of ARB_gpu_shader_fp64 and ↵Samuel Iglesias Gonsálvez2017-01-121-7/+2
| | | | | | | | | ARB_vertex_attrib_64bit for HSW+ Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Juan A. Suarez Romero <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Enable ARB_vertex_attrib_64bit for HaswellAlejandro Piñeiro2017-01-121-1/+3
| | | | | | | | v2: update docs (Jordan) Signed-off-by: Alejandro Piñeiro <[email protected]> Signed-off-by: Juan A. Suarez Romero <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: check for dual slot attributes on any genJuan A. Suarez Romero2017-01-121-2/+1
| | | | | | | Those not supporting 64 bit input vertex attributes will have the dual_slot value as false. Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4: emit correctly load_inputs for 64bit dataJuan A. Suarez Romero2017-01-121-6/+15
| | | | | | | | | | | | | For dvec3 and dvec4 types, a single GRF do not have enough space to allocate two inputs from two different vertices (SIMD4x2). So the GRF only contains first two components for the two vertices, and the next GRF has the remaining components. We want to put all the components for the same vertex in the same register. Thus, we do a shuffle to reorder the data. Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4: take into account doubles when creating attribute mappingAlejandro Piñeiro2017-01-121-4/+9
| | | | | | | | | | Doubles needs more that one slot per attribute. So when filling the attribute_map we check if it is a double in order to allocate one extra register. Signed-off-by: Alejandro Piñeiro <[email protected]> Signed-off-by: Juan A. Suarez Romero <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4/nir: vec4 also needs to remap vs attributesAlejandro Piñeiro2017-01-121-10/+22
| | | | | | | | | | | | | Doubles need extra space, so we would need to do a remapping for vec4 too in order to take that into account. We reuse the already existing remap_vs_attrs, but passing is_scalar, so they could remap accordingly. v2: code-format remap_vs_attrs_params initialization (Matt) Signed-off-by: Alejandro Piñeiro <[email protected]> Signed-off-by: Juan A. Suarez Romero <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4: use attribute slots for first non payload GRFAlejandro Piñeiro2017-01-121-1/+1
| | | | | | | | | | | | | | | | | As part of the payload setup, setup_attributes is called with the first GRF that can be used for the attributes (first ones are used for uniforms for example) and returns the first GRF that is not part of the payload. Before this patch, it adds directly the number of attributes. But as with 64-bit attributes can consume more than one slot, that is not valid anymore. This patch change the addition to use the number of slots consumed. gen >= 8 would not be affected, as they use the scalar mode. For that case, the vs configuration is done at fs_visitor::assign_vs_urb_setup. v2: add explanation in commit log (Jordan) Reviewed-by: Jordan Justen <[email protected]>
* i965: downsize *64*PASSTHRU formats to equivalent *32*FLOAT formats on gen < 8Alejandro Piñeiro2017-01-121-30/+139
| | | | | | | | | | | | gen < 8 doesn't support *64*PASSTHRU formats when emitting vertices. So in order to provide the equivalent functionality, we need to downsize the format to equivalent *32*FLOAT, and in some cases (R64G64B64 and R64G64B64A64) submit two 3DSTATE_VERTEX_ELEMENTS for each vertex element. Signed-off-by: Alejandro Piñeiro <[email protected]> Signed-off-by: Juan A. Suarez Romero <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: return PASSTHRU surface types also on gen7Alejandro Piñeiro2017-01-121-2/+6
| | | | | | | | Although gen7 doesn't include surface types as a valid conversion format, we return it, as it reflects what we want to achieve, even if we need to workaround it on gen < 8. Reviewed-by: Jordan Justen <[email protected]>
* main/buffers: take into account FRONT_AND_BACK on ReadBufferAlejandro Piñeiro2017-01-121-0/+2
| | | | | | | | | | | | | | | | | | From OpenGL 3.1 spec, section 4.3.1 "Reading Pixels", page 190 (203 PDF) "When READ FRAMEBUFFER BINDING is zero, i.e. the default framebuffer, src must be one of the values listed in table 4.4, including NONE . FRONT_AND_BACK , FRONT , and LEFT refer to the front left buffer." There is an equivalent text on OpenGL 4.5 spec, section 18.2.1 "Selecting Buffers for Reading", page 502 (524 PDF), so the behaviour is still the same. Part of the fix for: GL45-CTS.direct_state_access.framebuffers_draw_read_buffers_errors Reviewed-by: Anuj Phogat <[email protected]>