aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
...
* gallium/radeon: Add libamd_common.a to TARGET_LIB_DEPS also for r600Michel Dänzer2017-02-282-1/+5
| | | | | | | | | | | | | | | | | | Fixes build failure with --enable-opencl --enable-xvmc: make[4]: Entering directory '/home/daenzer/src/mesa-git/mesa/build-amd64/src/gallium/targets/xvmc' CXXLD libXvMCgallium.la ../../../../src/gallium/drivers/r600/.libs/libr600.a(evergreen_compute.o): In function `evergreen_create_compute_state': /home/daenzer/src/mesa-git/mesa/build-amd64/src/gallium/drivers/r600/../../../../../src/gallium/drivers/r600/evergreen_compute.c:254: undefined reference to `ac_elf_read' ../../../../src/gallium/drivers/r600/.libs/libr600.a(evergreen_compute.o): In function `r600_shader_binary_read_config': /home/daenzer/src/mesa-git/mesa/build-amd64/src/gallium/drivers/r600/../../../../../src/gallium/drivers/r600/evergreen_compute.c:189: undefined reference to `ac_shader_binary_config_start' /home/daenzer/src/mesa-git/mesa/build-amd64/src/gallium/drivers/r600/../../../../../src/gallium/drivers/r600/evergreen_compute.c:189: undefined reference to `ac_shader_binary_config_start' collect2: error: ld returned 1 exit status Makefile:760: recipe for target 'libXvMCgallium.la' failed Fixes: dc4c551a345d ("radeon/ac: switch from radeon_elf_read() to ac_elf_read()") Acked-by: Timothy Arceri <[email protected]> Tested-by: Timothy Arceri <[email protected]>
* radeon: remove unused radeon_elf_util.{c,h}Timothy Arceri2017-02-287-256/+0
| | | | | | We now use the shared code in AMD common instead. Reviewed-by: Marek Olšák <[email protected]>
* radeon/ac: switch to ac_shader_binary_config_start()Timothy Arceri2017-02-282-3/+4
| | | | | | | | For radeonsi we could probably switch to ac_shader_binary_read_config(). However the functions have diverged so just share this helper for now. Reviewed-by: Marek Olšák <[email protected]>
* radeon/ac: switch from radeon_elf_read() to ac_elf_read()Timothy Arceri2017-02-284-6/+4
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeon/ac: switch from radeon_shader_binary to ac_shader_binaryTimothy Arceri2017-02-2812-73/+36
| | | | Reviewed-by: Marek Olšák <[email protected]>
* svga: fix MSVC build error after PIPE_CAP_USER_INDEX_BUFFERS removalBrian Paul2017-02-241-1/+1
| | | | | | | Need to specify the zero for the struct initializer. My earlier test of the patch series was with MinGW, not MSVC. Trivial.
* vc4: Lazily emit our FS/VS input loads.Eric Anholt2017-02-244-75/+93
| | | | | | | | | | | | | | | | | | | This reduces register pressure in both types of shaders, by reordering the input loads from the var->data.driver_location order to whatever order they appear first in the NIR shader. These instructions aren't reorderable at our QIR scheduling level because the FS takes two in lockstep to do an interpolation, and the VS takes multiple read instructions in a row to get a whole vec4-level attribute read. shader-db impact: total instructions in shared programs: 76666 -> 76590 (-0.10%) instructions in affected programs: 42945 -> 42869 (-0.18%) total max temps in shared programs: 9395 -> 9208 (-1.99%) max temps in affected programs: 2951 -> 2764 (-6.34%) Some programs get their max temps hurt, depending on the order that the load_input intrinsics appear, because we end up being unable to copy propagate an older VPM read into its only use.
* vc4: Refactor the load_input code out of the intrinsic code.Eric Anholt2017-02-241-25/+42
| | | | It's going gain most of ntq_setup_inputs(), so simplify it first.
* vc4: Track the last block we emitted at the top level.Eric Anholt2017-02-243-5/+10
| | | | | This will be used for delaying our VPM reads (which must be unconditional) until just before they're used.
* vc4: Emit max number of temps in the shader-db output.Eric Anholt2017-02-241-0/+23
| | | | | | | We need to be paying attention to optimization's impact on this -- even if we reduce instruction count, increasing max temps in general is likely to cause us to fail to register allocate on some shaders, which means that those won't run at all.
* radeonsi: fix broken tessellation on Carrizo and StoneyMarek Olšák2017-02-251-1/+3
| | | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99850 Cc: 13.0 17.0 <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
* trace: remove pipe_resource wrappingMarek Olšák2017-02-257-260/+40
| | | | | | | | | | Not needed. ddebug does the same thing. The limitation is that drivers can only use pipe_resource::screen through pipe_resource_reference. This unbreaks trace, because pipe_context uploaders aren't wrapped, so trace doesn't understand buffers returned by them. Reviewed-by: Brian Paul <[email protected]>
* gallium: remove PIPE_CAP_USER_INDEX_BUFFERSMarek Olšák2017-02-2515-15/+0
| | | | | | | | all drivers support it Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]> (VMware driver only)
* svga: implement user index buffersMarek Olšák2017-02-252-2/+13
| | | | | Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]> (VMware driver only)
* freedreno: add support for user index buffersMarek Olšák2017-02-252-1/+13
| | | | Reviewed-by: Brian Paul <[email protected]>
* etnaviv: add support for user index buffersMarek Olšák2017-02-252-1/+13
| | | | Reviewed-by: Brian Paul <[email protected]>
* vc4: automake: add the kernel/README to the tarballEmil Velikov2017-02-241-0/+2
| | | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Andreas Boll <[email protected]>
* softpipe: enable clear_texture with util_clear_textureLars Hamre2017-02-242-1/+4
| | | | | | | | Passes all corresponding piglit tests. Signed-off-by: Lars Hamre <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* llvmpipe: enable clear_texture with util_clear_textureLars Hamre2017-02-242-2/+4
| | | | | | | | Passes all corresponding piglit tests. Signed-off-by: Lars Hamre <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* swr: fix index buffers with non-zero indicesGeorge Kyriazis2017-02-235-6/+62
| | | | | | | | | | | | | Fix issue with index buffers that do not contain a 0 index. 0 index can be a non-valid index if the (copied) vertex buffers are a subset of the user's (which happens because we only copy the range between min & max). Core will use an index passed in from the driver to replace invalid indices. Only do this for calls that contain non-zero indices, to minimize performance Reviewed-by: Bruce Cherniak <[email protected]> cost.
* swr: add fetch shader cacheGeorge Kyriazis2017-02-236-15/+50
| | | | | | | | | For now, the cache key is all of FETCH_COMPILE_STATE. Use new/delete for swr_vertex_element_state, since we have to call the constructors/destructors of the struct elements. Reviewed-by: Bruce Cherniak <[email protected]>
* radeon: fix r600 builds when old version of llvm is presentTimothy Arceri2017-02-231-2/+2
| | | | Reviewed-by: Edward O'Callaghan <[email protected]>
* r600/radeonsi: enable glsl/tgsi on-disk cacheTimothy Arceri2017-02-232-0/+46
| | | | | | | | | | For gpu generations that use LLVM we create a timestamp string containing both the LLVM and Mesa build times, otherwise we just use the Mesa build time. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug/rbug/trace: add get_disk_shader_cache() to pass-throughsTimothy Arceri2017-02-233-0/+39
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/u_queue: isolate util_queue_fence implementationMarek Olšák2017-02-222-4/+4
| | | | | | it's cleaner this way. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix issues with monolithic shadersMarek Olšák2017-02-211-1/+2
| | | | | | | | | | | | | | | | R600_DEBUG=mono has had no effect since: commit 1fabb297177069e95ec1bb7053acb32f8ec3e092 Author: Marek Olšák <[email protected]> Date: Tue Feb 14 22:08:32 2017 +0100 radeonsi: have separate LS and ES main shader parts in the shader selector Also, this assertion was failing: si_state_shaders.c:1307: si_shader_select_with_key: Assertion `!shader->is_optimized' failed. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set no-signed-zeros-fp-mathMarek Olšák2017-02-212-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recommended by Matt Arsenault. 46757 shaders in 28742 tests Totals: SGPRS: 2068851 -> 2066907 (-0.09 %) VGPRS: 1604056 -> 1602676 (-0.09 %) Spilled SGPRs: 1402 -> 1382 (-1.43 %) Spilled VGPRs: 113 -> 113 (0.00 %) Private memory VGPRs: 1332 -> 1332 (0.00 %) Scratch size: 3224 -> 3188 (-1.12 %) dwords per thread Code Size: 58815520 -> 58716788 (-0.17 %) bytes LDS: 1162 -> 1162 (0.00 %) blocks Max Waves: 354616 -> 354905 (0.08 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 786452 -> 784508 (-0.25 %) VGPRS: 530000 -> 528620 (-0.26 %) Spilled SGPRs: 958 -> 938 (-2.09 %) Spilled VGPRs: 85 -> 85 (0.00 %) Private memory VGPRs: 636 -> 636 (0.00 %) Scratch size: 1880 -> 1844 (-1.91 %) dwords per thread Code Size: 26349936 -> 26251204 (-0.37 %) bytes LDS: 304 -> 304 (0.00 %) blocks Max Waves: 108962 -> 109251 (0.27 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: add no-signed-zeros-fp-math option to lp_create_builder (v2)Marek Olšák2017-02-211-1/+5
| | | | | | v2: define lp_float_mode Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: skip TESSINNER/OUTER offchip stores if TES doesn't read themMarek Olšák2017-02-213-15/+77
| | | | | | | | | | We were unconditionally storing these outputs, sometimes even one component at a time, but apps never read them in TES. Move the TESSINNER/OUTER buffer stores into the TCS epilog where we can easily disable them on demand. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: skip LDS stores in TCS if there are no LDS output readsMarek Olšák2017-02-211-1/+16
| | | | | | | | | | | This removes a lot of useless LDS stores. A few games read TESSINNER/OUTER, but not any other outputs. Most games don't read any outputs. The only app doing LDS output reads is UE4 Lightsroom Interior. Reviewed-by: Nicolai Hähnle <[email protected]>
* etnaviv: remove number of pixel pipes validationChristian Gmeiner2017-02-211-10/+0
| | | | | | | | | | | | | This validation was added before the etnaviv drm driver landed in the linux kernel. Due some pre-merge API changes we had to fix-up this value but with a mainline kernel this is not a problem anymore. Lets remove that validation which also gets rid of problem caught by Coverity, reported to me by imirkin. Cc: "17.0" <[email protected]> Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* etnaviv: move pctx initialisation to avoid a null dereferenceChristian Gmeiner2017-02-211-6/+6
| | | | | | | | | | | In case ctx->stream == NULL the fail label gets executed where pctx gets dereferenced - too bad pctx is NULL in that case. Caught by Coverity, reported to me by imirkin. Cc: "17.0" <[email protected]> Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* etnaviv: add missing fallthrough annotationChristian Gmeiner2017-02-211-0/+1
| | | | | | | Caught by Coverity, reported to me by imirkin. Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* radeonsi: fix UINT/SINT clamping for 10-bit formats on <= CIKNicolai Hähnle2017-02-216-19/+43
| | | | | | | | | | The same PS epilog workaround as for 8-bit integer formats is required, since the CB doesn't do clamping. Fixes GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels*. Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: handle MultiDrawIndirect in si_get_draw_start_countNicolai Hähnle2017-02-211-7/+53
| | | | | | | | | | | | | | | | | | | | | Also handle the GL_ARB_indirect_parameters case where the count itself is in a buffer. Use transfers rather than mapping the buffers directly. This anticipates the possibility that the buffers are sparse (once ARB_sparse_buffer is implemented), in which case they cannot be mapped directly. Fixes GL45-CTS.gtf43.GL3Tests.multi_draw_indirect.multi_draw_indirect_type on <= CIK. v2: - unmap the indirect buffer correctly - handle the corner case where we have indirect draws, but all of them have count 0. Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]> Acked-by: Edward O'Callaghan <[email protected]>
* nvc0: use PascalB for most Pascal boardsBen Skeggs2017-02-212-1/+9
| | | | Signed-off-by: Ben Skeggs <[email protected]>
* r300g: only allow byteswapped formats on big endianGrazvydas Ignotas2017-02-211-0/+5
| | | | | | | | | They cause regressions on little endian. Fixes: 172bfdaa9e ("r300g: add support for PIPE_FORMAT_x8R8G8B8_*") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98869 Signed-off-by: Grazvydas Ignotas <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* android: radeonsi: fix sid_table.h generated header include pathMauro Rossi2017-02-201-1/+3
| | | | | | | | | | | | | | | generated-sources-dir-for macro replaces intermediates-dir-for and LOCAL_MODULE_CLASS is defined as required by new macro, in order to avoid the following building error: external/mesa/src/gallium/drivers/radeonsi/si_debug.c:29:10: fatal error: 'sid_tables.h' file not found ^ 1 error generated. Fixes: 730574c58e8 ("android: ac/debug: move sid_tables.h generation and IB decode to amd/common") Acked-by: Nicolai Hähnle <[email protected]> Acked-by: Emil Velikov <[email protected]>
* gallium/u_index_modify: don't add PIPE_TRANSFER_UNSYNCHRONIZED unconditionallyMarek Olšák2017-02-193-3/+5
| | | | | | | | It's OK for r300g (because r300g can't write to buffers via the GPU), but not later hardware. This issue was spotted randomly. Cc: [email protected] Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix UNSIGNED_BYTE index buffer fallback with non-zero start (v2)Marek Olšák2017-02-191-2/+2
| | | | | | | | | | | start can only be non-zero with MultiDrawElements, which is unlikely to occur with UNSIGNED_BYTE indices. v2: Also fix the util_shorten_ubyte_elts_to_userptr call. Tested with the new piglit. Cc: [email protected] Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: remove TGSI_OPCODE_CLAMPMarek Olšák2017-02-185-18/+3
| | | | | | | Not used and not widely supported. Use MIN+MAX instead. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: stop using TGSI_OPCODE_CLAMP by moving it amd/commonMarek Olšák2017-02-183-24/+6
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use R600_RESOURCE_FLAG_UNMAPPABLE where it's desirableMarek Olšák2017-02-185-26/+50
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add R600_RESOURCE_FLAG_UNMAPPABLEMarek Olšák2017-02-182-2/+3
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: change r600_aligned_buffer_create to take flags, not bindMarek Olšák2017-02-182-4/+4
| | | | | | All call sites set bind = 0. The next commit will use this. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: upload constants into VRAM instead of GTTMarek Olšák2017-02-184-10/+18
| | | | | | | | This lowers lgkm wait cycles by 30% on VI and normal conditions. The might be a measurable improvement when CE is disabled (radeon) or under L2 thrashing. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: use TCC line size as alignment in other placesMarek Olšák2017-02-184-5/+9
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use a clever alignment for index buffer uploadsMarek Olšák2017-02-181-4/+7
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use a clever alignment for descriptor uploadsMarek Olšák2017-02-181-4/+7
| | | | | | | Non-VBO descriptors won't be smaller than the cache line, so simply use the cache line size. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use a clever alignment for constant buffer uploadsMarek Olšák2017-02-183-1/+19
| | | | | | This results in a very tiny decrease in lgkm wait cycles. Reviewed-by: Nicolai Hähnle <[email protected]>