path: root/src/gallium/drivers/radeon
Commit message (author, age, files changed, lines -removed/+added)
* android: use LOCAL_SHARED_LIBRARIES over TARGET_OUT_HEADERS (Emil Velikov, 2015-04-22, 1 file, -2/+1)
  ... to manage the LIBDRM*_CFLAGS. The former is the approach recommended by the Android build system developers, while the latter has been deprecated for quite some time.
  Cc: "10.4 10.5" <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
* radeonsi: add a debug option to compile shaders when they're created (Marek Olšák, 2015-04-16, 2 files, -0/+2)
  Tested-by: Tom Stellard <[email protected]>
* radeon/llvm: Improve codegen for KILL_IF (Tom Stellard, 2015-04-14, 1 file, -0/+29)
  Rather than emitting one kill instruction per component of KILL_IF's src reg, we now OR the components of the src register together and use the result as the condition for just one kill instruction.

  shader-db stats (bonaire):
  979 shaders

  Totals:
    SGPRS: 34872 -> 34848 (-0.07 %)
    VGPRS: 20696 -> 20676 (-0.10 %)
    Code Size: 749032 -> 748452 (-0.08 %) bytes
    LDS: 11 -> 11 (0.00 %) blocks
    Scratch: 12288 -> 12288 (0.00 %) bytes per wave

  Totals from affected shaders:
    SGPRS: 1184 -> 1160 (-2.03 %)
    VGPRS: 600 -> 580 (-3.33 %)
    Code Size: 13200 -> 12620 (-4.39 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Scratch: 0 -> 0 (0.00 %) bytes per wave

  Increases:
    SGPRS: 2 (0.00 %)
    VGPRS: 0 (0.00 %)
    Code Size: 0 (0.00 %)
    LDS: 0 (0.00 %)
    Scratch: 0 (0.00 %)

  Decreases:
    SGPRS: 5 (0.01 %)
    VGPRS: 5 (0.01 %)
    Code Size: 25 (0.03 %)
    LDS: 0 (0.00 %)
    Scratch: 0 (0.00 %)

  *** BY PERCENTAGE ***

  Max Increase:
    SGPRS: 32 -> 40 (25.00 %)
    VGPRS: 0 -> 0 (0.00 %)
    Code Size: 0 -> 0 (0.00 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Scratch: 0 -> 0 (0.00 %) bytes per wave

  Max Decrease:
    SGPRS: 32 -> 24 (-25.00 %)
    VGPRS: 16 -> 12 (-25.00 %)
    Code Size: 116 -> 96 (-17.24 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Scratch: 0 -> 0 (0.00 %) bytes per wave

  *** BY UNIT ***

  Max Increase:
    SGPRS: 64 -> 72 (12.50 %)
    VGPRS: 0 -> 0 (0.00 %)
    Code Size: 0 -> 0 (0.00 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Scratch: 0 -> 0 (0.00 %) bytes per wave

  Max Decrease:
    SGPRS: 32 -> 24 (-25.00 %)
    VGPRS: 16 -> 12 (-25.00 %)
    Code Size: 424 -> 356 (-16.04 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Scratch: 0 -> 0 (0.00 %) bytes per wave

  Reviewed-by: Marek Olšák <[email protected]>
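  To make the change concrete, here is a scalar C model (not the LLVM IR the driver actually emits) of what KILL_IF decides, assuming the usual TGSI rule that the fragment is discarded if any source component is negative. The function names are hypothetical; the point is only that per-component kills and a single OR-ed condition reach the same verdict, so one kill instruction suffices.

```c
#include <stdbool.h>

/* Hypothetical scalar model of TGSI KILL_IF: the fragment is discarded
 * when any of the four source components is negative. */

/* Before: one conditional kill per component. */
static bool kill_if_per_component(const float src[4])
{
    bool killed = false;
    for (int i = 0; i < 4; i++) {
        if (src[i] < 0.0f)      /* kill instruction #i fires */
            killed = true;
    }
    return killed;
}

/* After: OR the four comparisons into one condition that drives a single
 * kill. Bitwise | (not ||) evaluates all four tests, like straight-line
 * GPU code would. */
static bool kill_if_combined(const float src[4])
{
    bool cond = (src[0] < 0.0f) | (src[1] < 0.0f) |
                (src[2] < 0.0f) | (src[3] < 0.0f);
    return cond;                /* the only kill instruction */
}
```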
* radeon/llvm: Run LLVM's instruction combining pass (Tom Stellard, 2015-04-14, 1 file, -0/+1)
  This should improve code quality in general and will help with some future changes to how we emit kill instructions.

  shader-db shows a few regressions, but these don't seem to be the result of deficiencies in instcombine. They're mostly caused by the scheduler making different decisions than before.

  shader-db stats (bonaire):
  979 shaders

  Totals:
    SGPRS: 35056 -> 34872 (-0.52 %)
    VGPRS: 20624 -> 20696 (0.35 %)
    Code Size: 764372 -> 749032 (-2.01 %) bytes
    LDS: 11 -> 11 (0.00 %) blocks
    Scratch: 12288 -> 12288 (0.00 %) bytes per wave

  Totals from affected shaders:
    SGPRS: 13264 -> 13072 (-1.45 %)
    VGPRS: 8248 -> 8316 (0.82 %)
    Code Size: 486320 -> 470992 (-3.15 %) bytes
    LDS: 11 -> 11 (0.00 %) blocks
    Scratch: 11264 -> 11264 (0.00 %) bytes per wave

  Increases:
    SGPRS: 6 (0.01 %)
    VGPRS: 20 (0.02 %)
    Code Size: 14 (0.01 %)
    LDS: 0 (0.00 %)
    Scratch: 0 (0.00 %)

  Decreases:
    SGPRS: 32 (0.03 %)
    VGPRS: 8 (0.01 %)
    Code Size: 244 (0.25 %)
    LDS: 0 (0.00 %)
    Scratch: 0 (0.00 %)

  *** BY PERCENTAGE ***

  Max Increase:
    SGPRS: 32 -> 48 (50.00 %)
    VGPRS: 12 -> 20 (66.67 %)
    Code Size: 216 -> 224 (3.70 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Scratch: 0 -> 0 (0.00 %) bytes per wave

  Max Decrease:
    SGPRS: 40 -> 32 (-20.00 %)
    VGPRS: 16 -> 12 (-25.00 %)
    Code Size: 368 -> 280 (-23.91 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Scratch: 0 -> 0 (0.00 %) bytes per wave

  *** BY UNIT ***

  Max Increase:
    SGPRS: 32 -> 48 (50.00 %)
    VGPRS: 28 -> 36 (28.57 %)
    Code Size: 39320 -> 40132 (2.07 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Scratch: 0 -> 0 (0.00 %) bytes per wave

  Max Decrease:
    SGPRS: 72 -> 64 (-11.11 %)
    VGPRS: 48 -> 40 (-16.67 %)
    Code Size: 6272 -> 5852 (-6.70 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Scratch: 0 -> 0 (0.00 %) bytes per wave

  Reviewed-by: Marek Olšák <[email protected]>
* radeon/vce: implement video usability information support (Leo Liu, 2015-03-31, 3 files, -0/+59)
  This will help with encoding VUI into the bitstream.
  v2: make backward compatible
  Signed-off-by: Leo Liu <[email protected]>
  Reviewed-by: Christian König <[email protected]>
* gallium: implement get_device_vendor() for existing drivers (Giuseppe Bilotta, 2015-03-23, 1 file, -0/+6)
  The only hackish ones are llvmpipe and softpipe, which currently return the same string as for get_vendor(), while ideally they should return the CPU vendor.
  Signed-off-by: Giuseppe Bilotta <[email protected]>
  Reviewed-by: Tom Stellard <[email protected]>
* radeonsi: increase coords array size for radeon_llvm_emit_prepare_cube_coords (Marek Olšák, 2015-03-18, 1 file, -1/+1)
  radeon_llvm_emit_prepare_cube_coords uses coords[4] in some cases (TXB2 etc.)
  Discovered by Coverity. Reported by Ilia Mirkin.
  Cc: 10.5 10.4 <[email protected]>
  Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: implement TGSI_OPCODE_BFI (v2) (Marek Olšák, 2015-03-16, 1 file, -0/+34)
  v2: Don't use the intrinsics; the shader backend can recognize these patterns and generate optimal code automatically.
  Reviewed-by: Tom Stellard <[email protected]>
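  For reference, a C sketch of what BFI computes, assuming TGSI BFI follows GLSL bitfieldInsert(base, insert, offset, bits). The and/or/not select at the end is the kind of plain expression the commit says the backend can pattern-match. The function name is hypothetical and this is not the driver's code.

```c
#include <stdint.h>

/* Bitfield insert, modeled on GLSL bitfieldInsert(base, insert, offset, bits).
 * Assumes offset + bits <= 32, as GLSL requires for defined results. */
static uint32_t bitfield_insert(uint32_t base, uint32_t insert,
                                unsigned offset, unsigned bits)
{
    if (bits == 0)
        return base;                      /* empty field: base unchanged */

    /* (1u << 32) is undefined in C, so special-case a full-width field. */
    uint32_t width_mask = bits == 32 ? 0xffffffffu : (1u << bits) - 1u;
    uint32_t mask = width_mask << offset; /* offset <= 31 here */

    /* Select pattern: bits of "insert" where mask is set, "base" elsewhere. */
    return ((insert << offset) & mask) | (base & ~mask);
}
```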
* radeonsi: add basic code for overrasterization (Marek Olšák, 2015-03-16, 3 files, -14/+26)
  This will be used for line and polygon smoothing. This is GCN-only even though it's in shared code.
  Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: add support for easy opcodes from ARB_gpu_shader5 (Marek Olšák, 2015-03-16, 1 file, -0/+8)
  I have to use the BFE intrinsics, because BFE is one of the most complex instructions that can't be matched easily. BFE has 3 conditional branches and one of them is quite big. In the isel DAG, lowered BFE has 27 nodes (including leaves).
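  A C sketch of bitfield extract (GLSL bitfieldExtract semantics, assumed here to match TGSI UBFE/IBFE) hints at why a straight-line pattern is hard to match: the empty and full-width fields need special handling, and the signed form needs sign extension. The function names are hypothetical, and the branch structure is not the backend's actual lowering.

```c
#include <stdint.h>

/* Unsigned bitfield extract. Assumes offset + bits <= 32. */
static uint32_t ubfe(uint32_t value, unsigned offset, unsigned bits)
{
    if (bits == 0)
        return 0;                      /* edge case: empty field */
    if (bits == 32)
        return value;                  /* edge case: whole word (offset is 0) */
    return (value >> offset) & ((1u << bits) - 1u);
}

/* Signed bitfield extract: the field's top bit is sign-extended. */
static int32_t ibfe(int32_t value, unsigned offset, unsigned bits)
{
    if (bits == 0)
        return 0;
    if (bits == 32)
        return value;
    uint32_t field = ((uint32_t)value >> offset) & ((1u << bits) - 1u);
    int64_t result = field;            /* 64-bit to keep the math well defined */
    if (field & (1u << (bits - 1)))    /* sign bit of the field is set */
        result -= (int64_t)1 << bits;  /* sign-extend */
    return (int32_t)result;
}
```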
* radeonsi: implement bit-finding opcodes from ARB_gpu_shader5 (Marek Olšák, 2015-03-16, 1 file, -0/+92)
  Reviewed-by: Glenn Kennard <[email protected]>
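  The ARB_gpu_shader5 bit-finding functions are GLSL findLSB and findMSB; this sketch assumes the TGSI opcodes (LSB, UMSB, IMSB) mirror them. A portable C reference model, with hypothetical names and simple loops instead of compiler builtins, captures the edge cases a test would check: zero returns -1, and the signed variant looks for the highest bit that differs from the sign bit, so both 0 and -1 return -1.

```c
#include <stdint.h>

/* findLSB: index of the lowest set bit, -1 for 0. */
static int find_lsb(uint32_t v)
{
    if (v == 0)
        return -1;
    int i = 0;
    while (!(v & 1u)) {
        v >>= 1;
        i++;
    }
    return i;
}

/* findMSB(uint): index of the highest set bit, -1 for 0. */
static int find_msb_u(uint32_t v)
{
    int i = -1;
    while (v) {
        v >>= 1;
        i++;
    }
    return i;
}

/* findMSB(int): for negative inputs, the highest *clear* bit, so that
 * both 0 and -1 return -1. */
static int find_msb_i(int32_t v)
{
    uint32_t u = (uint32_t)v;
    if (v < 0)
        u = ~u;
    return find_msb_u(u);
}
```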
* radeonsi: add support for SQRT (Marek Olšák, 2015-03-16, 1 file, -0/+2)
  Reviewed-by: Tom Stellard <[email protected]>
  Reviewed-by: Glenn Kennard <[email protected]>
* radeonsi: add support for FMA (Marek Olšák, 2015-03-16, 1 file, -0/+2)
  Reviewed-by: Tom Stellard <[email protected]>
  Reviewed-by: Glenn Kennard <[email protected]>
* gallium/radeon: don't use LLVMReadOnlyAttribute for ALU (Marek Olšák, 2015-03-16, 1 file, -16/+9)
  None of the instructions use a pointer argument. (+ small cosmetic changes)
  Reviewed-by: Tom Stellard <[email protected]>
* r600g,radeonsi: fix streamout after pipeline stats have been used (Marek Olšák, 2015-02-24, 2 files, -13/+1)
  EVENT_TYPE_PIPELINESTAT_STOP disables streamout queries too. Luckily, pipeline stats are enabled by default, so we don't even have to emit EVENT_TYPE_PIPELINESTAT_START.
  Tested on Hawaii, Bonaire, Redwood, RV730.
  Reviewed-by: Michel Dänzer <[email protected]>
* gallium/radeon: fix an uninitialized-variable warning (Marek Olšák, 2015-02-20, 1 file, -1/+1)
* Revert "radeon/llvm: enable unsafe math for graphics shaders" (Michel Dänzer, 2015-02-18, 1 file, -4/+0)
  This reverts commit 0e9cdedd2e3943bdb7f3543a3508b883b167e427.
  It caused the grass to disappear in The Talos Principle.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89069
  Cc: "10.5 10.4" <[email protected]>
  Reviewed-by: Tom Stellard <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* r600g,radeonsi: implement GL_AMD_pinned_memory (Marek Olšák, 2015-02-17, 3 files, -4/+48)
  v2: update release notes
  Reviewed-by: Christian König <[email protected]>
* radeonsi: initialize TC_L2_dirty to false after buffer allocation (Marek Olšák, 2015-02-17, 1 file, -0/+1)
  I forgot to do this, though "true" should have no effect on correctness.
  Reviewed-by: Michel Dänzer <[email protected]>
* r600g,radeonsi: use fences to implement PIPE_QUERY_GPU_FINISHED (Marek Olšák, 2015-02-17, 1 file, -9/+13)
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89014
  Reviewed-by: Michel Dänzer <[email protected]>
* r600g,radeonsi: demote TIMESTAMP_DISJOINT query to be a software query (Marek Olšák, 2015-02-17, 1 file, -14/+13)
  The query result is always constant.
  Reviewed-by: Michel Dänzer <[email protected]>
* r600g,radeonsi: don't append to streamout buffers that haven't been used yet (Marek Olšák, 2015-02-04, 2 files, -1/+4)
  The FILLED_SIZE counter is uninitialized at the beginning, so we can't use it. Instead, use offset = 0, which is what we always do when not appending.
  This unexpectedly fixes spec/ARB_texture_multisample/sample-position/*. Yes, the test does use transform feedback.
  Cc: 10.3 10.4 <[email protected]>
  Reviewed-by: Glenn Kennard <[email protected]>
  Reviewed-by: Michel Dänzer <[email protected]>
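  The offset rule described above can be modeled in a few lines of C. This is a hypothetical CPU-side model: in the driver, FILLED_SIZE lives in GPU memory and is consumed by the command processor, and the struct and function names here are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical per-target state. */
struct so_target {
    bool filled_size_valid;   /* set once the GPU has written FILLED_SIZE */
    unsigned filled_size;     /* garbage until filled_size_valid is true */
};

/* Pick the starting offset when binding a streamout target. */
static unsigned so_start_offset(const struct so_target *t, bool append)
{
    if (append && t->filled_size_valid)
        return t->filled_size;  /* resume after previously written data */
    return 0;                   /* first use, or not appending */
}
```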
* tgsi: add tgsi_get_processor_type helper from radeon (Marek Olšák, 2015-02-04, 1 file, -11/+0)
  Reviewed-by: Glenn Kennard <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
* dir-locals.el: Don't set variables for non-programming modes (Neil Roberts, 2015-02-02, 1 file, -1/+1)
  This limits the style changes to modes inherited from prog-mode. The main reason to do this is to avoid setting fill-column for people using Emacs to edit commit messages, because 78 characters is too many to make it wrap properly in git log.
  Note that makefile-mode also inherits from prog-mode, so the fill column should continue to apply there.
  v2: Apply to all the .dir-locals.el files, not just the one in the root directory.
  Acked-by: Michel Dänzer <[email protected]>
* r600g,radeonsi: Fix calculation of IR target cap string buffer size (Michel Dänzer, 2015-01-27, 1 file, -2/+2)
  Fixes writing beyond the allocated buffer:

  ==31855== Invalid write of size 1
  ==31855==    at 0x50AB2A9: vsprintf (iovsprintf.c:43)
  ==31855==    by 0x508F6F6: sprintf (sprintf.c:32)
  ==31855==    by 0xB59C7EC: r600_get_compute_param (r600_pipe_common.c:526)
  ==31855==    by 0x5B2B7DE: get_compute_param<char> (device.cpp:37)
  ==31855==    by 0x5B2B7DE: clover::device::ir_target() const (device.cpp:201)
  ==31855==    by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
  ==31855==    by 0x5B20152: clBuildProgram (program.cpp:182)
  ==31855==    by 0x400F41: main (hello_world.c:109)
  ==31855==  Address 0x56fed5f is 0 bytes after a block of size 15 alloc'd
  ==31855==    at 0x4C29180: operator new(unsigned long) (vg_replace_malloc.c:324)
  ==31855==    by 0x5B2B7C2: allocate (new_allocator.h:104)
  ==31855==    by 0x5B2B7C2: allocate (alloc_traits.h:357)
  ==31855==    by 0x5B2B7C2: _M_allocate (stl_vector.h:170)
  ==31855==    by 0x5B2B7C2: _M_create_storage (stl_vector.h:185)
  ==31855==    by 0x5B2B7C2: _Vector_base (stl_vector.h:136)
  ==31855==    by 0x5B2B7C2: vector (stl_vector.h:278)
  ==31855==    by 0x5B2B7C2: get_compute_param<char> (device.cpp:35)
  ==31855==    by 0x5B2B7C2: clover::device::ir_target() const (device.cpp:201)
  ==31855==    by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
  ==31855==    by 0x5B20152: clBuildProgram (program.cpp:182)
  ==31855==    by 0x400F41: main (hello_world.c:109)

  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Tom Stellard <[email protected]>
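  The trace shows a one-byte write landing exactly at the end of a 15-byte buffer whose size was reported by the driver. A common cause of that pattern is a size calculation that forgets the terminating NUL that sprintf() always writes. A minimal sketch of a correct calculation, assuming a hypothetical "<gpu>-<triple>" format (the driver's actual format string is not shown in this log) and hypothetical helper names:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Size of "<gpu>-<triple>": count the '-' separator AND the terminating
 * NUL that sprintf()/snprintf() always write. */
static size_t ir_target_size(const char *gpu, const char *triple)
{
    return strlen(gpu) + 1 /* '-' */ + strlen(triple) + 1 /* NUL */;
}

static char *make_ir_target(const char *gpu, const char *triple)
{
    size_t size = ir_target_size(gpu, triple);
    char *buf = malloc(size);
    if (!buf)
        return NULL;
    /* snprintf never writes more than "size" bytes, unlike plain sprintf. */
    snprintf(buf, size, "%s-%s", gpu, triple);
    return buf;
}
```

  Using snprintf() with the computed size means that even a wrong size truncates the string instead of corrupting the heap.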
* radeonsi: Re-enable LLVM IR dumps (Tom Stellard, 2015-01-20, 1 file, -1/+3)
  This was inadvertently disabled by 761e36b4caab4e8e09a4c2b1409a825902fc7d2c.
* radeon: Teach radeon_elf_read() how to parse reloc information v3 (Tom Stellard, 2015-01-20, 3 files, -5/+76)
  v2:
    - Use strdup for copying reloc names.
    - Free reloc memory.
  v3:
    - Add free_relocs parameter to radeon_shader_binary_free_members()
* radeon: Add a helper function for freeing members of radeon_shader_binary (Tom Stellard, 2015-01-20, 2 files, -0/+11)
* r600g: fix build failure when building the driver without LLVM (Marek Olšák, 2015-01-12, 1 file, -0/+4)
* radeonsi: use TC L2 for CP DMA operations with shader resources on CIK (Marek Olšák, 2015-01-07, 1 file, -0/+12)
  So that TC L2 doesn't need to be flushed. The only problem is with index buffers, which don't use TC. A simple solution is added that flushes TC L2 before a draw call (TC_L2_dirty).
  Reviewed-by: Michel Dänzer <[email protected]>
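  A hypothetical sketch of the dirty-flag pattern described above, assuming the flag is tracked per buffer (consistent with the entry that initializes TC_L2_dirty after buffer allocation). The names and the flush counter are illustrative; the real driver emits cache-flush packets instead of counting.

```c
#include <stdbool.h>

struct buffer {
    bool tc_l2_dirty;         /* written through TC L2, not yet flushed */
};

struct context {
    unsigned tc_l2_flushes;   /* stands in for emitting a TC L2 flush */
};

/* A CP DMA write that goes through TC L2 leaves the buffer dirty. */
static void cp_dma_copy_to(struct buffer *dst)
{
    dst->tc_l2_dirty = true;
}

/* Index fetch bypasses TC, so flush first if the index buffer is dirty. */
static void draw_with_index_buffer(struct context *ctx, struct buffer *ib)
{
    if (ib->tc_l2_dirty) {
        ctx->tc_l2_flushes++;
        ib->tc_l2_dirty = false;
    }
    /* ... emit the draw ... */
}
```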
* radeonsi: only flush the right set of caches for CP DMA operations (Marek Olšák, 2015-01-07, 4 files, -8/+14)
  That's either framebuffer caches or caches for shader resources. The motivation is that framebuffer caches need to be flushed very rarely here.
  Reviewed-by: Michel Dänzer <[email protected]>
* r600g,radeonsi: separate cache flush flags (Marek Olšák, 2015-01-07, 1 file, -22/+2)
  I will rename them for radeonsi.
  Reviewed-by: Michel Dänzer <[email protected]>
* r600g: move r6xx-specific streamout flush flagging into r600g (Marek Olšák, 2015-01-07, 1 file, -6/+1)
  Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: use ordered compares for SSG and face selection (Marek Olšák, 2015-01-07, 1 file, -2/+2)
  Ordered compares are what you have in C. Unordered compares are the result of negating ordered compares (they return true if either argument is NaN).
  That special NaN behavior is completely useless here, and unordered compares produce horrible code with all stable LLVM versions. (I think that has been fixed in LLVM git)
  Reviewed-by: Michel Dänzer <[email protected]>
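  A small C illustration of the difference, using C's own (ordered) comparison operators. The helper names are hypothetical: ssg_ordered() gives SSG's sign semantics with NaN falling through to 0.0, and ugt() shows that an unordered "greater than" is the negation of an ordered "less than or equal", which makes it true when an operand is NaN.

```c
#include <math.h>

/* SSG (set sign) with ordered compares: every comparison involving NaN is
 * false, so NaN falls through to 0.0. */
static float ssg_ordered(float x)
{
    if (x > 0.0f)
        return 1.0f;
    if (x < 0.0f)
        return -1.0f;
    return 0.0f;
}

/* Unordered "greater than": negating the ordered "<=" makes the result
 * true whenever either operand is NaN. */
static int ugt(float a, float b)
{
    return !(a <= b);
}
```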
* radeon/llvm: Use amdgcn triple for SI+ on LLVM >= 3.6 (Tom Stellard, 2015-01-06, 3 files, -14/+20)
* radeonsi: Cache LLVMTargetMachine object in si_screen (Tom Stellard, 2015-01-06, 2 files, -24/+29)
  Rather than building a new one for every compile. This should reduce some of the overhead of compiling shaders.
  One consequence of this change is that we lose the MachineInstrs dumps when dumping the shaders via R600_DEBUG. The LLVM IR and assembly are still dumped, and if you still want to see the MachineInstr dump, you can run the dumped LLVM IR through llc.
* radeonsi: fix warnings (Marek Olšák, 2015-01-01, 1 file, -0/+2)
* gallium: Remove Android files from distribution. (Matt Turner, 2014-12-12, 1 file, -1/+1)
  Android builds Mesa from git, so they don't need to be in the tarball.
* radeonsi: add emit util functions for SH registers (Marek Olšák, 2014-12-10, 2 files, -1/+18)
  Reviewed-by: Michel Dänzer <[email protected]>
* winsys/radeon: Always report at least 1 compute unit (Tom Stellard, 2014-12-08, 1 file, -1/+1)
  All uses of this require that the value be at least one, so it's easier to report at least one than having to wrap all uses in MAX2(max_compute_units, 1).
  Reviewed-by: Marek Olšák <[email protected]>
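  The clamp amounts to a single expression at the source. A sketch, reproducing Mesa's conventional MAX2 macro so the example is self-contained; the function name is hypothetical:

```c
/* MAX2 as conventionally defined in Mesa's util headers, reproduced here
 * so the sketch compiles on its own. */
#define MAX2(A, B) ((A) > (B) ? (A) : (B))

/* Clamp once where the value is reported, instead of at every use. */
static unsigned report_max_compute_units(unsigned from_kernel)
{
    return MAX2(from_kernel, 1u);
}
```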
* gallium: remove unused pipe_viewport_state::translate[3] and scale[3] (Marek Olšák, 2014-11-16, 1 file, -2/+0)
  Almost all drivers ignore them.
* r600g/compute: Enable PIPE_SHADER_IR_NATIVE for compute shaders v2 (Tom Stellard, 2014-10-31, 2 files, -6/+6)
  v2:
    - Drop dependency on LLVM >= 3.5.1
* gallium/radeon: Add query for symbol specific config information (Tom Stellard, 2014-10-31, 3 files, -0/+86)
  This adds a query which allows drivers to access the config information of a specific function within the LLVM generated ELF binary. This makes it possible for the driver to handle ELF binaries with multiple kernels / global functions.
* r600g: Delete unused variable 'max_global_size' in 'r600_get_compute_param' (Dieter Nützel, 2014-10-30, 1 file, -1/+0)
  Signed-off-by: Dieter Nützel <[email protected]>
* radeon/llvm: Dynamically allocate branch/loop stack arrays (Michel Dänzer, 2014-10-29, 2 files, -6/+37)
  This prevents us from silently overflowing the stack arrays, and allows arbitrary stack depths.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85454
  Cc: [email protected]
  Reported-and-Tested-by: Nick Sarnie <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
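  A minimal sketch of the technique: a stack that grows with realloc() instead of living in a fixed-size array. The struct and function names are hypothetical, and int stands in for whatever per-branch or per-loop context the driver actually stores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical growable stack replacing a fixed array such as entries[8]. */
struct flow_stack {
    int *entries;
    unsigned depth;
    unsigned capacity;
};

/* Push never overflows silently: storage doubles when full, and an
 * allocation failure is reported to the caller. */
static bool flow_stack_push(struct flow_stack *s, int entry)
{
    if (s->depth == s->capacity) {
        unsigned new_cap = s->capacity ? s->capacity * 2 : 4;
        int *p = realloc(s->entries, new_cap * sizeof(*s->entries));
        if (!p)
            return false;
        s->entries = p;
        s->capacity = new_cap;
    }
    s->entries[s->depth++] = entry;
    return true;
}

static int flow_stack_pop(struct flow_stack *s)
{
    assert(s->depth > 0);       /* popping an empty stack is a caller bug */
    return s->entries[--s->depth];
}

static void flow_stack_free(struct flow_stack *s)
{
    free(s->entries);
    s->entries = NULL;
    s->depth = s->capacity = 0;
}
```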
* radeon: enable Hyper-Z on r600g and radeonsi by default (Andreas Boll, 2014-10-24, 3 files, -3/+3)
  This reverts commit 01e637114914453451becc0dc8afe60faff48d84.
  Since then many Hyper-Z issues have been fixed or worked around. Enable Hyper-Z by default so that we get enough feedback for the upcoming Mesa 10.4 release.
  If you have issues with Hyper-Z, try disabling it with the environment variable R600_DEBUG=nohyperz and please report the issue on the bug tracker.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75011
  See also: https://bugs.freedesktop.org/show_bug.cgi?id=75112
  Signed-off-by: Andreas Boll <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* r600g,radeonsi: convert TGSI shader type to LLVM shader type (Marek Olšák, 2014-10-21, 1 file, -1/+30)
  The values are hardcoded in the LLVM backend, but the TGSI definitions are going to be changed with tessellation, e.g. TGSI_PROCESSOR_COMPUTE will be increased by 2.
  We'll use VS for LS and HS, because there's nothing special about them from the LLVM backend point of view, even though the hardware side is different. We do the same for ES.
  Reviewed-by: Michel Dänzer <[email protected]>
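  A sketch of an explicit translation, using hypothetical enums whose values are deliberately not the real TGSI or LLVM ones. It shows the design point of the commit: LS, HS and ES are reported to the backend as vertex shaders, and the backend-facing value no longer depends on TGSI's numbering.

```c
/* Hypothetical hardware-stage and backend shader-type enums; the real
 * TGSI and LLVM values are not reproduced here. */
enum hw_stage { HW_LS, HW_HS, HW_ES, HW_VS, HW_GS, HW_PS, HW_CS };
enum llvm_shader_type { LLVM_SHADER_PS, LLVM_SHADER_VS,
                        LLVM_SHADER_GS, LLVM_SHADER_CS };

static enum llvm_shader_type to_llvm_shader_type(enum hw_stage stage)
{
    switch (stage) {
    case HW_LS:
    case HW_HS:
    case HW_ES:
    case HW_VS:
        return LLVM_SHADER_VS;  /* nothing special from the backend's view */
    case HW_GS:
        return LLVM_SHADER_GS;
    case HW_PS:
        return LLVM_SHADER_PS;
    case HW_CS:
    default:                    /* unreachable for valid input */
        return LLVM_SHADER_CS;
    }
}
```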
* r600g,radeonsi: Only set use_staging_texture = TRUE once (Michel Dänzer, 2014-10-15, 1 file, -8/+5)
  No need to check for setting the flag after we set it already.
  Reviewed-by: Marek Olšák <[email protected]>
* r600g,radeonsi: Use staging texture for transfers if any miplevel is tiled (Michel Dänzer, 2014-10-15, 1 file, -1/+1)
  We set the NO_CPU_ACCESS flag for BO allocation in that case, so direct CPU access may not work.
  Reviewed-by: Marek Olšák <[email protected]>
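  A hypothetical sketch of the decision: check every miplevel and choose a staging copy as soon as one is tiled, stopping at the first hit. The struct layout, including the 16-level cap, is invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified texture description (num_levels <= 16). */
struct tex_desc {
    unsigned num_levels;
    bool level_tiled[16];   /* true if miplevel i uses a tiled layout */
};

/* A tiled level means the BO may be allocated without CPU access, so any
 * tiled level forces transfers through a staging texture. */
static bool needs_staging_texture(const struct tex_desc *t)
{
    for (unsigned i = 0; i < t->num_levels; i++) {
        if (t->level_tiled[i])
            return true;    /* decided; no need to check further levels */
    }
    return false;
}
```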
* radeonsi: remove shader->input[] and output[] arrays and dependencies (Marek Olšák, 2014-10-12, 1 file, -1/+2)
  They were reinventing tgsi_shader_info. They are unused now.
  radeon_llvm_context::load_input can be NULL if input fetching is implemented in some other way.
  Reviewed-by: Michel Dänzer <[email protected]>