summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* r600g/sb: improve alu packing on caymanVadim Girlin2013-07-172-15/+89
| | | | | | | | | | | | | | | | Scheduler/register allocator in r600-sb was developed and optimized on evergreen (VLIW-5) hardware, so currently it's not optimal for VLIW-4 chips. This patch should improve performance on cayman gpus due to better alu packing, but also it tends to increase register usage, so overall positive effect on performance has to be proven by real benchmarks yet. Some results with bfgminer kernel on cayman: source bytecode: 60 gprs, 3905 alu groups, sbcl before the patch: 45 gprs, 4088 alu groups, sbcl with this patch: 55 gprs, 3474 alu groups. Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: fix handling of new multislot instructions on caymanVadim Girlin2013-07-173-5/+6
| | | | | | | Ex-scalar instructions that became multislot on cayman do replicate result to all channels - handle them similar to DOT4. Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: fix debug dump code in schedulerVadim Girlin2013-07-171-4/+5
| | | | | | Update the stale debug code for other changes related to debug output. Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: fix initial register allocationVadim Girlin2013-07-171-0/+1
| | | | | | | | | | Mark values that are members of the 'same register' constraint as preallocated in ra_init pass, this will prevent incorrect reallocation in scheduler in some cases. Should fix https://bugs.freedesktop.org/show_bug.cgi?id=66713 Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: move chip & class name functions to sb_contextVadim Girlin2013-07-174-53/+55
| | | | Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: fix handling of PS in source bytecode on caymanVadim Girlin2013-07-171-0/+5
| | | | | | | | | Actually PS doesn't make sense for cayman and isn't even mentioned in cayman docs, but llvm backend currently uses it in bytecode and, assuming that hw seems to be mostly ok with it, this will allow sb to parse such source bytecode correctly. Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: Initialize ra_checker member variables.Vinson Lee2013-07-171-1/+1
| | | | | | Fixes "Uninitialized scalar field" defect reported by Coverity. Signed-off-by: Vinson Lee <[email protected]>
* llvmpipe: support sRGB framebuffersRoland Scheidegger2013-07-162-14/+57
| | | | | | | | | | | | | | | Just use the new conversion functions to do the work. The way it's plugged in into the blend code is quite hacktastic but follows all the same hacks as used by packed float format already. Only support 4x8bit srgb formats (rgba/rgbx plus swizzle), 24bit formats never worked anyway in the blend code and are thus disabled, and I don't think anyone is interested in L8/L8A8. Would need even more hacks otherwise. Unless I'm missing something, this is the last feature except MSAA needed for OpenGL 3.0, and for OpenGL 3.1 as well I believe. v2: prettify a bit, use separate function for packing. Reviewed-by: Jose Fonseca <[email protected]>
* Revert "r300g: allow HiZ with a 16-bit zbuffer"Marek Olšák2013-07-151-0/+1
| | | | | | | | This reverts commit 631c631cbf5b7e84e42a7cfffa1c206d63143370. https://bugs.freedesktop.org/show_bug.cgi?id=66921 Cc: [email protected]
* r300g/swtcl: fix a lockup in MSAA resolveMarek Olšák2013-07-151-0/+7
| | | | Cc: [email protected]
* r300g/swtcl: fix geometry corruption by uploading indices to a bufferMarek Olšák2013-07-153-45/+31
| | | | | | | | | | | | | The splitting of a draw call into several draw commands was broken, because the split sometimes took place in the middle of a primitive. The splitting was supposed to be dealing with the case when there are more indices than the maximum size of a CS. This commit throws that code away and uses a real index buffer instead. https://bugs.freedesktop.org/show_bug.cgi?id=66558 Cc: [email protected]
* ilo: skip 3DSTATE_INDEX_BUFFER when possibleChia-I Wu2013-07-144-59/+77
| | | | | | When only the offset to the index buffer is changed, we can skip the 3DSTATE_INDEX_BUFFER if we always use 0 for the offset, and add (offset / index_size) to Start Vertex Location in 3DPRIMITIVE.
* r600g/sb: Initialize ra_constraint::cost.Vinson Lee2013-07-131-1/+1
| | | | | | Fixes "Uninitialized scalar field" reported by Coverity. Signed-off-by: Vinson Lee <[email protected]>
* ilo: move a santiy check into its assert()Chia-I Wu2013-07-131-5/+2
| | | | | | The compiler does not know that ilo_3d_pipeline_estimate_size() is pure and can be eliminated in a release build in gen6_pipeline_end(). Move the call into the assert().
* ilo: mark some states dirty when they are really changedChia-I Wu2013-07-131-0/+16
| | | | | The checks may seem redundant because cso_context handles them, but util_blitter does not have access to cso_context.
* ilo: clean up ilo_blitter_pipe_begin()Chia-I Wu2013-07-133-27/+39
| | | | | Document why certain states need to be saved, and fix a bug when blitting with scissor enabled.
* r600g: don't use the CB/DB CP COHER logic on r6xxAlex Deucher2013-07-121-2/+10
| | | | | | | | | There are hw bugs. Flush and inv event is sufficient. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=66837 Signed-off-by: Alex Deucher <[email protected]>
* nv30: fix KILL_IF breakageBrian Paul2013-07-121-1/+1
| | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66858
* tgsi: rename the TGSI fragment kill opcodesBrian Paul2013-07-1214-55/+53
| | | | | | | | | | | | | | | | | | | | | TGSI_OPCODE_KIL and KILP had confusing names. The former was conditional kill (if any src component < 0). The later was unconditional kill. At one time KILP was supposed to work with NV-style condition codes/predicates but we never had that in TGSI. This patch renames both opcodes: TGSI_OPCODE_KIL -> KILL_IF (kill if src.xyzw < 0) TGSI_OPCODE_KILP -> KILL (unconditional kill) Note: I didn't just transpose the opcode names to help ensure that I didn't miss updating any code anywhere. I believe I've updated all the relevant code and comments but I'm not 100% sure that some drivers had this right in the first place. For example, the radeon driver might have llvm.AMDGPU.kill and llvm.AMDGPU.kilp mixed up. Driver authors should review their code. Reviewed-by: Jose Fonseca <[email protected]>
* softpipe: silence some MSVC warningsBrian Paul2013-07-122-14/+14
|
* radeon/uvd: fall back to shader based decoding for MPEG2 on UVD 2.x v2Christian König2013-07-122-5/+19
| | | | | | | | | | | UVD 2.x doesn't support hardware decoding of MPEG2, just use shader based decoding for those chipsets. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=66450 v2: fix interlacing as well Signed-off-by: Christian König <[email protected]>
* r600g: x/y coordinates must be divided by block dim in dma blitChristoph Bumiller2013-07-112-4/+16
| | | | | | | Note: this is a candidate for the 9.1 branch. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* r600g/sb: Fix Android build v2Chih-Wei Huang2013-07-124-7/+8
| | | | | | | Add the sb CXX files to the Android Makefile and also stop using some c++11 features. v2 (Vadim Girlin): use &bc[0] instead of bc.begin()
* r600g/sb: improve math optimizations v2Vadim Girlin2013-07-1111-47/+435
| | | | | | | | | | | | | | | | This patch adds support for some math optimizations that are generally considered unsafe, that's why they are currently disabled for compute shaders. GL requirements are less strict, so they are enabled for for GL shaders by default. In case of any issues with applications that rely on higher precision than guaranteed by GL, 'sbsafemath' option in R600_DEBUG allows to disable them. v2 - always set proper src vector size for transformed instructions - check for clamp modifier in the expr_handler::fold_assoc Signed-off-by: Vadim Girlin <[email protected]>
* ilo: reduce PIPE_CAP_MAX_TEXTURE_CUBE_LEVELS to 12Chia-I Wu2013-07-111-2/+3
| | | | So that there are at most (2^22 * 6) texels, lower than the 2^26 limit.
* ilo: correctly initialize undefined registers in fsChia-I Wu2013-07-111-5/+15
| | | | | Initialize all 4 channels of undefined registers (that is, TEMPs that are used before being assigned) in FS.
* radeonsi: Handle TGSI_OPCODE_DDX/Y using local memoryMichel Dänzer2013-07-104-2/+103
| | | | | | 16 more little piglits. Reviewed-by: Tom Stellard <[email protected]>
* radeonsi: Handle TGSI_OPCODE_TXDMichel Dänzer2013-07-101-2/+25
| | | | | | One more little piglit. Reviewed-by: Tom Stellard <[email protected]>
* ilo: honor surface padding requirementsChia-I Wu2013-07-101-0/+53
| | | | The PRM specifies several padding requirements that we failed to honor.
* util: treat denorm'ed floats like zeroZack Rusin2013-07-092-0/+11
| | | | | | | | | | | | | The D3D10 spec is very explicit about treatment of denorm floats and the behavior is exactly the same for them as it would be for -0 or +0. This makes our shading code match that behavior, since OpenGL doesn't care and on a few cpu's it's faster (worst case the same). Float16 conversions will likely break but we'll fix them in a follow up commit. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* r600g: improve the mechanism for recognizing an empty CSMarek Olšák2013-07-083-3/+8
| | | | Reviewed-by: Alex Deucher <[email protected]>
* r600g: explicitly flush caches for streamout-based buffer copying & clearingMarek Olšák2013-07-081-0/+13
| | | | | | | It's done automatically for vertex buffers, but not for constant buffers, textures, and colorbuffers. Reviewed-by: Alex Deucher <[email protected]>
* r600g: only flush the caches that need to be flushed during CP DMA operationsMarek Olšák2013-07-083-32/+117
| | | | | | | This should increase performance if constant uploads are done with the CP DMA, because only the cache that needs to be flushed is flushed. Reviewed-by: Alex Deucher <[email protected]>
* r600g: split INVAL_READ_CACHES into vertex, tex, and const cache flagsMarek Olšák2013-07-085-27/+52
| | | | | | | also flushing any cache in evergreen_emit_cs_shader seems to be superfluous (we don't flush caches when changing the other shaders either) Reviewed-by: Alex Deucher <[email protected]>
* r600g: adjust flush flags (v3)Alex Deucher2013-07-086-7/+42
| | | | | | | | | | | | | 1. flush SH with read caches 2. add flag for DB flushes 3. add flag for CB flushes v2: flush all CBs, remove redundant emit_state variable. v3: Marek: also set the new flags in r600_context_flush, the CP dma functions, and texture_barrier, and rename them Signed-off-by: Marek Olšák <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
* r600g: don't call buffer_wait in buffer_mmap_sync_with_ringsMarek Olšák2013-07-081-2/+1
| | | | | | | | The winsys should do this, because it measures how much time we spend in buffer_map doing synchronization, which can be viewed with the gallium HUD. Reviewed-by: Alex Deucher <[email protected]>
* r600g: don't read back the MSAA depth buffer if the read flag is not setMarek Olšák2013-07-081-8/+8
| | | | Reviewed-by: Alex Deucher <[email protected]>
* r600g: don't flush the context in texture_transfer_mapMarek Olšák2013-07-081-5/+0
| | | | | | the winsys does this automatically Reviewed-by: Alex Deucher <[email protected]>
* r600g: fix texture offset computation for mapped MSAA depth buffersMarek Olšák2013-07-082-16/+14
| | | | | | | | | It was wrong, because the offset shouldn't be applied to MSAA depth buffers. This small cleanup should prevent such issues in the future. This fixes a lockup in "piglit/fbo-depthstencil default_fb -samples=n". Reviewed-by: Alex Deucher <[email protected]>
* r600g: fix color resolve for RGBX8 and RGBX16 integer formatsMarek Olšák2013-07-081-2/+2
| | | | Reviewed-by: Alex Deucher <[email protected]>
* r600g: enable fast MSAA color clear for array/3D/cube texturesMarek Olšák2013-07-081-4/+3
| | | | Reviewed-by: Alex Deucher <[email protected]>
* r600g: implement fast MSAA color clear for integer texturesMarek Olšák2013-07-081-9/+12
| | | | | | | this also fixes the fast clear with multiple colorbuffers and each having a different format Reviewed-by: Alex Deucher <[email protected]>
* r600/uvd: fix check for UVD 2.xChristian König2013-07-081-1/+1
| | | | Signed-off-by: Christian König <[email protected]>
* nvc0: enable very initial support for nvf0 (GK110)Ben Skeggs2013-07-054-5/+75
| | | | | | | Shaders need a lot of work still. Basic stuff generally works, so this is basically just fine for gnome-shell, OA etc at this point. Signed-off-by: Ben Skeggs <[email protected]>
* gallivm: do per-pixel lod calculations for explicit lodRoland Scheidegger2013-07-041-1/+2
| | | | | | | | | | | | | | | | | | | | | d3d10 requires per-pixel lod calculations for explicit lod, lod bias and explicit derivatives, and we should probably do it for OpenGL too - at least if they are used from vertex or geometry shaders (so doesn't apply to lod bias) this doesn't just affect neighboring pixels. Some code was already there to handle this so fix it up and enable it. There will no doubt be a performance hit unfortunately, we could do better if we'd knew we had a real vector shift instruction (with variable shift count) but this requires AVX2 on x86 (or a AMD Bulldozer family cpu). Don't do anything for lod bias and explicit derivatives yet, though no special magic should be needed for them neither. Likewise, the size query is still broken just the same. v2: Use information if lod is a (broadcast) scalar or not. The idea would be to base this on the actual value, for now just pretend it's a scalar in fs and not a scalar otherwise (so, per-pixel lod is only used in gs/vs but same code is generated for fs as before). Reviewed-by: Jose Fonseca <[email protected]>
* mesa,glsl,gallium: remove GLSLSkipStrictMaxVaryingLimitCheck and dependenciesMarek Olšák2013-07-0212-12/+0
| | | | | | Not needed with do_dead_builtin_varyings. Reviewed-by: Ian Romanick <[email protected]>
* draw/translate: fix instancingZack Rusin2013-06-284-16/+16
| | | | | | | | | | | | | | | | | | We were incorrectly computing the buffer offset when using the instances. The buffer offset is always equal to: start_instance * stride + (instance_num / instance_divisor) * stride We were completely ignoring the start instance quite often producing instances that completely wrong, e.g. if start instance = 5, instance divisor = 2, then on the first iteration it should be: 5 * stride, not (5/2) * stride as we'd have currently, and if start instance = 1, instance divisor = 3, then on the first iteration it should be: 1 * stride, not 0 as we'd have. This fixes it and adjusts all the code to the changes. Signed-off-by: Zack Rusin <[email protected]>
* nvc0: allow frame dropping in h264Maarten Lankhorst2013-07-011-3/+0
| | | | | | | | The only reason the checks existed were paranoia, when I first wrote the code I wasn't sure it was correct. Now that I am, the asserts triggered when XBMC was dropping frames, so remove it. NOTE: This is a candidate for the 9.1 branch.
* r300g/compiler: Prevent regalloc from swizzling texture operands v2Tom Stellard2013-06-305-0/+124
| | | | | | | | | https://bugs.freedesktop.org/show_bug.cgi?id=63520 NOTE: This is a candidate for the stable branches. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
* r300g/compiler/tests: Add an assembly parserTom Stellard2013-06-305-16/+200
| | | | | | | The assembly parser can be used to load r300 assembly dumps and run them through any of the r300 compiler passes. Reviewed-by: Alex Deucher <[email protected]>