summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* vc4: Colormask should apply after all other fragment ops (like logic op).Eric Anholt2015-01-151-9/+18
| | | | | Theoretically it should apply after dithering as well, but ditehring for 565 happens in fixed function in the TLB store.
* vc4: No turning unpack arguments into small immediates.Eric Anholt2015-01-151-0/+3
| | | | | Since unpack only happens on things read from the A register file, we have to leave them as something that can be allocated to A (temp or uniform).
* vc4: Move the tests for src needing to be an A register to vc4_qir.c.Eric Anholt2015-01-153-17/+28
| | | | I want it from another location.
* vc4: Don't swap the raddr on instructions doing unpacks.Eric Anholt2015-01-151-0/+5
| | | | | It would mean different unpacking behavior, since only the A file does unpack (with PM==0).
* vc4: Don't let pairing happen with badly mismatched unpack flags.Eric Anholt2015-01-151-0/+39
| | | | | No difference on shader-db, but prevents definite regressions in the blending changes.
* vc4: Don't let pairing happen with badly mismatched pack flags.Eric Anholt2015-01-151-0/+39
| | | | | No difference on shader-db, but will become more important as I introduce more use of pack flags with the blending changes.
* vc4: Fix early Z behavior on hardware.Eric Anholt2015-01-151-2/+1
| | | | | | It turns out the simulator was not treating this bit the same as the RPi, and I'd forgotten to remove it when turning on early Z. The result was that you'd get big chunks of your rendering missing.
* Revert "radeonsi: only set BC_OPTIMIZE_DISABLE when necessary"Michel Dänzer2015-01-152-15/+6
| | | | | | | | | | | | | | | | This reverts commit 0543630d0b0d9d9f6eefbc14fbd3385d4de37ba0. It caused flickering artifacts in Steam games such as Team Fortress 2 or Left 4 Dead 2. We could probably only enable this optimization by also making sure the shader code only uses either SI_PARAM_LINEAR_CENTROID or SI_PARAM_LINEAR_CENTER, not both. This would probably require a shader variant. Sorry I didn't remember this when reviewing the reverted change. Reviewed-by: Marek Olšák <[email protected]>
* freedreno/ir3: handle "holes" in inputsRob Clark2015-01-131-1/+31
| | | | | | | | | | | | If, for example, only the x/y/w components of in.xyzw are actually used, we still need to have a group of four registers and assign all four components. The hardware can't write in.xy and in.w to discontiguous registers. To handle this, pad with a dummy NOP instruction, to keep the neighbor chain contiguous. This fixes a problem noticed with firefox OMTC. Signed-off-by: Rob Clark <[email protected]>
* r600g: fix build failure when building the driver without LLVMMarek Olšák2015-01-121-0/+4
|
* vc4: Clamp the inputs to the blend equation to [0, 1].Eric Anholt2015-01-111-1/+10
| | | | Fixes the remaining ARB_color_buffer_float rendering tests.
* vc4: Add a little helper for clamping to [0,1].Eric Anholt2015-01-111-4/+10
|
* vc4: Fix up statechange management for uncompiled/compiled FS/VS.Eric Anholt2015-01-112-11/+10
| | | | | | | | No need to recheck the FS compile when the VS source has changed, but there *is* a need to recheck the VS compile when the compiled VS has changed (since the live inputs may change). Fixes es3conform's blend test.
* vc4: Fix clear color setup for RGB565.Eric Anholt2015-01-111-1/+4
| | | | | | | The util_pack_color() thing only sets up the low bits of the union, so only return them, too. Fixes intermittent failure on fbo-alphatest-formats and es3conform's framebuffer-objects test under simulation.
* vc4: Avoid the save/restore of r3 for raddr conflicts, just use ra31.Eric Anholt2015-01-112-38/+11
| | | | | | | | | | | | | Turns out this was harmful in code quality: total instructions in shared programs: 39487 -> 38845 (-1.63%) instructions in affected programs: 22522 -> 21880 (-2.85%) This costs us yet another register, which is painful since it means more programs might fail to compile). However, the alternative was causing us trouble where we'd save/restore r3 while it contained a MIN-ed direct texture offset, causing the kernel to fail to validate our shaders (such as in GLB2.7).
* vc4: Allow dead code elimination of VPM reads.Eric Anholt2015-01-102-1/+44
| | | | | | | | This gets a bunch of dead reads out of the CSes, which don't read most attributes generally. total instructions in shared programs: 39753 -> 39487 (-0.67%) instructions in affected programs: 4721 -> 4455 (-5.63%)
* vc4: Cook up the draw-time VPM setup info during shader compile.Eric Anholt2015-01-104-11/+28
| | | | | | This will give the compiler the chance to dead-code eliminate unused VPM reads. This is particularly a big deal in the CS where a bunch of vattrs are just not going to be used.
* vc4: Split two notions of instructions having side effects.Eric Anholt2015-01-105-4/+15
| | | | | Some ops can't be DCEd, while some of the ops that are just important due to the args they have can be.
* vc4: Redo VPM reads as a read file.Eric Anholt2015-01-105-16/+16
| | | | This will let us do copy propagation of the VPM reads.
* vc4: Fix miscalculation of the VPM space.Eric Anholt2015-01-101-1/+1
| | | | | We pass in a byte offset, not dword. I'm rather scared that this actually managed to pass piglit, but it does fix gears.
* vc4: Pack VPM attr contents according to just the size of the attribute.Eric Anholt2015-01-103-11/+9
| | | | | total instructions in shared programs: 40960 -> 39753 (-2.95%) instructions in affected programs: 20871 -> 19664 (-5.78%)
* vc4: Restructure color packing as a series of channel replacements.Eric Anholt2015-01-104-49/+60
| | | | | | | | | I'm using this in some WIP commits for doing blending in 8888 instead of vec4. But it also gives us these results immediately, thanks to allowing more uniforms/immediates in the arguments: total instructions in shared programs: 41027 -> 40960 (-0.16%) instructions in affected programs: 4381 -> 4314 (-1.53%)
* vc4: Fix the no-copy-propagating-from-TLB_COLOR_READ check.Eric Anholt2015-01-101-1/+1
| | | | | Our MOV's dst obviously won't be the TLB_COLOR_READ's def, because we're ssa.
* vc4: Move global seqno short-circuiting to vc4_wait_seqno().Eric Anholt2015-01-102-6/+3
| | | | Any other caller would want it, too.
* freedreno/ir3: fix pos_regid > max_regRob Clark2015-01-074-41/+121
| | | | | | | | | | | | | | We can't (or don't know how to) turn this off. But it can end up being stored to a higher reg # than what the shader uses, leading to corruption. Also we currently aren't clever enough to turn off frag_coord/frag_face if the input is dead-code, so just fixup max_reg/max_half_reg. Re-org this a bit so both vp and fp reg footprint fixup are called by a common fxn used also by ir3_cmdline. Also add a few more output lines for ir3_cmdline to make it easier to see what is going on. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: start on indirect gpr readsRob Clark2015-01-073-8/+146
| | | | | | | | | | | | | | Handle TEMP[ADDR[]] src registers by generating a fanin to group array elements, similarly to how texture fetch instructions work. NOTE: For all the scalar instructions generated for a single tgsi vector operation which uses an array src (or possibly even uses the same array as multiple srcs), re-use the same fanin node. Since a vector operation operates on all components at the same time, it should never see more than one version of the same array. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: make reg array dynamicRob Clark2015-01-074-13/+50
| | | | | | | | | To use fanin's to group registers in an array, we can potentially have a much larger array of registers. Rather than continuing to bump up the array size, just make it dynamically allocated when the instruction is created. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: simplify RARob Clark2015-01-078-777/+622
| | | | | | | | | | | | | | | | | | | | | | Group inputs/outputs, in addition to fanin/fanout, as they must also exist in sequential scalar registers. This lets us simplify RA by working in terms of neighbor groups. NOTE: has the slight problem that it can't optimize out mov's for things like: MOV OUT[n], IN[m] To avoid this, instead of trying to figure out what mov's we can eliminate, we first remove all mov's prior to grouping, and then re-insert mov's as needed while grouping inputs/outputs/fanins. Eventually we'd prefer the frontend to not insert extra mov's in the first place (so we don't have to bother removing them). This is the plan for an eventual NIR based frontend, so separate out the instr grouping (which will still be needed for NIR frontend) from the mov elimination (which won't). Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: regmask support for relative addrRob Clark2015-01-072-17/+51
| | | | | | | | For temp arrays, a 32bit mask won't be sufficient.. but otoh we don't need to support an arbitrary mask. So for this case use a simple size field rather than a bitmask. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: split up ssa_srcRob Clark2015-01-071-23/+34
| | | | | | | Slight bit of refactoring that will be needed for indirect gpr addressing (TEMP[ADDR[]]). Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: drop instr_clone() stuffRob Clark2015-01-072-49/+17
| | | | | | | Unnecessary and overly complicated. And gets in the way for temp arrays (TEMP[ADDR[]]). Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: runtime enable RA debug for DEBUG buildsRob Clark2015-01-071-1/+6
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: handle relative addr in ir3_dumpRob Clark2015-01-071-1/+8
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: legalize vs unused sam dst componentsRob Clark2015-01-072-2/+9
| | | | | | | | We probably could be more clever elsewhere and mask out components that are not used. But either way, legalize should realize that there is also a write-after-write hazard with texture sample instructions. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: hack for old compilerRob Clark2015-01-071-0/+23
| | | | | | | Old compiler doesn't have ir3_block's.. so we need a special path. This hack can be dropped when ir3_compiler_old is retired. Signed-off-by: Rob Clark <[email protected]>
* Revert "radeonsi: reduce the size of si_pm4_state"Marek Olšák2015-01-082-3/+12
| | | | | | This reverts commit 9141d8855555e45a057970e78969e1518ad3617d. It broke OpenCL.
* radeonsi: Fix crash when destroying si_screenTom Stellard2015-01-071-2/+4
| | | | | | | | | We were invalidating si_screen:tm by calling r600_destroy_common_screen() which frees the si_screen object. This caused the driver to crash in LLVMDisposeTargetMachine() since we were passing it an invalid pointer. https://bugs.freedesktop.org/show_bug.cgi?id=88170
* radeonsi: enable LLVM optimizations that assume no NaNs for non-compute shadersMarek Olšák2015-01-073-4/+12
| | | | | | | v2: complete rewrite Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Tom Stellard <[email protected]>
* radeonsi: emit SURFACE_SYNC lastMarek Olšák2015-01-071-23/+35
| | | | | | | This fixes a case where a transform feedback buffer is fed back as an index buffer, because SURFACE_SYNC must be after VS_PARTIAL_FLUSH. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: flush all CB/DB caches unconditionally when changing the framebufferMarek Olšák2015-01-071-11/+7
| | | | | | This is easier to read and will work better with shader image stores. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: change TC cache flushing strategy for texturesMarek Olšák2015-01-072-4/+6
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: improve and fix streamout flushingMarek Olšák2015-01-073-10/+40
| | | | | | | | | | | - we don't usually need to flush TC L2 - we should flush KCACHE (not really an issue now since we always flush KCACHE when updating descriptors, but it could be a problem if we used CE, which doesn't require flushing KCACHE) - add an explicit VS_PARTIAL_FLUSH flag Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: use TC L2 for CP DMA operations with shader resources on CIKMarek Olšák2015-01-073-10/+39
| | | | | | | | | So that TC L2 doesn't need to be flushed. The only problem is with index buffers, which don't use TC. A simple solution is added that flushes TC L2 before a draw call (TC_L2_dirty). Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: use TC L2 for updating descriptors on CIKMarek Olšák2015-01-072-5/+10
| | | | | | This allows not flushing TC L2 on CIK later. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: don't use TC L2 for updating descriptors on SIMarek Olšák2015-01-072-2/+14
| | | | | | | | | | | | It's causing problems, because we mix uncached CP DMA with cached WRITE_DATA when updating the same memory. The solution for SI is to use uncached access here, because CP DMA doesn't support cached access. CIK will be handled in the next patch. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: only flush the right set of caches for CP DMA operationsMarek Olšák2015-01-079-34/+48
| | | | | | | | That's either framebuffer caches or caches for shader resources. The motivation is that framebuffer caches need to be flushed very rarely here. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: implement separate ICACHE and KCACHE flush for SIMarek Olšák2015-01-071-9/+17
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: add a combined flag for flushing a framebufferMarek Olšák2015-01-073-20/+10
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: rename flush flags, split the TC flag into L1 and L2Marek Olšák2015-01-077-91/+109
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* r600g,radeonsi: separate cache flush flagsMarek Olšák2015-01-075-26/+39
| | | | | | I will rename them for radeonsi. Reviewed-by: Michel Dänzer <[email protected]>