summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* vc4: Redo VPM reads as a read file.Eric Anholt2015-01-105-16/+16
| | | | This will let us do copy propagation of the VPM reads.
* vc4: Fix miscalculation of the VPM space.Eric Anholt2015-01-101-1/+1
| | | | | We pass in a byte offset, not dword. I'm rather scared that this actually managed to pass piglit, but it does fix gears.
* vc4: Pack VPM attr contents according to just the size of the attribute.Eric Anholt2015-01-103-11/+9
| | | | | total instructions in shared programs: 40960 -> 39753 (-2.95%) instructions in affected programs: 20871 -> 19664 (-5.78%)
* vc4: Restructure color packing as a series of channel replacements.Eric Anholt2015-01-104-49/+60
| | | | | | | | | I'm using this in some WIP commits for doing blending in 8888 instead of vec4. But it also gives us these results immediately, thanks to allowing more uniforms/immediates in the arguments: total instructions in shared programs: 41027 -> 40960 (-0.16%) instructions in affected programs: 4381 -> 4314 (-1.53%)
* vc4: Fix the no-copy-propagating-from-TLB_COLOR_READ check.Eric Anholt2015-01-101-1/+1
| | | | | Our MOV's dst obviously won't be the TLB_COLOR_READ's def, because we're ssa.
* vc4: Move global seqno short-circuiting to vc4_wait_seqno().Eric Anholt2015-01-102-6/+3
| | | | Any other caller would want it, too.
* st/wgl: Ignore ulVersion in DrvValidateVersion.José Fonseca2015-01-081-2/+10
| | | | | | | | | | | We never used ulVersion for proper version checks. Most 3rd party drivers use version 1, but recently NVIDIA OpenGL driver started using a different version number, so the handy trick of renaming Mesa's ICDs as nvoglv32.dll on Windows machines with NVIDIA hardware for quick testing of Mesa software renderers stopped working. Reviewed-by: Brian Paul <[email protected]>
* freedreno/ir3: fix pos_regid > max_regRob Clark2015-01-074-41/+121
| | | | | | | | | | | | | | We can't (or don't know how to) turn this off. But it can end up being stored to a higher reg # than what the shader uses, leading to corruption. Also we currently aren't clever enough to turn off frag_coord/frag_face if the input is dead-code, so just fixup max_reg/max_half_reg. Re-org this a bit so both vp and fp reg footprint fixup are called by a common fxn used also by ir3_cmdline. Also add a few more output lines for ir3_cmdline to make it easier to see what is going on. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: start on indirect gpr readsRob Clark2015-01-073-8/+146
| | | | | | | | | | | | | | Handle TEMP[ADDR[]] src registers by generating a fanin to group array elements, similarly to how texture fetch instructions work. NOTE: For all the scalar instructions generated for a single tgsi vector operation which uses an array src (or possibly even uses the same array as multiple srcs), re-use the same fanin node. Since a vector operation operates on all components at the same time, it should never see more than one version of the same array. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: make reg array dynamicRob Clark2015-01-074-13/+50
| | | | | | | | | To use fanin's to group registers in an array, we can potentially have a much larger array of registers. Rather than continuing to bump up the array size, just make it dynamically allocated when the instruction is created. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: simplify RARob Clark2015-01-078-777/+622
| | | | | | | | | | | | | | | | | | | | | | Group inputs/outputs, in addition to fanin/fanout, as they must also exist in sequential scalar registers. This lets us simplify RA by working in terms of neighbor groups. NOTE: has the slight problem that it can't optimize out mov's for things like: MOV OUT[n], IN[m] To avoid this, instead of trying to figure out what mov's we can eliminate, we first remove all mov's prior to grouping, and then re-insert mov's as needed while grouping inputs/outputs/fanins. Eventually we'd prefer the frontend to not insert extra mov's in the first place (so we don't have to bother removing them). This is the plan for an eventual NIR based frontend, so separate out the instr grouping (which will still be needed for NIR frontend) from the mov elimination (which won't). Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: regmask support for relative addrRob Clark2015-01-072-17/+51
| | | | | | | | For temp arrays, a 32bit mask won't be sufficient.. but otoh we don't need to support an arbitrary mask. So for this case use a simple size field rather than a bitmask. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: split up ssa_srcRob Clark2015-01-071-23/+34
| | | | | | | Slight bit of refactoring that will be needed for indirect gpr addressing (TEMP[ADDR[]]). Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: drop instr_clone() stuffRob Clark2015-01-072-49/+17
| | | | | | | Unnecessary and overly complicated. And gets in the way for temp arrays (TEMP[ADDR[]]). Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: runtime enable RA debug for DEBUG buildsRob Clark2015-01-071-1/+6
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: handle relative addr in ir3_dumpRob Clark2015-01-071-1/+8
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: legalize vs unused sam dst componentsRob Clark2015-01-072-2/+9
| | | | | | | | We probably could be more clever elsewhere and mask out components that are not used. But either way, legalize should realize that there is also a write-after-write hazard with texture sample instructions. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: hack for old compilerRob Clark2015-01-071-0/+23
| | | | | | | Old compiler doesn't have ir3_block's.. so we need a special path. This hack can be dropped when ir3_compiler_old is retired. Signed-off-by: Rob Clark <[email protected]>
* tgsi: track max array per fileRob Clark2015-01-072-0/+4
| | | | | | | | | | | | NOTE IN[] and OUT[] don't need (have?) ArrayID's.. and TEMP[] can optionally have them. So we implicitly assume that ArrayID==0 always exists for each file. This is why array_max[file] is never less than zero. You can tell from indirect_files(_read/written) if the legacy array- id zero was actually used. Signed-off-by: Rob Clark <[email protected]>
* tgsi: keep track of read vs written indirectsRob Clark2015-01-072-0/+8
| | | | | | | | | | At least temporarily, I need to fallback to old compiler still for relative dest (for freedreno), but I can do relative src temp. Only a temporary situation, but seems easy/reasonable for tgsi-scan to track this. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* Revert "radeonsi: reduce the size of si_pm4_state"Marek Olšák2015-01-082-3/+12
| | | | | | This reverts commit 9141d8855555e45a057970e78969e1518ad3617d. It broke OpenCL.
* radeonsi: Fix crash when destroying si_screenTom Stellard2015-01-071-2/+4
| | | | | | | | | We were invalidating si_screen:tm by calling r600_destroy_common_screen() which frees the si_screen object. This caused the driver to crash in LLVMDisposeTargetMachine() since we were passing it an invalid pointer. https://bugs.freedesktop.org/show_bug.cgi?id=88170
* radeonsi: enable LLVM optimizations that assume no NaNs for non-compute shadersMarek Olšák2015-01-073-4/+12
| | | | | | | v2: complete rewrite Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Tom Stellard <[email protected]>
* radeonsi: emit SURFACE_SYNC lastMarek Olšák2015-01-071-23/+35
| | | | | | | This fixes a case where a transform feedback buffer is fed back as an index buffer, because SURFACE_SYNC must be after VS_PARTIAL_FLUSH. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: flush all CB/DB caches unconditionally when changing the framebufferMarek Olšák2015-01-071-11/+7
| | | | | | This is easier to read and will work better with shader image stores. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: change TC cache flushing strategy for texturesMarek Olšák2015-01-072-4/+6
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: improve and fix streamout flushingMarek Olšák2015-01-073-10/+40
| | | | | | | | | | | - we don't usually need to flush TC L2 - we should flush KCACHE (not really an issue now since we always flush KCACHE when updating descriptors, but it could be a problem if we used CE, which doesn't require flushing KCACHE) - add an explicit VS_PARTIAL_FLUSH flag Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: use TC L2 for CP DMA operations with shader resources on CIKMarek Olšák2015-01-073-10/+39
| | | | | | | | | So that TC L2 doesn't need to be flushed. The only problem is with index buffers, which don't use TC. A simple solution is added that flushes TC L2 before a draw call (TC_L2_dirty). Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: use TC L2 for updating descriptors on CIKMarek Olšák2015-01-072-5/+10
| | | | | | This allows not flushing TC L2 on CIK later. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: don't use TC L2 for updating descriptors on SIMarek Olšák2015-01-072-2/+14
| | | | | | | | | | | | It's causing problems, because we mix uncached CP DMA with cached WRITE_DATA when updating the same memory. The solution for SI is to use uncached access here, because CP DMA doesn't support cached access. CIK will be handled in the next patch. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: only flush the right set of caches for CP DMA operationsMarek Olšák2015-01-079-34/+48
| | | | | | | | That's either framebuffer caches or caches for shader resources. The motivation is that framebuffer caches need to be flushed very rarely here. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: implement separate ICACHE and KCACHE flush for SIMarek Olšák2015-01-071-9/+17
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: add a combined flag for flushing a framebufferMarek Olšák2015-01-073-20/+10
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: rename flush flags, split the TC flag into L1 and L2Marek Olšák2015-01-077-91/+109
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* r600g,radeonsi: separate cache flush flagsMarek Olšák2015-01-075-26/+39
| | | | | | I will rename them for radeonsi. Reviewed-by: Michel Dänzer <[email protected]>
* r600g: move r6xx-specific streamout flush flagging into r600gMarek Olšák2015-01-072-9/+7
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: only set BC_OPTIMIZE_DISABLE when necessaryMarek Olšák2015-01-072-6/+15
| | | | | | SPI_PS_IN_CONTROL is moved into the SPI mapping state. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: do not define FACE as an ordinary PS inputMarek Olšák2015-01-071-1/+2
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: remove flatshade from the shader keyMarek Olšák2015-01-073-7/+7
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: remove special handling of TGSI_INTERPOLATE_COLOR in shader codegenMarek Olšák2015-01-071-6/+10
| | | | | | | | It doesn't do anything useful. And colors are floating-point, so we can use fs.interp, remove "flatshade" from the shader key, and rely on the FLAT_SHADE state only (in the next patch). Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: implement VERTEXID_NOBASE and BASEVERTEX system valuesMarek Olšák2015-01-071-0/+10
| | | | | | | | Only done for completeness. Not used by anything yet. Tested by advertising PIPE_CAP_VERTEXID_NOBASE. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: fix VertexID for OpenGLMarek Olšák2015-01-071-2/+5
| | | | | | | This fixes all failing piglit VertexID tests. Cc: 10.4 <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: clarify a hw bug in shader exportsMarek Olšák2015-01-071-5/+10
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: use ordered compares for SSG and face selectionMarek Olšák2015-01-072-3/+3
| | | | | | | | | | | Ordered compares are what you have in C. Unordered compares are the result of negating ordered compares (they return true if either argument is NaN). That special NaN behavior is completely useless here, and unordered compares produce horrible code with all stable LLVM versions. (I think that has been fixed in LLVM git) Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: remove unused and not useful variablesMarek Olšák2015-01-073-6/+1
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: remove init config from statesMarek Olšák2015-01-076-5/+4
| | | | | | It really doesn't do anything there. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: reduce the size of si_pm4_stateMarek Olšák2015-01-072-12/+3
| | | | | | | | - the relocs array is unused, remove it - ndw is at most 115 (init), set 140 as the maximum - compute needs 4 buffers per state, graphics only needs 1; set 4 as the maximum Reviewed-by: Michel Dänzer <[email protected]>
* tgsi: add uses_centroid into tgsi_shader_infoMarek Olšák2015-01-072-0/+4
|
* vc4: Fix scaling W projection of the Z coordinate when there's a Z offset.Eric Anholt2015-01-061-3/+3
| | | | | | Fixes piglit glsl-fs-fragcoord-zw-perspective, es3conform gl_FragCoord_z_frag, and the rest of the piglit glsl 1.10 interpolation tests.
* vc4: Fix deletion from the program cache.Eric Anholt2015-01-061-1/+1
| | | | | They key is, oddly enough, in the key field, not in the data field (which is the vc4_compiled_shader *). Fixes regular failures in fp-long-alu.