summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* gallium/radeon: remove DBG_TEXMIPMarek Olšák2015-12-033-4/+2
| | | | | | we don't need 2 flags for dumping texture info Reviewed-by: Michel Dänzer <[email protected]>
* nv50/ir: fix moves to/from flagsIlia Mirkin2015-12-022-2/+7
| | | | | | | | Noticed this when looking at a trace that caused flags to spill to/from registers. The flags source/destination wasn't encoded correctly according to both envydis and nvdisasm. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: don't forget to mark flagsDef on cvt in txb loweringIlia Mirkin2015-12-021-1/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nv50/ir: fix instruction permutation logicIlia Mirkin2015-12-021-1/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nv50/ir: the mad source might not have a defining instructionIlia Mirkin2015-12-021-1/+1
| | | | | | | For example if it's $r63 (aka 0), there won't be a definition. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nv50/ir: make sure entire graph is reachableIlia Mirkin2015-12-021-0/+1
| | | | | | | | The algorithm expects the entire CFG to be reachable, so make sure that we hit every node. Otherwise we will end up with uninitialized data, memory corruption, etc. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: deal with loops with no breaksIlia Mirkin2015-12-021-0/+6
| | | | | | | | | | | For example if there are only returns, the break bb will not end up part of the CFG. However there will have been a prebreak already emitted for it, and when hitting the RET that comes after, we will try to insert the current (i.e. break) BB into the graph even though it will be unreachable. This makes the SSA code sad. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nvc0/ir: fold postfactor into immediateIlia Mirkin2015-12-021-0/+6
| | | | | | | | SM20-SM50 can't emit a post-factor in the presence of a long immediate. Make sure to fold it in. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nv50/ir: allow immediate 0 to be loaded anywhereIlia Mirkin2015-12-021-0/+6
| | | | | | | There's a post-RA fixup to replace 0's with $r63 (or $r127 if too many regs are used), so just as nvc0, let an immediate 0 be loaded anywhere. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: add memory barriers support for GK110Samuel Pitoiset2015-12-021-0/+12
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: do not call textureMask() for surface opsSamuel Pitoiset2015-12-021-1/+2
| | | | | | | | | | | That texture mask thing doesn't seem to be needed for surface ops, so just as nve4+, let do that only for texture ops. This fixes a segfault with 'test_surface_st' from gallium/tests/trivial/compute.c on Fermi because this test uses sustp. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* r600: set mega fetch count to 16 for gs copy shaderDave Airlie2015-12-021-0/+1
| | | | | | Seems like MFC should be set for this shader. Signed-off-by: Dave Airlie <[email protected]>
* r600: increment ring index after emit vertex not before.Dave Airlie2015-12-021-18/+24
| | | | | | | The docs say we should send the emit after the ring writes, so lets do that and not have an ALU in between. Signed-off-by: Dave Airlie <[email protected]>
* r600: add alu + cf nop to copy shader on r600Dave Airlie2015-12-021-0/+10
| | | | | | | SB suggests we do this for r600, so lets do it, for the copy shader. Signed-off-by: Dave Airlie <[email protected]>
* r600: SMX returns CONTEXT_DONE early workaroundDave Airlie2015-12-022-1/+13
| | | | | | | | | streamout, gs rings bug on certain r600s, requires a wait idle before each surface sync. Reviewed-by: Marek Olšák <[email protected]> Cc: "10.6 11.0 11.1" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: do SQ flush ES ring rolling workaroundDave Airlie2015-12-023-1/+8
| | | | | | | | | Need to insert a SQ_NON_EVENT when ever geometry shaders are enabled. Reviewed-by: Marek Olšák <[email protected]> Cc: "10.6 11.0 11.1" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* nv50,nvc0: allow to create resources other than buffersSamuel Pitoiset2015-12-015-4/+9
| | | | | | | | | | | | For the compute support, we might stick buffers as surfaces. This fixes an assertion when executing src/gallium/tests/trivial/compute. To avoid using these "restricted" surfaces as render targets, these assertions have been moved. Note that it's already handled for the framebuffer thing on nvc0. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* r600: workaround empty geom shader.Dave Airlie2015-12-011-0/+5
| | | | | | | | | | We need to emit at least one cut/emit in every geometry shader, the easiest workaround it to stick a single CUT at the top of each geom shader. Reviewed-by: Marek Olšák <[email protected]> Cc: "10.6 11.0 11.1" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: rv670 use at least 16es/gs threadsDave Airlie2015-12-011-4/+5
| | | | | | | | This is specified in the docs for rv670 to work properly. Reviewed-by: Marek Olšák <[email protected]> Cc: "10.6 11.0 11.1" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: geometry shader gsvs itemsize workaroundDave Airlie2015-12-011-0/+20
| | | | | | | | | | On some chips the GSVS itemsize needs to be aligned to a cacheline size. This only applies to some of the r600 family chips. Reviewed-by: Marek Olšák <[email protected]> Cc: "10.6 11.0 11.1" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/shader: split address get out to a function.Dave Airlie2015-12-011-1/+6
| | | | | | | This will be used in the tess shaders. Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: move per-type settings into a switch statementDave Airlie2015-11-301-5/+13
| | | | | | | This will allow adding tess stuff much cleaner later. Reviewed-by: Glenn Kennard <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: split out common alu_writes pattern.Dave Airlie2015-11-301-7/+12
| | | | | | | | This just splits out a common pattern into an inline function to make things cleaner to read. Reviewed-by: Glenn Kennard <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/llvm: fix r600/llvm buildDave Airlie2015-11-301-1/+1
| | | | | | Reported on irc by gryffus Signed-off-by: Dave Airlie <[email protected]>
* r600: fixes for register definitions.Dave Airlie2015-11-301-3/+3
| | | | | | Forgot to add these. Signed-off-by: Dave Airlie <[email protected]>
* r600: add missing register to initial stateDave Airlie2015-11-303-7/+15
| | | | | | | | | | We really should initialise HS/LS_2 and SQ_LDS_ALLOC exists on all evergreen not just cayman, so we should initialise it as well. Reviewed-by: Glenn Kennard <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: define registers required for tessellationDave Airlie2015-11-302-27/+113
| | | | | | | | | This adds the defines for a bunch of registers and shader values that are required to implement tessellation. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Glenn Kennard <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: consolidate clip state updatesDave Airlie2015-11-302-17/+16
| | | | | | | | Move some common code into one place, tess will also need to use this function. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* nv50/ir: always display the opcode number for unknown instructionsSamuel Pitoiset2015-11-292-2/+2
| | | | | | | This helps in debugging unknown instructions. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* radeon: only suspend queries on flush if they haven't been suspended yetNicolai Hähnle2015-11-282-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | Non-timer queries are suspended during blits. When the blits end, the queries are resumed, but this resume operation itself might run out of CS space and trigger a flush. When this happens, we must prevent a duplicate suspend during preflush suspend, and we must also prevent a duplicate resume when the CS flush returns back to the original resume operation. This fixes a regression that was introduced by: commit 8a125afa6e88a3eeddba8c7fdc1a75c9b99d5489 Author: Nicolai Hähnle <[email protected]> Date: Wed Nov 18 18:40:22 2015 +0100 radeon: ensure that timing/profiling queries are suspended on flush The queries_suspended_for_flush flag is redundant because suspended queries are not removed from their respective linked list. Reviewed-by: Marek Olšák <[email protected]> Reported-by: Axel Davy <[email protected]> Cc: "11.1" <[email protected]> Tested-by: Axel Davy <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* svga: Don't return value from void function.Jose Fonseca2015-11-271-1/+2
| | | | | | | Addresses MSVC warning C4098: 'svga_destroy_query' : 'void' function returning a value. Reviewed-by: Roland Scheidegger <[email protected]>
* freedreno/ir3: assign varying locations laterRob Clark2015-11-264-29/+37
| | | | | | | | | | | | | Rather than assigning inloc up front, when we don't yet know if it will be unused, assign it last thing before the legalize pass. Also, realize when inputs are unused (since for frag shader's we can't rely on them being removed from ir->inputs[]). This doesn't make sense if we don't also dynamically assign the inloc's, since we could end up telling the hw the wrong # of varyings (since we currently assume that the # of varyings and max-inloc are related..) Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: use instr flag to mark unused instructionsRob Clark2015-11-264-14/+24
| | | | | | Rather than magic depth value, which won't be available in later stages. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: rework vinterp/vpsreplRob Clark2015-11-261-12/+36
| | | | | | Same as previous commit, for a4xx. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: rework vinterp/vpsreplRob Clark2015-11-261-12/+37
| | | | | | | | | | | | Make the interpolation / point-sprite replacement mode setup deal with varying packing. In a later commit, we switch to packing just the varying components that are actually used by the frag shader, so we won't be able to assume everything is vec4's aligned to vec4. Which would highly confuse the previous vinterp/vpsrepl logic. Signed-off-by: Rob Clark <[email protected]>
* radeon: use PIPE_DRIVER_QUERY_FLAG_DONT_LIST for perfcountersNicolai Hähnle2015-11-261-0/+2
| | | | | | | | Since the query names are not very enlightening, and there are thousands of them, GALLIUM_HUD=help should only show the first and last query name for each hardware block. Reviewed-by: Marek Olšák <[email protected]>
* radeon: delay the generation of driver query names until first useNicolai Hähnle2015-11-263-104/+113
| | | | | | | This shaves a bit more time off the startup of programs that don't actually use performance counters. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/compute: Use the compiler's COMPUTE_PGM_RSRC* register valuesTom Stellard2015-11-252-31/+7
| | | | | | | | | The compiler has more information and is able to optimize the bits it sets in these registers. Reviewed-by: Marek Olšák <[email protected]> CC: <[email protected]>
* radeonsi: Rename si_shader::ls_rsrc{1,2} to si_shader::rsrc{1,2}Tom Stellard2015-11-253-6/+6
| | | | | | In the future, these will be used by other shaders types. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: implement AMD_performance_monitor for CIK+Nicolai Hähnle2015-11-2510-3/+1486
| | | | | | | | | | | | | | | | | | Expose most of the performance counter groups that are exposed by Catalyst. Ideally, the driver will work with GPUPerfStudio at some point, but we are not quite there yet. In any case, this is the reason for grouping multiple instances of hardware blocks in the way it is implemented. The counters can also be shown using the Gallium HUD. If one is interested to see how work is distributed across multiple shader engines, one can set the environment variable RADEON_PC_SEPARATE_SE=1 to obtain finer-grained performance counter groups. Part of the implementation is in radeon because an implementation for older hardware would largely follow along the same lines, but exposing a different set of blocks which are programmed slightly differently. Reviewed-by: Marek Olšák <[email protected]>
* radeon: scale query buffer size to result sizeNicolai Hähnle2015-11-251-1/+1
| | | | | | | Performance monitor queries can become very big, especially considering that instances of a block in different shader engines are queried separately. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/sid: add performance counter registersNicolai Hähnle2015-11-251-0/+1013
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/sid: add hardware constants for COPY_DATA packetNicolai Hähnle2015-11-251-0/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeon: extend CIK_UCONFIG_REG_END for performance countersNicolai Hähnle2015-11-252-2/+2
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeon: add perfcounter-related EVENT_TYPEsNicolai Hähnle2015-11-251-0/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeon: additional constants for WAIT_REG_MEM and EVENT_WRITE_EOPNicolai Hähnle2015-11-251-0/+8
| | | | Reviewed-by: Marek Olšák <[email protected]>
* nouveau: move interlaced assert down in nouveau_vp3_video_buffer_createJulien Isorce2015-11-251-1/+1
| | | | | | | | templat->interlaced is 0 if not NV12 which is the case currently when using VPP. Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* softpipe/llvmpipe: don't advertize support for ASTCRoland Scheidegger2015-11-242-2/+4
| | | | | | | | | 33339775565154040e0c4ea2e196217dccc08cdf added support for ASTC textures to gallium. They don't have any helpers hooked up for software decoding, however, so cannot support them in drivers relying on util code for decoding. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* llvmpipe: don't test for unsupported formats in lp_test_formatRoland Scheidegger2015-11-241-0/+12
| | | | | | | | | | | | | Removing the fake format helpers (1c7d0a6aa4f5cb38af7e281e1e5437cd1a20f781) caused this to fail. These formats were never supported, but previously they would have asserted in the generated jit functions (which, due to lack of test cases for these formats, were never called) whereas we now assert when trying to build the jit function. So, skip them completely. This fixes https://bugs.freedesktop.org/show_bug.cgi?id=93092 Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* radeon/llvm: Use llvm.AMDIL.exp intrinsic again for nowMichel Dänzer2015-11-241-1/+1
| | | | | | | | llvm.exp2.f32 doesn't work in some cases yet. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709 Reviewed-by: Nicolai Hähnle <[email protected]>