path: root/src/gallium/drivers
Commit message | Author | Age | Files | Lines
* radeon/llvm: Use amdgcn triple for SI+ on LLVM >= 3.6Tom Stellard2015-01-064-16/+27
* radeonsi: Cache LLVMTargetMachine object in si_screenTom Stellard2015-01-066-26/+51
| Rather than building a new one every compile. This should reduce some of the overhead of compiling shaders.
|
| One consequence of this change is that we lose the MachineInstrs dumps when dumping the shaders via R600_DEBUG. The LLVM IR and assembly are still dumped, and if you still want to see the MachineInstr dump, you can run the dumped LLVM IR through llc.
* nvc0: add name to magic numberIlia Mirkin2015-01-051-2/+2
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: regenerate rnndb headersIlia Mirkin2015-01-0517-837/+1157
| The headers hadn't been regenerated in a long time and had seen a number of manual modifications. A few changes:
| - remove nvc0_2d entirely, use the nv50 header which has the nvc0 values too
| - remove 3ddefs, it's identical to the nv50 file
| - move macros out into a separate file
|
| Also the upstream rnndb changed the overall chip naming convention; this was fixed up manually in the generated files until a better solution is determined.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
* nv50: regenerate rnndb headersIlia Mirkin2015-01-0511-358/+451
| The headers hadn't been regenerated in a long time, and there were a few minor divergences. Among other things, rnndb has changed naming to G80/etc; for now I have not tackled switching that over and have manually replaced the NVIDIA codenames with the chip ids. No other modifications were made to the headergen'd headers.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
* nv50: enable texture compressionTobias Klausmann2015-01-052-3/+26
| | | | | | | | | Compression seems to be supported for only some formats. Enable it for those. Previously this was disabled for everything despite the code looking like it was actually enabled. Signed-off-by: Tobias Klausmann <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: enable sat modifier for OP_SUBIlia Mirkin2015-01-051-1/+1
| | | | | | | SUB is handled the same as ADD, so no reason not to allow a saturate modifier on it. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: Add sat modifier for mulRoy Spliet2015-01-052-1/+7
| | | | | Signed-off-by: Roy Spliet <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: avoid doing work inside of an assertIlia Mirkin2015-01-052-2/+4
| | | | | | | | assert is compiled out in release builds - don't put logic into it. Note that this particular instance is only used for vp debugging and is normally compiled out. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix texture offsets in release buildsIlia Mirkin2015-01-052-2/+4
| Asserts get compiled out in release builds, so they can't be relied upon to perform logic.
|
| Reported-by: Pierre Moreau <[email protected]>
| Signed-off-by: Ilia Mirkin <[email protected]>
| Tested-by: Roy Spliet <[email protected]>
| Cc: "10.2 10.3 10.4" <[email protected]>
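The pitfall described in the two commits above can be sketched in a few lines of C. This is an illustration only, not the nv50 code: `toy_take_slot`, `toy_reserve_broken`, and `toy_reserve_fixed` are hypothetical names. When `NDEBUG` is defined, the argument of `assert()` is not evaluated at all, so any side effect placed inside it silently disappears in release builds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper with a side effect: consumes one slot if available. */
bool toy_take_slot(int *slots_left)
{
    if (*slots_left <= 0)
        return false;
    (*slots_left)--;
    return true;
}

/* Broken pattern: with NDEBUG defined the call is compiled out together
 * with the assert, so no slot is ever taken in a release build. */
void toy_reserve_broken(int *slots_left)
{
    assert(toy_take_slot(slots_left));
}

/* Fixed pattern: do the work unconditionally, assert only on the result. */
void toy_reserve_fixed(int *slots_left)
{
    bool ok = toy_take_slot(slots_left);
    assert(ok);
    (void)ok; /* avoids an unused-variable warning when NDEBUG is defined */
}
```

With the fixed variant the slot count drops by one in both debug and release builds; with the broken variant it only drops when assertions are enabled.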
* r300g: handle vertex format PIPE_FORMAT_NONEMarek Olšák2015-01-041-2/+11
* nv50/ir: Fold sat into madRoy Spliet2015-01-011-1/+1
| | | | | | | | | The mad instruction emitter already supported the saturate modifier, but the ModifierFolding pass never tried folding cvt sat operations in for NV50. Signed-off-by: Roy Spliet <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: fold MAD when one of the multiplicands is constIlia Mirkin2015-01-011-0/+23
| Fold MAD dst, src0, immed, src2 (or src0/immed swapped) when
| - immed = 0 -> MOV dst, src2
| - immed = +/- 1 -> ADD dst, src0, src2
|
| These types of MAD patterns were observed in some st/nine shaders.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
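The folding rule in this commit message can be sketched on a toy instruction representation. Everything below (`toy_insn`, `toy_src`, `toy_fold_mad`) is hypothetical and is not the nv50/ir data structure; in particular, handling the -1 case by flipping a negate modifier on the remaining multiplicand is an assumption about how the sign would be preserved.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_op { TOY_MAD, TOY_ADD, TOY_MOV };

/* Toy operand: either an immediate value or a register with an optional
 * negate modifier. */
struct toy_src {
    bool is_imm;
    float imm;
    int reg;
    bool neg;
};

struct toy_insn {
    enum toy_op op;
    int dst;
    struct toy_src src[3];
};

/* Fold MAD dst, a, b, c when a or b is an immediate 0, +1 or -1.
 * Returns true if the instruction was rewritten in place. */
bool toy_fold_mad(struct toy_insn *insn)
{
    assert(insn);
    if (insn->op != TOY_MAD)
        return false;

    int i;
    if (insn->src[1].is_imm)
        i = 1;
    else if (insn->src[0].is_imm)
        i = 0;
    else
        return false;

    float imm = insn->src[i].imm;
    struct toy_src other = insn->src[1 - i];

    if (imm == 0.0f) {
        /* a * 0 + c  ->  MOV dst, c */
        insn->op = TOY_MOV;
        insn->src[0] = insn->src[2];
        return true;
    }
    if (imm == 1.0f || imm == -1.0f) {
        /* a * (+/-1) + c  ->  ADD dst, (+/-)a, c */
        if (imm < 0.0f)
            other.neg = !other.neg;
        insn->op = TOY_ADD;
        insn->src[0] = other;
        insn->src[1] = insn->src[2];
        return true;
    }
    return false;
}
```

Any other immediate value leaves the instruction untouched, which matches the two cases listed in the commit message.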
* radeonsi: fix warningsMarek Olšák2015-01-012-1/+3
* vc4: Fix memory leak as of 0404e7fe0ac2a6234a11290b4b1596e8bc127a4b.Eric Anholt2014-12-311-5/+5
| Can't reset the CL before looking at how much we had put in it.
* nv50,nvc0: set vertex id base to index_biasIlia Mirkin2014-12-305-7/+35
| Fixes the piglits which check that gl_VertexID includes the base vertex offset:
|   arb_draw_indirect-vertexid elements
|   gl-3.2-basevertex-vertexid
|
| Note that this leaves out the original G80, for which this will continue to fail. It could be fixed by passing a driver constbuf value in, but that's beyond the scope of this change.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
| Cc: "10.3 10.4" <[email protected]>
* nv50,nvc0: implement half_pixel_centerTiziano Bacocco2014-12-308-14/+11
| | | | | | | | | | LAST_LINE_PIXEL has actually been renamed to PIXEL_CENTER_INTEGER in rnndb; use that method to implement the rasterizer setting, used for st/nine. Signed-off-by: Tiziano Bacocco <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: "10.4" <[email protected]>
* vc4: Only render tiles where the scissor ever intersected them.Eric Anholt2014-12-304-10/+52
| | | | | This gives a 2.7x improvement in x11perf -rect100, since we only end up load/storing the x11perf window, not the whole screen.
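One way to picture the idea in this commit is to accumulate the union of every draw's scissor rectangle and convert it to a tile range at submit time. The sketch below is an assumption-laden illustration, not the vc4 code: the `toy_` names are hypothetical, and the tile dimensions are passed in by the caller rather than taken from the hardware.

```c
#include <assert.h>
#include <limits.h>

/* Pixel-space bounds of everything drawn so far; max_* are exclusive. */
struct toy_draw_bounds {
    unsigned min_x, min_y, max_x, max_y;
};

/* Start with an empty region. */
void toy_bounds_reset(struct toy_draw_bounds *b)
{
    b->min_x = b->min_y = UINT_MAX;
    b->max_x = b->max_y = 0;
}

/* Grow the region to include one draw's scissor rectangle [x0,x1) x [y0,y1). */
void toy_bounds_add(struct toy_draw_bounds *b,
                    unsigned x0, unsigned y0, unsigned x1, unsigned y1)
{
    if (x0 < b->min_x) b->min_x = x0;
    if (y0 < b->min_y) b->min_y = y0;
    if (x1 > b->max_x) b->max_x = x1;
    if (y1 > b->max_y) b->max_y = y1;
}

/* Convert the pixel bounds to an inclusive tile range.
 * Returns 0 if nothing was drawn, 1 otherwise. */
int toy_bounds_to_tiles(const struct toy_draw_bounds *b,
                        unsigned tile_w, unsigned tile_h,
                        unsigned *tx0, unsigned *ty0,
                        unsigned *tx1, unsigned *ty1)
{
    assert(tile_w > 0 && tile_h > 0);
    if (b->min_x >= b->max_x || b->min_y >= b->max_y)
        return 0;
    *tx0 = b->min_x / tile_w;
    *ty0 = b->min_y / tile_h;
    *tx1 = (b->max_x - 1) / tile_w;
    *ty1 = (b->max_y - 1) / tile_h;
    return 1;
}
```

Only tiles inside the returned range would then need their load/store work emitted, which is where the saving for a small window on a large screen would come from.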
* vc4: Move draw call reset handling to a helper function.Eric Anholt2014-12-301-23/+31
| | | | | | This will be more important in the next commit, when there's more state to reset to nonzero values, and I want an early exit from the submit function.
* vc4: Drop the content of vc4_flush_resource().Eric Anholt2014-12-301-4/+4
| | | | | The callers all follow it with a flush of the context, and the flush of the context gives us more information about how things are being flushed.
* vc4: Handle unaligned accesses in CL emits.Eric Anholt2014-12-252-26/+78
| As of 229bf4475ff0a5dbeb9bc95250f7a40a983c2e28 we started getting SIGBUS from unaligned accesses on the hardware, for reasons I haven't figured out. However, we should be avoiding unaligned accesses anyway, and our CL setup certainly would have produced them.
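A common way to avoid unaligned stores when packing variable-width fields into a byte stream is to go through `memcpy`, which lets the compiler choose an access sequence that is valid at any alignment. The sketch below shows that general technique under that assumption; the helper names are hypothetical and this is not necessarily how the vc4 CL emit functions were changed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value at *cursor, which may have any alignment, and
 * advance the cursor past it. */
void toy_put_unaligned_u32(uint8_t **cursor, uint32_t value)
{
    assert(cursor && *cursor);
    memcpy(*cursor, &value, sizeof(value));
    *cursor += sizeof(value);
}

/* Read a 32-bit value back from an arbitrarily aligned address. */
uint32_t toy_get_unaligned_u32(const uint8_t *p)
{
    uint32_t value;
    assert(p);
    memcpy(&value, p, sizeof(value));
    return value;
}
```

Casting the byte pointer to `uint32_t *` and dereferencing it would be the pattern that can fault on hardware that enforces alignment.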
* vc4: Don't bother zero-initializing the shader reloc indices.Eric Anholt2014-12-251-2/+2
| | | | | They should all be set to real values by the time they're read, and ideally if you used valgrind you'd see uninitialized value uses.
* vc4: Fix the argument type for cl_u16().Eric Anholt2014-12-251-1/+1
| | | | It doesn't matter, since it just got truncated to 16 inside, anyway.
* radeonsi: Don't modify PA_SC_RASTER_CONFIG register value if rb_mask == 0Michel Dänzer2014-12-251-2/+4
| | | | | | | | | | | E.g. this could happen on older kernels which don't support the RADEON_INFO_SI_BACKEND_ENABLED_MASK query yet. The code in si_write_harvested_raster_configs() doesn't deal with this correctly and would probably mangle the value badly. Cc: "10.4 10.3" <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Tom Stellard <[email protected]>
* vc4: Optimize CL emits by doing size checks up front.Eric Anholt2014-12-245-16/+66
| | | | | | | | The optimizer obviously doesn't have the ability to rewrite these to skip the size checks per call, so we have to do it manually. Improves a norast benchmark on simulation by 0.779706% +/- 0.405838% (n=6087).
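The pattern described here, checking for space once per packet instead of once per field, can be sketched as follows. The `toy_cl` type and function names are hypothetical and the growth policy is an arbitrary choice; this is a sketch of the technique rather than the vc4 implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy command list: a growable byte buffer. */
struct toy_cl {
    uint8_t *base;
    size_t used;
    size_t size;
};

/* Make sure `space` more bytes fit, growing the buffer if needed.
 * Returns 0 on success, -1 on allocation failure. */
int toy_cl_ensure_space(struct toy_cl *cl, size_t space)
{
    if (cl->used + space <= cl->size)
        return 0;
    size_t new_size = cl->size ? cl->size : 64;
    while (new_size < cl->used + space)
        new_size *= 2;
    uint8_t *p = realloc(cl->base, new_size);
    if (!p)
        return -1;
    cl->base = p;
    cl->size = new_size;
    return 0;
}

/* Unchecked emits: the caller must already have reserved the space.
 * The asserts document the contract and vanish in release builds. */
void toy_cl_u8(struct toy_cl *cl, uint8_t v)
{
    assert(cl->used + 1 <= cl->size);
    cl->base[cl->used++] = v;
}

void toy_cl_u32(struct toy_cl *cl, uint32_t v)
{
    assert(cl->used + sizeof(v) <= cl->size);
    memcpy(cl->base + cl->used, &v, sizeof(v));
    cl->used += sizeof(v);
}

/* Emit one hypothetical 5-byte packet with a single up-front size check. */
int toy_emit_packet(struct toy_cl *cl, uint8_t opcode, uint32_t payload)
{
    if (toy_cl_ensure_space(cl, 1 + sizeof(payload)) != 0)
        return -1;
    toy_cl_u8(cl, opcode);
    toy_cl_u32(cl, payload);
    return 0;
}
```

The per-field helpers stay branch-free in release builds, and the only growth check sits at the start of each packet.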
* vc4: Avoid repeated hindex lookups in the loop over tiles.Eric Anholt2014-12-242-15/+24
| | | | | Improves norast performance of a microbenchmark by 11.1865% +/- 2.37673% (n=20).
* freedreno/ir3: split out legalize passRob Clark2014-12-235-154/+214
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: ra debugRob Clark2014-12-233-17/+61
| | | | | | Some compile time RA debug Signed-off-by: Rob Clark <[email protected]>
* radeonsi: force NaNs to 0Marek Olšák2014-12-211-4/+8
| | | | | | | | | This fixes incorrect rendering in Unreal Engine demos. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83510 Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* r300g: implement ARR opcodeDavid Heidelberg2014-12-214-4/+16
| | | | | | | | | | Same as ARL, just has extra rounding. Useful for st/nine. Tested-by: Pavel Ondračka <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: David Heidelberg <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
* freedreno/a4xx: blend-colorRob Clark2014-12-201-0/+13
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: alpha-testRob Clark2014-12-201-0/+2
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2014-12-206-61/+151
* freedreno/ir3: trans_kill cleanupRob Clark2014-12-201-12/+7
| | | | | | | trans_kill() only handles the single opcode. Drop the remnant of a time when both KILL and KILL_IF were handled by the same fxn. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: hack for standalone compilerRob Clark2014-12-201-1/+5
| | | | | | | | | Standalone compiler doesn't have screen or context. We need to come up with a better way to control the target arch (ie. something that we can control from cmdline w/ standalone compiler) but for now this hack keeps it from segfault'ing. Signed-off-by: Rob Clark <[email protected]>
* vc4: Coalesce MOVs into VPM with the instructions generating the values.Eric Anholt2014-12-184-15/+143
| | | | | total instructions in shared programs: 41168 -> 40976 (-0.47%) instructions in affected programs: 18156 -> 17964 (-1.06%)
* vc4: Redefine VPM writes as a (destination) QIR register file.Eric Anholt2014-12-173-7/+19
| | | | | This will let me coalesce the VPM writes into the instructions generating the values.
* vc4: Add support for turning constant uniforms into small immediates.Eric Anholt2014-12-1713-46/+283
| Small immediates have the downside of taking over the raddr B field, so you might have less chance to pack instructions together thanks to raddr B conflicts. However, it also reduces some register pressure since it lets you load 2 "uniform" values in one instruction (avoiding a previous load of the constant value to a register), and increases some pairing for the same reason.
|
| total uniforms in shared programs: 16231 -> 13374 (-17.60%)
| uniforms in affected programs: 10280 -> 7423 (-27.79%)
| total instructions in shared programs: 40795 -> 41168 (0.91%)
| instructions in affected programs: 25551 -> 25924 (1.46%)
|
| In a previous version of this patch I had a reduction in instruction count by forcing the other args alongside a SMALL_IMM to be in the A file or accumulators, but that increases register pressure and had a bug in handling FRAG_Z. In this patch I just use raddr conflict resolution, which is more expensive. I think I'd rather tweak allocation to have some way to slightly prefer good choices for files in general, rather than risk failing to register allocate by forcing things into register classes.
* vc4: Move follow_movs() to common QIR code.Eric Anholt2014-12-173-11/+12
| | | | I want this from other passes.
* vc4: Fix missing newline for load immediate instruction disasm.Eric Anholt2014-12-171-4/+4
* vc4: Add a userspace BO cache.Eric Anholt2014-12-174-4/+175
| | | | | | | | | | Since our kernel BOs require CMA allocation, and the use of them requires new mmaps, it's pretty expensive and we should avoid it if possible. Copying my original design for Intel, make a userspace cache that reuses BOs that haven't been shared to other processes but frees BOs that have sat in the cache for over a second. Improves glxgears framerate on RPi by around 30%.
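The cache policy described above (reuse unshared buffers, drop anything that has sat idle for more than a second) can be sketched with a simple linked list. This is a toy model under stated assumptions: `toy_bo`, the exact-size match, the millisecond clock passed in by the caller, and `calloc`/`free` standing in for kernel allocation are all hypothetical and not taken from the vc4 source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_CACHE_MAX_AGE_MS 1000u

struct toy_bo {
    size_t size;
    uint64_t cached_at_ms; /* when the BO entered the cache */
    bool shared;           /* exported to another process: never cache */
    struct toy_bo *next;
};

struct toy_bo_cache {
    struct toy_bo *head;
};

/* Free every cached BO that has been idle for longer than the max age. */
void toy_cache_expire(struct toy_bo_cache *cache, uint64_t now_ms)
{
    struct toy_bo **link = &cache->head;
    while (*link) {
        struct toy_bo *bo = *link;
        if (now_ms - bo->cached_at_ms > TOY_CACHE_MAX_AGE_MS) {
            *link = bo->next;
            free(bo); /* stands in for the real (expensive) kernel free */
        } else {
            link = &bo->next;
        }
    }
}

/* Return a cached BO of the requested size if one exists, else a new one. */
struct toy_bo *toy_bo_alloc(struct toy_bo_cache *cache, size_t size,
                            uint64_t now_ms)
{
    toy_cache_expire(cache, now_ms);
    for (struct toy_bo **link = &cache->head; *link; link = &(*link)->next) {
        if ((*link)->size == size) {
            struct toy_bo *bo = *link;
            *link = bo->next;
            bo->next = NULL;
            return bo;
        }
    }
    struct toy_bo *bo = calloc(1, sizeof(*bo));
    if (bo)
        bo->size = size;
    return bo;
}

/* Hand a BO back: unshared BOs go into the cache, shared ones are freed. */
void toy_bo_release(struct toy_bo_cache *cache, struct toy_bo *bo,
                    uint64_t now_ms)
{
    assert(bo != NULL);
    if (bo->shared) {
        free(bo);
        return;
    }
    bo->cached_at_ms = now_ms;
    bo->next = cache->head;
    cache->head = bo;
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real driver would read a monotonic clock instead.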
* vc4: Add dmabuf support.Eric Anholt2014-12-173-24/+73
| | | | | | This gets DRI3 working on modesetting with glamor. It's not enabled under simulation, because it looks like handing our dumb-allocated buffers off to the server doesn't actually work for the server's rendering.
* vc4: Drop a weird argument in the BOs-from-handles API.Eric Anholt2014-12-173-7/+5
* vc4: Add support for turning add-based MOVs to muls for pairing.Eric Anholt2014-12-161-2/+49
| | | | | total instructions in shared programs: 43053 -> 40795 (-5.24%) instructions in affected programs: 37996 -> 35738 (-5.94%)
* vc4: Add a helper for changing a field in an instruction.Eric Anholt2014-12-162-11/+12
* vc4: Fix the name of qpu_waddr_ignores_ws().Eric Anholt2014-12-161-5/+5
| | | | We're deciding about the WS bit, not PM.
* vc4: Add support for enabling early Z discards.Eric Anholt2014-12-161-0/+18
| | | | This is the same basic logic from the original Broadcom driver.
* nvc0: add missed PIPE_CAP_VERTEXID_NOBASEIlia Mirkin2014-12-151-0/+1
| | | | | | Commit ade8b26bf missed adding this cap to nvc0. Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add TGSI_SEMANTIC_VERTEXID_NOBASE and TGSI_SEMANTIC_BASEVERTEXRoland Scheidegger2014-12-1612-0/+15
| Plus a new PIPE_CAP_VERTEXID_NOBASE query. The idea is that drivers not supporting vertex ids with base vertex offset applied (so, only support d3d10-style vertex ids) will get such a d3d10-style vertex id instead - with the caveat they'll also need to handle the basevertex system value too (this follows what core mesa already does).
|
| Additionally, this is also useful for other state trackers (for instance llvmpipe / draw right now implement the d3d10 behavior on purpose, but with different semantics it can just do both).
|
| Doesn't do anything yet. And fix up the docs wrt similar values.
|
| v2: incorporate feedback from Brian and others, better names, better docs.
|
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Jose Fonseca <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
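The relationship between the system values named in this commit can be written down as a one-line identity: a vertex id with the base applied equals the d3d10-style id plus the base vertex. The helper below is a hypothetical illustration of that arithmetic, not code from the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Reconstruct a GL-style vertex id (base vertex included) from the
 * d3d10-style id (no base) and the base vertex of the draw. */
int32_t toy_vertex_id(int32_t vertexid_nobase, int32_t basevertex)
{
    return vertexid_nobase + basevertex;
}
```

For example, a vertex whose d3d10-style id is 3 in a draw with a base vertex of 100 would be seen as vertex 103 by a shader that expects the base to be included.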
* r600g/sb: implement r600 gpr index workaround. (v3.1)Dave Airlie2014-12-164-9/+57
| r600, rv610 and rv630 all have a bug in their GPR indexing and how the hw inserts access to PV.
|
| If the base index for the src is the same as the dst gpr in a previous group, then it will use PV instead of using the indexed gpr correctly.
|
| The workaround is to insert a NOP when you detect this.
|
| v2: add second part of fix detecting DST rel writes followed by same src base index reads.
| v3: forget adding stuff to structs, just iterate over the previous node group again, makes it more obvious.
| v3.1: drop local_nop.
|
| Fixes ~200 piglit regressions on rv635 since SB was introduced.
|
| Reviewed-By: Glenn Kennard <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
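The first detection rule described above (a relatively indexed source whose base register matches a destination written by the previous group) can be sketched on a toy model of ALU groups. The structures and names here are hypothetical, not the r600g/sb classes, and the v2 case of relative destination writes is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SLOTS 5

/* Toy ALU instruction: one destination GPR and one source GPR that may
 * be accessed with relative (indexed) addressing. */
struct toy_alu {
    int dst_gpr;
    bool src_rel;
    int src_gpr; /* base register when src_rel is set */
};

struct toy_group {
    struct toy_alu slot[TOY_MAX_SLOTS];
    size_t count;
};

/* Returns true if a NOP should be placed between prev and cur: some
 * indexed source in cur uses a base GPR that prev wrote. */
bool toy_needs_nop(const struct toy_group *prev, const struct toy_group *cur)
{
    assert(prev->count <= TOY_MAX_SLOTS && cur->count <= TOY_MAX_SLOTS);
    for (size_t i = 0; i < cur->count; i++) {
        if (!cur->slot[i].src_rel)
            continue;
        for (size_t j = 0; j < prev->count; j++) {
            if (prev->slot[j].dst_gpr == cur->slot[i].src_gpr)
                return true;
        }
    }
    return false;
}
```

Re-scanning the previous group for each check mirrors the v3 note about iterating over the previous node group instead of caching state in the structures.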