path: root/src/gallium
Commit message | Author | Age | Files | Lines
* swr: [rasterizer fetch] additional fetch format support | Tim Rowley | 2016-08-04 | 1 | -3/+15
| Add support for 0 pitch in fetch.
| Add support for USCALE/SSCALE for 32bit integer fetches.
| Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] fix potential jit exit crash | Tim Rowley | 2016-08-04 | 1 | -1/+6
| Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] update sync handling | Tim Rowley | 2016-08-04 | 5 | -15/+15
| Sync now uses a callback to ensure that it's called by the last thread moving past a DC. This will help with the new counter handling.
| Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] rename variable | Tim Rowley | 2016-08-04 | 1 | -7/+7
| Avoid nested declarations of the same name within a single function.
| Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] adjust extern "C" block scope | Tim Rowley | 2016-08-04 | 1 | -3/+5
| Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] conservative rast degenerate handling | Tim Rowley | 2016-08-04 | 5 | -144/+332
| Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] allow hexadecimal for integer knobs | Tim Rowley | 2016-08-04 | 1 | -3/+6
| Signed-off-by: Tim Rowley <[email protected]>
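Note: the commit above only says that integer knobs may now be written in hexadecimal. One way to get that behaviour in C is to parse with base 0, as in the sketch below. The helper name `parse_int_knob` and its error policy are assumptions for illustration, not the swr implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not the swr code: parse an integer knob value.
 * Base 0 makes strtoul() accept decimal ("16"), hexadecimal ("0x10")
 * and octal ("020") spellings. Returns false on malformed input. */
static bool parse_int_knob(const char *text, uint32_t *out)
{
    char *end = NULL;
    unsigned long value;

    assert(text != NULL && out != NULL);

    errno = 0;
    value = strtoul(text, &end, 0);

    /* Reject range errors, empty input, trailing junk and oversized values. */
    if (errno != 0 || end == text || *end != '\0' || value > UINT32_MAX)
        return false;

    *out = (uint32_t)value;
    return true;
}
```

A side effect of base 0 worth keeping in mind: a leading zero selects octal, so "010" parses as 8.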
* vc4: Move scalarizing and some lowering to link time. | Eric Anholt | 2016-08-04 | 1 | -5/+12
| This works out to be a wash in terms of memory usage: We use more memory to store the separate ALU instructions, but we optimize out a lot of code as well.
| The main result, though, is that we do more of our work at link time rather than draw time.
* vc4: Avoid VS shader recompiles by keeping a set of FS inputs seen so far. | Eric Anholt | 2016-08-04 | 3 | -25/+81
| We don't want to bake the whole array into the FS key, because of the hashing overhead. But we can keep a set of the arrays seen, and use a pointer to the copy in the set as the array's proxy.
| Between this and the previous patch, gl-1.0-blend-func now passes on hardware, where previously it was filling the 256MB CMA area with shaders and OOMing.
| Drops 712 shaders from shader-db.
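Note: the idea described above (keep a set of the arrays seen and use the pointer to the stored copy as a proxy) is a form of interning. The sketch below shows the general technique only. The names `input_list`, `input_set` and `input_set_intern` are hypothetical, and a linear scan stands in for whatever set the driver actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SLOTS 8   /* illustrative bound, not a hardware limit */

struct input_list {
    uint32_t num_slots;
    uint8_t slots[MAX_SLOTS];
};

struct input_set {
    struct input_list **entries;
    size_t count;
};

/* Return the canonical stored copy of *key, adding it on first sight.
 * Equal contents always yield the same pointer, so a shader key can hold
 * just the pointer and compare or hash it cheaply.
 * Returns NULL if memory runs out. */
static const struct input_list *
input_set_intern(struct input_set *set, const struct input_list *key)
{
    struct input_list **grown;
    struct input_list *copy;
    size_t i;

    assert(key->num_slots <= MAX_SLOTS);

    for (i = 0; i < set->count; i++) {
        const struct input_list *e = set->entries[i];
        if (e->num_slots == key->num_slots &&
            memcmp(e->slots, key->slots, key->num_slots) == 0)
            return e;
    }

    grown = realloc(set->entries, (set->count + 1) * sizeof(*grown));
    if (!grown)
        return NULL;
    set->entries = grown;

    copy = calloc(1, sizeof(*copy));
    if (!copy)
        return NULL;
    copy->num_slots = key->num_slots;
    memcpy(copy->slots, key->slots, key->num_slots);

    set->entries[set->count++] = copy;
    return copy;
}
```

The trade-off is that the set only grows for the lifetime of its owner; that is usually acceptable when the number of distinct arrays is small compared with the number of lookups.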
* vc4: Don't recompile the CS when the FS changes. | Eric Anholt | 2016-08-04 | 1 | -0/+2
| The compiled_fs_id is a proxy for the vc4->prog.fs->input_slots[], but only the VS dereferences it.
| Drops 754 shaders from shader-db.
* vc4: Move FS inputs setup out to a helper function. | Eric Anholt | 2016-08-04 | 1 | -34/+41
| It's a pretty big block, and I was about to make it bigger.
* vl/dri3: Destroy Present event context when destroying drawable v2 | Michel Dänzer | 2016-08-04 | 1 | -5/+16
| Without this, the X server may accumulate stale Present event contexts if a client performs several video decoding sessions using the same window.
| v2: Based on Chris Wilson's review:
| * Use xcb_discard_reply() instead of free(xcb_request_check())
| Reviewed-and-Tested-by: Leo Liu <[email protected]>
* vc4: Avoid generating a custom shader per level in glGenerateMipmaps(). | Eric Anholt | 2016-08-03 | 3 | -7/+25
| We were baking in the LOD of the source level to each shader. Instead, pass it in as a uniform -- this requires storing it to a temp register, but that's better than compiling a ton of separate shaders:
| total instructions in shared programs: 115032 -> 115036 (0.00%)
| instructions in affected programs: 96 -> 100 (4.17%)
| LOST: 572
* vc4: Tell valgrind about BO allocations from mmap time to destroy. | Eric Anholt | 2016-08-03 | 2 | -0/+11
| This helps in debugging memory pressure. It would be nice if we could tell valgrind about it all the way from allocation time to destroy, but we need a pointer to hand to VALGRIND_MALLOCLIKE_BLOCK.
* vc4: Fix a leak of the src[] array of VPM reads in optimization. | Eric Anholt | 2016-08-03 | 1 | -4/+5
| Cc: "12.0" <[email protected]>
* vc4: Fix leak of the bo_handles table. | Eric Anholt | 2016-08-03 | 1 | -0/+1
|
* vc4: Fix handling of UBO range offsets. | Eric Anholt | 2016-08-03 | 1 | -2/+3
| The ranges are in units of bytes, not dwords. This wasn't caught by piglit tests because ttn tends to make one big uniform file, so we only had one UBO range with a src and dst offset of 0.
* vc4: Dump NIR at shader state creation time as well. | Eric Anholt | 2016-08-03 | 1 | -0/+8
| I keep wanting to see this version of the NIR.
* r600g: use last_gfx_fence like radeonsi | Marek Olšák | 2016-08-03 | 1 | -3/+12
| Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: move last_gfx_fence from radeonsi to common code | Marek Olšák | 2016-08-03 | 5 | -7/+7
| Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: skip unnecessary si_update_shaders calls | Marek Olšák | 2016-08-03 | 4 | -7/+27
| Small decrease in draw call overhead.
| Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: print the command line to VM fault reports (v2) | Marek Olšák | 2016-08-03 | 1 | -0/+3
| v2: rebase on top of Brian's commit
| Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: print the command line to all logs (v2) | Marek Olšák | 2016-08-03 | 1 | -0/+4
| for piglit with the pipelined hang detection mode
| v2: rebase on top of Brian's commit
| Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: don't use fmemopen on non-Linux OS | Marek Olšák | 2016-08-03 | 1 | -0/+5
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97140
| Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't set the last parameter component of llvm.AMDGPU.cube | Marek Olšák | 2016-08-03 | 1 | -2/+8
| LLVM doesn't use it.
| Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use llvm.amdgcn.cube* if available | Marek Olšák | 2016-08-03 | 1 | -4/+28
| Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use llvm.amdgcn.rsq.f64 if available | Marek Olšák | 2016-08-03 | 1 | -1/+2
| Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use v_mad_f32 for fma | Marek Olšák | 2016-08-03 | 1 | -2/+2
| v_fma_f32 runs at FP64 rate (= slow).
| Alien Isolation and F1 2015 seem to use fma for all d3d multiply-add instructions, which is silly. This tries to restore performance for those games.
| The main difference between v_mad_f32 and v_fma_f32 is that v_mad doesn't support denormals, which we don't enable anyway, because they are slow too.
| Also, there is code size reduction:
| Totals from affected shaders:
| VGPRS: 109796 -> 109808 (0.01 %)
| Spilled SGPRs: 29995 -> 30022 (0.09 %)
| Spilled VGPRs: 12 -> 13 (8.33 %) <-- it's just one shader going from 12 to 13
| Code Size: 6667596 -> 6476356 (-2.87 %) bytes
| Max Waves: 26931 -> 26899 (-0.12 %)
| I've not actually tested real performance.
| Reviewed-by: Nicolai Hähnle <[email protected]>
* swr: build swr with -fno-strict-aliasing | Tim Rowley | 2016-08-02 | 1 | -0/+1
| swr rasterizer contains numerous data transfers between vectors and ordinary C types. Fixing for strict aliasing will take time.
| Reviewed-by: Matt Turner <[email protected]>
* gallium/util: fix align64 | Marek Olšák | 2016-08-01 | 1 | -1/+1
| it cut off the upper 32 bits
| Cc: [email protected]
| Reviewed-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Edward O'Callaghan <[email protected]>
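Note: a typical way for a 64-bit align helper to lose the upper 32 bits is to build the mask in 32-bit arithmetic. The sketch below contrasts that shape with a widened mask. Both function names are hypothetical, the code assumes a 32-bit `unsigned`, and it is not claimed to match the actual one-line change.

```c
#include <assert.h>
#include <stdint.h>

/* Problematic shape: ~(alignment - 1) is evaluated as a 32-bit unsigned
 * value and then zero-extended, so the AND clears bits 32..63 of the sum. */
static uint64_t align64_truncating(uint64_t value, unsigned alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Widening the alignment before negating keeps the upper mask bits set. */
static uint64_t align64_widened(uint64_t value, unsigned alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~((uint64_t)alignment - 1);
}
```

For example, aligning 0x100000001 to 16 gives 0x10 with the first helper and 0x100000010 with the second.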
* draw: Avoid aliasing violations. | Matt Turner | 2016-08-01 | 2 | -3/+6
| Reviewed-by: Marek Olšák <[email protected]>
* r600g: Avoid aliasing violations. | Matt Turner | 2016-08-01 | 2 | -13/+9
| Reviewed-by: Marek Olšák <[email protected]>
* r300g: Avoid aliasing violation. | Matt Turner | 2016-08-01 | 1 | -1/+2
| Reviewed-by: Marek Olšák <[email protected]>
* gallium/auxiliary: Add u_bitcast.h header. | Matt Turner | 2016-08-01 | 2 | -0/+58
| Reviewed-by: Marek Olšák <[email protected]>
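Note: the aliasing commits above do not describe their approach in the log. A common strict-aliasing-safe way to reinterpret bits in C is to copy them with memcpy() instead of casting pointers, sketched below. The names `bitcast_f2u` and `bitcast_u2f` are illustrative and are not asserted to be the contents of u_bitcast.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Reinterpret the bits of a float as an unsigned integer. memcpy() is
 * defined behaviour here, unlike *(uint32_t *)&f, which violates the
 * strict aliasing rules. Compilers typically turn it into a register move. */
static uint32_t bitcast_f2u(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

/* The inverse: reinterpret an unsigned integer's bits as a float. */
static float bitcast_u2f(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}
```

With IEEE 754 single precision, bitcast_f2u(1.0f) yields 0x3f800000.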
* auxiliary/os: add new os_get_command_line() function | Brian Paul | 2016-08-01 | 2 | -0/+52
| This can be used by the driver to get the command line which started the process. Will be used by the VMware driver for extra logging.
| For now, this is only implemented for Linux via /proc/self/cmdline and Windows via GetCommandLine().
| Reviewed-by: Charmaine Lee <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
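Note: on Linux, /proc/self/cmdline holds the process arguments separated by NUL bytes. The sketch below shows one way to turn that into a printable string. The function name `read_command_line` and its signature are assumptions for illustration and are not the os_get_command_line() code; the Windows path is not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Linux-only sketch: copy the command line into cmdline[size], with the
 * NUL separators between arguments replaced by spaces. Long command lines
 * are truncated to fit. Returns false if nothing could be read. */
static bool read_command_line(char *cmdline, size_t size)
{
    FILE *f;
    size_t n, i;

    assert(cmdline != NULL);
    if (size == 0)
        return false;

    f = fopen("/proc/self/cmdline", "r");
    if (!f)
        return false;
    n = fread(cmdline, 1, size - 1, f);
    fclose(f);

    /* Drop the terminator of the last argument, then join the rest. */
    while (n > 0 && cmdline[n - 1] == '\0')
        n--;
    for (i = 0; i < n; i++) {
        if (cmdline[i] == '\0')
            cmdline[i] = ' ';
    }
    cmdline[n] = '\0';

    return n > 0;
}
```

Joining with spaces is lossy for arguments that themselves contain spaces, which is usually fine for logging.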
* svga: avoid redundant SetVertexBuffer/SetIndexBuffer commands at rebind | Charmaine Lee | 2016-08-01 | 1 | -16/+19
| This patch eliminates the redundant SetVertexBuffers and SetIndexBuffer commands that are emitted for rebind purpose. With this patch, the set commands will be skipped, but we will still reference the associated resources to allow the kernel to bring in the resources.
| Tested with Lightsmark2008, Valley, MTT glretrace, piglit, conform.
| Reviewed-by: Brian Paul <[email protected]>
* u_vbuf: fix potentially bogus assert | Rob Clark | 2016-08-01 | 1 | -2/+4
| There are cases where we hit u_vbuf path due to alignment or pitch-alignment restrictions, but for an output-format that u_vbuf does not support translating (yet the driver does support natively). In which case we hit the memcpy() path and don't care that u_vbuf doesn't understand it.
| Fixes crash with debug build of mesa in:
| dEQP-GLES3.functional.vertex_arrays.single_attribute.strides.fixed.user_ptr_stride17_components2_quads1
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95000
| Signed-off-by: Rob Clark <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
* vc4: Zero-initialize the hardware sampler view structure. | Eric Anholt | 2016-07-31 | 1 | -1/+1
| Fixes failure to initialize the force_first_level flag, causing failures in piglit levelclamp.
* Revert "gallium/util: fix resource leak" | Roland Scheidegger | 2016-07-30 | 1 | -2/+0
| This reverts commit d1fe26a62862f4e47a799222dca1bc1dc14ca4af.
| Replacing a resource leak with a segfault isn't the solution.
* gallium/util: fix resource leak | Eric Engestrom | 2016-07-30 | 1 | -0/+2
| CovID: 401540
| Signed-off-by: Eric Engestrom <[email protected]>
| Signed-off-by: Marek Olšák <[email protected]>
| Reviewed-by: Edward O'Callaghan <[email protected]>
* freedreno/a4xx: fix comparison out of range warnings | [email protected] | 2016-07-30 | 1 | -7/+7
| Signed-off-by: Francesco Ansanelli <[email protected]>
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: fix comparison out of range warnings | [email protected] | 2016-07-30 | 1 | -7/+7
| Signed-off-by: Francesco Ansanelli <[email protected]>
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/a2xx: fix comparison out of range warnings | [email protected] | 2016-07-30 | 1 | -4/+4
| Signed-off-by: Francesco Ansanelli <[email protected]>
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: init ir3_shader_key with memset() | [email protected] | 2016-07-30 | 1 | -1/+2
| To silence the missing initializers warning.
| Signed-off-by: Francesco Ansanelli <[email protected]>
| Signed-off-by: Rob Clark <[email protected]>
* gallium/freedreno: move cast to avoid integer overflow | Eric Engestrom | 2016-07-30 | 1 | -2/+2
| Previously, the bitshift would be performed on a simple int (32 bits on most systems), overflow, and then be cast to 64 bits.
| CovID: 1362461
| Signed-off-by: Eric Engestrom <[email protected]>
| Signed-off-by: Rob Clark <[email protected]>
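Note: the difference between casting after and before a shift can be shown in isolation. The sketch below uses unsigned 32-bit operands so that the narrow shift wraps in a defined way (overflow of a signed int shift would be undefined behaviour). The function names are hypothetical and the code is not taken from the freedreno change.

```c
#include <assert.h>
#include <stdint.h>

/* Shift in 32 bits, then widen: bits shifted past bit 31 are already
 * gone by the time the cast happens. */
static uint64_t shift_then_widen(uint32_t value, unsigned shift)
{
    assert(shift < 32);
    return (uint64_t)(value << shift);
}

/* Widen first, then shift: the shift happens in 64 bits and keeps them. */
static uint64_t widen_then_shift(uint32_t value, unsigned shift)
{
    assert(shift < 64);
    return (uint64_t)value << shift;
}
```

For value 0x12345 and shift 16, the first helper returns 0x23450000 and the second returns 0x123450000.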
* freedreno/a2xx: remove duplicate assignment | Eric Engestrom | 2016-07-30 | 1 | -2/+2
| CovID: 1362445, 1362446
| Signed-off-by: Eric Engestrom <[email protected]>
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: defer flush_queue allocation | Rob Clark | 2016-07-30 | 2 | -2/+4
| Some apps, like warsow, create a bazillion contexts but don't render on most of them.
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: add some hw query traces | Rob Clark | 2016-07-30 | 1 | -0/+16
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: some locking | Rob Clark | 2016-07-30 | 9 | -23/+157
| Signed-off-by: Rob Clark <[email protected]>
* os: add pipe_mutex_assert_locked() | Rob Clark | 2016-07-30 | 1 | -0/+16
| Would be nice if we could also have lockdep, like in the linux kernel. But this is better than nothing.
| Signed-off-by: Rob Clark <[email protected]>
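Note: one common way to build an "assert locked" check on top of pthreads is to try the lock and expect EBUSY. The sketch below illustrates that technique under stated assumptions; `mutex_is_locked` and `mutex_assert_locked` are hypothetical names and this is not claimed to be the pipe_mutex_assert_locked() implementation.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Returns true if some thread currently holds the mutex. It cannot tell
 * which thread that is, and it does not work for recursive mutexes,
 * where trylock succeeds for the owning thread. */
static bool mutex_is_locked(pthread_mutex_t *mutex)
{
    int ret = pthread_mutex_trylock(mutex);

    if (ret == 0) {
        /* Nobody held it: undo the accidental acquisition. */
        pthread_mutex_unlock(mutex);
        return false;
    }
    return ret == EBUSY;
}

/* Debug-build check for functions that require the caller to hold the
 * lock. Compiles to nothing when NDEBUG is defined. */
static void mutex_assert_locked(pthread_mutex_t *mutex)
{
    (void)mutex;
    assert(mutex_is_locked(mutex));
}
```

Compared with a lockdep-style checker, this only catches a lock that nobody holds at the time of the call; it says nothing about lock ordering or ownership.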