summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* gallium/radeon: inline radeon_winsys::query_memory_usageMarek Olšák2016-08-064-15/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon/winsyses: expose per-IB used_vram and used_gart to driversMarek Olšák2016-08-065-25/+24
| | | | | | The following patches will use this. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon/winsyses: print CS submission error numberMarek Olšák2016-08-062-2/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: flush if constant, shader, and streamout buffers use too much memoryMarek Olšák2016-08-061-15/+18
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: flush if sampler views and images use too much memoryMarek Olšák2016-08-062-19/+63
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: deal with high vertex buffer memory usage correctlyMarek Olšák2016-08-063-3/+10
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: take compute shader and dispatch indirect memory usage into accountMarek Olšák2016-08-061-0/+6
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: take scratch buffer and draw indirect memory usage into accountMarek Olšák2016-08-061-0/+6
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: check IB memory usage of CP DMA operationsMarek Olšák2016-08-061-0/+5
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add r600_resource::vram_usage and gart_usageMarek Olšák2016-08-063-12/+19
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* util: Move format_r11g11b10f.h to src/utilJason Ekstrand2016-08-053-234/+1
| | | | | | | It's used from both mesa main and gallium. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* util: Move format_rgb9e5.h to src/utilJason Ekstrand2016-08-054-164/+2
| | | | | | | It's used from both mesa main and gallium. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* swr: [rasterizer core] static analysis fixes for conservative rastTim Rowley2016-08-042-5/+10
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] implement InnerConservative input coverageTim Rowley2016-08-046-182/+357
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] remove CanEarlyZ functionTim Rowley2016-08-041-6/+0
| | | | | | Test is now in SetupPipeline. Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] use 32x32 macrotile for openswrTim Rowley2016-08-041-4/+4
| | | | | | Significant performance increase (up to 2x) on high geometry workloads. Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer fetch] add support for 24bit format fetchTim Rowley2016-08-041-0/+1
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer fetch] additional fetch format supportTim Rowley2016-08-041-3/+15
| | | | | | | | Add support for 0 pitch in fetch. Add support for USCALE/SSCALE for 32bit integer fetches. Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] fix potential jit exit crashTim Rowley2016-08-041-1/+6
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] update sync handlingTim Rowley2016-08-045-15/+15
| | | | | | | | Sync now uses a callback to ensure that it's called by the last thread moving past a DC. This will help with the new counter handling. Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] rename variableTim Rowley2016-08-041-7/+7
| | | | | | Avoid nested declarations of the same name within a single function. Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] adjust extern "C" block scopeTim Rowley2016-08-041-3/+5
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] conservative rast degenerate handlingTim Rowley2016-08-045-144/+332
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] allow hexadecimal for integer knobsTim Rowley2016-08-041-3/+6
| | | | Signed-off-by: Tim Rowley <[email protected]>
* vc4: Move scalarizing and some lowering to link time.Eric Anholt2016-08-041-5/+12
| | | | | | | This works out to be a wash in terms of memory usage: We use more memory to store the separate ALU instructions, but we optimize out a lot of code as well. The main result, though, is that we do more of our work at link time rather than draw time.
* vc4: Avoid VS shader recompiles by keeping a set of FS inputs seen so far.Eric Anholt2016-08-043-25/+81
| | | | | | | | | | | | We don't want to bake the whole array into the FS key, because of the hashing overhead. But we can keep a set of the arrays seen, and use a pointer to the copy in as the array's proxy. Between this and the previous patch, gl-1.0-blend-func now passes on hardware, where previously it was filling the 256MB CMA area with shaders and OOMing. Drops 712 shaders from shader-db.
* vc4: Don't recompile the CS when the FS changes.Eric Anholt2016-08-041-0/+2
| | | | | | | The compiled_fs_id is a proxy for the vc4->prog.fs->input_slots[], but only the VS dereferences it. Drops 754 shaders from shader-db.
* vc4: Move FS inputs setup out to a helper function.Eric Anholt2016-08-041-34/+41
| | | | It's a pretty big block, and I was about to make it bigger.
* vl/dri3: Destroy Present event context when destroying drawable v2Michel Dänzer2016-08-041-5/+16
| | | | | | | | | | | Without this, the X server may accumulate stale Present event contexts if a client performs several video decoding sessions using the same window. v2: Based on Chris Wilson's review: * Use xcb_discard_reply() instead of free(xcb_request_check()) Reviewed-and-Tested-by: Leo Liu <[email protected]>
* vc4: Avoid generating a custom shader per level in glGenerateMipmaps().Eric Anholt2016-08-033-7/+25
| | | | | | | | | | We were baking in the LOD of the source level to each shader. Instead, pass it in as a uniform -- this requires storing it to a temp register, but that's better than compiling a ton of separate shaders: total instructions in shared programs: 115032 -> 115036 (0.00%) instructions in affected programs: 96 -> 100 (4.17%) LOST: 572
* vc4: Tell valgrind about BO allocations from mmap time to destroy.Eric Anholt2016-08-032-0/+11
| | | | | | This helps in debugging memory pressure. It would be nice if we could tell valgrind about it all the way from allocation time to destroy, but we need a pointer to hand to VALGRIND_MALLOCLIKE_BLOCK.
* vc4: Fix a leak of the src[] array of VPM reads in optimization.Eric Anholt2016-08-031-4/+5
| | | | Cc: "12.0" <[email protected]>
* vc4: Fix leak of the bo_handles table.Eric Anholt2016-08-031-0/+1
|
* vc4: Fix handling of UBO range offsets.Eric Anholt2016-08-031-2/+3
| | | | | | The ranges are in units of bytes, not dwords. This wasn't caught by piglit tests because ttn tends to make one big uniform file, so we only had one UBO range with a src and dst offset of 0.
* vc4: Dump NIR at shader state creation time as well.Eric Anholt2016-08-031-0/+8
| | | | I keep wanting to see this version of the NIR.
* r600g: use last_gfx_fence like radeonsiMarek Olšák2016-08-031-3/+12
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: move last_gfx_fence from radeonsi to common codeMarek Olšák2016-08-035-7/+7
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: skip unnecessary si_update_shaders callsMarek Olšák2016-08-034-7/+27
| | | | | | Small decrease in draw call overhead. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: print the command line to VM fault reports (v2)Marek Olšák2016-08-031-0/+3
| | | | | | v2: rebase on top of Brian's commit Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: print the command line to all logs (v2)Marek Olšák2016-08-031-0/+4
| | | | | | | | for piglit with the pipelined hang detection mode v2: rebase on top of Brian's commit Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: don't use fmemopen on non-Linux OSMarek Olšák2016-08-031-0/+5
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97140 Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't set the last parameter component of llvm.AMDGPU.cubeMarek Olšák2016-08-031-2/+8
| | | | | | LLVM doesn't use it. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use llvm.amdgcn.cube* if availableMarek Olšák2016-08-031-4/+28
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use llvm.amdgcn.rsq.f64 if availableMarek Olšák2016-08-031-1/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use v_mad_f32 for fmaMarek Olšák2016-08-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | v_fma_f32 runs at FP64 rate (= slow). Alien Isolation and F1 2015 seem to use fma for all d3d multiply-add instructions, which is silly. This tries to restore performance for those games. The main difference between v_mad_f32 and v_fma_f32 is that v_mad doesn't support denormals, which we don't enable anyway, because they are slow too. Also, there is code size reduction: Totals from affected shaders: VGPRS: 109796 -> 109808 (0.01 %) Spilled SGPRs: 29995 -> 30022 (0.09 %) Spilled VGPRs: 12 -> 13 (8.33 %) <-- it's just one shader going from 12 to 13 Code Size: 6667596 -> 6476356 (-2.87 %) bytes Max Waves: 26931 -> 26899 (-0.12 %) I've not actually tested real performance. Reviewed-by: Nicolai Hähnle <[email protected]>
* swr: build swr with -fno-strict-aliasingTim Rowley2016-08-021-0/+1
| | | | | | | swr rasterizer contains numerous data transfers between vectors and ordinary C types. Fixing for strict aliasing will take time. Reviewed-by: Matt Turner <[email protected]>
* gallium/util: fix align64Marek Olšák2016-08-011-1/+1
| | | | | | | | it cut off the upper 32 bits Cc: [email protected] Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* draw: Avoid aliasing violations.Matt Turner2016-08-012-3/+6
| | | | Reviewed-by: Marek Olšák <[email protected]>
* r600g: Avoid aliasing violations.Matt Turner2016-08-012-13/+9
| | | | Reviewed-by: Marek Olšák <[email protected]>
* r300g: Avoid aliasing violation.Matt Turner2016-08-011-1/+2
| | | | Reviewed-by: Marek Olšák <[email protected]>