path: root/src
Commit message | Author | Age | Files | Lines
* radeonsi: use tgsi_shader_info in si_shader_ps | Marek Olšák | 2014-10-12 | 3 | -5/+5
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: use tgsi_shader_info in fetch_input_gs | Marek Olšák | 2014-10-12 | 1 | -4/+5
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: don't rely on shader->output in si_llvm_emit_fs_epilogue | Marek Olšák | 2014-10-12 | 1 | -1/+1
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: use tgsi_shader_info in si_llvm_emit_es_epilogue | Marek Olšák | 2014-10-12 | 1 | -17/+5
| | | | | | tgsi_shader_info contains everything we need. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: don't recompile shaders when changing nr_cbufs from 0 to 1 | Marek Olšák | 2014-10-12 | 3 | -4/+4
| | | | | | Both cases are equivalent. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: remove vs.ucps_enabled from the shader key | Marek Olšák | 2014-10-12 | 3 | -15/+0
| | | | | | Written CLIPDIST outputs are simply disabled in PA_CL_VS_OUT_CNTL. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: assume ClipDistance usage mask is always 0xf | Marek Olšák | 2014-10-12 | 2 | -8/+2
| | | | | | | | | | | | No code in Mesa sets the usage mask to any other value. The final mask is AND'ed with enable bits from the rasterizer state anyway. If somebody implements setting usage masks in st/mesa, we can use tgsi_shader_info to get it more easily. This is a prerequisite for the following commit. Reviewed-by: Michel Dänzer <[email protected]>
* clover: Fix unintended fall-through in kernel::argument::bind. | Francisco Jerez | 2014-10-12 | 1 | -0/+3
|
* clover: Append implicit arguments to the kernel argument list. | Jan Vesely | 2014-10-12 | 1 | -13/+29
| | | | | | | [ Francisco Jerez: Split off from a larger patch, and take a slightly different approach for passing the implicit arguments around. ] Reviewed-by: Francisco Jerez <[email protected]>
* clover: Pass execution dimensions and offset to the kernel as implicit arguments. | Francisco Jerez | 2014-10-12 | 2 | -25/+70
| | | | Reviewed-by: Jan Vesely <[email protected]>
* clover: Add semantic information to module::argument for implicit parameter passing. | Francisco Jerez | 2014-10-12 | 1 | -4/+12
| | | | Reviewed-by: Jan Vesely <[email protected]>
* clover: Use unreachable() from util/macros.h instead of assert(0). | Francisco Jerez | 2014-10-11 | 3 | -4/+4
| | | | Reviewed-by: Francisco Jerez <[email protected]>
* gallium: Add tokens for DragonFly BSD. | Vinson Lee | 2014-10-10 | 1 | -0/+6
| | | | | Signed-off-by: Vinson Lee <[email protected]> Acked-by: Brian Paul <[email protected]>
* ilo: disassemble compacted instructions | Chia-I Wu | 2014-10-11 | 4 | -2/+453
| | | | Signed-off-by: Chia-I Wu <[email protected]>
* glsl: improve accuracy of atan() | Erik Faye-Lund | 2014-10-10 | 1 | -10/+55
|
| Our current atan()-approximation is pretty inaccurate at 1.0, so let's try to improve the situation by doing a direct approximation without going through asin.
|
| This new implementation uses an 11th degree polynomial to approximate atan in the [-1..1] range, and the following identity to reduce the entire range to [-1..1]:
|
|     atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x)
|
| This range-reduction idea is taken from the paper "Fast computation of Arctangent Functions for Embedded Applications: A Comparative Analysis" (Ukil et al. 2011).
|
| The polynomial that approximates atan(x) is:
|
|     x    * 0.9999793128310355 -
|     x^3  * 0.3326756418091246 +
|     x^5  * 0.1938924977115610 -
|     x^7  * 0.1173503194786851 +
|     x^9  * 0.0536813784310406 -
|     x^11 * 0.0121323213173444
|
| This polynomial was found with the following GNU Octave script:
|
|     x = linspace(0, 1);
|     y = atan(x);
|     n = [1, 3, 5, 7, 9, 11];
|     format long;
|     polyfitc(x, y, n)
|
| The polyfitc function is not built-in, but too long to include here. It can be downloaded from the following URL:
|
|     http://www.mathworks.com/matlabcentral/fileexchange/47851-constraint-polynomial-fit/content/polyfitc.m
|
| This fixes the following piglit test:
|
|     shaders/glsl-const-folding-01
|
| Signed-off-by: Erik Faye-Lund <[email protected]>
| Reviewed-by: Ian Romanick <[email protected]>
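The range reduction and polynomial from the atan() commit can be sketched in a few lines. This is only an illustrative Python version, not the GLSL builtin code the commit touches; the names `ATAN_COEFFS`, `atan_poly` and `approx_atan` are hypothetical, and the coefficients are copied from the commit message.

```python
import math

# Odd-power coefficients (x, x^3, ..., x^11) as listed in the commit message.
ATAN_COEFFS = (
    0.9999793128310355,
    -0.3326756418091246,
    0.1938924977115610,
    -0.1173503194786851,
    0.0536813784310406,
    -0.0121323213173444,
)


def atan_poly(x):
    """Evaluate the 11th degree odd polynomial; intended for |x| <= 1."""
    x2 = x * x
    acc = 0.0
    # Horner evaluation in x^2, then one final multiply by x.
    for coeff in reversed(ATAN_COEFFS):
        acc = acc * x2 + coeff
    return acc * x


def approx_atan(x):
    """Approximate atan(x) for any finite x using the range reduction."""
    if abs(x) > 1.0:
        # atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x)
        return math.copysign(0.5 * math.pi, x) - atan_poly(1.0 / x)
    return atan_poly(x)
```

Comparing `approx_atan(x)` against `math.atan(x)` over a grid of inputs gives a quick feel for the accuracy at and around 1.0, the case the commit message singles out.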
* vc4: Use the fnv1 hash function instead of gallium util's crc32. | Eric Anholt | 2014-10-10 | 1 | -2/+3
| | | | | Improves simulated norast performance on a little benchmark by 13.4012% +/- 2.08459% (n=13).
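For readers unfamiliar with FNV-1, the hash is a multiply-then-xor loop over the input bytes. The sketch below assumes the 32-bit variant with the standard published offset basis and prime; the commit message does not say which width vc4 uses, and the function name `fnv1_32` is hypothetical.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV32_PRIME = 0x01000193         # standard 32-bit FNV prime


def fnv1_32(data: bytes) -> int:
    """32-bit FNV-1: multiply by the prime, then xor in each byte."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # keep the state in 32 bits
        h ^= byte
    return h
```

The per-byte work is one multiply and one xor, which is a plausible reason for it to be cheaper than a table-driven CRC for short hash-table keys.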
* vc4: Don't look up the compiled shaders unless state has changed. | Eric Anholt | 2014-10-10 | 3 | -0/+28
| | | | | Improves simulated norast performance on a little benchmark by 38.0965% +/- 3.27534% (n=11).
* vc4: Actually clear the context's dirty flags. | Eric Anholt | 2014-10-10 | 1 | -0/+1
| | | | | I was trying to skip state updates when !dirty, and suspiciously everything was always dirty.
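The two commits above fit together: skipping the shader lookup on clean state only pays off if the dirty flags are actually cleared after a draw. A minimal sketch of that pattern follows; the class, flag names and lookup counter are all hypothetical and do not mirror the vc4 context structure.

```python
DIRTY_VS = 1 << 0  # hypothetical flag: vertex shader state changed
DIRTY_FS = 1 << 1  # hypothetical flag: fragment shader state changed


class Context:
    def __init__(self):
        self.dirty = DIRTY_VS | DIRTY_FS  # everything is dirty at start
        self.lookups = 0                  # counts expensive lookups
        self.bound = None

    def set_shader(self, stage_bit):
        """State change: mark the affected stage dirty."""
        self.dirty |= stage_bit

    def _lookup_compiled(self):
        self.lookups += 1                 # stands in for the hash lookup
        return "compiled-variant"

    def draw(self):
        if self.dirty:
            self.bound = self._lookup_compiled()
        # Without this reset every draw would see dirty state and
        # repeat the lookup, which is the symptom the commit describes.
        self.dirty = 0
        return self.bound
```

Calling `draw()` twice in a row performs a single lookup; a `set_shader()` call in between triggers another.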
* vc4: Optimize the other case of SEL_X_Y with a 0 -> SEL_X_0(a). | Eric Anholt | 2014-10-10 | 1 | -1/+23
| | | | Cleans up some output to be more obvious in a piglit test I'm looking at.
* mesa: fix error reported on glTexSubImage2D when level not valid | Tapani Pälli | 2014-10-10 | 1 | -1/+1
| | | | | Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Juha-Pekka Heikkila <[email protected]>
* i965: Fix register write checks. | Kenneth Graunke | 2014-10-10 | 1 | -0/+2
| | | | | | | | | | | | | When mapping the buffer a second time, we need to use the new pointer, not the one from the previous mapping. Otherwise, we will most likely crash. Apparently, we've just been getting lucky and getting the same bo->virtual pointer in both cases. libdrm probably has a hand in that. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Cc: [email protected]
* vc4: Optimize out adds of 0. | Eric Anholt | 2014-10-09 | 1 | -0/+26
|
* vc4: Optimize fmul(x, 0) and fmul(x, 1). | Eric Anholt | 2014-10-09 | 1 | -0/+45
| | | | | This was being generated frequently by matrix multiplies of 2 and 3-channel vertex attributes (which have the 0 or 1 loaded in the shader).
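The add-of-0 and fmul-by-0/1 rewrites amount to a small peephole pass that turns an instruction into a plain move. The sketch below uses a hypothetical tuple encoding `(op, dst, src0, src1)` with Python floats standing in for known constants; it is not the vc4 QIR representation, and it ignores NaN and infinity corner cases for `fmul(x, 0)`.

```python
def opt_algebraic(inst):
    """Rewrite trivial adds and multiplies into a mov.

    Operands are either register names (str) or known constants (float).
    """
    op, dst, a, b = inst
    if op == "fadd":
        if a == 0.0:
            return ("mov", dst, b, None)    # 0 + x -> x
        if b == 0.0:
            return ("mov", dst, a, None)    # x + 0 -> x
    elif op == "fmul":
        if a == 0.0 or b == 0.0:
            return ("mov", dst, 0.0, None)  # x * 0 -> 0
        if a == 1.0:
            return ("mov", dst, b, None)    # 1 * x -> x
        if b == 1.0:
            return ("mov", dst, a, None)    # x * 1 -> x
    return inst                             # nothing to simplify
```

Funnelling every case through one "turn it into a mov" result mirrors the factoring mentioned in the neighbouring opt_algebraic commit.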
* vc4: Factor out the turn-it-into-a-mov in opt_algebraic. | Eric Anholt | 2014-10-09 | 1 | -10/+12
| | | | This will be used more in the next commits.
* vc4: Eliminate unused texture instructions. | Eric Anholt | 2014-10-09 | 1 | -1/+21
|
* vc4: Dead code eliminate unused SF instructions. | Eric Anholt | 2014-10-09 | 1 | -7/+26
|
* vc4: Prevent copy propagating out the MOVs from r4. | Eric Anholt | 2014-10-09 | 1 | -1/+11
| | | | | | | | | Copy propagating these might result in reading the r4 after some other instruction has written r4. Just prevent all copy propagation of this for now. Fixes bad rendering with upcoming indirect register access support, where the copy propagation was consistently happening across another read.
* vc4: Split the coordinate shader to its own vc4_compiled_shader. | Eric Anholt | 2014-10-09 | 3 | -89/+54
| | | | | | | | | | | Merging VS and CS into the same struct wasn't winning us anything except for not allocating a separate BO (but if we want to pack programs into BOs, we should pack not just those 2 programs together). What it was getting us was a bunch of code duplication about hash table lookups and propagating vc4_compile contents into a vc4_compiled_shader. I was about to make the situation worse with indirect uniform buffer access.
* vc4: Add #defines for the texture uniform fields. | Eric Anholt | 2014-10-09 | 2 | -19/+113
| | | | | | I wanted to make another set of texture uploads for handling reladdr constants, and duplicating all the bitshifting looked like a terrible idea. In the process, this fixes a swap of the s/t texture wrap modes.
* vc4: Initialize undefined temporaries to 0. | Eric Anholt | 2014-10-09 | 1 | -1/+6
| | | | | | | | | | Under the simulator, reading registers before writing them triggers an assertion failure. c->undef gets treated as r0, which will usually be written, but not if it's used in the first instruction. We should definitely not be aborting in this case, and should return some sort of undefined value instead. Fixes glsl-user-varying-ff.
* i965: Skip uploading border color when unnecessary. | Kenneth Graunke | 2014-10-09 | 1 | -2/+20
| | | | | | | | | | | | | | The border color is only needed when using the GL_CLAMP_TO_BORDER or (deprecated) GL_CLAMP wrap modes; all others ignore it, including the common GL_CLAMP_TO_EDGE and GL_REPEAT wrap modes. In those cases, we can skip uploading it entirely, saving a bit of space in the batchbuffer. Instead, we just point it at the start of the batch (offset 0); we have to program something, and that address is safe to read. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
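The decision described above reduces to checking whether any wrap mode can sample the border. A minimal sketch under stated assumptions: the enum values are the standard OpenGL ones, while `needs_border_color`, `border_color_offset` and the `upload` callback are hypothetical helpers, not i965 functions.

```python
# Standard OpenGL enum values for the wrap modes named in the commit.
GL_CLAMP = 0x2900
GL_REPEAT = 0x2901
GL_CLAMP_TO_BORDER = 0x812D
GL_CLAMP_TO_EDGE = 0x812F

# Only these two modes can read the border color.
BORDER_MODES = frozenset({GL_CLAMP, GL_CLAMP_TO_BORDER})


def needs_border_color(wrap_s, wrap_t, wrap_r):
    """True if any coordinate's wrap mode samples the border color."""
    return any(mode in BORDER_MODES for mode in (wrap_s, wrap_t, wrap_r))


def border_color_offset(wrap_modes, upload):
    """Return a batch offset for the border color.

    `upload` is a hypothetical callback that writes the color into the
    batch and returns its offset; it is only called when needed.
    """
    if needs_border_color(*wrap_modes):
        return upload()
    return 0  # start of the batch: safe to read, nothing uploaded
```

With all three modes set to `GL_REPEAT` or `GL_CLAMP_TO_EDGE`, the callback is never invoked and the offset stays at 0.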
* i965: Use BDW_MOCS_PTE for renderbuffers. | Kenneth Graunke | 2014-10-09 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Write-back caching cannot be used for buffers being scanned out by the display engine; surfaces used for scan-out must be write-through or uncached. I originally chose WT for render targets because it works in all cases. However, we really want to use write-back caching where possible, as it is more efficient. Most renderbuffers are not used for scanout - off-screen FBOs certainly are fine, and non-pageflipped backbuffers should be fine as well. So in most cases WB will work. However, we don't know what will be used for scan-out, so we instead simply use the PTE value specified by the kernel, as it knows these things. This matches our MOCS choice on Haswell. Fixes performance regressions since commit ee4484be3dc827cf15bcf109f5 in a microbenchmark (spotted by Eero Tamminen). Improves performance in GLBenchmark 2.7/EgyptHD by 7.44362% +/- 0.496939% (n=55) on a Broadwell GT2. Improves performance in a bunch of other microbenchmarks by ~15% or so. Signed-off-by: Kenneth Graunke <[email protected]> Reported-by: Eero Tamminen <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Cc: [email protected]
* i965: Add a BDW_MOCS_PTE #define. | Kenneth Graunke | 2014-10-09 | 1 | -3/+7
| | | | | | | | | | | | | | Like BDW_MOCS_WB and BDW_MOCS_WT, this specifies that we want to use all three caches (L3, LLC, and eLLC where available), but leaves the LLC caching mode up to the kernel's page table entry. This allows the kernel to pick WB/WT/UC based on whether it's using a buffer for scanout. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Cc: [email protected]
* mesa: Make _mesa_print_arrays use stderr. | Kenneth Graunke | 2014-10-09 | 1 | -3/+3
| | | | | | | | | These days, most driver debug output happens via stderr, not stdout. Some applications (such as Xephyr) also appear to close stdout which makes these messages go nowhere. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* r600g,radeonsi: Always use GTT again for PIPE_USAGE_STREAM buffers | Michel Dänzer | 2014-10-09 | 1 | -1/+3
| | | | | | | | | Putting those in VRAM can cause long pauses due to buffers being moved into / out of VRAM. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84662 Cc: [email protected] Reviewed-by: Alex Deucher <[email protected]>
* vc4: Optimize SF(ITOF(x)) -> SF(x). | Eric Anholt | 2014-10-09 | 1 | -0/+16
| | | | | This is a common production of st_glsl_to_tgsi, because CMP takes a float argument.
* vc4: Add some optimization of FADD(FSUB(0, x)). | Eric Anholt | 2014-10-09 | 1 | -0/+31
| | | | | This is a common production of st_glsl_to_tgsi, which uses negate flags on source arguments to handle subtraction.
* vc4: Mostly fix offset calculation for NPOT mipmap levels. | Eric Anholt | 2014-10-09 | 2 | -3/+23
| | | | | | | | | | | | | | The non-base NPOT levels are stored as POT-aligned images. We get that POT alignment by minifying the POT-aligned base level. This means that level strides are also POT aligned, so we have to tell the rendering mode config that our resource is larger than the actual requested area. Fixes the fbo-generatemipmap-formats NPOT cases. Regresses depthstencil-render-miplevels 273 * -- the texture presentation now works (where it was completely broken before), it looks like there's some overflow of image bounds happening at the lower miplevels.
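The sizing rule in the NPOT commit (non-base levels are minified from the power-of-two-aligned base level) can be sketched as below. It only computes per-level dimensions; real offset and stride calculation also depends on tiling and format, which the commit message does not spell out. All function names here are hypothetical.

```python
def next_pot(n):
    """Smallest power of two >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def minify(value, level):
    """Halve `value` once per mip level, never going below 1."""
    return max(value >> level, 1)


def level_size(width, height, level):
    """Storage size of a mip level under the POT-alignment rule.

    Level 0 keeps the requested size; every other level is derived
    from the base level rounded up to a power of two.
    """
    if level == 0:
        return width, height
    return minify(next_pot(width), level), minify(next_pot(height), level)
```

For a 100x60 base level this gives 64x32 at level 1, whereas minifying the unaligned size would give 50x30, so the stored levels are larger than the requested area.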
* vc4: Move the mirrored kernel code to a kernel/ directory. | Eric Anholt | 2014-10-09 | 11 | -258/+382
| | | | Now this whole setup matches the kernel's file layout much more closely.
* vc4: Enable LIT lowering in TGSI instead of our own code. | Eric Anholt | 2014-10-08 | 1 | -35/+1
| | | | This brings us the -128/128 clamping on the w component.
* vc4: Fix scalar math opcodes to replicate their result from the X channel. | Eric Anholt | 2014-10-08 | 1 | -4/+16
| | | | | Thanks to robclark for pointing out that I was probably failing to do this when I reported a "bug" in his lowering code.
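The replication rule for scalar math opcodes can be illustrated with a tiny model: the operation reads only the X channel of its source and writes the same result to every enabled destination channel. The helper name `emit_scalar` and the dictionary-based register model are hypothetical.

```python
def emit_scalar(fn, src, writemask="xyzw"):
    """Apply a scalar op to src.x and replicate it across the writemask.

    `src` is a 4-tuple (x, y, z, w); only its first element is read.
    """
    result = fn(src[0])
    return {channel: result for channel in writemask}
```

For example, a reciprocal applied to `(4.0, 8.0, 16.0, 32.0)` with writemask `"xyz"` yields 0.25 in all three channels, rather than a per-channel reciprocal.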
* ilo: fix rectlist on GEN7+ | Chia-I Wu | 2014-10-09 | 1 | -0/+3
| | | | | | It was broken by 343b014b57ecc5431477e090100e6a26edbda540. Signed-off-by: Chia-I Wu <[email protected]>
* vc4: Add support for two-sided color. | Eric Anholt | 2014-10-08 | 2 | -18/+51
| | | | | | | | | | It's fairly easy, thanks to Rob Clark's lowering code. Fixes two-sided-lighting and 4 vertex-program-two-side testcases, while regressing 8 testcases that involve enabling two-sided color while only initializing one of the two colors in the VS. If you're enabling two-sided color, it's of course expected that you really do set up both colors, so this is still an improvement (and when we set up a linker for TGSI, we'll hopefully fix those 8 fails).
* vc4: Enable POW lowering in TGSI instead of our own code. | Eric Anholt | 2014-10-08 | 1 | -11/+1
|
* vc4: Enable DP lowering in TGSI instead of our own code. | Eric Anholt | 2014-10-08 | 1 | -41/+3
|
* vc4: Start using tgsi_lowering for opcodes we haven't supported before. | Eric Anholt | 2014-10-08 | 1 | -1/+15
|
* gallium: Rename freedreno parts of tgsi_lowering.[ch]. | Eric Anholt | 2014-10-08 | 3 | -31/+32
| | | | Acked-by: Rob Clark <[email protected]>
* gallium: Reformat tgsi_lowering.c for the normal style. | Eric Anholt | 2014-10-08 | 2 | -1204/+1201
| | | | Acked-by: Rob Clark <[email protected]>
* gallium: Copy fd_lowering.[ch] to tgsi_lowering.[ch] for code sharing. | Eric Anholt | 2014-10-08 | 2 | -0/+1662
| | | | | | | | Lots of drivers need to transform the weird instructions in TGSI into reasonable scalar ops, and this code can make those translations canonical. Acked-by: Rob Clark <[email protected]>
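As an illustration of what "transforming weird instructions into scalar ops" means, a dot product can be expanded into one multiply followed by multiply-adds. This sketch is not the output format of tgsi_lowering.c; the tuple encoding, the `lower_dp` name and the operand strings are hypothetical.

```python
def lower_dp(n, dst, a, b):
    """Expand an n-component dot product into scalar MUL/MAD ops.

    Each op is (opcode, dst, src0, src1[, src2]); MAD computes
    src0 * src1 + src2.
    """
    channels = "xyzw"[:n]
    first = channels[0]
    ops = [("MUL", dst, f"{a}.{first}", f"{b}.{first}")]
    for ch in channels[1:]:
        # Accumulate the remaining component products into dst.
        ops.append(("MAD", dst, f"{a}.{ch}", f"{b}.{ch}", dst))
    return ops
```

`lower_dp(3, "tmp", "IN[0]", "IN[1]")` produces one MUL and two MADs; a driver that only implements scalar MUL and MAD can then consume the result directly.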
* vc4: Set unused raddr fields to QPU_R_NOP. | Eric Anholt | 2014-10-08 | 1 | -16/+27
| | | | | | | The simulator assertion fails if you have a write to a reg and then a read (for example, in the NOP side of an instruction), even if the read isn't used for anything. By setting unused raddrs to NOP, we avoid the problem (since only the physical registers are tracked).