path: root/src
Commit message | Author | Age | Files | Lines
* radeonsi: always prefer SWITCH_ON_EOP(0) on CIK | Marek Olšák | 2014-08-09 | 4 | -10/+46
    The code is rewritten to take known constraints into account, while always using 0 by default. This should improve performance for multi-SE parts in theory.
    A debug option is also added for easier debugging. (If there are hangs, use the option. If the hangs go away, you have found the problem.)
    Reviewed-by: Alex Deucher <[email protected]>
    v2: fix a typo, set max_se for evergreen GPUs according to the kernel driver
* radeonsi: fix a hang with instancing in Unigine Heaven/Valley on Hawaii | Marek Olšák | 2014-08-09 | 1 | -5/+2
    This isn't documented anywhere, but it's the only thing that works for this case.
    Cc: [email protected]
    Reviewed-by: Alex Deucher <[email protected]>
* radeon,r200: fix buffer validation after CS flush | Marek Olšák | 2014-08-09 | 8 | -15/+8
    This validates all bound buffers (CB, ZB, textures, DMA) at the beginning of CS. This fixes "bo->space_accounted" assertion failures.
    Tested-by: Jochen Rollwagen <[email protected]>
    Cc: [email protected]
    Reviewed-by: Alex Deucher <[email protected]>
* st/mesa: fix blit-based partial TexSubImage for 1D arrays | Marek Olšák | 2014-08-09 | 1 | -0/+2
    This fixes piglit spec/EXT_texture_array/render-1darray.
    Cc: [email protected]
    Reviewed-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Roland Scheidegger <[email protected]>
* st/mesa: fix DrawPixels(GL_STENCIL_INDEX) | Marek Olšák | 2014-08-09 | 1 | -7/+4
    This bug was probably uncovered recently by Jason's commits. The problem is that _mesa_base_tex_format(GL_STENCIL_INDEX) returns -1.
    Tested-by: Michel Dänzer <[email protected]>
* st/mesa: dump TGSI before calling into the driver | Marek Olšák | 2014-08-09 | 1 | -12/+10
    If the driver crashes in create_xx_shader, you want to see the shader.
    Reviewed-by: Ilia Mirkin <[email protected]>
* vc4: Add support for the COS instruction. | Eric Anholt | 2014-08-08 | 1 | -0/+38
* vc4: Add support for the SIN instruction. | Eric Anholt | 2014-08-08 | 1 | -0/+35
    v2: Rebase on helpers.
* vc4: Fix register aliasing for packing of scaled coordinates. | Eric Anholt | 2014-08-08 | 1 | -11/+18
    Fixes glean fragProg1's "ADD test" and likely many others.
* vc4: Add some debug code for forcing fragment shader output color. | Eric Anholt | 2014-08-08 | 1 | -0/+15
* u_primconvert: Copy min/max_index from the original primitive. | Eric Anholt | 2014-08-08 | 1 | -4/+2
    These values are supposed to be the minimum/maximum index values used to read from the vertex buffers. This code either copies index values out of the old IB (so, same min/max as the original draw call), or generates a new IB (using index values between the start and the start + count of the old array draw info, which just happens to be what min/max_index are set to by st_draw.c).
    We were incorrectly setting the max_index in the converting-from-glDrawArrays case to the start vertex plus the number of vertices generated in the new IB, which broke QUADS primitive conversion on VC4 (where max_index really has to be correct, or the kernel might reject your draw call due to buffer overflow).
    Reviewed-by: Rob Clark <[email protected]> (from verbal description of the patch)
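The index-range reasoning in this commit message can be sketched in a few lines. This is an illustration only, not the u_primconvert code: the function names, the dict layout, and the triangle vertex order are hypothetical, and the inclusive upper bound `start + count - 1` is an assumption about how the array-draw range is expressed. The point it shows is that a generated index buffer grows in length while the vertices it references stay inside the original draw's range.

```python
def quads_to_tris(start, count):
    """Generate triangle indices for a non-indexed QUADS draw (illustrative order)."""
    indices = []
    for q in range(count // 4):
        v0 = start + 4 * q
        # Two triangles per quad; every index refers to an original vertex.
        indices += [v0, v0 + 1, v0 + 2, v0, v0 + 2, v0 + 3]
    return indices


def converted_draw(start, count):
    """Describe the converted draw, keeping the original vertex range."""
    indices = quads_to_tris(start, count)
    return {
        "indices": indices,
        "count": len(indices),           # grows: 6 indices per 4 vertices
        "min_index": start,              # unchanged from the array draw
        "max_index": start + count - 1,  # assumed inclusive bound, also unchanged
    }


draw = converted_draw(start=8, count=8)
# The mistake described above: deriving max_index from the new index count.
wrong_max_index = 8 + draw["count"]
```

Here the generated buffer holds 12 indices, but none of them exceeds the original range, so a bound computed from the new count overstates what the vertex buffers must cover.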
* vc4: Fix using and emitting the 1/W from the vertex/coord shaders. | Eric Anholt | 2014-08-08 | 1 | -14/+20
    v2: Rebase on helpers change.
* vc4: Add support for swizzles of 32 bit float vertex attributes. | Eric Anholt | 2014-08-08 | 2 | -20/+73
    Some tests start working (useprogram-flushverts, for example) due to getting the right vertices now. Some that used to pass start failing with memory overflow during binning, which is weird (glsl-fs-texture2drect). And a couple stop rendering correctly (glsl-fs-bug25902).
    v2: Move the attribute format setup in the key from after search time to before the search.
    v3: Fix reading of attributes other than position (I forgot to respect attr and stored everything in inputs 0-3, i.e. position).
* vc4: Add support for the TGSI FRC opcode. | Eric Anholt | 2014-08-08 | 1 | -0/+18
    v2: Rebase on helpers.
* vc4: Add support for the TGSI TRUNC opcode. | Eric Anholt | 2014-08-08 | 4 | -0/+15
    v2: Rebase on helpers.
* vc4: Crank up the tile allocation BO size | Eric Anholt | 2014-08-08 | 1 | -2/+2
    This avoids a simulator assertion failure with glamor. I need to actually support resize, though.
* vc4: Add support for multiple attributes | Eric Anholt | 2014-08-08 | 4 | -69/+46
* vc4: Add more useful debug for the undefined-source case | Eric Anholt | 2014-08-08 | 1 | -5/+12
    We could get undefined sources in real programs from the wild, so we'll need to turn off this debug eventually. But for now, using undefined sources is typically me just mistyping something.
* vc4: Add support for the lit opcode. | Eric Anholt | 2014-08-08 | 2 | -1/+45
    v2: Fix how it was using the X channel for the real work of the opcode, instead of Y. Fixes glean's LIT test.
    v3: Rebase on the helpers.
* vc4: Add support for the POW opcode | Eric Anholt | 2014-08-08 | 1 | -0/+15
    v2: Rebase on helpers.
* vc4: Refactor uniform handling. | Eric Anholt | 2014-08-08 | 1 | -27/+27
    I wanted an easy way to set up new uniforms every time, so I could handle texture-sampler-related uniforms.
    v2: Rebase on helpers change.
* vc4: Add support for the LRP opcode. | Eric Anholt | 2014-08-08 | 1 | -0/+20
    v2: Rebase on helpers, cutting out most of the code in this change.
* vc4: Add copy propagation between temps. | Eric Anholt | 2014-08-08 | 4 | -0/+81
    We put in a bunch of extra MOVs for program outputs, and this can clean those up. We should do uniforms, too, though.
    v2: Fix missing flagging of progress when we actually optimize. Caught by Aaron Watry.
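As a rough sketch of what copy propagation between temps means, assuming a toy IR rather than the real QIR data structures: each instruction is a hypothetical `(op, dst, srcs)` tuple, every destination is written once, and temps are the names starting with `t`. The pass rewrites users of a MOV'd temp to read the original temp, and it reports progress whenever it changes something, which is the flag the v2 note refers to.

```python
def copy_propagate(insts):
    """Rewrite uses of MOV'd temps to the MOV's source; return (insts, progress).

    Toy IR assumption: insts is a list of (op, dst, srcs) tuples, srcs is a
    tuple of names, and each dst is assigned exactly once.
    """
    copies = {}       # temp name -> the temp it is a plain copy of
    progress = False
    out = []
    for op, dst, srcs in insts:
        new_srcs = tuple(copies.get(s, s) for s in srcs)
        if new_srcs != srcs:
            progress = True  # flag that this pass actually optimized something
        # Only temp-to-temp copies are tracked; uniforms are left alone here.
        if op == "MOV" and new_srcs[0].startswith("t"):
            copies[dst] = new_srcs[0]
        out.append((op, dst, new_srcs))
    return out, progress


program = [
    ("FMUL", "t0", ("u0", "u1")),
    ("MOV", "t1", ("t0",)),
    ("FADD", "t2", ("t1", "t1")),
]
optimized, made_progress = copy_propagate(program)
```

The MOV itself is left in place in this sketch; once nothing reads `t1`, removing it is a separate cleanup step.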
* vc4: Add dead code elimination. | Eric Anholt | 2014-08-08 | 4 | -3/+94
    This cleans up a bunch of noise in the compiled coordinate shaders (since we don't need the varying outputs), and also from writemasked instructions with negated src operands.
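A minimal sketch of the idea, again on a toy IR and not the driver's actual pass: instructions are hypothetical `(op, dst, srcs)` tuples with single-assignment destinations and no side effects, and the caller names which outputs are live. For a coordinate shader that would be the position but not the varyings, so the instructions feeding only the varyings drop out.

```python
def eliminate_dead_code(insts, live_outputs):
    """Keep only instructions whose result reaches a live output.

    Toy IR assumption: (op, dst, srcs) tuples, each dst written once, and no
    instruction has side effects beyond writing dst.
    """
    used = set(live_outputs)
    kept = []
    # Walk backwards so that a value's users are seen before its definition.
    for op, dst, srcs in reversed(insts):
        if dst in used:
            kept.append((op, dst, srcs))
            used.update(srcs)
    kept.reverse()
    return kept


shader = [
    ("MOV", "t0", ("u0",)),
    ("MOV", "t1", ("u1",)),
    ("MOV", "pos", ("t0",)),
    ("MOV", "vary0", ("t1",)),
]
# Coordinate-shader style use: only the position output is needed.
coord_shader = eliminate_dead_code(shader, live_outputs={"pos"})
```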
* vc4: Add an initial pass of algebraic optimization. | Eric Anholt | 2014-08-08 | 5 | -4/+125
    There was a lot of extra noise in my piglit shader dumps because of silly CMPs.
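The commit message does not say which CMP patterns it simplifies, so the following is only one plausible example of a "silly" CMP, chosen as an assumption for illustration. TGSI CMP computes `dst = (src0 < 0) ? src1 : src2`; when both choices are the same value the comparison is irrelevant and the instruction is just a move. The `(op, dst, srcs)` tuple format and the function name are hypothetical.

```python
def simplify_cmp(insts):
    """Turn CMP dst, a, b, b into MOV dst, b; return (insts, progress).

    Toy IR assumption: (op, dst, srcs) tuples. CMP follows the TGSI meaning
    dst = (srcs[0] < 0) ? srcs[1] : srcs[2].
    """
    out = []
    progress = False
    for op, dst, srcs in insts:
        if op == "CMP" and srcs[1] == srcs[2]:
            # Both arms select the same value, so the condition does not matter.
            out.append(("MOV", dst, (srcs[1],)))
            progress = True
        else:
            out.append((op, dst, srcs))
    return out, progress


before = [
    ("CMP", "t2", ("t0", "t1", "t1")),
    ("CMP", "t3", ("t0", "t1", "t2")),
]
after, changed = simplify_cmp(before)
```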
* vc4: Add support for CMP. | Eric Anholt | 2014-08-08 | 4 | -1/+48
    This took a couple of tries, and this is the squash of those attempts.
    v2: Fix register file conflicts on the args in the destination-is-accumulator case.
    v3: Rebase on helper change and qir_inst4 change.
* vc4: Make scheduling of NOPs a separate step from QIR -> QPU translation. | Eric Anholt | 2014-08-08 | 3 | -90/+212
    This should also be used as a way to pair QIR instructions into QPU instructions later.
* vc4: Add WIP support for varyings. | Eric Anholt | 2014-08-08 | 6 | -8/+59
    It doesn't do all the interpolation yet, but more tests can run now.
    v2: Rebase on helpers.
* vc4: Use r3 instead of r5 for temps, since r5 only has 32 bits of storage | Eric Anholt | 2014-08-08 | 1 | -8/+8
    Reserving a whole accumulator for temps is awful in the first place, but I'll fix that later.
* vc4: Fix emit of ABS | Eric Anholt | 2014-08-08 | 1 | -1/+11
    v2: Rebase on qir helpers.
* vc4: Add shader variant caching to handle FS output swizzle. | Eric Anholt | 2014-08-08 | 3 | -65/+232
* vc4: Load the tile buffer before incrementally drawing. | Eric Anholt | 2014-08-08 | 2 | -27/+50
    We will want to occasionally disable this again when we do clear support.
    v2: Squash with the previous commit (I accidentally committed at two stages of writing the change)
* vc4: Don't reallocate the tile alloc/state bos every frame. | Eric Anholt | 2014-08-08 | 2 | -10/+21
    This was a problem for the simulator since we don't free memory back to it, and it would soon just run out.
* vc4: Add VC4_DEBUG env option | Eric Anholt | 2014-08-08 | 5 | -14/+63
    v2: Fix an accidental deletion of some characters from the copyright message (caught by Ilia Mirkin)
* vc4: Add support for SNE/SEQ/SGE/SLT. | Eric Anholt | 2014-08-08 | 6 | -11/+96
* vc4: Use the user's actual first vertex attribute. | Eric Anholt | 2014-08-08 | 4 | -35/+70
    This is hardcoded to read it as RGBA32F so far, but starts to get more tests working.
* vc4: Fix UBO allocation when no uniforms are used. | Eric Anholt | 2014-08-08 | 1 | -1/+2
    We do rely on a real BO getting allocated, so make sure we ask for a non-zero size.
* vc4: Add initial support for math opcodes | Eric Anholt | 2014-08-08 | 2 | -1/+41
* vc4: Switch to actually generating vertex and fragment shader code from TGSI. | Eric Anholt | 2014-08-08 | 12 | -247/+1243
    This introduces an IR (QIR, for QPU IR) to do optimization on. It's a scalar, SSA IR in general. It looks like optimization is pretty easy this way, though I haven't figured out if it's going to be good for our weird register allocation or not (or if I want to reduce to basically QPU instructions first), and I've got some problems with it having some multi-QPU-instruction opcodes (SEQ and CMP, for example) which I probably want to break down.
    Of course, this commit mostly doesn't work, since many other things are still hardwired, like the VBO data.
    v2: Rewrite to use a bunch of helpers (qir_OPCODE) for emitting QIR instructions into temporary values, and make qir_inst4 take the 4 args separately instead of an array (all later callers wanted individual args).
* vc4: Start converting the driver to use vertex shaders. | Eric Anholt | 2014-08-08 | 3 | -45/+177
    Note: This is the cutoff point where I switched from developing primarily on the Pi to developing on the simulator. As a result, from this point on the code is untested on the Pi (the kernel code I have currently wasn't rendering anything at this commit, though the simulator renders successfully, suggesting kernel bugs).
* vc4: Initial skeleton driver import. | Eric Anholt | 2014-08-08 | 33 | -0/+4608
    This mostly just takes every draw call and turns it into a sequence of commands that clear the FBO and draw a single shaded triangle to it, regardless of the actual input vertices or shaders. I copied the initial driver skeleton mostly from freedreno, and I've preserved Rob Clark's copyright for those. I also based my initial hardcoded shaders and command lists on Scott Mansell (phire)'s "hackdriver" project, though the bit patterns of the shaders emitted end up being different.
    v2: Rebase on gallium megadrivers changes.
    v3: Rebase on PIPE_SHADER_CAP_MAX_CONSTS change.
    v4: Rely on simpenrose actually being installed when building for simulation.
    v5: Add more header duplicate-include guards.
    v6: Apply Emil's review (protection against vc4 sim and ilo at the same time, and dropping the dricommon drm bits) and fix a copyright header (thanks, Roland)
* draw: (trivial) use information about gs being present from variant key | Roland Scheidegger | 2014-08-09 | 1 | -5/+4
    This is a purely cosmetic change.
    Reviewed-by: Brian Paul <[email protected]>
* draw: don't use clipvertex output if user plane clipping is disabled | Roland Scheidegger | 2014-08-09 | 1 | -2/+2
    The non-llvm path made sure that both clip and pre_clip_pos point to the data output by position, not clipvertex, if user based clipping is disabled. However, the llvm path did not, which apparently led to failures if gl_ClipVertex was written but user plane clipping not enabled (bug 80183). Why I have no idea really, but just make it match the non-llvm behavior...
    Reviewed-by: Brian Paul <[email protected]>
* i965: Get rid of backend_instruction::sampler | Chris Forbes | 2014-08-09 | 7 | -11/+0
    The generators no longer use this.
    Signed-off-by: Chris Forbes <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4/Gen8: Use src1 for sampler_index instead of ->sampler field | Chris Forbes | 2014-08-09 | 2 | -7/+15
    Signed-off-by: Chris Forbes <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4/Gen4-7: Use src1 for sampler_index instead of ->sampler field | Chris Forbes | 2014-08-09 | 2 | -8/+15
    Signed-off-by: Chris Forbes <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Pass sampler index in src1 for texture ops | Chris Forbes | 2014-08-09 | 2 | -7/+11
    Signed-off-by: Chris Forbes <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Collect all emits of texture ops into one place | Chris Forbes | 2014-08-09 | 1 | -26/+12
    Signed-off-by: Chris Forbes <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* i965/fs/Gen8: Pass sampler_index to generate_tex | Chris Forbes | 2014-08-09 | 2 | -7/+14
    Signed-off-by: Chris Forbes <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* i965/fs/Gen4-7: Pass sampler_index to generate_tex | Chris Forbes | 2014-08-09 | 2 | -7/+14
    Signed-off-by: Chris Forbes <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>