Commit message | Author | Age | Files | Lines
* i965/clip: Removing scissor atom | Ben Widawsky | 2014-08-10 | 1 | -2/+2
  Now that we no longer use ctx->DrawBuffer->_Xmin and related fields to program the screen-space viewport extents, we don't depend on any scissoring state, so we can drop the _NEW_SCISSOR dependency.

  On GEN8, a change in scissor state does not affect anything in the clipper/sf hardware state. The hardware will always do the right thing once the viewport extents are programmed. We can therefore remove the unnecessary state emission.

  Ken originally spotted this.

  v2: Reword the commit message. Remove spurious hunk.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/guardband: Enable for all viewport dimensions (GEN8+) | Ben Widawsky | 2014-08-10 | 1 | -10/+20
  The goal of guardband clipping is to avoid 3D clipping, because it is an expensive operation. When guardband clipping is disabled, all geometry that intersects the viewport is sent to the FF 3D clipper. Objects which are entirely enclosed within the viewport are said to be "trivially accepted", while those entirely outside of the viewport are "trivially rejected".

  When guardband clipping is turned on, the above behavior changes: geometry that is within the guardband and intersects the viewport skips the 3D clipper. Prior to GEN8, this was problematic if the viewport was smaller than the screen, as it could allow rendering to occur outside of the viewport. That could be mitigated if the programmer specified a scissor region which was less than or equal to the viewport, but this is not required for correctness in OpenGL. In theory you could be clever with the guardband so as not to invoke this problem. We do not do this, and have no data that suggests we should bother (nor the converse data).

  With viewport extents in place on GEN8, it should be safe to turn on guardband clipping for all cases.

  While here, add a comment to the code which confused me thoroughly.

  v2: Update grammar in commit message. Reword comments based on Ken's suggestion.

  Reviewed-by: Kenneth Graunke <[email protected]>
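The accept/reject/guardband decision described above can be sketched in C. This is only an illustration under simplifying assumptions: it works in 2D on a primitive's bounding box (a conservative stand-in for the per-vertex tests real hardware performs), and every name in it (`struct rect`, `classify`, the enum values) is hypothetical rather than taken from the i965 driver or the PRM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical axis-aligned rectangle; not a Mesa or hardware structure. */
struct rect { float x0, y0, x1, y1; };

enum clip_result { TRIVIAL_ACCEPT, TRIVIAL_REJECT, SKIP_3D_CLIP, MUST_3D_CLIP };

/* True if "inner" lies entirely within "outer". */
static bool contains(struct rect outer, struct rect inner)
{
   return inner.x0 >= outer.x0 && inner.x1 <= outer.x1 &&
          inner.y0 >= outer.y0 && inner.y1 <= outer.y1;
}

/* True if the two rectangles do not overlap at all. */
static bool disjoint(struct rect a, struct rect b)
{
   return a.x1 < b.x0 || b.x1 < a.x0 || a.y1 < b.y0 || b.y1 < a.y0;
}

/* Classify a primitive by its bounding box. */
static enum clip_result
classify(struct rect viewport, struct rect guardband, struct rect bbox,
         bool guardband_enabled)
{
   /* The guardband is assumed to enclose the viewport. */
   assert(contains(guardband, viewport));

   if (contains(viewport, bbox))
      return TRIVIAL_ACCEPT;
   if (disjoint(viewport, bbox))
      return TRIVIAL_REJECT;

   /* The primitive straddles a viewport edge. With the guardband on,
    * it can skip the expensive clipper as long as it stays inside the
    * guardband; later stages discard the pixels outside the viewport. */
   if (guardband_enabled && contains(guardband, bbox))
      return SKIP_3D_CLIP;
   return MUST_3D_CLIP;
}
```

With the guardband disabled, every straddling primitive falls through to `MUST_3D_CLIP`, which is the cost the commit is trying to avoid.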
* i965: Simplify viewport extents programming on GEN8 | Ben Widawsky | 2014-08-10 | 1 | -9/+15
  Viewport extents are a third rectangle that defines which pixels get discarded as part of the rasterization process. The actual pixels drawn to the screen are an intersection of the drawing rectangle, the viewport extents, and the scissor rectangle. It permits the use of guardband clipping in all cases (see later patch).

  The scissor rectangle is not important for this discussion, as it should always help do the right thing provided the programmer uses it.

    switch (viewport dimensions, drawrect dimension) {
    case viewport > drawing rectangle:
       no effects;
       break;
    case viewport == drawing rectangle:
       no effects;
       break;
    case viewport < drawing rectangle:
       Pixels (after the viewport transformation but before expensive
       rasterizing and shading operations) which are outside of the
       viewport are discarded.
    }

  I am unable to find a test case where this improves performance, but in all my testing it doesn't hurt performance, and intuitively, it should not ever hurt performance. It also permits us to use the guardband more freely (see upcoming patch).

  v2: Updating commit message.
  v3: Commit message updates requested by Ken

  Reviewed-by: Kenneth Graunke <[email protected]>
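As a rough illustration of the "intersection of three rectangles" statement, the following C sketch computes the region in which pixels survive. The `struct irect` type and the function names are assumptions made for this example; the hardware performs the equivalent test itself and the driver only programs the rectangles.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inclusive pixel rectangle; not a Mesa structure. */
struct irect { int xmin, ymin, xmax, ymax; };

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* A rectangle is empty when a minimum exceeds its maximum. */
static bool is_empty(struct irect r)
{
   return r.xmin > r.xmax || r.ymin > r.ymax;
}

static struct irect intersect(struct irect a, struct irect b)
{
   struct irect r = {
      imax(a.xmin, b.xmin), imax(a.ymin, b.ymin),
      imin(a.xmax, b.xmax), imin(a.ymax, b.ymax),
   };
   return r;
}

/* Region in which pixels are actually drawn. */
static struct irect
visible_region(struct irect drawrect, struct irect viewport,
               struct irect scissor)
{
   assert(!is_empty(drawrect));
   return intersect(intersect(drawrect, viewport), scissor);
}

/* A pixel is kept only if it falls inside the visible region. */
static bool pixel_survives(struct irect region, int x, int y)
{
   return !is_empty(region) &&
          x >= region.xmin && x <= region.xmax &&
          y >= region.ymin && y <= region.ymax;
}
```

When the viewport is at least as large as the drawing rectangle, the viewport term does not shrink the result, which matches the "no effects" cases in the commit message.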
* i965/guardband: Improve comments for guardband clipping | Ben Widawsky | 2014-08-10 | 1 | -4/+18
  While working in this part of the code I had a great deal of trouble understanding what it was trying to do, and matching it with the spec (mostly due to bad wording in the PRM). To help future people, I've cleaned up the wording and provided some ASCII art.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Chris Forbes <[email protected]>
* i965: Support the allow_glsl_extension_directive_midshader option. | Kenneth Graunke | 2014-08-10 | 2 | -0/+4
  This adds support for Marek's new driconf parameter, which avoids totally white rendering in Unigine Valley (which attempts to enable the GL_ARB_sample_shading extension in an illegal place).

  Signed-off-by: Kenneth Graunke <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75664
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: set virtual_grf_count in assign_regs() | Connor Abbott | 2014-08-10 | 1 | -0/+4
  This lets us call dump_instructions() after register allocation without failing an assertion.

  Reviewed-by: Matt Turner <[email protected]>
  Signed-off-by: Connor Abbott <[email protected]>
* i965/fs: don't read from uninitialized memory while assigning registers | Connor Abbott | 2014-08-10 | 1 | -6/+6
  Reviewed-by: Matt Turner <[email protected]>
  Signed-off-by: Connor Abbott <[email protected]>
* i965/fs: Fix bad whitespace. | Matt Turner | 2014-08-10 | 1 | -2/+2
* gallium/radeon: Set gpu_address to 0 if r600_virtual_address is false | Niels Ole Salscheider | 2014-08-10 | 1 | -0/+2
  Without this patch I get the following during DMA transfers:

    [drm:radeon_cs_ib_chunk] *ERROR* Invalid command stream !
    radeon 0000:01:00.0: CP DMA dst buffer too small (21475829792 4096)

  This is a fixup for e878e154cdfd4dbb5474f776e0a6d86fcb983098.

  Signed-off-by: Niels Ole Salscheider <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: simplify constant buffer upload for big endian | Marek Olšák | 2014-08-10 | 1 | -18/+4
  Point util_memcpy_cpu_to_le32 to a buffer storage directly.

  v2: simplify more

  Reviewed-by: Michel Dänzer <[email protected]>
* winsys/radeon: fix compile warnings | Marek Olšák | 2014-08-09 | 1 | -3/+4
* r600g/compute: fix compile warnings | Marek Olšák | 2014-08-09 | 2 | -10/+11
  Trivial.
* r300g: handle new shader caps | Marek Olšák | 2014-08-09 | 1 | -0/+2
  Trivial.
* radeonsi: fix CMASK and HTILE allocation on Tahiti | Marek Olšák | 2014-08-09 | 2 | -3/+56
  Tahiti has 12 tile pipes, but P8 pipe config. It looks like there is no way to get the pipe config except for reading GB_TILE_MODE. The TILING_CONFIG ioctl doesn't return more than 8 pipes, so we can't use that for Hawaii.

  This fixes a regression caused by 9b046474c95f15338d4c748df9b62871bba6f36f on Tahiti.

  v2: add an assertion and print an error on failure

  Cc: [email protected]
  Reviewed-by: Michel Dänzer <[email protected]>
* gallium/radeon: remove r600_resource_va | Marek Olšák | 2014-08-09 | 1 | -9/+0
  Reviewed-by: Christian König <[email protected]>
  Reviewed-by: Alex Deucher <[email protected]>
* gallium/radeon: use gpu_address from r600_resource | Marek Olšák | 2014-08-09 | 3 | -21/+14
  Reviewed-by: Christian König <[email protected]>
  Reviewed-by: Alex Deucher <[email protected]>
* r600g: use gpu_address from r600_resource | Marek Olšák | 2014-08-09 | 5 | -39/+29
  Reviewed-by: Christian König <[email protected]>
  Reviewed-by: Alex Deucher <[email protected]>
* radeonsi: use gpu_address from r600_resource | Marek Olšák | 2014-08-09 | 6 | -56/+41
  Reviewed-by: Christian König <[email protected]>
  Reviewed-by: Alex Deucher <[email protected]>
* gallium/radeon: store VM address in r600_resource | Marek Olšák | 2014-08-09 | 3 | -2/+7
  This will help to get rid of the buffer_get_virtual_address calls.

  Reviewed-by: Christian König <[email protected]>
  Reviewed-by: Alex Deucher <[email protected]>
* r600g: remove useless r600_resource_va calls | Marek Olšák | 2014-08-09 | 1 | -18/+9
  R600-R700 don't support virtual memory.

  Reviewed-by: Christian König <[email protected]>
  Reviewed-by: Alex Deucher <[email protected]>
* radeonsi: always prefer SWITCH_ON_EOP(0) on CIK | Marek Olšák | 2014-08-09 | 4 | -10/+46
  The code is rewritten to take known constraints into account, while always using 0 by default.

  In theory, this should improve performance for multi-SE parts.

  A debug option is also added for easier debugging. (If there are hangs, use the option. If the hangs go away, you have found the problem.)

  Reviewed-by: Alex Deucher <[email protected]>

  v2: fix a typo, set max_se for evergreen GPUs according to the kernel driver
* radeonsi: fix a hang with instancing in Unigine Heaven/Valley on Hawaii | Marek Olšák | 2014-08-09 | 1 | -5/+2
  This isn't documented anywhere, but it's the only thing that works for this case.

  Cc: [email protected]
  Reviewed-by: Alex Deucher <[email protected]>
* radeon,r200: fix buffer validation after CS flush | Marek Olšák | 2014-08-09 | 8 | -15/+8
  This validates all bound buffers (CB, ZB, textures, DMA) at the beginning of CS. This fixes "bo->space_accouned" assertion failures.

  Tested by: Jochen Rollwagen <[email protected]>
  Cc: [email protected]
  Reviewed-by: Alex Deucher <[email protected]>
* st/mesa: fix blit-based partial TexSubImage for 1D arrays | Marek Olšák | 2014-08-09 | 1 | -0/+2
  This fixes piglit spec/EXT_texture_array/render-1darray.

  Cc: [email protected]
  Reviewed-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
* st/mesa: fix DrawPixels(GL_STENCIL_INDEX) | Marek Olšák | 2014-08-09 | 1 | -7/+4
  This bug was probably uncovered recently by Jason's commits. The problem is that _mesa_base_tex_format(GL_STENCIL_INDEX) returns -1.

  Tested-by: Michel Dänzer <[email protected]>
* st/mesa: dump TGSI before calling into the driver | Marek Olšák | 2014-08-09 | 1 | -12/+10
  If the driver crashes in create_xx_shader, you want to see the shader.

  Reviewed-by: Ilia Mirkin <[email protected]>
* configure.ac: Use LIBS rather than LDFLAGS to add -ldl to dladdr check | Jon TURNEY | 2014-08-09 | 1 | -3/+4
  ec8ebff "Check for dladdr()" erroneously uses LDFLAGS rather than LIBS to add -ldl to the dladdr check.

  Replace the workaround in 39a4cc4 of explicitly checking in libdl with the more correct approach of using LIBS.

  Signed-off-by: Jon TURNEY <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Tested-by: Pali Rohár <[email protected]>
  Cc: "10.2" <[email protected]>
* vc4: Add support for the COS instruction. | Eric Anholt | 2014-08-08 | 1 | -0/+38
* vc4: Add support for the SIN instruction. | Eric Anholt | 2014-08-08 | 1 | -0/+35
  v2: Rebase on helpers.
* vc4: Fix register aliasing for packing of scaled coordinates. | Eric Anholt | 2014-08-08 | 1 | -11/+18
  Fixes glean fragProg1's "ADD test" and likely many others.
* vc4: Add some debug code for forcing fragment shader output color. | Eric Anholt | 2014-08-08 | 1 | -0/+15
* u_primconvert: Copy min/max_index from the original primitive. | Eric Anholt | 2014-08-08 | 1 | -4/+2
  These values are supposed to be the minimum/maximum index values used to read from the vertex buffers. This code either copies index values out of the old IB (so, same min/max as the original draw call), or generates a new IB (using index values between the start and the start + count of the old array draw info, which just happens to be what min/max_index are set to by st_draw.c).

  In the converting-from-glDrawArrays case, we were incorrectly setting max_index to the start vertex plus the number of vertices generated in the new IB. This broke QUADS primitive conversion on VC4, where max_index really has to be correct or the kernel might reject your draw call due to buffer overflow.

  Reviewed-by: Rob Clark <[email protected]> (from verbal description of the patch)
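The index-range argument can be made concrete with a small C sketch of a quads-to-triangles conversion for a non-indexed draw. It is not the u_primconvert code: `struct draw` is a pared-down, hypothetical stand-in for the real draw info, and the vertex order used to split each quad is just one possible choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, pared-down draw description. */
struct draw {
   uint32_t start, count;          /* first vertex / number of vertices */
   uint32_t min_index, max_index;  /* range of vertices actually read */
};

/* Convert an array draw of quads into an indexed draw of triangles.
 * Writes six indices per quad into out_ib and returns how many were
 * written. */
static uint32_t
quads_to_tris(const struct draw *in, uint32_t *out_ib, size_t capacity,
              struct draw *out)
{
   uint32_t quads = in->count / 4;
   uint32_t n = 0;

   assert((size_t)quads * 6 <= capacity);

   for (uint32_t q = 0; q < quads; q++) {
      uint32_t v = in->start + q * 4;
      /* Every index still refers to one of the original vertices. */
      out_ib[n++] = v + 0; out_ib[n++] = v + 1; out_ib[n++] = v + 3;
      out_ib[n++] = v + 1; out_ib[n++] = v + 2; out_ib[n++] = v + 3;
   }

   out->start = 0;
   out->count = n;
   /* The new IB has more entries, but reads the same vertices, so the
    * range is copied rather than derived from the new count. */
   out->min_index = in->min_index;
   out->max_index = in->max_index;
   return n;
}
```

For a draw of 8 vertices starting at vertex 8, the new index buffer holds 12 entries, yet the largest index it contains is still 15. Deriving max_index from the 12 generated entries would claim a read of vertex 20.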
* vc4: Fix using and emitting the 1/W from the vertex/coord shaders. | Eric Anholt | 2014-08-08 | 1 | -14/+20
  v2: Rebase on helpers change.
* vc4: Add support for swizzles of 32 bit float vertex attributes. | Eric Anholt | 2014-08-08 | 2 | -20/+73
  Some tests start working (useprogram-flushverts, for example) due to getting the right vertices now. Some that used to pass start failing with memory overflow during binning, which is weird (glsl-fs-texture2drect). And a couple stop rendering correctly (glsl-fs-bug25902).

  v2: Move the attribute format setup in the key from after search time to before the search.
  v3: Fix reading of attributes other than position (I forgot to respect attr and stored everything in inputs 0-3, i.e. position).
* vc4: Add support for the TGSI FRC opcode. | Eric Anholt | 2014-08-08 | 1 | -0/+18
  v2: Rebase on helpers.
* vc4: Add support for the TGSI TRUNC opcode. | Eric Anholt | 2014-08-08 | 4 | -0/+15
  v2: Rebase on helpers.
* vc4: Crank up the tile allocation BO size | Eric Anholt | 2014-08-08 | 1 | -2/+2
  This avoids a simulator assertion failure with glamor. I need to actually support resize, though.
* vc4: Add support for multiple attributes | Eric Anholt | 2014-08-08 | 4 | -69/+46
* vc4: Add more useful debug for the undefined-source case | Eric Anholt | 2014-08-08 | 1 | -5/+12
  We could get undefined sources in real programs from the wild, so we'll need to turn off this debug eventually. But for now, using undefined sources is typically me just mistyping something.
* vc4: Add support for the LIT opcode. | Eric Anholt | 2014-08-08 | 2 | -1/+45
  v2: Fix how it was using the X channel for the real work of the opcode, instead of Y. Fixes glean's LIT test.
  v3: Rebase on the helpers.
* vc4: Add support for the POW opcode | Eric Anholt | 2014-08-08 | 1 | -0/+15
  v2: Rebase on helpers.
* vc4: Refactor uniform handling. | Eric Anholt | 2014-08-08 | 1 | -27/+27
  I wanted an easy way to set up new uniforms every time, so I could handle texture-sampler-related uniforms.

  v2: Rebase on helpers change.
* vc4: Add support for the LRP opcode. | Eric Anholt | 2014-08-08 | 1 | -0/+20
  v2: Rebase on helpers, cutting out most of the code in this change.
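For reference, TGSI's LRP is a per-channel linear interpolation, dst = src0 * src1 + (1 - src0) * src2. The C sketch below evaluates that formula on the CPU. It only illustrates the opcode's semantics and says nothing about how the vc4 backend emits it; the function name `lrp` is made up for this example.

```c
#include <assert.h>

/* Per-channel LRP: dst = a * b + (1 - a) * c, for up to four channels.
 * "a" is the interpolation weight (src0), "b" and "c" the two values
 * being blended (src1 and src2). */
static void
lrp(float *dst, const float *a, const float *b, const float *c,
    unsigned channels)
{
   /* TGSI registers have at most four channels (x, y, z, w). */
   assert(channels <= 4);

   for (unsigned i = 0; i < channels; i++)
      dst[i] = a[i] * b[i] + (1.0f - a[i]) * c[i];
}
```

A weight of 1 selects src1, a weight of 0 selects src2, and values in between blend the two.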
* vc4: Add copy propagation between temps. | Eric Anholt | 2014-08-08 | 4 | -0/+81
  We put in a bunch of extra MOVs for program outputs, and this can clean those up. We should do uniforms, too, though.

  v2: Fix missing flagging of progress when we actually optimize. Caught by Aaron Watry.
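A minimal sketch of copy propagation between temps, including the "progress" flag mentioned in v2, might look like the following C. It runs on a toy three-address IR invented for this example (it is not vc4's QIR) and assumes each temp is written at most once, so a copy stays valid for the rest of the program.

```c
#include <assert.h>
#include <stdbool.h>

enum op { OP_MOV, OP_ADD, OP_MUL };

/* Hypothetical instruction over numbered temps; MOV uses only src[0]. */
struct inst {
   enum op op;
   int dst;
   int src[2];
};

static int num_srcs(enum op op)
{
   return op == OP_MOV ? 1 : 2;
}

/* Rewrite uses of a MOV's destination to read the MOV's source instead.
 * Returns true if any operand changed, so that a driver loop knows to
 * run the optimization passes again. */
static bool
copy_propagate(struct inst *insts, int count)
{
   bool progress = false;

   assert(count >= 0);

   for (int i = 0; i < count; i++) {
      if (insts[i].op != OP_MOV)
         continue;

      for (int j = i + 1; j < count; j++) {
         for (int s = 0; s < num_srcs(insts[j].op); s++) {
            if (insts[j].src[s] == insts[i].dst) {
               insts[j].src[s] = insts[i].src[0];
               progress = true;
            }
         }
      }
   }

   return progress;
}
```

The MOV itself is left in place; once nothing reads its destination, a separate dead code elimination pass can delete it. Forgetting to set `progress` would make an optimization loop stop early even though the program changed.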
* vc4: Add dead code elimination. | Eric Anholt | 2014-08-08 | 4 | -3/+94
  This cleans up a bunch of noise in the compiled coordinate shaders (since we don't need the varying outputs), and also from writemasked instructions with negated src operands.
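Dead code elimination in its simplest form can be sketched as a single backward walk over straight-line code. The C below uses a toy instruction format made up for this example (not vc4's QIR), assumes each temp is written once, and treats `has_side_effect` as a hypothetical marker for instructions such as writes to shader outputs that must always be kept.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TEMPS 64

/* Hypothetical instruction: writes temp "dst" and reads up to two
 * temps in src[] (-1 means the slot is unused). */
struct inst {
   int dst;
   int src[2];
   bool has_side_effect;
   bool removed;
};

/* Walk backwards, keeping an instruction only if it has a side effect
 * or its result is read by a later instruction that was kept.
 * Returns the number of instructions marked as removed. */
static int
eliminate_dead_code(struct inst *insts, int count)
{
   bool live[MAX_TEMPS] = { false };
   int removed = 0;

   for (int i = count - 1; i >= 0; i--) {
      struct inst *inst = &insts[i];

      assert(inst->dst >= 0 && inst->dst < MAX_TEMPS);

      if (!inst->has_side_effect && !live[inst->dst]) {
         inst->removed = true;
         removed++;
         continue;
      }

      /* The instruction stays, so everything it reads is live. */
      for (int s = 0; s < 2; s++) {
         if (inst->src[s] >= 0) {
            assert(inst->src[s] < MAX_TEMPS);
            live[inst->src[s]] = true;
         }
      }
   }

   return removed;
}
```

Because a removed instruction never marks its sources live, a whole chain feeding only an unused result disappears in one pass. That is the situation the commit describes for coordinate shaders, where the varying outputs are not needed.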
* vc4: Add an initial pass of algebraic optimization. | Eric Anholt | 2014-08-08 | 5 | -4/+125
  There was a lot of extra noise in my piglit shader dumps because of silly CMPs.
* vc4: Add support for CMP. | Eric Anholt | 2014-08-08 | 4 | -1/+48
  This took a couple of tries, and this is the squash of those attempts.

  v2: Fix register file conflicts on the args in the destination-is-accumulator case.
  v3: Rebase on helper change and qir_inst4 change.
* vc4: Make scheduling of NOPs a separate step from QIR -> QPU translation. | Eric Anholt | 2014-08-08 | 3 | -90/+212
  This should also be used as a way to pair QIR instructions into QPU instructions later.
* vc4: Add WIP support for varyings. | Eric Anholt | 2014-08-08 | 6 | -8/+59
  It doesn't do all the interpolation yet, but more tests can run now.

  v2: Rebase on helpers.
* vc4: Use r3 instead of r5 for temps, since r5 only has 32 bits of storage | Eric Anholt | 2014-08-08 | 1 | -8/+8
  Reserving a whole accumulator for temps is awful in the first place, but I'll fix that later.