summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* vc4: Lower uniform loads to scalar in NIR.Eric Anholt2015-07-302-31/+81
| | | | | This also moves the vec4-to-byte-addressing math into NIR, so that algebraic has a chance at it.
* vc4: Move some FS input lowering into NIR.Eric Anholt2015-07-302-35/+50
|
* vc4: Move program keys to the header file.Eric Anholt2015-07-302-47/+49
| | | | | I want to be able to inspect them from other files for lowering passes in NIR.
* vc4: Lower NIR inputs to scalar as well.Eric Anholt2015-07-302-4/+44
| | | | | For now this is just scalarizing, but it also means we'll get to dump a bunch of QIR-based lowering in a moment.
* vc4: Start adding a NIR-based output lowering pass.Eric Anholt2015-07-304-7/+137
| | | | | | For now, this just splits up store_output intrinsics to be scalars, and drops unused outputs in the coordinate shader. My goal is to be able to drop a bunch of my VC4-specific optimization by letting NIR handle it.
* vc4: Mark our shaders as single-threaded.Eric Anholt2015-07-302-0/+6
| | | | | I had my understanding of this bit flipped. We're using the full register space, so we need to say so.
* vc4: Avoid leaking indirect array access UBOs.Eric Anholt2015-07-301-0/+2
|
* vc4: Avoid overflowing various static tables.Eric Anholt2015-07-304-4/+4
|
* vc4: Fix return values from recent validation changes.Eric Anholt2015-07-301-4/+4
|
* radeonsi: enable GL4.1 and update documentation (v2)Dave Airlie2015-07-301-1/+1
| | | | | | | | | | This enables GL4.1 for radeonsi, and updates the docs in the correct places. v2: enable only for llvm 3.7 which has fixes in place. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: add GS multiple streams support (v2)Dave Airlie2015-07-306-39/+127
| | | | | | | | | | | | | | | | | | | | | | | | | | This is the final piece for ARB_gpu_shader5, The code is based on the r600 code from Glenn Kennard, and myself. While developing this, I'm not 100% sure of all the calculations made in the GS registers, this is why the max_stream is worked out there and used to limit the changes in registers. Otherwise my initial attempts either regressed GS texelFetch tests or primitive-id-restart. The current code has no regressions in piglit. This commit doesn't enable ARB_gpu_shader5, since that just bumps the glsl level to 4.00, so I'll just do a separate patch for 4.10. v1.1: fix bug introduced in rebase. v2: Address Marek's review comments, remove my llvm stream code for simpler C, move gsvs_ring and gs_next_vertex to arrays. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* gallium/auxiliary: Ensure c99_math.h is included.Jose Fonseca2015-07-291-1/+2
| | | | | | As it is needed for exp2. Trivial.
* targets/dri: scons: add missing link against libdrmEmil Velikov2015-07-291-0/+2
| | | | | | | | | Otherwise the final dri module will have (additional) unresolved symbols. Cc: Brian Paul <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviwed-by: Jose Fonseca <[email protected]>
* svga: scons: remove unused HAVE_SYS_TYPES_H defineEmil Velikov2015-07-292-2/+0
| | | | | | | | | There isn't a single instance in mesa that mentions HAVE_SYS_TYPES_H, other than this file. Cc: Jose Fonseca <[email protected]> Acked-by: Brian Paul <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* gallium/auxiliary: Avoid double promotion.Matt Turner2015-07-292-2/+2
| | | | | Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* gallium/auxiliary: Use exp2(x) instead of pow(2.0, x).Matt Turner2015-07-292-4/+4
|
* nvc0/ir: cache vertex out base so that we don't recompute againIlia Mirkin2015-07-291-8/+15
| | | | | | | | The global CSE pass stinks and is unable to pull this out. Easy enough to handle it here and avoid generating unnecessary special register loads (which can allegedly be quite slow). Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: output base for reading is based on laneidIlia Mirkin2015-07-291-0/+25
| | | | | | | | | | | PFETCH retrieves the address for incoming vertices, not output vertices in TCS. For output vertices, we must use the laneid as a base. Fixes barrier piglit test, which was failing for entirely non-barrier reasons, but rather that it was (a) trying to draw multiple patches and (b) the incoming patch size was not the same as the outgoing patch size. Signed-off-by: Ilia Mirkin <[email protected]>
* Revert "pipe-loader: simplify pipe_loader_drm_probe"Francisco Jerez2015-07-291-4/+9
| | | | | | | | | This reverts commit a27ec5dc460b91dc44675f48cddbbb2631ee824f. It breaks the intended behaviour of pipe_loader_probe() with ndev==0 as relied upon by clover to query the number of devices available to the pipe loader in the system. Acked-by: Emil Velikov <[email protected]>
* radeon: add support for streams to the common streamout code. (v2)Dave Airlie2015-07-296-15/+50
| | | | | | | | | | | | This adds to the common radeon streamout code, support for multiple streams. It updates radeonsi/r600 to set the enabled mask up. v2: update for changes in previous patch. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radeon: move streamout buffer config to streamout enable function. (v2)Dave Airlie2015-07-292-9/+15
| | | | | | | | | | This will be used here later. v2: update atom sizes add check for old vs new enabled mask Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* vc4: Skip re-emitting the shader_rec if it's unchanged.Eric Anholt2015-07-285-43/+158
| | | | | | | | It's a bunch of work for us to emit it (and its uniforms), more work for the kernel to validate it, and additional work for the CLE to read it. Improves es2gears framerate by about 50%. Signed-off-by: Eric Anholt <[email protected]>
* vc4: Drop unused vpm_offset value.Eric Anholt2015-07-281-3/+0
| | | | | It's been dead since we started doing VS/CS attr offset setup during shader compile.
* vc4: Simplify vc4_use_bo and make sure it's not a shader.Eric Anholt2015-07-284-39/+26
| | | | | | | Since the conversion to keeping validated shaders around for the BO's lifetime, we haven't been checking that rendering doesn't happen to shaders. Make vc4_use_bo check that always, and just don't use it for the VC4_MODE_SHADER case (so now modes are unused)
* vc4: Keep the validated shader around for the simulator execution.Eric Anholt2015-07-283-13/+17
| | | | This more closely matches the kernel behavior on shader validation now.
* vc4: Make the object be the return value from vc4_use_bo().Eric Anholt2015-07-283-23/+25
| | | | Drops 40 bytes of code from validation.
* vc4: Ensure that the bin CL is properly capped by increment/flush.Eric Anholt2015-07-283-26/+36
| | | | | | | | We don't want anything to appear after we've kicked off the render (and thus job flush), since that might then get written out to the tile allocation state. Signed-off-by: Eric Anholt <[email protected]>
* vc4: Drop NV shader reloc validation.Eric Anholt2015-07-282-120/+60
| | | | It wasn't validating enough, and we don't need the packet.
* vc4: Fix raster surface shadow updates under DRI2.Eric Anholt2015-07-281-0/+6
| | | | | | | Glamor asks GBM for the handle of the BO, then flinks it itself. We were marking the bo non-private in the flink and dmabuf (DRI3) paths, but not the GEM handle path. As a result, non-pageflipping DRI2 swapbuffers (EGL apps, in particular) were never updating the texture.
* vc4: Fix bus errors on dumping CL on hardware.Eric Anholt2015-07-281-1/+1
| | | | | The kernel can't fixup unaligned float traps for us, so deref as a uint32_t first.
* radeon: add streamout status 1-3 queries.Dave Airlie2015-07-292-2/+19
| | | | | | | This adds support for queries against the non-0 vertex streams. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600,radeonsi: GL_ARB_conditional_render_invertedEdward O'Callaghan2015-07-293-11/+15
| | | | | | | | | | By using 'Tobias Klausmann' piglit test-suite patch. We obtain a full 12/12 passes using this patch. By 'faking' to claim support for this extension we obtain 7 fails and 5 passes. Signed-off-by: Edward O'Callaghan <[email protected]> Tested-by: Furkan Alaca <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: add support for interpolateAt functions (v2)Dave Airlie2015-07-281-1/+240
| | | | | | | | | | | This is part of ARB_gpu_shader5, and this passes all the piglit tests currently available. v2: use macros from the fine derivs commit. add comments. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* nvc0/ir: trim out barrier sync for non-compute shadersIlia Mirkin2015-07-281-0/+6
| | | | | | | It seems like they're never necessary, and actively cause harm. This fixes some of the barrier-related piglits. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: fix barrier emissionIlia Mirkin2015-07-281-0/+2
| | | | | | immediate arguments require a flag to be set for each one Signed-off-by: Ilia Mirkin <[email protected]>
* vc4: Add support for ARB_draw_elements_base_vertex.Eric Anholt2015-07-271-1/+3
| | | | | | Gallium exposes it unconditionally, so do our best to support it. It fails on the negative index cases, but those seem unlikely to be used in the wild.
* freedreno/ir3: add transform-feedback supportRob Clark2015-07-274-4/+230
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: track "keeps" in irRob Clark2015-07-274-23/+17
| | | | | | | | | | | | Previously we had a fixed array to track kills, since they don't generate an SSA value, and then cheated by stuffing them in the outputs array before sending things through depth/sched/etc. But store instructions will need similar treatment. So convert this over to a more general array of instructions that must be kept and fix up the places that were previously relying on kills being in the output array. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: add support for store instructionsRob Clark2015-07-273-6/+43
| | | | | | | | | | For store instructions, the "dst" register is a read register, not a written register. (Ie. it is the address to store to.) Lets not confuse register allocation, scheduling, etc, with these details. Instead just leave a dummy instr->regs[0], and take "dst" from instr->regs[1] and srcs following. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: cleanup driver-param stuffRob Clark2015-07-273-10/+31
| | | | | | | | | Add 'enum ir3_driver_param' to track driver-param slots, and a create_driver_param() helper to avoid having the knowledge about where driver params are placed in const regs spread throughout the code as we add additional driver-params. Signed-off-by: Rob Clark <[email protected]>
* freedreno: add transform-feedback stateRob Clark2015-07-274-3/+95
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: add resource tracking support for written buffersRob Clark2015-07-274-12/+17
| | | | | | | | With stream-out (transform-feedback) we have the case where resources are *written* by the gpu, which needs basically the same tracking to figure out when rendering must be flushed. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx+a4xx: add support for vtxcnt semanticRob Clark2015-07-274-14/+31
| | | | | | This will be used for stream-out (transform-feedback) Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: add stream-output support to cmdline compilerRob Clark2015-07-271-4/+22
| | | | | | A bit hard-coded configuration at the moment, but sufficient for now. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: drop unused create_input() argRob Clark2015-07-271-11/+8
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: move emit_const to ir3Rob Clark2015-07-2712-229/+263
| | | | | | | | | | | | | | | Details of the cmdstream packets are different between a3xx and a4xx, but the logic about the layout of const registers is the same, as that is dictated by the ir3 shader compiler. So rather than duplicating logic that is tightly coupled to ir3 between a3xx and a4xx, move this into ir3 and use per-generation callbacks for to build the cmdstream packets. This should make it easier to pass additional const regs (such as for transform feedback). And it also keeps the layout internal to ir3 in case we want to make the layout more dynamic some day. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: bit of shader API refactoringRob Clark2015-07-277-22/+25
| | | | | | | | | | | | Since for transform-feedback, we'll need more than just the TGSI tokens from the state object, just pass the entire state object to ir3_shader_create(). This also cleans things up a bit for some day in the future when we could take shader either as TGSI or directly NIR (for ex, glsl2nir or spirv2nir paths). In the same spirit, drop extra args from ir3_compile_shader_nir() (since it can anyways get what it needs from the ir3_shader_variant). Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: updated cat6 encodingRob Clark2015-07-275-113/+230
| | | | | | | Sync updated cat6 encoding from freedreno.git, needed to properly encode store instructions. Signed-off-by: Rob Clark <[email protected]>
* radeonsi: add fine derivate control (v2.1)Dave Airlie2015-07-252-6/+48
| | | | | | | | | | | | | This adds support for fine derivatives and enables ARB_derivative_control on radeonsi. (just fell out of my working out interpolation) v2: cleanup some bits, write a comment v2.1: take Michel's comment from the mailing list Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: fix GLSL textureGrad(samplerCube*) functionsMarek Olšák2015-07-253-34/+90
| | | | | | +4 piglits Reviewed-by: Michel Dänzer <[email protected]>