summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* freedreno/a3xx: LAYERSZ2 appears to have no effect on arraysIlia Mirkin2015-03-281-2/+1
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* llvmpipe: simplify address calculation for 4x4 blocksRoland Scheidegger2015-03-284-76/+35
| | | | | | | | | | | | | | These functions looked quite complicated, even though what they actually did was trivial (ever since we dropped swizzled rendering). Also drop lookup of format block per bytes done for each block, and do it once per scene instead. This improves everybody's favorite "benchmark" by 3% or so, though lp_rast_shade_quads_all() which calls this shows up still quite high for a function which does little more than call the jit function. (This would most likely be much better handled by the jit function itself, the strides are passed through anyway already, though for being able to handle layers it would definitely add some complexity.) Reviewed-by: Jose Fonseca <[email protected]>
* nv50/ir/gk110: fix offset flag position for TXD opcodeIlia Mirkin2015-03-271-0/+1
| | | | | Cc: "10.4 10.5" <[email protected]> Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: take postFactor into account when doing peephole optimizationsIlia Mirkin2015-03-271-4/+8
| | | | | | | | | | Multiply operations can have a post-factor on them, which other ops don't support. Only perform the peephole optimizations when there is no post-factor involved. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89758 Cc: "10.4 10.5" <[email protected]> Signed-off-by: Ilia Mirkin <[email protected]>
* gallivm: pass jit_context pointer through to samplingRoland Scheidegger2015-03-273-18/+19
| | | | | | | | | | | | | | The callbacks used for getting the dynamic texture/sampler state were using the jit_context from the generated jit function. This works just fine, however that way it's impossible to generate separate functions for texture sampling, as will be done in the next commit. Hence, pass this pointer through all interfaces so it can be passed to a separate function (technically, it would probably be possible to extract this pointer from the current function instead, but this feels hacky and would probably require some more hacks if we'd use real functions instead of inlining all shader functions at some point). There should be no difference in the generated code for now. Reviewed-by: Jose Fonseca <[email protected]>
* gallium/util: remove u_linkageIlia Mirkin2015-03-262-2/+0
| | | | | | | | Does not appear to be used in tree. Coverity spotted some errors in the bitmask stuff, but the whole thing appears to be unused. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* vc4: Add a dump-the-surface-contents routine.Eric Anholt2015-03-242-0/+101
| | | | | This has been useful once again while trying to debug stride issues between render targets and texturing.
* vc4: Fix pitch alignment of linear textures.Eric Anholt2015-03-241-1/+1
| | | | | Fixes some non-power-of-two texture rendering when I force ARGB8888 to raster.
* vc4: Write the alignment of level width consistently in validation.Eric Anholt2015-03-241-2/+2
| | | | | | 16 / cpp happens to be the same as utile_w on the only raster format supported (4 bytes per pixel), but simulator/hw source code generally talks in terms of utiles.
* vc4: Fix use of a bool as an enum.Eric Anholt2015-03-241-1/+1
| | | | The enum compared to was 0, so it worked out, but it sure looked wrong.
* vc4: Decide the HW's format before laying out the miptree.Eric Anholt2015-03-241-3/+3
| | | | | | I'm experimenting with a workaround for raster texture misrendering on hardware, and this lets me look at the format chosen when computing strides.
* vc4: Use our device-specific ioctls for create/mmap.Eric Anholt2015-03-241-15/+36
| | | | | | They don't do anything special for us, but I've been told by kernel maintainers that relying on dumb for my acceleration-capable buffers is not OK.
* vc4: Make a new #define for making code conditional on the simulator.Eric Anholt2015-03-243-15/+25
| | | | | | I'd like to compile as much of the device-specific code as possible when building for simulator, and using if (using_simulator) instead of ifdefs helps.
* vc4: Add some useful debug printfs for miptrees.Eric Anholt2015-03-241-0/+37
| | | | I keep rewriting these.
* Revert "nv50,nvc0: remove bogus 64_FLOAT formats"Ilia Mirkin2015-03-231-0/+5
| | | | | | | | | | | | | | | This reverts commit 20346808cf4f1ee4f320afaf18f94043fb146f2e. The conversion is actually done since these are the *B macro variants and no vtx format is supplied, which makes them go through the translate module. This restores the following piglit tests to passing: draw-vertices user gl-2.0-vertexattribpointer Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: implement get_device_vendor() for existing driversGiuseppe Bilotta2015-03-2313-0/+83
| | | | | | | | | The only hackish ones are llvmpipe and softpipe, which currently return the same string as for get_vendor(), while ideally they should return the CPU vendor. Signed-off-by: Giuseppe Bilotta <[email protected]> Reviewed-by: Tom Stellard <[email protected]>
* galahad: actually remove the driverEmil Velikov2015-03-2110-1998/+0
| | | | | | | | Should have been part of 429a4355259(galahad: remove driver). Seems like I've erroneously committed the trimmed patch. Reported-by: Marek Olšák <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* llvmpipe: use global llvm context for PIPE_SUBSYSTEM_EMBEDDEDRoland Scheidegger2015-03-211-0/+11
| | | | | | | | | | | | | | | | | There's 2 reasons why we'd want to use the global context: 1) There still seems to be one memory "leak" left when using multiple llvm contexts (it is not a true leak as the memory disappears into some still addressable pool but nevertheless the memory consumption grows). See http://cgit.freedesktop.org/~jrfonseca/llvm-jitstress/ 2) These contexts get kinda big - even when disposing modules etc. after compiling a shader the LLVMContext can easily be over 100kB. So when there's lots of llvm contexts arounds it adds up. The downside is that at least right now this is absolutely not thread safe, so this only works safely in environments where multiple pipe contexts are not used concurrently. Reviewed-by: Jose Fonseca <[email protected]>
* u_primconvert: add primitive restart supportDave Airlie2015-03-201-1/+2
| | | | | | | | | | | | | | | | | | | | This add primitive restart support to the prim conversion. This involves changing the API for the translate functions as we need to pass the prim restart index and the original number of indices into the translate functions. primitive restart is support for quads, quad strips and polygons. This deal with the case where the actual number of output primitives is less than the initially calculated number, by filling the rest of the output primitives with the restart index, the other option is to reduce the output prim number, but that will make the generator code a bit messier. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* freedreno/ir3: fix infinite recursion in schedRob Clark2015-03-181-1/+1
| | | | | | | One more case we need to handle. One of the src instructions for the indirect could also end up being ourself. Signed-off-by: Rob Clark <[email protected]>
* freedreno: fix spellingRob Clark2015-03-181-1/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* radeonsi: increase coords array size for radeon_llvm_emit_prepare_cube_coordsMarek Olšák2015-03-182-2/+2
| | | | | | | | | radeon_llvm_emit_prepare_cube_coords uses coords[4] in some cases (TXB2 etc.) Discovered by Coverity. Reported by Ilia Mirkin. Cc: 10.5 10.4 <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* r600g: constify r600_shader_tgsi_instruction lists.Emil Velikov2015-03-171-5/+5
| | | | | | | Massive list of constant data. Annotate it as such. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* r600g: kill off r600_shader_tgsi_instruction::{tgsi_opcode,is_op3}Emil Velikov2015-03-171-591/+589
| | | | | | | | Both of which are no longer used. Use designated initializer to make things obvious as people add/remove TGSI_OPCODEs. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* r600g: use the tgsi opcode from parse.FullToken.FullInstructionEmil Velikov2015-03-171-5/+8
| | | | | | | | | ... rather than the local one in inst_info->tgsi_opcode. This will allow us to simplify struct r600_shader_tgsi_instruction. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: implement TGSI_OPCODE_BFI (v2)Marek Olšák2015-03-161-0/+34
| | | | | | | v2: Don't use the intrinsics, the shader backend can recognize these patterns and generates optimal code automatically. Reviewed-by: Tom Stellard <[email protected]>
* radeonsi: add a helper for extracting bitfields from parameters (v2)Marek Olšák2015-03-161-16/+27
| | | | | | | | This will be used a lot (especially by tessellation). v2: don't use the bfe intrinsic Reviewed-by: Tom Stellard <[email protected]>
* radeonsi: move scratch reloc state setupMarek Olšák2015-03-162-15/+22
| | | | | | | | - move it to its own function - do it after all states are emitted - bump SI_MAX_DRAW_CS_DWORDS Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: don't emit PA_SC_LINE_STIPPLE if not rendering linesMarek Olšák2015-03-161-0/+8
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: don't emit PA_SC_LINE_STIPPLE after every rasterizer state changeMarek Olšák2015-03-164-7/+7
| | | | | | Do it only when the line stipple state is changed. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: move PA_SU_SC_MODE_CNTL to rasterizer stateMarek Olšák2015-03-165-30/+14
| | | | | | | | | This requires enabling the optional GL provoking vertex behavior for quads. + some cosmetic changes, so that the register is set exactly the same as on r600. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: implement line and polygon smoothingMarek Olšák2015-03-164-10/+49
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: add shader code for smoothingMarek Olšák2015-03-163-1/+39
| | | | | | | The fragment shader multiplies the alpha channel with gl_SampleMaskIn. If blending is enabled, it looks like MSAA. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: split sample locations into its own state atomMarek Olšák2015-03-165-0/+18
| | | | | | Sample locations are not updated as often as framebuffers. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: add basic code for overrasterizationMarek Olšák2015-03-165-16/+28
| | | | | | | This will be used for line and polygon smoothing. This is GCN-only even though it's in shared code. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: small cleanup in si_shader_selector_keyMarek Olšák2015-03-161-12/+12
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: simplify accessing alpha pointer in si_llvm_emit_fs_epilogueMarek Olšák2015-03-161-7/+8
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: add support for easy opcodes from ARB_gpu_shader5Marek Olšák2015-03-161-0/+8
| | | | | | | | I have to use the BFE instrinsics, because BFE is one of the most complex instructions that can't be matched easily. BFE has 3 conditional branches and one of them is quite big. In the isel DAG, lowered BFE has 27 nodes (including leafs).
* radeonsi: implement bit-finding opcodes from ARB_gpu_shader5Marek Olšák2015-03-161-0/+92
| | | | Reviewed-by: Glenn Kennard <[email protected]>
* radeonsi: implement gl_SampleMaskInMarek Olšák2015-03-161-0/+4
| | | | Reviewed-by: Glenn Kennard <[email protected]>
* radeonsi: add support for SQRTMarek Olšák2015-03-162-1/+3
| | | | | Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Glenn Kennard <[email protected]>
* radeonsi: add support for FMAMarek Olšák2015-03-162-1/+4
| | | | | Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Glenn Kennard <[email protected]>
* gallium/radeon: don't use LLVMReadOnlyAttribute for ALUMarek Olšák2015-03-161-16/+9
| | | | | | | None of the instructions use a pointer argument. (+ small cosmetic changes) Reviewed-by: Tom Stellard <[email protected]>
* gallium: add FMA and DFMA opcodes (v3)Marek Olšák2015-03-1611-3/+16
| | | | | | | | | Needed by ARB_gpu_shader5. v2: select DMAD for FMA with double precision v3: add and select DFMA Reviewed-by: Ilia Mirkin <[email protected]>
* freedreno: update generated headersRob Clark2015-03-155-6/+6
| | | | | | | Fix a3xx texture layer-size. Signed-off-by: Rob Clark <[email protected]> Cc: "10.4 10.5" <[email protected]>
* freedreno/ir3: remove old compilerRob Clark2015-03-157-1574/+10
| | | | | | | Now that piglit is no longer falling back to old compiler for any tests, we can remove it. Hurray \o/ Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: avoid scheduler deadlockRob Clark2015-03-153-0/+45
| | | | | | | | | | | Deadlock can occur if we schedule an address register write, yet some instructions which depend on that address register value also depend on other unscheduled instructions that depend on a different address register value. To solve this, before scheduling an address register write, ensure that all the other dependencies of the instructions which consume this address register are already scheduled. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: bit of cleanupRob Clark2015-03-153-19/+23
| | | | | | | | Add an array_insert() macro to simplify inserting into dynamically sized arrays, add a comment, and remove unused prototype inherited from the original freedreno.git/fdre-a3xx test code, etc. Signed-off-by: Rob Clark <[email protected]>
* freedreno: fix slice pitch calculationsIlia Mirkin2015-03-131-1/+1
| | | | | | | | | | | | | For example if width were 65, the first slice would get 96 while the second would get 32. However the hardware appears to expect the second pitch to be 64, based on halving the 96 (and aligning up to 32). This fixes texelFetch piglit tests on a3xx below a certain size. Going higher they break again, but most likely due to unrelated reasons. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.4 10.5" <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/a3xx: use the same layer size for all slicesIlia Mirkin2015-03-131-1/+8
| | | | | | | | | | | | | | | | | | We only program in one layer size per texture, so that means that all levels must share one size. This makes the piglit test bin/texelFetch fs sampler2DArray have the same breakage as its non-array version instead of being completely off, and makes bin/ext_texture_array-gen-mipmap start passing. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.4 10.5" <[email protected]> Reviewed-by: Rob Clark <[email protected]>