summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/nouveau
Commit message (Collapse)AuthorAgeFilesLines
* meson: move libsensors dependency to libgalliumDylan Baker2018-01-111-1/+1
| | | | | | | | | This simplifies the build by removing the need to link targets against libsensors. Suggested-by: Emil Velikov <[email protected]> Signed-off-by: Dylan Baker <[email protected]> Acked-by: Eric Engestrom <[email protected]>
* nvc0: enable bindless on keplerIlia Mirkin2018-01-071-3/+3
| | | | | | | All the functionality is in. Maxwell will take a little bit more enablement work. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: add bindless image support for keplerIlia Mirkin2018-01-0711-75/+272
| | | | | | | | A part of the driver constbuf area is allocated for bindless images. Any update requires uploading to all driver constbufs. This also extends the driver constbuf to 64KB, up from 2KB. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: add support for bindless textures on kepler+Ilia Mirkin2018-01-0710-5/+183
| | | | | | | | | This keeps a list of resident textures (per context), and dumps that list into the active buffer list when submitting. We also treat bindless texture fetches slightly differently, wrt the meaning of indirect, and not requiring the SAMPLER file to be used. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: use the image info in the instruction rather than declIlia Mirkin2018-01-071-52/+24
| | | | | | | | | | In preparation for bindless images, we have to retrieve the target/format info from the instruction directly, as there will be no declaration. Furthermore, for bound images, this information is still available in the instruction, so we can drop the declaration-based mechanism entirely. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: safen up lowering logic against overwriting reused valuesIlia Mirkin2018-01-071-2/+4
| | | | | | | | | I'm fairly sure both of the changed sites are OK as-is, but they're fragile, so this is just safening them up. Since this is happening pre-ssa, we don't want to be overwriting values that may potentially get used later on. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: update tic in-place when buffer address changesIlia Mirkin2018-01-072-14/+21
| | | | | | This is helpful for bindless, where changing TIC id's is undesirable. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: ensure that pushbuf keeps ref to old text/tls bosIlia Mirkin2018-01-071-0/+13
| | | | | | | | | If we free the bo, then the PTE may get deallocated immediately. We have to make sure that the submission includes a ref to the old bo so that it remains mapped for the duration of the command execution. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50/ir: Fix unused var warnings in release buildRhys Kidd2017-12-292-2/+4
| | | | | | | | v2: Add preventative comment (Ilia Mirkin) Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Pierre Moreau <[email protected]> Signed-off-by: Rhys Kidd <[email protected]>
* nvc0: Fix unused var warnings in release buildRhys Kidd2017-12-291-3/+4
| | | | | Reviewed-by: Pierre Moreau <[email protected]> Signed-off-by: Rhys Kidd <[email protected]>
* nv50: Fix unused var warning in release buildRhys Kidd2017-12-291-1/+2
| | | | | Reviewed-by: Pierre Moreau <[email protected]> Signed-off-by: Rhys Kidd <[email protected]>
* gm107/ir: use lane 0 for manual textureGrad handlingIlia Mirkin2017-12-221-21/+34
| | | | | | | | | | This is parallel to the pre-SM50 change which does this. Adjusts the shuffles / quadops to make the values correct relative to lane 0, and then splat the results to all lanes for the final move into the target register. Signed-off-by: Ilia Mirkin <[email protected]> Tested-By: Karol Herbst <[email protected]>
* nvc0/ir: change textureGrad to always use lane 0 as the tex originIlia Mirkin2017-12-191-14/+46
| | | | | | | | | | | | | | | | | | | Thanks to Karol Herbst for the debugging / tracing work that led to this change. Move to using lane 0 as the "work" lane for the texture. It is unclear why this helps, as that computation should be identical to doing it in the "correct" lane with the properly adjusted quadops. In order to be able to use the lane 0 result, we also have to ensure that lane 0 contains the proper array/indirect/shadow values. This applies to Fermi and Kepler. Maxwell+ may or may not need fixing, but that lowering logic is separate. Fixes KHR-GL45.texture_cube_map_array.sampling Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: plumb context priority through to driverRob Clark2017-12-193-0/+3
| | | | | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Andres Rodriguez <[email protected]> Reviewed-by: Wladimir J. van der Laan <[email protected]>
* meson: define driver dependenciesDylan Baker2017-12-041-0/+5
| | | | | | | | | | | | This allow us to encapsulate the compiler and linkage requirements of each driver in a reusable way. The result will be that each target that needs a specific driver can simply add `driver_<name>` to its dependencies line and the necessary libraries and compiler args will be added. This will allow for a lot of code de-duplication between gallium targets. Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* nvc0/ir: Properly lower 64-bit shifts when the shift value is >32Pierre Moreau2017-12-041-1/+1
| | | | | | | | | | | | | | | | | | | | Fixes: 61d7676df77 "nvc0/ir: add support for 64-bit shift lowering on SM20/SM30" Fixes fs-shift-scalar-by-scalar.shader_test from piglit for the current set-up: uniform int64_t ival -0x7dfcfefbdf6536ff # bit pattern: 0x82030104209ac901 uniform uint64_t uval 0x1400000085010203 uniform int shl 36 uniform int shr 36 uniform int64_t iexpected_shl 0x09ac901000000000 uniform int64_t iexpected_shr -0x7dfcff0 # bit pattern: 0xfffffffff8203010 uniform uint64_t uexpected_shl 0x5010203000000000 uniform uint64_t uexpected_shr 0x0000000001400000 draw rect ortho 12 0 4 4 Signed-off-by: Pierre Moreau <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* meson: Add lmsensors supportDylan Baker2017-12-011-1/+1
| | | | | | | | v2: - Make -Dlmsensors=false work - Simplify auto and true cases Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* nouveau/compiler: Allow to omit line numbers when printing instructionsTobias Klausmann2017-11-265-4/+13
| | | | | | | | | | | | | | | | This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff! V2: - Use environmental variable (Karol Herbst) V3: - Use the already populated nv50_ir_prog_info to forward information to the print pass (Pierre Moreau) V4: - get rid of default value in PrintPass constructor Signed-off-by: Tobias Klausmann <[email protected]> Reviewed-by: Pierre Moreau <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: move LateAlgebraicOpt to the very endIlia Mirkin2017-11-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | Memory loads can take offsets, but the SHLADD will often attempt to consume the offsets too. As there may be multiple memory loads with the same base but different offsets, those would end up in a SHLADD instead of the offset of the memory operation. This moves the pass after we've had a chance to attempt to propagate immediate adds into the indirect offset. total instructions in shared programs : 6580681 -> 6567716 (-0.20%) total gprs used in shared programs : 944261 -> 943375 (-0.09%) total shared used in shared programs : 0 -> 0 (0.00%) total local used in shared programs : 15328 -> 15328 (0.00%) total bytes used in shared programs : 60339896 -> 60221504 (-0.20%) local shared gpr inst bytes helped 0 0 555 2698 2698 hurt 0 0 138 336 336 Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: when merging immediates/consts, load directlyIlia Mirkin2017-11-261-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | When a MERGE operation gets its constraint moves added, it susbstantially extends live ranges to be reusing an immediate from earlier in the program (not to mention the silliness of loading an immediate into a register, and then moving into another register). We detect these scenarios and insert moves that take the immediate or constbuf load directly into the register. If it's the last use, then we can just move that operation to the closer location. With SM35 (255 regs) we get these results: total instructions in shared programs : 6583670 -> 6580681 (-0.05%) total gprs used in shared programs : 950818 -> 944261 (-0.69%) total shared used in shared programs : 0 -> 0 (0.00%) total local used in shared programs : 15328 -> 15328 (0.00%) total bytes used in shared programs : 60367456 -> 60339896 (-0.05%) local shared gpr inst bytes helped 0 0 4584 3186 3186 hurt 0 0 55 968 968 I suspect they will be better for SM20 and SM30. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: add optimization for modulo by a non-power-of-2 valueIlia Mirkin2017-11-261-0/+15
| | | | | | | | We can still use the optimized division methods which make use of multiplication with overflow. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tobias Klausmann <[email protected]>
* nv50/ir: optimize signed integer modulo by pow-of-2Ilia Mirkin2017-11-252-10/+29
| | | | | | | | | It's common to use signed int modulo in GLSL. As it happens, the GLSL specs allow the result to be undefined, but that seems fairly surprising. It's not that much more effort to get it right, at least for positive modulo operators. Signed-off-by: Ilia Mirkin <[email protected]>
* meson: don't use build_by_default for specific gallium driversDylan Baker2017-11-131-1/+0
| | | | | | | | | | | | | | | | | | | Using build_by_default : false is convenient for dependencies that can be pulled in by various diverse components of the build system, the gallium hardware/software drivers and state trackers do not fit that description. Instead, these should be guarded using the variable that tracks whether that driver should be enabled. This leaves a few helper libraries: trace, rbug, etc, and the generic winsys bits as `build_by_default : false` because there are a large number of gallium components that pull them in. v2: - remove build_by_default from winsys convenience libs as well. v3: - Always put drivers before winsys for consistency Signed-off-by: Dylan Baker <[email protected]> Tested-by: Lionel Landwerlin <[email protected]> (v1) Reviewed-by: Eric Anholt <[email protected]>
* gallium: add CAPs to support HW atomic counters. (v3)Dave Airlie2017-11-103-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | This looks like an evergreen specific feature, but with atomic counters AMD have hw specific counters they use instead of operating on buffers directly. These are separate to the buffer atomics, so require different limits and code paths. I've left the CAP for atomic type extensible in case someone else has a variant on this sort of thing (freedreno maybe?) and needs to change it. This adds all the CAPs required to add support for those atomic counters, along with a related CAP for limiting the number of output resources. I'd like to land this and the st patch then I can start to upstream the evergreen support for these and other GL4.x features. v2: drop the ATOMIC_COUNTER_MODE cap, just use the return from the HW counters. If 0 we use the current mode. v3: fix some rebase errors (Gert Wollny) Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Tested-By: Gert Wollny <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* util: move os_time.[ch] to src/utilNicolai Hähnle2017-11-092-2/+2
| | | | Reviewed-by: Marek Olšák <[email protected]>
* nv50: make blending work so that zero wins in a multiplicationIlia Mirkin2017-11-081-0/+5
| | | | | | | This matches nvc0 behavior, tested with the fbo-float-nan piglit. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tobias Klausmann<[email protected]>
* gallium: add PIPE_CAP_SIGNED_VERTEX_BUFFER_OFFSETMarek Olšák2017-11-063-0/+3
|
* nv50,nvc0: Display shared memory usage in pipe_debug_messagePierre Moreau2017-11-042-6/+8
| | | | Signed-off-by: Pierre Moreau <[email protected]>
* nv50,nvc0: Copy shared memory per block to the program info structure and backPierre Moreau2017-11-042-0/+4
| | | | | | | | In OpenCL/CUDA kernels, shared memory usage can be defined within the kernel code. Those usage will only be picked up while parsing the SPIR-V, during the translation phase of the program. Signed-off-by: Pierre Moreau <[email protected]>
* nv50/ir: Store shared memory per block in nv50_ir_prog_infoPierre Moreau2017-11-041-0/+1
| | | | Signed-off-by: Pierre Moreau <[email protected]>
* gallium: add cap for driver specified max combined shader resources.Dave Airlie2017-11-013-0/+3
| | | | | | | | Some hw (evergreen) has a limit on how many combined (images/buffers/mrts) a fragment shader can access. Reviewed-by: Ilia Mirkin <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* gallium: s/unsigned/enum pipe_prim_type/Brian Paul2017-10-271-1/+1
| | | | | | In the vbuf_render::set_primitive() functions. Reviewed-by: Roland Scheidegger <[email protected]>
* meson: build nouveau (gallium) driverDylan Baker2017-10-161-0/+224
| | | | | | | | | | | Tested with a GK107. v2: - Add target for nouveau standalone compiler. This target is not built by default. v3: - Add nouveau to list of drivers built by default Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Anholt <eric at anholt.net>
* nv50,nvc0: fix push hint logic in presence of a start offsetIlia Mirkin2017-10-112-7/+5
| | | | | | | | | | | | | | | | Previously buffer offsets were passed in explicitly as an offset, which had to be added to the resource address. Now they are passed in via an increased 'start' parameter. As a result, we were double-adding the start offset in this kind of situation. This condition was triggered by piglit's draw-elements test which has a requisite glMultiDrawElements in combination with a small enough number of vertices to go through the immediate push path. Fixes: 330d0607ed6 ("gallium: remove pipe_index_buffer and set_index_buffer") Reported-by: Karol Herbst <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* gallium: Create a new PIPE_CAP_TILE_RASTER_ORDER for vc4.Eric Anholt2017-10-103-0/+3
| | | | | | | | | | | | | | | | Because vc4 can control the order that tiles are rasterized in, we can use it to implement overlapping blits using normal drawing and GL_ARB_texture_barrier, as long as we can tell the kernel what order to render the tiles in. This commit introduces the core gallium support, vc4 changes will follow. v2: Fix on the simulator. v3: Add the cap (disabled) to other drivers, add rst docs for the cap. v4: Rebase on PIPE_CAP_TGSI_ANY_REG_AS_ADDRESS v5: Drop vc4 changes from this commit, for clarity. Reviewed-by: Nicolai Hähnle <[email protected]> (v3)
* nv50/ir: fix 64-bit integer shiftsIlia Mirkin2017-10-091-1/+3
| | | | | | | | | TGSI was adjusted to always pass in 64-bit integers but nouveau was left with the old semantics. Update to the new thing. Fixes: d10fbe5159 (st/glsl_to_tgsi: fix 64-bit integer bit shifts) Reported-by: Karol Herbst <[email protected]> Cc: [email protected]
* gallium: add PIPE_CAP_TGSI_ANY_REG_AS_ADDRESSMarek Olšák2017-10-063-0/+3
| | | | | Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: Remove util_format_s3tc_init()Matt Turner2017-10-021-2/+0
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* gallium: add LDEXP TGSI instruction and corresponding capNicolai Hähnle2017-09-293-0/+4
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* gallium: Add PIPE_SHADER_CAP_INT64_ATOMICSJan Vesely2017-09-213-0/+3
| | | | | | | Denotes availability of 64bit int atomic instructions Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: Add PIPE_SHADER_CAP_FP16Jan Vesely2017-09-183-0/+4
| | | | | | | | | Denotes native half precision float operations capability v2: PIPE_CAP_HALFS -> PIPE_SHADER_CAP_FP16 fix indentation Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* nvc0: fix compile errorBenedikt Schemmer2017-09-181-1/+1
| | | | | | | Fixes: 3f6b3d9db ("gallium: add PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE") Signed-off-by: Benedikt Schemmer <[email protected]> Previously-pointed-out-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVENicolai Hähnle2017-09-185-2/+15
| | | | | | | | | | | | | | | | | To be able to properly distinguish between GL_ANY_SAMPLES_PASSED and GL_ANY_SAMPLES_PASSED_CONSERVATIVE. This patch goes through all drivers, having them treat the two query types identically, except: 1. radeon incorrectly enabled conservative mode on PIPE_QUERY_OCCLUSION_PREDICATE. We now do it correctly, only on PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE. 2. st/mesa uses the new query type. Fixes dEQP-GLES31.functional.fbo.no_attachments.* Reviewed-by: Marek Olšák <[email protected]>
* gallium: introduce PIPE_CAP_LOAD_CONSTBUFTimothy Arceri2017-09-153-0/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* nvc0/ir: propagate immediates to CALL input MOVsTobias Klausmann2017-08-311-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On using builtin functions we have to move the input to registers $0 and $1, if one of the input value is an immediate, we fail to propagate the immediate: ... mov u32 $r477 0x00000003 (0) ... mov u32 $r0 %r473 (0) mov u32 $r1 $r477 (0) call abs BUILTIN:0 (0) mov u32 %r495 $r1 (0) ... With this patch the immediate is propagated, potentially causing the first MOV to be superfluous, which we'd remove in that case: ... mov u32 $r0 %r473 (0) mov u32 $r1 0x00000003 (0) call abs BUILTIN:0 (0) mov u32 %r495 $r1 (0) ... Shaderdb stats: total instructions in shared programs : 4893460 -> 4893324 (-0.00%) total gprs used in shared programs : 582972 -> 582881 (-0.02%) total local used in shared programs : 17960 -> 17960 (0.00%) local gpr inst bytes helped 0 91 112 112 hurt 0 0 0 0 v2: implement some changes proposed by imirkin, the manual deletion of the dead mov is necessary after ea22ac23e0 ("nvc0/ir: unlink values pre- and post-call to division function") as the potentially dead mov is unlinked properly, causing later passes to not notice the mov op at all and thus not cleaning it up. That makes up a big chunk of the regression the above commit caused. Keep the deletion of the op where it is, deleting it later unnecessarily blows up size of the change. Signed-off-by: Tobias Klausmann <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: write 0 to pipeline_statistics.cs_invocationsKarol Herbst2017-08-311-0/+1
| | | | | | | | | | | | | cs_invocations are currently unsupported, but leaving the field uninitialized is even worse. fixes on nvc0: * KHR-GL45.pipeline_statistics_query_tests_ARB.functional_default_qo_values * KHR-GL45.pipeline_statistics_query_tests_ARB.functional_non_rendering_commands_do_not_affect_queries Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nv50/ir: properly set sType for TXF ops to U32Ilia Mirkin2017-08-241-0/+3
| | | | | | | | | | | | All of the coordinates and LOD args are integers for TXF. This mostly doesn't matter, except for converting into a levelZero=true operation by removing an explicit zero LOD. For the comparison against zero to work properly, the sType of the instruction has to be set correctly. Fixes: KHR-GL45.robust_buffer_access_behavior.texel_fetch Reported-by: Karol Herbst <[email protected]> Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* gallium: remove TGSI opcode SCSMarek Olšák2017-08-223-32/+0
| | | | | | | use COS+SIN instead. Reviewed-by: Roland Scheidegger <[email protected]> Acked-by: Jose Fonseca <[email protected]>
* gallium: remove TGSI opcode XPDMarek Olšák2017-08-225-39/+0
| | | | | | use MUL+MAD+MOV instead. Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: remove TGSI opcode DPHMarek Olšák2017-08-223-16/+0
| | | | | | use DP4 or DP3 + ADD. Reviewed-by: Roland Scheidegger <[email protected]>