summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* i915g: Add more optimizationsStéphane Marchesin2013-09-043-58/+371
| | | | | | | | | This patch adds liveness analysis to i915g and a couple optimizations which benefit from it. One interesting optimization turns (fake) indirect texture accesses into direct texture accesses (the i915 supports a maximum of 4 indirect texture accesses). Among other things this fixes a bunch of piglit tests.
* radeonsi: Don't save/restore FMASK sampler view states for u_blitterMichel Dänzer2013-09-021-1/+2
| | | | | | | Fixes assertion failues in 24 piglit tests with MESA_GL_VERSION_OVERRIDE=3.0, 12 of which are now passing. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Expose pure integer vertex formatsMichel Dänzer2013-09-021-1/+6
| | | | | | Fixes 20 piglit tests with MESA_GL_VERSION_OVERRIDE=3.0. Reviewed-by: Marek Olšák <[email protected]>
* nvc0: restore viewport after blitMaarten Lankhorst2013-09-023-4/+7
| | | | | | | Based on calim's original fix in the nine branch. Signed-off-by: Maarten Lankhorst <[email protected]> Cc: "9.2 and 9.1" <[email protected]>
* radeon/uvd: save the aligned width & heightChristian König2013-09-021-0/+2
| | | | | | Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=68845 Signed-off-by: Christian König <[email protected]>
* nvc0: delete compute object on screen destructionChristoph Bumiller2013-09-011-0/+1
| | | | Cc: "9.2" <[email protected]>
* nvc0: fix blitctx memory leakJoakim Sindholt2013-09-013-0/+9
| | | | Cc: "9.2 and 9.1" <[email protected]>
* nvc0: don't use bufctx in nvc0_cb_pushChristoph Bumiller2013-09-011-7/+3
| | | | Too many calls into libdrm when a single one is enough.
* nvc0: clear the flushed flagChristoph Bumiller2013-09-011-5/+4
|
* nvc0/ir: add f32 long immediate cannot saturateChristoph Bumiller2013-09-011-0/+12
| | | | Cc: "9.2" <[email protected]>
* nvc0/ir: fix use after free in texture barrier insertion passTiziano Bacocco2013-09-011-1/+2
| | | | | | Fixes crash with Amnesia: The Dark Descent. Cc: "9.2 and 9.1" <[email protected]>
* nv30: find first unused texcoord rather than bailing if first is usedIlia Mirkin2013-09-011-2/+1
| | | | | | | This fixes shaders produced by supertuxkart. Cc: "9.2" <[email protected]> Signed-off-by: Ilia Mirkin <[email protected]>
* nouveau: initialise the nouveau_transfer mapsEmil Velikov2013-09-011-0/+2
| | | | | Cc: "9.2 and 9.1" <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* radeonsi: simplify and improve flushingMarek Olšák2013-08-3112-140/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This mimics r600g. The R600_CONTEXT_xxx flags are added to rctx->b.flags and si_emit_cache_flush emits the packets. That's it. The shared radeon code tells us when the streamout cache should be flushed, so we have to check the flags anyway. There is a new atom "cache_flush", because caches must be flushed *after* resource descriptors are changed in memory. Functional changes: * Write caches are flushed at the end of CS and read caches are flushed at its beginning. * Sampler view states are removed from si_state, they only held the flush flags. * Everytime a shader is changed, the I cache is flushed. Is this needed? Due to a hw bug, this also flushes the K cache. * The WRITE_DATA packet is changed to use TC, which fixes a rendering issue in openarena. I'm not sure how TC interacts with CP DMA, but for now it seems to work better than any other solution I tried. (BTW CIK allows us to use TC for CP DMA.) * Flush the K cache instead of the texture cache when updating resource descriptors (due to a hw bug, this also flushes the I cache). I think the K cache flush is correct here, but I'm not sure if the texture cache should be flushed too (probably not considering we use TC for WRITE_DATA, but we don't use TC for CP DMA). * The number of resource contexts is decreased to 16. With all of these cache changes, 4 doesn't work, but 8 works, which suggests I'm actually doing the right thing here and the pipeline isn't drained during flushes. Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Christian König <[email protected]> Tested-by: Tom Stellard <[email protected]>
* radeonsi: convert constant buffers to si_descriptorsMarek Olšák2013-08-315-128/+162
| | | | | | | | | | | | | | | There is a new "class" si_buffer_resources, which should be good enough for implementing any kind of buffer bindings (constant buffers, vertex buffers, streamout buffers, shader storage buffers, etc.) I don't even keep a copy of pipe_constant_buffer - we don't need it. The main motivation behind this is to have a well-tested infrastrusture for setting up streamout buffers. Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Christian König <[email protected]> Tested-by: Tom Stellard <[email protected]>
* radeonsi: use r600_common_context, r600_common_screen, r600_resourceMarek Olšák2013-08-3128-777/+338
| | | | | | | | | Also r600_hw_context_priv.h and si_state_streamout.c are removed, because they are no longer needed. Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Christian König <[email protected]> Tested-by: Tom Stellard <[email protected]>
* r600g: move streamout state to drivers/radeonMarek Olšák2013-08-3127-1486/+1825
| | | | | | | | | | | | | | | | | | | | | | | | This streamout state code will be used by radeonsi. There are new structures r600_common_context and r600_common_screen. What is inherited by what is shown here: pipe_context -> r600_common_context -> r600_context pipe_screen -> r600_common_screen -> r600_screen The common structures reside in drivers/radeon. Currently they only contain enough functionality to be able to handle streamout. Eventually I'd like the whole pipe_screen implementation to be shared and some of the context stuff too. This is quite big, but most changes are because of the new structures and the fact r600_write_value is replaced by radeon_emit. Thanks to Tom Stellard for fixing the build for r600g/compute. Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Christian König <[email protected]> Tested-by: Tom Stellard <[email protected]>
* radeonsi: cleanup initialization of SGPR shader parametersMarek Olšák2013-08-311-13/+19
| | | | | | Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Christian König <[email protected]> Tested-by: Tom Stellard <[email protected]>
* r600g,radeonsi: remove unused variablesMarek Olšák2013-08-312-8/+0
| | | | | | Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Christian König <[email protected]> Tested-by: Tom Stellard <[email protected]>
* draw: fix segfaults with aaline and aapoint stages disabledMarek Olšák2013-08-311-2/+4
| | | | | | | | | | There are drivers not using these optional stages. Broken by a3ae5dc7dd5c2f8893f86a920247e690e550ebd4. Cc: [email protected] Reviewed-by: Jose Fonseca <[email protected]>
* radeonsi: Do not suspend timer queriesNiels Ole Salscheider2013-08-306-14/+30
| | | | | Signed-off-by: Niels Ole Salscheider <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* draw: fix PIPE_MAX_SAMPLER/PIPE_MAX_SHADER_SAMPLER_VIEWS issuesRoland Scheidegger2013-08-302-6/+6
| | | | | | | | | | | | pstipple/aaline stages used PIPE_MAX_SAMPLER instead of PIPE_MAX_SHADER_SAMPLER_VIEWS when dealing with sampler views. Now these stages can't actually handle sampler_unit != texture_unit anyway (they cannot work with d3d10 shaders at all due to using tex not sample opcodes as "mixed mode" shaders are impossible) but this leads to crashes if a driver just installs these stages and then more than PIPE_MAX_SAMPLER views are set even if the stages aren't even used. Reviewed-by: Zack Rusin <[email protected]>
* gallivm: handle unbound textures in texture sampling / texture queriesRoland Scheidegger2013-08-301-0/+26
| | | | | | | | | | | | | | Turns out we don't need to do much extra work for detecting this case, since we are guaranteed to get a empty static texture state in this case, hence just rely on format being 0 and return all zero then. Previously needed dummy textures (would just have crashed on format being 0 otherwise) which cannot return the correct result for size queries and when sampling textures with wrap modes using border. As a bonus should hugely increase performance when sampling unbound textures - too bad it isn't a useful feature :-). Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Zack Rusin <[email protected]>
* softpipe: handle NULL sampler views for texture sampling / queriesRoland Scheidegger2013-08-302-5/+26
| | | | | | | Instead of crashing just return all zero. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Zack Rusin <[email protected]>
* softpipe: check if so_target is NULL before accessing itRoland Scheidegger2013-08-301-2/+5
| | | | | | | | | | No idea if this is working right but copied straight from llvmpipe. (Not only does this check the so_target but also use buffer->data instead of buffer for the mapping.) Just trying to get rid of a segfault testing something else... Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Zack Rusin <[email protected]>
* gallivm: (trivial) don't pass sampler_unit variable down to filtering funcsRoland Scheidegger2013-08-301-36/+21
| | | | | | | The only reason this was needed was because the fetch texel function had to get the (dynamic) border color, but this is now done much earlier. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: don't use AoS path if min/mag filter are different with multiple lodsRoland Scheidegger2013-08-301-1/+6
| | | | | | | | Instead of enhancing the AoS path so it can deal with it, just use SoA. Fixing AoS path wouldn't be all that difficult (use all the same logic as SoA) but considered not worth it for now. Reviewed-by: Jose Fonseca <[email protected]>
* r600g: enable SB backend by defaultVadim Girlin2013-08-304-5/+6
| | | | | | | Signed-off-by: Vadim Girlin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Christian König <[email protected]>
* r600g: fix color exports when we have no CBsVadim Girlin2013-08-301-3/+4
| | | | | | | We need to export at least one color if the shader writes it, even when nr_cbufs==0. Signed-off-by: Vadim Girlin <[email protected]>
* nvc0/ir: Initialize NVC0LegalizePostRA member variables.Vinson Lee2013-08-291-1/+3
| | | | | | Fixes "Uninitialized pointer field" defects reported by Coverity. Signed-off-by: Vinson Lee <[email protected]>
* gallivm: support per-pixel min/mag filter in SoA pathRoland Scheidegger2013-08-301-43/+243
| | | | | | | | | | | | | | | | | | | | | | | Since we can have per-pixel lod we should also honor the filter per-pixel (in fact we didn't honor it per quad neither in the multiple quad case). Do this by running the linear path and simply beating the weights into shape (the sample with the higher weight is the one which should have been chosen with nearest filtering hence adjust filter weight to 1.0/0.0 based on that). If all pixels use nearest filter (either min and mag) then still run just a nearest filter as this is way cheaper (probably around 4 times faster for 2d, more for 3d case) and it should be relatively rare that pixels really need different filtering. OTOH if all pixels would require linear don't do anything special since the linear path with filter adjustments shouldn't really be all that much more expensive than ordinary linear, and we think it's rare that min/mag filters are configured differently so there doesn't seem much value in trying to optimize this further. This does not yet fix the AoS path (though currently AoS is only used for single quads hence it could be considered less broken, just never honoring per-pixel filter decision but doing it per quad). v2: simplify code a bit (unify min linear and min nearest cases) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: don't calculate square root of rho if we use accurate rho methodRoland Scheidegger2013-08-301-39/+74
| | | | | | | | | | | | | | | | | | | | While a sqrt here and there shouldn't hurt much (depending on the cpu) it is possible to completely omit it since rho is only used for calculating lod and there log2(x) == 0.5*log2(x^2). Depending on the exact path taken for calculating lod this means we get a simple mul instead of sqrt (in case of nearest mip filter in fact we don't need to replace the sqrt with something else at all), only in some not very useful path this doesn't work (combined brilinear calculation of int level and fractional lod, accurate rho calc but brilinear filtering seems odd). Apart from being faster as an added bonus this should increase our crappy fractional accuracy of lod, since fast_log2 is only good for ~3bits and this should increase accuracy by one bit (though not used if dimension is just one as we'd need an extra mul there as we never had the squared rho in the first place). v2: use separate ilog2_sqrt function if we have squared rho. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: refactor num_lods handlingRoland Scheidegger2013-08-304-131/+169
| | | | | | | | | | | | | This is just preparation for per-pixel (or per-quad in case of multiple quads) min/mag filter since some assumptions about number of miplevels being equal to number of lods no longer holds true. This change does not change behavior yet (though theoretically when forcing per-element path it might be slower with different min/mag filter since the code will respect this setting even when there's no mip maps now in this case, so some lod calcs will be done per-element just ultimately still the same filter used for all pixels). Reviewed-by: Jose Fonseca <[email protected]>
* radeonsi: Early return if no depth or stencil on release builds.Vinson Lee2013-08-291-0/+1
| | | | | | | Fixes "Missing break in switch" defect reported by Coverity. Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* freedreno: pipe loader for either kgsl or msmRob Clark2013-08-294-10/+39
| | | | | | | | The downstream android kernel driver is "kgsl", the upstream drm/kms driver is called "msm". Since libdrm_freedreno handles the differences between the two, we need to load the same thing for either device. Signed-off-by: Rob Clark <[email protected]>
* freedreno: updates for msm drm/kms driverRob Clark2013-08-298-30/+55
| | | | | | | There where some small API tweaks in libdrm_freedreno to enable support for msm drm/kms driver. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: handle sync flags betterRob Clark2013-08-291-16/+34
| | | | | | | | We need to set the flag on all the .xyzw components that are written by the instruction, not just on .x. Otherwise a later use of rN.y (for example) will not trigger the appropriate sync bit to be set. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: better const handlingRob Clark2013-08-291-90/+121
| | | | | | | | | Seems like most/all instructions have some restrictions about const src registers. In seems like the 2 src (cat2) instructions can take at most one const, and the 3 src (cat3) instructions can take at most one const in the first 2 arguments. And so on. Handle this properly now. Signed-off-by: Rob Clark <[email protected]>
* radeonsi: Make sure libdrm_radeon headers are picked up from the right placeJonathan Gray2013-08-292-2/+3
| | | | | | And remove libdrm/ from a winsys include statement. Signed-off-by: Jonathan Gray <[email protected]>
* draw: fix point/line/triangle determination in draw_need_pipeline()Brian Paul2013-08-291-25/+6
| | | | | | The previous point/line/triangle() functions didn't handle GS primitives. Reviewed-by: Roland Scheidegger <[email protected]>
* radeon/uvd: fix MPEG2/4 ref frame index limitChristian König2013-08-291-2/+2
| | | | | | Otherwise the first few frames have an incorrect reference index. Signed-off-by: Christian König <[email protected]>
* nouveau: Copy m4x4 and m8x8 separately.Vinson Lee2013-08-281-1/+2
| | | | | | Silences Coverity "Out-of-bounds access" defect. Signed-off-by: Vinson Lee <[email protected]>
* r300g: enable MSAA on r300-r400, be careful about using color compressionMarek Olšák2013-08-274-5/+14
| | | | | | | | | | MSAA was tested by one user on RS690 and it works for him with color compression (CMASK) disabled. Our theory is that his chipset lacks CMASK RAM. Since we don't have hardware documentation about which chipsets actually have CMASK RAM, I had to take a guess based on the presence of HiZ. Reviewed-by: Alex Deucher <[email protected]>
* draw: clean up setting stream out information a bitRoland Scheidegger2013-08-279-34/+39
| | | | | | | | | | | | | | | | | In particular noone is interested in the vertex count, so drop that, and also drop the duplicated num_primitives_generated / so.primitives_storage_needed variables in drivers. I am unable for now to figure out if primitives_storage_needed in SO stats (used for d3d10) should increase if SO is disabled, though the equivalent num_primitives_generated used for OpenGL definitely should increase. In any case we were only counting when SO is active both in softpipe and llvmpipe anyway so don't pretend there's an independent num_primitives_generated counter which would count always. (This means the PIPE_QUERY_PRIMITIVES_GENERATED count will still be wrong just as before, should eventually fix this by doing either separate counting for this query or adjust the code so it always counts this even if SO is inactive depending on what's correct for d3d10.) Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: support nested/overlapping queries for all query typesRoland Scheidegger2013-08-273-18/+20
| | | | | | | There's just no way resetting the counters is working with nested/overlapping queries. Reviewed-by: Brian Paul <[email protected]>
* softpipe: support nested/overlapping queries for all query typesRoland Scheidegger2013-08-272-18/+17
| | | | | | | There's just no way resetting the counters is working with nested/overlapping queries. Reviewed-by: Brian Paul <[email protected]>
* clover: Don't use PIPE_TRANSFER_UNSYNCHRONIZED for blocking copiesTom Stellard2013-08-261-1/+1
| | | | | | CC: "9.2" <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* st/clover: Add event to deps even if it has been triggeredNiels Ole Salscheider2013-08-261-1/+1
| | | | | | | | The command is submitted once the event has been triggered, but it might not have completed yet. Therefore, we have to add it to deps in order to wait on it. Signed-off-by: Niels Ole Salscheider <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* st/clover: Profiling supportNiels Ole Salscheider2013-08-263-18/+142
| | | | | Signed-off-by: Niels Ole Salscheider <[email protected]> Acked-by: Francisco Jerez <[email protected]>
* tgsi_build: fix order of arguments for ind register buildDave Airlie2013-08-271-1/+1
| | | | | | | This was broken when arrayid was added. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Dave Airlie <[email protected]>