path: root/src/gallium
Commit message | Author | Age | Files | Lines
* radeonsi: simplify and improve flushing | Marek Olšák | 2013-08-31 | 12 | -140/+125
    This mimics r600g. The R600_CONTEXT_xxx flags are added to rctx->b.flags and si_emit_cache_flush emits the packets. That's it. The shared radeon code tells us when the streamout cache should be flushed, so we have to check the flags anyway.

    There is a new atom "cache_flush", because caches must be flushed *after* resource descriptors are changed in memory.

    Functional changes:

    * Write caches are flushed at the end of CS and read caches are flushed at its beginning.
    * Sampler view states are removed from si_state, they only held the flush flags.
    * Every time a shader is changed, the I cache is flushed. Is this needed? Due to a hw bug, this also flushes the K cache.
    * The WRITE_DATA packet is changed to use TC, which fixes a rendering issue in openarena. I'm not sure how TC interacts with CP DMA, but for now it seems to work better than any other solution I tried. (BTW CIK allows us to use TC for CP DMA.)
    * Flush the K cache instead of the texture cache when updating resource descriptors (due to a hw bug, this also flushes the I cache). I think the K cache flush is correct here, but I'm not sure if the texture cache should be flushed too (probably not considering we use TC for WRITE_DATA, but we don't use TC for CP DMA).
    * The number of resource contexts is decreased to 16. With all of these cache changes, 4 doesn't work, but 8 works, which suggests I'm actually doing the right thing here and the pipeline isn't drained during flushes.

    Reviewed-by: Michel Dänzer <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Tested-by: Tom Stellard <[email protected]>
* radeonsi: convert constant buffers to si_descriptors | Marek Olšák | 2013-08-31 | 5 | -128/+162
    There is a new "class" si_buffer_resources, which should be good enough for implementing any kind of buffer bindings (constant buffers, vertex buffers, streamout buffers, shader storage buffers, etc.)

    I don't even keep a copy of pipe_constant_buffer - we don't need it.

    The main motivation behind this is to have a well-tested infrastructure for setting up streamout buffers.

    Reviewed-by: Michel Dänzer <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Tested-by: Tom Stellard <[email protected]>
* radeonsi: use r600_common_context, r600_common_screen, r600_resource | Marek Olšák | 2013-08-31 | 28 | -777/+338
    Also r600_hw_context_priv.h and si_state_streamout.c are removed, because they are no longer needed.

    Reviewed-by: Michel Dänzer <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Tested-by: Tom Stellard <[email protected]>
* r600g: move streamout state to drivers/radeon | Marek Olšák | 2013-08-31 | 27 | -1486/+1825
    This streamout state code will be used by radeonsi.

    There are new structures r600_common_context and r600_common_screen. What is inherited by what is shown here:

        pipe_context -> r600_common_context -> r600_context
        pipe_screen  -> r600_common_screen  -> r600_screen

    The common structures reside in drivers/radeon. Currently they only contain enough functionality to be able to handle streamout. Eventually I'd like the whole pipe_screen implementation to be shared and some of the context stuff too.

    This is quite big, but most changes are because of the new structures and the fact r600_write_value is replaced by radeon_emit.

    Thanks to Tom Stellard for fixing the build for r600g/compute.

    Reviewed-by: Michel Dänzer <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Tested-by: Tom Stellard <[email protected]>
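The inheritance chain in the r600g commit above is the usual C idiom of embedding the base structure as the first member, so a pointer to the derived struct can be treated as a pointer to the base. A minimal sketch follows; every struct and field here is a hypothetical stand-in (the real pipe_context, r600_common_context and r600_context are far larger), and only the member name `b` echoes the `rctx->b.flags` access mentioned in the log.

```c
#include <stddef.h>

/* Hypothetical stand-in for pipe_context. */
struct sketch_pipe_context {
    int priv;
};

/* Hypothetical stand-in for r600_common_context. */
struct sketch_common_context {
    struct sketch_pipe_context b;   /* base "class", must stay first */
    unsigned flags;                 /* e.g. pending cache-flush flags */
};

/* Hypothetical stand-in for r600_context. */
struct sketch_r600_context {
    struct sketch_common_context b; /* base "class", must stay first */
    int chip_specific;
};

/* The casts below are only valid while the base sits at offset 0. */
_Static_assert(offsetof(struct sketch_common_context, b) == 0, "base first");
_Static_assert(offsetof(struct sketch_r600_context, b) == 0, "base first");

/* Downcast from the generic context to the shared radeon context. */
static struct sketch_common_context *
sketch_common(struct sketch_pipe_context *pipe)
{
    return (struct sketch_common_context *)pipe;
}
```

With this layout a driver can write `rctx->b.flags |= ...` on its own context while shared code sees the same object through the base pointer.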
* radeonsi: cleanup initialization of SGPR shader parameters | Marek Olšák | 2013-08-31 | 1 | -13/+19
    Reviewed-by: Michel Dänzer <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Tested-by: Tom Stellard <[email protected]>
* r600g,radeonsi: remove unused variables | Marek Olšák | 2013-08-31 | 2 | -8/+0
    Reviewed-by: Michel Dänzer <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Tested-by: Tom Stellard <[email protected]>
* draw: fix segfaults with aaline and aapoint stages disabled | Marek Olšák | 2013-08-31 | 1 | -2/+4
    There are drivers not using these optional stages.

    Broken by a3ae5dc7dd5c2f8893f86a920247e690e550ebd4.

    Cc: [email protected]
    Reviewed-by: Jose Fonseca <[email protected]>
* radeonsi: Do not suspend timer queries | Niels Ole Salscheider | 2013-08-30 | 6 | -14/+30
    Signed-off-by: Niels Ole Salscheider <[email protected]>
    Signed-off-by: Marek Olšák <[email protected]>
* draw: fix PIPE_MAX_SAMPLER/PIPE_MAX_SHADER_SAMPLER_VIEWS issues | Roland Scheidegger | 2013-08-30 | 2 | -6/+6
    pstipple/aaline stages used PIPE_MAX_SAMPLER instead of PIPE_MAX_SHADER_SAMPLER_VIEWS when dealing with sampler views. Now these stages can't actually handle sampler_unit != texture_unit anyway (they cannot work with d3d10 shaders at all due to using tex not sample opcodes as "mixed mode" shaders are impossible) but this leads to crashes if a driver just installs these stages and then more than PIPE_MAX_SAMPLER views are set even if the stages aren't even used.

    Reviewed-by: Zack Rusin <[email protected]>
* gallivm: handle unbound textures in texture sampling / texture queries | Roland Scheidegger | 2013-08-30 | 1 | -0/+26
    Turns out we don't need to do much extra work for detecting this case, since we are guaranteed to get an empty static texture state in this case, hence just rely on format being 0 and return all zero then. Previously needed dummy textures (would just have crashed on format being 0 otherwise) which cannot return the correct result for size queries and when sampling textures with wrap modes using border. As a bonus should hugely increase performance when sampling unbound textures - too bad it isn't a useful feature :-).

    Reviewed-by: Jose Fonseca <[email protected]>
    Reviewed-by: Zack Rusin <[email protected]>
* softpipe: handle NULL sampler views for texture sampling / queries | Roland Scheidegger | 2013-08-30 | 2 | -5/+26
    Instead of crashing just return all zero.

    Reviewed-by: Jose Fonseca <[email protected]>
    Reviewed-by: Zack Rusin <[email protected]>
* softpipe: check if so_target is NULL before accessing it | Roland Scheidegger | 2013-08-30 | 1 | -2/+5
    No idea if this is working right but copied straight from llvmpipe. (Not only does this check the so_target but also use buffer->data instead of buffer for the mapping.) Just trying to get rid of a segfault testing something else...

    Reviewed-by: Jose Fonseca <[email protected]>
    Reviewed-by: Zack Rusin <[email protected]>
* gallivm: (trivial) don't pass sampler_unit variable down to filtering funcs | Roland Scheidegger | 2013-08-30 | 1 | -36/+21
    The only reason this was needed was because the fetch texel function had to get the (dynamic) border color, but this is now done much earlier.

    Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: don't use AoS path if min/mag filter are different with multiple lods | Roland Scheidegger | 2013-08-30 | 1 | -1/+6
    Instead of enhancing the AoS path so it can deal with it, just use SoA. Fixing AoS path wouldn't be all that difficult (use all the same logic as SoA) but considered not worth it for now.

    Reviewed-by: Jose Fonseca <[email protected]>
* r600g: enable SB backend by default | Vadim Girlin | 2013-08-30 | 4 | -5/+6
    Signed-off-by: Vadim Girlin <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Reviewed-by: Tom Stellard <[email protected]>
    Reviewed-by: Christian König <[email protected]>
* r600g: fix color exports when we have no CBs | Vadim Girlin | 2013-08-30 | 1 | -3/+4
    We need to export at least one color if the shader writes it, even when nr_cbufs==0.

    Signed-off-by: Vadim Girlin <[email protected]>
* nvc0/ir: Initialize NVC0LegalizePostRA member variables. | Vinson Lee | 2013-08-29 | 1 | -1/+3
    Fixes "Uninitialized pointer field" defects reported by Coverity.

    Signed-off-by: Vinson Lee <[email protected]>
* gallivm: support per-pixel min/mag filter in SoA path | Roland Scheidegger | 2013-08-30 | 1 | -43/+243
    Since we can have per-pixel lod we should also honor the filter per-pixel (in fact we didn't honor it per quad either in the multiple quad case). Do this by running the linear path and simply beating the weights into shape (the sample with the higher weight is the one which should have been chosen with nearest filtering hence adjust filter weight to 1.0/0.0 based on that).

    If all pixels use nearest filter (either min or mag) then still run just a nearest filter as this is way cheaper (probably around 4 times faster for 2d, more for 3d case) and it should be relatively rare that pixels really need different filtering. OTOH if all pixels would require linear don't do anything special since the linear path with filter adjustments shouldn't really be all that much more expensive than ordinary linear, and we think it's rare that min/mag filters are configured differently so there doesn't seem much value in trying to optimize this further.

    This does not yet fix the AoS path (though currently AoS is only used for single quads hence it could be considered less broken, just never honoring per-pixel filter decision but doing it per quad).

    v2: simplify code a bit (unify min linear and min nearest cases)

    Reviewed-by: Jose Fonseca <[email protected]>
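The weight trick in the per-pixel filter commit above can be illustrated in scalar C. This is only a sketch under stated assumptions: the actual change generates vectorized LLVM IR, the function names are hypothetical, and the tie-break at a weight of exactly 0.5 is an arbitrary choice made here.

```c
/* Linear interpolation between two texels; w is the weight of t1. */
static float sketch_lerp(float t0, float t1, float w)
{
    return t0 + (t1 - t0) * w;
}

/* For a pixel that wants nearest filtering, snap the weight to 0.0 or 1.0
 * so the linear path returns the texel with the higher weight.
 * Assumption: ties at 0.5 go to t1. */
static float sketch_adjust_weight(float w, int use_nearest)
{
    if (!use_nearest)
        return w;
    return w >= 0.5f ? 1.0f : 0.0f;
}

/* One code path serves both filters; only the weight differs per pixel. */
static float sketch_sample_1d(float t0, float t1, float w, int use_nearest)
{
    return sketch_lerp(t0, t1, sketch_adjust_weight(w, use_nearest));
}
```

Because every pixel runs the same linear path, pixels in one vector can mix nearest and linear filtering without branching.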
* gallivm: don't calculate square root of rho if we use accurate rho method | Roland Scheidegger | 2013-08-30 | 1 | -39/+74
    While a sqrt here and there shouldn't hurt much (depending on the cpu) it is possible to completely omit it since rho is only used for calculating lod and there log2(x) == 0.5*log2(x^2). Depending on the exact path taken for calculating lod this means we get a simple mul instead of sqrt (in case of nearest mip filter in fact we don't need to replace the sqrt with something else at all), only in some not very useful path this doesn't work (combined brilinear calculation of int level and fractional lod, accurate rho calc but brilinear filtering seems odd).

    Apart from being faster as an added bonus this should increase our crappy fractional accuracy of lod, since fast_log2 is only good for ~3bits and this should increase accuracy by one bit (though not used if dimension is just one as we'd need an extra mul there as we never had the squared rho in the first place).

    v2: use separate ilog2_sqrt function if we have squared rho.

    Reviewed-by: Jose Fonseca <[email protected]>
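The identity log2(x) == 0.5*log2(x^2) from the commit above means the integer level can be taken straight from the squared rho. The sketch below shows this with plain unsigned integers rather than the float vectors gallivm works on, so it is a swapped-in illustration and not the code from the commit; the function names are hypothetical.

```c
#include <stdint.h>

/* floor(log2(x)) for x > 0, by counting how often x can be halved. */
static unsigned sketch_ilog2(uint32_t x)
{
    unsigned n = 0;
    while (x >>= 1)
        n++;
    return n;
}

/* floor(log2(sqrt(rho_squared))) without computing the square root:
 * halving the log replaces the sqrt. Requires rho_squared > 0. */
static unsigned sketch_ilog2_sqrt(uint32_t rho_squared)
{
    return sketch_ilog2(rho_squared) >> 1;
}
```

For rho_squared = 16 (rho = 4), `sketch_ilog2(4)` and `sketch_ilog2_sqrt(16)` both give 2.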
* gallivm: refactor num_lods handling | Roland Scheidegger | 2013-08-30 | 4 | -131/+169
    This is just preparation for per-pixel (or per-quad in case of multiple quads) min/mag filter since some assumptions about number of miplevels being equal to number of lods no longer hold true. This change does not change behavior yet (though theoretically when forcing per-element path it might be slower with different min/mag filter since the code will respect this setting even when there's no mip maps now in this case, so some lod calcs will be done per-element just ultimately still the same filter used for all pixels).

    Reviewed-by: Jose Fonseca <[email protected]>
* radeonsi: Early return if no depth or stencil on release builds. | Vinson Lee | 2013-08-29 | 1 | -0/+1
    Fixes "Missing break in switch" defect reported by Coverity.

    Signed-off-by: Vinson Lee <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* freedreno: pipe loader for either kgsl or msm | Rob Clark | 2013-08-29 | 4 | -10/+39
    The downstream android kernel driver is "kgsl", the upstream drm/kms driver is called "msm". Since libdrm_freedreno handles the differences between the two, we need to load the same thing for either device.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno: updates for msm drm/kms driver | Rob Clark | 2013-08-29 | 8 | -30/+55
    There were some small API tweaks in libdrm_freedreno to enable support for msm drm/kms driver.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: handle sync flags better | Rob Clark | 2013-08-29 | 1 | -16/+34
    We need to set the flag on all the .xyzw components that are written by the instruction, not just on .x. Otherwise a later use of rN.y (for example) will not trigger the appropriate sync bit to be set.

    Signed-off-by: Rob Clark <[email protected]>
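The per-component tracking described in the sync flags commit above could look roughly like the following. This is a sketch under assumptions: the state layout, the register limit and the function names are all hypothetical and are not taken from the freedreno compiler.

```c
#include <stdint.h>

#define SKETCH_MAX_REGS 64   /* hypothetical register file size */

/* One 4-bit mask per register: bit 0 = .x, bit 1 = .y, bit 2 = .z, bit 3 = .w */
struct sketch_sync_state {
    uint8_t pending[SKETCH_MAX_REGS];
};

/* Record every component the instruction writes, not only .x. */
static void sketch_mark_written(struct sketch_sync_state *s,
                                unsigned reg, uint8_t writemask)
{
    s->pending[reg] |= writemask & 0xf;
}

/* A later read of reg.comp needs the sync bit if that component is pending. */
static int sketch_needs_sync(const struct sketch_sync_state *s,
                             unsigned reg, unsigned comp)
{
    return (s->pending[reg] >> comp) & 1;
}
```

If only .x were marked, a read of rN.y after a write to rN.xy would be missed; with a full mask it is caught.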
* freedreno/a3xx/compiler: better const handling | Rob Clark | 2013-08-29 | 1 | -90/+121
    Seems like most/all instructions have some restrictions about const src registers. It seems like the 2 src (cat2) instructions can take at most one const, and the 3 src (cat3) instructions can take at most one const in the first 2 arguments. And so on. Handle this properly now.

    Signed-off-by: Rob Clark <[email protected]>
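The two restrictions named in the const handling commit above can be written as simple predicates. This is a sketch only: the commit itself words the rules tentatively ("seems like"), and the function names and boolean encoding are hypothetical.

```c
#include <stdbool.h>

/* cat2 (2 src): at most one source may come from the const file. */
static bool sketch_cat2_srcs_ok(bool src0_const, bool src1_const)
{
    return !(src0_const && src1_const);
}

/* cat3 (3 src): at most one const among the first two sources;
 * the commit states no limit for the third, so it is not checked. */
static bool sketch_cat3_srcs_ok(bool src0_const, bool src1_const,
                                bool src2_const)
{
    (void)src2_const;
    return !(src0_const && src1_const);
}
```

When a check fails, the compiler would have to copy one of the const sources into a temporary register before emitting the instruction.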
* radeonsi: Make sure libdrm_radeon headers are picked up from the right place | Jonathan Gray | 2013-08-29 | 2 | -2/+3
    And remove libdrm/ from a winsys include statement.

    Signed-off-by: Jonathan Gray <[email protected]>
* draw: fix point/line/triangle determination in draw_need_pipeline() | Brian Paul | 2013-08-29 | 1 | -25/+6
    The previous point/line/triangle() functions didn't handle GS primitives.

    Reviewed-by: Roland Scheidegger <[email protected]>
* radeon/uvd: fix MPEG2/4 ref frame index limit | Christian König | 2013-08-29 | 1 | -2/+2
    Otherwise the first few frames have an incorrect reference index.

    Signed-off-by: Christian König <[email protected]>
* nouveau: Copy m4x4 and m8x8 separately. | Vinson Lee | 2013-08-28 | 1 | -1/+2
    Silences Coverity "Out-of-bounds access" defect.

    Signed-off-by: Vinson Lee <[email protected]>
* r300g: enable MSAA on r300-r400, be careful about using color compression | Marek Olšák | 2013-08-27 | 4 | -5/+14
    MSAA was tested by one user on RS690 and it works for him with color compression (CMASK) disabled. Our theory is that his chipset lacks CMASK RAM. Since we don't have hardware documentation about which chipsets actually have CMASK RAM, I had to take a guess based on the presence of HiZ.

    Reviewed-by: Alex Deucher <[email protected]>
* draw: clean up setting stream out information a bit | Roland Scheidegger | 2013-08-27 | 9 | -34/+39
    In particular no one is interested in the vertex count, so drop that, and also drop the duplicated num_primitives_generated / so.primitives_storage_needed variables in drivers.

    I am unable for now to figure out if primitives_storage_needed in SO stats (used for d3d10) should increase if SO is disabled, though the equivalent num_primitives_generated used for OpenGL definitely should increase. In any case we were only counting when SO is active both in softpipe and llvmpipe anyway so don't pretend there's an independent num_primitives_generated counter which would count always. (This means the PIPE_QUERY_PRIMITIVES_GENERATED count will still be wrong just as before, should eventually fix this by doing either separate counting for this query or adjust the code so it always counts this even if SO is inactive depending on what's correct for d3d10.)

    Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: support nested/overlapping queries for all query types | Roland Scheidegger | 2013-08-27 | 3 | -18/+20
    There's just no way resetting the counters is working with nested/overlapping queries.

    Reviewed-by: Brian Paul <[email protected]>
* softpipe: support nested/overlapping queries for all query types | Roland Scheidegger | 2013-08-27 | 2 | -18/+17
    There's just no way resetting the counters is working with nested/overlapping queries.

    Reviewed-by: Brian Paul <[email protected]>
* clover: Don't use PIPE_TRANSFER_UNSYNCHRONIZED for blocking copies | Tom Stellard | 2013-08-26 | 1 | -1/+1
    CC: "9.2" <[email protected]>
    Reviewed-by: Francisco Jerez <[email protected]>
* st/clover: Add event to deps even if it has been triggered | Niels Ole Salscheider | 2013-08-26 | 1 | -1/+1
    The command is submitted once the event has been triggered, but it might not have completed yet. Therefore, we have to add it to deps in order to wait on it.

    Signed-off-by: Niels Ole Salscheider <[email protected]>
    Reviewed-by: Francisco Jerez <[email protected]>
* st/clover: Profiling support | Niels Ole Salscheider | 2013-08-26 | 3 | -18/+142
    Signed-off-by: Niels Ole Salscheider <[email protected]>
    Acked-by: Francisco Jerez <[email protected]>
* tgsi_build: fix order of arguments for ind register build | Dave Airlie | 2013-08-27 | 1 | -1/+1
    This was broken when arrayid was added.

    Reviewed-by: Brian Paul <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* tgsi: finish declaration parsing for arrays. | Dave Airlie | 2013-08-27 | 1 | -1/+31
    I previously fixed this partly in 9e8400f4c95bde1f955c7977066583b507159a10, however I didn't go far enough in testing it, now when I parse a TGSI shader with arrays in it my iterator can see the ArrayID set to the proper value.

    Reviewed-by: Brian Paul <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* svga: replace 0 with PIPE_OK in a few places | Brian Paul | 2013-08-26 | 3 | -5/+5
* radeonsi: Also set the depth component mask bit for stencil-only exports | Michel Dänzer | 2013-08-26 | 1 | -1/+4
    The stencil values come out wrong without this for some reason.

    50 more little piglits.

    Cc: [email protected]
* r600g: Implement the new float comparison instructions for Cayman as well. | Henri Verbeet | 2013-08-25 | 1 | -4/+4
    I assume this should have been part of commit 7727fbb7c5d64348994bce6682e681d6181a91e9. This (obviously) fixes a lot of tests.

    Signed-off-by: Henri Verbeet <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* nv30: add forgotten PIPE_CAP_CUBE_MAP_ARRAY cap to list | Ilia Mirkin | 2013-08-25 | 1 | -0/+1
    Signed-off-by: Ilia Mirkin <[email protected]>
    Cc: "9.2" <[email protected]>
* nouveau/video: avoid overwriting base codec init with template | Ilia Mirkin | 2013-08-25 | 2 | -2/+2
    Commit 53e20b8b introduced the use of a template to initialize some common fields. Move this copying of fields to before the common vp3 fields are initialized.

    Reported-by: Martin Peres <[email protected]>
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Christian König <[email protected]>
* freedreno/a3xx: don't leak so much | Rob Clark | 2013-08-24 | 1 | -0/+11
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: fix SGT/SLT/etc | Rob Clark | 2013-08-24 | 1 | -29/+125
    The cmps.f.* instruction doesn't actually seem to give a float 1.0 or 0.0 output. It either needs a cov.u16f16 or add.s + sel.f16. This makes SGT/SLT/etc more similar to CMP, so handle them in trans_cmp().

    This fixes a bunch of piglit tests.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: bit of re-arrange/cleanup | Rob Clark | 2013-08-24 | 1 | -61/+71
    It seems there are a number of cases where instructions have limitations about reading src's from the const register file, so make get_unconst() a bit easier to use.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: make compiler errors more useful | Rob Clark | 2013-08-24 | 2 | -17/+33
    We probably should get rid of assert() entirely, but at this stage it is more useful for things to crash where we can catch it in a debugger. With compile_error() we have a single place to set an error flag (to bail out and return an error on the next instruction) so that will be a small change later when enough of the compiler bugs are sorted.

    But re-arrange/cleanup the error/assert stuff so we at least get a dump of the TGSI that triggered it. So we see some useful output in piglit logs.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno: fix segfault when no color buffer bound | Rob Clark | 2013-08-24 | 7 | -18/+40
    Don't crash when no color buffer bound. Something caught when starting to run piglit, fixes a handful of piglit tests.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: cat4 cannot use const reg as src | Rob Clark | 2013-08-24 | 1 | -10/+27
    Category 4 instructions (rsq, rcp, sqrt, etc) seem to be unable to take a const register as src. In these cases we need to move the src to a temporary gpr first.

    This is the second case of such a restriction, where the instruction encoding appears to support a const src, but in fact the hw appears to ignore that bit. So split things out into a helper that can be re-used for any instructions which have this limitation.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: use max_reg rather than file_count | Rob Clark | 2013-08-24 | 1 | -7/+7
    Our current (rather naive) register assignment is based on mapping different register files (INPUT, OUTPUT, TEMP, CONST, etc) based on the max register index of the preceding file. But in some cases, the lowest used register in a file might not be zero. In which case file_count[file] != file_max[file] + 1.

    Signed-off-by: Rob Clark <[email protected]>
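The difference between sizing by count and sizing by highest index in the max_reg commit above is easiest to see with numbers. The sketch below packs register files back to back using the highest used index of each file; the enum, array layout and function name are hypothetical, chosen only to illustrate the idea.

```c
/* Hypothetical register files, laid out in this order. */
enum { SKETCH_FILE_INPUT, SKETCH_FILE_OUTPUT, SKETCH_FILE_TEMP, SKETCH_NUM_FILES };

/* file_max[f] is the highest register index used in file f (-1 if unused).
 * Each file starts right after the slots reserved for the preceding one. */
static void sketch_assign_bases(const int file_max[SKETCH_NUM_FILES],
                                int base[SKETCH_NUM_FILES])
{
    int next = 0;
    for (int f = 0; f < SKETCH_NUM_FILES; f++) {
        base[f] = next;
        next += file_max[f] + 1;   /* reserve indices 0..file_max[f] */
    }
}
```

If the INPUT file uses only registers 2 and 3, its count is 2 but its highest index is 3. Reserving 2 slots would let the next file overlap INPUT[2] and INPUT[3]; reserving file_max + 1 = 4 slots does not.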