aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/nouveau
Commit message (Collapse)AuthorAgeFilesLines
...
* nv50/ir: don't fold immediate into mad if registers are too highIlia Mirkin2015-09-101-0/+4
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91551 Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0" <[email protected]>
* nv50/ir: fix emission of 8-byte wide interp instructionIlia Mirkin2015-09-101-5/+6
| | | | | | | | | This can come up if the target register number is > 63, which is fairly rare. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91551 Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0" <[email protected]>
* nv50/ir: r63 is only 0 if we are using less than 63 registersIlia Mirkin2015-09-101-1/+4
| | | | | | | | | It is advantageous to use r63 instead of r127 since r63 can fit into the shorter encoding. However if we've RA'd over 63 registers, we must use r127 as the replacement instead. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0" <[email protected]>
* nv50/ir: make edge splitting fix up phi node sourcesIlia Mirkin2015-09-101-13/+77
| | | | | | | | | | | | | | | | | | | | | Unfortunately nv50_ir phi nodes aren't directly connected to the CFG, so the mapping between source and the actual BB is by inbound edge order. So when manipulating edges one has to be extremely careful. We were insufficiently careful when splitting critical edges which resulted in the phi nodes being confused as to where their sources were coming from. This primarily manifests itself with the TXL-lowering logic on nv50, when it is inside of a conditional. I've been unable to trigger the issue anywhere else so far. This resolves rendering failures in a number of games like Two Worlds 2, Trine: Enchanted Edition, Trine 2, XCOM:Enemy Unknown, Stacking. It also improves the situation in Hearthstone, Sonic Generations, and The Raven: Legacy of a Master Thief. However more work needs to be done there (splitting a lot more edges solves it, so it's some other sort of RA-related issue). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90887 Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0" <[email protected]>
* nvc0: remove BGRA4 format supportIlia Mirkin2015-09-091-0/+2
| | | | | | | | | | | Something is wrong with the support somewhere. I couldn't get the blob driver to use it either, although it happily used RGB5_A1. teximage-colors works, but WoW seems to fail in the menus for drawing text. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91526 Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.6 11.0" <[email protected]>
* nvc0: keep track of cb bindings per buffer, use for upload settingsIlia Mirkin2015-09-097-12/+58
| | | | | | | | | | | | | | | | | | | CB updates to bound buffers need to go through the CB_DATA endpoints, otherwise the shader may not notice that the updates happened. Furthermore, these updates have to go in to the same address as the bound buffer, otherwise, again, the shader may not notice updates. So we keep track of all the places where a constbuf is bound, and iterate over all of them when updating data. If a binding is found that encompasses the region to be updated, then we use the settings of that binding for the upload. Otherwise we upload as a regular data update. This fixes piglit 'arb_uniform_buffer_object-rendering offset' as well as blurriness in Witcher2. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91890 Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0" <[email protected]>
* nv30: Disable msaa unless requested from the env by NV30_MAX_MSAAHans de Goede2015-09-092-1/+21
| | | | | | | | | | | | | | | | | | | | Some modern apps try to use msaa without keeping in mind the restrictions on videomem of older cards. Resulting in dmesg saying: [ 1197.850642] nouveau E[soffice.bin[3785]] fail ttm_validate [ 1197.850648] nouveau E[soffice.bin[3785]] validating bo list [ 1197.850654] nouveau E[soffice.bin[3785]] validate: -12 Because we are running out of video memory, after which the program using the msaa visual freezes, and eventually the entire system freezes. To work around this we do not allow msaa visauls by default and allow the user to override this via NV30_MAX_MSAA. Signed-off-by: Hans de Goede <[email protected]> [imirkin: move env var lookup to screen so that it's only done once] Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.6 11.0" <[email protected]>
* nv30: Fix color resolving for nv3x cardsHans de Goede2015-09-091-1/+37
| | | | | | | | | | | | | We do not have a generic blitter on nv3x cards, so we must use the sifm object for color resolving. This commit divides the sources and dest surfaces in to tiles which match the constraints of the sifm object, so that color resolving will work properly on nv3x cards. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: "11.0" <[email protected]>
* nouveau: android: add space before PRIx64 macroMauro Rossi2015-09-091-1/+1
| | | | | | | | | | | | | | | | | | | | | Otherwise the android build fails with error : unable to find string literal operator ‘operator"" PRIx64’ There are several resources referring to the problem, which is related to c++11, in our case used when building mesa for lollipop. http://comments.gmane.org/gmane.comp.graphics.opensg.user/5883 I've not investigated all the semantics, some people even suggested a bug in the gcc compiler, I just saw the building error was solved with one little space for lollipop and no side effect when c+11 not used. v2: [Emil Velikov] add an alternative commit message from Mauro. Cc: 11.0 <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* nvc0: always emit a full shader colormaskIlia Mirkin2015-09-081-1/+1
| | | | | | | | | | | | | | Indications are that if the colormask indicates a single bit set on fermi, that value will always be read from $r0 instead of a potentially higher register (if e.g. green is set). Not to upset the counting logic, always set the header up with a full color mask for each RT. Such a situation can basically only ever happen with generated blit shaders. Fixes the following piglit on Fermi (Kepler is unaffected): fbo-stencil blit GL_DEPTH32F_STENCIL8 Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.6 11.0" <[email protected]>
* nv30: Fix max width / height checks in nv30 sifm codeHans de Goede2015-09-071-2/+2
| | | | | | | | | | | | | The sifm object has a limit of 1024x1024 for its input size and 2048x2048 for its output. The code checking this was trying to be clever resulting in it seeing a surface of e.g 1024x256 being outside of the input size limit. This commit fixes this. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: "10.6 11.0" <[email protected]>
* nouveau: don't mark full range as used on unmap with explicit flushIlia Mirkin2015-09-051-5/+7
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nv50: avoid using inline vertex data submit when gl_VertexID is usedIlia Mirkin2015-09-054-2/+14
| | | | | | | | | | | The hardware only generates vertexid when vertices come from a VBO. This fixes: vertexid-drawelements vertexid-drawarrays Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0" <[email protected]>
* nv50: don't flush vertex arrays when index buffer changesIlia Mirkin2015-09-051-4/+0
| | | | | | | | The index buffer is fed in inline over a pushbuf. It's not related to vertices or any caching that might be done on them. Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nv50: rebind bo to bufctx when invalidating idxbuf storageIlia Mirkin2015-09-051-1/+5
| | | | | | | | There is nothing to be done on a dirty idxbuf, but the bo may have changed, so we have to rebind it to the bufctx. Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nv50: clear buffer status on all vertex bufs, not just the first oneIlia Mirkin2015-09-051-1/+0
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nv50: fix drawing from tfb, direct-to-pushbuf submitsIlia Mirkin2015-09-054-14/+15
| | | | | | | | | | | The stride was being set to 0, which is illegal (and also non-sensical). Also we must wait for the buffer to become available for reading as otherwise a wrong value may be prefetched. Since we must wait for the buffer anyways, and it's mapped and in GART, we may as well avoid the annoyance of the indirect pushbuf submit. Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nv30: Implement color resolve for msaaHans de Goede2015-09-042-14/+8
| | | | | | | | | | | Note this is not ideal. Since the sifm can only do source sizes upto 1024x1024 we end up using the blitter on nv4x, which is not that fast. And on nv3x we end up using the cpu which is really slow. Cc: "10.6 11.0" <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv30: Fix creation of scanout buffersHans de Goede2015-09-041-0/+10
| | | | | | | | | | | | | | | | | | | | | Scanout buffers on nv30 must always be non-swizzled and have special width alignment constraints. These constrains have been taken from the xf86-video-nouveau src/nv_accel_common.c: nouveau_allocate_surface() function. nouveau_allocate_surface() applies these width constraints only when a tiled attribute is set, which it sets for all surfaces allocated via dri, and this "tiling" is not the same as swizzling, scanout surfaces must be linear / have a uniform_pitch or only complete garbage is shown. This commit fixes dri3 on nv30 showing a garbled display, with dri3 the scanout buffers are allocated by mesa, rather then by the ddx, and the wrong stride of these buffers was causing the garbled display. Cc: "10.6 11.0" <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: change prefix of MP performance counters to HW_SMSamuel Pitoiset2015-08-292-149/+149
| | | | | | | According to NVIDIA, local performance counters (MP) are prefixed with SM, while global performance counters (PCOUNTER) are called PM. Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: sort performance counter queries by nameSamuel Pitoiset2015-08-292-142/+142
| | | | Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: make names of performance counter queries consistentSamuel Pitoiset2015-08-292-56/+56
| | | | Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: use enumerations for driver queriesSamuel Pitoiset2015-08-291-120/+123
| | | | Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: remove commented out code related to PCOUNTER queriesSamuel Pitoiset2015-08-291-20/+0
| | | | Signed-off-by: Samuel Pitoiset <[email protected]>
* nouveau: avoid build failures since 0fc21ecfIlia Mirkin2015-08-263-3/+3
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add flags parameter to pipe_screen::context_createMarek Olšák2015-08-266-6/+6
| | | | | | | | This allows creating compute-only and debug contexts. Reviewed-by: Brian Paul <[email protected]> Acked-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]>
* nv50: fix 2d engine blits for 64- and 128-bit formatsIlia Mirkin2015-08-231-0/+4
| | | | | | | This fixes bin/ext_framebuffer_multisample-formats all_samples Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0" <[email protected]>
* nv50: account for the int RT0 rule for alpha-to-one/covIlia Mirkin2015-08-233-11/+23
| | | | | | | | Same as commit 1af0641db but for nvc0. If an integer texture is bound to RT0, don't do alpha-to-one or alpha-to-coverage. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0" <[email protected]>
* nv50,nvc0: disable depth bounds test on blitIlia Mirkin2015-08-232-0/+3
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0" <[email protected]>
* nouveau: add codegen/unordered_set.h to the tarballEmil Velikov2015-08-221-1/+2
| | | | Signed-off-by: Emil Velikov <[email protected]>
* nv50/ir: pre-compute BFE arg when both bits and offset are immIlia Mirkin2015-08-201-3/+9
| | | | | | | | | | Due to a quirk in how the nv50 opt passes run, the algebraic optimization that looks for these BFE's happens before the constant folding pass. Rearranging these passes isn't a great idea, but this is easy enough to fix. Allows a following cvt to eliminate the bfe in certain situations. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: Handle OP_CVT when folding constant expressionsTobias Klausmann2015-08-201-0/+78
| | | | | [imirkin: handle more type combinations, use macro] Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: undo more shifts still by allowing a pre-SHL to occurIlia Mirkin2015-08-201-15/+33
| | | | | | | | This happens with unpackSnorm lowering. There's yet another bitfield-extract behind it, but there's too much variation to be worth cutting through. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: don't require AND when the high byte is being addressedIlia Mirkin2015-08-201-0/+12
| | | | | | | unpackUnorm* lowering doesn't AND the high byte/word as it's unnecessary. Detect that situation as well. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: detect i2f/i2i which operate on specific bytes/wordsIlia Mirkin2015-08-204-4/+82
| | | | | | | | | | | Some Unigine shaders have been observed to unpack bytes out of 32-bit integers and convert them to floats. I2F/I2I can handle this sort of thing directly. Detect the handleable situations. This misses 16-bit word capabilities in nv50, but I haven't seen shaders that would actually make use of that. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: detect AND/SHR pairs and convert into EXTBFIlia Mirkin2015-08-201-20/+46
| | | | | | | Some shaders appear to extract bits using shift/and combos. Detect (some) of those and convert to EXTBF instead. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: support different unordered_set implementationsChih-Wei Huang2015-08-205-12/+57
| | | | | | | | | | | | If build with C++11 standard, use std::unordered_set. Otherwise if build on old Android version with stlport, use std::tr1::unordered_set with a wrapper class. Otherwise use std::tr1::unordered_set. Signed-off-by: Chih-Wei Huang <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nouveau: recognize tess stages in nouveau_compilerMarcos Paulo de Souza2015-08-171-0/+4
| | | | Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: implement the color buffer 0 is integer rule for alpha-to-one/covIlia Mirkin2015-08-173-11/+22
| | | | | | | | | | | | The hardware checks for multisampling being enabled, but does not have the rule about cbuf0 being an integer format. Only enable alpha-to-one/alpha-to-coverage if cbuf0 is not an integer format. Fixes piglits ext_framebuffer_multisample-int-draw-buffers-alpha-to-one ext_framebuffer_multisample-int-draw-buffers-alpha-to-coverage Signed-off-by: Ilia Mirkin <[email protected]>
* gk110/ir: fix sched calculator to consider all registers in the ISAIlia Mirkin2015-08-171-7/+10
| | | | | | | GK110/GK208 have 256 registers, not 64. Find out the number of registers from the target to avoid unnecessary iteration for pre-GK110. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: program smooth line width when multisampling is enabledIlia Mirkin2015-08-171-1/+1
| | | | | | | | | | There are separate line widths for smooth and aliased lines. The smooth one is selected when multisampling is enabled even if line smoothing isn't explicitly turned on. Fixes the ext_framebuffer_multisample-line-smooth piglits Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: bind a fake tess control program when there isn't one availableIlia Mirkin2015-08-174-8/+44
| | | | | | | | | | | Apparently this is necessary in order for tess factors to work in a tess eval program without a tess control program bound. Probably because it uses the fake program's shader header to work out the number of patch constants. Fixes vs-tes-tessinner-tessouter-inputs Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: avoid letting the lowering pass get out of syncIlia Mirkin2015-08-172-88/+5
| | | | | | | | There's a lot of functionality duplicated in the gm107 lowering pass from the nvc0 pass. As that one gets updated, the gm107 one falls behind. Avoid this by sharing the code. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: take level into account when doing eng2d multi-layer blitsIlia Mirkin2015-08-172-8/+20
| | | | | | | | This fixes arb_get_texture_sub_image-get, and any situation where the 2d engine was being used for multi-layer blits to a non-0 level. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.6" <[email protected]>
* nvc0: disable tessellation on maxwellIlia Mirkin2015-08-141-2/+5
| | | | | | | | The address calculations are all different (e.g. see GP), there appear to be sync's in programs, and probably a bunch of other differences. Just disable it for now. Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: indirect handle goes first on maxwell alsoIlia Mirkin2015-08-141-8/+4
| | | | | | | Fixes fs-simple-texture-size.shader_test Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.6" <[email protected]>
* nv30: add depth bounds test support for hw that has itIlia Mirkin2015-08-143-2/+14
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nv50: add depth bounds test supportIlia Mirkin2015-08-143-2/+12
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: add depth bounds test supportIlia Mirkin2015-08-143-2/+9
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add an interface for EXT_depth_bounds_testMarek Olšák2015-08-143-0/+3
| | | | | Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Brian Paul <[email protected]>