path: root/src/gallium
Commit message | Author | Age | Files | Lines
* radeonsi/gfx9: always wrap GS and TCS in an if-block (v2)Nicolai Hähnle2017-07-272-33/+79
| With merged ESGS shaders, the GS part of a wave may be empty, and the
| hardware gets confused if any GS messages are sent from that wave. Since
| S_SENDMSG is executed even when EXEC = 0, we have to wrap even
| non-monolithic GS shaders in an if-block, so that the entire shader and
| hence the S_SENDMSG instructions are skipped in empty waves.
|
| This change is not required for TCS/HS, but applying it there as well
| simplifies the logic a bit.
|
| Fixes GL45-CTS.geometry_shader.rendering.rendering.*
|
| v2: ensure that the TCS epilog doesn't run for non-existing patches
|
| Cc: [email protected]
| Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/gfx9: fix vertex idx in ES with multiple waves per threadgroupNicolai Hähnle2017-07-271-1/+6
| | | | | Cc: [email protected] Reviewed: Marek Olšák <[email protected]>
* swr: fix transform feedback logicGeorge Kyriazis2017-07-274-8/+71
| The shader that is used to copy vertex data out of the vs/gs shaders to
| the user-specified buffer (streamout or SO shader) was not using the
| correct offsets.
|
| Adjust the offsets that are used just for the SO shader:
| - Make sure that position is handled in the same special way
|   as in the vs/gs shaders
| - Use the correct offset to be passed in the core
| - Consolidate register slot mapping logic into one function, since it's
|   been calculated in 2 different places (one for calculating the slot
|   mask, and one for the register offsets themselves)
|
| Also make room for all attributes in the backend vertex area.
|
| Fixes:
| - all vtk GL2PS tests
| - 18 piglit tests (16 ext_transform_feedback tests,
|   arb-quads-follow-provoking-vertex and primitive-type gl_points)
|
| v2:
| - take care of more SGV slots in slot mapping logic
| - trim feState.vsVertexSize
| - fix GS interface and incorporate GS while calculating vsVertexSize
|
| Note that vsVertexSize is used in the core as the one parameter that
| controls vertex size between all stages, so it has to be adjusted
| appropriately for the whole vs/gs/fs pipeline.
|
| Also note that GS and SO is not fully implemented. This will be
| addressed later.
|
| Fixes a total of 20 piglit tests.
|
| CC: 17.2 <[email protected]>
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: non-regex knob fallback code for gcc < 4.9Tim Rowley2017-07-271-0/+21
| | | | | | | | gcc prior to 4.9 didn't implement <regex>, causing a startup crash in the swr knob parameter reading code. CC: <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* virgl: encode index buffer offset.Dave Airlie2017-07-271-1/+1
| | | | | | | Fixes arb_vertex_buffer_object-combined-vertex-index Cc: [email protected] Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: decrease the number of compiler threadsMarek Olšák2017-07-262-3/+8
| | | | | Cc: 17.2 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: make S_FIXED function signed and move it to shared codeMarek Olšák2017-07-263-9/+5
| | | | | | | | | | | This fixes a bug uncovered by: 2412c4c81ea0488df865817a0de91ec46e359b72 util: Make CLAMP turn NaN into MIN. Cc: 17.2 <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
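As a rough illustration of the signed fixed-point conversion this commit refers to, the sketch below converts a float to a signed integer with a given number of fractional bits. The name `to_signed_fixed` and its parameters are hypothetical and are not the driver's actual helper; the only point is that a signed return type can represent negative inputs directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not the Mesa S_FIXED definition): convert a float
 * to signed fixed point with frac_bits fractional bits. The result is
 * truncated toward zero by the float-to-integer conversion. */
static int32_t to_signed_fixed(float value, unsigned frac_bits)
{
    assert(frac_bits < 31); /* keep the scale factor representable */
    return (int32_t)(value * (float)(1u << frac_bits));
}
```

With 8 fractional bits, -1.5 maps to -384 under this sketch, which an unsigned return type could not express.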
* radeonsi/gfx9: reduce max threads per block to 1024 on gfx9+Nicolai Hähnle2017-07-261-15/+24
| | | | | | | | | The number of supported waves per thread group has been reduced to 16 with gfx9. Trying to use 32 waves causes hangs, and barriers might not work correctly with > 16 waves. Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]>
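The 1024 figure in the subject line follows from the wave limit in the message if each wave has 64 threads. That wave size is an assumption of this sketch and is not stated in the commit, and the names below are illustrative only.

```c
/* Assumed wave size of 64 threads; not taken from the commit text. */
enum { THREADS_PER_WAVE = 64 };

/* Maximum threads per block for a given per-thread-group wave limit. */
static unsigned max_threads_per_block(unsigned max_waves_per_group)
{
    return max_waves_per_group * THREADS_PER_WAVE;
}
```

Under that assumption, 16 waves give 1024 threads, while 32 waves would give 2048.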
* radeonsi: fix detection of DRAW_INDIRECT_MULTI on SINicolai Hähnle2017-07-261-2/+2
| | | | | | | | | | | | | | | | The firmware version numbers for SI were wrong. The new numbers are probably too conservative (we don't have a definitive answer by the firmware team), but DRAW_INDIRECT_MULTI has been confirmed to work with these versions on Tahiti (by Gustaw) and on Verde (by myself). While this is technically adding a feature, it's a feature we thought we had for a long time. The change is small enough and we're early enough in the 17.2 release cycle that it should still go in. Reported-by: Gustaw Smolarczyk <[email protected]> Cc: 17.2 <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/util: fix unused variable warningTimothy Arceri2017-07-261-3/+5
| | | | | Reviewed-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* broadcom/vc4: Use the RA callback to improve register selection's choices.Eric Anholt2017-07-251-1/+52
| We simply pick r4 if available (anything else would force a MOV), then
| round-robin through accumulators (avoids physical regfile RAW delay
| slots), then round-robin through the physical regfile.
|
| The effect on instruction count is pretty impressive:
|
| total instructions in shared programs: 76563 -> 74526 (-2.66%)
| instructions in affected programs:     66463 -> 64426 (-3.06%)
|
| and we could probably do better with a little heuristic of "if we're
| going to choose a physical reg, and other operands of instructions using
| this as a src have the same physical regfile, then use the other
| regfile".
* broadcom/vc4: Scissor blits performed using the rendering engine.Eric Anholt2017-07-251-0/+9
| | | | | | Without this, a BlitFramebuffer would mark the whole framebuffer as being changed (so we emit loads/stores of all of it) rather than just the modified subset.
* broadcom/vc4: Prefer blit via rendering to the software fallback.Eric Anholt2017-07-251-6/+8
| | | | | | | I don't know how I managed to leave this here for so long. Found when working on a 1:1 overlapping blit extension for X11. Cc: [email protected]
* broadcom/vc4: Switch the Viewport Center fields to a fixed-point representation.Eric Anholt2017-07-251-2/+2
| | | | | | This gets us automatic CL decoding to a floating-point value, and drops a magic number from the emit code. 250x250 shader runner tests now say they have a center of 125.0 instead of 2000.
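The change from 2000 to 125.0 in the decoded output is consistent with a fixed-point field that has 4 fractional bits, since 2000 / 16 = 125. The bit count is inferred from those two numbers only and is not taken from the hardware description; `decode_fixed` is a hypothetical name for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a fixed-point field into a float. frac_bits = 4 is an inference
 * from the 2000 -> 125.0 example, not a documented hardware value. */
static float decode_fixed(int32_t raw, unsigned frac_bits)
{
    assert(frac_bits < 31); /* keep the divisor representable */
    return (float)raw / (float)(1u << frac_bits);
}
```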
* broadcom/vc4: Use the XML decoder for CL dumping.Eric Anholt2017-07-253-443/+32
| The VC4_DEBUG_CL output goes from:
|
| 0x00000010 0x00000010: 0x06 VC4_PACKET_START_TILE_BINNING
| 0x00000011 0x00000011: 0x38 VC4_PACKET_PRIMITIVE_LIST_FORMAT
| 0x00000012 0x00000012: 0x12
| 0x00000013 0x00000013: 0x66 VC4_PACKET_CLIP_WINDOW
| 0x00000014 0x00000014: 0x00
| 0x00000015 0x00000015: 0x00
| 0x00000016 0x00000016: 0x00
| 0x00000017 0x00000017: 0x00
| 0x00000018 0x00000018: 0xfa
| 0x00000019 0x00000019: 0x00
| 0x0000001a 0x0000001a: 0xfa
| 0x0000001b 0x0000001b: 0x00
|
| to:
|
| 0x00000010 0x00000010: 0x06 Start Tile Binning
| 0x00000011 0x00000011: 0x38 Primitive List Format
|     Data Type: 1 (16-bit index)
|     Primitive Type: 2 (Triangles List)
| 0x00000013 0x00000013: 0x66 Clip Window
|     Clip Window Height in pixels: 250
|     Clip Window Width in pixels: 250
|     Clip Window Bottom Pixel Coordinate: 0
|     Clip Window Left Pixel Coordinate: 0
|
| v2: Squash in robher's fixes for Android
* svga: implement MSAA alpha_to_one featureBrian Paul2017-07-255-1/+57
| | | | | | | | | | | | | The device doesn't directly support this feature so we implement it with additional shader code which sets the color output(s) w component to 1.0 (or max_int or max_uint). Fixes 16 Piglit ext_framebuffer_multisample/*alpha-to-one* tests. v2: only support unorm/float buffers, not int/uint, per Roland. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Charmaine Lee <[email protected]>
* svga: rework the FS white fragments codeBrian Paul2017-07-252-33/+21
| | | | | | | | | | When we forcibly write white to FS outputs (for XOR mode emulation) we were using a temp register. But that's not really necessary. This also fixes the case of writing white to multiple color buffers. Subsequent changes will build on this. Reviewed-by: Charmaine Lee <[email protected]>
* gallium/util: s/unsigned/enum tgsi_texture_type/Brian Paul2017-07-254-21/+24
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* st/dri2: Return invalid modifier when no driver supportDaniel Stone2017-07-251-0/+6
| | | | | | | | | Always initialise whandle.modifier for DRIImage modifier queries, so if the driver doesn't support it then we return false for the query. Signed-off-by: Daniel Stone <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Fixes: d33fe8b84e45 ("st/dri: enable DRIimage modifier queries")
* st/dri: Check get-handle return value in queryImageDaniel Stone2017-07-251-12/+18
| | | | | | | | In the DRIImage queryImage hook, check if resource_get_handle() failed and return FALSE if so. Signed-off-by: Daniel Stone <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* r600: Add support for B5G5R5A1.Michal Srb2017-07-251-0/+6
| | | | | | Fixes rendercheck errors when using glamor acceleration in X server. Signed-off-by: Marek Olšák <[email protected]>
* radeon/vcn: move message buffer to vram for nowLeo Liu2017-07-251-1/+2
| To work around an unknown bug.
|
| Signed-off-by: Leo Liu <[email protected]>
| Acked-by: Christian König <[email protected]>
* trace: Correct transfer box size calculation.Jose Fonseca2017-07-251-9/+8
| | | | | | | | | | For textures we must not approximate the calculation with `stride * height`, or `slice_stride * depth`, as that can easily lead to buffer overflows, particularly for partial transfers. This should address the issue that Bruce Cherniak found and diagnosed. Reviewed-by: Roland Scheidegger <[email protected]>
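To see why `stride * height` overestimates a partial transfer, the sketch below computes the bytes actually touched by a box: full strides for every row except the last, plus only the used bytes of the last row. It assumes an uncompressed format with whole-byte pixels; `box_size_bytes` and its parameters are hypothetical, and real code would typically work in format blocks instead.

```c
#include <stddef.h>

/* Bytes spanned by a width x height x depth box in a mapped texture.
 * Assumes an uncompressed format; all names are illustrative. */
static size_t box_size_bytes(size_t width, size_t height, size_t depth,
                             size_t bytes_per_pixel,
                             size_t stride, size_t slice_stride)
{
    if (width == 0 || height == 0 || depth == 0)
        return 0;
    return (depth - 1) * slice_stride   /* all slices before the last  */
         + (height - 1) * stride        /* all rows before the last    */
         + width * bytes_per_pixel;     /* used part of the final row  */
}
```

For a 4x2 box of 4-byte pixels in a texture with a 256-byte stride, this gives 272 bytes, whereas `stride * height` would claim 512 and could read past the end of the mapping.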
* r600g: constify some args at r600_asm.cConstantine Charlamov2017-07-251-5/+6
| | | | | Signed-off-by: Constantine Kharlamov <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* r600g: remove unused "bc" args, and one unneeded forward declarationConstantine Charlamov2017-07-251-45/+40
| | | | | | | To ease review just highlight "bc," string. Signed-off-by: Constantine Kharlamov <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: create framebuffer iface hash table per st managerCharmaine Lee2017-07-247-0/+33
| With commit 5124bf98239, a framebuffer interface hash table is created
| in st_gl_api_create(), which is called in dri_init_screen_helper() for
| each screen. When the hash table is overwritten by multiple calls to
| st_gl_api_create(), it can cause a race condition. This patch fixes the
| problem by creating a framebuffer interface hash table per state tracker
| manager.
|
| Fixes crash with steam.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101876
| Fixes: 5124bf98239 ("st/mesa: add destroy_drawable interface")
| Tested-by: Christoph Haag <[email protected]>
| Reviewed-by: Brian Paul <[email protected]>
* swr: use the correct variable for no undefined symbolsEmil Velikov2017-07-241-1/+1
| | | | | | | | | | | | | | | | | | | | | The variable name was missing a leading LD_, which resulted in a missing check for unresolved symbols in the backend binaries. With the link addressed with earlier patches, we can correct the typo. Thanks to Laurent for the help spotting this. v2: Split from a larger patch. Cc: [email protected] Cc: Bruce Cherniak <[email protected]> Cc: Tim Rowley <[email protected]> Cc: Laurent Carlier <[email protected]> Fixes: 9475251145174882b532 "swr: standardize linkage and check for unresolved symbols" Reviewed-by: Eric Engestrom <[email protected]> Reported-by: Laurent Carlier <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* swr: don't forget to link KNL/SKX against pthreadsEmil Velikov2017-07-241-0/+8
| | | | | | | | | | Analogous to previous commit but for the KNL/SKX backends. Cc: Bruce Cherniak <[email protected]> Cc: Tim Rowley <[email protected]> Cc: Laurent Carlier <[email protected]> Fixes: 1cb5a6061ce ("configure/swr: add KNL and SKX architecture targets") Signed-off-by: Emil Velikov <[email protected]>
* swr: don't forget to link AVX/AVX2 against pthreadsEmil Velikov2017-07-241-0/+8
| Seems like the backends have been using pthreads since day one, yet
| we've been missing the link.
|
| With a later commit we'll fix a typo, hence the libraries will be built
| with -Wl,no-undefined, aka failing the build on unresolved symbols.
|
| v2: Split from a larger patch.
|
| Cc: [email protected]
| Cc: Bruce Cherniak <[email protected]>
| Cc: Tim Rowley <[email protected]>
| Cc: Laurent Carlier <[email protected]>
| Fixes: c6e67f5a9373e916a8d2 "gallium/swr: add OpenSWR rasterizer"
| Reviewed-by: Eric Engestrom <[email protected]>
| Signed-off-by: Emil Velikov <[email protected]>
* etnaviv: Clear lbl_usage array correctlyWladimir J. van der Laan2017-07-231-1/+1
| | | | | | | | | | | | | Fill the entire array instead of just a quarter. This avoids crashes with large shaders. (currently this never causes a problem because shaders larger than 2048/4 instructions are not supported by this driver on any hardware, but it will cause problems in the future) Fixes: ec436051899 ("etnaviv: fix shader miscompilation with more than 16 labels") Cc: [email protected] Signed-off-by: Wladimir J. van der Laan <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
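A common way to end up clearing only a quarter of an array is to pass the element count to `memset` where a byte count is expected, with 4-byte elements. Whether that is exactly what happened here is an assumption; the struct, the array size, and the function name below are hypothetical and only illustrate the `sizeof`-based form.

```c
#include <string.h>

/* Hypothetical sizes and names, for illustration only. */
enum { MAX_INSTRUCTIONS = 2048 };

struct compile_state {
    int lbl_usage[MAX_INSTRUCTIONS];
};

static void clear_label_usage(struct compile_state *c)
{
    /* sizeof gives the byte size of the whole array. Passing
     * MAX_INSTRUCTIONS instead would cover only a quarter of it
     * when int is 4 bytes wide. */
    memset(c->lbl_usage, -1, sizeof(c->lbl_usage));
}
```

Filling with the byte 0xFF leaves every `int` element equal to -1 on two's-complement targets.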
* svga: Limit number of immediates in shaderNeha Bhende2017-07-221-3/+5
| | | | | | | | | | | | imm {128.0, -128.0, 2.0, 3.0} is used for lit instruction which is not used very frequently. So allocate it only if lit instruction is used. Tested with mtt piglit and mtt glretrace v2: As per Charmaine's comment Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Charmaine Lee <[email protected]>
* svga: fix constant indices for texcoord scale factors and texture buffer sizeCharmaine Lee2017-07-221-9/+6
| | | | | | | | | | This patch fixes the ordering of the constant indices for texcoord scale factor and texture buffer size to match the order they were added to the constant buffer in svga_get_extra_constants_common(). Tested with MTT piglit, glretrace. Reviewed-by: Brian Paul <[email protected]>
* svga: fix unnormalized->normalized texture coordinate conversionNeha Bhende2017-07-223-3/+35
| | | | | | | | | | | | | Sometimes, converting unnormalized coordinates to normalized coordinates requires an epsilon value to produce the right texels with nearest filtering. Adding 0.0001 to the coordinates when the min/mag filter is nearest fixes the issue. Fixes piglit test fbo-blit-scaled-linear Tested with mtt-piglit, mtt-glretrace Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Charmaine Lee <[email protected]>
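The epsilon adjustment can be pictured on the CPU side as below. In the driver this happens in generated shader code; the sketch only models the arithmetic, and it assumes the epsilon is added before dividing by the texture size, which the commit message does not spell out. `normalize_coord` and `NEAREST_EPSILON` are illustrative names.

```c
#include <stdbool.h>

/* Epsilon value quoted in the commit message. */
#define NEAREST_EPSILON 0.0001f

/* Convert an unnormalized coordinate to a normalized one, nudging it
 * slightly when nearest filtering is in use (assumed ordering). */
static float normalize_coord(float coord, float size, bool nearest_filter)
{
    if (nearest_filter)
        coord += NEAREST_EPSILON;
    return coord / size;
}
```

The nudge moves a coordinate that sits exactly on a texel boundary just past it, so nearest filtering lands on a predictable texel.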
* svga: only support 4x, 8x, 16x msaaBrian Paul2017-07-221-0/+5
| | | | | | | Skip 2x MSAA, for example, since it's seldom used and just bloats the list of pixel formats. Reviewed-by: Charmaine Lee <[email protected]>
* nv50/ir: disable mul+add to mad for precise instructionsKarol Herbst2017-07-211-2/+3
| Fixes misrendering in TombRaider
| KHR-GL44.gpu_shader5.precise_qualifier
| KHR-GL45.gpu_shader5.precise_qualifier
|
| v4: disable opt only for MAD, it's fine for SAD
|
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Pierre Moreau <[email protected]>
* nv50/ir/tgsi: handle precise for most ALU instructionsKarol Herbst2017-07-211-0/+2
| | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Pierre Moreau <[email protected]>
* nv50/ir: add precise field to InstructionKarol Herbst2017-07-212-0/+3
| | | | | | | v4: initialize field with NULL Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Pierre Moreau <[email protected]>
* gallium/docs: add precise instruction modifierKarol Herbst2017-07-211-1/+10
| | | | | | | | v4: add comment about intermediate rounding step to MAD Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* tgsi/text: parse _PRECISE modifierKarol Herbst2017-07-211-3/+14
| v2: use str_match_no_case to fix _SAT_PRECISE detection
| v4: use is_digit_alpha_underscore to match end of mods
|
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi: populate preciseKarol Herbst2017-07-215-6/+27
| Only implemented for glsl->tgsi. Other converters just set precise to 0.
|
| v2: remove precise parameter from ureg_tex_insn and ureg_memory_insn
|
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/dump: print _PRECISE modifier on InstructionsKarol Herbst2017-07-211-0/+4
| | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi: add precise flag to tgsi_instructionKarol Herbst2017-07-212-1/+3
| | | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* st/mesa: add destroy_drawable interfaceCharmaine Lee2017-07-205-2/+24
| | | | | | | | | | | | | | | With this patch, the st manager will maintain a hash table for the active framebuffer interface objects. A destroy_drawable interface is added to allow the state tracker to notify the st manager to remove the associated framebuffer interface object from the hash table, so the associated framebuffer and its resources can be deleted at framebuffers purge time. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101829 Fixes: 147d7fb772a ("st/mesa: add a winsys buffers list in st_context") Tested-by: Brad King <[email protected]> Tested-by: Gert Wollny <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallivm: handle call attributes for llvm < 4.0 in lp_add_function_attrRoland Scheidegger2017-07-211-3/+7
| | | | | | | | | | | | | | | | | | | We had some caller using LLVMAddInstrAttributes, which couldn't be converted to lp_add_function_attr, because attributes were only handled for functions in this case, so fix this. For llvm >= 4.0, this already works correctly. (radeonsi seems to avoid setting call site attributes prior to llvm 4.0, the patch then citing it doesn't work when calling intrinsics. But at least for calling external functions we always used that, albeit only for actual call attributes, not call parameter attributes, though some quick test shows llvm seems to handle that as well. The attribute index is sort of iffy though, since attribute 0 of the call is the actual function, attribute 1 corresponds to the first parameter of the called function.) (Verified with GALLIVM_DEBUG=dumpbc plus llvm-dis that the correct attributes are shown for calls, both for llvm 4.0 and 3.3.) Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* swr/rast: quit using linux-specific gettid()Tim Rowley2017-07-212-4/+3
| | | | | | | | | | | | | Linux-specific gettid() syscall shouldn't be used in portable code. Fix does assume a 1:1 thread:LWP architecture, but works for our current target platforms and can be revisited later if needed. Fixes unresolved symbol in linux scons builds. v2: add comment in code about the 1:1 assumption. Cc: [email protected] Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: fix memory paths for avx512 optimized avx/sseTim Rowley2017-07-212-10/+10
| | | | | | | Source/destination will not be AVX512 aligned, use the unaligned load/store intrinsics. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: cache line align hottile buffersTim Rowley2017-07-211-3/+3
| | | | | | Prevents unalignment crashes with avx512 code on gcc/clang. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: simdlib changes for clang/gccTim Rowley2017-07-212-10/+35
| | | | | | Tested with clang-4.0 and gcc-6.3. Reviewed-by: Bruce Cherniak <[email protected]>
* etnaviv: Avoid duplicates in formats tableWladimir J. van der Laan2017-07-211-5/+1
| Remove the following duplicates from the formats table:
|
| - R8G8B8A8_UNORM (V_,_T)
| - R8G8B8X8_UNORM (_T,_T)
| - DXT3_RGBA (_T,_T)
|
| Only the first has an effect because the _T overrides the V_
| initializer; the latter two were harmless duplications of the same.
|
| Signed-off-by: Wladimir J. van der Laan <[email protected]>
| Signed-off-by: Lucas Stach <[email protected]>
* etnaviv: Add support for ETC2 texture compressionWladimir J. van der Laan2017-07-212-1/+22
| | | | | | | | | | | | | | | | Add support for ETC2 compressed textures in the etnaviv driver. One step closer towards GL ES 3 support. For now, treat SRGB and RGB formats the same. It looks like these are distinguished using a different bit in sampler state, and not part of the format, but I have not yet been able to confirm this for sure. (Only enabled on GC3000+ for now, as the GC2000 ETC2 decoder implementation is buggy and we don't work around that) Signed-off-by: Wladimir J. van der Laan <[email protected]> Signed-off-by: Lucas Stach <[email protected]>