summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* gallium: add PIPE_CAP_TGSI_PACK_HALF_FLOAT to indicate UP2H/PK2H supportIlia Mirkin2016-01-0316-0/+17
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* tgsi: update PK2H/UP2H channel behavior infoIlia Mirkin2016-01-031-8/+8
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: document PK2H/UP2HIlia Mirkin2016-01-031-2/+14
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* st/mesa: fix parameter names for tesseval/tessctrl prototypesSamuel Pitoiset2016-01-031-4/+4
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nouveau: fix double-const qualifierIlia Mirkin2016-01-032-2/+2
| | | | | | | Reported by Tom^ on IRC. The original intent was to mark the pointer constant as well as the data being pointed to, so move the *. Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/ir3: use NIR_PASS helper macrosRob Clark2016-01-031-19/+28
| | | | Signed-off-by: Rob Clark <[email protected]>
* nir: extract out helper macros for running passesRob Clark2016-01-032-36/+43
| | | | | | | | | Note these are a bit uglier, due to avoidance of GNU C extensions. But drivers which do not need to be built with compilers that don't support the extension can wrap these macros with their own. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* freedreno/ir3: we require block_index metadataRob Clark2016-01-031-0/+2
| | | | | | | | Found during NIR_TEST_CLONE=1 piglit run. We were using block->index but forgetting to require it. Causing things to not work with a cloned shader which didn't preserve block_index. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: refactor NIR IR handlingRob Clark2016-01-037-111/+202
| | | | | | | | | Immediately convert into NIR and do an initial key-agnostic lowering/ optimization pass. This should let us share most of the per-variant transformations between each variant, and hopefully minimize the draw- time variant creation part of the compilation process. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: drop unnecessary unreachable() caseRob Clark2016-01-031-2/+0
| | | | | | | It will still hit a compile_assert() in emit_tex, which has the advantage of dumping out the offending shader. Signed-off-by: Rob Clark <[email protected]>
* gallium/tests: fix build with clang compilerSamuel Pitoiset2016-01-031-273/+330
| | | | | | | | | | | | Nested functions are supported as an extension in GNU C, but Clang don't support them. This fixes compilation errors when (manually) building compute.c, or by setting --enable-gallium-tests to the configure script. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75165 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* nv50,nvc0: optimize coherent buffer checking at draw timeSamuel Pitoiset2016-01-036-68/+82
| | | | | | | | | | Instead of iterating over all the buffer resources looking for coherent buffers, we keep track of a context-wide count. This will save some iterations (and CPU cycles) in 99.99% case because usually coherent buffers are not so used. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* i965: Make TCS precompile use the TES primitive mode when available.Kenneth Graunke2016-01-021-1/+3
| | | | | | | | If there's a linked TES program, we should just use the actual primitive mode. If not, just guess triangles (as we did before). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Push most TES inputs in SIMD8 mode.Kenneth Graunke2016-01-021-12/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using the push model for inputs is much more efficient than pulling inputs - the hardware can simply copy a large chunk into URB registers at thread creation time, rather than having the thread send messages to request data from the L3 cache. Unfortunately, it's possible to have more TES inputs than fit in registers, so we have to fall back to the pull model in some cases. However, it turns out that most tessellation evaluation shaders are fairly simple, and don't use many inputs. An arbitrary cut-off of 32 vec4 slots (16 registers) is more than sufficient to ensure that 100% of TES inputs are pushed for Shadow of Mordor, Unigine Heaven, GPUTest/TessMark, and SynMark. Note that unlike most SIMD8 stages, this actually reads packed vec4 data, since that is what our vec4 TCS programs write. Improves performance in GPUTest's tessmark_x64 microbenchmark by 93.4426% +/- 5.35541% (n = 25) on my Lenovo X250 at 1024x768. Improves performance in Synmark's Gl40TerrainFlyTess microbenchmark by 22.74% +/- 0.309394% (n = 5). Improves performance in Shadow of Mordor at low settings with tessellation enabled at 1280x720 by 2.12197% +/- 0.478553% (n = 4). shader-db statistics for files containing tessellation shaders: total instructions in shared programs: 184358 -> 181181 (-1.72%) instructions in affected programs: 27971 -> 24794 (-11.36%) helped: 226 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Use LOAD_PAYLOAD for SIMD8 TES input loads, not MOV.Kenneth Graunke2016-01-021-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | We need a MOV to replicate g0.0<0,1,0> to all 8 channels. Since the message payload is a single register, MOV seemed more sensible than LOAD_PAYLOAD. However, MOV cannot be CSE'd, while LOAD_PAYLOAD can. All input loads can use the same header - we don't need to re-expand g0 every time. CSE accomplishes this, saving instructions. shader-db statistics for files containing tessellation shaders: total instructions in shared programs: 186923 -> 184358 (-1.37%) instructions in affected programs: 30536 -> 27971 (-8.40%) helped: 226 HURT: 0 total cycles in shared programs: 1009850 -> 1005356 (-0.45%) cycles in affected programs: 168206 -> 163712 (-2.67%) helped: 226 HURT: 0 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Move 3-src subnr swizzle handling into the vec4 backend.Kenneth Graunke2016-01-022-6/+18
| | | | | | | | | | | | | | | | | | | | | | | While most align16 instructions only support a SubRegNum of 0 or 4 (using swizzling to control the other channels), 3-src instructions actually support arbitrary SubRegNums. When the RepCtrl bit is set, we believe it ignores the swizzle and uses the equivalent of a <0,1,0> region from the subnr. In the past, we adopted a vec4-centric approach of specifying subnr of 0 or 4 and a swizzle, then having brw_eu_emit.c convert that to a proper SubRegNum. This isn't a great fit for the scalar backend, where we don't set swizzles at all, and happily set subnrs in the range [0, 7]. This patch changes brw_eu_emit.c to use subnr and swizzle directly, relying on the higher levels to set them sensibly. This should fix problems where scalar sources get copy propagated into 3-src instructions in the FS backend. I've only observed this with TES push model inputs, but I suppose it could happen in other cases. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* vc4: Fix build from upload changes.Eric Anholt2016-01-021-1/+1
|
* gallium/radeon: send LLVM diagnostics as debug messagesNicolai Hähnle2016-01-021-15/+46
| | | | | | | | | | | | | Diagnostics sent during code generation and the every error message reported by LLVMTargetMachineEmitToMemoryBuffer are disjoint reporting mechanisms. We take care of both and also send an explicit message indicating failure at the end, so that log parsers can more easily tell the boundary between shader compiles. Removed an fprintf that could never be triggered. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: pass pipe_debug_callback into radeon_llvm_compile (v2)Nicolai Hähnle2016-01-027-9/+18
| | | | | | | This will allow us to send shader debug info via the context's debug callback. Reviewed-by: Edward O'Callaghan <[email protected]> (v1) Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: send shader info as debug messages in addition to stderr outputNicolai Hähnle2016-01-021-14/+55
| | | | | | | | | | | | | | The output via stderr is very helpful for ad-hoc debugging tasks, so that remains unchanged, but having the information available via debug messages as well will allow the use of parallel shader-db runs. Shader stats are always provided (if the context is a debug context, that is), but you still have to enable the appropriate R600_DEBUG flags to get disassembly (since it is rather spammy and is only generated by LLVM when we explicitly ask for it). Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: pass pipe_debug_callback down into si_shader_binary_read (v2)Nicolai Hähnle2016-01-024-14/+22
| | | | | | | This will allow us to send shader debug info. Reviewed-by: Edward O'Callaghan <[email protected]> (v1) Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: implement set_debug_callbackNicolai Hähnle2016-01-022-0/+14
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* u_upload_mgr: allow specifying PIPE_USAGE_* for the upload bufferMarek Olšák2016-01-0217-26/+43
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* u_upload_mgr: remove alignment parameter from u_upload_createMarek Olšák2016-01-0217-31/+19
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* u_upload_mgr: pass alignment to u_upload_buffer manuallyMarek Olšák2016-01-023-2/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* u_upload_mgr: pass alignment to u_upload_data manuallyMarek Olšák2016-01-0219-22/+34
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* u_upload_mgr: pass alignment to u_upload_alloc manuallyMarek Olšák2016-01-0225-27/+36
| | | | | | | | | | The fixed alignment of u_upload_mgr will go away. This is the first step. The motivation is that one u_upload_mgr can have multiple users, each allocating from the same buffer, but requiring a different alignment. Reviewed-by: Nicolai Hähnle <[email protected]>
* u_upload_mgr: rework the application of alignmentMarek Olšák2016-01-021-10/+14
| | | | | | | | | | | | | | | The function only aligned the size, but not the offset. The offset was aligned only when the previous suballocation was aligned. That yielded the correct offset alignment if the alignment was constant for all suballocations. Instead, directly align the offset, but allow an unaligned size. There is no change in behavior, because the alignment is constant at the moment. This a prerequisite for allowing a variable alignment for suballocations. Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: fix GLSL uniform updates for glBitmap & glDrawPixels (v2)Marek Olšák2016-01-025-19/+25
| | | | | | | | | | Spotted by luck. The GLSL uniform storage is only associated once in LinkShader and can't be reallocated afterwards, because that would break the association. v2: don't remove st_upload_constants calls, clarify why they're needed Cc: 11.0 11.1 <[email protected]>
* program: add _mesa_reserve_parameter_storageMarek Olšák2016-01-022-15/+36
| | | | | | | The next commit will use this. Reviewed-by: Brian Paul <[email protected]> Cc: 11.0 11.1 <[email protected]>
* mesa: Fix warning with MESA_VERBOSE=api for BindBufferRangeJordan Justen2016-01-011-1/+1
| | | | | Reported-by: Dieter Nützel <[email protected]> Signed-off-by: Jordan Justen <[email protected]>
* nv50,nvc0: make sure there's pushbuf space and that we ref the bo earlyIlia Mirkin2016-01-014-6/+5
| | | | | | | | | | | First off, we can't flush in the middle of a command. Secondly requesting the extra push space might cause a flush to happen. If that flush happens, we'd have to do the PUSH_REFN again. So instead do PUSH_REFN after the push space request. This helps avoid rare crashes with supertuxkart in libdrm due to assertion failures. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* st/mesa: sort extensions enablement arrayIlia Mirkin2016-01-011-11/+11
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nir/lower_clip: add missing writemask on storeRob Clark2016-01-011-0/+1
| | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* mesa: Add MESA_VERBOSE=api for GL_ARB_program_interface_queryJordan Justen2016-01-011-0/+39
| | | | | | | | | v2: * Add braces '{}' when the _mesa_debug call spans multiple lines (Ken) Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Add MESA_VERBOSE=api for several indexed BindBuffer variantsJordan Justen2016-01-011-2/+25
| | | | | | | | v2: * Add braces '{}' when the _mesa_debug call spans multiple lines (Ken) Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* st/glsl_to_tgsi: fix block movs for doublesDave Airlie2016-01-011-1/+14
| | | | | | | | | | | | While playing with fp64, I disable varying packing to debug something else, and noticed we never emitted half the output movs for double matrix arrays. We should be moving the left index two slots for dual source doubles, and the right index two slots for non-vs input doubles. Signed-off-by: Dave Airlie <[email protected]>
* st/glsl_to_tgsi: handle different attrib sizeDave Airlie2016-01-011-5/+14
| | | | | | | vertex inputs are counted differently in some cases, with vertex inputs we need to make sure we don't double count them. Signed-off-by: Dave Airlie <[email protected]>
* st/glsl_to_tgsi: readd the double_reg2 for input index mappingDave Airlie2016-01-011-2/+2
| | | | | | | | | Otherwise we end up emitting the wrong index for the second double. This fixes dmat-vs-gs-tcs-tes.shader_test and dvec3-vs-gs-tcs-tes.shader_test Signed-off-by: Dave Airlie <[email protected]>
* st/glsl_to_tgsi: when doing reladdr get vec4 of correct typeDave Airlie2016-01-011-1/+1
| | | | | | | This fixes fp64 relative addressing, in the upcoming dmat-vs-gs-tcs-tes.shader_test. Signed-off-by: Dave Airlie <[email protected]>
* st/glsl_to_tgsi: handle double immediates in matrices properly.Dave Airlie2016-01-011-11/+48
| | | | | | This handles matrix initialisation properly. Signed-off-by: Dave Airlie <[email protected]>
* st/glsl_to_tgsi: setup writemask for double arrays and matricies.Dave Airlie2016-01-011-1/+20
| | | | | | | | | | It's important for the double instruction emission code that the writemasks are correct going in for double so it know which channels to replicate. This fixes it for the array and matrix cases. Signed-off-by: Dave Airlie <[email protected]>
* st/glsl_to_tgsi: handle doubles in array shrinking code.Dave Airlie2016-01-011-2/+7
| | | | | | | | This code takes into account double inputs in the array shrinking code. This fixes some issues with doubles and geom/tess inputs. Signed-off-by: Dave Airlie <[email protected]>
* st/glsl_to_tgsi: handle doubles outputs in arrays.Dave Airlie2016-01-011-4/+31
| | | | | | | | This handles the case where a double output is stored in an array, and tracks it for use in the double instruction emit code. Signed-off-by: Dave Airlie <[email protected]>
* st/glsl_to_tgsi: store if dst is double in arrayDave Airlie2016-01-011-3/+10
| | | | | | | | | | This is just a precursor patch to a fix for doubles with tessellation that I've written. We need to descend into output arrays in that case and mark dst's as double. Signed-off-by: Dave Airlie <[email protected]>
* nvc0: Set winding order regardless of domain.Kenneth Graunke2015-12-301-2/+4
| | | | | | | | | | Quads need to respect winding order, too - not just triangles. Fixes rendering in GFXBench 4.0's tessellation benchmark. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* glsl: Fix varying struct locations when varying packing is disabled.Kenneth Graunke2015-12-301-13/+2
| | | | | | | | | | | | | | | | | | | | varying_matches::record tries to compute the number of components in each varying, which varying_matches::assign_locations uses to assign locations. With varying packing, it uses glsl_type::component_slots() to come up with a reasonable value. Without varying packing, it fell back to an open-coded computation that didn't bother to handle structs at all. I believe we can simply use 4 * glsl_type::count_attribute_slots(false), which already handles these cases correctly. Partially fixes rendering in GFXBench 4.0's tessellation benchmark. (NVE0 is almost right after this, but i965 is still mostly garbage.) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Cc: "11.0 11.1" <[email protected]>
* drirc: Disable ARB_blend_func_extended for Heaven 4.0/Valley 1.0.Kenneth Graunke2015-12-301-0/+8
| | | | | | | | | | | | | | | | | | Unigine Heaven 4.0 and Valley 1.0 use dual color blending but don't specify which fragment shader output is which, so there's at best a 50/50 chance of us guessing it correctly. This is invalid. Unigine fixed this in 4.1 and 1.1 versions over a year and a half ago, but hasn't actually released them for whatever reason. So, add the workaround back so that it works for most people. Fixes Heaven 4.0/Valley 1.0 rendering on Ivybridge. For whatever reason, Broadwell worked. 4.1 and 1.1 have always worked. Signed-off-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92233 Reviewed-by: Marek Olšák <[email protected]> Cc: [email protected]
* glsl: add GL_ARB_shader_draw_parameters defineIlia Mirkin2015-12-301-0/+3
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* nvc0: add ARB_shader_draw_parameters supportIlia Mirkin2015-12-3014-15/+74
| | | | Signed-off-by: Ilia Mirkin <[email protected]>