Commit message | Author | Age | Files | Lines
* radeonsi: move gfx fence wait out of si_check_vm_faults | Nicolai Hähnle | 2016-06-24 | 2 | -6/+7
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: extract IB and bo list saving into separate functions | Nicolai Hähnle | 2016-06-24 | 6 | -54/+88
  Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: fix readpixels regression with MESA_pack_invert | Nicolai Hähnle | 2016-06-24 | 1 | -1/+1
  Fixes an error introduced in commit 3948cd37973696dc319170877382676809659465.
  Reported-by: Marek Olšák <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: set LLVM denormal flags | Marek Olšák | 2016-06-24 | 1 | -2/+5
  - make sure FP32 denormals will stay disabled in LLVM in the future (the current default is disabled)
  - tell LLVM that FP64 denormals are enabled
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: emit 1/sqrt for RSQ | Marek Olšák | 2016-06-24 | 1 | -2/+16
  We don't need the clamped version and we don't have to use any intrinsic.
  Stats on Tonga:
    15382 shaders in 9128 tests
    Totals:
    SGPRS: 1230560 -> 1230560 (0.00 %)
    VGPRS: 469577 -> 462504 (-1.51 %)
    Code Size: 22089908 -> 21730052 (-1.63 %) bytes
    LDS: 598 -> 598 (0.00 %) blocks
    Scratch: 283648 -> 281600 (-0.72 %) bytes per wave
    Max Waves: 125664 -> 126969 (1.04 %)
    Wait states: 0 -> 0 (0.00 %)
    Totals from affected shaders:
    SGPRS: 547280 -> 547280 (0.00 %)
    VGPRS: 269132 -> 262059 (-2.63 %)
    Code Size: 15709604 -> 15349748 (-2.29 %) bytes
    LDS: 198 -> 198 (0.00 %) blocks
    Scratch: 74752 -> 72704 (-2.74 %) bytes per wave
    Max Waves: 47840 -> 49145 (2.73 %)
    Wait states: 0 -> 0 (0.00 %)
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Michel Dänzer <[email protected]>
* r600g: Enable FMA on chips that support it | Jan Vesely | 2016-06-24 | 2 | -5/+7
  v2: Merge with PIPE_SHADER_CAP_DOUBLES
      Add CHIP_HEMLOCK
  v3: only set the instruction on EG and CM
  Signed-off-by: Jan Vesely <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
* gallium/u_queue: allow the execute function to differ per job | Marek Olšák | 2016-06-24 | 6 | -15/+18
  so that independent types of jobs can use the same queue.
  Reviewed-by: Nicolai Hähnle <[email protected]>
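Storing the callback in each job, rather than once per queue, is what lets unrelated job types share a queue. A minimal C sketch, not the actual u_queue code; `struct queue_job`, `execute_fn`, `job_run`, and the two sample jobs are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: each job carries its own execute function. */
typedef void (*execute_fn)(void *job_data);

struct queue_job {
   void *job_data;
   execute_fn execute;   /* per job, so unrelated job types can share one queue */
};

/* What a worker thread would do with a dequeued job. */
static void job_run(struct queue_job *job)
{
   assert(job->execute != NULL);
   job->execute(job->job_data);
}

/* Two unrelated (hypothetical) job types that can now go through the same queue. */
static void add_one(void *data)   { *(int *)data += 1; }
static void double_it(void *data) { *(int *)data *= 2; }
```

The queue itself no longer needs to know what a job does; it only moves `struct queue_job` values around.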
* gallium/u_queue: reduce the number of mutexes by 2 | Marek Olšák | 2016-06-24 | 2 | -20/+35
  by converting semaphores to condvars and using the main mutex
  Reviewed-by: Nicolai Hähnle <[email protected]>
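The semaphore-to-condvar conversion can be sketched as one mutex guarding a job count, with two condition variables standing in for the former "has space" and "has queued jobs" semaphores. This is an illustrative sketch under that assumption, not the u_queue implementation; all names are hypothetical and job storage is omitted:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical queue state: the single main mutex protects the count, and two
 * condition variables replace the two semaphores (and their extra mutexes). */
struct job_counter_queue {
   pthread_mutex_t lock;
   pthread_cond_t has_queued_cond;
   pthread_cond_t has_space_cond;
   unsigned num_queued;
   unsigned max_jobs;
};

static void jq_init(struct job_counter_queue *q, unsigned max_jobs)
{
   assert(max_jobs > 0);
   pthread_mutex_init(&q->lock, NULL);
   pthread_cond_init(&q->has_queued_cond, NULL);
   pthread_cond_init(&q->has_space_cond, NULL);
   q->num_queued = 0;
   q->max_jobs = max_jobs;
}

/* Producer: block while the queue is full, then publish one job. */
static void jq_add(struct job_counter_queue *q)
{
   pthread_mutex_lock(&q->lock);
   while (q->num_queued == q->max_jobs)
      pthread_cond_wait(&q->has_space_cond, &q->lock);
   q->num_queued++;   /* the job itself would be stored here, under the same mutex */
   pthread_cond_signal(&q->has_queued_cond);
   pthread_mutex_unlock(&q->lock);
}

/* Worker thread: block while the queue is empty, then take one job. */
static void jq_take(struct job_counter_queue *q)
{
   pthread_mutex_lock(&q->lock);
   while (q->num_queued == 0)
      pthread_cond_wait(&q->has_queued_cond, &q->lock);
   q->num_queued--;
   pthread_cond_signal(&q->has_space_cond);
   pthread_mutex_unlock(&q->lock);
}

static void jq_destroy(struct job_counter_queue *q)
{
   pthread_cond_destroy(&q->has_queued_cond);
   pthread_cond_destroy(&q->has_space_cond);
   pthread_mutex_destroy(&q->lock);
}
```

The `while` loops re-check the condition after every wakeup, which condition variables require because of spurious wakeups.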
* gallium/u_queue: add an option to name threads | Marek Olšák | 2016-06-24 | 4 | -2/+13
  for debugging
  v2: correct the snprintf use
  Reviewed-by: Nicolai Hähnle <[email protected]>
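Thread names are mainly useful in debuggers, but Linux caps them at 16 bytes including the terminating NUL. A sketch of bounded name formatting with snprintf, assuming a hypothetical `<base>:<index>` naming scheme; the helper and macro names are illustrative:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Linux thread names are limited to 16 bytes, including the terminating NUL. */
#define THREAD_NAME_MAX 16

/* Hypothetical helper: build "<base>:<index>" for worker thread `index`.
 * The array parameter decays to a pointer, so the size is passed explicitly
 * rather than via sizeof(out). snprintf never writes more than that many
 * bytes and always NUL-terminates, so long names are truncated safely. */
static void format_thread_name(char out[THREAD_NAME_MAX], const char *base,
                               unsigned index)
{
   assert(base != NULL);
   snprintf(out, THREAD_NAME_MAX, "%s:%u", base, index);
   /* On glibc, the result could then be passed to pthread_setname_np(). */
}
```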
* gallium/u_queue: add an option to have multiple worker threads | Marek Olšák | 2016-06-24 | 8 | -23/+71
  independent jobs don't have to be stuck on only one thread
  v2: use CALLOC & FREE
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/u_queue: rewrite util_queue_fence to allow multiple waiters | Marek Olšák | 2016-06-24 | 2 | -16/+43
  Checking "signalled" is first done without a mutex, then with a mutex. Also, checking without waiting doesn't lock the mutex. This is racy, but should be safe.
  Reviewed-by: Nicolai Hähnle <[email protected]>
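A fence that supports several waiters needs a condition variable that the signaller broadcasts, plus a "signalled" flag that can be polled cheaply. The sketch below is hypothetical (names are illustrative) and uses a C11 atomic for the flag so that the unlocked first check is well-defined; the commit itself only says the unlocked check is racy but safe:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct fence {
   pthread_mutex_t mutex;
   pthread_cond_t cond;
   atomic_bool signalled;
};

static void fence_init(struct fence *f)
{
   pthread_mutex_init(&f->mutex, NULL);
   pthread_cond_init(&f->cond, NULL);
   atomic_init(&f->signalled, false);
}

static void fence_signal(struct fence *f)
{
   pthread_mutex_lock(&f->mutex);
   atomic_store_explicit(&f->signalled, true, memory_order_release);
   pthread_cond_broadcast(&f->cond);   /* wake every waiter, not just one */
   pthread_mutex_unlock(&f->mutex);
}

/* Check without waiting: never takes the mutex. */
static bool fence_is_signalled(struct fence *f)
{
   return atomic_load_explicit(&f->signalled, memory_order_acquire);
}

static void fence_wait(struct fence *f)
{
   if (fence_is_signalled(f))   /* first check, without the mutex */
      return;
   pthread_mutex_lock(&f->mutex);
   /* second check, under the mutex; the mutex orders it against fence_signal */
   while (!atomic_load_explicit(&f->signalled, memory_order_relaxed))
      pthread_cond_wait(&f->cond, &f->mutex);
   pthread_mutex_unlock(&f->mutex);
}

static void fence_destroy(struct fence *f)
{
   pthread_cond_destroy(&f->cond);
   pthread_mutex_destroy(&f->mutex);
}
```

Using `pthread_cond_broadcast` instead of `pthread_cond_signal` is what allows any number of threads to block in `fence_wait` on the same fence.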
* gallium/u_queue: use a ring instead of a stack | Marek Olšák | 2016-06-24 | 4 | -20/+47
  and allow specifying its size in util_queue_init.
  v2: use CALLOC & FREE
  Reviewed-by: Nicolai Hähnle <[email protected]>
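A ring keeps jobs in FIFO order, whereas a stack runs the most recently queued job first. A minimal ring with a caller-chosen size, using plain calloc/free where Mesa uses its CALLOC/FREE macros; the struct and function names are hypothetical, and ints stand in for jobs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct job_ring {
   int *jobs;           /* ints stand in for real job structs */
   unsigned size;       /* chosen by the caller at init time */
   unsigned read_idx;   /* position of the oldest queued job */
   unsigned num_queued;
};

static bool ring_init(struct job_ring *r, unsigned size)
{
   assert(size > 0);
   r->jobs = calloc(size, sizeof(*r->jobs));
   r->size = size;
   r->read_idx = 0;
   r->num_queued = 0;
   return r->jobs != NULL;
}

static bool ring_push(struct job_ring *r, int job)
{
   if (r->num_queued == r->size)
      return false;   /* full; a real queue would wait for space instead */
   r->jobs[(r->read_idx + r->num_queued) % r->size] = job;
   r->num_queued++;
   return true;
}

static bool ring_pop(struct job_ring *r, int *job)
{
   if (r->num_queued == 0)
      return false;
   *job = r->jobs[r->read_idx];   /* oldest job first (FIFO) */
   r->read_idx = (r->read_idx + 1) % r->size;
   r->num_queued--;
   return true;
}

static void ring_destroy(struct job_ring *r)
{
   free(r->jobs);
   r->jobs = NULL;
}
```

The write position is derived from `read_idx + num_queued`, so only two counters need updating and indices wrap around the fixed-size buffer.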
* i965: Preserve the internal format of the dri image | Jordan Justen | 2016-06-23 | 1 | -3/+10
  Since the OpenGLES API is strict about the internal format matching for many operations, we need to preserve it.
  See _mesa_es3_error_check_format_and_type in src/mesa/main/glformats.c.
  Fixes ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96351
  Reported-by: Mark Janes <[email protected]>
  Signed-off-by: Jordan Justen <[email protected]>
  Cc: Kristian Høgsberg <[email protected]>
  Cc: Chad Versace <[email protected]>
  Cc: "12.0" <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
* anv: Add anv_render_pass_attachment::store_op | Chad Versace | 2016-06-23 | 2 | -2/+2
  Will be needed for resolving auxiliary surfaces.
  I didn't add anv_render_pass_attachment::stencil_store_op, as the driver would likely never use it: stencil surfaces never have auxiliary surfaces.
  Reviewed-by: Jason Ekstrand <[email protected]>
* gbm: Fix comments | Gurkirpal Singh | 2016-06-23 | 1 | -2/+2
  Reviewed-by: Chad Versace <[email protected]>
* gbm: doc fixes | Eric Engestrom | 2016-06-23 | 2 | -4/+4
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
* Remove wrongly repeated words in comments | Giuseppe Bilotta | 2016-06-23 | 46 | -54/+54
  Clean up misrepetitions ('if if', 'the the', etc.) found throughout the comments. This has been done manually, after grepping case-insensitively for duplicate if, is, the, then, do, for, an, plus a few other typos corrected in passing.
  v2:
  * proper commit message and non-joke title;
  * replace two 'as is' followed by 'is' to 'as-is'.
  v3:
  * 'a integer' => 'an integer' and similar (originally spotted by Jason Ekstrand, I fixed a few other similar ones while at it)
  Signed-off-by: Giuseppe Bilotta <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
* svga: update some comments in svga_buffer_handle() | Brian Paul | 2016-06-23 | 1 | -10/+3
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: add a const qualifier in svga_buffer_upload_piecewise() | Brian Paul | 2016-06-23 | 1 | -1/+1
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: minor code refactor for svga_buffer_upload_command() | Brian Paul | 2016-06-23 | 1 | -5/+21
  Put the HBS code into a separate function.
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: minor code simplification in svga_context_finish() | Brian Paul | 2016-06-23 | 1 | -1/+1
  Signed-off-by: Brian Paul <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* i965: Implement rasterizer discard via SOL unless required for queries. | Kenneth Graunke | 2016-06-23 | 2 | -0/+21
  We currently use CL_INVOCATION_COUNT for the GL_PRIMITIVES_GENERATED query, which involves passing all primitives to the clipper. When rasterizer discard is enabled, we program the clipper in REJECT_ALL mode, rather than using the SOL stage's "Rendering Disable" feature.
  See commit f09b91f78247409f54c975f56cb10d5f350fe64e for an explanation of why we implement GL_PRIMITIVES_GENERATED this way.
  Apparently the SOL stage's "Rendering Disable" feature is a lot faster than having the clipper reject all primitives. It's safe to use when no GL_PRIMITIVES_GENERATED query is active, as we don't care about CL_INVOCATION_COUNT incrementing.
  This patch makes us use SO_RENDERING_DISABLE when no query is active, but continues falling back to the clipper in REJECT_ALL mode when the queries are enabled. It brings back the perf_debug for the clipper case (which I removed in commit 1f9445ff57b, thinking it wasn't useful).
  Improves performance in Gl32GSCloth by 84.8303% +/- 2.07132% (n = 10) on my Broadwell GT2 laptop.
  Cc: [email protected]
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
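The policy described above reduces to a small decision function. A hedged sketch in C; the enum and function names are illustrative and do not come from i965:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three possible configurations. */
enum discard_method {
   DISCARD_NONE,                /* rasterizer discard disabled */
   DISCARD_SOL_RENDER_DISABLE,  /* fast path: SOL stage "Rendering Disable" */
   DISCARD_CLIP_REJECT_ALL,     /* slow path: clipper in REJECT_ALL mode */
};

static enum discard_method
choose_discard_method(bool rasterizer_discard, bool primitives_generated_active)
{
   if (!rasterizer_discard)
      return DISCARD_NONE;
   /* GL_PRIMITIVES_GENERATED is counted via CL_INVOCATION_COUNT, so primitives
    * must still reach the clipper while such a query is active. */
   if (primitives_generated_active)
      return DISCARD_CLIP_REJECT_ALL;
   return DISCARD_SOL_RENDER_DISABLE;
}
```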
* i965: Combine 3DSTATE_STREAMOUT emitters and genX_sol_state atoms. | Kenneth Graunke | 2016-06-23 | 4 | -99/+37
  They're basically the same. Let's avoid the code duplication.
  v2: Fix SO_BUFFER_ENABLE stuff to only happen on Gen < 8 (caught by Jason Ekstrand).
  Cc: [email protected]
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* glsl: Don't constant propagate arrays. | Kenneth Graunke | 2016-06-23 | 1 | -1/+1
  Constant propagation on arrays doesn't make a lot of sense. If the array is only accessed with constant indexes, then opt_array_splitting would split it up. Otherwise, we have variable indexing. If there are multiple accesses, then constant propagation would end up replicating the data.
  The lower_const_arrays_to_uniforms pass creates uniforms for each ir_constant with array type that it encounters. This means that it creates redundant uniforms for each copy of the constant, which means uploading too much data. It can even mean exceeding the maximum number of uniform components, causing link failures.
  We could try and teach the pass to de-duplicate the data by hashing constants, but it makes more sense to avoid duplicating it in the first place. We should promote constant arrays to uniforms, then propagate the uniform access.
  Fixes the TressFX shaders from Tomb Raider, which exceeded the maximum number of uniform components by a huge margin and failed to link.
  On Broadwell:
    total instructions in shared programs: 9067702 -> 9068202 (0.01%)
    instructions in affected programs: 10335 -> 10835 (4.84%)
    helped: 10 (Hoard, Shadow of Mordor, Amnesia: The Dark Descent)
    HURT: 20 (Natural Selection 2)
    loops in affected programs: 4 -> 0
  The hurt programs appear to no longer have a constarray uniform, as all constants were successfully propagated. Apparently before this patch, we successfully unrolled a loop containing array access, but only after promoting constant arrays to uniforms. With this patch, we unroll it first, so all array access is direct, and the array is split up, and individual constants are propagated. This seems better.
  Cc: [email protected]
  Reported-by: Karol Herbst <[email protected]>
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* glsl: Make lower_const_arrays_to_uniforms work directly on constants. | Kenneth Graunke | 2016-06-23 | 1 | -8/+3
  There's really no point in looking at ir_dereference_array of a constant. It also misses cases like:
    (assign () (var_ref tmp) (constant (array ...) ...))
  No changes in shader-db, but keeps it working after the next commit.
  Cc: [email protected]
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* i965: Copy propagate before doing variable index lowering. | Kenneth Graunke | 2016-06-23 | 1 | -0/+2
  The scalar backend currently doesn't support variable indexing on temporary arrays, but it does support it on uniform arrays, and some stages support it for input arrays. Make sure these are propagated through before exploding indirects into piles of if-ladders unnecessarily.
  On Broadwell, no instruction count change in shader-db.
    total cycles in shared programs: 80675652 -> 80674928 (-0.00%)
    cycles in affected programs: 649972 -> 649248 (-0.11%)
    helped: 386
    HURT: 165
  This will help avoid code quality regressions in a future commit.
  Cc: [email protected]
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
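As a rough C analogy of "exploding indirects into if-ladders": a variable-index read becomes a chain of compares and selects, one per element. The real lowering pass rewrites GLSL IR, not C; both function names here are hypothetical:

```c
#include <assert.h>

/* Direct variable indexing: a single indexed load. */
static float read_indexed(const float arr[4], int i)
{
   return arr[i];
}

/* The same read after lowering into an if-ladder: every access uses a
 * constant index, but there is one compare-and-select per element, so the
 * cost grows with the array length. */
static float read_if_ladder(const float arr[4], int i)
{
   float result = arr[0];
   if (i == 1)
      result = arr[1];
   else if (i == 2)
      result = arr[2];
   else if (i == 3)
      result = arr[3];
   return result;
}
```

Copy propagating first lets an access that really reads a uniform or input array keep the cheap indexed form.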
* glsl: Propagate invariant/precise after lowering const arrays. | Kenneth Graunke | 2016-06-23 | 1 | -0/+1
  The new uniform may need precise as well. Fixes copy propagation of constant array uniforms in Tomb Raider shaders.
  Cc: [email protected]
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* glsl: Split arrays even in the presence of whole-array copies. | Kenneth Graunke | 2016-06-23 | 1 | -0/+55
  Previously, we failed to split constant arrays. Code such as
    int[2] numbers = int[](1, 2);
  would generate a whole-array assignment:
    (assign () (var_ref numbers)
               (constant (array int 2) (constant int 1) (constant int 2)))
  opt_array_splitting generally tried to visit ir_dereference_array nodes, and avoid recursing into the inner ir_dereference_variable. So if it ever saw an ir_dereference_variable, it assumed this was a whole-array read and bailed. However, in the above case, there's no array deref, and we can totally handle it: we just have to "unroll" the assignment, creating assignments for each element.
  This was mitigated by the fact that we constant propagate whole arrays, so a dereference of a single component would usually get the desired single value anyway. However, I plan to stop doing that shortly; early experiments with disabling constant propagation of arrays revealed this shortcoming.
  This patch causes some arrays in Gl32GSCloth's geometry shaders to be split, which allows other optimizations to eliminate unused GS inputs. The VS then doesn't have to write them, which eliminates the entire VS (5 -> 2 instructions). It still renders correctly.
  No other change in shader-db.
  v2: Drop !AOA check and improve a comment (feedback from Tim Arceri).
  Cc: [email protected]
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
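The "unroll" step can be illustrated outside of GLSL IR: a whole-array constant assignment becomes one scalar assignment per element. This C sketch uses hypothetical types and a hypothetical `<array>_<index>` naming scheme; it is not the opt_array_splitting code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ELEMS 8

/* Hypothetical scalar assignment "lhs = value" produced by splitting. */
struct scalar_assign {
   char lhs[32];
   int value;
};

/* Unroll a whole-array constant assignment such as
 *    int[2] numbers = int[](1, 2);
 * into one assignment per element (numbers_0 = 1, numbers_1 = 2), so each
 * element can become its own variable. Returns the number of assignments. */
static size_t split_array_assign(const char *array_name, const int *values,
                                 size_t n, struct scalar_assign out[MAX_ELEMS])
{
   assert(n <= MAX_ELEMS);
   for (size_t i = 0; i < n; i++) {
      snprintf(out[i].lhs, sizeof(out[i].lhs), "%s_%zu", array_name, i);
      out[i].value = values[i];
   }
   return n;
}
```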
* glsl: Make constant propagation's folder not propagate into an LHS. | Kenneth Graunke | 2016-06-23 | 1 | -1/+1
  opt_constant_propagation.cpp contains constant folding code which can actually do constant propagation in some cases. It was happily propagating constants into the left-hand-side of assignments. For example,
    (assign () (var_ref temp) (constant ...))
  would brilliantly be turned into:
    (assign () (constant ...) (constant ....))
  This is a bigger hammer than necessary: it prevents propagation into the left-hand-side altogether. We could certainly do better someday. Notably, the constant propagation pass itself already takes this approach; it's just the constant propagation pass's built-in constant folding code (which actually propagates, too) that was broken.
  No change in shader-db, but prevents regressions after future commits. It seems plausible that this could be hit today, but I haven't seen it happen.
  Cc: [email protected]
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
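The fix amounts to only ever rewriting reads. A much simplified sketch with a hypothetical two-kind IR node; the real pass walks Mesa's GLSL IR with visitor classes, and none of these names come from it:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum rvalue_kind { RV_VAR_REF, RV_CONSTANT };

/* Hypothetical, greatly simplified IR node: a variable reference or an int constant. */
struct rvalue {
   enum rvalue_kind kind;
   const char *var;   /* valid when kind == RV_VAR_REF */
   int value;         /* valid when kind == RV_CONSTANT */
};

struct assignment {
   struct rvalue lhs;   /* location being written */
   struct rvalue rhs;   /* value being read */
};

/* Replace reads of `var` with its known constant value. The LHS is never
 * touched: it names storage, so replacing it would yield the nonsensical
 * (assign () (constant ...) (constant ...)). Returns true on a change. */
static bool propagate_constant(struct assignment *a, const char *var, int value)
{
   if (a->rhs.kind == RV_VAR_REF && strcmp(a->rhs.var, var) == 0) {
      a->rhs.kind = RV_CONSTANT;
      a->rhs.value = value;
      return true;
   }
   return false;
}
```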
* i965/blorp: Disable vertex element swizzling | Topi Pohjolainen | 2016-06-23 | 2 | -4/+18
  Without this, vertex elements originating directly from the vertex fetcher are not passed to wm-state correctly.
  Signed-off-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Let program data tell if push constants are needed | Topi Pohjolainen | 2016-06-23 | 3 | -15/+35
  Signed-off-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Use prog data counters to guide wm/ps setup | Topi Pohjolainen | 2016-06-23 | 3 | -3/+8
  just as core upload logic does.
  Signed-off-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Use prog data counters to guide sf/sbe setup | Topi Pohjolainen | 2016-06-23 | 5 | -9/+37
  just as core upload logic does.
  Signed-off-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Avoid division by zero. | Ardinartsev Nikita | 2016-06-23 | 1 | -11/+15
  Fixes regression introduced by af5ca43f2676bff7499f93277f908b681cb821d0
  Cc: "12.0 11.2" <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95419
* swr: [rasterizer core] fix dependency bug | Tim Rowley | 2016-06-23 | 4 | -10/+10
  Never be dependent on "draw 0"; instead, have a bool that makes the draw either dependent on the previous draw or not dependent at all.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] use wrap-around safe compares for dependency checking | Tim Rowley | 2016-06-23 | 6 | -36/+45
  Move drawIDs from 64-bit to 32-bit to increase perf.
  Reviewed-by: Bruce Cherniak <[email protected]>
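Wrap-around safe comparison is the usual serial-number trick (compare RFC 1982): subtract modulo 2^32 and look at the sign of the difference. A sketch, not the SWR code; `draw_id_after` is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if draw ID `a` was issued after draw ID `b`, assuming the two IDs are
 * less than 2^31 apart. The unsigned subtraction wraps modulo 2^32, and
 * reading the difference as signed gives the right order even after the
 * 32-bit counter rolls over. (The unsigned-to-signed conversion is
 * implementation-defined in ISO C, but is two's complement on GCC, Clang
 * and MSVC.) */
static bool draw_id_after(uint32_t a, uint32_t b)
{
   return (int32_t)(a - b) > 0;
}
```

A plain `a > b` would treat draw 0 issued right after draw `UINT32_MAX` as older, which is exactly the rollover case that shrinking the IDs to 32 bits makes reachable.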
* swr: [rasterizer jitter] add support for component packing for 'odd' formats | Tim Rowley | 2016-06-23 | 1 | -4/+23
  Add early-out if no components are enabled. Add asserts.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] track whether GS outputs viewport array index | Tim Rowley | 2016-06-23 | 1 | -0/+3
  So we can skip the index gather in PA.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] GS viewport array index attribute | Tim Rowley | 2016-06-23 | 2 | -1/+2
  Only adds the attribute mapping to the jitter; no implementation yet.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] conservative rasterization frontend support | Tim Rowley | 2016-06-23 | 10 | -63/+325
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] stop single threaded crash exit crash | Tim Rowley | 2016-06-23 | 1 | -2/+3
  Function static destructors were getting called by exit handlers before context teardown.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer jitter] small fetch jit cleanup | Tim Rowley | 2016-06-23 | 1 | -134/+36
  Handle SGV stores separately from the stream fetch code. Because of this change, there is a potential to jit an extra unused store.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] remove old comment | Tim Rowley | 2016-06-23 | 1 | -1/+0
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer jitter] cleanup supporting different llvm versions | Tim Rowley | 2016-06-23 | 8 | -34/+73
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer jitter] unitialized component fix in fetch jit | Tim Rowley | 2016-06-23 | 1 | -1/+1
  Was trying to store an extra uninitialized component. Only affects component packing, which isn't enabled (yet).
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer] add support for building avx512 version | Tim Rowley | 2016-06-23 | 5 | -15/+20
  Currently, most code paths between AVX2 and AVX512 are identical (see changes to knobs.h).
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer common] fix include for Intel compiler | Tim Rowley | 2016-06-23 | 1 | -1/+1
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer common] workaround clang for windows __cpuid() bug | Tim Rowley | 2016-06-23 | 1 | -5/+9
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: push/pop DEBUG macro around llvm includes | Tim Rowley | 2016-06-23 | 2 | -4/+13
  llvm redefines DEBUG; adding push/pop prevents an undefined reference to debug_refcnt_state in llvm-3.7+.
  v2: add undef DEBUG
  Cc: "12.0" <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
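The push/pop technique can be sketched with the `push_macro`/`pop_macro` pragmas supported by GCC, Clang, and MSVC. Here a local `#define DEBUG(X)` stands in for the LLVM header's own definition, and `mesa_debug_value` is a hypothetical helper; this is an illustration, not the swr source:

```c
#include <assert.h>

/* Stand-in for Mesa's own DEBUG define (normally set by the build). */
#define DEBUG 1

/* Save Mesa's DEBUG and remove it before including LLVM headers. */
#pragma push_macro("DEBUG")
#undef DEBUG

/* An #include of LLVM headers would go here. Simulate a header that
 * defines its own, incompatible DEBUG macro. */
#define DEBUG(X) do { X; } while (0)

/* Restore Mesa's DEBUG; the LLVM-style definition is discarded. */
#pragma pop_macro("DEBUG")

/* Hypothetical helper: code after the includes sees Mesa's DEBUG again. */
static int mesa_debug_value(void)
{
   return DEBUG;
}
```

The `#undef` after the push matches the v2 note: without it, the header's `#define` would clash with the still-active definition.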
* include: Require MSVC 2013 Update 4. | Jose Fonseca | 2016-06-23 | 1 | -2/+2
  Earlier MSVC 2013 releases have trouble compiling some of our C99 code, so make sure we have Update 4 to avoid confusion.
  Cc: [email protected]
  Reviewed-by: Brian Paul <[email protected]>
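A compile-time guard of this kind can be written against `_MSC_FULL_VER`. The value below is believed to be the full version of the Visual Studio 2013 Update 4 compiler (18.00.31101); the exact check in Mesa's headers may differ, and the macro and helper names are hypothetical:

```c
#include <assert.h>

/* Assumed _MSC_FULL_VER of the Visual Studio 2013 Update 4 compiler. */
#define MSVC_2013_UPDATE4_FULL_VER 180031101L

/* Only MSVC defines _MSC_VER, so other compilers skip the check entirely. */
#if defined(_MSC_VER) && _MSC_FULL_VER < MSVC_2013_UPDATE4_FULL_VER
#error "Visual Studio 2013 Update 4 or later is required"
#endif

/* Runtime form of the same comparison, for illustration only. */
static int msvc_full_ver_supported(long full_ver)
{
   return full_ver >= MSVC_2013_UPDATE4_FULL_VER;
}
```

Failing at preprocessing time gives a clear message instead of a confusing C99 syntax error deeper in the build.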