summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* mesa: rearrange error handling in glProgramParameteri()Brian Paul2015-01-051-15/+11
| | | | Reviewed-by: Eric Anholt <[email protected]>
* mesa: fix error strings in shaderapi.cBrian Paul2015-01-051-2/+2
| | | | | | | The _mesa_-prefixed function names should not appear in GL error messages. Reviewed-by: Eric Anholt <[email protected]>
* glsl: use the is_gl_identifier() helper in a couple more placesBrian Paul2015-01-052-2/+2
| | | | Reviewed-by: Eric Anholt <[email protected]>
* meta: init var to silence uninitialized variable warningBrian Paul2015-01-051-1/+1
|
* draw: silence uninitialized variable warningBrian Paul2015-01-051-1/+1
| | | | | | v2: move initialization of llvm_gs to declaration. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: silence a couple compiler warningsBrian Paul2015-01-052-1/+4
| | | | | | | Silence warnings about possibly uninitialized variables when making a release build. Reviewed-by: José Fonseca <[email protected]>
* gallium/util: make sure cache line size is not zeroLeonid Shatz2015-01-051-1/+5
| | | | | | | | | | The "normal" detection (querying clflush size) already made sure it is non-zero, however another method did not. This lead to crashes if this value happened to be zero (apparently can happen in virtualized environments at least). This fixes https://bugs.freedesktop.org/show_bug.cgi?id=87913 Cc: "10.4" <[email protected]>
* gallium/util: fix crash with daz detection on x86Roland Scheidegger2015-01-051-1/+1
| | | | | | | | | | The code used PIPE_ALIGN_VAR for the variable used by fxsave, however this does not work if the stack isn't aligned. Hence use PIPE_ALIGN_STACK function decoration to fix the segfault which can happen if stack alignment is only 4 bytes. This fixes https://bugs.freedesktop.org/show_bug.cgi?id=87658. Cc: "10.4" <[email protected]>
* nvc0: add name to magic numberIlia Mirkin2015-01-051-2/+2
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: regenerate rnndb headersIlia Mirkin2015-01-0517-837/+1157
| | | | | | | | | | | | | | | The headers hadn't been regenerated in a long time and had seen a number of manual modifications. A few changes: - remove nvc0_2d entirely, use the nv50 header which has the nvc0 values too - remove 3ddefs, it's identical to the nv50 file - move macros out into a separate file Also the upstream rnndb changed the overall chip naming convention; this was fixed up manually in the generated files until a better solution is determined. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50: regenerate rnndb headersIlia Mirkin2015-01-0511-358/+451
| | | | | | | | | | The headers hadn't been regenerated in a long time, and there were a few minor divergences. Among other things, rnndb has changed naming to G80/etc, for now I've not tackled switching that over and manually replaced the nvidia codenames back to the chip ids. However no other modifications of the headergen'd headers was done. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50: enable texture compressionTobias Klausmann2015-01-052-3/+26
| | | | | | | | | Compression seems to be supported for only some formats. Enable it for those. Previously this was disabled for everything despite the code looking like it was actually enabled. Signed-off-by: Tobias Klausmann <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: enable sat modifier for OP_SUBIlia Mirkin2015-01-051-1/+1
| | | | | | | SUB is handled the same as ADD, so no reason not to allow a saturate modifier on it. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: Add sat modifier for mulRoy Spliet2015-01-052-1/+7
| | | | | Signed-off-by: Roy Spliet <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: avoid doing work inside of an assertIlia Mirkin2015-01-052-2/+4
| | | | | | | | assert is compiled out in release builds - don't put logic into it. Note that this particular instance is only used for vp debugging and is normally compiled out. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix texture offsets in release buildsIlia Mirkin2015-01-052-2/+4
| | | | | | | | | | assert's get compiled out in release builds, so they can't be relied upon to perform logic. Reported-by: Pierre Moreau <[email protected]> Signed-off-by: Ilia Mirkin <[email protected]> Tested-by: Roy Spliet <[email protected]> Cc: "10.2 10.3 10.4" <[email protected]>
* i965: Micro-optimize swizzle_to_scs() and make it inlinable.Kenneth Graunke2015-01-043-27/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | brw_swizzle_to_scs has been showing up in my CPU profiling, which is rather silly - it's a tiny amount of code. It really should be inlined, and can easily be implemented with fewer instructions. The enum translation is as follows: SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_W, SWIZZLE_ZERO, SWIZZLE_ONE 0 1 2 3 4 5 4 5 6 7 0 1 SCS_RED, SCS_GREEN, SCS_BLUE, SCS_ALPHA, SCS_ZERO, SCS_ONE which is simply (swizzle + 4) & 7. Haswell needs extra textureGather workarounds to remap GREEN to BLUE, but Broadwell and later do not. This patch replicates swizzle_to_scs in gen7_wm_surface_state.c and gen8_surface_state.c, since the Gen8+ code can be simplified to a mere two instructions. Both copies can be marked static for easy inlining. v2: Put the commit message in the code as comments (requested by Jason Ekstrand). Also fix a typo. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Support MESA_FORMAT_R8G8B8X8_SRGB.Kenneth Graunke2015-01-041-1/+4
| | | | | | | | | | | | | | | | | | | | | Valve games use GL_SRGB8 textures. Instead of supporting that properly, we fell back to MESA_FORMAT_R8G8B8A8_SRGB (with an alpha channel), which meant that we had to use texture swizzling to override the alpha to 1.0 when sampling. This meant shader recompiles on Gen < 7.5 platforms. By supporting MESA_FORMAT_R8G8B8X8_SRGB, the hardware just returns 1.0 for us, so we can just use SWIZZLE_XYZW, and avoid any recompiles. All generations of hardware have supported the format for sampling and filtering; we can easily support rendering by using the R8G8B8A8_SRGB format and writing garbage to the X channel. (We do this already for the non-SRGB version of this format.) This removes all remaining shader recompiles in a time demo of "Counter Strike: Global Offensive" (32 -> 0) on Sandybridge. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Fix BLORP sRGB MSAA overrides to cope with X vs. A formats.Kenneth Graunke2015-01-041-2/+3
| | | | | | | | | | | | | The logic in brw_blorp_surface_info::set uses brw_format_for_mesa_format for source surfaces, and brw->render_target_format[] for destination surfaces. We should do the same in the sRGB MSAA overrides. Currently, this isn't a problem, since SRGB MSAA buffers are all RGBA. The next commit will introduce RGBX SRGB MSAA buffers, at which point we need to get the RGBX -> RGBA format overrides for rendering right. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Copy shader->shadow_samplers to prog->ShadowSamplers.Kenneth Graunke2015-01-041-0/+1
| | | | | | | | | | | | | | | | ir_to_mesa does this - apparently we just forgot or something. Without this, we'll guess the wrong texture swizzle (XYZW for color instead of XXX1 for depth) when doing precompiles. This cuts 26 shader recompiles in a time demo of "Counter Strike: Global Offensive" (58 -> 32) on Sandybridge. Haswell still has 0 recompiles. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Make the precompile ignore DEPTH_TEXTURE_MODE on Gen7.5+.Kenneth Graunke2015-01-042-2/+5
| | | | | | | | | | | | | | | Gen7.5+ platforms that support the "Shader Channel Select" feature leave key->tex.swizzles[i] as SWIZZLE_NOOP except when GL_DEPTH_TEXTURE_MODE is GL_ALPHA (which is really uncommon). So, the precompile should leave them as SWIZZLE_NOOP (aka SWIZZLE_XYZW) as well. We didn't notice this because prog->ShadowSamplers is not set correctly. The next patch will fix that problem. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Implement WaCsStallAtEveryFourthPipecontrol on IVB/BYT.Kenneth Graunke2015-01-042-0/+34
| | | | | | | | | | | | According to the documentation, we need to do a CS stall on every fourth PIPE_CONTROL command to avoid GPU hangs. The kernel does a CS stall between batches, so we only need to count the PIPE_CONTROLs in our batches. v2: Get the generation check right (caught by Chris Wilson), combine the ++ with the check (suggested by Daniel Vetter). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Daniel Vetter <[email protected]>
* r300g: handle vertex format PIPE_FORMAT_NONEMarek Olšák2015-01-041-2/+11
|
* glsl_to_tgsi: fix a bug in copy propagationMarek Olšák2015-01-031-1/+2
| | | | | | | This fixes the new piglit test: arb_uniform_buffer_object/2-buffers-bug Cc: 10.2 10.3 10.4 <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* i965: Make INTEL_DEBUG=state ignore state flags with a count of 1.Kenneth Graunke2015-01-031-2/+4
| | | | | | | | | | | There are too many state flags to fit in one terminal screen, even with a very tall terminal. Everything is flagged once, so a value of 1 means that it hasn't ever happened again, and thus isn't terribly interesting. Skipping those makes it easier to see the interesting values. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Fix INTEL_DEBUG=optimizer with VF types.Kenneth Graunke2015-01-032-2/+2
| | | | | | | Hardcoding stderr is wrong; INTEL_DEBUG=optimizer uses other files. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Show opt_vector_float() and later passes in INTEL_DEBUG=optimizer.Kenneth Graunke2015-01-031-8/+12
| | | | | | | | | | | | In order to support calling opt_vector_float() inside a condition, this patch makes OPT() a statement expression: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html We've used that elsewhere already. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* swrast: Fix -Wduplicate-decl-specifier warningJeremy Huddleston Sequoia2015-01-011-2/+2
| | | | | | | | | | | swrast.c:67:12: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier] const char const *swrast_vendor_string = "Mesa Project"; ^ swrast.c:68:12: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier] const char const *swrast_renderer_string = "Software Rasterizer"; ^ Signed-off-by: Jeremy Huddleston Sequoia <[email protected]>
* nv50/ir: Fold sat into madRoy Spliet2015-01-011-1/+1
| | | | | | | | | The mad instruction emitter already supported the saturate modifier, but the ModifierFolding pass never tried folding cvt sat operations in for NV50. Signed-off-by: Roy Spliet <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: fold MAD when one of the multiplicands is constIlia Mirkin2015-01-011-0/+23
| | | | | | | | | | Fold MAD dst, src0, immed, src2 (or src0/immed swapped) when - immed = 0 -> MOV dst, src2 - immed = +/- 1 -> ADD dst, src0, src2 These types of MAD patterns were observed in some st/nine shaders. Signed-off-by: Ilia Mirkin <[email protected]>
* gallium/state_tracker: Rewrite Haiku's state trackerAlexander von Gluck IV2015-01-017-190/+274
| | | | | * More gallium-like * Leverage stamps properly and don't call mesa functions
* radeonsi: fix warningsMarek Olšák2015-01-012-1/+3
|
* i965: Fix start/base_vertex_location for >1 prims but !BRW_NEW_VERTICES.Kenneth Graunke2014-12-313-21/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a partial revert of c89306983c07e5a88c0d636267e5ccf263cb4213. It split the {start,base}_vertex_location handling into several steps: 1. Set brw->draw.start_vertex_location = prim[i].start and brw->draw.base_vertex_location = prim[i].basevertex. (This happened once per _mesa_prim, in the main drawing loop.) 2. Add brw->vb.start_vertex_bias and brw->ib.start_vertex_offset appropriately. (This happened in brw_prepare_shader_draw_parameters, which was called just after brw_prepare_vertices, as part of state upload, and only happened when BRW_NEW_VERTICES was flagged.) 3. Use those values when emitting 3DPRIMITIVE (once per _mesa_prim). If we drew multiple _mesa_prims, but didn't flag BRW_NEW_VERTICES on the second (or later) primitives, we would do step #1, but not #2. The first _mesa_prim would get correct values, but subsequent ones would only get the first half of the summation. The reason I originally did this was because I needed the value of gl_BaseVertexARB to exist in a buffer object prior to uploading 3DSTATE_VERTEX_BUFFERS. I believed I wanted to upload the value of 3DPRIMITIVE's "Base Vertex Location" field, which was computed as: (prims[i].indexed ? prims[i].start : prims[i].basevertex) + brw->vb.start_vertex_bias. The latter value wasn't available until after brw_prepare_vertices, and the former weren't available in the state upload code at all. Hence the awkward split. However, I believe that including brw->vb.start_vertex_bias was a mistake. It's an extra bias we apply when uploading vertex data into VBOs, to move [min_index, max_index] to [0, max_index - min_index]. >From the GL_ARB_shader_draw_parameters specification: "<gl_BaseVertexARB> holds the integer value passed to the <baseVertex> parameter to the command that resulted in the current shader invocation. In the case where the command has no <baseVertex> parameter, the value of <gl_BaseVertexARB> is zero." I conclude that gl_BaseVertexARB should only include the baseVertex parameter from glDraw*Elements*, not any internal biases we add for optimization purposes. With that in mind, gl_BaseVertexARB only needs prim[i].start or prim[i].basevertex. We can simply store that, and go back to computing start_vertex_location and base_vertex_location in brw_emit_prim(), like we used to. This is much simpler, and should actually fix two bugs. Fixes missing geometry in Unvanquished. Cc: "10.4 10.3" <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85529 Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Use WARN_ONCE for the single-primitive-exceeded-aperture message.Kenneth Graunke2014-12-311-9/+4
| | | | | | | This makes it show up via ARB_debug_output and is also less code. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* u_primconvert: Fix leak of the upload BO on context destroy.Eric Anholt2014-12-311-0/+2
| | | | | | | v2: Conditionalize it on having done any uploads (Turns out u_upload_destroy() isn't safe with a NULL arg). Reviewed-by: Dave Airlie <[email protected]> (v1)
* vc4: Fix memory leak as of 0404e7fe0ac2a6234a11290b4b1596e8bc127a4b.Eric Anholt2014-12-311-5/+5
| | | | Can't reset the CL before looking at how much we had pupt in it.
* nv50,nvc0: set vertex id base to index_biasIlia Mirkin2014-12-305-7/+35
| | | | | | | | | | | | | | Fixes the piglits which check that gl_VertexID includes the base vertex offset: arb_draw_indirect-vertexid elements gl-3.2-basevertex-vertexid Note that this leaves out the original G80, for which this will continue to fail. It could be fixed by passing a driver constbuf value in, but that's beyond the scope of this change. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.3 10.4" <[email protected]>
* nv50,nvc0: implement half_pixel_centerTiziano Bacocco2014-12-308-14/+11
| | | | | | | | | | LAST_LINE_PIXEL has actually been renamed to PIXEL_CENTER_INTEGER in rnndb; use that method to implement the rasterizer setting, used for st/nine. Signed-off-by: Tiziano Bacocco <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: "10.4" <[email protected]>
* vc4: Only render tiles where the scissor ever intersected them.Eric Anholt2014-12-304-10/+52
| | | | | This gives a 2.7x improvement in x11perf -rect100, since we only end up load/storing the x11perf window, not the whole screen.
* vc4: Move draw call reset handling to a helper function.Eric Anholt2014-12-301-23/+31
| | | | | | This will be more important in the next commit, when there's more state to reset to nonzero values, and I want an early exit from the submit function.
* vc4: Drop the content of vc4_flush_resource().Eric Anholt2014-12-301-4/+4
| | | | | The callers all follow it with a flush of the context, and the flush of the context gives us more information about how things are being flushed.
* mesa: Remove __SSE4_1__ guards from sse_minmax.c.Matt Turner2014-12-291-3/+0
| | | | | | See commit e07c9a288. Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/vec4: Do separate copy followed by constant propagation after ↵Matt Turner2014-12-291-1/+2
| | | | | | | | | | | | | | | | | opt_vector_float(). total instructions in shared programs: 5877012 -> 5876617 (-0.01%) instructions in affected programs: 33140 -> 32745 (-1.19%) From before the commit that allows VF constant propagation (which hurt some programs) to here, the results are: total instructions in shared programs: 5877951 -> 5876617 (-0.02%) instructions in affected programs: 123444 -> 122110 (-1.08%) with no programs hurt. Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Allow constant propagation of VF immediates.Matt Turner2014-12-291-1/+27
| | | | | | | | | | total instructions in shared programs: 5877951 -> 5877012 (-0.02%) instructions in affected programs: 155923 -> 154984 (-0.60%) Helps 1233, hurts 156 shaders. The hurt shaders are addressed in the next commit. Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Add parameter to skip doing constant propagation.Matt Turner2014-12-292-3/+3
| | | | | | | | | | | | | | | | After CSEing some MOV ..., VF instructions we have code like mov tmp, [1F, 2F, 3F, 4F]VF mov r10, tmp mov r11, tmp ... use r10 use r11 We want to copy propagate tmp into the uses of r10 and r11, but *not* constant propagate the VF immediate into the uses of tmp. Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Do CSE, copy propagation, and DCE after opt_vector_float().Matt Turner2014-12-291-1/+5
| | | | | | | total instructions in shared programs: 5869005 -> 5868220 (-0.01%) instructions in affected programs: 70208 -> 69423 (-1.12%) Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Perform CSE on MOV ..., VF instructions.Matt Turner2014-12-291-5/+11
| | | | | | | | Port of commit a28ad9d4 from the fs backend. No shader-db changes since we don't emit MOV ..., VF instructions yet. Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Add pass to gather constants into a vector-float MOV.Matt Turner2014-12-292-0/+62
| | | | | | | | | | Currently only handles consecutive instructions with the same destination that collectively write all channels. total instructions in shared programs: 5879798 -> 5869011 (-0.18%) instructions in affected programs: 465236 -> 454449 (-2.32%) Reviewed-by: Ian Romanick <[email protected]>
* i965: Add support for saturating immediates.Matt Turner2014-12-294-0/+80
| | | | | | | I don't feel great about assert(!"unimplemented: ...") but these cases do only seem possible under some currently impossible circumstances. Reviewed-by: Ian Romanick <[email protected]>
* i965: Add fs_reg/src_reg constructors that take vf[4].Matt Turner2014-12-293-0/+19
| | | | | | | | | | Sometimes it's easier to generate 4x values into an array, and the memcpy is 1 instruction, rather than 11 to piece 4 arguments together. I'd forgotten to remove the prototype from fs_reg from a previous patch, so it's already there for us here. Reviewed-by: Ian Romanick <[email protected]>