path: root/src
Commit message | Author | Age | Files | Lines
* gallium/hud: initialize sampler state | Brian Paul | 2013-04-04 | 1 | -0/+6
    The default wrap mode (PIPE_TEX_WRAP_REPEAT) is incompatible with unnormalized texcoords (at least for softpipe).

    v2: use PIPE_TEX_WRAP_CLAMP_TO_EDGE

    Reviewed-by: Marek Olšák <[email protected]>
* glsl: Add an optimization pass to flatten simple nested if blocks. | Kenneth Graunke | 2013-04-04 | 4 | -0/+106
    GLBenchmark 2.7's shaders contain conditional blocks like:

        if (x) {
            if (y) {
                ...
            }
        }

    where the outer conditional's then clause contains exactly one statement (the nested if) and there are no else clauses. This can easily be optimized into:

        if (x && y) {
            ...
        }

    This saves a few instructions in GLBenchmark 2.7:

        total instructions in shared programs: 11833 -> 11649 (-1.55%)
        instructions in affected programs:     8234 -> 8050 (-2.23%)

    It also helps CS:GO slightly (-0.05%/-0.22%). More importantly, however, it simplifies the control flow graph, which could enable other optimizations.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
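The equivalence this pass relies on can be shown outside GLSL. The following is a minimal C sketch, not the pass itself; the function names `nested` and `flattened` are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Shape of the original shader code: an outer "if" whose then-clause
 * contains exactly one statement (the inner "if"), with no else clauses. */
static int nested(bool x, bool y, int value)
{
    if (x) {
        if (y) {
            value += 1;
        }
    }
    return value;
}

/* Shape after flattening: a single conditional on the logical AND. */
static int flattened(bool x, bool y, int value)
{
    if (x && y) {
        value += 1;
    }
    return value;
}
```

Both functions return the same result for all four combinations of `x` and `y`. The rewrite is only safe under the conditions the commit message states: if either conditional had an else clause, or the outer body held more than the inner if, merging the conditions would change behaviour.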
* i965: Use a variable for the push constant size in kB. | Kenneth Graunke | 2013-04-04 | 1 | -2/+3
    This clarifies that the offset of 2 is actually 16 kB / 8 kB units. It also keys both computations off of a single variable, which should make it easier to change in the future.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Paul Berry <[email protected]>
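The arithmetic behind the "offset of 2" can be written out as a small C sketch. This is an illustration only: `PUSH_OFFSET_UNIT_KB` and `push_constant_offset_units` are hypothetical names, and only the 16 kB size and the 8 kB unit come from the commit message.

```c
/* Hypothetical constant: the offset is expressed in units of 8 kB,
 * as stated in the commit message. */
enum { PUSH_OFFSET_UNIT_KB = 8 };

/* Derive the offset (in 8 kB units) from the size in kB, so that the size
 * and the offset are both computed from one variable. */
static unsigned push_constant_offset_units(unsigned push_size_kb)
{
    return push_size_kb / PUSH_OFFSET_UNIT_KB;
}
```

With a size of 16 kB the function yields 2, which matches the previously hard-coded value; changing the size in one place then updates the offset as well.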
* i965: Turn brw->urb.vs_size and gs_size into local variables. | Kenneth Graunke | 2013-04-04 | 3 | -22/+12
    These variables are only used within a single function, so we may as well make them local variables.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Paul Berry <[email protected]>
* i965: Remove BRW_NEW_WM_INPUT_DIMENSIONS dirty bit. | Kenneth Graunke | 2013-04-04 | 3 | -4/+0
    This was only produced by the brw_wm_input_dimensions atom, which was removed in the previous commit. So there's no need for the dirty bit.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Delete brw_vs_constval.c and the brw_wm_input_sizes atom. | Kenneth Graunke | 2013-04-04 | 5 | -279/+0
    This was only used to compute proj_attrib_mask, which was removed by the previous commit. That makes this dead code.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove now dead brw_wm_prog_key::proj_attrib_mask field. | Kenneth Graunke | 2013-04-04 | 3 | -29/+0
    The previous commit removed the last user of this field, so there's no longer any point in setting it.

    Removing this should eliminate state-dependent recompiles, and make the precompile more reliable.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove fixed-function texture projection avoidance optimization. | Kenneth Graunke | 2013-04-04 | 1 | -25/+18
    This optimization attempts to avoid extra attribute interpolation instructions for texture coordinates where the W-component is 1.0.

    Unfortunately, it requires a lot of complexity: the brw_wm_input_sizes state atom (all the brw_vs_constval.c code) needs to run on each draw. It computes the input_size_masks array, then uses that to compute proj_attrib_mask.

    Differences in proj_attrib_mask can cause state-dependent fragment shader recompiles. We also often fail to guess proj_attrib_mask for the fragment shader precompile, causing us to needlessly compile it twice.

    Furthermore, this optimization only applies to fixed-function programs; it does not help modern GLSL-based programs at all. Generally, older fixed-function programs run fine on modern hardware anyway.

    The optimization has existed in some form since the initial commit. When we rewrote the fragment shader backend, we dropped it for a while. Eric readded it in commit eb30820f268608cf451da32de69723036dddbc62 as part of an attempt to cure a ~1% performance regression caused by converting the fixed-function fragment shader generation code from Mesa IR to GLSL IR. However, no performance data was included in the commit message, so it's unclear whether or not it was successful.

    Time has passed, so I decided to re-measure this. Surprisingly, Eric's OpenArena timedemo actually runs /faster/ after removing this and the brw_wm_input_sizes atom. On Ivybridge at 1024x768, I measured a 1.39532% +/- 0.91833% increase in FPS (n = 55). On Ironlake, there was no statistically significant difference (n = 37).

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Use ctx->Stencil._WriteEnabled in DEPTH_STENCIL_STATE. | Kenneth Graunke | 2013-04-04 | 1 | -5/+1
    This is the same computation as the _WriteEnabled flag, so we may as well use it.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Paul Berry <[email protected]>
* i965: Fix stencil write enable flag in 3DSTATE_DEPTH_BUFFER on Gen7+. | Kenneth Graunke | 2013-04-04 | 1 | -1/+1
    ctx->Stencil.WriteMask is a statically sized array of 3 elements. Checking it against 0 actually is a NULL check, and can never fail, which meant that we always said stencil writes were enabled.

    Use the new core Mesa derived state flag to fix this.

    NOTE: This is a candidate for stable branches.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Paul Berry <[email protected]>
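The bug described here is a general C pitfall: an array used in a comparison decays to a pointer to its first element. The sketch below reproduces it under stated assumptions. The struct and function names are hypothetical, only the 3-element array shape comes from the commit message, and the "fixed" test is a simplification rather than Mesa's actual `_WriteEnabled` computation.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the stencil state. */
struct stencil_state {
    bool enabled;
    unsigned write_mask[3];
};

/* Buggy test: the array decays to a pointer, so the comparison checks an
 * address and never looks at the mask values. The decay is spelled out
 * through a pointer variable to make the behaviour explicit. */
static bool writes_enabled_buggy(const struct stencil_state *s)
{
    const unsigned *mask_ptr = s->write_mask; /* what the array decays to */
    return mask_ptr != 0;                     /* always true */
}

/* Simplified fix (an assumption, not Mesa's exact rule): inspect the mask
 * values themselves, and only when stencil testing is enabled. */
static bool writes_enabled_fixed(const struct stencil_state *s)
{
    unsigned i;

    if (!s->enabled)
        return false;
    for (i = 0; i < 3; i++) {
        if (s->write_mask[i] != 0)
            return true;
    }
    return false;
}
```

With every mask set to zero, the buggy test still reports that writes are enabled, while the fixed test does not.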
* mesa: Add new ctx->Stencil._WriteEnabled derived state flag. | Kenneth Graunke | 2013-04-04 | 2 | -0/+6
    i965 needs to know whether stencil writes are enabled in several places, and gets the test wrong sometimes. While we could create a function to compute this, it seems generally useful enough to warrant a new piece of derived state. Also, all the plumbing is already in place.

    NOTE: This is a candidate for stable branches.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Paul Berry <[email protected]>
* gallivm: some minor cube map cleanup | Roland Scheidegger | 2013-04-04 | 1 | -10/+15
    The ar_ge_as_at variable was just very very confusing since the condition was actually the other way around (as_at_ge_ar). So change the condition (and the selects depending on it) to match the variable name.

    And also change the chosen major axis in case the coord values are the same. OpenGL doesn't care one bit which one is chosen in this case but it looks like dx10 would require z chosen over y, and y chosen over x (previously did x chosen over y, y chosen over z). Since it's all the same effort just honor dx10's wishes. (Though actually, for some preferred orderings, we could save one (or two with derivatives) selects since the tnewx and tnewz (and the corresponding dmax values) are the same.)

    Reviewed-by: Jose Fonseca <[email protected]>
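The tie-breaking order described above (z over y, y over x) can be sketched in scalar C. This is a hedged illustration, not the gallivm code, which builds vectorized LLVM IR with selects; `major_axis`, `abs_f` and the enum are hypothetical names.

```c
enum cube_axis { AXIS_X, AXIS_Y, AXIS_Z };

/* Local absolute value, to keep the sketch free of libm. */
static float abs_f(float v)
{
    return v < 0.0f ? -v : v;
}

/* Pick the major axis of a cube map direction. Using ">=" in this order
 * means that ties resolve to z before y, and to y before x. */
static enum cube_axis major_axis(float x, float y, float z)
{
    float ax = abs_f(x), ay = abs_f(y), az = abs_f(z);

    if (az >= ax && az >= ay)
        return AXIS_Z;
    if (ay >= ax)
        return AXIS_Y;
    return AXIS_X;
}
```

For a direction such as (1, 1, 1), where all three magnitudes tie, this returns `AXIS_Z`; for (1, 1, 0.5) it returns `AXIS_Y`.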
* i965: Ask the register allocator to round-robin through registers. | Eric Anholt | 2013-04-04 | 3 | -3/+31
    The way we were allocating registers before, packing into low register numbers for Ironlake, resulted in an overly-constrained dependency graph for instruction scheduling.

    Improves GLBenchmark 2.1 performance by 4.5% +/- 0.7% (n=26). No difference on my old GLSL demo (n=20). No difference on nexuiz (n=15).

    v2: Fix off-by-one bug that made the change only work for 16-wide on i965.

    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
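The difference between packing into low register numbers and round-robin allocation can be illustrated with a toy free-list allocator. This is only a sketch under loud assumptions: the real allocator is not a linear scan like this, and every name below (`reg_file`, `alloc_lowest`, `alloc_round_robin`, `NUM_REGS`) is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define NUM_REGS 8

struct reg_file {
    bool in_use[NUM_REGS];
    unsigned next;          /* where the round-robin search starts */
};

static void reg_file_init(struct reg_file *rf)
{
    memset(rf, 0, sizeof *rf);
}

/* Packing policy: always hand out the lowest free register. */
static int alloc_lowest(struct reg_file *rf)
{
    unsigned r;

    for (r = 0; r < NUM_REGS; r++) {
        if (!rf->in_use[r]) {
            rf->in_use[r] = true;
            return (int)r;
        }
    }
    return -1;
}

/* Round-robin policy: start searching just after the last register
 * handed out, wrapping around at the end. */
static int alloc_round_robin(struct reg_file *rf)
{
    unsigned i;

    for (i = 0; i < NUM_REGS; i++) {
        unsigned r = (rf->next + i) % NUM_REGS;
        if (!rf->in_use[r]) {
            rf->in_use[r] = true;
            rf->next = (r + 1) % NUM_REGS;
            return (int)r;
        }
    }
    return -1;
}

static void release_reg(struct reg_file *rf, int reg)
{
    rf->in_use[reg] = false;
}
```

After allocating and releasing one register, `alloc_lowest` hands out register 0 again, whereas `alloc_round_robin` moves on to register 1. Reusing the same register for unrelated values makes the later write depend on the earlier use, which is the kind of extra constraint on the scheduler that the commit message refers to.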
* llvmpipe: implement ucmp | Zack Rusin | 2013-04-04 | 2 | -0/+32
    and add a test for it

    Signed-off-by: Zack Rusin <[email protected]>
    Reviewed-by: José Fonseca <[email protected]>
* Avoid spurious GCC warnings in STATIC_ASSERT() macro. | Paul Berry | 2013-04-04 | 2 | -2/+2
    GCC 4.8 now warns about typedefs that are local to a scope and not used anywhere within that scope. This produced spurious warnings with the STATIC_ASSERT() macro (which used a typedef to provoke a compile error in the event of an assertion failure).

    This patch switches to a simpler technique that avoids the warning.

    v2: Avoid GCC-specific syntax. Also update p_compiler.h.

    Reviewed-by: Kenneth Graunke <[email protected]>
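One portable technique of the kind described is to take `sizeof` of an array type whose length becomes negative when the condition is false. The sketch below illustrates that idea; it is not quoted from the patch, and `check_int_width` is a hypothetical caller.

```c
/* A false condition yields char[-1], which fails to compile. No typedef is
 * declared, so there is nothing for an "unused local typedef" warning to
 * flag, and no compiler-specific syntax is involved. */
#define STATIC_ASSERT(COND) \
    do { (void) sizeof(char[1 - 2 * !(COND)]); } while (0)

static int check_int_width(void)
{
    STATIC_ASSERT(sizeof(int) >= 2);   /* holds on common platforms */
    return 1;
}
```

Because the expression is only the operand of `sizeof`, it is never evaluated and generates no code; the check happens entirely at compile time.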
* freedreno: document debug flag | Erik Faye-Lund | 2013-04-04 | 1 | -0/+4
    Signed-off-by: Erik Faye-Lund <[email protected]>
    Signed-off-by: Brian Paul <[email protected]>
* st/wgl: add HUD support | Brian Paul | 2013-04-04 | 5 | -0/+42
    v2: fix a few minor issues spotted by Jose.

    Reviewed-by: José Fonseca <[email protected]>
* st/wgl: make stw_current_context() non-static | Brian Paul | 2013-04-04 | 2 | -1/+3
    Reviewed-by: José Fonseca <[email protected]>
* util: add debug_memory_check_block(), debug_memory_tag() | Brian Paul | 2013-04-04 | 2 | -0/+61
    The former just checks that the given block is valid by checking the header and footer.

    The latter sets the memory block's tag. With extra debug code, we can use that for monitoring/checking particular allocations.

    Reviewed-by: José Fonseca <[email protected]>
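The header/footer check and the tag can be sketched with a small guarded allocator. This is an illustration of the general technique only: the layout, the magic values and all `dbg_*` names are hypothetical and are not taken from the util code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DBG_HEADER_MAGIC 0xDEADBEEFu   /* hypothetical guard values */
#define DBG_FOOTER_MAGIC 0xFEEDFACEu

struct dbg_header {
    unsigned magic;
    unsigned tag;
    size_t size;
};

/* Layout of one allocation: [dbg_header][user bytes][footer magic] */
static void *dbg_malloc(size_t size)
{
    unsigned footer = DBG_FOOTER_MAGIC;
    struct dbg_header hdr;
    unsigned char *raw =
        (unsigned char *)malloc(sizeof hdr + size + sizeof footer);

    if (!raw)
        return NULL;
    hdr.magic = DBG_HEADER_MAGIC;
    hdr.tag = 0;
    hdr.size = size;
    memcpy(raw, &hdr, sizeof hdr);
    memcpy(raw + sizeof hdr + size, &footer, sizeof footer);
    return raw + sizeof hdr;
}

/* Valid only if both guards are intact. */
static bool dbg_check_block(const void *ptr)
{
    const unsigned char *user = (const unsigned char *)ptr;
    struct dbg_header hdr;
    unsigned footer;

    memcpy(&hdr, user - sizeof hdr, sizeof hdr);
    if (hdr.magic != DBG_HEADER_MAGIC)
        return false;
    memcpy(&footer, user + hdr.size, sizeof footer);
    return footer == DBG_FOOTER_MAGIC;
}

/* Store a caller-chosen tag in the block's header. */
static void dbg_tag(void *ptr, unsigned tag)
{
    unsigned char *hdr_bytes = (unsigned char *)ptr - sizeof(struct dbg_header);

    memcpy(hdr_bytes + offsetof(struct dbg_header, tag), &tag, sizeof tag);
}

static void dbg_free(void *ptr)
{
    free((unsigned char *)ptr - sizeof(struct dbg_header));
}
```

A one-byte overrun past the end of the user area corrupts the footer, so `dbg_check_block` returns false afterwards. The tag lives in the header and lets debug code single out particular allocations to monitor.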
* gallium/hud: replace malloc w/ MALLOC | Brian Paul | 2013-04-04 | 1 | -1/+1
    To match the FREE() call used later. Fixes things on Windows.

    Reviewed-by: Marek Olšák <[email protected]>
* r600g/llvm: Workaround for wrong tex.offset_* | Vincent Lejeune | 2013-04-04 | 1 | -0/+3
* gallivm: honor explicit derivatives values for cube maps. | Roland Scheidegger | 2013-04-04 | 4 | -28/+60
    This is trivial now, though need to make sure we pass all the necessary derivative values (which is 3 each for ddx/ddy not 2). Passes piglit arb_shader_texture_lod-texgradcube test.

    v2: add the forgotten abs() for all incoming derivatives (discovered by new piglit arb_shader_texture_lod-texgradcube test, though more by luck as it was failing only for exactly one pixel...).

    Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: do per-pixel cube face selection (finally!!!) | Roland Scheidegger | 2013-04-04 | 3 | -82/+180
    This proved to be tricky, the problem is that after selection/mirroring we cannot calculate reasonable derivatives (if not all pixels in a quad end up on the same face the derivatives could get "randomly" exceedingly large). However, it is actually quite easy to simply calculate the derivatives before selection/mirroring and then transform them similar to the cube coordinates (they only need selection/projection, but not mirroring as we're not interested in the sign bit, of course).

    While there is a tiny bit more work to do (need to calculate derivs for 3 coords instead of 2, and additional selects) it also simplifies things somewhat for the coord selection itself (as we save some broadcast aos shuffles, and we don't need to calculate the average vector) - hence if derivatives aren't needed this should actually be faster. Also, this has the benefit that this will (trivially) work for explicit derivatives too, which we completely ignored before that (will be in a separate commit for better trackability).

    Note that while the way for getting rho looks very different, it should result in "nearly" the same values as before (the "nearly" is only because before the code would choose the face based on an "average" vector and hence the derivatives calculated according to this face, where now (for implicit derivatives) the derivatives are projected on the face selected for the first (top-left) pixel in a quad, so not necessarily the same face). The transformation done might not quite be state-of-the-art, calculating length(dx,dy) as max(dx,dy) certainly isn't either but this stays the same as before (that is I think a better transform would _somehow_ take the "derivative major axis" into account so that derivative changes in the major axis wouldn't get ignored).

    Should solve some accuracy problems with cubemaps (can easily be seen with the cubemap demo when switching wrapping/filtering), though we still don't do seamless filtering to fix it completely (so not per-sample but per-pixel is certainly better than per-quad and already sufficient for accurate results with nearest tex filter).

    As for performance, it seems to be a tiny bit faster too (maybe 3% or so with cubemap demo). Which I'd have expected with nearest/nearest filtering where this will be less instructions, but the difference seems to actually be larger with linear/linear_mipmap_linear where it is slightly more instructions, probably the code appears less serialized allowing better scheduling (on a sandy bridge cpu). It actually seems to be now at least as fast as the old path using a conditional when using 128bit vectors too (that is probably more a result of testing with a newer cpu though), for now that old path is still there but unused.

    No piglit regressions.

    Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: minor rho calculation optimization for 1 or 3 coords | Roland Scheidegger | 2013-04-04 | 2 | -29/+22
    Using a different packing for the single coord case should save a shuffle. Plus some minor style fixes.

    Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: use f16c hw support for float->half and half->float conversion | Roland Scheidegger | 2013-04-04 | 4 | -4/+53
    Should be way faster of course on cpus supporting this (includes AMD Bulldozer and Jaguar cores, Intel Ivy Bridge and up (except budget models)). Passes piglit fbo-blending-formats GL_ARB_texture_float -auto on Ivy Bridge.

    Reviewed-by: Brian Paul <[email protected]>
* draw/llvmpipe: allow independent so attachments to the vs | Zack Rusin | 2013-04-03 | 6 | -23/+43
    When geometry shaders are present, one needs to be able to create an empty geometry shader with stream output that needs to be resolved later and attached to the currently bound vertex shader. Let's add support for it to llvmpipe and draw. draw allows attaching independent stream output info to any vertex shader and llvmpipe resolves at draw time which vertex shader the given empty geometry shader should be linked to.

    Signed-off-by: Zack Rusin <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: José Fonseca <[email protected]>
* llvmpipe: reset so buffers when not appending | Zack Rusin | 2013-04-03 | 1 | -0/+6
    We need to reset the internal state of the so buffers or we'll keep appending even though we're not supposed to.

    Signed-off-by: Zack Rusin <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: José Fonseca <[email protected]>
* draw: remove unused function | Zack Rusin | 2013-04-03 | 2 | -12/+0
    we use draw_set_mapped_so_targets nowadays

    Signed-off-by: Zack Rusin <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: José Fonseca <[email protected]>
* draw/llvm: use an enum instead of magic numbers | Zack Rusin | 2013-04-03 | 2 | -10/+15
    I think this was there before and got accidentally removed during a merge. Same code as for the GS context, which is also using an enum instead of hardcoded numbers.

    Signed-off-by: Zack Rusin <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: José Fonseca <[email protected]>
* draw/gs: cleanup some debugging code | Zack Rusin | 2013-04-03 | 1 | -4/+0
    Signed-off-by: Zack Rusin <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: José Fonseca <[email protected]>
* draw/so: maintain an exact number of written vertices | Zack Rusin | 2013-04-03 | 3 | -7/+33
    It's quite helpful during the rendering when we know exactly the count of the vertices available in the buffer.

    Signed-off-by: Zack Rusin <[email protected]>
    Reviewed-by: José Fonseca <[email protected]>
* draw: Implement support for primitive id | Zack Rusin | 2013-04-03 | 8 | -8/+33
    We were largely ignoring primitive id.

    Signed-off-by: Zack Rusin <[email protected]>
    Reviewed-by: José Fonseca <[email protected]>
* draw/so: Fix bogus assert | Zack Rusin | 2013-04-03 | 1 | -1/+0
    We do support so with multiple primitives.

    Signed-off-by: Zack Rusin <[email protected]>
    Reviewed-by: José Fonseca <[email protected]>
* draw/gs: Fix memory corruption with multiple primitives | Zack Rusin | 2013-04-03 | 1 | -10/+15
    We were flushing with incorrect number of primitives. TGSI exec can only work with a single primitive at a time. Plus the fetching with multiple primitives on llvm paths wasn't copying the last element.

    Signed-off-by: Zack Rusin <[email protected]>
    Reviewed-by: José Fonseca <[email protected]>
* gallivm: cleanup the gs interface | Zack Rusin | 2013-04-03 | 3 | -50/+85
    Instead of void pointers use a base interface.

    Signed-off-by: Zack Rusin <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: José Fonseca <[email protected]>
* svga: add new memory-used HUD query | Brian Paul | 2013-04-03 | 8 | -1/+33
    To track the amount of memory used by all pipe_resources (textures and buffers).

    Reviewed-by: Jose Fonseca <[email protected]>
* util: add new util_resource_size() function in u_resource.[ch] | Brian Paul | 2013-04-03 | 2 | -1/+98
    Reviewed-by: Jose Fonseca <[email protected]>
* util: move functions from u_resource.c to u_transfer.c | Brian Paul | 2013-04-03 | 2 | -75/+74
    The functions are prototyped in u_transfer.h and are related to the other functions in u_transfer.c. The next patch will re-use the u_resource.c file for new code.

    Reviewed-by: Jose Fonseca <[email protected]>
* r600g/llvm: Do not override llvm provided stack_size | Vincent Lejeune | 2013-04-03 | 1 | -1/+2
* r600g/llvm: Do not change cf_alu inst when adding alus | Vincent Lejeune | 2013-04-03 | 1 | -7/+2
* radeonsi: add more cases for copying unsupported formats to resource_copy_region | Marek Olšák | 2013-04-03 | 1 | -0/+12
    Ported from r600g commit: 8891b2f9c91b2f6c8625184c23a10b8e55875dc0

    Reviewed-by: Michel Dänzer <[email protected]>
    Reviewed-by: Christian König <[email protected]>

    NOTE: This is a candidate for the 9.1 branch.
* svga: add HUD queries for number of draw calls, number of fallbacks | Brian Paul | 2013-04-03 | 4 | -0/+61
    The fallbacks count is the number of drawing calls that use a "draw" module fallback, such as polygon stipple.

    Reviewed-by: Jose Fonseca <[email protected]>
* svga: refactor occlusion query code | Brian Paul | 2013-04-03 | 1 | -94/+124
    This is in preparation for adding new query types for the HUD.

    Reviewed-by: Jose Fonseca <[email protected]>
* gallium/hud: try L8 texture for font if I8 format isn't supported | Brian Paul | 2013-04-03 | 1 | -4/+13
* svga: add case for PIPE_CAP_QUERY_PIPELINE_STATISTICS | Brian Paul | 2013-04-03 | 1 | -0/+1
* st/mesa: rewrite comment in st_manager.c | Brian Paul | 2013-04-03 | 1 | -3/+2
* nv50,nvc0: remove MS resolve formats hack | Christoph Bumiller | 2013-04-03 | 2 | -15/+0
    Mesa now allows BlitFramebuffer resolve between RGBA and BGRA.
* nvc0: fix 128 bit compressed storage type selection | Christoph Bumiller | 2013-04-03 | 1 | -1/+1
* nvc0: place staging textures in GART and map them directly | Christoph Bumiller | 2013-04-03 | 7 | -11/+76
* nv50: account for pesky prefetch in size calculation of linear textures | Christoph Bumiller | 2013-04-03 | 1 | -1/+6