Commit message | Author | Age | Files | Lines
...
* i965: For color clears, only disable writes to components that exist. | Kenneth Graunke | 2014-03-21 | 1 | -1/+1
  The SIMD16 replicated FB write message only works if we don't need the color calculator to mask our framebuffer writes. Previously, we bailed on it if color_mask wasn't <true, true, true, true>. However, this was needlessly strict for formats with fewer than four components; only the components that actually exist matter.
  WebGL Aquarium attempts to clear a BGRX texture with the ColorMask set to <true, true, true, false>. This will work perfectly fine with the replicated data message; we just bailed unnecessarily.
  Improves performance of WebGL Aquarium on Iris Pro (at 1920x1080) by about 40%, and Bay Trail (at 1366x768) by over 70% (using Chrome 24).
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Tested-by: Dylan Baker <[email protected]>
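The check described in this commit can be sketched roughly as follows. This is an illustration only, not the driver's code: `struct format_channels` and `clear_needs_color_mask` are hypothetical names, and the real driver derives channel presence from its own format tables.

```c
#include <stdbool.h>

/* Hypothetical description of which channels a format actually stores,
 * in R, G, B, A order. */
struct format_channels {
   bool has[4];
};

/* Returns true only if a channel that exists in the format has its
 * write disabled. Masked-off channels that the format does not store
 * (for example the X in BGRX) are ignored. */
static bool
clear_needs_color_mask(const struct format_channels *fmt,
                       const bool color_mask[4])
{
   for (int i = 0; i < 4; i++) {
      if (fmt->has[i] && !color_mask[i])
         return true;
   }
   return false;
}
```

With a BGRX-like format and a mask of <true, true, true, false>, the function returns false, so a fast clear path would remain eligible under these assumptions.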
* i965: Print number of multisamples in INTEL_DEBUG=blorp output. | Kenneth Graunke | 2014-03-21 | 1 | -4/+4
  This lets us distinguish MSAA resolves from other ordinary blits.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
* i965: Drop BLT TexSubImage Y-tiling restriction on Gen6+. | Kenneth Graunke | 2014-03-21 | 1 | -2/+2
  Currently, we don't use this path on Sandybridge because we suspect other paths will be faster. But we potentially could. If we do, we should allow it to support Y-tiled BLTs.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Enable ARB_vertex_type_10f_11f_11f_rev for Gen4/5 also. | Chris Forbes | 2014-03-22 | 1 | -1/+1
  Tested on ILK and CTG (with the GL3isms taken out of the piglits).
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* clover: Fix typo in validate_object() | Tom Stellard | 2014-03-21 | 1 | -1/+1
  Reviewed-by: Francisco Jerez <[email protected]>
* llvmpipe: add support for b5g6r5_srgb | Roland Scheidegger | 2014-03-21 | 5 | -9/+61
  The conversion code for srgb was tuned for n x 4x8bit AoS -> 4 x nxfloat SoA (and vice versa); fix this to also handle 16bit 565-style srgb formats. Still not really all that generic: things like r10g10b10a2_srgb or r4g4b4a4_srgb wouldn't work (the latter is trivial to fix; the former would need no more work to avoid crashing but would almost certainly need some higher-precision calculation), but that is not needed right now.
  The code is not fully optimized for this (it could use a more direct calculation instead of expanding to 8-bit range first) but should be good enough.
  Reviewed-by: Jose Fonseca <[email protected]>
* gallium: add b5g6r5 srgb format | Roland Scheidegger | 2014-03-21 | 4 | -4/+21
  GL generally doesn't seem to allow srgb formats with fewer (or more) than 8 bits for the rgb channels, though some hw could easily do it (typically for formats with up to 10 bits for the rgb channels; for formats with fewer than 8 bits, support is likely even widespread). While it may be true there aren't really any benefits for such formats, we need it for d3d, though luckily only for b5g6r5_srgb it seems.
  So add this format along with the util code for conversion. Since that util code is heavily tuned for 8bit srgb, this isn't really all that well optimized and rounding doesn't seem right, but at least it should give some halfway meaningful results.
  Reviewed-by: Jose Fonseca <[email protected]>
* nvc0/ir: move sample id to second source arg to fix sampler2DMS | Ilia Mirkin | 2014-03-20 | 2 | -4/+12
  The nvc0 texfetch instruction expects the sample id to be in the second source (usually used for the offset) rather than as part of the texture coordinate. This fixes all the sampler2DMS/Array tests on nvc0.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Christoph Bumiller <[email protected]>
  Cc: "10.1" <[email protected]>
* st/mesa: drop the lowering of quad strips to triangle strips | Marek Olšák | 2014-03-21 | 1 | -10/+0
  This fallback to triangle strips is silly and should be done in drivers if they need it.
  This should fix the case when quad strips are used with flatshading that is enabled by the "flat" GLSL varying modifier. It also fixes primitive restart for quad strips.
  This fixes piglit: NV_primitive_restart/primitive-restart-draw-mode-quad_strip
  Cc: [email protected]
  Reviewed-by: Brian Paul <[email protected]>
* gallium/u_gen_mipmap: remove the software fallback | Marek Olšák | 2014-03-21 | 1 | -1160/+2
  The last changes to it are from 2008 and 2009. It doesn't support most texture formats and some texture targets. Nobody can possibly be using this.
  Reviewed-by: Brian Paul <[email protected]>
* st/mesa: fix generating mipmaps for cube arrays | Marek Olšák | 2014-03-21 | 2 | -29/+22
  Cc: [email protected]
  Reviewed-by: Brian Paul <[email protected]>
* mesa: fix software fallback for generating mipmaps for 3D textures | Marek Olšák | 2014-03-21 | 1 | -21/+16
  It didn't use the driver-provided src/dstRowStride at all. This was broken for the cases when stride != width*bpp.
  Cc: [email protected]
  Reviewed-by: Brian Paul <[email protected]>
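To illustrate why the driver-provided row stride matters, here is a minimal sketch of stride-aware addressing. `texel_offset` and its parameters are hypothetical names for this example, not functions from Mesa.

```c
#include <stddef.h>

/* Byte offset of texel (x, y) inside a mapped image whose rows are
 * row_stride bytes apart. The stride comes from whoever mapped the
 * image and may include padding, so it must not be recomputed as
 * width * bytes_per_texel. */
static size_t
texel_offset(size_t x, size_t y, size_t row_stride, size_t bytes_per_texel)
{
   return y * row_stride + x * bytes_per_texel;
}
```

For a 3-texel-wide, 4-bytes-per-texel image padded to 16-byte rows, the second row starts at byte 16, whereas the width*bpp assumption would place it at byte 12.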
* mesa: fix software fallback for generating mipmaps for cube arrays | Marek Olšák | 2014-03-21 | 1 | -2/+5
  Cc: [email protected]
  Reviewed-by: Brian Paul <[email protected]>
* mesa: allow generating mipmaps for cube arrays | Marek Olšák | 2014-03-21 | 1 | -0/+4
  Cc: [email protected]
  Reviewed-by: Brian Paul <[email protected]>
* mesa: fix texture border handling for cube arrays | Marek Olšák | 2014-03-21 | 1 | -1/+4
  Cc: [email protected]
  Reviewed-by: Brian Paul <[email protected]>
* r600g: use more appropriate names for async DMA functions | Marek Olšák | 2014-03-20 | 5 | -32/+32
  *_dma_copy calls either *_dma_copy_buffer or *_dma_copy_tile.
  Reviewed-by: Michel Dänzer <[email protected]>
* r600g: deobfuscate async DMA code | Marek Olšák | 2014-03-20 | 6 | -31/+35
  Reviewed-by: Michel Dänzer <[email protected]>
* r600g: don't flush the gfx IB explicitly before doing DMA | Marek Olšák | 2014-03-20 | 4 | -11/+0
  It's flushed by calling r600_context_bo_reloc.
  Reviewed-by: Michel Dänzer <[email protected]>
* winsys/radeon: only add duplicate relocations for DMA if VM isn't supported | Marek Olšák | 2014-03-20 | 1 | -10/+13
  Also rewrite the comment for it to be readable and reorder the code.
  Reviewed-by: Alex Deucher <[email protected]>
* radeonsi: Implement DMA blit | Niels Ole Salscheider | 2014-03-20 | 6 | -20/+391
  This code is a slightly modified version of evergreen_dma_blit (and evergreen_dma_copy as well as evergreen_dma_copy_tile). It would be nice to share some of the code in the long term.
  I have reused some "cik"-prefixed functions that also return the right value for SI. I am not sure if they should be renamed.
  v2: Marek: removed gfx.flush in si_dma_copy_tile
  Signed-off-by: Niels Ole Salscheider <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
* radeon: Move r600_need_dma_space to common code | Niels Ole Salscheider | 2014-03-20 | 7 | -15/+15
  Signed-off-by: Niels Ole Salscheider <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
* llvmpipe: Tighten check for alpha-only formats | Richard Sandiford | 2014-03-20 | 1 | -1/+1
  The AoS version of ld_build_blend_factor was assuming that if the first channel was alpha, there were no rgb components.
  Fixes glean/blendFunc on System z. No piglit regressions on x86_64. The shortcut is still used in tests like spec/ARB_framebuffer_object/fbo-alpha.
  Signed-off-by: Richard Sandiford <[email protected]>
* nouveau: don't assume libdrm include prefix | Jonathan Gray | 2014-03-20 | 5 | -5/+5
  drm headers may be installed in a different directory.
  Signed-off-by: Jonathan Gray <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nouveau: use DLOPEN_LIBS instead of -ldl | Jonathan Gray | 2014-03-20 | 1 | -1/+1
  libdl does not exist on many platforms which have dlopen in libc.
  Signed-off-by: Jonathan Gray <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* c11/threads: don't include assert.h if the assert macro is already defined | Brian Paul | 2014-03-19 | 2 | -0/+4
  In the gallium code, the assert() macro could come from either the system's assert.h file (via c11/threads.h) or from gallium's u_debug.h. It looks like all known assert.h files unconditionally #undef assert before defining their own version. So the assert you get depends on whether threads.h or u_debug.h was included last.
  In the gallium code we really want to use the assert() from u_debug.h (it behaves better on Windows). In gallium, c11/threads.h is only included after u_debug.h in the os_thread.h wrapper. So adding an #ifndef assert test in the threads*.h files avoids using the system's assert().
  Cc: "10.1" <[email protected]>
  Reviewed-by: José Fonseca <[email protected]>
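The guard described above can be sketched in a single translation unit. The project-specific macro here is a stand-in invented for the example (it only counts failures); gallium's real assert() in u_debug.h behaves differently.

```c
/* Hypothetical counter used by the stand-in assert below. */
static int debug_assert_failures = 0;

/* Stand-in for a project-specific assert(), playing the role of the
 * one in u_debug.h. The #undef only makes the sketch independent of
 * anything included earlier. */
#undef assert
#define assert(expr) ((expr) ? (void)0 : (void)++debug_assert_failures)

/* What a threads header would do under the commit's scheme: pull in
 * the system header only if no assert macro exists yet, so the
 * project's version is not replaced. */
#ifndef assert
#include <assert.h>
#endif

/* Expands the project-specific assert, not the system one. */
static void
validate_index(int index)
{
   assert(index >= 0);
}
```

Without the `#ifndef`, the system header would `#undef` and redefine the macro, and any code compiled after that point would silently use the system version.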
* nouveau: there may not have been a texture if the fbo was incomplete | Ilia Mirkin | 2014-03-19 | 1 | -1/+2
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
  Cc: "10.0 10.1" <[email protected]>
* nouveau: add forgotten GL_COMPRESSED_INTENSITY to texture format list | Ilia Mirkin | 2014-03-19 | 1 | -0/+1
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
  Cc: "10.0 10.1" <[email protected]>
* mesa/main: condition GL_DEPTH_STENCIL on ARB_depth_texture | Ilia Mirkin | 2014-03-19 | 1 | -8/+3
  EXT_packed_depth_stencil is supported by all drivers, but ARB_depth_texture isn't (notably nouveau_vieux). This should avoid passing unexpected values down to ChooseTextureFormat.
  The EXT_packed_depth_stencil spec does not make any explicit references to requiring ARB_depth_texture in order to allow textures with that format, however if there is no dependency, ARB_depth_texture would be practically implied by the extension.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Cc: "10.0 10.1" <[email protected]>
  Note for 10.0 backport: This will produce a conflict, the solution is to move the surrounding if as well.
* loader: add special logic to distinguish nouveau from nouveau_vieux | Ilia Mirkin | 2014-03-19 | 5 | -13/+76
  There are a lot of different pci ids supported by nouveau, and more are added all the time. The relevant distinguisher between drivers is the chipset id.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Cc: "10.1" <[email protected]>
* glsl: Allow dot() on scalars, and throw out dotlike(). | Matt Turner | 2014-03-18 | 3 | -11/+5
  In all uses of dotlike() we're writing generic code that operates on 1-4 component vectors. That our IR requires ir_binop_dot expressions' operands to be 2+ component vectors is an implementation detail that's not important when implementing built-in functions with dot(), which is defined for scalar floats in GLSL.
  Reviewed-by: Eric Anholt <[email protected]>
* glsl: Optimize pow(x, 2) into x * x. | Matt Turner | 2014-03-18 | 1 | -0/+8
  Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
  Reviewed-by: Eric Anholt <[email protected]>
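The rewrite named in the subject line can be illustrated on a toy expression tree. Everything here (`struct expr`, `enum op`, `optimize_pow2`) is a hypothetical stand-in, not the GLSL compiler's IR or its algebraic optimization pass.

```c
#include <stdbool.h>

/* Toy expression IR for the sketch. */
enum op { OP_CONST, OP_VAR, OP_POW, OP_MUL };

struct expr {
   enum op op;
   double value;        /* only meaningful for OP_CONST */
   struct expr *a, *b;  /* operands for OP_POW and OP_MUL */
};

static bool
is_constant(const struct expr *e, double v)
{
   return e->op == OP_CONST && e->value == v;
}

/* Rewrites pow(x, 2) into x * x in place. The sketch simply shares the
 * operand pointer; a real compiler would clone the operand instead. */
static void
optimize_pow2(struct expr *e)
{
   if (e->op == OP_POW && is_constant(e->b, 2.0)) {
      e->op = OP_MUL;
      e->b = e->a;
   }
}
```

A multiply is typically much cheaper than a general pow(), which is often expanded into a log2/exp2 sequence; that expansion is an assumption about typical hardware, not something the commit message states.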
* glsl: Match whitespace changes from previous patch. | Matt Turner | 2014-03-18 | 1 | -4/+4
  Reviewed-by: Anuj Phogat <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Expose pack/unpack built-ins for ARB_gpu_shader5. | Matt Turner | 2014-03-18 | 1 | -9/+17
  ARB_gpu_shader5 and ES 3.0 expose different subsets of ARB_shading_language_packing.
  Reviewed-by: Anuj Phogat <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Drop some more dead code from the old CACHED_BATCH feature. | Eric Anholt | 2014-03-18 | 4 | -38/+0
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Drop special case for edgeflag thanks to Marek's change to core. | Eric Anholt | 2014-03-18 | 1 | -9/+0
  As of 780ce576bb1781f027797039693b98253ee4813e, we end up with R8_SSCALED anyway.
  Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: include stdbool.h in register_allocate.h to fix build | Brian Paul | 2014-03-18 | 1 | -0/+2
  https://bugs.freedesktop.org/show_bug.cgi?id=76331
* i965: Enable EWA anisotropic filtering algorithm | Ian Romanick | 2014-03-18 | 1 | -0/+1
  Volume 4, part 1 of the Ivybridge PRM says, "Generally, the EWA approximation algorithm results in higher image quality than the legacy algorithm."
  Using a classic anisotropic filtering "tunnel" demo, it appears that there is *no* anisotropic filtering on IVB without this bit set.
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Actually initialize simd16_unsupported and no16_msg. | Kenneth Graunke | 2014-03-18 | 1 | -0/+2
  I meant to include these fixes in v3 of commit de7ad2c88f4ec243c95eaed22c41d0e537912e01, but accidentally pushed a previous version.
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965/upload: Refactor open-coded ALIGN-like computations. | Kenneth Graunke | 2014-03-18 | 1 | -3/+9
  Sadly, we can't use actual ALIGN(), since that only supports power-of-two values for the alignment parameter.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
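The distinction between the two kinds of rounding can be sketched as follows. `ALIGN_POT` and `round_up_to_multiple` are names made up for this example; the macro shows the usual mask-based trick, not necessarily the driver's exact ALIGN() definition.

```c
/* Mask-based round-up: only correct when alignment is a power of two,
 * because it relies on (alignment - 1) being a contiguous bit mask. */
#define ALIGN_POT(value, alignment) \
   (((value) + (alignment) - 1) & ~((alignment) - 1))

/* Division-based round-up: works for any non-zero multiple, such as a
 * 12-byte vertex element size. */
static unsigned
round_up_to_multiple(unsigned value, unsigned multiple)
{
   return ((value + multiple - 1) / multiple) * multiple;
}
```

Rounding 10 up to a multiple of 12 with the mask-based macro yields 20, which is not a multiple of 12 at all, while the division-based helper returns 12.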
* i965: Fix indentation in brw_upload_indices(). | Kenneth Graunke | 2014-03-18 | 1 | -19/+19
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965: Consolidate code for setting brw->ib.start_vertex_offset. | Kenneth Graunke | 2014-03-18 | 1 | -9/+6
  This was set identically in three places.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965: Allocate register sets at screen creation, not context creation. | Kenneth Graunke | 2014-03-18 | 6 | -88/+88
  Register sets depend on the particular hardware generation, but don't depend on anything in the actual OpenGL context. Computing them is fairly expensive, and they take up a large amount of memory. Putting them in the screen allows us to compute/allocate them once for all contexts, saving both time and space.
  Improves the performance of a context creation/destruction microbenchmark by about 3x on my Haswell i7-4750HQ.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Allocate the screen using ralloc rather than calloc. | Kenneth Graunke | 2014-03-18 | 1 | -2/+3
  This will allow us to use the screen as a memory context.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* ra: Convert another bool array to bitsets. | Eric Anholt | 2014-03-18 | 1 | -6/+7
  This one saves about 2MB peak allocation in glsl-fs-algebraic-add-add-1, with no performance difference on timing short shader-db runs (n=9/10, warmup outlier removed).
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* ra: Use a bitset for storing which registers belong to a class. | Kenneth Graunke | 2014-03-18 | 1 | -5/+10
  This should use 1/8 the memory.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Christoph Brill <[email protected]>
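The memory saving comes from storing one bit per register instead of one byte. A minimal sketch, assuming a fixed register count: `struct reg_class`, `MAX_REGS`, and `class_add_reg` are hypothetical, and the signature used for `reg_belongs_to_class` is an assumption rather than the allocator's actual one.

```c
#include <limits.h>
#include <stdbool.h>

#define MAX_REGS 256   /* arbitrary size chosen for the sketch */
#define BITS_PER_WORD (sizeof(unsigned) * CHAR_BIT)
#define NUM_WORDS ((MAX_REGS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per register; a bool array would need one byte per register. */
struct reg_class {
   unsigned regs[NUM_WORDS];
};

/* Marks register r as a member of the class. */
static void
class_add_reg(struct reg_class *c, unsigned r)
{
   c->regs[r / BITS_PER_WORD] |= 1u << (r % BITS_PER_WORD);
}

/* Tests membership by extracting the register's bit. */
static bool
reg_belongs_to_class(const struct reg_class *c, unsigned r)
{
   return (c->regs[r / BITS_PER_WORD] >> (r % BITS_PER_WORD)) & 1u;
}
```

Wrapping the bit test in a helper also keeps call sites readable, since callers never see the word and shift arithmetic.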
* ra: Create a reg_belongs_to_class() helper function. | Kenneth Graunke | 2014-03-18 | 1 | -2/+11
  This is a little easier to read.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Christoph Brill <[email protected]>
* ra: Use bool instead of GLboolean. | Kenneth Graunke | 2014-03-18 | 2 | -28/+29
  This isn't the GL API, so there's no reason to use GLboolean.
  Using bool is safer: any non-zero value is treated as "true". When converting a value to a GLboolean, all but the low byte are discarded, which means that values like 256 will be incorrectly rendered as false.
  Done via the following vim commands:
    :%s/GLboolean/bool/g
    :%s/GL_TRUE/true/g
    :%s/GL_FALSE/false/g
  and one line of manual whitespace tidying.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
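The truncation hazard mentioned above can be shown with a one-byte unsigned type standing in for GLboolean. The typedef and helper names are invented for this sketch, and it assumes 8-bit bytes.

```c
#include <stdbool.h>

/* Stand-in for a one-byte boolean typedef such as GLboolean. */
typedef unsigned char byte_boolean;

/* Conversion to bool tests the whole value against zero. */
static bool
as_bool(int value)
{
   return value;
}

/* Conversion to a one-byte unsigned type keeps only the low byte. */
static byte_boolean
as_byte_boolean(int value)
{
   return (byte_boolean)value;
}
```

Passing 256 gives true from `as_bool` but 0 from `as_byte_boolean`, because 256 has no bits set in its low byte.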
* i965: Accurately bail on SIMD16 compiles. | Kenneth Graunke | 2014-03-18 | 3 | -34/+82
  Ideally, we'd like to never even attempt the SIMD16 compile if we could know ahead of time that it won't succeed; it's purely a waste of time. This is especially important for state-based recompiles, which happen at draw time.
  The fragment shader compiler has a number of checks like:
    if (dispatch_width == 16)
       fail("...some reason...");
  This patch introduces a new no16() function which replaces the above pattern. In the SIMD8 compile, it sets a "SIMD16 will never work" flag. Then, brw_wm_fs_emit can check that flag, skip the SIMD16 compile, and issue a helpful performance warning if INTEL_DEBUG=perf is set. (In SIMD16 mode, no16() calls fail(), for safety's sake.)
  The great part is that this is not a heuristic: if the flag is set, we know with 100% certainty that the SIMD16 compile would fail. (It might fail anyway if we run out of registers, but it's always worth trying.)
  v2: Fix missing va_end in early-return case (caught by Ilia Mirkin).
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Chris Forbes <[email protected]> [v1]
  Reviewed-by: Ian Romanick <[email protected]> [v1]
  Reviewed-by: Eric Anholt <[email protected]>
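The no16() pattern can be sketched as below. This is a simplified illustration under assumptions: `struct compile_state`, `vfail`, and `should_try_simd16` are hypothetical, and only the names no16(), simd16_unsupported, and no16_msg come from the commit log itself.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-compile state for the sketch. */
struct compile_state {
   int dispatch_width;        /* 8 or 16 */
   bool failed;
   bool simd16_unsupported;   /* set during the SIMD8 compile */
   char fail_msg[128];
   char no16_msg[128];
};

/* Records the first hard failure of the current compile. */
static void
vfail(struct compile_state *s, const char *fmt, va_list va)
{
   if (s->failed)
      return;
   s->failed = true;
   vsnprintf(s->fail_msg, sizeof(s->fail_msg), fmt, va);
}

/* In SIMD16 mode this is a hard failure; in SIMD8 mode it only notes
 * that a later SIMD16 compile cannot succeed. There is a single exit
 * path, so va_end() always runs. */
static void
no16(struct compile_state *s, const char *fmt, ...)
{
   va_list va;

   va_start(va, fmt);
   if (s->dispatch_width == 16) {
      vfail(s, fmt, va);
   } else if (!s->simd16_unsupported) {
      s->simd16_unsupported = true;
      vsnprintf(s->no16_msg, sizeof(s->no16_msg), fmt, va);
   }
   va_end(va);
}

/* The caller consults the SIMD8 result before starting a SIMD16 compile. */
static bool
should_try_simd16(const struct compile_state *simd8)
{
   return !simd8->failed && !simd8->simd16_unsupported;
}
```

Storing the message alongside the flag is what lets the caller print a specific performance warning instead of a generic one.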
* i965/fs: Support pull parameters in SIMD16 mode. | Kenneth Graunke | 2014-03-18 | 2 | -11/+13
  This is just a matter of reusing the pull/push constant information set up by the SIMD8 compile.
  This gains us 78 SIMD16 programs in shader-db.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Use a single instance of the pull_constant_loc[] array. | Kenneth Graunke | 2014-03-18 | 2 | -28/+6
  Now that we don't renumber uniform registers, assign_constant_locations and move_uniform_array_access_to_pull_constants use the same names. So, they can share a single copy of the pull_constant_loc[] array.
  This simplifies the code considerably. assign_constant_locations() doesn't need to walk through pull_params[] to rediscover reladdr demotions; it just has that information in pull_constant_loc[]. We also only need to rewrite the instruction stream once, instead of twice.
  Even better, we now have a single array describing the layout of all pull parameters, which we can pass to the SIMD16 program.
  This actually hurts a few shaders in Serious Sam 3, and one in KWin:
    total instructions in shared programs: 1841957 -> 1842035 (0.00%)
    instructions in affected programs: 1165 -> 1243 (6.70%)
  Comparing dump_instructions() before and after the pull constant transformations with and without this patch, it appears that there is a uniform array with variable indexing (reladdr) and constant indexing (of array element 0). Previously, we uploaded array element 0 as both a pull constant (for reladdr) /and/ a push constant.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>