summaryrefslogtreecommitdiffstats
path: root/src/mesa
Commit message (Collapse)AuthorAgeFilesLines
* i965: Allow CSE on Gen4-5 unary math.Kenneth Graunke2014-10-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Due to the implicit move-from-GRF, unary math looks a lot like the Gen6+ math instruction: it's a single instruction (SEND) with a GRF source. The difference is that it also implicitly clobbers a message register. The only visible effect is that CSE will remove the MRF-clobbering from later math operations. This should be fine; compute_to_mrf and remove_redundant_mrf_writes don't look at the values populated by implied writes, so they can't rely on those values being present. Less interference may actually help those passes make more progress. Binary math is still problematic, since it involves a separate MOV instruction to load the second operand. We continue disallowing CSE for binary math operations. total instructions in shared programs: 3340303 -> 3340100 (-0.01%) instructions in affected programs: 26927 -> 26724 (-0.75%) Nothing hurt, gained, or lost. ~6% reduction on a few shaders. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use the correct regs_written on unspill instructionsJason Ekstrand2014-10-141-0/+1
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nouveau: 3d textures are unsupported, limit 3d levels to 1Ilia Mirkin2014-10-141-0/+3
| | | | | | | | | | Ideally there would be a swrast fallback, but the driver isn't ready for that. This should avoid crashes if someone tries to use 3d textures though. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Cc: [email protected]
* i965: Use unsynchronized maps for the program cache on LLC platforms.Kenneth Graunke2014-10-131-7/+28
| | | | | | | | | | | | | | | | | | | | There's no reason to stall on pwrite - the CPU always appends to the buffer and never modifies existing contents, and the GPU never writes it. Further, the CPU always appends new data before submitting a batch that requires it. This code predates the unsynchronized mapping feature, so we simply didn't have the option when it was written. Ideally, we would do this for non-LLC platforms too, but unsynchronized mapping support only exists for LLC systems. Saves a bunch of stall avoidance copies when uploading shaders. v2: Rebase on changes to previous patch. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> [v1]
* i965: Issue performance warnings when copying the program cache BO.Kenneth Graunke2014-10-131-0/+3
| | | | | | | | | | | | | We don't really want unnecessary buffer copying, so it'd be nice to know when it's happening. v2: Drop stall warnings when doing a read-only CPU mapping of the cache BO. The GPU also uses it in a read-only fashion, so there won't be any stalls, even though the buffer is busy. (Thanks to Chris Wilson for catching this mistake.) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> [v1]
* i965: Issue performance warnings on MapBufferRange stalls.Kenneth Graunke2014-10-131-3/+4
| | | | | | | | This is easy: we just need to use brw_map_bo instead of mapping it directly. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* mesa: fix error reported on gTexSubImage2D when level not validTapani Pälli2014-10-101-1/+1
| | | | | Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Juha-Pekka Heikkila <[email protected]>
* i965: Fix register write checks.Kenneth Graunke2014-10-101-0/+2
| | | | | | | | | | | | | When mapping the buffer a second time, we need to use the new pointer, not the one from the previous mapping. Otherwise, we will most likely crash. Apparently, we've just been getting lucky and getting the same bo->virtual pointer in both cases. libdrm probably has a hand in that. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Cc: [email protected]
* i965: Skip uploading border color when unnecessary.Kenneth Graunke2014-10-091-2/+20
| | | | | | | | | | | | | | The border color is only needed when using the GL_CLAMP_TO_BORDER or (deprecated) GL_CLAMP wrap modes; all others ignore it, including the common GL_CLAMP_TO_EDGE and GL_REPEAT wrap modes. In those cases, we can skip uploading it entirely, saving a bit of space in the batchbuffer. Instead, we just point it at the start of the batch (offset 0); we have to program something, and that address is safe to read. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Use BDW_MOCS_PTE for renderbuffers.Kenneth Graunke2014-10-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Write-back caching cannot be used for buffers being scanned out by the display engine; surfaces used for scan-out must be write-through or uncached. I originally chose WT for render targets because it works in all cases. However, we really want to use write-back caching where possible, as it is more efficient. Most renderbuffers are not used for scanout - off-screen FBOs certainly are fine, and non-pageflipped backbuffers should be fine as well. So in most cases WB will work. However, we don't know what will be used for scan-out, so we instead simply use the PTE value specified by the kernel, as it knows these things. This matches our MOCS choice on Haswell. Fixes performance regressions since commit ee4484be3dc827cf15bcf109f5 in a microbenchmark (spotted by Eero Tamminen). Improves performance in GLBenchmark 2.7/EgyptHD by 7.44362% +/- 0.496939% (n=55) on a Broadwell GT2. Improves performance in a bunch of other microbenchmarks by ~15% or so. Signed-off-by: Kenneth Graunke <[email protected]> Reported-by: Eero Tamminen <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Cc: [email protected]
* i965: Add a BRW_MOCS_PTE #define.Kenneth Graunke2014-10-091-3/+7
| | | | | | | | | | | | | | Like BDW_MOCS_WB and BDW_MOCS_WT, this specifies that we want to use all three caches (L3, LLC, and eLLC where available), but leaves the LLC caching mode up to the kernel's page table entry. This allows the kernel to pick WB/WT/UC based on whether it's using a buffer for scanout. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Cc: [email protected]
* mesa: Make _mesa_print_arrays use stderr.Kenneth Graunke2014-10-091-3/+3
| | | | | | | | | These days, most driver debug output happens via stderr, not stdout. Some applications (such as Xephyr) also appear to close stdout which makes these messages go nowhere. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* i965/compaction: Disable compaction on SNB temporarily.Matt Turner2014-10-031-0/+6
| | | | Will investigate after XDC.
* Revert "i965: Emit ELSE/ENDIF JIP with type D on Gen 7."Matt Turner2014-10-031-2/+2
| | | | | | | | This reverts commit 54e30dbf4db437748509d1319c3f6e4185f76c69. Will investigate after XDC. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84557
* i965/fs: Remove dead generate_rep_fb_write prototype.Matt Turner2014-10-031-1/+0
| | | | Added in commit f9dc7aab.
* mesa: fix spurious wglGetProcAddress / GL_INVALID_OPERATION errorBrian Paul2014-10-031-1/+35
| | | | | | | | | | | | | | | | | | On Windows, the Piglit primitive-restart test was failing a glGetError()==0 assertion when it was run w/out any command line arguments. Piglit's all.py script only runs primitive-restart with arguments so this case isn't normally hit during a full piglit run. The basic problem is Microsoft's opengl32.dll calls glFlush from wglGetProcAddress() and Piglit uses wglGetProcAddress() to resolve glPrimitiveRestartNV() which is called inside glBegin/End. See comments in the code for more info. Plus, improve the comments for _mesa_alloc_dispatch_table(). Cc: <[email protected]> Acked-by: Sinclair Yeh <[email protected]>
* mesa: fix GetTexImage for 1D array depth texturesDave Airlie2014-10-031-2/+7
| | | | | | | | | | | | | | | | While running piglit in virgl, I hit an assert in intel driver. "qemu-system-x86_64: intel_tex.c:219: intel_map_texture_image: Assertion `tex_image->TexObject->Target != 0x8C18 || h == 1' failed." Thanks to Eric and Ken for pointing me in the right direction, Fix the get_tex_depth to do the same fixup as get_tex_rgba does for 1D array textures. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Cc: [email protected] Signed-off-by: Dave Airlie <[email protected]>
* st/mesa: Fix paths used in Android buildsTomasz Figa2014-10-033-0/+6
| | | | | | | | | | | | | | | | | | With current makefiles the build fails because source and build paths are generated incorrectly. With Android build system the top_srcdir and top_builddir variables are undefined and all paths are relative to where Android.mk is located. This ends up with path likes external/mesa/src/mesa/src/mesa/ for both source and build paths, which are obviously wrong. This patch fixes this by overriding resulting SRCDIR and BUILDDIR variables with empty string, so that paths end up being relative to Android.mk file again. Appending correct build path to generated files is already done in Android.gen.mk. Signed-off-by: Tomasz Figa <[email protected]> CC: <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* st/mesa: Generate format_info.c in Android buildsTomasz Figa2014-10-031-0/+9
| | | | | | | | | | Current Android makefiles lack generation of format_info.c, which is a dependency of main/format.c. This patch adds necessary code to Android.gen.mk. Signed-off-by: Tomasz Figa <[email protected]> CC: <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* util: Include in Android buildsTomasz Figa2014-10-034-2/+5
| | | | | | | | | | This patch fixes Android build failures by including src/util directory in compilation. Files inside of this directory are compiled into libmesa_util static library and linked with resulting libGLES_mesa. Signed-off-by: Tomasz Figa <[email protected]> CC: <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* i965/fs: Use the correct base_mrf for spilling pairs in SIMD8Jason Ekstrand2014-10-021-3/+4
| | | | | | | | | | Before, we were hard-coding the base_mrf based on dispatch width not number of registers spilled at a time. This caused us to emit instructions with a base_mrf or 14 and a mlen of 3 so we used the magical non-existant m16 register. This fixes the problem. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add a MAX_GRF_SIZE define and use it various placesJason Ekstrand2014-10-024-6/+9
| | | | | | | | | | | | Previously, we had a MAX_SAMPLER_MESSAGE_SIZE which we used instead. However, some FB write messages can validly be longer than this so we need something different. Since MAX_SAMPLER_MESSAGE_SIZE is validly useful on its own, we leave it alone and add a new MAX_GRF_SIZE that's big enough for FB writes. Signed-off-by: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84539 Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use the actual regsister width in brw_reg_from_fs_regJason Ekstrand2014-10-021-0/+13
| | | | | | | | This fixes a bug where 1-wide operations don't properly translate down to 1-wide instructions. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs_fp: Use null_reg from fs_visitor instead of rolling our ownJason Ekstrand2014-10-021-6/+4
| | | | | | Signed-off-by: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84529 Reviewed-by: Matt Turner <[email protected]>
* mesa: relax draw api validation on ES2Tapani Pälli2014-10-021-3/+2
| | | | | | | | | | | Patch fixes failing test in WebGL conformance test 'point-no-attributes' when running Chrome on OpenGL ES. (Shader program may draw points using constant data in shader.) No Piglit regressions. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* meta: Fix make check failures in setup_glsl_msaa_blit_scaled_shader()Anuj Phogat2014-10-011-8/+9
| | | | | | | introduced by commit 68ee950. Signed-off-by: Anuj Phogat <[email protected]> Reported-by: Mark Janes <[email protected]>
* mesa: fix _mesa_alloc_dispatch_table() declarationBrian Paul2014-10-011-1/+1
| | | | Insert 'void' parameter to match declaration in api_exec.h. Trivial.
* meta: (trivial) remove accidental double semicolonRoland Scheidegger2014-10-011-1/+1
|
* i965: Enable EXT_framebuffer_multisample_blit_scaled for gen8Anuj Phogat2014-10-011-2/+1
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* meta: Implement ext_framebuffer_multisample_blit_scaled extensionAnuj Phogat2014-10-012-13/+199
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extension enables doing a multisample buffer resolve and buffer scaling using a single glBlitFrameBuffer() call. Currently, we have this extension implemented in BLORP which is only used by SNB and IVB. This patch implements the extension in meta path which makes it available to Broadwell. Implementation features: - Supports scaled resolves of 2X, 4X and 8X multisample buffers. - Avoids unnecessary shader compilations by storing the pre compiled shaders for each supported sample count. - Uses bilinear filtering for both GL_SCALED_RESOLVE_FASTEST_EXT and GL_SCALED_RESOLVE_NICEST_EXT filter options. This is an allowed behavior in the extension's spec. - I tried doing bicubic filtering for GL_SCALED_RESOLVE_NICEST_EXT filter. It made the edges in the image look little smoother but the image gets blurred causing no overall quality improvement. For now I have dropped the idea of doing different filtering for nicest filter. V2: - Minor changes to simplify the fragment shader. - Refactor the code to move i965 specific sample_map computation out of Meta. We now use ctx->Const.SampleMap{2,4,8}x variables initialized by the driver. - Use a simple msaa resolve shader for scaled resolves with scaling factor = 1.0. V3: - Make changes to create a string out of ctx->Const.SampleMap{2,4,8}x variables and use it in fragment shader. V4: - Make changes to use uint8_t type ctx->Const.SampleMap{2,4,8}x variables. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Initialize the SampleMap{2,4,8}x variablesAnuj Phogat2014-10-013-0/+55
| | | | | | | | | | | | | with values specific to Intel hardware. V2: Define and use gen6_get_sample_map() function to initialize the variables. V3: Change the function name to gen6_set_sample_maps() and use memcpy() to fill in the data. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* mesa: Add new variables in gl_context to store sample layoutAnuj Phogat2014-10-011-0/+32
| | | | | | | | | | | | | SampleMap{2,4,8}x variables are used in later patches to implement EXT_framebuffer_multisample_blit_scaled extension. V2: Use integer array instead of a string. Bump up the comment. V3: Use uint8_t type array. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* mesa: Avoid flagging _NEW_VIEWPORT on redundant viewport updates.Kenneth Graunke2014-10-011-0/+6
| | | | | | | | | Cuts the number of i965 color calculator viewport uploads by 100x (11017983 -> 113385) in 'x11perf -gc' with Glamor in Xephyr. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Drop CACHE_NEW_VS_PROG from the gen7_sf_state atom.Kenneth Graunke2014-10-011-1/+1
| | | | | | | | | I believe when I wrote this code, gen6_sf_state used CACHE_NEW_VS_PROG, which has since been replaced by BRW_NEW_VUE_MAP_GEOM_OUT. It's not needed here anyway - only SBE needs it. Just a copy and paste mistake. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Drop brwBindProgram driver hook.Kenneth Graunke2014-10-011-20/+0
| | | | | | | | | | | | | | | | This function flagged BRW_NEW_*_PROGRAM When ctx->{Vertex,Geometry,Fragment}Program._Current changes, core Mesa calls the BindProgram driver hook, which flagged BRW_NEW_*_PROGRAM. However, brw_upload_state also checks for that changing, sets the same flags, and also updates brw->fragment_program and so on. So, this looks to be entirely redundant. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Add missing /* BRW_NEW_FRAGMENT_PROGRAM */ comments.Kenneth Graunke2014-10-013-6/+7
| | | | | | | | | I had to dig a bit to figure out why this was necessary. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Use "1ull" instead of "1" in BRW_NEW_* defines.Kenneth Graunke2014-10-011-32/+32
| | | | | | | | | | Now that the bitfield is a uint64_t, we should use 1ull. Currently, we only have 32 entries, so 1 works fine, but it's not future-proof. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Use ~0ull when flagging all BRW_NEW_* dirty flags.Kenneth Graunke2014-10-013-4/+4
| | | | | | | | | ~0 is 0xFFFFFFFF, which only covers the first 32 bits. We need all 64. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Fix INTEL_DEBUG=state to work with 64-bit dirty bits.Kenneth Graunke2014-10-011-16/+7
| | | | | | | | | | | This will keep INTEL_DEBUG=state working when we add BRW_NEW_* bits beyond 1 << 31. We missed doing this when widening the driver flags from uint32_t to uint64_t. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Delete CACHE_NEW_BLORP_CONST_COLOR_PROG.Kenneth Graunke2014-10-012-3/+0
| | | | | | | | | Unused since krh rewrote fast clears to use meta. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Fix typo in commentChris Forbes2014-10-011-1/+1
| | | | Signed-off-by: Chris Forbes <[email protected]>
* i965: Fix spelling of GEN7_SAMPLER_EWA_ANISOTROPIC_ALGORITHMChris Forbes2014-10-012-2/+2
| | | | Signed-off-by: Chris Forbes <[email protected]>
* i965/fs: Fix the buildJason Ekstrand2014-09-301-1/+1
|
* i965/fs: Fix an uninitialized value warningsJason Ekstrand2014-09-301-3/+4
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Emit compressed BFI2 instructions on Gen > 7.Matt Turner2014-09-301-1/+1
| | | | | | | IVB had a restriction that prevented us from emitting compressed three-source instructions, and although that was lifted on Haswell, Haswell had a new restriction that said BFI instructions specifically couldn't be compressed.
* i965/fs: Allow SIMD16 borrow/carry/64-bit multiply on Gen > 7.Matt Turner2014-09-301-3/+3
| | | | | | | These checks were intended for Gen 7 only. None of these restrictions apply to Gen 8. Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Set MUL source type to W/UW in 64-bit mul macro on Gen8.Matt Turner2014-09-301-1/+22
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Optimize sqrt+inv into rsq.Matt Turner2014-09-301-0/+11
| | | | | | | | | | | | | | | | | | Transform sqrt a, b rcp c, a into sqrt a, b rsq c, b The improvement here is that we've broken a dependency between these instructions. Leads to 330 fewer INV instructions and 330 more RSQ. Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Optimize sqrt+inv into rsq.Matt Turner2014-09-301-0/+11
| | | | | | | | | | | | | | | | | | | | | | | Transform sqrt a, b rcp c, a into sqrt a, b rsq c, b In most cases the sqrt's result is still used, so the improvement here is that we've broken a dependency between these instructions. Leads to 80 fewer INV instructions and 80 more RSQ. Occasionally the sqrt's result is no longer used, leading to: instructions in affected programs: 5005 -> 4949 (-1.12%) Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Call opt_algebraic after opt_cse.Matt Turner2014-09-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The next patch adds an algebraic optimization for the pattern sqrt a, b rcp c, a and turns it into sqrt a, b rsq c, b but many vertex shaders do a = sqrt(b); var1 /= a; var2 /= a; which generates sqrt a, b rcp c, a rcp d, a If we apply the algebraic optimization before CSE, we'll end up with sqrt a, b rsq c, b rcp d, a Applying CSE combines the RCP instructions, preventing this from happening. No shader-db changes. Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>