aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers
Commit message (Collapse)AuthorAgeFilesLines
* i965/gen4: Fix assertion failures in depthstencil piglit tests.Eric Anholt2012-11-011-4/+5
| | | | | | Don't forget to set depth_mt even if !hiz_mt. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add "alpha to coverage" to performance debug recompile messages.Kenneth Graunke2012-10-311-0/+1
| | | | | | | This was missing and got labeled "Something else". Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Don't replicate data for zero-stride arrays when copying to VBOs.Kenneth Graunke2012-10-311-7/+6
| | | | | | | | | | | | | | When copy_array_to_vbo_array encountered an array with src_stride == 0 and dst_stride != 0, we would replicate out the single element to the whole size (max - min + 1). This is unnecessary: we can simply upload one copy and set the buffer's stride to 0. Decreases vertex upload overhead in an upcoming Steam for Linux title. Prior to this patch, copy_array_to_vbo_array appeared very high in the profile (Eric quoted 20%). After the patch, it disappeared completely. Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Don't bother trying to extend the current vertex buffers.Kenneth Graunke2012-10-313-42/+1
| | | | | | | | | | | | | | | | | | | | This essentially reverts the following: commit c625aa19cb53ed27f91bfd16fea6ea727e9a5bbd Author: Chris Wilson <[email protected]> Date: Fri Feb 18 10:37:43 2011 +0000 intel: extend current vertex buffers While working on optimizing an upcoming Steam title, I broke this code. Eric expressed his doubts about this optimization, and noted that the original commit offered no performance data. I ran before and after benchmarks on Xonotic and Citybench, and found that this code made no difference. So, remove it to reduce complexity and make future work simpler. Reviewed-by: Eric Anholt <[email protected]>
* mesa: remove IBM_multimode_draw_arrays extension enable flagMarek Olšák2012-10-311-1/+0
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: don't always enable OES_standard_derivativesMarek Olšák2012-10-311-0/+1
| | | | | | | | For Intel, expose it only if gen >= 4. For Gallium, expose it only if PIPE_CAP_SM3 is advertised. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: remove EXT_compiled_vertex_array extension enable flagMarek Olšák2012-10-311-1/+0
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: remove ARB_window_pos extension enable flagMarek Olšák2012-10-311-1/+0
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: remove ARB_transpose_matrix extension enable flagMarek Olšák2012-10-311-1/+0
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* xlib: Do not undefine _R, _G, and _B.Vinson Lee2012-10-291-3/+0
| | | | | | | | Fixes build error on Cygwin and Solaris. _R, _G, and _B are used in ctype.h on those platforms. Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* intel: support for 16 bit config with 24 depth and 8 stencilTapani Pälli2012-10-291-2/+7
| | | | | | | | | Patch adds additional singlesample config with 565 color buffer, 24 bit depth and 8 bit stencil buffer. This makes Quadrant benchmark work on Android. Tested with Sandybridge and Ivybridge machines. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* dri: Support MESA_FORMAT_SARGB8 in driCreateConfigsIan Romanick2012-10-291-1/+2
| | | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel: If the visual is sRGB, use an sRGB internal formatIan Romanick2012-10-291-0/+2
| | | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* dri: Convert driCreateConfigs to use a gl_format enumIan Romanick2012-10-296-153/+74
| | | | | | | | | | | This is instead of the pair of GLenums for format and type that were previously used. This is necessary for the Intel drivers to expose sRGB framebuffer formats. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* dri_util: Elminiate the bytes_per_pixel tableIan Romanick2012-10-291-9/+3
| | | | | | | | With fewer formats to support, it's kind of useless. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* dri_util: Remove support for RGB332 framebuffersIan Romanick2012-10-291-27/+7
| | | | | | | | None of the remaining DRI drivers in Mesa use this. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* swrast: Remove the 2_3_3_REV framebuffer formatIan Romanick2012-10-291-4/+0
| | | | | | | | | | There is no gl_format in Mesa that corresponds to this arrangement, so I have a very hard time believing that this works. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Merge brw_prepare_query_begin() and brw_emit_query_begin().Eric Anholt2012-10-263-22/+7
| | | | | | | This is a leftover from when we had to split those two functions due to the separate BO validation step. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Rename misleading "active" field of brw->query.Eric Anholt2012-10-262-5/+5
| | | | | | | | | "Active" is an already-used term for the query being between glBeginQuery() and glEndQuery(), while this is tracking whether the start of the packet pair for emitting state has been inserted into the current batchbuffer. Reviewed-by: Kenneth Graunke <[email protected]>
* scons: Build xlib swrast too.José Fonseca2012-10-262-0/+51
| | | | Helpful for debugging.
* i965/vs: Preserve the type when copy propagating into an instruction.Kenneth Graunke2012-10-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | Consider the following code, which reinterprets a register as a different type: mov(8) g6<1>F g1.4<0,4,1>.xF and(8) g5<1>.xUD g6<4,4,1>.xUD 0x7fffffffUD Copy propagation would notice that we can replace the use of g6 with g1.4 and eliminate the MOV. Unfortunately, it failed to preserve the UD type, incorrectly generating: and(8) g5<1>.xUD g6<4,4,1>.xF 0x7fffffffUD Found while debugging Ian's uncommitted ARB_vertex_program LOG opcode test with my new Mesa IR -> Vec4 IR translator. NOTE: This is a candidate for stable release branches. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/vs: Don't lose the MRF writemask when doing compute-to-MRF.Kenneth Graunke2012-10-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consider the following code sequence: mul(8) g4<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF mov.sat(8) m1<1>.xyF g4<4,4,1>F mul(8) g4<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF mov.sat(8) m1<1>.zwF g4<4,4,1>F The compute-to-MRF pass will discover the first mov.sat and attempt to replace it by rewriting earlier instructions. Everything works out, so it replaces scan_inst's destination file, reg, and reg_offset, resulting in: mul(8) m1<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF mul(8) g4<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF mov.sat(8) m1<1>.zwF g4<4,4,1>F Unfortunately, it loses the .xy writemask on the mov.sat's MRF destination. While this doesn't pose an immediate problem, it then proceeds to transform the second mov.sat, resulting in: mul(8) m1<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF mul(8) m1<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF Instead of writing both halves of the vector (like the original code), it overwrites the full vector both times, clobbering the desired .xy values. When encountering a MOV, the compute-to-MRF code scans for instructions which generate channels of the MOV source. It ensures that all necessary channels are available (possibly written by several instructions). In this case, *more* channels are available than necessary, so we want to take the subset that's actually used. Taking the bitwise and of both writemasks should accomplish that. This was discovered by analyzing an ARB_vertex_program test (glean/vertProg1/MUL test (with swizzle and masking)) with my new Mesa IR -> Vec4 IR translator code. However, it should be possible with GLSL programs as well. NOTE: This is a candidate for stable release branches. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/vs: Fix debug dumping of VS push constants.Kenneth Graunke2012-10-251-1/+3
| | | | | | | | | | | | | | | | While copying the values into the batch space, we advance the param pointer. The debug code then tries to iterate over all the uploaded values, starting at param...which is now the end of the uploaded data, rather than the start. This patch saves a pointer to the start of push constant space before it gets altered and switches the debug code to use that. Tested by uncommenting the code and examining the output of glsl-vs-clamp-1.shader_test. Previously all values appeared to be zero. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Actually add support for GL_ANY_SAMPLES_PASSED from GL_ARB_oq2.Eric Anholt2012-10-221-0/+12
| | | | | | v2: Fix mangled sentence in the comment, and make the loop exit early. Reviewed-by: Ian Romanick <[email protected]> (v1)
* i965: Stop flushing the batch on timestamp queries, too.Eric Anholt2012-10-191-1/+0
| | | | | | | Given the usecase we have of trying to measure timestamps across individual draw calls, flushing will totally mess up what people are trying to measure. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Don't flush the batch immediately on EndQuery.Eric Anholt2012-10-191-5/+14
| | | | | | | | | | | | | | | | The theory I had when I wrote the code was that you wanted to minimize latency on your queries because the app was going to ask soon. Only, it turns out that everybody batches up their queries and asks for the results later (often after the next SwapBuffers!), so this was a pessimization. Until now, I had no workload where it mattered enough to benchmark. Recently I started playing some Minecraft, which uses tons of queries to decide whether to render chunks of the terrain. For that app, avoiding the flush in the query-generation loop improves performance 22.7% +/- 4.7% (n=3) on an apitrace capture of it (confirmed in game by watching the fps meter found by pressing F3, 15/16 -> 20/21 fps). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Fix typo in refactor of brw_fs_reg_allocate.cpp.Eric Anholt2012-10-191-1/+1
| | | | | I'm amazed that my usual warnings check didn't catch this, and that this passed piglit.
* i965/vs: include format argument in debug printfTapani Pälli2012-10-191-1/+1
| | | | | | | | otherwise some compilers will throw error "error: format not a string literal and no format arguments" Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* intel: Skip texsubimage fastpath for more pixel unpack state (v2)Chad Versace2012-10-181-1/+6
| | | | | | | | | | | | | | | | | | | Fixes piglit tests "unpack-teximage2d --pbo=* --format=GL_BGRA" on Sandybridge+. The fastpath was checking an incomplete set of pixel unpack state. This patch adds checks for all the fields of gl_pixelstore_attrib that affect 2D texture uploads. Also, it begins permitting the case where GL_UNPACK_ROW_LENGTH is 0. Ideally, we would just ask a unicorn to JIT this fastpath for us in a way that safely handles the unpacking state. Until then, it's safer if only a small set of situations activate the fastpath. v2: Use _mesa_is_bufferobj(), per Anholt. Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965/fs: Statically allocate the reg_sets at context initialization.Eric Anholt2012-10-173-49/+61
| | | | | | | Now that we've replaced all the variable settings other than reg_width, it's easy to hang on to this (the expensive part of setting up the allocator). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Allocate registers in the unused parts of the gen7 MRF hack range.Eric Anholt2012-10-172-1/+63
| | | | | | This should also reduce register pressure on gen7+, like the previous commit. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Reduce the interference between payload regs and virtual GRFs.Eric Anholt2012-10-172-19/+152
| | | | | | | | | | | | | Improves performance of the Lightsmark penumbra shadows scene by 15.7% +/- 1.0% (n=15), by eliminating register spilling. (tested by smashing the list of scenes to have all other scenes have 0 duration -- includes additional rendering of scene description text that normally doesn't appear in that scene) v2: Allow allocation of all but g0/g1 of the payload. v3: Pull count_to_loop_end() out to a helper function. Reviewed-by: Kenneth Graunke <[email protected]> (v2, recommended v3)
* i965/fs: Expose the payload registers to the register allocator.Eric Anholt2012-10-171-7/+39
| | | | | | For now, nothing else can get allocated over them, but that will change. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Remove extra allocation for classes[].Eric Anholt2012-10-171-1/+1
| | | | | | | This was to slot in the magic aligned pairs class, but it got moved to a descriptive name later. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Make the register allocation class_sizes[] choice static.Eric Anholt2012-10-172-60/+42
| | | | | | | | Based on split_virtual_grfs(), we choose the same set every time, so set it in stone. This will help us avoid regenerating the somewhat expensive class/register set setup every compile. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Improve live interval calculation.Eric Anholt2012-10-174-96/+388
| | | | | | | | | | This is derived from the FS visitor code for the same, but tracks each channel separately (otherwise, some typical fill-a-channel-at-a-time patterns would produce excessive live intervals across loops and cause spilling). Reviewed-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48375 (crash -> failure, can turn into pass by forcing unrolling still)
* i965/vs: Fix the mlen of scratch read/write messages.Eric Anholt2012-10-171-2/+2
| | | | | | | | These messages always have m0 = g0 and m1 = offset, and write has m2 = data. Avoids regression in opt_compute_to_mrf() with a change to scratch writes to set up the data as an MRF write in the IR. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make the cfg reusable from the VS.Eric Anholt2012-10-175-16/+16
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Share the predicate field between FS and VS.Eric Anholt2012-10-1713-33/+32
| | | | | | | Note that BRW_PREDICATE_NONE is 0 and BRW_PREDICATE_NORMAL is 1, so that's a lot like the true/false we had in the FS before. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Rename fs_cfg types to not mention fs.Eric Anholt2012-10-178-65/+65
| | | | | | | | fs_bblock_link -> bblock_link fs_bblock -> bblock_t (to avoid conflicting with all the fs_bblock *bblock) fs_cfg -> cfg_t (to avoid conflicting with all the fs_cfg *cfg) Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move brw_fs_cfg.* to brw_cfg.*.Eric Anholt2012-10-177-6/+6
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make the FS and VS share a few visitor/instruction fields.Eric Anholt2012-10-175-24/+30
| | | | | | This will let us reuse brw_fs_cfg.cpp from brw_vec4_*. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Trim the swizzle of the scratch write temporary.Eric Anholt2012-10-171-1/+16
| | | | | | | | | This fixes confusion by the upcoming live variable analysis which saw e.g. use of temp.w when only temp.xyz were initialized in the basic block, and concluded that temp.w must have come from outside of the block (even though it was never initialized anywhere). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Do the temporary allocation in emit_scratch_write().Eric Anholt2012-10-173-21/+12
| | | | | | Both callers were doing basically the same thing, just written differently. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Simplify emit_scratch_write() prototype.Eric Anholt2012-10-173-8/+6
| | | | | | | Both callers used (effectively) inst->dst as the argument, so just reference it. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Add a little bit of IR-level debug ability.Eric Anholt2012-10-174-5/+96
| | | | | | | This is super basic, but it let me visualize a problem I had with opt_compute_to_mrf(). Reviewed-by: Kenneth Graunke <[email protected]>
* wmesa: remove old, unused span codeBrian Paul2012-10-171-474/+0
|
* i965: Fix rendering to small mipmaps of depth/stencil buffers using a temp mt.Eric Anholt2012-10-164-121/+172
| | | | | | | Fixes 51 piglit tests (fbo-clear-formats, and most of the remaining failures in depthstencil). Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Share the draw x/y offset masking code between main/blorp and all gens.Eric Anholt2012-10-166-108/+89
| | | | | | | | | | This code is twisty, and the comment before most of the blocks was actually giving me the opposite impression from its intention: We want to apply as much of our offset as possible through coarse tile-aligned adjustment, since we can do so independently per buffer, and apply the minimum we can through fine-grained drawing offset x/y, since it has to agree between all buffers. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make a helper function for the renderbuffer temporary mt workaround.Eric Anholt2012-10-163-22/+30
| | | | | | | We now have a case of wanting to do that on gen6+ as well, so make this logic usable elsewhere. Reviewed-by: Kenneth Graunke <[email protected]>