path: root/src/mesa/drivers
Commit message | Author | Age | Files | Lines
* meta: Remove redundant code in _mesa_meta_GenerateMipmap | Anuj Phogat | 2012-11-05 | 1 | -61/+4
  Integer textures generate invalid operation in glGenerateMipmap. So, the code related to integer textures is now redundant.
  Note: This is a candidate for stable branches.
  Signed-off-by: Anuj Phogat
  Reviewed-by: Brian Paul
* i965: Fix oversized initial allocation of the state cache table pointers. | Vandrus Zoltán | 2012-11-04 | 1 | -1/+1
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55030
* i965: Force border color A to 1 when it's not present in the GL format. | Eric Anholt | 2012-11-04 | 1 | -0/+7
  It's usually forced to 1 by the surface format, but sometimes we actually have alpha present because it's the only format available.
  Fixes piglit texwrap bordercolor tests for OpenGL 1.1, GL_EXT_texture_sRGB and GL_ARB_texture_float.
  Reviewed-by: Kenneth Graunke
* i965: Fix uploading user vertex arrays with basevertex set. | Eric Anholt | 2012-11-04 | 3 | -2/+7
  If the index buffer is full of values like "0 1 2 3", but basevertex is 4, we need to upload at least vertex data for elements 4 5 6 7. Whether we also upload 0 1 2 3 is a question of whether there are VBOs present or not -- see the code setting start_vertex_bias in brw_draw_upload.c.
  Fixes piglit draw-elements*base-vertex user_varrays
  Reviewed-by: Kenneth Graunke
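The range arithmetic described in the commit above can be sketched as follows. This is only an illustration, not the driver's code: `upload_range`, `struct vertex_range`, and the `vbos_present` flag are hypothetical names, and the rule that the upload starts at element 0 when real VBOs are bound is an assumption made for the sketch.

```c
#include <assert.h>

/* Hypothetical helper: which vertex elements must the uploaded copy of a
 * user array contain, given the index range and the draw's basevertex? */
struct vertex_range {
    int first; /* first element to upload */
    int count; /* number of elements to upload */
};

static struct vertex_range
upload_range(int min_index, int max_index, int basevertex, int vbos_present)
{
    struct vertex_range r;

    assert(min_index <= max_index);

    /* The hardware fetches element (index + basevertex). */
    int lo = min_index + basevertex;
    int hi = max_index + basevertex;

    if (vbos_present) {
        /* Assumption: with real VBOs bound the indices cannot be rebased,
         * so the uploaded copy starts at element 0. */
        r.first = 0;
        r.count = hi + 1;
    } else {
        /* Only user arrays: upload just the referenced elements. */
        r.first = lo;
        r.count = hi - lo + 1;
    }
    return r;
}
```

With indices 0..3 and basevertex 4, the sketch yields elements 4..7 when only user arrays are bound, and elements 0..7 when VBOs are present.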
* i965: Set dirty state for brw_draw_upload.c when num_instances changes. | Eric Anholt | 2012-11-04 | 1 | -1/+4
  Otherwise, if we had a set of prims passed in with a num_instances varying between them, we wouldn't upload enough (or too much!) from user vertex arrays.
  Reviewed-by: Kenneth Graunke
* i965: Remove the vbo_rebase_prims() path. | Eric Anholt | 2012-11-04 | 1 | -15/+6
  The brw_draw_upload.c start_vertex_bias code has support for doing the rebase without rewriting the index buffer by applying a basevertex. It looks like vbo_rebase_prims() is not equipped to handle basevertex.
  Reviewed-by: Kenneth Graunke
* i965/fs: Fix a comment in copy propagation. | Eric Anholt | 2012-11-04 | 1 | -1/+3
  We haven't been only tracking raw GRF-GRF moves since the constant propagation merge, and also the extension for source modifiers and uniforms.
  Reviewed-by: Kenneth Graunke
* i965/fs: Allow copy-propagation on pull constant load values. | Eric Anholt | 2012-11-04 | 1 | -3/+4
  Given that we handle similarly-regioned GRF registers for our copy propagation from our UNIFORM file, there's no reason not to allow it.
  The shader-db impact is negligible -- +90 instructions total, 2 shaders helped and 7 hurt (slightly increased register pressure increased spilling), but this is to prevent regression in other shaders when fixing copy_propagation to reduce register pressure in the shaders that are hurt here.
  Reviewed-by: Kenneth Graunke
* i965/fs: Do dead code elimination just after copy propagation. | Eric Anholt | 2012-11-04 | 1 | -1/+1
  If we put the register coalescing in between the two, then we end up with code sequences involving dead writes that the dead code elimination doesn't know how to remove. In place of making dead code elimination smart (which we should do, too), make it less important for the moment.
  shader-db results:
    total instructions in shared programs: 722240 -> 721275 (-0.13%)
    instructions in affected programs: 50573 -> 49608 (-1.91%)
    (no shaders regressed).
  Reviewed-by: Kenneth Graunke
* i965/fs: Compact the virtual GRF arrays. | Kenneth Graunke | 2012-11-03 | 2 | -0/+61
  During code generation, we create tons of temporary variables, many of which get immediately killed and are never used. Later optimization and analysis passes, such as compute_live_intervals, loop over all the virtual GRFs. By compacting them, we can save a lot of overhead.
  Reduces compilation time in L4D2's largest fragment shader from 10.2 seconds to 5.2 seconds (50%). Drops compute_live_variables() from 10-12% of another game's startup time to 8%.
  Reviewed-by: Eric Anholt
  Signed-off-by: Kenneth Graunke
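The compaction idea can be illustrated with a small remap table. The sketch below is an assumption about the general technique, not the code in the commit; `compact_registers`, `used`, and `remap` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: renumber the registers that are still referenced so
 * they form a dense range. used[i] says whether register i appears in any
 * instruction; remap[i] receives its new number, or -1 if it is dropped.
 * Returns the new register count. */
static int
compact_registers(const bool *used, int *remap, int count)
{
    int new_count = 0;

    assert(count >= 0);

    for (int i = 0; i < count; i++)
        remap[i] = used[i] ? new_count++ : -1;

    return new_count;
}
```

A caller would then rewrite every instruction operand through `remap`, so later passes that loop over all registers only visit the live ones.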
* i965: Remove unused variables after removing the old VS backend. | Kenneth Graunke | 2012-11-01 | 1 | -2/+0
  Fixes compiler warnings about unused variables.
* i965: Remove unnecessary walk through Mesa IR in ProgramStringNotify(). | Kenneth Graunke | 2012-11-01 | 1 | -82/+0
  Variable indexing of non-uniform arrays only exists in GLSL. Likewise, OPCODE_CAL/OPCODE_RET only existed to try and support GLSL's function calls.
  We don't use Mesa IR for GLSL, and these features are explicitly disallowed by ARB_vertex_program/ARB_fragment_program and never generated by ffvertex_prog.c.
  Since they'll never happen, there's no need to check for them, which saves us from walking through all the Mesa IR instructions.
  Reviewed-by: Eric Anholt
* i965: Remove VS constant buffer read support from brw_eu_emit.c. | Kenneth Graunke | 2012-11-01 | 2 | -121/+0
  brw_vec4_emit.cpp implements this directly; only the old backend used the brw_eu_emit.c code.
  Reviewed-by: Eric Anholt
* i965: Update comment about clipper constants. | Kenneth Graunke | 2012-11-01 | 1 | -9/+1
  The old VS backend doesn't exist, but I believe these still need to be delivered to the clipper thread.
  Reviewed-by: Eric Anholt
* i965/vs: Remove brw_vs_compile::constant_map. | Kenneth Graunke | 2012-11-01 | 2 | -18/+1
  It was only used for the old backend.
  Reviewed-by: Eric Anholt
* i965/vs: Remove support for the old parameter layout. | Kenneth Graunke | 2012-11-01 | 5 | -70/+7
  Only the old backend used it.
  Reviewed-by: Eric Anholt
* i965/vs: Delete the old vertex shader backend. | Kenneth Graunke | 2012-11-01 | 4 | -1836/+0
  It's no longer used for anything.
  Reviewed-by: Eric Anholt
* i965/vs: Replace brw_vs_emit.c with dumping code into the vec4_visitor. | Kenneth Graunke | 2012-11-01 | 6 | -32/+738
  Rather than having two separate backends, just create a small layer that translates the subset of Mesa IR used for ARB_vertex_program and fixed function programs to the Vec4 IR. This allows us to use the same optimization passes, code generator, register allocator as for GLSL.
  v2: Incorporate Eric's review comments.
  - Fix use of uninitialized src_swiz[] values in the SWIZZLE_ZERO/ONE case: just initialize it to 0 (.x) since the value doesn't matter (those channels get writemasked out anyway).
  - Properly reswizzle source register's swizzles, rather than overwriting the swizzle.
  - Port the old brw_vs_emit code for computing .x of the EXP2 opcode.
  - Update comments, removing mention of NV_vertex_program, etc.
  - Delete remaining #warning lines and debug comments.
  Signed-off-by: Kenneth Graunke
  Reviewed-by: Eric Anholt
* i965/vs: Refactor min/max handling to share code. | Kenneth Graunke | 2012-11-01 | 2 | -18/+21
  v2: Properly use "conditionalmod" pre-Gen6, rather than the incorrectly copy-and-pasted "BRW_CONDITIONAL_G".
  Signed-off-by: Kenneth Graunke
  Reviewed-by: Eric Anholt
* i965/vs: Add support for emitting DPH opcodes. | Kenneth Graunke | 2012-11-01 | 3 | -0/+6
  Signed-off-by: Kenneth Graunke
  Reviewed-by: Eric Anholt
* i965/vs: Only do INTEL_DEBUG=perf when there's a GLSL shader. | Kenneth Graunke | 2012-11-01 | 1 | -3/+2
  This will become necessary once we start supporting ARB programs and fixed function in this backend.
  Signed-off-by: Kenneth Graunke
  Reviewed-by: Eric Anholt
* i965/gen4: Fix assertion failures in depthstencil piglit tests. | Eric Anholt | 2012-11-01 | 1 | -4/+5
  Don't forget to set depth_mt even if !hiz_mt.
  Reviewed-by: Kenneth Graunke
* i965: Add "alpha to coverage" to performance debug recompile messages. | Kenneth Graunke | 2012-10-31 | 1 | -0/+1
  This was missing and got labeled "Something else".
  Signed-off-by: Kenneth Graunke
  Reviewed-by: Eric Anholt
* i965: Don't replicate data for zero-stride arrays when copying to VBOs. | Kenneth Graunke | 2012-10-31 | 1 | -7/+6
  When copy_array_to_vbo_array encountered an array with src_stride == 0 and dst_stride != 0, we would replicate out the single element to the whole size (max - min + 1). This is unnecessary: we can simply upload one copy and set the buffer's stride to 0.
  Decreases vertex upload overhead in an upcoming Steam for Linux title. Prior to this patch, copy_array_to_vbo_array appeared very high in the profile (Eric quoted 20%). After the patch, it disappeared completely.
  Reviewed-by: Eric Anholt
  Signed-off-by: Kenneth Graunke
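A minimal sketch of the zero-stride shortcut, assuming a simplified byte-array interface. `copy_array` and its parameters are hypothetical and do not mirror the real copy_array_to_vbo_array signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy `count` elements of `elem_size` bytes into dst.
 * Returns the number of bytes written and reports the stride the consumer
 * should use when reading dst. */
static size_t
copy_array(const unsigned char *src, size_t src_stride, size_t elem_size,
           size_t count, unsigned char *dst, size_t *dst_stride)
{
    assert(count > 0);

    if (src_stride == 0) {
        /* Every vertex reads the same element: upload a single copy and
         * tell the consumer to use stride 0 instead of replicating it. */
        memcpy(dst, src, elem_size);
        *dst_stride = 0;
        return elem_size;
    }

    /* Otherwise pack the elements tightly. */
    for (size_t i = 0; i < count; i++)
        memcpy(dst + i * elem_size, src + i * src_stride, elem_size);
    *dst_stride = elem_size;
    return count * elem_size;
}
```

In the zero-stride case the amount of data written no longer depends on `count`, which is where the saving described in the commit comes from.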
* i965: Don't bother trying to extend the current vertex buffers. | Kenneth Graunke | 2012-10-31 | 3 | -42/+1
  This essentially reverts the following:
    commit c625aa19cb53ed27f91bfd16fea6ea727e9a5bbd
    Author: Chris Wilson
    Date: Fri Feb 18 10:37:43 2011 +0000
    intel: extend current vertex buffers
  While working on optimizing an upcoming Steam title, I broke this code. Eric expressed his doubts about this optimization, and noted that the original commit offered no performance data.
  I ran before and after benchmarks on Xonotic and Citybench, and found that this code made no difference. So, remove it to reduce complexity and make future work simpler.
  Reviewed-by: Eric Anholt
* mesa: remove IBM_multimode_draw_arrays extension enable flag | Marek Olšák | 2012-10-31 | 1 | -1/+0
  Reviewed-by: Kenneth Graunke
  Reviewed-by: Ian Romanick
* mesa: don't always enable OES_standard_derivatives | Marek Olšák | 2012-10-31 | 1 | -0/+1
  For Intel, expose it only if gen >= 4. For Gallium, expose it only if PIPE_CAP_SM3 is advertised.
  Reviewed-by: Kenneth Graunke
  Reviewed-by: Ian Romanick
* mesa: remove EXT_compiled_vertex_array extension enable flag | Marek Olšák | 2012-10-31 | 1 | -1/+0
  Reviewed-by: Kenneth Graunke
  Reviewed-by: Ian Romanick
* mesa: remove ARB_window_pos extension enable flag | Marek Olšák | 2012-10-31 | 1 | -1/+0
  Reviewed-by: Kenneth Graunke
  Reviewed-by: Ian Romanick
* mesa: remove ARB_transpose_matrix extension enable flag | Marek Olšák | 2012-10-31 | 1 | -1/+0
  Reviewed-by: Kenneth Graunke
  Reviewed-by: Ian Romanick
* xlib: Do not undefine _R, _G, and _B. | Vinson Lee | 2012-10-29 | 1 | -3/+0
  Fixes build error on Cygwin and Solaris. _R, _G, and _B are used in ctype.h on those platforms.
  Signed-off-by: Vinson Lee
  Reviewed-by: Brian Paul
* intel: support for 16 bit config with 24 depth and 8 stencil | Tapani Pälli | 2012-10-29 | 1 | -2/+7
  This patch adds an additional singlesample config with a 565 color buffer, 24-bit depth and 8-bit stencil buffer. This makes the Quadrant benchmark work on Android. Tested with Sandybridge and Ivybridge machines.
  Signed-off-by: Tapani Pälli
  Reviewed-by: Chad Versace
* dri: Support MESA_FORMAT_SARGB8 in driCreateConfigs | Ian Romanick | 2012-10-29 | 1 | -1/+2
  Signed-off-by: Ian Romanick
  Reviewed-by: Eric Anholt
  Reviewed-by: Kenneth Graunke
* intel: If the visual is sRGB, use an sRGB internal format | Ian Romanick | 2012-10-29 | 1 | -0/+2
  Signed-off-by: Ian Romanick
  Reviewed-by: Eric Anholt
  Reviewed-by: Kenneth Graunke
* dri: Convert driCreateConfigs to use a gl_format enum | Ian Romanick | 2012-10-29 | 6 | -153/+74
  This is instead of the pair of GLenums for format and type that were previously used. This is necessary for the Intel drivers to expose sRGB framebuffer formats.
  Signed-off-by: Ian Romanick
  Reviewed-by: Eric Anholt
  Reviewed-by: Kenneth Graunke
  Reviewed-by: Chad Versace
* dri_util: Eliminate the bytes_per_pixel table | Ian Romanick | 2012-10-29 | 1 | -9/+3
  With fewer formats to support, it's kind of useless.
  Signed-off-by: Ian Romanick
  Reviewed-by: Eric Anholt
  Reviewed-by: Kenneth Graunke
* dri_util: Remove support for RGB332 framebuffers | Ian Romanick | 2012-10-29 | 1 | -27/+7
  None of the remaining DRI drivers in Mesa use this.
  Signed-off-by: Ian Romanick
  Reviewed-by: Eric Anholt
  Reviewed-by: Kenneth Graunke
* swrast: Remove the 2_3_3_REV framebuffer format | Ian Romanick | 2012-10-29 | 1 | -4/+0
  There is no gl_format in Mesa that corresponds to this arrangement, so I have a very hard time believing that this works.
  Signed-off-by: Ian Romanick
  Reviewed-by: Eric Anholt
  Reviewed-by: Brian Paul
  Reviewed-by: Kenneth Graunke
* i965: Merge brw_prepare_query_begin() and brw_emit_query_begin(). | Eric Anholt | 2012-10-26 | 3 | -22/+7
  This is a leftover from when we had to split those two functions due to the separate BO validation step.
  Reviewed-by: Kenneth Graunke
* i965: Rename misleading "active" field of brw->query. | Eric Anholt | 2012-10-26 | 2 | -5/+5
  "Active" is an already-used term for the query being between glBeginQuery() and glEndQuery(), while this is tracking whether the start of the packet pair for emitting state has been inserted into the current batchbuffer.
  Reviewed-by: Kenneth Graunke
* scons: Build xlib swrast too. | José Fonseca | 2012-10-26 | 2 | -0/+51
  Helpful for debugging.
* i965/vs: Preserve the type when copy propagating into an instruction. | Kenneth Graunke | 2012-10-25 | 1 | -0/+1
  Consider the following code, which reinterprets a register as a different type:
    mov(8) g6<1>F g1.4<0,4,1>.xF
    and(8) g5<1>.xUD g6<4,4,1>.xUD 0x7fffffffUD
  Copy propagation would notice that we can replace the use of g6 with g1.4 and eliminate the MOV. Unfortunately, it failed to preserve the UD type, incorrectly generating:
    and(8) g5<1>.xUD g6<4,4,1>.xF 0x7fffffffUD
  Found while debugging Ian's uncommitted ARB_vertex_program LOG opcode test with my new Mesa IR -> Vec4 IR translator.
  NOTE: This is a candidate for stable release branches.
  Signed-off-by: Kenneth Graunke
  Reviewed-by: Eric Anholt
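The fix described above amounts to keeping the consumer's operand type when swapping in the MOV's source. A hedged sketch with hypothetical types (`struct src_reg`, `enum reg_type`, and `propagate_copy` are illustrative, not the vec4 backend's definitions):

```c
#include <assert.h>

/* Hypothetical, simplified operand description. */
enum reg_type { TYPE_F, TYPE_UD };

struct src_reg {
    int reg;            /* register number */
    enum reg_type type; /* type the instruction reads the register as */
};

/* Replace the register a use reads from with the MOV's source, while
 * keeping the type the consuming instruction asked for. */
static void
propagate_copy(struct src_reg *use, const struct src_reg *mov_src)
{
    assert(use && mov_src);

    enum reg_type original_type = use->type;
    *use = *mov_src;            /* take over register (and, wrongly, type) */
    use->type = original_type;  /* restore the reinterpreting type */
}
```

Without the final assignment, the `and` in the example would read its source as F rather than UD, which is the bug the commit describes.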
* i965/vs: Don't lose the MRF writemask when doing compute-to-MRF. | Kenneth Graunke | 2012-10-25 | 1 | -0/+1
  Consider the following code sequence:
    mul(8) g4<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
    mov.sat(8) m1<1>.xyF g4<4,4,1>F
    mul(8) g4<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
    mov.sat(8) m1<1>.zwF g4<4,4,1>F
  The compute-to-MRF pass will discover the first mov.sat and attempt to replace it by rewriting earlier instructions. Everything works out, so it replaces scan_inst's destination file, reg, and reg_offset, resulting in:
    mul(8) m1<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
    mul(8) g4<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
    mov.sat(8) m1<1>.zwF g4<4,4,1>F
  Unfortunately, it loses the .xy writemask on the mov.sat's MRF destination. While this doesn't pose an immediate problem, it then proceeds to transform the second mov.sat, resulting in:
    mul(8) m1<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
    mul(8) m1<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
  Instead of writing both halves of the vector (like the original code), it overwrites the full vector both times, clobbering the desired .xy values.
  When encountering a MOV, the compute-to-MRF code scans for instructions which generate channels of the MOV source. It ensures that all necessary channels are available (possibly written by several instructions). In this case, *more* channels are available than necessary, so we want to take the subset that's actually used. Taking the bitwise and of both writemasks should accomplish that.
  This was discovered by analyzing an ARB_vertex_program test (glean/vertProg1/MUL test (with swizzle and masking)) with my new Mesa IR -> Vec4 IR translator code. However, it should be possible with GLSL programs as well.
  NOTE: This is a candidate for stable release branches.
  Signed-off-by: Kenneth Graunke
  Reviewed-by: Eric Anholt
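The "bitwise and of both writemasks" step can be written out directly. The mask constants below are defined locally for the sketch (one bit per channel), and `merged_writemask` is a hypothetical helper name.

```c
#include <assert.h>

/* One bit per channel; defined locally for this sketch. */
#define WRITEMASK_X    0x1u
#define WRITEMASK_Y    0x2u
#define WRITEMASK_Z    0x4u
#define WRITEMASK_W    0x8u
#define WRITEMASK_XYZW 0xfu

/* When an instruction is rewritten to target the MRF directly, keep only
 * the channels that both the instruction and the original MOV wrote. */
static unsigned
merged_writemask(unsigned scan_inst_mask, unsigned mov_mask)
{
    assert((scan_inst_mask & ~WRITEMASK_XYZW) == 0);
    assert((mov_mask & ~WRITEMASK_XYZW) == 0);

    return scan_inst_mask & mov_mask;
}
```

For the sequence in the commit message, the first `mul` writes all four channels and its `mov.sat` writes `.xy`, so the rewritten `mul` should write `.xy`; the second pair yields `.zw`, and together they fill the vector without clobbering each other.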
* i965/vs: Fix debug dumping of VS push constants. | Kenneth Graunke | 2012-10-25 | 1 | -1/+3
  While copying the values into the batch space, we advance the param pointer. The debug code then tries to iterate over all the uploaded values, starting at param...which is now the end of the uploaded data, rather than the start.
  This patch saves a pointer to the start of push constant space before it gets altered and switches the debug code to use that.
  Tested by uncommenting the code and examining the output of glsl-vs-clamp-1.shader_test. Previously all values appeared to be zero.
  Signed-off-by: Kenneth Graunke
  Reviewed-by: Eric Anholt
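The pointer bug is a common pattern, so a small stand-alone sketch may help. `upload_and_dump` is a hypothetical function, not the driver's upload routine; it only shows saving the start pointer before the copy loop advances it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch: copy `count` values into the space at `param`, then
 * optionally dump what was uploaded. Returns the number of values written. */
static int
upload_and_dump(float *param, const float *values, int count, int debug)
{
    assert(count >= 0);

    /* Remember where the uploaded data begins before advancing param. */
    float *param_start = param;

    for (int i = 0; i < count; i++)
        *param++ = values[i];

    /* param now points one past the end, so iterating from it would read
     * whatever follows the upload. Dump from the saved start instead. */
    if (debug) {
        for (int i = 0; i < count; i++)
            printf("%d: %f\n", i, param_start[i]);
    }

    return (int)(param - param_start);
}
```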
* i965: Actually add support for GL_ANY_SAMPLES_PASSED from GL_ARB_oq2. | Eric Anholt | 2012-10-22 | 1 | -0/+12
  v2: Fix mangled sentence in the comment, and make the loop exit early.
  Reviewed-by: Ian Romanick (v1)
* i965: Stop flushing the batch on timestamp queries, too. | Eric Anholt | 2012-10-19 | 1 | -1/+0
  Given the usecase we have of trying to measure timestamps across individual draw calls, flushing will totally mess up what people are trying to measure.
  Reviewed-by: Kenneth Graunke
* i965: Don't flush the batch immediately on EndQuery. | Eric Anholt | 2012-10-19 | 1 | -5/+14
  The theory I had when I wrote the code was that you wanted to minimize latency on your queries because the app was going to ask soon. Only, it turns out that everybody batches up their queries and asks for the results later (often after the next SwapBuffers!), so this was a pessimization. Until now, I had no workload where it mattered enough to benchmark.
  Recently I started playing some Minecraft, which uses tons of queries to decide whether to render chunks of the terrain. For that app, avoiding the flush in the query-generation loop improves performance 22.7% +/- 4.7% (n=3) on an apitrace capture of it (confirmed in game by watching the fps meter found by pressing F3, 15/16 -> 20/21 fps).
  Reviewed-by: Kenneth Graunke
* i965/fs: Fix typo in refactor of brw_fs_reg_allocate.cpp. | Eric Anholt | 2012-10-19 | 1 | -1/+1
  I'm amazed that my usual warnings check didn't catch this, and that this passed piglit.
* i965/vs: include format argument in debug printf | Tapani Pälli | 2012-10-19 | 1 | -1/+1
  Otherwise some compilers will throw the error "error: format not a string literal and no format arguments".
  Signed-off-by: Tapani Pälli
  Reviewed-by: Chad Versace
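The compiler diagnostic quoted above is triggered by passing a non-literal string as the format argument. A minimal sketch of the safe form, using `snprintf` and a hypothetical `debug_print` helper rather than the driver's actual debug call:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write msg into out and return the length that
 * snprintf reports. */
static int
debug_print(char *out, size_t size, const char *msg)
{
    assert(out && msg);

    /* snprintf(out, size, msg) would interpret msg as a format string, so a
     * stray '%' in it would be parsed as a conversion. Passing "%s" as the
     * format keeps msg as plain data and silences the warning. */
    return snprintf(out, size, "%s", msg);
}
```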
* intel: Skip texsubimage fastpath for more pixel unpack state (v2) | Chad Versace | 2012-10-18 | 1 | -1/+6
  Fixes piglit tests "unpack-teximage2d --pbo=* --format=GL_BGRA" on Sandybridge+.
  The fastpath was checking an incomplete set of pixel unpack state. This patch adds checks for all the fields of gl_pixelstore_attrib that affect 2D texture uploads. Also, it begins permitting the case where GL_UNPACK_ROW_LENGTH is 0.
  Ideally, we would just ask a unicorn to JIT this fastpath for us in a way that safely handles the unpacking state. Until then, it's safer if only a small set of situations activate the fastpath.
  v2: Use _mesa_is_bufferobj(), per Anholt.
  Reviewed-by: Eric Anholt
  Signed-off-by: Chad Versace