path: root/src/mesa/drivers
Commit message (author, age, files, lines)
* i965: Generalize coord+offset lowering pass for ir_txf (Chris Forbes, 2013-10-26, 1 file, -3/+26)
  ir_txf expects an ivec* coordinate, which may be larger than ivec2; shuffle things around so that this will work.
  V2: Fix style nits, use ir_builder
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add lowering pass to fold offset into unnormalized coords (Chris Forbes, 2013-10-26, 4 files, -0/+81)
  It turns out that nonzero offsets with gsampler2DRect don't work -- they just return garbage. Work around this by folding the offset into the coord.
  Done as an IR pass rather than yet another hack in the visitors because it's clear what's going on this way. Can possibly reuse this to replace the existing txf coord+offset hacks.
  V2: Use ir_builder
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
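A minimal sketch of the folding idea, assuming a rectangle texture whose coordinates are already in texels (unnormalized), so an integer texel offset can simply be added to the coordinate and the sampler then sees a zero offset. The `vec2`/`ivec2` structs and `fold_offset_into_coord()` are hypothetical stand-ins; the actual pass rewrites GLSL IR with ir_builder rather than computing values in C.

```c
/* Hypothetical stand-ins for GLSL vec2/ivec2. */
struct vec2  { float x, y; };
struct ivec2 { int x, y; };

/* Rectangle-texture coordinates are in texels, so applying a texel offset
 * is just an addition; the sample is then issued with no offset. */
static struct vec2 fold_offset_into_coord(struct vec2 coord, struct ivec2 offset)
{
    struct vec2 result = { coord.x + (float)offset.x,
                           coord.y + (float)offset.y };
    return result;
}
```

For a normalized 2D texture this would not work directly, since the offset would first have to be divided by the texture size; that is why the pass targets unnormalized coordinates.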
* i965: Add lowering pass for splitting textureGatherOffsets (Chris Forbes, 2013-10-26, 4 files, -0/+92)
  Rewrites textureGatherOffsets(s, p, offsets) into
    gvec4(
      textureGatherOffset(s, p, offsets[0]).w,
      textureGatherOffset(s, p, offsets[1]).w,
      textureGatherOffset(s, p, offsets[2]).w,
      textureGatherOffset(s, p, offsets[3]).w
    )
  V2: Use ir_builder to be slightly clearer.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
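The rewrite can be modeled in C on a toy single-channel texture. textureGather returns its 2x2 footprint as (T[i0][j1], T[i1][j1], T[i1][j0], T[i0][j0]), so `.w` is the base texel of each offset footprint. This sketch assumes a hypothetical integer base texel (`base`) instead of deriving the footprint from a normalized coordinate, and uses clamp-to-edge addressing; all names are illustrative only.

```c
struct vec4  { float x, y, z, w; };
struct ivec2 { int x, y; };

#define TEX_SIZE 8

/* Toy single-channel texture; texel[j][i] is column i, row j. */
struct toy_texture { float texel[TEX_SIZE][TEX_SIZE]; };

static int clamp_coord(int v)
{
    return v < 0 ? 0 : (v >= TEX_SIZE ? TEX_SIZE - 1 : v);
}

static float fetch(const struct toy_texture *t, int i, int j)
{
    return t->texel[clamp_coord(j)][clamp_coord(i)];
}

/* Single-offset gather over the 2x2 footprint starting at base + offset,
 * in textureGather component order. */
static struct vec4 gather_offset(const struct toy_texture *t,
                                 struct ivec2 base, struct ivec2 offset)
{
    int i0 = base.x + offset.x, j0 = base.y + offset.y;
    int i1 = i0 + 1, j1 = j0 + 1;
    struct vec4 r = { fetch(t, i0, j1), fetch(t, i1, j1),
                      fetch(t, i1, j0), fetch(t, i0, j0) };
    return r;
}

/* The lowering: four single-offset gathers, keeping .w of each. */
static struct vec4 gather_offsets(const struct toy_texture *t,
                                  struct ivec2 base, const struct ivec2 offsets[4])
{
    struct vec4 r = {
        gather_offset(t, base, offsets[0]).w,
        gather_offset(t, base, offsets[1]).w,
        gather_offset(t, base, offsets[2]).w,
        gather_offset(t, base, offsets[3]).w,
    };
    return r;
}
```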
* i965: Add asserts to ensure that ir_tg4 offset arrays are lowered (Chris Forbes, 2013-10-26, 2 files, -0/+6)
  We don't have a message that does 4 independent offsets; a lowering pass needs to lower it to 4 normal gather4s before reaching this point.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add support for shadow comparators with gather4 (Chris Forbes, 2013-10-26, 2 files, -3/+15)
  Note that gather4_po_c's parameters are too long for SIMD16. It might be worth emitting 2xSIMD8 messages in this case at some point.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Add support for shadow comparators with gather4 (Chris Forbes, 2013-10-26, 2 files, -3/+16)
  gather4_c's argument layout is straightforward: refz just goes on the end. gather4_po_c's layout is less obvious: the array index is replaced with refz.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add Gen7 gather4_c and gather4_po_c message types (Chris Forbes, 2013-10-26, 1 file, -0/+2)
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: add support for gather4 with nonconstant offsets (Chris Forbes, 2013-10-26, 1 file, -1/+15)
  Signed-off-by: Chris Forbes <[email protected]>
* i965/fs: add support for gather4 with nonconstant offsets (Chris Forbes, 2013-10-26, 1 file, -7/+46)
  V3: fixup crazy check for whether we need to emit the coordinate after custom handling.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: relax brw_texture_offset assert (Chris Forbes, 2013-10-26, 4 files, -5/+10)
  Some texturing ops are about to have nonconstant offset support; the offset in the header in these cases should be zero.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965: Add SHADER_OPCODE_TG4_OFFSET for gather with nonconstant offsets. (Chris Forbes, 2013-10-26, 6 files, -3/+20)
  The generator code ends up clearer this way than if we had to sniff via the message length.
  Implemented via the gather4_po message in hardware, which is present in Gen7 and later.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
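The split between a header-encoded constant offset and the gather4_po path can be sketched as an opcode-selection helper. This is a simplified, hypothetical model: the enum and struct are illustrative, and the header bit layout (signed 4-bit u offset in bits 11:8, v in bits 7:4) is an assumption to be checked against the PRM, not a quote of brw_texture_offset(). It shows the rule from the commits above: when the offset is nonconstant, the header offset must be zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified opcode names modeled on the commit above. */
enum tex_opcode { SHADER_OPCODE_TG4, SHADER_OPCODE_TG4_OFFSET };

struct gather_request {
    bool offset_is_constant;
    int offset[2];          /* only meaningful when offset_is_constant */
};

/* Constant offsets ride in the message header; nonconstant ones are sent
 * as message parameters (gather4_po), so the header offset stays zero. */
static enum tex_opcode choose_gather_opcode(const struct gather_request *req,
                                            uint32_t *header_offset_bits)
{
    if (req->offset_is_constant) {
        /* Assumed packing: 4-bit two's-complement u in 11:8, v in 7:4. */
        *header_offset_bits = ((uint32_t)(req->offset[0] & 0xF) << 8) |
                              ((uint32_t)(req->offset[1] & 0xF) << 4);
        return SHADER_OPCODE_TG4;
    }
    *header_offset_bits = 0;
    return SHADER_OPCODE_TG4_OFFSET;
}
```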
* i965: add missing tg4 case in brw_instruction_name (Chris Forbes, 2013-10-26, 1 file, -0/+2)
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965: Weaken the flushing in gen7_end_transform_feedback(). (Kenneth Graunke, 2013-10-25, 1 file, -6/+6)
  Since 062317d6671 (i965: Go back to using the kernel SOL reset feature.) we've been flushing the batch on BeginTransformFeedback(). So it's not necessary to do it on EndTransformFeedback(). A PIPE_CONTROL will work.
  This makes gen7_end_transform_feedback() exactly the same as the gen6 variant. However, they'll diverge again shortly.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Stop trying to hack around MRF dep chains on gen7+ LIFO scheduling. (Eric Anholt, 2013-10-25, 1 file, -1/+1)
  This was a hack to avoid choosing to schedule all texturing before consumption of any texture results due to the way dependency chains worked out in the presence of MRFs. On gen7, we don't have MRFs, so the problem doesn't apply, and this was just badly constraining our scheduling.
  total instructions in shared programs: 1615306 -> 1612534 (-0.17%)
  instructions in affected programs: 9958 -> 7186 (-27.84%)
  GAINED: 259
  LOST: 9
  Reviewed-by: Matt Turner <[email protected]>
* i965: Try not to reverse-schedule things when doing LIFO scheduling. (Eric Anholt, 2013-10-25, 1 file, -5/+3)
  The LIFO plan was simple: Take the most recently made available instructions, and pick those first. But because of the order we were pushing things onto our list of available-to-schedule instructions, it meant that when a set of instructions was made available at the same time (for example, everything at the start of the program that didn't depend on other instructions) we'd schedule them in reverse order.
  If you had 10 texture calls in a row in your program, each with independent argument setup, we'd set up the last texture call's args and execute it first, even though we wouldn't be able to consume its results until we'd finished the other 9 texture calls (assuming consumption of texture results happens near each texture call, and combines it with another texture result, which is normal for a convolution shader).
  To fix this, walk the list for doing LIFO in the order that instructions were originally generated in the program, but choose to push newly-made-available instructions to the other end of the list instead.
  total instructions in shared programs: 1587242 -> 1586290 (-0.06%)
  instructions in affected programs: 7801 -> 6849 (-12.20%)
  GAINED: 76
  LOST: 67
  Thanks to Chia-I Wu for pointing out the bug in my first version of the patch that made it a huge loss.
  Reviewed-by: Matt Turner <[email protected]>
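The fixed ready-list discipline can be sketched as a deque: instructions available at the start are queued in program order, the scheduler always takes the front, and newly available instructions are pushed onto the front so they are picked next (LIFO). This is a simplified model with hypothetical names; the real scheduler also weighs latencies and other heuristics.

```c
#define MAX_INSTS 16

/* Hypothetical dependency DAG: children[i] lists instructions that depend
 * on i; num_parents[i] counts unscheduled instructions i still waits on. */
struct dep_graph {
    int n;
    int num_parents[MAX_INSTS];
    int num_children[MAX_INSTS];
    int children[MAX_INSTS][MAX_INSTS];
};

/* Ready list stored as a deque in a flat array. It starts in the middle so
 * it can grow in both directions; each instruction is pushed exactly once. */
struct ready_list {
    int buf[2 * MAX_INSTS];
    int head, tail;                  /* live entries are buf[head..tail) */
};

static void push_back(struct ready_list *l, int inst)  { l->buf[l->tail++] = inst; }
static void push_front(struct ready_list *l, int inst) { l->buf[--l->head] = inst; }
static int  pop_front(struct ready_list *l)            { return l->buf[l->head++]; }

/* Fills order[] with the schedule and returns the instruction count.
 * Consumes g->num_parents. */
static int schedule_lifo(struct dep_graph *g, int order[])
{
    struct ready_list ready = { .head = MAX_INSTS, .tail = MAX_INSTS };
    int count = 0;

    /* Instructions available at the start are queued in program order... */
    for (int i = 0; i < g->n; i++)
        if (g->num_parents[i] == 0)
            push_back(&ready, i);

    while (ready.head != ready.tail) {
        int inst = pop_front(&ready);
        order[count++] = inst;

        /* ...while newly available ones go to the front, so they are picked
         * next. Walking children backwards keeps them in program order. */
        for (int c = g->num_children[inst] - 1; c >= 0; c--) {
            int child = g->children[inst][c];
            if (--g->num_parents[child] == 0)
                push_front(&ready, child);
        }
    }
    return count;
}
```

With four independent instructions 0..3 and an instruction 4 that consumes the result of 0, this yields 0, 4, 1, 2, 3, rather than starting from the last independent instruction.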
* i965/fs: Match commutative expressions with reversed arguments. (Matt Turner, 2013-10-25, 1 file, -3/+23)
  total instructions in shared programs: 1645011 -> 1644938 (-0.00%)
  instructions in affected programs: 17543 -> 17470 (-0.42%)
  Reviewed-by: Eric Anholt <[email protected]>
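The matching rule can be sketched as an operand comparison of the kind a CSE-style pass uses: two expressions match if their operands are identical, or, for a commutative operation, identical after swapping. The expression representation (an opcode plus two value numbers) is a hypothetical simplification of the backend's instructions.

```c
#include <stdbool.h>

/* Hypothetical, simplified expression nodes; sources are value numbers. */
enum op { OP_ADD, OP_MUL, OP_SUB };

struct expr {
    enum op op;
    int src[2];
};

static bool is_commutative(enum op op)
{
    return op == OP_ADD || op == OP_MUL;
}

/* a + b matches b + a, but a - b does not match b - a. */
static bool exprs_match(const struct expr *a, const struct expr *b)
{
    if (a->op != b->op)
        return false;
    if (a->src[0] == b->src[0] && a->src[1] == b->src[1])
        return true;
    return is_commutative(a->op) &&
           a->src[0] == b->src[1] && a->src[1] == b->src[0];
}
```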
* i965: s/Muchnik/Muchnick/. (Matt Turner, 2013-10-25, 4 files, -4/+4)
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Reduce gl_MaxGeometryInputComponents to 64. (Paul Berry, 2013-10-24, 1 file, -1/+1)
  Although in principle there is no hardware limitation that prevents gl_MaxGeometryInputComponents from being set to 128 on Gen7, we have the following limitations in the vec4 compiler back end:
  - Registers assigned to geometry shader inputs can't be spilled or later re-used for any other purpose.
  - The last 16 registers are set aside for the "MRF hack", meaning they can only be used to send messages, and not for general purpose computation.
  - Up to 32 registers may be reserved for push constants, even if there is sufficient register pressure to make this impractical.
  A shader using 128 geometry input components, and having an input type of triangles_adjacency, would use up:
  - 1 register for r0 (which holds URB handles and various pieces of control information).
  - 1 register for gl_PrimitiveID.
  - 102 registers for geometry shader inputs (17 registers per input vertex, assuming DUAL_INSTANCED dispatch mode and allowing for one register of overhead for gl_Position and gl_PointSize, which are present in the URB map even if they are not used).
  - Up to 32 registers for push constants.
  - 16 registers for the "MRF hack".
  That's a total of 152 registers, which is well over the 128 registers the hardware supports.
  Fortunately, the GLSL 1.50 spec allows us to reduce gl_MaxGeometryInputComponents to 64. Doing that frees up 48 registers, bringing the total down to 104 registers, leaving 24 registers available to do computation.
  Fixes piglit test spec/glsl-1.50/execution/geometry/max-input-components.
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
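The register budget above can be checked with a few lines of arithmetic. The sketch encodes only the numbers stated in the commit message: two vec4 inputs (8 components) per payload register in DUAL_INSTANCED mode, one overhead register per vertex, 6 vertices for triangles_adjacency, and the fixed costs for r0, gl_PrimitiveID, push constants, and the MRF hack. The helper names are hypothetical.

```c
/* Figures taken from the commit message above. */
enum { TRIANGLES_ADJACENCY_VERTICES = 6, HW_GRF_COUNT = 128 };

/* Two vec4 inputs (8 components) share each payload register, plus one
 * register of overhead per vertex for gl_Position/gl_PointSize. */
static int gs_input_regs(int input_components, int num_vertices)
{
    return (input_components / 8 + 1) * num_vertices;
}

static int gs_total_regs(int input_components)
{
    const int r0 = 1, primitive_id = 1, push_constants = 32, mrf_hack = 16;
    return r0 + primitive_id +
           gs_input_regs(input_components, TRIANGLES_ADJACENCY_VERTICES) +
           push_constants + mrf_hack;
}
```

With 128 components this gives 17 x 6 = 102 input registers and 152 in total. With 64 components it gives 9 x 6 = 54 input registers (48 fewer) and 104 in total, leaving 128 - 104 = 24.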
* i965/gs: If a DUAL_OBJECT gs would spill, fall back to DUAL_INSTANCED. (Paul Berry, 2013-10-24, 3 files, -2/+30)
  This is similar to what we do for 16-wide vs 8-wide fragment shaders. First we try compiling the geometry shader in DUAL_OBJECT mode. If we can't do that without spilling, we fall back on DUAL_INSTANCED mode, which should require less spilling (since it uses an interleaved layout of payload registers).
  In an ideal world we'd fall back to SINGLE mode, which would allow us to interleave general-purpose registers too (resulting in even less likelihood of spilling). But at the moment, the vec4 generator and visitor classes don't have the infrastructure to interleave general purpose registers, so DUAL_INSTANCED is the best we can do.
  As a side benefit this paves the way for implementing instanced geometry shaders (which are incompatible with DUAL_OBJECT mode).
  Since most geometry shaders used in piglit testing are small, DUAL_INSTANCED mode won't get exercised very much in a normal piglit run. To force DUAL_INSTANCED mode to be used for all geometry shaders, set INTEL_DEBUG=nodualobj.
  Reviewed-by: Eric Anholt <[email protected]>
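The fallback policy can be sketched as a small decision function. Instead of running a real compiler, this hypothetical model compares an assumed DUAL_OBJECT register demand against the registers available: DUAL_OBJECT is kept only if it fits without spilling and the INTEL_DEBUG=nodualobj override is not set; otherwise the shader is compiled in DUAL_INSTANCED mode, where spilling remains allowed.

```c
#include <stdbool.h>

enum gs_dispatch_mode { GS_DUAL_OBJECT, GS_DUAL_INSTANCED };

/* Hypothetical stand-in for "compile with spilling disabled and see
 * whether register allocation succeeds". */
struct gs_compile_model {
    int regs_dual_object;   /* registers needed in DUAL_OBJECT mode */
    int regs_available;
};

static enum gs_dispatch_mode
choose_gs_dispatch_mode(const struct gs_compile_model *m, bool debug_nodualobj)
{
    if (!debug_nodualobj && m->regs_dual_object <= m->regs_available)
        return GS_DUAL_OBJECT;
    /* DUAL_INSTANCED may still spill, but needs fewer payload registers. */
    return GS_DUAL_INSTANCED;
}
```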
* i965/gs: Fix up gl_PointSize input swizzling for DUAL_INSTANCED gs. (Paul Berry, 2013-10-24, 2 files, -1/+32)
  Geometry shaders that run in "DUAL_INSTANCED" mode store their inputs in vec4's. This means that when compiling gl_PointSize input swizzling (a MOV instruction which uses a geometry shader input as both source and destination), we need to do two things:
  - Set force_writemask_all to ensure that the MOV happens regardless of which channels are enabled.
  - Set the source register region to <4;4,1> (instead of <0;4,1>) to satisfy register region restrictions.
  v2: move the source register region fixup to the top of vec4_generator::generate_vec4_instruction(), so that it applies to all instructions rather than just MOV.
  Reviewed-by: Eric Anholt <[email protected]>
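A Gen register region <vstride;width,hstride> maps execution channel n to element (n / width) * vstride + (n % width) * hstride, relative to the region origin. The helper below (hypothetical name) computes that mapping, which shows the difference between the two regions mentioned above for an 8-channel instruction: with <0;4,1> both groups of four channels read the same four elements, while <4;4,1> makes the second group read the next four elements.

```c
/* Element offsets (in units of the data type) read by each of exec_size
 * channels through the region <vstride;width,hstride>. */
static void region_element_offsets(int vstride, int width, int hstride,
                                   int exec_size, int *out)
{
    for (int n = 0; n < exec_size; n++)
        out[n] = (n / width) * vstride + (n % width) * hstride;
}
```

For example, <0;4,1> with 8 channels gives 0,1,2,3,0,1,2,3 and <4;4,1> gives 0 through 7.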
* i965/gs: Add the ability to compile a DUAL_INSTANCED geometry shader. (Paul Berry, 2013-10-24, 4 files, -8/+30)
  Not yet enabled.
  Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Add the ability to suppress register spilling. (Paul Berry, 2013-10-24, 7 files, -10/+23)
  In future patches, this will allow us to first try compiling a geometry shader in DUAL_OBJECT mode (which is more efficient but uses more registers) and then if spilling is required, fall back on DUAL_INSTANCED mode.
  Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: if register allocation fails, don't try to schedule. (Paul Berry, 2013-10-24, 1 file, -1/+1)
  Otherwise the scheduler would be invoked with prog_data->total_grf == 0, causing havoc.
  In a future patch, this will allow us to try compiling a geometry shader in DUAL_OBJECT mode with spilling disabled, and then fall back to DUAL_INSTANCED mode if that failed.
  Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Add the ability for attributes to be interleaved. (Paul Berry, 2013-10-24, 3 files, -6/+27)
  When geometry shaders are operated in "single" or "dual instanced" mode, a single set of geometry shader inputs is interleaved into the thread payload (with each payload register containing a pair of inputs) in order to save register space.
  This patch modifies vec4_visitor::lower_attributes_to_hw_regs so that it can handle the interleaved format.
  Reviewed-by: Eric Anholt <[email protected]>
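The interleaved layout can be sketched as an attribute-to-payload mapping. A GRF is 256 bits, i.e. eight 32-bit channels, so it can hold two vec4 slots. Without interleaving each attribute gets its own register; with interleaving, attribute n lands in register n / 2, in the low or high vec4 half. The helper and struct are hypothetical, and offsets are relative to the first attribute register.

```c
#include <stdbool.h>

/* Hypothetical payload location of an input attribute. */
struct payload_loc {
    int reg;      /* register index relative to the first attribute register */
    int subreg;   /* starting 32-bit channel within that register */
};

static struct payload_loc attribute_location(int attr, bool interleaved)
{
    struct payload_loc loc;
    if (interleaved) {
        loc.reg = attr / 2;           /* two attributes per register */
        loc.subreg = (attr % 2) * 4;  /* low or high vec4 half */
    } else {
        loc.reg = attr;               /* one attribute per register */
        loc.subreg = 0;
    }
    return loc;
}
```

This halving of payload registers is what makes the "17 registers per input vertex" figure in the gl_MaxGeometryInputComponents commit work out: 32 vec4 inputs occupy 16 registers, plus one of overhead.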
* i965/gs: Set force_writemask_all when setting up g0. (Paul Berry, 2013-10-24, 1 file, -2/+3)
  All geometry shaders begin with this instruction:
    mov(1) g0.2<1>:ud 0x0:ud { align1 }
  which sets up GRF0 properly for scratch reads and writes.
  Since this instruction has a SIMD size of 1, it will only have an effect if the first channel is enabled. In practice, the hardware seems to always dispatch geometry shaders with the first channel enabled, but I can't find anything in the docs to guarantee that. So to be on the safe side, set force_writemask_all on the instruction, which guarantees that it will have the desired effect regardless of which channels are enabled.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gs: Precompile geometry shaders. (Paul Berry, 2013-10-24, 4 files, -0/+48)
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Extract function to set up vec4 prog key for precompiling. (Paul Berry, 2013-10-24, 3 files, -14/+27)
  This will allow us to re-use it for precompiling geometry shaders.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Remove uses_clip_distance from program key. (Paul Berry, 2013-10-24, 4 files, -12/+3)
  This should never have been in the program key in the first place, since it's determined by the shader source, not by GL state.
  Change the code to just refer to gl_program::UsesClipDistanceOut directly.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* glsl: Move UsesClipDistance from gl_{vertex,geometry}_program into gl_program. (Paul Berry, 2013-10-24, 2 files, -2/+4)
  This will make it easier for back-ends to share code between geometry shader and vertex shader compilation.
  Also, it is renamed to "UsesClipDistanceOut" to clarify that (a) in geometry shaders, it refers to the gl_ClipDistance output rather than the gl_ClipDistance input, and (b) it is irrelevant in fragment shaders.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Fix gl_MaxCombinedTextureImageUnits. (Paul Berry, 2013-10-24, 1 file, -1/+6)
  We've always overridden ctx->Const.{Vertex,Fragment}Program.MaxTextureImageUnits to reflect the number of texture image units supported by the hardware (rather than using the default values assigned by Mesa core) so it seems sensible to do that for GeometryProgram.MaxTextureImageUnits too. We set it to 0 if geometry shaders aren't supported.
  Once that is done, we can just unconditionally add GeometryProgram.MaxTextureImageUnits to MaxCombinedTextureImageUnits.
  Fixes piglit test "spec/glsl-1.50/built-in constants/gl_MaxCombinedTextureImageUnits".
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* mesa: Remove dricore from the build. (Eric Anholt, 2013-10-24, 2 files, -2/+2)
  No driver uses it any more, and it's been replaced by megadrivers.
  v2: Remove always-on conditional for NEED_LIBPROGRAM (review by Emil)
  Reviewed-by: Matt Turner <[email protected]> (v1)
  Reviewed-by: Emil Velikov <[email protected]>
* swrast: Build the driver into the shared mesa_dri_drivers.so. (Eric Anholt, 2013-10-24, 5 files, -27/+28)
  v2: drop dridir now that it's unused.
  v3: Fix linking after rebase when building just swrast from classic but a drm-using gallium driver.
  v4: Consistently put spaces around += in the updated Makefile.am block.
  v5: Set a global driverAPI variable so loaders don't have to update to createNewScreen2() (though they may want to for thread safety).
  Reviewed-by: Matt Turner <[email protected]> (v3)
  Reviewed-by: Emil Velikov <[email protected]>
* radeon: Build the driver into the shared mesa_dri_drivers.so. (Eric Anholt, 2013-10-24, 11 files, -43/+140)
  This required some reordering of headers to ensure that the symbol name redefines happened before any prototypes.
  v2: drop dridir now that it's unused.
  v3: Consistently put spaces around += in the updated Makefile.am blocks.
  v4: Set a global driverAPI variable so loaders don't have to update to createNewScreen2() (though they may want to for thread safety).
  Reviewed-by: Matt Turner <[email protected]> (v2)
  Reviewed-by: Emil Velikov <[email protected]>
* i915: Build the driver into the shared mesa_dri_drivers.so. (Eric Anholt, 2013-10-24, 6 files, -20/+125)
  i915 has symbols for formerly-shared code that conflict with i965, so we define them away using gen-symbol-redefs.py.
  Options considered:
  - This option. Downsides: The symbols in profiling and debugging don't match the source. The symbol list may change in the future and we won't notice without manually running the tool again.
  - Use objcopy --localize-hidden to automatically demote our symbols to locals. This didn't work on i965 due to c++ weak symbols (which can't be localized), but could work on i915. We could do it on i915 only, but it does produce libtool warnings at link time due to libtool not knowing if the resulting .o file is safe to link (stupid libtool). Plus you end up with different symbols of the same name, which is confusing for debugging too. On the other hand, no future symbol conflicts long term.
  - Write our own libelf tool that handles c++ weak symbols like we want and apply it to all drivers. All the downsides of above, but applies uniformly across drivers.
  - Edit the files to just rename all the i915 or i965 symbols that conflict. There are on the order of 100 that have a prefix we used to share, so it would take a bit of typing. Fewest downsides, but still can have conflicts long term.
  Ultimately, this is the least invasive change at the moment, and we can see if the "more symbol conflicts appear later" thing is a real concern or not.
  Note that the ability to compile a version of i915 without INTEL_DEBUG env support is dropped. It's too useful.
  v2: drop dridir now that it's unused.
  v3: Consistently put spaces around += in the updated Makefile.am block.
  v4: Set a global driverAPI variable so loaders don't have to update to createNewScreen2() (though they may want to for thread safety).
  Reviewed-by: Matt Turner <[email protected]> (v2)
  Reviewed-by: Emil Velikov <[email protected]>
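The "define them away" approach works because a #define seen before every declaration of a symbol makes the compiler emit the renamed symbol, while the source keeps using the original name. The symbol names below are invented for illustration; the actual list and prefix produced by gen-symbol-redefs.py are not shown here.

```c
/* Hypothetical generated redefine; it must precede every prototype and
 * definition of the symbol (the radeon commit notes the same ordering need). */
#define intel_example_shared_fn i915_intel_example_shared_fn

/* Source code keeps using the original name... */
int intel_example_shared_fn(int x);

int intel_example_shared_fn(int x)
{
    return x * 2;
}
/* ...but the object file defines i915_intel_example_shared_fn, so it no
 * longer collides with an identically named i965 symbol in the same .so. */
```

The trade-off listed in the commit is visible here: a debugger or profiler reports the prefixed name, which does not appear literally in the source.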
* dri: Add a tool for generating #defines to namespace driver global symbols. (Eric Anholt, 2013-10-24, 1 file, -0/+68)
  Acked-by: Matt Turner <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* nouveau: Build the driver into the shared mesa_dri_drivers.so. (Eric Anholt, 2013-10-24, 4 files, -20/+23)
  v2: drop dridir now that it's unused.
  v3: Consistently put spaces around += in the updated Makefile.am block.
  v4: Set a global driverAPI variable so loaders don't have to update to createNewScreen2() (though they may want to for thread safety).
  v5: Fix missed public symbol in nouveau. (caught by Emil)
  Reviewed-by: Matt Turner <[email protected]> (v2)
  Reviewed-by: Emil Velikov <[email protected]>
* i965: Build the driver into a shared mesa_dri_drivers.so. (Eric Anholt, 2013-10-24, 8 files, -28/+131)
  Previously, we've split things such that mesa core is in libdricore, exposing the whole Mesa core interface in the global namespace, and the i965_dri.so code all links against that. Along with polluting application namespace terribly, it requires extra PLT indirections and prevents LTO.
  Instead, we can build all of the driver contents into the same .so with just a few symbols exposed to be referenced from the actual driver .so file, allowing LTO and reducing our exposed symbol count massively.
  FPS improvement on GLB2.7 with INTEL_NO_HW=1: 2.61061% +/- 1.16957% (n=50) (without LTO, just the PLT reductions from this commit)
  Note that the X Server requires commit 7ecfab47eb221dbb996ea6c033348b8eceaeb893 to successfully load this driver!
  v2: Set a global driverAPI variable so loaders don't have to update to createNewScreen2() (though they may want to for thread safety).
  v3: Drop AM_CPPFLAGS addition (Emil pointed out I'd missed some cflags that would be necessary, though only if we actually relied on them).
  v4: Fix install with DESTDIR set.
  Reviewed-by: Matt Turner <[email protected]> (v1)
  Reviewed-by: Emil Velikov <[email protected]> (v2)
* dri: Implement a DRI vtable extension to replace the global driDriverAPI. (Eric Anholt, 2013-10-24, 1 file, -0/+13)
  As we move to megadrivers, we are unable to build multiple drivers with the same public global symbol per driver (Think an X Server with an intel and a nouveau driver, and the X Server implementing indirect for both -- we have to actually talk to the right driver). By slipping the driDriverAPI vtable into the driver's extension list, we can replace the usage of the global symbol with usage of the loader-dlsym()ed driver information.
  v2: Pull in the hunk to avoid crashing on null driver_extensions. Thanks, Emil!
  Reviewed-by: Matt Turner <[email protected]> (v1)
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
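The lookup pattern can be sketched with simplified, hypothetical structs: each extension starts with a common header (name, version), the driver's list is NULL-terminated, and the vtable extension embeds the header as its first member so a matching entry can be cast to the containing struct. The names and the extension string below are assumptions, not the actual dri_interface.h definitions; the NULL check mirrors the v2 note about null driver_extensions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the DRI extension structures. */
struct driver_api { const char *driver_name; };

struct dri_extension { const char *name; int version; };

struct driver_vtable_extension {
    struct dri_extension base;       /* first member, so the cast below is valid */
    const struct driver_api *vtable;
};

#define DRIVER_VTABLE_EXTENSION_NAME "DriverVtable"   /* hypothetical */

/* Find the driver's vtable in its NULL-terminated extension list instead
 * of relying on a global symbol shared by every driver. */
static const struct driver_api *
find_driver_api(const struct dri_extension *const *extensions)
{
    if (extensions == NULL)
        return NULL;
    for (size_t i = 0; extensions[i] != NULL; i++) {
        if (strcmp(extensions[i]->name, DRIVER_VTABLE_EXTENSION_NAME) == 0)
            return ((const struct driver_vtable_extension *)extensions[i])->vtable;
    }
    return NULL;
}
```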
* dri: Pass in the dlsym()ed driver extension to screen creation. (Eric Anholt, 2013-10-24, 1 file, -11/+33)
  This will allow a megadrivers build to reference the actual driver being loaded from the shared dri_util screen creation code.
  v2: Fix indentation, fallback case in EGL (review by Emil).
  Reviewed-by: Matt Turner <[email protected]> (v1)
  Reviewed-by: Chad Versace <[email protected]> (v1)
  Reviewed-by: Emil Velikov <[email protected]>
* dri: Move driver config options to dri driver extensions. (Eric Anholt, 2013-10-24, 3 files, -15/+29)
  This way they aren't all sitting in the global namespace (with the same name per driver).
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* i965: Print more debuginfo in intel_texsubimage_memcpy() (Chad Versace, 2013-10-24, 1 file, -2/+8)
  Print info about packing, format, type, and tiling. This will help debug future issues with this fastpath.
  Reviewed-by: Frank Henigman <[email protected]>
  Signed-off-by: Chad Versace <[email protected]>
* i965: Fix glTexImage when packing alignment != cpp (Chad Versace, 2013-10-24, 1 file, -2/+11)
  Fixes texture corruption of Weston clients on cairo-glesv2 backend. Commit 49ed599 introduced the bug.
  Corruption occurred when glTexSubImage called intel_texsubimage_tiled_memcpy() with:
    x,y=10,9
    w,h=7,7
    format=GL_ALPHA(0x1906)
    type=GL_UNSIGNED_BYTE(0x1401)
    gl_format=MESA_FORMAT_A8(0x18)
    packing.alignment=4
  The function miscalculated the source image's stride as w*cpp=7 without taking into account the packing alignment. The actual stride was 8.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70435
  Reported-by: U. Artie Eoff <[email protected]>
  Tested-by: Kristian Høgsberg <[email protected]>
  Reviewed-by: Frank Henigman <[email protected]>
  Signed-off-by: Chad Versace <[email protected]>
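The corrected stride computation can be sketched as rounding the row size up to the unpack alignment. GL_UNPACK_ALIGNMENT is one of 1, 2, 4, or 8 (always a power of two), so a mask works; the helper name is hypothetical. For the failing case above (w=7, cpp=1, alignment=4) it returns 8 rather than 7.

```c
/* Source row stride honoring GL_UNPACK_ALIGNMENT: width * cpp rounded up
 * to a multiple of the alignment (1, 2, 4, or 8; a power of two). */
static unsigned src_row_stride(unsigned width, unsigned cpp, unsigned alignment)
{
    unsigned row_bytes = width * cpp;
    return (row_bytes + alignment - 1) & ~(alignment - 1);
}
```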
* i965/fs: Only unroll high-accuracy dFdy() from SIMD16 to SIMD8 on gen4 and IVB. (Paul Berry, 2013-10-23, 1 file, -10/+27)
  In commit 800610f (i965/fs: Improve accuracy of dFdy() to match dFdx()) I unrolled the high-accuracy dFdy() computation from a single SIMD16 instruction to two SIMD8 instructions because of text I found in the i965 (gen4) PRM saying that instruction compression could not be used in align16 mode. I couldn't find similar text in later hardware docs, and I observed problems trying to use instruction compression in align16 mode on Ivy Bridge, so I assumed that the restriction still applied and the associated documentation had simply been lost.
  After consultation with the hardware engineers, it turns out this is not the case. In point of fact, the restriction was dropped in gen5, re-introduced in Ivy Bridge, and dropped again in Haswell. The reason I didn't notice this is that in the Ivy Bridge documentation, the restriction was in a different section, and described using different language.
  Now that we know that the restriction only applies to Gen4 and Ivy Bridge, we can limit the unrolling to those platforms.
  Tested on gen5, gen6, and gen7 (both Ivy Bridge and Haswell).
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
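The platform rule reduces to a single predicate: align16 instruction compression is forbidden on Gen4 and on Ivy Bridge (Gen7 without Haswell), and allowed on Gen5, Gen6, and Haswell. The function name and its (gen, is_haswell) parameters are a hypothetical simplification of how a driver might describe the device.

```c
#include <stdbool.h>

/* True where the SIMD16 high-accuracy dFdy() must be split into two SIMD8
 * instructions because align16 compression is not allowed. */
static bool must_unroll_align16_simd16(int gen, bool is_haswell)
{
    return gen == 4 || (gen == 7 && !is_haswell);
}
```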
* i965: Add perf debug hint when the app makes us do index buffer scanning. (Eric Anholt, 2013-10-23, 1 file, -1/+4)
  Reviewed-by: Jordan Justen <[email protected]>
* i965: Try to avoid stalls on the GPU when doing glBufferSubData(). (Eric Anholt, 2013-10-23, 9 files, -36/+150)
  On DOTA2, framerate on dota2-de1.dem in windowed mode on my laptop improves by 7.69854% +/- 0.909163% (n=3). In a microbenchmark hitting this code path (wall time of piglit vbo-subdata-many), runtime decreases from 0.8 to 0.05 seconds.
  v2: Use out of range start/end instead of separate bool for the active flag (suggestion by Jordan), fix double-upload in the stalling path.
  Reviewed-by: Jordan Justen <[email protected]>
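The v2 note's idea of encoding "inactive" as an out-of-range start/end can be sketched as follows: the buffer tracks the byte range the GPU may still be reading, an empty interval with start > end means nothing is active, and a glBufferSubData() range that does not overlap the active range can be written without waiting. The struct and function names are hypothetical; the sketch only models the overlap test, not the mapping or blit paths.

```c
#include <stdbool.h>
#include <limits.h>

/* Byte range [start, end) the GPU may still be reading. "Inactive" is an
 * empty, out-of-range interval rather than a separate bool. */
struct gpu_active_range { unsigned start, end; };

static void mark_inactive(struct gpu_active_range *r)
{
    r->start = UINT_MAX;
    r->end = 0;
}

static void mark_active(struct gpu_active_range *r, unsigned offset, unsigned size)
{
    if (offset < r->start)
        r->start = offset;
    if (offset + size > r->end)
        r->end = offset + size;
}

/* Writing [offset, offset + size) needs no stall if it misses the active
 * range; with the inactive encoding, end == 0 makes this always true. */
static bool can_write_without_stall(const struct gpu_active_range *r,
                                    unsigned offset, unsigned size)
{
    return offset + size <= r->start || r->end <= offset;
}
```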
* i965: Be sure to reset brw->vb.buffers[] when trying to redo vertex setup. (Eric Anholt, 2013-10-23, 1 file, -0/+2)
  The brw_prepare_vertices that sets up buffers[] depends on these parameters, so don't let brw_prepare_vertices() skip it.
  Reviewed-by: Jordan Justen <[email protected]>
* i965: Add support for GL_ARB_texture_buffer_range. (Eric Anholt, 2013-10-23, 4 files, -8/+33)
  Supporting this extension turns out to simplify our code a bit over not supporting this extension, once the glBufferSubData() synchronization code lands.
  v2: Use 16 byte alignment like we do for uniform buffers, due to unaligned access penalties.
  Reviewed-by: Jordan Justen <[email protected]> (v1)
* i965: Add a note about the late-allocation in intel_bufferobj_buffer(). (Eric Anholt, 2013-10-23, 1 file, -0/+4)
  This was mostly for the i915 system-memory VBO code, which we don't have any more, but since that existed we've ended up producing dependencies on it being there.
  Reviewed-by: Jordan Justen <[email protected]>
* i965: Drop intel_bufferobj_source(). (Eric Anholt, 2013-10-23, 4 files, -30/+8)
  Since src_offset was always 0, it wasn't doing anything for us beyond intel_bufferobj_buffer().
  Reviewed-by: Jordan Justen <[email protected]>
* i965: Fix texture buffer rendering after a whole buffer replacement. (Eric Anholt, 2013-10-23, 1 file, -0/+2)
  If glBufferData(), glBufferSubData(0, obj->Size), or similar happens, we get a new drm_intel_bo for the buffer object, and thus need to re-upload texture buffer state so we point at the new data.
  Fixes the new piglit GL_ARB_texture_buffer_object/data-sync
  Cc: "9.2" <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>