Commit message | Author | Age | Files | Lines
* glsl: Pass variable mode into ast_process_structure_or_interface_block().Paul Berry2013-10-241-16/+23
| | | | | | | | | Later patches will use this information to do proper error checking of interpolation qualifiers that appear inside of interface blocks. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Extract interpretation of interpolation to its own function.Paul Berry2013-10-241-28/+42
| | | | | | | | | In future patches, we will need this in order to interpret interpolation qualifiers that appear inside interface blocks. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Pull interpolation_string() out of ir_variable.Paul Berry2013-10-244-20/+22
    Future patches will need to call this function when there isn't an ir_variable present to refer to.

    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965: Reduce gl_MaxGeometryInputComponents to 64.Paul Berry2013-10-241-1/+1
    Although in principle there is no hardware limitation that prevents gl_MaxGeometryInputComponents from being set to 128 on Gen7, we have the following limitations in the vec4 compiler back end:

    - Registers assigned to geometry shader inputs can't be spilled or later re-used for any other purpose.
    - The last 16 registers are set aside for the "MRF hack", meaning they can only be used to send messages, and not for general purpose computation.
    - Up to 32 registers may be reserved for push constants, even if there is sufficient register pressure to make this impractical.

    A shader using 128 geometry input components, and having an input type of triangles_adjacency, would use up:

    - 1 register for r0 (which holds URB handles and various pieces of control information).
    - 1 register for gl_PrimitiveID.
    - 102 registers for geometry shader inputs (17 registers per input vertex, assuming DUAL_INSTANCED dispatch mode and allowing for one register of overhead for gl_Position and gl_PointSize, which are present in the URB map even if they are not used).
    - Up to 32 registers for push constants.
    - 16 registers for the "MRF hack".

    That's a total of 152 registers, which is well over the 128 registers the hardware supports.

    Fortunately, the GLSL 1.50 spec allows us to reduce gl_MaxGeometryInputComponents to 64. Doing that frees up 48 registers, bringing the total down to 104 registers, leaving 24 registers available to do computation.

    Fixes piglit test spec/glsl-1.50/execution/geometry/max-input-components.

    Reviewed-by: Eric Anholt <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
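The register arithmetic in this message can be checked with a short sketch. The function and parameter names are hypothetical, and the per-vertex formula (two vec4 inputs interleaved per register, plus one overhead register) is an assumption inferred from the 17-registers-per-vertex figure quoted above, not taken from the driver source.

```python
def gs_register_budget(input_components, vertices_per_primitive=6,
                       push_constant_regs=32, mrf_hack_regs=16,
                       total_grf=128):
    """Return (registers used, registers left for computation).

    Assumes DUAL_INSTANCED dispatch: inputs are vec4 slots, two slots share
    one register, and each vertex carries one extra register of overhead
    for gl_Position/gl_PointSize. triangles_adjacency has 6 input vertices.
    """
    vec4_slots = input_components // 4
    regs_per_vertex = vec4_slots // 2 + 1
    input_regs = regs_per_vertex * vertices_per_primitive
    # r0 and gl_PrimitiveID take one register each.
    used = 1 + 1 + input_regs + push_constant_regs + mrf_hack_regs
    return used, total_grf - used


print(gs_register_budget(128))  # worst case with the old limit
print(gs_register_budget(64))   # worst case with the reduced limit
```

Under these assumptions the 128-component case needs 152 registers (24 more than exist), while the 64-component case needs 104 and leaves 24 free, matching the totals in the commit message.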
* i965/gs: If a DUAL_OBJECT gs would spill, fall back to DUAL_INSTANCED.Paul Berry2013-10-243-2/+30
| | | | | | | | | | | | | | | | | | | | | | | | This is similar to what we do for 16-wide vs 8-wide fragment shaders. First we try compiling the geometry shader in DUAL_OBJECT mode. If we can't do that without spilling, we fall back on DUAL_INSTANCED mode, which should require less spilling (since it uses an interleaved layout of payload registers). In an ideal world we'd fall back to SINGLE mode, which would allow us to interleave general-purpose registers too (resulting in even less likelihood of spilling). But at the moment, the vec4 generator and visitor classes don't have the infrastructure to interleave general purpose registers, so DUAL_INSTANCED is the best we can do. As a side benefit this paves the way for implementing instanced geometry shaders (which are incompatible with DUAL_OBJECT mode). Since most geometry shaders used in piglit testing are small, DUAL_INSTANCED mode won't get exercised very much in a normal piglit run. To force DUAL_INSTANCED mode to be used for all geometry shaders, set INTEL_DEBUG=nodualobj. Reviewed-by: Eric Anholt <[email protected]>
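The fallback strategy described above could be sketched roughly as follows. Everything here is illustrative: `SpillRequired`, `toy_compile`, and `compile_gs` are hypothetical names, the toy compiler only compares a made-up register count against 128, and the `force_dual_instanced` flag merely stands in for the effect of INTEL_DEBUG=nodualobj.

```python
class SpillRequired(Exception):
    """Raised by the toy compiler when a shader cannot fit without spilling."""


def toy_compile(shader, mode, allow_spilling):
    # Hypothetical stand-in for the real back end: 'shader' maps each
    # dispatch mode to the number of registers it would need.
    needed = shader["regs"][mode]
    if needed > 128 and not allow_spilling:
        raise SpillRequired(mode)
    return {"mode": mode, "spills": needed > 128}


def compile_gs(shader, compile_fn=toy_compile, force_dual_instanced=False):
    """Try DUAL_OBJECT with spilling suppressed, else fall back."""
    if not force_dual_instanced:
        try:
            return compile_fn(shader, "DUAL_OBJECT", allow_spilling=False)
        except SpillRequired:
            pass  # fall through to the mode with the interleaved payload
    return compile_fn(shader, "DUAL_INSTANCED", allow_spilling=True)


small = {"regs": {"DUAL_OBJECT": 40, "DUAL_INSTANCED": 30}}
large = {"regs": {"DUAL_OBJECT": 150, "DUAL_INSTANCED": 110}}
print(compile_gs(small)["mode"])
print(compile_gs(large)["mode"])
```

The second attempt allows spilling because, per the message, DUAL_INSTANCED is currently the last mode available to fall back on.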
* i965/gs: Fix up gl_PointSize input swizzling for DUAL_INSTANCED gs.Paul Berry2013-10-242-1/+32
    Geometry shaders that run in "DUAL_INSTANCED" mode store their inputs in vec4's. This means that when compiling gl_PointSize input swizzling (a MOV instruction which uses a geometry shader input as both source and destination), we need to do two things:

    - Set force_writemask_all to ensure that the MOV happens regardless of which channels are enabled.
    - Set the source register region to <4;4,1> (instead of <0;4,1>) to satisfy register region restrictions.

    v2: move the source register region fixup to the top of vec4_generator::generate_vec4_instruction(), so that it applies to all instructions rather than just MOV.

    Reviewed-by: Eric Anholt <[email protected]>
* i965/gs: Add the ability to compile a DUAL_INSTANCED geometry shader.Paul Berry2013-10-244-8/+30
| | | | | | Not yet enabled. Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Add the ability to suppress register spilling.Paul Berry2013-10-247-10/+23
| | | | | | | | | In future patches, this will allow us to first try compiling a geometry shader in DUAL_OBJECT mode (which is more efficient but uses more registers) and then if spilling is required, fall back on DUAL_INSTANCED mode. Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: if register allocation fails, don't try to schedule.Paul Berry2013-10-241-1/+1
| | | | | | | | | | | Otherwise the scheduler would be invoked with prog_data->total_grf == 0, causing havoc. In a future patch, this will allow us to try compiling a geometry shader in DUAL_OBJECT mode with spilling disabled, and then fall back to DUAL_INSTANCED mode if that failed. Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Add the ability for attributes to be interleaved.Paul Berry2013-10-243-6/+27
| | | | | | | | | | | | When geometry shaders are operated in "single" or "dual instanced" mode, a single set of geometry shader inputs is interleaved into the thread payload (with each payload register containing a pair of inputs) in order to save register space. This patch modifies vec4_visitor::lower_attributes_to_hw_regs so that it can handle the interleaved format. Reviewed-by: Eric Anholt <[email protected]>
* i965/gs: Set force_writemask_all when setting up g0.Paul Berry2013-10-241-2/+3
    All geometry shaders begin with this instruction:

        mov(1) g0.2<1>:ud 0x0:ud { align1 }

    which sets up GRF0 properly for scratch reads and writes. Since this instruction has a SIMD size of 1, it will only have an effect if the first channel is enabled. In practice, the hardware seems to always dispatch geometry shaders with the first channel enabled, but I can't find anything in the docs to guarantee that. So to be on the safe side, set force_writemask_all on the instruction, which guarantees that it will have the desired effect regardless of which channels are enabled.

    Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: set explicit_location correctly in lower_named_interface_blocks.Paul Berry2013-10-241-0/+1
    When lower_named_interface_blocks lowers a built-in interface block member to an ir_variable, it needs to set explicit_location in the ir_variable. Otherwise the linker gets confused and treats the variable as a generic varying.

    Fixes the following piglit tests, which were regressed by commit 63974c0 (glsl: Simplify the interface to link_invalidate_variable_locations):

    - clip-distance-bulk-copy
    - clip-distance-in-bulk-read
    - clip-distance-in-explicitly-sized
    - clip-distance-in-param
    - clip-distance-in-values
    - core-inputs
    - gs-redeclares-both-pervertex-blocks
    - gs-redeclares-pervertex-in-only
    - redeclare-pervertex-subset-vs-to-gs
    - unsized-in-named-interface-block-gs
    - unsized-in-named-interface-block-multiple
    - unsized-in-unnamed-interface-block-gs
    - unsized-in-unnamed-interface-block-multiple

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70820

    Reviewed-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/gs: Precompile geometry shaders.Paul Berry2013-10-244-0/+48
| | | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Extract function to set up vec4 prog key for precompiling.Paul Berry2013-10-243-14/+27
| | | | | | | | This will allow us to re-use it for precompiling geometry shaders. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Remove uses_clip_distance from program key.Paul Berry2013-10-244-12/+3
| | | | | | | | | | This should never have been in the program key in the first place, since it's determined by the shader source, not by GL state. Change the code to just refer to gl_program::UsesClipDistanceOut directly. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Move UsesClipDistance from gl_{vertex,geometry}_program into gl_program.Paul Berry2013-10-244-9/+13
| | | | | | | | | | | | This will make it easier for back-ends to share code between geometry shader and vertex shader compilation. Also, it is renamed to "UsesClipDistanceOut" to clarify that (a) in geometry shaders, it refers to the gl_ClipDistance output rather than the gl_ClipDistance input, and (b) it is irrelevant in fragment shaders. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl/gs: Fix transform feedback of gl_ClipDistance.Paul Berry2013-10-243-1/+9
| | | | | | | | | | | | | | | | | | | | | | Since gl_ClipDistance is lowered from an array of floats to an array of vec4's during compilation, transform feedback has special logic to keep track of the pre-lowered array size so that attempting to perform transform feedback on gl_ClipDistance produces a result with the correct size. Previously, this special logic always consulted the vertex shader's size for gl_ClipDistance. This patch fixes it so that it uses the geometry shader's size for gl_ClipDistance when a geometry shader is in use. Fixes piglit test spec/glsl-1.50/transform-feedback-type-and-size. v2: Change the type of LastClipDistanceArraySize to "unsigned", and clarify the comment above it. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Fix gl_MaxCombinedTextureImageUnits.Paul Berry2013-10-241-1/+6
    We've always overridden ctx->Const.{Vertex,Fragment}Program.MaxTextureImageUnits to reflect the number of texture image units supported by the hardware (rather than using the default values assigned by Mesa core), so it seems sensible to do that for GeometryProgram.MaxTextureImageUnits too. We set it to 0 if geometry shaders aren't supported.

    Once that is done, we can just unconditionally add GeometryProgram.MaxTextureImageUnits to MaxCombinedTextureImageUnits.

    Fixes piglit test "spec/glsl-1.50/built-in constants/gl_MaxCombinedTextureImageUnits".

    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* freedreno/a3xx/compiler: relative addressingRob Clark2013-10-241-1/+123
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: fix const/rel/const-rel encodingRob Clark2013-10-244-88/+300
    The encoding of constant, relative, and relative-const src registers is a bit more complex than originally thought, which gives an extra bit to encode const reg # at expense of taking a bit from relative offset.

    In most cases a3xx seems to actually use a scheme whereby it can encode an extra bit for const register. You have three possible encodings in thirteen bits:

        register: (11 bits for N.c)
           00...........  rN.c
        relative: (10 bits for N)
           010..........  r<a0.x + N>
           011..........  c<a0.x + N>
        const: (12 bits for N.c)
           1............  cN.c

    Which means we can deal w/ more consts than previously thought.

    Signed-off-by: Rob Clark <[email protected]>
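The three prefix patterns listed in this message can be illustrated with a small decoder. This is only a sketch: `decode_src` is a hypothetical name, the packing of N.c with the component in the low two bits is an assumption, and the relative offset N is returned as a raw unsigned field without guessing whether the hardware treats it as signed.

```python
def decode_src(bits13):
    """Classify a 13-bit src register field by its leading bits.

    Returns (kind, N, component). For the relative forms the component is
    None because the message only describes a 10-bit offset N.
    """
    if not 0 <= bits13 < (1 << 13):
        raise ValueError("expected a 13-bit value")

    if bits13 >> 12 == 1:
        # 1............ : const, 12 bits for N.c
        payload = bits13 & 0xFFF
        return ("const", payload >> 2, "xyzw"[payload & 3])

    if bits13 >> 11 == 0:
        # 00........... : plain register, 11 bits for N.c
        payload = bits13 & 0x7FF
        return ("register", payload >> 2, "xyzw"[payload & 3])

    # 01x.......... : relative addressing, 10 bits for N
    kind = "relative-const" if (bits13 >> 10) & 1 else "relative-register"
    return (kind, bits13 & 0x3FF, None)


print(decode_src(0b1_000000000101))   # const form
print(decode_src(0b010_0000000011))   # r<a0.x + N> form
```

Because the const form is selected by a single leading bit, it keeps 12 payload bits, one more than the plain register form; that is the extra const bit the message refers to.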
* freedreno/a3xx: add blend stateRob Clark2013-10-242-5/+23
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/resource: fail more gracefullyRob Clark2013-10-241-1/+13
| | | | | | Fail more gracefully when buffer allocation/import fails. Signed-off-by: Rob Clark <[email protected]>
* gallivm: implement fully accurate corner filtering for seamless cube mapsRoland Scheidegger2013-10-251-13/+151
    d3d10 requires that cube corners are filtered with accurate weights (that is, the weight of the non-existing corner texel should be evenly distributed to the other 3 texels). OpenGL does not require this (but recommends it).

    This requires us to use different filtering code, since we need per-texel weights which our 2d lerp doesn't (and can't) do. And of course the (now per element) weights need to be adjusted too for it to work.

    Invoke the new filtering code whenever there's an edge to keep things simpler, as it will work for edges too not just corners but of course it's only needed with corners.

    More ugly code for not much gain but at least a hacked up cubemap demo shows very nice corners now... Not sure yet if and how this should be configurable...

    v2: incorporate feedback from Jose, only use special corner filtering code when there's a corner not when there's only an edge (as corner filtering code is slower, though a perf difference was only measurable when always forcing edge code). Plus some minor style fixes.

    Reviewed-by: Jose Fonseca <[email protected]>
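The weight redistribution described above (the missing corner texel's weight spread evenly over the other three) can be shown with scalar arithmetic. This is a minimal sketch, not the gallivm implementation: `corner_weights` is a hypothetical helper, and the texel ordering (00, 10, 01, 11) is an assumption made for illustration.

```python
def corner_weights(s, t, missing):
    """Per-texel bilinear weights with one non-existent corner texel.

    s, t are the fractional filter coordinates in [0, 1]; 'missing' is the
    index (0..3) of the texel that does not exist at the cube corner.
    """
    weights = [(1 - s) * (1 - t), s * (1 - t), (1 - s) * t, s * t]
    share = weights[missing] / 3.0
    # The missing texel contributes nothing; the others each gain a third
    # of its weight, so the total still sums to one.
    return [0.0 if i == missing else w + share
            for i, w in enumerate(weights)]


print(corner_weights(0.5, 0.5, missing=3))
```

A plain 2d lerp cannot express this result, since the adjusted weights are no longer a product of one horizontal and one vertical factor; that is why per-texel weights are needed.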
* mesa: Remove dricore from the build.Eric Anholt2013-10-247-129/+4
| | | | | | | | | No driver uses it any more, and it's been replaced by megadrivers. v2: Remove always-on conditional for NEED_LIBPROGRAM (review by Emil) Reviewed-by: Matt Turner <[email protected]> (v1) Reviewed-by: Emil Velikov <[email protected]>
* swrast: Build the driver into the shared mesa_dri_drivers.so.Eric Anholt2013-10-246-42/+48
| | | | | | | | | | | | v2: drop dridir now that it's unused. v3: Fix linking after rebase when building just swrast from classic but a drm-using gallium driver. v4: Consistently put spaces around += in the updated Makefile.am block. v5: Set a global driverAPI variable so loaders don't have to update to createNewScreen2() (though they may want to for thread safety). Reviewed-by: Matt Turner <[email protected]> (v3) Reviewed-by: Emil Velikov <[email protected]>
* radeon: Build the driver into the shared mesa_dri_drivers.so.Eric Anholt2013-10-2412-44/+141
| | | | | | | | | | | | | This required some reordering of headers to ensure that the symbol name redefines happened before any prototypes. v2: drop dridir now that it's unused. v3: Consistently put spaces around += in the updated Makefile.am blocks. v4: Set a global driverAPI variable so loaders don't have to update to createNewScreen2() (though they may want to for thread safety). Reviewed-by: Matt Turner <[email protected]> (v2) Reviewed-by: Emil Velikov <[email protected]>
* i915: Build the driver into the shared mesa_dri_drivers.so.Eric Anholt2013-10-247-21/+126
    i915 has symbols for formerly-shared code that conflict with i965, so we define them away using gen-symbol-redefs.py.

    Options considered:

    - This option. Downsides: The symbols in profiling and debugging don't match the source. The symbol list may change in the future and we won't notice without manually running the tool again.
    - Use objcopy --localize-hidden to automatically demote our symbols to locals. This didn't work on i965 due to c++ weak symbols (which can't be localized), but could work on i915. We could do it on i915 only, but it does produce libtool warnings at link time due to libtool not knowing if the resulting .o file is safe to link (stupid libtool). Plus you end up with different symbols of the same name, which is confusing for debugging too. On the other hand, no future symbol conflicts long term.
    - Write our own libelf tool that handles c++ weak symbols like we want and apply it to all drivers. All the downsides of above, but applies uniformly across drivers.
    - Edit the files to just rename all the i915 or i965 symbols that conflict. There are on the order of 100 that have a prefix we used to share, so it would take a bit of typing. Fewest downsides, but still can have conflicts long term.

    Ultimately, this is the least invasive change at the moment, and we can see if the "more symbol conflicts appear later" thing is a real concern or not.

    Note that the ability to compile a version of i915 without INTEL_DEBUG env support is dropped. It's too useful.

    v2: drop dridir now that it's unused.
    v3: Consistently put spaces around += in the updated Makefile.am block.
    v4: Set a global driverAPI variable so loaders don't have to update to createNewScreen2() (though they may want to for thread safety).

    Reviewed-by: Matt Turner <[email protected]> (v2)
    Reviewed-by: Emil Velikov <[email protected]>
* dri: Add a tool for generating #defines to namespace driver global symbols.Eric Anholt2013-10-241-0/+68
| | | | | Acked-by: Matt Turner <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* nouveau: Build the driver into the shared mesa_dri_drivers.so.Eric Anholt2013-10-245-21/+24
| | | | | | | | | | | v2: drop dridir now that it's unused. v3: Consistently put spaces around += in the updated Makefile.am block. v4: Set a global driverAPI variable so loaders don't have to update to createNewScreen2() (though they may want to for thread safety). v5: Fix missed public symbol in nouveau. (caught by Emil) Reviewed-by: Matt Turner <[email protected]> (v2) Reviewed-by: Emil Velikov <[email protected]>
* i965: Build the driver into a shared mesa_dri_drivers.so .Eric Anholt2013-10-249-33/+155
| | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we've split things such that mesa core is in libdricore, exposing the whole Mesa core interface in the global namespace, and the i965_dri.so code all links against that. Along with polluting application namespace terribly, it requires extra PLT indirections and prevents LTO. Instead, we can build all of the driver contents into the same .so with just a few symbols exposed to be referenced from the actual driver .so file, allowing LTO and reducing our exposed symbol count massively. FPS improvement on GLB2.7 with INTEL_NO_HW=1: 2.61061% +/- 1.16957% (n=50) (without LTO, just the PLT reductions from this commit) Note that the X Server requires commit 7ecfab47eb221dbb996ea6c033348b8eceaeb893 to successfully load this driver! v2: Set a global driverAPI variable so loaders don't have to update to createNewScreen2() (though they may want to for thread safety). v3: Drop AM_CPPFLAGS addition (Emil pointed out I'd missed some cflags that would be necessary, though only if we actually relied on them). v4: Fix install with DESTDIR set. Reviewed-by: Matt Turner <[email protected]> (v1) Reviewed-by: Emil Velikov <[email protected]> (v2)
* dri: Implement a DRI vtable extension to replace the global driDriverAPI.Eric Anholt2013-10-242-0/+30
| | | | | | | | | | | | | | | | | As we move to megadrivers, we are unable to build multiple drivers with the same public global symbol per driver (Think an X Server with an intel and a nouveau driver, and the X Server implementing indirect for both -- we have to actually talk to the right driver). By slipping the driDriverAPI vtable into the driver's extension list, we can replace the usage of the global symbol with usage of the loader-dlsym()ed driver information. v2: Pull in the hunk to avoid crashing on null driver_extensions. Thanks, Emil! Reviewed-by: Matt Turner <[email protected]> (v1) Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* dri: Pass in the dlsym()ed driver extension to screen creation.Eric Anholt2013-10-248-35/+119
| | | | | | | | | | | This will allow a megadrivers build to reference the actual driver being loaded from the shared dri_util screen creation code. v2: Fix indentation, fallback case in EGL (review by Emil). Reviewed-by: Matt Turner <[email protected]> (v1) Reviewed-by: Chad Versace <[email protected]> (v1) Reviewed-by: Emil Velikov <[email protected]>
* gbm: Add support for the new __driDriverGetExtensions interface.Eric Anholt2013-10-241-2/+15
| | | | | | | v2: Fix uninitialized variable use in the old-ABI case. Reviewed-by: Chad Versace <[email protected]> (v1) Reviewed-by: Emil Velikov <[email protected]>
* egl: Add an optional function call for getting the DRI driver interface.Eric Anholt2013-10-241-2/+18
| | | | | | | | v2: Fix asprintf error checking. Reviewed-by: Matt Turner <[email protected]> (v1) Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* glx: Add an optional function call for getting the DRI driver interface.Eric Anholt2013-10-246-8/+35
    The previous interface relied on a static struct, which meant that the driver didn't get a chance to edit the struct before the struct got used. For megadrivers, I want a struct specific to the driver being loaded.

    v2: Fix the prototype in the docs (caught by Marek). Since the driver name was in the function, we didn't need to also pass it in.
    v3: Fix asprintf error checking (caught by Matt's gcc).

    Reviewed-by: Matt Turner <[email protected]> (v1)
    Reviewed-by: Chad Versace <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
* dri: Move driver config options to dri driver extensions.Eric Anholt2013-10-247-18/+40
| | | | | | | | | This way they aren't all sitting in the global namespace (with the same name per driver). Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* dri: Allow config options to be passed to the loader through extensions.Eric Anholt2013-10-242-9/+28
    Turns out we already have this nice mechanism for providing optional things from the driver to the loader, and I was going to have to rename the public global symbol to avoid conflicts when doing megadrivers.

    While the former __driConfigOptions is technically loader interface, this is the only loader that made use of that symbol. Continue paying attention to it if we can't find the new option, to retain compatibility with old drivers.

    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Chad Versace <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
* glx: Move the driver extension-loading to a helper function.Eric Anholt2013-10-243-4/+18
| | | | | | | | | I'm planning on doing driver extension parsing from 3 places, and making the extension loading step a bit longer. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* clover: Query maximum kernel block size from the device instead of the kernel object. Francisco Jerez 2013-10-24 4 -10/+18
    Based on a similar fix from Aaron Watry. It seems unlikely that we will ever need a kernel-specific setting for this, and the Gallium API doesn't support it. Remove kernel::max_block_size() altogether.
* glsl: silence unused 'var' variable warningBrian Paul2013-10-241-2/+2
| | | | Reviewed-by: Paul Berry <[email protected]>
* svga: remove user-space vertex/index buffer codeBrian Paul2013-10-246-259/+13
| | | | | | | | The gallium vbuf module, which we've been using for some time now, takes care of uploading user-space vertex/index data into real buffers. The upload code in the svga driver was unused. Reviewed-by: José Fonseca <[email protected]>
* i965: Print more debuginfo in intel_texsubimage_memcpy()Chad Versace2013-10-241-2/+8
| | | | | | | | Print info about packing, format, type, and tiling. This will help debug future issues with this fastpath. Reviewed-by: Frank Henigman <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Fix glTexImage when packing alignment != cppChad Versace2013-10-241-2/+11
    Fixes texture corruption of Weston clients on cairo-glesv2 backend. Commit 49ed599 introduced the bug.

    Corruption occurred when glTexSubImage called intel_texsubimage_tiled_memcpy() with:

        x,y=10,9
        w,h=7,7
        format=GL_ALPHA(0x1906)
        type=GL_UNSIGNED_BYTE(0x1401)
        gl_format=MESA_FORMAT_A8(0x18)
        packing.alignment=4

    The function miscalculated the source image's stride as w*cpp=7 without taking into account the packing alignment. The actual stride was 8.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70435
    Reported-by: U. Artie Eoff <[email protected]>
    Tested-by: Kristian Høgsberg <[email protected]>
    Reviewed-by: Frank Henigman <[email protected]>
    Signed-off-by: Chad Versace <[email protected]>
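The stride mismatch in this report comes down to rounding the row size up to the packing alignment. A minimal sketch of that arithmetic follows; `row_stride` is a hypothetical helper, not the driver's function, and it ignores other unpack state such as an explicit row length.

```python
def row_stride(width, cpp, alignment):
    """Bytes from the start of one source row to the start of the next.

    width is in pixels, cpp is bytes per pixel, and alignment is the
    unpack alignment (1, 2, 4 or 8).
    """
    row_bytes = width * cpp
    # Round up to the next multiple of the alignment.
    return (row_bytes + alignment - 1) // alignment * alignment


# The failing case from the report: 7 pixels of a one-byte format,
# packing alignment 4.
print(row_stride(7, 1, 4))
```

With these inputs the unaligned product w*cpp is 7 while the aligned stride is 8, so reading rows at the unaligned stride drifts by one byte per row.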
* freedreno: fix compile errorRob Clark2013-10-231-1/+1
| | | | | | Small typo introduced in a3ed98f. Signed-off-by: Rob Clark <[email protected]>
* i965/fs: Only unroll high-accuracy dFdy() from SIMD16 to SIMD8 on gen4 and IVB.Paul Berry2013-10-231-10/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 800610f (i965/fs: Improve accuracy of dFdy() to match dFdx()) I unrolled the high-accuracy dFdy() computation from a single SIMD16 instruction to two SIMD8 instructions because of text I found in the i965 (gen4) PRM saying that instruction compression could not be used in align16 mode. I couldn't find similar text in later hardware docs, and I observed problems trying to use instruction compression on align16 mode on Ivy Bridge, so I assumed that the restriction still applied and the associated documentation had simply been lost. After consultation with the hardware engineers, it turns out this is not the case. In point of fact, the restriction was dropped in gen5, re-introduced in Ivy Bridge, and dropped again in Haswell. The reason I didn't notice this is that in the Ivy Bridge documentation, the restriction was in a different section, and described using different language. Now that we know that the restriction only applies to Gen4 and Ivy Bridge, we can limit the unrolling to those platforms. Tested on gen5, gen6, and gen7 (both Ivy Bridge and Haswell). Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl/gs: Prevent illegal input/output primitive types.Paul Berry2013-10-231-3/+32
    From the GLSL 1.50 spec, section 4.3.8.1 (Input Layout Qualifiers):

        The layout qualifier identifiers for geometry shader inputs are

        layout-qualifier-id
            points
            lines
            lines_adjacency
            triangles
            triangles_adjacency

    And from section 4.3.8.2 (Output Layout Qualifiers):

        The layout qualifier identifiers for geometry shader outputs are

        layout-qualifier-id
            points
            line_strip
            triangle_strip
            max_vertices = integer-constant

    We were erroneously allowing line_strip and triangle_strip to be used as input qualifiers, and we were allowing lines, lines_adjacency, triangles, and triangles_adjacency to be used as output qualifiers.

    Fixes piglit tests "glsl-1.50-gs-{input,output}-layout-qualifiers *".

    Reviewed-by: Ian Romanick <[email protected]>
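The rule quoted from the spec amounts to two disjoint-looking sets that share only `points`. The check can be sketched as below; `is_legal_gs_primitive` is a hypothetical name and the code only models the primitive-type identifiers (max_vertices takes a value and is left out), so it is an illustration of the rule rather than the compiler's logic.

```python
# Primitive-type identifiers from GLSL 1.50 sections 4.3.8.1 and 4.3.8.2.
GS_INPUT_PRIMITIVES = {
    "points", "lines", "lines_adjacency",
    "triangles", "triangles_adjacency",
}
GS_OUTPUT_PRIMITIVES = {"points", "line_strip", "triangle_strip"}


def is_legal_gs_primitive(identifier, direction):
    """Return True if 'identifier' may appear on a layout(...) 'direction'.

    direction is "in" or "out".
    """
    if direction == "in":
        return identifier in GS_INPUT_PRIMITIVES
    if direction == "out":
        return identifier in GS_OUTPUT_PRIMITIVES
    raise ValueError("direction must be 'in' or 'out'")


print(is_legal_gs_primitive("triangle_strip", "in"))   # rejected after the fix
print(is_legal_gs_primitive("triangle_strip", "out"))
```

Only `points` is valid in both directions, which is why a single shared list of primitive types accepted combinations the spec forbids.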
* i965: Add perf debug hint when the app makes us do index buffer scanning.Eric Anholt2013-10-231-1/+4
| | | | Reviewed-by: Jordan Justen <[email protected]>
* i965: Try to avoid stalls on the GPU when doing glBufferSubData().Eric Anholt2013-10-239-36/+150
| | | | | | | | | | | | On DOTA2, framerate on dota2-de1.dem in windowed mode on my laptop improves by 7.69854% +/- 0.909163% (n=3). In a microbenchmark hitting this code path (wall time of piglit vbo-subdata-many), runtime decreases from 0.8 to 0.05 seconds. v2: Use out of range start/end instead of separate bool for the active flag (suggestion by Jordan), fix double-upload in the stalling path. Reviewed-by: Jordan Justen <[email protected]>
* i965: Be sure to reset brw->vb.buffers[] when trying to redo vertex setup.Eric Anholt2013-10-231-0/+2
| | | | | | | The brw_prepare_vertices that sets up buffers[] depends on these parameters, so don't let brw_prepare_vertices() skip it. Reviewed-by: Jordan Justen <[email protected]>
* i965: Add support for GL_ARB_texture_buffer_range.Eric Anholt2013-10-235-9/+34
| | | | | | | | | | | Supporting this extension turns out to simplify our code a bit over not supporting this extension, once the glBufferSubData() synchronization code lands. v2: Use 16 byte alignment like we do for uniform buffers, due to unaligned access penalties. Reviewed-by: Jordan Justen <[email protected]> (v1)