path: root/src
Commit message | Author | Age | Files | Lines
* i965/upload: Refactor open-coded ALIGN-like computations. | Kenneth Graunke | 2014-03-18 | 1 | -3/+9
  Sadly, we can't use actual ALIGN(), since that only supports power-of-two values for the alignment parameter.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
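For context, a minimal sketch of why a mask-based alignment macro is limited to power-of-two alignments, and what a general round-up looks like. The names `ALIGN_POT` and `align_npot` are hypothetical and are not taken from the patch or from Mesa's headers.

```c
#include <assert.h>
#include <stdint.h>

/* Mask-based round-up: only correct when `a` is a power of two. */
#define ALIGN_POT(value, a) (((value) + (a) - 1) & ~((uint32_t)(a) - 1))

/* General round-up to a multiple of `a`; works for any a > 0.
 * Hypothetical helper for illustration only. */
static inline uint32_t
align_npot(uint32_t value, uint32_t a)
{
   assert(a > 0);
   uint32_t rem = value % a;
   return rem ? value + (a - rem) : value;
}
```

With an alignment of 12, the mask trick silently produces a value that is not a multiple of 12, whereas the modulo-based helper rounds 13 up to 24.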
* i965: Fix indentation in brw_upload_indices(). | Kenneth Graunke | 2014-03-18 | 1 | -19/+19
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965: Consolidate code for setting brw->ib.start_vertex_offset. | Kenneth Graunke | 2014-03-18 | 1 | -9/+6
  This was set identically in three places.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965: Allocate register sets at screen creation, not context creation. | Kenneth Graunke | 2014-03-18 | 6 | -88/+88
  Register sets depend on the particular hardware generation, but don't depend on anything in the actual OpenGL context. Computing them is fairly expensive, and they take up a large amount of memory. Putting them in the screen allows us to compute/allocate them once for all contexts, saving both time and space.

  Improves the performance of a context creation/destruction microbenchmark by about 3x on my Haswell i7-4750HQ.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Allocate the screen using ralloc rather than calloc. | Kenneth Graunke | 2014-03-18 | 1 | -2/+3
  This will allow us to use the screen as a memory context.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* ra: Convert another bool array to bitsets. | Eric Anholt | 2014-03-18 | 1 | -6/+7
  This one saves about 2MB peak allocation in glsl-fs-algebraic-add-add-1, with no performance difference on timing short shader-db runs (n=9/10, warmup outlier removed).

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* ra: Use a bitset for storing which registers belong to a class. | Kenneth Graunke | 2014-03-18 | 1 | -5/+10
  This should use 1/8 the memory.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Christoph Brill <[email protected]>
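The 1/8 figure follows from storing one bit per register instead of one byte-sized bool. A minimal sketch of that idea is below; the helper names are hypothetical, and Mesa's own bitset macros are not reproduced here.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define WORD_BITS (sizeof(unsigned) * CHAR_BIT)
#define WORDS_FOR(nbits) (((nbits) + WORD_BITS - 1) / WORD_BITS)

/* One bit per register; the returned storage is zero-initialized. */
static unsigned *
regs_alloc(size_t num_regs)
{
   return calloc(WORDS_FOR(num_regs), sizeof(unsigned));
}

/* Mark register `r` as a member of the set. */
static void
regs_add(unsigned *regs, unsigned r)
{
   regs[r / WORD_BITS] |= 1u << (r % WORD_BITS);
}

/* Test whether register `r` is a member of the set. */
static bool
reg_in_set(const unsigned *regs, unsigned r)
{
   return (regs[r / WORD_BITS] >> (r % WORD_BITS)) & 1u;
}
```

A membership helper like `reg_in_set()` also hides the word and bit arithmetic from callers, which is the readability point made by the reg_belongs_to_class() commit.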
* ra: Create a reg_belongs_to_class() helper function. | Kenneth Graunke | 2014-03-18 | 1 | -2/+11
  This is a little easier to read.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Christoph Brill <[email protected]>
* ra: Use bool instead of GLboolean. | Kenneth Graunke | 2014-03-18 | 2 | -28/+29
  This isn't the GL API, so there's no reason to use GLboolean. Using bool is safer: any non-zero value is treated as "true". When converting a value to a GLboolean, all but the low byte is discarded, which means that values like 256 will be incorrectly rendered as false.

  Done via the following vim commands:

     :%s/GLboolean/bool/g
     :%s/GL_TRUE/true/g
     :%s/GL_FALSE/false/g

  and one line of manual whitespace tidying.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
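The truncation hazard described above can be shown with a small sketch. `byte_boolean` is a stand-in for GLboolean (an unsigned 8-bit type in the GL headers); the function names are hypothetical, and the sketch assumes 8-bit bytes.

```c
#include <stdbool.h>

/* Stand-in for GLboolean, which the GL headers define as an
 * unsigned 8-bit integer type. */
typedef unsigned char byte_boolean;

/* Conversion to bool: any non-zero value becomes true. */
static bool
as_bool(unsigned flags)
{
   return flags;
}

/* Conversion to an 8-bit type: only the low byte survives. */
static byte_boolean
as_byte_boolean(unsigned flags)
{
   return (byte_boolean) flags;
}
```

Passing a flag value of 256 gives true from `as_bool()` but 0 from `as_byte_boolean()`, which is the failure mode the commit message warns about.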
* i965: Accurately bail on SIMD16 compiles. | Kenneth Graunke | 2014-03-18 | 3 | -34/+82
  Ideally, we'd like to never even attempt the SIMD16 compile if we could know ahead of time that it won't succeed---it's purely a waste of time. This is especially important for state-based recompiles, which happen at draw time.

  The fragment shader compiler has a number of checks like:

     if (dispatch_width == 16)
        fail("...some reason...");

  This patch introduces a new no16() function which replaces the above pattern. In the SIMD8 compile, it sets a "SIMD16 will never work" flag. Then, brw_wm_fs_emit can check that flag, skip the SIMD16 compile, and issue a helpful performance warning if INTEL_DEBUG=perf is set. (In SIMD16 mode, no16() calls fail(), for safety's sake.)

  The great part is that this is not a heuristic---if the flag is set, we know with 100% certainty that the SIMD16 compile would fail. (It might fail anyway if we run out of registers, but it's always worth trying.)

  v2: Fix missing va_end in early-return case (caught by Ilia Mirkin).

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Chris Forbes <[email protected]> [v1]
  Reviewed-by: Ian Romanick <[email protected]> [v1]
  Reviewed-by: Eric Anholt <[email protected]>
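A rough sketch of the no16() pattern described above, written in plain C under simplifying assumptions. `struct fs_compile`, its fields, and `should_try_simd16()` are hypothetical stand-ins rather than the driver's real types; only the name no16() and its described behaviour come from the commit message.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified compiler state; not the driver's real class. */
struct fs_compile {
   unsigned dispatch_width;    /* 8 or 16 */
   bool failed;                /* this compile has failed */
   bool simd16_unsupported;    /* set by the SIMD8 pass: skip SIMD16 */
   char reason[128];           /* last recorded message */
};

static void
no16(struct fs_compile *c, const char *fmt, ...)
{
   va_list ap;
   va_start(ap, fmt);
   vsnprintf(c->reason, sizeof(c->reason), fmt, ap);
   va_end(ap);                 /* reached on every path */

   if (c->dispatch_width == 16)
      c->failed = true;        /* SIMD16 pass: behave like fail() */
   else
      c->simd16_unsupported = true;   /* SIMD8 pass: just record it */
}

/* The caller can consult the SIMD8 result and skip SIMD16 entirely. */
static bool
should_try_simd16(const struct fs_compile *simd8)
{
   return !simd8->failed && !simd8->simd16_unsupported;
}
```

Finishing with the `va_list` before any branching keeps `va_end` on every path, which sidesteps the kind of early-return omission mentioned in the v2 note.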
* i965/fs: Support pull parameters in SIMD16 mode. | Kenneth Graunke | 2014-03-18 | 2 | -11/+13
  This is just a matter of reusing the pull/push constant information set up by the SIMD8 compile.

  This gains us 78 SIMD16 programs in shader-db.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Use a single instance of the pull_constant_loc[] array. | Kenneth Graunke | 2014-03-18 | 2 | -28/+6
  Now that we don't renumber uniform registers, assign_constant_locations and move_uniform_array_access_to_pull_constants use the same names. So, they can share a single copy of the pull_constant_loc[] array.

  This simplifies the code considerably. assign_constant_locations() doesn't need to walk through pull_params[] to rediscover reladdr demotions; it just has that information in pull_constant_loc[]. We also only need to rewrite the instruction stream once, instead of twice. Even better, we now have a single array describing the layout of all pull parameters, which we can pass to the SIMD16 program.

  This actually hurts a few shaders in Serious Sam 3, and one in KWin:

     total instructions in shared programs: 1841957 -> 1842035 (0.00%)
     instructions in affected programs: 1165 -> 1243 (6.70%)

  Comparing dump_instructions() before and after the pull constant transformations with and without this patch, it appears that there is a uniform array with variable indexing (reladdr) and constant indexing (of array element 0). Previously, we uploaded array element 0 as both a pull constant (for reladdr) /and/ a push constant.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Don't renumber UNIFORM registers. | Kenneth Graunke | 2014-03-18 | 3 | -118/+86
  Previously, remove_dead_constants() would renumber the UNIFORM registers to be sequential starting from zero, and the resulting register number would be used directly as an index into the params[] array.

  This renumbering made it difficult to collect and save information about pull constant locations, since setup_pull_constants() and move_uniform_array_access_to_pull_constants() used different names.

  This patch generalizes setup_pull_constants() to decide whether each uniform register should be a pull constant, push constant, or neither (because it's unused). Then, it stores mappings from UNIFORM register numbers to params[] or pull_params[] indices in the push_constant_loc and pull_constant_loc arrays. (We already did this for pull constants.)

  Then, assign_curb_setup() just needs to consult the push_constant_loc array to get the real index into the params[] array.

  This effectively folds all the remove_dead_constants() functionality into assign_constant_locations(), while being less irritable to work with.

  v2: Add assert(remapped <= i), requested by Topi.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
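The mapping idea can be illustrated with a deliberately simplified sketch: uniform register numbers stay fixed, and two side arrays record where each one ends up. The function name, its parameters, and the "first `max_push` live uniforms are pushed" policy are assumptions made for illustration; the real decision logic in the driver is more involved.

```c
#include <stdbool.h>

/* Hypothetical, simplified location assignment: each live uniform gets a
 * push slot until `max_push` slots are used; the rest become pull
 * constants. Unused uniforms get -1 in both arrays. */
static void
assign_locations(const bool live[], unsigned count, unsigned max_push,
                 int push_loc[], int pull_loc[],
                 unsigned *num_push, unsigned *num_pull)
{
   *num_push = 0;
   *num_pull = 0;
   for (unsigned i = 0; i < count; i++) {
      push_loc[i] = -1;
      pull_loc[i] = -1;
      if (!live[i])
         continue;                           /* neither push nor pull */
      if (*num_push < max_push)
         push_loc[i] = (int)(*num_push)++;   /* index into push params */
      else
         pull_loc[i] = (int)(*num_pull)++;   /* index into pull params */
   }
}
```

Because slots are handed out in register order and dead registers are skipped, a remapped index can never exceed the original register number, which is the invariant behind a check like `assert(remapped <= i)`.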
* i965/fs: Split pull parameter decision making from mechanical demoting. | Kenneth Graunke | 2014-03-18 | 2 | -33/+40
  move_uniform_array_access_to_pull_constants() and setup_pull_constants() both have two parts:

  1. Decide which UNIFORM registers to demote to pull constants, and assign locations.
  2. Mechanically rewrite the instruction stream to pull the uniform value into a temporary VGRF and use that, eliminating the UNIFORM file access.

  In order to support pull constants in SIMD16 mode, we will need to make decisions exactly once, but rewrite both instruction streams.

  Separating these two tasks will make this easier.

  This patch introduces a new helper, demote_pull_constants(), which takes care of rewriting the instruction stream, in both cases.

  For the moment, a single invocation of demote_pull_constants can't safely handle both reladdr and non-reladdr tasks, since the two callers still use different names for uniforms due to remove_dead_constants() remapping of things. So, we get an ugly boolean parameter saying which to do. This will go away.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Record pull constant locations for all array elements. | Kenneth Graunke | 2014-03-18 | 1 | -2/+2
  When demoting a variably indexed uniform array to pull constants, we only recorded the location for the base of the array (element 0).

  Recording locations for all array elements is a trivial amount of code and will make subsequent refactoring easier.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Save push constant location information. | Kenneth Graunke | 2014-03-18 | 3 | -2/+12
  Previously, both move_uniform_array_access_to_pull_constants() and setup_pull_constants() maintained stack-local arrays with this information.

  Storing this information will allow it to be used from multiple functions, allowing us to split and move code around. We'll also eventually want to pass pull constant location information to the SIMD16 compile. Saving this information will help us do that.

  Unfortunately, the two functions *cannot* share the contents of the array just yet. remove_dead_constants() renumbers all the UNIFORM registers to be contiguous starting at zero, so the two functions talk about uniforms using different names. We can't even remap them, since move_uniform_array_access_to_pull_constants() deletes UNIFORM registers that are only accessed with reladdr, so remove_dead_constants can't even see them.

  This situation will improve in the next few patches.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Delete dead code to fail compiles with SIMD16 pull parameters. | Kenneth Graunke | 2014-03-18 | 1 | -5/+0
  The SIMD8 compile will determine whether pull parameters are necessary. If so, it will set prog_data->nr_pull_params to a value greater than 0. brw_wm_fs_emit checks if nr_pull_params > 0 and skips the SIMD16 compile altogether.

  So, this code should never occur.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* gallium/docs: update SLT, SGE, SFL, STR opcode docs | Brian Paul | 2014-03-18 | 1 | -10/+10
  To emphasize that the result is floating point 1.0 or 0.0, to match other opcodes like SLE and SEQ.

  Reviewed-by: Roland Scheidegger <[email protected]>
* glx: Fix incorrect pdp assignment in dri2_bind_context(). | Charmaine Lee | 2014-03-18 | 1 | -1/+2
  pdp should be set to dpyPriv->dri2Display. Fixes blank frame failure running glretrace ClearView.

  Reviewed-by: Brian Paul <[email protected]>
* nvc0: Handle user mapped vertex buffer for edgeflag | Maarten Lankhorst | 2014-03-18 | 1 | -2/+7
  Handle mapping edgeflag data similar to the code around it. This fixes a crash in piglit test gl-2.0-edgeflag.

  Signed-off-by: Maarten Lankhorst <[email protected]>
* clover: Fix region size error checking in some buffer transfer commands. | Francisco Jerez | 2014-03-18 | 1 | -5/+16
  Tested-by: Tom Stellard <[email protected]>
* nv50/ir/gk110: add postfactor support for fmul | Ilia Mirkin | 2014-03-18 | 1 | -0/+2
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: set not modifier on first source of logic op | Ilia Mirkin | 2014-03-18 | 1 | -3/+2
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: use shl/shr instead of lshf/rshf so that c[] is supported | Ilia Mirkin | 2014-03-18 | 1 | -17/+6
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: add 64/128-bit fetch/export support | Ilia Mirkin | 2014-03-18 | 2 | -7/+4
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: fix handling of OP_SUB for floating point ops | Ilia Mirkin | 2014-03-18 | 1 | -1/+6
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: presin/preex2 take their source at bit 23 | Ilia Mirkin | 2014-03-18 | 1 | -1/+1
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: add implementations of div u32/s32 | Ilia Mirkin | 2014-03-18 | 2 | -5/+162
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: implement quadop | Ilia Mirkin | 2014-03-18 | 1 | -1/+11
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: fill in mov from predicate | Ilia Mirkin | 2014-03-18 | 1 | -1/+5
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: handle derivAll flag, fix useOffsets for non-txf | Ilia Mirkin | 2014-03-18 | 1 | -4/+8
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: fix setting texture for txd/txf/txq | Ilia Mirkin | 2014-03-18 | 1 | -9/+8
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: add texcsaa implementation | Ilia Mirkin | 2014-03-18 | 1 | -1/+11
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: add pfetch support | Ilia Mirkin | 2014-03-18 | 1 | -1/+9
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: add emit/restart implementations | Ilia Mirkin | 2014-03-18 | 1 | -1/+8
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: add missing break in sched emit | Ilia Mirkin | 2014-03-18 | 1 | -1/+1
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: implement partial txq support | Ilia Mirkin | 2014-03-18 | 1 | -1/+27
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: fill out texture instruction support | Ilia Mirkin | 2014-03-18 | 1 | -13/+20
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: fix control flow opcode emission, add sat flag | Ilia Mirkin | 2014-03-18 | 1 | -22/+18
  Signed-off-by: Ilia Mirkin <[email protected]>
* egl/main: Enable Linux platform extensions | Chad Versace | 2014-03-17 | 5 | -23/+215
  Enable EGL_EXT_platform_base and the Linux platform extensions layered atop it: EGL_EXT_platform_x11, EGL_EXT_platform_wayland, and EGL_MESA_platform_gbm.

  Tested with Piglit's EGL_EXT_platform_base tests under an X11 session. To enable running the Wayland and GBM tests, windowed Weston was running and the kernel had render nodes enabled.

  I regression tested my EGL_EXT_platform_base patch set with Piglit on Ivybridge under X11/EGL, standalone Weston, and GBM with rendernodes. No regressions found.

  Signed-off-by: Chad Versace <[email protected]>
* egl/wayland: Emit EGL_BAD_PARAMETER for eglCreatePlatformPixmapSurface | Chad Versace | 2014-03-17 | 1 | -1/+17
  From the EGL_EXT_platform_wayland spec, version 3:

     It is not valid to call eglCreatePlatformPixmapSurfaceEXT with a <dpy> that belongs to Wayland. Any such call fails and generates EGL_BAD_PARAMETER.

  Signed-off-by: Chad Versace <[email protected]>
* egl/gbm: Emit EGL_BAD_PARAMETER for eglCreatePlatformPixmapSurface | Chad Versace | 2014-03-17 | 1 | -1/+16
  From the EGL_MESA_platform_gbm spec, version 5:

     It is not valid to call eglCreatePlatformPixmapSurfaceEXT with a <dpy> that belongs to the GBM platform. Any such call fails and generates EGL_BAD_PARAMETER.

  Signed-off-by: Chad Versace <[email protected]>
* egl/main: Stop using EGLNative types internally | Chad Versace | 2014-03-17 | 12 | -46/+89
  Internally, much of the EGL code uses EGLNativeDisplayType, EGLNativeWindowType, and EGLNativePixmapType. However, the EGLNative type often does not match the variable's actual type.

  The concept of EGLNative types is a bad match for Linux, as explained below. And the EGL platform extensions don't use EGLNative types at all. Those extensions attempt to solve cross-platform issues by moving the EGL API away from the EGLNative types.

  The core of the problem is that eglplatform.h can define each EGLNative type once only, but Linux supports multiple EGL platforms.

  To work around the problem, Mesa's eglplatform.h contains multiple definitions of each EGLNative type, selected by feature macros. Mesa expects EGL clients to set the feature macro appropriately. But the feature macros don't work when a single codebase must be built with support for multiple EGL platforms, *such as Mesa itself*.

  When building libEGL, autotools chooses the EGLNative typedefs based on the first element of '--with-egl-platforms'. For example, '--with-egl-platforms=x11,drm,wayland' defines the following:

     typedef Display* EGLNativeDisplayType;
     typedef Window EGLNativeWindowType;
     typedef Pixmap EGLNativePixmapType;

  Clearly, this doesn't work well for Wayland and GBM. Mesa works around the problem by casting the EGLNative types to different things in different files.

  For sanity's sake, and to prepare for the EGL platform extensions, this patch removes from egl/main and egl/dri2 all internal use of the EGLNative types. It replaces them with 'void*' and checks each explicit cast with a static assertion. Also, the patch touches egl_gallium the minimal amount to keep it compatible with eglapi.h.

  Signed-off-by: Chad Versace <[email protected]>
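A minimal sketch of guarding a cast between a native handle and 'void*' with a compile-time check. The `STATIC_ASSERT` definition here uses the common negative-array-size trick and may differ from the macro Mesa actually adds; `native_window_id` and both helper functions are hypothetical stand-ins.

```c
#include <stdint.h>

/* Compile-time check in the negative-array-size style; an assumed
 * spelling, not necessarily the macro used in Mesa. */
#define STATIC_ASSERT(cond) \
   do { (void) sizeof(char [1 - 2 * !(cond)]); } while (0)

/* Stand-in for an integer-typed native handle such as an X11 Window. */
typedef unsigned long native_window_id;

static void *
window_to_handle(native_window_id win)
{
   /* The cast is only lossless if the ID fits in a pointer. */
   STATIC_ASSERT(sizeof(void *) >= sizeof(native_window_id));
   return (void *)(uintptr_t) win;
}

static native_window_id
handle_to_window(void *handle)
{
   return (native_window_id)(uintptr_t) handle;
}
```

If the size relationship ever fails to hold on some platform, the build breaks at the cast site instead of silently truncating a handle at run time.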
* egl: Add STATIC_ASSERT() macro | Chad Versace | 2014-03-17 | 1 | -0/+5
  Signed-off-by: Chad Versace <[email protected]>
* egl/dri2: Dispatch eglCreateImageKHR by display, not driver | Chad Versace | 2014-03-17 | 6 | -18/+35
  Add dri2_egl_display_vtbl::create_image, set it for each platform, and let egl_dri2 dispatch eglCreateImageKHR to that.

  To remove ambiguity, rename egl_dri2.c:dri2_create_image() to dri2_create_image_from_dri().

  This prepares for the EGL platform extensions.

  Signed-off-by: Chad Versace <[email protected]>
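The per-display dispatch pattern can be sketched as a table of function pointers owned by each display. All types, signatures, and return values below are simplified, hypothetical stand-ins loosely modelled on the idea of dri2_egl_display_vtbl; they are not the real Mesa declarations.

```c
struct display;

/* Hypothetical per-display vtable with a simplified signature. */
struct display_vtbl {
   int (*create_image)(struct display *dpy, int target);
};

struct display {
   const char *platform;
   const struct display_vtbl *vtbl;
};

/* Platform-specific implementations (dummy bodies for illustration). */
static int
x11_create_image(struct display *dpy, int target)
{
   (void) dpy;
   return 100 + target;
}

static int
wayland_create_image(struct display *dpy, int target)
{
   (void) dpy;
   return 200 + target;
}

static const struct display_vtbl x11_vtbl = { x11_create_image };
static const struct display_vtbl wayland_vtbl = { wayland_create_image };

/* Common entry point: picks the implementation from the display. */
static int
create_image(struct display *dpy, int target)
{
   return dpy->vtbl->create_image(dpy, target);
}
```

Because the table hangs off the display rather than the driver, one driver can serve X11, Wayland, and GBM displays in the same process, which is what the platform extensions require.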
* egl/dri2/x11: Don't clobber _EGLDriver::API | Chad Versace | 2014-03-17 | 1 | -5/+0
  dri2_initialize_x11_swrast() does a strange thing. For some extensions it doesn't support, it sets the corresponding functions in _EGLDriver::API to NULL.

  The intention here is clear, but misplaced. NULL or not, the function pointers never get called because their extensions aren't supported.

  Signed-off-by: Chad Versace <[email protected]>
* egl/dri2: Dispatch eglCreateWaylandBufferFromImageWL by display, not driver | Chad Versace | 2014-03-17 | 7 | -6/+32
  Add dri2_egl_display_vtbl::create_wayland_buffer_from_image, set it for each platform, and let egl_dri2 dispatch eglCreateWaylandBufferFromImageWL to that.

  This prepares for the EGL platform extensions.

  Signed-off-by: Chad Versace <[email protected]>
* egl/dri2: Consolidate eglTerminate | Chad Versace | 2014-03-17 | 2 | -40/+24
  egl_dri2.c:dri2_terminate() handled terminating X11 and DRM displays. The Wayland platform implemented its own dri2_wl_terminate(), which was nearly a copy of the common one.

  To implement the EGL platform extensions, we either need to dispatch eglTerminate per display or define a common implementation for all platforms. This patch chooses consolidation. It removes dri2_wl_terminate() by folding it into the common dri2_terminate().

  It was necessary to invert the `if (disp->PlatformDisplay == NULL)` and the switch statement because, unlike DRM and X11, Wayland's terminator performed action even when EGL didn't own the native display. In the inversion, I replaced `disp->PlatformDisplay == NULL` with `dri2_dpy->own_device` because the two expressions are synonymous, but the latter's meaning is clearer.

  Signed-off-by: Chad Versace <[email protected]>
* egl/dri2/x11: Set dri2_dpy->own_device | Chad Versace | 2014-03-17 | 1 | -0/+3
  When the user calls eglGetDisplay(EGL_DEFAULT_DISPLAY), the Wayland and DRM platforms set dri2_dpy->own_device=true. This patch makes the X11 platform do the same for consistency.

  Signed-off-by: Chad Versace <[email protected]>
* egl/dri2: Dispatch eglPostSubBufferNV by display, not driver | Chad Versace | 2014-03-17 | 7 | -1/+27
  Add dri2_egl_display_vtbl::post_sub_buffer, set it for each platform, and let egl_dri2 dispatch eglPostSubBufferNV to that.

  This prepares for the EGL platform extensions.

  Reviewed-by: Ian Romanick <[email protected]>
  Signed-off-by: Chad Versace <[email protected]>