path: root/src
Commit message (Author, Age, Files, Lines)
* r600: initialised PGM_RESOURCES_2 for ES/GS (Dave Airlie, 2015-11-12, 2 files, -0/+6)
| This fixes the corruption on rendering that we are seeing in certain geometry shaders.
|
| Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=91780
| Reviewed-by: Alex Deucher <[email protected]>
| Tested / Reviewed-by: Glenn Kennard <[email protected]>
| Cc: "10.6" "11.0" <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* i965: Split nir_emit_intrinsic by stage with a general fallback. (Kenneth Graunke, 2015-11-11, 2 files, -277/+381)
| Many intrinsics only apply to a particular stage (such as discard). In other cases, we may want to interpret them differently based on the stage (such as load_primitive_id or load_input).
|
| The current method isn't that pretty - we handle all intrinsics in one giant function. Sometimes we assert on stage, sometimes we forget. Different behaviors are handled via if-ladders based on stage.
|
| This commit introduces new nir_emit_<stage>_intrinsic() functions, and makes nir_emit_instr() call those. In turn, those fall back to the generic nir_emit_intrinsic() function for cases they don't want to handle specially.
|
| This makes it clear which intrinsics only exist in one stage, and makes it easy to handle inputs/outputs differently for various stages.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Jordan Justen <[email protected]>
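The per-stage dispatch with a generic fallback described in this commit can be sketched roughly as follows. This is a minimal illustration, not the i965 code: the enum, the string-based intrinsic names, and every function name here are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stage enum; the real driver has more stages. */
enum sketch_stage { SKETCH_STAGE_VERTEX, SKETCH_STAGE_FRAGMENT };

/* Generic handler: used for anything a stage does not treat specially. */
static const char *
sketch_emit_intrinsic(const char *name)
{
   (void)name;
   return "generic";
}

/* Fragment-only intrinsics are handled here; everything else falls back. */
static const char *
sketch_emit_fs_intrinsic(const char *name)
{
   if (strcmp(name, "discard") == 0)
      return "fs";
   return sketch_emit_intrinsic(name);
}

/* The vertex stage interprets load_input its own way in this sketch. */
static const char *
sketch_emit_vs_intrinsic(const char *name)
{
   if (strcmp(name, "load_input") == 0)
      return "vs";
   return sketch_emit_intrinsic(name);
}

/* Top-level dispatch: pick the stage handler first, never the generic one. */
static const char *
sketch_emit_instr(enum sketch_stage stage, const char *name)
{
   switch (stage) {
   case SKETCH_STAGE_VERTEX:
      return sketch_emit_vs_intrinsic(name);
   case SKETCH_STAGE_FRAGMENT:
      return sketch_emit_fs_intrinsic(name);
   }
   assert(!"unknown stage");
   return NULL;
}
```

Because the stage handler is always entered first, a stage-only intrinsic has exactly one place where it can be handled, which is the clarity the commit message argues for.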
* mesa/copyimage: allow width/height to not be multiples of block (Ilia Mirkin, 2015-11-11, 1 file, -3/+34)
| For compressed textures, the image size is not necessarily a multiple of the block size (e.g. the last mip levels). Section 18.3.2 (Copying Between Images) of the OpenGL 4.5 Core Profile spec says:
|
|     An INVALID_VALUE error is generated if the dimensions of either subregion exceeds the boundaries of the corresponding image object, or if the image format is compressed and the dimensions of the subregion fail to meet the alignment constraints of the format.
|
| and Section 8.7 (Compressed Texture Images) says:
|
|     An INVALID_OPERATION error is generated if any of the following conditions occurs:
|
|       * width is not a multiple of four, and width + xoffset is not equal to the value of TEXTURE_WIDTH.
|       * height is not a multiple of four, and height + yoffset is not equal to the value of TEXTURE_HEIGHT.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92860
| Signed-off-by: Ilia Mirkin <[email protected]>
| Acked-by: Alex Deucher <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Ian Romanick <[email protected]>
| Cc: [email protected]
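One way to read the two quoted rules together is as a per-dimension check. The helper below is a hypothetical sketch, not the code added by this commit; in particular, requiring the offset to sit on a block boundary is an assumption that the quoted text does not state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: validates one dimension (x or y) of a copy
 * subregion against a compressed format's block size. */
static bool
copy_dim_is_aligned(unsigned offset, unsigned size,
                    unsigned block_size, unsigned image_size)
{
   assert(block_size > 0);

   /* Assumption: the region must start on a block boundary. */
   if (offset % block_size != 0)
      return false;

   /* A whole number of blocks is always acceptable. */
   if (size % block_size == 0)
      return true;

   /* A partial trailing block is only allowed when the region ends
    * exactly at the image edge (size + offset == TEXTURE_WIDTH/HEIGHT). */
   return offset + size == image_size;
}
```

For a 10-texel-wide image with 4-texel blocks, a region at offset 8 with width 2 passes because it ends at the image edge, while the same width at offset 0 does not.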
* i965/brw_reg: Add a brw_VxH_indirect helper (Jason Ekstrand, 2015-11-11, 1 file, -0/+11)
| Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: remove old comments in arrayobj.c (Brian Paul, 2015-11-11, 1 file, -5/+0)
* st/wgl: clarify code in stw_framebuffer_from_hwnd_locked() (Brian Paul, 2015-11-11, 1 file, -2/+2)
| Just a minor code change to make it obvious that NULL is returned when we don't find the given HWND.
|
| Reviewed-by: Sinclair Yeh <[email protected]>
| Reviewed-by: Charmaine Lee <[email protected]>
| Reviewed-by: José Fonseca <[email protected]>
* st/wgl: improve some function comments (Brian Paul, 2015-11-11, 1 file, -6/+30)
| In particular, explain when stw_framebuffer objects are locked/unlocked/etc.
|
| Reviewed-by: Sinclair Yeh <[email protected]>
| Reviewed-by: Charmaine Lee <[email protected]>
| Reviewed-by: José Fonseca <[email protected]>
* st/wgl: whitespace/formatting fixes (Brian Paul, 2015-11-11, 3 files, -63/+48)
* st/wgl: fix locking issue in stw_st_framebuffer_present_locked() (Brian Paul, 2015-11-11, 1 file, -0/+3)
| When stw_st_framebuffer_present_locked() is called, the stw_framebuffer's mutex will already be locked. Normally, the stw_framebuffer_present_locked() function calls stw_framebuffer_release() to unlock the mutex when it's done.
|
| But if for some reason the 'resource' pointer in stw_st_framebuffer_present_locked() is null, we'd return without unlocking the stw_framebuffer. This fixes that to avoid potential deadlocks.
|
| Reviewed-by: Charmaine Lee <[email protected]>
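The pattern behind this fix, releasing the lock on the early-return path as well as the normal one, can be sketched as below. The struct and functions are hypothetical stand-ins (a boolean models the mutex); this is not the st/wgl code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for stw_framebuffer; 'locked' models the mutex state. */
struct fake_framebuffer {
   bool locked;
   unsigned presents;
};

static void
fake_framebuffer_lock(struct fake_framebuffer *fb)
{
   assert(!fb->locked);
   fb->locked = true;
}

static void
fake_framebuffer_release(struct fake_framebuffer *fb)
{
   assert(fb->locked);
   fb->locked = false;
}

/* Called with fb locked; must return with fb unlocked on every path. */
static bool
fake_present_locked(struct fake_framebuffer *fb, const void *resource)
{
   assert(fb->locked);

   if (resource == NULL) {
      /* The early return: release before bailing out, otherwise the
       * next attempt to lock this framebuffer would deadlock. */
      fake_framebuffer_release(fb);
      return false;
   }

   fb->presents++;
   fake_framebuffer_release(fb);
   return true;
}
```

A caller can then lock, call `fake_present_locked()`, and lock again regardless of whether the resource pointer was null.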
* i965: Print force_writemask_all in dump_instructions(). (Kenneth Graunke, 2015-11-11, 2 files, -0/+6)
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965: Combine BRW_NEW_*_BINDING_TABLE dirty bits. (Kenneth Graunke, 2015-11-11, 5 files, -26/+14)
| A while back, we moved to directly emitting the Gen7+ state when constructing the binding tables. These flags are only used on Gen4-6, which emit all the binding table pointers at once. We gain nothing by having separate flags, so combine them.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Map GL_PATCHES to 3DPRIM_PATCHLIST_n. (Kenneth Graunke, 2015-11-11, 2 files, -1/+10)
| Inspired by a patch by Fabian Bieler. Fabian defined a _3DPRIM_PATCHLIST_0 macro (which isn't actually a valid topology type); I instead chose to make a macro that takes an argument. He also took the number of patch vertices from _mesa_prim (which was set to ctx->TessCtrlProgram.patch_vertices) - I chose to use it directly to avoid the need for the VBO patch.
|
| v2: Change macro to 0x20 + (n - 1) instead of 0x1F + n to better match the documentation (suggested by Ian).
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Ian Romanick <[email protected]>
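A parameterized macro with the v2 encoding might look like the sketch below. The macro and function names are hypothetical, and the 1..32 range check is an assumption about the valid patch sizes; only the expression 0x20 + (n - 1) comes from the commit message.

```c
#include <assert.h>

/* Hypothetical name. Patch list n maps to 0x20 + (n - 1), so there is
 * no "patch list 0" value to define by accident. */
#define SKETCH_PATCHLIST(n) (0x20 + ((n) - 1))

static unsigned
sketch_patchlist_topology(unsigned patch_vertices)
{
   /* Assumption: between 1 and 32 control points per patch. */
   assert(patch_vertices >= 1 && patch_vertices <= 32);
   return SKETCH_PATCHLIST(patch_vertices);
}
```

Both spellings produce the same values (0x20 + (n - 1) equals 0x1F + n); the v2 form simply makes the base value of the first patch list explicit.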
* r600g: Pass conservative depth parameters to hw (Glenn Kennard, 2015-11-11, 7 files, -1/+53)
| Supported on R700 and up.
|
| Signed-off-by: Glenn Kennard <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* Revert "r600g: Pass conservative depth parameters to hw" (Dave Airlie, 2015-11-11, 6 files, -46/+0)
| This reverts commit a1fc78911e9a6439db94d6ae91d5672c76e5fb1c.
|
| I pushed the wrong patch.
* r600g: Implement ARB_texture_view (Glenn Kennard, 2015-11-11, 2 files, -7/+18)
| Signed-off-by: Glenn Kennard <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* r600g: Pass conservative depth parameters to hw (Glenn Kennard, 2015-11-11, 6 files, -0/+46)
| Supported on R700 and up.
|
| Signed-off-by: Glenn Kennard <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
* i965/nir/opt_peephole_ffma: Bypass fusion if any operand of fadd and fmul is a const (Eduardo Lima Mitev, 2015-11-10, 1 file, -0/+31)
| When both fadd and fmul instructions have at least one operand that is a constant and it is only used once, the total number of instructions can be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd), because the constants will be propagated as immediate operands of fmul and fadd.
|
| This patch detects these situations and prevents fusing fmul+fadd into ffma.
|
| Shader-db results on i965 Haswell:
|
| total instructions in shared programs: 6235835 -> 6225895 (-0.16%)
| instructions in affected programs: 1124094 -> 1114154 (-0.88%)
| total loops in shared programs: 1979 -> 1979 (0.00%)
| helped: 7612
| HURT: 843
| GAINED: 4
| LOST: 0
|
| Reviewed-by: Jason Ekstrand <[email protected]>
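The decision described above can be sketched as a small predicate. Everything here is hypothetical (the struct, the field names, and the function names); the real pass inspects NIR sources rather than a summary struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one ALU source operand. */
struct operand_info {
   bool is_const;      /* value comes from a load_const */
   unsigned use_count; /* how many instructions read that value */
};

static bool
is_single_use_const(const struct operand_info *op)
{
   return op->is_const && op->use_count == 1;
}

/* Returns false when fusing fmul+fadd into ffma is expected to cost an
 * instruction: ffma would need both constants loaded (1 ffma +
 * 2 load_const), while separate fmul and fadd can take them as
 * immediates (1 fmul + 1 fadd). */
static bool
should_fuse_ffma(const struct operand_info fmul_srcs[2],
                 const struct operand_info *fadd_addend)
{
   assert(fmul_srcs != NULL && fadd_addend != NULL);

   bool fmul_has_const = is_single_use_const(&fmul_srcs[0]) ||
                         is_single_use_const(&fmul_srcs[1]);
   bool fadd_has_const = is_single_use_const(fadd_addend);

   return !(fmul_has_const && fadd_has_const);
}
```

In this sketch a constant with more than one use does not block fusion, since its load_const would remain for the other users anyway.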
* util: Add list_is_singular() helper function (Eduardo Lima Mitev, 2015-11-10, 1 file, -0/+8)
| Returns whether the list has exactly one element.
|
| Reviewed-by: Matt Turner <[email protected]>
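For a circular doubly linked list with a sentinel head, "exactly one element" can be tested with two pointer comparisons. The sketch below uses its own hypothetical list type and names; it is not the helper added to Mesa's util code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical circular list node; an empty list's head points at itself. */
struct list_node {
   struct list_node *prev;
   struct list_node *next;
};

static void
sketch_list_init(struct list_node *head)
{
   head->prev = head;
   head->next = head;
}

static void
sketch_list_add_tail(struct list_node *item, struct list_node *head)
{
   item->next = head;
   item->prev = head->prev;
   head->prev->next = item;
   head->prev = item;
}

/* True when the list is non-empty and its first element is also its last. */
static bool
sketch_list_is_singular(const struct list_node *head)
{
   assert(head != NULL);
   return head->next != head && head->next->next == head;
}
```

The check runs in constant time, unlike counting the elements and comparing the length to one.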
* nir/nir_opt_peephole_ffma: Move this lowering pass to the i965 driver (Eduardo Lima Mitev, 2015-11-10, 6 files, -9/+10)
| Because the next patch will add an optimization that is specific to i965, we want to move this lowering pass to that driver altogether.
|
| This is safe because i965 is the only consumer.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* glsl: Use array deref for access to vector components (Kristian Høgsberg Kristensen, 2015-11-10, 10 files, -68/+138)
| We've assumed that we could lower per-component vector access from vec[i] = scalar to vec = ir_triop_vector_insert(vec, scalar, i) but with SSBOs (and compute shader SLM and tessellation outputs) this is no longer valid. If a vector is "externally visible", multiple threads can write independent components simultaneously. With lowering to ir_triop_vector_insert, each thread reads the entire vector, changes one component, then writes out the entire vector. This is racy.
|
| Instead of generating an ir_binop_vector_extract when we see v[i], we generate ir_dereference_array. We then add a separate lowering pass that lowers the ir_dereference_array to ir_binop_vector_extract for rvalues and to vector_insert for lvalues. The resulting IR is the same as before, but we now have a window between ast->ir conversion and the lowering pass where v[i] appears in the IR as an array deref. This lets us run lowering passes that lower the vector access to I/O (e.g. for SSBO load/store) before we lower the per-component access to full vector writes.
|
| Reviewed-by: Jordan Justen <[email protected]>
| Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
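The lost update described in this commit can be shown with a deterministic, single-threaded simulation of one unlucky interleaving. The types and function names are hypothetical, and plain C stands in for the GLSL IR operations.

```c
#include <assert.h>

struct vec4 {
   float c[4];
};

/* vector_insert-style lowering: yields a copy of the whole vector with
 * one component replaced. */
static struct vec4
sketch_vector_insert(struct vec4 v, float scalar, unsigned i)
{
   assert(i < 4);
   v.c[i] = scalar;
   return v;
}

/* Two "threads" each run shared = vector_insert(shared, s, i), interleaved
 * so that both read the vector before either writes it back. */
static struct vec4
racy_whole_vector_writes(struct vec4 shared)
{
   struct vec4 seen_by_a = shared;                     /* thread A reads */
   struct vec4 seen_by_b = shared;                     /* thread B reads */
   shared = sketch_vector_insert(seen_by_a, 1.0f, 0);  /* A writes back */
   shared = sketch_vector_insert(seen_by_b, 2.0f, 1);  /* B clobbers A */
   return shared;
}

/* Per-component stores touch only the component each thread owns. */
static struct vec4
per_component_writes(struct vec4 shared)
{
   shared.c[0] = 1.0f; /* thread A */
   shared.c[1] = 2.0f; /* thread B */
   return shared;
}
```

In the whole-vector version thread A's write to component 0 is lost because thread B writes back a stale copy; with per-component stores both writes survive in any order.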
* glsl: Lower UBO and SSBO access in glsl linker (Kristian Høgsberg Kristensen, 2015-11-10, 6 files, -3/+13)
| All GLSL IR consumers run this lowering pass so we can move it to the linker. This moves the pass up quite a bit, but that's the point: it needs to run before we throw away information about per-component vector access.
|
| Reviewed-by: Iago Toral Quiroga <[email protected]>
| Reviewed-by: Jordan Justen <[email protected]>
| Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
* glsl: Drop exec_list argument to lower_ubo_reference (Kristian Høgsberg Kristensen, 2015-11-10, 4 files, -5/+5)
| We always pass in shader->ir and we already pass in the shader, so just drop the exec_list. Most passes either take just an exec_list or a shader, so this seems more consistent.
|
| Reviewed-by: Timothy Arceri <[email protected]>
| Reviewed-by: Iago Toral Quiroga <[email protected]>
| Reviewed-by: Jordan Justen <[email protected]>
| Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
* nir/glsl: switch to using the builder (Connor Abbott, 2015-11-10, 1 file, -441/+259)
| v2: use nir_builder_cf_insert (Ken)
|
| Signed-off-by: Connor Abbott <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* nir/glsl: make emit() take nir_ssa_def * sources (Connor Abbott, 2015-11-10, 1 file, -18/+18)
| Again, this matches what the builder will have to do.
|
| Signed-off-by: Connor Abbott <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* nir/glsl: convert nir_visitor::result to a nir_ssa_def * (Connor Abbott, 2015-11-10, 1 file, -6/+7)
| Its only user now returns a nir_ssa_def *, and we'll need this since the builder returns a nir_ssa_def *.
|
| Signed-off-by: Connor Abbott <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* nir/glsl: make evaluate_rvalue() return a nir_ssa_def * (Connor Abbott, 2015-11-10, 1 file, -37/+53)
| A long time ago, before NIR was even merged to master, glsl_to_nir used registers and these sources were actually register sources. But nowadays everything in glsl_to_nir is an SSA value, so stop pretending that by evaluating an rvalue we can get an arbitrary nir_src. Most importantly, we need this since the builder takes nir_ssa_def * sources directly.
|
| Signed-off-by: Connor Abbott <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* st/mesa: Destroy buffer object's mutex. (Jose Fonseca, 2015-11-10, 1 file, -0/+1)
| Ideally we should have a _mesa_cleanup_buffer_object function in src/mesa/bufferobj.c so that the destruction logic resides in a single place.
|
| Reviewed-by: Brian Paul <[email protected]>
* nir: Store PatchInputsRead and PatchOutputsWritten in nir_shader_info. (Kenneth Graunke, 2015-11-10, 2 files, -0/+7)
| These tessellation shader related fields need plumbing through NIR.
|
| v2: Use uint32_t instead of uint64_t to match the source type of GLbitfield (caught by Iago Toral).
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Iago Toral Quiroga <[email protected]>
* vc4: Avoid loading undefined (newly-allocated) FBO contents. (Eric Anholt, 2015-11-09, 1 file, -0/+17)
| Since X has undefined contents in new pixmaps, it will allocate new textures for an FBO and draw to them without an explicit clear. For VC4, it's much faster to emit a clear than the load of the actual undefined memory contents, so just do that instead.
* vc4: Return NULL when we can't make our shadow for a sampler view. (Eric Anholt, 2015-11-09, 1 file, -0/+4)
| I'm not sure what the caller does is appropriate (just have a NULL sampler at this slot), but it fixes the immediate crash.
|
| Cc: "11.0" <[email protected]>
* vc4: Return GL_OUT_OF_MEMORY when buffer allocation fails. (Eric Anholt, 2015-11-09, 2 files, -19/+32)
| I was afraid our callers weren't prepared for this, but it looks like at least for resource creation, mesa/st throws an error appropriately.
|
| Cc: "11.0" <[email protected]>
* vc4: Add CL dumping for GL_ARRAY_PRIMITIVE. (Eric Anholt, 2015-11-09, 1 file, -1/+16)
* vc4: Fix a compiler warning. (Eric Anholt, 2015-11-09, 1 file, -1/+1)
* glsl: Use shared storage variable type for shared variables (Jordan Justen, 2015-11-09, 1 file, -0/+2)
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
* glsl: Add shared variable type (Jordan Justen, 2015-11-09, 2 files, -1/+2)
| Shared variables are stored in a common pool accessible by all threads in a compute shader local work group.
|
| These variables are similar to OpenCL's local/__local variables.
|
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
* glsl: Add space to shader_storage in print_visitor (Jordan Justen, 2015-11-09, 1 file, -1/+1)
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
* glsl: Align comments on variables types (Jordan Justen, 2015-11-09, 1 file, -7/+7)
| v2:
|  * Split from patch to add ir_var_shader_shared (tarceri)
|
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
* glsl: Parse shared keyword for compute shader variables (Jordan Justen, 2015-11-09, 5 files, -1/+17)
| v2:
|  * Move shared parsing under storage qualifiers (tarceri)
|  * Fail to compile if shared is used in non-compute shader (tarceri)
|  * Use separate shared_storage bit for shared variables (tarceri)
|
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
* glsl: simplify interface block stream qualifier validation (Timothy Arceri, 2015-11-10, 2 files, -23/+14)
| Qualifiers on member variables are redundant; all we need to do is check whether each one matches the stream associated with the block and throw an error if it does not.
|
| Reviewed-by: Samuel Iglesias Gonsalvez <[email protected]>
| Cc: Emil Velikov <[email protected]>
* st/wgl: add null pointer check for HUD texture (Brian Paul, 2015-11-09, 1 file, -1/+3)
| Fixes crash when using HUD with Nobel Clinician Viewer.
|
| Reviewed-by: Jose Fonseca <[email protected]>
* st/wgl: fix double-present on swapbuffers bug (Brian Paul, 2015-11-09, 3 files, -20/+12)
| The stw_st_framebuffer_present_locked() function was getting called twice per SwapBuffers. First, when st_context_iface::flush() was called from DrvSwapBuffers() because the ST_FLUSH_FRONT flag was given. Second, by stw_st_swap_framebuffer_locked() which does the actual SwapBuffers.
|
| Two code changes:
| 1. Pass ST_FLUSH_END_OF_FRAME, instead of ST_FLUSH_FRONT.
| 2. Move the implementation of stw_flush_current_locked() into DrvSwapBuffers() since it's not called anywhere else.
|
| Not much change in perf for benchmarks like Lightsmark, but some simple Mesa demos are measurably faster.
|
| Reviewed-by: José Fonseca <[email protected]>
* st/wgl: reorder pixel formats to put MSAA formats last (Brian Paul, 2015-11-09, 1 file, -29/+32)
| And put 8-bit/channel formats before 5/6/5 formats.
|
| The ChoosePixelFormat() function seems to be finicky about format selection. Putting the MSAA formats after the non-MSAA formats means most apps get a low-numbered format. Now we generally get the same pixel format regardless of whether using vgpu9 or 10.
|
| VMware bug 1455030
|
| Reviewed-by: José Fonseca <[email protected]>
* st/wgl: Don't rely on GDI to bookkeep pixelformat for us. (José Fonseca, 2015-11-09, 2 files, -7/+6)
| This allows using apitrace's retracediff script on Windows to retrace and compare two builds of a Mesa based opengl32.dll/ICD side-by-side.
|
| See also https://github.com/apitrace/apitrace/commit/e4a4f15f5b92e0abbd24d7d053da25f8278c9f64
* winsys/radeon: Use CPU page size instead of hardcoding 4096 bytes v3 (Michel Dänzer, 2015-11-09, 1 file, -11/+19)
| Fixes GPUVM conflicts with non-4K page size.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92738
|
| v2: Replace sanitization of VM base address alignment with a comment explaining why that's not necessary.
| v3: Use unsigned instead of long as the type for the size_align member. (Marek)
|
| Cc: [email protected]
| Reviewed-by: Christian König <[email protected]> (v1)
| Reviewed-by: Marek Olšák <[email protected]>
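Querying the CPU page size at run time and aligning sizes to it could look like the sketch below. It assumes a POSIX system and uses sysconf(_SC_PAGESIZE); the function names are hypothetical, the 4096-byte fallback is an assumption for the case where the query fails, and the commit's actual implementation is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Returns the CPU page size in bytes. The fallback value is an
 * assumption of this sketch, used only if sysconf() reports an error. */
static uint64_t
sketch_cpu_page_size(void)
{
   long size = sysconf(_SC_PAGESIZE);
   return size > 0 ? (uint64_t)size : 4096;
}

/* Rounds value up to the next multiple of a power-of-two alignment. */
static uint64_t
sketch_align_up(uint64_t value, uint64_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}
```

On a system with 64 KiB pages, `sketch_align_up(size, sketch_cpu_page_size())` rounds a 4097-byte request up to 65536 bytes, where a hardcoded 4096 would have produced 8192.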
* st/omx: add headless support (Leo Liu, 2015-11-08, 1 file, -10/+35)
| This will allow dec/enc/transcode without X
|
| v2: use env override even with X, use loader_open_device instead of open
| v3: clean up
|
| Signed-off-by: Leo Liu <[email protected]>
| Reviewed-by: Christian König <[email protected]>
* st/va: use vl screen drm support from vl_wys_drm (Leo Liu, 2015-11-08, 1 file, -21/+3)
| v2: move the dup to vl_wys_drm for pipe loader
|
| Signed-off-by: Leo Liu <[email protected]>
| Reviewed-by: Christian König <[email protected]>
* vl: add drm support for vl_screen (Leo Liu, 2015-11-08, 3 files, -1/+85)
| This will allow the state trackers to use render nodes with screen creation
|
| v2: dup fd for pipe loader
|
| Signed-off-by: Leo Liu <[email protected]>
| Reviewed-by: Christian König <[email protected]>
* st/va: fix build fails with pipe loader (Leo Liu, 2015-11-08, 1 file, -2/+3)
| There is no dev in drv, and dev should be from vl_screen here
|
| Signed-off-by: Leo Liu <[email protected]>
| Reviewed-by: Christian König <[email protected]>
* nvc0: enable compute support on Fermi (Samuel Pitoiset, 2015-11-08, 1 file, -2/+2)
| Although the compute support is still not complete because textures and surfaces need to be implemented, it allows launching very simple compute kernels, such as one that reads MP performance counters.
|
| This turns on PIPE_CAP_COMPUTE and PIPE_SHADER_COMPUTE.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix emission of s[] args in certain situations (Ilia Mirkin, 2015-11-07, 1 file, -2/+2)
| There might only be a single arg (e.g. cvt), so use mode rather than looking at the source directly. Also we don't want to rely on the type of the value, which can be unreliable, but instead use the instruction's. This works out well since mkSplit doesn't adjust the type.
|
| Signed-off-by: Ilia Mirkin <[email protected]>