summaryrefslogtreecommitdiffstats
path: root/src/mesa
Commit message (Collapse)AuthorAgeFilesLines
* mesa/extensions: Fix ES1 extension reportingNanley Chery2016-06-021-2/+2
| | | | | | | | | | Commit eda15abd84af575d3bde432e2163e30d743a7c87 , unintentionally advertised these extensions in ES1 contexts. Undo this error. Signed-off-by: Nanley Chery <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Cc: "12.0" <[email protected]>
* mesa/formatquery: add a comment to clarify INTERNALFORMAT_PREFERREDAlejandro Piñeiro2016-06-021-1/+4
| | | | | | | | | | | The comment clarifies that the driver is called only to try to get a preferred internalformat, and that it was already checked if the format is supported or not. Acked-by: Eduardo Lima <[email protected]> Acked-by: Antia Puentes <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/formatquery: remove INTERNALFORMAT_PREFERRED implementationAlejandro Piñeiro2016-06-021-71/+0
| | | | | | | | | | | | | Right now the implementation only checks if the internalformat is supported or not. But that implementation is wrong, returning unsupported for some internalformats. Additionally, checking if the internalformat is supported or not is already done at mesa/main before calling the driver hook, so this new check is not needed. Acked-by: Eduardo Lima <[email protected]> Acked-by: Antia Puentes <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/eu: use simd8 when exec_size != EXECUTE_16Alejandro Piñeiro2016-06-021-2/+2
| | | | | | | | | | Among other thigs, fix a gpu hang when using INTEL_DEBUG=shader_time for any shader. Signed-off-by: Jason Ekstrand <[email protected]> Signed-off-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: Remove old CS local ID handlingJordan Justen2016-06-016-120/+2
| | | | | | | | | | | | | The old method pushed data for each channels uvec3 data of gl_LocalInvocationID. The new method pushes 1 dword of data that is a 'thread local ID' value. Based on that value, we can generate gl_LocalInvocationIndex and gl_LocalInvocationID with some calculations. Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Enable cross-thread constants and compact local IDs for hsw+Jordan Justen2016-06-013-14/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cross thread constant support appears on Haswell. It allows us to upload a set of uniform data for all threads without duplicating it per thread. One complication is that cross-thread constants are loaded into registers before per-thread constants. Previously, our local IDs were loaded before the uniform data and treated as 'payload' data, even though they were actually pushed into the registers like the other uniform data. Therefore, in this patch we simultaneously enable a newer layout where each thread now uses a single uniform slot for a unique local ID for the thread. This uniform is handled specially to make sure it is added last into the uniform push constant registers. This minimizes our usage of push constant registers, and maximizes our ability to use cross-thread constants for registers. To swap from the old to the new layout, we also need to flip some lowering pass switches to let our driver handle the lowering instead. We also no longer force thread_local_id_index to -1. v4: * Minimize size of patch that switches from the old local ID layout to the new layout (Jason) Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Support new local ID push constant & cross-thread constantsJordan Justen2016-06-012-45/+52
| | | | | | | | | | | | | | | | | The cross thread constant support appears on Haswell. It allows us to upload a set of uniform data for all threads without duplicating it per thread. We also support per-thread data which allows us to store a per-thread ID in one of the uniforms that can be used to calculate the gl_LocalInvocationIndex and gl_LocalInvocationID variables. v4: * Support the old local ID push constant layout as well (Jason) Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Add CS push constant info to brw_cs_prog_dataJordan Justen2016-06-012-0/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | We need information about push constants in a few places for the GL driver, and another couple places for the vulkan driver. When we add support for uploading both a common (cross-thread) set of push constants, combined with the previous per-thread push constant data, things are going to get even more complicated. To simplify things, we add push constant info into the cs prog_data struct. The cross-thread constant support is added as of Haswell. To support it we need to make sure all push constants with uniform values are added to earlier registers. The register that varies per thread and holds the thread invocation's unique local ID needs to be added last. For now we add the code that would calculate cross-thread constatn information for hsw+, but we force it (cross_thread_supported) off until the other parts of the driver support it. v4: * Support older local ID push constant layout as well. (Jason) Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Store number of threads in brw_cs_prog_dataJordan Justen2016-06-013-25/+23
| | | | | | Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Add nir based intrinsic lowering and thread ID uniformJordan Justen2016-06-014-0/+190
| | | | | | | | | | | | | | | | | | | | | | | | | We add a lowering pass for nir intrinsics. This pass can replace nir intrinsics with driver specific nir lower code. We lower the gl_LocalInvocationIndex intrinsic based on a uniform which is loaded with a thread specific ID. We also lower the gl_LocalInvocationID based on gl_LocalInvocationIndex. v2: * Create variable during lowering pass. (Ken) v3: * Don't create a variable, but instead just insert an intrisic call to load a uniform from the allocated location. (Jason) v4: * Don't run this pass if thread_local_id_index < 0 Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Put CS local thread ID uniform in last push registerJordan Justen2016-06-011-1/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | This thread ID uniform will be used to compute the gl_LocalInvocationIndex and gl_LocalInvocationID values. It is important for this uniform to be added in the last push constant register. fs_visitor::assign_constant_locations is updated to make sure this happens. The reason this is important is that the cross-thread push constant registers are loaded first, and the per-thread push constant registers are loaded after that. (Broadwell adds another push constant upload mechanism which reverses this order, but we are ignoring this for now.) v2: * Add variable in intrinsics lowering pass * Make sure the ID is pushed last in assign_constant_locations, and that we save a spot for the ID in the push constants v3: * Simplify code based with Jason's suggestions. Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Add uniform for a CS thread local base IDJordan Justen2016-06-013-1/+21
| | | | | | | | | | | | v4: * Force thread_local_id_index to -1 for now, and have fs_visitor::setup_cs_payload look at thread_local_id_index. This enables us to more easily cut over from the old local ID layout to the new layout, as suggested by Jason. Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Add nir channel_num system valueJordan Justen2016-06-011-0/+15
| | | | | | | | | v2: * simd16/32 fixes (curro) Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Make lowering gl_LocalInvocationIndex optionalJordan Justen2016-06-011-1/+2
| | | | | | Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* glsl: Add glsl LowerCsDerivedVariables optionJordan Justen2016-06-013-0/+5
| | | | | | | | | | v2: * Move lower flag to context constants. (Ken) Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> (v1) Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Copy the offset when lowering logical pull constant sendsJason Ekstrand2016-06-011-0/+8
| | | | | | | | This fixes 64 Vulkan CTS tests per gen Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96299 Reviewed-by: Francisco Jerez <[email protected]> Cc: "12.0" <[email protected]>
* i965: Fix isoline reads in scalar TES.Kenneth Graunke2016-06-011-1/+1
| | | | | | | | | | | | | Isolines aren't reversed. commit 5b2d8c2273c6f fixed this for the vec4 TES backend, but not the scalar one. Found while debugging GL45-CTS.tessellation_shader. tessellation_control_to_tessellation_evaluation.gl_tessLevel. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Cc: [email protected]
* st/mesa: implement PBO downloads for ReadPixelsNicolai Hähnle2016-06-013-3/+160
| | | | | | | | | v2: require PIPE_CAP_SAMPLER_VIEW_TARGET; technically only needed for some of the texture targets, but all hardware that has shader images should also have this cap. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: hook up a no-op try_pbo_readpixelsNicolai Hähnle2016-06-011-20/+37
| | | | | | | | For better bisectability given that the order of some of the fallback tests in the blit path are rearranged. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: add layer_offset to PBO fragment shaderNicolai Hähnle2016-06-012-4/+16
| | | | | | | | | This will be used to select a slice of a 3D texture. v2: fix a comment (Marek) Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: create PBO download fragment shadersNicolai Hähnle2016-06-012-7/+79
| | | | | Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: add PBO download enable bit and fragment shadersNicolai Hähnle2016-06-012-0/+17
| | | | | | | | | | For downloads, the fragment shader must know the source texture target, hence we may cache multiple fragment shaders. v2: break long line (Marek) Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: move shareable parts of PBO upload state and draw to st_pbo.cNicolai Hähnle2016-06-013-106/+129
| | | | | Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: move PBO buffer address calculation to st_pbo.cNicolai Hähnle2016-06-013-120/+204
| | | | | Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: move PBO upload fs creation to st_pbo.cNicolai Hähnle2016-06-013-77/+80
| | | | | Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: rename pbo_upload to pboNicolai Hähnle2016-06-013-48/+48
| | | | | | | At the same time, rename members that are upload-specific to say so. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: move PBO vertex and geometry shader creation to st_pbo.cNicolai Hähnle2016-06-013-89/+97
| | | | | Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: begin moving PBO functions into their own fileNicolai Hähnle2016-06-016-61/+130
| | | | | Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/cso: allow saving the first fragment shader image slotNicolai Hähnle2016-06-011-4/+5
| | | | | Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* glsl: Use Geom.VerticesOut == -1 to specify unsetIan Romanick2016-06-011-1/+1
| | | | | | | | | Because apparently layout(max_vertices=0) is a thing. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "12.0" <[email protected]>
* i965: If control_data_header_size_bits is zero, don't do EndPrimitiveIan Romanick2016-06-012-0/+6
| | | | | | | | This can occur when max_vertices=0 is explicitly specified. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "12.0" <[email protected]>
* mesa: Fix bogus strncmpIan Romanick2016-06-011-1/+1
| | | | | | | | | | | | | | The string "[0]\0" is the same as "[0]" as far as the C string datatype is concerned. That string has length 3. strncmp(s, length_3_string, 4) is the same as strcmp(s, length_3_string), so make it be strcmp. v2: Not the same as strncmp(..., 3). Noticed by Ilia. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "12.0" <[email protected]>
* mesa/sampler: fix error codes for sampler parameters.Dave Airlie2016-06-011-27/+10
| | | | | | | | | | | | | The initial ARB_sampler_objects spec had GL_INVALID_VALUE in it, however version 8 of it fixed this, and the GL specs also have the fixed value in them. Fixes: GL45-CTS.texture_border_clamp.samplerparameteri_non_gen_sampler_error Reviewed-by: Ilia Mirkin <[email protected]> Cc: "12.0 11.2" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* i965: Add norbc debug optionTopi Pohjolainen2016-06-013-0/+4
| | | | | | | | | | | | | | This INTEL_DEBUG option disables lossless compression (also known as render buffer compression). v2: (Matt) Use likely(!lossless_compression_disabled) instead of !likely(lossless_compression_disabled) (Grazvydas) Update docs/envvars.html Cc: "12.0" <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen9: Configure rbc buffers as plain for non-rbc tex viewsTopi Pohjolainen2016-06-012-3/+48
| | | | | | | | | | | | | | | | | | | | | | Fixes rendering in Shadow of Mordor with rbc. Application writes RGBA_UNORM texture filling it with values the application wants to later on treat as SRGB_ALPHA. Intel driver enables lossless compression for the buffer by the time of writing. However, the driver fails to make sure the buffer can be sampled as something else later on and unfortunately there is restriction in the hardware for using lossless compression for srgb formats which looks to extend itself to the sampling engine also. Requesting srgb to linear conversion on top of compressed buffer results the color values to be pretty much garbage. Fortunately none of tracked benchmarks showed a regression with this. v2 (Matt): Add missing space Cc: "12.0" <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix the passthrough TCS for isolines.Kenneth Graunke2016-05-311-7/+12
| | | | | | | | | | | | | | | | We weren't setting up several of the uniform values for the patch header, so we'd crash when uploading push constants. We at least need to initialize them to zero. We also had the isoline parameters reversed, so it would also render incorrectly (if it didn't crash). Fixes a new Piglit test(*) (isoline-no-tcs), as well as crashes in GL44-CTS.tessellation_shader.single.max_patch_vertices. (*) https://lists.freedesktop.org/archives/piglit/2016-May/019866.html Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Cc: [email protected]
* i965/xfb: skip components in correct buffer.Dave Airlie2016-06-011-4/+6
| | | | | | | | | | | The driver was adding the skip components but always for buffer 0. This fixes: GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_skip_multiple_buffers Reviewed-by: Kenneth Graunke <[email protected]> Cc: "12.0 11.2" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* mesa/bufferobj: use mapping range in BufferSubData.Dave Airlie2016-06-011-1/+1
| | | | | | | | | | | | | | | | | According to GL4.5 spec: An INVALID_OPERATION error is generated if any part of the speci- fied buffer range is mapped with MapBufferRange or MapBuffer (see sec- tion 6.3), unless it was mapped with MAP_PERSISTENT_BIT set in the Map- BufferRange access flags. So we should use the if range is mapped path. This fixes: GL45-CTS.buffer_storage.map_persistent_buffer_sub_data Reviewed-by: Nicolai Hähnle <[email protected]> Cc: "12.0, 11.2" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* i965/fs: Allow scalar source regions on SNB math instructions.Francisco Jerez2016-05-313-17/+8
| | | | | | | | | | | | | | | | I haven't found any evidence that this isn't supported by the hardware, in fact according to the SNB hardware spec: "The supported regioning modes for math instructions are align16, align1 with the following restrictions: - Scalar source is supported. [...] - Source and destination offset must be the same, except the case of scalar source." Cc: "12.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Fix constant combining for instructions that cannot accept source mods.Francisco Jerez2016-05-311-4/+3
| | | | | | | | | | This is the case for SNB math instructions so we need to be careful and insert the literal value of the immediate into the table (rather than its absolute value) if the instruction is unable to invert the sign of the constant on the fly. Cc: "12.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Extend remove_duplicate_mrf_writes() to handle non-VGRF to MRF copies.Francisco Jerez2016-05-311-8/+8
| | | | | Cc: "12.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Fix compute_to_mrf() to coalesce VGRFs initialized by multiple ↵Francisco Jerez2016-05-311-10/+36
| | | | | | | | | | | | | | | | | | | | | | single-GRF writes. Which requires using a bitset instead of a boolean flag to keep track of the GRFs we've seen a generating instruction for already. The search loop continues until all instructions initializing the value of the source VGRF have been found, or it is determined that coalescing is not possible. Fixes a few piglit test cases on Gen4-6 which were regressed by 6956015aa514f2d06d0e4b33bfe6bca83142fbf0 due to the different (yet perfectly valid) ordering in which copy instructions are emitted now by the simd lowering pass, which had the side effect of causing this optimization pass to start corrupting the program in cases where a VGRF-to-MRF copy instruction would be eliminated but only the last instruction writing to the source VGRF region would be rewritten to point to the target MRF. Cc: "12.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Teach compute_to_mrf() about the COMPR4 address transformation.Francisco Jerez2016-05-311-1/+23
| | | | | | | | | This will be required to correctly transform the destination of 8-wide instructions that write a single GRF of a VGRF to MRF copy marked COMPR4. Cc: "12.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Refactor compute_to_mrf() to split search and rewrite into separate ↵Francisco Jerez2016-05-311-10/+25
| | | | | | | | | | | | | | | loops. This will allow compute_to_mrf to handle cases where the source of the VGRF-to-MRF copy is initialized by more than one instruction. In such cases we cannot rewrite the destination of any of the generating instructions until it's known whether the whole VGRF source region can be coalesced into the destination MRF, which will imply continuing the search until all generating instructions have been found or it has been determined that the VGRF and MRF registers cannot be coalesced. Cc: "12.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Fix compute-to-mrf VGRF region coverage condition.Francisco Jerez2016-05-311-3/+6
| | | | | | | | | | | | | | | Compute-to-mrf was checking whether the destination of scan_inst is more than one component (making assumptions about the instruction data type) in order to find out whether the result is being fully copied into the MRF destination, which is rather inaccurate in cases where a single-component instruction is only partially contained in the source region, or when the execution size of the copy and scan_inst instructions differ. Instead check whether the destination region of the instruction is really contained within the bounds of the source region of the copy. Cc: "12.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Simplify and improve accuracy of compute_to_mrf() by using ↵Francisco Jerez2016-05-311-47/+13
| | | | | | | | | | | | | regions_overlap(). Compute-to-mrf was being rather heavy-handed about checking whether instruction source or destination regions interfere with the copy instruction, which could conceivably lead to program miscompilation. Fix it by using regions_overlap() instead of the open-coded and dubiously correct overlap checks. Cc: "12.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Teach regions_overlap() about COMPR4 MRF regions.Francisco Jerez2016-05-311-3/+17
| | | | | Cc: "12.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* mesa: fix crash in driver_RenderTexture_is_safeMarek Olšák2016-05-311-1/+2
| | | | | | | | | This just fixed the crash with the apitrace in bug report. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95246 Cc: 11.1 11.2 12.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* st/glsl_to_tgsi: prevent infinite loopEric Engestrom2016-05-311-2/+3
| | | | | | | | `unsigned j` would never fail `j >= 0`, leading to an infinite loop as `j--` wraps around. Signed-off-by: Eric Engestrom <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* Revert "osmesa: don't try to bundle osmesa.def SConscript"Emil Velikov2016-05-301-0/+2
| | | | | | | This reverts commit c07df0f2014636b601cdbaff63214296599b1ad5. Now that the SCons build is back we need to include the files in the tarball.