path: root/src/mesa
Commit message | Author | Age | Files | Lines
* i965: Implement SetTextureStorageForBufferObject | Jason Ekstrand | 2015-01-22 | 1 | -0/+57
| | | | Reviewed-by: Neil Roberts <[email protected]>
* i965: Apply the miptree offset to surface state for renderbuffers | Jason Ekstrand | 2015-01-22 | 4 | -4/+8
| | | | | | | | | Previously, we were completely ignoring the mt->offset field for renderbuffers. While it does have some alignment constraints, it is valid to use it. This patch adds the code to each of the 4 surface state setup functions to handle it. Reviewed-by: Neil Roberts <[email protected]>
* i965/mipmap_tree: Add a depth parameter to create_for_bo | Jason Ekstrand | 2015-01-22 | 6 | -7/+14
| | | | Reviewed-by: Neil Roberts <[email protected]>
* mesa/dd: Add a function for creating a texture from a buffer object | Jason Ekstrand | 2015-01-22 | 1 | -0/+16
| | | | Reviewed-by: Neil Roberts <[email protected]>
* i965/vec4: Fix fprintf argument ordering. | Matt Turner | 2015-01-21 | 1 | -2/+2
| | | | Introduced in commit 3167a80b.
* mesa: change assert to unreachable in two format functions | Tobias Klausmann | 2015-01-21 | 2 | -2/+2
| | | | | | | | | | This fixes two problems reported by osc: I: Program returns random data in a function E: Mesa no-return-in-nonvoid-function ../../src/mesa/main/format_utils.c:180 E: Mesa no-return-in-nonvoid-function ../../src/mesa/main/glformats.c:2714 Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Tobias Klausmann <[email protected]>
* mesa: Add assert to check number of vector elements | Jan Vesely | 2015-01-21 | 2 | -0/+2
| | | | | | | | The below code crashes when vector_elements <= 0 Fixes Warray-bounds warnings Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* mesa: Fix some signed-unsigned comparison warnings | Jan Vesely | 2015-01-21 | 27 | -52/+54
| | | | | | | | v2: s/unsigned int/unsigned/ in prog_optimize.c Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: David Heidelberg <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* mesa: remove comparisons that are always true | Jan Vesely | 2015-01-21 | 2 | -3/+0
| | | | | Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* i965: Extract scalar region checking logic | Ben Widawsky | 2015-01-20 | 3 | -7/+15
| | | | | | | | | | | There are currently 2 users of this functionality. I have 2 more users coming up, and having a simple function makes the results much cleaner. The existing interface semantics was proposed by Matt. v2 (Ken): Rename to region_matches()/has_scalar_region(). Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add QWORD sizes to type_sz macro | Ben Widawsky | 2015-01-20 | 1 | -0/+3
| | | | | | | | | | | | | | | | | | | GEN8 added the QWORD as a valid type for certain operations on the EU. In order to calculate the number of registers used one must have the type size as part of the equation. Quoting the formula in the code: regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32; Adding this separately for bisection since there is no simple way to add an assert in the type_sz function. NOTE: As a side note, I was confused for a while because it's impossible to calculate the region, ie. registers needed, without vstride. However, at this point these are all part of the IR, and so no vstride must exist. Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
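The register-count formula quoted in this commit can be sketched in C as follows. This is only an illustration: the `toy_reg_type` enum and both function names are hypothetical stand-ins, not the driver's real `type_sz` macro, and the 32 in the formula is read as the size of one register in bytes.

```c
#include <assert.h>

/* Hypothetical register types; the real driver has its own enum. */
enum toy_reg_type {
   TOY_TYPE_UB,  /* 1-byte unsigned */
   TOY_TYPE_UW,  /* 2-byte word */
   TOY_TYPE_UD,  /* 4-byte dword */
   TOY_TYPE_F,   /* 4-byte float */
   TOY_TYPE_UQ,  /* 8-byte unsigned qword */
   TOY_TYPE_Q    /* 8-byte signed qword */
};

/* Size of one element of the given type, in bytes. */
static unsigned toy_type_sz(enum toy_reg_type type)
{
   switch (type) {
   case TOY_TYPE_UB: return 1;
   case TOY_TYPE_UW: return 2;
   case TOY_TYPE_UD:
   case TOY_TYPE_F:  return 4;
   case TOY_TYPE_UQ:
   case TOY_TYPE_Q:  return 8;
   }
   assert(!"unknown register type");
   return 0;
}

/* Number of 32-byte registers written, rounded up, as in the quoted formula. */
static unsigned toy_regs_written(unsigned width, unsigned stride,
                                 enum toy_reg_type type)
{
   return (width * stride * toy_type_sz(type) + 31) / 32;
}
```

Under these assumptions an 8-wide float destination with stride 1 occupies one register, while the same destination with a qword type occupies two, which is why the type size has to be known for qwords as well.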
* i965: Work around mysterious Gen4 GPU hangs with minimal state changes. | Kenneth Graunke | 2015-01-19 | 1 | -0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Gen4 hardware appears to GPU hang frequently when using Chromium, and also when running 'glmark2 -b ideas'. Most of the error states contain 3DPRIMITIVE commands in quick succession, with very few state packets between them - usually VERTEX_BUFFERS/ELEMENTS and CONSTANT_BUFFER. I trimmed an apitrace of the glmark2 hang down to two draw calls with a glUniformMatrix4fv call between the two. Either draw by itself works fine, but together, they hang the GPU. Removing the glUniform call makes the hangs disappear. In the hardware state, this translates to removing the CONSTANT_BUFFER packet between the two 3DPRIMITIVE packets. Flushing before emitting CONSTANT_BUFFER packets also appears to make the hangs disappear. I observed a slowdown in glxgears by doing it all the time, so I've chosen to only do it when BRW_NEW_BATCH and BRW_NEW_PSP are unset (i.e. we haven't done a CS_URB_STATE change or already flushed the whole pipeline). I'd much rather understand the problem, but at this point, I don't see how we'd ever be able to track it down further. We have no real tools, and the hardware people moved on years ago. I've analyzed 20+ error states and read every scrap of documentation I could find. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80568 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85367 Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Matt Turner <[email protected]> Cc: "10.4 10.3" <[email protected]>
* i965/nir: Enable SIMD16 support in the NIR FS backend. | Kenneth Graunke | 2015-01-19 | 1 | -2/+1
| | | | | | | | With the previous commits in place, it just works. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/nir: Use offset() instead of altering reg_offset directly. | Kenneth Graunke | 2015-01-19 | 1 | -59/+32
| | | | | | | | | | | offset() properly handles reg_width, so it'll work for SIMD16. While we're in the area, simplify a few cases, and use retype() to cut a few more lines of code. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/nir: Replace fs_reg(GRF, virtual_grf_alloc(...)) with vgrf(...). | Kenneth Graunke | 2015-01-19 | 3 | -13/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | brw_fs_nir.cpp creates almost all of its registers via: fs_reg reg = fs_reg(GRF, virtual_grf_alloc(num_components)); When we add SIMD16 support, we'll need to set reg->width = 16 and double the VGRF size...on pretty much every VGRF it allocates. This patch replaces that pattern with a new "vgrf" helper method: fs_reg reg = vgrf(num_components); The new function correctly takes reg_width into account. For now, reg_width is always 1, so this should have no functional change. v2: Just make vgrf() account for reg_width right away, rather than changing the behavior in the next patch. v3: Replace one last virtual_grf_alloc I missed. It's used in code that only runs for dispatch_width == 8, so it doesn't matter, but consistency is nice. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Replace fs_reg(fs_visitor, type) with fs_visitor::vgrf(type). | Kenneth Graunke | 2015-01-19 | 6 | -128/+122
| | | | | | | | | | | | | | | | | | I dislike how fs_reg has a constructor that knows about fs_visitor. Apart from that, it stands alone, with no need to interact with the rest of the compiler. Which is sensible - a class that represents a register should do just that. Allocating virtual register numbers should be left up to the compiler (fs_visitor). This patch replaces the constructor with a new fs_visitor::vgrf method, eliminating fs_reg's dependency on fs_visitor. It ends up being no more code. v2: Rebase from May 2014 -> January 2015. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* st/mesa: don't set vs.key.clamp_color if a shader doesn't write any colors | Marek Olšák | 2015-01-19 | 3 | -5/+10
| | | | And update some comments.
* mesa: fix a trivial spelling mistake | Martin Peres | 2015-01-19 | 1 | -1/+1
| | | | | Signed-off-by: Martin Peres <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: support GL_RGB for GL_EXT_texture_type_2_10_10_10_REV | Tapani Pälli | 2015-01-19 | 5 | -0/+8
| | | | | | | | | | | | | | | Commit 8ec6534 changed the texture upload path and the way the texture format is checked. This commit adds support for GL_RGB with GL_UNSIGNED_INT_2_10_10_10_REV, as specified by the EXT_texture_type_2_10_10_10_REV extension. This fixes a regression in the ES3 conformance test ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels. v2: add MESA_FORMAT_R10G10B10X2_UNORM format (Iago Toral) Signed-off-by: Tapani Pälli <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88385 Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* mesa: Add ARB_shader_precision infrastructure | Micah Fedke | 2015-01-19 | 2 | -0/+2
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Fix the dummy fragment shader. | Kenneth Graunke | 2015-01-17 | 1 | -7/+32
| | | | | | | | | | | | | | | | | | | | | | | | We hit an assertion that the destination of the FB write should not be an immediate. (I don't know what we were thinking.) Use ARF null. Trying to substitute real shaders with the dummy shader would crash when trying to upload non-existent uniforms. Say there are none. It also wouldn't generate any code because we didn't compute the CFG, and code generation now requires it. Compute it. Gen4-5 also require a message header to be present. On Gen6+, there were assertion failures in SF/SBE state because urb_setup was memset to 0 instead of -1, causing it to think there were attributes when nothing was set up right. Set to no attributes. Finally, you have to ensure "Setup URB Entry Read Length" is non-zero or you get GPU hangs, at least on Crestline. It now works on at least Crestline and Haswell. Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Fix up too-wide comment | Kristian Høgsberg | 2015-01-16 | 1 | -4/+3
| | | | Signed-off-by: Kristian Høgsberg <[email protected]>
* mesa: Add iterate method for string_to_uint_map | Tapani Pälli | 2015-01-16 | 1 | -0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | The upcoming shader cache needs this to be able to cache hash data from the gl_shader_program structure. Edited-by: Carl Worth <[email protected]>: There is an internal implementation detail that the hash table underlying the struct string_to_uint_map stores each value internally as (value+1). The user needn't be very concerned with this (other than knowing that a value of UINT_MAX cannot be stored) since put() adds 1 and get() subtracts 1. So in this commit, rather than call the user's function directly with hash_table_call_foreach, we call through a wrapper that fixes up the off-by-one values before the caller's callback sees them. And with this wrapper in place, we also give a better signature to the callback function being passed to iterate(), so that this callback function can actually expect a char* and an unsigned argument, (rather than a couple of void* ). Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
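The value+1 wrapper described in this commit might look roughly like the sketch below. Everything here is hypothetical: `toy_map` is a small linear array standing in for the real hash table, none of the `toy_*` names are Mesa's, and the reason given for the +1 (keeping a stored zero distinct from a NULL pointer) is an assumption rather than something the commit message states.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_ENTRIES 16

/* Toy stand-in for the underlying table: it only knows opaque pointers. */
struct toy_map {
   const char *keys[TOY_MAX_ENTRIES];
   void *data[TOY_MAX_ENTRIES];
   unsigned count;
};

/* put() stores value+1 (assumed: so a stored 0 is never a NULL pointer).
 * Duplicate keys are not handled in this sketch. */
static void toy_put(struct toy_map *m, const char *key, unsigned value)
{
   assert(value != UINT_MAX);          /* value+1 would wrap around */
   assert(m->count < TOY_MAX_ENTRIES);
   m->keys[m->count] = key;
   m->data[m->count] = (void *)((uintptr_t)value + 1);
   m->count++;
}

/* get() subtracts 1 again, so callers never see the internal offset. */
static bool toy_get(const struct toy_map *m, const char *key, unsigned *value)
{
   for (unsigned i = 0; i < m->count; i++) {
      if (strcmp(m->keys[i], key) == 0) {
         *value = (unsigned)((uintptr_t)m->data[i] - 1);
         return true;
      }
   }
   return false;
}

/* Raw iteration as the underlying table would offer it: void* everywhere. */
typedef void (*toy_raw_fn)(const void *key, void *data, void *closure);

static void toy_raw_foreach(const struct toy_map *m, toy_raw_fn fn,
                            void *closure)
{
   for (unsigned i = 0; i < m->count; i++)
      fn(m->keys[i], m->data[i], closure);
}

/* Typed callback that users of iterate() get to write. */
typedef void (*toy_iter_fn)(const char *key, unsigned value, void *closure);

struct toy_wrapper_state {
   toy_iter_fn fn;
   void *closure;
};

/* Wrapper: undo the +1 and restore the types before calling the user. */
static void toy_subtract_one_wrapper(const void *key, void *data,
                                     void *closure)
{
   struct toy_wrapper_state *state = closure;
   unsigned value = (unsigned)((uintptr_t)data - 1);
   state->fn((const char *)key, value, state->closure);
}

static void toy_iterate(const struct toy_map *m, toy_iter_fn fn, void *closure)
{
   struct toy_wrapper_state state = { fn, closure };
   toy_raw_foreach(m, toy_subtract_one_wrapper, &state);
}

/* Example user callback: sums every value it is shown. */
static void toy_sum_values(const char *key, unsigned value, void *closure)
{
   (void)key;
   *(unsigned *)closure += value;
}
```

Calling `toy_iterate(&map, toy_sum_values, &total)` hands the callback the values as they were passed to `toy_put()`, not the offset values held internally.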
* i965: Fix some oddities in FB_WRITE register width and execution size. | Kenneth Graunke | 2015-01-16 | 1 | -0/+2
| | | | | | | | | | | | | | | Previously, we generated this for FB writes in SIMD16 mode: load_payload(16) vgrf5@8+0.0:F, vgrf1:F, vgrf2:F, vgrf3:F, vgrf4:F fb_write(8) (null):UD, vgrf5@8+0.0:F 1sthalf The LOAD_PAYLOAD's destination had its register width set to 8, and the FB_WRITE had its execution size set to 8. This seems wrong, and while it probably doesn't affect anything, we should fix it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Make lower_load_payload etc. appear in INTEL_DEBUG=optimizer. | Kenneth Graunke | 2015-01-16 | 1 | -7/+11
| | | | | | | | | | | | | | In order to support calling lower_load_payload() inside a condition, this patch makes OPT() a statement expression: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html We recently did the equivalent change in the vec4 backend (commit 9b8bd67768769b685c25e1276e053505aede5f93). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Acked-by: Jason Ekstrand <[email protected]>
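A minimal sketch of an OPT()-style macro written as a statement expression is shown below. Statement expressions are a GNU C extension (GCC and Clang accept them), and the pass, the macro body and `toy_optimize` are all hypothetical; the driver's actual macro does more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pass: halves a non-zero even value and reports progress. */
static bool toy_halve_if_even(int *x)
{
   assert(x != NULL);
   if (*x != 0 && *x % 2 == 0) {
      *x /= 2;
      return true;
   }
   return false;
}

/* Statement expression: the final expression is the value of the macro.
 * Expects a bool named `progress` in the calling scope. */
#define OPT(pass, arg) ({                     \
      bool this_progress = pass(arg);         \
      progress = progress || this_progress;   \
      this_progress;                          \
   })

static int toy_optimize(int x, bool *made_progress)
{
   bool progress = false;

   /* Because OPT() yields a value, it can sit directly inside a condition. */
   while (OPT(toy_halve_if_even, &x))
      ;

   *made_progress = progress;
   return x;
}
```

With a plain do/while(0) macro the `while (OPT(...))` line would not compile, which is the situation the commit describes for calling a pass inside a condition.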
* format_utils: Use a more precise conversion when decreasing bits | Neil Roberts | 2015-01-16 | 1 | -3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When converting to a format that has fewer bits the previous code was just shifting off the bits. This doesn't provide very accurate results. For example when converting from 8 bits to 5 bits it is equivalent to doing this: x * 32 / 256 This works as if it's taking a value from a range where 256 represents 1.0 and scaling it down to a range where 32 represents 1.0. However this is not correct because it is actually 255 and 31 that represent 1.0. We can do better with a formula like this: (x * 31 + 127) / 255 The +127 is to make it round correctly. The new code has a special case to use uint64_t when the result of the multiplication would overflow an unsigned int. This function is inline and only ever called with constant values so hopefully the if statements will be folded. The main incentive to do this is to make the CPU conversion path pick the same values as the hardware would if it did the conversion. This fixes failures with the ‘texsubimage pbo’ test when using the patches from here: http://lists.freedesktop.org/archives/mesa-dev/2015-January/074312.html v2: Use 64-bit arithmetic when src_bits+dst_bits > 32 Reviewed-by: Jason Ekstrand <[email protected]>
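The two conversions contrasted in this commit can be compared with a short sketch. The function names are hypothetical and this is not the code from format_utils; it only applies the shift and the rounded formula from the message, generalised from 8 and 5 bits to arbitrary widths, and assumes a 32-bit `unsigned`.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of an unsigned normalized field with the given bit count. */
static uint32_t toy_max_unorm(unsigned bits)
{
   assert(bits >= 1 && bits <= 32);
   return bits == 32 ? UINT32_MAX : (UINT32_C(1) << bits) - 1;
}

/* Old approach per the commit text: just shift off the low bits. */
static uint32_t toy_narrow_shift(uint32_t x, unsigned src_bits,
                                 unsigned dst_bits)
{
   assert(dst_bits < src_bits && src_bits <= 32);
   return x >> (src_bits - dst_bits);
}

/* Rounded approach: (x * dst_max + src_max / 2) / src_max, which for
 * 8 -> 5 bits is the commit's (x * 31 + 127) / 255. */
static uint32_t toy_narrow_round(uint32_t x, unsigned src_bits,
                                 unsigned dst_bits)
{
   assert(dst_bits < src_bits && src_bits <= 32);
   const uint32_t src_max = toy_max_unorm(src_bits);
   const uint32_t dst_max = toy_max_unorm(dst_bits);
   const uint32_t half = src_max / 2;

   /* The product needs up to src_bits + dst_bits bits, so switch to
    * 64-bit arithmetic when that would not fit in 32 bits. */
   if (src_bits + dst_bits > 32)
      return (uint32_t)(((uint64_t)x * dst_max + half) / src_max);

   return (x * dst_max + half) / src_max;
}
```

For an 8-bit input of 250, shifting gives 31 while the rounded formula gives 30, and 250/255 scaled to the 0..31 range is about 30.4, so the rounded result is the closer one. For an input of 5 the shift gives 0 and the rounded formula gives 1.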
* i965/gen6: Fix crash with VS+TF after rendering with GS | Iago Toral Quiroga | 2015-01-16 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | Rendering with a GS and then using transform feedback with a program that does not have a GS can crash in gen6. The reason for this is that brw_begin_transform_feedback checks brw->geometry_program to decide if there is a GS program, but this is not correct: brw->geometry_program is updated when issuing drawing commands, so after rendering with a GS it will be non-NULL until we draw again with a program that does not have a GS. If the next program uses TF, we will call glBeginTransformFeedback before issuing the drawing command and hence brw->geometry_program will be non-NULL if the previous rendering used a GS. The right thing to do here is to check ctx->_Shader->CurrentProgram[MESA_SHADER_GEOMETRY] instead. This is what the gen7 code path does too. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=87694 Reviewed-by: Tapani Pälli <[email protected]>
* mesa: move GET_CURRENT_CONTEXT() to top of _mesa_init_renderbuffer() | Brian Paul | 2015-01-15 | 1 | -1/+2
| | | | | | To fix MSVC build. Reviewed-by: Matt Turner <[email protected]>
* mesa: Fix render buffer initial internal format in GLES 3 | Mike Mason | 2015-01-15 | 1 | -1/+18
| | | | | | | | | | Changes the initial internal format of a render buffer to GL_RGBA4 in GLES 3. This fixes a failure in the following DrawElements test: dEQP-GLES3.functional.state_query.rbo.renderbuffer_internal_format Reviewed-by: Chad Versace <[email protected]>
* util/hash_set: Rework the API to know about hashing | Jason Ekstrand | 2015-01-15 | 4 | -21/+18
| | | | | | | | | | | | | | | | | | | | | | | | Previously, the set API required the user to do all of the hashing of keys as it passed them in. Since the hashing function is intrinsically tied to the comparison function, it makes sense for the hash set to know about it. Also, it makes for a somewhat clumsy API as the user is constantly calling hashing functions many of which have long names. This is especially bad when the standard call looks something like _mesa_set_add(ht, _mesa_pointer_hash(key), key); In the above case, there is no reason why the hash set shouldn't do the hashing for you. We leave the option for you to do your own hashing if it's more efficient, but it's no longer needed. Also, if you do do your own hashing, the hash set will assert that your hash matches what it expects out of the hashing function. This should make it harder to mess up your hashing. This is analogous to 94303a0750 where we did this for hash_table. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
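The API shape this commit describes, a set that owns its hash function, offers a pre-hashed entry point, and asserts that a caller-supplied hash matches, can be sketched as below. This is a toy open-addressing set with hypothetical `toy_*` names and a fixed capacity; it is not the util/hash_set implementation, and keys are assumed to be non-NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SET_SIZE 64

/* The set carries its own hash and comparison functions. */
struct toy_set {
   uint32_t (*key_hash)(const void *key);
   bool (*key_equals)(const void *a, const void *b);
   const void *slots[TOY_SET_SIZE];   /* NULL marks an empty slot */
};

static uint32_t toy_pointer_hash(const void *key)
{
   return (uint32_t)((uintptr_t)key >> 2);
}

static bool toy_pointer_equals(const void *a, const void *b)
{
   return a == b;
}

/* For callers that already have the hash; the set double-checks it. */
static void toy_set_add_pre_hashed(struct toy_set *s, uint32_t hash,
                                   const void *key)
{
   assert(key != NULL);
   assert(hash == s->key_hash(key));

   /* Linear probing from the hashed slot. */
   for (unsigned i = 0; i < TOY_SET_SIZE; i++) {
      unsigned slot = (hash + i) % TOY_SET_SIZE;
      if (s->slots[slot] == NULL) {
         s->slots[slot] = key;
         return;
      }
      if (s->key_equals(s->slots[slot], key))
         return;   /* already present */
   }
   assert(!"toy set is full");
}

/* The common case: the set does the hashing itself. */
static void toy_set_add(struct toy_set *s, const void *key)
{
   toy_set_add_pre_hashed(s, s->key_hash(key), key);
}

static bool toy_set_contains(const struct toy_set *s, const void *key)
{
   uint32_t hash = s->key_hash(key);

   for (unsigned i = 0; i < TOY_SET_SIZE; i++) {
      unsigned slot = (hash + i) % TOY_SET_SIZE;
      if (s->slots[slot] == NULL)
         return false;
      if (s->key_equals(s->slots[slot], key))
         return true;
   }
   return false;
}
```

A caller then writes `toy_set_add(&set, key)` in the usual case and reaches for `toy_set_add_pre_hashed()` only when it already holds the hash; passing a mismatched hash trips the assertion instead of silently corrupting the set.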
* util: Move main/set to util/hash_set | Jason Ekstrand | 2015-01-15 | 7 | -446/+4
| | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* hash_table: Rename insert_with_hash to insert_pre_hashed | Jason Ekstrand | 2015-01-15 | 1 | -1/+1
| | | | | | | We already have search_pre_hashed. This makes the APIs match better. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Don't consider null dst instructions as matching non-null dst. | Matt Turner | 2015-01-15 | 2 | -2/+4
| | | | | | | | | | | | | | | | | | | When performing common subexpression elimination on instructions with non-null destinations we emit a MOV to copy the result to a new register that must have no other uses. In the case of: cmp.g.f0.0(8) null:D, vgrf43:F, 0.500000f ... cmp.g.f0.0(8) vgrf113:D, vgrf43:F, 0.500000f we put the first instruction in the AEB and decided that we could reuse its result when we found the second. Unfortunately, that meant that we'd emit a MOV from the first's destination, which is null. Don't do anything if the entry's destination is null and the instruction's destination is non-null. Tested-by: Tapani Pälli <[email protected]>
* i965/vec4: Make sure that imm writes are to registers in the same file. | Matt Turner | 2015-01-15 | 1 | -2/+8
| | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87887
* i965/fs: Emit MADs from (x + abs(y * z)). | Matt Turner | 2015-01-15 | 1 | -3/+15
| | | | | | | | | Just use the abs source modifier on both of the multiplicand arguments. instructions in affected programs: 300 -> 296 (-1.33%) Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/fs: Emit MADs from (x + -(y * z)). | Matt Turner | 2015-01-15 | 1 | -0/+12
| | | | | | | | | | Just use the negation source modifier on one of the multiplicand arguments. total instructions in shared programs: 5889529 -> 5880016 (-0.16%) instructions in affected programs: 600846 -> 591333 (-1.58%) Reviewed-by: Kristian Høgsberg <[email protected]>
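The algebra behind the two MAD commits above can be checked with a small sketch. `toy_mad` is a hypothetical stand-in for the hardware instruction (addend + a * b), and the source modifiers are modelled as ordinary negation and absolute value applied to the arguments; none of these names come from the driver.

```c
#include <assert.h>

/* Stand-in for a multiply-add instruction: addend + a * b. */
static float toy_mad(float addend, float a, float b)
{
   return addend + a * b;
}

static float toy_abs(float v)
{
   return v < 0.0f ? -v : v;
}

/* x + -(y * z): negate one multiplicand instead of the product. */
static float add_neg_mul(float x, float y, float z)
{
   return x + -(y * z);
}

static float add_neg_mul_as_mad(float x, float y, float z)
{
   return toy_mad(x, -y, z);
}

/* x + abs(y * z): take abs of both multiplicands instead of the product. */
static float add_abs_mul(float x, float y, float z)
{
   return x + toy_abs(y * z);
}

static float add_abs_mul_as_mad(float x, float y, float z)
{
   return toy_mad(x, toy_abs(y), toy_abs(z));
}

/* Asserts that each rewritten form matches the original expression. */
static void toy_check_mad_rewrites(float x, float y, float z)
{
   assert(add_neg_mul(x, y, z) == add_neg_mul_as_mad(x, y, z));
   assert(add_abs_mul(x, y, z) == add_abs_mul_as_mad(x, y, z));
}
```

The rewrites rely on -(y * z) being equal to (-y) * z and abs(y * z) being equal to abs(y) * abs(z), so a separate negate or abs instruction after the multiply is not needed.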
* i965/nir: Do a final copy lowering pass before lowering locals to regs | Jason Ekstrand | 2015-01-15 | 1 | -0/+3
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Rename lower_variables to lower_vars_to_ssa | Jason Ekstrand | 2015-01-15 | 1 | -1/+1
| | | | | | | | The original name wasn't particularly descriptive. This one indicates that it actually gives you SSA values as opposed to the old pass which lowered variables to registers. Reviewed-by: Connor Abbott <[email protected]>
* nir/tex_instr: Add a nir_tex_src struct and dynamically allocate the src array | Jason Ekstrand | 2015-01-15 | 1 | -2/+2
| | | | | | | | This solves a number of problems. First is the ability to change the number of sources that a texture instruction has. Second, it solves the dilemma that may occur if a texture instruction has more than 4 sources. Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Handle sample ID, position, and mask better | Jason Ekstrand | 2015-01-15 | 2 | -12/+71
| | | | | | | | | | Before, we were emitting the full pile of setup instructions for sample_id and sample_pos every time they were used. With this commit, we emit them in their own pass once at the beginning of the shader and simply emit uses later on. When it comes time for setting up VS, we can put setup for its special values in the same pass. Reviewed-by: Connor Abbott <[email protected]>
* nir: Make load_const SSA-only | Jason Ekstrand | 2015-01-15 | 2 | -26/+3
| | | | | | | | As it was, we weren't ever using load_const in a non-SSA way. This allows us to substantially simplify the load_const instruction. If we ever need a non-SSA constant load, we can do a load_const and an imov. Reviewed-by: Connor Abbott <[email protected]>
* i965/nir: Move the other lowering passes to before out-of-SSA | Jason Ekstrand | 2015-01-15 | 1 | -6/+6
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir/lower_atomics: Use/support SSA | Jason Ekstrand | 2015-01-15 | 1 | -3/+3
| | | | | | | | | | | Previously, lower_atomics was non-SSA only. We assert-failed if the destination of an atomic operation intrinsic was an SSA def and we used temporary registers for computing offsets. This commit changes both of these behaviors. We now use SSA values for computing offsets (so we can optimize them) and we handle SSA destinations. We also move the pass to run before we go out of SSA on i965 as it now generates SSA values. Reviewed-by: Connor Abbott <[email protected]>
* nir: Remove predication | Jason Ekstrand | 2015-01-15 | 1 | -62/+11
| | | | | | | | We stopped generating predicates in glsl_to_nir some time ago. Right now, it's all dead untested code that I'm not convinced always worked in the first place. If we decide we want them back, we can revert this patch. Reviewed-by: Connor Abbott <[email protected]>
* nir: Make bcsel a fully vector operation | Jason Ekstrand | 2015-01-15 | 1 | -3/+8
| | | | | | | | Previously, the condition was a scalar that applied to all components simultaneously. As of this commit, the condition is a vector and each component is switched separately. Reviewed-by: Connor Abbott <[email protected]>
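The difference between the scalar and the vector condition can be illustrated with a small sketch. The functions below are hypothetical and operate on plain float arrays; they are not NIR code, and the condition is modelled simply as zero or non-zero integers.

```c
#include <assert.h>

#define TOY_MAX_COMPONENTS 4

/* Old behaviour per the commit text: one condition selects whole vectors. */
static void toy_bcsel_scalar(int cond, const float *a, const float *b,
                             float *dst, unsigned n)
{
   assert(n <= TOY_MAX_COMPONENTS);
   for (unsigned i = 0; i < n; i++)
      dst[i] = cond ? a[i] : b[i];
}

/* New behaviour: each component has its own condition. */
static void toy_bcsel_vector(const int *cond, const float *a, const float *b,
                             float *dst, unsigned n)
{
   assert(n <= TOY_MAX_COMPONENTS);
   for (unsigned i = 0; i < n; i++)
      dst[i] = cond[i] ? a[i] : b[i];
}
```

With a = (1, 2, 3, 4), b = (10, 20, 30, 40) and a condition of (1, 0, 0, 1), the vector form produces (1, 20, 30, 4), a mix that the scalar form cannot express in a single operation.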
* i965/fs_nir: Add support for indirect texture arrays | Jason Ekstrand | 2015-01-15 | 1 | -4/+21
| | | | | | | | v2 Jason Ekstrand <[email protected]>: - Use the nir_tex_src_sampler_offset source type instead of the sampler_indirect thing that I cooked up before. Reviewed-by: Chris Forbes <[email protected]>
* nir/tex_instr: Rename the indirect source type and add an array size | Jason Ekstrand | 2015-01-15 | 1 | -1/+1
| | | | | | | | | In particular, we rename nir_tex_src_sampler_index to _sampler_offset and add a sampler_array_size field to nir_tex_instr. This way we can pass the size of sampler arrays through to backends even after removing the variable information and, with it, the type. Reviewed-by: Connor Abbott <[email protected]>
* nir: Use a source for uniform buffer indices instead of an index | Jason Ekstrand | 2015-01-15 | 1 | -37/+59
| | | | | | | | | | In GLSL-to-NIR we were just setting the base index to 0 whenever there was an indirect so having it expressed as a sum makes no sense. Also, while a base offset may make sense for the memory location (first element in the array, etc.) it makes less sense for the actual uniform buffer index. This may change later, but it seems to make more sense for now. Reviewed-by: Connor Abbott <[email protected]>
* nir: Make texture instruction names more consistent | Jason Ekstrand | 2015-01-15 | 1 | -2/+2
| | | | | | | | This commit renames nir_instr_as_texture to nir_instr_as_tex and renames nir_instr_type_texture to nir_instr_type_tex to be consistent with nir_tex_instr. Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a basic constant folding pass | Jason Ekstrand | 2015-01-15 | 1 | -0/+2
| | | | Reviewed-by: Connor Abbott <[email protected]>