summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: generate register and packet tables for an IB parser from sid.hMarek Olšák2015-08-264-0/+190
| | | | | | | | | | | This makes writing a good IB parser a lot easier. It generates 2 tables: - packet3 table - register table with all registers, fields, and named values Acked-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]>
* radeonsi: remove duplicated register definitions and instruction definitionsMarek Olšák2015-08-261-3160/+0
| | | | | | | | | Instruction encoding isn't needed in Mesa. The border color address registers were duplicated. Acked-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]>
* r600g,radeonsi: remove unused ill-formed register field definitionsMarek Olšák2015-08-262-2/+0
| | | | | Acked-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]>
* radeonsi: add an initial dump_debug_state implementation dumping shadersMarek Olšák2015-08-264-0/+64
| | | | | | | This is usually called after a draw call. Acked-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]>
* radeonsi: allow si_dump_key to write to a fileMarek Olšák2015-08-262-18/+19
| | | | | Acked-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]>
* gallium/ddebug: new pipe for hang detection and driver state dumping (v2)Marek Olšák2015-08-2610-0/+2134
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v2: lots of improvements This is like identity or trace, but simpler. It doesn't wrap most states. Run with: GALLIUM_DDEBUG=1000 [executable] where "executable" is the app and "1000" is in miliseconds, meaning that the context will be considered hung if a fence fails to signal in 1000 ms. If that happens, all shaders, context states, bound resources, draw parameters, and driver debug information (if any) will be dumped into: /home/$username/dd_dumps/$processname_$pid_$index. Note that the context is flushed after every draw/clear/copy/blit operation and then waited for to find the exact call that hangs. You can also do: GALLIUM_DDEBUG=always to do the dumping after every draw/clear/copy/blit operation without flushing and waiting. Examples of driver states that can be dumped are: - Hardware status registers saying which hw block is busy (hung). - Disassembled shaders in a human-readable form. - The last submitted command buffer in a human-readable form. v2: drop pipe-loader changes, drop SConscript rename dd.h -> dd_pipe.h Acked-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]>
* gallium: add flags parameter to pipe_screen::context_createMarek Olšák2015-08-2657-65/+96
| | | | | | | | This allows creating compute-only and debug contexts. Reviewed-by: Brian Paul <[email protected]> Acked-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]>
* gallium: add an interface for dumping debug driver stateMarek Olšák2015-08-262-0/+17
| | | | | | Reviewed-by: Brian Paul <[email protected]> Acked-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]>
* mesa: remove pointless es31 checks, fix indirect to only be in es31Ilia Mirkin2015-08-262-60/+25
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* mesa: uncomment checks in es31 computation, add texture_msIlia Mirkin2015-08-261-2/+4
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Martin Peres <[email protected]>
* mesa: create multisample fallback textures like normal texturesMarek Olšák2015-08-261-0/+2
| | | | | | | | | This works if drivers upsample on upload (like all radeon ones do). The alternative is an unexpected GL error from anything calling _mesa_update_state and possibly other issues. Cc: 10.6 11.0 <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radeonsi: mark unreachable paths to avoid warningsGrazvydas Ignotas2015-08-262-3/+3
| | | | | | | | | Otherwise we get: warning: 'num_user_sgprs' may be used uninitialized in this function ... Reviewed-by: Michel Dänzer <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* mesa: GetTexLevelParameter{if}v changes for OpenGL ES 3.1Tapani Pälli2015-08-261-6/+18
| | | | | | | | | | | | | Patch refactors existing parameters check to first check common enums between desktop GL and GLES 3.1 and modifies get_tex_level_parameter_image to be compatible with enums specified in 3.1. v2: remove extra is_gles31() checks (suggested by Ilia) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> (v1) Reviewed-by: Marta Lofstedt <[email protected]> (v1) Reviewed-by: Ilia Mirkin <[email protected]>
* mesa/es3.1: Allow GL_COMPUTE_WORK_GROUP_SIZE for OpenGL ES 3.1Marta Lofstedt2015-08-261-1/+1
| | | | | | | | | According to OpenGL ES specification section 7.12, GL_COMPUTE_WORK_GROUP_SIZE, is supported by the glGetProgramiv function. Signed-off-by: Marta Lofstedt <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* mesa/es3.1: Enable getting MAX_COMPUTE_WORK_GROUP_ values for OpenGL ES 3.1Marta Lofstedt2015-08-261-2/+2
| | | | | | | | | According to the OpenGL ES 3.1 specification chapter 17, the MAX_COMPUTE_WORK_GROUP_COUNT and MAX_COMPUTE_WORK_GROUP_SIZE is available for glGetIntegeri_v. Signed-off-by: Marta Lofstedt <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* mesa/formats: pass correct parameter to _mesa_is_format_compressedDave Airlie2015-08-261-1/+1
| | | | | | | | | | | | | | commit 26c549e69d12e44e2e36c09764ce2cceab262a1b Author: Nanley Chery <[email protected]> Date: Fri Jul 31 10:26:36 2015 -0700 mesa/formats: remove compressed formats from matching function caused a regression in my CTS testing, this looks like a clear thinko. Reviewed-by: Nanley Chery <[email protected]> sSigned-off-by: Dave Airlie <[email protected]>
* gallium/auxiliary: optimize rgb9e5 helper some moreRoland Scheidegger2015-08-261-45/+42
| | | | | | | | | | | | | | | | I used this as some testing ground for investigating some compiler bits initially (e.g. lrint calls etc.), figured I could do much better in the end just for fun... This is mathematically equivalent, but uses some tricks to avoid doubles and also replaces some float math with ints. Good for another performance doubling or so. As a side note, some quick tests show that llvm's loop vectorizer would be able to properly vectorize this version (which it failed to do earlier due to doubles, producing a mess), giving another 3 times performance increase with sse2 (more with sse4.1), but this may not apply to mesa. No piglit change. Acked-by: Marek Olšák <[email protected]>
* gallium/auxiliary: optimize rgb9e5 helper a bitRoland Scheidegger2015-08-261-18/+17
| | | | | | | | | | | | | | | | | This code (lifted straight from the extension) was doing things the most inefficient way you could think of. This drops some of the more expensive float operations, in particular - int-cast floors (pointless, values always positive) - 2 raised to (signed) integers (replace with simple exponent manipulation), getting rid of a misguided comment in the process (implement with table...) - float division (replace with mul of reverse of those exponents) This is like 3 times faster (measured for float3_to_rgb9e5), though it depends (e.g. llvm is clever enough to replace exp2 with ldexp whereas gcc is not, division is not too bad on cpus with early-exit divs). Note that keeping the double math for now (float x + 0.5), as the results may otherwise differ. Acked-by: Marek Olšák <[email protected]>
* mesa/texgetimage: fix missing stencil checkDave Airlie2015-08-261-0/+7
| | | | | | | | | | | | GetTexImage can read to stencil8 but only from a stencil or depthstencil textures. This fixes a bunch of failures in CTS GL33-CTS.gtf32.GL3Tests.packed_pixels Reviewed-by: Marek Olšák <[email protected]> Cc: "11.0" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* mesa/teximage: Add GL error parameter to _mesa_target_can_be_compressedNanley Chery2015-08-253-44/+68
| | | | | | | | | | Enables _mesa_target_can_be_compressed to return the appropriate GL error depending on it's inputs. Use the parameter to return the appropriate GL error for ETC2 formats on GLES3. Suggested-by: Chad Versace <[email protected]> Reviewed-by: Chad Versace <[email protected]> Signed-off-by: Nanley Chery <[email protected]>
* mesa/formats: remove compressed formats from matching functionNanley Chery2015-08-2512-60/+31
| | | | | | | | | | | | | | | All compressed formats return GL_FALSE and there isn't any evidence to support that this behaviour would change. Remove all switch cases for compressed formats. v2. Since the exhaustive switch is removed, add a gtest to ensure all formats are handled. v3. Ensure that GL_NO_ERROR is set before returning. v4. Fix an arg to _mesa_uncompressed_format_to_type_and_comps(); fix formatting and misc improvements (Chad). Reviewed-by: Chad Versace <[email protected]> Signed-off-by: Nanley Chery <[email protected]>
* mesa/formats: make format testing a gtestNanley Chery2015-08-254-155/+137
| | | | | | | | | | | | | | | We currently check that our format info table is sane during context initialization in debug builds. Perform this check during `make check` instead. This enables format testing in release builds and removes the requirement of an exhuastive switch for _mesa_uncompressed_format_to_type_and_comps(). v2. indentation and conditional inclusion fixes (Chad). allow tests to continue running if any format fails and display the failing format name. Reviewed-by: Chad Versace <[email protected]> Signed-off-by: Nanley Chery <[email protected]>
* gallium/ttn: Use nir_builder_insert() rather than poking at cf_list.Kenneth Graunke2015-08-251-16/+16
| | | | | | | | | I intend to remove nir_builder::cf_node_list, so I can't have this code poking at it directly. The proper way is to set the insertion point and then simply insert things there. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* prog_to_nir: Use nir_builder_insert() rather than poking at cf_list.Kenneth Graunke2015-08-251-11/+11
| | | | | | | | | I intend to remove nir_builder::cf_node_list, so I can't have this code poking at it directly. The proper way is to set the insertion point and then simply insert things there. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: Use nir_shader::stage rather than passing it around.Kenneth Graunke2015-08-254-12/+9
| | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: Store gl_shader_stage in nir_shader.Kenneth Graunke2015-08-255-7/+34
| | | | | | | | | | | | This makes it easy for NIR passes to inspect what kind of shader they're operating on. Thanks to Michel Dänzer for helping me figure out where TGSI stores the shader stage information. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Combine assign_constant_locations and ↵Jason Ekstrand2015-08-252-30/+11
| | | | | | | | | | move_uniform_array_access_to_pull_constants The comment above move_uniform_array_access_to_pull_constants was completely bogus because it has nothing to do with lowering instructions. Instead, it's assiging locations of pull constants. Reviewed-by: Kenneth Graunke <[email protected]>
* nir/lower_io: Remove assign_var_locations_direct_firstJason Ekstrand2015-08-252-82/+0
| | | | | | This is no longer used so we might as well get rid of it. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Rework uniform handlingJason Ekstrand2015-08-253-29/+11
| | | | | | | | | | | Previously, we treated the entire UNIFORM file as if it had two elements: One for direct things and one for indirect. This is substantially different from how the old visitor code handled it where each element was effectively its own uniform. This commit makes the NIR path more like the old ir_visitor path where each uniform is separate. This should allow us to more easily make decisions about what to push. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4_nir: Get rid of the uniform_driver_location trackingJason Ekstrand2015-08-252-20/+3
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* nir/lower_io: Separate driver_location and base offset for uniformsJason Ekstrand2015-08-251-2/+7
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* nir/intrinsics: Add a second const index to load_uniformJason Ekstrand2015-08-253-13/+19
| | | | | | | | | | | In the i965 backend, we want to be able to "pull apart" the uniforms and push some of them into the shader through a different path. In order to do this effectively, we need to know which variable is actually being referred to by a given uniform load. Previously, it was completely flattened by nir_lower_io which made things difficult. This adds more information to the intrinsic to make this easier for us. Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Pass a type_size() function pointer into nir_lower_io().Kenneth Graunke2015-08-254-124/+31
| | | | | | | | | | | | | | | | | | | | | | | | | Previously, there were four type_size() functions in play - the i965 compiler backend defined scalar and vec4 type_size() functions, and nir_lower_io contained its own similar functions. In fact, the i965 driver used nir_lower_io() and then looped over the components using its own type_size - meaning both were in play. The two are /basically/ the same, but not exactly in obscure cases like subroutines and images. This patch removes nir_lower_io's functions, and instead makes the driver supply a function pointer. This gives the driver ultimate flexibility in deciding how it wants to count things, reduces code duplication, and improves consistency. v2 (Jason Ekstrand): - One side-effect of passing in a function pointer is that nir_lower_io is now aware of and properly allocates space for image uniforms, allowing us to drop hacks in the backend Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> v2 Reviewed-by: Kenneth Graunke <[email protected]>
* prog_to_nir: Don't allocate nir_variable with type vec4[0] for uniforms.Kenneth Graunke2015-08-251-7/+11
| | | | | | | | | | | If there are no parameters, we don't need to create a nir_variable to hold them...and allocating an array of length 0 is pretty bogus. Should avoid i965 backend assertions in future patches Jason and I are working on. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Move type_size() methods out of visitor classes.Kenneth Graunke2015-08-258-27/+28
| | | | | | | | I want to use C function pointers to these, and they don't use anything in the visitor classes anyway. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Make setup_vec4_uniform_value and _image_uniform_values take an offsetJason Ekstrand2015-08-257-22/+38
| | | | | | | | This way they don't implicitly increment the uniforms variable and don't have to be called in-sequence during uniform setup. Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Rename setup_vector_uniform_values to setup_vec4_uniform_valueJason Ekstrand2015-08-256-17/+18
| | | | | | | The new name more accurately represents what it does: Set up a single vec4 uniform value. Reviewed-by: Kenneth Graunke <[email protected]>
* freedreno/ir3: fix compile break after splitting out nir_control_flow.hRob Clark2015-08-251-0/+1
| | | | | | | | | | | | | | | The commit: commit b49371b8ede380f10ea3ab333246a3b01ac6aca5 Author: Connor Abbott <[email protected]> AuthorDate: Tue Jul 21 19:54:18 2015 -0700 nir: move control flow modification to its own file split out some control flow related APIs into a separate header, but did not update drivers. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix compile break after fxn->start_block removalRob Clark2015-08-251-1/+1
| | | | | | | | | | | | | | The commit: commit 8e0d4ef3410ea07d9621df3e083bc3e7c1ad2ab0 Author: Kenneth Graunke <[email protected]> AuthorDate: Thu Aug 6 18:18:40 2015 -0700 nir: Delete the nir_function_impl::start_block field. removed the start_block field without fixing up drivers.. Signed-off-by: Rob Clark <[email protected]>
* mesa: enable texture stencil8 for multisampleDave Airlie2015-08-251-2/+5
| | | | | | | | | This fixes GL45-CTS.gtf44.GL31Tests.texture_stencil8.texture_stencil8_gl44 from the ogl conform suite. Reviewed-by: Ilia Mirkin <[email protected]> Cc: 10.6 11.0 <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* mesa: make _mesa_bind_texture_unit() staticBrian Paul2015-08-242-9/+5
| | | | | | It's only called from the file it's defined in. Reviewed-by: Timothy Arceri <[email protected]>
* mesa/formats: store whether or not a format is sRGB in gl_format_infoNanley Chery2015-08-242-24/+6
| | | | | | | | v2: remove extra newline. v3: use bool instead of GLboolean. Reviewed-by: Anuj Phogat <[email protected]> Signed-off-by: Nanley Chery <[email protected]>
* nir: Use !block_ends_in_jump() in a few places rather than open-coding.Kenneth Graunke2015-08-241-12/+9
| | | | | | | | | Connor introduced this helper recently; we should use it here too. I had to move the function earlier in the file for it to be available. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/cf: reimplement nir_cf_node_remove() using the new APIConnor Abbott2015-08-242-33/+9
| | | | | | | | | This gives us some testing of it. Also, the old nir_cf_node_remove() wasn't handling phi nodes correctly and was calling cleanup_cf_node() too late. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/cf: add new control modification API'sConnor Abbott2015-08-242-0/+137
| | | | | | | | | | | | | | | | | | | | | | | | | | These will help us do a number of things, including: - Early return elimination. - Dead control flow elimination. - Various optimizations, such as replacing: if (foo) { ... } if (!foo) { ... } with: if (foo) { ... } else { ... } Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/cf: use a cursor for inserting control flowConnor Abbott2015-08-242-175/+49
| | | | | Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/cf: add split_block_cursor()Connor Abbott2015-08-241-0/+48
| | | | | | | | This is a helper that will be shared between the new control flow insertion and modification code. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/cf: add split_block_before_instr()Connor Abbott2015-08-241-0/+18
| | | | | Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/cf: add a cursor structureConnor Abbott2015-08-241-0/+91
| | | | | | | | | | | | | | | | | | | For now, it allows us to refactor the control flow insertion API's so that there's a single entrypoint (with some wrappers). More importantly, it will allow us to reduce the combinatorial explosion in the extract function. There, we need to specify two points to extract, which may be at the beginning of a block, the end of a block, or in the middle of a block. And then there are various wrappers based off of that (before a control flow node, before a control flow list, etc.). Rather than having 9 different functions, we can have one function and push the actual logic of determining which variant to use down to the split function, which will be shared with nir_cf_node_insert(). In the future, we may want to make the instruction insertion API's as well as the builder use this, but that's a future cleanup. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/cf: fix link_blocks() when there are no successorsConnor Abbott2015-08-241-1/+2
| | | | | | | | | | | | | | When we insert a single basic block A into another basic block B, we will split B into C and D, insert A in the middle, and then splice together C, A, and D. When we splice together C and A, we need to move the successors of A into C -- except A has no successors, since it hasn't been inserted yet. So in move_successors(), we need to handle the case where the block whose successors are to be moved doesn't have any successors. Fixing link_blocks() here prevents a segfault and makes it work correctly. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>