summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
* st/va: MPEG4 generate GOV and VOP headerMichael Varga2014-11-101-0/+92
| | | | | | Also, Implemented a small locally used interface for writing bits to a buffer. Signed-off-by: Michael Varga <[email protected]>
* st/va: MPEG4 populate the SPS structureMichael Varga2014-11-101-0/+6
| | | | Signed-off-by: Michael Varga <[email protected]>
* st/va: MPEG4 populate the iq matrix buffersMichael Varga2014-11-101-0/+16
| | | | Signed-off-by: Michael Varga <[email protected]>
* st/va: MPEG4 populate the PPS structureMichael Varga2014-11-102-0/+81
| | | | Signed-off-by: Michael Varga <[email protected]>
* st/va: refactored handleVASliceDataBufferTypeMichael Varga2014-11-101-35/+40
| | | | | | | This patch cleans the function handleVASliceDataBufferType() for better readability. Signed-off-by: Michael Varga <[email protected]>
* mesa: Remove _mesa_max_buffer_indexIan Romanick2014-11-102-52/+0
| | | | | | | It appears to be completely unused since f9be8543 (February 2012). Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Uniform logging is very, very unlikelyIan Romanick2014-11-101-2/+2
| | | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* glsl: Swap the order of glsl_type::name and ::lengthIan Romanick2014-11-102-8/+8
| | | | | | | | | | | | On x86-64 this saves 8 bytes of padding in the structure, and this reduces the size of the structure to 32 bytes. v2: Fix constructor so that GCC won't warn about the order of initialization. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* glsl: Store glsl_type::vector_elements and ::matrix_columns as uint8_tIan Romanick2014-11-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to the total number of bits used in the bitfield, this does not increase the size of the structure. It does, however, reduce the number of instructions required each time one of these fields is accessed. To access ::matrix_columns with the bitfield, three instructions were required: movzbl 0x9(%rdx),%eax shr %al and $0x7,%eax As a uint8_t, only one instruction is required. movzbl 0xa(%rdx),%eax These fields are accessed *a lot*. Valgrind callgrind results for a trace of Tesseract: _mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i Before (64-bit): 48,103,497 16,556,096 676,447 After (64-bit): 45,722,616 15,737,964 670,607 _mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i Before (32-bit): 61,472,611 21,051,222 821,361 After (32-bit): 57,987,421 19,872,226 811,609 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* mesa: Don't check for API_OPENGLES in _mesa_uniform_matrixIan Romanick2014-11-101-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | There are no uniforms in OpenGL ES 1.x, so we can't even get to this code in that API. Also, reorder the checks. First check that transpose is true, then check whether or not that is legal in the current API. transpose should never be true in an ES2 context, so this gets one check (the more expensive one) out of the main path. Valgrind callgrind results for a trace of Tesseract: _mesa_UniformMatrix4fv _mesa_UniformMatrix3fv Before (64-bit): 96,119,025 24,240,510 After (64-bit): 90,726,569 22,926,662 _mesa_UniformMatrix4fv _mesa_UniformMatrix3fv Before (32-bit): 132,434,452 29,051,808 After (32-bit): 126,658,112 27,989,316 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* mesa: Rework array error checks in validate_uniform_parametersIan Romanick2014-11-101-19/+22
| | | | | | | | | | | | | | | | | | | | | | Before ARB_explicit_uniform_location, Mesa's location encoding allowed locations for non-array types that had non-zero array indices. Basically, part of the location was the uniform and part was the array index. This meant that some checks had to occur for arrays and non-arrays. This is no longer possible, we the checks can be split up. Valgrind callgrind results for a trace of Tesseract: _mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i Before (64-bit): 50,499,557 17,487,316 686,227 After (64-bit): 50,023,791 17,274,432 684,293 _mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i Before (32-bit): 62,968,039 21,732,380 828,147 After (32-bit): 62,373,967 21,490,756 826,223 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* mesa: Get some gl_shader_program::LinkStatus checking out of the main pathIan Romanick2014-11-101-6/+19
| | | | | | | | | | I really wanted to remove 'shProg != NULL' as well, but that would have required adding a dummy program as the default program. That seemed like more churn than removing one test was worth. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* mesa: Rework location == -1 error checkingIan Romanick2014-11-101-38/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | Only one caller wanted to generate an error when location == -1, so move the error generation to that caller. There will be more callers in the future that do not want to generate errors. Move the location == -1 check later in validate_uniform_parameters. As currently implemented, glUniform1iv(-1, -1, data) would not generate an error, but it should due to count being < 0. The location that I have moved it to will make more sense with the next commit. Valgrind callgrind results for a trace of Tesseract: _mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i Before (64-bit): 51,241,217 17,740,162 689,181 After (64-bit): 50,499,557 17,487,316 686,227 _mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i Before (32-bit): 63,940,605 21,987,918 831,065 After (32-bit): 62,968,039 21,732,380 828,147 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* mesa: Minor clean ups in _mesa_uniformIan Romanick2014-11-101-23/+9
| | | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* mesa: Remove GLSL_TYPE_SAMPLER checkIan Romanick2014-11-101-2/+1
| | | | | | | | | Noting the assertion just a few lines earlier, returnType cannot be GLSL_TYPE_SAMPLER. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* mesa/main: Pass the data that _mesa_uniform actually wantsIan Romanick2014-11-103-119/+54
| | | | | | | | | | The GL_ enums were previously used because glsl_types.h couldn't be used in C code. That was fixed some time ago (and uniforms.c already includes glsl_types.h), so this is no longer necessary. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* ilo: derive fb blending caps at bind timeChia-I Wu2014-11-103-78/+101
| | | | | | | | Derive whether a RT supports blending, logicop, and the like when set_framebuffer_state() is called. This enables us to simplify gen6_BLEND_STATE(). Signed-off-by: Chia-I Wu <[email protected]>
* ilo: remove inlined state functionsChia-I Wu2014-11-104-236/+177
| | | | | | | We had some inlined state functions for dispatching. They were not needed with the new top/bottom split. Signed-off-by: Chia-I Wu <[email protected]>
* ilo: use top/bottom split for state functionsChia-I Wu2014-11-109-1852/+1856
| | | | | | | Follow the builder and split state functions into top (vertex processing) and bottom (pixel processing). Signed-off-by: Chia-I Wu <[email protected]>
* i965: Advertise a line width of 40.0 on Cherryview and Skylake.Kenneth Graunke2014-11-081-1/+5
| | | | | | | | | According to the documentation, line widths higher than 40.0 may have quality problems. That's already 20 times larger than we've been exposing, so it seems totally sufficient. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Advertise larger line widths.Kenneth Graunke2014-11-081-3/+9
| | | | | | | | | | | | | | | | We've artificially been limiting this to 5 for no particular reason. On Gen4-5, the limit is [0, 7.5] with a granularity of 0.5 (U3.1). On Gen6+, the limit is [0, 7.9921875]. Since it's a U3.7, the granularity should be 0.125 (1/8). This patch conservatively advertises one granularity smaller than the hardware's maximum value, just in case there's a problem using the largest possible value. On Gen4-5, this is 7.5 - 0.5 = 7.0. On Gen6+, this is 8.0 - 0.125 = 7.875. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Use ctx->Const.MaxLineWidth when clamping ctx->Line.Width.Kenneth Graunke2014-11-084-5/+8
| | | | | | | | | | | | Rather than hardcoding platform values in every code path, just use the maximum value we set. Currently, ctx->Const.LineWidth == 5, which is smaller than the hardware limit. But applications shouldn't be using a value larger than we support anyway. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Set Line Width correctly on Cherryview and Skylake.Kenneth Graunke2014-11-082-1/+6
| | | | | | | Line Width moved to DW1 bits 29:12. It's actually now a U11.7. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* docs: add news item and link release notes for mesa 10.3.3Emil Velikov2014-11-082-0/+7
| | | | Signed-off-by: Emil Velikov <[email protected]>
* docs: Add sha256 sums for the 10.3.3 releaseEmil Velikov2014-11-081-1/+3
| | | | | Signed-off-by: Emil Velikov <[email protected]> (cherry picked from commit 9cc26056ee13f25c5785fef81b31487f1429baa4)
* Add release notes for the 10.3.3 releaseEmil Velikov2014-11-081-0/+207
| | | | | Signed-off-by: Emil Velikov <[email protected]> (cherry picked from commit 1a9cc5f50db5d27530a3449743b43aac389d781f)
* util/format: Fix clamping to 32bit integers.José Fonseca2014-11-081-0/+27
| | | | | | | | | | | | | | | | Use clamping constants that guarantee no integer overflows. As spotted by Chris Forbes. This causes the code to change as: - value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967295.0f); + value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967040.0f); - value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 2147483647.0f)); + value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 2147483520.0f)); Reviewed-by: Roland Scheidegger <[email protected]>
* util/format: Generate floating point constants for clamping.José Fonseca2014-11-081-4/+4
| | | | | | | | | | | | | | | | | | | | | This commit causes the generated C code to change as union util_format_r32g32b32a32_sscaled pixel; - pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648, 2147483647); - pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648, 2147483647); - pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648, 2147483647); - pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648, 2147483647); + pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648.0f, 2147483647.0f); + pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648.0f, 2147483647.0f); + pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648.0f, 2147483647.0f); + pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648.0f, 2147483647.0f); memcpy(dst, &pixel, sizeof pixel); which surprisingly makes a difference for MSVC. Thanks to Juraj Svec for diagnosing this and drafting a fix. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=29661
* glsl/list: Revert unintentional file mode change in previous commit.Vinson Lee2014-11-071-0/+0
| | | | Signed-off-by: Vinson Lee <[email protected]>
* glsl/list: Move declaration before code.Vinson Lee2014-11-071-1/+3
| | | | | | | | | | | | Fixes MSVC build error. shaderapi.c src\glsl\list.h(535) : error C2143: syntax error : missing ';' before 'type' src\glsl\list.h(535) : error C2143: syntax error : missing ')' before 'type' src\glsl\list.h(536) : error C2065: 'node' : undeclared identifier Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86025 Signed-off-by: Vinson Lee <[email protected]>
* glsl/list: Add an exec_list_validate functionJason Ekstrand2014-11-071-0/+19
| | | | | | | This can be very useful for trying to debug list corruptions. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* llvmpipe: Avoid deadlock when unloading opengl32.dllJosé Fonseca2014-11-071-2/+6
| | | | | | | | | | | | | | | | | | | | | | On Windows, DllMain calls and thread creation/destruction are serialized, so when llvmpipe is destroyed from DllMain waiting for the rasterizer threads to finish will deadlock. So, instead of waiting for rasterizer threads to have finished, simply wait for the rasterizer threads to notify they are just about to finish. Verified with this very simple program: #include <windows.h> int main() { HMODULE hModule = LoadLibraryA("opengl32.dll"); FreeLibrary(hModule); } Fixes https://bugs.freedesktop.org/show_bug.cgi?id=76252 Reviewed-by: Roland Scheidegger <[email protected]> Cc: 10.2 10.3 <[email protected]>
* docs: Update minimum required LLVM version.José Fonseca2014-11-071-1/+1
|
* i965: drop the custom gen8_instruction CFLAGEmil Velikov2014-11-071-2/+0
| | | | | | | | | | | | | | | No longer needed as the file was removed with commit 8c229d306b3f312adbdfbaf79967ee43fbfc839e Author: Kenneth Graunke <[email protected]> Date: Mon Aug 11 10:07:07 2014 -0700 i965: Delete the Gen8 code generators. We now use the brw_eu_emit.c code instead. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* gbm/dri: cleanup memory leak on teardownEmil Velikov2014-11-071-0/+3
| | | | | | | | During teardown we free the driver_configs list pointer, but we forget to deallocate each config in that list. Signed-off-by: Emil Velikov <[email protected]> Reviewed-and-tested-by: Kenneth Graunke <[email protected]>
* egl_dri2: add a note about dri2_create_screenEmil Velikov2014-11-071-0/+4
| | | | | | | | | | The function is not called by platform_drm. As such one needs to pay special attention at teardown. v2: Fix the comment block. Spotted by Ken. Signed-off-by: Emil Velikov <[email protected]> Reviewed-and-tested-by: Kenneth Graunke <[email protected]> (v1)
* egl_dri2: fix double free on drm platformsEmil Velikov2014-11-071-3/+9
| | | | | | | | | | | | | | | | | | Earlier commit failed to attribure that for drm platforms one does not call dri2_create_screen, thus it does not create the screen and driver_configs but inherits them from the "display" - gbm. As such wrap cleanup in Platform != _EGL_PLATFORM_DRM to prevent the issue and still cleanup correctly for non-drm platforms. v2: - Drop the ifdef HAVE_DRM_PLATFORM, reindent the code and fix the comment block. Suggested by Ken. Reported-by: Kenneth Graunke <[email protected]> Reported-by: Mark Janes <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-and-tested-by: Kenneth Graunke <[email protected]> (v1)
* ilo: tidy up message descriptor decodingChia-I Wu2014-11-071-189/+302
| | | | | | | Move opcode to string mappings to functions of their own. Have for consistent outputs for similar opcodes. Signed-off-by: Chia-I Wu <[email protected]>
* ilo: decode INTERFACE_DESCRIPTOR_DATAChia-I Wu2014-11-073-1/+42
| | | | | | This is at least much better than decoding as blobs. Signed-off-by: Chia-I Wu <[email protected]>
* i965/fs: Wire up control flow correctly in predicated break pass.Matt Turner2014-11-061-3/+7
| | | | | | | | When the earlier block ended with control flow, we'd mistakenly remove some of its links to its children. The same happened with the later block. Acked-by: Jason Ekstrand <[email protected]>
* i965/cfg: Add functions to get first and last non-CF instructions.Matt Turner2014-11-061-0/+74
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* glsl: Skip loop-too-large heuristic if indexing arrays of a certain sizeKenneth Graunke2014-11-061-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A pattern in certain shaders is: uniform vec4 colors[NUM_LIGHTS]; for (int i = 0; i < NUM_LIGHTS; i++) { ...use colors[i]... } In this case, the application author expects the shader compiler to unroll the loop. By doing so, it replaces variable indexing of the array with constant indexing, which is more efficient. This patch extends the heuristic to see if arrays accessed within the loop are indexed by an induction variable, and if the array size exactly matches the number of loop iterations. If so, the application author probably intended us to unroll it. If not, we rely on the existing loop-too-large heuristic. Improves performance in a phong shading microbenchmark by 2.88x, and a shadow mapping microbenchmark by 1.63x. Without variable indexing, we can upload the small uniform arrays as push constants instead of pull constants, avoiding shader memory access. Affects several games, but doesn't appear to impact their performance. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Acked-by: Kristian Høgsberg <[email protected]>
* glsl: Lower constant arrays to uniform arrays.Kenneth Graunke2014-11-064-0/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consider GLSL code such as: const ivec2 offsets[] = ivec2[](ivec2(-1, -1), ivec2(-1, 0), ivec2(-1, 1), ivec2(0, -1), ivec2(0, 0), ivec2(0, 1), ivec2(1, -1), ivec2(1, 0), ivec2(1, 1)); ivec2 offset = offsets[<non-constant expression>]; Both i965 and nv50 currently handle this very poorly. On i965, this becomes a pile of MOVs to load the immediate constants into registers, a pile of scratch writes to move the whole array to memory, and one scratch read to actually access the value - effectively the same as if it were a non-constant array. We'd much rather upload large blocks of constant data as uniform data, so drivers can simply upload the data via constbufs, and not have to populate it via shader instructions. This is currently non-optional because both i965 and nouveau benefit from it, and according to Marek radeonsi would benefit today as well. (According to Tom, radeonsi may want to handle this itself in the long term, but we can always add a flag when it becomes useful.) Improves performance in a terrain rendering microbenchmark by about 2x, and cuts the number of instructions in about half. Helps a lot of "Natural Selection 2" shaders, as well as one "HOARD" shader. total instructions in shared programs: 5473459 -> 5471765 (-0.03%) instructions in affected programs: 5880 -> 4186 (-28.81%) v2: Use ir_var_hidden to avoid exposing the new uniform via the GL uniform introspection API. v3: Alphabetize Makefile.sources properly. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77957 Signed-off-by: Kenneth Graunke <[email protected]>
* glsl: Add infrastructure for "hidden" uniforms.Kenneth Graunke2014-11-065-2/+67
| | | | | | | | | | | | | | | In the compiler, we'd like to generate implicit uniforms for internal use. These should not be visible via the GL uniform introspection API. To support that, we add a new ir_variable::how_declared value of ir_var_hidden, and plumb that through to gl_uniform_storage. v2 (idr): Fix some memory management issues in move_hidden_uniforms_to_end. The comment block on the function has more details. Signed-off-by: Kenneth Graunke <[email protected]> Signed-off-by: Ian Romanick <[email protected]>
* mesa: Add SSE 4.1 optimisation for glDrawElements.Timothy Arceri2014-11-066-5/+152
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Makes use of SSE 4.1 to speed up compute of min and max elements. Callgrind cpu usage results from pts benchmarks: Openarena 0.8.8: 3.67% -> 1.03% UrbanTerror: 2.36% -> 0.81% V5: - actually make use of the optimisation in android (Emil Velikov) - set a better array size limit for using SSE and added TODO V4: - fixed bugs with incrementing pointer and updating counters V3: - Removed sse_minmax.c from Makefile.sources - handle the first few values without SSE until the pointer is aligned and use _mm_load_si128 rather than _mm_loadu_si128 - guard the call to the SSE code better at build time V2: - removed GL* types - use _mm_store_si128() rather than _mm_store_ps() - add runtime check for SSE - use aligned attribute for local mix/max - bunch of tidyups Reviewed-by: Juha-Pekka Heikkila <[email protected]> Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Timothy Arceri <[email protected]>
* i965: Remove non-existent vertical strides from array.Matt Turner2014-11-061-1/+1
| | | | | | These never existed, as far as I can tell. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Convert stride/width/execution size macros into enums.Matt Turner2014-11-061-28/+33
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Remove force uncompressed stack.Matt Turner2014-11-063-27/+0
| | | | | | Last use was in shader_time. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Use execution size of 1 for some shader_time operations.Matt Turner2014-11-061-1/+1
| | | | | | The ADDs depended on dispatch_width, which really isn't what we wanted. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Use mov(4) instructions to read timestamp.Matt Turner2014-11-061-5/+4
| | | | We only want fields 0-2.