aboutsummaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* i965/blorp: Get rid of brw_blorp_surface_info::map_stencil_as_y_tiledJason Ekstrand2016-08-173-39/+26
| | | | | | | Now that we're carrying around the isl_surf, we can just modify it directly instead of passing an extra bit around. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/blorp: Remove compute_tile_offsetsJason Ekstrand2016-08-172-34/+5
| | | | | | We have a handy little function is ISL that does exactly the same thing. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/blorp: Create the isl_surf up-frontJason Ekstrand2016-08-172-11/+19
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/blorp/clear: Initialize surface info after allocating an MCSJason Ekstrand2016-08-171-6/+6
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* isl/state: Use a valid alignment for 1-D texturesJason Ekstrand2016-08-171-1/+1
| | | | | | | The alignment we use doesn't matter (see the comment) but it should at least be an alignment we can represent with the enums. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/miptree: Remove the stencil_as_y_tiled parameter from get_tile_masksJason Ekstrand2016-08-174-10/+8
| | | | | | | It's only used to stomp the tiling to Y and it's only used by blorp so there's no reason why blorp can't do it itself. Reviewed-by: Topi Pohjolainen <[email protected]>
* isl: Fix the parameter names for get_intratile_offsetJason Ekstrand2016-08-171-4/+4
| | | | | | | | It's been in elements for a while but, for whatever reason, the parameter names in the header file never got updated. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* util: try to use SSE instructions with MSVC and 32-bit gccBrian Paul2016-08-171-3/+4
| | | | | | | | | | | | | The lrint() and lrintf() functions are pretty slow and make some texture transfers very inefficient. This patch makes a better effort at using those intrisics for 32-bit gcc and MSVC. Note, this patch doesn't address the use of SSE4.1 with MSVC. v2: get rid of the ROUND_WITH_SSE symbol, per Matt. Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* svga: fix src/dst typo in can_blit_via_copy_region_vgpu10()Brian Paul2016-08-171-1/+1
| | | | | | | | | | The function was always returning false because of this typo. Retested with piglit. There's some sRGB-related blit failures, but that seems unrelated. Reviewed-by: Charmaine Lee <[email protected]> Reviewed-by: Neha Bhende <[email protected]>
* svga: initialize a variable to silence a gcc warningBrian Paul2016-08-171-1/+1
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* glsl: Pull enum ir_expression_operation out to its own fileIan Romanick2016-08-173-317/+342
| | | | | | | | | | No change except to the copyright symbol. The next patch will generate this file with Python, and Unicode + Python = pure rage. v2: Massive rebase... I guess a lot can change in a year. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Make the generated sources build rules more like NIRIan Romanick2016-08-173-6/+5
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa/st: use llabs instead of abs for long args (v2)Francesco Ansanelli2016-08-171-1/+1
| | | | | | v2: long has 32bit on Windows (Marek) Signed-off-by: Francesco Ansanelli <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: fix up buffer descriptor upper-bound checkingMarek Olšák2016-08-171-1/+1
| | | | | | st/mesa does this too, so we're safe. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: change pipe_image_view::first_element/last_element -> offset/sizeMarek Olšák2016-08-179-50/+27
| | | | | | | | | This is required by OpenGL. Our hardware supports this. Example: Bind RGBA32F with offset = 4 bytes. Acked-by: Ilia Mirkin <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* gallium: change pipe_sampler_view::first_element/last_element -> offset/sizeMarek Olšák2016-08-1724-82/+81
| | | | | | | | | | | This is required by OpenGL. Our hardware supports this. Example: Bind RGBA32F with offset = 4 bytes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97305 Acked-by: Ilia Mirkin <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: assign the highest priority to scratch; make rings secondMarek Olšák2016-08-172-4/+6
| | | | | | | just FYI, the kernel receives priority/4 Acked-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/winsys: re-number winsys priority flagsMarek Olšák2016-08-171-16/+13
| | | | | | | free 60..63, move CP_DMA up Acked-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: mark shader rings as highest-priority buffersMarek Olšák2016-08-175-7/+7
| | | | | | | and rename the enum Acked-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: set SHADER_RW_BUFFER priority for streamout buffersMarek Olšák2016-08-172-4/+6
| | | | | Acked-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use current context for DCC feedback-loop decompress, fixes ElementalMarek Olšák2016-08-174-16/+38
| | | | | | | | | | This is just a workaround. The problem is described in the code. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96541 v2: say that it's only between the current context and aux_context Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
* radeonsi: simplify CB_TARGET_MASK logicMarek Olšák2016-08-171-14/+7
| | | | | | we can now rely on CB_COLORn_INFO to disable empty slots. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't set CB_COLOR1_INFO for dual src blendingMarek Olšák2016-08-171-7/+0
| | | | | | | | | Vulkan doesn't do this. The reason may be that CB_COLOR1_INFO.SOURCE_FORMAT from NI was moved to SPI_SHADER_COL_FORMAT for SI. I asked CB guys about this 2 days ago and they still haven't replied. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: eliminate PS OUT[1] if dual src blending is off and CB1 is not boundMarek Olšák2016-08-172-11/+7
| | | | | | All VP DX9 ports benefit from this. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: use unflushed fences for PIPE_QUERY_GPU_FINISHEDMarek Olšák2016-08-171-2/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: use lp_build_alloca_undefNicolai Hähnle2016-08-171-13/+4
| | | | | | | | Avoid building all those store 0 / store undef instruction pairs that end up getting removed anyway. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallivm: add lp_build_alloca_undefNicolai Hähnle2016-08-172-0/+24
| | | | | Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallivm: add create_builder_at_entry helper functionNicolai Hähnle2016-08-171-23/+22
| | | | | | | Reduces code duplication. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: protect against out of bounds temporary array accessesNicolai Hähnle2016-08-171-0/+15
| | | | | | | They can lead to VM faults and worse, which goes against the GL robustness promises. Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add radeon_llvm_bound_index for bounds checkingNicolai Hähnle2016-08-173-18/+34
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: reduce alloca of temporaries based on usagemaskNicolai Hähnle2016-08-172-10/+54
| | | | | | v2: take actual writemasks into account Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: use tgsi_scan_arrays for temp arraysNicolai Hähnle2016-08-173-5/+10
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: allocate temps array info in radeon_llvm_context_initNicolai Hähnle2016-08-173-36/+47
| | | | | | | | | Also, prepare for using tgsi_array_info. This also opens the door for properly handling allocation failures, but I'm leaving that for a separate change. Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: always do the full store in store_value_to_arrayNicolai Hähnle2016-08-171-49/+28
| | | | | | | | | Doing the write-back of the temporary vector in radeon_llvm_emit_store makes no sense. This also allows us to get rid of get_alloca_for_array. Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: extract common getelementptr logic into get_pointer_into_arrayNicolai Hähnle2016-08-171-39/+66
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: pass indirect register info into get_alloca_for_arrayNicolai Hähnle2016-08-171-5/+6
| | | | | | To have the same signature as get_array_range. Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: extract common lookup code into get_temp_array functionNicolai Hähnle2016-08-171-33/+40
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: clarify the comment on the array alloca heuristicNicolai Hähnle2016-08-171-10/+19
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: more descriptive names for LLVM temporaries in debug buildsNicolai Hähnle2016-08-171-2/+12
| | | | | Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: simplify radeon_llvm_emit_store for direct array addressingNicolai Hähnle2016-08-171-7/+0
| | | | | | | We can use the pointer stored in the temps array directly. Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: simplify radeon_llvm_emit_fetch for direct array addressingNicolai Hähnle2016-08-171-5/+0
| | | | | | | We can use the pointer stored in the temps array directly. Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: clean up emit_declaration for temporariesNicolai Hähnle2016-08-171-9/+18
| | | | | | | | In the alloca'd array case, no longer create redundant and unused allocas for the individual elements; create getelementptrs instead. Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st_glsl_to_tgsi: use calloc the way it's meant to be usedNicolai Hähnle2016-08-171-1/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* tgsi/scan: add tgsi_scan_arraysNicolai Hähnle2016-08-172-0/+93
| | | | Reviewed-by: Marek Olšák <[email protected]>
* glsl: Add missing ir_quadop_vector constant evaluation for Boolean typesIan Romanick2016-08-171-0/+3
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Fix typo in ir_unop_f2u implementationIan Romanick2016-08-171-1/+1
| | | | | | | This won't affect the output, but it was, technically, wrong. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Fix typo in ir_unop_b2i implementationIan Romanick2016-08-171-1/+1
| | | | | | | This won't affect the output, but it was, technically, wrong. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Don't support integer types for operations that can't handle themIan Romanick2016-08-172-14/+2
| | | | | | | | ir_unop_fract already forbade integer types in ir_validate. ir_unop_rcp, ir_unop_rsq, and ir_unop_sqrt should also forbid them in ir_validate. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Don't support ir_unop_abs or ir_unop_sign for unsigned integersIan Romanick2016-08-172-6/+9
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* nir/algebraic: Optimize common array indexing sequenceIan Romanick2016-08-171-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some shaders include code that looks like: uniform int i; uniform vec4 bones[...]; foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]); CSE would do some work on this: x = i * 3 foo(bones[x], bones[x + 1], bones[x + 2]); The compiler may then add '<< 4 + base' to the index calculations. This results in expressions like x = i * 3 foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]); Just rearranging the math to produce (i * 48) + 16 saves an instruction, and it allows CSE to do more work. x = i * 48; foo(bones[x], bones[x + 16], bones[x + 32]); So, ~6 instructions becomes ~3. Some individual shader-db results look pretty bad. However, I have a really, really hard time believing the change in estimated cycles in, for example, 3dmmes-taiji/51.shader_test after looking that change in the generated code. G45 total instructions in shared programs: 4020840 -> 4010070 (-0.27%) instructions in affected programs: 177460 -> 166690 (-6.07%) helped: 894 HURT: 0 total cycles in shared programs: 98829000 -> 98784990 (-0.04%) cycles in affected programs: 3936648 -> 3892638 (-1.12%) helped: 894 HURT: 0 Ironlake total instructions in shared programs: 6418887 -> 6408117 (-0.17%) instructions in affected programs: 177460 -> 166690 (-6.07%) helped: 894 HURT: 0 total cycles in shared programs: 143504542 -> 143460532 (-0.03%) cycles in affected programs: 3936648 -> 3892638 (-1.12%) helped: 894 HURT: 0 Sandy Bridge total instructions in shared programs: 8357887 -> 8339251 (-0.22%) instructions in affected programs: 432715 -> 414079 (-4.31%) helped: 2795 HURT: 0 total cycles in shared programs: 118284184 -> 118207412 (-0.06%) cycles in affected programs: 6114626 -> 6037854 (-1.26%) helped: 2478 HURT: 317 Ivy Bridge total instructions in shared programs: 7669390 -> 7653822 (-0.20%) instructions in affected programs: 388234 -> 372666 (-4.01%) helped: 2795 HURT: 0 total cycles in shared programs: 68381982 -> 68263684 (-0.17%) cycles in affected programs: 1972658 -> 1854360 (-6.00%) helped: 2458 HURT: 307 Haswell total instructions in shared programs: 7082636 -> 7067068 (-0.22%) instructions in affected programs: 388234 -> 372666 (-4.01%) helped: 2795 HURT: 0 total cycles in shared programs: 68282020 -> 68164158 (-0.17%) cycles in affected programs: 1891820 -> 1773958 (-6.23%) helped: 2459 HURT: 261 Broadwell total instructions in shared programs: 9002466 -> 8985875 (-0.18%) instructions in affected programs: 658784 -> 642193 (-2.52%) helped: 2795 HURT: 5 total cycles in shared programs: 78503092 -> 78450404 (-0.07%) cycles in affected programs: 2873304 -> 2820616 (-1.83%) helped: 2275 HURT: 415 Skylake total instructions in shared programs: 9156978 -> 9140387 (-0.18%) instructions in affected programs: 682625 -> 666034 (-2.43%) helped: 2795 HURT: 5 total cycles in shared programs: 75591392 -> 75550574 (-0.05%) cycles in affected programs: 3192120 -> 3151302 (-1.28%) helped: 2271 HURT: 425 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>