| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
UpdateTexture is supposed to optimise by uploading only for the
dirty region of the source (d3d9 doc, wine tests).
This patch adds the behaviour for surfaces, but not entirely for
volumes.
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
| |
According to the spec all textures start
dirty.
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Dirty regions should be tracked for both MANAGED
and SYSTEMMEM.
Until now we didn't bother to track for SYSTEMMEM,
because we hadn't implemented using the dirty region
to avoid some copies
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
If the texture is bound and dirty_mip is true,
BASETEX_REGISTER_UPDATE adds the texture to the list
of things to update before the next draw call.
Some calls to it were missing.
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
| |
It should regenerate the sublevels according to the spec
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
| |
We had only one usage for this function.
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NineSurface9_CopySurface was supporting more cases than what
we needed, and doing checks that were innapropriate for
some NineSurface9_CopySurface use cases.
This patch splits it into two for the two use cases, and moves
the checks to the caller.
This patch also adds a few checks to NineDevice9_UpdateSurface
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
| |
Similar to what was done for Surface9, track the dirty region
only in VolumeTexture9.
Signed-off-by: Axel Davy <[email protected]>
|
| |
|
|
|
|
|
|
| |
This type of blending is used for gallium nine software cursor
Signed-off-by: David Heidelberg <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Axel Davy <[email protected]>
Reviewed-by: David Heidelberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
For sw cursor we do not tell wine the cursor position (the app
tells us directly). We shouldn't use ID3DPresent_GetCursorPos.
device->cursor.pos already contains the coordinates the app
gave us.
Signed-off-by: Axel Davy <[email protected]>
Reviewed-by: David Heidelberg <[email protected]>
|
|
|
|
|
|
|
|
| |
According to the spec, Windowed mode must
have hw cursor
Signed-off-by: Axel Davy <[email protected]>
Reviewed-by: David Heidelberg <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We have either hardware cursor or software cursor.
When we use software cursor, we should hide the hardware
cursor.
Signed-off-by: Axel Davy <[email protected]>
Reviewed-by: David Heidelberg <[email protected]>
|
|
|
|
|
|
|
|
|
| |
D3DRS_DITHERENABLE was assigned to the rasterizer state
group, but it was used for the blend group.
Assign it to the blend group.
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When using D3DRS_POINTSIZE make sure the value is at least
D3DRS_POINTSIZE_MIN but not greater than D3DRS_POINTSIZE_MAX.
Fixes some Wine tests.
Reviewed-by: Axel Davy <[email protected]>
Signed-off-by: Patrick Rudolph <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Align texture memory on 32 byte boundry to allow
SSE/AVX memcpy to work on locked rects.
This fixes some crashes with games using SSE.
Reviewed-by: David Heidelberg <[email protected]>
Reviewed-by: Axel Davy <[email protected]>
Signed-off-by: Patrick Rudolph <[email protected]>
|
|
|
|
|
|
|
| |
Both Points and Point Sprites are rasterized like quads,
according to d3d9 doc and gallium rasterizer doc.
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We had red and green in the wrong channels
for the ATI2 format (RGTC2).
Found thanks to wine tests.
Signed-off-by: Axel Davy <[email protected]>
Reviewed-by: David Heidelberg <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Add support for multiple cards and fill in Win
like card name, driver name and version info.
Use fallback for unknown vendors and unknown card names.
Reviewed-by: Axel Davy <[email protected]>
Signed-off-by: Patrick Rudolph <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.6" <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.6" <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This got missed because the piglit test only tested int images to avoid a
combinatiorial explosion of format, targets, stages and sizes which
takes more than 5 minutes to test on nvidia's driver.
This patch also drops the IMAGE_FUNCTION_AVAIL_ATOMIC which is not applicable
to the image_size codepath but was not hurting in any way.
Signed-off-by: Martin Peres <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
There is no MDOperand in llvm 3.5.
v2: Check if kernel metadata is present to avoid crash (EdB).
v3: Second attempt to avoid crash: switch off metadata query for llvm < 3.6.
Reviewed-by: Serge Martin (EdB) <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We have to re-validate FBOs rendering to the texture like is done
with TexImage and CopyTexImage.
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91673
Cc: "10.6" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We're generating rcps as part of backend lowering of the packed coordinate
in the CS, and we don't want to lower them in NIR because of the extra
newton-raphson steps in the common case. However, GLB2.7 is moving a
vertex attribute with a 1.0 W component to the position, and that makes us
produce some silly RCPs.
total instructions in shared programs: 97590 -> 97580 (-0.01%)
instructions in affected programs: 74 -> 64 (-13.51%)
|
|
|
|
|
|
|
| |
I had QPU emit code to do it, but forgot to flag the register class.
total instructions in shared programs: 97974 -> 97590 (-0.39%)
instructions in affected programs: 25291 -> 24907 (-1.52%)
|
|
|
|
|
|
|
|
| |
Now that we do non-SSA QIR instructions, we can take a NIR SSA src that's
only used by the unorm packing and just stuff the pack bits into it.
total instructions in shared programs: 98136 -> 97974 (-0.17%)
instructions in affected programs: 4149 -> 3987 (-3.90%)
|
| |
|
|
|
|
| |
NIR now handles this optimization for us.
|
|
|
|
|
| |
total instructions in shared programs: 98159 -> 98136 (-0.02%)
instructions in affected programs: 12279 -> 12256 (-0.19%)
|
|
|
|
|
|
|
|
| |
This helps ensure that the register allocator doesn't force the later pack
operations to insert extra MOVs.
total instructions in shared programs: 98170 -> 98159 (-0.01%)
instructions in affected programs: 2134 -> 2123 (-0.52%)
|
|
|
|
|
|
|
|
|
| |
Now that we have NIR, most of the optimization we still need to do is
peepholes on instruction selection rather than general dataflow
operations. This means we want to be able to have QIR be a lot closer to
the actual QPU instructions, just with virtual registers. Allowing
multiple instructions writing the same register opens up a lot of
possibilities.
|
|
|
|
| |
No difference on shader-db.
|
|
|
|
|
|
| |
V2: rebase on SSBO changes
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The constant folding pass can take a long time to complete
so rather than running through the entire pass each time
a new constant is propagated (and vice versa) interleave them.
This change helps ES31-CTS.arrays_of_arrays.InteractionFunctionCalls1
go from around 2 min -> 23 sec.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Due to a quirk in how the nv50 opt passes run, the algebraic
optimization that looks for these BFE's happens before the constant
folding pass. Rearranging these passes isn't a great idea, but this is
easy enough to fix. Allows a following cvt to eliminate the bfe in
certain situations.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See issue from the ARB_texture_query_lod spec for LOD vs Lod confusion:
(3) The core specification uses the "Lod" spelling, not "LOD". Should
this extension be modified to use "Lod"?
RESOLVED: The "Lod" spelling is the correct spelling for the core
specification and the preferred spelling for use. However, use of
"LOD" also exists, as the extension predated the core specification,
so this extension won't remove use of "LOD".
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit ffe6c6ad5f719dedd1b6b95e8590e3f20b23d340.
_mesa_format_num_components() does not include the padding bits in mesa formats
containing 'X' channels. This could cause mipmap generation for certain
uncompressed formats to underestimate the number of channels in the source
image by 1.
Signed-off-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
| |
FLT_TO_INT goes in the vector pipes on evergreen/NI,
not the trans unit as on earlier chips.
Signed-off-by: Glenn Kennard <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This struct was getting a bit crowded, following the lead of
radeonsi, mirror the idea of having sub-structures for each
shader type. Turning 'r600_shader_key' into an union saves
some trivial memory and CPU cycles for the shader keys.
[airlied: drop as_ls, and reorder so larger fields at start.]
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shader-db results for vec4 on i965:
total instructions in shared programs: 1499894 -> 1502261 (0.16%)
instructions in affected programs: 1414224 -> 1416591 (0.17%)
helped: 2434
HURT: 10543
GAINED: 1
LOST: 0
Shader-db results for vec4 on g4x:
total instructions in shared programs: 1437411 -> 1439779 (0.16%)
instructions in affected programs: 1362402 -> 1364770 (0.17%)
helped: 2434
HURT: 10544
GAINED: 0
LOST: 0
Shader-db results for vec4 on Iron Lake:
total instructions in shared programs: 1437214 -> 1439593 (0.17%)
instructions in affected programs: 1362205 -> 1364584 (0.17%)
helped: 2433
HURT: 10544
GAINED: 1
LOST: 0
Shader-db results for vec4 on Sandy Bridge:
total instructions in shared programs: 2022092 -> 1941570 (-3.98%)
instructions in affected programs: 1886838 -> 1806316 (-4.27%)
helped: 7510
HURT: 10737
GAINED: 0
LOST: 0
Shader-db results for vec4 on Ivy Bridge:
total instructions in shared programs: 1853749 -> 1804960 (-2.63%)
instructions in affected programs: 1686736 -> 1637947 (-2.89%)
helped: 6735
HURT: 11101
GAINED: 0
LOST: 0
Shader-db results for vec4 on Haswell:
total instructions in shared programs: 1853749 -> 1804960 (-2.63%)
instructions in affected programs: 1686736 -> 1637947 (-2.89%)
helped: 6735
HURT: 11101
GAINED: 0
LOST: 0
Signed-off-by: Jason Ekstrand <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
Acked-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fixes a crash in Piglit's
spec@arb_shader_subroutine@[email protected] for me.
Signed-off-by: Kai Wasserbäch <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
[imirkin: handle more type combinations, use macro]
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
This happens with unpackSnorm lowering. There's yet another
bitfield-extract behind it, but there's too much variation to be worth
cutting through.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
unpackUnorm* lowering doesn't AND the high byte/word as it's
unnecessary. Detect that situation as well.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Some Unigine shaders have been observed to unpack bytes out of 32-bit
integers and convert them to floats. I2F/I2I can handle this sort of
thing directly. Detect the handleable situations.
This misses 16-bit word capabilities in nv50, but I haven't seen shaders
that would actually make use of that.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Some shaders appear to extract bits using shift/and combos. Detect
(some) of those and convert to EXTBF instead.
Signed-off-by: Ilia Mirkin <[email protected]>
|