| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The previous formula for atan(x,y) returned a value of +/- pi whenever
|x|<0.0001, and used a formula based on atan(y/x) otherwise. This
broke in cases where both x and y were small (e.g. atan(1e-5, 1e-5)).
This patch modifies the formula so that it returns a value of +/- pi
whenever |x|<1e-8*|y|, and uses the formula based on atan(y/x)
otherwise.
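
The effect of switching from an absolute to a relative threshold can be sketched in a few lines of Python. This is only an illustration, not the GLSL built-in: the function names are hypothetical, the arguments are written in GLSL's atan(y, x) order, and the near-vertical branch returns sign(y)*pi/2, which is the mathematical limit of atan(y/x) as x approaches 0 and an assumption of this sketch.

```python
import math

def _atan_from_ratio(y, x):
    """atan(y/x), shifted into the correct quadrant when x is negative."""
    r = math.atan(y / x)
    if x < 0.0:
        r += math.copysign(math.pi, y)
    return r

def atan2_old(y, x):
    # Absolute threshold: any |x| below 1e-4 is treated as "vertical",
    # even when y is just as small.
    if abs(x) < 1e-4:
        return math.copysign(math.pi / 2, y)
    return _atan_from_ratio(y, x)

def atan2_new(y, x):
    # Relative threshold: x only counts as negligible compared to y.
    # (Like the GLSL built-in, this sketch leaves x == y == 0 undefined.)
    if abs(x) < 1e-8 * abs(y):
        return math.copysign(math.pi / 2, y)
    return _atan_from_ratio(y, x)

print(atan2_old(1e-5, 1e-5))
print(atan2_new(1e-5, 1e-5))
```

With both inputs at 1e-5 the old variant takes the near-vertical branch and returns pi/2, while the new variant falls through to the ratio-based branch and returns a value close to pi/4.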
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous formula for asin(x) was algebraically equivalent to:
sign(x)*(pi/2 - sqrt(1-|x|)*(A + B|x| + C|x|^2))
where A, B, and C were arbitrary constants determined by a curve fit.
This formula had a worst case absolute error of 0.00448, an unbounded
worst case relative error, and a discontinuity near x=0.
Changed the formula to:
sign(x)*(pi/2 - sqrt(1-|x|)*(pi/2 + (pi/4-1)|x| + A|x|^2 + B|x|^3))
where A and B are arbitrary constants determined by a curve fit. This
has a worst case absolute error of 0.00039, a worst case relative
error of 0.000405, and no discontinuities.
I don't expect a significant performance degradation, since the extra
multiply-accumulate should be fast compared to the sqrt() computation.
Fixes piglit tests {vs,fs}-asin-float and {vs,fs}-atan-*
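
The new formula can be written out as a small Python sketch. The curve-fit constants A and B are not given in the message, so they are left as parameters here; the function name is hypothetical. A short calculation suggests why the first two polynomial coefficients are fixed rather than fitted: pi/2 makes the result exactly 0 at x=0, and (pi/4 - 1) makes the slope at x=0 equal to 1, matching asin, which is presumably what keeps the relative error bounded near zero.

```python
import math

def asin_approx(x, a, b):
    """sign(x)*(pi/2 - sqrt(1-|x|)*(pi/2 + (pi/4-1)|x| + a|x|^2 + b|x|^3))."""
    ax = abs(x)
    poly = math.pi / 2 + (math.pi / 4 - 1) * ax + a * ax**2 + b * ax**3
    return math.copysign(1.0, x) * (math.pi / 2 - math.sqrt(1 - ax) * poly)

# Placeholder values for the fitted constants; NOT the ones from the patch.
A, B = 0.1, -0.05

print(asin_approx(0.0, A, B))
print(asin_approx(1.0, A, B))
```

Whatever values A and B take, the sketch returns 0 at x=0 and +/- pi/2 at x=+/-1, because the fitted terms vanish at 0 and the sqrt factor vanishes at 1.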
|
|
|
|
|
|
|
|
|
|
|
| |
The constant used in the radians() function didn't have enough
precision, causing a relative error of 1.676e-5, which is far worse
than the precision of 32-bit floats. This patch reduces the relative
error to 1.14e-9, which is the best we can do in 32 bits.
Fixes piglit tests {fs,vs}-radians-{float,vec2,vec3,vec4}.
Reviewed-by: Kenneth Graunke <[email protected]>
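
The quoted error figures can be checked with a short Python calculation. The two literals below are not taken from the patch; they are assumed values, chosen because they reproduce the relative errors stated above when compared against pi/180 in double precision.

```python
import math

EXACT = math.pi / 180.0  # degrees-to-radians factor in double precision

def relative_error(constant):
    """Relative error of a candidate constant against pi/180."""
    return abs(constant - EXACT) / EXACT

OLD_CONSTANT = 0.017453       # assumed: truncated to six decimal places
NEW_CONSTANT = 0.0174532925   # assumed: four more digits

print(f"{relative_error(OLD_CONSTANT):.3e}")
print(f"{relative_error(NEW_CONSTANT):.2e}")
```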
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
NOTE: This is a candidate for stable release branches (and don't forget
to re-run "make builtins" after cherry-picking.)
|
|
|
|
|
|
|
|
|
| |
Commit 56ef62d9885f805bbfb2243dc860ff425d5b4d3b
"glsl: Generate readable unique names at print time."
changed ir_print_visitor to not generate @0x1234567 suffixes except
where necessary. So there's no need to manually remove them.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Tested-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
This is necessary for GLSL 1.30+ shadow sampling functions, which return
a single float rather than splatting the value to a vec4 based on
GL_DEPTH_TEXTURE_MODE.
|
| |
|
| |
|
|
|
|
| |
A copy and paste error.
|
|
|
|
|
|
| |
This has probably existed since e5e34ab18eeaffa465 or so.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
Rather than passing "True", pass a bitfield describing the particular
variant's features - either projection or offset.
This should make the code a bit more readable ("Proj" instead of "True")
and make it easier to support offsets in the future.
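
The readability argument can be illustrated with a minimal Python sketch. The class, flag, and function names here are hypothetical and are not taken from the generator script; the point is only that a named flag documents itself at the call site where a bare "True" does not, and that flags combine.

```python
from enum import IntFlag

class Variant(IntFlag):
    """Feature bits describing one texture built-in variant (hypothetical)."""
    NONE = 0
    PROJ = 1
    OFFSET = 2

def builtin_name(base, variant=Variant.NONE):
    """Build a function name from a base name and its feature bits."""
    name = base
    if variant & Variant.PROJ:
        name += "Proj"
    if variant & Variant.OFFSET:
        name += "Offset"
    return name

print(builtin_name("texture", Variant.PROJ))
print(builtin_name("texture", Variant.PROJ | Variant.OFFSET))
```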
|
|
|
|
|
| |
For offsets, we'll want the straight sampler dimensionality, without the
+1 for array types. Create a new function to do that; refactor.
|
|
|
|
|
|
|
|
|
|
|
| |
Having these as actual integer values makes it difficult to implement
the texture*Offset built-in functions, since the offset is actually a
function parameter (which doesn't have a constant value).
The original rationale was that some hardware needs these offsets baked
into the instruction opcode. However, at least i965 should be able to
support non-constant offsets. Others should be able to rely on inlining
and constant propagation.
|
| |
|
|
|
|
| |
Also removed unnecessary semicolons.
|
| |
|
|
|
|
| |
This isn't strictly necessary, but is definitely nicer.
|
|
|
|
| |
Import sys for sys.exit.
|
|
|
|
|
|
| |
Python is already necessary for other parts of Mesa, so there's no
reason we can't just generate it. This patch updates both make and
SCons to do so.
|
|
|
|
|
| |
I forgot about this file, and it didn't show up until I tried to do
"make builtins" from a clean build.
|
|
|
|
|
|
| |
I think this was used long ago, when we actually read the builtins into the
shader's instruction stream directly, rather than creating a separate
shader and linking the two. It doesn't seem to serve any purpose now.
|
|
|
|
|
|
|
|
| |
These mistakenly computed 't' instead of t * t * (3.0 - 2.0 * t).
Also, properly vectorize the smoothstep(float, float, vec) variants.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
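
For reference, the intended computation can be sketched in Python. This follows the standard GLSL definition of smoothstep; the helper names are hypothetical, and the list-based variant only stands in for the (float, float, vec) overloads.

```python
def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def smoothstep(edge0, edge1, x):
    # Normalise x into [0, 1], then apply the Hermite polynomial.
    t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)  # the buggy version returned t here

def smoothstep_vec(edge0, edge1, xs):
    # Scalar edges applied to every component of the vector argument.
    return [smoothstep(edge0, edge1, x) for x in xs]

print(smoothstep(0.0, 1.0, 0.25))
print(smoothstep_vec(0.0, 2.0, [0.0, 1.0, 2.0]))
```

At t = 0.25 the polynomial gives 0.15625 rather than 0.25, so the two versions are easy to tell apart in a test.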
|
|
|
|
|
|
|
|
| |
This makes a very simple 1.30 shader go from 196k of memory to 9k.
NOTE: This -may- be a candidate for the 7.9 branch, as the benefit is
substantial. However, it's not a simple change, so it may be wiser to
wait for 7.10.
|
|
|
|
|
|
| |
We are not aware of any GPU that actually implements the cross product
as a single instruction. Hence, there's no need for it to be an opcode.
Future commits will remove it entirely.
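
Without a dedicated opcode, a cross product reduces to multiplies and subtracts. A minimal Python sketch of that expansion follows; it shows the arithmetic only and does not represent the IR that replaces the opcode.

```python
def cross(a, b):
    """Cross product of two 3-component vectors, written out per component."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,
            az * bx - ax * bz,
            ax * by - ay * bx)

print(cross((1, 0, 0), (0, 1, 0)))
```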
|
| |
|
| |
|
|
|
|
|
|
| |
In particular, calling the abs function is silly, since there's already
an expression opcode for that. Also, assigning to temporaries then
assigning those to the final location is rather redundant.
|
|
|
|
| |
For consistency with the vec2/vec3/vec4 variants.
|
|
|
|
|
|
| |
This works around MSVC's 65535 byte limit, unfortunately at the expense
of any semblance of readability and much larger file size. Hopefully I
can implement a better solution later, but for now this fixes the build.
|
| |
|
|
|
|
|
|
|
| |
This implements round() via the ir_unop_round_even opcode, rather than
adding a new opcode. We may wish to add one in the future, since it
might enable a small performance increase on some hardware, but for now,
this should suffice.
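
Reusing round-to-even is valid because GLSL lets the implementation choose which way round() resolves a fraction of exactly 0.5. Python's built-in round() happens to use the same tie-breaking rule, so the behaviour can be sketched directly; the function name below is hypothetical.

```python
def glsl_round(x):
    # Python's round() sends ties to the nearest even integer,
    # the same rule as a round-even operation.
    return float(round(x))

for value in (0.5, 1.5, 2.5, 3.5, 2.4, 2.6):
    print(value, glsl_round(value))
```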
|
|
|
|
| |
Implemented using the opcode introduced in the previous commit.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out that most people new to this IR are surprised when an
assignment to (say) 3 components on the LHS takes 4 components on the
RHS. It also makes for quite strange IR output:
(assign (constant bool (1)) (x) (var_ref color) (swiz x (var_ref v) ))
(assign (constant bool (1)) (y) (var_ref color) (swiz yy (var_ref v) ))
(assign (constant bool (1)) (z) (var_ref color) (swiz zzz (var_ref v) ))
Worse still, even we get it wrong, as shown by this line of our
current step(float, vec4):
(assign (constant bool (1)) (w)
(var_ref t)
(expression float b2f (expression bool >=
(swiz w (var_ref x))(var_ref edge))))
where we try to assign a float to the writemasked-out x channel and
don't supply anything for the actual w channel we're writing. Drivers
right now just get lucky since ir_to_mesa spams the float value across
all the source channels of a vec4.
Instead, the RHS will now have a number of components equal to the
number of components actually being written. Hopefully this confuses
everyone less, and it also makes codegen for a scalar target simpler.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
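
The new rule can be modelled in a few lines of Python. This is a toy model of the semantics described above, not code from the compiler, and the function name is hypothetical: the right-hand side supplies exactly one value per channel named in the write mask, in mask order.

```python
CHANNELS = "xyzw"

def masked_assign(dst, write_mask, rhs):
    """Return a copy of dst with the masked channels replaced by rhs."""
    if len(rhs) != len(write_mask):
        raise ValueError("RHS must have one component per written channel")
    out = list(dst)
    for value, channel in zip(rhs, write_mask):
        out[CHANNELS.index(channel)] = value
    return out

# Writing only w takes a single-component RHS.
print(masked_assign([0.0, 0.0, 0.0, 0.0], "w", [7.0]))
# Writing x and z takes a two-component RHS.
print(masked_assign([1.0, 2.0, 3.0, 4.0], "xz", [9.0, 8.0]))
```

Under this rule the step(float, vec4) case above would pass a plain float for the w write, with no padding components to get wrong.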
|
| |
|
|
|
|
|
| |
Commit 309cd4115b7cba669a0bf858e7809cb6dae90ddf incorrectly converted
these to all_equal and any_nequal, which is the wrong operation.
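
The distinction can be sketched in Python, assuming "these" refers to component-wise comparison built-ins such as GLSL's equal() and notEqual() (the commit subject is not shown here, so that reading is an assumption). A component-wise comparison yields one boolean per component, whereas all_equal and any_nequal collapse the whole vector to a single boolean.

```python
def equal(a, b):
    """Component-wise comparison: one bool per component."""
    return [x == y for x, y in zip(a, b)]

def not_equal(a, b):
    return [x != y for x, y in zip(a, b)]

def all_equal(a, b):
    """Whole-vector comparison: a single bool."""
    return all(equal(a, b))

def any_nequal(a, b):
    return any(not_equal(a, b))

u, v = (1, 2, 3), (1, 5, 3)
print(equal(u, v), all_equal(u, v))
print(not_equal(u, v), any_nequal(u, v))
```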
|
| |
|
| |
|
| |
|
|
|
|
| |
Otherwise it gets used uninitialized.
|
|
|
|
|
|
| |
Otherwise builtin_profiles contains dangling pointers the next time
_mesa_read_profile is called. I suspect this may fix bugzilla #29847,
but I was never able to reproduce it.
|
|
|
|
| |
These need abs, and we need more tests.
|
| |
|