| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've found that there's a buffer overrun bug in flex that's triggered by
using alternation in a lookahead pattern.
Fortunately, we don't need to match the exact {NEWLINE} expression to
detect an empty pragma. It suffices to verify that there are no non-space
characters before any newline character. So we can use a simple [\r\n] to
get the desired behavior while avoiding the flex bug.
Fixes the regression of piglit's 17000-consecutive-chars-identifier test,
(which has been crashing since commit
04e40fd337a244ee77ef9553985e9398ff0344af ).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82472
Signed-off-by: Carl Worth <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
CC: <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some tests were failing because the message printed by #error was including a
'\r' character from the source file in its output.
This is easily avoided by fixing the regular expression for #error to never
include any of the possible newline characters, (neither '\r' nor '\n').
With this commit 2 tests are fixed for each of the '\r' and '\r\n' cases.
Current results after the commit are:
\r: 137/143 tests pass
\r\n 142/143 tests pass
\n\r: 139/143 tests pass
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GLSL specification says that either carriage-return, line-feed, or both
together can be used to terminate lines. Further, it says that when used
together, the pair of terminators shall be interpreted as a single line.
This final requirement has not been respected by glcpp up until now, (it has
been emitting two newlines for every CR+LF pair).
Here, we fix the lexer by using a regular expression for NEWLINE that eats
up both "\r\n" (or even "\n\r") if possible before also considering a single
'\n' or a single '\r' as a line terminator.
Before this commit, the test results are as follows:
\r: 135/143 tests pass
\r\n: 4/143 tests pass
\n\r: 4/143 tests pass
After this commit, the test results are as follows:
\r: 135/143 tests pass
\r\n: 140/143 tests pass
\n\r: 139/143 tests pass
So, obviously, a dramatic improvement.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we were passing these through, just like any other pragma. But the
downstream compiler was tripping up on them. It seems easier to swallow these
in the preprocessor and not pass them on at all rather than fixing the
downstream compiler.
This fixes the following Khronos GLES3 CTS tests:
preprocessor.pragmas.pragma_vertex
preprocessor.pragmas.pragma_fragment
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the #pragma directive was swallowing an entire line, (including
the final newline). At that time it was appropriate for it to increment the
line count.
More recently, our handling of #pragma changed to not include the newline. But
the code to increment yylineno stuck around. This was causing __LINE__ to be
increased by one more than desired for every #pragma.
Remove the bogus, extra increment, and add a test for this case.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the fix for the following line:
# // comment to ignore here
According to the translation-phase rules, the comment should be removed before
the preprocessor looks to interpret the null directive.
So in our implementation we must explicitly look for single-line comments in
the <HASH> start condition as well.
This commit fixes the following Khronos GLES3 CTS tests:
null_directive_vertex
null_directive_fragment
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were already correctly supporting single-line comments in case like:
#define FOO bar // comment here...
The new support added here is simply for the none-too-useful:
#define // comment instead of macro name
With this commit, this line will now give the expected "#define without
macro name" error message instead of the lexer just going off into the
weeds.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, glcpp would emit an error like this if <EOF> happened to occur
immediately after the "#define", but in general would just get confused,
(leading to un-helpful error messages).
To fix things to generate a clean error message, we do a few things:
1. Don't require horizontal whitespace immediately after #define
2. Add a production for the error case, (DEFINE_TOKEN followed
immediately by a NEWLINE token).
3. Make the lexer reset to the <INITIAL> state after every NEWLINE.
This 3rd point prevents the lexer from getting so confused and generating
further spurious errors in the file because it was stuck in the <DEFINE> start
condition.
We also drop the similar error message from the <EOF> rule since the
newly-added rule will have already printed the error message.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
ERROR is a #define in the MSVC WinGDI.h header file.
Add the _TOKEN suffix as we do for a few other lexer tokens.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've had multiple bugs in the past where we have been inadvertently matching
the default rule, (which we never want to do). We recently added a catch-all
rule to avoid this, (and made this rule robust for future start conditions).
Kristian pointed out that flex allows us to go one step better. This syntax:
%option warn nodefault
instructs flex to not generate the default rule at all. Further, flex will
generate a warning at compile time if the set of rules we provide are
inadequate, (such that it would be possible for the default rule to be
matched).
With this warning in place, I found that the catch-all rule was in fact
missing something. The catch-all rule uses a pattern of "." which doesn't
match newlines. So here we extend the newline-matching rule to all start
conditions. That is enough to convince flex that it really doesn't need
any default rule.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Using a single rule here means that we can use the <*> syntax to match
all start conditions. This makes the catch-all rule more robust against
the addition of future start conditions, (no need to maintain an ever-
growing list of start conditions for this rul).
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no behavioral change here. It's just easier to verify that lists
of start conditions include all expected conditions when they appear in a
consistent order.
The <INITIAL> state is special, so it appears first in all lists. All others
appear in alphabetical order.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some of the recent glcpp bug-fixing, we found that glcpp was emitting
unrecognized characters from the input source file to stdout, and dropping
them from the source passed onto the compiler proper.
This was obviously confusing, and totally undesired.
The bogus behavior comes from an implicit default rule in flex, which is
that any unmatched character is implicitly matched and printed to stdout.
To avoid this implicit matching and printing, here we add an explicit
catch-all rule. If this rule ever matches it prints an internal compiler
error. The correct response for any such error is fixing glcpp to handle
the unexpected character in the correct way.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the '\r' character was not explicitly matched by any lexer
rule. This means that glcpp would have been using the default flex rule to
match '\r' characters, (where they would have been printed to stdout rather
than actually correctly handled).
With this commit, we treat '\r' as equivalent to '\n'. This is clearly an
improvement the bogus printing to stdout. The resulting behavior is compliant
with the GLSL specification for any source file that uses exclusively '\r' or
'\n' to separate lines.
For shaders that use a multiple-character line separator, (such as "\r\n"),
glcpp won't be precisely compliant with the specification, (treating these as
two newline characters rather than one), but this should not introduce any
semantic changes to the shader programs.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These operators aren't defined for preprocessor expressions, so we never
implemented them. This led them to be misinterpreted as strings of unary
'+' or '-' operators.
In fact, what is actually desired is to generate an error if these operators
appear in any preprocessor condition.
So this commit looks like it is strictly adding support for these
operators. And it is supporting them as far as passing them through to the
subsequent compiler, (which was already happening anyway).
What's less apparent in the commit is that with these tokens now being lexed,
but with no change to the grammar for preprocessor expressions, these
operators will now trigger errors there.
A new "make check" test is added to verify the desired behavior.
This commit fixes the following Khronos GLES3 CTS test:
invalid_op_1_vertex
invalid_op_1_fragment
invalid_op_2_vertex
invalid_op_2_fragment
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we had a single token for "#if" but now that we have two separate
tokens, it looks much better to see:
HASH_TOKEN IF
than:
HASH_TOKEN HASH_IF
(Note, that for the same reason we use HASH_TOKEN instead of HASH, we also use
DEFINE_TOKEN instead of DEFINE to avoid a conflict with the <DEFINE> start
condition in the lexer.)
There should be no behavioral change from this commit.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's legal (though highly bizarre) for a pre-processor directive to look like
this:
# /* why? */ define FOO bar
This behavior comes about since the specification defines separate logical
phases in a precise order, and comment-removal occurs in a phase before the
identification of directives.
Our implementation does not use an actual separate phase for comment removal,
so some extra care is necessary to correctly parse this. What we want is for
'#' to introduce a directive iff it is the first token on a line, (ignoring
whitespace and comments). Previously, we had a lexical rule that worked only
for whitespace (not comments) with the following regular expression to find a
directive-introducing '#' at the beginning of a line:
HASH ^{HSPACE}*#{HSPACE}*
In this commit, we switch to instead use a simple literal match of '#' to
return a HASH_TOKEN token and add a new <HASH> start condition for whenever
the HASH_TOKEN is the first non-space token of a line. This requires the
addition of the new bit of state: first_non_space_token_this_line.
This approach has a couple of implications on the glcpp parser:
1. The parser now sees two separate tokens, (such as HASH_TOKEN and
HASH_DEFINE) where it previously saw one token (HASH_DEFINE) for
the sequence "#define". This is a straightforward change throughout
the grammar.
2. The parser may now see a SPACE token before the HASH_TOKEN token of
a directive. Previously the lexical regular expression for {HASH}
would eat up the space and there would be no SPACE token.
This second implication is a bit of a nuisance for the parser. It causes a
SPACE token to appear in a production of the grammar with the following two
definitions of a control_line:
control_line
SPACE control_line
This is really ugly, since normally a space would simply be a token
separator, so it wouldn't appear in the tokens of a production. This leads to
a further problem with interleaved spaces and comments:
/* ... */ /* ... */ #define /* ..*/
For this, we must not return several consecutive SPACE tokens, or else we would need an arbitrary number of new productions:
SPACE SPACE control_line
SPACE SPACE SPACE control_line
ad nauseam
To avoid this problem, in this commit we also change the lexer to emit only a
single SPACE token for any series of consecutive spaces, (whether from actual
whitespace or comments). For this compression, we add a new bit of parser
state: last_token_was_space. And we also update the expected results of all
necessary test cases for the new compression of space tokens.
Fortunately, the compression of spaces should not lead to any semantic changes
in terms of what the eventual GLSL compiler sees.
So there's a lot happening in this commit, (particularly for such a tiny
feature). But fortunately, the lexer itself is looking cleaner than ever. The
only ugly bit is all the state updating, but it is at least isolated to a
single shared function.
Of course, a new "make check" test is added for the new feature, (directives
with comments and whitespace interleaved in many combinations).
And this commit fixes the following Khronos GLES3 CTS tests:
function_definition_with_comments_vertex
function_definition_with_comments_fragment
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is in preparation for the planned addition of a new <HASH> start
condition to the lexer. Both start conditions and token types are, of course,
in the same default C namespace, so a start condition and a token type with
the same name will collide. (And unfortunately, they are both apparently
implemented as equivalent numeric types so the collision is undetected at
compile time and simply leads to unpredictable behavior at run time.)
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit does not cause any behavioral change for any valid program. Prior
to entering the <DEFINE> start condition, the only valid start condition is
<INITIAL>, so whether pushing/popping <DEFINE> onto the stack or explicit
returning to <INITIAL> is equivalent.
The reason for this change is that we are planning to soon add a start
condition for <HASH> with the following semantics:
<HASH>: We just saw a directive-introducing '#'
<DEFINE>: We just saw "#define" starting a directive
With these two start conditions in place, the only correct behavior is to
leave <DEFINE> by returning to <INITIAL>. But the old push/pop code would have
returned to the <HASH> start condition which would then cause an error when
the next directive-introducing '#' would be encountered.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the first line we were initializing the column to 1, but for all
subsequent lines we were initializing the column to 0. The column number is
advanced for each token read before any error message is printed. So the 0
value is the correct initialization, (so that the first column is reported as
column 1).
With this extremely minor change, many of the .expected files are updated such
that error messages for the first line now have the correct column number in
them.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here, "skipping" refers to the lexer not emitting any tokens for portions of
the file within an #if condition (or similar) that evaluates to false.
Previously, the lexer had a special <SKIP> start condition used to control
this skipping. This start condition was not handled like a normal start
condition. Instead, there was a particularly ugly block of code set to be
included at the top of the generated lexing loop that would change from
<INITIAL> to <SKIP> or from <SKIP> to <INITIAL> depending on various pieces of
parser state, (such as parser->skip_state and parser->lexing_directive).
Not only was that an ugly approach, but the <SKIP> start condition was
complicating several glcpp bug fixes I attempted recently that want to use
start conditions for other purposes, (such as a new <HASH> start condition).
The recently added RETURN_TOKEN macro gives us a convenient way to implement
skipping without using a lexer start condition. Now, at the top of the
generated lexer, we examine all the necessary parser state and set a new
parser->skipping bit. Then, in RETURN_TOKEN, we examine parser->skipping to
determine whether to actually emit the token or not.
Besides this, there are only a couple of other places where we need to examine
the skipping bit (other than when returning a token):
* To avoid emitting an error for #error if skipped.
* To avoid entering the <DEFINE> start condition for a #define that is
skipped.
With all of this in place in the present commit, there are hopefully no
behavioral changes with this patch, ("make check" still passes all of the
glcpp tests at least).
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
Now that we have a common macro for returning tokens, it makes sense to
perform some of the common work there, (such as copying string values).
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The glcpp parser is line-based, so it needs to see a NEWLINE token at the end
of each line. This causes a trick for files that end without a final newline.
Previously, the lexer for glcpp punted in this case by unconditionally
returning a NEWLINE token at end-of-file, (causing most files to have an extra
blank line at the end). Here, we refine this by lexing end-of-file as a
NEWLINE token only if the immediately preceding token was not a NEWLINE token.
The patch is a minor change that only looks huge for two reasons:
1. Almost all glcpp test result ".expected" files are updated to drop
the extra newline.
2. All return statements from the lexer are adjusted to use a new
RETURN_TOKEN macro that tracks the last-token-was-a-newline state.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The glcpp implementation has long had code to support a file that ends without
a final newline. But we didn't have a "make check" test for this.
Additionally, the <EOF> action was restricted only to the <INITIAL> state so
it would fail to get invoked if the EOF was encountered in the <COMMENT> or
the <DEFINE> case. Neither of these was a bug, per se, since EOF in either
of these cases is an error anyway, (either "unterminated comment" or
"missing macro name for #define").
But with the new explicit support for these cases, we not generate clean error
messages in these cases, (rather than "unexpected $end" from before).
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The NEWLINE_CATCHUP code is only intended to be invoked after we lex an actual
newline character ('\n'). The two extra calls here were apparently added
accidentally because the pattern happened to contain a (negated) '\n',
(see commit 6005e9cb283214cd57038c7c5e7758ba72ec6ac2).
I don't think either case could have caused any actual bug. (In the first
case, the pattern matched right up to the next newline, so the NEWLINE_CATCHUP
code was just about to be called. In the second case, I don't think it's
possible to actually enter the <SKIP> start condition after commented newlines
without any intervening newline.)
But, if nothing else, the code is cleaner without these extra calls.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The recent adddition of an error for "#define followed by a non-identifier"
was a bit to aggressive since it used a regular expression in the lexer to
flag any character that's not legal as the first character of an identifier.
But we need to allow comments to appear here, (since we aren't removing
comments in a preliminary pass). So we refine the error here to only flag
characters that could not be an identifier, nor a comment, nor whitespace.
We also augment the existing comment support to be active in the <DEFINE>
state as well.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, if the preprocessor encountered a #define with a non-identifier,
such as:
#define 123 456
The lexer had no explicit rules to match non-identifiers in the <DEFINE> start
state. Because of this, flex's default rule was being invoked, (printing
characters to stdout), and all text was being discarded by the compiler until
the next identifier. As one can imagine, this led to all sorts of interesting
and surprising results.
Fix this by adding an explicit rule complementing the existing
identifier-based rules that should catch all non-identifiers after #define and
reliably give a well-formatted error message.
A new test is added to "make check" to ensure this bug stays fixed.
This commit also fixes the following Khronos GLES3 CTS test:
define_non_identifier_vertex
(The "fragment" variant was passing earlier only because the preprocessor was
behaving so randomly and causing the compilation to fail. It's lucky, in fact,
that the "vertex" version succesfully compiled so we could find and fix this
bug.)
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The glcpp lexer and parser use the space_tokens state bit to avoid emitting
tokens for spaces while parsing a directive. Previously, this bit was only
being set again by the first non-space token following a directive.
This led to a bug where a space, (or a comment that should emit a space),
immediately following a directive, (optionally searated by newlines), would be
omitted from the output.
Here we fix the bug by also setting the space_tokens bit whenever we lex a
newline in the standard start conditions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The lexer was insisting that there be at least one character after "#pragma"
and before the end of the line. This caused an error for a line consisting
only of "#pragma" which volates at least the following sentence from the GLSL
ES Specification 3.00.4:
The scope as well as the effect of the optimize and debug pragmas is
implementation-dependent except that their use must not generate an
error. [Page 12 (Page 28 of PDF)]
and likely the following sentence from that specification and also in
GLSLangSpec 4.30.6:
If an implementation does not recognize the tokens following #pragma,
then it will ignore that pragma.
Add a "make check" test to ensure no future regressions.
This change fixes at least part of the following Khronos GLES3 CTS test:
preprocessor.pragmas.pragma_vertex
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The preprocessor defines a notions of a "preprocessing number" that
starts with either a digit or a decimal point, and continues with zero
or more of digits, decimal points, identifier characters, or the sign
symbols, ('-' and '+').
Prior to this change, preprocessing numbers were lexed as some
combination of OTHER and IDENTIFIER tokens. This had the problem of
causing undesired macro expansion in some cases.
We add tests to ensure that the undesired macro expansion does not
happen in cases such as:
#define e +1
#define xyz -2
int n = 1e;
int p = 1xyz;
In either case these macro definitions have no effect after this
change, so that the numeric literals, (whether valid or not), will be
passed on as-is from the preprocessor to the compiler proper.
This fixes the following Khronos GLES3 CTS tests:
preprocessor.basic.correct_phases_vertex
preprocessor.basic.correct_phases_fragment
v2. Thanks to Anuj Phogat for improving the original regular expression,
(which accepted a '+' or '-', where these are only allowed after one of
[eEpP]. I also expanded the test to exercise this.
v3. Also fixed regular expression to require at least one digit at the
beginning (after an optional period). Otherwise, a string such as ".xyz" was
getting sucked up as a preprocessing number, (where obviously this should be a
field access). Again, I expanded the test to exercise this.
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, a line such as:
#else garbage
would flag an error if it followed "#if 0", but not if it followed "#if 1".
We fix this by setting a new bit of state (lexing_else) that allows the lexer
to defer switching to the <SKIP> start state until after the NEWLINE following
the #else directive.
A new test case is added for:
#if 1
#else garbage
#endif
which was untested before, (and did not generate the desired error).
This fixes the following Khronos GLES3 CTS tests:
tokens_after_else_vertex
tokens_after_else_fragment
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
| |
After preprocessing by glcpp all adjacent spaces were replaced by
single one and glsl parser received column-shifted shader source.
It negatively affected ast location set up and produced wrong error
messages for heavily-spaced shaders.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Carl Worth <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 6005e9cb28 a new start state of NEWLINE_CATCHUP was added to the
lexer. This start state is used whenever the lexer is emitting a NEWLINE token
to emit additional NEWLINE tokens for any newline characters that were skipped
by an immediately preceding multi-line comment.
However, that commit erroneously entered the NEWLINE_CATCHUP state for
single-line comments. This is not desired since in the case of a single-line
comment, the lexer is not emitting any NEWLINE token. The result is that the
lexer will remain in the NEWLINE_CATCHUP state and proceed to fail to emit a
NEWLINE token for the subsequent newline character, (since the case to match \n expects only the INITIAL start state).
The fix is quite simple, remove the "BEGIN NEWLINE_CATCHUP" code from the
single-line comment case, (preserving it only in exactly the cases where the
lexer is actually emitting a NEWLINE token).
Many thanks to Petri Latvala for reporting this bug and for providing the
minimal test case to exercise it. The bug showed up only with a multi-line
comment which was followed immediately by a single-line comment (without any
intervening newline), such as:
/*
*/ // Kablam!
Since 6005e9cb28, and before this commit, that very innocent-looking
combination of comments would yield a parse failure in the compiler.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72686
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
definition)
The preprocessor has always replaced multi-line comments with a single space
character, (as required by the specification), but as of commit
bd55ba568b301d0f764cd1ca015e84e1ae932c8b the lexer also emitted a NEWLINE
token for each newline within the comment, (in order to preserve line
numbers).
The emitting of NEWLINE tokens within the comment broke the rule of "replace a
multi-line comment with a single space" as could be exposed by code like the
following:
#define FOO a/*
*/b
FOO
Prior to commit bd55ba568b301d0f764cd1ca015e84e1ae932c8b, this code defined
the macro FOO as "a b" as desired. Since that commit, this code instead
defines FOO as "a" and leaves a stray "b" in the output.
In this commit, we fix this by not emitting the NEWLINE tokens while lexing
the comment, but instead merely counting them in the commented_newlines
variable. Then, when the lexer next encounters a non-commented newline it
switches to a NEWLINE_CATCHUP state to emit as many NEWLINE tokens as
necessary (so that subsequent parsing stages still generate correct line
numbers).
Of course, it would have been more clear if we could have written a loop to
emit all the newlines, but flex conventions prevent that, (we must use
"return" for each token we emit).
It similarly would have been clear to have a new rule restricted to the
<NEWLINE_CATCHUP> state with an action much like the body of this if
condition. The problem with that is that this rule must not consume any
characters. It might be possible to write a rule that matches a single
lookahead of any character, but then we would also need an additional rule to
ensure for the <EOF> case where there are no additional characters available
for the lookahead to match.
Given those considerations, and given that the SKIP-state manipulation already
involves a code block at the top of the lexer function, before any rules, it
seems best to me to go with the implementation here which adds a similar
pre-rule code block for the NEWLINE_CATCHUP.
Finally, this commit also changes the expected output of a few, existing glcpp
tests. The change here is that the space character resulting from the
multi-line comment is now emitted before the newlines corresponding to that
comment. (Previously, the newlines were emitted first, and the space character
afterward.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72686
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Two things make this code confusing:
1. The uncharacteristic manipulation of lexer start state outside of
flex rules.
2. The confusing semantics of the skip_stack (including the
"lexing_if" override and the SKIP_NO_SKIP state).
This new comment is intended to bring a bit more clarity for any readers.
There is no intended beahvioral change to the code here. The actual code
changes include better indentation to avoid an excessively-long line, and
using the more descriptive INITIAL rather than 0.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The GLSL ES 3.0 spec (Section 12.17) says:
"GLSL ES 1.00 removed token pasting and other functionality."
NOTE: This is a candidate for the stable branches.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Carl Worth <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
And add test cases to ensure that this works
- 110 verifies that glcpp rejects #elif<digits> which glcpp
previously accepted.
- 111 verifies that glcpp accepts #if followed immediately by
(, +, -, !, or ~.
- 112 does the same as 111 but for #elif.
See 17f9beb6 for #if change.
Reviewed-by: Carl Worth <[email protected]>
|
|
|
|
|
| |
Fixes part of es3conform's preprocess16_frag test.
Reviewed-by: Carl Worth <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we used lookahead patterns to differentiate:
#define FOO(x) function macro
#define FOO (x) object macro
Unfortunately, our rule for function macros:
{HASH}define{HSPACE}+/{IDENTIFIER}"("
relies on infinite lookahead, and apparently triggers a Flex bug where
the generated code overflows a state buffer (see YY_STATE_BUF_SIZE).
There's no need to use infinite lookahead. We can simply change state,
match the identifier, and use a single character lookahead for the '('.
This apparently makes Flex not generate the giant state array, which
avoids the buffer overflow, and should be more efficient anyway.
Fixes piglit test 17000-consecutive-chars-identifier.frag.
NOTE: This is a candidate for every release branch ever.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Carl Worth <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GLSL specification requires that #line directives be interpreted
after macro expansion. Our existing implementation of #line macros in
the lexer prevents conformance on this point.
Moving the handling of #line from the lexer to the parser gives us the
macro expansion we need. An additional benefit is that the
preprocessor also now supports comments on the same line as #line
directives.
Finally, the preprocessor now emits the (fully-macro-expanded) #line
directives into the output. This allows the full GLSL compiler to also
see and interpret these directives so it can also generate correct
line numbers in error messages.
Signed-off-by: Carl Worth <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The trick here is that flex always chooses the rule that matches the most
text. So with a input text of "two:" which we want to be lexed as an
IDENTIFIER token "two" followed by an OTHER token ":" the previous OTHER
rule would match longer as a single token of "two:" which we don't want.
We prevent this by forcing the OTHER pattern to never match any
characters that appear in other constructs, (no letters, numbers, #,
_, whitespace, nor any punctuation that appear in CPP operators).
Fixes bug #44764:
GLSL preprocessor doesn't replace defines ending with ":"
https://bugs.freedesktop.org/show_bug.cgi?id=44764
Reviewed-by: Kenneth Graunke <[email protected]>
NOTE: This is a candidate for stable release branches.
|
| |
|
|
|
|
| |
These are now unnecessary.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the rule deleted by this commit was matched every single
time (being the longest match). If not skipping, it used REJECT to
continue on to the actual correct rule.
The flex manual advises against using REJECT where possible, as it is
one of the most expensive lexer features. So using it on every match
seems undesirable. Perhaps more importantly, it made it necessary for
the #if directive rules to contain a look-ahead pattern to make them
as long as the (now deleted) "skip the whole line" rule.
This patch introduces an exclusive start state, SKIP, to avoid REJECTs.
Each time the lexer is called, the code at the top of the rules section
will run, implicitly switching the state to the correct one.
Fixes piglit tests 16384-consecutive-chars.frag and
16385-consecutive-chars.frag.
|
| |
|
|
|
|
| |
This is necessary for the main compiler to get correct line numbers.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The existing DECIMAL_INTEGER pattern is the correct thing to use when
looking for a C decimal integer, (that is, a digit-sequence not
starting with 0 which would instead be an octal integer).
But for #line, we really want to accept any digit sequence, (including
"0"), and always interpret it as a decimal constant. So we add a new
DIGITS pattern for this case.
This should fix the compilation failure noted in bug #28138
https://bugs.freedesktop.org/show_bug.cgi?id=28138
(Though the generated file will not be updated until the next commit.)
|
|
|
|
|
|
|
| |
Previously, the YY_USER_ACTION was overwriting the yylloc->source value
in every action, (after that value had been carefully set by the handling
of the #line directive). Instead, we want to initialize it once in
YY_USER_INIT and then not touch it at all in YY_USER_ACTION.
|
| |
|