path: root/src/glsl/glcpp/glcpp-lex.l
Commit message [Author, Age, Files, Lines]
* glcpp: Don't use alternation in the lookahead for empty pragmas. [Carl Worth, 2014-08-22, 1 file, -2/+8]
| | | | | | | | | | | | | | | | | | | | We've found that there's a buffer overrun bug in flex that's triggered by using alternation in a lookahead pattern. Fortunately, we don't need to match the exact {NEWLINE} expression to detect an empty pragma. It suffices to verify that there are no non-space characters before any newline character. So we can use a simple [\r\n] to get the desired behavior while avoiding the flex bug. Fixes the regression of piglit's 17000-consecutive-chars-identifier test, (which has been crashing since commit 04e40fd337a244ee77ef9553985e9398ff0344af ). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82472 Signed-off-by: Carl Worth <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> CC: <[email protected]>
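The check described in this commit (no non-space characters before any newline character) can be sketched in plain C. This is only an illustration of the idea, not the flex rule itself; the function name `pragma_is_empty` is hypothetical, and treating end-of-input without a newline as "not empty" is an assumption that mirrors a lookahead requiring `[\r\n]`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of glcpp.
 * `rest` is the text that follows "#pragma" on the same line.
 * Returns true when only spaces or tabs appear before the first
 * '\r' or '\n'. Assumption: input that ends without any newline
 * character is reported as not empty. */
bool pragma_is_empty(const char *rest)
{
    assert(rest != NULL);

    /* Skip horizontal whitespace. */
    while (*rest == ' ' || *rest == '\t')
        rest++;

    /* A single character class check replaces the alternation. */
    return *rest == '\r' || *rest == '\n';
}
```

The point of the simplification is that the check never needs to know whether the terminator is "\r\n", "\n\r", or a single character; seeing either character first is enough.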
* glsl/glcpp: Don't include any newline characters in #error token [Carl Worth, 2014-08-07, 1 file, -1/+1]
Some tests were failing because the message printed by #error was including a '\r' character from the source file in its output. This is easily avoided by fixing the regular expression for #error to never include any of the possible newline characters, (neither '\r' nor '\n'). With this commit 2 tests are fixed for each of the '\r' and '\r\n' cases. Current results after the commit are:
    \r: 137/143 tests pass
    \r\n: 142/143 tests pass
    \n\r: 139/143 tests pass
Reviewed-by: Ian Romanick <[email protected]>
* glsl/glcpp: Treat CR+LF pair as a single newline [Carl Worth, 2014-08-07, 1 file, -5/+6]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The GLSL specification says that either carriage-return, line-feed, or both together can be used to terminate lines. Further, it says that when used together, the pair of terminators shall be interpreted as a single line. This final requirement has not been respected by glcpp up until now, (it has been emitting two newlines for every CR+LF pair). Here, we fix the lexer by using a regular expression for NEWLINE that eats up both "\r\n" (or even "\n\r") if possible before also considering a single '\n' or a single '\r' as a line terminator. Before this commit, the test results are as follows: \r: 135/143 tests pass \r\n: 4/143 tests pass \n\r: 4/143 tests pass After this commit, the test results are as follows: \r: 135/143 tests pass \r\n: 140/143 tests pass \n\r: 139/143 tests pass So, obviously, a dramatic improvement. Reviewed-by: Ian Romanick <[email protected]>
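The NEWLINE behavior described in this commit (prefer a two-character "\r\n" or "\n\r" pair, then fall back to a single '\r' or '\n') can be sketched in C as follows. The function names are hypothetical and this is a minimal illustration, not the lexer's actual pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length (0, 1, or 2) of the line terminator at
 * the start of s. The pairs "\r\n" and "\n\r" count as one terminator. */
size_t newline_length(const char *s)
{
    assert(s != NULL);

    /* Try the two-character forms first, as the commit describes. */
    if ((s[0] == '\r' && s[1] == '\n') || (s[0] == '\n' && s[1] == '\r'))
        return 2;
    if (s[0] == '\r' || s[0] == '\n')
        return 1;
    return 0;
}

/* Counts logical line terminators in src. */
size_t count_newlines(const char *src)
{
    size_t count = 0;

    assert(src != NULL);
    while (*src != '\0') {
        size_t n = newline_length(src);
        if (n > 0) {
            count++;
            src += n;
        } else {
            src++;
        }
    }
    return count;
}
```

With this ordering, a file that uses "\r\n" throughout yields the same line count as the same file using '\n' alone, which is the property the test results above measure.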
* glsl/glcpp: Swallow empty #pragma directives. [Carl Worth, 2014-08-07, 1 file, -0/+6]
| | | | | | | | | | | | | | Previously, we were passing these through, just like any other pragma. But the downstream compiler was tripping up on them. It seems easier to swallow these in the preprocessor and not pass them on at all rather than fixing the downstream compiler. This fixes the following Khronos GLES3 CTS tests: preprocessor.pragmas.pragma_vertex preprocessor.pragmas.pragma_fragment Reviewed-by: Ian Romanick <[email protected]>
* glsl/glcpp: Fix #pragma to not over-increment the line-number count [Carl Worth, 2014-08-07, 1 file, -2/+0]
| | | | | | | | | | | | | | Previously, the #pragma directive was swallowing an entire line, (including the final newline). At that time it was appropriate for it to increment the line count. More recently, our handling of #pragma changed to not include the newline. But the code to increment yylineno stuck around. This was causing __LINE__ to be increased by one more than desired for every #pragma. Remove the bogus, extra increment, and add a test for this case. Reviewed-by: Ian Romanick <[email protected]>
* glsl/glcpp: Fix NULL directives when followed by a single-line comment [Carl Worth, 2014-08-07, 1 file, -1/+1]
| | | | | | | | | | | | | | | | | | | This is the fix for the following line: # // comment to ignore here According to the translation-phase rules, the comment should be removed before the preprocessor looks to interpret the null directive. So in our implementation we must explicitly look for single-line comments in the <HASH> start condition as well. This commit fixes the following Khronos GLES3 CTS tests: null_directive_vertex null_directive_fragment Reviewed-by: Ian Romanick <[email protected]>
* glsl/glcpp: Allow single-line comments immediately after #define [Carl Worth, 2014-08-07, 1 file, -1/+1]
| | | | | | | | | | | | | | | | We were already correctly supporting single-line comments in case like: #define FOO bar // comment here... The new support added here is simply for the none-too-useful: #define // comment instead of macro name With this commit, this line will now give the expected "#define without macro name" error message instead of the lexer just going off into the weeds. Reviewed-by: Ian Romanick <[email protected]>
* glsl/glcpp: Add explicit error for "#define without macro name" [Carl Worth, 2014-08-07, 1 file, -3/+3]
| | | | | | | | | | | | | | | | | | | | | | | | Previously, glcpp would emit an error like this if <EOF> happened to occur immediately after the "#define", but in general would just get confused, (leading to un-helpful error messages). To fix things to generate a clean error message, we do a few things: 1. Don't require horizontal whitespace immediately after #define 2. Add a production for the error case, (DEFINE_TOKEN followed immediately by a NEWLINE token). 3. Make the lexer reset to the <INITIAL> state after every NEWLINE. This 3rd point prevents the lexer from getting so confused and generating further spurious errors in the file because it was stuck in the <DEFINE> start condition. We also drop the similar error message from the <EOF> rule since the newly-added rule will have already printed the error message. Reviewed-by: Ian Romanick <[email protected]>
* glsl/glcpp: rename ERROR to ERROR_TOKEN to fix MSVC build [Brian Paul, 2014-07-30, 1 file, -1/+1]
| | | | | | | ERROR is a #define in the MSVC WinGDI.h header file. Add the _TOKEN suffix as we do for a few other lexer tokens. Reviewed-by: Kenneth Graunke <[email protected]>
* glsl/glcpp: Add flex options to eliminate the default rule. [Carl Worth, 2014-07-29, 1 file, -1/+2]
| | | | | | | | | | | | | | | | | | | | | | | | We've had multiple bugs in the past where we have been inadvertently matching the default rule, (which we never want to do). We recently added a catch-all rule to avoid this, (and made this rule robust for future start conditions). Kristian pointed out that flex allows us to go one step better. This syntax: %option warn nodefault instructs flex to not generate the default rule at all. Further, flex will generate a warning at compile time if the set of rules we provide are inadequate, (such that it would be possible for the default rule to be matched). With this warning in place, I found that the catch-all rule was in fact missing something. The catch-all rule uses a pattern of "." which doesn't match newlines. So here we extend the newline-matching rule to all start conditions. That is enough to convince flex that it really doesn't need any default rule. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* glsl/glcpp: Combine the two rules matching any character [Carl Worth, 2014-07-29, 1 file, -6/+6]
Using a single rule here means that we can use the <*> syntax to match all start conditions. This makes the catch-all rule more robust against the addition of future start conditions, (no need to maintain an ever-growing list of start conditions for this rule).
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
* glsl/glcpp: Alphabetize lists of start conditions [Carl Worth, 2014-07-29, 1 file, -3/+3]
| | | | | | | | | | | | There is no behavioral change here. It's just easier to verify that lists of start conditions include all expected conditions when they appear in a consistent order. The <INITIAL> state is special, so it appears first in all lists. All others appear in alphabetical order. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* glsl/glcpp: Add a catch-all rule for unexpected characters. [Carl Worth, 2014-07-29, 1 file, -0/+13]
| | | | | | | | | | | | | | | | | | In some of the recent glcpp bug-fixing, we found that glcpp was emitting unrecognized characters from the input source file to stdout, and dropping them from the source passed onto the compiler proper. This was obviously confusing, and totally undesired. The bogus behavior comes from an implicit default rule in flex, which is that any unmatched character is implicitly matched and printed to stdout. To avoid this implicit matching and printing, here we add an explicit catch-all rule. If this rule ever matches it prints an internal compiler error. The correct response for any such error is fixing glcpp to handle the unexpected character in the correct way. Reviewed-by: Jordan Justen <[email protected]>
* glsl/glcpp: Treat carriage return as equivalent to line feed. [Carl Worth, 2014-07-29, 1 file, -9/+8]
Previously, the '\r' character was not explicitly matched by any lexer rule. This means that glcpp would have been using the default flex rule to match '\r' characters, (where they would have been printed to stdout rather than actually correctly handled). With this commit, we treat '\r' as equivalent to '\n'. This is clearly an improvement over the bogus printing to stdout. The resulting behavior is compliant with the GLSL specification for any source file that uses exclusively '\r' or '\n' to separate lines. For shaders that use a multiple-character line separator, (such as "\r\n"), glcpp won't be precisely compliant with the specification, (treating these as two newline characters rather than one), but this should not introduce any semantic changes to the shader programs.
Reviewed-by: Jordan Justen <[email protected]>
* glsl/glcpp: Add (non)-support for ++ and -- operators [Carl Worth, 2014-07-29, 1 file, -0/+8]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | These operators aren't defined for preprocessor expressions, so we never implemented them. This led them to be misinterpreted as strings of unary '+' or '-' operators. In fact, what is actually desired is to generate an error if these operators appear in any preprocessor condition. So this commit looks like it is strictly adding support for these operators. And it is supporting them as far as passing them through to the subsequent compiler, (which was already happening anyway). What's less apparent in the commit is that with these tokens now being lexed, but with no change to the grammar for preprocessor expressions, these operators will now trigger errors there. A new "make check" test is added to verify the desired behavior. This commit fixes the following Khronos GLES3 CTS test: invalid_op_1_vertex invalid_op_1_fragment invalid_op_2_vertex invalid_op_2_fragment Reviewed-by: Jordan Justen <[email protected]>
* glsl/glcpp: Drop the HASH_ prefix from token names like HASH_IF [Carl Worth, 2014-07-29, 1 file, -13/+13]
| | | | | | | | | | | | | | | | | | | Previously, we had a single token for "#if" but now that we have two separate tokens, it looks much better to see: HASH_TOKEN IF than: HASH_TOKEN HASH_IF (Note, that for the same reason we use HASH_TOKEN instead of HASH, we also use DEFINE_TOKEN instead of DEFINE to avoid a conflict with the <DEFINE> start condition in the lexer.) There should be no behavioral change from this commit. Reviewed-by: Jordan Justen <[email protected]>
* glsl/glcpp: Correctly parse directives with intervening comments [Carl Worth, 2014-07-29, 1 file, -42/+120]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's legal (though highly bizarre) for a pre-processor directive to look like this: # /* why? */ define FOO bar This behavior comes about since the specification defines separate logical phases in a precise order, and comment-removal occurs in a phase before the identification of directives. Our implementation does not use an actual separate phase for comment removal, so some extra care is necessary to correctly parse this. What we want is for '#' to introduce a directive iff it is the first token on a line, (ignoring whitespace and comments). Previously, we had a lexical rule that worked only for whitespace (not comments) with the following regular expression to find a directive-introducing '#' at the beginning of a line: HASH ^{HSPACE}*#{HSPACE}* In this commit, we switch to instead use a simple literal match of '#' to return a HASH_TOKEN token and add a new <HASH> start condition for whenever the HASH_TOKEN is the first non-space token of a line. This requires the addition of the new bit of state: first_non_space_token_this_line. This approach has a couple of implications on the glcpp parser: 1. The parser now sees two separate tokens, (such as HASH_TOKEN and HASH_DEFINE) where it previously saw one token (HASH_DEFINE) for the sequence "#define". This is a straightforward change throughout the grammar. 2. The parser may now see a SPACE token before the HASH_TOKEN token of a directive. Previously the lexical regular expression for {HASH} would eat up the space and there would be no SPACE token. This second implication is a bit of a nuisance for the parser. 
It causes a SPACE token to appear in a production of the grammar with the following two definitions of a control_line: control_line SPACE control_line This is really ugly, since normally a space would simply be a token separator, so it wouldn't appear in the tokens of a production. This leads to a further problem with interleaved spaces and comments: /* ... */ /* ... */ #define /* ..*/ For this, we must not return several consecutive SPACE tokens, or else we would need an arbitrary number of new productions: SPACE SPACE control_line SPACE SPACE SPACE control_line ad nauseam To avoid this problem, in this commit we also change the lexer to emit only a single SPACE token for any series of consecutive spaces, (whether from actual whitespace or comments). For this compression, we add a new bit of parser state: last_token_was_space. And we also update the expected results of all necessary test cases for the new compression of space tokens. Fortunately, the compression of spaces should not lead to any semantic changes in terms of what the eventual GLSL compiler sees. So there's a lot happening in this commit, (particularly for such a tiny feature). But fortunately, the lexer itself is looking cleaner than ever. The only ugly bit is all the state updating, but it is at least isolated to a single shared function. Of course, a new "make check" test is added for the new feature, (directives with comments and whitespace interleaved in many combinations). And this commit fixes the following Khronos GLES3 CTS tests: function_definition_with_comments_vertex function_definition_with_comments_fragment Reviewed-by: Jordan Justen <[email protected]>
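The SPACE compression described in this commit can be sketched as a filter over a token stream. The token names and the function `compress_spaces` are hypothetical; only the `last_token_was_space` state bit comes from the commit message, and the real lexer applies it while emitting tokens rather than over an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token kinds for illustration only. */
enum token { TOK_SPACE, TOK_HASH, TOK_OTHER, TOK_NEWLINE };

/* Copies n tokens from in to out, dropping any SPACE that directly
 * follows another SPACE. Returns the number of tokens written.
 * out must have room for n tokens. */
size_t compress_spaces(const enum token *in, size_t n, enum token *out)
{
    bool last_token_was_space = false;
    size_t written = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (in[i] == TOK_SPACE && last_token_was_space)
            continue; /* collapse the run into the SPACE already emitted */
        last_token_was_space = (in[i] == TOK_SPACE);
        out[written++] = in[i];
    }
    return written;
}
```

Because at most one SPACE can now precede a HASH_TOKEN, the grammar needs only the two control_line forms quoted above instead of one production per possible run length.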
* glsl/glcpp: Rename HASH token to HASH_TOKEN [Carl Worth, 2014-07-29, 1 file, -1/+1]
| | | | | | | | | | | This is in preparation for the planned addition of a new <HASH> start condition to the lexer. Both start conditions and token types are, of course, in the same default C namespace, so a start condition and a token type with the same name will collide. (And unfortunately, they are both apparently implemented as equivalent numeric types so the collision is undetected at compile time and simply leads to unpredictable behavior at run time.) Reviewed-by: Jordan Justen <[email protected]>
* glsl/glcpp: Don't use start-condition stack when switching to/from <DEFINE> [Carl Worth, 2014-07-29, 1 file, -3/+3]
| | | | | | | | | | | | | | | | | | | | | This commit does not cause any behavioral change for any valid program. Prior to entering the <DEFINE> start condition, the only valid start condition is <INITIAL>, so whether pushing/popping <DEFINE> onto the stack or explicit returning to <INITIAL> is equivalent. The reason for this change is that we are planning to soon add a start condition for <HASH> with the following semantics: <HASH>: We just saw a directive-introducing '#' <DEFINE>: We just saw "#define" starting a directive With these two start conditions in place, the only correct behavior is to leave <DEFINE> by returning to <INITIAL>. But the old push/pop code would have returned to the <HASH> start condition which would then cause an error when the next directive-introducing '#' would be encountered. Reviewed-by: Jordan Justen <[email protected]>
* glsl/glcpp: Fix off-by-one error in column in first-line error messages [Carl Worth, 2014-07-29, 1 file, -1/+1]
| | | | | | | | | | | | | | For the first line we were initializing the column to 1, but for all subsequent lines we were initializing the column to 0. The column number is advanced for each token read before any error message is printed. So the 0 value is the correct initialization, (so that the first column is reported as column 1). With this extremely minor change, many of the .expected files are updated such that error messages for the first line now have the correct column number in them. Reviewed-by: Jordan Justen <[email protected]>
* glsl/glcpp: Stop using a lexer start condition (<SKIP>) for token skipping. [Carl Worth, 2014-07-29, 1 file, -63/+97]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here, "skipping" refers to the lexer not emitting any tokens for portions of the file within an #if condition (or similar) that evaluates to false. Previously, the lexer had a special <SKIP> start condition used to control this skipping. This start condition was not handled like a normal start condition. Instead, there was a particularly ugly block of code set to be included at the top of the generated lexing loop that would change from <INITIAL> to <SKIP> or from <SKIP> to <INITIAL> depending on various pieces of parser state, (such as parser->skip_state and parser->lexing_directive). Not only was that an ugly approach, but the <SKIP> start condition was complicating several glcpp bug fixes I attempted recently that want to use start conditions for other purposes, (such as a new <HASH> start condition). The recently added RETURN_TOKEN macro gives us a convenient way to implement skipping without using a lexer start condition. Now, at the top of the generated lexer, we examine all the necessary parser state and set a new parser->skipping bit. Then, in RETURN_TOKEN, we examine parser->skipping to determine whether to actually emit the token or not. Besides this, there are only a couple of other places where we need to examine the skipping bit (other than when returning a token): * To avoid emitting an error for #error if skipped. * To avoid entering the <DEFINE> start condition for a #define that is skipped. With all of this in place in the present commit, there are hopefully no behavioral changes with this patch, ("make check" still passes all of the glcpp tests at least). Reviewed-by: Jordan Justen <[email protected]>
* glsl/glcpp: Abstract a bit of common code for returning string tokens [Carl Worth, 2014-07-29, 1 file, -22/+18]
| | | | | | | Now that we have a common macro for returning tokens, it makes sense to perform some of the common work there, (such as copying string values). Reviewed-by: Jordan Justen <[email protected]>
* glsl/glcpp: Drop extra, final newline from most output [Carl Worth, 2014-07-29, 1 file, -38/+49]
| | | | | | | | | | | | | | | | | | | | The glcpp parser is line-based, so it needs to see a NEWLINE token at the end of each line. This causes a trick for files that end without a final newline. Previously, the lexer for glcpp punted in this case by unconditionally returning a NEWLINE token at end-of-file, (causing most files to have an extra blank line at the end). Here, we refine this by lexing end-of-file as a NEWLINE token only if the immediately preceding token was not a NEWLINE token. The patch is a minor change that only looks huge for two reasons: 1. Almost all glcpp test result ".expected" files are updated to drop the extra newline. 2. All return statements from the lexer are adjusted to use a new RETURN_TOKEN macro that tracks the last-token-was-a-newline state. Reviewed-by: Jordan Justen <[email protected]>
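The end-of-file refinement in this commit (emit a NEWLINE at EOF only if the preceding token was not a NEWLINE) can be sketched by counting the NEWLINE tokens a line-based lexer would produce. The function name is hypothetical, every non-newline character stands in for "some other token", and treating empty input as needing no NEWLINE is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: counts the NEWLINE tokens emitted for src.
 * Each '\n' produces one NEWLINE. At end-of-file one more NEWLINE is
 * produced only when the last token was not already a NEWLINE. */
size_t count_newline_tokens(const char *src)
{
    /* Assumption: empty input behaves as if it ended with a newline. */
    bool last_token_was_newline = true;
    size_t newlines = 0;

    assert(src != NULL);
    for (; *src != '\0'; src++) {
        last_token_was_newline = (*src == '\n');
        if (last_token_was_newline)
            newlines++;
    }

    /* End-of-file: terminate a dangling final line, but add nothing
     * when the file already ends with a newline. */
    if (!last_token_was_newline)
        newlines++;
    return newlines;
}
```

Under the earlier unconditional behavior, "a\n" would have produced two NEWLINE tokens; with the state bit it produces one, which is why most ".expected" files lost a trailing blank line.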
* glsl/glcpp: Add testing for EOF sans newline (and fix for <DEFINE>, <COMMENT>) [Carl Worth, 2014-07-29, 1 file, -2/+5]
The glcpp implementation has long had code to support a file that ends without a final newline. But we didn't have a "make check" test for this. Additionally, the <EOF> action was restricted only to the <INITIAL> state so it would fail to get invoked if the EOF was encountered in the <COMMENT> or the <DEFINE> case. Neither of these was a bug, per se, since EOF in either of these cases is an error anyway, (either "unterminated comment" or "missing macro name for #define"). But with the new explicit support for these cases, we now generate clean error messages in these cases, (rather than "unexpected $end" from before).
Reviewed-by: Jordan Justen <[email protected]>
* glsl/glcpp: Remove some un-needed calls to NEWLINE_CATCHUP [Carl Worth, 2014-07-29, 1 file, -4/+0]
| | | | | | | | | | | | | | | | | The NEWLINE_CATCHUP code is only intended to be invoked after we lex an actual newline character ('\n'). The two extra calls here were apparently added accidentally because the pattern happened to contain a (negated) '\n', (see commit 6005e9cb283214cd57038c7c5e7758ba72ec6ac2). I don't think either case could have caused any actual bug. (In the first case, the pattern matched right up to the next newline, so the NEWLINE_CATCHUP code was just about to be called. In the second case, I don't think it's possible to actually enter the <SKIP> start condition after commented newlines without any intervening newline.) But, if nothing else, the code is cleaner without these extra calls. Reviewed-by: Jordan Justen <[email protected]>
* glsl/glcpp: Add support for comments between #define and macro identifier [Carl Worth, 2014-07-29, 1 file, -2/+36]
The recent addition of an error for "#define followed by a non-identifier" was a bit too aggressive since it used a regular expression in the lexer to flag any character that's not legal as the first character of an identifier. But we need to allow comments to appear here, (since we aren't removing comments in a preliminary pass). So we refine the error here to only flag characters that could not be an identifier, nor a comment, nor whitespace. We also augment the existing comment support to be active in the <DEFINE> state as well.
Reviewed-by: Jordan Justen <[email protected]>
* glsl/glcpp: Emit proper error for #define with a non-identifier [Carl Worth, 2014-07-29, 1 file, -0/+6]
Previously, if the preprocessor encountered a #define with a non-identifier, such as:
    #define 123 456
The lexer had no explicit rules to match non-identifiers in the <DEFINE> start state. Because of this, flex's default rule was being invoked, (printing characters to stdout), and all text was being discarded by the compiler until the next identifier. As one can imagine, this led to all sorts of interesting and surprising results. Fix this by adding an explicit rule complementing the existing identifier-based rules that should catch all non-identifiers after #define and reliably give a well-formatted error message. A new test is added to "make check" to ensure this bug stays fixed. This commit also fixes the following Khronos GLES3 CTS test:
    define_non_identifier_vertex
(The "fragment" variant was passing earlier only because the preprocessor was behaving so randomly and causing the compilation to fail. It's lucky, in fact, that the "vertex" version successfully compiled so we could find and fix this bug.)
Reviewed-by: Jordan Justen <[email protected]>
* glsl/glcpp: Fix to emit spaces following directives [Carl Worth, 2014-07-29, 1 file, -0/+1]
The glcpp lexer and parser use the space_tokens state bit to avoid emitting tokens for spaces while parsing a directive. Previously, this bit was only being set again by the first non-space token following a directive. This led to a bug where a space, (or a comment that should emit a space), immediately following a directive, (optionally separated by newlines), would be omitted from the output. Here we fix the bug by also setting the space_tokens bit whenever we lex a newline in the standard start conditions.
* glsl/glcpp: Don't choke on an empty pragma [Carl Worth, 2014-07-09, 1 file, -1/+1]
The lexer was insisting that there be at least one character after "#pragma" and before the end of the line. This caused an error for a line consisting only of "#pragma" which violates at least the following sentence from the GLSL ES Specification 3.00.4:
    The scope as well as the effect of the optimize and debug pragmas is implementation-dependent except that their use must not generate an error. [Page 12 (Page 28 of PDF)]
and likely the following sentence from that specification and also in GLSLangSpec 4.30.6:
    If an implementation does not recognize the tokens following #pragma, then it will ignore that pragma.
Add a "make check" test to ensure no future regressions. This change fixes at least part of the following Khronos GLES3 CTS test:
    preprocessor.pragmas.pragma_vertex
Reviewed-by: Kenneth Graunke <[email protected]>
* glsl/glcpp: Fix glcpp to properly lex entire "preprocessing numbers" [Carl Worth, 2014-07-09, 1 file, -0/+6]
The preprocessor defines a notion of a "preprocessing number" that starts with either a digit or a decimal point, and continues with zero or more of digits, decimal points, identifier characters, or the sign symbols, ('-' and '+'). Prior to this change, preprocessing numbers were lexed as some combination of OTHER and IDENTIFIER tokens. This had the problem of causing undesired macro expansion in some cases. We add tests to ensure that the undesired macro expansion does not happen in cases such as:
    #define e +1
    #define xyz -2
    int n = 1e;
    int p = 1xyz;
In either case these macro definitions have no effect after this change, so that the numeric literals, (whether valid or not), will be passed on as-is from the preprocessor to the compiler proper. This fixes the following Khronos GLES3 CTS tests:
    preprocessor.basic.correct_phases_vertex
    preprocessor.basic.correct_phases_fragment
v2. Thanks to Anuj Phogat for improving the original regular expression, (which accepted a '+' or '-', where these are only allowed after one of [eEpP]). I also expanded the test to exercise this.
v3. Also fixed regular expression to require at least one digit at the beginning (after an optional period). Otherwise, a string such as ".xyz" was getting sucked up as a preprocessing number, (where obviously this should be a field access). Again, I expanded the test to exercise this.
Reviewed-by: Anuj Phogat <[email protected]>
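The shape of a preprocessing number after the v2 and v3 refinements (an optional period, at least one digit, then digits, periods, identifier characters, or a sign that directly follows one of [eEpP]) can be sketched as a hand-written matcher. The function name `pp_number_length` is hypothetical, and this is an illustration of the rules stated in the commit message rather than the regular expression used in the lexer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical matcher: returns the length of the preprocessing
 * number at the start of s, or 0 if s does not start with one. */
size_t pp_number_length(const char *s)
{
    size_t i = 0;

    assert(s != NULL);

    /* Optional leading period, then a mandatory digit (the v3 fix). */
    if (s[i] == '.')
        i++;
    if (!isdigit((unsigned char)s[i]))
        return 0;
    i++;

    for (;;) {
        unsigned char c = (unsigned char)s[i];

        /* A sign is accepted only right after e, E, p, or P (the v2 fix). */
        if ((c == 'e' || c == 'E' || c == 'p' || c == 'P') &&
            (s[i + 1] == '+' || s[i + 1] == '-'))
            i += 2;
        else if (isalnum(c) || c == '_' || c == '.')
            i++;
        else
            break;
    }
    return i;
}
```

In this sketch "1e" and "1xyz" are each consumed whole, so the trailing letters are never seen as separate identifiers and cannot be macro-expanded, while ".xyz" is rejected and left for other rules.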
* glsl/glcpp: Fix glcpp to catch garbage after #if 1 ... #else [Carl Worth, 2014-07-09, 1 file, -12/+13]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, a line such as: #else garbage would flag an error if it followed "#if 0", but not if it followed "#if 1". We fix this by setting a new bit of state (lexing_else) that allows the lexer to defer switching to the <SKIP> start state until after the NEWLINE following the #else directive. A new test case is added for: #if 1 #else garbage #endif which was untested before, (and did not generate the desired error). This fixes the following Khronos GLES3 CTS tests: tokens_after_else_vertex tokens_after_else_fragment Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* glcpp: Do not remove spaces to preserve locations. [Sir Anthony, 2014-03-08, 1 file, -1/+1]
| | | | | | | | | After preprocessing by glcpp all adjacent spaces were replaced by single one and glsl parser received column-shifted shader source. It negatively affected ast location set up and produced wrong error messages for heavily-spaced shaders. Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Update lexers in glsl and glcpp to handle end position of token. [Sir Anthony, 2014-03-08, 1 file, -1/+2]
| | | | Reviewed-by: Carl Worth <[email protected]>
* glcpp: Don't enter lexer's NEWLINE_CATCHUP start state for single-line comments [Carl Worth, 2014-01-31, 1 file, -2/+0]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 6005e9cb28 a new start state of NEWLINE_CATCHUP was added to the lexer. This start state is used whenever the lexer is emitting a NEWLINE token to emit additional NEWLINE tokens for any newline characters that were skipped by an immediately preceding multi-line comment. However, that commit erroneously entered the NEWLINE_CATCHUP state for single-line comments. This is not desired since in the case of a single-line comment, the lexer is not emitting any NEWLINE token. The result is that the lexer will remain in the NEWLINE_CATCHUP state and proceed to fail to emit a NEWLINE token for the subsequent newline character, (since the case to match \n expects only the INITIAL start state). The fix is quite simple, remove the "BEGIN NEWLINE_CATCHUP" code from the single-line comment case, (preserving it only in exactly the cases where the lexer is actually emitting a NEWLINE token). Many thanks to Petri Latvala for reporting this bug and for providing the minimal test case to exercise it. The bug showed up only with a multi-line comment which was followed immediately by a single-line comment (without any intervening newline), such as: /* */ // Kablam! Since 6005e9cb28, and before this commit, that very innocent-looking combination of comments would yield a parse failure in the compiler. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72686 Reviewed-by: Jordan Justen <[email protected]>
* glcpp: Replace multi-line comment with a space (even as part of macro definition) [Carl Worth, 2014-01-02, 1 file, -5/+33]
The preprocessor has always replaced multi-line comments with a single space character, (as required by the specification), but as of commit bd55ba568b301d0f764cd1ca015e84e1ae932c8b the lexer also emitted a NEWLINE token for each newline within the comment, (in order to preserve line numbers). The emitting of NEWLINE tokens within the comment broke the rule of "replace a multi-line comment with a single space" as could be exposed by code like the following:
    #define FOO a/*
    */b
    FOO
Prior to commit bd55ba568b301d0f764cd1ca015e84e1ae932c8b, this code defined the macro FOO as "a b" as desired. Since that commit, this code instead defines FOO as "a" and leaves a stray "b" in the output. In this commit, we fix this by not emitting the NEWLINE tokens while lexing the comment, but instead merely counting them in the commented_newlines variable. Then, when the lexer next encounters a non-commented newline it switches to a NEWLINE_CATCHUP state to emit as many NEWLINE tokens as necessary (so that subsequent parsing stages still generate correct line numbers). Of course, it would have been more clear if we could have written a loop to emit all the newlines, but flex conventions prevent that, (we must use "return" for each token we emit). It similarly would have been clear to have a new rule restricted to the <NEWLINE_CATCHUP> state with an action much like the body of this if condition. The problem with that is that this rule must not consume any characters. It might be possible to write a rule that matches a single lookahead of any character, but then we would also need an additional rule to ensure for the <EOF> case where there are no additional characters available for the lookahead to match.
Given those considerations, and given that the SKIP-state manipulation already involves a code block at the top of the lexer function, before any rules, it seems best to me to go with the implementation here which adds a similar pre-rule code block for the NEWLINE_CATCHUP. Finally, this commit also changes the expected output of a few, existing glcpp tests. The change here is that the space character resulting from the multi-line comment is now emitted before the newlines corresponding to that comment. (Previously, the newlines were emitted first, and the space character afterward.) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72686 Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
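The deferred-newline idea in this commit can be sketched at the character level. The function below is hypothetical: it rewrites text rather than emitting tokens, and it uses an ordinary loop for the catch-up step, which the commit notes is not possible inside a flex lexer that must return once per token. Only the `commented_newlines` counter is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: copies src to out, replacing each comment with
 * a single space. Newlines inside a comment are only counted; they are
 * written out right after the next non-commented newline so the line
 * count is preserved. out needs room for strlen(src) + 1 bytes.
 * Returns the number of characters written (excluding the NUL). */
size_t strip_comments_defer_newlines(const char *src, char *out)
{
    size_t commented_newlines = 0;
    char *start = out;

    assert(src != NULL && out != NULL);
    while (*src != '\0') {
        if (src[0] == '/' && src[1] == '*') {
            src += 2;
            while (*src != '\0' && !(src[0] == '*' && src[1] == '/')) {
                if (*src == '\n')
                    commented_newlines++;
                src++;
            }
            if (*src != '\0')
                src += 2; /* skip the closing delimiter */
            *out++ = ' ';  /* the whole comment becomes one space */
        } else if (*src == '\n') {
            *out++ = *src++;
            /* Catch up on the newlines swallowed by earlier comments. */
            for (; commented_newlines > 0; commented_newlines--)
                *out++ = '\n';
        } else {
            *out++ = *src++;
        }
    }
    *out = '\0';
    return (size_t)(out - start);
}
```

For the input "a/*", newline, "*/b", newline, the sketch writes "a b" followed by two newlines: the space lands between "a" and "b" on the same line, and the extra newline follows afterwards, matching the ordering change in the expected test output described above.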
* glcpp: Add a more descriptive comment for the SKIP state manipulation | Carl Worth | 2014-01-02 | 1 | -5/+36

Two things make this code confusing:

1. The uncharacteristic manipulation of lexer start state outside of flex rules.
2. The confusing semantics of the skip_stack (including the "lexing_if" override and the SKIP_NO_SKIP state).

This new comment is intended to bring a bit more clarity for any readers.

There is no intended behavioral change to the code here. The actual code changes include better indentation to avoid an excessively long line, and using the more descriptive INITIAL rather than 0.

Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
* glcpp: Reject token pasting operator in GLES | Matt Turner | 2013-01-11 | 1 | -0/+2

The GLSL ES 3.0 spec (Section 12.17) says:

    "GLSL ES 1.00 removed token pasting and other functionality."

NOTE: This is a candidate for the stable branches.

Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Carl Worth <[email protected]>
* glcpp: Support #elif(expression) with no intervening space. | Matt Turner | 2012-11-28 | 1 | -1/+1

And add test cases to ensure that this works:

- 110 verifies that glcpp rejects #elif<digits>, which glcpp previously accepted.
- 111 verifies that glcpp accepts #if followed immediately by (, +, -, !, or ~.
- 112 does the same as 111 but for #elif.

See 17f9beb6 for #if change.

Reviewed-by: Carl Worth <[email protected]>
* glcpp: Reject #version and #line not followed by whitespace | Matt Turner | 2012-11-28 | 1 | -2/+2

Fixes part of es3conform's preprocess16_frag test.

Reviewed-by: Carl Worth <[email protected]>
* glcpp: Don't use infinite lookhead for #define differentiation. | Kenneth Graunke | 2012-10-25 | 1 | -6/+14

Previously, we used lookahead patterns to differentiate:

    #define FOO(x)    function macro
    #define FOO (x)   object macro

Unfortunately, our rule for function macros:

    {HASH}define{HSPACE}+/{IDENTIFIER}"("

relies on infinite lookahead, and apparently triggers a Flex bug where the generated code overflows a state buffer (see YY_STATE_BUF_SIZE).

There's no need to use infinite lookahead. We can simply change state, match the identifier, and use a single character lookahead for the '('. This apparently makes Flex not generate the giant state array, which avoids the buffer overflow, and should be more efficient anyway.

Fixes piglit test 17000-consecutive-chars-identifier.frag.

NOTE: This is a candidate for every release branch ever.

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Carl Worth <[email protected]>
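The distinction this commit relies on can be illustrated outside flex. The following C sketch is an assumption-laden illustration, not the lexer's code: `classify_define` and the `macro_kind` enum are hypothetical names. It scans the identifier that follows `#define` and then inspects exactly one further character, which is the single-character lookahead the message describes.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
enum macro_kind { MACRO_INVALID, MACRO_OBJECT, MACRO_FUNCTION };

/*
 * Classify the text that follows the "#define" keyword.
 * A '(' immediately after the identifier means a function-like macro;
 * anything else (including whitespace before '(') means object-like.
 */
enum macro_kind classify_define(const char *after_define)
{
    const char *p = after_define;

    /* Skip horizontal whitespace between "#define" and the name. */
    while (*p == ' ' || *p == '\t')
        p++;

    /* The name must start with a letter or underscore. */
    if (!(isalpha((unsigned char)*p) || *p == '_'))
        return MACRO_INVALID;

    /* Consume the identifier itself. */
    while (isalnum((unsigned char)*p) || *p == '_')
        p++;

    /* Single-character lookahead decides the macro kind. */
    return (*p == '(') ? MACRO_FUNCTION : MACRO_OBJECT;
}
```

With this sketch, `classify_define(" FOO(x) x")` reports a function-like macro, while `classify_define(" FOO (x)")` reports an object-like one. The cost of the decision is independent of the identifier's length beyond the scan itself, which is the property that matters for very long identifiers.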
* glsl: glcpp: Move handling of #line directives from lexer to parser. | Carl Worth | 2012-06-26 | 1 | -35/+14

The GLSL specification requires that #line directives be interpreted after macro expansion. Our existing implementation of #line macros in the lexer prevents conformance on this point.

Moving the handling of #line from the lexer to the parser gives us the macro expansion we need. An additional benefit is that the preprocessor also now supports comments on the same line as #line directives.

Finally, the preprocessor now emits the (fully-macro-expanded) #line directives into the output. This allows the full GLSL compiler to also see and interpret these directives so it can also generate correct line numbers in error messages.

Signed-off-by: Carl Worth <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
* glcpp: Fix so that trailing punctuation does not prevent macro expansion | Carl Worth | 2012-02-02 | 1 | -1/+9

The trick here is that flex always chooses the rule that matches the most text. So with an input text of "two:", which we want to be lexed as an IDENTIFIER token "two" followed by an OTHER token ":", the previous OTHER rule would produce a longer match, the single token "two:", which we don't want.

We prevent this by forcing the OTHER pattern to never match any characters that appear in other constructs (no letters, numbers, #, _, whitespace, nor any punctuation that appears in CPP operators).

Fixes bug #44764: GLSL preprocessor doesn't replace defines ending with ":"
https://bugs.freedesktop.org/show_bug.cgi?id=44764

Reviewed-by: Kenneth Graunke <[email protected]>

NOTE: This is a candidate for stable release branches.
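To make the longest-match issue concrete, here is a small C sketch of a scanner in which the OTHER class is restricted as the message describes. It is a sketch under assumptions: `scan_token`, the `token_kind` enum, and in particular the exact set of excluded punctuation in `is_other_char` are chosen for illustration and are not copied from the lexer. Because OTHER can never contain identifier characters, "two:" necessarily splits into an identifier and a separate OTHER token.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch. */
enum token_kind { TK_END, TK_IDENTIFIER, TK_OTHER, TK_CHAR };

static int is_ident_start(int c) { return isalpha(c) || c == '_'; }
static int is_ident_char(int c)  { return isalnum(c) || c == '_'; }

/*
 * OTHER may not contain letters, digits, '#', '_', whitespace, or
 * operator punctuation. The punctuation list here is an assumption
 * made for the sketch.
 */
static int is_other_char(int c)
{
    if (c == '\0' || isalnum(c) || isspace(c))
        return 0;
    return strchr("_#()[]{}.&*+-~!/%<>^|;,=", c) == NULL;
}

/* Scan one token at s; store its kind and return its length. */
size_t scan_token(const char *s, enum token_kind *kind)
{
    size_t n = 0;

    if (s[0] == '\0') {
        *kind = TK_END;
        return 0;
    }
    if (is_ident_start((unsigned char)s[0])) {
        while (is_ident_char((unsigned char)s[n]))
            n++;
        *kind = TK_IDENTIFIER;
        return n;
    }
    if (is_other_char((unsigned char)s[0])) {
        while (is_other_char((unsigned char)s[n]))
            n++;
        *kind = TK_OTHER;
        return n;
    }
    /* Anything else is a one-character token in this sketch. */
    *kind = TK_CHAR;
    return 1;
}
```

Scanning "two:" with this sketch yields an identifier of length 3, and scanning the remaining ":" yields an OTHER token of length 1, so a macro named `two` remains visible to the expansion stage.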
* glsl: Define YY_NO_UNISTD_H on MSVC. | José Fonseca | 2011-03-04 | 1 | -0/+4
* glcpp: Remove trailing contexts from #if rules. | Kenneth Graunke | 2011-03-03 | 1 | -6/+6

These are now unnecessary.
* glcpp: Rework lexer to use a SKIP state rather than REJECT. | Kenneth Graunke | 2011-03-03 | 1 | -21/+16

Previously, the rule deleted by this commit was matched every single time (being the longest match). If not skipping, it used REJECT to continue on to the actual correct rule.

The flex manual advises against using REJECT where possible, as it is one of the most expensive lexer features. So using it on every match seems undesirable. Perhaps more importantly, it made it necessary for the #if directive rules to contain a look-ahead pattern to make them as long as the (now deleted) "skip the whole line" rule.

This patch introduces an exclusive start state, SKIP, to avoid REJECTs. Each time the lexer is called, the code at the top of the rules section will run, implicitly switching the state to the correct one.

Fixes piglit tests 16384-consecutive-chars.frag and 16385-consecutive-chars.frag.
* Convert everything from the talloc API to the ralloc API. | Kenneth Graunke | 2011-01-31 | 1 | -7/+7
* glcpp: Return NEWLINE token for newlines inside multi-line comments. | Kenneth Graunke | 2010-10-21 | 1 | -2/+2

This is necessary for the main compiler to get correct line numbers.
* glcpp: Fix handling of "#line 0" | Carl Worth | 2010-08-23 | 1 | -2/+3

The existing DECIMAL_INTEGER pattern is the correct thing to use when looking for a C decimal integer (that is, a digit-sequence not starting with 0, which would instead be an octal integer). But for #line, we really want to accept any digit sequence (including "0"), and always interpret it as a decimal constant. So we add a new DIGITS pattern for this case.

This should fix the compilation failure noted in bug #28138
https://bugs.freedesktop.org/show_bug.cgi?id=28138

(Though the generated file will not be updated until the next commit.)
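The difference between the two patterns can be shown with two small C predicates. This is a sketch under assumptions: the function names are hypothetical, the patterns are taken to be `[1-9][0-9]*` for a C decimal integer and `[0-9]+` for a plain digit sequence, and any integer suffix handling is ignored.

```c
#include <ctype.h>
#include <stdlib.h>

/* C decimal integer: a nonzero digit followed by any digits. */
int is_decimal_integer(const char *s)
{
    if (s[0] < '1' || s[0] > '9')
        return 0;
    for (s++; *s != '\0'; s++) {
        if (!isdigit((unsigned char)*s))
            return 0;
    }
    return 1;
}

/* Plain digit sequence: one or more digits, leading zeros allowed. */
int is_digits(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        if (!isdigit((unsigned char)*s))
            return 0;
    }
    return 1;
}

/* Interpret a #line argument as decimal; return -1 if it is not digits. */
long line_number(const char *s)
{
    if (!is_digits(s))
        return -1;
    return strtol(s, NULL, 10);  /* base 10 is forced, so "010" is ten */
}
```

Under these definitions "0" fails `is_decimal_integer` but passes `is_digits`, which is why a `#line 0` directive needs the second pattern, and "010" is read as ten rather than as octal eight.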
* glcpp: Fix source numbers set with "#line LINE_NUMBER SOURCE_NUMBER" | Carl Worth | 2010-08-23 | 1 | -2/+7

Previously, the YY_USER_ACTION was overwriting the yylloc->source value in every action (after that value had been carefully set by the handling of the #line directive). Instead, we want to initialize it once in YY_USER_INIT and then not touch it at all in YY_USER_ACTION.
* glcpp: Add basic #line support (adapted from the main compiler). | Kenneth Graunke | 2010-08-18 | 1 | -0/+31