|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The glcpp parser is line-based, so it needs to see a NEWLINE token at the end
of each line. This causes a trick for files that end without a final newline.
Previously, the lexer for glcpp punted in this case by unconditionally
returning a NEWLINE token at end-of-file, (causing most files to have an extra
blank line at the end). Here, we refine this by lexing end-of-file as a
NEWLINE token only if the immediately preceding token was not a NEWLINE token.
The patch is a minor change that only looks huge for two reasons:
1. Almost all glcpp test result ".expected" files are updated to drop
the extra newline.
2. All return statements from the lexer are adjusted to use a new
RETURN_TOKEN macro that tracks the last-token-was-a-newline state.
Reviewed-by: Jordan Justen <[email protected]>
|
|
definition)
The preprocessor has always replaced multi-line comments with a single space
character, (as required by the specification), but as of commit
bd55ba568b301d0f764cd1ca015e84e1ae932c8b the lexer also emitted a NEWLINE
token for each newline within the comment, (in order to preserve line
numbers).
The emitting of NEWLINE tokens within the comment broke the rule of "replace a
multi-line comment with a single space" as could be exposed by code like the
following:
#define FOO a/*
*/b
FOO
Prior to commit bd55ba568b301d0f764cd1ca015e84e1ae932c8b, this code defined
the macro FOO as "a b" as desired. Since that commit, this code instead
defines FOO as "a" and leaves a stray "b" in the output.
In this commit, we fix this by not emitting the NEWLINE tokens while lexing
the comment, but instead merely counting them in the commented_newlines
variable. Then, when the lexer next encounters a non-commented newline it
switches to a NEWLINE_CATCHUP state to emit as many NEWLINE tokens as
necessary (so that subsequent parsing stages still generate correct line
numbers).
Of course, it would have been more clear if we could have written a loop to
emit all the newlines, but flex conventions prevent that, (we must use
"return" for each token we emit).
It similarly would have been clear to have a new rule restricted to the
<NEWLINE_CATCHUP> state with an action much like the body of this if
condition. The problem with that is that this rule must not consume any
characters. It might be possible to write a rule that matches a single
lookahead of any character, but then we would also need an additional rule to
ensure for the <EOF> case where there are no additional characters available
for the lookahead to match.
Given those considerations, and given that the SKIP-state manipulation already
involves a code block at the top of the lexer function, before any rules, it
seems best to me to go with the implementation here which adds a similar
pre-rule code block for the NEWLINE_CATCHUP.
Finally, this commit also changes the expected output of a few, existing glcpp
tests. The change here is that the space character resulting from the
multi-line comment is now emitted before the newlines corresponding to that
comment. (Previously, the newlines were emitted first, and the space character
afterward.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72686
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|