Most flex programs are quite ambiguous, with multiple patterns that can match the same input. Flex resolves the ambiguity with two simple rules:
- Match the longest possible string every time the scanner matches input.
- In the case of a tie, use the pattern that appears first in the program.
These turn out to do the right thing in the vast majority of cases. Consider this snippet
from a scanner for C source code:
"+"
For the first three patterns, the string += is matched as one token, since += is longer than +. For the last three patterns, so long as the patterns for keywords precede the pattern that matches an identifier, the scanner will match keywords correctly.
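
The two rules can be sketched in plain C as a small hand-written matcher. This is only an illustration of the disambiguation behavior, not the code flex generates (flex builds a state machine rather than trying each rule in turn). The token names, the rule table, and the function next_token are all hypothetical, chosen to mirror the kind of patterns described above: three operator patterns, two keywords, and an identifier pattern listed last.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; the names are assumptions for illustration. */
enum token { T_NONE, T_ADD, T_ASSIGN, T_ADDASSIGN, T_IF, T_ELSE, T_IDENT };

/* One rule: a literal string, or NULL to stand for the identifier
 * pattern [a-zA-Z_][a-zA-Z0-9_]*. */
struct rule { const char *literal; enum token tok; };

/* Rules in program order. Earlier rules win ties, so the keywords
 * are listed before the identifier pattern. */
static const struct rule rules[] = {
    { "+",    T_ADD },
    { "=",    T_ASSIGN },
    { "+=",   T_ADDASSIGN },
    { "if",   T_IF },
    { "else", T_ELSE },
    { NULL,   T_IDENT },
};

/* Length of the identifier at the start of s, or 0 if there is none. */
static size_t ident_len(const char *s)
{
    size_t n = 0;
    if (isalpha((unsigned char)s[0]) || s[0] == '_') {
        n = 1;
        while (isalnum((unsigned char)s[n]) || s[n] == '_')
            n++;
    }
    return n;
}

/* Number of characters one rule matches at the start of s (0 = no match). */
static size_t rule_match(const struct rule *r, const char *s)
{
    if (r->literal == NULL)
        return ident_len(s);
    size_t n = strlen(r->literal);
    return strncmp(s, r->literal, n) == 0 ? n : 0;
}

/* Pick the token at the start of s and store the match length in *len.
 * Rule 1: the longest match wins.
 * Rule 2: on a tie, the rule that appears first wins. */
enum token next_token(const char *s, size_t *len)
{
    enum token best = T_NONE;
    *len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rule_match(&rules[i], s);
        if (n > *len) {   /* strictly longer only, so ties keep the earlier rule */
            *len = n;
            best = rules[i].tok;
        }
    }
    return best;
}
```

With this table, next_token applied to the input += selects T_ADDASSIGN because that match is longer than the one for +. The input if matches both the keyword rule and the identifier rule with the same length, so the earlier keyword rule wins, while iffy is longer as an identifier and is returned as T_IDENT. Moving the identifier rule above the keyword rules would make the keywords unreachable, which is the ordering concern described above.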