%top{ /* * Author : Guo Shaoguang * Email : sgguo@shao.ac.cn * Date : 201010 * Version : v1.0 * Name : heal the world * * Copyright (c) 2010-2016, SHAO * All rights reserved.
The patterns at the heart of every flex scanner use a rich regular expression language. A regular expression is a pattern description using a metalanguage, a language that you use to describe what you want the pattern to match. Flex’s regular expression language is essentially POSIX-extended regular expressions (which is not surprising considering their shared Unix heritage). The metalanguage uses standard text characters, some of which represent themselves and others of which represent patterns. All characters other than the ones listed below, including all letters and digits, match themselves. The characters with special meaning in regular expressions are:
. Matches any single character except the newline character ( \n ).
[] A character class that matches any character within the brackets. If the first char- acter is a circumflex ( ^ ), it changes the meaning to match any character except the ones within the brackets. A dash inside the square brackets indicates a character range; for example, [0-9] means the same thing as [0123456789] and [a-z] means any lowercase letter. A - or ] as the first character after the [ is interpreted literally to let you include dashes and square brackets in character classes. POSIX intro- duced other special square bracket constructs that are useful when handling non- English alphabets, described later in this chapter. Other metacharacters do not have any special meaning within square brackets except that C escape sequences starting with \ are recognized. Character ranges are interpreted relative to the character coding in use, so the range [A-z] with ASCII character coding would match all uppercase and lowercase letters, as well as six punctuation characters whose codes fall between the code for Z and the code for a . In practice, useful ranges are ranges of digits, of uppercase letters, or of lowercase letters.
[a-z]{-}[jv] A differenced character class, with the characters in the first class omitting the characters in the second class (only in recent versions of flex).
^ Matches the beginning of a line as the first character of a regular expression. Also used for negation within square brackets.
$ Matches the end of a line as the last character of a regular expression.
{} If the braces contain one or two numbers, indicate the minimum and maximum number of times the previous pattern can match. For example, A{1,3} matches one to three occurrences of the letter A, and 0{5} matches 00000. If the braces contain a name, they refer to a named pattern by that name.
\ Used to escape metacharacters and as part of the usual C escape sequences; for example, \n is a newline character, while * is a literal asterisk.
Matches zero or more copies of the preceding expression. For example, [ \t]* is a common pattern to match optional spaces and tabs, that is, whitespace, which matches “ ”, “ ”, or an empty string.
Matches one or more occurrences of the preceding regular expression. For example, [0-9]+ matches strings of digits such as 1 , 111 , or 123456 but not an empty string.
? Matches zero or one occurrence of the preceding regular expression. For example, -?[0-9]+ matches a signed number including an optional leading minus sign.
| The alternation operator; matches either the preceding regular expression or the following regular expression. For example, faith|hope|charity matches any of the three virtues.
“…” Anything within the quotation marks is treated literally. Metacharacters other than C escape sequences lose their meaning. As a matter of style, it’s good practice to quote any punctuation characters intended to be matched literally.
() Groups a series of regular expressions together into a new regular expression. For example, (01) matches the character sequence 01, and a(bc|de) matches abc or ade. Parentheses are useful when building up complex patterns with *, +, ?, and |.
/ Trailing context, which means to match the regular expression preceding the slash but only if followed by the regular expression after the slash. For example, 0/1 matches 0 in the string 01 but would not match anything in the string 0 or 02 . The material matched by the pattern following the slash is not “consumed” and remains to be turned into subsequent tokens. Only one slash is permitted per pattern. The repetition operators affect the smallest preceding expression, so abc+ matches ab followed by one or more c’s. Use parentheses freely to be sure your expressions match what you want, such as (abc)+ to match one or more repetitions of abc.
For the first three patterns, the string += is matched as one token, since += is longer than + . For the last three patterns, so long as the patterns for keywords precede the pattern that matches an identifier, the scanner will match keywords correctly.
$ git log --pretty=oneline ca82a6dff817ec66f44342007202690a93763949 changed the version number 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7 removed unnecessary test a11bef06a3f659402fe7563abf99ad00de2209e6 first commit
最有意思的是 format ,可以定制记录的显示格式。 这样的输出对后期提取分析格外有用——因为你知道输出的格式不会随着 Git 的更新而发生改变:
1 2 3 4
$ git log --pretty=format:"%h - %an, %ar : %s" ca82a6d - Scott Chacon, 6 years ago : changed the version number 085bb3b - Scott Chacon, 6 years ago : removed unnecessary test a11bef0 - Scott Chacon, 6 years ago : first commit
$ git log --pretty=format:"%h %s" --graph * 2d3acf9 ignore errors from SIGCHLD on trap * 5e3ee11 Merge branch 'master' of git://github.com/dustin/grit |\ | * 420eac9 Added a method for getting the current branch. * | 30e367c timeout code and tests * | 5a09431 add timeout protection to grit * | e1193f8 support for heads with slashes in them |/ * d6016bc require time for xmlschema * 11d191e Merge branch 'defunkt' into local
$ git log --pretty="%h - %s" --author='Junio C Hamano' --since="2008-10-01" \ --before="2008-11-01" --no-merges -- t/ 5610e3b - Fix testcase failure when extended attributes are in use acd3b9e - Enhance hold_lock_file_for_{update,append}() API f563754 - demonstrate breakage of detached checkout with symbolic link HEAD d1a43f2 - reset --hard/read-tree --reset -u: remove unmerged new paths 51a94af - Fix "checkout --track -b newbranch" on detached HEAD b0ad11e - pull: allow "git pull origin $something:$current_branch" into an unborn branch
$ git log --oneline --decorate f30ab (HEAD -> master, testing) add feature #32 - ability to add new formats to the central interface 34ac2 Fixed bug #1328 - stack overflow under certain conditions 98ca9 The initial commit of my project
$ git log --oneline --decorate --graph --all * c2b9e (HEAD, master) made other changes | * 87ab2 (testing) made a change |/ * f30ab add feature #32 - ability to add new formats to the * 34ac2 fixed bug #1328 - stack overflow under certain conditions * 98ca9 initial commit of my project
This repo has a lot of files which have been renamed over the years. To see the full history of a file through all of the renames, add --follow to the git log command:
1
$ git log -p --follow README.md
The above log also illustrates a cosmetic problem we have with file history thanks to the svn-to-git conversion in 2020: some svn branch creation events have caused a large number of spurious file deletions in individual file histories