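Using Sphinx for Python documentation

Posted on 2017-01-23 | Updated on 2025-06-19 | In Python, Sphinx

Introduction to reStructuredText

This section briefly introduces reStructuredText (reST). It aims to give document authors enough information to write documents productively. reST is designed to be a simple, unobtrusive markup language, so learning it does not take long.

Paragraphs

The paragraph is the most basic block in a reST document. Paragraphs are simply chunks of text separated by one or more blank lines. As in Python, indentation is significant in reST, so all lines of the same paragraph must be left-aligned to the same level of indentation.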

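Inline markup

The standard reST inline markup is quite simple:

one asterisk: *text* for emphasis (italics),
two asterisks: **text** for strong emphasis (boldface),
backquotes: ``text`` for code samples.

If asterisks or backquotes appear in running text and could be confused with inline markup delimiters, escape them with a backslash.

Be aware of some restrictions of this markup:

it may not be nested,
the content may not start or end with whitespace: * text* is wrong,
it must be separated from the surrounding text by non-word characters; to work around this, use a backslash-escaped space: thisis\ *one*\ word.

These restrictions may be lifted in future versions of docutils.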
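The escaping rules above are mechanical enough to script when reST is generated from other text. Below is a minimal Python sketch. The helper names rst_escape and glue_emphasis are hypothetical and are not part of docutils or Sphinx. The set of escaped characters is an assumption: it covers the delimiters discussed here, plus the underscore used by hyperlink references.

```python
def rst_escape(text: str) -> str:
    """Backslash-escape characters that reST could read as inline markup.

    Hypothetical helper: the backslash itself is escaped first so that the
    backslashes added for *, ` and _ are not doubled afterwards.
    """
    for ch in ("\\", "*", "`", "_"):
        text = text.replace(ch, "\\" + ch)
    return text


def glue_emphasis(before: str, word: str, after: str) -> str:
    """Emphasize part of a word with backslash-escaped spaces (thisis\\ *one*\\ word)."""
    return f"{before}\\ *{word}*\\ {after}"


if __name__ == "__main__":
    print(rst_escape("2*3 and `x`"))                # 2\*3 and \`x\`
    print(glue_emphasis("thisis", "one", "word"))   # thisis\ *one*\ word
```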

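reST also allows custom "interpreted text roles", which signify that the enclosed text should be interpreted in a specific way. Sphinx uses this to provide semantic markup and cross-referencing of identifiers; the syntax is :rolename:`content`.

Standard reST provides the following roles:

emphasis: equivalent to *emphasis*
strong: equivalent to **strong**
literal: equivalent to ``literal``
subscript: subscript text
superscript: superscript text
title-reference: for titles of books, periodicals, and other materials

See Inline markup for details.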

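Lists and quotes

List markup is natural: just place an asterisk at the start of a paragraph and indent the continuation lines accordingly. The same goes for numbered lists; they can also be numbered automatically with a # sign:

* This is a bulleted list.
* It has two items, the second
  item uses two lines.

1. This is a numbered list.
2. It has two items too.

#. This is a numbered list.
#. It has two items too.

Nested lists are possible, but they must be separated from the parent list items by blank lines:

* this is
* a list

  * with a nested list
  * and some subitems

* and here the parent list continues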
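When lists are generated from data, the blank-line rule for nested lists is easy to get wrong. Here is a minimal Python sketch. render_bullets is a hypothetical helper, and it assumes plain one-line items. A Python list inside the input is treated as the sub-list of the preceding item.

```python
def render_bullets(items, indent=0):
    """Render nested Python lists as a reST bullet list.

    A list inside `items` becomes the sub-list of the preceding item and is
    surrounded by blank lines, as reST requires for nested lists.
    """
    pad = " " * indent
    lines = []
    for item in items:
        if isinstance(item, list):
            lines.append("")  # blank line before the nested list
            # Indent by 2 so the sub-bullets line up with the parent's text.
            lines.extend(render_bullets(item, indent + 2).splitlines())
            lines.append("")  # blank line before the parent list continues
        else:
            lines.append(f"{pad}* {item}")
    # Drop a trailing (or leading) blank line left by a nested list at the edge.
    return "\n".join(lines).strip("\n")


if __name__ == "__main__":
    print(render_bullets(["this is", "a list",
                          ["with a nested list", "and some subitems"],
                          "and here the parent list continues"]))
```

Called as in the __main__ block, it reproduces the nested-list example above line for line.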

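Definition lists are written like this:

term (up to a line of text)
   Definition of the term, which must be indented

   and can even consist of multiple paragraphs

next term
   Description.

Note that a term cannot span more than one line of text.

Quoted paragraphs are created by just indenting them more than the surrounding paragraphs.

Line blocks are a way of preserving line breaks:

| These lines are
| broken exactly like in
| the source file.

There are also several more special blocks available:

field lists
option lists
quoted literal blocks
doctest blocks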

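Source code

Literal code blocks are introduced by ending a paragraph with the special marker ::. The literal block must be indented (and, like all paragraphs, separated from the surrounding ones by blank lines):

This is a normal text paragraph. The next paragraph is a code sample::

   It needs no special handling;
   indenting it is enough.

   It can span multiple lines.

This is a normal text paragraph again.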


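Sphinx's code-block directive adds options such as line numbers and highlighted lines. In this example, :emphasize-lines: highlights lines 3 and 6 of the snippet. Note that directive options cannot carry trailing comments:

.. code-block:: c
    :linenos:
    :emphasize-lines: 3,6

    void foo()
    {
        int i;

        for(i=0; i<10; i++)
            printf("i: %d\n", i);
    }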

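The handling of the :: marker is smart:

If it occurs as a paragraph of its own, that paragraph is left out of the document completely.
If it is preceded by whitespace, the marker is removed.
If it is preceded by non-whitespace, the marker is replaced by a single colon.

That way, the first paragraph of the literal-block example above is rendered as "The next paragraph is a code sample:".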

Tables

Two forms of tables are supported. For grid tables, you draw the cell borders yourself. They look like this:

+------------------------+------------+----------+----------+
| Header row, column 1   | Header 2   | Header 3 | Header 4 |
| (header rows optional) |            |          |          |
+========================+============+==========+==========+
| body row 1, column 1   | column 2   | column 3 | column 4 |
+------------------------+------------+----------+----------+
| body row 2             | ...        | ...      |          |
+------------------------+------------+----------+----------+

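Simple tables are easier to write, but limited: they must contain more than one row, and the cells in the first column cannot span multiple lines. They look like this: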

=====  =====  =======
A      B      A and B
=====  =====  =======
False  False  False
True   False  False
False  True   False
True   True   True
=====  =====  =======
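Simple tables are easy to generate from data. Here is a minimal Python sketch. simple_table is a hypothetical helper. It assumes single-line cells whose display width equals their character count, so as written it is not suited to CJK text. Like the example above, it separates columns with two spaces.

```python
def simple_table(header, rows):
    """Render a header and rows as a reST simple table (hypothetical helper)."""
    table = [list(map(str, header))] + [list(map(str, r)) for r in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    border = "  ".join("=" * w for w in widths)

    def fmt(row):
        # Left-justify each cell; trailing spaces after the last column are dropped.
        return "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()

    body = [fmt(r) for r in table[1:]]
    return "\n".join([border, fmt(table[0]), border, *body, border])


if __name__ == "__main__":
    rows = [(a, b, a and b) for b in (False, True) for a in (False, True)]
    print(simple_table(["A", "B", "A and B"], rows))
```

The __main__ block reproduces the truth table above line for line.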

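Hyperlinks

External links

Use `Link text <http://example.com/>`_ for inline web links. If the link text should be the web address itself, no special markup is needed: the parser finds links and mail addresses in ordinary text.

You can also separate the link and the target definition, like this:

This is a paragraph that contains `a link`_.

.. _a link: http://example.com/

Internal links

Internal linking is done via a special reST role provided by Sphinx; see the section on cross-referencing arbitrary locations.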

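Sections

Section headers are created by underlining (and optionally overlining) the section title with a punctuation character. The line of characters must be at least as long as the text:

=================
This is a heading
=================

Normally, no heading levels are assigned to particular characters, because the structure is determined from the succession of headings. For the Python documentation, however, this convention is used:

# with overline, for parts
* with overline, for chapters
=, for sections
-, for subsections
^, for subsubsections
", for paragraphs

Of course, you are free to use your own marker characters (see the reST documentation) and a deeper nesting level. Keep in mind, however, that most target formats (HTML, LaTeX) support only a limited nesting depth.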
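Because the adornment line must be at least as long as the title, it is convenient to generate headings rather than count characters by hand. Here is a minimal Python sketch using the Python-documentation convention listed above. The heading function and LEVELS table are hypothetical. Counting East Asian wide characters as two columns is an assumption meant to match how docutils measures a title's width, which matters for Chinese headings.

```python
import unicodedata

# Python-docs convention from the list above: level -> (character, use overline?)
LEVELS = {
    "part": ("#", True),
    "chapter": ("*", True),
    "section": ("=", False),
    "subsection": ("-", False),
    "subsubsection": ("^", False),
    "paragraph": ('"', False),
}


def display_width(text: str) -> int:
    """Count columns, treating wide/fullwidth characters (e.g. CJK) as two."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)


def heading(title: str, level: str = "section") -> str:
    char, overline = LEVELS[level]
    rule = char * display_width(title)
    return "\n".join([rule, title, rule] if overline else [title, rule])


if __name__ == "__main__":
    print(heading("This is a heading", "part"))
    print(heading("段落"))  # two CJK characters get a four-column underline
```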

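Explicit markup

Explicit markup is used for most reST constructs that need special handling, such as footnotes, specially highlighted paragraphs, comments, and generic directives.

An explicit markup block begins with a line starting with .. followed by whitespace. It is terminated by the next paragraph at the same level of indentation. (There needs to be a blank line between explicit markup and normal paragraphs. This may sound a bit complicated, but it is intuitive enough when you write it.)

Directives

A directive is a generic block of explicit markup. It is one of reST's extension mechanisms, and Sphinx makes heavy use of it.

Docutils supports the following directives:

Admonitions:
- caution
- danger
- error
- hint
- important
- note
- tip
- warning
- and the generic admonition. (Most themes style only "note" and "warning" specially.)
Images:
- image (see also Images below)
- figure (an image with caption and optional legend)
Additional body elements:
- contents (a local table of contents, for the current file only)
- container (a container with a custom class, useful to generate an outer <div> in HTML)
- rubric (a heading without relation to the document sectioning)
- topic, sidebar (special highlighted body elements)
- parsed-literal (a literal block that supports inline markup)
- epigraph (a block quote with optional attribution line)
- highlights, pull-quote (block quotes with their own class attribute)
- compound (a compound paragraph)
Special tables:
- table (a table with title)
- csv-table (a table generated from comma-separated values)
- list-table (a table generated from a list of lists)
Special directives:
- raw (include raw target-format markup)
- include (include reStructuredText from another file): in Sphinx, an absolute include path is taken as relative to the source directory
- class (assign a class attribute to the next element) [1]
HTML specifics:
- meta (generation of HTML <meta> tags)
- title (override the document title)
Influencing markup:
- default-role (set a new default role)
- role (create a new role)
These two take effect only in the current file. To set the default role for the whole project, it is better to use the default_role configuration value.

Do not use the directives sectnum, header and footer.

Directives added by Sphinx are described in Sphinx Markup Constructs.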

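Directives consist of a name, arguments, options, and content. (Keep this terminology in mind; it is used in the next section on custom directives.) Look at this example:

.. function:: foo(x)
              foo(y, z)
   :module: some.module.name

   Return a line of text input from the user.

function is the directive name. It is given two arguments here: the remainder of the first line, and the second line. It also has one option, module. As you can see, options are given in the lines immediately following the arguments and are marked by colons. Options must be indented to the same level as the directive content.

The directive content follows after a blank line and is indented relative to the directive start.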
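The Python sketch below makes the name/arguments/options/content split concrete by taking the example above apart into those four pieces. It is deliberately simplified, and parse_directive is hypothetical; this is not how docutils parses directives. It assumes that every non-option line before the first blank line is an argument, that options fit on one line, and that everything after that blank line is content.

```python
import textwrap


def parse_directive(block: str):
    """Split a directive into (name, arguments, options, content)."""
    lines = block.splitlines()
    head = lines[0]
    if not head.startswith(".. ") or "::" not in head:
        raise ValueError("not a directive: " + head)
    name, _, first_arg = head[3:].partition("::")
    arguments = [first_arg.strip()] if first_arg.strip() else []
    options = {}
    i = 1
    # Arguments and options: the lines before the first blank line.
    while i < len(lines) and lines[i].strip():
        text = lines[i].strip()
        if text.startswith(":"):
            # ":module: some.module.name" -> options["module"] = "some.module.name"
            key, _, value = text[1:].partition(":")
            options[key] = value.strip()
        else:
            arguments.append(text)  # e.g. the second signature "foo(y, z)"
        i += 1
    # Content: everything after the blank line, with the common indentation removed.
    content = textwrap.dedent("\n".join(lines[i + 1:])).strip("\n")
    return name.strip(), arguments, options, content


EXAMPLE = """\
.. function:: foo(x)
              foo(y, z)
   :module: some.module.name

   Return a line of text input from the user.
"""

if __name__ == "__main__":
    print(parse_directive(EXAMPLE))
```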

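Images

reST supports an image directive, used like so:

.. image:: gnu.png
   (options)

When used within Sphinx, the file name given (here gnu.png) must be either relative to the source file, or absolute. An absolute path is taken relative to the top source directory. For example, the file sketch/spam.rst could refer to the image images/spam.png as ../images/spam.png or /images/spam.png.

When building, Sphinx automatically copies image files into a subdirectory of the output directory (for HTML output, the _images directory).

The image size options (width and height) are interpreted as follows. If the size has no unit, or the unit is pixels, the size is respected only by output formats that support pixels (i.e. not by LaTeX output). Other units (such as pt for points) are used for both HTML and LaTeX output.

Sphinx extends the standard docutils behavior by allowing an asterisk for the extension:

.. image:: gnu.*

Sphinx then searches for all images matching the provided pattern, regardless of their type, and each builder chooses the best candidate. For instance, suppose the file name gnu.* is given and the source tree contains both gnu.pdf and gnu.png. The LaTeX builder then chooses the former, while the HTML builder prefers the latter.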
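One way to picture this selection is as matching the candidates against a per-builder list of preferred MIME types. The Python sketch below is an illustration, not Sphinx's code. pick_image is hypothetical, and the two preference lists are an assumption, loosely modelled on the image types the HTML and LaTeX builders support.

```python
import glob
import mimetypes
import os
import tempfile

# Assumed preference orders (illustrative, not taken from Sphinx's source).
HTML_PREFERENCE = ["image/svg+xml", "image/png", "image/gif", "image/jpeg"]
LATEX_PREFERENCE = ["application/pdf", "image/png", "image/jpeg"]


def pick_image(pattern: str, preference: list):
    """Return the file matching `pattern` whose MIME type ranks highest, or None."""
    candidates = {}
    for path in glob.glob(pattern):
        mime, _ = mimetypes.guess_type(path)
        if mime:
            candidates[mime] = path
    for mime in preference:
        if mime in candidates:
            return candidates[mime]
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("gnu.pdf", "gnu.png"):
            open(os.path.join(d, name), "w").close()
        pattern = os.path.join(d, "gnu.*")
        print(pick_image(pattern, LATEX_PREFERENCE))  # .../gnu.pdf
        print(pick_image(pattern, HTML_PREFERENCE))   # .../gnu.png
```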

Changed in version 0.4: Added support for file names ending in an asterisk.

Changed in version 0.6: Image paths can now be absolute.

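Footnotes

For footnotes, use [#name]_ to mark the footnote location. Add the footnote body at the bottom of the document, after a "Footnotes" rubric heading, like so:

Lorem ipsum [#f1]_ dolor sit amet ... [#f2]_

.. rubric:: Footnotes

.. [#f1] Text of the first footnote.
.. [#f2] Text of the second footnote.

You can also explicitly number the footnotes ([1]_) or use auto-numbered footnotes without names ([#]_).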

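Citations

Standard reST citations are supported, with the additional feature that they are "global": every citation can be referenced from every file. Use them like so:

Lorem ipsum [Ref]_ dolor sit amet.

.. [Ref] Book or article reference, URL or whatever.

Citation usage is similar to footnote usage, but with a label that is not numeric and does not start with #.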

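Substitutions

reST supports "substitutions", which are pieces of text and/or markup referred to in the text by |name|. Like footnotes, they are defined with explicit markup blocks, like this:

.. |name| replace:: replacement text

or this:

.. |caution| image:: warning.png
             :alt: Warning!

See the reST reference for substitutions for details.

To use some substitutions in all documents, you have two options. You can put them into rst_prolog. Alternatively, put them into a separate file and include that file, with the include directive, in every document that uses them. (Give the include file a file name extension that differs from that of the other source files, so that Sphinx does not pick it up as a standalone document.)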
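For the rst_prolog route, a conf.py excerpt might look like the following sketch. Only the replace substitution from above is included. An image substitution placed in the prolog would resolve warning.png relative to each document that uses it, so it is left out here.

```python
# conf.py (excerpt): rst_prolog is prepended to every source file Sphinx reads,
# so substitutions defined here are available in all documents.
rst_prolog = """
.. |name| replace:: replacement text
"""
```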

Sphinx defines some default substitutions; see Substitutions.

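Source encoding

The easiest way to include special characters such as dashes or copyright signs in reST is to write them directly as Unicode characters, so the source encoding matters. Sphinx assumes source files are encoded in UTF-8 by default; you can change this with the source_encoding configuration value.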

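Gotchas

There are some problems one commonly runs into while authoring reST documents:

Separation of inline markup: as said above, inline markup spans must be separated from the surrounding text by non-word characters; use a backslash-escaped space to get around that. See the reference for details.
No nested inline markup: something like *see :func:`foo`* is not possible.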

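Footnotes

[1]	When the default domain contains a class directive, that directive shadows the docutils one. Therefore, Sphinx re-exports the docutils directive as rst-class.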

Be sure to say yes to autodoc

For more information, refer to https://zh-sphinx-doc.readthedocs.io/en/latest/rest.html#id2

Using Sphinx for Python documentation

Install the package

$ pip install Sphinx

Generate the project skeleton

# the defaults are fine for most prompts
$ sphinx-quickstart

The generated files are:

$ ls
_build    _static    Makefile
_templates    conf.py    index.rst

Now look at the index.rst file:

.. toctree::
   :maxdepth: 2
   :caption: Contents:


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

Build the output

# build HTML
$ make html
# build a PDF (needs a LaTeX installation)
$ make latexpdf

Add files

Now add a new document and an image:

$ touch hello.rst
$ cp test.png _static/test.png

Update index.rst; mind the indentation:

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   hello.rst
   
.. image:: _static/test.png

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

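Customize the templates

You can customize the look of the site by editing _static/default.css and _templates/layout.html.

Multi-language support

Install the tool

$ pip3 install sphinx-intl

Add the following to conf.py:

locale_dirs = ['locale/']
gettext_compact = False

Generate the .pot files

$ make gettext

Generate the .po files

$ sphinx-intl update -p _build/gettext -l en

The generated message catalogs now appear in locale/en/LC_MESSAGES.

Build the translated documentation

$ make -e SPHINXOPTS="-D language='en'" html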
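The make command rebuilds every language into the same _build/html directory. To keep each language's output side by side instead, you can call sphinx-build directly, once per language. Below is a minimal Python sketch that only builds and prints the commands. The _build/html/<language> layout and the zh_CN language code are assumptions for illustration.

```python
import shlex


def build_commands(languages, source=".", out_root="_build/html"):
    """One sphinx-build invocation per language (hypothetical output layout)."""
    return [
        ["sphinx-build", "-b", "html", "-D", f"language={lang}", source, f"{out_root}/{lang}"]
        for lang in languages
    ]


if __name__ == "__main__":
    for cmd in build_commands(["en", "zh_CN"]):
        print(shlex.join(cmd))  # run these from the docs directory
```

Running each command, for example with subprocess.run(cmd, check=True), assumes Sphinx is installed and the .po files for that language exist.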

Be sure to say yes to autodoc
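Answering yes to the autodoc prompt in sphinx-quickstart adds the sphinx.ext.autodoc extension to conf.py. If you skipped it, you can enable it by hand with a conf.py excerpt like the one below. The path insertion is an assumption about your layout: it assumes the package sits one directory above the docs folder.

```python
# conf.py (excerpt)
import os
import sys

# Make the project importable so autodoc can import it
# (assumes the package lives in the parent directory of docs/).
sys.path.insert(0, os.path.abspath(".."))

extensions = [
    "sphinx.ext.autodoc",  # pull API documentation from docstrings
]
```

An .rst file can then pull in a module's docstrings with the automodule directive and its :members: option. Here mypackage is only a placeholder name: `.. automodule:: mypackage`.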

For more information, refer to http://www.matplotlib.org/sampledoc

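Python virtual environments with virtualenv

Posted on 2017-01-23 | Updated on 2025-06-19 | In Python

When developing Python applications, different applications often need different environments or package versions. This can keep an application from running, or you may lack the administrator rights needed to install packages.

This is where virtualenv comes in: it creates a separate Python runtime environment for each application.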

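Installation

First, install virtualenv with pip:

$ pip3 install virtualenv

Usage

Step 1: create a directory:

$ mkdir myproject
$ cd myproject/

Step 2: create an isolated Python environment named venv:

$ virtualenv --no-site-packages venv
Using base prefix '/usr/local/.../Python.framework/Versions/3.4'
New python executable in venv/bin/python3.4
Also creating executable in venv/bin/python
Installing setuptools, pip, wheel...done.

The virtualenv command creates an isolated Python environment. The --no-site-packages option hides the third-party packages already installed in the system Python, giving us a "clean" Python environment without any third-party packages. (This isolation has been virtualenv's default since version 1.7. Recent releases no longer accept the option; there, plain virtualenv venv does the same.)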
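Python 3.3 and later also ship a built-in venv module that produces the same kind of isolated environment. As a point of comparison, here is a minimal Python sketch; note that it uses venv, not virtualenv. create_clean_env is a hypothetical helper, and with_pip=False is chosen only to keep the example fast and offline.

```python
import os
import tempfile
import venv


def create_clean_env(path: str) -> str:
    """Create an isolated environment at `path` and return its pyvenv.cfg text."""
    # system_site_packages=False hides the system's third-party packages,
    # the same isolation --no-site-packages asked for.
    venv.create(path, system_site_packages=False, with_pip=False)
    with open(os.path.join(path, "pyvenv.cfg")) as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(create_clean_env(os.path.join(d, "venv")))
```

From the shell, python3 -m venv venv creates the same kind of environment.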

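The new environment is placed in the venv directory under the current directory. To enter it, source its activation script:

Step 3: activate the virtual environment

$ source venv/bin/activate
(venv)$

Notice that the prompt has changed: the (venv) prefix indicates that the current environment is the Python environment named venv.

Step 4: install packages

Now install third-party packages as usual and run python:

(venv)$ pip install jinja2
...
Successfully installed jinja2-2.7.3 markupsafe-0.23
(venv)$ python myapp.py
...

Inside the venv environment, packages installed with pip go into venv only, and the system Python environment is unaffected. In other words, venv is dedicated to the myproject application.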
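A script can check whether it is running inside such an environment. Here is a minimal Python sketch. It relies on two assumptions: sys.base_prefix differs from sys.prefix inside venv and virtualenv 20+ environments, and older virtualenv releases set sys.real_prefix. Treat it as a heuristic; in_virtualenv is a hypothetical name.

```python
import sys


def in_virtualenv() -> bool:
    """Heuristic: are we running inside a virtual environment?"""
    # Legacy virtualenv (< 20) sets sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and modern virtualenv leave sys.base_prefix pointing at the base Python.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtualenv())
```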

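Step 5: leave the virtual environment

To leave the current venv environment, use the deactivate command:

(venv)$ deactivate
$

You are now back in the normal environment, and pip and python run against the system Python again.

Specifying the Python version

If several Python interpreters are installed, you can choose which one the environment uses (for example python2.7). If you do not specify one, the system's default interpreter is used. The following command builds the environment with python2.7:

$ virtualenv -p /usr/bin/python2.7 my_project_env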
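The stdlib venv module offers the same choice: run the desired interpreter with -m venv. Below is a minimal Python sketch. It assumes the chosen interpreter is Python 3.3 or newer; Python 2.7 has no venv module, so for it you still need virtualenv -p as shown above. create_env_with is a hypothetical helper, and --without-pip only keeps the example fast.

```python
import os
import subprocess
import sys
import tempfile


def create_env_with(interpreter: str, path: str) -> None:
    """Build a virtual environment at `path` using the given Python interpreter."""
    # Equivalent to running:  <interpreter> -m venv --without-pip <path>
    subprocess.run([interpreter, "-m", "venv", "--without-pip", path], check=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "my_project_env")
        # sys.executable stands in for a specific interpreter path here.
        create_env_with(sys.executable, target)
        print(sorted(os.listdir(target)))
```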

References

https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/001432712108300322c61f256c74803b43bfd65c6f8d0d0000

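Managing Python virtual environments with virtualenvwrapper

You can create virtual environments anywhere on the system, which makes a given environment hard to find the next time you need it. virtualenv offers no central place to manage environments, and virtualenvwrapper solves this problem nicely.

First, install it:

$ pip install virtualenv
$ pip install virtualenvwrapper

Next, create a directory to hold the virtual environments, for example:

$ mkdir ~/virtualenv/

Then set the environment variables, pointing WORKON_HOME at that directory:

$ export WORKON_HOME=~/virtualenv
$ export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
$ source /usr/local/bin/virtualenvwrapper.sh

You can put these lines in your ~/.bashrc so they run in every new shell.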

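Creating virtual environments

With virtualenvwrapper, you can easily create a virtual environment with the following commands:

$ mkvirtualenv spide
$ mkvirtualenv --python=/usr/local/python3.7/bin/python py3  # use the Python 3.7 interpreter
$ mkvirtualenv -p python3.7 pynew  # my preferred way to pick the version

Then you can use the management commands:

$ lsvirtualenv -b  # list the virtual environments (brief format)
$ workon  # with no argument, list the environments
$ workon virtualenv-name  # activate (or switch to) the environment virtualenv-name
$ lssitepackages  # list the packages installed in the current environment (once activated)
$ deactivate  # leave the virtual environment
$ rmvirtualenv virtualname  # delete the virtual environment named virtualname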
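Under the hood, virtualenvwrapper keeps every environment in a subdirectory of WORKON_HOME. The minimal Python sketch below illustrates that layout; it is not virtualenvwrapper's own code. It lists the environments roughly the way lsvirtualenv -b does. list_envs is hypothetical. Treating a directory that contains bin/activate as an environment is an assumption, and Windows environments use Scripts instead.

```python
import os


def list_envs(workon_home=None):
    """Names of virtual environments under WORKON_HOME (default ~/.virtualenvs)."""
    home = os.path.expanduser(workon_home or os.environ.get("WORKON_HOME", "~/.virtualenvs"))
    if not os.path.isdir(home):
        return []
    return sorted(
        name
        for name in os.listdir(home)
        # Treat a subdirectory as an environment if it has an activation script.
        if os.path.isfile(os.path.join(home, name, "bin", "activate"))
    )


if __name__ == "__main__":
    for env in list_envs():
        print(env)
```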

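Input file format

Posted on 2017-01-23 | Updated on 2025-06-19 | In Flex

Flex input file format

A flex input file has three sections, separated by lines containing %%:

definitions
%%
rules
%%
user code

Format of the definitions section

The definitions section contains name definitions of the form:

name definition

A name is a word that begins with a letter or an underscore, much like a C identifier (flex also allows dashes). The definition starts at the first non-whitespace character after the name and runs to the end of the line. A definition can later be referred to as {name}, which expands to (definition). For example:

DIGIT    [0-9]
ID       [a-z][a-z0-9]*

DIGIT is a regular expression that matches a single digit. ID matches a lowercase letter followed by zero or more lowercase letters or digits. A later pattern such as

{DIGIT}+"."{DIGIT}*

expands to:

([0-9])+"."([0-9])*

This matches a floating-point number: one or more digits, a dot, and zero or more further digits.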
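The {name} substitution is plain textual expansion, which a few lines of Python can imitate. The sketch below is a simplification, not flex's implementation. expand is hypothetical. It does not expand names inside other definitions, and it leaves repeat counts such as {1,3} untouched, because they do not start with a letter or underscore.

```python
import re

# A name starts with a letter or underscore, then letters, digits, '_' or '-'.
NAME_REF = re.compile(r"\{([A-Za-z_][A-Za-z0-9_-]*)\}")


def expand(pattern: str, definitions: dict) -> str:
    """Replace each {name} with (definition); unknown names are left as-is."""
    def substitute(m):
        name = m.group(1)
        return f"({definitions[name]})" if name in definitions else m.group(0)
    return NAME_REF.sub(substitute, pattern)


DEFINITIONS = {"DIGIT": "[0-9]", "ID": "[a-z][a-z0-9]*"}

if __name__ == "__main__":
    print(expand('{DIGIT}+"."{DIGIT}*', DEFINITIONS))  # ([0-9])+"."([0-9])*
```

In Python's own regex syntax, the quoted "." of the expanded pattern is written \., so r"([0-9])+\.([0-9])*" accepts strings such as 3.14 and 3.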

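Anything between %{ and %} is copied verbatim to the output. Note that %{ and %} must each appear unindented at the start of a line.

Another useful construct is %top, used much like %{ ... %} but written as %top{ ... }.

Its contents are placed at the very beginning of the generated file, which is handy for defining macros or adding header comments.

For example:

%top{
  /*
   * Author       :       Guo Shaoguang
   * Email        :       sgguo@shao.ac.cn
   * Date         :       201010
   * Version      :       v1.0
   * Name         :       heal the world
   *
   * Copyright (c) 2010-2016, SHAO
   * All rights reserved.
   */
}

You can have several %top blocks; their contents appear in the order in which they are written.

Format of the rules section

The rules section consists of rules of the form:

pattern   action

Note that the pattern must start at the beginning of the line (unindented).

Format of the user code section

The user code section is copied verbatim to lex.yy.c; it is optional.

Writing comments

Comments are written as in C. The only caveat is that it is best to indent the /* with a few spaces and put the comment on a line of its own.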

Flex expressions

The patterns at the heart of every flex scanner use a rich regular expression language.
A regular expression is a pattern description using a metalanguage, a language that you
use to describe what you want the pattern to match. Flex’s regular expression language
is essentially POSIX-extended regular expressions (which is not surprising considering
their shared Unix heritage). The metalanguage uses standard text characters, some of
which represent themselves and others of which represent patterns. All characters other
than the ones listed below, including all letters and digits, match themselves.
The characters with special meaning in regular expressions are:

. Matches any single character except the newline character ( \n ).
[] A character class that matches any character within the brackets. If the first character is a circumflex ( ^ ), it changes the meaning to match any character except the ones within the brackets. A dash inside the square brackets indicates a character range; for example, [0-9] means the same thing as [0123456789] and [a-z] means any lowercase letter. A - or ] as the first character after the [ is interpreted literally to let you include dashes and square brackets in character classes. POSIX introduced other special square bracket constructs that are useful when handling non-English alphabets, described later in this chapter. Other metacharacters do not have any special meaning within square brackets except that C escape sequences starting with \ are recognized. Character ranges are interpreted relative to the character coding in use, so the range [A-z] with ASCII character coding would match all uppercase and lowercase letters, as well as six punctuation characters whose codes fall between the code for Z and the code for a . In practice, useful ranges are ranges of digits, of uppercase letters, or of lowercase letters.
[a-z]{-}[jv] A differenced character class, with the characters in the first class omitting the characters in the second class (only in recent versions of flex).
^ Matches the beginning of a line as the first character of a regular expression. Also used for negation within square brackets.
$ Matches the end of a line as the last character of a regular expression.
{} If the braces contain one or two numbers, indicate the minimum and maximum number of times the previous pattern can match. For example, A{1,3} matches one to three occurrences of the letter A, and 0{5} matches 00000. If the braces contain a name, they refer to a named pattern by that name.
\ Used to escape metacharacters and as part of the usual C escape sequences; for example, \n is a newline character, while \* is a literal asterisk.
* Matches zero or more copies of the preceding expression. For example, [ \t]* is a common pattern to match optional spaces and tabs, that is, whitespace: it matches any run of spaces and tabs, or an empty string.
+ Matches one or more occurrences of the preceding regular expression. For example, [0-9]+ matches strings of digits such as 1 , 111 , or 123456 but not an empty string.
? Matches zero or one occurrence of the preceding regular expression. For example, -?[0-9]+ matches a signed number including an optional leading minus sign.
| The alternation operator; matches either the preceding regular expression or the following regular expression. For example, faith|hope|charity matches any of the three virtues.
"..." Anything within the quotation marks is treated literally. Metacharacters other than C escape sequences lose their meaning. As a matter of style, it's good practice to quote any punctuation characters intended to be matched literally.
() Groups a series of regular expressions together into a new regular expression. For example, (01) matches the character sequence 01, and a(bc|de) matches abc or ade. Parentheses are useful when building up complex patterns with *, +, ?, and |.
/ Trailing context, which means to match the regular expression preceding the slash but only if followed by the regular expression after the slash. For example, 0/1 matches 0 in the string 01 but would not match anything in the string 0 or 02 . The material matched by the pattern following the slash is not "consumed" and remains to be turned into subsequent tokens. Only one slash is permitted per pattern.

The repetition operators affect the smallest preceding expression, so abc+ matches ab followed by one or more c's. Use parentheses freely to be sure your expressions match what you want, such as (abc)+ to match one or more repetitions of abc.
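Most of these constructs have close counterparts in Python's re module, which is handy for trying a pattern out before putting it in a scanner. The dictionary of patterns below is a rough translation, not an exact one. Flex's trailing context 0/1 becomes a lookahead. The differenced class [a-z]{-}[jv] has to be spelled out by hand. A quoted string such as "+=" is written with re.escape.

```python
import re

# Rough Python `re` counterparts of flex patterns from the list above.
PATTERNS = {
    "signed_int": re.compile(r"-?[0-9]+"),           # flex: -?[0-9]+
    "virtues": re.compile(r"faith|hope|charity"),    # flex: faith|hope|charity
    "a_bc_or_de": re.compile(r"a(bc|de)"),           # flex: a(bc|de)
    "abc_plus": re.compile(r"abc+"),                 # flex: abc+   (only c repeats)
    "abc_group_plus": re.compile(r"(abc)+"),         # flex: (abc)+
    "zero_before_one": re.compile(r"0(?=1)"),        # flex: 0/1    (trailing context)
    "no_j_or_v": re.compile(r"[a-ik-uw-z]"),         # flex: [a-z]{-}[jv]
    "plus_assign": re.compile(re.escape("+=")),      # flex: "+="
}


def matches(name: str, text: str) -> bool:
    """True if the whole of `text` matches the named pattern."""
    return PATTERNS[name].fullmatch(text) is not None


if __name__ == "__main__":
    # The lookahead checks the 1 but consumes only the 0, like flex's trailing context.
    print(PATTERNS["zero_before_one"].match("01").group())  # 0
```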

Flex handles ambiguous patterns

Most flex programs are quite ambiguous, with multiple patterns that can match the same input. Flex resolves the ambiguity with two simple rules:

Match the longest possible string every time the scanner matches input.
In the case of a tie, use the pattern that appears first in the program.

These turn out to do the right thing in the vast majority of cases. Consider this snippet
from a scanner for C source code:

"+"                       { return ADD; }
"="                       { return ASSIGN; }
"+="                      { return ASSIGNADD; }
"if"                      { return KEYWORDIF; }
"else"                    { return KEYWORDELSE; }
[a-zA-Z_][a-zA-Z0-9_]*    { return IDENTIFIER; }

For the first three patterns, the string += is matched as one token, since += is longer than
+ . For the last three patterns, so long as the patterns for keywords precede the pattern
that matches an identifier, the scanner will match keywords correctly.
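The two disambiguation rules are simple enough to reproduce outside flex. The Python sketch below tries every pattern from the snippet at the current position, keeps the longest match, and lets the earlier rule win a tie. It illustrates the rules only: flex compiles all patterns into a single automaton rather than trying them one by one. Skipping whitespace and raising an error on unknown characters are assumptions added for the demo.

```python
import re

# (token, regex) in the order the rules appear in the flex snippet above.
RULES = [
    ("ADD", r"\+"),
    ("ASSIGN", r"="),
    ("ASSIGNADD", r"\+="),
    ("KEYWORDIF", r"if"),
    ("KEYWORDELSE", r"else"),
    ("IDENTIFIER", r"[a-zA-Z_][a-zA-Z0-9_]*"),
]
COMPILED = [(token, re.compile(rx)) for token, rx in RULES]


def next_token(text: str, pos: int):
    """Apply flex's rules at `pos`: longest match wins; ties go to the earlier rule."""
    best = None
    for token, rx in COMPILED:
        m = rx.match(text, pos)
        # Only a strictly longer match replaces the current best, so on a tie
        # the rule listed first is kept.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (token, m.group())
    return best


def tokenize(text: str):
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():  # assumption: whitespace is simply skipped
            pos += 1
            continue
        tok = next_token(text, pos)
        if tok is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(tok)
        pos += len(tok[1])
    return tokens


if __name__ == "__main__":
    print(tokenize("if x += y"))
```

"+=" becomes one ASSIGNADD token because it is longer than "+". "if" ties with IDENTIFIER and goes to KEYWORDIF because that rule comes first. "iffy" is an IDENTIFIER because that match is longer.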

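Some simple flex examples

Let's look at the mighty flex in action.

The following one-rule specification replaces every occurrence of the string username with the user's login name:

%%
username    printf( "%s", getlogin() );

By default, any text not matched by a flex scanner is copied to the output.

This is a single rule: username is the pattern and the printf is the action. The %% marks the beginning of the rules.

Here is a simple example, example1.l: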

        int num_lines = 0, num_chars = 0;

%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;

%%
int main(void)
        {
        yylex();
        printf( "# of lines = %d, # of chars = %d\n",
                num_lines, num_chars );
        return 0;
        }

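This program counts the characters and lines in its input. It produces no output other than the final report on the counts.

The first line declares two global variables, num_lines and num_chars. Both are accessible inside yylex() and in the main() routine declared after the second %%.

There are two rules. One matches a newline (\n) and increments both the line count and the character count. The other matches any character except a newline (the . regular expression) and increments the character count.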
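For comparison, here is the same counting logic in Python. The name count_lines_and_chars is hypothetical, and each branch is annotated with the flex rule it mirrors. Like the scanner, it counts the newline characters themselves as characters.

```python
def count_lines_and_chars(text: str):
    """Mirror example1.l: return (num_lines, num_chars)."""
    num_lines = num_chars = 0
    for ch in text:
        if ch == "\n":
            num_lines += 1  # rule  \n   ++num_lines; ++num_chars;
        num_chars += 1      # both rules (\n and .) increment num_chars
    return num_lines, num_chars


if __name__ == "__main__":
    sample = "hello\nworld\n"
    print("# of lines = %d, # of chars = %d" % count_lines_and_chars(sample))
    # prints: # of lines = 2, # of chars = 12
```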

Next, a more complex example, example2.l:

/* scanner for a toy Pascal-like language */

%{
/* need this for the call to atof() below */
#include <math.h>
%}

DIGIT    [0-9]
ID       [a-z][a-z0-9]*

%%

{DIGIT}+    {
            printf( "An integer: %s (%d)\n", yytext,
                    atoi( yytext ) );
            }

{DIGIT}+"."{DIGIT}*        {
            printf( "A float: %s (%g)\n", yytext,
                    atof( yytext ) );
            }

if|then|begin|end|procedure|function        {
            printf( "A keyword: %s\n", yytext );
            }

{ID}        printf( "An identifier: %s\n", yytext );

"+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );

"{"[^{}\n]*"}"     /* eat up one-line comments */

[ \t\n]+          /* eat up whitespace */

.           printf( "Unrecognized character: %s\n", yytext );

%%

int main( int argc, char **argv )
    {
    ++argv, --argc;  /* skip over program name */
    if ( argc > 0 )
            yyin = fopen( argv[0], "r" );
    else
            yyin = stdin;

    yylex();
    return 0;
    }

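This example is the beginning of a scanner for a Pascal-like language. It recognizes different types of tokens and reports what it has seen.

The next section explains this example in detail.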