
Keep the Common, Find the Differences: join


.. note::
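   I urge you not to treasure your gold-threaded robe; I urge you to treasure the days of your youth.

   -- Du Qiuniang, "The Gold-Threaded Robe"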

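The Linux join command joins the lines of two files whose specified fields have identical content.

It finds the lines in the two files where the specified field matches, merges each matching pair into one line, and writes the result to standard output.

This is similar to the JOIN operation in SQL.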

The man page describes it as:

join - join lines of two files on a common field

The syntax is:

$ join [OPTION]... FILE1 FILE2

join has a fair number of options, but the defaults are enough for most uses.

join examples

The simplest case is joining two files.

First look at the contents of the two files, then run join on them.

# Show the contents of file1 and file2:
$ cat file1
Zhangsan age 14
Lisi age 15
Wangwu age 16

$ cat file2
Zhangsan score 80
Lisi score 90
Wangwu score 85

# Run join
$ join file1 file2
Zhangsan age 14 score 80
Lisi age 15 score 90
Wangwu age 16 score 85

# Swap the order of the two files
$ join file2 file1
Zhangsan score 80 age 14
Lisi score 90 age 15
Wangwu score 85 age 16

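As you can see, swapping the order of the files changes the output: after the join field, the fields of the file named first come first in each output line.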

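Joining files whose lines are in a different order

If the two files are not sorted on the join field, join prints warnings, and lines that cannot be paired in order (here the Jialiu lines) are missing from the output, as shown below: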

$ cat file1       
Jialiu age 15
Zhangsan age 14
Lisi age 15
Wangwu age 16
$ cat file2
Zhangsan score 80
Lisi score 90
Wangwu score 85
Jialiu score 88
$ join file1 file2
join: file1:3: is not sorted: Lisi age 15
join: file2:2: is not sorted: Lisi score 90
Zhangsan age 14 score 80
Lisi age 15 score 90
Wangwu age 16 score 85
$ join file2 file1
join: file2:2: is not sorted: Lisi score 90
join: file1:3: is not sorted: Lisi age 15
Zhangsan score 80 age 14
Lisi score 90 age 15
Wangwu score 85 age 16
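
The warnings above go away once both inputs are sorted on the join field. The following is a minimal sketch, assuming GNU coreutils and the C locale; the scratch directory and the `.sorted` file names are illustrative choices, not part of the original example.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
cd "$(mktemp -d)"
# sort and join both honor LC_COLLATE, so pin the locale for both commands.
export LC_ALL=C

# Recreate the unsorted files from the example above.
printf '%s\n' 'Jialiu age 15' 'Zhangsan age 14' 'Lisi age 15' 'Wangwu age 16' > file1
printf '%s\n' 'Zhangsan score 80' 'Lisi score 90' 'Wangwu score 85' 'Jialiu score 88' > file2

# Sort each file on the first field only, as the man page recommends.
sort -k 1b,1 file1 > file1.sorted
sort -k 1b,1 file2 > file2.sorted

# All four names now pair up and no warning is printed.
join file1.sorted file2.sorted
# Jialiu age 15 score 88
# Lisi age 15 score 90
# Wangwu age 16 score 85
# Zhangsan age 14 score 80
```

Note that the output follows the sorted order of the join field, not the original order of either file.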

TODO

Syntax

join [-i][-a<1 or 2>][-e<string>][-o<format>][-t<char>][-v<1 or 2>][-1<field>][-2<field>][--help][--version][file1][file2]

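Options

  • -a<1 or 2> In addition to the normal output, also print the lines of the specified file that have no matching field in the other file.
  • -e<string> If a specified field cannot be found in [file1] or [file2], fill it in the output with the given string.
  • -i or --ignore-case Ignore differences in case when comparing fields.
  • -o<format> Display the result in the specified format.
  • -t<char> Use the given character as the field separator.
  • -v<1 or 2> Like -a, but print only the lines of the specified file that have no matching field.
  • -1<field> Join on the specified field of [file1].
  • -2<field> Join on the specified field of [file2].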
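
The difference between -a, -v, and the -e/-o pair is easiest to see on inputs where some names appear in only one file. The sketch below uses hypothetical files `ages` and `scores` (not from the examples above), already sorted on the first field, and assumes the C locale.

```shell
# Work in a scratch directory with a fixed collation order.
cd "$(mktemp -d)"
export LC_ALL=C

# Hypothetical sample data: Wangwu has no score, Jialiu has no age.
printf '%s\n' 'Lisi age 15' 'Wangwu age 16' 'Zhangsan age 14' > ages
printf '%s\n' 'Jialiu score 88' 'Lisi score 90' 'Zhangsan score 80' > scores

# -a 1: joined lines plus the unpairable lines of file 1 (like a left outer join).
join -a 1 ages scores
# Lisi age 15 score 90
# Wangwu age 16
# Zhangsan age 14 score 80

# -v 1 / -v 2: only the unpairable lines of the given file.
join -v 1 ages scores
# Wangwu age 16
join -v 2 ages scores
# Jialiu score 88

# -a 1 -a 2 with -o and -e: a full outer join with a fixed column layout,
# where missing fields are filled with the placeholder NA.
join -a 1 -a 2 -e NA -o 0,1.3,2.3 ages scores
# Jialiu NA 88
# Lisi 15 90
# Wangwu 16 NA
# Zhangsan 14 80
```

In the -o format, `0` stands for the join field, and `1.3` and `2.3` select the third field of the first and second file respectively.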

-a FILENUM
also print unpairable lines from file FILENUM, where FILENUM is 1 or 2, corresponding to FILE1 or FILE2

-e EMPTY
replace missing input fields with EMPTY

-i, --ignore-case
ignore differences in case when comparing fields

-j FIELD
equivalent to ‘-1 FIELD -2 FIELD’

-o FORMAT
obey FORMAT while constructing output line

-t CHAR
use CHAR as input and output field separator

-v FILENUM
like -a FILENUM, but suppress joined output lines

-1 FIELD
join on this FIELD of file 1

-2 FIELD
join on this FIELD of file 2

--check-order
check that the input is correctly sorted, even if all input lines are pairable

--nocheck-order
do not check that the input is correctly sorted

--header
treat the first line in each file as field headers, print them without trying to pair them

-z, --zero-terminated
line delimiter is NUL, not newline
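
As an illustration of --header, the sketch below joins two files that each start with a column-name line. It assumes GNU join (the option is a GNU extension) and the C locale; the files `ages.txt` and `scores.txt` are hypothetical.

```shell
# Work in a scratch directory with a fixed collation order.
cd "$(mktemp -d)"
export LC_ALL=C

# Hypothetical files whose first line names the columns.
# The data lines below the header are sorted on the first field.
printf '%s\n' 'name age' 'Lisi 15' 'Zhangsan 14' > ages.txt
printf '%s\n' 'name score' 'Lisi 90' 'Zhangsan 80' > scores.txt

# --header joins the two header lines and prints them first,
# without requiring them to fit into the sort order of the data.
join --header ages.txt scores.txt
# name age score
# Lisi 15 90
# Zhangsan 14 80
```

Without --header, the `name` line would be treated as ordinary data and, in the C locale, would sort after `Lisi` and `Zhangsan`, so the inputs would count as unsorted.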

   Unless -t CHAR is given, leading blanks separate fields and are ignored, else fields are separated by CHAR.  Any FIELD is a  field
   number counted from 1.  FORMAT is one or more comma or blank separated specifications, each being 'FILENUM.FIELD' or '0'.  Default
   FORMAT outputs the join field, the remaining fields from FILE1, the remaining fields from FILE2, all separated by CHAR.  If FORMAT
   is the keyword 'auto', then the first line of each file determines the number of fields output for each line.
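
The paragraph above can be illustrated with a small sketch that combines -t, -1/-2, and -o. The comma-separated files `people.csv` and `scores.csv` are hypothetical, and the C locale is assumed.

```shell
# Work in a scratch directory with a fixed collation order.
cd "$(mktemp -d)"
export LC_ALL=C

# Hypothetical data: the name is field 2 in people.csv and field 1 in scores.csv.
# Both files are already sorted on their respective name field.
printf '%s\n' '1,Lisi,15' '2,Wangwu,16' '3,Zhangsan,14' > people.csv
printf '%s\n' 'Lisi,90' 'Wangwu,85' 'Zhangsan,80' > scores.csv

# Default FORMAT: the join field, then the remaining fields of FILE1, then those of FILE2.
join -t ',' -1 2 -2 1 people.csv scores.csv
# Lisi,1,15,90
# Wangwu,2,16,85
# Zhangsan,3,14,80

# Explicit FORMAT: id from file 1, the join field, score from file 2.
join -t ',' -1 2 -2 1 -o 1.1,0,2.2 people.csv scores.csv
# 1,Lisi,90
# 2,Wangwu,85
# 3,Zhangsan,80
```

The same character given to -t is used to separate fields in the output, which is why the result is still comma-separated.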

   Important:  FILE1 and FILE2 must be sorted on the join fields.  E.g., use "sort -k 1b,1" if 'join' has no options, or use "join -t
   ''" if 'sort' has no options.  Note, comparisons honor the rules specified by 'LC_COLLATE'.  If the input is not sorted  and  some
   lines cannot be joined, a warning message will be given.

See also: comm(1), uniq(1)
