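How to use awk
AWK is a language for processing text files and a powerful text-analysis tool. Its name comes from the first letters of the family names of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.
awk comes in three versions: awk, nawk, and gawk. Unless stated otherwise, "awk" here generally means gawk, the GNU version of AWK.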
Parameters
awk [option] 'pattern { action }' {filenames} # multiple files are sometimes allowed
awk [option] 'BEGIN{initial code} {per-line code} END{final code}' filename
Overview of common basic usage
echo 1 2 3 |awk '{ print "total pay for", $1, "is", $2 * $3 }'
# awk '{ printf("total pay for %s is $%.2f\n", $1, $2 * $3) }'
# -F specifies the field separator, e.g. awk -F ':'
echo $PATH|awk -F ':' '{print $1}'
# awk -F '[;:]' specifies several separators at once
# -F is equivalent to setting the built-in variable FS, the field separator
cat /etc/passwd |awk -F ':' '{print $1"\t"$7}'
cat /etc/passwd |awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
awk '{count++;print $0;} END{print "user count is ", count}' /etc/passwd
# less -S tmp.txt |awk 'BEGIN{sum=3}{sum=sum+1}END{print sum}'
awk -F ':' 'BEGIN {count=0;} {name[count] = $1;count++;}; END{for (i = 0; i < NR; i++) print i, name[i]}' /etc/passwd
awk '$3 == 0 { print $1 }' emp.data
# less -S /public/reference/gtf/ensembl/homo_sapiens_86/Homo_sapiens.GRCh38.86.gtf.gz|awk '$3=="gene"{print $0}'|less -S
# extract regions from the GTF test file
# less -S /public/reference/gtf/ensembl/homo_sapiens_86/Homo_sapiens.GRCh38.86.gtf.gz|awk '$4>20000 {print $0}'|less -S
# less -S /public/reference/gtf/ensembl/homo_sapiens_86/Homo_sapiens.GRCh38.86.gtf.gz|awk '$1=="1" && $4>20000 && $5<30000 {print $0}'
awk '$3 >0 { print $1, $2 * $3 }' emp.data
awk '$3 > 15 { num = num + 1 } END{ print num" lines" }'
awk 'NF != 3 { print $0, "number of fields is not equal to 3" }' # AND is &&, OR is ||, NOT is !
awk '$2 < 3.35 { print $0, "rate is below minimum wage" }' # Eg: !($2 < 4 && $3 < 20)
# get a feel for the built-in variables NR and NF
nl /etc/passwd |awk 'END { print NR, "employees" }'
nl /etc/passwd |awk '{print NF}'
# get a feel for variable assignment
cat /etc/passwd |awk -v FS=":" '{print NF}'
awk '{ pay = pay + $2 * $3 }
END { print NR, "employees"
print "total pay is", pay
print "average pay is", pay/NR
}'
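To make the roles of `-F`, `NR`, and `NF` in the commands above concrete, here is a minimal sketch that runs on inline sample data instead of /etc/passwd. The three passwd-style records and the shell variable names `sample` and `out` are made up for illustration.

```shell
# Hypothetical passwd-style sample; the three records are invented for this sketch.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/zsh'

# -F ':' splits each record on colons.
# NR is the record number, NF the field count, and $NF the last field.
out=$(printf '%s\n' "$sample" | awk -F ':' '{ print NR, NF, $1, $NF }')
printf '%s\n' "$out"
```

Each output line shows the record number, the number of fields (7 here, including the empty fifth field of the last record), the user name, and the shell.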
Matching
### Not covered
awk '/LISTEN/' netstat.txt
awk '$6 ~ /FIN|TIME/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
awk '$6 !~ /WAIT/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
awk '!/WAIT/' netstat.txt
awk 'NR!=1{print > $6}' netstat.txt
$ awk 'NR!=1{if($6 ~ /TIME|ESTABLISHED/) print > "1.txt";
else if($6 ~ /LISTEN/) print > "2.txt";
else print > "3.txt" }' netstat.txt
$ awk 'NR!=1{a[$6]++;} END {for (i in a) print i ", " a[i];}' netstat.txt
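The netstat.txt file is not included here, so the following sketch replays two of the commands above on a small invented sample. The header and the four connections are assumptions made only so the example can run on its own.

```shell
# Made-up netstat-style sample: a header line plus four connections.
sample='Proto Recv-Q Send-Q Local Foreign State
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.1:80 10.0.0.9:5001 ESTABLISHED
tcp 0 0 10.0.0.1:80 10.0.0.9:5002 TIME_WAIT
tcp 0 0 10.0.0.1:80 10.0.0.9:5003 ESTABLISHED'

# Skip the header (NR != 1) and count records per state in the array a.
# "for (i in a)" visits keys in an unspecified order, so the result is
# piped through sort to make it stable.
counts=$(printf '%s\n' "$sample" |
  awk 'NR != 1 { a[$6]++ } END { for (i in a) print i ", " a[i] }' |
  LC_ALL=C sort)
printf '%s\n' "$counts"

# Keep only records whose sixth field matches TIME or ESTABLISHED.
matched=$(printf '%s\n' "$sample" |
  awk '$6 ~ /TIME|ESTABLISHED/ { print $5, $6 }')
printf '%s\n' "$matched"
```

The associative array is keyed by the state string itself, which is why no prior declaration or sizing is needed.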
Examples
The content of log.txt is as follows:
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
Examples:
# split on ","
$ awk -F, '{print $1,$2}' log.txt
---------------------------------------------
2 this is a test
3 Are you like awk
This's a test
10 There are orange apple
# or use the built-in variable
$ awk 'BEGIN{FS=","} {print $1,$2}' log.txt
---------------------------------------------
2 this is a test
3 Are you like awk
This's a test
10 There are orange apple
# use several separators: the bracket expression splits on either a space or a ","
$ awk -F '[ ,]' '{print $1,$2,$5}' log.txt
---------------------------------------------
2 this test
3 Are awk
This's a
10 There apple
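Operators
| Operator | Description |
|---|---|
| = += -= *= /= %= ^= **= | Assignment |
| ?: | C-style conditional expression |
| \|\| | Logical OR |
| && | Logical AND |
| ~ !~ | Matches / does not match a regular expression |
| < <= > >= != == | Relational operators |
| space | String concatenation |
| + - | Addition, subtraction |
| * / % | Multiplication, division, remainder |
| + - ! | Unary plus, unary minus, logical NOT |
| ^ ** | Exponentiation |
| ++ -- | Increment or decrement, as prefix or postfix |
| $ | Field reference |
| in | Array membership |
Key point: remember the match operator ~ and the comparisons > and ==.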
Examples
Keep the rows whose first column is greater than 2
$ awk '$1>2' log.txt # command
# output
3 Are you like awk
This's a test
10 There are orange,apple,mongo
Keep the rows whose first column is greater than 2 and whose second column equals 'Are'
$ awk '$1>2 && $2=="Are" {print $1,$2,$3}' log.txt # command
# output
3 Are you
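The first filter above also prints the line starting with "This's", because a field that does not look like a number is compared with 2 as a string. A minimal sketch of the difference, recreating log.txt inline (the shell variable names `log`, `plain`, and `numeric` are made up for this example):

```shell
# The log.txt content from the examples above, recreated inline.
log=$(cat <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF
)

# "This's" is not numeric, so $1 > 2 falls back to a string comparison
# and the line is kept.
plain=$(printf '%s\n' "$log" | awk '$1 > 2 { print $1 }')

# Adding 0 forces a numeric comparison: "This's" becomes 0 and drops out.
numeric=$(printf '%s\n' "$log" | awk '$1 + 0 > 2 { print $1 }')

printf 'plain:\n%s\nnumeric:\n%s\n' "$plain" "$numeric"
```

Forcing the numeric context with `+ 0` is a common idiom when a column may contain non-numeric values.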
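Built-in variables
Use -v to assign them from the command line.
| Common variable | Description |
|---|---|
| $n | The nth field of the current record; fields are separated by FS |
| $0 | The entire input record |
| FS | Input field separator (default: any whitespace) |
| OFS | Output field separator (default: a single space), placed between fields on output |
| NF | Number of fields in the current record |
| NR | Number of records read so far, i.e. the line number, starting from 1 |
| RS | Input record separator (default: a newline) |
| ORS | Output record separator (default: a newline) |

| Variable | Description |
|---|---|
| ARGC | Number of command-line arguments |
| ARGIND | Position in the command line of the current file (counting from 0) |
| ARGV | Array containing the command-line arguments |
| CONVFMT | Number conversion format (default %.6g) |
| ENVIRON | Associative array of environment variables |
| ERRNO | Description of the last system error |
| FIELDWIDTHS | List of field widths (space-separated) |
| FILENAME | Name of the current file |
| FNR | Line number, counted separately for each file |
| IGNORECASE | If true, matching ignores case |
| OFMT | Output format for numbers (default %.6g) |
| RLENGTH | Length of the string matched by the match function |
| RSTART | Starting position of the string matched by the match function |
| SUBSEP | Array subscript separator (default \034) |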
awk -v var=value # set a variable
Examples:
seq 1 10|awk 'BEGIN{ ORS=" " }{ print $0 }'
seq 1 10|awk 'BEGIN{ RS="\t" }{ print $0"OK" }'
seq 1 10|awk 'BEGIN{ RS="\n" }{ print $0"OK" }'
1OK
2OK
3OK
4OK
5OK
6OK
7OK
8OK
9OK
10OK
Output of the first two examples:
1 2 3 4 5 6 7 8 9 10 $
1
2
3
4
5
6
7
8
9
10
OK
Examples:
$ awk -va=1 '{print $1,$1+a}' log.txt
---------------------------------------------
2 3
3 4
This's 1
10 11
awk 'BEGIN{printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS","NF","NR","OFS","ORS","RS";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR,OFS,ORS,RS}' log.txt
FILENAME ARGC FNR FS NF NR OFS ORS RS
---------------------------------------------
log.txt 2 1 5 1
log.txt 2 2 5 2
log.txt 2 3 3 3
log.txt 2 4 4 4
$ awk -F\' 'BEGIN{printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS","NF","NR","OFS","ORS","RS";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR,OFS,ORS,RS}' log.txt
FILENAME ARGC FNR FS NF NR OFS ORS RS
---------------------------------------------
log.txt 2 1 ' 1 1
log.txt 2 2 ' 1 2
log.txt 2 3 ' 2 3
log.txt 2 4 ' 1 4
# print the running record number NR and the per-file line number FNR
$ awk '{print NR,FNR,$1,$2,$3}' log.txt
---------------------------------------------
1 1 2 this is
2 2 3 Are you
3 3 This's a test
4 4 10 There are
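With a single input file NR and FNR are always equal, as in the output above. The difference only shows with several files. A minimal sketch, assuming two throwaway files whose names (`one.txt`, `two.txt`) and contents are invented for this example:

```shell
# Two throwaway files in a temporary directory; names and contents are made up.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\nd\ne\n' > "$dir/two.txt"

# NR keeps counting across files, FNR restarts at 1 for each file.
nr_fnr=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, $0 }' one.txt two.txt)
printf '%s\n' "$nr_fnr"

rm -r "$dir"
```

Comparing the two counters (for example with the pattern `NR == FNR`) is a common way to tell whether awk is still reading the first file.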
# specify the output field separator
$ awk '{print $1,$2,$5}' OFS=" $ " log.txt
---------------------------------------------
2 $ this $ test
3 $ Are $ awk
This's $ a $
10 $ There $
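OFS and ORS are easy to confuse, so here is a small sketch that sets both on invented two-line input. It also shows that OFS only takes effect when print receives comma-separated arguments or when the record is rebuilt; the shell variable names are assumptions for illustration.

```shell
# OFS goes between the comma-separated arguments of print; ORS ends each record.
joined=$(printf 'a b c\nd e f\n' |
  awk 'BEGIN { OFS = "-"; ORS = ";" } { print $1, $2, $3 }')
printf '%s\n' "$joined"

# A bare print emits $0 unchanged, so OFS has no visible effect ...
untouched=$(printf 'a b c\n' | awk 'BEGIN { OFS = "-" } { print }')

# ... until a field assignment such as $1 = $1 makes awk rebuild $0 with OFS.
rebuilt=$(printf 'a b c\n' | awk 'BEGIN { OFS = "-" } { $1 = $1; print }')
printf '%s\n%s\n' "$untouched" "$rebuilt"
```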
Regular expression matching
# for lines whose second column contains "th", print the second and fourth columns
$ awk '$2 ~ /th/ {print $2,$4}' log.txt
---------------------------------------------
this a
~ is the match operator; the pattern is written between the two slashes //.
# print the lines that contain "re"
$ awk '/re/' log.txt
---------------------------------------------
3 Are you like awk
10 There are orange,apple,mongo
Non-matching:
$ awk '$2 !~ /th/ {print $2,$4}' log.txt
Ignoring case
$ awk 'BEGIN{IGNORECASE=1} /this/' log.txt
---------------------------------------------
2 this is a test
This's a test
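IGNORECASE is a gawk extension, so the command above may behave differently in other awk implementations. A sketch of a portable alternative, assuming the same log.txt content recreated inline, lowercases the record with the standard `tolower` function before matching. It also runs a `!~` filter for comparison.

```shell
# The log.txt content from the examples above, recreated inline.
log=$(cat <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF
)

# Case-insensitive match without IGNORECASE: lowercase the record first.
ci=$(printf '%s\n' "$log" | awk 'tolower($0) ~ /this/')
printf '%s\n' "$ci"

# !~ keeps the records whose second field does NOT match the pattern.
nomatch=$(printf '%s\n' "$log" | awk '$2 !~ /th/ { print $2 }')
printf '%s\n' "$nomatch"
```

Note that "There" passes the `!~ /th/` filter because the match is case-sensitive by default.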
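awk scripts
For awk scripts, two keywords deserve attention: BEGIN and END.
- BEGIN { statements to run before any input is read }
- END { statements to run after all lines have been processed }
- { statements to run for each line }
Suppose we have the following file (a table of student scores):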
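The order in which the three kinds of blocks run can be seen with a minimal sketch on two invented input lines (the variable name `order` is made up for this example):

```shell
# BEGIN runs once before input, the bare block runs per line, END runs once after.
order=$(printf 'x\ny\n' | awk '
BEGIN { print "start" }
      { print "line", NR ": " $0 }
END   { print "done after", NR, "lines" }')
printf '%s\n' "$order"
```

Inside END, NR still holds the number of the last record read, which is why it can be used as a line count.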
$ cat score.txt
Marry 2143 78 84 77
Jack 2321 66 78 45
Tom 2122 48 77 71
Mike 2537 87 97 95
Bob 2415 40 57 62
Our awk script is as follows:
$ cat cal.awk
#!/bin/awk -f
# before processing: initialize the totals and print the header
BEGIN {
    math = 0
    english = 0
    computer = 0
    printf "NAME NO. MATH ENGLISH COMPUTER TOTAL\n"
    printf "---------------------------------------------\n"
}
# during processing: accumulate each subject and print one row per student
{
    math+=$3
    english+=$4
    computer+=$5
    printf "%-6s %-6s %4d %8d %8d %8d\n", $1, $2, $3,$4,$5, $3+$4+$5
}
# after processing: print the totals and the averages
END {
    printf "---------------------------------------------\n"
    printf " TOTAL:%10d %8d %8d \n", math, english, computer
    printf "AVERAGE:%10.2f %8.2f %8.2f\n", math/NR, english/NR, computer/NR
}
Let's look at the result:
$ awk -f cal.awk score.txt
NAME NO. MATH ENGLISH COMPUTER TOTAL
---------------------------------------------
Marry 2143 78 84 77 239
Jack 2321 66 78 45 189
Tom 2122 48 77 71 196
Mike 2537 87 97 95 279
Bob 2415 40 57 62 159
---------------------------------------------
TOTAL: 319 393 350
AVERAGE: 63.80 78.60 70.00
Some other examples
The AWK hello world program is:
BEGIN { print "Hello, world!" }
Compute the total file size (with the usual ls -l layout, the size is the fifth column)
$ ls -l *.txt | awk '{sum+=$5} END {print sum}'
--------------------------------------------------
666581
Find the lines longer than 80 characters in a file
awk 'length>80' log.txt
Print the 9x9 multiplication table
seq 9 | sed 'H;g' | awk -v RS='' '{for(i=1;i<=NF;i++)printf("%dx%d=%d%s", i, NR, i*NR, i==NR?"\n":"\t")}'
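The one-liner above leans on seq, sed, and paragraph mode (RS=''). As an alternative sketch, not the original author's approach, the same triangular table can be produced by two nested for loops in a BEGIN block, which needs no input at all. The variable names `row`, `col`, and `table` are made up for this example.

```shell
# Nested loops in BEGIN: awk never reads input, it only prints the table.
table=$(awk 'BEGIN {
    for (row = 1; row <= 9; row++)
        for (col = 1; col <= row; col++)
            # tab between entries, newline after the last entry of a row
            printf "%dx%d=%d%s", col, row, col * row, (col == row ? "\n" : "\t")
}')
printf '%s\n' "$table"
```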
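Bonus
awk is effectively a programming language: it supports conditionals and other features common to most languages.
if-else statement
The following program computes the total and average pay of employees who earn more than $6 per hour. It uses an if to guard against division by zero when computing the average.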
$2 > 6 { n = n + 1; pay = pay + $2 * $3 }
END { if (n > 0)
print n, "employees, total pay is", pay,
"average pay is", pay/n
else
print "no employees are paid more than $6/hour"
}
{ line[NR] = $0 } # remember each input line
END { i = NR # print the lines in reverse order
while (i > 0) {
print line[i]
i = i - 1
}
}
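Both programs above are shown without input. The sketch below runs them on a small invented data set in the name, hourly rate, hours layout; the three records and the shell variable names are assumptions for illustration only.

```shell
# Hypothetical emp.data-style records: name, hourly rate, hours worked.
emp='Ann 7.50 10
Bo 5.00 20
Cy 8.00 5'

# if-else guarding the division: n stays 0 when nobody earns more than 6.
summary=$(printf '%s\n' "$emp" | awk '
$2 > 6 { n = n + 1; pay = pay + $2 * $3 }
END {
    if (n > 0) {
        print n, "employees, total pay is", pay, "average pay is", pay / n
    } else {
        print "no employees are paid more than $6/hour"
    }
}')
printf '%s\n' "$summary"

# while loop walking the stored lines from last to first.
reversed=$(printf '%s\n' "$emp" | awk '
{ line[NR] = $0 }
END {
    i = NR
    while (i > 0) {
        print line[i]
        i = i - 1
    }
}')
printf '%s\n' "$reversed"
```

Here Ann (7.50 x 10) and Cy (8.00 x 5) qualify, so the total is 115 and the average 57.5.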
Advanced functions
split
echo 'abcd' | awk '{len=split($0,a,"");for(i=1;i<=len;i++)print "a["i"]="a[i];print "length="len}'
# a[1]=a a[2]=b a[3]=c a[4]=d length=4
awk '{split($2,a,"-");if(a[2]==01){b[$1]+=$4}}END{for(i in b)print i,b[i]}' test.txt
ipstr="192.168.1.2,192.168.1.3"
awk 'BEGIN{split('"\"$ipstr\""',a,",");for(i in a)print "sa["i"]="a[i]}'
cat config |awk -v FS="\t" '{split($0,a);split(a[1],b,"/"); print $0","b[5]}'
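The ipstr example above splices a shell variable into the awk program through nested quotes. A sketch of a possibly more readable variant passes the value with `-v` instead; the awk variable name `list` is made up for this example. An index loop is used because `for (i in a)` does not guarantee any order.

```shell
ipstr="192.168.1.2,192.168.1.3"

# -v hands the shell value to awk as the variable "list".
# split returns the number of pieces, which drives an ordered index loop.
ips=$(awk -v list="$ipstr" 'BEGIN {
    n = split(list, a, ",")
    for (i = 1; i <= n; i++)
        print "a[" i "]=" a[i]
}')
printf '%s\n' "$ips"
```

One caveat: awk interprets backslash escape sequences in values passed with `-v`, which does not matter for this comma-separated list but can for arbitrary strings.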
netstat | awk '{printf "%-8s %-8s %-8s %-18s %-22s %-15s\n",$1,$2,$3,$4,$5,$6}'
awk '$3==0 && $6=="LISTEN" || NR==1 '
Further examples
https://coolshell.cn/articles/9070.html
https://www.ibm.com/support/knowledgecenter/zh/ssw_aix_72/com.ibm.aix.cmds1/awk.htm
Many function examples
http://linuxcommand.org/lc3_adv_awk.php