awk

How to use awk

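AWK is a language for processing text files and a powerful text-analysis tool. The name comes from the first letters of the family names of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.
There are three versions of awk: awk, nawk, and gawk. Unless stated otherwise, awk here means gawk, the GNU version of AWK.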

Syntax

awk [option] 'pattern { action }' {filenames}   # multiple input files are allowed
awk [option] 'BEGIN{ setup code } { per-line code } END{ final code }' filename
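To make the second form concrete, here is a minimal sketch that feeds three made-up numbers through a BEGIN block, a per-line block, and an END block. The input values and the variable names `total` and `summary` are assumptions chosen only for illustration.

```shell
# BEGIN runs once before any input, the middle block runs once per line,
# and END runs once after the last line has been read.
summary=$(printf '3\n4\n5\n' | awk '
BEGIN { total = 0 }                        # setup code
      { total += $1 }                      # per-line code
END   { print NR " lines, sum " total }    # final code
')
echo "$summary"    # 3 lines, sum 12
```

In END, NR still holds the number of the last record read, which is why it can serve as a line count.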

Overview of common basic usage

echo 1 2 3 |awk '{ print "total pay for", $1, "is", $2 * $3 }'
# awk '{ printf("total pay for %s is $%.2f\n", $1, $2 * $3) }'

# awk -F: -F specifies the field separator
echo $PATH|awk -F ':' '{print $1}'
# awk -F '[;:]' specifies several separators at once
# -F sets the built-in variable FS, the field separator
cat /etc/passwd |awk  -F ':'  '{print $1"\t"$7}'
cat /etc/passwd |awk  -F ':'  'BEGIN {print "name,shell"}  {print $1","$7} END {print "blue,/bin/nosh"}'
awk '{count++;print $0;} END{print "user count is ", count}' /etc/passwd
# less -S tmp.txt |awk 'BEGIN{sum=3}{sum=sum+1}END{print sum}'
awk -F ':' 'BEGIN {count=0;} {name[count] = $1;count++;}; END{for (i = 0; i < NR; i++) print i, name[i]}' /etc/passwd

awk '$3 == 0 { print $1 }' emp.data
#  less -S /public/reference/gtf/ensembl/homo_sapiens_86/Homo_sapiens.GRCh38.86.gtf.gz|awk '$3=="gene"{print $0}'|less -S
# Extract regions from the GTF test file
# less -S /public/reference/gtf/ensembl/homo_sapiens_86/Homo_sapiens.GRCh38.86.gtf.gz|awk '$4>20000 {print $0}'|less -S
# less -S /public/reference/gtf/ensembl/homo_sapiens_86/Homo_sapiens.GRCh38.86.gtf.gz|awk '$1=="1" && $4>20000 && $5<30000 {print $0}'

awk '$3 >0 { print $1, $2 * $3 }' emp.data
awk '$3 > 15 { num = num + 1 } END{ print num" lines" }'
awk 'NF != 3     { print $0, "number of fields is not equal to 3" }'  # AND is &&, OR is ||, NOT is !
awk '$2 < 3.35   { print $0, "rate is below minimum wage" }'  # e.g. !($2 < 4 && $3 < 20)
# Get a feel for the built-in variables NR and NF
nl /etc/passwd |awk 'END { print NR, "employees" }'
nl /etc/passwd |awk '{print NF}'
# Get a feel for variable assignment with -v
cat /etc/passwd |awk -v FS=":" '{print NF}'


awk     '{ pay = pay + $2 * $3 }
END { print NR, "employees"
      print "total pay is", pay
      print "average pay is", pay/NR
    }'
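The overview sets the field separator in two ways, with -F and with -v FS=. As a small sketch, the commands below split one made-up passwd-style line (an assumption, not a real account) three ways, adding a BEGIN assignment as the third, to show that all of them produce the same fields.

```shell
line='root:x:0:0:root:/root:/bin/bash'   # made-up passwd-style record

# Three ways to set the input field separator to a colon.
by_flag=$(echo "$line"  | awk -F ':' '{ print $1, $7, NF }')
by_var=$(echo "$line"   | awk -v FS=':' '{ print $1, $7, NF }')
by_begin=$(echo "$line" | awk 'BEGIN { FS = ":" } { print $1, $7, NF }')

echo "$by_flag"    # root /bin/bash 7
echo "$by_var"
echo "$by_begin"
```

All three take effect before the first line is read, so the choice is mostly a matter of taste.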

Matching

### Not covered in detail
awk '/LISTEN/' netstat.txt
awk '$6 ~ /FIN|TIME/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
awk '$6 !~ /WAIT/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
awk '!/WAIT/' netstat.txt
awk 'NR!=1{print > $6}' netstat.txt

$ awk 'NR!=1{if($6 ~ /TIME|ESTABLISHED/) print > "1.txt";
else if($6 ~ /LISTEN/) print > "2.txt";
else print > "3.txt" }' netstat.txt

$ awk 'NR!=1{a[$6]++;} END {for (i in a) print i ", " a[i];}' netstat.txt
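The last command counts connections per state with an associative array. Since netstat.txt is not shown, the sketch below uses a few made-up lines in the same shape (a header, then the state in column 6); the function name `count_states` and the sample addresses are assumptions. It pipes the result through sort because the order of `for (s in seen)` is not defined.

```shell
# Count how often each value of column 6 occurs, skipping the header line.
count_states() {
  awk 'NR != 1 { seen[$6]++ }
       END     { for (s in seen) print s ", " seen[s] }' | sort
}

printf '%s\n' \
  'Proto Recv-Q Send-Q Local Foreign State' \
  'tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN' \
  'tcp 0 0 10.0.0.1:80 10.0.0.2:5000 ESTABLISHED' \
  'tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN' | count_states
```

With this input the output is `ESTABLISHED, 1` followed by `LISTEN, 2`.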



Examples

The contents of log.txt are as follows:

2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo

Examples:

# Split on ","
 $  awk -F, '{print $1,$2}'   log.txt
 ---------------------------------------------
 2 this is a test
 3 Are you like awk
 This's a test
 10 There are orange apple
 # Or use the built-in variable FS
 $ awk 'BEGIN{FS=","} {print $1,$2}'     log.txt
 ---------------------------------------------
 2 this is a test
 3 Are you like awk
 This's a test
 10 There are orange apple
 # Use several separators: '[ ,]' is a regular expression, so every space and every comma ends a field
 $ awk -F '[ ,]'  '{print $1,$2,$5}'   log.txt
 ---------------------------------------------
 2 this test
 3 Are awk
 This's a
 10 There apple
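The separator '[ ,]' is applied in a single pass: any one space or comma ends a field. The sketch below numbers every field of the last log.txt line (passed inline so the example is self-contained) and then shows an edge case, chosen here for illustration, where a comma followed by a space creates an empty field unless the bracket expression is repeated with +.

```shell
# Number every field when splitting on a space or a comma.
fields=$(echo '10 There are orange,apple,mongo' |
  awk -F '[ ,]' '{ for (i = 1; i <= NF; i++) printf "%d=%s%s", i, $i, (i < NF ? " " : "\n") }')
echo "$fields"     # 1=10 2=There 3=are 4=orange 5=apple 6=mongo

# Adjacent separators: ", " gives an empty field with [ ,] but not with [ ,]+
single=$(echo 'a, b' | awk -F '[ ,]'  '{ print NF }')
run=$(echo 'a, b'    | awk -F '[ ,]+' '{ print NF }')
echo "$single $run"   # 3 2
```

This is why $5 is "apple" in the example above: "orange" is field 4 and the comma after it starts field 5.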

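Operators

Operator                  Description
= += -= *= /= %= ^= **=   assignment
?:                        C-style conditional expression
||                        logical OR
&&                        logical AND
~ !~                      matches / does not match a regular expression
< <= > >= != ==           relational operators
(space)                   string concatenation
+ -                       addition, subtraction
* / %                     multiplication, division, remainder
+ - !                     unary plus, unary minus, logical NOT
^ **                      exponentiation
++ --                     increment, decrement (prefix or postfix)
$                         field reference
in                        array membership

Key point: remember the match operator ~ and the comparisons > and ==.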
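As a quick illustration of a few operators from the table, the sketch below combines a comparison, a regex match, the ?: conditional, and concatenation by juxtaposition. The three input lines and the variable names `size` and `kind` are made up for this example.

```shell
labels=$(printf '2 this\n3 Are\n10 There\n' | awk '{
  size = ($1 > 2) ? "big" : "small"            # relational test inside ?:
  kind = ($2 ~ /re/) ? "has-re" : "no-re"      # ~ tests a regular expression
  print $1 ":" size "-" kind                   # adjacent values are concatenated
}')
echo "$labels"
```

The three output lines are `2:small-no-re`, `3:big-has-re`, and `10:big-has-re`.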

Examples

Select the lines whose first column is greater than 2:

$ awk '$1>2' log.txt    # command
# output
3 Are you like awk
This's a test
10 There are orange,apple,mongo

Select the lines whose first column is greater than 2 and whose second column equals 'Are':

$ awk '$1>2 && $2=="Are" {print $1,$2,$3}' log.txt    # command
# output
3 Are you
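The line "This's a test" passes the first filter even though its first column is not a number. When a field does not look numeric, awk compares it as a string, and "This's" sorts after "2". A minimal sketch of one way to force a numeric comparison is to add 0 to the field; the variable names are assumptions and the log.txt contents are passed inline.

```shell
log="2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo"

# As written in the example: non-numeric fields fall back to string comparison.
as_written=$(echo "$log" | awk '$1 > 2 { print $1 }')

# Adding 0 converts the field to a number first; "This's" becomes 0 and is dropped.
numeric_only=$(echo "$log" | awk '$1 + 0 > 2 { print $1 }')

echo "$as_written"
echo "$numeric_only"
```

The first command prints 3, This's, and 10; the second prints only 3 and 10.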

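Built-in variables

Use -v to assign a variable on the command line.

Common variable   Description
$n                the nth field of the current record; fields are separated by FS
$0                the entire input record
FS                input field separator (default: any run of whitespace)
OFS               output field separator (default: a single space); print puts it between comma-separated arguments
NF                number of fields in the current record
NR                number of records read so far, i.e. the line number, starting at 1
RS                input record separator (default: a newline)
ORS               output record separator (default: a newline)

Other variable    Description
ARGC              number of command-line arguments
ARGIND            index in ARGV of the file currently being processed (counting from 0)
ARGV              array containing the command-line arguments
CONVFMT           conversion format for numbers (default: %.6g)
ENVIRON           associative array of environment variables
ERRNO             description of the most recent system error
FIELDWIDTHS       list of field widths (separated by spaces)
FILENAME          name of the current input file
FNR               record number within the current file, restarting for each file
IGNORECASE        if true, matching ignores case
OFMT              output format for numbers (default: %.6g)
RLENGTH           length of the string matched by the match function
RSTART            start position of the string matched by the match function
SUBSEP            array subscript separator (default: \034)

ARGIND, ERRNO, FIELDWIDTHS, and IGNORECASE are gawk extensions.

awk -v  # set a variable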
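OFS and ORS are easy to mix up, so here is a small sketch with made-up input: OFS is written between the comma-separated arguments of print, and ORS is written at the end of each print. The second half shows that $0 keeps its original spacing until a field is assigned.

```shell
# OFS separates fields in the output; ORS terminates each output record.
joined=$(printf 'a b c\nd e f\n' | awk -v OFS='-' -v ORS=';' '{ print $1, $3 }')
echo "$joined"             # a-c;d-f;

# A bare print emits $0 unchanged; assigning any field rebuilds $0 using OFS.
plain=$(echo 'a b c'   | awk -v OFS=',' '{ print }')
rebuilt=$(echo 'a b c' | awk -v OFS=',' '{ $1 = $1; print }')
echo "$plain / $rebuilt"   # a b c / a,b,c
```

The `$1 = $1` assignment changes nothing in the field itself; it is only there to trigger the rebuild.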

Examples:

seq 1 10|awk 'BEGIN{ ORS=" " }{ print $0 }'
seq 1 10|awk 'BEGIN{ RS="\t" }{ print $0"OK" }'

seq 1 10|awk 'BEGIN{ RS="\n" }{ print $0"OK" }'
1OK
2OK
3OK
4OK
5OK
6OK
7OK
8OK
9OK
10OK

Output of the first two examples:

1 2 3 4 5 6 7 8 9 10 $

1
2
3
4
5
6
7
8
9
10
OK

Examples:

 $ awk -va=1 '{print $1,$1+a}' log.txt
 ---------------------------------------------
 2 3
 3 4
 This's 1
 10 11

awk 'BEGIN{printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS","NF","NR","OFS","ORS","RS";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR,OFS,ORS,RS}'  log.txt
FILENAME ARGC  FNR   FS   NF   NR  OFS  ORS   RS
---------------------------------------------
log.txt    2    1         5    1
log.txt    2    2         5    2
log.txt    2    3         3    3
log.txt    2    4         4    4
$ awk -F\' 'BEGIN{printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS","NF","NR","OFS","ORS","RS";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR,OFS,ORS,RS}'  log.txt
FILENAME ARGC  FNR   FS   NF   NR  OFS  ORS   RS
---------------------------------------------
log.txt    2    1    '    1    1
log.txt    2    2    '    1    2
log.txt    2    3    '    2    3
log.txt    2    4    '    1    4
# Print the overall record number NR and the per-file line number FNR
$ awk '{print NR,FNR,$1,$2,$3}' log.txt
---------------------------------------------
1 1 2 this is
2 2 3 Are you
3 3 This's a test
4 4 10 There are
# Specify the output field separator
$  awk '{print $1,$2,$5}' OFS=" $ "  log.txt
---------------------------------------------
2 $ this $ test
3 $ Are $ awk
This's $ a $
10 $ There $

Regular-expression matching

# For lines whose second column contains "th", print the second and fourth columns
$ awk '$2 ~ /th/ {print $2,$4}' log.txt
---------------------------------------------
this a

~ is the match operator; the pattern goes between the slashes //.

# Print the lines that contain "re"
$ awk '/re/' log.txt
---------------------------------------------
3 Are you like awk
10 There are orange,apple,mongo

Negated match:

awk '$2 !~ /th/ {print $2,$4}' log.txt

Ignoring case

$ awk 'BEGIN{IGNORECASE=1} /this/' log.txt
---------------------------------------------
2 this is a test
This's a test
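IGNORECASE is a gawk extension, so the command above may match only the lowercase line under other awk implementations. A possible portable alternative, sketched here with the log.txt lines passed inline, is to lowercase the record with tolower() before matching; the variable name `matches` is an assumption.

```shell
# Lowercase each record before testing it, so "this" and "This" both match.
matches=$(printf "2 this is a test\n3 Are you like awk\nThis's a test\n" |
  awk 'tolower($0) ~ /this/')
echo "$matches"
```

The record itself is printed unchanged, because tolower() returns a copy and does not modify $0.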

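awk scripts

For awk scripts, two keywords deserve attention: BEGIN and END.

  • BEGIN{ statements to run before any input is read }
  • END { statements to run after all lines have been processed }
  • { statements to run for each input line }

Suppose we have the following file (a table of student scores):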

$ cat score.txt
Marry   2143 78 84 77
Jack    2321 66 78 45
Tom     2122 48 77 71
Mike    2537 87 97 95
Bob     2415 40 57 62

Our awk script is as follows:

$ cat cal.awk
#!/bin/awk -f
# before processing
BEGIN {
    math = 0
    english = 0
    computer = 0
 
    printf "NAME    NO.   MATH  ENGLISH  COMPUTER   TOTAL\n"
    printf "---------------------------------------------\n"
}
# for each line
{
    math+=$3
    english+=$4
    computer+=$5
    printf "%-6s %-6s %4d %8d %8d %8d\n", $1, $2, $3,$4,$5, $3+$4+$5
}
# after processing
END {
    printf "---------------------------------------------\n"
    printf "  TOTAL:%10d %8d %8d \n", math, english, computer
    printf "AVERAGE:%10.2f %8.2f %8.2f\n", math/NR, english/NR, computer/NR
}

Let's look at the result:

$ awk -f cal.awk score.txt
NAME    NO.   MATH  ENGLISH  COMPUTER   TOTAL
---------------------------------------------
Marry  2143     78       84       77      239
Jack   2321     66       78       45      189
Tom    2122     48       77       71      196
Mike   2537     87       97       95      279
Bob    2415     40       57       62      159
---------------------------------------------
  TOTAL:       319      393      350
AVERAGE:     63.80    78.60    70.00

Some more examples

The AWK hello-world program is:

BEGIN { print "Hello, world!" }

Compute the total file size

$ ls -l *.txt | awk '{sum+=$5} END {print sum}'
--------------------------------------------------
666581

Find the lines in a file that are longer than 80 characters

awk 'length>80' log.txt

Print the 9x9 multiplication table

seq 9 | sed 'H;g' | awk -v RS='' '{for(i=1;i<=NF;i++)printf("%dx%d=%d%s", i, NR, i*NR, i==NR?"\n":"\t")}'
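
The one-liner above relies on sed to build growing paragraphs (1, then 1 2, then 1 2 3, and so on) that awk reads as records with RS=''. As a sketch of an alternative that needs no input at all, the same table can be produced with two nested loops in a BEGIN block; the variable names `row`, `col`, and `table` are assumptions.

```shell
# Row r holds the products 1xr .. rxr, separated by tabs.
table=$(awk 'BEGIN {
  for (row = 1; row <= 9; row++)
    for (col = 1; col <= row; col++)
      printf "%dx%d=%d%s", col, row, col * row, (col == row ? "\n" : "\t")
}')
echo "$table"
```

The loop version is longer but easier to adapt, for example to a different table size.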
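
Bonus

awk is effectively a programming language: it supports conditionals and the other constructs that most languages share.
The if-else statement

The following program computes the total pay and the average pay of employees who earn more than $6 an hour. It uses an if to guard against division by zero when computing the average.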

$2 > 6 { n = n + 1; pay = pay + $2 * $3 }
END    { if (n > 0)
            print n, "employees, total pay is", pay,
                     "average pay is", pay/n
         else
             print "no employees are paid more than $6/hour"
        }
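
The next program stores every input line and then uses a while loop to print the lines in reverse order.

    { line[NR] = $0 }  # remember each input line

END { i = NR           # print the lines in reverse order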
      while (i > 0) {
        print line[i]
        i = i - 1
      }
    }
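
To try the if-else and while programs above, they can be wrapped in shell functions and fed a few lines of input. This is only a sketch: the function names `pay_report` and `reverse_lines` and the employee records (name, hourly rate, hours) are made up for illustration.

```shell
# if-else: report on employees paid more than $6/hour, guarding the division.
pay_report() {
  awk '$2 > 6 { n = n + 1; pay = pay + $2 * $3 }
       END    { if (n > 0)
                  print n, "employees, total pay is", pay, "average pay is", pay / n
                else
                  print "no employees are paid more than $6/hour" }'
}

# while: store every line, then walk the array backwards.
reverse_lines() {
  awk '{ line[NR] = $0 }
       END { i = NR
             while (i > 0) { print line[i]; i = i - 1 } }'
}

printf 'Ann 7.00 10\nBob 5.00 20\nCid 8.00 5\n' | pay_report
printf 'Bob 5.00 20\n' | pay_report
printf 'a\nb\nc\n' | reverse_lines
```

The first call reports 2 employees with a total of 110 and an average of 55; the second takes the else branch because no rate exceeds 6.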

Advanced functions

split
echo 'abcd' | awk '{len=split($0,a,"");for(i=1;i<=len;i++)print "a["i"]="a[i];print "length="len}'
# a[1]=a  a[2]=b  a[3]=c  a[4]=d  length=4
awk '{split($2,a,"-");if(a[2]==01){b[$1]+=$4}}END{for(i in b)print i,b[i]}' test.txt 

ipstr="192.168.1.2,192.168.1.3"
awk 'BEGIN{split('"\"$ipstr\""',a,",");for(i in a)print "sa["i"]="a[i]}'

cat config |awk -v FS="\t" '{split($0,a);split(a[1],b,"/"); print $0","b[5]}'
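
All of the split calls above follow the same shape: split(string, array, separator) fills the array starting at index 1 and returns the number of pieces. A minimal sketch with a made-up date string; the names `n`, `d`, and `parts` are assumptions.

```shell
parts=$(echo '2024-01-15' | awk '{
  n = split($0, d, "-")        # n is the number of pieces; d[1] is the first one
  print n, d[1], d[2], d[3]
}')
echo "$parts"    # 3 2024 01 15
```

When the separator argument is omitted, as in split($0,a) above, the current FS is used. Splitting on an empty string to get single characters, as in the first example, is a common extension rather than something POSIX specifies.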


netstat | awk '{printf "%-8s %-8s %-8s %-18s %-22s %-15s\n",$1,$2,$3,$4,$5,$6}'
awk '$3==0 && $6=="LISTEN" || NR==1 '

Examples

https://coolshell.cn/articles/9070.html

https://www.ibm.com/support/knowledgecenter/zh/ssw_aix_72/com.ibm.aix.cmds1/awk.htm

Many function examples

http://linuxcommand.org/lc3_adv_awk.php

Learning // pattern matching

https://www.cnblogs.com/sunada2005/p/3493941.html
