Copyright notice: this is the author's original article. It may be reposted freely, provided the source is clearly credited.
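The previous two articles covered the basic pandas data structures Series and DataFrame. There is also a Panel structure, but it is rarely used and is not covered here. This article introduces the basic functions in pandas.
add, sub, mul, div
The key concept here is the axis. The official explanation: axes are defined for arrays with more than one dimension. A two-dimensional array has two axes: axis 0 runs vertically downward across the rows, and axis 1 runs horizontally across the columns.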
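As a minimal sketch of that definition, the snippet below sums a small NumPy array along each axis. The array and the variable names (arr, col_sums, row_sums) are made up for illustration and do not come from the original article.

```python
import numpy as np

# A small 2-D array: 2 rows, 3 columns (illustrative values).
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# axis=0 runs down the rows, so the rows are collapsed:
# one result per column.
col_sums = arr.sum(axis=0)

# axis=1 runs across the columns, so the columns are collapsed:
# one result per row.
row_sums = arr.sum(axis=1)

print(col_sums.tolist())  # [5, 7, 9]
print(row_sums.tolist())  # [6, 15]
```

The axis argument names the direction along which the operation travels, not the kind of result it returns, which is why axis=0 yields one value per column.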
Consider the following example.
import pandas as pd
import numpy as np
data = {'one':{'a':1, 'b':2, 'c':3}, 'two':{'a':4, 'b':5, 'c':6, 'd':7}, 'three':{'b':8, 'c':9, 'd':10}}
df = pd.DataFrame(data)
print(df)
row = df.iloc[1]
print(df.sub(row, axis=1))
OUT:
   one  three  two
a  1.0    NaN    4
b  2.0    8.0    5
c  3.0    9.0    6
d  NaN   10.0    7
   one  three  two
a -1.0    NaN -1.0
b  0.0    0.0  0.0
c  1.0    1.0  1.0
d  NaN    2.0  2.0
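By definition, axis 1 runs horizontally across the columns, so row b (one=2.0, three=8.0, two=5.0) is matched against the column labels and subtracted from every row. For column one this gives 1.0 - 2.0 = -1.0; 2.0 - 2.0 = 0.0; 3.0 - 2.0 = 1.0; NaN - 2.0 = NaN. Now look at the result with axis=0: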
import pandas as pd
import numpy as np
data = {'one':{'a':1, 'b':2, 'c':3}, 'two':{'a':4, 'b':5, 'c':6, 'd':7}, 'three':{'b':8, 'c':9, 'd':10}}
df = pd.DataFrame(data)
print(df)
col = df['one']
print(df.sub(col, axis=0))
OUT:
   one  three  two
a  1.0    NaN    4
b  2.0    8.0    5
c  3.0    9.0    6
d  NaN   10.0    7
   one  three  two
a  0.0    NaN  3.0
b  0.0    6.0  3.0
c  0.0    6.0  3.0
d  NaN    NaN  NaN
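One way to cross-check the two sub calls is to rebuild each result by other means. The sketch below assumes the same df as in the examples; the names by_columns, by_rows and manual are illustrative only.

```python
import pandas as pd

data = {'one': {'a': 1, 'b': 2, 'c': 3},
        'two': {'a': 4, 'b': 5, 'c': 6, 'd': 7},
        'three': {'b': 8, 'c': 9, 'd': 10}}
df = pd.DataFrame(data)

row = df.iloc[1]   # Series indexed by the column labels
col = df['one']    # Series indexed by the row labels

# axis=1: the Series index is matched against the column labels,
# which is also what the plain `-` operator does.
by_columns = df.sub(row, axis=1)
same_as_operator = by_columns.equals(df - row)

# axis=0: the Series index is matched against the row labels,
# so `col` is subtracted from every column in turn.
by_rows = df.sub(col, axis=0)
manual = pd.DataFrame({name: df[name] - col for name in df.columns})
same_as_manual = by_rows.equals(manual)

print(same_as_operator, same_as_manual)
```

Both checks are expected to print True: axis only decides which set of labels the Series is aligned with.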
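By definition, axis 0 runs vertically downward across the rows, so column one is matched against the row labels and subtracted from every column. For row a this gives 1.0 - 1.0 = 0.0; NaN - 1.0 = NaN; 4.0 - 1.0 = 3.0. Many people simply memorize that axis=1 means rows and axis=0 means columns. If that is how you remember it, the next case will confuse you.
drop, mean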
import pandas as pd
import numpy as np
data = {'one':{'a':1, 'b':2, 'c':3}, 'two':{'a':4, 'b':5, 'c':6, 'd':7}, 'three':{'b':8, 'c':9, 'd':10}}
df = pd.DataFrame(data)
print(df)
print(df.drop('one', axis=1))
OUT:
   one  three  two
a  1.0    NaN    4
b  2.0    8.0    5
c  3.0    9.0    6
d  NaN   10.0    7
   three  two
a    NaN    4
b    8.0    5
c    9.0    6
d   10.0    7
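Here a column is deleted, yet axis is 1. If you had memorized 1 as meaning a row, you would expect only the value 1.0 to be deleted. What axis=1 actually means is that the label 'one' is looked up along the horizontal axis, that is, among the column labels, so the value under that label is removed from every row. To delete a row instead, use df.drop('a', axis=0). The same reasoning applies to mean: df.mean(axis=0) averages down the rows and returns one value per column.
radd, rsub, rmul, rdiv
add, sub, mul and div use the DataFrame as the left operand: df.sub(row) computes df - row. radd, rsub, rmul and rdiv reverse the operands: df.rsub(row) computes row - df.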
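The following sketch puts drop and mean side by side on the same df. The variable names are illustrative, and the comments spell out the arithmetic by hand (mean skips NaN by default).

```python
import pandas as pd

data = {'one': {'a': 1, 'b': 2, 'c': 3},
        'two': {'a': 4, 'b': 5, 'c': 6, 'd': 7},
        'three': {'b': 8, 'c': 9, 'd': 10}}
df = pd.DataFrame(data)

# axis=0: 'a' is looked up among the row labels.
without_row_a = df.drop('a', axis=0)
# axis=1: 'one' is looked up among the column labels.
without_col_one = df.drop('one', axis=1)

print(list(without_row_a.index))        # ['b', 'c', 'd']
print(sorted(without_col_one.columns))  # ['three', 'two']

# axis=0: average down the rows, one value per column.
# 'one' -> (1 + 2 + 3) / 3 = 2.0, 'two' -> (4 + 5 + 6 + 7) / 4 = 5.5
col_means = df.mean(axis=0)

# axis=1: average across the columns, one value per row.
# 'a' -> (1 + 4) / 2 = 2.5, 'd' -> (7 + 10) / 2 = 8.5
row_means = df.mean(axis=1)

print(col_means)
print(row_means)
```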
import pandas as pd
import numpy as np
data = {'one':{'a':1, 'b':2, 'c':3}, 'two':{'a':4, 'b':5, 'c':6, 'd':7}, 'three':{'b':8, 'c':9, 'd':10}}
df = pd.DataFrame(data)
print(df)
row = df.iloc[1]
print(df.rsub(row, axis=1))
OUT:
   one  three  two
a  1.0    NaN    4
b  2.0    8.0    5
c  3.0    9.0    6
d  NaN   10.0    7
   one  three  two
a  1.0    NaN  1.0
b  0.0    0.0  0.0
c -1.0   -1.0 -1.0
d  NaN   -2.0 -2.0
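To make the operand order explicit, this sketch compares rsub with the plain - operator written the other way round. It reuses the df from the example; forward and reverse are illustrative names.

```python
import pandas as pd

data = {'one': {'a': 1, 'b': 2, 'c': 3},
        'two': {'a': 4, 'b': 5, 'c': 6, 'd': 7},
        'three': {'b': 8, 'c': 9, 'd': 10}}
df = pd.DataFrame(data)
row = df.iloc[1]

forward = df.sub(row, axis=1)    # df - row
reverse = df.rsub(row, axis=1)   # row - df

# rsub gives the same frame as writing the subtraction with
# the Series on the left-hand side.
print(reverse.equals(row - df))  # True

# Same cell, opposite sign: 1 - 2 versus 2 - 1.
print(forward.loc['a', 'one'], reverse.loc['a', 'one'])
```

The reversed variants matter mostly for the non-commutative operations (sub and div); radd and rmul return the same values as add and mul for numeric data.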
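Filling missing data
Missing data can be filled either with the fill_value option of these functions or with the fillna function. Note that with fill_value, if both DataFrames hold NaN at the same position, the result there stays NaN and is not replaced by the value you specify.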
import pandas as pd
import numpy as np
data = {'one':{'a':1, 'b':2, 'c':3}, 'two':{'a':4, 'b':5, 'c':6, 'd':7}, 'three':{'b':8, 'c':9, 'd':10}}
df = pd.DataFrame(data)
print(df)
print(df.add(df, fill_value=0.0))
OUT:
   one  three  two
a  1.0    NaN    4
b  2.0    8.0    5
c  3.0    9.0    6
d  NaN   10.0    7
   one  three  two
a  2.0    NaN    8
b  4.0   16.0   10
c  6.0   18.0   12
d  NaN   20.0   14
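As the result shows, the NaN values are still not replaced by 0.0, because both operands are NaN at those positions. Calling fillna on the result avoids this: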
import pandas as pd
import numpy as np
data = {'one':{'a':1, 'b':2, 'c':3}, 'two':{'a':4, 'b':5, 'c':6, 'd':7}, 'three':{'b':8, 'c':9, 'd':10}}
df = pd.DataFrame(data)
print(df)
print(df.add(df).fillna(0))
OUT:
   one  three  two
a  1.0    NaN    4
b  2.0    8.0    5
c  3.0    9.0    6
d  NaN   10.0    7
   one  three  two
a  2.0    0.0    8
b  4.0   16.0   10
c  6.0   18.0   12
d  0.0   20.0   14
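Because df.add(df, ...) has NaN on both sides at every missing position, the examples above never show fill_value actually filling anything. The sketch below uses two small made-up frames (left and right are hypothetical, not from the original) in which one value is missing on one side only.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'x': [1.0, np.nan, np.nan]}, index=['a', 'b', 'c'])
right = pd.DataFrame({'x': [10.0, 20.0, np.nan]}, index=['a', 'b', 'c'])

# Without fill_value, any NaN operand makes the result NaN.
plain = left + right

# 'b' is missing only on the left, so it is treated as 0.0 there.
# 'c' is missing on both sides, so the result stays NaN.
filled = left.add(right, fill_value=0.0)

# fillna replaces whatever NaN is left after the addition.
cleaned = filled.fillna(0.0)

print(plain)
print(filled)
print(cleaned)
```

With these inputs, row b is NaN in plain but 20.0 in filled, while row c only becomes 0.0 after fillna.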
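Use equals to check whether two DataFrames are equal
Do not use == to check whether two DataFrames are equal. == compares element by element and returns a DataFrame of booleans rather than a single True or False, and because NaN never compares equal to NaN, every position holding NaN comes out False.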
import pandas as pd
import numpy as np
data = {'one':{'a':1, 'b':2, 'c':3}, 'two':{'a':4, 'b':5, 'c':6, 'd':7}, 'three':{'b':8, 'c':9, 'd':10}}
df = pd.DataFrame(data)
print(df)
print(df + df == df * 2)
print('----------------------')
print((df + df).equals(df * 2))
OUT:
   one  three  two
a  1.0    NaN    4
b  2.0    8.0    5
c  3.0    9.0    6
d  NaN   10.0    7
     one  three   two
a   True  False  True
b   True   True  True
c   True   True  True
d  False   True  True
----------------------
True
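The False entries in the == result come from NaN, not from differing values. A minimal sketch on a small made-up frame (small and elementwise are illustrative names) shows this, along with what happens if the boolean frame is reduced to a single value.

```python
import numpy as np
import pandas as pd

small = pd.DataFrame({'one': [1.0, np.nan], 'two': [4, 7]}, index=['a', 'd'])

# NaN never compares equal to itself, not even in plain Python.
print(np.nan == np.nan)  # False

# == returns one boolean per cell; the NaN cell is False.
elementwise = (small + small == small * 2)
print(elementwise)

# Reducing the booleans to a single value is therefore False ...
print(elementwise.all().all())  # False

# ... whereas equals treats NaN in the same position as equal.
print((small + small).equals(small * 2))  # True
```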
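Note that for both Series and DataFrame, equals only reports True when the index order is the same as well; two structures holding the same values under differently ordered indexes are not considered equal.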
import pandas as pd
import numpy as np
# data = {'one':{'a':1, 'b':2, 'c':3}, 'two':{'a':4, 'b':5, 'c':6, 'd':7}, 'three':{'b':8, 'c':9, 'd':10}}
df1 = pd.DataFrame({'col':['foo', 0, np.nan]})
df2 = pd.DataFrame({'col':[np.nan, 0, 'foo']}, index=[2,1,0])
print(df1.equals(df2))
print(df1.equals(df2.sort_index()))
OUT:
False
True
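The same behaviour can be sketched for Series, which the example above does not cover. The two Series s1 and s2 are made up for illustration; they hold the same label/value pairs in a different order.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([3, 2, 1], index=['c', 'b', 'a'])

# Same label/value pairs, but the index order differs.
print(s1.equals(s2))                    # False

# Sorting the index, or reindexing to the other Series' order,
# lines the two up before comparing.
print(s1.equals(s2.sort_index()))       # True
print(s1.equals(s2.reindex(s1.index)))  # True
```

reindex is the more general choice when the target order is not the sorted one.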
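pandas also provides many statistical functions and the like, which are not covered one by one here. In any case, hands-on practice never hurts. The most important point of this chapter is to understand the concept of axis.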