pandas Basics

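pandas borrows most of its design ideas from NumPy, but unlike NumPy it is better suited to tabular, heterogeneous data, whereas NumPy works with homogeneous numeric arrays. pandas also works seamlessly with NumPy, SciPy, statsmodels, scikit-learn, matplotlib, and other packages, which together make up the Python data analysis ecosystem.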

The two main data structures in pandas are Series and DataFrame, which roughly correspond to vector and data.frame in R. The mind map is as follows:

DataFrame basic functionality

Study notes:

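pandas Index objects hold the axis labels and other metadata. An Index is immutable, so it can be shared safely between data structures.
reindex does not modify the original index; it returns a new object conformed to the new index, with missing values introduced for any labels that were not present before.
Operations that reshape a DataFrame or Series, or delete data from it, return a new object by default. Passing inplace=True avoids returning a new object, but it usually destroys the original data.
Be clear about the difference between loc and iloc. If the index of your Series or DataFrame contains integers, compare obj[:1], obj.loc[:1], and obj.iloc[:1] carefully.
Sorting and ranking (sort and rank) look similar, since ranking first sorts and then assigns positions; pay attention to how tied values are handled.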
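To make the last two notes concrete, here is a minimal sketch on made-up data. The names `obj`, `by_label`, `by_position`, and `scores` are hypothetical and exist only for this illustration.

```python
import pandas as pd

# A Series whose index labels happen to be integers (illustrative data)
obj = pd.Series(["a", "b", "c"], index=[0, 1, 2])

by_label = obj.loc[:1]      # label-based slice: the end label 1 is included
by_position = obj.iloc[:1]  # position-based slice: the end position 1 is excluded

print(list(by_label))     # ['a', 'b']
print(list(by_position))  # ['a']

# Ranking with ties: the two 7s share ranks 2 and 3
scores = pd.Series([7, 7, 5])
avg_rank = scores.rank()              # default method='average' -> 2.5, 2.5, 1.0
min_rank = scores.rank(method="min")  # ties take the lowest rank -> 2.0, 2.0, 1.0
```

With loc the slice endpoint is a label and is included; with iloc it is a position and is excluded, which is why the two results differ in length.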

The most important part is descriptive statistics, which relies on the existing methods:

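Method	Description
describe	Summary statistics per column, including quartiles
max, min	Maximum and minimum values
idxmin, idxmax	Index labels of the minimum and maximum values
quantile	Sample quantile
sum	Sum
mean	Mean
median	Median
mad	Mean absolute deviation from the mean (removed in pandas 2.0)
var	Variance
std	Standard deviation
skew	Sample skewness (third moment)
kurt	Sample kurtosis (fourth moment)
cumsum	Cumulative sum
cummin, cummax	Cumulative minimum and maximum
cumprod	Cumulative product
diff	First-order difference
pct_change	Percent change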
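A few of the methods in the table, sketched on a tiny made-up DataFrame. The data and the names `df_demo`, `col_means`, and so on are assumptions for illustration only.

```python
import pandas as pd

# Small illustrative frame; the values are invented for this example
df_demo = pd.DataFrame({"x": [1.0, 2.0, 4.0], "y": [10.0, 30.0, 20.0]})

col_means = df_demo.mean()             # per-column mean: y -> 20.0
largest_at = df_demo.idxmax()          # index label of each column's maximum: x -> 2, y -> 1
running_total = df_demo["x"].cumsum()  # 1.0, 3.0, 7.0
step_change = df_demo["x"].diff()      # NaN, 1.0, 2.0
pct = df_demo["x"].pct_change()        # NaN, 1.0, 1.0 (i.e. +100% each step)

print(df_demo.describe())              # count, mean, std, min, quartiles, max
```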

Tutorial from the official documentation

This part is a translation of 10 Minutes to pandas from the official documentation. If you have any questions, feel free to leave a comment.

This article is a short introduction to pandas that gives newcomers a feel for some of its features; you can find more detailed material in the Cookbook.

Before running the operations below, import the required libraries:

import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt

Creating objects

Create a Series by passing a list of values; pandas creates a default integer index for it.

s = pd.Series([1,3,5,np.nan,6,8])

Create a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))

Create a DataFrame by passing a dict:

df2 = pd.DataFrame({ 'A' : 1.,
                   'B' : pd.Timestamp('20130102'),
                   'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
                   'D' : np.array([3] * 4,dtype='int32'),
                   'E' : pd.Categorical(["test","train","test","train"]),
                   'F' : 'foo' })

DataFrame and Series have many attributes; you can use IPython's tab completion to browse them:

df2.<TAB>
    df2.A                  df2.boxplot
    df2.abs                df2.C
    df2.add                df2.clip
    df2.add_prefix         df2.clip_lower
    df2.add_suffix         df2.clip_upper
    df2.align              df2.columns
    df2.all                df2.combine
    df2.any                df2.combineAdd
    df2.append             df2.combine_first
    df2.apply              df2.combineMult
    df2.applymap           df2.compound
    df2.as_blocks          df2.consolidate
    df2.asfreq             df2.convert_objects
    df2.as_matrix          df2.copy
    df2.astype             df2.corr
    df2.at                 df2.corrwith
    df2.at_time            df2.count
    df2.axes               df2.cov
    df2.B                  df2.cummax
    df2.between_time       df2.cummin
    df2.bfill              df2.cumprod
    df2.blocks             df2.cumsum
    df2.bool               df2.D

Viewing data

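If you have tens of thousands of rows, printing them all would flood the screen, so the best approach is to look at only the first or last few rows and check that the data was built correctly.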

df.head()
df.tail(3)

Display the index, the columns, and the underlying NumPy data:

df.index
df.columns
df.values

Get a quick statistical summary of the data; only numeric columns (int, float, and the like) are summarized:

df.describe()

Descriptive statistics

Transpose the data:
```
df.T
```
Sort along an axis (ascending=True sorts in ascending order; here the column labels are sorted in descending order):
```
df.sort_index(axis=1,ascending=False)
```
Sort by values, similar to sorting in Excel:
```
df.sort_values(by='B')
```
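Because the tutorial's df is random, the effect of sorting is easier to see on fixed data. A minimal sketch, assuming a hypothetical frame named `frame` with invented values:

```python
import pandas as pd

# Illustrative data with known values so the sorted order is predictable
frame = pd.DataFrame({"A": [3, 1, 2], "B": [0.5, 0.9, 0.1]},
                     index=["r1", "r2", "r3"])

by_b = frame.sort_values(by="B")                       # rows ordered by column B
cols_desc = frame.sort_index(axis=1, ascending=False)  # column labels in descending order

print(list(by_b.index))         # ['r3', 'r1', 'r2']
print(list(cols_desc.columns))  # ['B', 'A']
```

Both calls return new objects; `frame` itself keeps its original order.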

Selection

Getting started

Select a single column, which yields a Series; this is equivalent to df.A:
```
df['A']
```
Slice rows with []:
```
df[0:3]
```

Selection by label

Get a cross section using a label:
```
df.loc[dates[0]]
```
Select on multiple axes by label:
```
df.loc[:,['A','B']]
```

Slice by label; both endpoints are included:

df.loc['20130102':'20130104',['A','B']]

Get a scalar value:
```
df.loc[dates[0],'A']
```
Fast access to a scalar (equivalent to the previous method):
```
df.at[dates[0],'A']
```

Selection by position

Select by the position of the integer passed in:
```
df.iloc[3]
```
Select by integer slices:
```
df.iloc[3:5,2:3]
```
Select by lists of integer positions:
```
df.iloc[[1,2,4],[0,2]]
```
Slice rows or columns:
```
df.iloc[1:3,:]
df.iloc[:,1:3]
```
Get a specific value:
```
df.iloc[1,1]
```
Fast access to a scalar (same result as the previous one):
```
df.iat[1,1]
```

Boolean indexing

Use the values of a single column to select data:
```
df[df.A > 0]
```

Filter with the isin() method; the example below selects the rows whose value in column E is 'two' or 'four':

df2 = df.copy()
df2['E'] = ['one', 'one','two','three','four','three']
df2[df2['E'].isin(['two','four'])]
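The same two filters on fixed data, so the selected rows are predictable. This is only a sketch; `frame`, `positive`, `picked`, and `both` are hypothetical names and the values are invented.

```python
import pandas as pd

frame = pd.DataFrame({"A": [-1.0, 0.5, 2.0, -0.3],
                      "E": ["one", "two", "three", "four"]})

positive = frame[frame["A"] > 0]                   # rows where A is positive
picked = frame[frame["E"].isin(["two", "four"])]   # rows where E is 'two' or 'four'

# Conditions combine with & and |; each comparison needs its own parentheses
both = frame[(frame["A"] > 0) & frame["E"].isin(["two", "four"])]

print(list(positive.index))  # [1, 2]
print(list(picked.index))    # [1, 3]
print(list(both.index))      # [1]
```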

Setting values

Assigning a Series as a new column automatically aligns it with the existing data by index label:

s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20130102', periods=6))
df['F']=s1
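What the alignment does is easiest to see on a smaller case. In this sketch (the names `frame` and `shifted` and the values are made up), the Series starts one day later than the frame:

```python
import pandas as pd

dates = pd.date_range("20130101", periods=3)
frame = pd.DataFrame({"A": [1.0, 2.0, 3.0]}, index=dates)

# The Series index is shifted by one day relative to the frame's index
shifted = pd.Series([10, 20, 30], index=pd.date_range("20130102", periods=3))
frame["F"] = shifted

# 2013-01-01 has no matching label in the Series -> NaN
# 2013-01-04 exists only in the Series           -> dropped
print(frame)
```

Values are matched by label, not by position, so the first row of F is missing and the last value of the Series is discarded.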

Set a value by label:
```
df.at[dates[0],'A'] = 0
```
Set a value by position:
```
df.iat[0,1] = 0
```
Set values by assigning a NumPy array:
```
df.loc[:,'D'] = np.array([5] * len(df))
```

Missing data

pandas primarily uses np.nan to represent missing data. Missing values are excluded from computations by default.

Reindexing lets you change, add, or delete the index on a specified axis. It returns a copy of the data.
```
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
df1.loc[dates[0]:dates[1],'E'] = 1
```
Drop any rows that contain missing values:
```
df1.dropna(how='any')
```
Fill missing values:
```
df1.fillna(value=5)
```
Get a boolean mask showing which values are missing:
```
pd.isnull(df1)
```
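A self-contained sketch of the same missing-data calls on fixed values (the frame and the variable names are assumptions for illustration):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"A": [1.0, np.nan, 3.0],
                      "B": [4.0, 5.0, np.nan]})

complete_rows = frame.dropna(how="any")  # keeps only row 0, the one row with no NaN
filled = frame.fillna(value=5)           # every NaN replaced by 5
mask = frame.isna()                      # True where a value is missing
mean_a = frame["A"].mean()               # NaN is skipped: (1 + 3) / 2 = 2.0

print(list(complete_rows.index))  # [0]
print(mean_a)                     # 2.0
```

None of these calls changes `frame`; each returns a new object.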

Operations

Statistics

Descriptive statistics: df.mean() computes the mean of each column, and df.mean(1) computes the mean of each row.

apply

Apply a function to the data:

df.apply(np.cumsum)
df.apply(lambda x: x.max() - x.min())
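By default apply passes each column to the function; axis=1 passes each row instead. A small sketch with invented numbers (`frame`, `col_range`, and `row_range` are hypothetical names):

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2, 3], "B": [10, 40, 20]})

# Range (max - min) of each column
col_range = frame.apply(lambda x: x.max() - x.min())
# Range (max - min) of each row
row_range = frame.apply(lambda x: x.max() - x.min(), axis=1)

print(col_range.to_dict())  # {'A': 2, 'B': 30}
print(list(row_range))      # [9, 38, 17]
```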

Histogramming

s = pd.Series(np.random.randint(0, 7, size=10))
s.value_counts()
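With fixed values instead of random ones, the result of value_counts() is easy to check by hand. The Series below is made up for this sketch:

```python
import pandas as pd

values = pd.Series([1, 3, 3, 5, 3, 1])
counts = values.value_counts()  # index = distinct values, data = how often each occurs

# Sorted by count, most frequent first
print(counts.to_dict())  # {3: 3, 1: 2, 5: 1}
```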

Merging

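pandas provides several convenient ways to combine Series and DataFrame objects. (The original tutorial also lists Panel, which has since been removed from pandas.)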

Concat

Concatenate pandas objects with concat():

df = pd.DataFrame(np.random.randn(10, 4))
pieces = [df[:3], df[3:7], df[7:]]
pd.concat(pieces)

JOIN

SQL-style merges:

left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})
pd.merge(left, right, on='key')
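Note that the key 'foo' appears twice on each side, so the merge pairs every left row with every right row that shares the key. A runnable sketch of that example, with a hypothetical `merged` variable added to capture the result:

```python
import pandas as pd

left = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})

# 2 matching rows on the left x 2 on the right -> 4 rows in the result
merged = pd.merge(left, right, on="key")

print(len(merged))  # 4
print(sorted(zip(merged["lval"], merged["rval"])))  # [(1, 4), (1, 5), (2, 4), (2, 5)]
```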

Append

Append rows to a DataFrame:

df = pd.DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])
s = df.iloc[3]
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([df, s.to_frame().T], ignore_index=True)

Grouping

By 'group by' we mean a process involving one or more of the following steps:

Splitting the data into groups based on some criteria
Applying a function to each group independently
Combining the results into a data structure

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three',
                          'two', 'two', 'one', 'three'],
                   'C' : np.random.randn(8),
                   'D' : np.random.randn(8)})

Group the data and apply sum() to the resulting groups:
```
df.groupby('A').sum()
```
Grouping by multiple columns forms a hierarchical index, and again we can apply the function:
```
df.groupby(['A','B']).sum()
```
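The split-apply-combine steps are easier to follow with fixed numbers. In this sketch the frame and its values are invented, and only column C is summed to keep the output small:

```python
import pandas as pd

frame = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"],
                      "B": ["one", "one", "two", "one"],
                      "C": [1, 2, 3, 4]})

by_a = frame.groupby("A")["C"].sum()          # one group per value of A
by_ab = frame.groupby(["A", "B"])["C"].sum()  # one group per (A, B) pair

print(by_a.to_dict())   # {'bar': 6, 'foo': 4}
print(by_ab.to_dict())  # {('bar', 'one'): 6, ('foo', 'one'): 1, ('foo', 'two'): 3}
```

The second result is indexed by a two-level MultiIndex, which is the hierarchical index the text refers to.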

Reshaping

Stack

tuples = list(zip(*[['bar', 'bar', 'baz', 'baz',
                     'foo', 'foo', 'qux', 'qux'],
                    ['one', 'two', 'one', 'two',
                     'one', 'two', 'one', 'two']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])
df2 = df[:4]
df2

stack

The stack() method "compresses" a level in the DataFrame's columns.
```
stacked = df2.stack()
```
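A minimal sketch of what stack() produces, using a two-row frame with made-up integer values; unstack() is shown as the inverse operation. The names `frame`, `stacked`, and `restored` are hypothetical.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([("bar", "one"), ("bar", "two")],
                                  names=["first", "second"])
frame = pd.DataFrame([[1, 2], [3, 4]], index=index, columns=["A", "B"])

# The column labels A/B become the innermost level of the row index
stacked = frame.stack()
# unstack() moves the innermost index level back into the columns
restored = stacked.unstack()

print(stacked.index.nlevels)             # 3
print(stacked.loc[("bar", "one", "B")])  # 2
```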

Pivot tables

df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 3,
                   'B' : ['A', 'B', 'C'] * 4,
                   'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
                   'D' : np.random.randn(12),
                   'E' : np.random.randn(12)})

We can easily build a pivot table from this data:

pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
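A smaller sketch with fixed values shows what ends up in each cell. The frame below is invented; pivot_table aggregates with the mean by default, and combinations that never occur are left as NaN.

```python
import pandas as pd

frame = pd.DataFrame({"A": ["one", "one", "two", "two"],
                      "C": ["foo", "bar", "foo", "foo"],
                      "D": [1.0, 2.0, 3.0, 5.0]})

# Rows come from A, columns from C, cells are the mean of D
table = pd.pivot_table(frame, values="D", index="A", columns="C")

print(table.loc["two", "foo"])  # 4.0, the mean of 3.0 and 5.0
print(table.loc["two", "bar"])  # nan, no such rows in the data
```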

Time series

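pandas has simple, powerful, and efficient functionality for resampling during frequency conversion (for example, converting per-second data into 5-minute data). This is very common in, but not limited to, financial applications.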

rng = pd.date_range('1/1/2012', periods=100, freq='s')
ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
ts.resample('5min').sum()
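To see the bucketing at work, here is a sketch with a constant series and one-minute bins instead of random data and five-minute bins (both choices are assumptions made so the sums can be checked by hand):

```python
import pandas as pd

# 100 one-second observations, each equal to 1
rng = pd.date_range("2012-01-01", periods=100, freq="s")
ticks = pd.Series(1, index=rng)

# Sum within each one-minute bucket: 60 seconds fall in the first, 40 in the second
per_minute = ticks.resample("1min").sum()

print(list(per_minute))  # [60, 40]
```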

Plotting

It is best to do plotting in the browser-based Jupyter notebook, which avoids unnecessary trouble.

Plotting a Series:

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot()

Plotting a DataFrame:

df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
                  columns=['A', 'B', 'C', 'D'])
df = df.cumsum()
plt.figure(); df.plot(); plt.legend(loc='best')

SOTON Private Customization: Python for Data Analysis (Learning pandas)
