1. Extracting time zones from a file and counting them

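There are three ways to do this. pandas is the usual choice, but collections handles it quickly as well.
1.1 Pure Python: extract and count the time zones
1.2 Pure Python, shortened with collections.Counter
1.3 pandas for processing, matplotlib.pyplot for plotting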

1.1 Pure Python: extract and count the time zones

Extract the time zone from each record in the file and collect them into a list.
Count how many times each time zone appears.
Sort the counts and print the n most frequent time zones.

# Uses Python 3.6

import json

# Extract the time zones from the file, which holds one JSON record per line

path = 'usagov_bitly_data2012-03-16-1331923249.txt'
with open(path) as f:
    records = [json.loads(line) for line in f]
# Skip records that have no 'tz' field
time_zones = [rec['tz'] for rec in records if 'tz' in rec]

# Count how many times each time zone appears

def get_counts(sequence):
    counts = dict()
    for x in sequence:
        counts[x] = counts.get(x,0) + 1
    return counts

counts = get_counts(time_zones)

# Return the n most frequent time zones with their counts, in ascending order

def top_counts(count_dict, n):
    n = int(n)
    # (count, tz) tuples sort by count first, then by time zone name
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    value_key_pairs.sort()
    return value_key_pairs[-n:]

print(top_counts(counts, 3))

# Output
[(400, 'America/Chicago'), (521, ''), (1251, 'America/New_York')]
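
The tuple-sorting approach above works because Python compares tuples element by element, so (count, tz) pairs sort by count first. As a minimal alternative sketch, the code below sorts on the count directly and returns the largest entries first. The helper name top_counts_desc and the sample dictionary are made up for illustration and are not taken from the data file.

```python
def top_counts_desc(count_dict, n=10):
    """Return the n most frequent (tz, count) pairs, largest first."""
    # Sorting items by their count avoids building reversed (count, tz) tuples
    ranked = sorted(count_dict.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Illustrative sample data only, not values from the real file
sample_counts = {'America/Chicago': 4, '': 5, 'America/New_York': 12, 'Asia/Tokyo': 1}
top3 = top_counts_desc(sample_counts, 3)
print(top3)
# [('America/New_York', 12), ('', 5), ('America/Chicago', 4)]
```

One small difference in behavior: slicing with [:n] returns an empty list when n is 0, whereas a slice of the form [-n:] returns the whole list in that case.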

1.2 Pure Python, shortened with collections.Counter

collections.Counter does the counting in a single call, which is very convenient.

import json
from collections import Counter

# Extract the time zones from the file

path = 'usagov_bitly_data2012-03-16-1331923249.txt'
with open(path) as f:
    records = [json.loads(line) for line in f]
time_zones = [rec['tz'] for rec in records if 'tz' in rec]

# Count how many times each time zone appears

counts = Counter(time_zones)

# Print the 3 most frequent time zones with their counts

print(counts.most_common(3))
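
To see what Counter does without the data file, here is a small sketch on a hand-made list. The sample values are illustrative only and do not come from the dataset.

```python
from collections import Counter

# Illustrative sample data only
sample_zones = ['America/New_York', '', 'America/Chicago',
                'America/New_York', '', 'America/New_York']
sample_counter = Counter(sample_zones)

# most_common(n) returns (element, count) pairs, highest count first
top2 = sample_counter.most_common(2)
print(top2)
# [('America/New_York', 3), ('', 2)]

# A key that was never seen counts as 0 instead of raising KeyError
missing = sample_counter['Asia/Tokyo']
print(missing)
# 0
```

Note that most_common returns (element, count) pairs in descending order, while the hand-written top_counts in 1.1 returns (count, tz) pairs in ascending order.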

1.3 pandas for processing, matplotlib.pyplot for plotting

# Input, uses Python 3.6

import json
import pandas as pd
import matplotlib.pyplot as plt

path = 'usagov_bitly_data2012-03-16-1331923249.txt'
with open(path) as f:
    records = [json.loads(line) for line in f]

# Count how many times each time zone appears
frame = pd.DataFrame(records)
clean_tz = frame['tz'].fillna('Missing')   # records without a 'tz' field
clean_tz[clean_tz == ''] = 'Unknown'       # records with an empty 'tz' value
tz_counts = clean_tz.value_counts()
print(tz_counts[:10])

# Plot the top 10 as a horizontal bar chart and show it
tz_counts[:10].plot(kind='barh', rot=0)
plt.show()

# Output 
America/New_York       1251
Unknown                 521
America/Chicago         400
America/Los_Angeles     382
America/Denver          191
Missing                 120
Europe/London            74
Asia/Tokyo               37
Pacific/Honolulu         36
Europe/Madrid            35
Name: tz, dtype: int64

[Figure pandas-timezone.png: horizontal bar chart of the ten most frequent time zones]
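
The two cleaning steps in the pandas code treat missing values and empty strings separately. The following sketch shows the difference on a hand-made Series with illustrative values. It uses Series.replace in place of the boolean-mask assignment, which is a swapped-in alternative and not the code shown above.

```python
import pandas as pd

# Illustrative sample data only: None stands for a record without a 'tz' field
sample_tz = pd.Series(['America/New_York', '', None,
                       'America/New_York', '', 'America/New_York'])

# Label absent values 'Missing' and empty strings 'Unknown'
clean = sample_tz.fillna('Missing').replace('', 'Unknown')

# value_counts() sorts by frequency, highest first
sample_counts = clean.value_counts()
print(sample_counts)
```

Keeping the two labels apart makes it possible to tell records that lack the field from records where the field is present but empty.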

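Takeaways:

To pull values out and collect them into a list, use a list comprehension ([ ]) with a simple loop and condition inside.
Put reusable code into functions so it is easy to call.
If you have not used collections before, see my summary: How to use the Python 3 collections module/library, Container datatypes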
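
The first point can be illustrated with a short sketch that compares a list comprehension with its equivalent explicit loop. The sample records are made up for illustration.

```python
# Illustrative sample data only
sample_records = [{'tz': 'Asia/Tokyo'}, {'a': 'no tz here'}, {'tz': ''}]

# List comprehension: the loop and the condition sit in one expression
zones = [rec['tz'] for rec in sample_records if 'tz' in rec]

# Equivalent explicit loop
zones_loop = []
for rec in sample_records:
    if 'tz' in rec:
        zones_loop.append(rec['tz'])

print(zones)
# ['Asia/Tokyo', '']
```

The record without a 'tz' key is skipped, while the record with an empty 'tz' value is kept, which is why empty strings show up in the counts.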

References:

Python for Data Analysis, Wes McKinney
The example code is on GitHub.
https://github.com/wesm/pydata-book
You can download a zip archive to read locally, or fetch it with git clone.
pydata-book-2nd-edition.zip
