Python Data Science (3) - Python and Data Science Applications (Ⅲ)


1. Counting the words in a text with Python

speech_text = '''
  I love you,Not for what you are,But for what I amWhen I am with you.I love you,Not
 only for whatYou have made of yourself,But for whatYou are making of me.I love
 youFor the part of meThat you bring out;I love youFor putting your handInto my
 heaped-up heartAnd passing overAll the foolish, weak thingsThat you can’t
 helpDimly seeing there,And for drawing outInto the lightAll the beautiful
 belongingsThat no one else had lookedQuite far enough to find.I love you because
 youAre helping me to makeOf the lumber of my lifeNot a tavernBut a temple;Out of 
the worksOf my every dayNot a reproachBut a song.I love youBecause you have
 doneMore than any creedCould have doneTo make me goodAnd more than any
 fateCould have doneTo make me happy.You have done itWithout a touch,Without a
 word,Without a sign.You have done itBy being yourself.Perhaps that is whatBeing a 
friend means,After all.
'''

speech = speech_text.split()

dic = {}
for word in speech:
    if word not in dic:
        dic[word]=1
    else:
        dic[word]=dic[word] + 1


dic.items()

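When I started using nltk it kept raising errors because its data files were missing. The following two lines open the NLTK downloader so the data can be installed: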

import nltk
nltk.download()

A downloader window pops up, from which the NLTK data can be downloaded.



If the download completes this way, skip the next step.

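I tried many times and the download failed every time, so here is a second method.
Download a prepackaged archive directly. Download link 1: cloud drive, password znx7. Extract the downloaded nltk_data.zip to the root of drive C; this is the safest choice and prevents "package not found" errors. Download link 2: cloud drive, password 4cp3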

Thanks to [V_can -- Python and Natural Language Processing, Part 1: Getting Started with NLTK, Environment Setup] for providing the package.
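The manual route above can also be sketched in code. This is only a minimal sketch, not the procedure used in this post: the helper name `unpack_nltk_data` and both paths are hypothetical, and it assumes the archive contains a top-level `nltk_data` folder. The commented usage relies on `nltk.data.path`, the list of directories NLTK searches for its data.

```python
import zipfile
from pathlib import Path


def unpack_nltk_data(archive, target_dir):
    """Extract a prepackaged NLTK data archive into target_dir.

    Both arguments are placeholders; the post suggests the root of
    drive C as the target on Windows.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    # Assumption: the archive holds a top-level "nltk_data" folder.
    return target / "nltk_data"


# Usage (paths are illustrative):
# data_dir = unpack_nltk_data("nltk_data.zip", "C:/")
# import nltk
# nltk.data.path.append(str(data_dir))  # only needed if NLTK does not find the folder by itself
```

If the data ends up somewhere NLTK does not look by default, appending the folder to `nltk.data.path` before loading a corpus should be enough.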

Removing stop words
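As a rough illustration of what stop-word removal does, the sketch below filters a word list against a small hand-written stop list. The names `SAMPLE_STOP_WORDS` and `remove_stop_words` are made up for this example; the full code later in this post uses NLTK's much longer English list instead.

```python
# A tiny hand-written stop list, used here only for illustration.
# NLTK's English stop word list is much longer.
SAMPLE_STOP_WORDS = {"i", "you", "for", "what", "not", "but", "am", "are", "with"}


def remove_stop_words(words, stop_words=SAMPLE_STOP_WORDS):
    """Return the words that are not in stop_words, keeping their order."""
    return [w for w in words if w not in stop_words]


sample = "i love you not for what you are".split()
print(remove_stop_words(sample))  # ['love']
```

A set is used for the stop list so that each membership check is fast, which matters once the text is longer than a few lines.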

2. The second method: use Counter from Python's standard-library collections module

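# The code is as follows
from collections import Counter
c = Counter(speech)
print(c.most_common(10))  # the ten most frequent words

# stop_words is NLTK's English stop word list (loaded in the full code below)
for sw in stop_words:
    del c[sw]
print(c.most_common(10))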
Counter is a subclass of dict that makes counting convenient.
  • The full code is attached below
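A few of the behaviours that make Counter convenient can be seen on a toy word list. This is a small sketch with made-up data, not the speech text from this post:

```python
from collections import Counter

words = "a b a c b a".split()
c = Counter(words)

print(isinstance(c, dict))   # True: Counter subclasses dict
print(c.most_common(2))      # [('a', 3), ('b', 2)]
print(c["missing"])          # 0: unknown keys count as zero, no KeyError

# Deleting a key that is absent is a no-op for Counter, which is why
# the stop-word loop can call `del c[sw]` without checking membership.
del c["missing"]
del c["a"]
print(c.most_common())       # [('b', 2), ('c', 1)]
```

With a plain dict, both the lookup and the deletion of a missing key would raise KeyError, so the stop-word loop would need an explicit membership test.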

speech_text = '''
I love you,
Not for what you are,
But for what I amWhen I am with you.
I love you,
Not only for whatYou have made of yourself,
But for whatYou are making of me.
I love youFor the part of meThat you bring out;
I love youFor putting your handInto my heaped-up heartAnd passing overAll the foolish, 
weak thingsThat you can’t helpDimly seeing there,
And for drawing outInto the lightAll the beautiful belongingsThat no one else had lookedQuite far enough to find.
I love you because youAre helping me to makeOf the lumber of my lifeNot a tavernBut a temple;
Out of the worksOf my every dayNot a reproachBut a song.
I love youBecause you have doneMore than any creedCould have doneTo make me goodAnd more than any fateCould have doneTo make me happy.
You have done itWithout a touch,
Without a word,
Without a sign.
You have done itBy being yourself.
Perhaps that is whatBeing a friend means,
After all.
'''

# Normalize case so that "You" and "you" are counted as the same word
speech = speech_text.lower().split()
print(speech)

dic = {}
for word in speech:
    if word not in dic:
        dic[word] = 1
    else:
        dic[word] = dic[word] + 1

import operator
swd = sorted(dic.items(),key=operator.itemgetter(1),reverse=True)
print(swd)

# Stop word handling
from nltk.corpus import stopwords
stop_words = stopwords.words('english')  # the corpus file id is lowercase

for k,v in swd:
    if k not in stop_words:
        print(k,v)


from collections import Counter
c = Counter(speech)
print(c.most_common(10))  # the ten most frequent words

# Drop the stop words from the counter, then show the top ten again
for sw in stop_words:
    del c[sw]
print(c.most_common(10))
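One limitation of splitting on whitespace is that punctuation stays attached, so "you," and "you" are counted separately. A possible refinement, shown here only as a sketch with a hypothetical `tokenize` helper and a made-up sample sentence, is to extract runs of letters with a regular expression. It handles ASCII apostrophes and hyphens only; typographic apostrophes are not covered.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase text and return runs of letters, apostrophes and hyphens."""
    return re.findall(r"[a-z'-]+", text.lower())


sample = "I love you, Not for what you are. I love you!"
counts = Counter(tokenize(sample))
print(counts.most_common(3))  # [('you', 3), ('i', 2), ('love', 2)]
```

Replacing `speech_text.lower().split()` with such a tokenizer would leave the rest of the counting code unchanged.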

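These two approaches make it easy to see why Python is used more and more in data analysis and scientific computing: beyond the features of the language itself, it has many third-party libraries that are a pleasure to use.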

So what are you waiting for? Life is short, so why not make Python your song. Come and learn Python with me.
