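It started with a phone interview for an internship, where I was asked whether I had any hands-on experience with data visualization in Python. I had none at all. The position seemed like a solid one, and the interview was scheduled for the afternoon of the next day, so, believing that thorough preparation never hurts, I worked for four hours straight without getting up. To my surprise, I actually got it working.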
Below are my code and the final results. (Just for fun; will be taken down on request if it infringes anyone's rights.)
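```python
# sjy hands-on project: sentiment analysis
import numpy as np
from snownlp import SnowNLP
import matplotlib.pyplot as plt

# Read the comments, one per line
with open('comment.txt', 'r', encoding='UTF-8') as f:
    lines = f.readlines()

sentimentslist = []
for line in lines:
    line = line.strip()
    if not line:  # skip blank lines
        continue
    s = SnowNLP(line)
    # s.sentiments is the probability (0 to 1) that the text is positive
    sentimentslist.append(s.sentiments)

# 100 equal-width bins covering [0, 1]; np.arange(0, 1, 0.01) would stop
# at 0.99 and silently drop every score above 0.99
plt.hist(sentimentslist, bins=np.linspace(0, 1, 101), facecolor='b')
plt.xlabel('Sentiments Probability')
plt.ylabel('Quantity')
plt.title('Analysis of Sentiments')
# In the resulting histogram, scores near 1.0 mean mostly positive sentiment,
# scores near 0 mostly negative
plt.show()
```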

[Figure: analysis.png, histogram of comment sentiment scores]
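To make the reading of the histogram concrete, here is a minimal pure-Python sketch of what it counts. It assumes the scores are floats in [0, 1], as SnowNLP's `sentiments` values are; the helper names `histogram_counts` and `positive_share`, the 0.5 cutoff, and the sample scores are hypothetical choices for illustration, not part of the original script.

```python
def histogram_counts(scores, n_bins=100):
    """Count scores in n_bins equal-width bins over [0, 1].

    The last bin is closed on the right, so a score of exactly 1.0 is
    counted rather than dropped. These are the same bins as
    np.linspace(0, 1, n_bins + 1), up to floating-point rounding at
    exact bin edges.
    """
    counts = [0] * n_bins
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
        # Map the score to a bin index; 1.0 would give n_bins, so clamp it
        idx = min(int(s * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


def positive_share(scores, threshold=0.5):
    """Fraction of scores above a (hypothetical) positivity threshold."""
    scores = list(scores)
    if not scores:
        return 0.0
    return sum(1 for s in scores if s > threshold) / len(scores)


# Made-up sample scores
scores = [0.02, 0.15, 0.51, 0.97, 0.999, 1.0]
counts = histogram_counts(scores)
print(counts[-1])              # number of scores in the last bin [0.99, 1.0]
print(positive_share(scores))
```

With `np.arange(0, 1, 0.01)` as the bin edges, the two scores above 0.99 in this sample would not appear in the plot at all, which matters because very confident positive comments tend to land right there.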
```python
# Word cloud
# coding=utf-8
from collections import Counter

import imageio  # scipy.misc.imread was removed from SciPy, so use imageio
import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Remove stopwords
# Load the stopword list once, as a set for fast membership tests
def stopwordslist():
    with open('stopwords.txt', encoding='UTF-8') as f:
        return {line.strip() for line in f if line.strip()}

# Segment one line of Chinese text and drop stopwords
def seg_depart(sentence, stopwords):
    # Chinese word segmentation of a single line of the document
    sentence_depart = jieba.cut(sentence.strip())
    # Keep words that are neither stopwords nor whitespace, joined by spaces
    return ' '.join(word for word in sentence_depart
                    if word.strip() and word not in stopwords)

# Input and output file paths
filename = "comment.txt"
outfilename = "clean.txt"
stopwords = stopwordslist()
# Write the segmented, stopword-free lines to clean.txt
with open(filename, 'r', encoding='UTF-8') as inputs:
    with open(outfilename, 'w', encoding='UTF-8') as outputs:
        for line in inputs:
            outputs.write(seg_depart(line, stopwords) + '\n')
print("Stopword removal and segmentation succeeded!!!")

# clean.txt is already segmented, so split on whitespace instead of running
# jieba again (re-cutting would count the separating spaces as words)
with open('clean.txt', 'r', encoding='utf-8') as f:
    textc = f.read()
c = Counter(textc.split())  # count word frequencies
word = c.most_common(800)  # keep the 800 most frequent words
bg_pic = imageio.imread('bg.png')
wc = WordCloud(
    font_path=r'C:\Users\dell\Desktop\FZMWFont.TTF',  # a font that supports Chinese
    background_color='white',  # background color
    max_words=2000,  # maximum number of words displayed
    mask=bg_pic,  # mask image that defines the shape of the cloud
    max_font_size=200,  # largest font size
    random_state=20  # random seed for layout and colors, so reruns look the same
)
wc.generate_from_frequencies(dict(word))  # build the word cloud
wc.to_file('resultc.png')
# show
plt.imshow(wc)
plt.axis("off")
plt.figure()
plt.imshow(bg_pic, cmap=plt.cm.gray)
plt.axis("off")
plt.show()
```
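The cleaning and counting steps can be checked without jieba or any input files. This minimal sketch assumes each comment has already been segmented into a list of tokens; the sample tokens and stopwords are made up, and `top_words` is a hypothetical helper that mirrors the stopword filtering in `seg_depart` followed by `Counter(...).most_common(...)`.

```python
from collections import Counter


def top_words(tokenized_lines, stopwords, n):
    """Return the n most frequent tokens after dropping stopwords and whitespace."""
    counter = Counter()
    for tokens in tokenized_lines:
        # Same filter as seg_depart: skip whitespace-only tokens and stopwords
        counter.update(t for t in tokens if t.strip() and t not in stopwords)
    return counter.most_common(n)


# Hypothetical pre-segmented comments and a tiny stopword set
lines = [["这部", "电影", "的", "剧情", "很", "好"],
         ["电影", "的", " ", "演员", "很", "好"],
         ["剧情", "一般"]]
stopwords = {"的", "很", "这部"}

# dict(...) is the shape generate_from_frequencies expects: word -> count
freq = dict(top_words(lines, stopwords, 800))
print(freq)
```

Counting tokens directly, rather than writing them out space-separated and segmenting again, keeps the space character from ever becoming a "word" in the cloud.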
The first version of the word cloud looked like this (the most authentic, and utterly meaningless, visualization of raw data):

[Figure: result o.png, first version of the word cloud]
Later, after further refining and filtering with the stopwords list, the final result looked roughly like this:

[Figure: resultc.png, final word cloud]
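One way to do that kind of refinement is to merge a general stopword file with extra words that turned out to be noise in an earlier cloud. A minimal sketch, assuming stopwords.txt holds one word per line; the `load_stopwords` helper and the extra words are hypothetical, and `io.StringIO` stands in for the real file.

```python
import io


def load_stopwords(lines, extra=()):
    """Build a stopword set from one-word-per-line text plus extra words.

    Blank lines and surrounding whitespace are ignored, and duplicates
    collapse because the result is a set.
    """
    words = {line.strip() for line in lines if line.strip()}
    words.update(w.strip() for w in extra if w.strip())
    return words


# Stand-in for open('stopwords.txt', encoding='UTF-8'); any iterable of lines works
fake_file = io.StringIO("的\n了\n\n的\n")
stopwords = load_stopwords(fake_file, extra=["电影", "真的"])
print(sorted(stopwords))
```

Keeping the extra words separate from the base file makes it easy to iterate: regenerate the cloud, note which words still dominate without adding meaning, and append them to `extra`.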

[Figure: bg.png, background image used as the word cloud mask]
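bg.png is passed to `WordCloud` as `mask`. According to the wordcloud documentation, pure-white pixels (#FF or #FFFFFF) are masked out and every other pixel is free for words. The sketch below reproduces only that rule on a tiny hand-made image stored as nested lists; `drawable_area` is a hypothetical helper for illustration, not part of the wordcloud API.

```python
def drawable_area(pixels):
    """Return a grid of booleans: True where words may be drawn.

    pixels is a list of rows; each pixel is either an (R, G, B[, A]) tuple
    or a single gray value in 0..255. A pixel is masked out only if its
    color channels are all exactly 255 (pure white).
    """
    grid = []
    for row in pixels:
        out_row = []
        for p in row:
            channels = p if isinstance(p, tuple) else (p,)
            # Ignore an alpha channel, if present, by looking at the first 3 values
            out_row.append(not all(c == 255 for c in channels[:3]))
        grid.append(out_row)
    return grid


# 2x3 toy image: white pixels around a black and a gray pixel,
# plus one almost-white pixel (254) in the corner
image = [[(255, 255, 255), (0, 0, 0), (255, 255, 255)],
         [(255, 255, 255), (128, 128, 128), (255, 255, 254)]]
mask = drawable_area(image)
for row in mask:
    print(row)
```

This is why the mask image should have a pure-white background: an off-white pixel such as the 254 above still counts as drawable, so words can spill outside the intended shape.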
Wishing you all few bugs!