We continue the topic of clustering and unsupervised learning with mean shift, this time applying it to our Titanic dataset.

There is some randomness involved, so your results may not match mine exactly. If they differ, rerun the program until you get something similar.

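We are going to look at the Titanic dataset through mean shift clustering. What we want to know is whether mean shift can separate the passengers into groups on its own. If it can, inspecting the groups it creates should be interesting. The first obvious point of interest is the survival rate of each group it finds, but we will also dig into the attributes of those groups to see whether we can understand why mean shift settled on them.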

First, we use code we have already seen:

import numpy as np
from sklearn.cluster import MeanShift
# sklearn.cross_validation was removed in scikit-learn 0.20 and is not used here
from sklearn import preprocessing
import pandas as pd
import matplotlib.pyplot as plt


'''
Pclass Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
survival Survival (0 = No; 1 = Yes)
name Name
sex Sex
age Age
sibsp Number of Siblings/Spouses Aboard
parch Number of Parents/Children Aboard
ticket Ticket Number
fare Passenger Fare (British pound)
cabin Cabin
embarked Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
boat Lifeboat
body Body Identification Number
home.dest Home/Destination
'''


# https://pythonprogramming.net/static/downloads/machine-learning-data/titanic.xls
df = pd.read_excel('titanic.xls')

original_df = pd.DataFrame.copy(df)
# axis must be passed by keyword in pandas 2.0 and later
df.drop(['body','name'], axis=1, inplace=True)
df.fillna(0,inplace=True)

def handle_non_numerical_data(df):
    
    # handling non-numerical data: must convert.
    columns = df.columns.values

    for column in columns:
        text_digit_vals = {}
        def convert_to_int(val):
            return text_digit_vals[val]

        #print(column,df[column].dtype)
        if df[column].dtype != np.int64 and df[column].dtype != np.float64:
            
            column_contents = df[column].values.tolist()
            #finding just the uniques
            unique_elements = set(column_contents)
            # great, found them. 
            x = 0
            for unique in unique_elements:
                if unique not in text_digit_vals:
                    # creating dict that contains new
                    # id per unique string
                    text_digit_vals[unique] = x
                    x+=1
            # now we map the new "id" value
            # to replace the string. 
            df[column] = list(map(convert_to_int,df[column]))

    return df

df = handle_non_numerical_data(df)
df.drop(['ticket','home.dest'], axis=1, inplace=True)

X = np.array(df.drop(['survived'], axis=1).astype(float))
X = preprocessing.scale(X)
y = np.array(df['survived'])

clf = MeanShift()
clf.fit(X)

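This is the same code as before, with two exceptions. The first is original_df = pd.DataFrame.copy(df), added right after we read the Excel file into the df object. The second is importing MeanShift from sklearn.cluster and using it as our clusterer. We make the copy so that we can later refer back to the data in its original, non-numeric form.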
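
One plausible source of the run-to-run variation mentioned earlier is handle_non_numerical_data itself: it assigns integer ids in set iteration order, and for strings that order can change between Python processes because string hashing is randomized by default. The sketch below shows a repeatable variant of the same idea. The function name encode_column and the sample values are hypothetical, not part of the original code.

```python
def encode_column(values):
    """Map each distinct value to an integer id in a repeatable order."""
    # Sorting by str() gives a fixed order even when a column mixes
    # strings with the 0 placeholders written by fillna(0).
    uniques = sorted(set(values), key=str)
    mapping = {value: idx for idx, value in enumerate(uniques)}
    codes = [mapping[v] for v in values]
    return codes, mapping

# Hypothetical sample standing in for the 'embarked' column after fillna(0).
embarked = ['S', 'C', 0, 'Q', 'S', 'C']
codes, mapping = encode_column(embarked)
print(mapping)
print(codes)
```

With ids assigned this way, the scaled feature matrix is the same on every run, so any remaining differences would have to come from somewhere else.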

Now that we have fit the model, we can pull some attributes from the clf object.

labels = clf.labels_
cluster_centers = clf.cluster_centers_
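
To make these two attributes concrete, here is a minimal sketch on made-up data rather than the Titanic features. The points, the toy_ names, and bandwidth=2 are all assumptions chosen so that the two blobs are obvious. The tutorial's code leaves bandwidth unset and lets scikit-learn estimate it.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Hypothetical toy data: two tight blobs near (0, 0) and (10, 10).
X_toy = np.array([
    [0.0, 0.0], [0.1, 0.2], [-0.2, 0.1], [0.2, -0.1],
    [10.0, 10.0], [10.1, 9.8], [9.9, 10.2], [10.2, 10.1],
])

# bandwidth=2 is an illustrative choice, far smaller than the gap between blobs.
toy_clf = MeanShift(bandwidth=2)
toy_clf.fit(X_toy)

toy_labels = toy_clf.labels_            # one cluster id per input row
toy_centers = toy_clf.cluster_centers_  # one row of coordinates per cluster
print(toy_labels)
print(toy_centers)
```

labels_ has one entry per row of the input, in the same order, which is what lets us attach it to the original data frame in the next step.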

Next, we add a new column to our original data frame.

original_df['cluster_group']=np.nan

現(xiàn)在，我們可以迭代標簽，并向空列添加新的標簽。

for i in range(len(X)):
    original_df['cluster_group'].iloc[i] = labels[i]
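
Since there is exactly one label per row, in row order, the loop can also be replaced by a single column assignment. This is a sketch on hypothetical stand-ins (toy_df and toy_labels) for original_df and labels, not a change to the tutorial's approach.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for original_df and the labels array.
toy_df = pd.DataFrame({'survived': [1, 0, 1, 0]})
toy_labels = np.array([0, 1, 0, 2])

# One label per row, in row order, so the array can be assigned directly.
toy_df['cluster_group'] = toy_labels
print(toy_df)
```

The column ends up with an integer dtype here instead of float, which does not affect comparisons such as == float(i).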

現(xiàn)在我們可以檢查每個分組的幸存率：

n_clusters_ = len(np.unique(labels))
survival_rates = {}
for i in range(n_clusters_):
    temp_df = original_df[ (original_df['cluster_group']==float(i)) ]
    #print(temp_df.head())

    survival_cluster = temp_df[  (temp_df['survived'] == 1) ]

    survival_rate = len(survival_cluster) / len(temp_df)
    #print(i,survival_rate)
    survival_rates[i] = survival_rate
    
print(survival_rates)

If we run this, we get:

{0: 0.3796583850931677, 1: 0.9090909090909091, 2: 0.1}
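
The same per-group rate can be computed with a pandas groupby, because survived is coded 0/1 and its mean is therefore the fraction that survived. The data frame below is made up for illustration, and toy_df and rates are hypothetical names.

```python
import pandas as pd

# Hypothetical miniature of original_df with only the two columns we need.
toy_df = pd.DataFrame({
    'survived':      [1, 0, 0, 1, 1, 0],
    'cluster_group': [0.0, 0.0, 0.0, 1.0, 1.0, 2.0],
})

# survived is 0/1, so the mean within each group is its survival rate.
rates = toy_df.groupby('cluster_group')['survived'].mean().to_dict()
print(rates)
```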

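Again, you may get more groups. I got three here, but I have gotten as many as six on this dataset. We see that group 0 has a survival rate of 38%, group 1 has 91%, and group 2 has 10%. That is a little curious, since we know the ship had three actual "passenger classes". I wonder whether 0 is second class, 1 is first class, and 2 is third class. On the ship, third class was at the very bottom and first class at the top; the bottom flooded first, and the top was where the lifeboats were. I can take a closer look: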

print(original_df[ (original_df['cluster_group']==1) ])

This selects the rows of original_df whose cluster_group is 1.

Printed out:

     pclass  survived                                               name  \
17        1         1    Baxter, Mrs. James (Helene DeLaudeniere Chaput)   
49        1         1                 Cardeza, Mr. Thomas Drake Martinez   
50        1         1  Cardeza, Mrs. James Warburton Martinez (Charlo...   
66        1         1                        Chaudanson, Miss. Victorine   
97        1         1  Douglas, Mrs. Frederick Charles (Mary Helene B...   
116       1         1                Fortune, Mrs. Mark (Mary McDougald)   
183       1         1                             Lesurer, Mr. Gustave J   
251       1         1              Ryerson, Miss. Susan Parker "Suzette"   
252       1         0                         Ryerson, Mr. Arthur Larned   
253       1         1    Ryerson, Mrs. Arthur Larned (Emily Maria Borie)   
302       1         1                                   Ward, Miss. Anna   

        sex   age  sibsp  parch    ticket      fare            cabin embarked  \
17   female  50.0      0      1  PC 17558  247.5208          B58 B60        C   
49     male  36.0      0      1  PC 17755  512.3292      B51 B53 B55        C   
50   female  58.0      0      1  PC 17755  512.3292      B51 B53 B55        C   
66   female  36.0      0      0  PC 17608  262.3750              B61        C   
97   female  27.0      1      1  PC 17558  247.5208          B58 B60        C   
116  female  60.0      1      4     19950  263.0000      C23 C25 C27        S   
183    male  35.0      0      0  PC 17755  512.3292             B101        C   
251  female  21.0      2      2  PC 17608  262.3750  B57 B59 B63 B66        C   
252    male  61.0      1      3  PC 17608  262.3750  B57 B59 B63 B66        C   
253  female  48.0      1      3  PC 17608  262.3750  B57 B59 B63 B66        C   
302  female  35.0      0      0  PC 17755  512.3292              NaN        C   

    boat  body                                       home.dest  cluster_group  
17     6   NaN                                    Montreal, PQ            1.0  
49     3   NaN  Austria-Hungary / Germantown, Philadelphia, PA            1.0  
50     3   NaN                    Germantown, Philadelphia, PA            1.0  
66     4   NaN                                             NaN            1.0  
97     6   NaN                                    Montreal, PQ            1.0  
116   10   NaN                                    Winnipeg, MB            1.0  
183    3   NaN                                             NaN            1.0  
251    4   NaN                 Haverford, PA / Cooperstown, NY            1.0  
252  NaN   NaN                 Haverford, PA / Cooperstown, NY            1.0  
253    4   NaN                 Haverford, PA / Cooperstown, NY            1.0  
302    3   NaN                                             NaN            1.0

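Sure enough, the whole group is first class. That said, there are only 11 people in it. Let's look at group 0, which looks a bit more diverse. This time we use the Pandas .describe() method.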

print(original_df[ (original_df['cluster_group']==0) ].describe())

            pclass     survived          age        sibsp        parch  \
count  1288.000000  1288.000000  1027.000000  1288.000000  1288.000000   
mean      2.300466     0.379658    29.668614     0.496118     0.332298   
std       0.833785     0.485490    14.395610     1.047430     0.686068   
min       1.000000     0.000000     0.166700     0.000000     0.000000   
25%       2.000000     0.000000    21.000000     0.000000     0.000000   
50%       3.000000     0.000000    28.000000     0.000000     0.000000   
75%       3.000000     1.000000    38.000000     1.000000     0.000000   
max       3.000000     1.000000    80.000000     8.000000     4.000000   

              fare        body  cluster_group  
count  1287.000000  119.000000         1288.0  
mean     30.510172  159.571429            0.0  
std      41.511032   97.302914            0.0  
min       0.000000    1.000000            0.0  
25%       7.895800   71.000000            0.0  
50%      14.108300  155.000000            0.0  
75%      30.070800  255.500000            0.0  
max     263.000000  328.000000            0.0

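There are 1288 people here (the fare count is 1287 because one fare is missing). The mean class is close to second class, but the group spans first through third.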

Let's check the last group, 2, which we expect to be entirely third class:

print(original_df[ (original_df['cluster_group']==2) ].describe())

       pclass   survived        age      sibsp      parch       fare  \
count    10.0  10.000000   8.000000  10.000000  10.000000  10.000000   
mean      3.0   0.100000  39.875000   0.800000   6.000000  42.703750   
std       0.0   0.316228   1.552648   0.421637   1.632993  15.590194   
min       3.0   0.000000  38.000000   0.000000   5.000000  29.125000   
25%       3.0   0.000000  39.000000   1.000000   5.000000  31.303125   
50%       3.0   0.000000  39.500000   1.000000   5.000000  35.537500   
75%       3.0   0.000000  40.250000   1.000000   6.000000  46.900000   
max       3.0   1.000000  43.000000   1.000000   9.000000  69.550000   

             body  cluster_group  
count    2.000000           10.0  
mean   234.500000            2.0  
std    130.814755            0.0  
min    142.000000            2.0  
25%    188.250000            2.0  
50%    234.500000            2.0  
75%    280.750000            2.0  
max    327.000000            2.0

Sure enough, we were right: this group is entirely third class, and it has the worst survival rate.
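
A cross-tabulation is a compact way to check the class make-up of every cluster at once, instead of reading the pclass column of each .describe() output. This is a sketch on made-up rows; toy_df and class_by_cluster are hypothetical names.

```python
import pandas as pd

# Hypothetical rows: cluster 0 is mixed, cluster 1 is all first class,
# cluster 2 is all third class.
toy_df = pd.DataFrame({
    'pclass':        [1, 2, 3, 3, 1, 1, 3],
    'cluster_group': [0, 0, 0, 0, 1, 1, 2],
})

# Rows are clusters, columns are passenger classes, cells are passenger counts.
class_by_cluster = pd.crosstab(toy_df['cluster_group'], toy_df['pclass'])
print(class_by_cluster)
```

A cluster that maps cleanly onto one passenger class shows up as a row with a single non-zero cell.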

Interestingly, across all the groups, group 2 has the narrowest fare range, from about 29 to about 70 pounds.

In cluster 0, the fare tops out at 263 pounds. This is the largest group, with a survival rate of 38%.

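Looking back at cluster 1, which is entirely first class, the fares range from 247 to 512 pounds, with a mean of about 350. Although cluster 0 contains some first-class passengers, cluster 1 is the most elite group.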
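
The fare comparison across clusters can be put into a single table with a groupby aggregation. The fares below are invented and only loosely modeled on the outputs above, and toy_df and fare_stats are hypothetical names.

```python
import pandas as pd

# Hypothetical fares, loosely modeled on the three clusters discussed above.
toy_df = pd.DataFrame({
    'fare':          [7.25, 30.0, 263.0, 247.5, 512.3, 29.1, 69.5],
    'cluster_group': [0, 0, 0, 1, 1, 2, 2],
})

# Minimum, maximum, and mean fare for each cluster.
fare_stats = toy_df.groupby('cluster_group')['fare'].agg(['min', 'max', 'mean'])
# Width of each cluster's fare range (max minus min).
fare_stats['spread'] = fare_stats['max'] - fare_stats['min']
print(fare_stats)
```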

Out of curiosity, how does the survival rate of the first-class passengers in group 0 compare with the overall survival rate?

>>> cluster_0 = (original_df[ (original_df['cluster_group']==0) ])
>>> cluster_0_fc = (cluster_0[ (cluster_0['pclass']==1) ])
>>> print(cluster_0_fc.describe())
       pclass    survived         age       sibsp       parch        fare  \
count   312.0  312.000000  273.000000  312.000000  312.000000  312.000000   
mean      1.0    0.608974   39.027167    0.432692    0.326923   78.232519   
std       0.0    0.488764   14.589592    0.606997    0.653100   60.300654   
min       1.0    0.000000    0.916700    0.000000    0.000000    0.000000   
25%       1.0    0.000000   28.000000    0.000000    0.000000   30.500000   
50%       1.0    1.000000   39.000000    0.000000    0.000000   58.689600   
75%       1.0    1.000000   49.000000    1.000000    0.000000   91.079200   
max       1.0    1.000000   80.000000    3.000000    4.000000  263.000000   

             body  cluster_group  
count   35.000000          312.0  
mean   162.828571            0.0  
std     82.652172            0.0  
min     16.000000            0.0  
25%    109.500000            0.0  
50%    166.000000            0.0  
75%    233.000000            0.0  
max    307.000000            0.0  
>>>

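Sure enough, their survival rate is higher, at about 61%, but it is still well below the 91% of the elite group (elite by both fare and survival rate). Spend some time digging deeper and see what you can find.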


2020-07-21
