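Chapter 5 Toolbox

5.1 Introduction
5.2 Overall layering strategy
5.3 Basic plot types
5.4 Displaying distributions
5.5 Dealing with overplotting
5.6 Surface plots
5.7 Drawing maps
5.8 Revealing uncertainty
5.9 Statistical summaries
5.10 Annotating a plot
5.11 Weighted data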

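5.1 Introduction

This chapter uses a mix of ggplot() and qplot() calls to survey the basic geoms and stats.

5.2 Overall layering strategy

Layers serve three purposes:

To display the data itself
To display a statistical summary of the data
To add extra metadata, context, and annotations.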
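
As a rough illustration of the three purposes, the sketch below stacks one layer of each kind on the mpg data that ships with ggplot2. The dataset, the label text, and the name p_layers are choices made for this example, not part of the original notes.

```r
library(ggplot2)

# Layer 1: the data itself (raw points)
# Layer 2: a statistical summary (a linear fit with its confidence band)
# Layer 3: an annotation (a text label giving context)
p_layers <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm") +
  annotate("text", x = 6, y = 40, label = "mpg dataset")

p_layers
```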

library(ggplot2)

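5.3 Basic plot types

Area plots, bar charts, line charts, scatterplots, polygons, text labels, and tiles (image or level plots). The code below draws each of these geoms.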

df <- data.frame(
  x = c(3, 1, 5),
  y = c(2, 4, 6),
  label = c("a","b","c")
)
p <- ggplot(df, aes(x, y)) + xlab(NULL) + ylab(NULL)
p + geom_point() + labs(title = "geom_point")
p + geom_bar(stat = "identity") + labs(title = "geom_bar(stat = \"identity\")")
p + geom_line() + labs( title = "geom_line")
p + geom_area() + labs(title = "geom_area")
p + geom_path() + labs(title = "geom_path")
p + geom_text(aes(label = label)) + labs(title = "geom_text")
p + geom_tile() + labs(title = "geom_tile")
p + geom_polygon() + labs(title = "geom_polygon")

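These geoms are simple, so the plots are not reproduced here.

5.4 Displaying distributions

Example: for a one-dimensional continuous distribution, the most important tools are the histogram and the frequency polygon. Both bin the data with stat_bin and show counts by default; map y to density to compare groups of different sizes. Never expect the default parameters to give a strong display.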
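
A minimal sketch of why the defaults deserve scrutiny: the same variable drawn at three bin widths. The specific widths and object names are arbitrary choices for illustration.

```r
library(ggplot2)

base <- ggplot(diamonds, aes(depth)) + xlim(58, 68)

# The same variable at three bin widths; narrower bins reveal more detail
coarse <- base + geom_histogram(binwidth = 1)
medium <- base + geom_histogram(binwidth = 0.5)
fine   <- base + geom_histogram(binwidth = 0.1)

# Frequency polygon with the same fine bin width
poly <- base + geom_freqpoly(binwidth = 0.1)
```

Printing coarse, medium, and fine in turn shows how much structure the wider bins hide.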

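The three plots below all show an interesting pattern: as diamond quality improves, the distribution shifts to the left and becomes more symmetric.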

depth_dist <- ggplot(diamonds, aes(depth)) + xlim(58, 68)
depth_dist +
  geom_histogram(aes(y = ..density..), binwidth = 0.1) +
  facet_grid(cut ~.)


depth_dist + geom_histogram(aes(fill = cut), binwidth = 0.1, position = "fill")


depth_dist + geom_freqpoly(aes(y = ..density.., colour = cut), binwidth = 0.1)

Example: box plots conditioned on a categorical or a continuous variable

library(plyr)
qplot(cut, depth, data = diamonds, geom = "boxplot")


qplot(carat, depth, data = diamonds, geom = "boxplot", group = round_any(carat, 0.1, floor),xlim = c(0, 3))

Example: a jittered plot adds random noise to a discrete distribution to avoid overplotting; it is a fairly crude method

qplot(class, cty, data = mpg, geom = "jitter")


qplot(class, drv, data = mpg, geom = "jitter")

Example: density plots. Use a density plot only when you know the underlying density is smooth, continuous, and unbounded

qplot(depth, data = diamonds, geom = "density", xlim = c(54, 70))


qplot(depth, data = diamonds, geom = "density", xlim = c(54, 70), fill = cut, alpha = I(0.2))

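5.5 Dealing with overplotting

The scatterplot is an important tool for studying the relationship between two continuous variables. With large datasets, however, points often overlap and hide the true relationship, so any conclusion drawn from such a plot is questionable. This problem is called overplotting.

Method 1: mild overplotting can be reduced by drawing smaller points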

df <- data.frame(x = rnorm(2000), y = rnorm(2000))
norm <- ggplot(df, aes(x, y))
norm + geom_point()


norm + geom_point(shape = 1)

norm + geom_point(shape = ".") ## points are one pixel in size

Method 2: for larger datasets, adjust the transparency; the smallest alpha value R supports is 1/256

norm + geom_point(colour = "black", alpha = 1/3)


norm + geom_point(colour = "black", alpha = 1/5)


norm + geom_point(colour = "black", alpha = 1/10)
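
One way to read the alpha values above: under standard alpha compositing, k points stacked on top of each other, each with transparency alpha, give a combined opacity of 1 - (1 - alpha)^k. The sketch below is only that arithmetic; the helper name stacked_opacity is made up for this example, and real graphics devices also round colours to a finite number of levels.

```r
# Combined opacity of k overlapping points, each drawn with transparency `alpha`
# (standard "over" compositing; stacked_opacity is a hypothetical helper)
stacked_opacity <- function(alpha, k) 1 - (1 - alpha)^k

# Opacity reached by 1, 5, 10 and 50 overlapping points at alpha = 1/10
stacked_opacity(1/10, c(1, 5, 10, 50))
```

Smaller alpha values need more overlapping points before a region looks solid, which is why they suit denser data.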

Method 3: add random jitter to the points to reduce overlap

td <- ggplot(diamonds, aes(table, depth)) + xlim(50, 70) + ylim(50, 70)
td + geom_point()
td + geom_jitter()


jit <- position_jitter(width = 0.5)
td + geom_jitter(position = jit)


td + geom_jitter(position = jit, colour = "black", alpha = 1/10)


td + geom_jitter(position = jit, colour = "black", alpha = 1/50)


td + geom_jitter(position = jit, colour = "black", alpha = 1/200)

Method 4: borrowing the idea of 2D kernel density estimation, bin the points and visualize the count in each bin

d <- ggplot(diamonds, aes(carat, price)) + xlim(1,3) +theme(legend.position = "none")
d + stat_bin2d()


d + stat_bin2d(bins = 10)


d + stat_bin2d(binwidth = c(0.02, 200))


d + stat_binhex()


d + stat_binhex(bins = 10)


d + stat_binhex(binwidth = c(0.02, 200))
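
The binning idea behind the rectangular bins above can be sketched by hand in base R. This is an illustration of the concept, not how ggplot2 implements it; the bin edges and the names big, carat_bin, price_bin, and bin_counts are assumptions made for this example.

```r
library(ggplot2)  # only for the diamonds data

big <- subset(diamonds, carat >= 1 & carat <= 3)

# Assign each diamond to a rectangular bin: 10 bins per axis
carat_bin <- cut(big$carat, breaks = seq(1, 3, length.out = 11), include.lowest = TRUE)
price_bin <- cut(big$price, breaks = seq(0, 20000, length.out = 11), include.lowest = TRUE)

# Count the diamonds in each bin; a 2D binned plot encodes counts like these as fill colour
bin_counts <- table(carat_bin, price_bin)
```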

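Method 5: use stat_density2d for 2D density estimation, then overlay contour lines, show the density directly with coloured tiles, or draw points whose size is proportional to the density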

d <- ggplot(diamonds, aes(carat, price)) + xlim(1, 3) + theme(legend.position = "none")
d + geom_point() + geom_density2d()


d + stat_density2d(geom = "point", aes(size = ..density..), contour = F) + scale_size_area()


d + stat_density2d(geom = "tile", aes(fill = ..density..), contour = F)


last_plot() + scale_fill_gradient(limits = c(1e-5, 8e-4))

5.6 Surface plots

Common tools: coloured tiles, contour plots, and bubble plots.
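
A small sketch of the three tools, using R's built-in volcano height matrix as a stand-in surface. The dataset and the names surface and s are choices made for this example; the notes themselves give no code for this section.

```r
library(ggplot2)

# Reshape the built-in volcano height matrix into long form: one row per cell
surface <- data.frame(
  x = as.vector(row(volcano)),
  y = as.vector(col(volcano)),
  z = as.vector(volcano)
)

s <- ggplot(surface, aes(x, y))
s + geom_tile(aes(fill = z))      # coloured tiles
s + geom_contour(aes(z = z))      # contour lines
# Bubble plot on a thinned grid so the points stay readable
s + geom_point(aes(size = z), data = subset(surface, x %% 10 == 0 & y %% 10 == 0)) +
  scale_size_area()
```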

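5.7 Drawing maps

The maps package works well with ggplot2. There are two reasons to use map data: to add reference outlines to a plot of spatial data, and to build a choropleth map by filling regions with colour.

Map borders can be added with borders(), as in the following example.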

library(maps)
data(us.cities)
big_cities <- subset(us.cities, pop > 500000)
qplot(long, lat, data = big_cities) +borders("state", size = 0.5)


tx_cities <- subset(us.cities, country.etc == "TX")
ggplot(tx_cities, aes(long, lat))+
  borders("county", "texas", colour = "grey70") +
  geom_point(colour = "black", alpha = 0.5)

Choropleth maps: use map_data() to convert the map data into a data frame, merge() that data frame with your own data, and then draw the choropleth, as shown below:

library(maps)
states <- map_data("state")
arrests <- USArrests
names(arrests) <- tolower(names(arrests))
arrests$region <- tolower(rownames(USArrests))

choro <- merge(states, arrests, by = "region")
choro <- choro[order(choro$order),]
qplot(long, lat, data = choro, group = group, fill = assault, geom = "polygon")


qplot(long, lat, data = choro, group = group, fill = assault / murder, geom = "polygon")
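
Because merge() keeps only rows whose key appears in both data frames by default, regions without a match drop out of the choropleth silently. A hedged sketch of a check that could be run before merging follows; the names map_only and data_only are made up for this example.

```r
library(ggplot2)  # map_data()
library(maps)

states <- map_data("state")
arrests <- USArrests
arrests$region <- tolower(rownames(USArrests))

# Regions drawn on the map that have no row in the data
map_only <- setdiff(unique(states$region), arrests$region)
# Rows in the data that have no polygon on the map
data_only <- setdiff(arrests$region, unique(states$region))

map_only
data_only
```

Passing all.x = TRUE to merge() would keep the unmatched map regions, which then appear with a missing fill value.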

Example: labelling map data

library(plyr)
ia <- map_data("county", "iowa")
mid_range <- function(x) mean(range(x, na.rm = TRUE))
centres <- ddply(ia, .(subregion), colwise(mid_range, .(lat, long)))
ggplot(ia, aes(long, lat))+
  geom_polygon(aes(group = group), fill = NA, colour = "grey60") +
  geom_text(aes(label = subregion), data = centres, size = 2, angle = 45)

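5.8 Revealing uncertainty

ggplot2 has four main kinds of geom for visualizing uncertainty, depending on whether x is continuous or discrete and whether the centre of the interval is shown:
Continuous x variable: geom_ribbon (interval only), geom_smooth(stat = "identity") (interval and centre)
Discrete x variable: geom_errorbar (interval only), geom_crossbar (interval and centre); geom_linerange (interval only), geom_pointrange (interval and centre)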
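
The geoms listed above can be tried on a tiny hand-made data frame. The numbers below are invented purely for illustration, and the names est, disc, and cont are assumptions of this sketch.

```r
library(ggplot2)

# Made-up estimates with interval bounds, for illustration only
est <- data.frame(
  group = c("a", "b", "c"),
  x     = c(1, 2, 3),
  fit   = c(2.0, 3.5, 3.0),
  lower = c(1.5, 2.8, 2.1),
  upper = c(2.5, 4.2, 3.9)
)

# Discrete x: interval only, then interval plus centre
disc <- ggplot(est, aes(group, fit, ymin = lower, ymax = upper))
disc + geom_errorbar(width = 0.2)
disc + geom_pointrange()

# Continuous x: interval only, then interval plus centre
cont <- ggplot(est, aes(x, fit, ymin = lower, ymax = upper))
cont + geom_ribbon(alpha = 0.3)
cont + geom_smooth(stat = "identity")
```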

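For linear models, the effects package (Fox, 2008) is well suited to extracting these values. The example below fits a two-factor regression model with an interaction and shows how to extract marginal and conditional effects.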

d <- subset(diamonds, carat <2.5 & rbinom(nrow(diamonds), 1, 0.2) == 1)
d$lcarat <- log10(d$carat)
d$lprice <- log10(d$price)

# Remove the overall linear trend
detrend <- lm(lprice ~ lcarat, data = d)
d$lprice2 <- resid(detrend)

mod <- lm(lprice2 ~ lcarat*color, data = d)

library(effects)
effectdf <- function(...){
  suppressWarnings(as.data.frame(effect(...)))
}
color <- effectdf("color", mod)
both1 <- effectdf("lcarat:color", mod)

carat <- effectdf("lcarat", mod, default.levels = 50)
both2 <- effectdf("lcarat:color", mod, default.levels = 3)

## Figure: transform the data to remove obvious effects. Plot 1 takes log10 of both x and y to remove non-linearity; plot 2 also removes the main linear trend
qplot(lcarat, lprice, data = d, colour = color)


qplot(lcarat, lprice2, data = d, colour = color)

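## Figure: uncertainty in the model estimates for color. The left plot shows the marginal effect of color; the right plot shows the conditional effects of color at different levels of carat. Error bars show 95% pointwise confidence intervals
fplot <- ggplot(mapping = aes(y = fit, ymin = lower, ymax = upper)) +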
  ylim(range(both2$lower, both2$upper))
fplot %+% color + aes(x = color) + geom_point() + geom_errorbar(aes(ymin = lower, ymax = upper))


fplot %+% both2 +
  aes(x = color, colour = lcarat, group = interaction(color, lcarat)) +
  geom_errorbar(aes(ymin = lower, ymax = upper)) +
  geom_line(aes(group = lcarat)) +
  scale_colour_gradient()

## Figure: uncertainty in the model estimates for carat
fplot %+% carat + aes(x = lcarat) + geom_smooth(stat = "identity", se = TRUE)


ends <- subset(both1, lcarat == max(lcarat))
fplot %+% both1 + aes(x = lcarat, colour = color)+
  geom_smooth(stat = "identity", se = TRUE) +
  scale_colour_hue() +
  theme(legend.position = "none")+
  geom_text(aes(label = color, x = lcarat +0.02),ends)

5.9 Statistical summaries

stat_summary(): computes a summary of the y values for each x value

5.9.1 Individual summary functions

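# m2 is not defined in these notes; it is assumed to be a ggplot object
# with x and y already mapped.
midm <- function(x) mean(x, trim = 0.5)
m2 + stat_summary(aes(colour = "trimmed"), fun.y = midm, geom = "point") +
  stat_summary(aes(colour = "raw"), fun.y = mean, geom = "point") +
  scale_colour_hue("Mean")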

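5.9.2 Single summary function

fun.data supports more complex functions, such as the summary functions from the Hmisc package.

# m is not defined in these notes; it is assumed to be a ggplot object
# with x and y already mapped.
iqr <- function(x, ...) {
  qs <- quantile(as.numeric(x), c(0.25, 0.75), na.rm = TRUE)
  names(qs) <- c("ymin", "ymax")
  qs
}
m + stat_summary(fun.data = "iqr", geom = "ribbon")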
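
Since the base plot for the snippet above is not defined in these notes, here is a self-contained sketch of the same idea on the mpg data. The function name median_iqr and the choice of mpg are assumptions for illustration; the function returns a one-row data frame with y, ymin, and ymax, the form fun.data expects.

```r
library(ggplot2)

# Summary function: the median with the interquartile range.
# Returns a one-row data frame with columns y, ymin, ymax.
median_iqr <- function(x) {
  qs <- quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  data.frame(y = qs[2], ymin = qs[1], ymax = qs[3])
}

ggplot(mpg, aes(class, hwy)) +
  stat_summary(fun.data = median_iqr, geom = "pointrange")
```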

5.10 Annotating a plot

Annotations are just extra data. They can be added one at a time or in bulk.
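
The two approaches can be sketched side by side: annotate() places a single label by hand, while a geom with its own data argument adds one label per row. The object names base, peak, terms, single, and bulk, and the label text, are made up for this example.

```r
library(ggplot2)

base <- ggplot(economics, aes(date, unemploy)) + geom_line()

# One at a time: a single label placed by hand with annotate()
peak <- economics[which.max(economics$unemploy), ]
single <- base +
  annotate("text", x = peak$date, y = peak$unemploy, label = "peak", vjust = -0.5)

# In bulk: one label per row of a separate data frame
terms <- subset(presidential, start > min(economics$date))
bulk <- base +
  geom_text(aes(x = start, y = min(economics$unemploy), label = name),
            data = terms, angle = 90, hjust = 0, size = 3)
```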

The example below adds information about US presidents to economic data.

Plot the raw unemployment curve

(unemp <- qplot(date, unemploy, data = economics, geom = "line", xlab = "", ylab = "No. unemployed (1000s)"))

# Add a vertical line at the start of each presidential term
presidential <- presidential[-(1:3),]

yrng <- range(economics$unemploy)
xrng <- range(economics$date)
unemp + geom_vline(aes(xintercept = as.numeric(start)), data = presidential)


library(scales)
unemp + geom_rect(aes(NULL, NULL, xmin = start, xmax = end, fill = party), ymin = yrng[1], ymax = yrng[2], data = presidential, alpha = 0.2)+
  scale_fill_manual(values = c("blue","red"))


last_plot() + geom_text(aes(x = start, y = yrng[1],label = name), data = presidential, size = 3, hjust = 0, vjust = 0)


caption <- paste(strwrap("Unemployment rates in the US have varied a lot over the years", 40), collapse = "\n")
unemp + geom_text(aes(x, y, label = caption), data = data.frame(x = xrng[2], y = yrng[2]), hjust = 1, vjust = 1, size = 4)


highest <- subset(economics, unemploy == max(unemploy))
unemp + geom_point(data = highest, size = 3, colour = "red", alpha = 0.5)

5.11 Weighted data

Example: using point size to show the weight

qplot(percwhite, percbelowpoverty, data = midwest)


qplot(percwhite, percbelowpoverty, data = midwest, size = poptotal / 1e6) +
  scale_size_area("Population\n(millions)", breaks = c(0.5, 1, 2, 4))


qplot(percwhite, percbelowpoverty, data = midwest, size = area) +
  scale_size_area()

Example: using population density as the weight, look at the relationship between percent white and percent below the poverty line

lm_smooth <- geom_smooth(method = lm, size = 1)
qplot(percwhite, percbelowpoverty, data = midwest) + lm_smooth


qplot(percwhite, percbelowpoverty, data = midwest, weight = popdensity, size = popdensity) +lm_smooth
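
To see what the weight does to the fitted line, the two fits can be compared directly with lm(), on the assumption that the smoother hands the weight aesthetic to lm() as its weights argument. The names unweighted and weighted are made up for this sketch.

```r
library(ggplot2)  # only for the midwest data

# Every county counts equally
unweighted <- lm(percbelowpoverty ~ percwhite, data = midwest)
# Counties count in proportion to their population density
weighted <- lm(percbelowpoverty ~ percwhite, data = midwest, weights = popdensity)

coef(unweighted)
coef(weighted)
```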

Example: the unweighted histogram shows the number of counties; the weighted histogram shows the number of people

qplot(percbelowpoverty, data = midwest, binwidth = 1)


qplot(percbelowpoverty, data = midwest, weight = poptotal, binwidth = 1) +ylab("population")
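
The difference between the two histograms can be reproduced with base R: an unweighted bin counts rows, a weighted bin sums the weight. The sketch uses floor() to form one-percentage-point bins, which will not line up exactly with the bin edges ggplot2 chooses; the names bin, county_count, and population are assumptions.

```r
library(ggplot2)  # only for the midwest data

# One-percentage-point bins (edges may differ from those ggplot2 picks)
bin <- floor(midwest$percbelowpoverty)

county_count <- tapply(midwest$poptotal, bin, length)  # unweighted: counties per bin
population   <- tapply(midwest$poptotal, bin, sum)     # weighted: people per bin
```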

That's the end of this chapter, hooray~
