Spark Broadcast Variables

The following article gives a detailed introduction to Spark broadcast variables and is worth reading:

https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-broadcast.html

Only the Example from that article is excerpted here.

Let’s start with an introductory example to check out how to use broadcast variables and build your initial understanding.

You’re going to use a static mapping of interesting projects with their websites, i.e. Map[String, String] that the tasks, i.e. closures (anonymous functions) in transformations, use.

scala> val pws = Map("Apache Spark" -> "http://spark.apache.org/", "Scala" -> "http://www.scala-lang.org/")
pws: scala.collection.immutable.Map[String,String] = Map(Apache Spark -> http://spark.apache.org/, Scala -> http://www.scala-lang.org/)

scala> val websites = sc.parallelize(Seq("Apache Spark", "Scala")).map(pws).collect
...
websites: Array[String] = Array(http://spark.apache.org/, http://www.scala-lang.org/)

It works, but it is inefficient: the pws map is serialized as part of the closure and sent over the wire to the executors, while it could have been there already. If more tasks need the pws map, you can improve their performance by minimizing the number of bytes sent over the network for task execution.

Enter broadcast variables.

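val pwsB = sc.broadcast(pws)
// Read pwsB.value inside the closure: map(pwsB.value) would evaluate it on the driver and pass the Map itself as the function
val websites = sc.parallelize(Seq("Apache Spark", "Scala")).map(name => pwsB.value(name)).collect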
// websites: Array[String] = Array(http://spark.apache.org/, http://www.scala-lang.org/)

Semantically, the two computations, with and without the broadcast value, are exactly the same, but the broadcast-based one performs better when many executors are spawned to run many tasks that use the pws map.
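To try both variants outside the REPL, the snippet below is a minimal sketch of a standalone program. The object name BroadcastSketch, the helper names withoutBroadcast and withBroadcast, the app name, and the local[*] master are all assumptions made for illustration and do not come from the original article. It runs the lookup once with a plain captured map and once with a broadcast variable, then checks that the results agree.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

// Hypothetical standalone sketch; names and settings are illustrative only.
object BroadcastSketch {
  val pws: Map[String, String] = Map(
    "Apache Spark" -> "http://spark.apache.org/",
    "Scala" -> "http://www.scala-lang.org/")

  // Without broadcast: the closure captures the map itself,
  // so the map is serialized together with the function.
  def withoutBroadcast(sc: SparkContext, names: Seq[String]): Array[String] = {
    val table = pws // local val, so the lambda captures the map directly
    sc.parallelize(names).map(name => table(name)).collect()
  }

  // With broadcast: the closure captures only the Broadcast handle,
  // and each task reads the value through pwsB.value on the executor.
  def withBroadcast(sc: SparkContext, names: Seq[String]): Array[String] = {
    val pwsB = sc.broadcast(pws)
    try sc.parallelize(names).map(name => pwsB.value(name)).collect()
    finally pwsB.destroy() // release the broadcast once it is no longer needed
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption so the sketch runs on a single machine.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("broadcast-sketch")
      .getOrCreate()
    val names = Seq("Apache Spark", "Scala")

    val plain = withoutBroadcast(spark.sparkContext, names)
    val broadcasted = withBroadcast(spark.sparkContext, names)

    // Both variants should yield the same lookups.
    assert(plain.sameElements(broadcasted))
    broadcasted.foreach(println)

    spark.stop()
  }
}
```

With a two-entry map the difference is negligible; the broadcast form is likely to matter only when the lookup table is large and many tasks read it.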

Summary

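As this article shows, an ordinary variable defined in the driver can also be passed to different tasks, but it is passed by copying it. To improve performance, you can define a broadcast variable instead: only one read-only copy is created on each machine, and it is shared by all tasks on that machine.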
