
Spark

Introduction

Big Data

Big data is commonly characterized by three properties: volume, variety, and velocity.

Installation

Installing Spark
$ brew install apache-spark
$ spark-shell
scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@1500edf3
Finding the installation directory
$ which spark-shell
/usr/local/bin/spark-shell
$ ls -al /usr/local/bin/spark-shell
lrwxr-xr-x 1 user admin 44 Nov 29 01:23 /usr/local/bin/spark-shell -> ../Cellar/apache-spark/2.4.4/bin/spark-shell
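
The symlink lookup above can also be done from Scala, which is handy when a script needs the install directory rather than a person reading `ls` output. This is a minimal sketch: `SparkHomeSketch` and `installDir` are hypothetical names, and it assumes the launcher sits in a `bin` directory directly under the version directory, as in the Homebrew layout shown above.

```scala
import java.nio.file.{Path, Paths}

object SparkHomeSketch {
  // Resolve symlinks, then step up from <dir>/bin/spark-shell to <dir>.
  // Assumes the launcher lives in a bin/ directory directly under the install directory.
  def installDir(launcher: Path): Path =
    launcher.toRealPath().getParent.getParent

  def main(args: Array[String]): Unit = {
    // Default path taken from the `which spark-shell` output above.
    val launcher = Paths.get(args.headOption.getOrElse("/usr/local/bin/spark-shell"))
    println(installDir(launcher))
  }
}
```

With the layout above, the resolved directory is the one that holds the README.md used in the next step.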
scala> val file = sc.textFile("file:///usr/local/Cellar/apache-spark/2.4.4/README.md")
file: org.apache.spark.rdd.RDD[String] = file:///usr/local/Cellar/apache-spark/2.4.4/README.md MapPartitionsRDD[1] at textFile at <console>:24

scala> val words = file.flatMap(_.split(" "))
words: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at flatMap at <console>:25

scala> val result = words.countByValue
result: scala.collection.Map[String,Long] = Map(site, -> 1, Please -> 4, Contributing -> 1, GraphX -> 1, project. -> 1, "" -> 72, for -> 12, find -> 1, Apache -> 1, package -> 1, Hadoop, -> 2, review -> 1, Once -> 1, For -> 3, name -> 1, this -> 1, protocols -> 1, Hive -> 2, in -> 6, "local[N]" -> 1, MASTER=spark://host:7077 -> 1, have -> 1, your -> 1, are -> 1, is -> 7, HDFS -> 1, Data. -> 1, built -> 1, thread, -> 1, examples -> 2, developing -> 1, using -> 5, system -> 1, than -> 1, Shell -> 2, mesos:// -> 1, 3"](https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3). -> 1, easiest -> 1, This -> 2, -T -> 1, [Apache -> 1, N -> 1, integration -> 1, <class> -> 1, different -> 1, "local" -> 1, README -> 1, YARN"](http://spark.apache.org/docs/latest/building-spark.h...

scala> result.get("For").get
res0: Long = 3
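
The `"" -> 72` entry in the result appears because `split(" ")` yields empty tokens for blank lines and runs of spaces. The sketch below shows one way to run the same count as a standalone program, filtering out empty tokens and printing the most frequent words. `WordCountSketch` and `countWords` are hypothetical names, `reduceByKey` is used in place of `countByValue` so the counts stay distributed until `take`, and the default path assumes the Homebrew 2.4.4 layout from the session above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object WordCountSketch {
  // Split each line on whitespace, drop empty tokens, and count each word.
  def countWords(lines: RDD[String]): RDD[(String, Long)] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1L))
      .reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM; the application name is arbitrary.
    val conf = new SparkConf().setAppName("WordCountSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Assumed default path; pass a different file as the first argument.
      val path = args.headOption
        .getOrElse("file:///usr/local/Cellar/apache-spark/2.4.4/README.md")
      // Sort by count, descending, and bring only the top 10 back to the driver.
      val top = countWords(sc.textFile(path)).sortBy(_._2, ascending = false).take(10)
      top.foreach { case (word, n) => println(s"$word\t$n") }
    } finally {
      sc.stop()
    }
  }
}
```

Inside `spark-shell`, the body of `countWords` can be applied directly to the `file` RDD, since `sc` is already provided there.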

Troubleshooting

Can’t assign requested address: Service 'sparkDriver' failed after 16 retries

Resolved by setting the following environment variable:

$ export SPARK_LOCAL_IP=127.0.0.1
$ spark-shell
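
For a standalone application, a similar effect may be achievable through configuration instead of the environment. The sketch below sets `spark.driver.bindAddress` and `spark.driver.host` to the loopback address. Both are documented Spark properties, but whether they resolve this particular error on a given machine is an assumption; only the `SPARK_LOCAL_IP` fix above was confirmed in these notes. `LocalDriverSketch` and `loopbackConf` are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalDriverSketch {
  // Build a configuration that pins the driver to the loopback interface.
  def loopbackConf(appName: String): SparkConf =
    new SparkConf()
      .setAppName(appName)
      .setMaster("local[*]")
      .set("spark.driver.bindAddress", "127.0.0.1") // address the driver listens on
      .set("spark.driver.host", "127.0.0.1")        // address advertised to executors

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(loopbackConf("LocalDriverSketch"))
    try {
      // Print the effective setting to confirm it was applied.
      println(sc.getConf.get("spark.driver.host"))
    } finally {
      sc.stop()
    }
  }
}
```

The same properties can be passed to `spark-shell` with `--conf spark.driver.bindAddress=127.0.0.1`, which avoids changing the shell environment.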