
HDFS Demo

1. Initial environment

Add the Spark core and Hadoop client dependencies to pom.xml. Note that the _2.11 suffix on the spark-core artifact must match your project's Scala version:

  <dependencies>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11 -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>1.6.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.6.4</version>
    </dependency>

  </dependencies>
 

2. Configure SparkConf and run the job

Code
package hdfs

import org.apache.spark.{SparkConf, SparkContext}

object HdfsWordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("HdfsWordCount")
      .setMaster("local[5]")
    val sc = new SparkContext(sparkConf)
    // Read from the local filesystem; switch to the HDFS URI below to read from HDFS instead.
    val lines = sc.textFile("file:/Users/jiangzl/Desktop/testSet.txt")
//    val lines = sc.textFile("hdfs://localhost:9000/user/datasys/input/*")
    // split("") breaks each line into single characters, so this counts
    // character frequencies rather than words.
    val words = lines.flatMap(_.split(""))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _).sortBy(_._2, ascending = false)
    println("----- start -----")
    // Print each (character, count) pair to the console
    wordCounts.foreach { case (word, count) => println(word + " : " + count) }
    // Write to a local file (saveAsTextFile creates a directory of part files, not a single file)
    wordCounts.repartition(1).saveAsTextFile("file:/Users/jiangzl/Desktop/result.txt")
    // Write to HDFS
    wordCounts.repartition(1).saveAsTextFile("hdfs://localhost:9000/user/datasys/result.txt")
    sc.stop()
  }
}
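
One caveat: saveAsTextFile writes a directory of part files (part-00000 and so on) and throws an exception if the target path already exists, so a re-run against the same path fails. A minimal sketch of cleaning up a stale output directory first, using the Hadoop FileSystem API that the hadoop-client dependency already provides; the path is taken from the example above:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Delete the output directory if it exists, so saveAsTextFile
// does not fail with "output directory already exists" on a re-run.
val outputDir = "hdfs://localhost:9000/user/datasys/result.txt"
val fs = FileSystem.get(new URI(outputDir), new Configuration())
fs.delete(new Path(outputDir), true) // recursive = true

Place this before the saveAsTextFile calls if you intend to run the job repeatedly.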
Output
( ,850)
(a,264)
(e,220)
(t,209)
(i,203)
(s,165)
(d,154)
(r,149)
(_,149)
(o,139)
(l,124)
(p,114)
(,,112)
(c,105)
(',102)
(n,99)
(,91)
((,67)
(),67)
(u,66)
(y,64)
(.,55)
(h,40)
($,39)
(-,37)
(m,37)
(f,35)
(b,30)
(k,25)
(1,25)
(",20)
(0,19)
(g,18)
(\,17)
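
The listing above is a character-frequency table, not a word count: split("") breaks every line into single characters, which is why the most common entry is the space character (850 occurrences). If whole-word counts are wanted instead, the two transformation lines can split on runs of whitespace, for example:

// Split on runs of whitespace so that whole words are counted.
val words = lines.flatMap(_.split("\\s+"))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _).sortBy(_._2, ascending = false)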

 

 
