18.5.15

Example list in Learning Spark

Example 2-2. Scala line count
Example 2-3. Examining the sc variable
Example 2-4. Python filtering example
Example 2-5. Scala filtering example
Example 2-6. Running a Python script
Example 2-7. Initializing Spark in Python
Example 2-9. Initializing Spark in Java
Example 2-10. Word count Java application: don't worry about the details yet
Example 2-11. Word count Scala application: don't worry about the details yet
Example 2-13. Maven build file
Example 2-14. Scala build and run
Example 2-15. Maven build and run
Example 3-1. Creating an RDD of strings with textFile() in Python
Example 3-2. Calling the filter() transformation
Example 3-3. Calling the first() action
Example 3-4. Persisting an RDD in memory
Example 3-5. parallelize() method in Python
Example 3-6. parallelize() method in Scala
Example 3-7. parallelize() method in Java
Example 3-8. textFile() method in Python
Example 3-9. textFile() method in Scala
Example 3-10. textFile() method in Java
Example 3-11. filter() transformation in Python
Example 3-12. filter() transformation in Scala
Example 3-13. filter() transformation in Java
Example 3-16. Scala error count using actions
Example 3-17. Java error count using actions
Example 3-18. Passing functions in Python
Example 3-19. Passing a function with field references (don't do this!)
Example 3-20. Python function passing without field references
Example 3-22. Java function passing with anonymous inner class
Example 3-23. Java function passing with named class
Example 3-24. Java function class with parameters
Example 3-25. Java function passing with lambda expression in Java 8
Example 3-26. Python squaring the values in an RDD
Example 3-27. Scala squaring the values in an RDD
Example 3-28. Java squaring the values in an RDD
Example 3-29. flatMap() in Python, splitting lines into words
Example 3-30. flatMap() in Scala, splitting lines into multiple words
Example 3-33. reduce() in Scala
Example 3-34. reduce() in Java
Example 3-36. aggregate() in Scala
Example 3-37. aggregate() in Java
Example 3-38. Creating DoubleRDD in Java
Example 3-39. Double execution in Scala
Example 3-40. persist() in Scala
Example 4-1. Creating a pair RDD using the first word as the key in Python
Example 4-2. Creating a pair RDD using the first word as the key in Scala
Example 4-3. Creating a pair RDD using the first word as the key in Java
Example 4-4. Simple filter on second element in Python
Example 4-5. Simple filter on second element in Scala
Example 4-7. Per-key average with reduceByKey() and mapValues() in Python
Example 4-8. Per-key average with reduceByKey() and mapValues() in Scala
Example 4-9. Word count in Python
Example 4-10. Word count in Scala
Example 4-11. Word count in Java
Example 4-12. Per-key average using combineByKey() in Python
Example 4-13. Per-key average using combineByKey() in Scala
Example 4-14. Per-key average using combineByKey() in Java
Example 4-16. reduceByKey() with custom parallelism in Scala
Example 4-17. Scala shell inner join
Example 4-18. leftOuterJoin() and rightOuterJoin()
Example 4-19. Custom sort order in Python, sorting integers as if strings
Example 4-20. Custom sort order in Scala, sorting integers as if strings
Example 4-21. Custom sort order in Java, sorting integers as if strings
Example 4-22. Scala simple application
Example 4-23.
Example 4-25. Scala PageRank
Example 4-26. Scala custom partitioner
Example 4-27. Python custom partitioner
Example 5-1. Loading a text file in Python
Example 5-2. Loading a text file in Scala
Example 5-3. Loading a text file in Java
Example 5-4. Average value per file in Scala
Example 5-5. Saving as a text file in Python
Example 5-7. Loading JSON in Scala
Example 5-8. Loading JSON in Java
Example 5-9. Saving JSON in Python
Example 5-10. Saving JSON in Scala
Example 5-11. Saving JSON in Java
Example 5-12. Loading CSV with textFile() in Python
Example 5-13. Loading CSV with textFile() in Scala
Example 5-14. Loading CSV with textFile() in Java
Example 5-15. Loading CSV in full in Python
Example 5-17. Loading CSV in full in Java
Example 5-18. Writing CSV in Python
Example 5-19. Writing CSV in Scala
Example 5-21. Loading a SequenceFile in Scala
Example 5-22. Loading a SequenceFile in Java
Example 5-23. Saving a SequenceFile in Scala
Example 5-24. Loading KeyValueTextInputFormat() with old-style API in Scala
Example 5-25. Loading LZO-compressed JSON with Elephant Bird in Scala
Example 5-26. Saving a SequenceFile in Java
Example 5-27. Sample protocol buffer definition
Example 5-28. Elephant Bird protocol buffer writeout in Scala
Example 5-29. Loading a compressed text file from the local filesystem in Scala
Example 5-30. Creating a HiveContext and selecting data in Python
Example 5-31. Creating a HiveContext and selecting data in Scala
Example 5-32. Creating a HiveContext and selecting data in Java
Example 5-33. Sample tweets in JSON
Example 5-34. JSON loading with Spark SQL in Python
Example 5-35. JSON loading with Spark SQL in Scala
Example 5-36. JSON loading with Spark SQL in Java
Example 5-37. JdbcRDD in Scala
Example 5-38. sbt requirements for Cassandra connector
Example 5-39. Maven requirements for Cassandra connector
Example 5-40. Setting the Cassandra property in Scala
Example 5-42. Loading the entire table as an RDD with key/value data in Scala
Example 5-43. Loading the entire table as an RDD with key/value data in Java
Example 5-45. Scala example of reading from HBase
Example 5-46. Elasticsearch output in Scala
Example 5-47. Elasticsearch input in Scala
Example 6-1. Sample call log entry in JSON, with some fields removed
Example 6-2. Accumulator empty line count in Python
Example 6-3. Accumulator empty line count in Scala
Example 6-4. Accumulator empty line count in Java
Example 6-5. Accumulator error count in Python
Example 6-6. Country lookup in Python
Example 6-7. Country lookup with Broadcast values in Python
Example 6-8. Country lookup with Broadcast values in Scala
Example 6-10. Shared connection pool in Python
Example 6-11. Shared connection pool and JSON parser in Scala
Example 6-12. Shared connection pool and JSON parser in Java
Example 6-13. Average without mapPartitions() in Python
Example 6-14. Average with mapPartitions() in Python
Example 6-15. R distance program
Example 6-16. Driver program using pipe() to call finddistance.R in Python
Example 6-17. Driver program using pipe() to call finddistance.R in Scala
Example 6-18. Driver program using pipe() to call finddistance.R in Java
Example 6-20. Removing outliers in Scala
Example 6-21. Removing outliers in Java
Example 7-1. Submitting a Python application
Example 7-2. Submitting an application with extra arguments
Example 7-3. General format for spark-submit
Example 7-4. Using spark-submit with various options
Example 7-5. pom.xml file for a Spark application built with Maven
Example 7-6. Packaging a Spark application built with Maven
Example 7-8. Adding the assembly plug-in to an sbt project build
Example 8-1. Creating an application using a SparkConf in Python
Example 8-3. Creating an application using a SparkConf in Java
Example 8-5. Setting configuration values at runtime using a defaults file
Example 8-6. input.txt, the source file for our example
Example 8-7. Processing text data in the Scala Spark shell
Example 8-9. Collecting an RDD
Example 8-10. Computing an already cached RDD
Example 8-11. Coalescing a large RDD in the PySpark shell
Example 8-12. Registering a class allows Kryo to avoid writing full class names with
Example 9-1. Maven coordinates for Spark SQL with Hive support
Example 9-2. Scala SQL imports
Example 9-3. Scala SQL implicits
Example 9-4. Java SQL imports
Example 9-5. Python SQL imports
Example 9-6. Constructing a SQL context in Scala
Example 9-7. Constructing a SQL context in Java
Example 9-8. Constructing a SQL context in Python
Example 9-9. Loading and querying tweets in Scala
Example 9-10. Loading and querying tweets in Java
Example 9-11. Loading and querying tweets in Python
Example 9-12. Accessing the text column (also first column) in the topTweets
Example 9-13. Accessing the text column (also first column) in the topTweets
Example 9-14. Accessing the text column in the topTweets DataFrame in Python
Example 9-15. Hive load in Python
Example 9-16. Hive load in Scala
Example 9-17. Hive load in Java
Example 9-18. Parquet load in Python
Example 9-20. Parquet file save in Python
Example 9-21. Input records
Example 9-22. Loading JSON with Spark SQL in Python
Example 9-23. Loading JSON with Spark SQL in Scala
Example 9-25. Resulting schema from printSchema()
Example 9-26. Partial schema of tweets
Example 9-27. SQL query nested and array elements
Example 9-28. Creating a DataFrame using Row and named tuple in Python
Example 9-29. Creating a DataFrame from case class in Scala
Example 9-30. Creating a DataFrame from a JavaBean in Java
Example 9-32. Connecting to the JDBC server with Beeline
Example 9-33. Load table
Example 9-35. Spark SQL shell EXPLAIN
Example 9-36. Python string length UDF
Example 9-37. Scala string length UDF
Example 9-38. Java UDF imports
Example 9-40. Spark SQL multiple sums
Example 9-41. Beeline command for enabling codegen
Example 10-1. Maven coordinates for Spark Streaming
Example 10-2. Scala streaming imports
Example 10-3. Java streaming imports
Example 10-4. Streaming filter for printing lines containing "error" in Scala
Example 10-5. Streaming filter for printing lines containing "error" in Java
Example 10-6. Streaming filter for printing lines containing "error" in Scala
Example 10-7. Streaming filter for printing lines containing "error" in Java
Example 10-9. Log output from running Example 10-8
Example 10-10. map() and reduceByKey() on DStream in Scala
Example 10-11. map() and reduceByKey() on DStream in Java
Example 10-12. Joining two DStreams in Scala
Example 10-14. transform() on a DStream in Scala
Example 10-15. transform() on a DStream in Java
Example 10-16. Setting up checkpointing
Example 10-17. How to use window() to count data over a window in Scala
Example 10-18. How to use window() to count data over a window in Java
Example 10-19. Scala visit counts per IP address
Example 10-20. Java visit counts per IP address
Example 10-21. Windowed count operations in Scala
Example 10-22. Windowed count operations in Java
Example 10-23. Running count of response codes using updateStateByKey() in Scala
Example 10-24. Running count of response codes using updateStateByKey() in Java
Example 10-25. Saving DStream to text files in Scala
Example 10-26. Saving SequenceFiles from a DStream in Scala
Example 10-27. Saving SequenceFiles from a DStream in Java
Example 10-28. Saving data to external systems with foreachRDD() in Scala
Example 10-29. Streaming text files written to a directory in Scala
Example 10-30. Streaming text files written to a directory in Java
Example 10-31. Streaming SequenceFiles written to a directory in Scala
Example 10-32. Apache Kafka subscribing to Panda's topic in Scala
Example 10-33. Apache Kafka subscribing to Panda's topic in Java
Example 10-34. Apache Kafka directly reading Panda's topic in Scala
Example 10-35. Apache Kafka directly reading Panda's topic in Java
Example 10-36. Flume configuration for Avro sink
Example 10-37. FlumeUtils agent in Scala
Example 10-38. FlumeUtils agent in Java
Example 10-39. Maven coordinates for Flume sink
Example 10-40. Flume configuration for custom sink
Example 10-41. FlumeUtils custom sink in Scala
Example 10-43. SparkFlumeEvent in Scala
Example 10-45. Setting up a driver that can recover from failure in Scala
Example 10-46. Setting up a driver that can recover from failure in Java
Example 10-47. Launching a driver in supervise mode
Example 10-48. Enable the Concurrent Mark-Sweep GC
Example 11-1. Spam classifier in Python
Example 11-2. Spam classifier in Scala
Example 11-3. Spam classifier in Java
Example 11-4. Creating vectors in Python
Example 11-5. Creating vectors in Scala
Example 11-6. Creating vectors in Java
Example 11-7. Using HashingTF in Python
Example 11-8. Using TF-IDF in Python
Example 11-9. Scaling vectors in Python
Example 11-10. Linear regression in Python
Example 11-11. Linear regression in Scala
Example 11-12. Linear regression in Java
Example 11-13. PCA in Scala
Example 11-14. SVD in Scala
Example 11-15. Pipeline API version of spam classification in Scala

