Sample PySpark SimpleApp.py

This is the SimpleApp.py provided in the Spark docs, with a few minor modifications:

  • Changed SparkSession.builder() to SparkSession.builder (in PySpark, builder is an attribute, not a method)
  • Set the application name: appName("SimpleApp")
  • Removed the master(master) call
"""SimpleApp.py"""
from pyspark.sql import SparkSession

logFile = "/home/hadoop/spark/README.md" # Should be some file on your system
spark = SparkSession.builder.appName("SimpleApp").getOrCreate() # Updated this line
logData = spark.read.text(logFile).cache()

numAs = logData.filter(logData.value.contains('a')).count()
numBs = logData.filter(logData.value.contains('b')).count()

print("Lines with a: %i, lines with b: %i" % (numAs, numBs))

spark.stop()
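
To sanity-check the numbers the script prints, the same filter-and-count logic can be sketched in plain Python, with no Spark involved. The helper names count_lines_containing and summarize are hypothetical, and the sample lines are made up for illustration. Treat this as a rough cross-check under those assumptions, not as part of the Spark example itself.

```python
def count_lines_containing(lines, needle):
    """Count the lines that contain `needle` as a case-sensitive substring."""
    return sum(1 for line in lines if needle in line)


def summarize(lines):
    """Build the same summary string that SimpleApp.py prints."""
    num_as = count_lines_containing(lines, "a")
    num_bs = count_lines_containing(lines, "b")
    return "Lines with a: %i, lines with b: %i" % (num_as, num_bs)


if __name__ == "__main__":
    # Made-up sample lines; replace with the lines of your own file.
    sample = ["apache spark", "big data", "hello"]
    print(summarize(sample))  # Lines with a: 2, lines with b: 1
```

Running it on the lines of the same file that logFile points to should give the same two counts as the Spark job, which is a quick way to confirm the job read the file you expected.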

Save the file and run it:

python SimpleApp.py

Output:

[Screenshot: spark-simple-app-output]
