Sample PySpark Application

Minor modifications to the example provided in the Spark docs:

  • Changed builder() to builder
  • Provided the appName: appName("SimpleApp")
  • Removed the master(master) call

from pyspark.sql import SparkSession

logFile = "/home/hadoop/spark/" # Should be some file on your system
spark = SparkSession.builder.appName("SimpleApp").getOrCreate() # Updated this line
logData = spark.read.text(logFile).cache()  # One row per line, in a string column named "value"

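# Count the lines that contain 'a' and the lines that contain 'b'
numAs = logData.filter(logData.value.contains('a')).count()
numBs = logData.filter(logData.value.contains('b')).count()

print("Lines with a: %i, lines with b: %i" % (numAs, numBs))

spark.stop()  # Release the session's resources when done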
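To sanity-check the numbers on a small file, the two `filter(...).count()` steps above can be mirrored in plain Python. This is a minimal sketch, not part of the Spark docs example: the function names `count_lines_with` and `count_a_and_b` are hypothetical, and it assumes the input is an iterable of text lines (such as an open file), matching how `spark.read.text` turns each line into one row.

```python
from typing import Iterable, Tuple


def count_lines_with(lines: Iterable[str], needle: str) -> int:
    """Count lines containing `needle` (case-sensitive), mirroring
    logData.filter(logData.value.contains(needle)).count()."""
    return sum(1 for line in lines if needle in line)


def count_a_and_b(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (numAs, numBs) for an iterable of text lines, e.g. an open file.

    Hypothetical helper for checking the Spark job's output locally.
    """
    # Materialize once: a file object can only be iterated a single time,
    # but we need two passes (one for 'a', one for 'b').
    rows = [line.rstrip("\r\n") for line in lines]
    return count_lines_with(rows, "a"), count_lines_with(rows, "b")
```

For a single local text file, calling `count_a_and_b(f)` inside a `with open(path, encoding="utf-8") as f:` block (where `path` is a placeholder) should give the same pair of numbers as the Spark job run on that same file. Like `contains`, Python's `in` is case-sensitive, so a line with only an uppercase "A" is not counted.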


Save the script and run it.
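Since the master(master) call was removed, the master URL is no longer set in the code; it can instead be supplied when the script is launched with `spark-submit` and its `--master` option. The helper below is a hypothetical sketch that only builds the command line. The script name, the `spark_home` parameter and the `local[4]` value in the usage note are assumptions, not values from the original post.

```python
import os
from typing import List, Optional


def spark_submit_command(script: str,
                         master: Optional[str] = None,
                         spark_home: Optional[str] = None) -> List[str]:
    """Build the argument list for launching `script` with spark-submit.

    Hypothetical helper: because the script no longer calls .master(...),
    an optional master URL is passed via spark-submit's --master option.
    """
    if spark_home:
        # Use the spark-submit shipped in the given Spark installation.
        executable = os.path.join(spark_home, "bin", "spark-submit")
    else:
        # Assumes spark-submit is on the PATH.
        executable = "spark-submit"
    cmd = [executable]
    if master:
        cmd += ["--master", master]
    cmd.append(script)
    return cmd
```

To actually launch the job, the list can be passed to `subprocess.run`, for example `subprocess.run(spark_submit_command("SimpleApp.py", master="local[4]"), check=True)`. Leaving `master` out means the master comes from the Spark configuration of the installation rather than from the code.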