- Method 1: Use environment variables
– The variable can be set with export SPARK_HOME='/Users/donghua/spark-2.4.0-bin-hadoop2.7' before launching the python command
– The method below sets the variables inside the Python script itself, which is useful in notebook environments
Donghuas-MacBook-Air:spark-2.4.0-bin-hadoop2.7 donghua$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 29 2018, 19:04:46)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['SPARK_HOME'] = '/Users/donghua/spark-2.4.0-bin-hadoop2.7'
>>> os.environ['PYTHONPATH'] = '/Users/donghua/spark-2.4.0-bin-hadoop2.7/python:/Users/donghua/spark-2.4.0-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip'
>>> os.environ['PYSPARK_PYTHON'] = 'python3'
>>> os.environ['PYSPARK_DRIVER_PYTHON'] = 'python3'
>>> print (os.environ['SPARK_HOME'] )
/Users/donghua/spark-2.4.0-bin-hadoop2.7
>>> from pyspark import SparkContext
>>> sc = SparkContext('local','handson Spark')
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
>>> print(sc.version)
2.4.0
>>> exit()
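The exports above can be wrapped in a small helper so a script or notebook only needs one call. This is a minimal sketch, not part of Spark itself: `set_spark_env` is a hypothetical name, and the py4j zip filename must match the file shipped under your Spark release's python/lib directory.

```python
import os

def set_spark_env(spark_home, py4j_zip="py4j-0.10.7-src.zip"):
    """Hypothetical helper: set the environment variables PySpark needs.

    Mirrors the manual assignments above; py4j_zip varies by Spark release.
    """
    python_dir = os.path.join(spark_home, "python")
    os.environ["SPARK_HOME"] = spark_home
    # PYTHONPATH must include both Spark's python dir and the py4j zip.
    os.environ["PYTHONPATH"] = os.pathsep.join(
        [python_dir, os.path.join(python_dir, "lib", py4j_zip)]
    )
    os.environ["PYSPARK_PYTHON"] = "python3"
    os.environ["PYSPARK_DRIVER_PYTHON"] = "python3"

set_spark_env("/Users/donghua/spark-2.4.0-bin-hadoop2.7")
print(os.environ["SPARK_HOME"])
```

Call it before the first `from pyspark import SparkContext`; once a JVM gateway has been started, changing these variables has no effect.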
- Method 2: Use the findspark package (recommended)
Donghuas-MacBook-Air:spark-2.4.0-bin-hadoop2.7 donghua$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 29 2018, 19:04:46)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import findspark
>>> findspark.init('/Users/donghua/spark-2.4.0-bin-hadoop2.7')
>>> from pyspark import SparkContext
>>> sc = SparkContext('local','handson Spark')
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
>>> print(sc.version)
2.4.0
>>> exit()
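Under the hood, `findspark.init` does roughly this: it sets SPARK_HOME and prepends Spark's Python directories to `sys.path`, so that `import pyspark` resolves to the chosen installation rather than any pip-installed copy. A minimal stdlib-only sketch of that mechanism (the glob pattern for the py4j zip is an assumption about the directory layout, not findspark's actual code):

```python
import glob
import os
import sys

def init_spark(spark_home):
    """Rough sketch of what findspark.init does with a given Spark home."""
    # Point SPARK_HOME at the chosen installation.
    os.environ["SPARK_HOME"] = spark_home
    python_dir = os.path.join(spark_home, "python")
    # Locate the bundled py4j source zip (its version differs per release).
    py4j = glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))
    # Prepend, so this Spark wins over any pip-installed pyspark on sys.path.
    sys.path[:0] = [python_dir] + py4j
```

Because the paths are prepended rather than appended, this works even when another pyspark is already installed in site-packages.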
Without setting the variable, the default Spark home is used, and the outcome depends on where the pyspark package is installed (in this case, Spark 2.3.2 is used instead of 2.4.0)
Donghuas-MacBook-Air:spark-2.4.0-bin-hadoop2.7 donghua$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 29 2018, 19:04:46)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyspark import SparkContext
>>> sc = SparkContext('local','handson Spark')
2019-03-27 18:22:32 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
>>>
>>> print(sc.version)
2.3.2
>>> exit()
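To confirm which installation a plain `import pyspark` would pick up, you can inspect the module's origin on `sys.path` before creating a context. This is a stdlib-only check (`module_origin` is an illustrative helper name); it is demonstrated here with a stdlib module since the result for `pyspark` depends on your machine:

```python
import importlib.util

def module_origin(name):
    """Return the file a module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# module_origin("pyspark") shows whether the pip-installed copy or the
# one under SPARK_HOME would win; shown with "json" for illustration.
print(module_origin("json"))
```

If the reported path points into site-packages rather than your SPARK_HOME, the pip-installed pyspark (2.3.2 above) is the one that will be imported.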