Last active
June 12, 2016 23:49
Load a data source into a DataFrame using the Spark DataSource API
import os

# Put the PostgreSQL JDBC driver on the classpath before the SparkContext starts.
os.environ['SPARK_CLASSPATH'] = "/path/to/driver/postgresql-9.3-1103.jdbc41.jar"

from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext("local[*]", '<JOBNAME>')
sqlctx = SQLContext(sc)

# Load the table through the JDBC data source (Spark 1.3-style call).
df = sqlctx.load(
    source="jdbc",
    url="jdbc:postgresql://<HOST>/<DATABASE>?user=<USERNAME>&password=<PASSWORD>",
    dbtable="<SCHEMA>.<TABLENAME>")
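On Spark 1.4 and later the same load can be written with the DataFrameReader (`.read.format("jdbc")`) instead of `sqlctx.load`. The following is a minimal sketch of that variant, not part of the original gist: `jdbc_options` and `load_table` are hypothetical helper names, and the host, database, table and credentials are made-up placeholders. It passes the credentials as separate `user` and `password` options rather than embedding them in the URL query string.

```python
def jdbc_options(host, database, table, user, password):
    """Build the option dict for Spark's JDBC data source (hypothetical helper)."""
    return {
        "url": "jdbc:postgresql://{}/{}".format(host, database),
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


def load_table(spark, options):
    """Read a table into a DataFrame.

    `spark` is assumed to be an object exposing `.read`, such as a
    SQLContext on Spark 1.4+ or a SparkSession on Spark 2.x.
    """
    return spark.read.format("jdbc").options(**options).load()


# Placeholder values for illustration only.
opts = jdbc_options("db.example.com", "sales", "public.orders", "reader", "secret")
print(opts["url"])
```

With a running context, `df = load_table(sqlctx, opts)` would be the counterpart of the `sqlctx.load(...)` call above. Keeping the credentials out of the URL means a password containing characters such as `&` or `?` does not need escaping.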