Chapter 1: Introduction [nbviewer] [ipynb]
Chapter 2: Preliminaries [nbviewer] [ipynb]
Quick setup:
Download and install Anaconda Python.
Download and extract Apache Spark. Set SPARK_HOME
environment variable.
Install Apache SystemML: pip install systemml
Add the following to ~/.ipython/profile_default/startup/00-default-setup.py:
import os
import sys
from glob import glob

spark_home = os.environ['SPARK_HOME']
sys.path.insert(0, os.path.join(spark_home, 'python'))
# Glob for the bundled py4j zip so the path survives py4j version bumps
# (Spark 2.1.0 ships py4j-0.10.4, not py4j-0.9).
sys.path.insert(0, glob(os.path.join(spark_home, 'python', 'lib', 'py4j-*-src.zip'))[0])
# execfile is Python 2 only; this setup assumes Anaconda Python 2.7.
execfile(os.path.join(spark_home, 'python', 'pyspark', 'shell.py'))
git clone https://github.com/niketanpansare/systemml-book.git
cd systemml-book
jupyter notebook
On OS X:
Install Java: the Java version should be 1.8 or later. Check with
java -version
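If you want to script this check, the `java -version` banner can be parsed. The helper below is an illustrative sketch only (the function name and regex are my own, not part of the book's setup):

```python
import re

def java_version_ok(version_line, minimum=(1, 8)):
    """Return True if a banner like 'java version "1.8.0_121"'
    reports at least the given (major, minor) version.
    Java 9+ banners such as 'java version "9"' have no minor part,
    so a missing minor defaults to 0."""
    m = re.search(r'"(\d+)(?:\.(\d+))?', version_line)
    if not m:
        return False
    major, minor = int(m.group(1)), int(m.group(2) or 0)
    return (major, minor) >= minimum

print(java_version_ok('java version "1.8.0_121"'))  # True
print(java_version_ok('java version "1.7.0_80"'))   # False
```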
Set the JAVA_HOME environment variable:
export JAVA_HOME="$(/usr/libexec/java_home)"
Download Spark from https://spark.apache.org/downloads.html, move it to your home directory, and extract it:
tar -xzf spark-2.1.0-bin-hadoop2.7.tgz
and set environment variables to point to the extracted directory:
export SPARK_HOME="$HOME/spark-2.1.0-bin-hadoop2.7"
export HADOOP_HOME=$SPARK_HOME
export SPARK_LOCAL_IP=127.0.0.1
Install Python, Jupyter, and other libraries: Download and install Anaconda Python 2.7 from https://www.continuum.io/downloads#macos (includes Jupyter and pip).
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
pyspark
Install Apache SystemML
pip install systemml
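A quick way to confirm the install succeeded is to check that the package imports. This generic sketch is mine, not an official SystemML API; after `pip install systemml`, `is_importable('systemml')` should return True:

```python
import importlib

def is_importable(name):
    """Return True if the named package can be imported,
    e.g. is_importable('systemml') after pip install systemml."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False

print(is_importable('json'))  # stdlib module, always present: True
```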
Download and start viewing the book
git clone https://github.com/niketanpansare/systemml-book.git
cd systemml-book
$SPARK_HOME/bin/pyspark --master local[*] --driver-memory 8G
On Windows:
Install Java: the Java version should be 1.8 or later. Check with
java -version
Set the JAVA_HOME environment variable and add %JAVA_HOME%\bin to the PATH environment variable. Verify with
dir "%JAVA_HOME%"
Download and extract Spark from https://spark.apache.org/downloads.html,
tar -xzf spark-2.1.0-bin-hadoop2.7.tgz
and set environment variable SPARK_HOME
to point to the extracted directory.
Next, install winutils.
Then modify the permissions of the hive directory that Spark will use:
winutils.exe chmod 777 /tmp/hive
Finally, check if Spark is correctly installed:
%SPARK_HOME%\bin\spark-shell
%SPARK_HOME%\bin\pyspark
Install Python, Jupyter, and other libraries: Download and install Anaconda Python 2.7 from https://www.continuum.io/downloads (includes Jupyter and pip).
Install Apache SystemML
pip install systemml
Download and start viewing the book
git clone https://github.com/niketanpansare/systemml-book.git
cd systemml-book
set PYSPARK_DRIVER_PYTHON=jupyter
set PYSPARK_DRIVER_PYTHON_OPTS=notebook
%SPARK_HOME%\bin\pyspark --master local[*] --driver-memory 8G