systemml-book

Introduction to Machine Learning with Apache SystemML

Table of Contents

Chapter 1: Introduction [nbviewer] [ipynb]

Chapter 2: Preliminaries [nbviewer] [ipynb]

Simple instructions to run the notebooks locally

  1. Download and install Anaconda Python.

  2. Download and extract Apache Spark, and set the SPARK_HOME environment variable to the extracted directory.

  3. Install Apache SystemML: pip install systemml (a quick smoke test appears after this list)

  4. Add the following to ~/.ipython/profile_default/startup/00-default-setup.py:

  # Make PySpark importable using the SPARK_HOME set in step 2.
  import os
  import sys
  spark_home = os.environ['SPARK_HOME']
  sys.path.insert(0, os.path.join(spark_home, 'python'))
  # py4j is the Python-JVM bridge bundled with Spark; adjust the version
  # to match the file under $SPARK_HOME/python/lib.
  sys.path.insert(0, os.path.join(spark_home, 'python', 'lib', 'py4j-0.9-src.zip'))
  # Launch the PySpark shell, which creates the SparkContext `sc`.
  # Note: execfile exists only in Python 2, so use a Python 2 environment
  # such as Anaconda Python 2.7.
  execfile(os.path.join(spark_home, 'python', 'pyspark', 'shell.py'))
  5. Download and start viewing the book:

git clone https://github.com/niketanpansare/systemml-book.git
cd systemml-book
jupyter notebook
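
With the notebook server running, the quick smoke test promised above confirms that SystemML can execute a DML script. This is a minimal sketch based on the MLContext API from the SystemML Python docs; it assumes the startup script above has created the SparkContext `sc`:

  # Run in a notebook cell; `sc` comes from the pyspark startup script above.
  from systemml import MLContext, dml

  ml = MLContext(sc)
  # A one-line DML script that declares a string and exposes it as output 's'.
  script = dml("s = 'Hello, SystemML!'").output('s')
  print(ml.execute(script).get('s'))   # prints: Hello, SystemML!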

Detailed instructions to run the notebooks locally on MacOS/Linux

  1. Install Java: The Java version should be 1.8 or newer. Verify with:

    java -version

Then set the JAVA_HOME environment variable:

export JAVA_HOME="$(/usr/libexec/java_home)"

  2. Download Spark from https://spark.apache.org/downloads.html, move the archive to your home directory, and extract it:

    tar -xzf spark-2.1.0-bin-hadoop2.7.tgz

and set environment variables to point to the extracted directory:

export SPARK_HOME="$HOME/spark-2.1.0-bin-hadoop2.7"
export HADOOP_HOME=$SPARK_HOME
export SPARK_LOCAL_IP=127.0.0.1

  3. Install Python, Jupyter, and other libraries: Download and install Anaconda Python 2.7 from https://www.continuum.io/downloads#macos (includes Jupyter and pip). Then configure PySpark to open a Jupyter notebook:

    export PYSPARK_DRIVER_PYTHON=jupyter
    export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
    pyspark

  4. Install Apache SystemML:

    pip install systemml

  5. Download and start viewing the book:

git clone https://github.com/niketanpansare/systemml-book.git
cd systemml-book
$SPARK_HOME/bin/pyspark --master local[*] --driver-memory 8G
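
Once the notebook opens, a small end-to-end check is to push a NumPy matrix through a DML script and read the result back. The sketch below assumes the MLContext API from the SystemML Python docs and the SparkContext `sc` that pyspark creates on startup:

  import numpy as np
  from systemml import MLContext, dml

  ml = MLContext(sc)
  m = np.arange(1.0, 5.0).reshape(2, 2)            # [[1, 2], [3, 4]]
  # '%*%' is DML's matrix multiply; 'm' is bound from NumPy, 'out' is read back.
  script = dml("out = m %*% m").input(m=m).output("out")
  print(ml.execute(script).get("out").toNumPy())   # [[ 7. 10.] [15. 22.]]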

Detailed instructions to run the notebooks locally on Windows

  1. Install Java: The Java version should be 1.8 or newer. Verify with:

    java -version

Set the JAVA_HOME environment variable, include %JAVA_HOME%\bin in the PATH environment variable, and verify:

dir "%JAVA_HOME%"
  2. Download and extract Spark from https://spark.apache.org/downloads.html:

    tar -xzf spark-2.1.0-bin-hadoop2.7.tgz

and set the SPARK_HOME environment variable to point to the extracted directory.

Next, install winutils: download a winutils.exe built for the Hadoop version your Spark bundle targets (2.7 here), place it in a folder such as C:\winutils\bin, and set the HADOOP_HOME environment variable to that folder's parent (C:\winutils in this example).

Finally, check that Spark is correctly installed:

%SPARK_HOME%\bin\spark-shell
%SPARK_HOME%\bin\pyspark
  3. Install Python, Jupyter, and other libraries: Download and install Anaconda Python 2.7 from https://www.continuum.io/downloads (includes Jupyter and pip).

  4. Install Apache SystemML:

    pip install systemml

  5. Download and start viewing the book:

git clone https://github.com/niketanpansare/systemml-book.git
cd systemml-book
set PYSPARK_DRIVER_PYTHON=jupyter
set PYSPARK_DRIVER_PYTHON_OPTS=notebook
%SPARK_HOME%\bin\pyspark --master local[*] --driver-memory 8G
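
As on MacOS/Linux, it is worth running a quick sanity check in the first notebook cell. A minimal sketch, assuming pyspark's startup has created the SparkContext `sc`:

  import os
  print(sc.version)                # Spark version, e.g. 2.1.0
  print(os.environ['SPARK_HOME'])  # should be the extracted Spark directory
  import systemml                  # confirms the pip-installed package is importable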