To use Python with the built-in Zeppelin notebooks on Oracle Big Data Cloud (BDC), follow these instructions. They are based on a single-node instance.

Versions:

Python (from the Anaconda3 5.0.1 installer): 3.6.3

Oracle BDC: 17.3.3-20

Steps I took.

Get Anaconda
- Log in to the BDC node using ssh
- mkdir anaconda
- cd anaconda
- wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh
- Or use the installer for the latest version instead
Install Anaconda (I installed the Python 3.6 version; the Python 2.7 version also works)
- sudo bash Anaconda3-5.0.1-Linux-x86_64.sh
- Accept the licence
- When prompted for the install location, enter /opt/anaconda
Check if everything is OK
- Log out of the ssh session
- Connect over ssh again
- /opt/anaconda/bin/conda list
- If you get a list of packages, you are good to go.
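The same check can be scripted. The following is a minimal sketch, assuming the /opt/anaconda install location used above; the function name list_conda_packages is my own and not part of conda. It runs conda list and counts the packages reported, treating lines that start with # as headers.

```python
import os
import subprocess


def list_conda_packages(conda="/opt/anaconda/bin/conda"):
    """Return the package names printed by `conda list`, or None if conda cannot be run."""
    if not os.path.isfile(conda) or not os.access(conda, os.X_OK):
        return None
    try:
        output = subprocess.check_output([conda, "list"], universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    # Data lines look like "name  version  build"; lines starting with "#" are headers.
    return [
        line.split()[0]
        for line in output.splitlines()
        if line.strip() and not line.startswith("#")
    ]


packages = list_conda_packages()
if packages is None:
    print("conda was not found or failed to run")
else:
    print("conda reports %d packages" % len(packages))
```

If the script prints a package count, the install is in the same state as a successful manual conda list.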
Modify Zeppelin interpreter in Oracle BDC to use Anaconda
- Log in to the BDC console
- Click the Settings tab
- Change/add the following properties:
  - Change zeppelin.pyspark.python – Set to /opt/anaconda/bin/python
  - Add PYSPARK_DRIVER_PYTHON – Set to /opt/anaconda/bin/python
  - Add PYSPARK_PYTHON – Set to /opt/anaconda/bin/python
  - Click the SAVE button
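All three properties point at the same interpreter, so it can be worth confirming on the node that the path actually runs before relying on it. This is a rough sketch to run over ssh, not something the BDC console provides; the helper names interpreter_version and planned_settings are hypothetical.

```python
import subprocess
import sys

# The three properties set in the Settings tab above.
PROPERTY_NAMES = ("zeppelin.pyspark.python", "PYSPARK_DRIVER_PYTHON", "PYSPARK_PYTHON")


def interpreter_version(interpreter):
    """Return the version string of the given Python binary, or None if it cannot be run."""
    try:
        output = subprocess.check_output(
            [interpreter, "-c", "import sys; print(sys.version.split()[0])"],
            universal_newlines=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return output.strip()


def planned_settings(interpreter="/opt/anaconda/bin/python"):
    """Map each property name to the single interpreter path it should hold."""
    return {name: interpreter for name in PROPERTY_NAMES}


for name, value in sorted(planned_settings().items()):
    print("%s = %s (version: %s)" % (name, value, interpreter_version(value)))
```

A version of None means the path does not resolve to a runnable interpreter, and the Zeppelin settings would point at nothing.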
Test the Zeppelin interpreter
- Go to the Notebook tab
- Create a New Note
- In the first paragraph of the note, type
  - %pyspark
    import sys
    print(sys.version)
- If you see your version, you are all set.
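The version string alone does not show which binary is running. As a slightly stronger check, the sketch below prints the interpreter path on the driver and, if a SparkContext is available, on the executors. It assumes Zeppelin exposes the SparkContext as sc in %pyspark paragraphs (put %pyspark on the first line when pasting it into a note); the helper names under_prefix and worker_python are my own.

```python
import sys

ANACONDA_PREFIX = "/opt/anaconda"  # install location used in this post


def under_prefix(executable, prefix=ANACONDA_PREFIX):
    """Return True if the interpreter path lives inside the given directory."""
    return bool(executable) and executable.startswith(prefix.rstrip("/") + "/")


def worker_python(_):
    """Runs on an executor and reports which interpreter it is using."""
    import sys
    return sys.executable


print("driver: %s (under %s: %s)"
      % (sys.executable, ANACONDA_PREFIX, under_prefix(sys.executable)))

try:
    spark_context = sc  # assumed to be injected by Zeppelin in %pyspark paragraphs
except NameError:
    spark_context = None  # running outside Zeppelin, so skip the executor check

if spark_context is not None:
    paths = spark_context.parallelize(range(4)).map(worker_python).distinct().collect()
    for path in paths:
        print("executor: %s (under %s: %s)" % (path, ANACONDA_PREFIX, under_prefix(path)))
```

If every line reports True, both PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON are being picked up as intended.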