
Setting Up

Using Docker (recommended)

If you would rather not pollute your development environment, it is a good idea to run everything inside a Docker container. rfflearn and its sample code run on the Docker image below. Run the following command to pull the image:

docker pull tiskw/pytorch:latest

A typical way to start a container from this image is:

cd PATH_TO_THE_WORKING_DIRECTORY
docker run --rm -it -v `pwd`:/work -w /work -u `id -u`:`id -g` tiskw/pytorch:latest bash

If you need GPU support, add the --gpus all option to the docker run command above. If your Docker version is lower than 19, use --runtime=nvidia instead of --gpus all.
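
The flag selection described above can be sketched in Python as follows. This is only an illustration: the helper name docker_run_command and its parameters are hypothetical and not part of rfflearn, and the Docker major version is assumed to be known to the caller rather than detected.

```python
import os
import shlex


def docker_run_command(gpu=False, docker_major_version=19,
                       image="tiskw/pytorch:latest"):
    """Build the `docker run` argument list used in this section (hypothetical helper)."""
    cmd = [
        "docker", "run", "--rm", "-it",
        "-v", f"{os.getcwd()}:/work",             # same as -v `pwd`:/work
        "-w", "/work",
        "-u", f"{os.getuid()}:{os.getgid()}",     # same as -u `id -u`:`id -g`
    ]
    if gpu:
        # Docker 19 and later accept `--gpus all`; older versions need `--runtime=nvidia`.
        if docker_major_version >= 19:
            cmd += ["--gpus", "all"]
        else:
            cmd += ["--runtime=nvidia"]
    cmd += [image, "bash"]
    return cmd


# Print a command line that can be pasted into a shell.
print(shlex.join(docker_run_command(gpu=True)))
```

Passing docker_major_version=18 switches the GPU flag to --runtime=nvidia, and leaving gpu=False reproduces the CPU-only command.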

Installing on your environment (easier, but pollutes your environment)

The rfflearn module requires NumPy, SciPy, and scikit-learn. It additionally requires PyTorch if you need GPU support, Optuna if you need the hyperparameter tuning function, and SHAP if you need feature importance. The sample code contained in the module also requires docopt. If you don't have them, please run the following to install them (you may need root permission):

pip3 install numpy scipy scikit-learn  # Necessary packages
pip3 install torch                     # Required for GPU inference
pip3 install optuna                    # Required for hyper parameter tuning
pip3 install shap                      # Required for feature importance
pip3 install docopt                    # Required for sample code

The author recommends installing the latest versions of these packages, because later versions tend to be more stable and perform better. See requirements.txt for detailed information on the package versions.
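
To see which of these packages are already available before installing anything, a small check like the following may help. It is a sketch that uses only the Python standard library; the helper name missing_packages is hypothetical, and the import names are assumed to be the usual ones for each package (for example, scikit-learn is imported as sklearn).

```python
import importlib.util

# Import name -> the role given to the package in the list above.
PACKAGES = {
    "numpy": "necessary",
    "scipy": "necessary",
    "sklearn": "necessary (installed as scikit-learn)",
    "torch": "GPU inference",
    "optuna": "hyper parameter tuning",
    "shap": "feature importance",
    "docopt": "sample code",
}


def missing_packages(packages=PACKAGES):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in packages
            if importlib.util.find_spec(name) is None]


# Report only the packages that still need to be installed.
for name in missing_packages():
    print(f"missing: {name} ({PACKAGES[name]})")
```

Only the packages reported as missing need a pip3 install, and the optional ones can be skipped if you do not need the corresponding feature.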

Quick Start

Tiny code example

First, clone the random-fourier-features repository from GitHub:

git clone https://github.com/tiskw/random-fourier-features.git
cd random-fourier-features

If you are using the Docker image, start the Docker container:

docker run --rm -it -v `pwd`:/work -w /work -u `id -u`:`id -g` tiskw/pytorch:latest bash

Launch python3 and try the following minimal code for support vector classification with random Fourier features:

>>> import numpy as np                                  # Import Numpy
>>> import rfflearn.cpu as rfflearn                     # Import our module
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])  # Define input data
>>> y = np.array([1, 1, 2, 2])                          # Define label data
>>> svc = rfflearn.RFFSVC().fit(X, y)                   # Training
>>> svc.score(X, y)                                     # Inference (on CPU)
1.0
>>> svc.predict(np.array([[-0.8, -1]]))
array([1])
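
For readers wondering what "random Fourier features" refers to, the following NumPy-only sketch shows the textbook construction: inputs are mapped through random cosine features whose inner products approximate an RBF kernel. It is not taken from the rfflearn source code, and the function name rff_features, the kernel width sigma, and the feature count D are assumptions chosen for illustration.

```python
import numpy as np


def rff_features(X, W, b):
    """Map X of shape (n, d) to random Fourier features of shape (n, D)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)


rng = np.random.default_rng(0)
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]], dtype=float)

sigma = 1.0   # assumed width of the RBF kernel being approximated
D = 5000      # assumed number of random features

# Random frequencies drawn from N(0, 1/sigma^2) and phases from U(0, 2*pi).
W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

# Inner products of the random features approximate the kernel matrix.
Z = rff_features(X, W, b)
K_approx = Z @ Z.T

# Exact RBF kernel exp(-||x - y||^2 / (2 sigma^2)) for comparison.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-sq_dists / (2.0 * sigma ** 2))

max_abs_error = np.abs(K_approx - K_exact).max()
print("max abs error:", max_abs_error)
```

The approximation error shrinks as D grows, so a linear model trained on Z behaves roughly like a kernel model trained on X.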

More practical example

Create a file named sample.py in your favorite text editor and type the following:

import numpy as np
import rfflearn.cpu as rfflearn

def main():
    # Define training data and labels.
    N = 1000
    x = np.linspace(0, 2 * np.pi, N)
    y = np.sin(x) + 0.1 * np.random.randn(N)

    # Create model instance and train it.
    svr = rfflearn.RFFSVR().fit(x.reshape((N, 1)), y)

    # Print score.
    print("R2 score =", svr.score(x.reshape((N, 1)), y))

if __name__ == "__main__":
    main()

Then place sample.py in the same directory as the rfflearn directory (i.e., the root directory of the random-fourier-features repository), and run the following command inside the Docker container:

python3 sample.py

You now have a minimal working example of rfflearn. You can customize the sample code as you like. Enjoy a comfortable ML life!
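
The value that sample.py prints is labeled as an R2 score. Assuming svr.score follows the usual scikit-learn convention for regressors (the coefficient of determination), the number can be understood with the hand-written sketch below. The function r2_score here is a local illustration rather than an rfflearn API, and the data mimic the noisy sine curve from sample.py.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


# Same kind of data as sample.py: a sine curve plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 1000)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)

print("perfect prediction :", r2_score(y, y))
print("predicting the mean:", r2_score(y, np.full_like(y, y.mean())))
print("noise-free sine    :", r2_score(y, np.sin(x)))
```

A perfect prediction scores 1 and always predicting the mean scores 0. Because of the added noise, even the true sine curve stays slightly below 1.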

Minimal Examples

Support vector classification with random Fourier features

>>> import numpy as np                                  # Import Numpy
>>> import rfflearn.cpu as rfflearn                     # Import module
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])  # Define input data
>>> y = np.array([1, 1, 2, 2])                          # Define label data
>>> svc = rfflearn.RFFSVC().fit(X, y)                   # Training
>>> svc.score(X, y)                                     # Inference (on CPU)
1.0
>>> svc.predict(np.array([[-0.8, -1]]))
array([1])

Gaussian process classification with random Fourier features on GPU

>>> import numpy as np                                  # Import Numpy
>>> import rfflearn.gpu as rfflearn                     # Import module
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])  # Define input data
>>> y = np.array([1, 1, 2, 2])                          # Define label data
>>> gpc = rfflearn.RFFGPC().fit(X, y)                   # Training on GPU
>>> gpc.score(X, y)                                     # Inference on GPU
1.0
>>> gpc.predict(np.array([[-0.8, -1]]))
array([1])

Principal component analysis with random Fourier features

>>> import numpy as np                                  # Import Numpy
>>> import rfflearn.cpu as rfflearn                     # Import module
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])  # Define input data
>>> pca = rfflearn.RFFPCA(n_components=1).fit(X)        # Training (on CPU)
>>> pca.transform(X)                                    # Transform (on CPU)
array([[-1.5231749 ],
       [-2.37334318],
       [ 1.5231749 ],
       [ 2.37334318]])

Automatic hyper parameter tuning (using Optuna)

>>> import numpy as np
>>> import rfflearn.cpu as rfflearn
>>> train_set = (np.array([[-1, -1], [1, 1]]), np.array([1, 2]))
>>> valid_set = (np.array([[2, 1]]), np.array([2]))
>>> study = rfflearn.RFFSVC_tuner(train_set, valid_set)
>>> study.best_params
{'dim_kernel': 879, 'std_kernel': 0.6135046243705738}
>>> study.user_attrs["best_model"]                               # Get best estimator

Feature importance visualization (SHAP)

>>> import numpy as np
>>> import rfflearn.cpu as rfflearn
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> gpr = rfflearn.RFFGPR().fit(X, y)
>>> shap_values = rfflearn.shap_feature_importance(gpr, X)
>>> rfflearn.shap_plot(shap_values, X)