Setting Up

Installation

You can install RFFLearn from PyPI:

pip install rfflearn

The author recommends using venv or Docker to avoid polluting your environment.

Optional requirements

The installation command above installs only the minimum dependencies. More precisely, it enables training and inference on the CPU, but not supplementary features such as GPU support, Optuna support, and SHAP support. To enable these features, install the additional packages:

# For GPU support.
pip install torch

# For Optuna support.
pip install optuna

# For SHAP support.
pip install matplotlib shap

The author recommends installing the latest versions of these packages, because later versions tend to be more stable and perform better. See requirements.txt for detailed information on the package versions.
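To see which of the optional features listed above can be enabled in the current environment, a small check along these lines may help. This is a minimal sketch using only the Python standard library; the helper name `available_features` and the feature-to-package mapping are illustrative assumptions taken from the install commands above, not part of RFFLearn.

```python
import importlib.util

# Optional features mapped to the packages named in the install commands above.
# This mapping is an assumption for illustration, not an RFFLearn API.
OPTIONAL_FEATURES = {
    "GPU": ["torch"],
    "Optuna": ["optuna"],
    "SHAP": ["matplotlib", "shap"],
}


def available_features(features=OPTIONAL_FEATURES):
    """Return {feature: bool}, True when every package for the feature can be found."""
    return {
        name: all(importlib.util.find_spec(pkg) is not None for pkg in packages)
        for name, packages in features.items()
    }


if __name__ == "__main__":
    for name, ok in available_features().items():
        print(f"{name} support: {'available' if ok else 'missing packages'}")
```

Note that this only checks whether the packages are installed; it does not verify that a GPU is actually present or usable.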

Quick Start

Tiny code example

First, install RFFLearn. Here we use venv:

# Create new environment.
python3 -m venv venv

# Activate the environment.
source venv/bin/activate

# Install RFFLearn into the environment.
python3 -m pip install rfflearn

Then launch python3 and try the following minimal code, which trains a support vector classifier with random Fourier features:

>>> import numpy as np                                  # Import NumPy
>>> import rfflearn.cpu as rfflearn                     # Import our module
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])  # Define input data
>>> y = np.array([1, 1, 2, 2])                          # Define label data
>>> svc = rfflearn.RFFSVC().fit(X, y)                   # Training
>>> svc.score(X, y)                                     # Inference (on CPU)
1.0
>>> svc.predict(np.array([[-0.8, -1]]))
array([1])
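For intuition about what "random Fourier features" means in the example above, the following NumPy sketch approximates an RBF kernel with random cosine features. It illustrates the general technique only and is not RFFLearn's implementation; the function name `rff_features` and the values of `sigma` and `D` are assumptions chosen for illustration.

```python
import numpy as np


def rff_features(X, W, b):
    """Map X (n, d) to random Fourier features (n, D) given frequencies W (d, D) and phases b (D,)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)


rng = np.random.default_rng(0)
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]], dtype=float)

sigma = 1.0   # RBF kernel width (illustrative value)
D = 2000      # number of random features (illustrative value)

# Frequencies are drawn from the Fourier transform of the RBF kernel (a Gaussian),
# and phases are drawn uniformly from [0, 2*pi).
W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

# Inner products of the random features approximate the kernel matrix.
Z = rff_features(X, W, b)
K_approx = Z @ Z.T

# Exact RBF kernel exp(-||x - y||^2 / (2 * sigma^2)) for comparison.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-sq_dists / (2.0 * sigma ** 2))

print("max abs error:", np.abs(K_approx - K_exact).max())
```

Because the kernel is replaced by an explicit finite-dimensional feature map, a linear model trained on `Z` behaves approximately like a kernel model trained on `X`. Increasing `D` reduces the approximation error at the cost of more computation.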

More practical example

Create a file named sample.py in your favorite text editor and enter the following code:

import numpy as np
import rfflearn.cpu as rfflearn

# Define training data and labels.
N = 1000
x = np.linspace(0, 2 * np.pi, N)
y = np.sin(x) + 0.1 * np.random.randn(N)

# Create model instance and train it.
svr = rfflearn.RFFSVR().fit(x.reshape((N, 1)), y)

# Print score.
print("R2 score =", svr.score(x.reshape((N, 1)), y))

Then place sample.py in the same directory as the rfflearn directory (i.e., the root directory of the random-fourier-features repository) and run the following command inside your environment (the Docker container or the activated venv):

python3 sample.py

You now have a minimal working example of rfflearn. Customize the sample code as you like. Enjoy a comfortable ML life!
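The sample above labels its output as an R2 score. Assuming `svr.score` follows the usual scikit-learn regressor convention (an assumption, not something this document confirms), that value is the coefficient of determination. The sketch below computes it by hand in NumPy on the same kind of synthetic data; the helper `r2_score` is written here for illustration and is not part of RFFLearn.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


# Same synthetic data as sample.py, with a fixed seed for repeatability.
rng = np.random.default_rng(0)
N = 1000
x = np.linspace(0, 2 * np.pi, N)
y = np.sin(x) + 0.1 * rng.standard_normal(N)

# Two reference predictors: the noise-free curve and a constant mean.
print("R2 of the noise-free sine curve:", r2_score(y, np.sin(x)))
print("R2 of a constant mean predictor:", r2_score(y, np.full(N, y.mean())))
```

A score close to 1 means the predictions explain most of the variance in the labels, while a predictor that always returns the mean scores about 0. Since sample.py evaluates on the training data itself, its score says nothing about performance on unseen inputs.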

Minimal Examples

Support vector classification with random Fourier features

>>> import numpy as np                                  # Import NumPy
>>> import rfflearn.cpu as rfflearn                     # Import module
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])  # Define input data
>>> y = np.array([1, 1, 2, 2])                          # Define label data
>>> svc = rfflearn.RFFSVC().fit(X, y)                   # Training
>>> svc.score(X, y)                                     # Inference (on CPU)
1.0
>>> svc.predict(np.array([[-0.8, -1]]))
array([1])

Gaussian process classification with random Fourier features on GPU

>>> import numpy as np                                  # Import NumPy
>>> import rfflearn.gpu as rfflearn                     # Import module
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])  # Define input data
>>> y = np.array([1, 1, 2, 2])                          # Define label data
>>> gpc = rfflearn.RFFGPC().fit(X, y)                   # Training on GPU
>>> gpc.score(X, y)                                     # Inference on GPU
1.0
>>> gpc.predict(np.array([[-0.8, -1]]))
array([1])

Principal component analysis with random Fourier features

>>> import numpy as np                                  # Import NumPy
>>> import rfflearn.cpu as rfflearn                     # Import module
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])  # Define input data
>>> pca = rfflearn.RFFPCA(n_components=1).fit(X)        # Training (on CPU)
>>> pca.transform(X)                                    # Transform (on CPU)
array([[-1.5231749 ],
       [-2.37334318],
       [ 1.5231749 ],
       [ 2.37334318]])

Automatic hyperparameter tuning (using Optuna)

>>> import numpy as np
>>> import rfflearn.cpu as rfflearn
>>> train_set = (np.array([[-1, -1], [1, 1]]), np.array([1, 2]))
>>> valid_set = (np.array([[2, 1]]), np.array([2]))
>>> study = rfflearn.RFFSVC_tuner(train_set, valid_set)
>>> study.best_params
{'dim_kernel': 879, 'std_kernel': 0.6135046243705738}
>>> study.user_attrs["best_model"]                               # Get best estimator

Feature importance visualization (SHAP)

>>> import numpy as np
>>> import rfflearn.cpu as rfflearn
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> gpr = rfflearn.RFFGPR().fit(X, y)
>>> shap_values = rfflearn.shap_feature_importance(gpr, X)
>>> rfflearn.shap_plot(shap_values, X)