アニーリングを用いたアンサンブル学習（QBoost)

アニーリングを用いたアンサンブル学習（QBoost)#

QBoostは量子アニーリングを用いたアンサンブル学習の一つです。アンサンブル学習は弱い予測器を多数用意し、その予測器の各予測結果を組み合わせることで、最終的に精度の良い予測結果を得る手法です。

概要と原理#

このQBoostは、入力信号\(x\)がどのような性質を持っているのかを精度良く識別することを目標としたアルゴリズムです。ここでは2つの値\(\pm 1\)のどちらを入力信号に割り当てるべきかという問題を考えましょう。例として、\(x\)が画像データを表しており、その画像に写っているものが犬か猫かを識別するといったタスクを想像すると良いでしょう。アンサンブル学習では、複数の予測器を利用することで、より良い予測精度を達成すること(ブースティング)を目指します。ここでは、あまり性能の良くない予測器(弱い予測器)をたくさん用意します。性能が良くないという意味は、入力に対して正しい出力をしないことが多いことを意味します。これらの予測器の出力を\(c_i (x) \in \{ -1, 1\} \ (i=0, 1, \dots, N-1)\)とします。いくつかの弱い予測器の出力の和を取ることで、より良い予測ができるというのが基本的な考え方です。これを数式で表すと

\[ C(x) = \mathrm{sgn} \left( \sum_{i=0}^{N-1} w_i c_i (x) \right) \tag{1} \]

となります。ここで\(w_i \in \{0, 1\}\)で、\(i\)番目の予測器を使うか使わないかを表します。どの予測器を用いると、できるだけ少ない数の弱い予測器でより良い性能が得られるかを明らかにしましょう。
このために、教師あり学習を用いて最適な\(\{w_i\}\)の組を求めることにします。教師データを\((x^{(d)}, y^{(d)}) \ (d= 0, 1, \dots, D-1)\)を多数用意します(\(D \gg 1\))。それらをできるだけ忠実に再現するように\(\{w_i\}\)を調整します。
この方針をより具体的に表すと、次のハミルトニアンを\(\{w_i\}\)について最小化することを目指せば良いとわかります。

\[ H(w) = \sum_{d=0}^{D-1} \left( \frac{1}{N} \sum_{i=0}^{N-1} w_i c_i (x^{(d)}) - y^{(d)}\right)^2 + \lambda \sum_{i=0}^{N-1} w_i \tag{2} \]

このハミルトニアンの最小化を通して、教師データ\(y^{(d)}\)との差ができるだけ小さくなるようにします。式(1)の右辺をそのまま使うと、符号関数があるために\(w_i\)の2次形式にならず、イジング模型に帰着することができません。そのため、符号関数の引数\(\sum_i w_i c_i\)の\(1/N\)倍と教師データ\(y^{(d)}\)との差の2乗を最小化する問題にしています。\(1/N\)の係数は、\(\sum_i w_i c_i(x)\)の最大値が\(N\)であるために\(y^{(d)}= \pm 1\)との差が大きくなりすぎないように調整するためのものです。\(\lambda (>0)\)がかかった項は、あまり多くの\(w_i\)を1にせず、比較的少数の弱い予測器で効率良く構成するための項(正則化項)を表します。

JijModelingによる定式化#

JijModelingを用いて、上述のハミルトニアンを定式化しましょう。そのために必要となる変数を定義します。

import jijmodeling as jm

problem = jm.Problem("QBoost")

c = problem.Integer("c", ndim=2)
y = problem.Integer("y", ndim=1)
N = problem.DependentVar("N", c.len_at(0))
D = problem.DependentVar("D", c.len_at(1))
w = problem.BinaryVar("w", shape=(N, ))
lamb = problem.Float("lamb", latex="\lambda")

ここでは c を弱い予測器の出力（2次元整数配列）、y を教師ラベル（1次元整数配列）、N を予測器の数（c の次元から自動的に決定）、D を教師データの数（c の次元から自動的に決定）、w を各予測器を使うかどうかを表すバイナリ変数、lamb を正則化パラメータ \(\lambda\) として定義しています。

ハミルトニアン#

先ほどのハミルトニアンを定式化しましょう。

# set objective function 1: minimize the sum of differences
obj1 = jm.sum(D, lambda d: (jm.sum(N, lambda i: w[i]*c[i, d])/N - y[d])**2)
# set objective function 2: minimize regularization term
obj2 = lamb * w[:].sum()

problem += obj1 + obj2

w[:].sum()とすることで、\(\sum_i w_i\)を簡潔に記述することができます。

定式化された数理モデルを、Jupyter Notebookで表示してみましょう。

problem

\[\begin{split}\begin{array}{rl} \text{Problem}\colon &\text{QBoost}\\\displaystyle \min &\displaystyle \sum _{d=0}^{D-1}{{\left(\left(\sum _{i=0}^{N-1}{{w}_{i}\cdot {c}_{i,d}}\right)\div N-{y}_{d}\right)}^{2}}+\lambda\cdot \left(\sum _{\vec{\imath }}{{{\left({w}_{\left(\colon \right)}\right)}}_{\vec{\imath }}}\right)\\&\\\text{where}&\\&\text{Decision Variables:}\\&\qquad \begin{alignedat}{2}w&\in \mathop{\mathrm{Array}}\left[N;\left\{0, 1\right\}\right]&\quad &1\text{-dim binary variable}\\\end{alignedat}\\&\\&\text{Placeholders:}\\&\qquad \begin{alignedat}{2}c&\in \mathop{\mathrm{Array}}\left[(-)\times (-);\mathbb{Z}\right]&\quad &2\text{-dimensional array of placeholders with elements in }\mathbb{Z}\\\lambda&\in \mathbb{R}&\quad &\text{A scalar placeholder in }\mathbb{R}\\y&\in \mathop{\mathrm{Array}}\left[(-);\mathbb{Z}\right]&\quad &1\text{-dimensional array of placeholders with elements in }\mathbb{Z}\\\end{alignedat}\\&\\&\text{Dependent Variables:}\\&\qquad \begin{alignedat}{2}D&=\mathop{\mathtt{len\_{}at}}\left(c,1\right)&\quad &\in \mathbb{N}\\N&=\mathop{\mathtt{len\_{}at}}\left(c,0\right)&\quad &\in \mathbb{N}\\\end{alignedat}\end{array} \end{split}\]

インスタンスの作成#

実際に実行するタスクなどを設定しましょう。今回は弱い予測器をscikit-learnのdecision stump(決定株: 一層の決定木)を用います。また用いるデータはscikit-learnの癌識別データセットを使用します。

import numpy as np

from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier as DTC


def prediction_from_train(N, X_train, y_train, X_test):
    # set the number of ensembles to be taken out for one sample in bootstrap sampling
    sample_train = 40
    # set model
    models = [DTC(splitter="random", max_depth=1) for i in range(N)]
    for model in models:
        # extract randomly
        train_idx = np.random.choice(np.arange(X_train.shape[0]), sample_train)
        # make decision tree with variables
        model.fit(X=X_train[train_idx], y=y_train[train_idx])
    y_pred_list_train = []
    for model in models:
        # execute prediction with model
        y_pred_list_train.append(model.predict(X_train))
    y_pred_list_train = np.asanyarray(y_pred_list_train)
    y_pred_list_test = []
    for model in models:
        # execute with test data
        y_pred_list_test.append(model.predict(X_test))
    y_pred_list_test = np.array(y_pred_list_test)
    return y_pred_list_train, y_pred_list_test

# load data
cancer_data = datasets.load_breast_cancer()
# set the number of train data
num_train = 200
# add noise to feature
noisy_data = np.concatenate((cancer_data.data, np.random.rand(cancer_data.data.shape[0], 30)), axis=1)
# convert (0, 1) label to (-1, 1) and cast to int
labels = ((cancer_data.target-0.5) * 2).astype(int)
# divide dataset to train and test
X_train = noisy_data[:num_train, :]
X_test = noisy_data[num_train:, :]
y_train = labels[:num_train]
y_test = labels[num_train:]
# set the number of classifer
inst_N = 20
# predict from train data using dicision tree classifier
y_pred_list_train, y_pred_list_test = prediction_from_train(inst_N, X_train, y_train, X_test)
# set lambda (coefficient of 2nd term)
inst_lamb = 10.0
# instance_data contains only the variables defined in the model
instance_data = {'y': y_train.astype(int), 'c': y_pred_list_train.astype(int), 'lamb': inst_lamb}

デモンストレーションのために、ノイズとなる特徴量を加えたものを実際のデータとして用います。 prediction_from_train関数を用いて弱い予測器の作成と、それらの予測器からの出力\(c_i (x^{(d)})\)を作成しています。ここでは弱い予測器の数を20、教師データ数は200です。最後に式(2)の\(\lambda\)の値を10.0としています。

OpenJijによる最適化計算の実行#

OpenJijのシミュレーテッド・アニーリングを用いて、最適化問題を解いてみましょう。

from ommx_openjij_adapter import OMMXOpenJijSAAdapter

instance = problem.eval(instance_data)

adapter = OMMXOpenJijSAAdapter(instance)
best_sample = adapter.sample(
    instance, num_reads=100
).best_feasible_unrelaxed

解の可視化#

最後に、得られた解を可視化しましょう。

from sklearn.metrics import accuracy_score

# Get selected classifier indices from the best solution
df = best_sample.decision_variables_df
w_indices = [row["subscripts"][0] for _, row in df.iterrows() if row["name"] == "w" and row["value"] > 0.5]

# Predict using selected classifiers
y_pred_train = np.sign(np.sum(y_pred_list_train[w_indices, :], axis=0))
y_pred_test = np.sign(np.sum(y_pred_list_test[w_indices, :], axis=0))

acc_train = accuracy_score(y_true=y_train, y_pred=y_pred_train)
acc_test = accuracy_score(y_true=y_test, y_pred=y_pred_test)
print(f"Train Accuracy of QBoost is {acc_train}")
print(f"Test Accuracy of QBoost is {acc_test}")

Train Accuracy of QBoost is 0.945
Test Accuracy of QBoost is 0.9132791327913279

この実行例では、訓練データとテストデータの両方で比較的高い精度が得られています。少なくとも今回の結果からは、QBoostによって弱い予測器を選択しつつ一定の分類性能が得られていることが確認できます。