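Rust Elective 03: Machine Learning - Neural Networks in Rust

Assignment Description

Overview

Machine learning is a technique for learning patterns from data and using them to make predictions. In this assignment you implement a basic neural network from scratch in Rust, working step by step from linear regression up to a multilayer perceptron.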

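Background and Motivation

Why machine learning matters:

Data-driven: behavior is learned from data rather than explicitly programmed
Prediction and classification: pattern recognition, image classification, natural language processing
Automation: carrying out complex tasks without human intervention
Optimization: improving performance by minimizing a loss function

Advantages of Rust:

Fast numerical computation (comparable to C++)
Memory safety (no out-of-bounds reads or use-after-free in matrix code)
Strong support for parallelism (useful for large datasets)
A type system that catches some mistakes at compile time (for example, passing a 1-D array where a 2-D array is expected)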

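Requirements

Implement a machine learning library with the following features:

Linear regression:

- Simple linear regression
- Multivariable linear regression
- Gradient descent
- Loss function (MSE)

Neural network:

- Multilayer perceptron (MLP)
- Activation functions (ReLU, Sigmoid, Tanh)
- Forward propagation
- Backpropagation

Training algorithms:

- Stochastic gradient descent (SGD)
- Mini-batch training
- Learning-rate adjustment
- Epoch and batch management

Data processing:

- Data normalization
- Train/test split
- Batch shuffling
- CSV loading

Evaluation and visualization:

- Accuracy calculation
- Loss history
- Learning-curve plots
- Confusion matrix (for classification)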

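Constraints

Use the ndarray crate for matrix operations
Do not use external machine learning libraries (implement from scratch)
Implement backpropagation rather than numerical differentiation
Test with the IRIS dataset and a simplified version of MNIST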
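
The constraints rule out numerical differentiation for training, but it remains useful as a test oracle for hand-written derivatives. The following is a minimal sketch of such a gradient check and is not part of the reference solution. It uses plain f64 values instead of ndarray to stay self-contained, and the names sigmoid_grad and numerical_grad are illustrative.

```rust
/// Sigmoid activation.
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// Analytic derivative: sigmoid(x) * (1 - sigmoid(x)).
fn sigmoid_grad(x: f64) -> f64 {
    let s = sigmoid(x);
    s * (1.0 - s)
}

/// Central finite difference, used here only to verify the analytic derivative.
fn numerical_grad(f: fn(f64) -> f64, x: f64, h: f64) -> f64 {
    (f(x + h) - f(x - h)) / (2.0 * h)
}

fn main() {
    for &x in &[-2.0, 0.0, 0.5, 3.0] {
        let analytic = sigmoid_grad(x);
        let numeric = numerical_grad(sigmoid, x, 1e-5);
        println!("x = {x:5.2}  analytic = {analytic:.8}  numeric = {numeric:.8}");
        // The two values should agree up to the finite-difference error.
        assert!((analytic - numeric).abs() < 1e-8);
    }
}
```

The same comparison can be applied to a whole layer by perturbing one weight at a time, which is slow but adequate for a unit test.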

---

Reference Solution

Project Structure

rust_ml/
├── Cargo.toml
├── src/
│   ├── main.rs
│   ├── lib.rs
│   ├── linear_regression.rs
│   ├── neural_network.rs
│   ├── activation.rs
│   ├── loss.rs
│   ├── optimizer.rs
│   ├── data.rs
│   └── utils.rs
├── data/
│   ├── iris.csv
│   └── mnist_sample.csv
├── tests/
│   └── integration_tests.rs
└── examples/
    ├── linear_regression_demo.rs
    ├── neural_network_demo.rs
    └── mnist_demo.rs

Cargo.toml

[package]
name = "rust_ml"
version = "0.1.0"
edition = "2021"

[dependencies]
ndarray = "0.15"
ndarray-rand = "0.14"
rand = "0.8"
csv = "1.2"
serde = { version = "1.0", features = ["derive"] }
plotters = "0.3"

[dev-dependencies]
approx = "0.5"

src/activation.rs

use ndarray::Array1;

/// Supported activation functions.
#[derive(Debug, Clone, Copy)]
pub enum Activation {
    ReLU,
    Sigmoid,
    Tanh,
    Linear,
}

impl Activation {
    /// Applies the activation function element-wise.
    pub fn forward(&self, x: &Array1<f64>) -> Array1<f64> {
        match self {
            Activation::ReLU => x.mapv(|v| v.max(0.0)),
            Activation::Sigmoid => x.mapv(|v| 1.0 / (1.0 + (-v).exp())),
            Activation::Tanh => x.mapv(|v| v.tanh()),
            Activation::Linear => x.clone(),
        }
    }

    /// Element-wise derivative of the activation, evaluated at the pre-activation `x`.
    pub fn backward(&self, x: &Array1<f64>) -> Array1<f64> {
        match self {
            Activation::ReLU => x.mapv(|v| if v > 0.0 { 1.0 } else { 0.0 }),
            Activation::Sigmoid => {
                let sigmoid = self.forward(x);
                &sigmoid * &sigmoid.mapv(|v| 1.0 - v)
            }
            Activation::Tanh => {
                let tanh = self.forward(x);
                tanh.mapv(|v| 1.0 - v * v)
            }
            Activation::Linear => Array1::ones(x.len()),
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use ndarray::arr1;

    #[test]
    fn test_relu() {
        let x = arr1(&[-1.0, 0.0, 1.0, 2.0]);
        let y = Activation::ReLU.forward(&x);
        assert_eq!(y, arr1(&[0.0, 0.0, 1.0, 2.0]));
    }

    #[test]
    fn test_sigmoid() {
        let x = arr1(&[0.0]);
        let y = Activation::Sigmoid.forward(&x);
        assert!((y[0] - 0.5).abs() < 1e-10);
    }
}

src/loss.rs

use ndarray::Array1;

/// Supported loss functions.
#[derive(Debug, Clone, Copy)]
pub enum Loss {
    MSE,          // Mean Squared Error
    CrossEntropy, // Categorical cross-entropy (for classification with one-hot targets)
}

impl Loss {
    /// Computes the loss for one sample.
    pub fn forward(&self, predicted: &Array1<f64>, target: &Array1<f64>) -> f64 {
        match self {
            Loss::MSE => {
                let diff = predicted - target;
                (&diff * &diff).mean().unwrap()
            }
            Loss::CrossEntropy => {
                // -Σ(y * log(ŷ))
                let epsilon = 1e-7; // clip predictions to avoid log(0)
                let predicted_clipped = predicted.mapv(|v| v.max(epsilon).min(1.0 - epsilon));
                -(target * &predicted_clipped.mapv(|v| v.ln())).sum()
            }
        }
    }

    /// Gradient of the loss with respect to the prediction.
    pub fn backward(&self, predicted: &Array1<f64>, target: &Array1<f64>) -> Array1<f64> {
        match self {
            Loss::MSE => {
                // ∂MSE/∂ŷ = 2(ŷ - y) / n
                2.0 * (predicted - target) / predicted.len() as f64
            }
            Loss::CrossEntropy => {
                // ∂CE/∂ŷ = -y/ŷ
                let epsilon = 1e-7;
                -target / &predicted.mapv(|v| v.max(epsilon))
            }
        }
    }
}
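
As a quick sanity check of the MSE formulas used in Loss::forward and Loss::backward, the sketch below works through a two-element example by hand. It is an illustration only: it uses slices instead of ndarray arrays so that it runs without dependencies, and the function names mse and mse_grad are not part of the library above.

```rust
/// Mean squared error over one prediction vector.
fn mse(predicted: &[f64], target: &[f64]) -> f64 {
    assert_eq!(predicted.len(), target.len());
    let n = predicted.len() as f64;
    predicted
        .iter()
        .zip(target)
        .map(|(p, t)| (p - t).powi(2))
        .sum::<f64>()
        / n
}

/// Gradient of the MSE with respect to each prediction: 2 (p - t) / n.
fn mse_grad(predicted: &[f64], target: &[f64]) -> Vec<f64> {
    let n = predicted.len() as f64;
    predicted
        .iter()
        .zip(target)
        .map(|(p, t)| 2.0 * (p - t) / n)
        .collect()
}

fn main() {
    let predicted = [2.0, 4.0];
    let target = [1.0, 6.0];
    // Errors are 1 and -2, so the loss is (1 + 4) / 2 = 2.5.
    println!("loss = {}", mse(&predicted, &target));
    // The gradient is [2 * 1 / 2, 2 * (-2) / 2] = [1, -2].
    println!("grad = {:?}", mse_grad(&predicted, &target));
}
```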

src/linear_regression.rs

use ndarray::{Array1, Array2};
use rand::thread_rng;
use rand::seq::SliceRandom;

/// Linear regression model.
pub struct LinearRegression {
    pub weights: Array1<f64>,
    pub bias: f64,
    pub learning_rate: f64,
}

impl LinearRegression {
    /// Creates a new linear regression model with zero-initialized parameters.
    pub fn new(input_dim: usize, learning_rate: f64) -> Self {
        LinearRegression {
            weights: Array1::zeros(input_dim),
            bias: 0.0,
            learning_rate,
        }
    }

    /// Computes a prediction for each row of `x`.
    pub fn predict(&self, x: &Array2<f64>) -> Array1<f64> {
        x.dot(&self.weights) + self.bias
    }

    /// Computes the MSE loss.
    pub fn loss(&self, x: &Array2<f64>, y: &Array1<f64>) -> f64 {
        let predictions = self.predict(x);
        let diff = &predictions - y;
        (&diff * &diff).mean().unwrap()
    }

    /// Performs one gradient descent step on the given batch.
    pub fn step(&mut self, x: &Array2<f64>, y: &Array1<f64>) {
        let n = x.nrows() as f64;
        let predictions = self.predict(x);
        let error = &predictions - y;

        // ∂L/∂w = (2/n) * X^T * (ŷ - y)
        let grad_w = x.t().dot(&error) * (2.0 / n);

        // ∂L/∂b = (2/n) * Σ(ŷ - y)
        let grad_b = error.sum() * (2.0 / n);

        // Parameter update
        self.weights = &self.weights - &(grad_w * self.learning_rate);
        self.bias -= grad_b * self.learning_rate;
    }

    /// Trains with mini-batch gradient descent and returns the per-epoch loss history.
    pub fn fit(
        &mut self,
        x: &Array2<f64>,
        y: &Array1<f64>,
        epochs: usize,
        batch_size: usize,
        verbose: bool,
    ) -> Vec<f64> {
        let mut loss_history = Vec::new();
        let n = x.nrows();
        let mut indices: Vec<usize> = (0..n).collect();
        let mut rng = thread_rng();

        for epoch in 0..epochs {
            // Shuffle the sample order
            indices.shuffle(&mut rng);

            // Train on mini-batches
            for batch_start in (0..n).step_by(batch_size) {
                let batch_end = (batch_start + batch_size).min(n);
                let batch_indices = &indices[batch_start..batch_end];

                // Extract the batch rows
                let x_batch = x.select(ndarray::Axis(0), batch_indices);
                let y_batch = y.select(ndarray::Axis(0), batch_indices);

                self.step(&x_batch, &y_batch);
            }

            // Record the loss on the full dataset
            let loss = self.loss(x, y);
            loss_history.push(loss);

            if verbose && (epoch + 1) % 100 == 0 {
                println!("Epoch {}/{}, Loss: {:.6}", epoch + 1, epochs, loss);
            }
        }

        loss_history
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use ndarray::{arr1, arr2};

    #[test]
    fn test_linear_regression() {
        // Learn y = 2x + 1
        let x = arr2(&[[1.0], [2.0], [3.0], [4.0]]);
        let y = arr1(&[3.0, 5.0, 7.0, 9.0]);

        let mut model = LinearRegression::new(1, 0.01);
        model.fit(&x, &y, 1000, 4, false);

        // Check the learned weight and bias
        assert!((model.weights[0] - 2.0).abs() < 0.1);
        assert!((model.bias - 1.0).abs() < 0.1);

        // Check the prediction
        let pred = model.predict(&arr2(&[[5.0]]));
        assert!((pred[0] - 11.0).abs() < 0.5);
    }
}

src/neural_network.rs

use ndarray::{Array1, Array2};
use ndarray_rand::RandomExt;
use ndarray_rand::rand_distr::Uniform;
use rand::thread_rng;
use rand::seq::SliceRandom;

use crate::activation::Activation;
use crate::loss::Loss;

/// Fully connected layer.
#[derive(Debug, Clone)]
pub struct Layer {
    pub weights: Array2<f64>,
    pub bias: Array1<f64>,
    pub activation: Activation,

    // Cached for backpropagation
    pub input: Option<Array1<f64>>,
    pub z: Option<Array1<f64>>, // pre-activation values
    pub output: Option<Array1<f64>>,
}

impl Layer {
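    /// Creates a new layer with He-uniform initialization.
    pub fn new(input_size: usize, output_size: usize, activation: Activation) -> Self {
        // U(-limit, limit) has variance limit^2 / 3, so limit = sqrt(6 / fan_in)
        // gives the He target variance of 2 / fan_in.
        let limit = (6.0 / input_size as f64).sqrt();
        let weights = Array2::random(
            (input_size, output_size),
            Uniform::new(-limit, limit),
        );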
        let bias = Array1::zeros(output_size);

        Layer {
            weights,
            bias,
            activation,
            input: None,
            z: None,
            output: None,
        }
    }

    /// Forward pass. Caches the input, pre-activation, and output for backward().
    pub fn forward(&mut self, input: &Array1<f64>) -> Array1<f64> {
        self.input = Some(input.clone());

        // z = W^T * x + b
        let z = self.weights.t().dot(input) + &self.bias;
        self.z = Some(z.clone());

        // a = activation(z)
        let output = self.activation.forward(&z);
        self.output = Some(output.clone());

        output
    }

    /// Backward pass. Returns (grad_input, grad_weights, grad_bias); forward() must be called first.
    pub fn backward(&self, grad_output: &Array1<f64>) -> (Array1<f64>, Array2<f64>, Array1<f64>) {
        let input = self.input.as_ref().unwrap();
        let z = self.z.as_ref().unwrap();

        // ∂L/∂z = ∂L/∂a * ∂a/∂z
        let grad_z = grad_output * &self.activation.backward(z);

        // ∂L/∂W = x * (∂L/∂z)^T
        let grad_weights = input
            .clone()
            .into_shape((input.len(), 1))
            .unwrap()
            .dot(&grad_z.clone().into_shape((1, grad_z.len())).unwrap());

        // ∂L/∂b = ∂L/∂z
        let grad_bias = grad_z.clone();

        // ∂L/∂x = W * ∂L/∂z
        let grad_input = self.weights.dot(&grad_z);

        (grad_input, grad_weights, grad_bias)
    }
}

/// Multilayer perceptron.
pub struct NeuralNetwork {
    pub layers: Vec<Layer>,
    pub learning_rate: f64,
    pub loss_fn: Loss,
}

impl NeuralNetwork {
    /// Creates an empty network; add layers with add_layer().
    pub fn new(learning_rate: f64, loss_fn: Loss) -> Self {
        NeuralNetwork {
            layers: Vec::new(),
            learning_rate,
            loss_fn,
        }
    }

    /// Appends a layer.
    pub fn add_layer(&mut self, layer: Layer) {
        self.layers.push(layer);
    }

    /// Forward pass through all layers.
    pub fn forward(&mut self, input: &Array1<f64>) -> Array1<f64> {
        let mut output = input.clone();

        for layer in &mut self.layers {
            output = layer.forward(&output);
        }

        output
    }

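    /// Backward pass: propagates the gradient and updates every layer in place.
    /// forward() must be called first so that each layer has cached values.
    pub fn backward(&mut self, target: &Array1<f64>) {
        let output = self.layers.last().unwrap().output.as_ref().unwrap();

        // Gradient at the output layer
        let mut grad = self.loss_fn.backward(output, target);

        // Propagate in reverse order
        for layer in self.layers.iter_mut().rev() {
            let (grad_input, grad_weights, grad_bias) = layer.backward(&grad);

            // Parameter update (grad_input was computed with the pre-update weights)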
            layer.weights = &layer.weights - &(grad_weights * self.learning_rate);
            layer.bias = &layer.bias - &(grad_bias * self.learning_rate);

            grad = grad_input;
        }
    }

    /// Predicts the output for one sample.
    pub fn predict(&mut self, input: &Array1<f64>) -> Array1<f64> {
        self.forward(input)
    }

    /// Computes the mean loss over all rows.
    pub fn loss(&mut self, x: &Array2<f64>, y: &Array2<f64>) -> f64 {
        let mut total_loss = 0.0;

        for i in 0..x.nrows() {
            let input = x.row(i).to_owned();
            let target = y.row(i).to_owned();
            let output = self.forward(&input);
            total_loss += self.loss_fn.forward(&output, &target);
        }

        total_loss / x.nrows() as f64
    }

    /// Trains the network and returns the per-epoch loss history.
    pub fn fit(
        &mut self,
        x: &Array2<f64>,
        y: &Array2<f64>,
        epochs: usize,
        batch_size: usize,
        verbose: bool,
    ) -> Vec<f64> {
        let mut loss_history = Vec::new();
        let n = x.nrows();
        let mut indices: Vec<usize> = (0..n).collect();
        let mut rng = thread_rng();

        for epoch in 0..epochs {
            indices.shuffle(&mut rng);

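            for batch_start in (0..n).step_by(batch_size) {
                let batch_end = (batch_start + batch_size).min(n);

                // backward() applies its update immediately, so the weights change
                // after every sample; batch_size only groups the iteration here.
                for &idx in &indices[batch_start..batch_end] {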
                    let input = x.row(idx).to_owned();
                    let target = y.row(idx).to_owned();

                    self.forward(&input);
                    self.backward(&target);
                }
            }

            let loss = self.loss(x, y);
            loss_history.push(loss);

            if verbose && (epoch + 1) % 10 == 0 {
                println!("Epoch {}/{}, Loss: {:.6}", epoch + 1, epochs, loss);
            }
        }

        loss_history
    }

    /// Computes classification accuracy.
    pub fn accuracy(&mut self, x: &Array2<f64>, y: &Array2<f64>) -> f64 {
        let mut correct = 0;

        for i in 0..x.nrows() {
            let input = x.row(i).to_owned();
            let target = y.row(i).to_owned();
            let output = self.predict(&input);

            // Compare the indices of the maximum values
            let pred_class = output.iter()
                .enumerate()
                .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
                .map(|(index, _)| index)
                .unwrap();

            let true_class = target.iter()
                .enumerate()
                .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
                .map(|(index, _)| index)
                .unwrap();

            if pred_class == true_class {
                correct += 1;
            }
        }

        correct as f64 / x.nrows() as f64
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use ndarray::{arr1, arr2};

    #[test]
    fn test_xor_problem() {
        // Learn the XOR problem (random initialization makes this test non-deterministic)
        let x = arr2(&[[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]);
        let y = arr2(&[[0.0], [1.0], [1.0], [0.0]]);

        let mut nn = NeuralNetwork::new(0.1, Loss::MSE);
        nn.add_layer(Layer::new(2, 4, Activation::ReLU));
        nn.add_layer(Layer::new(4, 1, Activation::Sigmoid));

        nn.fit(&x, &y, 1000, 4, false);

        // Prediction test
        let pred1 = nn.predict(&arr1(&[0.0, 0.0]));
        let pred2 = nn.predict(&arr1(&[0.0, 1.0]));

        assert!(pred1[0] < 0.5);
        assert!(pred2[0] > 0.5);
    }
}
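
The requirements also list a confusion matrix for classification, which the listing above does not include. A minimal sketch follows, assuming class labels have already been reduced to indices (for example by the arg-max used in accuracy). It works on plain slices, and the function names are hypothetical rather than part of the library.

```rust
/// Builds a `num_classes x num_classes` confusion matrix.
/// Rows are true classes, columns are predicted classes.
fn confusion_matrix(truth: &[usize], predicted: &[usize], num_classes: usize) -> Vec<Vec<usize>> {
    assert_eq!(truth.len(), predicted.len());
    let mut matrix = vec![vec![0; num_classes]; num_classes];
    for (&t, &p) in truth.iter().zip(predicted) {
        matrix[t][p] += 1;
    }
    matrix
}

/// Accuracy is the diagonal sum divided by the total count.
fn accuracy_from_matrix(matrix: &[Vec<usize>]) -> f64 {
    let correct: usize = (0..matrix.len()).map(|i| matrix[i][i]).sum();
    let total: usize = matrix.iter().flatten().sum();
    correct as f64 / total as f64
}

fn main() {
    let truth = [0, 0, 1, 1, 2, 2];
    let predicted = [0, 0, 1, 2, 2, 2];
    let matrix = confusion_matrix(&truth, &predicted, 3);
    for row in &matrix {
        println!("{row:?}");
    }
    println!("accuracy = {:.3}", accuracy_from_matrix(&matrix));
}
```

Off-diagonal entries show which classes are confused with each other, which a single accuracy figure hides.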

src/data.rs

use ndarray::{Array1, Array2};
use rand::seq::SliceRandom;
use rand::thread_rng;

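/// Standardizes each column to zero mean and unit variance.
/// Returns the normalized data together with the column means and standard deviations.
pub fn normalize(data: &Array2<f64>) -> (Array2<f64>, Array1<f64>, Array1<f64>) {
    let mean = data.mean_axis(ndarray::Axis(0)).unwrap();
    // Replace zero deviations with 1.0 so constant columns do not produce NaN.
    let std = data
        .std_axis(ndarray::Axis(0), 0.0)
        .mapv(|s| if s > 0.0 { s } else { 1.0 });

    let normalized = (data - &mean) / &std;

    (normalized, mean, std)
}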

/// Splits the data into training and test sets.
pub fn train_test_split(
    x: &Array2<f64>,
    y: &Array2<f64>,
    test_ratio: f64,
) -> (Array2<f64>, Array2<f64>, Array2<f64>, Array2<f64>) {
    let n = x.nrows();
    let test_size = (n as f64 * test_ratio) as usize;
    let train_size = n - test_size;

    let mut indices: Vec<usize> = (0..n).collect();
    indices.shuffle(&mut thread_rng());

    let train_indices = &indices[..train_size];
    let test_indices = &indices[train_size..];

    let x_train = x.select(ndarray::Axis(0), train_indices);
    let y_train = y.select(ndarray::Axis(0), train_indices);
    let x_test = x.select(ndarray::Axis(0), test_indices);
    let y_test = y.select(ndarray::Axis(0), test_indices);

    (x_train, y_train, x_test, y_test)
}

/// One-hot encoding.
pub fn one_hot_encode(labels: &Array1<usize>, num_classes: usize) -> Array2<f64> {
    let mut encoded = Array2::zeros((labels.len(), num_classes));

    for (i, &label) in labels.iter().enumerate() {
        encoded[[i, label]] = 1.0;
    }

    encoded
}
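
normalize returns the column means and standard deviations alongside the normalized data. One common reason for returning them is to apply the training-set statistics to the test set instead of recomputing them there. The sketch below illustrates that workflow under that assumption. It uses Vec<Vec<f64>> instead of Array2 to stay dependency-free, and fit_stats and apply_stats are hypothetical helper names.

```rust
/// Per-column mean and population standard deviation of the training rows.
fn fit_stats(rows: &[Vec<f64>]) -> (Vec<f64>, Vec<f64>) {
    let n = rows.len() as f64;
    let cols = rows[0].len();
    let mut mean = vec![0.0; cols];
    let mut std = vec![0.0; cols];
    for row in rows {
        for (m, v) in mean.iter_mut().zip(row) {
            *m += v / n;
        }
    }
    for row in rows {
        for ((s, m), v) in std.iter_mut().zip(&mean).zip(row) {
            *s += (v - m).powi(2) / n;
        }
    }
    for s in std.iter_mut() {
        *s = s.sqrt();
        // Guard constant columns so the division below stays finite.
        if *s == 0.0 {
            *s = 1.0;
        }
    }
    (mean, std)
}

/// Standardizes rows with statistics computed elsewhere (typically on the training set).
fn apply_stats(rows: &[Vec<f64>], mean: &[f64], std: &[f64]) -> Vec<Vec<f64>> {
    rows.iter()
        .map(|row| {
            row.iter()
                .zip(mean)
                .zip(std)
                .map(|((v, m), s)| (v - m) / s)
                .collect()
        })
        .collect()
}

fn main() {
    let train = vec![vec![1.0, 10.0], vec![3.0, 10.0]];
    let test = vec![vec![5.0, 10.0]];
    let (mean, std) = fit_stats(&train);
    // The test rows reuse the training mean and std.
    println!("{:?}", apply_stats(&test, &mean, &std));
}
```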

examples/neural_network_demo.rs

use ndarray::arr2;
use rust_ml::{
    neural_network::{Layer, NeuralNetwork},
    activation::Activation,
    loss::Loss,
    data::normalize,
};

fn main() {
    println!("=== Neural Network Demo: IRIS Classification ===\n");

    // IRIS dataset (abbreviated: two samples per class)
    let x = arr2(&[
        [5.1, 3.5, 1.4, 0.2],
        [4.9, 3.0, 1.4, 0.2],
        [7.0, 3.2, 4.7, 1.4],
        [6.4, 3.2, 4.5, 1.5],
        [6.3, 3.3, 6.0, 2.5],
        [5.8, 2.7, 5.1, 1.9],
    ]);

    let y = arr2(&[
        [1.0, 0.0, 0.0], // Setosa
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0], // Versicolor
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0], // Virginica
        [0.0, 0.0, 1.0],
    ]);

    // Normalize the data
    let (x_norm, _, _) = normalize(&x);

    // Build the neural network
    let mut nn = NeuralNetwork::new(0.01, Loss::CrossEntropy);
    nn.add_layer(Layer::new(4, 8, Activation::ReLU));
    nn.add_layer(Layer::new(8, 8, Activation::ReLU));
    nn.add_layer(Layer::new(8, 3, Activation::Sigmoid));

    println!("Training...");
    let history = nn.fit(&x_norm, &y, 500, 2, true);

    println!("\nFinal Loss: {:.6}", history.last().unwrap());

    let accuracy = nn.accuracy(&x_norm, &y);
    println!("Accuracy: {:.2}%", accuracy * 100.0);
}

---

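Explanation

Implementation Highlights

1. Backpropagation

Chain rule for gradients:

pub fn backward(&self, grad_output: &Array1<f64>) -> (Array1<f64>, Array2<f64>, Array1<f64>) {
    let input = self.input.as_ref().unwrap();
    let z = self.z.as_ref().unwrap();

    // ∂L/∂z = ∂L/∂a * ∂a/∂z (chain rule)
    let grad_z = grad_output * &self.activation.backward(z);

    // ∂L/∂W = x * (∂L/∂z)^T (outer product of two vectors)
    let grad_weights = input
        .clone()
        .into_shape((input.len(), 1))
        .unwrap()
        .dot(&grad_z.clone().into_shape((1, grad_z.len())).unwrap());

    // ∂L/∂b = ∂L/∂z
    let grad_bias = grad_z.clone();

    // ∂L/∂x = W * ∂L/∂z
    let grad_input = self.weights.dot(&grad_z);

    (grad_input, grad_weights, grad_bias)
}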

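The chain rule gives the gradient for each layer
Gradients propagate from the output layer back toward the input layer
One backward pass yields every gradient, whereas numerical differentiation needs extra forward passes for each parameter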
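
To make the shapes in the backward pass concrete, here is a small sketch of the same two formulas for a layer with 2 inputs and 3 outputs. It is an illustration under simplifying assumptions: nested Vec values stand in for ndarray arrays, the weights use the same [input_size][output_size] layout as the Layer struct, and dense_backward is a hypothetical name.

```rust
/// Gradients for one dense layer z = W^T x + b, given dL/dz.
/// Returns (dL/dx, dL/dW).
fn dense_backward(
    weights: &[Vec<f64>],
    input: &[f64],
    grad_z: &[f64],
) -> (Vec<f64>, Vec<Vec<f64>>) {
    // dL/dW[i][j] = x[i] * dL/dz[j] (outer product)
    let grad_weights: Vec<Vec<f64>> = input
        .iter()
        .map(|x| grad_z.iter().map(|g| x * g).collect())
        .collect();
    // dL/dx[i] = sum over j of W[i][j] * dL/dz[j]
    let grad_input: Vec<f64> = weights
        .iter()
        .map(|row| row.iter().zip(grad_z).map(|(w, g)| w * g).sum::<f64>())
        .collect();
    (grad_input, grad_weights)
}

fn main() {
    // 2 inputs, 3 outputs
    let weights = vec![vec![1.0, 2.0, 3.0], vec![4.0, 5.0, 6.0]];
    let input = [0.5, -1.0];
    let grad_z = [1.0, 0.0, -1.0];
    let (grad_input, grad_weights) = dense_backward(&weights, &input, &grad_z);
    // grad_input has one entry per input, grad_weights has the same shape as weights.
    println!("grad_input   = {grad_input:?}");
    println!("grad_weights = {grad_weights:?}");
}
```

The weight gradient has the same shape as the weight matrix, which is why the ndarray version builds it as a column vector times a row vector.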

2. Activation Functions

Introducing non-linearity:

Activation::ReLU => x.mapv(|v| v.max(0.0))
Activation::Sigmoid => x.mapv(|v| 1.0 / (1.0 + (-v).exp()))

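ReLU: mitigates the vanishing gradient problem (its derivative is 1 for positive inputs)
Sigmoid: output in (0, 1), which can be read as a probability
Tanh: zero-centered output in (-1, 1)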
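
The vanishing gradient remark can be illustrated numerically. The sigmoid derivative never exceeds 0.25, so a product of such factors across many layers shrinks quickly, while the ReLU derivative stays at 1 for positive inputs. The sketch below is a deliberate simplification: it ignores the weights and evaluates every layer at the same pre-activation value, and chained_grad is a hypothetical helper.

```rust
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

fn sigmoid_grad(x: f64) -> f64 {
    let s = sigmoid(x);
    s * (1.0 - s)
}

fn relu_grad(x: f64) -> f64 {
    if x > 0.0 { 1.0 } else { 0.0 }
}

/// Product of activation derivatives along a chain of `depth` layers,
/// all evaluated at the same pre-activation `z`.
fn chained_grad(grad: fn(f64) -> f64, z: f64, depth: i32) -> f64 {
    grad(z).powi(depth)
}

fn main() {
    for depth in [1, 5, 10] {
        println!(
            "depth {depth:2}: sigmoid factor = {:.3e}, relu factor = {:.3e}",
            chained_grad(sigmoid_grad, 0.0, depth),
            chained_grad(relu_grad, 1.0, depth),
        );
    }
}
```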

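3. He Initialization

Weight initialization:

let limit = (6.0 / input_size as f64).sqrt();
let weights = Array2::random((input_size, output_size), Uniform::new(-limit, limit));

The bound sqrt(6 / fan_in) gives a uniform distribution with the He variance of 2 / fan_in
Keeps the gradient scale reasonable as the network gets deeper
Well suited to ReLU activations
A variant of Xavier initialization adapted to ReLU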
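
He initialization targets a weight variance of 2 / fan_in. For a uniform distribution U(-a, a) the variance is a^2 / 3, so the matching bound is a = sqrt(6 / fan_in); a bound of sqrt(2 / fan_in) would give only one third of that variance. The sketch below checks this empirically. It is an illustration only: a small xorshift generator replaces the rand crate so that the example has no dependencies, and all names in it are hypothetical.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crate.
struct XorShift(u64);

impl XorShift {
    /// Uniform sample in [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// He-uniform bound: U(-limit, limit) has variance limit^2 / 3 = 2 / fan_in.
fn he_uniform_limit(fan_in: usize) -> f64 {
    (6.0 / fan_in as f64).sqrt()
}

/// Empirical variance of `samples` draws from U(-limit, limit).
fn sample_variance(fan_in: usize, samples: usize, seed: u64) -> f64 {
    let limit = he_uniform_limit(fan_in);
    let mut rng = XorShift(seed);
    let values: Vec<f64> = (0..samples)
        .map(|_| (2.0 * rng.next_f64() - 1.0) * limit)
        .collect();
    let mean = values.iter().sum::<f64>() / samples as f64;
    values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / samples as f64
}

fn main() {
    let fan_in = 8;
    let variance = sample_variance(fan_in, 100_000, 0x9E37_79B9_7F4A_7C15);
    println!("target variance    = {:.4}", 2.0 / fan_in as f64);
    println!("empirical variance = {variance:.4}");
}
```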

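4. Mini-batch Learning

Efficient training:

for batch_start in (0..n).step_by(batch_size) {
    let batch_end = (batch_start + batch_size).min(n);
    // update using the batch data
}

Each update is cheaper than a pass over the full dataset
Lower gradient variance than single-sample updates
Memory efficient, since only one batch is processed at a time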
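
In a mini-batch step the gradients of all samples in the batch are averaged and the parameters are updated once per batch. The sketch below shows that pattern for a one-variable linear model with MSE loss. It is a simplified illustration rather than the library code: it uses scalars instead of ndarray, skips shuffling so that the result is deterministic, and the names minibatch_step and train are hypothetical.

```rust
/// One mini-batch step for y = w * x + b with MSE loss:
/// gradients are averaged over the batch and applied once.
fn minibatch_step(w: &mut f64, b: &mut f64, batch: &[(f64, f64)], lr: f64) {
    let n = batch.len() as f64;
    let (mut grad_w, mut grad_b) = (0.0, 0.0);
    for &(x, y) in batch {
        let error = *w * x + *b - y;
        grad_w += 2.0 * error * x / n;
        grad_b += 2.0 * error / n;
    }
    *w -= lr * grad_w;
    *b -= lr * grad_b;
}

/// Trains on four samples of y = 2x + 1 with batches of two.
fn train(epochs: usize) -> (f64, f64) {
    let data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)];
    let (mut w, mut b) = (0.0, 0.0);
    for _ in 0..epochs {
        for batch in data.chunks(2) {
            minibatch_step(&mut w, &mut b, batch, 0.01);
        }
    }
    (w, b)
}

fn main() {
    let (w, b) = train(2000);
    println!("w = {w:.3}, b = {b:.3}");
}
```

A neural network version follows the same structure: accumulate the weight and bias gradients of every sample in the batch, divide by the batch size, and only then update the layers.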

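Design Decisions

Using ndarray: fast matrix operations with a NumPy-like API

Layer abstraction: the Layer struct defines a reusable layer

Storing intermediate values: each layer keeps its input and pre-activation for backpropagation

Type safety: Array1 and Array2 are distinct types, so mixing up vectors and matrices is a compile-time error; shape mismatches (for example, incompatible lengths in dot) are still detected only at run time

Alternatives

Automatic differentiation: build a computation graph and compute gradients automatically

GPU computation: accelerate with CUDA or Vulkan

Optimization algorithms: more advanced optimizers such as Adam and RMSprop

Regularization: L1/L2 regularization, Dropout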

---

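Learning Objectives

Concepts Covered

Machine learning fundamentals:

- Supervised learning
- Loss minimization
- Gradient descent
- Overfitting and generalization

Neural networks:

- Forward pass
- Backpropagation
- The role of activation functions
- Stacking layers

Numerical computation:

- Matrix operations
- Computing derivatives
- Numerical stability
- The importance of initialization

Optimization:

- Gradient descent
- Learning-rate adjustment
- Choosing a batch size
- Convergence criteria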
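
Learning-rate adjustment and convergence criteria appear in the list above, but the reference solution uses a fixed learning rate and a fixed number of epochs. One possible way to add both is sketched below, as an assumption rather than the intended design: a step-decay schedule and a check on the loss history returned by fit. The function names and parameters are hypothetical.

```rust
/// Step decay: multiply the initial rate by `factor` once every `step` epochs.
fn step_decay(initial_lr: f64, factor: f64, step: usize, epoch: usize) -> f64 {
    initial_lr * factor.powi((epoch / step) as i32)
}

/// Returns true when the loss improved by less than `tol`
/// over the last `patience` epochs.
fn has_converged(loss_history: &[f64], patience: usize, tol: f64) -> bool {
    if loss_history.len() <= patience {
        return false;
    }
    let latest = loss_history[loss_history.len() - 1];
    let earlier = loss_history[loss_history.len() - 1 - patience];
    earlier - latest < tol
}

fn main() {
    for epoch in [0, 10, 20] {
        println!("epoch {epoch:2}: lr = {}", step_decay(0.1, 0.5, 10, epoch));
    }
    let history = [1.0, 0.5, 0.4999, 0.4998];
    println!("converged: {}", has_converged(&history, 2, 1e-3));
}
```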

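Connections to CS Fundamentals

Linear algebra:

Matrix multiplication
Vector operations
Transposed matrices
Understanding dimensions

Calculus:

Partial derivatives
Chain rule
Optimization problems
Gradient vectors

Algorithms:

Iterative algorithms
Time complexity (one epoch is O(n) in the number of samples)
Space complexity
Numerical stability

Probability and statistics:

Loss functions
Maximum likelihood estimation
Normal distribution
Stochastic gradient descent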

---

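Testing

Unit tests

cargo test

Integration tests

# Linear regression demo
cargo run --example linear_regression_demo

# Neural network demo
cargo run --example neural_network_demo

# MNIST demo (simplified)
cargo run --example mnist_demo

Benchmarks

# Measure performance on the XOR problem (assumes a benchmark target has been added; none appears in the project structure above)
cargo bench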

---

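Grading Criteria

Required (60 points)

[ ] Linear regression works correctly (10 points)
[ ] A neural network can be constructed (15 points)
[ ] Backpropagation is implemented (20 points)
[ ] The training loop works (10 points)
[ ] At least 80% accuracy on the IRIS dataset (5 points)

Standard (30 points)

[ ] Support for multiple activation functions (10 points)
[ ] Data normalization and splitting (5 points)
[ ] Recording of the learning curve (5 points)
[ ] Solves the XOR problem (10 points)

Advanced (10 points)

[ ] Adam optimizer (3 points)
[ ] Dropout (3 points)
[ ] Basic CNN implementation (4 points)

Bonus (+10 points)

[ ] At least 90% accuracy on MNIST (+5 points)
[ ] Faster batch processing with Rust parallelism (+5 points)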

---

References

Machine Learning Fundamentals

Deep Learning Book (Ian Goodfellow)
Neural Networks and Deep Learning (Michael Nielsen)
CS231n: Convolutional Neural Networks

Rust Implementations

ndarray Documentation
Burn (Rust ML Framework)
Linfa (Rust ML Toolkit)