11 minute read

State and control

\[x = \begin{bmatrix} x \\ y \\ \psi \\ v \\ \delta \\ \end{bmatrix}, \;\;\; u = \delta\]
  • $x$: absolute position x (m)
  • $y$: absolute position y (m)
  • $\psi$: yaw (deg)
  • $v$: velocity (m/s)
  • $\delta$: steering angle (deg)
  • $\beta:$ side slip angle

Introduction

전체 절차는 다음과 같다. 다음의 과정을 control sampling periode마다 반복한다.

  1. [random_sampling] control_sequence에 대한 sample들을 생성한다. 이때 생성된 control_sequence들을 control_sequence_batch라고 명명한다.
  2. [predict_state_sequence] 1번에서 생성한 control sequence들을 바탕으로 Kinematic bicycle model을 이용하여 state_sequence들을 생성한다. 이때 생성된 state sequence들을 state_sequence_batch라고 명명한다.
  3. [calculate_state_sequence_cost, calculate_state_sequence_cost_batch] 2번에서 생성한 state_sequence에 대한 state_sequence_cost $S(V_k)$를 계산한다.
  4. [calculate_sample_cost_batch] 3번에서 생성한 state_sequence_cost에 control term에 대한 cost를 더하여 하나의 state_sequence(control_sequence을 이용해 만들어진)에 대응되는 최종 cost인 sample_cost를 계산한다.
  5. [softmax] 4번을 통해 구한 sample_cost_batch에 대해 음의 부호를 가진 softmax를 취하여 1번에서 생성했던 control control_sequence들에 대한 가중치(확률)을 생성한다. 이때 각 control_sequence에 대응되는 sample_cost가 높을수록 낮은 가중치(확률)을 갖게 된다.
  6. [solve] 1번에서 생성한 control_sequence와 5번에서 구한 weight를 가중합(평균)하여 최종 updated_control_sequence를 구한다.
  7. [timer_callback] 6번에서 구한 updated_control_sequence에서 현재 값(첫 번째)을 취하여 현재 control input으로 사용한다.
  • mppi logic과 관련하여 모든 함수는 solve 함수에서 호출된다.
  • 함수 calculate_state_sequence_cost_batch는 함수 predict_state_sequence와 함수 calculate_state_sequence_cost를 호출한다. 따라서 solve 함수 안에서 함수 calculate_state_sequence_cost_batch는만 호출하면 된다. 또한 이 함수는 계산한 state_sequence_cost_batch를 반환한다.
  • 함수 calculate_sample_cost_batch는 반환된 state_sequence_cost_batch를 입력으로 받아서 가정 처음에 생성한 control_sequence들에 대응하는 최종 비용인 sample_cost_batch를 생성한다.
\[\begin{align*} \mathbf{U}_t^* &\simeq \sum_{k = 0}^{K-1} w(\mathbf{V}_k) \mathbf{V}_k \\ &= \sum_{k = 0}^{K-1} \text{softmax} \left( S(\mathbf{V}_k) + \lambda \sum_{\tau = 0}^{T-1} (\hat{\mathbf{u}}_{\tau} - \tilde{\mathbf{u}}_{\tau})^{\top} \Sigma_{\tau}^{-1} \mathbf{v}_{\tau} \right) \mathbf{V}_k \\ &= \sum_{k = 0}^{K-1} \text{softmax} \left( \phi(\mathbf{x}_T) + \sum_{\tau = 0}^{T-1}c(\mathbf{x}_{\tau}) + \lambda \sum_{\tau = 0}^{T-1} (\hat{\mathbf{u}}_{\tau} - \tilde{\mathbf{u}}_{\tau})^{\top} \Sigma_{\tau}^{-1} \mathbf{v}_{\tau} \right) \mathbf{V}_k \\ \end{align*}\]
  • number of samples: $K$
  • initial estimated control input sequence, often using the previous optimal solution
\[\hat{\mathbf{U}} = [\hat{\mathbf{u}}_{\tau}]_{\tau = 0}^{T-1}\]
  • given nominal control input sequence
\[\tilde{\mathbf{U}} = [\tilde{\mathbf{u}}_{\tau}]_{\tau = 0}^{T-1} \in \mathbb{R}^{T \times m}\]
  • randomly generated control input sequences
\[\mathbf{V}_k = [\mathbf{v}_{\tau}]_{\tau = 0}^{T-1} \in \mathbb{R}^{T \times m}\]

1. solve

최종적으로 구해야 할 식은 다음과 같다. random_sampling한 control_sequence들에 대한 가중합이다.

\[\begin{align*} \mathbf{U}_t^* &\simeq \sum_{k = 0}^{K-1} w(\mathbf{V}_k) \mathbf{V}_k \\ \end{align*}\]

where

\[\begin{align*} w(\mathbf{V}_k) &= \text{softmax} \left( S(\mathbf{V}_k) + \lambda \sum_{\tau = 0}^{T-1} (\hat{\mathbf{u}}_{\tau} - \tilde{\mathbf{u}}_{\tau})^{\top} \Sigma_{\tau}^{-1} \mathbf{v}_{\tau} \right) \\ \end{align*}\]
  • $k$th state sequence cost
\[S(\mathbf{V}_k) = \phi(\mathbf{x}_T) + \sum_{\tau = 0}^{T-1}c(\mathbf{x}_{\tau})\]
  • $k$th sample cost
\[S(\mathbf{V}_k) + \lambda \sum_{\tau = 0}^{T-1} (\hat{\mathbf{u}}_{\tau} - \tilde{\mathbf{u}}_{\tau})^{\top} \Sigma_{\tau}^{-1} \mathbf{v}_{\tau}\]
std::pair<ControlSequence, double> SVGMPPI::solve(
    const State& initial_state
)
{
    // Transport guide particles by sten variational gradient descent

    // cost batch
    std::vector<double> svg_cost_batch_history;
    std::vector<ControlSequence> svg_control_sequence_batch_history;

    for (size_t i = 0; i < SVGD_ITERATION_NUMBER; i++) {
        // Transport samples by stein variational gradient descent
        const ControlSequenceBatch gradient_log_posterior_batch = approximate_gradient_log_posterior_batch(
            *guide_smapling_pointer_,
            initial_state
        );

        #pragma omp parallel for num_threads(THREAD_NUMBER)
        for (size_t i = 0; i < guide_smapling_pointer_->get_sample_number(); i++) {
            // svgd update
            guide_smapling_pointer_->noised_control_sequence_batch_[i] += SVGD_STEP_SIZE * gradient_log_posterior_batch[i];
        }

        // store costs and samples for adaptive covariance calculation
        const std::vector<double> state_cost_batch = calculate_state_sequence_cost_batch(
            *guide_smapling_pointer_,
            initial_state,
            local_cost_map_
        ).first;
        svg_cost_batch_history.insert(
            svg_cost_batch_history.end(),
            state_cost_batch.begin(),
            state_cost_batch.end()
        );
        svg_control_sequence_batch_history.insert(
            svg_control_sequence_batch_history.end(),
            guide_smapling_pointer_->noised_control_sequence_batch_.begin(),
            guide_smapling_pointer_->noised_control_sequence_batch_.end()
        );
    }

    const auto guide_state_cost_batch = calculate_state_sequence_cost_batch(
        *guide_smapling_pointer_,
        initial_state,
        local_cost_map_
    ).first;
    const size_t min_guide_state_cost_index = std::distance(
        guide_state_cost_batch.begin(),
        std::min_element(
            guide_state_cost_batch.begin(),
            guide_state_cost_batch.end()
        )
    );
    const ControlSequence svg_control_sequence = guide_smapling_pointer_->noised_control_sequence_batch_[
        min_guide_state_cost_index
    ]; // Stein Variational Guided

    // ----------------------------------------------------------------------------------------------------

    // covariance matrix for random_sampling from prior distribution
    ControlCovarianceMatrixSequence adaptive_control_covariances_matrix_sequence = std::vector<Eigen::MatrixXd, Eigen::aligned_allocator<Eigen::MatrixXd>>(
        PREDICTION_HORIZON - 1, Eigen::MatrixXd::Zero(CONTROL_SPACE::dim, CONTROL_SPACE::dim)
    );
    for (auto& covariance : adaptive_control_covariances_matrix_sequence) {
        for (size_t i = 0; i < CONTROL_SPACE::dim; i++) {
            covariance(i, i) = STERRING_CONTROL_COVARIANCE[i];
        }
    }

    if (IS_COVARIANCE_ADAPTATION) {
        // calculate softmax costs
        const std::vector<double> softmax_costs = softmax(
            svg_cost_batch_history,
            GAUSSIAN_FITTING_LAMBDA,
            THREAD_NUMBER
        );

        // calculate covariance using gaussian fitting
        for (size_t i = 0; i < PREDICTION_HORIZON - 1; i++) {
            // x-coordinate vector for gaussian_fitting
            std::vector<double> steer_samples(svg_control_sequence_batch_history.size());
            // y-coordinate vector for gaussian_fitting
            std::vector<double> q_star(softmax_costs.size());
            
            for (size_t j = 0; j < steer_samples.size(); j++) {
                steer_samples[j] = svg_control_sequence_batch_history[j](i, 0);
                q_star[j] = softmax_costs[j];
            }

            // Gaussian fitting
            const double sigma = gaussian_fitting(
                steer_samples,
                q_star
            ).second;

            // clamping
            const double sigma_clamped = std::clamp(
                sigma,
                MIN_STEERING_COVARIANCE,
                MAX_STEERING_COVARIANCE
            );

            adaptive_control_covariances_matrix_sequence[i] = Eigen::MatrixXd::Identity(CONTROL_SPACE::dim, CONTROL_SPACE::dim) * sigma_clamped;
        }
    }

    // important from now

    // random sampling from prior distribution
    prior_smapling_pointer_->random_sampling(previous_control_sequence_, adaptive_control_covariances_matrix_sequence);

    // Rollout samples and calculate costs
    auto [state_sequence_cost_batch, collision_cost_batch] = calculate_state_sequence_cost_batch(
        *prior_smapling_pointer_,
        initial_state,
        local_cost_map_
    );

    // calculate weights
    if (IS_SVG) {
        // with nominal sequence by SVG
        nominal_control_sequence_ = svg_control_sequence;
    } else {
        // without nominal sequence
        nominal_control_sequence_ = Eigen::MatrixXd::Zero(
            PREDICTION_HORIZON - 1, CONTROL_SPACE::dim
        );
    }

    const std::vector<double> weight_batch = softmax(
        calculate_sample_cost_batch(
            *prior_smapling_pointer_,
            LAMBDA,
            ALPHA,
            state_sequence_cost_batch,
            nominal_control_sequence_
        ),
        LAMBDA,
        THREAD_NUMBER
    );
    weight_batch_ = weight_batch;

    // update next control sequence by expection (weighted sum)
    ControlSequence updated_control_sequence = Eigen::MatrixXd::Zero(PREDICTION_HORIZON - 1, CONTROL_SPACE::dim);
    for (size_t i = 0; i < prior_smapling_pointer_->get_sample_number(); i++) {
        updated_control_sequence += weight_batch[i] * prior_smapling_pointer_->noised_control_sequence_batch_[i];
    }

    // for collision rate
    const int collision_number = std::count_if(
        collision_cost_batch.begin(),
        collision_cost_batch.end(),
        [](const double& cost) { return cost > 0.0; }
    );
    const double collision_rate = static_cast<double>(collision_number) / static_cast<double>(prior_smapling_pointer_->get_sample_number());

    // for warm start (biased-sampling)
    previous_control_sequence_ = updated_control_sequence;

    return std::make_pair(updated_control_sequence, collision_rate);
}

2. random_sampling

Non-biased Sampling

\[u \sim N(0, \sigma^2)\]

Biased Sampling

\[u \sim N(u^*, \sigma^2)\]

여기서 $u^*$는 이전에 계산했던 최적의 control input이다.

control sequence에 대한 샘플 noised_control_sequence_batch_ 를 생성한다.

  1. 평균이 0이고 표준편차가 control_covariance_sequence의 표준편차인 정규분포를 설정한다.
  2. 설정된 정규분포에서 샘플링을 하고, 이를 noise_sequence_batch에 저장한다.
  3. 일정 비율을 가지고 편향 샘플링과 비편향 샘플링을 하여 control sequence를 샘플링한다. 이를 noised_control_sequence_batch_에 저장한다.
  4. control input constraint를 고려하여 샘플링된 제어 신호를 clampling한다.
void Sampling::random_sampling(
    const ControlSequence reference_control_sequence,
    const ControlCovarianceMatrixSequence reference_control_covariance_matrix_sequence
)
{
    set_control_mean_sequence(reference_control_sequence);
    set_control_covariance_matrix_sequence(reference_control_covariance_matrix_sequence);

    // Set normal distributions parameters and pointer
    for (size_t i = 0; i < PREDICTION_HORIZON - 1; i++) {
        for (size_t j = 0; j < CONTROL_SPACE::dim; j++) {
            // standard deviation of control covariance sequence
            const double standard_deviation = std::sqrt(control_covariance_matrix_sequence_[i](j, j));

            // normal distribution parameter in which expectation is 0 and standard deviation is from control covariance sequence
            std::normal_distribution<>::param_type normal_distribution_parameter(0.0, standard_deviation);

            // set noraml distribution pointer
            (*normal_distribution_pointer_)[i][j].param(normal_distribution_parameter);
        }
    }

    #pragma omp parallel for num_threads(THREAD_NUMBER)
    for (size_t i = 0; i < SAMPLE_NUMBER; i++) {
        // generate noise sequence using upper normal distribution
        for (size_t j = 0; j < PREDICTION_HORIZON - 1; j++) {
            for (size_t k = 0; k < CONTROL_SPACE::dim; k++) {
                noise_sequence_batch_[i](j, k) = (*normal_distribution_pointer_)[j][k](random_number_generators_[omp_get_thread_num()]);
            }
        }

        // sampling control sequence with non-biased (around zero) sampling rate
        if (i < static_cast<size_t>((1 - NON_BIASED_SAMPLING_RATE) * SAMPLE_NUMBER)) {
            // biased sampling (around control_mean_sequence)
            noised_control_sequence_batch_[i] = control_mean_sequence_ + noise_sequence_batch_[i];
        } else {
            // non-biased sampling (around zero)
            noised_control_sequence_batch_[i] = noise_sequence_batch_[i];
        }

        // clip input with control input constraints
        for (size_t j = 0; j < CONTROL_SPACE::dim; j++) {
            for (size_t k = 0; k < PREDICTION_HORIZON - 1; k++) {
                noised_control_sequence_batch_[i](k, j) = std::clamp(
                    noised_control_sequence_batch_[i](k, j),
                    min_control_[j],
                    max_control_[j]
                );
            }
        }
    }
}
  • biased sampling: noise를 입력으로 받은 평균에 더하여 샘플을 생성한다. 이미 알고 있는 정보에 기반하여 샘플링한다.
  • 순수한 noise만으로 샘플을 생성한다. 샘플들이 0을 중심으로 분포하게 만든다. 이 방식은 탐색에 유리하다.

3. predict_state_sequence

kinematic bicycle model을 이용하여 초기 상태와 $k$th control_sequence를 바탕으로 Sample의 $k$th state sequence를 예측한다.

입력

\[\mathbf{x}_0, \;\;\;\mathbf{u}_0, \mathbf{u}_1, \dots, \mathbf{u}_{T - 1}\]

출력

\[\mathbf{x}_0, \mathbf{x}_1, \dots, \mathbf{x}_{T - 1}\]

Kinematic bicycle model

\[\begin{align*} \dot{x} &= v \cos(\psi + \beta) \\ \dot{y} &= v \sin(\psi + \beta) \\ \dot{\psi} &= \dfrac{v}{l_r + l_f} \cos\beta \tan\delta \\ \beta &= \arctan{\left(\dfrac{l_r}{l_r +l_f} \tan\delta \right)} \\ \end{align*}\]
StateSequence SVGMPPI::predict_state_sequence(
    const State& initial_state,
    const ControlSequence& control_sequence
) const
{
    // initialize state trajectory
    StateSequence predicted_state_sequence = Eigen::MatrixXd::Zero(PREDICTION_HORIZON, STATE_SPACE::dim);

    // set current state to state trajectory as initial state
    predicted_state_sequence.row(0) = initial_state;

    for (size_t i = 0; i < PREDICTION_HORIZON - 1; i++) {
        double steering = control_sequence(i, CONTROL_SPACE::steering);

        const double predicted_x = predicted_state_sequence(i, STATE_SPACE::x);
        const double predicted_y = predicted_state_sequence(i, STATE_SPACE::y);;
        const double predicted_yaw = predicted_state_sequence(i, STATE_SPACE::yaw);
        const double predicted_velocity = predicted_state_sequence(i, STATE_SPACE::velocity);
        const double predicted_steering = predicted_state_sequence(i, STATE_SPACE::steering);

        // predict delta state using kinematic bicycle model
        const double sideslip = atan(L_R / (L_F + L_R) * tan(steering));
        const double predicted_delta_x = predicted_velocity * cos(predicted_yaw + sideslip) * PREDICTION_INTERVAL;
        const double predicted_delta_y = predicted_velocity * sin(predicted_yaw + sideslip) * PREDICTION_INTERVAL;
        // const double predicted_delta_yaw = predicted_velocity * sin(sideslip) / L_R * PREDICTION_INTERVAL;
        const double predicted_delta_yaw = predicted_velocity / (L_F + L_R) * cos(sideslip) * tan(predicted_yaw) * PREDICTION_INTERVAL; // check
        const double predicted_delta_steering = (steering - predicted_steering) * PREDICTION_INTERVAL; // steer 1st order delay // check

        double next_velocity = 0.0;
        if (1) {
            next_velocity = predict_constant_speed(predicted_velocity);
        }

        // update next state
        predicted_state_sequence(i + 1, STATE_SPACE::x) = predicted_x + predicted_delta_x;
        predicted_state_sequence(i + 1, STATE_SPACE::y) = predicted_y + predicted_delta_y;
        predicted_state_sequence(i + 1, STATE_SPACE::yaw) = std::atan2(sin(predicted_yaw + predicted_delta_yaw), cos(predicted_yaw + predicted_delta_yaw));
        predicted_state_sequence(i + 1, STATE_SPACE::velocity) = next_velocity;
        predicted_state_sequence(i + 1, STATE_SPACE::steering) = predicted_steering + predicted_delta_steering;
    }

    return predicted_state_sequence;
}

predict_constant_speed

4. calculate_state_sequence_cost

$k$th state sequence cost $S(\mathbf{V}_k)$을 계산한다.

\[S(\mathbf{V}_k) = \phi(\mathbf{x}_T) + \sum_{\tau = 0}^{T-1} c(\mathbf{x}_{\tau})\]
std::pair<double, double> SVGMPPI::calculate_state_sequence_cost(
    const StateSequence state_sequence,
    const grid_map::GridMap& local_cost_map
) const
{
    // initialize total cost of summing stage cost and terminal cost
    double state_squence_cost_sum = 0.0;

    // stage cost
    for (size_t i = 0; i < PREDICTION_HORIZON - 1; i++) {
        double state_squence_stage_cost = 10.0;

        State stage_state = state_sequence.row(i);
        if (local_cost_map.isInside(grid_map::Position(stage_state(STATE_SPACE::x), stage_state(STATE_SPACE::y)))) {
            state_squence_stage_cost = local_cost_map.atPosition(
                "collision_layer", grid_map::Position(stage_state(STATE_SPACE::x), stage_state(STATE_SPACE::y))
            );
        }

        state_squence_cost_sum += COLLISION_WEIGHT * state_squence_stage_cost;
    }

    // terminal cost
    const State terminal_state = state_sequence.row(PREDICTION_HORIZON - 1);
    double state_sequence_terminal_cost = 10.0;
    if (local_cost_map.isInside(grid_map::Position(terminal_state(STATE_SPACE::x), terminal_state(STATE_SPACE::y)))) {
        state_sequence_terminal_cost = local_cost_map.atPosition(
            "collision_layer", grid_map::Position(terminal_state(STATE_SPACE::x), terminal_state(STATE_SPACE::y))
        );
    }

    state_squence_cost_sum += COLLISION_WEIGHT * state_sequence_terminal_cost;

    return std::make_pair(state_squence_cost_sum, state_squence_cost_sum);
}

std::pair<std::vector<double>, std::vector<double>> SVGMPPI::calculate_state_sequence_cost_batch(
    const Sampling& sampling_pointer,
    const State& initial_state,
    const grid_map::GridMap& local_cost_map
)
{
    // declare variables to return
    std::vector<double> state_sequence_cost_batch(sampling_pointer.get_sample_number());
    std::vector<double> collision_cost_batch(sampling_pointer.get_sample_number());

    #pragma omp parallel for num_threads(THREAD_NUMBER)
    for (size_t i = 0; i < sampling_pointer.get_sample_number(); i++) {
        // rollout state sequence for each control sequence
        state_sequence_batch_.at(i) = predict_state_sequence(
            initial_state,
            sampling_pointer.noised_control_sequence_batch_[i]
        );

        // calculate state sequence cost
        const auto [state_sequence_cost, collision_cost] = calculate_state_sequence_cost(
            state_sequence_batch_.at(i),
            local_cost_map
        );

        state_sequence_cost_batch.at(i) = state_sequence_cost;
        collision_cost_batch.at(i) = collision_cost;
    }

    return std::make_pair(state_sequence_cost_batch, collision_cost_batch);
}

5. calculate_state_sequence_cost_batch

\[\left[ S(\mathbf{V}_1), S(\mathbf{V}_2), \dots, S(\mathbf{V}_{T-1}) \right]\]

를 계산한다.

입력

  • initial_state
  • local_cost_map
  • state_sequence_batch : 멤버 변수가 들어가며, 함수 내부에서 그 값이 변경된다. 계산하는 과정에서 사용하는 것이 아닌 변경할 멤버변수가 들어간다.

사용하는 멤버 변수 : noised_control_sequence_batch_

std::pair<std::vector<double>, std::vector<double>> SVGMPPI::calculate_state_cost_batch(
    const State& initial_state,
    const grid_map::GridMap& local_cost_map,
    StateSequenceBatch* state_sequence_batch
) const
{
    std::vector<double> total_cost_batch(sample_number_);
    std::vector<double> collision_cost_batch(sample_number_);

    #pragma omp parallel for num_threads(thread_number_)
    for (size_t i = 0; i < sample_number_; i++) {
        // predict state sequence
        state_sequence_batch->at(i) = predict_state_sequence(
            initial_state,
            noised_control_sequence_batch_[i]
        );

        // calculate state sequence cost
        const auto [total_cost, collision_cost] = calculate_state_sequence_cost(
            state_sequence_batch->at(i),
            local_cost_map
        );
        total_cost_batch.at(i) = total_cost;
        collision_cost_batch.at(i) = collision_cost;
    }

    return std::make_pair(total_cost_batch, collision_cost_batch);
}

6. calculate_sample_cost_batch

모든 sample cost를 계산한다. 이는 각 state sequence에 해당하는 최종 cost이다.

\[S(\mathbf{V}_k) + \lambda \sum_{\tau = 0}^{T-1} (\hat{\mathbf{u}}_{\tau} - \tilde{\mathbf{u}}_{\tau})^{\top} \Sigma_{\tau}^{-1} \mathbf{v}_{\tau}\]

where

  • initial estimated control input sequence, often using the previous optimal solution
\[\hat{\mathbf{U}} = [\hat{\mathbf{u}}_{\tau}]_{\tau = 0}^{T-1}\]
  • given nominal control input sequence
\[\tilde{\mathbf{U}} = [\tilde{\mathbf{u}}_{\tau}]_{\tau = 0}^{T-1} \in \mathbb{R}^{T \times m}\]
  • randomly generated control input sequences
\[\mathbf{V}_k = [\mathbf{v}_{\tau}]_{\tau = 0}^{T-1} \in \mathbb{R}^{T \times m}\]
std::vector<double> SVGMPPI::calculate_sample_cost_batch(
    const Sampling& sampling_pointer,
    const double& lambda,
    const double& alpha,
    const std::vector<double> state_cost_batch,
    const ControlSequence nominal_control_sequence
) const
{
    // declare and initialize sample cost batch as state cost batch
    std::vector<double> sample_cost_batch = state_cost_batch;

    // add control cost batch to sample cost batch
    #pragma omp parallel for num_threads(THREAD_NUMBER)
    for (size_t i = 0; i < sampling_pointer.get_sample_number(); i++) {
        for (size_t j = 0; j < PREDICTION_HORIZON - 1; j++) {
            const double control_cost = \
                lambda * (1 - alpha) \
                * (sampling_pointer.control_mean_sequence_.row(j) - nominal_control_sequence.row(j)) \
                * sampling_pointer.control_inverse_covariance_matrix_sequence_[j] \
                * sampling_pointer.noised_control_sequence_batch_[i].row(j).transpose();
            
            sample_cost_batch[i] += control_cost;
        }
    }

    return sample_cost_batch;
}

7. approximate_gradient_log_posterior_batch

ControlSequenceBatch SVGMPPI::approximate_gradient_log_posterior_batch(
    const Sampling& sampling_pointer,
    const State& initial_state
)
{
    // declare and initialize gradient_log_posterior_batch
    ControlSequenceBatch gradient_log_posterior_batch = std::vector<Eigen::MatrixXd, Eigen::aligned_allocator<Eigen::MatrixXd>>(
        sampling_pointer.get_sample_number(), Eigen::MatrixXd::Zero(PREDICTION_HORIZON, STATE_SPACE::dim)
    );

    #pragma omp parallel for num_threads(THREAD_NUMBER)
    for (size_t i = 0; i < sampling_pointer.get_sample_number(); i++) {
        const ControlSequence gradient_log_likelihood = approximate_gradient_log_likelihood(
            svg_sampling_pointer_.at(i).get(),
            initial_state,
            sampling_pointer.noised_control_sequence_batch_[i],
            sampling_pointer.control_inverse_covariance_matrix_sequence_
        );

        gradient_log_posterior_batch[i] = gradient_log_likelihood;
    }

    return gradient_log_posterior_batch;
}

approximate_gradient_log_likelihood

\[\phi = \dfrac{\sum_{i=0}^{N-1} w(\mathbf{V}_k^g [i]) \Sigma_g^{-1} (\mathbf{V}_k^g[i] - \mathbf{V}_k^g)}{\sum_{i=0}^{N-1}w(\mathbf{V}_k^g [i])}\]
ControlSequence SVGMPPI::approximate_gradient_log_likelihood(
    Sampling* sampling_pointer,
    const State& initial_state,
    const ControlSequence& reference_control_sequence,
    const ControlCovarianceMatrixSequence& control_inverse_covariance_matrix_sequence
)
{
    // declate and initialize reference_control_covariance_matrix_sequence to be used random_sampling
    ControlCovarianceMatrixSequence reference_control_covariance_matrix_sequence = std::vector<Eigen::MatrixXd, Eigen::aligned_allocator<Eigen::MatrixXd>>(
        PREDICTION_HORIZON - 1, Eigen::MatrixXd::Zero(CONTROL_SPACE::dim, CONTROL_SPACE::dim)
    );
    for (auto& control_covariance : reference_control_covariance_matrix_sequence) {
        for (size_t i = 0; i < CONTROL_SPACE::dim; i++) {
            control_covariance(i, i) = steering_control_covariance_for_gradient_estimation_[i];
        }
    }

    // Generate control sequence samples and update member variable noised_control_sequence_batch_
    sampling_pointer->random_sampling(reference_control_sequence, reference_control_covariance_matrix_sequence);

    // calculate state cost for each state sequence
    auto state_cost_batch = calculate_state_sequence_cost_batch(
        *sampling_pointer,
        initial_state,
        local_cost_map_
    ).first;

    // declare variable for summing sample weight in denumerator
    std::vector<double> sample_weight_batch(sampling_pointer->get_sample_number());

    // declare and initialize variable for numerator term
    ControlSequence numerator = Eigen::MatrixXd::Zero(
        PREDICTION_HORIZON - 1, CONTROL_SPACE::dim
    );

    #pragma omp parallel for num_threads(THREAD_NUMBER)
    for (size_t i = 0; i < sampling_pointer->get_sample_number(); i++) {
        // declare variable for calculate sample cost (state cost + control cost) and initialize as state cost
        double sample_cost = state_cost_batch[i];

        // calculate sample cost
        for (size_t j = 0; j < PREDICTION_HORIZON - 1; j++) {
            const double control_cost = SVG_LAMBDA \
                * (previous_control_sequence_.row(j) - sampling_pointer->noised_control_sequence_batch_[i].row(j)) \
                * control_inverse_covariance_matrix_sequence[j] \
                * (previous_control_sequence_.row(j) - sampling_pointer->noised_control_sequence_batch_[i].row(j)).transpose();
            sample_cost += control_cost;
        }

        // calculate sample weight for each sample
        const double sample_weight = std::exp(-sample_cost / SVG_LAMBDA);

        // for denumerator
        sample_weight_batch[i] = sample_weight;
        
        // declare and initialize
        ControlSequence gradient_log_gaussian(reference_control_sequence.rows(), reference_control_sequence.cols());
        gradient_log_gaussian.setZero();
        
        for (size_t j = 0; j < PREDICTION_HORIZON - 1; j++) {
            gradient_log_gaussian.row(j) = sample_weight \
                * sampling_pointer->control_inverse_covariance_matrix_sequence_[j] \
                * (sampling_pointer->noised_control_sequence_batch_[i] - reference_control_sequence).row(j).transpose();
        }
        numerator += gradient_log_gaussian;
    }

    // denumberator term
    const double denumerator = std::accumulate(sample_weight_batch.begin(), sample_weight_batch.end(), 0.0);
    
    return numerator / (denumerator + 1e-10);
}

softmax

가장 basic한 softmax

\[\text{softmax}(c_i) = \dfrac{e^{c_i}}{\sum_{i=1}^n e^{c_i}}\]

값이 작을수록 높은 확률을 부여하고 싶을 때 음수를 활용한다.

\[\text{softmax}(c_i) = \dfrac{e^{-c_i/\lambda}}{\sum_{i=1}^n e^{-c_i/\lambda}}\]

수치적 안정성을 위해 우리는 다음과 같은 수식을 사용한다.

\[c_{\text{min}} = \min [c_1, c_2, \cdots, c_n]\] \[Z = \sum_{i=1}^n \exp {\left(-\dfrac{c_i - c_{\min}}{\lambda}\right)} + 10^{-10}\] \[\text{softmax} (c_i) = \dfrac{\exp {\left(-\dfrac{c_i - c_{\min}}{\lambda}\right)}}{Z}\]
std::vector<double> softmax(
    const std::vector<double>& costs,
    const double& lambda,
    const int thread_number
) const
{
    // minimum element of costs
    const double min_cost_ = *std::min_element(costs.begin(), costs.end());

    // calculate normalizing constant
    double normalizing_constant_ = 1e-10;
    for (const auto& cost : costs) {
        normalizing_constant_ += std::exp(- (cost - min_cost_) / lambda);
    }

    // calculate softmax of costs
    std::vector<double> softmax_costs_(costs.size());
    #pragma omp parallel for num_threads(thread_number)
    for (size_t i = 0; i < costs.size(); i++) {
        softmax_costs_[i] = std::exp(-(costs[i] - min_cost_) / lambda) / normalizing_constant_;
    }

    return softmax_costs_;
}

Categories:

Updated: