[F1tenth] MPPI Solver API
State and control
\[x = \begin{bmatrix} x \\ y \\ \psi \\ v \\ \delta \\ \end{bmatrix}, \;\;\; u = \delta\]- $x$: absolute position x (m)
- $y$: absolute position y (m)
- $\psi$: yaw (deg)
- $v$: velocity (m/s)
- $\delta$: steering angle (deg)
- $\beta:$ side slip angle
Introduction
전체 절차는 다음과 같다. 다음의 과정을 control sampling periode마다 반복한다.
- [random_sampling] control_sequence에 대한 sample들을 생성한다. 이때 생성된 control_sequence들을 control_sequence_batch라고 명명한다.
- [predict_state_sequence] 1번에서 생성한 control sequence들을 바탕으로 Kinematic bicycle model을 이용하여 state_sequence들을 생성한다. 이때 생성된 state sequence들을 state_sequence_batch라고 명명한다.
- [calculate_state_sequence_cost, calculate_state_sequence_cost_batch] 2번에서 생성한 state_sequence에 대한 state_sequence_cost $S(V_k)$를 계산한다.
- [calculate_sample_cost_batch] 3번에서 생성한 state_sequence_cost에 control term에 대한 cost를 더하여 하나의 state_sequence(control_sequence을 이용해 만들어진)에 대응되는 최종 cost인 sample_cost를 계산한다.
- [softmax] 4번을 통해 구한 sample_cost_batch에 대해 음의 부호를 가진 softmax를 취하여 1번에서 생성했던 control control_sequence들에 대한 가중치(확률)을 생성한다. 이때 각 control_sequence에 대응되는 sample_cost가 높을수록 낮은 가중치(확률)을 갖게 된다.
- [solve] 1번에서 생성한 control_sequence와 5번에서 구한 weight를 가중합(평균)하여 최종 updated_control_sequence를 구한다.
- [timer_callback] 6번에서 구한 updated_control_sequence에서 현재 값(첫 번째)을 취하여 현재 control input으로 사용한다.
- mppi logic과 관련하여 모든 함수는 solve 함수에서 호출된다.
- 함수 calculate_state_sequence_cost_batch는 함수 predict_state_sequence와 함수 calculate_state_sequence_cost를 호출한다. 따라서 solve 함수 안에서 함수 calculate_state_sequence_cost_batch는만 호출하면 된다. 또한 이 함수는 계산한 state_sequence_cost_batch를 반환한다.
- 함수 calculate_sample_cost_batch는 반환된 state_sequence_cost_batch를 입력으로 받아서 가정 처음에 생성한 control_sequence들에 대응하는 최종 비용인 sample_cost_batch를 생성한다.
- number of samples: $K$
- initial estimated control input sequence, often using the previous optimal solution
- given nominal control input sequence
- randomly generated control input sequences
1. solve
최종적으로 구해야 할 식은 다음과 같다. random_sampling한 control_sequence들에 대한 가중합이다.
\[\begin{align*} \mathbf{U}_t^* &\simeq \sum_{k = 0}^{K-1} w(\mathbf{V}_k) \mathbf{V}_k \\ \end{align*}\]where
\[\begin{align*} w(\mathbf{V}_k) &= \text{softmax} \left( S(\mathbf{V}_k) + \lambda \sum_{\tau = 0}^{T-1} (\hat{\mathbf{u}}_{\tau} - \tilde{\mathbf{u}}_{\tau})^{\top} \Sigma_{\tau}^{-1} \mathbf{v}_{\tau} \right) \\ \end{align*}\]- $k$th state sequence cost
- $k$th sample cost
std::pair<ControlSequence, double> SVGMPPI::solve(
const State& initial_state
)
{
// Transport guide particles by sten variational gradient descent
// cost batch
std::vector<double> svg_cost_batch_history;
std::vector<ControlSequence> svg_control_sequence_batch_history;
for (size_t i = 0; i < SVGD_ITERATION_NUMBER; i++) {
// Transport samples by stein variational gradient descent
const ControlSequenceBatch gradient_log_posterior_batch = approximate_gradient_log_posterior_batch(
*guide_smapling_pointer_,
initial_state
);
#pragma omp parallel for num_threads(THREAD_NUMBER)
for (size_t i = 0; i < guide_smapling_pointer_->get_sample_number(); i++) {
// svgd update
guide_smapling_pointer_->noised_control_sequence_batch_[i] += SVGD_STEP_SIZE * gradient_log_posterior_batch[i];
}
// store costs and samples for adaptive covariance calculation
const std::vector<double> state_cost_batch = calculate_state_sequence_cost_batch(
*guide_smapling_pointer_,
initial_state,
local_cost_map_
).first;
svg_cost_batch_history.insert(
svg_cost_batch_history.end(),
state_cost_batch.begin(),
state_cost_batch.end()
);
svg_control_sequence_batch_history.insert(
svg_control_sequence_batch_history.end(),
guide_smapling_pointer_->noised_control_sequence_batch_.begin(),
guide_smapling_pointer_->noised_control_sequence_batch_.end()
);
}
const auto guide_state_cost_batch = calculate_state_sequence_cost_batch(
*guide_smapling_pointer_,
initial_state,
local_cost_map_
).first;
const size_t min_guide_state_cost_index = std::distance(
guide_state_cost_batch.begin(),
std::min_element(
guide_state_cost_batch.begin(),
guide_state_cost_batch.end()
)
);
const ControlSequence svg_control_sequence = guide_smapling_pointer_->noised_control_sequence_batch_[
min_guide_state_cost_index
]; // Stein Variational Guided
// ----------------------------------------------------------------------------------------------------
// covariance matrix for random_sampling from prior distribution
ControlCovarianceMatrixSequence adaptive_control_covariances_matrix_sequence = std::vector<Eigen::MatrixXd, Eigen::aligned_allocator<Eigen::MatrixXd>>(
PREDICTION_HORIZON - 1, Eigen::MatrixXd::Zero(CONTROL_SPACE::dim, CONTROL_SPACE::dim)
);
for (auto& covariance : adaptive_control_covariances_matrix_sequence) {
for (size_t i = 0; i < CONTROL_SPACE::dim; i++) {
covariance(i, i) = STERRING_CONTROL_COVARIANCE[i];
}
}
if (IS_COVARIANCE_ADAPTATION) {
// calculate softmax costs
const std::vector<double> softmax_costs = softmax(
svg_cost_batch_history,
GAUSSIAN_FITTING_LAMBDA,
THREAD_NUMBER
);
// calculate covariance using gaussian fitting
for (size_t i = 0; i < PREDICTION_HORIZON - 1; i++) {
// x-coordinate vector for gaussian_fitting
std::vector<double> steer_samples(svg_control_sequence_batch_history.size());
// y-coordinate vector for gaussian_fitting
std::vector<double> q_star(softmax_costs.size());
for (size_t j = 0; j < steer_samples.size(); j++) {
steer_samples[j] = svg_control_sequence_batch_history[j](i, 0);
q_star[j] = softmax_costs[j];
}
// Gaussian fitting
const double sigma = gaussian_fitting(
steer_samples,
q_star
).second;
// clamping
const double sigma_clamped = std::clamp(
sigma,
MIN_STEERING_COVARIANCE,
MAX_STEERING_COVARIANCE
);
adaptive_control_covariances_matrix_sequence[i] = Eigen::MatrixXd::Identity(CONTROL_SPACE::dim, CONTROL_SPACE::dim) * sigma_clamped;
}
}
// important from now
// random sampling from prior distribution
prior_smapling_pointer_->random_sampling(previous_control_sequence_, adaptive_control_covariances_matrix_sequence);
// Rollout samples and calculate costs
auto [state_sequence_cost_batch, collision_cost_batch] = calculate_state_sequence_cost_batch(
*prior_smapling_pointer_,
initial_state,
local_cost_map_
);
// calculate weights
if (IS_SVG) {
// with nominal sequence by SVG
nominal_control_sequence_ = svg_control_sequence;
} else {
// without nominal sequence
nominal_control_sequence_ = Eigen::MatrixXd::Zero(
PREDICTION_HORIZON - 1, CONTROL_SPACE::dim
);
}
const std::vector<double> weight_batch = softmax(
calculate_sample_cost_batch(
*prior_smapling_pointer_,
LAMBDA,
ALPHA,
state_sequence_cost_batch,
nominal_control_sequence_
),
LAMBDA,
THREAD_NUMBER
);
weight_batch_ = weight_batch;
// update next control sequence by expection (weighted sum)
ControlSequence updated_control_sequence = Eigen::MatrixXd::Zero(PREDICTION_HORIZON - 1, CONTROL_SPACE::dim);
for (size_t i = 0; i < prior_smapling_pointer_->get_sample_number(); i++) {
updated_control_sequence += weight_batch[i] * prior_smapling_pointer_->noised_control_sequence_batch_[i];
}
// for collision rate
const int collision_number = std::count_if(
collision_cost_batch.begin(),
collision_cost_batch.end(),
[](const double& cost) { return cost > 0.0; }
);
const double collision_rate = static_cast<double>(collision_number) / static_cast<double>(prior_smapling_pointer_->get_sample_number());
// for warm start (biased-sampling)
previous_control_sequence_ = updated_control_sequence;
return std::make_pair(updated_control_sequence, collision_rate);
}
2. random_sampling
Non-biased Sampling
\[u \sim N(0, \sigma^2)\]Biased Sampling
\[u \sim N(u^*, \sigma^2)\]여기서 $u^*$는 이전에 계산했던 최적의 control input이다.
control sequence에 대한 샘플 noised_control_sequence_batch_ 를 생성한다.
- 평균이 0이고 표준편차가 control_covariance_sequence의 표준편차인 정규분포를 설정한다.
- 설정된 정규분포에서 샘플링을 하고, 이를 noise_sequence_batch에 저장한다.
- 일정 비율을 가지고 편향 샘플링과 비편향 샘플링을 하여 control sequence를 샘플링한다. 이를 noised_control_sequence_batch_에 저장한다.
- control input constraint를 고려하여 샘플링된 제어 신호를 clampling한다.
void Sampling::random_sampling(
const ControlSequence reference_control_sequence,
const ControlCovarianceMatrixSequence reference_control_covariance_matrix_sequence
)
{
set_control_mean_sequence(reference_control_sequence);
set_control_covariance_matrix_sequence(reference_control_covariance_matrix_sequence);
// Set normal distributions parameters and pointer
for (size_t i = 0; i < PREDICTION_HORIZON - 1; i++) {
for (size_t j = 0; j < CONTROL_SPACE::dim; j++) {
// standard deviation of control covariance sequence
const double standard_deviation = std::sqrt(control_covariance_matrix_sequence_[i](j, j));
// normal distribution parameter in which expectation is 0 and standard deviation is from control covariance sequence
std::normal_distribution<>::param_type normal_distribution_parameter(0.0, standard_deviation);
// set noraml distribution pointer
(*normal_distribution_pointer_)[i][j].param(normal_distribution_parameter);
}
}
#pragma omp parallel for num_threads(THREAD_NUMBER)
for (size_t i = 0; i < SAMPLE_NUMBER; i++) {
// generate noise sequence using upper normal distribution
for (size_t j = 0; j < PREDICTION_HORIZON - 1; j++) {
for (size_t k = 0; k < CONTROL_SPACE::dim; k++) {
noise_sequence_batch_[i](j, k) = (*normal_distribution_pointer_)[j][k](random_number_generators_[omp_get_thread_num()]);
}
}
// sampling control sequence with non-biased (around zero) sampling rate
if (i < static_cast<size_t>((1 - NON_BIASED_SAMPLING_RATE) * SAMPLE_NUMBER)) {
// biased sampling (around control_mean_sequence)
noised_control_sequence_batch_[i] = control_mean_sequence_ + noise_sequence_batch_[i];
} else {
// non-biased sampling (around zero)
noised_control_sequence_batch_[i] = noise_sequence_batch_[i];
}
// clip input with control input constraints
for (size_t j = 0; j < CONTROL_SPACE::dim; j++) {
for (size_t k = 0; k < PREDICTION_HORIZON - 1; k++) {
noised_control_sequence_batch_[i](k, j) = std::clamp(
noised_control_sequence_batch_[i](k, j),
min_control_[j],
max_control_[j]
);
}
}
}
}
- biased sampling: noise를 입력으로 받은 평균에 더하여 샘플을 생성한다. 이미 알고 있는 정보에 기반하여 샘플링한다.
- 순수한 noise만으로 샘플을 생성한다. 샘플들이 0을 중심으로 분포하게 만든다. 이 방식은 탐색에 유리하다.
3. predict_state_sequence
kinematic bicycle model을 이용하여 초기 상태와 $k$th control_sequence를 바탕으로 Sample의 $k$th state sequence를 예측한다.
입력
\[\mathbf{x}_0, \;\;\;\mathbf{u}_0, \mathbf{u}_1, \dots, \mathbf{u}_{T - 1}\]출력
\[\mathbf{x}_0, \mathbf{x}_1, \dots, \mathbf{x}_{T - 1}\]Kinematic bicycle model
\[\begin{align*} \dot{x} &= v \cos(\psi + \beta) \\ \dot{y} &= v \sin(\psi + \beta) \\ \dot{\psi} &= \dfrac{v}{l_r + l_f} \cos\beta \tan\delta \\ \beta &= \arctan{\left(\dfrac{l_r}{l_r +l_f} \tan\delta \right)} \\ \end{align*}\]StateSequence SVGMPPI::predict_state_sequence(
const State& initial_state,
const ControlSequence& control_sequence
) const
{
// initialize state trajectory
StateSequence predicted_state_sequence = Eigen::MatrixXd::Zero(PREDICTION_HORIZON, STATE_SPACE::dim);
// set current state to state trajectory as initial state
predicted_state_sequence.row(0) = initial_state;
for (size_t i = 0; i < PREDICTION_HORIZON - 1; i++) {
double steering = control_sequence(i, CONTROL_SPACE::steering);
const double predicted_x = predicted_state_sequence(i, STATE_SPACE::x);
const double predicted_y = predicted_state_sequence(i, STATE_SPACE::y);;
const double predicted_yaw = predicted_state_sequence(i, STATE_SPACE::yaw);
const double predicted_velocity = predicted_state_sequence(i, STATE_SPACE::velocity);
const double predicted_steering = predicted_state_sequence(i, STATE_SPACE::steering);
// predict delta state using kinematic bicycle model
const double sideslip = atan(L_R / (L_F + L_R) * tan(steering));
const double predicted_delta_x = predicted_velocity * cos(predicted_yaw + sideslip) * PREDICTION_INTERVAL;
const double predicted_delta_y = predicted_velocity * sin(predicted_yaw + sideslip) * PREDICTION_INTERVAL;
// const double predicted_delta_yaw = predicted_velocity * sin(sideslip) / L_R * PREDICTION_INTERVAL;
const double predicted_delta_yaw = predicted_velocity / (L_F + L_R) * cos(sideslip) * tan(predicted_yaw) * PREDICTION_INTERVAL; // check
const double predicted_delta_steering = (steering - predicted_steering) * PREDICTION_INTERVAL; // steer 1st order delay // check
double next_velocity = 0.0;
if (1) {
next_velocity = predict_constant_speed(predicted_velocity);
}
// update next state
predicted_state_sequence(i + 1, STATE_SPACE::x) = predicted_x + predicted_delta_x;
predicted_state_sequence(i + 1, STATE_SPACE::y) = predicted_y + predicted_delta_y;
predicted_state_sequence(i + 1, STATE_SPACE::yaw) = std::atan2(sin(predicted_yaw + predicted_delta_yaw), cos(predicted_yaw + predicted_delta_yaw));
predicted_state_sequence(i + 1, STATE_SPACE::velocity) = next_velocity;
predicted_state_sequence(i + 1, STATE_SPACE::steering) = predicted_steering + predicted_delta_steering;
}
return predicted_state_sequence;
}
predict_constant_speed
4. calculate_state_sequence_cost
$k$th state sequence cost $S(\mathbf{V}_k)$을 계산한다.
\[S(\mathbf{V}_k) = \phi(\mathbf{x}_T) + \sum_{\tau = 0}^{T-1} c(\mathbf{x}_{\tau})\]std::pair<double, double> SVGMPPI::calculate_state_sequence_cost(
const StateSequence state_sequence,
const grid_map::GridMap& local_cost_map
) const
{
// initialize total cost of summing stage cost and terminal cost
double state_squence_cost_sum = 0.0;
// stage cost
for (size_t i = 0; i < PREDICTION_HORIZON - 1; i++) {
double state_squence_stage_cost = 10.0;
State stage_state = state_sequence.row(i);
if (local_cost_map.isInside(grid_map::Position(stage_state(STATE_SPACE::x), stage_state(STATE_SPACE::y)))) {
state_squence_stage_cost = local_cost_map.atPosition(
"collision_layer", grid_map::Position(stage_state(STATE_SPACE::x), stage_state(STATE_SPACE::y))
);
}
state_squence_cost_sum += COLLISION_WEIGHT * state_squence_stage_cost;
}
// terminal cost
const State terminal_state = state_sequence.row(PREDICTION_HORIZON - 1);
double state_sequence_terminal_cost = 10.0;
if (local_cost_map.isInside(grid_map::Position(terminal_state(STATE_SPACE::x), terminal_state(STATE_SPACE::y)))) {
state_sequence_terminal_cost = local_cost_map.atPosition(
"collision_layer", grid_map::Position(terminal_state(STATE_SPACE::x), terminal_state(STATE_SPACE::y))
);
}
state_squence_cost_sum += COLLISION_WEIGHT * state_sequence_terminal_cost;
return std::make_pair(state_squence_cost_sum, state_squence_cost_sum);
}
std::pair<std::vector<double>, std::vector<double>> SVGMPPI::calculate_state_sequence_cost_batch(
const Sampling& sampling_pointer,
const State& initial_state,
const grid_map::GridMap& local_cost_map
)
{
// declare variables to return
std::vector<double> state_sequence_cost_batch(sampling_pointer.get_sample_number());
std::vector<double> collision_cost_batch(sampling_pointer.get_sample_number());
#pragma omp parallel for num_threads(THREAD_NUMBER)
for (size_t i = 0; i < sampling_pointer.get_sample_number(); i++) {
// rollout state sequence for each control sequence
state_sequence_batch_.at(i) = predict_state_sequence(
initial_state,
sampling_pointer.noised_control_sequence_batch_[i]
);
// calculate state sequence cost
const auto [state_sequence_cost, collision_cost] = calculate_state_sequence_cost(
state_sequence_batch_.at(i),
local_cost_map
);
state_sequence_cost_batch.at(i) = state_sequence_cost;
collision_cost_batch.at(i) = collision_cost;
}
return std::make_pair(state_sequence_cost_batch, collision_cost_batch);
}
5. calculate_state_sequence_cost_batch
\[\left[ S(\mathbf{V}_1), S(\mathbf{V}_2), \dots, S(\mathbf{V}_{T-1}) \right]\]를 계산한다.
입력
- initial_state
- local_cost_map
- state_sequence_batch : 멤버 변수가 들어가며, 함수 내부에서 그 값이 변경된다. 계산하는 과정에서 사용하는 것이 아닌 변경할 멤버변수가 들어간다.
사용하는 멤버 변수 : noised_control_sequence_batch_
std::pair<std::vector<double>, std::vector<double>> SVGMPPI::calculate_state_cost_batch(
const State& initial_state,
const grid_map::GridMap& local_cost_map,
StateSequenceBatch* state_sequence_batch
) const
{
std::vector<double> total_cost_batch(sample_number_);
std::vector<double> collision_cost_batch(sample_number_);
#pragma omp parallel for num_threads(thread_number_)
for (size_t i = 0; i < sample_number_; i++) {
// predict state sequence
state_sequence_batch->at(i) = predict_state_sequence(
initial_state,
noised_control_sequence_batch_[i]
);
// calculate state sequence cost
const auto [total_cost, collision_cost] = calculate_state_sequence_cost(
state_sequence_batch->at(i),
local_cost_map
);
total_cost_batch.at(i) = total_cost;
collision_cost_batch.at(i) = collision_cost;
}
return std::make_pair(total_cost_batch, collision_cost_batch);
}
6. calculate_sample_cost_batch
모든 sample cost를 계산한다. 이는 각 state sequence에 해당하는 최종 cost이다.
\[S(\mathbf{V}_k) + \lambda \sum_{\tau = 0}^{T-1} (\hat{\mathbf{u}}_{\tau} - \tilde{\mathbf{u}}_{\tau})^{\top} \Sigma_{\tau}^{-1} \mathbf{v}_{\tau}\]where
- initial estimated control input sequence, often using the previous optimal solution
- given nominal control input sequence
- randomly generated control input sequences
std::vector<double> SVGMPPI::calculate_sample_cost_batch(
const Sampling& sampling_pointer,
const double& lambda,
const double& alpha,
const std::vector<double> state_cost_batch,
const ControlSequence nominal_control_sequence
) const
{
// declare and initialize sample cost batch as state cost batch
std::vector<double> sample_cost_batch = state_cost_batch;
// add control cost batch to sample cost batch
#pragma omp parallel for num_threads(THREAD_NUMBER)
for (size_t i = 0; i < sampling_pointer.get_sample_number(); i++) {
for (size_t j = 0; j < PREDICTION_HORIZON - 1; j++) {
const double control_cost = \
lambda * (1 - alpha) \
* (sampling_pointer.control_mean_sequence_.row(j) - nominal_control_sequence.row(j)) \
* sampling_pointer.control_inverse_covariance_matrix_sequence_[j] \
* sampling_pointer.noised_control_sequence_batch_[i].row(j).transpose();
sample_cost_batch[i] += control_cost;
}
}
return sample_cost_batch;
}
7. approximate_gradient_log_posterior_batch
ControlSequenceBatch SVGMPPI::approximate_gradient_log_posterior_batch(
const Sampling& sampling_pointer,
const State& initial_state
)
{
// declare and initialize gradient_log_posterior_batch
ControlSequenceBatch gradient_log_posterior_batch = std::vector<Eigen::MatrixXd, Eigen::aligned_allocator<Eigen::MatrixXd>>(
sampling_pointer.get_sample_number(), Eigen::MatrixXd::Zero(PREDICTION_HORIZON, STATE_SPACE::dim)
);
#pragma omp parallel for num_threads(THREAD_NUMBER)
for (size_t i = 0; i < sampling_pointer.get_sample_number(); i++) {
const ControlSequence gradient_log_likelihood = approximate_gradient_log_likelihood(
svg_sampling_pointer_.at(i).get(),
initial_state,
sampling_pointer.noised_control_sequence_batch_[i],
sampling_pointer.control_inverse_covariance_matrix_sequence_
);
gradient_log_posterior_batch[i] = gradient_log_likelihood;
}
return gradient_log_posterior_batch;
}
approximate_gradient_log_likelihood
\[\phi = \dfrac{\sum_{i=0}^{N-1} w(\mathbf{V}_k^g [i]) \Sigma_g^{-1} (\mathbf{V}_k^g[i] - \mathbf{V}_k^g)}{\sum_{i=0}^{N-1}w(\mathbf{V}_k^g [i])}\]ControlSequence SVGMPPI::approximate_gradient_log_likelihood(
Sampling* sampling_pointer,
const State& initial_state,
const ControlSequence& reference_control_sequence,
const ControlCovarianceMatrixSequence& control_inverse_covariance_matrix_sequence
)
{
// declate and initialize reference_control_covariance_matrix_sequence to be used random_sampling
ControlCovarianceMatrixSequence reference_control_covariance_matrix_sequence = std::vector<Eigen::MatrixXd, Eigen::aligned_allocator<Eigen::MatrixXd>>(
PREDICTION_HORIZON - 1, Eigen::MatrixXd::Zero(CONTROL_SPACE::dim, CONTROL_SPACE::dim)
);
for (auto& control_covariance : reference_control_covariance_matrix_sequence) {
for (size_t i = 0; i < CONTROL_SPACE::dim; i++) {
control_covariance(i, i) = steering_control_covariance_for_gradient_estimation_[i];
}
}
// Generate control sequence samples and update member variable noised_control_sequence_batch_
sampling_pointer->random_sampling(reference_control_sequence, reference_control_covariance_matrix_sequence);
// calculate state cost for each state sequence
auto state_cost_batch = calculate_state_sequence_cost_batch(
*sampling_pointer,
initial_state,
local_cost_map_
).first;
// declare variable for summing sample weight in denumerator
std::vector<double> sample_weight_batch(sampling_pointer->get_sample_number());
// declare and initialize variable for numerator term
ControlSequence numerator = Eigen::MatrixXd::Zero(
PREDICTION_HORIZON - 1, CONTROL_SPACE::dim
);
#pragma omp parallel for num_threads(THREAD_NUMBER)
for (size_t i = 0; i < sampling_pointer->get_sample_number(); i++) {
// declare variable for calculate sample cost (state cost + control cost) and initialize as state cost
double sample_cost = state_cost_batch[i];
// calculate sample cost
for (size_t j = 0; j < PREDICTION_HORIZON - 1; j++) {
const double control_cost = SVG_LAMBDA \
* (previous_control_sequence_.row(j) - sampling_pointer->noised_control_sequence_batch_[i].row(j)) \
* control_inverse_covariance_matrix_sequence[j] \
* (previous_control_sequence_.row(j) - sampling_pointer->noised_control_sequence_batch_[i].row(j)).transpose();
sample_cost += control_cost;
}
// calculate sample weight for each sample
const double sample_weight = std::exp(-sample_cost / SVG_LAMBDA);
// for denumerator
sample_weight_batch[i] = sample_weight;
// declare and initialize
ControlSequence gradient_log_gaussian(reference_control_sequence.rows(), reference_control_sequence.cols());
gradient_log_gaussian.setZero();
for (size_t j = 0; j < PREDICTION_HORIZON - 1; j++) {
gradient_log_gaussian.row(j) = sample_weight \
* sampling_pointer->control_inverse_covariance_matrix_sequence_[j] \
* (sampling_pointer->noised_control_sequence_batch_[i] - reference_control_sequence).row(j).transpose();
}
numerator += gradient_log_gaussian;
}
// denumberator term
const double denumerator = std::accumulate(sample_weight_batch.begin(), sample_weight_batch.end(), 0.0);
return numerator / (denumerator + 1e-10);
}
softmax
가장 basic한 softmax
\[\text{softmax}(c_i) = \dfrac{e^{c_i}}{\sum_{i=1}^n e^{c_i}}\]값이 작을수록 높은 확률을 부여하고 싶을 때 음수를 활용한다.
\[\text{softmax}(c_i) = \dfrac{e^{-c_i/\lambda}}{\sum_{i=1}^n e^{-c_i/\lambda}}\]수치적 안정성을 위해 우리는 다음과 같은 수식을 사용한다.
\[c_{\text{min}} = \min [c_1, c_2, \cdots, c_n]\] \[Z = \sum_{i=1}^n \exp {\left(-\dfrac{c_i - c_{\min}}{\lambda}\right)} + 10^{-10}\] \[\text{softmax} (c_i) = \dfrac{\exp {\left(-\dfrac{c_i - c_{\min}}{\lambda}\right)}}{Z}\]std::vector<double> softmax(
const std::vector<double>& costs,
const double& lambda,
const int thread_number
) const
{
// minimum element of costs
const double min_cost_ = *std::min_element(costs.begin(), costs.end());
// calculate normalizing constant
double normalizing_constant_ = 1e-10;
for (const auto& cost : costs) {
normalizing_constant_ += std::exp(- (cost - min_cost_) / lambda);
}
// calculate softmax of costs
std::vector<double> softmax_costs_(costs.size());
#pragma omp parallel for num_threads(thread_number)
for (size_t i = 0; i < costs.size(); i++) {
softmax_costs_[i] = std::exp(-(costs[i] - min_cost_) / lambda) / normalizing_constant_;
}
return softmax_costs_;
}