Support softmax for row_vector, matrix types + code simplifications #3313

jachymb wants to merge 7 commits into stan-dev:develop
Conversation
I think it's some server error, not an actual test failure.
EDIT: Fixed
@SteveBronder may I ask for a rerun, please?
Jenkins Console Log — machine information: Ubuntu 20.04.3 LTS (focal).
@jachymb Did you use an AI to write this? We have not adopted an AI policy yet, but it's nice to let us know. If you wrote this, I'm happy to help you fix it up. There are a number of odds and ends in this PR which will take some cleaning up, and if it's an AI then I'll most likely just close it.
@SteveBronder Yeah, I used Claude Code, but I reviewed every line of code until I felt it made good sense. I'd say my C++ knowledge is only intermediate; I mostly work with other languages. If you point me to topics where you think it should be approached differently, or some resources I should go through first, I'd be happy to learn about that.
Using AI is totally reasonable. One big issue with this is that it assumes matrices want to have their rows summed. We do not normally assume whether the user wants rows / columns summed for a given matrix expression. We would want
```cpp
const auto shifted
    = (val.array().colwise() - val.rowwise().maxCoeff().array()).eval();
const auto exp_s = shifted.exp().eval();
const auto row_sums = exp_s.rowwise().sum().eval();
const auto lsm_val = (shifted.colwise() - row_sums.log()).matrix().eval();
```
This is used only once, so it can just be written inline on line 42.
Also I would check out how we handle rows / cols. For instance:

```cpp
template <typename T, require_rev_matrix_t<T>* = nullptr>
inline auto log_softmax(T&& x) {
  check_nonzero_size("log_softmax", "x", x);
  auto x_arena = to_arena(std::forward<T>(x));
  using return_t
      = return_var_matrix_t<decltype(log_softmax(x_arena.val())), T>;
  arena_t<return_t> res = log_softmax(x_arena.val());
  reverse_pass_callback([x_arena, res]() mutable {
    if constexpr (is_eigen_matrix_dynamic<decltype(x_arena.val())>::value) {
      // matrix case: accumulate adjoints row-wise
    } else {
      // vector / row_vector case
    }
  });
  return res;
}

template <typename T, require_std_vector_st<is_var, T>* = nullptr>
inline auto log_softmax(T&& x) {
  return apply_vector_unary<T>::apply(std::forward<T>(x), [](auto&& alpha) {
    return log_softmax(std::forward<decltype(alpha)>(alpha));
  });
}
```
The first one works for all reverse-mode matrix types (anything accepted by `require_rev_matrix_t`). The second one is just boilerplate so we can call the first function over arrays of vectors.
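As a side note on what the empty `reverse_pass_callback` branches would compute: for the vector case, the standard log-softmax adjoint update follows from its Jacobian (this is a sketch of the math, not necessarily Stan Math's exact code). With $y = \operatorname{log\_softmax}(x)$,

```latex
y_i = x_i - \log \sum_k \exp(x_k),
\qquad
\frac{\partial y_i}{\partial x_j} = \delta_{ij} - \operatorname{softmax}(x)_j,
```

so the reverse pass accumulates

```latex
x_j^{\mathrm{adj}} \mathrel{+}= y_j^{\mathrm{adj}}
  - \exp\!\left(y_j^{\mathrm{val}}\right) \sum_i y_i^{\mathrm{adj}},
```

using that $\operatorname{softmax}(x) = \exp(y^{\mathrm{val}})$. In the row-wise matrix case the same update would apply to each row independently.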
Summary

This refactors the `softmax` and `log_softmax` functions in the following ways:

- `vector` and `row_vector` are now supported instead of just `vector`
- the `matrix` type is also supported, and the softmax is applied row-wise (i.e. the return value is effectively a `row_stochastic_matrix` for modelling purposes)

Tests
Side Effects
None to my knowledge.
Release notes

`softmax` and `log_softmax` now support `row_vector` and `matrix` (applied row-wise)

Checklist
Copyright holder: Me, jachymb@gmail.com
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
- the basic tests are passing
  - unit tests (`./runTests.py test/unit`)
  - header checks (`make test-headers`)
  - dependencies checks (`make test-math-dependencies`)
  - docs (`make doxygen`)
  - code style (`make cpplint`)
- the code is written in idiomatic C++ and changes are documented in the doxygen
- the new changes are tested