Adding Multi-Task ElasticNet support #194

Draft
PABannier wants to merge 29 commits into master

Conversation

@PABannier (Contributor) commented Jan 19, 2022:

The goal of this PR is to add multi-task ElasticNet to the elasticnet crate.

A quick roadmap:

  • Write block coordinate descent
  • Write dual gap for multi-task
  • Write tests for BCD
  • Make CD variable names consistent with variable names in BCD
  • Use a for loop when updating the residuals in CD
  • Z-score + confidence level + variance
  • Adapt Linfa ElasticNet API to the multi-task case
  • Write tests for ElasticNet multi-task

@PABannier marked this pull request as draft on January 19, 2022, 19:14
@PABannier (Contributor, Author):

@YuhanLiin I'm implementing the Fit trait for the multi-task ENET. However, I don't know how to handle the multi-task case with the Linfa API. I created a MultiTaskElasticNet struct, but since ElasticNet and MultiTaskElasticNet share the same set of parameters, I didn't create a MultiTaskElasticNetValidParams.

My question is: how do I restrict the trait bounds to implement the Fit trait for a multi-task dataset?

(Review thread on src/dataset/mod.rs, marked outdated and resolved.)
@YuhanLiin (Collaborator):

Is it possible to make functions like coordinate_descent, duality_gap, variance_params, and compute_intercept generic across 1D and 2D arrays? That way we won't need 2 sets of similar helper functions to handle the single-task and multi-task cases.
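
As an illustration (not code from the PR; the helper name and exact bounds here are hypothetical), a dimension-generic function over ndarray arrays could look like this:

```rust
use ndarray::{ArrayBase, Data, Dimension};
use num_traits::Float;

// Hypothetical helper: by taking the dimension as a type parameter,
// the same body serves 1D (single-task) and 2D (multi-task) arrays.
fn sum_of_squares<F, S, D>(y: &ArrayBase<S, D>) -> F
where
    F: Float + std::iter::Sum,
    S: Data<Elem = F>,
    D: Dimension,
{
    y.iter().map(|&v| v * v).sum()
}
```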

@PABannier (Contributor, Author):

@YuhanLiin Thanks for all the remarks! I addressed your comments in the latest commit.

As for making coordinate_descent generic across 1D and 2D arrays, I don't think it is a good idea. coordinate_descent and block_coordinate_descent rely on two different proximal operators, and block_coordinate_descent contains additional for loops specifically designed for the multi-task case. IMHO, a wiser choice would be to keep coordinate_descent and block_coordinate_descent separate and use them as backbones for different penalties in the single-task and multi-task cases. My intuition is that in a future PR we could make these two functions generic over the regularization used in a model (for now only L1 + L2 is supported), since there are other very useful but more complex penalties as well (non-convex ones, for instance; see SCAD or MCP).

duality_gap is also very specific to the single-task and multi-task cases, and introducing a match would make things more confusing IMHO.

For variance_params and compute_intercept, I can certainly make them generic over 1D and 2D arrays.
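
For context, here is a minimal sketch (standard textbook forms, not the PR's actual code) of the two proximal operators mentioned above: scalar soft-thresholding, applied per coefficient in single-task CD, and block soft-thresholding, which shrinks a whole row of the coefficient matrix at once in multi-task BCD.

```rust
use ndarray::Array1;

// Prox of the L1 penalty, applied per coefficient in single-task CD.
fn soft_threshold(x: f64, threshold: f64) -> f64 {
    x.signum() * (x.abs() - threshold).max(0.0)
}

// Prox of the L2,1 penalty, applied to a whole row w_j of the coefficient
// matrix in multi-task BCD: the row is scaled down, or zeroed entirely.
fn block_soft_threshold(w_j: &Array1<f64>, threshold: f64) -> Array1<f64> {
    let norm = w_j.dot(w_j).sqrt();
    if norm <= threshold {
        Array1::zeros(w_j.len())
    } else {
        w_j * (1.0 - threshold / norm)
    }
}
```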

Comment on lines 368 to 372
```rust
for i in 0..x.shape()[0] {
    for t in 0..n_tasks {
        r[[i, t]] += x_j[i] * old_w_j[t];
    }
}
```
@YuhanLiin (Collaborator):

Isn't this operation equivalent to r += x_j.dot(old_w_j.t())? If so you can replace these types of for loops with general_mat_mul(1, x_j, old_w_j.t(), 1, r).

@PABannier (Contributor, Author):

This operation is equivalent to np.outer in Python (see https://numpy.org/doc/stable/reference/generated/numpy.outer.html for a more detailed explanation). There is no built-in equivalent in the ndarray crate yet (see rust-ndarray/ndarray#1148).

@YuhanLiin (Collaborator):

outer(x, w) is equivalent to x as a column vector multiplied with w as a row vector. The code in the ndarray PR you linked does the same thing. Using general_mat_mul lets you add the matrix product of x and w to r in one operation. You do need to convert x and w into 2D arrays though, like in the PR.

@YuhanLiin (Collaborator):

The ideal way to convert a 1D array into 2D is insert_axis: something like x.view().insert_axis(Axis(0_or_1)).
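
Putting the two suggestions together, here is a runnable sketch of the rank-1 update (shapes assumed from the snippet above: x_j is (n,), old_w_j is (n_tasks,), r is (n, n_tasks)):

```rust
use ndarray::{array, linalg::general_mat_mul, Array2, Axis};

fn main() {
    let x_j = array![1.0, 2.0, 3.0];  // shape (n,)
    let old_w_j = array![10.0, 20.0]; // shape (n_tasks,)
    let mut r = Array2::<f64>::zeros((3, 2));

    // r += outer(x_j, old_w_j): view x_j as an (n, 1) column and
    // old_w_j as a (1, n_tasks) row, then do an in-place GEMM update.
    let col = x_j.view().insert_axis(Axis(1));
    let row = old_w_j.view().insert_axis(Axis(0));
    general_mat_mul(1.0, &col, &row, 1.0, &mut r);

    assert_eq!(r, array![[10.0, 20.0], [20.0, 40.0], [30.0, 60.0]]);
}
```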

@YuhanLiin (Collaborator):

Making variance_params and compute_intercept generic would be great.

```rust
let norm_cols_x = x.map_axis(Axis(0), |col| col.dot(&col));
let mut gap = F::one() + tol;
let d_w_tol = tol;
let tol = tol * y.fold(F::zero(), |sum, &y_ij| sum + y_ij.powi(2));
```
@YuhanLiin (Collaborator):

Instead of fold you can do y.iter().map(|x| x*x).sum(). Since iter is called before map, it won't create a new array. For 1D arrays it's even simpler, since you can just call y.dot(&y) to take the dot product of y with itself.
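
A quick illustration of both forms (plain f64 for brevity):

```rust
use ndarray::array;

fn main() {
    let y = array![1.0_f64, 2.0, 3.0];

    // Iterating first means map never allocates an intermediate array.
    let sum_sq: f64 = y.iter().map(|&v| v * v).sum();

    // For 1D arrays, a self dot product is the simplest equivalent.
    assert_eq!(sum_sq, y.dot(&y));
}
```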

@YuhanLiin (Collaborator):

You should also apply this change to all the other similar fold calls.

```rust
    gap
}

fn variance_params<F: Float + Lapack, T: AsTargets<Elem = F>, D: Data<Elem = F>, I: Dimension>(
```
@PABannier (Contributor, Author) commented Feb 20, 2022:

This AsTargets<Elem = F> bound is very complex to deal with. I can't do much with it, since I need an ArrayBase in order to call ndim() and shape() and to compute target - y_est. Do you know how I can circumvent this issue? I don't understand the need for an AsTargets trait in the first place. At the very least it should support multi-task targets.

@PABannier (Contributor, Author):

Actually it should have some way to retrieve the dimension of the targets.

@YuhanLiin (Collaborator):

Currently AsTargets has the method as_multi_targets, which returns a 2D array view, so you can call it to retrieve the targets in both cases. For the single-target case this returns an array of dimension (n, 1), which means your code needs to treat single-task and multi-task-with-only-one-task as equivalent cases. y_est will need to be a 2D array in all cases; for single-task, just insert Axis(1) into y_est.

After your other PR, this should be bounded with AsMultiTargets (now that I think about it, we need AsMultiTargets as a super-trait of AsSingleTarget for this to work).
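
A small sketch of the shape convention described above (illustrative only): the single-task prediction vector is promoted to an (n, 1) array so both cases share the same 2D code path.

```rust
use ndarray::{array, Axis};

fn main() {
    let y_est = array![0.5_f64, 1.5, 2.5]; // single-task, shape (3,)
    let y_est = y_est.insert_axis(Axis(1)); // now shape (3, 1)
    assert_eq!(y_est.shape(), &[3, 1]);
}
```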

```diff
@@ -429,7 +426,7 @@ fn duality_gap<'a, F: Float>(
     } else {
         (F::one(), r_norm2)
     };
-    let l1_norm = w.fold(F::zero(), |sum, w_i| sum + w_i.abs());
+    let l1_norm = w.map(|w_i| w_i.abs()).sum();
```
@YuhanLiin (Collaborator):

Use dot

Comment on lines 456 to 453
```diff
-let w_norm2 = w.fold(F::zero(), |sum, &wij| sum + wij.powi(2));
+let r_norm2 = r.map(|rij| rij.powi(2)).sum();
+let w_norm2 = w.map(|wij| wij.powi(2)).sum();
```
@YuhanLiin (Collaborator):

Call iter() before map to prevent creating a new array.

@PABannier (Contributor, Author):

Since #206 has been merged, ElasticNet is now easier to adapt to the multi-task case. I'm still working on it.
Roadmap before merging:

  • Make variance_params generic for single task and multi-task
  • Make compute_intercept generic for single task and multi-task
  • Write tests for MultiTaskElasticNet (Lasso + Ridge...)

@YuhanLiin (Collaborator):

Work continued in #238
