Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace buggy number conversions with tryfrom #736

Closed
wants to merge 1 commit into from

Conversation

latot
Copy link

@latot latot commented Apr 12, 2024

This is a first part of replace buggy number conversions, it covers most of cases where we transform between numerical types and the conversion can fails.

There is still some cases like:

extendr-api/src/scalar/rint.rs:189:26

let result = v as i32;

But will not be included here.

Copy link
Member

@Ilia-Kosenkov Ilia-Kosenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall good work, but there are two main issues here:

  • Some conversions are inherently safe and should not be wrapped into TryFrom
  • Other conversions repeat throughout the codebase. This increases cognitive load, you should either create simple helpers or trivial traits that can then be used. Basically, abstract away TInto::try_into::<TFrom>(&TFrom).unwrap() or something like this/

@@ -64,9 +64,10 @@ fn str_from_strsxp<'a>(sexp: SEXP, index: isize) -> &'a str {
let charsxp = STRING_ELT(sexp, index);
if charsxp == R_NaString {
<&str>::na()
} else if TYPEOF(charsxp) == CHARSXP as i32 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say this one is excessive, here we compare type ids ('discriminators' if you will). This is a predefined table of small non-negative integers (e.g., check this and this). Here formally you need this cast because libR-sys stores it as u32.

let ptr = R_CHAR(charsxp) as *const u8;
let slice = std::slice::from_raw_parts(ptr, Rf_xlength(charsxp) as usize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is interesting. R_xlen_t is isize, but we need usize here. If we trust R that lengths are never negative, then all valid isize values fit in usize. But ok, let's use safe conversion here.

@@ -87,14 +88,16 @@ impl Iterator for StrIter {
let i = self.i;
self.i += 1;
let vector = self.vector.get();
let vector_u32: u32 = TYPEOF(vector).try_into().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same logic here, TYPEOF returns c_int from a known list. I'd leave it as TYPEOF(vector) as u32. But! Good catch for deduplicating code, let's just opt for a better name: vector_u32 -> typeof_vector

} else if TYPEOF(vector) as u32 == NILSXP {
} else if vector_u32 == STRSXP {
Some(str_from_strsxp(vector, isize::try_from(i).unwrap()))
} else if vector_u32 == INTSXP && u32::try_from(TYPEOF(self.levels)).unwrap() == STRSXP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. I suggest we introduce a helper function that promotes TYPEOF() from c_int to u32, matching *SXP type.

Some(str_from_strsxp(vector, isize::try_from(i).unwrap()))
} else if vector_u32 == INTSXP && u32::try_from(TYPEOF(self.levels)).unwrap() == STRSXP
{
let j: isize = (*(INTEGER(vector).add(i))).try_into().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't find reference to INTEGER definition in libR-sys, but all integers in R are i32, so i32 -> isize is always safe.

}

/// Convert R's SEXPTYPE to extendr's Rtype.
pub fn sxp_to_rtype(sxptype: i32) -> Rtype {
use Rtype::*;
match sxptype as u32 {
match u32::try_from(sxptype).unwrap() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also (but now I question why we have here i32 as input).

let preservation = Rf_allocVector(VECSXP, INITIAL_PRESERVATION_SIZE as R_xlen_t);
let preservation = Rf_allocVector(
VECSXP,
R_xlen_t::try_from(INITIAL_PRESERVATION_SIZE).unwrap(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also safe, since it is a compile-time constant.

@@ -157,7 +168,7 @@ impl Ownership {
unsafe fn garbage_collect(&mut self) {
// println!("garbage_collect {} {}", self.cur_index, self.max_index);
let new_size = self.cur_index * 2 + EXTRA_PRESERVATION_SIZE;
let new_sexp = Rf_allocVector(VECSXP, new_size as R_xlen_t);
let new_sexp = Rf_allocVector(VECSXP, R_xlen_t::try_from(new_size).unwrap());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So many calls to R_xlen_t::try_from().unwrap(). Introduce a helper method that does this.

@@ -155,9 +155,12 @@ fn manifest(x: SEXP) -> SEXP {
single_threaded(|| unsafe {
Rf_protect(x);
let len = XLENGTH_EX(x);
let data2 = Rf_allocVector(TYPEOF(x) as u32, len as R_xlen_t);
let data2 = Rf_allocVector(
u32::try_from(TYPEOF(x)).unwrap(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is safe

Rf_protect(data2);
match TYPEOF(x) as u32 {
match u32::try_from(TYPEOF(x)).unwrap() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this, see comments above

@CGMossa
Copy link
Member

CGMossa commented Apr 20, 2024

Hello again!

Unfortunately, I'm still not in favour of the changes proposed here.
The reason for this, is that the conversations currently present should be eliminated elsewhere, or through a refactor.
Take for instance the TYPEOF-conversions. I've submitted PRs in libR-sys-repo that deals with these in
extendr/libR-sys#210 and extendr/libR-sys#211. These may be reviewed and merged now, which will remove these as _ conversations.

Similarly, the spot where you've got things like INTEGER() would be removed in a refactor soon. There is no need
to make these conversions more "safe" for multiple reasons: This code will be removed in a larger refactor that we've been battling with (nicknamed use-try-from as default). They add noise, without propagating the error elsewhere in the API. When we are converting things that come from FFI, I'm not sure the conversions can be made "safer" than as anyways. as-conversions are C-like conversions to begin with.

@latot
Copy link
Author

latot commented Apr 24, 2024

@CGMossa I mostly agree with you, this PR is far from the ideal fix, a lot things should be fixed by a refactor, but this is the point, to solve by refactor you need know a lot better the program, also takes time, so if the refactor can't be done in the short term, this PR would be good enough. As you says, how there is already PRs that could solve this, and in a better way, that is great! I would like to know it before before spent my time in this particular case and could be do other thing.

The as case is pretty simple, should only/mainly be used if the conversion is done right, like from u8 to u16, but in this case, the PR only focus in conversions that can not keep the numbers without truncation, if you want to truncate that numbers then is fine, if not then as is not the way, even en C/C++ could be a problem. Use the TryFrom solve this.

Maybe just wait until the PRs you says, and then check again this, to see if there is some unexpected truncation left.

@CGMossa
Copy link
Member

CGMossa commented Apr 27, 2024

Dear @latot

Again, I thank you very much for your contribution and attention with the development of extendr.
One of the larger refactoring has just landed in #742. It has taken some effort, but it is finally there.
I would appreciate it if you went there, and gave it a review. You have already went through extendr in many of the same places, and it would be good to have as many attentive eyes on such a large PR as possible.

I'd also like to encourage participating in reviewing PRs, and replying to issues, as we have very few conversations going in-depth into what are all the aspects that need attention.

While a community review doesn't yield approval on a PR, it shows approval of the community, which in many cases are much, much more valuable.

I'll make an effort to land a few more refactoring very soon.

@CGMossa
Copy link
Member

CGMossa commented May 9, 2024

Dear @latot
Today, #742 just landed, which addresses quite many of these conversions directly. It took a lot of effort but it is in. I would like to close this PR as it has been superseded.

If I have overlooked a conversion, or similar, there are more things you'd like to contribute to extendr,
I'd very much encourage you to open a new PR.

Godspeed!

@CGMossa CGMossa closed this May 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants