Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Enable Ruff security rules (bandit) #1686

Merged
merged 24 commits into from
Jul 17, 2024
Merged

✨ Enable Ruff security rules (bandit) #1686

merged 24 commits into from
Jul 17, 2024

Conversation

Zethson
Copy link
Member

@Zethson Zethson commented Jun 1, 2024

Enabling security rules in Ruff:

lamindb/_annotate.py:962:9: S101 Use of `assert` detected
lamindb/_annotate.py:1168:5: S101 Use of `assert` detected
lamindb/_artifact.py:971:9: S101 Use of `assert` detected
lamindb/_can_validate.py:226:9: S110 `try`-`except`-`pass` detected, consider logging the exception
lamindb/_collection.py:103:9: S101 Use of `assert` detected
lamindb/_feature.py:142:5: S101 Use of `assert` detected
lamindb/_finish.py:106:13: S602 `subprocess` call with `shell=True` identified, security issue
lamindb/_finish.py:127:13: S602 `subprocess` call with `shell=True` identified, security issue
lamindb/core/_feature_manager.py:379:5: S101 Use of `assert` detected
lamindb/core/_feature_manager.py:421:13: S101 Use of `assert` detected
lamindb/core/_feature_manager.py:432:17: S101 Use of `assert` detected
lamindb/core/_feature_manager.py:434:17: S101 Use of `assert` detected
lamindb/core/_feature_manager.py:538:9: S101 Use of `assert` detected
lamindb/core/_feature_manager.py:541:9: S101 Use of `assert` detected
lamindb/core/_feature_manager.py:569:9: S101 Use of `assert` detected
lamindb/core/_feature_manager.py:599:9: S101 Use of `assert` detected
lamindb/core/_label_manager.py:69:9: S112 `try`-`except`-`continue` detected, consider logging the exception
lamindb/core/_label_manager.py:105:13: S110 `try`-`except`-`pass` detected, consider logging the exception
lamindb/core/_mapped_collection.py:110:9: S101 Use of `assert` detected
lamindb/core/_run_context.py:47:24: S324 Probable use of insecure hash functions in `hashlib`: `md5`
lamindb/core/_run_context.py:117:16: S113 Probable use of requests call without timeout
lamindb/core/_settings.py:122:9: S101 Use of `assert` detected
lamindb/core/_sync_git.py:28:9: S602 `subprocess` call with `shell=True` identified, security issue
lamindb/core/_sync_git.py:39:9: S607 Starting a process with a partial executable path
lamindb/core/_sync_git.py:40:9: S602 `subprocess` call with `shell=True` seems safe, but may be changed in the future; consider rewriting without `shell`
lamindb/core/_sync_git.py:61:9: S602 `subprocess` call with `shell=True` identified, security issue
lamindb/core/_sync_git.py:71:9: S101 Use of `assert` detected
lamindb/core/_sync_git.py:85:13: S607 Starting a process with a partial executable path
lamindb/core/_sync_git.py:86:13: S602 `subprocess` call with `shell=True` seems safe, but may be changed in the future; consider rewriting without `shell`
lamindb/core/_sync_git.py:96:9: S602 `subprocess` call with `shell=True` identified, security issue
lamindb/core/_track_environment.py:18:66: S602 `subprocess` call with `shell=True` identified, security issue
lamindb/core/storage/_backed_access.py:210:13: S110 `try`-`except`-`pass` detected, consider logging the exception
lamindb/core/storage/_backed_access.py:308:17: S110 `try`-`except`-`pass` detected, consider logging the exception
lamindb/core/storage/paths.py:50:5: S101 Use of `assert` detected
lamindb/integrations/_vitessce.py:42:13: S101 Use of `assert` detected
tests/conftest.py:36:9: S607 Starting a process with a partial executable path
tests/conftest.py:36:51: S602 `subprocess` call with `shell=True` seems safe, but may be changed in the future; consider rewriting without `shell`
tests/test_run_context.py:107:9: S602 `subprocess` call with `shell=True` identified, security issue
Found 38 errors.

Just a show off for now! We can discuss later

@falexwolf
Copy link
Member

Great! Let's discuss this tomorrow!

Copy link

github-actions bot commented Jul 15, 2024

@github-actions github-actions bot temporarily deployed to pull request July 15, 2024 15:29 Inactive
Copy link

codecov bot commented Jul 15, 2024

Codecov Report

Attention: Patch coverage is 67.50000% with 13 lines in your changes missing coverage. Please review.

Project coverage is 91.59%. Comparing base (4f1797e) to head (92551c6).
Report is 13 commits behind head on main.

Files Patch % Lines
lamindb/core/storage/_backed_access.py 0.00% 4 Missing ⚠️
lamindb/_can_validate.py 0.00% 2 Missing ⚠️
lamindb/integrations/_vitessce.py 0.00% 2 Missing ⚠️
lamindb/_finish.py 50.00% 1 Missing ⚠️
lamindb/core/_feature_manager.py 80.00% 1 Missing ⚠️
lamindb/core/_label_manager.py 50.00% 1 Missing ⚠️
lamindb/core/_mapped_collection.py 50.00% 1 Missing ⚠️
lamindb/core/_settings.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1686      +/-   ##
==========================================
- Coverage   92.12%   91.59%   -0.53%     
==========================================
  Files          50       50              
  Lines        5496     5566      +70     
==========================================
+ Hits         5063     5098      +35     
- Misses        433      468      +35     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: zethson <[email protected]>
@falexwolf
Copy link
Member

falexwolf commented Jul 15, 2024

How does bandit compare to GitHub's CodeQL (see one of the GA jobs)?

@Zethson
Copy link
Member Author

Zethson commented Jul 15, 2024

How does bandit compare to GitHub's CodeQL (see one of the GA jobs)?

I don't really see any or a lot of overlap. CodeQL seems to be pretty superficial for Python and only checks the most obvious issues. The advantages of CodeQL are that we can also use it for other languages and can use it's query language if we ever need to but I don't see a use-case for us.

@Zethson Zethson marked this pull request as ready for review July 15, 2024 15:47
Signed-off-by: zethson <[email protected]>
@Zethson Zethson changed the title ✨ Add bandit ✨ Enable Ruff security rules (bandit) Jul 15, 2024
@Zethson Zethson requested a review from falexwolf July 15, 2024 16:04
@github-actions github-actions bot temporarily deployed to pull request July 15, 2024 16:04 Inactive
@Zethson
Copy link
Member Author

Zethson commented Jul 15, 2024

(coverage may go down because we have a couple of untested exceptions now). I can add coverage for these in a followup PR quickly tomorrow.

@falexwolf
Copy link
Member

falexwolf commented Jul 16, 2024

I don't really see any or a lot of overlap. CodeQL seems to be pretty superficial for Python and only checks the most obvious issues. The advantages of CodeQL are that we can also use it for other languages and can use it's query language if we ever need to but I don't see a use-case for us.

Thanks for clarifying! For me to understand: The overlap is so small because CodeQL doesn't cover as many security issues as bandit for Python? The nature of the tools is the same in that they're both static code analyzers that detect security issues, correct?

Copy link
Member

@falexwolf falexwolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is overall great!

The only "bad thing" concerns the few cases where exceptions now need to be triggered in tests, but it's very hard to.

I used assert in those instances because I consciously wanted to avoid having to test these exceptions.

Because assert is only used for consistency checks that are meant to provide feedback to users/devs in some edge cases, I thought it'd be OK security-wise.

If somebody attempted to bypass these assert statements, it'd be OK: the "attacker" would get a more cryptic error behavior downstream.

Hence: In those hard-to-test cases, I'd stick with the assert rather than introducing an additional line of code and an additional test with essentially no value to users & devs. I consider maintaining that additional line and test more harmful that beneficial; in particular in a phase of this project where the code overall can hopefully still much consolidated (I mean, the code works, but it needs to be refactored to become more maintainable).

In cases where there is value to the user (say the allowed path suffixes for the vitessce export), there should never be an assert.

@Zethson
Copy link
Member Author

Zethson commented Jul 16, 2024

Thanks for clarifying! For me to understand: The overlap is so small because CodeQL doesn't cover as many security issues as bandit for Python? The nature of the tools is the same in that they're both static code analyzers that detect security issues, correct?

Correct. CodeQL can also test for unreachable code and other things but we have all of that covered with other tools already. I am currently not impressed by CodeQL but enabling it doesn't hurt.

Hence: In those hard-to-test cases, I'd stick with the assert rather than introducing an additional line of code and an additional test with essentially no value to users & devs

Yeah that's fair. However, either we:

  1. Go back to asserts and tell Bandit to ignore them.
  2. Keep the exceptions and tell codecov to ignore them (if it works cough). Ignoring hard to test things with codecov is totally fine.

I vote for option 2.

@falexwolf
Copy link
Member

I vote for option 2.

I think option 1 with a noqa and a brief comment next to it would be simpler because we wouldn't need to mess with codecov and we don't have these verbose sections in the code.

I care much more about simplicity than about the exact codecov coverage number. If there is no or minimal value add through additional complexity, I'd always go with what's simpler.

In this case, I'd say there is zero value add in a few instances where assert was replaced. Introducing complexity both within the code and the coverage CI doesn't seem to merit it. We also need to keep in mind that whatever convention is introduced in this PR will be mandated for future contributors. Explaining people how to "not count a specific except block in coverage" seems like a lot of luggage.

It's also error prone because some people might take it as an excuse to not test triggering the exception. Currently, we have this beautiful & simple rule that every exception needs to be triggered in tests. There can't be a single try except block that's not triggered.

If what you suggest means starting to distinguish between "meaningful exceptions" and "exceptions we add to make bandit happy" my reaction is: ohhhhh, let's please not do this. Nobody will get it.

So, unless I miss something, I vote for option 1. It's just a tiny bit of work to revert some of these exceptions back to asserts and add the noqa and the comment, and we'll not have headache with this topic going forward.

@Zethson
Copy link
Member Author

Zethson commented Jul 16, 2024

It's also error prone because some people might take it as an excuse to not test triggering the exception. Currently, we have this beautiful & simple rule that every exception needs to be triggered in tests. There can't be a single try except block that's not triggered.

I didn't really agree with all of your arguments because they can be made both ways, but this one is good. I'll revert to asserts where reasonable and then merge.

@falexwolf
Copy link
Member

I didn't really agree with all of your arguments because they can be made both ways, but this one is good. I'll revert to asserts where reasonable and then merge.

How can any of the other arguments made both ways? 🤔

Correct. CodeQL can also test for unreachable code and other things but we have all of that covered with other tools already. I am currently not impressed by CodeQL but enabling it doesn't hurt.

I think I somewhat get this. Thanks!

@Zethson
Copy link
Member Author

Zethson commented Jul 16, 2024

Explaining people how to "not count a specific except block in coverage" seems like a lot of luggage.

Now we have to explain how to "circumvent the Bandit rule". We could ignore it in the config, but I actually think that it can be useful if developers think twice about whether this should be an internal assert or a user facing exception.

Signed-off-by: zethson <[email protected]>
@falexwolf
Copy link
Member

We could ignore it in the config, but I actually think that it can be useful if developers think twice about whether this should be an internal assert or a user facing exception.

I fully agree with you. I think each assert statement should have a noqa: ... and a comment.

Zethson added 4 commits July 16, 2024 13:28
Signed-off-by: zethson <[email protected]>
Signed-off-by: zethson <[email protected]>
Signed-off-by: zethson <[email protected]>
@github-actions github-actions bot temporarily deployed to pull request July 16, 2024 12:51 Inactive
@github-actions github-actions bot temporarily deployed to pull request July 17, 2024 07:57 Inactive
@Zethson Zethson merged commit 79dab37 into main Jul 17, 2024
13 of 15 checks passed
@Zethson Zethson deleted the feature/bandit branch July 17, 2024 08:00
@@ -223,8 +223,10 @@ def set_abbr(self, value: str):
else:
try:
self.add_synonym(value, save=False)
except Exception: # pragma: no cover
pass
except Exception as e: # pragma: no cover
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sunnyosun - why do we need this? This seems like it should error.

@@ -209,8 +209,10 @@ def safer_read_partial(elem, indices):
try:
ds = CSRDataset(elem)
result = _subset_sparse(ds, indices)
except Exception:
pass
except Exception as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only way this can be done, @Koncopd?

@@ -307,8 +309,10 @@ def safer_read_partial(elem, indices): # noqa
try:
ds = CSRDataset(elem)
return _subset_sparse(ds, indices)
except Exception:
pass
except Exception as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, @Koncopd.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants