✨ Enable Ruff security rules (bandit) #1686

Zethson · 2024-06-01T12:42:24Z

Enabling security rules in Ruff:

lamindb/_annotate.py:962:9: S101 Use of `assert` detected
lamindb/_annotate.py:1168:5: S101 Use of `assert` detected
lamindb/_artifact.py:971:9: S101 Use of `assert` detected
lamindb/_can_validate.py:226:9: S110 `try`-`except`-`pass` detected, consider logging the exception
lamindb/_collection.py:103:9: S101 Use of `assert` detected
lamindb/_feature.py:142:5: S101 Use of `assert` detected
lamindb/_finish.py:106:13: S602 `subprocess` call with `shell=True` identified, security issue
lamindb/_finish.py:127:13: S602 `subprocess` call with `shell=True` identified, security issue
lamindb/core/_feature_manager.py:379:5: S101 Use of `assert` detected
lamindb/core/_feature_manager.py:421:13: S101 Use of `assert` detected
lamindb/core/_feature_manager.py:432:17: S101 Use of `assert` detected
lamindb/core/_feature_manager.py:434:17: S101 Use of `assert` detected
lamindb/core/_feature_manager.py:538:9: S101 Use of `assert` detected
lamindb/core/_feature_manager.py:541:9: S101 Use of `assert` detected
lamindb/core/_feature_manager.py:569:9: S101 Use of `assert` detected
lamindb/core/_feature_manager.py:599:9: S101 Use of `assert` detected
lamindb/core/_label_manager.py:69:9: S112 `try`-`except`-`continue` detected, consider logging the exception
lamindb/core/_label_manager.py:105:13: S110 `try`-`except`-`pass` detected, consider logging the exception
lamindb/core/_mapped_collection.py:110:9: S101 Use of `assert` detected
lamindb/core/_run_context.py:47:24: S324 Probable use of insecure hash functions in `hashlib`: `md5`
lamindb/core/_run_context.py:117:16: S113 Probable use of requests call without timeout
lamindb/core/_settings.py:122:9: S101 Use of `assert` detected
lamindb/core/_sync_git.py:28:9: S602 `subprocess` call with `shell=True` identified, security issue
lamindb/core/_sync_git.py:39:9: S607 Starting a process with a partial executable path
lamindb/core/_sync_git.py:40:9: S602 `subprocess` call with `shell=True` seems safe, but may be changed in the future; consider rewriting without `shell`
lamindb/core/_sync_git.py:61:9: S602 `subprocess` call with `shell=True` identified, security issue
lamindb/core/_sync_git.py:71:9: S101 Use of `assert` detected
lamindb/core/_sync_git.py:85:13: S607 Starting a process with a partial executable path
lamindb/core/_sync_git.py:86:13: S602 `subprocess` call with `shell=True` seems safe, but may be changed in the future; consider rewriting without `shell`
lamindb/core/_sync_git.py:96:9: S602 `subprocess` call with `shell=True` identified, security issue
lamindb/core/_track_environment.py:18:66: S602 `subprocess` call with `shell=True` identified, security issue
lamindb/core/storage/_backed_access.py:210:13: S110 `try`-`except`-`pass` detected, consider logging the exception
lamindb/core/storage/_backed_access.py:308:17: S110 `try`-`except`-`pass` detected, consider logging the exception
lamindb/core/storage/paths.py:50:5: S101 Use of `assert` detected
lamindb/integrations/_vitessce.py:42:13: S101 Use of `assert` detected
tests/conftest.py:36:9: S607 Starting a process with a partial executable path
tests/conftest.py:36:51: S602 `subprocess` call with `shell=True` seems safe, but may be changed in the future; consider rewriting without `shell`
tests/test_run_context.py:107:9: S602 `subprocess` call with `shell=True` identified, security issue
Found 38 errors.

Just a show off for now! We can discuss later

Signed-off-by: zethson <[email protected]>

falexwolf · 2024-06-06T05:45:56Z

Great! Let's discuss this tomorrow!

Signed-off-by: zethson <[email protected]>

…into feature/bandit

…ure/bandit

Signed-off-by: zethson <[email protected]>

github-actions · 2024-07-15T15:29:11Z

🚀 Deployed on https://669779524901e6727254ba99--lamindb-qnwk.netlify.app

codecov · 2024-07-15T15:29:26Z

Codecov Report

Attention: Patch coverage is 67.50000% with 13 lines in your changes missing coverage. Please review.

Project coverage is 91.59%. Comparing base (4f1797e) to head (92551c6).
Report is 13 commits behind head on main.

Files	Patch %	Lines
lamindb/core/storage/_backed_access.py	0.00%	4 Missing ⚠️
lamindb/_can_validate.py	0.00%	2 Missing ⚠️
lamindb/integrations/_vitessce.py	0.00%	2 Missing ⚠️
lamindb/_finish.py	50.00%	1 Missing ⚠️
lamindb/core/_feature_manager.py	80.00%	1 Missing ⚠️
lamindb/core/_label_manager.py	50.00%	1 Missing ⚠️
lamindb/core/_mapped_collection.py	50.00%	1 Missing ⚠️
lamindb/core/_settings.py	50.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1686      +/-   ##
==========================================
- Coverage   92.12%   91.59%   -0.53%     
==========================================
  Files          50       50              
  Lines        5496     5566      +70     
==========================================
+ Hits         5063     5098      +35     
- Misses        433      468      +35

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: zethson <[email protected]>

falexwolf · 2024-07-15T15:31:13Z

How does bandit compare to GitHub's CodeQL (see one of the GA jobs)?

Zethson · 2024-07-15T15:46:58Z

How does bandit compare to GitHub's CodeQL (see one of the GA jobs)?

I don't really see any or a lot of overlap. CodeQL seems to be pretty superficial for Python and only checks the most obvious issues. The advantages of CodeQL are that we can also use it for other languages and can use it's query language if we ever need to but I don't see a use-case for us.

Signed-off-by: zethson <[email protected]>

Zethson · 2024-07-15T16:05:27Z

(coverage may go down because we have a couple of untested exceptions now). I can add coverage for these in a followup PR quickly tomorrow.

falexwolf · 2024-07-16T07:22:32Z

I don't really see any or a lot of overlap. CodeQL seems to be pretty superficial for Python and only checks the most obvious issues. The advantages of CodeQL are that we can also use it for other languages and can use it's query language if we ever need to but I don't see a use-case for us.

Thanks for clarifying! For me to understand: The overlap is so small because CodeQL doesn't cover as many security issues as bandit for Python? The nature of the tools is the same in that they're both static code analyzers that detect security issues, correct?

lamindb/core/_settings.py

lamindb/core/_sync_git.py

lamindb/core/storage/_backed_access.py

lamindb/integrations/_vitessce.py

tests/conftest.py

falexwolf

This is overall great!

The only "bad thing" concerns the few cases where exceptions now need to be triggered in tests, but it's very hard to.

I used assert in those instances because I consciously wanted to avoid having to test these exceptions.

Because assert is only used for consistency checks that are meant to provide feedback to users/devs in some edge cases, I thought it'd be OK security-wise.

If somebody attempted to bypass these assert statements, it'd be OK: the "attacker" would get a more cryptic error behavior downstream.

Hence: In those hard-to-test cases, I'd stick with the assert rather than introducing an additional line of code and an additional test with essentially no value to users & devs. I consider maintaining that additional line and test more harmful that beneficial; in particular in a phase of this project where the code overall can hopefully still much consolidated (I mean, the code works, but it needs to be refactored to become more maintainable).

In cases where there is value to the user (say the allowed path suffixes for the vitessce export), there should never be an assert.

Zethson · 2024-07-16T09:00:13Z

Thanks for clarifying! For me to understand: The overlap is so small because CodeQL doesn't cover as many security issues as bandit for Python? The nature of the tools is the same in that they're both static code analyzers that detect security issues, correct?

Correct. CodeQL can also test for unreachable code and other things but we have all of that covered with other tools already. I am currently not impressed by CodeQL but enabling it doesn't hurt.

Hence: In those hard-to-test cases, I'd stick with the assert rather than introducing an additional line of code and an additional test with essentially no value to users & devs

Yeah that's fair. However, either we:

Go back to asserts and tell Bandit to ignore them.
Keep the exceptions and tell codecov to ignore them (if it works cough). Ignoring hard to test things with codecov is totally fine.

I vote for option 2.

falexwolf · 2024-07-16T09:48:23Z

I vote for option 2.

I think option 1 with a noqa and a brief comment next to it would be simpler because we wouldn't need to mess with codecov and we don't have these verbose sections in the code.

I care much more about simplicity than about the exact codecov coverage number. If there is no or minimal value add through additional complexity, I'd always go with what's simpler.

In this case, I'd say there is zero value add in a few instances where assert was replaced. Introducing complexity both within the code and the coverage CI doesn't seem to merit it. We also need to keep in mind that whatever convention is introduced in this PR will be mandated for future contributors. Explaining people how to "not count a specific except block in coverage" seems like a lot of luggage.

It's also error prone because some people might take it as an excuse to not test triggering the exception. Currently, we have this beautiful & simple rule that every exception needs to be triggered in tests. There can't be a single try except block that's not triggered.

If what you suggest means starting to distinguish between "meaningful exceptions" and "exceptions we add to make bandit happy" my reaction is: ohhhhh, let's please not do this. Nobody will get it.

So, unless I miss something, I vote for option 1. It's just a tiny bit of work to revert some of these exceptions back to asserts and add the noqa and the comment, and we'll not have headache with this topic going forward.

Zethson · 2024-07-16T09:55:01Z

It's also error prone because some people might take it as an excuse to not test triggering the exception. Currently, we have this beautiful & simple rule that every exception needs to be triggered in tests. There can't be a single try except block that's not triggered.

I didn't really agree with all of your arguments because they can be made both ways, but this one is good. I'll revert to asserts where reasonable and then merge.

falexwolf · 2024-07-16T10:41:27Z

I didn't really agree with all of your arguments because they can be made both ways, but this one is good. I'll revert to asserts where reasonable and then merge.

How can any of the other arguments made both ways? 🤔

Correct. CodeQL can also test for unreachable code and other things but we have all of that covered with other tools already. I am currently not impressed by CodeQL but enabling it doesn't hurt.

I think I somewhat get this. Thanks!

Signed-off-by: zethson <[email protected]>

…into feature/bandit

Zethson · 2024-07-16T11:22:44Z

Explaining people how to "not count a specific except block in coverage" seems like a lot of luggage.

Now we have to explain how to "circumvent the Bandit rule". We could ignore it in the config, but I actually think that it can be useful if developers think twice about whether this should be an internal assert or a user facing exception.

Signed-off-by: zethson <[email protected]>

falexwolf · 2024-07-16T11:26:14Z

We could ignore it in the config, but I actually think that it can be useful if developers think twice about whether this should be an internal assert or a user facing exception.

I fully agree with you. I think each assert statement should have a noqa: ... and a comment.

Signed-off-by: zethson <[email protected]>

falexwolf · 2024-07-17T08:02:23Z

lamindb/_can_validate.py

@@ -223,8 +223,10 @@ def set_abbr(self, value: str):
    else:
        try:
            self.add_synonym(value, save=False)
-        except Exception:  # pragma: no cover
-            pass
+        except Exception as e:  # pragma: no cover


@sunnyosun - why do we need this? This seems like it should error.

lamindb/_finish.py

lamindb/core/_sync_git.py

falexwolf · 2024-07-17T08:05:34Z

lamindb/core/storage/_backed_access.py

@@ -209,8 +209,10 @@ def safer_read_partial(elem, indices):
            try:
                ds = CSRDataset(elem)
                result = _subset_sparse(ds, indices)
-            except Exception:
-                pass
+            except Exception as e:


Is this the only way this can be done, @Koncopd?

falexwolf · 2024-07-17T08:05:47Z

lamindb/core/storage/_backed_access.py

@@ -307,8 +309,10 @@ def safer_read_partial(elem, indices):  # noqa
                try:
                    ds = CSRDataset(elem)
                    return _subset_sparse(ds, indices)
-                except Exception:
-                    pass
+                except Exception as e:


Same here, @Koncopd.

✨ Fix passing organism

fd9fd12

Signed-off-by: zethson <[email protected]>

Zethson added 9 commits July 15, 2024 10:22

✨ Fix passing organism

f5d44c4

Signed-off-by: zethson <[email protected]>

🎨 Fix a few bandit issues

87ec3f5

Signed-off-by: zethson <[email protected]>

🎨 Remove codeql

81809c8

Signed-off-by: zethson <[email protected]>

Merge branch 'feature/bandit' of https://github.com/laminlabs/lamindb …

cbc4d78

…into feature/bandit

Merge branch 'main' of https://github.com/laminlabs/lamindb into feat…

e3a3025

…ure/bandit

🎨 Submodules

92cdcf3

Signed-off-by: zethson <[email protected]>

🎨 Submodules

7f9e650

Signed-off-by: zethson <[email protected]>

🎨 Fix remaining bandit issues

2484ea8

Signed-off-by: zethson <[email protected]>

🎨 Tests

c491d7f

Signed-off-by: zethson <[email protected]>

github-actions bot temporarily deployed to pull request July 15, 2024 15:29 Inactive

🎨 Refactoring

ef105c1

Signed-off-by: zethson <[email protected]>

Zethson marked this pull request as ready for review July 15, 2024 15:47

🎨 Refactoring

fe1b400

Signed-off-by: zethson <[email protected]>

Zethson changed the title ~~✨ Add bandit~~ ✨ Enable Ruff security rules (bandit) Jul 15, 2024

Zethson requested a review from falexwolf July 15, 2024 16:04

github-actions bot temporarily deployed to pull request July 15, 2024 16:04 Inactive