Refine backprop assignments in error message #87

Conversation

@sonalmahajan15 sonalmahajan15 commented Oct 19, 2023

This PR refines the error message by incorporating information about the assignments. The following is an example of the more informative error message:

Code snippet:

func test(x *int) {
	x = nil
	y := x
	z := y
	print(*z)
}

Error message:

Potential nil panic detected. Observed nil flow from source to dereference point: 
   -> errormessage/errormessage.go:32:9: literal `nil` dereferenced via the assignment(s):
        -> `nil` to `x` at errormessage/errormessage.go:29:2,
        -> `x` to `y` at errormessage/errormessage.go:30:2,
        -> `y` to `z` at errormessage/errormessage.go:31:2

[closes #83 ]
[depends on #86 ]
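
As a rough illustration of how an assignment chain like the one above could be rendered, here is a minimal sketch. The `assignment` type, its field names, and `formatFlow` are hypothetical and are not NilAway's actual definitions; positions are kept as plain strings to keep the sketch small.

```go
package main

import (
	"fmt"
	"strings"
)

// assignment is a hypothetical record of one tracked assignment.
type assignment struct {
	lhs, rhs string // printed forms of the left- and right-hand sides
	pos      string // "file:line:col" of the assignment
}

// formatFlow renders the chain in source order, one arrow per assignment,
// separating entries with a comma and a newline.
func formatFlow(flow []assignment) string {
	lines := make([]string, len(flow))
	for i, a := range flow {
		lines[i] = fmt.Sprintf("-> `%s` to `%s` at %s", a.rhs, a.lhs, a.pos)
	}
	return strings.Join(lines, ",\n")
}

func main() {
	flow := []assignment{
		{lhs: "x", rhs: "nil", pos: "errormessage/errormessage.go:29:2"},
		{lhs: "y", rhs: "x", pos: "errormessage/errormessage.go:30:2"},
		{lhs: "z", rhs: "y", pos: "errormessage/errormessage.go:31:2"},
	}
	fmt.Println(formatFlow(flow))
}
```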

codecov bot commented Oct 19, 2023

Codecov Report

Attention: 72 lines in your changes are missing coverage. Please review.

Comparison is base (a1668d2) 89.48% compared to head (43b26d7) 89.24%.

❗ Current head 43b26d7 differs from pull request most recent head 0127d3f. Consider uploading reports for the commit 0127d3f to get more accurate results

Files                           Patch %   Lines
annotation/consume_trigger.go   74.82%    70 Missing ⚠️
util/util.go                    66.66%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@                       Coverage Diff                        @@
##           sonalmahajan15/add-deep-copy      #87      +/-   ##
================================================================
- Coverage                         89.48%   89.24%   -0.24%     
================================================================
  Files                                54       54              
  Lines                              8865     9087     +222     
================================================================
+ Hits                               7933     8110     +177     
- Misses                              775      820      +45     
  Partials                            157      157              


@sonalmahajan15 sonalmahajan15 force-pushed the sonalmahajan15/fix-backprop-error-message branch from 67d7ac3 to 9c6b9a2 Compare October 20, 2023 17:13
@sonalmahajan15 sonalmahajan15 force-pushed the sonalmahajan15/add-deep-copy branch from 16bbb5e to 2069a0f Compare October 20, 2023 17:14
@sonalmahajan15 sonalmahajan15 linked an issue Oct 22, 2023 that may be closed by this pull request
Contributor

@yuxincs yuxincs left a comment

Overall LGTM 😃 but let's make sure the artifact size does not increase too much since we are adding a good amount of strings.

Maybe test after PR #78, which hopefully will make this a non-issue. (I'll try to merge that stack of PRs ASAP; it is currently pending final performance validations.)

Comment on lines +80 to +83
// Assignment is a struct that represents an assignment to an expression
type Assignment struct {
	LHSExprStr string
	RHSExprStr string
	Position   token.Position
}
Contributor

@yuxincs yuxincs Oct 30, 2023

Question: why not just store *ast.AssignStmt AST node directly and print whenever needed? We do not have to worry about cross-package independence since the consumer triggers are only used within the analysis of a single package.

Once the consume trigger gets to the inference engine, it will be converted to a string representation and then stored in the artifact. So here we do not really have to worry about it.

Contributor Author

What you are suggesting logically makes sense, but there were a couple of reasons why I did not opt for this design choice.

  1. backpropAcrossOneToOneAssignment which populates the Assignment struct object, does not have *ast.AssignStmt, but instead only has LHS/RHS ast.Exprs.
  2. We can of course store ast.Exprs here. But since we want to convert the ast.Expr to string for printing, we'll additionally need to also store *analysis.Pass to facilitate that.

Hence, I thought exposing and storing only the bare minimum objects here might be more desirable. Let me know if you think otherwise. :)

Contributor

I see!

The only concern I have is the increase in memory consumption and artifact size (we are already getting issue reports stating that NilAway consumes a lot of memory). Let's

(1) run make build && bin/nilaway -memprofile ./mem.prof std, then go tool pprof -alloc_space ./mem.prof, and then type top, which should show the sum of allocated memory. This should be a good proxy for memory consumption. We could simply compare before and after this PR to see whether it has a large impact on memory.

(2) we should run performance validations internally to see if it has an impact on artifact size.

Other than that, this makes sense to me 👍

Contributor Author

@sonalmahajan15 sonalmahajan15 Jan 4, 2024

Fair point! I have performed both of the checks discussed above: memory profiling and internal performance evaluation. There was a slight increase in memory use, but well within acceptable bounds. The performance evaluation did not show any noticeable change. In summary, I have confirmed that the changes in this PR do not incur any significant overhead.

util/util.go Outdated
Comment on lines 497 to 499
// ExprToString converts AST expression to string
func ExprToString(e ast.Expr, pass *analysis.Pass) string {
	var buf bytes.Buffer
	err := printer.Fprint(&buf, pass.Fset, e)
	if err != nil {
		panic(fmt.Sprintf("Failed to convert AST expression to string: %v\n", err))
	}
	return buf.String()
}
Contributor

I'm having déjà vu about this; didn't we add this in some other PR?

Anyways, not something to fix :)

Contributor Author

Haha, yes. This function was introduced in the revised error message format PR at one point, but removed later since it did not have any use. So, it was re-added in this PR since we want to convert LHS and RHS ast.Expr in an assignment statement to string.

Comment on lines -891 to -900
//
// nilable(path, result 0)
Contributor

👍

// TriggerIfNonNil is triggered if the contained Annotation is non-nil
type TriggerIfNonNil struct {
Ann Key
assignmentFlow
Contributor

Question: do we need to care about this field in equals and Copy methods?

Contributor Author

Now that assignmentFlow uses orderedmap, we should consider it in Copy. I have added code for that.

For equals, we should not account for it. Consider the situation where multiple nilable flows reach a dereference site. We keep NilAway practical by tracking only one representative nil flow and printing only one error message for the dereference site, with that representative flow included in the message. Considering assignmentFlow in equals would mean tracking all nilable flows in separate full triggers and reporting errors through all paths. This would likely incur a performance penalty, which we want to avoid. I have added a note at the definition of assignmentFlow explaining this.
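
The distinction between the two methods can be sketched as follows. The `trigger` type here is a hypothetical stand-in (a string for the annotation key, a string slice for the flow) and does not mirror NilAway's real types; it only illustrates equality that ignores the flow alongside a copy that duplicates it.

```go
package main

import "fmt"

// trigger is a hypothetical, simplified consume trigger.
type trigger struct {
	ann  string   // stands in for the annotation key
	flow []string // stands in for assignmentFlow
}

// equals deliberately ignores flow, so triggers that differ only in the
// assignment path collapse into a single representative.
func (t *trigger) equals(o *trigger) bool {
	return t.ann == o.ann
}

// Copy duplicates flow so the copy does not share backing storage.
func (t *trigger) Copy() *trigger {
	return &trigger{ann: t.ann, flow: append([]string(nil), t.flow...)}
}

func main() {
	a := &trigger{ann: "deref z", flow: []string{"nil to x", "x to z"}}
	b := &trigger{ann: "deref z", flow: []string{"nil to y", "y to z"}}
	fmt.Println(a.equals(b)) // same annotation, different flows

	c := a.Copy()
	c.flow[0] = "changed"
	fmt.Println(a.flow[0]) // the original flow is untouched
}
```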

Comment on lines 115 to 144
// backprop algorithm populates assignment entries in backward order. Reverse entries to get forward order of assignments.
for i, j := 0, len(entries)-1; i < j; i, j = i+1, j-1 {
	entries[i], entries[j] = entries[j], entries[i]
}

// build string slice
strs := make([]string, len(entries))
for i, entry := range entries {
	strs[i] = entry.String()
}
Contributor

Can we just iterate in reverse order instead of actually reversing the slice?

for i := len(entries) - 1; i >= 0; i-- {
	_ = entries[i] // visit entries from last to first, including index 0
	//...
}

Contributor Author

Good point. Done in the context of using orderedmap.
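
A minimal sketch of the reverse-iteration idea, assuming the entries are already strings (the helper name `forwardOrder` is hypothetical). It builds the forward-order slice in one pass and leaves the input untouched, rather than reversing in place and then converting.

```go
package main

import "fmt"

// forwardOrder returns the entries in forward (source) order, given a slice
// that was populated in backward order. The input slice is not modified.
func forwardOrder(backward []string) []string {
	out := make([]string, 0, len(backward))
	for i := len(backward) - 1; i >= 0; i-- {
		out = append(out, backward[i])
	}
	return out
}

func main() {
	backward := []string{"y to z", "x to y", "nil to x"}
	fmt.Println(forwardOrder(backward))
}
```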

}
return fmt.Sprintf("found in at least one path of `%s()` for return in position %d", u.FuncName, u.RetNum)
sb.WriteString(u.AssignmentStr)
return sb.String()
Contributor

Having read through this all, I'm actually a bit tempted to create a shared struct, say metaPrestring or something similar that stores AssignmentStr since it is shared by all of the prestring nodes. Then we just embed this struct in all Prestring structs.

Then, this metaPrestring can expose two methods, writePrefix(io.Writer) and writeSuffix(io.Writer), which are called at the beginning and at the end of each String() function (by u.meta.writePrefix(sb), since strings.Builder implements the io.Writer interface).

This would make future expansion a little easier.

But I also see that this might be overkill for now, so we can keep it in mind until we need to add another field to all prestrings :)
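
For concreteness, the shared-struct idea might look roughly like the sketch below. Everything here is hypothetical: `metaPrestring` was only floated as a name in the comment, and `retPrestring` is an invented example type that reuses the message text from the diff excerpt. Note that the `io.Writer` is `*strings.Builder`, so the builder is passed by address.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// metaPrestring holds data shared by all prestring types (hypothetical).
type metaPrestring struct {
	AssignmentStr string
}

// writePrefix writes nothing yet; it is a hook for future shared fields.
func (m metaPrestring) writePrefix(w io.Writer) {}

// writeSuffix appends the assignment description to the message.
func (m metaPrestring) writeSuffix(w io.Writer) {
	io.WriteString(w, m.AssignmentStr)
}

// retPrestring is an invented example prestring that embeds the shared struct.
type retPrestring struct {
	metaPrestring
	FuncName string
	RetNum   int
}

func (u retPrestring) String() string {
	var sb strings.Builder
	u.writePrefix(&sb)
	fmt.Fprintf(&sb, "found in at least one path of `%s()` for return in position %d", u.FuncName, u.RetNum)
	u.writeSuffix(&sb)
	return sb.String()
}

func main() {
	p := retPrestring{
		metaPrestring: metaPrestring{AssignmentStr: " via the assignment(s): `nil` to `x`"},
		FuncName:      "test",
		RetNum:        0,
	}
	fmt.Println(p.String())
}
```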

Contributor Author

Totally agreed! Every time I visit consume_trigger.go and produce_trigger.go, I am tempted to rearchitect it for Prestrings. I feel there is a lot of repetition there. A problem to be solved some other day :)

Comment on lines 105 to 133
// fix point convergence in backprop could mean duplicate entries in this slice, hence we filter out the duplicates here
entries := make([]Assignment, 0, len(a.assignments))
seen := make(map[string]bool)
for _, entry := range a.assignments {
	if !seen[entry.String()] {
		seen[entry.String()] = true
		entries = append(entries, entry)
	}
}

Contributor

Maybe we can use our own orderedmap implementation after PR #66 is merged (I'll try to merge that ASAP) such that we do not have to store a lot of entries and then dedup (i.e., we dedup on the fly).

But taking a step back, would this incur a significant performance penalty? Are there any other ways (maybe not)? 🤔

Contributor Author

Done. Using orderedmap now :)

Contributor Author

Haven't observed any performance penalty as such, but will run a thorough performance experiment to confirm.
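
The "dedup on the fly" idea discussed in this thread can be sketched with a tiny insertion-ordered set. This is a stand-in written for illustration only; the API of NilAway's own orderedmap is not shown here, and `orderedSet` and its methods are hypothetical names.

```go
package main

import "fmt"

// orderedSet keeps unique keys in first-insertion order (illustrative only).
type orderedSet struct {
	seen map[string]struct{}
	keys []string
}

func newOrderedSet() *orderedSet {
	return &orderedSet{seen: make(map[string]struct{})}
}

// add stores k if it is new and reports whether it was inserted.
func (s *orderedSet) add(k string) bool {
	if _, ok := s.seen[k]; ok {
		return false // duplicate, e.g. re-added during fixpoint iteration
	}
	s.seen[k] = struct{}{}
	s.keys = append(s.keys, k)
	return true
}

func main() {
	s := newOrderedSet()
	// Repeated backprop rounds may try to record the same assignment again.
	for _, e := range []string{"y to z", "x to y", "y to z", "nil to x", "x to y"} {
		s.add(e)
	}
	fmt.Println(s.keys)
}
```

Because duplicates are rejected at insertion time, no separate filtering pass over a larger slice is needed before printing.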

Comment on lines +581 to +589
// TODO: below check for `lhsNode != nil` should not be needed when NilAway supports Ok form for
// user-defined functions (tracked issue #77)
Contributor

👍

(BTW this function is long enough that it's a bit painful to (re-)read every time it changes; we probably want to refactor it into smaller pieces in future revisions.)

Contributor Author

Oh yes, agreed. This can definitely benefit from refactoring :)

}

func test2(x *int) {
x = nil
Contributor

Can we also add cases where x is

(1) read from a map without guarding (default nilable)
(2) reading from a nil channel perhaps?
(3) an empty slice

Contributor Author

Added.

afterLastIndex := len(rootNode.triggers)

// Update consumers of newly added triggers with assignment entries for informative printing of errors
if len(rootNode.triggers) > beforeLastIndex && len(rootNode.triggers) <= afterLastIndex {
Contributor

Do we need this check?

(1) will len(rootNode.triggers) > beforeLastIndex ever be false? i.e., will rootNode.triggers actually shrink?

(2) len(rootNode.triggers) == afterLastIndex here, right?

Contributor Author

You are right, revised the checks.
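
How the checks were revised is not shown in this thread. One way to avoid the length comparison altogether, sketched here with hypothetical types and names, is to loop only over the tail of the slice that was appended after the recorded index; the loop body simply never runs when nothing was added.

```go
package main

import "fmt"

// trigger is a hypothetical, simplified trigger carrying assignment entries.
type trigger struct {
	name string
	flow []string
}

// annotateNew attaches entry to every trigger at index >= before, i.e. to
// the triggers appended after the index was recorded.
func annotateNew(triggers []trigger, before int, entry string) {
	for i := before; i < len(triggers); i++ {
		triggers[i].flow = append(triggers[i].flow, entry)
	}
}

func main() {
	triggers := []trigger{{name: "old"}}
	before := len(triggers)
	triggers = append(triggers, trigger{name: "new1"}, trigger{name: "new2"})
	annotateNew(triggers, before, "x to y")
	for _, t := range triggers {
		fmt.Println(t.name, t.flow)
	}
}
```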

@sonalmahajan15 sonalmahajan15 force-pushed the sonalmahajan15/add-deep-copy branch from 2069a0f to bba340e Compare November 6, 2023 03:54
@sonalmahajan15 sonalmahajan15 force-pushed the sonalmahajan15/fix-backprop-error-message branch from 9c6b9a2 to 04aa902 Compare November 6, 2023 03:55
Contributor

@yuxincs yuxincs left a comment

Logic LGTM! The only concern I have is memory consumption & artifact size increases.

Let's do a quick profiling & internal performance validations to ensure the consumptions are still within reasonable bounds 😃


@sonalmahajan15 sonalmahajan15 force-pushed the sonalmahajan15/fix-backprop-error-message branch from a84f41a to 3f45765 Compare November 27, 2023 20:28
@sonalmahajan15 sonalmahajan15 force-pushed the sonalmahajan15/add-deep-copy branch from 1c8d6f6 to 7614ccb Compare November 29, 2023 19:26
@sonalmahajan15 sonalmahajan15 force-pushed the sonalmahajan15/fix-backprop-error-message branch from 3f45765 to 6726b23 Compare November 29, 2023 19:28
@sonalmahajan15 sonalmahajan15 force-pushed the sonalmahajan15/add-deep-copy branch from 7614ccb to d97e64e Compare November 30, 2023 20:02
@sonalmahajan15 sonalmahajan15 force-pushed the sonalmahajan15/fix-backprop-error-message branch from 6726b23 to 1c82754 Compare November 30, 2023 20:03
@sonalmahajan15 sonalmahajan15 force-pushed the sonalmahajan15/fix-backprop-error-message branch from 1c82754 to 06d3eb8 Compare December 12, 2023 00:31
@sonalmahajan15 sonalmahajan15 force-pushed the sonalmahajan15/add-deep-copy branch from d97e64e to 650ff29 Compare December 12, 2023 00:34
@sonalmahajan15
Contributor Author

Logic LGTM! The only concern I have is memory consumption & artifact size increases.

Let's do a quick profiling & internal performance validations to ensure the consumptions are still within reasonable bounds 😃

Checked with profiling and internal performance validation. Both memory consumption and artifact size are within acceptable bounds.

@sonalmahajan15 sonalmahajan15 force-pushed the sonalmahajan15/add-deep-copy branch from 650ff29 to a1668d2 Compare January 17, 2024 21:14
@sonalmahajan15 sonalmahajan15 force-pushed the sonalmahajan15/fix-backprop-error-message branch from 92d792c to 43b26d7 Compare January 17, 2024 21:14
This PR shortens the expression strings in the assignment part of the error
messages. For example, the expression `x = s.foo(longVarName,
&anotherLongVarName, "abc", true)` is shortened to `s.foo(...)` to offer
better readability. The changes have also been performance-evaluated
internally.
@sonalmahajan15 sonalmahajan15 merged commit 302cf5e into sonalmahajan15/add-deep-copy Jan 17, 2024
4 checks passed
@sonalmahajan15 sonalmahajan15 deleted the sonalmahajan15/fix-backprop-error-message branch January 17, 2024 21:34
sonalmahajan15 added a commit that referenced this pull request Jan 26, 2024
This PR adds assignment tracking for many-to-one assignments for
printing informative error messages, following suit of one-to-one
assignment tracking (PR #87 ).
Development

Successfully merging this pull request may close these issues.

Refine error message with assignments
2 participants