Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CSV Import File Validations #1211

Merged
merged 7 commits into from
Dec 6, 2024
Merged

Conversation

jmilljr24
Copy link
Collaborator

Resolves #1210

✍️ Description

This adds a validation check within the csv import service. The file type is checked and the headers are checked for the column named "Email". We still have the issue where an organization could name their "Email" column something different like "Email Address" which would produce an error.

📷 Screenshots/Demos

@jmilljr24 jmilljr24 marked this pull request as draft December 5, 2024 14:26
@jmilljr24 jmilljr24 marked this pull request as ready for review December 5, 2024 15:50
@kasugaijin
Copy link
Collaborator

kasugaijin commented Dec 5, 2024

We still have the issue where an organization could name their "Email" column something different like "Email Address" which would produce an error.

I think it's normal to have requirements on headers for CSV uploads. So, we could make a note on the upload page that there must be an email header. We could also consider looking up either email or email address but we'd need to start handling case, spaces and it could get messy. Might be easier to enforce just this one column to keep it simple.

Copy link
Collaborator

@kasugaijin kasugaijin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about moving the validation logic to its own single responsibility? We can use these validations for file type and email header for CSV imports from all third party services. Maybe we can keep it in app/services/organizations/importers/concerns?

@jmilljr24
Copy link
Collaborator Author

What do you think about moving the validation logic to its own single responsibility? We can use these validations for file type and email header for CSV imports from all third party services. Maybe we can keep it in app/services/organizations/importers/concerns?

That does make sense if we have a separate import service for each third party service. I'm not sure if that is ultimately what we will need to do. There is a good chance this service could handle many different third party csv's without being overly complex. And we wouldn't need to repeat a lot of code. I'm hopeful it will be as simple as handling the "email" column and potentially the "Timestamp" column as those two are what the service depends on. The rest of the columns just get added as question/answer. I can't imagine any other things that would cause an issues from one form type to another.

If we can keep one clean and simple service, I don't see the need to move the validation logic. If you think it is more organized to move to a concern even with only one import service, I'm happy to do it.

@kasugaijin
Copy link
Collaborator

Ah yes…if we only need one service for all third party forms it makes sense to keep as is!


first_row = CSV.foreach(@file.to_path).first
raise EmailColumnError if first_row.nil?
raise EmailColumnError unless first_row.include?("Email")
Copy link
Collaborator

@kasugaijin kasugaijin Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we add lowercase email here as well just in case the user doesn't adhere to case requirement?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm confused on what you are asking. If the header row doesn't include "Email"(capitalized) the error is raised. The service is currently using "Email". Do you want "email" to work as well?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry yeh that’s right. We may as well allow Email and email as valid because users won’t adhere to case rules half the time.

raise FileTypeError unless @file.content_type == "text/csv"

first_row = CSV.foreach(@file.to_path).first
raise EmailColumnError if first_row.nil?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am understanding this correctly, we are raising EmailColumnError if the first row is empty, right? Should we have a different error class to handle this case? Something like NoDataError (I am sure there is a better name than that).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is checking for an empty file as an edge case (came up during testing). I just reused the same error because technically the file does not have the "Email" column but I can add a more specific error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I like the structure.

end
Status.new(@errors.empty?, @count, @no_match, @errors)
end

private

def validate_file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the vein of single responsibility, we could have separate validation methods for each validation. Might be something to consider if we add more validations. I am fine either way in this case.

Comment on lines +62 to +65
email_headers = ["Email", "email", "Email Address", "email address"]
email_headers.each do |e|
@email_header = e if first_row.include?(e)
end
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the simplest way I can think of at the moment to allow both "Email" and "email". Since I had to do this change anyways, it is easy enough to add new naming variation's so I added the stakeholders as well.

There isn't any extra logic to deal with the scenario of multiple headers having an email name.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I like this.

Copy link
Collaborator

@kasugaijin kasugaijin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thought, do we need to also validate the timestamp header the same way we do for email?

@jmilljr24
Copy link
Collaborator Author

Another thought, do we need to also validate the timestamp header the same way we do for email?

Good point. It is the default for google but someone could alter their csv and we should handle that.

@jmilljr24 jmilljr24 requested a review from kasugaijin December 6, 2024 19:38
Copy link
Collaborator

@kasugaijin kasugaijin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I think we should update the CSV import page to state those two headers need to be present for it to work. I will add that in a PR I am making with small fixes here and there.

@jmilljr24 jmilljr24 merged commit 4d3912d into main Dec 6, 2024
5 checks passed
@jmilljr24 jmilljr24 deleted the 1210-csv-import-file-validation branch December 6, 2024 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add custom error for email column missing in csv import
2 participants