Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix date parsing with complex types #3614

Merged
merged 1 commit into from
Jan 6, 2025
Merged

Conversation

jackbot
Copy link
Contributor

@jackbot jackbot commented Dec 23, 2024

The DateHashParser class is responsible for taking a hash containing
keys of some combination of day, month and year and trying to
return a full Date object.

It is flexible enough to handle hashes where certain keys are missing,
e.g. if the year key isn't present, it sets the value to be the
current year.

We've seen quite a few exceptions coming through from this class where
it looks like somebody has been messing about with the query parameters
in the URL, which are combined into a hash and sent through to this
class for parsing.

A (truncated) example request is:

/search/all?keywords=the&public_timestamp[from][day][]=17&public_timestamp[from][month]=7&public_timestamp[from][year]=1967

Here we can see that public_timestamp[from][day] is an array value
rather than a numeric value like month and year.

This doesn't appear to be a legitimate bug caused by the frontend.
Looking through the Sentry logs it's pretty clear somebody is
manipulating the URL for some reason.

So it makes sense to fix the bug in this class by parsing the input
hash, trying to convert any non-integer values into an integer (without
raising an exception), then selecting only the positive values.

That way we can do as much as we can to coerce the input into a group of
valid values, then carry on building the date as we did before.

@govuk-ci govuk-ci temporarily deployed to finder-frontend-pr-3614 December 23, 2024 11:17 Inactive
Copy link
Member

@kevindew kevindew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one, this looks good to me. I've made a quick suggestion that could reduce amount of code, but don't mind if you reject it as it's only minor.

Probably also good getting a quick eyeball from @csutter

@@ -21,7 +21,6 @@ def initialize(date_hash)
@date_hash = date_hash
.slice(:day, :month, :year)
.compact_blank
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if things could be a little cleaner dealt with in this method chain, with something like:

    @date_hash = date_hash
      .slice(:day, :month, :year)
      .filter_map { |value| value.to_i if value.respond_to?(:to_i) && value.to_i.positive? }
      .compact

Which would also get rid of needing the non_positive_values? method

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, although there's a lot of logic in the filter_map which makes it a bit hard to grasp at a glance. And I think filter_map returns an Array which makes it a bit annoying as we'd still need to convert back!

How about using Kernel#Integer to avoid the quack check 🦆, followed by selecting present and positive values?:

@date_hash = date_hash
  .slice(:day, :month, :year)
  .transform_values { |value| Integer(value, exception: false) }
  .select { |_, value| value&.positive? }

Then we also can do without the #any_non_positive_values? and #to_i calls on values below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that sounds better than my suggestion (good catch on filter_map) 👍 I didn't know about exception: false on Integer - that looks handy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks both. I've gone with your suggestions of tidying it up, if you wouldn't mind taking another quick peek?

Also didn't know about the exception: false arg on Integer, thanks @csutter!

Copy link
Contributor

@csutter csutter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking into this 🙏

As @kevindew said we can probably simplify a bit further but I'll leave that up to you!

@@ -21,7 +21,6 @@ def initialize(date_hash)
@date_hash = date_hash
.slice(:day, :month, :year)
.compact_blank
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, although there's a lot of logic in the filter_map which makes it a bit hard to grasp at a glance. And I think filter_map returns an Array which makes it a bit annoying as we'd still need to convert back!

How about using Kernel#Integer to avoid the quack check 🦆, followed by selecting present and positive values?:

@date_hash = date_hash
  .slice(:day, :month, :year)
  .transform_values { |value| Integer(value, exception: false) }
  .select { |_, value| value&.positive? }

Then we also can do without the #any_non_positive_values? and #to_i calls on values below.

@jackbot jackbot force-pushed the fix-date-hash-parser-complex-type branch from 19e0511 to 602b7a6 Compare January 6, 2025 12:49
@govuk-ci govuk-ci temporarily deployed to finder-frontend-pr-3614 January 6, 2025 12:49 Inactive
The `DateHashParser` class is responsible for taking a hash containing
keys of some combination of `day`, `month` and `year` and trying to
return a full `Date` object.

It is flexible enough to handle hashes where certain keys are missing,
e.g. if the `year` key isn't present, it sets the value to be the
current year.

We've seen quite a few exceptions coming through from this class where
it looks like somebody has been messing about with the query parameters
in the URL, which are combined into a hash and sent through to this
class for parsing.

A (truncated) example request is:

```
/search/all?keywords=the&public_timestamp[from][day][]=17&public_timestamp[from][month]=7&public_timestamp[from][year]=1967
```

Here we can see that `public_timestamp[from][day]` is an array value
rather than a numeric value like `month` and `year`.

This doesn't appear to be a legitimate bug caused by the frontend.
Looking through the Sentry logs it's pretty clear somebody is
manipulating the URL for some reason.

So it makes sense to fix the bug in this class by parsing the input
hash, trying to convert any non-integer values into an integer (without
raising an exception), then selecting only the positive values.

That way we can do as much as we can to coerce the input into a group of
valid values, then carry on building the date as we did before.
@jackbot jackbot force-pushed the fix-date-hash-parser-complex-type branch from 602b7a6 to fa847f2 Compare January 6, 2025 12:51
@govuk-ci govuk-ci temporarily deployed to finder-frontend-pr-3614 January 6, 2025 12:51 Inactive
Copy link
Contributor

@csutter csutter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌

@jackbot jackbot merged commit 83845d4 into main Jan 6, 2025
12 checks passed
@jackbot jackbot deleted the fix-date-hash-parser-complex-type branch January 6, 2025 14:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants