Normalize Date Formats? #90

Open
jonathansampson opened this issue Feb 1, 2023 · 4 comments
@jonathansampson
Contributor

jonathansampson commented Feb 1, 2023

Description

I'm noticing quite a few different date formats in the returned data. For example, YYYY.MM.DD, YYYYMMDD, YYYY-MM-DD, YYYY-MM-DDTHH:MM:SSZ, and more. Would it be worth considering a date-normalization step, aiming to deliver a single format? Or, is there a reason why somebody might wish to preserve the original structure?

I took a quick look at several thousand records' Date Created, and these were the formats I saw (note: every digit was replaced by 0, full month name by MMMM, short (3-letter) month name by MMM, full day name by DDDD, and short day (3-letter) names by DDD). The list is sorted by frequency of appearance, most common being at the top.

0.7625 — 0000-00-00T00:00:00Z 
0.0916 — 0000-00-00T00:00:00.00Z 
0.0233 — 0000-00-00T00:00:00Z 0000-00-00T00:00:00Z
0.0197 — 0000-00-00T00:00:00.0Z
0.0155 — 0000-00-00T00:00:00
0.0141 — 0000-00-00T00:00:00.000Z
0.0120 — 0000-00-00 00:00:00
0.0097 — 00-MMM-0000
0.0096 — 0000.00.00 00:00:00
0.0057 — 00000000 #00000000 00000000
0.0052 — 0000-00-00
0.0047 — 0000/00/00
0.0029 — DDD MMM 00 0000
0.0027 — 0000-00-00 00:00:00 0000-00-00 00:00:00 0000-00-00 00:00:00 0000-00-00 00:00:00
0.0021 — 0000-00-00 00:00:00 0000-00-00 00:00:00
0.0016 — 0000-00-00T00:00:00+0000
0.0015 — 0000-00-00 00:00:00 0000-00-00 00:00:00 0000-00-00 00:00:00.000000
0.0015 — 00000000 #0000000 00000000
0.0014 — 00000000 #00000000 00000000 00000000
0.0013 — 00000000 #0000000 00000000 00000000
0.0013 — 00.00.0000 00:00:00 00.00.0000 00:00:00 00.00.0000 00:00:00 00.00.0000 00:00:00
0.0013 — 0000-00-00 00:00:00 0000-00-00 00:00:00 0000-00-00 00:00:00
0.0009 — 00-MMMM-0000
0.0008 — 00.00.0000 00:00:00 00.00.0000 00:00:00 00.00.0000 00:00:00 00.00.0000 00:00:00 00.00.0000 00:00:00
0.0008 — 0000-00-00 00:00:00 CLST
0.0006 — DDD MMM 0 0000
0.0005 — 00.00.0000 00:00:00 00.00.0000 00:00:00 00.00.0000 00:00:00
0.0005 — 0000-00-00 00:00:00 +00:00
0.0005 — 0000-00-00 00:00:00.000000 0000-00-00 00:00:00.000000 0000-00-00 00:00:00.000000
0.0004 — 0000-00-00 00:00:00.000000 0000-00-00 00:00:00 0000-00-00 00:00:00.000000
0.0004 — DDD MMMM 00 0000
0.0004 — 00.00.0000 00:00:00 00.00.0000 00:00:00 00.00.0000 00:00:00 00.00.0000 00:00:00 00.00.0000 00:00:00 00.00.0000 00:00:00
0.0003 — 0000-00-00 0000-00-00 0000-00-00
0.0003 — 00-MMM-0000 00:00:00
0.0003 — MMMM 00 0000 MMMM 00 0000 MMMM 00 0000
0.0002 — 0000-00-00T00:00:00+00:00
0.0002 — 00.00.0000 00:00:00
0.0002 — 00.00.0000 00:00:00 00.00.0000 00:00:00 00.00.0000 00:00:00 00.00.0000 00:00:00 00.00.0000 00:00:00 00.00.0000 00:00:00 00.00.0000 00:00:00
0.0002 — 00/00/0000 00:00:00
0.0002 — 00000000 #000000 00000000 00000000
0.0002 — 00000000 #000000 00000000
0.0002 — 00000000 #00000 00000000 00000000
0.0001 — 0000-00-00T00:00:00.000-00:00
0.0001 — 00/00/0000
0.0001 — MMMM  0 0000 MMMM 00 0000 MMMM 00 0000
0.0001 — 0000-00-00 00:00:00+00 
0.0001 — 0000-00-00 00:00:00 0000-00-00 00:00:00.000000 0000-00-00 00:00:00.000000
0.0001 — 0000-00-00T00:00:00.000000Z 0000-00-00T00:00:00Z
0.0001 — 0000-00-00T00:00:00-0000
0.0001 — 0000-00-00 00:00:00.000
0.0001 — 0000-00-00T00:00:00+0000Z
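
For anyone who wants to reproduce a tally like the one above, here is a minimal sketch of the masking step as described (digits become 0, month and day names become MMMM/MMM/DDDD/DDD). The names `dateShape` and `shapeFrequencies` are hypothetical and not part of the library, and the exact script used for the list above may well have differed.

```javascript
// Hypothetical helpers for illustration; not part of the library.
const MONTH_NAMES = ['January', 'February', 'March', 'April', 'May', 'June',
  'July', 'August', 'September', 'October', 'November', 'December'];
const DAY_NAMES = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
  'Saturday', 'Sunday'];

// Case-insensitive, whole-word matcher for a list of names.
const wordMatcher = (names) => new RegExp(`\\b(?:${names.join('|')})\\b`, 'gi');
// Three-letter abbreviations ("Jan", "Mon", ...).
const shortForms = (names) => names.map((name) => name.slice(0, 3));

// Reduce a raw date string to its "shape".
// Full names are replaced before short ones, so "May" (which is both a
// full and a 3-letter month name) ends up as MMMM in this sketch.
function dateShape(raw) {
  return raw
    .trim()
    .replace(wordMatcher(MONTH_NAMES), 'MMMM')
    .replace(wordMatcher(DAY_NAMES), 'DDDD')
    .replace(wordMatcher(shortForms(MONTH_NAMES)), 'MMM')
    .replace(wordMatcher(shortForms(DAY_NAMES)), 'DDD')
    .replace(/\d/g, '0');
}

// Count shapes and return [shape, fraction] pairs, most common first.
function shapeFrequencies(values) {
  const counts = new Map();
  for (const value of values) {
    const shape = dateShape(value);
    counts.set(shape, (counts.get(shape) ?? 0) + 1);
  }
  return [...counts.entries()]
    .map(([shape, count]) => [shape, count / values.length])
    .sort((a, b) => b[1] - a[1]);
}
```

Anything the masks do not cover, such as time zone abbreviations like CLST, passes through unchanged, which matches how those entries appear in the list.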

If there's an interest here, I might be able to contribute a PR. I would appreciate any pointers to previous commits which might offer guidance for how best to integrate a feature like this. Thanks!

@jonathansampson jonathansampson added the enhancement New feature or request label Feb 1, 2023
@AndreiIgna
Member

hey @jonathansampson

Date-normalization is something that should/will be added eventually. I started the library with the intention to add this, but along the way saw that there are a few other steps to complete first:

I'm still working on the first 2 steps, so I haven't even looked at date formats yet. If you want to tackle this, that would be great.

This library is used for https://dmns.app, and if you check domains there you'll see that dates are not always displayed correctly. Dates are handled by simply calling new Date(domain['Date Created']), so parsing fails sometimes.

Also other things I noticed:

  • dates have a common/good format for most gTLDs (.com, .app, .link, etc)
  • dates have different formats for ccTLDs (.it, .fr, .ly, etc) and these are the sources for DDD MMM 00 0000, 0000-00-00 00:00:00 CLST etc
  • the weirder dates, such as 0000-00-00 00:00:00.000000 0000-00-00 00:00:00 0000-00-00 00:00:00.000000, come from WHOIS data that is not properly parsed
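
To make the discussion concrete, here is a rough sketch of what a normalization step could look like in place of a bare new Date(...) call. Everything in it is an assumption for illustration: the function names (`normalizeDate`, `toIso`, `parseOffset`) are hypothetical, values without a time zone are treated as UTC, dotted DD.MM.YYYY values are read day-first, and fractional seconds are dropped. Ambiguous or unsupported inputs (slash-separated day/month order, named zones like CLST, several dates glued together) return null rather than a guess.

```javascript
// Hypothetical sketch; none of these names exist in the library.
const MONTH_ABBREVIATIONS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
  'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];

// Build an ISO 8601 UTC string, or null if the parts are not a real date/time.
function toIso(y, mo, d, h = 0, mi = 0, s = 0, offsetMinutes = 0) {
  if (h > 23 || mi > 59 || s > 59) return null;
  const date = new Date(Date.UTC(y, mo - 1, d, h, mi, s));
  // Date.UTC rolls over values like month 13 or day 32, so check the round trip.
  if (date.getUTCFullYear() !== y || date.getUTCMonth() !== mo - 1 ||
      date.getUTCDate() !== d) return null;
  // Local time at +01:00 is one hour ahead of UTC, so subtract the offset.
  return new Date(date.getTime() - offsetMinutes * 60000).toISOString();
}

// Convert "Z", "+00", "+0000" or "+00:00" into minutes east of UTC.
function parseOffset(tz) {
  if (!tz || tz.toUpperCase() === 'Z') return 0;
  const m = /^([+-])(\d{2}):?(\d{2})?$/.exec(tz);
  const minutes = Number(m[2]) * 60 + Number(m[3] ?? 0);
  return m[1] === '-' ? -minutes : minutes;
}

function normalizeDate(raw) {
  if (typeof raw !== 'string') return null;
  const text = raw.trim();
  const num = (value) => (value === undefined ? 0 : Number(value));
  let m;

  // YYYY-MM-DD, YYYY.MM.DD, YYYY/MM/DD with optional time, fraction and offset.
  m = /^(\d{4})[-./](\d{2})[-./](\d{2})(?:[T ](\d{2}):(\d{2}):(\d{2})(?:\.\d+)?)?\s*(Z|[+-]\d{2}(?::?\d{2})?)?$/i.exec(text);
  if (m) {
    return toIso(num(m[1]), num(m[2]), num(m[3]), num(m[4]), num(m[5]), num(m[6]),
      parseOffset(m[7]));
  }

  // DD.MM.YYYY with optional time (assumed day-first).
  m = /^(\d{2})\.(\d{2})\.(\d{4})(?: (\d{2}):(\d{2}):(\d{2}))?$/.exec(text);
  if (m) return toIso(num(m[3]), num(m[2]), num(m[1]), num(m[4]), num(m[5]), num(m[6]));

  // DD-MMM-YYYY or DD-MMMM-YYYY with optional time.
  m = /^(\d{2})-([a-z]+)-(\d{4})(?: (\d{2}):(\d{2}):(\d{2}))?$/i.exec(text);
  if (m) {
    const month = MONTH_ABBREVIATIONS.indexOf(m[2].slice(0, 3).toLowerCase()) + 1;
    if (month === 0) return null;
    return toIso(num(m[3]), month, num(m[1]), num(m[4]), num(m[5]), num(m[6]));
  }

  // YYYYMMDD.
  m = /^(\d{4})(\d{2})(\d{2})$/.exec(text);
  if (m) return toIso(num(m[1]), num(m[2]), num(m[3]));

  return null; // unknown or ambiguous shape: let the caller keep the raw value
}
```

Returning null for the multi-date strings seems safer than picking one of the dates, since those come from records that were not parsed properly in the first place.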

@jonathansampson
Contributor Author

Thank you for the detailed response. Permit me a possibly silly question, but is it safe to assume a TLD will only ever have a single structure style? For instance, I'm noticing the library doesn't presently parse .it results accurately (e.g. it blends the created property for the Registrant in with the created property for the domain. The registrant properties are preceded by white-space, and follow the Registrant line). I suspect creating a parser for this style would be fairly simple, but I wondered if the parser would apply to all .it domains, or if some endpoints may return a different document structure.
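
To illustrate the kind of parser I have in mind, here is a rough sketch that keeps indented properties under the header line that precedes them, so a registrant's Created no longer collides with the domain's Created. The function name `parseIndentedWhois`, the output layout, and the sample record are all made up for illustration; real .it responses may contain details this does not handle (multi-line addresses are simply skipped here).

```javascript
// Hypothetical sketch; not the library's parser.
function parseIndentedWhois(text) {
  const result = { fields: {}, sections: {} };
  let section = null;

  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === '') continue;

    const indented = /^\s/.test(line);
    const match = /^\s*([^:]+?):\s*(.*)$/.exec(line);

    // A non-indented line without "key: value" starts a section, e.g. "Registrant".
    if (!indented && !match) {
      section = line.trim();
      result.sections[section] = {};
      continue;
    }

    // Indented lines without a key (continuation lines) are skipped in this sketch.
    if (!match) continue;

    const key = match[1].trim();
    const value = match[2].trim();

    if (indented && section) {
      result.sections[section][key] = value;
    } else {
      // A non-indented "key: value" line belongs to the domain itself.
      section = null;
      result.fields[key] = value;
    }
  }
  return result;
}

// Made-up sample in the indented style described above.
const sampleRecord = [
  'Domain:             example.it',
  'Created:            2001-05-10 00:00:00',
  '',
  'Registrant',
  '  Organization:     Example S.r.l.',
  '  Created:          2015-03-02 19:04:02',
].join('\n');

const parsedRecord = parseIndentedWhois(sampleRecord);
console.log(parsedRecord.fields.Created);
console.log(parsedRecord.sections.Registrant.Created);
```

With this layout the two created values stay separate: one under fields and one under sections.Registrant.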

@jonathansampson
Contributor Author

Issued a PR to address this issue in part: #92

@AndreiIgna
Member

From what I've seen so far, a big chunk of TLDs return their data in a single structure and format. There is a standard format that is shared by .com, .net and many other gTLDs.

The problem is with old TLDs, and most notably with country TLDs. After a parser is added, like for .it (👍 thanks), we can assume the data will stay in that format.
