#3869 - Federal Restrictions - Unzip ZIP files coming from SFTP #4163
  const fileSearch = new RegExp(
-   `^${this.esdcConfig.environmentCode}CSLS\\.PBC\\.RESTR\\.LIST\\.D[\\w]*\\.[\\d]*$`,
+   `^${this.esdcConfig.environmentCode}CSLS\\.PBC\\.RESTR\\.LIST\\.D[\\w]*\\.[\\d]*\\.(zip|ZIP)$`,
Not enforcing the zip extension would allow the process to download the file either way, compressed or not.
I do not see an AC requiring one or the other, and I do not see a reason to restrict it.
Not a blocker.
Not enforcing the .zip will also lead to not enforcing the end of the file name, by removing the existing $. I see the benefit of it working either way, but should we remove the $?
Let me know if I am missing something here.
As we talked, IMO, we do need to enforce the precise ending of the string but I am ok either way.
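One way to reconcile both points in this thread, sketched under assumptions: keep the trailing $ but make the extension optional, so the pattern accepts compressed and uncompressed names while still enforcing the end of the string. The helper name buildRestrictionFileRegex and the environment code "P" are hypothetical and used only for illustration; the PR itself enforces the (zip|ZIP) suffix.

```typescript
// Hypothetical helper, not part of the PR: builds the file-name pattern
// for a given environment code with an OPTIONAL .zip/.ZIP suffix.
function buildRestrictionFileRegex(environmentCode: string): RegExp {
  return new RegExp(
    `^${environmentCode}CSLS\\.PBC\\.RESTR\\.LIST\\.D[\\w]*\\.[\\d]*(\\.(zip|ZIP))?$`,
  );
}

// "P" is an assumed environment code used only for this example.
const optionalZipSearch = buildRestrictionFileRegex("P");

const candidates = [
  "PCSLS.PBC.RESTR.LIST.D240101.001", // uncompressed, accepted
  "PCSLS.PBC.RESTR.LIST.D240101.001.zip", // compressed, accepted
  "PCSLS.PBC.RESTR.LIST.D240101.001.ZIP", // compressed, accepted
  "PCSLS.PBC.RESTR.LIST.D240101.001.zip.bak", // rejected: $ still enforces the ending
];
const matches = candidates.map((name) => optionalZipSearch.test(name));
console.log(matches);
```

Because the $ anchor stays in place, trailing junk after the sequence number or the extension is still rejected.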
  readStreamOptions: { encoding: FILE_DEFAULT_ENCODING },
});
let fileContent: string | NodeJS.WritableStream | Buffer;
const fileExtension = path.parse(remoteFilePath).ext.toLocaleLowerCase();
Any particular reason to use toLocaleLowerCase instead of the regular toLowerCase?
That was a mistake 😂 intention was toLowerCase
As an alternative, the below was used in the past.
path.extname(file.originalname).toLowerCase()
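For reference, a minimal sketch of that extension check applied to a remote path. The function name isZipFile and the sample paths are assumptions for illustration, not code from the PR.

```typescript
import * as path from "node:path";

// Hypothetical helper: decides whether a remote file should be treated
// as a compressed archive, based only on its (case-insensitive) extension.
function isZipFile(remoteFilePath: string): boolean {
  // path.extname returns the last extension including the dot, e.g. ".ZIP".
  return path.extname(remoteFilePath).toLowerCase() === ".zip";
}

// Sample paths are made up for this example.
const zipCheck = isZipFile("/inbox/PCSLS.PBC.RESTR.LIST.D240101.001.ZIP");
const plainCheck = isZipFile("/inbox/PCSLS.PBC.RESTR.LIST.D240101.001");
console.log(zipCheck, plainCheck);
```

Since the extension is only compared against an ASCII literal, toLowerCase is sufficient and avoids any locale-dependent behaviour.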
fileContent = await client.get(remoteFilePath, undefined, {
  readStreamOptions: { encoding: FILE_DEFAULT_ENCODING },
});
Similar to the compressed get at line 167, the result below can be cast.
// Read all the file content and create a buffer with 'ascii' encoding.
fileContent = (await client.get(remoteFilePath, undefined, {
readStreamOptions: { encoding: FILE_DEFAULT_ENCODING },
})) as string;
This would allow fileContent to be declared as string (which it really is) instead of let fileContent: string | NodeJS.WritableStream | Buffer; which can be misleading.
const zipFile = new AdmZip(compressedFileBuffer);
const [firstExtractedFile] = zipFile.getEntries();
if (!firstExtractedFile) {
  throw new Error("No files found in zip file");
Please add the period.
/**
 * Reads the first extracted file from a compressed archive file.
 * @param compressedFileBuffer
Missing parameter comment.
Great work, please take a look at the comments.
const compressedFileContent = (await client.get(
  remoteFilePath,
  undefined,
  { readStreamOptions: { encoding: null } },
👍
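To illustrate why the null encoding matters for the compressed read, here is a small sketch using only Node.js Buffer. It assumes FILE_DEFAULT_ENCODING is 'ascii', as the review comment above suggests, and uses made-up bytes rather than a real archive. Decoding binary data as 'ascii' clears the high bit of every byte, so the original bytes cannot be recovered.

```typescript
import { Buffer } from "node:buffer";

// Made-up binary payload: the ZIP signature "PK\x03\x04" followed by two
// bytes with the high bit set, as commonly found in compressed data.
const binaryBytes = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0x80]);

// Decoding as 'ascii' unsets the highest bit of each byte.
const decodedAsAscii = binaryBytes.toString("ascii");

// Converting the string back does not restore the original bytes.
const roundTripped = Buffer.from(decodedAsAscii, "ascii");

const survivedRoundTrip = roundTripped.equals(binaryBytes);
console.log(survivedRoundTrip); // false: 0xff became 0x7f and 0x80 became 0x00
```

Reading with encoding null keeps the raw bytes, which is what a ZIP parser needs.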
Quality Gate passed
Great research, great work, looks good 👍
LGTM, nice work @dheepak-aot
Federal Restrictions - Unzip ZIP files coming from SFTP
[...] .zip and .ZIP format.
[...] .zip file.
[...] sftp-integration-base to use encoding null while reading only the compressed file to avoid data corruption.
[...] .zip archive.
Technical Investigations and performance findings
APPROACH 1
Based on the documentation and on testing, the Node.js built-in zlib library can compress and extract gzip (.gz) files, but it cannot read the .zip archive format.
It therefore does not support the same operations on .zip files.
Extracting .zip with Zlib Gunzip (Doesn't support)
Extracting .gz with Zlib Gunzip (Works Perfectly)
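The Approach 1 finding can be reproduced with a short sketch like the one below. It is not the code from the investigation: the sample content is made up, and the "zip" buffer holds only the first bytes of a ZIP local file header rather than a full archive, which is enough for gunzip to reject it.

```typescript
import * as zlib from "node:zlib";
import { Buffer } from "node:buffer";

// Made-up sample content standing in for a restrictions file.
const originalContent = Buffer.from("record 1\nrecord 2\n", "ascii");

// gzip data round-trips through zlib without issues.
const gzipped = zlib.gzipSync(originalContent);
const restored = zlib.gunzipSync(gzipped);
const gzipRoundTripOk = restored.equals(originalContent);

// A .zip archive starts with "PK\x03\x04", not the gzip magic bytes 0x1f 0x8b,
// so gunzip rejects it as soon as it inspects the header.
const zipHeaderOnly = Buffer.from([
  0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x00, 0x00, 0x08, 0x00,
]);
let gunzipZipError: Error | undefined;
try {
  zlib.gunzipSync(zipHeaderOnly);
} catch (error: unknown) {
  gunzipZipError = error as Error;
}

console.log(gzipRoundTripOk, gunzipZipError !== undefined);
```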
APPROACH 2 - Third party library
https://github.com/cthackers/adm-zip
Tested code (not the final code)
It also provides a non-blocking method to read data (getDataAsync).
It works perfectly.
Tested the upload with a 139 MB file containing around 140,000 records.
Time taken by the library to read the file: 666 ms.