Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Form-Data not treated as binary when request is intercepted #20143

Open
mlauinger opened this issue Feb 10, 2022 · 10 comments · Fixed by #20144
Open

Form-Data not treated as binary when request is intercepted #20143

mlauinger opened this issue Feb 10, 2022 · 10 comments · Fixed by #20144
Labels
E2E Issue related to end-to-end testing topic: cy.intercept() Triaged Issue has been routed to backlog. This is not a commitment to have it prioritized by the team. type: bug

Comments

@mlauinger
Copy link
Contributor

Current behavior

Currently, when you intercept a request with a body of type multipart/form-data which includes a binary file, as well as a considerable amount of text, it is possible that the body will not be detected as type binary. Therefore the toString() method is called on the body which breaks the binary file for server-side processing.

See request.ts

I think this issue was first introduced with cypress 7.0.0, at least the issue occured for us after upgrading from 6.7.1 to 7.0.0. It does not matter if you do anything with the interceptor or not. In our case we just wanted to wait for the response before continuing the test.

Desired behavior

Form-Data bodies should be treated as binary.

Test code to reproduce

Send a form-data request with two fields, one small binary file (e.g. an empty PDF) and one field containing around 1000 words from Lorem Ipsum.

Run the test with DEBUG=cypress:net-stubbing* and you wont see the debug line eq.body contained non-utf8 characters, treating as binary content

Reduce the text fields content to like 100 words and you will see the debug message.

Cypress Version

7.0.0+

Other

No response

@BlueWinds
Copy link
Contributor

Thanks for the report, and for opening a pull request! We'll get a second reviewer from the team here in the next few days, and try to get it merged in time for the next release (currently planned as 9.5.1).

@flotwig
Copy link
Contributor

flotwig commented Feb 23, 2022

@mlauinger Merging #20144 revealed some unintended breakage in cypress-example-recipes that's indicative of real-world breakage for users of cy.intercept(): https://circleci.com/gh/cypress-io/cypress/1283292?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

The patch to cypress-example-recipes required for it to pass after #20144: cypress-io/cypress-example-recipes@149ad0e

In light of this, we're going to revert #20144 for the time being, and look for another solution.


The main issue here seems to be our use of istextorbinary. From the README:

The contents check (with the default options) will check 24 bytes at the start, middle, and end of the buffer. History has shown that checking all three locations is mandatory for accuracy, and that anything less is not accurate. This technique offers superior performance while still offering superior accuracy. Alternatives generally just do 1000 bytes at the start, which is slower, and inaccurate.

This has the end effect of making the types of req.body and res.body very unpredictable - it could be a string, or it could be a buffer, depending on this heuristic which is totally dependent on things like the ordering of the parts in a multipart/form-data body and the length of text and/or binary components.

I think that it would make sense to do 2 things here to fix this behavior:

  1. Get rid of istextorbinary since it makes the behavior hard to predict, and
  2. Add some API to req/res for users to retrieve the body as a buffer or as a string. Here is a non-exhaustive list of API options:
    a. Make req/res.body always a string. Add req/res.getBodyBuffer() for retrieving the raw bytes.
    b. Make req/res.body always an ArrayBuffer. Add req/res.getBodyString() for decoding it as a UTF8 string.
    I think (a) would be more preferable since the use case of body being a string is probably more common than that of it being binary content.

Additionally, a future improvement to cy.intercept() could automatically detect multipart/form-data responses and use something like multer to give users a nice API to interact with the different parts, and could automatically determine binary/string based on each part's MIME type. Like req/res.formData.attachement. But, this doesn't cover all use cases for binary message bodies, so I think it is less important than just getting (1) and (2) fixed.

Thoughts @mlauinger @BlueWinds ?

@BlueWinds
Copy link
Contributor

The main issue here seems to be our use of istextorbinary. From the README:

This has the end effect of making the types of req.body and res.body very unpredictable - it could be a string, or it could be a buffer, depending on this heuristic which is totally dependent on things like the ordering of the parts in a multipart/form-data body and the length of text and/or binary components.

I think that it would make sense to do 2 things here to fix this behavior:

1. Get rid of `istextorbinary` since it makes the behavior hard to predict, and

2. Add some API to `req/res` for users to retrieve the body as a buffer or as a string. Here is a non-exhaustive list of API options:
   a. Make `req/res.body` always a `string`. Add `req/res.getBodyBuffer()` for retrieving the raw bytes.
   b. Make `req/res.body` always an `ArrayBuffer`. Add `req/res.getBodyString()` for decoding it as a UTF8 string.
   I think (a) would be more preferable since the use case of `body` being a string is probably more common than that of it being binary content.

I think this all sounds very reasonable. I was noticing istextorbinary while reviewing this PR, and thinking much the same thing - the unpredictability in our API is a not particularly friendly. I'm a fan of req.body and req.raw being for strings and buffers respectively.

Additionally, a future improvement to cy.intercept() could automatically detect multipart/form-data responses and use something like multer to give users a nice API to interact with the different parts, and could automatically determine binary/string based on each part's MIME type. Like req/res.formData.attachement. But, this doesn't cover all use cases for binary message bodies, so I think it is less important than just getting (1) and (2) fixed.

This seems less important, as you said, and I'm not sure adding an additional library is necessary on our end. Something to consider for a later date.

@strigefleur
Copy link

Meanwhile, what would be the current way to mock multipart/form-data response within intercept scope?

Doesn't seem to work atm:

const formData = new FormData();
formData.append('some', 'text');

cy.intercept('some-api', formData);

@Harshithkumar
Copy link

Still the Issue persist in 12.17.1V as well. Can somebody help and fix this issue ?

@nagash77 nagash77 added E2E Issue related to end-to-end testing Triaged Issue has been routed to backlog. This is not a commitment to have it prioritized by the team. labels Jul 18, 2023
@jordymolenaar
Copy link

Just encountered this in v13.7.3
Adding an Authorization header with cy.intercept breaks our pdf file upload!

@Tzinov15
Copy link

I think we are hitting something similar. We have a GraphQL request going out (using graphql-request) which automatically detects a File type parameter and updates the graphql request to be include a Content-Type header of multipart/form-data. When trying to mock this (and in particular use Cypress's recommended path of alias GraphQL requests using req.body.operationName), this particular upload file request gets intercepted and it's body is a ArrayBuffer which we can't do anything with

image

as opposed to our other mocked requests that come through fine

image

and have a readable operationName

@stoonya
Copy link

stoonya commented Nov 8, 2024

I think we are hitting something similar. We have a GraphQL request going out (using graphql-request) which automatically detects a File type parameter and updates the graphql request to be include a Content-Type header of multipart/form-data. When trying to mock this (and in particular use Cypress's recommended path of alias GraphQL requests using req.body.operationName), this particular upload file request gets intercepted and it's body is a ArrayBuffer which we can't do anything with

image as opposed to our other mocked requests that come through fine image and have a readable `operationName`

Not sure if it works, but you could do something like this:

function isFormData(request) {
  return request.headers['content-type'].toString().includes('form-data')
}

cy.intercept('POST', '<your GQL relative path>', (req) => {
  if(!isFormData) {
    // your regular mocking logic for JSON-type requests
  } else {
    const bodyString = Cypress.Blob.arrayBufferToBinaryString(req.body)
    
    // parse and work with that string
    // if needed, you can do the opposite conversion: Cypress.Blob.binaryStringToArrayBuffer
  }
}

@Tzinov15
Copy link

Ah, this would work, slightly cleaner than our current approach which is to just check if the operationName field can be read off the body and if not, assume it's a form-data request. Thanks! Regardless, would be great if Cypress could handle this

@oscarmangan
Copy link

Also facing an issue with this v13.8.1. We are uploading a file via a multipart/form-data request with other fields and the cy.intercept() we need to use is corrupting the file. In DevTools we're seeing characters like 'ÿ' replaced with '�' and the server-side processing is failing to parse correctly.

Can this be fixed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E2E Issue related to end-to-end testing topic: cy.intercept() Triaged Issue has been routed to backlog. This is not a commitment to have it prioritized by the team. type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.