Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backend API and iCE40-usbtrace support #193

Merged
merged 24 commits into from
Oct 11, 2024

Conversation

martinling
Copy link
Member

This PR defines a common backend API for USB capture devices, and adds support for the iCE40-usbtrace device.

The changes are made on top of a rebased version of #190. Tested with Cynthion only at this stage; @asdfuser and @smunaut, please test this with iCE40-usbtrace!

I have broken the changes to the ice40usbtrace backend down into as many commits as possible, so if I've broken anything hopefully it's easy to bisect. However, the final commit that ports everything to the new API is still a fairly big change.

The backend API is not very generic; it assumes that capture devices will work quite similarly to the two examples we currently have. That allows for sharing a lot of code, but means we might have to revisit the design in future if we need to support devices that work substantially differently. I expect it will suffice for OpenViszla, though.

Daniel Willmann and others added 21 commits September 23, 2024 14:11
Some structs that are used in ui need to be accessible from other
backends as well.

Speed, InterfaceSelection, (Cynthion/Device)Usability
Move/rename CynthionStop -> BackendStop
ice40usbtrace has its own header structure so we need to regenerate the
original USB packet including checksum.
This simply switches out the Cynthion for the Ice40usbtrace
Use iCE40-usbtrace for human-facing text and Ice40Usbtrace for types,
as per upstream repo and Rust conventions.
The error is still just printed out, for now.
This was just adding some extra indirection around parse_packet.
@smunaut
Copy link

smunaut commented Sep 23, 2024

Worked fine for me 👍

@smunaut
Copy link

smunaut commented Sep 23, 2024

BTW as a side note there is a couple of improvements that could be made to the ice40usb trace support.
I don't think they're required in a first step, but just mentioning in case they influence the backend API.

It has an interrupt end point that ideally should be monitored that reports :

  • The current buffer fill status ( which could be a UI element )
  • The current running state of the device ( Stopped, Capturing, Flushing ) and some flags ( like "Capture Stopped because of overflow ).

@smunaut
Copy link

smunaut commented Sep 23, 2024

One thing that might be fixing here though is that in case of overflow, the usb stream end.

You get a popup :

Sender was dropped
caused by: oneshot cancelled (Canceled)

But then any attempt to click the "start capture" button again just ends with :

No such file or directory (os error 2)

@martinling
Copy link
Member Author

martinling commented Sep 24, 2024

It has an interrupt end point that ideally should be monitored that reports :

  • The current buffer fill status ( which could be a UI element )

  • The current running state of the device ( Stopped, Capturing, Flushing ) and some flags ( like "Capture Stopped because of overflow ).

That's good to know, and indeed may have some impact on the API surface. I had guessed that there were features like this from the commented entries in Command.

The buffer fill status is something I've been planning to add to Cynthion as well, along with a level indicator in the UI, so that's definitely something that's in scope. I think a good way to do that would be for the backend to be given an additional mpsc::Sender that it can send buffer level status updates to. The UI code would have the other end of that channel and use it to update the display.

The state information is a bit different because we'll need to stop the capture when it hits the overflow state. I think perhaps the solution there is that we clone the stop_tx sender that we're giving to the UI, and give it to the backend (by passing it from start to run_capture to begin_capture) so that the backend is able to stop itself when overflow occurs.

The difference between Cynthion and iCE40-usbtrace here is that on Cynthion, we plan to keep all data flow happening on a single bulk endpoint, with multiplexing of different data types done in-band within the stream format. The reason for that approach is that it maximises the achievable throughput on the bus, as the host only has to poll one endpoint.

So on Cynthion we'd give that copy of stop_tx to the CynthionStream type, and when it sees the capture overflow event in the bulk stream, it can signal it to stop the capture.

The iCE40-usbtrace backend would instead need to be making interrupt transfers on a second endpoint. I think the API already allows for that though. One way to do it would be to just spawn a separate worker thread for the interrupt transfers in begin_capture, and keep a stop signal and join handle for that thread in a field on Ice40UsbtraceHandle (the type would need to be something like Arc<Option<(mpsc::Sender, JoinHandle)>> because it has to be both Clone and Sync). Then in end_capture it would signal that worker to stop, and join the thread. In the overflow case, the interrupt worker thread would signal its copy of stop_tx, which would unblock the run_capture thread and trigger the teardown.

That would leave us running three threads for one relatively simple USB device, but the extra thread doesn't really cost us anything, and Rust makes the concurrency easy to get right. And it keeps the rest of the code common to both backends.

@martinling
Copy link
Member Author

martinling commented Sep 24, 2024

One thing that might be fixing here though is that in case of overflow, the usb stream end.

You get a popup :

Sender was dropped
caused by: oneshot cancelled (Canceled)

But then any attempt to click the "start capture" button again just ends with :

No such file or directory (os error 2)

We definitely need to fix that before this can be merged. I'm not sure I understand exactly how this happens yet though.

The message comes from BackendHandle::run_capture, which was running block_on(stop_rx) to wait for the UI to stop the capture. The error indicates that the UI has dropped the BackendStop structure that held the corresponding oneshot::Sender named stop_tx.

Presumably that is happening when the decoder thread spawned from ui::start_capture assigns ui.stop_state = StopState::Disabled.

Which in turn implies that the read_packets closure has finished, either because the PacketIterator it was using returned None, or because it returned Some(Err(...).

But nothing should have shut down the transfer queue at this point, so the best explanation I see for that iteration ending is that parse_packet returned a ParseError.

Is there some unusual header that's sent on the bulk endpoint in the event of an overflow?

It might be useful to try ignoring that error, by replacing this line in BackendHandle::run_capture:

block_on(stop_rx)
    .context("Sender was dropped")?;

with

let _ = block_on(stop_rx);

That may allow things to progress to the point where some other error is reported.

As well as simplifying the code, this ensures that a BackendStop is
not dropped in the case where the PacketIterator ended early with an
error.
There's little point in raising an error in these cases. Whatever has
happened, the best we can do is to proceed with the rest of shutdown.
@martinling
Copy link
Member Author

@smunaut I've pushed a couple of further commits, which should avoid the Sender was dropped message.

Could you try the overflow case again and let me know what happens now?

@smunaut
Copy link

smunaut commented Oct 5, 2024

@martinling Sorry it took me a while to test.

I think there is a bug in the firmware so I wouldn't worry about the handling of the overflow case ATM ...
But I'd need a USB analyzer on my USB analyzer to debug the fw and I just don't have that available ATM.

@smunaut
Copy link

smunaut commented Oct 5, 2024

Ok, I ended up fixing the firmware (for the most part ...).

So what happens now in case of overflow is you'll get either a short packet ( < 64 bytes ) or a ZLP if it ends up shutting down on a packet boundary. ( During normal operation you'll always get full packets only ).

Now, packetry doesn't notice this (which is fine) and so it just thinks it's still capturing.

Pressing stop result in a message about write error / STALL (because the stop request will fail given hw is already stopped ).
Attempts to click start again yield resource busy error (os error 16).

@martinling
Copy link
Member Author

So what happens now in case of overflow is you'll get either a short packet ( < 64 bytes ) or a ZLP if it ends up shutting down on a packet boundary. ( During normal operation you'll always get full packets only ).

OK. I suppose what we could do there would be to detect the short packet in TransferQueue::process, where it should result in completion.data.len() being less than self.transfer_length, since a short packet ends a bulk transfer.

However, in the current architecture we can't do a lot once we've detected that, because the BackendHandle::run_capture thread isn't going to look at the join result until after it's been told to stop by the UI.

So the API is missing a way for the queue worker thread to signal that the device has stopped of its own accord. I'd anticipated that happening from the decoder thread rather than the queue worker thread, but I guess we need a way to signal it from either of them, since there's also the case where a bulk transfer error happens (e.g. device disconnected).

Pressing stop result in a message about write error / STALL (because the stop request will fail given hw is already stopped ).

I think we'll have to just catch that STALL case and ignore it. Even in the case where the UI has initiated the shutdown, there's still the possibility that the device overruns before the host gets around to sending the stop command.

Attempts to click start again yield resource busy error (os error 16).

I believe this is because the worker thread running BackendHandle::run_capture exits when calling self.end_capture()? because of the STALL, and therefore never stops the worker thread which is running TransferQueue::process.

So the queue worker thread is left running, and it still has the Interface handle open, so a new Interface can't be opened.

Seems like the next step is to just ignore the STALL and that should avoid this.

@martinling
Copy link
Member Author

@smunaut can you test again please?

I've pushed a change which always ignores a STALL response for the CaptureStop and BufferFlush commands only.

This should fix the error on shutdown after the device has stopped of its own accord due to an overflow, and hopefully should mean you're able to restart capture again afterwards. If not, let me know what still goes wrong.

Note that it's also no longer ignoring errors other than Stall when begin_capture sends CaptureStop and BufferFlush at startup.

I don't think that should break anything, as I'm assuming that ignoring errors there was due to a Stall being possible in both cases due to the device already being stopped and flushed. But I'd rather we tighten it up, since there are other TransferError types that we shouldn't ignore (e.g. Disconnected).

@smunaut
Copy link

smunaut commented Oct 8, 2024

@martinling Just tried, works fine. After an overflow you can slop and re-start with no issues.

@martinling
Copy link
Member Author

Great! In that case, I think this is ready for final review and merge.

@martinling martinling requested a review from miek October 8, 2024 15:27
@martinling martinling mentioned this pull request Oct 8, 2024
@miek miek merged commit b9b5d6b into greatscottgadgets:main Oct 11, 2024
10 checks passed
@martinling martinling mentioned this pull request Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants