[RFC] Shuttler GW/SW requirements specification #11

Open
AAWO opened this issue May 27, 2020 · 24 comments

@AAWO

AAWO commented May 27, 2020

I've read through the issues regarding Shuttler and came up with a list of requirements. The final Shuttler specification will also be a foundation for a future ASIC (multi-channel DAC compatible with Sinara/ARTIQ) specification.

  • 18 DACs (LTC2000) - PHY: 16 bit parallel output + clock each
  • DRTIO Satellite interface
  • DAC data rate = RTIO frequency -> single clock domain for DACs around 150MHz?
  • data from DRTIO (all branches up to memory limit) is passed to SDRAM memory
  • current waveform branch data is passed from SDRAM to proper DAC's sequencer FIFO - is such functionality already implemented in ARTIQ?
  • 32-bit (16-bit?) timer counting until end point (received from DRTIO)
  • each DAC's output waveform sample is updated when next sample's timestamp value is equal to current timer value
  • a calibration procedure is needed to determine the latency between Master and Satellite's DAC output
  • the raw data received from DRTIO is later on sent to DACs directly, or some DDS/interpolation is required?
  • a single data packet received from DRTIO is (at proper time point) sent to DAC and dropped from memory/FIFO
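
A quick sanity check on two of the numbers in the list above may help the discussion: how long a 16-bit or 32-bit timer runs before wrapping, and how much raw sample data 18 DACs would consume. This is only a back-of-the-envelope sketch; the 150 MHz clock and the 16-bit word width come from the list, and the function names are made up for illustration.

```python
# Back-of-the-envelope numbers for the list above (illustrative only).
N_DACS = 18            # from the proposal
BITS_PER_SAMPLE = 16   # parallel word width per DAC, from the proposal
F_CLK_HZ = 150e6       # "around 150 MHz" RTIO/DAC clock, as proposed


def timer_span_seconds(width_bits: int, f_clk_hz: float) -> float:
    """Longest interval a free-running counter covers before it wraps."""
    return (2 ** width_bits) / f_clk_hz


def raw_sample_rate_bits_per_s(n_dacs: int, bits: int, f_clk_hz: float) -> float:
    """Aggregate bit rate if every DAC received a new raw sample each clock."""
    return n_dacs * bits * f_clk_hz


span16 = timer_span_seconds(16, F_CLK_HZ)
span32 = timer_span_seconds(32, F_CLK_HZ)
raw_rate = raw_sample_rate_bits_per_s(N_DACS, BITS_PER_SAMPLE, F_CLK_HZ)

print(f"16-bit timer wraps after {span16 * 1e6:.1f} us")
print(f"32-bit timer wraps after {span32:.2f} s")
print(f"raw sample stream: {raw_rate / 1e9:.1f} Gbit/s")
```

Under these assumptions a 16-bit timer covers well under a millisecond, and streaming raw samples to every DAC on every clock needs tens of Gbit/s, which is one argument for a compact waveform representation or for updating outputs only at sparse timestamps.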

I imagine the operation such that Master sends all experiment's data branches (DAC output value + timestamp + DAC ID + some branch ID) to Shuttler which stores it in the SDRAM - the amount of data sent by Master is constrained by the SDRAM. Then Master can send just current branch ID to Satellite.
Shuttler writes current data branch from SDRAM to required DACs' sequencers FIFOs. If the branch is changed before all data from FIFO is read, then FIFO is cleared and new data is written to it.
The timer can be started after Master's request. The timer's initial value should be dependent on the latency between Master and Shuttler's output as well as the delay between sending the 'start timer' request to next Shuttlers (which should be possible to determine by Master). This way all Shuttlers' timers should be aligned.
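
To make the proposed operation easier to discuss, here is a small behavioural model of one DAC channel in Python. It is only a sketch of the idea described above, not existing ARTIQ functionality: `Sample`, `ChannelSequencer` and `initial_timer_value` are hypothetical names, a plain dict stands in for the SDRAM branch store, and the timer alignment rule (sum of the two delays) is an assumption.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    timestamp: int   # timer value at which the value should appear
    value: int       # raw DAC code


class ChannelSequencer:
    """Behavioural model of one DAC channel: branch store, FIFO, output register."""

    def __init__(self, branches):
        # branches: dict mapping branch ID -> list of Sample sorted by timestamp
        # (stands in for the SDRAM contents sent by the Master)
        self.branches = branches
        self.fifo = deque()
        self.output = 0

    def select_branch(self, branch_id):
        # Changing the branch clears whatever is still queued, then refills.
        self.fifo.clear()
        self.fifo.extend(self.branches[branch_id])

    def tick(self, timer):
        # Update the output when the next sample's timestamp equals the timer,
        # and drop that sample from the FIFO.
        if self.fifo and self.fifo[0].timestamp == timer:
            self.output = self.fifo.popleft().value
        return self.output


def initial_timer_value(output_latency_cycles, start_request_delay_cycles):
    # Assumed alignment rule: start ahead by the Master-to-output latency plus
    # the delay with which this Shuttler receives its 'start timer' request.
    return output_latency_cycles + start_request_delay_cycles


seq = ChannelSequencer({
    0: [Sample(2, 100), Sample(5, 200)],
    1: [Sample(4, -50)],
})
seq.select_branch(0)
trace = [seq.tick(t) for t in range(4)]
seq.select_branch(1)                      # flushes the pending Sample(5, 200)
trace += [seq.tick(t) for t in range(4, 7)]
print(trace)
```

One thing the model makes visible: with a strict equality comparison, a sample whose timestamp is already in the past blocks the FIFO forever, so a real design would need a rule for late samples.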

Part of the GW (namely: DAC handling) is going to be designed in Verilog, because later on it may be used as a test field for ASIC development.

@dhslichter
Member

FWIW, although I have been occupied with other things these past couple of months, my current plan is to try to make a Shuttler design that is EEM/Kasli compatible, similar to how Phaser has been developed. If others are interested in pursuing the specifications above, perhaps we should branch or rename the EEM version to something else?

@gkasprow
Member

Several questions arise:

  • would you need DRTIO? In new CPCIS form-factor, together with ZUS+ the DRTIO in CPCIS will be supported. However, in EEM we would have a faster version of Fastino. Of course, we can connect DRTIO over SFP
  • How many channels? We probably won't fit 24. On the other hand, if we use XCKU035-1FFVA1156C, it's the same chip as CERN is using here. We will get very good pricing, comparable with Artix
  • what is the use case? how many channels would you need in total?
  • if we go for CPCIS natively, we would get extra real estate and 24 channels would be feasible.

@dhslichter
Member

Going for EEM would mean reduced channel count; I am also looking at ways to reduce power consumption by choosing different DACs and amps. We were doing some DAC testing in the lab before the quarantine but that has been stopped, unfortunately.

If the CPCIS form factor works, that's great, but I am hesitant to be the guinea pig here. My preference would be to develop a board as an EEM, which would allow us to get up and running, and ease the debugging process (as with Fastino and Phaser). Down the road, if the design is proven out and there are advantages to using a single card as a DRTIO satellite, rather than running over EEM from Kasli, one could port a lot of the design over.

Total channel count desired would be something like ~100 for a given setup, possibly a bit more. I think the primary limitation will be power dissipation in the rack, and thermal management. I don't think we need 24 channels per card, 12-16 would probably be just fine.

@gkasprow
Member

gkasprow commented Jun 3, 2020

@dhslichter We have already received a research grant for the development of the Shuttler HW & GW. It clearly states that the design should support MTCA, so we must somehow cope with that.
What if we make it in FMC form-factor? Then we could use it with the low-cost AFC carrier and also with the CPCIS FMC carrier. AFC/AFCK is a well-tested solution (a few hundred in operation), and we are now working on the 4th revision together with a few research institutions (CERN, LNLS, GSI, WUT).
The CERN CPCIS carrier uses very low cost (special Xilinx discount) Kintex US FPGA, already supported by ARTIQ.
With such an approach we could further fund the Shuttler development, meeting both funding agency goals as well as your requirements. We are working intensively with CERN on CPCIS, have suitable funding so soon the entire EEM ecosystem will be migrated to this ecosystem keeping full compatibility with existing EEM modules.
I already proved that 16-channel * (ADC + CFD + TDC) together with SAS connector is feasible on single FMC HPC.
I think 12-16 DAC channels per FMC can be done easily. I did a simple trick - the power section is placed on tiny mezzanine plugged to the FMC board and secured with 2 screws. Both boards are produced simultaneously with no additional cost.
With such approach, we will lower the cost significantly due to the economy of scale of AFC. The design and verification of such DAC would also be much lower risk and cost.
What we can do is to design two FMCs - one with LTC2000 and another with lower-grade DAC and see what approach is feasible in a real application.
In the past I already did 16-channel 14-bit FMC DAC and even managed to fit 4 ADC channels and some IOs.

@kaolpr
Member

kaolpr commented Jun 4, 2020

@dhslichter, @gkasprow - very interesting conversation. Interesting enough that it should have its own issue ;-). Please continue at #12 and let's leave this issue for comments on GW/SW specification as it seems to be at least partially independent.

@AAWO
Author

AAWO commented Jun 4, 2020

> FWIW, although I have been occupied with other things these past couple of months, my current plan is to try to make a Shuttler design that is EEM/Kasli compatible, similar to how Phaser has been developed. If others are interested in pursuing the specifications above, perhaps we should branch or rename the EEM version to something else?

OK, the form of the module, as well as the interface to ARTIQ is still to be discussed. Nevertheless, what about the rest of the proposed specification? Does it fit your use case?

@kaolpr
Member

kaolpr commented Jun 10, 2020

@hartytp @dhslichter Do you have any thoughts on HDL / SW side of Shuttler? We'd like to specify it and start work.

@hartytp

hartytp commented Jun 10, 2020

I'm not involved enough in this project to have a useful opinion I'm afraid!

@dhslichter
Member

Replying to top post:

> 18 DACs (LTC2000) - PHY: 16 bit parallel output + clock each

Use AD9117 instead, 14 bit parallel LVCMOS bus with interleaved data (DDR)

> DRTIO Satellite interface

My current vision would be to have data streamed to a Shuttler from a Kasli, in the way that Phaser is done, so Shuttler itself would not need DRTIO.

> DAC data rate = RTIO frequency -> single clock domain for DACs around 150MHz?

Yes, 125 MHz (max clock for DAC, good general value for RTIO clock too)

> data from DRTIO (all branches up to memory limit) is passed to SDRAM memory

Again, I am envisioning that reduced-representation data are streamed from Kasli to Shuttler. The FPGA on Shuttler would be in charge of turning the reduced representation into samples at the full data rate. For higher resolution, this MAY include performing sigma-delta modulation to increase the output resolution at ~DC. Otherwise, it would be simpler.

> current waveform branch data is passed from SDRAM to proper DAC's sequencer FIFO - is such functionality already implemented in ARTIQ?

Again, I envision this happening on Kasli.

> 32-bit (16-bit?) timer counting until end point (received from DRTIO)

Use the same architecture as for waveforms on Fastino or Phaser.

> each DAC's output waveform sample is updated when next sample's timestamp value is equal to current timer value

> a calibration procedure is needed to determine the latency between Master and Satellite's DAC output

Yes, TBA

> the raw data received from DRTIO is later on sent to DACs directly, or some DDS/interpolation is required?

See above, FPGA on board hosting Shuttler (via FMC, or directly) will need to do calculations to turn reduced representation of waveform into samples at full data rate.

> a single data packet received from DRTIO is (at proper time point) sent to DAC and dropped from memory/FIFO
> I imagine the operation such that Master sends all experiment's data branches (DAC output value + timestamp + DAC ID + some branch ID) to Shuttler which stores it in the SDRAM - the amount of data sent by Master is constrained by the SDRAM. Then Master can send just current branch ID to Satellite.

Let's do this in the same way that Fastino, Phaser, etc do it.

> Shuttler writes current data branch from SDRAM to required DACs' sequencers FIFOs. If the branch is changed before all data from FIFO is read, then FIFO is cleared and new data is written to it.
> The timer can be started after Master's request. The timer's initial value should be dependent on the latency between Master and Shuttler's output as well as the delay between sending the 'start timer' request to next Shuttlers (which should be possible to determine by Master). This way all Shuttlers' timers should be aligned.

Again, see above, do the same way as Fastino and Phaser.

@kaolpr
Member

kaolpr commented Aug 27, 2020

Coming back to GW specification:

> My current vision would be to have data streamed to a Shuttler from a Kasli, in the way that Phaser is done, so Shuttler itself would not need DRTIO.

Shuttler will be mounted on some FMC carrier (maybe AFCv4 or CPCIe carrier like this one). Such carrier can run in standalone mode or be a satellite to Kasli.

> Use the same architecture as for waveforms on Fastino or Phaser.

To my understanding, Fastino works in a different way than Phaser. If I'm not wrong, the current GW for Fastino allows setting output levels at specified timestamps, whereas Phaser is an AWG. @dhslichter What architecture for waveforms do you have in mind?

@dhslichter
Member

I think the goal would be to be more like Phaser. We're looking to have parameterized waveforms, with some combinations of sine waves and splines and products of these -- the specification needs to be determined still. However, the waveforms would for sure be in some reduced representation that is not just (time, voltage) pairs.

@kaolpr
Member

kaolpr commented Aug 27, 2020

@dhslichter - how do you see the process of determining these specifications? Do you have any timeline for that? Do you think you can specify that?

@dhslichter
Member

This is something that we want to contract with M-Labs to have them design for use with ARTIQ. Are you wanting to design your own independent gateware to have it emit pulses, or are you planning to work with M-Labs on a solution for ARTIQ? An official contract for this will take at least two months, maybe more. However, if you are just doing your own independent thing to be able to demonstrate the functionality for grant purposes, we can just discuss a draft specification here on a short timescale.

@kaolpr
Member

kaolpr commented Aug 28, 2020

Is this contract going to give rise to something more generic? I mean something to be used across different boards, e.g. Sayma, Phaser, Shuttler, maybe Fastino? A generic waveform forming API for ARTIQ?

@gkasprow
Member

The initial idea was to have just AWG functionality that is sufficient to run the production tests. If we can easily do something more sexy that makes the end user happy, we can do that.

@dhslichter
Member

dhslichter commented Aug 28, 2020

> A generic waveform forming API for ARTIQ?

There already exist several different versions of this for e.g. Sayma and Phaser, and @jordens has proposed yet a separate parameterization in quartiq/phaser#2 as well. My general feeling is that the specific applications for ARTIQ are sufficiently varied that probably there isn't a one-size-fits-all method for generating all possible types of waveforms one wants. Different use cases call for different parameterizations.

> The initial idea was to have just AWG functionality that is sufficient to run the production tests.

To be concrete, having waveforms defined as A+B*sin(C*t)+D*cos(C*t), where A,B,C,D are all time-dependent functions decomposed into cubic splines, with arbitrary/adjustable time delays between spline knots, is a fine place to start for demonstrating functionality. This is only one way in which one could parameterize outputs, and is probably NOT how things will end up in a written contract.
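
As a rough software reference for this parameterization, the sketch below evaluates A + B*sin(C*t) + D*cos(C*t) with each coefficient given as a piecewise cubic on arbitrarily spaced knots. It is a floating-point model for discussion only, not gateware; `PiecewiseCubic`, `constant` and `waveform` are hypothetical names, and storing each segment as polynomial coefficients in (t - knot) is an assumption about the representation.

```python
import bisect
import math


class PiecewiseCubic:
    """Piecewise cubic polynomial with arbitrarily spaced knots."""

    def __init__(self, knots, coeffs):
        # knots: increasing segment start times
        # coeffs: one (c0, c1, c2, c3) tuple per knot, polynomial in (t - knot)
        if len(knots) != len(coeffs):
            raise ValueError("need exactly one coefficient tuple per knot")
        self.knots = list(knots)
        self.coeffs = list(coeffs)

    def __call__(self, t):
        # Find the segment whose start is the last knot <= t
        # (times before the first knot use the first segment).
        i = max(bisect.bisect_right(self.knots, t) - 1, 0)
        dt = t - self.knots[i]
        c0, c1, c2, c3 = self.coeffs[i]
        return c0 + dt * (c1 + dt * (c2 + dt * c3))   # Horner evaluation


def constant(value):
    """Spline that holds a single value for all t."""
    return PiecewiseCubic([0.0], [(value, 0.0, 0.0, 0.0)])


def waveform(t, A, B, C, D):
    """Evaluate A(t) + B(t)*sin(C(t)*t) + D(t)*cos(C(t)*t)."""
    phase = C(t) * t
    return A(t) + B(t) * math.sin(phase) + D(t) * math.cos(phase)


# Offset ramps linearly from 0 to 1 during the first second, then holds.
A = PiecewiseCubic([0.0, 1.0], [(0.0, 1.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0)])
B = constant(0.5)            # sine amplitude
C = constant(2 * math.pi)    # angular frequency: one cycle per time unit
D = constant(0.0)            # no cosine component

print(waveform(0.25, A, B, C, D))
```

Because the knot times are stored explicitly, segments can have any length, which matches the "arbitrary/adjustable time delays between spline knots" requirement.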

One feature of Shuttler that would be nice to implement, if you are looking for more of a challenge, is delta-sigma modulation of the output to increase the effective bit depth at DC to 16+ bits. This could be an interesting gateware project if you are looking for something a little bit fancier to implement. We did some test experiments at NIST with dithering to increase the low-frequency resolution and this should work.
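
For reference, the core of a first-order modulator is small. The sketch below is a behavioural Python model, not gateware: it feeds the truncation error of each sample back into the next one, so that the time average of the 14-bit output codes equals the 16-bit target. The bit widths follow the numbers mentioned in this thread, the function name is made up, and inputs are assumed to be unsigned and small enough that the output never exceeds 14 bits.

```python
def sigma_delta_14bit(samples_16bit):
    """First-order error-feedback modulator: 16-bit targets -> 14-bit codes."""
    dropped_bits = 2
    max_input = 16383 << dropped_bits   # largest input that cannot overflow
    acc = 0                             # accumulated truncation error
    out = []
    for x in samples_16bit:
        if not 0 <= x <= max_input:
            raise ValueError("input outside the range this sketch supports")
        acc += x
        code = acc >> dropped_bits      # 14-bit code sent to the DAC
        acc -= code << dropped_bits     # keep only what was truncated away
        out.append(code)
    return out


codes = sigma_delta_14bit([1001] * 4)
print(codes)
```

A constant input of 1001 (in 16-bit units) sits between the 14-bit codes 250 and 251; the modulator outputs 251 on one sample in four, so the average is 250.25 and the low-frequency resolution is recovered after analog filtering. A real implementation would also need saturation at full scale and probably a higher-order loop.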

@dhslichter
Member

dhslichter commented Apr 28, 2021

@gkasprow @AAWO my concrete suggestion for how to do the AWG in the gateware is as follows:

  • implement a duplicate of the current PDQ gateware (adapted to whatever FPGA you are using). Code and documentation are at https://github.com/m-labs/pdq and https://pdq.readthedocs.io/en/latest/. We would want to just run at a single clock frequency equal to the RTIO clock, so don't worry about being able to change the clock frequency between two options as is done in the PDQs. The output waveforms for the PDQ gateware are of the form a(t) + b(t)*sin(c(t)), where a(t) and b(t) are cubic splines and c(t) is a quadratic spline.
  • this is a good starting point that provides basic functionality that will render the boards useful. I suggest doing the computations of the output voltage as 16 bit numbers (or larger), and then just round to 14 bits before sending to the DACs. We can return later to increase the effective resolution with sigma-delta modulation as described below.
  • it would be good to understand how much of the FPGA resources are consumed with this, and whether it would be possible to add additional sinusoidal tones beyond just the single tone that is available in the PDQ gateware -- for example, outputting a(t) + b(t)*sin(c(t)) + d(t)*sin(e(t)) + ...., where d(t) is again a cubic spline and e(t) is again a quadratic spline.
  • following this, there can be discussion of using sigma-delta modulation to increase the effective bit depth at DC. The current work that has been done on random dithering with LFSR is appropriate for spreading noise, but doesn't do anything to increase effective resolution at DC. For this you would want to use sigma-delta modulation. There are hack-y ways to do this using dithering (not random dithering, though), which is why it was being discussed before, but probably sigma-delta is the right way to do things.
  • if it is simpler, it may be acceptable to apply sigma-delta modulation on the a(t) spline alone, before it is added to the rest of the signal. However, doing the sigma-delta modulation after adding all components together (a(t) + b(t)*sin(c(t))) is probably better and likely not any harder.
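
To help size the datapath, here is a behavioural sketch of the a(t) + b(t)*sin(c(t)) form from the first bullet. It is not taken from the PDQ sources. It assumes each spline segment is supplied as forward differences, so one sample costs only a few additions per spline, that phase is a 32-bit word where full scale is one turn, and that `math.sin` stands in for whatever sine generator the gateware would use (for example a CORDIC or a lookup table). All names are hypothetical.

```python
import math


class SplineAccumulator:
    """Evaluate one polynomial segment with one addition per order per clock."""

    def __init__(self, diffs):
        # diffs: [value, 1st, 2nd, ...] forward differences at segment start
        self.state = list(diffs)

    def step(self):
        value = self.state[0]
        # Ascending order: each term adds the not-yet-updated higher difference.
        for i in range(len(self.state) - 1):
            self.state[i] += self.state[i + 1]
        return value


def pdq_style_samples(a_diffs, b_diffs, c_diffs, n, phase_bits=32):
    """Return n signed 14-bit codes for a(t) + b(t)*sin(c(t))."""
    a = SplineAccumulator(a_diffs)   # cubic offset, in signed 16-bit units
    b = SplineAccumulator(b_diffs)   # cubic amplitude, in signed 16-bit units
    c = SplineAccumulator(c_diffs)   # quadratic phase, full scale = one turn
    turn = 1 << phase_bits
    out = []
    for _ in range(n):
        phase = (c.step() % turn) / turn
        v16 = round(a.step() + b.step() * math.sin(2 * math.pi * phase))
        v16 = max(-32768, min(32767, v16))      # saturate to 16 bits
        out.append(min(8191, (v16 + 2) >> 2))   # round to 14 bits
    return out


# Forward differences of n**3 reproduce the cubes exactly.
cubes = SplineAccumulator([0, 1, 6, 6])
print([cubes.step() for _ in range(5)])

# Constant-amplitude tone at a quarter of the sample clock.
tone = pdq_style_samples([0, 0, 0, 0], [8000, 0, 0, 0], [0, 1 << 30, 0], 4)
print(tone)
```

Adding a second tone d(t)*sin(e(t)) in this model means one more cubic accumulator, one more quadratic accumulator and one more sine evaluation per sample, which gives a first idea of how resource use might scale.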

@gkasprow
Member

@kaolpr

@dhslichter
Member

@kaolpr @gkasprow what is the status on this? is there a gateware repo that exists somewhere?

@kaolpr
Member

kaolpr commented Jul 6, 2021

Generally, we had some problems with the DAC devkit we were working with. I'll be looking into this this month and hope to publish an initial version of the gateware at the beginning of August.

@dhslichter
Member

problems with devkit? anything concerning?

@gkasprow
Member

gkasprow commented Jul 6, 2021

Devkit works with ADI software but for some reason doesn't work with ARTIQ. It looks like a DAC initialization problem.

@dhslichter
Member

meaning that the SPI communications are broken?

@kaolpr
Member

kaolpr commented Jul 7, 2021

I was not actively taking part in the development up till now. I only know he couldn't get it working and I'll be looking into it as soon as I'm back from vacation (on 20th July).

This devkit exposes a USB interface for programming the DAC and clock tree configuration. From what I have seen so far, the provided software is a bit confusing, and I hope this is the source of the problem.


5 participants